William Bundy
I've finally removed SDL2 and the CRT (on Windows) from my personal codebase! I've been furiously hacking away at a platform layer + needed libs for most of April, and while it wasn't ready for Ludum Dare, it's coming along now; I can get a window + OpenGL graphics + input working with no issues.

Removing the CRT proved to be an issue for a lot of the free/public domain libraries I was using. After correctly mapping intrinsics and other library functions to my own code, adding my own special-case sort functions to replace instances of qsort, and commenting out #includes when things didn't work, this is how things ended up:
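  • stb_vorbis didn't make it. I couldn't stop it from generating a __chkstk call in one of its procedures (MSVC inserts one when a function's locals exceed a page, 4 KB on x86/x64, and without the CRT there's nothing to link it against).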
  • stb_image works, but has trouble. I spent a lot of time tracing its allocations, only to find that it expects realloc to behave like malloc when passed a null pointer (as the C standard requires), while Microsoft's HeapReAlloc doesn't accept a null pointer. Right now it's having trouble with PNGs generated by Aseprite, so I'm still looking into that.
  • stb_sprintf works perfectly!
  • I ended up cutting nuklear; I didn't want to rewrite the vertex buffer renderer, and I find the library has a lot of little gotchas. If I have to read a library's source code to figure out how to use it, and it still has problems, I'd rather copy out the hard parts or write them myself. The time I saved in VMFSketch by using nuklear was significantly offset by the time I lost figuring out why my app would crash if I didn't call nk_begin on every window every frame. In the end, if I'm prepared to write my own GUI code (which I've already done in part for Rituals), I'd only be using nuklear for its font atlas, which isn't too much to implement myself.
  • To that end, I pulled in stb_truetype, which compiled just fine after replacing all its intrinsics.
  • stb_rectpack had to have a few instances of qsort replaced.
  • If I remember correctly, miniz, sts_mixer, dr_wav, and dr_flac all worked just fine after cleaning up.
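To make the null-pointer difference concrete, here's a minimal sketch of a realloc-compatible shim that stb_image could use through its STBI_MALLOC, STBI_REALLOC and STBI_FREE override macros. The wrapper names (heap_alloc, shim_realloc and so on) are hypothetical, the size-0 behaviour is a choice made for the sketch, and the non-Windows branch just forwards to the CRT so it can be compiled and tested anywhere. It illustrates the fix; it isn't the code from this codebase.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Thin wrappers over the process heap. */
static void *heap_alloc(size_t size) { return HeapAlloc(GetProcessHeap(), 0, size); }
static void heap_free(void *ptr) { HeapFree(GetProcessHeap(), 0, ptr); }
/* HeapReAlloc needs a block from HeapAlloc; it doesn't treat NULL as "allocate". */
static void *heap_resize(void *ptr, size_t size) { return HeapReAlloc(GetProcessHeap(), 0, ptr, size); }
#else
#include <stdlib.h>
/* Non-Windows fallback so the sketch compiles and can be tested anywhere. */
static void *heap_alloc(size_t size) { return malloc(size); }
static void heap_free(void *ptr) { free(ptr); }
static void *heap_resize(void *ptr, size_t size) { return realloc(ptr, size); }
#endif

/* realloc-compatible behaviour on top of the heap wrappers. */
static void *shim_realloc(void *ptr, size_t size) {
    if (ptr == NULL) return heap_alloc(size);        /* realloc(NULL, n) acts like malloc(n) */
    if (size == 0) { heap_free(ptr); return NULL; }  /* sketch's choice for size 0 */
    return heap_resize(ptr, size);
}

/* free(NULL) is a no-op; don't rely on HeapFree accepting NULL. */
static void shim_free(void *ptr) {
    if (ptr != NULL) heap_free(ptr);
}

/* stb_image uses these instead of malloc/realloc/free when they are
   defined before the implementation is compiled. */
#define STBI_MALLOC(sz)        heap_alloc(sz)
#define STBI_REALLOC(p, newsz) shim_realloc(p, newsz)
#define STBI_FREE(p)           shim_free(p)
/* #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h" */
```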

Small, single-file libraries tend to do pretty well if you have replacements for the commonly used CRT functions, namely qsort, memcpy and the math.h transcendentals, all of which I spent time replacing.
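For the qsort call sites, anything with qsort's signature can be dropped in. This sketch uses heapsort, a swapped-in choice (O(n log n), no recursion, no allocation, no CRT calls), and heap_qsort and compare_ints are hypothetical names. Recent versions of stb_rect_pack.h also let you define STBRP_SORT before including it, which is one place a function like this could plug in.

```c
#include <stddef.h>

typedef int (*CompareFn)(const void *, const void *);

/* Swap two elements byte by byte (no memcpy needed). */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size) {
    while (size--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Restore the max-heap property for the subtree at `root` within the first `count` elements. */
static void sift_down(unsigned char *base, size_t root, size_t count, size_t size, CompareFn cmp) {
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= count) return;
        if (child + 1 < count && cmp(base + child * size, base + (child + 1) * size) < 0)
            child++;                                  /* pick the larger child */
        if (cmp(base + root * size, base + child * size) >= 0) return;
        swap_bytes(base + root * size, base + child * size, size);
        root = child;
    }
}

/* Same signature and comparator convention as qsort; not stable (qsort isn't guaranteed to be either). */
void heap_qsort(void *ptr, size_t count, size_t size, CompareFn cmp) {
    unsigned char *base = (unsigned char *)ptr;
    if (count < 2) return;
    for (size_t i = count / 2; i-- > 0;)              /* build the heap bottom-up */
        sift_down(base, i, count, size, cmp);
    for (size_t end = count - 1; end > 0; end--) {    /* move the max to the back, shrink the heap */
        swap_bytes(base, base + end * size, size);
        sift_down(base, 0, end, size, cmp);
    }
}

/* Example qsort-style comparator: negative, zero or positive. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);                         /* avoids overflow from x - y */
}
```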

Funnily enough, the transcendental math was the easiest: these zlib-licensed implementations seem to be pretty good. Add several hours translating an atan2 approximation to SSE2 intrinsics, writing single-float versions, and filling in some of the gaps (pow, fabs, min, max, sqrt/rsqrt, ldexp), and I have a reasonable replacement for most of the CRT's math functions.
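As an illustration of the gap-filling part, here's a sketch of single-float sqrt, rsqrt, fabs and ldexp without the CRT. It assumes an x86 target with SSE (the _mm_* intrinsics from xmmintrin.h) and IEEE-754 floats; the function names are hypothetical, and ldexpf_bits deliberately skips overflow, underflow and denormal handling.

```c
#include <stdint.h>
#include <xmmintrin.h>   /* SSE: _mm_sqrt_ss, _mm_rsqrt_ss, _mm_set_ss, _mm_cvtss_f32 */

/* sqrtss is correctly rounded, so this matches sqrtf. */
static inline float sqrtf_sse(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* rsqrtss is only ~12 bits accurate; one Newton-Raphson step
   y' = y * (1.5 - 0.5 * x * y * y) brings it close to full float precision. */
static inline float rsqrtf_sse(float x) {
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
}

/* Clear the sign bit. */
static inline float fabsf_bits(float x) {
    union { float f; uint32_t u; } v;
    v.f = x;
    v.u &= 0x7FFFFFFFu;
    return v.f;
}

/* x * 2^n, building 2^n directly in the exponent field.
   Only valid for -126 <= n <= 127; no overflow/denormal handling. */
static inline float ldexpf_bits(float x, int n) {
    union { float f; uint32_t u; } scale;
    scale.u = (uint32_t)(n + 127) << 23;
    return x * scale.f;
}
```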

Sorting turned out to be a little more difficult, but only because I bothered translating Orson Peters' pdqsort to C. Translating an iterator-heavy C++ library into C-style sort(type* array, size_t count) procedures ended up being more confusing than I had expected; I found myself debugging line by line in two instances of Visual Studio. Some problems I ran into:
  • std::less behaves differently from qsort comparison functions: it returns a bool for a < b, while a qsort comparator returns a negative, zero, or positive int.
  • Keeping track of offsets is tricky when converting from iterators to array/count style. I could (maybe should) have used two pointers instead.
  • Debugging with C++ iterators is a pain, since you don't know where you are relative to the start of the array. I suppose my implementation doesn't work with generic list-like objects... but that's a bridge to cross when I get there.
  • When bugs had me reading and writing outside the array, placing it at an offset inside a bigger chunk of memory and setting all the values outside it to -1 helped me find where things were going wrong.
  • Converting everything to macros wasn't as straightforward as I'd have liked; I ended up inlining everything to get around this, which was no fun.
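One way to get typed sort(type* array, size_t count) procedures in C is a macro that stamps out one function per type. The sketch below does that for insertion sort (pdqsort, like most introsort-style sorts, hands small partitions to insertion sort) and takes a std::less-style strict less-than rather than a qsort-style three-way comparison. DEFINE_INSERTION_SORT, INT_LESS and SpriteKey are hypothetical names; this isn't the pdqsort translation itself.

```c
#include <stddef.h>

/* DEFINE_INSERTION_SORT(name, type, less) generates
   void name(type *array, size_t count), where less(a, b) is a strict
   "a < b" test like std::less, not a qsort-style int comparison. */
#define DEFINE_INSERTION_SORT(name, type, less)                              \
    static void name(type *array, size_t count) {                            \
        for (size_t i = 1; i < count; i++) {                                 \
            type value = array[i];                                           \
            size_t j = i;                                                    \
            /* Shift larger elements right; strict less keeps equal keys in order. */ \
            while (j > 0 && less(value, array[j - 1])) {                     \
                array[j] = array[j - 1];                                     \
                j--;                                                         \
            }                                                                \
            array[j] = value;                                                \
        }                                                                    \
    }

#define INT_LESS(a, b) ((a) < (b))
DEFINE_INSERTION_SORT(sort_int, int, INT_LESS)

/* Works for structs too: sort by one key field. */
typedef struct { float depth; int id; } SpriteKey;
#define SPRITE_LESS(a, b) ((a).depth < (b).depth)
DEFINE_INSERTION_SORT(sort_sprites, SpriteKey, SPRITE_LESS)
```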

Actually, I'm pretty sure there's still a bug in there that makes it run slower when a lot of values are the same (probably a <= vs a <), but the important thing is that it correctly sorts everything I thought to throw at it. When I start using it more, I'll compare its speed to other implementations.
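The guard-value trick from the list above can be wrapped into a small test harness. This sketch assumes int data that never contains -1 (the guard value), and the helper names are hypothetical. It checks for out-of-bounds writes and ordering, but not that the output is a permutation of the input.

```c
#include <stdbool.h>
#include <stddef.h>

#define GUARD_COUNT 16
#define GUARD_VALUE (-1)   /* test inputs must not contain -1, or stray writes of it go unnoticed */

typedef void (*IntSortFn)(int *array, size_t count);

/* Sorts a copy of `input` placed in the middle of `scratch` (which must hold
   count + 2 * GUARD_COUNT ints) with guard values on both sides, then checks
   that the guards survived and the result is in non-decreasing order. */
static bool check_sort(IntSortFn sort, const int *input, size_t count, int *scratch) {
    int *array = scratch + GUARD_COUNT;
    for (size_t i = 0; i < GUARD_COUNT; i++) {
        scratch[i] = GUARD_VALUE;          /* guards before the array */
        array[count + i] = GUARD_VALUE;    /* guards after the array */
    }
    for (size_t i = 0; i < count; i++) array[i] = input[i];

    sort(array, count);

    for (size_t i = 0; i < GUARD_COUNT; i++)
        if (scratch[i] != GUARD_VALUE || array[count + i] != GUARD_VALUE)
            return false;                  /* something wrote outside [0, count) */
    for (size_t i = 1; i < count; i++)
        if (array[i - 1] > array[i])
            return false;                  /* not sorted */
    return true;
}

/* Stand-in sorts for demonstrating the harness. */
static void reference_sort(int *array, size_t count) {
    for (size_t i = 1; i < count; i++) {
        int v = array[i];
        size_t j = i;
        while (j > 0 && array[j - 1] > v) { array[j] = array[j - 1]; j--; }
        array[j] = v;
    }
}

static void off_by_one_sort(int *array, size_t count) {
    reference_sort(array, count);
    array[count] = 0;                      /* bug: writes one past the end */
}
```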

And, last of the big three things I've wrestled with, replacing memcpy had me running in circles for a while. I couldn't find much on the subject; talking to d7 and J_vanRijn in the Discord, plus mmozeiko's post, were my main sources of information (there's a big post on CodeProject too, but it's supposedly all licensed under the CPOL, which is pretty restrictive). d7's conclusion was that there's too much variation between processors to write one memcpy to rule them all; for his current project, a simple memcpy was all he needed.

Starting from his general process, I played around with some common tricks (loop unrolling, Duff's device), which maybe helped a little. Mārtiņš' advice holds true, though: rep movsb is pretty good in most cases. My final version matches or beats the built-in CRT memcpy across a range of sizes... at least on my Skylake i5. rep movsb isn't optimized on Intel chips before Ivy Bridge (which introduced Enhanced REP MOVSB), and I don't know about AMD chips. The final implementation looks like this:
  1. For sizes less than 16 bytes, copy them as a series of ints.
  2. For sizes less than 1024 bytes in the AVX route and 512 bytes in the SSE route, use head and tail copying (copy the first N bytes, then the last N bytes, and so on). This tends to be a lot faster than the builtin memcpy on my computer.
  3. For sizes greater than 1 KB and less than 1<<21 bytes (2 MB), use rep movsb. (It's an intrinsic, __movsb, on cl; on gcc/clang you can use inline assembly.)
  4. For sizes greater than 2 MB, I use _mm_load_si128 and _mm_stream_si128 on 16-byte aligned buffers, and _mm_lddqu_si128 and _mm_storeu_si128 on unaligned buffers. For aligned buffers this ends up faster than both movsb and memcpy; I can't remember the results for unaligned ones. However, I highly doubt I'll actually do any copies this large.
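A condensed sketch of that dispatch, under several assumptions: SSE2 only (no AVX route), x86 with MSVC (__movsb) or GCC/Clang (inline rep movsb), thresholds taken from the list, and step 1 done with a byte loop rather than ints. copy_memory and its helpers are hypothetical names, and this isn't the linked implementation; it only shows how the pieces fit together. Unaligned copies over 2 MB fall through to rep movsb here instead of an lddqu/storeu loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 loads/stores, _mm_stream_si128, _mm_sfence */
#ifdef _MSC_VER
#include <intrin.h>      /* __movsb */
#endif

#define HEAD_TAIL_LIMIT  512u          /* step 2 bound (the post's SSE route) */
#define STREAM_THRESHOLD (1u << 21)    /* step 4 bound, ~2 MB */

/* Copy one 16-byte chunk with an unaligned load/store. */
static void copy16(unsigned char *d, const unsigned char *s) {
    _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
}

/* Step 2: head/tail copy, valid for any n >= 16. Each round copies 16 bytes
   from the front and 16 from the back; the final pair may overlap bytes that
   were already written, which is harmless because src and dst don't overlap. */
static void copy_head_tail(unsigned char *d, const unsigned char *s, size_t n) {
    size_t head = 0, tail = n;
    while (tail - head > 32) {
        copy16(d + head, s + head);
        copy16(d + tail - 16, s + tail - 16);
        head += 16;
        tail -= 16;
    }
    copy16(d + head, s + head);
    copy16(d + tail - 16, s + tail - 16);
}

/* Step 4: non-temporal stores for huge copies; d and s must be 16-byte aligned. */
static void copy_stream(unsigned char *d, const unsigned char *s, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        _mm_stream_si128((__m128i *)(d + i), _mm_load_si128((const __m128i *)(s + i)));
    _mm_sfence();                        /* order the streaming stores before later stores */
    if (i < n)
        copy16(d + n - 16, s + n - 16);  /* leftover bytes: one overlapping unaligned chunk */
}

void *copy_memory(void *dst, const void *src, size_t n) {
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    if (n < 16) {
        while (n--) *d++ = *s++;         /* step 1 (the post copies ints; bytes keep this short) */
    } else if (n < HEAD_TAIL_LIMIT) {
        copy_head_tail(d, s, n);         /* step 2 */
    } else if (n >= STREAM_THRESHOLD && (((uintptr_t)d | (uintptr_t)s) & 15u) == 0) {
        copy_stream(d, s, n);            /* step 4, aligned buffers only */
    } else {                             /* step 3: rep movsb */
#if defined(_MSC_VER)
        __movsb(d, s, n);
#else
        __asm__ __volatile__("rep movsb" : "+D"(d), "+S"(s), "+c"(n) : : "memory");
#endif
    }
    return dst;
}
```

Compilers can recognise simple copy loops like the one in step 1 and turn them back into a call to memcpy, so it's worth checking the generated code in a CRT-free build.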

Again, your mileage may vary here, and I expect to have to revisit this as other people try to run my code. I may need to provide a movsb alternative for Sandy Bridge and earlier, or for AMD chips. A small note: AVX wasn't faster at scale, but it consistently did better in the 512-1024 byte range with head/tail copies, and if the buffer was in cache, it was much, much faster up to 4096 bytes. You can check out the code for this here.

That's about it. I wrote my own OpenGL loader, but that's remarkably simple if you already have a list of functions and their parameters. Creating OpenGL contexts with Win32 is a pain, but well documented. Feature-wise, I'm soon going to implement a texture atlas system to use with stb_truetype and my own graphics. I haven't done audio yet; I'm told WASAPI is the way to go on modern Windows, and from poking around it seems to work from pure C? I'm not sure yet, so that's more testing to come. I grabbed sts_mixer for an LD a while back, and it seems okay, but I might take a stab at writing my own too.
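The list-of-functions approach maps nicely onto an X-macro: write the list once, then expand it into function pointer declarations and into the loading code. In this sketch the GL types are declared by hand, the three functions are just examples, and the resolver is passed in as a callback. win32_get_proc is a hypothetical Windows resolver built on wglGetProcAddress, with a GetProcAddress fallback on opengl32.dll for GL 1.1 entry points; the check for 1, 2, 3 and -1 follows the OpenGL wiki's note that some drivers return those instead of NULL.

```c
#include <stddef.h>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>               /* link with opengl32.lib for wglGetProcAddress */
#define GL_CALL __stdcall          /* APIENTRY; ignored on x64 */
#else
#define GL_CALL
#endif

/* Minimal GL types so the sketch doesn't need gl.h/glext.h. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLsizei;

/* The single list: return type, name, parameter list. */
#define GL_FUNCTIONS(X)                                         \
    X(void,   glGenBuffers,   (GLsizei n, GLuint *buffers))     \
    X(void,   glBindBuffer,   (GLenum target, GLuint buffer))   \
    X(GLuint, glCreateShader, (GLenum type))

/* Expansion 1: a typedef and a function pointer variable per entry. */
#define GL_DECLARE(ret, name, params)         \
    typedef ret (GL_CALL *name##_fn) params;  \
    static name##_fn name;
GL_FUNCTIONS(GL_DECLARE)
#undef GL_DECLARE

typedef void *(*GetProcFn)(const char *name);

/* Expansion 2: the loading code. Returns how many functions were not found. */
static int load_gl_functions(GetProcFn get_proc) {
    int missing = 0;
#define GL_LOAD(ret, name, params)      \
    name = (name##_fn)get_proc(#name);  \
    if (name == NULL) missing++;
    GL_FUNCTIONS(GL_LOAD)
#undef GL_LOAD
    return missing;
}

#ifdef _WIN32
static void *win32_get_proc(const char *name) {
    void *p = (void *)wglGetProcAddress(name);
    if (p == NULL || p == (void *)1 || p == (void *)2 || p == (void *)3 || p == (void *)-1) {
        static HMODULE opengl32;
        if (!opengl32) opengl32 = LoadLibraryA("opengl32.dll");
        p = opengl32 ? (void *)GetProcAddress(opengl32, name) : NULL;
    }
    return p;
}
#endif
```

Loading would happen once, after wglMakeCurrent has made a context current: load_gl_functions(win32_get_proc) returns the number of entry points that couldn't be found.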

As for future plans? I plan to get this working pretty well in May. I'd like to stream more and put together a few videos, but I don't expect to be too consistent with it this month.

William Bundy
Rituals isn't going to get much development attention from me for at least another month. I've been working on some of the design elements behind it, and I feel like I've got a better idea of what will actually be in the game and how all the parts interact. However, until I'm ready to start testing them, I'd rather not go into detail; everything is liable to change until it's proven to work.

So what happened in March?

Well, I joined Handmade Network's education initiative! You can read Abner's post about that. Over the last couple of weeks, I found some inspiration to make a map editor for Source games: VMFSketch ("vmf" stands for "Valve Map Format"). I have a fair amount to say about making it, but it's getting late and I said I'd finish this article today.

Also: I'll be streaming my work on the Ludum Dare this month; it's the weekend of the 21st. This is also LD's 15th anniversary, which is pretty exciting. Hopefully it's a good one.
William Bundy
...or not as the case may be. I got to streaming a few times this month, but overall there isn't much to show.

See you all in March.
William Bundy
I've been having a difficult time trying to sum up January 2017. My best attempt has been "it's been a month, certainly," which isn't particularly descriptive. This post also spans a lot of December, too, which doesn't help.

Let's see... For the Ludum Dare I rewrote a lot of my personal library code, which gave me more concrete ideas for how to clean up Rituals and required some tweaks to my metaprogramming/header tool. In terms of new features, I only added a few things: variadic procedures to make text handling easier, an easy-to-use memory pool, and switching to double pointers for types that use it; but they make a lot of messy code disappear.
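The post doesn't describe the memory pool, so this is only a sketch assuming a fixed-size block pool with an intrusive free list, one common design. Pool, pool_init, pool_alloc and pool_free are hypothetical names; the backing memory is assumed to be pointer-aligned and owned by the caller.

```c
#include <stddef.h>

/* A free block stores the link to the next free block inside itself. */
typedef struct PoolFreeNode { struct PoolFreeNode *next; } PoolFreeNode;

typedef struct {
    unsigned char *memory;     /* caller-owned, pointer-aligned backing store */
    size_t block_size;
    size_t block_count;
    PoolFreeNode *free_list;
} Pool;

static void pool_init(Pool *pool, void *memory, size_t memory_size, size_t block_size) {
    const size_t align = sizeof(void *);
    if (block_size < sizeof(PoolFreeNode)) block_size = sizeof(PoolFreeNode);
    block_size = (block_size + align - 1) & ~(align - 1);   /* keep every block pointer-aligned */
    pool->memory = (unsigned char *)memory;
    pool->block_size = block_size;
    pool->block_count = memory_size / block_size;
    pool->free_list = NULL;
    /* Push blocks in reverse so the first allocation returns the first block. */
    for (size_t i = pool->block_count; i-- > 0;) {
        PoolFreeNode *node = (PoolFreeNode *)(pool->memory + i * block_size);
        node->next = pool->free_list;
        pool->free_list = node;
    }
}

/* O(1): pop the head of the free list; NULL when the pool is exhausted. */
static void *pool_alloc(Pool *pool) {
    PoolFreeNode *node = pool->free_list;
    if (node == NULL) return NULL;
    pool->free_list = node->next;
    return node;
}

/* O(1): push the block back; `ptr` must have come from this pool. */
static void pool_free(Pool *pool, void *ptr) {
    PoolFreeNode *node = (PoolFreeNode *)ptr;
    node->next = pool->free_list;
    pool->free_list = node;
}
```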

Miblo brought up the topic of code cleanliness in a wider sense in last month's post, so I thought I'd explain more of my thoughts on it. The short version: read Casey's post on semantic compression. 90% or so of the code cleanliness problems I whine about come from me not being very good at noticing opportunities to do that. Some of the biggest examples of this are when defining Sprites and Entities in code; the definitions cover multiple lines and often set the same few properties, generating a lot of clutter. I missed the opportunity to introduce convenience functions partly because they all look slightly different, partly because I was changing names as I was developing them, but mostly because I didn't know what the common use cases for these objects were when I was setting things up. I reworked a lot of stuff for my Ludum Dare game (sadly, I didn't finish it), and creating utility functions that made it easy to set the important fields on common structs made a lot of code cleaner and easier to understand at a glance.
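A small illustration of the convenience-function idea: the Sprite struct and make_sprite helper below are hypothetical, not Rituals' actual types. The helper sets the fields nearly every definition touches, plus sensible defaults, so each definition shrinks to one line and only overrides what differs.

```c
#include <stdint.h>

typedef struct { float x, y, w, h; } Rect;

typedef struct {
    float x, y;        /* world position */
    float w, h;        /* size in world units */
    Rect texture;      /* source rectangle in the atlas, in pixels */
    float angle;
    uint32_t color;    /* RGBA tint; 0xFFFFFFFF = untinted */
    int layer;
} Sprite;

/* Before: every definition spelled out the same handful of fields.
   Sprite s = {0};
   s.x = 10; s.y = 4; s.texture = (Rect){32, 0, 16, 16};
   s.w = 16; s.h = 16; s.color = 0xFFFFFFFF; s.layer = 1;
*/

/* After: one helper for the common case; callers override the rest. */
static Sprite make_sprite(float x, float y, Rect texture, int layer) {
    Sprite s = {0};
    s.x = x;
    s.y = y;
    s.texture = texture;
    s.w = texture.w;   /* default: one world unit per texel */
    s.h = texture.h;
    s.color = 0xFFFFFFFFu;
    s.layer = layer;
    return s;
}
```

A definition then reads `Sprite s = make_sprite(10, 4, (Rect){32, 0, 16, 16}, 1);`, followed by any one-off overrides such as `s.angle = 0.5f;`.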

There are a few cleanliness problems outside of semantic compression (though they're possibly still related to compressing the information in your code). Firstly: argument order for similar procedures. Rituals' code has several procedures that render something containing text: sometimes plain text, sometimes text with a background, sometimes a button. If I need to do something a few times, I'll make a procedure that wraps the base one to make life a little easier, but different constraints or slips in concentration sometimes lead me to change the order of arguments (say your rendering procedures follow a different pattern than your printf-style ones; what happens when you try to combine them?). Over time, this can build up into something quite confusing. Your editor can help with this (vim has some tricks I just found out about), but ideally you give similar procedures similar signatures.

Secondly, especially when making a game, you end up mixing some amount of game data into the code over time. Whether that's a problem depends on the scale of your project, but it's a fine line to walk. Long definitions take up screen space (something I consider quite valuable) and can make your code less flexible, but moving and centralizing definitions can lead to complex, abstract code, made worse in C/C++ by the lack of reflection (compared to, say, C# or Python). As a project grows from small to large, data probably ends up moving from inside the code to outside it, and it feels like Rituals has hit that point for at least two systems (entities and particle effects) at once.
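To illustrate the argument-order point, here's a sketch of one possible convention: target first, position next, then the printf-style format and arguments last, with a va_list ("v") variant that every wrapper forwards to. TextItem, draw_text, draw_textv and draw_button are hypothetical. It uses vsnprintf; stb_sprintf's stbsp_vsnprintf can fill the same role in a CRT-free build.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical output target; a real renderer would queue glyph quads instead. */
typedef struct {
    char text[256];
    float x, y;
} TextItem;

/* Base procedure: target, position, then format + va_list. */
static int draw_textv(TextItem *out, float x, float y, const char *fmt, va_list args) {
    out->x = x;
    out->y = y;
    return vsnprintf(out->text, sizeof out->text, fmt, args);
}

/* printf-style entry point with the same argument order. */
static int draw_text(TextItem *out, float x, float y, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = draw_textv(out, x, y, fmt, args);
    va_end(args);
    return n;
}

/* A wrapper keeps the order and inserts its own parameters before the format. */
static int draw_button(TextItem *out, float x, float y, float padding, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = draw_textv(out, x + padding, y + padding, fmt, args);
    va_end(args);
    return n;
}
```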

Another part of clean code is organizing commonly edited procedures together. If you're using a single-translation-unit build ("unity build"), you might well not write headers, so you have to order your procedures so that each one is defined before it's used. Coming from C#, I find this somewhat frustrating (often I'll want to write a bunch of utility functions at the bottom of a file and reference them in an update or render procedure at the top), so I started writing a tool on stream to do it for Rituals. I'd like to announce that I'm going to spin this off into a separate project, jokingly named "Wirmpht" after all the different things it tries to do. The current implementation is quite messy, but I plan on rearchitecting most of it and adding some missing features. Its current lack of any error or warning generation can make it difficult to use right now, and there are features I never finished satisfactorily (introspection data and serializer generation, to name two). I already find the occasionally buggy, limited form of the tool quite helpful, so I'd like to imagine a complete and polished version would be widely useful.
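What such a tool automates can be shown by hand: emit a prototype for every procedure near the top of the unity build, and the definitions can then appear in any order. The "generated" section and the functions below are hypothetical examples, not Wirmpht's actual output.

```c
/* ---- forward declarations, as a header tool might generate them ---- */
static float clampf(float x, float lo, float hi);
static float approach(float current, float target, float step);

/* ---- game code can sit at the top and call utilities defined further down ---- */
static float update_speed(float speed, float target_speed, float dt) {
    return clampf(approach(speed, target_speed, 10.0f * dt), 0.0f, 5.0f);
}

/* ---- utilities at the bottom of the file ---- */
static float clampf(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Move `current` toward `target` by at most `step`, without overshooting. */
static float approach(float current, float target, float step) {
    if (current < target) return (current + step > target) ? target : current + step;
    return (current - step < target) ? target : current - step;
}
```

Without the two prototypes, a C99 compiler rejects update_speed because clampf and approach haven't been declared yet; generating them keeps the file in whatever order reads best.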

I haven't been able to jump back into streaming quite as quickly as I'd have liked, but I'll start streaming Wirmpht programming this month, moving on to Rituals when it's ready. Instead of trying to stream every day, I plan on three days a week, mostly on Tuesday, Thursday, and Saturday, starting on the 7th.
William Bundy
Originally titled "Where did October go?"

Due to some upsets back in October, I've had a messy time streaming Rituals. Throw in HandmadeCon, the horrible cold I brought to it (which has lingered since), and life in general, and I'm pretty far behind, both with development and writing about the project. However, development continues!

Rituals' code has gotten pretty messy, or, at least, has some messy parts that the important code interacts with, and I find this a pain to work on. As such, most of my development has been outside the Rituals project, essentially forking the ideas, borrowing code, and cleaning it up. It feels like I've grown as a programmer too, and even if not, my style has changed slightly, which means that a lot of the next part of Rituals' development will be changing how I use the technology available to me to suit that. I'm planning on moving to dynamically loaded game code, removing the CRT on Windows, possibly abandoning SDL, and generally tweaking and cleaning things up. I'm also thinking of moving into whatever subset of C99/C11 MSVC supports, rather than continuing to use C++. There aren't a lot of C++ features I use, and I'm starting to feel like the ones I do hurt more than help.

The last thing I spent real time on during streams was the metaprogramming tool/header generator, which I left in a mostly functioning state, but I expect it'll need more work to be truly useful. The introspection/reflection capabilities are rather weak right now, and I never got around to writing the serialization code generator.
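For comparison, plain C can get a limited form of introspection without a separate tool by using X-macros: list a struct's fields once, then expand the list into the struct itself, a field table (name, type tag, offsetof) and a generic serializer. This is a swapped-in technique, not how the header tool works; Entity, ENTITY_FIELDS and serialize_entity are hypothetical.

```c
#include <stddef.h>   /* offsetof */
#include <stdio.h>    /* snprintf */

/* The single list of fields: type, name. */
#define ENTITY_FIELDS(X) \
    X(float, x)          \
    X(float, y)          \
    X(int,   health)

/* Expansion 1: the struct itself. */
typedef struct {
#define DECLARE_FIELD(type, name) type name;
    ENTITY_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
} Entity;

/* Expansion 2: a field table; FIELD_##type turns the C type into a tag. */
typedef enum { FIELD_float, FIELD_int } FieldType;
typedef struct { const char *name; FieldType type; size_t offset; } FieldInfo;

static const FieldInfo entity_fields[] = {
#define DESCRIBE_FIELD(type, name) { #name, FIELD_##type, offsetof(Entity, name) },
    ENTITY_FIELDS(DESCRIBE_FIELD)
#undef DESCRIBE_FIELD
};

/* Generic serializer driven by the table: writes "name value\n" per field.
   Returns the total length it wanted to write, like snprintf (negative on error). */
static int serialize_entity(const Entity *e, char *buf, size_t cap) {
    int total = 0;
    for (size_t i = 0; i < sizeof entity_fields / sizeof entity_fields[0]; i++) {
        const FieldInfo *f = &entity_fields[i];
        const char *field = (const char *)e + f->offset;
        size_t used = (size_t)total;
        size_t room = used < cap ? cap - used : 0;
        char *dst = room ? buf + used : NULL;   /* snprintf(NULL, 0, ...) only measures */
        int n;
        if (f->type == FIELD_float)
            n = snprintf(dst, room, "%s %g\n", f->name, (double)*(const float *)field);
        else
            n = snprintf(dst, room, "%s %d\n", f->name, *(const int *)field);
        if (n < 0) return n;
        total += n;
    }
    return total;
}
```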

I suspect regular streams will return in the new year. I've got a lot of little projects I'd like to wrap up and a bunch of things I need to learn more about (as we all do, I'm sure), so hopefully I'll have a productive December. See you all next year!

More immediately: I'll be streaming my work on the Ludum Dare this Saturday and Sunday. Times will be: from whenever I wake up to whenever I have to sleep (or whenever my compulsion to play Stardew Valley consumes me).