Return of the Beast

Stewart Orchard

If you were at the 2018 Cambridge Dragon Meetup, then you would have experienced a number of cool things, including seeing some ultra-rare Dragon prototypes, getting your copy of 'Inside the Dragon' signed by one of the original authors, and having a random conversation concerning pickled fish with a Spanish guy who lives in Sweden.

You would also have been lucky enough to score an olde style cassette tape edition of Ciaran Anscomb's Dunjunz, and got the author to scribble on it, though I was too chicken to ask after an earlier name/face mix up, and had to get someone else to ask for me... But I have to say, everything about this game including the tape and inlay has been produced to a beautifully high standard. I believe some copies were still available at the time of writing so get 'em while you can. (Some info here)

Another great benefit of attending a show is that it renews your enthusiasm for your own projects. I'd lost focus on my game and started messing around with a couple of side projects to the extent that the trail had gone cold and I didn't really know what to do next.

So while I was intently listening to Steve Bamford's explanation of how the sprite engine and behavioural scripting works in his awesome Kiruke no Shima project, I started to remember I still had a lot of work to do to get my own sprite engine fit for purpose.

The Smell of the Beast

The first thing to do then was to get reacquainted with my own code. Now, as soon as I fired up the editor I was immediately hit by the unpleasant smell of "get it working now, tidy up later". Except without the "tidy up later" part. And I wasn't entirely sure about the "working" either. Yes, there was a sprite engine, but it didn't really have any organising principles, making it hard to work with, which in turn was stalling development.

As things stood, a sprite was a fairly passive object that relied on external code to make interesting things happen, such as moving in something other than a straight line. What I really wanted was for the sprite to be in charge of its own behaviour from initialisation to destruction. I wanted the game engine to trigger the creation of sprites and the sprites themselves should look after the details.

I made a fairly simple change to start with. Instead of passing the sprite directly to the drawing routine, each sprite now contains a vector pointing to code that is responsible for updating the sprite. This vector could simply point to the drawing routine, or it could point to a routine that implements more complex behaviour. When it's done, the update routine can choose to jump to the drawing routine, or skip the drawing entirely and move on to the next sprite in the list.

So what use is a sprite that doesn't always get drawn? Very useful as it turns out: If each chunk of update code changes the update vector to point to a new chunk of code, we can chain together a sequence of actions and suddenly we have ourselves some cooperative multi-tasking, with each sprite effectively running in its own thread.

Uh oh, story time

This is certainly not a new idea and is pretty much a form of round-robin scheduling, which has been around forever. When I started in my current employment, they were still using 6800-based control systems to run large pieces of automated test equipment. The software would have had something like ten separate threads of execution to run the various subsystems in parallel with a main loop that boiled down to a list of instructions of the form JSR [subsystem_n_vector], for subsystems 1 to n.

The thread for one of these subsystems would be a set of short routines. When one routine had completed, it would store the address of the next routine in the vector and then RTS to yield control back to the main loop and allow the next subsystem to do some work. The next time the subsystem got called, it would jump to the new routine thanks to the updated vector. If the subsystem needed to wait for a condition, such as a sensor turning on, or a measurement becoming available, then it would just leave the vector unchanged until the condition was met. The same routine would get called repeatedly until it decided it was time to change the vector. The key point is that every subroutine yields control without hogging too many CPU cycles to allow other threads to get some attention

One of the strangest jobs I recall was for a client who wanted us to write the software for their machine in BASIC. And not just any BASIC, but VBA, inside an Excel workbook, for the sole reason that they had some in-house experience of VBA. Mad as a box of frogs. So I had to write a multi-threaded application in BASIC, and I did it in a vaguely similar way to the old 6800-based control programs. Each thread had a SELECT...CASE statement that chose a subroutine to call based on the value of a state variable, and the subroutines would update the state variables to control the sequence.

It worked surprisingly well. There was the illusion of several things happening at the same time while still having a responsive user interface, though we were somewhat amused when the same client came back later and asked us to help them demonstrate that the system followed GAMP guidelines. (Good Automated Manufacturing Practice). Well, using a properly documented language that conforms to international standards would have been a good start...

Back to the plot

But I digress. I mentioned that not drawing sprites was a useful thing. When not drawing a sprite, we can use that time to perform initialisation tasks, spreading the work over several video frames if necessary. Only when everything is ready do we start drawing the sprite. This means we have greater control over the amount of work that occurs during each video frame, reducing the chance of doing too much and missing the video sync. (aka slowdown and juddering)

This in turn unlocked something I'd been wanting to do for a while but wasn't sure how best to go about it: Pre-shifting the sprite graphics on demand.

Just to recap, the sprites are 12 x 12 pixels. As we're in a four colour mode, this means 3 bytes x 12 rows = 36 bytes per sprite, plus another 36 bytes for the mask. 72 bytes total.

That's fine if we want to draw the sprite at whole byte offsets, but I need pixel precision, meaning that the sprite image needs to be shifted to suit. As this is quite a slow operation, I preshift the data, so that we have the same sprite image repeated in memory four times, each with a different degree of shift.

That takes up a lot of memory. The shifted images are now four bytes wide, so we have 4 bytes x 12 rows x 4 shifts = 192 bytes per sprite, plus the mask making 384 bytes total. This quickly adds up. For example, the end of level boss is built from four different sprites, and therefore requires 1536 bytes.

Late shift

All of this preshifting happens before the game loop starts, which is wasteful. Some sprite types don't appear on the screen together, and therefore don't need the preshifted graphics together in memory at the same time. It would be a more efficient use of memory if sprites could do their own preshifting just in time.

Now that I have this threading model, the task is much easier. I've broken down the preshifting task into chunks of work that each take no more time than it takes to draw a sprite. When certain types of sprite are triggered, they first preshift their graphics data into a common area, then perform any other necessary initialisation, then start running the normal update code, finally appearing on screen. This occurs over several video frames but is still a fast process in human terms.

As I write this, it suddenly occurs to me that the source data for the masks only requires one bit per pixel and could be represented by 18 bytes instead of the current 36 bytes. The preshift routines would just need an extra step to perform the decompression. I could maybe even run length encode the source data. Now there's a thought...

Pythonic png to packed pixel processing

Since moving from Windows to Linux, I had put development of graphics on the back burner, as I had become too reliant on Windows software, and the front burner already had some peas and noodles on the go.

In particular I was using Tile Studio to draw the sprites. This has a neat output scripting feature that generates source files that can be directly included in the game build, but working in Linux I was a bit stuck. I could draw stuff in Gimp, but lacked a way of getting the data into the game.

I ended up creating a conversion script in Python. This takes a source image in the form of a png file and converts it to the packed pixel format required by the game. It also generates masks from any colours that have out of range indices. In the images below, cyan and black would be interpreted as mask rather than image:

The conversion program can slice the image up into tiles, so we can create a whole sheet of sprites in Gimp and then have the conversion program extract the individual sprites. This works nicely in conjunction with a Gimp plugin that converts layers into sprite sheets.

I've still got a lot of learning to do in Python, but I was very impressed with the language features that make this kind of task easier than you might at first imagine. For example, I was able to perform the tile slicing without any calculation or array indexing. Instead, the image data is fed through a series of iterators that return progressively smaller pieces of image, with a simple matrix transposition (matrix_t = zip(*matrix)) in the right place to put the data in the right sequence for output. It might be useful for other projects so I'll publish it once I've made it a bit more user-friendly.

There are definitely fewer excuses available for not making progress in the game...