Sunday, 3 June 2018

Buzz Kill

To make things more interesting, I was going to attempt something special for this post, and write using the third person once removed. However, when I spoke to the blog author about it, he said that Stew doesn't like the idea; that at best it would only be "vaguely amusing" (his words apparently), and then wear thin with "tedious rapidity". Oh well, I'll just resort to those twin pillars of factual reporting: Exaggeration and Lies.

Bee in a jam jar

Here's an annoying thing: Programming for joystick control in a Dragon/CoCo game. It's annoying because some of the audio hardware is shared with the joystick hardware, requiring care to get the two working together.

It's possible to demonstrate the problem in Dragon BASIC by enabling audio before reading joysticks:

   20 PRINT JOYSTK(0);
   30 GOTO20

That makes a loud buzzing sound. The joystick read routine needs to mess with the DAC to figure out the joystick positions, and modifies the sound source select bits in order to select the joystick axis. If audio is enabled, you can hear all this happening. (I should mention that the above code snippet only has an audible effect on Dragons and possibly older CoCos. Newer CoCos appear to be smart enough to disable audio on joystick read.)

So the main thing is to make sure audio is disabled while reading the joystick. It is also important to restore the DAC to its original value and to reselect the DAC before enabling audio again. This is the list of steps:

  • Save the current DAC value
  • Disable audio
  • Read joystick
  • Ensure DAC is selected
  • Restore DAC value
  • Enable audio

Some effort can be saved by reading the right-hand joystick x-axis last, as selecting this axis also selects the DAC. (Assuming you want the DAC selected of course. If you were instead using cartridge audio, then it would be more efficient to read the left-hand x-axis last)

It is helpful to make the joystick read routine as fast as possible, to minimise the time that the audio is off. My routine is relatively fast as I'm not interested in converting a precise joystick position, just left-centre-right on the x-axis and up-centre-down on the y-axis.

Without going into too much detail, this is done with four writes to the DAC at $FF20 and four reads from the comparator bit in $FF00. One pair of writes for the y-axis and the other pair for the x-axis. After each DAC write the comparator bit is shifted into the B register, resulting in a number representing the joystick position that can be further processed when the audio has been turned back on. I use a lookup table to convert the value into an angle, but it could just as easily be a jump table.

I've had that in place for quite some time and had been happy with it until fairly recently when I recommended the same method to someone else and it didn't work so well: They were getting a buzzing sound.

Umptioned again

In my mind I imagined a capacitor somewhere in the audio circuit that would hold the voltage while the audio was disabled. The assumption was that if the audio was off for a short enough time, then the voltage wouldn't change enough to make a noticeable click when the audio was enabled again. However, after a fair bit of digging, taking measurements and running circuit simulations it turns out that is only part of the story.

The first thing I discovered was that connecting different things into the audio output each gave very different results with my joystick code:

  1. Audio via the RF modulator gave the cleanest sound
  2. A composite monitor gave a buzz that decayed to silence in about a second on the tail end of each sound effect. Strangely, I hadn't noticed it before until I started listening for it and now it seems obvious.
  3. A set of powered speakers gave a continuous buzzing sound

It's starting to look like there's a complicated interplay of factors, but it's surprisingly simple to understand once you learn what is happening. I hinted that capacitors are only part of the story. What was missing is the fact that a capacitor is connected to a circuit via some resistance and this introduces a property called a 'time constant'. This defines how long the circuit takes to reach a steady state after something changes, for example after a new value is written to the DAC.

Use dumb luck. It's super effective.

In the case of the RF modulator, there is a combination of capacitance and resistance giving a time constant such that it takes about 5 milliseconds for the circuit to settle down after the DAC is updated. If the audio is disabled during this time, then you get a click because you're interrupting the current flowing in the circuit and that causes a jump in voltage. The shorter the gap between DAC update and audio off, the louder the click.

It just so happens that my joystick routine was so far away from the last DAC update that I couldn't detect any disturbance. Dumb luck in other words. When I get round to improving the sound engine, the DAC update will be getting closer and closer to the joystick read, so I may start hearing interference.

In the case of the composite monitor, this has a much larger audio input capacitance. This gives a time constant that means it takes a whole second for the circuit to stabilise. After each sound effect, the circuit is out of balance for approximately 25 game loops, each with a joystick read producing a click slightly quieter than the previous one.

The powered speakers are a bit different. There is no visible input capacitance on these. Instead there is a relatively low resistance to ground through the volume control. What this means is that the voltage on the audio output drops to a low level within microseconds of disabling audio. This results in a click on every joystick read giving a continuous buzzing.

Taking all scenarios into consideration, it would seem that the best defence against joystick buzzing is to leave audio disabled after reading the joystick, and to only enable sound at the start of a sound effect. But even then, clicking may still be audible, particularly for quiet or fading sound effects. If possible leave a few milliseconds after a DAC update before reading joysticks and that should at least keep things quiet as far as audio via the RF modulator is concerned.

Of course, this all results from investigation on a Dragon. Things may be a little different for the CoCo as the various models have custom ICs which may not be the same as the heap of discrete parts in the Dragon. Your kilometrage may vary.

And now for something completely similar

Previously I discussed my initial attempts at producing music for the SN76489, as used in the CoCo Game Master Cartridge. In particular I speculated about modulating the attenuation registers as a way of superimposing lower frequency tones that are below the sound chip's normal range.

To test that idea I would need some real hardware. I ended up making a small sound card that could be piggy-backed onto another cartridge and plugged into my Dragon. In the process I found out just how bad my eyesight had become and wished that I had the luxury of magnifiers and bright lights that I'm used to using at work. (And that I hadn't stupidly left my reading glasses at work)

Fortunately, I still ended up with a working card without too much trouble. The SN76489 audio output appears to be highly prone to unwanted oscillation, making it hard to see the audio waveform on an oscilloscope, so I added a high frequency bypass to ground on the audio output pin. The bypass consists of 100nF + 10R in series, values suggested on the datasheet for a similar part, the SN76496.

I don't always make my own cartridges, but when I do I attach them to other cartridges with tiny bits of wire

Some quick tests proved that the modulation idea worked in principle, so I extended the music player to modulate two of the SN76489 tone channels with software generated waveforms. These have a programmable pitch offset from the hardware tone and also have variable duty cycle. This results in the following sound:

It's not quite the same kind of sound as the variable duty pulse waves that I was playing with before, largely because the phase relationship between the soft and hard waveforms jumps every video frame while the next sound fragment is set up, but it does add a great edge to the sound. Definitely a keeper :)

Friday, 6 April 2018


Ooh shiny

Well if there was any kind of schedule, I'm definitely behind it. Interesting things keep appearing and distracting me. Like that thing over there. Back in a minute...

The latest shiny is that development snapshots of xroar now have Game Master Cartridge support. This is exciting stuff: I've been thinking about the GMC for some time as a vehicle for a 'deluxe' version, and this was just the excuse I needed to have a play. The banked ROM would allow for a greater variety of sprites and a larger background map, plus the SN76489 sound chip could give more options for in-game sound. More on that in a bit.

Lost in music

I felt I was on a roll with the music, and it seems to be a hot topic at the moment, so I stuck with it and have now recorded my changes to CyD as a fork on GitHub.

I'd previously written about how I tweaked CyD to play variable duty cycle square waves and added simple support for drum samples. Since then I've improved the duty cycle control a little and added a couple more example tunes.

I originally had a bit of code like this to control the pulse duty over time:

        lda duty
rate    equ *+1
        adda #0
cond    bmi skip
        sta duty

The bmi instruction prevents the duty cycle being negative, so we could start with a low value and a positive rate to give a waveform that changes from narrow pulses to square, or go the other way by starting with a large positive value and have a negative rate.

As an added bonus, the bmi instruction can be modified into a brn to make the duty keep wrapping round and cycle endlessly. That sounds really good, but I wondered how much better it would be if I could have the duty ping-pong between two arbitrary limits.

The difficulty with that is there are then two conditions to check for, and which one we check depends on which direction the rate is heading. For quite a while I didn't bother because it sounded like too much complexity for the gain. I didn't want to add too much code because the time taken reduces the quality of the sound.

Then I thought of a way. Not a pretty way, but a way nonetheless. The following only adds one immediate instruction to the critical path. The other instructions only come into play briefly when one of the limits is reached:

        lda   duty
rate    equ   *+1
        adda  #0
cond1   equ   *+1
        cmpa  #$e0
        bls   store
        neg   rate
cond2   equ   *+1
        ldd   #$2024  ; bhs=$24 bls=$23
        ldx   cond1
        stx   cond2
        std   cond1
        bra   skip
store   sta   duty

This exploits the fact that the limit value and the branch opcode are sitting right next to each other in memory and can both be written at the same time with a 16-bit store. Each time the limit is hit, the limit value and branch condition are updated and the sign of the rate is reversed. Ugly, but worth it: This gives much more variety to the sound.

Bend it like Beckham

To help with manually entered music I've added a macro to set up portamento effects. This is where one note transitions into another by smoothly sliding the frequency between two values and is a familiar element in many chiptunes. The way CyD does it is by adding a constant value to the frequency value each 'tick', or 1/50th of a second. So to slide from one note to another we need to calculate how much to add per tick by dividing the desired frequency change by the number of ticks over which it should happen. This could be done by the assembler if the frequency values are defined in the source code.

Among the CyD source files there was a nice Perl script that calculated the note frequency lookup table. I say 'was' because I've butchered this script most 'orridly (Perl just doesn't understand me) to enable the assembly source to calculate its own timings based on how the channels have been configured.

This script now also outputs the note frequency lookup values as symbols which in turn allowed me to add the helper macro to set up a portamento based on the start and end notes and the time interval. This makes it really easy to add guitar-like bends.

After much experimentation I've discovered that a snare drum sample can be lengthened and enhanced by switching to white noise near the end of the sample. The white noise then just continues until cut off by another sound. This sounds a bit odd on its own, but when mixed with other channels, gives the impression of a longer sample with reverb. I think it sounds pretty effective anyway:

All your bass are belong to us

As I've invested a lot of time in CyD and am familiar with the way it works, I hacked together a GMC version to get a feel for how things might sound. What quickly became apparent is that the lowest notes are not particularly low. It goes down to about a B2, which is nowhere near the lowest note on a guitar, let alone a bass.

One way to get lower notes is to use a lower clock frequency than the standard 4MHz. In fact Time Pilot '84, the arcade game that my game is at least trying to be loosely based on, runs three similar chips at half the NTSC subcarrier frequency, or approximately 1.79MHz. That would get you a mains-hum-like A1, the second lowest string on a standard bass, and a bit closer to what we like.

Another way is to make use of the 'periodic noise' capability. This is a function of the noise channel, but isn't really noise, just a low duty square wave with a frequency that is (I think) 1/15 of the usual frequency. It has a thin, buzzy kind of sound.

Yet another route to lower notes is to modulate the attenuation registers to superimpose lower frequencies onto the normal output. This requires CPU time so wouldn't be very suitable for in-game music, but might sound interesting enough to be good for title music, especially if the duty cycle varies over time. I'm very tempted to give it a go.

Play it again, GMC

I reworked my example tune above to play more nicely on the SN76489. I had to transpose it up 4 semitones and throw away the really low notes, which makes it sound completely different. I also ended up mashing in some other ideas just to hear what it would sound like and this is the unholy result:

The echo effect at the start is easily achieved by playing the same thing on two channels with a moderate delay between the two. In this case the delay is something like 300ms. When the notes bend it really comes alive. I've also snuck in some cheeky vibrato as well to spice things up a bit. This is just a rapid series of bends above and below the start note to make the frequency wobble around in a pleasing way.

The drum sounds are simply bursts of noise with the volume and frequency changing over time. The kick drum has a really rapid drop in frequency to make it sound like a thud while the snare has a short period of rapid frequency drop followed by a longer period with more gentle change. It works quite well but it leaves me wondering how good a result could be achieved with a more sophisticated shape to the frequency envelope.

I've put my GMC version of CyD on github, just in case anyone wants to have a play...

Saturday, 27 January 2018


I decided to take a long break from coding the game mechanics and instead have a closer look at generating some music, or "mooozik", as they say round these parts.

Because my living involves a certain amount of programming I don't need much in the way of persuasion to run as far away from coding as I can. On the other hand, playing around with music is much more appealing to me and I've been having a lot of fun over the last few weeks getting the Dragon to produce some audio that is - dare I say it? - musical.

The fruit of someone else's labour

I would like to have as much music in the game as possible: Theme music on the title screen, story music, get ready music, game over music, high score music etc. Whatever I can fit in to fill the silence. Maybe even in-game music if a sound chip is present. I'm going to aim high and see how far I get.

The biggest problem seemed to be the lack of a suitable music engine. When I started work on the game I wasn't aware of anything off the shelf that I could use and it looked like I was going to have to write something from scratch.

I learned quite a bit about music playback while working on a musical tape loader with Simon Jonassen and started to get a feel for the kind of things that might be possible. Then, as luck would have it, Ciaran Anscomb brought his marvellous CyD project to my attention.

To fill in a bit of background this is a software synth based on the CoCoSID project by RĂ©mi Veilleux, which as the title suggests, is designed to play C64 SID tunes on the CoCo. It works by mixing three channels of synthesised waveforms and updating the synthesis parameters 50 times per second. The waveforms are synthesised from a selection of reference waveforms stored in memory. The playback parameters are determined by an offline conversion of a SID register dump obtained from a C64 music player into a pattern-based format that CoCoSID can read in real time. The results are really impressive.

What Ciaran has done with CyD is to produce a rewritten, optimised version with a new music processor allowing you to enter music data in a more memory-efficient programmatic form. It has a variety of features such as envelopes, portamento, subroutine calls, transposition and arpeggios - i.e. he's made it more like a music driver that back-in-the-day purveyors of fine tunery such as Martin Galway or Rob Hubbard might have used. "That looks handy," I remember thinking to myself, "I'm having that."

Some of you may have heard the title music from the 32K demo version of the game. This is being played using an unmodified version of CyD. It has a very distinctive and dynamic sound, far removed from the sounds that most of us Dragon owners are used to hearing. However, after the novelty wore off, I wondered if I could bend things a bit to add even more complexity to the sound.

Pushing the envelope

Generating a tone in software will very often be based around a phase accumulator implemented with something like the following piece of code:

phase equ  *+1
      ldd  #0
freq  equ  *+1
      addd #0
      std  <phase

That generates a sawtooth waveform in the A register. Self-modifying code is used for performance, and direct page addressing helps too. Synthesising an arbitrary waveform is just an indexed instruction away:

      lda a,y  ; y points to a 256 byte waveform

The same lookup could be done by self-modifying the lower address byte of an extended address instruction with the upper byte pointing to the current waveform. In fact CyD does a bit of both to mix three channels together in an optimal way.

Envelopes are simulated by using separate versions of the same waveform differing only in volume level. Having just two or three volume levels works surprisingly well.

So what can we do that's different? Funnily enough, Ciaran came up with something new for his Dunjunz game, this time generating square waves directly. Instead of looking up a reference waveform stored in memory it converts the sawtooth in the A register into a square wave using something like this:

    lsla       ; put msb of A into B
    rorb       ;
    sex        ; extend sign of B into A
    anda #vol  ; set volume

If A is greater than or equal to 128, it becomes set to #vol, else it gets set to zero. Nice. It's the sort of branchless conditional construct that you might see a compiler generate to avoid stalling a pipelined cpu and here it is making itself useful on the 6809.

That got me thinking about how cool it would be to be able to vary the duty cycle of the square wave i.e. generate rectangular pulses of varying width. That would make for some very interesting noises. It turns out that duty can be added for free by replacing the lsla instruction with an adda:

    adda #duty ; use duty to generate carry
    rorb       ; ...and set sign of B
    sex        ; extend sign of B into A
    anda #vol  ; set volume

Varying the duty value from 0 to 255 will generate rectangular waveforms with corresponding duty ratio of 0 to nearly 100%. A duty value of 128 will generate a square wave as before.

To try it out I added a command to CyD to set the duty. To demonstrate what it sounds like, first here's a simple bass riff played using a square wave:

And now the same notes but with different duty cycles:

Things get more interesting when the duty is varied while a note is playing so I added a tiny bit of code to do just that:

Let's go mad and play two voices an octave apart with different duty settings:

Now you're talking. If you enjoy upsetting C64 fans, try calling this sound 'SID sound'. That riles them up something chronic.

Kicks like a concrete donkey

CyD gives us three channels to play with, which immediately makes me think of great power trios such as Clapton/Bruce/Baker, Lifeson/Lee/Peart and, err, Rod/Jane/Freddy. I'm thinking one channel for lead, another for bass and another for drums.

Drums can be synthesised from combinations of noise, square, triangle waveforms etc., and messing with the frequency. In fact I did this for the title music in the 32K demo, but the kick drum, while adding cool accents to the bass, did sound a bit feeble to me.

A kick drum is pretty much a noisy click followed immediately by a thud. The frequency is fairly low and drops with time with the volume swelling in the middle. The whole thing might last 50 - 300ms. Bearing this in mind I drew a 256 sample waveform freehand in Milky Tracker:

Crude, but effective

I boosted the volume somewhat, resulting in a bit of clipping, which is helpful here and makes the kick louder. This was then exported to an 8 bit wav file and converted using sox into a 256 byte raw binary file that could be included into the assembly source. To save you the pain of trying to decipher the sox manual, this is the command I used:

    sox -D in.wav out.raw vol 0.33 dcshift -0.66

The -D option disables dithering. When dithering is enabled, it can make the waveform one bit too loud, potentially resulting in overflows and nasty clicking during playback.

There are still some hoops to jump through to get CyD to play the sample properly. First we need to reset the channel phase counter to the beginning of the waveform when we trigger the sample, otherwise it will play from some random point. This required a new sample playback command. We next need to make an envelope that is about as long as the sample, in this case 4/50ths of a second, and ending with a silent waveform to stop playback. Then the frequency needs to be fine tuned so that the sample ends when the envelope ends. Otherwise the sample will either be cut short, or it will start again from the beginning. I manually adjusted the first note in the frequency table to suit the sample, though I think a frequency set command will be required to allow a variety of samples to be played.

This sounds like the sort of fiddly exercise that should be automated in software, so that's yet another thing to add to the ever-growing todo list. Anyway, this is what a 256 byte kick drum sample sounds like on a Dragon:

Music puns are not my forte

To actually compose music I play around with ideas in Milky Tracker and then hand convert to CyD. (Yeah, yeah, I know I should write a conversion script.) I've also been playing with MikroWave on my phone. It's a bit of a toy but you can throw together new loops and rhythms really quickly.

Time for a quick demo tune. Now it's fair to warn you that I quite like droning repetitive music. This allows me to listen to Moby without having to gnaw off my own arm to escape, though the urge is never far away. Like all the other recordings on this page this was captured from xroar. Roll credits...

Wednesday, 22 November 2017

Like a Boss

Now there's a title: A bold running start, a title fit for an epic post. A title full of optimism and hope. A title spoiled only by the quality of what follows. You see, I've spent much of the last few weeks engaged in one form of development related navel gazing or another. Please allow me to explain...

Filthy OSes

I do all my hobby stuff on an ageing Tecra laptop. It's now over 10 years old, but I've no reason to change it as it works fine and has respectable performance. The only problem is Windows XP, which is way past its end of support. It ran off the top of that particular cliff three years ago, and very much like Wile E. Coyote, is kind of just hanging there waiting for gravity to notice. Updates for various bits of software are drying up, it's increasingly likely I have to search for old versions of drivers, and the nagware anti-virus software is a big resource hog. We hates it.

I briefly looked into upgrading to Windows 7, but Mister "None Shall Pass" Upgrade Advisor Wizard flagged up a bunch of issues with the hardware. Scratch that idea: Time to look at a free OS.

I've played with a Linux distro before. I must have been feeling particularly wealthy circa year 2000, because I walked into a PC shop and bought a boxed SuSE Linux 6.2 or 6.3 to see what all the fuss was about. I've still got the box in the roof somewhere. Anyway, I installed it on a 486 PC and it worked well enough, but I didn't really do anything with it due to lack of software that could work with the stuff I'd been doing on Windows. I lost interest and stopped using it. That was a mistake with a capital M. I've basically missed out on over 15 years of useful education.

Like I GNU you would

I guess I've got a bit of catching up to do then. Now the thing that rapidly becomes apparent is the amount of choice available for a free OS: There's a bewildering selection of distributions, kernels, and desktop managers to choose from. If your worst nightmare is based around having to make a decision then this might not be your night. To help me decide I came up with the following sketchy criteria: 'Popular', 'Lightweight' and 'Googled evidence that it works on an old Tecra'. To cut a long story short I eventually settled on Debian/Linux with an Xfce desktop.

It almost worked first time: I had to download a legacy driver to get the wifi working and the trackpad settings needed a tweak but I was very impressed at how smoothly it all went. So now game development is proceeding on *nix. If I can stop playing 2048, that is...

So, Anyway

In spite of all that I did manage get some work done on the game.

In Time Pilot '84, a boss appears after a certain number of enemy craft have been destroyed. The boss chases after the player while continuously firing homing missiles. To defeat the boss and proceed to the next level, the player must wait to acquire a missile lock and then successfully fire a missile while avoiding collision with the boss or its missiles.

The boss is quite large compared to the player, and rather than write a special large sprite routine, I chose to construct it out of four standard sprites effectively flying in tight formation.

I see five sprites

To get the boss to chase the player we need to figure out how to determine the direction of the player relative to the boss. Those who know their vectors will know that subtracting the boss coords from the player coords will give the precise direction of the player, though it would be hard to make use of it as we would need to convert that direction into a unit vector which means square roots and stuff.

More useful would be a function that returned an angle. That would allow us to steer the boss towards the player. One way to get an angle from vector components would be to use the arctangent function, but that's no better than having to calculate square roots, which leaves us wondering if there is some approximation we can use.

Mmm pie

It turns out there is something relatively simple we can do. If you imagine a cherry pie centred on the boss, and that pie is divided into four delicious slices, aka quadrants, then it's easy to figure out which quadrant the player is in just by looking at whether the difference in coord components is positive or negative. Even better, it's possible to refine it more and divide the area into eight octants by comparing the sizes of the x and y components. (And a thinner pie slice means you can have a bigger blob of double cream. Ooh you would)

The following piece of code returns an octant number in the range 0-7 based on the difference in coords. It works by progressively rotating the coords until they lie in the first quadrant then finally comparing the x and y components to figure out which octant:

        clrb         ;
        lda dy       ;
        bpl ypos     ; dy >= 0
        neg dx       ; rotate 180 cw
        neg dy       ;
        addb #4      ;
ypos    lda dx       ;
        bpl xpos     ; dx >= 0
        nega         ; rotate 90 cw
        ldx dy       ; don't care about low byte
        stx dx       ;  just need to swap dx & dy
        sta dy       ;
        lda dx       ;
        addb #2      ;
xpos    cmpa dy      ;
        bhs done     ; dx >= dy
        addb #1      ;

done                 ;

The coords have previously been scaled and subtracted and stored as 8 bit values in dx & dy. In case you're wondering what the strange business with ldx/stx is about, it's because at that point I need to keep the contents of both the A and B registers and rather than save and restore one of them, I used the X register and saved a few bytes and cycles instead. stx dx overwrites dy, but it doesn't matter because we then write to dy in the next instruction.

The number returned in B represents the direction in which we want the boss to move, and that can be used to look up a velocity from a table. Now, rather than suddenly change to the new direction like Automan's car, I increment or decrement the current direction once per update so that the boss steers more gently towards the player. But even that is too aggressive: The boss homes in on the player far too efficiently and there is no way to avoid the collision. (It does work great for homing missiles however.)

That left me wondering how the original arcade game worked as I remember it being pretty easy to avoid a collision. I tried slowing the boss down as it approached the player and slowing down the update rate but nothing seemed to give a satisfactory result.

In the end, I fed a fraction of the player steering input into the homing routine. What now happens is the boss approaches the player while the player is flying in a straight line, but if the player turns, the boss steers behind the player. Rather surprisingly this ad hoc solution feels more true to the original.

Out with a bang

For a while I wasn't sure what should happen when the player defeats the boss. I didn't think I would be able to pull off a convincing explosion, so I decided to cheat instead. When the player successfully hits the boss, a mode change occurs and the boss starts running away from the player while pouring out flames, which are just re-used enemy explosion graphics. Here's how it looks so far:

The battle is a bit one-sided at the moment. The boss still needs to fire missiles at the player, and the player should have to sweat it out for a while before getting a missile lock, but it's a good start.

I'm in two minds what to do next: I could finish the boss battle or I could try out some musical ideas while they're still fresh in my mind. Also December is almost upon us. So much to do, so little time...

Thursday, 5 October 2017

Sounds Like a Plan

Sounds like a pain?

I was nervous about the sound effects, having found it in the past to be hard work to get good results. The Dragon and CoCo machines have a 6 bit DAC that is capable of creating some fantastic sounds, but in a rather cruel twist, things are not easy if you want sound at the same time as animation.

The problem is that sound generation requires a fairly regular rate of writes to the DAC, whereas updating graphics often involves a variety of routines of different lengths. The two things don't mesh together very well, so what usually happens is that the sound ends up being produced in short bursts in between screen updates, creating a distinctive thin and choppy kind of sound that can be heard in many Dragon and CoCo games.

Trip down memory lane

In my first game, Balldozer, there wasn't much work to do to keep the graphics updated: Just a ball, a bat and a power-up, along with the occasional brick that needed erasing. That allowed me to have relatively long bursts of sound in between the updates. Low frequencies with decaying volume were generated when the ball bounced off the bat or the sides of the screen, giving a 'boing' sound, and higher frequencies were generated on hitting the bricks, giving a 'ching' sound. This worked out pretty well:

For my second game, ROTABB, I didn't put any sound in until the last minute and it really shows. Because the sound in Balldozer worked out OK, it didn't occur to me that I would have a problem. What I discovered too late was there were no spare cycles left over, and the best sound I could achieve during normal play were some feeble explosions for destroyed enemies. The only way I could find to generate a reasonable sound effect was by having longer bursts of sound and letting the frame rate drop. So to avoid messing up the gameplay, I only did this for destroying the level boss and for the player death, where the lower frame rate gives a nice slow-motion effect. The lesson here is to allow for sound early on in the design:

For the abandoned third game, I had started experimenting with producing sound and graphics at the same time i.e. embedding the DAC update code within the graphics update code. This game spent a lot of time drawing background tiles, so I inserted a DAC write for every second tile drawn. The values to write to the DAC were read from tables, with the volume of the sound being set by a MUL instruction. This gave a much more full and satisfying sound:

Back to the Future

For the new game I would like the sounds to have something of an arcade feel. That generally means lots of noise going on, with a variety of sounds that are rich in low to mid-range frequencies. I felt that the sound in the third game was a step in the right direction, so I decided to build on those ideas:

  • Embed more DAC updates in the game code
  • Make the DAC updates more efficient by not using MUL for volume
  • Instead of a set of fixed wave tables, have a sound buffer that is updated by the sound engine at the start of each game loop
  • Sound engine that generates sounds from parameters
  • Choose a sound effect for playback by pointing to a parameter entry in a table

This list is saying I need a sound engine that is much more sophisticated than anything I've done before. Fortunately the task can be broken down into more manageable pieces. First of all I need a small piece of code that takes one sample from the buffer, writes it to the DAC and updates the buffer pointer using a minimum of registers so that it can be embedded in existing code without causing too much upheaval:

    lda [sndbufptr] ; Get next sample
    sta $ff20       ; Write it to the DAC
    inc sndbufptr+1 ; Update buffer pointer

That takes 20 cycles. It can be reduced slightly by using self-modifying code, but that's only worth doing in a loop with a fair number of iterations because of the overhead of copying the buffer pointer at the start and end. Note that I'm only incrementing the bottom byte of the buffer pointer. This is fine providing the buffer is completely contained within one 256 byte page.

So how many DAC updates do we need to embed in the game loop? Say we wanted an update rate of 1KHz. Assuming normal CPU speed and a frame rate of 25fps, that means an update approximately every 895 cycles, or 40 updates per game loop.

Those updates need to be sprinkled as evenly as possible throughout the code. This is not an easy thing to do, but it doesn't need to be perfect and it might be easier to change the update rate to suit the code. For example I've put a DAC update inside the loop that copies 128 byte blocks of background to the screen. That performs 21 updates spaced at 523 cycle intervals, forcing me to aim for a higher update rate of 1.7KHz

In the tile drawing routines, the loops are slightly too short, so the DAC update is preceded by a flag toggle to perform the update every second loop.

If a loop is really short, then it might be necessary to break the loop up into two or more chunks by having an inner loop doing the work and an outer loop containing the DAC updates. It's a similar story for the loop that waits for the video frame sync. I have an inner loop checking for the sync flag in the PIA, and the outer loop performs the DAC updates at the target rate.

Where's that noise coming from?

Next we need to generate some data to put into the sound buffer. To keep things simple I decided to have a single sound channel. This means only one sound effect plays at a time, and a priority system is required to prevent a less important sound from interrupting a more important sound.

There are two types of sound commonly generated by sound chips, namely square wave and noise, so we may as well start with those. The following code writes a square wave* to the buffer with adjustable frequency and volume:

      ldx #soundbuf
      ldb snd_phase
      lda snd_vol
ssqlp sta ,x+
      addb snd_freq
      bcc ssq1
      eora snd_vol
ssq1  cmpx #soundbuf_end
      blo ssqlp
      bra donesound

The frequency and volume are values in the range 0-255. The higher they are, the higher the frequency and volume respectively. snd_phase stores the value of the B register between successive calls to the routine, so that the square wave continues where it left off on the previous game loop. (Though the placement of DAC updates would have to be very accurate for this to make much difference, and if I did get that accurate, I would have to save the volume of the square wave between calls as well)

The noise routine is a little more complex. It reads values from a table of random numbers, sets the volume by ANDing with a mask and then outputs the values to the buffer. The values are repeated in a similar way to the square waves to give some control over the frequency content of the sound. I've used self-modifying code so that the routine is not too slow compared to the square wave routine:

      ldx #soundbuf
      ldu #rndtable
      ldd snd_vol   ; snd_vol & snd_freq
      sta sns2+1    ; volume
      stb sns3+1    ; frequency
      ldb snd_phase
      anda ,u+      ; first sample
snslp sta ,x+
sns3  addb #0       ; frequency
      bcc sns1
      lda ,u+       ; get next random value
sns2  anda #0       ; set volume
sns1  cmpx #soundbuf_end
      blo snslp
      bra donesound

The random number table is a 64 byte table of pseudo-random values, refreshed at a rate of one new byte per game loop such that the table contents are continuously changing.

Now we need a table to define the different sound effects. Each table entry defines the type of sound, how long it lasts and how it changes over time. That suggests these parameters:

  • Sound duration
  • Start volume
  • Change in volume per game loop
  • Start frequency
  • Change in frequency per game loop
  • Address of sound generator (square or noise)

The good news is those parameters fit into seven bytes, meaning we can pack a lot of different sound effects into a small space.

When a sound effect starts, the volume and frequency are copied to variables for the sound generator to use. Then for each game loop, the sound generator is called and then the volume and frequency have the 'change' parameters added to them. The duration is used to initialise a counter that is decremented once per game loop to set the length of the sound.

The sound priority is defined very simply by the address of the parameters. Sounds nearer to the start of the table have lower priorities than sounds nearer to the end of the table. When a sound effect request is made, the address of the new sound is compared to the address of the current sound. If the new address is lower, then the sound is not started, otherwise the new sound will replace the existing sound.

If the current sound has address zero, then it means no sound is playing, which happens when the sound duration counter has expired, for example. Having the lowest possible address also means the silent state has the lowest priority, ensuring the next sound request will be granted.

Put it all together and what do you get?

I've implemented the ideas discussed but haven't to date put a lot of effort into spreading out the DAC updates evenly. There are about 44 DAC writes per loop, some bunched up, some spread out and a big gap when the sprites are being drawn. Even so, the result is surprisingly good: (Please excuse the jumpy video. The game runs much more smoothly than this)

Back to the Future II

At some point I would like to add more parameters to make the sound effects more complex, such as periodic frequency variation (vibrato) and variable duty cycle for the square waves. I also need to do a better job of spreading out the DAC updates plus there is the issue of supporting both PAL and NTSC machines, as the different frame rates mean different sizes of sound buffer. But on the whole I am extremely pleased with the new sound engine. I just wish I had thought of it back in the day :)

As a final thought it occurred to me that the same sound engine could be used to drive a variety of different sound hardware (such as sound carts) just by replacing the lowest levels of the driver. Definitely worth considering for a future 'deluxe' version.

I'd better get on and write some more code before I run out of things to write about...

*Caution: Under no circumstances attempt to generate square waves in real life. The infinite accelerations and energy densities required would rapidly bring on the end of the universe and possibly void the warranty on your speakers.

Tuesday, 5 September 2017

Crash Course

Ooh it's a good one this time: They're doing collision detection.

Call me old fashioned...

...but I like stuff to blow up when I shoot at it. There are some enemy sprites flying around, some player bullets headed in their general direction and I'd like to know when a bullet hits a sprite so I can make it asplode.

One way is by comparing the coordinates. If you imagine subtracting the coords of one object from another, then the result will say something about how close together those objects are. Small numbers = close together, big numbers = far apart.

In fact we can tell if two rectangular regions are overlapping by subtracting the coordinates and comparing the resulting numbers against limits:

    ldd XORD,y
    subd XORD,u
    cmpd #high_limit_x
    bgt miss
    cmpd #low_limit_x
    blt miss
    ldd YORD,y
    subd YORD,u
    cmpd #high_limit_y
    bgt miss
    cmpd #low_limit_y
    blt miss
    ; Hit. You sunk my battleship.

That takes about 56 cycles. It can be reduced to 52 cycles by noting that each cmpd # can be replaced with a faster subd # and modifying the limits. (Also handy to know is that cmp #value can be replaced with add #-value, which allows you, for example, to compare against positive and negative versions of the same variable)

More cycles than Halfords

Now, here's the thing: We need to check each object against every other object with which we wish to detect a collision.

Taking the case of eight player bullets and eight enemy sprites, that makes 8 x 8 = 64 collision checks. At 52 cycles per check that amounts to well over 3ms. Ouch. That's 10% of the total time budget, or a very large chunk of the time remaining after the background and sprites have been drawn. And I still have to detect collisions with the player and implement sound.

I did some research on optimising collision detection, using space partitioning methods such as quadtrees, but soon began to realise that these techniques are better suited for more powerful systems dealing with larger numbers of objects. Not only that, the worst case scenario would still require the maximum number of checks, resulting in dropped frames and slow downs. Suddenly the performance of my game wasn't looking so good. Reality had invited itself round for dinner, like some kind of unwelcome freeloading walrus, and there was nothing I could do about it, except perhaps to hide in a cupboard until the problem went away.

But then I started to think about how collision detection was often done in games written in BASIC. For example in text mode games, before an object is drawn, the destination address can be checked to see what's already there. Or in graphics modes, PPOINT was often used to check the colour of a target pixel. Both amount to the same thing: Using the data in the screen buffer to detect collisions.

One of these pixels is not like the others

So the new strategy could work like this: The routine that draws the bullets records data for each bullet location. Then, after each sprite is drawn, the data is compared against the new screen contents. If something is different then we know that the sprite just drawn has been hit by the bullet just checked.

Because the bullets are only one pixel in size, we just need to record one pixel of information for each. The information is comprised of address, mask and pixel value. The address tells us where in the buffer the bullet is drawn, the mask selects the pixel of interest (out of the four pixels making up a byte), and the pixel value is the bullet colour ANDed with the mask, giving us something we can use to detect changes.

After each sprite is drawn, it will need to check for all eight bullets, using something like the following bit of code repeated eight times:

    lda addr1  ; grab data from screen
    anda mask1 ; select pixel of interest
    cmpa pix1  ; compare with bullet colour
    bne hit    ; collision if not equal

As this code fragment could be run 64 times, every single cycle matters, so I've opted to use self-modifying code, with the bullet update routine storing the data directly into the collision detect code:

addr1 lda >0   ; 5 cycles
mask1 anda #0  ; 2
pix1  cmpa #0  ; 2
      bne hit  ; 3

We're now down to 12 cycles per check, which when added to the additional code in the bullet update routine, takes less than a third of the time taken by the coord comparison method. Much better!

Now I should mention that this method does have a flaw: If the sprite pixel is the same colour as the bullet pixel then the collision won't be detected. It's not as bad as it sounds, because there is a good chance that a missed collision will be picked up on the next frame when the bullet moves to a different part of the sprite. Consider also that the coord comparison method is not perfect either, and requires the collision region to be a rectangular approximation of the true shape of the sprite image. On balance I think the faster method easily wins.


That leaves us with detecting collisions with the player. The coord comparison method could be viable here, because we only need one check per enemy sprite. The downside is that the accuracy will be a compromise. To avoid unfair situations, the bounding rectangles will have to be completely contained within the sprite images, which could mean large parts of sprites overlapping without a collision being detected. Instead, a more accurate screen buffer method could be used:

  • First draw the player
  • Then draw all the sprites that are dangerous to the player
  • Then check the screen buffer to see if anything has been drawn on top of the player

To determine if anything has been drawn on top of the player, we can use the player mask data to extract a player-shaped region of the screen. This should be identical to the player image, otherwise something must have been drawn on top. The following bit of code does just that:

loop  pulu d     ; get two mask bytes
      coma       ; invert mask bytes
      comb       ;
      anda ,x    ; collect data from mask-
      andb 1,x   ; -shaped region of screen
      subd ofs,u ; compare with image data
      bne hit    ; difference means collision
      pulu d     ; get next two mask bytes
      coma       ; and do same again
      comb       ;
      anda 2,x   ;
      andb 3,x   ;
      subd ofs,u ;
      bne hit    ;
      leax 64,x  ; check every other row
      leau 4,u   ;
      dec count  ;
      bne loop   ; repeat until done

u is pointing to the player mask and image data, with ofs being the size of the mask data minus two, to allow for pulu d adding two. x is pointing to the screen buffer. Note that I'm testing every other row to save time, the assumption being that this is sufficient to detect a collision with a large object. If I needed to detect collisions with single pixels then I may have to check every row.

This routine takes about 500 cycles to check six rows of player sprite, which is actually a little slower than comparing coords, but has the advantage of greater accuracy. It could be accelerated by using a compiled sprite technique, unrolling and converting instructions to immediate mode, but it would consume a lot of memory, requiring one routine for each of the 16 player directions. Worth keeping in mind for a high-performance 64K version.

Another advantage over coord comparisons is that the run time is independent of the number of objects. I could decide to add a load of enemy bullets for example and the player collision check would take the same amount of time.

The downside is that we don't know which object collided with the player. This is OK for my purposes, but might not be enough for some games. One way to work out which object was in collision would be additional code after the screen buffer test to determine which object was nearest to the player. (Though be careful of situations where the nearest object may not actually be the one colliding with the player)

He's making a list

These collision detection methods require the sprites to be handled in different ways. There are collidable sprites, such as enemies, that need to be checked for collisions, and there are non-collidable sprites, such as explosions, which skip the checks. These are allocated from a pool of eight free sprites.

And checking it twice

I have implemented three linked lists, to keep track of the collidable, non-collidable and free sprites. The non-collidable sprites are drawn before the player and skip the player bullet checks. The collidable sprites are drawn after the player and before the player collision check and have the bullet checks enabled. (Drawing the non-collidable sprites before the player avoids drawing the same sprite twice in one frame as would happen when an enemy sprite is turned into an explosion.)

Oh crap it's almost Christmas

I'm using one-way linked lists to keep the overhead down (each list item having a pointer to the next list item), and this is working OK, but I'm not sure if that will be the solution used in the final version of the game. Moving an item from the collidable to the non-collidable list takes six load/stores to modify the pointers leaving me wondering if there's a better way. Perhaps a job for another time...

Saturday, 29 July 2017


I might be slow, but at least I'm inconsistent. A few weeks ago, I thought, "I know, I'll read Stephen King's Dark Tower novels again, seeing as I'm a bit of a fan and the movie is coming out soon. It won't take long; There are only eight books, and probably not much more than 4000 pages." The next thing I'm aware of is struggling to remember the password to get back into this blog.

In the meantime, Ciaran Anscomb has been working hard and has got his excellent Dunjunz conversion ready for release. This is an achievement to be proud of as finishing a project can sometimes be the hardest part. Check it out; it's everything a game should be.

Right, I can't put it off any longer. Strap in and prepare yourselves for some awful puns. Truly awful.

Let there be sprite

As this is a shoot-em-up, we need cannon fodder to shoot at, enemies flying around and suchlike, and for that we will need some sprites.

I'm not sure of a precise definition for a sprite, but perhaps a reasonable description would be a graphical object that moves around separately to the background. It's a bit like a tile, but with a life of its own and behaviour that depends on what type of object is being represented.

This suggests that a sprite should have variable and constant data associated with it in addition to the image data. For example I've currently got the following pieces of information making up a sprite:

  • Pointer to image data
  • Screen coordinates
  • Velocity
  • Scoring and sound effect information
  • Pointers to routines that implement behaviour

The image data pointer can be constant or changed frame by frame to give animation. There are more than a few ways of drawing sprites, each with a different trade-off between memory use and speed. I would like to draw at arbitrary pixel coordinates which means bit shifting will be involved. At the slow end of the spectrum we could pixel shift the image data as it's being drawn, or at the fast end we could have pre-compiled sprites, where the image data is embedded into immediate instructions.

I've opted for a middle-of-the-road method that is based on pre-shifted image data. By that I mean the four possible bit-shifted versions of the image already exist in memory at the time of drawing.

I've chosen a sprite size of 12 x 12 pixels, requiring 3 x 12 = 36 bytes per image. When converted to the pre-shifted form, it takes 48 x 4 = 192 bytes, because an extra byte of width is needed to fit a shifted image, and we need four images, each with a different amount of shift. The extra memory used is the price paid to gain some speed.

Simply copying the sprite data onto the screen isn't going to work very well in this game, because the background would be overwritten by the parts of the sprite image that should have remained transparent. A way to fix this is to use a mask i.e. an image that is ANDed onto the screen to make a hole into which we can OR the sprite image.

A set of pre-shifted masks and images

Unfortunately, using logical operations means we are restricted to using the a and b registers to transfer the image data to the screen, though a small optimisation is to use ADD instead of OR, allowing us to combine two 8 bit operations into one 16 bit operation. Drawing one row of sprite looks a bit like this:

    pulu d     ; get two bytes of mask
    anda ,x    ; AND with screen
    andb 1,x   ;
    addd 46,u  ; OR with two bytes of image data
    std ,x     ; store onto screen
    pulu d     ; get next two bytes of mask
    anda 2,x   ; etc
    andb 3,x
    addd 46,u
    std 2,x
    leax 32,x  ; move to next screen row

The u register is pointing to the image data which is made up of 48 bytes of mask followed by 48 bytes of sprite. x is pointing to where we need to draw in the screen buffer. Nothing too exciting going on, but it gets the job done.

Sprite place at the sprite time

To determine where to draw the sprite, we need to calculate a screen buffer address from the coordinates. To simplify the address calculation, I'm using scaled coordinates. The x-coord is scaled by 64 and the y-coord is scaled by 32. That means I can form an address just by throwing away the bottom five bits of the y-coord, adding the top byte of the x-coord, then adding the screen buffer base address:

    ldd SP_YORD,y  ; y part of coord
    andb #$e0      ; remove sub-pixel bits
    adda td_fbuf   ; screen start address
    tfr d,x
    lda SP_XORD,y  ; x part of coord
    leax a,x

The y register is pointing to the sprite variables and the calculated address is returned in x. Note that I'm only adding the top eight bits of the buffer base address. This is fine as long as the bottom eight bits are zero.

That gives us the address, but we also need to select one of the pre-shifted frames to show the sprite in the correct pixel column. We need to determine the shift value 0-3 and then multiply that by 96 to give us an offset to add to the sprite image address.

Noting that the shift value is contained in bits six and seven of the x-coord, i.e. already multiplied by 64, we can get the required result by multiplying those bits by 1.5 (Because 64 x 1.5 = 96):

    ldb SP_XORD+1,y
    andb #192
    leau d,u
    leau b,u

Having to use clra and leau d,u is a bit unfortunate and is there to avoid interpreting the top bit of b as a sign bit. One day I may look into re-arranging the pre-shifted data and replace the last four instructions with the more efficient:

    leau b,u
    leau b,u

Or even better, if the registers can be re-allocated without incurring a penalty somewhere else:


The speed of sprite

Movement is achieved very simply by regularly adding a velocity to the coordinates. The moving background makes things slightly more interesting because we want to specify the sprite velocity relative to the background, not the screen. The way to deal with that is to add the background velocity to the sprite velocity. The background velocity is determined once per frame and affects all moving objects equally. The coord update looks like this:

    ldd SP_XORD,y   ; horizontal component
    addd SP_XVEL,y  ; sprite horizontal velocity
    addd scroll_x   ; background horizontal velocity
    std SP_XORD,y

    ldd SP_YORD,y   ; vertical component
    addd SP_YVEL,y  ; sprite vertical velocity
    addd scroll_y   ; background vertical velocity
    std SP_YORD,y

A Hard Day's Sprite

I'd imagine that most people who have experimented with software sprites will have seen what happens when a sprite goes off the edge of the screen. Going off the left or right edges usually means harmlessly reappearing on the other side, but going off the top or bottom can spell disaster as the sprite has moved out of display memory and possibly into memory being used for something else. It's something to be avoided.

In some games the issue can be avoided by ensuring the sprite coords always remain within bounds. However, in my game I would like sprites to cross the screen edges, moving in and out of view just as the background does. That means the sprites must be clipped at the screen edges.

In ROTABB, I had a screen buffer that was larger than the viewport. The sprites were rendered into the buffer and then the central portion of the buffer was copied to the screen, automatically clipping any sprites overlapping the edges of the viewport. Unfortunately, that doesn't sit very well with my new scrolling engine which relies on being able to copy big chunks of data without worrying about where the screen edges are. Plus I need to at least try to keep memory use down, meaning the buffers should be no larger than necessary. I'm going to have to clip sprites the hard way.

Clipping at the bottom of the screen isn't too bad. If the y-coord is greater than the screen height less the sprite height, then the sprite needs to be partially drawn. Subtracting the y-coord from the screen height in fact gives us the number of sprite rows to draw. If it's zero or negative then the sprite is completely off screen and doesn't need drawing.

Clipping at the top is a little more complicated in that we need to calculate how many rows to draw and we also need to calculate an offset for the image data, because the drawing no longer starts at the top row of the sprite. If the y-coord is negative then we need to clip. If it is less than or equal to the negated sprite height then the sprite is off screen. Negating the y-coord tells us how many rows of image to skip, and subtracting that value from the sprite height tells us how many rows to draw.

Sprite of hand

Clipping at the left and right edges is something altogether different. We need to selectively draw one, two, three or four bytes wide, depending on how much of the sprite is visible. I'm happy to say that Steve Bamford had already figured this one out and shared his clever solution involving a lookup table. All we need to do is take the horizontal offset of the screen address and use this to look up the address of a sprite drawing routine from a table. (Taking care to treat the offset as a signed value, because it becomes negative as the sprite crosses the left edge.)

The different sprite routines look after drawing the different widths of sprite and not a single x-coord comparison is required, because the lookup table can be book-ended with addresses that point to a routine that deals with the sprite moving completely off screen. Very elegant. Thanks, Steve!

A sprite for sore eyes:
Clipping in action on all four sides of the screen

Sprite at the end of the tunnel

I've got some more distractions coming up in August but hopefully it won't be too long until the next post, even though the path of the spriteous man is beset on all sides by the inequities of life and the tyranny of work. Or something.