Return of the Beast: April 2017

At some point during development, it struck me how complicated the scroll engine was becoming. The initial concept could be summarised in a few words and was supposed to be a simple matter of moving pointers around circular buffers, copying data from one place to another. But now there seemed to be an endless list of details to consider.

In some ways, the scrolling really is simple. It does pretty much boil down to pointers and copying. Where it gets complicated is figuring out which pieces of tile to draw, and a lot of that complication comes from the interaction between horizontal and vertical scrolling. One direction messes up the other.

But I do like a challenge, and I really wanted to see this thing working...

For horizontal scrolling we need to draw a one byte wide vertical stripe of tile pieces. Right away there are a couple of problems. The first is caused by the tiles being two bytes wide and the second is caused by vertical scrolling.

The first problem is easily solved by modifying the 'tile zero' base address we use to calculate the tile image addresses. By adding one to the base address, we automatically access the right half of each tile. If the tile image data starts at an even address, then the least significant bit of the address can be set to achieve the same effect.

The information to decide if we are accessing the left or right half of a tile comes from the map column pointer. This pointer is incremented or decremented at the same time as the buffer pointer, meaning the least significant bit indicates which tile half is being pointed to.
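As a small illustration of the half-tile selection, here is a sketch in Python rather than 6809 assembly. `TILE_ZERO` is a made-up address, assumed to be even so that setting bit 0 has the same effect as adding one; `stripe_tile_base` is a hypothetical helper name.

```python
TILE_ZERO = 0x4000  # hypothetical address of tile zero's image data (assumed even)

def stripe_tile_base(map_col):
    """Base address for drawing a one-byte-wide vertical stripe.

    map_col counts half tiles, so its bottom bit says which half we are on:
    0 = left byte of each tile row, 1 = right byte.
    """
    return TILE_ZERO | (map_col & 1)

print(hex(stripe_tile_base(6)))  # even column: left half
print(hex(stripe_tile_base(7)))  # odd column: right half
```

With the base adjusted once up front, the per-tile address calculation does not need to know which half it is drawing.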

The second problem is a bit more involved. It means we have to deal with partial tiles at the top and bottom of the display. This divides the vertical stripe into three sections: A partial tile at the top, a run of complete tiles (well, half tiles), and a partial tile at the bottom.

Sometimes the tiles will line up neatly with the top and bottom edges of the display, meaning no partial tiles, and fewer visible tiles. How does this work exactly? The screen is 12 tiles high, but if partially scrolled, 13 will be visible. We can define things a bit more precisely to help shape the algorithm:

The top tile will have some or all pixel rows visible.
The run of complete tiles will always be the same number of tiles (11 if part of a full height display).
The bottom tile will have some pixel rows visible, or none.

That puts the bottom tile in The Occasionally Disappearing 13th Row*. It simply gets skipped on those frames where the tiles have perfect vertical alignment with the screen.

Take it from the top

Let's look at the top tile first. When it is partially visible, the bit that is missing is the top part. We can use the vertical pixel counter to figure out the parameters. This counter is incremented each time we need to move the screen contents up one pixel. The tiles are eight pixels high, so we're interested in the bottom three bits of the counter, which gives us a number in the range zero to seven. When it is zero, all of the tile rows are visible, and when it is seven, just the bottom row of the tile is visible. So we subtract this number from eight to get the number of pixel rows to draw.

The other thing we need to determine is an offset into the tile image data so that we draw the correct part of the tile. Just like we did for vertical scrolling, we can take the bottom three bits of the vertical counter and multiply by two to create an offset for the tile image data.
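The two top-tile parameters can be sketched in a few lines of Python. This is only a model of the arithmetic described above, not the 6809 routine; `top_tile_params` is a hypothetical name.

```python
def top_tile_params(vcount):
    """Return (rows_to_draw, image_offset) for the partial tile at the top.

    vcount is the vertical pixel counter; only its bottom three bits matter.
    """
    fine = vcount & 7        # 0..7 pixel rows hidden above the top of the screen
    rows = 8 - fine          # visible pixel rows of the top tile
    offset = 2 * fine        # each tile row is two bytes, so skip 2 bytes per hidden row
    return rows, offset

for v in (0, 5, 7):
    print(v, top_tile_params(v))
```

When the bottom three bits are zero the whole tile is drawn from offset zero, and when they are seven a single row is drawn from offset 14.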

The list of hoops to jump through before we can start drawing looks like this:

Determine where in the buffer to start drawing using the logic discussed in Scrolling 101
Determine where in the map we will start reading using the map row and column pointers discussed in Details
Modify the tile image base address to select the left half or right half of each tile. (i.e. add one if the bottom bit of the map column pointer is set)
Determine the parameters for the partial top tile using the vertical counter

That sets us up for the top tile. We use the map pointer to give us the tile ID which in turn allows us to calculate the address of the image data. We can then copy tile image bytes to the buffer.

After each tile image byte is drawn into the buffer, we need to advance the buffer destination pointer to the next row and check that it hasn't crossed the end of the buffer. If it has, then the buffer size should be subtracted from the pointer so that drawing continues from the top of the buffer.
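The advance-and-wrap step might be modelled like this in Python. The addresses, the 32-byte row stride and the 96-row buffer are assumptions made up for the example; the real buffer layout may well differ.

```python
BUF_START = 0x0600            # hypothetical buffer start address
STRIDE = 32                   # assumed bytes per buffer row
BUF_SIZE = 96 * STRIDE        # assumed buffer size in bytes
BUF_END = BUF_START + BUF_SIZE

def next_row(ptr):
    """Move the destination pointer down one row, wrapping at the buffer end."""
    ptr += STRIDE
    if ptr >= BUF_END:        # crossed the end of the buffer?
        ptr -= BUF_SIZE       # carry on from the top, in the same column
    return ptr

print(hex(next_row(BUF_START)))
print(hex(next_row(BUF_END - STRIDE + 5)))
```

Subtracting the buffer size, rather than resetting to the start address, keeps the pointer in the same byte column after the wrap.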

Take it to the bridge

After the top tile, we need to draw the run of full height tiles. These are relatively easy as they are all fixed height and can be drawn with two loops: An inner loop to output the fixed number of bytes per tile, and an outer loop to output the fixed number of tiles. We continue advancing and wrapping the buffer destination pointer for each byte written, and similarly advance and wrap the map pointer for each tile produced.

Having to check the buffer destination address for every byte written consumes a lot of cycles. It looks like this piece of code:

    cmpx #buffer_end
    blo no_adjust
    leax -buffer_size,x
no_adjust

As the end of the buffer can only be crossed once, this code does very little useful work. It nearly always executes just the cmpx and the blo (7 cycles between them), but over the 96 rows of a full height draw that's still 96 x 7 = 672 cycles.

It would be nice to avoid as many of those checks as possible. The approach I've used is to check the buffer pointer before drawing each full tile. If there is room to draw a tile without reaching the end of the buffer then it draws the tile using an unrolled loop with no pointer check. Otherwise the tile is drawn byte by byte in a loop with the pointer check. That trims out a lot of cycles without adding a lot of complexity.
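To get a feel for how many checks that saves, here is a rough Python model of the per-tile test. It counts pointer checks rather than cycles, and the buffer dimensions are assumptions for illustration only. The plain loop in the fast path stands in for the unrolled copy.

```python
STRIDE = 32                 # assumed bytes per buffer row
BUF_SIZE = 96 * STRIDE      # assumed buffer size, as an offset range 0..BUF_SIZE-1
TILE_ROWS = 8

def draw_half_tile(buf, pos, column_bytes):
    """Write 8 bytes down one column. Returns (new_pos, checks_performed)."""
    checks = 1                                   # the single test before the tile
    if pos + TILE_ROWS * STRIDE < BUF_SIZE:
        for b in column_bytes:                   # fast path: no per-byte check
            buf[pos] = b
            pos += STRIDE
    else:
        for b in column_bytes:                   # slow path: check after every byte
            buf[pos] = b
            pos += STRIDE
            checks += 1
            if pos >= BUF_SIZE:
                pos -= BUF_SIZE
    return pos, checks

buf = bytearray(BUF_SIZE)
pos, total = 5, 0
for _ in range(12):                              # a full height stripe of 12 tiles
    pos, checks = draw_half_tile(buf, pos, bytes(range(1, 9)))
    total += checks
print(total)                                     # compare with 96 per-byte checks
```

In this model only the one tile that straddles the end of the buffer takes the slow path, so the number of checks drops from 96 to around 20.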

Throw it in the river

Finally we reach the partial tile at the bottom. This is easier to deal with than the top tile. Firstly, the part of the tile that is missing is the lower part of the tile, so there's no need to offset the image address. Secondly, the number of pixel rows we need to draw is simply the bottom three bits of the vertical pixel counter. If it's zero, we don't need to draw the tile at all as we've already reached the bottom of the screen.
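Putting the three sections together, a quick Python check (a model only, with the 12-tile screen height taken from the text and `stripe_sections` a hypothetical name) shows that the row counts always add up to a full 96-row stripe:

```python
def stripe_sections(vcount, screen_tiles=12):
    """Return (top_rows, full_tiles, bottom_rows) for one vertical stripe."""
    fine = vcount & 7
    top_rows = 8 - fine            # some or all rows of the top tile
    full_tiles = screen_tiles - 1  # the fixed run of complete (half) tiles
    bottom_rows = fine             # some rows of the bottom tile, or none
    return top_rows, full_tiles, bottom_rows

for v in range(8):
    top, full, bottom = stripe_sections(v)
    print(v, top, full, bottom, top + 8 * full + bottom)
```

Whatever the counter value, the top and bottom pieces between them always make up exactly one tile's worth of rows.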

What we have so far is pixel-by-pixel vertical scrolling, but the horizontal scroll is still only byte-by-byte. To get fast horizontal scrolling working at the pixel level, we need to bring in additional buffers and expand the drawing routines to include pixel shifting. Another layer of complexity. But at least the scroll engine will then be complete. It couldn't get any more complicated than that. Could it? To be continued...

(Spoiler alert: Yeah, it could)

* The Occasionally Disappearing 13th Row is possibly a British movie of the "I say, that's inconvenient!" disaster movie sub-genre, starring Timothy Dalton as a cheesy airline boss; Bill Nighy, apparently legally required to be in every British movie; and Martin Freeman as Tim from The Office. Again.

It's fair to warn you that this one could be heavy going. Time to figure out some details. It's just drawing tiles, so you'd think it would be easy, but there are complications. Oh the horror...

Problems, problems

For the moment, I'll focus on scrolling whole bytes rather than shifting pixels, to avoid getting too complicated, too quickly. I'm not smart enough to deal with the whole problem in one go and I find it's helpful to understand it in small byte-sized pieces, if you'll forgive the terrible pun. (And you've forgiven so much to have got this far). Here's a list to start thinking about:

We need to know which tiles need drawing. That implies some kind of pointer into the tile map to track movement on the screen.

We need to convert tile numbers from the map into addresses of graphics data for drawing.

For reasons that used to make sense, the tiles are two bytes wide. That's a complication. Sometimes we will be drawing the left half of a tile, and sometimes we will be drawing the right half.

We could reach the end of the buffer before we've finished drawing a strip of tile fragments. That means we need to keep control of the drawing address and wrap it when required.

We could fall off the edge of the world. I mean the map. We could fall off the edge of the map while drawing tiles. That means the map pointer may need adjustment between tiles.

The multidirectional scrolling means we will have variable sized pieces of tile on all sides of the screen. Oh joy. It didn't have to be this way. I could have done a nice side-scroller. That would have been much easier to program and document. But not me, nooOOOooo. Scrolling in one direction just wasn't good enough for Lady-La-De-Da-Fancy-Multiway-Scroller. Now everyone has to stay late and clear up the mess. Well, I hope I'm happy with myself. Hey where'd everyone go?

Solutions, solutions

The map is a two-dimensional structure, which suggests the use of separate row and column pointers. It's tempting to use a single pointer to access the map, just like we're doing with the buffer pointer, but there is a subtle difference: When we increment the address past the left or right hand edge of the map, we find ourselves on the other side of the map, but one row up or down. Not exactly what we want. We would like to appear on the other side of the map on the same row. That seems easier to handle (to me anyway) with separate pointers.

The row pointer could be the address of the first tile of the current map row, and it would need to be updated each time we've scrolled enough to bring a new row of tiles into view, i.e. every eight pixels. If the pointer goes off the top of the map then the map size can be added to get us back into bounds, and if it goes off the bottom the map size can be subtracted instead.
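A minimal Python sketch of the row pointer adjustment follows. The map address and dimensions are invented for the example and `move_row` is a hypothetical helper.

```python
MAP_START = 0x2000                 # hypothetical address of the first map row
MAP_WIDTH = 64                     # assumed tiles per map row
MAP_HEIGHT = 32                    # assumed number of map rows
MAP_SIZE = MAP_WIDTH * MAP_HEIGHT
MAP_END = MAP_START + MAP_SIZE

def move_row(row_ptr, direction):
    """Move the row pointer one map row up (-1) or down (+1), wrapping at the edges."""
    row_ptr += direction * MAP_WIDTH
    if row_ptr < MAP_START:        # fell off the top of the map
        row_ptr += MAP_SIZE
    elif row_ptr >= MAP_END:       # fell off the bottom of the map
        row_ptr -= MAP_SIZE
    return row_ptr

print(hex(move_row(MAP_START, -1)))            # wraps to the last row
print(hex(move_row(MAP_END - MAP_WIDTH, +1)))  # wraps to the first row
```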

That leaves the column pointer to specify an offset for a particular tile in the row. If it is incremented or decremented at the same time as the buffer pointer then it represents a half tile. (Recall that the tiles are two bytes wide). This sounds pretty useful. The least significant bit can be shifted into the carry and used to select the left or right half of the tile, and the remaining bits can be added to the row pointer to give us the map location. As the map is a power of two wide, wrapping can be achieved by ANDing the pointer with a mask. (Otherwise it would need a compare and branch)
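In Python terms, splitting the column pointer might look like the sketch below. `MAP_WIDTH` is an assumed value, and wrapping the half-tile pointer itself with a mask of twice the map width is a guess at one reasonable approach, not something the text spells out.

```python
MAP_WIDTH = 64   # assumed map width in tiles (must be a power of two)

def split_column(map_col):
    """Split the half-tile column pointer into (tile_column, right_half)."""
    right_half = map_col & 1                       # the bit that lsra drops into carry
    tile_col = (map_col >> 1) & (MAP_WIDTH - 1)    # remaining bits, wrapped to the map
    return tile_col, right_half

def step_column(map_col, delta):
    """Move left (-1) or right (+1) by one byte, i.e. half a tile, with wrapping."""
    return (map_col + delta) & (2 * MAP_WIDTH - 1)

print(split_column(5))       # tile column 2, right half
print(step_column(127, 1))   # wraps back to column 0
```

The tile column is then added to the row pointer to find the tile number in the map.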

The tile that the map pointer is pointing to is the one that should appear in the top left corner of the screen. This is fine for scrolling left or up, because the new strip of graphics will include that tile. For scrolling right or down there's an extra step: An offset will need to be added to the column or row pointer to address the tiles on the other side of the screen. It's a similar concept to the one used to decide where to draw in the buffer for each scroll direction.

Each tile image takes up 16 bytes of memory, so all we need to do to find the address of a tile image is to load the tile number from the map, multiply it by 16 and add the address of tile number zero. Using a lookup table would be a little faster, but I'll keep it simple and stick with a MUL instruction for now.
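The address calculation and the lookup table alternative can be compared in a few lines of Python. `TILE_ZERO` is a made-up address and the 256-entry table size is an assumption (one entry per possible 8-bit tile number).

```python
TILE_ZERO = 0x4000   # hypothetical address of tile number zero
TILE_SIZE = 16       # 8 rows x 2 bytes

def tile_image_address(tile_number):
    """What the MUL approach computes: tile number x 16 + base address."""
    return TILE_ZERO + TILE_SIZE * tile_number

# The lookup table alternative: precompute every address once.
TILE_ADDRESS_TABLE = [tile_image_address(n) for n in range(256)]

print(hex(tile_image_address(3)))
print(hex(TILE_ADDRESS_TABLE[3]))
```

Both give the same answer; the table trades memory for speed.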

A closer look at vertical scrolling

Vertical scrolling involves drawing a horizontal strip one pixel high, running the width of the screen. A tile is eight pixels high, so we need to pick one of the eight pixel rows to draw. This can be handled by adding an offset to the tile zero address. The offset is simply two times the number of pixels of displacement required, because each row of the tile is two bytes.

That means we need to keep track of the vertical pixel offset. I'm using a vertical counter variable that is decremented or incremented for each pixel scrolled up or down. By ANDing the counter with #7, we can detect when we cross a tile boundary and therefore need to move the map pointer. After ANDing with #7, the value can be left-shifted (i.e. multiplied by two) to create the pixel row offset.

That works neatly for scrolling up, but there is a modification required for scrolling down. Firstly, we need a different offset. If you imagine we've scrolled into a position where the tiles are neatly lined up with the top and bottom edges of the screen, the counter ANDed with #7 will be zero. If we scrolled up into this position then we needed to draw the top edge of the tiles as the zero offset suggests. On the other hand, if we had scrolled down, then the bottom edge of the tiles has just appeared. That's a different offset and we get it by ADDing #7 to the counter value before we AND with #7.

When I first implemented that, something weird happened when scrolling down. Seven out of every eight lines drawn were correct, but the ones that appeared for vertical offset zero were wrong. This didn't make any intuitive sense and it took me a while to figure out: A screen full of tiles would be 12 tiles high, but in general you would be able to see 13 rows of tiles, thanks to the partial tiles at the top and bottom. So the bottom row is usually 12 map rows below the map pointer, but only 11 rows when the vertical offset is zero. I ended up doing a clunky little correction just for offset zero. There might be a better way but I haven't thought of one so far.
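The correction can be cross-checked with a little Python. The first function models the special case as described; the second is one possible way to fold both cases into a single expression by working out which map row the bottom screen line falls in. That second form is only a suggestion, and it may not be any cheaper in 6809 code. Both function names are hypothetical.

```python
SCREEN_ROWS = 96   # 12 tiles x 8 pixel rows, from the text

def bottom_row_special_case(vcount):
    """Map rows between the map pointer and the bottom line: 12, or 11 at offset zero."""
    return 11 if (vcount & 7) == 0 else 12

def bottom_row_formula(vcount):
    """Same value, computed from the position of the last screen line."""
    fine = vcount & 7
    return (fine + SCREEN_ROWS - 1) >> 3   # last line is 95 pixel rows below the first

for v in range(8):
    print(v, bottom_row_special_case(v), bottom_row_formula(v))
```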

As if the vertical scroll wasn't complicated enough already, there are another two cases to consider. The tiles are two bytes wide, meaning the drawing starts on either the left or right hand half of a tile. When it starts with the left half of a tile, there will be 16 tiles across the screen, all drawn in the same kind of way. I call this even drawing. But if we start with the right half of a tile, then there will be three sections: The right half of the first tile, then 15 complete tiles, then the left half of a 17th tile. No prizes for guessing I call this odd drawing.

I handled this by having two drawing routines, one even and one odd, and the one called is determined by the least significant bit of the map column pointer.
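The choice between the two cases, and the sections each one produces, can be sketched in Python like this. The 16-tile screen width comes from the text; `row_sections` and the section names are just labels for the example.

```python
def row_sections(map_col, screen_tiles=16):
    """List the (kind, tile_count, bytes_each) sections of one horizontal strip."""
    if (map_col & 1) == 0:
        return [("full", screen_tiles, 2)]        # even drawing
    return [("right half", 1, 1),                 # odd drawing
            ("full", screen_tiles - 1, 2),
            ("left half", 1, 1)]

for col in (4, 5):
    sections = row_sections(col)
    width = sum(count * size for _, count, size in sections)
    print(col, sections, width)
```

Either way the strip comes to 32 bytes, but odd drawing touches 17 map entries instead of 16.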

Before I get into the drawing, here's a summary of the steps so far:

Scroll up:

Move buffer pointer up one line, wrapping if necessary
Decrement vertical pixel counter
If pixel counter AND #7 = 7 then move map row pointer up one row (and wrap)
Calculate the tile image base address: tile zero address + 2 x (pixel counter AND #7)
Buffer drawing start address = buffer pointer
Map source data = Map row pointer
Draw even or odd depending on map column offset LSB

Scroll down:

Move buffer pointer down one line, wrapping if necessary
Increment vertical pixel counter
If pixel counter AND #7 = 0 then move map row pointer down one row (and wrap)
Calculate the tile image base address: tile zero address + 2 x ((pixel counter + 7) AND #7)
Buffer drawing start address = old buffer pointer (the line above the current pointer)
Map source data = Map row pointer + number of visible rows - 1 row
Draw even or odd depending on map column offset LSB
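As a cross-check of the two step lists, here is a Python model of the counter and map row bookkeeping. It leaves out the buffer pointer and the drawing, uses a map row index instead of an address, and `MAP_HEIGHT` is an assumed value.

```python
MAP_HEIGHT = 32      # assumed number of map rows
SCREEN_TILES = 12    # screen height in tiles, from the text

def scroll_up(vcount, map_row):
    """Return (vcount, map_row, image_offset, source_row) after scrolling up a pixel."""
    vcount -= 1
    if (vcount & 7) == 7:                        # crossed into the tile row above
        map_row = (map_row - 1) % MAP_HEIGHT
    image_offset = 2 * (vcount & 7)              # added to the tile zero address
    return vcount, map_row, image_offset, map_row

def scroll_down(vcount, map_row):
    """Return (vcount, map_row, image_offset, source_row) after scrolling down a pixel."""
    vcount += 1
    if (vcount & 7) == 0:                        # crossed into the tile row below
        map_row = (map_row + 1) % MAP_HEIGHT
    image_offset = 2 * ((vcount + 7) & 7)
    visible_rows = SCREEN_TILES if (vcount & 7) == 0 else SCREEN_TILES + 1
    source_row = (map_row + visible_rows - 1) % MAP_HEIGHT
    return vcount, map_row, image_offset, source_row

print(scroll_up(0, 0))     # wraps to the last map row, draws tile pixel row 7
print(scroll_down(7, 5))   # lands on a tile boundary: bottom line is 11 rows down
print(scroll_down(0, 5))   # leaves the boundary: bottom line is 12 rows down
```

Note that the last two calls read from the same map row: first its bottom pixel row arrives at one map pointer position, then the top pixel row of the next tile row arrives after the pointer has moved.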

Draw some tiles already

The even drawing routine looks something like the code below. On entry, u is the tile image base address and y points to the map row.

    lda map_col         ; map column pointer
    lsra                ; get the offset
    sta coffset         ; store offset directly in instruction

    lda #16             ; draw 16 tile pieces
    sta count           ;

loop
coffset equ *+2         ; variable is part of instruction
    lda <0,y            ; get tile number from map
    ldb #16             ; calculate address of image data
    mul                 ;
    ldd d,u             ; load the tile image bytes
    std ,x++            ; store tile bytes in buffer
    cmpx #BUF_END       ; check for end of buffer
    blo nowrap          ; not end of buffer
    leax -BUF_SIZE,x    ; adjust back to start of buffer
nowrap lda coffset      ; move to next column in map
    inca                ;
    anda #(MAPWID-1)    ; wrap map column
    sta coffset         ;

    dec count           ; do next tile
    bne loop            ;

There isn't all that much to it. I've used some self-modifying code to make life easier. The coffset variable is actually the 8-bit offset byte of the indexed-mode instruction used to load the tile number from the map. The < symbol tells the assembler to use an 8-bit offset so the instruction bytes are in the expected layout.

It's assumed that the two bytes of the tile image will never fall on either side of the end of the buffer. This is a safe assumption providing that the buffer size is a multiple of the tile width, and that the first tile is aligned with the start of the buffer.

Odd drawing is a bit more complicated. First there is the right half of the first tile:

    lda map_col         ; map column pointer
    lsra                ; get the offset
    lda a,y             ; get tile number
    ldb #16             ; calculate address of image data
    mul                 ;
    ldd d,u             ; load the tile image bytes
    stb ,x+             ; store right half in buffer

It might seem strange to load both bytes of the tile to then throw away the left byte, but this is an easy way of accessing the right byte. If I wanted to load just one byte, then I would first have to add one to the address which would take longer.

After this we need to check and wrap the buffer address and then we can draw 15 tile pieces using the even drawing algorithm above. The only differences are we need to draw 15 pieces instead of 16, and we need to add one to the map column offset before we start.

The left half of the tile at the end is really easy. The registers already have the values we need thanks to the code that drew the 15 tiles:

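    lda a,y             ; get tile number (a still holds the next map column)
    ldb #16             ; calculate address of image data
    mul                 ;
    lda d,u             ; load just the left byte of the tile row
    sta ,x              ; store left half in buffer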

Simples!
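For anyone who would rather see even and odd drawing side by side, here is a Python model of one horizontal strip. It mirrors the structure of the routines above but ignores the buffer wrap. The map width and the test data are invented so that the output is easy to read: in this toy data, tile n holds bytes 2n and 2n+1 in its first row.

```python
TILE_SIZE = 16
MAP_WIDTH = 32   # assumed map width in tiles (power of two)

def draw_row(map_row, map_col, tiles, row_offset):
    """Return the 32 bytes of one pixel row of the screen.

    map_row    -- tile numbers for the current map row
    map_col    -- column pointer in half-tile units
    tiles      -- tile image data, 16 bytes per tile
    row_offset -- 2 x pixel row within the tile
    """
    def tile_bytes(c):
        addr = map_row[c] * TILE_SIZE + row_offset   # the MUL step
        return tiles[addr], tiles[addr + 1]          # like ldd d,u

    col = (map_col >> 1) & (MAP_WIDTH - 1)
    odd = map_col & 1
    out = bytearray()
    if odd:
        out.append(tile_bytes(col)[1])               # right half of the first tile
        col = (col + 1) & (MAP_WIDTH - 1)
    for _ in range(15 if odd else 16):               # the run of complete tiles
        out.extend(tile_bytes(col))
        col = (col + 1) & (MAP_WIDTH - 1)
    if odd:
        out.append(tile_bytes(col)[0])               # left half of the 17th tile
    return bytes(out)

# Toy data: map row holds tiles 0..31, tile n's first row is bytes (2n, 2n+1).
map_row = list(range(MAP_WIDTH))
tiles = bytearray(MAP_WIDTH * TILE_SIZE)
for n in range(MAP_WIDTH):
    tiles[n * TILE_SIZE] = 2 * n
    tiles[n * TILE_SIZE + 1] = 2 * n + 1

print(list(draw_row(map_row, 0, tiles, 0)))   # even drawing
print(list(draw_row(map_row, 1, tiles, 0)))   # odd drawing, shifted by one byte
```

With this data the odd strip is simply the even strip moved along by one byte, which is the effect a one-byte horizontal scroll should have.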

Hey, what about horizontal scrolling?

Yeah, I know, we're having so much fun and it's hard to stop, but my fingers are all wore out, what with all that typing and everything. So next time we'll have a closer look at the horizontal scrolling...


Tuesday, 18 April 2017

More details