Deep Dive: LoadChrData

Here’s a post about loading CHR data! NESMaker uses a macro, which in turn uses a subroutine, to load the graphic tiles into the PPU. In this post, I will dissect exactly what NESMaker does under the hood, while adding some context on how the NES loads tiles from the game into its Picture Processing Unit.

Before we start

This post is probably not for absolute NESMaker beginners. It mentions and makes use of things like indirect addressing, bank switching and PPU addressing. If you are not comfortable with these principles, you may want to read up on those first.

Still with me? Okay, let’s deep dive into loading CHR data!

doLoadScreen

The LoadChrData macro is called from the doLoadScreen subroutine. This subroutine is called whenever a new screen is loaded, and it assumes that screen rendering is turned off at that point. It does everything needed to show the desired screen on your TV or monitor. First, it removes all objects from the screen. Then it loads all the data needed to show the new screen in its initial state: screen table data, palettes, background and sprite tiles, nametable data, attribute data and collision data. After that, it places the player object at the screen’s starting position and draws the HUD if needed. Finally, it runs some optional game-specific custom routines and turns the screen on.

I won’t go into detail on all those routines, but will instead zoom in on the part that loads the graphic tiles into the PPU.

LoadChrData

All graphic tiles are stored in various ROM banks. LoadChrData is the macro that retrieves the tile data from those banks and transfers it to the PPU. This is supposed to happen while screen rendering is turned off: writing to PPU memory while the PPU is drawing the picture interferes with its own memory accesses and leads to garbled graphical glitches. This is why a buffer is sometimes used for screen updates, so that the changes can be written to the PPU during vertical blank, the short period between frames when the PPU is not drawing. This macro does not use such a buffer, though.

This is how the macro is declared:

MACRO LoadChrData arg0, arg1, arg2, arg3, arg4, arg5, arg6

The macro takes a whopping seven arguments:

  • arg0 is the bank number to draw from. As mentioned before, graphics are stored across different banks. This argument tells the macro which bank to use.
  • arg1 and arg2 are the pattern table row and column in the PPU to load CHR data into. The macro writes them to the PPU address register ($2006) as the high and low byte of the target address. You can find a schematic overview later in the post that shows what this means exactly.
  • arg3 is the number of tiles to load. This varies based on the tile template used in the NESMaker UI, and on whether background or sprite tiles are being loaded.
  • arg4 and arg5 are the labels of two lookup tables that hold the low and high bytes, respectively, of the addresses where tile sets are stored in the graphics bank. Together, the two bytes read from them form the memory address of the first tile to load.
  • arg6 is the table index: the position within both lookup tables, counted from the start addresses given by arg4 and arg5.

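Okay, so what does the macro do? First, it copies some of the arguments into “safe” variables, so that the initial values can’t be overwritten by mistake within the macro itself. The doLoadChrRam subroutine that the macro calls later on also reads the values from these copies. This looks like this in the code: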

    ;; Store bank number in tempBank variable
    LDA arg0
    STA tempBank
    
    ;; Store other arguments in "hold" variables (to prevent
    ;; overwriting any assigned variables called by the macro)
    LDA arg1
    STA arg1_hold
    LDA arg2
    STA arg2_hold
    LDA arg3
    STA arg3_hold
    LDA arg6
    STA arg6_hold
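
To make the reasoning behind these “hold” copies concrete, here is a toy model in Python (not NESMaker code). RAM is a plain dictionary, and the scenario is hypothetical: a caller passes the scratch variable temp as arg1, and a version of the macro without the copy would only read arg1 after writing to temp itself. The value 0x42 is made up and stands in for a byte looked up during the macro.

```python
# Toy model of why the macro copies its arguments first.
# Hypothetical scenario: the caller passes the scratch variable "temp"
# as arg1; 0x42 stands in for a byte the macro stores in temp later on.

def expand_without_hold(ram, arg1_var):
    """Macro body that reads arg1 only after using temp as scratch space."""
    ram["temp"] = 0x42           # like STA temp during the pointer lookup
    return ram[arg1_var]         # arg1 read too late: it may be clobbered


def expand_with_hold(ram, arg1_var):
    """Macro body that copies arg1 into arg1_hold before anything else."""
    ram["arg1_hold"] = ram[arg1_var]  # LDA arg1 / STA arg1_hold
    ram["temp"] = 0x42                # the scratch write no longer matters
    return ram["arg1_hold"]


caller_ram = {"temp": 0x10}  # temp holds the row value $10 passed as arg1
print(hex(expand_without_hold(dict(caller_ram), "temp")))  # 0x42, clobbered
print(hex(expand_with_hold(dict(caller_ram), "temp")))     # 0x10, preserved
```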

Then, it needs to switch to the bank where the graphics pointer tables are stored, which is bank #$16 by default. There, the macro retrieves the low and high byte of the address of the first tile to load, by looking up entry arg6 in the lookup tables given by arg4 and arg5. In code, it looks like this:

    ;; Switch to bank #$16 (which holds various lookup tables
    ;; for pointer memory addresses)
    SwitchBank #$16

        ;; Load table index in y-register
        LDY arg6

        ;; Load table pointer to load from in temp16 variable
        ;; (two byte memory address)
        LDA #<arg4
        STA temp16
        LDA #>arg4
        STA temp16+1

        ;; Load the needed value from the pointer table, based
        ;; on the index value put in the macro, and store in the
        ;; temp variable. This is the low byte of the address.
        LDA (temp16),y
        STA temp

        ;; Load second table pointer to load from in temp16
        ;; variable (two byte memory address)
        LDA #<arg5
        STA temp16
        LDA #>arg5
        STA temp16+1

        ;; Load the needed value from the pointer table, based
        ;; on the index value put in the macro, and store in the
        ;; temp1 variable. This is the high byte of the address.
        LDA (temp16),y
        STA temp1

        ;; Copy the low and high byte of the graphics ROM
        ;; memory location into the two byte temp16 variable
        ;; (little-endian: low byte in temp16, high in temp16+1).
        LDA temp
        STA temp16
        LDA temp1
        STA temp16+1
    ReturnBank
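
The lookup above boils down to a bit of byte arithmetic. Here is a rough Python sketch of it, assuming (as the final copy of temp into temp16 implies) that the table behind arg4 holds low bytes and the table behind arg5 holds high bytes. The table contents are made-up example values, not NESMaker’s real tables.

```python
def lookup_chr_address(lo_table, hi_table, index):
    """Combine entry `index` of a low-byte table and a high-byte table into
    a 16-bit address, mirroring how the macro fills temp16 and temp16+1.

    Assumption: lo_table plays the role of the arg4 table (its byte ends up
    in temp16) and hi_table the role of the arg5 table (temp16+1)."""
    low = lo_table[index]     # LDA (temp16),y / STA temp
    high = hi_table[index]    # LDA (temp16),y / STA temp1
    return (high << 8) | low  # temp16 = low byte, temp16+1 = high byte


def pointer_bytes(address):
    """Split an address into the bytes stored at temp16 and temp16+1."""
    return address & 0xFF, address >> 8


# Made-up tables describing three tile sets at $8000, $8430 and $9A10.
LO = [0x00, 0x30, 0x10]
HI = [0x80, 0x84, 0x9A]
print(hex(lookup_chr_address(LO, HI, 1)))         # 0x8430
print([hex(b) for b in pointer_bytes(0x8430)])    # ['0x30', '0x84']
```

Keeping low and high bytes in two separate tables is a common 6502 idiom: the same index register selects both halves of the pointer, so no multiplication by two is needed.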

Now we know where the graphic data of the first tile is stored, and the two-byte temp16 variable holds that memory address. That’s all we’ve done so far; with this address, we can finally load the graphics and transfer them to the PPU. We call a subroutine to do this. When the subroutine has finished and returned to the macro, the macro ends:

    ;; Now we have the correct address in (temp16), so we can
    ;; actually load the CHR data into the PPU through the
    ;; following subroutine.
    JSR doLoadChrRam

;; End macro
ENDM

doLoadChrRam

So far, the macro has prepared the tiles to load by retrieving the memory address where the first tile is stored. tempBank holds the bank, temp16 and temp16+1 hold the memory address, arg1_hold and arg2_hold hold the PPU address to transfer the tiles to, and arg3_hold holds the number of tiles, which serves as the loop counter. So now, let’s take a look at the subroutine that the macro calls: doLoadChrRam. First, here is the entire subroutine, and then I’ll explain in more detail what it does.

doLoadChrRam:

    ;; Switch in the bank that holds the tile data (tempBank)
    SwitchBank tempBank

        ;; Reset the latch
        BIT $2002

        ;; Load correct memory address
        LDA arg1_hold
        STA $2006
        LDA arg2_hold
        STA $2006

        ;; Loop through the number of tiles (arg3_hold) to be drawn
        LoadTilesOuterLoop:

            ;; Set tile counter to 16 (used to increase the temp16
            ;; high byte when the y-register wraps around)
            LDA #$10
            STA temp1
            LDY #$00

            LoadTilesLoop:
                LDX #$10

                ;; Copy current tile (16 bytes) into PPU memory
                LoadChrRamLoop:
                    LDA (temp16),y
                    STA $2007
                    INY
                    DEX
                BNE LoadChrRamLoop

                ;; Now a full tile has been loaded.
                ;; Check if we have loaded 16 tiles yet.
                DEC temp1
                BNE keepLoading

                    ;; Check if there are more tiles to load
                    DEC arg3_hold
                    BEQ doneLoadingTiles

                    ;; Increase the high byte of the address in
                    ;; (temp16)
                    INC temp16+1
                    JMP LoadTilesOuterLoop
                keepLoading:

                ;; Check if there are more tiles to load
                DEC arg3_hold
            BNE LoadTilesLoop

        ;; All tiles in the current "chunk" have been loaded 
        doneLoadingTiles:

    ;; Swap back previous bank
    ReturnBank

    ;; Return
    RTS
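
To see how the three nested loops walk through the source data, here is a Python model of the subroutine above. It is a simplification, not NESMaker code: the bank is a plain list of byte values, the bytes written to $2007 are collected in a list, and the tile count is assumed to be between 1 and 255 (with 0, the real DEC arg3_hold would wrap around and the routine would load 256 tiles).

```python
def load_chr_ram(source, start, tile_count):
    """Model of doLoadChrRam's nested loops (simplified, not NESMaker code).

    source     -- byte values of the switched-in bank, indexed by offset
    start      -- offset of the first tile (the pointer in temp16)
    tile_count -- number of tiles to copy (arg3_hold), assumed 1..255

    Returns the bytes written to $2007, in order."""
    written = []
    pointer = start              # temp16 / temp16+1
    remaining = tile_count       # arg3_hold
    while True:                  # LoadTilesOuterLoop
        tiles_left_in_row = 16   # LDA #$10 / STA temp1
        y = 0                    # LDY #$00
        while True:              # LoadTilesLoop
            for _ in range(16):                      # LoadChrRamLoop, X = 16..1
                written.append(source[pointer + y])  # LDA (temp16),y / STA $2007
                y = (y + 1) & 0xFF                   # INY wraps from 255 to 0
            tiles_left_in_row -= 1                   # DEC temp1
            remaining -= 1                           # DEC arg3_hold (both paths)
            if remaining == 0:
                return written                       # doneLoadingTiles
            if tiles_left_in_row == 0:
                pointer += 0x100                     # INC temp16+1
                break                                # JMP LoadTilesOuterLoop


bank = [i % 251 for i in range(1024)]  # made-up data, no 256-byte repeats
out = load_chr_ram(bank, 0, 20)         # 20 tiles = 320 bytes
print(len(out), out == bank[:320])      # 320 True
```

The output matches the source byte for byte, including across the 256-byte boundary where Y wraps around. The test data repeats every 251 bytes rather than every 256, so a missing INC temp16+1 would show up as wrong bytes instead of accidentally matching.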

There’s a lot to delve into here! Okay, let’s go. First, we need to swap in the bank where the tile data is stored. Then we need to reset the address latch. The PPU address register ($2006) receives a 16-bit address as two consecutive one-byte writes, high byte first, and uses an internal latch to keep track of which byte comes next. If an earlier write left that latch halfway, our high byte would be taken as a low byte. Reading the PPU status register ($2002) resets the latch; BIT $2002 performs that read without changing the accumulator. Thereafter, we set the PPU address to write to by writing its high and low byte to $2006. This sets up the transfer of CHR data to the PPU.
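
The latch behaviour is easy to get wrong, so here is a toy Python model of just the three PPU registers this routine touches. It is a simplification of the real hardware, not an emulator: only the write latch, the two-write address setup and the +1 address increment after each $2007 write are modeled, and the PPU is assumed to be set to increment by 1.

```python
class PpuPorts:
    """Toy model of $2002, $2006 and $2007 (simplified, not cycle-accurate).

    Modeled: the shared write latch, the two-write address setup (high byte
    first) and the +1 address increment after each $2007 write, assuming
    PPUCTRL is set to increment by 1. Only pattern table memory exists."""

    def __init__(self):
        self.vram = bytearray(0x2000)  # $0000-$1FFF: pattern tables 0 and 1
        self.addr = 0
        self.expect_high = True        # latch: is the next $2006 write high?

    def read_2002(self):
        self.expect_high = True        # reading PPUSTATUS resets the latch
        return 0                       # status flags are not modeled

    def write_2006(self, value):
        if self.expect_high:
            self.addr = ((value & 0x3F) << 8) | (self.addr & 0xFF)
        else:
            self.addr = (self.addr & 0x3F00) | value
        self.expect_high = not self.expect_high

    def write_2007(self, value):
        self.vram[self.addr & 0x1FFF] = value  # only $0000-$1FFF modeled
        self.addr = (self.addr + 1) & 0x3FFF   # auto-increment by 1


ppu = PpuPorts()
ppu.write_2006(0x12)   # an earlier, unfinished address write
ppu.read_2002()        # BIT $2002 resets the latch
ppu.write_2006(0x10)   # arg1_hold: high byte
ppu.write_2006(0x00)   # arg2_hold: low byte
ppu.write_2007(0xAA)
ppu.write_2007(0x55)
print(hex(ppu.addr))   # 0x1002
```

Without the read_2002() call, the same three $2006 writes leave the model’s address at $0010 instead of $1000, so the tile data would land in the wrong pattern table.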

Now, we set up three nested loops:

  • an outer loop (LoadTilesOuterLoop) for each row of 16 tiles. This is needed because 16 tiles take up 256 bytes, after which the y-register wraps around to zero. When that happens, we increase the high byte of the address in temp16 (INC temp16+1), so we keep loading the correct tiles instead of the same ones again.
  • a middle loop (LoadTilesLoop) for each tile within a row.
  • an inner loop (LoadChrRamLoop) for each byte of a tile. A tile is 8x8 pixels with two bits per pixel, stored as sixteen bytes in two bit planes: the first eight bytes hold the low bit of every pixel (one byte per pixel row), and the last eight bytes hold the high bit.

We load each byte from the switched-in bank at the address in temp16, offset by the y-register (LDA (temp16),y). Writing that byte to the PPU data port ($2007) transfers it to the PPU, which then advances its own address by one (assuming the PPU is set to increment by 1 rather than 32), so the next byte lands right after it. After sixteen bytes, one tile has been loaded. Now we check whether the row has finished loading (decrease temp1 and check whether it is zero), and whether all tiles have been loaded (decrease arg3_hold and check whether it is zero). When all tiles have been loaded into the PPU, we swap back the bank that was active before the subroutine was called, and return to where the subroutine was called: in this case, the end of the LoadChrData macro.
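
An NES tile stores its 8x8 pixels as two bit planes: bytes 0-7 hold the low bit of each pixel, one byte per pixel row, and bytes 8-15 hold the high bit. Here is a small Python decoder that turns those sixteen bytes into an 8x8 grid of color indices (0 to 3), as a way to check that layout. The example tile data is made up.

```python
def decode_tile(tile):
    """Convert one 16-byte NES tile into 8 rows of 8 color indices (0-3).

    Bytes 0-7 are bit plane 0 (low bit), bytes 8-15 are bit plane 1 (high
    bit); byte r of each plane describes pixel row r, with bit 7 as the
    leftmost pixel."""
    if len(tile) != 16:
        raise ValueError("an NES tile is exactly 16 bytes")
    rows = []
    for r in range(8):
        low_plane, high_plane = tile[r], tile[r + 8]
        rows.append([
            ((low_plane >> (7 - c)) & 1) | (((high_plane >> (7 - c)) & 1) << 1)
            for c in range(8)
        ])
    return rows


# Made-up tile: only the top pixel row is non-zero.
tile = bytes([0xF0, 0, 0, 0, 0, 0, 0, 0,
              0xCC, 0, 0, 0, 0, 0, 0, 0])
print(decode_tile(tile)[0])   # [3, 3, 1, 1, 2, 2, 0, 0]
```

Because each byte holds one bit of eight pixels, a single row only gets its full colors once both planes have been written, which is why doLoadChrRam always copies whole 16-byte tiles.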

Hooray, we have now loaded a bunch of tiles for use within the current screen of the game!

What about the PPU row and column values?

As mentioned before, the macro uses two arguments (arg1 and arg2) to define where in the PPU the tiles should be loaded. To conclude things for now, below you will find a schematic overview of how NESMaker uses these values to load graphics into the PPU.

The PPU has two 4 KB tables to store pattern data (i.e. graphic tiles). NESMaker uses PPU addresses $0000-$0FFF (pattern table 0) for sprite graphics, and $1000-$1FFF (pattern table 1) for background graphics. Each table holds 256 tiles of 16 bytes, laid out as 16 rows of 16 tiles, so one row takes up 256 bytes. That is why the row (arg1) works as the high byte of the PPU address ($00-$0F for pattern table 0, $10-$1F for pattern table 1), and the column (arg2) as the low byte, with each step to the right adding 16 ($10). These two memory tables are further divided into smaller chunks of graphics. The sprite graphics are divided into 2 KB of game object graphics, which can be used on every screen, and 2 KB of monster graphics, which can be changed on a per-screen basis. The background graphics are divided into smaller chunks, based on which tile template is used. The schema below shows the example of the normal tile template (Main/Screen/Path).
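
The relation between tile positions and the two address bytes is simple arithmetic: 16 bytes per tile, 16 tiles per row, and pattern table 1 starting at $1000. Below is a small Python sketch of it; row_column_args is a hypothetical helper, not part of NESMaker, that computes the high and low byte written to $2006 for a given pattern table, row and column.

```python
def chr_ppu_address(table, tile):
    """PPU address of a tile: 16 bytes per tile, pattern table 0 starts at
    $0000 and pattern table 1 at $1000."""
    return table * 0x1000 + tile * 16


def row_column_args(table, row, column):
    """Hypothetical helper (not part of NESMaker): the high and low byte
    written to $2006, i.e. the arg1/arg2 values for a given pattern table,
    row (0-15) and column (0-15), with 16 tiles per row."""
    address = chr_ppu_address(table, row * 16 + column)
    return address >> 8, address & 0xFF


print([hex(b) for b in row_column_args(1, 0, 0)])  # ['0x10', '0x0']
print([hex(b) for b in row_column_args(1, 2, 5)])  # ['0x12', '0x50']
```

Row 8 of either table (address $0800 or $1800) is where the second 2 KB half of that table begins.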

External resources