Here’s a post about loading CHR data! NESMaker uses a macro, which in turn uses a subroutine, to load the graphic tiles into the PPU. In this post, I will dissect what NESMaker does under the hood exactly, while adding some context to how the NES loads tiles from the game into its picture processing unit.
Before we start
This post is probably not for absolute NESMaker beginners. It mentions and makes use of things like indirect addressing, bank switching and PPU addressing. If you are not comfortable with these principles, you may want to read up on those first.
Still with me? Okay, let’s deep dive into loading CHR data!
doLoadScreen
The LoadChrData macro gets called in the doLoadScreen subroutine. This subroutine gets called whenever a new screen is being loaded, presuming screen rendering is currently turned off. It does everything that’s needed to show the desired screen on your TV or monitor. First, it removes all objects from the screen. Then it loads all data needed to show the new screen in its initial state: screen table data, palettes, background and sprite tiles, nametable data, attribute data and collision data. Thereafter, it loads the player object in the starting position of the screen and draws the HUD if needed. Finally, it runs some optional game-specific custom routines and turns on the screen.
I won’t go into detail for all those routines, but instead zoom in on the part that loads the graphic tiles into the PPU.
LoadChrData
All graphic tiles are stored in various ROM banks. LoadChrData is the macro that retrieves the tile data from those banks and transfers it to the PPU. This is supposed to be done while screen rendering is turned off, so that the graphics are updated when the game is not actually drawing anything on screen, which prevents graphical glitches. This is why a buffer may sometimes be used for screen updates; this macro does not use such a buffer though.
This is how the macro is declared:
MACRO LoadChrData arg0, arg1, arg2, arg3, arg4, arg5, arg6
The macro takes a whopping seven arguments:
- arg0 is the bank number to draw from. As said before, graphics are stored among different banks. This argument tells the macro which bank should be referenced.
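- arg1 and arg2 are the pattern table row and column of the PPU to load CHR data into. The subroutine writes them to the PPU address register as the high and low byte of the target address. You can find a schematic overview later in the post to show what this means exactly.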
- arg3 is the number of tiles to load. These will vary based on the tile template used in the NESMaker UI, and whether background or sprite tiles are being loaded.
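- arg4 and arg5 are the two lookup tables to load the graphics address from. Each table holds one byte of that address per entry; following the code below, the value read through arg4 ends up as the low byte and the value read through arg5 as the high byte. Together, they form the memory address in the graphics bank where tiles are stored within the game.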
- arg6 is the table index number, or the position in the graphics table starting from the memory address referenced by arg4 and arg5.
Okay, so what does the macro do? First, it stores some of the arguments in “safe” variables, so the initial variables can’t be overwritten within the macro itself by mistake. This is what that looks like in the code:
    ;; Store bank number in tempBank variable
    LDA arg0
    STA tempBank

    ;; Store other arguments in "hold" variables (to prevent
    ;; overwriting any assigned variables called by the macro)
    LDA arg1
    STA arg1_hold
    LDA arg2
    STA arg2_hold
    LDA arg3
    STA arg3_hold
    LDA arg6
    STA arg6_hold
Then, it needs to switch to the bank where the graphic table data pointers are stored, which is bank #$16 by default. There, the macro retrieves the two bytes of the memory address of the first tile to load, using arg4 and arg5 to look those values up in the CHR lookup tables. In code, it looks like this:
    ;; Switch to bank #$16 (which holds various lookup tables
    ;; for pointer memory addresses)
    SwitchBank #$16

    ;; Load table index in y-register
    LDY arg6

    ;; Load table pointer to load from in temp16 variable
    ;; (two byte memory address)
    LDA #<arg4
    STA temp16
    LDA #>arg4
    STA temp16+1

    ;; Load the needed value from the pointer table, based
    ;; on the index value put in the macro, and store in the
    ;; temp variable. This value ends up in temp16 below,
    ;; so it is the low byte.
    LDA (temp16),y
    STA temp

    ;; Load second table pointer to load from in temp16
    ;; variable (two byte memory address)
    LDA #<arg5
    STA temp16
    LDA #>arg5
    STA temp16+1

    ;; Load the needed value from the pointer table, based
    ;; on the index value put in the macro, and store in the
    ;; temp1 variable. This value ends up in temp16+1 below,
    ;; so it is the high byte.
    LDA (temp16),y
    STA temp1

    ;; Copy both low and high byte of the graphics ROM
    ;; memory location into the two byte temp16 variable.
    LDA temp
    STA temp16
    LDA temp1
    STA temp16+1

    ReturnBank
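To make the lookup a bit more tangible, here is a small Python model of it. This is only a sketch under assumptions: the table contents and the names CHR_ADDR_LO, CHR_ADDR_HI and lookup_tile_address are made up for illustration and do not come from NESMaker. The one thing taken from the macro is that the byte read through arg4 is stored in temp16 and the byte read through arg5 in temp16+1, which the 6502 treats as the low and high byte of a pointer.

```python
# Hypothetical split pointer tables, as they might sit in bank $16.
# One table holds the low byte of each graphics address, the other
# the high byte; the same index selects a matching pair.
CHR_ADDR_LO = [0x00, 0x00, 0x00]  # made-up values
CHR_ADDR_HI = [0x80, 0x90, 0xA0]  # made-up values


def lookup_tile_address(lo_table, hi_table, index):
    """Model the two LDA (temp16),y reads and combine the result."""
    temp16_lo = lo_table[index]  # read through arg4, ends up in temp16
    temp16_hi = hi_table[index]  # read through arg5, ends up in temp16+1
    # The 6502 is little-endian: temp16 is the low byte of the pointer.
    return (temp16_hi << 8) | temp16_lo


address = lookup_tile_address(CHR_ADDR_LO, CHR_ADDR_HI, 1)
print(f"${address:04X}")  # $9000
```

Splitting the addresses over two one-byte tables means a single index in the y-register is enough to fetch both halves.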
Now we know where the first tile graphic data is stored, and the two-byte (temp16) variable holds that memory address. So far, that’s all we’ve done; with this data, we can finally load the graphics and transfer them over to the PPU. We call a subroutine to do this. When the subroutine has finished and returned to the macro, that is where the macro ends:
    ;; Now we have the correct address in (temp16), so we can
    ;; actually load the CHR data into the PPU through the
    ;; following subroutine.
    JSR doLoadChrRam

    ;; End macro
    ENDM
doLoadChrRam
So far, the macro has prepared the tiles to load by retrieving the memory address where the first tile is stored. tempBank holds the bank, temp16 and temp16+1 hold the memory address, arg1_hold and arg2_hold hold the memory location in the PPU to transfer the tiles to, and arg3_hold is used to loop through all tiles that need to be transferred. So now, let’s take a look at the subroutine that is called by the macro: doLoadChrRam. First, I’ll add the entire subroutine script, and then I’ll explain a bit more what it does exactly.
    doLoadChrRam:
        ;; Switch to the appropriate CHR RAM bank
        SwitchBank tempBank

        ;; Reset the latch
        BIT $2002

        ;; Load correct memory address
        LDA arg1_hold
        STA $2006
        LDA arg2_hold
        STA $2006

    ;; Loop through the number of tiles (arg3_hold) to be drawn
    LoadTilesOuterLoop:
        ;; Set tile counter to 16 (this helps increasing the
        ;; temp16 high byte when the y-register overflows)
        LDA #$10
        STA temp1
        LDY #$00

    LoadTilesLoop:
        LDX #$10

    ;; Copy current tile (16 bytes) into PPU memory
    LoadChrRamLoop:
        LDA (temp16),y
        STA $2007
        INY
        DEX
        BNE LoadChrRamLoop

        ;; Now a full tile has been loaded.
        ;; Check if we have loaded 16 tiles yet.
        DEC temp1
        BNE keepLoading

        ;; Check if there are more tiles to load
        DEC arg3_hold
        BEQ doneLoadingTiles

        ;; Increase the high byte of the address in
        ;; (temp16)
        INC temp16+1
        JMP LoadTilesOuterLoop

    keepLoading:
        ;; Check if there are more tiles to load
        DEC arg3_hold
        BNE LoadTilesLoop

    ;; All tiles in the current "chunk" have been loaded
    doneLoadingTiles:
        ;; Swap back previous bank
        ReturnBank

        ;; Return
        RTS
There’s a lot to delve into here! Okay, let’s go. First, we need to swap in the bank where the tile data is stored. Then we need to reset the address latch, or the PPU address register may not know whether the written byte is a high or low byte. Reading the PPU status register ($2002) resolves this. Thereafter, we set the high and low byte of the PPU address to write to by writing two bytes to the PPU address register, $2006. This basically sets up the writing of CHR data to the PPU.
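Why the latch reset matters is easiest to see in a toy model. The Python sketch below is a simplification, not a full PPU: the class PpuAddressPort, the helper set_address and the byte values are all made up for illustration. It only tracks whether the next write to $2006 counts as the high or the low byte.

```python
class PpuAddressPort:
    """Toy model of the $2002/$2006 write toggle, not a full PPU."""

    def __init__(self):
        self.address = 0x0000
        self.next_is_high = True  # the next $2006 write is the high byte

    def read_status(self):
        """Reading $2002 (e.g. BIT $2002) resets the toggle."""
        self.next_is_high = True

    def write_address(self, value):
        """Model one write to $2006."""
        value &= 0xFF
        if self.next_is_high:
            self.address = (value << 8) | (self.address & 0x00FF)
        else:
            self.address = (self.address & 0xFF00) | value
        self.next_is_high = not self.next_is_high


def set_address(ppu, high, low, reset_latch):
    """Write a two-byte address, optionally resetting the toggle first."""
    if reset_latch:
        ppu.read_status()
    ppu.write_address(high)
    ppu.write_address(low)
    return ppu.address


# Made-up scenario: earlier code left a single $2006 write behind.
out_of_sync = PpuAddressPort()
out_of_sync.write_address(0x3F)
print(hex(set_address(out_of_sync, 0x10, 0x00, reset_latch=False)))  # 0x10

in_sync = PpuAddressPort()
in_sync.write_address(0x3F)
print(hex(set_address(in_sync, 0x10, 0x00, reset_latch=True)))  # 0x1000
```

Without the reset, the intended high byte lands in the low half of the address and the tiles would end up in the wrong place.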
Now, we set up three nested loops. The outer loop (LoadTilesOuterLoop) runs once per row of sixteen tiles. It is needed because sixteen tiles add up to 256 bytes, at which point the y-register overflows back to zero; to keep loading new tiles instead of the same ones again, the high byte of the address in temp16 has to be increased (INC temp16+1) before carrying on. The middle loop (LoadTilesLoop) runs once per tile within that row, and the innermost loop (LoadChrRamLoop) copies the sixteen bytes of a single tile. Those sixteen bytes form two bit planes of eight bytes each: every byte holds one bit for each of the eight pixels in a pixel row, and the two planes together give each pixel a two-bit color.
We load each byte from the switched-in bank, from the address in (temp16) with an extra offset of y. By writing that byte to the PPU data port ($2007), the byte gets transferred to the PPU; the PPU advances its own address after every write, which is why $2006 only has to be set once. After sixteen bytes, the first tile has been loaded. Now we check whether the row is done (i.e. decrease temp1 and check if it is zero), and whether all tiles have been loaded (i.e. decrease arg3_hold and check if it is zero). When all tiles have been loaded into the PPU, we swap back the original bank from before calling the subroutine, and return to where the subroutine was called; in this case, at the end of the LoadChrData macro.
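The loop structure can be replayed in Python to check that the page handling works out. This is a rough model, not NESMaker code: the function name do_load_chr_ram, the fake ROM contents and the start address $8000 are assumptions for the sake of the example. The variables temp1, y and arg3_hold mirror the ones in the subroutine.

```python
def do_load_chr_ram(rom, pointer, tile_count):
    """Model of the copy loops; returns the bytes written to $2007.

    rom is indexed by CPU address, pointer plays the role of (temp16)
    and tile_count the role of arg3_hold.
    """
    written = []
    arg3_hold = tile_count & 0xFF
    while True:                        # LoadTilesOuterLoop
        temp1 = 0x10                   # 16 tiles per 256-byte page
        y = 0x00
        while True:                    # LoadTilesLoop
            for _ in range(0x10):      # LoadChrRamLoop: one 16-byte tile
                written.append(rom[pointer + y])
                y = (y + 1) & 0xFF     # INY wraps around after 255
            temp1 -= 1                 # DEC temp1
            arg3_hold = (arg3_hold - 1) & 0xFF  # DEC arg3_hold
            if arg3_hold == 0:
                return written         # doneLoadingTiles
            if temp1 == 0:
                pointer += 0x100       # INC temp16+1
                break                  # JMP LoadTilesOuterLoop


# Fake ROM: 20 made-up tiles at $8000, each byte set to its tile number.
rom = bytearray(0x10000)
for offset in range(20 * 16):
    rom[0x8000 + offset] = offset // 16

data = do_load_chr_ram(rom, 0x8000, 20)
print(len(data))           # 320
print(data[0], data[256])  # 0 16
```

Byte 256 comes from tile 16 rather than from tile 0 again, which is exactly what increasing the high byte is for. Because the counter is decreased before it is tested, a tile count of zero wraps around to 255 in this model and copies 256 tiles.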
Hooray, we have now loaded a bunch of tiles for use within the current screen of the game!
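For reference, this is what the sixteen bytes of one such tile encode. Each tile consists of two bit planes of eight bytes: the first plane holds the low bit of every pixel, the second plane the high bit. The Python sketch below decodes one tile into color indices; the function name decode_tile and the sample bytes are made up for illustration.

```python
def decode_tile(tile_bytes):
    """Turn 16 CHR bytes into 8 rows of 8 two-bit color indices."""
    if len(tile_bytes) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for row in range(8):
        plane0 = tile_bytes[row]      # low bit of each pixel in this row
        plane1 = tile_bytes[row + 8]  # high bit of each pixel in this row
        pixels = []
        for bit in range(7, -1, -1):  # bit 7 is the leftmost pixel
            low = (plane0 >> bit) & 1
            high = (plane1 >> bit) & 1
            pixels.append((high << 1) | low)
        rows.append(pixels)
    return rows


# Made-up tile: only the top pixel row is non-empty.
tile = bytes([0xF0] + [0x00] * 7 + [0xCC] + [0x00] * 7)
print(decode_tile(tile)[0])  # [3, 3, 1, 1, 2, 2, 0, 0]
```

The color index (0 to 3) selects a color from the palette assigned to the tile; index 0 is the transparent or background color.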
What about the PPU row and column values?
As mentioned before, the macro uses two arguments (arg1 and arg2) to define where in the PPU the tiles should be loaded. To conclude things for now, below you find a schematic overview of how NESMaker uses these values to load graphics into the PPU.
The PPU has two 4 KB tables to store pattern data (i.e. graphic tiles). NESMaker uses PPU address $0000-$0FFF (or pattern table 0) for sprite graphics, and $1000-$1FFF (or pattern table 1) for background graphics. These two tables are further divided into smaller chunks of graphics. The sprite graphics are divided into 2 KB of game object graphics, which can be used on every screen, and 2 KB of monster graphics, which can be changed on a per-screen basis. The background graphics are divided into smaller chunks, based on which tile template is used. The schema below shows the example of the normal tile template (Main/Screen/Path).
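The relation between the two arguments and a position in the pattern tables boils down to a little arithmetic: a tile is 16 bytes and a row is 16 tiles, so one row spans 256 bytes. The Python sketch below works this out; the function name ppu_target and the sample values are made up, and it assumes the resulting address lies within the pattern tables ($0000-$1FFF).

```python
def ppu_target(arg1, arg2):
    """Combine the two $2006 writes and describe where they point."""
    address = ((arg1 & 0xFF) << 8) | (arg2 & 0xFF)
    pattern_table = address >> 12          # 0: $0000-$0FFF, 1: $1000-$1FFF
    tile_index = (address & 0x0FFF) // 16  # 16 bytes per tile
    row, column = divmod(tile_index, 16)   # 16 tiles per row
    return address, pattern_table, row, column


print(ppu_target(0x14, 0x80))  # (5248, 1, 4, 8)
print(ppu_target(0x08, 0x00))  # (2048, 0, 8, 0)
```

In other words, the first value picks the pattern table and the row within it, while the second value picks the column in steps of $10. The second example, $0800, is where the second 2 KB half of pattern table 0 begins.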
External resources