November 29, 2023

How NES Emulation Works

Denis Selimović, Backend Engineer
Gaming has been an integral part of childhood since the first gaming consoles came around. One such console is the Nintendo Entertainment System (NES), one of the best-selling consoles out there. We've all played games like Super Mario Bros., The Legend of Zelda, or Castlevania III. These games were originally released for this console, which is no longer in production, and that leads us to the concept of emulation. Emulation is the process of implementing the interface and functionality of one system on a system with a different interface and functionality. In short, one computer system simulates the behavior of another.

The NES is based on the 6502 microprocessor (more precisely the Ricoh 2A03, a 6502 derivative without decimal mode) and uses the instruction set of the same name. Emulating the CPU involves reading one instruction at a time in an interpreter loop. After reading, the instruction is decoded to determine its type and the number of operands. Based on the type, the corresponding routine for simulating the instruction on the host computer system is invoked. Each instruction updates the state of the interpreter, which includes data memory and the general and special-purpose registers.
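The fetch, decode, and execute steps can be sketched as a small loop. This is a minimal illustration, not a full CPU: it handles only three real 6502 opcodes (LDA immediate, TAX, INX), ignores status flags, and treats BRK as "halt", which is an assumption made purely for the example. The names `run` and `cpu` are hypothetical.

```python
def run(program: bytes, start: int = 0x8000) -> dict:
    """Interpret a tiny subset of 6502 code until BRK (0x00) is reached."""
    memory = bytearray(0x10000)              # flat 64 KB address space
    memory[start:start + len(program)] = program
    cpu = {"a": 0, "x": 0, "pc": start}

    while True:
        opcode = memory[cpu["pc"]]           # fetch the opcode byte
        cpu["pc"] += 1
        if opcode == 0xA9:                   # LDA #imm: load a constant into A
            cpu["a"] = memory[cpu["pc"]]
            cpu["pc"] += 1
        elif opcode == 0xAA:                 # TAX: copy A into X
            cpu["x"] = cpu["a"]
        elif opcode == 0xE8:                 # INX: increment X, wrapping at 8 bits
            cpu["x"] = (cpu["x"] + 1) & 0xFF
        elif opcode == 0x00:                 # BRK: used as "halt" in this sketch only
            return cpu
        else:
            raise ValueError(f"unimplemented opcode {opcode:#04x}")
```

A real emulator would replace the `if` chain with a lookup table and would update the status flags after each instruction.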

The NES CPU has a 64 KB address space divided roughly into ROM, RAM, and registers. The ROM content depends on the loaded game and remains unchanged during program execution. The interpreter loop reads the next instruction from this content. The RAM region spans 8 KB and consists of four 2 KB blocks. Only the first 2 KB is physical memory; the other three blocks are mirrors of it, so the actual capacity is 2 KB. An emulator can exploit this by reducing any read or write in the 8 KB range to an offset within the first 2 KB. The remaining address space is used for registers, sound, and graphics.
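The mirroring trick comes down to a single bit mask. The sketch below assumes the RAM region occupies addresses 0x0000 to 0x1FFF; the `Bus` class and its method names are hypothetical, and everything outside the RAM region is left unimplemented.

```python
RAM_SIZE = 0x0800          # 2 KB of physical RAM
RAM_REGION_END = 0x2000    # the RAM region spans four 2 KB blocks


class Bus:
    def __init__(self) -> None:
        self.ram = bytearray(RAM_SIZE)

    def read(self, addr: int) -> int:
        if addr < RAM_REGION_END:
            return self.ram[addr & (RAM_SIZE - 1)]   # fold mirrors onto the first 2 KB
        raise NotImplementedError("only the RAM region is modelled in this sketch")

    def write(self, addr: int, value: int) -> None:
        if addr < RAM_REGION_END:
            self.ram[addr & (RAM_SIZE - 1)] = value & 0xFF
            return
        raise NotImplementedError("only the RAM region is modelled in this sketch")
```

Writing to 0x0001 and reading from 0x0801 or 0x1801 returns the same byte, because all three addresses reduce to offset 0x0001.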

There are six registers. Five of them are 8-bit; the program counter is 16-bit because it has to address the whole 64 KB space. Three registers are general purpose: the accumulator, X, and Y. The accumulator stores intermediate results of arithmetic or logical operations. X and Y are index registers used as offsets during addressing. Additionally, the stack pointer value can be transferred to and from the X register. The program counter, stack pointer, and status register are special-purpose registers.
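One way to hold this state is a small record type. The sketch below is an assumption about layout, not a required design: the `Registers` name and the power-up defaults are illustrative (0xFD for the stack pointer and 0x24 for the status byte are common emulator starting values). The two methods mirror the TXS and TSX transfer instructions, with flag updates omitted.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    a: int = 0          # accumulator (8-bit)
    x: int = 0          # index register X (8-bit)
    y: int = 0          # index register Y (8-bit)
    sp: int = 0xFD      # stack pointer (8-bit); assumed starting value
    status: int = 0x24  # status flags (8-bit); assumed starting value
    pc: int = 0         # program counter (16-bit)

    def txs(self) -> None:
        """Copy X into the stack pointer."""
        self.sp = self.x & 0xFF

    def tsx(self) -> None:
        """Copy the stack pointer into X (flag updates omitted in this sketch)."""
        self.x = self.sp & 0xFF
```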

The program counter contains the address of the next instruction in the ROM. After executing one iteration of the interpreter loop, the program counter points to the next instruction. Jumps, branches, subroutine calls, and interrupts can update this register, changing the program execution context. The stack occupies 256 bytes (addresses 0x0100 to 0x01FF), and the stack pointer holds the offset of the top of the stack within that page. Each stack write decrements the pointer, and each read from the top increments it. This matters for implementation: the stack grows downward in memory, unlike the stack data structures usually found in modern programming languages.
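The downward-growing stack can be sketched as follows. The `Stack` class is hypothetical; it assumes the stack lives in page 0x01 and that the pointer starts at 0xFD, a common emulator convention.

```python
STACK_BASE = 0x0100   # the stack occupies 0x0100-0x01FF


class Stack:
    def __init__(self, memory: bytearray, sp: int = 0xFD) -> None:
        self.memory = memory
        self.sp = sp                              # 8-bit offset into the stack page

    def push(self, value: int) -> None:
        self.memory[STACK_BASE + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF            # a write moves the pointer down

    def pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF            # a read moves the pointer back up
        return self.memory[STACK_BASE + self.sp]
```

Masking with 0xFF makes the pointer wrap around inside the page instead of leaving it, which matches the 8-bit width of the register.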

The status register stores eight flag bits, each with a different meaning and use in various instructions. It can be implemented using an 8-bit integer with bit masks, or possibly using enumerated flag constants or bit-fields in C or C++.

Interrupts are used to handle unexpected events or communication with I/O devices. These are signals that disrupt the normal program execution flow to execute a special handling routine. For I/O communication, IRQ (interrupt request) is used, where a given device requests access to the processor via the bus. When several devices raise requests, their priority determines the order in which they get processor time.
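The status register described above, including the interrupt-disable bit that decides whether an IRQ is serviced, can be sketched with an enumerated flag type over a plain 8-bit integer. The bit positions follow the standard 6502 layout; the helper names `set_flag`, `update_zn`, and `irq_allowed` are hypothetical.

```python
from enum import IntFlag


class Flag(IntFlag):
    C = 0x01  # carry
    Z = 0x02  # zero
    I = 0x04  # interrupt disable
    D = 0x08  # decimal mode (present in the register, ignored by the NES CPU)
    B = 0x10  # break
    U = 0x20  # unused
    V = 0x40  # overflow
    N = 0x80  # negative


def set_flag(status: int, flag: Flag, on: bool) -> int:
    """Return the status byte with one flag set or cleared."""
    if on:
        return (status | int(flag)) & 0xFF
    return status & ~int(flag) & 0xFF


def update_zn(status: int, result: int) -> int:
    """Update Z and N from an 8-bit result, as loads and arithmetic do."""
    status = set_flag(status, Flag.Z, (result & 0xFF) == 0)
    return set_flag(status, Flag.N, bool(result & 0x80))


def irq_allowed(status: int) -> bool:
    """An IRQ is serviced only while the interrupt-disable flag is clear."""
    return (status & int(Flag.I)) == 0
```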

Another type of interrupt is the non-maskable interrupt (NMI), which has priority over all other interrupts and cannot be disabled. Before an interrupt routine runs, the processor's context must be saved so that instruction execution can continue after the routine is finished. The stack is used to store the processor's state: the program counter and the status register are pushed onto it. After this, the address of the interrupt routine is loaded into the program counter.
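The save-and-jump sequence can be sketched like this. The vector addresses are the standard 6502 ones (0xFFFA for NMI, 0xFFFE for IRQ); the dictionary-based `cpu` and the function name `enter_interrupt` are assumptions made for the example.

```python
NMI_VECTOR = 0xFFFA   # address of the NMI routine, stored low byte first
IRQ_VECTOR = 0xFFFE   # address of the IRQ routine, stored low byte first


def enter_interrupt(cpu: dict, memory: bytearray, vector: int) -> None:
    """Save pc and status on the stack, then jump to the routine at `vector`."""
    def push(value: int) -> None:
        memory[0x0100 + cpu["sp"]] = value & 0xFF
        cpu["sp"] = (cpu["sp"] - 1) & 0xFF

    push(cpu["pc"] >> 8)                        # return address, high byte
    push(cpu["pc"] & 0xFF)                      # return address, low byte
    push((cpu["status"] | 0x20) & ~0x10)        # status copy: bit 5 set, B clear
    cpu["status"] |= 0x04                       # block further IRQs in the handler
    cpu["pc"] = memory[vector] | (memory[vector + 1] << 8)
```

Restoring the context at the end of the routine is the mirror image: pull the status byte, then the low and high bytes of the program counter.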

When the interrupt routine finishes, the processor's context is restored to the previous state using the values from the stack.

The 6502 has 56 instructions, some of which support several addressing modes, giving a total of 151 operation codes (opcodes). These are what the interpreter loop decodes. Each opcode is encoded in 8 bits, which allows at most 256 codes; the remaining codes are unused or have special, undocumented behavior that certain games rely on. There are 12 supported addressing modes.
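The relationship between instructions and opcodes can be illustrated with a lookup table keyed by the opcode byte. The excerpt below lists the eight LDA opcodes and NOP with their standard sizes and base cycle counts; the `OpInfo` type and `decode` function are hypothetical names, the mode abbreviations follow this article, and a complete table would have 151 entries.

```python
from typing import NamedTuple


class OpInfo(NamedTuple):
    mnemonic: str
    mode: str      # addressing mode abbreviation
    size: int      # instruction length in bytes, opcode included
    cycles: int    # base cycle count, without page-crossing penalties


OPCODES = {
    0xA9: OpInfo("LDA", "IMM", 2, 2),
    0xA5: OpInfo("LDA", "ZP0", 2, 3),
    0xB5: OpInfo("LDA", "ZPX", 2, 4),
    0xAD: OpInfo("LDA", "AB0", 3, 4),
    0xBD: OpInfo("LDA", "ABX", 3, 4),
    0xB9: OpInfo("LDA", "ABY", 3, 4),
    0xA1: OpInfo("LDA", "IZX", 2, 6),
    0xB1: OpInfo("LDA", "IZY", 2, 5),
    0xEA: OpInfo("NOP", "IMP", 1, 2),
}


def decode(opcode: int) -> OpInfo:
    """Look up an opcode byte; unknown bytes are rejected in this sketch."""
    try:
        return OPCODES[opcode]
    except KeyError:
        raise ValueError(f"unknown opcode {opcode:#04x}") from None
```

Here one instruction (LDA) accounts for eight of the opcodes, which is how 56 instructions grow into 151 codes.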

Three of the addressing modes operate on the zero page of memory, the first 256 bytes: ZP0 uses the zero-page address directly, while ZPX and ZPY add an offset from register X or Y. IMP addressing is used for instructions that have no operands or that operate exclusively on the accumulator. IMM addressing uses the constant that follows the opcode as the operand itself, not as an address in memory. There are three absolute addressing modes (AB0, ABX, ABY) that use a full 16-bit address with a possible offset from registers. REL addressing is used for branch instructions, where the operand is a signed offset from the program counter. Indirect addressing (pointers) is implemented through IND, IZX, or IZY addressing.
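Most of these modes reduce to computing an effective address from the operand bytes. The sketch below covers nine of the twelve modes (IMP, REL, and IND are left out for brevity) and uses this article's abbreviations; the function name and signature are assumptions, and `pc` is taken to point at the first operand byte.

```python
def effective_address(mode: str, memory: bytearray, pc: int,
                      x: int = 0, y: int = 0) -> int:
    """Return the address of the operand for the given addressing mode."""
    lo = memory[pc]
    hi = memory[(pc + 1) & 0xFFFF]
    absolute = (hi << 8) | lo

    if mode == "IMM":
        return pc                               # the operand is the byte at pc itself
    if mode == "ZP0":
        return lo
    if mode == "ZPX":
        return (lo + x) & 0xFF                  # wraps inside the zero page
    if mode == "ZPY":
        return (lo + y) & 0xFF
    if mode == "AB0":
        return absolute
    if mode == "ABX":
        return (absolute + x) & 0xFFFF
    if mode == "ABY":
        return (absolute + y) & 0xFFFF
    if mode == "IZX":                           # pointer at (operand + X) in zero page
        ptr = (lo + x) & 0xFF
        return memory[ptr] | (memory[(ptr + 1) & 0xFF] << 8)
    if mode == "IZY":                           # pointer in zero page, then add Y
        base = memory[lo] | (memory[(lo + 1) & 0xFF] << 8)
        return (base + y) & 0xFFFF
    raise ValueError(f"mode {mode} is not covered by this sketch")
```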

All of the above can be used to implement an emulator for the NES console's central processing unit. The game is loaded into ROM, which also holds the initial value of the program counter (the reset vector, stored at addresses 0xFFFC and 0xFFFD). The instruction (one or more bytes) is read from the address pointed to by the program counter, and then it is decoded based on its opcode and addressing mode. Each instruction is associated with a separate routine that emulates its effect on the target computer system while updating the state of registers and memory.
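Loading a game and finding the first instruction can be sketched as follows. This assumes the simplest possible case, which is not true of every cartridge: a raw 32 KB program image mapped at 0x8000 to 0xFFFF, with no file header and no bank switching. The function name `load_and_reset` is hypothetical.

```python
RESET_VECTOR = 0xFFFC   # standard 6502 reset vector, stored low byte first


def load_and_reset(prg_rom: bytes):
    """Map a 32 KB program image at 0x8000-0xFFFF and return (memory, initial pc)."""
    if len(prg_rom) != 0x8000:
        raise ValueError("this sketch expects exactly 32 KB of program ROM")
    memory = bytearray(0x10000)
    memory[0x8000:] = prg_rom                   # ROM fills the upper half of the space
    pc = memory[RESET_VECTOR] | (memory[RESET_VECTOR + 1] << 8)
    return memory, pc
```

The returned `pc` is where the interpreter loop starts fetching instructions.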

Different instructions take a different number of clock cycles, which must be considered during implementation. Once the cycles of the current instruction have elapsed, a check for a pending IRQ or NMI is performed, and the interrupt routine is called if necessary. This procedure is repeated in each iteration of the interpreter loop.
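One common way to account for cycles is to let the emulator tick once per clock cycle and only start new work when the previous instruction's cycles have been used up. The sketch below makes several simplifying assumptions: interrupts are checked only at instruction boundaries, a pending IRQ is cleared once serviced, and `execute_next` is a hypothetical callback that runs one instruction and returns its cycle count. The 7-cycle cost of entering an interrupt is the standard 6502 figure.

```python
class CycleCpu:
    def __init__(self, execute_next) -> None:
        self.execute_next = execute_next   # callback: run one instruction, return cycles
        self.remaining = 0                 # cycles left for the current instruction
        self.total_cycles = 0
        self.nmi_pending = False
        self.irq_pending = False
        self.irq_disabled = True           # mirrors the I flag of the status register
        self.serviced = []                 # log of serviced interrupts, for illustration

    def clock(self) -> None:
        """Advance the CPU by exactly one clock cycle."""
        if self.remaining == 0:
            if self.nmi_pending:                       # NMI wins over IRQ
                self.nmi_pending = False
                self.serviced.append("NMI")
                self.remaining = 7
            elif self.irq_pending and not self.irq_disabled:
                self.irq_pending = False
                self.serviced.append("IRQ")
                self.remaining = 7
            else:
                self.remaining = self.execute_next()
        self.remaining -= 1
        self.total_cycles += 1
```

Driving the CPU one cycle at a time makes it easier to keep it in step with the graphics and sound hardware, which run off the same clock.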