Verifying emulator correctness - part 1 (diff)
I’ve been creating emulators for a while now, and something that’s always a bit difficult is verifying that they work as they should. Is ADC setting the carry flag correctly? Is POP incrementing SP as it should?
Instruction exercisers, which people write to verify emulator correctness, are an invaluable tool. Examples are 8080EXM for the Intel 8080 and zexall for the Z80. These programs tell us whether a group of instructions is implemented correctly. However, when a test fails, the program does not tell us what is wrong with our implementation.
For example, zexall prints this when a particular test passes:
<rlca,rrca,rla,rra>........... OK
And something like this when it’s failing:
<rlca,rrca,rla,rra>........... ERROR **** crc expected:9ba3807c found:944e7aed
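The CRC in the error line is why a failure is so uninformative: a checksum condenses everything the test observed into one 32-bit number, so a single wrong flag bit changes the result without hinting at which instruction caused it. A minimal sketch of that effect, using a generic bitwise CRC-32 (this is not the exerciser's own routine; `crc32_update` and `crc32` are names made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// Generic reflected CRC-32 (polynomial 0xEDB88320), one bit at a time.
// Illustrative only: not copied from zexall or zexdoc.
std::uint32_t crc32_update(std::uint32_t crc, const std::uint8_t* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            const std::uint32_t mask = (crc & 1u) ? 0xEDB88320u : 0u;
            crc = (crc >> 1) ^ mask;
        }
    }
    return crc;
}

// Convenience wrapper with the usual initial value and final inversion.
std::uint32_t crc32(const std::uint8_t* data, std::size_t len) {
    return ~crc32_update(0xFFFFFFFFu, data, len);
}
```

Feeding it two byte sequences that differ in one bit gives two unrelated-looking checksums, which is all the exerciser output can tell us.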
In this post I’ll explain how I figure out where the errors are when the tests are failing. This particular post will use diff for finding errors. The next post will explain how I deal with tests that generate too much data to be run to completion before diffing.
I use a Z80 emulator in the examples below, but the principles are the same for all CPUs.
Assembling with zmac
Assume we run the Z80 test binary zexdoc and only the <rlca,rrca,rla,rra> test fails. The output would then look something like this:
Z80doc instruction exerciser
<adc,sbc> hl,<bc,de,hl,sp>.... OK
add hl,<bc,de,hl,sp>.......... OK
add ix,<bc,de,ix,sp>.......... OK
add iy,<bc,de,iy,sp>.......... OK
aluop a,nn.................... OK
aluop a,<b,c,d,e,h,l,(hl),a>.. OK
aluop a,<ixh,ixl,iyh,iyl>..... OK
aluop a,(<ix,iy>+1)........... OK
bit n,(<ix,iy>+1)............. OK
bit n,<b,c,d,e,h,l,(hl),a>.... OK
cpd<r>........................ OK
cpi<r>........................ OK
<daa,cpl,scf,ccf>............. OK
<inc,dec> a................... OK
<inc,dec> b................... OK
<inc,dec> bc.................. OK
<inc,dec> c................... OK
<inc,dec> d................... OK
<inc,dec> de.................. OK
<inc,dec> e................... OK
<inc,dec> h................... OK
<inc,dec> hl.................. OK
<inc,dec> ix.................. OK
<inc,dec> iy.................. OK
<inc,dec> l................... OK
<inc,dec> (hl)................ OK
<inc,dec> sp.................. OK
<inc,dec> (<ix,iy>+1)......... OK
<inc,dec> ixh................. OK
<inc,dec> ixl................. OK
<inc,dec> iyh................. OK
<inc,dec> iyl................. OK
ld <bc,de>,(nnnn)............. OK
ld hl,(nnnn).................. OK
ld sp,(nnnn).................. OK
ld <ix,iy>,(nnnn)............. OK
ld (nnnn),<bc,de>............. OK
ld (nnnn),hl.................. OK
ld (nnnn),sp.................. OK
ld (nnnn),<ix,iy>............. OK
ld <bc,de,hl,sp>,nnnn......... OK
ld <ix,iy>,nnnn............... OK
ld a,<(bc),(de)>.............. OK
ld <b,c,d,e,h,l,(hl),a>,nn.... OK
ld (<ix,iy>+1),nn............. OK
ld <b,c,d,e>,(<ix,iy>+1)...... OK
ld <h,l>,(<ix,iy>+1).......... OK
ld a,(<ix,iy>+1).............. OK
ld <ixh,ixl,iyh,iyl>,nn....... OK
ld <bcdehla>,<bcdehla>........ OK
ld <bcdexya>,<bcdexya>........ OK
ld a,(nnnn) / ld (nnnn),a..... OK
ldd<r> (1).................... OK
ldd<r> (2).................... OK
ldi<r> (1).................... OK
ldi<r> (2).................... OK
neg........................... OK
<rrd,rld>..................... OK
<rlca,rrca,rla,rra>........... ERROR **** crc expected:251330ae found:2afeca3f
shf/rot (<ix,iy>+1)........... OK
shf/rot <b,c,d,e,h,l,(hl),a>.. OK
<set,res> n,<bcdehl(hl)a>..... OK
<set,res> n,(<ix,iy>+1)....... OK
ld (<ix,iy>+1),<b,c,d,e>...... OK
ld (<ix,iy>+1),<h,l>.......... OK
ld (<ix,iy>+1),a.............. OK
ld (<bc,de>),a................ OK
Tests complete
Running all the tests takes a lot of time, and most of that time is spent on instructions we already know are fine: zexdoc runs more than 5 billion instructions. By modifying the zexdoc source code and reassembling the binary, we can make it run only the tests we are interested in.
We can use zmac (Z-80 Macro Cross Assembler) to assemble Z80 and 8080 programs. Download the zip file from the zmac website, then either use the precompiled binary if you are on Windows, or compile it yourself if you are on Linux or Mac.
To compile:
$ unzip zmac.zip
$ cd src/
$ make
That’s it. We can now assemble Z80 and 8080 programs like so:
$ ./zmac zexdoc.src
$ ls zout/
zexdoc.250.cas zexdoc.250.wav zexdoc.ams zexdoc.bds zexdoc.cim zexdoc.cmd zexdoc.hex zexdoc.lst zexdoc.mds zexdoc.tap
The newly created zexdoc.cim binary is the one that’s interesting.
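zexdoc is a CP/M program, so the raw image is meant to sit at address 0x0100. That is consistent with the first log line further down, where PC is 0x0101 right after the first opcode fetch. A minimal sketch of loading the image into a flat 64 KB memory, assuming hypothetical `read_file` and `load_cim` helpers (my emulator's actual loader is not shown in this post):

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::size_t kMemorySize = 0x10000;  // 64 KB address space
constexpr std::size_t kLoadAddress = 0x0100;  // CP/M programs start here

// Copies a raw program image into memory at kLoadAddress.
// Throws if the image does not fit below the top of memory.
void load_cim(const std::vector<std::uint8_t>& image,
              std::array<std::uint8_t, kMemorySize>& memory) {
    if (image.size() > kMemorySize - kLoadAddress) {
        throw std::length_error("image too large for 64 KB memory");
    }
    std::copy(image.begin(), image.end(), memory.begin() + kLoadAddress);
}

// Reads a whole file (for example zout/zexdoc.cim) into a byte vector.
std::vector<std::uint8_t> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}
```

With these, `load_cim(read_file("zout/zexdoc.cim"), memory)` places the program, after which execution would start with PC set to 0x0100.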
To run only the tests we are interested in, we have to edit the source code (zexdoc.src). It contains a list of tests:
tests:
dw adc16
dw add16
dw add16x
dw add16y
dw alu8i
dw alu8r
dw alu8rx
dw alu8x
dw bitx
dw bitz80 ; not tested from there upwards
dw cpd1
dw cpi1
dw daaop ; can't use opcode as label
dw inca
dw incb
dw incbc
dw incc
dw incd
dw incde
dw ince
dw inch
dw inchl
dw incix
dw inciy
dw incl
dw incm
dw incsp
dw incx
dw incxh
dw incxl
dw incyh
dw incyl
dw ld161
dw ld162
dw ld163
dw ld164
dw ld165
dw ld166
dw ld167
dw ld168
dw ld16im
dw ld16ix
dw ld8bd
dw ld8im
dw ld8imx
dw ld8ix1
dw ld8ix2
dw ld8ix3
dw ld8ixy
dw ld8rr
dw ld8rrx
dw lda
dw ldd1
dw ldd2
dw ldi1
dw ldi2
dw negop ; jgh: can't use opcode as label
dw rldop ; jgh: can't use opcode as label
dw rot8080
dw rotxy
dw rotz80
dw srz80
dw srzx
dw st8ix1
dw st8ix2
dw st8ix3
dw stabd
dw 0
By modifying this list we can choose which tests to run. To run only the <rlca,rrca,rla,rra> tests, we keep dw rot8080 and the terminating dw 0, but delete the rest:
tests:
dw rot8080
dw 0
Then reassemble the file with ./zmac zexdoc.src as shown above. After running the new test binary in the emulator we get the following result:
Z80doc instruction exerciser
<rlca,rrca,rla,rra>........... ERROR **** crc expected:251330ae found:2afeca3f
Tests complete
Only a fraction of the instructions are run in this case, which makes testing against a reference emulator much easier.
Setting up our own emulator and the reference emulator
To figure out what’s wrong we can compare our emulator to a reference emulator that we know runs correctly. Superzazu’s Z80 emulator is one such emulator. We can tell from its GitHub page that it passes all the zexdoc tests.
The plan is to have each emulator print its state for every instruction, redirect what it prints to stdout into a log file, and then use a diffing tool to compare the two log files. There should be no differences: the reference emulator is known to be correct, so any deviation on our side is most likely a bug.
By the state of the emulator I mean the following:
- PC
- SP
- Currently executed opcode
- Content of all registers
- The flag register
The memory content is also part of the emulator’s state, but is in most cases too large to print out, so I’ll skip that.
We therefore have to create a function that prints the state in both emulators. In my own emulator I’ll add a call to the print function right before the big jump table in the CPU core (which is how I implement the opcode parsing):
m_opcode = get_next_byte().farg;
print_debug(m_opcode); // HERE
r_tick();

switch (m_opcode) {
    case NOP:
        nop(cycles);
        break;
}
I’m also adding the same call to the IX/IY, IX/IY bits, bits and EXTD jump tables.
The print function that prints the emulator’s state looks like this:
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Formats an 8-bit value as 0xNN.
std::string hexify(u8 val) {
    std::stringstream ss;
    ss << "0x" << std::setfill('0') << std::setw(2) << std::hex << static_cast<int>(val);
    return ss.str();
}

// Formats a 16-bit value as 0xNNNN, as needed for PC, SP, IX and IY.
// Assumption: u16 is the emulator's 16-bit unsigned alias, like u8 above.
std::string hexify(u16 val) {
    std::stringstream ss;
    ss << "0x" << std::setfill('0') << std::setw(4) << std::hex << static_cast<int>(val);
    return ss.str();
}

// Prints the complete CPU state on a single line.
void Cpu::print_debug(u8 opcode) {
    std::cout << "pc=" << hexify(m_pc)
              << ",sp=" << hexify(m_sp)
              << ",op=" << hexify(opcode)
              << ",a=" << hexify(m_acc_reg)
              << ",b=" << hexify(m_b_reg)
              << ",c=" << hexify(m_c_reg)
              << ",d=" << hexify(m_d_reg)
              << ",e=" << hexify(m_e_reg)
              << ",h=" << hexify(m_h_reg)
              << ",l=" << hexify(m_l_reg)
              << ",a'=" << hexify(m_acc_p_reg)
              << ",b'=" << hexify(m_b_p_reg)
              << ",c'=" << hexify(m_c_p_reg)
              << ",d'=" << hexify(m_d_p_reg)
              << ",e'=" << hexify(m_e_p_reg)
              << ",h'=" << hexify(m_h_p_reg)
              << ",l'=" << hexify(m_l_p_reg)
              << ",ix=" << hexify(m_ix_reg)
              << ",iy=" << hexify(m_iy_reg)
              << ",i=" << hexify(m_i_reg)
              << ",r=" << hexify(m_r_reg)
              << ",c=" << m_flag_reg.is_carry_flag_set() // carry flag; shares the label "c" with register C
              << ",po=" << m_flag_reg.is_parity_overflow_flag_set()
              << ",hc=" << m_flag_reg.is_half_carry_flag_set()
              << ",n=" << m_flag_reg.is_add_subtract_flag_set()
              << ",z=" << m_flag_reg.is_zero_flag_set()
              << ",s=" << m_flag_reg.is_sign_flag_set()
              << ",y=" << m_flag_reg.is_y_flag_set()
              << ",x=" << m_flag_reg.is_x_flag_set()
              << "\n"
              << std::flush;
}
The individual bits of the flag register are printed separately, which makes it easier to see where the error is when we encounter a diff. In general we want only one line per instruction (or two or three for IX/IY, bits or EXTD instructions, which pass through more than one jump table). We will print a lot of instructions, so each line should be kept short; otherwise the logs take up an unnecessary amount of disk space.
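If an emulator stores the flags as one packed byte rather than behind accessor functions, the same fields can be produced by shifting out each bit. A small sketch, assuming the standard Z80 F register layout and a hypothetical `format_flags` helper that is not part of either emulator:

```cpp
#include <cstdint>
#include <string>

// Expands a packed Z80 F register into the per-flag fields used in the log.
// Bit layout: S=7, Z=6, Y=5, H=4, X=3, P/V=2, N=1, C=0.
std::string format_flags(std::uint8_t f) {
    auto bit = [f](int n) { return std::to_string((f >> n) & 1); };
    return "c=" + bit(0) + ",po=" + bit(2) + ",hc=" + bit(4) + ",n=" + bit(1) +
           ",z=" + bit(6) + ",s=" + bit(7) + ",y=" + bit(5) + ",x=" + bit(3);
}
```

The field order matches the print function above, so the output can be appended to the register fields as is.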
Running zexdoc in the emulator now gives the following at the top of the log file. The log file itself is 2.8 GB, and that is for just one small test.
pc=0x0101,sp=0xffff,op=0xc3,a=0xff,b=0x00,c=0x00,d=0x00,e=0x00,h=0x00,l=0x00,a'=0x00,b'=0x00,c'=0x00,d'=0x00,e'=0x00,h'=0x00,l'=0x00,ix=0x0000,iy=0x0000,i=0x00,r=0x00,c=1,po=1,hc=1,n=1,z=1,s=1,y=1,x=1
pc=0x0114,sp=0xffff,op=0x2a,a=0xff,b=0x00,c=0x00,d=0x00,e=0x00,h=0x00,l=0x00,a'=0x00,b'=0x00,c'=0x00,d'=0x00,e'=0x00,h'=0x00,l'=0x00,ix=0x0000,iy=0x0000,i=0x00,r=0x01,c=1,po=1,hc=1,n=1,z=1,s=1,y=1,x=1
pc=0x0117,sp=0xffff,op=0xf9,a=0xff,b=0x00,c=0x00,d=0x00,e=0x00,h=0xc9,l=0x01,a'=0x00,b'=0x00,c'=0x00,d'=0x00,e'=0x00,h'=0x00,l'=0x00,ix=0x0000,iy=0x0000,i=0x00,r=0x02,c=1,po=1,hc=1,n=1,z=1,s=1,y=1,x=1
pc=0x0118,sp=0xc901,op=0x11,a=0xff,b=0x00,c=0x00,d=0x00,e=0x00,h=0xc9,l=0x01,a'=0x00,b'=0x00,c'=0x00,d'=0x00,e'=0x00,h'=0x00,l'=0x00,ix=0x0000,iy=0x0000,i=0x00,r=0x03,c=1,po=1,hc=1,n=1,z=1,s=1,y=1,x=1
The same print function has to be set up in the reference emulator. The reference emulator has to print out its state in exactly the same format as our emulator, otherwise the diffing tool will complain on every single line. Superzazu’s emulator is written in C, and the C version of print_debug looks like this:
void print_debug(z80* const z, uint8_t opcode) {
    printf("pc=0x%04x", z->pc);
    printf(",sp=0x%04x", z->sp);
    printf(",op=0x%02x", opcode);
    printf(",a=0x%02x", z->a);
    printf(",b=0x%02x", z->b);
    printf(",c=0x%02x", z->c);
    printf(",d=0x%02x", z->d);
    printf(",e=0x%02x", z->e);
    printf(",h=0x%02x", z->h);
    printf(",l=0x%02x", z->l);
    printf(",a'=0x%02x", z->a_);
    printf(",b'=0x%02x", z->b_);
    printf(",c'=0x%02x", z->c_);
    printf(",d'=0x%02x", z->d_);
    printf(",e'=0x%02x", z->e_);
    printf(",h'=0x%02x", z->h_);
    printf(",l'=0x%02x", z->l_);
    printf(",ix=0x%04x", z->ix);
    printf(",iy=0x%04x", z->iy);
    printf(",i=0x%02x", z->i);
    printf(",r=0x%02x", z->r);
    printf(",c=%d", z->cf);
    printf(",po=%d", z->pf);
    printf(",hc=%d", z->hf);
    printf(",n=%d", z->nf);
    printf(",z=%d", z->zf);
    printf(",s=%d", z->sf);
    printf(",y=%d", z->yf);
    printf(",x=%d", z->xf);
    printf("\n");
}
It’s called in the same places as in my emulator:
void exec_opcode(z80* const z, uint8_t opcode) {
    print_debug(z, opcode);
    z->cyc += cyc_00[opcode];
    inc_r(z);

    switch (opcode) {
        // ...
    }
}
Superzazu’s emulator is also set up to run three tests in z80_tests, so I’ll comment out the two unnecessary tests:
//r += run_test(&cpu, "roms/prelim.com", 8721LU);
r += run_test(&cpu, "roms/zexdoc.cim", 46734978649LU);
//r += run_test(&cpu, "roms/zexall.cim", 46734978649LU);
Comparing with diff
Now that both emulators are set up to print their state to stdout, we can run them both and redirect stdout to a log file:
$ ./emulator run zexdoc > sut.log
$ ./z80_tests > ref.log
The files are 2.8 GB, so let’s compare the first 10,000 lines first:
$ head -n 10000 sut.log > sut_short.log
$ head -n 10000 ref.log > ref_short.log
We can now use a diffing tool such as KDiff3, or just plain old diff, to compare the files.
In the KDiff3 screenshot of the diff, we can see that the first diff is at line 9375. The opcode executed just before the diff is rra. On the left side, which is my emulator’s output, the N flag is set after the rra, while it is unset in the reference emulator. According to the documentation, the rra instruction should always reset N, so that’s an error in my emulator.
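With close to thirty fields per line, spotting the one that differs by eye is tedious. A small sketch of doing it in code, assuming the comma-separated key=value format used above; `split_fields` and `differing_fields` are hypothetical helpers, not part of either emulator:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits "pc=0x0101,sp=0xffff,..." into its comma-separated "key=value" fields.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) fields.push_back(field);
    return fields;
}

// Returns one description per field that differs between the two lines.
// Fields are compared by position because the key "c" appears twice
// (register C and the carry flag).
std::vector<std::string> differing_fields(const std::string& ours,
                                          const std::string& reference) {
    const auto a = split_fields(ours);
    const auto b = split_fields(reference);
    std::vector<std::string> out;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) out.push_back(a[i] + " (reference: " + b[i] + ")");
    }
    if (a.size() != b.size()) out.push_back("field count differs");
    return out;
}
```

For a pair of lines that differ only in the N flag, the result would hold the single entry `n=1 (reference: n=0)`.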
After fixing the code, KDiff3 no longer reports a diff at that line.
In fact, it no longer reports any diffs, and the test now passes:
Z80doc instruction exerciser
<rlca,rrca,rla,rra>........... OK
Tests complete
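For reference, the behaviour the fix has to produce can be sketched as follows. This is not the actual code from my emulator: `MiniCpu` is a made-up, stripped-down state that models only the parts rra touches.

```cpp
#include <cstdint>

// Hypothetical, stripped-down CPU state; a real emulator carries much more.
struct MiniCpu {
    std::uint8_t a = 0;
    bool carry = false;
    bool half_carry = false;
    bool add_subtract = false;  // the N flag
};

// RRA: rotate A right through carry. The old carry enters bit 7, the old
// bit 0 becomes the new carry, and both H and N are reset.
// S, Z and P/V are left untouched, so they are not modelled here.
void rra(MiniCpu& cpu) {
    const bool old_carry = cpu.carry;
    cpu.carry = (cpu.a & 0x01) != 0;
    cpu.a = static_cast<std::uint8_t>((cpu.a >> 1) | (old_carry ? 0x80 : 0x00));
    cpu.half_carry = false;
    cpu.add_subtract = false;  // N must always end up reset after rra
}
```

The undocumented X and Y flags are left out of the sketch as well; an emulator that logs them, as both do here, has to update them too.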
And that’s how I fix bugs in my emulators: by narrowing down the test suite to a single test and then comparing my emulator to a reference emulator that is known to be correct. This might not always work, however. In some cases the diff might occur after millions of executed instructions, which makes the log files gigantic. In cases like that, executing one instruction at a time while running the emulators in parallel might be a better solution. I’ll explain how I do that in the next part.