
In circuit emulation for the ZPU

So, you’ve been getting used to JTAG for debugging CPUs and funky reverse engineering, haven’t you?

Let’s move to something more constructive. The ZPU softcore has been out for a while. It’s intriguingly simple to use and is really saving on resources. Moreover, it has a fully functional toolchain, simulator and debugger. So why not take this on to the hardware?

The shopping list:

  1. A Test Access Port implementation (TAP)
  2. A piece of software wrapping all the gdb command primitives
  3. A JTAG adapter

The TAP

Let’s summarize again the functionality we expect from a debug port. We want to:

  1. Stop CPU, resume
  2. Single step through code
  3. Read and write PC, SP and other registers
  4. Access Memory (program memory, stack, I/O)
  5. Set software breakpoints in the code

So, the ZPU needs a breakpoint instruction. Well, it does have one! It just hasn’t been handled (apart from in the simulation) until now. What else is missing? Basically, the EMULATION state. Emulation or, to be really precise, In Circuit Emulation (not to be confused with the emulated instructions inside the ZPU) is the standard method of testing a CPU in the real world: normal execution is interrupted and instructions are fed in via the debug port, or better, the TAP. After the CPU core has executed such an emulated instruction, it returns to emulation mode as long as the emulation bit is set. Leaving emulation again requires an instruction; we simply use the same breakpoint instruction (0x00) for this. If the emulation bit is no longer set, the ZPU continues its normal operation; otherwise it executes the next instruction and returns to emulation mode.
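The control flow just described can be modelled in a few lines of C. This is only a behavioural sketch of how we read the paragraph above, not the VHDL of the core: the type `zpu_model`, the functions `cpu_fetch()` and `cpu_emulate()` and the two counters are made up for illustration. Only the breakpoint opcode 0x00 and the emulation bit come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPCODE_BREAK 0x00  /* breakpoint instruction, per the text */

typedef enum { CPU_RUNNING, CPU_EMULATION } cpu_state;

/* Hypothetical behavioural model, for illustration only */
typedef struct {
    cpu_state state;
    bool      emu_bit;   /* set and cleared by the debug port */
    unsigned  from_mem;  /* instructions executed from program memory */
    unsigned  from_tap;  /* instructions fed in by the debug port */
} zpu_model;

/* Normal operation: one instruction fetched from program memory. */
void cpu_fetch(zpu_model *cpu, uint8_t opcode)
{
    assert(cpu != NULL);
    if (cpu->state != CPU_RUNNING)
        return;                      /* halted: nothing is fetched */
    if (opcode == OPCODE_BREAK)
        cpu->state = CPU_EMULATION;  /* breakpoint hit: stop here */
    else
        cpu->from_mem++;
}

/* Emulation: one instruction fed in through the debug port. */
void cpu_emulate(zpu_model *cpu, uint8_t opcode)
{
    assert(cpu != NULL);
    if (cpu->state != CPU_EMULATION)
        return;
    if (opcode != OPCODE_BREAK) {
        cpu->from_tap++;             /* execute it, stay in emulation */
    } else if (cpu->emu_bit) {
        cpu->from_mem++;             /* single step, then back to emulation */
    } else {
        cpu->state = CPU_RUNNING;    /* resume normal operation */
    }
}
```

In this model, single stepping and resuming differ only in the emulation bit: feeding 0x00 with the bit set executes one program instruction and halts again, while feeding it with the bit cleared lets the program run on.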

This way, we can achieve everything in a simple way. We just have to make sure to save all CPU state in order to avoid being too intrusive. Remember, it can be a nightmare when a program runs while the debug monitor is active but crashes when not in debug mode. Or worse, vice versa.
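Software breakpoints are a good example of state that has to be saved and restored: the debugger overwrites an opcode with the breakpoint instruction (0x00) and must put the original byte back when the breakpoint is removed. The sketch below shows that bookkeeping against a plain byte array standing in for program memory. The names `debug_target`, `bp_insert()` and `bp_remove()` and the limit of eight breakpoints are assumptions for illustration, not part of our library; on the real target the two memory accesses would go through the TAP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPCODE_BREAK    0x00  /* ZPU breakpoint instruction */
#define MAX_BREAKPOINTS 8     /* arbitrary limit for this sketch */

typedef struct {
    uint32_t addr;
    uint8_t  saved;  /* original opcode found at addr */
    int      used;
} breakpoint;

typedef struct {
    uint8_t   *mem;       /* stand-in for the target's program memory */
    size_t     mem_size;
    breakpoint bp[MAX_BREAKPOINTS];
} debug_target;

/* Patch a breakpoint in at addr. Returns 0 on success, -1 on failure. */
int bp_insert(debug_target *t, uint32_t addr)
{
    assert(t != NULL && t->mem != NULL);
    if (addr >= t->mem_size)
        return -1;
    for (int i = 0; i < MAX_BREAKPOINTS; i++)
        if (t->bp[i].used && t->bp[i].addr == addr)
            return 0;  /* already set: keep the opcode saved earlier */
    for (int i = 0; i < MAX_BREAKPOINTS; i++) {
        if (!t->bp[i].used) {
            t->bp[i].addr  = addr;
            t->bp[i].saved = t->mem[addr];
            t->bp[i].used  = 1;
            t->mem[addr]   = OPCODE_BREAK;
            return 0;
        }
    }
    return -1;  /* table full */
}

/* Restore the original opcode. Returns 0 on success, -1 if not set. */
int bp_remove(debug_target *t, uint32_t addr)
{
    assert(t != NULL && t->mem != NULL);
    for (int i = 0; i < MAX_BREAKPOINTS; i++) {
        if (t->bp[i].used && t->bp[i].addr == addr) {
            t->mem[addr]  = t->bp[i].saved;
            t->bp[i].used = 0;
            return 0;
        }
    }
    return -1;
}
```

Setting the same breakpoint twice is deliberately a no-op here. Otherwise the second call would save the already patched 0x00 as the "original" opcode and the program would be corrupted on removal.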

Being non-intrusive is a matter of the software. On the ZPU we change the stack during most of the operations, so we have to explicitly fix it up before returning from emulation.

Let’s summarize what we needed to implement for the TAP – in VHDL modules:

  • jtagx.vhdl: The generic JTAG controller
  • tap.vhdl: The Test Access Port module, using the above JTAG controller. Other types of debug interfaces can be implemented, too

Between TAP and core (ZPU small), we have a bunch of signals and registers. These are merely:

  • emurequest: Request emulation mode (input, level sensitive)
  • emuexec: Execute emulated instruction (input, one clk wide pulse)
  • emuir: Emulation instruction register (input)
  • pc, sp, emudata: Program Counter, Stack pointer, Content at stack pointer (output)
  • state bits: What state is the CPU in?

To see in detail how these modules are linked with the core, see wb_core.vhdl.

Simulating the stuff

Before going into the hardware, we normally simulate things. This is reflected in the test bench hwdbg_small1_tb.vhd. Using the very useful trace module of the zealot ZPU variant, we can verify our architecture from the ZPU interface. Because we have used the TAP and JTAG side in other IP cores, we could safely omit them from the simulation.

Going to the hardware: Software test benches

Once we want to test everything on a real board (and run it overnight), we need a JTAG adapter and some piece of software to run JTAG commands. We are using our own JTAG library based on the ICEbearPlus adapter, but any toolchain would do. So to test our primitives like “stop CPU”, “memory read/write”, etc., we just write a simple C program.

For example, the memory read function for 32-bit values looks like this:

uint32_t mem_read32(CONTROLLER jtag, uint32_t addr)
{
    REGISTER r;
    int q = jtag_queue(jtag, 0);
    // Feed instructions through the emulation instruction register
    scanchain_select(jtag, TAP_EMUIR);
    push_opcode(jtag, OPCODE_PUSHSP, EXEC);
    push_val32(jtag, addr);
    push_opcode(jtag, OPCODE_LOAD, EXEC);
    push_opcode(jtag, OPCODE_NOP, EXEC);
    // Shift out the content at the stack pointer (the loaded value)
    scanchain_select(jtag, TAP_EMUDATA);
    scanchain_shiftout32(jtag, &r, UPDATE);
    scanchain_select(jtag, TAP_EMUIR);
    push_opcode(jtag, OPCODE_LOADSP | (LOADSP_INV ^ 0x01), EXEC); // Execute Stack fixup
    push_opcode(jtag, OPCODE_POPSP, EXEC);
    push_opcode(jtag, OPCODE_NOP, EXEC);
    jtag_queue(jtag, q);
    return r;
}

Basically, our JTAG sequences are hidden in functions like scanchain_select() or scanchain_shiftout32(). With all shifting functions, we pass a hint for the state we want to enter after shifting. Whenever we enter EXEC, the TAP pulses the emuexec pin for a clock cycle, so the command in the emuir register is executed by the CPU.
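The helper push_val32() is not shown above. One plausible way to get a 32-bit constant onto the ZPU stack is a chain of IM instructions: each IM opcode has bit 7 set and carries 7 bits of immediate; the first IM of a chain pushes its immediate sign-extended, and each directly following IM shifts the top of stack left by 7 and ORs in its own bits. The sketch below computes such a chain. `encode_im()` and `decode_im()` are hypothetical names, and whether consecutive IMs chain the same way when fed through emuir is an assumption about the core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPCODE_IM  0x80  /* IM: bit 7 set, low 7 bits are the immediate */
#define IM_MAX_LEN 5     /* 5 x 7 = 35 bits cover any 32-bit value */

/* What a chain of n IM opcodes leaves on top of the stack. */
uint32_t decode_im(const uint8_t *ops, int n)
{
    assert(ops != NULL);
    uint32_t tos = 0;
    for (int i = 0; i < n; i++) {
        uint32_t imm = ops[i] & 0x7f;
        if (i == 0)  /* first IM: sign-extend the 7-bit immediate */
            tos = (imm & 0x40) ? (imm | 0xffffff80u) : imm;
        else         /* following IMs: shift in 7 more bits */
            tos = (tos << 7) | imm;
    }
    return tos;
}

/* Write the shortest IM chain for value into out; returns its length. */
int encode_im(uint32_t value, uint8_t out[IM_MAX_LEN])
{
    assert(out != NULL);
    for (int n = 1; n <= IM_MAX_LEN; n++) {
        for (int i = 0; i < n; i++) {
            int shift = 7 * (n - 1 - i);  /* most significant group first */
            out[i] = (uint8_t)(OPCODE_IM | ((value >> shift) & 0x7f));
        }
        if (decode_im(out, n) == value)
            return n;
    }
    return IM_MAX_LEN;  /* not reached: five groups always decode exactly */
}
```

A push_val32() built this way would call encode_im() and then feed each resulting opcode with push_opcode(jtag, op, EXEC). Small addresses need only one or two instructions, which keeps the JTAG traffic down.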

Implementing the debugger

Once we have a little library with all basic functionality together, we can start wrapping it with a gdbproxy backend. Wait, what’s gdbproxy? This is a tiny little server, listening on a TCP port and waiting for gdb remote commands. The only thing we have to do: translate a set of skeleton functions into the appropriate calls of our library (called zpuemu). As we did for the Blackfin a long time ago, we added another target, this time for the ZPU.
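To give an idea of what travels over that TCP port: gdb remote commands are short ASCII packets of the form $payload#xx, where xx is the sum of the payload bytes modulo 256 written as two hex digits. gdbproxy handles this framing itself, so a backend never has to. The following is just an illustrative sketch with made-up function names (`rsp_checksum()`, `rsp_frame()`), and it ignores the escaping that binary payloads need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sum of all payload bytes, modulo 256. */
unsigned char rsp_checksum(const char *payload)
{
    assert(payload != NULL);
    unsigned char sum = 0;
    while (*payload)
        sum += (unsigned char)*payload++;
    return sum;
}

/*
 * Frame payload as "$<payload>#<checksum>" into out.
 * Returns the packet length, or -1 if out is too small.
 */
int rsp_frame(const char *payload, char *out, size_t out_size)
{
    assert(payload != NULL && out != NULL);
    int n = snprintf(out, out_size, "$%s#%02x",
                     payload, (unsigned)rsp_checksum(payload));
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

For instance, the reply "OK" is framed as $OK#9a, and the "read all registers" request g as $g#67.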

Another approach would be to use openOCD, since it supports a large number of JTAG adapters. The porting exercise we leave to others for now.

A real debugging session

So, let’s debug some program. We are using an old Spartan3 starter kit, equipped with a bunch of useful LEDs, but the main reason is: There is an existing ZPU setup with some I/O, found here: Softcore_implementation_on_a_Spartan-3_FPGA. Thanks to the authors for providing this.

In the image below you can see the Board with a bunch of PCBs stuck in. The ICEbear JTAG is connected to the expansion port, the big Coolrunner board behind is actually our ‘hacked’ Xilinx JTAG adapter, used to program the FPGA.

Spartan 3 board ZPU setup

What we had to do is swap the default ZPU implementation for the TAP-enhanced Zealot variant we used. Piece of cake.

gdbproxy session

Now let’s start hacking. We fire up our gdbproxy server as shown above; it sits there waiting on port 2000.

Then we compile a little program for the ZPU that lights up a few LEDs. Provided that a full ZPU GCC toolchain is installed, the debugging session is dead simple, if you know gdb. Let’s see:

strubi@gmuhl:~/src/vhdl/core/zealot$ zpu-elf-gdb main
GNU gdb 6.2.1
...
(gdb) target remote :2000
Remote debugging using :2000
0x000005b5 in delay (i=5) at main.c:16
16            for (j = 0; j < 1000; j++) {
(gdb) fin
Run till exit from #0  0x000005b5 in delay (i=5) at main.c:16
[New Thread 1]
[Switching to Thread 1]
0x0000063a in main () at main.c:39
39            delay(10);
(gdb) b delay
Breakpoint 1 at 0x57d: file main.c, line 13.
(gdb) c
Continuing.

Breakpoint 1, delay (i=1) at main.c:13
13    {
(gdb)

This works just as you may be used to from the simulator. But on real hardware!

Things to try for the future

Actually, you might wonder, why the heck do we need two JTAG adapters? Can’t it be simpler?

In fact, it can. We have used our FPGA vendor independent JTAG I/O, but you could use the Xilinx JTAG primitives for Boundary Scan.

However, as far as I can see, there are only two user-defined JTAG instructions, so our current TAP would not work as is. You would have to tunnel our TAP sequences through the USER1 and USER2 IRs or invent another protocol, for example by packing our TAP scanchains into the USERx registers. This is again left to implementers; we’d love to hear whether this works, though.

Update: By now, the ZPU and other soft cores are being debugged via the native Spartan3 and Spartan6 JTAG port using the BSCAN primitives from above.

Also, nobody forces you to use JTAG. You could just write a very simple interface to a uC and use the UART as the debug interface port to the TAP.

So where do we go from here? Have a look at the recent experimental git branch via this link.