Posted on

Asynchronous remote simulation using GHDL

Simulation is daily business for hardware developers, you can’t get things running right by just staring at your VHDL code (unless you’re a real genius).

There are various commercial tools out there which did the job so far: MentorGraphics, Xilinx isim, and many more, the limit mostly being your wallet.

We’re not cutting edge chip designers, so we used to work with the stuff that comes for free with the standard FPGA toolchains. However, these tools – as all proprietary stuff – confront you with limitations sooner or later. Moreover, VHDL testbench coding is a very tedious task when you need to cover all test scenarios. Sooner or later you’ll want to interface with some real world stuff, means: the program that should work with the hardware should first and likewise be able to talk to the simulation.

The interfacing

Ok, so we have a program written in say, C – and a hardware description. How do we marry them? Searching for solutions, it turns out that the OpenSource GHDL simulation package is an ideal candidate for these kind of experiments. It implements the VHPI-Interface, allowing to integrate C routines into your VHDL simulation. Its implementation is not too well documented, but hey, being able to read the source code compensates that, doesn’t it?

So, we can call C routines from our simulation. But that means: The VHDL side is the master, or rather: It implements the main loop. This way, we can’t run an independent and fully asynchronous C procedure from the outside – YET.

Assume we want to build up some kind of communication between a program and a HDL core through a FIFO. We’d set up two FIFOs for Simulation to C, and one for the reverse direction. To run asynchronously, we could spawn the C routine into a separate thread, fill/empty the FIFO in a clock sensitive process from within the simulation (respecting data buffer availability) and run a fully dynamic simulation. Would that work? Turns out it does. Let’s have a look at the routine below.

process -- clock process for clk
    thread_init; -- Initialize external thread
    wait for OFFSET;
    clockloop : loop
        u_ifclk <= '0';
        wait for (PERIOD - (PERIOD * DUTY_CYCLE));
        u_ifclk <= '1';
        wait for (PERIOD * DUTY_CYCLE);
        if finish = '1' then
            print(output, "TERMINATED");
            u_ifclk <= 'X';
        end if;
    end loop clockloop;
end process;

Before we actually start the clock, we initialize the external thread which runs our C test routine. Inside another, clock sensitive process, we call the simulation interface of our little C library, for example, the FIFO emptier. Of course we can keep things much simpler and just query a bunch of pins (e.g. button states). We’ll get to the detailed VHPI interfacing later.

Going „virtual“

The previous method still has some drawbacks: We have to write a specific thread for all our asynchronous, functionality specific C events. This is not too nice. Why can’t we just use a typical program that talks a UART protocol, for example, and reroute this into our simulation?

Well, you expected that: yes we can. Turns out there is another nice application for our netpp library (which we have used a lot for remote stuff). Inside the thread, we just fire up a netpp server listening on a TCP port and connect to it from our program. We can use a very simple server for a raw protocol, or use the netpp protocol to remote-control various simulation properties (pins, timing, stop conditions, etc).

This way, we are interactively communicating with our simulation for example through a python script with the FIFO:

import time
import netpp dev = netpp.connect("localhost")
r = dev.sync()
r.EnablePin.set(1) # arm input in the simulation
r.Fifo.set(QUERY_FRAME) # Send query frame command sequence
frame = r.Fifo.get() # Fetch frame
hexdump(frame) # Dump frame data

Timing considerations

When running this for hours, you might realize that your simulation setup takes a lot of CPU time. Or when you’re plotting wave data, you might end up with huge wave files with a lot of „idle data“. Why is that? Remember that your simulation does not run ‚real time‘. It simulates your entire clocked architecture just as fast as it can. If you have a fast machine and a not too complex design, chances are that the simulation actually has a shorter runtime that its actual realtime duration.

So for our clock main loop, we’d very likely have to insert some wait states and put the main clock process to sleep for a few µs. Well, now we’d like to introduce the resource which has taught us quite a bit on how to play with the VHPI interface: Yann Guidons GHDL extensions. Have a look at the GHDL/clk/ code. Taking this one step further, we enhance our netpp server with the Clock.Start and Clock.Stop properties so we can halt the simulation if we are idling.

Dirty little VHPI details

Little words have been lost about exactly how it’s done. Yanns examples show how to pass integers around, but not std_logic_vectors. However, this is very simple: they are just character arrays. However, as we know, a std_logic has not just 0 and 1 states, there are some more (X, U, Z, ..)

Let’s have a look at our FIFO interfacing code. We have prototyped the routine sim_fifo_io() in VHDL as follows:

procedure fifo_io( din: inout fdata; flags : inout flag_t );
    attribute foreign of fifo_io : procedure is "VHPIDIRECT sim_fifo_io";

The attribute statement registers the routine as externally callable through the VHPI interface. On the C side, our interface looks like:

void sim_fifo_io(char *in, char *out, char *flag);

The char arrays just have the length of the std_logic_vector from the VHDL definition. But there is one important thing: the LSB/MSB order is not respected in the indexing order of the array. So, if you have a definition for flag_t like ’subtype flag_t is unsigned(3 downto 0)‘, flag(3) (VHDL) will correspond to flag[0] in C. If you address single elements, it might be wise to reorder them or not use a std_logic_vector. See also Yanns ‚bouton‘ example.

Conclusion and more ideas

So with this enhancement we are able to:

  • Make a C program talk to a simulation – remotely!
  • Allow the same C program to run on real hardware without modifications
  • Trigger certain events (those nasty ones that occur in one out of 10000) and monitor them selectively
  • Script our entire simulation using Python

Well, there’s certainly more to it. A talented JAVA hacker could probably design a virtual FPGA board with buttons and 7 segment displays without much effort. A good starting point might be the OpenSource goJTAG application (for which we hacked an experimental virtual JTAG adapter that speaks to our simulation over netpp). Interested? Let us know!

Update: More of a stepwise approach is shown at

Another update: Find my presentation and paper for the Embedded World 2012 trade show here: