
netpp on programmable logic: IoT for the FPGA

For quite a while I wouldn’t have said it’s impossible, but I wouldn’t have put much effort into it either. Why bother, if a spare $1 microprocessor can run a simple communication stack like netpp?
Well, sometimes it’s time to try something else: running a soft-core CPU (ZPU) on small FPGAs has attracted some interest because of its low resource consumption. The ZPU even fits on a $5 FPGA and still leaves some space for specific interfaces like motor control. It is a slow stack machine, and even the fastest pipelined implementations don’t really beat MIPS-like architectures. This doesn’t bother us when we just have to configure a set of registers, though, and the ZPU architecture compensates with good code density.
To keep a long sermon short: a full netpp stack running over a UART interface fits in less than 15 kB of memory and runs, for example, on a Lattice MachXO2-7000 with less than 50% logic usage.
Who’s still saying that an FPGA is too dumb for the internet?

OK, there’s one little missing piece: the TCP/IP stack and the Ethernet MAC. For this purpose, I’m using an ESP8266 module. You’re right, we don’t want to do the full networking on the FPGA (yet).


The solution was presented at the Embedded World Conference 2016. The paper is available for download below.



Dynamic netpp properties

The legacy

Up to netpp v0.3x, device properties used to be static, that means, a device had a certain predefined set of properties written in XML and that’s it. No stuff on-the-fly.
However, as coders familiar with the internals know, there already are dynamic properties: the port concept of netpp allows a ‘Port’ to be added dynamically to a ‘Hub’ when the latter is probed.
For example, when a ‘probe’ broadcast is sent on the ‘TCP’ Hub, all responding servers are added to the Port property list and show up when running the netpp master tool:

> netpp
Available interfaces/hubs:
 Child: [80000000] 'TCP'
    Child: [80010000] ''
 Child: [80000001] 'UDP'
    Child: [80020000] ''

The challenge

So, where would dynamic properties in an embedded device make sense? You could think of the following scenarios:

  1. A device would show certain properties only when in a certain operation mode
  2. A device would retrieve its properties from another description than the static property list

The first scenario could so far always be dealt with using class derivation: in mode A, the device shows its base properties; in mode B, it shows the base properties plus some extensions. The built-in method of the base device solves this by returning the proper device index on the fly from the local_getroot() function.

The second scenario, however, is trickier: of course you could build a static property list from another device description and register it with netpp. However, this would end up as a rather complicated mess of external code. It turns out to be easier to enhance the existing structures for dynamic properties.

Let’s just work on an example to get to the details:

Example for VHDL test bench

The VPI specification originating from the Verilog HDL (hardware description language) allows iterating through the signal hierarchy of a hardware design simulation. Using these extensions, quite a few hardware simulators allow loading a shared library on top of the simulation that does a few things, like external signal manipulation.

In various articles (like this), it is described how to interface netpp with virtual hardware using the VHPI specification. However, the VHPI interface does not have the fancy shared library option. This means that each test bench that should be netpp capable must be manually extended with the desired netpp interface and a DClib hardware description that maps abstract properties to register addresses on the VHDL side.

For a quick and dirty approach, it would be nice if we could just load a module on top of a simulation that queries the top level signals of the test bench and exports them as netpp properties, so that the user can manipulate them from outside (or remotely) using a Python script.

Within our so-called vpiwrapper (vpiwrapper.c), we write a function that creates a dynamic property from a signal:

TOKEN property_from_signal(TOKEN parent, vpiHandle sig)

The parent is a dynamic(!) root node that we created previously, if none existed already.

Enhanced functionality

The above example just created a number of top level properties from existing signals. So far so good. But what if we wanted the basic functionality from the VHPI extensions, too, common to all the simulations? This again would be the static properties from previous approaches. So we mix a lot of stuff: VPI with VHPI extensions and static with dynamic properties.

Wouldn’t that turn into a nightmare?

Nope. The answer is again derivation: we just create two device root nodes. One is the static node from the static property list (proplist.c). The other is the dynamic node that we created inside the vpiwrapper (if it didn’t already exist). Now we simply set the static node as “base class” of the dynamic node. That means the property iteration on the slave side (inside the simulation) will see the dynamically created signal properties and the static properties alike.

As an example, a running “simram” simulation will answer the netpp call as follows:
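
To make the derivation idea a bit more tangible, here is a minimal Python model of it. This is not netpp code: the Node class, its fields and iter_properties() are made up for illustration. Only the lookup order (the node's own properties first, then those of its base class) is taken from the description above.

```python
class Node:
    """Toy stand-in for a device root node (hypothetical, not the netpp API)."""

    def __init__(self, name, properties, base=None):
        self.name = name
        self.properties = list(properties)  # property names owned by this node
        self.base = base                    # optional "base class" root node

    def iter_properties(self):
        """Yield this node's own properties, then those inherited from the base."""
        yield from self.properties
        if self.base is not None:
            yield from self.base.iter_properties()


# Static root, as it would come from the static property list
static_root = Node("static", ["Enable", "Irq", "Reset"])

# Dynamic root created at run time from the test bench signals,
# with the static root set as its base class
dynamic_root = Node("dynamic", ["clk", "we", "addr"], base=static_root)

print(list(dynamic_root.iter_properties()))
```

A client iterating over the dynamic root sees one flat list and does not need to know which properties were created at run time.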

Properties of Device 'VPI_GHDLwrapper' associated with Hub 'TCP':
Child: [80000001] 'clk'
Child: [80000002] 'we'
Child: [80000003] 'addr'
Child: [80000004] 'data0'
Child: [80000005] 'data1'
Child: [80000006] 'data2'
Child: [80000007] 'ram0'
Child: [80000008] 'ram1'
Child: [00000002] 'Enable'
Child: [00000005] 'Irq'
Child: [00000007] 'Reset'
Child: [00000008] 'Throttle'
Child: [00000009] 'Timeout'
Child: [00000003] 'Fifo'

The properties beginning with a capital letter are the static ones. You can see a difference in the TOKEN values as well when compared to the signal properties in lower case. (Internals experts know that dynamic property tokens have the MSB set.)

On a side note: the properties ram0 and ram1 are explicitly registered by this simulation. They implement a netpp BUFFER variable that can simply be read/written asynchronously during the simulation. From the VHDL side, they simulate a simple dual port memory.
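
The MSB rule can be checked against the listing above with a few lines of Python. This is only a sketch: it assumes that tokens are 32 bit values (as the eight hex digits in the listing suggest), and the names DYNAMIC_FLAG and is_dynamic() are invented here, not part of netpp.

```python
DYNAMIC_FLAG = 0x80000000  # MSB of a token, assuming 32 bit tokens


def is_dynamic(token):
    """Return True if the token has its most significant bit set."""
    return bool(token & DYNAMIC_FLAG)


# A few token values copied from the property listing
tokens = {
    "clk": 0x80000001,
    "ram1": 0x80000008,
    "Enable": 0x00000002,
    "Fifo": 0x00000003,
}

for name, token in tokens.items():
    kind = "dynamic" if is_dynamic(token) else "static"
    print(f"[{token:08x}] {name}: {kind}")
```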


Asynchronous remote simulation using GHDL

Simulation is daily business for hardware developers: you can’t get things running right by just staring at your VHDL code (unless you’re a real genius).

There are various commercial tools out there which did the job so far: Mentor Graphics, Xilinx ISim, and many more, the limit mostly being your wallet.

We’re not cutting edge chip designers, so we used to work with the stuff that comes for free with the standard FPGA toolchains. However, these tools, like all proprietary stuff, confront you with limitations sooner or later. Moreover, VHDL testbench coding is a very tedious task when you need to cover all test scenarios. Sooner or later you’ll want to interface with some real world stuff, which means that the program that should work with the hardware should first be able to talk to the simulation in the same way.

The interfacing

OK, so we have a program written in, say, C, and a hardware description. How do we marry them? Searching for solutions, it turns out that the open source GHDL simulation package is an ideal candidate for this kind of experiment. It implements the VHPI interface, which allows C routines to be integrated into your VHDL simulation. Its implementation is not too well documented, but hey, being able to read the source code compensates for that, doesn’t it?

So, we can call C routines from our simulation. But that means: The VHDL side is the master, or rather: It implements the main loop. This way, we can’t run an independent and fully asynchronous C procedure from the outside – YET.

Assume we want to build up some kind of communication between a program and an HDL core through a FIFO. We’d set up two FIFOs: one for simulation to C, and one for the reverse direction. To run asynchronously, we could spawn the C routine into a separate thread, fill/empty the FIFOs in a clock sensitive process from within the simulation (respecting data buffer availability) and run a fully dynamic simulation. Would that work? Turns out it does. Let’s have a look at the routine below.

process -- clock process for clk
begin
    thread_init; -- Initialize external thread
    wait for OFFSET;
    clockloop : loop
        u_ifclk <= '0';
        wait for (PERIOD - (PERIOD * DUTY_CYCLE));
        u_ifclk <= '1';
        wait for (PERIOD * DUTY_CYCLE);
        if finish = '1' then
            print(output, "TERMINATED");
            u_ifclk <= 'X';
            wait; -- suspend the clock process for good
        end if;
    end loop clockloop;
end process;

Before we actually start the clock, we initialize the external thread which runs our C test routine. Inside another, clock sensitive process, we call the simulation interface of our little C library, for example, the FIFO emptier. Of course we can keep things much simpler and just query a bunch of pins (e.g. button states). We’ll get to the detailed VHPI interfacing later.
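
The thread and FIFO handshake can be modelled in a few lines of Python before touching any VHDL. This is a sketch of the idea only, not our C library: run_demo(), the FIFO depth of 4 and the "hardware" that just increments each word are invented for illustration. The point is that the clock loop never blocks: it polls the FIFOs once per cycle, like the clock sensitive process in the simulation would.

```python
import queue
import threading


def run_demo(n_words=5):
    to_sim = queue.Queue(maxsize=4)    # external routine -> simulation
    from_sim = queue.Queue(maxsize=4)  # simulation -> external routine
    replies = []

    def external_routine():
        # Stand-in for the C test routine running in its own thread
        for word in range(n_words):
            to_sim.put(word)                         # blocks while the FIFO is full
            replies.append(from_sim.get(timeout=5))  # wait for the answer

    worker = threading.Thread(target=external_routine)
    worker.start()  # corresponds to thread_init before the clock starts

    cycles = 0
    handled = 0
    while handled < n_words:
        cycles += 1  # one loop iteration stands for one simulated clock cycle
        if from_sim.full():
            continue  # respect buffer availability on the output side
        try:
            word = to_sim.get_nowait()  # never block the clock process
        except queue.Empty:
            continue  # nothing to do in this cycle
        from_sim.put_nowait(word + 1)   # the "hardware" just increments
        handled += 1

    worker.join()
    return replies, cycles


if __name__ == "__main__":
    replies, cycles = run_demo()
    print(replies, "after", cycles, "clock cycles")
```

The number of cycles differs from run to run, which is exactly the asynchronous behaviour we are after: the simulation keeps ticking whether or not data is available.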

Going “virtual”

The previous method still has some drawbacks: We have to write a specific thread for all our asynchronous, functionality specific C events. This is not too nice. Why can’t we just use a typical program that talks a UART protocol, for example, and reroute this into our simulation?

Well, you expected that: yes we can. Turns out there is another nice application for our netpp library (which we have used a lot for remote stuff). Inside the thread, we just fire up a netpp server listening on a TCP port and connect to it from our program. We can use a very simple server for a raw protocol, or use the netpp protocol to remote-control various simulation properties (pins, timing, stop conditions, etc).

This way, we can communicate interactively with the FIFO in our simulation, for example from a Python script:

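import time
import netpp

# QUERY_FRAME (the command sequence) and hexdump() are not defined here;
# they are assumed to come from elsewhere in the test script
dev = netpp.connect("localhost") # Connect to the netpp server in the simulation
r = dev.sync() # Root node through which the properties are accessed
r.EnablePin.set(1) # arm input in the simulation
r.Fifo.set(QUERY_FRAME) # Send query frame command sequence
frame = r.Fifo.get() # Fetch frame
hexdump(frame) # Dump frame data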

Timing considerations

When running this for hours, you might realize that your simulation setup takes a lot of CPU time. Or when you’re plotting wave data, you might end up with huge wave files with a lot of “idle data”. Why is that? Remember that your simulation does not run in ‘real time’. It simulates your entire clocked architecture just as fast as it can. If you have a fast machine and a not too complex design, chances are that the simulation actually takes less time to run than the real time span it simulates.

So for our clock main loop, we’d very likely have to insert some wait states and put the main clock process to sleep for a few µs. This is the moment to introduce the resource that has taught us quite a bit about how to play with the VHPI interface: Yann Guidon’s GHDL extensions. Have a look at the GHDL/clk/ code. Taking this one step further, we enhance our netpp server with the Clock.Start and Clock.Stop properties so we can halt the simulation while we are idling.
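
The pacing itself boils down to simple arithmetic. The following Python sketch shows it under the assumption that we want simulated time to never run ahead of wall clock time; throttle_delay() and run_clock() are invented names, and in the real setup the sleep would happen in a C routine called from the VHDL clock process.

```python
import time


def throttle_delay(sim_time_s, wall_time_s):
    """Seconds to sleep so that simulated time does not run ahead of real time."""
    return max(0.0, sim_time_s - wall_time_s)


def run_clock(n_cycles, period_s):
    """Run n_cycles of a paced clock loop and return the wall clock time used."""
    start = time.monotonic()
    for cycle in range(1, n_cycles + 1):
        # ... one simulated clock cycle would be evaluated here ...
        sim_time = cycle * period_s
        time.sleep(throttle_delay(sim_time, time.monotonic() - start))
    return time.monotonic() - start


if __name__ == "__main__":
    elapsed = run_clock(n_cycles=100, period_s=0.001)
    print(f"100 cycles of 1 ms took {elapsed:.3f} s of wall clock time")
```

If the simulation is slower than real time, the delay is simply zero and the loop runs at full speed.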

Dirty little VHPI details

Few words have been spent so far on exactly how it’s done. Yann’s examples show how to pass integers around, but not std_logic_vectors. This is very simple, though: they are just character arrays. Keep in mind, however, that a std_logic does not just have the states 0 and 1; there are some more (X, U, Z, ...).

Let’s have a look at our FIFO interfacing code. We have prototyped the routine sim_fifo_io() in VHDL as follows:

procedure fifo_io( din: inout fdata; flags : inout flag_t );
    attribute foreign of fifo_io : procedure is "VHPIDIRECT sim_fifo_io";

The attribute statement registers the routine as externally callable through the VHPI interface. On the C side, our interface looks like:

void sim_fifo_io(char *in, char *out, char *flag);

The char arrays just have the length of the std_logic_vector from the VHDL definition. But there is one important thing: the array index on the C side does not follow the VHDL bit numbering. So, if you have a definition for flag_t like ‘subtype flag_t is unsigned(3 downto 0)’, flag(3) (VHDL) will correspond to flag[0] in C. If you address single elements, it might be wise to reorder them or not to use a std_logic_vector. See also Yann’s ‘bouton’ example.
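
The index mapping is easy to get wrong, so here is a small Python model of such a buffer. It rests on one assumption that you should verify against your GHDL version: each array element holds the position of its value in the std_ulogic enumeration ('U' = 0, 'X' = 1, '0' = 2, '1' = 3, and so on). The helper names element() and to_int() are made up for this sketch.

```python
# Assumption: each element arrives as the position of its value in the
# std_ulogic enumeration ('U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-')
STD_LOGIC = "UX01ZWLH-"


def element(buf, vhdl_index):
    """Return the state of element `vhdl_index` of an (N-1 downto 0) vector."""
    return STD_LOGIC[buf[len(buf) - 1 - vhdl_index]]


def to_int(buf):
    """Convert a 'downto' vector to an integer; refuse states other than 0 and 1."""
    value = 0
    for code in buf:  # buf[0] is the leftmost element, i.e. the MSB
        bit = STD_LOGIC[code]
        if bit not in "01":
            raise ValueError(f"cannot convert '{bit}' to a number")
        value = (value << 1) | int(bit)
    return value


flag = bytes([3, 2, 2, 2])  # "1000": flag(3) = '1', flag(2 downto 0) = "000"
print(element(flag, 3), to_int(flag))
```

Rejecting the non-binary states instead of silently mapping them to 0 makes uninitialized signals show up early in a test run.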

Conclusion and more ideas

So with this enhancement we are able to:

  • Make a C program talk to a simulation – remotely!
  • Allow the same C program to run on real hardware without modifications
  • Trigger certain events (those nasty ones that occur in one out of 10000) and monitor them selectively
  • Script our entire simulation using Python

Well, there’s certainly more to it. A talented Java hacker could probably design a virtual FPGA board with buttons and 7 segment displays without much effort. A good starting point might be the OpenSource goJTAG application (for which we hacked an experimental virtual JTAG adapter that speaks to our simulation over netpp). Interested? Let us know!

Update: More of a stepwise approach is shown at

Another update: Find my presentation and paper for the Embedded World 2012 trade show here: