Active support for the Papilio and Breakout MachXO2 boards has been dropped
Very basic support for the neo430 (msp430 compatible) added, see Docker notes below
Includes a non-configurable basic ‘eval edition’ (in VHDL only) of our pipelined ZPUng core
Basic Virtual board support (using ghdlex co-simulation extensions)
Docker files and recipes included
Docker containers are in my opinion the optimum for automated deployment and for testing different configurations. To stay close to actual GHDL simulator development, the container is based on the ghdl/ghdl:buster-gcc-7.2.0 edition.
Here’s a short howto to set up an environment ready to play with. You can try this online at
Just register yourself a Docker account, login and start playing in your online sandbox. You’ll need to build and copy some files from contrib/docker to the remote Docker machine instance.
Run ‘make dist’ inside contrib/docker, this will create a file masocist_sfx.sh
Copy Dockerfile and init-pty.sh to the Docker playground by dragging the files onto the shell window
Build the container and run it:
docker build -t masocist .
docker run -it -v/root:/usr/local/src masocist
Copy masocist_sfx.sh to the Docker machine and run, inside the running container’s home dir (/home/masocist):
sudo sh /usr/local/src/masocist_sfx.sh
Now pull and build all necessary packages:
If nothing went wrong, the simulation for the neo430 CPU will be built and started with a virtual UART and SPI simulation. A minicom terminal will connect to that UART and you’ll be able to speak to the neo430 ‘bare metal’ shell. For example, you can dump the content of the virtual SPI flash with:
s 0 1
The simulation is a cycle accurate model of your user program, which you can of course modify. During the build process, i.e. when you run ‘make sim’ in the masocist(-opensource) directory, the msp430-gcc compiler builds the software C code from the sw/ directory and places the code into memory according to the linker script in sw/ldscripts/neo430. This results in an ELF binary, which is then converted into a VHDL initialization file for the target. Then the simulation is built.
The linker script is, however, very basic. At this experimental stage a somewhat different, automatically generated memory map is used: all peripherals are configured in the XML device description at hdl/plat/minimal.xml. The data memory configuration (‘dmem’ entity), however, does not automatically adapt the linker script.
Turning this into a fully configurable solution is left to be done.
Since scalability of the netpp node solution was advertised, one issue has turned into a FAQ: How to handle errors on loaded networks and traffic between multiple nodes?
Especially with UDP, plenty of scenarios can occur which can confuse all higher protocol layers: lost packets, reverse packet order, duplicate packets…
How to handle these errors, does the netpp layer take care of it all?
It doesn’t. The current strategy with UDP is: We want to see all errors. If we don’t, we’d rather switch to a TCP implementation.
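To make the three error classes mentioned above a bit more tangible, here is a minimal sketch of how a receiver could tell them apart using a per-packet sequence number. This is not how the netpp layer is implemented; the function name `check_stream` and the verdict strings are made up for illustration, and sequence number wrap-around is ignored.

```python
def check_stream(seqs):
    """Classify each received sequence number (hypothetical helper).

    Returns a list of (seq, verdict) tuples, where verdict is one of
    "ok", "gap" (earlier packets lost or late), "reordered", "duplicate".
    """
    expected = 0          # next sequence number we expect to see
    seen = set()          # all sequence numbers received so far
    report = []
    for seq in seqs:
        if seq in seen:
            verdict = "duplicate"
        elif seq == expected:
            verdict = "ok"
        elif seq > expected:
            verdict = "gap"
        else:
            verdict = "reordered"
        seen.add(seq)
        if seq >= expected:
            expected = seq + 1
        report.append((seq, verdict))
    return report
```

Feeding it a stream such as `[0, 1, 1, 3, 2, 4]` flags the second `1` as a duplicate, `3` as a gap and the late `2` as reordered.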
In our test scenario we have, connected by a Gigabit Hub:
One Gigabit Ethernet capable client, one 100M client
Six netpp nodes
If the network is not completely jammed, what you would usually see is timeouts. In Python, these show up as simple IOError exceptions. If an illegal packet sequence is detected, a SystemError exception is raised.
As a simple example, the script below will poll all detected hosts at highest frequency possible and cover IOError and SystemError exceptions with different recovery timings.
Note that there is no particular finer control for specific errors in Python. If required, these have to be handled on the C-API level.
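import sys
import time
import netpp

# Reconstructed listing: the def/try/except/while/if lines were lost and are
# filled in here. The sleep values are placeholders, not the original timings.
hostlist = range(8, 15)

def init(hostlist):
    targets = []
    for h in hostlist:
        ip = "192.168.0.%d" % h
        try:
            d = netpp.connect("UDP:%s:2016" % ip)
            targets.append((h, d.sync()))  # assumed: sync() returns the root node
            print("Target %s alive" % ip)
        except (IOError, SystemError):
            print("Target %s down" % ip)
    return targets

targets = init(hostlist)
for h, t in targets:
    rev = t.SysCtrl.ReleaseTag.get()
    print("Node %d : Class '%s' rev %s" % (h, t.name(), rev))
    l = t.LED.Red.get()
    l = not l
print(40 * "-")
l = True
while True:
    for h, t in targets:
        try:
            t.LED.Green.set(l)  # assumed property name, by analogy with LED.Red
            dataready = t.UART.RXREADY.get()
            if dataready:
                print("> %s" % t.UART.RxData.get())
        except IOError:  # timeout
            print("Node %d: %s" % (h, sys.exc_info()))
            time.sleep(0.1)
        except SystemError:  # illegal packet sequence
            print("Node %d: %s" % (h, sys.exc_info()))
            time.sleep(1.0)
    l = not l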
The script will just toggle the Green LED for each target and check if there’s input available on the UART. If you have the corresponding netpp node connected via USB serial, you can type in a character at the terminal and see it reported from the script.
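The ‘different recovery timings’ idea can be isolated into a small helper, sketched below under a few assumptions: the name `poll_once`, the `RECOVERY_DELAY` table and its values are hypothetical, and `read` stands for any callable doing a netpp property access. The sleep function is injectable so the helper can be tried without real hardware.

```python
import time

# Hypothetical recovery delays in seconds; tune them for your network.
RECOVERY_DELAY = {"timeout": 0.1, "sequence": 1.0}


def poll_once(read, sleep=time.sleep):
    """Call read() once and back off depending on the error class.

    Returns (value, None) on success or (None, exception) on failure.
    """
    try:
        return read(), None
    except IOError as e:        # timeout: short pause, then carry on
        sleep(RECOVERY_DELAY["timeout"])
        return None, e
    except SystemError as e:    # illegal packet sequence: longer pause
        sleep(RECOVERY_DELAY["sequence"])
        return None, e
```

In a polling loop this would be used like `value, err = poll_once(t.UART.RXREADY.get)`, leaving the caller to decide whether a failed node should be skipped or re-initialized.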
Timeouts and sessions
UDP is session-less, therefore netpp handles the connectivity from peer to peer. By default, only two simultaneous connections (two clients) are supported. If a connection is lost, the netpp node terminates it after a certain timeout once a new connection is detected. This is signalled on the netpp node UART console by:
If the connection is lost from the netpp node side, for example via a reboot or long cable disconnect, the client may not detect that and keep sending queries. In this case, you might see the following error on the netpp node console:
QRY 55 NAK
(the 55 could be any other code).
In this case, the session would have to be reopened from the client:
d = netpp.connect(...)
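Since the client may not notice a lost session by itself, one way to automate the reopening is to count consecutive timeouts and reconnect after a threshold. The sketch below assumes exactly that; the `Session` class and the `max_failures` value are hypothetical and not part of netpp. The connect function is passed in, so in practice it would wrap the netpp.connect() call shown above.

```python
class Session(object):
    """Reopens the connection after too many consecutive timeouts."""

    def __init__(self, connect, max_failures=3):
        self.connect = connect            # callable returning a fresh connection
        self.max_failures = max_failures  # hypothetical threshold
        self.failures = 0
        self.conn = connect()

    def query(self, request):
        """Run request(conn); return its result, or None after a timeout."""
        try:
            result = request(self.conn)
        except IOError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.conn = self.connect()  # reopen the session
                self.failures = 0
            return None
        self.failures = 0
        return result
```

A typical call would look like `s = Session(lambda: netpp.connect("UDP:192.168.0.8:2016"))`, followed by `s.query(...)` with a function that performs the actual property access.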
FPGA goes cloudy
If you’re collecting data as simple as temperature and humidity, for instance, you might want to push the data to the cloud. This is also done using a simple Python script doing an HTTP GET request to ThingSpeak. Note that the netpp node does not push the data itself; the script is running on an embedded Linux module.
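A rough sketch of such a push script (Python 3) is shown below. It uses the public ThingSpeak update endpoint; the mapping of temperature to `field1` and humidity to `field2` is an assumption that depends on how the channel is set up, and the function names are made up for this example. The values themselves would be read via netpp beforehand.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"


def build_update_url(api_key, temperature, humidity):
    """Build the GET URL for one sample (field mapping is an assumption)."""
    params = [("api_key", api_key),
              ("field1", "%.1f" % temperature),
              ("field2", "%.1f" % humidity)]
    return THINGSPEAK_UPDATE + "?" + urlencode(params)


def push(api_key, temperature, humidity, timeout=10):
    """Send one sample and return the raw response body."""
    url = build_update_url(api_key, temperature, humidity)
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Keeping the URL construction separate from the actual request makes it easy to check the parameters without touching the network.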
I am glad to announce a new user evaluation platform module called ‘netpp node’. Its motto is ‘IoT on FPGA done right’. See detailed specs and preliminary order information here: [ refdesigns/netpp_node ].
The netpp node engineering samples v0.0 have just passed the long term burn-in. Running non-stop for approx. 11 weeks, the units have been flooded with netpp requests from an embedded PC and have shown no hardware failure, apart from a reboot resulting from a power outage.
v0.1 series [18.1]:
Received the series! So finally we can ramp up with the delivery to beta developers…(thanks for being so patient).
For analog I/O, U3 on the board is by default populated with a MSP430G2553, functioning as a smart ADC that is controlled from the ZPUng ‘dagobert’ SoC via i2c. All relevant ADC configuration registers are directly accessible via netpp. For instance, we access the low level registers through a process browser panel as shown above to play with the parameters. The process view panel automatically updates the volatile properties from the netpp peer device. The ADC10 variant of the netpp node provides up to six analog channels internally sampled at up to 200ksps. When in synchronous acquisition configuration (SPI master), only five channels can be used.
Differential 16 bit sigma-delta ADC
The alternate population option with a MSP430F2013 provides a Sigma-Delta 16 bit ADC with differential inputs and programmable gain amplifier. This variant provides three different input channel configurations using the provided analog input pins on this board. Moreover, the internal temperature is available in a separate channel.
‘Push on demand’ data streaming
By default, the analog sensors are polled, i.e. a measurement value is delivered upon request by the master. For synchronous sampling however, a ‘push’ strategy might be desired, where a netpp node delivers a value stream to a data logger or database. This can be done over netpp (where the netpp node acts as a master); for high speed data transfers (‘network scope’), however, a low overhead UDP stream is more desirable. The dagobert SoC features a data port option with programmable slots to stream I/O channels as well as analog values using a standard real time protocol with 90 kHz time stamps.
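For readers who want to decode such a stream on the receiving side, the sketch below builds the 12 byte fixed header of RTP (RFC 3550) and converts seconds into 90 kHz ticks. It assumes the stream follows plain RTP without CSRC entries or header extensions; the payload type 96 (dynamic range) and the function names are assumptions, as the actual payload format of the dagobert data port is not described here.

```python
import struct

RTP_CLOCK_HZ = 90000  # 90 kHz time stamp clock


def rtp_timestamp(seconds):
    """Convert a time in seconds to a 32 bit, 90 kHz RTP time stamp."""
    return int(round(seconds * RTP_CLOCK_HZ)) & 0xFFFFFFFF


def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Pack the fixed RTP header: version 2, no padding/extension/CSRC."""
    byte0 = 2 << 6
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

A receiver would apply `struct.unpack("!BBHII", packet[:12])` to get the same fields back and use the sequence number to detect dropped packets, since the stream itself has no handshaking.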
Monitoring netpp packet performance
Packet behaviour in a real network is measured using the Wireshark protocol analyzer.
The figure below shows an example netpp transaction log that the netpp node handles at very low CPU overhead, based on direct register accesses. The red bars show the effective number of query responses using somewhat inefficient ping-pong requests. The performance can be increased by accumulating data into larger buffer properties.
For i2c or SPI transactions however, the packet rate is expected to be much lower.
For high speed performance like MJPEG video streaming, a separate UDP/RTP queue can be set up within the firmware to reach maximum throughput. However, there is no handshaking using this method.
The image below shows a repeated property query from within Python. The pauses are introduced by an external disturbance (stress test) that causes a packet drop and forces the netpp engine to time out and re-synchronize.
Improved RX/TX queue
With an improved packet FIFO on FPGA, I was able to crank up the number of netpp requests per second, as shown in the Wireshark trace below. This test makes sure that several netpp clients can poll the netpp node at high frequencies without disturbing each other. The blue trace is a repeated poll of the full property tree, the red bars are the timed queries from a process viewer daemon. With no other disturbance, we get the occasional drops (e.g. at 45s, 101.5s) due to the queue running full.
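The effect of a queue running full can be modelled in a few lines. The sketch below assumes a simple tail-drop policy (new packets are discarded while the queue is full); the actual behaviour of the FPGA packet FIFO is not specified here, and the class name `PacketFifo` is made up.

```python
from collections import deque


class PacketFifo(object):
    """Bounded FIFO that drops new packets when full (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0      # number of packets lost so far

    def push(self, packet):
        """Queue a packet; return False if it had to be dropped."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def pop(self):
        """Return the oldest packet, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

From the client side, such a drop simply shows up as a timeout on the corresponding query.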
The default boot loader firmware supports self-programming over the cable. That means, the netpp_node can be supplied remotely with a new firmware image via a simple upgrade procedure over netpp. If the uploaded image is faulty, the system will fall back to the default boot loader. However, if the new design itself has errors, the system will be unable to recover unless the reset button is pressed.
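The fallback decision can be pictured as follows. How the boot loader actually detects a faulty image is not stated here, so the CRC32 check in this sketch is purely an assumption, as are the function names.

```python
import zlib


def image_is_valid(image, expected_crc):
    """Check an image (bytes) against its expected CRC32 (assumed check)."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc


def select_boot_image(uploaded, expected_crc, bootloader):
    """Return the uploaded image if it is intact, else the boot loader."""
    if uploaded is not None and image_is_valid(uploaded, expected_crc):
        return uploaded
    return bootloader
```

Note that such a check only covers transfer or storage corruption. An image that is intact but functionally broken passes it, which is exactly the case where the reset button is needed.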
As the full model of this design is available for simulation, we can verify the system effectively against stress situations. In particular, network safety is of utmost importance. The test procedure check list of the dagobert SoC:
ARP and ping flooding
netpp packet performance test
Broken packet handling
Lost interrupt scenario (packet queue desynchronization)
Jumbo packet flooding was tested, but Jumbo packet support cannot be enabled for the receiver queue on this platform. Under certain circumstances it is possible to generate (TX) Jumbo packets for experimental purposes, although the performance gain is minimal.
Extended RTOS support
Currently, the netpp node runs a simple bare metal main loop without particular RTOS functionality, i.e. all user code must be designed such that there are no blocking wait statements. Let me just put the FAQ together:
There is FreeRTOS and eCos support code for the ZPU architecture. However, I have no plans to go down that road, you’d be on your own.
A NuttX port is currently under evaluation and may be released in a few months’ time. NO PROMISES!
A simple ‘netpp OS’ with very basic task management is in experimental stage:
Guaranteed latency time from driver interrupt to queue handler task
‘User space’ context switch when sleep() called
Very cheap context switches due to ZPU architecture improvements
Code size is an issue on this particular platform: larger programs (TCP stacks) need to move to the SPI flash overlay program space. Since this involves caching, the program timing is no longer fully deterministic and the RTOS functionality can only apply to program code running in the L1 memory.
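To illustrate what a context switch on sleep() means for user code, here is a toy cooperative scheduler on a simulated clock. It is only a model of the concept and has nothing to do with the actual ‘netpp OS’ implementation; each task is a Python generator, and `yield delay` plays the role of sleep(delay). All names are made up.

```python
import heapq


def scheduler(tasks, until):
    """Run generator tasks until the simulated time exceeds 'until'.

    Returns a trace of (time, task_index) for every time a task was resumed.
    """
    trace = []
    queue = [(0, n, task) for n, task in enumerate(tasks)]
    heapq.heapify(queue)
    while queue:
        wake, n, task = heapq.heappop(queue)
        if wake > until:
            break                       # nothing left to do in this window
        trace.append((wake, n))
        try:
            delay = next(task)          # runs until the task 'sleeps'
        except StopIteration:
            continue                    # task has finished
        heapq.heappush(queue, (wake + delay, n, task))
    return trace


def blink(period):
    """Example task: would toggle an LED, then sleep for 'period' ticks."""
    while True:
        yield period
```

Running `scheduler([blink(2), blink(3)], until=6)` resumes the first task at ticks 0, 2, 4, 6 and the second at 0, 3, 6. No task ever blocks the other, which is the same constraint the bare metal main loop puts on user code.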