Posted on

Multiple node monitoring/control via Python/netpp

Since scalability of the netpp node solution was advertised, one issue has turned into a FAQ: How to handle errors on loaded networks and traffic between multiple nodes?

Especially with UDP, plenty of scenarios can occur which can confuse all higher protocol layers: lost packets, reverse packet order, duplicate packets…

How to handle these errors, does the netpp layer take care of it all?

It doesn’t. The current strategy with UDP is: We want to see all errors. If we don’t, we’d rather switch to a TCP implementation.

In our test scenario we have, connected by a Gigabit Hub:

  • One Gigabit Ethernet capable client, one 100M client
  • Six netpp nodes

If the network is not completely jammed, the usual you would see is timeouts. In Python, these are simple IOError exceptions. If an illegal packet sequence is detected, a SystemError exception will be raised.

As a simple example, the script below will poll all detected hosts at highest frequency possible and cover IOError and SystemError exceptions with different recovery timings.

Note that there is no particular finer control for specific errors in Python. If required, these have to be handled on the C-API level.

import netpp
import time
import sys

hostlist = range(8, 15)

def init(hostlist):
    targets = []
    for h in hostlist:
            ip = "192.168.0.%d" % h
            d = netpp.connect("UDP:%s:2016" % ip)
            print "Target %s alive" % ip
            targets.append((h, d.sync()))
            print "Target %s down" % ip
    return targets

def poll(targets):
    for h, t in targets:
        rev = t.SysCtrl.ReleaseTag.get()
        print "Node %d : Class '%s' rev %s" % (h,, rev)
        l = t.LED.Red.get()
        l = not l
        print 40 * "-"

    l = True
    while 1:
        for h, t in targets:
                dataready = t.UART.RXREADY.get()
                if dataready:
                    print "> %s" % t.UART.RxData.get()
            except SystemError:
                print "Node %d: %s" % (h, sys.exc_info()[1])
            except IOError:
                print "Node %d: %s" % (h, sys.exc_info()[1])
        l = not l
        # time.sleep(0.05)

targets = init(hostlist)

Script details

The script will just toggle the Green LED for each target and check if there’s input available on the UART. If you have the corresponding netpp node connected via USB serial, you can type in a character at the terminal and see it reported from the script.

Timeouts and sessions

UDP is session-less, therefore netpp handles the connectivity from peer to peer. By default, only two simultaneous connections (two clients) are supported. If a connection is lost, the netpp node will terminate it after a certain timeout, if a new connection is detected. This is signalled on the netpp node UART console by:

disconnect_timeout: c0a80002:49587

If the connection is lost from the netpp node side, for example via a reboot or long cable disconnect, the client may not detect that and keep sending queries. In this case, you might see the following error on the netpp node console:


(the 55 could be any other code).

In this case, the session would have to be reopened from the client:

d = netpp.connect(...)

FPGA goes cloudy

If you’re collecting data as simple as Temperature and Humidity, for instance, you might want to push the data to the cloud. This is also done using a simple Python script doing a HTTP get request to ThinkSpeak. Note the netpp node does not push the data, the script is running on an embedded Linux module.

Posted on

netpp node evaluation platform

netpp node [Spartan6-LX9]

I am glad to announce a new user evaluation platform module called ‘netpp node’. Its motto is ‘IoT on FPGA done right’. See detailed specs and preliminary order information here: [ refdesigns/netpp_node ].

Update [21.12.]:

The netpp node engineering samples v0.0 have just passed the long term burn-in. Running since approx. 11 weeks non-stop, the units are flooded with netpp requests from an embedded PC and have shown no failure in the hardware, except a reboot resulting from a power outage.

v0.1 series [18.1]:

Received the series! So finally we can ramp up with the delivery to beta developers…(thanks for being so patient).

Analog I/O

ADC10 low level control

For analog I/O, U3 on the board is by default populated with a MSP430G2553, functioning as a smart ADC that is controlled from the ZPUng ‘dagobert’ SoC via i2c. All relevant ADC configuration registers are directly accessible via netpp. For instance, we access the low level registers through a process browser panel as shown above to play with the parameters. The process view panel automatically updates the volatile properties from the netpp peer device. The ADC10 variant of the netpp node provides up to six  analog channels internally sampled at up to 200ksps. When in synchronous acquisition configuration (SPI master), only five channels can be used.

Differential 16 bit sigma-delta ADC

SD16 analog input

The alternate population option with a MSP430F2013 provides a Sigma-Delta 16 bit ADC with differential inputs and programmable gain amplifier. This variant provides three different input channel configurations using the provided analog input pins on this board. Moreover, the internal temperature is available in a separate channel.

‘Push on demand’ data streaming

By default, the analog sensors are polled, i.e. a measurement value is delivered upon request by the master. For synchronous sampling however, a ‘push’ strategy might be desired, where a netpp node delivers a value stream to a data logger or database. This can be netpp (where the netpp node acts as a master), however for high speed data transfers (‘network scope’), a low overhead UDP stream is more desirable. The dagobert SoC features a data port option with programmable slots to stream I/O channels as well as analog values using a standard real time protocol with 90 kHz time stamps.

Monitoring netpp packet performance

Packet behaviour in a real network is measured using the Wireshark protocol analyzer.

The figure below shows some example netpp transaction log that the netpp node handles at a very low CPU overhead based on direct register accesses.The red bars is the effective number of query responses using somewhat ineffective ping-pong requests. The performance can be increased by accumulating data into larger buffer properties.

For i2c or SPI transactions however, the packet rate is expected way lower.

For high speed performance like MJPEG video streaming, a separate UDP/RTP queue can be set up within the firmware to reach maximum throughput. However, there is no handshaking using this method.

The image below shows a repeated property query from within Python. The pauses are introduced by external disturbance (stress test) that causes a packet drop – and the netpp engine to timeout and re-synchronize.

Python property query session
Python property query session

Improved RX/TX queue

With an improved packet FIFO on FPGA, I was able to crank up the number of netpp requests per second, as shown in the Wireshark trace below. This test makes sure that several netpp clients can poll the netpp node at high frequencies without disturbing each other. The blue trace is a repeated poll of the full property tree, the red bars are the timed queries from a process viewer daemon. With no other disturbance, we get the occasional drops (e.g. at 45s, 101.5s) due to the queue running full


In-Field/System update

The default boot loader firmware supports self-programming over the cable. That means, the netpp_node can be supplied remotely with a new firmware image via a simple upgrade procedure over netpp. If the uploaded image is faulty, the system will fall back to the default boot loader. However, if the new design itself has errors, the system will be unable to recover  unless the reset button is pressed.

Test procedures

As the full model of this design is available for simulation, we can verify the system effectively against stress situations. In particular, network safety is of outmost importance. The test procedure check list of the dagobert SoC:


  • ARP and ping flooding
  • netpp packet performance test
  • Broken packet handling
  • Lost interrupt scenario (packet queue desynchronization)

Yet open

  • Jumbo packet flooding was tested, however support can not be enabled on this platform for the receiver queue. It is however possible under certain circumstances to generate (TX) Jumbo packets for experimental purposes. The performance gain is however minimal.

Extended RTOS support

Currently, the netpp node runs a simply bare metal main loop without particular RTOS functionality, i.e. all user code must be designed such that there are no blocking wait statements. Let me just put the FAQ together:

  • There is FreeRTOS and eCos support code for the ZPU architecture. However, I have no plans in going down that road, you’d be on your own.
  • A NuttX port is currently under evaluation and may likely be released in a few months time. NO PROMISES!
  • A simple ‘netpp OS’ with very basic task management is in experimental stage:
    • Guaranteed latency time from driver interrupt to queue handler task
    • ‘User space’ context switch when sleep() called
    • Very cheap context switches due to ZPU architecture improvements

Code size is an issue on this particular platform, larger programs (TCP stacks) need to move to the SPI flash overlay program space. Since this involves caching, the program timing is no longer fully deterministic and the RTOS functionality can only apply to program code running in the L1 memory.


Posted on

ECP5G Versa board under Linux

The ECP5 platform has caught quite some attention a while ago, being on the rather high end with respect to GigaBit LVDS communication of all sorts. It’s the successor of the ECP3 which has been performing well with HDMI applications in the past.

Lattice Semiconductor had launched a Promo action for this Versa ECP5G board. With great assistance from Future Electronics Switzerland I was able to get hold of a devkit.

Running the Lattice Diamond toolchain on a Linux environment has so far been straightforward, with minor quirks on the GUI side. Let me revisit the important items to get up and running. If you happen to have a Linux OS installed that does not match the Lattice Semi recommendations for a development system, you might want to look at the Docker approach on how to set up the environment.

Programmer preparation

Porting a very simple CPU with UART interface to the platform, the final step is flashing things down into the board using the Lattice Diamond Programmer. Usually, when not having installed any udev rules, these are the steps you have to go through with root powers, in order to get the USB FTDI programmer interface recognized:

Bus 001 Device 013: ID 0403:6010 Future Technology Devices International, Ltd FT2232C Dual USB-UART/FIFO IC

According to this device enumeration, you give access to the device:

chmod a+rw /dev/bus/usb/001/013

and unbind the first ttyUSB0 device from the ftdi_sio driver – unfortunately, the Lattice driver is not able to unbind from within.

echo -n $TTY_ID > /sys/bus/usb/drivers/ftdi_sio/unbind

Replace $TTY_ID by the tty device entry in /sys/bus/usb/drivers/ftdi_sio/ that typically is of the form “1-3.2:1.0” (the trailing 0 stands for port A where the JTAG is at). If you have plugged in more than one FTDI adapter with UART capabilities, you will see several and need to figure out which is which.

Now you can download code into the programmer.

To automatize the process, you could also use the script below:


allow_io=`lsusb | sed -n 's/^Bus \([0-9]*\) Device \([0-9]*\): ID 0403:6010 .*/\1\/\2/p'`

unbind_tty=`ls /sys/bus/usb/drivers/ftdi_sio/ | sed -n 's/\(.*\:1\.0\).*/\1/p'`

sudo chmod a+rw \/dev\/bus\/usb\/$allow_io
sudo sh -c "echo $unbind_tty > /sys/bus/usb/drivers/ftdi_sio/unbind"

SPI flash download

When downloading your design into the SPI flash, make sure you have MASTER_SPI_PORT=ENABLED set in your *.lpf file. Otherwise the programmer will fail with an error report on CHECK_ID:

ERROR - Verification Error...when Processing function: 'CHECK_ID'

Then you’ll have to use the fast download (into SRAM) with an enabled Master SPI in order to be able to flash again.

Design issues

When starting to port our SoC design to this board, quite a few issues came up:

  • Don’t bother developing with Diamond v3.7. Some very obscure behaviour with wrong I/O mapping cost quite some headache. Seems to be solved in v3.8
  • v3.8 however is somewhat misleading with respect to output and return states from the Synopsys Synthesis engine ‘Synplify’. Make sure to check your design thoroughly, if Synplify throws an error, Diamond sometimes would not recognize that and map/PAR an old design netlist.
  • Weird random behaviour can occur under Linux with Diamond during PAR, like error messages with respect to path names. This seems to happen especially with long names and underscores. The behaviour has been around in many previous versions of Diamond, Tech Support has refused to accept this as a bug so far. The workaround is to call PAR using the command line (TCL) or use the Run Manager.
    Addendum: It seems that this bug is fixed in v3.9, but there is no release note about it.


Talking through the UART

In theory, you should be able to talk to your design through the UART (if your design supports it) by firing up minicom:

minicom -o -D /dev/ttyUSB1

Now here comes the catch: The default EEPROM from my Versa 5G board did not have a correct descriptor, however it came with the default VID:PID from FTDI, so the ftdi_sio driver would recognize it, but in fact not communicate, neither report an error.

So, in order to properly use this board, you may have to erase the EEPROM of the FTDI adapter on the Versa kit using FTDIs Mprog tool or alike. You might want to save the previous EEPROM content for reference, however it does not seem needed, the Programmer recognizes the Board just fine.

Finally, after downloading our SoC setup into the board, it is talking:

Booting, HW rev: 04 -- Running at 50 MHZ

------------- test shell -------------
-- ZpuSoC for Versa ECP5 --
-- (c) 2017 --
-- type 'h' for help --

Simulation issues

When trying to fully simulate the SoC setup with PLL primitives and some instanced ip cores created by the Clarity module (obviously that’s the IPExpress descendant for ECP5), it turns out that some of the simulation primitives for the VHDL side are missing.

Some of them could be converted using the VHDL conversion trick from Icarus Verilog by the following Makefile rule:

%.vhdl: %.v
    iverilog -tvhdl -o $@ -pdepth=1 $<

However, some of the needed components might not convert without additional tweaking. Hopefully, Lattice Semi will come out with updated VHDL libraries.

IPcore simulation under GHDL

When generating IP cores that depend on library items and running them through GHDL, you might see this error message:

warning: component instance "scuba_vlo_inst" is not bound

However, if you have prepared your FPGA primitive components library, like right, the primitive simulation models should be in there (search for ‘vlo’ in the *.cf file if in doubt).

The reason why this is happening is that there could be component prototype declarations in the IP core file that shouldn’t be there in order to reference to the components from the library. So: Just remove the “component” declaration sections and all should link fine. The drawback of this is, that you need two IP core versions, one for simulation and one for synthesis. If you have a better solution, let me know.

Next steps

Now, the fancy stuff on this board to be evaluated is:

  • DDR3 memory
  • Two GigE capable interfaces

Also, the ECP5 on this board has enough resources to run several ZPUng cores simultaneously. For safety reasons, we wouldn’t want the IoT crap run in the same environment as our controlling main loop.

Therefore, a second processor (Core B) is instanced for running the Ethernet Stack (lwip) only, while maintaining a simple DMA channel to the controller Core A for communication. If core B is compromised, Core A will still maintain its “hardened” control loop and not go haywire.

Implementing DDR3 is tricky. Therefore you might want to use the DDR3 IP core supplied by Lattice. On the demo kit, it will work for a few hours and then pull the global reset.


Of course, I was very curious about the Ethernet ports on this board. It’s armed with two GigE capable Marvell Phys whose data sheets are a little hard to get hold of, but one might also look at various source code around the web or just check the reference design from Lattice. The reference design uses lwip, since I only need and want UDP, I ported a zerocopy-capable UDP stack I developed for the Blackfin EMAC to the ZPUng SoC (“cranach”), which is equipped with some DMA capable scratch pad memory for a proper packet queue.

After all, the FPGA is now able to speak netpp, so I can turn on an LED for example:

> netpp UDP:192.168.05:2016 LED.Yellow 1

Resource usage

You might want to know how much logic and RAM is consumed by this solution.

Design Summary
   Number of registers:   2770 out of 44457 (6%)
      PFU registers:         2767 out of 43848 (6%)
      PIO registers:            3 out of   609 (0%)
   Number of SLICEs:      2963 out of 21924 (14%)
      SLICEs as Logic/ROM:   2891 out of 21924 (13%)
      SLICEs as RAM:           72 out of 16443 (0%)
      SLICEs as Carry:        309 out of 21924 (1%)
   Number of LUT4s:        4154 out of 43848 (9%)
   Number of block RAMs:  28 out of 108 (26%)
   Number of DCS:  1 out of 2 (50%)
   Number of PLLs:  1 out of 4 (25%)

As for the actual program code, containing:

  • UDP stack supporting ARP, ICMP ping
  • Minimal shell (UART)
  • System I/O drivers (UART, Timer, MAC, PWM)
  • netpp minimal server with some LED handling

This is what’s effectively downloaded into the target:

(gdb) init
Loading section .fixed_vectors, size 0x400 lma 0x0
Loading section .l1.text, size 0x4af5 lma 0x400
Loading section .rodata, size 0x168 lma 0x4ef8
Loading section .rodata.str1.4, size 0xf60 lma 0x5060
Loading section .data, size 0x2c0 lma 0x5fc0
Start address 0x0, load size 25213

There’s a significant amount of string data for debugging in the .rodata.str1.4 section due to debugging info, plus some netpp descriptors. These again could be ‘overlayed’ to the SPI flash, as they are not too frequently accessed. To be investigated next…

Posted on

Dockerized FPGA toolchains

For a while, LXC (linux container) technology might have been known for the better chroot. Docker takes this approach even further by letting you mess with changes and undo them easily. You can just install foreign binaries and play with dependencies without compromising your desktop host’s runtime libraries. This article describes how to easily put commercial FPGA toolchains into a docker environment and carry them around on an external hard disk for quick installation on a new developer machine.

Lattice Diamond

The Lattice Diamond toolchain comes as RPM package and is recommended to be run under a Redhat OS. It is possible to convert to a DEB and install and run it on a Debian system likewise, but we try basing it on a minimal RPM compatible environment as an existing CentOS container.

This short howto describes the necessary steps for Diamond v3.8.

First create a file called Dockerfile in a sandbox work directory, like:

FROM centos

RUN yum update ; \
	yum install -y freetype libSM libXrender fontconfig libXext libXt \
		tcl xorg-x11-fonts-Type1 net-tools libusb-0.1.4 usbutils \

RUN adduser -u 1000 -g 100 diamond; echo "." >> /home/diamond/.bashrc

COPY /home/diamond

ENTRYPOINT ["/bin/bash"]

You could automate things more by copying the RPM into the docker container, but that would just take useless space in the image (and stripping this back down would take little elegant extra action). Therefore we mount the directory where the RPM was downloaded to into the container. Unfortunately, there is no clean way to pull it from an official source.

You may have to sort out permissions first (by adding yourself to the ‘docker’ group) or prepend ‘sudo’ to each docker call.

The is a small setup script needed to initialize the environment:


export DISPLAY=:0.0


source $bindir/diamond_env

export LM_LICENSE_FILE=$HOME/license.dat

Other than that, follow these steps:

  1. Change UID and GID for adduser command in the Dockerfile, if necessary
  2. Run
    docker build -t diamond .
  3. Then you can start the docker container using the following command. Note you have to replace $ETHADDR by the ethernet MAC address you have registered your license.dat to. Also, make $PATH_TO_RPM_INSTALLDIR point to the directory where you installed the RPM from.
    docker run -ti -e DISPLAY=:0 \
    --mac-address=$ETHADDR \
    --privileged --ipc host \
    -v /dev/bus/usb/:/dev/bus/usb/ \
    -v /tmp/.X11-unix/:/tmp/.X11-unix diamond:latest
  4. Then install diamond by
    rpm -i /mnt/diamond_3_8-base_x64-115-3-x86_64-linux.rpm
  5. You still need to copy your license.dat into your /home/diamond/ directory within the docker container. If it’s supposed to be elsewhere, edit the LM_LICENSE_FILE environment variable in
  6. Then you should be able to start the diamond GUI as user ‘diamond’:
    su -l diamond
  7. Finally, if you are happy with your changes, you might want to commit everything to a new image:
    docker commit -m "Diamond install" $HASH_OF_YOUR_CONTAINER diamond:v0

A few notes

The -v option takes care about sharing your X11 sockets with the docker sandbox. Note that there also some options inside the to make QT work in this limited environment. Don’t try to run Diamond as root, as the X11 forwarding will not be allowed.

For Diamond v3.9, the libXt package needs to be installed, the Dockerfile listing was updated accordingly.

Lattice iCEcube2

The iCEcube2 2017.0.8 release includes a mix of 32 and 64 bit binaries, therefore requires installation of the i686 variants of some libraries. The minimal Dockerfile would here look as follows:

FROM centos

RUN yum update ; \
    yum install -y libXext libpng libSM libXi libXrender libXrandr \
        libXfixes libXcursor libXinerama freetype fontconfig

# 32 bit support:
RUN yum install -y glibc.i686 glib2.i686 \
    zlib.i686 libXext.i686 libpng12.i686 \
    libSM.i686 libXrender.i686 libXfixes.i686 libXrandr.i686 \
    libXcursor.i686 freetype.i686 fontconfig.i686 

RUN adduser -u 1000 -g 100 icecube2

ENTRYPOINT ["/bin/bash"]

Run the iCEcube2 installer binary obtained from as user icecube2 and select the license file you have registered via the /mnt directory from your host OS.

Xilinx ISE

The same procedure works  for the ISE 14.7 toolchain. The Dockerfile in this case is almost similar, although includes a few X11 extras.

FROM centos

RUN yum update ; \
    yum install -y freetype libSM libXrender fontconfig libXext \
        tcl xorg-x11-fonts-Type1 net-tools libXScrnSaver-1.2.2 \
        libXi libXrandr \
        libusb-0.1.4 usbutils

RUN adduser -u 1000 -g 100 ise; echo "." >> /home/ise/.bashrc

COPY /home/ise

ENTRYPOINT ["/bin/bash"]

Downloading and unpacking ISE

Make sure you have downloaded the following files from the Xilinx website:


Then untar the first one by

> tar xf Xilinx_ISE_DS_14.7_1015_1-1.tar

This directory path will have to be exported to docker under $PATH_TO_UNPACKED_XILINX_TAR below.

The file:


export DISPLAY=:0.0

alias ise=$XILINXDIR/ISE/bin/lin64/ise

Likewise, the docker container is run by something like:

docker run -ti -e DISPLAY=:0 \
--mac-address=$ETHADDR \
--privileged --ipc host \
-v /dev/bus/usb/:/dev/bus/usb/ \
-v /tmp/.X11-unix/:/tmp/.X11-unix ise:latest

Once you’re inside docker, install as root by running /mnt/xsetup. This may take a long time. Left to do:

  • Install a license file
  • Mess with the USB drivers for Impact. This is left open to the user. I am using xc3sprog from my host system.