
Dockerized FPGA toolchains

For a while, LXC (Linux container) technology has been known as the better chroot. Docker takes this approach further by letting you experiment with changes and undo them easily. You can install foreign binaries and play with dependencies without compromising your desktop host’s runtime libraries. This article describes how to put commercial FPGA toolchains into a Docker environment and carry them around on an external hard disk for quick installation on a new developer machine.

Lattice Diamond

The Lattice Diamond toolchain comes as an RPM package, and running it under a Red Hat OS is recommended. It is possible to convert the package to a DEB and run it on a Debian system, but here we base it on a minimal RPM-compatible environment: an existing CentOS container.

This short howto describes the necessary steps for Diamond v3.10 (older versions work likewise):

First create a file called Dockerfile, like:

FROM centos

# -y keeps yum non-interactive, && stops the build if the update fails
RUN yum update -y && \
	yum install -y freetype libSM libXrender fontconfig libXext libXt \
		tcl tcsh perl libXft xorg-x11-fonts-Type1 net-tools \
		libXScrnSaver-1.2.2 \
		libusb-0.1.4 usbutils


RUN adduser -u 1000 -g 100 diamond; echo ". env-setup.sh" >> /home/diamond/.bashrc

COPY env-setup.sh /home/diamond

ENTRYPOINT ["/bin/bash"]

You could automate things further by copying the RPM into the Docker image, but that would just waste space in the image (and stripping it back out would take an extra step). Therefore we mount the directory containing the downloaded RPM into the container instead. Unfortunately, there is no clean way to pull the RPM from an official source with wget.

You may have to sort out permissions first (by adding yourself to the ‘docker’ group) or prepend ‘sudo’ to each docker call.

The env-setup.sh is a small setup script needed to initialize the environment:

DIAMOND_DIR=/usr/local/diamond/3.10_x64

export DISPLAY=:0.0
bindir=$DIAMOND_DIR/bin/lin64

export QT_GRAPHICSSYSTEM=native

source $bindir/diamond_env

export LM_LICENSE_FILE=$HOME/license.dat

Building the container

  1. Change UID and GID for adduser command in the Dockerfile, if necessary
  2. Run
    docker build -t diamond .
  3. Start the Docker container using the following command. Replace $ETHADDR with the Ethernet MAC address your license.dat is registered to, and make $PATH_TO_RPM_INSTALLDIR point to the directory where the downloaded RPM resides.
    docker run -ti -e DISPLAY=:0 \
    --mac-address=$ETHADDR \
    --privileged --ipc host \
    -v $PATH_TO_RPM_INSTALLDIR:/mnt \
    -v /dev/bus/usb/:/dev/bus/usb/ \
    -v /tmp/.X11-unix/:/tmp/.X11-unix diamond:latest
  4. Then install Diamond with
    rpm -i /mnt/diamond_3_10-base_x64-111-2-x86_64-linux.rpm
  5. You still need to copy your license.dat into your /home/diamond/ directory within the docker container. If it’s supposed to be elsewhere, edit the LM_LICENSE_FILE environment variable in env-setup.sh.
  6. Then you should be able to start the diamond GUI as user ‘diamond’:
    su -l diamond
    diamond
  7. Finally, if you are happy with your changes, you might want to commit everything to a new image:
    docker commit -m "Diamond install" $HASH_OF_YOUR_CONTAINER diamond:v3.10

A few notes

The -v /tmp/.X11-unix/:/tmp/.X11-unix option takes care of sharing your X11 socket with the Docker sandbox. Note that there are also some settings inside env-setup.sh (such as QT_GRAPHICSSYSTEM) to make Qt work in this limited environment. Don’t try to run Diamond as root, as the X11 forwarding will not be allowed.

Lattice iCEcube2

The iCEcube2 2017.0.8 release includes a mix of 32-bit and 64-bit binaries and therefore requires the i686 variants of some libraries. A minimal Dockerfile looks as follows:

FROM centos

# -y keeps yum non-interactive, && stops the build if the update fails
RUN yum update -y && \
    yum install -y libXext libpng libSM libXi libXrender libXrandr \
        libXfixes libXcursor libXinerama freetype fontconfig

# 32 bit support:
RUN yum install -y glibc.i686 glib2.i686 \
    zlib.i686 libXext.i686 libpng12.i686 \
    libSM.i686 libXrender.i686 libXfixes.i686 libXrandr.i686 \
    libXcursor.i686 freetype.i686 fontconfig.i686 

RUN adduser -u 1000 -g 100 icecube2

ENTRYPOINT ["/bin/bash"]

Run the iCEcube2 installer binary obtained from latticesemi.com as user icecube2. When it asks for a license, select your registered license file, which you make available from the host OS through the /mnt directory.

Xilinx ISE

The same procedure works for the ISE 14.7 toolchain. The Dockerfile in this case is almost identical, although it includes a few X11 extras.

FROM centos

# -y keeps yum non-interactive, && stops the build if the update fails
RUN yum update -y && \
    yum install -y freetype libSM libXrender fontconfig libXext \
        tcl xorg-x11-fonts-Type1 net-tools libXScrnSaver-1.2.2 \
        libXi libXrandr \
        libusb-0.1.4 usbutils

RUN adduser -u 1000 -g 100 ise; echo ". env-setup.sh" >> /home/ise/.bashrc

COPY env-setup.sh /home/ise

ENTRYPOINT ["/bin/bash"]

Downloading and unpacking ISE

Make sure you have downloaded the following files from the Xilinx website:

Xilinx_ISE_DS_14.7_1015_1-1.tar
Xilinx_ISE_DS_14.7_1015_1-2.zip.xz
Xilinx_ISE_DS_14.7_1015_1-3.zip.xz
Xilinx_ISE_DS_14.7_1015_1-4.zip.xz

Then untar the first one by

> tar xf Xilinx_ISE_DS_14.7_1015_1-1.tar

The path of the unpacked directory has to be passed to Docker as $PATH_TO_UNPACKED_XILINX_TAR below.

The env-setup.sh file:

#!/bin/bash

export DISPLAY=:0.0
XILINXDIR=/opt/Xilinx/14.7/ISE_DS

. $XILINXDIR/settings64.sh
alias ise=$XILINXDIR/ISE/bin/lin64/ise

Likewise, the docker container is run by something like:

docker run -ti -e DISPLAY=:0 \
--mac-address=$ETHADDR \
--privileged --ipc host \
-v $PATH_TO_UNPACKED_XILINX_TAR:/mnt \
-v /dev/bus/usb/:/dev/bus/usb/ \
-v /tmp/.X11-unix/:/tmp/.X11-unix ise:latest

Once you’re inside docker, install as root by running /mnt/xsetup. This may take a long time. Left to do:

  • Install a license file
  • Sort out the USB drivers for iMPACT. This is left to the user; I am using xc3sprog from my host system.

Virtual ROM on small FPGAs

Readonly data – outsourced

When a display application on a small FPGA has to print strings, keeping the character sets in internal block RAM is a waste of resources, or the block RAM may not even be sufficient. For example, the 64kB character map for a normal and an inverted font, each in RGB and BGR subpixel-smoothed renderings (more below), would need more block RAM than is available on the Spartan3-250k device.
The Spartan3 on my good old Papilio has an SPI flash attached, of which the FPGA bit stream uses less than 50%. Tadaa: plenty of space for character bitmaps. So we can just store the bitmap in an unused sector of the SPI flash and blit-copy from there to the LCD display directly.
But what if there’s more read-only data, like second-stage program code or coprocessor microcode that is loaded on the fly by an applet? If we don’t need it for the boot process, it should not live in block RAM permanently, but it should still execute from block RAM. Screams for a cache, doesn’t it?
So the simple solution is to create a small controller entity ‘scache’ inside the system-specific peripheral of the SoC. It simply watches accesses to certain addresses and raises an exception once such an address is hit. The exception vector then jumps into a handler routine which does all the loading from SPI flash into the physical cache memory area. Then, for the next LOAD instruction, the virtual address internally translates to the physical cache address.
This requires only very simple logic. However, it runs through some program code and needs a bit of time to load from the SPI flash.
It turns out this is barely noticeable for the LCD display.

Under the hood: Linker scripting

OK, so there is plenty more data in the .rodata section. We could implement some kind of overlay loader and a file system, but why make it complicated when our data is fairly static? We simply relocate the externally stored, cached data using a linker script. Say our program memory ranges from 0x0000 to 0x2000 and the cache is allocated right after it. Then we define a memory specification in the linker script as shown below:

MEMORY
{
        l1ram(rwx): ORIGIN = 0x0000, LENGTH = 0x2000
        l1cache(rwx): ORIGIN = 0x2000, LENGTH = 0x2000
        xdata(r): ORIGIN = 0x10000, LENGTH = 0x8000
}

For the .rodata section, we can simply use a line like

.rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) } > xdata

to allocate the read-only areas into the virtual SPI ROM starting from 0x10000. In the end, building the whole thing results in an ELF file, which we only have to split into the boot ROM area and the second-stage binaries that live in the SPI flash.

But we may also have read-only data that should be present at all times, like library code that is frequently used. For this purpose, we can define our own sections and use __attribute__ on the data that we want in BRAM:

__attribute__((section(".l1.rodata"))) char g_fifobuf[32];

I will not dive into the details of linker scripting, but in the same way you can also place entire libraries or object files in specific memory regions.

This simple technique allows you to pack quite a bit of code into small FPGA SoCs.

The LCD example

[Figure: LCD screen showing the subpixel smoothing example]

Here you can see an example of the character table in action. I mentioned the RGB/BGR subpixel smoothing above, which leads to somewhat higher ROM usage. The LCDs used have quite a bit of functionality, like setting the orientation of the screen display. You could ignore this when designing a character table and use a simple black/white scheme. However, at a 4×8 pixel character size, such a font looks quite unreadable and you’re better off with subpixel smoothing. Smoothing in turn is sensitive to the subpixel order, so for top and bottom display orientation you already need the font rendered in two variants, and in color. Since we have the space, it is much simpler to drop the entire bitmap into the flash than to figure out compression techniques.

Program code

Likewise, program code that is not used too frequently can be dropped into external SPI flash. This comes in handy when a program grows during development and its space usage cannot be determined a priori. The ZPUng is able to handle a specific exception signal that is raised by the SCACHE controller. The cache handler microcode routine then loads code from SPI flash and resumes execution, translating the virtual PC address into a physical address inside the cache area, from which instructions are actually fetched. This is the trick used to run the netpp communication library for IoT applications on FPGAs with limited resources, like the Papilio One.


		

netpp on programmable logic: IoT for the FPGA

For quite a while I wouldn’t have said it’s impossible, but I wouldn’t have put much effort into it either. Why would you, if a spare $1 microprocessor can run a simple communication stack like netpp?
Well, sometimes it’s time to try something else: running a soft core CPU (ZPU) on small FPGAs has found some interest due to its limited resource consumption. The ZPU will even fit on a $5 FPGA and still leave some space for specific interfaces like motor control. It is a slow stack machine, and even the fastest pipelined implementations don’t really beat MIPS-like architectures. This doesn’t bother us when we just have to configure a set of registers, though, and the ZPU architecture compensates with quite good code density.
To keep a long sermon short: a full netpp stack running over a UART interface fits in less than 15kB of memory and runs on a MachXO2-7000 from Lattice, for example, with less than 50% logic usage.
Who’s still saying that an FPGA is too dumb for the internet?

OK, there’s one little missing piece: the TCP/IP stack and the Ethernet MAC. For this purpose, I’m using an ESP8266 module. You’re right, we don’t want to do the full networking on the FPGA – yet.

Update

The solution was presented at the Embedded World Conference 2016. The paper is available for download below.

embedded2016