
ECP5G Versa board under Linux

The ECP5 platform caught quite some attention a while ago, being positioned at the rather high end with respect to gigabit LVDS communication of all sorts. It is the successor of the ECP3, which has performed well in HDMI applications in the past.

Lattice Semiconductor had launched a promotion for this Versa ECP5G board, and with great assistance from Future Electronics Switzerland I was able to get hold of a devkit.

Running the Lattice Diamond toolchain in a Linux environment has so far been straightforward, with minor quirks on the GUI side. Let me revisit the important items for getting up and running. If the Linux OS you have installed does not match the Lattice Semi recommendations for a development system, you might want to look at the Docker approach for setting up the environment.

Programmer preparation

Having ported a very simple CPU with a UART interface to the platform, the final step is flashing things into the board using the Lattice Diamond Programmer. If you have not installed any udev rules, these are the steps you have to go through with root privileges in order to get the USB FTDI programmer interface recognized. First, locate the device using lsusb:

Bus 001 Device 013: ID 0403:6010 Future Technology Devices International, Ltd FT2232C Dual USB-UART/FIFO IC

According to this device enumeration (bus 001, device 013), you give access to the device:

chmod a+rw /dev/bus/usb/001/013

and unbind the first ttyUSB0 device from the ftdi_sio driver; unfortunately, the Lattice driver is not able to do this unbinding by itself:

echo -n $TTY_ID > /sys/bus/usb/drivers/ftdi_sio/unbind

Replace $TTY_ID with the tty device entry found in /sys/bus/usb/drivers/ftdi_sio/, which typically has the form “1-3.2:1.0” (the trailing 0 denotes port A, where the JTAG is located). If you have plugged in more than one FTDI adapter with UART capabilities, you will see several entries and need to figure out which is which.

Now you can download code into the board through the programmer.
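
By the way, to make the permission part permanent, a udev rule can take over the chmod step; a minimal sketch with a hypothetical rules file name (the ftdi_sio unbind is still required):

sudo tee /etc/udev/rules.d/99-versa-ftdi.rules <<'EOF'
# FT2232 (ID 0403:6010) on the Versa kit: allow user access
SUBSYSTEM=="usb", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6010", MODE="0666"
EOF
sudo udevadm control --reload-rules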

To automate the process, you could also use the script below:

#!/bin/bash

# Find the bus/device number of the FT2232 interface (ID 0403:6010)
allow_io=`lsusb | sed -n 's/^Bus \([0-9]*\) Device \([0-9]*\): ID 0403:6010 .*/\1\/\2/p'`

# Find the ftdi_sio binding of port A (the entry ending in ":1.0")
unbind_tty=`ls /sys/bus/usb/drivers/ftdi_sio/ | sed -n 's/\(.*\:1\.0\).*/\1/p'`

# Grant access to the USB device and unbind port A from the UART driver
sudo chmod a+rw /dev/bus/usb/$allow_io
sudo sh -c "echo $unbind_tty > /sys/bus/usb/drivers/ftdi_sio/unbind"
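
Saved under a name of your choice, for example versa-prog-setup.sh (hypothetical), it is run once after plugging in the board:

sh versa-prog-setup.sh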

SPI flash download

When downloading your design into the SPI flash, make sure you have MASTER_SPI_PORT=ENABLED set in your *.lpf file. Otherwise the programmer will fail with an error report on CHECK_ID:

ERROR - Verification Error...when Processing function: 'CHECK_ID'

If that has happened, you’ll have to use the fast download (into SRAM) of a configuration with Master SPI enabled in order to be able to access the flash again.
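
For reference, the master SPI preference ends up as a SYSCONFIG line in the *.lpf file; a sketch only, as the exact keyword spelling (ENABLE vs. ENABLED) and the remaining options depend on your Diamond version, so double-check against the preference guide:

SYSCONFIG MASTER_SPI_PORT=ENABLE ;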

Design issues

When starting to port our SoC design to this board, quite a few issues came up:

  • Don’t bother developing with Diamond v3.7. Some very obscure behaviour with wrong I/O mapping cost me quite a headache; this seems to be solved in v3.8.
  • v3.8, however, is somewhat misleading with respect to output and return states from the Synopsys synthesis engine ‘Synplify’. Make sure to check your design thoroughly: if Synplify throws an error, Diamond sometimes does not recognize it and will map/PAR an old design netlist.
  • Weird random behaviour can occur with Diamond under Linux during PAR, such as error messages concerning path names. This seems to happen especially with long names containing underscores. The behaviour has been around in many previous versions of Diamond; Tech Support has so far refused to accept it as a bug. The workaround is to call PAR from the command line (Tcl) or to use the Run Manager.
    Addendum: It seems that this bug is fixed in v3.9, but there is no release note about it.

Talking through the UART

In theory, you should be able to talk to your design through the UART (if your design supports it) by firing up minicom:

minicom -o -D /dev/ttyUSB1

Now here comes the catch: the EEPROM on my Versa 5G board did not contain a correct descriptor. It did come with the default VID:PID from FTDI, so the ftdi_sio driver would recognize it, but it would neither communicate nor report an error.

So, in order to properly use this board, you may have to erase the EEPROM of the FTDI adapter on the Versa kit using FTDI’s MProg tool or similar. You might want to save the previous EEPROM content for reference; however, that does not seem to be needed, as the Programmer recognizes the board just fine without it.
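
Under Linux, the ftdi_eeprom tool from the libftdi package might do the job as well; a sketch only, assuming a config file that names the 0403:6010 device (the exact flags and config format are documented with libftdi):

ftdi_eeprom --read-eeprom versa.conf    # save the current content first
ftdi_eeprom --erase-eeprom versa.conf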

Finally, after downloading our SoC setup into the board, it is talking:

Booting, HW rev: 04 -- Running at 50 MHZ

------------- test shell -------------
-- ZpuSoC for Versa ECP5 --
-- (c) 2017 www.section5.ch --
-- type 'h' for help --
# 

Simulation issues

When trying to fully simulate the SoC setup with PLL primitives and some instantiated IP cores created by the Clarity module (obviously the IPExpress descendant for the ECP5), it turns out that some of the simulation primitives for the VHDL side are missing.

Some of them can be converted using the VHDL conversion trick from Icarus Verilog with the following Makefile rule:

%.vhdl: %.v
    iverilog -tvhdl -o $@ -pdepth=1 $<
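
With this rule in place, a Clarity-generated netlist converts via, for example (hypothetical file name):

make pll.vhdl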

However, some of the needed components might not convert without additional tweaking. Hopefully, Lattice Semi will come out with updated VHDL libraries.

IP core simulation under GHDL

When generating IP cores that depend on library items and running them through GHDL, you might see this error message:

warning: component instance "scuba_vlo_inst" is not bound

However, if you have prepared your FPGA primitive component library (such as ecp5um-obj93.cf) correctly, the primitive simulation models should be in there (search for ‘vlo’ in the *.cf file if in doubt).

The reason this happens is that the IP core file may contain component prototype declarations that shouldn’t be there if the components are to be referenced from the library. So: just remove the “component” declaration sections and everything should link fine. The drawback is that you need two IP core versions, one for simulation and one for synthesis. If you have a better solution, let me know.
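
For completeness, this is roughly how the tweaked IP core compiles against the prepared primitive library; a sketch in which paths and unit names are placeholders:

# analyze the stripped IP core plus test bench against the library path
ghdl -a --std=93 -P/path/to/lattice/libs pll_sim.vhd tb_pll.vhd
# elaborate and run the test bench
ghdl -e --std=93 -P/path/to/lattice/libs tb_pll
ghdl -r --std=93 -P/path/to/lattice/libs tb_pll --wave=tb_pll.ghw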

Next steps

Now, the fancy stuff to be evaluated on this board is:

  • DDR3 memory
  • Two GigE capable interfaces

Also, the ECP5 on this board has enough resources to run several ZPUng cores simultaneously. For safety reasons, we wouldn’t want the IoT crap to run in the same environment as our controlling main loop.

Therefore, a second processor (Core B) is instantiated to run the Ethernet stack (lwIP) only, while maintaining a simple DMA channel to the controller Core A for communication. If Core B is compromised, Core A will still maintain its “hardened” control loop and not go haywire.

Implementing DDR3 yourself is tricky, so you might want to use the DDR3 IP core supplied by Lattice. On the demo kit, it will work for a few hours and then pull the global reset.

Networking

Of course, I was very curious about the Ethernet ports on this board. It is armed with two GigE capable Marvell PHYs whose data sheets are a little hard to get hold of, but one can also look at various source code around the web or just check the reference design from Lattice. The reference design uses lwIP; since I only need and want UDP, I ported a zero-copy capable UDP stack I had developed for the Blackfin EMAC to the ZPUng SoC (“cranach”), which is equipped with some DMA capable scratchpad memory for a proper packet queue.

As a result, the FPGA is now able to speak netpp, so I can, for example, turn on an LED:

> netpp UDP:192.168.0.5:2016 LED.Yellow 1

Resource usage

You might want to know how much logic and RAM is consumed by this solution.

Design Summary
   Number of registers:   2770 out of 44457 (6%)
      PFU registers:         2767 out of 43848 (6%)
      PIO registers:            3 out of   609 (0%)
   Number of SLICEs:      2963 out of 21924 (14%)
      SLICEs as Logic/ROM:   2891 out of 21924 (13%)
      SLICEs as RAM:           72 out of 16443 (0%)
      SLICEs as Carry:        309 out of 21924 (1%)
   Number of LUT4s:        4154 out of 43848 (9%)
   Number of block RAMs:  28 out of 108 (26%)
   Number of DCS:  1 out of 2 (50%)
   Number of PLLs:  1 out of 4 (25%)

As for the actual program code, it contains:

  • UDP stack supporting ARP, ICMP ping
  • Minimal shell (UART)
  • System I/O drivers (UART, Timer, MAC, PWM)
  • netpp minimal server with some LED handling

This is what’s effectively downloaded into the target:

(gdb) init
Loading section .fixed_vectors, size 0x400 lma 0x0
Loading section .l1.text, size 0x4af5 lma 0x400
Loading section .rodata, size 0x168 lma 0x4ef8
Loading section .rodata.str1.4, size 0xf60 lma 0x5060
Loading section .data, size 0x2c0 lma 0x5fc0
Start address 0x0, load size 25213


There is a significant amount of string data in the .rodata.str1.4 section due to debugging messages, plus some netpp descriptors. These could again be ‘overlayed’ into the SPI flash, as they are not accessed too frequently. To be investigated next…

SoC design virtualization

Live simulation examples

The video below demonstrates a live running CPU simulation which can be fully debugged through gdb (at a somewhat slower speed). Code can also be downloaded into the running simulation without the need to recompile.
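
Attaching to the simulated CPU works like any remote gdb session; a sketch, where the port number is merely an assumption of the local setup:

(gdb) target remote localhost:2000
(gdb) load
(gdb) continue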

Older videos

These are legacy Flash animations which may no longer be supported by your browser. They demonstrate various trace scenarios of cycle accurate virtual SoC debugging.

Virtualization benefits

Being able to run a full cycle accurate CPU simulation is helpful in various situations:

  • Verification of algorithms
  • Hardware verification: make sure an IP core is functioning properly and not prone to timing issues
  • Firmware verification: strict verification of proper accesses (accesses to uninitialized registers or variables are found immediately)
  • Safety relevant applications: Full proof of correct functionality of a program main loop

Virtual interfaces and entities

Virtual entities allow looping external data or events into a fully cycle and timing accurate HDL simulation. For example, interaction with user space software can take place through a virtual UART console: a terminal program can talk to a program running on a simulated CPU with a 16550 UART.
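
Assuming the simulation prints the name of the PTY it created, you attach to it just like to real hardware; the device number below is only an example:

minicom -o -D /dev/pts/12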

For all these virtualization approaches, the software has to take a very different timing into account, because the simulation runs slower by up to a factor of 1000 when simulating complex SoC environments. However, using such mimicked timing models, it turns out that the software in general becomes more stable and race conditions are effectively avoided.

So far, the following simple models are covered by our co-simulation library:

  • Virtual UART/PTY
  • FIFO: Cypress FX2 model, FT2232H model
  • Packet FIFO: Ethernet MAC receive/transmit (without Phy simulation)
  • Virtual Bus: access wishbone components by address, or registers directly from Python, like:
    fpga.SPI_CONFIG.ENABLE.set(True)

The virtualization library is released as an open source version at: github:hackfin/ghdlex

More complex model concepts:

RAM

For fast simulation, a dual-port RAM model was implemented for co-simulation that allows access through a back door via the network. That way, new RAM content can be updated in a fraction of a second for regression tests via simple Python scripting.

Virtual optical sensor

For camera algorithm verification with simulated image data (such as from a PNG image or YUV video), we have developed a customizable virtual sensor model that can likewise be fed with arbitrary image data. Its video timing (blanking times, etc.) can be configured freely; image data is fed through a FIFO. A backward FIFO channel can in turn receive processed data, such as from an edge filter. This way, complex FPGA and DSP hybrid systems can be fully emulated and algorithms verified by automated regression tests.

Display

As a direct visual control or front end for virtualized LCD devices, the netpp display server allows posting YUV, RGB or indexed, hardware-processed images from within the simulation for display on the PC screen. For example, decoded YUV-format video can be displayed. When running as a full, cycle accurate HDL simulation, this is very slow. However, functional simulation, for example through Python, has turned out to be quite effective.

See also the old announcement post for an example: [ Link ]

Co-Simulation ecosystems

Sometimes it is necessary to link different development branches, like hardware and software: make ends meet (and meet the deadlines as well). Or you might want to pipe processed data from MATLAB into your simulation and compare it with the on-chip-processed result for thorough coverage or numerical stability. This is where you run into the typical problem:

  • Simulation runs on host A (Linux workstation)
  • Your LabVIEW client runs on the student’s Windows PC in the other building
  • The sensors are on the roof

When you order an IP core design, you might want to have the same reference (test environment) as we do. It is based on a Docker container, so you do not run into local dependency issues. Plus, it allows continuous integration of software and hardware designs.

HDL playground

The HDL playground is a Jupyter Notebook based environment that is launched in a browser via the link below:

https://github.com/hackfin/hdlplayground

Features:

  • Co-simulation of Python stimuli and your own data against Verilog (optionally VHDL) modules
  • yosys, nextpnr and Lattice ECP5 specific tools for synthesis, mapping and PnR in the cloud
  • Auto-testing of notebooks
  • No local software installation (other than the Docker service); see the run sketch below
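
Locally, that boils down to running the container and pointing a browser at the notebook server; a sketch with an assumed image name and port, the repository README has the authoritative command:

docker run -it --rm -p 8888:8888 hackfin/hdlplayground
# then browse to http://localhost:8888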

VisionKit

VisionKit camera framework

Our ‘VisionKit’ is a largely modular construction kit of hardware and software, based on a high-performance image acquisition system. The reference designs:

  • embedded Linux based network cameras with on-camera analysis
  • FPGA SoC streaming cameras
  • remote control via netpp (optionally GenICam)

Modules

  • ppivideo: BufferQueue-based v4l2 kernel driver for Blackfin/uClinux, ‘zero-copy’ & lossless (optimized for line scan applications)
  • videoserver: portable user space library with an image processing pipeline/FIFO; simultaneous image acquisition and processing as well as image transfer.
  • display: remote display server for raw video formats (image transfer via netpp)
  • FPGA IP (dombert streaming SoC)
  • Cottonpicken IP: DSP construction kit for image processing elements (pipelining, debayering, matrix kernels, JPEG DCT and wavelet algorithms)
  • Camasutra: user interface (Windows/Linux) for remote control of all kinds of camera/netpp devices. No longer supported.

The sensors are controlled from user space. For various sensors (see below), libraries exist for register configuration via netpp properties.

In cooperation with customers and partner companies, FPGA-based evaluation platforms (icarus, gözcü) were developed to evaluate image processing algorithms on FPGAs as well as on DSPs.

Reference applications

  • MJPEG streaming via http or gstreamer (RFC 2435 real-time standard, low latency)
  • 2D barcode reader, 3-5 frames per second
  • line scan applications
  • tracking, blob detection

On-Camera processing demo via remote control tool

Industrial customer projects (references on request)

  • Mesa Imaging SR3k/SR4k Linux camera
  • medical diagnostics
  • navigation technology, optical metrology

Platforms

  • FPGAs (‘bare metal’ I/O and preprocessing)
  • NXP IMX6 processor family
  • Blackfin BF53x, BF561 (no active development, legacy!)

Sensor support

‘Legacy’ support via netpp register descriptions exists for the sensors listed below. All parameters available on the sensor can be configured from within the VisionKit. For all newer OnSemi sensors that come with an *.xsdat register file (devkit software), corresponding netpp device files are generated automatically.

Vendor             Product ID
ON Semi (Aptina)   MT9V024 (034)
ON Semi (Aptina)   MT9V032
ON Semi (Aptina)   MT9D131 (JPEG)
ON Semi (Aptina)   MT9D111 (JPEG)
ON Semi (Aptina)   MT9P031, MT9P001
ON Semi (Aptina)   MT9T111 (JPEG)
Omnivision         OV9620
Omnivision         OV9655
Omnivision         OV7725
Omnivision         OV5670