
VHDL to XML to …whatever

Time to write about some academic fun again. Well, it’s language stuff you might not think about. Like: what does Python have to do with FPGAs, or VHDL with XML?

If you have picked up a few bits on this website in the past, you might have learned about the IP-XACT-like approach: turn XML into HDL using an XSLT style sheet. Things like this are done on the web daily to turn XML into web pages.

Now we go the other way: turning VHDL into XML and then turning that into something else again. Why would we want to do that? Let’s save the clue for later and assume it’s just academic fun for now. But what’s the occasion?
Recent versions of the free GHDL VHDL compiler and simulator were augmented by a --file-to-xml option, which dumps the fully analyzed AST (abstract syntax tree) into an XML structure. Typically, this results in a huge file, including lots of cross references between analyzed objects and representing all the complexity of a carefully engineered language such as VHDL.

But why would we want to go beyond something that is as strict as VHDL?

Users familiar with GHDL might like it not just for simulation, but also for its excellent analysis and refactoring features, such as the cross-referenced HTML output. The XML output option takes this even further: now you can analyze your code structures in more depth and in your own customized way, without getting into VHDL parsing yourself.
But wait: what makes parsing XML any less of a burden than parsing VHDL? That’s where the XSLT technology comes in. We don’t have to mess with all the tree-related issues in a programming language ourselves. XSLT might be unreadable and not too easy to debug, but once you get the hang of it, it can save you a lot of work. More pluses:

  • It’s compact
  • It allows switching between a ‘lazy analysis’ and a ‘complex coverage’ approach using its template features
  • It runs in a browser

The last item is demonstrated below.

So where is this heading? This XML discussion originated from some pondering on how to get from VHDL analysis (done by GHDL) to a synthesizable netlist, or anything that can be fed to a mapper.
That’s ambitious, really. But why not take a few small steps first:

  • Create a graph from an entity [Demo]
  • Check HDL for non-synthesizable constructs
  • Check HDL for design rules
  • Convert HDL into [ MyHDL, Verilog, …]
  • Create a language-independent RTL format for OpenSource mappers
  • Follow whygee’s idea of a smart elaboration towards synthesis

netpp on programmable logic: IoT for the FPGA

For quite a while I wouldn’t have said it’s impossible, but I wouldn’t have put much effort into it either. Why bother, if you have a spare $1 microprocessor that can run a simple netpp communication stack just fine?
Well, sometimes it’s time to try something else: running a soft core CPU (ZPU) on small FPGAs has found some interest, due to its limited resource consumption but full freedom when it comes to specific interfaces, such as:

  • Programmable PWM engines (motor control)
  • Many RS485-capable interfaces (that classic uCs don’t have)
  • Safety-proof specific state machines

The ZPU will even fit on a $5 FPGA and still leave enough space for the specific interfaces. It is a slow stack machine; even the fastest pipelined implementations don’t really beat MIPS-like architectures. However, this doesn’t bother us when we just have to configure a set of registers, and the ZPU architecture compensates with quite good code density.

This solution was presented at the Embedded World Conference 2016 in Nuremberg. The published documentation can be downloaded for free in the web shop.

[ Update 01/2018 ]

After many proofs of concept and field tests, an eval kit for some real networked FPGA fun is out: the netpp node, featuring a high-speed UDP engine. Earlier solutions used a Wiznet chip, and the WLAN hack based on the horrible esp8266 chipset turned out to be unusable for industrial purposes. The current two-chip (FPGA and PHY) solution has proven the most robust so far.


Virtual ROM on small FPGAs

Read-only data – outsourced

When running display output applications on small FPGAs where printing of strings is required, using the internal block RAM for character sets would be a waste of resources, or simply not sufficient. The 64kB character map for a normal and an inverted font in both RGB and BGR subpixel-smoothed renderings (more below) would eat up more than is available on the Spartan3-250k device, for example.
The Spartan3 on my good old Papilio has an SPI flash attached, with less than 50% of it actually used for the FPGA bit stream. Tadaa: plenty of space for character bitmaps. So we can just store the bitmap in an unused sector of the SPI flash and blit-copy it from SPI flash to the LCD display directly.
But what if there’s more read-only data, like second-stage program code or coprocessor microcode that is loaded on the fly by an applet? If we don’t need it for the boot process, it should not live in block RAM permanently, but it still has to execute from block RAM. Screams for a cache, doesn’t it?
So the simple solution is to create a small controller entity ‘scache’ inside the system-specific peripherals of the SoC. It simply watches accesses to certain addresses and raises an exception once such an address is hit. The exception vector then jumps into a handler routine which does all the loading from SPI flash into the physical cache memory area. Then, for the retried LOAD instruction, the virtual address is internally translated to the physical cache address.
This requires only very simple logic; however, it runs through some program code and needs a bit of time to load from the SPI flash.
It turns out this is barely noticeable on the LCD display.
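
To make the mechanism a bit more tangible, here is a minimal sketch of such a miss handler in C. This is not the actual ZPUng code: the SCACHE_MISS_ADDR register, the spiflash_read() helper, the address constants and the whole-window fill strategy are assumptions for illustration only.

/*
 * Hedged sketch of an scache miss handler (names and constants assumed).
 */
#define XDATA_BASE  0x10000u   /* virtual start of the SPI-resident data */
#define CACHE_BASE  0x2000u    /* physical block RAM cache window        */
#define CACHE_SIZE  0x2000u

extern volatile unsigned int SCACHE_MISS_ADDR;  /* latched by the scache logic */

/* Reads n bytes from the SPI flash at 'offset' into 'dst' (implemented elsewhere) */
void spiflash_read(unsigned int offset, void *dst, unsigned int n);

void scache_miss_handler(void)
{
        unsigned int vaddr  = SCACHE_MISS_ADDR;
        unsigned int offset = (vaddr - XDATA_BASE) & ~(CACHE_SIZE - 1u);

        /* Fill the cache window from SPI flash. When the faulting LOAD is
         * retried, the scache logic maps the virtual address to
         * CACHE_BASE + (vaddr & (CACHE_SIZE - 1)) and the access hits BRAM. */
        spiflash_read(offset, (void *) CACHE_BASE, CACHE_SIZE);
}

A real implementation would fill smaller cache lines instead of the whole window, but the principle stays the same.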

Under the hood: Linker scripting

Ok, so there is plenty more data in the .rodata section. We could implement some kind of overlay loader and a kind of file system, but why make it complicated when our data is fairly static? We just relocate the cached external data using a linker script. Say our program memory ranges from 0x0000 to 0x2000 and the cache is allocated right after that. Then we define a memory specification in the linker script as shown below:

MEMORY
{
        l1ram(rwx): ORIGIN = 0x0000, LENGTH = 0x2000
        l1cache(rwx): ORIGIN = 0x2000, LENGTH = 0x2000
        xdata(r): ORIGIN = 0x10000, LENGTH = 0x8000
}

For the .rodata section, we can simply use a line like

.rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) } > xdata

to allocate the read-only areas into the virtual SPI ROM starting from 0x10000. Eventually, compiling the whole story results in an ELF file which we only have to split into the boot ROM image and the second-stage binaries that live in the SPI flash (objcopy’s --only-section option can do that, for instance).

But we may want some read-only data to be present at all times, like frequently used library code. For this purpose, we can just define our own segments and use the __attribute__ decorator for the code and data that we want to keep in BRAM:

__attribute__((section(".l1.rodata"))) int my_printf(...)

I will not dive into the details of linker scripting here, but likewise you can also allocate entire libraries or object files to specific memory segments.
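
For example, a single rule in the SECTIONS part of the linker script above could keep everything from one archive resident in block RAM; the archive name libcore.a and the output section name are made up for illustration:

/* hypothetical: keep all code and constants from libcore.a resident in block RAM */
.l1.text : { *libcore.a:(.text .text.* .rodata .rodata.*) } > l1ram

Note that such a file-specific rule has to appear before any generic *(.text*) catch-all, otherwise the catch-all claims those input sections first.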

This simple technique allows you to pack quite a bit of code into small FPGA SoCs.

The LCD example

Here you can see an example of the character table in action. I mentioned the RGB/BGR subpixel smoothing above, which leads to somewhat higher ROM usage. The LCDs used here have quite a bit of functionality, like setting the orientation of the screen. You could simply not care about that when designing a character table and use a plain black/white scheme. However, when trying to squeeze into a 4×8 pixel character set, the font becomes quite unreadable and you’re better off with the subpixel smoothing. This in turn is sensitive to the pixel order, so for top and bottom display orientations you already need the font rendered in two variants, plus in color. Now, as we have the space, it is just much simpler to drop the entire bitmap into the flash instead of figuring out compression techniques.

Program code

Likewise, program code that is not used too frequently can be dropped into the external SPI flash. This comes in handy when a program grows during development and the space usage cannot be determined a priori. The ZPUng is able to handle a specific exception signal that is raised by the SCACHE controller. The cache handler microcode routine then loads the code from SPI flash and resumes execution while translating the virtual PC address into a physical address inside the cache area, where instructions are effectively fetched from. This is the trick used to run the netpp communication library for IoT applications on FPGAs with limited resources, like the Papilio One.