
Multitasking on the ZPUng

For preemptive or non-preemptive multitasking on the ZPUng architecture, some mechanisms for task switching come in handy. Since the ZPUng is a context-saving architecture by design, context switching is very light: we only need to manipulate the stack pointer and program counter somewhere in the code (plus take care of some minor details concerning global variables used by GCC, which we will ignore for now).

Every task has its own stack area in the stack memory. Since we have no virtual memory in this architecture, care must be taken that the local stack areas do not trash other tasks’ reserved stack spaces.

Preemptive (time slice) multitasking

In this case, a timer interrupt service routine will always change the context. By design of the interrupt handling hardware, the return address (PC) is always stored on the stack, so if we manipulate the stack pointer (SP) inside the interrupt handler routine, there’s not much more to do than save any global context entities on the stack.

For this context switch, we need to store the current SP into the address pointed to by a global context pointer g_context on IRQ handler entry and restore it upon exit. The following assembler macros are required to do that:

; Save current SP context in a global ptr g_context
.macro save_context
im g_context

; Restore SP context from global ptr g_context
.macro restore_context
im g_context

The timer service routine looks very simple as well:

; -- IRQ handler
	.globl irq_timer_handler
	; Stores the current context (sp) into the variable pointed to
	; by g_context.

	im timer_service

	; This leaves a possibly new return jump address on
	; TOS, if g_context was modified by the timer_service.
	.byte 15

So inside the timer_service() function, which can be coded in C, we only need to point g_context at the stack pointer storage address of the task to run next:

 g_context = &walk->sp; // Context switch

Using a very simple prioritized round robin scheduler and two example tasks toggling GPIO pins, we achieve a simulation result as shown below:

Multitasking trace

The TaskDesc debug output denotes the currently active task ID, 0x2a90 being the main task, where 0x2aa8 toggles GPIO1, 0x2ac0 toggles GPIO0.

Internally, these task descriptors are put into a worker queue and are cycled through using some bit of priority distribution, i.e. tasks with a lower ‘interval’ value get more CPU time. However, no task can ever be locked out completely.
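To make the queue cycling a bit more tangible, here is a minimal Python model of such a prioritized round robin. It is only a sketch: the TaskDesc fields, the countdown mechanism and the interval values are assumptions for illustration, not the actual ZPUng scheduler code.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class TaskDesc:
    name: str
    interval: int       # lower value = picked more often (assumed meaning)
    countdown: int = 0  # visits left until the task is eligible again


def timer_service(queue):
    """Return the next task to run; called once per timer tick."""
    while True:
        task = queue[0]
        queue.rotate(-1)          # move the visited task to the back
        if task.countdown == 0:
            task.countdown = task.interval
            return task           # in C, this is where g_context = &walk->sp; happens
        task.countdown -= 1       # every visit brings a task closer to its turn


# Hypothetical setup: a main task and two GPIO togglers.
queue = deque([
    TaskDesc("main", interval=0),
    TaskDesc("gpio1", interval=1),
    TaskDesc("gpio0", interval=3),
])
slices = Counter(timer_service(queue).name for _ in range(120))
print(slices)
```

Because every visit decrements the countdown of a task that is not yet eligible, a task with a large interval gets fewer slices but is never skipped forever.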


You might notice something odd in the above wave trace around t = 1.5ms (and 1.7ms, likewise): GPIO0 is changed even though the corresponding task 0x2ac0 is not active. Why is that? Let’s have a look at the task code:

int task1(void *p)
{
    while (1) {
        MMR(Reg_GPIO_OUT) ^= 0x02;
    }
    return 0;
}

int task2(void *p)
{
    while (1) {
        MMR(Reg_GPIO_OUT) ^= 0x01;
    }
    return 0;
}

The solution:

The XOR statement on the GPIO register is not atomic: it splits up into the following primitive instructions:

  1. Get value from OUT register
  2. XOR with a value
  3. Write back value to OUT register

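Let a timer IRQ arrive after step 1 or step 2 and assume it switches the context to the other GPIO-manipulating task. task2() gets in between and toggles its bit, but once task1() resumes, it writes back the stale value it read earlier and thereby reverts GPIO0, although task 0x2ac0 is not active at that moment.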
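The effect can be reproduced with a small Python model in which each task is a generator that yields after every primitive step, so a "scheduler" can preempt it at any of these points. This is just a simulation sketch; the schedule lists and helper names are made up for illustration.

```python
def toggle_task(reg, mask):
    """Endless toggle loop, split into its three primitive steps."""
    while True:
        tmp = reg[0]      # 1. get value from OUT register
        yield
        tmp ^= mask       # 2. XOR with a value (in a private temporary)
        yield
        reg[0] = tmp      # 3. write back value to OUT register
        yield


def run(schedule):
    """Execute one primitive step of task 1 or 2 per schedule entry."""
    reg = [0]  # models the GPIO OUT register
    tasks = {1: toggle_task(reg, 0x02), 2: toggle_task(reg, 0x01)}
    for task_id in schedule:
        next(tasks[task_id])
    return reg[0]


clean = run([1, 1, 1, 2, 2, 2])  # no preemption inside a toggle
racy = run([1, 2, 2, 2, 1, 1])   # task 1 preempted right after its read
print(hex(clean), hex(racy))
```

In the clean run both bits end up set. In the racy run the toggle of task 2 is lost, because task 1 writes back a value that was read before task 2 ran.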

If we were to use global variables and tasks depending on single bits, we should keep this big virtual banner around in our coder’s brains:

Make sure your semaphores are atomic!

Non-preemptive (user space) multitasking

Another aspect of concurrent tasks: There might be a process waiting for input data, i.e. sleeping until data is ready and the IRQ handler wakes the corresponding process up. In the meantime, other processes might want to consume the CPU time. The rather dumb round robin scheme doesn’t take this into account; it just cycles through processes and makes sure each gets its slice once in a while.

Non-preemptive multitasking implies that some control is actually given to the currently running task. Loosely speaking: a task switch is induced from user space (not inside an IRQ handler). Let’s summarize what functionality we’d want to have for a user space triggered context switch:

  1. Process might want to sleep for a certain time:
    -> We put the context descriptor into a sleep queue that is worked on inside the timer service handler. Once the timeout is reached, the process is put back first into the worker queue, hence is resumed next.
  2. Process waits for data to arrive / DMA to complete:
    -> The context descriptor is put into a wait queue and resumes upon a specific data IRQ event.
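The two cases above can be sketched as a tiny queue model in Python. All names (Scheduler, sleep, wait, data_irq, the "uart_rx" event) are hypothetical, and putting a task woken by a data IRQ at the front of the worker queue is an assumption carried over from the sleep case.

```python
from collections import deque


class Scheduler:
    def __init__(self, tasks):
        self.run_queue = deque(tasks)  # the worker queue
        self.sleep_queue = []          # (wake_tick, task) pairs
        self.wait_queue = {}           # event name -> waiting tasks
        self.ticks = 0

    def sleep(self, task, timeout):
        """Case 1: task gives up the CPU for 'timeout' timer ticks."""
        self.run_queue.remove(task)
        self.sleep_queue.append((self.ticks + timeout, task))

    def wait(self, task, event):
        """Case 2: task blocks until a specific data IRQ arrives."""
        self.run_queue.remove(task)
        self.wait_queue.setdefault(event, []).append(task)

    def timer_tick(self):
        """Timer service: move expired sleepers to the queue front."""
        self.ticks += 1
        pending = []
        for wake_tick, task in self.sleep_queue:
            if wake_tick <= self.ticks:
                self.run_queue.appendleft(task)  # resumed next
            else:
                pending.append((wake_tick, task))
        self.sleep_queue = pending

    def data_irq(self, event):
        """Data IRQ handler: wake everything waiting for this event."""
        for task in self.wait_queue.pop(event, []):
            self.run_queue.appendleft(task)


sched = Scheduler(["main", "a", "b"])
sched.sleep("a", 2)
sched.wait("b", "uart_rx")
sched.timer_tick()           # "a" still sleeps
sched.timer_tick()           # timeout reached, "a" is resumed next
sched.data_irq("uart_rx")    # "b" is woken by its data event
print(list(sched.run_queue))
```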

A similar scheme is run in the Linux kernel. We try to keep this layer way thinner for our simple ZPUng SoC though.

Now, with a lack of atomicity as shown above, things can get in each other’s way. Classical CPU architectures tend to block IRQs to implement atomic behaviour. On the ZPUng we can avoid this overhead with a trick: we jump into microcode emulation code space (using a reserved instruction), where interrupts are masked by default but still latched (like inside an IM instruction sequence). This introduces some minor latency in the interrupt response, which is rarely a concern.

Inside the context switch system call, the stack context is manipulated as inside the timer service handler. Using simple queue techniques we can make sure that no unwanted modification is getting in between non-atomic operations.

The simulation benefit

When developing tailored multitasking configurations without a generic OS overhead, bugs are easily introduced. The classical problem of a race condition with uninitialized variables (that never turn up in a source code review or MISRA compliance check) can cause a lot of headaches on microcontroller systems with no fully non-intrusive trace unit. In this case, a full 1:1 simulation comes in extremely handy.

For example, if a task accesses a variable before it was actually initialized or properly defined, the system would recognize the undefined memory content as such and display this event in the simulation.

However, the system as such cannot relieve you of the burden of creating proper test cases. For example, a multitasking setup may never show a problem in the simulation if the timing of interrupt events is deterministic. If external data availability comes into play, you would have to create a stimulating test bench that covers all possible timing intervals with respect to a task switch event to actually prove that the program is robust in all possible scenarios.
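As a toy illustration of such a sweep, the following Python sketch moves the preemption point through every position inside the non-atomic GPIO toggle and records which offsets corrupt the result. It models the idea only; a real test bench would stimulate the simulated hardware, and the function names here are invented.

```python
def toggle_task(reg, mask):
    """Non-atomic toggle: read, XOR, write back, one step per next()."""
    while True:
        tmp = reg[0]
        yield
        tmp ^= mask
        yield
        reg[0] = tmp
        yield


def run_with_preemption(offset):
    """Preempt task 1 after 'offset' steps, run task 2 once, then resume."""
    reg = [0]
    task1 = toggle_task(reg, 0x02)
    task2 = toggle_task(reg, 0x01)
    for _ in range(offset):
        next(task1)
    for _ in range(3):           # one complete toggle of task 2
        next(task2)
    for _ in range(3 - offset):  # remaining steps of task 1
        next(task1)
    return reg[0]


# Sweep all possible task switch positions, not just one fixed timing.
failing = [k for k in range(4) if run_with_preemption(k) != 0x03]
print("failing preemption offsets:", failing)
```

A test with one fixed interrupt timing that happens to hit offset 0 or 3 would pass and hide the bug; only the sweep exposes the two bad offsets.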


VHDL to XML to …whatever

Time to write about some academic fun again. Well, it’s language stuff you might not think about. Like: What does Python have to do with FPGAs, or VHDL with XML.

If you have been picking up a few bits on this website in the past, you might have learned about the IP-XACT-like approach: Turn XML into HDL using an XSLT style sheet. Things like these are done on the web daily to turn XML into web pages.

Now we go the other way: Turning VHDL into XML and then turning this into something else again. Why would we want to do that? Let’s save the clue for later and assume there’s this academic fun thing only. But what’s the occasion?
Recent versions of the free GHDL VHDL compiler and simulator were augmented by a --file-to-xml option, which dumps the fully analyzed AST (abstract syntax tree) into an XML structure. Typically, this results in a huge file, including lots of cross references between analyzed objects, representing all of the complexity of carefully engineered languages such as VHDL.

But why would we want to go beyond something that is as strict as VHDL?

Users familiar with GHDL might not just like it for its simulation purposes, but also for its excellent analysis and refactoring features, such as the cross-referenced HTML output. The XML file output option takes this even further: Now you can analyze your code structures even more in depth and in your own customized way, without getting into the VHDL parsing yourself.
But…halt. What takes the burden off parsing XML instead of VHDL? That’s where the XSLT technology comes in. We don’t have to mess with all the tree-related issues in a programming language. XSLT might be unreadable and not too easy to debug, but once you get the hang of it, it can save you a lot of work. More plusses:

  • It’s compact
  • It lets you switch between a ‘lazy analysis’ and a ‘complex coverage’ approach using its template features
  • It runs in a browser

The last item is represented in a demo below.

So where is this heading? This XML discussion might have originated from some pondering on how to get from VHDL analysis (done by GHDL) to a synthesizable net list, or anything that can be fed to a mapper.
That’s ambitious, really. But why not take the small steps first:

  • Create a graph from an entity [Demo]
  • Check HDL for non-synthesizable constructs
  • Check HDL for design rules
  • Convert HDL into [ MyHDL, Verilog, …]
  • Create language independent RTL format for OpenSource mappers
  • Follow whygee’s idea of a smart elaboration towards synthesis
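
To give an idea of the first step (a graph from an entity), here is a small sketch that turns an XML description of an entity into Graphviz DOT text. Two loud assumptions: it uses Python’s ElementTree instead of XSLT, and the sample XML with its entity/port elements is invented for illustration. The real GHDL XML dump uses a different and far more detailed schema.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified stand-in for an analyzed design.
SAMPLE = """
<design>
  <entity name="uart">
    <port name="clk" mode="in"/>
    <port name="rx" mode="in"/>
    <port name="tx" mode="out"/>
  </entity>
</design>
"""


def entity_to_dot(xml_text):
    """Emit one DOT edge per port, directed according to the port mode."""
    root = ET.fromstring(xml_text.strip())
    lines = ["digraph design {"]
    for entity in root.iter("entity"):
        ename = entity.get("name")
        for port in entity.iter("port"):
            pname = port.get("name")
            if port.get("mode") == "in":
                lines.append(f'  "{pname}" -> "{ename}";')
            else:
                lines.append(f'  "{ename}" -> "{pname}";')
    lines.append("}")
    return "\n".join(lines)


dot = entity_to_dot(SAMPLE)
print(dot)
```

The resulting text can be fed to any DOT renderer to draw the entity with its inputs and outputs.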

Using linux kernel config and devdesc/XML for VHDL designs

A true mess of a VHDL design

Almost every linux kernel user is kinda familiar with it: a ‘make menuconfig’ calls up the blue configuration screen that lets you choose all kinds of drivers. Some folks have been using the kconfig tool for embedded systems for a while, like busybox, or the very nice antares setup for esp8266 systems.

So why not use the linux kernel config for hardware designs? Since I’ve been working with System on Chip (SoC) implementations of various kinds, I’ve kept shooting myself in the foot with a lot of maintenance work, caused by all kinds of different configurations. For example, I’d like to have a SoC where peripherals can be configured freely, or using another soft core. Merging all these projects into a single setup with *one* configuration entity to touch, turned out to be a bit of a challenge.

So far we used XML to describe the entire SoC. This generates all the necessary peripheral decoders and address maps automatically, like with the various system builder tools from the big FPGA vendors. But what if there is an entire family of SoCs even running on FPGAs from different vendors? Even then, an XML file would have to be written for each platform configuration. No good. Plus, the XML does not help you much in selecting the source files, unless you export Makefiles on the fly. No good idea either.

A better approach is to actually specify *one* omnipotent hardware setup in the XML device description and use kconfig to turn the components on or off, or even specify the number of instantiations, for example of several UARTs.

So what’s left to do?
kconfig handles the export of its CONFIG parameters using various backends. There are already backends for:

  • Makefiles
  • C headers

What is missing is support for VHDL. This is currently done by a makefile hack which exports the configuration into a global_config.vhdl file. This file is used as a package by every relevant HDL design file.
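What such an export could look like is sketched below in Python. It reads the usual kconfig .config line formats (NAME=y, NAME=value, "# NAME is not set") and emits a VHDL package with one constant per option. The option names and the layout of the generated package are assumptions; the actual global_config.vhdl produced by the makefile hack may look different.

```python
import re


def config_to_vhdl(config_text, package="global_config"):
    """Translate kconfig .config lines into a VHDL package of constants."""
    lines = [f"package {package} is"]
    for raw in config_text.splitlines():
        line = raw.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            lines.append(f"    constant {unset.group(1)} : boolean := false;")
            continue
        assign = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if not assign:
            continue  # blank line or ordinary comment
        name, value = assign.groups()
        if value == "y":
            lines.append(f"    constant {name} : boolean := true;")
        elif value.isdigit():
            lines.append(f"    constant {name} : natural := {value};")
        else:
            text = value.strip('"')
            lines.append(f'    constant {name} : string := "{text}";')
    lines.append(f"end package {package};")
    return "\n".join(lines)


# Hypothetical configuration, option names are made up.
DOT_CONFIG = """\
# Automatically generated file; DO NOT EDIT.
CONFIG_UART=y
CONFIG_NUM_UART=2
# CONFIG_PWM is not set
CONFIG_CPU="zpung"
"""
vhdl = config_to_vhdl(DOT_CONFIG)
print(vhdl)
```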

Now another problem comes up: If units can be turned on or off, the component interface (I/O pin layout) will change. VHDL does not have any preprocessing functionality with the full power of conditional compilation. However, every decent developer system has a program which can do this: the C preprocessor.

So for the top level SoC module which directly instances the peripheral I/O module with a conditional pin mapping, we use VHDL code decorated with the #ifdef statements known from C. These VHDL files have a .chdl suffix. In the Makefiles, there is simply a rule to convert this .chdl file to .vhdl. Done. We just need to figure out the proper Makefile rules and make sure everything is resynced to source file changes upon calling a “make all”.
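The effect of that conversion step can be shown with a tiny stand-in for the C preprocessor, written in Python. It only understands #ifdef, #ifndef, #else and #endif, which is a small fraction of what the real preprocessor does, and the .chdl snippet with its port and option names is made up.

```python
def preprocess(text, defined):
    """Keep only the lines whose enclosing #ifdef conditions all hold."""
    out = []
    stack = []  # one flag per open conditional: is this branch active?
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("#ifdef", "#ifndef")):
            directive, name = stripped.split()[:2]
            active = name in defined
            if directive == "#ifndef":
                active = not active
            stack.append(active)
        elif stripped.startswith("#else"):
            stack[-1] = not stack[-1]
        elif stripped.startswith("#endif"):
            stack.pop()
        elif all(stack):
            out.append(line)
    return "\n".join(out)


# Hypothetical .chdl fragment with a conditional pin mapping.
CHDL = """\
entity soc_top is
    port (
#ifdef CONFIG_UART
        uart_tx : out std_logic;
        uart_rx : in  std_logic;
#endif
        clk     : in  std_logic
    );
end entity;
"""
with_uart = preprocess(CHDL, {"CONFIG_UART"})
without_uart = preprocess(CHDL, set())
print(without_uart)
```

With CONFIG_UART defined, the UART pins appear in the entity; without it, they vanish and plain VHDL remains either way.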

The real stuff

So, how does it work?


When you call the classic “make menuconfig” inside our SoC project, you would see the above configuration screen. Everything else works pretty much linux’ish.

For defining new hardware peripherals, you still have to do a few redundant extra steps:

  1. Edit the XML file with the register map of the peripheral
  2. Define the peripheral address mapping in the XML file
  3. Add the configuration options to the perio/Kconfig file, just like in Linux
  4. Adapt the kconfig->VHDL makefile script (
  5. Of course: Implement the core for your peripheral to be instanced by the soc_mmr_perio module.

Step (4) could be simplified by writing a specific VHDL backend for kconfig. This is just “nice to have”, so maybe someone else would want to step in?

devdesc/XML for hardware

netpp/devdesc is close to having its 10th birthday. It’s our own XML language to describe devices and has been in service for quite a bit for Internet of Things applications. It turned out that not only existing hardware can be described with it, but a full SoC design can be generated at very little overhead. The gensoc tool, making heavy use of netpp technology, creates the full peripheral module including decoders and peripheral instances (like UART0, UART1, PWM, …) from a system description XML file.

As mentioned, we don’t want to write plenty of similar looking XMLs for each SoC variant. It is much easier to decorate the existing XML which contains all HW definitions with processing commands. Their purpose is to emit XML nodes to a specific target description only when the corresponding CONFIG_<module> variable is set.

In short, we create a stripped down XML file, describing the specific target, from an XML family file containing the whole peripheral superset. The target XML file is then used to generate the HDL via gensoc.
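
The stripping step can be pictured with the following Python sketch. Note the assumptions: the element names are invented, and a plain ifdef attribute is used as a stand-in for the processing commands, whose real syntax in devdesc/XML is not shown here.

```python
import xml.etree.ElementTree as ET

# Invented family description; 'ifdef' is a hypothetical marker attribute.
FAMILY = """
<device name="soc_family">
  <peripheral name="UART0" ifdef="CONFIG_UART"/>
  <peripheral name="PWM" ifdef="CONFIG_PWM"/>
  <peripheral name="GPIO"/>
</device>
"""


def strip_family(xml_text, config):
    """Drop every node whose CONFIG option is not set, keep the rest."""
    root = ET.fromstring(xml_text.strip())
    for parent in list(root.iter()):
        for child in list(parent):
            option = child.get("ifdef")
            if option is None:
                continue                  # unconditional node
            if config.get(option, False):
                del child.attrib["ifdef"]  # keep node, drop the marker
            else:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


target = strip_family(FAMILY, {"CONFIG_UART": True})
print(target)
```

The output plays the role of the target XML file: only the enabled peripherals and the unconditional ones survive.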

The software side

Every SoC of course has some built-in software – a bootloader or a bare metal program doing stuff. This code is typically written in C or assembly and makes quite some happy usage of the kconfig output as well. For instance, an “autoconf.h” header is exported that contains all the macro defines for the configuration. So you can enable test routines or hardware drivers as usual.

Every SoC should come with proper debugging features. It helps a lot, when developing new SoC peripherals and drivers, to make use of some regression testing scripts that can run inside GDB or as a remote procedure call solution inside the simulation.

For example, we use the same framework to generate a wrapper such that all peripheral registers can be accessed through a python script, like:


for i in range(100):

This is rather exclusive to the simulation. When running on the target, writing GDB scripts is typically the best option.

The python script does not have to be aware of any hard addresses, so testing systems can be set up for entire families of SoCs with differing address bases. For a specific hardware target, however, the GDB scripts containing the register mappings have to be explicitly regenerated.
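
Why name based access makes a test portable can be shown with a mock in Python. This is not the generated wrapper itself: the RegisterAccess class, the register name and both address maps are hypothetical and only model the idea.

```python
class RegisterAccess:
    """Mock device: resolves register names via a per-SoC address map."""

    def __init__(self, address_map):
        self.address_map = address_map
        self.memory = {}  # address -> value, stands in for the hardware

    def write(self, name, value):
        self.memory[self.address_map[name]] = value

    def read(self, name):
        return self.memory.get(self.address_map[name], 0)


def readback_test(dev):
    """Regression test that only ever refers to registers by name."""
    for pattern in (0x00, 0x55, 0xAA, 0xFF):
        dev.write("GPIO_OUT", pattern)
        if dev.read("GPIO_OUT") != pattern:
            return False
    return True


# Two made-up SoC variants with differing address bases.
soc_a = RegisterAccess({"GPIO_OUT": 0x8000_0010})
soc_b = RegisterAccess({"GPIO_OUT": 0x4000_0110})
results = [readback_test(soc_a), readback_test(soc_b)]
print(results)
```

The same test function runs unchanged on both variants, since only the address map differs.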


The kconfig tool boosts the SoC development quite a bit and makes things really clean and more robust. The entire system with the XML description is still a bit away from being perfect, but it just works on plenty of platforms using the typical (free) developer tools.