Posted on Leave a comment

Multitasking on the ZPUng

For preemptive or less preemptive multi tasking on the ZPUng architecture, some mechanisms for task switching come in handy. Since the ZPUng is a context saving architecture by design, the context switching is very light: We only need to manipulate the stack pointer and program counter somewhere in the code (plus regard some minor details with global variables used by GCC which we will ignore for the begin).

Every task has its own stack area in the stack memory. Since we have no virtual memory in this architecture, care must be taken that the  local stack areas are not trashing other task’s reserved stack spaces.

Preemptive (time slice) multitasking

In this case, a timer interrupt service routine will always change the context. By design of an interrupt handling hardware, the return address PC is always stored on the stack, so if we manipulate the stack pointer (SP) inside the interrupt handler routine, there’s not much more to do than saving any global context entities on the stack.

For this context switch, we need to store the current SP into the address pointed to by a global context pointer g_context on IRQ handler entry and restore it upon exit. The following assembler macros are required to do that:

; Save current SP context in a global ptr g_context
.macro save_context
im g_context

; Restore SP context from global ptr g_context
.macro restore_context
im g_context

The timer service routine looks very simple as well:

; -- IRQ handler
	.globl irq_timer_handler
	; Stores the current context (sp) into the variable pointed to
	; by g_context.

	im timer_service

	; This leaves a possibly new return jump address on
	; TOS, if g_context was modified by the timer_service.
	.byte 15

So inside the timer_service() function which can be coded in C, we need to only modify the g_context pointer with each tasks stack pointer storage address:

 g_context = &walk->sp; // Context switch

Using a very simple prioritized round robin scheduler and two example tasks toggling GPIO pins, we achieve a simulation result as shown below:

Multitasking trace

The TaskDesc debug output denotes the currently active task ID, 0x2a90 being the main task, where 0x2aa8 toggles GPIO1, 0x2ac0 toggles GPIO0.

Internally, these task descriptors are put into a worker queue and are cycled through using some bit of priority distribution, i.e. tasks with a lower ‘interval’ value get more CPU time, however, a task can never block completely.


You might notice something odd in the above wave trace around t = 1.5ms (and 1.7ms, likewise): GPIO0 is changed even though the corresponding task 0x2ac0 is not active. Why is that? Let’s have a look at the task code:

int task1(void *p)
    while (1) {
        MMR(Reg_GPIO_OUT) ^= 0x02;
    return 0;

int task2(void *p)
    while (1) {
        MMR(Reg_GPIO_OUT) ^= 0x01;
    return 0;

The solution:

The XOR statement to the GPIO register is not atomic. Meaning, it splits up into the following primitive instructions:

  1. Get value from OUT register
  2. XOR with a value
  3. Write back value to OUT register

Let a timer IRQ request come in between 1 or 2 and assume it is switching the context to the other GPIO manipulating task – here we go. task2() is actually getting in between!

If we were to use global variables and tasks depending on single bits, we should keep this big virtual banner around in our coder’s brains:

Make sure your semaphores are atomic!

Non-preemptive (user space) multi tasking

Another aspect of concurring tasks: There might be a process waiting for input data, i.e. sleeping until data is ready and the IRQ handler wakes the corresponding process up. In the meantime, other processes might want to consume the CPU time. The rather dumb round robin scheme doesn’t take this into account, it just cycles through processes and makes sure each gets its slice once in a while.

Non-preemptive multi tasking implies, that some control is actually given to the currently running task. Loosely speaking: a task switch is induced from user space (not inside an IRQ handler). Let’s summarize what functionality we’d want to have for a user space triggered context switch:

  1. Process might want to sleep for a certain time:
    -> We put the context descriptor into a sleep queue that is worked on inside the timer service handler. Once the timeout is reached, the process is put back first into the worker queue, hence is resumed next.
  2. Process waits for data to arrive / DMA to complete:
    -> The context descriptor is put into a wait queue and resumes upon a specific data IRQ event.

A similar scheme is run in the Linux kernel. We try to keep this layer way thinner for our simple ZPUng SoC though.

Now, with a lack of atomicity as shown above, things can get in each other’s way. Classical CPU architecture tend to block IRQs to implement atomic behaviour, we can overcome this overhead using the ZPUng with a trick by jumping into microcode emulation code space (using a reserved instruction), where interrupts are by default masked, but still latched (like inside an IM instruction sequence). This introduces some minor latency for interrupt response, however this is most of the time not of any concern.

Inside the context switch system call, the stack context is manipulated as inside the timer service handler. Using simple queue techniques we can make sure that no unwanted modification is getting in between non-atomic operations.

The simulation benefit

When developing tailored multi tasking configurations without a generic OS overhead, bugs are easily introduced. The classical problem of a race condition with uninitialized variables (that never turn up in a source code review or MISRA compliance check) can cause a lot of headache on uC-Systems with no fully non-intrusive trace unit. In this case, a full 1:1 simulation comes in extremely handy.

For example, if a task accesses a variable before it was actually initialized or properly defined, the system would recognize the undefined memory content as such and display this event in the simulation.

However, the system as such can not take the burden of you, to create proper test cases. For example, a multi tasking setup may never show a problem in the simulation if the timing of interrupt events is deterministic. If external data availability comes into play, you would have to create a stimulating test bench that makes use of all possible timing intervals with respect to a task switch event to actually prove that the programm is robust in all possible scenarios.

Posted on 3 Comments

Verpackung und Gehäusedesign mit OpenSource-Tools

Mit fortschreitender Verbesserung diverser Softwarewerkzeuge aus der nicht immer bedingungslos geliebten OpenSource-Szene erliegt man alle Jahre der Versuchung, mal wieder etwas herumzuspielen. Dabei lässt man sich auch immer wieder überraschen, wie gut die Tools inzwischen geworden sind.

Problemstellung: Zeige dem Kunden ein mögliches Gehäuse zum Anfassen. In dem Zug kommt oft die erste Frage: Können Sie das mit einem 3D-Drucker machen? Ich kann es nicht, aber andere können das. Leider nur nicht in Stückzahlen, und die Ausgabe dauert doch eine geraume Weile.

Inzwischen haben wir unsere notch flap-Falttechnik so weit optimiert, dass aus einem Papierbogen in wenigen Sekunden ein fertiges Gehäuse wird.

Damit es auch passt, müssen die Abmessungen die Platinen wie auch der Bauteile passen. Damit kommen wir zu den Werkzeugen.


Das Allzweckwerkzeug namens Blender, was nach einem wenig erfolgreichen Kommerzialisierungsversuch als OpenSource veröffentlicht wurde, ist prima dazu geeignet, die diversen chirurgischen Konversions-Operationen vorzunehmen und daraus auch die Gehäuseprototypen auszugeben.

Aber nun von vorne, die involvierten Tools der Reihe nach:

  1. Ursprungsdaten (PCB-Design, Positionen der Bauteile): KiCAD
  2. Konversion der CAD-Daten: freecad
  3. Aufbereitung und Gehäuse-Design: Blender

Den Rest erledigt unsere hauseigene ‘notch flap’ Extension für Blender. Die Ausgabe geschieht auf einem Schneideplotter, und für das Falten sind einige Handgriffe nötig. Fertig!


Gehäuseprototyp für Eval-Kit
Gehäuseprototyp für Eval-Kit
Posted on 2 Comments

CCS811 sensor review

CCS 811 sensor drifting

I’ve logged the CCS811 eCO2 values over 24 h, using the 10s acquisition PULSE mode of the MEAS_MODE.DRIVE_MODE register bit field. It looks like the internal algorithm is not so well able to cope with humidity changes, as soon as humidity drops (when opening a window), the eCO2 values start drifting. The hardware/firmware specs:


The air quality in the room is somewhat constant and should normally range between 400 and 800 ECO2 pseudo-ppm.

In the examples below, no environmental on-sensor compensation (using the ENV_DATA register) is used.

In all cases where too much drifting was registered, the sensor DRIVE_MODE was reset to Pulse (2). You can then also see that the RAW value is then at a different offset.

The sudden humidity changes seem to be problematic, obviously. Since the raw values don’t change much during the continuous measurement interval (without restarting/rewriting DRIVE_MODE) , this would suggest that the firmware is somewhat mislead and keeps adapting its values.

The app note AN369 mentions the handling of the BASELINE register in paragraph 17. This is to be experimented with next.

T/H Environment compensation

The Write-Only (!) ENV_DATA register should, according to the HW reference, allow compensation of the humitidy. However, first attempts made the measurements go completely haywire and oscillating. This is currently under scrutiny.

It looks like that only a more or less constant humidity makes the sensor operate in a sane range. Otherwise, the high eCO2 values can not be trusted whenever the humidity rises rapidly. I guess this is a somewhat unwanted property of any SnO2 gas sensor and it can be quite difficult to adjust without knowing the parameters. However in this case it seems that the firmware is doing some internal adjustment that is still somewhat not understood.

Baseline adjustment

The BASELINE register is somewhat obscure and undocumented: Some MSBs seem to be flags, the LSB some offset value. The AN tells you to just write back a ‘known’ baseline from clean air. However, I’d assume that for a proper measurement you’d have to record a set of baselines corresponding to environment T/H values to figure out which to write. Not there yet…

Conclusion / possible way out

Whenever the humidity is changing rapidly, the current strategy is:

  • Wait until humidity has settled and try resetting the BASELINE to a ‘known good’ state
  • Rewrite DRIVE_MODE to the previous value and wait a certain settling time

Resetting the baseline according to the AN sometimes works, sometimes produces completely unusable results. It is still a bit of a mystery why the ENV_DATA compensation does not work as ‘promised’.

I’ll probably post more findings here.