
Hardware generation and simulation in Python

There are various approaches to Python HDLs, some more suited to Python developers than to HDL developers. They all have one thing in common: the very refined test bench capabilities of the Python ecosystem, which allow you to connect almost anything to anything. Of all these Python dialects, myHDL turns out to be the most readable and sustainable language for hardware development. Let me outline a few more properties:

  • Has a built-in simulator (limited to defined values)
  • Converts a design into flattened Verilog or VHDL
  • Uses a sophisticated ‘wire’ concept for integer arithmetic (see the snippet below)
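
As a minimal illustration of that ‘wire’ concept in plain myHDL: signals carry constrained integers rather than raw bit vectors, so arithmetic is checked against declared bounds.

    from myhdl import Signal, intbv

    # A signal modeled as a constrained integer (bounds checked at runtime):
    a = Signal(intbv(0, min=-128, max=128))  # holds an 8 bit signed value
    # Slice-style shorthand for an unsigned 4 bit quantity:
    b = Signal(intbv(0)[4:])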

In a previous post, I mentioned experiments with yosys and its Python API. Not much has changed on that front, as the myHDL ‘kernel’ based approach turned out to be unmaintainable for various reasons. Moreover, the myHDL kernel has a fundamental limitation due to its AST translation into target HDLs, which impedes code reuse and easy extension with custom signal types.

For experiments with higher level synthesis, such as automated pipeline unrolling or matrix multiplications, a different approach was taken. This ‘kernel’, if you will, can handle the legacy myHDL data types plus derived extensions. It works as follows (a minimal front end sketch follows the list):

  • Front end language (myHDL) is slightly AST-translated into a different internal representation language (‘myIRL’)
  • The myIRL representation is executed within a target context to generate logic as:
    • VHDL (synthesizable)
    • RTL (via pyosys target)
    • mangled Verilog (via yosys)
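
For reference, a front end description in classic myHDL syntax; the conversion call at the end is the legacy myHDL one, whereas the myIRL flow would instead execute the same description within one of the target contexts listed above:

    from myhdl import block, always_seq, Signal, ResetSignal, modbv

    @block
    def counter(clk, reset, count):
        """A trivial 8 bit wrap-around counter as front end input."""
        @always_seq(clk.posedge, reset=reset)
        def worker():
            count.next = count + 1
        return worker

    clk = Signal(bool(0))
    reset = ResetSignal(0, active=1, isasync=False)
    count = Signal(modbv(0)[8:])  # modbv wraps on overflow
    counter(clk, reset, count).convert(hdl='VHDL')  # legacy myHDL conversion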

Now the big, omnipresent question is: does that logic behave correctly? How do we verify it?

  • The VHDL output (hierarchical modules) is imported into the GHDL simulator and can be driven by a test bench, which is also generated as a VHDL module (see the sketch below). Co-simulation support is currently not provided.
  • The Verilog output can be simulated with iverilog; however, co-simulation is not enabled for this target for the time being.
  • The RTL representation is translated to C++ via the CXXRTL back end and is co-simulated against the Python test bench. Note that support for signal events is rudimentary; CXXRTL targets speedy execution with defined values (no ‘X’ and ‘U’).
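
For the GHDL path, the generated files go through the usual analyze/elaborate/run sequence. A minimal sketch driving this from Python; file and unit names are illustrative:

    import subprocess

    # Analyze the generated design and test bench, then elaborate and run:
    for f in ("unit.vhdl", "tb_unit.vhdl"):
        subprocess.run(["ghdl", "-a", "--std=08", f], check=True)
    subprocess.run(["ghdl", "-e", "--std=08", "tb_unit"], check=True)
    subprocess.run(["ghdl", "-r", "tb_unit", "--wave=tb_unit.ghw"], check=True)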

Instead of using classic documentation frameworks, the strategy again was to use Jupyter notebooks running in a JupyterLab environment. Again, the Binder technology enables us to run this in the cloud without requiring installation of a specific Linux environment. The advantages:

  • Auto-Testing functionality for notebooks in a reference Docker environment
  • Reduced overhead for creating minimum working examples or error cases

This Binder is launched via the button below.

[Launch button: myHDL emulation demos]

Overview of functionality:

  • Generation of hardware as RTL or VHDL
  • Simulation (GHDL, rudimentary CXXRTL)
  • RTL display, output of waveforms
  • Application examples:
    • Generators (CRC, Gray counter, …); a small Gray encoder sketch follows this list
    • Pipeline and vector operations
    • Extension types (SoC register map generation, etc.)
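
As a taste of the generator examples, here is the classic binary-to-Gray converter in standard myHDL syntax (not taken verbatim from the notebooks):

    from myhdl import block, always_comb, Signal, intbv

    @block
    def bin2gray(B, G):
        """Gray encoder: G = (B >> 1) ^ B"""
        @always_comb
        def logic():
            G.next = (B >> 1) ^ B
        return logic

    B, G = [Signal(intbv(0)[8:]) for _ in range(2)]
    inst = bin2gray(B, G)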

Yosys synthesis and target architectures

The open source yosys tool finally allows one to drop a reference tool chain flow into the cloud without licensing issues. This is particularly interesting for new, sustainable FPGA architectures. A few architectures have been under scrutiny for ‘dry dock’ synthesis without actually having the hardware.

In particular, a reference SoC environment (MaSoCist) was dropped into the make flow for various target architectures to see:

  • How much logic is used
  • If synthesis translates into the correct primitives
  • If the entire mapped output simulates correctly with different simulators

The latter is a huge task that could only be somewhat automated using Python. Therefore, the entire MaSoCist SoC builder will slowly migrate towards a Python based architecture.

More documentation is expected to follow, in particular about several architectures.

As an example, a synthesis and mapping step for a multiplier:
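
The following is a hedged pyosys sketch of such a step; the source file name and the ECP5 target are illustrative assumptions, not the exact flow used here:

    from pyosys import libyosys as ys

    design = ys.Design()
    ys.run_pass("read_verilog mul.v", design)    # a small '*' based multiplier
    ys.run_pass("hierarchy -auto-top", design)
    ys.run_pass("synth_ecp5", design)            # map to ECP5 primitives (DSPs)
    ys.run_pass("stat", design)                  # report how much logic is used
    ys.run_pass("write_verilog mapped.v", design)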

Limitations

As always with educational software, some scenarios simply won’t play. The restrictions in place for this release:

  • Variable usage in HDL is not supported
  • Custom generators, such as partial assignments (p(1 downto 0) <= s) or vector operations, are not supported in RTLIL
  • Limited support for @block interfaces
  • Thus: no HLS-like library support through direct synthesis (yet)

Exploring CXXRTL

CXXRTL by @whitequark is a relatively fresh simulator back end for yosys, creating heavily template-decorated C++ code that compiles into a binary executable simulation model. It was found to perform quite well as a cythonized (compiled Python) back end driven from a thin simulator API integrated into the myIRL library.
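
Generating such a model boils down to one more yosys pass. A sketch along the lines of the pyosys example above; file and module names are again illustrative:

    from pyosys import libyosys as ys

    design = ys.Design()
    ys.run_pass("read_verilog top.v", design)
    ys.run_pass("prep -top top", design)
    ys.run_pass("write_cxxrtl top.cpp", design)  # emit the templated C++ model
    # top.cpp is then compiled and wrapped (e.g. via Cython) so that the
    # Python test bench can drive it.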

Since it requires its own driver from the top, a thin simulator API built on top of the myIRL library takes care of the event scheduling, unlike GHDL or icarus verilog, which handle delays and delta cycling for complex combinatorial units internally. It is therefore still regarded as a ‘know thy innards’ tool. A few more benefits:
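
To illustrate what ‘driver from the top’ means, here is a toy scheduling loop; this sim API is entirely made up for illustration and is not the actual myIRL interface:

    # Hypothetical thin-driver sketch: the Python side owns the clock and the
    # stepping; no delta cycles or delays happen behind our back.
    def run_cycles(sim, n):
        for _ in range(n):
            sim.set("clk", 0)
            sim.step()          # settle combinational logic
            sim.set("clk", 1)
            sim.step()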

  • Allows distributing functional simulation models as executables, without any requirement to publish the source
  • Permits model-in-the-loop scenarios to integrate external simulators as black boxes
  • May eventually aid in mixed language (VHDL, Verilog, RTL) and multi-level model simulations

There are also drawbacks: like the myHDL simulator, CXXRTL is not aware of ‘U’ (uninitialized) and ‘X’ (undefined) values; it knows 0 and 1 signal values only. It is therefore not suitable for a full trace of your ASIC’s reset circuitry without workarounds. Also, CXXRTL only processes synthesizable code and does not provide the delay handling necessary for post place and route simulation.

Co-Simulation: How does this play with MyHDL syntax?

This is where it gets complicated. MyHDL allows a subset of Python code to be translated to Verilog or VHDL, such that you can write simple test benches for verification that run entirely in the target language.

Then there’s the co-simulation option, where native Python code (handled by the myHDL ‘simulator kernel’, if you will) runs alongside a compiled simulation model of your hardware. The simplest setup is basically a circuit or an entire virtual board with only a virtual reset and clock stimulus. Any other simulation model, such as a UART, a SPI flash, etc., can be connected to such a simulation with more or less effort. The big issue: who produces an event, and who consumes it? This leads us back to the infamous master and slave topic (I am aware of its connotation).

The de-facto standards aiding us so far in the simulator interfacing ecosystem:

  • VHDL: VHPI, VHPIDIRECT, specific GHDL implementations
  • Verilog/mixed: VPI, FLI
  • QEMU as CPU emulation coupled to hardware models

The easiest to handle may be the VPI transaction layer that is already present for myHDL. In this implementation, a pipe is used to send signal events to the simulation, and results are read back through a reverse path. Here, myHDL plays a clear master role (see the snippet below). For GHDL, an asynchronous concept was explored via my ghdlex library, allowing distributed co-simulation across networks, where master and slave relationships become fuzzy.
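
For reference, the classic myHDL/Icarus co-simulation hookup, where the compiled Verilog model is spawned as a slave process behind the VPI pipe (standard myHDL usage; file names are illustrative):

    from myhdl import Cosimulation

    def dut(clk, reset, data):
        # vvp loads the myHDL VPI module; signals are mapped by keyword.
        return Cosimulation("vvp -m ./myhdl.vpi dut.vvp",
                            clk=clk, reset=reset, data=data)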

Finally, the CXXRTL method provides the most flexibility, as we can add blackbox hardware that just does something. We have full control here over a simple C++ layer without any overhead induced through pipes. The binding for Python can easily be created using Cython code. However, this requires clearly separating test bench code from the hardware implementation.

This implies:

  • Test bench must be written in myHDL syntax style and needs to use specific simulation signal classes
  • Reuse of extended bulk signal/container classes is restricted
  • Hardware description can be in any syntax or intermediate representation, as well as blackbox Verilog or VHDL modules

Links and further documentation

As usual in the quickly moving open source world, documentation is sparse, and solutions built on top are prone to becoming orphanware once the one man bands retire or lose interest. However, I tend to rate that risk very low in this case. Useful links so far (hopefully, more will be found soon):

Disclaimers

  • Recommended for academic or private/experimental use only
  • The pyosys API (Python wrapper for libyosys) may at this moment crash without warning or yield misleading feedback. There’s not much being done about this now, as updates from the yosys development are expected.
  • Therefore, Jupyter notebooks may crash and you may lose your input/data
  • No liability taken!

RISC-V in the loop

Continuous integration (‘CI’) for hardware is a logical step to take: why not do for hardware what works fine for software?

To keep things short: I’ve decided to stick my proprietary RISC-V approach ‘pyrv32’ into the opensourced MaSoCist testing loop to always have an online reference that can run anywhere without massive software installation dances.

Because a fair part of the toolchain is still missing from the open source repo (work in progress), only a stripped down VHDL edition of the pyrv32 is available for testing and playing around.

This is what it currently does, when running ‘make all test’ in the provided Docker environment:

  • Builds some tools necessary to build the virtual hardware
  • Compiles source code and creates a ROM file from it as VHDL
  • Builds a virtual System on Chip based on the pyrv32 core
  • Downloads the ‘official’ riscv-tests suite onto the virtual target and runs the tests
  • Optionally, you can also talk to the system via a virtual (UART) console

Instructions

This is the quickest ‘online’ way without installing software. You might need to register a Docker account beforehand.

  1. Log in at the docker playground: https://labs.play-with-docker.com
  2. Add a new instance of a virtual machine via the left panel
  3. Run the docker container:
    docker run -it hackfin/masocist
  4. Run the test suite:
    wget section5.ch/downloads/masocist_sfx.sh && sh masocist_sfx.sh && make all test
  5. Likewise, you can run the virtual console demo:
    make clean run-pyrv32
  6. Wait for the boot message and the # prompt to appear, then type h for help.
  7. Dump virtual SPI flash:
    s 0 1
  8. Exit the minicom terminal with Ctrl-A, then q.

What’s in the box?

  • ghdl, ghdlex: Turns a set of VHDL sources into a simulation executable that exposes signals to the network (The engine for the virtual chip).
  • masocist: A build system for a System on Chip:
    • GNU Make, Linux kconfig
    • Plenty of XML hardware definitions based on netpp.
    • IP core library and plenty of ugly preprocessor hacks
    • Cross compiler packages for ZPU, riscv32 and msp430 architectures
  • gensoc: SoC generator, alias IP-XACT’s mean little brother (from another mother…)
  • In-house CPU cores with in-circuit emulation features (debug TAPs over JTAG, etc.):
    • ZPUng: pipelined ZPU architecture with optimum code density
    • pyrv32: a rv32ui compatible RISC-V core
  • Third party opensource cores, not fully verified (but running a simple I/O test):
    • neo430: a msp430 compatible architecture in VHDL
    • potato: a RISC-V compatible CPU design


MaSoCist opensource 0.2 release

I’ve finally gotten around to releasing the open source tree of our SoC builder environment on GitHub:

https://github.com/hackfin/MaSoCist

Changes in this release:

  • Active support for the Papilio and Breakout MachXO2 boards has been dropped
  • Very basic support for the neo430 (msp430 compatible) added, see Docker notes below
  • Includes a non-configurable basic ‘eval edition’ (in VHDL only) of our pipelined ZPUng core
  • Basic Virtual board support (using ghdlex co-simulation extensions)
  • Docker files and recipes included

Docker environment

Docker containers are, in my opinion, the optimum for automated deployment and for testing different configurations. To stay close to current GHDL simulator development, the container is based on the ghdl/ghdl:buster-gcc-7.2.0 edition.

Here’s a short howto for setting up an environment ready to play with. You can try this online at https://labs.play-with-docker.com, for example.

Just register a Docker account, log in and start playing in your online sandbox.

If you want to skip the build, you can use the precompiled docker image by running

docker run -it -v/root:/usr/local/src hackfin/masocist

and skip step (3) below.

You’ll need to build and copy some files from contrib/docker to the remote Docker machine instance.

  1. Run ‘make dist’ inside contrib/docker, this will create a file masocist_sfx.sh
  2. Copy Dockerfile and init-pty.sh to Docker playground by dragging the file onto the shell window
  3. Build the container and run it:
    docker build -t masocist .
    
    docker run -it -v/root:/usr/local/src masocist
  4. Copy masocist_sfx.sh to the Docker machine and run it inside the running container’s home dir (/home/masocist):
    sudo sh /usr/local/src/masocist_sfx.sh
  5. Now pull and build all necessary packages:
    make all run
  6. If nothing went wrong, the simulation for the neo430 CPU will be built and started with a virtual UART and SPI simulation. A minicom terminal will connect to that UART, and you’ll be able to talk to the neo430 ‘bare metal’ shell; for example, you can dump the contents of the virtual SPI flash by:
    s 0 1
    

    Note: This can be very slow. On a docker playground virtual machine, it can take up to a minute until the prompt appears, depending on the server load.

Development details

The simulation is a cycle accurate model of your user program, which you can of course modify. During the build process, i.e. when you run ‘make sim’ in the masocist(-opensource) directory, the msp430-gcc compiler builds the C code from the sw/ directory and places it in memory according to the linker script in sw/ldscripts/neo430. This results in an ELF binary, which is in turn converted into a VHDL initialization file for the target. Then the simulation is built.
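
The initialization step is conceptually simple. Below is a rough sketch of the idea only; the actual masocist tooling differs, and the file names and VHDL snippet format are invented for illustration:

    import struct

    # Read a raw program image and emit its 16 bit words (little endian,
    # as on the msp430) as a VHDL array body:
    with open("prog.bin", "rb") as f:
        data = f.read()

    with open("meminit.vhd", "w") as out:
        out.write("-- generated memory initialization (illustrative format)\n")
        for i in range(0, len(data), 2):
            (w,) = struct.unpack("<H", data[i:i + 2].ljust(2, b"\x00"))
            out.write('%5d => x"%04x",\n' % (i // 2, w))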

The linker script is, however, very basic. Since a somewhat different, automatically generated memory map is used at this experimental stage, all peripherals are configured in the XML device description at hdl/plat/minimal.xml; the data memory configuration (‘dmem’ entity), however, does not automatically adapt the linker script.

Turning this into a fully configurable solution remains to be done.