Posted on

Image compression prediction techniques

Prediction techniques are present in almost any approach to image compression, be it lossless or lossy. The general idea of a predictor is, to extrapolate a pixel from its neighbours, i.e. assuming it has a specific value. The difference between this assumed value and the effective value is regarded as error. For a round trip coding, i.e. forward and inverse transformation, it is only required to store a reference, prediction history and the errors.

Since these errors are – seen from a statistical distribution – rather limited, the encoding can happen effectively with less bits than the eight bits normally used per pixel channel.

A simple predictor

A very simple predictor tries to predict pixel values based on the previous scan line or pixel row. It evaluates the pixels [0, 1, 2] as shown below to extrapolate the value of [X]:

00  11  22 ...
##  XX  ##

Running this through a test image and storing only the error, we get something like below:

Test image from unsplash, scaled/cropped
Prediction error image

The prediction error image was quantized to make the potential compression gain more visible.

By simple bit plane reshuffling and entropy coding techniques deployed by the JPEG XS format, our predictor experiment allows to compress the above image lossless by roughly factor two. Using Wavelet decomposition, lossless compression can be achieved up to factor four. By quantization (that will introduce information loss), much higher compression in the range of 1:10-1:20 can be achieved with quality compareable to the classic JPEG format, however with reduced block artefacts at higher compression.

Context sensitivity and adaptivity

When recompressing images, artefacts can arise that are hard to re-compress. Likewise, bayer pattern raw sensor images converted to YUV or RGB values can lead to frequency artefacts that cause higher entropy.  Our approach uses a very simple, context sensitive predictor based on a 8-state finite state machine where the decision on what state is chosen next is based on a simple history and a comparison. The image below shows a typical error distribution for these eight states. For a normal grayscale image, the statistical distribution is alike for each state. For artefact-tainted images, the distribution can be rather distorted.

Error distribution for eight contexts
Refined context tracker

Experiments with a higher number of contexts and a complex tracking history yielded a better compression for artefact-tainted images – under certain circumstances. The above T shaped predictor (internally named ‘STP8’ for sliding T predictor) was enhanced to 27 states (‘STP27’) in particular to perform well in the following scenarios:

  • JPEG recompression artefacts
  • Bayer pattern artefacts (Bayer to YUV422)
  • Custom color pattern experiments (undocumented)

Each of these artefact scenarios requires a specific default parameter setting for the STP27. A classification of pixels is then done ‘on the fly’ and adaptive compression is applied. This showed a significant improvement over the STP8, but introduces some complexity of the hardware state machine and requires a rather large huffman coding table.

In simulation experiments with a number of ‘classic’ test images, this turned out to actually beat lossless wavelet compression.

Hardware implementation

Using our proprietary FLiX coprocessing engine, a hardware based pipeline can be generated that uses FPGA technology to compress grayscale images at a very high data rate (up to 200 Mpixel/s per channel). Basically, no multiplication is required for the ‘baseline’ variant. For images with higher amount of artefacts however, decorrelation approaches are deployed that will require multiplications, and a feedback regulation similar to convolutional neuronal networks. This is still under scrutiny. For most sensor imagery however, it appears that the compression gain is not excessive. Update: Obsolete with STP27 implementation.

Update: 2.4.2015
Repub: 11.9.2017 [Replace images]

Posted on

JPEG encoding on FPGA [revisited]

Although considered ‘ancient’ because invented in the early nineties, the JPEG standard is far from being dead or superseded. Its basic methods are still up to date for modern video compression.

For low latency image streaming, we have developed our own system on chip encoder solution ‘dorothea’ in 2013. It is based on the second generation L2 (a tag referring to ‘two lane’) JPEG engine, allowing JPEG compression of YCbCr 4:2:2 video at full pixelclock.

The ‘dorothea’ SoC is now superseded by the new ZPUng architecture, allowing more microcode tricks than on the previous MIPS based SoC. It is available as reference design ‘dombert’ (see SoC design overview) for UDP streaming up to 100 Mbps, optionally, 1G cores (third party) can be deployed as well for more throughput.

L2 example videos

These example videos are taken by direct capture (as coming from the camera) of the UDP video stream. The direct Bayer to YUV422 method is implemented in a microcode engine (license and patent free) and may still show visible artefacts, also, color correction is not implemented for this demo. For the live videos, a MT9M024 sensor on the HDR60 development kit has been used. Bit files for evaluation are available in the Reference Design section.

Posted on

A full baseline pipelined JPEG encoder in VHDL

As described in a previous post, a framework was built to loop in a DCT hardware design into a software JPEG encoder for verification (and acceleration) purposes.

Turns out this strategy speeds up development a lot, and that the remaining modules on the way to a full hardware based and pipelined JPEG encoding solution weren’t a big job. Actually, I was expecting that this enhanced encoder would no longer fit into a small Spartan3E 250k. Wrong!

Have a look:

Device utilization summary:

Selected Device : 3s250evq100-5 

 Number of Slices:                     1567  out of   2448    64%  
 Number of Slice Flip Flops:           1063  out of   4896    21%  
 Number of 4 input LUTs:               2915  out of   4896    59%  
    Number used as logic:              2900
    Number used as Shift registers:      15
 Number of IOs:                          49
 Number of bonded IOBs:                  47  out of     66    71%  
 Number of BRAMs:                        12  out of     12   100%  
 Number of MULT18X18SIOs:                11  out of     12    91%  
 Number of GCLKs:                         2  out of     24     8%

JPEG encoder latency and timing

From the XST summary, we get:

Timing Summary:
Speed Grade: -5

   Minimum period: 13.449ns (Maximum Frequency: 74.353MHz)
   Minimum input arrival time before clock: 9.229ns
   Maximum output required time after clock: 6.532ns
   Maximum combinational path delay: 7.693ns

The timing is again optimistic, place and route normally deteriorates the latencies. The maximum clock is in fact the clock you can feed the JPEG encoder with pixel data (12 bit) without causing overflow. The output is a huffman coded byte stream that is typically embedded into a JFIF structure header, table data and the appropriate markers by a CPU.

There is quite some room for optimization, in fact, the best compromise of BRAM bandwidth and area has not yet been reached. Quite a few BRAMs ports are not used, but kept open to allow access through an external CPU, like for manipulation of the Huffman tables.

JPEG encoder waveforms
JPEG encoder waveforms

The last performance question might be the latency: how long does it take until encoded JPEG data appears after the first arriving pixel data? The above waveform snapshot should speak for itself: at 50MHz input clock, the latency is approx. 4 microseconds.

Colour encoding

We haven’t talked about colour yet. This is a complex subject, because there are many possibilities of encoding colour, but not really for the JPEG encoder. This is rather a matter of I/O sequencing and the proper colour conversion. As you might remember, a JPEG encoder does not encode three RGB channels, but in YUV space, which might be roughly described as: brightness, redness and blueness. The ‘greenness’ is implicitely included in this information. But why repeat what’s already nicely described: You find all the details right here on Wikipedia.

So, to encode all the colour, we just need properly separated data according to one of the interleaving schemes (4:2:0 or 4:2:2) and feed the MCU blocks of 8×8 pixels through the encoder while assserting the channel value (Y, Cb, Cr) on the channel_select input. Voilà.

Turns out that the Bayer Pattern that we receive from many optical colour sensors can be converted rather directly into YUV 4:2:0 space using the right setting four our Scatter-Gather unit (‘cottonpickens’ engine). With a tiny bit of software intervention through a soft core, we finally cover the entire colour processing stream. Proof below.

Colour JPEG
Colour JPEG output from the encoder
The original RGB photo
The original RGB photo

As you can see, the colours are quite not perfect yet compared with the original. This is a typical problem, that you get a greenish tint. We leave this to the colour optimization department 🙂

One more serious word: Just yesterday I’ve read the news and had to see that the person who changed the optical colour sensor industry, Bryce Bayer, has passed away. As a final “thank you” to his work, I’d like to post the Bayer Picture of the above.

Bayer pattern source image
Dedicated to Bryce Bayer