As described in a previous post, a framework was built to loop a DCT hardware design into a software JPEG encoder for verification (and acceleration) purposes.
It turns out this strategy speeds up development a lot, and that the remaining modules on the way to a fully hardware-based, pipelined JPEG encoding solution weren’t a big job. I had actually expected that this enhanced encoder would no longer fit into a small Spartan3E 250k. Wrong!
Have a look:
Device utilization summary:
---------------------------
Selected Device : 3s250evq100-5

 Number of Slices:                  1567  out of   2448    64%
 Number of Slice Flip Flops:        1063  out of   4896    21%
 Number of 4 input LUTs:            2915  out of   4896    59%
    Number used as logic:           2900
    Number used as Shift registers:   15
 Number of IOs:                       49
 Number of bonded IOBs:               47  out of     66    71%
 Number of BRAMs:                     12  out of     12   100%
 Number of MULT18X18SIOs:             11  out of     12    91%
 Number of GCLKs:                      2  out of     24     8%
JPEG encoder latency and timing
From the XST summary, we get:
Timing Summary:
---------------
Speed Grade: -5

 Minimum period: 13.449ns (Maximum Frequency: 74.353MHz)
 Minimum input arrival time before clock: 9.229ns
 Maximum output required time after clock: 6.532ns
 Maximum combinational path delay: 7.693ns
The timing is again optimistic; place and route normally worsens these figures. The maximum clock is in fact the rate at which you can feed the JPEG encoder with pixel data (12 bit) without causing overflow. The output is a Huffman-coded byte stream that is typically embedded into a JFIF structure (header, table data and the appropriate markers) by a CPU.
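To illustrate that wrapping step on the CPU side, here is a minimal Python sketch of my own (function names and the APP0 payload are mine, not taken from the actual firmware); the table segments (DQT, SOF0, DHT, SOS) are passed in pre-built, since their contents depend on the chosen quantization and Huffman tables:

```python
import struct

SOI, EOI = b"\xff\xd8", b"\xff\xd9"  # start/end-of-image markers

def segment(marker: bytes, payload: bytes) -> bytes:
    """A JPEG marker segment: marker, 2-byte big-endian length
    (which counts itself but not the marker), then the payload."""
    return marker + struct.pack(">H", len(payload) + 2) + payload

def wrap_jfif(scan_bytes: bytes, dqt: bytes, sof0: bytes,
              dht: bytes, sos: bytes) -> bytes:
    """Embed the hardware encoder's Huffman-coded scan data in a
    JFIF file. dqt/sof0/dht/sos are pre-built segment payloads."""
    # APP0 payload: "JFIF\0", version 1.01, no density units, 1x1, no thumbnail
    app0 = segment(b"\xff\xe0",
                   b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00")
    return (SOI + app0
            + segment(b"\xff\xdb", dqt)    # quantization table(s)
            + segment(b"\xff\xc0", sof0)   # frame header
            + segment(b"\xff\xc4", dht)    # Huffman table(s)
            + segment(b"\xff\xda", sos)    # scan header
            + scan_bytes + EOI)
```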
There is quite some room for optimization; in fact, the best compromise between BRAM bandwidth and area has not yet been reached. Quite a few BRAM ports are unused, but kept open to allow access from an external CPU, for example to manipulate the Huffman tables.
The last performance question might be the latency: how long does it take until encoded JPEG data appears after the first pixel data arrives? The above waveform snapshot should speak for itself: at a 50MHz input clock, the latency is approx. 4 microseconds.
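As a back-of-the-envelope conversion of that figure into clock cycles (my own arithmetic, not a cycle-accurate measurement):

```python
input_clock_hz = 50e6   # 50 MHz pixel input clock
latency_s = 4e-6        # approximate latency read off the waveform
latency_cycles = latency_s * input_clock_hz
print(round(latency_cycles))  # roughly 200 clock cycles
```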
We haven’t talked about colour yet. This is a complex subject, because there are many ways of encoding colour, but not really for the JPEG encoder: it is rather a matter of I/O sequencing and the proper colour conversion. As you might remember, a JPEG encoder does not encode the three RGB channels, but works in YUV space, which might be roughly described as brightness, redness and blueness. The ‘greenness’ is implicitly included in this information. But why repeat what’s already nicely described: you find all the details right here on Wikipedia.
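For reference, the standard JPEG/JFIF (full-range BT.601) conversion can be sketched in a few lines of Python; the function name is mine:

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple:
    """Full-range BT.601 RGB -> YCbCr as used by JFIF/JPEG.
    Inputs in 0..255; Cb/Cr are centred on 128."""
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

Note how pure green (0, 255, 0) pushes both chroma channels below their 128 midpoint: the green contribution is fully determined by Y, Cb and Cr, which is why it needs no channel of its own.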
So, to encode colour, we just need properly separated data according to one of the interleaving schemes (4:2:0 or 4:2:2) and feed the MCU blocks of 8×8 pixels through the encoder while asserting the channel value (Y, Cb, Cr) on the channel_select input. Voilà.
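A software model of that sequencing for 4:2:0 might look as follows (a sketch of my own; the 0/1/2 encoding on channel_select is an assumption, not the encoder's documented interface):

```python
def mcu_sequence_420(y, cb, cr):
    """Yield (channel_select, 8x8 block) pairs in JPEG 4:2:0 MCU order:
    four Y blocks per 16x16 macroblock, then one Cb and one Cr block.
    y is HxW, cb/cr are (H/2)x(W/2) nested lists; H, W multiples of 16."""
    def block(img, top, left):
        return [row[left:left + 8] for row in img[top:top + 8]]
    for my in range(0, len(y), 16):
        for mx in range(0, len(y[0]), 16):
            for dy, dx in ((0, 0), (0, 8), (8, 0), (8, 8)):
                yield 0, block(y, my + dy, mx + dx)   # assumed: 0 = Y
            yield 1, block(cb, my // 2, mx // 2)      # assumed: 1 = Cb
            yield 2, block(cr, my // 2, mx // 2)      # assumed: 2 = Cr
```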
It turns out that the Bayer pattern we receive from many optical colour sensors can be converted rather directly into YUV 4:2:0 space using the right settings for our Scatter-Gather unit (‘Cottonpicken’ engine). With a tiny bit of software intervention through a soft core, we finally cover the entire colour processing stream. Proof below.
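The actual conversion happens in the ‘Cottonpicken’ hardware, but the idea can be illustrated in software (a deliberately crude sketch of my own, with no real demosaicing): each 2×2 RGGB cell supplies one (R, G, B) triple, from which Y is written at the cell’s four pixel positions and Cb/Cr once per cell, which yields 4:2:0 chroma subsampling essentially for free.

```python
def bayer_rggb_to_yuv420(raw):
    """raw: HxW nested list of sensor values in RGGB layout, H and W even.
    Returns (Y, Cb, Cr) with Cb/Cr at quarter resolution (4:2:0)."""
    h, w = len(raw), len(raw[0])
    Y  = [[0.0] * w for _ in range(h)]
    Cb = [[0.0] * (w // 2) for _ in range(h // 2)]
    Cr = [[0.0] * (w // 2) for _ in range(h // 2)]
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            R = raw[r][c]
            G = (raw[r][c + 1] + raw[r + 1][c]) / 2  # average both greens
            B = raw[r + 1][c + 1]
            luma = 0.299 * R + 0.587 * G + 0.114 * B
            for dr in (0, 1):            # same Y at all four positions:
                for dc in (0, 1):        # crude, but shows the mapping
                    Y[r + dr][c + dc] = luma
            Cb[r // 2][c // 2] = 128 - 0.168736 * R - 0.331264 * G + 0.5 * B
            Cr[r // 2][c // 2] = 128 + 0.5 * R - 0.418688 * G - 0.081312 * B
    return Y, Cb, Cr
```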
As you can see, the colours are not quite perfect yet compared with the original; a greenish tint like this is a typical problem. We leave this to the colour optimization department 🙂
On a more serious note: just yesterday I read the news that Bryce Bayer, the man who changed the optical colour sensor industry, has passed away. As a final “thank you” for his work, I’d like to post the Bayer picture of the above.