Friday, September 27, 2013

Stream Decoding (part 2)

 I've been considering a complementary idea since writing the first part of LZ4 streaming interface.

A major point which bothered me was the need to shuffle data around, and the impossibility to decode data directly at final destination. Let's go into detail.

Currently, the LZ4 decoding functions either accept of refuse references outside of current block. Refusing is the default behavior, compatible with independent blocks.

For chained block, however, there is a major restriction. The decoding function will expect to find data from previous block *just before* current block. The good news is that it makes the decoding function fairly simple. The downside is that it pushes some requirements into user territory.

Addressing this requirement is now a major concern. It can be accomplished by decoding into an intermediate buffer, where the last decoded 64KB can be saved, and decompression operation can be performed in a contiguous space after it.

Decoding into an intermediate buffer makes the work done, but also results in the following issues :
1) It's necessary to allocate more memory for stream-decoding. Memory-constrained systems may not afford it. Worse, this cost is multiplied by the number of streams decoded in parallel.
2) It's necessary to copy decoded data, from the intermediate buffer to final destination. This is obviously bad for performance.
3) Keeping the last decoded 64KB just before the new block to decode require either to repeatedly copy 64KB at this place, which can become a serious waste for small blocks (think about small network packets), or require an even bigger intermediate buffer to limit the number of times data is copied.

Trying to fix these issues, I came up with a different design, with much more desirable properties.

The main idea is to work with a rotating memory buffer. It reduces the memory requirement for stream-decoding to just and only 64KB.

Even more importantly, it authorizes to decode directly at final destination, with no requirement regarding size or continuity. This is way more flexible for a user program.

To reach these properties, it's necessary to create a new family of decoding functions, able to use an external dictionary. The main idea is : when a reference is generated "outside of current block", look into another memory space, provided as an input argument.

As a positive side effect, this capability saves a lot of memory for server systems, handling many clients' connections. When using full adaptive streams, the memory requirement is only 64KB per decoding stream.
But even better, when using a static dictionary (which means, each block is compressed with the same prefix data), memory requirement per stream is reduced to zero. All decoding streams will directly read into the same memory block. For systems handling huge number of streams in parallel, it is a major memory allocation saver.

With the main properties of the decoding stream settled, now is the time to talk about the compressing stream.

(To be continued...)