Monday, March 7, 2011

Reaching system speed limits

Although the results from the CompressionRatings benchmark were good, I was nonetheless surprised to notice that decoding speed did not improve when the number of threads was increased.

The only explanation I could come up with is that the LZ4 decoder decompresses data faster than the RAM Drive can deliver it. That is likely correct, considering the in-memory benchmark results (LZ4 can decompress at an average of 800 MB/s per core, while the RAM Drive can only deliver between 400 and 500 MB/s).

However, that would not explain why compressing with 4 threads was faster than decoding...

Here too, there is a plausible explanation: writing to the RAM Drive may be slower than reading from it. In that case, compression has an advantage, since it writes less data.

Now, let's put that hypothesis to the test. I built a quick benchmark to measure read and write speed on a RAM Drive installed on a Windows XP box. I'm using Gavotte's Ramdisk for this test. Another ramdisk might give different speeds, but is unlikely to dramatically change the conclusions. A different OS may also change the results, but since CompressionRatings runs on Windows XP, I'm mostly interested in mimicking the same conditions.
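For readers who want to try something similar, here is a minimal sketch of such a sequential read/write benchmark in Python. It is not the tool used for this post: the function names, the 1 MiB block size, and the choice of zero-filled data are all assumptions made for illustration. Note that on an ordinary disk the read figure may mostly reflect the OS file cache rather than the device.

```python
import os
import time

def measure_write_mb_s(path, total_bytes, block_size=1 << 20):
    """Write total_bytes of zeros to path sequentially; return MB/s (1 MB = 1e6 bytes)."""
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    # buffering=0 bypasses Python's own buffer, so each write goes to the OS.
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            written += f.write(block[:n])
        os.fsync(f.fileno())  # ask the OS to flush before stopping the clock
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6

def measure_read_mb_s(path, block_size=1 << 20):
    """Read path sequentially from start to end; return MB/s (1 MB = 1e6 bytes)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6
```

A typical run would call `measure_write_mb_s(r"R:\bench.tmp", 256 * 1024 * 1024)` followed by `measure_read_mb_s(r"R:\bench.tmp")`, where `R:` is a hypothetical drive letter for the ramdisk.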

Running the benchmark, I observed a very stable 1190 MB/s for read operations.
Write operations, on the other hand, were limited to "only" 770 MB/s.
So that's now confirmed: writing is slower than reading.

This is consistent with the CompressionRatings results. Extrapolating from these figures:
when compressing at a ratio of 2:1, the RAM Drive's combined read+write speed is limited to 525 MB/s.
When decoding the same data (1:2), the speed limit drops to 455 MB/s.
Hey, that's about the same ratio as the LZ4 compression/decompression speed difference...
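The extrapolation can be sketched with a deliberately simple model, assuming reads and writes are strictly serial and nothing else costs time. The function names below are hypothetical. With the measured 1190 and 770 MB/s, this model yields roughly 671 MB/s for compression and 582 MB/s for decompression, counted in uncompressed data. Those absolute values are higher than the 525 and 455 MB/s quoted above, which may account for overheads this sketch ignores, but the ratio between the two is about 1.15 in both cases.

```python
READ_MB_S = 1190.0   # measured RAM Drive read speed
WRITE_MB_S = 770.0   # measured RAM Drive write speed

def compress_limit(ratio, read_speed=READ_MB_S, write_speed=WRITE_MB_S):
    """I/O-bound ceiling in MB/s of uncompressed input.

    For each MB read, 1/ratio MB is written; reads and writes are
    assumed not to overlap (a simplifying assumption).
    """
    seconds_per_mb = 1.0 / read_speed + (1.0 / ratio) / write_speed
    return 1.0 / seconds_per_mb

def decompress_limit(ratio, read_speed=READ_MB_S, write_speed=WRITE_MB_S):
    """I/O-bound ceiling in MB/s of uncompressed output.

    For each MB written, 1/ratio MB is read; same serial assumption.
    """
    seconds_per_mb = (1.0 / ratio) / read_speed + 1.0 / write_speed
    return 1.0 / seconds_per_mb

c = compress_limit(2.0)
d = decompress_limit(2.0)
print(f"compress: {c:.0f} MB/s, decompress: {d:.0f} MB/s, ratio: {c / d:.2f}")
```

Whatever the exact overheads, the asymmetry comes from the same place: decompression writes twice as much as it reads, and writing is the slower operation.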
