ZFS supports file-system compression and it also caches frequently or recently accessed data.
If a system has a lot of CPU but the underlying storage is slow, it is possible that ZFS would perform better with compression turned on. This is easy to test for writes by measuring CPU usage, disk usage, and throughput. (Of course there may be some added latency, but this should not be an issue for large files.)
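For example, a rough way to run that write test might look like the following (a sketch only: the pool/dataset names and the test file path are placeholders, and /dev/zero is deliberately avoided because it compresses unrealistically well):

    # one compressed and one uncompressed dataset to write into
    zfs create -o compression=lz4 tank/wtest-lz4
    zfs create -o compression=off tank/wtest-off

    # time identical writes into each (use a file larger than RAM,
    # or add a sync, so the page cache doesn't hide the difference)
    time cp /path/to/testfile /tank/wtest-lz4/
    time cp /path/to/testfile /tank/wtest-off/

    # in another terminal, watch disk throughput and CPU while they run
    zpool iostat -v tank 1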
But what about the cache? If data has to be decompressed every time it is read, then this is probably less of a good idea.
Is the cached data compressed? Does anybody have some information on this?
I asked Richard Elling, an ex-Sun ZFS engineer, this question. He told me the L2ARC is uncompressed, just like the ARC.
Sorry, I can't provide documentation or specifications. My only proof is that one of the guys who helped design ZFS told me in person when I met him last week. :)
Today the L2ARC can be compressed with LZ4, while the ARC is still uncompressed. More information is available on the OpenZFS website: http://open-zfs.org/wiki/Features#l2arc_compression
Cached data in the ARC or L2ARC is always uncompressed. Period. Otherwise every read from the ARC or L2ARC would have corresponding CPU overhead, which with some algorithms could be significant (I'm looking at you, bzip2). Assuming compression=on on your filesystem(s), data on the pool disks and in the ZIL (if applicable) will always be compressed.
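As a quick sanity check of the on-disk side, you can confirm what the pool is actually doing (the dataset name here is just an example):

    # shows the compression algorithm in use and the achieved on-disk ratio
    zfs get compression,compressratio tank/data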
You are correct: when storing data that compresses well, a system with plenty of CPU but limited I/O may well perform better with compression enabled. This is not unique to ZFS; you'll find plenty of references to the same effect for enabling compression on NTFS and other filesystems.
This has changed in recent versions of ZFS (at least on Linux). We just did an apples-to-apples comparison with two 32K datasets, one with LZ4 compression and the other without, and the memory used by the ARC was double in the uncompressed case.
It appears that it is usually more efficient to decompress only the data that is actually needed into a short-term cache, since the ARC often reads in data that is never requested. It seems the choice was made to keep data compressed in memory.
I also see a few parameters in the /proc/spl/kstat/zfs/arcstats file that confirm this.
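For reference, a rough sketch of that kind of comparison and the counters involved (the dataset names, the test data path, and reading "32K" as recordsize=32k are my assumptions; compressed_size/uncompressed_size/overhead_size are the fields I would expect in arcstats on ZFS on Linux 0.7+ with compressed ARC):

    # two otherwise-identical datasets, one compressed, one not
    zfs create -o recordsize=32k -o compression=lz4 tank/test-lz4
    zfs create -o recordsize=32k -o compression=off tank/test-off

    # copy the same data into each (path is a placeholder)
    cp -a /srv/testdata/. /tank/test-lz4/
    cp -a /srv/testdata/. /tank/test-off/

    # read one copy back so it lands in the ARC
    find /tank/test-lz4 -type f -exec cat {} + > /dev/null

    # compare the logical vs. in-memory size of what the ARC is holding
    grep -E '^(size|compressed_size|uncompressed_size|overhead_size) ' \
        /proc/spl/kstat/zfs/arcstats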
This commit looks relevant: https://www.illumos.org/issues/6950
Yes. ZFS will cache frequently accessed data in either form. Performance with the default compression scheme is good and costs nothing but a little CPU time. The compression is done on the fly. You can extend this even further by adding an SSD L2ARC cache device.
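For example, attaching an SSD as an L2ARC device is a single command (the pool name and device path are placeholders; in practice use a stable /dev/disk/by-id path):

    # add an SSD as a cache (L2ARC) vdev to the pool
    zpool add tank cache /dev/disk/by-id/ata-EXAMPLE-SSD

    # verify it shows up under the "cache" section
    zpool status tank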