Is there a good way to prime a ZFS L2ARC cache on Solaris 11.3?
The L2ARC is designed to ignore blocks that have been read sequentially from a file. This makes sense for ongoing operation but makes it hard to prime the cache for initial warm-up or benchmarking.
In addition, highly-fragmented files may benefit greatly from sequential reads being cached in the L2ARC (because on-disk they are random reads), but with the current heuristics these files will never get cached even if the L2ARC is only 10% full.
In previous releases of Solaris 10 and 11, I had success using dd twice in a row on each file. The first dd read the file into the ARC, and the second dd seemed to tickle the buffers so they became eligible for L2ARC caching. The same technique does not appear to work in Solaris 11.3.
I have confirmed that the files in question have an 8k recordsize, and I have tried setting zfs_prefetch_disable, but this had no impact on the L2ARC behaviour.

UPDATE: zfs_prefetch_disable turns out to be important; see my answer below.
If there is no good way to do it, I would consider using a tool that generates random reads over 100% of a file. This might be worth the time given that the cache is persistent now in 11.3. Do any tools like this exist?
With a bit of experimentation I've found four possible solutions.
With each approach, you need to perform the steps and then continue to read more data to fill up the ZFS ARC cache and to trigger the feed from the ARC to the L2ARC. Note that if the data is already cached in memory, or if the compressed size on disk of each block is greater than 32kB, these methods won't generally do anything.
1. Set the documented kernel flag zfs_prefetch_disable
The L2ARC by default refuses to cache data that has been automatically prefetched. We can bypass this by disabling the ZFS prefetch feature. This flag is often a good idea for database workloads anyway.
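For example, on a running system the flag can be flipped with mdb (a sketch only; the variable name is the documented ZFS tunable, but as always, try kernel tweaks on a non-production box first):

```
# Write decimal 1 into the zfs_prefetch_disable tunable on the live kernel
echo "zfs_prefetch_disable/W0t1" | mdb -kw
```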
...or, to set it permanently, add the following to /etc/system:
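(A sketch using the standard ZFS tunable syntax; /etc/system changes take effect at the next reboot.)

```
* Disable ZFS file-level prefetch at boot
set zfs:zfs_prefetch_disable = 1
```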
Now when files are read using dd, they will still be eligible for the L2ARC.

Operationally, this change also improves the behaviour of reads in my testing. Normally, when ZFS detects a sequential read it balances the throughput among the data vdevs and cache vdevs instead of just reading from cache -- but this hurts performance if the cache devices are significantly lower-latency or higher-throughput than the data devices.
2. Re-write the data
As data is written to a ZFS filesystem it is cached in the ARC and (if it meets the block size criteria) is eligible to be fed into the L2ARC. It's not always easy to re-write data, but some applications and databases can do it live, e.g. through application-level file mirroring or moving of the data files.
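As a crude illustration (hypothetical paths, and only safe when the application is not using the file), copying a file and swapping the copy into place writes a fresh set of blocks that pass through the ARC:

```
# Rewrite datafile's blocks by copying it and renaming the copy over the original
cp /tank/db/datafile /tank/db/datafile.new && mv /tank/db/datafile.new /tank/db/datafile
```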
Problems:
3. Unset the undocumented kernel flag l2arc_noprefetch
This is based on reading the OpenSolaris source code and is no doubt completely unsupported. Use at your own risk.
Disable the l2arc_noprefetch flag:
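For example, with mdb (a sketch, in keeping with the warning above):

```
# Set l2arc_noprefetch to 0 so sequentially-read buffers remain L2ARC-eligible
echo "l2arc_noprefetch/W0t0" | mdb -kw
```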
Data read into the ARC while this flag is disabled will be eligible for the L2ARC even if it's a sequential read (as long as the blocks are at most 32k on disk).
Read the file from disk:
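For example (hypothetical path; any tool that reads the whole file will do):

```
# Stream the whole file once so its blocks land in the ARC
dd if=/path/to/datafile of=/dev/null bs=1024k
```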
Re-enable the l2arc_noprefetch flag:
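(Again a sketch with mdb; the default value of this flag is 1.)

```
# Restore l2arc_noprefetch to its default
echo "l2arc_noprefetch/W0t1" | mdb -kw
```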
4. Read the data randomly
I wrote a Perl script to read files in 8kB chunks pseudorandomly (based on the ordering of a Perl hash). It may also work with larger chunks but I haven't tested that yet.
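A minimal sketch of that kind of script (not the exact one I used; the pseudorandom ordering simply comes from iterating a Perl hash keyed by chunk index):

```
#!/usr/bin/perl
use strict;
use warnings;

# Read a file in 8kB chunks, in the arbitrary order of a Perl hash's keys.
my $file  = shift or die "usage: $0 <file>\n";
my $chunk = 8192;

open my $fh, '<', $file or die "open $file: $!\n";
binmode $fh;
my $size    = -s $fh;
my $nchunks = int(($size + $chunk - 1) / $chunk);

# Hash keys come back in a pseudorandom order, which is all we need here.
my %order = map { $_ => undef } 0 .. $nchunks - 1;

my $buf;
for my $i (keys %order) {
    seek $fh, $i * $chunk, 0 or die "seek: $!\n";
    read $fh, $buf, $chunk;
}
close $fh;
```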
Problems:
Remaining issues
The evict_l2_eligible kstat increases even when the SSD devices are under no pressure, indicating that data is being dropped. This remaining rump of uncached data has a disproportionate effect on performance.

I'd suggest using a real workload and monitoring the result with arcstat. Something like:
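(A sketch; the exact field names depend on which arcstat version your release ships.)

```
# Sample ARC and L2ARC statistics every 5 seconds while the real workload runs;
# on versions derived from arcstat.pl, -f selects fields such as l2hit% and l2size
arcstat 5
```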
I don't think there's any need to "prime" the cache. If the workload you have doesn't naturally populate the cache, then it's not a representative benchmarking workload, right?
Maybe you have an exceptional use case (what are your dataset size, ARC size and working set size?), but in general the importance of the L2ARC is overemphasized.