I'm trying to speed up the standup of virtual machines used for development / automated test environments and wanted to verify some assumptions about disk write caching.
I'm using ext4 for the root filesystem in the VM and I don't really care about power-loss scenarios. If there's a power loss and the disk gets corrupted, the whole machine can be rebuilt in a couple of minutes. For me that means the following options can be safely applied and should make no difference to the applications: they will only affect how the buffered data is written to the disk itself, while the cached in-memory representation will always be accurate:
- nobarrier
- data=writeback
- nobh
- commit=3600
Is this correct? And are there any other ext4 parameters I should look at for performance improvements?
Other options worth looking at: journal_async_commit, noauto_da_alloc, dioread_nolock. See the ext4 documentation for descriptions.
Also, nouser_xattr and noacl, if you don't use them, might give minor performance improvements on first lookups (but this is not relevant unless you have on the order of millions of files).
Note that using commit=3600 (while improving the overall time for some operations due to batching) might not do what you want. When the commit triggers (probably much sooner than 3600 seconds, due to journal-full conditions) you'll get a BIG burst of I/O which will stall almost everything running on the machine until it's finished (which could take minutes, depending on your journal size and I/O speed). A smaller value will give you more frequent but smaller bursts of metadata writes, so the machine will not look like it has "hung". This might or might not be an issue for you.
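To check how big the journal is and to try a different commit interval without rebooting, something like the following might help. It is only a rough sketch: the function names are made up for illustration, the device and mount point you pass in are your own, and it assumes root privileges and an installed e2fsprogs.

```shell
# show_journal_info DEVICE
# Prints the journal-related lines from the superblock dump
# (journal inode, journal size and so on, depending on the e2fsprogs version).
show_journal_info() {
    dumpe2fs -h "$1" 2>/dev/null | grep -i 'journal'
}

# set_commit_interval MOUNTPOINT SECONDS
# Remounts an already mounted ext4 filesystem with a new commit interval.
set_commit_interval() {
    mount -o "remount,commit=$2" "$1"
}
```

For example, `show_journal_info /dev/vda1` followed by `set_commit_interval / 30` (with `/dev/vda1` being a placeholder for your root device) lets you compare how the bursts behave at 30 seconds versus a much larger value.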
If you do not want a journal, you might want to disable it completely. Note that this might improve performance somewhat, but it also might make it worse.
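Removing the journal from an existing filesystem could look roughly like this. Treat it as a sketch under assumptions: the function names are hypothetical, the filesystem is assumed to be unmounted (or mounted read-only) when the feature is cleared, and you need root and e2fsprogs.

```shell
# has_journal DEVICE
# Succeeds if the filesystem feature list contains has_journal.
has_journal() {
    dumpe2fs -h "$1" 2>/dev/null | grep -q '^Filesystem features:.*has_journal'
}

# disable_journal DEVICE
# Clears the has_journal feature, then forces a filesystem check.
# Run this only on an unmounted (or read-only mounted) filesystem.
disable_journal() {
    tune2fs -O '^has_journal' "$1" && e2fsck -f "$1"
}
```

You would call it as `disable_journal /dev/vdb1` (a placeholder device name), and `has_journal /dev/vdb1` should then return a non-zero status. The journal can be added back later with `tune2fs -O has_journal` if the benchmark shows no benefit.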
Also, some specific loads (such as creating and removing many small files within short timeframes, for example the SMTP mail queue of a busy mail server) might paradoxically do better with data=journal than with data=writeback (or even with no journal at all), as they will end up using only the journal (which means linear writes instead of random writes, hence MUCH faster on non-SSD storage).
But most importantly, you will have to benchmark to find out which one suits you best - there is no silver bullet.
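As a starting point for such a benchmark, here is a very crude small-files test. It is just an illustration (the function name is made up, and a real benchmark should replay your actual VM standup workload): it creates and removes a number of tiny files on the filesystem under test and reports the elapsed wall-clock time, including the final flush to disk.

```shell
# bench_small_files DIR COUNT
# Creates COUNT one-byte files in a scratch subdirectory of DIR, syncs,
# removes them again, syncs once more, and prints the elapsed whole seconds.
bench_small_files() {
    work="$1/bench.$$"
    count=$2
    mkdir "$work" || return 1
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'x' > "$work/f_$i"
        i=$((i + 1))
    done
    sync                # flush the file creations to disk
    rm -rf "$work"
    sync                # flush the removals as well
    end=$(date +%s)
    echo $((end - start))
}
```

Run it as, say, `bench_small_files /mnt/test 100000` once per combination of mount options (remounting or recreating the filesystem in between) and compare the numbers.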