I'm trying to understand all the relevant implications of the --max-obj-size value used when creating an S3QL file system. I have yet to find a complete description of what this option affects, but I have been able to piece together a few bits from the docs and discussion groups.
Mainly, I have found reasons to use larger --max-obj-size values, which leaves me wondering why not use an arbitrarily large value (10 MB? 100 MB? 1 GB?):
- Smaller values mean more blocks to track, and worse performance from the SQLite metadata database, since the same amount of data requires more block entries (I try to quantify this in the sketch after this list)
- Smaller values can hurt throughput (especially for sequential reads).
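To convince myself of the first point, here is a rough back-of-the-envelope sketch. The data-set size, average file size, and the assumption that every block costs one metadata row are my own illustrative numbers, not anything from the s3ql docs:

```
def block_entries(total_bytes, avg_file_bytes, max_obj_bytes):
    """Rough count of block rows: each file needs ceil(size / max_obj_size) of them."""
    n_files = total_bytes // avg_file_bytes
    blocks_per_file = -(-avg_file_bytes // max_obj_bytes)  # ceiling division
    return n_files * blocks_per_file

TOTAL = 500 * 2**30      # assume 500 GiB of data
AVG_FILE = 50 * 2**20    # assume an average file size of 50 MiB

for max_obj in (512 * 2**10, 10 * 2**20, 100 * 2**20):
    print(f"max-obj-size {max_obj // 2**10:>7} KiB -> "
          f"~{block_entries(TOTAL, AVG_FILE, max_obj):>10,} block entries")
```

With those assumptions, dropping from a 10 MiB to a 512 KiB maximum object size multiplies the number of block entries by twenty, which is where I expect the database slowdown to come from.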
From the version 1.8 changelog:
As a matter of fact, a small S3QL block size does not have any advantage over a large block size when storing lots of small files. A small block size, however, seriously degrades performance when storing larger files. This is because S3QL is effectively using a dynamic block size, and the --blocksize value merely specifies an upper limit.
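My reading of that "dynamic block size" remark is that a file is simply cut into blocks no larger than the limit, so a small file costs a single small object and only large files get split. A toy sketch of that interpretation (mine, not s3ql's actual code):

```
def split_into_blocks(file_bytes, max_obj_bytes):
    """Sizes of the storage blocks a file of file_bytes would occupy."""
    if file_bytes == 0:
        return []
    full, rest = divmod(file_bytes, max_obj_bytes)
    return [max_obj_bytes] * full + ([rest] if rest else [])

MAX_OBJ = 10 * 2**20  # a 10 MiB upper limit
for size in (4 * 2**10, 3 * 2**20, 64 * 2**20):
    print(f"{size:>9} byte file -> block sizes {split_into_blocks(size, MAX_OBJ)}")
```

If that is right, a 4 KiB file stored with a 10 MiB limit still only costs a 4 KiB object, which would explain why small files don't benefit from a small limit.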
So far the only advantages I have found or imagined for smaller block sizes are:
- Less bandwidth used to re-write a portion of a file (I sketch the cost after this list)
- Possibly better deduplication
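For the first of those, my mental model is that changing a few bytes forces the whole block containing them to be re-uploaded, so the upload cost of a tiny edit is roughly one block. A sketch with made-up numbers, not measurements:

```
def rewrite_upload_bytes(edit_bytes, max_obj_bytes):
    """Bytes re-uploaded for a small in-place edit inside a large file:
    roughly the whole block that contains the edit."""
    return max(edit_bytes, max_obj_bytes)

EDIT = 4 * 2**10  # assume a 4 KiB change somewhere inside a large file
for max_obj in (512 * 2**10, 10 * 2**20, 100 * 2**20):
    print(f"max-obj-size {max_obj // 2**10:>7} KiB -> "
          f"re-upload ~{rewrite_upload_bytes(EDIT, max_obj) // 2**10:,} KiB")
```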
The --min-obj-size option does not affect deduplication. Deduplication happens before blocks are grouped.
The --max-obj-size affects deduplication, since it implicitly determines the maximum size of a block.
Found here:
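And here is how I picture that deduplication point: blocks are hashed and identical blocks are stored only once, so two almost-identical files can only share data at block granularity. This is my own simplification, not s3ql's actual scheme:

```
import hashlib
import random

def block_hashes(data, max_obj_bytes):
    """Hash each block; identical hashes mark data that could be stored once."""
    return [hashlib.sha256(data[i:i + max_obj_bytes]).hexdigest()
            for i in range(0, len(data), max_obj_bytes)]

rng = random.Random(0)
base = rng.randbytes(2**20)                # a 1 MiB file (Python 3.9+)
copy = base[:-1] + bytes([base[-1] ^ 1])   # the same file with one byte changed

for max_obj in (64 * 2**10, 1 * 2**20):
    h_base, h_copy = block_hashes(base, max_obj), block_hashes(copy, max_obj)
    shared = len(set(h_base) & set(h_copy))
    print(f"block size {max_obj // 2**10:>5} KiB: "
          f"{shared} of {len(h_base)} blocks identical between the two files")
```

With 64 KiB blocks, 15 of the 16 blocks are shared; with a single 1 MiB block, nothing can be deduplicated. That is why I suspect a smaller maximum block size can help deduplication, at the cost of the metadata overhead above.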
Can anyone offer a summary of the trade-offs one makes when selecting a larger or smaller --max-obj-size when creating an S3QL file system?