I want to compress about 100,000 files (that's what find . -type f | wc -l
says) with a total disk usage of 100 GB. Most of the files are small, but just a handful of them makes up about 70 GB of the 100 GB.
I don't want to use tar or tar.gz for this because if I want to access the archive, File Roller first has to read in the entire archive from the external HDD before I can even see the file list. Same thing if I try to list the files on the terminal.
I don't need tar's rights management because I can remember the few files that need different rights from the others. What compression algorithm should I use?
And while I'm at it: I make full disk backups with this command:
dd if=/dev/sda bs=32M | gzip -9 > /location/dateAndMachineName.gz
It compresses pretty well, but do you know a better compression algorithm?
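For reference, restoring such an image is just the reverse pipeline, and any streaming compressor can be slotted into the same pipe. This is only a sketch: the device, the path and the choice of a multi-threaded xz (requires xz 5.2 or newer for -T) are assumptions, not a recommendation.
# restore the image onto the same disk (double-check of= before running!)
gunzip -c /location/dateAndMachineName.gz | dd of=/dev/sda bs=32M
# the same backup pipeline with a different compressor, e.g. multi-threaded xz
dd if=/dev/sda bs=32M | xz -T0 -6 > /location/dateAndMachineName.xz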
The only solution I am aware of is pixz (
sudo apt-get install pixz
), a variant of xz that uses a blocked encoder, which allows for fast random access/indexing. Additionally, it compresses in parallel, using multiple cores. Citing the docs:
Usage is simple:
tar -Ipixz -cf foo.tpxz foo
to compress a folder foo,
pixz -l foo.tpxz
to list the files in it (fast!),
pixz -x <file_path> < foo.tpxz | tar x
to extract a single file given <file_path> in the archive.
As a bonus, you will get the access rights stored as well, since the files are tarred first!
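For illustration, here is the same workflow with concrete paths. These are all hypothetical: assume the data sits in ~/data and the external HDD is mounted at /mnt/ext.
# from the directory containing the data folder, archive it onto the external drive
cd ~
tar -Ipixz -cf /mnt/ext/data.tpxz data
# list the contents quickly, without reading the whole archive back first
pixz -l /mnt/ext/data.tpxz
# extract a single file by the path shown in the listing (file name is hypothetical)
pixz -x data/documents/report.ods < /mnt/ext/data.tpxz | tar x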
I can only think of one solution for you: make a new partition with a btrfs filesystem and activate transparent compression. Keep in mind that some people still consider btrfs an "experimental" filesystem. That being said, my secondary backup HDD has been using btrfs for a little over two years and so far it has given me zero issues. But as usual, YMMV.
This and this should get you started with btrfs, if you are not familiar with it already.
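A minimal sketch of what that setup could look like, assuming the backup partition is /dev/sdb1 (hypothetical) and a kernel that supports the btrfs compress mount option:
# format the partition with btrfs (destroys any existing data on it!)
sudo mkfs.btrfs /dev/sdb1
# mount it with transparent compression enabled (lzo is fast, zlib compresses harder)
sudo mkdir -p /mnt/backup
sudo mount -o compress=lzo /dev/sdb1 /mnt/backup
# anything copied here is now compressed on the fly
cp -a ~/data /mnt/backup/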