I'd like to back up some old folders with documents I only very rarely need to access. For that, I'd like to put them all in one archive. As this will be a backup, the format should suit that purpose. So, bottom line:
Which one is the most reliable/robust archiving format in Ubuntu?
It depends. The two most popular options are tarballs and zip files, but both are lacking:
.tar
Tape archives are a very popular option for most Linux users. The format preserves UNIX file permissions (which is important for a backup) and hard links. It's supported out of the box on every Linux distro I've tested, as well as by some Windows programs such as 7-Zip. However, tar has several limitations and drawbacks for the back-up use case, as explained by the Duplicity developers. It can be very slow: even to get a list of the filenames stored in the archive, the entire archive must be read. It also doesn't handle the detailed metadata that some newer filesystems have.
.zip
Zip files act as both an archive and a compression format. For speed, you can disable compression completely. Zip files are better than tape archives in that they store a kind of table of contents, allowing programs to jump straight to the specific file they need to extract. They also store a checksum for the contents of each file, to allow easy detection of file corruption. Zip files are extremely popular; unfortunately, they are not suitable for Linux back-ups because they do not store even simple file permissions.
Here are two more options that are, sadly, also lacking:
.7z
7z compressed archives have some excellent features, such as encryption and support for very large files. Unfortunately, the format does not store UNIX file permissions, so it is not suitable for Linux back-ups.
.ar
Classic UNIX archives are the predecessor to tar archives, and they suffer from the same limitations.
In my opinion, there is no completely robust archive format for Linux back-ups; none that is sufficiently well known to warrant my trust, anyway.
One way to overcome the limitations of each of these formats is to combine them: for example, archive each file individually in a tar archive, and then archive all of these tarballs in one zip file.
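That combination could be sketched roughly like this (the folder name is illustrative, and it assumes the Info-ZIP zip tool is installed):

```shell
# Wrap each file in its own tar archive so UNIX permissions survive,
# then collect the tarballs into one zip for its table of contents
# and per-file checksums. Assumes file names without newlines.
mkdir -p tarballs
find Documents -type f | while read -r f; do
    tar cf "tarballs/$(echo "$f" | tr '/' '_').tar" "$f"
done
# Store-only (-0): the zip is just an index, nothing is recompressed.
zip -0 -r backup.zip tarballs/
```

It is clumsy, though, which is part of why I'd lean toward the alternatives below.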
If you want a really robust back-up, you should probably look into these solutions instead:
Back up directly onto an external hard disk, with the same file system on both source and destination. This ensures that each file's permissions and metadata are stored exactly as intended. (As an aside, the owners and group owners of files are stored using their user-ID and group-ID numbers, not their names.)
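One common tool for such a copy (my suggestion; no particular tool is implied above) is rsync in archive mode: -a preserves permissions, ownership, and timestamps, -H preserves hard links, and -A / -X preserve ACLs and extended attributes where both filesystems support them.

```shell
# Illustrative paths; the trailing slash on the source copies its contents.
# --delete mirrors removals, so the destination matches the source exactly.
rsync -aHAX --delete /home/user/Documents/ /media/backup-disk/Documents/
```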
Use full-disk imaging and cloning software, such as Clonezilla. You can't retrieve a single file from one of these back-ups, but you can be absolutely sure that you have saved everything you possibly can.
And remember, always remember: you can only be confident in your back-ups if you have attempted to restore them. If the worst came to the worst and your source hard drive were completely destroyed, could you restore everything you need to a new hard drive? Would it work as you expect? Try restoring your back-up to a new hard disk and running from that disk for a couple of days. If you notice anything missing, you know your back-up wasn't thorough enough.
Also think about where you keep your back-ups. You need at least some back-ups that are not in the same building as the source disks, to protect yourself from theft or fire. Some options for this are the cloud or a friend's house.
A tarball (.tar file) would be the way to go. Use the gzip compression format for less compression but good speed; bzip2 is much slower but provides a better compression ratio. For binary data, there is not a big difference, though.
The command for compressing a directory using gzip compression:
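The command block itself seems to have been lost from this answer; the standard form would be (directory name illustrative):

```shell
# c = create, z = compress with gzip, f = archive file name
tar czf backup.tar.gz folder-to-backup/
```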
To extract a gzip-compressed tarball while preserving file permissions:
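This command block also appears to be missing; the usual form is:

```shell
# x = extract, z = decompress gzip, p = preserve permissions, f = archive file
# (-p matters mainly when extracting as root)
tar xzpf backup.tar.gz
```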
Replace z with j for bzip2 compression, and add v (e.g. czvf and xzpvf) to print the filenames as they're archived / extracted.
I choose 7-Zip (
sudo apt-get install p7zip-full
). It looks like an ideal compressing archiver from my point of view.
I don't like classic tarballs because of their clumsiness: the whole tar file has to be decompressed (which may happen behind the scenes, but it still happens) just to view the archive's table of contents.
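For example, with the 7z command that p7zip-full provides (archive and folder names illustrative):

```shell
7z a backup.7z Documents/   # a = add/create an archive
7z l backup.7z              # l = list the table of contents (fast, no full scan)
7z x backup.7z              # x = extract with full paths
# Note: as mentioned above, 7z does not store UNIX file permissions.
```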
Although not nearly as well known or widely used, I'd be inclined to go with afio, due to the way it compresses files individually, thereby making partial recovery possible in case of corruption. Install it via aptitude or similar.
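From my memory of afio's manual (do check man afio before relying on this): it reads the list of files from standard input, -o writes an archive, and -Z compresses each file individually, so one corrupt member doesn't take the rest of the archive with it.

```shell
find Documents -print | afio -o -Z backup.afio   # create, per-file gzip
afio -t -Z backup.afio                           # list the contents
afio -i -Z backup.afio                           # install (extract)
```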
A gzipped tarball (.tar.gz, .tgz) is the Linux archiving standard. You can't go wrong with that.
Never in my life have I had a corrupted or even troublesome .tar.gz archive. At FlatmateRooms we use this format to archive hundreds of thousands of images on the server, and for all back-ups.
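Even so, it costs little to verify an archive before trusting it (archive name illustrative):

```shell
gzip -t backup.tar.gz              # test the gzip layer's integrity
tar tzf backup.tar.gz > /dev/null  # walk every entry in the archive
```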
In some cases, this one is useful for me.