Recently, I've been backing up a lot of my data, and I noticed that I can save files as .gz or .tar.gz, or .7z and .tar.7z, etcetera. What are the differences between the plain format and its .tar.* variant? Which one is advised when making backups?
If you come from a Windows background, you may be familiar with the zip and rar formats. These are archives of multiple files compressed together.
In Unix and Unix-like systems (like Ubuntu), archiving and compression are separate.
tar puts multiple files into a single (tar) file.
gzip compresses one file (only).
So, to get a compressed archive, you combine the two: first use tar or pax to get all files into a single file (archive.tar), then gzip it (archive.tar.gz).
If you have only one file you need to compress (notes.txt), there's no need for tar, so you just do gzip notes.txt, which will result in notes.txt.gz. There are other types of compression, such as compress, bzip2 and xz, which work in the same manner as gzip (apart from using different types of compression, of course).
It depends on what you are looking for... Compression or archiving?
When I talk about archiving, I mean preserving permissions, directory structure, etc...
Compression may ignore most of that and just get your files into a smaller package.
To preserve file permissions, use tar. It records permissions in the archive, and the p flag makes it restore them exactly on extraction instead of applying your umask. Use the z flag for gzip compression or the j flag for bzip2 compression.
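A minimal sketch of such an invocation, assuming GNU tar and a made-up directory named project in a scratch location. It archives the directory with gzip compression, extracts it again with the p flag, and shows that a restrictive file mode survives the round trip.

```shell
#!/bin/sh
# Sketch only: "project" and "secret.txt" are made-up names in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir project
echo "private" > project/secret.txt
chmod 600 project/secret.txt

# c = create, p = preserve permissions, z = gzip, f = archive file name.
tar -cpzf backup.tar.gz project
# For bzip2, swap z for j:  tar -cpjf backup.tar.bz2 project

# x = extract; p restores the recorded mode instead of applying the umask.
mkdir restore
tar -xpzf backup.tar.gz -C restore

# Show the restored mode (expected to start with -rw-------).
ls -l restore/project/secret.txt
```

Ownership is a separate matter: restoring the original owner generally requires extracting as root.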
If you want a tar file that you can "update" later, package it using the P flag and leave it uncompressed, since tar cannot update a compressed archive.
Then to update, replace 'c' with 'u', and when unpacking you can use 'k' to keep files that already exist.
The P flag saves files with full paths, so /home/username rather than home/username (notice the leading forward slash).
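The flags above can be tried out with a sketch like the following, assuming GNU tar; every file and directory name is invented for the example. It uses relative paths for the update and unpack steps and only uses P at the end to show the leading slash being kept.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical and live in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data
echo "first" > data/a.txt

# Create an uncompressed archive (update mode does not work on compressed ones).
tar -cf backup.tar data

# Add a file, then replace 'c' with 'u' to append anything new or changed.
echo "second" > data/b.txt
tar -uf backup.tar data
tar -tf backup.tar

# Unpack with 'k' so a file that already exists at the destination is kept.
# Newer GNU tar reports kept files as errors, hence the "|| true".
mkdir -p restore/data
echo "local edit" > restore/data/a.txt
tar -xkf backup.tar -C restore 2>/dev/null || true
cat restore/data/a.txt

# 'P' stores the leading slash instead of stripping it.
tar -cPf absolute.tar "$workdir/data"
tar -tPf absolute.tar | head -n 1
```

Be careful when extracting an archive made with P: with P given again on extraction, files go back to their absolute locations rather than under the current directory.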
7z usually compresses better than gzip or bzip2, but the 7z format does not preserve file ownership, permissions, etc. rzip is another compression utility that offers compression comparable to 7z.
A backup.tar.7z file is just a tar file (with permissions) compressed by 7z. The metadata sits inside the tar stream and is compressed along with the file contents, so it costs very little space. 7z's high compression comes mainly from its LZMA algorithm rather than from leaving metadata out.
How well something compresses also depends heavily on the type of data. Some files don't compress well because they are already compressed by some other means (e.g. .mp3, .jpg, .tiff with LZMA, .rpm, etc.).
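A rough way to see this for yourself, assuming gzip and /dev/urandom are available. The file names are made up, and random bytes stand in for already-compressed data, which looks similarly patternless to a compressor.

```shell
#!/bin/sh
# Sketch only: file names are made up; sizes are measured, not assumed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Highly repetitive text compresses very well.
yes "the same line over and over" | head -n 5000 > text.txt
# Random bytes stand in for already-compressed data such as .mp3 or .jpg.
head -c 100000 /dev/urandom > random.bin

# -c writes to stdout so the originals are kept for comparison.
gzip -c text.txt > text.txt.gz
gzip -c random.bin > random.bin.gz

# Byte counts before and after compression.
text_raw=$(wc -c < text.txt | tr -d ' ')
text_gz=$(wc -c < text.txt.gz | tr -d ' ')
rand_raw=$(wc -c < random.bin | tr -d ' ')
rand_gz=$(wc -c < random.bin.gz | tr -d ' ')
echo "text:   $text_raw -> $text_gz bytes"
echo "random: $rand_raw -> $rand_gz bytes"
```

The text file should shrink to a small fraction of its size, while the random file stays about the same size or grows slightly.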
gzip and bzip2 don't know about the file system: file names, directories, or tree structure. They just compress an input stream and output the result. Neither gzip nor bzip2 can archive directories on its own, which is why they are usually combined with tar.
tar (archiver): just archives the file structure.
gzip, bzip2 (compressors): just compress their input.
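The division of labour can be sketched as two separate steps, assuming tar and gzip are installed; project and notes.txt are made-up names created in a scratch directory.

```shell
#!/bin/sh
# Sketch only: "project" and "notes.txt" are made-up names in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p project/docs
echo "hello" > project/docs/readme.txt
echo "just one file" > notes.txt

# Step 1: tar bundles the tree into one uncompressed file.
tar -cf archive.tar project
# Step 2: gzip compresses that single file, replacing it with archive.tar.gz.
gzip archive.tar

# A lone file needs no tar at all; this leaves notes.txt.gz behind.
gzip notes.txt

# Reverse the pipeline to list the contents: decompress, then let tar read the stream.
gzip -dc archive.tar.gz | tar -tf -
```

In practice most people let tar call the compressor itself (tar -czf archive.tar.gz project), which produces the same kind of file in one step.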
I think this strategy comes from the 'do one thing well' Unix philosophy. Tar works well? Leave it as is. Need a better compression ratio than gzip? Here is bzip2 or 7zip.
They are different styles of compression. tar by itself simply archives (no compression). tar.gz is a tar archive whose contents are compressed by gzip (moderate compression), hence the .gz, and tar.7z is compressed using 7zip (usually very high compression).
When backing up, I would recommend tar.7z, as it usually has the highest compression ratio and so saves you space, but it needs an extra program (7zip). .tar.gz files will be larger but do the same job. You could also use bzip2 (.tar.bz2), although I'm not sure whether that would suit you better, as I use gzip or 7zip.
Typically, *.tar files are just tar files created by the tar program, *.gz files are created by gzip, *.tar.gz (sometimes also *.tgz) are gzipped tar files, and *.7z files are created by 7zip.
However, in Linux/Unix, one can name a file pretty much any way one wants, so the extension is completely at the discretion of the file's creator.
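A small sketch of that point, assuming GNU tar, gzip and od; the deliberately misleading name holiday.jpg is invented for the example. The tools look at the content of the file, not at its extension.

```shell
#!/bin/sh
# Sketch only: "data" and "holiday.jpg" are deliberately misleading made-up names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data
echo "content" > data/file.txt

# A gzipped tar archive saved under an unrelated extension.
tar -czf holiday.jpg data

# tar reads the content, not the name, so listing still works.
tar -tzf holiday.jpg

# gzip streams start with the bytes 1f 8b, whatever the file is called.
magic=$(head -c 2 holiday.jpg | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $magic"
```

If it is installed, the file command does the same kind of content check for many formats at once.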
Tar (Tape Archiver) has traditionally been used as a container in Unix/Linux to package files for movement. It packages the file structure and maintains file attributes, but it doesn't compress the files.
Compression programs compress the file to make it smaller, but they may not handle multiple files, and/or they may not handle the file attributes necessary for Linux. Since tar already exists and is well-supported, there's no reason for compression programs to duplicate this functionality, which is platform-specific (i.e., different for Windows and Linux). Also, different compression programs may perform differently on different types of files, so having a choice of more than one is desirable.
Other answers have explained the difference between compression and archiving well.
7z is an archiver, which means it knows about the internal directory structure, file names, etc. without having to decompress everything. However, there are some limitations. I quote from man 7z on my Ubuntu system:
There you have it. One can use tar inside 7z (resulting in directory.tar.7z) to make sure you have preserved all the special Linux goodies. However, 7z will only know about the one tar file inside, and the entire tar file will have to be unpacked and read to discover what lies inside. Therefore, for a bunch of regular files, and where ownership doesn't matter, just use 7z directly.
Also, if a tar file (or a compressed tar.anything file) is damaged, you will only be able to recover your data up to the point of injury. With an archive like 7z (not using tar inside) your chances of recovering more files are better.
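Putting tar inside 7z could look roughly like this. It is a sketch that assumes a 7-Zip command-line binary supporting the -si (read from stdin) and -so (write to stdout) switches; the binary name varies between packages, so the script probes for a few common ones and skips the demonstration if none is found. The directory name is made up.

```shell
#!/bin/sh
# Sketch only: "directory" is a made-up name, and a 7z binary may not be installed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir directory
echo "data" > directory/file.txt

# Look for any of the common 7-Zip command names (an assumption about your system).
sevenzip=""
for candidate in 7z 7za 7zz; do
    if command -v "$candidate" >/dev/null 2>&1; then
        sevenzip=$candidate
        break
    fi
done

if [ -n "$sevenzip" ]; then
    # tar records the metadata; 7z only compresses the stream it reads from stdin (-si).
    tar -cf - directory | "$sevenzip" a -si directory.tar.7z >/dev/null

    # Restore: 7z writes the tar stream to stdout (-so) and tar unpacks it.
    mkdir restore
    "$sevenzip" x -so directory.tar.7z | tar -xf - -C restore
    ls restore/directory
else
    echo "no 7z binary found; skipping"
fi
```

Listing directory.tar.7z with 7z shows a single entry, the tar stream, which is the limitation described above.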
PS: 7z can also create solid archives, which give better compression but come with the same limitations as using tar inside any compressor. Source: https://en.wikipedia.org/wiki/Solid_compression