tar can create archives in different formats: GNU tar, ustar, pax and v7. Which would be best for long-term archiving? Are there significant differences between these formats?
I would like to use the best format for general backups, and I don't want to end up unable to extract the data because of format problems. (v7 is disappearing from implementations, for example.)
The GNU tar manual actually has an entire section dedicated to tar archive formats. The formats ustar and pax are based on POSIX standards, and gnu is very widespread. I'd steer clear of the other ones.

My suggestion would be to choose pax, that is, POSIX.1-2001. GNU tar intends to make it the default format in a future release, and even old ustar implementations can extract it. It's also the least restrictive format.

You can create POSIX.1-2001 archives, e.g. with GNU tar by specifying --format pax, or with a separate pax archiver.

Some technical comparisons among the v7, ustar and pax formats:

v7
The format before POSIX.1-1988.
Supported file types: regular file (typeflag '\0'), directory, hard link (typeflag 1), symbolic link (typeflag 2). A directory is identified by the trailing slash in the name field. (reference 1)

ustar
ustar extends the header block of the v7 format, and, uncompressed, a ustar tarball is the same size as the equivalent v7 tarball. There's no strong reason to prefer the v7 format unless you deliberately want to strip information that ustar would archive.
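To make the "ustar extends the v7 header" point concrete, here is a minimal sketch in Python that reads only the v7-era fields (name, size, typeflag, checksum) from headers written by Python's tarfile module in ustar format. It assumes the conventional 512-byte header layout (name at offset 0, size at 124, chksum at 148, typeflag at 156); the helpers make_ustar, octal, parse_v7_header and iter_headers are hypothetical names for illustration. It also applies the v7 trailing-slash rule for directories, which still works here because tarfile writes directory names with a trailing slash.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks

def make_ustar(entries):
    """Build an in-memory ustar archive from (name, data) pairs; data=None means a directory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            if data is None:
                info.type = tarfile.DIRTYPE  # tarfile appends "/" to directory names
                info.mode = 0o755
                tar.addfile(info)
            else:
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def octal(field: bytes) -> int:
    """Decode a NUL- or space-padded octal number field."""
    digits = field.replace(b"\0", b" ").strip()
    return int(digits, 8) if digits else 0

def parse_v7_header(block: bytes) -> dict:
    """Read only fields that already existed in v7; ustar keeps them at the same offsets."""
    name = block[0:100].split(b"\0", 1)[0].decode("ascii")
    stored_sum = octal(block[148:156])
    # Checksum = byte sum of the header with the 8-byte chksum field counted as spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:BLOCK])
    return {
        "name": name,
        "size": octal(block[124:136]),
        "typeflag": block[156:157],
        "is_dir": name.endswith("/"),  # the v7 directory rule
        "checksum_ok": stored_sum == computed,
    }

def iter_headers(raw: bytes):
    """Yield parsed headers, skipping each member's data blocks."""
    offset = 0
    while offset + BLOCK <= len(raw):
        block = raw[offset:offset + BLOCK]
        if block == bytes(BLOCK):  # a zero block marks the end of the archive
            break
        header = parse_v7_header(block)
        yield header
        data_blocks = (header["size"] + BLOCK - 1) // BLOCK
        offset += BLOCK * (1 + data_blocks)

raw = make_ustar([("docs", None), ("docs/a.txt", b"hello")])
for h in iter_headers(raw):
    print(h["name"], h["typeflag"], h["size"], h["is_dir"], h["checksum_ok"])
```

A v7-only reader walking this archive would see the same names and sizes. It would not understand typeflag 5, but the trailing slash still marks the directory, and it simply ignores the magic, uname/gname and prefix fields that ustar placed in the previously unused part of the block.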
Supported file types: regular file (typeflag '\0' or 0), directory (typeflag 5), hard link, symbolic link, character device (3), block device (4), FIFO (6). (Vendor extensions to file types are allowed in A through Z.)

ustar has minor, backward-incompatible differences from the pre-standard v7 format: the typeflags 0 and 5 for regular files and directories, respectively. In v7 the typeflag field was used to indicate links only, not other file types.

pax
pax extends the ustar format through (optional) Extended Header blocks; when extracted by old tar programs, these Extended Headers look like regular text files. The Extended Headers are identified internally by typeflag x (file extended header) and g (global extended header). Their unlimited extensibility also means a pax tarball is typically larger than a ustar one. That's fine for archiving, but a bit bloated as a format for software distribution. pax is a superset of the ustar format: a pax tarball becomes no different from ustar if all of its Extended Headers are stripped out.
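A minimal sketch of what an Extended Header contains, using Python's tarfile module with PAX_FORMAT (similar in spirit to GNU tar's --format pax, though the two implementations will generally not produce byte-identical output). In tarfile's pax writer, a name longer than 100 characters and a uid too large for ustar's 8-byte octal field both end up in an x header, whose body is plain text records of the form "<length> <keyword>=<value>" followed by a newline. The helpers make_pax, octal and parse_pax_records are hypothetical names. tarfile names the x entry "././@PaxHeader"; that name is an implementation choice, and it is what an old tar would extract as a regular file.

```python
import io
import tarfile

BLOCK = 512

def make_pax(name: str, data: bytes, uid: int = 0) -> bytes:
    """Write one regular file into an in-memory archive using tarfile's pax format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.uid = uid
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def octal(field: bytes) -> int:
    """Decode a NUL- or space-padded octal number field."""
    digits = field.replace(b"\0", b" ").strip()
    return int(digits, 8) if digits else 0

def parse_pax_records(payload: bytes) -> dict:
    """Parse newline-terminated '<length> <keyword>=<value>' records; <length> counts the whole record."""
    records = {}
    pos = 0
    while pos < len(payload):
        space = payload.index(b" ", pos)
        length = int(payload[pos:space])
        body = payload[space + 1:pos + length - 1]  # strip the trailing newline
        keyword, _, value = body.partition(b"=")
        records[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records

long_name = "archive/" + "n" * 120 + ".txt"          # longer than the 100-byte name field
raw = make_pax(long_name, b"data", uid=10_000_000)   # needs more than 7 octal digits

header = raw[:BLOCK]
print(header[156:157])                  # typeflag of the first header block
print(header[0:100].rstrip(b"\0"))      # name tarfile gives the extended header entry
payload = raw[BLOCK:BLOCK + octal(header[124:136])]
print(parse_pax_records(payload))
```

Reading the archive back with tarfile.open applies these records, so the member reports the full path and uid even though the ustar header underneath holds a truncated name and a zero uid.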
You can read this for what can be extended in the pax format. In summary, compared to ustar:
File name length is unlimited (path= keyword in the Extended Header).
Symbolic link target length is unlimited (linkpath= keyword).
size (file size), uid (user ID), uname (user name), gid (group ID) and gname (group name) are all extensible to unlimited length.
UTF-8 encoding is used for path, linkpath, uname and gname.
Access time (atime) can be stored along with modification time (mtime).

Note: POSIX does not mandate a name pattern for stored Extended Headers, so implementations are free to use any pattern they want. In GNU tar, for example, the name pattern is controlled via the --pax-option=exthdr.name= option. If you plan to make a deterministic tarball (across tar/pax implementations), beware of this.

gnu and oldgnu formats
According to the GNU tar manual, GNU tar was based on an early draft of the POSIX.1 ustar standard, but the GNU extensions to tar make it incompatible with the ustar format. If you want to make a portable archive, you should avoid the GNU tar format and favor pax or ustar instead.

The GNU tar format can be identified by its magic field (8 bytes), ustar<space><space><nul>, compared to ustar's magic and version fields, ustar<nul>00. The GNU tar format is nevertheless backward-compatible with the v7 format.
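The magic check can be sketched in Python as follows (detect_format and make are hypothetical helpers; the archives come from Python's tarfile module, not GNU tar). It looks at the 8 bytes at offset 257 of the first header block; because pax reuses the ustar magic, it additionally treats a leading x or g typeflag as pax. This is only a heuristic: a pax archive that happens to need no Extended Header is indistinguishable from ustar, and "v7" here just means "no recognized magic".

```python
import io
import tarfile

POSIX_MAGIC = b"ustar\x0000"  # magic "ustar\0" followed by version "00"
GNU_MAGIC = b"ustar  \x00"    # "ustar", two spaces, NUL

def detect_format(raw: bytes) -> str:
    """Heuristic format guess from the first 512-byte header block."""
    magic = raw[257:265]
    typeflag = raw[156:157]
    if magic == GNU_MAGIC:
        return "gnu"
    if magic == POSIX_MAGIC:
        # pax reuses the ustar magic; only an x/g extended header gives it away.
        return "pax" if typeflag in (b"x", b"g") else "ustar"
    return "v7"  # no recognized magic: treat as pre-POSIX

def make(fmt: int, name: str) -> bytes:
    """Write one empty regular file with tarfile in the requested format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    return buf.getvalue()

long_name = "dir/" + "x" * 120
print(detect_format(make(tarfile.GNU_FORMAT, "a.txt")))
print(detect_format(make(tarfile.USTAR_FORMAT, "a.txt")))
print(detect_format(make(tarfile.PAX_FORMAT, "a.txt")))   # no extended header needed
print(detect_format(make(tarfile.PAX_FORMAT, long_name)))
gnu_long = make(tarfile.GNU_FORMAT, long_name)
print(gnu_long[156:157], gnu_long[0:100].rstrip(b"\0"))   # GNU long-name header
```

With a long file name, tarfile's GNU_FORMAT output starts with a typeflag L header named ././@LongLink, which is the GNU long-name mechanism rather than ustar's prefix field.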
Unlike ustar, which uses the prefix field to extend the path, GNU tar stores a long file name in a (non-pax) extended header with typeflag L. Similarly, long link targets are stored in an extended header with typeflag K.
Typeflag D stores directory listings for incremental backups. See the GNU tar --incremental option.
Typeflag M marks a file continued from a previous volume. See the GNU tar --multi-volume option.
Typeflag S stores sparse files.
Typeflag V is a volume header, or a label for the archive volume. See the GNU tar --label option.

The differences between the oldgnu (GNU tar <= 1.12) and gnu (GNU tar >= 1.13.12) formats are minor for end users, but according to the manual, and to create.c and NEWS in the source code, there are at least two differences:
The oldgnu format always terminates file names, user names and group names with null bytes. (This means file names have a maximum of 99 characters before an extended header is needed.)

pax is POSIX compliant... That having been said, I only use tar, tar+gz and tar+bz2.
tar.gz is the fairly standard one.
tar bundles all the files into a single file (a bit like an .iso image), but it doesn't compress them.
gzip (gz) then compresses that tar file.
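As a rough illustration of that two-step layering, here is a sketch using Python's standard tarfile and gzip modules (make_tar is a hypothetical helper): first build an uncompressed tar stream, then gzip the whole stream. Reading it back with mode "r:gz" undoes the two layers in reverse order.

```python
import gzip
import io
import tarfile

def make_tar(files: dict) -> bytes:
    """Step 1: bundle files into an uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = {"notes.txt": b"hello\n" * 100, "data.csv": b"1,2,3\n" * 100}
tar_bytes = make_tar(files)           # plain .tar: files bundled, not compressed
tgz_bytes = gzip.compress(tar_bytes)  # step 2: gzip the whole tar stream -> .tar.gz
print(len(tar_bytes), len(tgz_bytes))

# Reading back: "r:gz" decompresses first, then parses the tar headers.
with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
    print(tar.getnames())
```

Because gzip wraps the entire stream, the tar format choice (pax, ustar, gnu) is independent of the compression choice.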
The *nix command line to perform this is: