I ran a test creating two tars from the same directory (its files remained unchanged), and I found that their md5sums were different. I assume there's some timestamp being included in the tar's header, but I haven't found a way to override it. My OS is Ubuntu 9.1. Any ideas?
Thanks.
As Dennis pointed out above, it's gzip. Part of the gzip header is a modification time for whatever is compressed in the file. If you need gzip, you can compress the tar file as an extra step outside of tar rather than using tar's internal gzip. The gzip command has a flag (`-n`) that suppresses saving that modification time.
This will not affect times inside the tarfile, only the one in the gzip header.
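A minimal sketch of the two-step approach described above, assuming GNU tar and gzip are on the PATH. The directory and file names (`src`, `archive.tar`) are made up for illustration. It compresses the same tar file twice, changing the tar file's own modification time in between, and checks that the `gzip -n` output is unaffected:

```shell
# Hypothetical scratch directory and file names, for illustration only.
work=$(mktemp -d)
mkdir "$work/src"
printf 'hello\n' > "$work/src/file.txt"

# Step 1: build the uncompressed tar without tar's internal gzip (no -z).
tar -cf "$work/archive.tar" -C "$work" src

# Step 2: compress as a separate step; -n keeps the original name and
# modification time out of the gzip header.
gzip -n -c "$work/archive.tar" > "$work/first.tar.gz"

# Change the tar file's own mtime, then compress it again.
touch -t 200001010000 "$work/archive.tar"
gzip -n -c "$work/archive.tar" > "$work/second.tar.gz"

# Compare the two compressed files byte for byte.
if cmp -s "$work/first.tar.gz" "$work/second.tar.gz"; then
    gzip_n_result=identical
else
    gzip_n_result=different
fi
echo "gzip -n outputs: $gzip_n_result"

rm -rf "$work"
```

Only the gzip layer is exercised here; whether two separate `tar` runs produce the same bytes is a different question.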
To make a tar file with a consistent checksum, just prepend `GZIP=-n` to the `tar` command.

How this works: tar can accept gzip options through a temporary `GZIP` environment variable. Like Valter said, tar uses gzip, which by default puts a timestamp in the archive. This means you get a different checksum each time you compress the same files. The `-n` option disables that timestamp.

I had this problem too. To stop gzip from storing the timestamp, use `gzip -n`:

-n, --no-name do not save or restore the original name and time stamp
Example:
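A small sketch, assuming GNU tar, gzip, and a POSIX `od` (the directory name `src` is hypothetical). It pipes tar's output through `gzip -n` and then reads the four-byte MTIME field at offset 4 of the gzip header, which should be all zeros when no timestamp is stored:

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
mkdir "$work/src"
printf 'hello\n' > "$work/src/file.txt"

# Write the tar stream to stdout and let a separate gzip -n compress it.
tar -cf - -C "$work" src | gzip -n > "$work/archive.tar.gz"

# Bytes 4-7 of the gzip header hold MTIME; dump them as hex without spaces.
mtime_bytes=$(od -An -tx1 -j4 -N4 "$work/archive.tar.gz" | tr -d ' \n')
echo "MTIME field: $mtime_bytes"

rm -rf "$work"
```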
I went down a rabbit hole after the other answers failed me, and managed to figure out that my version of tar (1.27.1 from the openSUSE 42.3 OSS repo) was using the non-deterministic `pax` archival format by default. This means that even without compression (and even when setting the mtime explicitly), archives created with tar from the same files would differ.

Because the uncompressed archives (generated by running tar twice on the same contents) are different, the compressed output will also differ, even when using `GZIP=-n` as other answers suggest.

To get around this, you can specify `--format gnu`. This also works together with the gzip suggestion above.
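A rough sketch of that check, assuming GNU tar and gzip; the paths are hypothetical. It builds the same archive twice with `--format=gnu`, then repeats the run with the output piped through `gzip -n`. The pipe is a substitution for `GZIP=-n`, chosen because some newer gzip releases print a warning when the `GZIP` environment variable is set:

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
mkdir "$work/src"
printf 'hello\n' > "$work/src/file.txt"

# Build the same uncompressed archive twice, a second apart, in GNU format.
tar --format=gnu -cf "$work/one.tar" -C "$work" src
sleep 1
tar --format=gnu -cf "$work/two.tar" -C "$work" src

if cmp -s "$work/one.tar" "$work/two.tar"; then
    tar_result=identical
else
    tar_result=different
fi
echo "uncompressed archives: $tar_result"

# Same again, compressing with gzip -n so no timestamp lands in the gzip header.
tar --format=gnu -cf - -C "$work" src | gzip -n > "$work/one.tar.gz"
sleep 1
tar --format=gnu -cf - -C "$work" src | gzip -n > "$work/two.tar.gz"

if cmp -s "$work/one.tar.gz" "$work/two.tar.gz"; then
    targz_result=identical
else
    targz_result=different
fi
echo "gzip-compressed archives: $targz_result"

rm -rf "$work"
```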
However, in addition to the valid reasons to prefer better compression formats over gzip, you might want to consider using xz instead (which tar also supports with the `--xz` or `-J` flags instead of `-z`), because it saves you a step here: the default behaviour of `xz` is to generate the same compressed output when the uncompressed contents are the same, so there's no need to specify an option like `GZIP=-n`.
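A sketch of the xz variant, assuming GNU tar with an `xz` binary available; the paths are hypothetical. The `command -v` guard is only there so the snippet degrades gracefully on systems without xz:

```shell
# Hypothetical scratch directory, for illustration only.
work=$(mktemp -d)
mkdir "$work/src"
printf 'hello\n' > "$work/src/file.txt"

if command -v xz >/dev/null 2>&1; then
    # -J makes tar filter the archive through xz; no extra option is passed to xz.
    tar --format=gnu -cJf "$work/one.tar.xz" -C "$work" src
    sleep 1
    tar --format=gnu -cJf "$work/two.tar.xz" -C "$work" src

    if cmp -s "$work/one.tar.xz" "$work/two.tar.xz"; then
        xz_result=identical
    else
        xz_result=different
    fi
else
    # xz is not installed, so there is nothing to compare.
    xz_result=skipped
fi
echo "xz-compressed archives: $xz_result"

rm -rf "$work"
```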