Question

kos

Asked: 2015-02-21 04:47:36 +0800 CST

Why does gzip on tar output always produce different results?

I would expect two commands that each always produce the same output on their own to also produce the same output when combined in a pipeline, but apparently this is not the case for tar | gzip:

~/test$ ls
~/test$ dd if=/dev/urandom of=file bs=10000000 count=1
1+0 records in
1+0 records out
10000000 bytes (10 MB) copied, 0,877671 s, 11,4 MB/s // Creating a 10MB random file
~/test$ tar cf file.tar file // Archiving the file in a tarball
~/test$ tar cf file1.tar file // Archiving the file again in another tarball
~/test$ cmp file.tar file1.tar // Comparing the two output files
~/test$ gzip -c file > file.gz // Compressing the file with gzip
~/test$ gzip -c file > file1.gz // Compressing the file again with gzip
~/test$ cmp file.gz file1.gz // Comparing the two output files
~/test$ tar c file | gzip > file.tar.gz // Archiving and compressing the file
~/test$ tar c file | gzip > file1.tar.gz // Archiving and compressing the file again
~/test$ cmp file.tar.gz file1.tar.gz // Comparing the output files
file.tar.gz file1.tar.gz differ: byte 5, line 1 // File differs at byte 5
~/test$ cmp -i 5 file.tar.gz file1.tar.gz // Comparing the output files after byte 5
~/test$

On top of that, even tar cfz file.tar.gz file on its own produces a different output every time:

~/test$ tar cfz file2.tar.gz file // Archiving and compressing the file
~/test$ tar cfz file3.tar.gz file // Archiving and compressing the file again
~/test$ cmp file2.tar.gz file3.tar.gz // Comparing the output files
file2.tar.gz file3.tar.gz differ: byte 5, line 1 // File differs at byte 5
~/test$ cmp -i 5 file2.tar.gz file3.tar.gz // Comparing the output files after byte 5
~/test$

Splitting the pipeline into separate steps, on the other hand, produces the output I would expect:

~/test$ gzip -c file.tar > file4.tar.gz
~/test$ gzip -c file.tar > file5.tar.gz
~/test$ cmp file4.tar.gz file5.tar.gz 
~/test$

So whatever causes the difference seems to occur only when tar's output is fed directly into gzip.

What is the explanation of this behavior?

2 Answers

Clayton Mills · Answer 1 · 2015-02-21T06:09:20+08:00

The header of the resulting gzip file differs depending on how gzip is invoked.

Gzip tries to store some information about the origin of the data in the header of the file it produces. When it is called on a regular file, this includes by default the original file name and a timestamp, both of which it takes from the original file.

When it compresses data piped to it, there is no original file to take a name or modification time from, so the timestamp it stores reflects the time of compression instead. That is why two otherwise identical runs differ in the header (byte 5 in your cmp output) and nowhere else.
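The header field in question can be inspected directly. In the gzip file format (RFC 1952), header offsets 4 to 7 hold the modification time as a little-endian Unix timestamp; since cmp counts bytes from 1, its "byte 5" is the first byte of that field. The following is a minimal sketch, not part of the original answer: gzip_mtime is a hypothetical helper name, and the demo file names are made up for illustration.

```shell
# Hypothetical helper: print the 4-byte MTIME field (header offsets 4-7,
# little-endian) of a gzip file as a decimal Unix timestamp.
gzip_mtime() {
    od -A n -t u1 -j 4 -N 4 "$1" |
        awk '{ printf "%.0f\n", $1 + $2*256 + $3*65536 + $4*16777216 }'
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/demo.txt"

gzip -c "$tmp/demo.txt" > "$tmp/named.gz"     # regular file: name and mtime come from demo.txt
gzip -c < "$tmp/demo.txt" > "$tmp/piped.gz"   # stdin: no original file to take them from
gzip -n -c "$tmp/demo.txt" > "$tmp/bare.gz"   # -n: neither name nor timestamp is stored

gzip_mtime "$tmp/named.gz"   # modification time of demo.txt
gzip_mtime "$tmp/piped.gz"   # value may depend on the gzip version in use
gzip_mtime "$tmp/bare.gz"    # prints 0
```

Running the piped variant twice, a second or more apart, and comparing the printed values shows whether the installed gzip stamps piped input with the current time, as the gzip in the question evidently did.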

To verify this, add the -n option to the offending lines in your example:

~/temp$ tar c file | gzip -n > file1.tar.gz
~/temp$ tar c file | gzip -n > file.tar.gz
~/temp$ cmp file.tar.gz file1.tar.gz

Now the files are identical again...

From man gzip ...

   -n --no-name
          When  compressing,  do  not save the original file name and time
          stamp by default. (The original name is always saved if the name
          had  to  be  truncated.)  When decompressing, do not restore the
          original file name if present (remove only the gzip suffix  from
          the  compressed  file name) and do not restore the original time
          stamp if present (copy it from the compressed file). This option
          is the default when decompressing.

So the difference does indeed come from the original file name and timestamp information, which the -n option turns off.
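As a rough check of this conclusion, the sketch below builds the same archive twice with a pause in between, so that any stored timestamp would differ, and then compares the results. The temporary directory, the file contents and the result variable are assumptions made for illustration; only tar, gzip -n and cmp come from the discussion above.

```shell
work=$(mktemp -d)
printf 'some payload\n' > "$work/file"

# Build the same archive twice, at least one second apart, so that a
# timestamp stored in the gzip header would differ between the runs.
tar -C "$work" -cf - file | gzip -n > "$work/a.tar.gz"
sleep 1
tar -C "$work" -cf - file | gzip -n > "$work/b.tar.gz"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$work/a.tar.gz" "$work/b.tar.gz"; then
    result=identical
else
    result=different
fi
echo "$result"   # prints "identical"
```

Dropping -n from both gzip calls and rerunning the comparison reproduces the situation from the question on a gzip that stamps piped input with the current time.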

Andrea Corbellini · Answer 2 · 2015-02-21T04:52:17+08:00

Gzip files include a timestamp. If you create two gzip files at different times, they will differ in their creation time, not in their content.
