Recently I switched from zip to bz2 for compressing nightly database dumps. The command I'm using is tar cj. The old zip files would always differ ever so slightly in size from day to day:
-rw-r--r-- 1 mysql mysql 1192139 Aug 20 22:00 mysql_full_export.Fri.zip
-rw-r--r-- 1 mysql mysql 1192425 Aug 23 22:00 mysql_full_export.Mon.zip
-rw-r--r-- 1 mysql mysql 1192140 Aug 21 22:00 mysql_full_export.Sat.zip
-rw-r--r-- 1 mysql mysql 1192145 Aug 22 22:00 mysql_full_export.Sun.zip
-rw-r--r-- 1 mysql mysql 1192137 Aug 19 22:00 mysql_full_export.Thu.zip
-rw-r--r-- 1 mysql mysql 1192403 Aug 24 22:00 mysql_full_export.Tue.zip
-rw-r--r-- 1 mysql mysql 1186645 Aug 25 22:00 mysql_full_export.Wed.zip
Whereas the new bz2 files show identical file sizes over the last week:
-rw-r--r-- 1 mysql mysql 972800 Oct 1 22:00 mysql_full_export.Fri.bz2
-rw-r--r-- 1 mysql mysql 972800 Oct 4 22:00 mysql_full_export.Mon.bz2
-rw-r--r-- 1 mysql mysql 972800 Oct 2 22:00 mysql_full_export.Sat.bz2
-rw-r--r-- 1 mysql mysql 972800 Oct 3 22:00 mysql_full_export.Sun.bz2
-rw-r--r-- 1 mysql mysql 972800 Oct 7 22:00 mysql_full_export.Thu.bz2
-rw-r--r-- 1 mysql mysql 972800 Oct 5 22:00 mysql_full_export.Tue.bz2
-rw-r--r-- 1 mysql mysql 972800 Oct 6 22:00 mysql_full_export.Wed.bz2
Is this normal for bz2 when the uncompressed dumps differ only slightly? This database hardly changes, but it does change a little, as you can see from the zip file sizes.
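A quick arithmetic check on the two listings above, offered as a sketch: the sizes are copied from the ls output, 512 bytes is the tar block size, and 10240 bytes (20 blocks) is GNU tar's default record size. Every bz2 file is an exact multiple of both, while none of the zip files is even a multiple of 512. That hints that something is padding the compressed output, rather than bzip2 producing byte-identical sizes by chance.

```python
# Sizes copied from the two ls listings in the question.
ZIP_SIZES = [1192139, 1192425, 1192140, 1192145, 1192137, 1192403, 1186645]
BZ2_SIZES = [972800] * 7

TAR_BLOCK = 512         # size of one tar block
TAR_RECORD = 20 * 512   # 10240 bytes: GNU tar's default record size (blocking factor 20)


def is_multiple(size: int, unit: int) -> bool:
    """Return True if size is an exact multiple of unit."""
    return size % unit == 0


if __name__ == "__main__":
    size = BZ2_SIZES[0]
    print(f"{size} = {size // TAR_BLOCK} x 512 = {size // TAR_RECORD} x 10240")
    print("every bz2 size record-aligned:", all(is_multiple(s, TAR_RECORD) for s in BZ2_SIZES))
    print("any zip size block-aligned:", any(is_multiple(s, TAR_BLOCK) for s in ZIP_SIZES))
```

For the listed numbers this works out to 972800 = 1900 x 512 = 95 x 10240, whereas the zip sizes leave remainders between roughly 200 and 490 bytes when divided by 512.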
Follow-up:
The answer marked correct below seems the best explanation. The suggestion to calculate an md5 checksum was also helpful as it confirmed that the files are indeed different:
$ md5sum *.bz2
7bec25e80644645e6b2d5b417bb4627d mysql_full_export.Fri.bz2
9cca30e7ed4fb536976ef9d8705e0466 mysql_full_export.Mon.bz2
bc9b9cd1e5a5e552811bff80192b1b43 mysql_full_export.Sat.bz2
7ebbed98f7153a6cafe61836d9a6440d mysql_full_export.Sun.bz2
ad1af98a0ecf90bef1dc1c0b3dedb101 mysql_full_export.Thu.bz2
b399d30e03c200c1ad03bde391e5e682 mysql_full_export.Tue.bz2
b14b4d1bb22ef39b9ebc2f668a2f520d mysql_full_export.Wed.bz2
Perhaps there is a bug in the archiving script. Compare the contents of the archives (use diff or cmp).
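As a rough Python equivalent of running cmp on the decompressed dumps, the sketch below reports the first byte offset at which two .bz2 files differ once decompressed, or None if the contents are identical. The script name compare_dumps.py in the usage comment is hypothetical.

```python
import bz2
import sys
from typing import BinaryIO, Optional


def _read_full(f: BinaryIO, n: int) -> bytes:
    """Read up to n bytes, looping over short reads; returns fewer only at EOF."""
    parts = []
    remaining = n
    while remaining > 0:
        data = f.read(remaining)
        if not data:
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> Optional[int]:
    """Return the offset of the first differing byte between the decompressed
    contents of two .bz2 files, or None if they are identical (like cmp)."""
    offset = 0
    with bz2.open(path_a, "rb") as fa, bz2.open(path_b, "rb") as fb:
        while True:
            a = _read_full(fa, chunk_size)
            b = _read_full(fb, chunk_size)
            if a != b:
                # Locate the first mismatching byte inside this chunk.
                for i in range(min(len(a), len(b))):
                    if a[i] != b[i]:
                        return offset + i
                # One stream ended early: the difference starts where it ran out.
                return offset + min(len(a), len(b))
            if not a:  # both streams reached EOF together with no mismatch
                return None
            offset += len(a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   python compare_dumps.py mysql_full_export.Mon.bz2 mysql_full_export.Tue.bz2
    diff_at = first_difference(sys.argv[1], sys.argv[2])
    print("identical" if diff_at is None else f"contents differ at byte {diff_at}")
```

Reading both streams in fixed-size chunks keeps memory use flat even if the uncompressed dumps are large.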
In the directory containing your bz2 files paste this command:
If the checksums all differ then the uncompressed files are different.
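The original command is not preserved here, so as a stand-in (not necessarily what the answerer meant) here is a minimal Python sketch of the same idea: an MD5 checksum of each file's uncompressed contents, comparable to piping bzcat into md5sum. The *.bz2 glob run from the current directory is an assumption.

```python
import bz2
import glob
import hashlib


def decompressed_md5(path: str, chunk_size: int = 1 << 16) -> str:
    """MD5 hex digest of the uncompressed contents of a .bz2 file."""
    digest = hashlib.md5()
    with bz2.open(path, "rb") as f:
        # Feed the decompressed stream to the hash in chunks until EOF (b"").
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes it is run from the directory holding the dumps.
    for path in sorted(glob.glob("*.bz2")):
        print(decompressed_md5(path), path)
```

Unlike an md5sum of the .bz2 files themselves, this hashes the dumps as they were before compression, so identical digests would mean identical dumps regardless of how the archives were packed.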
Another thought: the tar format always aligns each file on a 512-byte boundary, padding it out with NUL characters if it's shorter. Granted, the tar step should happen before the bz2 step, so the compressed output should still vary in size (theoretically). But perhaps it's compressing first and then putting the result into the tar, which would align it to a 512-byte boundary?
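To illustrate why the order matters, the sketch below builds the same one-file archive twice with Python's tarfile module, used here only as a convenient stand-in for GNU tar: once as a plain tar and once as tar.bz2. tarfile pads the tar stream to 512-byte blocks and 10240-byte records (tarfile.BLOCKSIZE and tarfile.RECORDSIZE), but in the bz2 case that padding sits inside the compressed data, so the file size is simply whatever bzip2 emits. If the .bz2 files on disk are exact multiples of 512, the padding was presumably applied after compression. The member name dump.sql and the payload are made up.

```python
import io
import tarfile
from typing import Tuple


def archive_sizes(payload: bytes) -> Tuple[int, int]:
    """Build the same one-file archive twice and return
    (uncompressed tar size, bzip2-compressed tar size)."""
    sizes = []
    for mode in ("w", "w:bz2"):
        info = tarfile.TarInfo(name="dump.sql")  # hypothetical member name
        info.size = len(payload)
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode=mode) as tar:
            tar.addfile(info, io.BytesIO(payload))
        # tarfile leaves a caller-supplied fileobj open, so buf is still readable.
        sizes.append(len(buf.getvalue()))
    return sizes[0], sizes[1]


if __name__ == "__main__":
    payload = b"INSERT INTO t VALUES (1);\n" * 1000  # made-up stand-in for a dump
    plain, compressed = archive_sizes(payload)
    print("plain tar:", plain, "record-aligned:", plain % tarfile.RECORDSIZE == 0)
    print("tar.bz2:", compressed, "block-aligned:", compressed % tarfile.BLOCKSIZE == 0)
```

The plain tar always comes out record-aligned; the compressed one generally does not, because padding applied before compression does not survive into the compressed size.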