After I copy, say, 50+ GB of files (30,000 files in various formats) from an internal hard drive to an external drive, is there any way to find out whether everything has been copied correctly? Also, if I cancel the operation partway through and later choose merge when resuming it, will correctness take a hit?
I could use applications like back-in-time, but I am very choosy about which files I copy, so next time I intend to use a plain copy operation and choose merge instead of replace. Is that advisable when copying a large number of files?
I'm using hashdeep to verify backups/restores and occasionally to check for file system corruption in a RAID.
The speed depends on which hash functions you use (some are more CPU-intensive than others) as well as on the read speed of your disks. On my system, hashdeep can process or verify around 1 TB/hour with md5 and 300 MB/s read speed.

Calculating checksums and storing them in a file uses these parameters:
r – recursive
l – use relative paths
c – specify hash function
. – recursive starting at the current directory
> – redirect output to the specified file

See the man page.
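Going by the flags this answer lists, the checksum run and the matching audit run might look roughly like the sketch below. The function names and paths are placeholders invented for illustration, and running both steps from inside the directory (so the relative paths line up) is an assumption on my part.

```shell
# Sketch only: function names and paths are placeholders, not part of hashdeep.

# Record MD5 checksums for every file under a directory.
#   $1 = directory to hash, $2 = file that receives the checksum list
# -r recurse, -l relative paths, -c md5 selects the hash function.
make_checksums() {
    (cd "$1" && hashdeep -r -l -c md5 .) > "$2"
}

# Audit another copy of the tree against the recorded list.
#   $1 = directory to check, $2 = checksum list (use an absolute path,
#        because the audit runs from inside the directory being checked)
# -a audit, -vv list mismatches, -k file of known hashes.
audit_copy() {
    (cd "$1" && hashdeep -r -l -a -vv -k "$2" .)
}
```

For example, `make_checksums /path/to/source ~/checksums.txt` followed by `audit_copy /path/to/copy ~/checksums.txt`. Keep the checksum file outside the tree being hashed so it does not end up in its own list.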
Verifying checksums and printing a list of differences uses these parameters:
a – audit (compare with the list of known checksums)
v – verbose (to get a listing of mismatches; multiple v's mean more verbose)
k – file of known hashes

Note that as of March 2016, hashdeep appears to be abandoned.

It looks like the perfect task for rsync. rsync compares the two directories and copies only the differences.
The rsync utility first popped into my mind when I saw your question. Running it with the -n (dry-run) option can quickly show which files are in directory a but not in b. This is a good option because you can compare the contents of the files as well to make sure they match. rsync's delta algorithm is optimized for this type of use case. Then, if you want to make b match the contents of a, you can just remove the -n option to perform the actual sync.
If the GUI apps suggested over at "File and directory comparison tool?" don't do it for you, try
diff -rq /path/to/one /path/to/other
to recurse through both directories quietly, logging only differences to the screen.

The situation you describe is too complex. You can, though, write a script to calculate the MD5 of all the files you want to copy and later compare them with the ones copied:
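One way such a script could look, as a rough sketch: it assumes GNU find, sort, xargs and md5sum (as on a stock Ubuntu install), and the function names are invented here.

```shell
# Sketch only: function names are placeholders.

# Print "md5  ./relative/path" for every regular file under $1, sorted by path.
md5_manifest() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum)
}

# Compare two trees by their manifests. Returns 0 when they match;
# otherwise prints the manifest lines that differ and returns non-zero.
compare_trees() {
    manifest_a=$(mktemp) || return 2
    manifest_b=$(mktemp) || return 2
    md5_manifest "$1" > "$manifest_a"
    md5_manifest "$2" > "$manifest_b"
    status=0
    diff "$manifest_a" "$manifest_b" || status=$?
    rm -f "$manifest_a" "$manifest_b"
    return "$status"
}
```

Calling `compare_trees /path/to/source /path/to/copy` prints nothing when every file has the same checksum on both sides. A file that is missing or corrupted on one side shows up as a differing line.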
If you want something simple and fast (it will not work in very complex scenarios), you can use Meld.
As for "if everything has been copied correctly": I use a modified cp (or mv) that includes checksumming (optionally stored in xattr, so it only has to be calculated once for the source): http://sourceforge.net/projects/crcsum/