I've been using the following command line to do recursive checksumming of directories. It seems to get the job done, but still being a newbie I've been wondering, are there any potential problems with doing it this way? Is it possible for this command to miss files or otherwise mess up?
find ./dir/ -type f -exec sha1sum {} \; > files.sha1
There's nothing wrong I can see with your approach. You're excluding directories and find will include hidden files by default. Yeah, it's fine.
But I'll offer you an alternative because that's what I do:
globstar enables a recursive match for ** and dotglob enables matching hidden files. Between them they expand to all the filenames and sha1sum can parse them all.

The main problem with this approach is it will pass all the filenames off to sha1sum in one fat pile. While this can be somewhat faster at small loads, it will explode if you have too many filenames. I don't know what the cut-off is.

Python script with hashlib and os.walk
Aside from using find and globstar, Python has modules for computing hash sums and for walking a directory tree recursively. Thus, one can write a simple script just as presented below. In fact, this script is pretty much the same as what I've used for this answer, with one minor difference.

This script assumes that you want to recursively walk through the current working directory, so make sure you cd to the desired top directory first.

I would also recommend that you save it in the ~/bin directory and run source ~/.bashrc prior to usage, since that way you can just type the name of the script on the command line.

The script gathers all files, including the hidden ones (those with a leading dot in the filename).
Script Source
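A minimal sketch of such a script, assuming SHA-1 (to match sha1sum) and sha1sum-style output of digest, two spaces, then path. The function names file_sha1 and walk_checksums and the chunk size are illustrative choices, not necessarily what the answer's author used.

```python
#!/usr/bin/env python3
"""Print a SHA-1 checksum for every regular file below the current directory."""
import hashlib
import os
import sys


def file_sha1(path, chunk_size=65536):
    """Return the hex SHA-1 digest of one file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_checksums(top="."):
    """Yield (digest, path) for each file below top, hidden files included."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip FIFOs, sockets and broken symlinks; directories are
            # never hashed because os.walk lists them separately.
            if not os.path.isfile(path):
                continue
            try:
                digest = file_sha1(path)
            except OSError as err:
                # Report unreadable files on stderr and keep going.
                print(f"skipping {path}: {err}", file=sys.stderr)
                continue
            yield digest, path


if __name__ == "__main__":
    for digest, path in walk_checksums("."):
        print(f"{digest}  {path}")
```

The script also needs to be executable (chmod +x) before it can be run by name. If the output is redirected to a file inside the tree being walked, that file will show up in the listing as well, so it may be simpler to write it somewhere outside the tree.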
Demo Run