I have two directories that should contain the same files and have the same directory structure.
I think that something is missing in one of these directories.
Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?
You can use the diff command on directories just as you would use it on files, passing the two directories as its arguments.
If you want to compare subfolders and their files too, add the -r option.

A good way to do this comparison is to use find with md5sum, then a diff.

Example
Use find to list all the files in the directory, calculate the MD5 hash of each file, and write the list, sorted by filename, to a file:
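A minimal sketch of that step, assuming GNU md5sum is available. The demo tree under a temporary directory and the output name dir1.txt are placeholders for illustration, not names taken from the answer.

```bash
# Hypothetical demo tree so the example runs on its own.
work=$(mktemp -d)
mkdir -p "$work/dir1/sub"
echo same > "$work/dir1/a.txt"
echo old > "$work/dir1/sub/b.txt"

# Hash every regular file, sort on the filename column, save the list.
find "$work/dir1" -type f -exec md5sum {} + | sort -k 2 > "$work/dir1.txt"
cat "$work/dir1.txt"
```

Sorting on the second field keeps the two lists in the same file order, which is what makes a later line-by-line diff meaningful.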
Repeat the procedure for the other directory:
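Since the same pipeline runs twice, it could be wrapped in a small function. This is only a sketch: hash_list, the demo directory, and dir2.txt are hypothetical names.

```bash
# Hypothetical helper: print "hash  path" lines for one directory, sorted by path.
hash_list() {
    find "$1" -type f -exec md5sum {} + | sort -k 2
}

# Hypothetical demo directory standing in for the second tree.
work=$(mktemp -d)
mkdir -p "$work/dir2"
echo new > "$work/dir2/b.txt"

hash_list "$work/dir2" > "$work/dir2.txt"
cat "$work/dir2.txt"
```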
Then compare the two resulting files with diff, passing both list files as arguments.
Or as a single command using process substitution:
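A sketch of the single-command form, assuming bash (process substitution is not available in plain sh) and a hypothetical demo tree. Note that every line is reported here, because the paths in the two lists start with different directory names.

```bash
# Hypothetical demo trees so the example runs on its own.
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo same > "$work/dir1/a.txt"; echo same > "$work/dir2/a.txt"
echo old  > "$work/dir1/b.txt"; echo new  > "$work/dir2/b.txt"

# Each <( ... ) behaves like a temporary file holding that pipeline's output.
diff <(find "$work/dir1" -type f -exec md5sum {} + | sort -k 2) \
     <(find "$work/dir2" -type f -exec md5sum {} + | sort -k 2)
status=$?   # diff exits with 0 when the inputs match and 1 when they differ
echo "diff exit status: $status"
```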
If you want to see only the changes:
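One way this could look, again assuming bash, GNU md5sum, and a hypothetical demo tree in which only b.txt differs:

```bash
# Hypothetical demo trees: a.txt is identical, b.txt differs.
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo same > "$work/dir1/a.txt"; echo same > "$work/dir2/a.txt"
echo old  > "$work/dir1/b.txt"; echo new  > "$work/dir2/b.txt"

# cut keeps only the first space-separated field, which is the hash.
changes=$(diff \
    <(find "$work/dir1" -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") \
    <(find "$work/dir2" -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" "))
echo "$changes"
```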
The cut command prints only the hash (first field) to be compared by diff. Otherwise diff will print every line as the directory paths differ even when the hash is the same.
But you won't know which file changed...
For that, you can try something like
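One possible approach, which is a swapped-in technique and not necessarily the command the answer used: run find from inside each directory, so the paths are relative and identical files produce identical lines. The demo tree is hypothetical.

```bash
# Hypothetical demo trees: a.txt is identical, b.txt differs.
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo same > "$work/dir1/a.txt"; echo same > "$work/dir2/a.txt"
echo old  > "$work/dir1/b.txt"; echo new  > "$work/dir2/b.txt"

# Relative paths (./b.txt) are the same on both sides, so only real changes show up,
# and each reported line carries the name of the file that changed.
changes=$(diff \
    <(cd "$work/dir1" && find . -type f -exec md5sum {} + | sort -k 2) \
    <(cd "$work/dir2" && find . -type f -exec md5sum {} + | sort -k 2))
echo "$changes"
```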
This strategy is very useful when the two directories to be compared are not on the same machine and you need to make sure that the files are identical in both directories.
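Another good way to do the job is Git’s diff command (its --no-index option compares two paths on the filesystem). It may cause problems when files have different permissions, because every file is then listed in the output.

Although it is not specific to bash, you can do it using diff with --brief and --recursive.
The diff man page documents both options: --brief (-q) reports only whether files differ, and --recursive (-r) recursively compares any subdirectories found.

Maybe one option is to run rsync two times: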
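A sketch of the first run, assuming rsync is installed. The flag combination is an assumption chosen to match the description (only -n and -u are named in the answer), and the demo tree is hypothetical.

```bash
# Hypothetical demo trees: only1.txt exists in dir1 only.
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo same  > "$work/dir1/a.txt"; echo same > "$work/dir2/a.txt"
echo extra > "$work/dir1/only1.txt"

# -r recurse, -c compare by checksum, -n dry run (nothing is copied), -v list names.
# The trailing slashes make rsync compare the contents of the two directories.
out=$(rsync -rcnv "$work/dir1/" "$work/dir2/")
echo "$out"
```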
With the previous line, you will get files that are in dir1 and are different (or missing) in dir2.
The same for dir2
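The reverse direction simply swaps the arguments. As before, this is a sketch with assumed flags and a hypothetical demo tree.

```bash
# Hypothetical demo trees: only2.txt exists in dir2 only.
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo same  > "$work/dir1/a.txt"; echo same > "$work/dir2/a.txt"
echo extra > "$work/dir2/only2.txt"

# Dry run from dir2 to dir1: lists files that are in dir2 and differ or are missing in dir1.
out=$(rsync -rcnv "$work/dir2/" "$work/dir1/")
echo "$out"
```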
You can remove the -n option to actually apply the changes, that is, to copy the listed files to the second folder. If you do that, it may be a good idea to add -u, to avoid overwriting newer files.

A one-liner:
Here is an alternative, to compare just filenames, and not their contents:
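A sketch of a names-only comparison, assuming bash and a hypothetical demo tree. Listing each tree from inside its own directory keeps the paths relative, so only names that exist on one side are reported.

```bash
# Hypothetical demo trees: only1.txt exists in dir1 only.
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo same  > "$work/dir1/a.txt"; echo same > "$work/dir2/a.txt"
echo extra > "$work/dir1/only1.txt"

# Compare the sorted lists of relative paths; file contents are never read.
names=$(diff <(cd "$work/dir1" && find . | sort) \
             <(cd "$work/dir2" && find . | sort))
echo "$names"
```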
This is an easy way to list missing files, but of course it won't detect files with the same name but different contents!
(Personally I use my own diffdirs script, but that is part of a larger library.)

I would like to suggest a great tool that I have just discovered: Meld.
It works properly, and everything you can do with the diff command on a Linux-based system can be replicated there with a nice graphical interface. For instance, comparing directories is straightforward, and comparing files is made easier too.
There is nice integration with some version control systems (for instance Git), and it can be used as a merge tool. See the complete documentation on its website.

If you want to make each file expandable and collapsible, you can pipe the output of diff -r into Vim.
First let's give Vim a folding rule:
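A possible folding rule, written as a sketch and not necessarily the one the answer used. It assumes Vim picks up filetype plugins from ~/.vim/ftplugin and detects the piped text as the diff filetype; the fold expression starts a new fold at every line beginning with "diff".

```bash
# Append a fold rule to the user's filetype plugin for diff output.
mkdir -p "$HOME/.vim/ftplugin"
cat >> "$HOME/.vim/ftplugin/diff.vim" <<'EOF'
" A line starting with "diff" opens a new level-1 fold; other lines stay inside it.
setlocal foldmethod=expr
setlocal foldexpr=getline(v:lnum)=~#'^diff'?'>1':1
EOF
```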
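Now just pipe the output of diff -r for your two directories into vim -R - (the trailing - tells Vim to read from standard input).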
You can hit zo and zc to open and close folds. To get out of Vim, hit :q<Enter>
The -R is optional, but I find it useful alongside - because it stops Vim from bugging you to save the buffer when you quit.

Inspired by Sergiy's reply, I wrote my own Python script to compare two directories.
Unlike many other solutions it doesn't compare contents of the files. Also it doesn't go inside subdirectories which are missing in one of the directories. So the output is quite concise and the script works fast with large directories.
If you save it to a file named compare_dirs.py, you can run it with Python 3.x:
Sample output:
P.S. If you need to compare file sizes and file hashes for potential changes, I published an updated script here: https://gist.github.com/amakukha/f489cbde2afd32817f8e866cf4abe779

This is a fairly easy task to achieve in Python:
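A sketch of what such a one-liner could look like when run from the shell, assuming python3 is on the PATH. It is an illustration, not the answer's original command; it compares only the top-level names of the two directories, and the demo tree is hypothetical.

```bash
# Hypothetical demo trees standing in for DIR1 and DIR2.
work=$(mktemp -d)
DIR1="$work/dir1"; DIR2="$work/dir2"
mkdir -p "$DIR1" "$DIR2"
echo same  > "$DIR1/a.txt"; echo same > "$DIR2/a.txt"
echo extra > "$DIR1/only1.txt"

# Print the names found only in DIR1, then the names found only in DIR2.
out=$(python3 -c 'import os,sys; a,b=(set(os.listdir(d)) for d in sys.argv[1:3]); print(sorted(a-b), sorted(b-a))' "$DIR1" "$DIR2")
echo "$out"   # prints: ['only1.txt'] []
```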
Substitute actual values for DIR1 and DIR2.
Here's a sample run:
For readability, here's an actual script instead of a one-liner:

Adail Junior's nice answer might be slow to run if you have hundreds of thousands of files, so here is another way to do it. Say you want to compare all the filenames of folder A with all the filenames of folder B.
Step 1, cd to folder A and write a list of its filenames to listA.txt.
Step 2, cd to folder B and write a list of its filenames to listB.txt.
Step 3, take the diff of listA.txt and listB.txt.
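The three steps could be sketched as follows. The exact listing command is an assumption (find with a sort), the demo folders are hypothetical, and the lists are written outside the folders so that they do not show up in their own listings.

```bash
# Hypothetical demo folders: c.txt exists in A only.
work=$(mktemp -d)
mkdir -p "$work/A" "$work/B"
touch "$work/A/a.txt" "$work/A/c.txt" "$work/B/a.txt"

# Steps 1 and 2: list the relative filenames of each folder, sorted.
(cd "$work/A" && find . -type f | sort > "$work/listA.txt")
(cd "$work/B" && find . -type f | sort > "$work/listB.txt")

# Step 3: lines marked "<" exist only in A, lines marked ">" only in B.
missing=$(diff "$work/listA.txt" "$work/listB.txt")
echo "$missing"
```

Because no file contents are read, the cost grows with the number of names and not with the size of the files.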
I tried that on folders containing half a million txt files, and in less than 30 seconds I had the diff on my screen, whereas computing the md5sums, then piping and appending, can be very time consuming. Note also that the original question asks about comparing filenames (not their contents) and checking whether files are missing from one of the folders.