I have an old backup of documents. In my current Documents
directory, a lot of these files exist in different locations with different names. I'm trying to find a way to show which files exist in the backup that do not exist in the Documents
directory, preferably nice and GUI-y so that I can easily get an overview of a lot of documents.
When I search for this question, a lot of people are looking for ways to do the opposite. There are tools like FSlint and DupeGuru, but they show duplicates. There is no invert mode.
If you are willing to use the CLI, the following command should work for you:
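For example, assuming the old backup lives in ~/backup and the current documents in ~/Documents (adjust both paths to your setup):

    diff -r ~/backup ~/Documents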
This will show you the files that are unique to each folder. If you want, you can also ignore filename case with the --ignore-file-name-case option.
As an example:
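(with the same example paths as before)

    diff -r --ignore-file-name-case ~/backup ~/Documents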
In addition, if you want to report only when the files differ (and not report the actual 'difference'), you can use the --brief option, as in:
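(again with the example paths)

    diff -r --brief ~/backup ~/Documents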
There are several visual diff tools, such as meld, that can do the same thing. Use its "Directory comparison" option and select the folders you want to compare; after selection you can compare them side by side. You can install meld from the universe repository:
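    sudo apt-get install meld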
fdupes is an excellent program to find the duplicate files, but it does not list the non-duplicate files, which is what you are looking for. However, we can list the files that are not in the fdupes output using a combination of find and grep. The following example lists the files that are unique to backup:
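(Assuming the two directories are named backup and Documents and both sit in the current directory; keep the path spellings identical in both commands so the grep filter matches.)

    # record every file that has a duplicate somewhere in either tree
    # (strip the blank lines that separate fdupes' duplicate groups)
    fdupes -r backup/ Documents/ | grep -v '^$' > duplicates.txt
    # list the files under backup/ that are not in that list
    find backup/ -type f | grep -Fxvf duplicates.txt

Here -F treats each listed path as a literal string, -x matches whole lines only, and -v inverts the match.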
I figured that the best workflow to merge old backups with thousands of files, archived under different directories with different names, is to use DupeGuru after all. It looks a lot like the duplicates tab from FSlint, but it has the extra important feature of adding sources as 'reference'. Add the backup directory normally and the current document directory (~/Documents) as a reference, so that only the copies inside the backup get marked.

If you have multiple old backup directories, it makes sense to merge the newest backup directory like this first, and then use that backup directory as a reference to clean its duplicates from the older backups before merging them into the main document directory. This saves a lot of work, because you don't have to pick out unique files from the backups that you want to trash instead of merge.
Remember to make a fresh backup after you've destroyed all old backups in the process. :)
I had the same problem with a lot of very large files. There are plenty of solutions for finding duplicates but not for the inverted search, and I also did not want to compare file contents because of the large amount of data.

So I wrote a Python script to search for "isolated files". It shows any files (recursively) within folder2 that are not in folder1 (also recursively). It can also be used over SSH connections and with multiple folders.
see https://github.com/ezzra/isolated-files
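This is not that script, but for the simple local case the same idea can be sketched in plain shell, comparing by file name only (so it will not catch files that were renamed); it assumes GNU find and bash:

    # file names that occur (recursively) under folder2 but not under folder1
    comm -13 <(find folder1 -type f -printf '%f\n' | sort -u) \
             <(find folder2 -type f -printf '%f\n' | sort -u)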