I have a large and growing set of text files, which are all quite small (less than 100 bytes). I want to diff each possible pair of files and note which are duplicates. I could write a Python script to do this, but I'm wondering if there's an existing Linux command-line tool (or perhaps a simple combination of tools) that would do this?
Update (in response to mfinni comment): The files are all in a single directory, so they all have different filenames. (But they all have a filename extension in common, making it easy to select them all with a wildcard.)
There's fdupes. But I usually use a combination of
find . -type f -exec md5sum '{}' \; | sort | uniq -d -w 32
(An MD5 hash is 32 hex characters, so -w 32 compares only the checksum column and ignores the filenames.)
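If you want to see every file in each duplicate group rather than one representative per group, a variant along these lines should work, assuming GNU coreutils; the '*.txt' pattern is just a stand-in for whatever extension your files actually share:
find . -type f -name '*.txt' -exec md5sum {} + | sort | uniq --all-repeated=separate -w 32
Each blank-line-separated block in the output is one set of files with identical content.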
Well, there is FSlint - which I haven't used for this particular case, but it should be able to handle it: http://en.flossmanuals.net/FSlint/Introduction
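FSlint is primarily a GUI, but it also ships command-line backends. On many distributions the duplicate finder is a script called findup, typically installed under /usr/share/fslint/fslint/ (the exact path may vary):
/usr/share/fslint/fslint/findup /path/to/your/directory
This prints groups of identical files, much like fdupes does.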
You almost certainly don't want to diff each pair of files. You would probably want to use something like md5sum to get the checksums of all the files and pipe that into some other tool that only reports back the duplicate checksums.
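As a rough sketch of that idea (assuming none of the filenames contain spaces, and again using '*.txt' as a stand-in for your actual extension):
md5sum ./*.txt | awk '{ n[$1]++; f[$1] = f[$1] " " $2 } END { for (h in n) if (n[h] > 1) print h ":" f[h] }'
Here awk plays the role of the "other tool": it counts how many files share each checksum and prints only the checksums that occur more than once, together with the files that have them.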
I see fdupes and fslint mentioned as answers. jdupes is based on fdupes and is significantly faster than either; fdupes ought to be considered deprecated at this point.
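Basic usage is the same as fdupes; for the single-directory case in the question something like this should do (it is non-recursive by default, add -r to descend into subdirectories):
jdupes /path/to/your/directory
It prints each set of duplicate files as a group, with the groups separated by blank lines.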