I have a large and growing set of text files, which are all quite small (less than 100 bytes). I want to diff each possible pair of files and note which are duplicates. I could write a Python script to do this, but I'm wondering if there's an existing Linux command-line tool (or perhaps a simple combination of tools) that would do this?
Update (in response to mfinni comment): The files are all in a single directory, so they all have different filenames. (But they all have a filename extension in common, making it easy to select them all with a wildcard.)
There's fdupes. But I usually use a combination of
find . -type f -exec md5sum '{}' \; | sort | uniq -d -w 32
(An MD5 hash is 32 hex characters, so -w 32 compares only the checksum column and ignores the filenames.)
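If you want to see every file in each duplicate group rather than one representative per group, a variant along these lines should work, assuming GNU coreutils; the '*.txt' pattern is just a stand-in for whatever extension your files actually share:
find . -type f -name '*.txt' -exec md5sum {} + | sort | uniq --all-repeated=separate -w 32
Each blank-line-separated block in the output is one set of files with identical content.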
Well, there is FSlint - which I haven't used for this particular case, but it should be able to handle it: http://en.flossmanuals.net/FSlint/Introduction
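FSlint is primarily a GUI, but it also ships command-line backends. On many distributions the duplicate finder is a script called findup, typically installed under /usr/share/fslint/fslint/ (the exact path may vary):
/usr/share/fslint/fslint/findup /path/to/your/directory
This prints groups of identical files, much like fdupes does.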
You almost certainly don't want to diff each pair of files. You would probably want to use something like md5sum to get the checksums of all the files and pipe that into some other tool that only reports back the duplicate checksums.
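As a rough sketch of that idea (assuming none of the filenames contain spaces, and again using '*.txt' as a stand-in for your actual extension):
md5sum ./*.txt | awk '{ n[$1]++; f[$1] = f[$1] " " $2 } END { for (h in n) if (n[h] > 1) print h ":" f[h] }'
Here awk plays the role of the "other tool": it counts how many files share each checksum and prints only the checksums that occur more than once, together with the files that have them.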
I see fdupes and fslint mentioned as answers. jdupes is based on fdupes and is significantly faster than either; fdupes ought to be considered deprecated at this point.
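Basic usage is the same as fdupes; for the single-directory case in the question something like this should do (it is non-recursive by default, add -r to descend into subdirectories):
jdupes /path/to/your/directory
It prints each set of duplicate files as a group, with the groups separated by blank lines.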