Ping a Specific Port

Question

aidan

Asked: 2010-11-24 01:57:40 +0800 CST2010-11-24 01:57:40 +0800 CST 2010-11-24 01:57:40 +0800 CST

Fast way to recursively count files in linux

772

I'm using the following to count the number of files in a directory, and its subdirectories:

find . -type f | wc -l

But I have half a million files in there, and the count takes a long time.

Is there a faster way to get a count of the number of files, that doesn't involve piping a huge amount of text to something that counts lines? It seems like an inefficient way to do things.

7 Answers

Voted

Sean Reifschneider · Answer 1 · 2010-11-24T05:34:37+08:00

If you have this on a dedicated file-system, or you have a steady number of files overhead, you may be able to get a rough enough count of the number of files by looking at the number of inodes in the file-system via "df -i":

root@dhcp18:~# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            60489728   75885 60413843    1% /

On my test box above I have 75,885 inodes allocated. However, these inodes are not just files, they are also directories. For example:

root@dhcp18:~# mkdir /tmp/foo
root@dhcp18:~# df -i /tmp 
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            60489728   75886 60413842    1% /
root@dhcp18:~# touch /tmp/bar
root@dhcp18:~# df -i /tmp
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            60489728   75887 60413841    1% /

NOTE: Not all file-systems maintain inode counts the same way. ext2/3/4 will all work, however btrfs always reports 0.

If you have to differentiate files from directories, you're going to have to walk the file-system and "stat" each one to see if it's a file, directory, sym-link, etc... The biggest issue here is not the piping of all the text to "wc", but seeking around among all the inodes and directory entries to put that data together.

Other than the inode table as shown by "df -i", there really is no database of how many files there are under a given directory. However, if this information is important to you, you could create and maintain such a database by having your programs increment a number when they create a file in this directory and decrement it when deleted. If you don't control the programs that create them, this isn't an option.

Christopher Schultz · Answer 2 · 2017-10-26T11:38:12+08:00

Christopher Schultz

2017-10-26T11:38:12+08:002017-10-26T11:38:12+08:00

I wrote a custom file-counting program for this StackOverflow question: https://stackoverflow.com/questions/1427032/fast-linux-file-count-for-a-large-number-of-files

You can find the GitHub repo here if you'd like to browse, download, or contribute: https://github.com/ChristopherSchultz/fast-file-count

3

abu_bua · Answer 3 · 2018-04-25T02:38:30+08:00

abu_bua

2018-04-25T02:38:30+08:002018-04-25T02:38:30+08:00

If you want to count recursively the number of files in a directory the locate command is the fastet one I know, assumed you have an up-to-date database (sudo update database .. made per default via chron job every day). However, you can speed up the command if you avoid the grep pipe.

See man locate:

-c, --count
       Instead  of  writing  file  names on standard output, write the number of 
       matching entries only.

So the fastest command is:

locate -c -r '/path/to/dir'

2

Oren · Answer 4 · 2012-04-26T08:43:59+08:00

Oren

2012-04-26T08:43:59+08:002012-04-26T08:43:59+08:00

I would also try:

find topDir -maxdepth 3 -printf '%h %f\n'

And then process the output, reducing into a count for the directories.

This is especially useful if you anticipate the directory structure.

1

thibault ketterer · Answer 5 · 2015-05-13T02:05:39+08:00

thibault ketterer

2015-05-13T02:05:39+08:002015-05-13T02:05:39+08:00

if you have locate installed you can use

locate -c "$PWD"

Fast way to recursively count files in linux

Ping a Specific Port

How do I tell Git for Windows where to find my private RSA key?

How do you restart php-fpm?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Resolve host name from IP address

How can I sort du -h output by size

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?