I'm trying to batch an operation that counts files in a given subfolder of a remote NFS drive.
The NFS client runs Ubuntu 16.04 LTS. I have very little information about the remote NFS server: it runs NFS v3, it is mounted read-write anonymously, and access is authenticated by IP address. The client's connection is 100/10 Mbit/s (down/up) and can upload at around 1.1 MB/s. The provider advertises its backup storage as 1 Gbit/1 Gbit guaranteed. The usable size of the volume is under 4 TB, and the file count is estimated at more than 600,000.
--Edit #1:
The storage's advertised guaranteed IOPS figure is 2000, but testing the remote filesystem gives 700-800 IOPS.
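A rough back-of-the-envelope baseline, assuming (purely for illustration) that each file costs about one I/O operation on the server and that the measured rate of roughly 750 IOPS holds for the whole run. NFS v3 READDIRPLUS returns many directory entries per call, so the real per-file cost may well be lower:

$$
t \approx \frac{600\,000\ \text{files} \times 1\ \text{op/file}}{750\ \text{op/s}} = 800\ \text{s} \approx 13\ \text{min}
$$

This is only a yardstick for comparing against observed run times, not a prediction.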
The mount options used on the client are as suggested by the provider:
rsize=8192,wsize=8192,timeo=14,intr
To perform the count, I chose this script:
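#!/bin/bash
# Count regular files under the folder given as the first argument.
if [[ $# -eq 0 ]]; then
    # Double quotes so that $0 expands to the script name.
    echo "no folder supplied, use $0 /path/to/folder" >&2
    exit 1
else
    # Quote "$1" so paths containing spaces work.
    COUNT=$(find "$1" -type f | wc -l)
    echo "$1 contains $COUNT files."
fi
exit 0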
I tried it on my home directory, and it was obviously very fast, outputting:
/home/user contains 12 files.
When I try to get the same count from the remote NFS drive, the script runs "forever".
--Edit #2:
I tried removing the |wc -l and appending >> $LOGFILE to the end of the find command, but it seems to hang randomly somewhere between 2 and 24 hours in, and when it hangs after a long time, the list is far from complete.
I thought I could split the find into chunks to prevent this issue, perhaps by producing a list of all subfolders...
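# $FOLDERLIST holds the subfolder paths (unquoted, so paths must not contain spaces).
for d in $FOLDERLIST; do
    # -maxdepth 1: count only the files directly inside $d
    # (-maxdepth 0 would test $d itself and always count 0 files).
    find "$d" -maxdepth 1 -type f | wc -l >> "$TMPLOG"
done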
...and then sum all the numbers in $TMPLOG, hoping that the smaller operations won't make the script hang.
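A minimal sketch of the chunked idea, assuming bash and GNU find (for -printf). It counts the files directly inside the root folder, then the whole tree under each top-level subfolder (hidden ones included), appending a "count<TAB>path" line to a log each time a chunk finishes. A rerun after a hang skips every path already in the log, and the total is the sum of the log's first column. The function name count_chunked and the default log path are hypothetical; a single huge subfolder is still one chunk that has to finish in one go.

```bash
#!/bin/bash
# Hypothetical helper: count_chunked ROOT LOG
# Counts regular files under ROOT chunk by chunk (assumes GNU find).
# After each chunk's find returns, "<count><TAB><path>" is appended to LOG,
# so rerunning after a hang skips chunks that already completed.
count_chunked() {
    local root=$1 log=$2 n
    touch "$log"

    # Chunk 1: regular files directly inside ROOT.
    if ! cut -f2- "$log" | grep -qxF -- "$root"; then
        n=$(find "$root" -maxdepth 1 -type f -printf '.' | wc -c)
        printf '%s\t%s\n' "$n" "$root" >> "$log"
    fi

    # One more chunk per top-level subfolder (hidden ones included),
    # each covering that subfolder's whole tree.
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
    while IFS= read -r -d '' d; do
        if cut -f2- "$log" | grep -qxF -- "$d"; then
            continue    # already counted in an earlier run
        fi
        n=$(find "$d" -type f -printf '.' | wc -c)
        printf '%s\t%s\n' "$n" "$d" >> "$log"
    done

    # Total = sum of the first column of LOG.
    awk -F'\t' '{ s += $1 } END { print s + 0 }' "$log"
}

# Run as: ./script.sh /path/to/folder [logfile]
if [[ $# -ge 1 ]]; then
    count_chunked "$1" "${2:-./filecount.log}"
fi
```

Deleting the log file starts the count from scratch.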
QUESTION: Am I using the most resource-efficient way to perform this count? Is there a cheaper way than find
to get the file count?
I suspect this may be the wrong approach to counting files: given how long it takes on the remote drive, there must be considerable overhead. It reminds me of my experience with remote filesystems mounted via curlftpfs: huge overhead, huge delay.
NFS should be much better in that respect, but in this case it doesn't seem to be!
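A small variation on the counting script, assuming GNU find (for -printf): print one '.' per regular file and count bytes instead of lines. It performs the same directory traversal as find | wc -l, so it does not reduce the number of NFS requests; it only avoids piping full paths and stops file names that contain newlines from being over-counted. The function name count_files is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: count_files DIR
# Prints one '.' per regular file under DIR and counts the bytes.
# Same traversal (and NFS traffic) as `find DIR -type f | wc -l`,
# but immune to newlines inside file names.
count_files() {
    find "$1" -type f -printf '.' | wc -c
}

# Run as: ./script.sh /path/to/folder
if [[ $# -ge 1 ]]; then
    echo "$1 contains $(count_files "$1") files."
fi
```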
You can try with rsync, using something similar to:

The first two lines of the output will look like this:
The first value is the total number of entries (directories plus files), while the second is the number of files only.