Both machines are running Ubuntu 12.04
Remote NFSv4 Client
$ ls /mnt/storage/aaaaaaa_aaa/bbbb/cccc_ccccc gives this error:
ls: reading directory .: Too many levels of symbolic links
How can I fix this?
When the error occurs, ls still starts listing the files, but PHP breaks.
On the NFSv4 Server
In /etc/fstab:
/mnt/storage /srv/storage none bind 0 0
In /etc/exports:
/srv 192.168.1.0/24(rw,async,insecure,no_subtree_check,crossmnt,fsid=0,no_root_squash)
/srv/storage 192.168.1.0/24(rw,async,nohide,insecure,no_subtree_check,no_root_squash)
ERROR
root@ds:/mnt/storage/foreign_dbs/imdb/imdb_htmls# ls -l | head
ls: reading directory .: Too many levels of symbolic links
total 10302840
-rw-r--r-- 1 root root 10484 Jul 5 13:56 0019038.gz
-rw-r--r-- 1 root root 16264 Mar 30 00:31 0259701.gz
-rw-r--r-- 1 root root 13784 Mar 30 14:20 1000000.gz
-rw-r--r-- 1 root root 12741 Mar 30 13:04 1000003.gz
-rw-r--r-- 1 root root 12794 Mar 30 12:40 1000004.gz
-rw-r--r-- 1 root root 13123 Mar 30 12:07 1000005.gz
-rw-r--r-- 1 root root 13183 Mar 30 12:04 1000006.gz
-rw-r--r-- 1 root root 13443 Jul 4 01:16 1000007.gz
-rw-r--r-- 1 root root 12968 Mar 30 11:05 1000008.gz
I came across it in PHP: scandir would return 1612577.gz and 1612579.gz but skip 1612578.gz, even though the file types and properties of all three are identical.
This only happens on the NFS client; it works 100% on the server.
About the problem
You can have a problem where two or more files have the same readdir cookie.
This problem is more common when using an NFS filesystem (v3 or v4) over an EXT4 backend with a lot of files in the same directory (more than 50000). The problem can also occur when using GlusterFS instead of NFS.
PS: This problem can also occur with only a few files inside a single directory, but that case is very improbable.
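To get a feel for why large directories are hit more often, a rough birthday-problem estimate can help. This is only a sketch: it assumes cookies behave like uniformly random values in a 32-bit space, which is a simplification and not something stated above. The function name and the bits parameter are made up for illustration, so other widths can be tried.

```shell
# Approximate probability that at least two of n files share a cookie,
# assuming cookies are uniformly random values of the given bit width
# (birthday-problem approximation: 1 - exp(-n(n-1) / (2 * 2^bits))).
collision_probability() {
    awk -v n="$1" -v bits="$2" \
        'BEGIN { printf "%.2f\n", 1 - exp(-n * (n - 1) / (2 * 2 ^ bits)) }'
}

collision_probability 50000 32   # a large directory
collision_probability 100 32     # a small directory
```

Under that assumption the estimate grows roughly with the square of the number of files, which fits the observation that small directories are rarely affected.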
In this case, you will see Too many levels of symbolic links errors even if you have no symlinks inside your directory. You can prove this by verifying that a search for symbolic links in that directory returns no output. To check whether you're hitting this specific problem, list the directory on the NFS client so that the error is triggered again.
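Both checks can be sketched in shell roughly like this. The helper names are made up for illustration, and the path in the usage comments is the one from the question; replace it with your own directory.

```shell
# Print every symbolic link below the given directory.
# No output means the directory contains no symlinks at all.
list_symlinks() {
    find "$1" -type l
}

# List the directory but keep only the error messages:
# stderr is sent to the caller, the normal listing is discarded.
listing_errors() {
    ls -l "$1" 2>&1 >/dev/null
}

# Usage on the NFS client (uncomment to run):
# list_symlinks  /mnt/storage/aaaaaaa_aaa/bbbb/cccc_ccccc
# listing_errors /mnt/storage/aaaaaaa_aaa/bbbb/cccc_ccccc
```

If the first helper prints nothing while the second one still prints the symbolic links error, real symlinks are not the cause.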
After that, check your syslog (/var/log/syslog) for entries that report a readdir loop or a duplicate cookie.
The problem is related to the readdir API, which uses the readdir cookie to quickly locate a file inside a directory. The NFS server uses this API while communicating with EXT4 backends.
A complete and excellent explanation of the duplicate cookie problem (actually, a hash collision problem) can be found at Widening ext4's readdir() cookie.
A related bug report can be found at NFS client reports a 'readdir loop' with a corrupt name.
If you can reboot your system, the good news is that, according to David Hedberg, this problem is already solved in newer Ubuntu kernel versions (>= 3.2.0-60-generic). You may also need to update your NFS server (the fix only works if both the NFS server and the kernel are updated).
PS: If you really love Operating Systems, you can check the kernel/nfs patches at http://comments.gmane.org - 32/64 bit llseek hashes.
Solution
Update your kernel and NFS kernel server and reboot the system:
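One way to do this on Ubuntu, as a sketch only: the version check below compares the running kernel against 3.2.0-60 using sort -V (assumed to be available, as in GNU coreutils), and the upgrade commands are left commented out because they need root and end with a reboot. The helper name version_at_least is made up for illustration.

```shell
# Succeed if version $1 is at least version $2 (relies on sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Strip the flavour suffix, e.g. 3.2.0-60-generic becomes 3.2.0-60.
running=$(uname -r | sed 's/-[a-z].*$//')

if version_at_least "$running" "3.2.0-60"; then
    echo "kernel $running is at or above 3.2.0-60"
else
    echo "kernel $running is older than 3.2.0-60, an upgrade is needed"
fi

# Upgrade everything (kernel and nfs-kernel-server included), then reboot:
# sudo apt-get update
# sudo apt-get dist-upgrade
# sudo reboot
```

Run the check on both the NFS server and the client, since both sides need the updated packages.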
If you can't reboot the system, you can also detect the file with the duplicated readdir cookie (check your syslog) and move it to another dir (or rename it to change its cookie/hash).
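This workaround could look roughly like the sketch below. The two search phrases are an assumption about how the kernel words its message (based on the bug report title and the duplicate cookie mentioned above), the helper name is made up, and the file name in the rename example is a placeholder.

```shell
# Print syslog lines that look like NFS readdir-loop reports.
# The search phrases are assumptions about the kernel message wording.
find_cookie_reports() {
    grep -E 'readdir loop|duplicate cookie' "${1:-/var/log/syslog}"
}

# Usage on the NFS client (uncomment to run):
# find_cookie_reports /var/log/syslog
#
# Then rename the reported file so that it gets a different cookie/hash:
# mv /path/to/dir/REPORTED_FILE /path/to/dir/REPORTED_FILE.renamed
```

After the rename, list the directory again on the client to confirm the error is gone.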
Somewhere you have a symbolic link that points back to its parent. Use this to find it:
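One possible way to look for such a link, as a sketch: the helper below walks every symlink under a directory and reports those whose resolved target is the link's own directory or one of its ancestors. It assumes readlink -f is available (as in GNU coreutils), and the function name is made up for illustration.

```shell
# Report symlinks that resolve to their own directory or to an ancestor of it.
find_parent_links() {
    find "$1" -type l | while IFS= read -r link; do
        # Fully resolved target; skip links that cannot be resolved.
        target=$(readlink -f "$link") || continue
        # Physical path of the directory that contains the link.
        parent=$(cd "$(dirname "$link")" && pwd -P)
        case "$parent/" in
            "$target"/*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}

# Usage (uncomment to run from the directory that shows the error):
# find_parent_links .
```
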
Once you do, then perhaps you can figure out how to correct it.