There is a particular directory, /var/www, where ls (with or without options) hangs and never completes. There are only about 10-15 files and directories in /var/www, mostly just text files. Here is some investigative info:
[me@server www]$ df .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_dev-lv_root
50G 19G 29G 40% /
[me@server www]$ df -i .
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg_dev-lv_root
3.2M 435K 2.8M 14% /
find works fine. Also, I can type cd /var/www/ and press TAB before pressing Enter, and tab completion successfully lists all the files and directories in there:
[me@server www]$ cd /var/www/
cgi-bin/ create_vhost.sh html/ manual/ phpMyAdmin/ scripts/ usage/
conf/ error/ icons/ mediawiki/ rackspace sqlbuddy/ vhosts/
[me@server www]$ cd /var/www/
I have had to kill my terminal sessions several times because of ls hanging:
[me@server ~]$ ps | grep ls
gdm 6215 0.0 0.0 488152 2488 ? S<sl Jan18 0:00 /usr/bin/pulseaudio --start --log-target=syslog
root 23269 0.0 0.0 117724 1088 ? D 18:24 0:00 ls -Fh --color=always -l
root 23477 0.0 0.0 117724 1088 ? D 18:34 0:00 ls -Fh --color=always -l
root 23579 0.0 0.0 115592 820 ? D 18:36 0:00 ls -Fh --color=always
root 23634 0.0 0.0 115592 816 ? D 18:38 0:00 ls -Fh --color=always
root 23740 0.0 0.0 117724 1088 ? D 18:40 0:00 ls -Fh --color=always -l
me 23770 0.0 0.0 103156 816 pts/6 S+ 18:41 0:00 grep ls
kill doesn't seem to have any effect on the processes, even with sudo.
What else should I do to investigate this problem? It just randomly started happening today.
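The D in the STAT column of the ps output above means uninterruptible sleep, which is also why kill has no effect on those processes. A minimal sketch for narrowing a process listing down to just those processes follows; list_dstate is a hypothetical helper name, and it assumes the column order PID STAT WCHAN ARGS produced by the ps command in the usage comment. WCHAN is the name of the kernel function the process is sleeping in.

```shell
#!/bin/sh
# list_dstate (hypothetical helper): keep the header line plus every process
# whose STAT field (column 2) starts with "D", i.e. uninterruptible sleep.
# Assumed input: the output of `ps -eo pid,stat,wchan,args`.
list_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage:
#   ps -eo pid,stat,wchan,args | list_dstate
# As root, on kernels that expose it, the kernel stack of one stuck process
# (PID taken from the listing) can be read with:
#   cat /proc/PID/stack
```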
UPDATE
dmesg is a big list of things, mostly related to an external USB HDD that I've mounted too many times, so the max mount count has been reached; I think that is an unrelated problem. Near the bottom of dmesg I'm seeing this:
INFO: task ls:23579 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D ffff88041fc230c0 0 23579 23505 0x00000080
ffff8801688a1bb8 0000000000000086 0000000000000000 ffffffff8119d279
ffff880406d0ea20 ffff88007e2c2268 ffff880071fe80c8 00000003ae82967a
ffff880407169ad8 ffff8801688a1fd8 0000000000010518 ffff880407169ad8
Call Trace:
[<ffffffff8119d279>] ? __find_get_block+0xa9/0x200
[<ffffffff814c97ae>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814c964b>] mutex_lock+0x2b/0x50
[<ffffffff8117a4d3>] do_lookup+0xd3/0x220
[<ffffffff8117b145>] __link_path_walk+0x6f5/0x1040
[<ffffffff8117a47d>] ? do_lookup+0x7d/0x220
[<ffffffff8117bd1a>] path_walk+0x6a/0xe0
[<ffffffff8117beeb>] do_path_lookup+0x5b/0xa0
[<ffffffff8117cb57>] user_path_at+0x57/0xa0
[<ffffffff81178986>] ? generic_readlink+0x76/0xc0
[<ffffffff8117cb62>] ? user_path_at+0x62/0xa0
[<ffffffff81171d3c>] vfs_fstatat+0x3c/0x80
[<ffffffff81258ae5>] ? _atomic_dec_and_lock+0x55/0x80
[<ffffffff81171eab>] vfs_stat+0x1b/0x20
[<ffffffff81171ed4>] sys_newstat+0x24/0x50
[<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
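Messages like this come from the kernel's hung-task detector, which reports a task that has stayed in D state longer than hung_task_timeout_secs (120 seconds here, as the message says) and prints its call trace. In this trace, ls is stuck in mutex_lock during the path lookup for a stat() call (sys_newstat, vfs_stat, do_lookup), which suggests that some other task is holding that lock. To pull just the reported tasks out of a long dmesg, a filter along these lines may help; hung_tasks is a hypothetical name, and the pattern assumes the message wording shown above.

```shell
#!/bin/sh
# hung_tasks (hypothetical helper): read kernel log text on stdin and print
# "NAME PID SECONDS" for every hung-task report found in it.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds\..*/\1 \2 \3/p'
}

# Usage:
#   dmesg | hung_tasks
# As root, SysRq "w" asks the kernel to log every task currently blocked in
# D state; the result can then be read with dmesg:
#   echo w > /proc/sysrq-trigger
```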
And also, strace ls /var/www/ spits out a whole BUNCH of information. I don't know what is useful here... The last handful of lines:
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=68, ws_col=145, ws_xpixel=0, ws_ypixel=0}) = 0
stat("/var/www/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/var/www/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
getdents(3, /* 16 entries */, 32768) = 488
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3093b18000
write(1, "cgi-bin conf create_vhost.sh\te"..., 125cgi-bin conf create_vhost.sh error html icons manual mediawiki phpMyAdmin rackspace scripts sqlbuddy usage vhosts
) = 125
close(1) = 0
munmap(0x7f3093b18000, 4096) = 0
close(2) = 0
exit_group(0) = ?

Run strace ls /var/www/ and see what it hangs on. It's certainly hung on I/O: that's what the D state in your ps output means (and since kill doesn't help, it's one of the uninterruptible I/O syscalls). Most hangs involve an NFS server that's gone to god, but based on your df that isn't the case here. A quick check of dmesg for anything related to filesystems or disks might be worthwhile, just in case.

In the hope this will be helpful: I had the above symptoms, and they were caused by using docker and docker compose with the AUFS driver on Ubuntu 14.04. ls <dir> was hanging, and strace ls <dir> showed it was hanging on the getdents call. Stopping all running containers let me use the drive as expected again.

I had a problem with the same symptoms. It turned out that I had a symlink in that directory to an SMB mount over GVFS.
Normally ls would complete instantly whether or not the share was mounted. But in this case I had suspended and resumed the machine, and the mount was performing poorly in general. Remounting the share fixed the problem.

I was experiencing the same problem.
Entering a directory is fine, listing it hangs, find works, tab completion hangs, and some folders beneath it do work. Very head-scratchingly weird.
Reading this thread on Server Fault did lead me down a logical path towards the solution.
The fact that it had to do with NAS, and that NAS shares are commonly set to automount, made me realise that I had recently changed my fstab to automount some USB drives if they were present, but carry on as normal when they weren't.
I then proceeded as follows:
Try entering the directory again and get that warm fuzzy feeling of having fixed the issue.
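The individual steps of this answer were not preserved, but if a recent fstab change is a suspect, one quick check is to list the entries that are not mounted unconditionally at boot. The sketch below flags entries whose options include noauto, nofail or x-systemd.automount; the helper name and that particular option list are assumptions, since the answer does not say which options were used.

```shell
#!/bin/sh
# fstab_lazy_mounts (hypothetical helper): print "MOUNTPOINT FSTYPE OPTIONS"
# for fstab entries that use any of the options checked below.
# FILE defaults to /etc/fstab; comment and blank lines are skipped.
fstab_lazy_mounts() {
    awk '$1 !~ /^#/ && NF >= 4 {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "noauto" || opts[i] == "nofail" ||
                opts[i] == "x-systemd.automount") {
                print $2, $3, $4
                break
            }
    }' "${1:-/etc/fstab}"
}

# Usage:
#   fstab_lazy_mounts
# On systemd systems, active automount points can also be listed with:
#   systemctl list-units --type=automount
```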

Womble's suggestions are excellent, and you should try those first, but if they don't fix it: I have had this problem when a filesystem has become self-inconsistent (through flaky hardware, obscure kernel bugs, or even cosmic rays).
If you think it might be that, you can force an fsck on reboot by doing:
touch /forcefsck; reboot
Watch what it says at boot time, to see if the fsck picks up any inconsistencies. Warning: this will fsck all the filesystems attached to the machine; do not do it if you also have a multi-petabyte disc array attached, as it may take days.
Running fsck on filesystems can also lead to data loss: if you really do have inconsistencies in your file system, e2fsck will change it from one that looks right but doesn't quite work, to one that works right but may not contain everything you expect.

I had exactly the same symptoms you described. To fix the problem, all I had to do was fix the DNS server addresses. We had moved the NAS to a new network, which required updating the DNS server addresses. The addresses were statically assigned, but in the QNAP web interface I changed them to be assigned automatically.
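Several answers here trace the hang to a network or FUSE mount (NFS, a NAS share, SMB, sshfs). Note that df . only reports the filesystem that contains /var/www itself, so a mount sitting on one of its entries would not show up in the df output in the question. A minimal sketch that lists every mount at or below a directory, read from /proc/mounts, follows; mounts_under is a hypothetical name. It only reads the kernel's mount table, so it should not block on an unreachable server the way ls or a bare df can. A symlink pointing at a mount elsewhere, as in the GVFS answer, will not be listed.

```shell
#!/bin/sh
# mounts_under (hypothetical helper): print "MOUNTPOINT FSTYPE SOURCE" for
# every mount at or below DIR, reading a /proc/mounts-style file
# (default /proc/mounts). Only the mount table is read.
mounts_under() (
    dir=${1%/}                       # drop a trailing slash: /var/www/ -> /var/www
    mounts=${2:-/proc/mounts}
    awk -v dir="$dir" '{
        mp = $2
        gsub(/\\040/, " ", mp)       # /proc/mounts writes spaces as \040
        if (mp == dir || index(mp, dir "/") == 1)
            print mp, $3, $1
    }' "$mounts"
)

# Usage:
#   mounts_under /var/www
```

If something shows up, the answers above unmounted or remounted it; when the server is gone for good, umount -f (for NFS), a lazy umount -l, or fusermount -u (for FUSE mounts such as sshfs) may be needed.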

This happened to me. The cause turned out to be an sshfs mount point in the directory whose SSH server had become unreachable. strace did not give me any clue that ls was hanging on that entry (or maybe I don't understand how to read strace output). I managed to identify the cause by noticing that my ls alias hung, but running /bin/ls directly did not. Curiously, in my case, /bin/ls -F worked but /bin/ls --color did not. (I don't understand why, but that probably deserves its own question.)

Running strace ls /var/www/ will give you a hint of what is wrong. I had a similar issue with the / directory, and using strace I was able to determine that a NAS mount was causing it. Unmounting that NAS fixed the issue.
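When strace does not make the culprit obvious, one way to find the entry ls is stuck on is to stat each entry separately and see which one never returns. The sketch below tries to do that without leaving you with another stuck shell: every stat runs as a background job, and after a few seconds the helper reports the entries whose stat has not finished. probe_entries and its default 3-second wait are hypothetical choices. stat -L follows symlinks, so a symlink to a dead mount should be caught as well as a mount point inside the directory.

```shell
#!/bin/sh
# probe_entries DIR [SECONDS] (hypothetical helper)
# Runs `stat -L` on every entry of DIR as a background job and, after SECONDS
# (default 3), prints the entries whose stat has not returned yet. Entry names
# come from the shell's glob expansion (a directory read); the stat calls run
# only in the background jobs. A stat stuck in D state cannot be killed, so it
# is simply left behind.
probe_entries() (
    dir=${1:?need a directory}
    secs=${2:-3}
    marks=$(mktemp -d) || exit 1
    n=0
    for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # An unmatched pattern is left as literal text; skip it.
        case $entry in "$dir/*" | "$dir/.[!.]*" | "$dir/..?*") continue ;; esac
        n=$((n + 1))
        printf '%s\n' "$entry" > "$marks/name.$n"
        # The marker file appears only once stat has returned (success or not).
        ( stat -L -- "$entry"; : > "$marks/done.$n" ) > /dev/null 2>&1 &
    done
    sleep "$secs"
    i=1
    while [ "$i" -le "$n" ]; do
        [ -e "$marks/done.$i" ] ||
            printf 'no answer after %ss: %s\n' "$secs" "$(cat "$marks/name.$i")"
        i=$((i + 1))
    done
    rm -rf -- "$marks"
)

# Usage:
#   probe_entries /var/www 5
```

If the whole directory is blocked behind one stuck lookup (the mutex_lock in the question's call trace hints at that), several entries may be reported, so treat the output as a short list to check rather than a verdict. For a reported entry that is a symlink, readlink -- ENTRY shows its target without following it.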