I have a small local network which has a Gentoo box and a Windows box. I mount a share originating on the Windows box onto the Gentoo box with a command like:
mount -t cifs -o username=WindowsUsername,password=thepassword,uid=pistos //192.168.0.103/Users /mnt/windowsbox
Most of the time, everything Just Works, and I can read and write without problems. However, every few weeks or so, the connection or the mount point seems to go dead or hang, such that any process that tries to access the mount point gets stuck in D state (disk, or I/O wait). These processes become impervious to TERM and KILL signals. Disconnecting and reconnecting the Windows box from the network does not help. The frozen state lasts for 5+ minutes. It's really frustrating and gets in the way of normal work, because it freezes Save As dialogues, ls commands, etc. If I issue a umount on the mount point, it either hangs also, or reports that the mount point is in use. Eventually, the dead state resolves itself, and the mount point gets unmounted, or it becomes possible to umount with no delay.
My guess is that this happens when the connection/mount has gone idle, or when the Windows machine has been idle. I am not really sure.
Why is this happening, and what can I do to prevent it? Or how can I successfully kill these D-state processes at will?
Possibly related: CIFS mounts hang on read
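To confirm which processes are actually stuck, the D-state ones can be listed together with the kernel function they are waiting in. This is a minimal sketch assuming the procps version of ps; the function names filter_d_state and list_d_state are made up for illustration. It only identifies the processes and does nothing to make them killable.

```shell
# Keep the header line plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Reads ps-style output on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Show PID, state, kernel wait channel and command name for all processes,
# filtered down to the ones in D state.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Running list_d_state while the mount is frozen should show the hung ls or file-dialog processes; the WCHAN column hints at whether they are waiting inside the CIFS code.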
Not sure why the problem is happening, but as a workaround, have you tried to put something like
touch /mnt/windowsbox/keepalive.txt
or
echo "I am still alive." >/mnt/windowsbox/keepalive.txt
to be run via cron every minute? That way the connection should stay active.
I too encounter this every few months.
sudo umount -l (a lazy unmount) is my workaround. https://stackoverflow.com/a/96288/2097284
Another potential answer suggested writing to a file on the mount at a regular interval via cron. I would suggest instead using the smbclient program to connect to the share and disconnect.
I wrote a bash script like this to accomplish that:
This command makes a new connection to the share and then runs the exit command, immediately shutting down the connection it just established on the command line. There should be 8 backslashes before the server name and 4 before the share name, because backslashes need to be escaped, and the escapes need to be escaped again inside a double-quoted string. Perhaps there's a smarter way to do this, but this does seem to work.
Perhaps there is a way to make this even more reliable by making it hold the connection open for several minutes at a time, but that's a bit out of my league.
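A minimal sketch of the connect-and-exit idea, which is not the script referred to above: it uses smbclient's forward-slash service syntax (//server/share), so no backslash escaping is needed, and it reads the credentials from a file instead of the command line. The function name ping_share and the credentials file path are hypothetical; the address and share name come from the mount command in the question.

```shell
# Open an SMB session to the given share and close it again right away.
#   $1  service name with forward slashes, e.g. //192.168.0.103/Users
#   $2  credentials file (hypothetical path) containing lines such as
#         username = WindowsUsername
#         password = thepassword
# Returns smbclient's exit status, so a caller can log failures.
ping_share() {
    # -A reads the credentials file; -c runs the given command string
    # and then quits, here just "exit".
    smbclient "$1" -A "$2" -c 'exit' >/dev/null 2>&1
}

# Hypothetical crontab entry, once a minute, if the function above is
# saved in a script that ends by calling ping_share:
# * * * * * /usr/local/bin/smb-keepalive.sh
```

A call would look like ping_share //192.168.0.103/Users "$HOME/.smb-windowsbox". Keeping the password in a file readable only by its owner also avoids exposing it in the process list.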