I have run into a problem with NFS which has me baffled. I can't even come up with a plausible explanation for it. I have ten machines:
slave1 (10.0.0.11) through slave10 (10.0.0.20)
Each one runs an NFS server and exports a directory (/var/export).
I mount the directories as usual:
sudo mount 10.0.0.11:/var/export /mnt/slaves/1 -o soft
sudo mount 10.0.0.12:/var/export /mnt/slaves/2 -o soft
...
sudo mount 10.0.0.20:/var/export /mnt/slaves/10 -o soft
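Because slave N always lives at 10.0.0.(10+N), the ten commands above can also be generated with a loop. This is a minimal bash sketch, assuming the /mnt/slaves/N mount points that appear in the mount listing; mount_slaves and DRY_RUN are hypothetical names introduced here, and DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Mount slave N's export (10.0.0.(10+N):/var/export) on /mnt/slaves/N.
# DRY_RUN=1 (hypothetical helper variable) prints each command instead of running it.
mount_slaves() {
  local n ip target
  for n in $(seq 1 10); do
    ip="10.0.0.$((10 + n))"
    target="/mnt/slaves/$n"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "sudo mount $ip:/var/export $target -o soft"
    else
      sudo mkdir -p "$target"
      sudo mount "$ip:/var/export" "$target" -o soft
    fi
  done
}
```

Running DRY_RUN=1 mount_slaves first is a cheap way to confirm the IP-to-directory mapping before anything is actually mounted.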
At this point the mounts look like this:
10.0.0.11:/var/export on /mnt/slaves/1 type nfs (rw,soft,vers=4,addr=10.0.0.11,clientaddr=10.3.3.212)
10.0.0.12:/var/export on /mnt/slaves/2 type nfs (rw,soft,vers=4,addr=10.0.0.12,clientaddr=10.3.3.212)
10.0.0.13:/var/export on /mnt/slaves/3 type nfs (rw,soft,vers=4,addr=10.0.0.13,clientaddr=10.3.3.212)
10.0.0.14:/var/export on /mnt/slaves/4 type nfs (rw,soft,vers=4,addr=10.0.0.14,clientaddr=10.3.3.212)
10.0.0.15:/var/export on /mnt/slaves/5 type nfs (rw,soft,vers=4,addr=10.0.0.15,clientaddr=10.3.3.212)
10.0.0.16:/var/export on /mnt/slaves/6 type nfs (rw,soft,vers=4,addr=10.0.0.16,clientaddr=10.3.3.212)
10.0.0.17:/var/export on /mnt/slaves/7 type nfs (rw,soft,vers=4,addr=10.0.0.17,clientaddr=10.3.3.212)
10.0.0.18:/var/export on /mnt/slaves/8 type nfs (rw,soft,vers=4,addr=10.0.0.18,clientaddr=10.3.3.212)
10.0.0.19:/var/export on /mnt/slaves/9 type nfs (rw,soft,vers=4,addr=10.0.0.19,clientaddr=10.3.3.212)
10.0.0.20:/var/export on /mnt/slaves/10 type nfs (rw,soft,vers=4,addr=10.0.0.20,clientaddr=10.3.3.212)
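Every entry in this listing matches the intended mapping: /mnt/slaves/N reports addr=10.0.0.(10+N). A minimal sketch of a check for that, assuming the mount output format shown above (check_mount_table is a hypothetical helper name). It only inspects what the client reports, so a clean result does not prove that the files actually come from the right server.

```shell
#!/usr/bin/env bash
# Read `mount` output on stdin and print every /mnt/slaves/N entry whose
# addr= option is not 10.0.0.(10+N). Exits 1 if any mismatch is found.
check_mount_table() {
  awk '
    $3 ~ /^\/mnt\/slaves\/[0-9]+$/ {
      n = substr($3, length("/mnt/slaves/") + 1)
      want = "addr=10.0.0." (10 + n)
      # Require a delimiter after the address so .11 cannot match .110
      if (index($0, want ",") == 0 && index($0, want ")") == 0) {
        print "mismatch: " $0
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: mount | check_mount_table prints nothing and returns 0 when the mount table is consistent.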
Now for the part that has me baffled: some of these mounts randomly point at the wrong server!
For example, the files in /mnt/slaves/2 may come from slave5 (10.0.0.15), or perhaps from slave9, or, if I am lucky, from slave2. Unmounting and mounting again with exactly the same command makes the mount point at another, randomly chosen slave. By remounting enough times (and checking after each attempt whether I got the right slave) I can get the mounts right, but it is a big annoyance.
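The check-and-remount cycle described above can be scripted. This is a minimal bash sketch under two assumptions that are not in the post: each slave has written its own name into a marker file, here called /var/export/.slave-id (hypothetical), and the slaves' names are literally slave1 through slave10. It only automates the manual workaround; it does not address the cause.

```shell
#!/usr/bin/env bash
# Assumption: on each slave, `echo slaveN > /var/export/.slave-id` was run once,
# so the marker file identifies which server is really behind a mount.
MNT_ROOT="${MNT_ROOT:-/mnt/slaves}"
MARKER=".slave-id"   # hypothetical marker file name

# Succeed only if /mnt/slaves/N is serving slaveN's export.
check_mount() {
  local n="$1" got
  got="$(cat "$MNT_ROOT/$n/$MARKER" 2>/dev/null)" || return 1
  [ "$got" = "slave$n" ]
}

# Remount /mnt/slaves/N until the marker matches, giving up after MAX remounts.
remount_until_correct() {
  local n="$1" max="${2:-20}" try=0
  local ip="10.0.0.$((10 + n))"
  until check_mount "$n"; do
    if [ "$try" -ge "$max" ]; then
      echo "slave$n still wrong after $max remounts" >&2
      return 1
    fi
    try=$((try + 1))
    sudo umount "$MNT_ROOT/$n" 2>/dev/null || true
    sudo mount "$ip:/var/export" "$MNT_ROOT/$n" -o soft
  done
  echo "slave$n OK after $try remount(s)"
}
```

For example, for n in $(seq 1 10); do remount_until_correct "$n"; done walks all ten mounts and reports how many remounts each one needed.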
This problem shows up on Ubuntu 14.04.1 LTS.
Some info:
- It's not DNS. We are not using DNS.
- It's not an IP/MAC address conflict. All other protocols (SSH, HTTP, etc.) go to the right slave.
- I can't see anything in the logs; as far as I can tell, the master (the NFS client, 10.3.3.212) thinks it is communicating with the right slave.
- Using Wireshark, I see some traffic to the right slave: the master sends the SETCLIENTID command, so it does initiate communication. After that, communication just stops; no SETCLIENTID_CONFIRM is sent.
So at this point I wonder:
- Is this some kind of known bug?
- Is there a workaround for it?
I have managed to reproduce the same behavior on two different Ubuntu systems set up at different times, so it is not isolated to a single server.