I have a backup server that holds ZFS snapshots, which I export via NFS to the servers they back up so that files can be restored through a custom application written in-house. The issue is as follows:
NOTE: I am not using ZFS built-in NFS for a reason, so please don't tell me to use that!
This is all NFS v4
The host is running CentOS 6.2
The client is running CentOS 5.7
I have 8 NFS server processes started by default on the host.
On the backup server that holds the NFS shares, I can traverse the directory structure as deep as needed and see all expected files.
On the client, I can traverse the filesystem, but sometimes, and it really seems random, when I go 2 or more directories deep, I end up seeing the files from another server.
Here is an example:
[NFSSERVER /nfs/share]# ls -l
total 60
drwx--x--x 30 root root 4096 Feb 25 00:15 20120225
drwx--x--x 30 root root 4096 Feb 26 00:05 20120226
drwx--x--x 30 root root 4096 Feb 27 00:06 20120227
.....
so on
[NFSCLIENT /app/backups]# ls -l
total 60
drwx--x--x 30 nobody nobody 4096 Mar 2 00:25 20120225/
drwx--x--x 30 nobody nobody 4096 Mar 2 00:25 20120226/
drwx--x--x 30 nobody nobody 4096 Mar 2 00:25 20120227/
......
so on
You can see those are identical, as they should be.
This is where the problem starts. If I go into:
[NFSCLIENT /app/backups/20120225/home] # ls -l
When I run this ls -l on the client, sometimes I see the proper files and sometimes I see the home dir of another server.
If I go to [NFSSERVER /nfs/share/20120225/home]# ls -l
When I run this ls -l on the server, I always see the proper files. If I delete a folder in /nfs/share/ I can see the result on the client immediately. It is only when I go deeper that I see these "cross-mounted" filesystems.
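Because the mismatch is intermittent, one way to catch it is to save the directory listing on the server and compare it against the client's view several times. The following is only a sketch: the function name listing_matches and the reference file home.ref are made up for illustration and are not part of the setup described here.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the entry names in directory $2
# match the names stored in reference file $1 (one name per line).
listing_matches() {
    local reference="$1" dir="$2"
    [ "$(sort "$reference")" = "$(ls -1A "$dir" | sort)" ]
}

# Assumed usage (paths taken from the example above):
#   on the server:  ls -1A /nfs/share/20120225/home > home.ref
#   on the client, after copying home.ref over:
#   for i in 1 2 3 4 5; do
#       listing_matches home.ref /app/backups/20120225/home || echo "mismatch on pass $i"
#   done
```

A pass that reports a mismatch would confirm that the client is seeing a different directory than the one the server holds at that path.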
Here is a portion of my /etc/exports (hostnames changed)
/nfs *.domain.com(fsid=0,ro,nohide,no_root_squash)
/nfs/server1/20120308 *.domain.com(ro,nohide,no_root_squash)
/nfs/server1/20120309 *.domain.com(ro,nohide,no_root_squash)
/nfs/server1/20120310 *.domain.com(ro,nohide,no_root_squash)
/nfs/server1/20120311 *.domain.com(ro,nohide,no_root_squash)
/nfs/server2/20120308 *.domain.com(ro,nohide,no_root_squash)
/nfs/server2/20120309 *.domain.com(ro,nohide,no_root_squash)
/nfs/server2/20120310 *.domain.com(ro,nohide,no_root_squash)
/nfs/server2/20120311 *.domain.com(ro,nohide,no_root_squash)
/nfs/server3/20120204 *.domain.com(ro,nohide,no_root_squash)
/nfs/server3/20120205 *.domain.com(ro,nohide,no_root_squash)
/nfs/server3/20120206 *.domain.com(ro,nohide,no_root_squash)
/nfs/server3/20120207 *.domain.com(ro,nohide,no_root_squash)
If I remove all lines from /etc/exports EXCEPT the one that is cross-mounting (i.e., leaving only one entry in /etc/exports) and then reload the exports file, the client shows all of the proper directories.
So, stale NFS handles? More NFS servers running by default? Something else? Any ideas? I've been banging my head for a couple weeks now on this one.
UPDATE
This is the line of code my script runs that is setting up the directories that are being exported:
mount -t ext4 -o noload,ro /dev/zvol/backups/$HOST@$DATE"-00" /nfs/$HOST/$DATE
The /nfs/$HOST/$DATE folders are the ones being exported (as you can see in the exports file above)
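To make the mapping between snapshot device and exported directory explicit, here is a dry-run sketch that only prints the mount command for a given host and date instead of running it. The function name snapshot_mount_cmd is hypothetical; the command string itself is the one from the line above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the mount command
# for one host/date pair, following the pattern shown in the post.
snapshot_mount_cmd() {
    local host="$1" date="$2"
    printf 'mount -t ext4 -o noload,ro /dev/zvol/backups/%s@%s-00 /nfs/%s/%s\n' \
        "$host" "$date" "$host" "$date"
}

# Example: snapshot_mount_cmd server1 20120308
```

Printing the command first makes it easy to confirm that each snapshot lands on its own /nfs/$HOST/$DATE directory before anything is mounted.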
So it seems it was the wildcard exports, which, if you read the man page, are not recommended. I'd read that before but for some reason didn't fix it. I still think this is a "bug": it should work in theory, but in practice it doesn't.
Hope this helps others.
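For anyone wanting to drop the wildcards, here is a minimal sketch of generating export lines that name the client explicitly. It assumes the /nfs/<host>/<date> layout shown above and assumes each host's snapshots should be exported only to <host>.<domain>. The function name gen_exports is hypothetical, and this is not necessarily how the real exports file was built.

```shell
#!/bin/sh
# Hypothetical helper: print one export line per <root>/<host>/<date>
# directory, using an explicit client hostname instead of a wildcard.
# The fsid=0 root entry is not generated here.
gen_exports() {
    local root="$1" domain="$2" dir host
    for dir in "$root"/*/*/; do
        [ -d "$dir" ] || continue          # skip if the glob matched nothing
        dir=${dir%/}                       # drop the trailing slash
        host=$(basename "$(dirname "$dir")")
        printf '%s %s.%s(ro,nohide,no_root_squash)\n' "$dir" "$host" "$domain"
    done
}

# Example: gen_exports /nfs domain.com
```

The output can be reviewed, merged into /etc/exports by hand, and then activated with exportfs -ra.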
Example of my new exports file: