I'm managing a server running NFS v4 with Pacemaker/OpenAIS. NFS is configured to use TCP. When I migrate the NFS server to another node in the Pacemaker cluster, even though the metadata is persisted, connections from the clients 'hang' and eventually time out after 90 seconds. After that 90 seconds, the old mountpoint becomes 'stale' and the mounted files can no longer be accessed.
The 90 second grace period seems to be part of the server configuration and not the client configuration. I see this message on the server:
kernel: NFSD: starting 90-second grace period
If I restart the NFS client on the client nodes after I migrate (unmounting and then remounting the share), then I don't experience the stale-handle problem, but connections and file transfers are still interrupted.
Three questions:
- What is the 90 second grace period? What's it there for?
- How can I keep the files from going stale on the clients without restarting them after I migrate the NFS server to another node?
- Is it actually possible to migrate the NFS server without having large file uploads drop?
NFS stores a lot of the client's state on the server. Pacemaker/OpenAIS can't make up for NFS's shortcomings in this area. The grace period is there for the server and clients to recapture state. It's part of the protocol.
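If the 90 seconds themselves are the main pain, the grace and lease times are tunable on the server side. A minimal sketch, assuming a kernel that exposes the nfsd knobs under /proc (the paths and the need to set them before nfsd starts are from memory, so verify on your distribution; newer nfs-utils can also set these via /etc/nfs.conf):

# Shorten the NFSv4 lease and grace periods; must be written while nfsd is stopped
echo 10 > /proc/fs/nfsd/nfsv4leasetime
echo 10 > /proc/fs/nfsd/nfsv4gracetime

Shortening the grace period only reduces the window, it does not remove the need for the state handover described below.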
Anyway, it seems that you are not moving the client state over completely (e.g. the /var/lib/nfs contents). See this for ideas on what needs to be kept in sync, state-wise, on the server side.
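As a rough illustration of what "moving the state over" can look like, here is a minimal sketch that keeps /var/lib/nfs on the shared failover volume; the /srv/nfs mount point is an assumption:

# One-time: seed the shared volume with the current NFS state directory
mkdir -p /srv/nfs/nfs-state
rsync -a /var/lib/nfs/ /srv/nfs/nfs-state/

# On whichever node currently runs the NFS server resource,
# bind-mount the shared copy over the default location before starting nfsd
mount --bind /srv/nfs/nfs-state /var/lib/nfs

If memory serves, the ocf:heartbeat:nfsserver resource agent has a parameter (nfs_shared_infodir) that automates exactly this, so check whether your agent version supports it before scripting it by hand.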
Whereas with NFSv3 you could specify UDP transport for mounts to achieve near-instantaneous failover without the client or server being any the wiser, NFSv4 makes it a little more tricky. Foremost because TCP is the only available transport, and it's not in TCP's nature to have a connection ripped from beneath its feet and carry on as normal.
You can get the transfer time down, especially if you follow the advice about a common server state directory and maintaining FSIDs. Try to pull the listening interface before stopping NFS and make sure that mounts aren't retracted (exportfs -ua); see the sketch below.
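For illustration, a hand-driven version of that ordering might look roughly like this on the outgoing node; the floating IP, interface and unit names are assumptions, and in a real cluster Pacemaker's resource ordering should enforce the same sequence:

# Take the floating service IP away first so clients see a silent peer, not resets
ip addr del 192.0.2.10/24 dev eth0

# Stop nfsd without un-exporting (avoid exportfs -ua here)
systemctl stop nfs-server

# Release the shared volume so the other node can take it over
umount /srv/nfs

On the incoming node the same steps run in reverse: mount the volume, start nfs-server, then add the IP.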
But it will never be absolutely instant. You should also bear in mind that switching from one server and then straight back again is a no-no. The former server can still hold the previous connections open in a TIME_WAIT state and will refuse new connections for up to 20 minutes. A lot of the details on this Heartbeat wiki page are a bit old school but still pertinent.
Is the physical disc shared between the units, e.g. is it a SAN disc?
Are you exporting the disc with a constant fsid, e.g.

/share *(rw,sync,fsid=6667)

Otherwise:
The inode number, the IP address, and the major and minor numbers of the device that serves NFS all have to stay the same to keep the same NFS file handle. So use LVM on top of the device and keep the LVM major/minor numbers in sync across the nodes.
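To check and pin those numbers, something along these lines should work; the volume group and LV names are made up, and the option spelling may differ slightly between LVM versions:

# Compare the kernel major/minor of the LV backing the export on both nodes
lvs -o lv_name,lv_kernel_major,lv_kernel_minor vg_nfs

# Pin a persistent minor number so the device numbers match after failover
lvchange --persistent y --minor 200 /dev/vg_nfs/lv_share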
NFSv4 is a stateful protocol, meaning the parties (client and server) should be aware of each other at all times while they are engaged in communication. In other words, if the server is stopped and restarted somewhere else, the clients should disconnect before the move and then reconnect once the move is complete (I guess Pacemaker + NFSd != love :-)
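If you do go the disconnect/reconnect route, a rough sketch of the client-side step around a planned migration (the mount point and the nfs-vip hostname are placeholders):

# Before the migration: unmount cleanly (use -f or -l only if the old server is already gone)
umount /mnt/share

# After the new node is serving: remount over TCP against the floating address
mount -t nfs4 -o proto=tcp nfs-vip:/share /mnt/share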
Maybe you should try GlusterFS for HA/clustering.