I'm managing a server running NFS v4 with Pacemaker/OpenAIS. NFS is configured to use TCP. When I migrate the NFS server to another node in the Pacemaker cluster, even though the metadata is persisted, connections from the clients 'hang' and eventually time out after 90 seconds. After that 90 seconds, the old mountpoint becomes 'stale' and the mounted files can no longer be accessed.
The 90 second grace period seems to be part of the server configuration and not the client configuration. I see this message on the server:
kernel: NFSD: starting 90-second grace period
If I restart the NFS client on the client nodes after I migrate (unmounting and then remounting the share), then I don't experience the stale-handle problem, but connections and file transfers are still interrupted.
Three questions:
- What is the 90 second grace period? What's it there for?
- How can I keep the files from going stale on the clients without restarting them after I migrate the NFS server to another node?
- Is it actually possible to migrate the NFS server without having large file uploads drop?
NFS stores a lot of the client's state on the server. Pacemaker/OpenAIS can't make up for NFS's shortcomings in this area. The grace period is there for the server and clients to recapture state. It's part of the protocol.
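If the 90 seconds themselves are the main pain, the grace and lease times are tunable on the server side. A minimal sketch, assuming a kernel that exposes the nfsd knobs under /proc (the paths and the need to set them before nfsd starts are from memory, so verify on your distribution; newer nfs-utils can also set these via /etc/nfs.conf):

# Shorten the NFSv4 lease and grace periods; must be written while nfsd is stopped
echo 10 > /proc/fs/nfsd/nfsv4leasetime
echo 10 > /proc/fs/nfsd/nfsv4gracetime

Shortening the grace period only reduces the window, it does not remove the need for the state handover described below.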
Anyway, it seems that you are not moving the client state over completely (e.g. the /var/lib/nfs contents). See this for ideas on what needs to be kept in sync, state-wise, on the server side.
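As a rough illustration of what "moving the state over" can look like, here is a minimal sketch that keeps /var/lib/nfs on the shared failover volume; the /srv/nfs mount point is an assumption:

# One-time: seed the shared volume with the current NFS state directory
mkdir -p /srv/nfs/nfs-state
rsync -a /var/lib/nfs/ /srv/nfs/nfs-state/

# On whichever node currently runs the NFS server resource,
# bind-mount the shared copy over the default location before starting nfsd
mount --bind /srv/nfs/nfs-state /var/lib/nfs

If memory serves, the ocf:heartbeat:nfsserver resource agent has a parameter (nfs_shared_infodir) that automates exactly this, so check whether your agent version supports it before scripting it by hand.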
Whereas with NFSv3 you could specify UDP transport for mounts to achieve near-instantaneous failover without the client or server being any the wiser, NFSv4 makes it a little more tricky. Foremost because TCP is the only available transport, and it's not in TCP's nature to have a connection ripped from beneath its feet and carry on as normal.
You can get the transfer time down, especially if you follow the advice about a common server state directory and maintaining FSIDs. Try to pull the listening interface before stopping NFS and make sure that mounts aren't retracted (exportfs -ua); see the sketch below.
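For illustration, a hand-driven version of that ordering might look roughly like this on the outgoing node; the floating IP, interface and unit names are assumptions, and in a real cluster Pacemaker's resource ordering should enforce the same sequence:

# Take the floating service IP away first so clients see a silent peer, not resets
ip addr del 192.0.2.10/24 dev eth0

# Stop nfsd without un-exporting (avoid exportfs -ua here)
systemctl stop nfs-server

# Release the shared volume so the other node can take it over
umount /srv/nfs

On the incoming node the same steps run in reverse: mount the volume, start nfs-server, then add the IP.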
But it will never be absolutely instant. You should also bear in mind that switching from one server and then straight back again is a no-no. The former server can still hold the previous connections open in a TIME_WAIT state and will refuse new connections for up to 20 minutes. A lot of the details on this Heartbeat wiki page are a bit old school but still pertinent.
Is the physical disc shared between the units, e.g. is it a SAN disc?
Are you exporting the disc with a constant fsid, e.g.

/share *(rw,sync,fsid=6667)

Otherwise:
The inode number, the IP address, and the major and minor numbers of the device that serves NFS all have to stay the same to keep the same NFS file handle. So use LVM on top of the device and keep the LVM major/minor numbers in sync across the nodes.
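To check and pin those numbers, something along these lines should work; the volume group and LV names are made up, and the option spelling may differ slightly between LVM versions:

# Compare the kernel major/minor of the LV backing the export on both nodes
lvs -o lv_name,lv_kernel_major,lv_kernel_minor vg_nfs

# Pin a persistent minor number so the device numbers match after failover
lvchange --persistent y --minor 200 /dev/vg_nfs/lv_share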
NFSv4 is a stateful protocol, meaning the parties (client and server) should be aware of each other at all times while they are engaged in communication. In other words, if the server is stopped and restarted somewhere else, the clients should disconnect before the move and then reconnect once the move is complete (I guess Pacemaker + NFSd != love :-)
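If you do go the disconnect/reconnect route, a rough sketch of the client-side step around a planned migration (the mount point and the nfs-vip hostname are placeholders):

# Before the migration: unmount cleanly (use -f or -l only if the old server is already gone)
umount /mnt/share

# After the new node is serving: remount over TCP against the floating address
mount -t nfs4 -o proto=tcp nfs-vip:/share /mnt/share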
Maybe you should try GlusterFS for HA/clustering.