We are implementing DRBD + Heartbeat with two servers to provide a file system with failover. These servers expose an NFS service to other servers.
Currently DRBD is working just fine, but when we test switching from one server to the other, the folders mounted through NFS on the other servers just hang.
Is there any way to make this failover transparent to NFS, or do we necessarily have to re-mount those NFS-mounted folders on the clients?
The problem here is that you have made a redundant storage array using DRBD, but you are running two independent NFS daemons on top of the same shared data. NFS is stateful - unless you can transfer that state as well, you will have serious problems on failover. Solaris HA setups do have daemons that take care of this problem. For a Linux installation, you will have to make sure that your NFS state directory (configurable, typically /var/lib/nfs) is located on the shared disk for both servers.
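A rough sketch of how that relocation might be done (the /srv/nfs mountpoint and the nfs-kernel-server init script name are assumptions about your layout and distro):

    # on the node that currently has the DRBD primary and the filesystem mounted
    service nfs-kernel-server stop
    mv /var/lib/nfs /srv/nfs/var-lib-nfs      # move the NFS state onto the shared disk
    ln -s /srv/nfs/var-lib-nfs /var/lib/nfs   # leave a symlink in the standard location
    service nfs-kernel-server start

    # on the standby node, point the same symlink at the (not yet mounted) shared copy
    service nfs-kernel-server stop
    mv /var/lib/nfs /var/lib/nfs.orig
    ln -s /srv/nfs/var-lib-nfs /var/lib/nfs

The symlink on the standby dangles until the DRBD filesystem is mounted there during failover, which is fine as long as the NFS daemon is only started after the filesystem resource.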
Stick with Heartbeat or Corosync for failure detection and failover - it generally does the Right Thing (tm) when configured with a quorum. Other failover techniques might be too focused on just providing a virtual IP (e.g. VRRP) and would not suit your needs. See http://linux-ha.org for further details and additional components for a cluster setup.
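If you go with Corosync, a minimal sketch of the quorum section in corosync.conf for a two-node cluster could look like this (the two_node setting is an assumption about your topology, and with only two nodes you still want fencing/STONITH to handle split-brain):

    quorum {
        provider: corosync_votequorum
        two_node: 1    # relax quorum rules for a two-node cluster
    }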
I recommend that you read this HOWTO on highly available NFS using NFSv4, DRBD and Pacemaker. It contains detailed instructions and explanations as well as important details on how to provide a highly available NFS service. We have put a few such HA-NFS setups in production now and they work very well.
Part of such an HA setup is to move away from the old Heartbeat system (the one that uses /etc/ha.d/haresources and /etc/ha.d/ha.cf) and use the much more capable and robust Pacemaker stack. It is quite a transition from old Heartbeat, with a bit of a learning curve, but in the end it means you have a cluster running that is worth its name.

The HOWTO is written by Linbit, the company that created and maintains DRBD and contributes much to the whole Linux HA stack. Unfortunately, (free) registration on their website is required to access the tech guides, but they are well written and very useful.
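For illustration, a condensed crm-shell sketch in the spirit of that guide could look roughly like this; the resource names, the DRBD resource r0, /dev/drbd0, /srv/nfs and the IP are all assumptions to be replaced with your own values:

    primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15
    ms ms_drbd_nfs p_drbd_nfs \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    primitive p_fs_nfs ocf:heartbeat:Filesystem \
        params device=/dev/drbd0 directory=/srv/nfs fstype=ext4
    primitive p_nfsserver ocf:heartbeat:nfsserver \
        params nfs_shared_infodir=/srv/nfs/nfsinfo
    primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.100 cidr_netmask=24
    group g_nfs p_fs_nfs p_nfsserver p_ip_nfs
    colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
    order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start

The group keeps the filesystem, the NFS daemon and the virtual IP together and starts them in that order on whichever node currently holds the DRBD master.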
The best way I can think of to make this transparent is to use a virtual IP and virtual MAC address, together with switches that handle the transition correctly, i.e. update their tables when they see a gratuitous ARP (so you don't have to wait for ARP caches to expire, which can take long enough to make your NFS mounts go stale).
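If you ever need to trigger that announcement by hand (say, from a custom failover script), a gratuitous/unsolicited ARP for the virtual IP is a one-liner; the interface and address here are assumptions:

    # announce that 192.168.1.100 is now reachable via this host's eth0 (iputils arping)
    arping -U -c 3 -I eth0 192.168.1.100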
Something like CARP is probably the way to go for the IP failover - it is available on all the *BSDs, and on Linux there is a userspace implementation, ucarp. Obviously give it some testing to make sure it works the way you want (it sounds like you're already in a testing phase, so you're in a good place).
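A minimal ucarp invocation might look like the following; the interface, addresses, password and the up/down scripts (which would add/remove the virtual IP, and could also start/stop the NFS service) are assumptions:

    # run on both nodes with the same vhid and password
    ucarp --interface=eth0 --srcip=192.168.1.11 --vhid=1 --pass=secret \
          --addr=192.168.1.100 \
          --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh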
Make sure the exported filesystem has the same major/minor device number on both servers (if you use the same DRBD device on both sides this should be true), and use a virtual IP for your NFS service. NFS file handles encode the device number, so if it changes after failover the clients' handles go stale.
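A quick check on both nodes (the device path is an assumption):

    # major:minor of the exported device -- must be identical on both servers
    ls -l /dev/drbd0
    stat -c '%t:%T' /dev/drbd0   # same numbers, printed in hex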
In Heartbeat, use this order of resources:
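A hedged /etc/ha.d/haresources sketch - hostname, DRBD resource name, device, mountpoint, filesystem type, init script name and IP are assumptions - with DRBD first, then the filesystem, then the NFS server, and the virtual IP last:

    server1 drbddisk::r0 Filesystem::/dev/drbd0::/srv/nfs::ext4 nfs-kernel-server IPaddr2::192.168.1.100/24/eth0

Heartbeat acquires these left to right on takeover and releases them right to left, which is why the IP belongs at the end.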
It is important to put the VIP last - else your clients will lose their NFS connection instead of continuously retrying it.
BTW: Putting an IP as a resource into Heartbeat will do a gratuitous ARP upon failover as well - so you don't have to care about that (normally).