We have a shiny new multihomed Windows Server 2008 (64-bit) cluster exhibiting some strange behavior.
The problem:
Everything works perfectly until we fail over one of the cluster groups.
Prior to a failover, both internal and external clients can connect, and all domain authentication works properly.
Once we fail over a cluster group, internal clients in different subnets lose connectivity (as if the static routes had disappeared), and you can no longer log into the server using a domain account (the Domain Controller is in a different subnet).
All DNS lookups occur via the Public/Internet interface. It is as if the server(s) can no longer find/resolve the Internal/Domain DNS servers.
Rebooting fixes the problem until the next group failover.
Setting the default gateway to the Internal network also works, but at the cost of having to create static routes for the entire Internet (I don't have the time).
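To make the symptom concrete, here is a rough sketch of longest-prefix route selection. The addresses and the two-interface route table are made up for illustration (nothing here is taken from the actual servers); it only shows why traffic for a remote internal subnet heads out the Public default gateway when a static route is missing or ignored.

```python
import ipaddress

def pick_route(routes, destination):
    """Return the (network, interface) entry with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in routes if dest in net]
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical addressing, for illustration only.
DEFAULT = (ipaddress.ip_network("0.0.0.0/0"), "Public")      # default gateway
ON_LINK = (ipaddress.ip_network("10.1.0.0/24"), "Internal")  # directly attached subnet
STATIC = (ipaddress.ip_network("10.2.0.0/16"), "Internal")   # persistent static route

healthy = [DEFAULT, ON_LINK, STATIC]
after_failover = [DEFAULT, ON_LINK]  # static route missing or ignored

dc = "10.2.0.10"  # hypothetical domain controller in another internal subnet
print(pick_route(healthy, dc)[1])         # Internal
print(pick_route(after_failover, dc)[1])  # Public
```

Comparing the output of "route print" before and after a failover would show whether the real route table changes in this way or whether the routes are present but not being used.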
The network adapters are as follows:
Heartbeat Network (crossover cable between two servers)
Internal Network (Active Directory based Network w/ DNS no WINS)
Public Network (Internet Connection - Default Gateway - w/ DNS)
Microsoft Cluster Failover Virtual Adapter (this is hidden in most cases but you can see it when you do an "ipconfig /all")
Other information:
This system must provide services to both the Internal and Public networks
The Public/Internet connection is the default gateway
We have entered persistent static routes to several subnets off the Internal network
Each cluster group has a network name and associated IP address
The binding order of the network interfaces is:
1 Internal
2 Public
3 Heartbeat
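For anyone reproducing this setup, the persistent static routes mentioned above are the kind created with "route -p add". A small sketch that builds those command lines from CIDR notation follows; the subnets and gateway address are hypothetical placeholders, not the values used on these servers.

```python
import ipaddress

def persistent_route_command(subnet, gateway):
    """Build a Windows 'route -p add' command line for one subnet."""
    net = ipaddress.ip_network(subnet)
    ipaddress.ip_address(gateway)  # raises ValueError if the gateway is malformed
    return ["route", "-p", "add", str(net.network_address),
            "mask", str(net.netmask), gateway]

# Hypothetical internal subnets reached through a hypothetical internal router.
INTERNAL_GATEWAY = "10.1.0.1"
for subnet in ["10.2.0.0/16", "192.168.50.0/24"]:
    print(" ".join(persistent_route_command(subnet, INTERNAL_GATEWAY)))
```

The script only prints the commands; they would still have to be run from an elevated prompt on each cluster node.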
We're stumped. We have used this configuration on older Windows 2K clusters. We have also used this configuration on standalone Windows 2K3 servers. Any suggestions would be greatly appreciated.
Todd
I think I have this exact same problem on a new 2008 R2 cluster with an EqualLogic. What is the solution? I have a Microsoft case open and they're pointing me to the weak/strong host model, but it is not helping.
Here is the solution for anything with Broadcom NICs (and maybe others):
http://support.microsoft.com/default.aspx?scid=kb;EN-US;951037
You must disable RSS/Chimney/NetDMA. That resolved my problems immediately, after support calls to Dell and Microsoft!
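A rough sketch of what disabling those three features looks like, going by my recollection of the KB article rather than anything posted in this thread. The two netsh settings are the usual way to turn off RSS and TCP Chimney on Server 2008; the EnableTCPA registry value for NetDMA is an assumption on my part, so check it against the KB before relying on it. The helper names are made up for this example.

```python
import subprocess

def offload_disable_commands():
    """Commands that disable RSS, TCP Chimney and NetDMA (see assumptions above)."""
    return [
        ["netsh", "int", "tcp", "set", "global", "rss=disabled"],
        ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
        # Assumption: NetDMA is controlled by the EnableTCPA registry value.
        ["reg", "add", r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters",
         "/v", "EnableTCPA", "/t", "REG_DWORD", "/d", "0", "/f"],
    ]

def apply(commands, dry_run=True):
    """Print each command, or run it when dry_run is False (needs an elevated prompt)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

apply(offload_disable_commands())  # dry run: only prints the commands
```

Run it with dry_run=False on each node only after confirming the settings against the KB. "netsh int tcp show global" shows the current state, and a registry change of this kind normally needs a reboot to take effect.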
The following post on TechNet by John Marlin, Senior Support Escalation Engineer at Microsoft, described exactly what was happening and provides the solution.
He described the problem as:
We followed his advice and things started working! We did have some additional DNS problems, but those were easier to solve. Windows Server 2008, when clustered, is really a different beast from a network perspective than previous versions.
Note: We also had lots of problems with applications binding to the virtual cluster failover adapter/address, and other issues with multicast/UDP traffic and the Windows Firewall, but that is for another post.