We have run into a strange situation in which a SQL Server 2008 single-node cluster hangs. As background, we are rebuilding a Windows Server 2003/SQL Server 2005 two-node cluster using Windows 2008 and SQL Server 2008. Here's the timeline:
- Evicted the passive node (server B) from the Windows 2003/SQL 2005 cluster. The active node now functions as a single-node cluster with no problems.
- Wiped server B's disks and installed Windows 2008 and SQL Server 2008 as a single-node cluster. Since we do not want the two clusters to communicate yet, we left the cluster's private network "heartbeat" adapter unconfigured. The cluster comes up and functions normally.
- Moved all databases to the new cluster. Cluster continues to function normally.
- Turned off server A (old cluster) in preparation for rebuilding as the second node of the new cluster.
- SQL Server instance on server B (new cluster) locks up, even though it should have no knowledge of or interaction with server A.
- Restarted server A. SQL Server instance on server B (new cluster) immediately begins working again.
Things we have tried and observed:
- The new cluster's name responds to ping and NetBIOS requests, even while SQL Server is hung.
- We have confirmed that no IP address is assigned to the old heartbeat adapter, and it is not pulling an IP address from DHCP.
- Disabling the heartbeat network adapter entirely makes no difference: the instance still hangs.
- No errors were generated in any logs, Windows or SQL Server.
- When the problem first occurred, the instance sat in the hung state for quite some time (well over 10 minutes) before anyone figured out what was going on. This would seem to eliminate any sort of normal cluster timeout in which it would have been searching for the other node (even if one had been configured).
Server B is running Windows 2008 SP2, fully patched, and SQL Server 2008 SP1 CU7 (10.0.2775).
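
To separate "the cluster name answers ping" from "the SQL Server listener accepts connections", a small probe along these lines could be run from a client machine while server A is off. This is only a sketch under assumptions: the host name `NEWCLUSTER` is a placeholder, and port 1433 assumes a default instance on the default port, which may not match the actual configuration.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP connection and report how it ended.

    Returns 'open', 'refused', 'timeout', or 'error: <details>'.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port
        return "refused"
    except socket.timeout:
        # No answer within the timeout (filtered, or host not responding)
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable network, and similar
        return f"error: {exc}"


if __name__ == "__main__":
    # Placeholder host name and assumed default port; adjust to the real setup
    print(probe_tcp("NEWCLUSTER", 1433))
```

A result of "open" only shows that the listener completes the TCP handshake, not that queries are being served, but comparing the result with server A on and off might show whether the hang is at the network layer or inside the instance.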