I've got a small failover cluster that I run for my company's websites. During a RAM upgrade of the standby server, our websites started showing errors about not being able to access the database server. I verified that the instance was indeed up and that the server was accessible via Remote Desktop. I also tried a SQL connection to it and it worked, but that might have been after it became available again.
This happened on and off until we rolled back the hardware changes on the standby server and brought it back up.
There was nothing of interest in the SQL Server log, and the log is continuous for the whole duration of the problem, so the SQL Server service was not restarted. The Event Viewer is more interesting, since it shows events relating to the heartbeat network card, but I don't see how that would affect the availability of the server, since the standby node was offline. I'd appreciate any help you can provide; it's not very redundant if the setup depends on the standby server being up. :)
Here are the event logs from the time of the problem. I've included all of them, since I can't see what could possibly be causing it.
Event log: http://hlekkir.com:800/htmltable.htm
Have you got the log from the web server that was failing to get a SQL connection? There might be a clue in there.
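If the web server didn't record the failures with useful timestamps, a small probe run from the web server can log exactly when the SQL Server port stops answering, which you can then line up against the event log. This is a minimal sketch, assuming a default instance listening on TCP 1433 (a named instance or non-default port needs a different value). A successful TCP connect only shows that the listener is reachable, not that a login or query would succeed.

```python
import socket
import time
from datetime import datetime


def probe(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # The connection is closed again immediately; we only care whether it opened.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def monitor(host: str, port: int = 1433, interval: float = 5.0,
            checks: int = 12) -> list[tuple[str, bool]]:
    """Probe host:port `checks` times, `interval` seconds apart, printing and
    returning a timestamped UP/DOWN result for each attempt."""
    results = []
    for i in range(checks):
        ok = probe(host, port)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {'UP' if ok else 'DOWN'}")
        results.append((stamp, ok))
        if i < checks - 1:
            time.sleep(interval)
    return results
```

For example, `monitor("sqlcluster.example.local")` would check every five seconds for a minute; the host name is hypothetical and should be the cluster's virtual SQL network name, so the probe follows whichever node currently owns the instance.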
This is a bit concerning though:
"The system detected that network adapter Local Area Connection* 9 was connected to the network, and has initiated normal operation."
"The system detected that network adapter Cluster was connected to the network, and has initiated normal operation."
These came within 30 seconds of each other, at the end of the outage.
What is Local Area Connection* 9? I'm assuming "Cluster" is the heartbeat connection to the other node, so would that make this one your domain connection? If so, given that we don't see an entry for it going down, it sounds like it was also unplugged during your maintenance.
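To check whether other adapter reconnects cluster together like the two messages quoted above, you could export the System log and scan it for "was connected to the network" messages that land within 30 seconds of each other. This is a sketch only: it assumes the export has already been parsed into `(datetime, message)` pairs, and it matches on the message text quoted above rather than on any particular event ID.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable

# Matches the message text quoted above and captures the adapter name.
CONNECTED = re.compile(r"network adapter (.+?) was connected to the network")


def reconnect_pairs(
    events: Iterable[tuple[datetime, str]],
    window: timedelta = timedelta(seconds=30),
) -> list[tuple[datetime, str, datetime, str]]:
    """Return (time1, adapter1, time2, adapter2) for every pair of adapter
    reconnect events that occurred within `window` of each other."""
    hits = []
    for when, message in events:
        match = CONNECTED.search(message)
        if match:
            hits.append((when, match.group(1)))
    hits.sort()  # chronological order

    pairs = []
    for i, (t1, name1) in enumerate(hits):
        for t2, name2 in hits[i + 1:]:
            if t2 - t1 > window:
                break  # hits are sorted, so later ones are even further away
            pairs.append((t1, name1, t2, name2))
    return pairs
```

Any pair that pulls in an adapter other than "Cluster" suggests that something besides the heartbeat link was being disturbed during the maintenance.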