When you combine failover clustering and database mirroring in SQL Server, you need to change the mirroring partner timeout value so that the local cluster gets a chance to fail over before database mirroring fails over. I'm curious as to what people are doing when combining these technologies - I teach various HA classes and this is not too common a combination.
Here are my questions IF you are using failover clustering and database mirroring combined. If you could answer them all in each response, that would be very useful to me. I don't need an explanation of why things need to be changed or how the technologies work - I used to own them both when at Microsoft - I'm interested in industry practices now that the possibility of marrying them has been out there for 4 years.
1) how long, on average, does it take for a clustered SQL Server instance to fail over for you? (I know it depends on how much crash recovery is required, but what's an average for you?)
2) for these same instances, what do you set the mirroring partner timeout to?
3) are you comfortable with the fact that a real cluster outage could occur and it may be quite a while until mirroring notices that the failure has occurred because you've bumped the mirroring partner timeout up?
Thanks for all responses!
Paul, 1. Typically a few seconds, up to a couple of minutes depending on ... (you know the rest).
2. Were I to set up auto failover I'd go for several minutes. That way site-to-site VPN connections would have time to come back up, the cluster could restart, etc. At the minimum I'd probably go with 4 minutes longer than it would take the nodes of the cluster to restart in the event of a local power outage.
3. Yep. DR issues are usually defined as a failure lasting over an hour. Besides, it'll probably take longer than that for the global load balancer to notice the other site is down and update all the DNS, plus the TTL time on the DNS. This total time should be the upper end of the amount of time for auto failover.
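To make the arithmetic in that reply concrete, here is a minimal Python sketch of the window it describes: a floor of cluster restart time plus roughly 4 minutes, and a ceiling equal to the time clients need to be redirected (load balancer detection, DNS changes, DNS TTL). The function name, its parameters, and the sample figures are all hypothetical and only illustrate the rule of thumb; they are not measurements from any real site.

```python
def partner_timeout_window(cluster_restart_s, lb_detect_s, dns_update_s,
                           dns_ttl_s, margin_s=240):
    """Return (lower, upper) bounds in seconds for a mirroring partner timeout.

    lower: time for the cluster nodes to restart plus a safety margin
           (4 minutes by default, as suggested in the reply above).
    upper: time for the load balancer to notice the outage, for the DNS
           changes to be made, and for the DNS TTL to expire.
    All inputs are assumed to be estimates supplied by the caller.
    """
    lower = cluster_restart_s + margin_s
    upper = lb_detect_s + dns_update_s + dns_ttl_s
    if lower > upper:
        # No timeout satisfies both constraints with these estimates.
        raise ValueError("empty window: lower bound exceeds upper bound")
    return lower, upper


# Hypothetical figures: 3 min cluster restart, 5 min load balancer
# detection, 2 min DNS changes, 5 min DNS TTL.
window = partner_timeout_window(180, 300, 120, 300)
print(window)  # (420, 720)
```

With these made-up numbers, any partner timeout between 7 and 12 minutes would fit the reasoning in the reply.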
I wasn't involved in the original design, but this is how things have been set up:
There is another stand alone server at each site that is able to act as a witness. The witness currently runs on the site where all the principals are.
I've never seen a cluster failover occur. Mirror failovers are fast, I'd say about 10 seconds at the most.
Partner timeout is 30 seconds for all databases.
It was by design that a mirror failover will occur before a cluster failover. The databases are clustered as an additional level of redundancy only, although each instance is configured to only use half the available RAM on the server.
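For anyone wanting to apply one partner timeout across several mirrored databases, as in the setup just described, the following Python sketch generates the T-SQL. It relies on the `ALTER DATABASE ... SET PARTNER TIMEOUT` statement and the `mirroring_connection_timeout` column of `sys.database_mirroring`. The helper name and the database names are hypothetical, and the generated statements are expected to be run on the principal for each database.

```python
# Query to review the current timeout of every mirrored database.
CHECK_TIMEOUTS_SQL = (
    "SELECT DB_NAME(database_id) AS database_name, "
    "mirroring_connection_timeout "
    "FROM sys.database_mirroring "
    "WHERE mirroring_guid IS NOT NULL;"
)


def set_partner_timeout_sql(database, seconds):
    """Build the T-SQL that sets the mirroring partner timeout in seconds."""
    if not isinstance(seconds, int) or seconds < 1:
        raise ValueError("timeout must be a positive whole number of seconds")
    # Escape closing brackets so the name is safe inside [ ] delimiters.
    quoted = database.replace("]", "]]")
    return f"ALTER DATABASE [{quoted}] SET PARTNER TIMEOUT {seconds};"


# Hypothetical database names, all given the 30-second timeout from above.
for name in ["Sales", "Inventory"]:
    print(set_partner_timeout_sql(name, 30))
```

Generating the statements rather than running them keeps the sketch free of any driver or connection assumptions; the output can be pasted into a query window on the principal.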