I have created the following structure:
- Two hardware servers with Windows Server 2012 R2 and SQL Server 2012 installed. Windows was then fully updated and SQL Server was patched to SP4; the current version is reported as 11.0.7493.
- A WSFC formed from both servers, with a file share witness hosted elsewhere
- Two standalone SQL Server instances, one on each server, with AlwaysOn enabled
- One AlwaysOn availability group with one database and one listener
This works as intended for connecting to the database and for manual failover driven from SSMS (T-SQL). (I had to resolve an issue with local SQL Server logins having different SIDs, since the app uses SQL Server authentication, but it works.) Then I tried to simulate a SQL Server crash by stopping the SQL Server instance: BAM, the AAG failed completely.
Investigation with Get-ClusterLog showed that WSFC reported "Not failing over group XXX, failoverCount 3, failoverThresholdSetting 1, lastFailover 1601/01/01-00:00:00.000". Okay, I said, let's wait 6 hours (the default period after which WSFC resets the failover count) and try again: BAM, failoverCount rose to 4. I then tried lowering the failover period to 1 hour and raising the threshold to 5: again nothing, and the failover count rose beyond the threshold again. After some Googling I found claims that this period can be lowered to zero, effectively resetting the failover count instantly. NO WAY, the count still grows whenever I try to simulate a failover. However, when I simply restart the current primary cluster node together with its SQL Server, the AAG properly moves to the remaining node and the local database replica becomes primary.
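To track these counters across test runs, the refusal line can be pulled apart with a small script. This is only a sketch: the pattern is written against the single log line quoted above, the name parse_refusal and the returned keys are made up for illustration, and other cluster log lines may be formatted differently. It also flags a lastFailover of 1601/01/01, which is the zero point of Windows FILETIME timestamps and so probably means "never recorded" rather than a real date.

```python
import re
from datetime import datetime

# Windows FILETIME counts from 1601-01-01, so a zero timestamp prints as this date.
FILETIME_EPOCH = datetime(1601, 1, 1)

# Pattern based only on the one log line quoted in the question (an assumption).
REFUSAL_PATTERN = re.compile(
    r"Not failing over group (?P<group>[^,]+), "
    r"failoverCount (?P<count>\d+), "
    r"failoverThresholdSetting (?P<threshold>\d+), "
    r"lastFailover (?P<last>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})"
)


def parse_refusal(line):
    """Return the fields of a 'Not failing over group' line, or None if no match."""
    match = REFUSAL_PATTERN.search(line)
    if match is None:
        return None
    last = datetime.strptime(match.group("last"), "%Y/%m/%d-%H:%M:%S.%f")
    return {
        "group": match.group("group"),
        "failover_count": int(match.group("count")),
        "threshold": int(match.group("threshold")),
        "last_failover": last,
        # True when the timestamp equals the FILETIME zero point.
        "last_failover_unset": last == FILETIME_EPOCH,
    }


if __name__ == "__main__":
    sample = (
        "Not failing over group XXX, failoverCount 3, "
        "failoverThresholdSetting 1, lastFailover 1601/01/01-00:00:00.000"
    )
    print(parse_refusal(sample))
```

Feeding it each matching line from the generated cluster log makes it easy to see whether failoverCount ever drops back between attempts.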
So, how can I make a SQL Server 2012 AAG fail over to the other node when SQL Server goes down while the host remains operational?
As a side note, why does the last failover time show zeroes? Maybe this is the cause, or part of the symptoms that show where to look?