I am working on a solution that uses SQL Server 2012 SP2, but without AlwaysOn availability groups. This is because we rely on cross-database transactions, which do not work in that scenario.
Note: This is being addressed as we speak; I mention it only as background to my problem.
We are using an HP 3PAR StoreServ solution to get site-to-site synchronous replication via the SAN. This allows DR scenarios to work cross-site, so we can fail over to our secondary.
My concern lies with the stated RPO of 0, because there appear to be scenarios in which data can be lost and corruption can occur. For example, the link between sites is severed, and then the primary goes down.
My questions are as follows:
- Does the SAN deny writes until synchronisation has completed, OR, if the link is severed, does the SAN buffer the block changes until the connection is restored?
- If the link is severed during a transaction log write and a DR failover occurs, doesn't this mean that a potentially corrupt transaction log will have been written to the secondary site, and that we will therefore incur data loss? The data loss would arise because the primary was able to commit but the secondary was not able to synchronise.
- Is an RPO of zero ever a guarantee across the whole stack (SQL Server / Memory / Network / SAN / IO)?
From the HP 3PAR StoreServ whitepaper "Replication Solutions for demanding disaster tolerant environments", page 6:
For synchronous replication solutions the RPO of the solution is always zero. For asynchronous replication solutions however the RPO will always be something greater than zero. Asynchronous Periodic mode is asynchronous replication. As a result, when designing a solution that uses Asynchronous Periodic replication, RPO becomes a concern.
The SAN guarantees an RPO of 0, so is it the case that when the network dies, the SAN does not allow the change to reach the I/O?
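To make the two behaviours I am asking about concrete, here is a toy Python model. It is only a sketch of my question, not a description of how 3PAR actually behaves: the class `ReplicatedVolume`, the `LinkDownPolicy` values and the `simulate` helper are all names I made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkDownPolicy(Enum):
    # Both policies are hypothetical labels for the two alternatives in my question.
    REJECT_WRITES = "reject"    # refuse host writes while the replication link is down
    BUFFER_LOCALLY = "buffer"   # accept writes on the primary and queue them for resync


@dataclass
class ReplicatedVolume:
    policy: LinkDownPolicy
    link_up: bool = True
    primary: list = field(default_factory=list)
    secondary: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def write(self, block: str) -> bool:
        """Return True if the write is acknowledged to the host."""
        if self.link_up:
            # Synchronous case: both sites hold the block before the acknowledgement.
            self.primary.append(block)
            self.secondary.append(block)
            return True
        if self.policy is LinkDownPolicy.REJECT_WRITES:
            return False
        # Buffering case: the primary commits, the secondary falls behind.
        self.primary.append(block)
        self.pending.append(block)
        return True

    def lost_on_failover(self) -> list:
        """Blocks acknowledged to the host but absent from the secondary."""
        return [b for b in self.primary if b not in self.secondary]


def simulate(policy: LinkDownPolicy):
    vol = ReplicatedVolume(policy)
    vol.write("log-record-1")
    vol.link_up = False                  # the inter-site link is severed
    acked = vol.write("log-record-2")    # SQL Server tries to harden a log record
    return acked, vol.lost_on_failover() # then the primary site goes down


print(simulate(LinkDownPolicy.REJECT_WRITES))
print(simulate(LinkDownPolicy.BUFFER_LOCALLY))
```

In this model only the rejecting policy keeps the secondary free of missing acknowledged writes, at the cost of stalling the database while the link is down. The buffering policy keeps the primary available but leaves an acknowledged write that a failover would lose, which is the scenario I am worried about.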
Update:
I found this information on page 12 of the reference above:
Synchronous Long Distance topology
The Remote Copy Synchronous Long Distance (SLD) topology is the only topology supported today that allows volumes in a Remote Copy Volume Group to be replicated from one source StoreServ array to two different target StoreServ arrays. It does this by replicating data synchronously between two StoreServ arrays (the “Source” and “Sync Target” arrays) while simultaneously replicating the same data via Periodic Asynchronous mode to a third StoreServ array, the disaster recovery or “Async Target” array. The user has the option of treating the two sync arrays in an active-active manner, failing over between them if/when a failure in a data center dictates a failover is necessary and resuming operations on the “Sync Target” array. This provides a failover solution that delivers an RPO equal to zero due to the synchronous nature of the replication that occurs between the sync arrays.
On failover to a Sync Target array, the passive Asynchronous Periodic link between that array and the Async Target array becomes active and any data that was replicated to the Sync Target but that had not yet made it to the Async Target array is sent from the Sync Target array to the Async Target bringing the Async Target array up to date with the last write that occurred to the Sync Target. Operations then continue in the Sync Target data center and it continues to replicate data to the Async Target array.
From this information, it appears that you do need a third endpoint participating in asynchronous replication, so that the secondary site can still be informed of changes when the network link breaks.
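My reading of the SLD failover described in the quoted passage, sketched as a toy Python model. This is an assumption-laden illustration, not 3PAR code: `Array`, `host_write`, `periodic_async_cycle`, `fail_over_to_sync_target` and `demo` are hypothetical names, and the model ignores link failures on the synchronous leg entirely.

```python
class Array:
    """Hypothetical stand-in for a StoreServ array: just an ordered list of writes."""

    def __init__(self, name):
        self.name = name
        self.writes = []


def host_write(source, sync_target, block):
    # Synchronous leg: both arrays hold the block before the host is acknowledged.
    source.writes.append(block)
    sync_target.writes.append(block)


def periodic_async_cycle(sender, async_target):
    # Periodic asynchronous leg: ship whatever the async target is still missing.
    missing = sender.writes[len(async_target.writes):]
    async_target.writes.extend(missing)
    return missing


def fail_over_to_sync_target(sync_target, async_target):
    # On failover, the previously passive link from the sync target becomes active
    # and brings the async target up to date with the sync target's last write.
    return periodic_async_cycle(sync_target, async_target)


def demo():
    source, sync_target, async_target = Array("Source"), Array("Sync Target"), Array("Async Target")
    host_write(source, sync_target, "w1")
    periodic_async_cycle(source, async_target)   # async target now holds w1
    host_write(source, sync_target, "w2")
    host_write(source, sync_target, "w3")
    # The source data center fails here, before the next periodic cycle runs.
    caught_up = fail_over_to_sync_target(sync_target, async_target)
    return caught_up, async_target.writes == sync_target.writes


print(demo())
```

In this sketch the sync target already holds every acknowledged write at the moment of failure, and the async target only receives its missing writes once the failover has activated the link from the sync target.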