I am working on a solution that uses SQL Server 2012 SP2, but without AlwaysOn Availability Groups. This is because we rely on cross-database transactions, which do not work with Availability Groups.
Note: this is being addressed as we speak, but I've added it as background information to my problem.
We are using an HP 3PAR StoreServ solution to get site-to-site synchronous replication at the SAN level. This allows DR scenarios to work cross-site, so we can fail over to our secondary.
My concern lies with the claimed RPO of 0, because there are scenarios in which data can be lost and corruption can occur: for example, the link between sites is severed, and then the primary goes down.
My questions are as follows:
- Does the SAN deny writes to the I/O until synchronisation has completed?
- Or, if a link is severed, does the SAN buffer the block changes until the connection is restored?
- If the link is severed during a transaction log write, and a DR occurs, doesn't this mean we could have a partially written (corrupt) transaction log at the secondary site, and therefore incur data loss? The loss would only occur because the primary was able to commit while the secondary was not able to synchronise.
- Is an RPO of zero ever a guarantee across the stack (SQL Server / memory / network / SAN / I/O)?
From the HP 3PAR StoreServ whitepaper "Replication Solutions for demanding disaster tolerant environments", page 6:
For synchronous replication solutions the RPO of the solution is always zero. For asynchronous replication solutions however the RPO will always be something greater than zero. Asynchronous Periodic mode is asynchronous replication. As a result, when designing a solution that uses Asynchronous Periodic replication, RPO becomes a concern.
The SAN guarantees an RPO of 0, so is it the case that when the network dies, the SAN does not allow the change to reach the I/O?
Update:
I found this information on page 12 of the reference above:
Synchronous Long Distance topology
The Remote Copy Synchronous Long Distance (SLD) topology is the only topology supported today that allows volumes in a Remote Copy Volume Group to be replicated from one source StoreServ array to two different target StoreServ arrays. It does this by replicating data synchronously between two StoreServ arrays (the “Source” and “Sync Target” arrays) while simultaneously replicating the same data via Periodic Asynchronous mode to a third StoreServ array, the disaster recovery or “Async Target” array. The user has the option of treating the two sync arrays in an active-active manner, failing over between them if/when a failure in a data center dictates a failover is necessary and resuming operations on the “Sync Target” array. This provides a failover solution that delivers an RPO equal to zero due to the synchronous nature of the replication that occurs between the sync arrays.
On failover to a Sync Target array, the passive Asynchronous Periodic link between that array and the Async Target array becomes active and any data that was replicated to the Sync Target but that had not yet made it to the Async Target array is sent from the Sync Target array to the Async Target bringing the Async Target array up to date with the last write that occurred to the Sync Target. Operations then continue in the Sync Target data center and it continues to replicate data to the Async Target array.
From this information, it appears you need a third endpoint participating in asynchronous replication, so that the secondary site can still be informed of changes when the network link breaks.
I can't comment specifically on 3PAR, but I do have a lot of experience with EMC Symmetrix arrays.
My advice would be: find another way. Synchronous replication is one of those technologies that look great on paper, and in optimal circumstances, but in the real world cause vast amounts of pain.
The way it works is:
It is 'RPO 0' in the sense that IF IT IS ON DISK, it is also on the remote site. Most applications cache writes in memory, and anything not yet flushed to disk will be lost in a DR regardless. However, it comes at a significant price:
You need enough total bandwidth to your remote site to always service the replication load. If you don't, your primary systems will suffer badly, because disk latency will climb dramatically; if you ever saturate the link, your primary service may simply crash.
You will always carry a latency burden, because every write must round-trip to the remote site before it is acknowledged, and your performance will suffer as a result. (The sketch below shows one way to measure both your write throughput and the write latency you already pay.)
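To put numbers on those two points, SQL Server's own file-level I/O statistics give you the write volume your link must sustain and the write latency your workload currently experiences. A minimal sketch; the DMV figures are cumulative since the instance last started, so sample twice and take the difference to turn them into a rate:

```sql
-- Rough sizing sketch: cumulative write volume and write stalls per database
-- file since the instance started. Peak write MB/s is the minimum replication
-- bandwidth you must sustain; avg_write_stall_ms is the latency you pay today,
-- before a synchronous round-trip is added on top.
SELECT
    DB_NAME(vfs.database_id)             AS database_name,
    mf.physical_name,
    vfs.num_of_bytes_written / 1048576.0 AS mb_written,
    vfs.io_stall_write_ms,
    vfs.num_of_writes,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms * 1.0 / vfs.num_of_writes
    END                                  AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.num_of_bytes_written DESC;
```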
Now, it may be that both of these things are 'fine'. In my experience, though, RPO 0 and synchronous replication usually only come up when you've got Really Important things.
To answer your questions directly though:
Does the SAN deny writes to the I/O until synchronisation has completed?
No. It will 'catch up' asynchronously before going into synchronous mode. This might take a while depending on bandwidth, and until it is in sync, you don't have your RPO of 0.
If a link is severed, does the SAN buffer the block changes until the connection is restored?
That depends a little on your configuration. Generally, the array will treat a link suspend/resume as an event requiring an asynchronous resync, and while the link is out, your RPO is no longer zero. You can configure it to block I/O on a link failure, but that will probably just crash your application.
If the link is severed during a transaction log write, and a DR occurs, doesn't this mean we could have a partially written (corrupt) transaction log at the secondary site, and therefore incur data loss?
No: sync means sync. If you are in sync, every I/O that is on the local disk is also on the remote. Any I/O not yet on disk is lost, though, so you may lose your last transaction log records.
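One mitigation worth knowing about: if the old primary is still reachable after the disaster, SQL Server can usually salvage whatever did make it to its disk via a tail-log backup. A minimal sketch, where the database name and backup path are hypothetical:

```sql
-- Hedged sketch: capture the tail of the log on the damaged/old primary so the
-- last committed transactions can be applied at the DR site. 'YourDb' and the
-- path are placeholders. NO_TRUNCATE allows the backup even if the database is
-- damaged; NORECOVERY leaves it in a restoring state afterwards.
BACKUP LOG YourDb
TO DISK = N'\\drserver\backups\YourDb_tail.trn'
WITH NO_TRUNCATE, NORECOVERY;
```

That tail backup can then be restored at the DR site before you recover the database there.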
Is an RPO of zero ever a guarantee across the stack (SQL Server / memory / network / SAN / I/O)?
RPO is Recovery Point Objective. If your objective is truly zero, then you need to think very hard about your architecture. It is achievable, but it is incredibly expensive.
Personally, instead of synchronous replication, I would suggest:
Run your primary datastores asynchronously, and rely on your logs to provide the 'sync' part. Practically speaking, your 'RPO 0' will only ever be 'your committed transaction log' anyway. So mount a remote drive over NFS (or CIFS), write your transaction logs across the network as well as to local storage, and replay them onto your few-minutes-out-of-sync remote database.
You'll get about the same recovery point anyway, because I very much doubt you'll want to use data that isn't journalled, and you'll do so without needing expensive synchronous operations.
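In SQL Server terms, this is essentially classic log shipping. A minimal sketch of the two halves, where YourDb, the UNC path, and the file naming are all assumptions:

```sql
-- Hedged sketch of the idea above, using plain log shipping primitives.
-- Schedule the backup step every few minutes on the primary and the restore
-- step on the DR server; each backup needs a unique file name.

-- On the primary: back up the transaction log straight to the remote share.
BACKUP LOG YourDb
TO DISK = N'\\drserver\logship\YourDb_20140101_1200.trn';

-- On the DR server: replay each log backup, keeping the database in a
-- restoring state so that later backups can still be applied.
RESTORE LOG YourDb
FROM DISK = N'\\drserver\logship\YourDb_20140101_1200.trn'
WITH NORECOVERY;
```

At failover time you restore any remaining log backups (including a tail-log backup from the primary, if you can take one), then bring the copy online with `RESTORE DATABASE YourDb WITH RECOVERY;`.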