I have a long-term goal of setting up a DR site in a colo somewhere and part of that plan includes replicating some volumes of my EqualLogic SAN. I'm having a bit of a difficult time doing this because I don't know if my method is sound.
This post may be a bit lengthy for the sake of completeness.
Generally relevant information:
- I have one EqualLogic PS4000X (~4TB).
- The SAN acts as shared storage for 2 ESXi hosts in a vSphere 5 environment.
- I have 4 volumes of 500GB each. Volumes 1 and 2 contain my "tier 1" VMs. These will be the only volumes I plan to replicate.
- We currently have a 3Mb/s connection, with actual usable data bandwidth of ~2.8Mb/s because of our PRI (voice).
My method of measuring change in a volume:
I was told by a Dell rep that one way (perhaps not the best?) to estimate deltas in a volume is to measure the snapshot reserve space used over a period of time on a regular snapshot schedule.
My first experiment with this was to create a schedule of 15 minutes between snapshots with a snapshot reserve of 500GB. I let this run overnight until COB the following day. I don't recall how many snapshots could be held in 500GB, but I ended up with an average of ~15GB per snapshot.
$average_snapshot_delta = $snapshot_reserve_used / $number_of_snapshots
I then changed the snapshot interval to 60 minutes; after a full 24 hours had passed, the 500GB reserve held a total of 13 snapshots. That works out to ~37GB per hourly snapshot (or ~9GB per 15 minutes).
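To sanity-check those figures, here is a minimal sketch of the same arithmetic, including what the WAN link can actually move (the volume and snapshot figures are the ones from above; this is just the formula made executable):

```python
# Minimal sketch of the delta-vs-bandwidth arithmetic described above.
GB = 1024 ** 3  # bytes per GiB

# Hourly-snapshot experiment: 13 snapshots filled the 500GB reserve.
snapshot_reserve_used = 500 * GB
number_of_snapshots = 13
average_snapshot_delta = snapshot_reserve_used / number_of_snapshots
print(f"Average delta per hourly snapshot: {average_snapshot_delta / GB:.1f} GB")  # ~38.5 GB

# The WAN side: ~2.8 megabits/s of usable bandwidth.
usable_bytes_per_sec = 2.8e6 / 8
print(f"Link moves ~{usable_bytes_per_sec * 3600 / GB:.2f} GB/hour at 100% utilization")  # ~1.17 GB/hour
```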
The problem:
These numbers are astronomical to me. With my bandwidth I can do a little over 1GB/hour at 100% utilization. Is block-level replication this expensive, or am I doing something completely wrong?
Your numbers boil down to 10.24 MB/s, which does seem a bit on the high side for pure write. But then, I don't know your workloads.
However, you have a bigger problem. The initial replication will be replicating 1TB of data over a ~3Mb/s straw.
During that time it'll be queueing up your net-change for when the initial sync finishes. And if you ever need to pull data back from the remote array, it'll be on the order of a month until you're fully up and running (unless you have an out-of-band method of data transfer, like FedEx overnight shipping or a truck).
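For reference, the transfer-time math, assuming the ~2.8Mb/s of usable bandwidth stated in the question:

```python
# Back-of-envelope: time to move a 1TB initial sync over the stated link.
TB = 1e12                         # bytes (decimal terabyte)
usable_bytes_per_sec = 2.8e6 / 8  # ~2.8 megabits/s usable = 350 KB/s

seconds = TB / usable_bytes_per_sec
print(f"~{seconds / 86400:.0f} days at 100% utilization")  # ~33 days
```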
As for the difference in net-change between your 15-minute snapshots and the 60-minute snapshots, I believe the 60-minute snapshot is getting the benefit of a lot of write-combining. Put another way, all of those writes to the filesystem journals are being coalesced in the longer snapshot in a way they aren't in the 15-minute snaps.
This is where replication mode really matters. A 3Mb/s pipe is woefully underprovisioned for synchronous replication. Batched asynchronous replication gains some of the benefits of write-combining, and therefore a lower total transfer, at the cost of losing some data in a disaster. Unfortunately, I'm not well versed enough in EqualLogic to know what it's capable of.
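A toy simulation of that write-combining effect, under entirely hypothetical workload numbers (the skew toward hot blocks stands in for journal and metadata rewrites):

```python
# Toy model: repeated writes to the same block within one replication
# interval are shipped only once, so longer batches ship fewer blocks.
import random

BLOCKS = 100_000        # hypothetical number of blocks in the volume
WRITES_PER_MIN = 5_000  # hypothetical write rate

def blocks_shipped_per_hour(interval_minutes: int) -> int:
    """Unique blocks dirtied per interval, summed over one hour."""
    total = 0
    for _ in range(60 // interval_minutes):
        dirty = set()
        for _ in range(WRITES_PER_MIN * interval_minutes):
            # Pareto skew: a few hot blocks (journals) are rewritten constantly.
            dirty.add(int(random.paretovariate(1.2)) % BLOCKS)
        total += len(dirty)
    return total

random.seed(42)
print("four 15-min batches:", blocks_shipped_per_hour(15))
print("one 60-min batch:   ", blocks_shipped_per_hour(60))  # fewer: duplicates coalesce
```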
This is the biggest con against EqualLogic in my opinion. Replication is based on snapshots, and their snapshot technology is incredibly inefficient.
We run about 25 arrays in our environment, and my 2-3 year goal is to replace them all with NetApp. Based on what we see on our NetApp CIFS filers and our testing of NFS, the replication bandwidth and snapshot space will be reduced by 80%. Add to that the dedupe features of NetApp, and it is much more efficient.
Make sure to put your Windows page files and your VMware swap files on a non-replicated volume.
Also, if you can afford it, look at adding some Riverbed WAN optimizers. They will reduce the amount of replication data on your WAN by 60% or so. They have saved us, and our WAN connections range from DS3 at minimum up to OC-3.
You also did not mention what your latency is. It is a critical component in replication calculations.
If your VMs do not have their page files on a separate datastore, you should try moving them to one and then re-measuring your rate of data change (data churn). This will definitely help. Don't replicate more than you need to.
Does EQL support continuous async replication, or is it driven by a snapshot schedule? Can you use the whole 3Mb/s 24/7?
I also second the suggestion that you synchronize the arrays before putting one at the remote site.
For the sake of focusing on the most relevant information, I'd suggest defining an objective for your recovery point and recovery time. These are unimaginatively referred to as the "RPO" and "RTO". Disk replication is supposed to reduce them both by keeping a crash-consistent copy of the data, never older than a few minutes, on another site. Once you have these numbers, you can define things like how often you have to have a crash-consistent replica.
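As a rough illustration of how an RPO turns into a bandwidth requirement (using the ~37GB/hour churn figure from the question; real sizing needs your own measurements):

```python
# Hedged illustration: minimum sustained bandwidth to keep up with a given
# data-change rate, so the replica never falls behind the RPO window.
GB = 1024 ** 3  # bytes per GiB

churn_bytes_per_hour = 37 * GB  # change rate measured in the question
rpo_seconds = 3600              # target: replica at most 1 hour stale

# Each RPO window's worth of change must be shipped within that window.
delta_per_window = churn_bytes_per_hour * (rpo_seconds / 3600)
required_mbps = delta_per_window * 8 / rpo_seconds / 1e6
print(f"Need ~{required_mbps:.0f} Mb/s sustained")  # ~88 Mb/s, vs. the 2.8 Mb/s available
```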
3Mb/s is probably not going to cut the mustard unless you use WAN acceleration (such as Riverbed, mentioned by one of the other answerers). WAN acceleration works by keeping a cache on disk on both sides of the link, where each side stores the most recent data you've sent; if you ever send a duplicate block, it sends a reference instead of the data.
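A minimal sketch of that dedup idea, assuming a simple hash-indexed block cache (not how any particular appliance actually implements it):

```python
# Hedged sketch of WAN-optimizer dedup: both ends keep a cache of blocks
# they've seen; a repeated block is sent as a short hash reference instead.
import hashlib

def encode(blocks, cache):
    """Return the wire stream: literal blocks for new data, hashes for repeats."""
    stream = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest in cache:
            stream.append(("ref", digest))  # ~32 bytes instead of the full block
        else:
            cache[digest] = block
            stream.append(("raw", block))
    return stream

cache = {}
data = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # third block is a repeat
wire = encode(data, cache)
sent = sum(len(payload) if kind == "raw" else 32 for kind, payload in wire)
print(f"sent {sent} bytes instead of {sum(len(b) for b in data)}")
```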
That said, assuming your storage uses the same engine to take snapshots as it uses to replicate them, then the most accurate measure of change is indeed the snapshot reserve. You'd need to keep one snapshot and its reserve isolated for the duration of the measurement period, though. Assuming EqualLogic uses copy-on-write snapshots, comparing data from the reserves of several snapshots taken throughout the day might actually make it seem like your data is changing more than it actually is.
As for the data itself, I agree with the replies that suggest not replicating the swap files. Swap files can take a lot of disk and are always changing, which would trigger a lot of replication traffic. I don't know whether VMware supports replication of an environment without them, though... I assume that the VMs in a datastore replicated without swap files would be crash-consistent, but I can't confirm that myself.
I am currently in the process of something similar, though with Solaris 11 and ZFS as our SAN backend. Because of bandwidth constraints, I decided to separate out most of the components. We migrated to Exchange 2010 so that we could set up our DR site with an identical copy. What I found was that doing SAN-level snapshots would be ridiculous for this data because of bandwidth issues like you are seeing. We decided it would be cheaper and more efficient to set up a DAG and replicate within Exchange itself. We also did the same thing with our MySQL servers. What we clone now are systems with fewer deltas between snapshots. I was able to do the initial synchronization at the office and transport it to its final destination.
The block size is 16MB for snapshots and replication on EqualLogic SANs. This is why you got those astronomical numbers, and there is no way to change it. The solution for us, to meet our RTO/RPO SLA, was to install a Riverbed WAN optimization appliance between the 2 sites.
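To illustrate why a large page size inflates the apparent delta, a hedged sketch (the 16MB page size is from this answer; the scattered 4KB writes are an invented worst case):

```python
# Hedged sketch: write amplification from page-granular snapshots.
# A snapshot that tracks change in 16MB pages counts the whole page as
# "changed" even if only a few KB inside it were actually written.
PAGE = 16 * 1024 ** 2  # 16MB page size cited above
WRITE = 4 * 1024       # assume a worst-case scattered 4KB write per page

pages_dirtied = 1000   # e.g., 1000 small writes, each landing on its own page
actual_change = pages_dirtied * WRITE  # ~3.9 MB truly written
reported_delta = pages_dirtied * PAGE  # ~15.6 GB counted against the reserve
print(f"amplification: {reported_delta / actual_change:.0f}x")  # 4096x worst case
```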