PLEASE READ UPDATE AT THE BOTTOM. THANKS! ;)
Environment Info (all Windows):
- 2 sites
- 30 servers site #1 (3TB of backup data)
- 5 servers site #2 (1TB of backup data)
- MPLS backbone tunnel connecting site #1 and site #2
Current Backup Process:
Online Backup (disk-to-disk)
Site #1 has a server running Symantec Backup Exec 12.5 with four 1TB USB 2.0 disks. BE jobs run full backups nightly on all servers in site #1 to these disks. Site #2 backs up to a central file server there using software they already had in place when we acquired them. A BE job pulls that data nightly to site #1 and stores it on said disks.
Off-site Backup (tape)
Connected to our backup server is a tape drive. BE backs up the external disks to tape once a week, and the tapes get picked up by our off-site storage company. Naturally, we rotate two tape sets: one is always here and one is always off-site.
Requirements:
- Eliminate the need for tape and off-site storage service by doing disk-to-disk at each site and replicating site #1 to site #2 and vice versa.
- Software-based solution, as hardware options have been too pricey (e.g., SonicWall, Arkeia).
- Agents for Exchange, SharePoint, and SQL.
Some Ideas So Far:
Storage
DroboPro at each site with an initial 8TB of storage (these are expandable up to 16TB at present). I like these because they are rack-mountable, allow disparate drives, and have iSCSI interfaces. They are relatively cheap too.
Software
Symantec Backup Exec 12.5 already has all the agents and licenses we need. I'd like to keep using it unless there is a better solution, similarly priced, that does everything BE does plus deduplication and replication.
Server
Because there is no longer a need for a SCSI adapter (it was only for the tape drive), we are going to virtualize our backup server; it is currently the only physical machine save for our SQL boxes.
Problems:
- When replicating between sites we want as little data as possible to go across the pipe. There is no deduplication or compression in what I have laid out here so far.
- The files being replicated are BE's virtual tape libraries from our disk-to-disk backup. Because of this, each of those huge files will go across the wire every week, since they change every day.
And Finally, the Question:
Is there any software out there that does deduplication, or at least compression, to handle just our site-to-site replication? Or, looking at our setup, is there any other solution that I am missing that might be cheaper, faster, better?
Thanks. Sorry this is so long.
UPDATE 1:
I've set a bounty on this question to get it more attention. I'm looking for software that will handle replication of data between two sites using the least amount of bandwidth possible (via compression, deduplication, or some other method). Something similar to rsync would work, but it needs to be native to Windows and not a port involving shenanigans to get up and running. I'd prefer a GUI-based product, and I don't mind shelling out a few bones if it works.
Please, answers that meet the above criteria only. If you don't think one exists, or if you think I'm being too restrictive, keep it to yourself. If after seven days there is no answer at all, so be it. Thanks again, everyone.
UPDATE 2:
I really appreciate everyone coming forward with suggestions. There is no way for me to try all of these before the bounty expires. For now, I'm going to let the bounty run out, and whoever has the most votes will get the 100 rep points. Thanks again!
Windows Server 2003 R2 and later has support for DFSR, which I used extensively to sync and back up large amounts of data over a rather small pipe across three sites (80GB+ over a T1<-->T1<-->T1 topology).
msdn.microsoft.com/en-us/library/bb540025(VS.85).aspx
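For a rough idea of the initial setup, the replication group can be scripted. Here's a minimal sketch (the dfsradmin subcommand and flag spellings are from memory, so verify them against dfsradmin /? before running; all server and group names are placeholders):

    import subprocess

    # Hypothetical names; replace with your own. Flag spellings are
    # from memory -- check "dfsradmin /?" on your box first.
    RG = "BackupSync"
    commands = [
        # Create the replication group and add both backup servers to it
        ["dfsradmin", "rg", "new", f"/rgname:{RG}"],
        ["dfsradmin", "mem", "new", f"/rgname:{RG}", "/memname:SITE1-BACKUP"],
        ["dfsradmin", "mem", "new", f"/rgname:{RG}", "/memname:SITE2-BACKUP"],
        # Replicated folder, plus a connection in each direction
        ["dfsradmin", "rf", "new", f"/rgname:{RG}", "/rfname:BackupData"],
        ["dfsradmin", "conn", "new", f"/rgname:{RG}",
         "/sendmem:SITE1-BACKUP", "/recvmem:SITE2-BACKUP"],
        ["dfsradmin", "conn", "new", f"/rgname:{RG}",
         "/sendmem:SITE2-BACKUP", "/recvmem:SITE1-BACKUP"],
    ]
    for cmd in commands:
        subprocess.run(cmd, check=True)

    # Each member's local folder path still needs to be set (via
    # "dfsradmin membership set ...") before replication starts.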
DFSR is fully multimaster and can be configured however you want. It will keep your data in sync at the "backup" location for a very small amount of bandwidth and CPU. From there, you can use the Volume Shadow Copy Service.
technet.microsoft.com/en-us/library/cc785914.aspx
The shadow copies reside on disk, and take "no space" aside from the changed files from snapshot to snapshot. This is a process that can run on a live dataset with no ill effects, aside from slightly increased disk I/O as the snapshot is being created.
I used this solution for quite some time with great success. Changes to files were written out to the other sites within seconds, even over the low-bandwidth links, and even in cases where just a few bytes out of a very large file changed. Each snapshot can be accessed independently of any snapshot taken at any other point in time, which provides both backups in case of emergency and very, very little overhead. I set the snapshots to fire at 5-hour intervals, in addition to once before the workday started, once during the lunch hour, and once after the day was over.
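If you want snapshots firing on a schedule like that, something along these lines would do it (a sketch; "vssadmin create shadow" exists on the Server SKUs only, the drive letter and times are placeholders, and older Windows versions may want HH:MM:SS for /st):

    import subprocess

    # Schedule shadow copies of the replicated volume at fixed times.
    # D: and the task names are placeholders; the task will run under
    # whatever credentials schtasks is given.
    times = ["07:00", "12:00", "17:00", "22:00"]
    for i, t in enumerate(times):
        subprocess.run([
            "schtasks", "/create",
            "/tn", f"Snapshot-{i}",                      # arbitrary task name
            "/tr", "vssadmin create shadow /for=D:",     # take the snapshot
            "/sc", "daily",
            "/st", t,
        ], check=True)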
With this, you could store all data in parallel at both locations, kept relatively up to date and "backed up" (which amounts to versioned, really) as often as you want it to.
The Shadow Copy Client can be installed on the client computers to give them access to the versioned files, too.
www.microsoft.com/downloads/details.aspx?FamilyId=E382358F-33C3-4DE7-ACD8-A33AC92D295E&displaylang=en
If a user accidentally deletes a file, they can right-click the folder, choose Properties, go to the Shadow Copies tab, select the latest snapshot, and copy the file out of the snapshot and into the live copy, right where it belongs.
MSSQL backups can be written out to a specific folder (or network share) which would then automatically be synched between sites and versioned on a schedule you define.
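The nightly dump itself can be as simple as this (a sketch using osql over a trusted connection; the database name and path are placeholders):

    import subprocess

    # Dump the database into the DFSR-replicated folder; DFSR then ships
    # the deltas to the other site and VSS versions them on its schedule.
    backup_sql = (
        "BACKUP DATABASE MyAppDB "
        "TO DISK = 'D:\\ReplicatedBackups\\MyAppDB.bak' WITH INIT"
    )
    subprocess.run(["osql", "-E", "-Q", backup_sql], check=True)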
I've found that the data redundancy and versioning these provide can act as an awesome backup system. They also give you the option to copy a specific snapshot off-site without interfering with the workflow, since the files being read aren't in use...
This should work with your setup, as the second backup site can be configured as a read-only sync/mirror.
Windows isn't my area of expertise, but rsync may help get the backups from one site to the other. Rsync works by breaking files down into smaller blocks and then transferring only the blocks that changed across the network. It can also compress the data as it sends it.
There are some versions of it for Windows out there, but I've never used them, so I can't comment on how well they work. With Cygwin you can get rsync on Windows, but that may make things a bit messy. Ideally you should find an rsync client for Windows that will let you use Scheduled Tasks to automate its execution.
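If you do try one of the ports, the invocation itself is simple. A sketch, assuming an rsync.exe from a cwRsync-style port is on the PATH, with made-up paths and hostname:

    import subprocess

    # -a preserves attributes, -z compresses on the wire, --partial keeps
    # interrupted transfers so they can resume, --delete mirrors removals.
    subprocess.run([
        "rsync", "-az", "--partial", "--delete",
        "/cygdrive/d/backups/",          # source (cygwin-style path on the ports)
        "backupuser@site2:/backups/",    # destination at the other site
    ], check=True)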
Edit:
We are using "SureSync" from Software Pursuits (see http://www.softwarepursuits.com/SureSync/SureSync.asp) to replicate data between a production and standby server in a Customer's remote site with great success. It is a native Windows application, runs as a service on the publisher and subscriber machines, copies deltas, retains security, follows the NTFS change journal, and in general has rocked for our needs.
(Our particular Customer who is doing this still has traditional off-site tape backup rotation, too. I think you still need offline backups, and I can't answer your question in good conscience without leaving that section of my answer intact, but I can tell you that SureSync has been great. Read some of the changelogs on the product; it's clear to me that the manufacturer is really, really attentive to detail.)
One observation: if you eliminate tape, you're eliminating offline storage. Off-site is one thing, but offline is a different thing. When a remote attacker destroys your production systems, it's really, really nice to have an air gap between the tapes and the tape drive to keep them from destroying the backups too.
You need off-site storage of backups, and you need offline backups, too.
It's also very hard to have an independent third party do a test restore and data verification without something like tape. Perhaps in your industry that's not a concern, but I've worked for financial institutions that sent their data off-site, via tape, to a third party to independently verify its integrity, both from a "restorability" perspective and from a "let's see if your totals compare properly with the totals we compute on a trusted installation of your application, using only your data as input" perspective.
If you want to increase the speed of your site-to-site replication, you might look into a WAN accelerator. There are several on the market. Another admin just recommended the ones from Riverbed to me: http://www.riverbed.com/index.php?cnt=1
Essentially they compress the data before sending it and decompress the data after receipt. It's seamless to the user.
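The principle in miniature is just this (a toy sketch, not what the appliances actually run; they also do block-level caching and dedupe across connections, and the file name here is a placeholder):

    import zlib

    # Compress before the WAN hop, decompress after receipt.
    with open("weekly_backup.bkf", "rb") as f:
        payload = f.read()
    wire_bytes = zlib.compress(payload, 6)   # what actually crosses the pipe
    print(f"{len(payload)} bytes reduced to {len(wire_bytes)} on the wire")
    assert zlib.decompress(wire_bytes) == payload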
Adding a new option to this running thread.
The software we started using is made by AppAssure (now acquired by Dell); the product is called Replay.
It's designed for Windows servers doing disk-to-disk backup, and there is a replication option that lets you automatically copy the snapshots over to a remote site.
It includes automatic deduplication and automatic version rollup, and the replication is pretty efficient and can be scheduled for off hours, even if the backup snapshots are happening all day on a 15-minute or hourly basis.
Only the changes get sent over the WAN, not a full copy of the data. If you do need a brand-new full copy, you can offload the initial backup to an external disk and ship that to the remote site to be imported, which saves you from having to send a full backup over the WAN for the initial sync.
For backup disks, a perfect companion is the Drobo B800i iSCSI SAN. It's relatively cheap, takes commodity off-the-shelf SATA drives, and has reasonable performance for backups (but not good enough for anything heavy like VMware ESX hosts or SQL data hosting). There is a reason Drobo does not publish performance data on the B800i: it's pretty low-end compared to something like an EqualLogic PS SAN or anything from the big SAN vendors like EMC or HP. But it's great as the storage for a disk-to-disk system.
As much as I hate to say it, the easiest, and fastest way to perform multi-site backups is with a good storage array. Both Dell/EqualLogic and HP/LeftHand have software built into their SAN products that will allow constant incremental backups across multiple SANs. They are quick and easy to set up, but may not be the cheapest solution.
I had a similar issue about a year ago and looked at everything from robocopy and rsync to Cisco WAAS and WAN accelerators. Eventually I stumbled upon a stupid cheap solution that works great for securely and quickly delta syncing files between sites. Delta syncing is the key. Most, if not all, P2P clients do full file syncs only.
Powerfolder
It has a LAN-Only mode, allowing you to specify which sets of IPs you'll allow clients to connect with. It also has a pretty good mix of Transfer Modes.
It took a little bit of digging to get things set up exactly how I wanted. The Powerfolder guys are definitely not UI developers, but support was extremely helpful, and their wiki documentation is great, even if the search function on the wiki is not. :-)
We haven't gone to an inline solution for Exchange, SQL, and SharePoint yet, but saving a backup of the databases to disk and having Powerfolder sync them is enough peace of mind for us.
This solution works well and the company loves it as it cost less than $100 (excluding man hours for research and setup) to implement.
It's surprising Powerfolder isn't better known.
P.S. - sorry for the lack of links (LAN-Only Mode, Transfer Modes, etc). "new users can only post a maximum of one hyperlink"
IBM acquired a company previously called "Softek" that has a software solution called Replicator. It's block-level replication that runs over TCP/IP. After the initial synchronization is complete, only the changed blocks are copied over to the remote site, so when one of your huge BE files changes somewhat, it isn't necessary to copy the entire file. This is a native Windows application, has an easy-to-use console, and is a really good way to manage disk synchronization over a network.
IBM/Softek Replicator
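To make "only the changed blocks" concrete: the mechanism is roughly to hash each fixed-size block and send only the blocks whose hash changed since the last sync. A toy sketch (not Replicator's actual protocol; file names are placeholders):

    import hashlib

    BLOCK = 64 * 1024  # 64 KB blocks; real products choose their own size

    def block_hashes(path):
        """Return {block index: digest} for a file, reading block by block."""
        hashes = {}
        with open(path, "rb") as f:
            i = 0
            while chunk := f.read(BLOCK):
                hashes[i] = hashlib.sha1(chunk).hexdigest()
                i += 1
        return hashes

    # Compare the hashes recorded at the last sync against the file now;
    # only the changed blocks would cross the WAN.
    previous = block_hashes("backup_vtl.previous")
    current = block_hashes("backup_vtl.current")
    changed = [i for i in current if previous.get(i) != current[i]]
    print(f"{len(changed)} of {len(current)} blocks would be sent")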
BackupExec doesn't make this easy. Ideally, you should have the option to 'copy' a backup to somewhere else, and I'm not sure BE has that. Here is what I'd build if I were using HP Data Protector in this environment. It does have a 'copy' operation for jobs. It also has a deduplication option, but I'd be deeply careful with that on file systems holding more than about 500K files.
I believe you can set disk backups to use compression, but this will really slow down your backup speeds, as it's done in software.
The Data Protector dedupe works only for file servers on Windows and Linux. It won't de-dupe Exchange/MS SQL/SharePoint.
You should take a look at robocopy or, if you need a GUI, RichCopy. Both tools are multithreaded, fast, efficient, and have lots of options for merging and syncing. You can use them in conjunction with the osql BACKUP DATABASE command (for DB backups) and ExMerge (for brick-level Exchange backups), and you can create a simple scheduled task to automate the whole thing, as sketched below.
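Tying those pieces together is only a few lines. A sketch with placeholder paths, share, and database names (ExMerge's switches live in its own configuration, so it's only marked as a step):

    import subprocess

    # 1. Dump SQL to local disk over a trusted connection.
    subprocess.run(["osql", "-E", "-Q",
        "BACKUP DATABASE MyAppDB TO DISK = 'D:\\Backups\\MyAppDB.bak' WITH INIT"],
        check=True)

    # 2. An ExMerge brick-level Exchange export would go here; its options
    #    are configured through its own INI/GUI, so it isn't shown.

    # 3. Mirror the backup folder to the other site. /MIR mirrors the tree,
    #    /Z makes copies restartable over a flaky WAN, /R and /W keep retries
    #    sane. robocopy's exit code is nonzero even on success, so no check=True.
    subprocess.run(["robocopy", r"D:\Backups", r"\\SITE2-BACKUP\Backups",
                    "/MIR", "/Z", "/R:3", "/W:10"])

    # Schedule this script nightly, e.g.:
    #   schtasks /create /tn NightlyBackup /tr "python C:\scripts\nightly.py"
    #            /sc daily /st 23:00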