We have a very large Windows file store (several terabytes, tens of millions of files) that I want to keep continuously replicated to another server, in as near to real-time as possible. I'm looking for tools that can make this happen.
So far I've come up with:
- Move it to a NAS or SAN and share the files. Won't work, not an option for us for at least six months.
- Use robocopy with /MON. I'm worried about the drain on the replication source as it rescans the entire tree every cycle. (While /MON watches for file changes to trigger a new pass, each pass still performs a full scan and doesn't actually use the change data.)
- Rsync. No better than robocopy.
- DFS. I've heard very bad things from people at MS about DFS for stores with millions of small files. Probably half our files are very small.
- Hybrid. Write my own tool, or find one online, that uses file watchers to target exactly which files need to be copied. Since the watcher's buffer can overflow, I'd still run a nightly robocopy to pick up anything that was missed.
- Backup-based. Some kind of crazy scripted backup/restore thing where the backup software can do very fast incremental snapshots.
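The hybrid approach above can be sketched roughly like this. On Windows the watcher itself would be ReadDirectoryChangesW (or .NET's FileSystemWatcher, which raises an Error event on buffer overflow); this Python sketch only shows the control logic, with hypothetical function names and change events given as paths relative to the source root:

```python
import os
import shutil

def sync_file(src_root, dst_root, rel_path):
    """Copy one changed file, creating parent directories as needed."""
    src = os.path.join(src_root, rel_path)
    dst = os.path.join(dst_root, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copy2 preserves timestamps

def full_rescan(src_root, dst_root):
    """Fallback sweep: copy anything newer or missing on the target
    (what the nightly robocopy pass would do). Returns copied paths."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                sync_file(src_root, dst_root, rel)
                copied.append(rel)
    return copied

def process_events(src_root, dst_root, events, overflowed):
    """Apply watcher events; on buffer overflow fall back to a full rescan."""
    if overflowed:
        return full_rescan(src_root, dst_root)
    for rel in events:
        sync_file(src_root, dst_root, rel)
    return list(events)
```

The key point is that the steady-state cost is proportional to the number of changes, and the expensive full-tree walk only happens on overflow (or on the nightly safety pass).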
Any ideas would be greatly appreciated, thanks!
Whatever you use, be sure that it uses the NTFS change journal so that it's not effectively "polling" the filesystem. "robocopy /MON", for example, doesn't use the NTFS change journal, so it ends up just polling the filesystem.
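To illustrate the difference: a change journal is an append-only log of changes with monotonically increasing sequence numbers (USNs on NTFS), so a replicator only has to remember the last USN it processed and read forward from there, instead of rescanning every file. This is a toy in-memory model of that idea, not the real USN journal API (which you'd reach via FSCTL_READ_USN_JOURNAL):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChangeRecord:
    usn: int     # monotonically increasing sequence number
    path: str
    reason: str  # e.g. "FILE_CREATE", "DATA_EXTEND", "FILE_DELETE"

@dataclass
class ChangeJournal:
    """Toy model of an NTFS-style change journal: an append-only log."""
    next_usn: int = 0
    records: List[ChangeRecord] = field(default_factory=list)

    def log(self, path, reason):
        self.records.append(ChangeRecord(self.next_usn, path, reason))
        self.next_usn += 1

def changes_since(journal, last_usn):
    """A replicator keeps only its last-processed USN and reads forward
    from there: cost is proportional to the number of changes, not to
    the number of files on the volume."""
    return [r for r in journal.records if r.usn >= last_usn]
```

With tens of millions of files, that cost difference is exactly why a journal-based tool beats a rescanning one.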
I have a Customer using SureSync to replicate a few million large and small files (around 1.5TB) to a "hot standby" file server computer. It works well for them. It uses the NTFS change journal to keep abreast of changes to the volume. It supports delta compression and use of a dedicated network interface for inter-server communication. (I have no affiliation with the company; I just like the tool.)
This is the first I've heard anyone saying negative things about DFS-R. Our experience has only been positive.
In any case, I would start by trying DFS-R since it requires no additional hardware or software other than the Windows licenses you already have (assuming you've got Enterprise editions). It's also pretty easy to set up.
The largest DFS-R volume we manage is about 200GB and a little over 1 million files. Obviously this is a lot smaller than what you've got, but it's still fairly sizable. The contents are primarily software installation packages (some of which contain thousands of tiny files). We used to replicate this store with NTFRS and had nothing but problems. We upgraded to DFS-R back when 2003 R2 came out and it was a night and day difference. The servers have since been upgraded to 2008 and are still humming along without a glitch.
You'll definitely want to set up your staging folder on a different set of spindles for performance, and you'll have to configure it pretty large as well. I'm not really an expert on the specifics; it likely depends on how large your largest file is and how much churn there is on a regular basis. Microsoft PSS folks could likely provide better advice on this.
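For what it's worth, Microsoft's published rule of thumb (as of Server 2008) is to make the staging quota at least as large as the combined size of the 32 largest files in the replicated folder, so the sizing is driven by your biggest files rather than the total volume. A quick, hypothetical sketch of that calculation:

```python
import heapq
import os

def staging_quota_bytes(root, n_largest=32):
    """Suggested DFS-R staging quota: total size of the n largest files
    under the replicated folder (Microsoft's rule of thumb for Server
    2008; treat it as a floor, not an exact answer)."""
    sizes = (os.path.getsize(os.path.join(dirpath, name))
             for dirpath, _dirnames, names in os.walk(root)
             for name in names)
    return sum(heapq.nlargest(n_largest, sizes))
```

Running that against the replicated tree gives a starting point you can then pad for churn.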
So get it set up and see whether it performs adequately. If it doesn't, all you've lost is some time.