We have two large storage servers (100+ TB); one runs ZFS, the other runs XFS. We intend to use the XFS one as our work server and the ZFS one as the backup server (snapshots <3). The problem now is keeping these beasts in sync, where "sync" means synced daily.
The easiest option would be rsync, but sadly the directory structure is deep and full of hard links all over the place. That means rsync would need to do a "global" scan, which would take ages... On top of that, most of the data is created once and never modified, so rsync might just not be the way to go.
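For concreteness, the kind of invocation we would need looks like this (paths and hostname are just placeholders); -H is what preserves the hard links, and it is also what forces rsync to remember every inode it has seen during that global scan:

    rsync -aH --delete /srv/data/ backup:/srv/backup/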
I looked into inotify, which seems relatively cheap, and since we only want to sync on a daily basis we could defer the actual copying to a quiet time... Sadly, if we only look at the created files, we would copy hard links as separate data, which would double the amount of storage used in our backup (basically there is no way to do the -H check from rsync).
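To be clear, enumerating the hard links themselves is not the hard part; assuming GNU find and a hypothetical data root of /srv/data, something like this lists every file that shares its inode with another link:

    find /srv/data -type f -links +1 -printf '%i %p\n' | sort -n

The hard part is doing that matching incrementally, per inotify event, instead of rescanning the whole tree.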
The only option left that I could think of would be to reorganize our storage into a date-based directory structure; sadly, moving around that much data is not something we would prefer...
Are there other options?
For reference:
- Server with XFS has a RAID controller (no JBOD option) and SATA disks (WD RE), 32 GB RAM
- Server with ZFS has an HBA controller and SAS disks, 126 GB RAM
When I describe ZFS as being slow, I mean that I see 'ls' taking seconds...
You really should be using ZFS on both sides, coupled with a block-level snapshot/replication routine like Sanoid.
Without that, you're stuck with file-based operations and the pain of the rsync file scan.
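As a rough sketch with Sanoid's companion tool syncoid, and with pool/dataset names invented for the example, the daily replication boils down to a single command; syncoid takes care of the snapshots and the incremental zfs send/receive under the hood:

    syncoid tank/data backuphost:backup/data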
How fast is "fast enough"?
You are doing this once per day, so I suspect that if it takes 2-3 hours, that is sufficient.
In that case, "rsync -avP" should be all you need. The newest versions handle large directories and deep hierarchies well, and don't require as much RAM as older versions did.
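Given the hard links you mention, the daily job would presumably look something like this (hostname and paths are placeholders); add -H so the links arrive as links rather than as duplicate data:

    rsync -avPH --delete /srv/data/ backup:/srv/backup/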
If no files changed, "rsync -a" will be as fast as "ls -lR". You can't get any faster than that, because any full scan has to do an lstat() of every file on the system.
Benchmark "ls -lR" and "rsync -a". If they are slower than you think they should be, look at https://serverfault.com/a/746868/6472 for advice.
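A rough way to run that benchmark (remote host and paths are placeholders; -n makes rsync do the full scan without transferring anything):

    time ls -lR /srv/data > /dev/null
    time rsync -an /srv/data/ backup:/srv/backup/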
If you need something faster than the "ls -lR" benchmark, you'll have to either write something that uses "inotify", or use some kind of block-based system. In particular, using ZFS on both systems would enable you to use the snapshot export/import system that ZFS has built-in.
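For illustration, that built-in replication looks roughly like this (pool and snapshot names invented for the example); the -i flag sends only the blocks that changed between the two snapshots, with no file scan at all:

    zfs snapshot tank/data@today
    zfs send -i tank/data@yesterday tank/data@today | ssh backup zfs receive backup/data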
I would adopt a 2-part strategy... and at the end I'll suggest a 3rd part that is optional.
Part 1: Use inotify. Write a program that uses inotify to log which files were created, deleted, and modified. Write another program that reads the log, removes any duplicates, and backs up those files (and deletes the deleted files). This won't be easy. Programming inotify is complex. The log can't be a simple text file, since filenames can include newlines. If the system crashes while the log is being written, you'll need to be able to deal with partially-written filenames.
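A minimal sketch of the logging side, using inotifywait from inotify-tools with a hypothetical watch root and log path; note that this version is newline-delimited, so it has exactly the filename problem described above, and fs.inotify.max_user_watches will need raising for a tree this size:

    inotifywait -m -r \
        -e create -e modify -e delete -e moved_to -e moved_from \
        --format '%e %w%f' /srv/data >> /var/log/sync-events.log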
Part 2: Weekly rsyncs, just in case. Every few days do a "rsync -a --delete" to catch any files that were missed. The solution in Part 1 is imperfect: if your program doesn't keep up with inotify, it may miss some files; if the machine reboots, the log of created/deleted/modified files may lose the most recent items; bugs and other issues may also result in missed files.
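As a sketch, that catch-up pass can just be a cron entry (schedule, hostname, and paths are only placeholders):

    0 3 * * 0  rsync -aH --delete /srv/data/ backup:/srv/backup/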
Optional Part 3: After you have this running for a few weeks and you've gotten all the bugs worked out, you'll still find that rsync occasionally catches files that were missed. I promise you that will happen; inotify is "best effort". At that point you'll realize that maintaining the code from Part 1 and Part 2 is twice as much work as you expected. To solve this problem, throw away the code you wrote in Part 1, because rsync was all you really needed in the first place.