Setup:
I would like to back up 1 TB weekly from the PROD server to the BACKUP server. Both servers run Linux (Ubuntu-1004-lucid-64-minimal, kernel 2.6.32-35-server).
There is the BACKUP server (where I start the rsync program) and the PROD server with the data.
The command I currently use is the following:
time rsync -r --delete [email protected]:/home/myuser/data .
Issue:
The issue: while the backup runs, the PROD server becomes nearly unresponsive, and the web application running on it almost dies.
UPDATE: Current working solution
After some feedback, I'm now using the following command to back up the 1 TB of data, and it is working fine:
rsync -r --delete --rsync-path "ionice -c 3 nice rsync" --bwlimit=30000 [email protected]:/home/myuser/data .
Please note that I set a bandwidth limit with --bwlimit=30000 (the value is in KBytes per second) because the 100 Mbps connection between the PROD and BACKUP servers is shared with the production traffic of my web applications.
Please note that I execute this command on the BACKUP server, which is why I use the --rsync-path option: it lets me nice and ionice the rsync process on the remote (PROD) server.
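Since the backup is meant to run weekly, it can be scheduled from cron on the BACKUP server. A minimal sketch (the schedule, user, destination directory and log path below are just placeholders, not my exact setup):

# /etc/cron.d/weekly-backup -- Sundays at 03:00, run as myuser on the BACKUP server
0 3 * * 0 myuser cd /home/myuser/data-backup && rsync -r --delete --rsync-path "ionice -c 3 nice rsync" --bwlimit=30000 [email protected]:/home/myuser/data . >> /home/myuser/backup-rsync.log 2>&1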
My original questions for possible solutions
How can I control the impact of rsync (started on the BACKUP server)?
How would you solve this issue?
My brief research turned up the following possibilities:
Execute rsync so that the 1 TB is synced in chunks, e.g.
rsync /source/[0-9]* [email protected]:/source_backup/
rsync /source/[a-h]* [email protected]:/source_backup/
rsync /source/[i-p]* [email protected]:/source_backup/
rsync /source/[q-z]* [email protected]:/source_backup/
Would it help to limit bandwidth with the option
--bwlimit=10000
Is it possible to nice the process on the remote machine somehow? e.g.
nice -n19 backup.sh
I don't know whether the rsync process on the PROD machine would be niced too. (A combined sketch of options 1 and 3 follows this list.)
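For reference, here is a sketch of how options 1 and 3 could be combined (the chunk patterns and the --bwlimit value are only examples, and ionice -c 3 assumes PROD's kernel uses the CFQ I/O scheduler):

#!/bin/sh
# Sync the data in chunks so no single run has to hold a huge file list,
# with the remote rsync started via nice/ionice and the transfer bandwidth-limited.
for pattern in '[0-9]*' '[a-h]*' '[i-p]*' '[q-z]*'; do
    rsync -r --delete \
          --rsync-path "ionice -c 3 nice rsync" \
          --bwlimit=10000 \
          "[email protected]:/home/myuser/data/$pattern" .
done
# Note: with per-chunk runs, --delete only prunes inside the directories each pattern matches.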
Any help and ideas are very welcome.
I love rsync. But it still has a design flaw: it wants to "load up" a list of every file in the directory tree it is scanning. It used to wait for the entire tree to be scanned before it would start transmitting the list to the peer; that seems to be fixed now, and it overlaps scanning and transferring better than before. However, it still wants to hold the whole list. The impact is proportional to the number of files, not the total size of the data.
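The newer incremental-recursion behaviour needs a reasonably recent rsync (3.0 or later, protocol 30) on both ends; a quick way to check what each side is running:

rsync --version
ssh [email protected] rsync --version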
All of this involves I/O to read the directory entries scattered across the disk, and that cost cannot be avoided by splitting things up, since everything must still be scanned. However, I have found that the impact is much greater when the list is very large in a single run: it takes up a lot of virtual memory, and the way rsync works on the list keeps a high demand on real RAM. That memory pressure forces other processes to swap.
Breaking up the directory tree, as you suggested, will help spread out that memory demand.
There is also a downside if your data uses hardlinked files. If you have hardlinked files AND they are hardlinked across the parts you split the directory tree into, rsync loses the ability to reproduce the same hardlinking on the target (backup server). That results in greater space usage on the target and, depending on what you use the hardlinking for, may break how your data works (for example, if a change to one file is expected to be visible through another). If you are not explicitly using hardlinks, this won't be an issue for you.
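If you are not sure whether hardlinks are in play, a quick check on the PROD server could look like this (GNU find; the path is the one from your command):

# list files with more than one link, grouped by inode so cross-directory pairs are visible
find /home/myuser/data -type f -links +1 -printf '%i %n %p\n' | sort -n | head -n 40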
Count the number of files in each section you break the tree into, and try to keep the counts as balanced as you can. The best number of files per run depends on how much physical RAM you have and how much of it other processes need.
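To get a feel for that balance, something like this counts the files under each top-level directory (the path is the one from your command; adjust as needed):

# file count per top-level directory, smallest first
for d in /home/myuser/data/*/; do
    printf '%10d %s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -n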
The bandwidth and nice settings are unlikely to help much with the memory issue, though the bandwidth limit can still help if there are also network capacity problems.