I am making a remote backup of my website. The whole directory tree is about 70 GB, with about 5,000,000 files in total. Here is the command that I run on my backup server:
rsync -ah -e ssh --delete --link-dest=/backups/2013.09.06 [email protected]:/var/www/backups/2013.09.07
The process runs for more than 48 hours and just hangs.
I ran strace -p on the rsync process on the client (the web server where the website is located) and saw that the process periodically stops at a select call ending with = 0 (Timeout), then continues after a while:
open("mysite/files/1694201", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=10083, ...}) = 0
read(3, "\r\n\320\224\320\265\321\201\321\217\321\202\321\214 \320\273\320\265\321\202, \321\210\320\265\321\201\321\202\321"..., 10083) = 10083
select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {59, 999998})
write(1, "\374\17\0\7", 4) = 4
select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {59, 999999})
write(1, "\320\260\320\262\320\260\320\271\321\202\320\265...\320\232\320\270\320\264\320\260\320\271\321\202\320\265 \320\274"..., 4092) = 4092
select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {59, 999999})
write(1, "\374\17\0\7", 4) = 4
select(2, NULL, [1], [1], {60, 0}) = 0 (Timeout)
The process hangs on the last line for a minute or so.
Why is this happening? Why does the process take so long and never reach the end? What do those 0 (Timeout) results in strace mean?
Both servers run rsync 3.0.9, and I/O is not overloaded.
Go read up on the 5th parameter passed to select: it is the timeout. A return of 0 (Timeout) means the descriptor rsync wants to write to (its pipe to ssh) did not become writable within the full 60 seconds, i.e. rsync is waiting for the network/remote end to accept data, not doing local work.
Plainly rsync (on its own) is not appropriate for the method you have chosen for backing up the files. It has to walk all 5 million files, generate metadata (and, with --checksum, a hash) for each one, and exchange that across the network just to find out whether anything has changed.
If it were me, I'd wrap it up in a script running on the source server which
checks the time (tstart) at which the previous successful sync started,
finds all files on the source with mtime > tstart,
and rsyncs those modified files to the backup server,
e.g.
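Here is a minimal sketch of such a wrapper, assuming the site lives under /var/www/mysite, the backup target is user@backupserver:/backups/current, and /var/backups/.last_sync is used as the timestamp file (all of these are placeholders, not values from the question):

#!/bin/sh
# Sketch: sync only files changed since the last successful run.
SRC=/var/www/mysite
DEST=user@backupserver:/backups/current
STAMP=/var/backups/.last_sync       # mtime of this file = tstart of the last good run

# Record the time this run starts, so files modified while it runs
# are picked up by the next run. On the very first run, create STAMP
# with an old date (or do one full rsync) before using this script.
NEWSTAMP=$(mktemp)

cd "$SRC" || exit 1
# Files with mtime newer than the previous successful start, null-separated,
# fed straight into rsync as its file list.
find . -type f -newer "$STAMP" -print0 \
  | rsync -ah -e ssh --from0 --files-from=- . "$DEST" \
  && mv "$NEWSTAMP" "$STAMP"        # advance tstart only if rsync succeeded

This avoids walking and comparing all 5 million files on every run; only the changed ones are even handed to rsync.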
Are you sure you have 5 million files?
I'd rather make a tgz and rsync that tgz, since the initial src-to-dst comparison would take forever with fairly "normal" HDDs and no high-speed SAN or SSD.
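For example (a sketch; the archive name and destination path are placeholders):

tar czf /tmp/site-2013.09.07.tgz -C /var/www mysite
rsync -ah --progress -e ssh /tmp/site-2013.09.07.tgz user@backupserver:/backups/

Note that this gives up the per-file hard-link deduplication that --link-dest provides; every backup is one full archive.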
Where is your process slow? During the file transfer, or during the initial src<->dst check (sending incremental file list ...)?
I'd check iowait on both ends, if possible, and if the machines have md-raid, cat /proc/mdstat. Very bad I/O performance can be the result of a rebuilding RAID (but that's very unlikely).
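For example (iostat comes from the sysstat package):

iostat -x 5        # watch %iowait and per-device utilisation on both machines
cat /proc/mdstat   # shows whether any md array is currently resyncing/rebuilding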
And I'd do a transfer of a single large file with --progress switched on to check the network speed. These are debugging hints; you should test each possible bottleneck, even if just to make sure it is NOT the problem.
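For example, with a throwaway 1 GB test file (paths and host are placeholders):

dd if=/dev/zero of=/tmp/testfile bs=1M count=1024
rsync -ah --progress -e ssh /tmp/testfile user@backupserver:/tmp/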