I'm backing up a Linux box over SMB to a NAS. I mount the NAS locally and then rsync a lot of data (100GB or so). I believe it's taking an awfully long time: more than 12 hours. I would expect it to be much faster once everything has been copied, since almost nothing changes from day to day.
Is there a way to speed this up?
I was thinking that maybe rsync assumes it's working with local hard disks and uses checksums instead of time/size comparisons? But I couldn't find a way to force time and date comparisons. Is there anything else I could check?
I think you're misunderstanding the rsync algorithm and how the tool should be applied.
Rsync's performance advantage comes from doing delta transfers-- that is, moving only the changed bits in a file. In order to determine the changed bits, the file has to be read by the source and destination hosts and block checksums compared to determine which bits changed. This is the "magic" part of rsync-- the rsync algorithm itself.
When you're mounting the destination volume with SMB and using rsync to copy files from what Linux "sees" as a local source and a local destination (both mounted on that machine), most modern rsync versions switch to 'whole file' copy mode, and switch off the delta copy algorithm. This is a "win" because, with the delta-copy algorithm on, rsync would read the entire destination file (over the wire from the NAS) in order to determine what bits of the file have changed.
The "right way" to use rsync is to run the rsync server on one machine and the rsync client on the other. Each machine reads files from its own local storage (which should be very fast), the two agree on which bits of the files have changed, and only those bits are transferred. The way you're using rsync amounts to a trumped-up 'cp'. You could accomplish the same thing with 'cp' and it would probably be faster.
If your NAS device supports running an rsync server (or client) then you're in business. If you're just going to mount it on the source machine via SMB then you might as well just use 'cp' to copy the files.
It sounds like timestamps are your problem, as this page relates:
http://www.goodjobsucking.com/?p=16
The proposed solution is to add an extra option to the rsync parameters.
Yes, you can speed it up. You need to make either the source or the destination look like a remote machine, for example by addressing it as "localhost:". You stated that you are mounting the SMB share locally, which makes both the source and the destination look like local paths to rsync. The rsync man page states that copies where the source and destination are both local paths copy whole files (this is stated in the paragraph for the "--whole-file" option), so the delta algorithm isn't used. Using the "localhost:" workaround (e.g. `rsync -a /source/ localhost:/mnt/nas/`) restores the delta algorithm and will speed up transfers.

Thought I would throw my 2p in here.
My brother has just installed a Buffalo NAS on his office network. He's now looking at off-site backups, so that should the office burn down, at least he still has all his business documents elsewhere (many hundreds of miles away).
My first hurdle was getting his VPS (a small Linux virtual private server, nothing too beefy) to dial in as a VPN user to his broadband router (he's using a DrayTek for this), so that the VPS is part of his VPN and can access the NAS directly, in a secure fashion. Got that sorted and working brilliantly.
The next problem was transferring the files from the NAS to the VPS. I started off with a Samba mount and ran into exactly the same issue you've described, or even worse. A dry-run rsync took over 1 hour 30 minutes just to work out which files it was going to transfer because, as Evan says, the other end isn't running rsync, so it has to make many filesystem calls/reads over the Samba mount (across a PPTP-tunnelled connection with a round-trip time of about 40ms). Completely unworkable.
Little did I know that the Buffalo actually runs an rsync daemon. Using that instead, the entire dry run takes only 1 minute 30 seconds for 87k files totalling 50GB. Obviously, transferring 50GB of files from a NAS on a broadband link with only 100k/sec of outbound bandwidth is another matter entirely (it will take several days), but once the initial rsync is complete, any incremental backups should be greased lightning (his data is not going to change much on a daily basis).
My suggestion is to use a decent NAS that supports rsync, for the reasons Evan gives above. It will solve all your problems.
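If your NAS does expose an rsync daemon, the usage looks something like this sketch (the hostname "nas" and module name "share" are placeholders; check your NAS documentation for the real module names):

```shell
# List the modules the daemon on the NAS exports:
rsync nas::

# Dry run first to see what would be transferred:
rsync -avn nas::share/ /backup/nas/

# Then the real incremental copy:
rsync -av nas::share/ /backup/nas/
```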
Smells like you have a cheap NAS. It could also be a network bandwidth problem...
"Standard" consumer NAS units are really weak when it comes to heavy I/O, which is what you are asking of one here. It could also be a cheap switch between your PC and your NAS that can't handle all the traffic reliably.
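One quick way to test that theory is to measure raw write throughput to the SMB mount with dd; if that figure is slow, the bottleneck is the network or the NAS, not rsync. The mount path is a placeholder, so the runnable demo below writes to a temp directory instead:

```shell
#!/bin/sh
# Measure sequential write throughput. On a real setup, point TARGET
# at the SMB mount (e.g. /mnt/nas); a temp dir is used here so the
# sketch is self-contained. conv=fsync flushes the data to disk so
# the reported rate isn't just cache speed.
set -e
TARGET=$(mktemp -d)
dd if=/dev/zero of="$TARGET/testfile" bs=1M count=64 conv=fsync 2>&1
rm -f "$TARGET/testfile"
echo done
```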
Try this; I think it will get you at least 10% more than the speed you're getting now: http://www.thegeekstuff.com/2009/09/linux-remote-backup-using-rsnapshot-rsync-utility/
There are two potential sources of the problem: either you are using incorrect command-line options, or your NAS has issues with timestamping (or both :-). Please check the thread "rsync to NAS copies everything every time" for more info.