I am trying to create a backup of a remote server. This is my configuration:
Server1 (webserver)
Server2 (backupserver)
This is my little script. It runs on server2:
#!/bin/bash
date=`date +%F`
basepath=/var/backup
webfolder=$basepath/$date/websites/
for f in $(ssh root@server1 "ls -l /var/www/ | egrep '^l'")
do
    if [[ $f = *.* ]]
    then
        echo "processing $f ";
        ssh root@server1 "tar zcf - /var/www/$f/web/" > $webfolder/$f.tar.gz
    fi
done;
The problem is that it is too slow! How can I speed up this script?
Updates:
I have already tried rsync without success. This is the command that I use:
/usr/bin/rsync -a --delete --numeric-ids --relative --delete-excluded \
--rsh="/usr/bin/ssh -p 22" [email protected]:/var/www \
/home/backups/daily.0/webserver/
The servers are connected by a Dell Gigabit switch. Both servers have Gigabit network cards and are in the same subnet.
Rsync solution:
In the end, thanks to the suggestions, I followed this path:
- Install rsync on all the Debian boxes
- Install rsnapshot on the backup server
- Configure the rsync daemon on each Debian box (except the backup server)
- Set up the rsnapshot cron configuration file
The first backup takes a long time.
Distro: Debian Servers
You are reinventing the wheel. You should try using rsync. rsync builds the file list for you and uses a delta-transfer algorithm that sends only the differences between source and destination, so it is fast even over slow links or over encrypted connections, which carry extra overhead.
It is very easy to run as well. Run it from server2 (rsync cannot copy between two remote hosts, so the destination is a local path):
rsync -vvaP root@server1:/var/www/ /var/backup/
I don't think this is the most likely explanation, but having read the trouble you're having with rsync, it's just possible that you're suffering from a duplex mismatch on one or both of the NIC-switch connections.
Try doing a
netstat -in
on both servers, and check the error counts on transmission. Non-zero TX-errors often signal a duplex mismatch, and one effect of those is to permit slow, small-packet (interactive) connections unimpeded, but brutally restrict full-speed bulk-data connections.
Edit (following your comment below): OK, that's not symptomatic of duplex mismatch, so ignore my suggestion. It would still be useful to find out what the bottleneck is when you try an rsync-over-ssh right now, since it's not CPU.
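For reference, the TX-error check described above can be scripted. This is only a sketch: the helper name tx_errors is made up for illustration, and it assumes the net-tools netstat output, where a header row names a TX-ERR column. The column is located by name because its position differs between net-tools versions.

```shell
#!/bin/bash
# Print every interface whose TX-ERR counter is non-zero.
# Reads `netstat -in` style output on stdin.
# tx_errors is a hypothetical helper name, not part of any package.
tx_errors() {
    awk '
        # Until the header row is seen, look for the TX-ERR column.
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "TX-ERR") col = i
            next
        }
        # Data rows: report interface name and counter when non-zero.
        $col + 0 > 0 { print $1, $col }
    '
}

# Usage on each server:
#   netstat -in | tx_errors
```

No output means no interface reported transmit errors. If an interface does show up, the negotiated duplex setting can usually be inspected with ethtool on that interface.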
Since your two servers reside on the same switch and the same network segment, my suggestion would be to set up an rsync daemon on your backup box and avoid the use of SSH altogether.
My suggested settings for your rsync daemon would be as follows. I'd give more specific instructions, but you did not mention your distribution.
The daemon can be restricted so that it is only accessible from the servers you want to back up. From there you should be able to schedule an rsync job directly to your destination without the use of SSH, eliminating that issue.
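A minimal sketch of what such a daemon setup could look like is shown below. It is not the configuration from this answer: the module name backup, the address 192.168.1.10 for the web server, and the helper name write_rsyncd_conf are all placeholders, and running the daemon as root is an assumption made so that file ownership can be preserved. Keep in mind that daemon traffic is not encrypted, which is why this only makes sense on a trusted segment.

```shell
#!/bin/bash
# Write an example rsyncd.conf for the backup box (server2) to the given path.
# Module name, address and paths are placeholders, not values from the thread.
write_rsyncd_conf() {
    cat > "$1" <<'EOF'
# Global defaults (assumption: daemon runs as root to keep ownership intact)
uid = root
gid = root
use chroot = yes
max connections = 4

# Writable module that receives the backups
[backup]
path = /var/backup
comment = website backups
read only = no
# Only the web server may connect; non-matching hosts are rejected
hosts allow = 192.168.1.10
EOF
}

# On server2 (the usual default location is /etc/rsyncd.conf):
#   write_rsyncd_conf /etc/rsyncd.conf
# On server1, a scheduled job could then push without SSH:
#   rsync -a --delete /var/www/ rsync://server2/backup/websites/
```

The hosts allow line is what implements the restriction mentioned above: when only hosts allow is set, any host that does not match is refused.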
If your site consists of a great many files, the rsync process may appear to hang at "sending incremental file list". If so, the --delete-before or --delete-after options may prove beneficial.
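To compare the two delete modes safely, it can help to print the command first and run it with --dry-run before letting it touch the backup. The sketch below only assembles the command line; rsync_cmd is a hypothetical helper name, and the source and destination in the example call are taken from the question rather than from a tested setup.

```shell
#!/bin/bash
# Print an rsync command line using the chosen delete timing.
# mode is "before" or "after"; src and dest are passed through unchanged.
# rsync_cmd is a hypothetical helper name used for illustration only.
rsync_cmd() {
    local mode="$1" src="$2" dest="$3"
    case "$mode" in
        before|after) ;;
        *) echo "mode must be 'before' or 'after'" >&2; return 1 ;;
    esac
    # --dry-run makes rsync report what it would do without changing anything.
    echo "rsync -a --numeric-ids --delete-$mode --dry-run $src $dest"
}

rsync_cmd after root@server1:/var/www/ /home/backups/daily.0/webserver/
```

Once the dry run looks right, the same command without --dry-run performs the actual transfer.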
There are also some configurations where the files are first copied and then analyzed locally. I haven't used rsync over SSH in a while but it is possible that the settings you are trying are having this effect.
I would suggest using rsnapshot. It is based on rsync as well. I use it to back up many remote servers. The first run takes some time, and after that it is very fast if your data doesn't change a lot. It is fully customisable and quite fast (in my case the network is the bottleneck).
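As a rough illustration of how little configuration rsnapshot needs, the sketch below prints a minimal rsnapshot.conf. The helper name rsnapshot_conf, the seven-day retention and the paths are assumptions, not values from this answer. rsnapshot requires tab characters between fields, so the lines are generated with printf and an explicit \t rather than typed by hand. Older rsnapshot releases use the keyword interval where newer ones use retain.

```shell
#!/bin/bash
# Emit a minimal rsnapshot.conf on stdout.
# Fields must be separated by TAB characters, hence printf with \t.
# rsnapshot_conf is a hypothetical helper; all values are placeholders.
rsnapshot_conf() {
    local root="$1" source="$2" label="$3"
    printf 'config_version\t1.2\n'
    printf 'snapshot_root\t%s\n' "$root"
    printf 'cmd_rsync\t/usr/bin/rsync\n'
    # cmd_ssh is needed because the backup source below is reached over SSH.
    printf 'cmd_ssh\t/usr/bin/ssh\n'
    # Keep seven daily snapshots (assumed retention).
    printf 'retain\tdaily\t7\n'
    printf 'backup\t%s\t%s\n' "$source" "$label"
}

rsnapshot_conf /var/backup/ root@server1:/var/www/ server1/
```

The output can be saved to a file and checked with rsnapshot's configtest command before a cron entry that runs rsnapshot daily is enabled.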