I recently signed up with Rackspace to host some database servers. I've got two MySQL servers set up, and have a method to create backups (using Percona XtraBackup and the innobackupex tool). I've been trying to use duplicity to copy these backups to S3 and CloudFiles storage, and it is taking forever! I wouldn't expect the S3 backup to be particularly fast, but the CloudFiles backup has taken 15 hours to back up 9 GB. That's horrendously slow, and unacceptable for me.
I've looked through the duplicity source code, and by default it does not use the Rackspace ServiceNet to transfer to CloudFiles. I then looked at the source of the python-cloudfiles library that duplicity uses for its CF backend, and saw that there is an environment variable (RACKSPACE_SERVICENET) for enabling ServiceNet. As long as that is set to something, the cloudfiles lib should connect to CloudFiles via the Rackspace ServiceNet, which SHOULD make for fast transfers. It does not.
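Here's a rough sketch of the kind of check I've been doing, going at the python-cloudfiles library directly (the servicenet keyword argument and the connection_args attribute are assumptions from my reading of the lib's source, so treat them as such):

    # Check whether the cloudfiles lib actually picks the ServiceNet endpoint.
    # Credentials are placeholders. ServiceNet storage URLs are the public ones
    # with an "snet-" host prefix, so dumping the parsed endpoint shows which
    # network the lib chose. When running duplicity itself, RACKSPACE_SERVICENET
    # of course has to be exported in duplicity's environment, not just here.
    import os
    import cloudfiles

    os.environ["RACKSPACE_SERVICENET"] = "True"  # any non-empty value, as far as I can tell

    conn = cloudfiles.get_connection(
        "my_username",
        "my_api_key",
        servicenet=True,  # assumption: some versions also accept this keyword directly
    )
    print(conn.connection_args)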
I'm not sure if the speed limitation is due to some limitation of CloudFiles itself, or if the python cloudfiles library isn't actually connecting via the Rackspace ServiceNet.
Do any of y'all have any other suggestions for how I should/could go about getting these backups off the server and onto a 3rd party or remote backup service?
Maybe not a full answer, more a suggestion: could you not set up an Amazon EC2 instance that continually mirrors (or trails by a few minutes) the main DB servers? Then you could run backups off that EC2 instance directly to S3 and get faster transfer speeds, as well as reducing the load on your primary DB machines.
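For the last leg, something as simple as this from the EC2 box would do (a sketch using the boto library; the bucket name and file paths are just placeholders):

    # Sketch: push a finished backup from the EC2 replica straight to S3 with boto.
    # Bucket name and file path are placeholders; AWS credentials come from the
    # environment or boto config.
    import boto

    conn = boto.connect_s3()
    bucket = conn.get_bucket("my-db-backups")

    key = bucket.new_key("mysql/full-backup.tar.gz")
    key.set_contents_from_filename("/backups/full-backup.tar.gz")

Since that upload stays inside AWS, it should run at EC2-to-S3 speeds rather than crossing the public internet from Rackspace.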
Although 15 hours for 9 GB works out, if my maths is right, to roughly 0.17 MB/s (about 1.4 Mbps), which does sound like an issue. It might be worth contacting Rackspace support and asking them why transfers are so slow.
We use Rackspace Server Backup (a.k.a. JungleDisk Server Backup), which, like duplicity, does local dedupe and compression and then uploads "chunks" via HTTP to a cloud provider. We saw some performance issues, and the underlying reason was that our provisioning points for cloud files vs. cloud servers were different. Our cloud servers were being created in the DFW datacenter, but all cloud files buckets for JungleDisk are in the ORD datacenter.
Rackspace does not currently give people a choice of which datacenter they will use, because the DFW facility is near capacity, so everything for "newer" accounts is being provisioned in ORD. You have to open a support ticket to get your provisioning point changed.
Also, you can't use ServiceNet between Rackspace datacenters (yet).
That said, we do see 40+ Mbps during backups even crossing Rackspace datacenters using Rackspace Cloud Backup, so I suspect you have some form of configuration issue with duplicity, or you are disk- or CPU-bound during the backup. Have you tried pushing to the same Cloud Files target with something other than duplicity? How does a simple HTTP PUT of a large file perform (i.e. exclude duplicity from the test)?
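If it helps, here's roughly how we'd do that kind of raw-throughput test (a sketch against the legacy python-cloudfiles library; the container and file names are placeholders, and I believe duplicity's cloudfiles backend reads the same CLOUDFILES_USERNAME / CLOUDFILES_APIKEY variables, so you likely have them set already):

    # Rough throughput test: time a single large object upload to Cloud Files,
    # bypassing duplicity entirely. Create the test file beforehand, e.g. with dd.
    import os
    import time
    import cloudfiles

    conn = cloudfiles.get_connection(
        os.environ["CLOUDFILES_USERNAME"],
        os.environ["CLOUDFILES_APIKEY"],
    )
    container = conn.create_container("upload-speed-test")

    test_file = "/tmp/1gb-test.bin"
    obj = container.create_object("1gb-test.bin")

    start = time.time()
    obj.load_from_filename(test_file)
    elapsed = time.time() - start

    size_mb = os.path.getsize(test_file) / (1024.0 * 1024.0)
    print("%.0f MB in %.1f s = %.2f MB/s" % (size_mb, elapsed, size_mb / elapsed))

If that number is also down around the ~0.2 MB/s you're seeing, the problem is the network path or Cloud Files itself; if it's much faster, look at the duplicity side.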