Question

morpheous

Asked: 2010-01-31 10:09:07 +0800 CST

How to choose a cloud service for backups

I am thinking of using a cloud service to back up one of my client's websites.

My (client's) main concerns are, in decreasing order of importance:

Protection of IP (trade secrets, source code), user account details etc
Uptime guarantee offered by service provider (to minimise webserver down times)
Cost
Upload/download speeds

Ideally, I would like a service that does not have a long tie-in (i.e. I would prefer a kind of "pay-as-you-go" service).

I would also like to avoid vendor lock-in, where it is next to impossible to move to another service.

I would like some general guidelines on:

How to go about choosing a service provider
Who are the main players in the field
Recommendations of software to use for backup/restore and for upload/download of the saved/restored files

The server software is either going to be Ubuntu or Debian (I'll probably post a question on which OS to go for as a server - I am already familiar with Ubuntu)

6 Answers

RichVel · Answer 1 · 2011-10-08T02:56:47+08:00

Any solution that doesn't include encryption on the client side with keys held by the owner is not going to meet the first stated requirement (IP protection / security) - any hack of the server side discloses unencrypted data. This rules out cloud syncing systems such as Dropbox that own the keys.

To avoid hosting the all-important encryption keys on the website's server, which is also likely to be hacked at some point, here's what I would do:

(1) In-house backup server on the customer's own site - has encryption keys and SSH keys for both other servers
(2) Server hosting the website - could be a web host
(3) Cloud backup server or service

Step 1: Server (1) pulls the backup from (2), so most hacks of the website server will not compromise backups. Encryption takes place at this point.

I would use rsnapshot over SSH using key-based login, as this has minimal requirements on the web host and in-house backup server. Unless you have a large DB to back up, it is very efficient in bandwidth; it also stores multiple versions of the site and handles purging of old backups.
Encryption could be done by any file-to-file tool such as GPG, copying the rsnapshot tree to another tree - or you could use duplicity for step 2, saving disk space.
"Pull" from the backup server is important - if the main server (2) has the passwords/keys for the backup server, hackers can and sometimes will delete the backups after hacking the main server (see below). Really advanced hacks can install trojaned SSH binaries which could then compromise the backup server, but that's less likely for most companies.

Step 2: server (1) pushes the encrypted backups to (3) so that there is an offsite backup. If the backups were encrypted in step 1, you can just use an rsync mirror of the local rsnapshot tree to the remote system.
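For the variant where encryption already happens in step 1, the flow on server (1) could be sketched roughly as below. This is only a sketch under assumptions: the paths, the host names, the GPG recipient, and the `daily` rsnapshot level are all hypothetical placeholders, and rsnapshot.conf is assumed to already contain a backup line that pulls from the web host over SSH. Encrypting to a public key means only that public key has to live on the backup server.

```shell
#!/bin/sh
# Sketch for the in-house backup server (1). Every path, host and key
# below is a hypothetical placeholder, not something from the answer.
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-/srv/rsnapshot}"    # rsnapshot's snapshot_root
ENCRYPTED_ROOT="${ENCRYPTED_ROOT:-/srv/encrypted}"  # parallel tree of .gpg files
GPG_RECIPIENT="${GPG_RECIPIENT:-backup@example.com}"
OFFSITE="${OFFSITE:-backup@offsite.example.net}"    # server (3)
OFFSITE_DIR="${OFFSITE_DIR:-backups/website}"

# Map a file in the newest snapshot to its location in the encrypted tree.
encrypted_path() {
    rel="${1#"$SNAPSHOT_ROOT"/daily.0/}"
    printf '%s/%s.gpg\n' "$ENCRYPTED_ROOT" "$rel"
}

# Encrypt every regular file in daily.0 to the recipient's public key.
encrypt_snapshot() {
    find "$SNAPSHOT_ROOT/daily.0" -type f | while IFS= read -r src; do
        dst=$(encrypted_path "$src")
        mkdir -p "$(dirname "$dst")"
        gpg --batch --yes --recipient "$GPG_RECIPIENT" \
            --output "$dst" --encrypt "$src" || exit 1
    done
}

# Step 1: pull from the web server (2), then encrypt locally.
pull_and_encrypt() {
    rsnapshot daily && encrypt_snapshot
}

# Step 2: mirror the encrypted tree to server (3). Note that --delete
# also propagates deletions, so the local rsnapshot tree stays the
# place where old versions are kept.
mirror_offsite() {
    rsync -a --delete -e ssh "$ENCRYPTED_ROOT/" "$OFFSITE:$OFFSITE_DIR/"
}
```

A nightly cron job on server (1) could then run `pull_and_encrypt && mirror_offsite`. This naive version re-encrypts every file on each run, so a real script would probably skip files whose encrypted copy is already up to date.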

Duplicity would be a good option to directly encrypt and back up the unencrypted rsnapshot tree onto the remote server. Duplicity's features are a bit different from rsnapshot's: it uses GPG-encrypted tar archives, but it provides backup encryption on the remote host and only requires SSH on that host (or it can use Amazon S3). Duplicity doesn't support hard links, so if these are required (e.g. for a full server backup), it's best if a script converts the rsnapshot tree (which does support hard links) into a tar file (maybe just the files that have >1 hard link, which will be quite small) so duplicity can back up the tar file.
Since the remote server is just an SSH host, possibly with rsync, it could be a web host (but from a different hosting provider and in a different part of the country), or a cloud service that provides rsync and/or SSH - see this answer on rsync backups to cloud for its recommendation of bqbackup and rsync.net, though I don't agree with the backup setup mentioned.
You can use Amazon S3 as the remote server with duplicity, which would give you really good availability though perhaps it would cost more for large backups.
Other options for remote encrypted backups are Boxbackup (not quite as mature, some nice features) and Tarsnap (commercial cloud service based on Amazon S3 with simple command line interface, good deduplication and very thorough encryption).
JungleDisk may be an option but I haven't had a great experience with them in the past and their encryption has some issues (from the Tarsnap author).
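If duplicity does the encryption in step 2, the push from server (1) might look something like the sketch below. The source directory, remote URL, and GPG key ID are hypothetical, and the 30-day full-backup interval and 6-month expiry are arbitrary illustrative choices. Duplicity's command-line syntax has changed between major versions, so check the invocation against the version you actually install. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch for step 2 with duplicity, run on server (1).
# All values below are hypothetical placeholders.
SOURCE_DIR="${SOURCE_DIR:-/srv/rsnapshot/daily.0}"
REMOTE_URL="${REMOTE_URL:-sftp://backup@offsite.example.net/backups/website}"
GPG_KEY_ID="${GPG_KEY_ID:-ABCD1234}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Incremental backup, with a fresh full backup once the last one is
# older than 30 days. Data is encrypted locally before upload.
push_offsite() {
    run duplicity --encrypt-key "$GPG_KEY_ID" --full-if-older-than 30D \
        "$SOURCE_DIR" "$REMOTE_URL"
}

# Delete backup sets older than six months from the remote side.
expire_offsite() {
    run duplicity remove-older-than 6M --force "$REMOTE_URL"
}
```

Calling `push_offsite` with the defaults just prints the duplicity command line; setting `DRY_RUN=0` makes it run for real.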

The security of all the various hosts is important, so this should be adjusted to meet the security profile of the client, i.e. analyse the threats, risks, attack vectors, etc. Ubuntu Server is not a bad start, as its LTS releases get frequent security updates for 5 years, but attention to security is required on all servers.

This setup provides 2 independent backups, one of which can be a highly available cloud storage service, operates in pull mode so most attacks on the website cannot destroy the backups at the same time, and it uses well proven open source tools that don't require much administration.

Independent backups are critical, because hackers really do sometimes delete all backups at the same time as hacking the website - in the most recent case hackers destroyed 4800 websites, including backups, by hacking the web hosting environment rather than the sites. See also this answer and this one.
Restoring is very easy with rsnapshot - there is one file in each snapshot tree for every file backed up, so just find the files with Linux tools and rsync or scp them back to the website. If the on-site backup server is unavailable for some reason, just use duplicity to restore them from the cloud backup server - or you can use standard tools like GPG, rdiff and tar to restore the backups.
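As a rough illustration of both restore paths, here is a small sketch. The snapshot root, web host, and remote URL are hypothetical, and the layout of files inside each snapshot depends on how rsnapshot.conf is set up. It defaults to a dry run that prints the rsync or duplicity command instead of executing it.

```shell
#!/bin/sh
# Restore sketch, run on server (1). All values are hypothetical.
SNAPSHOT_ROOT="${SNAPSHOT_ROOT:-/srv/rsnapshot}"
WEB_HOST="${WEB_HOST:-deploy@www.example.com}"
REMOTE_URL="${REMOTE_URL:-sftp://backup@offsite.example.net/backups/website}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List every stored copy of a file, one line per snapshot that has it.
# $1 is the trailing part of the path, e.g. website/index.php
find_versions() {
    find "$SNAPSHOT_ROOT" -type f -path "*/$1" | sort
}

# Copy one file from a snapshot back to the website over SSH.
# $1 = local snapshot file, $2 = destination path on the web host
restore_from_snapshot() {
    run rsync -a -e ssh "$1" "$WEB_HOST:$2"
}

# Fallback when the on-site server is gone: restore the latest cloud
# backup into a local directory ($1) using duplicity.
restore_from_cloud() {
    run duplicity restore "$REMOTE_URL" "$1"
}
```

A typical session would be `find_versions website/index.php` to pick a version, followed by `restore_from_snapshot` with the chosen path.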

Since this setup uses standard SSH and rsync, it should be easier to choose a suitable provider with the right uptime guarantees, strong security, etc. You don't have to lock in to a long contract, and if the backup service has a catastrophic failure, you still have a local backup and can switch to another backup service quite easily.

Tobu · Answer 2 · 2010-01-31T16:48:27+08:00

Software-wise, consider duplicity for incremental backups with asymmetric encryption and a dumb receiver (non-cloud howto).

Jason Berlinsky · Answer 3 · 2010-02-01T00:15:24+08:00

I always tell my clients that the best, least expensive and most efficient backup solution is one that you build yourself, for your own purposes.

When I build a system for my clients, I use rsync with SSH keys to handle authentication between serverA and serverB, where serverA contains the data to be backed up. The command to archive and rsync the data is contained in a bash script in a non-web-accessible directory, called by cron every H hours (H = 24 for daily, and so on).

The backup server, serverB, is to be used SOLELY for backups. I always advise my clients to use an extremely long password together with SSH key authentication, both for uploading and for downloading backups. Sometimes my clients need backups to be kept for D days, so I write some scripts to handle that (take data from the active backup directory, apply a timestamp, add to an archive in another directory).
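A minimal sketch of that kind of script is below. It is not the author's actual script: the directories, the serverB address, the archive naming scheme, and the 14-day retention default are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of an archive-and-rsync backup from serverA to serverB.
# All paths, hosts and the retention period are hypothetical.
SITE_DIR="${SITE_DIR:-/var/www}"                  # data to back up
STAGING_DIR="${STAGING_DIR:-/var/backups/site}"   # outside the web root
BACKUP_HOST="${BACKUP_HOST:-backup@serverB.example.com}"
BACKUP_DIR="${BACKUP_DIR:-/srv/backups/site}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"            # the "D days" above

# Create a timestamped tarball of the site and print its path.
make_archive() {
    archive="$STAGING_DIR/site-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$STAGING_DIR" &&
        tar -czf "$archive" -C "$SITE_DIR" . &&
        printf '%s\n' "$archive"
}

# Copy one archive to serverB over SSH (key-based login assumed).
push_archive() {
    rsync -a -e ssh "$1" "$BACKUP_HOST:$BACKUP_DIR/"
}

# Remove archives in directory $1 that are older than RETENTION_DAYS.
# This can run on serverA for the staging area and on serverB for the
# stored backups.
prune_old_archives() {
    find "$1" -type f -name 'site-*.tar.gz' -mtime +"$RETENTION_DAYS" \
        -exec rm -f {} +
}

# Example crontab entry on serverA for a daily run at 02:00
# (the script path is hypothetical):
#   0 2 * * * /usr/local/sbin/site-backup.sh
```

The cron job would call something like `a=$(make_archive) && push_archive "$a" && prune_old_archives "$STAGING_DIR"`. Note that this is a push from serverA, so serverA holds credentials for serverB.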

RJFalconer · Answer 4 · 2010-01-31T10:49:25+08:00

For small business / prosumer use, I'd recommend Amazon's Simple Storage Service (S3).

Region control (i.e. objects stored in the EU never leave the EU).
99.9% uptime for any given billing cycle
$0.150 per GB stored per month
$0.170 per GB downloaded
Free upload until June 2010, $0.10 per GB thereafter

And the rather vague assurance that "Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access"
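To get a feel for what those rates mean in practice, a quick estimate can be scripted. The per-GB prices below are simply the ones quoted in this answer (with the post-June-2010 upload rate) and will be out of date; the function name and the example volumes are made up for illustration.

```shell
#!/bin/sh
# Rough monthly S3 cost estimate using the per-GB prices quoted above.
# Usage: s3_monthly_cost STORED_GB DOWNLOADED_GB UPLOADED_GB
s3_monthly_cost() {
    awk -v stored="$1" -v down="$2" -v up="$3" 'BEGIN {
        # storage + download + upload, in US dollars
        printf "%.2f\n", stored * 0.150 + down * 0.170 + up * 0.10
    }'
}

# Example: 100 GB stored, 10 GB restored, 20 GB of new uploads
s3_monthly_cost 100 10 20   # prints 18.70
```

For a backup workload, storage dominates the bill unless you restore large amounts of data regularly.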

phoebus · Answer 5 · 2010-01-31T15:21:59+08:00

While bluenovember is on the right track with S3, Amazon's system isn't really a drop-in backup solution, it's a raw data storage solution that still requires a front end system to be used for backup, whether that's a few API calls or a full backup management suite. Something like JungleDisk Server Edition, which uses S3 at the backend but provides a better interface for use as a backup solution, would probably be better.

In addition, JungleDisk would give you built-in encryption, something you'd need to add on regardless of how you plan to connect to S3/"the cloud". They have some pretty nice client software for Linux as well.

Rob · Answer 6 · 2011-10-08T02:16:37+08:00

I like to store my backup within Amazon AWS and I use the free tool s3cmd (http://s3tools.org/s3cmd)

It can be installed quite easily (Debian: apt-get install s3cmd).

All you need is an Amazon AWS account to store your files on S3. Then a simple command can run your backup, either incrementally or as a sync solution, e.g.:

s3cmd sync /srv/backup s3://your-bucket-name-at-amazon/

Make sure you run

s3cmd --configure

first to enter your AWS credentials.
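If the sync is going to run unattended from cron, a small wrapper that refuses to run before s3cmd has been configured can save some confusion. This is just a sketch: the function name is made up, and it assumes s3cmd's default config location of ~/.s3cfg.

```shell
#!/bin/sh
# Wrapper sketch around the s3cmd sync command shown above.
# The bucket name is the placeholder from the answer.
S3CFG="${S3CFG:-$HOME/.s3cfg}"    # assumed default s3cmd config file
BACKUP_DIR="${BACKUP_DIR:-/srv/backup}"
BUCKET_URL="${BUCKET_URL:-s3://your-bucket-name-at-amazon/}"

# Sync the backup directory to S3, but only if credentials exist.
sync_to_s3() {
    if [ ! -f "$S3CFG" ]; then
        echo "No s3cmd config at $S3CFG; run 's3cmd --configure' first" >&2
        return 1
    fi
    s3cmd --config "$S3CFG" sync "$BACKUP_DIR" "$BUCKET_URL"
}
```

Keep in mind that, as discussed in the first answer, data uploaded this way is only as private as the AWS credentials stored on the server, unless it is encrypted before upload.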
