I am thinking of using a cloud service to back up one of my clients' websites.
My (client's) main concerns are, in decreasing order of importance:
- Protection of IP (trade secrets, source code), user account details, etc.
- Uptime guarantee offered by the service provider (to minimise web server downtime)
- Cost
- Upload/download speeds
Ideally, I would like a service that does not have a long tie-in (i.e. I would prefer a kind of "pay-as-you-go" service).
I would also like to avoid vendor lock-in, where it is next to impossible to move to another service.
I would like some general guidelines on:
- How to go about choosing a service provider
- Who are the main players in the field
- Recommendations for software to use for backup/restore and for upload/download of the saved/restored files
The server OS is going to be either Ubuntu or Debian (I'll probably post a separate question on which OS to go for as a server; I am already familiar with Ubuntu).
Any solution that doesn't include encryption on the client side with keys held by the owner is not going to meet the first stated requirement (IP protection / security) - any hack of the server side discloses unencrypted data. This rules out cloud syncing systems such as Dropbox that own the keys.
To avoid hosting the all-important encryption keys on the website's server, which is also likely to be hacked at some point, here's what I would do:
Step 1: The backup server (1) pulls the backup from the website server (2), so most hacks of the website server will not compromise the backups. Encryption takes place at this point.
Step 2: Server (1) pushes the encrypted backups to the offsite server (3), so that there is an offsite backup. If the backups were encrypted in step 1, you can just use an rsync mirror of the local rsnapshot tree to the remote system.
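To make the two steps more concrete, here is a minimal sketch of what server (1) might run. It is not the exact setup described above: it swaps rsnapshot for a plain rsync pull plus a tar archive, and it assumes GPG public-key encryption so that only the owner's offline private key can decrypt. All host names, paths and the GPG recipient are hypothetical placeholders, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: hosts, paths and the GPG recipient are hypothetical placeholders.
set -eu

WEB_HOST="${WEB_HOST:-backup@www.example.com}"              # server (2), the website
OFFSITE_HOST="${OFFSITE_HOST:-backup@offsite.example.net}"  # server (3), offsite storage
GPG_RECIPIENT="${GPG_RECIPIENT:-owner@example.com}"         # owner's public key
WORK_DIR="${WORK_DIR:-/srv/backup}"                         # local tree on server (1)
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Step 1a: server (1) pulls the site from server (2) over SSH.
pull_site() {
    run rsync -a --delete -e ssh "$WEB_HOST:/var/www/" "$WORK_DIR/plain/"
}

# Step 1b: archive the pulled copy and encrypt it with the public key.
# Only the public key needs to live on server (1).
encrypt_snapshot() {
    target="$WORK_DIR/encrypted/site-$(date +%Y%m%d).tar.gz.gpg"
    run sh -c 'tar -C "$1" -czf - . | gpg --output "$3" --recipient "$2" --encrypt' \
        sh "$WORK_DIR/plain" "$GPG_RECIPIENT" "$target"
}

# Step 2: push only the encrypted files to server (3).
# Add --delete here if you want a strict mirror.
push_offsite() {
    run rsync -a -e ssh "$WORK_DIR/encrypted/" "$OFFSITE_HOST:backups/"
}

run mkdir -p "$WORK_DIR/plain" "$WORK_DIR/encrypted"
pull_site
encrypt_snapshot
push_offsite
```

Run it once as-is to review the printed commands, then set DRY_RUN=0 when the hosts and key are in place. Note that the pulled copy on server (1) is unencrypted, so that machine needs the same care as the website itself.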
The security of all the various hosts is important, so this should be adjusted to meet the security profile of the client i.e. analyse the threats, risks, attack vectors, etc. Ubuntu Server is not a bad start as it has frequent security updates for 5 years, but attention to security is required on all servers.
This setup provides two independent backups, one of which can be a highly available cloud storage service. It operates in pull mode, so most attacks on the website cannot destroy the backups at the same time, and it uses well-proven open source tools that don't require much administration.
Since this setup uses standard SSH and rsync, it should be easier to choose a suitable provider with the right uptime guarantees, strong security, etc. You don't have to lock in to a long contract, and if the backup service has a catastrophic failure, you still have a local backup and can switch to another backup service quite easily.
Software-wise, consider duplicity for incremental backups with asymmetric encryption and a dumb receiver (non-cloud howto).
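A rough sketch of what that could look like with duplicity follows. The key ID, source directory and SFTP target are hypothetical, and the sub-command syntax has changed between duplicity versions, so check it against the man page for the version you install. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: key ID, source directory and target URL are hypothetical.
set -eu

ENCRYPT_KEY="${ENCRYPT_KEY:-ABCD1234}"   # ID of the GPG public key to encrypt to
SOURCE_DIR="${SOURCE_DIR:-/var/www}"
TARGET_URL="${TARGET_URL:-sftp://backup@offsite.example.net/backups/www}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# First run: a complete, encrypted backup.
full_backup() {
    run duplicity full --encrypt-key "$ENCRYPT_KEY" "$SOURCE_DIR" "$TARGET_URL"
}

# Later runs: upload only what changed since the previous backup.
incremental_backup() {
    run duplicity incremental --encrypt-key "$ENCRYPT_KEY" "$SOURCE_DIR" "$TARGET_URL"
}

# Restore into the directory given as $1; this needs the private key.
restore_to() {
    run duplicity restore "$TARGET_URL" "$1"
}

incremental_backup
```

The receiving host only ever sees encrypted volumes, which is what makes a "dumb" SFTP target acceptable. The restore step is the only one that needs the private key.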
I always tell my clients that the best, least expensive and most efficient backup solution is one that you build yourself, for your own purposes.
When I build a system for my clients, I use rsync with SSH keys to handle authentication between serverA and serverB, where serverA contains the data to be backed up. The commands to archive and rsync the data live in a bash script in a non-web-accessible directory, which cron calls every H hours (24 for daily, and so on).
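A minimal sketch of such a script, as it might look on serverA, is below. The script path, key file, directories and destination are all hypothetical, and it defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical /usr/local/sbin/site-backup.sh on serverA (outside the web root).
# Example crontab entry for a daily run at 02:30:
#   30 2 * * * /usr/local/sbin/site-backup.sh
set -eu

DATA_DIR="${DATA_DIR:-/var/www}"                    # data to back up
STAGING_DIR="${STAGING_DIR:-/var/backups/staging}"  # where the archive is built
SSH_KEY="${SSH_KEY:-/root/.ssh/backup_key}"         # key authorised on serverB
DEST="${DEST:-backup@serverB.example.com:active/}"  # active backup dir on serverB
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Pack the data directory into a single compressed archive.
make_archive() {
    run tar -czf "$STAGING_DIR/site.tar.gz" -C "$DATA_DIR" .
}

# Copy the staging directory to serverB over SSH using the dedicated key.
send_archive() {
    run rsync -a -e "ssh -i $SSH_KEY" "$STAGING_DIR/" "$DEST"
}

run mkdir -p "$STAGING_DIR"
make_archive
send_archive
```

For a run every H hours, the crontab schedule would be something like `0 */6 * * *` (every six hours) instead of the daily entry shown in the comment.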
The backup server, serverB, is to be used SOLELY for backups. I always advise my clients to use SSH key authentication with an extremely long passphrase for both uploading and downloading backups. Sometimes my clients need backups to be kept for D days, so I write some scripts to handle that (take the data from the active backup directory, apply a timestamp, and add it to an archive in another directory).
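The retention part can be sketched in a few lines of shell. This is only an illustration of the idea, not the scripts I actually ship: the directory names and the 14-day default are hypothetical, and `find -mtime` counts age in whole days.

```shell
#!/bin/sh
# Sketch for serverB: directory names and the 14-day default are hypothetical.
set -eu

ACTIVE_DIR="${ACTIVE_DIR:-/srv/backup/active}"     # where the latest upload lands
ARCHIVE_DIR="${ARCHIVE_DIR:-/srv/backup/archive}"  # timestamped history
KEEP_DAYS="${KEEP_DAYS:-14}"                       # the "D days" to keep

# Pack the active directory into a timestamped tarball in the archive directory.
archive_active() {
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$ARCHIVE_DIR"
    tar -czf "$ARCHIVE_DIR/backup-$stamp.tar.gz" -C "$ACTIVE_DIR" .
}

# Delete archives last modified more than KEEP_DAYS days ago.
prune_old() {
    find "$ARCHIVE_DIR" -type f -name 'backup-*.tar.gz' \
        -mtime +"$KEEP_DAYS" -exec rm -f {} +
}

# Only act when the active directory exists.
if [ -d "$ACTIVE_DIR" ]; then
    archive_active
    prune_old
fi
```

Called from cron on serverB after each upload, this keeps roughly the last KEEP_DAYS days of archives.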
For small business / prosumer use, I'd recommend Amazon's Simple Storage Service (S3).
And the rather vague assurance that "Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access"
While bluenovember is on the right track with S3, Amazon's system isn't really a drop-in backup solution; it's a raw data storage service that still requires a front-end system to be used for backup, whether that's a few API calls or a full backup management suite. Something like JungleDisk Server Edition, which uses S3 at the back end but provides a better interface for use as a backup solution, would probably be better.
In addition, JungleDisk would give you built-in encryption, something you'd need to add on regardless of how you plan to connect to S3/"the cloud". They have some pretty nice client software for Linux as well.
I like to store my backups in Amazon AWS, and I use the free tool s3cmd (http://s3tools.org/s3cmd) to do it.
It can be installed quite easily (Debian: apt-get install s3cmd).
All you need is an AWS account to store your files on S3. Then a single s3cmd command can run your backup, either as an incremental upload or as a sync.
Make sure you run s3cmd --configure first to enter your AWS credentials.
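As an illustration, a small wrapper around s3cmd might look like the sketch below. The bucket name and source directory are hypothetical, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: the bucket name and source directory are hypothetical.
set -eu

SOURCE_DIR="${SOURCE_DIR:-/var/backups/site/}"
BUCKET_URL="${BUCKET_URL:-s3://example-backup-bucket/site/}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# One-time interactive setup that stores the AWS credentials.
configure() {
    run s3cmd --configure
}

# Upload a single file, e.g. one archive.
upload_file() {
    run s3cmd put "$1" "$BUCKET_URL"
}

# Sync a directory: only new or changed files are transferred.
sync_dir() {
    run s3cmd sync "$SOURCE_DIR" "$BUCKET_URL"
}

sync_dir
```

Keep in mind the earlier point about client-side encryption: anything confidential should be encrypted before it is uploaded.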