I've been playing about with Amazon S3 a little for the first time and like what I see for various reasons relating to my potential use case.
We have multiple (online) remote server boxes harvesting sensor data that is uploaded (rsync'ed) to a VPS every hour or so. The number of remote boxes is growing steadily and is forecast to keep growing into the hundreds. The servers are geographically dispersed. They are also built automatically, so they are generic, with standard tools, and not bespoke per location. The data amounts to many hundreds of files per day.
I want to avoid a situation where I need to provision more VPS storage, or additional servers, every time we hit the VPS capacity limit, i.e. after every N server deployments, whatever N might be.
The remote servers can never be considered fully secure, since we don't know what might happen to them when we're not looking. Our current solution is a bit naive: it simply restricts inbound rsync (over SSH only) to per-MAC-address directories with a known public key. There are plenty of holes to pick in this, I know.
Let's say I write or use a script like s3cmd/s3sync to potentially push up the files.
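For the sake of argument, the upload script might look something like this rough sketch (I'm using the boto3 Python SDK instead of s3cmd here, and the bucket name, prefix and paths are just placeholders I've made up):

```python
# Rough sketch of a per-server upload script; bucket/prefix/paths are made-up examples.
import os
import boto3

BUCKET = "sensor-data-collection"       # placeholder bucket name
PREFIX = "aa-bb-cc-dd-ee-ff"            # e.g. this box's MAC address
DATA_DIR = "/var/spool/sensor-data"     # wherever the harvested files land

s3 = boto3.client("s3")                 # credentials come from the environment/config

for name in sorted(os.listdir(DATA_DIR)):
    path = os.path.join(DATA_DIR, name)
    if os.path.isfile(path):
        # Upload each file under this server's prefix, then remove the local copy.
        s3.upload_file(path, BUCKET, PREFIX + "/" + name)
        os.remove(path)
```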
Would I need to manage hundreds of access keys and have each server customized to include its own? (Do-able, but key management becomes nightmarish?)
Could I restrict inbound connections somehow (e.g. by MAC address), or just allow write-only access to any client running the script? (I could deal with a flood of data if someone got into a system.)
Having a bucket per remote machine does not seem feasible because of the bucket limits?
I don't think I want to use a single common key: if one machine is breached, a malicious hacker could potentially get access to the filestore key and start deleting data for all clients, correct?
I hope my inexperience has not blinded me to some other solution that might be suggested!
I've read lots of examples of people using S3 for backup, but can't really find anything about this sort of data collection, unless my Google search terms are wrong...
I've written more than I should here; perhaps it can be summarised thus: in a perfect world I just want one of our techs to install a new remote server at a location and have it automagically start sending files home with little or no intervention, while minimising risk. Pipedream or feasible?
TIA, Aitch
Edit 1: Perhaps bad form to answer one's own question, but...
After much further googling and browsing, it appears that the (new?) Identity and Access Management (IAM) service might be what I need. It says "...IAM eliminates the need to share passwords or access keys, and makes it easy to enable or disable a User's access as appropriate..." I may start thinking about using the hardware MAC address as some sort of unique user name and a hash of some form as the password, so it can be set programmatically.
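To make that concrete, purely as a sketch of what I'm imagining (the names, bucket and prefix layout are made up), a provisioning step run from a trusted machine could create one IAM user per box, named after its MAC address, attach a write-only policy scoped to that box's prefix, and hand the generated key pair to the server build:

```python
# Sketch only: create a per-server IAM user with write-only access to its own prefix.
# Bucket name, user naming scheme and prefix layout are assumptions, not an existing setup.
import json
import boto3

BUCKET = "sensor-data-collection"   # placeholder bucket name

def provision_server(mac_address: str) -> dict:
    iam = boto3.client("iam")
    user_name = "sensor-" + mac_address.replace(":", "")
    prefix = mac_address.replace(":", "-")

    iam.create_user(UserName=user_name)

    # Allow PutObject only, and only under this server's own prefix:
    # no reads, no deletes, no listing of other servers' data.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::%s/%s/*" % (BUCKET, prefix),
        }],
    }
    iam.put_user_policy(
        UserName=user_name,
        PolicyName="write-only-own-prefix",
        PolicyDocument=json.dumps(policy),
    )

    # The generated key pair is what gets baked into the new server's build.
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return {"AccessKeyId": key["AccessKeyId"],
            "SecretAccessKey": key["SecretAccessKey"]}
```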
That's correct: you'll want to use IAM (http://aws.amazon.com/documentation/iam/) to handle the per-server credentials. As far as buckets go, there's a 100-bucket limit per account. One way to use multiple buckets is to have a bucket per region; if a server is compromised, you reduce your loss. Another option is to have the servers upload to bucket A every night and have a separate secure process that moves data from bucket A to bucket B (only you have access to this bucket). If a server is compromised, you lose at most the data since the last run of the process.
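As a rough illustration of that second option (the bucket names and use of the boto3 SDK are just my assumptions), the mover could be a small script run from a trusted host that alone holds credentials for bucket B:

```python
# Sketch of the "drain bucket A into bucket B" process. Run it from a trusted host
# whose credentials can read/delete in A and write to B; bucket names are examples.
import boto3

SRC_BUCKET = "sensor-inbox"      # bucket A: the servers can only write here
DST_BUCKET = "sensor-archive"    # bucket B: only this process can touch it

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket=SRC_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Copy into the archive bucket, then remove from the inbox, so a
        # compromised server can never reach data older than the last run.
        s3.copy_object(
            Bucket=DST_BUCKET,
            Key=key,
            CopySource={"Bucket": SRC_BUCKET, "Key": key},
        )
        s3.delete_object(Bucket=SRC_BUCKET, Key=key)
```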