I have a project that will require a lot of computationally intensive services running 24/7, such as crawlers and calculations for recommendations. My plan is to host the public website, which users log in to and interact with, on Amazon's EC2 service so that it is almost always up (yes, I know Amazon recently had a huge outage). Since EC2 charges for the compute capacity you use, I'd like to run all my computationally intensive services on my own local computers and keep the database in the Amazon cloud synced with my local databases. I think this should keep my costs pretty low. My one question is: what is the best way to connect everything?

To give a bit more detail while still being somewhat vague: say a user adds what I'll call an *object* through the web interface. I then need to somehow let my local servers know that the user has added this object and that they should perform a task *t*. Say task *t* modifies the database. I could either have it modify the Amazon database, so that the changes are immediately apparent to the user, or have it modify my local database, which would eventually replicate to the Amazon database. So here are my two main questions:
- How would you transfer information between the cloud and the local servers? Would you have, say, a REST API on both the web app on EC2 and the local services so they can communicate over HTTP, or would you use SSH somehow to keep a continuous connection open between the two?
- Would you have all database changes made to the Amazon database, so that changes are immediately visible to users, or should the local services only make changes to the local database?
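One way to picture the REST option from the first question is a pull-based job queue: the web app records a pending task when a user adds an object, and the local worker periodically asks the cloud side for work and posts the result back. The sketch below is a minimal, self-contained illustration under stated assumptions: `CloudSide`, `next_task`, `post_result`, and `perform_task_t` are hypothetical names, an in-memory `queue.Queue` stands in for a real queue or table on the Amazon side, and the method calls stand in for HTTPS endpoints such as a hypothetical `GET /tasks/next` and `POST /results`.

```python
import queue
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-in for the cloud side. In a real deployment this would be
# a queue or table on the Amazon side, exposed to the local worker over HTTPS.
@dataclass
class CloudSide:
    pending: "queue.Queue[dict]" = field(default_factory=queue.Queue)
    results: dict = field(default_factory=dict)  # stands in for the Amazon database

    def add_object(self, object_id: int, payload: str) -> None:
        """Called by the web app when a user adds an object."""
        self.pending.put({"object_id": object_id, "payload": payload})

    def next_task(self) -> Optional[dict]:
        """What a hypothetical GET /tasks/next endpoint would return (None if idle)."""
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def post_result(self, object_id: int, result: str) -> None:
        """What a hypothetical POST /results endpoint would do."""
        self.results[object_id] = result


def perform_task_t(payload: str) -> str:
    # Placeholder for the expensive computation ("task t" in the question).
    return payload.upper()


def local_worker_poll_once(cloud: CloudSide) -> bool:
    """One polling iteration of the local worker. Returns True if it did work."""
    task = cloud.next_task()
    if task is None:
        return False
    cloud.post_result(task["object_id"], perform_task_t(task["payload"]))
    return True


# Usage: the web app enqueues two objects, then the local worker drains the queue.
cloud = CloudSide()
cloud.add_object(1, "hello")
cloud.add_object(2, "world")
while local_worker_poll_once(cloud):
    pass
print(cloud.results)  # {1: 'HELLO', 2: 'WORLD'}
```

Because the local worker only makes outbound requests, nothing on the local network has to accept inbound connections from EC2. In this sketch the worker writes its result straight to the cloud side, which corresponds to the "modify the Amazon database" option in the second question, so the user sees the change as soon as the task finishes.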
All suggestions or recommendations are greatly appreciated.
It's hard to answer without knowing more about your situation. Does it matter if the user adds the object but doesn't see it for an hour? What is the object? A db record, or is it a file?
Are your computations database-intensive or does the database just store the output of a computation? If the latter, why have a local database at all? If the former, are your computations mostly selects? You could set up master-slave replication between the two databases, with the computations only running on the slave.
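The replication idea above amounts to read/write splitting: heavy `SELECT`s run against the local replica (slave), and only the small result set is written to the primary (master). The sketch below assumes the primary is the Amazon database and the replica is the local one; two in-memory SQLite databases stand in for real MySQL or PostgreSQL servers, and the replica is seeded by hand to mimic one that is fully caught up. The table names and `compute_recommendations` are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for a real database server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, score REAL)")
    conn.execute("CREATE TABLE recommendations (user_id INTEGER PRIMARY KEY, best_item INTEGER)")
    return conn


def seed(conn: sqlite3.Connection) -> None:
    rows = [(1, 10, 0.2), (2, 10, 0.9), (3, 20, 0.5)]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    conn.commit()


def compute_recommendations(replica: sqlite3.Connection, primary: sqlite3.Connection) -> None:
    # The expensive read runs on the local replica: highest-scoring item per user.
    best = replica.execute(
        "SELECT user_id, id FROM items AS i "
        "WHERE score = (SELECT MAX(score) FROM items WHERE user_id = i.user_id)"
    ).fetchall()
    # Only the small result set is written, and only to the primary.
    primary.executemany(
        "INSERT OR REPLACE INTO recommendations (user_id, best_item) VALUES (?, ?)", best
    )
    primary.commit()


primary, replica = make_db(), make_db()
seed(primary)
seed(replica)  # mimics a replica that has fully caught up with the primary
compute_recommendations(replica, primary)
print(primary.execute("SELECT * FROM recommendations ORDER BY user_id").fetchall())
```

With real master-slave replication the replica is typically kept read-only, and the rows written to the primary replicate back down to it on their own; the sketch only shows which connection each kind of query should use.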
In your situation you may want to look into Amazon Virtual Private Cloud (VPC) for communication between your EC2 instances and local servers.