I'm running a simple rails stack on a single dedicated machine. We're reaching our full capacity and have absolutely no setup for scaling, just one app on one machine. I did some research and came up with a potential stack for scalability. I'm no pro admin, but I have drawn up a few thoughts on what we should do with EC2. I'm still a bit unsure about filesystem sharing, which is where my main question is. First, here's what I'm dealing with.
Current stack:
- rails 2.3.11
- postgresql
- passenger+nginx
- delayed_job
- sphinx + thinking_sphinx
- imagemagick (heavy image processing)
- jaxer (will explain)
What our app does:
Our app does a lot of image uploads and heavy image processing with ImageMagick. It also talks to jaxer for lengthy canvas-to-image conversions. All of this is in delayed job. We'd like to make sure this stuff can scale especially. So we're talking about fast-growing file storage needs and heavy image processing in background jobs.
My decisions so far:
- use rubber gem for help with deployment/administration
- move from delayed_job to redis/resque for easier decoupling of workers (client/server), multiple queues, and sinatra web interface
- have roles like app, db, web, redis, resque, with everything on a single ec2 instance at first, but soon split out the redis/resque stuff into a separate instance, and potentially more of those
Questions:
The main actual question is: what happens to all the files? If I decide to split the app role into multiple instances, how do I get a shared filesystem access?
Additionally it'd be great to hear some thoughts on my setup in general.
So, most of what you've got there is pretty straightforward to scale. PgSQL onto it's own machine to relieve a pile of CPU/disk IO/memory consumption as a first step. Sphinx into it's own little world too. Definitely switch to resque to allow easy horizontal scaling of your workers.
But the files... yes, the files are difficult. They're always difficult. And your options are perilously thin.
Some people will recommend the clustered filesystem route, either magic superclustering (GFS2/OCFS2) or the slightly poorer-mans option of something like GlusterFS. I've run a lot of systems (1000+) using GFS, and I'd never, ever do it again. (It may not even run on EC2's network, actually). GFS2/OCFS2 are a mess of moving parts, under-documented, overcomplicated, and prone to confusing failure modes that just give you downtime hassle. It also doesn't perform worth a damn, especially in a write-heavy environment -- it just falls over, taking your entire cluster down and taking 10-30 minutes of guru-level work to get it up and running again. Avoid it, and your life is a lot easier. I've never run GlusterFS, but that's because I've never been particularly impressed with it. Anything you can do with it, there's usually a better way to do it anyway.
A better option, in my opinion, is the venerable NFS server. One machine with a big (or not so big) EBS volume and an NFS daemon running, and everyone mounts it. It has it's gotchas (it's not really a POSIX filesystem, so don't treat it as such), but for simple "there, I fixed it" operation for 99% of use cases, it's not bad. At the very least, it can be a stopgap while you work on a better solution, which is...
Use your knowledge of your application to tier out your storage. This is the approach I took most recently (scaling Github), and it's worked beautifully. Basically, rather than treating file storage as a filesystem, you treat it like a service -- provide an intelligent API for the file-storage-using parts of your application to use to do what you need to do. In your case, you might just need to be able to store and retrieve images to pre-allocated IDs (the PK for your "images" table in the DB, for instance). That doesn't need a whole POSIX filesystem, it just needs a couple of super-optimised HTTP methods (the POSTs need to be handled dynamically, but if you're really smart you can make the GETs come straight off disk as static files). Hell, you're probably serving those images straight back to customers, so cut out the middle man and make that server your publically-accessable assets server while you're at it.
The workflow might then be something like:
You don't necessarily have to use HTTP POST to put the images onto the server, either -- Github, for instance, talks to it's fileservers using Git-over-SSH, which works fine for them. The key, though, is to put more thought into where work has to be done, and avoid unnecessary use of scarce resources (network IO for an NFS server request), trading instead for a more constrained set of use options ("You can only request whole files atomically!") that work for your application.
Finally, you're going to have to consider the scalability factor. If you're using an intelligent transport, though, that's easy -- add the (small amount of) logic required to determine which fileserver you need to talk to in each place that talks to the fileserver, and you've basically got infinite scalability. In your situation, you might realise that your first fileserver is going to be full at, say, 750,000 images. So your rule is "Talk to the fileserver with hostname
fs#{image_id / 750_000}
", which isn't hard to encode everywhere.I always used to have fun with this part of my job...