Our PHP application consists of a single web server that receives files from clients and performs a CPU-intensive analysis on them. Right now, analysing a single upload takes about 3 seconds and pegs the CPU at 100%, which caps our capacity at roughly 1/3 of a request per second.
My team's requirement is to increase capacity without a lot of code reengineering. A possible solution would be to set up a load balancer in front of multiple servers running the same app, connecting to a common DB. The problem is that the analysis outputs files on disk.
A load balancer would increase capacity, but then files won't be available across servers, so subsequent client requests may fail. We are hosted on Rackspace; is there a way to configure some sort of "common" storage for all servers, without having to rewrite our file persistence code? The current code relies on simple fopen() calls and the like. What are our options?
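For context, the persistence code follows roughly this pattern (simplified; the paths and names here are just placeholders):

    <?php
    // Simplified illustration of how analysis results are written today.
    // $uploadId and the base path are placeholders, not our real names.
    function saveAnalysisResult(string $uploadId, string $contents): string
    {
        $path = '/var/www/app/analysis/' . $uploadId . '.out';

        $fh = fopen($path, 'wb');
        if ($fh === false) {
            throw new RuntimeException("Could not open $path for writing");
        }
        fwrite($fh, $contents);
        fclose($fh);

        return $path;
    }

    // A later request (possibly hitting a different server behind a
    // load balancer) reads the same file back -- this is the call that
    // breaks once there is more than one web server with local disks.
    function loadAnalysisResult(string $uploadId): string
    {
        $path = '/var/www/app/analysis/' . $uploadId . '.out';
        $data = file_get_contents($path);
        if ($data === false) {
            throw new RuntimeException("Result file not found: $path");
        }
        return $data;
    }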
GlusterFS provides a full POSIX filesystem export that can be mounted just like most other local filesystems. It replicates to a configured degree for redundancy and otherwise only pulls data on request. As long as each server is configured so that the files it creates have unique paths, even when the servers are writing independently without coordinating, you should be in a very good spot.
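If you go that route, the only PHP-side concern is making sure two servers never pick the same output path on the shared mount. A minimal sketch, assuming the Gluster volume is mounted at /mnt/gluster/analysis (the mount path is an assumption):

    <?php
    // Build a collision-proof output path on the shared GlusterFS mount.
    // uniqid() plus the hostname keeps paths unique even when two web
    // heads process uploads at the same moment.
    function sharedOutputPath(string $uploadId): string
    {
        $dir  = '/mnt/gluster/analysis';   // mounted Gluster volume (assumed path)
        $name = sprintf('%s-%s-%s.out', $uploadId, gethostname(), uniqid('', true));
        return $dir . '/' . $name;
    }

    // The existing fopen-based code keeps working unchanged, because the
    // mount behaves like any other local filesystem.
    $path = sharedOutputPath('12345');
    $fh = fopen($path, 'wb');
    fwrite($fh, "analysis output...");
    fclose($fh);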
Why don't you mount Cloud Files onto each server using CloudFuse?
You can then use Cloud Files as your common filesystem. It's not ideal for I/O-heavy work, but for occasionally saving and reading files it works fine, plus you can then serve the files from a CDN.
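A rough sketch of what that could look like, assuming the container is mounted at /mnt/cloudfiles and is CDN-enabled (the mount path and CDN hostname below are made up):

    <?php
    // Write the analysis result through the CloudFuse mount point; from
    // PHP's point of view this is still an ordinary fopen()/fwrite().
    $resultFile = '/mnt/cloudfiles/results/12345.out';   // assumed mount path
    $fh = fopen($resultFile, 'wb');
    fwrite($fh, "analysis output...");
    fclose($fh);

    // If the container is CDN-enabled, clients can fetch the file straight
    // from the CDN instead of going through a web head. The hostname here
    // is a placeholder; each CDN-enabled container gets its own.
    $cdnUrl = 'https://abc123.rackcdn.com/results/' . basename($resultFile);
    echo $cdnUrl;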
Which subsequent requests will need to access the file? Just the one user within that login session, the same user forever, or anyone? If it's one of the first two options the load balancer should be able to help.
I believe Rackspace offers an F5 BIG-IP-based load balancing service. Session persistence (sending users back to the same server for their whole session) should be an option in the load balancing service. I assume you're talking about HTTP traffic, in which case the load balancer can inject a cookie into the session and use that to make sure the client comes back to the same server where their processed file resides (either a session cookie or a time-limited one).
I don't know if Rackspace lets customers use F5 iRules, but if they do, you might even be able to handle the third case by having the load balancer work out which server is hosting the file.
If the files never get into the DB, then yes, you need a single file system used by all web heads. If the files are only used during the user session (the session that uploads the file), you can use source-IP or session-based stickiness in the load balancer to solve the problem without needing a single file system.
All the load balancers support various stickiness methods. The F5 load balancer is great, but Rackspace also sells the Brocade, which is much cheaper.
If you need to go to a single file system, that will involve some rework, and there are a number of ways to do it (e.g. one of the web heads could host the file system, or the DB server could, or a new dedicated system, or a cloud storage service from Rackspace, AWS or others).
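Whichever backend you pick, the rework on the PHP side can often be kept to swapping a base directory, so the existing fopen()-style code stays as it is. A sketch under that assumption (the environment variable and paths are just examples):

    <?php
    // Point all file persistence at one configurable base directory.
    // On a single server this is local disk; with shared storage it
    // becomes the NFS/GlusterFS/CloudFuse mount point.
    define('ANALYSIS_DIR', getenv('ANALYSIS_DIR') ?: '/var/www/app/analysis');

    function analysisPath(string $uploadId): string
    {
        return ANALYSIS_DIR . '/' . $uploadId . '.out';
    }

    // The rest of the code keeps calling fopen()/file_get_contents()
    // against analysisPath(), unaware of which storage backs the directory.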
hth!