I was wondering about the differences between cloud computing management tools and server configuration software, as there seems to be an overlap between them. I would like some feedback from people who have used both kinds of software.
What tools do you use to help you manage your cloud, especially with Rackspace Cloud?
Things like provisioning, monitoring, autoscaling, alerting, etc. I'm just a developer, and for a little while I'm on devops duty.
Here is some background on why I am asking:
I am a developer who has been managing several medium-sized (300,000 to 550,000 page views monthly) custom web applications. We run them on a single Rackspace server: 16 cores / 32 GB RAM / RAID striping.
Some of the legacy applications are not well designed and can be resource hungry, and the server is frequently overwhelmed.
Another company just acquired us and we now have to manage their sites. I estimate 350 more views a month.
We need to move into the cloud for administrative reasons, and we are interested in its autoscaling possibilities. However, we are tied to Rackspace Cloud.
Configuration of the servers is no problem, as we have several Chef recipes to do most of the heavy lifting.
What we need is a way to spin up new servers easily, and something that monitors the servers and either alerts us or creates a replacement server.
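To make the "alert us or create a replacement" requirement concrete, here is a minimal sketch of one monitoring pass. It is not tied to any provider: `is_healthy`, `alert`, and `replace` are hypothetical callables you would have to supply yourself (for example, an HTTP check, an email sender, and a call into your provider's API followed by a Chef run), and the thresholds are arbitrary assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServerState:
    name: str
    consecutive_failures: int = 0


def check_fleet(
    fleet: Dict[str, ServerState],
    is_healthy: Callable[[str], bool],   # hypothetical health probe
    alert: Callable[[str], None],        # hypothetical notifier
    replace: Callable[[str], None],      # hypothetical "build a replacement" hook
    alert_after: int = 1,
    replace_after: int = 3,
) -> List[str]:
    """Run one monitoring pass and return the names of servers that were replaced."""
    replaced = []
    for name, state in fleet.items():
        if is_healthy(name):
            # A single good check clears the failure streak.
            state.consecutive_failures = 0
            continue
        state.consecutive_failures += 1
        if state.consecutive_failures >= replace_after:
            # Repeated failures: stop alerting and build a replacement.
            replace(name)
            state.consecutive_failures = 0
            replaced.append(name)
        elif state.consecutive_failures >= alert_after:
            # Early failures: just tell a human.
            alert(name)
    return replaced
```

Calling `check_fleet` from a cron job or a simple loop would give "alert first, replace if it stays down" behaviour. Requiring several consecutive failures before replacing is a design choice meant to avoid rebuilding a server over a single dropped check.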
I tried Scalr.net, but after a promising first day everything went downhill. It started to behave erratically:
- some servers didn't boot
- others went into error mode
- Scalr wasn't receiving statistics (so no autoscaling)
- at one point I deleted the servers and Scalr didn't notice
I'm still waiting on Scalr support. To tell you the truth, Rackspace may be partially at fault, but Scalr is heavily geared towards AWS, so its integration with Rackspace isn't as solid. Rackspace hasn't been any help either. They have yet to provide an explanation.
Then I tried RightScale, my second choice because of price and openness, but it seems to suffer from the same problems as Scalr. They make Rackspace a second-class citizen.
UPDATE: Actually asked a question :)
Recently I've been experimenting with various cloud management tools (RightScale, Scalr, and custom scripts) for managing a variety of servers, each hosting several roles (app, db, load balancer, job queues, etc.).
The one thing I find lacking in most solutions is a way to do rolling deployments, i.e. running deployments sequentially across a number of servers with the same role. For instance, I don't want to build all of my web servers at the same time, as that will almost definitely result in some downtime or 500s for my customers. I'd rather have one or two servers build at a time, while the other servers are still available to handle requests.
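As a rough illustration of what I mean by a rolling deployment, here is a minimal Python sketch. It is only the batching logic: `take_out`, `build`, and `put_back` are hypothetical hooks (for example, removing a node from the load balancer, running the build script over SSH, and re-adding the node), not functions from RightScale, Scalr, or any other tool.

```python
from typing import Callable, Iterator, List, Tuple


def batches(servers: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` servers."""
    for start in range(0, len(servers), size):
        yield servers[start:start + size]


def rolling_deploy(
    servers: List[str],
    batch_size: int,
    take_out: Callable[[str], None],   # hypothetical: drain / remove from rotation
    build: Callable[[str], bool],      # hypothetical: run the build, True on success
    put_back: Callable[[str], None],   # hypothetical: return to rotation
) -> Tuple[List[str], List[str]]:
    """Build servers one batch at a time; return (built, failed)."""
    built: List[str] = []
    failed: List[str] = []
    for batch in batches(servers, batch_size):
        # Only this batch leaves rotation; everything else keeps serving.
        for name in batch:
            take_out(name)
        for name in batch:
            if build(name):
                put_back(name)
                built.append(name)
            else:
                # Keep a broken server out of rotation.
                failed.append(name)
        if failed:
            # Stop here so the remaining servers stay on the old, working build.
            break
    return built, failed
```

With `batch_size=1` or `2`, at most that many servers are out of rotation at any moment, and a failed build halts the rollout before it reaches the rest of the role.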
The other alternative is obviously to launch new servers that automatically update themselves on boot, but this isn't as cost effective, and most likely requires more time for the build to complete (it's faster to build on an existing server than to launch a new server and kill old ones).
We've all heard of the big companies having the famous "push to build" button (companies like Twilio, Etsy, etc.), but it seems that they all have custom implementations of this. I'm not talking about a simple ssh-loop, clusterssh, or even mcollective. I would prefer something with a nice, simple interface that lets me specify something like a RightScript or a Scalr script to run on a set of servers with a specific role, and that builds them sequentially.
Does anyone know of easy ways to get this done, or is this a candidate for a new open source project?
I run a site that has high traffic surges, so an auto scaling solution would be very valuable in this case. Currently the web server is able to auto scale horizontally, but the bottleneck is the MySQL server.
- I have tried Amazon RDS Multi-AZ, but it takes about 15 minutes for the 12 GB database to upgrade, with some minutes of downtime. It has helped a lot when I knew in advance that a traffic surge was going to happen at a specific time.
- I have also considered Xeround. This is probably the best solution, although it is quite expensive for databases of this size. Anyway, it is not an option because I legally need the database to be in the European Union.
- I have read about Scalr, but I am not sure whether or how it could help.
- I have seen that many cloud hosting providers offer vertical scaling, which I think has zero downtime (I am not sure that is really possible; as far as I know they use the Xen hypervisor). That could be a solution, but I wonder whether it really has no downtime, and how the MySQL config (and many other things in the OS) can be upgraded without downtime as well.
- I have tried with MySQL slave servers but it was not helpful at all.
- I am using memcache which helps a lot but it is not enough. I need to upgrade because of writes, not just because of reads.
Any suggestions? Thank you in advance
Of the folks managing their own clusters (i.e. not using/paying for Amazon Autoscale, Rightscale, Scalr, etc.), how are you managing your instances on EC2 and handling (e.g.) failover? I'm wondering if most folks just end up writing their own boatloads of scripts against the EC2 API, as I suspect.
That's certainly our approach: whip up our own Python Boto-based monitoring/restarting daemon that runs off-site, listening for UDP keep-alives from our instances. On failure, we snapshot volumes, register images, start new instances, delete old volumes, and so on.
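For anyone curious, the keep-alive side of such a daemon can be sketched roughly as below. This is a simplified illustration, not our actual script: it assumes each datagram's payload is just the instance ID, that the timeout is chosen by you, and that `on_failure` is a hypothetical callback where the Boto-based recovery (snapshot, register image, start a new instance, and so on) would go.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


class KeepAliveTracker:
    """Remember when each instance was last heard from."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, instance_id: str, now: float) -> None:
        self.last_seen[instance_id] = now

    def stale(self, now: float) -> List[str]:
        """Return instances silent for longer than the timeout."""
        return sorted(
            instance_id
            for instance_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def receive_keepalive(sock: socket.socket, tracker: KeepAliveTracker) -> Optional[str]:
    """Read one datagram if one arrives before the socket timeout."""
    try:
        payload, _addr = sock.recvfrom(1024)
    except socket.timeout:
        return None
    # Assumption: the payload is the sender's instance ID as UTF-8 text.
    instance_id = payload.decode("utf-8", errors="replace").strip()
    if not instance_id:
        return None
    tracker.record(instance_id, time.monotonic())
    return instance_id


def run(host: str, port: int, tracker: KeepAliveTracker,
        on_failure: Callable[[str], None]) -> None:
    """Listen forever, calling on_failure once per instance that goes silent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(1.0)  # wake up regularly to check for stale instances
        while True:
            receive_keepalive(sock, tracker)
            for instance_id in tracker.stale(time.monotonic()):
                on_failure(instance_id)  # hypothetical recovery hook
                del tracker.last_seen[instance_id]
```

Running the listener off-site matters: if it lived in the same availability zone as the instances, an outage could silence both the instances and the thing watching them.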
Every so often, when hacking on our scripts, I think there must be some open-source tools out there that deal with these issues already, and which don't have the constraints of (say) Scalr, but I always come back from Google empty-handed. (Things like Scalr are pretty limited in the supported set/versions/configurations of software, and have specialized and IMO cumbersome ways of manipulating these setups.)
Also, the Linux-HA/Pacemaker ecosystem (Heartbeat, ldirectord, etc.) sounds like it isn't really suited for EC2. (But then I found this - though I'm not sure this is really a high-quality solution).