Recently I've been experimenting with various cloud management tools, such as RightScale, Scalr, and custom scripts, for managing a variety of servers, each hosting several roles (app, db, load balancer, job queues, etc.).
The one thing I find lacking in most solutions is a way to do rolling deployments, i.e. running deployments sequentially across a number of servers with the same role. For instance, I don't want to build all of my webservers at the same time, as that will almost certainly result in some downtime or 500s for my customers. I'd rather have one or two servers build at a time, while the other servers are still available to handle requests.
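To make the idea concrete, here is a minimal Python sketch of the batching logic I have in mind. It is not tied to any of the tools above: `build` and `healthy` are hypothetical callbacks standing in for whatever actually runs the build script on a host and checks that the host is serving requests again.

```python
from typing import Callable, Iterator, List, Sequence


def batches(servers: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` servers."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(servers), size):
        yield list(servers[start:start + size])


def rolling_deploy(
    servers: Sequence[str],
    batch_size: int,
    build: Callable[[str], None],    # hypothetical: runs the build on one host
    healthy: Callable[[str], bool],  # hypothetical: True once the host serves traffic
) -> List[str]:
    """Build servers one batch at a time and stop at the first unhealthy batch.

    Servers outside the current batch are left alone, so they keep
    handling requests while the batch is being built.
    """
    done: List[str] = []
    for group in batches(servers, batch_size):
        # Hosts within a batch are built one after another here for simplicity;
        # a real tool might build them in parallel.
        for host in group:
            build(host)
        failed = [host for host in group if not healthy(host)]
        if failed:
            # Abort so a bad build never reaches the remaining servers.
            raise RuntimeError("unhealthy after build: " + ", ".join(failed))
        done.extend(group)
    return done
```

Calling `rolling_deploy(["web1", "web2", "web3"], 2, build, healthy)` would build web1 and web2 first, and only move on to web3 if both pass the health check.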
The obvious alternative is to launch new servers that automatically update themselves on boot, but this isn't as cost-effective, and it most likely takes longer for the build to complete (it's faster to build on an existing server than to launch a new server and kill the old ones).
We've all heard of the big companies having the famous "push to build" button (companies like Twilio, Etsy, etc.), but it seems that they all have custom implementations of this. I'm not talking about a simple ssh loop, clusterssh, or even MCollective. Ideally I want something with a nice, simple interface that lets me specify something like a RightScript or a Scalr script to run on a set of servers with a specific role, and that builds them sequentially.
Does anyone know of an easy way to get this done, or is this a candidate for a new open source project?
Use Puppet and MCollective together. Puppet can do most of your build work. MCollective can let you pick nodes and schedule them.
http://www.devco.net/archives/2010/03/17/scheduling_puppet_with_mcollective.php
I did deploy Webistrano, but I could never get our developers to work with it. They always found some way to make it mess up the deploy.
I'm not aware of any service that will help with this type of update. The problem is that the application needs to be designed with this in mind. What you see a lot of is servers being configured to use server x as the db server or server y as the cache server. This is the biggest problem I see when I look at our legacy software and think about how we can automate update processes.
We had the same problem as you. The solution was not too difficult for us because our latest product was designed for this kind of update from the beginning, since we had seen how difficult it can be otherwise. We tried to avoid tightly coupled services in our development. This lets us launch a whole new group of servers in a staging area. Once we're done testing the staging area, we promote it to production by changing a CNAME in the DNS server. This happens without any downtime and with a low risk of updating the servers with a wrong configuration. We achieved this by using HTTP as the main communication protocol and a local DNS server.
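The promotion step can be sketched roughly as follows. This is only an illustration in Python, not our actual tooling: the zone is modelled as a plain dict mapping an alias to its CNAME target, and `smoke_test` is a hypothetical callback standing in for whatever testing is done against the staging area. In a real setup the dict update would be replaced by a call to your DNS server.

```python
from typing import Callable, Dict, Optional


def promote(
    zone: Dict[str, str],               # assumption: alias -> CNAME target
    alias: str,
    staging_target: str,
    smoke_test: Callable[[str], bool],  # hypothetical check of the staging group
) -> Optional[str]:
    """Point `alias` at the staging group once it passes its tests.

    Returns the previous target so the change can be rolled back by
    pointing the alias at it again.
    """
    if not smoke_test(staging_target):
        # Production keeps pointing at the old group; nothing changes.
        raise RuntimeError("staging group failed its tests: " + staging_target)
    previous = zone.get(alias)
    zone[alias] = staging_target  # the single switch that promotes staging
    return previous
```

Because the old group of servers is left untouched, rolling back is just another CNAME change to the returned previous target.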
I realize that it's probably no easy task to redesign your entire application to fit a specific architecture that works well with rolling updates, but that's what we found to be the easiest solution. The famous "push to build" button doesn't have to be only for the big fish; even the little tuna can get some of that action. How complex or simple your application is will determine how difficult or easy it is to build your own "push to build" button.