I have always been curious about how updates to large-scale live web applications are done. The fact that the application is live complicates everything: you should not take your service down, and at the same time you need to carry over the activity/changes made on the site (in the database, etc.) to the new version during the update.
The first and most natural technique that comes to mind is redirecting all requests to a replicated server, so that you can update the original server without shutting down your service.
I just wonder whether there are other, smarter techniques for handling updates in a live web service. Please share your experience and opinions!
Do you run your site on a single server? If not, I would presume you have some sort of load balancer. If you do run on a single server, scale out and install a load balancer.
The joy of such a setup is that you are not only highly available; when you need to work on your application, you can stop one of the servers from accepting outside traffic and then upgrade/test your application/website during less busy traffic periods.
I serve my websites and applications (same thing, really) across 15 servers, with some used as 'sorry servers': if my main servers are busy, traffic can spill over to the sorries. In this setup I can work on my spares and upgrade them; then, once I'm happy that everything works, I slowly take one box at a time out of the pool and work on that one.
Monitoring your website/network traffic with something like Cacti (www.cacti.net) will let you see your busy times and schedule updates outside of those periods.
Hope this sheds some light.
We typically do "rolling upgrades" to address this problem. Instead of having just one server handling all your load, you have N servers, and you can "take a server out of rotation" simply by stopping the Apache web server, for example. We also need a mechanism that keeps a server up from Apache's perspective while shielding it from incoming customer connections. We do this with a simple text file whose presence tells our load balancer to set that server's weight to 0.
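The flag-file idea can be sketched in a few lines. This is a minimal illustration, not our production code; the flag path and the weight values are hypothetical:

```python
import os

# Hypothetical path for the maintenance flag file; pick whatever
# location your health-check script and ops tooling agree on.
MAINT_FLAG = "/var/tmp/maintenance"


def server_weight(flag_path: str = MAINT_FLAG) -> int:
    """Weight this server should report to the load balancer.

    If the maintenance flag exists, report 0 so the balancer stops
    sending new customer connections here; otherwise report a
    normal (hypothetical) weight of 100.
    """
    return 0 if os.path.exists(flag_path) else 100
```

Draining a box is then just `touch /var/tmp/maintenance`; removing the file puts it back in rotation on the next health-check pass.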
For load balancing, we use the open source Linux Virtual Server, or LVS: http://www.linuxvirtualserver.org
It supports a "realserver" health check. We run this health check periodically to maintain the list of active web servers behind the load balancer.
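A periodic check like that can be sketched as follows. This is an assumption-laden illustration (the `/healthcheck` path and the server list are made up; LVS itself does not ship this script):

```python
import urllib.request

# Hypothetical realserver addresses behind the balancer.
REALSERVERS = ["http://10.0.0.1:80", "http://10.0.0.2:80"]


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """One probe: a realserver is 'up' if it answers HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/healthcheck",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure... all count as down.
        return False


def active_pool(servers):
    """Subset of realservers that passed the probe."""
    return [s for s in servers if is_healthy(s)]
```

Run from cron (or a loop) every few seconds, the output of `active_pool` is what gets fed to the balancer configuration.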
By marking a host for upgrade, upgrading one server at a time, sending test traffic to it, confirming it works, and then re-adding it to the pool, we can effectively roll upgrades through a live service.
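The loop above can be expressed as a small orchestration sketch. The five callables are placeholders you would supply (shell out to your deploy tooling, touch/remove the flag file, and so on), not a real API:

```python
def rolling_upgrade(servers, drain, upgrade, smoke_test, restore):
    """Upgrade one server at a time, mirroring the steps described above.

    drain(s)      -- take the server out of rotation (e.g. create the flag file)
    upgrade(s)    -- install the new release on it
    smoke_test(s) -- send test traffic; return True if it looks healthy
    restore(s)    -- put it back in the pool

    Returns (upgraded_servers, failed_server_or_None).
    """
    upgraded = []
    for s in servers:
        drain(s)
        upgrade(s)
        if not smoke_test(s):
            # Stop rolling; the failed box stays out of the pool for
            # investigation instead of serving a broken build.
            return upgraded, s
        restore(s)
        upgraded.append(s)
    return upgraded, None
```

The important property is that at most one server is ever out of rotation, and a bad build never reaches the whole fleet.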
This is harder to do on the DB tier, where we're talking about online schema changes... but there are solutions out there, like Tungsten Cluster in the MySQL space, that try to solve this problem.