Forgive me if I'm not totally clear here. It isn't intentional; I'm a senior-level developer in a very small company who has to act as a manager at the moment.
Anyway, the story is that we have 2 older Dell servers running SQL Server 2008 Standard in a "cluster". I put that in quotes because I'm still not 100% clear on what that means. We have 2 brand-new blade servers and want to move the existing databases to the new hardware.
OK, so here is the gotcha: we need to do this with little or no downtime. I'm being told that we can evict the passive node and then pull in one of the new servers. But I'm also being told that this is a dangerous step, because something could go wrong that would cause the cluster to fail, and then we would be left with nothing because the active server would not be able to come back up.
Does anyone have any thoughts on how to handle this? I'm being told that the only way to ensure success is to have at least a day of downtime, during which we bring up a new cluster on the new hardware and then migrate the databases one by one.
[Edit] Since it is still related to this question, I'd like to add another one. Is it possible for us to remove a machine from the cluster, create a new cluster with the removed node as the active machine, and then bring a new server into that? That would effectively preserve the old cluster while the new machines get swapped in and out, in case something goes wrong.
While it's of little help now, you should be running Enterprise if you need high availability. The most obvious feature you would be using in this situation is the ability to have up to 16 nodes in a cluster, so in your case you would just have added 2 more nodes and then removed the ones you no longer wanted. I would consider upgrading the version while you are upgrading the hardware.
Anything is possible. While I've never seen a Server 2008 / SQL 2008 failover cluster simply drop dead, it's theoretically possible. Note that the active node is not "down" during the node upgrade, so there is nothing to take down. The cluster is simply running on 1 node without the possibility of failover. The reasonable worst-case scenario is that the old node is somehow dead and the replacement won't add, in which case you would be running without failover capability until the issue that is preventing the server from being added is resolved.
That's probably the only way to ensure the success of the guy doing the work. I'd ask the innocent question: "If it takes a day of downtime to move a cluster, why would I cluster in the first place? I could buy 2 machines and leave 1 off and ready to go for that kind of availability." In short, you need to find someone who has actually worked with clusters before and understands the technology involved. Presuming there are no unique issues (e.g. your company wrote some almost-cluster-aware software that runs on the cluster), I'd think most professional Microsoft admins would be embarrassed to say it would take a day of downtime to replace or add hardware in an existing, working cluster.
First off, the strategy at the end of your question is the way I would recommend doing it as well, but seeing as that is not an option, this is how I would handle it. You seem confused about what a cluster is. Basically, both servers have SQL and cluster services installed, and with a command through cluster services you can "roll" SQL from one server to the other. If I were in your shoes I would do it as you have suggested: roll all services to one node, remove the second node from the cluster, add one of your new servers as a cluster node, roll all services to the new cluster node, add the second new node, and remove the remaining old node from the cluster.
Please note: if you are unfamiliar with cluster services and/or clustered SQL installations and you attempt this on your live system, this could end very, very badly for you, as in far worse than the one day of planned downtime. I would either hire a consultant with experience with clusters or, if that is not an option, set up a test environment where the process can be tested inside and out.
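To make the node-by-node sequence above easier to reason about, here is a toy Python model of it. It is only a sketch, not a cluster tool: the `Cluster` class, the `rolling_replace` function, and the node names `OLD1`, `OLD2`, `NEW1`, `NEW2` are all made up for illustration, and it does not model edition limits on node count or anything that can go wrong when a real node is added. What it does show is that the services always have an owner, and at which steps the cluster has no failover partner.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a failover cluster: member nodes plus one active owner."""
    nodes: set = field(default_factory=set)
    active: str = ""

    def add_node(self, name):
        self.nodes.add(name)

    def evict_node(self, name):
        # Evicting the node that currently owns the services would be an outage.
        if name == self.active:
            raise RuntimeError(f"cannot evict active node {name}")
        self.nodes.remove(name)

    def fail_over(self, target):
        # "Roll" the services to a node that is already a cluster member.
        if target not in self.nodes:
            raise RuntimeError(f"{target} is not a cluster member")
        self.active = target

    def can_fail_over(self):
        # With a single member there is nowhere to fail over to.
        return len(self.nodes) > 1


def rolling_replace(cluster, old_passive, old_active, new_a, new_b):
    """Apply the steps in order and record the state after each one."""
    steps = [
        ("evict old passive node", lambda: cluster.evict_node(old_passive)),
        ("add first new node", lambda: cluster.add_node(new_a)),
        ("roll services to first new node", lambda: cluster.fail_over(new_a)),
        ("add second new node", lambda: cluster.add_node(new_b)),
        ("evict remaining old node", lambda: cluster.evict_node(old_active)),
    ]
    log = []
    for label, action in steps:
        action()
        log.append((label, cluster.active, sorted(cluster.nodes),
                    cluster.can_fail_over()))
    return log


# Hypothetical node names; the services start out on OLD1.
demo = Cluster(nodes={"OLD1", "OLD2"}, active="OLD1")
for entry in rolling_replace(demo, "OLD2", "OLD1", "NEW1", "NEW2"):
    print(entry)
```

In this model the only step without a failover partner is the one right after the old passive node is evicted, which matches the "running on one node" window the other answers describe.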
You don't need to break the old cluster at all unless you want to use the hardware again. I would recommend the following:
This will get your new instance into the same state as the old one, along with a fresh install of the OS and SQL. In order to cut over to the new cluster you can do the following, assuming that the name of your old instance is INSTA and the new one is INSTB:
Once this is done, the applications should still be connecting to the old name of the SQL instance, but that name will now take them to the new server. You may need to run "ipconfig /flushdns" on all the application servers to make the DNS change take effect faster; ping the old name to see when it starts resolving to the new server. We use this method for cutover because it allows us to keep the old cluster around in case we need to roll back. You will not be able to bring the old SQL instance up until you change its SQL Server Network Name parameter to something else, but once that is done you would just point the DNS alias back to the old one if you want to roll back.
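If you would rather script the "ping the old name" check than do it by hand, something like the following Python sketch could be run on each application server. It is an assumption on my part, not part of the procedure above: the function name `wait_for_alias`, its parameters, and the addresses in the demonstration are all hypothetical. It uses the standard `socket.gethostbyname` call, which goes through the operating system's resolver, so a stale local DNS cache can still delay the result.

```python
import socket
import time


def wait_for_alias(name, expected_ip, resolve=socket.gethostbyname,
                   attempts=30, delay=10.0, sleep=time.sleep):
    """Poll until `name` resolves to `expected_ip`; return True on success."""
    for attempt in range(attempts):
        try:
            current = resolve(name)
        except OSError:
            current = None  # name not resolvable right now; keep trying
        if current == expected_ip:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before asking the resolver again
    return False


# Offline demonstration with a stubbed resolver (addresses are made up):
# the first two lookups return the old address, the third the new one.
fake_answers = iter(["192.0.2.10", "192.0.2.10", "192.0.2.20"])
print(wait_for_alias("INSTA", "192.0.2.20",
                     resolve=lambda _name: next(fake_answers),
                     sleep=lambda _seconds: None))  # prints True
```

For a real check you would drop the `resolve` and `sleep` arguments and pass the old instance name together with the IP address of the new cluster's SQL network name.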
Without knowing the specifics of the hardware I can't say for sure that this would work, but my suggestion would be to image the old passive node over to the new server. An imaging tool such as Acronis that can restore an image onto different hardware should let you basically move the passive node to the new hardware. Once it is there, you can power it up, verify that it is functioning properly (as much as you can), and then try to fail over to the new hardware. Although there are many things that could go wrong, as Jim B said, there is a good chance it will either fail over properly to the new hardware, or not work and just have to go back to the old hardware. If it works, you can repeat the process on the other node. If it doesn't, you can just power the old passive node back on (you wouldn't have had to destroy it) and try something else.