Background: I have to remotely upgrade a server from Ubuntu 8.04 LTS to 10.04 LTS because of an incompatibility issue with the RAID controller.
The internet connection to the server is fairly stable and seldom drops. Even so, I am concerned about losing the SSH connection during the upgrade and leaving the server unreachable. I am also worried that the server will fail to boot after the upgrade, in which case I would have no way of finding out what the problem is.
Action plan: I am looking for advice on how to minimize the risk of losing the server. I am aware that what I am doing is very risky. This is my current action plan:
1) Back up everything that matters, locally and externally.
2) Temporarily disable the boot-time disk check (fsck). (I would have no clue what is going on if the disk check took a long time to finish.) This would be done in fstab by changing the very last parameter from 1 to 0:
UUID=5b1ff964-7608-44fd-a38d-7e43ad6b4c11 / ext3 relatime,errors=remount-ro 0 0
3) Start all upgrade processes with screen so that they can be resumed if I lose the connection, i.e.:
sudo screen apt-get upgrade
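Steps 2 and 3 of the plan could be sketched roughly as below. This is only an illustration under a few assumptions: the function names `disable_root_fsck` and `inside_screen` are made up for this example, and the fstab filter only prints the modified text instead of editing /etc/fstab in place, so the result can be reviewed first.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 of the plan. Both function names are hypothetical.

# Step 2: read fstab-format text on stdin and print it with the sixth
# field (fs_passno) of the root entry set to 0, which skips the boot-time
# fsck for that filesystem. Comment lines and other entries pass through
# unchanged. Note: awk rewrites the modified line with single spaces
# between fields, which fstab accepts.
disable_root_fsck() {
    awk '$1 !~ /^#/ && $2 == "/" && NF >= 6 { $6 = 0 } { print }'
}

# Step 3: succeed only when running inside a screen session.
# GNU screen exports the STY variable inside its sessions.
inside_screen() {
    [ -n "${STY:-}" ]
}

# Possible usage (not executed here):
#   sudo cp /etc/fstab /etc/fstab.bak
#   disable_root_fsck < /etc/fstab > /tmp/fstab.new   # review, then copy back
#   screen -S upgrade                                  # start a named session
#   inside_screen && sudo apt-get upgrade
#   screen -r upgrade                                  # reattach after a drop
```

One detail worth keeping in mind: a session started with `sudo screen ...` belongs to root, so it would have to be reattached with `sudo screen -r` rather than as the normal user.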
Questions:
- Does my proposed action plan seem reasonable?
- Is disabling the boot-time disk check a bad idea?
- What else could be done to decrease the risk of losing the server?
Update: Almost all answers suggested that I set up DRAC/IPMI, which I have now done. This feels like a real achievement that will surely make the risk much smaller, as I can follow the entire power cycle over KVM/console redirection. For future reference, this is what I did:
1) Installed ipmitool to set up the IP address, gateway etc. for IPMI v2.0:
sudo ipmitool lan set 1 ipaddr 192.168.1.99
sudo ipmitool lan set 1 defgw ipaddr 192.168.1.1
2) Installed free-ipmi to change the NIC selection mode to shared (I have only one network interface connected to the network):
sudo ipmi-oem dell set-nic-selection shared
3) Used DRAC's HTTPS interface at https://192.168.1.99 to launch the console redirection viewer. This lets me follow the entire boot sequence and configure the BIOS, RAID controllers etc. Awesome.
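The ipmitool calls from step 1 could be wrapped in a small dry-run script so they can be reviewed before touching the BMC. This is only a sketch: the function names, the `DRY_RUN` variable, the explicit `ipsrc static` step and the netmask argument are assumptions that are not part of the setup described above. The address and gateway in the usage comment are the ones from step 1.

```shell
#!/bin/sh
# Sketch: apply the BMC LAN settings on channel 1 and print the result.
# With DRY_RUN=1 (the default here) the commands are only echoed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = BMC IP address, $2 = default gateway, $3 = netmask (assumed value)
configure_bmc_lan() {
    run sudo ipmitool lan set 1 ipsrc static      # assumption: static addressing
    run sudo ipmitool lan set 1 ipaddr "$1"
    run sudo ipmitool lan set 1 netmask "$3"
    run sudo ipmitool lan set 1 defgw ipaddr "$2"
    run sudo ipmitool lan print 1                 # show the resulting settings
}

# Possible usage (not executed here):
#   configure_bmc_lan 192.168.1.99 192.168.1.1 255.255.255.0
#   DRY_RUN=0 configure_bmc_lan 192.168.1.99 192.168.1.1 255.255.255.0
```

Running it in dry-run mode first and reading the printed commands is a cheap way to catch a mistyped address before the BMC becomes unreachable.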
Update 2: Done. Everything went like a charm and took less than 30 minutes. I ended up not turning off the disk check, as the redirected console gave me the freedom to interrupt it whenever I wanted, but I let it run to the end.
Thank you guys, your wisdom is invaluable!