We have two servers that I inherited, both running DRBD and each running KVM virtual machines.
I would love to stop a VM running on server1 and bring up just that one VM on server2 for some tests. But with DRBD doing its thing on these servers, and the broken startup script I have from server2 (posted here), I'm nervous: I don't want to fully stop server1, just the one VM on it. I didn't create or configure these machines, and I doubt whether the DRBD setup (which I know little about) was properly implemented. Server1's stop script and server2's start script are both posted below.
But before all that, I guess I just want to know how to safely stop DRBD from mucking with the two servers for a time, so that I can mount a filesystem on server2 and bring up a VM that I stopped on server1.
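Before touching anything I can at least look at what state DRBD thinks it is in. As far as I know these are the standard status commands for DRBD 8.x (drbd-overview is the same tool the stop script echoes):

cat /proc/drbd      # kernel-level state of every resource
drbdadm role all    # Primary/Secondary role of each resource
drbd-overview       # one-line summary per resource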
Server1 site stop script:
echo poweroff -p now
echo
read -rsp $'Press any key to continue...\n' -n1 key
virsh shutdown irsc
virsh shutdown backup
virsh shutdown user
virsh shutdown repository
virsh shutdown web-firewall
virsh shutdown wiki
virsh shutdown a-gateway
virsh shutdown b-gateway
virsh shutdown dhcp
# shutdown the drbd
#drbd-stop
echo now manually turn off drbd
echo umount /systems
echo drbdadm secondary all
echo drbd-overview
Why the drbd-stop is commented out, I have no idea, and why it echoes things it should presumably be doing, I also don't know. But okay, that's the stop script. Server1's img files for the KVM guests live in /systems, by the way.
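If those echo lines were actually meant to be executed, I assume the intended manual teardown on server1 (after the VMs are shut down) was something like this, just un-echoed:

umount /systems        # release the filesystem holding the VM images
drbdadm secondary all  # demote every DRBD resource on this node
drbd-overview          # confirm everything now reports Secondary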
So I go to server2. First issue: the /systems folder has no img files in it, but there is a mount line in the startup script. Here is the start script for server2 (I have no idea what the nodedev-detach pci lines are really doing):
#!/bin/sh
# isolate the CPUs for the VMs
#site-isolate
# backup 192 network
virsh nodedev-detach pci_0000_06_10_2
# 10.7
virsh nodedev-detach pci_0000_02_10_0
# 10.5
virsh nodedev-detach pci_0000_06_10_3
# 10.2
virsh nodedev-detach pci_0000_02_10_1
# a-gateway
# 192
virsh nodedev-detach pci_0000_06_10_0
# 10.5
virsh nodedev-detach pci_0000_06_10_1
# 10.7
virsh nodedev-detach pci_0000_02_10_4
# b-gateway
# 192
virsh nodedev-detach pci_0000_06_10_4
# 10.2
virsh nodedev-detach pci_0000_02_10_5
# dhcp
# 10.5
virsh nodedev-detach pci_0000_06_10_7
# 10.7
virsh nodedev-detach pci_0000_02_11_0
# 10.2
virsh nodedev-detach pci_0000_02_11_1
# dns2
# 192
virsh nodedev-detach pci_0000_06_11_0
# web-server
# 10.7
virsh nodedev-detach pci_0000_02_11_4
# web-firewall
# 192
virsh nodedev-detach pci_0000_06_10_6
# 10.7
virsh nodedev-detach pci_0000_02_12_4
# 10.2
virsh nodedev-detach pci_0000_02_11_5
# irsc
# 10.7
virsh nodedev-detach pci_0000_02_13_0
# BTTV
virsh nodedev-detach pci_0000_09_00_0
# firewall
# 10.25
virsh nodedev-detach pci_0000_02_12_1
# 10.5
virsh nodedev-detach pci_0000_06_11_1
# bro-server
# 192
virsh nodedev-detach pci_0000_06_11_2
echo start drbd
# start the disk mirror with the slave
service drbd start
sleep 2
# now setup drbd and filesystems
# for all VM images, mount the /systems
drbdadm primary systems
mount /dev/drbd/by-res/systems /systems
# for arc-gateway
drbdadm primary arc-gateway-data
# for backup
drbdadm primary archive
drbdadm primary amanda
# for user computer
drbdadm primary users
# for web server computer
drbdadm primary web-server
# for wiki
drbdadm primary svn
# for irsc. *** this is the one I want to bring up; do I have to do this 'drbdadm primary irsc'? ***
drbdadm primary irsc
echo start vms
# start the VMs
# fundamental servers
virsh start dns2
virsh start dhcp
# take a long time to start servers
virsh start devel1
virsh start xmail
# gateways, sdss-gateway takes a long time
virsh start sdss-gateway
virsh start arc-gateway
virsh start user
# APO servers
virsh start web-server
virsh start backup
virsh start repository
virsh start wiki
virsh start irsc
# finally web firewall, now online to the world
virsh start web-firewall
As you explained in a comment above, all the VMs' root volumes are stored as image files in the filesystem mounted at /systems. To safely fail this over to the peer system you would need to stop access to this filesystem (stop all the VMs) and unmount it first. This lumps all the VMs together and means you would have to fail over all of them at once.

One option, which is generally not advised, would be to disconnect the DRBD nodes and manually cause a split-brain. Essentially both nodes would be primary at the same time, causing data divergence that you will need to manually resolve before you can reconnect them. I would first verify that your DRBD configuration doesn't include any automatic split-brain recovery options. The procedure should be similar to the below. Use caution here, particularly with the --discard-my-data command; running it from the wrong node could be disastrous.
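A sketch of that procedure, assuming DRBD 8.4 syntax (on 8.3 the discard step is spelled drbdadm -- --discard-my-data connect <resource>) and the resource names from your posted scripts; here server2 is the node whose changes get thrown away:

# beforehand: virsh shutdown irsc on server1, so only one copy of the VM runs
# --- on server2 (currently Secondary): deliberately split from server1 ---
drbdadm disconnect systems
drbdadm primary systems      # add --force only if the data is merely Consistent
mount /dev/drbd/by-res/systems /systems
# if the irsc guest also uses the separate 'irsc' DRBD resource as a data
# disk (the start script promotes it), repeat the disconnect/primary for it
virsh start irsc             # run your tests

# --- on server2, when finished testing: discard its changes and rejoin ---
virsh shutdown irsc
umount /systems
drbdadm secondary systems
drbdadm connect --discard-my-data systems   # server2 is the split-brain victim

# --- on server1 (the survivor whose data you keep) ---
drbdadm connect systems      # only needed if server1 dropped to StandAlone

While the nodes are split, anything the other VMs write to /systems on server1 diverges from server2's copy; after the reconnect, DRBD resynchronizes server2 from server1, and everything written on server2 during the test is lost.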