I need to build a 2-node cluster(-like?) solution in active-passive mode, that is, one server is active while the other is passive (standby) and continuously gets the data replicated from the active one. KVM-based virtual machines would be running on the active node.
If the active node becomes unavailable for any reason, I would like to manually switch roles so that the second node becomes active (and the other passive).
I've seen this tutorial: https://www.alteeve.com/w/AN!Cluster_Tutorial_2#Technologies_We_Will_Use
However, I'm not brave enough to build something that complex and trust it to fail over fully automatically and operate correctly. There is too much risk of a split-brain situation, the complexity failing in some way, data corruption, etc., while my maximum-downtime requirement is not so severe as to require immediate automatic failover.
I'm having trouble finding information on how to build this kind of configuration. If you have done this, please share the info / HOWTO in an answer.
Or maybe it is possible to build highly reliable automatic failover with Linux nodes? The trouble with Linux high availability is that there seems to have been a surge of interest in the concept around 8 years ago, and many tutorials are quite old by now. This suggests there may have been substantial problems with HA in practice, and some/many sysadmins simply dropped it.
If that is possible, please share the info how to build it and your experiences with clusters running in production.
Why not use something that has been tested by thousands of users and has proven its reliability? You can just deploy the free Hyper-V Server with, for example, StarWind VSAN Free and get true HA without any issues. Check out this manual: https://www.starwindsoftware.com/resource-library/starwind-virtual-san-hyperconverged-2-node-scenario-with-hyper-v-server-2016
I have a very similar installation to the setup you described: a KVM server with a standby replica via DRBD in active/passive mode. To keep the system as simple as possible (and to avoid any automatic split-brain, e.g. due to my customer messing with the cluster network), I also ditched automatic cluster failover.
The system is 5+ years old and has never given me any problems. My volume setup is the following:
I wrote some shell scripts to help me in case of failover. You can find them here.
Please note that the system was architected for maximum performance, even at the expense of features as fast snapshots and file-based (rather than volume-based) virtual disks.
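The exact volume layout and scripts are not reproduced above, but as a rough, illustrative sketch only (the volume group, logical volume, and resource names below are assumptions), the kind of stack described (an LVM logical volume backing a DRBD device, with the DRBD device used directly as the guest's raw virtual disk) can be set up along these lines:

```
# On both nodes: create the backing logical volume (names are illustrative)
lvcreate -L 100G -n lv_vm1 vg_data

# With a matching resource defined in /etc/drbd.d/vm1.res on both nodes:
drbdadm create-md vm1          # initialize DRBD metadata (run on both nodes)
drbdadm up vm1                 # bring the resource up (run on both nodes)

# On the node chosen as initially active only:
drbdadm primary --force vm1    # first promotion, starts the initial sync

# The guest then uses the DRBD device (e.g. /dev/drbd0) directly as a raw
# disk in its libvirt definition, instead of a qcow2 file on a filesystem.
```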
Rebuilding a similar active/passive setup today, I would heavily lean toward using ZFS and continuous async replication via send/recv. It is not real-time, block-based replication, but it is more than sufficient for 90%+ of cases. If real-time replication is really needed, I would use DRBD on top of a ZVOL + XFS; in fact, I tested such a setup with automatic Pacemaker switchover in my lab with great satisfaction. If using third-party modules (as ZoL is) is not possible, I would use a DRBD resource on top of an lvmthin volume + XFS.
You can totally set up DRBD and use it in a purely manual fashion. The process should not be complex at all: you would simply do what a Pacemaker or Rgmanager cluster does, but by hand, essentially along the lines of the sketch below.
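As an illustration only (not a tested procedure), the manual switchover boils down to something like the following, assuming a single resource named vm1 and a libvirt-managed guest of the same name:

```
# On the old active node, if it is still reachable: release everything cleanly
virsh shutdown vm1            # stop the guest
umount /srv/vm1               # only if a filesystem sits on top of DRBD
drbdadm secondary vm1         # demote the DRBD resource

# On the node that should become active:
drbdadm primary vm1           # promote the DRBD resource
mount /dev/drbd0 /srv/vm1     # again, only if a filesystem is used
virsh start vm1               # start the guest from its local definition
```

If the old node is truly dead, the demotion step is skipped, and you must be certain it stays down before promoting the survivor; that judgment call is exactly what replaces automatic fencing in a manual setup.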
Naturally, this will require that both nodes have the proper packages installed and that the VMs' configuration and definitions exist on both nodes.
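Regarding the ZFS send/recv approach mentioned above, a minimal sketch of periodic incremental replication (the dataset name, peer hostname, and snapshot naming are assumptions, and error handling is omitted) could look like:

```
#!/bin/bash
# Incrementally replicate a dataset to the standby node.
# Assumes a snapshot named "prev" already exists on both sides
# (created by an initial full "zfs send | zfs recv").
DATASET=data/vm1
TARGET=standby-node

zfs snapshot "${DATASET}@new"
zfs send -i "${DATASET}@prev" "${DATASET}@new" | ssh "${TARGET}" zfs recv -F "${DATASET}"

# Rotate snapshot names for the next incremental run
zfs destroy "${DATASET}@prev"
ssh "${TARGET}" zfs destroy "${DATASET}@prev"
zfs rename "${DATASET}@new" "${DATASET}@prev"
ssh "${TARGET}" zfs rename "${DATASET}@new" "${DATASET}@prev"
```

Run from cron at whatever interval matches your acceptable data-loss window.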
I can assure you that the Linux HA stack (Corosync and Pacemaker) is still actively developed and supported. Many guides are old because the software has been around for 10 years. When done properly, there are no major problems or issues. It is not abandoned, it is just no longer "new and exciting".
Active/passive clusters are still heavily used in many places and running in production. Please find below our production setup; it is working fine, and you can either let it run in manual mode (orchestrate=start) or enable automatic failover (orchestrate=ha). We use ZFS to benefit from zfs send/receive and zfs snapshots, but it is also possible to use DRBD if you prefer synchronous replication.

Prerequisites:
Steps:
A few explanations:

- {svcname} in the service config file is a reference pointing to the actual service name (win1)
- the zfs dataset data/win1 is mounted on mountpoint /srv/win1
- the kvm guest is named win1
- sync#1 is used to declare an asynchronous zfs dataset replication to the slave node (data/win1 on node1 is sent to data/win1 on node2), once per 12 hours in the example (zfs send/receive is managed by the opensvc agent)

Some management commands:
- svcmgr -s win1 start : start the service
- svcmgr -s win1 stop : stop the service
- svcmgr -s win1 stop --rid container#0 : stop the container referenced as container#0 in the config file
- svcmgr -s win1 switch : relocate the service to the other node (see the combined example below)
- svcmgr -s win1 sync update : trigger an incremental zfs dataset copy
- svcmgr -s win1 sync full : trigger a full zfs dataset copy

Some services I manage also need zfs snapshots on a regular basis (daily/weekly/monthly), with retention. In that case I add the following config snippet to the service configuration file, and the opensvc agent does the job.
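The opensvc snapshot snippet itself is not shown above. Separately, just combining the management commands already listed, a planned manual relocation of the win1 service could look like this (exact behavior may vary with the agent version):

```
# Push a fresh incremental copy of the zfs dataset to the peer node,
# then relocate the service (stop locally, start on the other node).
svcmgr -s win1 sync update
svcmgr -s win1 switch
```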
As requested, I am also adding an lvm/drbd/kvm config:
DRBD resource config /etc/drbd.d/kvmdrbd.res:
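The actual resource file is not reproduced in the answer. Purely as an illustrative sketch (the DRBD device, backing logical volume, hostnames, and addresses below are all assumptions), such a two-node resource typically looks like:

```
# Illustrative only: the same file goes on both nodes
cat > /etc/drbd.d/kvmdrbd.res <<'EOF'
resource kvmdrbd {
    device    /dev/drbd0;
    disk      /dev/vgdrbd/lvdrbd;
    meta-disk internal;

    on node1 {
        address 192.168.10.1:7789;
    }
    on node2 {
        address 192.168.10.2:7789;
    }
}
EOF
```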
OpenSVC service config file /etc/opensvc/kvmdrbd.conf:

Some explanations:
- disk#1 : the LVM VG hosting the big logical volume; it should be at least 5 GB.
- disk#2 : the DRBD disk pointed to by the DRBD resource name. If the opensvc service is named "foo", you should have /etc/drbd.d/foo.res, or change the disk#2.res parameter in the service config file.
- fs#0 : the main filesystem hosting all disk files for the kvm guest.
- container#0 : the kvm guest, with the same name as the opensvc service in the example. The agent must be able to DNS-resolve the kvm guest name in order to do a ping check before accepting to start the service (if the ping answers, the kvm guest is already running somewhere and it is not a good idea to start it; double-start protection is ensured by the opensvc agent).
- standby = true : means that this resource must remain up when the service is running on the other node. In our example, it is needed to keep DRBD running fine.
- shared = true : see https://docs.opensvc.com/latest/agent.service.provisioning.html#shared-resources

I'm currently running an extremely similar system: 2 servers, one active, one backup, and both have a few VMs running inside them. The database is being replicated, and the file servers are in constant sync with rsync (but only one way). In case of emergency, traffic is served from the secondary server. There was the idea of using Pacemaker and Corosync, but since this has to be 100% reliable, I didn't have the courage to experiment. My idea is to have NginX watching over the servers. This works because I'm running a web application, but in your case I don't know if you could use it. DRBD is a mess for me: the previous servers were using it, and while it seemingly worked, it felt like I was trying to dissect a human body.
Check this out, it might help you: http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs
It doesn't look hard; in fact, I've already tried it in a small environment and it worked. Easy to learn, easy to set up, easy to maintain. Actually, I think this is what you are looking for.
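For what it is worth, the one-way file sync mentioned above is typically just a cron-driven rsync; a minimal sketch (paths and hostname are assumptions) would be:

```
# One-way mirror of the file share to the standby node (illustrative paths/host)
rsync -a --delete /srv/files/ standby-node:/srv/files/
```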