I'm using keepalived to switch a floating IP between two VMs.
/etc/keepalived/keepalived.conf on VM 1:
vrrp_instance VI_1 {
    state MASTER
    interface ens160
    virtual_router_id 101
    priority 150        # higher priority: this node is preferred as MASTER
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        1.2.3.4
    }
}
/etc/keepalived/keepalived.conf on VM 2:
vrrp_instance VI_1 {
    state MASTER
    interface ens160
    virtual_router_id 101
    priority 100        # lower priority: this node only takes over if VM 1 fails
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret
    }
    virtual_ipaddress {
        1.2.3.4
    }
}
This basically works fine, with one exception: every time systemd gets updated (the VMs run Ubuntu 18.04), it reloads its network component, which drops the floating IP because it is not configured in the system. Since both keepalived instances can still ping each other, neither of them sees anything wrong, neither reacts, and the floating IP stays down.
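For reference, the behaviour can be provoked manually instead of waiting for a systemd update; this is only a sketch, assuming netplan's systemd-networkd backend manages ens160, and whether the restart actually removes the VIP depends on the networkd version:

# trigger the same network reconfiguration that a systemd update causes
systemctl restart systemd-networkd

# check whether the floating IP is still present on the interface
ip addr show dev ens160 | grep 1.2.3.4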
I found that you can check for the IP with a simple script like this:
vrrp_script chk_proxyip {
    script "/sbin/ip addr | /bin/grep 1.2.3.4"
}
vrrp_instance VI_1 {
    # [...]
    track_script {
        chk_proxyip
    }
}
But I'm not sure if this is a working approach.
If I understand it correctly, the following would happen if I configure this script on VM1:
- VM1 loses the IP due to a systemd restart
- keepalived on VM1 detects the loss of the IP
- keepalived on VM1 switches to the FAULT state and stops broadcasting VRRP packets
- keepalived on VM2 detects the loss of keepalived on VM1 and brings the floating IP up
At this point the IP is working again on VM2, but VM1 would stay in this state because the IP never comes up again on VM1. If VM2 goes down (for whatever reason), VM1 wouldn't take over, because it is still in the FAULT state.
How can I ensure that the floating IP is always up on one of the VMs?
Further tests:
In a check script, I tried to ping the floating IP instead of checking whether it is active on a specific host:
vrrp_script chk_proxyip {
    script "/bin/ping -c 1 -w 1 1.2.3.4"
    interval 2
}
Configuring this script on node 2 resulted in the following:
- removed the IP on node 1 for testing
- node 2 detected the IP loss and changed from BACKUP to FAULT
- node 1 ignored the state change and stayed MASTER
The result: the IP stayed down.
Configuring the script on node 1 resulted in the following:
- removed the IP on node 1
- node 1 detected the IP loss and changed from MASTER to FAULT
- node 2 detected the state change on node 1 and changed from BACKUP to MASTER, configuring the floating IP on node 2
Well, and then ...
Feb 13 10:11:26 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 13 10:11:27 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 13 10:11:29 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 100
Feb 13 10:11:29 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 13 10:11:32 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 13 10:11:33 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 13 10:11:36 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 100
Feb 13 10:11:36 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 13 10:11:38 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 13 10:11:39 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 13 10:11:41 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 100
Feb 13 10:11:41 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 13 10:11:44 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 13 10:11:45 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 13 10:11:47 node2 Keepalived_vrrp[3486]: VRRP_Instance(VI_1) Received advert with higher priority 150, ours 100
...
I had to restart keepalived on node 1 to stop the ping-pong game between the nodes.
We experienced this issue and concluded that it is a problem with systemd-networkd in Ubuntu 18.04, which now uses netplan. A newer version of keepalived should fix this, as it can detect the removal of the floating IP and trigger a failover; see https://github.com/acassen/keepalived/issues/836.
The newer version of keepalived is not available in 18.04, and rather than trying to backport it we decided to stay on Ubuntu 16.04 and wait for Ubuntu 20.04 on our servers that use keepalived.
This issue is fixed in keepalived 2.0.0, released 2018-05-26; see the keepalived changelog.
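To see which version is actually installed (and whether it already contains that fix), a quick check on a Debian/Ubuntu system is:

# version of the installed binary
keepalived --version

# version installed/available from the distribution repositories
apt-cache policy keepalived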
I think you can run a ping check against the floating IP and, when it fails, restart the keepalived service on all nodes.
Your IP will come back.
Put this in a cron job that runs every minute or every 5 minutes.
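A minimal sketch of what that could look like; the script path /usr/local/bin/check-vip.sh is made up for the example, and the address is the floating IP from the question:

#!/bin/bash
# /usr/local/bin/check-vip.sh (hypothetical path)
# Restart keepalived if the floating IP no longer answers to ping.
if ! ping -c 1 -w 1 1.2.3.4 > /dev/null 2>&1; then
    systemctl restart keepalived
fi

And a matching entry, e.g. in /etc/cron.d/check-vip, to run it every 5 minutes on each node:

*/5 * * * * root /usr/local/bin/check-vip.sh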
I think your general approach is fine, but you need to rethink your test condition. The condition you are concerned about is whether systemd is restarting the network infrastructure (the indirect consequence of that being whether or not your VIP is up), so that is what you need to check for.
I don't have a system that I can easily test on as I type this, so YMMV, however
systemctl is-active network.service
may be sufficient to cover this. Failing that, checking the state of

systemctl show network.service | grep 'ActiveState'

for a state other than 'active' should do it.
As an aside, should one of your nodes not be configured with the state 'BACKUP', rather than both as 'MASTER'?
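Wired into keepalived as a tracked script, the systemctl check above could look roughly like this; this is a sketch only, and on a netplan-based 18.04 system the unit to watch may be systemd-networkd.service rather than network.service:

vrrp_script chk_network {
    # exits non-zero when the unit is not active, sending the instance to FAULT
    script "/bin/systemctl is-active network.service"
    interval 2
}

vrrp_instance VI_1 {
    # [...]
    track_script {
        chk_network
    }
}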
As a workaround, I configured the floating IP as an additional IP on the primary node (the one with the higher priority) in /etc/netplan/01-netcfg.yaml.
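A minimal sketch of the idea, with a placeholder for the node's real primary address, could look roughly like this:

network:
  version: 2
  ethernets:
    ens160:
      addresses:
        - 192.0.2.10/24   # placeholder for the node's real primary address
        - 1.2.3.4/32      # floating IP, additionally configured via netplan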
This way, upon boot or a systemd network reconfiguration the floating IP is present on the primary node. Should the primary node fail, the IP is taken over by the secondary node via keepalived. Should the primary node return, the IP is released by keepalived on the secondary node.
It's not really a solution, but currently I don't see anything better.
Update
While this workaround kind of worked, it had some side effects: after a reboot the floating IP address existed twice on the interface.
This didn't seem to affect anything, and it worked, but it bothered me. In the end I went with the answer by mp3foley and reinstalled the VMs with Ubuntu 16.04.