We have set up 3 servers running keepalived. We started noticing some random re-elections occurring which we can't explain, so I came here looking for advice.
Here is our configuration:
MASTER:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from keepalived@hostname
    smtp_server example.com:587
    smtp_connect_timeout 30
    router_id some_rate
}

vrrp_script chk_nginx {
    script "killall -0 nginx"
    interval 2
    weight 2
}

vrrp_instance VIP_61 {
    interface bond0
    virtual_router_id 61
    state MASTER
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    virtual_ipaddress {
        X.X.X.X
        X.X.X.X
        X.X.X.X
    }
    track_script {
        chk_nginx
    }
}
BACKUP1:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from keepalived@hostname
    smtp_server example.com:587
    smtp_connect_timeout 30
    router_id some_rate
}

vrrp_script chk_nginx {
    script "killall -0 nginx"
    interval 2
    weight 2
}

vrrp_instance VIP_61 {
    interface bond0
    virtual_router_id 61
    state MASTER
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    virtual_ipaddress {
        X.X.X.X
        X.X.X.X
        X.X.X.X
    }
    track_script {
        chk_nginx
    }
}
BACKUP2:
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from keepalived@hostname
    smtp_server example.com:587
    smtp_connect_timeout 30
    router_id some_rate
}

vrrp_script chk_nginx {
    script "killall -0 nginx"
    interval 2
    weight 2
}

vrrp_instance VIP_61 {
    interface bond0
    virtual_router_id 61
    state MASTER
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    virtual_ipaddress {
        X.X.X.X
        X.X.X.X
        X.X.X.X
    }
    track_script {
        chk_nginx
    }
}
Every now and then I can see this happening (grepped from the logs):
MASTER:
Jan 6 18:30:15 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
Jan 6 18:30:16 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
Jan 6 18:32:37 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
BACKUP1:
Jan 6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan 6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Received higher prio advert
Jan 6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Entering BACKUP STATE
Jan 6 18:32:37 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) forcing a new MASTER election
Jan 6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan 6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Received higher prio advert
Jan 6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Entering BACKUP STATE
BACKUP2:
Jan 6 18:32:36 lb-public03 Keepalived_vrrp[14255]: VRRP_Script(chk_nginx) succeeded
Jan 6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan 6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Received higher prio advert
Jan 6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Entering BACKUP STATE
So the MASTER receives a LOWER priority advert and a NEW election is started. WHY? It looks like a BACKUP transitions to MASTER for a short period of time (based on the logs) and then falls back to the BACKUP state. I'm quite clueless as to why this is actually happening, so any hints would be more than welcome.
Also, I found out there is a unicast patch for keepalived, however it's not clear to me whether it supports more than one unicast peer - in our case we have a cluster of 3 machines, so we need more than one unicast peer.
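From what I gather, the unicast syntax that eventually made it into mainline keepalived lists all peers inside the vrrp_instance block, so a 3-node setup would presumably look something like this (the 10.0.0.x addresses are just placeholders for our real ones):

vrrp_instance VIP_61 {
    ...
    # address this node sends its adverts from (placeholder)
    unicast_src_ip 10.0.0.1
    # the other two cluster members (placeholders)
    unicast_peer {
        10.0.0.2
        10.0.0.3
    }
    ...
}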
Any hints on these issues would be superamazingly appreciated!
The problem is that you use state MASTER on the backup nodes as well. They should be set to state BACKUP.
Hope this solves your mystery.
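For example, the instance block on BACKUP1 would become something like this (BACKUP2 the same, just with priority 98), with everything else staying as it is:

vrrp_instance VIP_61 {
    interface bond0
    virtual_router_id 61
    state BACKUP          # instead of MASTER
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    virtual_ipaddress {
        X.X.X.X
        X.X.X.X
        X.X.X.X
    }
    track_script {
        chk_nginx
    }
}

Note that state only determines which state the instance assumes when keepalived starts; the ongoing election is still decided by priority, so the node with priority 100 will still end up holding the VIPs.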