I'm trying to set up keepalived + HAProxy as a redundant load balancer in an EC2 VPC (yes, I know that ELB is an option). I believe we have things configured correctly, but killing the master server doesn't trigger a failover.
Server A Config:
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    priority 100
    nopreempt
    virtual_router_id 33
    unicast_src_ip 172.30.1.100
    unicast_peer {
        172.30.1.101
    }
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/master.sh
}
Server B Config:
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP
    priority 100
    nopreempt
    virtual_router_id 33
    unicast_src_ip 172.30.1.101
    unicast_peer {
        172.30.1.100
    }
    authentication {
        auth_type PASS
        auth_pass PASSWORD
    }
    track_script {
        chk_haproxy
    }
    notify_master /etc/keepalived/master.sh
}
I've set up the security group rules as follows:
HTTP TCP 80 0.0.0.0/0
Custom ICMP Rule Echo Reply N/A 0.0.0.0/0
SSH TCP 22 0.0.0.0/0
Custom Protocol VRRP (112) All 0.0.0.0/0
Custom ICMP Rule Echo Request N/A 0.0.0.0/0
However, the following command always times out when run from the backup (and the same happens in reverse from the master):
nc -vz 172.30.1.100 112
Also, the following command never prints anything, which makes me think the VRRP packets are still not getting through for some reason:
sudo tshark -f "vrrp"
Your netcat command is trying to reach port 112, not protocol 112. VRRP is its own IP protocol (number 112) and has no ports, so that test can never succeed. netcat is the wrong tool for checking connectivity here in any case. Instead, capture traffic on either instance and check whether VRRP packets are actually arriving.
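As a minimal sketch of such a capture, assuming tcpdump is installed and eth0 is the interface named in the configs above, something like the following could be used. The helper name watch_vrrp and the variable VRRP_FILTER are made up for illustration; the capture filter itself is standard pcap syntax for matching IP protocol 112.

```shell
# VRRP is IP protocol number 112; it has no ports, so filter on the protocol.
VRRP_FILTER='ip proto 112'

# Hypothetical helper: watch VRRP advertisements on an interface (default eth0).
# Requires root, hence sudo.
watch_vrrp() {
    sudo tcpdump -n -i "${1:-eth0}" "$VRRP_FILTER"
}
```

Running watch_vrrp eth0 on the backup should show advertisements arriving from the current master if the traffic is getting through. The same filter string should also work with tshark via its -f option.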
Your configs should define one of the servers as MASTER, the other as BACKUP. The priority should be 100 on the BACKUP, 101 on the MASTER.
Having them both set to BACKUP may be your issue.
The issue turned out to be painfully obvious once I slept on it and took a second look (aren't they always?). It was as simple as a typo in the unicast_src_ip setting. Since the IP was incorrect, no messages were going through on either server. I would have expected some kind of error message for this, but everything started working 100% once it was fixed.
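Since no error is reported for this, a quick sanity check can help catch it. The following is a rough sketch, assuming a Linux host with the iproute2 ip command and a config that contains a single unicast_src_ip line. The function names conf_src_ip, ip_is_local, and check_src_ip are hypothetical helpers, not part of keepalived.

```shell
# Hypothetical helper: print the first unicast_src_ip value found in a config file.
conf_src_ip() {
    awk '$1 == "unicast_src_ip" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed if the given IPv4 address is assigned to this host.
# In `ip -4 -o addr show` output, field 4 is the address in CIDR form.
ip_is_local() {
    ip -4 -o addr show | awk '{ split($4, a, "/"); print a[1] }' | grep -qxF "$1"
}

# Hypothetical helper: report whether the configured source address exists locally.
check_src_ip() {
    src=$(conf_src_ip "$1")
    if ip_is_local "$src"; then
        echo "unicast_src_ip $src is assigned to this host"
    else
        echo "unicast_src_ip $src is NOT assigned to this host" >&2
        return 1
    fi
}
```

It could be run on each server as check_src_ip /etc/keepalived/keepalived.conf (the path is an assumption; adjust it to wherever the config lives). A mistyped address would then show up as "NOT assigned to this host".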