We have the following setup:
- RedHat 6
- LVS set up to fail over between two webservers
- Connection persistence of 900 seconds
It's a pretty simple setup; however, when a server is marked as failed, the piranha/pulse/nanny process sets that server's weight in the IPVS table to 0 rather than removing the entry. This means any persistent connections remain attached to the failed server, and the load balancing is defeated.
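To illustrate (output format approximated, connection counts invented), ipvsadm -L -n still shows the failed server, just at weight 0:

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.1.1.70:80 wlc persistent 900
  -> 10.1.1.51:80                 Masq    0      12         3
  -> 10.1.1.52:80                 Masq    1      4          1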
How can we tell nanny to force the failed node out of the table entirely, so persistent connections fail over to a working node?
Thanks
We have the following lvs.cf:
serial_no = 201305302344
primary = 10.1.1.45
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = nat
nat_router = 10.1.1.70 eth0:1
nat_nmask = 255.255.255.0
debug_level = NONE
virtual http {
    active = 1
    address = 10.1.1.70 eth0:1
    vip_nmask = 255.255.255.0
    persistent = 900
    pmask = 255.255.255.0
    port = 80
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP/1.1 200 OK"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server web1 {
        address = 10.1.1.51
        active = 1
        weight = 1
    }
    server web2 {
        address = 10.1.1.52
        active = 1
        weight = 1
    }
}
virtual https {
    active = 1
    address = 10.1.1.70 eth0:1
    vip_nmask = 255.255.255.0
    port = 443
    persistent = 900
    pmask = 255.255.255.0
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "up"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server web1 {
        address = 10.1.1.51
        active = 1
        weight = 1
    }
    server web2 {
        address = 10.1.1.52
        active = 1
        weight = 1
    }
}
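(For what it's worth, my understanding is that the quiesce_server = 1 lines above are what make nanny quiesce a failed real server to weight 0 instead of deleting it; if that's right, flipping them should make nanny remove the server outright:

quiesce_server = 0

though quiescing can also be kept and combined with the expire_quiescent_template sysctl suggested in the answers below.)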
Try:
echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template
More details here:
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.persistent_connection.html
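That sysctl makes IPVS expire the persistence templates pointing at a quiesced (weight 0) real server, so returning clients get rescheduled onto a live server instead of sticking to the dead one. To make it survive a reboot, put the equivalent key in /etc/sysctl.conf (just note that the ip_vs module has to be loaded before sysctl -p runs, or the key won't exist yet):

# /etc/sysctl.conf
net.ipv4.vs.expire_quiescent_template = 1

# apply immediately without a reboot
sysctl -w net.ipv4.vs.expire_quiescent_template=1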
You have to trigger a script on failure/recovery of a real server that removes/re-adds that server. I use lvs-kiss for this, which has syntax for hooking in scripts for exactly these cases.
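If you'd rather script it yourself, the remove/re-add itself is just two ipvsadm calls per virtual service. A minimal sketch, using the addresses from the lvs.cf above (the script name, arguments, and structure are hypothetical):

#!/bin/sh
# rs-hook.sh <real-server-ip> <up|down> -- hypothetical failover hook
VIP=10.1.1.70
RS=$1
ACTION=$2

for PORT in 80 443; do
    if [ "$ACTION" = "down" ]; then
        # delete the real server from the virtual service entirely,
        # instead of leaving it quiesced at weight 0
        ipvsadm -d -t $VIP:$PORT -r $RS:$PORT
    else
        # re-add it with NAT forwarding (-m) and weight 1
        ipvsadm -a -t $VIP:$PORT -r $RS:$PORT -m -w 1
    fi
done

e.g. ./rs-hook.sh 10.1.1.51 down when web1 fails its health check.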