I'm using ldirectord to load-balance an HTTP/HTTPS cluster and remove failed instances from it. I've noticed that when connections to a real server hang, ldirectord never marks that server quiescent. It does so instantly, however, if the connection is refused or cannot be made (e.g., the instance is shut down or nginx is stopped).
This is a problem here because the servers are cloud instances, which occasionally hang completely. They also run an app server stack that occasionally gets stuck in an infinite loop until it is restarted. Both cases leave connections hanging.
Here's an example of the config in /etc/ha.d:
negotiatetimeout = 1
checkinterval = 1
quiescent = yes
fallback = 127.0.0.1
emailalert = "<email address>"
virtual = <vip 1>:80
    protocol = tcp
    scheduler = wlc
    real = <real ip 1>:80 ipip 5
    real = <real ip 2>:80 ipip 5
    [more reals]
    checktype = negotiate
    request = "/node-status"
    receive = "OK"
virtual = <vip 2>:443
    protocol = tcp
    scheduler = wlc
    real = <real ip 1>:443 ipip 5
    real = <real ip 2>:443 ipip 5
    [more reals]
    checktype = negotiate
    request = "/node-status"
    receive = "OK"
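To make the refused-vs-hung distinction concrete, here is a minimal Python sketch of roughly what a negotiate check with request = "/node-status" and receive = "OK" does. It is an approximation, not ldirectord's actual code (ldirectord itself is a Perl script), and the function name negotiate_check and its parameters are hypothetical. A refused connection fails within milliseconds, while a hung peer only fails once the timeout expires, so the timeout (standing in for negotiatetimeout here) decides whether a hang is detected at all. Note that Python's socket timeout applies to each connect/send/recv call, not to the exchange as a whole.

```python
import socket


def negotiate_check(host: str, port: int, request: str = "/node-status",
                    receive: str = "OK", timeout: float = 1.0) -> bool:
    """Hypothetical approximation of a negotiate-style HTTP check.

    Sends GET <request> and returns True only if the full response arrives
    without any socket operation exceeding `timeout` seconds and the raw
    response contains the `receive` string somewhere.
    """
    try:
        # create_connection uses `timeout` for the connect and leaves it set
        # on the socket, so sendall/recv below are bounded by it as well.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(
                f"GET {request} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
            )
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection: response complete
                    break
                chunks.append(data)
    except OSError:
        # ConnectionRefusedError (fast failure) and socket.timeout (a hung
        # peer) are both OSError subclasses; either counts as a failed check.
        return False
    return receive in b"".join(chunks).decode("latin-1")
```

Run by hand against a hung real server, a check like this should return False after roughly `timeout` seconds; if it does while ldirectord still shows the server as active, that would suggest the problem lies in how ldirectord enforces its check timeout rather than in the backend.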
One balancer runs Ubuntu 10.10 and the other 10.04.2; both run ldirectord 1.186-ha.
Note that this 2002 thread suggests ldirectord didn't detect hung connections back then: http://archive.linuxvirtualserver.org/html/lvs-users/2002-05/msg00163.html
UPDATE
Note that the timings above are aggressive while I'm trying to pin this problem down. Normally they are higher and include failurecount, but I've seen the problem both with the settings above and with these:
negotiatetimeout = 2
checkinterval = 2
failurecount = 5
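With failurecount set, a single failed check is not enough: the server is only treated as failed after that many consecutive failures. The hypothetical Python model below (RealServerState and worst_case_detection_seconds are illustrative names, not part of ldirectord) sketches that bookkeeping with quiescent = yes, where a failed server stays in the IPVS table with weight 0 instead of being removed. It assumes that a single successful check restores the configured weight, and its detection-time figure is only a rough upper bound that assumes each failing cycle waits checkinterval and then the full check timeout.

```python
from dataclasses import dataclass


@dataclass
class RealServerState:
    """Hypothetical model of failurecount + quiescent handling for one real server."""
    failurecount: int = 5
    weight: int = 5              # configured weight, as in "ipip 5"
    consecutive_failures: int = 0
    quiesced: bool = False

    def record_check(self, passed: bool) -> int:
        """Update state after one check and return the effective weight."""
        if passed:
            # Assumption: one success resets the failure streak immediately.
            self.consecutive_failures = 0
            self.quiesced = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failurecount:
                self.quiesced = True
        # quiescent = yes: a failed server keeps its entry but gets weight 0.
        return 0 if self.quiesced else self.weight


def worst_case_detection_seconds(checkinterval: float, timeout: float,
                                 failurecount: int) -> float:
    """Rough upper bound: every failing cycle sleeps, then hits the full timeout."""
    return failurecount * (checkinterval + timeout)
```

With the second set of settings (checkinterval = 2, negotiatetimeout = 2, failurecount = 5) this estimate comes to about 20 seconds, so even the slower configuration should quiesce a hung server well within a minute, provided the checks actually time out.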
Also, ldirectord's log files show no entries leading up to or during one of these "outages" on a real server. But if the HTTP service or the instance itself is shut down while it's "hanging", ipvsadm's output and the log files immediately and correctly show the IP becoming quiescent.
And when I say "server hung", I mean the entire (cloud) instance is unresponsive: all connection attempts (ping, SSH, HTTP, whatever) eventually time out, and the console is unresponsive as well.
Unfortunately, I haven't found the root cause of either problem (the server hang or the stack's infinite loop) that puts a server into this state, so I can't (yet) reproduce the situation on demand.
I'm not 100% sure, but don't you need a "service = http" directive when using request and receive? Have you tried dropping request/receive and using checktype = connect instead? Also, what exactly do you mean by "server hangs"? Does the connection time out? Could you add a log file, e.g. logfile="/var/log/ldirectord_vhost.log"?
In general, also set checktimeout=10 (I'm not sure what the default is).
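One caveat on the checktype = connect suggestion: a connect check only tests the TCP handshake. The minimal Python sketch below (connect_check is a hypothetical name, using the suggested checktimeout of 10 seconds as its default) passes against a socket that is listening but whose process never answers, because the kernel completes the handshake on the application's behalf. A connect check would therefore likely catch the fully hung instance, where the handshake itself times out, but may miss the app-server infinite loop, where nginx or the kernel still accepts connections.

```python
import socket


def connect_check(host: str, port: int, timeout: float = 10.0) -> bool:
    """Approximation of a connect-only check: True if the TCP handshake completes."""
    try:
        # Only the connect is tested; nothing is sent or read afterwards.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused or timed-out handshake: the check fails.
        return False
```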