Question

Yuri Gadow

Asked: 2011-04-13 11:23:08 +0800 CST

Does Ubuntu-stock ldirectord not handle hung connections to real servers or am I configuring it wrong?

I'm using it to balance an HTTP/HTTPS cluster and remove failed instances from it, but I have noticed that when connections to a real server hang, ldirectord never marks that server quiescent. It does so instantly if the connection is rejected or cannot be made, e.g., when the instance is shut down or nginx is stopped.

This is a problem here because the servers are cloud instances, which occasionally hang completely, and they run an app server stack that occasionally gets stuck in an infinite loop until it is restarted. In both cases connections hang.

Here's an example of a config file from /etc/ha.d:

negotiatetimeout = 1
checkinterval = 1
quiescent = yes
fallback = 127.0.0.1
emailalert = "[email protected]"

virtual = <vip 1>:80
    protocol = tcp
    scheduler = wlc
    real = <real ip 1>:80 ipip 5
    real = <real ip 2>:80 ipip 5
    [more reals]
    checktype = negotiate
    request = "/node-status"
    receive = "OK"

virtual = <vip 2>:443
    protocol = tcp
    scheduler = wlc
    real = <real ip 1>:443 ipip 5
    real = <real ip 2>:443 ipip 5
    [more reals]
    checktype = negotiate
    request = "/node-status"
    receive = "OK"

One balancer runs Ubuntu 10.10, the other 10.04.2; ldirectord is 1.186-ha on both.

Note, this 2002 thread implies ldirectord didn't catch hung connections then: http://archive.linuxvirtualserver.org/html/lvs-users/2002-05/msg00163.html

UPDATE

Note that the timeouts above are aggressive because I'm trying to nail this problem down. Normally they are higher and include failurecount, but I've seen the problem with both the settings above and the ones below:

negotiatetimeout = 2
checkinterval = 2
failurecount = 5

Also, ldirectord's log files show no entries leading up to or during one of these "outages" on a real server. But if the HTTP service or the instance itself is shut down while it is hanging, both the ipvsadm display and the log files immediately and correctly show the IP becoming quiescent.

When I say "server hung" I mean the entire (cloud) instance is unresponsive: all connection attempts (ping, ssh, http, whatever) eventually time out, and the console is unresponsive as well.

Unfortunately, I've not found the root cause of either of the problems (server hang and stack infinite loop) that put a server into such a state so I can't (yet) repro the situation on demand.
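To make the difference between a rejected connection and a hung one concrete, here is a minimal Python sketch of a negotiate-style check with a hard overall deadline. It is not ldirectord's own code; the function name `http_check` and the `deadline` parameter are hypothetical, while the `/node-status` path and the `OK` body are taken from the config above.

```python
import socket
import time


def http_check(host, port, path="/node-status", expect=b"OK", deadline=2.0):
    """Classify a real server as 'up', 'refused', 'timeout', 'error' or 'bad-response'.

    Hypothetical helper: the whole check (connect, send, read) is bounded
    by `deadline` seconds, so a hung peer cannot stall it indefinitely.
    """
    end = time.monotonic() + deadline
    try:
        sock = socket.create_connection((host, port), timeout=deadline)
    except ConnectionRefusedError:
        return "refused"   # port closed: fails immediately
    except socket.timeout:
        return "timeout"   # no reply to the connection attempt at all
    except OSError:
        return "error"     # e.g. network unreachable

    data = b""
    with sock:
        try:
            sock.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
            while True:
                remaining = end - time.monotonic()
                if remaining <= 0:
                    return "timeout"
                sock.settimeout(remaining)  # shrink the timeout on every read
                chunk = sock.recv(4096)
                if not chunk:               # peer closed: response complete
                    break
                data += chunk
        except socket.timeout:
            return "timeout"                # connected, but no (full) answer
        except OSError:
            return "error"

    # Match only the body, so the "OK" in "200 OK" does not count.
    body = data.split(b"\r\n\r\n", 1)[-1]
    return "up" if expect in body else "bad-response"
```

A stopped nginx shows up as `refused`, whereas the hangs described above should surface as `timeout`; a checker would need to treat both as failures for the real server to be taken out of rotation.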

1 Answer

shakalandy · Answer 1 · 2011-04-13T12:57:22+08:00

I am not 100% sure, but don't you need a "service=http" directive when using request and receive? Have you tried dropping request/receive and using checktype=connect instead? Also, what exactly do you mean by "server hangs"? Does the connection time out? Could you please enable a log file, e.g. logfile="/var/log/ldirectord_vhost.log", and add its output?

In general, also set checktimeout=10 (I'm not sure what the default is here).
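When weighing a connect-only check against a request/receive check, it may help to see how each behaves against an application that is stuck. The sketch below is an illustration in Python, not ldirectord's implementation; `connect_check` and `response_check` are hypothetical names, and the stuck application is simulated by a socket that listens but never accepts.

```python
import socket


def connect_check(host, port, timeout=1.0):
    """Pass if the TCP handshake completes; says nothing about the application."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def response_check(host, port, timeout=1.0):
    """Pass only if the peer sends at least one byte back before the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"GET /node-status HTTP/1.0\r\n\r\n")
            return bool(sock.recv(1))
    except OSError:  # includes socket.timeout while waiting for the reply
        return False


if __name__ == "__main__":
    # Simulate a stuck application: the socket listens, but nothing ever accepts.
    stuck = socket.socket()
    stuck.bind(("127.0.0.1", 0))
    stuck.listen(5)
    port = stuck.getsockname()[1]
    print("connect check: ", connect_check("127.0.0.1", port))
    print("response check:", response_check("127.0.0.1", port))
    stuck.close()
```

On Linux the kernel completes the handshake for a listening socket even when the application never accepts, so a connect-only probe would likely keep passing for an app server stuck in a loop. A fully unresponsive instance is different: there the connection attempt itself should time out, and both checks fail.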
