Question

Naftuli Kay

Asked: 2016-05-03 11:34:05 +0800 CST

Do clients typically implement failover/load-balancing on multiple A records?

Typically, load balancers like Amazon's Elastic Load Balancers use a DNS record set with multiple A records to provide multiple load balancer instances which can handle traffic to requesting endpoints:

$ dig +short my-fancy-elb.us-east-1.elb.amazonaws.com
10.0.1.1
10.0.1.2

If I attempt to curl this URL in verbose mode, I notice that curl seems to round-robin attempts to the two IP addresses:

$ curl -ivs http://my-fancy-elb.us-east-1.elb.amazonaws.com | grep -i 'connected'
* Connected to my-fancy-elb.us-east-1.elb.amazonaws.com (10.0.1.1)
$ curl -ivs http://my-fancy-elb.us-east-1.elb.amazonaws.com | grep -i 'connected'
* Connected to my-fancy-elb.us-east-1.elb.amazonaws.com (10.0.1.2)

Is the fact that curl does round-robin on the A records described in the record set done by the curl binary itself or is it something that the Linux kernel does for it?

TCP exists at layer 4 and DNS exists at layer 7, so I'd imagine that individual binaries and libraries would have to implement their own load-balancing and failover: fetching the DNS record set for the given domain name and choosing a TCP address to connect to from that set.

Can I reasonably expect that programming languages, browsers, and libraries like curl will do load-balancing and failover on A records for me?

4 Answers

Andrew B · Answer 1 · 2016-05-03T13:24:07+08:00

The short answer is that it varies.

When multiple address records are present in the answer set, a queried DNS server normally returns them in a randomized order. The operating system will typically present the returned record set to the application in the order it was received. That said, there are options on both sides of the transaction (the nameserver and the OS) which can result in different behaviors. Usually these are not employed. As an example, a little-known file called /etc/gai.conf controls address ordering on glibc-based systems.
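To see the order an application is actually handed, you can ask the resolver directly. This is a minimal sketch, assuming Python's standard socket module; the function name resolved_addresses is made up for illustration, and the result depends on your resolver, any local cache, and (on glibc) the sorting rules in /etc/gai.conf.

```python
import socket


def resolved_addresses(host, port=80):
    """Return the IPv4 addresses for host in the order getaddrinfo() yields them.

    resolved_addresses is a hypothetical helper name, not part of any library.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip, port) pair.
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos]
```

Calling resolved_addresses() a few times against a name with several A records shows whether the order changes between lookups on your system. A client that only ever uses the first entry of that list gets whatever spreading the ordering gives it and nothing more.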

The Zytrax book (DNS for Rocket Scientists) has a good summary on the history of this topic, and concludes that RFC 6724 is the current standard that applications and resolver implementations should adhere to.

From here it's worth noting a choice quote from RFC 6724:

   Well-behaved applications SHOULD NOT simply use the first address
   returned from an API such as getaddrinfo() and then give up if it
   fails.  For many applications, it is appropriate to iterate through
   the list of addresses returned from getaddrinfo() until a working
   address is found.  For other applications, it might be appropriate to
   try multiple addresses in parallel (e.g., with some small delay in
   between) and use the first one to succeed.

The standard encourages applications to not stop at the first address on failure, but it is neither a requirement nor the behavior that many casually written applications are going to implement. You should never rely solely on multiple address records for high availability unless you are certain that the greater (or at least most important) percentage of your consuming applications will play nicely. Modern browsers tend to be good about this, but remember that they are not the only consumers that you are dealing with.
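The "iterate through the list until a working address is found" behavior from the RFC quote can be sketched in a few lines. This is an illustration under assumptions, not how curl or any particular client does it: the names connect_to_any and connect_first_working are hypothetical, and the timeout value is arbitrary.

```python
import socket


def connect_to_any(candidates, timeout=3.0):
    """Try each getaddrinfo()-style candidate in order; return the first
    socket that connects. Raises OSError if every candidate fails."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in candidates:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)  # the timeout stays set on the returned socket
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    if last_error is None:
        raise OSError("no candidate addresses to try")
    raise last_error


def connect_first_working(host, port, timeout=3.0):
    """Resolve host and fail over across all of its addresses."""
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return connect_to_any(candidates, timeout)
```

In Python specifically, socket.create_connection() already loops over the getaddrinfo() results this way, so hand-rolled code like the above is mainly useful for seeing what "well-behaved" means. An application that calls connect() on only the first address gets no failover at all.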

(Also, as @kasperd notes below, it's important to distinguish between what this buys you in HA vs. load balancing.)

Sven · Answer 2 · 2016-05-03T11:51:04+08:00

My guess is that the DNS TTL for the record is set really low, so curl has to resolve the name again every time and gets another IP from the DNS server.

Neither curl nor the kernel is at all aware that this DNS-level load balancing happens, and you can't reasonably expect anything like that.

Fedor Piecka · Answer 3 · 2016-05-03T12:34:38+08:00

The basic thing is DNS servers usually cycle the records in a pseudorandom fashion.

fedor@piecka:~$ dig +short @ns1.yahoo.com yahoo.com
206.190.36.45
98.138.253.109
98.139.183.24
fedor@piecka:~$ dig +short @ns1.yahoo.com yahoo.com
98.139.183.24
206.190.36.45
98.138.253.109
fedor@piecka:~$ dig +short @ns1.yahoo.com yahoo.com
98.139.183.24
98.138.253.109
206.190.36.45

In the case of curl, it has its own DNS resolving library, which respects the server-presented order.

There is a story on this topic at https://daniel.haxx.se/blog/2012/01/03/getaddrinfo-with-round-robin-dns-and-happy-eyeballs/. curl's implementation is mentioned there too.

Danish Rizvi · Answer 4 · 2017-01-24T17:27:09+08:00

Is the fact that curl does round-robin on the A records described in the record set done by the curl binary itself or is it something that the Linux kernel does for it?

Neither. It's usually the DNS server that changes the IP address. curl needs to resolve the host name to get the IP address for each request. It sends the query to the DNS server, which sends back a list of IP addresses. The DNS server can also be local, on the same machine, for caching. Most DNS servers rotate the IP list round-robin on every request, so you get a different IP each time because the top IP of the list has changed. If you ping www.google.com from a Linux machine, you will likely see a different address each time.
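A toy model makes the mechanism concrete. This is only a simulation under assumptions: RotatingRecordSet and naive_client_pick are made-up names, the addresses are the ones from the question, and a real server may shuffle pseudorandomly rather than rotate strictly.

```python
from collections import deque


class RotatingRecordSet:
    """Toy model of a name server that rotates its answer on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def query(self):
        answer = list(self._addresses)
        # Rotate left by one so the next query leads with the following address.
        self._addresses.rotate(-1)
        return answer


def naive_client_pick(answer):
    """A client that does no balancing of its own: it takes the first address."""
    return answer[0]


records = RotatingRecordSet(["10.0.1.1", "10.0.1.2"])
picks = [naive_client_pick(records.query()) for _ in range(4)]
```

Here picks alternates between the two addresses even though the client always takes the first entry, which matches the alternating "Connected to" lines in the question. The spreading comes from the server side; the client contributes nothing.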

Do clients typically implement failover/load-balancing on multiple A records?

I ran a test with curl fetching a file over HTTP. curl is able to retry with another IP when the first IP is not reachable (failover). So failover works with curl for HTTP requests.
