How do most browsers behave if they get multiple A records from the DNS server? Do they stick to one IP as long as it is reachable (and only use another if that IP is down)? Or do they switch all the time for no reason?
If the majority of current browsers stick to one IP, DNS-RR would be enough for me as a simple failover solution.
Each browser has its own method of handling round-robin DNS. I've spent some time today researching this problem and will continue to update my answer as I find proof of implementation, which limits my answer to browsers that expose their behavior.
Google Chrome
Google Chrome (v58 used) will request all host entries for an address (A, AAAA, CNAME) and put them into an array (address_list). Chrome will then attempt to open a socket on each IP address in order from first to last. Chrome will not look for the fastest or closest IP; it assumes the first IP (given by your upstream DNS resolvers) is the best IP. In my tests, BIND and Windows DNS servers gave a different order of IPs per lookup, giving what seems like a 50/50 split in bandwidth to each IP. This functionality is exposed in
chrome://net-internals/#events&q=type:SOCKET%20is:active
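To make the "first to last" behavior concrete, here is a minimal Python sketch of the same idea. It is not Chrome's code; the function name connect_in_order and the per_address_timeout parameter are my own, chosen for illustration. Python's socket.create_connection does a very similar loop internally.

```python
import socket

def connect_in_order(host, port, per_address_timeout=3.0):
    """Try each resolved address in resolver order; return the first socket that connects.

    Hypothetical helper for illustration: no racing and no latency
    measurement, the first address in the list simply gets the first try.
    """
    last_error = None
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        # A short timeout keeps a dead ("black hole") address from stalling the loop.
        sock.settimeout(per_address_timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError(f"no addresses found for {host}")
```

How quickly this fails over depends entirely on per_address_timeout, which is the same knob that separates a client that moves on quickly from one that waits a long time on a dead address.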
Curl (libcurl/7.54.0)
Curl also has this fail-over function, but the --connect-timeout is much longer than the default in Chrome: Chrome fails over immediately, Curl does not. If you use libcurl and want to survive a round-robin DNS setup where one IP fails (works in Chrome but not in code), be sure to set this value lower. DEFAULT_CONNECT_TIMEOUT:0 made me think this wasn't possible with curl.
* After 149990ms connect time, move on!
In both clients, the IP was not sticky. They followed the TTL given in DNS, and once that TTL expired (Chrome maintains this internally, curl asks on each request), the IP selection is performed each time as described above.
What does this mean? DNS-RR is OK for some systems, but it is not designed for failover. You should expect that all results from the DNS lookup are treated as a source of truth: valid and available to serve traffic. There are many ways to ensure IP availability, such as virtual floating IPs, BGP/routing tricks, etc. Use them.
All tests were performed in an IPv4-only environment. I will return with dual-stack results once enough infrastructure is available to test.
I speculate these changes are a side effect of the IPv6-fallback RFC, Happy Eyeballs.
Update: A useful consideration: RR DNS can only assist with load balancing, not application failures. If one of your nodes returns a 503, you will serve 40-60% of your traffic as 503s. The assumption is made that all IPs listed are valid working endpoints if reachable.
See my question (and answer): How browsers handle multiple IPs.
In short, round-robin DNS does not improve availability at all. The browser chooses one IP and sticks to it, even if it does not respond. (Checked with FF and Chrome.)
Once the browser's DNS cache expires, the hostname is resolved again and the process repeats, regardless of whether the IP answered or not.
For basic HA, you may use dynamic DNS or various IP-based approaches.
EDIT: This behavior takes place when the inaccessible host acts as a "black hole". If instead the host actively refuses incoming connections, the browser will try one IP, get refused, and immediately use another IP, and thus it will fail over pretty well.
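The difference between a refusing host and a "black hole" is easy to see from a script. This is only a rough sketch: the probe function and its return labels are made up for illustration, and it says nothing about how a particular browser reacts.

```python
import socket

def probe(ip, port, timeout=2.0):
    """Classify one TCP connect attempt (hypothetical helper, labels are my own)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: the client learns this almost instantly.
        return "refused"
    except socket.timeout:
        # Nothing came back at all: the client had to wait out the full timeout.
        return "timeout"
    except OSError:
        # Anything else, e.g. network or host unreachable.
        return "error"
```

A "refused" result comes back right away, so moving on to the next IP costs almost nothing. A "timeout" result only arrives after the full timeout has passed, which is why a silently dropped host hurts so much more.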
All modern browsers implement Happy Eyeballs Version 2: Better Connectivity Using Concurrency (https://www.rfc-editor.org/rfc/rfc8305).
Roughly, that means the server with the fastest connection will be used. IPv6 has higher priority.
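If you want to try the same idea outside a browser, Python's asyncio (3.8+) has an opt-in version of it via the happy_eyeballs_delay argument. The sketch below shows asyncio's implementation, not what any browser actually runs; the function name connect_happy_eyeballs is my own, and 0.25 s is the Connection Attempt Delay the RFC recommends.

```python
import asyncio

async def connect_happy_eyeballs(host, port, delay=0.25):
    """Connect with staggered attempts and return the peer address that won.

    Hypothetical helper: if the current attempt has not completed after
    `delay` seconds, asyncio starts the next address in parallel and keeps
    whichever connection succeeds first.
    """
    reader, writer = await asyncio.open_connection(
        host, port, happy_eyeballs_delay=delay
    )
    peer = writer.get_extra_info("peername")
    writer.close()
    await writer.wait_closed()
    return peer
```

Usage would be something like asyncio.run(connect_happy_eyeballs("example.com", 443)); the returned tuple tells you which of the resolved addresses was actually used.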
edit: Editing my answer since HiPerFreak schooled me.
A DNS server will return a list of all A records it has for a given host name. Where round robin comes in is that it rotates how the list is ordered. The link that lain posted is a great example of how web browsers will make use of that list.
Round-robin DNS can be used for a very primitive form of load balancing, but it is a very poor substitute for real load balancing: if one of the hosts in the rotation goes down, the DNS server will be none the wiser and will still put the IP address of the downed node in the list.
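A toy simulation shows why the rotation spreads load but does nothing for a dead node. Assumptions, stated loudly: the server rotates the list by exactly one position per query (real servers may use cyclic, random, or fixed ordering depending on configuration), every client takes the first address, and the three addresses are documentation-range placeholders.

```python
from collections import Counter, deque

def rotated_answers(records, queries):
    """Yield the record list once per query, rotated by one position each time."""
    ring = deque(records)
    for _ in range(queries):
        yield list(ring)
        ring.rotate(-1)  # the next query sees the following record first

# Placeholder addresses from the 192.0.2.0/24 documentation range.
records = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

# Count how often each address is listed first across 300 queries.
first_choice = Counter(answer[0] for answer in rotated_answers(records, 300))
```

Each address ends up first in exactly 100 of the 300 answers. If 192.0.2.11 is down, the server keeps handing it out anyway, so roughly a third of the clients that trust the first entry start with a dead IP.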
They switch the IPs; it isn't a failover solution.
The browsers let the OS do the name resolution, and Linux, for example, always randomizes the IP addresses. Try host google.com several times: the IPs will come back in random order.
The DNS server returns all the IPs in a list, but it changes the order of the list. This order is not random and does not change when one IP fails; the server always cycles through the IPs in the same sequence, for load-balancing reasons. When the browser receives the list, I suppose it picks the first entry, unless that one is known to be non-working.