I'm working on a setup with two datacenters linked by a MAN (bridged), with everything doubled between them in fail-over mode using RedHat Cluster, DRBD and that kind of thing.
I have one DNS server for each location, but it turns out that having both in /etc/resolv.conf doesn't help much; if one goes down, the client waits 10s or so half of the time. In other words, it's using them for load balancing, not fail-over. So I configured the two servers to use a VIP with ucarp (≈VRRP).
Is there a way to have my two DNS servers both be up and, for example, respond to the same IP, all the time? It's no big deal if one NS request gets two answers.
Is there a way to do this with Anycast / Multicast and so on?
Edit: it turns out anycast won't do me any good in my scenario: I only have static routes, and most traffic actually goes through a bridge.
What would be interesting would be a way to have two DNS servers answer to requests on the same IP, if that's somehow possible.
You can massively mitigate problems by setting a couple of options in your resolv.conf:
rotate makes the resolver round-robin between your nameservers, rather than always using the first one unless it times out. timeout:2 reduces the DNS timeout to two seconds, rather than the default of five.
(NB: this was tested on Debian/Ubuntu, but I don't think this is a Debian specific change)
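For example, a resolv.conf using both options might look like this (the nameserver addresses are placeholders for your own two servers):

    # /etc/resolv.conf -- addresses below are placeholders
    nameserver 192.0.2.53
    nameserver 198.51.100.53
    options rotate timeout:2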
Anycast DNS would allow you to configure one resolver IP in all your clients; client requests would be forwarded to the 'closest' (from a network routing perspective) server.
If you tied the advertisement of the anycast VIP to a healthcheck (e.g. requesting the A record for a well known domain), then should one of your servers fail its route would be withdrawn. Once the network reconverged, all requests would be forwarded to the other device without any manual reconfiguration.
In terms of implementation, this can be done either with hardware appliances (e.g. F5 Big IP, Citrix Netscaler) or with your own configuration: either run a routing daemon (e.g. Quagga) on your DNS servers, or use custom scripts that log in to your routers to change the state of each anycast VIP.
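As a rough sketch of the withdraw-on-failure idea (assuming the DNS server itself runs a routing daemon such as Quagga that redistributes connected routes, and using a made-up anycast address), a small watchdog run from cron could look like this:

    #!/bin/sh
    # Sketch only: keep the anycast VIP on the loopback while the local
    # nameserver answers, drop it when it does not.  With the routing daemon
    # redistributing connected routes, removing the address withdraws the route.
    VIP=192.0.2.53        # made-up anycast service address
    if dig +time=2 +tries=1 @127.0.0.1 www.example.com A >/dev/null 2>&1; then
        ip addr show dev lo | grep -q "$VIP" || ip addr add "$VIP/32" dev lo
    else
        ip addr del "$VIP/32" dev lo 2>/dev/null
    fi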
Fix the client - use a better resolver.
lwresd is part of BIND. It runs as a local service. You configure libc to use it via /etc/nsswitch.conf, so using it is transparent to all but statically compiled programs.
lwresd monitors the performance and availability of configured name servers (this is standard BIND behaviour). Should a host become unavailable, lwresd will back off from that server and send all queries to the other configured servers. As it runs locally on each host, it should normally send all queries to the closest server.
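If I remember correctly, the glue on Debian-style systems is the lwres NSS module (packaged as libnss-lwres); the hosts line in /etc/nsswitch.conf then looks something like:

    # /etc/nsswitch.conf -- hosts lookups go through lwresd, plain DNS as fallback
    hosts: files lwres dns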
I run an internal BGP anycast recursive DNS Cluster on two Linux Virtual Server (IPVS) Loadbalancers and it works like a charm.
The basic setup is described in the link below (sorry, new users aren't allowed to add inline hyperlinks, so the link comes later).
The problem with using VRRP for the service IP is that it will wander between your two servers, so your nameserver needs to bind to it quickly in order to answer queries after a failover. You could work around this by NATing, just as in my IPVS setup, but I'd recommend load balancing with active service checks so you know when something is wrong.
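A minimal IPVS table for DNS (NAT mode, all addresses made up) would look roughly like this:

    # 192.0.2.53 is the service VIP, 10.0.0.11/10.0.0.12 the two real DNS servers
    ipvsadm -A -u 192.0.2.53:53 -s rr                 # UDP virtual service, round robin
    ipvsadm -a -u 192.0.2.53:53 -r 10.0.0.11:53 -m    # real server 1, NAT mode
    ipvsadm -a -u 192.0.2.53:53 -r 10.0.0.12:53 -m    # real server 2, NAT mode
    ipvsadm -A -t 192.0.2.53:53 -s rr                 # same again for DNS over TCP
    ipvsadm -a -t 192.0.2.53:53 -r 10.0.0.11:53 -m
    ipvsadm -a -t 192.0.2.53:53 -r 10.0.0.12:53 -m

ipvsadm itself does no health checking, which is why in practice something like ldirectord or keepalived maintains this table and pulls a real server out when it stops answering test queries.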
Please note that while there are DNS implementations that make use of multicast (Apple Bonjour/mDNS, for example), these are usually not well suited to resilient or high-volume recursive DNS service and are also commonly limited to use within the same collision domain, i.e. the LAN.
The simple dumb way:
Ask your Linux box to be much more aggressive about the DNS servers in resolv.conf: options timeout:1 rotate (the glibc resolver only takes whole seconds for timeout, so 1 is as aggressive as it gets).
So timeouts are quick and rotate makes it round-robin the load across both, without any VIP/VRRP/stuff to manage, just two DNS servers doing their job...
Anycast is frequently used to solve this requirement. Anycast DNS is the use of routing and addressing policies to select the most efficient path between a single source (the DNS client) and several geographically dispersed targets that "listen" for a service (DNS) within a receiver group. In anycast, the same IP address is used to address each of the listening targets (the DNS servers in this case). Layer 3 routing dynamically handles the calculation and transmission of packets from our source (DNS client) to its most appropriate target (DNS server).
Please see www.netlinxinc.com for an entire series of blog posts devoted to Anycast DNS, with recipes for how to configure it. The series has covered Anycast DNS using static routing and RIP so far, and I will be posting recipes on OSPF and BGP shortly.
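For the static-routing variant (which matches the original poster's constraints), the gist, with made-up addresses, is roughly: put the shared anycast address on a loopback on both DNS servers and point a static /32 route at the nearest one, with a floating backup route to the other:

    # On both DNS servers: bind the shared anycast address to a loopback
    ip addr add 192.0.2.53/32 dev lo

    # On each site's router (Cisco-style syntax, purely illustrative):
    ip route 192.0.2.53 255.255.255.255 10.0.1.11
    ip route 192.0.2.53 255.255.255.255 10.0.2.11 250

The catch is that a plain static route is never withdrawn when the nameserver process dies, so on its own this only covers the host or link disappearing; for anything smarter the route needs to be tied to a health check or a dynamic routing protocol.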
If it's acceptable to have a few seconds of DNS failure before the swapover occurs, you can create a simple shell script to do this. Non working pseudocode follows:
If you are using load balancers anywhere in your site, you should be able to configure them to have DNS as a virtual service.
My Kemp Loadmaster 1500s can be set up to do round-robin with failover. They use service checking to make sure that each DNS server is up every few seconds and divide the traffic between the two servers. If one dies, it drops out of the RR pool and only the "up" server gets queried.
You'd just have to point your resolv.conf at the VIP on the load balancer.
You want DNS to be reliable. Adding a huge amount of complexity to the setup will cause an absolute nightmare when something breaks.
Some of the proposed solutions only work when the redundant DNS servers are at the same site.
The fundamental issue is that the DNS client is broken as designed. It doesn't remember when a server was unreachable, and keeps trying to connect to the same nonresponsive server.
NIS handled this issue by having ypbind keep state. A clumsy solution, but it usually works.
The solution here is to lean on vendors to implement a reasonable fix. It's getting worse with IPv6, as the AAAA requests add to the time wasted on timeouts. I have seen protocols fail (e.g. an sshd connection) because they spent so much time waiting on DNS timeouts due to a single unreachable DNS server.
In the interim, as has been previously suggested, write a script that replaces resolv.conf with one that contains only valid nameservers. Share this script with vendors to demonstrate the unclean solution that you were forced to implement.
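A rough, untested sketch of such a script (the nameserver addresses, the example.com header and the /tmp/resolv.conf target are placeholders, as discussed below):

    #!/bin/sh
    # Rough sketch: keep only the nameservers that still answer in resolv.conf.
    nameservers="192.0.2.1 192.0.2.2"     # placeholder addresses
    resolv_conf=/tmp/resolv.conf          # change to /etc/resolv.conf for real use
    tmp=$resolv_conf.tmp

    # Default header; adjust the search domain to taste.
    echo "search example.com" > "$tmp"

    for ns in $nameservers; do
        # Keep a server only if it still answers a lookup for a well-known name.
        if nslookup www.example.com "$ns" 2>/dev/null | grep -q 'Name:'; then
            echo "nameserver $ns" >> "$tmp"
        fi
    done

    # Never install an empty server list, and only rewrite the file when it changed.
    if grep -q '^nameserver' "$tmp" && ! cmp -s "$tmp" "$resolv_conf"; then
        mv "$tmp" "$resolv_conf"
    fi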
This hasn't been seriously tested, and it assumes an nslookup that parses like mine, and a grep that supports "-q".
Run this out of cron every 5 minutes or so.
I'm not seriously suggesting that anyone actually use cron and a shell script for critical failover management, the error-handling surprises are just too great. This is a proof of concept only.
To test this for real, change the "nameservers=" line at the top, change resolv_conf at the top from /tmp/resolv.conf to /etc/resolv.conf, and adjust the default resolv.conf header that contains example.com.
You may need to restart nscd if you replace resolv.conf.
I would first try duplicating your VRRP, but with an additional VIP. For each VIP, alternate the primary and backup nodes.
DNS1 = vip1 primary, vip2 secondary
DNS2 = vip2 primary, vip1 secondary
Then list both VIPs in the resolver configuration on each of your client machines. That way the load is spread across the nameservers, but if one goes down, the other one just takes over the additional load.
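With ucarp (which the original poster is already using), the crossed setup might look roughly like this on DNS1; swap the -k (advskew) values on DNS2 so each box is the preferred master for one VIP. Addresses, password and script paths are made up:

    # Two ucarp instances on DNS1, one per VIP (lower advskew = preferred master)
    ucarp -B -i eth0 -s 10.0.0.11 -v 1 -p secret -a 192.0.2.53 -k 0 \
          --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh
    ucarp -B -i eth0 -s 10.0.0.11 -v 2 -p secret -a 192.0.2.54 -k 100 \
          --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh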