I am implementing a solution to load-balance DNS queries across multiple BIND recursive DNS servers to increase the QPS limit.
Each CentOS VM has a network namespace, gi, set up with the namespace's loopback set to a single public DNS IP.
Each DNS server advertises the same DNS IP to my network across BGP peerings configured on my Quagga router.
All incoming queries are load-balanced via the network core using the BGP maximum-paths feature.
However, only one BIND DNS server will successfully resolve queries sent to the DNS IP; the other just returns SERVFAIL. This is not static: if I kill the BGP peerings to Server1, queries are successful, and the same happens if I kill the peerings to Server2. However, they will not work in tandem.
One thing I have noticed is that if I run dig with +trace inside the namespace on Server1, the root NS set comes back with an RRSIG:
ip netns exec gi dig @DNSIP cloudflare.com +trace
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-16.P2.el7_8.6 <<>> @DNSIP cloudflare.com +trace
; (1 server found)
;; global options: +cmd
. 509520 IN NS e.root-servers.net.
. 509520 IN NS c.root-servers.net.
. 509520 IN NS f.root-servers.net.
. 509520 IN NS j.root-servers.net.
. 509520 IN NS b.root-servers.net.
. 509520 IN NS i.root-servers.net.
. 509520 IN NS h.root-servers.net.
. 509520 IN NS m.root-servers.net.
. 509520 IN NS k.root-servers.net.
. 509520 IN NS a.root-servers.net.
. 509520 IN NS l.root-servers.net.
. 509520 IN NS d.root-servers.net.
. 509520 IN NS g.root-servers.net.
. 509520 IN RRSIG NS 8 0 (didn't include the key)
Server2, on the other hand, does not return an RRSIG, even though both named.conf files have dnssec-enable yes and dnssec-validation yes:
ip netns exec gi dig @DNSIP cloudflare.com +trace
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-16.P2.el7_8.6 <<>> @DNSIP cloudflare.com +trace
; (1 server found)
;; global options: +cmd
. 518400 IN NS c.root-servers.net.
. 518400 IN NS k.root-servers.net.
. 518400 IN NS g.root-servers.net.
. 518400 IN NS d.root-servers.net.
. 518400 IN NS a.root-servers.net.
. 518400 IN NS j.root-servers.net.
. 518400 IN NS e.root-servers.net.
. 518400 IN NS h.root-servers.net.
. 518400 IN NS f.root-servers.net.
. 518400 IN NS i.root-servers.net.
. 518400 IN NS m.root-servers.net.
. 518400 IN NS b.root-servers.net.
. 518400 IN NS l.root-servers.net.
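To compare the two servers without reading each trace by eye, a small shell helper can report whether dig's output contains an RRSIG record. This is only a sketch: has_rrsig is a hypothetical name, DNSIP is the same placeholder used above, and +dnssec is added on the assumption that the signatures should be requested explicitly.

```shell
# Hypothetical helper: exit 0 if the dig output on stdin contains an RRSIG
# record, exit 1 otherwise. Comment lines (starting with ';') are ignored.
has_rrsig() {
    awk '$1 !~ /^;/ && $4 == "RRSIG" { found = 1 } END { exit found ? 0 : 1 }'
}

# Example use on each server (DNSIP is a placeholder):
#   ip netns exec gi dig @DNSIP . NS +dnssec | has_rrsig && echo signed || echo unsigned
```

Running it on Server1 and Server2 in turn should make the difference between the two responses visible in one line each.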
My dnssec configuration is as follows:
dnssec-enable no;
dnssec-validation no;
/* Path to ISC DLV key */
bindkeys-file "/etc/named.iscdlv.key";
managed-keys-directory "/var/named/dynamic";
If I disable DNSSEC in my named.conf file, the DNS servers work in tandem and I can achieve my target of 20,000 QPS; with DNSSEC enabled, it does not work.
Has anyone encountered a problem like this before? Is it a limitation of BIND behind a single public IP, or is it, as I suspect, an issue with my DNSSEC setup?
Try this DNSSEC configuration on all of your DNS resolvers:
dnssec-validation auto;
The option
dnssec-enable
has been deprecated (see ARM, p. 156), and DNSSEC lookaside validation (DLV) has also been deprecated (see ARM, p. 156). For recursive resolvers I do not believe you need anything else.
Note that
dnssec-validation auto;
is the default setting so you do not even need to enter this one.
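To spot leftover deprecated options before making the change, something like the following could be run on each resolver. It is a rough sketch: deprecated_dnssec_opts is a hypothetical helper name and /etc/named.conf is an assumed path, so adjust both to your setup.

```shell
# Hypothetical helper: print (with line numbers) any dnssec-enable or
# dnssec-lookaside statements found in the given named.conf-style file.
deprecated_dnssec_opts() {
    grep -nE '^[[:space:]]*(dnssec-enable|dnssec-lookaside)[[:space:]]' "$1"
}

# Example use (path is an assumption):
#   deprecated_dnssec_opts /etc/named.conf
```

It only scans the one file it is given, so any files pulled in through include statements would need to be checked separately.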