I am implementing a solution to load-balance DNS queries across multiple BIND recursive DNS servers in order to raise the achievable QPS.
Each CentOS VM has a network namespace, gi, whose loopback interface is set to a single public DNS IP.
Each DNS server advertises the same DNS IP to my network over BGP peerings configured on my Quagga router.
All incoming queries are load-balanced by the network core using the BGP maximum-paths feature.
However, only one BIND server will answer queries sent to the DNS IP; the other just returns SERVFAIL. This is not tied to a particular server: if I kill the BGP peerings to Server1, queries are successful, and the same happens if I kill the peerings to Server2. They just will not work in tandem.
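To see which server actually answers a given query through the shared address, BIND's built-in hostname.bind CHAOS TXT record can be tallied over a few queries. This is only a minimal sketch: which_backend is a hypothetical helper name, it assumes the servers have not turned the record off with "hostname none;", and it should be run from a client elsewhere in the network, since inside the gi namespace the DNS IP is local to each server.

```shell
# which_backend (hypothetical helper): send N hostname.bind queries to the
# shared DNS IP and count how many answers came from each server.
# Assumes BIND still answers hostname.bind (i.e. no "hostname none;").
which_backend() {
    ip=$1
    n=${2:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        # CHAOS-class TXT query; +short prints only the quoted hostname
        dig @"$ip" hostname.bind CH TXT +short +tries=1 +time=2
        i=$((i + 1))
    done | sort | uniq -c
}

# Usage (DNSIP is the same placeholder used in this post):
#   which_backend DNSIP 20
```

Depending on how the core hashes flows, a single client may always land on the same server, so it may be worth running this from more than one source address.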
One thing I have noticed is that if I run the following trace on Server1, the root NS set comes back with an RRSIG:
ip netns exec gi dig @DNSIP cloudflare.com +trace
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-16.P2.el7_8.6 <<>> @DNSIP cloudflare.com +trace
; (1 server found)
;; global options: +cmd
. 509520 IN NS e.root-servers.net.
. 509520 IN NS c.root-servers.net.
. 509520 IN NS f.root-servers.net.
. 509520 IN NS j.root-servers.net.
. 509520 IN NS b.root-servers.net.
. 509520 IN NS i.root-servers.net.
. 509520 IN NS h.root-servers.net.
. 509520 IN NS m.root-servers.net.
. 509520 IN NS k.root-servers.net.
. 509520 IN NS a.root-servers.net.
. 509520 IN NS l.root-servers.net.
. 509520 IN NS d.root-servers.net.
. 509520 IN NS g.root-servers.net.
. 509520 IN RRSIG NS 8 0 (didn't include the key)
Server2, by contrast, does not return an RRSIG, even though both named.conf files have dnssec-enable yes and dnssec-validation yes:
ip netns exec gi dig @DNSIP cloudflare.com +trace
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-16.P2.el7_8.6 <<>> @DNSIP cloudflare.com +trace
; (1 server found)
;; global options: +cmd
. 518400 IN NS c.root-servers.net.
. 518400 IN NS k.root-servers.net.
. 518400 IN NS g.root-servers.net.
. 518400 IN NS d.root-servers.net.
. 518400 IN NS a.root-servers.net.
. 518400 IN NS j.root-servers.net.
. 518400 IN NS e.root-servers.net.
. 518400 IN NS h.root-servers.net.
. 518400 IN NS f.root-servers.net.
. 518400 IN NS i.root-servers.net.
. 518400 IN NS m.root-servers.net.
. 518400 IN NS b.root-servers.net.
. 518400 IN NS l.root-servers.net.
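The difference between the two traces above can be checked mechanically by counting RRSIG records in the dig output on each server. A minimal sketch, where count_rrsig is a hypothetical helper name and the input is assumed to be ordinary dig output on stdin:

```shell
# count_rrsig (hypothetical helper): read dig output on stdin and print
# the number of RRSIG resource records it contains.
count_rrsig() {
    # Skip dig comment lines (starting with ';'); match "<name> <ttl> IN RRSIG ..."
    awk '$1 !~ /^;/ && $3 == "IN" && $4 == "RRSIG" { n++ } END { print n + 0 }'
}

# Usage (run on each server and compare the two counts):
#   ip netns exec gi dig @DNSIP cloudflare.com +trace | count_rrsig
```

A count of zero on one server and a non-zero count on the other would match what the two pasted traces show.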
My DNSSEC configuration is as follows:
dnssec-enable no;
dnssec-validation no;
/* Path to ISC DLV key */
bindkeys-file "/etc/named.iscdlv.key";
managed-keys-directory "/var/named/dynamic";
If I disable DNSSEC in my named.conf file, the DNS servers work in tandem and I can reach my target of 20,000 QPS; with DNSSEC enabled, it does not work.
Has anyone encountered a problem like this before? Is it a limitation of BIND behind a single public IP, or is it, as I suspect, an issue with my DNSSEC setup?