I have an existing PowerDNS Recursor 4.0.4 server running on Debian Jessie 8 (I know, I know, out of date ... I'm getting to that). It handles all DNS requests for my home lab network. It has a fairly simple config and has worked without interruption for literally years at a time. It is also configured for DNSSEC validation, and has always validated successfully.
Last night, shortly after midnight, it stopped resolving about half of all domains worldwide, returning SERVFAIL for them. Sometimes it will resolve the primary domain (such as athenahealth.com) but not a subdomain (such as 20785-1.portal.athenahealth.com). Sometimes it will not resolve the primary domain at all (such as serverfault.com or askubuntu.com). I haven't been able to find any pattern, and no matter how I've mucked with my config (including turning DNSSEC completely off), nothing fixes the problem.
My next thought was that I needed to upgrade PowerDNS Recursor, but I couldn't because of how old my DNS server was. So, I built out a brand new server running PowerDNS Recursor 5.1.3 on Ubuntu 24.04.1. Again, the config is simple. Here's the primary file:
$ cat /etc/powerdns/recursor.conf
dnssec:
  # validation: process # default
  trustanchorfile: /usr/share/dns/root.key
recursor:
  hint_file: /usr/share/dns/root.hints
  include_dir: /etc/powerdns/recursor.d
#incoming:
#  listen:
#    - 127.0.0.1 # default
#outgoing:
#  source_address:
#    - 0.0.0.0 # default
And here's a file in recursor.d:
$ cat /etc/powerdns/recursor.d/me.yml
dnssec:
  validation: off # validate
  # log_bogus: true
incoming:
  listen:
    - 10.20.30.76:53
logging:
  common_errors: true
  facility: 1
  loglevel: 6
  quiet: true
  trace: fail
recursor:
  auth_zones:
    - zone: my-domain-1.com
      file: /etc/powerdns/my-domain-1.com.zone
  forward_zones:
    - zone: my-domain-2.com
      forwarders:
        - 10.20.31.2
  setgid: pdns
  setuid: pdns
  socket_dir: /var/run
  write_pid: true
webservice:
  address: 10.20.30.76
  allow_from:
    - 10.20.30.0/24
    - 172.24.52.0/24
  api_key: loremipsum
  password: foobarbazqux
  port: 8080
This config is functionally identical to my old PowerDNS Recursor config, except that DNSSEC validation is disabled in an attempt to get things working. If I manually dig (I love dig) askubuntu.com starting from the root servers, I easily find an answer:
# Using i.root-servers.net (192.36.148.17)
$ dig @192.36.148.17 com NS
; <<>> DiG 9.10.6 <<>> @192.36.148.17 com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2217
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 21
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;com. IN NS
;; ANSWER SECTION:
com. 136670 IN NS d.gtld-servers.net.
com. 136670 IN NS c.gtld-servers.net.
com. 136670 IN NS k.gtld-servers.net.
com. 136670 IN NS f.gtld-servers.net.
com. 136670 IN NS i.gtld-servers.net.
com. 136670 IN NS b.gtld-servers.net.
com. 136670 IN NS l.gtld-servers.net.
com. 136670 IN NS a.gtld-servers.net.
com. 136670 IN NS e.gtld-servers.net.
com. 136670 IN NS m.gtld-servers.net.
com. 136670 IN NS j.gtld-servers.net.
com. 136670 IN NS h.gtld-servers.net.
com. 136670 IN NS g.gtld-servers.net.
;; ADDITIONAL SECTION:
b.gtld-servers.net. 43604 IN A 192.33.14.30
b.gtld-servers.net. 71837 IN AAAA 2001:503:231d::2:30
l.gtld-servers.net. 44115 IN A 192.41.162.30
l.gtld-servers.net. 74612 IN AAAA 2001:500:d937::30
a.gtld-servers.net. 59944 IN A 192.5.6.30
a.gtld-servers.net. 52029 IN AAAA 2001:503:a83e::2:30
e.gtld-servers.net. 11582 IN A 192.12.94.30
e.gtld-servers.net. 63219 IN AAAA 2001:502:1ca1::30
m.gtld-servers.net. 27782 IN A 192.55.83.30
m.gtld-servers.net. 50020 IN AAAA 2001:501:b1f9::30
j.gtld-servers.net. 39663 IN A 192.48.79.30
h.gtld-servers.net. 79936 IN A 192.54.112.30
g.gtld-servers.net. 57527 IN A 192.42.93.30
g.gtld-servers.net. 63219 IN AAAA 2001:503:eea3::30
d.gtld-servers.net. 44435 IN A 192.31.80.30
d.gtld-servers.net. 10633 IN AAAA 2001:500:856e::30
c.gtld-servers.net. 50185 IN A 192.26.92.30
k.gtld-servers.net. 32146 IN A 192.52.178.30
i.gtld-servers.net. 48002 IN A 192.43.172.30
i.gtld-servers.net. 27967 IN AAAA 2001:503:39c1::30
$ dig @192.33.14.30 askubuntu.com NS
; <<>> DiG 9.10.6 <<>> @192.33.14.30 askubuntu.com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46168
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 13
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;askubuntu.com. IN NS
;; ANSWER SECTION:
askubuntu.com. 86400 IN NS sureena.ns.cloudflare.com.
askubuntu.com. 86400 IN NS damian.ns.cloudflare.com.
;; ADDITIONAL SECTION:
damian.ns.cloudflare.com. 48087 IN A 172.64.35.50
damian.ns.cloudflare.com. 48087 IN A 162.159.44.50
damian.ns.cloudflare.com. 48087 IN A 108.162.195.50
damian.ns.cloudflare.com. 13178 IN AAAA 2803:f800:50::6ca2:c332
damian.ns.cloudflare.com. 13178 IN AAAA 2606:4700:58::a29f:2c32
damian.ns.cloudflare.com. 13178 IN AAAA 2a06:98c1:50::ac40:2332
sureena.ns.cloudflare.com. 38809 IN A 108.162.194.126
sureena.ns.cloudflare.com. 38809 IN A 172.64.34.126
sureena.ns.cloudflare.com. 38809 IN A 162.159.38.126
sureena.ns.cloudflare.com. 32427 IN AAAA 2a06:98c1:50::ac40:227e
sureena.ns.cloudflare.com. 32427 IN AAAA 2803:f800:50::6ca2:c27e
sureena.ns.cloudflare.com. 32427 IN AAAA 2606:4700:50::a29f:267e
$ dig @172.64.35.50 askubuntu.com A
; <<>> DiG 9.10.6 <<>> @172.64.35.50 askubuntu.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35705
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;askubuntu.com. IN A
;; ANSWER SECTION:
askubuntu.com. 300 IN A 172.64.150.156
askubuntu.com. 300 IN A 104.18.37.100
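Each hop in the walk above is a single DNS query sent to one server. By default dig sets the RD (recursion desired) bit. Running dig +norecurse clears it, which is closer to what a recursor sends to authoritative servers. The following minimal sketch builds and sends such a non-recursive query using only the Python standard library, following the RFC 1035 message layout. The names build_query, ask and QTYPES are hypothetical, and it handles only IPv4 UDP and the three record types used here.

```python
from __future__ import annotations

import random
import socket
import struct

# Only the record types used in this walk; QCLASS is always 1 (IN).
QTYPES = {"A": 1, "NS": 2, "AAAA": 28}


def build_query(name: str, qtype: str, recursion_desired: bool = False,
                qid: int | None = None) -> bytes:
    """Build a one-question DNS query message (RFC 1035, section 4.1)."""
    if qid is None:
        qid = random.randrange(0x10000)
    # QR=0 marks a query; 0x0100 is the RD (recursion desired) bit.
    flags = 0x0100 if recursion_desired else 0x0000
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)  # QDCOUNT=1
    qname = b""
    stripped = name.rstrip(".")
    for label in (stripped.split(".") if stripped else []):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) < 64:
            raise ValueError(f"invalid label in {name!r}")
        qname += bytes([len(encoded)]) + encoded  # length-prefixed label
    qname += b"\x00"  # zero-length root label ends the name
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)


def ask(server: str, name: str, qtype: str, timeout: float = 3.0) -> bytes:
    """Send one non-recursive query over UDP port 53 and return the raw reply."""
    query = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        reply, _addr = sock.recvfrom(4096)
    if reply[:2] != query[:2]:
        raise ValueError("reply ID does not match query ID")
    return reply
```

For example, ask("192.36.148.17", "com", "NS") queries i.root-servers.net directly. On an unfiltered connection, a genuine root server answers with a referral: an empty ANSWER section and the com NS records in AUTHORITY.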
Perfect. But if I ask either my existing PowerDNS Recursor 4.0.4 server or my new PowerDNS Recursor 5.1.3 server, I get SERVFAIL:
$ dig @10.20.30.76 askubuntu.com A
; <<>> DiG 9.10.6 <<>> @10.20.30.76 askubuntu.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58213
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; OPT=15: 00 16 64 65 6c 65 67 61 74 69 6f 6e 20 63 6f 6d ("..delegation com")
;; QUESTION SECTION:
;askubuntu.com. IN A
The OPT=15 line, with two leading bytes followed by the text delegation com, is interesting. It doesn't appear for every domain that fails to resolve, so it might be a red herring. It also changes: running the same query again returned OPT=15: 00 16 64 65 6c 65 67 61 74 69 6f 6e 20 61 73 6b 75 62 75 6e 74 75 2e 63 6f 6d ("..delegation askubuntu.com").
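The OPT=15 bytes are not a signature. EDNS option code 15 is the Extended DNS Error (EDE) option from RFC 8914. Its payload is a 2-byte INFO-CODE followed by optional UTF-8 EXTRA-TEXT. The minimal sketch below decodes the hex that dig prints. The function name decode_ede is hypothetical, and the table covers only the INFO-CODEs 0 to 24 defined in RFC 8914.

```python
from __future__ import annotations

# INFO-CODE values defined in RFC 8914, section 4.
EDE_INFO_CODES = {
    0: "Other Error", 1: "Unsupported DNSKEY Algorithm",
    2: "Unsupported DS Digest Type", 3: "Stale Answer", 4: "Forged Answer",
    5: "DNSSEC Indeterminate", 6: "DNSSEC Bogus", 7: "Signature Expired",
    8: "Signature Not Yet Valid", 9: "DNSKEY Missing", 10: "RRSIGs Missing",
    11: "No Zone Key Bit Set", 12: "NSEC Missing", 13: "Cached Error",
    14: "Not Ready", 15: "Blocked", 16: "Censored", 17: "Filtered",
    18: "Prohibited", 19: "Stale NXDOMAIN Answer", 20: "Not Authoritative",
    21: "Not Supported", 22: "No Reachable Authority", 23: "Network Error",
    24: "Invalid Data",
}


def decode_ede(hex_bytes: str) -> tuple[int, str, str]:
    """Decode EDE option data given as dig-style hex, e.g. '00 16 64 65 ...'."""
    data = bytes.fromhex(hex_bytes)  # fromhex skips the spaces between bytes
    if len(data) < 2:
        raise ValueError("EDE option data needs a 2-byte INFO-CODE")
    info_code = int.from_bytes(data[:2], "big")
    extra_text = data[2:].decode("utf-8", errors="replace")
    return info_code, EDE_INFO_CODES.get(info_code, "Unassigned"), extra_text
```

Both responses decode to INFO-CODE 22, "No Reachable Authority". The recursor's extra text names the delegation whose servers it could not get a usable answer from: first com, then askubuntu.com.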
Here is the PowerDNS Recursor 5.1.3 fail trace for a failed lookup of askubuntu.com: https://gist.github.com/beamerblvd/d8fa24bdf1037e2a670f8e331b7e4905
FWIW, I'm on Comcast Business Class with a 5-address static IP delegation.
What am I doing wrong?
Ultimately, after I'd spent more than 12 hours beating my head against the wall over this, a comment on this page about Comcast's "SecurityEdge" product helped me. I logged into my Comcast Business account, went to my internet service tab, scrolled to the bottom, and disabled the SecurityEdge product that had magically been enabled overnight. Suddenly all of my DNS was working perfectly again. I wasn't doing anything wrong after all. Comcast was.
It turns out that SecurityEdge intercepts ALL DNS requests, even those sent directly to authoritative servers, and substitutes its own responses. The result is nonsensical answers that appear to come from authoritative servers but actually don't.
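In hindsight, the dig transcripts above already show the interception.

- A genuine root or .com server answers com NS or askubuntu.com NS with a referral: an empty ANSWER section, the NS records in AUTHORITY, and no ra flag.
- A genuine Cloudflare authoritative answer carries the aa flag.

Instead, every reply shows qr rd ra, no aa, and cache-style TTLs below the zone's original values. The minimal heuristic sketch below uses only the Python standard library. It reads the 12-byte DNS header (RFC 1035) of a reply that was expected from an authoritative server and reports those two red flags. The helper names header_fields and interception_red_flags are hypothetical, and this is not a complete interception detector.

```python
from __future__ import annotations

import struct


def header_fields(message: bytes) -> dict:
    """Unpack the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("message is shorter than a DNS header")
    _qid, flags, _qd, an, ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "qr": bool(flags & 0x8000),  # response bit
        "aa": bool(flags & 0x0400),  # authoritative answer
        "ra": bool(flags & 0x0080),  # recursion available
        "rcode": flags & 0x000F,     # 0 NOERROR, 2 SERVFAIL, 3 NXDOMAIN, ...
        "ancount": an,
        "nscount": ns,
    }


def interception_red_flags(reply: bytes) -> list[str]:
    """Heuristic only: red flags in a reply that should come from an authoritative server."""
    h = header_fields(reply)
    found = []
    if h["ra"]:
        found.append("RA set: the responder offers recursion")
    if h["ancount"] > 0 and not h["aa"]:
        found.append("answer records without the AA bit")
    return found
```

A header with flags qr rd ra and answer records, like the askubuntu.com NS reply above, triggers both checks. A genuine referral (no answers, no ra) or a genuine authoritative answer (aa set) triggers neither. Some legitimate servers are both authoritative and recursive, so a hit is a reason to dig further, not proof.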
Leaving this question up in case it helps someone else in the future.