We're using BIND 9.7.3 on the stable version of Debian (updated weekly), and we see some very strange behaviour for one particular domain. We host a few hundred, but this one is ours.
Basically, the secondary DNS server is trying to transfer the domain from the master. According to the logs, it succeeds in transferring the domain every time, but it always gets the serial number wrong! As a result, it keeps re-doing the transfer at every opportunity. I'm not even sure where it's getting the serial number, because the primary server reports back with the right serial number.
Here's the logs we get from the secondary (the ip 192.168.0.130 is the primary server, 192.168.0.4 is the secondary. And of course, they're not real.):
Aug 23 03:01:08 ns2 named[4242]: transfer of 'mydomain.ca/IN/external' from 192.168.0.130#53: connected using 192.168.0.4#60959
Aug 23 03:01:08 ns2 named[4242]: transfer of 'mydomain.ca/IN/external' from 192.168.0.130#53: Transfer completed: 0 messages, 1 records, 0 bytes, 0.001 secs (0 bytes/sec)
This seems pretty normal, although both hosts are set up with IPv6 addresses and technically speaking they should be using them, but that's a problem for another day (I think).
So let's query the primary server from the secondary one, and see what it says:
$ host -4 -t any mydomain.ca 192.168.0.130
Using domain server:
Name: 192.168.0.130
Address: 192.168.0.130#53
Aliases:
mydomain.ca has IPv6 address fc00:::31
mydomain.ca has SOA record ns1.mydomain.bc.ca. hostmaster.mydomain.ca. 2011082201 900 3600 604800 86400
mydomain.ca name server ns2.mydomain.bc.ca.
mydomain.ca name server ns1.mydomain.bc.ca.
mydomain.ca mail is handled by 20 pop.mydomain.ca.
mydomain.ca has address 192.168.0.205
mydomain.ca descriptive text "v=spf1 mx ip4:192.168.0.4 ip4:192.168.0.193 ip6:fc00:::23 ip6:fc00:::12 ip6:fc00:::33 a:smtp.mydomain.ca a:webmail.mydomain.ca a:smtp2.mydomain.ca a:ns2.mydomain.ca ~all"
And then let's do the same for the secondary nameserver:
$ host -4 -t any mydomain.ca 192.168.0.4
Using domain server:
Name: 192.168.0.4
Address: 192.168.0.4#53
Aliases:
mydomain.ca has SOA record ns1.mydomain.bc.ca. hostmaster.mydomain.ca. 2011011013 600 600 600 600
mydomain.ca descriptive text "v=spf1 mx ip4:192.168.0.4 ip4:192.168.0.193 ip6:fc00::23 ip6:fc00::12 ip6:fc00::33 a:smtp.mydomain.ca a:webmail.mydomain.ca a:smtp2.mydomain.ca a:ns2.mydomain.ca ~all"
mydomain.ca has address 192.168.0.205
mydomain.ca mail is handled by 20 pop.mydomain.ca.
mydomain.ca name server ns1.mydomain.bc.ca.
mydomain.ca name server ns2.mydomain.bc.ca.
mydomain.ca has IPv6 address fc00::31
You can see here that the serial number is 2011011013 on the secondary, but 2011082201 for the primary. I've used the date plus a 2-digit number, so the secondary is somehow using a serial number from January. I've tried searching our configuration on both the primary and secondary servers for this serial number, but it's nowhere to be found.
Speaking of configuration, here's the configuration for this domain in /etc/bind/named.conf:
zone "mydomain.ca" { type slave; file "secondaries/mydomain.ca"; masters { 192.168.0.130; }; };
and the timestamp on secondaries/mydomain.ca is the time of the most recent update. Deleting this file still results in a serial number of 2011011013. The contents of this file are very long, but here are the headers on the secondary server:
$ORIGIN .
$TTL 3600 ; 1 hour
mydomain.ca IN SOA ns1.mydomain.bc.ca. hostmaster.mydomain.ca. (
2011011013 ; serial
600 ; refresh (10 minutes)
600 ; retry (10 minutes)
600 ; expire (10 minutes)
600 ; minimum (10 minutes)
)
NS ns1.mydomain.bc.ca.
NS ns2.mydomain.bc.ca.
A 192.168.0.205
MX 20 pop.mydomain.ca.
TXT "v=spf1 mx ip4:192.168.0.4 ip4:192.168.0.193 ip6:fc00::23 ip6:fc00::12 ip6:fc00::33 a:smtp.mydomain.ca a:webmail.mydomain.ca a:smtp2.mydomain.ca a:ns2.mydomain.ca ~all"
AAAA fc00::31
$ORIGIN mydomain.ca.
and the headers from the equivalent file on the primary:
$TTL 1d
@ IN SOA ns1.mydomain.bc.ca. hostmaster.mydomain.ca. (
2011082302 ; serial
15m ; refresh after 15 minutes
1h ; retry after 1 hour
1w ; expire after 1 week
1d ) ; negative caching TTL of 1 day.
IN NS ns1.mydomain.bc.ca.
IN NS ns2.mydomain.bc.ca.
IN MX 20 pop.mydomain.ca.
@ IN A 192.168.0.205
;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; SPF TXT records
;;;;;;;;;;;;;;;;;;;;;;;;;;;
mydomain.ca. TXT "v=spf1 mx ip4:192.168.0.4 ip4:192.168.0.193 ip6:fc00::23 ip6:fc00::12 ip6:fc00::33 a:smtp.mydomain.ca a:webmail.mydomain.ca a:smtp2.mydomain.ca a:ns2.mydomain.ca ~all"
; this next bit is for the Sender Policy Framework, if it ever really matters.
pop TXT "v=spf1 a -all"
pop3 TXT "v=spf1 a -all"
smtp TXT "v=spf1 a -all"
webmail TXT "v=spf1 a -all"
horde TXT "v=spf1 a -all"
check your permissions on the secondary server's directory for the zones. Can the named process write to that folder? Try deleting that secondary zone and let the transfer recreate it
Notice that your BIND log at the primary indicates
That's not a success confirmation. Is there any message indicating an error before this?