The SOA validation failure

Or: how I learned to stop worrying and love dogfooding

October 16, 2017

On October 12th, the SOA record for did not pass DNSSEC validation due to a broken signature. A string of events led to the creation of PowerDNS bug #5814 and, incidentally, also to #5807.

The Plan and Execution

On the 10th of October we started a small internal project to change the setup of our authoritative infrastructure. The goal was a more representative authoritative DNS infrastructure that we could use to better dogfood our own software. In short, this was the work that had to be done:

  • Upgrade to 4.1.0-rc on
  • Switch to the gsqlite3 backend from the BIND backend
  • Add a hidden master on and slave via the local loopback

The upgrade was easy enough using the repositories. The whole migration from BIND to gsqlite3 was very straightforward thanks to pdnsutil b2b-migrate, although some bugs were found and subsequently fixed. Adding the hidden master was also effortless. After changing the zones from ‘MASTER’ to ‘SLAVE’ and setting the master addresses to the local loopback address, the whole chain of zone transfers was tested and worked properly: NOTIFY messages were sent, AXFRs were performed and dig was happy.

Yay, time to stop working for the day!

The Missed Warning Signs

The next morning I was pretty sleepy on the train on my way to the Open-Xchange Summit and saw a bunch of monitoring emails come in regarding the DNSSEC validation of our domains:

Info:    CRITICAL validation failure < SOA IN>: RSA signature verification failed from 2604:a880:1:20::132:5001

And yes, we actually monitor SOA freshness between master and slaves and the DNSSEC chain of trust for all our domains (as one should). In my sleepy state, I decided these were probably emails from yesterday and just stared out the window until I arrived in Brussels. Once there, I got a message from the cool people at that the RRSIG sent alongside the SOA for was not valid. Oops!
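The SOA freshness part of that monitoring boils down to comparing serials. Below is a minimal sketch of such a check, assuming the serials have already been fetched from master and slave (for example with dig). The helpers serial_lt and slave_is_fresh are hypothetical names, not our actual monitoring code; the comparison follows RFC 1982 serial number arithmetic so that wrap-around at 2**32 is handled.

```python
# Minimal sketch of an SOA freshness check between a master and a slave.
# Assumption: the serials were fetched elsewhere; only the comparison is shown.
# Uses RFC 1982 serial number arithmetic so wrap-around at 2**32 works.

SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)


def serial_lt(s1: int, s2: int) -> bool:
    """Return True if serial s1 is 'less than' serial s2 per RFC 1982."""
    return (s1 < s2 and s2 - s1 < HALF) or (s1 > s2 and s1 - s2 > HALF)


def slave_is_fresh(master_serial: int, slave_serial: int) -> bool:
    """A slave is fresh when its serial is not behind the master's."""
    return not serial_lt(slave_serial, master_serial)


if __name__ == "__main__":
    print(slave_is_fresh(2017101203, 2017101203))  # True
    print(slave_is_fresh(2017101203, 2017101201))  # False: slave lags behind
    print(slave_is_fresh(5, 4294967295))  # False: master wrapped past 2**32
```

A plain `slave_serial >= master_serial` would get the last case wrong, which is why the RFC 1982 comparison is used here.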

The Issue

In the PowerDNS Authoritative Server, we try really hard to ensure that slave servers will fetch fresh RRSIGs when zones are re-signed. One of the ways we do this is the SOA-EDIT domain metadata. This piece of metadata lets the nameserver process artificially bump the SOA serial when serving the record from the backend.
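Bumping the serial on the way out is only safe if the signature over the SOA is made after the bump. An RRSIG covers the SOA RDATA, serial included, so if the signature was computed earlier (for example by a master, over the stored serial) any change to the served serial breaks it. The toy sketch below shows this; as an assumption for the sake of a stdlib-only example, an HMAC-SHA256 stands in for the real RSA/SHA-256 signature (algorithm 8), the key is hypothetical, and only the numeric SOA fields are encoded.

```python
import hashlib
import hmac
import struct

# Toy illustration only: HMAC-SHA256 stands in for the real RSA signature.
SIGNING_KEY = b"hypothetical-zone-key"  # hypothetical key


def soa_rdata(serial: int, refresh: int = 10800, retry: int = 3600,
              expire: int = 604800, minimum: int = 3600) -> bytes:
    # Real SOA RDATA also carries MNAME and RNAME; this simplified
    # encoding packs only the five numeric fields as 32-bit integers.
    return struct.pack("!5I", serial, refresh, retry, expire, minimum)


def sign(rdata: bytes) -> bytes:
    return hmac.new(SIGNING_KEY, rdata, hashlib.sha256).digest()


def verify(rdata: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(rdata), signature)


# The signature is made over the serial stored in the zone.
stored_serial = 2017101201
rrsig = sign(soa_rdata(stored_serial))

# The stored signature is served unchanged, but the serial is bumped on
# the way out (modelled as +2, matching the serials in the dig output).
served_serial = stored_serial + 2

print(verify(soa_rdata(stored_serial), rrsig))  # True
print(verify(soa_rdata(served_serial), rrsig))  # False
```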

There is also another piece of metadata called PRESIGNED. It indicates that the zone was AXFR’d from a master and already has DNSSEC signatures in the backend; it is set automatically when RRSIGs are seen in the incoming AXFR.
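The rule for setting PRESIGNED can be sketched as follows. This is not PowerDNS code: Record, looks_presigned and update_metadata are hypothetical names, the AXFR is assumed to be already parsed into (name, type, content) tuples, metadata is modelled as a dict of lists, and the example zone is a placeholder.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    # Hypothetical parsed form of one record received in an AXFR.
    name: str
    rtype: str
    content: str


def looks_presigned(axfr: Iterable[Record]) -> bool:
    """True if the transferred zone already carries DNSSEC signatures."""
    return any(rec.rtype == "RRSIG" for rec in axfr)


def update_metadata(metadata: Dict[str, List[str]],
                    axfr: List[Record]) -> None:
    # Set PRESIGNED when RRSIGs arrive in the transfer.
    if looks_presigned(axfr):
        metadata["PRESIGNED"] = ["1"]


zone = [
    Record("example.com.", "SOA",
           "ns1.example.com. hostmaster.example.com. 2017101201 10800 3600 604800 3600"),
    Record("example.com.", "RRSIG",
           "SOA 8 2 3600 20171026000000 20171005000000 36021 example.com. ..."),
]
meta: Dict[str, List[str]] = {}
update_metadata(meta, zone)
print(meta)  # {'PRESIGNED': ['1']}
```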

During the b2b-migrate, the domain metadata for all zones was correctly migrated. This included the SOA-EDIT we had in place for

So after AXFR’ing the zone, we had the following metadata in the database:

select * from domainmetadata where domain_id=41;
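A query like this shows the rows for one zone. To spot the risky combination across all zones, the metadata can be grouped per domain. Below is a minimal sketch using Python's sqlite3 module against an in-memory table modelled on the domainmetadata table (id, domain_id, kind, content), simplified. The rows are hypothetical, including the SOA-EDIT value, which the post does not show.

```python
import sqlite3

# In-memory stand-in for the gsqlite3 database (simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domainmetadata ("
    " id INTEGER PRIMARY KEY, domain_id INT NOT NULL,"
    " kind VARCHAR(32), content TEXT)"
)
# Hypothetical rows; the SOA-EDIT value is an assumption for illustration.
conn.executemany(
    "INSERT INTO domainmetadata (domain_id, kind, content) VALUES (?, ?, ?)",
    [
        (41, "SOA-EDIT", "INCEPTION-INCREMENT"),
        (41, "PRESIGNED", "1"),
        (42, "SOA-EDIT", "INCEPTION-INCREMENT"),
    ],
)

# Domains with both SOA-EDIT and PRESIGNED set: the combination that bit us.
risky = [
    row[0]
    for row in conn.execute(
        "SELECT domain_id FROM domainmetadata"
        " WHERE kind IN ('SOA-EDIT', 'PRESIGNED')"
        " GROUP BY domain_id"
        " HAVING COUNT(DISTINCT kind) = 2"
    )
]
print(risky)  # [41]
```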

When querying the nameserver the following records were returned:

; <<>> DiG 9.11.2 <<>> +dnssec soa
;; ANSWER SECTION:
      3600  IN   SOA pieter\ 2017101203 10800 3600 604800 3600
      3600  IN   RRSIG   SOA 8 2 3600 20171026000000 20171005000000 36021 C/VixIC.....

After some debugging, the PRESIGNED metadata was removed. And lo and behold, the following was returned:

;; ANSWER SECTION:
      3600  IN   RRSIG   SOA 8 2 3600 20171026000000 20171005000000 36021 C/VixIC.....
      3600  IN   SOA pieter\ 2017101201 10800 3600 604800 3600

Notice that the SOA serial is now 2 lower (2017101201 instead of 2017101203). This SOA record actually validates correctly:

Info:    OK the chain of trust is valid.

So it appears PowerDNS honors the SOA-EDIT metadata for pre-signed zones, oops! To prevent others from running into this issue, the problem was clearly documented and a fix was written within an hour. This fix will land in the next release of the PowerDNS Authoritative Server, and will be backported to the 4.0 release train as well.
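Conceptually, the desired behaviour is: when a zone is PRESIGNED, serve the stored serial untouched, because the stored RRSIG was made over it. Below is a minimal sketch of that decision; it is not the actual patch (the server itself is written in C++), and serial_to_serve and fake_soa_edit are hypothetical names, with fake_soa_edit standing in for whatever rule the SOA-EDIT value selects.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, List[str]]


def serial_to_serve(stored_serial: int, metadata: Metadata,
                    soa_edit: Callable[[int, str], int]) -> int:
    """Pick the SOA serial to put on the wire.

    Pre-signed zones carry an RRSIG made over the stored serial, so the
    serial must be served unchanged; otherwise SOA-EDIT may rewrite it.
    """
    if metadata.get("PRESIGNED", ["0"])[0] == "1":
        return stored_serial
    kind = metadata.get("SOA-EDIT", [""])[0]
    return soa_edit(stored_serial, kind) if kind else stored_serial


def fake_soa_edit(serial: int, kind: str) -> int:
    # Hypothetical stand-in for a real SOA-EDIT rule: just bump by 2.
    return serial + 2


presigned = {"SOA-EDIT": ["INCEPTION-INCREMENT"], "PRESIGNED": ["1"]}
print(serial_to_serve(2017101201, presigned, fake_soa_edit))  # 2017101201
```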

What We Learned

Several things were learned in the course of migrating and figuring out what was going on:

The interaction between SOA-EDIT and PRESIGNED was not known, so this corner case was not foreseen in the migration plan.

Our DNSSEC trust-chain monitoring only found this issue because it queries for the SOA; a broken signature on another record type could have gone unnoticed. In the future we will write a script that attempts to check all (or at least many) RRSIGs in the zone.
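As a starting point for such a script, here is a minimal sketch of a zone-wide RRSIG sanity check. It makes several assumptions: records arrive as (owner, type, rdata) tuples in presentation format, the example zone is a placeholder, and it does not verify signatures cryptographically (that needs the DNSKEYs and a DNSSEC library). It only checks that every RRset has an RRSIG and that each RRSIG's validity window covers a given time. It also naively expects every RRset to be signed, whereas delegation NS and glue records are not, so a real checker would need to skip those.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (owner, type, rdata in presentation format)


def parse_time(stamp: str) -> datetime:
    # RRSIG timestamps in presentation format are YYYYMMDDHHmmSS in UTC.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def check_rrsigs(records: List[Record], now: datetime) -> List[str]:
    problems: List[str] = []
    rrsets: Set[Tuple[str, str]] = set()
    covered: Dict[Tuple[str, str], List[str]] = {}
    for owner, rtype, rdata in records:
        if rtype == "RRSIG":
            # Fields: type covered, algorithm, labels, original TTL,
            # expiration, inception, key tag, signer, signature.
            fields = rdata.split()
            covered.setdefault((owner, fields[0]), []).append(rdata)
            expiration, inception = parse_time(fields[4]), parse_time(fields[5])
            if not inception <= now <= expiration:
                problems.append(f"{owner} RRSIG {fields[0]}: outside validity window")
        else:
            rrsets.add((owner, rtype))
    for owner, rtype in sorted(rrsets - set(covered)):
        problems.append(f"{owner} {rtype}: no RRSIG")
    return problems


records = [
    ("example.com.", "SOA",
     "ns1.example.com. hostmaster.example.com. 2017101201 10800 3600 604800 3600"),
    ("example.com.", "RRSIG",
     "SOA 8 2 3600 20171026000000 20171005000000 36021 example.com. C/VixIC..."),
    ("example.com.", "NS", "ns1.example.com."),
]
print(check_rrsigs(records, datetime(2017, 10, 12, tzinfo=timezone.utc)))
# ['example.com. NS: no RRSIG']
```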

And most importantly: eating your own dogfood is an amazing way to find bugs before your users do!