The powerdns.com SOA validation failure
On October 12th, the SOA record for powerdns.com did not pass DNSSEC validation due to a broken signature. A string of events led to the creation of PowerDNS bug #5814 and incidentally also to #5807.
The Plan and Execution
On the 10th of October we started a small internal project to change the setup of our authoritative infrastructure. The goal of this project was a more representative authoritative DNS infrastructure that we could use to better dogfood our own software. In short, this was the work that had to be done:
- Upgrade to 4.1.0-rc on pdns-public-ns1.powerdns.com
- Switch to the gsqlite3 backend from the BIND backend
- Add a hidden master on pdns-public-ns1.powerdns.com and slave via the local loopback
The upgrade was easy enough with the use of the repositories.
The whole migration from BIND to gsqlite3 was very straightforward thanks to pdnsutil b2b-migrate, although some bugs were found and subsequently fixed.
Adding the hidden master was also effortless. After changing the zones from ‘MASTER’ to ‘SLAVE’ and setting the master addresses to the local loopback address, the whole chain of zone transfers was tested and worked properly: NOTIFY messages were sent, AXFRs were performed and dig was happy.
Yay, time to stop working for the day!
The Missed Warning Signs
The next morning I was pretty sleepy on a train on my way to Open-Xchange Summit and saw a bunch of monitoring emails come in regarding the DNSSEC validation of our domains:
Info: CRITICAL powerdns.com: validation failure <powerdns.com. SOA IN>: RSA signature verification failed from 2604:a880:1:20::132:5001
And yes, we actually monitor SOA freshness between master and slaves and the DNSSEC chain of trust for all our domains (as one should). In my sleepy state, I decided these were probably emails from yesterday and just stared out the window until I arrived in Brussels. Once there, I got a message from the cool people at internet.nl that the RRSIG sent alongside the SOA for powerdns.com was not valid. Oops!
In the PowerDNS Authoritative Server, we try really hard to ensure that slave servers will fetch fresh RRSIGs when zones are re-signed. One of the ways we do this is the SOA-EDIT domain metadata. This piece of metadata lets the nameserver process bump the SOA serial artificially when serving the record from the backend.
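To make the serial bumping concrete, here is a minimal Python sketch of the INCEPTION-INCREMENT mode. It is written from our reading of the documented behaviour, not copied from the server code, so treat the details as assumptions: that the signing week starts on Thursday 00:00 UTC, that serials use the YYYYMMDDSS format, and that a backend serial within two days of the week start is bumped by two. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

WEEK_SECONDS = 7 * 86400


def start_of_week(now):
    """Most recent Thursday 00:00 UTC (the Unix epoch began on a Thursday)."""
    ts = int(now.timestamp())
    return datetime.fromtimestamp(ts - ts % WEEK_SECONDS, tz=timezone.utc)


def yyyymmddss(day, seq):
    """Build a YYYYMMDDSS serial from a date and a two-digit sequence number."""
    return int(day.strftime("%Y%m%d")) * 100 + seq


def inception_increment(backend_serial, now):
    """Serial served to clients under the assumed INCEPTION-INCREMENT rule."""
    week = start_of_week(now)
    inception_serial = yyyymmddss(week, 1)
    dont_increment_after = yyyymmddss(week + timedelta(days=2), 99)

    if backend_serial < inception_serial - 1:
        # Backend serial predates this signing week: jump to <week start>01.
        return inception_serial
    if backend_serial <= dont_increment_after:
        # Backend serial falls inside the window: bump it by two.
        return backend_serial + 2
    # Backend serial is already ahead of the window: serve it unchanged.
    return backend_serial
```

Under these assumptions, a backend serial of 2017101201 served on 12 October 2017 comes out as 2017101203, which is an increase of two over what is stored in the database.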
There is also another piece of metadata called PRESIGNED. This indicates that the zone was AXFR’d from a master and already has DNSSEC signatures in the backend. It is set automatically when we see RRSIGs in the incoming AXFR.
During the pdnsutil b2b-migrate run, the domain metadata for all zones was correctly migrated. This included the SOA-EDIT we had in place for powerdns.com.
So after AXFR’ing the zone, we had the following metadata in the database:
select * from domainmetadata where domain_id=41;
9|41|SOA-EDIT|INCEPTION-INCREMENT
11|41|PRESIGNED|1
When querying the nameserver the following records were returned:
; <<>> DiG 9.11.2 <<>> +dnssec soa powerdns.com @pdns-public-ns1.powerdns.com.
...
;; ANSWER SECTION:
powerdns.com. 3600 IN SOA pdns-public-ns1.powerdns.com. pieter\.lexis.powerdns.com. 2017101203 10800 3600 604800 3600
powerdns.com. 3600 IN RRSIG SOA 8 2 3600 20171026000000 20171005000000 36021 powerdns.com. C/VixIC.....
After some debugging, the PRESIGNED metadata was removed. And lo and behold, the following was returned:
;; ANSWER SECTION:
powerdns.com. 3600 IN RRSIG SOA 8 2 3600 20171026000000 20171005000000 36021 powerdns.com. C/VixIC.....
powerdns.com. 3600 IN SOA pdns-public-ns1.powerdns.com. pieter\.lexis.powerdns.com. 2017101201 10800 3600 604800 3600
Notice that the SOA serial is reduced by 2. This SOA record actually validates correctly:
Info: OK powerdns.com: the chain of trust is valid.
So it appears PowerDNS honors the SOA-EDIT metadata for pre-signed zones, oops! The RRSIG that came in over AXFR covers the SOA with the original serial, so bumping the serial while serving the record makes the signature invalid. To prevent others from running into this issue, the problem was clearly documented and a solution was written within an hour. This fix will land in the next release of the PowerDNS Authoritative Server, and will be backported to the 4.0 release train as well.
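The effect can be reproduced with a toy model. The sketch below is not the actual patch and does no real DNSSEC: HMAC-SHA256 with a made-up key stands in for the zone's RSA signature, and served_serial is a hypothetical helper that models the serving logic with and without the PRESIGNED check. It only illustrates that a signature made over one serial cannot validate a record served with another.

```python
import hashlib
import hmac

# Hypothetical key: HMAC-SHA256 is only a stand-in for the zone's RSA key.
TOY_KEY = b"not-a-real-dnssec-key"


def soa_rdata(serial):
    """SOA RDATA in presentation format with the given serial."""
    return ("pdns-public-ns1.powerdns.com. pieter\\.lexis.powerdns.com. "
            f"{serial} 10800 3600 604800 3600")


def toy_sign(rdata):
    """'Sign' the RDATA; a real RRSIG covers the serial in the same way."""
    return hmac.new(TOY_KEY, rdata.encode(), hashlib.sha256).hexdigest()


def toy_verify(rdata, signature):
    return hmac.compare_digest(toy_sign(rdata), signature)


def served_serial(stored_serial, soa_edit_bump, presigned, skip_edit_when_presigned):
    """Serial handed to clients.

    skip_edit_when_presigned=False models the behaviour we ran into,
    True models the intended behaviour for pre-signed zones.
    """
    if presigned and skip_edit_when_presigned:
        return stored_serial
    return stored_serial + soa_edit_bump
```

With the signature made on the master over serial 2017101201, the first mode serves 2017101203 and verification fails, while the second mode serves the stored serial and verification succeeds.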
What We Learned
Several things were learned in the course of migrating and figuring out what was going on:
- The SOA-EDIT interaction with PRESIGNED was not known and thus a corner case that was not foreseen in the migration plan.
- Our DNSSEC trust-chain monitoring only found this issue because it queries for SOA. In the future we will write a checking script that will attempt to check all (or at least many) RRSIGs in the zone.
- And most importantly: eating your own dogfood is an amazing way to find bugs before your users do!
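As a starting point for such a checking script, here is a rough Python sketch. It is deliberately limited: it only reads records in presentation format (for example the output of a zone transfer), pairs each RRset with its RRSIGs and checks the validity window. It does not verify any signature cryptographically, so it would not have caught the serial mismatch on its own; a real check would hand each RRset to a validating library or resolver. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_rrsig_time(text):
    """Parse the YYYYMMDDHHMMSS timestamps used in RRSIG presentation format."""
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def check_rrsig_windows(zone_text, now):
    """Return a sorted list of problems found in presentation-format records.

    Only presence and validity windows are checked, not the signatures.
    Unsigned data such as delegation NS records and glue will be reported
    too, so a real script needs to know where the zone cuts are.
    """
    rrsets = set()   # (owner, type) for every non-RRSIG record seen
    signed = set()   # (owner, type) covered by at least one RRSIG
    valid = set()    # (owner, type) covered by an RRSIG valid at `now`

    for raw in zone_text.splitlines():
        fields = raw.split()
        if len(fields) < 5 or fields[0].startswith(";"):
            continue  # comments, blank lines, dig section headers
        owner, rtype = fields[0].lower(), fields[3]
        if rtype != "RRSIG":
            rrsets.add((owner, rtype))
            continue
        if len(fields) < 13:
            continue  # truncated RRSIG line, nothing to check
        # RRSIG RDATA: covered alg labels ttl expiration inception tag signer sig
        covered, expiration, inception = fields[4], fields[8], fields[9]
        signed.add((owner, covered))
        if parse_rrsig_time(inception) <= now <= parse_rrsig_time(expiration):
            valid.add((owner, covered))

    problems = []
    for owner, rtype in sorted(rrsets):
        if (owner, rtype) not in signed:
            problems.append(f"{owner} {rtype}: no RRSIG found")
        elif (owner, rtype) not in valid:
            problems.append(f"{owner} {rtype}: no RRSIG inside its validity window")
    return problems
```

Feeding it the full zone rather than a single SOA query is the point: every RRset gets looked at, not just the one our monitoring happened to ask for.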