While working on the TLDmon plugins a couple of weeks ago, I noticed that a certain query to b.iana-servers.net was consistently failing:
    $ dig +bufsiz=2048 @b.iana-servers.net XN--9T4B11YI5A RRSIG

    ; <<>> DiG 9.3.5-P2 <<>> +bufsiz=2048 @b.iana-servers.net XN--9T4B11YI5A RRSIG
    ; (1 server found)
    ;; global options:  printcmd
    ;; connection timed out; no servers could be reached

It was strange because queries to the same server for all of the other TLDs hosted there work just fine:

    $ dig +short +bufsiz=2048 @b.iana-servers.net XN--KGBECHTV RRSIG | wc
          6      78    1794
    $ dig +short +bufsiz=2048 @b.iana-servers.net XN--HGBK6AJ7F53BBA RRSIG | wc
          6      78    1831

It was also strange because the problematic TLD works fine from hosts outside of ISC's network (which is where the OARC servers are located), and it works if the query is sent to a.iana-servers.net or c.iana-servers.net. A tcpdump shows that the DNS reply message is fragmented and we only get the first fragment.

That this problem happens only (?) when querying from ISC's network seems to imply it is caused by something on ISC's network. But then why does it work when querying c.iana-servers.net? Why would the second fragment from c arrive, but not the second fragment from b?

Here's a tcpdump showing the correctly received second fragment from c. I think it is safe to assume that the fragment leaving b is the same, except with different values in some IP header fields (ip_sum, ip_id, ip_ttl, ip_src). Here's another tcpdump showing the fragment from b received outside ISC's network. Note that both of these fragments are smaller (ip_len=31) than the minimum Ethernet payload size, so they are padded out to 46 bytes.
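To make the padding arithmetic concrete, here is a minimal sketch of the numbers involved. It assumes a plain 20-byte IPv4 header with no options, which the captures do not state explicitly; the function name `fragment_sizes` is just for illustration.

```python
ETH_MIN_PAYLOAD = 46   # bytes; minimum payload of an Ethernet frame
IPV4_HEADER_LEN = 20   # bytes; ASSUMPTION: IPv4 header without options


def fragment_sizes(ip_len):
    """Return (data_bytes, padding_bytes) for an IP packet of total length ip_len."""
    # Bytes of the DNS reply carried in this fragment, after the IP header.
    data_bytes = ip_len - IPV4_HEADER_LEN
    # Ethernet pads any payload shorter than the 46-byte minimum.
    padding_bytes = max(0, ETH_MIN_PAYLOAD - ip_len)
    return data_bytes, padding_bytes


data, pad = fragment_sizes(31)
print(f"fragment data: {data} bytes, Ethernet padding: {pad} bytes")
# prints: fragment data: 11 bytes, Ethernet padding: 15 bytes
```

Under that assumption, the missing second fragment carries only 11 bytes of the DNS reply, followed by 15 bytes of Ethernet padding.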