OARC's TLDmon uses Nagios to monitor operational characteristics of authoritative nameservers for the Root Zone and all Top Level Domains. TLDmon checks for authoritative answers, EDNS support, lame delegations, consistent NS RR sets, open resolvers, expired RRSIGs, matching serial numbers, and TCP support. As the Domain Name System continues its evolution, it becomes increasingly important that these critical nameservers are configured correctly.
TLDmon is available to the public. OARC members can receive notifications (via email) about zone problems directly from Nagios. Members can also request monitoring of additional (non-TLD) zones. Please contact the OARC Admin for more information.
Nagios is really designed to monitor hosts and services that run on those hosts. We've configured Nagios such that each DNS zone is a "host" and the characteristic to be monitored is a "service." TLDmon checks the following operational characteristics of each zone:
AA
The AA service checks that all nameservers set the Authoritative Answer (AA) bit in responses to SOA queries for the zone. When all nameservers set the AA bit, the AA service is marked OK and shown in green. If one or more do not set the AA bit, the service is marked WARNING and shown in yellow. A nameserver that does not set the AA bit may be configured as a caching resolver, rather than an authoritative server. Caching resolvers are susceptible to DNS cache poisoning.
EDNS
The EDNS service checks that all nameservers support EDNS0 extensions. When all nameservers support EDNS0, the EDNS service is marked OK and shown in green. If one or more do not, the service is marked WARNING and shown in yellow. The EDNS0 protocol extension (written in 1999) is necessary for the transmission of UDP DNS messages larger than 512 octets. It is also used to signal DNSSEC support, so that nameservers include DNSSEC records in their responses.
IPV6
The IPV6 service checks that the zone's IPv6-enabled nameservers are working. Only those zones with at least one IPv6-enabled nameserver are checked. When all IPv6-enabled nameservers are working, the service is marked OK and shown in green. If one or more are not working, the service is marked WARNING and shown in yellow. IPv6 is increasingly important as the IPv4 free pool shrinks in size.
LAME
The LAME service checks that no nameservers are lame for the zone. A nameserver is considered lame when the response to an SOA query for the zone contains no records in the answer section. When no nameservers are lame, the LAME service is marked OK and shown in green. If one or more are lame, the service is marked WARNING and shown in yellow. A lame nameserver results in wasted queries and additional latency for end users.
NSSET
The NSSET service checks that all nameservers report the same set of NS records for the zone and that they match the delegations from the parent zone. Note that lame nameservers are excluded from this check. If all (non-lame) nameservers report the same NS set, the service is marked OK and shown in green. If there is at least one inconsistency, the service is marked WARNING and shown in yellow. A nameserver that is known by different names appears as an inconsistency when the delegation name does not match the name listed in the zone. Some people may not consider this a real problem. The NSSET service only checks the nameserver names, not their IP addresses.
OPENRES
The OPENRES service checks that none of the nameservers are open resolvers (i.e., providing recursive resolution to any client). If no nameservers are open resolvers, the OPENRES service is marked OK and shown in green. If one or more are open, the service is marked WARNING and shown in yellow. It is usually a bad idea to mix recursive and authoritative DNS services in a single process, and especially so for top-level zones in the DNS infrastructure. Open resolvers have an increased vulnerability to cache poisoning and denial of service.
QTYPE
The QTYPE service checks how each nameserver responds to a query with an unknown QTYPE. The response should always come back as unknown; if a nameserver sends an unexpected response instead, the service is marked WARNING.
RCODE
The RCODE service checks that all nameservers return response code zero ("NOERROR") in response to an SOA query for the zone. If all nameservers return NOERROR, the RCODE service is marked OK and shown in green. If one or more return an error code (such as SERVFAIL or REFUSED) or no response at all, the service is marked WARNING and shown in yellow. Nameservers that return errors or cause timeouts lead to wasted queries and increased latency for end users.
RRSIG
The RRSIG service checks the expiration time of DNSSEC RRSIG records for the zone itself. Zones that do not implement DNSSEC are excluded from this check. If all RRSIG records expire more than 3 days from now, the service is marked OK and shown in green. If any records expire in less than 3 days, the service is marked WARNING and shown in yellow. If one or more RRSIG records have already expired, the service is marked CRITICAL and shown in red.
SERIAL
The SERIAL service checks that all nameservers report the same serial number for the zone. When all nameservers report the same serial number, the SERIAL service is marked OK and shown in green. If one or more nameservers has a different serial number, the service is marked WARNING and shown in yellow. Serial number checking is prone to false alarms due to latencies involved in master/slave synchronization, and in the time that it takes to query multiple nameservers. To reduce false alarms, we tolerate two exceptions to the requirement that serial numbers must match:
- We always tolerate off-by-one differences, such as 12345 and 12346.
- We tolerate a difference of up to 3600 if the serial number appears to be a Unix epoch time value and the maximum serial differs from the current time by less than 3600 seconds.
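The two serial number exceptions above can be sketched in a few lines of Python. This is only an illustration, not TLDmon's actual plugin code: the function name `serial_status`, its parameters, and the reading of "appears to be a Unix epoch time value" as "the maximum serial is within 3600 seconds of the current time" are all assumptions. The sketch compares serials as plain integers and ignores serial number wraparound.

```python
import time
from typing import Iterable, Optional

OK = "OK"
WARNING = "WARNING"


def serial_status(serials: Iterable[int],
                  now: Optional[float] = None,
                  epoch_window: int = 3600) -> str:
    """Classify the SOA serials reported by a zone's nameservers.

    Hypothetical helper: applies the two tolerances described in the
    text (off-by-one, and epoch-style serials within one hour).
    """
    values = list(serials)
    if not values:
        raise ValueError("at least one serial number is required")
    if now is None:
        now = time.time()

    lowest, highest = min(values), max(values)
    spread = highest - lowest

    # Identical serials, or an off-by-one difference such as 12345/12346.
    if spread <= 1:
        return OK

    # Assumption: a serial "appears to be" an epoch timestamp when the
    # maximum serial is within epoch_window seconds of the current time.
    looks_like_epoch = abs(now - highest) < epoch_window
    if looks_like_epoch and spread <= epoch_window:
        return OK

    return WARNING


if __name__ == "__main__":
    fixed_now = 1_700_000_000  # fixed clock so the example is repeatable
    print(serial_status([12345, 12346, 12345], now=fixed_now))
    print(serial_status([fixed_now - 1800, fixed_now - 10], now=fixed_now))
    print(serial_status([2024010101, 2024010105], now=fixed_now))
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than a version that always reads the system clock.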