
(Note: For further information on other OARC data, please consult the OARC Data Catalog. Information about contributing to DITL can be found below.)

An Overview

DNS-OARC collects DNS packet captures from busy and interesting DNS nameservers through various means, including the annual Day In The Life of the Internet (DITL) collection effort. The collection also includes data in other formats, such as BIND query logs.

The Day in the Life collection is an annual event in which contributors record traffic from DNS servers they manage over the same 50-hour period. The result is a two-day slice of DNS traffic across a broad swath of the Internet, useful for research into "typical" behavior, trends over time, and other subjects.

OARC offers researchers and OARC members access to these data through a number of analysis servers. Approved users are given logins on these servers to conduct whatever studies their work requires. The Data Sharing Agreement under which data are contributed to DNS-OARC requires that the original data be kept on servers under OARC's control, and restricts what may be taken off those servers to highly aggregated, anonymized results synthesized from the original data and suitable for publication.

Organizations wishing to become an OARC Member, in order to access these data, should consult the Joining and Participating in DNS-OARC page.

A Day in the Life of the Internet is a large-scale data collection project, initially undertaken by CAIDA, and managed by OARC every year since 2006. If you would like to participate by collecting and contributing DNS packet captures, please email us.

Participation Requirements

There are no strict participation requirements. OARC is happy to accept data from members and non-members alike. We particularly encourage participation from:

  • root server operators
  • TLD operators
  • AS112 operators and other large reverse zone operators (e.g. Regional Internet Registries, ISPs)
  • medium sized recursive operators (e.g. large universities or enterprises, regional ISPs)

For recursive server contributions, we expect the data collection to be done on the network interface "above" the recursive server, capturing traffic to authoritative servers instead of traffic directly from individual clients. This avoids privacy issues with personally identifiable information.
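
As a minimal sketch of that kind of capture (assuming a dedicated upstream-facing interface, here called eth1, and a placeholder resolver address of 198.51.100.53), a tcpdump invocation might look like this:

# Capture only DNS traffic between the resolver's own address and the
# Internet on the upstream-facing interface. The interface name and
# address are placeholders; adjust them, and the filter, to your topology.
sudo tcpdump -i eth1 -w resolver-upstream.pcap 'port 53 and host 198.51.100.53'

If client queries cross the same interface, extend the filter to exclude them, for example by also requiring that the resolver's side of each upstream conversation use an ephemeral port rather than port 53 (clients query the resolver on port 53, while its own upstream queries normally leave from ephemeral source ports).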

Any organization that wishes to contribute data to DITL should contact staff to coordinate setup.

Types of DNS Data

DITL contributions are typically PCAP files (from tools like dnscap or tcpdump). OARC has an established system to receive a stream of compressed PCAP files from contributors, live during the collection. Contributing organizations will need a login for DNS-OARC's data collection system to upload data. Contact OARC staff if you wish to participate and do not already have a login for your organization.

In some cases in the past we have accepted data in other formats, such as query logs. If you wish to contribute data in a format other than PCAP, please contact us to make other arrangements.

Technical Information for Contributors

Pre-collection Checklist

  • Please make sure that your collection hosts are time-synchronized with NTP. Do not simply use date to check a clock as timezone offsets can go unnoticed. Use ntpd (or similar long running daemons) to keep your clocks in sync, or use ntpdate like this:
    $ ntpdate -q time.google.com
    server 204.152.184.72, stratum 1, offset 0.002891, delay 0.02713
    Pick a time server that makes sense for your network.

    The reported offset should normally be very small (less than one second). If not, your clock is probably not synchronized with NTP.
  • Be sure to do some "dry runs" before the actual collection time. This will test your procedures and give you a sense of how much data you'll be collecting. OARC runs an official test window two weeks before the main collection, but other testing is welcome anytime up to a few days before.
  • Carefully consider your local storage options. Do you have enough local space to store all the DITL data, or will you need to upload it as it is being collected? If you have enough space, you may find it easier to collect first and upload afterwards, rather than trying to manage both at the same time. A rough sizing sketch follows this checklist.
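
As a rough way to answer the sizing question above, one approach (a sketch, not part of the official tooling) is to run a short test capture, compress it, and extrapolate; the interface name below is a placeholder:

# Capture ten minutes of DNS traffic, compress it, and scale the compressed
# size up to the 50-hour DITL window (300 ten-minute intervals). Traffic
# levels vary over the day, so treat the result as a rough estimate only.
# (stat -c %s is the GNU form; on BSD systems use stat -f %z instead.)
sudo timeout 600 tcpdump -i eth0 -w ditl-test.pcap 'port 53'
gzip -9 ditl-test.pcap
echo "Estimated 50-hour total: $(( $(stat -c %s ditl-test.pcap.gz) * 300 / 1024 / 1024 )) MB"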

Collecting Data with dnscap

If you don't already have your own collection system for DNS traffic, we recommend using dnscap, with some shell scripts that we provide specifically for DITL:

  1. Install the most recent version of dnscap, available from the OARC package repository.
  2. Next, download the ditl-tools package. This provides scripts for automatic capture and upload using either dnscap, or tcpdump with tcpdump-split.

    In most cases dnscap should be the easiest option. The tcpdump method is included for sites that prefer it or cannot use dnscap for some reason. Note that the settings.sh configuration file described below includes variables for both dnscap and tcpdump; some are common to both, while others are unique to each. By default the scripts store PCAP files in the current directory, so you may want to copy them to a directory where you have plenty of free disk space.
  3. Copy settings.sh.default to settings.sh.
  4. Open settings.sh in a text editor.
  5. Set the IFACES variable to the names of your network interfaces carrying DNS data.

    IMPORTANT: For recursive servers this should be the interface where outgoing queries toward authoritative servers or upstream forwarders exit the system, and not the interface where incoming client queries are received. If these are the same interface, then additional filters will be required to ensure only queries sourced from the local server are captured.
  6. Set the NODENAME variable (or leave it commented to use the output of hostname as the NODENAME). Please make sure that each instance of dnscap that you run has a unique NODENAME!
  7. Set the OARC_MEMBER variable to your OARC-assigned contributor login. The provided scripts automatically prepend oarc- to the login name before connecting, so just give the short version here. The scripts assume your OARC ssh upload key is at /root/.ssh/oarc_id_ed25519 unless the settings are changed.
  8. Look over the remaining variables in settings.sh. Read the comments in capture-dnscap.sh to understand what all the variables mean.

Here is an example of a customized settings.sh file:

# Settings that you should customize
#
IFACES="fxp0"
NODENAME="lgh"
OARC_MEMBER="test"

# START_T='2011-04-12 11:00:00'
# STOP_T='2011-04-14 13:00:00'

When you're done customizing the settings, run capture-dnscap.sh as root:

$ sudo sh capture-dnscap.sh
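
For orientation, the capture the script drives is roughly of the following form (a sketch only, assuming dnscap's -i, -w, and -t options; the actual invocation is built by capture-dnscap.sh from your settings, and the script also compresses and uploads each closed file, which this sketch omits):

# Illustrative only: capture DNS on a placeholder interface, writing dump
# files with the given prefix and starting a new file every 300 seconds.
sudo dnscap -i fxp0 -w ditl-lgh -t 300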

When it's time to do the actual DITL data collection, uncomment the START_T and STOP_T variables in settings.sh. The date settings for each year's DITL collection are communicated to contributors in early February, giving plenty of notice for testing and setup.

With the date values set, the script will automatically start and stop capturing data at the correct times.

You can run the scripts from within a terminal session manager like screen or tmux so that a dropped terminal connection does not prematurely end your collection.
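
For example, assuming tmux is installed, something like the following starts the capture in a detached session that survives your logout (screen -dmS can be used similarly):

# Start the capture script in a detached tmux session named "ditl";
# reattach later with: sudo tmux attach -t ditl
sudo tmux new-session -d -s ditl 'sh capture-dnscap.sh'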

Collecting Data with tcpdump and tcpdump-split

Another collection option is to use tcpdump and our tcpdump-split program. The instructions are similar to the above:

  1. Download and install the ditl-tools package.
  2. Follow the instructions in the ditl-tools README.md file for compiling and installing tcpdump-split.
  3. Copy settings.sh.default to settings.sh and bring it up in a text editor.
  4. Set the IFACES variable to the single network interface to collect DNS data from.
  5. Set NODENAME and OARC_MEMBER as above.
  6. Set DESTINATIONS if desired.
  7. Start the capture with:
    $ sudo sh capture-tcpdump.sh
  8. Set and uncomment the START_T and STOP_T values, and use screen or tmux for the main collection event, also as above.

Uploading Data Manually

If an interruption forces the scripts to be restarted, or files have accumulated on your local server because uploads failed (for example, due to a bad SSH key), you can still upload that data manually. This is easily done by invoking the pcap-submit-to-oarc.sh script in a shell loop, like so:

for F in *.pcap.gz; do /your/path/to/pcap-submit-to-oarc.sh "$F"; done

If you happen to have files that have not been compressed, compress those manually and then invoke the script as above.
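
For instance, assuming gzip and raw files ending in .pcap, the compress-and-upload steps can be combined like this:

# Compress any remaining raw capture files, then upload each compressed file.
gzip *.pcap
for F in *.pcap.gz; do /your/path/to/pcap-submit-to-oarc.sh "$F"; done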

Anonymization

Data providers may wish to anonymize their PCAP data prior to upload due to privacy concerns, corporate policy, or local legal requirements. There are a number of tools available for anonymizing PCAP data in ways that are still scientifically useful. OARC staff can put new contributors in contact with existing contributors who already anonymize their contributions, for pointers and assistance.

Contact

Contact DNS-OARC staff with any questions about DITL.