A Day in the Life of the Internet is a large-scale data collection project initially undertaken by CAIDA and subsequently by OARC every year since 2006. This year, the DITL collection will take place in April. If you would like to participate by collecting and contributing DNS packet captures, please subscribe to the DITL mailing list.

Participation Requirements

There are no strict participation requirements. OARC is happy to accept data from members and non-members alike. You will need a login from OARC to submit data, and OARC will need your ssh public key. Contact OARC Admin if you need to set up or update your login or ssh keys. If you are not an OARC member, you may want to sign a Proprietary Data Agreement with us, but this is not required. In terms of data sources, we are always interested in broad coverage from DNS Root servers, TLD servers, AS112 nodes, and "client-side" iterative/caching resolvers.

Types of DNS Data

Most of the data that we collect for DITL will be pcap files (e.g., from dnscap or tcpdump). We are also happy to accept other data formats such as BIND query logs, text files, SQL database dumps, and so on. We have an established system for receiving compressed pcap files from contributors. If you want to contribute data in a different format, please contact us to make transfer arrangements.

Pre-collection Checklist

  • Please make sure that your collection hosts are time-synchronized with NTP. Do not simply use date to check a clock since you might be confused by time zone offsets. Instead use ntpdate like this:
    $ ntpdate -q clock.isc.org
    server 204.152.184.72, stratum 1, offset 0.002891, delay 0.02713
    
    The reported offset should normally be very small (less than one second). If not, your clock is probably not synchronized with NTP.
  • Be sure to do some "dry runs" before the actual collection time. This will test your procedures and give you a sense of how much data you'll be collecting.
  • Carefully consider your local storage options. Do you have enough local space to store all the DITL data? Or will you need to upload it as it is being collected? If you have enough space, perhaps you'll find it easier to collect first and upload after, rather than trying to manage both at the same time.
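
The clock and storage checks above can be scripted. The following is a minimal sketch and is not part of the ditl-tools package: the function names (ntp_offset, offset_ok, estimate_bytes) are hypothetical, the parser assumes ntpdate -q prints a "server ..., offset ..." line like the sample shown above, and the size estimate assumes traffic during the dry run is representative of the real collection window.

```shell
#!/bin/sh
# Pre-collection helpers (illustrative sketch, hypothetical names).

# Print the offset field from one line of `ntpdate -q` output, e.g.
# "server 204.152.184.72, stratum 1, offset 0.002891, delay 0.02713".
ntp_offset() {
    printf '%s\n' "$1" | sed -n 's/.*offset \([-+]\{0,1\}[0-9.]*\).*/\1/p'
}

# Succeed if the absolute offset is below a limit in seconds (default 1).
# An empty offset (nothing parsed) is treated as a failure.
offset_ok() {
    [ -n "$1" ] || return 1
    awk -v o="$1" -v max="${2:-1}" 'BEGIN { if (o < 0) o = -o; exit !(o < max) }'
}

# Extrapolate capture size: bytes written during a dry run of dry_secs
# seconds (must be nonzero), scaled to a window of window_secs seconds.
# Arguments: dry_run_bytes dry_secs window_secs
estimate_bytes() {
    awk -v b="$1" -v d="$2" -v w="$3" 'BEGIN { printf "%.0f\n", b / d * w }'
}

# Example use on a collection host (commented out so that sourcing this
# file does not touch the network):
#   line=$(ntpdate -q clock.isc.org | grep '^server' | head -n 1)
#   offset_ok "$(ntp_offset "$line")" || echo "clock offset too large" >&2
#   estimate_bytes 1200000 600 180000   # 10-minute dry run, 50-hour window
```

Compare the estimate against the free space reported by df on the capture directory, and leave generous headroom, since query rates during the real collection may differ from those in the dry run.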

Collecting Data with dnscap

If you don't already have your own system for capturing DNS traffic, we recommend using dnscap with some shell scripts that we provide specifically for DITL collection.

  1. Download the most recent version of dnscap.
  2. Note that dnscap does not require libbind, unless you want to use the -x or -X options.
  3. Run ./configure and make, then run 'make install' as root. This installs dnscap to /usr/local/bin.
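
After installing, it can help to confirm that the commands you expect to use are on your PATH. This is only a sketch: have_tools is a hypothetical helper, and the tool list in the usage comment is an assumption based on the tools mentioned on this page, not a documented dependency list for the DITL scripts.

```shell
#!/bin/sh
# Report whether each named command can be found on PATH.
# Returns nonzero if any of them is missing.
have_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (assumed tool list; adjust to your setup):
#   have_tools dnscap gzip ssh screen
```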

Next, download the ditl-tools package, which provides scripts for capturing with either dnscap or with tcpdump and tcpdump-split. In most cases dnscap should be easier. The tcpdump method is included for sites that would prefer it or cannot use dnscap for some reason. Note that the settings.sh configuration file described below includes variables for both dnscap and tcpdump. Some variables are common to both, while some are unique to each. By default the scripts store pcap files in the current directory. You may want to copy the scripts to a different directory where you have plenty of free disk space.

  1. Copy settings.sh.default to settings.sh.
  2. Open settings.sh in a text editor.
  3. Set the IFACES variable to the names of your network interfaces carrying DNS data.
  4. Set the NODENAME variable (or leave it commented to use the output of `hostname` as the NODENAME). Please make sure that each instance of dnscap that you run has a unique NODENAME!
  5. Set the OARC_MEMBER variable to your OARC-assigned name. Note that the scripts automatically prepend "oarc-" to the login name so just give the short version here.
  6. Note that the scripts assume your OARC ssh upload key is at /root/.ssh/oarc_id_dsa.
  7. Look over the remaining variables in settings.sh. Read the comments in capture-dnscap.sh to understand what all the variables mean.

Here is an example customized settings.sh file:

# Settings that you should customize
#
IFACES="fxp0"
NODENAME="lgh"
OARC_MEMBER="test"

#START_T='2011-04-12 11:00:00'
#STOP_T='2011-04-14 13:00:00'
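
Before starting a capture, a quick sanity check of these settings can catch common mistakes, such as including the "oarc-" prefix in OARC_MEMBER. The sketch below is not part of ditl-tools; check_settings is a hypothetical helper that only inspects the variables and key path described above.

```shell
#!/bin/sh
# check_settings: inspect variables already loaded from settings.sh.
# Returns nonzero if a required setting looks wrong.
check_settings() {
    rc=0
    if [ -z "${IFACES:-}" ]; then
        echo "IFACES is empty" >&2
        rc=1
    fi
    if [ -z "${OARC_MEMBER:-}" ]; then
        echo "OARC_MEMBER is empty" >&2
        rc=1
    fi
    # The scripts prepend "oarc-" themselves, so the short name is expected.
    case "${OARC_MEMBER:-}" in
        oarc-*) echo "OARC_MEMBER should omit the oarc- prefix" >&2; rc=1 ;;
    esac
    # Warn only: the key is normally readable by root alone.
    if [ ! -r /root/.ssh/oarc_id_dsa ]; then
        echo "warning: cannot read /root/.ssh/oarc_id_dsa (are you root?)" >&2
    fi
    # NODENAME falls back to the output of hostname when unset.
    echo "node name: ${NODENAME:-$(hostname)}"
    return "$rc"
}

# Usage, from the directory holding settings.sh:
#   . ./settings.sh && check_settings
```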

When you're done customizing the settings, run capture-dnscap.sh as root:

$ sudo sh capture-dnscap.sh

When it's time to do the actual DITL data collection, please uncomment the START_T and STOP_T variables in settings.sh and run the scripts from within a screen session.
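
As a guard against typos in the collection window, the two timestamps can be compared before the capture is launched. This is a sketch only: window_ok is a hypothetical helper, it assumes both values use the 'YYYY-MM-DD HH:MM:SS' form shown in the example settings, and the screen session name "ditl" is arbitrary.

```shell
#!/bin/sh
# Succeed when the first timestamp sorts strictly before the second.
# Timestamps in 'YYYY-MM-DD HH:MM:SS' form order correctly as plain strings.
window_ok() {
    [ -n "$1" ] && [ -n "$2" ] || return 1
    first=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)
    [ "$1" != "$2" ] && [ "$first" = "$1" ]
}

# Example, after uncommenting START_T and STOP_T in settings.sh:
#   . ./settings.sh
#   window_ok "$START_T" "$STOP_T" || echo "check START_T/STOP_T" >&2
#
# Running the capture inside a named screen session:
#   screen -S ditl sudo sh capture-dnscap.sh
# Detach with Ctrl-a d and reattach later with:
#   screen -r ditl
```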

Collecting Data with tcpdump and tcpdump-split

Another collection option is to use tcpdump and our tcpdump-split program. The instructions are similar to the above.

  1. Download and install the ditl-tools package (see link above).
  2. Copy settings.sh.default to settings.sh and open it in a text editor.
  3. Set the IFACES variable to the single network interface to collect DNS data from.
  4. Set NODENAME.
  5. Set OARC_MEMBER.
  6. Set DESTINATIONS if desired.

Start the capture with:

$ sudo sh capture-tcpdump.sh

Uncomment the START_T and STOP_T variables and run the script from within a screen session when it's time for the actual collection.

Uploading Data Manually

If the scripts were interrupted and had to be restarted, or if some files were not uploaded and accumulated on your local server (for example, because of a bad SSH key), you can still upload that data manually. Invoke the pcap-submit-to-oarc.sh script from a shell loop, like so:

for F in *.pcap.gz; do /your/path/to/pcap-submit-to-oarc.sh "$F"; done

If you happen to have files that have not been compressed, compress those manually and then invoke the script as above.
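
The two manual steps (compress, then submit) can be combined into one pass. The following is a sketch under stated assumptions: submit_all and the SUBMIT variable are hypothetical names, the script path is the same placeholder used above, and the capture is assumed to have stopped so that no .pcap file is still being written.

```shell
#!/bin/sh
# Path to the submit script; the default is a placeholder to replace.
SUBMIT=${SUBMIT:-/your/path/to/pcap-submit-to-oarc.sh}

# Compress any leftover .pcap files in a directory, then submit every
# .pcap.gz file found there. Returns nonzero if any step failed.
submit_all() {
    dir=${1:-.}
    failed=0
    for f in "$dir"/*.pcap; do
        [ -e "$f" ] || continue          # glob matched nothing
        gzip "$f" || failed=1
    done
    for f in "$dir"/*.pcap.gz; do
        [ -e "$f" ] || continue
        "$SUBMIT" "$f" || { echo "upload failed: $f" >&2; failed=1; }
    done
    return "$failed"
}

# Usage, from the directory that holds the capture files:
#   submit_all .
```

Files whose upload fails are reported on standard error and left in place, so the function can be run again after the underlying problem is fixed.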

Anonymization

Data providers may wish to anonymize their PCAP data prior to upload due to privacy concerns, corporate policy, or local legal requirements. CAIDA lists a number of tools that may help.

Contact

Contact the OARC Admin with any questions about DITL.