RFC 8618

Compacted-DNS (C-DNS): A Format for DNS Packet Capture

Pages: 79
Proposed Standard

Internet Engineering Task Force (IETF)                      J. Dickinson
Request for Comments: 8618                                      J. Hague
Category: Standards Track                                   S. Dickinson
ISSN: 2070-1721                                               Sinodun IT
                                                            T. Manderson
                                                                   ICANN
                                                                 J. Bond
                                              Wikimedia Foundation, Inc.
                                                          September 2019


         Compacted-DNS (C-DNS): A Format for DNS Packet Capture

Abstract

   This document describes a data representation for collections of DNS
   messages.  The format is designed for efficient storage and
   transmission of large packet captures of DNS traffic; it attempts to
   minimize the size of such packet capture files but retain the full
   DNS message contents along with the most useful transport metadata.
   It is intended to assist with the development of DNS traffic-
   monitoring applications.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8618.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  Data Collection Use Cases . . . . . . . . . . . . . . . . . .   5
   4.  Design Considerations . . . . . . . . . . . . . . . . . . . .   8
   5.  Choice of CBOR  . . . . . . . . . . . . . . . . . . . . . . .  10
   6.  C-DNS Format Conceptual Overview  . . . . . . . . . . . . . .  10
     6.1.  Block Parameters  . . . . . . . . . . . . . . . . . . . .  14
     6.2.  Storage Parameters  . . . . . . . . . . . . . . . . . . .  14
       6.2.1.  Optional Data Items . . . . . . . . . . . . . . . . .  15
       6.2.2.  Optional RRs and OPCODEs  . . . . . . . . . . . . . .  16
       6.2.3.  Storage Flags . . . . . . . . . . . . . . . . . . . .  17
       6.2.4.  IP Address Storage  . . . . . . . . . . . . . . . . .  17
   7.  C-DNS Format Detailed Description . . . . . . . . . . . . . .  18
     7.1.  Map Quantities and Indexes  . . . . . . . . . . . . . . .  18
     7.2.  Tabular Representation  . . . . . . . . . . . . . . . . .  18
     7.3.  "File"  . . . . . . . . . . . . . . . . . . . . . . . . .  19
       7.3.1.  "FilePreamble"  . . . . . . . . . . . . . . . . . . .  20
         7.3.1.1.  "BlockParameters" . . . . . . . . . . . . . . . .  20
           7.3.1.1.1.  "StorageParameters" . . . . . . . . . . . . .  21
             7.3.1.1.1.1.  "StorageHints"  . . . . . . . . . . . . .  22
           7.3.1.1.2.  "CollectionParameters"  . . . . . . . . . . .  24
       7.3.2.  "Block" . . . . . . . . . . . . . . . . . . . . . . .  25
         7.3.2.1.  "BlockPreamble" . . . . . . . . . . . . . . . . .  26
         7.3.2.2.  "BlockStatistics" . . . . . . . . . . . . . . . .  27
         7.3.2.3.  "BlockTables" . . . . . . . . . . . . . . . . . .  28
           7.3.2.3.1.  "ClassType" . . . . . . . . . . . . . . . . .  29
           7.3.2.3.2.  "QueryResponseSignature"  . . . . . . . . . .  30
           7.3.2.3.3.  "Question"  . . . . . . . . . . . . . . . . .  33
           7.3.2.3.4.  "RR"  . . . . . . . . . . . . . . . . . . . .  34
           7.3.2.3.5.  "MalformedMessageData"  . . . . . . . . . . .  34
         7.3.2.4.  "QueryResponse" . . . . . . . . . . . . . . . . .  35
           7.3.2.4.1.  "ResponseProcessingData"  . . . . . . . . . .  36
           7.3.2.4.2.  "QueryResponseExtended" . . . . . . . . . . .  37
         7.3.2.5.  "AddressEventCount" . . . . . . . . . . . . . . .  38
         7.3.2.6.  "MalformedMessage"  . . . . . . . . . . . . . . .  39
   8.  Versioning  . . . . . . . . . . . . . . . . . . . . . . . . .  39
   9.  C-DNS to PCAP . . . . . . . . . . . . . . . . . . . . . . . .  40
     9.1.  Name Compression  . . . . . . . . . . . . . . . . . . . .  42
   10. Data Collection . . . . . . . . . . . . . . . . . . . . . . .  42
     10.1.  Matching Algorithm . . . . . . . . . . . . . . . . . . .  43
     10.2.  Message Identifiers  . . . . . . . . . . . . . . . . . .  45
       10.2.1.  Primary ID (Required)  . . . . . . . . . . . . . . .  45
       10.2.2.  Secondary ID (Optional)  . . . . . . . . . . . . . .  46
     10.3.  Algorithm Parameters . . . . . . . . . . . . . . . . . .  46
     10.4.  Algorithm Requirements . . . . . . . . . . . . . . . . .  46
     10.5.  Algorithm Limitations  . . . . . . . . . . . . . . . . .  47
     10.6.  Workspace  . . . . . . . . . . . . . . . . . . . . . . .  47
     10.7.  Output . . . . . . . . . . . . . . . . . . . . . . . . .  47
     10.8.  Post-Processing  . . . . . . . . . . . . . . . . . . . .  47
   11. Implementation Guidance . . . . . . . . . . . . . . . . . . .  47
     11.1.  Optional Data  . . . . . . . . . . . . . . . . . . . . .  48
     11.2.  Trailing Bytes . . . . . . . . . . . . . . . . . . . . .  48
     11.3.  Limiting Collection of RDATA . . . . . . . . . . . . . .  49
     11.4.  Timestamps . . . . . . . . . . . . . . . . . . . . . . .  49
   12. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  49
     12.1.  Transport Types  . . . . . . . . . . . . . . . . . . . .  49
     12.2.  Data Storage Flags . . . . . . . . . . . . . . . . . . .  50
     12.3.  Response-Processing Flags  . . . . . . . . . . . . . . .  51
     12.4.  AddressEvent Types . . . . . . . . . . . . . . . . . . .  51
   13. Security Considerations . . . . . . . . . . . . . . . . . . .  52
   14. Privacy Considerations  . . . . . . . . . . . . . . . . . . .  52
   15. References  . . . . . . . . . . . . . . . . . . . . . . . . .  53
     15.1.  Normative References . . . . . . . . . . . . . . . . . .  53
     15.2.  Informative References . . . . . . . . . . . . . . . . .  55
   Appendix A.  CDDL . . . . . . . . . . . . . . . . . . . . . . . .  58
   Appendix B.  DNS Name Compression Example . . . . . . . . . . . .  69
     B.1.  NSD Compression Algorithm . . . . . . . . . . . . . . . .  70
     B.2.  Knot Authoritative Compression Algorithm  . . . . . . . .  70
     B.3.  Observed Differences  . . . . . . . . . . . . . . . . . .  71
   Appendix C.  Comparison of Binary Formats . . . . . . . . . . . .  71
     C.1.  Comparison with Full PCAP Files . . . . . . . . . . . . .  74
     C.2.  Simple versus Block Coding  . . . . . . . . . . . . . . .  74
     C.3.  Binary versus Text Formats  . . . . . . . . . . . . . . .  75
     C.4.  Performance . . . . . . . . . . . . . . . . . . . . . . .  75
     C.5.  Conclusions . . . . . . . . . . . . . . . . . . . . . . .  75
     C.6.  Block Size Choice . . . . . . . . . . . . . . . . . . . .  76
   Appendix D.  Data Fields for Traffic Regeneration . . . . . . . .  77
     D.1.  Recommended Fields for Traffic Regeneration . . . . . . .  77
     D.2.  Issues with Small Data Captures . . . . . . . . . . . . .  77
   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  78
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  79

1.  Introduction

   There has long been a need for server operators to collect DNS
   Queries and Responses on authoritative and recursive name servers for
   monitoring and analysis.  This data is used in a number of ways,
   including traffic monitoring, analyzing network attacks, and "day in
   the life" (DITL) [ditl] analysis.

   A wide variety of tools already exist that facilitate the collection
   of DNS traffic data, such as the DNS Statistics Collector (DSC)
   [dsc], packetq [packetq], dnscap [dnscap], and dnstap [dnstap].
   However, there is no standard exchange format for large DNS packet
   captures.  The PCAP ("packet capture") [pcap] format or the PCAP Next
   Generation (PCAP-NG) [pcapng] format is typically used in practice
   for packet captures, but these file formats can contain a great deal
   of additional information that is not directly pertinent to DNS
   traffic analysis and thus unnecessarily increases the capture file
   size.  Additionally, these tools and formats typically have no filter
   mechanism to selectively record only certain fields at capture time,
   requiring post-processing for anonymization or pseudonymization of
   data to protect user privacy.

   There has also been work on using text-based formats to describe DNS
   packets (for example, see [dnsxml] and [RFC8427]), but this work is
   largely aimed at producing convenient representations of single
   messages.

   Many DNS operators may receive hundreds of thousands of Queries per
   second on a single name server instance, so a mechanism to minimize
   the storage and transmission size (and therefore upload overhead) of
   the data collected is highly desirable.

   The format described in this document, C-DNS (Compacted-DNS), focuses
   on the problem of capturing and storing large packet capture files of
   DNS traffic with the following goals in mind:

   o  Minimize the file size for storage and transmission.

   o  Minimize the overhead of producing the packet capture file and the
      cost of any further (general-purpose) compression of the file.

   This document contains:

   o  A discussion of some common use cases in which DNS data is
      collected; see Section 3.

   o  A discussion of the major design considerations in developing an
      efficient data representation for collections of DNS messages; see
      Section 4.

   o  A description of why the Concise Binary Object Representation
      (CBOR) [RFC7049] was chosen for this format; see Section 5.

   o  A conceptual overview of the C-DNS format; see Section 6.

   o  The definition of the C-DNS format for the collection of DNS
      messages; see Section 7.

   o  Notes on converting C-DNS data to PCAP format; see Section 9.

   o  Some high-level implementation considerations for applications
      designed to produce C-DNS; see Section 10.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   "Packet" refers to an individual IPv4 or IPv6 packet.  Typically,
   packets are UDP datagrams, but such packets may also be part of a TCP
   data stream.  "Message", unless otherwise qualified, refers to a DNS
   payload extracted from a UDP datagram or a TCP data stream.

   The parts of DNS messages are named as they are in [RFC1035].
   Specifically, the DNS message has five sections: Header, Question,
   Answer, Authority, and Additional.

3.  Data Collection Use Cases

   From a purely server operator perspective, collecting full packet
   captures of all packets going into or out of a name server provides
   the most comprehensive picture of network activity.  However, there
   are several design choices or other limitations that are common to
   many DNS installations and operators.

   o  DNS servers are hosted in a variety of situations:

      *  Self-hosted servers

      *  Third-party hosting (including multiple third parties)

      *  Third-party hardware (including multiple third parties)

   o  Data is collected under different conditions:

      *  On well-provisioned servers running in a steady state

      *  On heavily loaded servers

      *  On virtualized servers

      *  On servers that are under DoS attack

      *  On servers that are unwitting intermediaries in DoS attacks

   o  Traffic can be collected via a variety of mechanisms:

      *  Within the name server implementation itself

      *  On the same hardware as the name server itself

      *  Using a network tap on an adjacent host to listen to DNS
         traffic

      *  Using port mirroring to listen from another host

   o  The capabilities of data collection (and upload) networks vary:

      *  Out-of-band networks with the same capacity as the in-band
         network

      *  Out-of-band networks with less capacity than the in-band
         network

      *  Everything being on the in-band network

   Thus, there is a wide range of use cases, from very limited data
   collection environments (third-party hardware, servers that are under
   attack, packet capture on the name server itself and no out-of-band
   network) to "limitless" environments (self-hosted, well-provisioned
   servers, using a network tap or port mirroring with out-of-band
   networks with the same capacity as the in-band network).  In the
   former case, it is infeasible to reliably collect full packet
   captures, especially if the server is under attack.  In the latter
   case, collection of full packet captures may be reasonable.

   As a result of these restrictions, the C-DNS data format is designed
   with the most limited use case in mind, such that:

   o  Data collection will occur on the same hardware as the name server
      itself

   o  Collected data will be stored on the same hardware as the name
      server itself, at least temporarily

   o  Collected data being returned to some central analysis system will
      use the same network interface as the DNS Queries and Responses

   o  There can be multiple third-party servers involved

   Because of these considerations, a major factor in the design of the
   format is minimal storage size of the capture files.

   Another significant consideration for any application that records
   DNS traffic is that the running of the name server software and the
   transmission of DNS Queries and Responses are the most important jobs
   of a name server; capturing data is not.  Any data collection system
   co-located with the name server needs to be intelligent enough to
   carefully manage its CPU, disk, memory, and network utilization.
   This leads to designing a format that requires a relatively low
   overhead to produce and minimizes the requirement for further
   potentially costly compression.

   However, it is also essential that interoperability with less
   restricted infrastructure is maintained.  In particular, it is highly
   desirable that the collection format should facilitate the
   re-creation of common formats (such as PCAP) that are as close to the
   original as is realistic, given the restrictions above.

4.  Design Considerations

   This section presents some of the major design considerations used in
   the development of the C-DNS format.

   1.  The basic unit of data is a combined DNS Query and the associated
       Response (a "Query/Response (Q/R) data item").  The same
       structure will be used for unmatched Queries and Responses.
       Queries without Responses will be captured omitting the Response
       data.  Responses without Queries will be captured omitting the
       Query data (but using the Question section from the Response, if
       present, as an identifying QNAME).

       *  Rationale: A Query and the associated Response represent the
          basic level of a client's interaction with the server.  Also,
          combining the Query and Response into one item often reduces
          storage requirements due to commonality in the data of the two
          messages.

       In the context of generating a C-DNS file, it is assumed that
       only those DNS payloads that can be parsed to produce a
       well-formed DNS message are stored in the structured Query/
       Response data items of the C-DNS format and that all other
       messages will (optionally) be recorded as separate malformed
       messages.  Parsing a well-formed message means, at a minimum, the
       following:

       *  The packet has a well-formed 12-byte DNS Header with a
          recognized OPCODE.

       *  The section counts are consistent with the section contents.

       *  All of the Resource Records (RRs) can be fully parsed.

   2.  All top-level fields in each Query/Response data item will be
       optional.

       *  Rationale: Different operators will have different
          requirements for data to be available for analysis.  Operators
          with minimal requirements should not have to pay the cost of
          recording full data, though this will limit the ability to
          perform certain kinds of data analysis and also to reconstruct
          packet captures.  For example, omitting the RRs from a
          Response will reduce the C-DNS file size; in principle,
          Responses can be synthesized if there is enough context.
          Operators may have different policies for collecting user data
          and can choose to omit or anonymize certain fields at capture
          time, e.g., client address.

   3.  Multiple Query/Response data items will be collected into blocks
       in the format.  Common data in a block will be abstracted and
       referenced from individual Query/Response data items by indexing.
       The maximum number of Query/Response data items in a block will
       be configurable.

       *  Rationale: This blocking and indexing action provides a
          significant reduction in the volume of file data generated.
          Although this introduces complexity, it provides compression
          of the data that makes use of knowledge of the DNS message
          structure.

       *  It is anticipated that the files produced can be subject to
          further compression using general-purpose compression tools.
          Measurements show that blocking significantly reduces the CPU
          required to perform such strong compression.  See
          Appendix C.2.

       *  Examples of commonality between DNS messages are that in most
          cases the QUESTION RR is the same in the Query and Response
          and that there is a finite set of Query "signatures" (based on
          a subset of attributes).  For many authoritative servers,
          there is very likely to be a finite set of Responses that are
          generated, of which a large number are NXDOMAIN.

   4.  Traffic metadata can optionally be included in each block.
       Specifically, counts of some types of non-DNS packets (e.g.,
       ICMP, TCP resets) sent to the server may be of interest.

   5.  The wire-format content of malformed DNS messages may optionally
       be recorded.

       *  Rationale: Any structured capture format that does not capture
          the DNS payload byte for byte will be limited to some extent
          in that it cannot represent malformed DNS messages.  Only
          those messages that can be fully parsed and transformed into
          the structured format can be fully represented.  Note,
          however, that this can result in rather misleading statistics.
          For example, a malformed Query that cannot be represented in
          the C-DNS format will lead to the (well-formed) DNS Response
          with error code FORMERR appearing as "unmatched".  Therefore,
          it can greatly aid downstream analysis to have the wire format
          of the malformed DNS messages available directly in the
          C-DNS file.
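
   As a rough illustration of the first well-formedness check listed
   under item 1 above, the following Python sketch unpacks the 12-byte
   DNS Header and tests the OPCODE.  The function name and the set of
   recognized OPCODEs are assumptions made for this example and are not
   defined by this document.  A real implementation would also need to
   check the section counts against the section contents and parse
   every RR.

```python
import struct

# Assumed set of recognized OPCODEs for this sketch (Query, IQuery,
# Status, Notify, Update, DSO); an implementation chooses its own set.
RECOGNIZED_OPCODES = {0, 1, 2, 4, 5, 6}

DNS_HEADER_LEN = 12


def parse_dns_header(payload: bytes):
    """Return the header fields as a dict, or None if the header fails
    the length or OPCODE check."""
    if len(payload) < DNS_HEADER_LEN:
        return None
    # Six 16-bit big-endian fields: ID, flags, and the four counts.
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!HHHHHH", payload[:DNS_HEADER_LEN]
    )
    opcode = (flags >> 11) & 0xF
    if opcode not in RECOGNIZED_OPCODES:
        return None
    return {
        "id": ident,
        "qr": flags >> 15,
        "opcode": opcode,
        "rcode": flags & 0xF,
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }
```

   A payload for which this function returns None would be a candidate
   for recording as a malformed message rather than as a Query/Response
   data item.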

5.  Choice of CBOR

   This document presents a detailed format description for C-DNS.  The
   format uses CBOR [RFC7049].

   The choice of CBOR was made taking a number of factors into account.

   o  CBOR is a binary representation and thus is economical in storage
      space.

   o  Other binary representations were investigated, and whilst all had
      attractive features, none had a significant advantage over CBOR.
      See Appendix C for some discussion of this.

   o  CBOR is an IETF specification and is familiar to IETF
      participants.  It is based on the now-common ideas of lists and
      objects and thus requires very little familiarization for those in
      the wider industry.

   o  CBOR is a simple format and can easily be implemented from scratch
      if necessary.  Formats that are more complex require library
      support, which may present problems on unusual platforms.

   o  CBOR can also be easily converted to text formats such as JSON
      [RFC8259] for debugging and other human inspection requirements.

   o  CBOR data schemas can be described using the Concise Data
      Definition Language (CDDL) [RFC8610].
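
   To illustrate the point that CBOR can be implemented from scratch,
   here is a minimal Python sketch of an encoder.  It covers only
   unsigned integers, byte strings, text strings, and definite-length
   arrays and maps, following the major-type layout of [RFC7049].  The
   function names are illustrative; negative integers, tags, floats,
   and indefinite-length items are deliberately left out.

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus any following length/value bytes."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack("!H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack("!I", value)
    return bytes([(major << 5) | 27]) + struct.pack("!Q", value)


def encode(item) -> bytes:
    """Encode a small subset of Python values as CBOR."""
    if isinstance(item, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(item, int):
        if item < 0:
            raise ValueError("negative integers are not handled here")
        return _head(0, item)                      # major type 0
    if isinstance(item, bytes):
        return _head(2, len(item)) + item          # major type 2
    if isinstance(item, str):
        data = item.encode("utf-8")
        return _head(3, len(data)) + data          # major type 3
    if isinstance(item, list):
        return _head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return _head(5, len(item)) + body          # major type 5
    raise TypeError(f"unsupported type: {type(item).__name__}")
```

   For example, encode([1, 2, 3]).hex() yields "83010203", which agrees
   with the corresponding example encoding in Appendix A of [RFC7049].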

6.  C-DNS Format Conceptual Overview

   The following figures show purely schematic representations of the
   C-DNS format to convey its high-level structure.  Section 7 provides
   a detailed discussion of the CBOR representation and individual
   elements.

   Figure 1 shows the C-DNS format at the top level, including the file
   header and data blocks.  The Query/Response data items, Address/Event
   Count data items, and Malformed Message data items link to various
   Block Tables.

                   +-------+
                   + C-DNS |
                   +-------+--------------------------+
                   | File Type Identifier             |
                   +----------------------------------+
                   | File Preamble                    |
                   | +--------------------------------+
                   | | Format Version                 |
                   | +--------------------------------+
                   | | Block Parameters               |
                   +-+--------------------------------+
                   | Block                            |
                   | +--------------------------------+
                   | | Block Preamble                 |
                   | +--------------------------------+
                   | | Block Statistics               |
                   | +--------------------------------+
                   | | Block Tables                   |
                   | +--------------------------------+
                   | | Query/Response data items      |
                   | +--------------------------------+
                   | | Address/Event Count data items |
                   | +--------------------------------+
                   | | Malformed Message data items   |
                   +-+--------------------------------+
                   | Block                            |
                   | +--------------------------------+
                   | | Block Preamble                 |
                   | +--------------------------------+
                   | | Block Statistics               |
                   | +--------------------------------+
                   | | Block Tables                   |
                   | +--------------------------------+
                   | | Query/Response data items      |
                   | +--------------------------------+
                   | | Address/Event Count data items |
                   | +--------------------------------+
                   | | Malformed Message data items   |
                   +-+--------------------------------+
                   | Further Blocks...                |
                   +----------------------------------+

                        Figure 1: The C-DNS Format

   Figure 2 shows some more-detailed relationships within each Block,
   specifically those between the Query/Response data item and the
   relevant Block Tables.  Some fields have been omitted for clarity.

   +----------------+
   | Query/Response |
   +-------------------------+
   | Time Offset             |
   +-------------------------+            +------------------+
   | Client Address          |---------+->| IP Address array |
   +-------------------------+         |  +------------------+
   | Client Port             |         |
   +-------------------------+         |  +------------------+
   | Transaction ID          |     +---)->| Name/RDATA array |<--------+
   +-------------------------+     |   |  +------------------+         |
   | Query Signature         |--+  |   |                               |
   +-------------------------+  |  |   |  +-----------------+          |
   | Client Hoplimit (q)     |  +--)---)->| Query Signature |          |
   +-------------------------+     |   |  +-----------------+-------+  |
   | Response Delay (r)      |     |   +--| Server Address          |  |
   +-------------------------+     |      +-------------------------+  |
   | Query Name              |--+--+      | Server Port             |  |
   +-------------------------+  |         +-------------------------+  |
   | Query Size (q)          |  |         | Transport Flags         |  |
   +-------------------------+  |         +-------------------------+  |
   | Response Size (r)       |  |         | QR Type                 |  |
   +-------------------------+  |         +-------------------------+  |
   | Response Processing (r) |  |         | QR Signature Flags      |  |
   | +-----------------------+  |         +-------------------------+  |
   | | Bailiwick             |--+         | Query OPCODE (q)        |  |
   | +-----------------------+            +-------------------------+  |
   | | Flags                 |            | QR DNS Flags            |  |
   +-+-----------------------+            +-------------------------+  |
   | Extra Query Info (q)    |            | Query RCODE (q)         |  |
   | +-----------------------+            +-------------------------+  |
   | | Question              |--+---+  +--+-Query Class/Type (q)    |  |
   | +-----------------------+      |  |  +-------------------------+  |
   | | Answer                |--+   |  |  | Query QDCOUNT (q)       |  |
   | +-----------------------+  |   |  |  +-------------------------+  |
   | | Authority             |--+   |  |  | Query ANCOUNT (q)       |  |
   | +-----------------------+  |   |  |  +-------------------------+  |
   | | Additional            |--+   |  |  | Query NSCOUNT (q)       |  |
   +-+-----------------------+  |   |  |  +-------------------------+  |
   | Extra Response Info (r) |  |-+ |  |  | Query ARCOUNT (q)       |  |
   | +-----------------------+  | | |  |  +-------------------------+  |
   | | Answer                |--+ | |  |  | Query EDNS version (q)  |  |
   | +-----------------------+  | | |  |  +-------------------------+  |
   | | Authority             |--+ | |  |  | Query EDNS UDP Size (q) |  |
   | +-----------------------+  | | |  |  +-------------------------+  |
   | | Additional            |--+ | |  |  | Query OPT RDATA (q)     |--+
   +-+-----------------------+    | |  |  +-------------------------+  |
                                  | |  |  | Response RCODE (r)      |  |
                                  | |  |  +-------------------------+  |
   + -----------------------------+ |  +----------+                    |
   |                                |             |                    |
   | + -----------------------------+             |                    |
   | |  +---------------+  +----------+           |                    |
   | +->| Question List |->| Question |           |                    |
   |    | array         |  | array    |           |                    |
   |    +---------------+  +----------+--+        |                    |
   |                       | Name        |--+-----)--------------------+
   |                       +-------------+  |     |  +------------+
   |                       | Class/Type  |--)---+-+->| Class/Type |
   |                       +-------------+  |   |    | array      |
   |                                        |   |    +------------+--+
   |                                        |   |    | CLASS         |
   |    +---------------+  +----------+     |   |    +---------------+
   +--->| RR List array |->| RR array |     |   |    | TYPE          |
        +---------+-----+  +----------+--+  |   |    +---------------+
                           | Name        |--+   |
                           +-------------+      |
                           | Class/Type  |------+
                           +-------------+

       Figure 2: The Query/Response Data Item and Subsidiary Tables

   In Figure 2, data items annotated (q) are only present when a
   Query/Response has a Query, and those annotated (r) are only present
   when a Query/Response has a Response.

   A C-DNS file begins with a file header containing a File Type
   Identifier and a File Preamble.  The File Preamble contains
   information on the file Format Version and an array of Block
   Parameters items (the contents of which include Collection and
   Storage Parameters used for one or more Blocks).

   The file header is followed by a series of Blocks.

   A Block consists of a Block Preamble item, some Block Statistics for
   the traffic stored within the Block, and then various arrays of
   common data collectively called the Block Tables.  This is then
   followed by an array of the Query/Response data items detailing the
   Queries and Responses stored within the Block.  The array of
   Query/Response data items is in turn followed by the Address/Event
   Count data items (an array of per-client counts of particular IP
   events) and then Malformed Message data items (an array of malformed
   messages that are stored in the Block).

   The exact nature of the DNS data will affect what Block size is the
   best fit; however, sample data for a root server indicated that Block
   sizes up to 10,000 Query/Response data items give good results.  See
   Appendix C.6 for more details.

   This design exploits data commonality and block-based storage to
   minimize the C-DNS file size.  As a result, C-DNS cannot be streamed
   below the level of a Block.
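
   The Block Table idea described above can be sketched in a few lines
   of Python.  This is an illustration only: the class names and
   dictionary keys are hypothetical, and the data model is far simpler
   than the one defined in Section 7.  Each distinct value (here, a
   client address or a query name) is stored once per Block, and the
   Query/Response items refer to it by array index.

```python
class BlockTable:
    """Stores each distinct value once and hands out its array index."""

    def __init__(self):
        self.items = []    # values in first-seen order
        self._index = {}   # value -> position in self.items

    def add(self, value):
        idx = self._index.get(value)
        if idx is None:
            idx = len(self.items)
            self.items.append(value)
            self._index[value] = idx
        return idx


class Block:
    """A toy Block: two tables plus Query/Response items indexing them."""

    def __init__(self, max_items=10000):
        self.max_items = max_items  # configurable Block size
        self.addresses = BlockTable()
        self.names = BlockTable()
        self.query_responses = []

    def is_full(self):
        return len(self.query_responses) >= self.max_items

    def add_query_response(self, client_address, query_name):
        self.query_responses.append(
            {
                "client-address-index": self.addresses.add(client_address),
                "query-name-index": self.names.add(query_name),
            }
        )


block = Block(max_items=3)
for addr, name in [
    ("192.0.2.1", "example.com."),
    ("192.0.2.2", "example.com."),
    ("192.0.2.1", "example.org."),
]:
    block.add_query_response(addr, name)
```

   In this sketch the three Query/Response items share two addresses
   and two names.  Because the tables keep growing until the Block is
   complete, a writer built this way can only emit whole Blocks.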

6.1.  Block Parameters

   The details of the Block Parameters items are not shown in the
   diagrams but are discussed here for context.

   An array of Block Parameters items is stored in the File Preamble
   (with a minimum of one item at index 0); a Block Parameters item
   consists of a collection of Storage and Collection Parameters that
   applies to any given Block.  An array is used in order to support use
   cases such as wanting to merge C-DNS files from different sources.
   The Block Preamble item then contains an optional index for the Block
   Parameters item that applies for that Block; if not present, the
   index defaults to 0.  Hence, in effect, a global Block Parameters
   item is defined that can then be overridden per Block.
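   The defaulting rule above can be sketched as follows.  This is an
   illustrative helper only: the name "resolve_block_parameters" and
   the representation of the File Preamble array as a Python sequence
   are assumptions, not part of the format.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def resolve_block_parameters(file_params: Sequence[T],
                             block_index: Optional[int]) -> T:
    """Return the Block Parameters item that applies to one Block.

    file_params is the array from the File Preamble; block_index is the
    optional index from the Block Preamble (None when it is absent).
    """
    if not file_params:
        raise ValueError("the array must hold at least one item (index 0)")
    # An absent index defaults to 0, the "global" Block Parameters item.
    index = 0 if block_index is None else block_index
    if not 0 <= index < len(file_params):
        raise IndexError(f"Block Parameters index {index} is out of range")
    return file_params[index]
```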

6.2.  Storage Parameters

   The Block Parameters item includes a Storage Parameters item -- this
   contains information about the specific data fields stored in the
   C-DNS file.

   These parameters include:

   o  The sub-second timing resolution used by the data.

   o  Information (hints) on which optional data are omitted.  See
      Section 6.2.1.

   o  Recorded OPCODEs [opcodes] and RR TYPEs [rrtypes].  See
      Section 6.2.2.

   o  Flags indicating, for example, whether the data is sampled or
      anonymized.  See Sections 6.2.3 and 14.

   o  Client and server IPv4 and IPv6 address prefixes.  See
      Section 6.2.4.

6.2.1.  Optional Data Items

   To enable implementations to store data to their precise requirements
   in as space-efficient a manner as possible, all fields in the
   following arrays are optional:

   o  Query/Response

   o  Query Signature

   o  Malformed Messages

   In other words, an implementation can choose to omit any data item
   that is not required for its use case (whilst observing the
   restrictions relating to IP address storage described in
   Section 6.2.4).  In addition, implementations may be configured to
   not record all RRs or to only record messages with certain OPCODEs.

   This does, however, mean that a consumer of a C-DNS file faces two
   problems:

   1.  How can it quickly determine if a file definitely does not
       contain the data items it requires to complete a particular task
       (e.g., reconstructing DNS traffic or performing a specific piece
       of data analysis)?

   2.  How can it determine whether a data item is not present because
       it was (1) explicitly not recorded or (2) not available/present?

   For example, capturing C-DNS data from within a name server
   implementation makes it unlikely that the Client Hoplimit can be
   recorded.  Or, if there is no Query ARCOUNT recorded and no Query OPT
   RDATA [RFC6891] recorded, is that because no Query contained an OPT
   RR, or because that data was not stored?

   The Storage Parameters item therefore also contains a Storage Hints
   item, which specifies which items the encoder of the file omits from
   the stored data and will therefore never be present.  (This approach
   is taken because a flag that indicated which items were included for
   collection would not guarantee that the item was present -- only that
   it might be.)  An implementation decoding that file can then use
   these flags to quickly determine whether the input data is not rich
   enough for its needs.
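   A decoder-side check along these lines might look like the sketch
   below.  It assumes the Storage Hints have already been decoded into
   a set of names of omitted data items; the function name and the
   item labels used in the usage note are hypothetical and do not
   reflect the encoding of the hints in the file.

```python
from typing import AbstractSet, FrozenSet


def items_ruled_out(omitted_items: AbstractSet[str],
                    required_items: AbstractSet[str]) -> FrozenSet[str]:
    """Return the required data items that the file can never contain.

    An empty result means only that the file is not ruled out: an item
    that was not omitted might still be absent from any given record.
    """
    return frozenset(required_items) & frozenset(omitted_items)
```

   For example, a traffic regeneration tool that needs the (made-up)
   items {"query-opt-rdata", "client-hoplimit"} can reject a file
   whose hints list "client-hoplimit" as omitted without reading any
   Block.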

   One scenario where this may be particularly important is the case of
   regenerating traffic.  It is possible to collect such a small set of
   data items that an implementation decoding the file cannot determine
   if a given Query/Response data item was generated from just a Query,
   just a Response, or a Query/Response pair.  This makes it impossible
   to reconstruct DNS traffic even if sensible defaults are provided for
   the missing data items.  This is discussed in more detail in
   Section 9.

6.2.2.  Optional RRs and OPCODEs

   Also included in the Storage Parameters item are explicit arrays
   listing the RR TYPEs and the OPCODEs to be recorded.  These arrays
   remove any ambiguity over whether, for example, messages containing
   particular OPCODEs are not present because (1) certain OPCODEs did
   not occur or (2) the implementation is not configured to record them.

   In the case of OPCODEs, for a message to be fully parsable, the
   OPCODE must be known to the collecting implementation.  Any message
   with an OPCODE unknown to the collecting implementation cannot be
   validated as correctly formed and so must be treated as malformed.
   Messages with OPCODEs known to the recording application but not
   listed in the Storage Parameters item are discarded by the recording
   application during C-DNS capture (regardless of whether they are
   malformed or not).
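   The OPCODE rules in the preceding paragraph amount to a two-step
   decision, sketched below.  The names "Disposition" and
   "classify_by_opcode" are illustrative assumptions; "known_opcodes"
   stands for the OPCODEs the collecting implementation can parse and
   "recorded_opcodes" for the array in the Storage Parameters item.

```python
from enum import Enum
from typing import AbstractSet


class Disposition(Enum):
    RECORD = "record"        # passes the OPCODE filter
    DISCARD = "discard"      # known OPCODE, but not configured for recording
    MALFORMED = "malformed"  # OPCODE unknown, so the message cannot be validated


def classify_by_opcode(opcode: int,
                       known_opcodes: AbstractSet[int],
                       recorded_opcodes: AbstractSet[int]) -> Disposition:
    """Decide how a collector treats a message, based on its OPCODE alone."""
    if opcode not in known_opcodes:
        return Disposition.MALFORMED
    if opcode not in recorded_opcodes:
        # Discarded whether or not the message is otherwise malformed.
        return Disposition.DISCARD
    # Further parsing may still find the message malformed on other grounds.
    return Disposition.RECORD
```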

   In the case of RRs, each record in a message must be fully parsable,
   including parsing the record RDATA, as otherwise the message cannot
   be validated as correctly formed.  Any RR with an RR TYPE not known
   to the collecting implementation cannot be validated as correctly
   formed and so must be treated as malformed.

   Once a message is correctly parsed, an implementation is free to
   record only a subset of the RRs present.

6.2.3.  Storage Flags

   The Storage Parameters item contains flags that can be used to
   indicate if:

   o  the data is anonymized,

   o  the data is produced from sampled data, or

   o  names in the data have been normalized (converted to uniform
      case).

   The Storage Parameters item also contains optional fields holding
   details of the sampling method used and the anonymization method
   used.  It is RECOMMENDED that these fields contain URIs [RFC3986]
   pointing to resources describing the methods used.  See Section 14
   for further discussion of anonymization and normalization.

6.2.4.  IP Address Storage

   The format can store either full IP addresses or just IP prefixes;
   the Storage Parameters item contains fields to indicate if only IP
   prefixes were stored.

   If the IP address prefixes are absent, then full addresses are
   stored.  In this case, the IP version can be directly inferred from
   the stored address length, so the fields "qr-transport-flags" in
   QueryResponseSignature, "ae-transport-flags" in AddressEventCount,
   and "mm-transport-flags" in MalformedMessageData (which contain the
   IP version bit) are optional.

   If IP address prefixes are given, only the prefix bits of addresses
   are stored.  In this case, in order to determine the IP version, the
   fields "qr-transport-flags" in QueryResponseSignature, "ae-transport-
   flags" in AddressEventCount, and "mm-transport-flags" in
   MalformedMessageData MUST be present.  See Sections 7.3.2.3.2 and
   7.3.2.3.5.
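   The two cases can be sketched from a decoder's point of view as
   follows.  This is a simplified illustration: the function name, the
   single "prefixes_in_use" switch, and the position of the IP version
   bit (bit 0, set for IPv6) are assumptions made for this example and
   should be checked against the field definitions in the sections
   referenced above.

```python
from typing import Optional

IP_VERSION_BIT = 0x01  # assumed position of the IP version bit; set means IPv6


def ip_version(stored: bytes,
               prefixes_in_use: bool,
               transport_flags: Optional[int]) -> int:
    """Return 4 or 6 for a stored address or address prefix."""
    if not prefixes_in_use:
        # Full addresses are stored, so the length identifies the version.
        if len(stored) == 4:
            return 4
        if len(stored) == 16:
            return 6
        raise ValueError("a full address must be 4 or 16 bytes long")
    # Only prefix bits are stored, so the length is ambiguous.
    if transport_flags is None:
        raise ValueError("transport flags are needed when prefixes are stored")
    return 6 if transport_flags & IP_VERSION_BIT else 4
```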

   As an example of storing only IP prefixes, if a client IPv6 prefix of
   48 is specified, a client address of 2001:db8:85a3::8a2e:370:7334
   will be stored as 0x20010db885a3, reducing address storage space
   requirements.  Similarly, if a client IPv4 prefix of 16 is specified,
   a client address of 192.0.2.1 will be stored as 0xc000 (192.0).
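   The truncation in these examples can be reproduced with the short
   sketch below.  The function name "stored_prefix" is hypothetical,
   and the handling of prefix lengths that are not a multiple of 8
   (keeping the partial octet and zeroing its unused low bits) is an
   assumption of this example rather than something stated above.

```python
import ipaddress


def stored_prefix(address: str, prefix_len: int) -> bytes:
    """Return only the leading prefix_len bits of an IPv4 or IPv6 address."""
    packed = ipaddress.ip_address(address).packed
    if not 0 <= prefix_len <= len(packed) * 8:
        raise ValueError("prefix length is out of range for this address")
    nbytes = (prefix_len + 7) // 8           # octets needed for the prefix
    out = bytearray(packed[:nbytes])
    spare_bits = nbytes * 8 - prefix_len     # unused low bits in the last octet
    if spare_bits:
        out[-1] &= (0xFF << spare_bits) & 0xFF
    return bytes(out)
```

   With this sketch, stored_prefix("2001:db8:85a3::8a2e:370:7334", 48)
   yields the six octets 20 01 0d b8 85 a3, and
   stored_prefix("192.0.2.1", 16) yields the two octets c0 00, matching
   the values given above.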
