Internet Engineering Task Force (IETF)                      J. Dickinson
Request for Comments: 8618                                      J. Hague
Category: Standards Track                                   S. Dickinson
ISSN: 2070-1721                                               Sinodun IT
                                                            T. Manderson
                                                                   ICANN
                                                                 J. Bond
                                              Wikimedia Foundation, Inc.
                                                          September 2019

Compacted-DNS (C-DNS): A Format for DNS Packet Capture

Abstract
This document describes a data representation for collections of DNS messages. The format is designed for efficient storage and transmission of large packet captures of DNS traffic; it attempts to minimize the size of such packet capture files but retain the full DNS message contents along with the most useful transport metadata. It is intended to assist with the development of DNS traffic-monitoring applications.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8618.
Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents
1. Introduction
2. Terminology
3. Data Collection Use Cases
4. Design Considerations
5. Choice of CBOR
6. C-DNS Format Conceptual Overview
   6.1. Block Parameters
   6.2. Storage Parameters
      6.2.1. Optional Data Items
      6.2.2. Optional RRs and OPCODEs
      6.2.3. Storage Flags
      6.2.4. IP Address Storage
7. C-DNS Format Detailed Description
   7.1. Map Quantities and Indexes
   7.2. Tabular Representation
   7.3. "File"
      7.3.1. "FilePreamble"
         7.3.1.1. "BlockParameters"
            7.3.1.1.1. "StorageParameters"
               7.3.1.1.1.1. "StorageHints"
            7.3.1.1.2. "CollectionParameters"
      7.3.2. "Block"
         7.3.2.1. "BlockPreamble"
         7.3.2.2. "BlockStatistics"
         7.3.2.3. "BlockTables"
            7.3.2.3.1. "ClassType"
            7.3.2.3.2. "QueryResponseSignature"
            7.3.2.3.3. "Question"
            7.3.2.3.4. "RR"
            7.3.2.3.5. "MalformedMessageData"
         7.3.2.4. "QueryResponse"
            7.3.2.4.1. "ResponseProcessingData"
            7.3.2.4.2. "QueryResponseExtended"
         7.3.2.5. "AddressEventCount"
         7.3.2.6. "MalformedMessage"
8. Versioning
9. C-DNS to PCAP
   9.1. Name Compression
10. Data Collection
   10.1. Matching Algorithm
   10.2. Message Identifiers
      10.2.1. Primary ID (Required)
      10.2.2. Secondary ID (Optional)
   10.3. Algorithm Parameters
   10.4. Algorithm Requirements
   10.5. Algorithm Limitations
   10.6. Workspace
   10.7. Output
   10.8. Post-Processing
11. Implementation Guidance
   11.1. Optional Data
   11.2. Trailing Bytes
   11.3. Limiting Collection of RDATA
   11.4. Timestamps
12. IANA Considerations
   12.1. Transport Types
   12.2. Data Storage Flags
   12.3. Response-Processing Flags
   12.4. AddressEvent Types
13. Security Considerations
14. Privacy Considerations
15. References
   15.1. Normative References
   15.2. Informative References
Appendix A. CDDL
Appendix B. DNS Name Compression Example
   B.1. NSD Compression Algorithm
   B.2. Knot Authoritative Compression Algorithm
   B.3. Observed Differences
Appendix C. Comparison of Binary Formats
   C.1. Comparison with Full PCAP Files
   C.2. Simple versus Block Coding
   C.3. Binary versus Text Formats
   C.4. Performance
   C.5. Conclusions
   C.6. Block Size Choice
Appendix D. Data Fields for Traffic Regeneration
   D.1. Recommended Fields for Traffic Regeneration
   D.2. Issues with Small Data Captures
Acknowledgements
Authors' Addresses

1. Introduction
There has long been a need for server operators to collect DNS Queries and Responses on authoritative and recursive name servers for monitoring and analysis. This data is used in a number of ways, including traffic monitoring, analyzing network attacks, and "day in the life" (DITL) [ditl] analysis.

A wide variety of tools already exist that facilitate the collection of DNS traffic data, such as the DNS Statistics Collector (DSC) [dsc], packetq [packetq], dnscap [dnscap], and dnstap [dnstap]. However, there is no standard exchange format for large DNS packet captures. The PCAP ("packet capture") [pcap] format or the PCAP Next Generation (PCAP-NG) [pcapng] format is typically used in practice for packet captures, but these file formats can contain a great deal of additional information that is not directly pertinent to DNS traffic analysis and thus unnecessarily increases the capture file size. Additionally, these tools and formats typically have no filter mechanism to selectively record only certain fields at capture time, requiring post-processing for anonymization or pseudonymization of data to protect user privacy.

There has also been work on using text-based formats to describe DNS packets (for example, see [dnsxml] and [RFC8427]), but this work is largely aimed at producing convenient representations of single messages.

Many DNS operators may receive hundreds of thousands of Queries per second on a single name server instance, so a mechanism to minimize the storage and transmission size (and therefore upload overhead) of the data collected is highly desirable.

The format described in this document, C-DNS (Compacted-DNS), focuses on the problem of capturing and storing large packet capture files of DNS traffic with the following goals in mind:

o Minimize the file size for storage and transmission.

o Minimize the overhead of producing the packet capture file and the cost of any further (general-purpose) compression of the file.
This document contains:

o A discussion of some common use cases in which DNS data is collected; see Section 3.

o A discussion of the major design considerations in developing an efficient data representation for collections of DNS messages; see Section 4.

o A description of why the Concise Binary Object Representation (CBOR) [RFC7049] was chosen for this format; see Section 5.

o A conceptual overview of the C-DNS format; see Section 6.

o The definition of the C-DNS format for the collection of DNS messages; see Section 7.

o Notes on converting C-DNS data to PCAP format; see Section 9.

o Some high-level implementation considerations for applications designed to produce C-DNS; see Section 10.

2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

"Packet" refers to an individual IPv4 or IPv6 packet. Typically, packets are UDP datagrams, but such packets may also be part of a TCP data stream. "Message", unless otherwise qualified, refers to a DNS payload extracted from a UDP datagram or a TCP data stream.

The parts of DNS messages are named as they are in [RFC1035]. Specifically, the DNS message has five sections: Header, Question, Answer, Authority, and Additional.

3. Data Collection Use Cases
From a purely server operator perspective, collecting full packet captures of all packets going into or out of a name server provides the most comprehensive picture of network activity. However, there are several design choices or other limitations that are common to many DNS installations and operators.
o DNS servers are hosted in a variety of situations:

   * Self-hosted servers

   * Third-party hosting (including multiple third parties)

   * Third-party hardware (including multiple third parties)

o Data is collected under different conditions:

   * On well-provisioned servers running in a steady state

   * On heavily loaded servers

   * On virtualized servers

   * On servers that are under DoS attack

   * On servers that are unwitting intermediaries in DoS attacks

o Traffic can be collected via a variety of mechanisms:

   * Within the name server implementation itself

   * On the same hardware as the name server itself

   * Using a network tap on an adjacent host to listen to DNS traffic

   * Using port mirroring to listen from another host

o The capabilities of data collection (and upload) networks vary:

   * Out-of-band networks with the same capacity as the in-band network

   * Out-of-band networks with less capacity than the in-band network

   * Everything being on the in-band network

Thus, there is a wide range of use cases, from very limited data collection environments (third-party hardware, servers that are under attack, packet capture on the name server itself and no out-of-band network) to "limitless" environments (self-hosted, well-provisioned servers, using a network tap or port mirroring with out-of-band networks with the same capacity as the in-band network). In the former case, it is infeasible to reliably collect full packet captures, especially if the server is under attack. In the latter case, collection of full packet captures may be reasonable.

As a result of these restrictions, the C-DNS data format is designed with the most limited use case in mind, such that:

o Data collection will occur on the same hardware as the name server itself

o Collected data will be stored on the same hardware as the name server itself, at least temporarily

o Collected data being returned to some central analysis system will use the same network interface as the DNS Queries and Responses

o There can be multiple third-party servers involved

Because of these considerations, a major factor in the design of the format is minimal storage size of the capture files.

Another significant consideration for any application that records DNS traffic is that the running of the name server software and the transmission of DNS Queries and Responses are the most important jobs of a name server; capturing data is not. Any data collection system co-located with the name server needs to be intelligent enough to carefully manage its CPU, disk, memory, and network utilization. This leads to designing a format that requires a relatively low overhead to produce and minimizes the requirement for further potentially costly compression.

However, it is also essential that interoperability with less restricted infrastructure is maintained. In particular, it is highly desirable that the collection format should facilitate the re-creation of common formats (such as PCAP) that are as close to the original as is realistic, given the restrictions above.
4. Design Considerations
This section presents some of the major design considerations used in the development of the C-DNS format.

1. The basic unit of data is a combined DNS Query and the associated Response (a "Query/Response (Q/R) data item"). The same structure will be used for unmatched Queries and Responses. Queries without Responses will be captured omitting the Response data. Responses without Queries will be captured omitting the Query data (but using the Question section from the Response, if present, as an identifying QNAME).

   * Rationale: A Query and the associated Response represent the basic level of a client's interaction with the server. Also, combining the Query and Response into one item often reduces storage requirements due to commonality in the data of the two messages.

   In the context of generating a C-DNS file, it is assumed that only those DNS payloads that can be parsed to produce a well-formed DNS message are stored in the structured Query/Response data items of the C-DNS format and that all other messages will (optionally) be recorded as separate malformed messages. Parsing a well-formed message means, at a minimum, the following:

   * The packet has a well-formed 12-byte DNS Header with a recognized OPCODE.

   * The section counts are consistent with the section contents.

   * All of the Resource Records (RRs) can be fully parsed.

2. All top-level fields in each Query/Response data item will be optional.

   * Rationale: Different operators will have different requirements for data to be available for analysis. Operators with minimal requirements should not have to pay the cost of recording full data, though this will limit the ability to perform certain kinds of data analysis and also to reconstruct packet captures. For example, omitting the RRs from a Response will reduce the C-DNS file size; in principle, Responses can be synthesized if there is enough context. Operators may have different policies for collecting user data and can choose to omit or anonymize certain fields at capture time, e.g., client address.
3. Multiple Query/Response data items will be collected into blocks in the format. Common data in a block will be abstracted and referenced from individual Query/Response data items by indexing. The maximum number of Query/Response data items in a block will be configurable.

   * Rationale: This blocking and indexing action provides a significant reduction in the volume of file data generated. Although this introduces complexity, it provides compression of the data that makes use of knowledge of the DNS message structure.

   * It is anticipated that the files produced can be subject to further compression using general-purpose compression tools. Measurements show that blocking significantly reduces the CPU required to perform such strong compression. See Appendix C.2.

   * Examples of commonality between DNS messages are that in most cases the QUESTION RR is the same in the Query and Response and that there is a finite set of Query "signatures" (based on a subset of attributes). For many authoritative servers, there is very likely to be a finite set of Responses that are generated, of which a large number are NXDOMAIN.

4. Traffic metadata can optionally be included in each block. Specifically, counts of some types of non-DNS packets (e.g., ICMP, TCP resets) sent to the server may be of interest.

5. The wire-format content of malformed DNS messages may optionally be recorded.

   * Rationale: Any structured capture format that does not capture the DNS payload byte for byte will be limited to some extent in that it cannot represent malformed DNS messages. Only those messages that can be fully parsed and transformed into the structured format can be fully represented. Note, however, that this can result in rather misleading statistics. For example, a malformed Query that cannot be represented in the C-DNS format will lead to the (well-formed) DNS Response with error code FORMERR appearing as "unmatched". Therefore, it can greatly aid downstream analysis to have the wire format of the malformed DNS messages available directly in the C-DNS file.
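The blocking and indexing idea in design consideration 3 can be illustrated with a minimal Python sketch. It is not the C-DNS encoding itself: the `BlockTable` class, the `build_block` function, and the dictionary keys used here are hypothetical names chosen for illustration, and the sample traffic is invented. The sketch only shows the principle that each distinct value is stored once per block and that Query/Response data items refer to it by array index.

```python
class BlockTable:
    """Hypothetical helper: stores each distinct value once per block."""

    def __init__(self):
        self.items = []    # values in first-seen order (the table array)
        self._index = {}   # value -> position in self.items

    def add(self, value):
        """Return the array index of value, appending it if it is new."""
        if value not in self._index:
            self._index[value] = len(self.items)
            self.items.append(value)
        return self._index[value]


def build_block(messages):
    """Turn (client address, query name) pairs into tables plus indexed items.

    The key names below are illustrative labels, not a normative layout.
    """
    addresses = BlockTable()
    names = BlockTable()
    qr_items = []
    for client, qname in messages:
        qr_items.append({
            "client-address-index": addresses.add(client),
            "query-name-index": names.add(qname),
        })
    return {
        "ip-address": addresses.items,
        "name-rdata": names.items,
        "query-responses": qr_items,
    }


# Invented sample traffic: three Queries sharing addresses and names.
sample = [
    ("192.0.2.1", "example.com."),
    ("192.0.2.2", "example.com."),
    ("192.0.2.1", "example.org."),
]
block = build_block(sample)
```

In this sketch, the three items share two addresses and two names, so each table holds two entries and every item carries only small integer indexes. Real traffic with many repeated names and signatures benefits correspondingly more.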
5. Choice of CBOR
This document presents a detailed format description for C-DNS. The format uses CBOR [RFC7049].

The choice of CBOR was made taking a number of factors into account.

o CBOR is a binary representation and thus is economical in storage space.

o Other binary representations were investigated, and whilst all had attractive features, none had a significant advantage over CBOR. See Appendix C for some discussion of this.

o CBOR is an IETF specification and is familiar to IETF participants. It is based on the now-common ideas of lists and objects and thus requires very little familiarization for those in the wider industry.

o CBOR is a simple format and can easily be implemented from scratch if necessary. Formats that are more complex require library support, which may present problems on unusual platforms.

o CBOR can also be easily converted to text formats such as JSON [RFC8259] for debugging and other human inspection requirements.

o CBOR data schemas can be described using the Concise Data Definition Language (CDDL) [RFC8610].

6. C-DNS Format Conceptual Overview
The following figures show purely schematic representations of the C-DNS format to convey its high-level structure. Section 7 provides a detailed discussion of the CBOR representation and individual elements.

Figure 1 shows the C-DNS format at the top level, including the file header and data blocks. The Query/Response data items, Address/Event Count data items, and Malformed Message data items link to various Block Tables.
+-------+
+ C-DNS |
+-------+--------------------------+
| File Type Identifier             |
+----------------------------------+
| File Preamble                    |
| +--------------------------------+
| | Format Version                 |
| +--------------------------------+
| | Block Parameters               |
+-+--------------------------------+
| Block                            |
| +--------------------------------+
| | Block Preamble                 |
| +--------------------------------+
| | Block Statistics               |
| +--------------------------------+
| | Block Tables                   |
| +--------------------------------+
| | Query/Response data items      |
| +--------------------------------+
| | Address/Event Count data items |
| +--------------------------------+
| | Malformed Message data items   |
+-+--------------------------------+
| Block                            |
| +--------------------------------+
| | Block Preamble                 |
| +--------------------------------+
| | Block Statistics               |
| +--------------------------------+
| | Block Tables                   |
| +--------------------------------+
| | Query/Response data items      |
| +--------------------------------+
| | Address/Event Count data items |
| +--------------------------------+
| | Malformed Message data items   |
+-+--------------------------------+
| Further Blocks...                |
+----------------------------------+

Figure 1: The C-DNS Format

Figure 2 shows some more-detailed relationships within each Block, specifically those between the Query/Response data item and the relevant Block Tables. Some fields have been omitted for clarity.
+----------------+
| Query/Response |
+-------------------------+
| Time Offset             |
+-------------------------+            +------------------+
| Client Address          |---------+->| IP Address array |
+-------------------------+         |  +------------------+
| Client Port             |         |
+-------------------------+         |  +------------------+
| Transaction ID          |     +---)->| Name/RDATA array |<--------+
+-------------------------+     |   |  +------------------+         |
| Query Signature         |--+  |   |                               |
+-------------------------+  |  |   |  +-----------------+          |
| Client Hoplimit (q)     |  +--)---)->| Query Signature |          |
+-------------------------+     |   |  +-----------------+-------+  |
| Response Delay (r)      |     |   +--| Server Address          |  |
+-------------------------+     |      +-------------------------+  |
| Query Name              |--+--+      | Server Port             |  |
+-------------------------+  |         +-------------------------+  |
| Query Size (q)          |  |         | Transport Flags         |  |
+-------------------------+  |         +-------------------------+  |
| Response Size (r)       |  |         | QR Type                 |  |
+-------------------------+  |         +-------------------------+  |
| Response Processing (r) |  |         | QR Signature Flags      |  |
| +-----------------------+  |         +-------------------------+  |
| | Bailiwick             |--+         | Query OPCODE (q)        |  |
| +-----------------------+            +-------------------------+  |
| | Flags                 |            | QR DNS Flags            |  |
+-+-----------------------+            +-------------------------+  |
| Extra Query Info (q)    |            | Query RCODE (q)         |  |
| +-----------------------+            +-------------------------+  |
| | Question              |--+---+  +--+-Query Class/Type (q)    |  |
| +-----------------------+      |  |  +-------------------------+  |
| | Answer                |--+   |  |  | Query QDCOUNT (q)       |  |
| +-----------------------+  |   |  |  +-------------------------+  |
| | Authority             |--+   |  |  | Query ANCOUNT (q)       |  |
| +-----------------------+  |   |  |  +-------------------------+  |
| | Additional            |--+   |  |  | Query NSCOUNT (q)       |  |
+-+-----------------------+  |   |  |  +-------------------------+  |
| Extra Response Info (r) |  |-+ |  |  | Query ARCOUNT (q)       |  |
| +-----------------------+  | | |  |  +-------------------------+  |
| | Answer                |--+ | |  |  | Query EDNS version (q)  |  |
| +-----------------------+  | | |  |  +-------------------------+  |
| | Authority             |--+ | |  |  | Query EDNS UDP Size (q) |  |
| +-----------------------+  | | |  |  +-------------------------+  |
| | Additional            |--+ | |  |  | Query OPT RDATA (q)     |--+
+-+-----------------------+    | |  |  +-------------------------+  |
                               | |  |  | Response RCODE (r)      |  |
                               | |  |  +-------------------------+  |
+ -----------------------------+ |  +----------+                    |
|                                |             |                    |
| + -----------------------------+             |                    |
| |  +---------------+  +----------+           |                    |
| +->| Question List |->| Question |           |                    |
|    | array         |  | array    |           |                    |
|    +---------------+  +----------+--+        |                    |
|                       | Name        |--+-----)--------------------+
|                       +-------------+  |     |  +------------+
|                       | Class/Type  |--)---+-+->| Class/Type |
|                       +-------------+  |   |    | array      |
|                                        |   |    +------------+--+
|                                        |   |    | CLASS         |
|    +---------------+  +----------+     |   |    +---------------+
+--->| RR List array |->| RR array |     |   |    | TYPE          |
     +---------+-----+  +----------+--+  |   |    +---------------+
                        | Name        |--+   |
                        +-------------+      |
                        | Class/Type  |------+
                        +-------------+

Figure 2: The Query/Response Data Item and Subsidiary Tables

In Figure 2, data items annotated (q) are only present when a Query/Response has a Query, and those annotated (r) are only present when a Query/Response Response is present.

A C-DNS file begins with a file header containing a File Type Identifier and a File Preamble. The File Preamble contains information on the file Format Version and an array of Block Parameters items (the contents of which include Collection and Storage Parameters used for one or more Blocks).

The file header is followed by a series of Blocks.
A Block consists of a Block Preamble item, some Block Statistics for the traffic stored within the Block, and then various arrays of common data collectively called the Block Tables. This is then followed by an array of the Query/Response data items detailing the Queries and Responses stored within the Block. The array of Query/Response data items is in turn followed by the Address/Event Count data items (an array of per-client counts of particular IP events) and then Malformed Message data items (an array of malformed messages that are stored in the Block).

The exact nature of the DNS data will affect what Block size is the best fit; however, sample data for a root server indicated that Block sizes up to 10,000 Query/Response data items give good results. See Appendix C.6 for more details.

This design exploits data commonality and block-based storage to minimize the C-DNS file size. As a result, C-DNS cannot be streamed below the level of a Block.

6.1. Block Parameters
The details of the Block Parameters items are not shown in the diagrams but are discussed here for context.

An array of Block Parameters items is stored in the File Preamble (with a minimum of one item at index 0); a Block Parameters item consists of a collection of Storage and Collection Parameters that applies to any given Block. An array is used in order to support use cases such as wanting to merge C-DNS files from different sources. The Block Preamble item then contains an optional index for the Block Parameters item that applies for that Block; if not present, the index defaults to 0. Hence, in effect, a global Block Parameters item is defined that can then be overridden per Block.

6.2. Storage Parameters
The Block Parameters item includes a Storage Parameters item -- this contains information about the specific data fields stored in the C-DNS file.

These parameters include:

o The sub-second timing resolution used by the data.

o Information (hints) on which optional data are omitted. See Section 6.2.1.

o Recorded OPCODEs [opcodes] and RR TYPEs [rrtypes]. See Section 6.2.2.

o Flags indicating, for example, whether the data is sampled or anonymized. See Sections 6.2.3 and 14.

o Client and server IPv4 and IPv6 address prefixes. See Section 6.2.4.

6.2.1. Optional Data Items
To enable implementations to store data to their precise requirements in as space-efficient a manner as possible, all fields in the following arrays are optional:

o Query/Response

o Query Signature

o Malformed Messages

In other words, an implementation can choose to omit any data item that is not required for its use case (whilst observing the restrictions relating to IP address storage described in Section 6.2.4). In addition, implementations may be configured to not record all RRs or to only record messages with certain OPCODEs.

This does, however, mean that a consumer of a C-DNS file faces two problems:

1. How can it quickly determine if a file definitely does not contain the data items it requires to complete a particular task (e.g., reconstructing DNS traffic or performing a specific piece of data analysis)?

2. How can it determine whether a data item is not present because it was (1) explicitly not recorded or (2) not available/present?

For example, capturing C-DNS data from within a name server implementation makes it unlikely that the Client Hoplimit can be recorded. Or, if there is no Query ARCOUNT recorded and no Query OPT RDATA [RFC6891] recorded, is that because no Query contained an OPT RR, or because that data was not stored?

The Storage Parameters item therefore also contains a Storage Hints item, which specifies which items the encoder of the file omits from the stored data and will therefore never be present. (This approach is taken because a flag that indicated which items were included for collection would not guarantee that the item was present -- only that it might be.) An implementation decoding that file can then use these flags to quickly determine whether the input data is not rich enough for its needs.

One scenario where this may be particularly important is the case of regenerating traffic. It is possible to collect such a small set of data items that an implementation decoding the file cannot determine if a given Query/Response data item was generated from just a Query, just a Response, or a Query/Response pair. This makes it impossible to reconstruct DNS traffic even if sensible defaults are provided for the missing data items. This is discussed in more detail in Section 9.

6.2.2. Optional RRs and OPCODEs
Also included in the Storage Parameters item are explicit arrays listing the RR TYPEs and the OPCODEs to be recorded. These arrays remove any ambiguity over whether, for example, messages containing particular OPCODEs are not present because (1) certain OPCODEs did not occur or (2) the implementation is not configured to record them.

In the case of OPCODEs, for a message to be fully parsable, the OPCODE must be known to the collecting implementation. Any message with an OPCODE unknown to the collecting implementation cannot be validated as correctly formed and so must be treated as malformed. Messages with OPCODEs known to the recording application but not listed in the Storage Parameters item are discarded by the recording application during C-DNS capture (regardless of whether they are malformed or not).

In the case of RRs, each record in a message must be fully parsable, including parsing the record RDATA, as otherwise the message cannot be validated as correctly formed. Any RR with an RR TYPE not known to the collecting implementation cannot be validated as correctly formed and so must be treated as malformed.

Once a message is correctly parsed, an implementation is free to record only a subset of the RRs present.
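The OPCODE handling described above can be sketched in Python as a three-way decision made from the 12-byte DNS Header alone. This is an illustrative sketch, not part of the format: the function name `classify`, the return labels, and the two OPCODE sets are assumptions chosen for the example, and the full parsing of sections and RRs that a real collector would perform before recording a message is omitted.

```python
import struct

# Assumption for illustration: OPCODEs the collecting implementation can parse.
KNOWN_OPCODES = {0, 1, 2, 4, 5}
# Assumption for illustration: OPCODEs listed in the Storage Parameters item.
STORED_OPCODES = {0}


def classify(payload, known=KNOWN_OPCODES, stored=STORED_OPCODES):
    """Return 'malformed', 'discard', or 'record' for a DNS payload.

    Only the header is inspected; section and RR parsing is not shown.
    """
    if len(payload) < 12:
        # No complete 12-byte DNS Header, so the message cannot be parsed.
        return "malformed"
    # Header layout: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    _msg_id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", payload[:12])
    opcode = (flags >> 11) & 0xF  # OPCODE is the 4 bits following the QR bit
    if opcode not in known:
        # Unknown OPCODE: cannot be validated, so treated as malformed.
        return "malformed"
    if opcode not in stored:
        # Known but not configured for recording: dropped during capture.
        return "discard"
    return "record"


# Example headers: a standard Query (OPCODE 0) and an Update (OPCODE 5).
query_header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)
update_header = struct.pack("!6H", 0x1234, 0x2800, 1, 0, 0, 0)
```

With the assumed sets above, `classify(query_header)` yields "record" and `classify(update_header)` yields "discard". The discard check is placed before any further parsing because the text states that such messages are discarded regardless of whether they are malformed.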
6.2.3. Storage Flags
The Storage Parameters item contains flags that can be used to indicate if:

o the data is anonymized,

o the data is produced from sample data, or

o names in the data have been normalized (converted to uniform case).

The Storage Parameters item also contains optional fields holding details of the sampling method used and the anonymization method used. It is RECOMMENDED that these fields contain URIs [RFC3986] pointing to resources describing the methods used. See Section 14 for further discussion of anonymization and normalization.

6.2.4. IP Address Storage
The format can store either full IP addresses or just IP prefixes; the Storage Parameters item contains fields to indicate if only IP prefixes were stored.

If the IP address prefixes are absent, then full addresses are stored. In this case, the IP version can be directly inferred from the stored address length and the fields "qr-transport-flags" in QueryResponseSignature, "ae-transport-flags" in AddressEventCount, and "mm-transport-flags" in MalformedMessageData (which contain the IP version bit) are optional.

If IP address prefixes are given, only the prefix bits of addresses are stored. In this case, in order to determine the IP version, the fields "qr-transport-flags" in QueryResponseSignature, "ae-transport-flags" in AddressEventCount, and "mm-transport-flags" in MalformedMessageData MUST be present. See Sections 7.3.2.3.2 and 7.3.2.3.5.

As an example of storing only IP prefixes, if a client IPv6 prefix of 48 is specified, a client address of 2001:db8:85a3::8a2e:370:7334 will be stored as 0x20010db885a3, reducing address storage space requirements. Similarly, if a client IPv4 prefix of 16 is specified, a client address of 192.0.2.1 will be stored as 0xc000 (192.0).
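The two worked examples above can be reproduced with a short Python sketch using the standard "ipaddress" module. The function name `stored_address` is hypothetical. For prefix lengths that are not a multiple of 8, the sketch keeps the partial byte and zeroes the bits beyond the prefix; that handling is an assumption made here for illustration, since the text above only gives byte-aligned examples.

```python
import ipaddress


def stored_address(address, prefix_len=None):
    """Return the bytes that would be stored for an address.

    prefix_len=None models the case where no prefix is configured,
    so the full 4-byte or 16-byte address is kept.
    """
    packed = ipaddress.ip_address(address).packed
    if prefix_len is None:
        return packed
    n_bytes = (prefix_len + 7) // 8          # bytes needed to hold the prefix
    truncated = bytearray(packed[:n_bytes])
    spare_bits = n_bytes * 8 - prefix_len
    if spare_bits:
        # Assumption: clear the bits of the last byte that lie beyond the prefix.
        truncated[-1] &= (0xFF << spare_bits) & 0xFF
    return bytes(truncated)


v6_prefix = stored_address("2001:db8:85a3::8a2e:370:7334", 48)
v4_prefix = stored_address("192.0.2.1", 16)
v4_full = stored_address("192.0.2.1")
```

Here `v6_prefix.hex()` is "20010db885a3" and `v4_prefix.hex()` is "c000", matching the examples in the text. The full address `v4_full` is 4 bytes long, which is how the IP version can be inferred from the stored length when no prefix is configured; a truncated address no longer carries that information, which is why the transport flags become mandatory in that case.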