
RFC 8618

Compacted-DNS (C-DNS): A Format for DNS Packet Capture

Pages: 79
Proposed Standard
Part 4 of 4 – Pages 58 to 79


Appendix A. CDDL

   This appendix gives a CDDL [RFC8610] specification for C-DNS.

   CDDL does not permit a range of allowed values to be specified for a
   bitfield.  Where necessary, those values are given as a CDDL group,
   but the group definition is commented out to prevent CDDL tooling
   from warning that the group is unused.

   ; CDDL specification of the file format for C-DNS,
   ; which describes a collection of DNS messages and
   ; traffic metadata.

   ;
   ; The overall structure of a file.
   ;
   File = [
       file-type-id  : "C-DNS",
       file-preamble : FilePreamble,
       file-blocks   : [* Block],
   ]

   ;
   ; The File Preamble.
   ;
   FilePreamble = {
       major-format-version => 1,
       minor-format-version => 0,
       ? private-version    => uint,
       block-parameters     => [+ BlockParameters],
   }
   major-format-version = 0
   minor-format-version = 1
   private-version      = 2
   block-parameters     = 3

   BlockParameters = {
       storage-parameters      => StorageParameters,
       ? collection-parameters => CollectionParameters,
   }
   storage-parameters    = 0
   collection-parameters = 1

     IPv6PrefixLength = 1..128
     IPv4PrefixLength = 1..32
     OpcodeRange = 0..15
     RRTypeRange = 0..65535

     StorageParameters = {
         ticks-per-second             => uint,
         max-block-items              => uint,
         storage-hints                => StorageHints,
         opcodes                      => [+ OpcodeRange],
         rr-types                     => [+ RRTypeRange],
         ? storage-flags              => StorageFlags,
         ? client-address-prefix-ipv4 => IPv4PrefixLength,
         ? client-address-prefix-ipv6 => IPv6PrefixLength,
         ? server-address-prefix-ipv4 => IPv4PrefixLength,
         ? server-address-prefix-ipv6 => IPv6PrefixLength,
         ? sampling-method            => tstr,
         ? anonymization-method       => tstr,
     }
     ticks-per-second           = 0
     max-block-items            = 1
     storage-hints              = 2
     opcodes                    = 3
     rr-types                   = 4
     storage-flags              = 5
     client-address-prefix-ipv4 = 6
     client-address-prefix-ipv6 = 7
     server-address-prefix-ipv4 = 8
     server-address-prefix-ipv6 = 9
     sampling-method            = 10
     anonymization-method       = 11

       ; A hint indicates whether the collection method will always omit
       ; the item from the file.
       StorageHints = {
           query-response-hints           => QueryResponseHints,
           query-response-signature-hints =>
               QueryResponseSignatureHints,
           rr-hints                       => RRHints,
           other-data-hints               => OtherDataHints,
       }
       query-response-hints           = 0
       query-response-signature-hints = 1
       rr-hints                       = 2
       other-data-hints               = 3

         QueryResponseHintValues = &(
             time-offset                  : 0,
             client-address-index         : 1,
             client-port                  : 2,
             transaction-id               : 3,
             qr-signature-index           : 4,
             client-hoplimit              : 5,
             response-delay               : 6,
             query-name-index             : 7,
             query-size                   : 8,
             response-size                : 9,
             response-processing-data     : 10,
             query-question-sections      : 11,    ; Second & subsequent
                                                   ; Questions
             query-answer-sections        : 12,
             query-authority-sections     : 13,
             query-additional-sections    : 14,
             response-answer-sections     : 15,
             response-authority-sections  : 16,
             response-additional-sections : 17,
         )
         QueryResponseHints = uint .bits QueryResponseHintValues

         QueryResponseSignatureHintValues = &(
             server-address-index  : 0,
             server-port           : 1,
             qr-transport-flags    : 2,
             qr-type               : 3,
             qr-sig-flags          : 4,
             query-opcode          : 5,
             qr-dns-flags          : 6,
             query-rcode           : 7,
             query-classtype-index : 8,
             query-qdcount         : 9,
             query-ancount         : 10,
             query-nscount         : 11,
             query-arcount         : 12,
             query-edns-version    : 13,
             query-udp-size        : 14,
             query-opt-rdata-index : 15,
             response-rcode        : 16,
         )
         QueryResponseSignatureHints =
             uint .bits QueryResponseSignatureHintValues

         RRHintValues = &(
             ttl         : 0,
             rdata-index : 1,
         )
         RRHints = uint .bits RRHintValues

         OtherDataHintValues = &(
             malformed-messages   : 0,
             address-event-counts : 1,
         )
         OtherDataHints = uint .bits OtherDataHintValues

       StorageFlagValues = &(
           anonymized-data      : 0,
           sampled-data         : 1,
           normalized-names     : 2,
       )
       StorageFlags = uint .bits StorageFlagValues

    ; Metadata about data collection
    VLANIdRange = 1..4094

    CollectionParameters = {
         ? query-timeout      => uint,             ; Milliseconds
         ? skew-timeout       => uint,             ; Microseconds
         ? snaplen            => uint,
         ? promisc            => bool,
         ? interfaces         => [+ tstr],
         ? server-addresses   => [+ IPAddress],
         ? vlan-ids           => [+ VLANIdRange],
         ? filter             => tstr,
         ? generator-id       => tstr,
         ? host-id            => tstr,
     }
     query-timeout      = 0
     skew-timeout       = 1
     snaplen            = 2
     promisc            = 3
     interfaces         = 4
     server-addresses   = 5
     vlan-ids           = 6
     filter             = 7
     generator-id       = 8
     host-id            = 9

   ;
   ; Data in the file is stored in Blocks.
   ;
   Block = {
       block-preamble          => BlockPreamble,
       ? block-statistics      => BlockStatistics, ; Much of this
                                                   ; could be derived
       ? block-tables          => BlockTables,
       ? query-responses       => [+ QueryResponse],
       ? address-event-counts  => [+ AddressEventCount],
       ? malformed-messages    => [+ MalformedMessage],
   }
   block-preamble        = 0
   block-statistics      = 1
   block-tables          = 2
   query-responses       = 3
   address-event-counts  = 4
   malformed-messages    = 5

   ;
   ; The (mandatory) preamble to a Block.
   ;
   BlockPreamble = {
       ? earliest-time          => Timestamp,
       ? block-parameters-index => uint .default 0,
   }
   earliest-time          = 0
   block-parameters-index = 1

   ; Ticks are sub-second intervals.  The number of ticks in a second is
   ; file/block metadata.  Signed and unsigned tick types are defined.
   ticks = int
   uticks = uint

   Timestamp = [
       timestamp-secs   : uint,      ; POSIX time
       timestamp-ticks  : uticks,
   ]

   ;
   ; Statistics about the Block contents.
   ;
   BlockStatistics = {
       ? processed-messages  => uint,
       ? qr-data-items       => uint,
       ? unmatched-queries   => uint,
       ? unmatched-responses => uint,
       ? discarded-opcode    => uint,
       ? malformed-items     => uint,
   }
   processed-messages  = 0
   qr-data-items       = 1
   unmatched-queries   = 2
   unmatched-responses = 3
   discarded-opcode    = 4
   malformed-items     = 5
   ;
   ; Tables of common data referenced from records in a Block.
   ;
   BlockTables = {
       ? ip-address             => [+ IPAddress],
       ? classtype              => [+ ClassType],
       ? name-rdata             => [+ bstr],    ; Holds both names
                                                ; and RDATA
       ? qr-sig                 => [+ QueryResponseSignature],
       ? QuestionTables,
       ? RRTables,
       ? malformed-message-data => [+ MalformedMessageData],
   }
   ip-address             = 0
   classtype              = 1
   name-rdata             = 2
   qr-sig                 = 3
   qlist                  = 4
   qrr                    = 5
   rrlist                 = 6
   rr                     = 7
   malformed-message-data = 8

   IPv4Address = bstr .size (0..4)
   IPv6Address = bstr .size (0..16)
   IPAddress = IPv4Address / IPv6Address

   ClassType = {
       type  => uint,
       class => uint,
   }
   type  = 0
   class = 1

   QueryResponseSignature = {
       ? server-address-index  => uint,
       ? server-port           => uint,
       ? qr-transport-flags    => QueryResponseTransportFlags,
       ? qr-type               => QueryResponseType,
       ? qr-sig-flags          => QueryResponseFlags,
       ? query-opcode          => uint,
       ? qr-dns-flags          => DNSFlags,
       ? query-rcode           => uint,
       ? query-classtype-index => uint,
       ? query-qdcount         => uint,
       ? query-ancount         => uint,
       ? query-nscount         => uint,
       ? query-arcount         => uint,
       ? query-edns-version    => uint,
       ? query-udp-size        => uint,
       ? query-opt-rdata-index => uint,
       ? response-rcode        => uint,
   }
   server-address-index  = 0
   server-port           = 1
   qr-transport-flags    = 2
   qr-type               = 3
   qr-sig-flags          = 4
   query-opcode          = 5
   qr-dns-flags          = 6
   query-rcode           = 7
   query-classtype-index = 8
   query-qdcount         = 9
   query-ancount         = 10
   query-nscount         = 11
   query-arcount         = 12
   query-edns-version    = 13
   query-udp-size        = 14
   query-opt-rdata-index = 15
   response-rcode        = 16

     ; Transport gives the values that may appear in bits 1..4 of
     ; TransportFlags.  There is currently no way to express this in
     ; CDDL, so Transport is unused.  To avoid confusion when used
     ; with CDDL tools, it is commented out.
     ;
     ; Transport = &(
     ;     udp               : 0,
     ;     tcp               : 1,
     ;     tls               : 2,
     ;     dtls              : 3,
     ;     https             : 4,
     ;     non-standard      : 15,
     ; )

     TransportFlagValues = &(
         ip-version         : 0,     ; 0=IPv4, 1=IPv6
     ) / (1..4)
     TransportFlags = uint .bits TransportFlagValues

     QueryResponseTransportFlagValues = &(
         query-trailingdata : 5,
     ) / TransportFlagValues
     QueryResponseTransportFlags =
         uint .bits QueryResponseTransportFlagValues
     QueryResponseType = &(
         stub      : 0,
         client    : 1,
         resolver  : 2,
         auth      : 3,
         forwarder : 4,
         tool      : 5,
     )

     QueryResponseFlagValues = &(
         has-query               : 0,
         has-response            : 1,
         query-has-opt           : 2,
         response-has-opt        : 3,
         query-has-no-question   : 4,
         response-has-no-question: 5,
     )
     QueryResponseFlags = uint .bits QueryResponseFlagValues

     DNSFlagValues = &(
         query-cd   : 0,
         query-ad   : 1,
         query-z    : 2,
         query-ra   : 3,
         query-rd   : 4,
         query-tc   : 5,
         query-aa   : 6,
         query-do   : 7,
         response-cd: 8,
         response-ad: 9,
         response-z : 10,
         response-ra: 11,
         response-rd: 12,
         response-tc: 13,
         response-aa: 14,
     )
     DNSFlags = uint .bits DNSFlagValues
   QuestionTables = (
       qlist => [+ QuestionList],
       qrr   => [+ Question]
   )

     QuestionList = [+ uint]           ; Index of Question

     Question = {                      ; Second and subsequent Questions
         name-index      => uint,      ; Index to a name in the
                                       ; name-rdata table
         classtype-index => uint,
     }
     name-index      = 0
     classtype-index = 1

   RRTables = (
       rrlist => [+ RRList],
       rr     => [+ RR]
   )

     RRList = [+ uint]                     ; Index of RR

     RR = {
         name-index      => uint,          ; Index to a name in the
                                           ; name-rdata table
         classtype-index => uint,
         ? ttl           => uint,
         ? rdata-index   => uint,          ; Index to RDATA in the
                                           ; name-rdata table
     }
     ; Other map key values already defined above.
     ttl         = 2
     rdata-index = 3

   MalformedMessageData = {
       ? server-address-index   => uint,
       ? server-port            => uint,
       ? mm-transport-flags     => TransportFlags,
       ? mm-payload             => bstr,
   }
   ; Other map key values already defined above.
   mm-transport-flags      = 2
   mm-payload              = 3
   ;
   ; A single Query/Response data item.
   ;
   QueryResponse = {
       ? time-offset              => uticks,     ; Time offset from
                                                 ; start of Block
       ? client-address-index     => uint,
       ? client-port              => uint,
       ? transaction-id           => uint,
       ? qr-signature-index       => uint,
       ? client-hoplimit          => uint,
       ? response-delay           => ticks,
       ? query-name-index         => uint,
       ? query-size               => uint,       ; DNS size of Query
       ? response-size            => uint,       ; DNS size of Response
       ? response-processing-data => ResponseProcessingData,
       ? query-extended           => QueryResponseExtended,
       ? response-extended        => QueryResponseExtended,
   }
   time-offset              = 0
   client-address-index     = 1
   client-port              = 2
   transaction-id           = 3
   qr-signature-index       = 4
   client-hoplimit          = 5
   response-delay           = 6
   query-name-index         = 7
   query-size               = 8
   response-size            = 9
   response-processing-data = 10
   query-extended           = 11
   response-extended        = 12

   ResponseProcessingData = {
       ? bailiwick-index  => uint,
       ? processing-flags => ResponseProcessingFlags,
   }
   bailiwick-index = 0
   processing-flags = 1

     ResponseProcessingFlagValues = &(
         from-cache : 0,
     )
     ResponseProcessingFlags = uint .bits ResponseProcessingFlagValues
   QueryResponseExtended = {
       ? question-index   => uint,       ; Index of QuestionList
       ? answer-index     => uint,       ; Index of RRList
       ? authority-index  => uint,
       ? additional-index => uint,
   }
   question-index   = 0
   answer-index     = 1
   authority-index  = 2
   additional-index = 3

   ;
   ; Address event data.
   ;
   AddressEventCount = {
       ae-type              => &AddressEventType,
       ? ae-code            => uint,
       ae-address-index     => uint,
       ? ae-transport-flags => TransportFlags,
       ae-count             => uint,
   }
   ae-type            = 0
   ae-code            = 1
   ae-address-index   = 2
   ae-transport-flags = 3
   ae-count           = 4

   AddressEventType = (
       tcp-reset              : 0,
       icmp-time-exceeded     : 1,
       icmp-dest-unreachable  : 2,
       icmpv6-time-exceeded   : 3,
       icmpv6-dest-unreachable: 4,
       icmpv6-packet-too-big  : 5,
   )

   ;
   ; Malformed messages.
   ;
   MalformedMessage = {
       ? time-offset           => uticks,   ; Time offset from
                                            ; start of Block
       ? client-address-index  => uint,
       ? client-port           => uint,
       ? message-data-index    => uint,
   }
   ; Other map key values already defined above.
   message-data-index = 3
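
The ".bits" definitions above pack several values into a single unsigned integer. As an informal illustration that is not part of the specification, the following Python sketch unpacks a QueryResponseTransportFlags value using the bit positions given in TransportFlagValues, the commented-out Transport group, and QueryResponseTransportFlagValues. The function and dictionary names are hypothetical, and the sketch assumes that bit 0 is the least significant bit and that bits 1..4 hold the transport as a 4-bit unsigned value.

```python
# Transport values copied from the commented-out Transport group.
TRANSPORTS = {0: "udp", 1: "tcp", 2: "tls", 3: "dtls", 4: "https",
              15: "non-standard"}


def decode_qr_transport_flags(flags: int) -> dict:
    """Split a QueryResponseTransportFlags uint into its fields.

    Assumes bit 0 is the least significant bit of the integer.
    """
    transport = (flags >> 1) & 0x0F                    # bits 1..4
    return {
        "ip_version": 6 if flags & 0x01 else 4,        # bit 0: 0=IPv4, 1=IPv6
        "transport": TRANSPORTS.get(transport, f"unassigned({transport})"),
        "query_trailingdata": bool(flags & (1 << 5)),  # bit 5
    }


def decode_bits(value: int, names: dict) -> set:
    """Return the names of the bits set in a plain `uint .bits` value."""
    return {name for name, bit in names.items() if value & (1 << bit)}


# Bit positions copied from StorageFlagValues.
STORAGE_FLAG_VALUES = {"anonymized-data": 0, "sampled-data": 1,
                       "normalized-names": 2}

if __name__ == "__main__":
    print(decode_qr_transport_flags(0b100011))
    print(decode_bits(0b101, STORAGE_FLAG_VALUES))
```

The same decode_bits helper could be applied to any of the hint bitfields by passing the corresponding name-to-bit mapping.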

Appendix B. DNS Name Compression Example

   The basic algorithm, which follows the guidance in [RFC1035], is
   simply to collect each name, and the offset in the packet at which it
   starts, during packet construction.  As each name is added, it is
   offered to each of the collected names in order of collection,
   starting from the first name.  If (1) labels at the end of the name
   can be replaced with a reference back to part (or all) of the earlier
   name and (2) the uncompressed part of the name is shorter than any
   compression already found, the earlier name is noted as the
   compression target for the name.

   The following tables illustrate the step-by-step process of adding
   names and performing name compression.  In an example packet, the
   first name added is foo.example, which cannot be compressed.

   +---+-------------+--------------+--------------------+
   | N | Name        | Uncompressed | Compression Target |
   +---+-------------+--------------+--------------------+
   | 1 | foo.example | foo.example  | None               |
   +---+-------------+--------------+--------------------+

   The next name added is bar.example.  This is matched against
   foo.example.  The example part of this can be used as a compression
   target, with the remaining uncompressed part of the name being bar.

   +---+-------------+--------------+-----------------------+
   | N | Name        | Uncompressed | Compression Target    |
   +---+-------------+--------------+-----------------------+
   | 1 | foo.example | foo.example  | None                  |
   | 2 | bar.example | bar          | 1 + offset to example |
   +---+-------------+--------------+-----------------------+

   The third name added is www.bar.example.  This is first matched
   against foo.example, and as before this is recorded as a compression
   target, with the remaining uncompressed part of the name being
   www.bar.  It is then matched against the second name, which again can
   be a compression target.  Because the remaining uncompressed part of
   the name is www, this is an improved compression, and so it is
   adopted.

   +---+-----------------+--------------+-----------------------+
   | N | Name            | Uncompressed | Compression Target    |
   +---+-----------------+--------------+-----------------------+
   | 1 | foo.example     | foo.example  | None                  |
   | 2 | bar.example     | bar          | 1 + offset to example |
   | 3 | www.bar.example | www          | 2                     |
   +---+-----------------+--------------+-----------------------+

   As an optimization, if a name is already perfectly compressed (in
   other words, the uncompressed part of the name is empty), then no
   further names will be considered for compression.
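
The basic algorithm and its early-exit optimization can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reference implementation: names are compared label by label as exact strings, a compression target is recorded as a pair (number of the earlier name, count of leading labels to skip in it) rather than as a byte offset into a packet, and the function names are hypothetical.

```python
def split_labels(name):
    """Split a presentation-format name into its labels."""
    return name.rstrip(".").split(".")


def shared_suffix_len(a, b):
    """Count how many trailing labels two label lists have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def choose_targets(names):
    """Return (name, uncompressed part, target) for each name in order.

    target is None or (N, skip): point into earlier name N after
    skipping `skip` of its leading labels.
    """
    seen = []      # label lists of the names collected so far
    results = []
    for name in names:
        labels = split_labels(name)
        best_left = len(labels)   # labels still uncompressed
        target = None
        for n, earlier in enumerate(seen, start=1):
            shared = shared_suffix_len(labels, earlier)
            left = len(labels) - shared
            # Adopt only a strictly better compression than any found.
            if shared and left < best_left:
                best_left = left
                target = (n, len(earlier) - shared)
                if left == 0:
                    break         # perfectly compressed: stop searching
        results.append((name, ".".join(labels[:best_left]), target))
        seen.append(labels)
    return results


if __name__ == "__main__":
    for row in choose_targets(["foo.example", "bar.example",
                               "www.bar.example"]):
        print(row)
```

For the three names used in the tables above, the sketch selects no target for foo.example, name 1 with one label skipped for bar.example, and the start of name 2 for www.bar.example.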

B.1. NSD Compression Algorithm

   Using the above basic algorithm, the packet lengths of Responses
   generated by the Name Server Daemon (NSD) [NSD] can be matched almost
   exactly.  At the time of writing, a tiny number (<0.01%) of the
   reconstructed packets had incorrect lengths.

B.2. Knot Authoritative Compression Algorithm

   The Knot Authoritative name server [Knot] uses different compression
   behavior, which is the result of internal optimization designed to
   balance runtime speed with compression size gains.  In brief, and
   omitting complications, Knot Authoritative will only consider the
   QNAME and names in the immediately preceding RR section in an RRSET
   as compression targets.

   A set of smart heuristics as described below can be implemented to
   mimic this, and while not perfect, it produces output nearly, but not
   quite, as good a match as with NSD.  The heuristics are as follows:

   1.  A match is only perfect if the name is completely compressed AND
       the TYPE of the section in which the name occurs matches the TYPE
       of the name used as the compression target.

   2.  If the name occurs in RDATA:

       *  If the compression target name is in a Query, then only the
          first RR in an RRSET can use that name as a compression
          target.

       *  The compression target name MUST be in RDATA.

       *  The name section TYPE must match the compression target name
          section TYPE.

       *  The compression target name MUST be in the immediately
          preceding RR in the RRSET.

   Using this algorithm, less than 0.1% of the reconstructed packets had
   incorrect lengths.

B.3. Observed Differences

In sample traffic collected on a root name server, around 2-4% of Responses generated by Knot had different packet lengths than those produced by NSD.

Appendix C. Comparison of Binary Formats

   Several binary serialization formats were considered.  For
   completeness, they were also compared to JSON.

   o  Apache Avro [Avro].  Data is stored according to a predefined
      schema.  The schema itself is always included in the data file.
      Data can therefore be stored untagged, for a smaller serialization
      size, and be written and read by an Avro library.

      *  At the time of writing, Avro libraries are available for C,
         C++, C#, Java, Python, Ruby, and PHP.  Optionally, tools are
         available for C++, Java, and C# to generate code for encoding
         and decoding.

   o  Google Protocol Buffers [Protocol-Buffers].  Data is stored
      according to a predefined schema.  The schema is used by a
      generator to generate code for encoding and decoding the data.
      Data can therefore be stored untagged, for a smaller serialization
      size.  The schema is not stored with the data, so unlike Avro, it
      cannot be read with a generic library.

      *  Code must be generated for a particular data schema to read and
         write data using that schema.  At the time of writing, the
         Google code generator can currently generate code for encoding
         and decoding a schema for C++, Go, Java, Python, Ruby, C#,
         Objective-C, JavaScript, and PHP.

   o  CBOR [RFC7049].  This serialization format is comparable to JSON
      but with a binary representation.  It does not use a predefined
      schema, so data is always stored tagged.  However, CBOR data
      schemas can be described using CDDL [RFC8610], and tools exist to
      verify that data files conform to the schema.

      *  CBOR is a simple format and is simple to implement.  At the
         time of writing, the CBOR website lists implementations for 16
         languages.

   Avro and Protocol Buffers both allow storage of untagged data, but
   because they rely on the data schema for this, their implementation
   is considerably more complex than CBOR.  Using Avro or Protocol
   Buffers in an unsupported environment would require notably greater
   development effort compared to CBOR.
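
To give a feel for how little machinery CBOR needs, the following Python sketch hand-encodes a minimal C-DNS file with no Blocks, using the integer map keys from the CDDL in Appendix A. It is only an illustration: the encoder handles just the four item types the example needs (unsigned integers, text strings, arrays, and maps), the function names are hypothetical, and the parameter values (ticks per second, maximum block items, opcode and RR type lists) are arbitrary choices rather than recommendations.

```python
import struct


def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte and its argument for a major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if value < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("integer too large for CBOR")


def cbor_encode(item) -> bytes:
    """Encode unsigned ints, text strings, lists, and dicts only."""
    if isinstance(item, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(item, int) and item >= 0:
        return cbor_head(0, item)                      # major type 0
    if isinstance(item, str):
        data = item.encode("utf-8")
        return cbor_head(3, len(data)) + data          # major type 3
    if isinstance(item, list):
        body = b"".join(cbor_encode(i) for i in item)
        return cbor_head(4, len(item)) + body          # major type 4
    if isinstance(item, dict):
        body = b"".join(cbor_encode(k) + cbor_encode(v)
                        for k, v in item.items())
        return cbor_head(5, len(item)) + body          # major type 5
    raise TypeError(f"unsupported type: {type(item).__name__}")


# Map keys follow the CDDL; the values are illustrative assumptions.
storage_parameters = {
    0: 1_000_000,                  # ticks-per-second
    1: 10_000,                     # max-block-items
    2: {0: 0, 1: 0, 2: 0, 3: 0},   # storage-hints: four empty bitfields
    3: [0],                        # opcodes
    4: [1],                        # rr-types
}
c_dns_file = [
    "C-DNS",                                        # file-type-id
    {0: 1, 1: 0, 3: [{0: storage_parameters}]},     # file-preamble
    [],                                             # file-blocks
]
encoded = cbor_encode(c_dns_file)

if __name__ == "__main__":
    print(encoded.hex())
```

A production writer would normally rely on an existing CBOR library; the point here is only that the tagged encoding can be produced without a schema compiler.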

   A test program was written that reads input from a PCAP file and
   writes output using one of two basic structures: either a simple
   structure, where each Query/Response pair is represented in a single
   record entry, or the C-DNS block structure.

   The resulting output files were then compressed using a variety of
   common general-purpose lossless compression tools to explore the
   compressibility of the formats.  The compression tools employed were:

   o  snzip [snzip].  A command-line compression tool based on the
      Google Snappy library [snappy].

   o  lz4 [lz4].  The command-line compression tool from the reference C
      LZ4 implementation.

   o  gzip [gzip].  The ubiquitous GNU zip tool.

   o  zstd [zstd].  Compression using the Zstandard algorithm.

   o  xz [xz].  A popular compression tool noted for high compression.

   In all cases, the compression tools were run using their default
   settings.

   Note that this document does not mandate the use of compression, nor
   any particular compression scheme, but it anticipates that in
   practice output data will be subject to general-purpose compression,
   and so this should be taken into consideration.

   "test.pcap", a 662 MB capture of sample data from a root instance,
   was used for the comparison.  The following table shows the formatted
   size and size after compression (abbreviated to Comp. in the table
   headers), together with the task Resident Set Size (RSS) and the user
   time taken by the compression.  File sizes are in MB, RSS is in kB,
   and user time is in seconds.
   +-------------+-----------+-------+------------+-------+-----------+
   | Format      | File Size | Comp. | Comp. Size |   RSS | User Time |
   +-------------+-----------+-------+------------+-------+-----------+
   | PCAP        |    661.87 | snzip |     212.48 |  2696 |      1.26 |
   |             |           | lz4   |     181.58 |  6336 |      1.35 |
   |             |           | gzip  |     153.46 |  1428 |     18.20 |
   |             |           | zstd  |      87.07 |  3544 |      4.27 |
   |             |           | xz    |      49.09 | 97416 |    160.79 |
   |             |           |       |            |       |           |
   | JSON simple |   4113.92 | snzip |     603.78 |  2656 |      5.72 |
   |             |           | lz4   |     386.42 |  5636 |      5.25 |
   |             |           | gzip  |     271.11 |  1492 |     73.00 |
   |             |           | zstd  |     133.43 |  3284 |      8.68 |
   |             |           | xz    |      51.98 | 97412 |    600.74 |
   |             |           |       |            |       |           |
   | Avro simple |    640.45 | snzip |     148.98 |  2656 |      0.90 |
   |             |           | lz4   |     111.92 |  5828 |      0.99 |
   |             |           | gzip  |     103.07 |  1540 |     11.52 |
   |             |           | zstd  |      49.08 |  3524 |      2.50 |
   |             |           | xz    |      22.87 | 97308 |     90.34 |
   |             |           |       |            |       |           |
   | CBOR simple |    764.82 | snzip |     164.57 |  2664 |      1.11 |
   |             |           | lz4   |     120.98 |  5892 |      1.13 |
   |             |           | gzip  |     110.61 |  1428 |     12.88 |
   |             |           | zstd  |      54.14 |  3224 |      2.77 |
   |             |           | xz    |      23.43 | 97276 |    111.48 |
   |             |           |       |            |       |           |
   | PBuf simple |    749.51 | snzip |     167.16 |  2660 |      1.08 |
   |             |           | lz4   |     123.09 |  5824 |      1.14 |
   |             |           | gzip  |     112.05 |  1424 |     12.75 |
   |             |           | zstd  |      53.39 |  3388 |      2.76 |
   |             |           | xz    |      23.99 | 97348 |    106.47 |
   |             |           |       |            |       |           |
   | JSON block  |    519.77 | snzip |     106.12 |  2812 |      0.93 |
   |             |           | lz4   |     104.34 |  6080 |      0.97 |
   |             |           | gzip  |      57.97 |  1604 |     12.70 |
   |             |           | zstd  |      61.51 |  3396 |      3.45 |
   |             |           | xz    |      27.67 | 97524 |    169.10 |
   |             |           |       |            |       |           |
   | Avro block  |     60.45 | snzip |      48.38 |  2688 |      0.20 |
   |             |           | lz4   |      48.78 |  8540 |      0.22 |
   |             |           | gzip  |      39.62 |  1576 |      2.92 |
   |             |           | zstd  |      29.63 |  3612 |      1.25 |
   |             |           | xz    |      18.28 | 97564 |     25.81 |
   |             |           |       |            |       |           |
   | CBOR block  |     75.25 | snzip |      53.27 |  2684 |      0.24 |
   |             |           | lz4   |      51.88 |  8008 |      0.28 |
   |             |           | gzip  |      41.17 |  1548 |      4.36 |
   |             |           | zstd  |      30.61 |  3476 |      1.48 |
   |             |           | xz    |      18.15 | 97556 |     38.78 |
   |             |           |       |            |       |           |
   | PBuf block  |     67.98 | snzip |      51.10 |  2636 |      0.24 |
   |             |           | lz4   |      52.39 |  8304 |      0.24 |
   |             |           | gzip  |      40.19 |  1520 |      3.63 |
   |             |           | zstd  |      31.61 |  3576 |      1.40 |
   |             |           | xz    |      17.94 | 97440 |     33.99 |
   +-------------+-----------+-------+------------+-------+-----------+

   The above results are discussed in the following sections.

C.1. Comparison with Full PCAP Files

An important first consideration is whether moving away from PCAP offers significant benefits. The simple binary formats are typically larger than PCAP, even though they omit some information such as Ethernet Media Access Control (MAC) addresses. But not only do they require less CPU to compress than PCAP, the resulting compressed files are smaller than compressed PCAP.

C.2. Simple versus Block Coding

   The intention of the block coding is to perform data deduplication on
   Query/Response records within the block.  The simple and block
   formats shown above store exactly the same information for each
   Query/Response record.  This information is parsed from the DNS
   traffic in the input PCAP file, and in all cases each field has an
   identifier and the field data is typed.

   The data deduplication in the block formats shows an
   order-of-magnitude reduction in format file size compared with the
   simple formats.  As would be expected, the compression tools are able
   to find and exploit a lot of this duplication, but as the
   deduplication process uses knowledge of DNS traffic, it is able to
   retain a size advantage.  This advantage reduces as stronger
   compression is applied, as again would be expected, but even with the
   strongest compression applied the block-formatted data remains around
   75% of the size of the simple format and its compression requires
   roughly a third of the CPU time.
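
The deduplication idea can be illustrated with a short Python sketch that moves repeated client addresses into a shared table and stores only an index in each record, in the manner of the ip-address table and client-address-index field in Appendix A. This is a simplified assumption-laden example: it deduplicates a single field, uses readable string keys where a real file would use the integer map keys, and the function name is hypothetical.

```python
import ipaddress


def build_block(records):
    """Replace repeated client addresses with indexes into one table.

    records is an iterable of (time_offset, client_address_bytes).
    """
    table = []       # stands in for the ip-address Block table
    index_of = {}    # address bytes -> position in the table
    items = []
    for time_offset, client_address in records:
        if client_address not in index_of:
            index_of[client_address] = len(table)
            table.append(client_address)
        items.append({
            "time-offset": time_offset,
            "client-address-index": index_of[client_address],
        })
    return {"ip-address": table, "query-responses": items}


# Documentation-range addresses in packed (byte string) form.
a = ipaddress.ip_address("192.0.2.1").packed
b = ipaddress.ip_address("2001:db8::1").packed
block = build_block([(0, a), (5, b), (9, a)])

if __name__ == "__main__":
    print(len(block["ip-address"]), "addresses stored for",
          len(block["query-responses"]), "records")
```

Each distinct address is stored once per Block, so the saving grows with the amount of repetition inside the Block.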

C.3. Binary versus Text Formats

   Text data formats offer many advantages over binary formats,
   particularly in the areas of ad hoc data inspection and extraction.
   It was therefore felt worthwhile to carry out a direct comparison,
   implementing JSON versions of the simple and block formats.

   Concentrating on JSON block format, the format files produced are a
   significant fraction of an order of magnitude larger than binary
   formats.  The impact on file size after compression is as might be
   expected from that starting point; the strongest compression (xz)
   produces files that are around 150% of the size of the similarly
   compressed binary formats and that require over 4x more CPU to
   compress.

C.4. Performance

   Concentrating again on the block formats, all three produce format
   files that are close to an order of magnitude smaller than the
   original "test.pcap" file.  CBOR produces the largest files and Avro
   the smallest, 20% smaller than CBOR.

   However, once compression is taken into account, the size difference
   narrows.  At medium compression (with gzip), the size difference is
   4%.  Using strong compression (with xz), the difference reduces to
   2%, with Avro the largest and Protocol Buffers the smallest, although
   CBOR and Protocol Buffers require slightly more compression CPU.

   The measurements presented above do not include data on the CPU
   required to generate the format files.  Measurements indicate that
   writing Avro requires 10% more CPU than CBOR or Protocol Buffers.  It
   appears, therefore, that Avro's advantage in compression CPU usage is
   probably offset by a larger CPU requirement in writing Avro.
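
The percentages quoted above can be reproduced from the block-format rows of the comparison table. The following Python sketch does so, assuming that "size difference" means the amount by which the smallest file undercuts the largest, expressed as a percentage of the largest; that interpretation and the names used are assumptions made for illustration.

```python
# Block-format sizes in MB, copied from the comparison table.
BLOCK_SIZES = {
    "Avro": {"none": 60.45, "gzip": 39.62, "xz": 18.28},
    "CBOR": {"none": 75.25, "gzip": 41.17, "xz": 18.15},
    "PBuf": {"none": 67.98, "gzip": 40.19, "xz": 17.94},
}


def spread(compression: str) -> float:
    """Percentage by which the smallest file undercuts the largest."""
    sizes = [row[compression] for row in BLOCK_SIZES.values()]
    return 100 * (max(sizes) - min(sizes)) / max(sizes)


def extremes(compression: str) -> tuple:
    """Return (format with largest file, format with smallest file)."""
    by_size = sorted(BLOCK_SIZES, key=lambda f: BLOCK_SIZES[f][compression])
    return by_size[-1], by_size[0]


if __name__ == "__main__":
    for compression in ("none", "gzip", "xz"):
        largest, smallest = extremes(compression)
        print(f"{compression}: {spread(compression):.1f}% "
              f"(largest {largest}, smallest {smallest})")
```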

C.5. Conclusions

The above assessments lead us to the choice of a binary format file using blocking. As noted previously, this document anticipates that output data will be subject to compression. There is no compelling case for one particular binary serialization format in terms of either final file size or machine resources consumed, so the choice must be largely based on other factors. CBOR was therefore chosen as the binary serialization format for the reasons listed in Section 5.

C.6. Block Size Choice

Given the choice of a CBOR format using blocking, the question arises of what an appropriate default value for the maximum number of Query/Response pairs in a block should be. This has two components:

   1.  What is the impact on performance of using different block sizes in the format file?

   2.  What is the impact on the size of the format file before and after compression?

The following table addresses the performance question, showing the impact on the performance of a C++ program converting "test.pcap" to C-DNS. File sizes are in MB, RSS is in kB, and user time is in seconds.

      +------------+-----------+--------+-----------+
      | Block Size | File Size |    RSS | User Time |
      +------------+-----------+--------+-----------+
      |      1,000 |    133.46 | 612.27 |     15.25 |
      |      5,000 |     89.85 | 676.82 |     14.99 |
      |     10,000 |     76.87 | 752.40 |     14.53 |
      |     20,000 |     67.86 | 750.75 |     14.49 |
      |     40,000 |     61.88 | 736.30 |     14.29 |
      |     80,000 |     58.08 | 694.16 |     14.28 |
      |    160,000 |     55.94 | 733.84 |     14.44 |
      |    320,000 |     54.41 | 799.20 |     13.97 |
      +------------+-----------+--------+-----------+

Therefore, increasing block size tends to increase maximum RSS a little, with no significant effect (if anything, a small reduction) on CPU consumption.

   The following table demonstrates the effect of increasing block size
   on output file size for different compressions.

      +------------+--------+-------+-------+-------+-------+-------+
      | Block Size |   None | snzip |   lz4 |  gzip |  zstd |    xz |
      +------------+--------+-------+-------+-------+-------+-------+
      |      1,000 | 133.46 | 90.52 | 90.03 | 74.65 | 44.78 | 25.63 |
      |      5,000 |  89.85 | 59.69 | 59.43 | 46.99 | 37.33 | 22.34 |
      |     10,000 |  76.87 | 50.39 | 50.28 | 38.94 | 33.62 | 21.09 |
      |     20,000 |  67.86 | 43.91 | 43.90 | 33.24 | 32.62 | 20.16 |
      |     40,000 |  61.88 | 39.63 | 39.69 | 29.44 | 28.72 | 19.52 |
      |     80,000 |  58.08 | 36.93 | 37.01 | 27.05 | 26.25 | 19.00 |
      |    160,000 |  55.94 | 35.10 | 35.06 | 25.44 | 24.56 | 19.63 |
      |    320,000 |  54.41 | 33.87 | 33.74 | 24.36 | 23.44 | 18.66 |
      +------------+--------+-------+-------+-------+-------+-------+

   There is obviously scope for tuning the default block size to the
   compression being employed, traffic characteristics, frequency of
   output file rollover, etc.  Using a strong compression scheme, block
   sizes over 10,000 Query/Response pairs would seem to offer limited
   improvements.
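
   As an illustration only, the diminishing returns can be quantified
   by expressing each xz-compressed size as a saving relative to the
   10,000-pair row.  The sketch below copies the xz column from the
   table above; the names XZ_SIZES_MB and relative_savings are
   hypothetical and are not part of C-DNS or of any tool.

```python
# xz-compressed output sizes in MB, copied from the xz column of the
# table above and keyed by block size (Query/Response pairs per block).
XZ_SIZES_MB = {
    1_000: 25.63,
    5_000: 22.34,
    10_000: 21.09,
    20_000: 20.16,
    40_000: 19.52,
    80_000: 19.00,
    160_000: 19.63,
    320_000: 18.66,
}


def relative_savings(sizes, baseline_block_size):
    """Return {block_size: percent saved relative to the baseline row}.

    A negative value means the file is larger than the baseline.
    """
    base = sizes[baseline_block_size]
    return {
        block_size: round(100.0 * (base - size_mb) / base, 1)
        for block_size, size_mb in sizes.items()
    }


if __name__ == "__main__":
    savings = relative_savings(XZ_SIZES_MB, 10_000)
    for block_size in sorted(savings):
        print(f"{block_size:>8,}  {savings[block_size]:6.1f}%")
```

   The same function can be applied to any other column of the table
   to judge a default block size for a weaker compression scheme.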

Appendix D. Data Fields for Traffic Regeneration

D.1. Recommended Fields for Traffic Regeneration

This section specifies the data fields that would need to be captured in order to perform the fullest PCAP traffic reconstruction for well-formed DNS messages that is possible with C-DNS.

   o  All data fields in the QueryResponse type except response-processing-data.

   o  All data fields in the QueryResponseSignature type except qr-type.

   o  All data fields in the RR TYPE.

D.2. Issues with Small Data Captures

At the other extreme, an interesting corner case arises when opting to perform captures with a smaller data set than that recommended above. The following list specifies a subset of the above data fields; if only these data fields are captured, then even a minimal traffic reconstruction is problematic because there is not enough information to determine if the Query/Response data item contained just a Query, just a Response, or a Query/Response pair.
   o  The following data fields from the QueryResponse type:

      *  time-offset

      *  client-address-index

      *  client-port

      *  transaction-id

      *  query-name-index

   o  The following data fields from the QueryResponseSignature type:

      *  server-address-index

      *  server-port

      *  qr-transport-flags

      *  query-classtype-index

   In this case, simply also capturing the qr-sig-flags will provide
   enough information to perform a minimal traffic reconstruction
   (assuming that suitable defaults for the remaining fields are
   provided).  Additionally, capturing response-delay, query-opcode, and
   response-rcode will avoid having to rely on potentially misleading
   defaults for these values and should result in a PCAP that represents
   the basics of the real traffic flow.
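
   A minimal sketch of why qr-sig-flags resolves the ambiguity is
   given below.  It assumes the bit layout given in the qr-sig-flags
   field definition in the main body of this document (bit 0 set if a
   Query was present, bit 1 set if a Response was present); the
   function name classify_item and the returned strings are
   hypothetical and purely illustrative.

```python
# Assumed bit positions, taken from the qr-sig-flags field definition:
# bit 0 = a Query was present, bit 1 = a Response was present.
HAS_QUERY = 1 << 0
HAS_RESPONSE = 1 << 1


def classify_item(qr_sig_flags):
    """Say what a Query/Response item contained.

    Returns None when qr-sig-flags was not captured, which is the
    problematic case described above: nothing else in the small data
    set distinguishes a Query, a Response, or a pair.
    """
    if qr_sig_flags is None:
        return None
    has_query = bool(qr_sig_flags & HAS_QUERY)
    has_response = bool(qr_sig_flags & HAS_RESPONSE)
    if has_query and has_response:
        return "query+response"
    if has_query:
        return "query"
    if has_response:
        return "response"
    return "neither"


if __name__ == "__main__":
    for flags in (None, 0b01, 0b10, 0b11):
        print(flags, classify_item(flags))
```

   Higher bits of qr-sig-flags are ignored by this sketch, so a value
   with additional flags set is classified on bits 0 and 1 alone.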

Acknowledgements

The authors wish to thank CZ.NIC -- in particular, Tomas Gavenciak -- for many useful discussions on binary formats, compression, and packet matching. Thanks also to Jan Vcelak and Wouter Wijngaards for discussions on name compression, and Paul Hoffman for a detailed review of this document and the C-DNS CDDL. Thanks also to Robert Edmonds, Jerry Lundstrom, Richard Gibson, Stephane Bortzmeyer, and many other members of DNSOP for review. Also, thanks to Miek Gieben for [mmark].

Authors' Addresses

   John Dickinson
   Sinodun IT
   Magdalen Centre
   Oxford Science Park
   Oxford OX4 4GA
   United Kingdom

   Email: jad@sinodun.com


   Jim Hague
   Sinodun IT
   Magdalen Centre
   Oxford Science Park
   Oxford OX4 4GA
   United Kingdom

   Email: jim@sinodun.com


   Sara Dickinson
   Sinodun IT
   Magdalen Centre
   Oxford Science Park
   Oxford OX4 4GA
   United Kingdom

   Email: sara@sinodun.com


   Terry Manderson
   ICANN
   12025 Waterfront Drive
   Suite 300
   Los Angeles, CA 90094-2536
   United States of America

   Email: terry.manderson@icann.org


   John Bond
   Wikimedia Foundation, Inc.
   1 Montgomery Street
   Suite 1600
   San Francisco, CA 94104
   United States of America

   Email: ietf-wikimedia@johnbond.org