Appendix A. CDDL
This appendix gives a CDDL [RFC8610] specification for C-DNS.

CDDL does not permit a range of allowed values to be specified for a bitfield. Where necessary, those values are given as a CDDL group, but the group definition is commented out to prevent CDDL tooling from warning that the group is unused.

; CDDL specification of the file format for C-DNS,
; which describes a collection of DNS messages and
; traffic metadata.

;
; The overall structure of a file.
;
File = [
    file-type-id  : "C-DNS",
    file-preamble : FilePreamble,
    file-blocks   : [* Block],
]

;
; The File Preamble.
;
FilePreamble = {
    major-format-version => 1,
    minor-format-version => 0,
    ? private-version    => uint,
    block-parameters     => [+ BlockParameters],
}
major-format-version = 0
minor-format-version = 1
private-version      = 2
block-parameters     = 3

BlockParameters = {
    storage-parameters      => StorageParameters,
    ? collection-parameters => CollectionParameters,
}
storage-parameters    = 0
collection-parameters = 1

IPv6PrefixLength = 1..128
IPv4PrefixLength = 1..32
OpcodeRange = 0..15
RRTypeRange = 0..65535

StorageParameters = {
    ticks-per-second             => uint,
    max-block-items              => uint,
    storage-hints                => StorageHints,
    opcodes                      => [+ OpcodeRange],
    rr-types                     => [+ RRTypeRange],
    ? storage-flags              => StorageFlags,
    ? client-address-prefix-ipv4 => IPv4PrefixLength,
    ? client-address-prefix-ipv6 => IPv6PrefixLength,
    ? server-address-prefix-ipv4 => IPv4PrefixLength,
    ? server-address-prefix-ipv6 => IPv6PrefixLength,
    ? sampling-method            => tstr,
    ? anonymization-method       => tstr,
}
ticks-per-second           = 0
max-block-items            = 1
storage-hints              = 2
opcodes                    = 3
rr-types                   = 4
storage-flags              = 5
client-address-prefix-ipv4 = 6
client-address-prefix-ipv6 = 7
server-address-prefix-ipv4 = 8
server-address-prefix-ipv6 = 9
sampling-method            = 10
anonymization-method       = 11

; A hint indicates whether the collection method will always omit
; the item from the file.
StorageHints = {
    query-response-hints           => QueryResponseHints,
    query-response-signature-hints =>
        QueryResponseSignatureHints,
    rr-hints                       => RRHints,
    other-data-hints               => OtherDataHints,
}
query-response-hints           = 0
query-response-signature-hints = 1
rr-hints                       = 2
other-data-hints               = 3

QueryResponseHintValues = &(
    time-offset                  : 0,
    client-address-index         : 1,
    client-port                  : 2,
    transaction-id               : 3,
    qr-signature-index           : 4,
    client-hoplimit              : 5,
    response-delay               : 6,
    query-name-index             : 7,
    query-size                   : 8,
    response-size                : 9,
    response-processing-data     : 10,
    query-question-sections      : 11, ; Second & subsequent
                                       ; Questions
    query-answer-sections        : 12,
    query-authority-sections     : 13,
    query-additional-sections    : 14,
    response-answer-sections     : 15,
    response-authority-sections  : 16,
    response-additional-sections : 17,
)
QueryResponseHints = uint .bits QueryResponseHintValues

QueryResponseSignatureHintValues = &(
    server-address-index  : 0,
    server-port           : 1,
    qr-transport-flags    : 2,
    qr-type               : 3,
    qr-sig-flags          : 4,
    query-opcode          : 5,
    qr-dns-flags          : 6,
    query-rcode           : 7,
    query-classtype-index : 8,
    query-qdcount         : 9,
    query-ancount         : 10,
    query-nscount         : 11,
    query-arcount         : 12,
    query-edns-version    : 13,
    query-udp-size        : 14,
    query-opt-rdata-index : 15,
    response-rcode        : 16,
)
QueryResponseSignatureHints =
    uint .bits QueryResponseSignatureHintValues

RRHintValues = &(
    ttl         : 0,
    rdata-index : 1,
)
RRHints = uint .bits RRHintValues

OtherDataHintValues = &(
    malformed-messages   : 0,
    address-event-counts : 1,
)
OtherDataHints = uint .bits OtherDataHintValues

StorageFlagValues = &(
    anonymized-data  : 0,
    sampled-data     : 1,
    normalized-names : 2,
)
StorageFlags = uint .bits StorageFlagValues

; Metadata about data collection
VLANIdRange = 1..4094

CollectionParameters = {
    ? query-timeout    => uint,             ; Milliseconds
    ? skew-timeout     => uint,             ; Microseconds
    ? snaplen          => uint,
    ? promisc          => bool,
    ? interfaces       => [+ tstr],
    ? server-addresses => [+ IPAddress],
    ? vlan-ids         => [+ VLANIdRange],
    ? filter           => tstr,
    ? generator-id     => tstr,
    ? host-id          => tstr,
}
query-timeout    = 0
skew-timeout     = 1
snaplen          = 2
promisc          = 3
interfaces       = 4
server-addresses = 5
vlan-ids         = 6
filter           = 7
generator-id     = 8
host-id          = 9

;
; Data in the file is stored in Blocks.
;
Block = {
    block-preamble         => BlockPreamble,
    ? block-statistics     => BlockStatistics, ; Much of this
                                               ; could be derived
    ? block-tables         => BlockTables,
    ? query-responses      => [+ QueryResponse],
    ? address-event-counts => [+ AddressEventCount],
    ? malformed-messages   => [+ MalformedMessage],
}
block-preamble       = 0
block-statistics     = 1
block-tables         = 2
query-responses      = 3
address-event-counts = 4
malformed-messages   = 5

;
; The (mandatory) preamble to a Block.
;
BlockPreamble = {
    ? earliest-time          => Timestamp,
    ? block-parameters-index => uint .default 0,
}
earliest-time          = 0
block-parameters-index = 1

; Ticks are sub-second intervals. The number of ticks in a second is
; file/block metadata. Signed and unsigned tick types are defined.
ticks = int
uticks = uint

Timestamp = [
    timestamp-secs  : uint,    ; POSIX time
    timestamp-ticks : uticks,
]

;
; Statistics about the Block contents.
;
BlockStatistics = {
    ? processed-messages  => uint,
    ? qr-data-items       => uint,
    ? unmatched-queries   => uint,
    ? unmatched-responses => uint,
    ? discarded-opcode    => uint,
    ? malformed-items     => uint,
}
processed-messages  = 0
qr-data-items       = 1
unmatched-queries   = 2
unmatched-responses = 3
discarded-opcode    = 4
malformed-items     = 5

;
; Tables of common data referenced from records in a Block.
;
BlockTables = {
    ? ip-address             => [+ IPAddress],
    ? classtype              => [+ ClassType],
    ? name-rdata             => [+ bstr], ; Holds both names
                                          ; and RDATA
    ? qr-sig                 => [+ QueryResponseSignature],
    ? QuestionTables,
    ? RRTables,
    ? malformed-message-data => [+ MalformedMessageData],
}
ip-address             = 0
classtype              = 1
name-rdata             = 2
qr-sig                 = 3
qlist                  = 4
qrr                    = 5
rrlist                 = 6
rr                     = 7
malformed-message-data = 8

IPv4Address = bstr .size (0..4)
IPv6Address = bstr .size (0..16)
IPAddress = IPv4Address / IPv6Address

ClassType = {
    type  => uint,
    class => uint,
}
type  = 0
class = 1

QueryResponseSignature = {
    ? server-address-index  => uint,
    ? server-port           => uint,
    ? qr-transport-flags    => QueryResponseTransportFlags,
    ? qr-type               => QueryResponseType,
    ? qr-sig-flags          => QueryResponseFlags,
    ? query-opcode          => uint,
    ? qr-dns-flags          => DNSFlags,
    ? query-rcode           => uint,
    ? query-classtype-index => uint,
    ? query-qdcount         => uint,
    ? query-ancount         => uint,
    ? query-nscount         => uint,
    ? query-arcount         => uint,
    ? query-edns-version    => uint,
    ? query-udp-size        => uint,
    ? query-opt-rdata-index => uint,
    ? response-rcode        => uint,
}
server-address-index  = 0
server-port           = 1
qr-transport-flags    = 2
qr-type               = 3
qr-sig-flags          = 4
query-opcode          = 5
qr-dns-flags          = 6
query-rcode           = 7
query-classtype-index = 8
query-qdcount         = 9
query-ancount         = 10
query-nscount         = 11
query-arcount         = 12
query-edns-version    = 13
query-udp-size        = 14
query-opt-rdata-index = 15
response-rcode        = 16

; Transport gives the values that may appear in bits 1..4 of
; TransportFlags. There is currently no way to express this in
; CDDL, so Transport is unused. To avoid confusion when used
; with CDDL tools, it is commented out.
;
; Transport = &(
;     udp          : 0,
;     tcp          : 1,
;     tls          : 2,
;     dtls         : 3,
;     https        : 4,
;     non-standard : 15,
; )

TransportFlagValues = &(
    ip-version : 0,     ; 0=IPv4, 1=IPv6
) / (1..4)
TransportFlags = uint .bits TransportFlagValues

QueryResponseTransportFlagValues = &(
    query-trailingdata : 5,
) / TransportFlagValues
QueryResponseTransportFlags =
    uint .bits QueryResponseTransportFlagValues

QueryResponseType = &(
    stub      : 0,
    client    : 1,
    resolver  : 2,
    auth      : 3,
    forwarder : 4,
    tool      : 5,
)

QueryResponseFlagValues = &(
    has-query                : 0,
    has-response             : 1,
    query-has-opt            : 2,
    response-has-opt         : 3,
    query-has-no-question    : 4,
    response-has-no-question : 5,
)
QueryResponseFlags = uint .bits QueryResponseFlagValues

DNSFlagValues = &(
    query-cd    : 0,
    query-ad    : 1,
    query-z     : 2,
    query-ra    : 3,
    query-rd    : 4,
    query-tc    : 5,
    query-aa    : 6,
    query-do    : 7,
    response-cd : 8,
    response-ad : 9,
    response-z  : 10,
    response-ra : 11,
    response-rd : 12,
    response-tc : 13,
    response-aa : 14,
)
DNSFlags = uint .bits DNSFlagValues

QuestionTables = (
    qlist => [+ QuestionList],
    qrr   => [+ Question]
)

QuestionList = [+ uint]          ; Index of Question

Question = {                     ; Second and subsequent Questions
    name-index      => uint,     ; Index to a name in the
                                 ; name-rdata table
    classtype-index => uint,
}
name-index      = 0
classtype-index = 1

RRTables = (
    rrlist => [+ RRList],
    rr     => [+ RR]
)

RRList = [+ uint]                ; Index of RR

RR = {
    name-index      => uint,     ; Index to a name in the
                                 ; name-rdata table
    classtype-index => uint,
    ? ttl           => uint,
    ? rdata-index   => uint,     ; Index to RDATA in the
                                 ; name-rdata table
}
; Other map key values already defined above.
ttl         = 2
rdata-index = 3

MalformedMessageData = {
    ? server-address-index => uint,
    ? server-port          => uint,
    ? mm-transport-flags   => TransportFlags,
    ? mm-payload           => bstr,
}
; Other map key values already defined above.
mm-transport-flags = 2
mm-payload         = 3

;
; A single Query/Response data item.
;
QueryResponse = {
    ? time-offset              => uticks,    ; Time offset from
                                             ; start of Block
    ? client-address-index     => uint,
    ? client-port              => uint,
    ? transaction-id           => uint,
    ? qr-signature-index       => uint,
    ? client-hoplimit          => uint,
    ? response-delay           => ticks,
    ? query-name-index         => uint,
    ? query-size               => uint,      ; DNS size of Query
    ? response-size            => uint,      ; DNS size of Response
    ? response-processing-data => ResponseProcessingData,
    ? query-extended           => QueryResponseExtended,
    ? response-extended        => QueryResponseExtended,
}
time-offset              = 0
client-address-index     = 1
client-port              = 2
transaction-id           = 3
qr-signature-index       = 4
client-hoplimit          = 5
response-delay           = 6
query-name-index         = 7
query-size               = 8
response-size            = 9
response-processing-data = 10
query-extended           = 11
response-extended        = 12

ResponseProcessingData = {
    ? bailiwick-index  => uint,
    ? processing-flags => ResponseProcessingFlags,
}
bailiwick-index  = 0
processing-flags = 1

ResponseProcessingFlagValues = &(
    from-cache : 0,
)
ResponseProcessingFlags = uint .bits ResponseProcessingFlagValues

QueryResponseExtended = {
    ? question-index   => uint,    ; Index of QuestionList
    ? answer-index     => uint,    ; Index of RRList
    ? authority-index  => uint,
    ? additional-index => uint,
}
question-index   = 0
answer-index     = 1
authority-index  = 2
additional-index = 3

;
; Address event data.
;
AddressEventCount = {
    ae-type              => &AddressEventType,
    ? ae-code            => uint,
    ae-address-index     => uint,
    ? ae-transport-flags => TransportFlags,
    ae-count             => uint,
}
ae-type            = 0
ae-code            = 1
ae-address-index   = 2
ae-transport-flags = 3
ae-count           = 4

AddressEventType = (
    tcp-reset               : 0,
    icmp-time-exceeded      : 1,
    icmp-dest-unreachable   : 2,
    icmpv6-time-exceeded    : 3,
    icmpv6-dest-unreachable : 4,
    icmpv6-packet-too-big   : 5,
)

;
; Malformed messages.
;
MalformedMessage = {
    ? time-offset          => uticks,    ; Time offset from
                                         ; start of Block
    ? client-address-index => uint,
    ? client-port          => uint,
    ? message-data-index   => uint,
}
; Other map key values already defined above.
message-data-index = 3
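
As an illustration of how the ".bits" definitions above map onto integer values, the following Python sketch decodes a qr-sig-flags value and splits a qr-transport-flags value into its parts. It is not part of the specification. The bit positions and Transport values are copied from the CDDL; the function names are hypothetical, and the sketch assumes that the Transport value occupies bits 1..4 with its least significant bit at bit 1.

```python
# Bit positions copied from QueryResponseFlagValues in the CDDL.
QUERY_RESPONSE_FLAG_BITS = {
    "has-query": 0,
    "has-response": 1,
    "query-has-opt": 2,
    "response-has-opt": 3,
    "query-has-no-question": 4,
    "response-has-no-question": 5,
}

# Values copied from the commented-out Transport group in the CDDL.
TRANSPORT_NAMES = {
    0: "udp", 1: "tcp", 2: "tls", 3: "dtls", 4: "https", 15: "non-standard",
}


def decode_bits(value, bit_names):
    """Return the set of names whose bit is set in value."""
    return {name for name, bit in bit_names.items() if value & (1 << bit)}


def encode_bits(names, bit_names):
    """Return the integer with the bit for each given name set."""
    value = 0
    for name in names:
        value |= 1 << bit_names[name]
    return value


def split_transport_flags(value):
    """Split a qr-transport-flags value into its three parts.

    Assumption: bit 0 is ip-version, bits 1..4 hold the Transport
    value, and bit 5 is query-trailingdata.
    """
    return {
        "ip-version": 6 if value & 1 else 4,
        "transport": TRANSPORT_NAMES.get((value >> 1) & 0xF, "unassigned"),
        "query-trailingdata": bool(value & (1 << 5)),
    }
```

For example, a qr-sig-flags value of 3 decodes to has-query and has-response, which is the information a reader needs to tell a Query/Response pair from a lone Query or a lone Response.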
Appendix B. DNS Name Compression Example
The basic algorithm, which follows the guidance in [RFC1035], is simply to collect each name, and the offset in the packet at which it starts, during packet construction. As each name is added, it is offered to each of the collected names in order of collection, starting from the first name. If (1) labels at the end of the name can be replaced with a reference back to part (or all) of the earlier name and (2) the uncompressed part of the name is shorter than any compression already found, the earlier name is noted as the compression target for the name.

The following tables illustrate the step-by-step process of adding names and performing name compression.

In an example packet, the first name added is foo.example, which cannot be compressed.

+---+-------------+--------------+--------------------+
| N | Name        | Uncompressed | Compression Target |
+---+-------------+--------------+--------------------+
| 1 | foo.example | foo.example  | None               |
+---+-------------+--------------+--------------------+

The next name added is bar.example. This is matched against foo.example. The example part of this can be used as a compression target, with the remaining uncompressed part of the name being bar.

+---+-------------+--------------+-----------------------+
| N | Name        | Uncompressed | Compression Target    |
+---+-------------+--------------+-----------------------+
| 1 | foo.example | foo.example  | None                  |
| 2 | bar.example | bar          | 1 + offset to example |
+---+-------------+--------------+-----------------------+

The third name added is www.bar.example. This is first matched against foo.example, and as before this is recorded as a compression target, with the remaining uncompressed part of the name being www.bar. It is then matched against the second name, which again can be a compression target. Because the remaining uncompressed part of the name is www, this is an improved compression, and so it is adopted.

+---+-----------------+--------------+-----------------------+
| N | Name            | Uncompressed | Compression Target    |
+---+-----------------+--------------+-----------------------+
| 1 | foo.example     | foo.example  | None                  |
| 2 | bar.example     | bar          | 1 + offset to example |
| 3 | www.bar.example | www          | 2                     |
+---+-----------------+--------------+-----------------------+
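
The worked example above can be reproduced with a short Python sketch. This is a simplified illustration rather than a reference implementation: it works on labels instead of packet byte offsets, compares labels exactly, and reports a compression target as a hypothetical pair (earlier name number, count of labels skipped in that name). The class and function names are not taken from any existing library. The loop also stops once a name is fully compressed.

```python
def split_labels(name):
    """Split a dotted name into its labels, ignoring a trailing dot."""
    return [label for label in name.split(".") if label]


def common_suffix_len(a, b):
    """Count how many trailing labels two label lists share."""
    count = 0
    while count < len(a) and count < len(b) and a[-1 - count] == b[-1 - count]:
        count += 1
    return count


class NameCompressor:
    """Collects names in order and finds a compression target for each."""

    def __init__(self):
        self.names = []  # label lists, in order of collection

    def add(self, name):
        """Add a name; return (uncompressed part, target or None).

        The target is (N, skipped): N is the 1-based number of the
        earlier name, skipped is how many of its leading labels lie
        before the shared suffix.
        """
        labels = split_labels(name)
        best_uncompressed = len(labels)
        target = None
        for number, earlier in enumerate(self.names, start=1):
            if best_uncompressed == 0:
                break  # already perfectly compressed
            shared = common_suffix_len(labels, earlier)
            remaining = len(labels) - shared
            # Adopt the earlier name only if it improves on the best so far.
            if shared > 0 and remaining < best_uncompressed:
                best_uncompressed = remaining
                target = (number, len(earlier) - shared)
        self.names.append(labels)
        return ".".join(labels[:best_uncompressed]), target
```

Adding foo.example, bar.example, and www.bar.example in that order yields the uncompressed parts foo.example, bar, and www, with targets None, (1, 1), and (2, 0), matching the three rows of the table.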
As an optimization, if a name is already perfectly compressed (in other words, the uncompressed part of the name is empty), then no further names will be considered for compression.

B.1. NSD Compression Algorithm
Using the above basic algorithm, the packet lengths of Responses generated by the Name Server Daemon (NSD) [NSD] can be matched almost exactly. At the time of writing, a tiny number (<0.01%) of the reconstructed packets had incorrect lengths.

B.2. Knot Authoritative Compression Algorithm
The Knot Authoritative name server [Knot] uses different compression behavior, which is the result of internal optimization designed to balance runtime speed with compression size gains. In brief, and omitting complications, Knot Authoritative will only consider the QNAME and names in the immediately preceding RR section in an RRSET as compression targets.

A set of smart heuristics as described below can be implemented to mimic this. While not perfect, these heuristics produce output that matches nearly, but not quite, as well as the match obtained for NSD. The heuristics are as follows:

1.  A match is only perfect if the name is completely compressed AND the TYPE of the section in which the name occurs matches the TYPE of the name used as the compression target.

2.  If the name occurs in RDATA:

    *  If the compression target name is in a Query, then only the first RR in an RRSET can use that name as a compression target.

    *  The compression target name MUST be in RDATA.

    *  The name section TYPE must match the compression target name section TYPE.

    *  The compression target name MUST be in the immediately preceding RR in the RRSET.

Using this algorithm, less than 0.1% of the reconstructed packets had incorrect lengths.
B.3. Observed Differences
In sample traffic collected on a root name server, around 2-4% of Responses generated by Knot had different packet lengths than those produced by NSD.

Appendix C. Comparison of Binary Formats
Several binary serialization formats were considered. For completeness, they were also compared to JSON.

o  Apache Avro [Avro]. Data is stored according to a predefined schema. The schema itself is always included in the data file. Data can therefore be stored untagged, for a smaller serialization size, and be written and read by an Avro library.

   *  At the time of writing, Avro libraries are available for C, C++, C#, Java, Python, Ruby, and PHP. Optionally, tools are available for C++, Java, and C# to generate code for encoding and decoding.

o  Google Protocol Buffers [Protocol-Buffers]. Data is stored according to a predefined schema. The schema is used by a generator to generate code for encoding and decoding the data. Data can therefore be stored untagged, for a smaller serialization size. The schema is not stored with the data, so unlike Avro, it cannot be read with a generic library.

   *  Code must be generated for a particular data schema to read and write data using that schema. At the time of writing, the Google code generator can generate code for encoding and decoding a schema for C++, Go, Java, Python, Ruby, C#, Objective-C, JavaScript, and PHP.

o  CBOR [RFC7049]. This serialization format is comparable to JSON but with a binary representation. It does not use a predefined schema, so data is always stored tagged. However, CBOR data schemas can be described using CDDL [RFC8610], and tools exist to verify that data files conform to the schema.

   *  CBOR is a simple format and is simple to implement. At the time of writing, the CBOR website lists implementations for 16 languages.

Avro and Protocol Buffers both allow storage of untagged data, but because they rely on the data schema for this, their implementation is considerably more complex than CBOR. Using Avro or Protocol Buffers in an unsupported environment would require notably greater development effort compared to CBOR.

A test program was written that reads input from a PCAP file and writes output using one of two basic structures: either a simple structure, where each Query/Response pair is represented in a single record entry, or the C-DNS block structure.

The resulting output files were then compressed using a variety of common general-purpose lossless compression tools to explore the compressibility of the formats. The compression tools employed were:

o  snzip [snzip]. A command-line compression tool based on the Google Snappy library [snappy].

o  lz4 [lz4]. The command-line compression tool from the reference C LZ4 implementation.

o  gzip [gzip]. The ubiquitous GNU zip tool.

o  zstd [zstd]. Compression using the Zstandard algorithm.

o  xz [xz]. A popular compression tool noted for high compression.

In all cases, the compression tools were run using their default settings.

Note that this document does not mandate the use of compression, nor any particular compression scheme, but it anticipates that in practice output data will be subject to general-purpose compression, and so this should be taken into consideration.

"test.pcap", a 662 MB capture of sample data from a root instance, was used for the comparison. The following table shows the formatted size and size after compression (abbreviated to Comp. in the table headers), together with the task Resident Set Size (RSS) and the user time taken by the compression. File sizes are in MB, RSS is in kB, and user time is in seconds.

+-------------+-----------+-------+------------+-------+-----------+
| Format      | File Size | Comp. | Comp. Size |   RSS | User Time |
+-------------+-----------+-------+------------+-------+-----------+
| PCAP        |    661.87 | snzip |     212.48 |  2696 |      1.26 |
|             |           | lz4   |     181.58 |  6336 |      1.35 |
|             |           | gzip  |     153.46 |  1428 |     18.20 |
|             |           | zstd  |      87.07 |  3544 |      4.27 |
|             |           | xz    |      49.09 | 97416 |    160.79 |
|             |           |       |            |       |           |
| JSON simple |   4113.92 | snzip |     603.78 |  2656 |      5.72 |
|             |           | lz4   |     386.42 |  5636 |      5.25 |
|             |           | gzip  |     271.11 |  1492 |     73.00 |
|             |           | zstd  |     133.43 |  3284 |      8.68 |
|             |           | xz    |      51.98 | 97412 |    600.74 |
|             |           |       |            |       |           |
| Avro simple |    640.45 | snzip |     148.98 |  2656 |      0.90 |
|             |           | lz4   |     111.92 |  5828 |      0.99 |
|             |           | gzip  |     103.07 |  1540 |     11.52 |
|             |           | zstd  |      49.08 |  3524 |      2.50 |
|             |           | xz    |      22.87 | 97308 |     90.34 |
|             |           |       |            |       |           |
| CBOR simple |    764.82 | snzip |     164.57 |  2664 |      1.11 |
|             |           | lz4   |     120.98 |  5892 |      1.13 |
|             |           | gzip  |     110.61 |  1428 |     12.88 |
|             |           | zstd  |      54.14 |  3224 |      2.77 |
|             |           | xz    |      23.43 | 97276 |    111.48 |
|             |           |       |            |       |           |
| PBuf simple |    749.51 | snzip |     167.16 |  2660 |      1.08 |
|             |           | lz4   |     123.09 |  5824 |      1.14 |
|             |           | gzip  |     112.05 |  1424 |     12.75 |
|             |           | zstd  |      53.39 |  3388 |      2.76 |
|             |           | xz    |      23.99 | 97348 |    106.47 |
|             |           |       |            |       |           |
| JSON block  |    519.77 | snzip |     106.12 |  2812 |      0.93 |
|             |           | lz4   |     104.34 |  6080 |      0.97 |
|             |           | gzip  |      57.97 |  1604 |     12.70 |
|             |           | zstd  |      61.51 |  3396 |      3.45 |
|             |           | xz    |      27.67 | 97524 |    169.10 |
|             |           |       |            |       |           |
| Avro block  |     60.45 | snzip |      48.38 |  2688 |      0.20 |
|             |           | lz4   |      48.78 |  8540 |      0.22 |
|             |           | gzip  |      39.62 |  1576 |      2.92 |
|             |           | zstd  |      29.63 |  3612 |      1.25 |
|             |           | xz    |      18.28 | 97564 |     25.81 |
|             |           |       |            |       |           |
| CBOR block  |     75.25 | snzip |      53.27 |  2684 |      0.24 |
|             |           | lz4   |      51.88 |  8008 |      0.28 |
|             |           | gzip  |      41.17 |  1548 |      4.36 |
|             |           | zstd  |      30.61 |  3476 |      1.48 |
|             |           | xz    |      18.15 | 97556 |     38.78 |
|             |           |       |            |       |           |
| PBuf block  |     67.98 | snzip |      51.10 |  2636 |      0.24 |
|             |           | lz4   |      52.39 |  8304 |      0.24 |
|             |           | gzip  |      40.19 |  1520 |      3.63 |
|             |           | zstd  |      31.61 |  3576 |      1.40 |
|             |           | xz    |      17.94 | 97440 |     33.99 |
+-------------+-----------+-------+------------+-------+-----------+

The above results are discussed in the following sections.

C.1. Comparison with Full PCAP Files
An important first consideration is whether moving away from PCAP offers significant benefits.

The simple binary formats are typically larger than PCAP, even though they omit some information such as Ethernet Media Access Control (MAC) addresses. However, they require less CPU to compress than PCAP, and the resulting compressed files are smaller than compressed PCAP.

C.2. Simple versus Block Coding
The intention of the block coding is to perform data deduplication on Query/Response records within the block. The simple and block formats shown above store exactly the same information for each Query/Response record. This information is parsed from the DNS traffic in the input PCAP file, and in all cases each field has an identifier and the field data is typed.

The data deduplication in the block formats gives an order-of-magnitude reduction in format file size compared with the simple formats. As would be expected, the compression tools are able to find and exploit a lot of this duplication, but as the deduplication process uses knowledge of DNS traffic, it is able to retain a size advantage. This advantage reduces as stronger compression is applied, as again would be expected, but even with the strongest compression applied the block-formatted data remains around 75% of the size of the simple format and its compression requires roughly a third of the CPU time.
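
The deduplication idea can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the encoder used for the measurements: it handles only two fields, takes records as hypothetical (client address, query name) pairs, and uses the map key names from Appendix A as plain strings rather than their integer values. Each distinct value is stored once in a block table, and records hold only the index of the value.

```python
class BlockTable:
    """Stores each distinct value once and hands out its index."""

    def __init__(self):
        self.items = []   # values in order of first appearance
        self._index = {}  # value -> position in self.items

    def intern(self, value):
        """Return the index of value, adding it to the table if new."""
        index = self._index.get(value)
        if index is None:
            index = len(self.items)
            self._index[value] = index
            self.items.append(value)
        return index


def encode_block(records):
    """Turn (client_address, query_name) pairs into a deduplicated block.

    Both inputs are assumed to be bytes values. The result mirrors,
    in a much reduced form, the split between block tables and
    Query/Response items.
    """
    addresses = BlockTable()
    names = BlockTable()
    items = []
    for client_address, query_name in records:
        items.append({
            "client-address-index": addresses.intern(client_address),
            "query-name-index": names.intern(query_name),
        })
    return {
        "ip-address": addresses.items,
        "name-rdata": names.items,
        "query-responses": items,
    }
```

With three records from two clients that ask for the same name, the name is stored once and the address table holds two entries; the saving grows with the amount of repetition in the block.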
C.3. Binary versus Text Formats
Text data formats offer many advantages over binary formats, particularly in the areas of ad hoc data inspection and extraction. It was therefore felt worthwhile to carry out a direct comparison, implementing JSON versions of the simple and block formats.

Concentrating on JSON block format, the format files produced are a significant fraction of an order of magnitude larger than binary formats. The impact on file size after compression is as might be expected from that starting point; the stronger compression produces files that are 150% of the size of similarly compressed binary format and require over 4x more CPU to compress.

C.4. Performance
Concentrating again on the block formats, all three produce format files that are close to an order of magnitude smaller than the original "test.pcap" file. CBOR produces the largest files and Avro the smallest, 20% smaller than CBOR.

However, once compression is taken into account, the size difference narrows. At medium compression (with gzip), the size difference is 4%. Using strong compression (with xz), the difference reduces to 2%, with Avro the largest and Protocol Buffers the smallest, although CBOR and Protocol Buffers require slightly more compression CPU.

The measurements presented above do not include data on the CPU required to generate the format files. Measurements indicate that writing Avro requires 10% more CPU than CBOR or Protocol Buffers. It appears, therefore, that Avro's advantage in compression CPU usage is probably offset by a larger CPU requirement in writing Avro.

C.5. Conclusions
The above assessments lead us to the choice of a binary format file using blocking. As noted previously, this document anticipates that output data will be subject to compression. There is no compelling case for one particular binary serialization format in terms of either final file size or machine resources consumed, so the choice must be largely based on other factors. CBOR was therefore chosen as the binary serialization format for the reasons listed in Section 5.
C.6. Block Size Choice
Given the choice of a CBOR format using blocking, the question arises of what an appropriate default value for the maximum number of Query/Response pairs in a block should be. This has two components:

1.  What is the impact on performance of using different block sizes in the format file?

2.  What is the impact on the size of the format file before and after compression?

The following table addresses the performance question, showing the impact on the performance of a C++ program converting "test.pcap" to C-DNS. File sizes are in MB, RSS is in kB, and user time is in seconds.

+------------+-----------+--------+-----------+
| Block Size | File Size |    RSS | User Time |
+------------+-----------+--------+-----------+
|      1,000 |    133.46 | 612.27 |     15.25 |
|      5,000 |     89.85 | 676.82 |     14.99 |
|     10,000 |     76.87 | 752.40 |     14.53 |
|     20,000 |     67.86 | 750.75 |     14.49 |
|     40,000 |     61.88 | 736.30 |     14.29 |
|     80,000 |     58.08 | 694.16 |     14.28 |
|    160,000 |     55.94 | 733.84 |     14.44 |
|    320,000 |     54.41 | 799.20 |     13.97 |
+------------+-----------+--------+-----------+

Therefore, increasing block size tends to increase maximum RSS a little, with no significant effect (if anything, a small reduction) on CPU consumption.

The following table demonstrates the effect of increasing block size on output file size for different compressions.

+------------+--------+-------+-------+-------+-------+-------+
| Block Size |   None | snzip |   lz4 |  gzip |  zstd |    xz |
+------------+--------+-------+-------+-------+-------+-------+
|      1,000 | 133.46 | 90.52 | 90.03 | 74.65 | 44.78 | 25.63 |
|      5,000 |  89.85 | 59.69 | 59.43 | 46.99 | 37.33 | 22.34 |
|     10,000 |  76.87 | 50.39 | 50.28 | 38.94 | 33.62 | 21.09 |
|     20,000 |  67.86 | 43.91 | 43.90 | 33.24 | 32.62 | 20.16 |
|     40,000 |  61.88 | 39.63 | 39.69 | 29.44 | 28.72 | 19.52 |
|     80,000 |  58.08 | 36.93 | 37.01 | 27.05 | 26.25 | 19.00 |
|    160,000 |  55.94 | 35.10 | 35.06 | 25.44 | 24.56 | 19.63 |
|    320,000 |  54.41 | 33.87 | 33.74 | 24.36 | 23.44 | 18.66 |
+------------+--------+-------+-------+-------+-------+-------+

There is obviously scope for tuning the default block size to the compression being employed, traffic characteristics, frequency of output file rollover, etc. Using a strong compression scheme, block sizes over 10,000 Query/Response pairs would seem to offer limited improvements.

Appendix D. Data Fields for Traffic Regeneration
D.1. Recommended Fields for Traffic Regeneration
This section specifies the data fields that would need to be captured in order to perform the fullest PCAP traffic reconstruction for well-formed DNS messages that is possible with C-DNS.

o  All data fields in the QueryResponse type except response-processing-data.

o  All data fields in the QueryResponseSignature type except qr-type.

o  All data fields in the RR type.

D.2. Issues with Small Data Captures
At the other extreme, an interesting corner case arises when opting to perform captures with a smaller data set than that recommended above. The following list specifies a subset of the above data fields; if only these data fields are captured, then even a minimal traffic reconstruction is problematic because there is not enough information to determine if the Query/Response data item contained just a Query, just a Response, or a Query/Response pair.

o  The following data fields from the QueryResponse type:

   *  time-offset

   *  client-address-index

   *  client-port

   *  transaction-id

   *  query-name-index

o  The following data fields from the QueryResponseSignature type:

   *  server-address-index

   *  server-port

   *  qr-transport-flags

   *  query-classtype-index

In this case, simply also capturing the qr-sig-flags will provide enough information to perform a minimal traffic reconstruction (assuming that suitable defaults for the remaining fields are provided). Additionally, capturing response-delay, query-opcode, and response-rcode will avoid having to rely on potentially misleading defaults for these values and should result in a PCAP that represents the basics of the real traffic flow.

Acknowledgements
The authors wish to thank CZ.NIC -- in particular, Tomas Gavenciak -- for many useful discussions on binary formats, compression, and packet matching. Thanks also to Jan Vcelak and Wouter Wijngaards for discussions on name compression, and Paul Hoffman for a detailed review of this document and the C-DNS CDDL. Thanks also to Robert Edmonds, Jerry Lundstrom, Richard Gibson, Stephane Bortzmeyer, and many other members of DNSOP for review. Also, thanks to Miek Gieben for [mmark].
Authors' Addresses
John Dickinson
Sinodun IT
Magdalen Centre
Oxford Science Park
Oxford OX4 4GA
United Kingdom

Email: jad@sinodun.com

Jim Hague
Sinodun IT
Magdalen Centre
Oxford Science Park
Oxford OX4 4GA
United Kingdom

Email: jim@sinodun.com

Sara Dickinson
Sinodun IT
Magdalen Centre
Oxford Science Park
Oxford OX4 4GA
United Kingdom

Email: sara@sinodun.com

Terry Manderson
ICANN
12025 Waterfront Drive
Suite 300
Los Angeles, CA 90094-2536
United States of America

Email: terry.manderson@icann.org

John Bond
Wikimedia Foundation, Inc.
1 Montgomery Street
Suite 1600
San Francisco, CA 94104
United States of America

Email: ietf-wikimedia@johnbond.org