3. CDNI Logging File
3.1. Rules
This specification uses the Augmented Backus-Naur Form (ABNF) notation and core rules of [RFC5234]. In particular, the present document uses the following rules from [RFC5234]:

   CR             = %x0D ; carriage return
   ALPHA          = %x41-5A / %x61-7A ; A-Z / a-z
   DIGIT          = %x30-39 ; 0-9
   DQUOTE         = %x22 ; " (Double Quote)
   CRLF           = CR LF ; Internet standard newline
   HEXDIG         = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
   HTAB           = %x09 ; horizontal tab
   LF             = %x0A ; linefeed
   VCHAR          = %x21-7E ; visible (printing) characters
   OCTET          = %x00-FF ; 8 bits of data

The present document also uses the following rules from [RFC3986] and [RFC3339]:

   host           = as specified in Section 3.2.2 of [RFC3986].
   IPv4address    = as specified in Section 3.2.2 of [RFC3986].
   IPv6address    = as specified in Section 3.2.2 of [RFC3986].
   partial-time   = as specified in Section 5.6 of [RFC3339].

The present document also defines the following additional rules:

   ADDRESS        = IPv4address / IPv6address
   ALPHANUM       = ALPHA / DIGIT
   DATE           = 4DIGIT "-" 2DIGIT "-" 2DIGIT
                    ; Dates are encoded as "full-date" specified in
                    ; [RFC3339].
   DEC            = 1*DIGIT ["." 1*DIGIT]
   NAMEFORMAT     = ALPHANUM *(ALPHANUM / "_" / "-")
   QSTRING        = DQUOTE *(NDQUOTE / PCT-ENCODED) DQUOTE
   NDQUOTE        = %x20-21 / %x23-24 / %x26-7E / UTF8-2 / UTF8-3 / UTF8-4
                    ; whereby a DQUOTE is conveyed inside a QSTRING
                    ; unambiguously by escaping it with PCT-ENCODED.
   PCT-ENCODED    = "%" HEXDIG HEXDIG
                    ; percent encoding is used for escaping octets that
                    ; might be possible in HTTP headers such as bare CR,
                    ; bare LF, CR LF, HTAB, SP, or null. These octets are
                    ; rendered with percent encoding in ABNF as specified
                    ; by [RFC3986] in order to avoid considering them as
                    ; separators for the Logging Records.
   NHTABSTRING    = 1*(SP / VCHAR)
   TIME           = partial-time
   USER-COMMENT   = *(SP / VCHAR / UTF8-2 / UTF8-3 / UTF8-4)

3.2. CDNI Logging File Structure
As defined in Section 1.1, a CDNI Logging Field is an atomic Logging information element, a CDNI Logging Record is a collection of CDNI Logging fields containing all logging information corresponding to a single logging event, and a CDNI Logging File contains a collection of CDNI Logging Records. This structure is illustrated in Figure 3. The use of a file structure for transfer of CDNI Logging information is selected since this is the most common practice today for exchange of Logging information within and across CDNs.
+----------------------------------------------------------+
|CDNI Logging File                                         |
|                                                          |
| #Directive 1                                             |
| #Directive 2                                             |
| ...                                                      |
| #Directive P                                             |
|                                                          |
| +------------------------------------------------------+ |
| |CDNI Logging Record 1                                 | |
| | +-------------+ +-------------+     +-------------+  | |
| | |CDNI Logging | |CDNI Logging | ... |CDNI Logging |  | |
| | |  Field 1    | |  Field 2    |     |  Field N    |  | |
| | +-------------+ +-------------+     +-------------+  | |
| +------------------------------------------------------+ |
|                                                          |
| +------------------------------------------------------+ |
| |CDNI Logging Record 2                                 | |
| | +-------------+ +-------------+     +-------------+  | |
| | |CDNI Logging | |CDNI Logging | ... |CDNI Logging |  | |
| | |  Field 1    | |  Field 2    |     |  Field N    |  | |
| | +-------------+ +-------------+     +-------------+  | |
| +------------------------------------------------------+ |
|                                                          |
| ...                                                      |
|                                                          |
| #Directive P+1                                           |
|                                                          |
| ...                                                      |
|                                                          |
| +------------------------------------------------------+ |
| |CDNI Logging Record M                                 | |
| | +-------------+ +-------------+     +-------------+  | |
| | |CDNI Logging | |CDNI Logging | ... |CDNI Logging |  | |
| | |  Field 1    | |  Field 2    |     |  Field N    |  | |
| | +-------------+ +-------------+     +-------------+  | |
| +------------------------------------------------------+ |
|                                                          |
|                                                          |
| #Directive P+Q                                           |
+----------------------------------------------------------+

            Figure 3: Structure of Logging Files
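The layout shown in Figure 3 can be sketched as alternating groups of directive lines (those with a leading "#") and record lines. The following Python fragment is illustrative only: the function name group_lines and the list-of-pairs representation are assumptions made for this example and are not defined by this specification. It assumes the CRLF terminators have already been stripped from each line.

```python
def group_lines(lines):
    """Group log lines into (directives, records) pairs, in file order.

    Hypothetical helper: each pair holds one group of directive lines
    followed by the (possibly empty) group of record lines after it.
    """
    groups = []
    for line in lines:
        if line.startswith("#"):
            # A directive at the very start, or one that follows record
            # lines, opens a new directive group.
            if not groups or groups[-1][1]:
                groups.append(([], []))
            groups[-1][0].append(line)
        else:
            if not groups:
                raise ValueError("file does not start with a directive line")
            groups[-1][1].append(line)
    return groups
```

For example, a file whose lines are two directives, one record, and then two more directives yields two pairs, the second of which has an empty record group.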
The CDNI Logging File format is inspired by the W3C Extended Log File Format [ELF]. However, it is fully specified by the present document. Where the present document differs from the W3C Extended Log File Format, an implementation of the CDNI Logging interface MUST comply with the present document. The W3C Extended Log File Format was used as a starting point, reused where possible, and expanded when necessary.

Using a format that resembles the W3C Extended Log File Format is intended to keep the CDNI logging format close to the intra-CDN Logging information format commonly used in CDNs today, thereby minimizing systematic translation at the CDN/CDNI boundary.

A CDNI Logging File MUST contain a sequence of lines containing US-ASCII characters [CHAR_SET] terminated by CRLF. Each line of a CDNI Logging File MUST contain either a directive or a CDNI Logging Record.

Directives record information about the CDNI Logging process itself. Lines containing directives MUST begin with the "#" character. Directives are specified in Section 3.3.

Logging Records provide actual details of the logged event. Logging Records are specified in Section 3.4.

The CDNI Logging File has a specific structure. It always starts with a directive line, and the first directive it contains MUST be the version.

Together, the directive lines form a group that contains at least one directive line. Each directives group is followed by a group of Logging Records. The records group contains zero or more actual Logging Record lines about the event being logged. A record line consists of the values corresponding to all or a subset of the possible Logging fields defined within the scope of the record-type directive. These values MUST appear in the order defined by the fields directive.

Note that future extensions MUST be compliant with the previous description. The following rules depict the structure of a CDNILOGFILE as currently defined by the record-type "cdni_http_request_v1":
   DIRLINE        = "#" directive CRLF
   DIRGROUP       = 1*DIRLINE
   RECLINE        = <any subset of record values that match what is
                    expected according to the fields directive within
                    the immediately preceding DIRGROUP>
   RECGROUP       = *RECLINE
   CDNILOGFILE    = 1*(DIRGROUP RECGROUP)

3.3. CDNI Logging Directives
A CDNI Logging directive line contains the directive name followed by ":" HTAB and the directive value.

Directive names MUST be of the format NAMEFORMAT. All directive names MUST be registered in the "CDNI Logging Directives Names" registry. Directive names are case-insensitive as per the basic ABNF ([RFC5234]). Unknown directives MUST be ignored.

Directive values can have various formats. All possible directive values for the record-type "cdni_http_request_v1" are further detailed in this section.

The following ABNF shows the structure of a directive and strictly enumerates the directive values presently defined in version "cdni/1.0" of the CDNI Logging File:

   directive      = DIRNAME ":" HTAB DIRVAL
   DIRNAME        = NAMEFORMAT
   FIENAME        = <any CDNI Logging field name registered in the CDNI
                    Logging Field Names registry (Section 6.4) that is
                    valid for the record type specified in the
                    record-type directive.>
   DIRVAL         = NHTABSTRING / QSTRING / host / USER-COMMENT /
                    FIENAME *(HTAB FIENAME) / 64HEXDIG
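As an illustration of the directive structure described above, the following Python sketch splits one directive line into its name and value. The helper name parse_directive_line is hypothetical. The sketch assumes the trailing CRLF has already been removed, and it neither checks the name against the registry nor validates the value against the format expected for that directive.

```python
import re

# NAMEFORMAT = ALPHANUM *(ALPHANUM / "_" / "-")
_NAMEFORMAT = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def parse_directive_line(line):
    """Split a directive line into (lower-cased name, raw value).

    Hypothetical helper; `line` is expected without its trailing CRLF.
    """
    if not line.startswith("#"):
        raise ValueError("directive lines begin with '#'")
    # The name and the value are separated by ":" followed by HTAB.
    name, sep, value = line[1:].partition(":\t")
    if not sep:
        raise ValueError("expected ':' HTAB between name and value")
    if not _NAMEFORMAT.fullmatch(name):
        raise ValueError("directive name does not match NAMEFORMAT")
    # Directive names are case-insensitive, so normalize for lookups.
    return name.lower(), value
```

A caller would typically look the returned name up in a table of supported directives and skip names it does not recognize, since unknown directives are to be ignored.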
An implementation of the CDNI Logging interface MUST support all of the following directives, listed below by their directive name:

o Version:

   * Format: NHTABSTRING
   * Directive value: Indicates the version of the CDNI Logging File format. The entity transmitting a CDNI Logging File as per the present document MUST set the value to "cdni/1.0". In the future, other versions of the CDNI Logging File might be specified; those would use a value different from "cdni/1.0", which allows the entity receiving the CDNI Logging File to identify the corresponding version. CDNI Logging File versions are case-insensitive as per the basic ABNF ([RFC5234]).
   * Occurrence: There MUST be one and only one instance of this directive per CDNI Logging File. It MUST be the first line of the CDNI Logging File.
   * Example: "version: HTAB cdni/1.0".

o UUID:

   * Format: NHTABSTRING
   * Directive value: This is a Uniform Resource Name (URN) from the Universally Unique IDentifier (UUID) URN namespace specified in [RFC4122]. The UUID contained in the URN uniquely identifies the CDNI Logging File.
   * Occurrence: There MUST be one and only one instance of this directive per CDNI Logging File.
   * Example: "UUID: HTAB NHTABSTRING".

o Claimed-origin:

   * Format: host
   * Directive value: This contains the claimed identification of the entity transmitting the CDNI Logging File (e.g., the host in a dCDN supporting the CDNI Logging interface) or the entity responsible for transmitting the CDNI Logging File (e.g., the dCDN).
   * Occurrence: There MUST be zero or exactly one instance of this directive per CDNI Logging File. This directive MAY be included by the dCDN. It MUST NOT be included or modified by the uCDN.
   * Example: "claimed-origin: HTAB host".

o Established-origin:

   * Format: host
   * Directive value: This contains the identification, as established by the entity receiving the CDNI Logging File, of the entity transmitting the CDNI Logging File (e.g., the host in a dCDN supporting the CDNI Logging interface) or the entity responsible for transmitting the CDNI Logging File (e.g., the dCDN).
   * Occurrence: There MUST be zero or exactly one instance of this directive per CDNI Logging File. This directive MAY be added by the uCDN (e.g., before storing the CDNI Logging File). It MUST NOT be included by the dCDN. The mechanisms used by the uCDN to establish and validate the entity responsible for the CDNI Logging File are outside the scope of the present document. We observe that, in particular, this may be achieved through authentication mechanisms that are part of the transport layer of the CDNI Logging File pull mechanism (Section 4.2).
   * Example: "established-origin: HTAB host".

o Remark:

   * Format: USER-COMMENT
   * Directive value: This contains comment information. Data contained in this field is to be ignored by analysis tools.
   * Occurrence: There MAY be zero, one, or any number of instances of this directive per CDNI Logging File.
   * Example: "remark: HTAB USER-COMMENT".
o Record-type:

   * Format: NAMEFORMAT
   * Directive value: Indicates the type of the CDNI Logging Records that follow this directive, until another record-type directive appears in the CDNI Logging File (or the end of the CDNI Logging File). This can be any CDNI Logging Record type registered in the "CDNI Logging record-types" registry (Section 6.3). For example, this may be "cdni_http_request_v1" as specified in Section 3.4.1. CDNI Logging record-types are case-insensitive as per the basic ABNF ([RFC5234]).
   * Occurrence: There MUST be at least one instance of this directive per CDNI Logging File. The first instance of this directive MUST precede a fields directive and MUST precede all CDNI Logging Records.
   * Example: "record-type: HTAB cdni_http_request_v1".

o Fields:

   * Format: FIENAME *(HTAB FIENAME) ; where FIENAME can take any CDNI Logging field name registered in the "CDNI Logging Field Names" registry (Section 6.4) that is valid for the record type specified in the record-type directive.
   * Directive value: This lists the names of all the fields for which a value is to appear in the CDNI Logging Records that follow the instance of this directive (until another instance of this directive appears in the CDNI Logging File). The names of the fields, as well as their occurrences, MUST comply with the corresponding rules specified in the document referenced in the "CDNI Logging record-types" registry (Section 6.3) for the corresponding CDNI Logging record-type.
   * Occurrence: There MUST be at least one instance of this directive per record-type directive. The first instance of this directive for a given record-type MUST appear before any CDNI Logging Record for this record-type. One situation where more than one instance of the fields directive can appear within a given CDNI Logging File is when there is a change, in the middle of a fairly large logging period, in the agreement between the uCDN and the dCDN about the set of fields that are to be exchanged. The multiple occurrences allow records with the old set of fields and records with the new set of fields to be carried inside the same Logging File.
   * Example: "fields: HTAB FIENAME *(HTAB FIENAME)".

o SHA256-hash:

   * Format: 64HEXDIG
   * Directive value: This directive permits the detection of a corrupted CDNI Logging File. This can be useful, for instance, if a problem occurs on the file system of the dCDN Logging system and leads to a truncation of a Logging File. The valid SHA256-hash value is included in this directive by the entity that transmits the CDNI Logging File. It MUST be computed by applying the SHA-256 ([RFC6234]) cryptographic hash function on the CDNI Logging File, including all the directives and Logging Records, up to the SHA256-hash directive itself, excluding the SHA256-hash directive itself. The SHA256-hash value MUST be represented as a 64-digit hexadecimal number encoded in US-ASCII (representing a 256-bit hash value). The entity receiving the CDNI Logging File also computes, in a similar way, the SHA-256 hash on the received CDNI Logging File and compares this hash to the value of the SHA256-hash directive. If the two values are equal, then the received CDNI Logging File is to be considered non-corrupted. If the two values are different, the received CDNI Logging File is to be considered corrupted. The behavior of the entity that received a corrupted CDNI Logging File is outside the scope of this specification; we note that the entity MAY attempt to pull the same CDNI Logging File from the transmitting entity again. If the entity receiving a non-corrupted CDNI Logging File adds an established-origin directive, it MUST then recompute and update the SHA256-hash directive so that it also protects the added established-origin directive.
   * Occurrence: There MUST be zero or exactly one instance of this directive. There SHOULD be exactly one instance of this directive.
     One situation where that directive could be omitted is where integrity protection is already provided via another mechanism (for example, if an integrity hash is associated to the CDNI Logging File out of band through the CDNI Logging Feed (Section 4.1) leveraging ATOM extensions such as those proposed in [ATOMPUB]). When present, the SHA256-hash directive MUST be the last line of the CDNI Logging File.
   * Example: "SHA256-hash: HTAB 64HEXDIG".

A uCDN-side implementation of the CDNI Logging interface MUST ignore a CDNI Logging File that does not comply with the occurrences specified above for each and every directive. For example, a uCDN-side implementation of the CDNI Logging interface receiving a CDNI Logging File with zero occurrences of the version directive, or with two occurrences of the SHA256-hash directive, MUST ignore this CDNI Logging File.

An entity receiving a CDNI Logging File with a version directive value set to "cdni/1.0" MUST process the CDNI Logging File as per the present document. An entity receiving a CDNI Logging File with a version directive value set to a different value MUST process the CDNI Logging File as per the specification referenced in the "CDNI Logging File version" registry (see Section 6.1) if the implementation supports this specification and MUST ignore the CDNI Logging File otherwise.

3.4. CDNI Logging Records
A CDNI Logging Record consists of a sequence of CDNI Logging fields relating to that single CDNI Logging Record.

CDNI Logging fields MUST be separated by the horizontal tabulation (HTAB) character.

To facilitate readability, a prefix scheme is used for CDNI Logging field names in a similar way to the one used in W3C Extended Log File Format [ELF]. The semantics of the prefix in the present document are:

o "c-" refers to the User Agent that issues the request (corresponds to the "client" of W3C Extended Log Format)

o "d-" refers to the dCDN (relative to a given CDN acting as a uCDN)

o "s-" refers to the dCDN Surrogate that serves the request (corresponds to the "server" of the W3C Extended Log Format)

o "u-" refers to the uCDN (relative to a given CDN acting as a dCDN)

o "cs-" refers to communication from the User Agent towards the dCDN Surrogate

o "sc-" refers to communication from the dCDN Surrogate towards the User Agent

An implementation of the CDNI Logging interface as per the present specification MUST support the CDNI HTTP Request Logging Record as specified in Section 3.4.1.
A CDNI Logging Record contains the corresponding values for the fields that are enumerated in the last fields directive before the current log line. Note that the order in which the field values appear is dictated by the order of the field names in the fields directive. There SHOULD be no dependency between the various field values.

3.4.1. HTTP Request Logging Record
This section defines the CDNI Logging Record of record-type "cdni_http_request_v1". It is applicable to content delivery performed by the dCDN using HTTP/1.0 ([RFC1945]), HTTP/1.1 ([RFC7230] [RFC7231] [RFC7232] [RFC7233] [RFC7234] [RFC7235]), or HTTPS ([RFC2818] [RFC7230]). We observe that, in the case of HTTPS delivery, there may be value in logging additional information specific to the operation of HTTP over Transport Layer Security (TLS). Such additional information is outside the scope of the present document and may be addressed in a future document defining another CDNI Logging Record or another version of the HTTP Request Logging Record.

The "cdni_http_request_v1" record-type is also expected to be applicable to HTTP/2 [RFC7540] since a fundamental design tenet of HTTP/2 is to preserve the HTTP/1.1 semantics. We observe that, in the case of HTTP/2 delivery, there may be value in logging additional information specific to the additional functionality of HTTP/2 (e.g., information related to connection identification, to stream identification, to stream priority, and to flow control). We note that such additional information is outside the scope of the present document and may be addressed in a future document defining another CDNI Logging Record or another version of the HTTP Request Logging Record.

The "cdni_http_request_v1" record-type contains the following CDNI Logging fields, listed by their field name:

o Date:

   * Format: DATE
   * Field value: The date on which the processing of the request completed on the Surrogate.
   * Occurrence: There MUST be one and only one instance of this field.
o Time:

   * Format: TIME
   * Field value: The time, which MUST be expressed in Coordinated Universal Time (UTC), at which the processing of the request completed on the Surrogate.
   * Occurrence: There MUST be one and only one instance of this field.

o Time-taken:

   * Format: DEC
   * Field value: Decimal value of the duration, in seconds, between the start of the processing of the request and the completion of the request processing (e.g., completion of delivery) by the Surrogate.
   * Occurrence: There MUST be one and only one instance of this field.

o c-groupid:

   * Format: NHTABSTRING
   * Field value: An opaque identifier for an aggregate set of clients, derived from the client IPv4 or IPv6 address in the request received by the Surrogate and/or other network-level identifying information. The c-groupid serves to group clients into aggregates. Example aggregates include civil geolocation information (the country, second-level administrative division, or postal code from which the client is presumed to make the request based on a geolocation database lookup) or network topological information (e.g., the BGP autonomous system (AS) number announcing the prefix containing the address). The c-groupid MAY be structured, e.g., US/TN/MEM/38138. Agreement between the dCDN and the uCDN on a mapping between IPv4 and IPv6 addresses and aggregates is presumed to occur out of band. The aggregation mapping SHOULD be chosen such that each aggregate contains more than one client.

      + When the aggregate is chosen so that it contains a single client (e.g., to allow more detailed analytics, or to allow a posteriori analysis of individual delivery, for example, in situations of performance-based penalties), the c-groupid MAY be structured where some elements identify aggregates and one element identifies the client, e.g., US/TN/MEM/38138/43a5bdd6-95c4-4d62-be65-7410df0021e2. In the case where the aggregate is chosen so that it contains a single client:

         - The element identifying the client SHOULD be algorithmically generated (from the client IPv4 or IPv6 address in the request received by the Surrogate and/or other network-level identifying information) in a way that SHOULD NOT be linkable back to the global addressing context and that SHOULD vary over time (to offer protection against long-term attacks).

         - It is RECOMMENDED that the mapping vary at least once every 24 hours.

         - The algorithmic mapping and variation over time can, in some cases, allow the uCDN (with the knowledge of the algorithm, the time variation, and the associated attributes and keys) to reconstruct the actual client IPv4 or IPv6 address and/or other network-level identifying information when required (e.g., to allow a posteriori analysis of individual delivery, for example, in situations of performance-based penalties). However, these end-user addresses SHOULD only be reconstructed on demand and the CDNI Logging File SHOULD only be stored with the anonymized c-groupid value.

         - Allowing reconstruction of client address information carries with it grave risks to end-user privacy. Since the c-groupid is, in this case, equivalent in identification power to a client IP address, its use may be restricted by regulation or law as personally identifiable information. For this reason, such use is NOT RECOMMENDED.

         - One method for mapping that MAY be supported by implementations relies on a symmetric key that is known only to the uCDN and the dCDN, together with HMAC-based Extract-and-Expand Key Derivation Function (HKDF) key derivation ([RFC5869]), as will be used in TLS 1.3 ([TLS-1.3]).
           When that method is used:

            o The uCDN and dCDN need to agree on the "salt" and "input keying material", as described in Section 2.2 of [RFC5869], and the initial "info" parameter (which could be something like the business names of the two organizations in UTF-8, concatenated), as described in Section 2.3 of [RFC5869]. The hash SHOULD be either SHA-2 or SHA-3 [SHA-3], and the encryption algorithm SHOULD be 128-bit AES [AES] in Galois Counter Mode (GCM) [GCM] (AES-GCM) or better. The pseudorandom key (PRK) SHOULD be chosen by both parties contributing alternate random bytes until sufficient length exists. After the initial setup, client-information can be encrypted using the key generated by the "expand" step of Section 2.3 of [RFC5869]. The encrypted value SHOULD be hex encoded or base64 encoded (as specified in Section 4 of [RFC4648]). At the agreed-upon expiration time, a new key SHOULD be generated and used. New keys SHOULD be indicated by prefixing the key with a special character such as an exclamation point. In this way, shorter lifetimes can be used as needed.

   * Occurrence: There MUST be one and only one instance of this field.

o s-ip:

   * Format: ADDRESS
   * Field value: The IPv4 or IPv6 address of the Surrogate that served the request (i.e., the "server" address).
   * Occurrence: There MUST be zero or exactly one instance of this field.

o s-hostname:

   * Format: host
   * Field value: The hostname of the Surrogate that served the request (i.e., the "server" hostname).
   * Occurrence: There MUST be zero or exactly one instance of this field.

o s-port:

   * Format: 1*DIGIT
   * Field value: The destination TCP port (i.e., the "server" port) in the request received by the Surrogate.
   * Occurrence: There MUST be zero or exactly one instance of this field.

o cs-method:

   * Format: NHTABSTRING
   * Field value: This is the method of the request received by the Surrogate. In the case of HTTP delivery, this is the HTTP method in the request.
   * Occurrence: There MUST be one and only one instance of this field.

o cs-uri:

   * Format: NHTABSTRING
   * Field value: This is the "effective request URI" of the request received by the Surrogate as specified in [RFC7230]. It complies with the "http" URI scheme or the "https" URI scheme as specified in [RFC7230]. Note that cs-uri can be privacy sensitive. In that case, and where appropriate, u-uri could be used instead of cs-uri.
   * Occurrence: There MUST be zero or exactly one instance of this field.

o u-uri:

   * Format: NHTABSTRING
   * Field value: This is a complete URI, derived from the "effective request URI" ([RFC7230]) of the request received by the Surrogate (i.e., the cs-uri) but transformed by the entity generating or transmitting the CDNI Logging Record, in a way that is agreed upon between the two ends of the CDNI Logging interface, so the transformed URI is meaningful to the uCDN. For example, the two ends of the CDNI Logging interface could agree that the u-uri is constructed from the cs-uri by removing the part of the hostname that exposes which individual Surrogate actually performed the delivery. The details of the modifications performed to generate the u-uri, as well as the mechanism to agree on these modifications between the two sides of the CDNI Logging interface, are outside the scope of the present document.
   * Occurrence: There MUST be one and only one instance of this field.

o Protocol:

   * Format: NHTABSTRING
   * Field value: This is the value of the HTTP-Version field as specified in [RFC7230] of the Request-Line of the request received by the Surrogate (e.g., "HTTP/1.1").
   * Occurrence: There MUST be one and only one instance of this field.

o sc-status:

   * Format: 3DIGIT
   * Field value: This is the Status-Code in the response from the Surrogate. In the case of HTTP delivery, this is the HTTP Status-Code in the HTTP response.
   * Occurrence: There MUST be one and only one instance of this field.

o sc-total-bytes:

   * Format: 1*DIGIT
   * Field value: This is the total number of bytes of the response sent by the Surrogate in response to the request. In the case of HTTP delivery, this includes the bytes of the Status-Line, the bytes of the HTTP headers, and the bytes of the message-body.
   * Occurrence: There MUST be one and only one instance of this field.

o sc-entity-bytes:

   * Format: 1*DIGIT
   * Field value: This is the number of bytes of the message-body in the HTTP response sent by the Surrogate in response to the request. This does not include the bytes of the Status-Line or the bytes of the HTTP headers.
   * Occurrence: There MUST be zero or exactly one instance of this field.

o cs(insert_HTTP_header_name_here):

   * Format: QSTRING
   * Field value: The value of the HTTP header (identified by the insert_HTTP_header_name_here in the CDNI Logging field name) as it appears in the request processed by the Surrogate, but prepended by a DQUOTE and appended by a DQUOTE. For example, when the CDNI Logging field name (FIENAME) listed in the preceding fields directive is cs(User-Agent), this CDNI Logging field value contains the value of the User-Agent HTTP header as received by the Surrogate in the request it processed, but prepended by a DQUOTE and appended by a DQUOTE. If the HTTP header, as it appeared in the request processed by the Surrogate, contains one or more DQUOTEs, each DQUOTE MUST be escaped with percent encoding. For example, if the HTTP header contains My_Header"value", then the field value of the cs(insert_HTTP_header_name_here) is "My_Header%22value%22". The entity transmitting the CDNI Logging File MUST ensure that the respective insert_HTTP_header_name_here of the cs(insert_HTTP_header_name_here) listed in the fields directive complies with HTTP specifications. In particular, this field name does not include any HTAB, since this would prevent proper parsing of the fields directive by the entity receiving the CDNI Logging File.
   * Occurrence: There MAY be zero, one, or any number of instances of this field.

o sc(insert_HTTP_header_name_here):

   * Format: QSTRING
   * Field value: The value of the HTTP header (identified by the insert_HTTP_header_name_here in the CDNI Logging field name) as it appears in the response issued by the Surrogate to serve the request, but prepended by a DQUOTE and appended by a DQUOTE. If the HTTP header, as it appeared in the response issued by the Surrogate, contains one or more DQUOTEs, each DQUOTE MUST be escaped with percent encoding.
     For example, if the HTTP header contains My_Header"value", then the field value of the sc(insert_HTTP_header_name_here) is "My_Header%22value%22". The entity transmitting the CDNI Logging File MUST ensure that the respective insert_HTTP_header_name_here of the sc(insert_HTTP_header_name_here) listed in the fields directive complies with HTTP specifications. In particular, this field name does not include any HTAB, since this would prevent proper parsing of the fields directive by the entity receiving the CDNI Logging File.
   * Occurrence: There MAY be zero, one, or any number of instances of this field. For a given insert_HTTP_header_name_here, there MUST be zero or exactly one instance of this field.

o s-ccid:

   * Format: QSTRING
   * Field value: This contains the value of the Content Collection IDentifier (CCID) associated by the uCDN to the content served by the Surrogate via the CDNI Metadata interface ([CDNI-META]), prepended by a DQUOTE and appended by a DQUOTE. If the CCID conveyed in the CDNI Metadata interface contains one or more DQUOTEs, each DQUOTE MUST be escaped with percent encoding. For example, if the CCID conveyed in the CDNI Metadata interface is My_CCID"value", then the field value of the s-ccid is "My_CCID%22value%22".
   * Occurrence: There MUST be zero or exactly one instance of this field.

o s-sid:

   * Format: QSTRING
   * Field value: This contains the value of a Session IDentifier (SID) generated by the dCDN for a specific HTTP session, prepended by a DQUOTE and appended by a DQUOTE. In particular, for an HTTP Adaptive Streaming (HAS) session, the SID value is included in the Logging Record for every content chunk delivery of that session in view of facilitating the later correlation of all the per-content chunk log records of a given HAS session. See Section 3.4.2.2 of [RFC6983] for more discussion on the concept of Session IDentifier in the context of HAS. If the SID conveyed contains one or more DQUOTEs, each DQUOTE MUST be escaped with percent encoding. For example, if the SID is My_SID"value", then the field value of the s-sid is "My_SID%22value%22".
   * Occurrence: There MUST be zero or exactly one instance of this field.
o s-cached:

   * Format: 1DIGIT
   * Field value: This characterizes whether or not the Surrogate served the request using content already stored on its local cache. The allowed values are "0" (for miss) and "1" (for hit). "1" MUST be used when the Surrogate did serve the request exclusively using content already stored on its local cache. "0" MUST be used otherwise (including cases where the Surrogate served the request using some, but not all, content already stored on its local cache). Note that a "0" only means a cache miss in the Surrogate and does not provide any information on whether or not the content was already stored in another device of the dCDN, i.e., whether this was a "dCDN hit" or a "dCDN miss".
   * Occurrence: There MUST be zero or exactly one instance of this field.

CDNI Logging field names are case-insensitive as per the basic ABNF ([RFC5234]).

The "fields" directive corresponding to an HTTP Request Logging Record MUST contain all the field names whose occurrence is specified above as "[t]here MUST be one and only one instance of this field." The corresponding field values MUST be present in every HTTP Request Logging Record.

The "fields" directive corresponding to an HTTP Request Logging Record MAY list all the field names whose occurrence is specified above as "[t]here MUST be zero or exactly one instance of this field" or "[t]here MAY be zero, one, or any number of instances of this field." The set of such field names actually listed in the "fields" directive is selected by the CDN generating the CDNI Logging File based on agreements between the interconnected CDNs established through mechanisms outside the scope of this specification (e.g., contractual agreements). When such a field name is not listed in the "fields" directive, the corresponding field value MUST NOT be included in the Logging Record.
When such a field name is listed in the "fields" directive, the corresponding field value MUST be included in the Logging Record; if the value for the field is not available, this MUST be conveyed via a dash character ("-").
The field names listed in the "fields" directive MAY be listed in the order in which they are listed in Section 3.4.1 or MAY be listed in any other order.
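As an illustration of the rules above, the following Python sketch shows how a dCDN-side generator might render QSTRING values and emit a Logging Record for a given "fields" directive. It is a minimal sketch, not a normative implementation: the function names to_qstring and build_record are hypothetical, and the escaping assumes the PCT-ENCODED rule of Section 3.1 ("%" HEXDIG HEXDIG), under which a DQUOTE is rendered as %22.

```python
HTAB = "\t"
CRLF = "\r\n"


def to_qstring(value: str) -> str:
    """Render a value as a QSTRING: DQUOTE-delimited, unsafe octets percent-encoded."""
    encoded = []
    for ch in value:
        code = ord(ch)
        # NDQUOTE excludes DQUOTE (0x22), "%" (0x25), control characters, and DEL,
        # so those are escaped; everything else is copied through unchanged.
        if ch in '"%' or code < 0x20 or code == 0x7F:
            encoded.append("%{:02X}".format(code))
        else:
            encoded.append(ch)
    return '"' + "".join(encoded) + '"'


def build_record(field_names, values):
    """Emit one Logging Record with a value, or "-", for every listed field name."""
    cells = []
    for name in field_names:
        value = values.get(name)
        # A listed field whose value is unavailable is conveyed as a dash.
        cells.append("-" if value is None else value)
    return HTAB.join(cells) + CRLF


# Hypothetical usage: sc-total-bytes is listed but unavailable.
record = build_record(
    ["date", "sc-total-bytes", "s-ccid"],
    {"date": "2013-05-17", "s-ccid": to_qstring('My_CCID"value"')},
)
```

Fields that are not listed in the directive are simply never looked up, so their values cannot leak into the record.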
Logging some specific fields from HTTP requests and responses can introduce serious security and privacy risks. For example, cookies will often contain long-lived token values (valid for months) that can be used to log into a service as the relevant user. Similar values may be included in other header fields or within URLs or elsewhere in HTTP requests and responses. Centralizing such values in a CDNI Logging File can therefore represent a significant increase in risk both for the user and the web service provider, but also for the CDNs involved. Therefore, implementations ought to attempt to lower the probability of such bad outcomes, e.g., by only allowing a configured set of headers to be added to CDNI Logging Records, or by not supporting wildcard selection of HTTP request/response fields to add. Such mechanisms can reduce the probability that security (or privacy) sensitive values are centralized in CDNI Logging Files. Also, when agreeing on which HTTP request/response fields are to be provided in CDNI Logging Files, the uCDN and dCDN administrators ought to consider these risks. Furthermore, CDNs making use of c-groupid to identify an aggregate of clients rather than individual clients ought to realize that, by logging certain header fields, they may create the possibility to re-identify individual clients. In these cases, heeding the above advice, or not logging header fields at all, is particularly important if the goal is to provide logs that do not identify individual clients.
A dCDN-side implementation of the CDNI Logging interface MUST implement all the following Logging fields in a CDNI Logging Record of record-type "cdni_http_request_v1" and MUST support the ability to include valid values for each of them:
o date
o time
o time-taken
o c-groupid
o s-ip
o s-hostname
o s-port
o cs-method
o cs-uri
o u-uri
o protocol
o sc-status
o sc-total-bytes
o sc-entity-bytes
o cs(insert_HTTP_header_name_here)
o sc(insert_HTTP_header_name_here)
o s-cached
A dCDN-side implementation of the CDNI Logging interface MAY support the following Logging fields in a CDNI Logging Record of record-type "cdni_http_request_v1":
o s-ccid
o s-sid
If a dCDN-side implementation of the CDNI Logging interface supports these fields, it MUST support the ability to include valid values for them.
A uCDN-side implementation of the CDNI Logging interface MUST be able to accept CDNI Logging Files with CDNI Logging Records of record-type "cdni_http_request_v1" containing any CDNI Logging Field defined in Section 3.4.1 as long as the CDNI Logging Record and the CDNI Logging File are compliant with the present document.
If a uCDN-side implementation of the CDNI Logging interface receives a CDNI Logging File with HTTP Request Logging Records that do not contain field values for exactly the set of field names actually listed in the preceding "fields" directive, the implementation MUST ignore those HTTP Request Logging Records and MUST accept the other HTTP Request Logging Records.
To ensure that the Logging File is correct, the text MUST be sanitized before being logged. Null, bare CR, bare LF, and HTAB have to be removed by escaping them through percent encoding to avoid confusion with the Logging Record separators.
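The uCDN-side rule above (ignore records that do not match the preceding "fields" directive, accept the others) can be sketched in Python as follows. This is an illustrative sketch only: the function name accept_records is hypothetical, and treating an empty cell as a non-matching record is an assumption of this sketch, made because an unavailable value is expected to be conveyed as a dash rather than left empty.

```python
def accept_records(field_names, raw_records):
    """Split raw record lines into accepted records (as dicts) and ignored lines."""
    accepted, ignored = [], []
    for line in raw_records:
        # Drop the trailing CRLF, then split on the HTAB separator.
        values = line.rstrip("\r\n").split("\t")
        # Assumption: an empty cell is not a valid field value.
        if len(values) == len(field_names) and all(values):
            accepted.append(dict(zip(field_names, values)))
        else:
            ignored.append(line)
    return accepted, ignored


# Hypothetical usage: the second record is one value short and is ignored.
fields = ["date", "time", "s-cached"]
good = "2013-05-17\t00:38:06.825\t1\r\n"
short = "2013-05-17\t00:38:06.825\r\n"
accepted, ignored = accept_records(fields, [good, short])
```

Keeping the ignored lines, rather than discarding them silently, leaves room for the operator to report malformed records back to the dCDN.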
3.5. CDNI Logging File Extension
The CDNI Logging File contains blocks of directives and blocks of corresponding records. The supported set of directives is defined relative to the CDNI Logging File Format version. The complete set of directives for version "cdni/1.0" is defined in Section 3.3. The directive list is not expected to require much extension, but when it does, the new directive MUST be defined and registered in the "CDNI Logging Directive Names" registry, as described in Section 6.1, and a new version MUST be defined and registered in the "CDNI Logging File version" registry, as described in Section 6.2. For example, adding a new CDNI Logging Directive, e.g., "foo", to the set of directives defined for "cdni/1.0" in Section 3.3, would require registering both the new CDNI Logging Directive "foo" and a new CDNI Logging File version, e.g., "cdni/2.0", which includes all of the existing CDNI Logging Directives of "cdni/1.0" plus "foo".
It is expected that as new logging requirements arise, the list of fields to log will change and expand. When adding new fields, the new fields MUST be defined and registered in the "CDNI Logging Field Names" registry, as described in Section 6.4, and a new record-type MUST be defined and registered in the "CDNI Logging record-types" registry, as described in Section 6.3. For example, adding a new CDNI Logging Field, e.g., "c-bar", to the set of fields defined for "cdni_http_request_v1" in Section 3.4.1, would require registering both the new CDNI Logging Field "c-bar" and a new CDNI record-type, e.g., "cdni_http_request_v2", which includes all of the existing CDNI Logging Fields of "cdni_http_request_v1" plus "c-bar".
3.6. CDNI Logging File Examples
Let us consider the upstream CDN and the downstream CDN labeled uCDN and dCDN-1, respectively, in Figure 1. When dCDN-1 acts as a downstream CDN for uCDN and performs content delivery on behalf of uCDN, dCDN-1 will include the CDNI Logging Records corresponding to the content deliveries performed on behalf of uCDN in the CDNI Logging Files for uCDN. An example CDNI Logging File communicated by dCDN-1 to uCDN is shown below in Figure 4.

   #version:<HTAB>cdni/1.0<CRLF>
   #UUID:<HTAB>urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6<CRLF>
   #claimed-origin:<HTAB>cdni-logging-entity.dcdn-1.example.com<CRLF>
   #record-type:<HTAB>cdni_http_request_v1<CRLF>
   #fields:<HTAB>date<HTAB>time<HTAB>time-taken<HTAB>c-groupid<HTAB>
      cs-method<HTAB>u-uri<HTAB>protocol<HTAB>
      sc-status<HTAB>sc-total-bytes<HTAB>cs(User-Agent)<HTAB>
      cs(Referer)<HTAB>s-cached<CRLF>
   2013-05-17<HTAB>00:38:06.825<HTAB>9.058<HTAB>US/TN/MEM/38138<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-1.example.com/video/movie100.mp4<HTAB>
      HTTP/1.1<HTAB>200<HTAB>6729891<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host1.example.com"<HTAB>1<CRLF>
   2013-05-17<HTAB>00:39:09.145<HTAB>15.32<HTAB>FR/PACA/NCE/06100<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-1.example.com/video/movie118.mp4<HTAB>
      HTTP/1.1<HTAB>200<HTAB>15799210<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host1.example.com"<HTAB>1<CRLF>
   2013-05-17<HTAB>00:42:53.437<HTAB>52.879<HTAB>US/TN/MEM/38138<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-1.example.com/video/picture11.mp4<HTAB>
      HTTP/1.0<HTAB>200<HTAB>97234724<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host5.example.com"<HTAB>0<CRLF>
   #SHA256-hash:<HTAB> 64-hexadecimal-digit hash value <CRLF>

   Figure 4: CDNI Logging File Example

If uCDN establishes, by some means (e.g., via TLS authentication when pulling the CDNI Logging File), the identity of the entity from which it pulled the CDNI Logging File, uCDN can add an established-origin directive to the CDNI Logging File as illustrated below:

   #established-origin:<HTAB>cdni-logging-entity.dcdn-1.example.com<CRLF>

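To make the layout of Figure 4 concrete, the following Python sketch splits a CDNI Logging File into directives and records. It is a simplified illustration rather than a complete parser: the function name parse_logging_file is hypothetical, directive names are lowercased on the assumption that they can be compared case-insensitively, the sample uses only a few of the fields from the figure, and records are not validated against the "fields" directive.

```python
def parse_logging_file(text):
    """Return (directives, records) from CDNI Logging File text.

    directives is a list of (name, value) pairs in file order; records is a
    list of dicts keyed by the names in the most recent "fields" directive.
    """
    directives, records, field_names = [], [], []
    for line in text.split("\r\n"):
        if not line:
            continue  # nothing after the final CRLF
        if line.startswith("#"):
            # A directive line is "#" name ":" HTAB value.
            name, _, value = line[1:].partition(":\t")
            name = name.lower()
            directives.append((name, value))
            if name == "fields":
                field_names = value.split("\t")
        else:
            # A record line is a list of HTAB-separated field values.
            records.append(dict(zip(field_names, line.split("\t"))))
    return directives, records


sample = (
    "#version:\tcdni/1.0\r\n"
    "#record-type:\tcdni_http_request_v1\r\n"
    "#fields:\tdate\ttime\tsc-status\ts-cached\r\n"
    "2013-05-17\t00:38:06.825\t200\t1\r\n"
)
directives, records = parse_logging_file(sample)
```

Returning directives as an ordered list, rather than a dict, preserves repeated directives such as a second "fields" line later in the file.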
As illustrated in Figure 2, uCDN will then ingest the corresponding CDNI Logging Records into its Collection process, alongside the Logging Records generated locally by the uCDN itself. This allows uCDN to aggregate Logging Records for deliveries performed by itself (through Records generated locally) as well as for deliveries performed by its downstream CDN(s). This aggregate information can then be used (after Filtering and Rectification, as illustrated in Figure 2) by log-consuming applications that take into account deliveries performed by uCDN as well as by all of its downstream CDNs.
We observe that the time between (1) when a delivery is completed in dCDN and (2) when the corresponding Logging Record is ingested by the Collection process in uCDN depends on a number of parameters such as the Logging Period agreed to by uCDN and dCDN, how much time uCDN waits before pulling the CDNI Logging File once it is advertised in the CDNI Logging Feed, and the time to complete the pull of the CDNI Logging File. Therefore, if we consider the set of Logging Records aggregated by the Collection process in uCDN in a given time interval, there could be a permanent significant timing difference between the CDNI Logging Records received from the dCDN and the Logging Records generated locally. For example, in a given time interval, the Collection process in uCDN may be aggregating Logging Records generated locally by uCDN for deliveries performed in the last hour and CDNI Logging Records generated in the dCDN for deliveries in the hour before last.
Say that, for some reason (for example, a Surrogate bug), dCDN-1 could not collect the total number of bytes of the responses sent by the Surrogate (in other words, the value for sc-total-bytes is not available). Then the corresponding CDNI Logging Records would contain a dash character ("-") in lieu of the value for the sc-total-bytes field (as specified in Section 3.4.1).
In that case, the CDNI Logging File that would be communicated by dCDN-1 to uCDN is shown below in Figure 5.

   #version:<HTAB>cdni/1.0<CRLF>
   #UUID:<HTAB>urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6<CRLF>
   #claimed-origin:<HTAB>cdni-logging-entity.dcdn-1.example.com<CRLF>
   #record-type:<HTAB>cdni_http_request_v1<CRLF>
   #fields:<HTAB>date<HTAB>time<HTAB>time-taken<HTAB>c-groupid<HTAB>
      cs-method<HTAB>u-uri<HTAB>protocol<HTAB>
      sc-status<HTAB>sc-total-bytes<HTAB>cs(User-Agent)<HTAB>
      cs(Referer)<HTAB>s-cached<CRLF>
   2013-05-17<HTAB>00:38:06.825<HTAB>9.058<HTAB>US/TN/MEM/38138<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-1.example.com/video/movie100.mp4<HTAB>
      HTTP/1.1<HTAB>200<HTAB>-<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host1.example.com"<HTAB>1<CRLF>
   2013-05-17<HTAB>00:39:09.145<HTAB>15.32<HTAB>FR/PACA/NCE/06100<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-1.example.com/video/movie118.mp4<HTAB>
      HTTP/1.1<HTAB>200<HTAB>-<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host1.example.com"<HTAB>1<CRLF>
   2013-05-17<HTAB>00:42:53.437<HTAB>52.879<HTAB>US/TN/MEM/38138<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-1.example.com/video/picture11.mp4<HTAB>
      HTTP/1.0<HTAB>200<HTAB>-<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host5.example.com"<HTAB>0<CRLF>
   #SHA256-hash:<HTAB> 64-hexadecimal-digit hash value <CRLF>

   Figure 5: CDNI Logging File Example with a Missing Field Value

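A log-consuming application has to tolerate the dash value shown in Figure 5. The Python sketch below shows one possible way to aggregate sc-total-bytes while counting records whose value is unavailable, instead of treating "-" as zero or failing on it. The function name total_bytes and the representation of records as dicts keyed by field name are assumptions of this sketch.

```python
def total_bytes(records, field="sc-total-bytes"):
    """Sum a numeric field and count the records where its value is unavailable."""
    total, unavailable = 0, 0
    for record in records:
        # A field that is absent from the record is treated like a dash.
        value = record.get(field, "-")
        if value == "-":
            unavailable += 1
        else:
            total += int(value)
    return total, unavailable


# Hypothetical usage with byte counts taken from Figure 4 and one dash.
parsed = [
    {"sc-total-bytes": "6729891"},
    {"sc-total-bytes": "-"},
    {"sc-total-bytes": "15799210"},
]
summary = total_bytes(parsed)
```

Reporting the count of unavailable values alongside the sum lets the consumer judge how complete the aggregate is.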
3.7. Cascaded CDNI Logging Files Example
Let us consider the cascaded CDN scenario of uCDN, dCDN-2, and dCDN-3 as depicted in Figure 1. After completion of a delivery by dCDN-3 on behalf of dCDN-2, dCDN-3 will include a corresponding Logging Record in a CDNI Logging File that will be pulled by dCDN-2 and that is illustrated below in Figure 6. In practice, a CDNI Logging File is likely to contain a very high number of CDNI Logging Records. However, for readability, the example in Figure 6 contains a single CDNI Logging Record.

   #version:<HTAB>cdni/1.0<CRLF>
   #UUID:<HTAB>urn:uuid:65718ef-0123-9876-adce4321bcde<CRLF>
   #claimed-origin:<HTAB>cdni-logging-entity.dcdn-3.example.com<CRLF>
   #record-type:<HTAB>cdni_http_request_v1<CRLF>
   #fields:<HTAB>date<HTAB>time<HTAB>time-taken<HTAB>c-groupid<HTAB>
      cs-method<HTAB>u-uri<HTAB>protocol<HTAB>
      sc-status<HTAB>sc-total-bytes<HTAB>cs(User-Agent)<HTAB>
      cs(Referer)<HTAB>s-cached<CRLF>
   2013-05-17<HTAB>00:39:09.119<HTAB>14.07<HTAB>US/CA/SFO/94114<HTAB>
      GET<HTAB>
      http://cdni-dcdn-2.dcdn-3.example.com/video/movie118.mp4<HTAB>
      HTTP/1.1<HTAB>200<HTAB>15799210<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host1.example.com"<HTAB>1<CRLF>
   #SHA256-hash:<HTAB> 64-hexadecimal-digit hash value <CRLF>

   Figure 6: Cascaded CDNI Logging File Example (dCDN-3 to dCDN-2)

If dCDN-2 establishes, by some means (e.g., via TLS authentication when pulling the CDNI Logging File), the identity of the entity from which it pulled the CDNI Logging File, dCDN-2 can add an established-origin directive to the CDNI Logging File as illustrated below:

   #established-origin:<HTAB>cdni-logging-entity.dcdn-3.example.com<CRLF>

dCDN-2 (behaving as an upstream CDN from the viewpoint of dCDN-3) will then ingest the CDNI Logging Record for the considered dCDN-3 delivery into its Collection process (as illustrated in Figure 2).
This Logging Record may be aggregated with Logging Records generated locally by dCDN-2 for deliveries performed by dCDN-2 itself. Say,
for illustration, that the content delivery performed by dCDN-3 on behalf of dCDN-2 had actually been redirected to dCDN-2 by uCDN, and say that another content delivery has just been redirected by uCDN to dCDN-2 and that dCDN-2 elected to perform the corresponding delivery itself. Then, after Filtering and Rectification (as illustrated in Figure 2), dCDN-2 will include the two Logging Records corresponding respectively to the delivery performed by dCDN-3 and the delivery performed by dCDN-2, in the next CDNI Logging File that will be communicated to uCDN. An example of such a CDNI Logging File is illustrated below in Figure 7.

   #version:<HTAB>cdni/1.0<CRLF>
   #UUID:<HTAB>urn:uuid:1234567-8fedc-abab-0987654321ff<CRLF>
   #claimed-origin:<HTAB>cdni-logging-entity.dcdn-2.example.com<CRLF>
   #record-type:<HTAB>cdni_http_request_v1<CRLF>
   #fields:<HTAB>date<HTAB>time<HTAB>time-taken<HTAB>c-groupid<HTAB>
      cs-method<HTAB>u-uri<HTAB>protocol<HTAB>
      sc-status<HTAB>sc-total-bytes<HTAB>cs(User-Agent)<HTAB>
      cs(Referer)<HTAB>s-cached<CRLF>
   2013-05-17<HTAB>00:39:09.119<HTAB>14.07<HTAB>US/CA/SFO/94114<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-2.example.com/video/movie118.mp4<HTAB>
      HTTP/1.1<HTAB>200<HTAB>15799210<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host1.example.com"<HTAB>1<CRLF>
   2013-05-17<HTAB>01:42:53.437<HTAB>52.879<HTAB>FR/IDF/PAR/75001<HTAB>
      GET<HTAB>
      http://cdni-ucdn.dcdn-2.example.com/video/picture11.mp4<HTAB>
      HTTP/1.0<HTAB>200<HTAB>97234724<HTAB>"Mozilla/5.0
      (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like
      Gecko) Chrome/5.0.375.127 Safari/533.4"<HTAB>
      "host5.example.com"<HTAB>0<CRLF>
   #SHA256-hash:<HTAB> 64-hexadecimal-digit hash value <CRLF>

   Figure 7: Cascaded CDNI Logging File Example (dCDN-2 to uCDN)

If uCDN establishes, by some means (e.g., via TLS authentication when pulling the CDNI Logging File), the identity of the entity from which it pulled the CDNI Logging File, uCDN can add an established-origin directive to the CDNI Logging File as illustrated below:

   #established-origin:<HTAB>cdni-logging-entity.dcdn-2.example.com<CRLF>

In the example of Figure 7, we observe that:
o The first Logging Record corresponds to the Logging Record communicated earlier to dCDN-2 by dCDN-3, which corresponds to a delivery redirected by uCDN to dCDN-2 and then redirected by dCDN-2 to dCDN-3. The field values in this Logging Record are copied from the corresponding CDNI Logging Record communicated to dCDN-2 by dCDN-3, with the exception of the u-uri, which now reflects the URI convention between uCDN and dCDN-2 and presents the delivery to uCDN as if it were performed by dCDN-2 itself. This reflects the fact that dCDN-2 had taken full responsibility for the corresponding delivery (even if, in this case, dCDN-2 elected to redirect the delivery to dCDN-3 so that it is actually performed by dCDN-3 on behalf of dCDN-2).
o The second Logging Record corresponds to a delivery redirected by uCDN to dCDN-2 and performed by dCDN-2 itself. The time of the delivery in this Logging Record may be significantly more recent than that of the first Logging Record, since it was generated locally while the first Logging Record was generated by dCDN-3 and had to be advertised, then pulled, and then ingested into the dCDN-2 Collection process before being aggregated with the second Logging Record.
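The rectification of the first Logging Record described above, where every field value is copied except the u-uri, can be sketched in Python as follows. This is only an illustration: the function name relay_record is hypothetical, and replacing the host part of the URI is an assumption drawn from the difference between Figures 6 and 7; the actual URI convention between two CDNs is agreed outside this specification.

```python
from urllib.parse import urlsplit, urlunsplit


def relay_record(record, upstream_host):
    """Copy a downstream Logging Record, rewriting only the host of its u-uri."""
    relayed = dict(record)  # every other field value is copied unchanged
    parts = urlsplit(record["u-uri"])
    # Assumption: the URI convention differs only in the host name.
    relayed["u-uri"] = urlunsplit(parts._replace(netloc=upstream_host))
    return relayed


# Hypothetical usage with the u-uri values shown in Figures 6 and 7.
from_dcdn3 = {
    "date": "2013-05-17",
    "u-uri": "http://cdni-dcdn-2.dcdn-3.example.com/video/movie118.mp4",
    "s-cached": "1",
}
to_ucdn = relay_record(from_dcdn3, "cdni-ucdn.dcdn-2.example.com")
```

Working on a copy keeps the record received from dCDN-3 intact, which can be useful if dCDN-2 also needs it for its own accounting.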