Network Working Group                                        B. Trammell
Request for Comments: 5655                                     E. Boschi
Category: Standards Track                                 Hitachi Europe
                                                                 L. Mark
                                                         Fraunhofer IFAM
                                                                T. Zseby
                                                        Fraunhofer FOKUS
                                                               A. Wagner
                                                              ETH Zurich
                                                            October 2009

   Specification of the IP Flow Information Export (IPFIX) File Format

Abstract
This document describes a file format for the storage of flow data based upon the IP Flow Information Export (IPFIX) protocol. It proposes a set of requirements for flat-file, binary flow data file formats, then specifies the IPFIX File format to meet these requirements based upon IPFIX Messages. This IPFIX File format is designed to facilitate interoperability and reusability among a wide variety of flow storage, processing, and analysis tools.

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.
Table of Contents
1. Introduction
   1.1. IPFIX Documents Overview
2. Terminology
3. Design Overview
4. Motivation
5. Requirements
   5.1. Record Format Flexibility
   5.2. Self-Description
   5.3. Data Compression
   5.4. Indexing and Searching
   5.5. Error Recovery
   5.6. Authentication, Confidentiality, and Integrity
   5.7. Anonymization and Obfuscation
   5.8. Session Auditability and Replayability
   5.9. Performance Characteristics
6. Applicability
   6.1. Storage of IPFIX-Collected Flow Data
   6.2. Storage of NetFlow-V9-Collected Flow Data
   6.3. Testing IPFIX Collecting Processes
   6.4. IPFIX Device Diagnostics
7. Detailed File Format Specification
   7.1. File Reader Specification
   7.2. File Writer Specification
   7.3. Specific File Writer Use Cases
      7.3.1. Collocating a File Writer with a Collecting Process
      7.3.2. Collocating a File Writer with a Metering Process
      7.3.3. Using IPFIX Files for Archival Storage
      7.3.4. Using IPFIX Files as Documents
      7.3.5. Using IPFIX Files for Testing
      7.3.6. Writing IPFIX Files for Device Diagnostics
      7.3.7. IPFIX File Manipulation
   7.4. Media Type of IPFIX Files
8. File Format Metadata Specification
   8.1. Recommended Options Templates for IPFIX Files
      8.1.1. Message Checksum Options Template
      8.1.2. File Time Window Options Template
      8.1.3. Export Session Details Options Template
      8.1.4. Message Details Options Template
   8.2. Recommended Information Elements for IPFIX Files
      8.2.1. collectionTimeMilliseconds
      8.2.2. collectorCertificate
      8.2.3. exporterCertificate
      8.2.4. exportSctpStreamId
      8.2.5. maxExportSeconds
      8.2.6. maxFlowEndMicroseconds
      8.2.7. maxFlowEndMilliseconds
      8.2.8. maxFlowEndNanoseconds
      8.2.9. maxFlowEndSeconds
      8.2.10. messageMD5Checksum
      8.2.11. messageScope
      8.2.12. minExportSeconds
      8.2.13. minFlowStartMicroseconds
      8.2.14. minFlowStartMilliseconds
      8.2.15. minFlowStartNanoseconds
      8.2.16. minFlowStartSeconds
      8.2.17. opaqueOctets
      8.2.18. sessionScope
9. Signing and Encryption of IPFIX Files
   9.1. CMS Detached Signatures
      9.1.1. ContentInfo
      9.1.2. SignedData
      9.1.3. SignerInfo
      9.1.4. EncapsulatedContentInfo
   9.2. Encryption Error Resilience
10. Compression of IPFIX Files
   10.1. Supported Compression Formats
   10.2. Compression Recognition at the File Reader
   10.3. Compression Error Resilience
11. Recommended File Integration Strategies
   11.1. Encapsulation of Non-IPFIX Data in IPFIX Files
   11.2. Encapsulation of IPFIX Files within Other File Formats
12. Security Considerations
   12.1. Relationship between IPFIX File and Transport Encryption
   12.2. End-to-End Assertions for IPFIX Files
   12.3. Recommendations for Strength of Cryptography for IPFIX Files
13. IANA Considerations
14. Acknowledgements
15. References
   15.1. Normative References
   15.2. Informative References
Appendix A. Example IPFIX File
   A.1. Example Options Templates
   A.2. Example Supplemental Options Data
   A.3. Example Message Checksum
   A.4. File Example Data Set
   A.5. Complete File Example
Appendix B. Applicability of IPFIX Files to NetFlow V9 Flow Storage
   B.1. Comparing NetFlow V9 to IPFIX
      B.1.1. Message Header Format
      B.1.2. Set Header Format
      B.1.3. Template Format
      B.1.4. Information Model
      B.1.5. Template Management
      B.1.6. Transport
   B.2. A Method for Transforming NetFlow V9 Messages to IPFIX
   B.3. NetFlow V9 Transformation Example

1. Introduction
This document specifies a file format based upon IPFIX, designed to facilitate interoperability and reusability among a wide variety of flow storage, processing, and analysis tools. It begins with an overview of the IPFIX File format, and a quick summary of how IPFIX Files work in Section 3. The detailed specification of the IPFIX File format appears in Section 7; this section includes general specifications for IPFIX File Readers and IPFIX File Writers and specific recommendations for common situations in which they are used. The format makes use of the IPFIX Options mechanism for additional file metadata, in order to avoid requiring any protocol extensions, and to minimize the effort required to adapt IPFIX implementations to use the file format; a detailed definition of the Options Templates used for storage metadata appears in Section 8. Appendix A contains a detailed example IPFIX File.

An advantage of file-based storage is that files can be readily encapsulated within each other and other data storage and transmission formats. The IPFIX File format leverages this to provide encryption, described in Section 9, and compression, described in Section 10. Section 11 provides specific recommendations for integration of IPFIX File data with other formats.

The IPFIX File format was designed to be applicable to a wide variety of flow storage situations; the motivation behind its creation is described in Section 4. The document outlines the set of requirements the format is designed to meet in Section 5, and explores the applicability of such a format to various specific application areas in Section 6. These sections are intended to give background on the development of IPFIX Files.

1.1. IPFIX Documents Overview

"Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information" [RFC5101] and its associated documents define the IPFIX protocol, which provides network engineers and administrators with access to IP traffic flow information.

"Architecture for IP Flow Information Export" [RFC5470] defines the architecture for the export of measured IP flow information out of an IPFIX Exporting Process to an IPFIX Collecting Process, and the basic terminology used to describe the elements of this architecture, per the requirements defined in "Requirements for IP Flow Information Export" [RFC3917]. [RFC5101] then covers the details of the method for transporting IPFIX Data Records and Templates via a congestion-aware transport protocol from an IPFIX Exporting Process to an IPFIX Collecting Process.

"Information Model for IP Flow Information Export" [RFC5102] describes the Information Elements used by IPFIX, including details on Information Element naming, numbering, and data type encoding.

"IP Flow Information Export (IPFIX) Applicability" [RFC5472] describes the various applications of the IPFIX protocol and their use of information exported via IPFIX, and it relates the IPFIX architecture to other measurement architectures and frameworks.

In addition, "Exporting Type Information for IP Flow Information Export (IPFIX) Information Elements" [RFC5610] specifies a method for encoding Information Model properties within an IPFIX Message stream.

This document references [RFC5101] and [RFC5470] for terminology, defines IPFIX File Writer and IPFIX File Reader in terms of the IPFIX Exporting Process and IPFIX Collecting Process definitions from [RFC5101], and extends the IPFIX Information Model defined in [RFC5102] to provide new Information Elements for IPFIX File metadata. It uses the method described in [RFC5610] to support the self-description of IPFIX Files containing enterprise-specific Information Elements.

2. Terminology
This section defines terminology related to the IPFIX File format. In addition, terms used in this document that are defined in the "Terminology" section of [RFC5101] are to be interpreted as defined there.

IPFIX File: An IPFIX File is a serialized stream of IPFIX Messages; this stream may be stored on a filesystem or transported using any technique customarily used for files. Any IPFIX Message stream that would be considered valid when transported over one or more of the specified IPFIX transports (Stream Control Transmission Protocol (SCTP), TCP, or UDP) as defined in [RFC5101] is considered an IPFIX File. However, this document extends that definition with recommendations on the construction of IPFIX Files that meet the requirements identified in Section 5.

IPFIX File Reader: An IPFIX File Reader is a process that reads IPFIX Files from a filesystem. An IPFIX File Reader operates as an IPFIX Collecting Process as specified in [RFC5101], except as modified by this document.

IPFIX File Writer: An IPFIX File Writer is a process that writes IPFIX Files to a filesystem. An IPFIX File Writer operates as an IPFIX Exporting Process as specified in [RFC5101], except as modified by this document.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Design Overview
An IPFIX File is simply a data stream containing one or more IPFIX Messages serialized to some filesystem. Though any set of valid IPFIX Messages can be serialized into an IPFIX File, the specification includes guidelines designed to ease storage and retrieval of flow data using the IPFIX File format. IPFIX Files contain only IPFIX Messages; any file metadata such as checksums or export session details are stored using Options within the IPFIX Message. This design is completely compatible with the IPFIX protocol on the wire. A schematic of a typical IPFIX File is shown below:
   +=======================================+
   | IPFIX File                            |
   | +===================================+ |
   | | IPFIX Message                     | |
   | | +-------------------------------+ | |
   | | | IPFIX Message Header          | | |
   | | +-------------------------------+ | |
   | | +-------------------------------+ | |
   | | | Options Template Set          | | |
   | | |   Options Template Record     | | |
   | | |   . . .                       | | |
   | | +-------------------------------+ | |
   | | +-------------------------------+ | |
   | | | Template Set                  | | |
   | | |   Template Record             | | |
   | | |   . . .                       | | |
   | | +-------------------------------+ | |
   | +===================================+ |
   | | IPFIX Message                     | |
   | | +-------------------------------+ | |
   | | | IPFIX Message Header          | | |
   | | +-------------------------------+ | |
   | | +-------------------------------+ | |
   | | | Data Set                      | | |
   | | |   Data Record                 | | |
   | | |   . . .                       | | |
   | | +-------------------------------+ | |
   | | +-------------------------------+ | |
   | | | Data Set                      | | |
   | | |   Data Record                 | | |
   | | |   . . .                       | | |
   | | +-------------------------------+ | |
   | | . . .                             | |
   | +===================================+ |
   | . . .                                 |
   +=======================================+

                Figure 1: Typical File Structure

4. Motivation
There is a wide variety of applications for the file-based storage of IP flow data, across a continuum of time scales. Tools used in the analysis of flow data and creation of analysis products often use files as a convenient unit of work, with an ephemeral lifetime. A set of flows relevant to a security investigation may be stored in a file for the duration of that investigation, and further exchanged among incident handlers via email or within an external incident
handling workflow application. Sets of flow data relevant to Internet measurement research may be published as files, much as libpcap [pcap] packet trace files are, to provide common datasets for the repeatability of research efforts; these files would have lifetimes measured in months or years. Operational flow measurement systems also have a need for long-term, archival storage of flow data, either as a primary flow data repository, or as a backing tier for online storage in a relational database management system (RDBMS).

The variety of applications of flow data, and the variety of presently deployed storage approaches, indicate the need for a standard approach to flow storage with applicability across the continuum of time scales over which flow data is stored.

A storage format based around flat files would best address the variety of storage requirements. While much work has been done on structured storage via RDBMS, relational database systems are not a good basis for format standardization owing to the fact that their internal data structures are generally private to a single implementation and subject to change for internal reasons. Also, there is a wide variety of operations available on flat files, and external tools and standards can be leveraged to meet file-based flow storage requirements. Further, flow data is often not very semantically complicated, and is managed in very high volume; therefore, an RDBMS-based flow storage system would not benefit much from the advantages of relational database technology.

The simplest way to create a new file format is simply to serialize some internal data model to disk, with either textual or binary representation of data elements, and some framing strategy for delimiting fields and records. "Ad hoc" file formats such as this have several important disadvantages.
They impose the semantics of the data model from which they are derived on the file format, and as such, they are difficult to extend, describe, and standardize.

Indeed, one de facto standard for the storage of flow data is one of these ad hoc formats. A common method of storing data collected via Cisco NetFlow is to serialize a stream of raw NetFlow datagrams into files. These NetFlow PDU files consist of a collection of header-prefixed blocks (corresponding to the datagrams as received on the wire) containing fixed-length binary flow records. NetFlow V5, V7, and V8 data may be mixed within a given file, as the header on each datagram defines the NetFlow version of the records following. While this NetFlow PDU file format has all the disadvantages of an ad hoc format, and is not extensible to data models other than that defined by Cisco NetFlow, it is at least reasonably well understood due to its ubiquity.
Over the past decade, XML has emerged as a new "universal" representation format for structured data. It is intended to be human readable; indeed, that is one reason for its rapid adoption. However, XML has limited usefulness for representing network flow data. Network flow data has a simple, repetitive, non-hierarchical structure that does not benefit much from XML. An XML representation of flow data would be an essentially flat list of the attributes and their values for each flow record. The XML approach to data encoding is very heavyweight when compared to binary flow encoding. XML's use of start- and end-tags, and plaintext encoding of the actual values, leads to significant inefficiency in encoding size. Typical network traffic datasets can contain millions or billions of flows per hour of traffic represented. Any increase in storage size per record can have dramatic impact on flow data storage and transfer sizes. While data compression algorithms can partially remove the redundancy introduced by XML encoding, they introduce additional overhead of their own. A further problem is that XML processing tools require a full XML parser. XML parsers are fully general and therefore complex, resource-intensive, and relatively slow, introducing significant processing time overhead for large network-flow datasets. In contrast, parsers for typical binary flow data encodings are simply structured, since they only need to parse a very small header and then have complete knowledge of all following fields for the particular flow. These can then be read in a very efficient linear fashion. This leads us to propose the IPFIX Message format as the basis for a new flow data file format. The IPFIX Working Group, in defining the IPFIX protocol, has already defined an information model and data formatting rules for representation of flow data. 
Especially at shorter time scales, when a file is a unit of data interchange, the filesystem may be viewed as simply another IPFIX Message transport between processes. This format is especially well suited to representing flow data, as it was designed specifically for flow data export; it is easily extensible, unlike ad hoc serialization, and compact, unlike XML. In addition, IPFIX is an IETF Standards-Track protocol for the export and collection of flow data; using a common format for storage and analysis at the collection side allows implementors to use substantially the same information model and data formatting implementation for transport as well as storage.
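The "small header, then linear read" parsing style described above, and the idea that a file is just another message transport, can be sketched as follows. This is a minimal illustration, not part of the specification: it assumes the 16-octet IPFIX Message Header layout of [RFC5101] (version, length, export time, sequence number, observation domain ID), and the names `iter_messages` and `build_message` are hypothetical helpers introduced here. It does not parse the Sets inside each message.

```python
import io
import struct

IPFIX_VERSION = 10  # version number carried in the IPFIX Message Header [RFC5101]

# Network byte order: version (16 bits), length (16 bits), export time,
# sequence number, and observation domain ID (32 bits each).
HEADER = struct.Struct("!HHIII")


def iter_messages(stream):
    """Yield ((export_time, sequence, domain), body) per message in a binary stream.

    Hypothetical helper: the body is returned as raw bytes, unparsed.
    """
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            return  # clean end of file
        if len(raw) < HEADER.size:
            raise ValueError("truncated message header")
        version, length, export_time, sequence, domain = HEADER.unpack(raw)
        if version != IPFIX_VERSION:
            raise ValueError(f"unexpected version {version}")
        if length < HEADER.size:
            raise ValueError("message length shorter than header")
        body = stream.read(length - HEADER.size)
        if len(body) != length - HEADER.size:
            raise ValueError("truncated message body")
        yield (export_time, sequence, domain), body


def build_message(export_time, sequence, domain, body=b""):
    """Hypothetical helper: frame `body` with an IPFIX Message Header."""
    return HEADER.pack(IPFIX_VERSION, HEADER.size + len(body),
                       export_time, sequence, domain) + body


if __name__ == "__main__":
    # Two back-to-back messages stand in for a file; the 4-octet body is a
    # placeholder, not a valid Set.
    data = build_message(1254355200, 0, 1, b"\x00" * 4) + build_message(1254355201, 1, 1)
    for header, body in iter_messages(io.BytesIO(data)):
        print(header, len(body))
```

Because the reader needs only the length field to find the next message, the same loop works unchanged whether the octets come from a file, a pipe, or a socket wrapped as a binary stream.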
5. Requirements
In this section, we outline a proposed set of requirements [SAINT2007] for any persistent storage format for flow data. First and foremost, a flow data file format should support storage across the continuum of time scales important to flow storage applications. Each of the requirements enumerated in the sections below is broadly applicable to flow storage applications, though each may be more important at certain time scales. For each, we first identify the requirement, then explain how the IPFIX Message format addresses it, or briefly outline the changes that must be made in order for an IPFIX-based file format to meet the requirement.

5.1. Record Format Flexibility
Due to the wide variety of flow attributes collected by different network flow attribute measurement systems, the ideal flow storage format will not impose a single data model or a specific record type on the flows it stores. The file format must be flexible and extensible; that is, it must support the definition of multiple record types within the file itself, and must be able to support new field types for data within the records in a graceful way.

IPFIX provides record format flexibility through the use of Templates to describe each Data Record, through the use of an IANA Registry to define its Information Elements, and through the use of enterprise-specific Information Elements.

5.2. Self-Description
Archived data may be read at a time in the future when any external reference to the meaning of the data may be lost. The ideal flow storage format should be self-describing; that is, a process reading flow data from storage should be able to properly interpret the stored flows without reference to anything other than standard sources (e.g., the standards document describing the file format) and the stored flow data itself.

The IPFIX Message format is partially self-describing; that is, IPFIX Templates containing only IANA-assigned Information Elements can be completely interpreted according to the IPFIX Information Model without additional external data.

However, Templates containing private information elements lack detailed type and semantic information; a Collecting Process receiving Data Records described by a Template containing enterprise-specific Information Elements it does not understand can only treat the data contained within those Information Elements as octet arrays.
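As an illustration of where this limit arises, the sketch below parses the Field Specifiers of a Template Record and picks out the enterprise-specific ones. It assumes the Field Specifier encoding of [RFC5101] (an Enterprise bit followed by a 15-bit Information Element identifier, a 16-bit field length, and a 32-bit Enterprise Number present only when the Enterprise bit is set). The function names and the sample enterprise number are illustrative assumptions, and the sketch simplifies by treating every enterprise-specific element as unknown to the reader.

```python
import struct


def parse_field_specifiers(buf, field_count):
    """Parse `field_count` Field Specifiers from the start of `buf`.

    Returns a list of (enterprise_number_or_None, element_id, field_length).
    Hypothetical helper; layout assumed from [RFC5101].
    """
    fields = []
    offset = 0
    for _ in range(field_count):
        raw_id, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        enterprise = None
        if raw_id & 0x8000:  # Enterprise bit set: an Enterprise Number follows
            (enterprise,) = struct.unpack_from("!I", buf, offset)
            offset += 4
        fields.append((enterprise, raw_id & 0x7FFF, length))
    return fields


def opaque_fields(fields):
    """Fields that, absent further description, can only be read as octet arrays."""
    return [f for f in fields if f[0] is not None]


if __name__ == "__main__":
    # sourceIPv4Address (IANA element 8, 4 octets) followed by a made-up
    # enterprise-specific element 1 (2 octets) under an illustrative
    # enterprise number 32473.
    template_fields = (struct.pack("!HH", 8, 4)
                       + struct.pack("!HHI", 0x8000 | 1, 2, 32473))
    parsed = parse_field_specifiers(template_fields, 2)
    print(parsed)
    print(opaque_fields(parsed))
```

The reader can still skip over or copy the opaque fields correctly, because the Template gives their lengths; what it lacks is their data type and meaning.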
To be fully self-describing, enterprise-specific Information Elements must be additionally described via IPFIX Options according to the Information Element Type Options Template defined in [RFC5610].

5.3. Data Compression
Regardless of the representation format, flow data describing traffic on real networks tends to be highly compressible. Compression tends to improve the scalability of flow collection systems, by reducing the disk storage and I/O bandwidth requirement for a given workload. The ideal flow storage format should support applications that wish to leverage this fact by supporting compression of stored data.

The IPFIX Message format has no support for data compression, as the IPFIX protocol was designed for speed and simplicity of export. Of course, any flat file is readily compressible using a wide variety of external data compression tools, formats, and algorithms; therefore, this requirement can be met via encapsulation in one of these formats. Section 10 specifies an encapsulation based on bzip2 or gzip, to maximize interoperability.

A few simple optimizations can be made by File Writers to increase the integrity and usability of compressed IPFIX data; these are outlined in Section 10.3.

5.4. Indexing and Searching
Binary, record-stream-oriented file formats natively support only one form of searching: sequential scan in file order. By choosing the order of records in a file carefully (e.g., by flow end time), a file can be indexed by a single key. Beyond this, properly addressing indexing is an application-specific problem, as it inherently involves trade-offs between storage complexity and retrieval speed, and requirements vary widely based on time scales and the types of queries used from site to site. However, a generic standard flow storage format may provide limited direct support for indexing and searching. The ideal flow storage format will support a limited table of contents facility noting that the records in a file contain data relating only to certain keys or values of keys, in order to keep multi-file search implementations from having to scan a file for data it does not contain. The IPFIX Message format has no direct support for indexing. However, the technique described in "Reducing Redundancy in IP Flow Information Export (IPFIX) and Packet Sampling (PSAMP) Reports"
[RFC5473] can be used to describe the contents of a file in a limited way. Additionally, as flow data is often sorted and divided by time, the start and end time of the flows in a file may be declared using the File Time Window Options Template defined in Section 8.1.2.

5.5. Error Recovery
When storing flow data for archival purposes, it is important to ensure that hardware or software faults do not introduce errors into the data over time. The ideal flow storage format will support the detection and correction of encoding-level errors in the data.

Note that more advanced error correction is best handled at a layer below that addressed by this document. Error correction is a topic well addressed by the storage industry in general (e.g., by Redundant Array of Independent Disks (RAID) and other technologies). By specifying a flow storage format based upon files, we can leverage these features to meet this requirement.

However, the ideal flow storage format will be resilient against errors, providing an internal facility for the detection of errors and the ability to isolate errors to as few data records as possible.

Note that this requirement interacts with the choice of data compression or encryption algorithm. For example, the use of block compression algorithms can serve to isolate errors to a single compression block, unlike stream compressors, which may fail to resynchronize after a single bit error, invalidating the entire message stream.

The IPFIX Message format does not support data integrity assurance. It is assumed that advanced error correction will be provided externally. Compression and encryption, if used, provide some allowance for detection, if not correction, of errors. For simple error detection support in the absence of compression or encryption, checksums may be attached to messages via IPFIX Options according to the Message Checksum Options Template defined in Section 8.1.1.

5.6. Authentication, Confidentiality, and Integrity
Archival storage of flow data may also require assurance that no unauthorized entity can read or modify the stored data. Cryptography can be applied to this problem to ensure integrity and confidentiality by signing and encryption.
As with error correction, this problem has been addressed well at a layer below that addressed by this document. We can leverage the fact that existing cryptographic technologies work quite well on data stored in files to meet this requirement.

Beyond support for the use of Transport Layer Security (TLS) for transport over TCP or Datagram Transport Layer Security (DTLS) for transport over SCTP or UDP, both of which provide transient authentication and confidentiality, the IPFIX protocol does not support this requirement directly. The IETF has specified the Cryptographic Message Syntax (CMS) [RFC3852] for creating detached signatures for integrity and authentication; Section 9 specifies a CMS-based method for signing IPFIX Files. Confidentiality protection is assumed to be met by methods external to this specification, leveraging one of the many such technologies for encrypting files to meet specific application and process requirements; however, notes on improving archival integrity of encrypted IPFIX Files are given in Section 9.2.

5.7. Anonymization and Obfuscation
To ensure the privacy of individuals and organizations at the endpoints of communications represented by flow records, it is often necessary to obfuscate or anonymize stored and exported flow data. The ideal flow storage format will provide for a notation that a given information element on a given record type represents anonymized, rather than real, data.

The IPFIX protocol presently has no support for anonymization notation. It should be noted that anonymization is one of the requirements given for IPFIX in [RFC3917]. The decision to qualify this requirement with 'MAY' and not 'MUST' in the requirements document, and its subsequent lack of specification in the current version of the IPFIX protocol, is due to the fact that anonymization algorithms are still an open area of research, and that there currently exist no standardized methods for anonymization.

No support is presently defined in [RFC5101] or this IPFIX-based File format for anonymization, as anonymization notation is an area of open work for the IPFIX Working Group.

5.8. Session Auditability and Replayability
Certain use cases for archival flow storage require the storage of collection infrastructure details alongside the data itself. These details include information about how and when data was received, and where it was received from. They are useful for auditing as well as for replaying received data for testing purposes.
The IPFIX protocol contains no direct support for auditability and replayability, though the IPFIX Information Model does define various Information Elements required to represent collection infrastructure details. These details may be stored in IPFIX Files using the Export Session Details Options Template defined in Section 8.1.3, and the Message Details Options Template defined in Section 8.1.4.

5.9. Performance Characteristics
The ideal standard flow storage format will not have a significant negative impact on the performance of the application generating or processing flow data stored in the format. This is a non-functional requirement, but it is important to note that a standard that implies a significant performance penalty is unlikely to be widely implemented and adopted. An examination of the IPFIX protocol would seem to suggest that implementations of it are not particularly prone to slowness; indeed, a template-based data representation is more easily subject to optimization for common cases than representations that embed structural information directly in the data stream (e.g., XML). However, a full analysis of the impact of using IPFIX Messages as a basis for flow data storage on read/write performance will require more implementation experience and performance measurement.