6. Additional Considerations and Special Cases in Flow Aggregation
6.1. Exact versus Approximate Counting during Aggregation
In certain circumstances, particularly involving aggregation by devices with limited resources, and in situations where exact aggregated counts are less important than relative magnitudes (e.g., driving graphical displays), counter distribution during key aggregation may be performed by approximate counting means (e.g., Bloom filters). The choice to use approximate counting is implementation and application dependent.

6.2. Delay and Loss Introduced by the IAP
When accepting Original Flows in export order from traffic captured live, the Intermediate Aggregation Process waits for all Original Flows that may contribute to a given interval during interval distribution. This is generally dominated by the active timeout of the Metering Process measuring the Original Flows. For example, with Metering Processes configured with a five-minute active timeout, the Intermediate Aggregation Process introduces a delay of at least five minutes to all exported Aggregated Flows to ensure it has received all Original Flows. Note that when aggregating Flows from multiple Metering Processes with different active timeouts, the delay is determined by the maximum active timeout.

In certain circumstances, additional delay at the original Exporter may cause an IAP to close an interval before the last Original Flow(s) accountable to the interval arrives. In this case, the IAP MAY drop the late Original Flow(s). Accounting of Flows lost at an Intermediate Process due to such issues is covered in [IPFIX-MED-PROTO].

6.3. Considerations for Aggregation of Sampled Flows
The accuracy of Aggregated Flows may also be affected by sampling of the Original Flows, or sampling of packets making up the Original Flows. At the time of writing, the effect of sampling on Flow aggregation is still an open research question. However, to maximize the comparability of Aggregated Flows, aggregation of sampled Flows should only be applied to Original Flows sampled using the same sampling rate and sampling algorithm, Flows created from packets sampled using the same sampling rate and sampling algorithm, or Original Flows that have been normalized as if they had the same sampling rate and algorithm before aggregation. For more on packet sampling within IPFIX, see [RFC5476]. For more on Flow sampling within the IPFIX Mediator framework, see [RFC7014].
6.4. Considerations for Aggregation of Heterogeneous Flows
Aggregation may be applied to Original Flows from different sources and of different types (i.e., represented using different, perhaps wildly different Templates). When the goal is to separate the heterogeneous Original Flows and aggregate them into heterogeneous Aggregated Flows, each aggregation should be done at its own Intermediate Aggregation Process. The Observation Domain ID on the Messages containing the output Aggregated Flows can be used to identify the different Processes and to segregate the output.

However, when the goal is to aggregate these Flows into a single stream of Aggregated Flows representing one type of data, and if the Original Flows may represent the same original packet at two different Observation Points, the Original Flows should be correlated by the correlation and normalization operation within the IAP to ensure that each packet is only represented in a single Aggregated Flow or set of Aggregated Flows differing only by aggregation interval.

7. Export of Aggregated IP Flows Using IPFIX
In general, Aggregated Flows are exported in IPFIX as any other Flow. However, certain aspects of Aggregated Flow export benefit from additional guidelines or new Information Elements to represent aggregation metadata or information generated during aggregation. These are detailed in the following subsections.

7.1. Time Interval Export
Since an Aggregated Flow is simply a Flow, the existing timestamp Information Elements in the IPFIX Information Model (e.g., flowStartMilliseconds, flowEndNanoseconds) are sufficient to specify the time interval for aggregation. Therefore, no new aggregation-specific Information Elements for exporting time interval information are necessary.

Each Aggregated Flow carrying timing information SHOULD contain both an interval start and interval end timestamp.

7.2. Flow Count Export
The following four Information Elements are defined to count Original Flows as discussed in Section 5.2.1.
7.2.1. originalFlowsPresent
Description: The non-conservative count of Original Flows contributing to this Aggregated Flow. Non-conservative counts need not sum to the original count on re-aggregation.
Abstract Data Type: unsigned64
Data Type Semantics: deltaCounter
ElementID: 375

7.2.2. originalFlowsInitiated
Description: The conservative count of Original Flows whose first packet is represented within this Aggregated Flow. Conservative counts must sum to the original count on re-aggregation.
Abstract Data Type: unsigned64
Data Type Semantics: deltaCounter
ElementID: 376

7.2.3. originalFlowsCompleted
Description: The conservative count of Original Flows whose last packet is represented within this Aggregated Flow. Conservative counts must sum to the original count on re-aggregation.
Abstract Data Type: unsigned64
Data Type Semantics: deltaCounter
ElementID: 377

7.2.4. deltaFlowCount
Description: The conservative count of Original Flows contributing to this Aggregated Flow; may be distributed via any of the methods expressed by the valueDistributionMethod Information Element.
Abstract Data Type: unsigned64
Data Type Semantics: deltaCounter
ElementID: 3
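The difference between the non-conservative and conservative flow counters can be illustrated with a minimal Python sketch. It assumes Original Flows are given as (start, end) pairs in milliseconds and that intervals are aligned to multiples of a fixed interval length; the names INTERVAL_MS, interval_index, and count_original_flows are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

INTERVAL_MS = 300_000  # assumed 5-minute aggregation interval (illustrative)


def interval_index(ts_ms: int) -> int:
    """Index of the aligned interval containing a timestamp."""
    return ts_ms // INTERVAL_MS


def count_original_flows(flows):
    """Return {interval index: counters} for (start_ms, end_ms) pairs.

    'present' is non-conservative: a Flow spanning several intervals
    is counted once in each of them.  'initiated' and 'completed' are
    conservative: each Flow is counted exactly once, in the interval
    containing its first or last packet, respectively.
    """
    counts = defaultdict(
        lambda: {"present": 0, "initiated": 0, "completed": 0}
    )
    for start_ms, end_ms in flows:
        first, last = interval_index(start_ms), interval_index(end_ms)
        for idx in range(first, last + 1):
            counts[idx]["present"] += 1
        counts[first]["initiated"] += 1
        counts[last]["completed"] += 1
    return dict(counts)
```

With one short Flow and one Flow spanning three intervals, the 'initiated' and 'completed' counters each sum to two across intervals, while the 'present' counters sum to four; this is why only the conservative counters can be summed on re-aggregation.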
7.3. Distinct Host Export
The following six Information Elements represent the distinct counts of source and destination network-layer addresses used to export distinct host counts reduced away during key aggregation.

7.3.1. distinctCountOfSourceIPAddress
Description: The count of distinct source IP address values for Original Flows contributing to this Aggregated Flow, without regard to IP version. This Information Element is preferred to the IP-version-specific counters, unless it is important to separate the counts by version.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementID: 378

7.3.2. distinctCountOfDestinationIPAddress
Description: The count of distinct destination IP address values for Original Flows contributing to this Aggregated Flow, without regard to IP version. This Information Element is preferred to the version-specific counters below, unless it is important to separate the counts by version.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementID: 379

7.3.3. distinctCountOfSourceIPv4Address
Description: The count of distinct source IPv4 address values for Original Flows contributing to this Aggregated Flow.
Abstract Data Type: unsigned32
Data Type Semantics: totalCounter
ElementID: 380

7.3.4. distinctCountOfDestinationIPv4Address
Description: The count of distinct destination IPv4 address values for Original Flows contributing to this Aggregated Flow.
Abstract Data Type: unsigned32
Data Type Semantics: totalCounter
ElementID: 381

7.3.5. distinctCountOfSourceIPv6Address
Description: The count of distinct source IPv6 address values for Original Flows contributing to this Aggregated Flow.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementID: 382

7.3.6. distinctCountOfDestinationIPv6Address
Description: The count of distinct destination IPv6 address values for Original Flows contributing to this Aggregated Flow.
Abstract Data Type: unsigned64
Data Type Semantics: totalCounter
ElementID: 383

7.4. Aggregate Counter Distribution Export
When exporting counters distributed among Aggregated Flows, as described in Section 5.1.1, the Exporting Process MAY export an Aggregate Counter Distribution Option Record for each Template describing Aggregated Flow records; this Options Template is described below. It uses the valueDistributionMethod Information Element, also defined below. In many cases, distribution is simple: the counters from Contributing Flows are accounted to the first Interval to which they contribute. This is the default situation, for which no Aggregate Counter Distribution Record is necessary. Aggregate Counter Distribution Records are only applicable in more exotic situations, such as using an Aggregation Interval smaller than the durations of Original Flows.
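As an illustration of how counter distribution differs between the default and the more exotic cases, the following Python sketch implements three of the methods enumerated by the valueDistributionMethod Information Element in Section 7.4.2: Start Interval (1), Simple Uniform Distribution (4), and Proportional Uniform Distribution (5). It is a sketch under stated assumptions, not a normative algorithm: timestamps are integers in milliseconds, intervals are aligned to multiples of the interval length, the function names are hypothetical, and assigning the integer-division remainder to the last interval is one possible way to avoid cumulative error.

```python
def start_interval(count, start, end, interval):
    """Method 1: the whole counter goes to the interval holding the start."""
    return {start // interval: count}


def simple_uniform(count, start, end, interval):
    """Method 4: split the counter evenly over all covered intervals.

    Assumption: the remainder of the integer division is added to the
    last interval so that the shares sum exactly to the original count.
    """
    first, last = start // interval, end // interval
    share, remainder = divmod(count, last - first + 1)
    shares = {idx: share for idx in range(first, last + 1)}
    shares[last] += remainder
    return shares


def proportional_uniform(count, start, end, interval):
    """Method 5: weight each interval by its overlap with the Flow.

    Returns fractional shares; rounding policy is left open here.
    """
    duration = end - start
    if duration == 0:  # single-packet Flow: nothing to spread
        return {start // interval: float(count)}
    shares = {}
    for idx in range(start // interval, end // interval + 1):
        overlap_start = max(start, idx * interval)
        overlap_end = min(end, (idx + 1) * interval)
        shares[idx] = count * (overlap_end - overlap_start) / duration
    return shares
```

For example, a Flow of 11200 octets covering three 300-second intervals yields shares of 3733, 3733, and 3734 octets under simple_uniform, whereas start_interval accounts all 11200 octets to the first interval.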
7.4.1. Aggregate Counter Distribution Options Template
This Options Template defines the Aggregate Counter Distribution Record, which allows the binding of a value distribution method to a Template ID. The scope is the Template ID, whose uniqueness, per [RFC7011], is local to the Transport Session and Observation Domain that generated the Template ID. This is used to signal to the Collecting Process how the counters were distributed. The fields are as below:

   +-----------------------------+-------------------------------------+
   | IE                          | Description                         |
   +-----------------------------+-------------------------------------+
   | templateId [scope]          | The Template ID of the Template     |
   |                             | defining the Aggregated Flows to    |
   |                             | which this distribution option      |
   |                             | applies. This Information Element   |
   |                             | MUST be defined as a Scope field.   |
   | valueDistributionMethod     | The method used to distribute the   |
   |                             | counters for the Aggregated Flows   |
   |                             | defined by the associated Template. |
   +-----------------------------+-------------------------------------+

7.4.2. valueDistributionMethod Information Element
Description: A description of the method used to distribute the counters from Contributing Flows into the Aggregated Flow records described by an associated scope, generally a Template. The method is deemed to apply to all the non-Key Information Elements in the referenced scope for which value distribution is a valid operation; if the originalFlowsInitiated and/or originalFlowsCompleted Information Elements appear in the Template, they are not subject to this distribution method, as they each imply their own distribution method. This is intended to be a complete set of possible value distribution methods; it is encoded as follows:

   +-------+-----------------------------------------------------------+
   | Value | Description                                               |
   +-------+-----------------------------------------------------------+
   | 0     | Unspecified: The counters for an Original Flow are        |
   |       | explicitly not distributed according to any other method  |
   |       | defined for this Information Element; use for arbitrary   |
   |       | distribution, or distribution algorithms not described by |
   |       | any other codepoint.                                      |
   +-------+-----------------------------------------------------------+
   | 1     | Start Interval: The counters for an Original Flow are     |
   |       | added to the counters of the appropriate Aggregated Flow  |
   |       | containing the start time of the Original Flow. This      |
   |       | should be assumed the default if value distribution       |
   |       | information is not available at a Collecting Process for  |
   |       | an Aggregated Flow.                                       |
   +-------+-----------------------------------------------------------+
   | 2     | End Interval: The counters for an Original Flow are added |
   |       | to the counters of the appropriate Aggregated Flow        |
   |       | containing the end time of the Original Flow.             |
   +-------+-----------------------------------------------------------+
   | 3     | Mid Interval: The counters for an Original Flow are added |
   |       | to the counters of a single appropriate Aggregated Flow   |
   |       | containing some timestamp between start and end time of   |
   |       | the Original Flow.                                        |
   +-------+-----------------------------------------------------------+
   | 4     | Simple Uniform Distribution: Each counter for an Original |
   |       | Flow is divided by the number of time intervals the       |
   |       | Original Flow covers (i.e., of appropriate Aggregated     |
   |       | Flows sharing the same Flow Key), and this number is      |
   |       | added to each corresponding counter in each Aggregated    |
   |       | Flow.                                                     |
   +-------+-----------------------------------------------------------+
   | 5     | Proportional Uniform Distribution: Each counter for an    |
   |       | Original Flow is divided by the number of time units the  |
   |       | Original Flow covers, to derive a mean count rate. This   |
   |       | mean count rate is then multiplied by the number of time  |
   |       | units in the intersection of the duration of the Original |
   |       | Flow and the time interval of each Aggregated Flow.       |
   |       | This is like simple uniform distribution, but accounts    |
   |       | for the fractional portions of a time interval covered by |
   |       | an Original Flow in the first and last time interval.     |
   +-------+-----------------------------------------------------------+
   | 6     | Simulated Process: Each counter of the Original Flow is   |
   |       | distributed among the intervals of the Aggregated Flows   |
   |       | according to some function the Intermediate Aggregation   |
   |       | Process uses based upon properties of Flows presumed to   |
   |       | be like the Original Flow. This is essentially an         |
   |       | assertion that the Intermediate Aggregation Process has   |
   |       | no direct packet timing information but is nevertheless   |
   |       | not using one of the other simpler distribution methods.  |
   |       | The Intermediate Aggregation Process specifically makes   |
   |       | no assertion as to the correctness of the simulation.     |
   +-------+-----------------------------------------------------------+
   | 7     | Direct: The Intermediate Aggregation Process has access   |
   |       | to the original packet timings from the packets making up |
   |       | the Original Flow, and uses these to distribute or        |
   |       | recalculate the counters.                                 |
   +-------+-----------------------------------------------------------+

Abstract Data Type: unsigned8
ElementID: 384

8. Examples
In these examples, the same data, described by the same Template, will be aggregated in multiple different ways; this illustrates the various functions that could be implemented by Intermediate Aggregation Processes. Templates are shown in IESpec format as introduced in [RFC7013]. The source data format is a simplified Flow: timestamps, traditional 5-tuple, and octet count; the Flow Key fields are the 5-tuple. The Template is shown in Figure 9.

   flowStartMilliseconds(152)[8]
   flowEndMilliseconds(153)[8]
   sourceIPv4Address(8)[4]{key}
   destinationIPv4Address(12)[4]{key}
   sourceTransportPort(7)[2]{key}
   destinationTransportPort(11)[2]{key}
   protocolIdentifier(4)[1]{key}
   octetDeltaCount(1)[8]

   Figure 9: Input Template for Examples

The data records given as input to the examples in this section are shown below; timestamps are given in H:MM:SS.sss format. In this and subsequent figures, flowStartMilliseconds is shown in H:MM:SS.sss format as 'start time', flowEndMilliseconds is shown in H:MM:SS.sss format as 'end time', sourceIPv4Address is shown as 'source ip4' with the following 'port' representing sourceTransportPort, destinationIPv4Address is shown as 'dest ip4' with the following 'port' representing destinationTransportPort, protocolIdentifier is shown as 'pt', and octetDeltaCount as 'oct'.

   start time |end time   |source ip4 |port |dest ip4      |port|pt|   oct
   9:00:00.138 9:00:00.138 192.0.2.2   47113 192.0.2.131    53   17    119
   9:00:03.246 9:00:03.246 192.0.2.2   22153 192.0.2.131    53   17     83
   9:00:00.478 9:00:03.486 192.0.2.2   52420 198.51.100.2   443   6   1637
   9:00:07.172 9:00:07.172 192.0.2.3   56047 192.0.2.131    53   17    111
   9:00:07.309 9:00:14.861 192.0.2.3   41183 198.51.100.67  80    6  16838
   9:00:03.556 9:00:19.876 192.0.2.2   17606 198.51.100.68  80    6  11538
   9:00:25.210 9:00:25.210 192.0.2.3   47113 192.0.2.131    53   17    119
   9:00:26.358 9:00:30.198 192.0.2.3   48458 198.51.100.133 80    6   2973
   9:00:29.213 9:01:00.061 192.0.2.4   61295 198.51.100.2   443   6   8350
   9:04:00.207 9:04:04.431 203.0.113.3 41256 198.51.100.133 80    6    778
   9:03:59.624 9:04:06.984 203.0.113.3 51662 198.51.100.3   80    6    883
   9:00:30.532 9:06:15.402 192.0.2.2   37581 198.51.100.2   80    6  15420
   9:06:56.813 9:06:59.821 203.0.113.3 52572 198.51.100.2   443   6   1637
   9:06:30.565 9:07:00.261 203.0.113.3 49914 198.51.100.133 80    6    561
   9:06:55.160 9:07:05.208 192.0.2.2   50824 198.51.100.2   443   6   1899
   9:06:49.322 9:07:05.322 192.0.2.3   34597 198.51.100.3   80    6   1284
   9:07:05.849 9:07:09.625 203.0.113.3 58907 198.51.100.4   80    6   2670
   9:10:45.161 9:10:45.161 192.0.2.4   22478 192.0.2.131    53   17     75
   9:10:45.209 9:11:01.465 192.0.2.4   49513 198.51.100.68  80    6   3374
   9:10:57.094 9:11:00.614 192.0.2.4   64832 198.51.100.67  80    6    138
   9:10:59.770 9:11:02.842 192.0.2.3   60833 198.51.100.69  443   6   2325
   9:02:18.390 9:13:46.598 203.0.113.3 39586 198.51.100.17  80    6  11200
   9:13:53.933 9:14:06.605 192.0.2.2   19638 198.51.100.3   80    6   2869
   9:13:02.864 9:14:08.720 192.0.2.3   40429 198.51.100.4   80    6  18289

   Figure 10: Input Data for Examples

8.1. Traffic Time Series per Source
Aggregating Flows by source IP address in time series (i.e., with a regular interval) can be used in subsequent heavy-hitter analysis and as a source parameter for statistical anomaly detection techniques. Here, the Intermediate Aggregation Process imposes an interval, aggregates the key to remove all key fields other than the source IP address, then combines the result into a stream of Aggregated Flows. The imposed interval of five minutes is longer than the majority of Flows; for those Flows crossing interval boundaries, the entire Flow is accounted to the interval containing the start time of the Flow.
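The three operations just described (interval imposition, key reduction to the source address, and combination) can be sketched in a few lines of Python. This is an illustrative sketch only, not a suggested implementation: the names to_ms, FLOWS, and time_series_per_source are hypothetical, and FLOWS carries only the start time, source address, and octet count of each record in Figure 10, since the remaining fields are discarded by key aggregation under the Start Interval policy.

```python
from collections import defaultdict

INTERVAL_MS = 300_000  # imposed interval of five minutes


def to_ms(hms: str) -> int:
    """Convert an H:MM:SS.sss timestamp to milliseconds since 0:00."""
    h, m, rest = hms.split(":")
    sec, ms = rest.split(".")
    return ((int(h) * 60 + int(m)) * 60 + int(sec)) * 1000 + int(ms)


# (start time, source ip4, oct) for each record of Figure 10
FLOWS = [
    ("9:00:00.138", "192.0.2.2", 119), ("9:00:03.246", "192.0.2.2", 83),
    ("9:00:00.478", "192.0.2.2", 1637), ("9:00:07.172", "192.0.2.3", 111),
    ("9:00:07.309", "192.0.2.3", 16838), ("9:00:03.556", "192.0.2.2", 11538),
    ("9:00:25.210", "192.0.2.3", 119), ("9:00:26.358", "192.0.2.3", 2973),
    ("9:00:29.213", "192.0.2.4", 8350), ("9:04:00.207", "203.0.113.3", 778),
    ("9:03:59.624", "203.0.113.3", 883), ("9:00:30.532", "192.0.2.2", 15420),
    ("9:06:56.813", "203.0.113.3", 1637), ("9:06:30.565", "203.0.113.3", 561),
    ("9:06:55.160", "192.0.2.2", 1899), ("9:06:49.322", "192.0.2.3", 1284),
    ("9:07:05.849", "203.0.113.3", 2670), ("9:10:45.161", "192.0.2.4", 75),
    ("9:10:45.209", "192.0.2.4", 3374), ("9:10:57.094", "192.0.2.4", 138),
    ("9:10:59.770", "192.0.2.3", 2325), ("9:02:18.390", "203.0.113.3", 11200),
    ("9:13:53.933", "192.0.2.2", 2869), ("9:13:02.864", "192.0.2.3", 18289),
]


def time_series_per_source(flows, interval_ms=INTERVAL_MS):
    """Sum octets per (interval start in ms, source address)."""
    totals = defaultdict(int)
    for start, src, octets in flows:
        # Interval imposition: floor the start time to the interval grid.
        interval_start = to_ms(start) // interval_ms * interval_ms
        # Key aggregation and combination: accumulate on the reduced key.
        totals[(interval_start, src)] += octets
    return dict(totals)
```

Calling time_series_per_source(FLOWS) produces ten Aggregated Flows; for instance, the entry for 192.0.2.2 in the interval starting at 9:00:00.000 is 28797 octets, which can be compared against the final Aggregated Flows given later in this section.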
In this example, the Partially Aggregated Flows after each conceptual operation in the Intermediate Aggregation Process are shown. These are meant to be illustrative of the conceptual operations only, and not to suggest an implementation (indeed, the example shown here would not necessarily be the most efficient method for performing these operations). Subsequent examples will omit the Partially Aggregated Flows for brevity.

The input to this process could be any Flow Record containing a source IP address and octet counter; consider for this example the Template and data from the introduction. The Intermediate Aggregation Process would then output records containing just timestamps, source IP, and octetDeltaCount, as in Figure 11.

   flowStartMilliseconds(152)[8]
   flowEndMilliseconds(153)[8]
   sourceIPv4Address(8)[4]
   octetDeltaCount(1)[8]

   Figure 11: Output Template for Time Series per Source

Assume the goal is to get 5-minute (300 s) time series of octet counts per source IP address. The aggregation operations would then be arranged as in Figure 12.

            Original Flows
                  |
                  V
      +-----------------------+
      | interval distribution |
      |  * impose uniform     |
      |    300s time interval |
      +-----------------------+
          |
          | Partially Aggregated Flows
          V
      +------------------------+
      |  key aggregation       |
      |   * reduce key to only |
      |     sourceIPv4Address  |
      +------------------------+
          |
          | Partially Aggregated Flows
          V
      +-------------------------+
      |  aggregate combination  |
      |   * sum octetDeltaCount |
      +-------------------------+
          |
          V
      Aggregated Flows

   Figure 12: Aggregation Operations for Time Series per Source

After applying the interval distribution step to the source data in Figure 10, only the time intervals have changed; the Partially Aggregated Flows are shown in Figure 13. Note that interval distribution follows the default Start Interval policy; that is, the entire Flow is accounted to the interval containing the Flow's start time.

   start time |end time   |source ip4 |port |dest ip4      |port|pt|   oct
   9:00:00.000 9:05:00.000 192.0.2.2   47113 192.0.2.131    53   17    119
   9:00:00.000 9:05:00.000 192.0.2.2   22153 192.0.2.131    53   17     83
   9:00:00.000 9:05:00.000 192.0.2.2   52420 198.51.100.2   443   6   1637
   9:00:00.000 9:05:00.000 192.0.2.3   56047 192.0.2.131    53   17    111
   9:00:00.000 9:05:00.000 192.0.2.3   41183 198.51.100.67  80    6  16838
   9:00:00.000 9:05:00.000 192.0.2.2   17606 198.51.100.68  80    6  11538
   9:00:00.000 9:05:00.000 192.0.2.3   47113 192.0.2.131    53   17    119
   9:00:00.000 9:05:00.000 192.0.2.3   48458 198.51.100.133 80    6   2973
   9:00:00.000 9:05:00.000 192.0.2.4   61295 198.51.100.2   443   6   8350
   9:00:00.000 9:05:00.000 203.0.113.3 41256 198.51.100.133 80    6    778
   9:00:00.000 9:05:00.000 203.0.113.3 51662 198.51.100.3   80    6    883
   9:00:00.000 9:05:00.000 192.0.2.2   37581 198.51.100.2   80    6  15420
   9:00:00.000 9:05:00.000 203.0.113.3 39586 198.51.100.17  80    6  11200
   9:05:00.000 9:10:00.000 203.0.113.3 52572 198.51.100.2   443   6   1637
   9:05:00.000 9:10:00.000 203.0.113.3 49914 198.51.100.133 80    6    561
   9:05:00.000 9:10:00.000 192.0.2.2   50824 198.51.100.2   443   6   1899
   9:05:00.000 9:10:00.000 192.0.2.3   34597 198.51.100.3   80    6   1284
   9:05:00.000 9:10:00.000 203.0.113.3 58907 198.51.100.4   80    6   2670
   9:10:00.000 9:15:00.000 192.0.2.4   22478 192.0.2.131    53   17     75
   9:10:00.000 9:15:00.000 192.0.2.4   49513 198.51.100.68  80    6   3374
   9:10:00.000 9:15:00.000 192.0.2.4   64832 198.51.100.67  80    6    138
   9:10:00.000 9:15:00.000 192.0.2.3   60833 198.51.100.69  443   6   2325
   9:10:00.000 9:15:00.000 192.0.2.2   19638 198.51.100.3   80    6   2869
   9:10:00.000 9:15:00.000 192.0.2.3   40429 198.51.100.4   80    6  18289

   Figure 13: Interval Imposition for Time Series per Source

After the key aggregation step, all Flow Keys except the source IP address have been discarded, as shown in Figure 14. This leaves duplicate Partially Aggregated Flows to be combined in the final operation.

   start time |end time   |source ip4 |octets
   9:00:00.000 9:05:00.000 192.0.2.2      119
   9:00:00.000 9:05:00.000 192.0.2.2       83
   9:00:00.000 9:05:00.000 192.0.2.2     1637
   9:00:00.000 9:05:00.000 192.0.2.3      111
   9:00:00.000 9:05:00.000 192.0.2.3    16838
   9:00:00.000 9:05:00.000 192.0.2.2    11538
   9:00:00.000 9:05:00.000 192.0.2.3      119
   9:00:00.000 9:05:00.000 192.0.2.3     2973
   9:00:00.000 9:05:00.000 192.0.2.4     8350
   9:00:00.000 9:05:00.000 203.0.113.3    778
   9:00:00.000 9:05:00.000 203.0.113.3    883
   9:00:00.000 9:05:00.000 192.0.2.2    15420
   9:00:00.000 9:05:00.000 203.0.113.3  11200
   9:05:00.000 9:10:00.000 203.0.113.3   1637
   9:05:00.000 9:10:00.000 203.0.113.3    561
   9:05:00.000 9:10:00.000 192.0.2.2     1899
   9:05:00.000 9:10:00.000 192.0.2.3     1284
   9:05:00.000 9:10:00.000 203.0.113.3   2670
   9:10:00.000 9:15:00.000 192.0.2.4       75
   9:10:00.000 9:15:00.000 192.0.2.4     3374
   9:10:00.000 9:15:00.000 192.0.2.4      138
   9:10:00.000 9:15:00.000 192.0.2.3     2325
   9:10:00.000 9:15:00.000 192.0.2.2     2869
   9:10:00.000 9:15:00.000 192.0.2.3    18289

   Figure 14: Key Aggregation for Time Series per Source

Aggregate combination sums the counters per key and interval; the summations of the first two keys and intervals are shown in detail in Figure 15.

     start time |end time   |source ip4 |octets
     9:00:00.000 9:05:00.000 192.0.2.2      119
     9:00:00.000 9:05:00.000 192.0.2.2       83
     9:00:00.000 9:05:00.000 192.0.2.2     1637
     9:00:00.000 9:05:00.000 192.0.2.2    11538
   + 9:00:00.000 9:05:00.000 192.0.2.2    15420
                                          -----
   = 9:00:00.000 9:05:00.000 192.0.2.2    28797

     9:00:00.000 9:05:00.000 192.0.2.3      111
     9:00:00.000 9:05:00.000 192.0.2.3    16838
     9:00:00.000 9:05:00.000 192.0.2.3      119
   + 9:00:00.000 9:05:00.000 192.0.2.3     2973
                                          -----
   = 9:00:00.000 9:05:00.000 192.0.2.3    20041

   Figure 15: Summation during Aggregate Combination

This can be applied to each set of Partially Aggregated Flows to produce the final Aggregated Flows that are shown in Figure 16, as exported by the Template in Figure 11.

   start time |end time   |source ip4 |octets
   9:00:00.000 9:05:00.000 192.0.2.2    28797
   9:00:00.000 9:05:00.000 192.0.2.3    20041
   9:00:00.000 9:05:00.000 192.0.2.4     8350
   9:00:00.000 9:05:00.000 203.0.113.3  12861
   9:05:00.000 9:10:00.000 192.0.2.2     1899
   9:05:00.000 9:10:00.000 192.0.2.3     1284
   9:05:00.000 9:10:00.000 203.0.113.3   4868
   9:10:00.000 9:15:00.000 192.0.2.2     2869
   9:10:00.000 9:15:00.000 192.0.2.3    20614
   9:10:00.000 9:15:00.000 192.0.2.4     3587

   Figure 16: Aggregated Flows for Time Series per Source

8.2. Core Traffic Matrix
Aggregating Flows by source and destination ASN in time series is used to generate core traffic matrices. The core traffic matrix provides a view of the state of the routes within a network, and it can be used for long-term planning of changes to network design based on traffic demand. Here, imposed time intervals are generally much longer than active Flow timeouts. The traffic matrix is reported in terms of octets, packets, and flows, as each of these values may have a subtly different effect on capacity planning.
This example demonstrates key aggregation using derived keys and Original Flow counting. While some Original Flows may be generated by Exporting Processes on forwarding devices, and therefore contain the bgpSourceAsNumber and bgpDestinationAsNumber Information Elements, Original Flows from Exporting Processes on dedicated measurement devices without routing data contain only a destinationIPv[46]Address. For these Flows, the Mediator must look up a next-hop AS from an IP-to-AS table, replacing source and destination addresses with ASNs. The table used in this example is shown in Figure 17. (Note that due to limited example address space, in this example we ignore the common practice of routing only blocks of /24 or larger.)

   prefix          |ASN
   192.0.2.0/25     64496
   192.0.2.128/25   64497
   198.51.100.0/24  64498
   203.0.113.0/24   64499

   Figure 17: Example ASN Map

The Template for Aggregated Flows produced by this example is shown in Figure 18.

   flowStartMilliseconds(152)[8]
   flowEndMilliseconds(153)[8]
   bgpSourceAsNumber(16)[4]
   bgpDestinationAsNumber(17)[4]
   octetDeltaCount(1)[8]

   Figure 18: Output Template for Traffic Matrix

Assume the goal is to get 60-minute time series of octet counts per source/destination ASN pair. The aggregation operations would then be arranged as in Figure 19.

            Original Flows
                  |
                  V
      +-----------------------+
      | interval distribution |
      |  * impose uniform     |
      |    3600s time interval|
      +-----------------------+
          |
          | Partially Aggregated Flows
          V
      +------------------------+
      |  key aggregation       |
      |   * reduce key to only |
      |     sourceIPv4Address +|
      |     destIPv4Address    |
      +------------------------+
          |
          V
      +------------------------+
      |  key aggregation       |
      |   * replace addresses  |
      |     with ASN from map  |
      +------------------------+
          |
          | Partially Aggregated Flows
          V
      +-------------------------+
      |  aggregate combination  |
      |   * sum octetDeltaCount |
      +-------------------------+
          |
          V
      Aggregated Flows

   Figure 19: Aggregation Operations for Traffic Matrix

After applying the interval distribution step to the source data in Figure 10, the Partially Aggregated Flows are shown in Figure 20. Note that the Flows are identical to those in the interval distribution step in the previous example, except the chosen interval (1 hour, 3600 seconds) is different; therefore, all the Flows fit into a single interval.

   start time |end time |source ip4 |port |dest ip4      |port|pt|   oct
   9:00:00     10:00:00  192.0.2.2   47113 192.0.2.131    53   17    119
   9:00:00     10:00:00  192.0.2.2   22153 192.0.2.131    53   17     83
   9:00:00     10:00:00  192.0.2.2   52420 198.51.100.2   443   6   1637
   9:00:00     10:00:00  192.0.2.3   56047 192.0.2.131    53   17    111
   9:00:00     10:00:00  192.0.2.3   41183 198.51.100.67  80    6  16838
   9:00:00     10:00:00  192.0.2.2   17606 198.51.100.68  80    6  11538
   9:00:00     10:00:00  192.0.2.3   47113 192.0.2.131    53   17    119
   9:00:00     10:00:00  192.0.2.3   48458 198.51.100.133 80    6   2973
   9:00:00     10:00:00  192.0.2.4   61295 198.51.100.2   443   6   8350
   9:00:00     10:00:00  203.0.113.3 41256 198.51.100.133 80    6    778
   9:00:00     10:00:00  203.0.113.3 51662 198.51.100.3   80    6    883
   9:00:00     10:00:00  192.0.2.2   37581 198.51.100.2   80    6  15420
   9:00:00     10:00:00  203.0.113.3 52572 198.51.100.2   443   6   1637
   9:00:00     10:00:00  203.0.113.3 49914 198.51.100.133 80    6    561
   9:00:00     10:00:00  192.0.2.2   50824 198.51.100.2   443   6   1899
   9:00:00     10:00:00  192.0.2.3   34597 198.51.100.3   80    6   1284
   9:00:00     10:00:00  203.0.113.3 58907 198.51.100.4   80    6   2670
   9:00:00     10:00:00  192.0.2.4   22478 192.0.2.131    53   17     75
   9:00:00     10:00:00  192.0.2.4   49513 198.51.100.68  80    6   3374
   9:00:00     10:00:00  192.0.2.4   64832 198.51.100.67  80    6    138
   9:00:00     10:00:00  192.0.2.3   60833 198.51.100.69  443   6   2325
   9:00:00     10:00:00  203.0.113.3 39586 198.51.100.17  80    6  11200
   9:00:00     10:00:00  192.0.2.2   19638 198.51.100.3   80    6   2869
   9:00:00     10:00:00  192.0.2.3   40429 198.51.100.4   80    6  18289

   Figure 20: Interval Imposition for Traffic Matrix

The next steps are to discard irrelevant key fields and to replace the source and destination addresses with source and destination ASNs in the map; the results of these key aggregation steps are shown in Figure 21.

   start time |end time |source ASN |dest ASN |octets
   9:00:00     10:00:00  AS64496     AS64497      119
   9:00:00     10:00:00  AS64496     AS64497       83
   9:00:00     10:00:00  AS64496     AS64498     1637
   9:00:00     10:00:00  AS64496     AS64497      111
   9:00:00     10:00:00  AS64496     AS64498    16838
   9:00:00     10:00:00  AS64496     AS64498    11538
   9:00:00     10:00:00  AS64496     AS64497      119
   9:00:00     10:00:00  AS64496     AS64498     2973
   9:00:00     10:00:00  AS64496     AS64498     8350
   9:00:00     10:00:00  AS64499     AS64498      778
   9:00:00     10:00:00  AS64499     AS64498      883
   9:00:00     10:00:00  AS64496     AS64498    15420
   9:00:00     10:00:00  AS64499     AS64498     1637
   9:00:00     10:00:00  AS64499     AS64498      561
   9:00:00     10:00:00  AS64496     AS64498     1899
   9:00:00     10:00:00  AS64496     AS64498     1284
   9:00:00     10:00:00  AS64499     AS64498     2670
   9:00:00     10:00:00  AS64496     AS64497       75
   9:00:00     10:00:00  AS64496     AS64498     3374
   9:00:00     10:00:00  AS64496     AS64498      138
   9:00:00     10:00:00  AS64496     AS64498     2325
   9:00:00     10:00:00  AS64499     AS64498    11200
   9:00:00     10:00:00  AS64496     AS64498     2869
   9:00:00     10:00:00  AS64496     AS64498    18289

   Figure 21: Key Aggregation for Traffic Matrix: Reduction and Replacement

Finally, aggregate combination sums the counters per key and interval. The resulting Aggregated Flows containing the traffic matrix, shown in Figure 22, are then exported using the Template in Figure 18. Note that these Aggregated Flows represent a sparse matrix: AS pairs for which no traffic was received have no corresponding record in the output.

   start time  end time  source ASN  dest ASN  octets
   9:00:00     10:00:00  AS64496     AS64497      507
   9:00:00     10:00:00  AS64496     AS64498    86934
   9:00:00     10:00:00  AS64499     AS64498    17729

   Figure 22: Aggregated Flows for Traffic Matrix

The output of this operation is suitable for re-aggregation: that is, traffic matrices from single links or Observation Points can be aggregated through the same interval imposition and aggregate combination steps in order to build a traffic matrix for an entire network.
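The key replacement and combination steps of this example, together with the re-aggregation of two matrices, can be sketched in Python using the standard ipaddress module. This is a minimal sketch under stated assumptions: the names ASN_MAP, lookup_asn, traffic_matrix, and merge_matrices are hypothetical, Flows are reduced to (source address, destination address, octets) tuples within a single interval, and addresses not covered by the map are keyed as None, a choice the text above does not prescribe.

```python
import ipaddress
from collections import defaultdict

# The example ASN map of Figure 17, as (network, ASN) pairs.
ASN_MAP = [
    (ipaddress.ip_network("192.0.2.0/25"), 64496),
    (ipaddress.ip_network("192.0.2.128/25"), 64497),
    (ipaddress.ip_network("198.51.100.0/24"), 64498),
    (ipaddress.ip_network("203.0.113.0/24"), 64499),
]


def lookup_asn(addr: str):
    """Return the ASN of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net.prefixlen, asn) for net, asn in ASN_MAP if ip in net]
    return max(matches)[1] if matches else None


def traffic_matrix(flows):
    """Sum octets per (source ASN, destination ASN) pair."""
    matrix = defaultdict(int)
    for src, dst, octets in flows:
        matrix[(lookup_asn(src), lookup_asn(dst))] += octets
    return dict(matrix)


def merge_matrices(*matrices):
    """Re-aggregate matrices covering the same interval by summation."""
    merged = defaultdict(int)
    for matrix in matrices:
        for pair, octets in matrix.items():
            merged[pair] += octets
    return dict(merged)
```

Because the result is a dictionary keyed by AS pair, it is naturally sparse: pairs for which no traffic was observed simply have no entry, and merge_matrices can combine the matrices of several Observation Points without any special handling of missing pairs.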
8.3. Distinct Source Count per Destination Endpoint
Aggregating Flows by destination address and port, and counting distinct sources aggregated away, can be used as part of passive service inventory and host characterization. This example shows aggregation as an analysis technique, performed on source data stored in an IPFIX File. As the Transport Session in this File is bounded, removal of all timestamp information allows summarization of the entire time interval contained within the File. Removal of timing information during interval imposition is equivalent to an infinitely long imposed time interval. This demonstrates both how infinite intervals work, and how unique counters work. The aggregation operations are summarized in Figure 23.

            Original Flows
                  |
                  V
      +-----------------------+
      | interval distribution |
      |  * discard timestamps |
      +-----------------------+
          |
          | Partially Aggregated Flows
          V
      +----------------------------+
      |  value aggregation         |
      |  * discard octetDeltaCount |
      +----------------------------+
          |
          | Partially Aggregated Flows
          V
      +----------------------------+
      | key aggregation            |
      |  * reduce key to only      |
      |    destIPv4Address +       |
      |    destTransportPort,      |
      |  * count distinct sources  |
      +----------------------------+
          |
          | Partially Aggregated Flows
          V
      +----------------------------------------------+
      | aggregate combination                        |
      |  * no-op (distinct sources already counted)  |
      +----------------------------------------------+
          |
          V
      Aggregated Flows

   Figure 23: Aggregation Operations for Source Count

The Template for Aggregated Flows produced by this example is shown in Figure 24.

   destinationIPv4Address(12)[4]
   destinationTransportPort(11)[2]
   distinctCountOfSourceIPAddress(378)[8]

   Figure 24: Output Template for Source Count

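The operations in Figure 23 reduce, in effect, to collecting the set of source addresses seen for each destination endpoint. A minimal Python sketch follows; the function name distinct_sources_per_endpoint is hypothetical, Flows are assumed to be given as (source address, destination address, destination port) tuples with timestamps and octet counts already discarded, and exact sets are used rather than the approximate counting discussed in Section 6.1.

```python
from collections import defaultdict


def distinct_sources_per_endpoint(flows):
    """Return {(destination address, destination port): distinct sources}.

    flows: iterable of (source address, destination address,
    destination port) tuples.
    """
    sources = defaultdict(set)
    for src, dst, dport in flows:
        # Key aggregation: keep only the destination endpoint as key,
        # remembering each source address that was aggregated away.
        sources[(dst, dport)].add(src)
    # The exported counter is the size of each set; no further
    # combination is needed.
    return {endpoint: len(srcs) for endpoint, srcs in sources.items()}
```

Keeping a set per endpoint is exact but uses memory proportional to the number of distinct sources; a resource-constrained implementation might substitute an approximate structure at the cost of exactness.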
Interval distribution, in this case, merely discards the timestamp information from the Original Flows in Figure 10, and as such is not shown. Likewise, the value aggregation step simply discards the octetDeltaCount value field. The key aggregation step reduces the key to the destinationIPv4Address and destinationTransportPort, counting the distinct source addresses. Since this is essentially the output of this aggregation function, the aggregate combination operation is a no-op; the resulting Aggregated Flows are shown in Figure 25.

   dest ip4      |port |dist src
   192.0.2.131    53           3
   198.51.100.2   80           1
   198.51.100.2   443          3
   198.51.100.67  80           2
   198.51.100.68  80           2
   198.51.100.133 80           2
   198.51.100.3   80           3
   198.51.100.4   80           2
   198.51.100.17  80           1
   198.51.100.69  443          1

   Figure 25: Aggregated Flows for Source Count

8.4. Traffic Time Series per Source with Counter Distribution
Returning to the example in Section 8.1, note that our source data contains some Flows with durations longer than the imposed interval of five minutes. The default method for dealing with such Flows is to account them to the interval containing the Flow's start time.

In this example, the same data is aggregated using the same arrangement of operations and the same output Template as in Section 8.1, but using a different counter distribution policy, Simple Uniform Distribution, as described in Section 5.1.1. In order to do this, the Exporting Process first exports the Aggregate Counter Distribution Options Template, as in Figure 26.

   templateId(145)[2]{scope}
   valueDistributionMethod(384)[1]

   Figure 26: Aggregate Counter Distribution Options Template

This Template is followed by an Aggregate Counter Distribution Record described by this Template; assuming the output Template in Figure 11 has ID 257, this record would appear as in Figure 27.
template ID | value distribution method
        257   4 (simple uniform)

      Figure 27: Aggregate Counter Distribution Record

Following metadata export, the aggregation steps follow as before. However, two long Flows are distributed across multiple intervals in the interval imposition step, as indicated with "*" in Figure 28. Note the uneven distribution of the three-interval, 11200-octet Flow into three Partially Aggregated Flows of 3733, 3733, and 3734 octets; this ensures no cumulative error is injected by the interval distribution step.

start time |end time   |source ip4  |port |dest ip4      |port|pt|  oct
9:00:00.000 9:05:00.000 192.0.2.2    47113 192.0.2.131    53   17   119
9:00:00.000 9:05:00.000 192.0.2.2    22153 192.0.2.131    53   17    83
9:00:00.000 9:05:00.000 192.0.2.2    52420 198.51.100.2   443  6   1637
9:00:00.000 9:05:00.000 192.0.2.3    56047 192.0.2.131    53   17   111
9:00:00.000 9:05:00.000 192.0.2.3    41183 198.51.100.67  80   6  16838
9:00:00.000 9:05:00.000 192.0.2.2    17606 198.51.100.68  80   6  11538
9:00:00.000 9:05:00.000 192.0.2.3    47113 192.0.2.131    53   17   119
9:00:00.000 9:05:00.000 192.0.2.3    48458 198.51.100.133 80   6   2973
9:00:00.000 9:05:00.000 192.0.2.4    61295 198.51.100.2   443  6   8350
9:00:00.000 9:05:00.000 203.0.113.3  41256 198.51.100.133 80   6    778
9:00:00.000 9:05:00.000 203.0.113.3  51662 198.51.100.3   80   6    883
9:00:00.000 9:05:00.000 192.0.2.2    37581 198.51.100.2   80   6   7710*
9:00:00.000 9:05:00.000 203.0.113.3  39586 198.51.100.17  80   6   3733*
9:05:00.000 9:10:00.000 203.0.113.3  52572 198.51.100.2   443  6   1637
9:05:00.000 9:10:00.000 203.0.113.3  49914 198.51.100.133 80   6    561
9:05:00.000 9:10:00.000 192.0.2.2    50824 198.51.100.2   443  6   1899
9:05:00.000 9:10:00.000 192.0.2.3    34597 198.51.100.3   80   6   1284
9:05:00.000 9:10:00.000 203.0.113.3  58907 198.51.100.4   80   6   2670
9:05:00.000 9:10:00.000 192.0.2.2    37581 198.51.100.2   80   6   7710*
9:05:00.000 9:10:00.000 203.0.113.3  39586 198.51.100.17  80   6   3733*
9:10:00.000 9:15:00.000 192.0.2.4    22478 192.0.2.131    53   17    75
9:10:00.000 9:15:00.000 192.0.2.4    49513 198.51.100.68  80   6   3374
9:10:00.000 9:15:00.000 192.0.2.4    64832 198.51.100.67  80   6    138
9:10:00.000 9:15:00.000 192.0.2.3    60833 198.51.100.69  443  6   2325
9:10:00.000 9:15:00.000 192.0.2.2    19638 198.51.100.3   80   6   2869
9:10:00.000 9:15:00.000 192.0.2.3    40429 198.51.100.4   80   6  18289
9:10:00.000 9:15:00.000 203.0.113.3  39586 198.51.100.17  80   6   3734*

  Figure 28: Distributed Interval Imposition for Time Series per Source

Subsequent steps are as in Section 8.1; the results, to be exported using the Template shown in Figure 11, are shown in Figure 29, with Aggregated Flows differing from the example in Section 8.1 indicated by "*".
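The uneven 3733/3733/3734 split noted for Figure 28 can be reproduced with a short Python sketch. The function name distribute_uniform is hypothetical, and assigning the leftover octets to the last intervals is an assumption chosen to match the figure; this document does not mandate where the remainder is placed, only that no cumulative error is introduced.

```python
def distribute_uniform(octets, n_intervals):
    """Split a counter evenly across n_intervals so that the parts
    always sum to the original value (no cumulative error)."""
    base, remainder = divmod(octets, n_intervals)
    # Assumption: the last `remainder` intervals each get one extra octet.
    first_extra = n_intervals - remainder
    return [base + (1 if i >= first_extra else 0) for i in range(n_intervals)]


three_interval_flow = distribute_uniform(11200, 3)
two_interval_flow = distribute_uniform(15420, 2)
```

Using integer division with an explicit remainder, rather than rounding each share independently, is what guarantees that the distributed counters sum back to the counter of the Original Flow.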
start time |end time   |source ip4  |octets
9:00:00.000 9:05:00.000 192.0.2.2     21087*
9:00:00.000 9:05:00.000 192.0.2.3     20041
9:00:00.000 9:05:00.000 192.0.2.4      8350
9:00:00.000 9:05:00.000 203.0.113.3    5394*
9:05:00.000 9:10:00.000 192.0.2.2      9609*
9:05:00.000 9:10:00.000 192.0.2.3      1284
9:05:00.000 9:10:00.000 203.0.113.3    8601*
9:10:00.000 9:15:00.000 192.0.2.2      2869
9:10:00.000 9:15:00.000 192.0.2.3     20614
9:10:00.000 9:15:00.000 192.0.2.4      3587
9:10:00.000 9:15:00.000 203.0.113.3    3734*

   Figure 29: Aggregated Flows for Time Series per Source
                 with Counter Distribution

9. Security Considerations
This document specifies the operation of an Intermediate Aggregation Process with the IPFIX protocol; the Security Considerations for the protocol itself in Section 11 of [RFC7011] therefore apply. In the common case that aggregation is performed on a Mediator, the Security Considerations for Mediators in Section 9 of [RFC6183] apply as well.

As mentioned in Section 3, certain aggregation operations may tend to have an anonymizing effect on Flow data by obliterating sensitive identifiers. Aggregation may also be combined with anonymization within a Mediator, or as part of a chain of Mediators, to further leverage this effect. In any case in which an Intermediate Aggregation Process is applied as part of a data anonymization or protection scheme, or is used together with anonymization as described in [RFC6235], the Security Considerations in Section 9 of [RFC6235] apply.

10. IANA Considerations
This document specifies the creation of new IPFIX Information Elements in the IPFIX Information Element registry [IANA-IPFIX], as defined in Section 7 above. IANA has assigned Information Element numbers to these Information Elements, and entered them into the registry.

11. Acknowledgments
Special thanks to Elisa Boschi for early work on the concepts laid out in this document. Thanks to Lothar Braun, Christian Henke, and Rahul Patel for their reviews and valuable feedback, with special thanks to Paul Aitken for his multiple detailed reviews. This work is materially supported by the European Union Seventh Framework Programme under grant agreement 257315 (DEMONS).

12. References
12.1. Normative References
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
           IANA Considerations Section in RFCs", BCP 26, RFC 5226,
           May 2008.

[RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
           "Specification of the IP Flow Information Export (IPFIX)
           Protocol for the Exchange of Flow Information", STD 77,
           RFC 7011, September 2013.

12.2. Informative References
[RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
           "Requirements for IP Flow Information Export (IPFIX)",
           RFC 3917, October 2004.

[RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
           "Architecture for IP Flow Information Export", RFC 5470,
           March 2009.

[RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
           Flow Information Export (IPFIX) Applicability", RFC 5472,
           March 2009.

[RFC5476]  Claise, B., Johnson, A., and J. Quittek, "Packet Sampling
           (PSAMP) Protocol Specifications", RFC 5476, March 2009.

[RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
           Wagner, "Specification of the IP Flow Information Export
           (IPFIX) File Format", RFC 5655, October 2009.

[RFC5982]  Kobayashi, A. and B. Claise, "IP Flow Information Export
           (IPFIX) Mediation: Problem Statement", RFC 5982,
           August 2010.

[RFC6183]  Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi,
           "IP Flow Information Export (IPFIX) Mediation: Framework",
           RFC 6183, April 2011.

[RFC6235]  Boschi, E. and B. Trammell, "IP Flow Anonymization
           Support", RFC 6235, May 2011.

[RFC6728]  Muenz, G., Claise, B., and P. Aitken, "Configuration Data
           Model for the IP Flow Information Export (IPFIX) and
           Packet Sampling (PSAMP) Protocols", RFC 6728,
           October 2012.

[RFC7012]  Claise, B., Ed. and B. Trammell, Ed., "Information Model
           for IP Flow Information Export (IPFIX)", RFC 7012,
           September 2013.

[RFC7013]  Trammell, B. and B. Claise, "Guidelines for Authors and
           Reviewers of IP Flow Information Export (IPFIX)
           Information Elements", BCP 184, RFC 7013, September 2013.

[RFC7014]  D'Antonio, S., Zseby, T., Henke, C., and L. Peluso, "Flow
           Selection Techniques", RFC 7014, September 2013.

[IANA-IPFIX]
           IANA, "IP Flow Information Export (IPFIX) Entities",
           <http://www.iana.org/assignments/ipfix>.

[IPFIX-MED-PROTO]
           Claise, B., Kobayashi, A., and B. Trammell, "Operation of
           the IP Flow Information Export (IPFIX) Protocol on IPFIX
           Mediators", Work in Progress, July 2013.
Authors' Addresses
Brian Trammell
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland

Phone: +41 44 632 70 13
EMail: trammell@tik.ee.ethz.ch

Arno Wagner
Consecom AG
Bleicherweg 64a
8002 Zurich
Switzerland

EMail: arno@wagner.name

Benoit Claise
Cisco Systems, Inc.
De Kleetlaan 6a b1
1831 Diegem
Belgium

Phone: +32 2 704 5622
EMail: bclaise@cisco.com