This section describes, in detail, how the method operates. A special emphasis is given to the measurement of packet loss, which represents the core application of the method, but applicability to delay and jitter measurements is also considered.
The basic idea is to virtually split traffic flows into consecutive blocks: each block represents a measurable entity unambiguously recognizable by all network devices along the path. By counting the number of packets in each block and comparing the values measured by different network devices along the path, it is possible to measure if packet loss occurred in any single block between any two points.
As discussed in the previous section, a simple way to create the blocks is to "color" the traffic (two colors are sufficient) so that packets belonging to alternate consecutive blocks will have different colors. Whenever the color changes, the previous block terminates and the new one begins. Hence, all the packets belonging to the same block will have the same color, and packets of different consecutive blocks will have different colors. The number of packets in each block depends on the criterion used to create the blocks:
- if the color is switched after a fixed number of packets, then each block will contain the same number of packets (except for any losses); and
- if the color is switched according to a fixed timer, then the number of packets may be different in each block depending on the packet rate.
The use of a fixed timer for the creation of blocks is REQUIRED when implementing this specification. Switching after a fixed number of packets is an additional possibility, but its detailed specification is out of scope. An example of its application is in [EXPLICIT-FLOW-MEASUREMENTS].
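As an illustration of timer-based block creation, the following minimal Python sketch derives the color from the current time. The function names, the shared time origin, and the choice of A for even-numbered blocks are assumptions made for the example; they are not defined by this specification.

```python
def block_index(t: float, block_duration: float) -> int:
    """Index of the block containing time t (t is measured from a shared origin)."""
    if block_duration <= 0:
        raise ValueError("block_duration must be positive")
    return int(t // block_duration)


def color_at(t: float, block_duration: float) -> str:
    """Color applied at time t. Assumption: even-numbered blocks are A, odd ones B."""
    return "A" if block_index(t, block_duration) % 2 == 0 else "B"


# With 10-second blocks, the color flips every 10 seconds.
colors = [color_at(t, 10.0) for t in (0.0, 9.9, 10.0, 19.9, 20.0)]
```

Because the color depends only on time, the number of packets in each block follows the packet rate, as noted above.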
The following figure shows how a flow appears when it is split into traffic blocks with colored packets.
A: packet with A coloring
B: packet with B coloring
         |           |           |           |           |
         |           |    Traffic Flow       |           |
  ------------------------------------------------------------------->
  BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
  ------------------------------------------------------------------->
     ... |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
         |           |           |           |           |
Figure 2 shows how the method can be used to measure link packet loss between two adjacent nodes.
Referring to the figure, let's assume we want to monitor the packet loss on the link between two routers: router R1 and router R2. According to the method, the traffic is colored alternately with two different colors: A and B. Whenever the color changes, the transition generates a sort of square-wave signal, as depicted in the following figure.
Color A   ----------+           +-----------+           +----------
                    |           |           |           |
Color B             +-----------+           +-----------+
           Block n       ...       Block 3     Block 2     Block 1
         <---------> <---------> <---------> <---------> <--------->
                             Traffic Flow
===========================================================>
Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
===========================================================>
Traffic coloring can be done by R1 itself if the traffic is not already colored. R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface: C(A)R1 counts the packets with color A and C(B)R1 counts those with color B. As long as traffic is colored as A, only counter C(A)R1 will be incremented, while C(B)R1 is not incremented; conversely, when the traffic is colored as B, only C(B)R1 is incremented. C(A)R1 and C(B)R1 can be used as reference values to determine the packet loss from R1 to any other measurement point down the path. Router R2, similarly, will need two counters on its ingress interface, C(A)R2 and C(B)R2, to count the packets received on that interface and colored with A and B, respectively. When an A block ends, it is possible to compare C(A)R1 and C(A)R2 and calculate the packet loss within the block; similarly, when the successive B block terminates, it is possible to compare C(B)R1 with C(B)R2, and so on, for every successive block.
Likewise, by using two counters on the R2 egress interface, it is possible to count the packets sent out of the R2 interface and use them as reference values to calculate the packet loss from R2 to any measurement point downstream from R2.
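The counter comparison can be sketched in a few lines of Python. This is only an illustration: packets are reduced to their color, the two `Counter` objects stand in for C(A)R1/C(B)R1 and C(A)R2/C(B)R2, and the helper names are hypothetical.

```python
from collections import Counter


def count_colors(packet_colors):
    """Count packets per color; each packet is represented only by its color."""
    return Counter(packet_colors)


def block_loss(upstream: Counter, downstream: Counter, color: str) -> int:
    """Packets of the given color counted upstream but not downstream."""
    return upstream[color] - downstream[color]


# One A block followed by one B block, as seen on R1 egress and R2 ingress.
r1_egress = ["A"] * 10 + ["B"] * 8
r2_ingress = ["A"] * 9 + ["B"] * 8   # one A-colored packet was lost on the link

c_r1 = count_colors(r1_egress)
c_r2 = count_colors(r2_ingress)
loss_a = block_loss(c_r1, c_r2, "A")
loss_b = block_loss(c_r1, c_r2, "B")
```

In a deployment, each comparison would be made only after the corresponding block has ended and its counters are stable.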
The length of the blocks can be chosen large enough to simplify the collection and the comparison of measures taken by different network devices. It is preferable not to read the counters immediately after the color switch: some packets could arrive out of order and increment the counter associated with the previous block (color), so it is worth waiting for some time. A safe choice is to wait L/2 time units (where L is the duration of each block) after the color switch before reading the counter of the previous color (Section 5). The drawback is that the longer the duration of the block, the less frequently the measurement can be taken.
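The timing of a counter read can be illustrated as follows. The sketch assumes that block k spans the interval [k*L, (k+1)*L) from a shared time origin and that one counter per color is used, so the counter of a given color starts incrementing again when that color returns one block later. Both the function name and the "latest" bound are assumptions derived from the two-color alternation, not requirements of this document.

```python
from typing import Tuple


def counter_read_window(block_idx: int, block_duration: float) -> Tuple[float, float]:
    """Return (recommended, latest) read times for the counter of block block_idx.

    recommended: L/2 after the color switch that ends the block.
    latest: the next switch, when the same color is in use again (assumption).
    """
    switch_time = (block_idx + 1) * block_duration
    recommended = switch_time + block_duration / 2
    latest = switch_time + block_duration
    return recommended, latest


# Block 0 with L = 10 s ends at t = 10 s; its counter is read at t = 15 s.
window = counter_read_window(0, 10.0)
```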
Two different strategies that can be used when implementing the method are:
- flow-based: the flow-based strategy is used when well-defined traffic flows need to be monitored. According to this strategy, only the specified flow is colored. Counters for packet-loss measurements can be instantiated for each single flow, or for the set as a whole, depending on the desired granularity. With this approach, it is necessary to know in advance the path followed by flows that are subject to measurement. Path rerouting and traffic load balancing need to be taken into account.
- link-based: measurements are performed on all the traffic on a link-by-link basis. The link could be a physical link or a logical link. Counters could be instantiated for the traffic as a whole or for each traffic class (in case it is desired to monitor each class separately), but in the second case, two counters are needed for each class.
The flow-based strategy is REQUIRED when implementing this specification. It requires the identification of the flow to be monitored and the discovery of the path followed by the selected flow. It is possible to monitor a single flow or multiple flows grouped together, but in this case, measurement is consistent only if all the flows in the group follow the same path. Moreover, if a measurement is performed by grouping many flows, it is not possible to determine exactly which flow was affected by packet loss. In order to have measurements per single flow, it is necessary to configure counters for each specific flow. Once the flows to be monitored have been identified, it is necessary to configure the monitoring on the proper nodes. Configuring the monitoring means configuring the rule to intercept the traffic and configuring the counters to count the packets. To have just an end-to-end monitoring, it is sufficient to enable the monitoring on the first- and last-hop routers of the path: the mechanism is completely transparent to intermediate nodes and independent of the path followed by traffic flows. On the contrary, to monitor the flow on a hop-by-hop basis along its whole path, it is necessary to enable the monitoring on every node from the source to the destination. In case the exact path followed by the flow is not known a priori (i.e., the flow has multiple paths to reach the destination), it is necessary to enable the monitoring on every path: counters on interfaces traversed by the flow will report the packet count, whereas counters on other interfaces will remain at zero.
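The "rule plus counters" configuration can be pictured with the following Python sketch. It assumes, purely for illustration, that a flow is identified by a 5-tuple and that a node keeps one pair of color counters per monitored flow; `FlowKey` and `FlowMonitor` are hypothetical names, and the addresses are documentation examples.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow identifier (5-tuple)."""
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


class FlowMonitor:
    """Per-flow color counters on one interface of one node."""

    def __init__(self, monitored_flows):
        self.monitored = set(monitored_flows)           # the interception rule
        self.counters = defaultdict(lambda: {"A": 0, "B": 0})

    def on_packet(self, key: FlowKey, color: str) -> None:
        # Only packets of monitored flows are counted; others are ignored.
        if key in self.monitored:
            self.counters[key][color] += 1


flow = FlowKey("192.0.2.1", "198.51.100.7", 17, 5000, 6000)
other = FlowKey("192.0.2.9", "198.51.100.7", 17, 5001, 6000)

monitor = FlowMonitor([flow])
for color in ["A", "A", "B"]:
    monitor.on_packet(flow, color)
monitor.on_packet(other, "A")
```

Keeping counters per flow, as here, is what allows loss to be attributed to a specific flow; a grouped measurement would use a single pair of counters for the whole set.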
The same principle used to measure packet loss can be applied also to one-way delay measurement. There are two methodologies, as described hereinafter.
Note that, for all the one-way delay alternatives described in the next sections, by summing the one-way delays of the two directions of a path, it is always possible to measure the two-way delay (round-trip "virtual" delay). The Network Time Protocol (NTP) [RFC 5905] or the IEEE 1588 Precision Time Protocol (PTP) [IEEE-1588] (as discussed in the previous section) can be used for the timestamp formats depending on the needed precision.
The alternation of colors can be used as a time reference to calculate the delay. Whenever the color changes (which means that a new block has started), a network device can store the timestamp of the first packet of the new block; that timestamp can be compared with the timestamp of the same packet on a second router to compute packet delay. When looking at Figure 2, R1 stores the timestamp TS(A1)R1 when it sends the first packet of block 1 (A-colored), the timestamp TS(B2)R1 when it sends the first packet of block 2 (B-colored), and so on for every other block. R2 performs the same operation on the receiving side, recording TS(A1)R2, TS(B2)R2, and so on. Since the timestamps refer to specific packets (the first packet of each block), in the case where no packet loss or misordering exists, we would be sure that timestamps compared to compute delay refer to the same packets. By comparing TS(A1)R1 with TS(A1)R2 (and similarly TS(B2)R1 with TS(B2)R2, and so on), it is possible to measure the delay between R1 and R2. In order to have more measurements, it is possible to take and store more timestamps, referring to other packets within each block. The number of measurements could be increased by considering multiple packets in the block; for instance, a timestamp could be taken every N packets, thus generating multiple delay measurements. Taking this to the limit, in principle, the delay could be measured for each packet by taking and comparing the corresponding timestamps (possible but impractical from an implementation point of view).
In order to coherently compare timestamps collected on different routers, the clocks on the network nodes MUST be in sync (Section 5). Furthermore, a measurement is valid only if no packet loss occurs and if packet misordering can be avoided; otherwise, the first packet of a block on R1 could be different from the first packet of the same block on R2 (for instance, if that packet is lost between R1 and R2 or it arrives after the next one). Since packet misordering is generally undetectable, it is not possible to check whether the first packet on R1 is the same on R2, and this is part of the intrinsic error in this measurement.
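The first-packet comparison can be sketched as follows. The example assumes synchronized clocks, integer timestamps in microseconds, and dictionaries keyed by block number; these representations and the function name are illustrative choices. As explained above, the result is meaningful only when the first packet of each block is the same packet on both routers.

```python
def first_packet_delays(ts_r1: dict, ts_r2: dict) -> dict:
    """One-way delay per block from first-packet timestamps (same units as input).

    Blocks for which either router has no timestamp are skipped.
    """
    return {blk: ts_r2[blk] - ts_r1[blk] for blk in ts_r1 if blk in ts_r2}


# TS(A1)R1, TS(B2)R1 and TS(A1)R2, TS(B2)R2, in microseconds (made-up values).
ts_r1 = {1: 1_000_000, 2: 11_000_000}
ts_r2 = {1: 1_000_350, 2: 11_000_420}

delays = first_packet_delays(ts_r1, ts_r2)
```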
The method previously described for measuring the delay is sensitive to out-of-order reception of packets. In order to overcome this problem, an approach based on the concept of mean delay can be considered. The mean delay is calculated by considering the average arrival time of the packets within a single block. The network device locally stores a timestamp for each packet received within a single block: summing all the timestamps and dividing by the total number of packets received, the average arrival time for that block of packets can be calculated. By subtracting the average arrival times of two adjacent devices, it is possible to calculate the mean delay between those nodes. This method greatly reduces the number of timestamps that have to be collected (only one per block for each network device), and it is robust to out-of-order packets with only a small error introduced in case of packet loss. But, when computing the mean delay, the measurement error could be increased by accumulating the measurement error of a lot of packets. Additionally, it only gives one measure for the duration of the block, and it doesn't give the minimum, maximum, and median delay values [RFC 6703]. This limitation could be overcome by reducing the duration of the block (for instance, from minutes to seconds), which implies a highly optimized implementation of the method. For this reason, the mean delay calculation may not be so viable in some cases.
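A minimal Python sketch of the mean delay calculation is given below. It keeps a running sum and a packet count per block, which is one possible way to obtain the average arrival time; the class name and the sample timestamps are assumptions made for the example. The packets are deliberately received out of order at R2 to show that the result is unaffected.

```python
class BlockAverager:
    """Accumulates timestamps of the packets of one block on one device."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def on_packet(self, timestamp: int) -> None:
        self.total += timestamp
        self.count += 1

    def average(self) -> float:
        """Average arrival time for the block."""
        return self.total / self.count


r1 = BlockAverager()
r2 = BlockAverager()
for ts in (0, 10, 20):        # send times on R1
    r1.on_packet(ts)
for ts in (5, 25, 15):        # receive times on R2, last two packets swapped
    r2.on_packet(ts)

mean_delay = r2.average() - r1.average()
```

Only the two averages (or the sums and counts) need to be exported per block, which matches the reduction in collected timestamps described above.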
As mentioned above, the Single-Marking methodology for one-way delay measurement has some limitations, since it is sensitive to out-of-order reception of packets, and even the mean delay calculation is limited because it doesn't give information about the delay value's distribution for the duration of the block. Actually, it may be useful to have not only the mean delay but also the minimum, maximum, and median delay values and, in wider terms, to know more about the statistical distribution of delay values. So, in order to have more information about the delay and to overcome out-of-order issues, a different approach can be introduced, and it is based on a Double-Marking methodology.
Basically, the idea is to use the first marking to create the alternate flow and, within this colored flow, a second marking to select the packets for measuring delay/jitter. The first marking is needed for packet loss and may be used for mean delay measurement. The second marking creates a new set of marked packets that are fully identified over the network so that a network device can store the timestamps of these packets. These timestamps can be compared with the timestamps of the same packets on the next node to compute packet delay values for each packet. The number of measurements can be easily increased by changing the frequency of the second marking. But the frequency of the second marking must not be too high in order to avoid out-of-order issues. Between packets with the second marking, there should be an adequate time gap to avoid out-of-order issues and also to have a number of measurement packets that are rate independent. This gap may be, at the minimum, the mean network delay calculated with the previous methodology. Therefore, it is possible to choose a proper time gap to guarantee a fixed number of double-marked packets uniformly spaced in each block. If packets with the second marking are lost, it is easy to recognize the loss since the number of double-marked packets is known for each block. Based on the spacing between these packets, it can also be possible to understand which packet of the second marking sequence has been lost and perform the measurements only for the remaining packets. But this may be complicated if more packets are lost. In this case, an implementation may simply discard the delay measurements for the corrupted block and proceed with the next block.
An efficient and robust mode is to select a single packet with the second marking for each block; in this way, there is no time gap to consider between the double-marked packets to avoid their reordering. In addition, it is also easier to identify the only double-marked packet in each block and skip the delay measurement for the block if it is lost.
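The handling of double-marked packets within one block can be sketched as follows. The example assumes that each node records the timestamps of double-marked packets in order, that the number of double-marked packets per block is known in advance, and that a block with a mismatching count is simply discarded (the simplest of the options described above). The function name and values are illustrative.

```python
from typing import List, Optional


def double_marked_delays(ts_r1: List[int], ts_r2: List[int],
                         expected: int) -> Optional[List[int]]:
    """Per-packet delays for the double-marked packets of one block.

    Returns None when either node saw a number of double-marked packets
    different from the expected one, i.e. the block is discarded.
    """
    if len(ts_r1) != expected or len(ts_r2) != expected:
        return None
    return [rx - tx for tx, rx in zip(ts_r1, ts_r2)]


complete = double_marked_delays([0, 100, 200], [30, 135, 228], expected=3)
with_loss = double_marked_delays([0, 100, 200], [30, 228], expected=3)
```

With a single double-marked packet per block, `expected` is 1 and the check reduces to verifying that the packet arrived.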
The Double-Marking methodology can also be used to obtain more detailed delay statistics, e.g., percentiles, variance, and median delay values. Indeed, a subset of batch packets is selected for extensive delay calculation by using the second marking, and it is possible to perform a detailed analysis on these double-marked packets. It is worth noting that there are classic algorithms for median and variance calculation, but they are out of the scope of this document. The conventional range (maximum-minimum) should be avoided for several reasons, including the instability of the maximum delay, which is strongly influenced by outliers. In this regard, Section 6.5 of RFC 5481 highlights how the 99.9th percentile of delay and delay variation is more helpful to performance planners.
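The following Python sketch shows how such statistics could be derived from a set of per-packet delays. It uses the standard `statistics` module for median and variance and a simple nearest-rank percentile, which is one of several common percentile definitions and is chosen here only for illustration. The delay values are made up, with one outlier included on purpose.

```python
import math
import statistics


def percentile_nearest_rank(samples, p: float):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


delays = [30, 35, 28, 31, 29, 33, 120, 30, 32, 31]   # one outlier (120)

median = statistics.median(delays)
variance = statistics.pvariance(delays)
p90 = percentile_nearest_rank(delays, 90)
delay_range = max(delays) - min(delays)               # dominated by the outlier
```

In this sample, the range is driven almost entirely by the single outlier, while the median and the 90th percentile stay close to the bulk of the values.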
Similar to one-way delay measurement (both for Single Marking and Double Marking), the method can also be used to measure the inter-arrival jitter. We refer to the definition in [RFC 3393]. The alternation of colors, for a Single-Marking Method, can be used as a time reference to measure delay variations. In case of Double Marking, the time reference is given by the second-marked packets. Considering the example depicted in Figure 2, R1 stores the timestamp TS(A)R1 whenever it sends the first packet of a block, and R2 stores the timestamp TS(B)R2 whenever it receives the first packet of a block. The inter-arrival jitter can be easily derived from one-way delay measurement, by evaluating the delay variation of consecutive samples.
The concept of mean delay can also be applied to delay variation, by evaluating the average variation of the interval between consecutive packets of the flow from R1 to R2.
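Deriving delay variation from consecutive one-way delay samples can be sketched as below. The sample values and the function name are illustrative; each result is the difference between a delay sample and the one preceding it.

```python
def delay_variations(delays):
    """Delay variation between each pair of consecutive one-way delay samples."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


# One-way delays of consecutive samples, e.g. one per block, in microseconds.
one_way_delays = [350, 420, 390]
variations = delay_variations(one_way_delays)
```

The same function can be applied to per-block mean delays to obtain the variation of the mean delay between consecutive blocks.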