6. Traffic Benchmarking Methodology
The traffic benchmarking methodology uses the test setup from Section 1.2 and metrics defined in Section 4.

Each test SHOULD compare the network device's internal statistics (available via command-line management interface, SNMP, etc.) to the measured metrics defined in Section 4. This evaluates the accuracy of the internal traffic management counters under individual test conditions and capacity test conditions, as defined in Sections 4.1 and 4.2. This comparison is not intended to compare real-time statistics, but rather the cumulative statistics reported after the test has completed and the device counters have updated (it is common for device counters to update after an interval of 10 seconds or more).

From a device-configuration standpoint, scheduling and shaping functionality can be applied to logical ports (e.g., Link Aggregation (LAG)). This would result in the same scheduling and shaping configuration being applied to all of the member physical ports. The focus of this document is only on tests at a physical-port level.

The following sections provide the objective, procedure, metrics, and reporting format for each test. For all test steps, the following global parameters must be specified:

Test Runs (Tr): The number of times the test needs to be run to ensure accurate and repeatable results. The recommended value is a minimum of 10.

Test Duration (Td): The duration of a test iteration, expressed in seconds. The recommended minimum value is 60 seconds.

The variability in the test results MUST be measured between test runs, and if the variation is characterized as a significant portion of the measured values, the next step may be to revise the methods to achieve better consistency.
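As an illustration of the run-to-run variability check, the following minimal Python sketch compares repeated measurements of one metric across Tr runs. The function name and the use of the coefficient of variation as the variability statistic are assumptions for illustration; the methodology only requires that variability between runs be measured.

   # Minimal sketch of a run-to-run variability check (illustrative only).
   # Assumption: variability is summarized as the coefficient of variation
   # (stdev / mean); the methodology does not mandate a specific statistic.

   from statistics import mean, stdev

   def run_variability(measurements):
       """measurements: one value of a metric (e.g., PD in ms) per test run."""
       avg = mean(measurements)
       cov = stdev(measurements) / avg if avg else float("inf")
       return avg, cov

   # Example: Packet Delay (ms) from Tr = 10 runs (hypothetical values)
   pd_per_run = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4, 9.8, 10.1, 10.2, 10.0]
   avg, cov = run_variability(pd_per_run)
   print(f"mean PD = {avg:.2f} ms, coefficient of variation = {cov:.1%}")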
6.1. Policing Tests

A policer is defined as the entity performing the policing function. The intent of the policing tests is to verify the policer performance (i.e., the CIR/CBS and EIR/EBS parameters). The tests will verify that the network device can handle the CIR with CBS and the EIR with EBS, and will use back-to-back packet-testing concepts as described in [RFC2544] (but adapted to burst size algorithms and terminology). Also, [MEF-14], [MEF-19], and [MEF-37] provide some bases for specific components of this test. The burst hunt algorithm defined in Section 5.1.1 can also be used to automate the measurement of the CBS value.

The tests are divided into two (2) sections: individual policer tests and then full-capacity policing tests. It is important to benchmark the basic functionality of the individual policer first and then proceed to the fully rated capacity of the device. This capacity may include the number of policing policies per device and the number of policers simultaneously active across all ports.

6.1.1. Policer Individual Tests
Objective: Test a policer as defined by [RFC4115] or [MEF-10.3], depending upon the equipment's specification. In addition to verifying that the policer allows the specified CBS and EBS bursts to pass, the policer test MUST verify that the policer will remark or drop excess packets while passing traffic at the specified CBS/EBS values.

Test Summary: Policing tests should use stateless traffic. Stateful TCP test traffic will generally be adversely affected by a policer in the absence of traffic shaping. So, while TCP traffic could be used, it is more accurate to benchmark a policer with stateless traffic.

As an example of a policer as defined by [RFC4115], consider a CBS/EBS of 64 KB and a CIR/EIR of 100 Mbps on a 1 GigE physical link (in color-blind mode). A stateless traffic burst of 64 KB would be sent into the policer at the GigE rate. This equates to an approximately 0.512-millisecond burst time (64 KB at 1 GigE). The traffic generator must space these bursts to ensure that the aggregate throughput does not exceed the CIR. The Ti between the bursts would equal CBS * 8 / CIR = 5.12 milliseconds in this example (a short computation sketch follows the reporting format below).

Test Metrics: The metrics defined in Section 4.1 (BSA, LP, OOS, PD, and PDV) SHALL be measured at the egress port and recorded.

Procedure:

1. Configure the DUT policing parameters for the desired CIR/EIR and CBS/EBS values to be tested.

2. Configure the tester to generate a stateless traffic burst equal to CBS and an interval equal to Ti (CBS in bits / CIR).
3. Compliant Traffic Test: Generate bursts of CBS + EBS traffic into the policer ingress port, and measure the metrics defined in Section 4.1 (BSA, LP, OOS, PD, and PDV) at the egress port and across the entire Td (default 60-second duration).

4. Excess Traffic Test: Generate bursts of greater than CBS + EBS bytes into the policer ingress port, and verify that the policer only allowed the BSA bytes to exit the egress. The excess burst MUST be recorded; the recommended value is 1000 bytes.

Additional tests beyond the simple color-blind example might include color-aware mode, configurations where EIR is greater than CIR, etc.

Reporting Format: The policer individual report MUST contain all results for each CIR/EIR/CBS/EBS test run. A recommended format is as follows:

***********************************************************

Test Configuration Summary: Tr, Td

DUT Configuration Summary: CIR, EIR, CBS, EBS

The results table should contain entries for each test run, as follows (Test #1 to Test #Tr):

- Compliant Traffic Test: BSA, LP, OOS, PD, and PDV

- Excess Traffic Test: BSA

***********************************************************
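As referenced in the Test Summary above, the burst time and transmission interval for the 64 KB / 100 Mbps / 1 GigE example can be computed directly from the definitions. The following minimal Python sketch is illustrative only; 64 KB is interpreted as 64,000 bytes, consistent with the 0.512 ms burst-time figure above.

   # Burst time and transmission interval (Ti) for the policer example
   # (illustrative sketch; 64 KB taken as 64,000 bytes to match the
   # 0.512 ms figure in the text).

   CBS_BYTES = 64_000          # CBS/EBS of "64 KB"
   LINK_BPS = 1_000_000_000    # 1 GigE ingress link
   CIR_BPS = 100_000_000       # CIR of 100 Mbps

   burst_time_s = CBS_BYTES * 8 / LINK_BPS   # time to send one CBS burst
   ti_s = CBS_BYTES * 8 / CIR_BPS            # Ti = CBS * 8 / CIR

   print(f"burst time = {burst_time_s * 1000:.3f} ms")  # 0.512 ms
   print(f"Ti         = {ti_s * 1000:.2f} ms")          # 5.12 ms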
6.1.2. Policer Capacity Tests

Objective: The intent of the capacity tests is to verify the policer performance in a scaled environment with multiple ingress customer policers on multiple physical ports. This test will benchmark the maximum number of active policers as specified by the device manufacturer.

Test Summary: The specified policing function capacity is generally expressed in terms of the number of policers active on each individual physical port as well as the number of unique policer rates that are utilized. For all of the capacity tests, the benchmarking test procedure and reporting format described in Section 6.1.1 for a single policer MUST be applied to each of the physical-port policers.

For example, a Layer 2 switching device may specify that each of the 32 physical ports can be policed using a pool of policing service policies. The device may carry a single customer's traffic on each physical port, and a single policer is instantiated per physical port. Another possibility is that a single physical port may carry multiple customers, in which case many customer flows would be policed concurrently on an individual physical port (separate policers per customer on an individual port).

Test Metrics: The metrics defined in Section 4.1 (BSA, LP, OOS, PD, and PDV) SHALL be measured at the egress port and recorded.

The following sections provide the specific test scenarios, procedures, and reporting formats for each policer capacity test.

6.1.2.1. Maximum Policers on Single Physical Port
Test Summary: The first policer capacity test will benchmark a single physical port, with maximum policers on that physical port.

Assume multiple categories of ingress policers at rates r1, r2, ..., rn. There are multiple customers on a single physical port. Each customer could be represented by a single-tagged VLAN, a double-tagged VLAN, a Virtual Private LAN Service (VPLS) instance, etc. Each customer is mapped to a different policer. Each of the policers can be of rates r1, r2, ..., rn.

An example configuration would be

- Y1 customers, policer rate r1
- Y2 customers, policer rate r2
- Y3 customers, policer rate r3
...
- Yn customers, policer rate rn
Some bandwidth on the physical port is dedicated for other traffic (i.e., other than customer traffic); this includes network control protocol traffic. There is a separate policer for the other traffic. Typical deployments have three categories of policers; there may be some deployments with more or fewer than three categories of ingress policers.

Procedure:

1. Configure the DUT policing parameters for the desired CIR/EIR and CBS/EBS values for each policer rate (r1-rn) to be tested.

2. Configure the tester to generate a stateless traffic burst equal to CBS and an interval equal to Ti (CBS in bits / CIR) for each customer stream (Y1-Yn). The encapsulation for each customer must also be configured according to the service tested (VLAN, VPLS, IP mapping, etc.).

3. Compliant Traffic Test: Generate bursts of CBS + EBS traffic into the policer ingress port for each customer traffic stream, and measure the metrics defined in Section 4.1 (BSA, LP, OOS, PD, and PDV) at the egress port for each stream and across the entire Td (default 60-second duration).

4. Excess Traffic Test: Generate bursts of greater than CBS + EBS bytes into the policer ingress port for each customer traffic stream, and verify that the policer only allowed the BSA bytes to exit the egress for each stream. The excess burst MUST be recorded; the recommended value is 1000 bytes.

Reporting Format: The policer individual report MUST contain all results for each CIR/EIR/CBS/EBS test run, per customer traffic stream. A recommended format is as follows:

*****************************************************************

Test Configuration Summary: Tr, Td

Customer Traffic Stream Encapsulation: Map each stream to VLAN, VPLS, IP address

DUT Configuration Summary per Customer Traffic Stream: CIR, EIR, CBS, EBS
The results table should contain entries for each test run, as follows (Test #1 to Test #Tr):

- Customer Stream Y1-Yn (see note) Compliant Traffic Test: BSA, LP, OOS, PD, and PDV

- Customer Stream Y1-Yn (see note) Excess Traffic Test: BSA

*****************************************************************

Note: For each test run, there will be two (2) rows for each customer stream: the Compliant Traffic Test result and the Excess Traffic Test result.
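To make the per-customer setup concrete, the following Python sketch builds the test matrix of customer streams, encapsulations, and per-stream Ti values that steps 1 and 2 above would be driven from. The class structure, names, and values are illustrative assumptions, not part of the methodology.

   # Illustrative construction of the per-customer policer test matrix
   # (names and example values are assumptions, not part of the methodology).

   from dataclasses import dataclass

   @dataclass
   class CustomerStream:
       name: str
       encapsulation: str   # e.g., "VLAN 100", "VPLS instance 7"
       cir_bps: int
       cbs_bytes: int

       @property
       def ti_s(self) -> float:
           # Transmission interval between CBS bursts: Ti = CBS * 8 / CIR
           return self.cbs_bytes * 8 / self.cir_bps

   # Example: two customers at rate r1, one customer at rate r2
   streams = [
       CustomerStream("cust-1", "VLAN 100", cir_bps=100_000_000, cbs_bytes=64_000),
       CustomerStream("cust-2", "VLAN 200", cir_bps=100_000_000, cbs_bytes=64_000),
       CustomerStream("cust-3", "VLAN 300", cir_bps=50_000_000, cbs_bytes=32_000),
   ]

   for s in streams:
       print(f"{s.name} ({s.encapsulation}): Ti = {s.ti_s * 1000:.2f} ms")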
6.1.2.2. Single Policer on All Physical Ports

Test Summary: The second policer capacity test involves a single policer function per physical port, with all physical ports active. In this test, there is a single policer per physical port. The policer can have one of the rates r1, r2, ..., rn. All of the physical ports in the networking device are active.

Procedure: The procedure for this test is identical to the procedure listed in Section 6.1.1. The configured parameters must be reported per port, and the test report must include results per measured egress port.

6.1.2.3. Maximum Policers on All Physical Ports
Test Summary: The third policer capacity test is a combination of the first and second capacity tests, i.e., maximum policers active per physical port and all physical ports active.

Procedure: The procedure for this test is identical to the procedure listed in Section 6.1.2.1. The configured parameters must be reported per port, and the test report must include per-stream results per measured egress port.
6.2. Queue/Scheduler Tests
Queues and traffic scheduling are closely related, in that a queue's priority dictates the manner in which the traffic scheduler transmits packets out of the egress port. Since device queues/buffers are generally an egress function, this test framework will discuss testing at the egress (although the technique can be applied to ingress-side queues).

Similar to the policing tests, these tests are divided into two sections: individual queue/scheduler function tests and then full-capacity tests.

6.2.1. Queue/Scheduler Individual Tests
The various types of scheduling techniques include FIFO, Strict Priority (SP) queuing, and Weighted Fair Queuing (WFQ), along with other variations. This test framework recommends testing with a minimum of these three techniques, although benchmarking other device-scheduling algorithms is left to the discretion of the tester.

6.2.1.1. Testing Queue/Scheduler with Stateless Traffic
Objective: Verify that the configured queue and scheduling technique can handle stateless traffic bursts up to the queue depth.

Test Summary: A network device queue is memory based, unlike a policing function, which is token or credit based. However, the same concepts from Section 6.1 can be applied to testing network device queues.

The device's network queue should be configured to the desired size in KB (i.e., Queue Length (QL)), and then stateless traffic should be transmitted to test this QL.

A queue should be able to handle repetitive bursts with the transmission gaps proportional to the Bottleneck Bandwidth (BB). The transmission gap is referred to here as the transmission interval (Ti). The Ti can be defined for the traffic bursts and is based on the QL and BB of the egress interface:

Ti = QL * 8 / BB
Note that this equation is similar to the Ti required for transmission into a policer (QL = CBS, BB = CIR). Note also that the burst hunt algorithm defined in Section 5.1.1 can be used to automate the measurement of the queue value.

The stateless traffic burst SHALL be transmitted at the link speed and spaced within the transmission interval (Ti). The metrics defined in Section 4.1 SHALL be measured at the egress port and recorded; the primary intent is to verify the BSA and verify that no packets are dropped.

The scheduling function must also be characterized to benchmark the device's ability to schedule the queues according to the priority. An example would be two levels of priority that include SP and FIFO queuing. Under a flow load greater than the egress port speed, the higher-priority packets should be transmitted without drops (and should also maintain low latency), while the lower-priority (or best-effort) queue may be dropped.

Test Metrics: The metrics defined in Section 4.1 (BSA, LP, OOS, PD, and PDV) SHALL be measured at the egress port and recorded.

Procedure:

1. Configure the DUT QL and scheduling technique parameters (FIFO, SP, etc.).

2. Configure the tester to generate a stateless traffic burst equal to QL and an interval equal to Ti (QL in bits / BB).

3. Generate bursts of QL traffic into the DUT, and measure the metrics defined in Section 4.1 (LP, OOS, PD, and PDV) at the egress port and across the entire Td (default 60-second duration).

Reporting Format: The Queue/Scheduler Stateless Traffic individual report MUST contain all results for each QL/BB test run. A recommended format is as follows:

****************************************************************

Test Configuration Summary: Tr, Td

DUT Configuration Summary: Scheduling technique (i.e., FIFO, SP, WFQ, etc.), BB, and QL
The results table should contain entries for each test run, as follows (Test #1 to Test #Tr):

- LP, OOS, PD, and PDV

****************************************************************
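As an illustration of step 2 of the procedure above, the following Python sketch computes the transmission interval from Ti = QL * 8 / BB. The example QL and BB values are assumptions chosen to match the stateful example in the next section.

   # Transmission interval for queue tests: Ti = QL * 8 / BB
   # (illustrative sketch; the QL and BB values are example assumptions).

   def queue_ti(ql_bytes: int, bb_bps: int) -> float:
       """Seconds between QL-sized bursts so the queue can fully drain."""
       return ql_bytes * 8 / bb_bps

   # Example: QL = 32 KB queue draining onto a 100 Mbps bottleneck
   ql_bytes = 32_000
   bb_bps = 100_000_000
   print(f"Ti = {queue_ti(ql_bytes, bb_bps) * 1000:.2f} ms")  # 2.56 ms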
6.2.1.2. Testing Queue/Scheduler with Stateful Traffic

Objective: Verify that the configured queue and scheduling technique can handle stateful traffic bursts up to the queue depth.

Test Background and Summary: To provide a more realistic benchmark and to test queues in Layer 4 devices such as firewalls, stateful traffic testing is recommended for the queue tests. Stateful traffic tests will also utilize the Network Delay Emulator (NDE) from the network setup configuration in Section 1.2.

The BDP of the TCP test traffic must be calibrated to the QL of the device queue. Referencing [RFC6349], the BDP is equal to:

BB * RTT / 8 (in bytes)

The NDE must be configured to an RTT value that is large enough to allow the BDP to be greater than QL. An example test scenario is defined below:

- Ingress link = GigE
- Egress link = 100 Mbps (BB)
- QL = 32 KB

RTT(min) = QL * 8 / BB and would equal 2.56 ms (and the BDP = 32 KB)

In this example, one (1) TCP connection with a window size / SSB of 32 KB would be required to test the QL of 32 KB. This Bulk Transfer Test can be accomplished using iperf, as described in Appendix A.
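The calibration above can be checked numerically. The following minimal Python sketch is illustrative; 32 KB is interpreted as 32,000 bytes, consistent with the 2.56 ms figure.

   # BDP calibration for the stateful queue test (illustrative sketch;
   # 32 KB taken as 32,000 bytes to match the 2.56 ms figure).

   BB_BPS = 100_000_000   # egress bottleneck bandwidth (100 Mbps)
   QL_BYTES = 32_000      # configured queue length ("32 KB")

   # Minimum RTT so that BDP >= QL: RTT(min) = QL * 8 / BB
   rtt_min_s = QL_BYTES * 8 / BB_BPS
   print(f"RTT(min) = {rtt_min_s * 1000:.2f} ms")   # 2.56 ms

   # BDP at that RTT: BDP = BB * RTT / 8 (bytes); equals QL here
   bdp_bytes = BB_BPS * rtt_min_s / 8
   print(f"BDP      = {bdp_bytes / 1000:.0f} KB")   # 32 KB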
Two types of TCP tests MUST be performed: the Bulk Transfer Test and the Micro Burst Test Pattern, as documented in Appendix B. The Bulk Transfer Test only bursts during the TCP Slow Start (or Congestion Avoidance) state, while the Micro Burst Test Pattern emulates application-layer bursting, which may occur at any time during the TCP connection.

Other types of tests SHOULD include the following: simple web sites, complex web sites, business applications, email, and SMB/CIFS (Common Internet File System) file copy (all of which are also documented in Appendix B).

Test Metrics: The test results will be recorded per the stateful metrics defined in Section 4.2 -- primarily the TCP Test Pattern Execution Time (TTPET), TCP Efficiency, and Buffer Delay.

Procedure:

1. Configure the DUT QL and scheduling technique parameters (FIFO, SP, etc.).

2. Configure the test generator* with a profile of an emulated application traffic mixture.

- The application mixture MUST be defined in terms of percentage of the total bandwidth to be tested.

- The rate of transmission for each application within the mixture MUST also be configurable.

* To ensure repeatable results, the test generator MUST be capable of generating precise TCP test patterns for each application specified.

3. Generate application traffic between the ingress (client side) and egress (server side) ports of the DUT, and measure the metrics (TTPET, TCP Efficiency, and Buffer Delay) per application stream and at the ingress and egress ports (across the entire Td, default 60-second duration).

A couple of items require clarification concerning application measurements: an application session may consist of a single TCP connection or multiple TCP connections. If an application session utilizes a single TCP connection, the application throughput/metrics have a 1-1 relationship to the TCP connection measurements.
If an application session (e.g., an HTTP-based application) utilizes multiple TCP connections, then all of the TCP connections are aggregated in the application throughput measurement/metrics for that application.

Then, there is the case of multiple instances of an application session (i.e., multiple FTP sessions emulating multiple clients). In this situation, the test should measure/record each FTP application session independently, tabulating the minimum, maximum, and average for all FTP sessions.

Finally, application throughput measurements are based on Layer 4 TCP throughput and do not include retransmitted bytes. The TCP Efficiency metric MUST be measured during the test, because it provides a measure of "goodput" during each test.

Reporting Format: The Queue/Scheduler Stateful Traffic individual report MUST contain all results for each traffic scheduler and QL/BB test run. A recommended format is as follows:

******************************************************************

Test Configuration Summary: Tr, Td

DUT Configuration Summary: Scheduling technique (i.e., FIFO, SP, WFQ, etc.), BB, and QL

Application Mixture and Intensities: These are the percentages configured for each application type.

The results table should contain entries for each test run, with minimum, maximum, and average per application session, as follows (Test #1 to Test #Tr):

- Throughput (bps) and TTPET for each application session

- Bytes In and Bytes Out for each application session

- TCP Efficiency and Buffer Delay for each application session

******************************************************************
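As an illustration of the per-session tabulation described above, the following small Python sketch aggregates multiple TCP connections into their application sessions and tabulates the minimum, maximum, and average across sessions. The data structure and the throughput values are assumptions for illustration only.

   # Illustrative aggregation of application-session measurements
   # (structure and example values are assumptions, not part of the
   # methodology).

   # Each application session may consist of several TCP connections;
   # per-session throughput aggregates all of its connections.
   sessions = {
       "ftp-1": [40_000_000, 35_000_000],          # bps per TCP connection
       "ftp-2": [30_000_000],
       "ftp-3": [25_000_000, 20_000_000, 5_000_000],
   }

   per_session_bps = {name: sum(conns) for name, conns in sessions.items()}

   values = list(per_session_bps.values())
   print(f"min = {min(values)} bps, max = {max(values)} bps, "
         f"avg = {sum(values) / len(values):.0f} bps")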
6.2.2. Queue/Scheduler Capacity Tests
Objective: The intent of these capacity tests is to benchmark queue/scheduler performance in a scaled environment with multiple queues/schedulers active on multiple egress physical ports. These tests will benchmark the maximum number of queues and schedulers as specified by the device manufacturer. Each priority in the system will map to a separate queue.

Test Metrics: The metrics defined in Section 4.1 (BSA, LP, OOS, PD, and PDV) SHALL be measured at the egress port and recorded.

The following sections provide the specific test scenarios, procedures, and reporting formats for each queue/scheduler capacity test.

6.2.2.1. Multiple Queues, Single Port Active
For the first queue/scheduler capacity test, multiple queues per port will be tested on a single physical port. In this case, all of the queues (typically eight) are active on a single physical port. Traffic from multiple ingress physical ports is directed to the same egress physical port. This will cause oversubscription on the egress physical port.

There are many types of priority schemes and combinations of priorities that are managed by the scheduler. The following sections specify the priority schemes that should be tested.

6.2.2.1.1. Strict Priority on Egress Port
Test Summary: For this test, SP scheduling on the egress physical port should be tested, and the benchmarking methodologies specified in Sections 6.2.1.1 (stateless) and 6.2.1.2 (stateful) (procedure, metrics, and reporting format) should be applied here. For a given priority, each ingress physical port should get a fair share of the egress physical-port bandwidth.
Since this is a capacity test, the configuration and report results format (see Sections 6.2.1.1 and 6.2.1.2) MUST also include:

Configuration:

- The number of physical ingress ports active during the test

- The classification marking (DSCP, VLAN, etc.) for each physical ingress port

- The traffic rate for stateless traffic and the traffic rate/mixture for stateful traffic for each physical ingress port

Report Results:

- For each ingress port traffic stream, the achieved throughput rate and metrics at the egress port

6.2.2.1.2. Strict Priority + WFQ on Egress Port
Test Summary: For this test, SP and WFQ should be enabled simultaneously in the scheduler, but on a single egress port. The benchmarking methodologies specified in Sections 6.2.1.1 (stateless) and 6.2.1.2 (stateful) (procedure, metrics, and reporting format) should be applied here. Additionally, the egress port bandwidth-sharing among weighted queues should be proportional to the assigned weights (the sketch following the report example below illustrates the expected proportional shares). For a given priority, each ingress physical port should get a fair share of the egress physical-port bandwidth.

Since this is a capacity test, the configuration and report results format (see Sections 6.2.1.1 and 6.2.1.2) MUST also include:

Configuration:

- The number of physical ingress ports active during the test

- The classification marking (DSCP, VLAN, etc.) for each physical ingress port

- The traffic rate for stateless traffic and the traffic rate/mixture for stateful traffic for each physical ingress port
Report Results:

- For each ingress port traffic stream, the achieved throughput rate and metrics at each queue of the egress port (both the SP queue and the WFQ)

Example:

- Egress Port SP Queue: throughput and metrics for ingress streams 1-n

- Egress Port WFQ: throughput and metrics for ingress streams 1-n
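Since the weighted queues should share bandwidth in proportion to their assigned weights, the expected per-queue throughput can be precomputed for comparison against the reported results. The following minimal Python sketch is illustrative; the port speed, SP traffic rate, and WFQ weights are assumptions.

   # Expected egress bandwidth split under SP + WFQ (illustrative sketch;
   # the port speed, SP traffic rate, and WFQ weights are assumptions).

   PORT_BPS = 1_000_000_000  # 1 GigE egress port
   sp_bps = 200_000_000      # offered strict-priority traffic (served first)
   wfq_weights = {"q1": 4, "q2": 2, "q3": 1}

   # SP traffic is serviced first; the WFQ queues share the remainder
   # in proportion to their weights.
   remainder = PORT_BPS - sp_bps
   total_weight = sum(wfq_weights.values())
   for name, w in wfq_weights.items():
       share = remainder * w / total_weight
       print(f"{name}: expected ~{share / 1e6:.0f} Mbps")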
6.2.2.2. Single Queue per Port, All Ports Active

Test Summary: Traffic from multiple ingress physical ports is directed to the same egress physical port. This will cause oversubscription on the egress physical port. Also, the same amount of traffic is directed to each egress physical port.

The benchmarking methodologies specified in Sections 6.2.1.1 (stateless) and 6.2.1.2 (stateful) (procedure, metrics, and reporting format) should be applied here. Each ingress physical port should get a fair share of the egress physical-port bandwidth. Additionally, each egress physical port should receive the same amount of traffic.

Since this is a capacity test, the configuration and report results format (see Sections 6.2.1.1 and 6.2.1.2) MUST also include:

Configuration:

- The number of ingress ports active during the test

- The number of egress ports active during the test

- The classification marking (DSCP, VLAN, etc.) for each physical ingress port

- The traffic rate for stateless traffic and the traffic rate/mixture for stateful traffic for each physical ingress port
Report Results:

- For each egress port, the achieved throughput rate and metrics at the egress port queue for each ingress port stream

Example:

- Egress Port 1: throughput and metrics for ingress streams 1-n

- Egress Port n: throughput and metrics for ingress streams 1-n
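One way to check the fair-share expectations stated above is Jain's fairness index over the per-ingress-stream throughputs. This index is not specified by the methodology; the sketch below is an illustrative assumption, and the throughput values are hypothetical.

   # Illustrative fairness check over per-stream throughputs using
   # Jain's fairness index (one common way to quantify "fair share";
   # not mandated by this methodology).

   def jain_index(throughputs):
       """Returns 1.0 for a perfectly even split; 1/n for maximally unfair."""
       n = len(throughputs)
       total = sum(throughputs)
       return total * total / (n * sum(x * x for x in throughputs))

   # Example: per-ingress-stream throughput (bps) at one egress port
   streams_bps = [240e6, 250e6, 255e6, 245e6]
   print(f"Jain's fairness index = {jain_index(streams_bps):.3f}")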
6.2.2.3. Multiple Queues per Port, All Ports Active

Test Summary: Traffic from multiple ingress physical ports is directed to all queues of each egress physical port. This will cause oversubscription on the egress physical ports. Also, the same amount of traffic is directed to each egress physical port.

The benchmarking methodologies specified in Sections 6.2.1.1 (stateless) and 6.2.1.2 (stateful) (procedure, metrics, and reporting format) should be applied here. For a given priority, each ingress physical port should get a fair share of the egress physical-port bandwidth. Additionally, each egress physical port should receive the same amount of traffic.

Since this is a capacity test, the configuration and report results format (see Sections 6.2.1.1 and 6.2.1.2) MUST also include:

Configuration:

- The number of physical ingress ports active during the test

- The classification marking (DSCP, VLAN, etc.) for each physical ingress port

- The traffic rate for stateless traffic and the traffic rate/mixture for stateful traffic for each physical ingress port

Report Results:

- For each egress port, the achieved throughput rate and metrics at each egress port queue for each ingress port stream
Example:

- Egress Port 1, SP Queue: throughput and metrics for ingress streams 1-n
- Egress Port 2, WFQ: throughput and metrics for ingress streams 1-n
...
- Egress Port n, SP Queue: throughput and metrics for ingress streams 1-n
- Egress Port n, WFQ: throughput and metrics for ingress streams 1-n