Network Working Group                                           R. Dietz
Request for Comments: 4150                                    Hifn, Inc.
Category: Standards Track                                        R. Cole
                                                                 JHU/APL
                                                             August 2005

Transport Performance Metrics MIB

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract
This memo defines a portion of the Management Information Base (MIB) for use with network management protocols in the Internet community. In particular, it describes managed objects used for monitoring selectable performance metrics and statistics derived from the monitoring of network packets and sub-application level transactions. The metrics can be defined through reference to existing IETF, ITU, and other standards organizations' documents. The monitoring covers both passive and active traffic generation sources.
Table of Contents
1. The Internet-Standard Management Framework
2. Overview
   2.1. Terms
   2.2. Report Aggregation
   2.3. Structure of the MIB
   2.4. Statistics for Aggregation of Data: Conventions
   2.5. Relationship to the Remote Monitoring MIB
   2.6. Relationship to RMON2-MIB Protocol Identifier Reference
   2.7. Relationship to Standards-Based Performance Metrics
   2.8. Relationship to Application Performance Measurement MIB
3. Statistics Perspective
   3.1. Statistics Structure
   3.2. Statistics Analysis
4. Definitions
5. Acknowledgements
6. Security Considerations
7. Normative References
8. Informative References

1. The Internet-Standard Management Framework
For a detailed overview of the documents that describe the current Internet-Standard Management Framework, please refer to section 7 of RFC 3410 [RFC3410].

Managed objects are accessed via a virtual information store, termed the Management Information Base or MIB. MIB objects are generally accessed through the Simple Network Management Protocol (SNMP). Objects in the MIB are defined using the mechanisms defined in the Structure of Management Information (SMI). This memo specifies a MIB module that is compliant with the SMIv2, which is described in STD 58, RFC 2578 [RFC2578], STD 58, RFC 2579 [RFC2579] and STD 58, RFC 2580 [RFC2580].

2. Overview
This document continues the architecture created in the RMON2-MIB [RFC2021] by providing a major feature upgrade, primarily by providing new metrics and studies to assist in the analysis of performance for sub-application transaction flows in the network, in direct relationship to the transport of application layer protocols.

Performance-monitoring agents have been widely used to analyze the parameters and metrics related to the perceived performance of distributed applications and services in networks. The metrics collected by these agents have ranged from basic response time to a combination of metrics related to the loss and re-transmission of datagrams and PDUs. Although the metrics are becoming more useful in the implementation of service-level monitoring and troubleshooting tools, the lack of a standard method to report these has limited the deployment to very specific customer needs and areas.

This document is intended to create a general framework for the collection and reporting of performance-related metrics on sub-application level transaction flows in a network. The MIB in this document is directly linked to the current RMON2-MIB [RFC2021], and uses the Protocol Directory as a key component in reporting the layering involved in the sub-application level transaction flows.

The specific objectives of this document are to:

+ Provide a drill-down capability to complement the user-perceived monitoring defined within the Application Performance Measurement MIB (APM-MIB) [RFC3729]. This capability is intended to support trouble resolution, further characterization of performance, and a finer granularity of monitoring capabilities.

  The APM-MIB provides a method for retrieving aggregated measurement data of the end-user's perception of application-level performance. APM additionally provides thresholding and associated alarms if the end-user perceived performance degrades below defined thresholds. The Transport Performance Metrics MIB (TPM-MIB) complements the APM-MIB capabilities by monitoring sub-application level transaction aspects not typically perceived by the end-user.

  As an example, APM-MIB provides response time statistics of a typical web-browser application. This application typically consists of DNS transactions, TCP connection establishment (or multiple establishments), HTTP download of the base page, and multiple downloads of the various embedded objects. Ideally, TPM-MIB would provide statistics on the performance aspects of these multiple sub-application level transactions.

+ Provide additional performance metrics and related statistics. For troubleshooting and a finer granularity of performance monitoring, it is useful to provide measurements of additional metrics beyond those supported by the APM-MIB.

+ Support standards-based metrics and associated statistical aggregation by defining methods to reference those standards. The TPM-MIB provides a capability to describe metrics by reference to appropriate IETF, ITU, or other standards bodies defining metrics, including enterprise-specific standards bodies. This capability is provided through the tpmMetricDefTable.

  Specifically, this MIB itself does not make references to metric specifications of the IETF, ITU and other organizations. Instead, it allows for the setup of the tpmMetricDefTable that does reference such IETF, ITU, and other metric specifications, and it allows pointers to such specifications to be dynamically listed in this table. The following objects allow for that, and the DESCRIPTION clauses (of the objects below) explain how this is done:

     tpmMetricDefName OBJECT-TYPE
     tpmMetricDefReference OBJECT-TYPE
     tpmMetricDefGlobalID OBJECT-TYPE

  The tpmMetricDefGlobalID object contains a reference to the Object ID in a metrics registration MIB being developed in the IP Performance Metrics (IPPM) Working Group at the IETF; e.g., the IPPM-REGISTRY-MIB [RFC4148], which defines the metric. For metrics defined within the IPPM Working Group, which are included in the IPPM-REGISTRY-MIB, this object is used to reference those metrics directly. For metrics not included within the IPPM-REGISTRY-MIB, the value of this object is set to 0.0 for none.

  Examples of appropriate references include the ITU-T Recommendation Y.1540 [Y.1540] on IP packet transfer performance metrics, and the IETF documents from the IPPM WG; e.g., RFC 2681 on the round trip delay metric [RFC2681] or RFC 3393 on the delay variation metric [RFC3393]. Others include RFC 2679 [RFC2679], RFC 2680 [RFC2680], and RFC 3432 [RFC3432]. Although no specific metric is mandatory, implementations should, at a minimum, support a round-trip delay and a round-trip loss metric.

+ Provide (as an option) a table storing the measurements of the metrics on a transaction by transaction basis. There are times when it is useful to have access to the raw measurements. The tpmCurReportTable optionally provides access to this capability.

Although this document outlines the basic measurements of performance in regard to the transport of application flows, it does not attempt to measure or provide a means to measure the actual perceived performance of the application transactions or quality. The detailed measurements of end-user-perceived performance are directly related to this document and may be found in the APM-MIB [RFC3729].

The objects defined in this document are intended as an interface between an RMON agent and an RMON management application and are not intended for direct manipulation by humans. Although some users may tolerate the direct display of some of these objects, few will tolerate the complexity of manually manipulating objects to accomplish row creation. These functions should be handled by the management application.

2.1. Terms
This document uses some terms that need introduction:

DataSource
   A source of data for monitoring purposes. This term is used exactly as defined in the RMON2-MIB [RFC2021].

protocol
   A specific protocol encapsulation, as identified for monitoring purposes. This term is used exactly as defined in the RMON Protocol Identifiers document [RFC2895].

performance metric
   A specific, measured reporting metric, as identified for monitoring purposes. There can be several metrics reported by an agent in the same implementation. The metrics are extensible based on the agent implementation.

application
   A network-based, high-level protocol performing useful work for an end-user of an end-system. Typically, the application performs multiple request/response transactions to complete its work. For example, a web application downloading a web page completes DNS, TCP-connect, and multiple HTTP GET transactions prior to completing its task.

transactions
   Elemental request/response transactions comprising more complex network-based applications. For example, a transaction may include an FTP get request and the file download in response.

2.2. Report Aggregation
This MIB module provides functions that aggregate measurements into higher-level summaries identical to the aggregation defined in the APM-MIB [RFC3729]. In addition to temporal aggregation of data, the Textual Convention, TransactionAggregationType, is imported from the APM-MIB, which specifies the nature of the spatial aggregation employed.
2.3. Structure of the MIB
The objects are arranged in the following groups:

   -- tpmCapabilitiesGroup
   -- tpmAggregateReportsGroup
   -- tpmCurrentReportsGroup
   -- tpmExceptionReportsGroup

These groups are the basic units of conformance. If an agent implements a group, then it must implement all objects in that group. Although this section provides an overview of grouping and conformance information for this MIB module, the authoritative reference for such information is contained in the MODULE-COMPLIANCE and OBJECT-GROUP macros later in this MIB module.

These groups are defined to provide a means of assigning object identifiers, and to provide a method for implementers of managed agents to know which objects they must implement.

2.3.1. The tpmCapabilitiesGroup
The tpmCapabilitiesGroup contains objects and tables that show the measurement protocol and metric capabilities of the agent. This group primarily consists of the tpmTransMetricDirTable and the tpmMetricDefTable.

2.3.2. The tpmAggregateReportsGroup
The tpmAggregateReportsGroup is used to provide the collection of aggregated statistical measurements for the configured report intervals. The tpmAggregateReportsGroup consists of the tpmAggrReportCntrlTable and the tpmAggrReportTable.

2.3.3. The tpmCurrentReportsGroup
The tpmCurrentReportsGroup is used to provide the collection of uncompleted measurements for the current configured report for those transactions caught in progress. A history of these transactions is also maintained once the current transaction has been completed. The tpmCurrentReportsGroup consists of the tpmCurReportTable and the tpmCurReportSize object.
2.3.4. The tpmExceptionReportsGroup
The tpmExceptionReportsGroup is used to link immediate notifications of transactions that exceed certain thresholds defined in the apmExceptionGroup [RFC3729]. This group reports the aggregated sub-application measurements for those applications exceeding thresholds. The tpmExceptionReportsGroup consists of the tpmExcpReportTable.

2.4. Statistics for Aggregation of Data: Conventions
In order to measure the performance of traffic flows in a network, the proper analysis of a set of statistics is required. Because a large majority of the statistics have a basis of time, the use of a simple statistical model is feasible. Therefore, the MIB definitions within this document all use a basic set of statistical computed values to assist in further analysis by a management application. The remaining subsections in this section detail the common structured features that are applied to the performance metrics in the statistical format described above. The tpmMetricDefTable (discussed below) describes the set of metrics supported in this MIB module.

2.5. Relationship to the Remote Monitoring MIB
This document describes the implementation of an additional MIB for the support of performance-related metrics within the framework of the RMON2-MIB [RFC2021]. The objects and tables defined in this MIB module are an extension to the existing framework for the support of both Client/Server and Server push-related applications and services.

2.6. Relationship to RMON2-MIB Protocol Identifier Reference
This document uses the Protocol Identifiers outlined in the current Protocol Identifier Reference document, RFC 2895 [RFC2895]. The protocol index values throughout the document are a direct reference to the same relationship that exists between the RMON2-MIB [RFC2021] and the Protocol Identifier Reference document, RFC 2895 [RFC2895]. An important extension of the Protocol Identification to application-level verbs is found in RFC 3395 [RFC3395].

2.7. Relationship to Standards-Based Performance Metrics
This document uses the tpmMetricDefTable to describe the metrics supported by an instance of the TPM-MIB. The performance metric index values throughout the document are a direct reference to the metrics defined in that table. The table defines metrics by directly referencing other standards that provide definitive descriptions of the metric.

2.8. Relationship to Application Performance Measurement MIB
This document uses the apmReportControlIndex, appLocalIndex, and apmReportIndex, as outlined in the current Application Performance Measurement MIB [RFC3729]. These objects are used to create a reference link for the purpose of reporting transaction flow details on application-level measurements. As such, the TPM-MIB is designed to provide a drill-down extension to the APM-MIB. Further, it draws heavily on the ideas and designs laid out in the APM-MIB.

3. Statistics Perspective
When dealing with time-based measurements on application data packets, ideally all the timestamps and related data could be stored and forwarded for later analysis. However, when faced with thousands of conversations per second on ever-faster networks, storing all the data, even if compressed, would take too much processing, memory, and manager download time to be practical.

It is important to note that in dealing with network data we will be dealing with statistical populations and not samples. Statistics books deal with both because the math is similar. In collecting agent data, a population (i.e., all the data) must be processed. Because of the nature of application protocols, just sampling some of the packets will not give good results. Missing just one critical packet, such as one that specified an ephemeral port on which data will be transmitted or what application will be run, can cause much valid data to be lost.

The time-based measurements the agent collects will come from examining the entire group of data, i.e., the population. The population will be finite. The agent will seek only to provide information that will describe the actual data. Analysis of that data will be left to the management station.

The simplest form of representing a group of data is by frequency distributions, i.e., buckets. Statistics provides a great many ways of analyzing this type of data, and there are some rules in creating the buckets. First, the range needs to be known. Second, a bucket size needs to be determined. Fixed bucket sizes are best, although variable sizes may be used if needed. However, the statistics texts tend only to refer to operations on fixed-size buckets. This method of describing data is expensive for an agent to implement. First, the agent must process a great amount of data at a time. Storing the data, determining the range, locating the buckets, and then filling in the data after the fact takes a fair amount of storage and time. Fixing the range and bucket sizes in the beginning can be problematic, as the agent may have to adjust the values for each of the applications it collects data on. Such numbers can be in the thousands. Additional complexity arises in adding new protocols and even in describing the buckets themselves to the management application. This is the approach taken in the APM-MIB.

A complementary approach is to provide frequency distribution statistics. These describe aggregates, such as the mean and standard deviation, that can be obtained by summation functions on the individual data elements in a population. Analysis of the data described by these functions has been thoroughly studied, and interpretation of these values is available to anyone with an introduction to statistics. In fact, frequency distributions are routinely analyzed to generate these varied numbers, which are then used for further analysis. Note that these summation-based statistics are computed exactly from the data, whereas buckets introduce error factors that are not present with direct analysis by summation-type formulas.

Because the TPM-MIB provides a drill-down capability to the APM-MIB, it has to measure and store much more information than the APM-MIB. For this reason, and in order to complement the APM-MIB, the TPM-MIB relies on statistical descriptions rather than a bucket description of the measurement data.

The agent will provide data that can be used to calculate the most basic and useful statistical aggregates. The agent will not perform the calculations and will not provide the statistical measurement directly. There are several reasons why this is not desired. The first is that finding the final measurement can be expensive in terms of computation and representation.
There are divisions and square roots, and the measurements are expressed as floating point values. The second is that by providing the variables to the statistical functions, those variables are scalable. It is possible to combine smaller intervals into larger ones. An example is the arithmetic mean or average. This is the sum of the data divided by the number of data elements. The agent will provide the sum of the x and the number of elements N. The management station can perform the division to obtain the average. Given two samples, they can be combined by adding the sum of the x's and by adding the number of elements to get a combined sum and number of elements. The average formula then works just the same. Also, the sum of the x and the number of element variables are used in calculating other statistical measurement values.
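
The division of labor described above can be sketched as follows. This is a minimal illustration in Python, not part of the MIB: the function name, the tuple layout, and the sample values are hypothetical. It assumes the agent reports only the sum of x and the count N for each interval, and that the management station adds the sums and the counts of two adjacent intervals before dividing.

```python
# Illustrative sketch only; these names are not TPM-MIB objects.

def mean(sum_x: int, n: int) -> float:
    """Arithmetic mean, computed by the management station."""
    if n == 0:
        raise ValueError("no data points in interval")
    return sum_x / n


# Two adjacent intervals, each reported by the agent as (sum of x, N).
# The values are invented for illustration.
first = (1200, 4)
second = (900, 2)

# Combine the intervals by adding the sums and adding the counts.
combined = (first[0] + second[0], first[1] + second[1])

print(mean(*first))     # 300.0
print(mean(*second))    # 450.0
print(mean(*combined))  # 350.0
```

Note that the combined mean (350.0) is not the average of the two interval means (375.0), which is why the agent exports the sum and the count rather than a precomputed average.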
3.1. Statistics Structure
The data statistical elements, datum, of the metric have been chosen to maximize the amount of data available while minimizing the amount of memory needed to store the statistic and minimizing the CPU processing requirement needed to generate the statistic. The statistic data structure contains six unsigned integer datum.

   N        count of the number of data points for the metric
   S(X)     sum of all the data point values for the metric
   S(X2)    sum of all the data point values squared for the metric
   Xmax     maximum data point value for the metric
   Xmin     minimum data point value for the metric
   S(I*X)   sum of the data points multiplied by their order, i.e.,
            SUM from i=1 to N { i * X sub i }

A performance metric is used to describe events over a time interval. The measurement points can be processed immediately into the statistic and do not have to be stored for later processing. For example, to count the number of events in a time interval, it is sufficient to increment a counter for each event. It is not necessary to cache all the events and then to count them at the end of the interval.

The statistic is also designed to be easily scalable in terms of combining adjacent intervals. For example, if an agent created a specific statistic every 30 seconds and a user table interval was set to 60 seconds, the 60-second statistic could be obtained by combining the two 30-second statistics. The following rules will be applied when combining adjacent statistics.

   N        S(N)
   S(X)     S(S(X))
   S(X2)    S(S(X2))
   Xmax     MAX(Xmax)
   Xmin     MIN(Xmin)
   S(I*X)   S(I*X) + N*S(X) + S(I*X)

In the rule for S(I*X), the first term is the S(I*X) from the former 30-second period, the last two terms refer to the statistics from the later 30-second period, and N is the count from the former 30-second period.

This structure gives a generic framework upon which the actual performance statistics will be defined. Each specific statistical definition must address the specific significance, if any, given to each metric datum.
While a specific metric definition should try to conform to the generic framework, it is acceptable for a metric datum to not be used, and to have no meaning, for a specific metric. In such cases the datum will default to a 0 value.
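
The streaming update and the combining rules of this section can be sketched as below. The sketch is illustrative only: the class MetricStat, its field names, and the helper collect are hypothetical and do not correspond to objects defined in this MIB module. It assumes integer measurements, a 1-based arrival order for S(I*X), and that an empty interval (N = 0) contributes nothing when combined.

```python
from dataclasses import dataclass, replace


@dataclass
class MetricStat:
    """Hypothetical container for the datum of one metric statistic."""
    n: int = 0        # N
    sum_x: int = 0    # S(X)
    sum_x2: int = 0   # S(X2)
    x_max: int = 0    # Xmax
    x_min: int = 0    # Xmin
    sum_ix: int = 0   # S(I*X)

    def add(self, x: int) -> None:
        """Fold one measurement into the statistic without storing it."""
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x
        self.sum_ix += self.n * x  # self.n is the 1-based order of x
        if self.n == 1:
            self.x_max = self.x_min = x
        else:
            self.x_max = max(self.x_max, x)
            self.x_min = min(self.x_min, x)

    def combine(self, later: "MetricStat") -> "MetricStat":
        """Combine this (former) interval with the adjacent later one."""
        if self.n == 0:
            return replace(later)
        if later.n == 0:
            return replace(self)
        return MetricStat(
            n=self.n + later.n,
            sum_x=self.sum_x + later.sum_x,
            sum_x2=self.sum_x2 + later.sum_x2,
            x_max=max(self.x_max, later.x_max),
            x_min=min(self.x_min, later.x_min),
            # S(I*X) + N*S(X) + S(I*X): the former S(I*X), then the later
            # S(X) shifted by the former count, then the later S(I*X).
            sum_ix=self.sum_ix + self.n * later.sum_x + later.sum_ix,
        )


def collect(values):
    """Build a statistic from a sequence of measurements (helper for the demo)."""
    stat = MetricStat()
    for v in values:
        stat.add(v)
    return stat


# Invented measurements for two adjacent 30-second intervals.
former = collect([5, 7, 3])
later = collect([10, 2])
merged = former.combine(later)

# The merged statistic matches one built over the whole 60 seconds.
print(merged == collect([5, 7, 3, 10, 2]))  # True
```

The N*S(X) term accounts for the fact that each data point in the later interval moves N positions further along in the combined ordering.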
3.2. Statistics Analysis
The actual meaning of a specific statistical datum is determined by the definition of the specific statistic. The following is a discussion of the operations and observations that can be performed on a generic metric. This means that the following may or may not apply and/or have meaning when applied to any specific metric.

The following observations and analysis techniques are not all-inclusive. Rather, these are the ones we have come up with at the time of writing this document.

+ Number
+ Frequency
+ The time interval is that specified in the control table. It is not a metric datum, but it is associated with the metric sample.
+ Maximum
+ Minimum
+ Range
+ Arithmetic Mean
+ Root Mean Square
+ Variance
+ Standard Deviation
+ Slope of a least-squares line

These are accessible from the statistical datum provided by this MIB module.
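
As an illustration of how a management station might derive the listed values from the datum, consider the following Python sketch. It is not normative, and the function and key names are hypothetical. It makes several assumptions: the population form of the variance is used (dividing by N, in keeping with the population view taken in section 3), "frequency" is interpreted as the number of data points per second of the report interval, and the least-squares line is fitted against the order index i = 1..N.

```python
import math


def analyze(n, sum_x, sum_x2, x_max, x_min, sum_ix, interval_seconds):
    """Derive summary values from the datum of one statistic (sketch)."""
    if n == 0:
        raise ValueError("no data points in interval")
    mean = sum_x / n
    # Population variance; clamp tiny negative results caused by rounding.
    variance = max(sum_x2 / n - mean * mean, 0.0)
    # Slope of the least-squares line of x against i = 1..N, using
    # S(i) = N(N+1)/2 and S(i^2) = N(N+1)(2N+1)/6. Undefined for N < 2.
    if n >= 2:
        slope = 12 * (sum_ix - (n + 1) / 2 * sum_x) / (n * (n * n - 1))
    else:
        slope = None
    return {
        "number": n,
        "frequency": n / interval_seconds,  # assumed meaning: points/second
        "maximum": x_max,
        "minimum": x_min,
        "range": x_max - x_min,
        "mean": mean,
        "rms": math.sqrt(sum_x2 / n),
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "slope": slope,
    }


# Datum for the invented measurements 5, 7, 3, 10, 2 over a 30-second interval.
result = analyze(n=5, sum_x=27, sum_x2=187, x_max=10, x_min=2,
                 sum_ix=78, interval_seconds=30)
for name, value in result.items():
    print(name, value)
```

A negative slope, as in this example, suggests that the measured values trended downward over the interval; a metric that leaves S(I*X) at its default of 0 cannot be analyzed this way.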