Network Working Group                                     S. Waldbusser
Request for Comments: 3729                                   March 2004
Category: Standards Track

Application Performance Measurement MIB

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract
This memo defines a portion of the Management Information Base (MIB) for use with network management protocols in TCP/IP-based internets. In particular, it defines objects for measuring the application performance as experienced by end-users.

Table of Contents
1. The Internet-Standard Management Framework
2. Overview
   2.1. Report Aggregation
   2.2. AppLocalIndex Linkages
   2.3. Measurement Methodology
   2.4. Instrumentation Architectures
        2.4.1. Application Directory Caching
        2.4.2. Push Model
   2.5. Structure of this MIB Module
        2.5.1. The APM Application Directory Group
        2.5.2. The APM User Defined Applications Group
        2.5.3. The APM Report Group
        2.5.4. The APM Transaction Group
        2.5.5. The APM Exception Group
        2.5.6. The APM Notification Group
3. Definitions
4. Security Considerations
5. References
   5.1. Normative References
   5.2. Informative References
6. Author's Address
7. Full Copyright Statement

1. The Internet-Standard Management Framework
For a detailed overview of the documents that describe the current Internet-Standard Management Framework, please refer to section 7 of RFC 3410 [8].

Managed objects are accessed via a virtual information store, termed the Management Information Base or MIB. MIB objects are generally accessed through the Simple Network Management Protocol (SNMP). Objects in the MIB are defined using the mechanisms defined in the Structure of Management Information (SMI). This memo specifies a MIB module that is compliant with the SMIv2, which is described in STD 58, RFC 2578 [1], STD 58, RFC 2579 [2] and STD 58, RFC 2580 [3].

2. Overview
This document continues the architecture created in the RMON MIB [7] by providing analysis of application performance as experienced by end-users.

Application performance measurement measures the quality of service delivered to end-users by applications. With this perspective, a true end-to-end view of the IT infrastructure results, combining the performance of the application, desktop, network, and server, as well as any positive or negative interactions between these components.

Despite all the technically sophisticated ways in which networking and system resources can be measured, human end-users perceive only two things about an application: availability and responsiveness.

   Availability - The percentage of the time that the application is ready to give a user service.

   Responsiveness - The speed at which the application delivers the requested service.

A transaction is an action initiated by a user that starts and completes a distributed processing function. A transaction begins when a user initiates a request for service (e.g., pushing a submit button) and ends when the work is completed (e.g., information is provided or a confirmation is delivered). A transaction is the fundamental item measured by the APM MIB.
A failed transaction is a transaction that fails to provide the service requested by the end user, regardless of whether it is due to a processing failure or transport failure.

An application protocol (e.g., POP3) may implement different commands or application "verbs" (e.g., POP3 Login and POP3 Retrieval). It will often be interesting to monitor these verbs separately because:

   1) The verbs may have widely differing performance characteristics (in fact some may be response time oriented while others are throughput oriented)

   2) The verbs have varying business significance

   3) Separate monitoring shows with more granularity exactly what might be performing poorly

This MIB Module allows the measurement of a parent application, its component verbs, or both. If monitoring both, one can watch the top-level application and then drill down to the verbs when trouble is spotted to learn which subcomponents are in trouble.

Each application verb is registered separately in the Protocol Directory [5] [6] as a child of its parent application.

Application protocols implement one of three different types of transactions: transaction-oriented, throughput-oriented, or streaming-oriented. While the availability metric is the same for all three types, the responsiveness metric varies:

   Transaction-Oriented: These transactions have a fairly constant workload to perform for all transactions. In particular, to the degree that the workload may vary, it doesn't vary based on the amount of data to be transferred but based on the parameters of the transaction. The responsiveness metric for transaction-oriented applications is application response time: the elapsed time between the user's request for service (e.g., pushing the submit button) and the completion of the request (e.g., displaying the results). It is measured in milliseconds. This is commonly referred to as end-user response time.

   Throughput-Oriented: These transactions have widely varying workloads based on the amount of data requested. The responsiveness metric for throughput-oriented applications is kilobits per second.

   Streaming-Oriented: These transactions deliver data at a constant metered rate of speed regardless of excess capacity in the networking and computing infrastructure. However, when the infrastructures cannot deliver data at this speed, interruption of service or degradation of service can result. The responsiveness metric for streaming-oriented applications is the signal quality ratio of time that the service is degraded or interrupted to the total service time. This metric is measured in parts per million.

2.1. Report Aggregation
This MIB Module provides functions to aggregate measurements into higher level summaries.

Every transaction is identified by its application, server, and client and has an availability measure as well as a responsiveness measure. The appropriate responsiveness measure is context-sensitive depending on whether the application is transaction-oriented, throughput-oriented, or streaming-oriented.

For example, in a 5 minute period several transactions might be recorded:

   Application  Client  Server     Successful  Responsiveness
   HTTP         Jim     Sales      1           6 sec.
   SAP/R3       Jane    Finance    1           17 sec.
   HTTP         Joe     HR         0           -
   FTP          Jim     FTP        1           212 Kbps
   HTTP         Joe     HR         1           25 sec.
   RealVideo    Joe     Videoconf  1           100.0%
   HTTP         Jane    HR         1           5 sec.

These transactions can be aggregated in several ways, providing statistical summaries - for example summarizing all HTTP transactions, or all HTTP transactions to the HR Server. Note that data from different applications may not be summarized together because:

   1. The performance characteristics of different applications differ widely enough to render statistical analysis meaningless.

   2. The responsiveness metrics of different applications may be different, making a statistical analysis impossible (in other words, one application may be transaction-oriented, while another is throughput-oriented).

Aggregating transactions collected over a period requires an aggregation algorithm. In this MIB Module, transaction aggregation always results in the following statistics:

   TransactionCount
      The total number of transactions during this period.

   SuccessfulTransactions
      The total number of transactions that were successful. The management station can derive the percent success by dividing SuccessfulTransactions by the TransactionCount.

   ResponsivenessMean
      The average of the responsiveness metric for all aggregated transactions that completed successfully.

   ResponsivenessMin
      The minimum responsiveness metric for all aggregated transactions that completed successfully.

   ResponsivenessMax
      The maximum responsiveness metric for all aggregated transactions that completed successfully.

   ResponsivenessBx
      The count of successful transactions whose responsiveness metric fell into the range specified for Bx. There are 7 buckets specified. Because the performance of different applications varies widely, the bucket ranges are specified separately for each application (in the apmAppDirTable) so that they may be tuned to typical performance of each application.

For example, when aggregating the previous set of transactions by application we get (for simplicity the example only shows TransactionCount, SuccessfulTransactions, and ResponsivenessMean):

   Application  Count  Successful  ResponsivenessMean
   HTTP         4      3           12 sec.
   SAP/R3       1      1           17 sec.
   FTP          1      1           212 Kbps
   RealVideo    1      1           100.0%

There are four different types of aggregation.

The flows(1) aggregation is the simplest. All transactions that share common application/server/client 3-tuples are aggregated together, resulting in a set of metrics for all such unique 3-tuples.

The clients(2) aggregation results in somewhat more aggregation (i.e., fewer resulting records). All transactions that share common application/client tuples are aggregated together, resulting in a set of metrics for all such unique tuples.
The servers(3) aggregation usually results in still more aggregation (i.e., fewer resulting records). All transactions that share common application/server tuples are aggregated together, resulting in a set of metrics for all such unique tuples.

The applications(4) aggregation results in the most aggregation (i.e., the fewest resulting records). All transactions that share a common application are aggregated together, resulting in a set of metrics for all such unique applications.

For example, if in a 5 minute period the following transactions occurred:

Actual Transactions:

   #  App     Client  Server   Successful  Responsiveness
   1  HTTP    Jim     CallCtr  N           -
   2  HTTP    Jim     HR       Y           12 sec.
   3  HTTP    Jim     Sales    Y           7 sec.
   4  HTTP    Jim     CallCtr  Y           5 sec.
   5  Email   Jim     Pop3     Y           12 sec.
   6  HTTP    Jane    CallCtr  Y           3 sec.
   7  SAP/R3  Jane    Finance  Y           19 sec.
   8  Email   Jane    Pop3     Y           16 sec.
   9  HTTP    Joe     HR       Y           18 sec.

The flows(1) aggregation results in the following table. Note that the first record (HTTP/Jim/CallCtr) is the aggregation of transactions #1 and #4:

Flow Aggregation:

   App     Client  Server   Count  Successful  RspMean  RspMin  RspMax  RspB1  RspB2
   HTTP    Jim     CallCtr  2      1           5        5       5       1      0
   HTTP    Jim     HR       1      1           12       12      12      0      1
   HTTP    Jim     Sales    1      1           7        7       7       1      0
   Email   Jim     Pop3     1      1           12       12      12      0      1
   HTTP    Jane    CallCtr  1      1           3        3       3       1      0
   SAP/R3  Jane    Finance  1      1           19       19      19      0      1
   Email   Jane    Pop3     1      1           16       16      16      0      1
   HTTP    Joe     HR       1      1           18       18      18      0      1

(Note: Columns above such as RspMean and RspB1 are abbreviations for objects in the apmReportTable)

The clients(2) aggregation results in the following table. Note that the first record (HTTP/Jim) is the aggregate of transactions #1, #2, #3 and #4:
Client Aggregation:

   App     Client  Count  Successful  RspMean  RspMin  RspMax  RspB1  RspB2  ...
   HTTP    Jim     4      3           8        5       12      2      1
   Email   Jim     1      1           12       12      12      0      1
   HTTP    Jane    1      1           3        3       3       1      0
   SAP/R3  Jane    1      1           19       19      19      0      1
   Email   Jane    1      1           16       16      16      0      1
   HTTP    Joe     1      1           18       18      18      0      1

The servers(3) aggregation results in the following table. Note that the first record (HTTP/CallCtr) is the aggregation of transactions #1, #4 and #6:

Server Aggregation:

   App     Server   Count  Successful  RspMean  RspMin  RspMax  RspB1  RspB2  ...
   HTTP    CallCtr  3      2           4        3       5       2      0
   HTTP    HR       2      2           15       12      18      0      2
   HTTP    Sales    1      1           7        7       7       1      0
   Email   Pop3     2      2           14       12      16      0      2
   SAP/R3  Finance  1      1           19       19      19      0      1

The applications(4) aggregation results in the following table. Note that the first record (HTTP) is the aggregate of transactions #1, #2, #3, #4, #6 and #9:

Application Aggregation:

   App     Count  Successful  RspMean  RspMin  RspMax  RspB1  RspB2  ...
   HTTP    6      5           9        3       18      3      2
   Email   2      2           14       12      16      0      2
   SAP/R3  1      1           19       19      19      0      1

The apmReportControlTable provides for a historical set of the last 'X' reports, combining the historical records found in history tables with the periodic snapshots found in TopN tables. Conceptually the components are:

   apmReportControlTable
      Specifies data collection and summarization parameters, including the number of reports to keep and the size of each report.

   apmReport
      Each APM Report contains an aggregated list of records that represent data collected during a specific time period.
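The four aggregation types described in this section can be sketched in Python as follows. This is a minimal illustration and not part of the MIB: the Transaction class, the aggregate function, and the dictionary keys are hypothetical names. The two-bucket split at 10 seconds is an assumption, chosen only because it reproduces the RspB1 and RspB2 columns of the example tables; the MIB itself specifies 7 buckets per application in the apmAppDirTable.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record for one measured transaction."""
    app: str
    client: str
    server: str
    successful: bool
    responsiveness: Optional[int]  # seconds in this example; None if failed


# Grouping key for each of the four aggregation types named in the text.
AGGREGATIONS = {
    "flows": lambda t: (t.app, t.client, t.server),
    "clients": lambda t: (t.app, t.client),
    "servers": lambda t: (t.app, t.server),
    "applications": lambda t: (t.app,),
}

# The nine transactions from the "Actual Transactions" example.
TRANSACTIONS = [
    Transaction("HTTP", "Jim", "CallCtr", False, None),
    Transaction("HTTP", "Jim", "HR", True, 12),
    Transaction("HTTP", "Jim", "Sales", True, 7),
    Transaction("HTTP", "Jim", "CallCtr", True, 5),
    Transaction("Email", "Jim", "Pop3", True, 12),
    Transaction("HTTP", "Jane", "CallCtr", True, 3),
    Transaction("SAP/R3", "Jane", "Finance", True, 19),
    Transaction("Email", "Jane", "Pop3", True, 16),
    Transaction("HTTP", "Joe", "HR", True, 18),
]


def aggregate(transactions, kind, b1_upper=10):
    """Aggregate transactions by the key for `kind`.

    b1_upper is an assumed bucket boundary (not taken from the MIB):
    successful transactions below it count in b1, the rest in b2.
    """
    groups = defaultdict(list)
    for t in transactions:
        groups[AGGREGATIONS[kind](t)].append(t)

    report = {}
    for key, members in groups.items():
        # Only successful transactions contribute to responsiveness stats.
        ok = [t.responsiveness for t in members if t.successful]
        report[key] = {
            "count": len(members),
            "successful": len(ok),
            "mean": sum(ok) / len(ok) if ok else None,
            "min": min(ok) if ok else None,
            "max": max(ok) if ok else None,
            "b1": sum(1 for r in ok if r < b1_upper),
            "b2": sum(1 for r in ok if r >= b1_upper),
        }
    return report
```

For instance, aggregate(TRANSACTIONS, "applications")[("HTTP",)] gives a count of 6, 5 successful transactions, and a mean of 9, which matches the HTTP row of the Application Aggregation table.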
An apmReportControlEntry causes a family of APM Reports to be created, where each report summarizes different, successive, contiguous periods of time. While the conceptual model of APM Reports shows them as distinct entities, they are all entries in a single apmReportTable, where entries in report 'A' are separated from entries in report 'B' by different values of the apmReportIndex.

+-----------------------+
|                       |
| apmReportControlTable |
|                       |               +-----------+
+-----------------------+               |           |
                                    +-----------+   |
                                    |           |   |
                                +-----------+   |---+
                                |           |   |
                            +----------+    |---+
                            |          |    |
 apmReport                  |apmReport |----+
+-----------------------+   |          |
|Thu Mar 30 12-1PM      |   +----------+
|                       |
|CLNT SERV PROT  stats  |
|                       |
|Joe  News HTTP  data   |
|Jan  POP  POP3  data   |
|Jan  POP  SMTP  data   |
|Bob  HR   PSOFT data   |
|...                    |
|...                    |
+-----------------------+

2.2. AppLocalIndex Linkages
The following set of example tables illustrates a few points:

   1. How protocolDirEntries, apmHttpFilterEntries and apmUserDefinedAppEntries (not shown) all result in entries in the apmAppDirTable.

   2. How a single appLocalIndex may be represented multiple times in the apmAppDirTable and apmReportTable if the agent measures multiple responsiveness types for that application.

A convention in the formatting of these tables is that the columns to the left of the '|' separator are index columns for the table.
Assuming the following entries in the RMON2 protocolDirectory:

   protocolDirectory
   ID (*)    Parameters | LocalIndex ...
   WWW       None       | 1
   WWW Get   None       | 2
   SAP/R3    None       | 3

   (*) These IDs are represented here symbolically. Consult [5] for more detail in their format

and the following entry in the apmHttpFilterTable:

   apmHttpFilterTable
   Index | AppLocalIndex  ServerAddress   URLPath   MatchType  ...
   5     | 20             hr.example.com  /expense  prefix(3)  ...

the apmAppDirTable would be populated with the following entries:

   apmAppDir
   AppLocalIndex  ResponsivenessType | Config ...
   1              transaction(1)     | On ...
   1              throughput(2)      | On ...
   2              transaction(1)     | On ...
   2              throughput(2)      | On ...
   3              transaction(1)     | On ...
   20             transaction(1)     | On ...
   20             throughput(2)      | On ...

The entries in the apmAppDirTable with an appLocalIndex of 1, 2 and 3 correspond to the identically named entries in the protocolDirectory table. appLocalIndex #1 results in 2 entries, one to measure the transaction responsiveness of WWW and one to measure its throughput responsiveness. In contrast, appLocalIndex #3 results in only a transaction entry because the agent does not measure the throughput responsiveness for SAP/R3 (probably because it isn't very meaningful). Finally, appLocalIndex #20 corresponds to the entry in the apmHttpFilterTable and has transaction responsiveness and throughput responsiveness measurements available.

If a report was configured using application aggregation, entries in that report might look like:
   apmReportTable
   CtlIndex  Index  AppLocalIdx  ResponsivenessType | TransactionCount ...
   1         1      1            transaction(1)     | counters...
   1         1      1            throughput(2)      | counters...
   1         1      2            transaction(1)     | counters...
   1         1      2            throughput(2)      | counters...
   1         1      3            transaction(1)     | counters...
   1         1      20           transaction(1)     | counters...
   1         1      20           throughput(2)      | counters...

Note that the index items protocolDirLocalIndex, apmReportServerAddress and apmReportClientID were omitted from the apmReportTable example for brevity because they would have been equal to zero due to the use of the application aggregation in this example.

2.3. Measurement Methodology
There are many different measurement methodologies available for measuring application performance (e.g., probe-based, client-based, synthetic-transaction). This specification does not mandate a particular methodology - it is open to any that meet the minimum requirements. Conformance to this specification requires that the collected data match the semantics described herein. In particular, a data collection methodology must be able to measure response time, throughput, streaming responsiveness and availability as specified.

Note that in some cases a transaction may run for a long time but ultimately be successful. The measurement software shouldn't prematurely classify lengthy transactions as failures but should wait as long as the client application will wait for a successful response.

2.4. Instrumentation Architectures

Different architectural approaches and deployment strategies may be taken towards implementation of this specification. If a highly distributed approach is desired (e.g., an agent per desktop), one or both of the two approaches below may be used to make it more practical.

2.4.1. Application Directory Caching
It is necessary for the manager to have a copy of the tables that define the Application Directory in order to interpret APM measurements. It is likely that in a highly distributed network of thousands of APM agents, this Application Directory will be the same on many, if not all, of the agents. Repeated downloads of the Application Directory may be inefficient.

The apmAppDirID object is a single object that identifies the configuration of all aspects of the Application Directory when it is equal to a well-known, registered configuration. Thus, when a manager sees an apmAppDirID value that it recognizes, it need not download the Application Directory from that agent. In fact, the manager may discover a new registered Application Directory configuration on one agent and then re-use that configuration on another agent that shares the same apmAppDirID value.

Application directory registrations are unique within an administrative domain, allowing an administrator to create a custom application directory configuration without the need to assign it a globally-unique registration.

2.4.2. Push Model
When APM agents are installed on "desktops" (including laptops), a few issues make polling difficult:

   1. Desktops often have dynamically-assigned addresses so there is no long-lived address to poll.

   2. Desktops are not available as much as infrastructure components due to crashes, user-initiated reboots and shutdowns and user control over monitoring software. Thus a desktop may not be available to answer a poll at the moment when the manager is scheduled to poll that desktop.

   3. Laptops that are connected via dialup connections are only sporadically connected and will routinely be unreachable when the manager is scheduled to poll.

As a consequence, a push model is usually more appropriate for desktop-based agents. To achieve this, the agent should apply the following rules in deciding what data to send in notifications.
APM Reports

   If an agent wishes to push APM reports to a manager, it must send:

      apmAppDirID
      apmNameTable (any data updated since the last push)

   For each report the agent wishes to upload, it must send the entire apmReportControlEntry associated with that report and the associated entries in the apmReportTable that have changed since the last report.

APM Transactions

   If an agent wishes to push APM transactions to a manager, it must send:

      apmAppDirID
      apmNameTable (any data updated since the last push)
      apmTransactionTable (relevant entries)

APM Exceptions

   The agent must send:

      apmAppDirID
      apmNameTable (any data updated since the last push)
      apmTransactionEntry (of exception transaction)
      apmExceptionEntry (entry that generated exception)

   [Note that this list supersedes the information in the OBJECTS clauses of the apmTransactionResponsivenessAlarm and apmTransactionUnsuccessfulAlarm when the agent is using a push model. This additional information eliminates the need for the manager to request additional data to understand the exception.]

The order of varbinds and where to segment varbinds into PDUs is at the discretion of the agent.

2.5. Structure of this MIB Module
The objects are arranged into the following groups:

   - APM Application Directory Group
   - APM User Defined Applications Group
   - APM Report Group
   - APM Transaction Group
   - APM Exception Group
   - APM Notification Group

These groups are the basic unit of conformance. If an agent implements a group, then it must implement all objects in that group. While this section provides an overview of grouping and conformance information for this MIB Module, the authoritative reference for such information is contained in the MODULE-COMPLIANCE and OBJECT-GROUP macros later in this MIB Module.

These groups are defined to provide a means of assigning object identifiers, and to provide a method for implementors of managed agents to know which objects they must implement.

2.5.1. The APM Application Directory Group

The APM Application Directory group contains configuration objects for every application or application verb monitored on this system. This group consists of the apmAppDirTable.

2.5.2. The APM User Defined Applications Group

The APM User Defined Applications Group contains objects that allow for the tracking of applications or application verbs that aren't registered in the protocolDirTable. This group consists of the apmHttpFilterTable and the apmUserDefinedAppTable.

2.5.3. The APM Report Group

The APM Report Group is used to prepare regular reports that aggregate application performance by flow, by client, by server, or by application. This group consists of the apmReportControlTable and the apmReportTable.

2.5.4. The APM Transaction Group
The APM Transaction Group is used to show transactions that are currently in progress and ones that have ended recently, along with their responsiveness metric.

Because many transactions last a very short time and because an agent may not retain completed transactions very long, transactions may exist in this table for a very short time. Thus, polling this table isn't an effective mechanism for retrieving all transactions unless the value of apmTransactionsHistorySize is suitably large for the transactions being monitored.

One important benefit of this table is that it allows a management station to check on the status of long-lived transactions. Because the apmReport and apmException mechanisms act only on transactions that have finished, a network manager may not have visibility for some time into the performance of long-lived transactions such as streaming applications, large data transfers, or (very) poorly performing transactions. In fact, by their very definition, the apmReport and apmException mechanisms only provide visibility into a problem after nothing can be done about it.

This group consists primarily of the apmTransactionTable.

2.5.5. The APM Exception Group

The APM Exception Group is used to generate immediate notifications of transactions that cross certain thresholds. The apmExceptionTable is used to configure which thresholds are to be checked for which types of transactions. The apmTransactionResponsivenessAlarm notification is sent when a transaction occurs with a responsiveness that crosses a threshold. The apmTransactionUnsuccessfulAlarm notification is sent when a transaction fails for which exception checking was configured.

This group consists primarily of the apmExceptionTable.

2.5.6. The APM Notification Group
The APM Notification Group contains 2 notifications that are sent when thresholds in the APM Exception Table are exceeded.
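The relationship between the exception configuration and the two notifications can be sketched in Python as follows. This is only an illustration under stated assumptions: ExceptionRule and its fields are hypothetical simplifications and are not the object names of the apmExceptionTable, and the "greater"/"less" comparison direction is an assumed way of expressing what it means for a responsiveness value to cross a threshold. Only the two notification names are taken from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExceptionRule:
    """Hypothetical, simplified stand-in for one exception configuration."""
    app: str
    comparison: Optional[str]  # assumed: "greater", "less", or None (no check)
    threshold: int             # in the application's responsiveness units
    alarm_on_failure: bool     # whether failed transactions raise an alarm


def notifications_for(rule: ExceptionRule, app: str, successful: bool,
                      responsiveness: Optional[int]) -> List[str]:
    """Return the notifications a finished transaction would trigger."""
    if app != rule.app:
        return []  # rule is configured for a different application
    if not successful:
        # A failed transaction has no responsiveness value to compare.
        if rule.alarm_on_failure:
            return ["apmTransactionUnsuccessfulAlarm"]
        return []
    crossed = (
        (rule.comparison == "greater" and responsiveness > rule.threshold)
        or (rule.comparison == "less" and responsiveness < rule.threshold)
    )
    return ["apmTransactionResponsivenessAlarm"] if crossed else []
```

Under these assumptions, a rule for a transaction-oriented application would typically use "greater" (response time above a limit in milliseconds), while a rule for a throughput-oriented application would use "less" (kilobits per second below a limit).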