RFC 3729

Application Performance Measurement MIB

Pages: 61
Proposed Standard

Network Working Group                                      S. Waldbusser
Request for Comments: 3729                                    March 2004
Category: Standards Track


                Application Performance Measurement MIB

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   This memo defines a portion of the Management Information Base (MIB)
   for use with network management protocols in TCP/IP-based internets.
   In particular, it defines objects for measuring the application
   performance as experienced by end-users.

Table of Contents

   1.  The Internet-Standard Management Framework . . . . . . . . . .  2
   2.  Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
       2.1.  Report Aggregation . . . . . . . . . . . . . . . . . . .  4
       2.2.  AppLocalIndex Linkages . . . . . . . . . . . . . . . . .  8
       2.3.  Measurement Methodology. . . . . . . . . . . . . . . . . 10
       2.4.  Instrumentation Architectures. . . . . . . . . . . . . . 10
             2.4.1.  Application Directory Caching. . . . . . . . . . 10
             2.4.2.  Push Model . . . . . . . . . . . . . . . . . . . 11
       2.5.  Structure of this MIB Module . . . . . . . . . . . . . . 12
             2.5.1.  The APM Application Directory Group. . . . . . . 13
             2.5.2.  The APM User Defined Applications Group. . . . . 13
             2.5.3.  The APM Report Group . . . . . . . . . . . . . . 13
             2.5.4.  The APM Transaction Group. . . . . . . . . . . . 13
             2.5.5.  The APM Exception Group. . . . . . . . . . . . . 14
             2.5.6.  The APM Notification Group . . . . . . . . . . . 14
   3.  Definitions. . . . . . . . . . . . . . . . . . . . . . . . . . 14
   4.  Security Considerations. . . . . . . . . . . . . . . . . . . . 58
   5.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 60
       5.1.  Normative References . . . . . . . . . . . . . . . . . . 60
       5.2.  Informative References . . . . . . . . . . . . . . . . . 60
   6.  Author's Address . . . . . . . . . . . . . . . . . . . . . . . 60
   7.  Full Copyright Statement . . . . . . . . . . . . . . . . . . . 61

1.  The Internet-Standard Management Framework

   For a detailed overview of the documents that describe the current
   Internet-Standard Management Framework, please refer to section 7 of
   RFC 3410 [8].

   Managed objects are accessed via a virtual information store, termed
   the Management Information Base or MIB.  MIB objects are generally
   accessed through the Simple Network Management Protocol (SNMP).
   Objects in the MIB are defined using the mechanisms defined in the
   Structure of Management Information (SMI).  This memo specifies a MIB
   module that is compliant with SMIv2, which is described in STD 58,
   RFC 2578 [1], STD 58, RFC 2579 [2] and STD 58, RFC 2580 [3].

2.  Overview

   This document continues the architecture created in the RMON MIB [7]
   by providing analysis of application performance as experienced by
   end-users.

   Application performance measurement measures the quality of service
   delivered to end-users by applications.  With this perspective, a
   true end-to-end view of the IT infrastructure results, combining the
   performance of the application, desktop, network, and server, as well
   as any positive or negative interactions between these components.

   Despite all the technically sophisticated ways in which networking
   and system resources can be measured, human end-users perceive only
   two things about an application: availability and responsiveness.

      Availability - The percentage of the time that the application is
      ready to give a user service.

      Responsiveness - The speed at which the application delivers the
      requested service.

   A transaction is an action initiated by a user that starts and
   completes a distributed processing function.  A transaction begins
   when a user initiates a request for service (e.g., pushing a submit
   button) and ends when the work is completed (e.g., information is
   provided or a confirmation is delivered).  A transaction is the
   fundamental item measured by the APM MIB.

   A failed transaction is a transaction that fails to provide the
   service requested by the end user, regardless of whether it is due to
   a processing failure or transport failure.

   An application protocol (e.g., POP3) may implement different commands
   or application "verbs" (e.g., POP3 Login and POP3 Retrieval).  It
   will often be interesting to monitor these verbs separately because:

   1) The verbs may have widely differing performance characteristics
      (in fact some may be response time oriented while others are
      throughput oriented)
   2) The verbs have varying business significance
   3) Monitoring verbs separately gives finer granularity about exactly
      what might be performing poorly

   This MIB Module allows the measurement of a parent application, its
   component verbs, or both.  If monitoring both, one can watch the
   top-level application and then drill down to the verbs when trouble
   is spotted to learn which subcomponents are in trouble.  Each
   application verb is registered separately in the Protocol Directory
   [5] [6] as a child of its parent application.

   Application protocols implement one of three different types of
   transactions: transaction-oriented, throughput-oriented, or
   streaming-oriented.  While the availability metric is the same for
   all three types, the responsiveness metric varies:

      Transaction-Oriented: These transactions have a fairly constant
      workload to perform for all transactions.  In particular, to the
      degree that the workload may vary, it doesn't vary based on the
      amount of data to be transferred but based on the parameters of
      the transaction.  The responsiveness metric for transaction-
      oriented applications is application response time, the elapsed
      time between the user's request for service (e.g., pushing the
      submit button) and the completion of the request (e.g., displaying
      the results) and is measured in milliseconds.  This is commonly
      referred to as end-user response time.

      Throughput-Oriented: These transactions have widely varying
      workloads based on the amount of data requested.  The
      responsiveness metric for throughput-oriented applications is
      kilobits per second.

      Streaming-Oriented: These transactions deliver data at a constant
      metered rate of speed regardless of excess capacity in the
      networking and computing infrastructure.  However, when the
      infrastructures cannot deliver data at this speed, interruption of
      service or degradation of service can result.  The responsiveness
      metric for streaming-oriented applications is the signal quality:
      the ratio of the time that the service is degraded or interrupted
      to the total service time.  This metric is measured in parts per
      million.
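
   As an illustration only, the three responsiveness metrics above
   might be computed as in the following Python sketch.  The function
   names, the use of timestamps in seconds, and the convention that
   1 kilobit equals 1000 bits are assumptions made for the example and
   are not defined by this MIB Module.

```python
def response_time_ms(request_ts: float, completion_ts: float) -> float:
    """Transaction-oriented: elapsed time from request to completion, in ms."""
    return (completion_ts - request_ts) * 1000.0


def throughput_kbps(bytes_transferred: int, elapsed_s: float) -> float:
    """Throughput-oriented: kilobits per second (assumes 1 kilobit = 1000 bits)."""
    return bytes_transferred * 8 / 1000.0 / elapsed_s


def degraded_ppm(degraded_s: float, total_s: float) -> int:
    """Streaming-oriented: degraded or interrupted time, in parts per million
    of the total service time."""
    return round(degraded_s / total_s * 1_000_000)


# A one-hour stream that was degraded for 3 seconds.
print(degraded_ppm(3, 3600))
```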

2.1.  Report Aggregation

   This MIB Module provides functions to aggregate measurements into
   higher level summaries.

   Every transaction is identified by its application, server, and
   client and has an availability measure as well as a responsiveness
   measure.  The appropriate responsiveness measure is context-sensitive
   depending on whether the application is transaction-oriented,
   throughput-oriented, or streaming-oriented.  For example, in a 5
   minute period several transactions might be recorded:

   Application  Client  Server    Successful    Responsiveness
   HTTP         Jim     Sales     1             6 sec.
   SAP/R3       Jane    Finance   1             17 sec.
   HTTP         Joe     HR        0             -
   FTP          Jim     FTP       1             212 Kbps
   HTTP         Joe     HR        1             25 sec.
   RealVideo    Joe     Videoconf 1             100.0%
   HTTP         Jane    HR        1             5 sec.

   These transactions can be aggregated in several ways, providing
   statistical summaries - for example summarizing all HTTP
   transactions, or all HTTP transactions to the HR Server.  Note that
   data from different applications is not summarized together
   because:

   1. The performance characteristics of different applications differ
      widely enough to render statistical analysis meaningless.

   2. The responsiveness metrics of different applications may be
      different, making a statistical analysis impossible (in other
      words, one application may be transaction-oriented, while another
      is throughput-oriented).

   Aggregating transactions collected over a period requires an
   aggregation algorithm.  In this MIB Module, transaction aggregation
   always results in the following statistics:

   TransactionCount
      The total number of transactions during this period

   SuccessfulTransactions
      The total number of transactions that were successful.  The
      management station can derive the percent success by dividing
      SuccessfulTransactions by the TransactionCount.

   ResponsivenessMean
      The average of the responsiveness metric for all aggregated
      transactions that completed successfully.

   ResponsivenessMin
      The minimum responsiveness metric for all aggregated transactions
      that completed successfully.

   ResponsivenessMax
      The maximum responsiveness metric for all aggregated transactions
      that completed successfully.

   ResponsivenessBx
      The count of successful transactions whose responsiveness metric
      fell into the range specified for Bx.  There are 7 buckets
      specified.  Because the performance of different applications
      varies widely, the bucket ranges are specified separately for each
      application (in the apmAppDirTable) so that they may be tuned to
      typical performance of each application.

   For example, when aggregating the previous set of transactions by
   application we get (for simplicity the example only shows
   TransactionCount, SuccessfulTransactions, and ResponsivenessMean):

   Application  Count Successful      ResponsivenessMean
   HTTP         4     3               12 sec.
   SAP/R3       1     1               17 sec.
   FTP          1     1               212 Kbps.
   RealVideo    1     1               100.0%
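
   The statistics listed above can be sketched in Python as follows.
   This is a minimal illustration, not the agent algorithm: the
   Summary class, the summarize function and the bucket boundaries
   (10, 20, ... 60 seconds) are hypothetical, and a failed transaction
   is represented by None.  Only successful transactions contribute to
   the mean, minimum, maximum and bucket counts.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Summary:
    transaction_count: int
    successful_transactions: int
    responsiveness_mean: Optional[float]
    responsiveness_min: Optional[float]
    responsiveness_max: Optional[float]
    buckets: List[int]


def summarize(results: Sequence[Optional[float]],
              boundaries: Sequence[float]) -> Summary:
    """results: one responsiveness value per transaction, None if it failed.
    boundaries: ascending limits; a value v lands in bucket i where i is the
    number of boundaries that are <= v."""
    ok = [r for r in results if r is not None]
    buckets = [0] * (len(boundaries) + 1)
    for r in ok:
        buckets[sum(1 for b in boundaries if r >= b)] += 1
    return Summary(
        transaction_count=len(results),
        successful_transactions=len(ok),
        responsiveness_mean=sum(ok) / len(ok) if ok else None,
        responsiveness_min=min(ok) if ok else None,
        responsiveness_max=max(ok) if ok else None,
        buckets=buckets,
    )


# The four HTTP transactions from the example, in seconds.
# Six assumed boundaries give the 7 buckets mentioned above.
http = summarize([6, None, 25, 5], boundaries=[10, 20, 30, 40, 50, 60])
print(http.transaction_count, http.successful_transactions,
      http.responsiveness_mean)  # 4 3 12.0
```

   A management station could then derive the percent success as
   100 * successful_transactions / transaction_count, which is 75 for
   the HTTP row.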

   There are four different types of aggregation.

      The flows(1) aggregation is the simplest.  All transactions that
      share common application/server/client 3-tuples are aggregated
      together, resulting in a set of metrics for all such unique 3-
      tuples.

      The clients(2) aggregation results in somewhat more aggregation
      (i.e., fewer resulting records).  All transactions that share
      common application/client tuples are aggregated together,
      resulting in a set of metrics for all such unique tuples.

      The servers(3) aggregation usually results in still more
      aggregation (i.e., fewer resulting records).  All transactions
      that share common application/server tuples are aggregated
      together, resulting in a set of metrics for all such unique
      tuples.

      The applications(4) aggregation results in the most aggregation
      (i.e., the fewest resulting records).  All transactions that share
      a common application are aggregated together, resulting in a set
      of metrics for all such unique applications.

   For example, if in a 5 minute period the following transactions
   occurred:

   Actual Transactions:
   #   App      Client  Server   Successful    Responsiveness
   1   HTTP     Jim     CallCtr  N             -
   2   HTTP     Jim     HR       Y             12 sec.
   3   HTTP     Jim     Sales    Y             7 sec.
   4   HTTP     Jim     CallCtr  Y             5 sec.
   5   Email    Jim     Pop3     Y             12 sec.
   6   HTTP     Jane    CallCtr  Y             3 sec.
   7   SAP/R3   Jane    Finance  Y             19 sec.
   8   Email    Jane    Pop3     Y             16 sec.
   9   HTTP     Joe     HR       Y             18 sec.

   The flows(1) aggregation results in the following table.  Note that
   the first record (HTTP/Jim/CallCtr) is the aggregation of
   transactions #1 and #4:

Flow Aggregation:
App     Client  Server    Count  Succe-  Rsp    Rsp   Rsp   RspB1 RspB2
                                 ssful   Mean   Min   Max
HTTP    Jim     CallCtr   2      1       5      5     5     1     0
HTTP    Jim     HR        1      1       12     12    12    0     1
HTTP    Jim     Sales     1      1       7      7     7     1     0
Email   Jim     Pop3      1      1       12     12    12    0     1
HTTP    Jane    CallCtr   1      1       3      3     3     1     0
SAP/R3  Jane    Finance   1      1       19     19    19    0     1
Email   Jane    Pop3      1      1       16     16    16    0     1
HTTP    Joe     HR        1      1       18     18    18    0     1

   (Note: Columns above such as RspMean and RspB1 are abbreviations for
   objects in the apmReportTable)

   The clients(2) aggregation results in the following table.  Note that
   the first record (HTTP/Jim) is the aggregate of transactions #1, #2,
   #3 and #4:

   Client Aggregation:
   App     Client   Count  Succe-  Rsp    Rsp   Rsp   RspB1  RspB2 ...
                           ssful   Mean   Min   Max
   HTTP    Jim      4      3       8      5     12    2      1
   Email   Jim      1      1       12     12    12    0      1
   HTTP    Jane     1      1       3      3     3     1      0
   SAP/R3  Jane     1      1       19     19    19    0      1
   Email   Jane     1      1       16     16    16    0      1
   HTTP    Joe      1      1       18     18    18    0      1

   The servers(3) aggregation results in the following table.  Note that
   the first record (HTTP/CallCtr) is the aggregation of transactions
   #1, #4 and #6:

   Server Aggregation:
   App     Server   Count  Succe-  Rsp    Rsp   Rsp   RspB1  RspB2 ...
                           ssful   Mean   Min   Max
   HTTP    CallCtr  3      2       4      3     5     2      0
   HTTP    HR       2      2       15     12    18    0      2
   HTTP    Sales    1      1       7      7     7     1      0
   Email   Pop3     2      2       14     12    16    0      2
   SAP/R3  Finance  1      1       19     19    19    0      1

   The applications(4) aggregation results in the following table.  Note
   that the first record (HTTP) is the aggregate of transactions #1, #2,
   #3, #4, #6 and #9:

   Application Aggregation:
   App      Count  Succe-  Rsp    Rsp   Rsp   RspB1  RspB2 ...
                   ssful   Mean   Min   Max
   HTTP     6      5       9      3     18    3      2
   Email    2      2       14     12    16    0      2
   SAP/R3   1      1       19     19    19    0      1
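
   The four aggregation types differ only in which fields of the
   application/client/server 3-tuple are kept as the grouping key.  The
   Python sketch below reproduces the example tables from the nine
   transactions listed earlier.  It is illustrative only: the names
   are hypothetical, and the 10 second boundary between RspB1 and
   RspB2 is an assumption chosen to be consistent with the tables (the
   actual ranges are configured per application in the
   apmAppDirTable).

```python
from collections import defaultdict

# (app, client, server, responsiveness in seconds, or None if unsuccessful)
TRANSACTIONS = [
    ("HTTP", "Jim", "CallCtr", None),
    ("HTTP", "Jim", "HR", 12),
    ("HTTP", "Jim", "Sales", 7),
    ("HTTP", "Jim", "CallCtr", 5),
    ("Email", "Jim", "Pop3", 12),
    ("HTTP", "Jane", "CallCtr", 3),
    ("SAP/R3", "Jane", "Finance", 19),
    ("Email", "Jane", "Pop3", 16),
    ("HTTP", "Joe", "HR", 18),
]

# Fields of the 3-tuple kept as the grouping key by each aggregation type.
KEYS = {
    "flows": lambda t: (t[0], t[1], t[2]),
    "clients": lambda t: (t[0], t[1]),
    "servers": lambda t: (t[0], t[2]),
    "applications": lambda t: (t[0],),
}

B1_LIMIT = 10  # assumed boundary between RspB1 and RspB2, in seconds


def aggregate(transactions, kind):
    groups = defaultdict(list)
    for t in transactions:
        groups[KEYS[kind](t)].append(t[3])
    report = {}
    for key, values in groups.items():
        ok = [v for v in values if v is not None]
        report[key] = {
            "count": len(values),
            "successful": len(ok),
            "mean": sum(ok) / len(ok) if ok else None,
            "min": min(ok) if ok else None,
            "max": max(ok) if ok else None,
            "b1": sum(1 for v in ok if v < B1_LIMIT),
            "b2": sum(1 for v in ok if v >= B1_LIMIT),
        }
    return report


for kind in KEYS:
    print(kind, len(aggregate(TRANSACTIONS, kind)), "records")
```

   The number of resulting records falls from 8 for flows to 3 for
   applications, matching the tables above.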

   The apmReportControlTable provides for a historical set of the last
   'X' reports, combining the historical records found in history tables
   with the periodic snapshots found in TopN tables.  Conceptually the
   components are:

   apmReportControlTable
      Specifies data collection and summarization parameters, including
      the number of reports to keep and the size of each report.

   apmReport
      Each APM Report contains an aggregated list of records that
      represent data collected during a specific time period.

      An apmReportControlEntry causes a family of APM Reports to be
      created, where each report summarizes different, successive,
      contiguous periods of time.

      While the conceptual model of APM Reports shows them as distinct
      entities, they are all entries in a single apmReportTable, where
      entries in report 'A' are separated from entries in report 'B' by
      different values of the apmReportIndex.

      +-----------------------+
      |                       |
      | apmReportControlTable |
      |                       |      +-----------+
      +-----------------------+      |           |
                                 +-----------+   |
                                 |           |   |
                             +-----------+   |---+
                             |           |   |
                         +----------+    |---+
                         |          |    |               apmReport
                         |apmReport |----+  +-----------------------+
                         |          |       |Thu Mar 30 12-1PM      |
                         +----------+       |                       |
                                            |CLNT SERV  PROT  stats |
                                            |                       |
                                            |Joe  News  HTTP  data  |
                                            |Jan  POP   POP3  data  |
                                            |Jan  POP   SMTP  data  |
                                            |Bob  HR    PSOFT data  |
                                            |...                    |
                                            |...                    |
                                            +-----------------------+
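
   The idea of a control entry producing a bounded family of
   successive, contiguous reports can be sketched as follows.  This is
   a conceptual Python illustration only: the ReportFamily class and
   its fields are hypothetical, and the way the sketch assigns report
   indexes is an assumption, not the behavior of apmReportIndex as
   defined in this MIB Module.

```python
from collections import deque


class ReportFamily:
    """Keeps the most recent reports produced for one control entry."""

    def __init__(self, reports_to_keep: int, interval_s: int):
        self.interval_s = interval_s
        # A bounded deque silently drops the oldest report when full.
        self.reports = deque(maxlen=reports_to_keep)
        self.next_index = 1

    def close_period(self, start_s: int, records: list) -> int:
        """Store the report for the period starting at start_s."""
        index = self.next_index
        self.reports.append({
            "index": index,
            "start": start_s,
            "end": start_s + self.interval_s,
            "records": records,
        })
        self.next_index += 1
        return index


family = ReportFamily(reports_to_keep=3, interval_s=3600)
for hour in range(5):
    family.close_period(hour * 3600, records=[])
print([r["index"] for r in family.reports])  # [3, 4, 5]
```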

2.2.  AppLocalIndex Linkages

   The following set of example tables illustrates a few points:

   1. How protocolDirEntries, apmHttpFilterEntries and
      apmUserDefinedAppEntries (not shown) all result in entries in the
      apmAppDirTable.

   2. How a single appLocalIndex may be represented multiple times in
      the apmAppDirTable and apmReportTable if the agent measures
      multiple responsiveness types for that application.

   A convention in the formatting of these tables is that the columns to
   the left of the '|' separator are index columns for the table.

   Assuming the following entries in the RMON2 protocolDirectory:

   protocolDirectory
   ID (*)     Parameters   |    LocalIndex ...
   WWW        None         |    1
   WWW Get    None         |    2
   SAP/R3     None         |    3

     (*) These IDs are represented here symbolically.  Consult [5] for
         more detail on their format.

   and the following entry in the apmHttpFilterTable:

   apmHttpFilterTable
   Index   |  AppLocalIndex  ServerAddress   URLPath    MatchType ...
   5       |  20             hr.example.com  /expense   prefix(3) ...

   the apmAppDirTable would be populated with the following
   entries:

   apmAppDir
   AppLocalIndex  ResponsivenessType       | Config  ...
   1              transaction(1)           | On      ...
   1              throughput(2)            | On      ...
   2              transaction(1)           | On      ...
   2              throughput(2)            | On      ...
   3              transaction(1)           | On      ...
   20             transaction(1)           | On      ...
   20             throughput(2)            | On      ...

   The entries in the apmAppDirTable with an appLocalIndex of 1, 2 and 3
   correspond to the identically named entries in the protocolDirectory
   table.  appLocalIndex #1 results in 2 entries, one to measure the
   transaction responsiveness of WWW and one to measure its throughput
   responsiveness.  In contrast, appLocalIndex #3 results in only a
   transaction entry because the agent does not measure the throughput
   responsiveness for SAP/R3 (probably because it isn't very
   meaningful).  Finally, appLocalIndex #20 corresponds to the entry in
   the apmHttpFilterTable and has transaction responsiveness and
   throughput responsiveness measurements available.
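
   The population of the apmAppDirTable in this example can be
   sketched in Python as follows.  The data structures and the
   build_app_dir function are hypothetical; in particular, the
   'measured' mapping (which responsiveness types the agent measures
   for each appLocalIndex) is an assumed input that stands in for
   agent-specific knowledge.

```python
# RMON2 protocolDirectory entries: LocalIndex -> symbolic ID
protocol_dir = {1: "WWW", 2: "WWW Get", 3: "SAP/R3"}

# apmHttpFilterTable entries: Index -> columns used in the example
http_filters = {
    5: {"app_local_index": 20, "server": "hr.example.com",
        "url_path": "/expense"},
}

# Responsiveness types the agent measures per appLocalIndex (assumed input).
measured = {
    1: ["transaction", "throughput"],
    2: ["transaction", "throughput"],
    3: ["transaction"],
    20: ["transaction", "throughput"],
}

TYPE_CODES = {"transaction": 1, "throughput": 2}


def build_app_dir(protocol_dir, http_filters, measured):
    """Return rows keyed by (appLocalIndex, responsivenessType)."""
    indexes = set(protocol_dir)
    indexes |= {f["app_local_index"] for f in http_filters.values()}
    rows = {}
    for idx in sorted(indexes):
        for rtype in measured.get(idx, []):
            rows[(idx, TYPE_CODES[rtype])] = {"config": "on"}
    return rows


app_dir = build_app_dir(protocol_dir, http_filters, measured)
print(sorted(app_dir))
```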

   If a report was configured using application aggregation, entries in
   that report might look like:

   apmReportTable
   CtlIndex Index AppLocalIdx  ResponsivenessType | TransactionCount ...
   1        1     1            transaction(1)     | counters...
   1        1     1            throughput(2)      | counters...
   1        1     2            transaction(1)     | counters...
   1        1     2            throughput(2)      | counters...
   1        1     3            transaction(1)     | counters...
   1        1     20           transaction(1)     | counters...
   1        1     20           throughput(2)      | counters...

   Note that the index items protocolDirLocalIndex,
   apmReportServerAddress and apmReportClientID were omitted from the
   apmReportTable example for brevity because they would have been
   equal to zero due to the use of application aggregation in this
   example.

2.3.  Measurement Methodology

   There are many different measurement methodologies available for
   measuring application performance (e.g., probe-based, client-based,
   synthetic-transaction, etc.).  This specification does not mandate a
   particular methodology - it is open to any that meet the minimum
   requirements.  Conformance to this specification requires that the
   collected data match the semantics described herein.  In particular,
   a data collection methodology must be able to measure response time,
   throughput, streaming responsiveness and availability as specified.

   Note that in some cases a transaction may run for a long time but
   ultimately be successful.  The measurement software shouldn't
   prematurely classify lengthy transactions as failures but should wait
   as long as the client application will wait for a successful
   response.

2.4.  Instrumentation Architectures

   Different architectural approaches and deployment strategies may be
   taken towards implementation of this specification.  If a highly
   distributed approach is desired (e.g., an agent per desktop), one or
   both of the two approaches below may be used to make it more
   practical.

2.4.1.  Application Directory Caching

   It is necessary for the manager to have a copy of the tables that
   define the Application Directory in order to interpret APM
   measurements.  It is likely that in a highly distributed network of
   thousands of APM agents, this Application Directory will be the same
   on many, if not all of the agents.  Repeated downloads of the
   Application Directory may be inefficient.

   The apmAppDirID object is a single object that identifies the
   configuration of all aspects of the Application Directory when it is
   equal to a well-known, registered configuration.  Thus, when a
   manager sees an apmAppDirID value that it recognizes, it need not
   download the Application Directory from that agent.  In fact, the
   manager may discover a new registered Application Directory
   configuration on one agent and then re-use that configuration on
   another agent that shares the same apmAppDirID value.

   Application directory registrations are unique within an
   administrative domain, allowing an administrator to create a custom
   application directory configuration without the need to assign it a
   globally-unique registration.
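
   A manager-side cache keyed by apmAppDirID might look like the
   following Python sketch.  The DirectoryCache class, the download
   callback and the identifier strings are hypothetical, and the use
   of None to mean "this agent does not report a registered
   configuration" is an assumption of the sketch rather than the
   encoding used by the MIB.

```python
class DirectoryCache:
    """Caches Application Directory contents by registered apmAppDirID."""

    def __init__(self, download):
        self._download = download  # callable: agent -> directory contents
        self._by_id = {}

    def directory_for(self, agent, app_dir_id):
        # Unregistered configurations cannot be shared between agents,
        # so they are always downloaded from the agent itself.
        if app_dir_id is None:
            return self._download(agent)
        if app_dir_id not in self._by_id:
            self._by_id[app_dir_id] = self._download(agent)
        return self._by_id[app_dir_id]


calls = []


def fake_download(agent):
    """Stand-in for retrieving the directory tables from an agent."""
    calls.append(agent)
    return {"source": agent}


cache = DirectoryCache(fake_download)
cache.directory_for("agent-a", "site-dir-1")
cache.directory_for("agent-b", "site-dir-1")  # served from the cache
print(calls)  # ['agent-a']
```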

2.4.2.  Push Model

   When APM agents are installed on "desktops" (including laptops), a
   few issues make polling difficult:

   1. Desktops often have dynamically-assigned addresses so there is no
      long-lived address to poll.

   2. Desktops are available less often than infrastructure components
      due to crashes, user-initiated reboots and shutdowns, and user
      control over monitoring software.  Thus a desktop may not be
      available to answer a poll at the moment when the manager is
      scheduled to poll that desktop.

   3. Laptops that are connected via dialup connections are only
      sporadically connected and will routinely be unreachable when the
      manager is scheduled to poll.

   As a consequence, a push model is usually more appropriate for
   desktop-based agents.  To achieve this, the agent should apply the
   following rules when deciding what data to send in notifications.

   APM Reports
       If an agent wishes to push APM reports to a manager, it
       must send:
           apmAppDirID
           apmNameTable (any data updated since the last push)
       For each report the agent wishes to upload, it must
       send the entire apmReportControlEntry associated with
       that report and the associated entries in the
       apmReportTable that have changed since the last report.

   APM Transactions
       If an agent wishes to push APM transactions to
       a manager, it must send:
           apmAppDirID
           apmNameTable (any data updated since the last push)
           apmTransactionTable (relevant entries)

   APM Exceptions
       The agent must send:
           apmAppDirID
           apmNameTable (any data updated since the last push)
           apmTransactionEntry (of exception transaction)
           apmExceptionEntry (entry that generated exception)
     [Note that this list supersedes the information in the
     OBJECTS clauses of the apmTransactionResponsivenessAlarm
     and apmTransactionUnsuccessfulAlarm when the agent is
     using a push model.  This additional information
     eliminates the need for the manager to request additional
     data to understand the exception.]

   The order of varbinds and where to segment varbinds into PDUs is at
   the discretion of the agent.
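
   The rules above can be summarized as a simple checklist, sketched
   here in Python.  The push_contents function and its 'kind' argument
   are hypothetical; the sketch only lists what must be sent and does
   not encode varbinds or PDUs, whose ordering and segmentation remain
   at the discretion of the agent.

```python
def push_contents(kind: str) -> list:
    """Return the items an agent must send for the given kind of push."""
    common = [
        "apmAppDirID",
        "apmNameTable (any data updated since the last push)",
    ]
    if kind == "reports":
        return common + [
            "apmReportControlEntry (entire entry, for each report)",
            "apmReportTable (entries changed since the last report)",
        ]
    if kind == "transactions":
        return common + ["apmTransactionTable (relevant entries)"]
    if kind == "exceptions":
        return common + [
            "apmTransactionEntry (of exception transaction)",
            "apmExceptionEntry (entry that generated exception)",
        ]
    raise ValueError(f"unknown push kind: {kind}")


for item in push_contents("exceptions"):
    print(item)
```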

2.5.  Structure of this MIB Module

   The objects are arranged into the following groups:

      - APM Application Directory Group

      - APM User Defined Applications Group

      - APM Report Group

      - APM Transaction Group

      - APM Exception Group

      - APM Notification Group

   These groups are the basic unit of conformance.  If an agent
   implements a group, then it must implement all objects in that group.
   While this section provides an overview of grouping and conformance
   information for this MIB Module, the authoritative reference for such
   information is contained in the MODULE-COMPLIANCE and OBJECT-GROUP
   macros later in this MIB Module.

   These groups are defined to provide a means of assigning object
   identifiers, and to provide a method for implementors of managed
   agents to know which objects they must implement.

2.5.1.  The APM Application Directory Group

   The APM Application Directory group contains configuration objects
   for every application or application verb monitored on this system.
   This group consists of the apmAppDirTable.

2.5.2.  The APM User Defined Applications Group

   The APM User Defined Applications Group contains objects that allow
   for the tracking of applications or application verbs that aren't
   registered in the protocolDirTable.  This group consists of the
   apmHttpFilterTable and the apmUserDefinedAppTable.

2.5.3.  The APM Report Group

   The APM Report Group is used to prepare regular reports that
   aggregate application performance by flow, by client, by server, or
   by application.  This group consists of the apmReportControlTable and
   the apmReportTable.

2.5.4.  The APM Transaction Group

   The APM Transaction Group is used to show transactions that are
   currently in progress and ones that have ended recently, along with
   their responsiveness metric.

   Because many transactions last a very short time and because an agent
   may not retain completed transactions very long, transactions may
   exist in this table for a very short time.  Thus, polling this table
   isn't an effective mechanism for retrieving all transactions unless
   the value of apmTransactionsHistorySize is suitably large for the
   transactions being monitored.

   One important benefit of this table is that it allows a management
   station to check on the status of long-lived transactions.  Because
   the apmReport and apmException mechanisms act only on transactions
   that have finished, a network manager may not have visibility for
   some time into the performance of long-lived transactions such as
   streaming applications, large data transfers, or (very) poorly
   performing transactions.  In fact, by their very definition, the
   apmReport and apmException mechanisms only provide visibility into a
   problem after nothing can be done about it.  This group consists
   primarily of the apmTransactionTable.

2.5.5.  The APM Exception Group

   The APM Exception Group is used to generate immediate notifications
   of transactions that cross certain thresholds.  The apmExceptionTable
   is used to configure which thresholds are to be checked for which
   types of transactions.  The apmTransactionResponsivenessAlarm
   notification is sent when a transaction occurs with a responsiveness
   that crosses a threshold.  The apmTransactionUnsuccessfulAlarm
   notification is sent when a transaction fails for which exception
   checking was configured.  This group consists primarily of the
   apmExceptionTable.
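
   The exception check applied to a finished transaction can be
   sketched in Python as follows.  The dictionary fields
   ('comparison', 'threshold', 'check_unsuccessful') are hypothetical
   stand-ins for the configuration held in an apmExceptionEntry, and
   the choice of which direction counts as crossing a threshold is an
   assumption of the sketch (for example, "greater" for response time
   and "less" for throughput).

```python
def check_exception(entry: dict, transaction: dict):
    """Return the name of the notification to send, or None."""
    if not transaction["successful"]:
        if entry["check_unsuccessful"]:
            return "apmTransactionUnsuccessfulAlarm"
        return None
    value = transaction["responsiveness"]
    if entry["comparison"] == "greater" and value > entry["threshold"]:
        return "apmTransactionResponsivenessAlarm"
    if entry["comparison"] == "less" and value < entry["threshold"]:
        return "apmTransactionResponsivenessAlarm"
    return None


# Response time threshold of 10000 ms, with failure checking enabled.
entry = {"comparison": "greater", "threshold": 10000,
         "check_unsuccessful": True}
print(check_exception(entry, {"successful": True, "responsiveness": 25000}))
print(check_exception(entry, {"successful": False, "responsiveness": None}))
```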

2.5.6.  The APM Notification Group

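   The APM Notification Group contains the 2 notifications described
   above, apmTransactionResponsivenessAlarm and
   apmTransactionUnsuccessfulAlarm, which are sent when a transaction
   meets an exception condition configured in the apmExceptionTable.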
