Internet Engineering Task Force (IETF)                        T. Mizrahi
Request for Comments: 7276                                       Marvell
Category: Informational                                      N. Sprecher
ISSN: 2070-1721                             Nokia Solutions and Networks
                                                           E. Bellagamba
                                                                Ericsson
                                                           Y. Weingarten
                                                               June 2014

 An Overview of Operations, Administration, and Maintenance (OAM) Tools

Abstract
Operations, Administration, and Maintenance (OAM) is a general term that refers to a toolset for fault detection and isolation, and for performance measurement. Over the years, various OAM tools have been defined for various layers in the protocol stack. This document summarizes some of the OAM tools defined in the IETF in the context of IP unicast, MPLS, MPLS Transport Profile (MPLS-TP), pseudowires, and Transparent Interconnection of Lots of Links (TRILL). This document focuses on tools for detecting and isolating failures in networks and for performance monitoring. Control and management aspects of OAM are outside the scope of this document. Network repair functions such as Fast Reroute (FRR) and protection switching, which are often triggered by OAM protocols, are also out of the scope of this document. The target audience of this document includes network equipment vendors, network operators, and standards development organizations. This document can be used as an index to some of the main OAM tools defined in the IETF. At the end of the document, a list of the OAM toolsets and a list of the OAM functions are presented as a summary.
Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7276.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
   1. Introduction
      1.1. Background
      1.2. Target Audience
      1.3. OAM-Related Work in the IETF
      1.4. Focusing on the Data Plane
   2. Terminology
      2.1. Abbreviations
      2.2. Terminology Used in OAM Standards
           2.2.1. General Terms
           2.2.2. Operations, Administration, and Maintenance
           2.2.3. Functions, Tools, and Protocols
           2.2.4. Data Plane, Control Plane, and Management Plane
           2.2.5. The Players
           2.2.6. Proactive and On-Demand Activation
           2.2.7. Connectivity Verification and Continuity Checks
           2.2.8. Connection-Oriented vs. Connectionless Communication
           2.2.9. Point-to-Point vs. Point-to-Multipoint Services
           2.2.10. Failures
   3. OAM Functions
   4. OAM Tools in the IETF - A Detailed Description
      4.1. IP Ping
      4.2. IP Traceroute
      4.3. Bidirectional Forwarding Detection (BFD)
           4.3.1. Overview
           4.3.2. Terminology
           4.3.3. BFD Control
           4.3.4. BFD Echo
      4.4. MPLS OAM
           4.4.1. LSP Ping
           4.4.2. BFD for MPLS
           4.4.3. OAM for Virtual Private Networks (VPNs) over MPLS
      4.5. MPLS-TP OAM
           4.5.1. Overview
           4.5.2. Terminology
           4.5.3. Generic Associated Channel
           4.5.4. MPLS-TP OAM Toolset
                  4.5.4.1. Continuity Check and Connectivity Verification
                  4.5.4.2. Route Tracing
                  4.5.4.3. Lock Instruct
                  4.5.4.4. Lock Reporting
                  4.5.4.5. Alarm Reporting
                  4.5.4.6. Remote Defect Indication
                  4.5.4.7. Client Failure Indication
                  4.5.4.8. Performance Monitoring
                           4.5.4.8.1. Packet Loss Measurement (LM)
                           4.5.4.8.2. Packet Delay Measurement (DM)
      4.6. Pseudowire OAM
           4.6.1. Pseudowire OAM Using Virtual Circuit Connectivity Verification (VCCV)
           4.6.2. Pseudowire OAM Using G-ACh
           4.6.3. Attachment Circuit - Pseudowire Mapping
      4.7. OWAMP and TWAMP
           4.7.1. Overview
           4.7.2. Control and Test Protocols
           4.7.3. OWAMP
           4.7.4. TWAMP
      4.8. TRILL
   5. Summary
      5.1. Summary of OAM Tools
      5.2. Summary of OAM Functions
      5.3. Guidance to Network Equipment Vendors
   6. Security Considerations
   7. Acknowledgments
   8. References
      8.1. Normative References
      8.2. Informative References
   Appendix A. List of OAM Documents
      A.1. List of IETF OAM Documents
      A.2. List of Selected Non-IETF OAM Documents

1. Introduction
"OAM" is a general term that refers to a toolset for detecting, isolating, and reporting failures, and for monitoring network performance. There are several different interpretations of the "OAM" acronym. This document refers to Operations, Administration, and Maintenance, as recommended in Section 3 of [OAM-Def]. This document summarizes some of the OAM tools defined in the IETF in the context of IP unicast, MPLS, MPLS Transport Profile (MPLS-TP), pseudowires, and TRILL. This document focuses on tools for detecting and isolating failures and for performance monitoring. Hence, this document focuses on the tools used for monitoring and measuring the data plane; control and management aspects of OAM are outside the scope of this document. Network repair functions such as Fast Reroute (FRR) and protection switching, which are often triggered by OAM protocols, are also out of the scope of this document.
1.1. Background
OAM was originally used in traditional communication technologies such as E1 and T1, evolving into Plesiochronous Digital Hierarchy (PDH) and then later into Synchronous Optical Network / Synchronous Digital Hierarchy (SONET/SDH). ATM was probably the first technology to include inherent OAM support from day one, while in other technologies OAM was typically defined in an ad hoc manner after the technology was already defined and deployed. Packet-based networks were traditionally considered unreliable and best effort. As packet-based networks evolved, they have become the common transport for both data and telephony, replacing traditional transport protocols. Consequently, packet-based networks were expected to provide a similar "carrier grade" experience, and specifically to support more advanced OAM functions, beyond ICMP and router hellos, that were traditionally used for fault detection.

As typical networks have a multi-layer architecture, the set of OAM protocols similarly takes a multi-layer structure; each layer has its own OAM protocols. Moreover, OAM can be used at different levels of hierarchy in the network to form a multi-layer OAM solution, as shown in the example in Figure 1.

Figure 1 illustrates a network in which IP traffic between two customer edges is transported over an MPLS provider network. MPLS OAM is used at the provider level for monitoring the connection between the two provider edges, while IP OAM is used at the customer level for monitoring the end-to-end connection between the two customer edges.

  |<-------------- Customer-level OAM -------------->|
   IP OAM (Ping, Traceroute, OWAMP, TWAMP)

              |<- Provider-level OAM ->|
                MPLS OAM (LSP Ping)

  +-----+       +----+                        +----+       +-----+
  |     |       |    |========================|    |       |     |
  |     |-------|    |          MPLS          |    |-------|     |
  |     |  IP   |    |                        |    |  IP   |     |
  +-----+       +----+                        +----+       +-----+
  Customer     Provider                      Provider     Customer
   Edge          Edge                          Edge         Edge

               Figure 1: Example of Multi-layer OAM
1.2. Target Audience
The target audience of this document includes:

o  Standards development organizations - Both IETF working groups and non-IETF organizations can benefit from this document when designing new OAM protocols, or when looking to reuse existing OAM tools for new technologies.

o  Network equipment vendors and network operators can use this document as an index to some of the common IETF OAM tools.

It should be noted that some background in OAM is necessary in order to understand and benefit from this document. Specifically, the reader is assumed to be familiar with the term "OAM" [OAM-Def], the motivation for using OAM, and the distinction between OAM and network management [OAM-Mng].

1.3. OAM-Related Work in the IETF
This memo provides an overview of the different sets of OAM tools defined by the IETF. The set of OAM tools described in this memo is applicable to IP unicast, MPLS, pseudowires, MPLS Transport Profile (MPLS-TP), and TRILL. While OAM tools that are applicable to other technologies exist, they are beyond the scope of this memo.

This document focuses on IETF documents that have been published as RFCs, while other ongoing OAM-related work is outside the scope.

The IETF has defined OAM protocols and tools in several different contexts. We roughly categorize these efforts into a few sets of OAM-related RFCs, listed in Table 1. Each set defines a logically coupled set of RFCs, although the sets are in some cases intertwined by common tools and protocols. The discussion in this document is ordered according to these sets (the acronyms and abbreviations are listed in Section 2.1).
               +--------------+------------+
               | Toolset      | Transport  |
               |              | Technology |
               +--------------+------------+
               |IP Ping       | IPv4/IPv6  |
               +--------------+------------+
               |IP Traceroute | IPv4/IPv6  |
               +--------------+------------+
               |BFD           | generic    |
               +--------------+------------+
               |MPLS OAM      | MPLS       |
               +--------------+------------+
               |MPLS-TP OAM   | MPLS-TP    |
               +--------------+------------+
               |Pseudowire OAM| Pseudowires|
               +--------------+------------+
               |OWAMP and     | IPv4/IPv6  |
               |TWAMP         |            |
               +--------------+------------+
               |TRILL OAM     | TRILL      |
               +--------------+------------+

      Table 1: OAM Toolset Packages in the IETF Documents

This document focuses on OAM tools that have been developed in the IETF. A short summary of some of the significant OAM standards that have been developed in other standard organizations is presented in Appendix A.2.

1.4. Focusing on the Data Plane
OAM tools may, and quite often do, work in conjunction with a control plane and/or management plane. OAM provides instrumentation tools for measuring and monitoring the data plane. OAM tools often use control-plane functions, e.g., to initialize OAM sessions and to exchange various parameters. The OAM tools communicate with the management plane to raise alarms, and often OAM tools may be activated by the management plane (as well as by the control plane), e.g., to locate and localize problems. The considerations of the control-plane maintenance tools and the functionality of the management plane are out of scope for this document, which concentrates on presenting the data-plane tools that are used for OAM. Network repair functions such as Fast Reroute (FRR) and protection switching, which are often triggered by OAM protocols, are also out of the scope of this document.
Since OAM protocols are used for monitoring the data plane, it is imperative for OAM tools to be capable of testing the actual data plane with as much accuracy as possible. Thus, it is important to enforce fate-sharing between OAM traffic that monitors the data plane and the data-plane traffic it monitors.

2. Terminology
2.1. Abbreviations
   ACH        Associated Channel Header
   AIS        Alarm Indication Signal
   ATM        Asynchronous Transfer Mode
   BFD        Bidirectional Forwarding Detection
   CC         Continuity Check
   CC-V       Continuity Check and Connectivity Verification
   CV         Connectivity Verification
   DM         Delay Measurement
   ECMP       Equal-Cost Multipath
   FEC        Forwarding Equivalence Class
   FRR        Fast Reroute
   G-ACh      Generic Associated Channel
   GAL        Generic Associated Channel Label
   ICMP       Internet Control Message Protocol
   L2TP       Layer 2 Tunneling Protocol
   L2VPN      Layer 2 Virtual Private Network
   L3VPN      Layer 3 Virtual Private Network
   LCCE       L2TP Control Connection Endpoint
   LDP        Label Distribution Protocol
   LER        Label Edge Router
   LM         Loss Measurement
   LSP        Label Switched Path
   LSR        Label Switching Router
   ME         Maintenance Entity
   MEG        Maintenance Entity Group
   MEP        MEG End Point
   MIP        MEG Intermediate Point
   MP         Maintenance Point
   MPLS       Multiprotocol Label Switching
   MPLS-TP    MPLS Transport Profile
   MTU        Maximum Transmission Unit
   OAM        Operations, Administration, and Maintenance
   OWAMP      One-Way Active Measurement Protocol
   PDH        Plesiochronous Digital Hierarchy
   PE         Provider Edge
   PSN        Public Switched Network
   PW         Pseudowire
   PWE3       Pseudowire Emulation Edge-to-Edge
   RBridge    Routing Bridge
   RDI        Remote Defect Indication
   SDH        Synchronous Digital Hierarchy
   SONET      Synchronous Optical Network
   TRILL      Transparent Interconnection of Lots of Links
   TTL        Time To Live
   TWAMP      Two-Way Active Measurement Protocol
   VCCV       Virtual Circuit Connectivity Verification
   VPN        Virtual Private Network

2.2. Terminology Used in OAM Standards
2.2.1. General Terms
A wide variety of terms is used in various OAM standards. This section presents a comparison of the terms used in various OAM standards, without fully quoting the definition of each term.

An interesting overview of the term "OAM" and its derivatives is presented in [OAM-Def]. A thesaurus of terminology for MPLS-TP terms is presented in [TP-Term], which provides a good summary of some of the OAM-related terminology.

2.2.2. Operations, Administration, and Maintenance
The following definition of OAM is quoted from [OAM-Def]:

The components of the "OAM" acronym (and provisioning) are defined as follows:

o  Operations - Operation activities are undertaken to keep the network (and the services that the network provides) up and running. It includes monitoring the network and finding problems. Ideally these problems should be found before users are affected.

o  Administration - Administration activities involve keeping track of resources in the network and how they are used. It includes all the bookkeeping that is necessary to track networking resources and the network under control.

o  Maintenance - Maintenance activities are focused on facilitating repairs and upgrades -- for example, when equipment must be replaced, when a router needs a patch for an operating system image, or when a new switch is added to a network. Maintenance also involves corrective and preventive measures to make the managed network run more effectively, e.g., adjusting device configuration and parameters.
2.2.3. Functions, Tools, and Protocols
OAM Function

   An OAM function is an instrumentation measurement type or diagnostic.

   OAM functions are the atomic building blocks of OAM, where each function defines an OAM capability.

   Typical examples of OAM functions are presented in Section 3.

OAM Protocol

   An OAM protocol is a protocol used for implementing one or more OAM functions.

   The OWAMP-Test [OWAMP] is an example of an OAM protocol.

OAM Tool

   An OAM tool is a specific means of applying one or more OAM functions.

   In some cases, an OAM protocol *is* an OAM tool, e.g., OWAMP-Test. In other cases, an OAM tool uses a set of protocols that are not strictly OAM related; for example, Traceroute (Section 4.2) can be implemented using UDP and ICMP messages, without using an OAM protocol per se.

2.2.4. Data Plane, Control Plane, and Management Plane
Data Plane

   The data plane is the set of functions used to transfer data in the stratum or layer under consideration [ITU-Terms].

   The data plane is also known as the forwarding plane or the user plane.

Control Plane

   The control plane is the set of protocols and mechanisms that enable routers to efficiently learn how to forward packets towards their final destination (based on [Comp]).

Management Plane

   The term "Management Plane", as described in [Mng], is used to describe the exchange of management messages through management protocols (often transported by IP and by IP transport protocols) between management applications and the managed entities such as network nodes.

Data Plane vs. Control Plane vs. Management Plane

   The distinction between the planes is at times a bit vague. For example, the definition of "Control Plane" above may imply that OAM tools such as ping, BFD, and others are in fact in the control plane.

   This document focuses on tools used for monitoring the data plane. While these tools could arguably be considered to be in the control plane, these tools monitor the data plane, and hence it is imperative to have fate-sharing between OAM traffic that monitors the data plane and the data-plane traffic it monitors.

   Another potentially vague distinction is between the management plane and control plane. The management plane should be seen as separate from, but possibly overlapping with, the control plane (based on [Mng]).

2.2.5. The Players
An OAM tool is used between two (or more) peers. Various terms are used in IETF documents to refer to the players that take part in OAM. Table 2 summarizes the terms used in each of the toolsets discussed in this document.
   +--------------------------+---------------------------+
   | Toolset                  | Terms                     |
   +--------------------------+---------------------------+
   | Ping / Traceroute        |- Host                     |
   | ([ICMPv4], [ICMPv6],     |- Node                     |
   | [TCPIP-Tools])           |- Interface                |
   |                          |- Gateway                  |
   +--------------------------+---------------------------+
   | BFD [BFD]                |- System                   |
   +--------------------------+---------------------------+
   | MPLS OAM [MPLS-OAM-FW]   |- LSR                      |
   +--------------------------+---------------------------+
   | MPLS-TP OAM [TP-OAM-FW]  |- End Point - MEP          |
   |                          |- Intermediate Point - MIP |
   +--------------------------+---------------------------+
   | Pseudowire OAM [VCCV]    |- PE                       |
   |                          |- LCCE                     |
   +--------------------------+---------------------------+
   | OWAMP and TWAMP          |- Host                     |
   | ([OWAMP], [TWAMP])       |- End system               |
   +--------------------------+---------------------------+
   | TRILL OAM [TRILL-OAM]    |- RBridge                  |
   +--------------------------+---------------------------+

            Table 2: Maintenance Point Terminology

2.2.6. Proactive and On-Demand Activation
The different OAM tools may be used in one of two basic types of activation:

Proactive

   Proactive activation - indicates that the tool is activated on a continual basis, where messages are sent periodically, and errors are detected when a certain number of expected messages are not received.

On-demand

   On-demand activation - indicates that the tool is activated "manually" to detect a specific anomaly.
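
As an illustration only, the proactive detection rule above (a defect is declared once a certain number of expected messages have not been received) might be sketched as follows. The class name, the field names, and the threshold of three missed messages are assumptions made for this example; each OAM tool defines its own timers and thresholds.

```python
from dataclasses import dataclass


@dataclass
class ProactiveMonitor:
    """Hypothetical receiver that flags an error after N missed messages."""

    interval: float          # expected spacing between messages, in seconds
    miss_threshold: int = 3  # assumed value; real tools define their own
    last_rx: float = 0.0     # time at which the latest message arrived

    def on_message(self, now: float) -> None:
        # Any received message refreshes the session.
        self.last_rx = now

    def missed(self, now: float) -> int:
        # Number of whole intervals that elapsed without a message.
        return int((now - self.last_rx) // self.interval)

    def is_down(self, now: float) -> bool:
        return self.missed(now) >= self.miss_threshold


monitor = ProactiveMonitor(interval=1.0)
monitor.on_message(now=10.0)
print(monitor.is_down(now=12.5))  # two intervals missed: False
print(monitor.is_down(now=13.0))  # three intervals missed: True
```

An on-demand tool, by contrast, would run a single such check when an operator invokes it rather than keeping a timer running continually.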
2.2.7. Connectivity Verification and Continuity Checks
Two distinct classes of failure management functions are used in OAM protocols: Connectivity Verification and Continuity Checks. The distinction between these terms is defined in [MPLS-TP-OAM] and is used similarly in this document.

Continuity Check

   Continuity Checks are used to verify that a destination is reachable, and are typically sent proactively, though they can be invoked on-demand as well.

Connectivity Verification

   A Connectivity Verification function allows Alice to check whether she is connected to Bob or not. It is noted that while the CV function is performed in the data plane, the "expected path" is predetermined in either the control plane or the management plane. A Connectivity Verification (CV) protocol typically uses a CV message, followed by a CV reply that is sent back to the originator. A CV function can be applied proactively or on-demand.

   Connectivity Verification tools often perform path verification as well, allowing Alice to verify that messages from Bob are received through the correct path, thereby verifying not only that the two MPs are connected, but also that they are connected through the expected path, allowing detection of unexpected topology changes.

   Connectivity Verification functions can also be used for checking the MTU of the path between the two peers.

   Connectivity Verification and Continuity Checks are considered complementary mechanisms and are often used in conjunction with each other.
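
The difference between the two functions can be sketched in a few lines. This is a simplified model rather than the behavior of any specific protocol: the message fields (source_id, hops) and the result strings are hypothetical, and recording the traversed nodes inside the message is an assumption made so that the path comparison is easy to show.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OamMessage:
    source_id: str   # hypothetical field: identifier of the sending MP
    hops: List[str]  # hypothetical field: nodes the message traversed


def continuity_check(msg: Optional[OamMessage]) -> bool:
    """CC: any message arriving is taken as evidence of reachability."""
    return msg is not None


def connectivity_verification(msg: OamMessage, expected_source: str,
                              expected_path: List[str]) -> str:
    """CV: confirm who the peer is, then which path the message took."""
    if msg.source_id != expected_source:
        return "misconnection"
    if msg.hops != expected_path:
        return "unexpected-path"
    return "ok"


msg = OamMessage(source_id="bob", hops=["P1", "P2"])
print(continuity_check(msg))                                  # True
print(connectivity_verification(msg, "bob", ["P1", "P2"]))    # ok
print(connectivity_verification(msg, "bob", ["P1", "P3"]))    # unexpected-path
print(connectivity_verification(msg, "carol", ["P1", "P2"]))  # misconnection
```

In this model a Continuity Check passes in all four cases, since a message arrived; only Connectivity Verification distinguishes the wrong peer and the wrong path.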
2.2.8. Connection-Oriented vs. Connectionless Communication
Connection-Oriented

   In connection-oriented technologies, an end-to-end connection is established (by a control protocol or provisioned by a management system) prior to the transmission of data.

   Typically a connection identifier is used to identify the connection. In connection-oriented technologies, it is often the case (although not always) that all packets belonging to a specific connection use the same route through the network.

Connectionless

   In connectionless technologies, data is typically sent between end points without prior arrangement. Packets are routed independently based on their destination address, and hence different packets may be routed in a different way across the network.

Discussion

   The OAM tools described in this document include tools that support connection-oriented technologies, as well as tools for connectionless technologies.

   In connection-oriented technologies, OAM is used to monitor a *specific* connection; OAM packets are forwarded through the same route as the data traffic and receive the same treatment. In connectionless technologies, OAM is used between a source and destination pair without defining a specific connection. Moreover, in some cases, the route of OAM packets may differ from the one of the data traffic. For example, the connectionless IP Ping (Section 4.1) tests the reachability from a source to a given destination, while the connection-oriented LSP Ping (Section 4.4.1) is used for monitoring a specific LSP (connection) and provides the capability to monitor all the available paths used by an LSP.

   It should be noted that in some cases connectionless protocols are monitored by connection-oriented OAM protocols. For example, while IP is a connectionless protocol, it can be monitored by BFD (Section 4.3), which is connection oriented.
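
One way the route of OAM packets can come to differ from that of the data traffic in a connectionless network is hash-based load balancing over equal-cost paths (ECMP). The sketch below is a toy model under stated assumptions: the hash function, the fields that feed it, the router names, and the addresses are all chosen for illustration, and real routers use their own vendor-specific hash inputs and algorithms.

```python
import hashlib
from typing import Sequence, Tuple

# (source IP, destination IP, protocol, source port, destination port)
Flow = Tuple[str, str, int, int, int]


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Toy ECMP: hash the flow fields and select one equal-cost next hop."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


next_hops = ["R1", "R2", "R3", "R4"]                      # hypothetical routers
data_flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 443)  # TCP data traffic
probe_flow = ("192.0.2.1", "198.51.100.7", 1, 0, 0)       # ICMP echo, no ports

print("data :", pick_next_hop(data_flow, next_hops))
print("probe:", pick_next_hop(probe_flow, next_hops))
```

Because the probe and the data flow present different field values to the hash, nothing guarantees that both select the same next hop, so a successful probe does not by itself show that the data path is healthy.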
2.2.9. Point-to-Point vs. Point-to-Multipoint Services
Point-to-point (P2P)

   A P2P service delivers data from a single source to a single destination.

Point-to-multipoint (P2MP)

   A P2MP service delivers data from a single source to one or more destinations (based on [Signal]).

   An MP2MP service is a service that delivers data from more than one source to one or more receivers (based on [Signal]).

   Note: the two definitions for P2MP and MP2MP are quoted from [Signal]. Although [Signal] describes a specific case of P2MP and MP2MP that is MPLS-specific, these two definitions also apply to non-MPLS cases.

Discussion

   The OAM tools described in this document include tools for P2P services, as well as tools for P2MP services.

   The distinction between P2P services and P2MP services affects the corresponding OAM tools. A P2P service is typically simpler to monitor, as it consists of a single pair of endpoints. P2MP and MP2MP services present several challenges. For example, in a P2MP service, the OAM mechanism not only verifies that each of the destinations is reachable from the source but also verifies that the P2MP distribution tree is intact and loop-free.

2.2.10. Failures
The terms "Failure", "Fault", and "Defect" are used interchangeably in the standards, referring to a malfunction that can be detected by a Connectivity Verification or a Continuity Check. In some standards, such as 802.1ag [IEEE802.1Q], there is no distinction between these terms, while in other standards each of these terms refers to a different type of malfunction.
The terminology used in IETF MPLS-TP OAM is based on the ITU-T terminology, which distinguishes between these three terms in [ITU-T-G.806] as follows:

Fault

   The term "Fault" refers to an inability to perform a required action, e.g., an unsuccessful attempt to deliver a packet.

Defect

   The term "Defect" refers to an interruption in the normal operation, such as a consecutive period of time where no packets are delivered successfully.

Failure

   The term "Failure" refers to the termination of the required function. While a Defect typically refers to a limited period of time, a failure refers to a long period of time.
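
The three terms can be read as increasing durations of the same underlying problem. The sketch below classifies a run of consecutive undelivered packets on that basis. It is an interpretation for illustration only: the function name and both thresholds (3 and 10 seconds) are assumptions and are not taken from [ITU-T-G.806], which defines its own criteria.

```python
def classify(consecutive_losses: int, interval: float,
             defect_after: float = 3.0, failure_after: float = 10.0) -> str:
    """Map a run of lost packets to a term; thresholds are hypothetical."""
    if consecutive_losses == 0:
        return "none"
    # Length of the period during which no packet was delivered.
    outage = consecutive_losses * interval
    if outage >= failure_after:
        return "failure"  # long period: the function has terminated
    if outage >= defect_after:
        return "defect"   # limited period with no successful delivery
    return "fault"        # isolated unsuccessful delivery attempt


print(classify(1, interval=1.0))   # fault
print(classify(3, interval=1.0))   # defect
print(classify(10, interval=1.0))  # failure
```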