RFC 6372

MPLS Transport Profile (MPLS-TP) Survivability Framework

Pages: 56
Informational

RFC6372 - Page 1

Internet Engineering Task Force (IETF)                  N. Sprecher, Ed.
Request for Comments: 6372                        Nokia Siemens Networks
Category: Informational                                   A. Farrel, Ed.
ISSN: 2070-1721                                         Juniper Networks
                                                          September 2011


        MPLS Transport Profile (MPLS-TP) Survivability Framework

Abstract

   Network survivability is the ability of a network to recover traffic
   delivery following failure or degradation of network resources.
   Survivability is critical for the delivery of guaranteed network
   services, such as those subject to strict Service Level Agreements
   (SLAs) that place maximum bounds on the length of time that services
   may be degraded or unavailable.

   The Transport Profile of Multiprotocol Label Switching (MPLS-TP) is a
   packet-based transport technology based on the MPLS data plane that
   reuses many aspects of the MPLS management and control planes.

   This document comprises a framework for the provision of
   survivability in an MPLS-TP network; it describes recovery elements,
   types, methods, and topological considerations.  To enable data-plane
   recovery, survivability may be supported by the control plane,
   management plane, and by Operations, Administration, and Maintenance
   (OAM) functions.  This document describes mechanisms for recovering
   MPLS-TP Label Switched Paths (LSPs).  A detailed description of
   pseudowire recovery in MPLS-TP networks is beyond the scope of this
   document.

   This document is a product of a joint Internet Engineering Task Force
   (IETF) / International Telecommunication Union Telecommunication
   Standardization Sector (ITU-T) effort to include an MPLS Transport
   Profile within the IETF MPLS and Pseudowire Emulation Edge-to-Edge
   (PWE3) architectures to support the capabilities and functionalities
   of a packet-based transport network as defined by the ITU-T.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are a candidate for any level of Internet
   Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6372.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction ....................................................4
      1.1. Recovery Schemes ...........................................4
      1.2. Recovery Action Initiation .................................5
      1.3. Recovery Context ...........................................6
      1.4. Scope of This Framework ....................................7
   2. Terminology and References ......................................8
   3. Requirements for Survivability .................................10
   4. Functional Architecture ........................................10
      4.1. Elements of Control .......................................10
           4.1.1. Operator Control ...................................11
           4.1.2. Defect-Triggered Actions ...........................12
           4.1.3. OAM Signaling ......................................12
           4.1.4. Control-Plane Signaling ............................12
      4.2. Recovery Scope ............................................13
           4.2.1. Span Recovery ......................................13
           4.2.2. Segment Recovery ...................................13
           4.2.3. End-to-End Recovery ................................14
      4.3. Grades of Recovery ........................................15
           4.3.1. Dedicated Protection ...............................15
           4.3.2. Shared Protection ..................................16
           4.3.3. Extra Traffic ......................................17
           4.3.4. Restoration ........................................19
           4.3.5. Reversion ..........................................20
      4.4. Mechanisms for Protection .................................20
           4.4.1. Link-Level Protection ..............................20
           4.4.2. Alternate Paths and Segments .......................21
           4.4.3. Protection Tunnels .................................22
      4.5. Recovery Domains ..........................................23
      4.6. Protection in Different Topologies ........................24
      4.7. Mesh Networks .............................................25
           4.7.1. 1:n Linear Protection ..............................26
           4.7.2. 1+1 Linear Protection ..............................28
           4.7.3. P2MP Linear Protection .............................29
           4.7.4. Triggers for the Linear Protection
                  Switching Action ...................................30
           4.7.5. Applicability of Linear Protection for LSP
                  Segments ...........................................31
           4.7.6. Shared Mesh Protection .............................32
      4.8. Ring Networks .............................................33
      4.9. Recovery in Layered Networks ..............................34
           4.9.1. Inherited Link-Level Protection ....................35
           4.9.2. Shared Risk Groups .................................35
           4.9.3. Fault Correlation ..................................36
   5. Applicability and Scope of Survivability in MPLS-TP ............37
   6. Mechanisms for Providing Survivability for MPLS-TP LSPs ........39
      6.1. Management Plane ..........................................39
           6.1.1. Configuration of Protection Operation ..............40
           6.1.2. External Manual Commands ...........................41
      6.2. Fault Detection ...........................................41
      6.3. Fault Localization ........................................42
      6.4. OAM Signaling .............................................43
           6.4.1. Fault Detection ....................................44
           6.4.2. Testing for Faults .................................44
           6.4.3. Fault Localization .................................45
           6.4.4. Fault Reporting ....................................45
           6.4.5. Coordination of Recovery Actions ...................46
      6.5. Control Plane .............................................46
           6.5.1. Fault Detection ....................................47
           6.5.2. Testing for Faults .................................47
           6.5.3. Fault Localization .................................48
           6.5.4. Fault Status Reporting .............................48
           6.5.5. Coordination of Recovery Actions ...................49
           6.5.6. Establishment of Protection and Restoration LSPs ...49
   7. Pseudowire Recovery Considerations .............................50
      7.1. Utilization of Underlying MPLS-TP Recovery ................50
      7.2. Recovery in the Pseudowire Layer ..........................51
   8. Manageability Considerations ...................................51
   9. Security Considerations ........................................52
   10. Acknowledgments ...............................................52
   11. References ....................................................53
      11.1. Normative References .....................................53
      11.2. Informative References ...................................54

1. Introduction

   Network survivability is the network's ability to recover traffic
   delivery after it has failed or degraded as a result of a network
   fault or a denial-of-service attack on the network.  Survivability
   plays a critical role in the delivery of
   reliable services in transport networks.  Guaranteed services in the
   form of Service Level Agreements (SLAs) require a resilient network
   that very rapidly detects facility or node degradation or failures,
   and immediately starts to recover network operations in accordance
   with the terms of the SLA.

   The MPLS Transport Profile (MPLS-TP) is described in [RFC5921].
   MPLS-TP is designed to be consistent with existing transport network
   operations and management models, while providing survivability
   mechanisms, such as protection and restoration.  The functionality
   provided is intended to be similar to or better than that found in
   established transport networks that set a high benchmark for
   reliability.  That is, it is intended to provide the operator with
   functions with which they are familiar through their experience with
   other transport networks, although this does not preclude additional
   techniques.

   This document provides a framework for MPLS-TP-based survivability
   that meets the recovery requirements specified in [RFC5654].  It uses
   the recovery terminology defined in [RFC4427], which draws heavily on
   [G.808.1].

   This document is a product of a joint Internet Engineering Task Force
   (IETF) / International Telecommunication Union Telecommunication
   Standardization Sector (ITU-T) effort to include an MPLS Transport
   Profile within the IETF MPLS and PWE3 architectures to support the
   capabilities and functionalities of a packet-based transport network,
   as defined by the ITU-T.

1.1.  Recovery Schemes

   Various recovery schemes (for protection and restoration) and
   processes have been defined and analyzed in [RFC4427] and [RFC4428].
   These schemes can also be applied in MPLS-TP networks to re-establish
   end-to-end traffic delivery according to the agreed service
   parameters, and to trigger recovery from "failed" or "degraded"
   transport entities.  In the context of this document, transport
   entities are nodes, links, transport path segments, concatenated
   transport path segments, and entire transport paths.  Recovery
   actions are initiated by the detection of a defect, or by an external
   request (e.g., an operator's request for manual control of protection
   switching).

   [RFC4427] makes a distinction between protection switching and
   restoration mechanisms.

   - Protection switching uses pre-assigned capacity between nodes,
     where the simplest scheme has a single, dedicated protection entity
     for each working entity, while the most complex scheme has m
     protection entities shared between n working entities (m:n).

   - Restoration uses any capacity available between nodes and usually
     involves rerouting.  The resources used for restoration may be pre-
     planned (i.e., predetermined, but not yet allocated to the recovery
     path), and recovery priority may be used as a differentiation
     mechanism to determine which services are recovered and which are
     not recovered.
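
   As an illustration of the m:n case, the following minimal Python
   sketch models a protection group in which m pre-assigned protection
   entities are shared between n working entities.  The names
   ProtectionGroup and switch are hypothetical and are not defined by
   [RFC4427] or by this document, and the first-free selection policy is
   an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProtectionGroup:
    """m protection entities shared between n working entities (m:n)."""

    working: List[str]
    protection: List[str]
    # Maps a failed working entity to the protection entity now in use.
    in_use: Dict[str, str] = field(default_factory=dict)

    def switch(self, failed: str) -> Optional[str]:
        """Move traffic of a failed working entity to a free protection
        entity.  Returns None when all pre-assigned capacity is taken."""
        if failed not in self.working:
            raise ValueError(f"{failed} is not a working entity")
        if failed in self.in_use:
            return self.in_use[failed]  # already protected
        busy = set(self.in_use.values())
        for candidate in self.protection:
            if candidate not in busy:  # assumption: first free one wins
                self.in_use[failed] = candidate
                return candidate
        return None
```

   With three working entities and a single protection entity (1:3),
   the first failure is protected, while a second concurrent failure
   finds no free protection entity and is left unprotected.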

   Both protection switching and restoration may be either
   unidirectional or bidirectional; unidirectional implies that
   protection switching is performed independently for each direction of
   a bidirectional transport path, while bidirectional means that both
   directions are switched simultaneously using appropriate
   coordination, even if the fault applies to only one direction of the
   path.
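
   The difference between the two modes can be sketched as a small
   Python function.  The function name and the direction labels
   "forward" and "reverse" are assumptions chosen for illustration only;
   the coordination exchange that bidirectional switching requires is
   not modeled.

```python
from typing import Set

DIRECTIONS = frozenset({"forward", "reverse"})


def directions_to_switch(faulted: Set[str], bidirectional: bool) -> Set[str]:
    """Return the directions of a bidirectional path that are switched."""
    unknown = set(faulted) - DIRECTIONS
    if unknown:
        raise ValueError(f"unknown direction(s): {sorted(unknown)}")
    if not faulted:
        return set()  # no fault, so nothing is switched
    if bidirectional:
        # Both directions are switched together, even when the fault
        # affects only one of them.
        return set(DIRECTIONS)
    # Unidirectional: each direction is switched independently.
    return set(faulted)
```

   For a fault in the forward direction only, unidirectional switching
   moves just that direction, whereas bidirectional switching moves
   both.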

   Both protection and restoration mechanisms may be either revertive or
   non-revertive as described in Section 4.11 of [RFC4427].

   Preemption priority may be used to determine which services are
   sacrificed to enable the recovery of other services.  In general,
   protection actions are completed within tens of milliseconds, while
   automated restoration actions are normally completed within periods
   ranging from hundreds of milliseconds to a maximum of a few seconds.
   Restoration is not guaranteed (for example, because network resources
   may not be available at the time of the defect).
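
   The following Python sketch illustrates why restoration is not
   guaranteed and how a recovery priority can differentiate between
   services.  It is not an algorithm taken from [RFC4427]: the greedy
   ordering, the single pool of spare capacity, the convention that a
   lower number means a higher priority, and all names are assumptions.
   Preemption of services that are still working is not modeled.

```python
from typing import List, NamedTuple, Tuple


class FailedService(NamedTuple):
    name: str
    recovery_priority: int  # assumption: lower value is recovered first
    demand: int             # capacity needed, in arbitrary units


def restore_by_priority(
    failed: List[FailedService], spare_capacity: int
) -> Tuple[List[str], List[str]]:
    """Split failed services into (recovered, unrecovered) names."""
    recovered: List[str] = []
    unrecovered: List[str] = []
    for svc in sorted(failed, key=lambda s: s.recovery_priority):
        if svc.demand <= spare_capacity:
            spare_capacity -= svc.demand  # allocate at the time of need
            recovered.append(svc.name)
        else:
            unrecovered.append(svc.name)  # no resources available
    return recovered, unrecovered
```

   When the spare capacity is smaller than the total demand of the
   failed services, the lowest-priority services are the ones that
   remain unrecovered.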

1.2.  Recovery Action Initiation

   The recovery schemes described in [RFC4427] and evaluated in
   [RFC4428] are presented in the context of control-plane-driven
   actions (such as the configuration of the protection entities and
   functions).  The presence of a distributed control plane in an
   MPLS-TP network is optional, and its absence does not affect the
   ability to operate the network and to use MPLS-TP forwarding,
   Operations, Administration, and Maintenance
   (OAM), and survivability capabilities.  In particular, the concepts
   discussed in [RFC4427] and [RFC4428] refer to recovery actions
   effected in the data plane; they are equally applicable in MPLS-TP,
   with or without the use of a control plane.

   Thus, some of the MPLS-TP recovery mechanisms do not depend on a
   control plane and use MPLS-TP OAM mechanisms or management actions to
   trigger recovery actions.

   The principles of MPLS-TP protection-switching actions are similar to
   those described in [RFC4427], since the protection mechanism is based
   on the capability to detect certain defects in the transport entities
   within the recovery domain.  The protection-switching controller is
   independent of the initiation method used, provided that it is given
   information about the status of the transport entities within the
   recovery domain (e.g., OK, signal failure, or signal degradation).
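
   This independence can be sketched in Python as a controller whose
   decision depends only on the reported status of the transport
   entities.  The sketch is a deliberate simplification and not a
   protocol state machine: the class, method, and source names are
   hypothetical, degradation is treated like failure, and traffic
   reverts to the working entity as soon as it is OK again, with no
   wait-to-restore timer.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    OK = "ok"
    SIGNAL_DEGRADE = "signal degradation"
    SIGNAL_FAIL = "signal failure"


class ProtectionSwitchingController:
    """Selects the active entity from entity status alone."""

    def __init__(self) -> None:
        self.status: Dict[str, Status] = {
            "working": Status.OK,
            "protection": Status.OK,
        }
        self.active = "working"
        self.last_source = ""

    def report(self, entity: str, status: Status, source: str) -> str:
        """Record a status report and return the active entity.

        `source` names the initiation method (for example "oam",
        "management", or "control-plane").  It is stored for logging
        only and never influences the selection."""
        if entity not in self.status:
            raise ValueError(f"unknown entity: {entity}")
        self.status[entity] = status
        self.last_source = source
        if self.status["working"] is Status.OK:
            self.active = "working"
        elif self.status["protection"] is Status.OK:
            self.active = "protection"
        # Otherwise neither entity is OK: leave traffic where it is.
        return self.active
```

   Reporting the same signal failure through OAM, the management plane,
   or the control plane yields the same switchover.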

   In the context of MPLS-TP, it is imperative to ensure that performing
   switchovers is possible, regardless of the way in which the network
   is configured and managed (for example, regardless of whether a
   control-plane, management-plane, or OAM initiation mechanism is
   used).

   All MPLS and GMPLS protection mechanisms [RFC4428] are applicable in
   an MPLS-TP environment.  It is also possible to provision and manage
   the related protection entities and functions defined in MPLS and
   GMPLS using the management plane [RFC5654].  Regardless of whether an
   OAM, management, or control plane initiation mechanism is used, the
   protection-switching operation is a data-plane operation.

   In some recovery schemes (such as bidirectional protection
   switching), it is necessary to coordinate the protection state
   between the edges of the recovery domain to achieve initiation of
   recovery actions for both directions.  An MPLS-TP protocol may be
   used as an in-band (i.e., data-plane based) control protocol in order
   to coordinate the protection state between the edges of the
   protection domain.  When the MPLS-TP control plane is in use, a
   control-plane-based mechanism can also be used to coordinate the
   protection states between the edges of the protection domain.

1.3.  Recovery Context

   An MPLS-TP Label Switched Path (LSP) may be subject to any or all of
   MPLS-TP link recovery, path-segment recovery, and end-to-end
   recovery, where:

   o  MPLS-TP link recovery refers to the recovery of an individual link
      (and hence all or a subset of the LSPs routed over the link)
      between two MPLS-TP nodes.  For example, link recovery may be
      provided by server-layer recovery.

   o  Segment recovery refers to the recovery of an LSP segment (i.e.,
      segment and concatenated segment in the language of [RFC5654])
      between two nodes and is used to recover from the failure of one
      or more links or nodes.

   o  End-to-end recovery refers to the recovery of an entire LSP, from
      its ingress to its egress node.

   For additional resiliency, more than one of these recovery techniques
   may be configured concurrently for a single path.
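
   One way to represent the concurrent configuration of these
   techniques for a single path is a set of flags, as in the following
   Python sketch.  The type and function names are hypothetical and do
   not correspond to any MPLS-TP management model.

```python
from enum import Flag, auto
from typing import List


class RecoveryScope(Flag):
    LINK = auto()        # recovery of an individual link
    SEGMENT = auto()     # recovery of an LSP segment
    END_TO_END = auto()  # recovery of the entire LSP


def configured_scopes(configured: RecoveryScope) -> List[str]:
    """List the names of the recovery techniques enabled for a path."""
    return [scope.name for scope in RecoveryScope if scope in configured]
```

   For example, RecoveryScope.LINK | RecoveryScope.END_TO_END describes
   a path that relies on both link recovery and end-to-end recovery.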

   Co-routed bidirectional MPLS-TP LSPs are defined in a way that allows
   both directions of the LSP to follow the same route through the
   network.  In this scenario, the operator often requires the
   directions to fate-share (that is, if one direction fails, both
   directions should cease to operate).

   Associated bidirectional MPLS-TP LSPs exist where the two directions
   of a bidirectional LSP follow different paths through the network.
   An operator may also request fate-sharing for associated
   bidirectional LSPs.

   The requirement for fate-sharing causes a direct interaction between
   the recovery processes affecting the two directions of an LSP, so
   that both directions of the bidirectional LSP are recovered at the
   same time.  This mode of recovery is termed bidirectional recovery
   and may be seen as a consequence of fate-sharing.

   The recovery scheme operating at the data-plane level can function in
   a multi-domain environment (in the wider sense of a "domain"
   [RFC4726]).  It can also protect against a failure of a boundary node
   in the case of inter-domain operation.  MPLS-TP recovery schemes are
   intended to protect client services when they are sent across the
   MPLS-TP network.

1.4.  Scope of This Framework

   This framework introduces the architecture of the MPLS-TP recovery
   domain and describes the recovery schemes in MPLS-TP (based on the
   recovery types defined in [RFC4427]) as well as the principles of
   operation, recovery states, recovery triggers, and information
   exchanges between the different elements that support the reference
   model.

   The framework also describes the qualitative grades of the
   survivability functions that can be provided, such as dedicated
   recovery, shared protection, restoration, etc.  In the event of a
   network failure, the grade of recovery directly affects the service
   grade provided to the end-user.

   The general description of the functional architecture is applicable
   to both LSPs and pseudowires (PWs); however, PW recovery is only
   introduced in Section 7, and the relevant details are beyond the
   scope of this document and are for further study.

   This framework applies to general recovery schemes as well as to
   mechanisms that are optimized for specific topologies and are
   tailored to efficiently handle protection switching.

   This document addresses the need for the coordination of protection
   switching across multiple layers and at sub-layers (for clarity, we
   use the term "layer" to refer equally to layers and sub-layers).
   This allows an operator to prevent race conditions and allows the
   protection-switching mechanism of one layer to recover from a failure
   before switching is invoked at another layer.
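
   A common way to achieve this ordering is to delay the reaction of
   the client layer by a hold-off interval.  The Python sketch below
   assumes such a timer; the function name, the parameter names, and the
   use of milliseconds are illustrative assumptions rather than anything
   specified in this section.

```python
from typing import Optional


def client_layer_switches(
    server_recovery_ms: Optional[float], hold_off_ms: float
) -> bool:
    """Decide whether the client layer invokes its own switching.

    server_recovery_ms is the time the server layer needs to recover
    from the failure, or None if it does not recover at all."""
    if hold_off_ms < 0:
        raise ValueError("hold_off_ms must not be negative")
    if server_recovery_ms is None:
        return True  # the server layer never recovers
    # The client layer acts only if the fault outlasts the hold-off.
    return server_recovery_ms > hold_off_ms
```

   With a hold-off of 100 ms, a server-layer recovery that completes in
   50 ms suppresses client-layer switching, so that only one layer
   reacts to the failure.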

   This framework also specifies the functions that must be supported by
   MPLS-TP to provide the recovery mechanisms.  MPLS-TP introduces a
   tool kit to enable recovery in MPLS-TP-based networks and to ensure
   that affected services are recovered in the event of a failure.

   Generally, network operators aim to provide the fastest, most stable,
   and best protection mechanism at a reasonable cost in accordance with
   customer requirements.  The greater the grade of protection required,
   the more resources are consumed.  It is therefore expected that
   network operators will offer a wide spectrum of service grades.
   MPLS-TP-based recovery offers the flexibility to
   select a recovery mechanism, define the granularity at which traffic
   delivery is to be protected, and choose the specific traffic types
   that are to be protected.  With MPLS-TP-based recovery, it should be
   possible to provide different grades of protection for different
   traffic classes within the same path based on the service
   requirements.

2.  Terminology and References

   The terminology used in this document is consistent with that defined
   in [RFC4427].  The latter is consistent with [G.808.1].

   However, certain protection concepts (such as ring protection) are
   not discussed in [RFC4427]; for those concepts, the terminology used
   in this document is drawn from [G.841].

   Readers should refer to those documents for normative definitions.

   This document supplies brief summaries of a number of terms for
   reasons of clarity and to assist the reader, but it does not redefine
   terms.

   Note, in particular, the distinction and definitions made in
   [RFC4427] for the following three terms:

   o  Protection: re-establishing end-to-end traffic delivery using pre-
      allocated resources.

   o  Restoration: re-establishing end-to-end traffic delivery using
      resources allocated at the time of need; sometimes referred to as
      "repair" of a service, LSP, or the traffic.

   o  Recovery: a generic term covering both Protection and Restoration.

   Note that the term "survivability" is used in [RFC5654] to cover the
   functional elements of "protection" and "restoration", which are
   collectively known as "recovery".

   Important background information on survivability can be found in
   [RFC3386], [RFC3469], [RFC4426], [RFC4427], and [RFC4428].

   In this document, the following additional terminology is applied:

   o  "Fault Management", as defined in [RFC5950].

   o  The terms "defect" and "failure" are used interchangeably to
      indicate any defect or failure in the sense that they are defined
      in [G.806].  The terms also include any signal degradation event
      as defined in [G.806].

   o  A "fault" is a fault or fault cause as defined in [G.806].

   o  "Trigger" indicates any event that may initiate a recovery action.
      See Section 4.1 for a more detailed discussion of triggers.

   o  The acronym "OAM" is defined as Operations, Administration, and
      Maintenance, consistent with [RFC6291].

   o  A "Transport Entity" is a node, link, transport path segment,
      concatenated transport path segment, or entire transport path.

   o  A "Working Entity" is a transport entity that carries traffic
      during normal network operation.

   o  A "Protection Entity" is a transport entity that is pre-allocated
      and used to protect and transport traffic when the working entity
      fails.

   o  A "Recovery Entity" is a transport entity that is used to recover
      and transport traffic when the working entity fails.

   o  "Survivability Actions" are the steps that may be taken by network
      nodes to communicate faults and to switch traffic from faulted or
      degraded paths to other paths.  This may include sending messages
      and establishing new paths.

   General terminology for MPLS-TP is found in [RFC5921] and [ROSETTA].
   Background information on MPLS-TP requirements can be found in
   [RFC5654].

3.  Requirements for Survivability

   MPLS-TP requirements are presented in [RFC5654] and serve as
   normative references for the definition of all MPLS-TP functionality,
   including survivability.  Survivability is presented in [RFC5654] as
   playing a critical role in the delivery of reliable services, and the
   requirements for survivability are set out using the recovery
   terminology defined in [RFC4427].
