Section 5.1 of
RFC 6513 describes procedures used by an MVPN downstream PE to determine the Upstream Multicast Hop (UMH) for a given (C-S,C-G).
For a given downstream PE and a given VRF, the P-tunnel corresponding to a given Upstream PE for a given (C-S,C-G) state is the S-PMSI tunnel advertised by that Upstream PE for that (C-S,C-G) and imported into that VRF or, if there isn't any such S-PMSI, the I-PMSI tunnel advertised by that PE and imported into that VRF.
The procedure described here is optional one, based on a downstream PE taking into account the status of P-tunnels rooted at each possible Upstream PE, for including or not including each given PE in the list of candidate UMHs for a given (C-S,C-G) state. If it is not possible to determine whether a P-tunnel's current status is Up, the state shall be considered "not known to be Down", and it may be treated as if it is Up so that attempts to use the tunnel are acceptable. The result is that, if a P-tunnel is Down (see
Section 3.1), the PE that is the root of the P-tunnel will not be considered for UMH selection. This will result in the downstream PE failing over to use the next Upstream PE in the list of candidates. Some downstream PEs could arrive at a different conclusion regarding the tunnel's state because the failure impacts only a subset of branches. Because of that, the procedures of
Section 9.1.1 of
RFC 6513 are applicable when using I-PMSI P-tunnels. That document is a foundation for this document, and its processes all apply here.
There are three options specified in
Section 5.1 of
RFC 6513 for a downstream PE to select an Upstream PE.
-
The first two options select the Upstream PE from a candidate PE set based either on an IP address or a hashing algorithm. When used together with the optional procedure of considering the P-tunnel status as in this document, a candidate Upstream PE is included in the set if it either:
-
advertises an x-PMSI bound to a tunnel, where the specified tunnel's state is not known to be Down, or,
-
does not advertise any x-PMSI applicable to the given (C-S,C-G) but has associated a VRF Route Import BGP Extended Community to the unicast VPN route for S. That is necessary to avoid incorrectly invalidating a UMH PE that would use a policy where no I-PMSI is advertised for a given VRF and where only S-PMSIs are used. The S-PMSI can be advertised only after the Upstream PE receives a C-multicast route for (C-S,C-G) / (C-*,C-G) to be carried over the advertised S-PMSI.
If the resulting candidate set is empty, then the procedure is repeated without considering the P-tunnel status.
-
The third option uses the installed UMH Route (i.e., the "best" route towards the C-root) as the Selected UMH Route, and its originating PE is the selected Upstream PE. With the optional procedure of considering P-tunnel status as in this document, the Selected UMH Route is the best one among those whose originating PE's P-tunnel is not "down". If that does not exist, the installed UMH Route is selected regardless of the P-tunnel status.
Different factors can be considered to determine the "status" of a P-tunnel and are described in the following subsections. The optional procedures described in this section also handle the case when the downstream PEs do not all apply the same rules to define what the status of a P-tunnel is (please see
Section 6), and some of them will produce a result that may be different for different downstream PEs. Thus, the "status" of a P-tunnel in this section is not a characteristic of the tunnel in itself but is the tunnel status, as seen from a particular downstream PE. Additionally, some of the following methods determine the ability of a downstream PE to receive traffic on the P-tunnel and not specifically on the status of the P-tunnel itself. That could be referred to as "P-tunnel reception status", but for simplicity, we will use the terminology of P-tunnel "status" for all of these methods.
Depending on the criteria used to determine the status of a P-tunnel, there may be an interaction with another resiliency mechanism used for the P-tunnel itself, and the UMH update may happen immediately or may need to be delayed. Each particular case is covered in each separate subsection below.
An implementation may support any combination of the methods described in this section and provide a network operator with control to choose which one to use in the particular deployment.
When determining if the status of a P-tunnel is Up, a condition to consider is whether the root of the tunnel, as specified in the x-PMSI Tunnel attribute, is reachable through unicast routing tables. In this case, the downstream PE can immediately update its UMH when the reachability condition changes.
That is similar to BGP next-hop tracking for VPN routes, except that the address considered is not the BGP next-hop address but the root address in the x-PMSI Tunnel attribute. BGP next-hop tracking monitors BGP next-hop address changes in the routing table. In general, when a change is detected, it performs a next-hop scan to find if any of the next hops in the BGP table is affected and updates it accordingly.
If BGP next-hop tracking is done for VPN routes and the root address of a given tunnel happens to be the same as the next-hop address in the BGP A-D Route advertising the tunnel, then checking, in unicast routing tables, whether the tunnel root is reachable will be unnecessary duplication and will thus not bring any specific benefit.
When determining if the status of a P-tunnel is Up, a condition to consider is whether the last-hop link of the P-tunnel is Up. Conversely, if the last-hop link of the P-tunnel is Down, then this can be taken as an indication that the P-tunnel is Down.
Using this method when a fast restoration mechanism (such as MPLS Fast Reroute (FRR) [
RFC 4090]) is in place for the link requires careful consideration and coordination of defect detection intervals for the link and the tunnel. When using multi-layer protection, particular consideration must be given to the interaction of defect detections at different network layers. It is recommended to use longer detection intervals at the higher layers. Some recommendations suggest using a multiplier of 3 or larger, e.g., 10 msec detection for the link failure detection and at least 100 msec for the tunnel failure detection. In many cases, it is not practical to use both protection methods simultaneously because uncorrelated timers might cause unnecessary switchovers and destabilize the network.
For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is considered Up if the sub-LSP to this downstream PE is in the Up state. The determination of whether a P2MP RSVP-TE Label Switched Path (LSP) is in the Up state requires Path and Resv state for the LSP and is based on procedures specified in [
RFC 4875]. As a result, the downstream PE can immediately update its UMH when the reachability condition changes.
When using this method and if the signaling state for a P2MP TE LSP is removed (e.g., if the ingress of the P2MP TE LSP sends a PathTear message) or the P2MP TE LSP changes state from Up to Down as determined by procedures in [
RFC 4875], the status of the corresponding P-tunnel
MUST be re-evaluated. If the P-tunnel transitions from Up to Down state, the Upstream PE that is the ingress of the P-tunnel
MUST NOT be considered to be a valid candidate UMH.
An Upstream PE
MUST be removed from the UMH candidate list for a given (C-S,C-G) if the P-tunnel (I-PMSI or S-PMSI) for this (S,G) is leaf triggered (PIM, mLDP), but for some reason, internal to the protocol, the upstream one-hop branch of the tunnel from P to PE cannot be built. As a result, the downstream PE can immediately update its UMH when the reachability condition changes.
In cases where the downstream node can be configured so that the maximum inter-packet time is known for all the multicast flows mapped on a P-tunnel, the local traffic counter information per (C-S,C-G) for traffic received on this P-tunnel can be used to determine the status of the P-tunnel.
When such a procedure is used, in the context where fast restoration mechanisms are used for the P-tunnels, a configurable timer
MUST be set on the downstream PE to wait before updating the UMH to let the P-tunnel restoration mechanism execute its actions. Determining that a tunnel is probably down by waiting for enough packets to fail to arrive as expected is a heuristic and operational matter that depends on the maximum inter-packet time. A timeout of three seconds is a generally suitable default waiting period to ascertain that the tunnel is down, though other values would be needed for atypical conditions.
In cases where this mechanism is used in conjunction with the method described in
Section 5, no prior knowledge of the rate or maximum inter-packet time on the multicast streams is required; downstream PEs can periodically compare actual packet reception statistics on the two P-tunnels to determine when one of them is down. The detailed specification of this mechanism is outside the scope of this document.
The P-tunnel status may be derived from the status of a multipoint BFD session [
RFC 8562] whose discriminator is advertised along with an x-PMSI A-D Route. A P2MP BFD session can be instantiated using a mechanism other than the BFD Discriminator attribute, e.g., MPLS LSP Ping ([
MPLS-P2MP-BFD]). The description of these methods is outside the scope of this document.
This document defines the format and ways of using a new BGP attribute called the "BFD Discriminator" (38). It is an optional transitive BGP attribute. Thus, it is expected that an implementation that does not recognize or is configured not to support this attribute, as if the attribute was unrecognized, follows procedures defined for optional transitive path attributes in
Section 5 of
RFC 4271. See
Section 7.2 for more information. The format of this attribute is shown in
Figure 1.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
| BFD Mode |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| BFD Discriminator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Optional TLVs ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Where:
-
-
BFD Mode field is 1 octet long. This specification defines P2MP BFD Session as value 1 (Section 7.2).
-
-
BFD Discriminator field is 4 octets long.
-
-
Optional TLVs is the optional variable-length field that MAY be used in the BFD Discriminator attribute for future extensions. TLVs MAY be included in a sequential or nested manner. To allow for TLV nesting, it is advised to define a new TLV as a variable-length object. Figure 2 presents the Optional TLV format TLV that consists of:
-
Type:
-
a 1-octet-long field that characterizes the interpretation of the Value field (Section 7.3)
-
Length:
-
a 1-octet-long field equal to the length of the Value field in octets
-
Value:
-
a variable-length field
All multibyte fields in TLVs defined in this specification are in network byte order.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Value ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
An optional Source IP Address TLV is defined in this document. The Source IP Address TLV
MUST be used when the value of the BFD Mode field's value is P2MP BFD Session. The BFD Discriminator attribute that does not include the Source IP Address TLV
MUST be handled according to the "attribute discard" approach, as defined in [
RFC 7606]. For the Source IP Address TLV, fields are set as follows:
-
The Type field is set to 1 (Section 7.3).
-
The Length field is 4 for the IPv4 address family and 16 for the IPv6 address family.The TLV is considered malformed if the field is set to any other value.
-
The Value field contains the address associated with the MultipointHead of the P2MP BFD session.
The BFD Discriminator attribute
MUST be considered malformed if its length is smaller than 11 octets or if Optional TLVs are present but not well formed. If the attribute is deemed to be malformed, the UPDATE message
SHALL be handled using the approach of Attribute Discard per [
RFC 7606].
To enable downstream PEs to track the P-tunnel status using a point-to-multipoint (P2MP) BFD session, the Upstream PE:
-
MUST initiate the BFD session and set bfd.SessionType = MultipointHead as described in [RFC 8562];
-
when transmitting BFD Control packets MUST set the IP destination address of the inner IP header to the internal loopback address 127.0.0.1/32 for IPv4 [RFC 1122]. For IPv6, it MUST use the loopback address ::1/128 [RFC 4291];
-
MUST use the IP address included in the Source IP Address TLV of the BFD Discriminator attribute as the source IP address when transmitting BFD Control packets;
-
MUST include the BFD Discriminator attribute in the x-PMSI A-D Route with the value set to the My Discriminator value;
-
MUST periodically transmit BFD Control packets over the x-PMSI P-tunnel after the P-tunnel is considered established. Note that the methods to declare that a P-tunnel has been established are outside the scope of this specification.
If the tracking of the P-tunnel by using a P2MP BFD session is enabled after the x-PMSI A-D Route has been already advertised, the x-PMSI A-D Route
MUST be resent with the only change between the previous advertisement and the new advertisement to be the inclusion of the BFD Discriminator attribute.
If the x-PMSI A-D Route is advertised with P-tunnel status tracked using the P2MP BFD session, and it is desired to stop tracking P-tunnel status using BFD, then:
-
the x-PMSI A-D Route MUST be resent with the only change between the previous advertisement and the new advertisement be the exclusion of the BFD Discriminator attribute;
-
the P2MP BFD session MUST be deleted. The session MAY be deleted after some configurable delay, which should have a reasonable default.
Upon receiving the BFD Discriminator attribute in the x-PMSI A-D Route, the downstream PE:
-
MUST associate the received BFD Discriminator value with the P-tunnel originating from the Upstream PE and the IP address of the Upstream PE;
-
MUST create a P2MP BFD session and set bfd.SessionType = MultipointTail as described in [RFC 8562];
-
to properly demultiplex BFD session, MUST use:
-
the IP address in the Source IP Address TLV included the BFD Discriminator attribute in the x-PMSI A-D Route;
-
the value of the BFD Discriminator field in the BFD Discriminator attribute;
-
the x-PMSI Tunnel Identifier [RFC 6514] the BFD Control packet was received on.
After the state of the P2MP BFD session is up, i.e., bfd.SessionState == Up, the session state will then be used to track the health of the P-tunnel.
According to [
RFC 8562], if the downstream PE receives Down or AdminDown in the State field of the BFD Control packet, or if the Detection Timer associated with the BFD session expires, the BFD session is down, i.e., bfd.SessionState == Down. When the BFD session state is Down, then the P-tunnel associated with the BFD session
MUST be considered down. If the site that contains C-S is connected to two or more PEs, a downstream PE will select one as its Primary Upstream PE, while others are considered to be Standby Upstream PEs. In such a scenario, when the P-tunnel is considered down, the downstream PE
MAY initiate a switchover of the traffic from the Primary Upstream PE to the Standby Upstream PE only if the Standby Upstream PE is deemed to be in the Up state. That
MAY be determined from the state of a P2MP BFD session with the Standby Upstream PE as the MultipointHead.
If the downstream PE's P-tunnel is already established when the downstream PE receives the new x-PMSI A-D Route with the BFD Discriminator attribute, the downstream PE
MUST associate the value of the BFD Discriminator field with the P-tunnel and follow procedures listed above in this section if and only if the x-PMSI A-D Route was properly processed as per [
RFC 6514], and the BFD Discriminator attribute was validated.
If the downstream PE's P-tunnel is already established, its state being monitored by the P2MP BFD session set up using the BFD Discriminator attribute, and both the downstream PE receives the new x-PMSI A-D Route without the BFD Discriminator attribute and the x-PMSI A-D Route was processed without any error as per the relevant specifications, then:
-
The downstream PE MUST stop processing BFD Control packets for this P2MP BFD session;
-
The P2MP BFD session associated with the P-tunnel MUST be deleted. The session MAY be deleted after some configurable delay, which should have a reasonable default.
-
The downstream PE MUST NOT switch the traffic to the Standby Upstream PE.
The following approach is defined in response to the detection by the Upstream PE of a PE-CE link failure. Even though the provider tunnel is still up, it is desired for the downstream PEs to switch to a backup Upstream PE. To achieve that, if the Upstream PE detects that its PE-CE link fails, it
MUST set the bfd.LocalDiag of the P2MP BFD session to Concatenated Path Down or Reverse Concatenated Path Down (per
Section 6.8.17 of
RFC 5880) unless it switches to a new PE-CE link within the time of bfd.DesiredMinTxInterval for the P2MP BFD session (in that case, the Upstream PE will start tracking the status of the new PE-CE link). When a downstream PE receives that bfd.LocalDiag code, it treats it as if the tunnel itself failed and tries to switch to a backup PE.
Several methods to monitor the status of a P-tunnel are described in
Section 3.1.
Tracking the root of an MVPN (
Section 3.1.1) reveals the status of a P-tunnel based on the control plane information. Because, in general, the MPLS data plane is not fate sharing with the control plane, this method might produce false-positive or false-negative alarms, for example, resulting in tunnels that are considered Up but are not able to reach the root, or ones that are declared down prematurely. On the other hand, because BGP next-hop tracking is broadly supported and deployed, this method might be the easiest to deploy.
The method described in
Section 3.1.2 monitors the state of the data plane but only for an egress P-PE link of a P-tunnel. As a result, network failures that affect upstream links might not be detected using this method and the MVPN convergence would be determined by the convergence of the BGP control plane.
Using the state change of a P2MP RSVP-TE LSP as the trigger to re-evaluate the status of the P-tunnel (
Section 3.1.3) relies on the mechanism used to monitor the state of the P2MP LSP.
The method described in
Section 3.1.4 is simple and is safe from causing false alarms, e.g., considering a tunnel operationally Up even though its data path has a defect or, conversely, declaring a tunnel failed when it is unaffected. But the method applies to a subset of MVPNs, those that use the leaf-triggered x-PMSI tunnels.
Though some MVPNs might be used to provide a multicast service with predictable inter-packet intervals (
Section 3.1.5), the number of such cases seem limited.
Monitoring the status of a P-tunnel using a P2MP BFD session (
Section 3.1.6) may produce the most accurate and expedient failure notification of all monitoring methods discussed. On the other hand, it requires careful consideration of the additional load of BFD sessions onto network and PE nodes. Operators should consider the rate of BFD Control packets transmitted by root PEs combined with the number of such PEs in the network. In addition, the number of P2MP BFD sessions per PE determines the amount of state information that a PE maintains.