
RFC 6513

Multicast in MPLS/BGP IP VPNs

Pages: 88
Proposed Standard
Updated by: 7582, 7900, 7988


6. PMSI Instantiation

This section provides the procedures for using P-tunnels to instantiate a PMSI. It describes the procedures for setting up and maintaining the P-tunnels as well as for sending and receiving C-data and/or C-control messages on the P-tunnels. However, procedures for binding particular C-flows to particular P-tunnels are discussed in Section 7.
   PMSIs can be instantiated either by P-multicast trees or by PE-PE
   unicast tunnels.  In the latter case, the PMSI is said to be
   instantiated by "ingress replication".

   This specification supports a number of different methods for setting
   up P-multicast trees: these are detailed below.  A P-tunnel may
   support a single VPN (a non-aggregated P-multicast tree) or multiple
   VPNs (an aggregated P-multicast tree).

6.1. Use of the Intra-AS I-PMSI A-D Route

6.1.1. Sending Intra-AS I-PMSI A-D Routes

When a PE is provisioned to have one or more VRFs that provide MVPN support, the PE announces its MVPN membership information using Intra-AS I-PMSI A-D routes, as discussed in Section 4 and detailed in Section 9.1.1 of [MVPN-BGP]. (Under certain conditions, detailed in [MVPN-BGP], the Intra-AS I-PMSI A-D route may be omitted.)

Generally, the Intra-AS I-PMSI A-D route will have a PMSI Tunnel attribute that identifies a P-tunnel that is being used to instantiate the I-PMSI. Section 9.1.1 of [MVPN-BGP] details certain conditions under which the PMSI Tunnel attribute may be omitted (or in which a PMSI Tunnel attribute with the "no tunnel information present" bit may be sent).

As a special case, when (a) C-PIM control messages are to be sent through an MI-PMSI and (b) the MI-PMSI is instantiated by a P-tunnel technique for which each PE needs to know only a single P-tunnel identifier per VPN, then the use of the Intra-AS I-PMSI A-D routes MAY be omitted, and static configuration of the tunnel identifier used instead. However, this is not recommended for long-term use, and in all other cases, the Intra-AS I-PMSI A-D routes MUST be used.

The PMSI Tunnel attribute MAY contain an upstream-assigned MPLS label, assigned by the PE originating the Intra-AS I-PMSI A-D route. If this label is present, the P-tunnel can be carrying data from several MVPNs. The label is used on the data packets traveling through the tunnel to identify the MVPN to which those data packets belong. (The specified label identifies the packet as belonging to the MVPN that is identified by the RTs of the Intra-AS I-PMSI A-D route.) See Section 12.2 for details on how to place the label in the packet's label stack.
   The Intra-AS I-PMSI A-D route may contain a "PE Distinguisher Labels"
   attribute.  This contains a set of bindings between upstream-assigned
   labels and PE addresses.  The PE that originated the route may use
   this to bind an upstream-assigned label to one or more of the other
   PEs that belong to the same MVPN.  The way in which PE Distinguisher
   Labels are used is discussed in Sections 6.4.1, 6.4.3, 11.2.2, and
   12.3.  Other uses of the PE Distinguisher Labels attribute are
   outside the scope of this document.

6.1.2. Receiving Intra-AS I-PMSI A-D Routes

The action to be taken when a PE receives an Intra-AS I-PMSI A-D route for a particular MVPN depends on the particular P-tunnel technology that is being used by that MVPN. If the P-tunnel technology requires tunnels to be built by means of receiver-initiated joins, the PE SHOULD join the tunnel immediately.

6.2. When C-flows Are Specifically Bound to P-Tunnels

This situation is discussed in Section 7.

6.3. Aggregating Multiple MVPNs on a Single P-Tunnel

When a P-multicast tree is shared across multiple MVPNs, it is termed an "Aggregate Tree". The procedures described in this document allow a single SP multicast tree to be shared across multiple MVPNs. Unless otherwise specified, P-multicast tree technology supports aggregation. All procedures that are specific to multi-MVPN aggregation are OPTIONAL and are explicitly pointed out.

Aggregate Trees allow a single P-multicast tree to be used across multiple MVPNs so that state in the SP core grows per set of MVPNs and not per MVPN. Depending on the congruence of the aggregated MVPNs, this may result in trading off optimality of multicast routing.

An Aggregate Tree can be used by a PE to provide a UI-PMSI or MI-PMSI service for more than one MVPN. When this is the case, the Aggregate Tree is said to have an inclusive mapping.

6.3.1. Aggregate Tree Leaf Discovery

BGP MVPN membership discovery (Section 4) allows a PE to determine the different Aggregate Trees that it should create and the MVPNs that should be mapped onto each such tree. The leaves of an Aggregate Tree are determined by the PEs, supporting aggregation, that belong to all the MVPNs that are mapped onto the tree.

If an Aggregate Tree is used to instantiate one or more S-PMSIs, then it may be desirable for the PE at the root of the tree to know which PEs (in its MVPN) are receivers on that tree. This enables the PE to decide when to aggregate two S-PMSIs, based on congruence (as discussed in the next section). Thus, explicit tracking may be required. Since the procedures for disseminating C-multicast routes do not provide explicit tracking, a type of A-D route known as a "Leaf A-D route" is used. The PE that wants to assign a particular C-multicast flow to a particular Aggregate Tree can send an A-D route, which elicits Leaf A-D routes from the PEs that need to receive that C-multicast flow. This provides the explicit tracking information needed to support the aggregation methodology discussed in the next section. For more details on Leaf A-D routes, please refer to [MVPN-BGP].

6.3.2. Aggregation Methodology

This document does not specify the mandatory implementation of any particular set of rules for determining whether or not the PMSIs of two particular MVPNs are to be instantiated by the same Aggregate Tree. This determination can be made by implementation-specific heuristics, by configuration, or even perhaps by the use of offline tools.

It is the intention of this document that the control procedures will always result in all the PEs of an MVPN agreeing on the PMSIs that are to be used and on the tunnels used to instantiate those PMSIs.

This section discusses potential methodologies with respect to aggregation. The "congruence" of aggregation is defined by the amount of overlap in the leaves of the customer trees that are aggregated on an SP tree. For Aggregate Trees with an inclusive mapping, the congruence depends on the overlap in the membership of the MVPNs that are aggregated on the tree. If there is complete overlap, i.e., all MVPNs have exactly the same sites, aggregation is perfectly congruent. As the overlap between the MVPNs that are aggregated reduces, i.e., the number of sites that are common across all the MVPNs reduces, the congruence reduces.
   If aggregation is done such that it is not perfectly congruent, a PE
   may receive traffic for MVPNs to which it doesn't belong.  As the
   amount of multicast traffic in these unwanted MVPNs increases,
   aggregation becomes less optimal with respect to delivered traffic.
   Hence, there is a trade-off between reducing state and delivering
   unwanted traffic.

   An implementation should provide knobs to control the congruence of
   aggregation.  These knobs are implementation dependent.  Configuring
   the percentage of sites that MVPNs must have in common to be
   aggregated is an example of such a knob.  This will allow an SP to
   deploy aggregation depending on the MVPN membership and traffic
   profiles in its network.  If different PEs or servers are setting up
   Aggregate Trees, this will also allow a service provider to engineer
   the maximum amount of unwanted MVPNs for which a particular PE may
   receive traffic.
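
   As a non-normative illustration of such a knob, the following Python
   sketch computes the fraction of sites that two MVPNs have in common
   and compares it with a configured minimum.  The function names, the
   use of PE names as site identifiers, and the choice to measure
   overlap against the union of the two memberships are assumptions
   made for this example; the actual rule is left to the implementation.

```python
from typing import Set


def common_site_fraction(sites_a: Set[str], sites_b: Set[str]) -> float:
    """Fraction of sites shared by two MVPNs, relative to all sites of either.

    Assumption: overlap is measured against the union of both memberships.
    """
    union = sites_a | sites_b
    if not union:
        return 1.0  # two empty memberships are trivially congruent
    return len(sites_a & sites_b) / len(union)


def may_aggregate(sites_a: Set[str], sites_b: Set[str], min_fraction: float) -> bool:
    """Hypothetical knob: aggregate only if enough sites are in common."""
    return common_site_fraction(sites_a, sites_b) >= min_fraction


def unwanted_receivers(sites_a: Set[str], sites_b: Set[str]) -> Set[str]:
    """PEs that would receive traffic for an MVPN to which they do not belong."""
    return sites_a ^ sites_b


if __name__ == "__main__":
    red = {"PE1", "PE2", "PE3"}
    blue = {"PE2", "PE3", "PE4"}
    print(common_site_fraction(red, blue))
    print(may_aggregate(red, blue, 0.75))
    print(sorted(unwanted_receivers(red, blue)))
```

   With perfectly congruent MVPNs, the set returned by the hypothetical
   unwanted_receivers() is empty, which corresponds to the case where no
   PE receives traffic for an MVPN to which it does not belong.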

6.3.3. Demultiplexing C-Multicast Traffic

If a P-multicast tree is associated with only one MVPN, determining the P-multicast tree on which a packet was received is sufficient to determine the packet's MVPN. All that the egress PE needs to know is the MVPN with which the P-multicast tree is associated.

When multiple MVPNs are aggregated onto one P-multicast tree, determining the tree over which the packet is received is not sufficient to determine the MVPN to which the packet belongs. The packet must also carry some demultiplexing information to allow the egress PEs to determine the MVPN to which the packet belongs. Since the packet has been multicast through the P-network, any given demultiplexing value must have the same meaning to all the egress PEs. The demultiplexing value is an MPLS label that corresponds to the multicast VRF to which the packet belongs. This label is placed by the ingress PE immediately beneath the P-multicast tree header. Each of the egress PEs must be able to associate this MPLS label with the same MVPN. If downstream-assigned labels were used, this would require all the egress PEs in the MVPN to agree on a common label for the MVPN. Instead, the MPLS label is upstream-assigned [MPLS-UPSTREAM-LABEL]. The label bindings are advertised via BGP Updates originated by the ingress PEs.

This procedure requires each egress PE to support a separate label space for every other PE. The egress PEs create a forwarding entry for the upstream-assigned MPLS label, allocated by the ingress PE, in this label space. Hence, when the egress PE receives a packet over an Aggregate Tree, it first determines the tree over which the packet was received. The tree identifier determines the label space in which the upstream-assigned MPLS label lookup has to be performed.
   The same label space may be used for all P-multicast trees rooted at
   the same ingress PE or an implementation may decide to use a separate
   label space for every P-multicast tree.

   A full specification of the procedures to support aggregation on
   shared trees or on MP2MP LSPs is outside the scope of this document.

   The encapsulation format is either MPLS or MPLS-in-something (e.g.,
   MPLS-in-GRE [MPLS-IP]).  When MPLS is used, this label will appear
   immediately below the label that identifies the P-multicast tree.
   When MPLS-in-GRE is used, this label will be the top MPLS label that
   appears when the GRE header is stripped off.

   When IP encapsulation is used for the P-multicast tree, whatever
   information that particular encapsulation format uses for identifying
   a particular tunnel is used to determine the label space in which the
   MPLS label is looked up.

   If the P-multicast tree uses MPLS encapsulation, the P-multicast tree
   is itself identified by an MPLS label.  The egress PE MUST NOT
   advertise IMPLICIT NULL or EXPLICIT NULL for that tree.  Once the
   label representing the tree is popped off the MPLS label stack, the
   next label is the demultiplexing information that allows the proper
   MVPN to be determined.

   This specification requires that, to support this sort of
   aggregation, there be at least one upstream-assigned label per MVPN.
   It does not require that there be only one.  For example, an ingress
   PE could assign a unique label to each (C-S,C-G).  (This could be
   done using the same technique that is used to assign a particular
   (C-S,C-G) to an S-PMSI, see Section 7.4.)

   When an egress PE receives a C-multicast data packet over a
   P-multicast tree, it needs to forward the packet to the CEs that have
   receivers in the packet's C-multicast group.  In order to do this,
   the egress PE needs to determine the P-tunnel on which the packet was
   received.  The PE can then determine the MVPN that the packet belongs
   to and, if needed, do any further lookups that are needed to forward
   the packet.
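
   The two-step lookup described in this section (tree first, then
   upstream-assigned label within the label space selected by the tree)
   can be sketched as follows.  This is a non-normative illustration:
   the class name, the method names, and the use of the ingress PE's
   name as the label space identifier are assumptions made for the
   example, not structures defined by this document.

```python
from typing import Dict, Optional, Tuple


class EgressDemux:
    """Egress PE state for demultiplexing packets received on Aggregate Trees."""

    def __init__(self) -> None:
        # Tree identifier -> label space.  Assumption: one label space per
        # ingress PE; an implementation may instead use one per tree.
        self.tree_to_space: Dict[str, str] = {}
        # (label space, upstream-assigned label) -> MVPN
        self.label_to_mvpn: Dict[Tuple[str, int], str] = {}

    def bind_tree(self, tree_id: str, label_space: str) -> None:
        self.tree_to_space[tree_id] = label_space

    def bind_label(self, label_space: str, label: int, mvpn: str) -> None:
        self.label_to_mvpn[(label_space, label)] = mvpn

    def classify(self, tree_id: str, inner_label: int) -> Optional[str]:
        """Return the MVPN of a packet, or None if no binding is known."""
        space = self.tree_to_space.get(tree_id)
        if space is None:
            return None
        return self.label_to_mvpn.get((space, inner_label))


if __name__ == "__main__":
    demux = EgressDemux()
    demux.bind_tree("tree-A", "PE1")
    demux.bind_tree("tree-B", "PE2")
    # Two ingress PEs may pick the same label value for different MVPNs.
    demux.bind_label("PE1", 100, "red")
    demux.bind_label("PE2", 100, "blue")
    print(demux.classify("tree-A", 100))
    print(demux.classify("tree-B", 100))
```

   Because the label is looked up in a space chosen by the tree, two
   ingress PEs can assign the same label value independently without
   ambiguity at the egress PE.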

6.4. Considerations for Specific Tunnel Technologies

While it is believed that the architecture specified in this document places no limitations on the protocols used for setting up and maintaining P-tunnels, the only protocols that have been explicitly considered are PIM-SM (both the SSM and ASM service models are
   considered, as are bidirectional trees), RSVP-TE, mLDP, and BGP.
   (BGP's role in the setup and maintenance of P-tunnels is to "stitch"
   together the intra-AS segments of a segmented inter-AS P-tunnel.)

6.4.1. RSVP-TE P2MP LSPs

If an I-PMSI is to be instantiated as one or more non-segmented P-tunnels, where the P-tunnels are RSVP-TE P2MP LSPs, then only the PEs that are at the head ends of those LSPs will ever include the PMSI Tunnel attribute in their Intra-AS I-PMSI A-D routes. (These will be the PEs in the "Sender Sites set".)

If an I-PMSI is to be instantiated as one or more segmented P-tunnels, where some of the intra-AS segments of these tunnels are RSVP-TE P2MP LSPs, then only a PE or ASBR that is at the head end of one of these LSPs will ever include the PMSI Tunnel attribute in its Inter-AS I-PMSI A-D route.

Other PEs send Intra-AS I-PMSI A-D routes without PMSI Tunnel attributes. (These will be the PEs that are in the "Receiver Sites set" but not in the "Sender Sites set".) As each "Sender Site" PE receives an Intra-AS I-PMSI A-D route from a PE in the Receiver Sites set, it adds the PE originating that Intra-AS I-PMSI A-D route to the set of receiving PEs for the P2MP LSP. The PE at the head end MUST then use RSVP-TE [RSVP-P2MP] signaling to add the receiver PEs to the P-tunnel.

When RSVP-TE P2MP LSPs are used to instantiate S-PMSIs, and a particular C-flow is to be bound to the LSP, it is necessary to use explicit tracking so that the head end of the LSP knows which PEs need to receive data from the specified C-flow. If the binding is done using S-PMSI A-D routes (see Section 7.4.1), the "Leaf Information Required" bit MUST be set in the PMSI Tunnel attribute.

RSVP-TE P2MP LSPs can optionally support aggregation of multiple MVPNs.

If an RSVP-TE P2MP LSP Tunnel is used for only a single MVPN, the mapping between the LSP and the MVPN can either be configured or be deduced from the procedures used to announce the LSP (e.g., from the RTs in the A-D route that announced the LSP). If the LSP is used for multiple MVPNs, the set of MVPNs using it (and the corresponding MPLS labels) is inferred from the PMSI Tunnel attributes that specify the LSP.

If an RSVP-TE P2MP LSP is being used to carry a set of C-flows traveling along a bidirectional C-tree, using the procedures of Section 11.2, the head end MUST include the PE Distinguisher Labels
   attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D route, and
   it MUST provide an upstream-assigned label for each PE that it has
   selected as the Upstream PE for the C-tree's RPA (Rendezvous Point
   Address).  See Section 11.2 for details.

   A PMSI Tunnel attribute specifying an RSVP-TE P2MP LSP contains the
   following information:

     - The type of the tunnel is set to RSVP-TE P2MP Tunnel

     - The RSVP-TE P2MP Tunnel's SESSION Object.

     - Optionally, the RSVP-TE P2MP LSP's SENDER_TEMPLATE Object.  This
       object is included when it is desired to identify a particular
       P2MP TE LSP.

   Demultiplexing the C-multicast data packets at the egress PE follows
   procedures described in Section 6.3.3.  As specified in Section
   6.3.3, an egress PE MUST NOT advertise IMPLICIT NULL or EXPLICIT NULL
   for an RSVP-TE P2MP LSP that is carrying traffic for one or more
   MVPNs.

   If (and only if) a particular RSVP-TE P2MP LSP is possibly carrying
   data from multiple MVPNs, the following special procedures apply:

     - A packet in a particular MVPN, when transmitted into the LSP,
       must carry the MPLS label specified in the PMSI Tunnel attribute
       that announced that LSP as a P-tunnel for that MVPN.

     - Demultiplexing the C-multicast data packets at the egress PE is
       done by means of the MPLS label that rises to the top of the
       stack after the label corresponding to the P2MP LSP is popped
       off.

   It is possible that at the time a PE learns, via an A-D route with a
   PMSI Tunnel attribute, that it needs to receive traffic on a
   particular RSVP-TE P2MP LSP, the signaling to set up the LSP will not
   have been completed.  In this case, the PE needs to wait for the
   RSVP-TE signaling to take place before it can modify its forwarding
   tables as directed by the A-D route.

   It is also possible that the signaling to set up an RSVP-TE P2MP LSP
   will be completed before a given PE learns, via a PMSI Tunnel
   attribute, of the use to which that LSP will be put.  The PE MUST
   discard any traffic received on that LSP until that time.
   In order for the egress PE to be able to discard such traffic, it
   needs to know that the LSP is associated with an MVPN and that the
   A-D route that binds the LSP to an MVPN or to a particular C-flow
   has not yet been received.  This is provided by extending [RSVP-P2MP]
   with [RSVP-OOB].
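
   The two orderings discussed above (A-D route first, or RSVP-TE
   signaling first) can be modeled as a small piece of per-LSP state at
   the egress PE.  The sketch below is non-normative; the class and
   method names are hypothetical, and it deliberately ignores how the
   events are delivered by [RSVP-P2MP], [RSVP-OOB], or BGP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class P2mpLspState:
    """Egress PE view of one RSVP-TE P2MP LSP (illustrative only)."""

    signaling_complete: bool = False  # RSVP-TE setup has finished
    bound_mvpn: Optional[str] = None  # learned from an A-D route, if any

    def on_signaling_complete(self) -> None:
        self.signaling_complete = True

    def on_ad_route(self, mvpn: str) -> None:
        self.bound_mvpn = mvpn

    def handle_packet(self) -> Optional[str]:
        """Return the MVPN in which to forward, or None to discard."""
        if self.signaling_complete and self.bound_mvpn is not None:
            return self.bound_mvpn
        return None  # binding not yet known: traffic is discarded


if __name__ == "__main__":
    lsp = P2mpLspState()
    lsp.on_signaling_complete()
    print(lsp.handle_packet())  # no A-D route yet, so the packet is dropped
    lsp.on_ad_route("red")
    print(lsp.handle_packet())
```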

6.4.2. PIM Trees

When the P-tunnels are PIM trees, the PMSI Tunnel attribute contains enough information to allow each other PE in the same MVPN to use P-PIM signaling to join the P-tunnel.

If an I-PMSI is to be instantiated as one or more PIM trees, then the PE that is at the root of a given PIM tree sends an Intra-AS I-PMSI A-D route containing a PMSI Tunnel attribute that contains all the information needed for other PEs to join the tree. If PIM trees are to be used to instantiate an MI-PMSI, each PE in the MVPN must send an Intra-AS I-PMSI A-D route containing such a PMSI Tunnel attribute.

If a PMSI is to be instantiated via a shared tree, the PMSI Tunnel attribute identifies the P-group address. The RP or RPA corresponding to the P-group address is not specified. It must, of course, be known to all the PEs. It is presupposed that the PEs use one of the methods for automatically learning the RP-to-group correspondences (e.g., Bootstrap Router Protocol [BSR]), or else that the correspondence is configured.

If a PMSI is to be instantiated via a source-specific tree, the PMSI Tunnel attribute identifies the PE router that is the root of the tree, as well as a P-group address.

The PMSI Tunnel attribute always specifies whether the PIM tree is to be a unidirectional shared tree, a bidirectional shared tree, or a source-specific tree.

If PIM trees are being used to instantiate S-PMSIs, the above procedures assume that each PE router has a set of group P-addresses that it can use for setting up the PIM-trees. Each PE must be configured with this set of P-addresses. If the P-tunnels are source-specific trees, then the PEs may be configured with overlapping sets of group P-addresses. If the trees are not source-specific, then each PE must be configured with a unique set of group P-addresses (i.e., having no overlap with the set configured at any other PE router). The management of this set of addresses is thus greatly simplified when source-specific trees are used, so the use of source-specific trees is strongly recommended whenever unidirectional trees are desired.
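
The address-management rule above lends itself to a simple configuration check. The following Python sketch is a non-normative illustration; the function name, the representation of group P-addresses as strings, and the example addresses are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def find_group_conflicts(
    pools: Dict[str, Set[str]], source_specific: bool
) -> List[Tuple[str, str, Set[str]]]:
    """Report pairs of PEs whose configured group P-address sets overlap.

    With source-specific trees, overlap is permitted, so nothing is reported.
    """
    if source_specific:
        return []
    conflicts = []
    names = sorted(pools)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = pools[first] & pools[second]
            if shared:
                conflicts.append((first, second, shared))
    return conflicts


if __name__ == "__main__":
    pools = {
        "PE1": {"239.1.1.1", "239.1.1.2"},
        "PE2": {"239.1.1.2", "239.1.1.3"},
        "PE3": {"239.1.1.4"},
    }
    print(find_group_conflicts(pools, source_specific=False))
    print(find_group_conflicts(pools, source_specific=True))
```
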
   Specification of the full set of procedures for using bidirectional
   PIM trees to instantiate S-PMSIs is outside the scope of this
   document.

   Details for constructing the PMSI Tunnel attribute identifying a PIM
   tree can be found in [MVPN-BGP].

6.4.3. mLDP P2MP LSPs

When the P-tunnels are mLDP P2MP trees, each Intra-AS I-PMSI A-D route has a PMSI Tunnel attribute containing enough information to allow each other PE in the same MVPN to use mLDP signaling to join the P-tunnel. The tunnel identifier consists of a P2MP Forwarding Equivalence Class (FEC) Element [mLDP].

An mLDP P2MP LSP may be used to carry the traffic of multiple VPNs, if the PMSI Tunnel attribute specifying it contains a non-zero MPLS label.

If an mLDP P2MP LSP is being used to carry the set of flows traveling along a particular bidirectional C-tree, using the procedures of Section 11.2, the root of the LSP MUST include the PE Distinguisher Labels attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D route, and it MUST provide an upstream-assigned label for the PE that it has selected to be the Upstream PE for the C-tree's RPA. See Section 11.2 for details.

6.4.4. mLDP MP2MP LSPs

The specification of the procedures for assigning C-flows to mLDP MP2MP LSPs that serve as P-tunnels is outside the scope of this document.

6.4.5. Ingress Replication

As described in Section 3, a PMSI can be instantiated using Unicast Tunnels between the PEs that are participating in the MVPN. In this mechanism, the ingress PE replicates a C-multicast data packet belonging to a particular MVPN and sends a copy to all or a subset of the PEs that belong to the MVPN. A copy of the packet is tunneled to a remote PE over a Unicast Tunnel to the remote PE. IP/GRE Tunnels or MPLS LSPs are examples of unicast tunnels that may be used. The same Unicast Tunnel can be used to transport packets belonging to different MVPNs.

In order for a PE to use Unicast P-tunnels to send a C-multicast data packet for a particular MVPN to a set of remote PEs, the remote PEs must be able to correctly decapsulate such packets and to assign each
   one to the proper MVPN.  This requires that the encapsulation used
   for sending packets through the P-tunnel have demultiplexing
   information that the receiver can associate with a particular MVPN.

   If ingress replication is being used to instantiate the PMSIs for an
   MVPN, the PEs announce this as part of the BGP-based MVPN membership
   auto-discovery process, described in Section 4.  The PMSI Tunnel
   attribute specifies ingress replication; it also specifies a
   downstream-assigned MPLS label.  This label will be used to identify
   that a particular packet belongs to the MVPN that the Intra-AS I-PMSI
   A-D route belongs to (as inferred from its RTs).  If PE1 specifies a
   particular label value for a particular MVPN, then any other PE
   sending PE1 a packet for that MVPN through a unicast P-tunnel must
   put that label on the packet's label stack.  PE1 then treats that
   label as the demultiplexor value identifying the MVPN in question.
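
   The label handling described above can be sketched as follows.  This
   is a non-normative illustration: the function name, the
   dictionary-based representation of the advertised labels, and the
   label values are assumptions made for the example.  Each copy carries
   the label chosen by the PE that will receive it.

```python
from typing import Dict, List, Set, Tuple

# advertised[egress_pe][mvpn] = downstream-assigned label chosen by egress_pe
Advertised = Dict[str, Dict[str, int]]


def replicate(
    mvpn: str, payload: bytes, advertised: Advertised, targets: Set[str]
) -> List[Tuple[str, int, bytes]]:
    """Build one (egress PE, MVPN label, payload) copy per target PE."""
    copies = []
    for pe in sorted(targets):
        label = advertised.get(pe, {}).get(mvpn)
        if label is None:
            continue  # no label learned from this PE for the MVPN
        copies.append((pe, label, payload))
    return copies


if __name__ == "__main__":
    advertised = {
        "PE1": {"red": 3001},
        "PE2": {"red": 4002, "blue": 4003},
    }
    for copy in replicate("red", b"data", advertised, {"PE1", "PE2", "PE3"}):
        print(copy)
```

   The unicast tunnel encapsulation toward each egress PE is outside the
   sketch; only the MVPN demultiplexing label is modeled.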

   Ingress replication may be used to instantiate any kind of PMSI.
   When ingress replication is done, it is RECOMMENDED, except in the
   one particular case mentioned in the next paragraph, that explicit
   tracking be done and that the data packets of a particular C-flow
   only get sent to those PEs that need to see the packets of that
   C-flow.  There is never any need to use the procedures of Section 7.4
   for binding particular C-flows to particular P-tunnels.

   The particular case in which there is no need for explicit tracking
   is the case where ingress replication is being used to create a
   one-hop ASBR-ASBR inter-AS segment of a segmented inter-AS P-tunnel.

   Section 9.1 specifies three different methods that can be used to
   prevent duplication of multicast data packets.  Any given deployment
   must use at least one of those methods.  Note that the method
   described in Section 9.1.1 ("Discarding Packets from Wrong PE")
   presupposes that the egress PE of a P-tunnel can, upon receiving a
   packet from the P-tunnel, determine the identity of the PE that
   transmitted the packet into the P-tunnel.  SPs that use ingress
   replication to instantiate their PMSIs are cautioned against using,
   for this purpose, unicast P-tunnel technologies that do not allow
   the egress PE to identify the ingress PE (e.g., MP2P LSPs for which
   penultimate-hop-popping is done).  Deployment of ingress replication
   with such P-tunnel technology MUST NOT be done unless it is known
   that the deployment relies entirely on the procedures of Sections
   9.1.2 or 9.1.3 for duplicate prevention.

7. Binding Specific C-Flows to Specific P-Tunnels

As discussed previously, Intra-AS I-PMSI A-D routes may (or may not) have PMSI Tunnel attributes, identifying P-tunnels that can be used as the default P-tunnels for carrying C-multicast traffic, i.e., for carrying C-multicast traffic that has not been specifically bound to another P-tunnel.

If none of the Intra-AS I-PMSI A-D routes originated by a particular PE for a particular MVPN carry PMSI Tunnel attributes at all (or if the only PMSI Tunnel attributes they carry have type "No tunnel information present"), then there are no default P-tunnels for that PE to use when transmitting C-multicast traffic in that MVPN to other PEs. In that case, all such C-flows must be assigned to specific P-tunnels using one of the mechanisms specified in Section 7.4. That is, all such C-flows are carried on P-tunnels that instantiate S-PMSIs.

There are other cases where it may be either necessary or desirable to use the mechanisms of Section 7.4 to identify specific C-flows and bind them to or unbind them from specific P-tunnels. Some possible cases are as follows:

     - The policy for a particular MVPN is to send all C-data on
       S-PMSIs, even if the Intra-AS I-PMSI A-D routes carry PMSI Tunnel
       attributes.  (This is another case where all C-data is carried on
       S-PMSIs; presumably, the I-PMSIs are used for control
       information.)

     - It is desired to optimize the routing of the particular C-flow,
       which may already be traveling on an I-PMSI, by sending it
       instead on an S-PMSI.

     - If a particular C-flow is traveling on an S-PMSI, it may be
       considered desirable to move it to an I-PMSI (i.e., optimization
       of the routing for that flow may no longer be considered
       desirable).

     - It is desired to change the encapsulation used to carry the
       C-flow, e.g., because one now wants to aggregate it on a P-tunnel
       with flows from other MVPNs.

Note that if Full PIM Peering over an MI-PMSI (Section 5.2) is being used, then from the perspective of the PIM state machine, the "interface" connecting the PEs to each other is the MI-PMSI, even if some or all of the C-flows are being sent on S-PMSIs. That is, from
   the perspective of the C-PIM state machine, when a C-flow is being
   sent or received on an S-PMSI, the output or input interface
   (respectively) is considered to be the MI-PMSI.

   Section 7.1 discusses certain general considerations that apply
   whenever a specified C-flow is bound to a specified P-tunnel using
   the mechanisms of Section 7.4.  This includes the case where the
   C-flow is moved from one P-tunnel to another as well as the case
   where the C-flow is initially bound to an S-PMSI P-tunnel.

   Section 7.2 discusses the specific case of using the mechanisms of
   Section 7.4 as a way of optimizing multicast routing by switching
   specific flows from one P-tunnel to another.

   Section 7.3 discusses the case where the mechanisms of Section 7.4
   are used to announce the presence of "unsolicited flooded data" and
   to assign such data to a particular P-tunnel.

   Section 7.4 specifies the protocols for assigning specific C-flows to
   specific P-tunnels.  These protocols may be used to assign a C-flow
   to a P-tunnel initially or to switch a flow from one P-tunnel to
   another.

   Procedures for binding to a specified P-tunnel the set of C-flows
   traveling along a specified C-tree (or for so binding a set of
   C-flows that share some relevant characteristic), without identifying
   each flow individually, are outside the scope of this document.

7.1. General Considerations

7.1.1. At the PE Transmitting the C-Flow on the P-Tunnel

The decision to bind a particular C-flow (designated as (C-S,C-G)) to a particular P-tunnel, or to switch a particular C-flow to a particular P-tunnel, is always made by the PE that is to transmit the C-flow onto the P-tunnel.

Whenever a PE moves a particular C-flow from one P-tunnel, say P1, to another, say P2, care must be taken to ensure that there is no steady state duplication of traffic. At any given time, the PE transmits the C-flow either on P1 or on P2, but not on both.

When a particular PE, say PE1, decides to bind a particular C-flow to a particular P-tunnel, say P2, the following procedures MUST be applied:
     - PE1 must issue the required control plane information to signal
       that the specified C-flow is now bound to P-tunnel P2 (see
       Section 7.4).

     - If P-tunnel P2 needs to be constructed from the root downwards,
       PE1 must initiate the signaling to construct P2.  This is only
       required if P2 is an RSVP-TE P2MP LSP.

     - If the specified C-flow is currently bound to a different
       P-tunnel, say P1, then:

         * PE1 MUST wait for a "switch-over" delay before sending
           traffic of the C-flow on P-tunnel P2.  It is RECOMMENDED to
           allow this delay to be configurable.

         * Once the "switch-over" delay has elapsed, PE1 MUST send
           traffic for the C-flow on P2 and MUST NOT send it on P1.  In
           no case is any C-flow packet sent on both P-tunnels.

   When a C-flow is switched from one P-tunnel to another, the purpose
   of running a switch-over timer is to minimize packet loss without
   introducing packet duplication.  However, jitter may be introduced
   due to the difference in transit delays between the old and new
   P-tunnels.

   For best effect, the switch-over timer should be configured to a
   value that is "just long enough" (a) to allow all the PEs to learn
   about the new binding of C-flow to P-tunnel and (b) to allow the PEs
   to construct the P-tunnel, if it doesn't already exist.

   If, after such a switch, the "old" P-tunnel P1 is no longer needed,
   it SHOULD be torn down and the resources supporting it freed.  The
   procedures for "tearing down" a P-tunnel are specific to the P-tunnel
   technology.

   Procedures for binding sets of C-flows traveling along specified
   C-trees (or sets of C-flows sharing any other characteristic) to a
   specified P-tunnel (or for moving them from one P-tunnel to another)
   are outside the scope of this document.
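
   The transmit-side behavior described in this section (announce the
   new binding, wait for the switch-over delay, then send only on the
   new P-tunnel) can be sketched as a small state holder driven by an
   explicit clock.  This is a non-normative illustration: the class,
   its field names, and the use of floating-point seconds are
   assumptions made for the example, and the control plane signaling of
   Section 7.4 is outside the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowBinding:
    """Transmit-side binding of one C-flow to a P-tunnel (illustrative)."""

    current: Optional[str] = None      # P-tunnel used for transmission now
    pending: Optional[str] = None      # P-tunnel announced but not yet in use
    switch_at: Optional[float] = None  # time at which the delay elapses

    def bind(self, new_tunnel: str, now: float, switch_over_delay: float) -> None:
        if self.current is None:
            # Initial binding: no old tunnel, so no delay is needed.
            self.current = new_tunnel
            return
        self.pending = new_tunnel
        self.switch_at = now + switch_over_delay

    def tunnel_for_packet(self, now: float) -> Optional[str]:
        """Return the single P-tunnel on which to send a packet at time now."""
        if self.pending is not None and self.switch_at is not None and now >= self.switch_at:
            self.current, self.pending, self.switch_at = self.pending, None, None
        return self.current


if __name__ == "__main__":
    flow = FlowBinding()
    flow.bind("P1", now=0.0, switch_over_delay=3.0)
    flow.bind("P2", now=10.0, switch_over_delay=3.0)
    for t in (11.0, 12.9, 13.0, 14.0):
        print(t, flow.tunnel_for_packet(t))
```

   Because tunnel_for_packet() returns exactly one P-tunnel, no packet
   of the C-flow is sent on both P1 and P2.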

7.1.2. At the PE Receiving the C-flow from the P-Tunnel

Suppose that a particular PE, say PE1, learns, via the procedures of Section 7.4, that some other PE, say PE2, has bound a particular C-flow, designated as (C-S,C-G), to a particular P-tunnel, say P2. Then, PE1 must determine whether it needs to receive (C-S,C-G) traffic from PE2.
   If BGP is being used to distribute C-multicast routing information
   from PE to PE, the conditions under which PE1 needs to receive
   (C-S,C-G) traffic from PE2 are specified in Section 12.3 of
   [MVPN-BGP].

   If PIM over an MI-PMSI is being used to distribute C-multicast
   routing from PE to PE, PE1 needs to receive (C-S,C-G) traffic from
   PE2 if one or more of the following conditions holds:

     - PE1 has (C-S,C-G) state such that PE2 is PE1's Upstream PE for
       (C-S,C-G), and PE1 has downstream neighbors ("non-null olist")
       for the (C-S,C-G) state.

     - PE1 has (C-*,C-G) state with an Upstream PE (not necessarily PE2)
       and with downstream neighbors ("non-null olist"), but PE1 does
       not have (C-S,C-G) state.

     - Native PIM methods are being used to prevent steady-state packet
       duplication, and PE1 has either (C-*,C-G) or (C-S,C-G) state such
       that the MI-PMSI is one of the downstream interfaces.  Note that
       this includes the case where PE1 is itself sending (C-S,C-G)
       traffic on an S-PMSI.  (In this case, PE1 needs to receive the
       (C-S,C-G) traffic from PE2 in order to allow the PIM Assert
       mechanism to function properly.)
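
   The three conditions above can be read as a single predicate over
   PE1's C-multicast state.  The following is a minimal sketch only,
   not part of the specification: the CState record, its field names,
   and the function name are hypothetical and simply summarize the
   state that the conditions refer to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CState:
    """Hypothetical summary of one PIM C-multicast state entry on PE1."""
    upstream_pe: Optional[str]    # Upstream PE selected for this state, if any
    has_downstream: bool          # True when the olist is non-null
    mi_pmsi_is_downstream: bool   # True when the MI-PMSI is a downstream interface


def needs_traffic_from(pe2: str,
                       sg: Optional[CState],
                       star_g: Optional[CState],
                       native_pim_dedup: bool) -> bool:
    """Return True if at least one of the three listed conditions holds.

    sg is PE1's (C-S,C-G) state and star_g its (C-*,C-G) state; either
    may be None when PE1 has no such state.
    """
    # Condition 1: (C-S,C-G) state whose Upstream PE is PE2, non-null olist.
    if sg is not None and sg.upstream_pe == pe2 and sg.has_downstream:
        return True
    # Condition 2: (C-*,C-G) state with some Upstream PE and a non-null
    # olist, while no (C-S,C-G) state exists.
    if (sg is None and star_g is not None
            and star_g.upstream_pe is not None and star_g.has_downstream):
        return True
    # Condition 3: native PIM methods prevent duplication, and the MI-PMSI
    # is a downstream interface of either state.
    if native_pim_dedup and any(
            s is not None and s.mi_pmsi_is_downstream for s in (sg, star_g)):
        return True
    return False
```

   Note that the second condition deliberately ignores which PE is the
   Upstream PE, mirroring the "(not necessarily PE2)" wording above.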

   Irrespective of whether BGP or PIM is being used to distribute
   C-multicast routing information, once PE1 determines that it needs to
   receive (C-S,C-G) traffic from PE2, the following procedures MUST be
   applied:

     - PE1 MUST take all necessary steps to be able to receive the
       (C-S,C-G) traffic on P2.

         * If P2 is a PIM tunnel or an mLDP LSP, PE1 will need to use
           PIM or mLDP (respectively) to join P2 (unless it is already
           joined to P2).

         * PE1 may need to modify the forwarding state for (C-S,C-G) to
           indicate that (C-S,C-G) traffic is to be accepted on P2.  If
           P2 is an Aggregate Tree, this also implies setting up the
           demultiplexing forwarding entries based on the inner label as
           described in Section 6.3.3.

     - If PE1 was previously receiving the (C-S,C-G) C-flow on another
       P-tunnel, say P1, then:

         * PE1 MAY run a switch-over timer, and until it expires, SHOULD
           accept traffic for the given C-flow on both P1 and P2;

         * If, after such a switch, the "old" P-tunnel P1 is no longer
           needed, it SHOULD be torn down and the resources supporting
           it freed.  The procedures for "tearing down" a P-tunnel are
           specific to the P-tunnel technology.

     - If PE1 later determines that it no longer needs to receive any of
       the C-multicast data that is being sent on a particular P-tunnel,
       it may initiate signaling (specific to the P-tunnel technology)
       to remove itself from that tunnel.
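
   The receiver-side switch-over behavior can be sketched as follows.
   This is an illustration under stated assumptions, not a normative
   procedure: the FlowReceiver class and its method names are
   hypothetical, time is passed in explicitly as a number of seconds,
   and joining or tearing down the P-tunnels themselves (which is
   specific to the P-tunnel technology) is left out.

```python
from typing import Optional, Set


class FlowReceiver:
    """Hypothetical per-C-flow receive state kept by PE1."""

    def __init__(self, tunnel: str) -> None:
        self.current = tunnel                  # P-tunnel the C-flow is bound to
        self.previous: Optional[str] = None    # "old" P-tunnel during switch-over
        self.deadline: Optional[float] = None  # when the switch-over timer expires

    def rebind(self, new_tunnel: str, now: float,
               switch_over: float = 0.0) -> None:
        """Record that the C-flow is now bound to new_tunnel."""
        if new_tunnel == self.current:
            return
        self.previous, self.current = self.current, new_tunnel
        if switch_over > 0:
            # Optional timer: keep accepting on the old tunnel for a while.
            self.deadline = now + switch_over
        else:
            # No timer configured: stop accepting on the old tunnel at once.
            self.previous = None
            self.deadline = None

    def accepted_tunnels(self, now: float) -> Set[str]:
        """P-tunnels on which traffic for this C-flow is accepted at `now`."""
        if (self.previous is not None and self.deadline is not None
                and now < self.deadline):
            return {self.previous, self.current}
        # Timer expired (or never ran): only the new tunnel is accepted.
        self.previous = None
        self.deadline = None
        return {self.current}
```

   For example, after rebind("P2", now=10.0, switch_over=3.0) on a
   receiver created for "P1", traffic is accepted on both P1 and P2
   until time 13.0 and on P2 alone afterwards.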

7.2. Optimizing Multicast Distribution via S-PMSIs

Whenever a particular multicast stream is being sent on an I-PMSI, it is likely that the data of that stream is being sent to PEs that do not require it. If a particular stream has a significant amount of traffic, it may be beneficial to move it to an S-PMSI that includes only those PEs that are transmitters and/or receivers (or at least includes fewer PEs that are neither).

   If explicit tracking is being done, S-PMSI creation can also be
   triggered on other criteria.  For instance, there could be a
   "pseudo-wasted bandwidth" criterion: switching to an S-PMSI would be
   done if the bandwidth multiplied by the number of uninterested PEs
   (PEs that are receiving the stream but have no receivers) is above a
   specified threshold.  The motivation is that (a) the total bandwidth
   wasted by many sparsely subscribed low-bandwidth groups may be large
   and (b) there's no point to moving a high-bandwidth group to an
   S-PMSI if all the PEs have receivers for it.

   Switching a (C-S,C-G) stream to an S-PMSI may require the root of the
   S-PMSI to determine the egress PEs that need to receive the
   (C-S,C-G) traffic.  This is true in the following cases:

     - If the P-tunnel is a source-initiated tree, such as an RSVP-TE
       P2MP Tunnel, the PE needs to know the leaves of the tree before
       it can instantiate the S-PMSI.

     - If a PE instantiates multiple S-PMSIs, belonging to different
       MVPNs, using one P-multicast tree, such a tree is termed an
       Aggregate Tree with a selective mapping.  The setting up of such
       an Aggregate Tree requires the ingress PE to know all the other
       PEs that have receivers for multicast groups that are mapped onto
       the tree.

   The above two cases require that explicit tracking be done for the
   (C-S,C-G) stream.  The root of the S-PMSI MAY decide to do explicit
   tracking of this stream only after it has determined to move the
   stream to an S-PMSI, or it MAY have been doing explicit tracking all
   along.

   If the S-PMSI is instantiated by a P-multicast tree, the PE at the
   root of the tree must signal the leaves of the tree that the
   (C-S,C-G) stream is now bound to the S-PMSI.  Note that the PE could
   create the identity of the P-multicast tree prior to the actual
   instantiation of the P-tunnel.

   If the S-PMSI is instantiated by a source-initiated P-multicast tree
   (e.g., an RSVP-TE P2MP tunnel), the PE at the root of the tree must
   establish the source-initiated P-multicast tree to the leaves.  This
   tree MAY have been established before the leaves receive the S-PMSI
   binding, or it MAY be established after the leaves receive the
   binding.  The leaves MUST NOT switch to the S-PMSI until they receive
   both the binding and the tree signaling message.
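
   The "pseudo-wasted bandwidth" criterion mentioned earlier in this
   section reduces to a one-line computation.  The sketch below is
   illustrative only: the function names, the use of bits per second as
   the unit, and the representation of PEs as a set of names are all
   assumptions, and the threshold is a local policy value.

```python
from typing import Set


def pseudo_wasted_bandwidth(stream_bps: float,
                            receiving_pes: Set[str],
                            interested_pes: Set[str]) -> float:
    """Stream bandwidth multiplied by the number of uninterested PEs.

    An uninterested PE is one that receives the stream on the current
    PMSI but has no receivers for it (known via explicit tracking).
    """
    uninterested = receiving_pes - interested_pes
    return stream_bps * len(uninterested)


def should_move_to_s_pmsi(stream_bps: float,
                          receiving_pes: Set[str],
                          interested_pes: Set[str],
                          threshold_bps: float) -> bool:
    """True when the pseudo-wasted bandwidth is above the threshold."""
    wasted = pseudo_wasted_bandwidth(stream_bps, receiving_pes, interested_pes)
    return wasted > threshold_bps
```

   With this definition, a high-bandwidth stream for which every PE has
   receivers yields zero pseudo-wasted bandwidth and is never moved,
   which matches motivation (b) above.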

7.3. Announcing the Presence of Unsolicited Flooded Data

A PE may receive "unsolicited" data from a CE, where the data is intended to be flooded to the other PEs of the same MVPN and then on to other CEs. By "unsolicited", we mean that the data is to be delivered to all the other PEs of the MVPN, even though those PEs may not have sent any control information indicating that they need to receive that data.

   For example, if the BSR [BSR] is being used within the MVPN, BSR
   control messages may be received by a PE from a CE.  These need to be
   forwarded to other PEs, even though no PE ever issues any kind of
   explicit signal saying that it wants to receive BSR messages.

   If a PE receives a BSR message from a CE, and if the CE's MVPN has an
   MI-PMSI, then the PE can just send BSR messages on the appropriate
   P-tunnel.  Otherwise, the PE MUST announce the binding of a
   particular C-flow to a particular P-tunnel, using the procedures of
   Section 7.4.  The particular C-flow in this case would be
   (C-IPaddress_of_PE, ALL-PIM-ROUTERS).  The P-tunnel identified by the
   procedures of Section 7.4 may or may not be one that was previously
   identified in the PMSI Tunnel attribute of an I-PMSI A-D route.
   Further procedures for handling BSR may be found in Sections 5.2.1
   and 5.3.4.

   Analogous procedures may be used for announcing the presence of other
   sorts of unsolicited flooded data, e.g., dense mode data or data from
   proprietary protocols that presume messages can be flooded.  However,
   a full specification of the procedures for traffic other than BSR
   traffic is outside the scope of this document.

7.4. Protocols for Binding C-Flows to P-Tunnels

We describe two protocols for binding C-flows to P-tunnels. These protocols can be used for moving C-flows from I-PMSIs to S-PMSIs, as long as the S-PMSI is instantiated by a P-multicast tree. (If the S-PMSI is instantiated by means of ingress replication, the procedures of Section 6.4.5 suffice.) These protocols can also be used for other cases in which it is necessary to bind specific C-flows to specific P-tunnels.

7.4.1. Using BGP S-PMSI A-D Routes

Notwithstanding the name of the mechanism, "S-PMSI A-D routes", the mechanism specified in this section may be used any time it is necessary to advertise a binding of a C-flow to a particular P-tunnel.
7.4.1.1. Advertising C-Flow Binding to P-Tunnel
The ingress PE informs all the PEs that are on the path to receivers of the (C-S,C-G) of the binding of the P-tunnel to the (C-S,C-G). The BGP announcement is done by sending an update for the MCAST-VPN address family. An S-PMSI A-D route is used, containing the following information:

      1. The IP address of the originating PE.

      2. The RD configured locally for the MVPN.  This is required to
         uniquely identify the (C-S,C-G) as the addresses could overlap
         between different MVPNs.  This is the same RD value used in the
         auto-discovery process.

      3. The C-S address.

      4. The C-G address.

      5. A PE MAY use a single P-tunnel to aggregate two or more
         S-PMSIs.  If the PE already advertised unaggregated S-PMSI A-D
         routes for these S-PMSIs, then a decision to aggregate them
         requires the PE to re-advertise these routes.  The
         re-advertised routes MUST be the same as the original ones,
         except for the PMSI Tunnel attribute.  If the PE has not
         previously advertised S-PMSI A-D routes for these S-PMSIs, then
         the aggregation requires the PE to advertise (new) S-PMSI A-D
         routes for these S-PMSIs.  The PMSI Tunnel attribute in the
         newly advertised/re-advertised routes MUST carry the identity
         of the P-tunnel that aggregates the S-PMSIs.

         If all these aggregated S-PMSIs belong to the same MVPN, and
         this MVPN uses PIM as its C-multicast routing protocol, then
         the corresponding S-PMSI A-D routes MAY carry an MPLS upstream-
         assigned label [MPLS-UPSTREAM-LABEL].  Moreover, in this case,
         the labels MUST be distinct on a per-MVPN basis, and MAY be
         distinct on a per-route basis.

         If all these aggregated S-PMSIs belong to MVPN(s) that use
         mLDP as their C-multicast routing protocol, then the
         corresponding S-PMSI A-D routes MUST carry an MPLS upstream-
         assigned label [MPLS-UPSTREAM-LABEL], and these labels MUST be
         distinct on a per-route (per-mLDP-FEC) basis, irrespective of
         whether the aggregated S-PMSIs belong to the same or different
         MVPNs.

   When a PE distributes this information via BGP, it must include the
   following:

      1. An identifier for the particular P-tunnel to which the stream
         is to be bound.  This identifier is a structured field that
         includes the following information:

           * The type of tunnel

           * An identifier for the tunnel.  The form of the identifier
             will depend upon the tunnel type.  The combination of
             tunnel identifier and tunnel type should contain enough
             information to enable all the PEs to "join" the tunnel and
             receive messages from it.

      2. Route Target Extended Communities attribute.  This is used as
         described in Section 4.

7.4.1.2. Explicit Tracking
If the PE wants to enable explicit tracking for the specified flow, it also indicates this in the A-D route it uses to bind the flow to a particular P-tunnel. Then, any PE that receives the A-D route will respond with a "Leaf A-D route" in which it identifies itself as a receiver of the specified flow. The Leaf A-D route will be withdrawn when the PE is no longer a receiver for the flow.

   If the PE needs to enable explicit tracking for a flow without at the
   same time binding the flow to a specific P-tunnel, it can do so by
   sending an S-PMSI A-D route whose NLRI identifies the flow and whose
   PMSI Tunnel attribute has its tunnel type value set to "no tunnel
   information present" and its "leaf information required" bit set to
   1.  This will elicit the Leaf A-D routes.  This is useful when the PE
   needs to know the receivers before selecting a P-tunnel.

7.4.2. UDP-Based Protocol

This procedure carries its control messages in UDP and requires that the MVPN have an MI-PMSI that can be used to carry the control messages.
7.4.2.1. Advertising C-Flow Binding to P-Tunnel
In order for a given PE to move a particular C-flow to a particular P-tunnel, an "S-PMSI Join message" is sent periodically on the MI-PMSI. (Notwithstanding the name of the mechanism, the mechanism may be used to bind a flow to any P-tunnel.) The S-PMSI Join message is a UDP-encapsulated message whose destination address is ALL-PIM-ROUTERS (224.0.0.13) and whose destination port is 3232.

   The S-PMSI Join message contains the following information:

     - An identifier for the particular multicast stream that is to be
       bound to the P-tunnel.  This can be represented as an (S,G) pair.

     - An identifier for the particular P-tunnel to which the stream is
       to be bound.  This identifier is a structured field that includes
       the following information:

         * The type of tunnel used to instantiate the S-PMSI.

         * An identifier for the tunnel.  The form of the identifier
           will depend upon the tunnel type.  The combination of tunnel
           identifier and tunnel type should contain enough information
           to enable all the PEs to "join" the tunnel and receive
           messages from it.

         * If (and only if) the identified P-tunnel is aggregating
           several S-PMSIs, any demultiplexing information needed by the
           tunnel encapsulation protocol to identify a particular
           S-PMSI.

   If the policy for the MVPN is that traffic is sent/received by
   default over an MI-PMSI, then traffic for a particular C-flow can be
   switched back to the MI-PMSI simply by ceasing to send S-PMSI Joins
   for that C-flow.

   Note that an S-PMSI Join that is not received over a PMSI (e.g., one
   that is received directly from a CE) is an illegal packet that MUST
   be discarded.

7.4.2.2. Packet Formats and Constants
The S-PMSI Join message is encapsulated within UDP and has the following type/length/value (TLV) encoding:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |     Type      |            Length             |     Value     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                               .                               |
       |                               .                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type (8 bits)

   Length (16 bits): the total number of octets in the Type, Length, and
   Value fields combined

   Value (variable length)

   In this specification, only one type of S-PMSI Join is defined.  A
   Type 1 S-PMSI Join is used when the S-PMSI tunnel is a PIM tunnel
   that is used to carry a single multicast stream, where the packets of
   that stream have IPv4 source and destination IP addresses.

   The S-PMSI Join format to use when the C-source and C-group are IPv6
   addresses will be defined in a follow-on document.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |     Type      |            Length             |   Reserved    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           C-source                            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           C-group                             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           P-group                             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type (8 bits): 1

   Length (16 bits): 16

   Reserved (8 bits): This field SHOULD be zero when transmitted, and
   MUST be ignored when received.

   C-source (32 bits): the IPv4 address of the traffic source in the
   VPN.

   C-group (32 bits): the IPv4 address of the multicast traffic
   destination address in the VPN.

   P-group (32 bits): the IPv4 group address that the PE router is going
   to use to encapsulate the flow (C-source, C-group).

   The P-group identifies the S-PMSI P-tunnel, and the (C-S,C-G)
   identifies the multicast flow that is carried in the P-tunnel.
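
   As an illustration of the Type 1 layout above, the following sketch
   encodes and decodes the 16-octet message body in network byte order.
   It is not a reference implementation: the function and constant
   names are hypothetical, and sending the result in a UDP datagram (to
   ALL-PIM-ROUTERS, destination port 3232, over the MI-PMSI) is outside
   the sketch.

```python
import ipaddress
import struct
from typing import Tuple

S_PMSI_JOIN_TYPE_1 = 1      # Type field value defined above
S_PMSI_JOIN_TYPE_1_LEN = 16 # Length field value defined above

# Type (8 bits), Length (16 bits), Reserved (8 bits), then three IPv4
# addresses; "!" selects network byte order with no padding.
_FORMAT = "!BHB4s4s4s"


def pack_join(c_source: str, c_group: str, p_group: str) -> bytes:
    """Encode a Type 1 S-PMSI Join; Reserved is sent as zero."""
    return struct.pack(
        _FORMAT,
        S_PMSI_JOIN_TYPE_1,
        S_PMSI_JOIN_TYPE_1_LEN,
        0,
        ipaddress.IPv4Address(c_source).packed,
        ipaddress.IPv4Address(c_group).packed,
        ipaddress.IPv4Address(p_group).packed,
    )


def unpack_join(data: bytes) -> Tuple[str, str, str]:
    """Decode a Type 1 S-PMSI Join into (C-source, C-group, P-group)."""
    if len(data) < S_PMSI_JOIN_TYPE_1_LEN:
        raise ValueError("message too short for a Type 1 S-PMSI Join")
    msg_type, length, _reserved, cs, cg, pg = struct.unpack(
        _FORMAT, data[:S_PMSI_JOIN_TYPE_1_LEN])
    # The Reserved field is ignored on receipt, as required above.
    if msg_type != S_PMSI_JOIN_TYPE_1 or length != S_PMSI_JOIN_TYPE_1_LEN:
        raise ValueError("not a Type 1 S-PMSI Join")
    return (str(ipaddress.IPv4Address(cs)),
            str(ipaddress.IPv4Address(cg)),
            str(ipaddress.IPv4Address(pg)))
```

   The Length value of 16 follows from the field sizes: 1 + 2 + 1
   octets of header plus three 4-octet addresses.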

   The protocol uses the following constants.

   [S-PMSI_DELAY]:

       Once an S-PMSI Join message has been sent, the PE router that is
       to transmit onto the S-PMSI will delay this amount of time before
       it begins using the S-PMSI.  The default value is 3 seconds.

   [S-PMSI_TIMEOUT]:

       If a PE (other than the transmitter) does not receive any packets
       over the S-PMSI P-tunnel for this amount of time, the PE will
       prune itself from the S-PMSI P-tunnel, and will expect (C-S,C-G)
       packets to arrive on an I-PMSI.  The default value is 3 minutes.

       This value must be consistent among PE routers.

   [S-PMSI_HOLDOWN]:

       If the PE that transmits onto the S-PMSI does not see any
       (C-S,C-G) packets for this amount of time, it will resume sending
       (C-S,C-G) packets on an I-PMSI.

       This is used to avoid oscillation when traffic is bursty.  The
       default value is 1 minute.

   [S-PMSI_INTERVAL]:

       The interval the transmitting PE router uses to periodically send
       the S-PMSI Join message.  The default value is 60 seconds.
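
   The four constants and their default values can be collected in a
   small configuration record.  The sketch below is illustrative only:
   the SPmsiTimers class and the helper functions are hypothetical
   names, all times are in seconds, and each helper simply restates
   one of the definitions above as a comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPmsiTimers:
    """Default values taken from the constant definitions above."""
    delay: float = 3.0       # [S-PMSI_DELAY], 3 seconds
    timeout: float = 180.0   # [S-PMSI_TIMEOUT], 3 minutes
    holddown: float = 60.0   # [S-PMSI_HOLDOWN], 1 minute
    interval: float = 60.0   # [S-PMSI_INTERVAL], 60 seconds


def may_start_sending(join_sent_at: float, now: float, t: SPmsiTimers) -> bool:
    """Transmitter: S-PMSI_DELAY has elapsed since the Join was sent."""
    return now - join_sent_at >= t.delay


def receiver_should_prune(last_packet_at: float, now: float,
                          t: SPmsiTimers) -> bool:
    """Receiver: nothing seen on the S-PMSI P-tunnel for S-PMSI_TIMEOUT."""
    return now - last_packet_at >= t.timeout


def transmitter_should_revert(last_flow_packet_at: float, now: float,
                              t: SPmsiTimers) -> bool:
    """Transmitter: no (C-S,C-G) packets seen for S-PMSI_HOLDOWN."""
    return now - last_flow_packet_at >= t.holddown


def join_is_due(last_join_sent_at: float, now: float, t: SPmsiTimers) -> bool:
    """Transmitter: time to resend the periodic S-PMSI Join."""
    return now - last_join_sent_at >= t.interval
```

   Keeping the values in one immutable record makes it easier to check
   that S-PMSI_TIMEOUT is configured consistently among PE routers.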

7.4.3. Aggregation

S-PMSIs can be aggregated on a P-multicast tree. The S-PMSI to (C-S,C-G) binding advertisement supports aggregation. Furthermore, the aggregation procedures of Section 6.3 apply. It is also possible to aggregate both S-PMSIs and I-PMSIs on the same P-multicast tree.

