6. PMSI Instantiation
This section provides the procedures for using P-tunnels to instantiate a PMSI. It describes the procedures for setting up and maintaining the P-tunnels as well as for sending and receiving C-data and/or C-control messages on the P-tunnels. However, procedures for binding particular C-flows to particular P-tunnels are discussed in Section 7.
PMSIs can be instantiated either by P-multicast trees or by PE-PE unicast tunnels. In the latter case, the PMSI is said to be instantiated by "ingress replication". This specification supports a number of different methods for setting up P-multicast trees: these are detailed below. A P-tunnel may support a single VPN (a non-aggregated P-multicast tree) or multiple VPNs (an aggregated P-multicast tree).
6.1. Use of the Intra-AS I-PMSI A-D Route
6.1.1. Sending Intra-AS I-PMSI A-D Routes
When a PE is provisioned to have one or more VRFs that provide MVPN support, the PE announces its MVPN membership information using Intra-AS I-PMSI A-D routes, as discussed in Section 4 and detailed in Section 9.1.1 of [MVPN-BGP]. (Under certain conditions, detailed in [MVPN-BGP], the Intra-AS I-PMSI A-D route may be omitted.)
Generally, the Intra-AS I-PMSI A-D route will have a PMSI Tunnel attribute that identifies a P-tunnel that is being used to instantiate the I-PMSI. Section 9.1.1 of [MVPN-BGP] details certain conditions under which the PMSI Tunnel attribute may be omitted (or in which a PMSI Tunnel attribute with the "no tunnel information present" bit may be sent).
As a special case, when (a) C-PIM control messages are to be sent through an MI-PMSI and (b) the MI-PMSI is instantiated by a P-tunnel technique for which each PE needs to know only a single P-tunnel identifier per VPN, then the use of the Intra-AS I-PMSI A-D routes MAY be omitted, and static configuration of the tunnel identifier used instead. However, this is not recommended for long-term use, and in all other cases, the Intra-AS I-PMSI A-D routes MUST be used.
The PMSI Tunnel attribute MAY contain an upstream-assigned MPLS label, assigned by the PE originating the Intra-AS I-PMSI A-D route. If this label is present, the P-tunnel can be carrying data from several MVPNs. The label is used on the data packets traveling through the tunnel to identify the MVPN to which those data packets belong. (The specified label identifies the packet as belonging to the MVPN that is identified by the RTs of the Intra-AS I-PMSI A-D route.) See Section 12.2 for details on how to place the label in the packet's label stack.
The Intra-AS I-PMSI A-D route may contain a "PE Distinguisher Labels" attribute. This contains a set of bindings between upstream-assigned labels and PE addresses. The PE that originated the route may use this to bind an upstream-assigned label to one or more of the other PEs that belong to the same MVPN. The way in which PE Distinguisher Labels are used is discussed in Sections 6.4.1, 6.4.3, 11.2.2, and 12.3. Other uses of the PE Distinguisher Labels attribute are outside the scope of this document.
6.1.2. Receiving Intra-AS I-PMSI A-D Routes
The action to be taken when a PE receives an Intra-AS I-PMSI A-D route for a particular MVPN depends on the particular P-tunnel technology that is being used by that MVPN. If the P-tunnel technology requires tunnels to be built by means of receiver-initiated joins, the PE SHOULD join the tunnel immediately.
6.2. When C-flows Are Specifically Bound to P-Tunnels
This situation is discussed in Section 7.
6.3. Aggregating Multiple MVPNs on a Single P-Tunnel
When a P-multicast tree is shared across multiple MVPNs, it is termed an "Aggregate Tree". The procedures described in this document allow a single SP multicast tree to be shared across multiple MVPNs. Unless otherwise specified, P-multicast tree technology supports aggregation. All procedures that are specific to multi-MVPN aggregation are OPTIONAL and are explicitly pointed out.
Aggregate Trees allow a single P-multicast tree to be used across multiple MVPNs so that state in the SP core grows per set of MVPNs and not per MVPN. Depending on the congruence of the aggregated MVPNs, this may result in trading off optimality of multicast routing.
An Aggregate Tree can be used by a PE to provide a UI-PMSI or MI-PMSI service for more than one MVPN. When this is the case, the Aggregate Tree is said to have an inclusive mapping.
6.3.1. Aggregate Tree Leaf Discovery
BGP MVPN membership discovery (Section 4) allows a PE to determine the different Aggregate Trees that it should create and the MVPNs that should be mapped onto each such tree. The leaves of an Aggregate Tree are determined by the PEs, supporting aggregation, that belong to all the MVPNs that are mapped onto the tree.
If an Aggregate Tree is used to instantiate one or more S-PMSIs, then it may be desirable for the PE at the root of the tree to know which PEs (in its MVPN) are receivers on that tree. This enables the PE to decide when to aggregate two S-PMSIs, based on congruence (as discussed in the next section). Thus, explicit tracking may be required. Since the procedures for disseminating C-multicast routes do not provide explicit tracking, a type of A-D route known as a "Leaf A-D route" is used.
The PE that wants to assign a particular C-multicast flow to a particular Aggregate Tree can send an A-D route, which elicits Leaf A-D routes from the PEs that need to receive that C-multicast flow. This provides the explicit tracking information needed to support the aggregation methodology discussed in the next section. For more details on Leaf A-D routes, please refer to [MVPN-BGP].
6.3.2. Aggregation Methodology
This document does not specify the mandatory implementation of any particular set of rules for determining whether or not the PMSIs of two particular MVPNs are to be instantiated by the same Aggregate Tree. This determination can be made by implementation-specific heuristics, by configuration, or even perhaps by the use of offline tools.
It is the intention of this document that the control procedures will always result in all the PEs of an MVPN agreeing on the PMSIs that are to be used and on the tunnels used to instantiate those PMSIs.
This section discusses potential methodologies with respect to aggregation.
The "congruence" of aggregation is defined by the amount of overlap in the leaves of the customer trees that are aggregated on an SP tree. For Aggregate Trees with an inclusive mapping, the congruence depends on the overlap in the membership of the MVPNs that are aggregated on the tree. If there is complete overlap, i.e., all MVPNs have exactly the same sites, aggregation is perfectly congruent. As the overlap between the MVPNs that are aggregated reduces, i.e., the number of sites that are common across all the MVPNs reduces, the congruence reduces.
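The notion of congruence above is qualitative; this document does not define a formula for it. As an illustration only, the following sketch assumes one possible measure: the number of sites common to all aggregated MVPNs divided by the number of sites in any of them. The function names, the `memberships` mapping, and the `min_congruence` threshold are all hypothetical and are not part of this specification.

```python
def congruence(memberships):
    """Return an assumed congruence measure in the range [0.0, 1.0].

    memberships maps an MVPN name to the collection of sites in that MVPN.
    The measure is |sites common to all MVPNs| / |sites in any MVPN|.
    This ratio is an illustrative assumption, not a normative definition.
    """
    site_sets = [set(sites) for sites in memberships.values()]
    if not site_sets:
        raise ValueError("need at least one MVPN")
    union = set().union(*site_sets)
    if not union:
        return 1.0  # no sites at all: trivially identical membership
    common = set.intersection(*site_sets)
    return len(common) / len(union)


def may_aggregate(memberships, min_congruence):
    """Hypothetical policy check: aggregate only above a configured threshold."""
    return congruence(memberships) >= min_congruence
```

With complete overlap the measure is 1.0 (perfectly congruent), and it falls as fewer sites are common to all the aggregated MVPNs.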
If aggregation is done such that it is not perfectly congruent, a PE may receive traffic for MVPNs to which it doesn't belong. As the amount of multicast traffic in these unwanted MVPNs increases, aggregation becomes less optimal with respect to delivered traffic. Hence, there is a trade-off between reducing state and delivering unwanted traffic.
An implementation should provide knobs to control the congruence of aggregation. These knobs are implementation dependent. Configuring the percentage of sites that MVPNs must have in common to be aggregated is an example of such a knob. This will allow an SP to deploy aggregation depending on the MVPN membership and traffic profiles in its network. If different PEs or servers are setting up Aggregate Trees, this will also allow a service provider to engineer the maximum amount of unwanted MVPNs for which a particular PE may receive traffic.
6.3.3. Demultiplexing C-Multicast Traffic
If a P-multicast tree is associated with only one MVPN, determining the P-multicast tree on which a packet was received is sufficient to determine the packet's MVPN. All that the egress PE needs to know is the MVPN with which the P-multicast tree is associated.
When multiple MVPNs are aggregated onto one P-multicast tree, determining the tree over which the packet is received is not sufficient to determine the MVPN to which the packet belongs. The packet must also carry some demultiplexing information to allow the egress PEs to determine the MVPN to which the packet belongs. Since the packet has been multicast through the P-network, any given demultiplexing value must have the same meaning to all the egress PEs.
The demultiplexing value is an MPLS label that corresponds to the multicast VRF to which the packet belongs. This label is placed by the ingress PE immediately beneath the P-multicast tree header. Each of the egress PEs must be able to associate this MPLS label with the same MVPN. If downstream-assigned labels were used, this would require all the egress PEs in the MVPN to agree on a common label for the MVPN. Instead, the MPLS label is upstream-assigned [MPLS-UPSTREAM-LABEL]. The label bindings are advertised via BGP Updates originated by the ingress PEs.
This procedure requires each egress PE to support a separate label space for every other PE. The egress PEs create a forwarding entry for the upstream-assigned MPLS label, allocated by the ingress PE, in this label space. Hence, when the egress PE receives a packet over an Aggregate Tree, it first determines the tree over which the packet was received. The tree identifier determines the label space in which the upstream-assigned MPLS label lookup has to be performed.
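The two-step lookup described above (tree identifier selects the label space, then the upstream-assigned label selects the MVPN) can be sketched as follows. This is a minimal illustration, not an implementation of any particular forwarding plane. The class `EgressPE`, its method names, and the choice to key each label space by the ingress PE are assumptions made for the example.

```python
class EgressPE:
    """Toy model of demultiplexing on an Aggregate Tree at an egress PE."""

    def __init__(self):
        # tree identifier -> label space (assumed here: one space per ingress PE)
        self.tree_to_space = {}
        # (label space, upstream-assigned label) -> MVPN name
        self.label_table = {}

    def learn_binding(self, tree_id, ingress_pe, label, mvpn):
        """Record a binding learned from a BGP Update sent by ingress_pe."""
        self.tree_to_space[tree_id] = ingress_pe
        self.label_table[(ingress_pe, label)] = mvpn

    def demux(self, tree_id, label):
        """Return the MVPN for a packet, or None if no binding is known."""
        space = self.tree_to_space.get(tree_id)
        if space is None:
            return None  # unknown tree: no label space to search
        return self.label_table.get((space, label))
```

Because each ingress PE's labels live in their own space, two ingress PEs may assign the same label value to different MVPNs without ambiguity at the egress PE.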
The same label space may be used for all P-multicast trees rooted at the same ingress PE or an implementation may decide to use a separate label space for every P-multicast tree.
A full specification of the procedures to support aggregation on shared trees or on MP2MP LSPs is outside the scope of this document.
The encapsulation format is either MPLS or MPLS-in-something (e.g., MPLS-in-GRE [MPLS-IP]). When MPLS is used, this label will appear immediately below the label that identifies the P-multicast tree. When MPLS-in-GRE is used, this label will be the top MPLS label that appears when the GRE header is stripped off.
When IP encapsulation is used for the P-multicast tree, whatever information that particular encapsulation format uses for identifying a particular tunnel is used to determine the label space in which the MPLS label is looked up.
If the P-multicast tree uses MPLS encapsulation, the P-multicast tree is itself identified by an MPLS label. The egress PE MUST NOT advertise IMPLICIT NULL or EXPLICIT NULL for that tree. Once the label representing the tree is popped off the MPLS label stack, the next label is the demultiplexing information that allows the proper MVPN to be determined.
This specification requires that, to support this sort of aggregation, there be at least one upstream-assigned label per MVPN. It does not require that there be only one. For example, an ingress PE could assign a unique label to each (C-S,C-G). (This could be done using the same technique that is used to assign a particular (C-S,C-G) to an S-PMSI, see Section 7.4.)
When an egress PE receives a C-multicast data packet over a P-multicast tree, it needs to forward the packet to the CEs that have receivers in the packet's C-multicast group. In order to do this, the egress PE needs to determine the P-tunnel on which the packet was received. The PE can then determine the MVPN that the packet belongs to and, if needed, do any further lookups that are needed to forward the packet.
6.4. Considerations for Specific Tunnel Technologies
While it is believed that the architecture specified in this document places no limitations on the protocols used for setting up and maintaining P-tunnels, the only protocols that have been explicitly considered are PIM-SM (both the SSM and ASM service models are considered, as are bidirectional trees), RSVP-TE, mLDP, and BGP. (BGP's role in the setup and maintenance of P-tunnels is to "stitch" together the intra-AS segments of a segmented inter-AS P-tunnel.)
6.4.1. RSVP-TE P2MP LSPs
If an I-PMSI is to be instantiated as one or more non-segmented P-tunnels, where the P-tunnels are RSVP-TE P2MP LSPs, then only the PEs that are at the head ends of those LSPs will ever include the PMSI Tunnel attribute in their Intra-AS I-PMSI A-D routes. (These will be the PEs in the "Sender Sites set".)
If an I-PMSI is to be instantiated as one or more segmented P-tunnels, where some of the intra-AS segments of these tunnels are RSVP-TE P2MP LSPs, then only a PE or ASBR that is at the head end of one of these LSPs will ever include the PMSI Tunnel attribute in its Inter-AS I-PMSI A-D route.
Other PEs send Intra-AS I-PMSI A-D routes without PMSI Tunnel attributes. (These will be the PEs that are in the "Receiver Sites set" but not in the "Sender Sites set".) As each "Sender Site" PE receives an Intra-AS I-PMSI A-D route from a PE in the Receiver Sites set, it adds the PE originating that Intra-AS I-PMSI A-D route to the set of receiving PEs for the P2MP LSP. The PE at the head end MUST then use RSVP-TE [RSVP-P2MP] signaling to add the receiver PEs to the P-tunnel.
When RSVP-TE P2MP LSPs are used to instantiate S-PMSIs, and a particular C-flow is to be bound to the LSP, it is necessary to use explicit tracking so that the head end of the LSP knows which PEs need to receive data from the specified C-flow. If the binding is done using S-PMSI A-D routes (see Section 7.4.1), the "Leaf Information Required" bit MUST be set in the PMSI Tunnel attribute.
RSVP-TE P2MP LSPs can optionally support aggregation of multiple MVPNs.
If an RSVP-TE P2MP LSP Tunnel is used for only a single MVPN, the mapping between the LSP and the MVPN can either be configured or be deduced from the procedures used to announce the LSP (e.g., from the RTs in the A-D route that announced the LSP). If the LSP is used for multiple MVPNs, the set of MVPNs using it (and the corresponding MPLS labels) is inferred from the PMSI Tunnel attributes that specify the LSP.
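The head-end behavior described earlier in this section, in which each Intra-AS I-PMSI A-D route received from a PE in the Receiver Sites set adds a leaf to the P2MP LSP, can be sketched as follows. This is an illustration under stated assumptions: the class and method names are hypothetical, membership in the MVPN is approximated by a Route Target match, and `signal_leaf` is a stand-in callback for the RSVP-TE [RSVP-P2MP] signaling, which is not modeled.

```python
class P2mpHeadEnd:
    """Toy model of a Sender Site PE collecting leaves for an I-PMSI LSP."""

    def __init__(self, import_rts, signal_leaf):
        self.import_rts = set(import_rts)  # RTs of the MVPN (assumed match rule)
        self.signal_leaf = signal_leaf     # placeholder for RSVP-TE signaling
        self.leaves = set()                # receiving PEs of the P2MP LSP

    def on_intra_as_ipmsi_ad_route(self, originator, route_targets):
        """Process one received A-D route; return True if a leaf was added."""
        if not self.import_rts & set(route_targets):
            return False  # route belongs to some other MVPN
        if originator in self.leaves:
            return False  # already a leaf; nothing to signal
        self.leaves.add(originator)
        self.signal_leaf(originator)
        return True
```

In this sketch a repeated advertisement from the same PE does not trigger signaling again; how a real implementation handles route updates and withdrawals is outside what the example covers.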
If an RSVP-TE P2MP LSP is being used to carry a set of C-flows traveling along a bidirectional C-tree, using the procedures of Section 11.2, the head end MUST include the PE Distinguisher Labels attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D route, and it MUST provide an upstream-assigned label for each PE that it has selected as the Upstream PE for the C-tree's RPA (Rendezvous Point Address). See Section 11.2 for details.
A PMSI Tunnel attribute specifying an RSVP-TE P2MP LSP contains the following information:
- The type of the tunnel is set to RSVP-TE P2MP Tunnel.
- The RSVP-TE P2MP Tunnel's SESSION Object.
- Optionally, the RSVP-TE P2MP LSP's SENDER_TEMPLATE Object. This object is included when it is desired to identify a particular P2MP TE LSP.
Demultiplexing the C-multicast data packets at the egress PE follows procedures described in Section 6.3.3. As specified in Section 6.3.3, an egress PE MUST NOT advertise IMPLICIT NULL or EXPLICIT NULL for an RSVP-TE P2MP LSP that is carrying traffic for one or more MVPNs.
If (and only if) a particular RSVP-TE P2MP LSP is possibly carrying data from multiple MVPNs, the following special procedures apply:
- A packet in a particular MVPN, when transmitted into the LSP, must carry the MPLS label specified in the PMSI Tunnel attribute that announced that LSP as a P-tunnel for that MVPN.
- Demultiplexing the C-multicast data packets at the egress PE is done by means of the MPLS label that rises to the top of the stack after the label corresponding to the P2MP LSP is popped off.
It is possible that at the time a PE learns, via an A-D route with a PMSI Tunnel attribute, that it needs to receive traffic on a particular RSVP-TE P2MP LSP, the signaling to set up the LSP will not have been completed. In this case, the PE needs to wait for the RSVP-TE signaling to take place before it can modify its forwarding tables as directed by the A-D route.
It is also possible that the signaling to set up an RSVP-TE P2MP LSP will be completed before a given PE learns, via a PMSI Tunnel attribute, of the use to which that LSP will be put. The PE MUST discard any traffic received on that LSP until that time.
In order for the egress PE to be able to discard such traffic, it needs to know that the LSP is associated with an MVPN and that the A-D route that binds the LSP to an MVPN or to a particular C-flow has not yet been received. This is provided by extending [RSVP-P2MP] with [RSVP-OOB].
6.4.2. PIM Trees
When the P-tunnels are PIM trees, the PMSI Tunnel attribute contains enough information to allow each other PE in the same MVPN to use P-PIM signaling to join the P-tunnel.
If an I-PMSI is to be instantiated as one or more PIM trees, then the PE that is at the root of a given PIM tree sends an Intra-AS I-PMSI A-D route containing a PMSI Tunnel attribute that contains all the information needed for other PEs to join the tree.
If PIM trees are to be used to instantiate an MI-PMSI, each PE in the MVPN must send an Intra-AS I-PMSI A-D route containing such a PMSI Tunnel attribute.
If a PMSI is to be instantiated via a shared tree, the PMSI Tunnel attribute identifies the P-group address. The RP or RPA corresponding to the P-group address is not specified. It must, of course, be known to all the PEs. It is presupposed that the PEs use one of the methods for automatically learning the RP-to-group correspondences (e.g., Bootstrap Router Protocol [BSR]), or else that the correspondence is configured.
If a PMSI is to be instantiated via a source-specific tree, the PMSI Tunnel attribute identifies the PE router that is the root of the tree, as well as a P-group address. The PMSI Tunnel attribute always specifies whether the PIM tree is to be a unidirectional shared tree, a bidirectional shared tree, or a source-specific tree.
If PIM trees are being used to instantiate S-PMSIs, the above procedures assume that each PE router has a set of group P-addresses that it can use for setting up the PIM-trees. Each PE must be configured with this set of P-addresses. If the P-tunnels are source-specific trees, then the PEs may be configured with overlapping sets of group P-addresses. If the trees are not source-specific, then each PE must be configured with a unique set of group P-addresses (i.e., having no overlap with the set configured at any other PE router). The management of this set of addresses is thus greatly simplified when source-specific trees are used, so the use of source-specific trees is strongly recommended whenever unidirectional trees are desired.
Specification of the full set of procedures for using bidirectional PIM trees to instantiate S-PMSIs is outside the scope of this document.
Details for constructing the PMSI Tunnel attribute identifying a PIM tree can be found in [MVPN-BGP].
6.4.3. mLDP P2MP LSPs
When the P-tunnels are mLDP P2MP trees, each Intra-AS I-PMSI A-D route has a PMSI Tunnel attribute containing enough information to allow each other PE in the same MVPN to use mLDP signaling to join the P-tunnel. The tunnel identifier consists of a P2MP Forwarding Equivalence Class (FEC) Element [mLDP].
An mLDP P2MP LSP may be used to carry the traffic of multiple VPNs, if the PMSI Tunnel attribute specifying it contains a non-zero MPLS label.
If an mLDP P2MP LSP is being used to carry the set of flows traveling along a particular bidirectional C-tree, using the procedures of Section 11.2, the root of the LSP MUST include the PE Distinguisher Labels attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D route, and it MUST provide an upstream-assigned label for the PE that it has selected to be the Upstream PE for the C-tree's RPA. See Section 11.2 for details.
6.4.4. mLDP MP2MP LSPs
The specification of the procedures for assigning C-flows to mLDP MP2MP LSPs that serve as P-tunnels is outside the scope of this document.
6.4.5. Ingress Replication
As described in Section 3, a PMSI can be instantiated using Unicast Tunnels between the PEs that are participating in the MVPN. In this mechanism, the ingress PE replicates a C-multicast data packet belonging to a particular MVPN and sends a copy to all or a subset of the PEs that belong to the MVPN. A copy of the packet is tunneled to a remote PE over a Unicast Tunnel to the remote PE. IP/GRE Tunnels or MPLS LSPs are examples of unicast tunnels that may be used. The same Unicast Tunnel can be used to transport packets belonging to different MVPNs.
In order for a PE to use Unicast P-tunnels to send a C-multicast data packet for a particular MVPN to a set of remote PEs, the remote PEs must be able to correctly decapsulate such packets and to assign each one to the proper MVPN. This requires that the encapsulation used for sending packets through the P-tunnel have demultiplexing information that the receiver can associate with a particular MVPN.
If ingress replication is being used to instantiate the PMSIs for an MVPN, the PEs announce this as part of the BGP-based MVPN membership auto-discovery process, described in Section 4. The PMSI Tunnel attribute specifies ingress replication; it also specifies a downstream-assigned MPLS label. This label will be used to identify that a particular packet belongs to the MVPN that the Intra-AS I-PMSI A-D route belongs to (as inferred from its RTs). If PE1 specifies a particular label value for a particular MVPN, then any other PE sending PE1 a packet for that MVPN through a unicast P-tunnel must put that label on the packet's label stack. PE1 then treats that label as the demultiplexor value identifying the MVPN in question.
Ingress replication may be used to instantiate any kind of PMSI. When ingress replication is done, it is RECOMMENDED, except in the one particular case mentioned in the next paragraph, that explicit tracking be done and that the data packets of a particular C-flow only get sent to those PEs that need to see the packets of that C-flow. There is never any need to use the procedures of Section 7.4 for binding particular C-flows to particular P-tunnels.
The particular case in which there is no need for explicit tracking is the case where ingress replication is being used to create a one-hop ASBR-ASBR inter-AS segment of a segmented inter-AS P-tunnel.
Section 9.1 specifies three different methods that can be used to prevent duplication of multicast data packets. Any given deployment must use at least one of those methods.
Note that the method described in Section 9.1.1 ("Discarding Packets from Wrong PE") presupposes that the egress PE of a P-tunnel can, upon receiving a packet from the P-tunnel, determine the identity of the PE that transmitted the packet into the P-tunnel. SPs that use ingress replication to instantiate their PMSIs are cautioned against using, for this purpose, unicast P-tunnel technologies that do not allow the egress PE to identify the ingress PE (e.g., MP2P LSPs for which penultimate-hop-popping is done). Deployment of ingress replication with such P-tunnel technology MUST NOT be done unless it is known that the deployment relies entirely on the procedures of Sections 9.1.2 or 9.1.3 for duplicate prevention.
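As an illustration of ingress replication with downstream-assigned labels and explicit tracking, as described earlier in this section, the following sketch models the sending side only. It is not a definitive implementation: the class `IngressReplicator`, its methods, and the tuple returned by `replicate` are hypothetical, and the unicast tunnel itself (IP/GRE or MPLS LSP) is not modeled.

```python
class IngressReplicator:
    """Toy model of an ingress PE replicating C-multicast packets."""

    def __init__(self):
        # (remote PE, MVPN) -> downstream-assigned label advertised by that PE
        self.labels = {}
        # (MVPN, C-flow) -> PEs known, via explicit tracking, to need the flow
        self.receivers = {}

    def learn_label(self, remote_pe, mvpn, label):
        """Record the label a remote PE advertised for an MVPN."""
        self.labels[(remote_pe, mvpn)] = label

    def track(self, mvpn, c_flow, remote_pe):
        """Record that remote_pe needs to receive the given C-flow."""
        self.receivers.setdefault((mvpn, c_flow), set()).add(remote_pe)

    def replicate(self, mvpn, c_flow, payload):
        """Return one (remote PE, label, payload) copy per tracked receiver."""
        copies = []
        for pe in sorted(self.receivers.get((mvpn, c_flow), ())):
            label = self.labels.get((pe, mvpn))
            if label is None:
                continue  # this PE has not advertised a label for the MVPN
            copies.append((pe, label, payload))
        return copies
```

Each copy carries the label chosen by its destination PE, so the receiver can map the packet to the correct MVPN, and only PEs recorded by explicit tracking receive a copy.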
7. Binding Specific C-Flows to Specific P-Tunnels
As discussed previously, Intra-AS I-PMSI A-D routes may (or may not) have PMSI Tunnel attributes, identifying P-tunnels that can be used as the default P-tunnels for carrying C-multicast traffic, i.e., for carrying C-multicast traffic that has not been specifically bound to another P-tunnel.
If none of the Intra-AS I-PMSI A-D routes originated by a particular PE for a particular MVPN carry PMSI Tunnel attributes at all (or if the only PMSI Tunnel attributes they carry have type "No tunnel information present"), then there are no default P-tunnels for that PE to use when transmitting C-multicast traffic in that MVPN to other PEs. In that case, all such C-flows must be assigned to specific P-tunnels using one of the mechanisms specified in Section 7.4. That is, all such C-flows are carried on P-tunnels that instantiate S-PMSIs.
There are other cases where it may be either necessary or desirable to use the mechanisms of Section 7.4 to identify specific C-flows and bind them to or unbind them from specific P-tunnels. Some possible cases are as follows:
- The policy for a particular MVPN is to send all C-data on S-PMSIs, even if the Intra-AS I-PMSI A-D routes carry PMSI Tunnel attributes. (This is another case where all C-data is carried on S-PMSIs; presumably, the I-PMSIs are used for control information.)
- It is desired to optimize the routing of the particular C-flow, which may already be traveling on an I-PMSI, by sending it instead on an S-PMSI.
- If a particular C-flow is traveling on an S-PMSI, it may be considered desirable to move it to an I-PMSI (i.e., optimization of the routing for that flow may no longer be considered desirable).
- It is desired to change the encapsulation used to carry the C-flow, e.g., because one now wants to aggregate it on a P-tunnel with flows from other MVPNs.
Note that if Full PIM Peering over an MI-PMSI (Section 5.2) is being used, then from the perspective of the PIM state machine, the "interface" connecting the PEs to each other is the MI-PMSI, even if some or all of the C-flows are being sent on S-PMSIs. That is, from the perspective of the C-PIM state machine, when a C-flow is being sent or received on an S-PMSI, the output or input interface (respectively) is considered to be the MI-PMSI.
Section 7.1 discusses certain general considerations that apply whenever a specified C-flow is bound to a specified P-tunnel using the mechanisms of Section 7.4. This includes the case where the C-flow is moved from one P-tunnel to another as well as the case where the C-flow is initially bound to an S-PMSI P-tunnel.
Section 7.2 discusses the specific case of using the mechanisms of Section 7.4 as a way of optimizing multicast routing by switching specific flows from one P-tunnel to another.
Section 7.3 discusses the case where the mechanisms of Section 7.4 are used to announce the presence of "unsolicited flooded data" and to assign such data to a particular P-tunnel.
Section 7.4 specifies the protocols for assigning specific C-flows to specific P-tunnels. These protocols may be used to assign a C-flow to a P-tunnel initially or to switch a flow from one P-tunnel to another.
Procedures for binding to a specified P-tunnel the set of C-flows traveling along a specified C-tree (or for so binding a set of C-flows that share some relevant characteristic), without identifying each flow individually, are outside the scope of this document.
7.1. General Considerations
7.1.1. At the PE Transmitting the C-Flow on the P-Tunnel
The decision to bind a particular C-flow (designated as (C-S,C-G)) to a particular P-tunnel, or to switch a particular C-flow to a particular P-tunnel, is always made by the PE that is to transmit the C-flow onto the P-tunnel.
Whenever a PE moves a particular C-flow from one P-tunnel, say P1, to another, say P2, care must be taken to ensure that there is no steady state duplication of traffic. At any given time, the PE transmits the C-flow either on P1 or on P2, but not on both.
When a particular PE, say PE1, decides to bind a particular C-flow to a particular P-tunnel, say P2, the following procedures MUST be applied:
- PE1 must issue the required control plane information to signal that the specified C-flow is now bound to P-tunnel P2 (see Section 7.4).
- If P-tunnel P2 needs to be constructed from the root downwards, PE1 must initiate the signaling to construct P2. This is only required if P2 is an RSVP-TE P2MP LSP.
- If the specified C-flow is currently bound to a different P-tunnel, say P1, then:
  * PE1 MUST wait for a "switch-over" delay before sending traffic of the C-flow on P-tunnel P2. It is RECOMMENDED to allow this delay to be configurable.
  * Once the "switch-over" delay has elapsed, PE1 MUST send traffic for the C-flow on P2 and MUST NOT send it on P1. In no case is any C-flow packet sent on both P-tunnels.
When a C-flow is switched from one P-tunnel to another, the purpose of running a switch-over timer is to minimize packet loss without introducing packet duplication. However, jitter may be introduced due to the difference in transit delays between the old and new P-tunnels.
For best effect, the switch-over timer should be configured to a value that is "just long enough" (a) to allow all the PEs to learn about the new binding of C-flow to P-tunnel and (b) to allow the PEs to construct the P-tunnel, if it doesn't already exist.
If, after such a switch, the "old" P-tunnel P1 is no longer needed, it SHOULD be torn down and the resources supporting it freed. The procedures for "tearing down" a P-tunnel are specific to the P-tunnel technology.
Procedures for binding sets of C-flows traveling along specified C-trees (or sets of C-flows sharing any other characteristic) to a specified P-tunnel (or for moving them from one P-tunnel to another) are outside the scope of this document.
7.1.2. At the PE Receiving the C-flow from the P-Tunnel
Suppose that a particular PE, say PE1, learns, via the procedures of Section 7.4, that some other PE, say PE2, has bound a particular C-flow, designated as (C-S,C-G), to a particular P-tunnel, say P2. Then, PE1 must determine whether it needs to receive (C-S,C-G) traffic from PE2.
If BGP is being used to distribute C-multicast routing information from PE to PE, the conditions under which PE1 needs to receive (C-S,C-G) traffic from PE2 are specified in Section 12.3 of [MVPN-BGP].
If PIM over an MI-PMSI is being used to distribute C-multicast routing from PE to PE, PE1 needs to receive (C-S,C-G) traffic from PE2 if one or more of the following conditions holds:
- PE1 has (C-S,C-G) state such that PE2 is PE1's Upstream PE for (C-S,C-G), and PE1 has downstream neighbors ("non-null olist") for the (C-S,C-G) state.
- PE1 has (C-*,C-G) state with an Upstream PE (not necessarily PE2) and with downstream neighbors ("non-null olist"), but PE1 does not have (C-S,C-G) state.
- Native PIM methods are being used to prevent steady-state packet duplication, and PE1 has either (C-*,C-G) or (C-S,C-G) state such that the MI-PMSI is one of the downstream interfaces. Note that this includes the case where PE1 is itself sending (C-S,C-G) traffic on an S-PMSI. (In this case, PE1 needs to receive the (C-S,C-G) traffic from PE2 in order to allow the PIM Assert mechanism to function properly.)
Irrespective of whether BGP or PIM is being used to distribute C-multicast routing information, once PE1 determines that it needs to receive (C-S,C-G) traffic from PE2, the following procedures MUST be applied:
- PE1 MUST take all necessary steps to be able to receive the (C-S,C-G) traffic on P2.
  * If P2 is a PIM tunnel or an mLDP LSP, PE1 will need to use PIM or mLDP (respectively) to join P2 (unless it is already joined to P2).
  * PE1 may need to modify the forwarding state for (C-S,C-G) to indicate that (C-S,C-G) traffic is to be accepted on P2. If P2 is an Aggregate Tree, this also implies setting up the demultiplexing forwarding entries based on the inner label as described in Section 6.3.3.
- If PE1 was previously receiving the (C-S,C-G) C-flow on another P-tunnel, say P1, then:
  * PE1 MAY run a switch-over timer, and until it expires, SHOULD accept traffic for the given C-flow on both P1 and P2;
  * If, after such a switch, the "old" P-tunnel P1 is no longer needed, it SHOULD be torn down and the resources supporting it freed. The procedures for "tearing down" a P-tunnel are specific to the P-tunnel technology.
- If PE1 later determines that it no longer needs to receive any of the C-multicast data that is being sent on a particular P-tunnel, it may initiate signaling (specific to the P-tunnel technology) to remove itself from that tunnel.
7.2. Optimizing Multicast Distribution via S-PMSIs
Whenever a particular multicast stream is being sent on an I-PMSI, it is likely that the data of that stream is being sent to PEs that do not require it. If a particular stream has a significant amount of traffic, it may be beneficial to move it to an S-PMSI that includes only those PEs that are transmitters and/or receivers (or at least includes fewer PEs that are neither).
If explicit tracking is being done, S-PMSI creation can also be triggered on other criteria. For instance, there could be a "pseudo-wasted bandwidth" criterion: switching to an S-PMSI would be done if the bandwidth multiplied by the number of uninterested PEs (PEs that are receiving the stream but have no receivers) is above a specified threshold. The motivation is that (a) the total bandwidth wasted by many sparsely subscribed low-bandwidth groups may be large and (b) there's no point to moving a high-bandwidth group to an S-PMSI if all the PEs have receivers for it.
Switching a (C-S,C-G) stream to an S-PMSI may require the root of the S-PMSI to determine the egress PEs that need to receive the (C-S,C-G) traffic. This is true in the following cases:
- If the P-tunnel is a source-initiated tree, such as an RSVP-TE P2MP Tunnel, the PE needs to know the leaves of the tree before it can instantiate the S-PMSI.
- If a PE instantiates multiple S-PMSIs, belonging to different MVPNs, using one P-multicast tree, such a tree is termed an Aggregate Tree with a selective mapping. The setting up of such an Aggregate Tree requires the ingress PE to know all the other PEs that have receivers for multicast groups that are mapped onto the tree.
Both of the cases listed above (a source-initiated tree and an Aggregate Tree with a selective mapping) require that explicit tracking be done for the (C-S,C-G) stream. The root of the S-PMSI MAY decide to do explicit tracking of this stream only after it has determined to move the stream to an S-PMSI, or it MAY have been doing explicit tracking all along.
If the S-PMSI is instantiated by a P-multicast tree, the PE at the root of the tree must signal the leaves of the tree that the (C-S,C-G) stream is now bound to the S-PMSI. Note that the PE could create the identity of the P-multicast tree prior to the actual instantiation of the P-tunnel.
If the S-PMSI is instantiated by a source-initiated P-multicast tree (e.g., an RSVP-TE P2MP tunnel), the PE at the root of the tree must establish the source-initiated P-multicast tree to the leaves. This tree MAY have been established before the leaves receive the S-PMSI binding, or it MAY be established after the leaves receive the binding. The leaves MUST NOT switch to the S-PMSI until they receive both the binding and the tree signaling message.
7.3. Announcing the Presence of Unsolicited Flooded Data
A PE may receive "unsolicited" data from a CE, where the data is intended to be flooded to the other PEs of the same MVPN and then on to other CEs. By "unsolicited", we mean that the data is to be delivered to all the other PEs of the MVPN, even though those PEs may not have sent any control information indicating that they need to receive that data. For example, if the BSR [BSR] is being used within the MVPN, BSR control messages may be received by a PE from a CE. These need to be forwarded to other PEs, even though no PE ever issues any kind of explicit signal saying that it wants to receive BSR messages. If a PE receives a BSR message from a CE, and if the CE's MVPN has an MI-PMSI, then the PE can just send BSR messages on the appropriate P-tunnel. Otherwise, the PE MUST announce the binding of a particular C-flow to a particular P-tunnel, using the procedures of Section 7.4. The particular C-flow in this case would be (C-IPaddress_of_PE, ALL-PIM-ROUTERS). The P-tunnel identified by the procedures of Section 7.4 may or may not be one that was previously identified in the PMSI Tunnel attribute of an I-PMSI A-D route. Further procedures for handling BSR may be found in Sections 5.2.1 and 5.3.4.
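The BSR handling decision described above can be sketched as a small function. This is only an illustrative model: the function name, the action labels, and the example PE address are hypothetical, and the value 224.0.0.13 for ALL-PIM-ROUTERS is taken from Section 7.4.2.1.

```python
ALL_PIM_ROUTERS = "224.0.0.13"


def bsr_forwarding_action(pe_c_address, mvpn_has_mi_pmsi):
    """Decide how a PE forwards a BSR message received from a CE.

    Returns a pair (action, c_flow). The action labels are hypothetical
    names used only in this sketch.
    """
    if mvpn_has_mi_pmsi:
        # The MI-PMSI already reaches all PEs of the MVPN, so no
        # binding announcement is needed.
        return ("send_on_mi_pmsi", None)
    # Otherwise the PE announces a binding (Section 7.4) for the C-flow
    # (C-IPaddress_of_PE, ALL-PIM-ROUTERS).
    return ("announce_binding", (pe_c_address, ALL_PIM_ROUTERS))


with_mi_pmsi = bsr_forwarding_action("192.0.2.1", True)
without_mi_pmsi = bsr_forwarding_action("192.0.2.1", False)
```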
Analogous procedures may be used for announcing the presence of other sorts of unsolicited flooded data, e.g., dense mode data or data from proprietary protocols that presume messages can be flooded. However, a full specification of the procedures for traffic other than BSR traffic is outside the scope of this document.
7.4. Protocols for Binding C-Flows to P-Tunnels
We describe two protocols for binding C-flows to P-tunnels. These protocols can be used for moving C-flows from I-PMSIs to S-PMSIs, as long as the S-PMSI is instantiated by a P-multicast tree. (If the S-PMSI is instantiated by means of ingress replication, the procedures of Section 6.4.5 suffice.) These protocols can also be used for other cases in which it is necessary to bind specific C-flows to specific P-tunnels.
7.4.1. Using BGP S-PMSI A-D Routes
Notwithstanding the name of the mechanism ("S-PMSI A-D routes"), the mechanism specified in this section may be used any time it is necessary to advertise a binding of a C-flow to a particular P-tunnel.
7.4.1.1. Advertising C-Flow Binding to P-Tunnel
The ingress PE informs all the PEs that are on the path to receivers of the (C-S,C-G) of the binding of the P-tunnel to the (C-S,C-G). The BGP announcement is done by sending an update for the MCAST-VPN address family. An S-PMSI A-D route is used, containing the following information:
1. The IP address of the originating PE.
2. The RD configured locally for the MVPN. This is required to uniquely identify the (C-S,C-G), as the addresses could overlap between different MVPNs. This is the same RD value used in the auto-discovery process.
3. The C-S address.
4. The C-G address.
5. A PE MAY use a single P-tunnel to aggregate two or more S-PMSIs. If the PE already advertised unaggregated S-PMSI A-D routes for these S-PMSIs, then a decision to aggregate them requires the PE to re-advertise these routes. The re-advertised routes MUST be the same as the original ones, except for the PMSI Tunnel attribute. If the PE has not previously advertised S-PMSI A-D routes for these S-PMSIs, then the aggregation requires the PE to advertise (new) S-PMSI A-D routes for these S-PMSIs. The PMSI Tunnel attribute in the newly advertised/re-advertised routes MUST carry the identity of the P-tunnel that aggregates the S-PMSIs. If all these aggregated S-PMSIs belong to the same MVPN, and this MVPN uses PIM as its C-multicast routing protocol, then the corresponding S-PMSI A-D routes MAY carry an MPLS upstream-assigned label [MPLS-UPSTREAM-LABEL]. Moreover, in this case, the labels MUST be distinct on a per-MVPN basis, and MAY be distinct on a per-route basis. If all these aggregated S-PMSIs belong to MVPN(s) that use mLDP as their C-multicast routing protocol, then the corresponding S-PMSI A-D routes MUST carry an MPLS upstream-assigned label [MPLS-UPSTREAM-LABEL], and these labels MUST be distinct on a per-route (per-mLDP-FEC) basis, irrespective of whether the aggregated S-PMSIs belong to the same or different MVPNs.
When a PE distributes this information via BGP, it must include the following:
1. An identifier for the particular P-tunnel to which the stream is to be bound. This identifier is a structured field that includes the following information:
   * The type of tunnel
   * An identifier for the tunnel. The form of the identifier will depend upon the tunnel type. The combination of tunnel identifier and tunnel type should contain enough information to enable all the PEs to "join" the tunnel and receive messages from it.
2. Route Target Extended Communities attribute. This is used as described in Section 4.
7.4.1.2. Explicit Tracking
If the PE wants to enable explicit tracking for the specified flow, it also indicates this in the A-D route it uses to bind the flow to a particular P-tunnel. Then, any PE that receives the A-D route will respond with a "Leaf A-D route" in which it identifies itself as a receiver of the specified flow. The Leaf A-D route will be withdrawn when the PE is no longer a receiver for the flow.
If the PE needs to enable explicit tracking for a flow without at the same time binding the flow to a specific P-tunnel, it can do so by sending an S-PMSI A-D route whose NLRI identifies the flow and whose PMSI Tunnel attribute has its tunnel type value set to "no tunnel information present" and its "leaf information required" bit set to 1. This will elicit the Leaf A-D routes. This is useful when the PE needs to know the receivers before selecting a P-tunnel.
7.4.2. UDP-Based Protocol
This procedure carries its control messages in UDP and requires that the MVPN have an MI-PMSI that can be used to carry the control messages.
7.4.2.1. Advertising C-Flow Binding to P-Tunnel
In order for a given PE to move a particular C-flow to a particular P-tunnel, an "S-PMSI Join message" is sent periodically on the MI-PMSI. (Notwithstanding the name of the mechanism, the mechanism may be used to bind a flow to any P-tunnel.) The S-PMSI Join message is a UDP-encapsulated message whose destination address is ALL-PIM-ROUTERS (224.0.0.13) and whose destination port is 3232.
The S-PMSI Join message contains the following information:
- An identifier for the particular multicast stream that is to be bound to the P-tunnel. This can be represented as an (S,G) pair.
- An identifier for the particular P-tunnel to which the stream is to be bound. This identifier is a structured field that includes the following information:
   * The type of tunnel used to instantiate the S-PMSI.
   * An identifier for the tunnel. The form of the identifier will depend upon the tunnel type. The combination of tunnel identifier and tunnel type should contain enough information to enable all the PEs to "join" the tunnel and receive messages from it.
   * If (and only if) the identified P-tunnel is aggregating several S-PMSIs, any demultiplexing information needed by the tunnel encapsulation protocol to identify a particular S-PMSI.
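The information carried in an S-PMSI Join, as listed above, can be modeled with a couple of small record types. The following Python sketch is not a wire format; the class names, field names, the "pim-ssm" tunnel type label, and the example addresses are all hypothetical. Only the destination address and port constants come from the text.

```python
from dataclasses import dataclass
from typing import Optional

ALL_PIM_ROUTERS = "224.0.0.13"  # destination address given in the text
S_PMSI_JOIN_UDP_PORT = 3232     # destination port given in the text


@dataclass(frozen=True)
class PTunnelIdentifier:
    tunnel_type: str                    # hypothetical label, e.g. "pim-ssm"
    tunnel_id: bytes                    # form depends on the tunnel type
    demux_info: Optional[bytes] = None  # only for aggregating P-tunnels


@dataclass(frozen=True)
class SPmsiJoinInfo:
    c_source: str                       # the S of the (S,G) pair
    c_group: str                        # the G of the (S,G) pair
    tunnel: PTunnelIdentifier
    tunnel_is_aggregating: bool = False

    def validate(self) -> None:
        """Demultiplexing information is present if and only if the
        identified P-tunnel aggregates several S-PMSIs."""
        has_demux = self.tunnel.demux_info is not None
        if has_demux != self.tunnel_is_aggregating:
            raise ValueError(
                "demux_info must be present if and only if the "
                "P-tunnel is aggregating several S-PMSIs"
            )


# Example values only: an unaggregated binding passes validation.
join = SPmsiJoinInfo(
    c_source="192.0.2.10",
    c_group="232.1.1.1",
    tunnel=PTunnelIdentifier("pim-ssm", b"\xef\x01\x01\x01"),
)
join.validate()
```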
If the policy for the MVPN is that traffic is sent/received by default over an MI-PMSI, then traffic for a particular C-flow can be switched back to the MI-PMSI simply by ceasing to send S-PMSI Joins for that C-flow.
Note that an S-PMSI Join that is not received over a PMSI (e.g., one that is received directly from a CE) is an illegal packet that MUST be discarded.
7.4.2.2. Packet Formats and Constants
The S-PMSI Join message is encapsulated within UDP and has the following type/length/value (TLV) encoding:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |     Value     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Type (8 bits)
   Length (16 bits): the total number of octets in the Type, Length, and Value fields combined
   Value (variable length)
In this specification, only one type of S-PMSI Join is defined. A Type 1 S-PMSI Join is used when the S-PMSI tunnel is a PIM tunnel that is used to carry a single multicast stream, where the packets of that stream have IPv4 source and destination IP addresses. The S-PMSI Join format to use when the C-source and C-group are IPv6 addresses will be defined in a follow-on document.
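The generic TLV framing above can be sketched with Python's struct module. The point worth illustrating is that Length counts the Type and Length fields as well as the Value. The function names are hypothetical, and network byte order is assumed here as the usual convention for such encodings; it is not stated explicitly in the text above.

```python
import struct

# Type (8 bits) followed by Length (16 bits); "!" selects network byte
# order with no padding, so the header is 3 octets (assumed byte order).
TLV_HEADER = struct.Struct("!BH")


def encode_tlv(msg_type: int, value: bytes) -> bytes:
    """Build a TLV whose Length covers Type + Length + Value."""
    length = TLV_HEADER.size + len(value)
    if length > 0xFFFF:
        raise ValueError("value too long for a 16-bit Length field")
    return TLV_HEADER.pack(msg_type, length) + value


def decode_tlv(data: bytes):
    """Return (type, value), checking Length against the data received."""
    if len(data) < TLV_HEADER.size:
        raise ValueError("truncated TLV header")
    msg_type, length = TLV_HEADER.unpack_from(data)
    if length < TLV_HEADER.size or length > len(data):
        raise ValueError("Length field inconsistent with received data")
    return msg_type, data[TLV_HEADER.size:length]


# A 13-octet value yields a total Length of 16 octets.
message = encode_tlv(1, bytes(13))
decoded_type, decoded_value = decode_tlv(message)
```

A receiver written this way rejects a message whose Length is smaller than the 3-octet header or larger than the data actually received, rather than reading past the end of the buffer.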
The Type 1 S-PMSI Join has the following format:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           C-source                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            C-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            P-group                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Type (8 bits): 1
   Length (16 bits): 16
   Reserved (8 bits): This field SHOULD be zero when transmitted, and MUST be ignored when received.
   C-source (32 bits): the IPv4 address of the traffic source in the VPN.
   C-group (32 bits): the IPv4 address of the multicast traffic destination address in the VPN.
   P-group (32 bits): the IPv4 group address that the PE router is going to use to encapsulate the flow (C-source, C-group).
The P-group identifies the S-PMSI P-tunnel, and the (C-S,C-G) identifies the multicast flow that is carried in the P-tunnel.
The protocol uses the following constants.
[S-PMSI_DELAY]: Once an S-PMSI Join message has been sent, the PE router that is to transmit onto the S-PMSI will delay this amount of time before it begins using the S-PMSI. The default value is 3 seconds.
[S-PMSI_TIMEOUT]: If a PE (other than the transmitter) does not receive any packets over the S-PMSI P-tunnel for this amount of time, the PE will prune itself from the S-PMSI P-tunnel, and will expect (C-S,C-G) packets to arrive on an I-PMSI. The default value is 3 minutes. This value must be consistent among PE routers.
[S-PMSI_HOLDOWN]: If the PE that transmits onto the S-PMSI does not see any (C-S,C-G) packets for this amount of time, it will resume sending (C-S,C-G) packets on an I-PMSI. This is used to avoid oscillation when traffic is bursty. The default value is 1 minute.
[S-PMSI_INTERVAL]: The interval the transmitting PE router uses to periodically send the S-PMSI Join message. The default value is 60 seconds.
7.4.3. Aggregation
S-PMSIs can be aggregated on a P-multicast tree. The S-PMSI to (C-S,C-G) binding advertisement supports aggregation. Furthermore, the aggregation procedures of Section 6.3 apply. It is also possible to aggregate both S-PMSIs and I-PMSIs on the same P-multicast tree.