8. Inter-AS Procedures
If an MVPN has sites in more than one AS, it requires one or more PMSIs to be instantiated by inter-AS P-tunnels. This document describes two different types of inter-AS P-tunnel:

1. "Segmented inter-AS P-tunnels"

   A segmented inter-AS P-tunnel consists of a number of independent segments that are stitched together at the ASBRs. There are two types of segment: inter-AS segments and intra-AS segments. The segmented inter-AS P-tunnel consists of alternating intra-AS and inter-AS segments.

   Inter-AS segments connect adjacent ASBRs of different ASes; these "one-hop" segments are instantiated as unicast P-tunnels.

   Intra-AS segments connect ASBRs and PEs that are in the same AS. An intra-AS segment may be of whatever technology is desired by the SP that administers that AS. Different intra-AS segments may be of different technologies.

   Note that the intra-AS segments of inter-AS P-tunnels form a category of P-tunnels that is distinct from simple intra-AS P-tunnels; we will rely on this distinction later (see Section 9).

   A segmented inter-AS P-tunnel can be thought of as a tree that is rooted at a particular AS, and that has, as its leaves, the other ASes that need to receive multicast data from the root AS.

2. "Non-segmented inter-AS P-tunnels"

   A non-segmented inter-AS P-tunnel is a single P-tunnel that spans AS boundaries. The tunnel technology cannot change from one point in the tunnel to the next, so all ASes through which the P-tunnel passes must support that technology. In essence, AS boundaries are of no significance to a non-segmented inter-AS P-tunnel.

Section 10 of [RFC4364] describes three different options for supporting unicast inter-AS BGP/MPLS IP VPNs, known as options A, B, and C. We describe below how both segmented and non-segmented inter-AS trees can be supported when options B or C are used. (Option A does not pass any routing information through an ASBR at all, so no special inter-AS procedures are needed.)

8.1. Non-Segmented Inter-AS P-Tunnels
In this model, the previously described discovery and tunnel setup mechanisms are used, even though the PEs belonging to a given MVPN may be in different ASes.

8.1.1. Inter-AS MVPN Auto-Discovery
The previously described BGP-based auto-discovery mechanisms work "as is" when an MVPN contains PEs that are in different Autonomous Systems. However, please note that, if non-segmented inter-AS P-tunnels are to be used, then the Intra-AS I-PMSI A-D routes MUST be distributed across AS boundaries.

8.1.2. Inter-AS MVPN Routing Information Exchange
When non-segmented inter-AS P-tunnels are used, MVPN C-multicast routing information may be exchanged by means of PIM peering across an MI-PMSI or by means of BGP carrying C-multicast routes. When PIM peering is used to distribute the C-multicast routing information, a PE that sends C-PIM Join/Prune messages for a particular (C-S,C-G) must be able to identify the PE that is its PIM adjacency on the path to S. This is the "Selected Upstream PE" described in Section 5.1.3.
If BGP (rather than PIM) is used to distribute the C-multicast routing information, and if option b of Section 10 of [RFC4364] is in use, then the C-multicast routes will be installed in the ASBRs along the path from each multicast source in the MVPN to each multicast receiver in the MVPN. If option b is not in use, the C-multicast routes are not installed in the ASBRs. The handling of the C-multicast routes in either case is thus exactly analogous to the handling of unicast VPN-IP routes in the corresponding case.

8.1.3. Inter-AS P-Tunnels
The procedures described earlier in this document can be used to instantiate either an I-PMSI or an S-PMSI with inter-AS P-tunnels. Specific tunneling techniques require some explanation.

If ingress replication is used, the inter-AS PE-PE P-tunnels will use the inter-AS tunneling procedures for the tunneling technology used.

Procedures in [RSVP-P2MP] are used for inter-AS RSVP-TE P2MP P-tunnels.

Procedures for using PIM to set up the P-tunnels are discussed in the next section.

8.1.3.1. PIM-Based Inter-AS P-Multicast Trees
When PIM is used to set up a non-segmented inter-AS P-multicast tree, the PIM Join/Prune messages used to join the tree contain the IP address of the Upstream PE. However, there are two special considerations that must be taken into account:

- It is possible that the P routers within one or more of the ASes will not have routes to the Upstream PE. For example, if an AS has a "BGP-free core", the P routers in that AS will not have routes to addresses outside the AS.

- If the PIM Join/Prune message must travel through several ASes, it is possible that the ASBRs will not have routes to the PE routers. For example, in an inter-AS VPN constructed according to "option b" of Section 10 of [RFC4364], the ASBRs do not necessarily have routes to the PE routers.

In either case, "ordinary" PIM Join/Prune messages cannot be routed to the Upstream PE. Therefore, in that case, the PIM Join/Prune messages MUST contain the "PIM MVPN Join attribute". This allows the multicast distribution tree to be properly constructed, even if routes to PEs in other ASes do not exist in the given AS's IGP and even if the routes to those PEs do not exist in BGP. The use of a PIM MVPN Join attribute in the PIM messages allows the inter-AS trees to be built.

The PIM MVPN Join attribute adds the following information to the PIM Join/Prune messages: a "proxy address", which contains the address of the next ASBR on the path to the Upstream PE. When the PIM Join/Prune arrives at the ASBR that is identified by the "proxy address", that ASBR must change the proxy address to identify the next-hop ASBR.

This information allows the PIM Join/Prune to be routed through an AS, even if the P routers of that AS do not have routes to the Upstream PE. However, this information is not sufficient to enable the ASBRs to route the Join/Prune if the ASBRs themselves do not have routes to the Upstream PE.

Even in that situation, the procedures of this document ensure that the ASBRs will have Intra-AS I-PMSI A-D routes that lead to the Upstream PE. (Recall that if non-segmented inter-AS P-tunnels are being used, the ASBRs and PEs will have Intra-AS I-PMSI A-D routes that have been distributed inter-AS.)

So, rather than having the PIM Join/Prune messages routed by the ASBRs along a route to the Upstream PE, the PIM Join/Prune messages MUST be routed along the path determined by the Intra-AS I-PMSI A-D routes.

The basic format of a PIM Join attribute is specified in [PIM-ATTRIB]. The details of the PIM MVPN Join attribute are specified in the next section.

8.1.3.2. The PIM MVPN Join Attribute
8.1.3.2.1. Definition
In [PIM-ATTRIB], the notion of a "join attribute" is defined, and a format for including join attributes in PIM Join/Prune messages is specified. We now define a new join attribute, which we call the "MVPN Join attribute".

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|E| Attr_Type | Length        |     Proxy IP address
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                   RD
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-.......
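As an illustration only, the layout above can be turned into bytes roughly as follows. This is a minimal sketch, not a normative encoding: the function names are hypothetical, the field values (Attr_Type 1, F bit 0, eight-byte RD following the proxy address) are those given in this section, and the Length field is assumed to count the octets of the value that follows it, as in the generic join attribute format of [PIM-ATTRIB].

```python
import ipaddress
import struct

MVPN_ATTR_TYPE = 1  # Attr_Type value this section assigns to the MVPN Join attribute
RD_LEN = 8          # the RD is eight bytes and follows the proxy address


def pack_mvpn_join_attr(proxy, rd, e_bit=0):
    """Encode one MVPN Join attribute (hypothetical helper)."""
    if len(rd) != RD_LEN:
        raise ValueError("RD must be exactly 8 bytes")
    # Proxy address is 4 bytes (IPv4) or 16 bytes (IPv6), then the RD.
    value = ipaddress.ip_address(proxy).packed + rd
    # First octet: F (1 bit, set to 0), E (1 bit), Attr_Type (6 bits).
    first = (0 << 7) | ((e_bit & 1) << 6) | MVPN_ATTR_TYPE
    # Assumption: Length is the number of octets in the value.
    return struct.pack("!BB", first, len(value)) + value


def unpack_mvpn_join_attr(data):
    """Decode bytes produced by pack_mvpn_join_attr (hypothetical helper)."""
    first, length = struct.unpack("!BB", data[:2])
    f_bit, e_bit, attr_type = first >> 7, (first >> 6) & 1, first & 0x3F
    if attr_type != MVPN_ATTR_TYPE:
        raise ValueError("not an MVPN Join attribute")
    value = data[2:2 + length]
    if len(value) != length:
        raise ValueError("truncated attribute")
    addr_len = length - RD_LEN
    if addr_len not in (4, 16):
        raise ValueError("unexpected proxy address length")
    proxy = ipaddress.ip_address(value[:addr_len])
    return {"F": f_bit, "E": e_bit, "proxy": str(proxy), "rd": value[addr_len:]}
```

With an IPv4 proxy address the encoded attribute is 14 bytes long (2 header bytes, 4 address bytes, 8 RD bytes); with an IPv6 proxy address it is 26 bytes long.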
The Attr_Type field of the MVPN Join attribute is set to 1.

The F bit is set to 0.

Two information fields are carried in the MVPN Join attribute:

- Proxy IP address: The IP address of the node towards which the PIM Join/Prune message is to be forwarded. This will be either an IPv4 or an IPv6 address, depending on whether the PIM Join/Prune message itself is IPv4 or IPv6.

- RD: An eight-byte RD. This immediately follows the proxy IP address.

The PIM message also carries the address of the Upstream PE.

In the case of an intra-AS MVPN, the proxy and the Upstream PE are the same. In the case of an inter-AS MVPN, the proxy will be the ASBR that is the exit point from the local AS on the path to the Upstream PE.

8.1.3.2.2. Usage
When a PE router originates a PIM Join/Prune message in order to set up an inter-AS PMSI, it does so as a result of having received a particular Intra-AS I-PMSI A-D route or S-PMSI A-D route. It includes an MVPN Join attribute whose fields are set as follows:

- If the Upstream PE is in the same AS as the local PE, then the proxy field contains the address of the Upstream PE. Otherwise, it contains the address of the BGP Next Hop of the route to the Upstream PE.

- The RD field contains the RD from the NLRI of the Intra-AS A-D route.

- The Upstream PE field contains the address of the PE that originated the Intra-AS I-PMSI A-D route or S-PMSI A-D route (obtained from the NLRI of that route).

When a PIM router processes a PIM Join/Prune message with an MVPN Join attribute, it first checks to see if the proxy field contains one of its own addresses.

If not, the router uses the proxy IP address in order to determine the RPF interface and neighbor. The MVPN Join attribute must be passed upstream unchanged.
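The originator rule for the proxy field and the per-router check on that field can be sketched as follows. The sketch also covers the case, described in the next paragraph, in which the proxy address is one of the router's own addresses. All names are hypothetical; the Intra-AS A-D routes are modeled as a plain dictionary keyed by (RD, Upstream PE address), and the Upstream PE address is kept alongside the attribute fields for convenience even though it is carried elsewhere in the PIM message.

```python
def originator_proxy(local_as, upstream_pe, upstream_pe_as, bgp_next_hop):
    """Proxy field chosen by the PE that originates the Join/Prune."""
    # Same AS: point directly at the Upstream PE.
    # Different AS: point at the BGP Next Hop of the route to the Upstream PE.
    return upstream_pe if upstream_pe_as == local_as else bgp_next_hop


def process_join(own_addrs, ad_routes, attr):
    """Handle a Join/Prune carrying an MVPN Join attribute.

    own_addrs: set of this router's IP addresses.
    ad_routes: {(rd, upstream_pe): bgp_next_hop} for Intra-AS A-D routes.
    attr:      {"proxy": ..., "rd": ..., "upstream_pe": ...}.

    Returns (address used for the RPF lookup, attribute to send upstream),
    or None if the message is discarded.
    """
    if attr["proxy"] not in own_addrs:
        # Not addressed to us: RPF towards the proxy, attribute unchanged.
        return attr["proxy"], dict(attr)
    # Addressed to us: look up the A-D route by RD + Upstream PE address.
    next_hop = ad_routes.get((attr["rd"], attr["upstream_pe"]))
    if next_hop is None:
        return None  # no matching route: discard the PIM message
    # Replace only the proxy field; RD and Upstream PE stay as they were.
    return next_hop, {**attr, "proxy": next_hop}
```

A P router in the middle of an AS takes the first branch and never needs a route to the Upstream PE; only the ASBR named in the proxy field consults BGP.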
If the proxy address is one of the router's own IP addresses, then the router looks in its BGP routing table for an Intra-AS A-D route whose NLRI consists of the Upstream PE address prepended with the RD from the Join attribute. If there is no match, the PIM message is discarded. If there is a match, the IP address from the BGP next hop field of the matching route is used in order to determine the RPF interface and neighbor. When the PIM Join/Prune is forwarded upstream, the proxy field is replaced with the address of the BGP next hop, and the RD and Upstream PE fields are left unchanged.

The use of non-segmented inter-AS trees constructed via BIDIR-PIM is outside the scope of this document.

8.2. Segmented Inter-AS P-Tunnels
The procedures for setting up and maintaining segmented inter-AS Inclusive and Selective P-tunnels may be found in [MVPN-BGP].

9. Preventing Duplication of Multicast Data Packets
Consider the case of an egress PE that receives packets of a particular C-flow, (C-S,C-G), over a non-aggregated S-PMSI. The procedures described so far will never cause the PE to receive duplicate copies of any packet in that stream. It is possible that the (C-S,C-G) stream is carried in more than one S-PMSI; this may happen when the site that contains C-S is multihomed to more than one PE. However, a PE that needs to receive (C-S,C-G) packets only joins one of these S-PMSIs, and so it only receives one copy of each packet.

However, if the data packets of stream (C-S,C-G) are carried in either an I-PMSI or an aggregated S-PMSI, then the procedures specified so far make it possible for an egress PE to receive more than one copy of each data packet. Additional procedures are needed to either make this impossible or ensure that the egress PE does not forward duplicates to the CE routers.

This section covers only the situation where the C-trees are unidirectional, in either the ASM or SSM service models. The case where the C-trees are bidirectional is considered separately in Section 11.

There are two cases where the procedures specified so far make it possible for an egress PE to receive duplicate copies of a multicast data packet. These are as follows:

1. The first case occurs when both of the following conditions hold:

   a. an MVPN site that contains C-S or C-RP is multihomed to more than one PE, and

   b. either an I-PMSI or an aggregated S-PMSI is used for carrying the packets originated by C-S.

   In this case, an egress PE may receive one copy of the packet from each PE to which the site is homed. This case is discussed further in Section 9.2.

2. The second case occurs when all of the following conditions hold:

   a. the IP destination address of the customer packet, C-G, identifies a multicast group that is operating in ASM mode and whose C-multicast tree is set up using PIM-SM,

   b. an MI-PMSI is used for carrying the data packets, and

   c. a router or a CE in a site connected to the egress PE switches from the C-RP tree to the C-S tree.

   In this case, it is possible to get one copy of a given packet from the ingress PE attached to the C-RP's site and one from the ingress PE attached to the C-S's site. This case is discussed further in Section 9.3.

Additional procedures are therefore needed to ensure that no MVPN customer sees steady state multicast data packet duplication. There are three procedures that may be used:

1. Discarding data packets received from the "wrong" PE

2. Single Forwarder Selection

3. Native PIM methods

These methods are described in Section 9.1. Their applicability to the two scenarios where duplication is possible is discussed in Sections 9.2 and 9.3.

9.1. Methods for Ensuring Non-Duplication
Every MVPN MUST use at least one of the three methods for ensuring non-duplication.
9.1.1. Discarding Packets from Wrong PE
Per Section 5.1.3, an egress PE, say PE1, chooses a specific Upstream PE for a given (C-S,C-G). When PE1 receives a (C-S,C-G) packet from a PMSI, it may be able to identify the PE that transmitted the packet onto the PMSI. If that transmitter is other than the PE selected by PE1 as the Upstream PE, then PE1 can drop the packet. This means that the PE will see a duplicate, but the duplicate will not get forwarded.

The method used by an egress PE to determine the ingress PE for a particular packet, received over a particular PMSI, depends on the P-tunnel technology that is used to instantiate the PMSI. If the P-tunnel is a P2MP LSP, a PIM-SM or PIM-SSM tree, or a unicast P-tunnel that uses IP encapsulation, then the tunnel encapsulation contains information that can be used (possibly along with other state information in the PE) to determine the ingress PE, as long as the P-tunnel is instantiating an intra-AS PMSI or an inter-AS PMSI that is supported by a non-segmented inter-AS tunnel.

Even when inter-AS segmented P-tunnels are used, if an aggregated S-PMSI is used for carrying the packets, the tunnel encapsulation must have some information that can be used to identify the PMSI; in turn, that implicitly identifies the ingress PE.

Consider the case of an I-PMSI that spans multiple ASes and that is instantiated by segmented inter-AS P-tunnels. Suppose it is carrying data that is traveling along a particular C-tree. Suppose also that the C-root of that C-tree is multihomed to two or more PEs, and that each such PE is in a different AS than the others. Then, if there is any duplicate traffic, the duplicates will arrive on a different P-tunnel. Specifically, if the PE was expecting the traffic on a particular inter-AS P-tunnel, duplicate traffic will arrive either on an intra-AS P-tunnel (not an intra-AS segment of an inter-AS P-tunnel) or on some other inter-AS P-tunnel.
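The basic check described at the start of this section can be sketched as follows. This is an illustration under stated assumptions: the function name and the string results are hypothetical, the Upstream PE selection of Section 5.1.3 is taken as already done and stored in a dictionary, and an ingress PE of None stands for a P-tunnel technology that does not let the egress PE identify the transmitter.

```python
def wrong_pe_check(selected_upstream, flow, ingress_pe):
    """Decide what an egress PE does with a packet received from a PMSI.

    selected_upstream: {(c_s, c_g): Upstream PE chosen per Section 5.1.3}.
    flow:              the (c_s, c_g) of the received packet.
    ingress_pe:        PE that transmitted the packet, or None if the
                       tunnel encapsulation does not reveal it.
    """
    expected = selected_upstream[flow]
    if ingress_pe is None:
        # The method is not applicable; another method must be used.
        return "undetermined"
    # The duplicate is still received, but it is not forwarded to the CEs.
    return "forward" if ingress_pe == expected else "discard"
```

For example, with PE1 selected as the Upstream PE for (C-S,C-G), a copy of the flow arriving from PE2 is discarded, while a copy from PE1 is forwarded.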
To detect duplicates, the PE has to keep track of which inter-AS A-D route the PE uses for sending MVPN multicast routing information towards the C-S/C-RP. The PE MUST process received (multicast) traffic originated by C-S/C-RP only from the inter-AS P-tunnel that was carried in the best Inter-AS A-D route for the MVPN and that was originated by the AS that contains C-S/C-RP (where "the best" is determined by the PE). The PE MUST discard, as duplicates, all other multicast traffic originated by the C-S/C-RP, but received on any other P-tunnel.

If, for a given MVPN, (a) an MI-PMSI is used for carrying multicast data packets, (b) the MI-PMSI is instantiated by a segmented inter-AS P-tunnel, (c) the C-S or C-RP is multihomed to different PEs, and (d) at least two such PEs are in the same AS, then, depending on the tunneling technology used to instantiate the MI-PMSI, it may not always be possible for the egress PE to determine the Upstream PE. In that case, the procedure of Section 9.1.2 or 9.1.3 must be used.

NB: Section 10 describes an exception case where PE1 has to accept a packet even if it is not from the Selected Upstream PE.

9.1.2. Single Forwarder Selection
Section 5.1 specifies a procedure for choosing a "default Upstream PE selection", such that (except during routing transients) all PEs will choose the same default Upstream PE. To ensure that duplicate packets are not sent through the backbone (except during routing transients), an ingress PE does not forward to the backbone any (C-S,C-G) multicast data packet it receives from a CE, unless the PE is the default Upstream PE selection.

One difference in effect between this procedure and the procedure of Section 9.1.1 is that this procedure sends only one copy of each packet to each egress PE, rather than sending multiple copies and forcing the egress PE to discard all but one.

9.1.3. Native PIM Methods
If PE-PE multicast routing information for a given MVPN is being disseminated by running PIM over an MI-PMSI, then native PIM methods will prevent steady state data packet duplication. The PIM Assert mechanism prevents steady state duplication in the scenario of Section 9.2, even if Single Forwarder Selection is not done. The PIM Prune(S,G,rpt) mechanism addresses the scenario of Section 9.3.

9.2. Multihomed C-S or C-RP
Any of the three methods of Section 9.1 will prevent steady state duplicates in the case of a multihomed C-S or C-RP.

9.3. Switching from the C-RP Tree to the C-S Tree
9.3.1. How Duplicates Can Occur
If some PEs are on the C-S tree and some are on the C-RP tree, then a PE may also receive duplicate data traffic after a (C-*,C-G) to (C-S,C-G) switch. If PIM is being used on an MI-PMSI to disseminate multicast routing information, native PIM methods (in particular, the use of the Prune(S,G,rpt) message) prevent steady state data duplication in this case.
If BGP C-multicast routing is being used, then the procedure of Section 9.1.1, if applicable, can be used to prevent duplication. However, if that procedure is not applicable, then the procedure of Section 9.1.2 is not sufficient to prevent steady state data duplication in all scenarios.

In the scenario in which (a) BGP C-multicast routing is being used, (b) there are inter-site shared C-trees, and (c) there are inter-site source C-trees, additional procedures are needed. To see this, consider the following topology:

                         CE1---C-RP
                          |
                          |
                   CE2---PE1-- ... --PE2---CE5---C-S
                             ...
             C-R1---CE3---PE3-- ... --PE4---CE4---C-R2

Suppose that C-R1 and C-R2 use PIM to join the (C-*,C-G) tree, where C-RP is the RP corresponding to C-G. As a result, CE3 and CE4 will send PIM Join(*,G) messages to PE3 and PE4, respectively. This will cause PE3 and PE4 to originate C-multicast Shared Tree Join Routes, specifying (C-*,C-G). These routes will identify PE1 as the Upstream PE.

Now suppose that C-S is a transmitter for multicast group C-G, and that C-S sends its multicast data packets to C-RP in PIM Register messages. Then, PE1 will receive (C-S,C-G) data packets from CE1, and will forward them over an I-PMSI to PE3 and PE4, which will forward them, in turn, to CE3 and CE4, respectively.

When C-R1 receives (C-S,C-G) data packets, it may decide to join the (C-S,C-G) source tree, by sending a PIM Join(S,G) to CE3. This will, in turn, cause CE3 to send a PIM Join(S,G) to PE3, which will, in turn, cause PE3 to originate a C-multicast Source Tree Join Route, specifying (C-S,C-G) and identifying PE2 as the Upstream PE. As a result, when PE2 receives (C-S,C-G) data packets from CE5, it will forward them on a PMSI to PE3.
At this point, the following situation exists:

- If PE1 receives (C-S,C-G) packets from CE1, PE1 must forward them on the I-PMSI, because PE4 is still expecting to receive the (C-S,C-G) packets from PE1.

- PE3 must continue to receive packets from the I-PMSI, since there may be other sources transmitting C-G traffic and PE3 currently has no other way to receive that traffic.

- PE3 must also receive (C-S,C-G) traffic from PE2.

As a result, PE3 may receive two copies of each (C-S,C-G) packet. The procedure of Section 9.1.2 (Single Forwarder Selection) does not prevent PE3 from receiving two copies, because it does not prevent one PE from forwarding (C-S,C-G) traffic along the shared C-tree while another forwards (C-S,C-G) traffic along a source-specific C-tree.

So if PE3 cannot apply the method of Section 9.1.1 (Discarding Packets from Wrong PE), perhaps because the tunneling technology does not allow the egress PE to identify the ingress PE, then additional procedures are needed.

9.3.2. Solution Using Source Active A-D Routes
The issue described in Section 9.3.1 is resolved through the use of Source Active A-D routes. In the remainder of this section, we provide an example of how this works, along with an informal description of the procedures.

A full and precise specification of the relevant procedures can be found in Section 13 of [MVPN-BGP]. In the event of any conflicts or other discrepancies between the description below and the description in [MVPN-BGP], [MVPN-BGP] is to be considered the authoritative document.

Please note that the material in this section only applies when inter-site shared trees are being used.

Whenever a PE creates a (C-S,C-G) state as a result of receiving a C-multicast route for (C-S,C-G) from some other PE, and the C-G group is an ASM group, the PE that creates the state MUST originate a Source Active A-D route (see [MVPN-BGP], Section 4.5). The NLRI of the route includes C-S and C-G. By default, the route carries the same set of Route Targets as the Intra-AS I-PMSI A-D route of the MVPN originated by the PE. Using the normal BGP procedures, the route is propagated to all the PEs of the MVPN. For more details, see Section 13.1 ("Source within a Site - Source Active Advertisement") of [MVPN-BGP].

When, as a result of receiving a new Source Active A-D route, a PE updates its VRF with the route, the PE MUST check if the newly received route matches any (C-*,C-G) entries. If (a) there is a matching entry, (b) the PE does not have (C-S,C-G) state in its MVPN Tree Information Base (MVPN-TIB) for the (C-S,C-G) carried in the route, and (c) the received route is selected as the best (using the BGP route selection procedures), then the PE takes the following action:

- If the PE's (C-*,C-G) state has a PMSI as a downstream interface, the PE acts as if all the other PEs had pruned C-S off the (C-*,C-G) tree. That is:

  * If the PE receives (C-S,C-G) traffic from a CE, it does not transmit it to other PEs.

  * Depending on the PIM state of the PE's PE-CE interfaces, the PE may or may not need to invoke PIM procedures to prune C-S off the (C-*,C-G) tree by sending a PIM Prune(S,G,rpt) to one or more of the CEs. This is determined by ordinary PIM procedures. If this does need to be done, the PE SHOULD delay sending the Prune until it first runs a timer; this helps ensure that the source is not pruned from the shared tree until all PEs have had time to receive the Source Active A-D route.

- If the PE's (C-*,C-G) state does not have a PMSI as a downstream interface, the PE sets up its forwarding path to receive (C-S,C-G) traffic from the originator of the selected Source Active A-D route.

Whenever a PE deletes the (C-S,C-G) state that was previously created as a result of receiving a C-multicast route for (C-S,C-G) from some other PE, the PE that deletes the state also withdraws the Source Active A-D route (if there is one) that was advertised when the state was created.

In the example topology of Section 9.3.1, this procedure will cause PE2 to generate a Source Active A-D route for (C-S,C-G).
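To make the receive-side rules concrete for that example topology, the following is a minimal sketch of how a PE might react to a Source Active A-D route. It is illustrative only and is not the specification, which is in [MVPN-BGP]. The class, attribute, and result names are hypothetical; BGP best-route selection is reduced to a boolean argument, and the delayed PIM Prune(S,G,rpt) towards the CEs is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    name: str
    # C-G -> True if the (C-*,C-G) state has a PMSI as a downstream interface
    star_g: dict = field(default_factory=dict)
    # (C-S, C-G) entries already present in the MVPN-TIB
    sg_state: set = field(default_factory=set)
    # flows received from a CE that must not be sent to other PEs
    suppressed: set = field(default_factory=set)
    # (C-S, C-G) -> PE from which the flow is expected
    upstream: dict = field(default_factory=dict)

    def on_source_active(self, c_s, c_g, originator, is_best=True):
        flow = (c_s, c_g)
        # Conditions (a), (b), (c): matching (C-*,C-G) entry, no existing
        # (C-S,C-G) state, and the route is the selected best route.
        if c_g not in self.star_g or flow in self.sg_state or not is_best:
            return "ignored"
        if self.star_g[c_g]:
            # Act as if all other PEs had pruned C-S off the shared tree.
            self.suppressed.add(flow)
            return "stop-forwarding-to-pes"
        # No PMSI downstream: receive the flow from the route's originator.
        self.upstream[flow] = originator
        return "receive-from-originator"


# Example topology of Section 9.3.1; PE2 originates the route.
pe1 = PE("PE1", star_g={"C-G": True})
pe3 = PE("PE3", star_g={"C-G": False}, sg_state={("C-S", "C-G")})
pe4 = PE("PE4", star_g={"C-G": False})
results = {pe.name: pe.on_source_active("C-S", "C-G", "PE2")
           for pe in (pe1, pe3, pe4)}
```

In this sketch PE3 ignores the route because it already holds (C-S,C-G) state from its own Source Tree Join, which matches condition (b) above.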
When this route is received, PE4 will set up its forwarding state to expect (C-S,C-G) packets from PE2. PE1 will change its forwarding state so that (C-S,C-G) packets that it receives from CE1 are not forwarded to any other PEs. (Note that PE1 may still forward (C-S,C-G) packets received from CE1 to CE2, if CE2 has receivers for C-G and those receivers did not switch from the (C-*,C-G) tree to the (C-S,C-G) tree.) As a result, PE3 and PE4 do not receive duplicate packets of the (C-S,C-G) C-flow.

With this procedure in place, there is no need to have any kind of C-multicast route that has the semantics of a PIM Prune(S,G,rpt) message.

It is worth noting that if, as a result of this procedure, a PE sets up its forwarding state to receive (C-S,C-G) traffic from the source tree, the UMH is not necessarily the same as it would be if the PE had joined the source tree as a result of receiving a PIM Join for the same source tree from a directly attached CE.

Note that the mechanism described in Section 7.4.1 can be leveraged to advertise an S-PMSI binding along with the source active messages. This is accomplished by using the same BGP Update message to carry both the NLRI of the S-PMSI A-D route and the NLRI of the Source Active A-D route. (An implementation processing the received routes cannot, however, assume that this will always be the case.)

10. Eliminating PE-PE Distribution of (C-*,C-G) State
In the ASM service model, a node that wants to become a receiver for a particular multicast group G first joins a shared tree, rooted at a rendezvous point. When the receiver detects traffic from a particular source, it has the option of joining a source tree, rooted at that source. If it does so, it has to prune that source from the shared tree, to ensure that it receives packets from that source on only one tree.

Maintaining the shared tree can require considerable state, as it is necessary not only to know who the upstream and downstream nodes are, but to know which sources have been pruned off which branches of the shared tree.

The BGP-based signaling procedures defined in this document and in [MVPN-BGP] eliminate the need for PEs to distribute to each other any state having to do with which sources have been pruned off a shared C-tree. Those procedures do still allow multicast data traffic to travel on a shared C-tree, but they do not allow a situation in which some CEs receive (S,G) traffic on a shared tree and some on a source tree. This results in a considerable simplification of the PE-PE procedures with minimal change to the multicast service seen within the VPN. However, shared C-trees are still supported across the VPN backbone. That is, (C-*,C-G) state is distributed PE-PE, but (C-*,C-G,rpt) state is not.
In this section, we specify a number of optional procedures that go further and that completely eliminate the support for shared C-trees across the VPN backbone. In these procedures, the PEs keep track of the active sources for each C-G. As soon as a CE tries to join the (*,G) tree, the PEs instead join the (S,G) trees for all the active sources. Thus, all distribution of (C-*,C-G) state is eliminated. These procedures are optional because they require some additional support on the part of the VPN customer and because they are not always appropriate. (For example, a VPN customer may have his own policy of always using shared trees for certain multicast groups.) There are several different options, described in the following subsections.

10.1. Co-Locating C-RPs on a PE
[MVPN-REQ] describes C-RP engineering as an issue when PIM-SM (or BIDIR-PIM) is used in Any-Source Multicast (ASM) mode [RFC4607] on the VPN customer site. To quote from [MVPN-REQ]:

   In the case of PIM-SM, when a source starts to emit traffic toward a group (in ASM mode), if sources and receivers are located in VPN sites that are different than that of the RP, then traffic may transiently flow twice through the SP network and the CE-PE link of the RP (from source to RP, and then from RP to receivers). This traffic peak, even short, may not be convenient depending on the traffic and link bandwidth.

   Thus, a VPN solution MAY provide features that solve or help mitigate this potential issue.

One of the C-RP deployment models is for the customer to outsource the RP to the provider. In this case, the provider may co-locate the RP on the PE that is connected to the customer site [MVPN-REQ]. This section describes how "anycast-RP" can be used to achieve this.

10.1.1. Initial Configuration
For a particular MVPN, one or more PEs that have sites in that MVPN act as an RP for the sites of that MVPN connected to these PEs. Within each MVPN, all of these RPs use the same (anycast) address. All of these RPs use the Anycast RP technique.

10.1.2. Anycast RP Based on Propagating Active Sources
This mechanism is based on propagating active sources between RPs.
10.1.2.1. Receiver(s) within a Site
The PE that receives a C-Join message for (*,G) does not send the information that it has receiver(s) for G until it receives information about active sources for G from an Upstream PE.

On receiving this (described in the next section), the downstream PE will respond with a Join message for (C-S,C-G). Sending this information could be done using any of the procedures described in Section 5. Only the Upstream PE will process this information.

10.1.2.2. Source within a Site
When a PE receives a PIM Register message from a site that belongs to a given VPN, the PE follows the normal PIM anycast RP procedures. It then advertises the source and group of the multicast data packet carried in the PIM Register message to other PEs in BGP using the following information elements:

- Active source address

- Active group address

- Route target of the MVPN

This advertisement goes to all the PEs that belong to that MVPN. When a PE receives this advertisement, it checks whether there are any receivers in the sites attached to the PE for the group carried in the source active advertisement. If there are, then it generates an advertisement for (C-S,C-G) as specified in the previous section.

10.1.2.3. Receiver Switching from Shared to Source Tree
No additional procedures are required when multicast receivers in a customer's site shift from the shared tree to the source tree.

10.2. Using MSDP between a PE and a Local C-RP
Section 10.1 describes the case where each PE is a C-RP. This enables the PEs to know the active multicast sources for each MVPN, and they can then use BGP to distribute this information to each other. As a result, the PEs do not have to join any shared C-trees, and this results in a simplification of the PE operation.

In another deployment scenario, the PEs are not themselves C-RPs, but use Multicast Source Discovery Protocol (MSDP) [RFC3618] to talk to the C-RPs. In particular, a PE that attaches to a site that contains a C-RP becomes an MSDP peer of that C-RP. That PE then uses BGP to distribute the information about the active sources to the other PEs. When the PE determines, by MSDP, that a particular source is no longer active, it withdraws the corresponding BGP Update. With this approach, the PEs do not have to join any shared C-trees, and they do not have to be C-RPs either.

MSDP provides the capability for a Source Active (SA) message to carry an encapsulated data packet. This capability can be used to allow an MSDP speaker to receive the first (or first several) packet(s) of an (S,G) flow, even though the MSDP speaker hasn't yet joined the (S,G) tree. (Presumably, it will join that tree as a result of receiving the SA message that carries the encapsulated data packet.) If this capability is not used, the first several data packets of an (S,G) stream may be lost.

A PE that is talking MSDP to an RP may receive such an encapsulated data packet from the RP. The data packet should be decapsulated and transmitted to the other PEs in the MVPN. If the packet belongs to a particular (S,G) flow, and if the PE is a transmitter for some S-PMSI to which (S,G) has already been bound, the decapsulated data packet should be transmitted on that S-PMSI. Otherwise, if an I-PMSI exists for that MVPN, the decapsulated data packet should be transmitted on it. (If an MI-PMSI exists, this would typically be used.) If neither of these conditions holds, the decapsulated data packet is not transmitted to the other PEs in the MVPN. The decision as to whether and how to transmit the decapsulated data packet does not affect the processing of the SA control message itself.

Suppose that PE1 transmits a multicast data packet on a PMSI, where that data packet is part of an (S,G) flow, and PE2 receives that packet from that PMSI. According to Section 9, if PE1 is not the PE that PE2 expects to be transmitting (S,G) packets, then PE2 must discard the packet.
If an MSDP-encapsulated data packet is transmitted on a PMSI, as specified above, this rule from Section 9 would likely result in the packet being discarded. Therefore, if MSDP-encapsulated data packets are being decapsulated and transmitted on a PMSI, we need to modify the rules of Section 9 as follows:
1. If the receiving PE, PE2, has already joined the (S,G) tree, and has chosen PE1 as the Upstream PE for the (S,G) tree, but this packet does not come from PE1, PE2 must discard the packet.
2. If the receiving PE, PE2, has not already joined the (S,G) tree, but has a PIM adjacency to a CE that is downstream on the (*,G) tree, the packet should be forwarded to the CE.
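The transmit and receive decisions described above can be illustrated with a minimal sketch. It is not part of the specification: the PeState class, its field names, and the returned strings are hypothetical, and the final "discard" fallback for a PE with no relevant state is an assumption, since the text above does not address that case.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Flow = Tuple[str, str]  # (S, G)


@dataclass
class PeState:
    """Hypothetical per-MVPN state held by a PE (illustrative names only)."""
    spmsi_flows: Set[Flow] = field(default_factory=set)      # flows this PE transmits on an S-PMSI
    has_ipmsi: bool = False                                   # an I-PMSI exists for the MVPN
    upstream_pe: Dict[Flow, str] = field(default_factory=dict)  # joined (S,G) -> chosen Upstream PE
    star_g_downstream: Set[str] = field(default_factory=set)    # groups with a CE downstream on (*,G)


def pmsi_for_decapsulated(pe: PeState, flow: Flow) -> Optional[str]:
    """Pick the PMSI for a packet decapsulated from an MSDP SA message."""
    if flow in pe.spmsi_flows:
        return "S-PMSI"
    if pe.has_ipmsi:
        return "I-PMSI"
    return None  # neither condition holds: do not transmit to the other PEs


def receive_from_pmsi(pe: PeState, flow: Flow, sender: str) -> str:
    """Apply the modified Section 9 rules at the receiving PE."""
    if flow in pe.upstream_pe:
        # Rule 1: already joined (S,G); only the chosen Upstream PE is acceptable.
        return "accept" if pe.upstream_pe[flow] == sender else "discard"
    if flow[1] in pe.star_g_downstream:
        # Rule 2: not joined, but a CE is downstream on the (*,G) tree.
        return "forward-to-CE"
    return "discard"  # assumption: no state at all, nothing to forward to
```

In this sketch the SA control message is assumed to be processed elsewhere; as stated above, that processing is unaffected by the data-packet decision.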
11. Support for PIM-BIDIR C-Groups
In BIDIR-PIM, each multicast group is associated with a Rendezvous Point Address (RPA). The Rendezvous Point Link (RPL) is the link that attaches to the RPA. Usually, it's a LAN where the RPA is in the IP subnet assigned to the LAN. The root node of a BIDIR-PIM tree is a node that has an interface on the RPL.
On any LAN (other than the RPL) that is a link in a BIDIR-PIM tree, there must be a single node that has been chosen to be the Designated Forwarder (DF). (More precisely, for each RPA there is a single node that is the DF for that RPA.) A node that receives traffic from an upstream interface may forward it on a particular downstream interface only if the node is the DF for that downstream interface. A node that receives traffic from a downstream interface may forward it on an upstream interface only if that node is the DF for the downstream interface.
If, for any period of time, there is a link on which each of two different nodes believes itself to be the DF, data forwarding loops can form. Loops in a bidirectional multicast tree can be very harmful. However, any election procedure will have a convergence period. The BIDIR-PIM DF election procedure is very complicated, because it goes to great pains to ensure that if convergence is not extremely fast, then there is no forwarding at all until convergence has taken place.
Other variants of PIM also have a DF election procedure for LANs. However, as long as the multicast tree is unidirectional, disagreement about who the DF is can result only in duplication of packets, not in loops. Therefore, the time taken to converge on a single DF is of much less concern for unidirectional trees than it is for bidirectional trees.
In the MVPN environment, if PIM signaling is used among the PEs, then the standard LAN-based DF election procedure can be used. However, election procedures that are optimized for a LAN may not work as well in the MVPN environment. So, an alternative to DF election would be desirable.
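The two DF forwarding rules stated above can be written as a small predicate. This is only a sketch: the function name, the interface names, and the df_for set are hypothetical, and cases other than the two described above (for example, downstream to downstream) are deliberately left out rather than guessed at.

```python
from typing import Set


def may_forward(in_iface: str, out_iface: str,
                upstream_iface: str, df_for: Set[str]) -> bool:
    """Check the two DF rules for one RPA.

    df_for is the (hypothetical) set of downstream interfaces on which
    this node is currently the DF for the RPA.
    """
    if in_iface == upstream_iface and out_iface != upstream_iface:
        # Upstream -> downstream: must be DF on the outgoing downstream interface.
        return out_iface in df_for
    if out_iface == upstream_iface and in_iface != upstream_iface:
        # Downstream -> upstream: must be DF on the incoming downstream interface.
        return in_iface in df_for
    raise ValueError("case not covered by the two rules sketched here")
```

If two nodes on the same link each list that link in their own df_for set, both pass this check for the same packet, which is the condition under which the loops described above can form.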
If BGP signaling is used among the PEs, an alternative to DF election is necessary. One might think that the "Single Forwarder Selection" procedures described in Sections 5 and 9 could be used to choose a single PE "DF" for the backbone (for a given RPA in a given MVPN). However, that is still likely to leave a convergence period of at least several seconds during which loops could form, and there could be a much longer convergence period if there is anything disrupting the smooth flow of BGP Updates. So, a simple procedure like that is not sufficient.
The remainder of this section describes two different methods that can be used to support BIDIR-PIM while eliminating the DF election.
11.1. The VPN Backbone Becomes the RPL
On a per-MVPN basis, this method treats the whole service provider(s) infrastructure as a single RPL. We refer to such an RPL as an "MVPN-RPL". This eliminates the need for the PEs to engage in any "DF election" procedure because BIDIR-PIM does not have a DF on the RPL. However, this method can only be used if the customer is "outsourcing" the RPL/RPA functionality to the SP.
An MVPN-RPL could be realized either via an I-PMSI (this I-PMSI is on a per-MVPN basis and spans all the PEs that have sites of a given MVPN), via a collection of S-PMSIs, or even via a combination of an I-PMSI and one or more S-PMSIs.
11.1.1. Control Plane
Associated with each MVPN-RPL is an address prefix that is unambiguous within the context of the MVPN associated with the MVPN-RPL. For a given MVPN, each VRF connected to an MVPN-RPL of that MVPN is configured to advertise to all of its connected CEs the address prefix of the MVPN-RPL.
Since, in BIDIR-PIM, there is no Designated Forwarder on an RPL, in the context of MVPN-RPL, there is no need to perform the Designated Forwarder election among the PEs (note it is still necessary to perform the Designated Forwarder election between a PE and its directly attached CEs, but that is done using plain BIDIR-PIM procedures).
For a given MVPN, a PE connected to an MVPN-RPL of that MVPN should send multicast data (C-S,C-G) on the MVPN-RPL only if at least one other PE connected to the MVPN-RPL has a downstream multicast state for C-G. In the context of MVPN, this is accomplished by requiring a PE that has a downstream state for a particular C-G of a particular VRF present on the PE to originate a C-multicast route for (C-*,C-G). The RD of this route should be the same as the RD associated with the VRF. The RTs carried by the route should be such as to ensure that the route gets distributed to all the PEs of the MVPN.
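As a rough illustration of the rule above, the following sketch tracks which PEs have originated a (C-*,C-G) C-multicast route and uses that to decide whether a PE should send on the MVPN-RPL. The MvpnRpl class and its method names are hypothetical. Route distribution by BGP, RDs, and RTs is not modeled; the sketch simply assumes every PE sees every route.

```python
from typing import Dict, Set


class MvpnRpl:
    """Toy view of the (C-*,C-G) C-multicast routes seen on one MVPN-RPL."""

    def __init__(self) -> None:
        # C-G -> set of PEs that have originated a (C-*,C-G) route
        self._routes: Dict[str, Set[str]] = {}

    def originate(self, pe: str, c_g: str) -> None:
        """PE has downstream state for C-G and advertises a C-multicast route."""
        self._routes.setdefault(c_g, set()).add(pe)

    def withdraw(self, pe: str, c_g: str) -> None:
        """PE no longer has downstream state for C-G."""
        self._routes.get(c_g, set()).discard(pe)

    def should_send(self, pe: str, c_g: str) -> bool:
        """True only if at least one *other* PE has downstream state for C-G."""
        return bool(self._routes.get(c_g, set()) - {pe})
```

Note that a PE's own route does not count: the condition is about downstream state on some other PE attached to the MVPN-RPL.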
11.1.2. Data Plane
A PE that receives (C-S,C-G) multicast data from a CE should forward this data on the MVPN-RPL of the MVPN the CE belongs to only if the PE receives at least one C-multicast route for (C-*,C-G). Otherwise, the PE should not forward the data on the RPL/I-PMSI.
When a PE receives a multicast packet with (C-S,C-G) on an MVPN-RPL associated with a given MVPN, the PE forwards this packet to every directly connected CE of that MVPN, provided that the CE has sent a Join (C-*,C-G) to the PE (that is, provided that the PE has the downstream (C-*,C-G) state). The PE does not forward this packet back on the MVPN-RPL. If a PE has no downstream (C-*,C-G) state, the PE does not forward the packet.
11.2. Partitioned Sets of PEs
This method does not require the use of the MVPN-RPL, and it does not require the customer to outsource the RPA/RPL functionality to the SP.
11.2.1. Partitions
Consider a particular C-RPA, call it C-R, in a particular MVPN. Consider the set of PEs that attach to sites that have senders or receivers for a BIDIR-PIM group C-G, where C-R is the RPA for C-G. (As always, we use the "C-" prefix to indicate that we are referring to an address in the VPN's address space rather than in the provider's address space.)
Following the procedures of Section 5.1, each PE in the set independently chooses some other PE in the set to be its "Upstream PE" for those BIDIR-PIM groups with RPA C-R. Optionally, they can all choose the "default selection" (described in Section 5.1) to ensure that each PE chooses the same Upstream PE. Note that if a PE has a route to C-R via a VRF interface, then the PE may choose itself as the Upstream PE.
The set of PEs can now be partitioned into a number of subsets. We'll say that PE1 and PE2 are in the same partition if and only if there is some PE3 such that PE1 and PE2 have each chosen PE3 as the Upstream PE for C-R. Note that each partition has exactly one Upstream PE. So it is possible to identify the partition by identifying its Upstream PE.
Consider packet P, and let PE1 be its ingress PE. PE1 will send the packet on a PMSI so that it reaches the other PEs that need to receive it. This is done by encapsulating the packet and sending it
on a P-tunnel. If the original packet is part of a PIM-BIDIR group (its ingress PE determines this from the packet's destination address C-G), and if the VPN backbone is not the RPL, then the encapsulation MUST carry information that can be used to identify the partition to which the ingress PE belongs.
When PE2 receives a packet from the PMSI, PE2 must determine, by examining the encapsulation, whether the packet's ingress PE belongs to the same partition (relative to the C-RPA of the packet's C-G) to which PE2 itself belongs. If not, PE2 discards the packet. Otherwise, PE2 performs the normal BIDIR-PIM data packet processing. With this rule in place, harmful loops cannot be introduced by the PEs into the customer's bidirectional tree.
Note that if there is more than one partition, the VPN backbone will not carry a packet from one partition to another. The only way for a packet to get from one partition to another is for it to go up towards the RPA and then down another path to the backbone. If this is not considered desirable, then all PEs should choose the same Upstream PE for a given C-RPA. Then, multiple partitions will only exist during routing transients.
11.2.2. Using PE Distinguisher Labels
If a given P-tunnel is to be used to carry packets traveling along a bidirectional C-tree, then, EXCEPT for the cases described in Sections 11.1 and 11.2.3, the packets that travel on that P-tunnel MUST carry a PE Distinguisher Label (defined in Section 4), using the encapsulation discussed in Section 12.3.
When a given PE transmits a given packet of a bidirectional C-group to the P-tunnel, the packet will carry the PE Distinguisher Label corresponding to the partition, for the C-group's C-RPA, that contains the transmitting PE. This is the PE Distinguisher Label that has been bound to the Upstream PE of that partition; it is not necessarily the label that has been bound to the transmitting PE.
Recall that the PE Distinguisher Labels are upstream-assigned labels that are assigned and advertised by the node that is at the root of the P-tunnel. The information about PE Distinguisher Labels is distributed with Intra-AS I-PMSI A-D routes and/or S-PMSI A-D routes by encoding it into the PE Distinguisher Labels attribute carried by these routes.
When a PE receives a packet with a PE Distinguisher Label that does not identify the partition of the receiving PE, then the receiving PE discards the packet.
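The partition rule of Section 11.2.1 and the label check above can be sketched together as follows. The function names, the PE names, and the label values are all hypothetical. The sketch assumes a single C-RPA and a single P-tunnel whose root has bound one PE Distinguisher Label to each Upstream PE.

```python
from typing import Dict, Set


def partitions(upstream_choice: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group PEs by the Upstream PE each has chosen for a given C-RPA.

    The result maps each Upstream PE (which identifies a partition)
    to the set of PEs in that partition.
    """
    result: Dict[str, Set[str]] = {}
    for pe, upstream in upstream_choice.items():
        result.setdefault(upstream, set()).add(pe)
    return result


def label_to_transmit(pe: str, upstream_choice: Dict[str, str],
                      labels: Dict[str, int]) -> int:
    """Label bound to the Upstream PE of the sender's partition (not to the sender)."""
    return labels[upstream_choice[pe]]


def accept_packet(receiver: str, packet_label: int,
                  upstream_choice: Dict[str, str],
                  labels: Dict[str, int]) -> bool:
    """Discard (False) unless the label identifies the receiver's own partition."""
    return packet_label == labels[upstream_choice[receiver]]
```

For example, if PE1, PE2, and PE3 have all chosen PE3 while PE4 has chosen itself, a packet sent by PE1 carries PE3's label, is accepted by PE2, and is discarded by PE4.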
Note that this procedure does not necessarily require the root of a P-tunnel to assign a PE Distinguisher Label for every PE that belongs to the tunnel. If the root of the P-tunnel is the only PE that can transmit packets to the P-tunnel, then the root needs to assign PE Distinguisher Labels only for those PEs that the root has selected to be the UMHs for the particular C-RPAs known to the root.
11.2.3. Partial Mesh of MP2MP P-Tunnels
There is one case in which support for BIDIR-PIM C-groups does not require the use of a PE Distinguisher Label. For each C-RPA, suppose a distinct MP2MP LSP is used as the P-tunnel serving that C-RPA's partition. Then, for a given packet, a PE receiving the packet from a P-tunnel can infer the partition from the tunnel. So, PE Distinguisher Labels are not needed in this case.