15. MAC Mobility
It is possible for a given host or end-station (as defined by its MAC address) to move from one Ethernet segment to another; this is referred to as 'MAC Mobility' or 'MAC move', and it is different from the multihoming situation in which a given MAC address is reachable via multiple PEs for the same Ethernet segment.

In a MAC move, there would be two sets of MAC/IP Advertisement routes -- one set with the new Ethernet segment and one set with the previous Ethernet segment -- and the MAC address would appear to be reachable via each of these segments. In order to allow all of the PEs in the EVPN instance to correctly determine the current location of the MAC address, all advertisements of it being reachable via the previous Ethernet segment MUST be withdrawn by the PEs, for the previous Ethernet segment, that had advertised it.

If local learning is performed using the data plane, these PEs will not be able to detect that the MAC address has moved to another Ethernet segment, and the receipt of MAC/IP Advertisement routes, with the MAC Mobility extended community attribute, from other PEs serves as the trigger for these PEs to withdraw their advertisements. If local learning is performed using the control or management planes, these interactions serve as the trigger for these PEs to withdraw their advertisements.

In a situation where there are multiple moves of a given MAC, possibly between the same two Ethernet segments, there may be multiple withdrawals and re-advertisements. In order to ensure that all PEs in the EVPN instance receive all of these correctly through the intervening BGP infrastructure, introducing a sequence number into the MAC Mobility extended community attribute is necessary. In order to process mobility events correctly, an implementation MUST handle scenarios in which sequence number wraparound occurs.
Every MAC mobility event for a given MAC address will contain a sequence number that is set using the following rules:

- A PE advertising a MAC address for the first time advertises it with no MAC Mobility extended community attribute.

- A PE detecting a locally attached MAC address for which it had previously received a MAC/IP Advertisement route with a different Ethernet segment identifier advertises the MAC address in a MAC/IP Advertisement route tagged with a MAC Mobility extended community attribute with a sequence number one greater than the sequence number in the MAC Mobility extended community attribute of the received MAC/IP Advertisement route. In the case of the first mobility event for a given MAC address, where the received MAC/IP Advertisement route does not carry a MAC Mobility extended community attribute, the value of the sequence number in the received route is assumed to be 0 for the purpose of this processing.

- A PE detecting a locally attached MAC address for which it had previously received a MAC/IP Advertisement route with the same non-zero Ethernet segment identifier advertises it with:

  1. no MAC Mobility extended community attribute, if the received route did not carry said attribute.

  2. a MAC Mobility extended community attribute with the sequence number equal to the highest of the sequence number(s) in the received MAC/IP Advertisement route(s), if the received route(s) is (are) tagged with a MAC Mobility extended community attribute.

- A PE detecting a locally attached MAC address for which it had previously received a MAC/IP Advertisement route with the same zero Ethernet segment identifier (single-homed scenarios) advertises it with a MAC Mobility extended community attribute with the sequence number set properly. In the case of single-homed scenarios, there is no need for ESI comparison. ESI comparison is done for multihoming in order to prevent false detection of MAC moves among the PEs attached to the same multihomed site.

A PE receiving a MAC/IP Advertisement route for a MAC address with a different Ethernet segment identifier and a higher sequence number than that which it had previously advertised withdraws its MAC/IP Advertisement route.

If two (or more) PEs advertise the same MAC address with the same sequence number but different Ethernet segment identifiers, a PE that receives these routes selects the route advertised by the PE with the lowest IP address as the best route.
If the PE is the originator of the MAC route and it receives the same MAC address with the same sequence number that it generated, it will compare its own IP address with the IP address of the remote PE and will select the lowest IP. If its own route is not the best one, it will withdraw the route.
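The sequence number rules and the tie-break above can be illustrated with a minimal Python sketch. It is not part of this specification: the MacRoute type and the function names are hypothetical, the single-homed case is assumed to mean "one greater than the highest received sequence number" (the text only says "set properly"), all PE addresses are assumed to be of the same IP family, and sequence number wraparound, which an implementation MUST handle, is deliberately left out.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import List, Optional


@dataclass(frozen=True)
class MacRoute:
    """Hypothetical view of a MAC/IP Advertisement route for one MAC address."""
    pe_ip: str                 # IP address of the advertising PE
    esi: int                   # Ethernet segment identifier (0 = single-homed)
    seq: Optional[int] = None  # None = no MAC Mobility extended community


def effective_seq(route: MacRoute) -> int:
    # A route without the MAC Mobility attribute is treated as sequence 0.
    return 0 if route.seq is None else route.seq


def local_advertisement_seq(local_esi: int,
                            received: List[MacRoute]) -> Optional[int]:
    """Sequence number for a locally learned MAC (None = send no attribute)."""
    if not received:
        return None  # first advertisement of this MAC address
    same_segment = local_esi != 0 and all(r.esi == local_esi for r in received)
    if same_segment:
        # Same multihomed segment: not a move, so reuse the highest number seen.
        tagged = [r.seq for r in received if r.seq is not None]
        return max(tagged) if tagged else None
    # Different segment, or single-homed (assumption): treat as a move.
    return max(effective_seq(r) for r in received) + 1


def select_best(routes: List[MacRoute]) -> MacRoute:
    """Highest sequence number wins; ties go to the lowest PE IP address."""
    top = max(effective_seq(r) for r in routes)
    tied = [r for r in routes if effective_seq(r) == top]
    return min(tied, key=lambda r: ip_address(r.pe_ip))


def should_withdraw_own(own: MacRoute, remote: MacRoute) -> bool:
    """True if the originator should withdraw its own route for this MAC."""
    if own.esi != 0 and remote.esi == own.esi:
        return False  # same multihomed segment: no move detected
    if effective_seq(remote) != effective_seq(own):
        return effective_seq(remote) > effective_seq(own)
    return ip_address(remote.pe_ip) < ip_address(own.pe_ip)
```

Addresses are compared numerically through the ipaddress module rather than as strings, since a string comparison would order 192.0.2.10 before 192.0.2.9.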
15.1. MAC Duplication Issue
A situation may arise where the same MAC address is learned by different PEs in the same VLAN because of two (or more) hosts being misconfigured with the same (duplicate) MAC address. In such a situation, the traffic originating from these hosts would trigger continuous MAC moves among the PEs attached to these hosts. It is important to recognize such a situation and avoid incrementing the sequence number (in the MAC Mobility extended community attribute) to infinity.

In order to remedy such a situation, a PE that detects a MAC mobility event via local learning starts an M-second timer (with a default value of M = 180), and if it detects N MAC moves before the timer expires (with a default value of N = 5), it concludes that a duplicate-MAC situation has occurred. The PE MUST alert the operator and stop sending and processing any BGP MAC/IP Advertisement routes for that MAC address until a corrective action is taken by the operator. The values of M and N MUST be configurable to allow for flexibility in operator control. Note that the other PEs in the EVPN instance will forward the traffic for the duplicate MAC address to one of the PEs advertising the duplicate MAC address.

15.2. Sticky MAC Addresses
There are scenarios in which it is desired to configure some MAC addresses as static so that they are not subjected to MAC moves. In such scenarios, these MAC addresses are advertised with a MAC Mobility extended community where the static flag is set to 1 and the sequence number is set to zero. If a PE receives such advertisements and later learns the same MAC address(es) via local learning, then the PE MUST alert the operator.

16. Multicast and Broadcast
The PEs in a particular EVPN instance may use ingress replication or P2MP LSPs to send multicast traffic to other PEs.

16.1. Ingress Replication
The PEs may use ingress replication for flooding BUM traffic as described in Section 11 ("Handling of Multi-destination Traffic"). A given broadcast packet must be sent to all the remote PEs. However, a given multicast packet for a multicast flow may be sent to only a subset of the PEs. Specifically, a given multicast flow may be sent to only those PEs that have receivers that are interested in the multicast flow. Determining which of the PEs have receivers for a given multicast flow is done using explicit tracking per [RFC7117].
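As an illustration only, the choice of replication targets described above might look like the following Python sketch. The function name and the shape of the tracking table are hypothetical, and falling back to all remote PEs when no tracking state exists for a flow is an assumption of this sketch rather than something stated in this document.

```python
from typing import Dict, Optional, Set


def replication_targets(remote_pes: Set[str],
                        interested: Dict[str, Set[str]],
                        flow: Optional[str] = None) -> Set[str]:
    """Return the remote PEs that receive a copy of an ingress-replicated packet.

    flow is None for a broadcast packet; otherwise it names a multicast flow.
    interested maps a flow to the PEs known (via explicit tracking) to have
    receivers for it.
    """
    if flow is None:
        return set(remote_pes)      # broadcast goes to every remote PE
    if flow not in interested:
        return set(remote_pes)      # assumption: no tracking state, so flood
    # Multicast goes only to the PEs that have interested receivers.
    return remote_pes & interested[flow]
```

For example, with remote PEs {"PE2", "PE3", "PE4"} and receivers for a flow tracked only behind PE3, a packet of that flow is replicated to PE3 alone, while a broadcast packet is replicated to all three.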
16.2. P2MP LSPs
A PE may use an "Inclusive" tree for sending a BUM packet. This terminology is borrowed from [RFC7117].

A variety of transport technologies may be used in the service provider (SP) network. For Inclusive P-multicast trees, these transport technologies include point-to-multipoint LSPs created by RSVP-TE or Multipoint LDP (mLDP).

16.2.1. Inclusive Trees
An Inclusive tree allows the use of a single multicast distribution tree, referred to as an Inclusive P-multicast tree, in the SP network to carry all the multicast traffic from a specified set of EVPN instances on a given PE. A particular P-multicast tree can be set up to carry the traffic originated by sites belonging to a single EVPN instance, or to carry the traffic originated by sites belonging to several EVPN instances. The ability to carry the traffic of more than one EVPN instance on the same tree is termed 'Aggregation', and the tree is called an Aggregate Inclusive P-multicast tree or Aggregate Inclusive tree for short. The Aggregate Inclusive tree needs to include every PE that is a member of any of the EVPN instances that are using the tree. This implies that a PE may receive BUM traffic even if it doesn't have any receivers that are interested in receiving that traffic.

An Inclusive or Aggregate Inclusive tree as defined in this document is a P2MP tree. A P2MP tree is used to carry traffic only for EVPN CEs that are connected to the PE that is the root of the tree.

The procedures for signaling an Inclusive tree are the same as those in [RFC7117], with the VPLS A-D route replaced with the Inclusive Multicast Ethernet Tag route. The P-tunnel attribute [RFC7117] for an Inclusive tree is advertised with the Inclusive Multicast Ethernet Tag route as described in Section 11 ("Handling of Multi-destination Traffic").

Note that for an Aggregate Inclusive tree, a PE can "aggregate" multiple EVPN instances on the same P2MP LSP using upstream labels. The procedures for aggregation are the same as those described in [RFC7117], with VPLS A-D routes replaced by EVPN Inclusive Multicast Ethernet Tag routes.
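The membership requirement for an Aggregate Inclusive tree, and its side effect, can be shown with a small Python sketch. The function names and the membership table are hypothetical and serve only to illustrate the set arithmetic; no signaling is modeled.

```python
from typing import Dict, Iterable, Set


def aggregate_tree_leaves(root_pe: str,
                          evi_members: Dict[str, Set[str]],
                          evis_on_tree: Iterable[str]) -> Set[str]:
    """Leaves of a P2MP tree rooted at root_pe that aggregates several EVIs."""
    leaves: Set[str] = set()
    for evi in evis_on_tree:
        leaves |= evi_members[evi]   # every member PE of every aggregated EVI
    leaves.discard(root_pe)          # the root is not a leaf of its own tree
    return leaves


def unwanted_receivers(evi: str,
                       leaves: Set[str],
                       evi_members: Dict[str, Set[str]]) -> Set[str]:
    """Leaves that receive traffic of an EVI they are not a member of."""
    return leaves - evi_members[evi]
```

With EVI "evi1" on PE1 and PE2 and EVI "evi2" on PE1 and PE3, a tree rooted at PE1 that aggregates both has leaves PE2 and PE3, and PE3 receives BUM traffic for "evi1" although it is not a member of that instance.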
17. Convergence
This section describes failure recovery from different types of network failures.

17.1. Transit Link and Node Failures between PEs
The use of existing MPLS fast-reroute mechanisms can provide failure recovery on the order of 50 ms, in the event of transit link and node failures in the infrastructure that connects the PEs.

17.2. PE Failures
Consider a host CE1 that is dual-homed to PE1 and PE2. If PE1 fails, a remote PE, PE3, can discover this based on the failure of the BGP session. This failure detection can be in the sub-second range if Bidirectional Forwarding Detection (BFD) is used to detect BGP session failures. PE3 can update its forwarding state to start sending all traffic for CE1 to only PE2.

17.3. PE-to-CE Network Failures
If the connectivity between the multihomed CE and one of the PEs to which it is attached fails, the PE MUST withdraw the set of Ethernet A-D per ES routes that had been previously advertised for that ES. This enables the remote PEs to remove the MPLS next hop to this particular PE from the set of MPLS next hops that can be used to forward traffic to the CE.

When the MAC entry on the PE ages out, the PE MUST withdraw the MAC address from BGP.

When an Ethernet tag is decommissioned on an Ethernet segment, then the PE MUST withdraw the Ethernet A-D per EVI route(s) announced for the <ESI, Ethernet tags> that are impacted by the decommissioning. In addition, the PE MUST also withdraw the MAC/IP Advertisement routes that are impacted by the decommissioning.

The Ethernet A-D per ES routes should be used by an implementation to optimize the withdrawal of MAC/IP Advertisement routes. When a PE receives a withdrawal of a particular Ethernet A-D route from an advertising PE, it SHOULD consider all the MAC/IP Advertisement routes that are learned from the same ESI as in the Ethernet A-D route from the advertising PE as having been withdrawn. This optimizes the network convergence times in the event of PE-to-CE failures.
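The withdrawal optimization in the last paragraph can be sketched in Python as follows. The RemoteMacTable class is hypothetical and models only a remote PE's bookkeeping; indexing MAC addresses by (advertising PE, ESI) is a design choice of this sketch that makes the mass invalidation a single lookup.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class RemoteMacTable:
    """MAC addresses learned via BGP, indexed by (advertising PE, ESI)."""

    def __init__(self) -> None:
        self._by_segment: Dict[Tuple[str, int], Set[str]] = defaultdict(set)

    def learn(self, pe: str, esi: int, mac: str) -> None:
        # Record a MAC/IP Advertisement route received from pe for esi.
        self._by_segment[(pe, esi)].add(mac)

    def on_ad_per_es_withdraw(self, pe: str, esi: int) -> Set[str]:
        """Treat every MAC learned from (pe, esi) as withdrawn at once.

        Returns the affected MAC addresses so that forwarding state can be
        updated without waiting for the per-MAC withdrawals to arrive.
        """
        return self._by_segment.pop((pe, esi), set())

    def reachable_via(self, mac: str) -> Set[str]:
        # PEs that are still valid next hops for this MAC address.
        return {pe for (pe, _), macs in self._by_segment.items() if mac in macs}
```

For a MAC address learned from both PE1 and PE2 on the same ESI, a withdrawal of PE1's Ethernet A-D per ES route leaves PE2 as the only next hop immediately.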
18. Frame Ordering
If the value of the first nibble (bits 8 through 5) of the most significant octet of the destination MAC address (which follows the last MPLS label) happens to be 0x4 or 0x6, then the Ethernet frame can be misinterpreted as an IPv4 or IPv6 packet by intermediate P nodes performing ECMP based on deep packet inspection, thus resulting in load balancing packets belonging to the same flow on different ECMP paths and subjecting those packets to different delays. Therefore, packets belonging to the same flow can arrive at the destination out of order. This out-of-order delivery can happen during steady state in the absence of any failures, resulting in significant impact on network operations.

In order to avoid any such misordering, the following rules are applied:

- If a network uses deep packet inspection for its ECMP, then the "Preferred PW MPLS Control Word" [RFC4385] SHOULD be used with the value 0 (e.g., a 4-octet field with a value of zero) when sending EVPN-encapsulated packets over an MP2P LSP.

- If a network uses entropy labels [RFC6790], then the control word SHOULD NOT be used when sending EVPN-encapsulated packets over an MP2P LSP.

- When sending EVPN-encapsulated packets over a P2MP LSP or P2P LSP, then the control word SHOULD NOT be used.

19. Security Considerations
Security considerations discussed in [RFC4761] and [RFC4762] apply to this document for MAC learning in the data plane over an Attachment Circuit (AC) and for flooding of unknown unicast and ARP messages over the MPLS/IP core. Security considerations discussed in [RFC4364] apply to this document for MAC learning in the control plane over the MPLS/IP core. This section describes additional considerations.

As mentioned in [RFC4761], there are two aspects to achieving data privacy and protecting against denial-of-service attacks in a VPN: securing the control plane and protecting the forwarding path. Compromise of the control plane could result in a PE sending customer data belonging to some EVPN to another EVPN, or black-holing EVPN customer data, or even sending it to an eavesdropper, none of which are acceptable from a data privacy point of view. In addition, compromise of the control plane could provide opportunities for unauthorized EVPN data usage (e.g., exploiting traffic replication within a multicast tree to amplify a denial-of-service attack based on sending large amounts of traffic).

The mechanisms in this document use BGP for the control plane. Hence, techniques such as those discussed in [RFC5925] help authenticate BGP messages, making it harder to spoof updates (which can be used to divert EVPN traffic to the wrong EVPN instance) or withdrawals (denial-of-service attacks). In the multi-AS backbone options (b) and (c) [RFC4364], this also means protecting the inter-AS BGP sessions between the Autonomous System Border Routers (ASBRs), the PEs, or the Route Reflectors.

Further discussion of security considerations for BGP may be found in the BGP specification itself [RFC4271] and in the security analysis for BGP [RFC4272]. The original discussion of the use of the TCP MD5 signature option to protect BGP sessions is found in [RFC5925], while [RFC6952] includes an analysis of BGP keying and authentication issues.

Note that [RFC5925] will not help in keeping MPLS labels private -- knowing the labels, one can eavesdrop on EVPN traffic. Such eavesdropping additionally requires access to the data path within an SP network. Users of VPN services are expected to take appropriate precautions (such as encryption) to protect the data exchanged over a VPN.

One of the requirements for protecting the data plane is that the MPLS labels be accepted only from valid interfaces. For a PE, valid interfaces comprise links from other routers in the PE's own AS. For an ASBR, valid interfaces comprise links from other routers in the ASBR's own AS, and links from other ASBRs in ASes that have instances of a given EVPN. It is especially important in the case of multi-AS EVPN instances that one accept EVPN packets only from valid interfaces.

It is also important to help limit malicious traffic into a network for an impostor MAC address.
The mechanism described in Section 15.1 shows how duplicate MAC addresses can be detected and continuous false MAC mobility can be prevented. The mechanism described in Section 15.2 shows how MAC addresses can be pinned to a given Ethernet segment, such that if they appear behind any other Ethernet segments, the traffic for those MAC addresses can be prevented from entering the EVPN network from the other Ethernet segments.
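The duplicate-MAC detection of Section 15.1, referred to above, can be sketched in Python as follows. The class and method names are hypothetical, timestamps are plain seconds supplied by the caller, and counting the move that starts the timer as the first of the N moves is an assumption of this sketch; alerting the operator and suppressing BGP routes are represented only by the frozen set.

```python
from typing import Dict, Set, Tuple


class DuplicateMacDetector:
    """Counts locally detected MAC moves against an M-second timer."""

    def __init__(self, timer_s: float = 180.0, max_moves: int = 5) -> None:
        self.timer_s = timer_s        # M, configurable (default 180)
        self.max_moves = max_moves    # N, configurable (default 5)
        self._state: Dict[str, Tuple[float, int]] = {}  # mac -> (start, count)
        self.frozen: Set[str] = set() # MACs awaiting operator action

    def record_move(self, mac: str, now: float) -> bool:
        """Record a move detected via local learning.

        Returns True once the MAC is considered a duplicate, meaning that
        MAC/IP Advertisement routes for it are no longer sent or processed.
        """
        if mac in self.frozen:
            return True
        start, count = self._state.get(mac, (now, 0))
        if now - start >= self.timer_s:
            start, count = now, 0     # timer expired: start a new one
        count += 1
        self._state[mac] = (start, count)
        if count >= self.max_moves:
            self.frozen.add(mac)      # here the operator would be alerted
            return True
        return False

    def clear(self, mac: str) -> None:
        # Corrective action by the operator: resume normal processing.
        self.frozen.discard(mac)
        self._state.pop(mac, None)
```

With the default values, five moves at 10-second intervals freeze the MAC address on the fifth move, whereas moves spread over more than 180 seconds restart the count.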
20. IANA Considerations
This document defines a new NLRI, called "EVPN", to be carried in BGP using multiprotocol extensions. This NLRI uses the existing AFI of 25 (L2VPN). IANA has assigned BGP EVPNs a SAFI value of 70.

IANA has allocated the following EVPN Extended Community sub-types in [RFC7153], and this document is the only reference for them.

   0x00     MAC Mobility                 [RFC7432]
   0x01     ESI Label                    [RFC7432]
   0x02     ES-Import Route Target       [RFC7432]

This document creates a registry called "EVPN Route Types". New registrations will be made through the "RFC Required" procedure defined in [RFC5226]. The registry has a maximum value of 255. Initial registrations are as follows:

   0     Reserved                            [RFC7432]
   1     Ethernet Auto-discovery             [RFC7432]
   2     MAC/IP Advertisement                [RFC7432]
   3     Inclusive Multicast Ethernet Tag    [RFC7432]
   4     Ethernet Segment                    [RFC7432]

21. References
21.1. Normative References
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.

[RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006, <http://www.rfc-editor.org/info/rfc4271>.

[RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, February 2006, <http://www.rfc-editor.org/info/rfc4360>.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006, <http://www.rfc-editor.org/info/rfc4364>.

[RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007, <http://www.rfc-editor.org/info/rfc4760>.

[RFC4761]  Kompella, K., Ed., and Y. Rekhter, Ed., "Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, January 2007, <http://www.rfc-editor.org/info/rfc4761>.

[RFC4762]  Lasserre, M., Ed., and V. Kompella, Ed., "Virtual Private LAN Service (VPLS) Using Label Distribution Protocol (LDP) Signaling", RFC 4762, January 2007, <http://www.rfc-editor.org/info/rfc4762>.

[RFC7153]  Rosen, E. and Y. Rekhter, "IANA Registries for BGP Extended Communities", RFC 7153, March 2014, <http://www.rfc-editor.org/info/rfc7153>.

21.2. Informative References
[802.1D-REV]  "IEEE Standard for Local and metropolitan area networks - Media Access Control (MAC) Bridges", IEEE Std. 802.1D, June 2004.

[802.1Q]   "IEEE Standard for Local and metropolitan area networks - Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Std 802.1Q(tm), 2014 Edition, November 2014.

[RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis", RFC 4272, January 2006, <http://www.rfc-editor.org/info/rfc4272>.

[RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson, "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use over an MPLS PSN", RFC 4385, February 2006, <http://www.rfc-editor.org/info/rfc4385>.

[RFC4664]  Andersson, L., Ed., and E. Rosen, Ed., "Framework for Layer 2 Virtual Private Networks (L2VPNs)", RFC 4664, September 2006, <http://www.rfc-editor.org/info/rfc4664>.

[RFC4684]  Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk, R., Patel, K., and J. Guichard, "Constrained Route Distribution for Border Gateway Protocol/MultiProtocol Label Switching (BGP/MPLS) Internet Protocol (IP) Virtual Private Networks (VPNs)", RFC 4684, November 2006, <http://www.rfc-editor.org/info/rfc4684>.

[RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008, <http://www.rfc-editor.org/info/rfc5226>.

[RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP Authentication Option", RFC 5925, June 2010, <http://www.rfc-editor.org/info/rfc5925>.

[RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", RFC 6514, February 2012, <http://www.rfc-editor.org/info/rfc6514>.

[RFC6790]  Kompella, K., Drake, J., Amante, S., Henderickx, W., and L. Yong, "The Use of Entropy Labels in MPLS Forwarding", RFC 6790, November 2012, <http://www.rfc-editor.org/info/rfc6790>.

[RFC6952]  Jethanandani, M., Patel, K., and L. Zheng, "Analysis of BGP, LDP, PCEP, and MSDP Issues According to the Keying and Authentication for Routing Protocols (KARP) Design Guide", RFC 6952, May 2013, <http://www.rfc-editor.org/info/rfc6952>.

[RFC7117]  Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and C. Kodeboniya, "Multicast in Virtual Private LAN Service (VPLS)", RFC 7117, February 2014, <http://www.rfc-editor.org/info/rfc7117>.

[RFC7209]  Sajassi, A., Aggarwal, R., Uttaro, J., Bitar, N., Henderickx, W., and A. Isaac, "Requirements for Ethernet VPN (EVPN)", RFC 7209, May 2014, <http://www.rfc-editor.org/info/rfc7209>.
Acknowledgements
Special thanks to Yakov Rekhter for reviewing this document several times and providing valuable comments, and for his very engaging discussions on several topics of this document that helped shape this document. We would also like to thank Pedro Marques, Kaushik Ghosh, Nischal Sheth, Robert Raszuk, Amit Shukla, and Nadeem Mohammed for discussions that helped shape this document. We would also like to thank Han Nguyen for his comments and support of this work. We would also like to thank Steve Kensil and Reshad Rahman for their reviews. We would like to thank Jorge Rabadan for his contribution to Section 5 of this document. We would like to thank Thomas Morin for his review of this document and his contribution of Section 8.6. Many thanks to Jakob Heitz for his help to improve several sections of this document. We would also like to thank Clarence Filsfils, Dennis Cai, Quaizar Vohra, Kireeti Kompella, and Apurva Mehta for their contributions to this document.

Last but not least, special thanks to Giles Heron (our WG chair) for his detailed review of this document in preparation for WG Last Call and for making many valuable suggestions.

Contributors
In addition to the authors listed on the front page, the following co-authors have also contributed to this document:

   Keyur Patel
   Samer Salam
   Sami Boutros
   Cisco

   Yakov Rekhter
   Ravi Shekhar
   Juniper Networks

   Florin Balus
   Nuage Networks
Authors' Addresses
   Ali Sajassi (editor)
   Cisco
   EMail: sajassi@cisco.com

   Rahul Aggarwal
   Arktan
   EMail: raggarwa_1@yahoo.com

   Nabil Bitar
   Verizon Communications
   EMail: nabil.n.bitar@verizon.com

   Aldrin Isaac
   Bloomberg
   EMail: aisaac71@bloomberg.net

   James Uttaro
   AT&T
   EMail: uttaro@att.com

   John Drake
   Juniper Networks
   EMail: jdrake@juniper.net

   Wim Henderickx
   Alcatel-Lucent
   EMail: wim.henderickx@alcatel-lucent.com