Internet Engineering Task Force (IETF)                   A. Sajassi, Ed.
Request for Comments: 7432                                         Cisco
Category: Standards Track                                    R. Aggarwal
ISSN: 2070-1721                                                   Arktan
                                                                N. Bitar
                                                                 Verizon
                                                                A. Isaac
                                                               Bloomberg
                                                               J. Uttaro
                                                                    AT&T
                                                                J. Drake
                                                        Juniper Networks
                                                           W. Henderickx
                                                          Alcatel-Lucent
                                                           February 2015

                      BGP MPLS-Based Ethernet VPN

Abstract
This document describes procedures for BGP MPLS-based Ethernet VPNs (EVPN). The procedures described here meet the requirements specified in RFC 7209 -- "Requirements for Ethernet VPN (EVPN)".

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7432.
Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents
1. Introduction
2. Specification of Requirements
3. Terminology
4. BGP MPLS-Based EVPN Overview
5. Ethernet Segment
6. Ethernet Tag ID
   6.1. VLAN-Based Service Interface
   6.2. VLAN Bundle Service Interface
      6.2.1. Port-Based Service Interface
   6.3. VLAN-Aware Bundle Service Interface
      6.3.1. Port-Based VLAN-Aware Service Interface
7. BGP EVPN Routes
   7.1. Ethernet Auto-discovery Route
   7.2. MAC/IP Advertisement Route
   7.3. Inclusive Multicast Ethernet Tag Route
   7.4. Ethernet Segment Route
   7.5. ESI Label Extended Community
   7.6. ES-Import Route Target
   7.7. MAC Mobility Extended Community
   7.8. Default Gateway Extended Community
   7.9. Route Distinguisher Assignment per EVI
   7.10. Route Targets
      7.10.1. Auto-derivation from the Ethernet Tag ID
8. Multihoming Functions
   8.1. Multihomed Ethernet Segment Auto-discovery
      8.1.1. Constructing the Ethernet Segment Route
   8.2. Fast Convergence
      8.2.1. Constructing Ethernet A-D per Ethernet Segment Route
         8.2.1.1. Ethernet A-D Route Targets
   8.3. Split Horizon
      8.3.1. ESI Label Assignment
         8.3.1.1. Ingress Replication
         8.3.1.2. P2MP MPLS LSPs
   8.4. Aliasing and Backup Path
      8.4.1. Constructing Ethernet A-D per EVPN Instance Route
   8.5. Designated Forwarder Election
   8.6. Interoperability with Single-Homing PEs
9. Determining Reachability to Unicast MAC Addresses
   9.1. Local Learning
   9.2. Remote Learning
      9.2.1. Constructing MAC/IP Address Advertisement
      9.2.2. Route Resolution
10. ARP and ND
   10.1. Default Gateway
11. Handling of Multi-destination Traffic
   11.1. Constructing Inclusive Multicast Ethernet Tag Route
   11.2. P-Tunnel Identification
12. Processing of Unknown Unicast Packets
   12.1. Ingress Replication
   12.2. P2MP MPLS LSPs
13. Forwarding Unicast Packets
   13.1. Forwarding Packets Received from a CE
   13.2. Forwarding Packets Received from a Remote PE
      13.2.1. Unknown Unicast Forwarding
      13.2.2. Known Unicast Forwarding
14. Load Balancing of Unicast Packets
   14.1. Load Balancing of Traffic from a PE to Remote CEs
      14.1.1. Single-Active Redundancy Mode
      14.1.2. All-Active Redundancy Mode
   14.2. Load Balancing of Traffic between a PE and a Local CE
      14.2.1. Data-Plane Learning
      14.2.2. Control-Plane Learning
15. MAC Mobility
   15.1. MAC Duplication Issue
   15.2. Sticky MAC Addresses
16. Multicast and Broadcast
   16.1. Ingress Replication
   16.2. P2MP LSPs
      16.2.1. Inclusive Trees
17. Convergence
   17.1. Transit Link and Node Failures between PEs
   17.2. PE Failures
   17.3. PE-to-CE Network Failures
18. Frame Ordering
19. Security Considerations
20. IANA Considerations
21. References
   21.1. Normative References
   21.2. Informative References
Acknowledgements
Contributors
Authors' Addresses

1. Introduction
Virtual Private LAN Service (VPLS), as defined in [RFC4664], [RFC4761], and [RFC4762], is a proven and widely deployed technology. However, the existing solution has a number of limitations when it comes to multihoming and redundancy, multicast optimization, provisioning simplicity, flow-based load balancing, and multipathing; these limitations are important considerations for Data Center (DC) deployments. [RFC7209] describes the motivation for a new solution to address these limitations. It also outlines a set of requirements that the new solution must address.

This document describes procedures for a BGP MPLS-based solution called Ethernet VPN (EVPN) to address the requirements specified in [RFC7209]. Please refer to [RFC7209] for the detailed requirements and motivation. EVPN requires extensions to existing IP/MPLS protocols as described in this document. In addition to these extensions, EVPN uses several building blocks from existing MPLS technologies.

2. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Terminology
Broadcast Domain: In a bridged network, the broadcast domain corresponds to a Virtual LAN (VLAN), where a VLAN is typically represented by a single VLAN ID (VID) but can be represented by several VIDs where Shared VLAN Learning (SVL) is used per [802.1Q].

Bridge Table: An instantiation of a broadcast domain on a MAC-VRF.

CE: Customer Edge device, e.g., a host, router, or switch.

EVI: An EVPN instance spanning the Provider Edge (PE) devices participating in that EVPN.

MAC-VRF: A Virtual Routing and Forwarding table for Media Access Control (MAC) addresses on a PE.

Ethernet Segment (ES): When a customer site (device or network) is connected to one or more PEs via a set of Ethernet links, then that set of links is referred to as an 'Ethernet segment'.

Ethernet Segment Identifier (ESI): A unique non-zero identifier that identifies an Ethernet segment is called an 'Ethernet Segment Identifier'.

Ethernet Tag: An Ethernet tag identifies a particular broadcast domain, e.g., a VLAN. An EVPN instance consists of one or more broadcast domains.

LACP: Link Aggregation Control Protocol.

MP2MP: Multipoint to Multipoint.

MP2P: Multipoint to Point.

P2MP: Point to Multipoint.

P2P: Point to Point.

PE: Provider Edge device.

Single-Active Redundancy Mode: When only a single PE, among all the PEs attached to an Ethernet segment, is allowed to forward traffic to/from that Ethernet segment for a given VLAN, then the Ethernet segment is defined to be operating in Single-Active redundancy mode.

All-Active Redundancy Mode: When all PEs attached to an Ethernet segment are allowed to forward known unicast traffic to/from that Ethernet segment for a given VLAN, then the Ethernet segment is defined to be operating in All-Active redundancy mode.

4. BGP MPLS-Based EVPN Overview
This section provides an overview of EVPN. An EVPN instance comprises Customer Edge devices (CEs) that are connected to Provider Edge devices (PEs) that form the edge of the MPLS infrastructure. A CE may be a host, a router, or a switch. The PEs provide virtual Layer 2 bridged connectivity between the CEs. There may be multiple EVPN instances in the provider's network.

The PEs may be connected by an MPLS Label Switched Path (LSP) infrastructure, which provides the benefits of MPLS technology, such as fast reroute, resiliency, etc. The PEs may also be connected by an IP infrastructure, in which case IP/GRE (Generic Routing Encapsulation) tunneling or other IP tunneling can be used between the PEs. The detailed procedures in this document are specified only for MPLS LSPs as the tunneling technology. However, these procedures are designed to be extensible to IP tunneling as the Packet Switched Network (PSN) tunneling technology.

In an EVPN, MAC learning between PEs occurs not in the data plane (as happens with traditional bridging in VPLS [RFC4761] [RFC4762]) but in the control plane. Control-plane learning offers greater control over the MAC learning process, such as restricting who learns what, and the ability to apply policies. Furthermore, the control plane chosen for advertising MAC reachability information is multi-protocol (MP) BGP (similar to IP VPNs [RFC4364]). This provides flexibility and the ability to preserve the "virtualization" or isolation of groups of interacting agents (hosts, servers, virtual machines) from each other.

In EVPN, PEs advertise the MAC addresses learned from the CEs that are connected to them, along with an MPLS label, to other PEs in the control plane using Multiprotocol BGP (MP-BGP). Control-plane learning enables load balancing of traffic to and from CEs that are multihomed to multiple PEs. This is in addition to load balancing across the MPLS core via multiple LSPs between the same pair of PEs. In other words, it allows CEs to connect to multiple active points of attachment. It also improves convergence times in the event of certain network failures.

However, learning between PEs and CEs is done by the method best suited to the CE: data-plane learning, IEEE 802.1x, the Link Layer Discovery Protocol (LLDP), IEEE 802.1aq, Address Resolution Protocol (ARP), management plane, or other protocols.

It is a local decision as to whether the Layer 2 forwarding table on a PE is populated with all the MAC destination addresses known to the control plane, or whether the PE implements a cache-based scheme. For instance, the MAC forwarding table may be populated only with the MAC destinations of the active flows transiting a specific PE.

The policy attributes of EVPN are very similar to those of IP-VPN. An EVPN instance requires a Route Distinguisher (RD) that is unique per MAC-VRF and one or more globally unique Route Targets (RTs). A CE attaches to a MAC-VRF on a PE, on an Ethernet interface that may be configured for one or more Ethernet tags, e.g., VLAN IDs. Some deployment scenarios guarantee uniqueness of VLAN IDs across EVPN instances: all points of attachment for a given EVPN instance use the same VLAN ID, and no other EVPN instance uses this VLAN ID. This document refers to this case as a "Unique VLAN EVPN" and describes simplified procedures to optimize for it.

5. Ethernet Segment
As indicated in [RFC7209], each Ethernet segment needs a unique identifier in an EVPN. This section defines how such identifiers are assigned and how they are encoded for use in EVPN signaling. Later sections of this document describe the protocol mechanisms that utilize the identifiers.

When a customer site is connected to one or more PEs via a set of Ethernet links, then this set of Ethernet links constitutes an "Ethernet segment". For a multihomed site, each Ethernet segment (ES) is identified by a unique non-zero identifier called an Ethernet Segment Identifier (ESI). An ESI is encoded as a 10-octet integer in line format with the most significant octet sent first. The following two ESI values are reserved:

- ESI 0 denotes a single-homed site.

- ESI {0xFF} (repeated 10 times) is known as MAX-ESI and is reserved.

In general, an Ethernet segment SHOULD have a non-reserved ESI that is unique network wide (i.e., across all EVPN instances on all the PEs). If the CE(s) constituting an Ethernet segment is (are) managed by the network operator, then ESI uniqueness should be guaranteed; however, if the CE(s) is (are) not managed, then the operator MUST configure a network-wide unique ESI for that Ethernet segment. This is required to enable auto-discovery of Ethernet segments and Designated Forwarder (DF) election.
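
As an illustration only, the 10-octet encoding and the two reserved values described above can be sketched in Python. The names ESI_LEN, encode_esi, decode_esi, and is_reserved_esi are hypothetical and chosen for this sketch; they are not defined by this document.

```python
ESI_LEN = 10  # an ESI is a 10-octet integer

# The two reserved values named in the text.
SINGLE_HOMED_ESI = bytes(ESI_LEN)   # ESI 0: single-homed site
MAX_ESI = b"\xff" * ESI_LEN         # 0xFF repeated 10 times


def encode_esi(value: int) -> bytes:
    """Encode an integer ESI with the most significant octet first."""
    # int.to_bytes raises OverflowError if the value does not fit.
    return value.to_bytes(ESI_LEN, byteorder="big")


def decode_esi(wire: bytes) -> int:
    """Decode a 10-octet ESI received in line format."""
    if len(wire) != ESI_LEN:
        raise ValueError("an ESI must be exactly 10 octets")
    return int.from_bytes(wire, byteorder="big")


def is_reserved_esi(wire: bytes) -> bool:
    """True for ESI 0 and MAX-ESI, which cannot identify a multihomed ES."""
    return wire in (SINGLE_HOMED_ESI, MAX_ESI)
```

A multihomed Ethernet segment would therefore carry an ESI for which is_reserved_esi returns False. Network-wide uniqueness is a provisioning property and cannot be checked from a single value.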

In a network with managed and non-managed CEs, the ESI has the following format:

   +---+---+---+---+---+---+---+---+---+---+
   | T |          ESI Value                |
   +---+---+---+---+---+---+---+---+---+---+

Where:

T (ESI Type) is a 1-octet field (most significant octet) that specifies the format of the remaining 9 octets (ESI Value). The following six ESI types can be used:

- Type 0 (T=0x00) - This type indicates an arbitrary 9-octet ESI value, which is managed and configured by the operator.

- Type 1 (T=0x01) - When IEEE 802.1AX LACP is used between the PEs and CEs, this ESI type indicates an auto-generated ESI value determined from LACP by concatenating the following parameters:

   + CE LACP System MAC address (6 octets). The CE LACP System MAC address MUST be encoded in the high-order 6 octets of the ESI Value field.

   + CE LACP Port Key (2 octets). The CE LACP port key MUST be encoded in the 2 octets next to the System MAC address.

   + The remaining octet will be set to 0x00.

  As far as the CE is concerned, it would treat the multiple PEs that it is connected to as the same switch. This allows the CE to aggregate links that are attached to different PEs in the same bundle.

  This mechanism could be used only if it produces ESIs that satisfy the uniqueness requirement specified above.

- Type 2 (T=0x02) - This type is used in the case of indirectly connected hosts via a bridged LAN between the CEs and the PEs. The ESI Value is auto-generated and determined based on the Layer 2 bridge protocol as follows: If the Multiple Spanning Tree Protocol (MSTP) is used in the bridged LAN, then the value of the ESI is derived by listening to Bridge PDUs (BPDUs) on the Ethernet segment. To achieve this, the PE is not required to run MSTP. However, the PE must learn the Root Bridge MAC address and Bridge Priority of the root of the Internal Spanning Tree (IST) by listening to the BPDUs. The ESI Value is constructed as follows:

   + Root Bridge MAC address (6 octets). The Root Bridge MAC address MUST be encoded in the high-order 6 octets of the ESI Value field.

   + Root Bridge Priority (2 octets). The CE Root Bridge Priority MUST be encoded in the 2 octets next to the Root Bridge MAC address.

   + The remaining octet will be set to 0x00.

  This mechanism could be used only if it produces ESIs that satisfy the uniqueness requirement specified above.

- Type 3 (T=0x03) - This type indicates a MAC-based ESI Value that can be auto-generated or configured by the operator. The ESI Value is constructed as follows:

   + System MAC address (6 octets). The PE MAC address MUST be encoded in the high-order 6 octets of the ESI Value field.

   + Local Discriminator value (3 octets). The Local Discriminator value MUST be encoded in the low-order 3 octets of the ESI Value.

  This mechanism could be used only if it produces ESIs that satisfy the uniqueness requirement specified above.

- Type 4 (T=0x04) - This type indicates a router-ID ESI Value that can be auto-generated or configured by the operator. The ESI Value is constructed as follows:

   + Router ID (4 octets). The system router ID MUST be encoded in the high-order 4 octets of the ESI Value field.

   + Local Discriminator value (4 octets). The Local Discriminator value MUST be encoded in the 4 octets next to the IP address.

   + The low-order octet of the ESI Value will be set to 0x00.

  This mechanism could be used only if it produces ESIs that satisfy the uniqueness requirement specified above.

- Type 5 (T=0x05) - This type indicates an Autonomous System (AS)-based ESI Value that can be auto-generated or configured by the operator. The ESI Value is constructed as follows:

   + AS number (4 octets). This is an AS number owned by the system and MUST be encoded in the high-order 4 octets of the ESI Value field. If a 2-octet AS number is used, the high-order extra 2 octets will be 0x0000.

   + Local Discriminator value (4 octets). The Local Discriminator value MUST be encoded in the 4 octets next to the AS number.

   + The low-order octet of the ESI Value will be set to 0x00.

  This mechanism could be used only if it produces ESIs that satisfy the uniqueness requirement specified above.

6. Ethernet Tag ID
An Ethernet Tag ID is a 32-bit field containing either a 12-bit or 24-bit identifier that identifies a particular broadcast domain (e.g., a VLAN) in an EVPN instance. The 12-bit identifier is called the VLAN ID (VID). An EVPN instance consists of one or more broadcast domains (one or more VLANs). VLANs are assigned to a given EVPN instance by the provider of the EVPN service. A given VLAN can itself be represented by multiple VIDs. In such cases, the PEs participating in that VLAN for a given EVPN instance are responsible for performing VLAN ID translation to/from locally attached CE devices.

If a VLAN is represented by a single VID across all PE devices participating in that VLAN for that EVPN instance, then there is no need for VID translation at the PEs. Furthermore, some deployment scenarios guarantee uniqueness of VIDs across all EVPN instances; all points of attachment for a given EVPN instance use the same VID, and no other EVPN instances use that VID. This allows the RT(s) for each EVPN instance to be derived automatically from the corresponding VID, as described in Section 7.10.1.

The following subsections discuss the relationship between broadcast domains (e.g., VLANs), Ethernet Tag IDs (e.g., VIDs), and MAC-VRFs as well as the setting of the Ethernet Tag ID, in the various EVPN BGP routes (defined in Section 7), for the different types of service interfaces described in [RFC7209].
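
The following is a minimal Python sketch of carrying a 12-bit or 24-bit identifier in the 32-bit Ethernet Tag ID field. It assumes the identifier occupies the low-order bits of the field and that the field is sent in network byte order. Both are assumptions of this sketch, as are the function names, since the text above states only the field and identifier widths.

```python
import struct


def encode_ethernet_tag(identifier: int, width_bits: int = 12) -> bytes:
    """Pack a 12-bit or 24-bit identifier into a 4-octet Ethernet Tag ID.

    Assumption: the identifier sits in the low-order bits and the
    field is transmitted in network byte order.
    """
    if width_bits not in (12, 24):
        raise ValueError("identifier width must be 12 or 24 bits")
    if not 0 <= identifier < (1 << width_bits):
        raise ValueError("identifier does not fit in the stated width")
    return struct.pack("!I", identifier)


def decode_ethernet_tag(wire: bytes) -> int:
    """Unpack a 4-octet Ethernet Tag ID into its integer value."""
    if len(wire) != 4:
        raise ValueError("an Ethernet Tag ID must be exactly 4 octets")
    (value,) = struct.unpack("!I", wire)
    return value
```

For example, encode_ethernet_tag(100) yields the four octets 00 00 00 64 under the stated assumptions.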

The following Ethernet Tag ID value is reserved:

- Ethernet Tag ID {0xFFFFFFFF} is known as MAX-ET.

6.1. VLAN-Based Service Interface
With this service interface, an EVPN instance consists of only a single broadcast domain (e.g., a single VLAN). Therefore, there is a one-to-one mapping between a VID on this interface and a MAC-VRF. Since a MAC-VRF corresponds to a single VLAN, it consists of a single bridge table corresponding to that VLAN.

If the VLAN is represented by multiple VIDs (e.g., a different VID per Ethernet segment per PE), then each PE needs to perform VID translation for frames destined to its Ethernet segment(s). In such scenarios, the Ethernet frames transported over an MPLS/IP network SHOULD remain tagged with the originating VID, and a VID translation MUST be supported in the data path and MUST be performed on the disposition PE. The Ethernet Tag ID in all EVPN routes MUST be set to 0.

6.2. VLAN Bundle Service Interface
With this service interface, an EVPN instance corresponds to multiple broadcast domains (e.g., multiple VLANs); however, only a single bridge table is maintained per MAC-VRF, which means multiple VLANs share the same bridge table. This implies that MAC addresses MUST be unique across all VLANs for that EVI in order for this service to work. In other words, there is a many-to-one mapping between VLANs and a MAC-VRF, and the MAC-VRF consists of a single bridge table. Furthermore, a single VLAN must be represented by a single VID -- i.e., no VID translation is allowed for this service interface type. The MPLS-encapsulated frames MUST remain tagged with the originating VID. Tag translation is NOT permitted. The Ethernet Tag ID in all EVPN routes MUST be set to 0.

6.2.1. Port-Based Service Interface
This service interface is a special case of the VLAN bundle service interface, where all of the VLANs on the port are part of the same service and map to the same bundle. The procedures are identical to those described in Section 6.2.

6.3. VLAN-Aware Bundle Service Interface
With this service interface, an EVPN instance consists of multiple broadcast domains (e.g., multiple VLANs) with each VLAN having its own bridge table -- i.e., multiple bridge tables (one per VLAN) are maintained by a single MAC-VRF corresponding to the EVPN instance.
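
A rough Python sketch of the data structure implied above, a single MAC-VRF holding one bridge table per VLAN, is shown here. The class name VlanAwareMacVrf, its methods, and the string used to stand in for forwarding information are hypothetical and serve only this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VlanAwareMacVrf:
    """A MAC-VRF that maintains one bridge table per VLAN."""

    # Outer key: VID. Inner table: MAC address -> forwarding information
    # (a plain string here, purely as a placeholder).
    bridge_tables: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def learn(self, vid: int, mac: str, forwarding_info: str) -> None:
        """Install a MAC entry in the bridge table of the given VLAN."""
        self.bridge_tables.setdefault(vid, {})[mac] = forwarding_info

    def lookup(self, vid: int, mac: str) -> Optional[str]:
        """Look up a MAC only within the bridge table of its VLAN."""
        return self.bridge_tables.get(vid, {}).get(mac)


vrf = VlanAwareMacVrf()
vrf.learn(10, "00:11:22:33:44:55", "ce-facing-port-1")
vrf.learn(20, "00:11:22:33:44:55", "remote-pe-2")
```

Because each VLAN has its own table, the same MAC address can appear in VLAN 10 and VLAN 20 without conflict. This contrasts with the VLAN bundle service interface of Section 6.2, where a single shared bridge table requires MAC addresses to be unique across all VLANs.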

Broadcast, unknown unicast, or multicast (BUM) traffic is sent only to the CEs in a given broadcast domain; however, the broadcast domains within an EVI either MAY each have their own P-Tunnel or MAY share P-Tunnels -- e.g., all of the broadcast domains in an EVI MAY share a single P-Tunnel.

In the case where a single VLAN is represented by a single VID and thus no VID translation is required, an MPLS-encapsulated packet MUST carry that VID. The Ethernet Tag ID in all EVPN routes MUST be set to that VID. The advertising PE MAY advertise the MPLS Label1 in the MAC/IP Advertisement route representing ONLY the EVI or representing both the Ethernet Tag ID and the EVI. This decision is only a local matter by the advertising PE (which is also the disposition PE) and doesn't affect any other PEs.

In the case where a single VLAN is represented by different VIDs on different CEs and thus VID translation is required, a normalized Ethernet Tag ID (VID) MUST be carried in the EVPN BGP routes. Furthermore, the advertising PE advertises the MPLS Label1 in the MAC/IP Advertisement route representing both the Ethernet Tag ID and the EVI, so that upon receiving an MPLS-encapsulated packet, it can identify the corresponding bridge table from the MPLS EVPN label and perform Ethernet Tag ID translation ONLY at the disposition PE -- i.e., the Ethernet frames transported over the MPLS/IP network MUST remain tagged with the originating VID, and VID translation is performed on the disposition PE. The Ethernet Tag ID in all EVPN routes MUST be set to the normalized Ethernet Tag ID assigned by the EVPN provider.

6.3.1. Port-Based VLAN-Aware Service Interface
This service interface is a special case of the VLAN-aware bundle service interface, where all of the VLANs on the port are part of the same service and are mapped to a single bundle but without any VID translation. The procedures are a subset of those described in Section 6.3.
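
Taken together, the rules of Sections 6.1 through 6.3.1 for setting the Ethernet Tag ID in EVPN routes can be summarized by the following Python sketch. The enum and function names are hypothetical. Returning the VID itself for the port-based VLAN-aware case is this sketch's reading of "a subset of those described in Section 6.3" without VID translation, not a statement made explicitly by the text.

```python
from enum import Enum
from typing import Optional


class ServiceInterface(Enum):
    VLAN_BASED = "vlan-based"                        # Section 6.1
    VLAN_BUNDLE = "vlan-bundle"                      # Section 6.2
    PORT_BASED = "port-based"                        # Section 6.2.1
    VLAN_AWARE_BUNDLE = "vlan-aware-bundle"          # Section 6.3
    PORT_BASED_VLAN_AWARE = "port-based-vlan-aware"  # Section 6.3.1


def route_ethernet_tag(kind: ServiceInterface, vid: int,
                       normalized_vid: Optional[int] = None) -> int:
    """Return the Ethernet Tag ID to place in EVPN routes.

    normalized_vid is the provider-assigned tag used when one VLAN is
    represented by different VIDs on different CEs.
    """
    if kind in (ServiceInterface.VLAN_BASED,
                ServiceInterface.VLAN_BUNDLE,
                ServiceInterface.PORT_BASED):
        return 0  # these interfaces always use Ethernet Tag ID 0
    if kind is ServiceInterface.PORT_BASED_VLAN_AWARE:
        if normalized_vid is not None:
            raise ValueError("no VID translation on this interface type")
        return vid  # assumption: same as 6.3 without translation
    # VLAN-aware bundle: normalized tag if translation is in use,
    # otherwise the single VID that represents the VLAN.
    return vid if normalized_vid is None else normalized_vid
```

For instance, a VLAN-aware bundle whose VLAN is represented by VID 100 everywhere would use 100, while the same VLAN with a provider-assigned normalized tag of 7 would use 7.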