Geneve is a UDP-based network virtualization overlay encapsulation protocol designed to establish tunnels between NVEs over an existing IP network. It is intended for use in public or private data center environments, for deploying multi-tenant overlay networks over an existing IP underlay network.
As a UDP-based protocol, Geneve adheres to the UDP usage guidelines as specified in [RFC 8085]. The applicability of these guidelines is dependent on the underlay IP network and the nature of the Geneve payload protocol (for example, TCP/IP, IP/Ethernet).
Geneve is intended to be deployed in a data center network environment operated by a single operator or an adjacent set of cooperating network operators that fits with the definition of controlled environments in [RFC 8085]. A network in a controlled environment can be managed to operate under certain conditions, whereas in the general Internet, this cannot be done. Hence, requirements for a tunneling protocol operating under a controlled environment can be less restrictive than the requirements of the general Internet.
For the purpose of this document, a traffic-managed controlled environment (TMCE) is defined as an IP network that is traffic engineered and/or otherwise managed (e.g., via use of traffic rate limiters) to avoid congestion. The concept of a TMCE is outlined in [RFC 8086]. Significant portions of the text in Sections 4.1 through 4.3 are based on [RFC 8086] as applicable to Geneve.
It is the responsibility of the operator to ensure that the guidelines/requirements in this section are followed as applicable to their Geneve deployment(s).
Geneve does not natively provide congestion-control functionality and relies on the payload protocol traffic for congestion control. As such, Geneve MUST be used with congestion-controlled traffic or within a TMCE to avoid congestion. An operator of a TMCE may avoid congestion through careful provisioning of their networks, rate-limiting user data traffic, and managing traffic engineering according to path capacity.
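As one illustration of rate-limiting user data traffic, an ingress NVE might police each tenant with a token bucket before encapsulating. The following is a minimal sketch under that assumption; `TokenBucket`, its byte-based rate, and the caller-supplied clock are hypothetical and not part of Geneve.

```python
class TokenBucket:
    """Hypothetical per-tenant policer: `rate` in bytes/second, `burst` in bytes."""

    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is admitted
        self.last = now

    def allow(self, packet_len: int, now: float) -> bool:
        """Admit the packet (and consume tokens) if enough tokens have accrued."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True
        return False  # over the limit: drop rather than encapsulate


bucket = TokenBucket(rate=1000, burst=1500)
print(bucket.allow(1500, now=0.0))  # True: fits the initial burst
print(bucket.allow(100, now=0.0))   # False: no tokens left
print(bucket.allow(100, now=1.0))   # True: one second refilled 1000 bytes
```

The burst size bounds how far a tenant can momentarily exceed its provisioned rate, which is what lets an operator reason about path capacity.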
The outer UDP checksum SHOULD be used with Geneve when transported over IPv4 to provide integrity for the Geneve headers, options, and payload in case of data corruption (for example, to avoid misdelivery of the payload to different tenant systems). The UDP checksum provides a statistical guarantee that a payload was not corrupted in transit. These integrity checks are not strong from a coding or cryptographic perspective and are not designed to detect physical-layer errors or malicious modification of the datagram (see Section 3.4 of RFC 8085). In deployments where such a risk exists, an operator SHOULD use additional data integrity mechanisms such as those offered by IPsec (see Section 6.2).
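The outer UDP checksum is the standard Internet one's-complement checksum computed over a pseudo-header (which includes the outer source and destination addresses) plus the entire UDP datagram, so it covers the Geneve header, options, and payload. A minimal sketch, assuming the caller passes the UDP header with its checksum field zeroed; the function names are illustrative.

```python
import ipaddress
import struct

UDP_PROTO = 17

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071); odd-length data is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def udp_checksum(src: str, dst: str, udp_segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is currently zero."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    length = len(udp_segment)
    if s.version == 4:
        # IPv4 pseudo-header: src, dst, zero, protocol, UDP length.
        pseudo = s.packed + d.packed + struct.pack("!BBH", 0, UDP_PROTO, length)
    else:
        # IPv6 pseudo-header: src, dst, 32-bit length, 3 zero bytes, next header.
        pseudo = s.packed + d.packed + struct.pack("!I3xB", length, UDP_PROTO)
    csum = ~ones_complement_sum(pseudo + udp_segment) & 0xFFFF
    # A computed 0 is sent as 0xFFFF, because 0 on the wire means "no checksum".
    return csum or 0xFFFF
```

A receiver can verify a datagram by running the same computation over the segment with the checksum field filled in; a correct datagram yields 0xFFFF.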
An operator MAY choose to disable UDP checksums and use zero UDP checksum if Geneve packet integrity is provided by other data integrity mechanisms, such as IPsec or additional checksums, or if one of the conditions (a, b, or c) in Section 4.3.1 is met.
By default, UDP checksums MUST be used when Geneve is transported over IPv6. A tunnel endpoint MAY be configured for use with zero UDP checksum if additional requirements in Section 4.3.1 are met.
When Geneve is used over IPv6, the UDP checksum is used to protect IPv6 headers, UDP headers, and Geneve headers, options, and payload from potential data corruption. As such, by default, Geneve MUST use UDP checksums when transported over IPv6. An operator MAY choose to configure zero UDP checksum if operating in a TMCE as stated in Section 4.1 and if one of the following conditions is met:
a. It is known that packet corruption is exceptionally unlikely (perhaps based on knowledge of equipment types in their underlay network) and the operator is willing to risk undetected packet corruption.
b. It is judged through observational measurements (perhaps through historic or current traffic flows that use non-zero checksum) that the level of packet corruption is tolerably low and the operator is willing to risk undetected corruption.
c. The Geneve payload is carrying applications that are tolerant of misdelivered or corrupted packets (perhaps through higher-layer checksum validation and/or reliability through retransmission).
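The conditions above, together with the IPv4 and IPv6 rules preceding them, can be read as a configuration check. A minimal sketch, assuming a hypothetical `TunnelConfig` record that the operator fills in; a `True` result only means zero checksum is permissible, and over IPv6 the implementation requirements that follow must still be met.

```python
from dataclasses import dataclass

@dataclass
class TunnelConfig:
    # Hypothetical operator-supplied facts about one Geneve tunnel.
    ip_version: int                                   # outer IP version: 4 or 6
    other_integrity: bool = False                     # e.g., IPsec or extra checksums
    in_tmce: bool = False                             # traffic-managed controlled env.
    corruption_exceptionally_unlikely: bool = False   # condition a
    measured_corruption_tolerable: bool = False       # condition b
    payload_tolerates_corruption: bool = False        # condition c

def zero_checksum_permitted(cfg: TunnelConfig) -> bool:
    """True if the rules in this section allow zero UDP checksum on the tunnel."""
    any_condition = (cfg.corruption_exceptionally_unlikely
                     or cfg.measured_corruption_tolerable
                     or cfg.payload_tolerates_corruption)
    if cfg.ip_version == 4:
        # IPv4: other integrity mechanisms, or any of conditions a-c.
        return cfg.other_integrity or any_condition
    if cfg.ip_version == 6:
        # IPv6: only inside a TMCE and only with one of conditions a-c.
        return cfg.in_tmce and any_condition
    raise ValueError("outer IP version must be 4 or 6")
```

Every flag defaults to `False`, so an unconfigured tunnel keeps checksums enabled, matching the default behavior required above.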
In addition, Geneve tunnel implementations using zero UDP checksum MUST meet the following requirements:
1. Use of UDP checksum over IPv6 MUST be the default configuration for all Geneve tunnels.
2. If Geneve is used with zero UDP checksum over IPv6, then such a tunnel endpoint implementation MUST meet all the requirements specified in Section 4 of RFC 6936 and requirement 1 as specified in Section 5 of RFC 6936, since it is relevant to Geneve.
3. The Geneve tunnel endpoint that decapsulates the tunnel SHOULD check that the source and destination IPv6 addresses are valid for the Geneve tunnel that is configured to receive zero UDP checksum and discard other packets for which such a check fails.
4. The Geneve tunnel endpoint that encapsulates the tunnel MAY use different IPv6 source addresses for each Geneve tunnel that uses zero UDP checksum mode in order to strengthen the decapsulator's check of the IPv6 source address (i.e., the same IPv6 source address is not to be used with more than one IPv6 destination address, irrespective of whether that destination address is a unicast or multicast address). When this is not possible, it is RECOMMENDED to use each source address for as few Geneve tunnels that use zero UDP checksum as is feasible.
   Note that for requirements 3 and 4, the receiving tunnel endpoint can apply these checks only if it has out-of-band knowledge that the encapsulating tunnel endpoint is applying the indicated behavior. One possibility to obtain this out-of-band knowledge is through signaling by the control plane. The definition of the control plane is beyond the scope of this document.
5. Measures SHOULD be taken to prevent Geneve traffic over IPv6 with zero UDP checksum from escaping into the general Internet. Examples of such measures include employing packet filters at the gateways or edge of the Geneve network and/or keeping logical or physical separation of the Geneve network from networks carrying general Internet traffic.
The above requirements do not change the requirements specified in either [RFC 8200] or [RFC 6936].
The use of the source IPv6 address in addition to the destination IPv6 address, plus the recommendation against reuse of source IPv6 addresses among Geneve tunnels, collectively provide some mitigation for the absence of UDP checksum coverage of the IPv6 header. A traffic-managed controlled environment that satisfies at least one of the three conditions listed at the beginning of this section provides additional assurance.
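The decapsulator-side address check and the encapsulator-side avoidance of source-address reuse might be sketched as follows. This assumes the receiver learns, for example from a control plane, which (source, destination) IPv6 pairs are configured for zero UDP checksum; `ZeroChecksumFilter` and `reused_sources` are hypothetical names, and ordinary checksum verification is not shown.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable, Set, Tuple

AddrPair = Tuple[ipaddress.IPv6Address, ipaddress.IPv6Address]

class ZeroChecksumFilter:
    """Hypothetical receive-side check for IPv6 Geneve packets with UDP checksum 0."""

    def __init__(self) -> None:
        # Empty by default: zero-checksum packets are discarded unless a tunnel
        # has been explicitly enabled (e.g., via control-plane signaling).
        self.allowed: Set[AddrPair] = set()

    def enable_tunnel(self, src: str, dst: str) -> None:
        self.allowed.add((ipaddress.IPv6Address(src), ipaddress.IPv6Address(dst)))

    def passes(self, src: str, dst: str, udp_checksum: int) -> bool:
        """True if the packet may proceed; False means discard."""
        if udp_checksum != 0:
            return True  # non-zero checksum: verify it as usual (not shown)
        # Parsing normalizes textual forms such as 2001:0db8:0::1 vs 2001:db8::1.
        key = (ipaddress.IPv6Address(src), ipaddress.IPv6Address(dst))
        return key in self.allowed

def reused_sources(tunnels: Iterable[Tuple[str, str]]) -> Set[ipaddress.IPv6Address]:
    """Encapsulator-side audit: source addresses of zero-checksum tunnels that
    are paired with more than one destination address."""
    dests = defaultdict(set)
    for src, dst in tunnels:
        dests[ipaddress.IPv6Address(src)].add(ipaddress.IPv6Address(dst))
    return {src for src, ds in dests.items() if len(ds) > 1}
```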
As an IP-based tunneling protocol, Geneve shares many properties and techniques with existing protocols. The application of some of these is described in further detail below, although most concepts applicable to the IP layer or to IP tunnels generally also function in the context of Geneve.
It is RECOMMENDED that Path MTU Discovery (see [RFC 1191] and [RFC 8201]) be used to prevent or minimize fragmentation. The use of Path MTU Discovery on the transit network provides the encapsulating tunnel endpoint with soft-state information about the link that it may use to prevent or minimize fragmentation depending on its role in the virtualized network. The NVE can maintain this state (the MTU size of the tunnel link(s) associated with the tunnel endpoint), so if a tenant system sends large packets that, when encapsulated, exceed the MTU size of the tunnel link, the tunnel endpoint can discard such packets and send exception messages to the tenant system(s). If the tunnel endpoint is associated with a routing or forwarding function and/or has the capability to send ICMP messages, the encapsulating tunnel endpoint MAY send ICMP Fragmentation Needed [RFC 0792] or Packet Too Big [RFC 4443] messages to the tenant system(s). When determining the MTU size of a tunnel link, the maximum length of options MUST be assumed, since options may vary on a per-packet basis. Recommendations and guidance for handling fragmentation in similar overlay encapsulation services like Pseudowire Emulation Edge-to-Edge (PWE3) are provided in Section 5.3 of RFC 3985.
Note that some implementations may not be capable of supporting fragmentation or other less common features of the IP header, such as options and extension headers. Some of the issues associated with MTU size and fragmentation in IP tunneling and use of ICMP messages are outlined in [INTAREA-TUNNELS].
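As a worked example of assuming the maximum options length, the largest inner IP packet a tenant can send over an Ethernet-in-Geneve tunnel can be derived from the path MTU. This sketch assumes an untagged inner Ethernet header (14 bytes), no IPv4 options or IPv6 extension headers in the outer packet, and by default the protocol maximum of 252 bytes of options (the 6-bit Opt Len field counts 4-byte units); a deployment that bounds options more tightly would pass a smaller value. The function names are illustrative.

```python
OUTER_IP_HDR = {4: 20, 6: 40}   # assumes no IPv4 options / IPv6 extension headers
UDP_HDR = 8
GENEVE_BASE_HDR = 8
GENEVE_MAX_OPTIONS = 63 * 4     # 6-bit Opt Len in 4-byte units = 252 bytes
INNER_ETH_HDR = 14              # untagged inner Ethernet header, no FCS

def tenant_ip_mtu(path_mtu: int, outer_ip_version: int,
                  max_options_len: int = GENEVE_MAX_OPTIONS) -> int:
    """Largest inner IP packet that still fits after encapsulation."""
    overhead = (OUTER_IP_HDR[outer_ip_version] + UDP_HDR + GENEVE_BASE_HDR
                + max_options_len + INNER_ETH_HDR)
    return path_mtu - overhead

def check_inner_packet(inner_ip_len: int, path_mtu: int, outer_ip_version: int,
                       max_options_len: int = GENEVE_MAX_OPTIONS):
    """Return ("forward", None), or ("drop", mtu) so the caller can signal the
    tenant with ICMP Fragmentation Needed (inner IPv4) or Packet Too Big
    (inner IPv6) carrying `mtu`."""
    mtu = tenant_ip_mtu(path_mtu, outer_ip_version, max_options_len)
    if inner_ip_len <= mtu:
        return ("forward", None)
    return ("drop", mtu)
```

With a 1500-byte path MTU over IPv4, this gives 1198 bytes when the full 252 bytes of options must be assumed, versus 1450 bytes when no options are ever sent.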
When encapsulating IP (including over Ethernet) packets in Geneve, there are several considerations for propagating Differentiated Services Code Point (DSCP) and Explicit Congestion Notification (ECN) bits from the inner header to the tunnel on transmission and the reverse on reception.
[RFC 2983] provides guidance for mapping DSCP between inner and outer IP headers. Network virtualization is typically more closely aligned with the Pipe model described there, where the DSCP value on the tunnel header is set based on a policy (which may be a fixed value, one based on the inner traffic class, or some other mechanism for grouping traffic). Aspects of the Uniform model (which treats the inner and outer DSCP values as a single field by copying on ingress and egress) may also apply, such as the ability to re-mark the inner header on tunnel egress based on transit marking. However, the Uniform model is not conceptually consistent with network virtualization, which seeks to provide strong isolation between encapsulated traffic and the physical network.
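The difference between the two models can be shown on the DSCP bits, which occupy the upper six bits of the IPv4 TOS or IPv6 Traffic Class octet. A minimal sketch; the policy map passed to `outer_dscp_pipe` is a hypothetical example of the operator policy the Pipe model calls for, and the function names are illustrative.

```python
def dscp_of(tos: int) -> int:
    """DSCP is the upper six bits of the IPv4 TOS / IPv6 Traffic Class octet."""
    return (tos >> 2) & 0x3F

def outer_dscp_pipe(inner_tos: int, policy: dict, default: int = 0) -> int:
    """Pipe model: outer DSCP comes from operator policy (here, a hypothetical
    inner-DSCP-to-outer-DSCP map with a fixed fallback)."""
    return policy.get(dscp_of(inner_tos), default)

def outer_dscp_uniform(inner_tos: int) -> int:
    """Uniform model: the inner DSCP is copied to the tunnel header on ingress."""
    return dscp_of(inner_tos)

def remark_inner_on_egress(inner_tos: int, outer_tos: int) -> int:
    """Uniform-model aspect: copy the transit DSCP back into the inner header,
    leaving the inner ECN bits (low two bits) untouched."""
    return (dscp_of(outer_tos) << 2) | (inner_tos & 0x03)
```

Under the Pipe model, a tenant marking of EF (46) only influences the outer header to the extent the operator's policy maps it.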
[RFC 6040] describes the mechanism for exposing ECN capabilities on IP tunnels and propagating congestion markers to the inner packets. This behavior MUST be followed for IP packets encapsulated in Geneve.
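The RFC 6040 rules reduce to a copy on encapsulation and a lookup table on decapsulation. A sketch of those rules using the standard ECN codepoints; it omits the logging or alarms RFC 6040 describes for unexpected combinations, and the function names are illustrative.

```python
from typing import Optional

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def encap_ecn(inner: int, normal_mode: bool = True) -> int:
    """Normal mode copies the inner ECN field (CE included) to the outer header;
    compatibility mode sets the outer field to Not-ECT."""
    return inner if normal_mode else NOT_ECT

# Decapsulation table: _DECAP[arriving inner][arriving outer] -> forwarded inner
# ECN field, or None meaning the packet is dropped.
_DECAP = {
    NOT_ECT: {NOT_ECT: NOT_ECT, ECT0: NOT_ECT, ECT1: NOT_ECT, CE: None},
    ECT0:    {NOT_ECT: ECT0,    ECT0: ECT0,    ECT1: ECT1,    CE: CE},
    ECT1:    {NOT_ECT: ECT1,    ECT0: ECT1,    ECT1: ECT1,    CE: CE},
    CE:      {NOT_ECT: CE,      ECT0: CE,      ECT1: CE,      CE: CE},
}

def decap_ecn(inner: int, outer: int) -> Optional[int]:
    """Return the ECN field for the forwarded inner packet, or None to drop it."""
    return _DECAP[inner][outer]
```

The drop for a Not-ECT inner packet with a CE outer header exists because the congestion signal cannot be conveyed to a transport that did not negotiate ECN.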
Though either the Uniform or Pipe models could be used for handling TTL (or Hop Limit in case of IPv6) when tunneling IP packets, the Pipe model is more consistent with network virtualization. [RFC 2003] provides guidance on handling TTL between inner IP header and outer IP tunnels; this model is similar to the Pipe model and is RECOMMENDED for use with Geneve for network virtualization applications.
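A minimal sketch of the RFC 2003 TTL behavior, assuming an outer TTL of 64 as the value appropriate for reaching the tunnel exit point (that constant is an assumption, not a Geneve requirement). The same logic applies to the IPv6 Hop Limit; the function names are illustrative.

```python
from typing import Optional, Tuple

OUTER_TTL = 64  # assumption: enough hops to reach the tunnel exit point

def encap_ttl(inner_ttl: int, forwarding: bool,
              outer_ttl: int = OUTER_TTL) -> Optional[Tuple[int, int]]:
    """Return (inner TTL, outer TTL) for the encapsulated packet, or None to
    discard it (an ICMP Time Exceeded may then be sent to the source)."""
    if forwarding:
        inner_ttl -= 1  # decremented only when encapsulation is part of forwarding
    if inner_ttl <= 0:
        return None     # never encapsulate a packet whose inner TTL is 0
    return inner_ttl, outer_ttl  # outer TTL does not depend on the inner one

def decap_ttl(inner_ttl: int) -> Optional[int]:
    """Decapsulation leaves the inner TTL unchanged; TTL 0 is discarded.
    Any later IP forwarding decrements it as usual."""
    return None if inner_ttl == 0 else inner_ttl
```

Because the transit hops only consume the outer TTL, the tunnel appears to the tenant as a single hop, which is the Pipe-like property referred to above.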
Geneve tunnels may either be point-to-point unicast between two tunnel endpoints or utilize broadcast or multicast addressing. It is not required that inner and outer addressing match in this respect. For example, in physical networks that do not support multicast, encapsulated multicast traffic may be replicated into multiple unicast tunnels or forwarded by policy to a unicast location (possibly to be replicated there).
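Replicating encapsulated multicast into unicast tunnels (often called head-end replication) could look like the following, assuming the control plane supplies a per-VNI list of remote tunnel endpoints. `flood_list` and `replicate_to_unicast` are hypothetical, and building the outer headers is left to the caller.

```python
from typing import Dict, List, Set, Tuple

def replicate_to_unicast(inner_frame: bytes, vni: int,
                         flood_list: Dict[int, Set[str]],
                         local_ip: str) -> List[Tuple[str, int, bytes]]:
    """One unicast copy per remote endpoint in the tenant's flood list.
    Returns (outer destination, VNI, inner frame) tuples."""
    remotes = sorted(flood_list.get(vni, set()) - {local_ip})  # never send to self
    return [(dst, vni, inner_frame) for dst in remotes]
```

The trade-off is that the ingress endpoint sends one copy per receiver, so its uplink carries N copies of each frame, whereas physical multicast lets the network replicate.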
With physical networks that do support multicast, it may be desirable to use this capability to take advantage of hardware replication for encapsulated packets. In this case, multicast addresses may be allocated in the physical network corresponding to tenants, encapsulated multicast groups, or some other factor. The allocation of these groups is a component of the control plane and, therefore, is beyond the scope of this document.
When physical multicast is in use, devices with heterogeneous capabilities may be present in the same group. Some options may only be interpretable by a subset of the devices in the group. Other devices can safely ignore such options unless the 'C' bit is set to mark the unknown option as critical. The requirements outlined in Section 3.4 apply for critical options.
In addition, [RFC 8293] provides examples of various mechanisms that can be used for multicast handling in network virtualization overlay networks.
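The rule that devices ignore unknown options unless the critical bit is set can be sketched with a small parser for the Geneve base header and option TLVs. This assumes the RFC 8926 layout (6-bit Opt Len and 5-bit option Length fields counted in 4-byte units, the critical bit as the high bit of the option Type, and the 'C' bit in the second header octet); `known` stands in for whatever option set a given device understands, and the option class values in the usage are illustrative.

```python
import struct
from typing import Set, Tuple

def parse_geneve(pkt: bytes):
    """Return (vni, c_bit, options) where options are (class, type, data) tuples."""
    if len(pkt) < 8:
        raise ValueError("truncated Geneve header")
    b0, b1, _proto, vni_rsvd = struct.unpack("!BBHI", pkt[:8])
    if b0 >> 6 != 0:
        raise ValueError("unsupported Geneve version")
    end = 8 + (b0 & 0x3F) * 4          # Opt Len counts 4-byte units
    critical_present = bool(b1 & 0x40)  # 'C' bit; 0x80 would be the 'O' bit
    if len(pkt) < end:
        raise ValueError("truncated options")
    options, off = [], 8
    while off < end:
        if end - off < 4:
            raise ValueError("malformed option header")
        opt_class, opt_type, rlen = struct.unpack("!HBB", pkt[off:off + 4])
        data_len = (rlen & 0x1F) * 4    # low 5 bits; high 3 bits are reserved
        if off + 4 + data_len > end:
            raise ValueError("option overruns options area")
        options.append((opt_class, opt_type, pkt[off + 4:off + 4 + data_len]))
        off += 4 + data_len
    return vni_rsvd >> 8, critical_present, options

def should_drop(options, known: Set[Tuple[int, int]]) -> bool:
    """Drop only if an unrecognized option has the critical bit set."""
    return any((c, t) not in known and t & 0x80 for c, t, _ in options)
```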
Generally speaking, a Geneve tunnel is a unidirectional concept. IP is not a connection-oriented protocol, and it is possible for two tunnel endpoints to communicate with each other using different paths or to have one side not transmit anything at all. As Geneve is an IP-based protocol, the tunnel layer inherits these same characteristics.
It is possible for a tunnel to encapsulate a protocol, such as TCP, that is connection oriented and maintains session state at that layer. In addition, implementations MAY model Geneve tunnels as connected, bidirectional links, for example, to provide the abstraction of a virtual port. In both of these cases, bidirectionality of the tunnel is handled at a higher layer and does not affect the operation of Geneve itself.
Geneve is intended to be flexible for use with a wide range of current and future applications. As a result, certain constraints may be placed on the use of metadata or other aspects of the protocol in order to optimize for a particular use case. For example, some applications may limit the types of options that are supported or enforce a maximum number or length of options. Other applications may only handle certain encapsulated payload types, such as Ethernet or IP. These optimizations can be implemented either globally (throughout the system) or locally (for example, restricted to certain classes of devices or network paths).
These constraints may be communicated to tunnel endpoints either explicitly through a control plane or implicitly by the nature of the application. As Geneve is defined as a data plane protocol that is control plane agnostic, definition of such mechanisms is beyond the scope of this document.
While Geneve options are flexible, a control plane may restrict the number of option TLVs as well as the order and size of the TLVs between tunnel endpoints to make it simpler for a data plane implementation in software or hardware to handle (see [NVO3-ENCAP]). For example, there may be some critical information, such as a secure hash, that must be processed in a certain order to provide the lowest latency, or there may be other scenarios where the options must be processed in a given order due to protocol semantics.
A control plane may negotiate a subset of option TLVs and certain TLV ordering; it may also limit the total number of option TLVs present in the packet, for example, to accommodate hardware capable of processing fewer options. Hence, a control plane needs to have the ability to describe the supported TLV subset and its ordering to the tunnel endpoints. In the absence of a control plane, alternative configuration mechanisms may be used for this purpose. Such mechanisms are beyond the scope of this document.
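A check that a packet's options conform to a negotiated profile might look like the following. The profile format (an ordered list of permitted (class, type) pairs plus a maximum count) is a hypothetical stand-in for whatever a control plane would actually convey; repeated options are allowed here as long as the relative order holds.

```python
from typing import Sequence, Tuple

OptionKey = Tuple[int, int]  # (Option Class, Type)

def conforms(options: Sequence[OptionKey], profile: Sequence[OptionKey],
             max_options: int) -> bool:
    """True if every option is in the profile, the count is within max_options,
    and options appear in the profile's relative order."""
    if len(options) > max_options:
        return False
    rank = {key: i for i, key in enumerate(profile)}
    last = -1
    for key in options:
        if key not in rank or rank[key] < last:
            return False  # not negotiated, or out of the negotiated order
        last = rank[key]
    return True
```

A fixed order lets a hardware parser look for each option at a predictable position instead of scanning the whole options area.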
Modern NICs provide a variety of offloads to enable the efficient processing of packets. The implementation of many of these offloads requires only that the encapsulated packet be easily parsed (for example, checksum offload). However, optimizations such as Large Segmentation Offload (LSO) and Large Receive Offload (LRO) involve some processing of the options themselves, since they must be replicated or merged across multiple packets. In these situations, it is desirable not to require changes to the offload logic to handle the introduction of new options. To enable this, some constraints are placed on the definitions of options to allow for simple processing rules:
- When performing LSO, a NIC MUST replicate the entire Geneve header and all options, including those unknown to the device, onto each resulting segment unless an option allows an exception. Conversely, when performing LRO, a NIC may assume that a binary comparison of the options (including unknown options) is sufficient to ensure equality and MAY merge packets with equal Geneve headers.
- Options MUST NOT be reordered during the course of offload processing, including when merging packets for the purpose of LRO.
- NICs performing offloads MUST NOT drop packets with unknown options, including those marked as critical, unless explicitly configured to do so.
There is no requirement that a given implementation of Geneve employ the offloads listed as examples above. However, as these offloads are currently widely deployed in commercially available NICs, the rules described here are intended to enable efficient handling of current and future options across a variety of devices.
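The LSO and LRO rules amount to treating the Geneve header and options as opaque bytes. A deliberately simplified sketch: it ignores the length, checksum, and sequence-number fix-ups real offloads perform on the inner and outer headers, and the function names are illustrative.

```python
from typing import List

def lso_segment(geneve_hdr: bytes, inner_hdrs: bytes, payload: bytes,
                mss: int) -> List[bytes]:
    """Split payload into chunks of at most mss bytes, prefixing each with the
    complete Geneve header (options included, known or not) and inner headers."""
    return [geneve_hdr + inner_hdrs + payload[i:i + mss]
            for i in range(0, len(payload), mss)]

def lro_can_merge(geneve_hdr_a: bytes, geneve_hdr_b: bytes) -> bool:
    """Binary comparison of Geneve headers, options included: options are never
    parsed, reordered, or dropped to make packets mergeable."""
    return geneve_hdr_a == geneve_hdr_b
```

Neither function needs to understand any option, which is why new options can be introduced without changing offload logic.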
Geneve is capable of encapsulating a wide range of protocols; therefore, a given implementation is likely to support only a small subset of the possibilities. However, as Ethernet is expected to be widely deployed, it is useful to describe the behavior of VLANs inside encapsulated Ethernet frames.
As with any protocol, support for inner VLAN headers is OPTIONAL. In many cases, the use of encapsulated VLANs may be disallowed due to security or implementation considerations. However, in other cases, the trunking of VLAN frames across a Geneve tunnel can prove useful. As a result, the processing of inner VLAN tags upon ingress or egress from a tunnel endpoint is based upon the configuration of the tunnel endpoint and/or control plane and is not explicitly defined as part of the data format.
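Because inner VLAN processing is a matter of endpoint configuration, one possible shape is a per-endpoint policy applied to decapsulated frames. This sketch assumes a single 802.1Q tag (TPID 0x8100) and three hypothetical policy values; stacked or 802.1ad tags are not handled.

```python
import struct
from enum import Enum
from typing import Optional

TPID_8021Q = 0x8100

class VlanPolicy(Enum):
    # Hypothetical per-endpoint settings; Geneve itself does not define these.
    DROP_TAGGED = "drop"   # inner VLANs disallowed
    STRIP = "strip"        # remove the inner tag before delivery
    TRUNK = "trunk"        # carry the tag through unchanged

def inner_vlan_id(frame: bytes) -> Optional[int]:
    """Return the 12-bit VID of an 802.1Q-tagged Ethernet frame, else None."""
    if len(frame) >= 18 and struct.unpack("!H", frame[12:14])[0] == TPID_8021Q:
        return struct.unpack("!H", frame[14:16])[0] & 0x0FFF
    return None

def apply_vlan_policy(frame: bytes, policy: VlanPolicy) -> Optional[bytes]:
    """Return the frame to deliver, or None to drop it."""
    if inner_vlan_id(frame) is None:
        return frame                    # untagged frames are unaffected
    if policy is VlanPolicy.DROP_TAGGED:
        return None
    if policy is VlanPolicy.STRIP:
        return frame[:12] + frame[16:]  # remove the 4-byte 802.1Q tag
    return frame                        # TRUNK
```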