Internet Engineering Task Force (IETF)                           A. Ford
Request for Comments: 6824                                         Cisco
Category: Experimental                                         C. Raiciu
ISSN: 2070-1721                             U. Politechnica of Bucharest
                                                              M. Handley
                                                       U. College London
                                                          O. Bonaventure
                                                U. catholique de Louvain
                                                            January 2013

TCP Extensions for Multipath Operation with Multiple Addresses

Abstract
TCP/IP communication is currently restricted to a single path per connection, yet multiple paths often exist between peers. The simultaneous use of these multiple paths for a TCP/IP session would improve resource usage within the network and, thus, improve user experience through higher throughput and improved resilience to network failure.

Multipath TCP provides the ability to simultaneously use multiple paths between peers. This document presents a set of extensions to traditional TCP to support multipath operation. The protocol offers the same type of service to applications as TCP (i.e., reliable bytestream), and it provides the components necessary to establish and use multiple TCP flows across potentially disjoint paths.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for examination, experimental implementation, and evaluation.

This document defines an Experimental Protocol for the Internet community. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6824.
Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents
1. Introduction
   1.1. Design Assumptions
   1.2. Multipath TCP in the Networking Stack
   1.3. Terminology
   1.4. MPTCP Concept
   1.5. Requirements Language
2. Operation Overview
   2.1. Initiating an MPTCP Connection
   2.2. Associating a New Subflow with an Existing MPTCP Connection
   2.3. Informing the Other Host about Another Potential Address
   2.4. Data Transfer Using MPTCP
   2.5. Requesting a Change in a Path's Priority
   2.6. Closing an MPTCP Connection
   2.7. Notable Features
3. MPTCP Protocol
   3.1. Connection Initiation
   3.2. Starting a New Subflow
   3.3. General MPTCP Operation
      3.3.1. Data Sequence Mapping
      3.3.2. Data Acknowledgments
      3.3.3. Closing a Connection
      3.3.4. Receiver Considerations
      3.3.5. Sender Considerations
      3.3.6. Reliability and Retransmissions
      3.3.7. Congestion Control Considerations
      3.3.8. Subflow Policy
   3.4. Address Knowledge Exchange (Path Management)
      3.4.1. Address Advertisement
      3.4.2. Remove Address
   3.5. Fast Close
   3.6. Fallback
   3.7. Error Handling
   3.8. Heuristics
      3.8.1. Port Usage
      3.8.2. Delayed Subflow Start
      3.8.3. Failure Handling
4. Semantic Issues
5. Security Considerations
6. Interactions with Middleboxes
7. Acknowledgments
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
Appendix A. Notes on Use of TCP Options
Appendix B. Control Blocks
   B.1. MPTCP Control Block
      B.1.1. Authentication and Metadata
      B.1.2. Sending Side
      B.1.3. Receiving Side
   B.2. TCP Control Blocks
      B.2.1. Sending Side
      B.2.2. Receiving Side
Appendix C. Finite State Machine
1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to provide a Multipath TCP [2] service, which enables a transport connection to operate across multiple paths simultaneously. This document presents the protocol changes required to add multipath capability to TCP; specifically, those for signaling and setting up multiple paths ("subflows"), managing these subflows, reassembling data, and terminating sessions. This is not the only information required to create a Multipath TCP implementation, however. This document is complemented by three others:

o  Architecture [2] explains the motivations behind Multipath TCP, discusses the high-level design decisions on which this design is based, and describes a functional separation through which an extensible MPTCP implementation can be developed.

o  Congestion control [5] presents a safe congestion control algorithm for coupling the behavior of the multiple paths in order to "do no harm" to other network users.

o  Application considerations [6] discusses what impact MPTCP will have on applications, what applications will want to do with MPTCP, and as a consequence of these factors, what API extensions an MPTCP implementation should present.

1.1. Design Assumptions
In order to limit the potentially huge design space, the working group imposed two key constraints on the Multipath TCP design presented in this document:

o  It must be backwards-compatible with current, regular TCP, to increase its chances of deployment.

o  It can be assumed that one or both hosts are multihomed and multiaddressed.

To simplify the design, we assume that the presence of multiple addresses at a host is sufficient to indicate the existence of multiple paths. These paths need not be entirely disjoint: they may share one or many routers between them. Even in such a situation, making use of multiple paths is beneficial, improving resource utilization and resilience to a subset of node failures. The congestion control algorithms defined in [5] ensure this does not act detrimentally. Furthermore, there may be some scenarios where different TCP ports on a single host can provide disjoint paths (such as through certain Equal-Cost Multipath (ECMP) implementations [7]), and so the MPTCP design also supports the use of ports in path identifiers.

There are three aspects to the backwards-compatibility listed above (discussed in more detail in [2]):

External Constraints: The protocol must function through the vast majority of existing middleboxes such as NATs, firewalls, and proxies, and as such must resemble existing TCP as far as possible on the wire. Furthermore, the protocol must not assume the segments it sends on the wire arrive unmodified at the destination: they may be split or coalesced; TCP options may be removed or duplicated.

Application Constraints: The protocol must be usable with no change to existing applications that use the common TCP API (although it is reasonable that not all features would be available to such legacy applications). Furthermore, the protocol must provide the same service model as regular TCP to the application.

Fallback: The protocol should be able to fall back to standard TCP with no interference from the user, to be able to communicate with legacy hosts.

The complementary application considerations document [6] discusses the necessary features of an API to provide backwards-compatibility, as well as API extensions to convey the behavior of MPTCP at a level of control and information equivalent to that available with regular, single-path TCP.

Further discussion of the design constraints and associated design decisions is given in the MPTCP Architecture document [2] and in [8].

1.2. Multipath TCP in the Networking Stack
MPTCP operates at the transport layer and aims to be transparent to both higher and lower layers. It is a set of additional features on top of standard TCP; Figure 1 illustrates this layering. MPTCP is designed to be usable by legacy applications with no changes; detailed discussion of its interactions with applications is given in [6].
                                +-------------------------------+
                                |           Application         |
   +---------------+            +-------------------------------+
   |  Application  |            |             MPTCP             |
   +---------------+            + - - - - - - - + - - - - - - - +
   |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
   +---------------+            +-------------------------------+
   |      IP       |            |       IP      |      IP       |
   +---------------+            +-------------------------------+

     Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

1.3. Terminology
This document makes use of a number of terms that are either MPTCP-specific or have defined meaning in the context of MPTCP, as follows:

Path: A sequence of links between a sender and a receiver, defined in this context by a 4-tuple of source and destination address/port pairs.

Subflow: A flow of TCP segments operating over an individual path, which forms part of a larger MPTCP connection. A subflow is started and terminated similar to a regular TCP connection.

(MPTCP) Connection: A set of one or more subflows, over which an application can communicate between two hosts. There is a one-to-one mapping between a connection and an application socket.

Data-level: The payload data is nominally transferred over a connection, which in turn is transported over subflows. Thus, the term "data-level" is synonymous with "connection level", in contrast to "subflow-level", which refers to properties of an individual subflow.

Token: A locally unique identifier given to a multipath connection by a host. May also be referred to as a "Connection ID".

Host: An end host operating an MPTCP implementation, and either initiating or accepting an MPTCP connection.

In addition to these terms, note that MPTCP's interpretation of, and effect on, regular single-path TCP semantics are discussed in Section 4.
1.4. MPTCP Concept
This section provides a high-level summary of normal operation of MPTCP, and is illustrated by the scenario shown in Figure 2. A detailed description of operation is given in Section 3.

o  To a non-MPTCP-aware application, MPTCP will behave the same as normal TCP. Extended APIs could provide additional control to MPTCP-aware applications [6]. An application begins by opening a TCP socket in the normal way. MPTCP signaling and operation are handled by the MPTCP implementation.

o  An MPTCP connection begins similarly to a regular TCP connection. This is illustrated in Figure 2 where an MPTCP connection is established between addresses A1 and B1 on Hosts A and B, respectively.

o  If extra paths are available, additional TCP sessions (termed MPTCP "subflows") are created on these paths, and are combined with the existing session, which continues to appear as a single connection to the applications at both ends. The creation of the additional TCP session is illustrated between Address A2 on Host A and Address B1 on Host B.

o  MPTCP identifies multiple paths by the presence of multiple addresses at hosts. Combinations of these multiple addresses equate to the additional paths. In the example, other potential paths that could be set up are A1<->B2 and A2<->B2. Although this additional session is shown as being initiated from A2, it could equally have been initiated from B1.

o  The discovery and setup of additional subflows will be achieved through a path management method; this document describes a mechanism by which a host can initiate new subflows by using its own additional addresses, or by signaling its available addresses to the other host.

o  MPTCP adds connection-level sequence numbers to allow the reassembly of segments arriving on multiple subflows with differing network delays.

o  Subflows are terminated as regular TCP connections, with a four-way FIN handshake. The MPTCP connection is terminated by a connection-level FIN.
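The role of connection-level sequence numbers in the list above can be illustrated with a minimal Python sketch. It is not taken from any implementation: the class name ConnectionReassembler and its methods are hypothetical, sequence numbers are plain integers without wraparound, and each received chunk is assumed to arrive already tagged with its connection-level sequence number.

```python
class ConnectionReassembler:
    """Hypothetical receiver that orders data by connection-level sequence number."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq   # next connection-level byte expected
        self.pending = {}             # out-of-order chunks keyed by start sequence
        self.delivered = bytearray()  # bytes handed to the application so far

    def receive(self, seq, payload):
        """Accept a chunk that arrived on any subflow."""
        if seq + len(payload) <= self.next_seq:
            return  # wholly duplicate data, e.g. resent on another subflow
        # Keep the longest chunk seen for a given start sequence.
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload
        self._deliver_in_order()

    def _deliver_in_order(self):
        """Hand over every buffered chunk that is now contiguous."""
        while True:
            ready = [s for s in self.pending if s <= self.next_seq]
            if not ready:
                break
            for start in ready:
                chunk = self.pending.pop(start)
                fresh = chunk[self.next_seq - start:]  # skip bytes already delivered
                self.delivered += fresh
                self.next_seq += len(fresh)


receiver = ConnectionReassembler()
receiver.receive(5, b"world")   # arrives first on a faster subflow
receiver.receive(0, b"hello")   # arrives later on a slower subflow
print(bytes(receiver.delivered))
```

The application sees a single ordered bytestream regardless of which subflow carried each chunk, which is the property the connection-level numbering is meant to provide.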
            Host A                               Host B
   ------------------------             ------------------------
   Address A1    Address A2             Address B1    Address B2
   ----------    ----------             ----------    ----------
       |             |                      |             |
       |     (initial connection setup)     |             |
       |----------------------------------->|             |
       |<-----------------------------------|             |
       |             |                      |             |
       |            (additional subflow setup)            |
       |             |--------------------->|             |
       |             |<---------------------|             |
       |             |                      |             |
       |             |                      |             |

               Figure 2: Example MPTCP Usage Scenario

1.5. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [3].

2. Operation Overview
This section presents a single description of common MPTCP operation, with reference to the protocol operation. This is a high-level overview of the key functions; the full specification follows in Section 3. Extensibility and negotiated features are not discussed here. Considerable reference is made to symbolic names of MPTCP options throughout this section -- these are subtypes of the IANA-assigned MPTCP option (see Section 8), and their formats are defined in the detailed protocol specification that follows in Section 3.

A Multipath TCP connection provides a bidirectional bytestream between two hosts communicating like normal TCP and, thus, does not require any change to the applications. However, Multipath TCP enables the hosts to use different paths with different IP addresses to exchange packets belonging to the MPTCP connection. A Multipath TCP connection appears like a normal TCP connection to an application. However, to the network layer, each MPTCP subflow looks like a regular TCP flow whose segments carry a new TCP option type. Multipath TCP manages the creation, removal, and utilization of these subflows to send data. The number of subflows that are managed within a Multipath TCP connection is not fixed and it can fluctuate during the lifetime of the Multipath TCP connection.
All MPTCP operations are signaled with a TCP option -- a single numerical type for MPTCP, with "sub-types" for each MPTCP message. What follows is a summary of the purpose and rationale of these messages.

2.1. Initiating an MPTCP Connection
This is the same signaling as for initiating a normal TCP connection, but the SYN, SYN/ACK, and ACK packets also carry the MP_CAPABLE option. This is variable length and serves multiple purposes. Firstly, it verifies whether the remote host supports Multipath TCP; secondly, this option allows the hosts to exchange some information to authenticate the establishment of additional subflows. Further details are given in Section 3.1.

       Host A                               Host B
       ------                               ------
       MP_CAPABLE            ->
       [A's key, flags]
                             <-             MP_CAPABLE
                                            [B's key, flags]
       ACK + MP_CAPABLE      ->
       [A's key, B's key, flags]

2.2. Associating a New Subflow with an Existing MPTCP Connection
The exchange of keys in the MP_CAPABLE handshake provides material that can be used to authenticate the endpoints when new subflows are set up. Additional subflows begin in the same way as initiating a normal TCP connection, but the SYN, SYN/ACK, and ACK packets also carry the MP_JOIN option.

Host A initiates a new subflow between one of its addresses and one of Host B's addresses. The token -- generated from the key -- is used to identify which MPTCP connection it is joining, and the Hash-based Message Authentication Code (HMAC) is used for authentication. The HMAC uses the keys exchanged in the MP_CAPABLE handshake, and the random numbers (nonces) exchanged in these MP_JOIN options. MP_JOIN also contains flags and an Address ID that can be used to refer to the source address without the sender needing to know if it has been changed by a NAT. Further details are in Section 3.2.
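The token and HMAC computations mentioned above can be illustrated with a minimal Python sketch. This overview does not fix the details; the sketch assumes the choices made in the detailed specification (Sections 3.1 and 3.2): 64-bit keys, 32-bit nonces, a token formed from the most significant 32 bits of the SHA-1 hash of the key, and HMAC-SHA1 keyed with the sender's key followed by the receiver's key. The function names are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import os


def generate_key() -> bytes:
    """Return a random 64-bit key (size assumed from Section 3.1)."""
    return os.urandom(8)


def token_from_key(key: bytes) -> bytes:
    """Derive a 32-bit token: the first 4 bytes of SHA-1(key) (assumed from Section 3.1)."""
    return hashlib.sha1(key).digest()[:4]


def join_hmac(sender_key: bytes, receiver_key: bytes,
              sender_nonce: bytes, receiver_nonce: bytes) -> bytes:
    """HMAC-SHA1 over both nonces, keyed with both keys, sender's material first."""
    return hmac.new(sender_key + receiver_key,
                    sender_nonce + receiver_nonce,
                    hashlib.sha1).digest()


def verify_join_hmac(received: bytes, sender_key: bytes, receiver_key: bytes,
                     sender_nonce: bytes, receiver_nonce: bytes) -> bool:
    """Recompute the peer's HMAC and compare in constant time."""
    expected = join_hmac(sender_key, receiver_key, sender_nonce, receiver_nonce)
    return hmac.compare_digest(received, expected)


# Keys as they would have been exchanged in the MP_CAPABLE handshake.
key_a, key_b = generate_key(), generate_key()
# Nonces as they would be exchanged in the MP_JOIN options.
nonce_a, nonce_b = os.urandom(4), os.urandom(4)

token_b = token_from_key(key_b)                       # carried in A's SYN
hmac_a = join_hmac(key_a, key_b, nonce_a, nonce_b)    # carried in A's ACK
print(verify_join_hmac(hmac_a, key_a, key_b, nonce_a, nonce_b))
```

Because the key order differs by direction, the HMAC computed by Host A is not the same value as the one computed by Host B, so one cannot simply be reflected back as the other. Any truncation of the HMAC on the wire is left out of this sketch; Section 3.2 gives the exact option formats.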
       Host A                               Host B
       ------                               ------
       MP_JOIN               ->
       [B's token, A's nonce,
        A's Address ID, flags]
                             <-             MP_JOIN
                                            [B's HMAC, B's nonce,
                                             B's Address ID, flags]
       ACK + MP_JOIN         ->
       [A's HMAC]

                             <-             ACK

2.3. Informing the Other Host about Another Potential Address
The set of IP addresses associated with a multihomed host may change during the lifetime of an MPTCP connection. MPTCP supports the addition and removal of addresses on a host both implicitly and explicitly. If Host A has established a subflow starting at address IP#-A1 and wants to open a second subflow starting at address IP#-A2, it simply initiates the establishment of the subflow as explained above. The remote host will then be implicitly informed about the new address.

In some circumstances, a host may want to advertise to the remote host the availability of an address without establishing a new subflow, for example, when a NAT prevents setup in one direction. In the example below, Host A informs Host B about its alternative IP address (IP#-A2). Host B may later send an MP_JOIN to this new address. Due to the presence of middleboxes that may translate IP addresses, this option uses an address identifier to unambiguously identify an address on a host. Further details are in Section 3.4.1.

       Host A                               Host B
       ------                               ------
       ADD_ADDR              ->
       [IP#-A2,
        IP#-A2's Address ID]

There is a corresponding signal for address removal, making use of the Address ID that is signaled in the add address handshake. Further details are in Section 3.4.2.

       Host A                               Host B
       ------                               ------
       REMOVE_ADDR           ->
       [IP#-A2's Address ID]
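The bookkeeping implied by ADD_ADDR and REMOVE_ADDR can be sketched as a small table keyed by Address ID. This is an illustrative Python sketch only: the class RemoteAddressTable and its method names are hypothetical, the addresses come from the documentation range 192.0.2.0/24, and real option parsing (Section 3.4) is not modeled.

```python
import ipaddress


class RemoteAddressTable:
    """Hypothetical per-connection record of addresses advertised by the peer."""

    def __init__(self):
        self._by_id = {}  # Address ID -> advertised IP address

    def add_addr(self, address_id, ip):
        """Record an advertised address under the peer's Address ID."""
        self._by_id[address_id] = ipaddress.ip_address(ip)

    def remove_addr(self, address_id):
        """Forget an address; only the ID is needed, not the IP itself."""
        return self._by_id.pop(address_id, None)

    def candidates(self):
        """Addresses to which a new subflow (MP_JOIN) could later be opened."""
        return sorted(self._by_id.items())


table = RemoteAddressTable()
table.add_addr(1, "192.0.2.1")   # learned implicitly from the first subflow
table.add_addr(2, "192.0.2.2")   # learned from an ADD_ADDR
table.remove_addr(2)             # peer later sends REMOVE_ADDR for ID 2
print(table.candidates())
```

Keying the table by Address ID rather than by IP address is what lets removal work even when a NAT has rewritten the address the peer actually observed.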
2.4. Data Transfer Using MPTCP
To ensure reliable, in-order delivery of data over subflows that may appear and disappear at any time, MPTCP uses a 64-bit data sequence number (DSN) to number all data sent over the MPTCP connection. Each subflow has its own 32-bit sequence number space and an MPTCP option maps the subflow sequence space to the data sequence space. In this way, data can be retransmitted on different subflows (mapped to the same DSN) in the event of failure.

The "Data Sequence Signal" carries the "Data Sequence Mapping". The data sequence mapping consists of the subflow sequence number, data sequence number, and length for which this mapping is valid. This option can also carry a connection-level acknowledgment (the "Data ACK") for the received DSN.

With MPTCP, all subflows share the same receive buffer and advertise the same receive window. There are two levels of acknowledgment in MPTCP. Regular TCP acknowledgments are used on each subflow to acknowledge the reception of the segments sent over the subflow independently of their DSN. In addition, there are connection-level acknowledgments for the data sequence space. These acknowledgments track the advancement of the bytestream and slide the receiving window.

Further details are in Section 3.3.

       Host A                               Host B
       ------                               ------
       DATA_SEQUENCE_SIGNAL  ->
       [Data Sequence Mapping]
       [Data ACK]
       [Checksum]

2.5. Requesting a Change in a Path's Priority
Hosts can indicate at initial subflow setup whether they wish the subflow to be used as a regular or backup path -- a backup path only being used if there are no regular paths available. During a connection, Host A can request a change in the priority of a subflow through the MP_PRIO signal to Host B. Further details are in Section 3.3.8.

       Host A                               Host B
       ------                               ------
       MP_PRIO               ->
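The regular/backup rule above can be sketched from the data sender's point of view. This is a minimal Python illustration under stated assumptions: the Subflow class, its fields, and both functions are hypothetical names, "usable" stands in for whatever liveness test an implementation applies, and the scheduling among eligible subflows is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    """Hypothetical sender-side view of one subflow."""
    name: str
    backup: bool = False   # set at subflow setup, changed later by MP_PRIO
    usable: bool = True    # assumed liveness flag; False once the path has failed


def apply_mp_prio(subflow, backup):
    """Record the peer's requested priority for this subflow."""
    subflow.backup = backup


def eligible_subflows(subflows):
    """Backup subflows are eligible only when no regular subflow is usable."""
    regular = [s for s in subflows if s.usable and not s.backup]
    if regular:
        return regular
    return [s for s in subflows if s.usable and s.backup]


wifi = Subflow("wifi")
cellular = Subflow("cellular", backup=True)
print([s.name for s in eligible_subflows([wifi, cellular])])   # regular path only

wifi.usable = False                                            # regular path fails
print([s.name for s in eligible_subflows([wifi, cellular])])   # backup takes over
```

Treating priority as a per-subflow flag that the peer can flip at any time keeps the selection rule stateless: the sender simply re-evaluates the eligible set whenever a flag or a path's status changes.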
2.6. Closing an MPTCP Connection
When Host A wants to inform Host B that it has no more data to send, it signals this "Data FIN" as part of the Data Sequence Signal (see above). It has the same semantics and behavior as a regular TCP FIN, but at the connection level. Once all the data on the MPTCP connection has been successfully received, then this message is acknowledged at the connection level with a DATA_ACK. Further details are in Section 3.3.3.

       Host A                               Host B
       ------                               ------
       DATA_SEQUENCE_SIGNAL  ->
       [Data FIN]

                             <-             (MPTCP DATA_ACK)

2.7. Notable Features
It is worth highlighting that MPTCP's signaling has been designed with several key requirements in mind:

o  To cope with NATs on the path, addresses are referred to by Address IDs, in case the IP packet's source address gets changed by a NAT. Setting up a new TCP flow is not possible if the passive opener is behind a NAT; to allow subflows to be created when either end is behind a NAT, MPTCP uses the ADD_ADDR message.

o  MPTCP falls back to ordinary TCP if MPTCP operation is not possible, for example, if one host is not MPTCP capable or if a middlebox alters the payload.

o  To meet the threats identified in [9], the following steps are taken: keys are sent in the clear in the MP_CAPABLE messages; MP_JOIN messages are secured with HMAC-SHA1 ([10], [4]) using those keys; and standard TCP validity checks are made on the other messages (ensuring sequence numbers are in-window).