9. References
9.1. Normative References
[1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981. [2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar, "Architectural Guidelines for Multipath TCP Development", RFC 6182, March 2011. [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [4] National Institute of Science and Technology, "Secure Hash Standard", Federal Information Processing Standard (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/ fips/fips180-3/fips180-3_final.pdf>.9.2. Informative References
[5] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion Control for Multipath Transport Protocols", RFC 6356, October 2011. [6] Scharf, M. and A. Ford, "MPTCP Application Interface Considerations", Work in Progress, October 2012. [7] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm", RFC 2992, November 2000. [8] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M., Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP", Usenix Symposium on Networked Systems Design and Implementation 012, 2012, <https://www.usenix.org/conference/ nsdi12/how-hard-can-it-be-designing-and-implementing- deployable-multipath-tcp>. [9] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6181, March 2011. [10] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", RFC 2104, February 1997. [11] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.
[12] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion Control", RFC 5681, September 2009. [13] Gont, F., "Survey of Security Hardening Methods for Transmission Control Protocol (TCP) Implementations", Work in Progress, March 2012. [14] Eastlake, D., Schiller, J., and S. Crocker, "Randomness Requirements for Security", BCP 106, RFC 4086, June 2005. [15] Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)", RFC 6234, May 2011. [16] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992. [17] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001. [18] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, February 1996. [19] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989. [20] Ramaiah, A., "TCP option space extension", Work in Progress, March 2012. [21] Srisuresh, P. and K. Egevang, "Traditional IP Network Address Translator (Traditional NAT)", RFC 3022, January 2001. [22] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. Shelby, "Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations", RFC 3135, June 2001. [23] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics", Usenix Security 2001, 2001, <http://www.usenix.org/events/sec01/full_papers/ handley/handley.pdf>. [24] Freed, N., "Behavior of and Requirements for Internet Firewalls", RFC 2979, October 2000. [25] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
Appendix A. Notes on Use of TCP Options
The TCP option space is limited due to the length of the Data Offset field in the TCP header (4 bits), which defines the TCP header length in 32-bit words. With the standard TCP header being 20 bytes, this leaves a maximum of 40 bytes for options, and many of these may already be used by options such as timestamp and SACK. We have performed a brief study on the commonly used TCP options in SYN, data, and pure ACK packets, and found that there is enough room to fit all the options we propose using in this document. SYN packets typically include Maximum Segment Size (MSS) (4 bytes), window scale (3 bytes), SACK permitted (2 bytes), and timestamp (10 bytes) options. Together these sum to 19 bytes. Some operating systems appear to pad each option up to a word boundary, thus using 24 bytes (a brief survey suggests Windows XP and Mac OS X do this, whereas Linux does not). Optimistically, therefore, we have 21 bytes spare, or 16 if it has to be word-aligned. In either case, however, the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16 bytes) options will fit in this remaining space. TCP data packets typically carry timestamp options in every packet, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, if word-aligned). The Data Sequence Signal (DSS) option varies in length depending on whether the data sequence mapping and DATA_ACK are included, and whether the sequence numbers in use are 4 or 8 octets. The maximum size of the DSS option is 28 bytes, so even that will fit in the available space. But unless a connection is both bidirectional and high-bandwidth, it is unlikely that all that option space will be required on each DSS option. Within the DSS option, it is not necessary to include the data sequence mapping and DATA_ACK in each packet, and in many cases it may be possible to alternate their presence (so long as the mapping covers the data being sent in the following packet). It would also be possible to alternate between 4- and 8-byte sequence numbers in each option. On subflow and connection setup, an MPTCP option is also set on the third packet (an ACK). These are 20 bytes (for Multipath Capable) and 24 bytes (for Join), both of which will fit in the available option space. Pure ACKs in TCP typically contain only timestamps (10 bytes). Here, Multipath TCP typically needs to encode only the DATA_ACK (maximum of 12 bytes). Occasionally, ACKs will contain SACK information. Depending on the number of lost packets, SACK may utilize the entire
option space. If a DATA_ACK had to be included, then it is probably necessary to reduce the number of SACK blocks to accommodate the DATA_ACK. However, the presence of the DATA_ACK is unlikely to be necessary in a case where SACK is in use, since until at least some of the SACK blocks have been retransmitted, the cumulative data-level ACK will not be moving forward (or if it does, due to retransmissions on another path, then that path can also be used to transmit the new DATA_ACK). The ADD_ADDR option can be between 8 and 22 bytes, depending on whether IPv4 or IPv6 is used, and whether or not the port number is present. It is unlikely that such signaling would fit in a data packet (although if there is space, it is fine to include it). It is recommended to use duplicate ACKs with no other payload or options in order to transmit these rare signals. Note this is the reason for mandating that duplicate ACKs with MPTCP options are not taken as a signal of congestion. Finally, there are issues with reliable delivery of options. As options can also be sent on pure ACKs, these are not reliably sent. This is not an issue for DATA_ACK due to their cumulative nature, but may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is recommended to send these options redundantly (whether on multiple paths or on the same path on a number of ACKs -- but interspersed with data in order to avoid interpretation as congestion). The cases where options are stripped by middleboxes are discussed in Section 6.Appendix B. Control Blocks
Conceptually, an MPTCP connection can be represented as an MPTCP control block that contains several variables that track the progress and the state of the MPTCP connection and a set of linked TCP control blocks that correspond to the subflows that have been established. RFC 793 [1] specifies several state variables. Whenever possible, we reuse the same terminology as RFC 793 to describe the state variables that are maintained by MPTCP.B.1. MPTCP Control Block
The MPTCP control block contains the following variable per connection.B.1.1. Authentication and Metadata
Local.Token (32 bits): This is the token chosen by the local host on this MPTCP connection. The token MUST be unique among all established MPTCP connections, generated from the local key.
Local.Key (64 bits): This is the key sent by the local host on this MPTCP connection. Remote.Token (32 bits): This is the token chosen by the remote host on this MPTCP connection, generated from the remote key. Remote.Key (64 bits): This is the key chosen by the remote host on this MPTCP connection MPTCP.Checksum (flag): This flag is set to true if at least one of the hosts has set the C bit in the MP_CAPABLE options exchanged during connection establishment, and is set to false otherwise. If this flag is set, the checksum must be computed in all DSS options.B.1.2. Sending Side
SND.UNA (64 bits): This is the data sequence number of the next byte to be acknowledged, at the MPTCP connection level. This variable is updated upon reception of a DSS option containing a DATA_ACK. SND.NXT (64 bits): This is the data sequence number of the next byte to be sent. SND.NXT is used to determine the value of the DSN in the DSS option. SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the sending window. MPTCP maintains the sending window at the MPTCP connection level and the same window is shared by all subflows. All subflows use the MPTCP connection level SND.WND to compute the SEQ.WND value that is sent in each transmitted segment.B.1.3. Receiving Side
RCV.NXT (64 bits): This is the data sequence number of the next byte that is expected on the MPTCP connection. This state variable is modified upon reception of in-order data. The value of RCV.NXT is used to specify the DATA_ACK that is sent in the DSS option on all subflows. RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the connection-level receive window, which is the maximum of the RCV.WND on all the subflows.
B.2. TCP Control Blocks
The MPTCP control block also contains a list of the TCP control blocks that are associated to the MPTCP connection. Note that the TCP control block on the TCP subflows does not contain the RCV.WND and SND.WND state variables as these are maintained at the MPTCP connection level and not at the subflow level. Inside each TCP control block, the following state variables are defined.B.2.1. Sending Side
SND.UNA (32 bits): This is the sequence number of the next byte to be acknowledged on the subflow. This variable is updated upon reception of each TCP acknowledgment on the subflow. SND.NXT (32 bits): This is the sequence number of the next byte to be sent on the subflow. SND.NXT is used to set the value of SEG.SEQ upon transmission of the next segment.B.2.2. Receiving Side
RCV.NXT (32 bits): This is the sequence number of the next byte that is expected on the subflow. This state variable is modified upon reception of in-order segments. The value of RCV.NXT is copied to the SEG.ACK field of the next segments transmitted on the subflow. RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the subflow-level receive window that is updated with the window field from the segments received on this subflow.
Appendix C. Finite State Machine
The diagram in Figure 17 shows the Finite State Machine for connection-level closure. This illustrates how the DATA_FIN connection-level signal (indicated as the DFIN flag on a DATA_ACK) interacts with subflow-level FINs, and permits "break-before-make" handover between subflows. +---------+ | M_ESTAB | +---------+ M_CLOSE | | rcv DATA_FIN ------- | | ------- +---------+ snd DATA_FIN / \ snd DATA_ACK[DFIN] +---------+ | M_FIN |<----------------- ------------------->| M_CLOSE | | WAIT-1 |--------------------------- | WAIT | +---------+ rcv DATA_FIN \ +---------+ | rcv DATA_ACK[DFIN] ------- | M_CLOSE | | -------------- snd DATA_ACK | ------- | | CLOSE all subflows | snd DATA_FIN | V V V +-----------+ +-----------+ +-----------+ |M_FINWAIT-2| | M_CLOSING | | M_LAST-ACK| +-----------+ +-----------+ +-----------+ | rcv DATA_ACK[DFIN] | rcv DATA_ACK[DFIN] | | rcv DATA_FIN -------------- | -------------- | | ------- CLOSE all subflows | CLOSE all subflows | | snd DATA_ACK[DFIN] V delete MPTCP PCB V \ +-----------+ +---------+ ------------------------>|M_TIME WAIT|----------------->| M_CLOSED| +-----------+ +---------+ All subflows in CLOSED ------------ delete MPTCP PCB Figure 17: Finite State Machine for Connection Closure
Authors' Addresses
Alan Ford Cisco Ruscombe Business Park Ruscombe, Berkshire RG10 9NN UK EMail: alanford@cisco.com Costin Raiciu University Politehnica of Bucharest Splaiul Independentei 313 Bucharest Romania EMail: costin.raiciu@cs.pub.ro Mark Handley University College London Gower Street London WC1E 6BT UK EMail: m.handley@cs.ucl.ac.uk Olivier Bonaventure Universite catholique de Louvain Pl. Ste Barbe, 2 Louvain-la-Neuve 1348 Belgium EMail: olivier.bonaventure@uclouvain.be