
RFC 6824

TCP Extensions for Multipath Operation with Multiple Addresses

Pages: 64
Obsoleted by:  8684
Part 4 of 4 – Pages 57 to 64

Top   ToC   RFC6824 - Page 57   prevText

9. References

9.1. Normative References

   [1]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
         September 1981.

   [2]   Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar,
         "Architectural Guidelines for Multipath TCP Development",
         RFC 6182, March 2011.

   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [4]   National Institute of Standards and Technology, "Secure Hash
         Standard", Federal Information Processing Standard
         (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/
         fips/fips180-3/fips180-3_final.pdf>.

9.2. Informative References

   [5]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion
         Control for Multipath Transport Protocols", RFC 6356,
         October 2011.

   [6]   Scharf, M. and A. Ford, "MPTCP Application Interface
         Considerations", Work in Progress, October 2012.

   [7]   Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
         RFC 2992, November 2000.

   [8]   Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
         Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It
         Be? Designing and Implementing a Deployable Multipath TCP",
         Usenix Symposium on Networked Systems Design and
         Implementation 2012, 2012, <https://www.usenix.org/conference/
         nsdi12/how-hard-can-it-be-designing-and-implementing-
         deployable-multipath-tcp>.

   [9]   Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath
         Operation with Multiple Addresses", RFC 6181, March 2011.

   [10]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing
         for Message Authentication", RFC 2104, February 1997.

   [11]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
         Selective Acknowledgment Options", RFC 2018, October 1996.

   [12]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
         Control", RFC 5681, September 2009.

   [13]  Gont, F., "Survey of Security Hardening Methods for
         Transmission Control Protocol (TCP) Implementations", Work
         in Progress, March 2012.

   [14]  Eastlake, D., Schiller, J., and S. Crocker, "Randomness
         Requirements for Security", BCP 106, RFC 4086, June 2005.

   [15]  Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and
         SHA-based HMAC and HKDF)", RFC 6234, May 2011.

   [16]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for
         High Performance", RFC 1323, May 1992.

   [17]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
         Explicit Congestion Notification (ECN) to IP", RFC 3168,
         September 2001.

   [18]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E.
         Lear, "Address Allocation for Private Internets", BCP 5,
         RFC 1918, February 1996.

   [19]  Braden, R., "Requirements for Internet Hosts - Communication
         Layers", STD 3, RFC 1122, October 1989.

   [20]  Ramaiah, A., "TCP option space extension", Work in Progress,
         March 2012.

   [21]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
         Translator (Traditional NAT)", RFC 3022, January 2001.

   [22]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
         Shelby, "Performance Enhancing Proxies Intended to Mitigate
         Link-Related Degradations", RFC 3135, June 2001.

   [23]  Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
         Detection: Evasion, Traffic Normalization, and End-to-End
         Protocol Semantics", Usenix Security 2001, 2001,
         <http://www.usenix.org/events/sec01/full_papers/
         handley/handley.pdf>.

   [24]  Freed, N., "Behavior of and Requirements for Internet
         Firewalls", RFC 2979, October 2000.

   [25]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
         Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

Appendix A. Notes on Use of TCP Options

   The TCP option space is limited due to the length of the Data Offset
   field in the TCP header (4 bits), which defines the TCP header length
   in 32-bit words.  With the standard TCP header being 20 bytes, this
   leaves a maximum of 40 bytes for options, and many of these may
   already be used by options such as timestamp and SACK.

   We have performed a brief study on the commonly used TCP options in
   SYN, data, and pure ACK packets, and found that there is enough room
   to fit all the options we propose using in this document.

   SYN packets typically include Maximum Segment Size (MSS) (4 bytes),
   window scale (3 bytes), SACK permitted (2 bytes), and timestamp (10
   bytes) options.  Together these sum to 19 bytes.  Some operating
   systems appear to pad each option up to a word boundary, thus using
   24 bytes (a brief survey suggests Windows XP and Mac OS X do this,
   whereas Linux does not).  Optimistically, therefore, we have 21 bytes
   spare, or 16 if it has to be word-aligned.  In either case, however,
   the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
   bytes) options will fit in this remaining space.

   TCP data packets typically carry timestamp options in every packet,
   taking 10 bytes (or 12 with padding).  That leaves 30 bytes (or 28,
   if word-aligned).  The Data Sequence Signal (DSS) option varies in
   length depending on whether the data sequence mapping and DATA_ACK
   are included, and whether the sequence numbers in use are 4 or 8
   octets.  The maximum size of the DSS option is 28 bytes, so even that
   will fit in the available space.  But unless a connection is both
   bidirectional and high-bandwidth, it is unlikely that all that option
   space will be required on each DSS option.

   Within the DSS option, it is not necessary to include the data
   sequence mapping and DATA_ACK in each packet, and in many cases it
   may be possible to alternate their presence (so long as the mapping
   covers the data being sent in the following packet).  It would also
   be possible to alternate between 4- and 8-byte sequence numbers in
   each option.

   On subflow and connection setup, an MPTCP option is also set on the
   third packet (an ACK).  These are 20 bytes (for Multipath Capable)
   and 24 bytes (for Join), both of which will fit in the available
   option space.

   Pure ACKs in TCP typically contain only timestamps (10 bytes).  Here,
   Multipath TCP typically needs to encode only the DATA_ACK (maximum of
   12 bytes).  Occasionally, ACKs will contain SACK information.
   Depending on the number of lost packets, SACK may utilize the entire
   option space.  If a DATA_ACK had to be included, then it is probably
   necessary to reduce the number of SACK blocks to accommodate the
   DATA_ACK.  However, the presence of the DATA_ACK is unlikely to be
   necessary in a case where SACK is in use, since until at least some
   of the SACK blocks have been retransmitted, the cumulative data-level
   ACK will not be moving forward (or if it does, due to retransmissions
   on another path, then that path can also be used to transmit the new
   DATA_ACK).
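
   The byte counts above can be checked with a short calculation.  The
   following is a minimal sketch, not part of the specification: the
   function and constant names are illustrative, and the DSS field sizes
   (4-byte fixed part, 4-byte subflow sequence number, 2-byte data-level
   length, 2-byte checksum) are taken from the DSS option layout defined
   in the main body of the specification.

```python
# Sketch of the option-space arithmetic from this appendix.
# All names here are illustrative; none come from the specification.

MAX_TCP_HEADER = 15 * 4    # Data Offset is 4 bits, counted in 32-bit words
BASE_TCP_HEADER = 20       # standard TCP header without options
OPTION_SPACE = MAX_TCP_HEADER - BASE_TCP_HEADER   # 40 bytes


def pad_to_word(length):
    """Round an option length up to the next 4-byte boundary."""
    return (length + 3) // 4 * 4


def spare_bytes(option_lengths, pad_each=False):
    """Option bytes left once the given options have been placed."""
    used = sum(pad_to_word(n) if pad_each else n for n in option_lengths)
    return OPTION_SPACE - used


def dss_length(data_ack_bytes=0, dsn_bytes=0, checksum=True):
    """Length of a DSS option.

    data_ack_bytes: 0 (no DATA_ACK), 4, or 8.
    dsn_bytes: 0 (no mapping), 4, or 8.
    A mapping adds the DSN, a 4-byte subflow sequence number, a 2-byte
    data-level length and, when checksums are in use, a 2-byte checksum.
    """
    length = 4 + data_ack_bytes   # kind, length, subtype, and flags
    if dsn_bytes:
        length += dsn_bytes + 4 + 2 + (2 if checksum else 0)
    return length


SYN_OPTIONS = [4, 3, 2, 10]   # MSS, window scale, SACK permitted, timestamp

print(spare_bytes(SYN_OPTIONS))                  # 21
print(spare_bytes(SYN_OPTIONS, pad_each=True))   # 16
print(spare_bytes([10]))                         # 30 (timestamp only)
print(spare_bytes([10], pad_each=True))          # 28
print(dss_length(data_ack_bytes=8, dsn_bytes=8)) # 28 (maximum DSS)
print(dss_length(data_ack_bytes=8))              # 12 (DATA_ACK only)
```

   Even the largest DSS option (28 bytes) fits next to a padded
   timestamp option, which matches the conclusion drawn in the text.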

   The ADD_ADDR option can be between 8 and 22 bytes, depending on
   whether IPv4 or IPv6 is used, and whether or not the port number is
   present.  It is unlikely that such signaling would fit in a data
   packet (although if there is space, it is fine to include it).  It is
   recommended to use duplicate ACKs with no other payload or options in
   order to transmit these rare signals.  Note this is the reason for
   mandating that duplicate ACKs with MPTCP options are not taken as a
   signal of congestion.
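
   The 8-to-22-byte range follows from the option layout.  The sketch
   below assumes the ADD_ADDR format from the main body of the
   specification (a 4-byte fixed part holding kind, length, subtype/IP
   version, and address ID, followed by the address and an optional
   2-byte port); the helper names are illustrative only.

```python
# Sketch: ADD_ADDR option length, and whether it fits beside other options.
# Helper names are illustrative; they are not defined by the specification.

OPTION_SPACE = 40   # maximum bytes of TCP options (see start of appendix)


def add_addr_length(ipv6, with_port):
    """Length in bytes of an ADD_ADDR option."""
    fixed = 4                      # kind, length, subtype/IPVer, address ID
    address = 16 if ipv6 else 4    # IPv6 or IPv4 address
    port = 2 if with_port else 0   # optional port number
    return fixed + address + port


def fits(option_length, other_option_bytes):
    """True if the option fits next to options already in the segment."""
    return option_length + other_option_bytes <= OPTION_SPACE


for ipv6 in (False, True):
    for with_port in (False, True):
        print(ipv6, with_port, add_addr_length(ipv6, with_port))
# Lengths printed: 8, 10, 20, 22

# A data packet carrying a timestamp (10) and a maximum-size DSS (28)
# has 2 bytes left, so no ADD_ADDR variant fits.
print(fits(add_addr_length(False, False), 10 + 28))   # False

# A duplicate ACK with no other options has room for every variant.
print(fits(add_addr_length(True, True), 0))           # True
```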

   Finally, there are issues with reliable delivery of options.  As
   options can also be sent on pure ACKs, these are not reliably sent.
   This is not an issue for DATA_ACK due to their cumulative nature, but
   may be an issue for ADD_ADDR/REMOVE_ADDR options.  Here, it is
   recommended to send these options redundantly (whether on multiple
   paths or on the same path on a number of ACKs -- but interspersed
   with data in order to avoid interpretation as congestion).  The cases
   where options are stripped by middleboxes are discussed in Section 6.

Appendix B. Control Blocks

Conceptually, an MPTCP connection can be represented as an MPTCP control block that contains several variables that track the progress and the state of the MPTCP connection and a set of linked TCP control blocks that correspond to the subflows that have been established. RFC 793 [1] specifies several state variables. Whenever possible, we reuse the same terminology as RFC 793 to describe the state variables that are maintained by MPTCP.

B.1. MPTCP Control Block

The MPTCP control block contains the following variables per connection.

B.1.1. Authentication and Metadata

   Local.Token (32 bits):  This is the token chosen by the local host on
      this MPTCP connection.  The token MUST be unique among all
      established MPTCP connections, and is generated from the local
      key.

   Local.Key (64 bits):  This is the key sent by the local host on this
      MPTCP connection.

   Remote.Token (32 bits):  This is the token chosen by the remote host
      on this MPTCP connection, generated from the remote key.

   Remote.Key (64 bits):  This is the key chosen by the remote host on
      this MPTCP connection.

   MPTCP.Checksum (flag):  This flag is set to true if at least one of
      the hosts has set the C bit in the MP_CAPABLE options exchanged
      during connection establishment, and is set to false otherwise.
      If this flag is set, the checksum must be computed in all DSS
      options.
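
   How a token is "generated from" a key is defined in the main body of
   the specification rather than in this appendix: the token is the most
   significant 32 bits of the SHA-1 hash of the key.  The following is a
   minimal sketch under that assumption; the function names and the
   retry loop used to enforce token uniqueness are illustrative, not a
   mandated procedure.

```python
# Sketch: deriving Local.Token from Local.Key and keeping tokens unique.
# Function names and the retry strategy are illustrative assumptions.
import hashlib
import secrets


def token_from_key(key):
    """Return the 32-bit token for a 64-bit key given as 8 bytes."""
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits (8 bytes)")
    digest = hashlib.sha1(key).digest()       # 20-byte SHA-1 digest
    return int.from_bytes(digest[:4], "big")  # most significant 32 bits


def new_local_key(tokens_in_use):
    """Pick a random key whose token is not used by another connection."""
    while True:
        key = secrets.token_bytes(8)
        token = token_from_key(key)
        if token not in tokens_in_use:
            return key, token


established = set()
key, token = new_local_key(established)
established.add(token)
print(key.hex(), format(token, "08x"))
```

   The same function applied to the key received from the peer yields
   Remote.Token.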

B.1.2. Sending Side

   SND.UNA (64 bits):  This is the data sequence number of the next byte
      to be acknowledged, at the MPTCP connection level.  This variable
      is updated upon reception of a DSS option containing a DATA_ACK.

   SND.NXT (64 bits):  This is the data sequence number of the next byte
      to be sent.  SND.NXT is used to determine the value of the DSN in
      the DSS option.

   SND.WND (32 bits with RFC 1323, 16 bits otherwise):  This is the
      sending window.  MPTCP maintains the sending window at the MPTCP
      connection level and the same window is shared by all subflows.
      All subflows use the MPTCP connection level SND.WND to compute the
      SEG.WND value that is sent in each transmitted segment.

B.1.3. Receiving Side

   RCV.NXT (64 bits):  This is the data sequence number of the next byte
      that is expected on the MPTCP connection.  This state variable is
      modified upon reception of in-order data.  The value of RCV.NXT is
      used to specify the DATA_ACK that is sent in the DSS option on all
      subflows.

   RCV.WND (32 bits with RFC 1323, 16 bits otherwise):  This is the
      connection-level receive window, which is the maximum of the
      RCV.WND on all the subflows.

B.2. TCP Control Blocks

The MPTCP control block also contains a list of the TCP control blocks that are associated with the MPTCP connection. Note that the TCP control block on the TCP subflows does not contain the RCV.WND and SND.WND state variables as these are maintained at the MPTCP connection level and not at the subflow level. Inside each TCP control block, the following state variables are defined.

B.2.1. Sending Side

   SND.UNA (32 bits):  This is the sequence number of the next byte to
      be acknowledged on the subflow.  This variable is updated upon
      reception of each TCP acknowledgment on the subflow.

   SND.NXT (32 bits):  This is the sequence number of the next byte to
      be sent on the subflow.  SND.NXT is used to set the value of
      SEG.SEQ upon transmission of the next segment.

B.2.2. Receiving Side

   RCV.NXT (32 bits):  This is the sequence number of the next byte that
      is expected on the subflow.  This state variable is modified upon
      reception of in-order segments.  The value of RCV.NXT is copied to
      the SEG.ACK field of the next segments transmitted on the subflow.

   RCV.WND (32 bits with RFC 1323, 16 bits otherwise):  This is the
      subflow-level receive window that is updated with the window field
      from the segments received on this subflow.
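
   The two levels of state described in this appendix could be laid out
   as follows.  This is a conceptual sketch only: the class and method
   names are illustrative, and the acceptance test in on_data_ack (a
   modulo-2**64 comparison against the outstanding data) is an
   assumption made for the example, not a rule stated in this appendix.

```python
# Sketch: one MPTCP control block linked to per-subflow TCP control blocks.
# Class names, method names, and the DATA_ACK acceptance test are
# illustrative assumptions.
from dataclasses import dataclass, field
from typing import List

MASK64 = (1 << 64) - 1   # data sequence numbers are 64 bits wide


@dataclass
class SubflowControlBlock:
    """Per-subflow TCP state (B.2); sequence numbers are 32 bits."""
    snd_una: int = 0   # next byte to be acknowledged on the subflow
    snd_nxt: int = 0   # next byte to be sent on the subflow
    rcv_nxt: int = 0   # next byte expected on the subflow
    rcv_wnd: int = 0   # subflow-level receive window


@dataclass
class MptcpControlBlock:
    """Connection-level state (B.1)."""
    local_key: int
    local_token: int
    remote_key: int
    remote_token: int
    checksum: bool = False   # MPTCP.Checksum flag
    snd_una: int = 0         # 64-bit data sequence numbers
    snd_nxt: int = 0
    snd_wnd: int = 0         # shared by all subflows
    rcv_nxt: int = 0
    subflows: List[SubflowControlBlock] = field(default_factory=list)

    @property
    def rcv_wnd(self):
        # B.1.3: the maximum of RCV.WND over all the subflows.
        return max((sf.rcv_wnd for sf in self.subflows), default=0)

    def on_data_ack(self, data_ack):
        # Advance SND.UNA only if the DATA_ACK covers outstanding data.
        advance = (data_ack - self.snd_una) & MASK64
        outstanding = (self.snd_nxt - self.snd_una) & MASK64
        if 0 < advance <= outstanding:
            self.snd_una = data_ack & MASK64

    def on_in_order_data(self, length):
        # RCV.NXT moves forward when in-order data arrives.
        self.rcv_nxt = (self.rcv_nxt + length) & MASK64


cb = MptcpControlBlock(local_key=1, local_token=2,
                       remote_key=3, remote_token=4)
cb.subflows.append(SubflowControlBlock(rcv_wnd=65535))
cb.subflows.append(SubflowControlBlock(rcv_wnd=32768))
cb.snd_nxt = 1000
cb.on_data_ack(400)
print(cb.snd_una, cb.rcv_wnd)   # 400 65535
```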

Appendix C. Finite State Machine

The diagram in Figure 17 shows the Finite State Machine for connection-level closure. This illustrates how the DATA_FIN connection-level signal (indicated as the DFIN flag on a DATA_ACK) interacts with subflow-level FINs, and permits "break-before-make" handover between subflows.

                                +---------+
                                | M_ESTAB |
                                +---------+
                       M_CLOSE    |     |    rcv DATA_FIN
                        -------   |     |    -------
   +---------+       snd DATA_FIN /       \ snd DATA_ACK[DFIN] +---------+
   | M_FIN   |<-----------------           ------------------->| M_CLOSE |
   | WAIT-1  |---------------------------                      |   WAIT  |
   +---------+               rcv DATA_FIN \                    +---------+
     | rcv DATA_ACK[DFIN]         ------- |                   M_CLOSE |
     | --------------        snd DATA_ACK |                   ------- |
     | CLOSE all subflows                 |              snd DATA_FIN |
     V                                    V                           V
   +-----------+              +-----------+                  +-----------+
   |M_FINWAIT-2|              | M_CLOSING |                  | M_LAST-ACK|
   +-----------+              +-----------+                  +-----------+
     |              rcv DATA_ACK[DFIN] |           rcv DATA_ACK[DFIN] |
     | rcv DATA_FIN     -------------- |               -------------- |
     |  -------     CLOSE all subflows |           CLOSE all subflows |
     | snd DATA_ACK[DFIN]              V            delete MPTCP PCB  V
     \                          +-----------+                  +---------+
       ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
                                +-----------+                  +---------+
                                           All subflows in CLOSED
                                               ------------
                                           delete MPTCP PCB

         Figure 17: Finite State Machine for Connection Closure
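
The transitions in Figure 17 can also be written as a lookup table. The sketch below is illustrative only: the state identifiers are normalized forms of the labels in the figure (for example, M_FIN_WAIT_1 for "M_FIN WAIT-1"), and the run helper is a hypothetical driver, not part of the specification.

```python
# Sketch: Figure 17 as a transition table.
# Keys are (state, event); values are (next state, actions to perform).
# Identifiers are normalized from the figure labels; run() is hypothetical.

TRANSITIONS = {
    ("M_ESTAB", "M_CLOSE"):
        ("M_FIN_WAIT_1", ("snd DATA_FIN",)),
    ("M_ESTAB", "rcv DATA_FIN"):
        ("M_CLOSE_WAIT", ("snd DATA_ACK[DFIN]",)),
    ("M_FIN_WAIT_1", "rcv DATA_ACK[DFIN]"):
        ("M_FIN_WAIT_2", ("CLOSE all subflows",)),
    ("M_FIN_WAIT_1", "rcv DATA_FIN"):
        ("M_CLOSING", ("snd DATA_ACK",)),
    ("M_CLOSE_WAIT", "M_CLOSE"):
        ("M_LAST_ACK", ("snd DATA_FIN",)),
    ("M_FIN_WAIT_2", "rcv DATA_FIN"):
        ("M_TIME_WAIT", ("snd DATA_ACK[DFIN]",)),
    ("M_CLOSING", "rcv DATA_ACK[DFIN]"):
        ("M_TIME_WAIT", ("CLOSE all subflows",)),
    ("M_LAST_ACK", "rcv DATA_ACK[DFIN]"):
        ("M_CLOSED", ("CLOSE all subflows", "delete MPTCP PCB")),
    ("M_TIME_WAIT", "all subflows CLOSED"):
        ("M_CLOSED", ("delete MPTCP PCB",)),
}


def run(events, state="M_ESTAB"):
    """Apply events in order; return the final state and all actions."""
    actions = []
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event!r}")
        state, step_actions = TRANSITIONS[(state, event)]
        actions.extend(step_actions)
    return state, actions


# Active close: the local application closes first.
final_state, performed = run([
    "M_CLOSE",
    "rcv DATA_ACK[DFIN]",
    "rcv DATA_FIN",
    "all subflows CLOSED",
])
print(final_state)   # M_CLOSED
print(performed)
```

A passive close follows the right-hand side of the figure: "rcv DATA_FIN", then "M_CLOSE", then "rcv DATA_ACK[DFIN]".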

Authors' Addresses

   Alan Ford
   Cisco
   Ruscombe Business Park
   Ruscombe, Berkshire  RG10 9NN
   UK

   EMail: alanford@cisco.com


   Costin Raiciu
   University Politehnica of Bucharest
   Splaiul Independentei 313
   Bucharest
   Romania

   EMail: costin.raiciu@cs.pub.ro


   Mark Handley
   University College London
   Gower Street
   London  WC1E 6BT
   UK

   EMail: m.handley@cs.ucl.ac.uk


   Olivier Bonaventure
   Universite catholique de Louvain
   Pl. Ste Barbe, 2
   Louvain-la-Neuve  1348
   Belgium

   EMail: olivier.bonaventure@uclouvain.be