RFC 6824

TCP Extensions for Multipath Operation with Multiple Addresses

Pages: 64
Obsoleted by: 8684

9.  References

9.1.  Normative References

   [1]   Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
         September 1981.

   [2]   Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar,
         "Architectural Guidelines for Multipath TCP Development",
         RFC 6182, March 2011.

   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [4]   National Institute of Standards and Technology, "Secure Hash
         Standard", Federal Information Processing Standard
         (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/
         fips/fips180-3/fips180-3_final.pdf>.

9.2.  Informative References

   [5]   Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion
         Control for Multipath Transport Protocols", RFC 6356,
         October 2011.

   [6]   Scharf, M. and A. Ford, "MPTCP Application Interface
         Considerations", Work in Progress, October 2012.

   [7]   Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
         RFC 2992, November 2000.

   [8]   Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
         Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It
         Be? Designing and Implementing a Deployable Multipath TCP",
         Usenix Symposium on Networked Systems Design and
         Implementation 2012, 2012, <https://www.usenix.org/conference/
         nsdi12/how-hard-can-it-be-designing-and-implementing-
         deployable-multipath-tcp>.

   [9]   Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath
         Operation with Multiple Addresses", RFC 6181, March 2011.

   [10]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing
         for Message Authentication", RFC 2104, February 1997.

   [11]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
         Selective Acknowledgment Options", RFC 2018, October 1996.

   [12]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
         Control", RFC 5681, September 2009.

   [13]  Gont, F., "Survey of Security Hardening Methods for
         Transmission Control Protocol (TCP) Implementations", Work
         in Progress, March 2012.

   [14]  Eastlake, D., Schiller, J., and S. Crocker, "Randomness
         Requirements for Security", BCP 106, RFC 4086, June 2005.

   [15]  Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and
         SHA-based HMAC and HKDF)", RFC 6234, May 2011.

   [16]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for
         High Performance", RFC 1323, May 1992.

   [17]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
         Explicit Congestion Notification (ECN) to IP", RFC 3168,
         September 2001.

   [18]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E.
         Lear, "Address Allocation for Private Internets", BCP 5,
         RFC 1918, February 1996.

   [19]  Braden, R., "Requirements for Internet Hosts - Communication
         Layers", STD 3, RFC 1122, October 1989.

   [20]  Ramaiah, A., "TCP option space extension", Work in Progress,
         March 2012.

   [21]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
         Translator (Traditional NAT)", RFC 3022, January 2001.

   [22]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
         Shelby, "Performance Enhancing Proxies Intended to Mitigate
         Link-Related Degradations", RFC 3135, June 2001.

   [23]  Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
         Detection: Evasion, Traffic Normalization, and End-to-End
         Protocol Semantics", Usenix Security 2001, 2001,
         <http://www.usenix.org/events/sec01/full_papers/
         handley/handley.pdf>.

   [24]  Freed, N., "Behavior of and Requirements for Internet
         Firewalls", RFC 2979, October 2000.

   [25]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
         Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

Appendix A.  Notes on Use of TCP Options

   The TCP option space is limited due to the length of the Data Offset
   field in the TCP header (4 bits), which defines the TCP header length
   in 32-bit words.  With the standard TCP header being 20 bytes, this
   leaves a maximum of 40 bytes for options, and many of these may
   already be used by options such as timestamp and SACK.
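   As a quick check of the arithmetic above, the following minimal
   Python sketch derives the option budget from the 4-bit Data Offset
   field.  The constant and function names are illustrative only.

```python
DATA_OFFSET_BITS = 4    # width of the Data Offset field
WORD_BYTES = 4          # Data Offset counts 32-bit words
BASE_HEADER_BYTES = 20  # standard TCP header without options


def max_option_bytes() -> int:
    """Largest number of option bytes a TCP header can carry."""
    max_header_words = (1 << DATA_OFFSET_BITS) - 1    # 15 words
    max_header_bytes = max_header_words * WORD_BYTES  # 60 bytes
    return max_header_bytes - BASE_HEADER_BYTES


if __name__ == "__main__":
    print(max_option_bytes())  # 40
```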

   We have performed a brief study of the commonly used TCP options in
   SYN, data, and pure ACK packets, and found that there is enough room
   to fit all the options we propose using in this document.

   SYN packets typically include Maximum Segment Size (MSS) (4 bytes),
   window scale (3 bytes), SACK permitted (2 bytes), and timestamp (10
   bytes) options.  Together these sum to 19 bytes.  Some operating
   systems appear to pad each option up to a word boundary, thus using
   24 bytes (a brief survey suggests Windows XP and Mac OS X do this,
   whereas Linux does not).  Optimistically, therefore, we have 21 bytes
   spare, or 16 if it has to be word-aligned.  In either case, however,
   the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
   bytes) options will fit in this remaining space.
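   The SYN budget above can be recomputed with a short Python sketch.
   It assumes, as the text does, that word alignment means padding each
   option to a multiple of 4 bytes, and it treats the 16-byte Join as
   the SYN/ACK form of MP_JOIN.  All names are hypothetical.

```python
# Option sizes in bytes, as listed in the text above.
SYN_OPTIONS = {"MSS": 4, "window scale": 3, "SACK permitted": 2, "timestamp": 10}
MPTCP_SYN_OPTIONS = {"MP_CAPABLE": 12, "MP_JOIN (SYN)": 12, "MP_JOIN (SYN/ACK)": 16}
MAX_OPTION_BYTES = 40  # 60-byte maximum header minus the 20-byte base header


def pad_to_word(n: int) -> int:
    """Round an option length up to the next multiple of 4 bytes."""
    return (n + 3) // 4 * 4


def syn_spare_bytes(word_aligned: bool) -> int:
    """Option space left on a SYN after the common options."""
    sizes = SYN_OPTIONS.values()
    used = sum(pad_to_word(s) for s in sizes) if word_aligned else sum(sizes)
    return MAX_OPTION_BYTES - used


if __name__ == "__main__":
    for aligned in (False, True):
        spare = syn_spare_bytes(aligned)  # 21 unpadded, 16 word-aligned
        for name, size in MPTCP_SYN_OPTIONS.items():
            print(f"aligned={aligned} spare={spare} {name}={size} fits={size <= spare}")
```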

   TCP data packets typically carry timestamp options in every packet,
   taking 10 bytes (or 12 with padding).  That leaves 30 bytes (or 28,
   if word-aligned).  The Data Sequence Signal (DSS) option varies in
   length depending on whether the data sequence mapping and DATA_ACK
   are included, and whether the sequence numbers in use are 4 or 8
   octets.  The maximum size of the DSS option is 28 bytes, so even that
   will fit in the available space.  But unless a connection is both
   bidirectional and high-bandwidth, it is unlikely that all that option
   space will be required on each DSS option.
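   The DSS length can be sketched as a function of which fields are
   present.  This Python sketch assumes the DSS layout defined earlier
   in this document: a 4-byte header (Kind, Length, Subtype and flags),
   an optional DATA_ACK of 4 or 8 bytes, and an optional mapping made of
   a 4- or 8-byte DSN, a 4-byte subflow sequence number, a 2-byte
   data-level length, and a 2-byte checksum that is present only when
   checksums were negotiated.  The function name is hypothetical.

```python
MAX_OPTION_BYTES = 40


def dss_length(data_ack: bool, ack_8: bool, mapping: bool,
               dsn_8: bool, checksum: bool) -> int:
    """Length in bytes of a DSS option carrying the given fields."""
    length = 4  # Kind, Length, Subtype/reserved bits, flags
    if data_ack:
        length += 8 if ack_8 else 4  # DATA_ACK
    if mapping:
        length += 8 if dsn_8 else 4  # Data Sequence Number
        length += 4                  # Subflow Sequence Number
        length += 2                  # Data-Level Length
        if checksum:
            length += 2              # Checksum, only if negotiated
    return length


if __name__ == "__main__":
    worst = dss_length(True, True, True, True, True)
    # Compare against the space left after a word-aligned timestamp.
    print(worst, worst <= MAX_OPTION_BYTES - 12)  # 28 True
```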

   Within the DSS option, it is not necessary to include the data
   sequence mapping and DATA_ACK in each packet, and in many cases it
   may be possible to alternate their presence (so long as the mapping
   covers the data being sent in the following packet).  It would also
   be possible to alternate between 4- and 8-byte sequence numbers in
   each option.

   On subflow and connection setup, an MPTCP option is also set on the
   third packet (an ACK).  These are 20 bytes (for Multipath Capable)
   and 24 bytes (for Join), both of which will fit in the available
   option space.

   Pure ACKs in TCP typically contain only timestamps (10 bytes).  Here,
   Multipath TCP typically needs to encode only the DATA_ACK (maximum of
   12 bytes).  Occasionally, ACKs will contain SACK information.
   Depending on the number of lost packets, SACK may utilize the entire
   option space.  If a DATA_ACK had to be included, then it is probably
   necessary to reduce the number of SACK blocks to accommodate the
   DATA_ACK.  However, the presence of the DATA_ACK is unlikely to be
   necessary in a case where SACK is in use, since until at least some
   of the SACK blocks have been retransmitted, the cumulative data-level
   ACK will not be moving forward (or if it does, due to retransmissions
   on another path, then that path can also be used to transmit the new
   DATA_ACK).
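   The trade-off between SACK blocks and a DATA_ACK can be illustrated
   with a small Python sketch.  It assumes unpadded options, a 10-byte
   timestamp on every ACK, and the SACK layout of RFC 2018 [11] (a
   2-byte header plus 8 bytes per block); the names are hypothetical.

```python
MAX_OPTION_BYTES = 40
TIMESTAMP_BYTES = 10   # unpadded timestamp option
SACK_HEADER_BYTES = 2  # Kind + Length
SACK_BLOCK_BYTES = 8   # left edge + right edge, 32 bits each


def max_sack_blocks(dss_bytes: int = 0) -> int:
    """SACK blocks that still fit on a pure ACK carrying a timestamp
    and, optionally, a DSS option of dss_bytes bytes (no padding)."""
    free = MAX_OPTION_BYTES - TIMESTAMP_BYTES - dss_bytes
    if free < SACK_HEADER_BYTES + SACK_BLOCK_BYTES:
        return 0
    return (free - SACK_HEADER_BYTES) // SACK_BLOCK_BYTES


if __name__ == "__main__":
    print(max_sack_blocks())    # 3: no DATA_ACK
    print(max_sack_blocks(12))  # 2: DSS carrying an 8-byte DATA_ACK
```

   Under these assumptions, adding even the largest DATA_ACK costs one
   SACK block.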

   The ADD_ADDR option can be between 8 and 22 bytes, depending on
   whether IPv4 or IPv6 is used, and whether or not the port number is
   present.  It is unlikely that such signaling would fit in a data
   packet (although if there is space, it is fine to include it).  It is
   recommended to use duplicate ACKs with no other payload or options in
   order to transmit these rare signals.  Note that this is the reason for
   mandating that duplicate ACKs with MPTCP options are not taken as a
   signal of congestion.
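   The 8-to-22-byte range follows from the ADD_ADDR layout.  This Python
   sketch assumes a 4-byte header (Kind, Length, Subtype/IPVer, Address
   ID) followed by the address and an optional 2-byte port; the function
   name is hypothetical.

```python
def add_addr_length(ip_version: int, with_port: bool) -> int:
    """Length in bytes of an ADD_ADDR option."""
    if ip_version not in (4, 6):
        raise ValueError("ip_version must be 4 or 6")
    length = 4                                # Kind, Length, Subtype/IPVer, Address ID
    length += 4 if ip_version == 4 else 16   # the advertised address
    if with_port:
        length += 2                           # optional port number
    return length


if __name__ == "__main__":
    print(add_addr_length(4, False), add_addr_length(6, True))  # 8 22
```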

   Finally, there are issues with reliable delivery of options.  Options
   carried on pure ACKs are not delivered reliably, because pure ACKs
   are never retransmitted.  This is not an issue for DATA_ACK due to
   its cumulative nature, but may be an issue for ADD_ADDR/REMOVE_ADDR
   options.  Here, it is recommended to send these options redundantly
   (whether on multiple paths or on the same path on a number of ACKs,
   but interspersed with data in order to avoid interpretation as
   congestion).  The cases where options are stripped by middleboxes
   are discussed in Section 6.

Appendix B.  Control Blocks

   Conceptually, an MPTCP connection can be represented as an MPTCP
   control block that contains several variables that track the progress
   and the state of the MPTCP connection and a set of linked TCP control
   blocks that correspond to the subflows that have been established.

   RFC 793 [1] specifies several state variables.  Whenever possible, we
   reuse the same terminology as RFC 793 to describe the state variables
   that are maintained by MPTCP.

B.1.  MPTCP Control Block

   The MPTCP control block contains the following variables per
   connection.

B.1.1.  Authentication and Metadata

   Local.Token (32 bits):  This is the token chosen by the local host on
      this MPTCP connection.  The token is generated from the local key
      and MUST be unique among all established MPTCP connections.

   Local.Key (64 bits):  This is the key sent by the local host on this
      MPTCP connection.

   Remote.Token (32 bits):  This is the token chosen by the remote host
      on this MPTCP connection, generated from the remote key.

   Remote.Key (64 bits):  This is the key chosen by the remote host on
      this MPTCP connection.

   MPTCP.Checksum (flag):  This flag is set to true if at least one of
      the hosts has set the C bit in the MP_CAPABLE options exchanged
      during connection establishment, and is set to false otherwise.
      If this flag is set, the checksum must be computed in all DSS
      options.
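   The token derivation can be sketched in Python, assuming the
   SHA-1-based method defined earlier in this document (the token is
   the most significant 32 bits of the SHA-1 [4] hash of the 64-bit
   key, taken in network byte order).  The retry-on-collision loop is
   one assumed way of meeting the uniqueness requirement, not a
   mandated procedure; both function names are hypothetical.

```python
import hashlib
import secrets


def token_from_key(key: int) -> int:
    """Most significant 32 bits of SHA-1 over the 64-bit key."""
    digest = hashlib.sha1(key.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big")


def new_local_key(tokens_in_use: set) -> tuple:
    """Draw random keys until the derived token is unused locally.

    Assumed strategy: a colliding token simply triggers a new key.
    """
    while True:
        key = secrets.randbits(64)
        token = token_from_key(key)
        if token not in tokens_in_use:
            tokens_in_use.add(token)
            return key, token
```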

B.1.2.  Sending Side

   SND.UNA (64 bits):  This is the data sequence number of the next byte
      to be acknowledged, at the MPTCP connection level.  This variable
      is updated upon reception of a DSS option containing a DATA_ACK.

   SND.NXT (64 bits):  This is the data sequence number of the next byte
      to be sent.  SND.NXT is used to determine the value of the DSN in
      the DSS option.

   SND.WND (32 bits with RFC 1323, 16 bits otherwise):  This is the
      sending window.  MPTCP maintains the sending window at the MPTCP
      connection level and the same window is shared by all subflows.
      All subflows use the MPTCP connection level SND.WND to compute the
      SEQ.WND value that is sent in each transmitted segment.
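   A minimal Python sketch of how SND.UNA and SND.NXT might evolve at
   the connection level follows.  It assumes 64-bit data sequence
   numbers compared modulo 2^64 and accepts a DATA_ACK only if it
   acknowledges new data that has already been sent.  The class and
   method names are hypothetical, and window handling is omitted.

```python
MASK64 = (1 << 64) - 1  # data sequence numbers wrap modulo 2**64


class MptcpSendState:
    """Hypothetical connection-level sender state (SND.UNA, SND.NXT)."""

    def __init__(self, initial_dsn: int) -> None:
        # Assumed starting point: nothing sent, nothing outstanding.
        self.snd_una = initial_dsn & MASK64
        self.snd_nxt = initial_dsn & MASK64

    def next_mapping(self, length: int) -> int:
        """Return the DSN for `length` new bytes and advance SND.NXT."""
        dsn = self.snd_nxt
        self.snd_nxt = (self.snd_nxt + length) & MASK64
        return dsn

    def on_data_ack(self, data_ack: int) -> bool:
        """Advance SND.UNA if the DATA_ACK covers new, already-sent data."""
        acked = (data_ack - self.snd_una) & MASK64
        outstanding = (self.snd_nxt - self.snd_una) & MASK64
        if 0 < acked <= outstanding:
            self.snd_una = data_ack & MASK64
            return True
        return False  # duplicate, stale, or beyond SND.NXT
```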

B.1.3.  Receiving Side

   RCV.NXT (64 bits):  This is the data sequence number of the next byte
      that is expected on the MPTCP connection.  This state variable is
      modified upon reception of in-order data.  The value of RCV.NXT is
      used to specify the DATA_ACK that is sent in the DSS option on all
      subflows.

   RCV.WND (32 bits with RFC 1323, 16 bits otherwise):  This is the
      connection-level receive window, which is the maximum of the
      RCV.WND on all the subflows.
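   The receiving-side variables can be sketched in Python as follows.
   The names are hypothetical, and the sketch assumes that data arrives
   as whole, non-overlapping mappings identified by their starting DSN;
   RCV.NXT advances only on in-order data, is echoed as the DATA_ACK,
   and the connection-level RCV.WND is taken as the maximum of the
   subflow values, as described above.

```python
MASK64 = (1 << 64) - 1
HALF64 = 1 << 63


class MptcpRecvState:
    """Hypothetical connection-level receiver state (RCV.NXT)."""

    def __init__(self, rcv_nxt: int) -> None:
        self.rcv_nxt = rcv_nxt & MASK64
        self.out_of_order = {}  # DSN -> length of mappings received early

    def on_data(self, dsn: int, length: int) -> None:
        offset = (dsn - self.rcv_nxt) & MASK64
        if offset >= HALF64:
            # Starts before RCV.NXT: treated as already received.  A real
            # receiver would keep any part that extends past RCV.NXT.
            return
        if offset > 0:
            self.out_of_order[dsn & MASK64] = length  # hold until gap fills
            return
        self.rcv_nxt = (self.rcv_nxt + length) & MASK64
        # Drain buffered mappings that have become in-order.
        while self.rcv_nxt in self.out_of_order:
            pending = self.out_of_order.pop(self.rcv_nxt)
            self.rcv_nxt = (self.rcv_nxt + pending) & MASK64

    def data_ack(self) -> int:
        """Value placed in the DATA_ACK field on every subflow."""
        return self.rcv_nxt


def connection_rcv_wnd(subflow_windows) -> int:
    """Connection-level RCV.WND as the maximum over subflows."""
    return max(subflow_windows)
```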

B.2.  TCP Control Blocks

   The MPTCP control block also contains a list of the TCP control
   blocks that are associated with the MPTCP connection.

   Note that the TCP control block on the TCP subflows does not contain
   the RCV.WND and SND.WND state variables as these are maintained at
   the MPTCP connection level and not at the subflow level.

   Inside each TCP control block, the following state variables are
   defined.

B.2.1.  Sending Side

   SND.UNA (32 bits):  This is the sequence number of the next byte to
      be acknowledged on the subflow.  This variable is updated upon
      reception of each TCP acknowledgment on the subflow.

   SND.NXT (32 bits):  This is the sequence number of the next byte to
      be sent on the subflow.  SND.NXT is used to set the value of
      SEG.SEQ upon transmission of the next segment.

B.2.2.  Receiving Side

   RCV.NXT (32 bits):  This is the sequence number of the next byte that
      is expected on the subflow.  This state variable is modified upon
      reception of in-order segments.  The value of RCV.NXT is copied to
      the SEG.ACK field of the next segments transmitted on the subflow.

   RCV.WND (32 bits with RFC 1323, 16 bits otherwise):  This is the
      subflow-level receive window that is updated with the window field
      from the segments received on this subflow.

Appendix C.  Finite State Machine

   The diagram in Figure 17 shows the Finite State Machine for
   connection-level closure.  This illustrates how the DATA_FIN
   connection-level signal (indicated as the DFIN flag on a DATA_ACK)
   interacts with subflow-level FINs, and permits "break-before-make"
   handover between subflows.

                              +---------+
                              | M_ESTAB |
                              +---------+
                     M_CLOSE    |     |    rcv DATA_FIN
                      -------   |     |    -------
 +---------+       snd DATA_FIN /       \ snd DATA_ACK[DFIN] +---------+
 |  M_FIN  |<-----------------           ------------------->| M_CLOSE |
 | WAIT-1  |---------------------------                      |   WAIT  |
 +---------+               rcv DATA_FIN \                    +---------+
   | rcv DATA_ACK[DFIN]         ------- |                   M_CLOSE |
   | --------------        snd DATA_ACK |                   ------- |
   | CLOSE all subflows                 |              snd DATA_FIN |
   V                                    V                           V
 +-----------+              +-----------+                  +-----------+
 |M_FINWAIT-2|              | M_CLOSING |                  | M_LAST-ACK|
 +-----------+              +-----------+                  +-----------+
   |              rcv DATA_ACK[DFIN] |           rcv DATA_ACK[DFIN] |
   | rcv DATA_FIN     -------------- |               -------------- |
   |  -------     CLOSE all subflows |           CLOSE all subflows |
   | snd DATA_ACK[DFIN]              V            delete MPTCP PCB  V
   \                          +-----------+                  +---------+
     ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
                              +-----------+                  +---------+
                                         All subflows in CLOSED
                                             ------------
                                         delete MPTCP PCB

          Figure 17: Finite State Machine for Connection Closure
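   The transitions of Figure 17 can be encoded as a lookup table.  This
   Python sketch transcribes the figure directly; the event and action
   strings are copied from the figure labels, while the enum, table,
   and function names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    M_ESTAB = auto()
    M_FIN_WAIT_1 = auto()
    M_FIN_WAIT_2 = auto()
    M_CLOSING = auto()
    M_CLOSE_WAIT = auto()
    M_LAST_ACK = auto()
    M_TIME_WAIT = auto()
    M_CLOSED = auto()


# (state, event) -> (next state, actions), transcribed from Figure 17.
TRANSITIONS = {
    (State.M_ESTAB, "M_CLOSE"): (State.M_FIN_WAIT_1, ["snd DATA_FIN"]),
    (State.M_ESTAB, "rcv DATA_FIN"): (State.M_CLOSE_WAIT, ["snd DATA_ACK[DFIN]"]),
    (State.M_FIN_WAIT_1, "rcv DATA_FIN"): (State.M_CLOSING, ["snd DATA_ACK"]),
    (State.M_FIN_WAIT_1, "rcv DATA_ACK[DFIN]"): (State.M_FIN_WAIT_2, ["CLOSE all subflows"]),
    (State.M_FIN_WAIT_2, "rcv DATA_FIN"): (State.M_TIME_WAIT, ["snd DATA_ACK[DFIN]"]),
    (State.M_CLOSING, "rcv DATA_ACK[DFIN]"): (State.M_TIME_WAIT, ["CLOSE all subflows"]),
    (State.M_CLOSE_WAIT, "M_CLOSE"): (State.M_LAST_ACK, ["snd DATA_FIN"]),
    (State.M_LAST_ACK, "rcv DATA_ACK[DFIN]"): (
        State.M_CLOSED, ["CLOSE all subflows", "delete MPTCP PCB"]),
    (State.M_TIME_WAIT, "All subflows in CLOSED"): (State.M_CLOSED, ["delete MPTCP PCB"]),
}


def step(state: State, event: str):
    """Return (next_state, actions) or raise if the figure has no edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not handled in {state.name}") from None


if __name__ == "__main__":
    state = State.M_ESTAB
    for event in ("M_CLOSE", "rcv DATA_ACK[DFIN]", "rcv DATA_FIN",
                  "All subflows in CLOSED"):
        state, actions = step(state, event)
        print(f"{event:24} -> {state.name:13} {actions}")
```

   A table keeps each edge of the figure on one line, which makes it
   easy to compare the sketch against the diagram.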

Authors' Addresses

   Alan Ford
   Cisco
   Ruscombe Business Park
   Ruscombe, Berkshire  RG10 9NN
   UK

   EMail: alanford@cisco.com


   Costin Raiciu
   University Politehnica of Bucharest
   Splaiul Independentei 313
   Bucharest
   Romania

   EMail: costin.raiciu@cs.pub.ro


   Mark Handley
   University College London
   Gower Street
   London  WC1E 6BT
   UK

   EMail: m.handley@cs.ucl.ac.uk


   Olivier Bonaventure
   Universite catholique de Louvain
   Pl. Ste Barbe, 2
   Louvain-la-Neuve  1348
   Belgium

   EMail: olivier.bonaventure@uclouvain.be