Tech-invite3GPPspaceIETFspace
96959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 5044

Marker PDU Aligned Framing for TCP Specification

Pages: 74
Proposed Standard
Errata
Updated by:  65817146
Part 2 of 3 – Pages 22 to 44
First   Prev   Next

Top   ToC   RFC5044 - Page 22   prevText

5. MPA's interactions with TCP

The following sections describe MPA's interactions with TCP. This section discusses using a standard layered TCP stack with MPA attached above a TCP socket. Discussion of using an optimized MPA- aware TCP with an MPA implementation that takes advantage of the extra optimizations is done in Appendix A. +-----------------------------------+ | +-----+ +-----------------+ | | | MPA | | Other Protocols | | | +-----+ +-----------------+ | | || || | | ----- socket API -------------- | | || | | +-----+ | | | TCP | | | +-----+ | | || | | +-----+ | | | IP | | | +-----+ | +-----------------------------------+ Figure 7: Fully Layered Implementation The Fully layered implementation is described for completeness; however, the user is cautioned that the reduced probability of FPDU alignment when transmitting with this implementation will tend to introduce a higher overhead at optimized receivers. In addition, the lack of out-of-order receive processing will significantly reduce the value of DDP/MPA by imposing higher buffering and copying overhead in the local receiver.

5.1. MPA transmitters with a standard layered TCP

MPA transmitters SHOULD calculate a MULPDU as described in Section 4.5. If the TCP implementation allows EMSS to be determined by MPA, that value should be used. If the transmit side TCP implementation is not able to report the EMSS, MPA SHOULD use the current MTU value to establish a likely FPDU size, taking into account the various expected header sizes. MPA transmitters SHOULD also use whatever facilities the TCP stack presents to cause the TCP transmitter to start TCP segments at FPDU boundaries. Multiple FPDUs MAY be packed into a single TCP segment as determined by the EMSS calculation as long as they are entirely contained in the TCP segment.
Top   ToC   RFC5044 - Page 23
   For example, passing FPDU buffers sized to the current EMSS to the
   TCP socket and using the TCP_NODELAY socket option to disable the
   Nagle [RFC896] algorithm will usually result in many of the segments
   starting with an FPDU.

   It is recognized that various effects can cause an FPDU Alignment to
   be lost.  Following are a few of the effects:

   *   ULPDUs that are smaller than the MULPDU.  If these are sent in a
       continuous stream, FPDU Alignment will be lost.  Note that
       careful use of a dynamic MULPDU can help in this case; the MULPDU
       for future FPDUs can be adjusted to re-establish alignment with
       the segments based on the current EMSS.

   *   Sending enough data that the TCP receive window limit is reached.
       TCP may send a smaller segment to exactly fill the receive
       window.

   *   Sending data when TCP is operating up against the congestion
       window.  If TCP is not tracking the congestion window in
       segments, it may transmit a smaller segment to exactly fill the
       receive window.

   *   Changes in EMSS due to varying TCP options, or changes in MTU.

   If FPDU Alignment with TCP segments is lost for any reason, the
   alignment is regained after a break in transmission where the TCP
   send buffers are emptied.  Many usage models for DDP/MPA will include
   such breaks.

   MPA receivers are REQUIRED to be able to operate correctly even if
   alignment is lost (see Section 6).

5.2. MPA receivers with a standard layered TCP

MPA receivers will get TCP data in the usual ordered stream. The receivers MUST identify FPDU boundaries by using the ULPDU_LENGTH field, as described in Section 6. Receivers MAY utilize markers to check for FPDU boundary consistency, but they are NOT required to examine the markers to determine the FPDU boundaries.
Top   ToC   RFC5044 - Page 24

6. MPA Receiver FPDU Identification

An MPA receiver MUST first verify the FPDU before passing the ULPDU to DDP. To do this, the receiver MUST: * locate the start of the FPDU unambiguously, * verify its CRC (if CRC checking is enabled). If the above conditions are true, the MPA receiver passes the ULPDU to DDP. To detect the start of the FPDU unambiguously one of the following MUST be used: 1: In an ordered TCP stream, the ULPDU Length field in the current FPDU when FPDU has a valid CRC, can be used to identify the beginning of the next FPDU. 2: For optimized MPA/TCP receivers that support out-of-order reception of FPDUs (see Section 4.3, MPA Markers) a Marker can always be used to locate the beginning of an FPDU (in FPDUs with valid CRCs). Since the location of the Marker is known in the octet stream (sequence number space), the Marker can always be found. 3: Having found an FPDU by means of a Marker, an optimized MPA/TCP receiver can find following contiguous FPDUs by using the ULPDU Length fields (from FPDUs with valid CRCs) to establish the next FPDU boundary. The ULPDU Length field (see Section 4) MUST be used to determine if the entire FPDU is present before forwarding the ULPDU to DDP. CRC calculation is discussed in Section 4.4 above.

7. Connection Semantics

7.1. Connection Setup

MPA requires that the Consumer MUST activate MPA, and any TCP enhancements for MPA, on a TCP half connection at the same location in the octet stream at both the sender and the receiver. This is required in order for the Marker scheme to correctly locate the Markers (if enabled) and to correctly locate the first FPDU. MPA, and any TCP enhancements for MPA are enabled by the ULP in both directions at once at an endpoint.
Top   ToC   RFC5044 - Page 25
   This can be accomplished several ways, and is left up to DDP's ULP:

   *   DDP's ULP MAY require DDP on MPA startup immediately after TCP
       connection setup.  This has the advantage that no streaming mode
       negotiation is needed.  An example of such a protocol is shown in
       Figure 10: Example Immediate Startup negotiation.

       This may be accomplished by using a well-known port, or a service
       locator protocol to locate an appropriate port on which DDP on
       MPA is expected to operate.

   *   DDP's ULP MAY negotiate the start of DDP on MPA sometime after a
       normal TCP startup, using TCP streaming data exchanges on the
       same connection.  The exchange establishes that DDP on MPA (as
       well as other ULPs) will be used, and exactly locates the point
       in the octet stream where MPA is to begin operation.  Note that
       such a negotiation protocol is outside the scope of this
       specification.  A simplified example of such a protocol is shown
       in Figure 9: Example Delayed Startup negotiation on page 33.

   An MPA endpoint operates in two distinct phases.

   The Startup Phase is used to verify correct MPA setup, exchange CRC
   and Marker configuration, and optionally pass Private Data between
   endpoints prior to completing a DDP connection.  During this phase,
   specifically formatted frames are exchanged as TCP byte streams
   without using CRCs or Markers.  During this phase a DDP endpoint need
   not be "bound" to the MPA connection.  In fact, the choice of DDP
   endpoint and its operating parameters may not be known until the
   Consumer supplied Private Data (if any) has been examined by the
   Consumer.

   The second distinct phase is Full Operation during which FPDUs are
   sent using all the rules that pertain (CRCs, Markers, MULPDU
   restrictions, etc.).  A DDP endpoint MUST be "bound" to the MPA
   connection at entry to this phase.

   When Private Data is passed between ULPs in the Startup Phase, the
   ULP is responsible for interpreting that data, and then placing MPA
   into Full Operation.

   Note: The following text differentiates the two endpoints by calling
       them Initiator and Responder.  This is quite arbitrary and is NOT
       related to the TCP startup (SYN, SYN/ACK sequence).  The
       Initiator is the side that sends first in the MPA startup
       sequence (the MPA Request Frame).
Top   ToC   RFC5044 - Page 26
   Note: The possibility that both endpoints would be allowed to make a
       connection at the same time, sometimes called an active/active
       connection, was considered by the work group and rejected.  There
       were several motivations for this decision.  One was that
       applications needing this facility were few (none other than
       theoretical at the time of this document).  Another was that the
       facility created some implementation difficulties, particularly
       with the "dual stack" designs described later on.  A last issue
       was that dealing with rejected connections at startup would have
       required at least an additional frame type, and more recovery
       actions, complicating the protocol.  While none of these issues
       was overwhelming, the group and implementers were not motivated
       to do the work to resolve these issues.  The protocol includes a
       method of detecting these active/active startup attempts so that
       they can be rejected and an error reported.

   The ULP is responsible for determining which side is Initiator or
   Responder.  For client/server type ULPs, this is easy.  For peer-peer
   ULPs (which might utilize a TCP style active/active startup), some
   mechanism (not defined by this specification) must be established, or
   some streaming mode data exchanged prior to MPA startup to determine
   which side starts in Initiator and which starts in Responder MPA
   mode.

7.1.1 MPA Request and Reply Frame Format

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 0 | | + Key (16 bytes containing "MPA ID Req Frame") + 4 | (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65) | + Or (16 bytes containing "MPA ID Rep Frame") + 8 | (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65) | + + 12 | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 16 |M|C|R| Res | Rev | PD_Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | ~ ~ ~ Private Data ~ | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 8: MPA Request/Reply Frame
Top   ToC   RFC5044 - Page 27
   Key: This field contains the "key" used to validate that the sender
       is an MPA sender.  Initiator mode senders MUST set this field to
       the fixed value "MPA ID Req Frame" or (in byte order) 4D 50 41 20
       49 44 20 52 65 71 20 46 72 61 6D 65 (in hexadecimal).  Responder
       mode receivers MUST check this field for the same value, and
       close the connection and report an error locally if any other
       value is detected.  Responder mode senders MUST set this field to
       the fixed value "MPA ID Rep Frame" or (in byte order) 4D 50 41 20
       49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal).  Initiator
       mode receivers MUST check this field for the same value, and
       close the connection and report an error locally if any other
       value is detected.

   M: This bit declares an endpoint's REQUIRED Marker usage.  When this
       bit is '1' in an MPA Request Frame, the Initiator declares that
       Markers are REQUIRED in FPDUs sent from the Responder.  When set
       to '1' in an MPA Reply Frame, this bit declares that Markers are
       REQUIRED in FPDUs sent from the Initiator.  When in a received
       MPA Request Frame or MPA Reply Frame and the value is '0',
       Markers MUST NOT be added to the data stream by that endpoint.
       When '1' Markers MUST be added as described in Section 4.3, MPA
       Markers.

   C: This bit declares an endpoint's preferred CRC usage.  When this
       field is '0' in the MPA Request Frame and the MPA Reply Frame,
       CRCs MUST not be checked and need not be generated by either
       endpoint.  When this bit is '1' in either the MPA Request Frame
       or MPA Reply Frame, CRCs MUST be generated and checked by both
       endpoints.  Note that even when not in use, the CRC field remains
       present in the FPDU.  When CRCs are not in use, the CRC field
       MUST be considered valid for FPDU checking regardless of its
       contents.

   R: This bit is set to zero, and not checked on reception in the MPA
       Request Frame.  In the MPA Reply Frame, this bit is the Rejected
       Connection bit, set by the Responders ULP to indicate acceptance
       '0', or rejection '1', of the connection parameters provided in
       the Private Data.

   Res: This field is reserved for future use.  It MUST be set to zero
       when sending, and not checked on reception.
Top   ToC   RFC5044 - Page 28
   Rev: This field contains the revision of MPA.  For this version of
       the specification, senders MUST set this field to one.  MPA
       receivers compliant with this version of the specification MUST
       check this field.  If the MPA receiver cannot interoperate with
       the received version, then it MUST close the connection and
       report an error locally.  Otherwise, the MPA receiver should
       report the received version to the ULP.

   PD_Length: This field MUST contain the length in octets of the
       Private Data field.  A value of zero indicates that there is no
       Private Data field present at all.  If the receiver detects that
       the PD_Length field does not match the length of the Private Data
       field, or if the length of the Private Data field exceeds 512
       octets, the receiver MUST close the connection and report an
       error locally.  Otherwise, the MPA receiver should pass the
       PD_Length value and Private Data to the ULP.

   Private Data: This field may contain any value defined by ULPs or may
       not be present.  The Private Data field MUST be between 0 and 512
       octets in length.  ULPs define how to size, set, and validate
       this field within these limits.  Private Data usage is further
       discussed in Section 7.1.4.

7.1.2. Connection Startup Rules

The following rules apply to MPA connection Startup Phase: 1. When MPA is started in the Initiator mode, the MPA implementation MUST send a valid MPA Request Frame. The MPA Request Frame MAY include ULP-supplied Private Data. 2. When MPA is started in the Responder mode, the MPA implementation MUST wait until an MPA Request Frame is received and validated before entering Full MPA/DDP Operation. If the MPA Request Frame is improperly formatted, the implementation MUST close the TCP connection and exit MPA. If the MPA Request Frame is properly formatted but the Private Data is not acceptable, the implementation SHOULD return an MPA Reply Frame with the Rejected Connection bit set to '1'; the MPA Reply Frame MAY include ULP-supplied Private Data; the implementation MUST exit MPA, leaving the TCP connection open. The ULP may close TCP or use the connection for other purposes. If the MPA Request Frame is properly formatted and the Private Data is acceptable, the implementation SHOULD return an MPA Reply Frame with the Rejected Connection bit set to '0'; the MPA Reply
Top   ToC   RFC5044 - Page 29
       Frame MAY include ULP-supplied Private Data; and the Responder
       SHOULD prepare to interpret any data received as FPDUs and pass
       any received ULPDUs to DDP.

       Note: Since the receiver's ability to deal with Markers is
           unknown until the Request and Reply Frames have been
           received, sending FPDUs before this occurs is not possible.


       Note: The requirement to wait on a Request Frame before sending a
           Reply Frame is a design choice.  It makes for a well-ordered
           sequence of events at each end, and avoids having to specify
           how to deal with situations where both ends start at the same
           time.

   3.  MPA Initiator mode implementations MUST receive and validate an
       MPA Reply Frame.

       If the MPA Reply Frame is improperly formatted, the
       implementation MUST close the TCP connection and exit MPA.

       If the MPA Reply Frame is properly formatted but is the Private
       Data is not acceptable, or if the Rejected Connection bit is set
       to '1', the implementation MUST exit MPA, leaving the TCP
       connection open.  The ULP may close TCP or use the connection for
       other purposes.

       If the MPA Reply Frame is properly formatted and the Private Data
       is acceptable, and the Reject Connection bit is set to '0', the
       implementation SHOULD enter Full MPA/DDP Operation Phase;
       interpreting any received data as FPDUs and sending DDP ULPDUs as
       FPDUs.

   4.  MPA Responder mode implementations MUST receive and validate at
       least one FPDU before sending any FPDUs or Markers.

       Note: This requirement is present to allow the Initiator time to
           get its receiver into Full Operation before an FPDU arrives,
           avoiding potential race conditions at the Initiator.  This
           was also subject to some debate in the work group before
           rough consensus was reached.  Eliminating this requirement
           would allow faster startup in some types of applications.
           However, that would also make certain implementations
           (particularly "dual stack") much harder.

   5.  If a received "Key" does not match the expected value (see
       Section 7.1.1, MPA Request and Reply Frame Format) the TCP/DDP
       connection MUST be closed, and an error returned to the ULP.
Top   ToC   RFC5044 - Page 30
   6.  The received Private Data fields may be used by Consumers at
       either end to further validate the connection and set up DDP or
       other ULP parameters.  The Initiator ULP MAY close the
       TCP/MPA/DDP connection as a result of validating the Private Data
       fields.  The Responder SHOULD return an MPA Reply Frame with the
       "Reject Connection" bit set to '1' if the validation of the
       Private Data is not acceptable to the ULP.

   7.  When the first FPDU is to be sent, then if Markers are enabled,
       the first octets sent are the special Marker 0x00000000, followed
       by the start of the FPDU (the FPDU's ULPDU Length field).  If
       Markers are not enabled, the first octets sent are the start of
       the FPDU (the FPDU's ULPDU Length field).

   8.  MPA implementations MUST use the difference between the MPA
       Request Frame and the MPA Reply Frame to check for incorrect
       "Initiator/Initiator" startups.  Implementations SHOULD put a
       timeout on waiting for the MPA Request Frame when started in
       Responder mode, to detect incorrect "Responder/Responder"
       startups.

   9.  MPA implementations MUST validate the PD_Length field.  The
       buffer that receives the Private Data field MUST be large enough
       to receive that data; the amount of Private Data MUST not exceed
       the PD_Length or the application buffer.  If any of the above
       fails, the startup frame MUST be considered improperly formatted.

   10. MPA implementations SHOULD implement a reasonable timeout while
       waiting for the entire set of startup frames; this prevents
       certain denial-of-service attacks.  ULPs SHOULD implement a
       reasonable timeout while waiting for FPDUs, ULPDUs, and
       application level messages to guard against application failures
       and certain denial-of-service attacks.

7.1.3. Example Delayed Startup Sequence

A variety of startup sequences are possible when using MPA on TCP. Following is an example of an MPA/DDP startup that occurs after TCP has been running for a while and has exchanged some amount of streaming data. This example does not use any Private Data (an example that does is shown later in Section 7.1.4.2, Example Immediate Startup Using Private Data), although it is perfectly legal to include the Private Data. Note that since the example does not use any Private Data, there are no ULP interactions shown between receiving "startup frames" and putting MPA into Full Operation.
Top   ToC   RFC5044 - Page 31
         Initiator                                 Responder

  +---------------------------+
  |ULP streaming mode         |
  |  <Hello> request to       |
  |  transition to DDP/MPA    |           +---------------------------+
  |  mode (optional).         | --------> |ULP gets request;          |
  +---------------------------+           |  enables MPA Responder    |
                                          |  mode with last (optional)|
                                          |  streaming mode           |
                                          |  <Hello Ack> for MPA to   |
                                          |  send.                    |
  +---------------------------+           |MPA waits for incoming     |
  |ULP receives streaming     | <-------- |  <MPA Request Frame>.     |
  |  <Hello Ack>;             |           +---------------------------+
  |Enters MPA Initiator mode; |
  |MPA sends                  |
  |  <MPA Request Frame>;     |
  |MPA waits for incoming     |           +---------------------------+
  |  <MPA Reply Frame>.       | - - - - > |MPA receives               |
  +---------------------------+           |  <MPA Request Frame>.     |
                                          |Consumer binds DDP to MPA; |
                                          |MPA sends the              |
                                          |  <MPA Reply Frame>.       |
                                          |DDP/MPA enables FPDU       |
  +---------------------------+           |  decoding, but does not   |
  |MPA receives the           | < - - - - |  send any FPDUs.          |
  |  <MPA Reply Frame>        |           +---------------------------+
  |Consumer binds DDP to MPA; |
  |DDP/MPA begins Full        |
  |  Operation.               |
  |MPA sends first FPDU (as   |           +---------------------------+
  |  DDP ULPDUs become        | ========> |MPA receives first FPDU.   |
  |  available).              |           |MPA sends first FPDU (as   |
  +---------------------------+           |  DDP ULPDUs become        |
                                  <====== |  available).              |
                                          +---------------------------+

              Figure 9: Example Delayed Startup Negotiation
Top   ToC   RFC5044 - Page 32
   An example Delayed Startup sequence is described below:

       *   Active and passive sides start up a TCP connection in the
           usual fashion, probably using sockets APIs.  They exchange
           some amount of streaming mode data.  At some point, one side
           (the MPA Initiator) sends streaming mode data that
           effectively says "Hello, let's go into MPA/DDP mode".

   *   When the remote side (the MPA Responder) gets this streaming mode
       message, the Consumer would send a last streaming mode message
       that effectively says "I acknowledge your Hello, and am now in
       MPA Responder mode".  The exchange of these messages establishes
       the exact point in the TCP stream where MPA is enabled.  The
       Responding Consumer enables MPA in the Responder mode and waits
       for the initial MPA startup message.

       *   The Initiating Consumer would enable MPA startup in the
           Initiator mode which then sends the MPA Request Frame.  It is
           assumed that no Private Data messages are needed for this
           example, although it is possible to do so.  The Initiating
           MPA (and Consumer) would also wait for the MPA connection to
           be accepted.

   *   The Responding MPA would receive the initial MPA Request Frame
       and would inform the Consumer that this message arrived.  The
       Consumer can then accept the MPA/DDP connection or close the TCP
       connection.

   *   To accept the connection request, the Responding Consumer would
       use an appropriate API to bind the TCP/MPA connections to a DDP
       endpoint, thus enabling MPA/DDP into Full Operation.  In the
       process of going to Full Operation, MPA sends the MPA Reply
       Frame.  MPA/DDP waits for the first incoming FPDU before sending
       any FPDUs.

   *   If the initial TCP data was not a properly formatted MPA Request
       Frame, MPA will close or reset the TCP connection immediately.

       *   The Initiating MPA would receive the MPA Reply Frame and
           would report this message to the Consumer.  The Consumer can
           then accept the MPA/DDP connection, or close or reset the TCP
           connection to abort the process.

       *   On determining that the connection is acceptable, the
           Initiating Consumer would use an appropriate API to bind the
           TCP/MPA connections to a DDP endpoint thus enabling MPA/DDP
           into Full Operation.  MPA/DDP would begin sending DDP
           messages as MPA FPDUs.
Top   ToC   RFC5044 - Page 33

7.1.4. Use of Private Data

This section is advisory in nature, in that it suggests a method by which a ULP can deal with pre-DDP connection information exchange.
7.1.4.1. Motivation
Prior RDMA protocols have been developed that provide Private Data via out-of-band mechanisms. As a result, many applications now expect some form of Private Data to be available for application use prior to setting up the DDP/RDMA connection. Following are some examples of the use of Private Data. An RDMA endpoint (referred to as a Queue Pair, or QP, in InfiniBand and the [VERBS-RDMA]) must be associated with a Protection Domain. No receive operations may be posted to the endpoint before it is associated with a Protection Domain. Indeed under both the InfiniBand and proposed RDMA/DDP verbs [VERBS-RDMA] an endpoint/QP is created within a Protection Domain. There are some applications where the choice of Protection Domain is dependent upon the identity of the remote ULP client. For example, if a user session requires multiple connections, it is highly desirable for all of those connections to use a single Protection Domain. Note: Use of Protection Domains is further discussed in [RDMASEC]. InfiniBand, the DAT APIs [DAT-API], and the IT-API [IT-API] all provide for the active-side ULP to provide Private Data when requesting a connection. This data is passed to the ULP to allow it to determine whether to accept the connection, and if so with which endpoint (and implicitly which Protection Domain). The Private Data can also be used to ensure that both ends of the connection have configured their RDMA endpoints compatibly on such matters as the RDMA Read capacity (see [RDMAP]). Further ULP- specific uses are also presumed, such as establishing the identity of the client. Private Data is also allowed for when accepting the connection, to allow completion of any negotiation on RDMA resources and for other ULP reasons. There are several potential ways to exchange this Private Data. For example, the InfiniBand specification includes a connection management protocol that allows a small amount of Private Data to be exchanged using datagrams before actually starting the RDMA connection.
Top   ToC   RFC5044 - Page 34
   This document allows for small amounts of Private Data to be
   exchanged as part of the MPA startup sequence.  The actual Private
   Data fields are carried in the MPA Request Frame and the MPA Reply
   Frame.

   If larger amounts of Private Data or more negotiation is necessary,
   TCP streaming mode messages may be exchanged prior to enabling MPA.
Top   ToC   RFC5044 - Page 35
7.1.4.2. Example Immediate Startup Using Private Data
Initiator Responder +---------------------------+ |TCP SYN sent. | +--------------------------+ +---------------------------+ --------> |TCP gets SYN packet; | +---------------------------+ | sends SYN-Ack. | |TCP gets SYN-Ack | <-------- +--------------------------+ | sends Ack. | +---------------------------+ --------> +--------------------------+ +---------------------------+ |Consumer enables MPA | |Consumer enables MPA | |Responder mode, waits for | |Initiator mode with | | <MPA Request frame>. | |Private Data; MPA sends | +--------------------------+ | <MPA Request Frame>; | |MPA waits for incoming | +--------------------------+ | <MPA Reply Frame>. | - - - - > |MPA receives | +---------------------------+ | <MPA Request Frame>. | |Consumer examines Private | |Data, provides MPA with | |return Private Data, | |binds DDP to MPA, and | |enables MPA to send an | | <MPA Reply Frame>. | |DDP/MPA enables FPDU | +---------------------------+ |decoding, but does not | |MPA receives the | < - - - - |send any FPDUs. | | <MPA Reply Frame>. | +--------------------------+ |Consumer examines Private | |Data, binds DDP to MPA, | |and enables DDP/MPA to | |begin Full Operation. | |MPA sends first FPDU (as | +--------------------------+ |DDP ULPDUs become | ========> |MPA receives first FPDU. | |available). | |MPA sends first FPDU (as | +---------------------------+ |DDP ULPDUs become | <====== |available). | +--------------------------+ Figure 10: Example Immediate Startup Negotiation Note: The exact order of when MPA is started in the TCP connection sequence is implementation dependent; the above diagram shows one possible sequence. Also, the Initiator "Ack" to the Responder's "SYN-Ack" may be combined into the same TCP segment containing the MPA Request Frame (as is allowed by TCP RFCs).
Top   ToC   RFC5044 - Page 36
   The example immediate startup sequence is described below:

   *   The passive side (Responding Consumer) would listen on the TCP
       destination port, to indicate its readiness to accept a
       connection.

       *   The active side (Initiating Consumer) would request a
           connection from a TCP endpoint (that expected to upgrade to
           MPA/DDP/RDMA and expected the Private Data) to a destination
           address and port.

       *   The Initiating Consumer would initiate a TCP connection to
           the destination port.  Acceptance/rejection of the connection
           would proceed as per normal TCP connection establishment.

   *   The passive side (Responding Consumer) would receive the TCP
       connection request as usual allowing normal TCP gatekeepers, such
       as INETD and TCPserver, to exercise their normal
       safeguard/logging functions.  On acceptance of the TCP
       connection, the Responding Consumer would enable MPA in the
       Responder mode and wait for the initial MPA startup message.

       *   The Initiating Consumer would enable MPA startup in the
           Initiator mode to send an initial MPA Request Frame with its
           included Private Data message to send.  The Initiating MPA
           (and Consumer) would also wait for the MPA connection to be
           accepted, and any returned Private Data.

   *   The Responding MPA would receive the initial MPA Request Frame
       with the Private Data message and would pass the Private Data
       through to the Consumer.  The Consumer can then accept the
       MPA/DDP connection, close the TCP connection, or reject the MPA
       connection with a return message.

   *   To accept the connection request, the Responding Consumer would
       use an appropriate API to bind the TCP/MPA connections to a DDP
       endpoint, thus enabling MPA/DDP into Full Operation.  In the
       process of going to Full Operation, MPA sends the MPA Reply
       Frame, which includes the Consumer-supplied Private Data
       containing any appropriate Consumer response.  MPA/DDP waits for
       the first incoming FPDU before sending any FPDUs.

   *   If the initial TCP data was not a properly formatted MPA Request
       Frame, MPA will close or reset the TCP connection immediately.
Top   ToC   RFC5044 - Page 37
   *   To reject the MPA connection request, the Responding Consumer
       would send an MPA Reply Frame with any ULP-supplied Private Data
       (with reason for rejection), with the "Rejected Connection" bit
       set to '1', and may close the TCP connection.

       *   The Initiating MPA would receive the MPA Reply Frame with the
           Private Data message and would report this message to the
           Consumer, including the supplied Private Data.

           If the "Rejected Connection" bit is set to a '1', MPA will
           close the TCP connection and exit.

           If the "Rejected Connection" bit is set to a '0', and on
           determining from the MPA Reply Frame Private Data that the
           connection is acceptable, the Initiating Consumer would use
           an appropriate API to bind the TCP/MPA connections to a DDP
           endpoint thus enabling MPA/DDP into Full Operation.  MPA/DDP
           would begin sending DDP messages as MPA FPDUs.

7.1.5. "Dual Stack" Implementations

MPA/DDP implementations are commonly expected to be implemented as part of a "dual stack" architecture. One stack is the traditional TCP stack, usually with a sockets interface API (Application Programming Interface). The second stack is the MPA/DDP stack with its own API, and potentially separate code or hardware to deal with the MPA/DDP data. Of course, implementations may vary, so the following comments are of an advisory nature only. The use of the two stacks offers advantages: TCP connection setup is usually done with the TCP stack. This allows use of the usual naming and addressing mechanisms. It also means that any mechanisms used to "harden" the connection setup against security threats are also used when starting MPA/DDP. Some applications may have been originally designed for TCP, but are "enhanced" to utilize MPA/DDP after a negotiation reveals the capability to do so. The negotiation process takes place in TCP's streaming mode, using the usual TCP APIs. Some new applications, designed for RDMA or DDP, still need to exchange some data prior to starting MPA/DDP. This exchange can be of arbitrary length or complexity, but often consists of only a small amount of Private Data, perhaps only a single message. Using the TCP streaming mode for this exchange allows this to be done using well-understood methods.
Top   ToC   RFC5044 - Page 38
   The main disadvantage of using two stacks is the conversion of an
   active TCP connection between them.  This process must be done with
   care to prevent loss of data.

   To avoid some of the problems when using a "dual stack" architecture,
   the following additional restrictions may be required by the
   implementation:

   1.  Enabling the DDP/MPA stack SHOULD be done only when no incoming
       stream data is expected.  This is typically managed by the ULP
       protocol.  When following the recommended startup sequence, the
       Responder side enters DDP/MPA mode, sends the last streaming mode
       data, and then waits for the MPA Request Frame.  No additional
       streaming mode data is expected.  The Initiator side ULP receives
       the last streaming mode data, and then enters DDP/MPA mode.
       Again, no additional streaming mode data is expected.

   2.  The DDP/MPA MAY provide the ability to send a "last streaming
       message" as part of its Responder DDP/MPA enable function.  This
       allows the DDP/MPA stack to more easily manage the conversion to
       DDP/MPA mode (and avoid problems with a very fast return of the
       MPA Request Frame from the Initiator side).

   Note: Regardless of the "stack" architecture used, TCP's rules MUST
       be followed.  For example, if network data is lost, re-segmented,
       or re-ordered, TCP MUST recover appropriately even when this
       occurs while switching stacks.

7.2. Normal Connection Teardown

Each half connection of MPA terminates when DDP closes the corresponding TCP half connection. A mechanism SHOULD be provided by MPA to DDP for DDP to be made aware that a graceful close of the TCP connection has been received by the TCP (e.g., FIN is received).
Top   ToC   RFC5044 - Page 39

8. Error Semantics

The following errors MUST be detected by MPA and the codes SHOULD be provided to DDP or other Consumer: Code Error 1 TCP connection closed, terminated, or lost. This includes lost by timeout, too many retries, RST received, or FIN received. 2 Received MPA CRC does not match the calculated value for the FPDU. 3 In the event that the CRC is valid, received MPA Marker (if enabled) and ULPDU Length fields do not agree on the start of an FPDU. If the FPDU start determined from previous ULPDU Length fields does not match with the MPA Marker position, MPA SHOULD deliver an error to DDP. It may not be possible to make this check as a segment arrives, but the check SHOULD be made when a gap creating an out-of-order sequence is closed and any time a Marker points to an already identified FPDU. It is OPTIONAL for a receiver to check each Marker, if multiple Markers are present in an FPDU, or if the segment is received in order. 4 Invalid MPA Request Frame or MPA Response Frame received. In this case, the TCP connection MUST be immediately closed. DDP and other ULPs should treat this similar to code 1, above. When conditions 2 or 3 above are detected, an optimized MPA/TCP implementation MAY choose to silently drop the TCP segment rather than reporting the error to DDP. In this case, the sending TCP will retry the segment, usually correcting the error, unless the problem was at the source. In that case, the source will usually exceed the number of retries and terminate the connection. Once MPA delivers an error of any type, it MUST NOT pass or deliver any additional FPDUs on that half connection. For Error codes 2 and 3, MPA MUST NOT close the TCP connection following a reported error. Closing the connection is the responsibility of DDP's ULP. Note that since MPA will not Deliver any FPDUs on a half connection following an error detected on the receive side of that connection, DDP's ULP is expected to tear down the connection. This may not occur until after one or more last messages are transmitted on the opposite half connection. This allows a diagnostic error message to be sent.
Top   ToC   RFC5044 - Page 40

9. Security Considerations

This section discusses the security considerations for MPA.

9.1. Protocol-Specific Security Considerations

The vulnerabilities of MPA to third-party attacks are no greater than any other protocol running over TCP. A third party, by sending packets into the network that are delivered to an MPA receiver, could launch a variety of attacks that take advantage of how MPA operates. For example, a third party could send random packets that are valid for TCP, but contain no FPDU headers. An MPA receiver reports an error to DDP when any packet arrives that cannot be validated as an FPDU when properly located on an FPDU boundary. A third party could also send packets that are valid for TCP, MPA, and DDP, but do not target valid buffers. These types of attacks ultimately result in loss of connection and thus become a type of DOS (Denial Of Service) attack. Communication security mechanisms such as IPsec [RFC2401, RFC4301] may be used to prevent such attacks. Independent of how MPA operates, a third party could use ICMP messages to reduce the path MTU to such a small size that performance would likewise be severely impacted. Range checking on path MTU sizes in ICMP packets may be used to prevent such attacks. [RDMAP] and [DDP] are used to control, read, and write data buffers over IP networks. Therefore, the control and the data packets of these protocols are vulnerable to the spoofing, tampering, and information disclosure attacks listed below. In addition, connection to/from an unauthorized or unauthenticated endpoint is a potential problem with most applications using RDMA, DDP, and MPA.

9.1.1. Spoofing

Spoofing attacks can be launched by the Remote Peer or by a network based attacker. A network-based spoofing attack applies to all Remote Peers. Because the MPA Stream requires a TCP Stream in the ESTABLISHED state, certain types of traditional forms of wire attacks do not apply -- an end-to-end handshake must have occurred to establish the MPA Stream. So, the only form of spoofing that applies is one when a remote node can both send and receive packets. Yet even with this limitation the Stream is still exposed to the following spoofing attacks.
Top   ToC   RFC5044 - Page 41
9.1.1.1. Impersonation
A network-based attacker can impersonate a legal MPA/DDP/RDMAP peer (by spoofing a legal IP address) and establish an MPA/DDP/RDMAP Stream with the victim. End-to-end authentication (i.e., IPsec or ULP authentication) provides protection against this attack.
9.1.1.2. Stream Hijacking
Stream hijacking happens when a network-based attacker follows the Stream establishment phase, and waits until the authentication phase (if such a phase exists) is completed successfully. He can then spoof the IP address and redirect the Stream from the victim to its own machine. For example, an attacker can wait until an iSCSI authentication is completed successfully, and hijack the iSCSI Stream. The best protection against this form of attack is end-to-end integrity protection and authentication, such as IPsec, to prevent spoofing. Another option is to provide physical security. Discussion of physical security is out of scope for this document.
9.1.1.3. Man-in-the-Middle Attack
If a network-based attacker has the ability to delete, inject, replay, or modify packets that will still be accepted by MPA (e.g., TCP sequence number is correct, FPDU is valid, etc.), then the Stream can be exposed to a man-in-the-middle attack. The attacker could potentially use the services of [DDP] and [RDMAP] to read the contents of the associated Data Buffer, to modify the contents of the associated Data Buffer, or to disable further access to the buffer. Other attacks on the connection setup sequence and even on TCP can be used to cause denial of service. The only countermeasure for this form of attack is to either secure the MPA/DDP/RDMAP Stream (i.e., integrity protect) or attempt to provide physical security to prevent man-in-the-middle type attacks. The best protection against this form of attack is end-to-end integrity protection and authentication, such as IPsec, to prevent spoofing or tampering. If Stream or session level authentication and integrity protection are not used, then a man-in-the-middle attack can occur, enabling spoofing and tampering. Another approach is to restrict access to only the local subnet/link and provide some mechanism to limit access, such as physical security or 802.1.x. This model is an extremely limited deployment scenario and will not be further examined here.
Top   ToC   RFC5044 - Page 42

9.1.2. Eavesdropping

Generally speaking, Stream confidentiality protects against eavesdropping. Stream and/or session authentication and integrity protection are a counter measurement against various spoofing and tampering attacks. The effectiveness of authentication and integrity against a specific attack depend on whether the authentication is machine-level authentication (as the one provided by IPsec) or ULP authentication.

9.2. Introduction to Security Options

The following security services can be applied to an MPA/DDP/RDMAP Stream: 1. Session confidentiality - protects against eavesdropping. 2. Per-packet data source authentication - protects against the following spoofing attacks: network-based impersonation, Stream hijacking, and man in the middle. 3. Per-packet integrity - protects against tampering done by network-based modification of FPDUs (indirectly affecting buffer content through DDP services). 4. Packet sequencing - protects against replay attacks, which is a special case of the above tampering attack. If an MPA/DDP/RDMAP Stream may be subject to impersonation attacks, or Stream hijacking attacks, it is recommended that the Stream be authenticated, integrity protected, and protected from replay attacks. It may use confidentiality protection to protect from eavesdropping (in case the MPA/DDP/RDMAP Stream traverses a public network). IPsec is capable of providing the above security services for IP and TCP traffic. ULP protocols may be able to provide part of the above security services. See [NFSv4CHAN] for additional information on a promising approach called "channel binding". From [NFSv4CHAN]: "The concept of channel bindings allows applications to prove that the end-points of two secure channels at different network layers are the same by binding authentication at one channel to the session protection at the other channel. The use of channel
Top   ToC   RFC5044 - Page 43
       bindings allows applications to delegate session protection to
       lower layers, which may significantly improve performance for
       some applications."

9.3. Using IPsec with MPA

IPsec can be used to protect against the packet injection attacks outlined above. Because IPsec is designed to secure individual IP packets, MPA can run above IPsec without change. IPsec packets are processed (e.g., integrity checked and decrypted) in the order they are received, and an MPA receiver will process the decrypted FPDUs contained in these packets in the same manner as FPDUs contained in unsecured IP packets. MPA implementations MUST implement IPsec as described in Section 9.4 below. The use of IPsec is up to ULPs and administrators.

9.4. Requirements for IPsec Encapsulation of MPA/DDP

The IP Storage working group has spent significant time and effort to define the normative IPsec requirements for IP storage [RFC3723]. Portions of that specification are applicable to a wide variety of protocols, including the RDDP protocol suite. In order not to replicate this effort, an MPA on TCP implementation MUST follow the requirements defined in RFC 3723, Sections 2.3 and 5, including the associated normative references for those sections. Additionally, since IPsec acceleration hardware may only be able to handle a limited number of active Internet Key Exchange Protocol (IKE) Phase 2 security associations (SAs), Phase 2 delete messages MAY be sent for idle SAs, as a means of keeping the number of active Phase 2 SAs to a minimum. The receipt of an IKE Phase 2 delete message MUST NOT be interpreted as a reason for tearing down a DDP/RDMA Stream. Rather, it is preferable to leave the Stream up, and if additional traffic is sent on it, to bring up another IKE Phase 2 SA to protect it. This avoids the potential for continually bringing Streams up and down. The IPsec requirements for RDDP are based on the version of IPsec specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC 3723 [RFC3723], despite the existence of a newer version of IPsec specified in RFC 4301 [RFC4301] and related RFCs. One of the important early applications of the RDDP protocols is their use with iSCSI [iSER]; RDDP's IPsec requirements follow those of IPsec in order to facilitate that usage by allowing a common profile of IPsec to be used with iSCSI and the RDDP protocols. In the future, RFC
Top   ToC   RFC5044 - Page 44
   3723 may be updated to the newer version of IPsec; the IPsec security
   requirements of any such update should apply uniformly to iSCSI and
   the RDDP protocols.

   Note that there are serious security issues if IPsec is not
   implemented end-to-end.  For example, if IPsec is implemented as a
   tunnel in the middle of the network, any hosts between the peer and
   the IPsec tunneling device can freely attack the unprotected Stream.

10. IANA Considerations

No IANA actions are required by this document. If a well-known port is chosen as the mechanism to identify a DDP on MPA on TCP, the well-known port must be registered with IANA. Because the use of the port is DDP specific, registration of the port with IANA is left to DDP.


(next page on part 3)

Next Section