RFC 5044

Marker PDU Aligned Framing for TCP Specification

Pages: 74
Proposed Standard
Updated by: 6581 7146

RFC5044 - Page 1

Network Working Group                                          P. Culley
Request for Comments: 5044                       Hewlett-Packard Company
Category: Standards Track                                       U. Elzur
                                                    Broadcom Corporation
                                                                R. Recio
                                                         IBM Corporation
                                                               S. Bailey
                                                   Sandburst Corporation
                                                              J. Carrier
                                                               Cray Inc.
                                                            October 2007


            Marker PDU Aligned Framing for TCP Specification

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   Marker PDU Aligned Framing (MPA) is designed to work as an
   "adaptation layer" between TCP and the Direct Data Placement protocol
   (DDP) as described in RFC 5041.  It preserves the reliable, in-order
   delivery of TCP, while adding the preservation of higher-level
   protocol record boundaries that DDP requires.  MPA is fully compliant
   with applicable TCP RFCs and can be utilized with existing TCP
   implementations.  MPA also supports integrated implementations that
   combine TCP, MPA and DDP to reduce buffering requirements in the
   implementation and improve performance at the system level.

Table of Contents

   1. Introduction ....................................................4
      1.1. Motivation .................................................4
      1.2. Protocol Overview ..........................................5
   2. Glossary ........................................................8
   3. MPA's Interactions with DDP ....................................11
   4. MPA Full Operation Phase .......................................13
      4.1. FPDU Format ...............................................13
      4.2. Marker Format .............................................14
      4.3. MPA Markers ...............................................14
      4.4. CRC Calculation ...........................................16
      4.5. FPDU Size Considerations ..................................21
   5. MPA's interactions with TCP ....................................22
      5.1. MPA transmitters with a standard layered TCP ..............22
      5.2. MPA receivers with a standard layered TCP .................23
   6. MPA Receiver FPDU Identification ...............................24
   7. Connection Semantics ...........................................24
      7.1. Connection Setup ..........................................24
           7.1.1. MPA Request and Reply Frame Format .................26
           7.1.2. Connection Startup Rules ...........................28
           7.1.3. Example Delayed Startup Sequence ...................30
           7.1.4. Use of Private Data ................................33
                  7.1.4.1. Motivation ................................33
                  7.1.4.2. Example Immediate Startup Using
                           Private Data ..............................35
           7.1.5. "Dual Stack" Implementations .......................37
      7.2. Normal Connection Teardown ................................38
   8. Error Semantics ................................................39
   9. Security Considerations ........................................40
      9.1. Protocol-Specific Security Considerations .................40
           9.1.1. Spoofing ...........................................40
                  9.1.1.1. Impersonation .............................41
                  9.1.1.2. Stream Hijacking ..........................41
                  9.1.1.3. Man-in-the-Middle Attack ..................41
           9.1.2. Eavesdropping ......................................42
      9.2. Introduction to Security Options ..........................42
      9.3. Using IPsec with MPA ......................................43
      9.4. Requirements for IPsec Encapsulation of MPA/DDP ...........43
   10. IANA Considerations ...........................................44
   Appendix A. Optimized MPA-Aware TCP Implementations ...............45
      A.1. Optimized MPA/TCP Transmitters ............................46
      A.2. Effects of Optimized MPA/TCP Segmentation .................46
      A.3. Optimized MPA/TCP Receivers ...............................48
      A.4. Re-segmenting Middleboxes and Non-Optimized MPA/TCP
           Senders ...................................................49
      A.5. Receiver Implementation ...................................50
           A.5.1. Network Layer Reassembly Buffers ...................51
           A.5.2. TCP Reassembly Buffers .............................52
   Appendix B. Analysis of MPA over TCP Operations ...................52
      B.1. Assumptions ...............................................53
           B.1.1. MPA Is Layered beneath DDP .........................53
           B.1.2. MPA Preserves DDP Message Framing ..................53
           B.1.3. The Size of the ULPDU Passed to MPA Is Less Than
                  EMSS Under Normal Conditions .......................53
           B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery.54
     B.2.  The Value of FPDU Alignment ...............................54
           B.2.1. Impact of Lack of FPDU Alignment on the Receiver
                  Computational Load and Complexity ..................56
           B.2.2. FPDU Alignment Effects on TCP Wire Protocol ........60
   Appendix C. IETF Implementation Interoperability with RDMA
               Consortium Protocols ..................................62
     C.1. Negotiated Parameters ......................................63
     C.2. RDMAC RNIC and Non-Permissive IETF RNIC ....................64
          C.2.1. RDMAC RNIC Initiator ................................65
          C.2.2. Non-Permissive IETF RNIC Initiator ..................65
          C.2.3. RDMAC RNIC and Permissive IETF RNIC .................65
          C.2.4. RDMAC RNIC Initiator ................................66
          C.2.5. Permissive IETF RNIC Initiator ......................67
     C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC ..........67
   Normative References ..............................................68
   Informative References ............................................68
   Contributors ......................................................70

Table of Figures

   Figure 1: ULP MPA TCP Layering .....................................5
   Figure 2: FPDU Format .............................................13
   Figure 3: Marker Format ...........................................14
   Figure 4: Example FPDU Format with Marker .........................16
   Figure 5: Annotated Hex Dump of an FPDU ...........................19
   Figure 6: Annotated Hex Dump of an FPDU with Marker ...............20
   Figure 7: Fully Layered Implementation ............................22
   Figure 8: MPA Request/Reply Frame .................................26
   Figure 9: Example Delayed Startup Negotiation .....................31
   Figure 10: Example Immediate Startup Negotiation ..................35
   Figure 11: Optimized MPA/TCP Implementation .......................45
   Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream .....56
   Figure 13: Aligned FPDU Placed Immediately after TCP Header .......58
   Figure 14: Connection Parameters for the RNIC Types ...............63
   Figure 15: MPA Negotiation between an RDMAC RNIC and a
              Non-Permissive IETF RNIC ...............................65
   Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive
              IETF RNIC ..............................................66
   Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC and
              a Permissive IETF RNIC .................................67

1.  Introduction

   This section discusses the reasons for creating MPA on TCP and gives
   a general overview of the protocol.

1.1.  Motivation

   The Direct Data Placement protocol [DDP], when used with TCP
   [RFC793], requires a mechanism to detect record boundaries.  The DDP
   records are referred to as Upper Layer Protocol Data Units by this
   document.  The ability to locate the Upper Layer Protocol Data Unit
   (ULPDU) boundary is useful to a hardware network adapter that uses
   DDP to directly place the data in the application buffer based on the
   control information carried in the ULPDU header.  This may be done
   without requiring that the packets arrive in order.  Potential
   benefits of this capability are the avoidance of the memory copy
   overhead and a smaller memory requirement for handling out-of-order
   or dropped packets.

   Many approaches have been proposed for a generalized framing
   mechanism.  Some are probabilistic in nature and others are
   deterministic.  An example probabilistic approach is characterized by
   a detectable value embedded in the octet stream, with no method of
   preventing that value from appearing elsewhere within user data.  It
   is probabilistic because under some conditions the receiver may
   incorrectly interpret application data as the detectable value.
   Under these conditions, the protocol may fail with unacceptable
   frequency.  One deterministic approach is characterized by embedded
   controls at known locations in the octet stream.  Because the
   receiver can guarantee it will only examine the data stream at
   locations that are known to contain the embedded control, the
   protocol can never misinterpret application data as being embedded
   control data.  For unambiguous handling of an out-of-order packet, a
   deterministic approach is preferred.

   The MPA protocol provides a framing mechanism for DDP running over
   TCP using the deterministic approach.  It allows the location of the
   ULPDU to be determined in the TCP stream even if the TCP segments
   arrive out of order.

1.2.  Protocol Overview

   The layering of PDUs with MPA is shown in Figure 1, below.

               +------------------+
               |     ULP client   |
               +------------------+  <- Consumer messages
               |        DDP       |
               +------------------+  <- ULPDUs
               |        MPA*      |
               +------------------+  <- FPDUs (containing ULPDUs)
               |        TCP*      |
               +------------------+  <- TCP Segments (containing FPDUs)
               |      IP etc.     |
               +------------------+
                * These may be fully layered or optimized together.

                       Figure 1: ULP MPA TCP Layering

   MPA is described as an extra layer above TCP and below DDP.  The
   operation sequence is:

   1.  A TCP connection is established by ULP action.  This is done
       using methods not described by this specification.  The ULP may
       exchange some amount of data in streaming mode prior to starting
       MPA, but is not required to do so.

   2.  The Consumer negotiates the use of DDP and MPA at both ends of a
       connection.  The mechanisms to do this are not described in this
       specification.  The negotiation may be done in streaming mode, or
       by some other mechanism (such as a pre-arranged port number).

   3.  The ULP activates MPA on each end in the Startup Phase, either as
       an Initiator or a Responder, as determined by the ULP.  This mode
       verifies the usage of MPA, specifies the use of CRC and Markers,
       and allows the ULP to communicate some additional data via a
       Private Data exchange.  See Section 7.1, Connection Setup, for
       more details on the startup process.

   4.  At the end of the Startup Phase, the ULP puts MPA (and DDP) into
       Full Operation and begins sending DDP data as further described
       below.  In this document, DDP data chunks are called ULPDUs.  For
       a description of the DDP data, see [DDP].

   Following is a description of data transfer when MPA is in Full
   Operation.

   1.  DDP determines the Maximum ULPDU (MULPDU) size by querying MPA
       for this value.  MPA derives this information from TCP or IP,
       when it is available, or chooses a reasonable value.

   2.  DDP creates ULPDUs of MULPDU size or smaller, and hands them to
       MPA at the sender.

   3.  MPA creates a Framed Protocol Data Unit (FPDU) by prepending a
       header, optionally inserting Markers, and appending a CRC field
       after the ULPDU and PAD (if any).  MPA delivers the FPDU to TCP.

   4.  The TCP sender puts the FPDUs into the TCP stream.  If the sender
       is optimized MPA/TCP, it segments the TCP stream in such a way
       that a TCP Segment boundary is also the boundary of an FPDU.  TCP
       then passes each segment to the IP layer for transmission.

   5.  The receiver may or may not be optimized.  If it is optimized
       MPA/TCP, it may separate passing the TCP payload to MPA from
       passing the TCP payload ordering information to MPA.  In either
       case, RFC-compliant TCP wire behavior is observed at both the
       sender and receiver.

   6.  The MPA receiver locates and assembles complete FPDUs within the
       stream, verifies their integrity, and removes MPA Markers (when
       present), ULPDU_Length, PAD, and the CRC field.

   7.  MPA then provides the complete ULPDUs to DDP.  MPA may also
       separate passing MPA payload to DDP from passing the MPA payload
       ordering information.

   A fully layered MPA on TCP is implemented as a data stream ULP for
   TCP and is therefore RFC compliant.

   An optimized DDP/MPA/TCP uses a TCP layer that potentially contains
   some additional behaviors as suggested in this document.  When
   DDP/MPA/TCP are cross-layer optimized, the behavior of TCP
   (especially sender segmentation) may change from that of the un-
   optimized implementation, but the changes are within the bounds
   permitted by the TCP RFC specifications, and will interoperate with
   an un-optimized TCP.  The additional behaviors are described in
   Appendix A and are not normative; they are described at a TCP
   interface layer as a convenience.  Implementations may achieve the
   described functionality using any method, including cross-layer
   optimizations between TCP, MPA, and DDP.

   An optimized DDP/MPA/TCP sender is able to segment the data stream
   such that TCP segments begin with FPDUs (FPDU Alignment).  This has
   significant advantages for receivers.  When segments arrive with
   aligned FPDUs, the receiver usually need not buffer any portion of
   the segment, allowing DDP to place it in its destination memory
   immediately, thus avoiding copies from intermediate buffers (DDP's
   reason for existence).

   An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation
   to locate the start of ULPDUs that may be received out of order.  It
   also allows the implementation to determine if the entire ULPDU has
   been received.  As a result, MPA can pass out-of-order ULPDUs to DDP
   for immediate use.  This enables a DDP on MPA implementation to save
   a significant amount of intermediate storage by placing the ULPDUs in
   the right locations in the application buffers when they arrive,
   rather than waiting until full ordering can be restored.

   The ability of a receiver to recover out-of-order ULPDUs is optional
   and declared to the transmitter during startup.  When the receiver
   declares that it does not support out-of-order recovery, the
   transmitter does not add the control information to the data stream
   needed for out-of-order recovery.

   If the receiver is fully layered, then MPA receives a strictly
   ordered stream of data and does not deal with out-of-order ULPDUs.
   In this case, MPA passes each ULPDU to DDP when the last bytes arrive
   from TCP, along with the indication that they are in order.

   MPA implementations that support recovery of out-of-order ULPDUs MUST
   support a mechanism to indicate the ordering of ULPDUs as the sender
   transmitted them and indicate when missing intermediate segments
   arrive.  These mechanisms allow DDP to reestablish record ordering
   and report Delivery of complete messages (groups of records).

   MPA also addresses enhanced data integrity.  Some users of TCP have
   noted that the TCP checksum is not as strong as could be desired (see
   [CRCTCP]).  Studies such as [CRCTCP] have shown that the TCP checksum
   indicates segments in error at a much higher rate than the underlying
   link characteristics would indicate.  With these higher error rates,
   the chance that an error will escape detection, when using only the
   TCP checksum for data integrity, becomes a concern.  A stronger
   integrity check can reduce the chance of data errors being missed.

   MPA includes a CRC check to increase the ULPDU data integrity to the
   level provided by other modern protocols, such as SCTP [RFC4960].  It
   is possible to disable this CRC check; however, CRCs MUST be enabled
   unless it is clear that the end-to-end connection through the network
   has data integrity at least as good as an MPA with CRC enabled (for
   example, when IPsec is implemented end to end).  DDP's ULP expects
   this level of data integrity and therefore the ULP does not have to
   provide its own duplicate data integrity and error recovery for lost
   data.

2.  Glossary

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Consumer - the ULPs or applications that lie above MPA and DDP.  The
       Consumer is responsible for making TCP connections, starting MPA
       and DDP connections, and generally controlling operations.

   CRC - Cyclic Redundancy Check.

   Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
       the process of informing DDP that a particular PDU is ordered for
       use.  A PDU is Delivered in the exact order that it was sent by
       the original sender; MPA uses TCP's byte stream ordering to
       determine when Delivery is possible.  This is specifically
       different from "passing the PDU to DDP", which may generally
       occur in any order, while the order of Delivery is strictly
       defined.

   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the
       TCP maximum segment size (MSS) as defined in RFC 793 [RFC793],
       and the current path Maximum Transmission Unit (MTU) [RFC1191].

   FPDU - Framed Protocol Data Unit.  The unit of data created by an MPA
       sender.

   FPDU Alignment - The property that an FPDU is Header Aligned with the
       TCP segment, and the TCP segment includes an integer number of
       FPDUs.  A TCP segment with an FPDU Alignment allows immediate
       processing of the contained FPDUs without waiting on other TCP
       segments to arrive or combining with prior segments.

   FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate
       the beginning of an FPDU.

   Full Operation (Full Operation Phase) - After the completion of the
       Startup Phase, MPA begins exchanging FPDUs.

   Header Alignment - The property that a TCP segment begins with an
       FPDU.  The FPDU is Header Aligned when the FPDU header is exactly
       at the start of the TCP segment (right behind the TCP headers on
       the wire).

   Initiator - The endpoint of a connection that sends the MPA Request
       Frame, i.e., the first to actually send data (which may not be
       the one that sends the TCP SYN).

   Marker - A four-octet field that is placed in the MPA data stream at
       fixed octet intervals (every 512 octets).

   MPA-aware TCP - A TCP implementation that is aware of the receiver
       efficiencies of MPA FPDU Alignment and is capable of sending TCP
       segments that begin with an FPDU.

   MPA-enabled - MPA is enabled if the MPA protocol is visible on the
       wire.  When the sender is MPA-enabled, it is inserting framing
       and Markers.  When the receiver is MPA-enabled, it is
       interpreting framing and Markers.

   MPA Request Frame - Data sent from the MPA Initiator to the MPA
       Responder during the Startup Phase.

   MPA Reply Frame - Data sent from the MPA Responder to the MPA
       Initiator during the Startup Phase.

   MPA - Marker-based ULP PDU Aligned Framing for TCP protocol.  This
       document defines the MPA protocol.

   MULPDU - Maximum ULPDU.  The current maximum size of the record that
       is acceptable for DDP to pass to MPA for transmission.

   Node - A computing device attached to one or more links of a network.
       A Node in this context does not refer to a specific application
       or protocol instantiation running on the computer.  A Node may
       consist of one or more MPA on TCP devices installed in a host
       computer.

   PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact
       modulo 4 size.

   PDU - Protocol data unit

   Private Data - A block of data exchanged between MPA endpoints during
       initial connection setup.

   Protection Domain - An RDMA concept (see [VERBS-RDMA] and [RDMASEC])
       that ties use of various endpoint resources (memory access, etc.)
       to the specific RDMA/DDP/MPA connection.

   RDDP - A suite of protocols including MPA, [DDP], [RDMAP], an overall
       security document [RDMASEC], a problem statement [RFC4297], an
       architecture document [RFC4296], and an applicability document
       [APPL].

   RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA
       to enable applications to transfer data directly from memory
       buffers.  See [RDMAP].

   Remote Peer - The MPA protocol implementation on the opposite end of
       the connection.  Used to refer to the remote entity when
       describing protocol exchanges or other interactions between two
       Nodes.

   Responder - The connection endpoint that responds to an incoming MPA
       connection request (the MPA Request Frame).  This may not be the
       endpoint that awaited the TCP SYN.

   Startup Phase - The initial exchanges of an MPA connection that
       serve to more fully identify MPA endpoints to each other and
       pass connection specific setup information to each other.

   ULP - Upper Layer Protocol.  The protocol layer above the protocol
       layer currently being referenced.  The ULP for MPA is DDP [DDP].

   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
       the layer above MPA (DDP).  ULPDU corresponds to DDP's DDP
       segment.

   ULPDU_Length - A field in the FPDU describing the length of the
       included ULPDU.

3.  MPA's Interactions with DDP

   DDP requires MPA to maintain DDP record boundaries from the sender to
   the receiver.  When using MPA on TCP to send data, DDP provides
   records (ULPDUs) to MPA.  MPA will use the reliable transmission
   abilities of TCP to transmit the data, and will insert appropriate
   additional information into the TCP stream to allow the MPA receiver
   to locate the record boundary information.

   As such, MPA accepts complete records (ULPDUs) from DDP at the sender
   and returns them to DDP at the receiver.

   MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
   contained in one FPDU.

   MPA over a standard TCP stack can usually provide FPDU Alignment with
   the TCP Header if the FPDU is equal to TCP's EMSS.  An optimized
   MPA/TCP stack can also maintain alignment as long as the FPDU is less
   than or equal to TCP's EMSS.  Since FPDU Alignment is generally
   desired by the receiver, DDP cooperates with MPA to ensure FPDUs'
   lengths do not exceed the EMSS under normal conditions.  This is done
   with the MULPDU mechanism.

   MPA MUST provide information to DDP on the current maximum size of
   the record that is acceptable to send (MULPDU).  DDP SHOULD limit
   each record size to MULPDU.  The range of MULPDU values MUST be
   between 128 octets and 64768 octets, inclusive.

   The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
   MPA.  DDP MAY post a ULPDU of any size between one and 64768 octets;
   however, MPA is not REQUIRED to support a ULPDU Length that is
   greater than the current MULPDU.
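
   As an illustration only, the size rules above can be expressed as a
   small check.  The function name and the three result strings are
   hypothetical and are not part of this specification.

```python
MULPDU_MIN = 128      # smallest permitted MULPDU value, in octets
MULPDU_MAX = 64768    # largest permitted MULPDU value and ULPDU size


def classify_ulpdu(ulpdu_len: int, mulpdu: int) -> str:
    """Classify a ULPDU size against the limits stated in the text."""
    if not MULPDU_MIN <= mulpdu <= MULPDU_MAX:
        raise ValueError("MULPDU must be between 128 and 64768 octets")
    if ulpdu_len < 1:
        # The text only discusses sizes of one octet or more.
        raise ValueError("ULPDU length must be at least one octet")
    if ulpdu_len > MULPDU_MAX:
        return "must-not-post"        # DDP MUST NOT post this size
    if ulpdu_len > mulpdu:
        return "may-be-unsupported"   # MPA is not REQUIRED to support it
    return "ok"


print(classify_ulpdu(1400, 1460))
```

   A DDP implementation that follows the SHOULD above would only ever
   produce sizes that land in the "ok" case.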

   While the maximum theoretical length supported by the MPA header
   ULPDU_Length field is 65535, TCP over IP requires the IP datagram
   maximum length to be 65535 octets.  To enable MPA to support FPDU
   Alignment, the maximum size of the FPDU must fit within an IP
   datagram.  Thus, the ULPDU limit of 64768 octets was derived by
   taking the maximum IP datagram length, subtracting from it the
   maximum total length of the sum of the IPv4 header, TCP header, IPv4
   options, TCP options, and the worst-case MPA overhead, and then
   rounding the result down to a 128-octet boundary.
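
   The derivation above can be checked with a short calculation.  The
   sketch below is one plausible accounting and is not taken from the
   specification text: it assumes a 60-octet IPv4 header and a 60-octet
   TCP header (each with maximum options), and it counts the worst-case
   MPA overhead as the 2-octet header, the 4-octet CRC, 3 pad octets,
   and one 4-octet Marker per 512 octets of TCP payload.  All names are
   illustrative.

```python
import math

MAX_IP_DATAGRAM = 65535   # maximum IP datagram length, in octets
MAX_IPV4_HEADER = 60      # assumption: IPv4 header with maximum options
MAX_TCP_HEADER = 60       # assumption: TCP header with maximum options
MARKER_INTERVAL = 512     # Marker spacing in the TCP stream
MARKER_SIZE = 4           # each Marker is four octets


def max_ulpdu_limit() -> int:
    """Derive the ULPDU limit under the assumptions listed above."""
    tcp_payload = MAX_IP_DATAGRAM - MAX_IPV4_HEADER - MAX_TCP_HEADER
    markers = math.ceil(tcp_payload / MARKER_INTERVAL)
    # Header (2) + CRC (4) + worst-case pad (3) + all Markers.
    mpa_overhead = 2 + 4 + 3 + MARKER_SIZE * markers
    # Round down to a 128-octet boundary.
    return (tcp_payload - mpa_overhead) // 128 * 128


print(max_ulpdu_limit())  # prints 64768
```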

   Note that MULPDU will be significantly smaller than the theoretical
   maximum in most implementations for most circumstances, due to link
   MTUs, use of extra headers such as required for IPsec, etc.

   On receive, MPA MUST pass each ULPDU with its length to DDP when it
   has been validated.

   If an MPA implementation supports passing out-of-order ULPDUs to DDP,
   the MPA implementation SHOULD:

   *   Pass each ULPDU with its length to DDP as soon as it has been
       fully received and validated.

   *   Provide a mechanism to indicate the ordering of ULPDUs as the
       sender transmitted them.  One possible mechanism might be
       providing the TCP sequence number for each ULPDU.

   *   Provide a mechanism to indicate when a given ULPDU (and prior
       ULPDUs) are complete (Delivered to DDP).  One possible mechanism
       might be to allow DDP to see the current outgoing TCP ACK
       sequence number.

   *   Provide an indication to DDP that the TCP has closed or has begun
       to close the connection (e.g., received a FIN).

   MPA MUST provide the protocol version negotiated with its peer to
   DDP.  DDP will use this version to set the version in its header and
   to report the version to [RDMAP].

4.  MPA Full Operation Phase

   The following sections describe the main semantics of the Full
   Operation Phase of MPA.

4.1.  FPDU Format

   MPA senders create FPDUs out of ULPDUs.  The format of an FPDU shown
   below MUST be used for all MPA FPDUs.  For purposes of clarity,
   Markers are not shown in Figure 2.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          ULPDU_Length         |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
      |                                                               |
      ~                                                               ~
      ~                            ULPDU                              ~
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               |          PAD (0-3 octets)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             CRC                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Figure 2: FPDU Format

   ULPDU_Length: 16 bits (unsigned integer).  This is the number of
   octets of the contained ULPDU.  It does not include the length of the
   FPDU header itself, the pad, the CRC, or of any Markers that fall
   within the ULPDU.  The 16-bit ULPDU Length field is large enough to
   support the largest IP datagrams for IPv4 or IPv6.

   PAD: The PAD field trails the ULPDU and contains between 0 and 3
   octets of data.  The pad data MUST be set to zero by the sender and
   ignored by the receiver (except for CRC checking).  The length of the
   pad is set so as to make the size of the FPDU an integral multiple of
   four.

   CRC: 32 bits.  When CRCs are enabled, this field contains a CRC32c
   check value, which is used to verify the entire contents of the FPDU,
   using CRC32c.  See Section 4.4, CRC Calculation.  When CRCs are not
   enabled, this field is still present, may contain any value, and MUST
   NOT be checked.

   The FPDU adds a minimum of 6 octets to the length of the ULPDU.  In
   addition, the total length of the FPDU will include the length of any
   Markers and from 0 to 3 pad octets added to round up the ULPDU size.
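
   A minimal sketch of the FPDU layout without Markers follows.  It
   assumes the ULPDU_Length field is carried in network byte order, as
   is conventional for RFC field diagrams.  The function names are
   hypothetical.  The CRC field is left as a caller-supplied 4-octet
   placeholder, because the CRC32c computation is specified separately
   in Section 4.4.

```python
import struct

HEADER_LEN = 2   # ULPDU_Length field, in octets
CRC_LEN = 4      # CRC field, always present


def pad_length(ulpdu_len: int) -> int:
    """Number of zero octets (0-3) needed to make the FPDU a multiple of 4."""
    return -(HEADER_LEN + ulpdu_len) % 4


def build_fpdu(ulpdu: bytes, crc: bytes = b"\x00" * CRC_LEN) -> bytes:
    """Frame one ULPDU as header + ULPDU + PAD + CRC (no Markers)."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU does not fit the 16-bit ULPDU_Length field")
    if len(crc) != CRC_LEN:
        raise ValueError("CRC field must be exactly 4 octets")
    header = struct.pack(">H", len(ulpdu))
    pad = b"\x00" * pad_length(len(ulpdu))
    return header + ulpdu + pad + crc


def parse_fpdu(fpdu: bytes) -> bytes:
    """Return the ULPDU from a complete, Marker-free FPDU."""
    (ulpdu_len,) = struct.unpack_from(">H", fpdu, 0)
    expected = HEADER_LEN + ulpdu_len + pad_length(ulpdu_len) + CRC_LEN
    if len(fpdu) != expected:
        raise ValueError("FPDU length does not match ULPDU_Length")
    return fpdu[HEADER_LEN:HEADER_LEN + ulpdu_len]


fpdu = build_fpdu(b"abc")
print(len(fpdu), parse_fpdu(fpdu))
```

   With a 3-octet ULPDU the header and ULPDU total 5 octets, so 3 pad
   octets are added before the 4-octet CRC, giving a 12-octet FPDU.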

4.2.  Marker Format

   The format of a Marker MUST be as specified in Figure 3:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           RESERVED            |            FPDUPTR            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 3: Marker Format

   RESERVED: The Reserved field MUST be set to zero on transmit and
   ignored on receive (except for CRC calculation).

   FPDUPTR: The FPDU Pointer is a relative pointer, 16 bits long,
   interpreted as an unsigned integer that indicates the number of
   octets in the TCP stream from the beginning of the ULPDU Length field
   to the first octet of the entire Marker.  The least significant two
   bits MUST always be set to zero at the transmitter, and the receivers
   MUST always treat these as zero for calculations.
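
   The Marker layout can be sketched as follows, again assuming network
   byte order.  The function names are hypothetical.  The sender side
   rejects values whose low two bits are set, and the receiver side
   masks those bits and ignores the RESERVED field.

```python
import struct


def pack_marker(fpduptr: int) -> bytes:
    """Encode a Marker: 16 bits RESERVED (zero) followed by 16 bits FPDUPTR."""
    if not 0 <= fpduptr <= 0xFFFF:
        raise ValueError("FPDUPTR must fit in 16 bits")
    if fpduptr & 0x3:
        raise ValueError("low two bits of FPDUPTR must be zero on transmit")
    return struct.pack(">HH", 0, fpduptr)


def unpack_fpduptr(marker: bytes) -> int:
    """Decode FPDUPTR, ignoring RESERVED and treating the low two bits as zero."""
    if len(marker) != 4:
        raise ValueError("a Marker is exactly four octets")
    _reserved, fpduptr = struct.unpack(">HH", marker)
    return fpduptr & 0xFFFC


print(pack_marker(508).hex())  # prints 000001fc
```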

4.3.  MPA Markers

   MPA Markers are used to identify the start of FPDUs when packets are
   received out of order.  This is done by locating the Markers at fixed
   intervals in the data stream (which is correlated to the TCP sequence
   number) and using the Marker value to locate the preceding FPDU
   start.

   All MPA Markers are included in the containing FPDU CRC calculation
   (when both CRCs and Markers are in use).

   The MPA receiver's ability to locate out-of-order FPDUs and pass the
   ULPDUs to DDP is implementation dependent.  MPA/DDP allows those
   receivers that are able to deal with out-of-order FPDUs in this way
   to require the insertion of Markers in the data stream.  When the
   receiver cannot deal with out-of-order FPDUs in this way, it may
   disable the insertion of Markers at the sender.  All MPA senders MUST
   be able to generate Markers when their use is declared by the
   opposing receiver (see Section 7.1, Connection Setup).

   When Markers are enabled, MPA senders MUST insert a Marker into the
   data stream at a 512-octet periodic interval in the TCP Sequence
   Number Space.  The Marker contains a 16-bit unsigned integer referred
   to as the FPDUPTR (FPDU Pointer).

   If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16-bit
   relative back-pointer.  FPDUPTR MUST contain the number of octets in
   the TCP stream from the beginning of the ULPDU Length field to the
   first octet of the Marker, unless the Marker falls between FPDUs.
   Thus, the location of the first octet of the previous FPDU header can
   be determined by subtracting the value of the given Marker from the
   current octet-stream sequence number (i.e., TCP sequence number) of
   the first octet of the Marker.  Note that this computation MUST take
   into account that the TCP sequence number could have wrapped between
   the Marker and the header.

   An FPDUPTR value of 0x0000 is a special case -- it is used when the
   Marker falls exactly between FPDUs (between the preceding FPDU CRC
   field and the next FPDU's ULPDU Length field).  In this case, the
   Marker is considered to be contained in the following FPDU; the
   Marker MUST be included in the CRC calculation of the FPDU following
   the Marker (if CRCs are being generated or checked).  Thus, an
   FPDUPTR value of 0x0000 means that immediately following the Marker
   is an FPDU header (the ULPDU Length field).
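
   The back-pointer arithmetic above can be sketched as follows.  This
   is an illustrative Python fragment under stated assumptions, not a
   normative algorithm: the function name ulpdu_length_seq is
   hypothetical, TCP sequence numbers are taken to be 32-bit values,
   and wrap is handled by reducing modulo 2**32.

```python
SEQ_MOD = 1 << 32      # TCP sequence numbers are 32 bits wide and wrap
MARKER_SIZE = 4        # a Marker occupies 4 octets

def ulpdu_length_seq(marker_seq: int, fpduptr: int) -> int:
    """Return the sequence number of the ULPDU Length field.

    marker_seq is the sequence number of the first octet of the Marker.
    """
    fpduptr &= 0xFFFC                 # low two bits are treated as zero
    if fpduptr == 0:
        # Marker fell between FPDUs: the header immediately follows it.
        return (marker_seq + MARKER_SIZE) % SEQ_MOD
    # Otherwise step back from the Marker, allowing for wrap.
    return (marker_seq - fpduptr) % SEQ_MOD
```

   For instance, a Marker at sequence number 4 with an FPDUPTR of
   0x000C points back across the wrap to 0xFFFFFFF8.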

   Since all FPDUs are integral multiples of 4 octets, the bottom two
   bits of the FPDUPTR as calculated by the sender are zero.  MPA
   reserves these bits so they MUST be treated as zero for computation
   at the receiver.

   When Markers are enabled (see Section 7.1, Connection Setup), the MPA
   Markers MUST be inserted immediately preceding the first FPDU of Full
   Operation Phase, and at every 512th octet of the TCP octet stream
   thereafter.  As a result, the first Marker has an FPDUPTR value of
   0x0000.  If the first Marker begins at octet sequence number
   SeqStart, then Markers are inserted such that the first octet of the
   Marker is at octet sequence number SeqNum if (SeqNum - SeqStart) mod
   512 is zero.  Note that SeqNum can wrap, so the subtraction is
   performed modulo 2**32.

   Because the starting TCP sequence number is unlikely to be zero,
   Markers are unlikely to fall on TCP sequence numbers that are
   multiples of 512.  For example, if the MPA connection is started at
   TCP sequence number 11, then the 1st Marker will begin at 11, and
   subsequent Markers will begin at 523, 1035, etc.
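
   The placement rule can be checked with a short Python sketch.  The
   function names are hypothetical, and 32-bit TCP sequence numbers are
   assumed; because 2**32 is a multiple of 512, the interval test
   remains valid across a sequence number wrap.

```python
MARKER_INTERVAL = 512
SEQ_MOD = 1 << 32      # TCP sequence numbers are 32 bits wide and wrap

def is_marker_position(seq_num: int, seq_start: int) -> bool:
    """True if a Marker begins at seq_num, given the first Marker at seq_start."""
    return ((seq_num - seq_start) % SEQ_MOD) % MARKER_INTERVAL == 0

def marker_positions(seq_start: int, count: int) -> list:
    """Return the sequence numbers of the first `count` Markers."""
    return [(seq_start + i * MARKER_INTERVAL) % SEQ_MOD for i in range(count)]
```

   With seq_start set to 11, marker_positions(11, 3) yields 11, 523,
   and 1035, matching the example above.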

   If an FPDU is large enough to contain multiple Markers, they MUST all
   point to the same point in the TCP stream: the first octet of the
   ULPDU Length field for the FPDU.

   If a Marker interval contains multiple FPDUs (the FPDUs are small),
   the Marker MUST point to the start of the ULPDU Length field for the
   FPDU containing the Marker unless the Marker falls between FPDUs, in
   which case the Marker MUST be zero.

   The following example shows an FPDU containing a Marker.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ULPDU Length (0x0010)   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                         ULPDU (octets 0-9)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            (0x0000)           |        FPDU ptr (0x000C)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ULPDU (octets 10-15)                   |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |          PAD (2 octets:0,0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              CRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4: Example FPDU Format with Marker
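
   As a hedged illustration of how a sender might interleave Markers
   with an FPDU, the Python sketch below assembles the ULPDU Length,
   ULPDU, PAD, and Markers (the CRC field is left out).  The function
   name build_fpdu_body and its stream_offset parameter (the distance
   in octets from the first Marker to the start of this FPDU) are
   assumptions made for this example only.

```python
import struct

MARKER_INTERVAL = 512

def build_fpdu_body(ulpdu: bytes, stream_offset: int) -> bytes:
    """Return ULPDU Length + ULPDU + PAD with Markers inserted (no CRC)."""
    if stream_offset % 4:
        raise ValueError("FPDUs start on 4-octet boundaries")
    unmarked = struct.pack("!H", len(ulpdu)) + ulpdu
    unmarked += b"\x00" * (-len(unmarked) % 4)   # PAD to a 4-octet multiple
    out = bytearray()
    header_pos = None          # stream position of the ULPDU Length field
    pos = stream_offset
    for i in range(0, len(unmarked), 4):
        if pos % MARKER_INTERVAL == 0:
            # FPDUPTR is zero when the Marker precedes the header.
            fpduptr = 0 if header_pos is None else pos - header_pos
            out += struct.pack("!HH", 0, fpduptr)
            pos += 4
        if header_pos is None:
            header_pos = pos
        out += unmarked[i:i + 4]
        pos += 4
    if pos % MARKER_INTERVAL == 0:
        # A Marker that lands just before the CRC field belongs to this FPDU.
        out += struct.pack("!HH", 0, pos - header_pos)
    return bytes(out)
```

   Calling build_fpdu_body with a 16-octet ULPDU and a stream_offset of
   500 reproduces the layout of Figure 4: the Marker appears after 12
   octets and carries an FPDUPTR of 0x000C.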

   MPA Receivers MUST preserve ULPDU boundaries when passing data to
   DDP.  MPA Receivers MUST pass the ULPDU data and the ULPDU Length to
   DDP and not the Markers, headers, and CRC.

4.4.  CRC Calculation

   An MPA implementation MUST implement CRC support and MUST either:

   (1)  always use CRCs; the MPA provider is not REQUIRED to support an
        administrator's request that CRCs not be used.

        or

   (2a) only indicate a preference not to use CRCs on the explicit
        request of the system administrator, via an interface not
        defined in this spec.  The default configuration for a
        connection MUST be to use CRCs.

   (2b) disable CRC checking (and possibly generation) if both the local
        and remote endpoints indicate preference not to use CRCs.

   An administrative decision to have a host request CRC suppression
   SHOULD NOT be made unless there is assurance that the TCP connection
   involved provides protection from undetected errors that is at least
   as strong as an end-to-end CRC32c.  End-to-end usage of an IPsec
   cryptographic integrity check is among the ways to provide such
   protection, and the use of channel bindings [NFSv4CHANNEL] by the ULP
   can provide a high level of assurance that the IPsec protection scope
   is end-to-end with respect to the ULP.

   The process MUST be invisible to the ULP.

   After receipt of an MPA startup declaration indicating that its peer
   requires CRCs, an MPA instance MUST continue generating and checking
   CRCs until the connection terminates.  If an MPA instance has
   declared that it does not require CRCs, it MUST turn off CRC checking
   immediately after receipt of an MPA mode declaration indicating that
   its peer also does not require CRCs.  It MAY continue generating
   CRCs.  See Section 7.1, Connection Setup, for details on the MPA
   startup.
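
   The outcome of the CRC declarations can be summarized in a small
   Python sketch.  The function and parameter names are hypothetical;
   each flag stands for the "requires CRCs" declaration that an
   endpoint makes during MPA startup.

```python
def crc_checking_enabled(local_requires_crc: bool,
                         peer_requires_crc: bool) -> bool:
    """CRC checking is turned off only if neither side requires CRCs."""
    return local_requires_crc or peer_requires_crc

def crc_generation_required(local_requires_crc: bool,
                            peer_requires_crc: bool) -> bool:
    """Generation is mandatory whenever checking is enabled.

    When this returns False, the sender MAY still generate CRCs.
    """
    return crc_checking_enabled(local_requires_crc, peer_requires_crc)
```

   A single endpoint that requires CRCs is therefore enough to keep
   both generation and checking active for the life of the connection.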

   When sending an FPDU, the sender MUST include a CRC field.  When CRCs
   are enabled, the CRC field in the MPA FPDU MUST be computed using the
   CRC32c polynomial in the manner described in the iSCSI Protocol
   [iSCSI] document for Header and Data Digests.

   The fields which MUST be included in the CRC calculation when sending
   an FPDU are as follows:

   1)  If a Marker does not immediately precede the ULPDU Length field,
       the CRC-32c is calculated from the first octet of the ULPDU
       Length field, through all the ULPDU and Markers (if present), to
       the last octet of the PAD (if present), inclusive.  If there is a
       Marker immediately following the PAD, the Marker is included in
       the CRC calculation for this FPDU.

   2)  If a Marker immediately precedes the first octet of the ULPDU
       Length field of the FPDU (i.e., the Marker fell between FPDUs,
       and thus is required to be included in the second FPDU), the
       CRC-32c is calculated from the first octet of the Marker, through
       the ULPDU Length header, through all the ULPDU and Markers (if
       present), to the last octet of the PAD (if present), inclusive.

   3)  After calculating the CRC-32c, the resultant value is placed into
       the CRC field at the end of the FPDU.
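
   A minimal, unoptimized Python sketch of the CRC-32c computation is
   given below for illustration.  It processes one bit at a time using
   0x82F63B78, the bit-reflected form of the CRC32c polynomial
   0x1EDC6F41.  The function names are hypothetical.  The caller is
   assumed to pass exactly the octets covered by rules 1) and 2) above;
   how the 32-bit result is laid out in the CRC field follows the
   [iSCSI] digest rules and is not shown here.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78   # reflected form of 0x1EDC6F41

def crc32c(data: bytes) -> int:
    """Return the CRC-32c of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF                          # initial value
    for octet in data:
        crc ^= octet
        for _ in range(8):                    # one step per bit, LSB first
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF                   # final inversion

def fpdu_crc(covered: bytes) -> int:
    """CRC over the covered octets: an optional leading Marker, the
    ULPDU Length, the ULPDU with any Markers, and the PAD."""
    if len(covered) % 4:
        raise ValueError("covered octets must be a multiple of 4")
    return crc32c(covered)
```

   Production implementations would normally use a table-driven or
   hardware-assisted CRC; the bitwise loop is used here only because it
   is easy to verify by inspection.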

   When an FPDU is received, and CRC checking is enabled, the receiver
   MUST first perform the following:

   1)  Calculate the CRC of the incoming FPDU in the same fashion as
       defined above.

   2)  Verify that the calculated CRC-32c value is the same as the
       received CRC-32c value found in the FPDU CRC field.  If not, the
       receiver MUST treat the FPDU as an invalid FPDU.

   The procedure for handling invalid FPDUs is covered in Section 8,
   Error Semantics.

   The following is an annotated hex dump of an example FPDU sent as the
   first FPDU on the stream.  As such, it starts with a Marker.  The
   FPDU contains a 42-octet ULPDU (an example DDP segment), which in
   turn carries 24 octets of Send data that are all zeros.  The CRC32c
   has been correctly calculated and can be used as a reference.  See
   the [DDP] and [RDMAP] specifications for definitions of the DDP
   Control field, Queue, MSN, MO, and Send Data.

       Octet Contents  Annotation
       Count

       0000    00      Marker: Reserved
       0001    00
       0002    00      Marker: FPDUPTR
       0003    00
       0004    00      ULPDU Length
       0005    2a
       0006    41      DDP Control Field, Send with Last flag set
       0007    43
       0008    00      Reserved (DDP STag position with no STag)
       0009    00
       000a    00
       000b    00
       000c    00      DDP Queue = 0
       000d    00
       000e    00
       000f    00
       0010    00      DDP MSN = 1
       0011    00
       0012    00
       0013    01
       0014    00      DDP MO = 0
       0015    00
       0016    00
       0017    00
       0018    00      DDP Send Data (24 octets of zeros)
       ...
       002f    00
       0030    52      CRC32c
       0031    23
       0032    99
       0033    83

                  Figure 5: Annotated Hex Dump of an FPDU

   The following is an example sent as the second FPDU of the stream
   where the first FPDU (which is not shown here) had a length of 492
   octets and was also a Send to Queue 0 with Last Flag set.  This
   example contains a Marker.

       Octet Contents  Annotation
       Count

       01ec    00      Length
       01ed    2a
       01ee    41      DDP Control Field: Send with Last Flag set
       01ef    43
       01f0    00      Reserved (DDP STag position with no STag)
       01f1    00
       01f2    00
       01f3    00
       01f4    00      DDP Queue = 0
       01f5    00
       01f6    00
       01f7    00
       01f8    00      DDP MSN = 2
       01f9    00
       01fa    00
       01fb    02
       01fc    00      DDP MO = 0
       01fd    00
       01fe    00
       01ff    00
       0200    00      Marker: Reserved
       0201    00
       0202    00      Marker: FPDUPTR
       0203    14
       0204    00      DDP Send Data (24 octets of zeros)
       ...
       021b    00
       021c    84      CRC32c
       021d    92
       021e    58
       021f    98

            Figure 6: Annotated Hex Dump of an FPDU with Marker
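
   To illustrate the receive side, the following Python sketch removes
   the Marker from the FPDU of Figure 6 and recovers the ULPDU that
   would be passed to DDP.  The function name extract_ulpdu and the
   stream_offset parameter (the offset of the first octet of the FPDU
   from the first Marker) are assumptions for this example.  The sketch
   expects a complete, in-order FPDU and does not verify the CRC.

```python
import struct

MARKER_INTERVAL = 512

def extract_ulpdu(fpdu: bytes, stream_offset: int):
    """Return (ulpdu, fpduptrs) for one complete FPDU, Markers removed."""
    unmarked = bytearray()
    fpduptrs = []
    for i in range(0, len(fpdu), 4):
        word = fpdu[i:i + 4]
        if (stream_offset + i) % MARKER_INTERVAL == 0:
            # This 4-octet word is a Marker; keep only its FPDUPTR.
            fpduptrs.append(struct.unpack("!HH", word)[1] & 0xFFFC)
        else:
            unmarked += word
    (length,) = struct.unpack("!H", unmarked[:2])
    return bytes(unmarked[2:2 + length]), fpduptrs

# Octets 0x01ec through 0x021f from Figure 6.
FIGURE_6 = bytes.fromhex(
    "002a4143" "00000000" "00000000" "00000002" "00000000"
    "00000014" + "00" * 24 + "84925898"
)
ulpdu, fpduptrs = extract_ulpdu(FIGURE_6, 0x01EC)
```

   Here ulpdu is the 42-octet DDP segment and fpduptrs holds the single
   value 0x14, which is the distance from the ULPDU Length field at
   0x01ec to the Marker at 0x0200.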

4.5.  FPDU Size Considerations

   MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as
   the size of the largest ULPDU fitting in an FPDU.  For an empty TCP
   Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus
   space for Markers and pad octets.

       The maximum ULPDU Length for a single ULPDU when Markers are
       present MUST be computed as:

       MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)

   The formula above accounts for the worst-case number of Markers.

       The maximum ULPDU Length for a single ULPDU when Markers are NOT
       present MUST be computed as:

       MULPDU = EMSS - (6 + EMSS mod 4)
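
   Both formulas can be evaluated with the short Python sketch below.
   The function name mulpdu and its parameters are hypothetical; the
   arithmetic is taken directly from the two formulas above.

```python
def mulpdu(emss: int, markers_enabled: bool) -> int:
    """Maximum ULPDU length for an empty TCP segment of size emss."""
    overhead = 6 + emss % 4                 # Length + CRC, plus pad allowance
    if markers_enabled:
        overhead += 4 * -(-emss // 512)     # 4 * Ceiling(EMSS / 512)
    return emss - overhead
```

   For an EMSS of 1460 octets, this gives 1442 with Markers (three
   Markers in the worst case) and 1454 without.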

   As a further optimization of wire efficiency, an MPA implementation
   MAY dynamically adjust the MULPDU (see Section 5 for latency and wire
   efficiency trade-offs).  When one or more FPDUs are already packed
   into a TCP Segment, MULPDU MAY be reduced accordingly.

   DDP SHOULD provide ULPDUs that are as large as possible, but less
   than or equal to MULPDU.

   If the TCP implementation needs to adjust EMSS to support MTU changes
   or changing TCP options, the MULPDU value is changed accordingly.

   In certain rare situations, the EMSS may shrink below 128 octets in
   size.  If this occurs, the MPA on TCP sender MUST NOT shrink the
   MULPDU below 128 octets and is not required to follow the
   segmentation rules in Section 5.1 and Appendix A.

   If one or more FPDUs are already packed into a TCP segment, such that
   the remaining room is less than 128 octets, MPA MUST NOT provide a
   MULPDU smaller than 128.  In this case, MPA would typically provide a
   MULPDU for the next full-sized segment, but may still pack the next
   FPDU into the small remaining room, provided that the next FPDU is
   small enough to fit.

   The value 128 is chosen to allow DDP designers room for the DDP
   Header and some user data.
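
   The 128-octet lower bound can be expressed as a clamp on the
   computed value.  The Python sketch below is illustrative only; the
   function name provided_mulpdu is hypothetical, and the formula
   inside it restates the MULPDU computation from this section.

```python
MIN_MULPDU = 128

def provided_mulpdu(emss: int, markers_enabled: bool) -> int:
    """MULPDU offered to DDP, never smaller than 128 octets."""
    overhead = 6 + emss % 4
    if markers_enabled:
        overhead += 4 * -(-emss // 512)     # 4 * Ceiling(EMSS / 512)
    return max(MIN_MULPDU, emss - overhead)
```

   When the clamp takes effect, the resulting FPDU no longer fits in a
   single TCP segment, which is why the segmentation rules are relaxed
   in that case.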
