
RFC 5044

Marker PDU Aligned Framing for TCP Specification

Pages: 74
Proposed Standard
Errata
Updated by:  6581, 7146

Network Working Group                                          P. Culley
Request for Comments: 5044                       Hewlett-Packard Company
Category: Standards Track                                       U. Elzur
                                                    Broadcom Corporation
                                                                R. Recio
                                                         IBM Corporation
                                                               S. Bailey
                                                   Sandburst Corporation
                                                              J. Carrier
                                                               Cray Inc.
                                                            October 2007


            Marker PDU Aligned Framing for TCP Specification

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

Marker PDU Aligned Framing (MPA) is designed to work as an "adaptation layer" between TCP and the Direct Data Placement protocol (DDP) as described in RFC 5041. It preserves the reliable, in-order delivery of TCP, while adding the preservation of higher-level protocol record boundaries that DDP requires. MPA is fully compliant with applicable TCP RFCs and can be utilized with existing TCP implementations. MPA also supports integrated implementations that combine TCP, MPA and DDP to reduce buffering requirements in the implementation and improve performance at the system level.

Table of Contents

   1. Introduction ....................................................4
      1.1. Motivation .................................................4
      1.2. Protocol Overview ..........................................5
   2. Glossary ........................................................8
   3. MPA's Interactions with DDP ....................................11
   4. MPA Full Operation Phase .......................................13
      4.1. FPDU Format ...............................................13
      4.2. Marker Format .............................................14
      4.3. MPA Markers ...............................................14
      4.4. CRC Calculation ...........................................16
      4.5. FPDU Size Considerations ..................................21
   5. MPA's interactions with TCP ....................................22
      5.1. MPA transmitters with a standard layered TCP ..............22
      5.2. MPA receivers with a standard layered TCP .................23
   6. MPA Receiver FPDU Identification ...............................24
   7. Connection Semantics ...........................................24
      7.1. Connection Setup ..........................................24
           7.1.1. MPA Request and Reply Frame Format .................26
           7.1.2. Connection Startup Rules ...........................28
           7.1.3. Example Delayed Startup Sequence ...................30
           7.1.4. Use of Private Data ................................33
                  7.1.4.1. Motivation ................................33
                  7.1.4.2. Example Immediate Startup Using
                           Private Data ..............................35
           7.1.5. "Dual Stack" Implementations .......................37
      7.2. Normal Connection Teardown ................................38
   8. Error Semantics ................................................39
   9. Security Considerations ........................................40
      9.1. Protocol-Specific Security Considerations .................40
           9.1.1. Spoofing ...........................................40
                  9.1.1.1. Impersonation .............................41
                  9.1.1.2. Stream Hijacking ..........................41
                  9.1.1.3. Man-in-the-Middle Attack ..................41
           9.1.2. Eavesdropping ......................................42
      9.2. Introduction to Security Options ..........................42
      9.3. Using IPsec with MPA ......................................43
      9.4. Requirements for IPsec Encapsulation of MPA/DDP ...........43
   10. IANA Considerations ...........................................44
   Appendix A. Optimized MPA-Aware TCP Implementations ...............45
      A.1. Optimized MPA/TCP Transmitters ............................46
      A.2. Effects of Optimized MPA/TCP Segmentation .................46
      A.3. Optimized MPA/TCP Receivers ...............................48
      A.4. Re-segmenting Middleboxes and Non-Optimized MPA/TCP
           Senders ...................................................49
      A.5. Receiver Implementation ...................................50
           A.5.1. Network Layer Reassembly Buffers ...................51
           A.5.2. TCP Reassembly Buffers .............................52
   Appendix B. Analysis of MPA over TCP Operations ...................52
      B.1. Assumptions ...............................................53
           B.1.1. MPA Is Layered beneath DDP .........................53
           B.1.2. MPA Preserves DDP Message Framing ..................53
           B.1.3. The Size of the ULPDU Passed to MPA Is Less Than
                  EMSS Under Normal Conditions .......................53
           B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery.54
     B.2.  The Value of FPDU Alignment ...............................54
           B.2.1. Impact of Lack of FPDU Alignment on the Receiver
                  Computational Load and Complexity ..................56
           B.2.2. FPDU Alignment Effects on TCP Wire Protocol ........60
   Appendix C. IETF Implementation Interoperability with RDMA
               Consortium Protocols ..................................62
     C.1. Negotiated Parameters ......................................63
     C.2. RDMAC RNIC and Non-Permissive IETF RNIC ....................64
          C.2.1. RDMAC RNIC Initiator ................................65
          C.2.2. Non-Permissive IETF RNIC Initiator ..................65
          C.2.3. RDMAC RNIC and Permissive IETF RNIC .................65
          C.2.4. RDMAC RNIC Initiator ................................66
          C.2.5. Permissive IETF RNIC Initiator ......................67
     C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC ..........67
   Normative References ..............................................68
   Informative References ............................................68
   Contributors ......................................................70

Table of Figures

   Figure 1: ULP MPA TCP Layering .....................................5
   Figure 2: FPDU Format .............................................13
   Figure 3: Marker Format ...........................................14
   Figure 4: Example FPDU Format with Marker .........................16
   Figure 5: Annotated Hex Dump of an FPDU ...........................19
   Figure 6: Annotated Hex Dump of an FPDU with Marker ...............20
   Figure 7: Fully Layered Implementation ............................22
   Figure 8: MPA Request/Reply Frame .................................26
   Figure 9: Example Delayed Startup Negotiation .....................31
   Figure 10: Example Immediate Startup Negotiation ..................35
   Figure 11: Optimized MPA/TCP Implementation .......................45
   Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream .....56
   Figure 13: Aligned FPDU Placed Immediately after TCP Header .......58
   Figure 14: Connection Parameters for the RNIC Types ...............63
   Figure 15: MPA Negotiation between an RDMAC RNIC and a
              Non-Permissive IETF RNIC ...............................65
   Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive
              IETF RNIC ..............................................66
   Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC and
              a Permissive IETF RNIC .................................67

1. Introduction

This section discusses the reason for creating MPA on TCP and a general overview of the protocol.

1.1. Motivation

   The Direct Data Placement protocol [DDP], when used with TCP
   [RFC793], requires a mechanism to detect record boundaries.  The DDP
   records are referred to as Upper Layer Protocol Data Units by this
   document.  The ability to locate the Upper Layer Protocol Data Unit
   (ULPDU) boundary is useful to a hardware network adapter that uses
   DDP to directly place the data in the application buffer based on the
   control information carried in the ULPDU header.  This may be done
   without requiring that the packets arrive in order.  Potential
   benefits of this capability are the avoidance of the memory copy
   overhead and a smaller memory requirement for handling out-of-order
   or dropped packets.

   Many approaches have been proposed for a generalized framing
   mechanism.  Some are probabilistic in nature and others are
   deterministic.  An example probabilistic approach is characterized by
   a detectable value embedded in the octet stream, with no method of
   preventing that value elsewhere within user data.  It is
   probabilistic because under some conditions the receiver may
   incorrectly interpret application data as the detectable value.
   Under these conditions, the protocol may fail with unacceptable
   frequency.  One deterministic approach is characterized by embedded
   controls at known locations in the octet stream.  Because the
   receiver can guarantee it will only examine the data stream at
   locations that are known to contain the embedded control, the
   protocol can never misinterpret application data as being embedded
   control data.

   For unambiguous handling of an out-of-order packet, a deterministic
   approach is preferred.

   The MPA protocol provides a framing mechanism for DDP running over
   TCP using the deterministic approach.  It allows the location of the
   ULPDU to be determined in the TCP stream even if the TCP segments
   arrive out of order.

1.2. Protocol Overview

   The layering of PDUs with MPA is shown in Figure 1, below.

              +------------------+
              |     ULP client   |
              +------------------+  <- Consumer messages
              |        DDP       |
              +------------------+  <- ULPDUs
              |        MPA*      |
              +------------------+  <- FPDUs (containing ULPDUs)
              |        TCP*      |
              +------------------+  <- TCP Segments (containing FPDUs)
              |      IP etc.     |
              +------------------+
               * These may be fully layered or optimized together.

                      Figure 1: ULP MPA TCP Layering

   MPA is described as an extra layer above TCP and below DDP.  The
   operation sequence is:

   1.  A TCP connection is established by ULP action.  This is done
       using methods not described by this specification.  The ULP may
       exchange some amount of data in streaming mode prior to starting
       MPA, but is not required to do so.

   2.  The Consumer negotiates the use of DDP and MPA at both ends of a
       connection.  The mechanisms to do this are not described in this
       specification.  The negotiation may be done in streaming mode, or
       by some other mechanism (such as a pre-arranged port number).

   3.  The ULP activates MPA on each end in the Startup Phase, either as
       an Initiator or a Responder, as determined by the ULP.  This mode
       verifies the usage of MPA, specifies the use of CRC and Markers,
       and allows the ULP to communicate some additional data via a
       Private Data exchange.  See Section 7.1, Connection Setup, for
       more details on the startup process.

   4.  At the end of the Startup Phase, the ULP puts MPA (and DDP) into
       Full Operation and begins sending DDP data as further described
       below.  In this document, DDP data chunks are called ULPDUs.  For
       a description of the DDP data, see [DDP].

   Following is a description of data transfer when MPA is in Full
   Operation.

   1.  DDP determines the Maximum ULPDU (MULPDU) size by querying MPA
       for this value.  MPA derives this information from TCP or IP,
       when it is available, or chooses a reasonable value.

   2.  DDP creates ULPDUs of MULPDU size or smaller, and hands them to
       MPA at the sender.

   3.  MPA creates a Framed Protocol Data Unit (FPDU) by prepending a
       header, optionally inserting Markers, and appending a CRC field
       after the ULPDU and PAD (if any).  MPA delivers the FPDU to TCP.

   4.  The TCP sender puts the FPDUs into the TCP stream.  If the sender
       is optimized MPA/TCP, it segments the TCP stream in such a way
       that a TCP Segment boundary is also the boundary of an FPDU.  TCP
       then passes each segment to the IP layer for transmission.

   5.  The receiver may or may not be optimized.  If it is optimized
       MPA/TCP, it may separate passing the TCP payload to MPA from
       passing the TCP payload ordering information to MPA.  In either
       case, RFC-compliant TCP wire behavior is observed at both the
       sender and receiver.

   6.  The MPA receiver locates and assembles complete FPDUs within the
       stream, verifies their integrity, and removes MPA Markers (when
       present), ULPDU_Length, PAD, and the CRC field.

   7.  MPA then provides the complete ULPDUs to DDP.  MPA may also
       separate passing MPA payload to DDP from passing the MPA payload
       ordering information.

   A fully layered MPA on TCP is implemented as a data stream ULP for
   TCP and is therefore RFC compliant.

   An optimized DDP/MPA/TCP uses a TCP layer that potentially contains
   some additional behaviors as suggested in this document.  When
   DDP/MPA/TCP are cross-layer optimized, the behavior of TCP
   (especially sender segmentation) may change from that of the un-
   optimized implementation, but the changes are within the bounds
   permitted by the TCP RFC specifications, and will interoperate with
   an un-optimized TCP.  The additional behaviors are described in
   Appendix A and are not normative; they are described at a TCP
   interface layer as a convenience.  Implementations may achieve the
   described functionality using any method, including cross-layer
   optimizations between TCP, MPA, and DDP.

   An optimized DDP/MPA/TCP sender is able to segment the data stream
   such that TCP segments begin with FPDUs (FPDU Alignment).  This has
   significant advantages for receivers.  When segments arrive with
   aligned FPDUs, the receiver usually need not buffer any portion of
   the segment, allowing DDP to place it in its destination memory
   immediately, thus avoiding copies from intermediate buffers (DDP's
   reason for existence).
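
   FPDU Alignment on the sending side can be pictured with a small
   sketch.  It is an illustration only, under stated assumptions: the
   function name `segment_aligned` and its `emss` parameter are
   hypothetical, the FPDUs are assumed to be already-built octet
   strings, and everything else that shapes real TCP segmentation
   (windows, congestion control, retransmission) is ignored.

```python
def segment_aligned(fpdus, emss):
    """Group whole FPDUs into segment payloads of at most `emss` octets.

    Hypothetical helper: every payload starts with an FPDU header and
    holds an integer number of FPDUs, which is the FPDU Alignment
    property.  Windows, congestion control and retransmission are ignored.
    """
    segments = []
    current = bytearray()
    for fpdu in fpdus:
        if len(fpdu) > emss:
            raise ValueError("an FPDU larger than EMSS cannot stay aligned")
        # Close the current payload rather than split an FPDU across segments.
        if current and len(current) + len(fpdu) > emss:
            segments.append(bytes(current))
            current = bytearray()
        current += fpdu
    if current:
        segments.append(bytes(current))
    return segments


payloads = segment_aligned(
    [b"A" * 600, b"B" * 600, b"C" * 800, b"D" * 400], emss=1460
)
print([len(p) for p in payloads])
```

   With these inputs the third FPDU would overflow the first payload,
   so it opens a new one; each payload therefore begins with an FPDU.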

   An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation
   to locate the start of ULPDUs that may be received out of order.  It
   also allows the implementation to determine if the entire ULPDU has
   been received.  As a result, MPA can pass out-of-order ULPDUs to DDP
   for immediate use.  This enables a DDP on MPA implementation to save
   a significant amount of intermediate storage by placing the ULPDUs in
   the right locations in the application buffers when they arrive,
   rather than waiting until full ordering can be restored.

   The ability of a receiver to recover out-of-order ULPDUs is optional
   and declared to the transmitter during startup.  When the receiver
   declares that it does not support out-of-order recovery, the
   transmitter does not add the control information to the data stream
   needed for out-of-order recovery.

   If the receiver is fully layered, then MPA receives a strictly
   ordered stream of data and does not deal with out-of-order ULPDUs.
   In this case, MPA passes each ULPDU to DDP when the last bytes arrive
   from TCP, along with the indication that they are in order.
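
   A minimal sketch of the fully layered receive path follows.  It
   assumes Markers were not negotiated and the CRC is not checked, and
   it reads ULPDU_Length as a big-endian 16-bit value per the FPDU
   layout in Section 4.1.  The class name `InOrderMpaReceiver` and its
   `feed` method are hypothetical, not an interface defined by this
   specification.

```python
import struct


class InOrderMpaReceiver:
    """Marker-free, CRC-unchecked sketch of a fully layered MPA receiver."""

    def __init__(self):
        self.buffer = bytearray()   # in-order octets not yet consumed

    def feed(self, data):
        """Append in-order TCP payload; return ULPDUs that are now complete."""
        self.buffer += data
        ulpdus = []
        while len(self.buffer) >= 2:
            (ulpdu_len,) = struct.unpack_from("!H", self.buffer, 0)
            pad = (-(2 + ulpdu_len)) % 4          # FPDU is a multiple of 4
            fpdu_len = 2 + ulpdu_len + pad + 4    # header, ULPDU, PAD, CRC
            if len(self.buffer) < fpdu_len:
                break                             # wait for the last bytes
            ulpdus.append(bytes(self.buffer[2:2 + ulpdu_len]))
            del self.buffer[:fpdu_len]
        return ulpdus


rx = InOrderMpaReceiver()
# 2-octet length, 5-octet ULPDU, 1 PAD octet, 4-octet (unchecked) CRC field.
fpdu = struct.pack("!H", 5) + b"hello" + b"\x00" + b"\x00" * 4
first = rx.feed(fpdu[:6])
second = rx.feed(fpdu[6:])
print(first, second)
```

   The ULPDU is handed over only once the whole FPDU, including the CRC
   field, has arrived, which mirrors "when the last bytes arrive from
   TCP" above.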

   MPA implementations that support recovery of out-of-order ULPDUs MUST
   support a mechanism to indicate the ordering of ULPDUs as the sender
   transmitted them and indicate when missing intermediate segments
   arrive.  These mechanisms allow DDP to reestablish record ordering
   and report Delivery of complete messages (groups of records).

   MPA also addresses enhanced data integrity.  Some users of TCP have
   noted that the TCP checksum is not as strong as could be desired (see
   [CRCTCP]).  Studies such as [CRCTCP] have shown that the TCP checksum
   indicates segments in error at a much higher rate than the underlying
   link characteristics would indicate.  With these higher error rates,
   the chance that an error will escape detection, when using only the
   TCP checksum for data integrity, becomes a concern.  A stronger
   integrity check can reduce the chance of data errors being missed.

   MPA includes a CRC check to increase the ULPDU data integrity to the
   level provided by other modern protocols, such as SCTP [RFC4960].  It
   is possible to disable this CRC check; however, CRCs MUST be enabled
   unless it is clear that the end-to-end connection through the network
   has data integrity at least as good as an MPA with CRC enabled (for
   example, when IPsec is implemented end to end).  DDP's ULP expects
   this level of data integrity and therefore the ULP does not have to
   provide its own duplicate data integrity and error recovery for lost
   data.

2. Glossary

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Consumer - the ULPs or applications that lie above MPA and DDP.  The
       Consumer is responsible for making TCP connections, starting MPA
       and DDP connections, and generally controlling operations.

   CRC - Cyclic Redundancy Check.

   Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
       the process of informing DDP that a particular PDU is ordered for
       use.  A PDU is Delivered in the exact order that it was sent by
       the original sender; MPA uses TCP's byte stream ordering to
       determine when Delivery is possible.  This is specifically
       different from "passing the PDU to DDP", which may generally
       occur in any order, while the order of Delivery is strictly
       defined.

   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the
       TCP maximum segment size (MSS) as defined in RFC 793 [RFC793],
       and the current path Maximum Transmission Unit (MTU) [RFC1191].

   FPDU - Framed Protocol Data Unit.  The unit of data created by an MPA
       sender.

   FPDU Alignment - The property that an FPDU is Header Aligned with the
       TCP segment, and the TCP segment includes an integer number of
       FPDUs.  A TCP segment with an FPDU Alignment allows immediate
       processing of the contained FPDUs without waiting on other TCP
       segments to arrive or combining with prior segments.

   FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate
       the beginning of an FPDU.

   Full Operation (Full Operation Phase) - After the completion of the
       Startup Phase, MPA begins exchanging FPDUs.

   Header Alignment - The property that a TCP segment begins with an
       FPDU.  The FPDU is Header Aligned when the FPDU header is exactly
       at the start of the TCP segment (right behind the TCP headers on
       the wire).

   Initiator - The endpoint of a connection that sends the MPA Request
       Frame, i.e., the first to actually send data (which may not be
       the one that sends the TCP SYN).

   Marker - A four-octet field that is placed in the MPA data stream at
       fixed octet intervals (every 512 octets).

   MPA-aware TCP - A TCP implementation that is aware of the receiver
       efficiencies of MPA FPDU Alignment and is capable of sending TCP
       segments that begin with an FPDU.

   MPA-enabled - MPA is enabled if the MPA protocol is visible on the
       wire.  When the sender is MPA-enabled, it is inserting framing
       and Markers.  When the receiver is MPA-enabled, it is
       interpreting framing and Markers.

   MPA Request Frame - Data sent from the MPA Initiator to the MPA
       Responder during the Startup Phase.

   MPA Reply Frame - Data sent from the MPA Responder to the MPA
       Initiator during the Startup Phase.

   MPA - Marker-based ULP PDU Aligned Framing for TCP protocol.  This
       document defines the MPA protocol.

   MULPDU - Maximum ULPDU.  The current maximum size of the record that
       is acceptable for DDP to pass to MPA for transmission.

   Node - A computing device attached to one or more links of a network.
       A Node in this context does not refer to a specific application
       or protocol instantiation running on the computer.  A Node may
       consist of one or more MPA on TCP devices installed in a host
       computer.

   PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact
       modulo 4 size.

   PDU - Protocol Data Unit.

   Private Data - A block of data exchanged between MPA endpoints during
       initial connection setup.

   Protection Domain - An RDMA concept (see [VERBS-RDMA] and [RDMASEC])
       that ties use of various endpoint resources (memory access, etc.)
       to the specific RDMA/DDP/MPA connection.

   RDDP - A suite of protocols including MPA, [DDP], [RDMAP], an overall
       security document [RDMASEC], a problem statement [RFC4297], an
       architecture document [RFC4296], and an applicability document
       [APPL].

   RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA
       to enable applications to transfer data directly from memory
       buffers.  See [RDMAP].

   Remote Peer - The MPA protocol implementation on the opposite end of
       the connection.  Used to refer to the remote entity when
       describing protocol exchanges or other interactions between two
       Nodes.

   Responder - The connection endpoint that responds to an incoming MPA
       connection request (the MPA Request Frame).  This may not be the
       endpoint that awaited the TCP SYN.

   Startup Phase - The initial exchanges of an MPA connection that
       serve to more fully identify MPA endpoints to each other and
       pass connection specific setup information to each other.

   ULP - Upper Layer Protocol.  The protocol layer above the protocol
       layer currently being referenced.  The ULP for MPA is DDP [DDP].

   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
       the layer above MPA (DDP).  ULPDU corresponds to DDP's DDP
       segment.

   ULPDU_Length - A field in the FPDU describing the length of the
       included ULPDU.

3. MPA's Interactions with DDP

   DDP requires MPA to maintain DDP record boundaries from the sender to
   the receiver.  When using MPA on TCP to send data, DDP provides
   records (ULPDUs) to MPA.  MPA will use the reliable transmission
   abilities of TCP to transmit the data, and will insert appropriate
   additional information into the TCP stream to allow the MPA receiver
   to locate the record boundary information.

   As such, MPA accepts complete records (ULPDUs) from DDP at the sender
   and returns them to DDP at the receiver.

   MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
   contained in one FPDU.

   MPA over a standard TCP stack can usually provide FPDU Alignment with
   the TCP Header if the FPDU is equal to TCP's EMSS.  An optimized
   MPA/TCP stack can also maintain alignment as long as the FPDU is less
   than or equal to TCP's EMSS.  Since FPDU Alignment is generally
   desired by the receiver, DDP cooperates with MPA to ensure FPDUs'
   lengths do not exceed the EMSS under normal conditions.  This is done
   with the MULPDU mechanism.

   MPA MUST provide information to DDP on the current maximum size of
   the record that is acceptable to send (MULPDU).  DDP SHOULD limit
   each record size to MULPDU.  The range of MULPDU values MUST be
   between 128 octets and 64768 octets, inclusive.

   The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
   MPA.  DDP MAY post a ULPDU of any size between one and 64768 octets;
   however, MPA is not REQUIRED to support a ULPDU Length that is
   greater than the current MULPDU.

   While the maximum theoretical length supported by the MPA header
   ULPDU_Length field is 65535, TCP over IP requires the IP datagram
   maximum length to be 65535 octets.  To enable MPA to support FPDU
   Alignment, the maximum size of the FPDU must fit within an IP
   datagram.  Thus, the ULPDU limit of 64768 octets was derived by
   taking the maximum IP datagram length, subtracting from it the
   maximum total length of the sum of the IPv4 header, TCP header, IPv4
   options, TCP options, and the worst-case MPA overhead, and then
   rounding the result down to a 128-octet boundary.

   Note that MULPDU will be significantly smaller than the theoretical
   maximum in most implementations for most circumstances, due to link
   MTUs, use of extra headers such as required for IPsec, etc.
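
   The derivation of the 64768-octet limit can be retraced with a few
   lines of arithmetic.  The text above names the terms but not their
   values, so the figures below are assumptions: 60 octets each for the
   largest IPv4 and TCP headers including options, and a worst-case MPA
   overhead of a 2-octet header, a 4-octet CRC, 3 PAD octets, and one
   4-octet Marker per 512 octets of TCP payload.

```python
import math

MAX_IP_DATAGRAM = 65535   # IPv4 Total Length is a 16-bit field
MAX_IPV4_HEADER = 60      # 4-bit IHL: 15 words of 4 octets, options included
MAX_TCP_HEADER = 60       # 4-bit Data Offset: 15 words of 4 octets, options included

tcp_payload = MAX_IP_DATAGRAM - MAX_IPV4_HEADER - MAX_TCP_HEADER

# Assumed worst-case MPA overhead: header + CRC + maximum PAD, plus one
# 4-octet Marker for every 512 octets of TCP payload (rounded up).
marker_octets = math.ceil(tcp_payload / 512) * 4
mpa_overhead = 2 + 4 + 3 + marker_octets

# Round the remaining room down to a 128-octet boundary.
ulpdu_limit = (tcp_payload - mpa_overhead) // 128 * 128
print(tcp_payload, mpa_overhead, ulpdu_limit)
```

   Under these assumptions the result is 64768, matching the limit
   stated above; other plausible readings of "worst-case MPA overhead"
   that differ by a few octets round down to the same boundary.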

   On receive, MPA MUST pass each ULPDU with its length to DDP when it
   has been validated.

   If an MPA implementation supports passing out-of-order ULPDUs to DDP,
   the MPA implementation SHOULD:

   *   Pass each ULPDU with its length to DDP as soon as it has been
       fully received and validated.

   *   Provide a mechanism to indicate the ordering of ULPDUs as the
       sender transmitted them.  One possible mechanism might be
       providing the TCP sequence number for each ULPDU.

   *   Provide a mechanism to indicate when a given ULPDU (and prior
       ULPDUs) are complete (Delivered to DDP).  One possible mechanism
       might be to allow DDP to see the current outgoing TCP ACK
       sequence number.

   *   Provide an indication to DDP that the TCP has closed or has begun
       to close the connection (e.g., received a FIN).
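
   The two "possible mechanisms" named in the list above (a TCP sequence
   number per ULPDU and visibility of the cumulative ACK point) could be
   combined roughly as follows.  This is a sketch under assumptions, not
   a required interface: `DeliveryTracker`, `placed`, and `advance` are
   hypothetical names, and `span` stands for the number of TCP stream
   octets occupied by the FPDU that carried the ULPDU.

```python
MOD = 1 << 32   # TCP sequence numbers are 32 bits wide and wrap


class DeliveryTracker:
    """Tracks which out-of-order ULPDUs have become Delivered."""

    def __init__(self, initial_ack):
        self.ack = initial_ack % MOD
        self.pending = {}   # FPDU start sequence number -> octets in stream

    def placed(self, seq, span):
        """Record a ULPDU passed to DDP, possibly out of order."""
        self.pending[seq % MOD] = span

    def advance(self, ack_seq):
        """Move the cumulative ACK point; return newly Delivered entries
        as (seq, span) pairs in the order the sender transmitted them."""
        base = self.ack
        covered = (ack_seq - base) % MOD   # octets acknowledged past base
        delivered = []
        # Sort by distance from the old ACK point so wrap-around is harmless.
        for seq in sorted(self.pending, key=lambda s: (s - base) % MOD):
            if (seq - base) % MOD + self.pending[seq] <= covered:
                delivered.append((seq, self.pending.pop(seq)))
        self.ack = ack_seq % MOD
        return delivered


tracker = DeliveryTracker(initial_ack=1000)
tracker.placed(1024, 24)        # arrives first, out of order
early = tracker.advance(1000)   # ACK point has not moved
tracker.placed(1000, 24)        # the missing FPDU arrives
late = tracker.advance(1048)
print(early, late)
```

   The second ULPDU is usable by DDP as soon as it is placed, but it is
   reported as Delivered only after the ACK point has passed it and
   everything sent before it.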

   MPA MUST provide the protocol version negotiated with its peer to
   DDP.  DDP will use this version to set the version in its header and
   to report the version to [RDMAP].

4. MPA Full Operation Phase

The following sections describe the main semantics of the Full Operation Phase of MPA.

4.1. FPDU Format

   MPA senders create FPDUs out of ULPDUs.  The format of an FPDU shown
   below MUST be used for all MPA FPDUs.  For purposes of clarity,
   Markers are not shown in Figure 2.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          ULPDU_Length         |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   ~                                                               ~
   ~                            ULPDU                              ~
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |          PAD (0-3 octets)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             CRC                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 2: FPDU Format

   ULPDU_Length: 16 bits (unsigned integer).  This is the number of
       octets of the contained ULPDU.  It does not include the length of
       the FPDU header itself, the pad, the CRC, or of any Markers that
       fall within the ULPDU.  The 16-bit ULPDU Length field is large
       enough to support the largest IP datagrams for IPv4 or IPv6.

   PAD: The PAD field trails the ULPDU and contains between 0 and 3
       octets of data.  The pad data MUST be set to zero by the sender
       and ignored by the receiver (except for CRC checking).  The
       length of the pad is set so as to make the size of the FPDU an
       integral multiple of four.

   CRC: 32 bits.  When CRCs are enabled, this field contains a CRC32c
       check value, which is used to verify the entire contents of the
       FPDU, using CRC32c.  See Section 4.4, CRC Calculation.  When CRCs
       are not enabled, this field is still present, may contain any
       value, and MUST NOT be checked.

   The FPDU adds a minimum of 6 octets to the length of the ULPDU.  In
   addition, the total length of the FPDU will include the length of any
   Markers and from 0 to 3 pad octets added to round-up the ULPDU size.
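
   The size rules above can be checked with a short sketch that
   assembles a Marker-free FPDU.  The helper names (`pad_length`,
   `fpdu_length`, `build_fpdu`) are hypothetical, ULPDU_Length is
   written in network byte order as an assumption consistent with the
   field layout, and the CRC field is a caller-supplied placeholder:
   computing CRC32c is the subject of Section 4.4 and is not attempted
   here.

```python
import struct

MAX_ULPDU = 64768   # largest ULPDU DDP may post (Section 3)


def pad_length(ulpdu_len):
    """Zero octets needed so header + ULPDU + PAD is a multiple of four."""
    return (-(2 + ulpdu_len)) % 4


def fpdu_length(ulpdu_len):
    """Marker-free FPDU size: 2-octet header, ULPDU, PAD, 4-octet CRC."""
    return 2 + ulpdu_len + pad_length(ulpdu_len) + 4


def build_fpdu(ulpdu, crc_field=b"\x00\x00\x00\x00"):
    """Assemble a Marker-free FPDU.

    `crc_field` is a placeholder (assumption); with CRCs disabled the
    field is still present and may contain any value.
    """
    if not 1 <= len(ulpdu) <= MAX_ULPDU:
        raise ValueError("ULPDU size out of range")
    if len(crc_field) != 4:
        raise ValueError("the CRC field is always 4 octets")
    header = struct.pack("!H", len(ulpdu))
    return header + ulpdu + b"\x00" * pad_length(len(ulpdu)) + crc_field


fpdu = build_fpdu(b"x" * 16)
print(len(fpdu), fpdu_length(2) - 2)
```

   A 16-octet ULPDU needs two PAD octets and yields a 24-octet FPDU; a
   2-octet ULPDU needs none, which gives the 6-octet minimum overhead.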

4.2. Marker Format

   The format of a Marker MUST be as specified in Figure 3:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           RESERVED            |            FPDUPTR            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 3: Marker Format

   RESERVED: The Reserved field MUST be set to zero on transmit and
       ignored on receive (except for CRC calculation).

   FPDUPTR: The FPDU Pointer is a relative pointer, 16 bits long,
       interpreted as an unsigned integer that indicates the number of
       octets in the TCP stream from the beginning of the ULPDU Length
       field to the first octet of the entire Marker.  The least
       significant two bits MUST always be set to zero at the
       transmitter, and the receivers MUST always treat these as zero
       for calculations.

4.3. MPA Markers

   MPA Markers are used to identify the start of FPDUs when packets are
   received out of order.  This is done by locating the Markers at fixed
   intervals in the data stream (which is correlated to the TCP sequence
   number) and using the Marker value to locate the preceding FPDU
   start.

   All MPA Markers are included in the containing FPDU CRC calculation
   (when both CRCs and Markers are in use).

   The MPA receiver's ability to locate out-of-order FPDUs and pass the
   ULPDUs to DDP is implementation dependent.  MPA/DDP allows those
   receivers that are able to deal with out-of-order FPDUs in this way
   to require the insertion of Markers in the data stream.  When the
   receiver cannot deal with out-of-order FPDUs in this way, it may
   disable the insertion of Markers at the sender.  All MPA senders MUST
   be able to generate Markers when their use is declared by the
   opposing receiver (see Section 7.1, Connection Setup).

   When Markers are enabled, MPA senders MUST insert a Marker into the
   data stream at a 512-octet periodic interval in the TCP Sequence
   Number Space.  The Marker contains a 16-bit unsigned integer referred
   to as the FPDUPTR (FPDU Pointer).

   If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16-bit
   relative back-pointer.  FPDUPTR MUST contain the number of octets in
   the TCP stream from the beginning of the ULPDU Length field to the
   first octet of the Marker, unless the Marker falls between FPDUs.
   Thus, the location of the first octet of the previous FPDU header can
   be determined by subtracting the value of the given Marker from the
   current octet-stream sequence number (i.e., TCP sequence number) of
   the first octet of the Marker.  Note that this computation MUST take
   into account that the TCP sequence number could have wrapped between
   the Marker and the header.

   An FPDUPTR value of 0x0000 is a special case -- it is used when the
   Marker falls exactly between FPDUs (between the preceding FPDU CRC
   field and the next FPDU's ULPDU Length field).  In this case, the
   Marker is considered to be contained in the following FPDU; the
   Marker MUST be included in the CRC calculation of the FPDU following
   the Marker (if CRCs are being generated or checked).  Thus, an
   FPDUPTR value of 0x0000 means that immediately following the Marker
   is an FPDU header (the ULPDU Length field).

   Since all FPDUs are integral multiples of 4 octets, the bottom two
   bits of the FPDUPTR as calculated by the sender are zero.  MPA
   reserves these bits so they MUST be treated as zero for computation
   at the receiver.
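
   The receiver-side computation described in the last three paragraphs
   can be sketched as below.  The function name `fpdu_header_seq` is
   hypothetical, and the Marker is assumed to be laid out as 16 reserved
   bits followed by a big-endian 16-bit FPDUPTR, per Figure 3.

```python
MOD = 1 << 32       # TCP sequence numbers wrap at 2**32
MARKER_LEN = 4


def fpdu_header_seq(marker_seq, marker):
    """Return the TCP sequence number of the ULPDU Length field that the
    4-octet `marker`, whose first octet is at `marker_seq`, refers to."""
    # The two low-order bits are reserved and treated as zero.
    fpduptr = int.from_bytes(marker[2:4], "big") & 0xFFFC
    if fpduptr == 0:
        # Marker falls between FPDUs: the header follows it immediately.
        return (marker_seq + MARKER_LEN) % MOD
    # Back-pointer; the modulo covers a wrap between header and Marker.
    return (marker_seq - fpduptr) % MOD


print(fpdu_header_seq(523, b"\x00\x00\x00\x0c"))
print(hex(fpdu_header_seq(4, b"\x00\x00\x00\x10")))
```

   In the second call the header lies 16 octets before a Marker at
   sequence number 4, so the result wraps to 0xfffffff4.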

   When Markers are enabled (see Section 7.1, Connection Setup), the MPA
   Markers MUST be inserted immediately preceding the first FPDU of Full
   Operation Phase, and at every 512th octet of the TCP octet stream
   thereafter.  As a result, the first Marker has an FPDUPTR value of
   0x0000.  If the first Marker begins at octet sequence number
   SeqStart, then Markers are inserted such that the first octet of the
   Marker is at octet sequence number SeqNum if the remainder of (SeqNum
   - SeqStart) mod 512 is zero.  Note that SeqNum can wrap.
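
   The placement rule can be expressed in a few lines.  The sketch
   below is illustrative only; the function names are assumptions, and
   the subtraction is done modulo 2^32 so that a wrapped SeqNum is
   handled.

```python
MARKER_INTERVAL = 512
SEQ_MOD = 1 << 32       # 32-bit TCP sequence number space


def is_marker_position(seq_num: int, seq_start: int) -> bool:
    """True if a Marker begins at seq_num, given the first Marker at seq_start."""
    return ((seq_num - seq_start) % SEQ_MOD) % MARKER_INTERVAL == 0


def marker_positions(seq_start: int, count: int) -> list:
    """Sequence numbers of the first `count` Markers (wrapping at 2^32)."""
    return [(seq_start + i * MARKER_INTERVAL) % SEQ_MOD for i in range(count)]
```

   For a connection whose first Marker is at sequence number 11,
   marker_positions(11, 3) returns [11, 523, 1035].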

   For example, if the TCP sequence number were used to calculate the
   insertion point of the Marker, the starting TCP sequence number is
   unlikely to be zero, and 512-octet multiples are unlikely to fall on
   a modulo 512 of zero.  If the MPA connection is started at TCP
   sequence number 11, then the 1st Marker will begin at 11, and
   subsequent Markers will begin at 523, 1035, etc.

   If an FPDU is large enough to contain multiple Markers, they MUST all
   point to the same point in the TCP stream: the first octet of the
   ULPDU Length field for the FPDU.

   If a Marker interval contains multiple FPDUs (the FPDUs are small),
   the Marker MUST point to the start of the ULPDU Length field for the
   FPDU containing the Marker unless the Marker falls between FPDUs, in
   which case the Marker MUST be zero.

   The following example shows an FPDU containing a Marker.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ULPDU Length (0x0010)   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                         ULPDU (octets 0-9)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            (0x0000)           |        FPDU ptr (0x000C)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ULPDU (octets 10-15)                   |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |          PAD (2 octets:0,0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              CRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4: Example FPDU Format with Marker

   MPA Receivers MUST preserve ULPDU boundaries when passing data to
   DDP.  MPA Receivers MUST pass the ULPDU data and the ULPDU Length to
   DDP and not the Markers, headers, and CRC.
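
   One way a receiver might recover the ULPDU while discarding Markers
   is sketched below.  It is an illustration under stated assumptions:
   the helper names are hypothetical, offsets are taken relative to the
   first Marker (so Markers sit at multiples of 512), and CRC checking
   is left out.

```python
MARKER_INTERVAL = 512
MARKER_LEN = 4


def read_skipping_markers(stream: bytes, pos: int, count: int):
    """Read `count` non-Marker octets starting at `pos`.

    Returns (data, new_pos).  Offsets are relative to the first Marker,
    which is an assumption of this sketch.
    """
    out = bytearray()
    while len(out) < count:
        if pos % MARKER_INTERVAL == 0:
            pos += MARKER_LEN                     # skip the 4-octet Marker
            continue
        room = MARKER_INTERVAL - pos % MARKER_INTERVAL
        step = min(room, count - len(out))
        if pos + step > len(stream):
            raise ValueError("stream is truncated")
        out += stream[pos:pos + step]
        pos += step
    return bytes(out), pos


def extract_ulpdu(stream: bytes, header_pos: int) -> bytes:
    """Return the ULPDU of the FPDU whose ULPDU Length field is at header_pos."""
    header, pos = read_skipping_markers(stream, header_pos, 2)
    ulpdu_len = int.from_bytes(header, "big")     # 16-bit length, network order
    ulpdu, _ = read_skipping_markers(stream, pos, ulpdu_len)
    return ulpdu
```

   Only the ULPDU octets are returned; the ULPDU Length is available as
   len() of the result, and the Markers, PAD, and CRC never reach the
   caller.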

4.4. CRC Calculation

   An MPA implementation MUST implement CRC support and MUST either:

   (1) always use CRCs; the MPA provider is not REQUIRED to support an
       administrator's request that CRCs not be used.

   or

   (2a) only indicate a preference not to use CRCs on the explicit
        request of the system administrator, via an interface not
        defined in this spec.  The default configuration for a
        connection MUST be to use CRCs.

   (2b) disable CRC checking (and possibly generation) if both the local
        and remote endpoints indicate preference not to use CRCs.

   An administrative decision to have a host request CRC suppression
   SHOULD NOT be made unless there is assurance that the TCP connection
   involved provides protection from undetected errors that is at least
   as strong as an end-to-end CRC32c.  End-to-end usage of an IPsec
   cryptographic integrity check is among the ways to provide such
   protection, and the use of channel bindings [NFSv4CHANNEL] by the ULP
   can provide a high level of assurance that the IPsec protection scope
   is end-to-end with respect to the ULP.

   The process MUST be invisible to the ULP.

   After receipt of an MPA startup declaration indicating that its peer
   requires CRCs, an MPA instance MUST continue generating and checking
   CRCs until the connection terminates.  If an MPA instance has
   declared that it does not require CRCs, it MUST turn off CRC checking
   immediately after receipt of an MPA mode declaration indicating that
   its peer also does not require CRCs.  It MAY continue generating
   CRCs.  See Section 7.1, Connection Setup, for details on the MPA
   startup.
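
   The resulting decision reduces to a single condition.  The function
   below is a hypothetical sketch; its name and boolean parameters are
   assumptions and do not correspond to fields defined by the startup
   exchange.

```python
def crc_checking_enabled(local_requires_crc: bool, peer_requires_crc: bool) -> bool:
    """CRC checking stays on unless BOTH endpoints declared they do not need it."""
    return local_requires_crc or peer_requires_crc
```

   The CRC field itself is still present in every FPDU regardless of
   the outcome.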

   When sending an FPDU, the sender MUST include a CRC field.  When CRCs
   are enabled, the CRC field in the MPA FPDU MUST be computed using the
   CRC32c polynomial in the manner described in the iSCSI Protocol
   [iSCSI] document for Header and Data Digests.

   The fields which MUST be included in the CRC calculation when sending
   an FPDU are as follows:

   1)  If a Marker does not immediately precede the ULPDU Length field,
       the CRC-32c is calculated from the first octet of the ULPDU
       Length field, through all the ULPDU and Markers (if present), to
       the last octet of the PAD (if present), inclusive.  If there is a
       Marker immediately following the PAD, the Marker is included in
       the CRC calculation for this FPDU.

   2)  If a Marker immediately precedes the first octet of the ULPDU
       Length field of the FPDU, (i.e., the Marker fell between FPDUs,
       and thus is required to be included in the second FPDU), the
       CRC-32c is calculated from the first octet of the Marker, through
       the ULPDU Length header, through all the ULPDU and Markers (if
       present), to the last octet of the PAD (if present), inclusive.

   3)  After calculating the CRC-32c, the resultant value is placed into
       the CRC field at the end of the FPDU.

   When an FPDU is received, and CRC checking is enabled, the receiver
   MUST first perform the following:

   1)  Calculate the CRC of the incoming FPDU in the same fashion as
       defined above.

   2)  Verify that the calculated CRC-32c value is the same as the
       received CRC-32c value found in the FPDU CRC field.  If not, the
       receiver MUST treat the FPDU as an invalid FPDU.

   The procedure for handling invalid FPDUs is covered in Section 8,
   Error Semantics.
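
   The generate-and-verify steps can be sketched with a table-driven
   CRC32c.  This is a minimal illustration, not a reference
   implementation: the function names are assumptions, `covered` stands
   for the octets selected by the coverage rules above, and the
   little-endian placement of the result in the CRC field is an
   assumption taken from the digest examples in [iSCSI] that should be
   checked against that document.

```python
def _make_table():
    poly = 0x82F63B78               # reflected form of the CRC-32C polynomial
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli) with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc_field(covered: bytes) -> bytes:
    """Four octets to place in the CRC field (octet order is an assumption)."""
    return crc32c(covered).to_bytes(4, "little")


def fpdu_crc_ok(covered: bytes, received_field: bytes) -> bool:
    """Receiver check: recompute over the same octets and compare."""
    return crc_field(covered) == received_field
```

   A quick sanity check of the CRC routine is the commonly published
   check value: crc32c(b"123456789") is 0xE3069283.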

   The following is an annotated hex dump of an example FPDU sent as the
   first FPDU on the stream.  As such, it starts with a Marker.  The
   FPDU contains a 42-octet ULPDU (an example DDP segment), which in
   turn carries 24 octets of Send Data, all zeros.  The CRC32c has been
   correctly calculated and can be used as a reference.  See the [DDP]
   and [RDMAP] specifications for definitions of the DDP Control field,
   Queue, MSN, MO, and Send Data.

       Octet Contents  Annotation
       Count

       0000    00      Marker: Reserved
       0001    00
       0002    00      Marker: FPDUPTR
       0003    00
       0004    00      ULPDU Length
       0005    2a
       0006    41      DDP Control Field, Send with Last flag set
       0007    43
       0008    00      Reserved (DDP STag position with no STag)
       0009    00
       000a    00
       000b    00
       000c    00      DDP Queue = 0
       000d    00
       000e    00
       000f    00
       0010    00      DDP MSN = 1
       0011    00
       0012    00
       0013    01
       0014    00      DDP MO = 0
       0015    00
       0016    00
       0017    00
       0018    00      DDP Send Data (24 octets of zeros)
       ...
       002f    00
       0030    52      CRC32c
       0031    23
       0032    99
       0033    83

                  Figure 5: Annotated Hex Dump of an FPDU

      The following is an example sent as the second FPDU of the stream
      where the first FPDU (which is not shown here) had a length of 492
      octets and was also a Send to Queue 0 with Last Flag set.  This
      example contains a Marker.

       Octet Contents  Annotation
       Count

       01ec    00      Length
       01ed    2a
       01ee    41      DDP Control Field: Send with Last Flag set
       01ef    43
       01f0    00      Reserved (DDP STag position with no STag)
       01f1    00
       01f2    00
       01f3    00
       01f4    00      DDP Queue = 0
       01f5    00
       01f6    00
       01f7    00
       01f8    00      DDP MSN = 2
       01f9    00
       01fa    00
       01fb    02
       01fc    00      DDP MO = 0
       01fd    00
       01fe    00
       01ff    00
       0200    00      Marker: Reserved
       0201    00
       0202    00      Marker: FPDUPTR
       0203    14
       0204    00      DDP Send Data (24 octets of zeros)
       ...
       021b    00
       021c    84      CRC32c
       021d    92
       021e    58
       021f    98

            Figure 6: Annotated Hex Dump of an FPDU with Marker
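
   The offsets in the two hex dumps can be reproduced with a small
   layout calculation.  The sketch below is illustrative only: the
   function name is an assumption, offsets are relative to the first
   Marker (so Markers sit at multiples of 512), and no CRC value is
   computed.

```python
MARKER_INTERVAL = 512
MARKER_LEN = 4
CRC_LEN = 4


def fpdu_layout(start: int, ulpdu_len: int):
    """Lay out one FPDU that begins at stream offset `start`.

    Returns (markers, crc_offset, end_offset), where markers is a list
    of (marker_offset, fpduptr) pairs belonging to this FPDU.
    """
    markers = []
    pos = start
    # A Marker exactly between FPDUs belongs to this FPDU with FPDUPTR 0.
    if pos % MARKER_INTERVAL == 0:
        markers.append((pos, 0))
        pos += MARKER_LEN
    header_pos = pos
    # 2-octet ULPDU Length + ULPDU, padded to a multiple of 4 octets.
    remaining = 2 + ulpdu_len
    remaining += (-remaining) % 4
    while remaining > 0:
        if pos % MARKER_INTERVAL == 0:
            markers.append((pos, pos - header_pos))
            pos += MARKER_LEN
        room = MARKER_INTERVAL - pos % MARKER_INTERVAL
        step = min(room, remaining)
        pos += step
        remaining -= step
    # A Marker may also fall immediately after the PAD, before the CRC.
    if pos % MARKER_INTERVAL == 0:
        markers.append((pos, pos - header_pos))
        pos += MARKER_LEN
    return markers, pos, pos + CRC_LEN
```

   For the first example, fpdu_layout(0, 42) gives one Marker at offset
   0 with FPDUPTR 0, the CRC at 0x30, and an end offset of 0x34 (52
   octets).  For the second, fpdu_layout(0x1EC, 42) gives one Marker at
   0x200 with FPDUPTR 0x14 and the CRC at 0x21C.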

4.5. FPDU Size Considerations

   MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as
   the size of the largest ULPDU fitting in an FPDU.  For an empty TCP
   Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus
   space for Markers and pad octets.

       The maximum ULPDU Length for a single ULPDU when Markers are
       present MUST be computed as:

       MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)

   The formula above accounts for the worst-case number of Markers.

       The maximum ULPDU Length for a single ULPDU when Markers are NOT
       present MUST be computed as:

       MULPDU = EMSS - (6 + EMSS mod 4)

   As a further optimization of the wire efficiency, an MPA
   implementation MAY dynamically adjust the MULPDU (see Section 5 for
   latency and wire efficiency trade-offs).  When one or more FPDUs are
   already packed into a TCP Segment, MULPDU MAY be reduced accordingly.

   DDP SHOULD provide ULPDUs that are as large as possible, but less
   than or equal to MULPDU.

   If the TCP implementation needs to adjust EMSS to support MTU changes
   or changing TCP options, the MULPDU value is changed accordingly.

   In certain rare situations, the EMSS may shrink below 128 octets in
   size.  If this occurs, the MPA on TCP sender MUST NOT shrink the
   MULPDU below 128 octets and is not required to follow the
   segmentation rules in Section 5.1 and Appendix A.

   If one or more FPDUs are already packed into a TCP segment, such that
   the remaining room is less than 128 octets, MPA MUST NOT provide a
   MULPDU smaller than 128.  In this case, MPA would typically provide a
   MULPDU for the next full-sized segment, but may still pack the next
   FPDU into the small remaining room, provided that the next FPDU is
   small enough to fit.

   The value 128 is chosen so as to allow DDP designers room for the DDP
   Header and some user data.
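
   The two MULPDU formulas can be sketched as a single function.  This
   is an illustration only; the function name and the `markers`
   parameter are assumptions, and the 128-octet floor is applied only in
   the case the text describes (EMSS below 128).

```python
FPDU_OVERHEAD = 6        # 2-octet ULPDU Length + 4-octet CRC
MARKER_LEN = 4
MARKER_INTERVAL = 512
MIN_MULPDU = 128


def mulpdu(emss: int, markers: bool = True) -> int:
    """Largest ULPDU that fits in an FPDU carried by an empty TCP segment."""
    if emss < MIN_MULPDU:
        # Rare case: the MULPDU is not shrunk below 128 octets.
        return MIN_MULPDU
    overhead = FPDU_OVERHEAD + emss % 4
    if markers:
        # Integer ceiling of emss / 512: worst-case number of Markers.
        overhead += MARKER_LEN * -(-emss // MARKER_INTERVAL)
    return emss - overhead
```

   For an EMSS of 1460 octets, the sketch gives 1442 with Markers and
   1454 without.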

