
RFC 5040

A Remote Direct Memory Access Protocol Specification

Pages: 66
Proposed Standard
Updated by:  7146
Part 2 of 3 – Pages 19 to 46

Top   ToC   RFC5040 - Page 19   prevText

4. Header Format

The control information of RDMA Messages is included in DDP protocol-defined header fields, with the following exceptions:

   *  The first octet reserved for ULP usage on all DDP Messages in the
      DDP Protocol (i.e., the RsvdULP Field) is used by RDMAP to carry
      the RDMA Message Opcode and the RDMAP version.  This octet is
      known as the RDMAP Control Field in this specification.  For Send
      with Invalidate and Send with Solicited Event and Invalidate,
      RDMAP uses the second through fifth octets, provided by DDP on
      Untagged DDP Messages, to carry the STag that will be Invalidated.

   *  The RDMA Message length is passed by the RDMAP layer to the DDP
      layer on all outbound transfers.

   *  For RDMA Read Request Messages, the RDMA Read Message Size is
      included in the RDMA Read Request Header.

   *  The RDMA Message length is passed to the RDMAP layer by the DDP
      layer on inbound Untagged Buffer transfers.

   *  Two RDMA Messages carry additional RDMAP headers.  The RDMA Read
      Request carries the Data Sink and Data Source buffer descriptions,
      including buffer length.  The Terminate carries additional
      information associated with the error that caused the Terminate.

4.1. RDMAP Control and Invalidate STag Field

The version of RDMAP defined by this specification uses all 8 bits of the RDMAP Control Field. The first octet reserved for ULP use in the DDP Protocol MUST be used by the RDMAP to carry the RDMAP Control Field. The ordering of the bits in the first octet MUST be as defined in Figure 3, "DDP Control, RDMAP Control, and Invalidate STag Fields". For Send with Invalidate and Send with Solicited Event and Invalidate, the second through fifth octets of the DDP RsvdULP field MUST be used by RDMAP to carry the Invalidate STag.

Figure 3 depicts the format of the DDP Control and RDMAP Control fields. (Note: In Figure 3, the DDP Header is offset by 16 bits to accommodate the MPA header defined in [MPA]. The MPA header is only present if DDP is layered on top of MPA.)

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                    |T|L| Resrv | DV| RV|Rsv| Opcode|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Invalidate STag                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 3: DDP Control, RDMAP Control, and Invalidate STag Fields

All RDMA Messages handed by the RDMAP layer to the DDP layer MUST define the value of the Tagged flag in the DDP Header. Figure 4, "RDMA Usage of DDP Fields", MUST be used to define the value of the Tagged flag that is handed to the DDP layer for each RDMA Message.

Figure 4 defines the value of the RDMA Opcode field that MUST be used for each RDMA Message. Figure 4 defines when the STag, Queue Number, and Tagged Offset fields MUST be provided for each RDMA Message.

   For this version of the RDMAP, all RDMA Messages MUST have:

   *  Bits 24-25; RDMA Version field: 01b for an RNIC that complies with
      this RDMA protocol specification.  00b for an RNIC that complies
      with the RDMA Consortium's RDMA protocol specification.  Both
      version numbers are valid.  Interoperability is dependent on MPA
      protocol version negotiation (e.g., MPA marker and MPA CRC).

   *  Bits 26-27; Reserved.  MUST be set to zero by sender, ignored by
      the receiver.

   *  Bits 28-31; OpCode field: see Figure 4.

   *  Bits 32-63; Invalidate STag.  However, this field is only valid
      for Send with Invalidate and Send with Solicited Event and
      Invalidate Messages (see Figure 4).

      For Send, Send with Solicited Event, RDMA Read Request, and
      Terminate, the Invalidate STag field MUST be set to zero on
      transmit and ignored by the receiver.
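
The bit assignments above can be illustrated with a minimal Python sketch that packs and unpacks the RDMAP Control octet (bits 24-31 of the word in Figure 3). The function and constant names are hypothetical and not part of this specification; the sketch assumes the usual convention that the lowest-numbered bit in the figure is the most significant bit of the octet.

```python
from typing import Tuple

RDMAP_VERSION_IETF = 0b01   # RNIC complying with this specification
RDMAP_VERSION_RDMAC = 0b00  # RNIC complying with the RDMA Consortium specification


def pack_rdmap_control(opcode: int, version: int = RDMAP_VERSION_IETF) -> int:
    """Build the octet: RV (2 bits) | Reserved (2 bits) | OpCode (4 bits)."""
    if not 0 <= opcode <= 0b1111:
        raise ValueError("OpCode must fit in 4 bits")
    if version not in (RDMAP_VERSION_IETF, RDMAP_VERSION_RDMAC):
        raise ValueError("unknown RDMA Version value")
    # The Reserved bits (26-27) are always transmitted as zero.
    return (version << 6) | opcode


def unpack_rdmap_control(octet: int) -> Tuple[int, int]:
    """Return (version, opcode); the Reserved bits are ignored on receive."""
    return (octet >> 6) & 0b11, octet & 0b1111
```

For example, a Send (OpCode 0011b) from an RNIC that complies with this specification would be packed as 0x43 under these assumptions.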

   -------+-----------+-------+------+-------+-----------+--------------
   RDMA   | Message   | Tagged| STag | Queue | Invalidate| Message
   Message| Type      | Flag  | and  | Number| STag      | Length
   OpCode |           |       | TO   |       |           | Communicated
          |           |       |      |       |           | between DDP
          |           |       |      |       |           | and RDMAP
   -------+-----------+-------+------+-------+-----------+--------------
   0000b  | RDMA Write| 1     | Valid| N/A   | N/A       | Yes
          |           |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0001b  | RDMA Read | 0     | N/A  | 1     | N/A       | Yes
          | Request   |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0010b  | RDMA Read | 1     | Valid| N/A   | N/A       | Yes
          | Response  |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0011b  | Send      | 0     | N/A  | 0     | N/A       | Yes
          |           |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0100b  | Send with | 0     | N/A  | 0     | Valid     | Yes
          | Invalidate|       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0101b  | Send with | 0     | N/A  | 0     | N/A       | Yes
          | SE        |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0110b  | Send with | 0     | N/A  | 0     | Valid     | Yes
          | SE and    |       |      |       |           |
          | Invalidate|       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0111b  | Terminate | 0     | N/A  | 2     | N/A       | Yes
          |           |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   1000b  |           |
   to     | Reserved  |               Not Specified
   1111b  |           |
   -------+-----------+-------------------------------------------------

                    Figure 4: RDMA Usage of DDP Fields

   Note:  N/A means Not Applicable.
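
As an illustration only, Figure 4 can be held as a lookup table that an implementation consults before handing a Message to the DDP layer. The type and function names below are hypothetical; `None` stands for the N/A entries in the Queue Number column.

```python
from typing import NamedTuple, Optional


class DdpUsage(NamedTuple):
    name: str
    tagged: bool                  # value of the Tagged flag handed to DDP
    queue_number: Optional[int]   # None where Figure 4 says N/A
    invalidate_stag_valid: bool   # True where Figure 4 says Valid


# One entry per non-reserved OpCode row of Figure 4.
FIGURE_4 = {
    0b0000: DdpUsage("RDMA Write", True, None, False),
    0b0001: DdpUsage("RDMA Read Request", False, 1, False),
    0b0010: DdpUsage("RDMA Read Response", True, None, False),
    0b0011: DdpUsage("Send", False, 0, False),
    0b0100: DdpUsage("Send with Invalidate", False, 0, True),
    0b0101: DdpUsage("Send with SE", False, 0, False),
    0b0110: DdpUsage("Send with SE and Invalidate", False, 0, True),
    0b0111: DdpUsage("Terminate", False, 2, False),
}


def ddp_usage(opcode: int) -> DdpUsage:
    """Look up the DDP field usage for an OpCode; 1000b-1111b are reserved."""
    if opcode not in FIGURE_4:
        raise ValueError("reserved or invalid RDMA OpCode: %r" % (opcode,))
    return FIGURE_4[opcode]
```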

4.2. RDMA Message Definitions

The following figure defines which RDMA Headers MUST be used on each RDMA Message and which RDMA Messages are allowed to carry ULP Payload:

   -------+-----------+-------------------+-------------------------
   RDMA   | Message   | RDMA Header Used  | ULP Message allowed in
   Message| Type      |                   | the RDMA Message
   OpCode |           |                   |
          |           |                   |
   -------+-----------+-------------------+-------------------------
   0000b  | RDMA Write| None              | Yes
          |           |                   |
   -------+-----------+-------------------+-------------------------
   0001b  | RDMA Read | RDMA Read Request | No
          | Request   | Header            |
   -------+-----------+-------------------+-------------------------
   0010b  | RDMA Read | None              | Yes
          | Response  |                   |
   -------+-----------+-------------------+-------------------------
   0011b  | Send      | None              | Yes
          |           |                   |
   -------+-----------+-------------------+-------------------------
   0100b  | Send with | None              | Yes
          | Invalidate|                   |
   -------+-----------+-------------------+-------------------------
   0101b  | Send with | None              | Yes
          | SE        |                   |
   -------+-----------+-------------------+-------------------------
   0110b  | Send with | None              | Yes
          | SE and    |                   |
          | Invalidate|                   |
   -------+-----------+-------------------+-------------------------
   0111b  | Terminate | Terminate Header  | No
          |           |                   |
   -------+-----------+-------------------+-------------------------
   1000b  |           |
   to     | Reserved  |            Not Specified
   1111b  |           |
   -------+-----------+-------------------+-------------------------

                  Figure 5: RDMA Message Definitions

4.3. RDMA Write Header

The RDMA Write Message does not include an RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP Control Field. The RDMA Write Message is fully described by the DDP Headers of the DDP Segments associated with the Message. See Appendix A for a description of the DDP Segment format associated with RDMA Write Messages.

4.4. RDMA Read Request Header

The RDMA Read Request Message carries an RDMA Read Request Header that describes the Data Sink and Data Source Buffers used by the RDMA Read operation. The RDMA Read Request Header immediately follows the DDP header. The RDMAP layer passes to the DDP layer an RDMAP Control Field. The following figure depicts the RDMA Read Request Header that MUST be used for all RDMA Read Request Messages:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Data Sink STag (SinkSTag)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +                  Data Sink Tagged Offset (SinkTO)             +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  RDMA Read Message Size (RDMARDSZ)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Data Source STag (SrcSTag)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +                 Data Source Tagged Offset (SrcTO)             +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 6: RDMA Read Request Header Format

      Data Sink Steering Tag: 32 bits.

           The Data Sink Steering Tag identifies the Data Sink's Tagged
           Buffer.  This field MUST be copied, without interpretation,
           from the RDMA Read Request into the corresponding RDMA Read
           Response; this field allows the Data Sink to place the
           returning data.  The STag is associated with the RDMAP Stream
           through a mechanism that is outside the scope of the RDMAP
           specification.

      Data Sink Tagged Offset: 64 bits.

           The Data Sink Tagged Offset specifies the starting offset, in
           octets, from the base of the Data Sink's Tagged Buffer, where
           the data is to be written by the Data Source.  This field is
           copied from the RDMA Read Request into the corresponding RDMA
           Read Response and allows the Data Sink to place the returning
           data.  The Data Sink Tagged Offset MAY start at an arbitrary
           offset.

           The Data Sink STag and Data Sink Tagged Offset fields
           describe the buffer to which the RDMA Read data is written.

           Note: the DDP layer protects against a wrap of the Data Sink
           Tagged Offset.

      RDMA Read Message Size: 32 bits.

           The RDMA Read Message Size is the amount of data, in octets,
           read from the Data Source.  A single RDMA Read Request
           Message can retrieve from 0 to 2^32-1 data octets from the
           Data Source.

      Data Source Steering Tag: 32 bits.

           The Data Source Steering Tag identifies the Data Source's
           Tagged Buffer.  The STag is associated with the RDMAP Stream
           through a mechanism that is outside the scope of the RDMAP
           specification.

      Data Source Tagged Offset: 64 bits.

           The Tagged Offset specifies the starting offset, in octets,
           that is to be read from the Data Source's Tagged Buffer.  The
           Data Source Tagged Offset MAY start at an arbitrary offset.

           The Data Source STag and Data Source Tagged Offset fields
           describe the buffer from which the RDMA Read data is read.

   See Section 7.2, "Errors Detected at the Remote Peer on Incoming RDMA
   Messages", for a description of error checking required upon
   processing of an RDMA Read Request at the Data Source.
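
The field widths above sum to 224 bits (28 octets). A minimal Python sketch of serializing and parsing the RDMA Read Request Header follows; the class and attribute names are hypothetical, and the sketch assumes the fields are carried in network byte order, as is conventional for header diagrams of this style.

```python
import struct
from typing import NamedTuple

# SinkSTag (32), SinkTO (64), RDMARDSZ (32), SrcSTag (32), SrcTO (64),
# big-endian with no padding: 28 octets in total.
_READ_REQUEST = struct.Struct(">IQIIQ")


class ReadRequestHeader(NamedTuple):
    sink_stag: int
    sink_to: int
    rdmardsz: int
    src_stag: int
    src_to: int

    def pack(self) -> bytes:
        """Serialize the five fields in the order shown in Figure 6."""
        return _READ_REQUEST.pack(*self)

    @classmethod
    def unpack(cls, data: bytes) -> "ReadRequestHeader":
        """Parse the first 28 octets of data as a Read Request Header."""
        return cls(*_READ_REQUEST.unpack(data[:_READ_REQUEST.size]))
```

Packing a header and unpacking the result should return the original field values; a buffer shorter than 28 octets makes `struct` raise an error rather than silently truncating.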

4.5. RDMA Read Response Header

The RDMA Read Response Message does not include an RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP Control Field. The RDMA Read Response Message is fully described by the DDP Headers of the DDP Segments associated with the Message. See Appendix A for a description of the DDP Segment format associated with RDMA Read Response Messages.

4.6. Send Header and Send with Solicited Event Header

The Send and Send with Solicited Event Messages do not include an RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP Control Field. The Send and Send with Solicited Event Messages are fully described by the DDP Headers of the DDP Segments associated with the Messages. See Appendix A for a description of the DDP Segment format associated with Send and Send with Solicited Event Messages.

4.7. Send with Invalidate Header and Send with SE and Invalidate Header

The Send with Invalidate and Send with Solicited Event and Invalidate Messages do not include an RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP Control Field and the Invalidate STag field (see Section 4.1, "RDMAP Control and Invalidate STag Field"). The Send with Invalidate and Send with Solicited Event and Invalidate Messages are fully described by the DDP Headers of the DDP Segments associated with the Messages. See Appendix A for a description of the DDP Segment format associated with Send with Invalidate and Send with Solicited Event and Invalidate Messages.

4.8. Terminate Header

The Terminate Message carries a Terminate Header that contains additional information associated with the cause of the Terminate. The Terminate Header immediately follows the DDP header. The RDMAP layer passes to the DDP layer an RDMAP Control Field. The following figure depicts a Terminate Header that MUST be used for the Terminate Message:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       Terminate Control             |      Reserved           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  DDP Segment Length  (if any) |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
    |                                                               |
    //                                                             //
    |                  Terminated DDP Header (if any)               |
    +                                                               +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    //                                                             //
    |                 Terminated RDMA Header (if any)               |
    +                                                               +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 7: Terminate Header Format

      Terminate Control: 19 bits.

          The Terminate Control field MUST have the format defined in
          Figure 8 below.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Layer | EType |   Error Code  |HdrCt|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 8: Terminate Control Field

   *  Figure 9, "Terminate Control Field Values", defines the valid
      values that MUST be used for this field.

      *  Layer: 4 bits.

         Identifies the layer that encountered the error.

      *  EType (RDMA Error Type): 4 bits.

         Identifies the type of error that caused the Terminate.  When
         the error is detected at the RDMAP layer, the RDMAP layer
         inserts the Error Type into this field.  When the error is
         detected at an LLP layer, the LLP layer creates the Error Type,
         the DDP layer passes it up to the RDMAP layer, and the RDMAP
         layer inserts it into this field.

      *  Error Code: 8 bits.

         This field identifies the specific error that caused the
         Terminate.  When the error is detected at the RDMAP layer, the
         RDMAP layer creates the Error Code.  When the error is detected
         at an LLP layer, the LLP layer creates the Error Code, the DDP
         layer passes it up to the RDMAP layer, and the RDMAP layer
         inserts it into this field.

      *  HdrCt: 3 bits.

         Header control bits:

         *  M: bit 16.  DDP Segment Length valid.  See Figure 10 for
            when this bit SHOULD be set.

         *  D: bit 17.  DDP Header Included.  See Figure 10 for when
            this bit SHOULD be set.

         *  R: bit 18.  RDMAP Header Included.  See Figure 10 for when
            this bit SHOULD be set.
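
To make the bit positions concrete, here is a minimal Python sketch that packs the 19-bit Terminate Control field together with the 13 Reserved bits into the first 32-bit word of the Terminate Header. The function names are hypothetical, and bit 0 of Figure 8 is assumed to be the most significant bit of the word.

```python
from typing import Dict


def pack_terminate_control(layer: int, etype: int, error_code: int,
                           m: bool = False, d: bool = False,
                           r: bool = False) -> int:
    """Return the first 32-bit word of the Terminate Header."""
    if not 0 <= layer <= 0xF or not 0 <= etype <= 0xF:
        raise ValueError("Layer and EType are 4-bit fields")
    if not 0 <= error_code <= 0xFF:
        raise ValueError("Error Code is an 8-bit field")
    word = (layer << 28) | (etype << 24) | (error_code << 16)
    # HdrCt: M is bit 16, D is bit 17, R is bit 18 (counted from the left).
    word |= (int(m) << 15) | (int(d) << 14) | (int(r) << 13)
    return word  # the low 13 Reserved bits stay zero on transmit


def unpack_terminate_control(word: int) -> Dict[str, int]:
    """Split the word back into its fields; Reserved bits are ignored."""
    return {
        "layer": (word >> 28) & 0xF,
        "etype": (word >> 24) & 0xF,
        "error_code": (word >> 16) & 0xFF,
        "m": (word >> 15) & 1,
        "d": (word >> 14) & 1,
        "r": (word >> 13) & 1,
    }
```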

   -------+-----------+-------+-------------+------+--------------------
   Layer  | Layer     | Error | Error Type  | Error| Error Code Name
          | Name      | Type  | Name        | Code |
   -------+-----------+-------+-------------+------+--------------------
          |           | 0000b | Local       | None | None - This error
          |           |       | Catastrophic|      | type does not have
          |           |       | Error       |      | an error code. Any
          |           |       |             |      | value in this field
          |           |       |             |      | is acceptable.
          |           +-------+-------------+------+--------------------
          |           |       |             | 00X  | Invalid STag
          |           |       |             +------+--------------------
          |           |       |             | 01X  | Base or bounds
          |           |       |             |      | violation
          |           |       | Remote      +------+--------------------
          |           | 0001b | Protection  | 02X  | Access rights
          |           |       | Error       |      | violation
          |           |       |             +------+--------------------
   0000b  | RDMA      |       |             | 03X  | STag not associated
          |           |       |             |      | with RDMAP Stream
          |           |       |             +------+--------------------
          |           |       |             | 04X  | TO wrap
          |           |       |             +------+--------------------
          |           |       |             | 09X  | STag cannot be
          |           |       |             |      | Invalidated
          |           |       |             +------+--------------------
          |           |       |             | FFX  | Unspecified Error
          |           +-------+-------------+------+--------------------
          |           |       |             | 05X  | Invalid RDMAP
          |           |       |             |      | version
          |           |       |             +------+--------------------
          |           |       |             | 06X  | Unexpected OpCode
          |           |       | Remote      +------+--------------------
          |           | 0010b | Operation   | 07X  | Catastrophic error,
          |           |       | Error       |      | localized to RDMAP
          |           |       |             |      | Stream
          |           |       |             +------+--------------------
          |           |       |             | 08X  | Catastrophic error,
          |           |       |             |      | global
          |           |       |             +------+--------------------
          |           |       |             | 09X  | STag cannot be
          |           |       |             |      | Invalidated
          |           |       |             +------+--------------------
          |           |       |             | FFX  | Unspecified Error
   -------+-----------+-------+-------------+------+--------------------
   0001b  | DDP       | See DDP Specification [DDP] for a description of
          |           | the values and names.
   -------+-----------+-------+-----------------------------------------
   0010b  | LLP       | For MPA, see MPA Specification [MPA] for a
          |(e.g., MPA)| description of the values and names.
   -------+-----------+-------+-----------------------------------------

              Figure 9: Terminate Control Field Values
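
For diagnostics, the RDMA-layer rows of Figure 9 can be turned into a small decoding table. The following Python sketch is illustrative only; the names are hypothetical, and it covers only Layer 0000b, since the DDP and LLP values are defined in other specifications.

```python
ETYPE_NAMES = {
    0b0000: "Local Catastrophic Error",
    0b0001: "Remote Protection Error",
    0b0010: "Remote Operation Error",
}

# Keys are (Error Type, Error Code) pairs taken from Figure 9.
ERROR_CODE_NAMES = {
    (0b0001, 0x00): "Invalid STag",
    (0b0001, 0x01): "Base or bounds violation",
    (0b0001, 0x02): "Access rights violation",
    (0b0001, 0x03): "STag not associated with RDMAP Stream",
    (0b0001, 0x04): "TO wrap",
    (0b0001, 0x09): "STag cannot be Invalidated",
    (0b0001, 0xFF): "Unspecified Error",
    (0b0010, 0x05): "Invalid RDMAP version",
    (0b0010, 0x06): "Unexpected OpCode",
    (0b0010, 0x07): "Catastrophic error, localized to RDMAP Stream",
    (0b0010, 0x08): "Catastrophic error, global",
    (0b0010, 0x09): "STag cannot be Invalidated",
    (0b0010, 0xFF): "Unspecified Error",
}


def describe_rdmap_error(etype: int, error_code: int) -> str:
    """Describe an RDMA-layer (Layer 0000b) Terminate error."""
    if etype not in ETYPE_NAMES:
        return "unknown Error Type %d" % etype
    if etype == 0b0000:
        # Local Catastrophic Error has no error code; any value is accepted.
        return ETYPE_NAMES[etype]
    detail = ERROR_CODE_NAMES.get(
        (etype, error_code), "unknown Error Code 0x%02X" % error_code)
    return "%s: %s" % (ETYPE_NAMES[etype], detail)
```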

      Reserved: 13 bits.  This field MUST be set to zero on transmit,
      ignored on receive.

      DDP Segment Length: 16 bits

           The length handed up by the DDP layer when the error was
           detected.  It MUST be valid if the M bit is set.  It MUST be
           present when the D bit is set.

      Terminated DDP Header: 112 bits for Tagged Messages and 144 bits
      for Untagged Messages.

           The DDP Header of the incoming Message that is associated
           with the Terminate.  The DDP Header is not present if the
           Terminate Error Type is a Local Catastrophic Error.  It MUST
           be present if the D bit is set.

      Terminated RDMA Header: 224 bits.

           The Terminated RDMA Header is only sent back if the terminate
           is associated with an RDMA Read Request Message.  It MUST be
           present if the R bit is set.

           If the terminate occurs before the first RDMA Read Request
           byte is processed, the original RDMA Read Request Header is
           sent back.

           If the terminate occurs after the first RDMA Read Request
           byte is processed, the RDMA Read Request Header is updated to
           reflect the current location of the RDMA Read operation that
           is in process:

               *  Data Sink STag = Data Sink STag originally sent in the
                  RDMA Read Request.

               *  Data Sink Tagged Offset = Current offset into the Data
                  Sink Tagged Buffer.  For example, if the RDMA Read
                  Request was terminated after 2048 octets were sent,
                  then the Data Sink Tagged Offset = the original Data
                  Sink Tagged Offset + 2048.

               *  RDMA Read Message Size = Number of octets left to
                  transfer.

               *  Data Source STag = Data Source STag in the RDMA Read
                  Request.

               *  Data Source Tagged Offset = Current offset into the
                  Data Source Tagged Buffer.  For example, if the RDMA
                  Read Request was terminated after 2048 octets were
                  sent, then the Data Source Tagged Offset = the
                  original Data Source Tagged Offset + 2048.
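
The update rules above amount to simple offset arithmetic. A minimal Python sketch, using hypothetical names and treating the header as a plain tuple of its five fields, is:

```python
from typing import NamedTuple


class ReadRequestHeader(NamedTuple):
    sink_stag: int
    sink_to: int
    rdmardsz: int   # RDMA Read Message Size, in octets
    src_stag: int
    src_to: int


def terminated_read_header(original: ReadRequestHeader,
                           octets_sent: int) -> ReadRequestHeader:
    """Header to return in a Terminate after octets_sent octets were sent."""
    if not 0 <= octets_sent <= original.rdmardsz:
        raise ValueError("octets_sent is outside the requested range")
    # Both offsets advance by the amount sent; the size shrinks by the
    # same amount; the STags are carried over unchanged.
    return original._replace(
        sink_to=original.sink_to + octets_sent,
        rdmardsz=original.rdmardsz - octets_sent,
        src_to=original.src_to + octets_sent,
    )
```

With `octets_sent` equal to zero the function returns the original header, which matches the case where the terminate occurs before the first byte is processed.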

   Note: if a given LLP does not define any termination codes for the
   RDMAP Termination message to use, then none would be used for that
   LLP.

   Figure 10, "Error Type to RDMA Message Mapping", maps layer name and
   error types to each RDMA Message type:

   ---------+-------------+------------+------------+-----------------
   Layer    | Error Type  | Terminate  | Terminate  | What type of
   Name     | Name        | Includes   | Includes   | RDMA Message can
            |             | DDP Header | RDMA Header| cause the error
            |             | and DDP    |            |
            |             | Segment    |            |
            |             | Length     |            |
   ---------+-------------+------------+------------+-----------------
            | Local       | No         | No         | Any
            | Catastrophic|            |            |
            | Error       |            |            |
            +-------------+------------+------------+-----------------
            | Remote      | Yes, if    | Yes        | Only RDMA Read
   RDMA     | Protection  | possible   |            | Request, Send
            | Error       |            |            | with Invalidate,
            |             |            |            | and Send with SE
            |             |            |            | and Invalidate
            +-------------+------------+------------+-----------------
            | Remote      | Yes, if    | No         | Any
            | Operation   | possible   |            |
            | Error       |            |            |
   ---------+-------------+------------+------------+-----------------
   DDP      | See DDP Spec| Yes        | No         | Any
            | [DDP]       |            |            |
   ---------+-------------+------------+------------+-----------------
   LLP      | See LLP Spec| No         | No         | Any
            | (e.g., MPA) |            |            |

            Figure 10: Error Type to RDMA Message Mapping

5. Data Transfer

5.1. RDMA Write Message

An RDMA Write is used by the Data Source to transfer data to a previously Advertised Tagged Buffer at the Data Sink. The RDMA Write Message has the following semantics:

   *  An RDMA Write Message MUST reference a Tagged Buffer.  That is,
      the Data Source RDMAP layer MUST request that the DDP layer mark
      the Message as Tagged.

   *  A valid RDMA Write Message MUST NOT be delivered to the Data
      Sink's ULP (i.e., it is placed by the DDP layer).

   *  At the Remote Peer, when an invalid RDMA Write Message is
      delivered to the Remote Peer's RDMAP layer, an error is surfaced
      (see Section 7.1, "RDMAP Error Surfacing").

   *  The Tagged Offset of a Tagged Buffer MAY start at a non-zero
      value.

   *  An RDMA Write Message MAY target all or part of a previously
      Advertised Buffer.

   *  The RDMAP does not define how the buffer(s) are used by an
      outbound RDMA Write or how they are addressed.  For example, an
      implementation of RDMA may choose to allow a gather-list of non-
      contiguous data blocks to be the source of an RDMA Write.  In this
      case, the data blocks would be combined by the Data Source and
      sent as a single RDMA Write Message to the Data Sink.

   *  The Data Source RDMAP layer MUST issue RDMA Write Messages to the
      DDP layer in the order they were submitted by the ULP.

   *  At the Data Source, a subsequent Send (Send with Invalidate, Send
      with Solicited Event, or Send with Solicited Event and Invalidate)
      Message MAY be used to signal Delivery of previous RDMA Write
      Messages to the Data Sink, if the ULP chooses to signal Delivery
      in this fashion.

   *  If the Local Peer wishes to write to multiple Tagged Buffers on
      the Remote Peer, the Local Peer MUST use multiple RDMA Write
      Messages.  That is, a single RDMA Write Message can only write to
      one remote Tagged Buffer.

   *  The Data Source MAY issue a zero-length RDMA Write Message.

5.2. RDMA Read Operation

The RDMA Read operation MUST consist of a single RDMA Read Request Message and a single RDMA Read Response Message.

5.2.1. RDMA Read Request Message

An RDMA Read Request is used by the Data Sink to transfer data from a previously Advertised Tagged Buffer at the Data Source to a Tagged Buffer at the Data Sink. The RDMA Read Request Message has the following semantics:

   *  An RDMA Read Request Message MUST reference an Untagged Buffer.
      That is, the Local Peer's RDMAP layer MUST request that the DDP
      mark the Message as Untagged.

   *  One RDMA Read Request Message MUST consume one Untagged Buffer.

   *  The Remote Peer's RDMAP layer MUST process an RDMA Read Request
      Message.  A valid RDMA Read Request Message MUST NOT be delivered
      to the Data Sink's ULP (i.e., it is processed by the RDMAP layer).

   *  At the Remote Peer, when an invalid RDMA Read Request Message is
      delivered to the Remote Peer's RDMAP layer, an error is surfaced
      (see Section 7.1, "RDMAP Error Surfacing").

   *  An RDMA Read Request Message MUST reference the RDMA Read Request
      Queue.  That is, the Local Peer's RDMAP layer MUST request that
      the DDP layer set the Queue Number field to one.

   *  The Local Peer MUST pass to the DDP layer RDMA Read Request
      Messages in the order they were submitted by the ULP.

   *  The Remote Peer MUST process the RDMA Read Request Messages in the
      order they were sent.

   *  If the Local Peer wishes to read from multiple Tagged Buffers on
      the Remote Peer, the Local Peer MUST use multiple RDMA Read
      Request Messages.  That is, a single RDMA Read Request Message
      MUST only read from one remote Tagged Buffer.

   *  An RDMA Read Request Message MAY target all or part of a
      previously Advertised Buffer.

   *  If the Data Source receives a valid RDMA Read Request Message, it
      MUST respond with a valid RDMA Read Response Message.

   *  The Data Sink MAY issue a zero-length RDMA Read Request Message by
      setting the RDMA Read Message Size field to zero in the RDMA Read
      Request Header.

   *  If the Data Source receives a non-zero-length RDMA Read Message
      Size, the Data Source RDMAP MUST validate the Data Source STag and
      Data Source Tagged Offset contained in the RDMA Read Request
      Header.

   *  If the Data Source receives an RDMA Read Request Header with the
      RDMA Read Message Size set to zero, the Data Source RDMAP:

      *  MUST NOT validate the Data Source STag and Data Source Tagged
         Offset contained in the RDMA Read Request Header, and

      *  MUST respond with a zero-length RDMA Read Response Message.
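
The zero-length rule in the last bullets can be sketched as follows. This is an illustrative outline under stated assumptions, not an implementation of an RNIC: `validate` and `read` are hypothetical callbacks standing in for the local STag checks and the buffer access, and returning `None` stands in for surfacing an error as described in Section 7.

```python
from typing import Callable, Optional


def build_read_response(src_stag: int, src_to: int, size: int,
                        validate: Callable[[int, int, int], bool],
                        read: Callable[[int, int, int], bytes]
                        ) -> Optional[bytes]:
    """Return the RDMA Read Response payload, or None on a failed check."""
    if size == 0:
        # Zero-length request: the Data Source STag and Tagged Offset
        # MUST NOT be validated, and the response is zero-length.
        return b""
    if not validate(src_stag, src_to, size):
        return None  # the caller surfaces the error (see Section 7)
    return read(src_stag, src_to, size)
```

Checking the size first keeps the validation callback from ever running for a zero-length request, which is the behavior the bullets above require.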

5.2.2. RDMA Read Response Message

The RDMA Read Response Message uses the DDP Tagged Buffer Model to Deliver the contents of a previously requested Data Source Tagged Buffer to the Data Sink, without any involvement from the ULP at the Remote Peer. The RDMA Read Response Message has the following semantics:

   *  The RDMA Read Response Message for the associated RDMA Read
      Request Message travels in the opposite direction.

   *  An RDMA Read Response Message MUST reference a Tagged Buffer.
      That is, the Data Source RDMAP layer MUST request that the DDP
      mark the Message as Tagged.

   *  The Data Source MUST ensure that a sufficient number of Untagged
      Buffers are available on the RDMA Read Request Queue (Queue with
      DDP Queue Number 1) to support the maximum number of RDMA Read
      Requests negotiated by the ULP.

   *  The RDMAP layer MUST Deliver the RDMA Read Response Message to the
      ULP.

   *  At the Remote Peer, when an invalid RDMA Read Response Message is
      delivered to the Remote Peer's RDMAP layer, an error is surfaced
      (see Section 7.1, "RDMAP Error Surfacing").

   *  The Tagged Offset of a Tagged Buffer MAY start at a non-zero
      value.

   *  The Data Source RDMAP layer MUST pass RDMA Read Response Messages
      to the DDP layer, in the order that the RDMA Read Request Messages
      were received by the RDMAP layer, at the Data Source.

   *  The Data Sink MAY validate that the STag, Tagged Offset, and
      length of the RDMA Read Response Message are the same as the STag,
      Tagged Offset, and length included in the corresponding RDMA Read
      Request Message.

   *  A single RDMA Read Response Message MUST write to one remote
      Tagged Buffer.  If the Data Sink wishes to read multiple Tagged
      Buffers, the Data Sink can use multiple RDMA Read Request
      Messages.

5.3. Send Message Type

The Send Message Type uses the DDP Untagged Buffer Model to transfer data from the Data Source into an Untagged Buffer at the Data Sink. * A Send Message Type MUST reference an Untagged Buffer. That is, the Local Peer's RDMAP layer MUST request that the DDP layer mark the Message as Untagged. * One Send Message Type MUST consume one Untagged Buffer. * The ULP Message sent using a Send Message Type MAY be less than or equal to the size of the consumed Untagged Buffer. The RDMAP layer communicates to the ULP the size of the data written into the Untagged Buffer. * If the ULP Message sent via Send Message Type is larger than the Data Sink's Untagged Buffer, it is an error (see Section 9.1, "RDMAP Error Surfacing"). * At the Remote Peer, the Send Message Type MUST be Delivered to the Remote Peer's ULP in the order they were sent. * After the Send with Solicited Event or Send with Solicited Event and Invalidate Message is Delivered to the ULP, the RDMAP MAY generate an Event, if the Data Sink is configured to generate such an Event. * At the Remote Peer, when an invalid Send Message Type is Delivered to the Remote Peer's RDMAP layer, an error is surfaced (see Section 7.1, "RDMAP Error Surfacing"). * The RDMAP does not specify the structure of the buffer(s) used by an outbound RDMA Write nor does it specify how the buffer(s) are addressed. For example, an implementation of RDMA may choose to allow a gather-list of non-contiguous data blocks to be the source of a Send Message Type. In this case, the data blocks would be combined by the Data Source and sent as a single Send Message Type to the Data Sink. * For a Send Message Type, the Local Peer's RDMAP layer MUST request that the DDP layer set the Queue Number field to zero. * The Local Peer MUST issue Send Message Type Messages in the order they were submitted by the ULP.
   *  The Data Source MAY pass a zero-length Send Message Type.  A
      zero-length Send Message Type MUST consume an Untagged Buffer at
      the Data Sink.  A Send with Invalidate or Send with Solicited
      Event and Invalidate Message MUST reference an STag.  That is, the
      Local Peer's RDMAP layer MUST pass the RDMA control field and the
      STag that will be Invalidated to the DDP layer.

   *  When the Send with Invalidate and Send with Solicited Event and
      Invalidate Message are Delivered to the Remote Peer's RDMAP layer,
      the RDMAP layer MUST:

      *  Verify that the STag is associated with the RDMAP Stream; and

      *  Invalidate the STag if it is associated with the RDMAP Stream;
         or issue a Terminate Message with the STag Cannot be
         Invalidated Terminate Error Code, if the STag is not associated
         with the RDMAP Stream.
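
   The receive-side handling of the Invalidate STag described above can
   be sketched as follows.  This is a minimal illustration under stated
   assumptions: the set of STags associated with the RDMAP Stream is
   modelled as a Python set, and the hypothetical TerminateError
   exception stands in for issuing a Terminate Message.

```python
class TerminateError(Exception):
    """Hypothetical stand-in for issuing a Terminate Message."""


def handle_send_with_invalidate(stream_stags: set,
                                invalidate_stag: int) -> None:
    """Process the Invalidate STag carried by a Delivered Send.

    stream_stags is an assumed model of the STags currently associated
    with this RDMAP Stream.
    """
    # Verify that the STag is associated with this RDMAP Stream.
    if invalidate_stag not in stream_stags:
        # Not associated: the stream is terminated with the
        # "STag Cannot be Invalidated" error code.
        raise TerminateError("STag Cannot be Invalidated")
    # Associated: invalidate it so later accesses using it fail.
    stream_stags.discard(invalidate_stag)
```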

5.4. Terminate Message

   The Terminate Message uses the DDP Untagged Buffer Model to transfer
   error-related information from the Data Source into an Untagged
   Buffer at the Data Sink and then ceases all further communications on
   the underlying DDP Stream.  The Terminate Message has the following
   semantics:

   *  A Terminate Message MUST reference an Untagged Buffer.  That is,
      the Local Peer's RDMAP layer MUST request that the DDP layer mark
      the Message as Untagged.

   *  A Terminate Message references the Terminate Queue.  That is, the
      Local Peer's RDMAP layer MUST request that the DDP layer set the
      Queue Number field to two.

   *  One Terminate Message MUST consume one Untagged Buffer.

   *  On a single RDMAP Stream, the RDMAP layer MUST guarantee placement
      of a single Terminate Message.

   *  A Terminate Message MUST be Delivered to the Remote Peer's RDMAP
      layer.  The RDMAP layer MUST Deliver the Terminate Message to the
      ULP.

   *  At the Remote Peer, when an invalid Terminate Message is delivered
      to the Remote Peer's RDMAP layer, an error is surfaced (see
      Section 7.1, "RDMAP Error Surfacing").
   *  The RDMAP layer Completes in error all ULP operations that have
      not been provided to the DDP layer.

   *  After sending a Terminate Message on an RDMAP Stream, the Local
      Peer MUST NOT send any more Messages on that specific RDMAP
      Stream.

   *  After receiving a Terminate Message on an RDMAP Stream, the Remote
      Peer MAY stop sending Messages on that specific RDMAP Stream.
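
   The sender-side rules above (Queue Number two for the Terminate
   Message, and no further Messages on the Stream afterwards) can be
   sketched as follows.  RdmapStream and its methods are hypothetical
   names for this illustration; the list named sent merely records what
   would be handed to the DDP layer.

```python
class RdmapStream:
    """Hypothetical per-stream sender state, for illustration only."""

    TERMINATE_QUEUE_NUMBER = 2  # Terminate Messages use Queue Number 2

    def __init__(self):
        self.terminate_sent = False
        self.sent = []  # (queue_number, payload) pairs handed to "DDP"

    def send(self, queue_number, payload):
        """Hand one Untagged Message to the (simulated) DDP layer."""
        if self.terminate_sent:
            # After a Terminate, no more Messages on this RDMAP Stream.
            raise RuntimeError("no Messages may follow a Terminate")
        self.sent.append((queue_number, payload))

    def terminate(self, error_info):
        """Send the Terminate Message and close the stream to senders.

        A second call fails in send(), so at most one Terminate Message
        is ever handed to DDP on this stream.
        """
        self.send(self.TERMINATE_QUEUE_NUMBER, error_info)
        self.terminate_sent = True
```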

5.5. Ordering and Completions

   It is important to understand the difference between Placement and
   Delivery ordering since RDMAP provides quite different semantics for
   the two.

   Note that many current protocols, both as used in the Internet and
   elsewhere, assume that data is both Placed and Delivered in order.
   Taking advantage of this fact allowed applications to take a variety
   of shortcuts.  For RDMAP, many of these shortcuts are no longer safe
   to use, and could cause application failure.

   The following rules apply to implementations of the RDMAP protocol.
   Note that in these rules, Send includes Send, Send with Invalidate,
   Send with Solicited Event, and Send with Solicited Event and
   Invalidate:

   1.  RDMAP does not provide ordering among Messages on different RDMAP
       Streams.

   2.  RDMAP does not provide ordering between operations that are
       generated from the two ends of an RDMAP Stream.

   3.  RDMA Messages that use Tagged and Untagged Buffers MAY be Placed
       in any order.  If an application uses overlapping buffers (points
       different Messages or portions of a single Message at the same
       buffer), then it is possible that the last incoming write to the
       Data Sink buffer will not be the last outgoing data sent from the
       Data Source.

   4.  For a Send operation, the contents of an Untagged Buffer at the
       Data Sink MAY be indeterminate until the Send is Delivered to the
       ULP at the Data Sink.

   5.  For an RDMA Write operation, the contents of the Tagged Buffer at
       the Data Sink MAY be indeterminate until a subsequent Send is
       Delivered to the ULP at the Data Sink.
   6.  For an RDMA Read operation, the contents of the Tagged Buffer at
       the Data Sink MAY be indeterminate until the RDMA Read Response
       Message has been Delivered at the Local Peer.

   Statements 4, 5, and 6 imply "no peeking" at the data to see if it is
   done.  It is possible for some data to arrive before logically
   earlier data does, and peeking may cause unpredictable application
   failure.

   7.  If the ULP or Application modifies the contents of Tagged or
       Untagged Buffers, which are being modified by an RDMA Operation
       while the RDMAP is processing the RDMA Operation, the state of
       the Buffers is indeterminate.

   8.  If the ULP or Application modifies the contents of Tagged or
       Untagged Buffers, which are read by an RDMA Operation while the
       RDMAP is processing the RDMA Operation, the results of the read
       are indeterminate.

   9.  The Completion of an RDMA Write or Send Operation at the Local
       Peer does not guarantee that the ULP Message has yet reached the
       Remote Peer ULP Buffer or been examined by the Remote ULP.

   10. Send Messages MUST be Delivered to the ULP at the Remote Peer
       after they are Delivered to RDMAP by DDP and in the order that
       they were Delivered to RDMAP.

       Note that DDP ordering rules ensure that this will be the same
       order that they were submitted at the Local Peer and that any
       prior RDMA Writes have been submitted for ordered Placement at
       the Remote Peer.  This means that when the ULP sees the Delivery
       of the Send, the memory buffers targeted by any preceding RDMA
       Writes and Sends are available to be accessed locally or remotely
       as authorized.  If the ULP overlaps its buffers for different
       operations, the data from the RDMA Write or Send may be
       overwritten by subsequent RDMA Operations before the ULP receives
       and processes the Delivery.

   11. RDMA Read Response Messages MUST be Delivered to the ULP at the
       Remote Peer after they are Delivered to RDMAP by DDP and in the
       order that they were Delivered to RDMAP.

       DDP ordering rules ensure that this will be the same order that
       they were submitted at the Local Peer.  This means that when the
       ULP sees the Delivery of the RDMA Read Response, the memory
       buffers targeted by the RDMA Read Response are available to be
       accessed locally or remotely as authorized.  If the ULP overlaps
       its buffers for different operations, the data from the RDMA Read
       Response may be overwritten by subsequent RDMA Operations before
       the ULP receives and processes the Delivery.

   12. RDMA Read Request Messages, including zero-length RDMA Read
       Requests, MUST NOT start processing at the Remote Peer until they
       have been Delivered to RDMAP by DDP.

       Note: the ULP is assured that data written can be read back.  For
       example, if

          a) an RDMA Read Request is issued by the local peer,
          b) the Request targets the same ULP Buffer as a preceding Send
             or RDMA Write (in the same direction as the RDMA Read
             Request), and
          c) there are no other sources of update for the ULP Buffer,

       then the Remote Peer will send back the data written by the Send
       or RDMA Write.  That is, for this example, the ULP Buffer is
       Advertised for use on a series of RDMA Messages, is only valid on
       the RDMAP Stream for which it is Advertised, and is not locally
       updated while the series of RDMAP Messages are performed.  For
       this example, order rule (12) assures that subsequent local or
       remote accesses to the ULP Buffer contain the data written by the
       Send or RDMA Write.

       RDMA Read Response Messages MAY be generated at the Remote Peer
       after subsequent RDMA Write Messages or Send Messages have been
       Placed or Delivered.  Therefore, when an application does an RDMA
       Read Request followed by an RDMA Write (or Send) to the same
       buffer, it may get the data from the later RDMA Write (or Send)
       in the RDMA Read Response Message, even though the operations
       completed in order at the Local Peer.  If this behavior is not
       desired, the Local Peer ULP must Fence the later RDMA Write (or
       Send) by withholding the RDMA Write Message until all outstanding
       RDMA Read Responses have been Delivered.

   13. The RDMAP layer MUST submit RDMA Messages to the DDP layer in the
       order the RDMA Operations are submitted to the RDMAP layer by the
       ULP.

   14. A Send or RDMA Write Message MUST NOT be considered Complete at
       the Local Peer (Data Source) until it has been successfully
       completed at the DDP layer.

   15. RDMA Operations MUST be Completed at the Local Peer in the order
       that they were submitted by the ULP.
   16. At the Data Sink, an incoming Send Message MUST be Delivered to
       the ULP only after the DDP Message has been Delivered to the
       RDMAP layer by the DDP layer.

   17. RDMA Read Response Message processing at the Remote Peer (reading
       the specified Tagged Buffer) MUST be started only after the RDMA
       Read Request Message has been Delivered by the DDP layer (thus,
       all previous RDMA Messages have been properly submitted for
       ordered Placement).

   18. Send Messages MAY be Completed at the Remote Peer (Data Sink)
       before prior incoming RDMA Read Request Messages have completed
       their response processing.

   19. An RDMA Read operation MUST NOT be Completed at the Local Peer
       until the DDP layer Delivers the associated incoming RDMA Read
       Response Message.

   20. If more than one outstanding RDMA Read Request Message is
       supported by both peers, the RDMA Read Response Messages MUST be
       submitted to the DDP layer on the Remote Peer in the order the
       RDMA Read Request Messages were Delivered by DDP, but the actual
       read of the buffer contents MAY take place in any order at the
       Remote Peer.

       This simplifies Local Peer Completion processing for RDMA Reads
       in that a Delivered RDMA Read Response MUST be sufficient to
       Complete the RDMA Read operation.
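
   Rule 20 separates two orderings at the Remote Peer: the buffer reads
   may finish in any order, but the RDMA Read Response Messages must be
   submitted to DDP in the order the requests were Delivered.  One way
   to reconcile the two, sketched here under assumptions, is a small
   reorder buffer.  ReadResponseSequencer and its methods are
   hypothetical names, not part of the specification.

```python
class ReadResponseSequencer:
    """Releases responses in request order, whatever order reads finish."""

    def __init__(self):
        self._next_seq = 0        # sequence number for the next request
        self._next_to_submit = 0  # sequence number of the response owed next
        self._finished = {}       # seq -> data for reads finished early

    def request_delivered(self) -> int:
        """Register a Delivered RDMA Read Request; return its sequence."""
        seq = self._next_seq
        self._next_seq += 1
        return seq

    def read_finished(self, seq, data):
        """Record a finished buffer read.

        Returns the list of response payloads that may now be submitted
        to DDP, in request order.  The list is empty while an earlier
        read is still pending.
        """
        self._finished[seq] = data
        ready = []
        while self._next_to_submit in self._finished:
            ready.append(self._finished.pop(self._next_to_submit))
            self._next_to_submit += 1
        return ready
```

   Because responses leave in request order, the Local Peer can Complete
   each RDMA Read as soon as its response is Delivered, with no extra
   bookkeeping.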

6. RDMAP Stream Management

RDMAP Stream management consists of RDMAP Stream Initialization and RDMAP Stream Termination.

6.1. Stream Initialization

   RDMAP Stream initialization occurs after the LLP Stream has been
   created (e.g., for DDP/MPA over TCP, the first TCP Segment after the
   SYN, SYN/ACK exchange).  The ULP is responsible for transitioning the
   LLP Stream into RDMA-enabled mode.  The switch to RDMA mode typically
   occurs sometime after LLP Stream setup.  Once in RDMA enabled mode,
   an implementation MUST send only RDMA Messages across the transport
   Stream until the RDMAP Stream is torn down.

   For each direction of an RDMAP Stream:

   *  For a given RDMAP Stream, the number of outstanding RDMA Read
      Requests is limited per RDMAP Stream direction.
   *  It is the ULP's responsibility to set the maximum number of
      outstanding, inbound RDMA Read Requests per RDMAP Stream
      direction.

   *  The RDMAP layer MUST provide the maximum number of outstanding,
      inbound RDMA Read Requests per RDMAP Stream direction that were
      negotiated between the ULP and the Local Peer's RDMAP layer.  The
      negotiation mechanism is outside the scope of this specification.

   *  It is the ULP's responsibility to set the maximum number of
      outstanding, outbound RDMA Read Requests per RDMAP Stream
      direction.

   *  The RDMAP layer MUST provide the maximum number of outstanding,
      outbound RDMA Read Requests for the RDMAP Stream direction that
      were negotiated between the ULP and the Local Peer's RDMAP layer.
      The negotiation mechanism is outside the scope of this
      specification.

   *  The Local Peer's ULP is responsible for negotiating with the
      Remote Peer's ULP the maximum number of outstanding RDMA Read
      Requests for the RDMAP Stream direction.  It is recommended that
      the ULP set the maximum number of outstanding, inbound RDMA Read
      Requests equal to the maximum number of outstanding, outbound RDMA
      Read Requests for a given RDMAP Stream direction.

   *  For outbound RDMA Read Requests, the RDMAP layer MUST NOT exceed
      the maximum number of outstanding, outbound RDMA Read Requests
      that were negotiated between the ULP and the Local Peer's RDMAP
      layer.

   *  For inbound RDMA Read Requests, the RDMAP layer MUST NOT exceed
      the maximum number of outstanding, inbound RDMA Read Requests that
      were negotiated between the ULP and the Local Peer's RDMAP layer.
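
   The limit on outstanding, outbound RDMA Read Requests can be enforced
   with a simple counter, sketched below.  ReadRequestLimiter is a
   hypothetical name; how the maximum is negotiated is outside the scope
   of the specification, so the value is simply passed in.

```python
class ReadRequestLimiter:
    """Tracks outstanding outbound RDMA Read Requests for one direction."""

    def __init__(self, max_outstanding: int):
        if max_outstanding < 0:
            raise ValueError("max_outstanding must be non-negative")
        self.max_outstanding = max_outstanding  # value agreed by the ULPs
        self.outstanding = 0

    def try_issue(self) -> bool:
        """Reserve a slot for one RDMA Read Request.

        Returns False, without side effects, when issuing another
        request would exceed the negotiated maximum.
        """
        if self.outstanding >= self.max_outstanding:
            return False
        self.outstanding += 1
        return True

    def response_delivered(self) -> None:
        """Release the slot once the matching response is Delivered."""
        if self.outstanding == 0:
            raise RuntimeError("no RDMA Read Request is outstanding")
        self.outstanding -= 1
```

   A request refused by try_issue would typically be queued by the
   caller and retried after response_delivered runs.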

6.2. Stream Teardown

There are three methods for terminating an RDMAP Stream: ULP Graceful Termination, RDMAP Abortive Termination, and LLP Abortive Termination. The ULP is responsible for performing ULP Graceful Termination. After a ULP Graceful Termination, either side of the Stream can initiate LLP Graceful Termination, using the graceful termination mechanism provided by the LLP.
   RDMAP Abortive Termination allows the RDMAP to issue a Terminate
   Message describing the reason the RDMAP Stream was terminated.  The
   next section (6.2.1, "RDMAP Abortive Termination") describes the
   RDMAP Abortive Termination in detail.

   LLP Abortive Termination results from an LLP error and causes the
   RDMAP Stream to be torn down midstream, without an RDMAP Terminate
   Message.  While this last method is highly undesirable, it is
   possible, and the ULP should take this into consideration.

6.2.1. RDMAP Abortive Termination

   RDMAP defines a Terminate operation that SHOULD be invoked when
   either an RDMAP error is encountered or an LLP error is surfaced to
   the RDMAP layer by the LLP.

   It is not always possible to send the Terminate Message.  For
   example, certain LLP errors may occur that cause the LLP Stream to be
   torn down a) before RDMAP is aware of the error, b) before RDMAP is
   able to send the Terminate Message, or c) after RDMAP has posted the
   Terminate Message to the LLP, but it has not yet been transmitted by
   the LLP.

   Note that an RDMAP Abortive Termination may entail loss of data.  In
   general, when a Terminate Message is received, it is impossible to
   tell for sure what unacknowledged RDMA Messages were Completed
   successfully at the Remote Peer.  Thus, the state of all outstanding
   RDMA Messages is indeterminate, and the Messages SHOULD be considered
   Completed in error.

   When a peer sends or receives a Terminate Message, it MAY immediately
   tear down the LLP Stream.  The peer SHOULD perform a graceful LLP
   teardown to ensure the Terminate Message is successfully Delivered.

   See Section 4.8, "Terminate Header", for a description of the
   Terminate Message and its contents.  See Section 5.4, "Terminate
   Message", for a description of the Terminate Message semantics.

7. RDMAP Error Management

The RDMAP protocol does not have RDMAP- or DDP-layer error recovery operations built in. If everything is working, the LLP guarantees will ensure that the Messages are arriving at the destination. If errors are detected at the RDMAP or DDP layer, then the RDMAP, DDP, and LLP Streams are Abortively Terminated (see Section 4.8, "Terminate Header").
   In general, poor implementations or improper ULP programming cause
   the errors detected at the RDMAP and DDP layers.  In these cases,
   returning a diagnostic termination error Message and closing the
   RDMAP Stream is far simpler than attempting to maintain the RDMAP
   Stream, particularly when the cause of the error is not known.

   If an LLP does not support teardown of a Stream independent of other
   Streams, and an RDMAP error results in the Termination of a specific
   Stream, then the LLP MUST label the Stream as an erroneous Stream and
   MUST NOT allow any further data transfer on that Stream after RDMAP
   requests the Stream to be torn down.

   For a specific LLP connection, when all Streams are either gracefully
   torn down or are labeled as erroneous Streams, the LLP connection
   MUST be torn down.

   Since errors are detected at the Remote Peer (possibly long) after
   RDMA Messages are passed to the DDP and the LLP at the Local Peer and
   after the RDMA Operations conveyed by the Messages are Completed, the
   sender cannot easily determine which of its Messages have been
   received.  (RDMA Reads are an exception to this rule.)

   For a list of errors returned to the Remote Peer as a result of an
   Abortive Termination, see Section 4.8, "Terminate Header".

7.1. RDMAP Error Surfacing

   If an error occurs at the Local Peer, the RDMAP layer MUST attempt to
   inform the local ULP that the error has occurred.

   The Local Peer MUST send a Terminate Message for each of the
   following cases:

   1.  For errors detected while creating RDMA Write, Send, Send with
       Invalidate, Send with Solicited Event, Send with Solicited Event
       and Invalidate, or RDMA Read Requests, or other reasons not
       directly associated with an incoming Message, the Terminate
       Message and Error code are sent instead of the request.  In this
       case, the Error Type and Error Code fields are included in the
       Terminate Message, but the Terminated DDP Header and Terminated
       RDMA Header fields are set to zero.

   2.  For errors detected on an incoming RDMA Write, Send, Send with
       Invalidate, Send with Solicited Event, Send with Solicited Event
       and Invalidate, or Read Response Message (after the Message has
       been Delivered by DDP), the Terminate Message is sent at the
       earliest possible opportunity, preferably in the next outgoing
       RDMA Message.  In this case, the Error Type, Error Code, ULP PDU
       Length, and Terminated DDP Header fields are included in the
       Terminate Message, but the Terminated RDMA Header field is set to
       zero.

   3.  For errors detected on an incoming RDMA Read Request Message
       (after the Message has been Delivered by DDP), the Terminate
       Message is sent at the earliest possible opportunity, preferably
       in the next outgoing RDMA Message.  In this case, the Error Type,
       Error Code, ULP PDU Length, Terminated DDP Header, and Terminated
       RDMA Header fields are included in the Terminate Message.

   4.  If more than one error is detected on incoming RDMA Messages,
       before the Terminate Message can be sent, then the first RDMA
       Message (and its associated DDP Segment) that experienced an
       error MUST be captured by the Terminate Message, in accordance
       with rules 2 and 3 above.
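
   Cases 1 through 3 differ only in which Terminate Message fields carry
   information.  The mapping can be sketched as follows; ErrorContext
   and terminate_fields are hypothetical names, and the field names are
   plain strings chosen for the illustration rather than a wire format.

```python
from enum import Enum


class ErrorContext(Enum):
    """Where the error was detected (hypothetical classification)."""
    LOCAL = 1                  # case 1: not tied to an incoming Message
    INCOMING_OTHER = 2         # case 2: Write, Send types, Read Response
    INCOMING_READ_REQUEST = 3  # case 3: RDMA Read Request


def terminate_fields(context: ErrorContext) -> list:
    """Return the Terminate fields that are included for this case.

    Fields that are not returned are the ones set to zero.
    """
    fields = ["Error Type", "Error Code"]
    if context is not ErrorContext.LOCAL:
        # Cases 2 and 3 describe the offending incoming DDP Segment.
        fields += ["ULP PDU Length", "Terminated DDP Header"]
    if context is ErrorContext.INCOMING_READ_REQUEST:
        # Only a Read Request carries an RDMAP header worth echoing.
        fields.append("Terminated RDMA Header")
    return fields
```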

7.2. Errors Detected at the Remote Peer on Incoming RDMA Messages

   On incoming RDMA Writes, RDMA Read Response, Sends, Send with
   Invalidate, Send with Solicited Event, Send with Solicited Event and
   Invalidate, and Terminate Messages, the following must be validated:

   1.  The DDP layer MUST validate all DDP Segment fields.

   2.  The RDMA OpCode MUST be valid.

   3.  The RDMA Version MUST be valid.

   Additionally, on incoming Send with Invalidate and Send with
   Solicited Event and Invalidate Messages, the following must also be
   validated:

   4.  The Invalidate STag MUST be valid.

   5.  The STag MUST be associated to this RDMAP Stream.

   On incoming RDMA Read Request Messages, the following must be
   validated:

   1.  The DDP layer MUST validate all Untagged DDP Segment fields.

   2.  The RDMA OpCode MUST be valid.

   3.  The RDMA Version MUST be valid.

   4.  For non-zero length RDMA Read Request Messages:

       a.  The Data Source STag MUST be valid.
       b.  The Data Source STag MUST be associated to this RDMAP Stream.

       c.  The Data Source Tagged Offset MUST fall in the range of legal
           offsets associated with the Data Source STag.

       d.  The sum of the Data Source Tagged Offset and the RDMA Read
           Message Size MUST fall in the range of legal offsets
           associated with the Data Source STag.

       e.  The sum of the Data Source Tagged Offset and the RDMA Read
           Message Size MUST NOT cause the Data Source Tagged Offset to
           wrap.
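
   Checks a through e for a non-zero length RDMA Read Request can be
   sketched as a single validation function.  This is an illustration
   under stated assumptions: the Tagged Offset is taken to be a 64-bit
   field, the STags associated with the Stream are modelled as a
   dictionary mapping each STag to a (base offset, length) pair, and the
   returned strings are placeholders rather than the Terminate Error
   Codes.  The function name and parameters are hypothetical.

```python
TAGGED_OFFSET_LIMIT = 1 << 64  # assumption: 64-bit Tagged Offset field


def validate_read_request(stag, src_offset, size, stream_regions):
    """Return None if the request passes, else a placeholder reason.

    stream_regions maps each STag associated with this RDMAP Stream to
    a (base_offset, length) pair of legal offsets (assumed model).
    """
    if size == 0:
        return None  # checks a-e apply only to non-zero length requests
    region = stream_regions.get(stag)
    if region is None:
        # Checks a and b: STag valid and associated with this stream.
        return "invalid or unassociated STag"
    base, length = region
    end = src_offset + size  # one past the last octet read
    if end > TAGGED_OFFSET_LIMIT:
        # Check e.  Treating end == 2**64 as legal is an assumption.
        return "Tagged Offset wrap"
    if not (base <= src_offset < base + length):
        return "offset out of range"            # check c
    if end > base + length:
        return "offset plus size out of range"  # check d
    return None
```

   Python integers do not overflow, so the wrap test is written as an
   explicit comparison; in a language with fixed-width integers, the
   same check would have to be made before the addition.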


