
RFC 5044

Marker PDU Aligned Framing for TCP Specification

Pages: 74
Proposed Standard
Updated by: RFC 6581, RFC 7146
Part 3 of 3 – Pages 45 to 74


Appendix A. Optimized MPA-Aware TCP Implementations

   This appendix is for information only and is NOT part of the
   standard.

   This appendix covers some optimized MPA-aware TCP implementation
   guidance for implementers.  It is intended for those implementations
   that want to send and receive as much traffic as possible in an
   aligned and zero-copy fashion.

       +-----------------------------------+
       | +-----------+ +-----------------+ |
       | | Optimized | | Other Protocols | |
       | |  MPA/TCP  | +-----------------+ |
       | +-----------+         ||          |
       |      \\       --- socket API ---  |
       |       \\               ||         |
       |        \\            +-----+      |
       |         \\           | TCP |      |
       |          \\          +-----+      |
       |           \\           //         |
       |           +-------+               |
       |           |  IP   |               |
       |           +-------+               |
       +-----------------------------------+

        Figure 11: Optimized MPA/TCP Implementation

   The diagram above shows a block diagram of a potential
   implementation.  The network sub-system in the diagram can support
   traditional sockets-based connections using the normal API, as shown
   on the right side of the diagram.  Connections for DDP/MPA/TCP are
   run using the facilities shown on the left side of the diagram.

   The DDP/MPA/TCP connections can be started using the facilities
   shown on the left side using some suitable API, or they can be
   initiated using the facilities shown on the right side and
   transitioned to the left side at the point in the connection setup
   where MPA goes to "Full MPA/DDP Operation Phase" as described in
   Section 7.1.2.

   The optimized MPA/TCP implementations (left side of the diagram, and
   described below) are only applicable to MPA.  All other TCP
   applications continue to use the standard TCP stacks and interfaces
   shown on the right side of the diagram.

A.1. Optimized MPA/TCP Transmitters

   The various TCP RFCs allow considerable choice in segmenting a TCP
   stream.  In order to optimize FPDU recovery at the MPA receiver, an
   optimized MPA/TCP implementation uses additional segmentation rules.

   To provide optimum performance, an optimized MPA/TCP transmit side
   implementation should be enabled to:

   *   With an EMSS large enough to contain the FPDU(s), segment the
       outgoing TCP stream such that the first octet of every TCP
       segment begins with an FPDU.  Multiple FPDUs may be packed into
       a single TCP segment as long as they are entirely contained in
       the TCP segment.

   *   Report the current EMSS from the TCP to the MPA transmit layer.

   There are exceptions to the above rule.  Once a ULPDU is provided to
   MPA, the MPA/TCP sender transmits it or fails the connection; it
   cannot be repudiated.  As a result, during changes in MTU and EMSS,
   or when TCP's Receive Window size (RWIN) becomes too small, it may
   be necessary to send FPDUs that do not conform to the segmentation
   rule above.  A possible, but less desirable, alternative is to use
   IP fragmentation on accepted FPDUs to deal with MTU reductions or
   extremely small EMSS.

   Even when alignment with TCP segments is lost, the sender still
   formats the FPDU according to the FPDU format shown in Figure 2.

   On a retransmission, TCP does not necessarily preserve original TCP
   segmentation boundaries.  This can lead to the loss of FPDU
   Alignment and containment within a TCP segment during TCP
   retransmissions.  An optimized MPA/TCP sender should try to preserve
   original TCP segmentation boundaries on a retransmission.
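
   As a rough illustration of the segmentation rule above (and not part
   of this specification), the Python sketch below packs whole FPDUs
   into TCP segments of at most EMSS octets and never splits an FPDU,
   except when a single FPDU is by itself larger than the EMSS (the
   exception case discussed above).  All names are illustrative.

     def pack_fpdus_into_segments(fpdus, emss):
         """Greedy packing sketch: each returned 'segment' is a list of
         whole FPDUs whose combined length does not exceed EMSS.  An
         FPDU larger than EMSS is emitted alone (the exception case:
         the EMSS shrank after the ULPDU was accepted)."""
         segments, current, current_len = [], [], 0
         for fpdu in fpdus:                  # each fpdu is a bytes object
             if current and current_len + len(fpdu) > emss:
                 segments.append(current)    # flush the filled segment
                 current, current_len = [], 0
             current.append(fpdu)
             current_len += len(fpdu)
             if current_len >= emss:         # full (or oversized) segment
                 segments.append(current)
                 current, current_len = [], 0
         if current:
             segments.append(current)
         return segments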

A.2. Effects of Optimized MPA/TCP Segmentation

Optimized MPA/TCP senders will fill TCP segments to the EMSS with a single FPDU when a DDP message is large enough. Since the DDP message may not exactly fit into TCP segments, a "message tail" often occurs that results in an FPDU that is smaller than a single TCP segment. Additionally, some DDP messages may be considerably shorter than the EMSS. If a small FPDU is sent in a single TCP segment, the result is a "short" TCP segment.
   Applications expected to see strong advantages from Direct Data
   Placement include transaction-based applications and throughput
   applications.  Request/response protocols typically send one FPDU per
   TCP segment and then wait for a response.  Under these conditions,
   these "short" TCP segments are an appropriate and expected effect of
   the segmentation.

   Another possibility is that the application might be sending multiple
   messages (FPDUs) to the same endpoint before waiting for a response.
   In this case, the segmentation policy would tend to reduce the
   available connection bandwidth by under-filling the TCP segments.

   Standard TCP implementations often utilize the Nagle [RFC896]
   algorithm to ensure that segments are filled to the EMSS whenever the
   round-trip latency is large enough that the source stream can fully
   fill segments before ACKs arrive.  The algorithm does this by
   delaying the transmission of TCP segments until a ULP can fill a
   segment, or until an ACK arrives from the far side.  The algorithm
   thus allows for smaller segments when latencies are shorter to keep
   the ULP's end-to-end latency to reasonable levels.

   The Nagle algorithm is not mandatory to use [RFC1122].

   When used with optimized MPA/TCP stacks, Nagle and similar algorithms
   can result in the "packing" of multiple FPDUs into TCP segments.

   If a "message tail", small DDP messages, or the start of a larger DDP
   message are available, MPA may pack multiple FPDUs into TCP segments.
   When this is done, the TCP segments can be more fully utilized, but,
   due to the size constraints of FPDUs, segments may not be filled to
   the EMSS.  A dynamic MULPDU that informs DDP of the size of the
   remaining TCP segment space makes filling the TCP segment more
   effective.

       Note that MPA receivers do more processing of a TCP segment that
       contains multiple FPDUs; this may affect the performance of some
       receiver implementations.

   It is up to the ULP to decide if Nagle is useful with DDP/MPA.  Note
   that many of the applications expected to take advantage of MPA/DDP
   prefer to avoid the extra delays caused by Nagle.  In such scenarios,
   it is anticipated there will be minimal opportunity for packing at
   the transmitter and receivers may choose to optimize their
   performance for this anticipated behavior.
   Therefore, the application is expected to set TCP parameters such
   that it can trade off latency and wire efficiency.  Implementations
   should provide a connection option that disables Nagle for MPA/TCP
   similar to the way the TCP_NODELAY socket option is provided for a
   traditional sockets interface.
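
   For a sockets-based implementation, the familiar TCP_NODELAY option
   is the usual way to express such a connection option; a minimal
   Python sketch:

     import socket

     # Minimal sketch: disable the Nagle algorithm on a TCP socket, the
     # same kind of control an MPA/TCP connection option would expose.
     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
     sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)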

   When latency is not critical, the application is expected to leave Nagle
   enabled.  In this case, the TCP implementation may pack any available
   FPDUs into TCP segments so that the segments are filled to the EMSS.
   If the amount of data available is not enough to fill the TCP segment
   when it is prepared for transmission, TCP can send the segment partly
   filled, or use the Nagle algorithm to wait for the ULP to post more
   data.

A.3. Optimized MPA/TCP Receivers

   When an MPA receive implementation and the MPA-aware receive side
   TCP implementation support handling out-of-order ULPDUs, the TCP
   receive implementation performs the following functions:

   1)  The implementation passes incoming TCP segments to MPA as soon
       as they have been received and validated, even if not received
       in order.  The TCP layer commits to keeping each segment before
       it can be passed to the MPA.  This means that the segment must
       have passed the TCP, IP, and lower layer data integrity
       validation (i.e., checksum), must be in the receive window, must
       be part of the same epoch (if timestamps are used to verify
       this), and must have passed any other checks required by TCP
       RFCs.

       This is not to imply that the data must be completely ordered
       before use.  An implementation can accept out-of-order segments,
       SACK them [RFC2018], and pass them to MPA immediately, before
       the reception of the segments needed to fill in the gaps.  MPA
       expects to utilize these segments when they are complete FPDUs
       or can be combined into complete FPDUs to allow the passing of
       ULPDUs to DDP when they arrive, independent of ordering.  DDP
       uses the passed ULPDU to "place" the DDP segments (see [DDP] for
       more details).

       Since MPA performs a CRC calculation and other checks on
       received FPDUs, the MPA/TCP implementation ensures that any TCP
       segments that duplicate data already received and processed (as
       can happen during TCP retries) do not overwrite already received
       and processed FPDUs.  This avoids the possibility that duplicate
       data may corrupt already validated FPDUs.  (An informal sketch
       of this hand-off appears at the end of this section.)
   2)  The implementation provides a mechanism to indicate the ordering
       of TCP segments as the sender transmitted them.  One possible
       mechanism might be attaching the TCP sequence number to each
       segment.

   3)  The implementation also provides a mechanism to indicate when a
       given TCP segment (and the prior TCP stream) is complete.  One
       possible mechanism might be to utilize the leading (left) edge of
       the TCP Receive Window.

       MPA uses the ordering and completion indications to inform DDP
       when a ULPDU is complete; MPA Delivers the FPDU to DDP.  DDP uses
       the indications to "deliver" its messages to the DDP consumer
       (see [DDP] for more details).

       DDP on MPA utilizes the above two mechanisms to establish the
       Delivery semantics that DDP's consumers agree to.  These
       semantics are described fully in [DDP].  These include
       requirements on DDP's consumer to respect ownership of buffers
       prior to the time that DDP delivers them to the Consumer.

   The use of SACK [RFC2018] significantly improves network utilization
   and performance and is therefore recommended.  When combined with the
   out-of-order passing of segments to MPA and DDP, significant
   buffering and copying of received data can be avoided.
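
   The hand-off described in item 1) of the list above can be sketched
   roughly as follows.  The byte-granularity bookkeeping is purely for
   illustration (a real implementation would track sequence-number
   ranges), and mpa.consume() stands in for whatever interface the MPA
   layer actually exposes; neither is defined by this specification.

     class OutOfOrderToMpa:
         """Pass every validated TCP segment to MPA at once, even out
         of order, but never hand over octets MPA has already seen
         (e.g., octets duplicated by a TCP retransmission), so
         previously validated FPDUs cannot be overwritten."""

         def __init__(self, mpa):
             self.mpa = mpa        # assumed to expose consume(seq, data)
             self.seen = set()     # sequence numbers already given to MPA

         def on_validated_segment(self, seq, payload):
             """Call only after TCP checksum, window, and PAWS checks."""
             run_start, run = None, bytearray()
             for i, octet in enumerate(payload):
                 s = seq + i
                 if s in self.seen:
                     if run:       # flush the run of new octets so far
                         self.mpa.consume(run_start, bytes(run))
                         run_start, run = None, bytearray()
                     continue
                 self.seen.add(s)
                 if not run:
                     run_start = s
                 run.append(octet)
             if run:
                 self.mpa.consume(run_start, bytes(run))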

A.4. Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders

   Since MPA senders often start FPDUs on TCP segment boundaries, a
   receiving optimized MPA/TCP implementation may be able to optimize
   the reception of data in various ways.  However, MPA receivers MUST
   NOT depend on FPDU Alignment on TCP segment boundaries.

   Some MPA senders may be unable to conform to the sender requirements
   because their implementation of TCP is not designed with MPA in
   mind.  Even for optimized MPA/TCP senders, the network may contain
   "middleboxes" which modify the TCP stream by changing the
   segmentation.  This is generally interoperable with TCP and its
   users and MPA must be no exception.

   The presence of Markers in MPA (when enabled) allows an optimized
   MPA/TCP receiver to recover the FPDUs despite these obstacles,
   although it may be necessary to utilize additional buffering at the
   receiver to do so.
   Some of the cases that a receiver may have to contend with are listed
   below as a reminder to the implementer:

   *   A single aligned and complete FPDU, either in order or out of
       order:  This can be passed to DDP as soon as validated, and
       Delivered when ordering is established.

   *   Multiple FPDUs in a TCP segment, aligned and fully contained,
       either in order or out of order:  These can be passed to DDP as
       soon as validated, and Delivered when ordering is established.

   *   Incomplete FPDU: The receiver should buffer until the remainder
       of the FPDU arrives.  If the remainder of the FPDU is already
       available, this can be passed to DDP as soon as validated, and
       Delivered when ordering is established.

   *   Unaligned FPDU start: The partial FPDU must be combined with its
       preceding portion(s).  If the preceding parts are already
       available, and the whole FPDU is present, this can be passed to
       DDP as soon as validated, and Delivered when ordering is
       established.  If the whole FPDU is not available, the receiver
       should buffer until the remainder of the FPDU arrives.

   *   Combinations of unaligned or incomplete FPDUs (and potentially
       other complete FPDUs) in the same TCP segment:  If any FPDU is
       present in its entirety, or can be completed with portions
       already available, it can be passed to DDP as soon as validated,
       and Delivered when ordering is established.

A.5. Receiver Implementation

   Transport & Network Layer Reassembly Buffers:

   The use of reassembly buffers (either TCP reassembly buffers or IP
   fragmentation reassembly buffers) is implementation dependent.  When
   MPA is enabled, reassembly buffers are needed if out-of-order
   packets arrive and Markers are not enabled.  Buffers are also needed
   if FPDU alignment is lost or if IP fragmentation occurs.  This is
   because the incoming out-of-order segment may not contain enough
   information for MPA to process all of the FPDU.  For cases where a
   re-segmenting middlebox is present, or where the TCP sender is not
   optimized, the presence of Markers significantly reduces the amount
   of buffering needed.

   Recovery from IP fragmentation is transparent to the MPA Consumers.

A.5.1. Network Layer Reassembly Buffers

   The MPA/TCP implementation should set the IP Don't Fragment bit at
   the IP layer.  Thus, upon a path MTU change, intermediate devices
   drop the IP datagram if it is too large and reply with an ICMP
   message that tells the source TCP that the path MTU has changed.
   This causes TCP to emit segments conformant with the new path MTU
   size.  Thus, IP fragments under most conditions should never occur
   at the receiver.  But it is possible.

   There are several options for implementation of network layer
   reassembly buffers:

   1.  Drop any IP fragments, and reply with an ICMP message according
       to [RFC792] (fragmentation needed and DF set) to tell the Remote
       Peer to resize its TCP segment.

   2.  Support an IP reassembly buffer, but have it of limited size
       (possibly the same size as the local link's MTU).  The end node
       would normally never Advertise a path MTU larger than the local
       link MTU.  It is recommended that a dropped IP fragment cause an
       ICMP message to be generated according to [RFC792].

   3.  Support multiple IP reassembly buffers, of effectively unlimited
       size.

   4.  Support an IP reassembly buffer for the largest IP datagram
       (64 KB).

   5.  Support a large IP reassembly buffer that could span multiple IP
       datagrams.

   An implementation should support at least option 2 or 3 above, to
   avoid dropping packets that have traversed the entire fabric.

   There is no end-to-end ACK for IP reassembly buffers, so there is no
   flow control on the buffer.  The only end-to-end ACK is a TCP ACK,
   which can only occur when a complete IP datagram is delivered to
   TCP.  Because of this, under worst-case, pathological scenarios, the
   largest IP reassembly buffer is the TCP receive window (to buffer
   multiple IP datagrams that have all been fragmented).

   Note that if the Remote Peer does not implement re-segmentation of
   the data stream upon receiving the ICMP reply updating the path MTU,
   it is possible to halt forward progress because the opposite peer
   would continue to retransmit using a transport segment size that is
   too large.  This deadlock scenario is no different than if the
   fabric MTU (not last-hop MTU) was reduced after connection setup,
   and the remote node's behavior is not compliant with [RFC1122].
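
   The Don't Fragment behavior described at the start of this section
   is, on a sockets-based stack, typically requested by enabling strict
   path MTU discovery.  A minimal, Linux-flavored Python sketch
   follows; the numeric fallback values are the Linux constants and are
   an assumption of this example, not part of the specification.

     import socket

     # Sketch only: ask the kernel to set the IP Don't Fragment bit by
     # enabling strict path MTU discovery on the connection's socket.
     IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)  # Linux value
     IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)     # Linux value

     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
     sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)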

A.5.2. TCP Reassembly Buffers

   A TCP reassembly buffer is also needed.  TCP reassembly buffers are
   needed if FPDU Alignment is lost when using TCP with MPA or when the
   MPA FPDU spans multiple TCP segments.  Buffers are also needed if
   Markers are disabled and out-of-order packets arrive.

   Since lost FPDU Alignment often means that FPDUs are incomplete, an
   MPA on TCP implementation must have a reassembly buffer large enough
   to recover an FPDU that is less than or equal to the MTU of the
   locally attached link (this should be the largest possible
   Advertised TCP path MTU).  If the MTU is smaller than 140 octets, a
   buffer of at least 140 octets is needed to support the minimum FPDU
   size.  The 140 octets allow for the minimum MULPDU of 128, 2 octets
   of pad, 2 of ULPDU_Length, 4 of CRC, and space for a possible
   Marker.  As usual, additional buffering is likely to provide better
   performance.

   Note that if the TCP segments were not stored, it would be possible
   to deadlock the MPA algorithm.  If the path MTU is reduced, FPDU
   Alignment requires the source TCP to re-segment the data stream to
   the new path MTU.  The source MPA will detect this condition and
   reduce the MPA segment size, but any FPDUs already posted to the
   source TCP will be re-segmented and lose FPDU Alignment.  If the
   destination does not support a TCP reassembly buffer, these segments
   can never be successfully transmitted and the protocol deadlocks.
   When a complete FPDU is received, processing continues normally.
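
   The 140-octet figure above follows directly from the field sizes
   listed; a quick arithmetic check in Python:

     MIN_MULPDU   = 128   # minimum MULPDU MPA must be able to carry
     ULPDU_LENGTH = 2     # MPA ULPDU_Length field
     PAD          = 2     # pad octets assumed in the count above
     CRC          = 4     # CRC32c field
     MARKER       = 4     # space for one possible Marker

     assert MIN_MULPDU + ULPDU_LENGTH + PAD + CRC + MARKER == 140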

Appendix B. Analysis of MPA over TCP Operations

   This appendix is for information only and is NOT part of the
   standard.

   This appendix is an analysis of MPA on TCP and why it is useful to
   integrate MPA with TCP (with modifications to typical TCP
   implementations) to reduce overall system buffering and overhead.

   One of MPA's high-level goals is to provide enough information, when
   combined with the Direct Data Placement Protocol [DDP], to enable
   out-of-order placement of DDP payload into the final Upper Layer
   Protocol (ULP) Buffer.  Note that DDP separates the act of placing
   data into a ULP Buffer from that of notifying the ULP that the ULP
   Buffer is available for use.  In DDP terminology, the former is
   defined as "Placement", and the latter is defined as "Delivery".
   MPA supports in-order Delivery of the data to the ULP, including
   support for Direct Data Placement in the final ULP Buffer location
   when TCP segments arrive out of order.  Effectively, the goal is to
   use the
   pre-posted ULP Buffers as the TCP receive buffer, where the
   reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and
   DDP) is done in place, in the ULP Buffer, with no data copies.

   This appendix walks through the advantages and disadvantages of the
   TCP sender modifications proposed by MPA:

   1) that MPA prefers that the TCP sender do Header Alignment, where
      a TCP segment should begin with an MPA Framing Protocol Data Unit
      (FPDU) (if there is payload present).

   2) that there be an integral number of FPDUs in a TCP segment (under
      conditions where the path MTU is not changing).

   This appendix concludes that the scaling advantages of FPDU Alignment
   are strong, based primarily on fairly drastic TCP receive buffer
   reduction requirements and simplified receive handling.  The analysis
   also shows that there is little effect to TCP wire behavior.

B.1. Assumptions

B.1.1 MPA Is Layered beneath DDP

MPA is an adaptation layer between DDP and TCP. DDP requires preservation of DDP segment boundaries and a CRC32c digest covering the DDP header and data. MPA adds these features to the TCP stream so that DDP over TCP has the same basic properties as DDP over SCTP.

B.1.2. MPA Preserves DDP Message Framing

MPA was designed as a framing layer specifically for DDP and was not intended as a general-purpose framing layer for any other ULP using TCP. A framing layer allows ULPs using it to receive indications from the transport layer only when complete ULPDUs are present. As a framing layer, MPA is not aware of the content of the DDP PDU, only that it has received and, if necessary, reassembled a complete PDU for Delivery to the DDP.

B.1.3. The Size of the ULPDU Passed to MPA Is Less Than EMSS under Normal Conditions

To make reception of a complete DDP PDU on every received segment possible, DDP passes to MPA a PDU that is no larger than the EMSS of the underlying fabric. Each FPDU that MPA creates contains sufficient information for the receiver to directly place the ULP payload in the correct location in the correct receive buffer.
   Edge cases when this condition does not occur are dealt with, but do
   not need to be on the fast path.

B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery

DDP receives complete DDP PDUs from MPA. Each DDP PDU contains the information necessary to place its ULP payload directly in the correct location in host memory. Because each DDP segment is self-describing, it is possible for DDP segments received out of order to have their ULP payload placed immediately in the ULP receive buffer. Data delivery to the ULP is guaranteed to be in the order the data was sent. DDP only indicates data delivery to the ULP after TCP has acknowledged the complete byte stream.
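
   A small Python sketch of the Placement/Delivery split described
   above: payload is Placed as soon as a complete (possibly
   out-of-order) DDP segment arrives, but Delivery is signalled only
   once the TCP byte stream is contiguous past the end of the message.
   The callback shape and the use of 'end_seq' are assumptions of this
   example, not an interface defined by [DDP].

     class DeliveryGate:
         def __init__(self):
             self.pending = []   # (end_seq, notify) for placed messages

         def placed(self, end_seq, notify):
             """Record a message whose payload has been placed; end_seq
             is the TCP sequence number just past its last octet."""
             self.pending.append((end_seq, notify))

         def left_edge_advanced(self, rcv_nxt):
             """Call when TCP's in-order left edge (rcv_nxt) advances."""
             ready = [p for p in self.pending if p[0] <= rcv_nxt]
             self.pending = [p for p in self.pending if p[0] > rcv_nxt]
             for _, notify in sorted(ready, key=lambda p: p[0]):
                 notify()        # Deliver in the order the data was sent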

B.2. The Value of FPDU Alignment

   Significant receiver optimizations can be achieved when Header
   Alignment and complete FPDUs are the common case.  The optimizations
   allow utilizing significantly fewer buffers on the receiver and less
   computation per FPDU.  The net effect is the ability to build a
   "flow-through" receiver that enables TCP-based solutions to scale to
   10G and beyond in an economical way.

   The optimizations are especially relevant to hardware
   implementations of receivers that process multiple protocol layers
   -- Data Link Layer (e.g., Ethernet), Network and Transport Layer
   (e.g., TCP/IP), and even some ULP on top of TCP (e.g., MPA/DDP).  As
   network speed increases, there is an increasing desire to use a
   hardware-based receiver in order to achieve an efficient high
   performance solution.

   A TCP receiver, under worst-case conditions, has to allocate buffers
   (BufferSizeTCP) whose capacities are a function of the bandwidth-
   delay product.  Thus:

       BufferSizeTCP = K * bandwidth [octets/second] * Delay [seconds]

   Where bandwidth is the end-to-end bandwidth of the connection, delay
   is the round-trip delay of the connection, and K is an
   implementation-dependent constant.

   Thus, BufferSizeTCP scales with the end-to-end bandwidth (10x more
   buffers for a 10x increase in end-to-end bandwidth).  As this
   buffering approach may scale poorly for hardware or software
   implementations alike, several approaches allow reduction in the
   amount of buffering required for high-speed TCP communication.
   The MPA/DDP approach is to enable the ULP's Buffer to be used as the
   TCP receive buffer.  If the application pre-posts a sufficient amount
   of buffering, and each TCP segment has sufficient information to
   place the payload into the right application buffer, when an out-of-
   order TCP segment arrives it could potentially be placed directly in
   the ULP Buffer.  However, placement can only be done when a complete
   FPDU with the placement information is available to the receiver, and
   the FPDU contents contain enough information to place the data into
   the correct ULP Buffer (e.g., there is a DDP header available).

   For the case when the FPDU is not aligned with the TCP segment, it
   may take, on average, 2 TCP segments to assemble one FPDU.
   Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size,
   Non-Aligned FPDU) octets:

       BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS

   Where K1 and K2 are implementation-dependent constants and EMSS is
   the effective maximum segment size.

   For example, a 1 GB/sec link with 10,000 connections and an EMSS of
   1500 B would require 15 MB of memory.  Often the number of
   connections used scales with the network speed, aggravating the
   situation for higher speeds.

   FPDU Alignment would allow the receiver to allocate BufferSizeAF
   (Buffer Size, Aligned FPDU) octets:

       BufferSizeAF = K2 * EMSS

   for the same conditions.  An FPDU Aligned receiver may require memory
   in the range of ~100s of KB -- which is feasible for an on-chip
   memory and enables a "flow-through" design, in which the data flows
   through the network interface card (NIC) and is placed directly in
   the destination buffer.  Assuming most of the connections support
   FPDU Alignment, the receiver buffers no longer scale with number of
   connections.
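
   The three buffer-size expressions above can be compared with a small
   numeric Python sketch; K, K1, and K2 are set to 1 purely for
   illustration (real values are implementation dependent).

     def buffer_size_tcp(bandwidth_octets_per_s, delay_s, k=1):
         return k * bandwidth_octets_per_s * delay_s

     def buffer_size_naf(emss, connections, k1=1, k2=1):
         return k1 * emss * connections + k2 * emss

     def buffer_size_af(emss, k2=1):
         return k2 * emss

     # The example above: 10,000 connections with an EMSS of 1500 octets.
     print(buffer_size_naf(1500, 10_000))   # 15_001_500 octets, ~15 MB
     print(buffer_size_af(1500))            # 1_500 octets times K2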

   Additional optimizations can be achieved in a balanced I/O sub-system
   -- where the system interface of the network controller provides
   ample bandwidth as compared with the network bandwidth.  For almost
   twenty years this has been the case and the trend is expected to
   continue.  While Ethernet speeds have scaled by 1000 (from 10
   megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume CPU
   architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus to
   PCI-X DDR).  Under these conditions, the FPDU Alignment approach
   allows BufferSizeAF to be indifferent to network speed.  It is
   primarily a function of the local processing time for a given frame.
   Thus, when the FPDU Alignment approach is used, receive buffering is
   expected to scale gracefully (i.e., less than linear scaling) as
   network speed is increased.

B.2.1. Impact of Lack of FPDU Alignment on the Receiver Computational Load and Complexity

   The receiver must perform IP and TCP processing, and then perform
   FPDU CRC checks, before it can trust the FPDU header placement
   information.  For simplicity of the description, the assumption is
   that an FPDU is carried in no more than 2 TCP segments.  In reality,
   with no FPDU Alignment, an FPDU can be carried by more than 2 TCP
   segments (e.g., if the path MTU was reduced).

    ----++-----------------------------++-----------------------++------
    +---||---------------+    +--------||--------+    +---------||-----+
    |   TCP Seg X-1      |    |    TCP Seg X     |    |  TCP Seg X+1   |
    +---||---------------+    +--------||--------+    +---------||-----+
    ----++-----------------------------++-----------------------++------
        FPDU #N-1                      FPDU #N

      Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream

   The receiver algorithm for processing TCP segments (e.g., TCP
   segment #X in Figure 12) carrying non-aligned FPDUs (in order or out
   of order) includes the following steps (a simplified sketch of the
   in-order case follows the list):

        Data Link Layer processing (whole frame) -- typically including
        a CRC calculation.

        1.  Network Layer processing (assuming not an IP fragment, the
            whole Data Link Layer frame contains one IP datagram.  IP
            fragments should be reassembled in a local buffer.  This is
            not a performance optimization goal.)

        2.  Transport Layer processing -- TCP protocol processing,
            header and checksum checks.

            a.  Classify incoming TCP segment using the 5 tuple (IP
                SRC, IP DST, TCP SRC Port, TCP DST Port, protocol).
       3.  Find FPDU message boundaries.

           a.  Get MPA state information for the connection.

               If the TCP segment is in order, use the receiver-managed
               MPA state information to calculate where the previous
               FPDU message (#N-1) ends in the current TCP segment X.
               (previously, when the MPA receiver processed the first
               part of FPDU #N-1, it calculated the number of bytes
               remaining to complete FPDU #N-1 by using the MPA Length
               field).

                   Get the stored partial CRC for FPDU #N-1.

                   Complete CRC calculation for FPDU #N-1 data (first
                       portion of TCP segment #X).

                   Check CRC calculation for FPDU #N-1.

                   If no FPDU CRC errors, placement is allowed.

                   Locate the local buffer for the first portion of
                       FPDU#N-1, CopyData(local buffer of first portion
                       of FPDU #N-1, host buffer address, length).

                   Compute host buffer address for second portion of
                       FPDU #N-1.

                   CopyData (local buffer of second portion of FPDU #N-
                       1, host buffer address for second portion,
                       length).

                   Calculate the octet offset into the TCP segment for
                       the next FPDU #N.

                    Start calculation of CRC for available data for
                        FPDU #N.

                   Store partial CRC results for FPDU #N.

                   Store local buffer address of first portion of FPDU
                       #N.

                   No further action is possible on FPDU #N, before it
                       is completely received.
               If the TCP segment is out of order, the receiver must
               buffer the data until at least one complete FPDU is
               received.  Typically, buffering for more than one TCP
               segment per connection is required.  Use the MPA-based
               Markers to calculate where FPDU boundaries are.

                   When a complete FPDU is available, a similar
                   procedure to the in-order algorithm above is used.
                   There is additional complexity, though, because when
                   the missing segment arrives, this TCP segment must be
                   run through the CRC engine after the CRC is
                   calculated for the missing segment.
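
   As an informal illustration of the in-order case above, the Python
   sketch below carries a partial FPDU across TCP segment boundaries
   and uses the ULPDU_Length field to find FPDU boundaries.  It
   simplifies the description in two ways worth stating: it buffers the
   whole partial FPDU instead of keeping a partial CRC, and it ignores
   Markers.  The crc32c and deliver_fpdu hooks are assumptions of the
   example, not interfaces defined by this specification.

     class InOrderFpduTracker:
         HDR = 2   # ULPDU_Length field
         CRC = 4   # trailing CRC32c field

         def __init__(self, crc32c, deliver_fpdu):
             self.crc32c = crc32c            # crc32c(data) -> int
             self.deliver = deliver_fpdu     # placement hook
             self.buf = bytearray()          # partial FPDU carried over

         def on_in_order_payload(self, data):
             self.buf += data
             while len(self.buf) >= self.HDR:
                 ulpdu_len = int.from_bytes(self.buf[:2], "big")
                 pad = (4 - (self.HDR + ulpdu_len) % 4) % 4
                 fpdu_len = self.HDR + ulpdu_len + pad + self.CRC
                 if len(self.buf) < fpdu_len:
                     return                  # FPDU incomplete; wait for more
                 fpdu = bytes(self.buf[:fpdu_len])
                 del self.buf[:fpdu_len]
                 if self.crc32c(fpdu[:-4]) != int.from_bytes(fpdu[-4:], "big"):
                     raise ValueError("FPDU CRC failure (handling not shown)")
                 self.deliver(fpdu)          # placement is allowed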

   If we assume FPDU Alignment, the following diagram and the algorithm
   below apply.  Note that when using MPA, the receiver is assumed to
   actively detect presence or loss of FPDU Alignment for every TCP
   segment received.

      +--------------------------+      +--------------------------+
   +--|--------------------------+   +--|--------------------------+
   |  |       TCP Seg X          |   |  |         TCP Seg X+1      |
   +--|--------------------------+   +--|--------------------------+
      +--------------------------+      +--------------------------+
                FPDU #N                          FPDU #N+1

      Figure 13: Aligned FPDU Placed Immediately after TCP Header
   The receiver algorithm for FPDU Aligned frames (in order or out of
   order) includes:

       1)  Data Link Layer processing (whole frame) -- typically
           including a CRC calculation.

       2)  Network Layer processing (assuming not an IP fragment, the
           whole Data Link Layer frame contains one IP datagram.  IP
           fragments should be reassembled in a local buffer.  This is
           not a performance optimization goal.)

       3)  Transport Layer processing -- TCP protocol processing, header
           and checksum checks.

           a.  Classify incoming TCP segment using the 5 tuple (IP SRC,
               IP DST, TCP SRC Port, TCP DST Port, protocol).

       4)  Check for Header Alignment (described in detail in Section
           6).  Assuming Header Alignment for the rest of the algorithm
           below.

           a.  If the header is not aligned, see the algorithm defined
               in the prior section.

       5)  If TCP segment is in order or out of order, the MPA header is
           at the beginning of the current TCP payload.  Get the FPDU
           length from the FPDU header.

       6)  Calculate CRC over FPDU.

       7)  Check CRC calculation for FPDU #N.

       8)  If no FPDU CRC errors, placement is allowed.

       9)  CopyData(TCP segment #X, host buffer address, length).

       10) Loop to #5 until all the FPDUs in the TCP segment are
           consumed in order to handle FPDU packing.
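
   The steps above reduce to a small, stateless loop over the TCP
   payload.  The Python sketch below illustrates steps 5 through 10 for
   a single segment, in order or out of order; as before, crc32c,
   place, and fallback are assumed hooks, Markers are ignored, and real
   MPA error handling is not shown.

     def process_aligned_segment(payload, crc32c, place, fallback):
         # Steps 5-10 for one TCP segment whose payload begins with an
         # MPA header.  No cross-segment reassembly state is needed.
         off = 0
         while off + 2 <= len(payload):
             ulpdu_len = int.from_bytes(payload[off:off + 2], "big")
             pad = (4 - (2 + ulpdu_len) % 4) % 4
             end = off + 2 + ulpdu_len + pad + 4   # FPDU incl. trailing CRC
             if end > len(payload):
                 fallback(payload[off:])           # containment lost
                 return
             fpdu = payload[off:end]
             if crc32c(fpdu[:-4]) != int.from_bytes(fpdu[-4:], "big"):
                 raise ValueError("FPDU CRC failure (handling not shown)")
             place(fpdu)                           # step 9: copy to host buffer
             off = end                             # step 10: handle packing
         if off != len(payload):
             fallback(payload[off:])               # trailing partial header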

   Implementation note: In both cases, the receiver has to classify the
   incoming TCP segment and associate it with one of the flows it
   maintains.  In the case of no FPDU Alignment, the receiver is forced
   to classify incoming traffic before it can calculate the FPDU CRC.
   In the case of FPDU Alignment, the operations order is left to the
   implementer.
   The FPDU Aligned receiver algorithm is significantly simpler.  There
   is no need to locally buffer portions of FPDUs.  Accessing state
   information is also substantially simplified -- the normal case does
   not require retrieving information to find out where an FPDU starts
   and ends or retrieval of a partial CRC before the CRC calculation can
   commence.  This avoids adding internal latencies, having multiple
   data passes through the CRC machine, or scheduling multiple commands
   for moving the data to the host buffer.

   The aligned FPDU approach is useful for in-order and out-of-order
   reception.  The receiver can use the same mechanisms for data storage
   in both cases, and only needs to account for when all the TCP
   segments have arrived to enable Delivery.  The Header Alignment,
   along with the high probability that at least one complete FPDU is
   found with every TCP segment, allows the receiver to perform data
   placement for out-of-order TCP segments with no need for intermediate
   buffering.  Essentially, the TCP receive buffer has been eliminated
   and TCP reassembly is done in place within the ULP Buffer.

   In case FPDU Alignment is not found, the receiver should follow the
   algorithm for non-aligned FPDU reception, which may be slower and
   less efficient.

B.2.2. FPDU Alignment Effects on TCP Wire Protocol

   In an optimized MPA/TCP implementation, TCP exposes its EMSS to MPA.
   MPA uses the EMSS to calculate its MULPDU, which it then exposes to
   DDP, its ULP.  DDP uses the MULPDU to segment its payload so that
   each FPDU sent by MPA fits completely into one TCP segment.  This
   has no impact on the wire protocol, and exposing this information is
   already supported on many TCP implementations, including all modern
   flavors of BSD networking, through the TCP_MAXSEG socket option.

   In the common case, the ULP (i.e., DDP over MPA) messages provided
   to the TCP layer are segmented to MULPDU size.  It is assumed that
   the ULP message size is bounded by MULPDU, such that a single ULP
   message can be encapsulated in a single TCP segment.  Therefore, in
   the common case, there is no increase in the number of TCP segments
   emitted.  For smaller ULP messages, the sender can also apply
   packing, i.e., the sender packs as many complete FPDUs as possible
   into one TCP segment.  The requirement to always have a complete
   FPDU may increase the number of TCP segments emitted.  Typically, a
   ULP message size varies from a few bytes to multiple EMSSs (e.g., 64
   Kbytes).  In some cases, the ULP may post more than one message at a
   time for transmission, giving the sender an opportunity for packing.

   In the case where more than one FPDU is available for transmission
   and the FPDUs are encapsulated into a TCP segment and there is no
   room in the TCP segment to include the next complete FPDU, another
   TCP segment is sent.  In this corner case, some of the TCP segments
   are not full size.  In the worst-case scenario, the ULP may choose an
   FPDU size that is EMSS/2 +1 and has multiple messages available for
   transmission.  For this poor choice of FPDU size, the average TCP
   segment size is therefore about 1/2 of the EMSS and the number of TCP
   segments emitted is approaching 2x of what is possible without the
   requirement to encapsulate an integer number of complete FPDUs in
   every TCP segment.  This is a dynamic situation that only lasts for
   the duration where the sender ULP has multiple non-optimal messages
   for transmission and this causes a minor impact on the wire
   utilization.
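
   The EMSS-to-MULPDU relationship described at the start of this
   section can be illustrated with a rough Python sketch.  The overhead
   accounting below (fixed FPDU overhead plus a pessimistic Marker
   estimate of one 4-octet Marker per 512 octets) is an approximation
   for illustration only; the normative MULPDU rule is the one given in
   the main body of this specification.

     def rough_mulpdu(emss, markers_enabled):
         overhead = 2 + 4 + 3                  # length + CRC + worst-case pad
         if markers_enabled:
             overhead += 4 * (emss // 512 + 1) # pessimistic Marker count
         return emss - overhead

     print(rough_mulpdu(1460, markers_enabled=True))   # typical Ethernet EMSS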

   However, it is not expected that requiring FPDU Alignment will have a
   measurable impact on wire behavior of most applications.  Throughput
   applications with large I/Os are expected to take full advantage of
   the EMSS.  Another class of applications with many small outstanding
   buffers (as compared to EMSS) is expected to use packing when
   applicable.  Transaction-oriented applications are also optimal.

   TCP retransmission is another area that can affect sender behavior.
   TCP supports retransmission of the exact, originally transmitted
   segment (see [RFC793], Sections 2.6 and 3.7 (under "Managing the
   Window") and [RFC1122], Section 4.2.2.15).  In the unlikely event
   that part of the original segment has been received and acknowledged
   by the Remote Peer (e.g., a re-segmenting middlebox, as documented in
   Appendix A.4, Re-Segmenting Middleboxes and Non-Optimized MPA/TCP
   Senders), a better available bandwidth utilization may be possible by
   retransmitting only the missing octets.  If an optimized MPA/TCP
   retransmits complete FPDUs, there may be some marginal bandwidth
   loss.

   Another area where a change in the TCP segment number may have impact
   is that of slow start and congestion avoidance.  Slow-start
   exponential increase is measured in segments per second, as the
   algorithm focuses on the overhead per segment at the source for
   congestion that eventually results in dropped segments.  Slow-start
   exponential bandwidth growth for optimized MPA/TCP is similar to any
   TCP implementation.  Congestion avoidance allows for a linear growth
   in available bandwidth when recovering after a packet drop.  Similar
   to the analysis for slow start, optimized MPA/TCP doesn't change the
   behavior of the algorithm.  Therefore, the average size of the
   segment versus EMSS is not a major factor in the assessment of the
   bandwidth growth for a sender.  Both slow start and congestion
   avoidance for an optimized MPA/TCP will behave similarly to any TCP
   sender and allow an optimized MPA/TCP to enjoy the theoretical
   performance limits of the algorithms.
   In summary, the ULP messages generated at the sender (e.g., the
   amount of messages grouped for every transmission request) and
   message size distribution has the most significant impact over the
   number of TCP segments emitted.  The worst-case effect for certain
   ULPs (with average message size of EMSS/2+1 to EMSS) is bounded by an
   increase of up to 2x in the number of TCP segments and
   acknowledgements.
   In reality, the effect is expected to be marginal.

Appendix C. IETF Implementation Interoperability with RDMA Consortium Protocols

   This appendix is for information only and is NOT part of the
   standard.

   This appendix covers methods of making MPA implementations
   interoperate with both IETF and RDMA Consortium versions of the
   protocols.

   The RDMA Consortium created early specifications of the MPA/DDP/RDMA
   protocols, and some manufacturers created implementations of those
   protocols before the IETF versions were finalized.  These protocols
   are very similar to the IETF versions, making it possible for
   implementations to be created or modified to support either set of
   specifications.

   For those interested, the RDMA Consortium protocol documents
   (draft-culley-iwarp-mpa-v1.0.pdf [RDMA-MPA],
   draft-shah-iwarp-ddp-v1.0.pdf [RDMA-DDP], and
   draft-recio-iwarp-rdmac-v1.0.pdf [RDMA-RDMAC]) can be obtained at
   http://www.rdmaconsortium.org/home.

   In this section, implementations of MPA/DDP/RDMA that conform to the
   RDMAC specifications are called RDMAC RNICs.  Implementations of
   MPA/DDP/RDMA that conform to the IETF RFCs are called IETF RNICs.

   Without the exchange of MPA Request/Reply Frames, there is no
   standard mechanism for enabling RDMAC RNICs to interoperate with
   IETF RNICs.  Even if a ULP uses a well-known port to start an IETF
   RNIC immediately in RDMA mode (i.e., without exchanging the MPA
   Request/Reply messages), there is no reason to believe an IETF RNIC
   will interoperate with an RDMAC RNIC because of the differences in
   the version number in the DDP and RDMAP headers on the wire.
   Therefore, the ULP or other supporting entity at the RDMAC RNIC must
   implement MPA Request/Reply Frames on behalf of the RNIC in order to
   negotiate the connection parameters.  The following section
   describes the results following the exchange of the MPA
   Request/Reply Frames before the conversion from streaming to RDMA
   mode.

C.1. Negotiated Parameters

   Three types of RNICs are considered:

   Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols that
       has a ULP or other supporting entity that exchanges the MPA
       Request/Reply Frames in streaming mode before the conversion to
       RDMA mode.

   Non-permissive IETF RNIC - an RNIC implementing the IETF protocols
       that is not capable of implementing the RDMAC protocols.  Such
       an RNIC can only interoperate with other IETF RNICs.

   Permissive IETF RNIC - an RNIC implementing the IETF protocols that
       is capable of implementing the RDMAC protocols on a
       per-connection basis.

   The Permissive IETF RNIC is recommended for those implementers that
   want maximum interoperability with other RNIC implementations.

   The values used by these three RNIC types for the MPA, DDP, and
   RDMAP versions as well as MPA Markers and CRC are summarized in
   Figure 14.

    +----------------++-----------+-----------+-----------+-----------+
    | RNIC TYPE      || DDP/RDMAP |    MPA    |    MPA    |    MPA    |
    |                ||  Version  |  Revision |  Markers  |    CRC    |
    +----------------++-----------+-----------+-----------+-----------+
    +----------------++-----------+-----------+-----------+-----------+
    | RDMAC          ||     0     |     0     |     1     |     1     |
    |                ||           |           |           |           |
    +----------------++-----------+-----------+-----------+-----------+
    | IETF           ||     1     |     1     |   0 or 1  |   0 or 1  |
    | Non-permissive ||           |           |           |           |
    +----------------++-----------+-----------+-----------+-----------+
    | IETF           ||   1 or 0  |   1 or 0  |   0 or 1  |   0 or 1  |
    | Permissive     ||           |           |           |           |
    +----------------++-----------+-----------+-----------+-----------+

          Figure 14: Connection Parameters for the RNIC Types
          (for MPA Markers and MPA CRC, enabled=1, disabled=0)

   It is assumed there is no mixing of versions allowed between MPA,
   DDP, and RDMAP.  The RNIC either generates the RDMAC protocols on
   the wire (version is zero) or uses the IETF protocols (version is
   one).
   During the exchange of the MPA Request/Reply Frames, each peer
   provides its MPA Revision, Marker preference (M: 0=disabled,
   1=enabled), and CRC preference.  The MPA Revision provided in the MPA
   Request Frame and the MPA Reply Frame may differ.

   From the information in the MPA Request/Reply Frames, each side sets
   the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as
   well as the state of the Markers for each half connection.  Between
   DDP and RDMAP, no mixing of versions is allowed.  Moreover, the DDP
   and RDMAP version MUST be identical in the two directions.  The RNIC
   either generates the RDMAC protocols on the wire (version is zero) or
   uses the IETF protocols (version is one).

   In the following sections, the figures do not discuss CRC negotiation
   because there is no interoperability issue for CRCs.  Since the RDMAC
   RNIC will always request CRC use, then, according to the IETF MPA
   specification, both peers MUST generate and check CRCs.
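
   The way each side settles on the DDP/RDMAP version and the
   per-direction Marker states, as described above and tabulated in
   Figure 14 and in Figures 15-17 below, can be condensed into a small
   Python decision sketch; the type names and the return shape are
   illustrative only.

     def mpa_negotiation_outcome(init_type, init_m, resp_type, resp_m):
         """init_type/resp_type: 'rdmac', 'ietf_nonperm', or 'ietf_perm'.
         init_m/resp_m: the Marker bit each side would place in its MPA
         Request/Reply Frame.  Returns (version, markers_toward_initiator,
         markers_toward_responder) or 'close'."""
         init_rev = 0 if init_type == "rdmac" else 1
         if resp_type == "rdmac":
             resp_rev, resp_m = 0, 1            # RDMAC always requests Markers
         elif resp_type == "ietf_perm" and init_rev == 0:
             resp_rev, resp_m = 0, 1            # downgrade to RDMAC behavior
         else:
             resp_rev = 1
         if init_type == "ietf_perm" and resp_rev == 0:
             init_rev, init_m = 0, 1            # Initiator downgrades on Reply
         if init_rev != resp_rev:
             return "close"                     # incompatible revisions
         return (init_rev, init_m, resp_m)      # 0 = RDMAC wire, 1 = IETF

     # Example: Permissive IETF Initiator (no Markers requested) with an
     # RDMAC Responder ends up at V=0, M=1/1 (see Figure 16).
     print(mpa_negotiation_outcome("ietf_perm", 0, "rdmac", 1))  # (0, 1, 1)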

C.2. RDMAC RNIC and Non-Permissive IETF RNIC

   Figure 15 shows that a Non-permissive IETF RNIC cannot interoperate
   with an RDMAC RNIC, despite the fact that both peers exchange MPA
   Request/Reply Frames.  For a Non-permissive IETF RNIC, the MPA
   negotiation has no effect on the DDP/RDMAP version and it is unable
   to interoperate with the RDMAC RNIC.

   The rows in the figure show the state of the Marker field in the MPA
   Request Frame sent by the MPA Initiator.  The columns show the state
   of the Marker field in the MPA Reply Frame sent by the MPA
   Responder.  Each type of RNIC is shown as an Initiator and a
   Responder.

   The connection results are shown in the lower right corner, at the
   intersection of the different RNIC types, where V=0 is the RDMAC
   DDP/RDMAP version, V=1 is the IETF DDP/RDMAP version, M=0 means MPA
   Markers are disabled, and M=1 means MPA Markers are enabled.  The
   negotiated Marker state is shown as X/Y, for the receive direction
   of the Initiator/Responder.
          +---------------------------++-----------------------+
          |   MPA                     ||          MPA          |
          | CONNECT                   ||       Responder       |
          |   MODE  +-----------------++-------+---------------+
          |         |   RNIC          || RDMAC |     IETF      |
          |         |   TYPE          ||       | Non-permissive|
          |         |          +------++-------+-------+-------+
          |         |          |MARKER|| M=1   | M=0   |  M=1  |
          +---------+----------+------++-------+-------+-------+
          +---------+----------+------++-------+-------+-------+
          |         |   RDMAC  | M=1  || V=0   | close | close |
          |         |          |      || M=1/1 |       |       |
          |         +----------+------++-------+-------+-------+
          |   MPA   |          | M=0  || close | V=1   | V=1   |
          |Initiator|   IETF   |      ||       | M=0/0 | M=0/1 |
          |         |Non-perms.+------++-------+-------+-------+
          |         |          | M=1  || close | V=1   | V=1   |
          |         |          |      ||       | M=1/0 | M=1/1 |
          +---------+----------+------++-------+-------+-------+

           Figure 15: MPA Negotiation between an RDMAC RNIC and
                      a Non-Permissive IETF RNIC

C.2.1. RDMAC RNIC Initiator

If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request Frame with Rev field set to zero and the M and C bits set to one. Because the Non-permissive IETF RNIC cannot dynamically downgrade the version number it uses for DDP and RDMAP, it would send an MPA Reply Frame with the Rev field equal to one and then gracefully close the connection.

C.2.2. Non-Permissive IETF RNIC Initiator

If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA Request Frame with Rev field equal to one. The ULP or supporting entity for the RDMAC RNIC responds with an MPA Reply Frame that has the Rev field equal to zero and the M bit set to one. The Non-permissive IETF RNIC will gracefully close the connection after it reads the incompatible Rev field in the MPA Reply Frame.

C.2.3. RDMAC RNIC and Permissive IETF RNIC

Figure 16 shows that a Permissive IETF RNIC can interoperate with an RDMAC RNIC regardless of its Marker preference. The figure uses the same format as shown with the Non-permissive IETF RNIC.
          +---------------------------++-----------------------+
          |   MPA                     ||          MPA          |
          | CONNECT                   ||       Responder       |
          |   MODE  +-----------------++-------+---------------+
          |         |   RNIC          || RDMAC |     IETF      |
          |         |   TYPE          ||       |  Permissive   |
          |         |          +------++-------+-------+-------+
          |         |          |MARKER|| M=1   | M=0   | M=1   |
          +---------+----------+------++-------+-------+-------+
          +---------+----------+------++-------+-------+-------+
          |         |   RDMAC  | M=1  || V=0   | N/A   | V=0   |
          |         |          |      || M=1/1 |       | M=1/1 |
          |         +----------+------++-------+-------+-------+
          |   MPA   |          | M=0  || V=0   | V=1   | V=1   |
          |Initiator|   IETF   |      || M=1/1 | M=0/0 | M=0/1 |
          |         |Permissive+------++-------+-------+-------+
          |         |          | M=1  || V=0   | V=1   | V=1   |
          |         |          |      || M=1/1 | M=1/0 | M=1/1 |
          +---------+----------+------++-------+-------+-------+

           Figure 16: MPA Negotiation between an RDMAC RNIC and
                         a Permissive IETF RNIC

   A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the
   Rev field of the MPA Req/Rep Frames and then adjust its receive
   Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC.  As
   a result, as an MPA Responder, the Permissive IETF RNIC will never
   return an MPA Reply Frame with the M bit set to zero.  This case is
   shown as a not applicable (N/A) in Figure 16.

C.2.4. RDMAC RNIC Initiator

When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting entity prepares an MPA Request message and sets the revision to zero and the M bit and C bit to one. The Permissive IETF Responder receives the MPA Request message and checks the revision field. Since it is capable of generating RDMAC DDP/RDMAP headers, it sends an MPA Reply message with revision set to zero and the M and C bits set to one. The Responder must inform its ULP that it is generating version zero DDP/RDMAP messages.

C.2.5. Permissive IETF RNIC Initiator

If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA Request Frame setting the Rev field to one. Regardless of the value of the M bit in the MPA Request Frame, the ULP or other supporting entity for the RDMAC RNIC will create an MPA Reply Frame with Rev equal to zero and the M bit set to one. When the Initiator reads the Rev field of the MPA Reply Frame and finds that its peer is an RDMAC RNIC, it must inform its ULP that it should generate version zero DDP/RDMAP messages and enable MPA Markers and CRC.

C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC

   For completeness, Figure 17 below shows the results of MPA
   negotiation between a Non-permissive IETF RNIC and a Permissive IETF
   RNIC.  The important point from this figure is that an IETF RNIC
   cannot detect whether its peer is a Permissive or Non-permissive
   RNIC.

          +---------------------------++-------------------------------+
          |   MPA                     ||              MPA              |
          | CONNECT                   ||           Responder           |
          |   MODE  +-----------------++---------------+---------------+
          |         |   RNIC          ||     IETF      |     IETF      |
          |         |   TYPE          || Non-permissive|  Permissive   |
          |         |          +------++-------+-------+-------+-------+
          |         |          |MARKER|| M=0   | M=1   | M=0   | M=1   |
          +---------+----------+------++-------+-------+-------+-------+
          +---------+----------+------++-------+-------+-------+-------+
          |         |          | M=0  || V=1   | V=1   | V=1   | V=1   |
          |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
          |         |Non-perms.+------++-------+-------+-------+-------+
          |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
          |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
          |   MPA   +----------+------++-------+-------+-------+-------+
          |Initiator|          | M=0  || V=1   | V=1   | V=1   | V=1   |
          |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
          |         |Permissive+------++-------+-------+-------+-------+
          |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
          |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
          +---------+----------+------++-------+-------+-------+-------+

         Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC
                           and a Permissive IETF RNIC

Normative References

   [iSCSI]      Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M.,
                and E. Zeidner, "Internet Small Computer Systems
                Interface (iSCSI)", RFC 3720, April 2004.

   [RFC1191]    Mogul, J. and S. Deering, "Path MTU discovery",
                RFC 1191, November 1990.

   [RFC2018]    Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow,
                "TCP Selective Acknowledgment Options", RFC 2018,
                October 1996.

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for
                the Internet Protocol", RFC 2401, November 1998.

   [RFC3723]    Aboba, B., Tseng, J., Walker, J., Rangan, V., and F.
                Travostino, "Securing Block Storage Protocols over IP",
                RFC 3723, April 2004.

   [RFC793]     Postel, J., "Transmission Control Protocol", STD 7,
                RFC 793, September 1981.

   [RDMASEC]    Pinkerton, J. and E. Deleganes, "Direct Data Placement
                Protocol (DDP) / Remote Direct Memory Access Protocol
                (RDMAP) Security", RFC 5042, October 2007.

Informative References

   [APPL]       Bestler, C. and L. Coene, "Applicability of Remote
                Direct Memory Access Protocol (RDMA) and Direct Data
                Placement (DDP)", RFC 5045, October 2007.

   [CRCTCP]     Stone, J. and C. Partridge, "When the CRC and TCP
                checksum disagree", ACM SIGCOMM, September 2000.

   [DAT-API]    DAT Collaborative, "kDAPL (Kernel Direct Access
                Programming Library) and uDAPL (User Direct Access
                Programming Library)", http://www.datcollaborative.org.

   [DDP]        Shah, H., Pinkerton, J., Recio, R., and P. Culley,
                "Direct Data Placement over Reliable Transports",
                RFC 5041, October 2007.
   [iSER]       Ko, M., Chadalapaka, M., Hufferd, J., Elzur, U., Shah,
                H., and P. Thaler, "Internet Small Computer System
                Interface (iSCSI) Extensions for Remote Direct Memory
                Access (RDMA)", RFC 5046, October 2007.

   [IT-API]     The Open Group, "Interconnect Transport API (IT-API)",
                Version 2.1, http://www.opengroup.org.

   [NFSv4CHAN]  Williams, N., "On the Use of Channel Bindings to Secure
                Channels", Work in Progress, June 2006.

   [RDMA-DDP]   "Direct Data Placement over Reliable Transports (Version
                1.0)", RDMA Consortium, October 2002,
                <http://www.rdmaconsortium.org/home/draft-shah-iwarp-
                ddp-v1.0.pdf>.

   [RDMA-MPA]   "Marker PDU Aligned Framing for TCP Specification
                (Version 1.0)", RDMA Consortium, October 2002,
                <http://www.rdmaconsortium.org/home/draft-culley-iwarp-
                mpa-v1.0.pdf>.

   [RDMA-RDMAC] "An RDMA Protocol Specification (Version 1.0)", RDMA
                Consortium, October 2002,
                <http://www.rdmaconsortium.org/home/draft-recio-iwarp-
                rdmac-v1.0.pdf>.

   [RDMAP]      Recio, R., Culley, P., Garcia, D., Hilland, J., and B.
                Metzler, "A Remote Direct Memory Access Protocol
                Specification", RFC 5040, October 2007.

   [RFC792]     Postel, J., "Internet Control Message Protocol", STD 5,
                RFC 792, September 1981.

   [RFC896]     Nagle, J., "Congestion control in IP/TCP internetworks",
                RFC 896, January 1984.

   [RFC1122]    Braden, R., "Requirements for Internet Hosts -
                Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC4960]    Stewart, R., Ed., "Stream Control Transmission
                Protocol", RFC 4960, September 2007.

   [RFC4296]    Bailey, S. and T. Talpey, "The Architecture of Direct
                Data Placement (DDP) and Remote Direct Memory Access
                (RDMA) on Internet Protocols", RFC 4296, December 2005.
   [RFC4297]    Romanow, A., Mogul, J., Talpey, T., and S. Bailey,
                "Remote Direct Memory Access (RDMA) over IP Problem
                Statement", RFC 4297, December 2005.

   [RFC4301]    Kent, S. and K. Seo, "Security Architecture for the
                Internet Protocol", RFC 4301, December 2005.

   [VERBS-RMDA] "RDMA Protocol Verbs Specification", RDMA Consortium
                standard, April 2003, <http://www.rdmaconsortium.org/
                home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf>.

Contributors

   Dwight Barron
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: 281-514-2769
   EMail: dwight.barron@hp.com

   Jeff Chase
   Department of Computer Science
   Duke University
   Durham, NC 27708-0129 USA
   Phone: +1 919 660 6559
   EMail: chase@cs.duke.edu

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709 USA
   Phone: 919-248-6075
   EMail: compton_ted@emc.com

   Dave Garcia
   24100 Hutchinson Rd.
   Los Gatos, CA 95033
   Phone: 831 247 4464
   EMail: Dave.Garcia@StanfordAlumni.org

   Hari Ghadia
   Gen10 Technology, Inc.
   1501 W Shady Grove Road
   Grand Prairie, TX 75050
   Phone: (972) 301 3630
   EMail: hghadia@gen10technology.com
   Howard C. Herbert
   Intel Corporation
   MS CH7-404
   5000 West Chandler Blvd.
   Chandler, AZ 85226
   Phone: 480-554-3116
   EMail: howard.c.herbert@intel.com

   Jeff Hilland
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: 281-514-9489
   EMail: jeff.hilland@hp.com

   Mike Ko
   IBM
   650 Harry Rd.
   San Jose, CA 95120
   Phone: (408) 927-2085
   EMail: mako@us.ibm.com

   Mike Krause
   Hewlett-Packard Corporation, 43LN
   19410 Homestead Road
   Cupertino, CA 95014 USA
   Phone: +1 (408) 447-3191
   EMail: krause@cup.hp.com

   Dave Minturn
   Intel Corporation
   MS JF1-210
   5200 North East Elam Young Parkway
   Hillsboro, Oregon  97124
   Phone: 503-712-4106
   EMail: dave.b.minturn@intel.com

   Jim Pinkerton
   Microsoft, Inc.
   One Microsoft Way
   Redmond, WA 98052 USA
   EMail: jpink@microsoft.com
   Hemal Shah
   Broadcom Corporation
   5300 California Avenue
   Irvine, CA 92617 USA
   Phone: +1 (949) 926-6941
   EMail: hemal@broadcom.com

   Allyn Romanow
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134 USA
   Phone: +1 408 525 8836
   EMail: allyn@cisco.com

   Tom Talpey
   Network Appliance
   1601 Trapelo Road #16
   Waltham, MA  02451 USA
   Phone: +1 (781) 768-5329
   EMail: thomas.talpey@netapp.com

   Patricia Thaler
   Broadcom
   16215 Alton Parkway
   Irvine, CA 92618
   Phone: 916 570 2707
   EMail: pthaler@broadcom.com

   Jim Wendt
   Hewlett Packard Corporation
   8000 Foothills Boulevard MS 5668
   Roseville, CA 95747-5668 USA
   Phone: +1 916 785 5198
   EMail: jim_wendt@hp.com

   Jim Williams
   Emulex Corporation
   580 Main Street
   Bolton, MA 01740 USA
   Phone: +1 978 779 7224
   EMail: jim.williams@emulex.com

Authors' Addresses

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: 281-514-5543
   EMail: paul.culley@hp.com

   Uri Elzur
   5300 California Avenue
   Irvine, CA 92617, USA
   Phone: 949.926.6432
   EMail: uri@broadcom.com

   Renato J Recio
   IBM
   Internal Zip 9043
   11400 Burnett Road
   Austin, Texas 78759
   Phone: 512-838-3685
   EMail: recio@us.ibm.com

   Stephen Bailey
   Sandburst Corporation
   600 Federal Street
   Andover, MA 01810 USA
   Phone: +1 978 689 1614
   EMail: steph@sandburst.com

   John Carrier
   Cray Inc.
   411 First Avenue S, Suite 600
   Seattle, WA 98104-2860
   Phone: 206-701-2090
   EMail: carrier@cray.com
Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.