Appendix A. Optimized MPA-Aware TCP Implementations
This appendix is for information only and is NOT part of the standard.

This appendix provides guidance to implementers on optimized MPA-aware TCP implementations. It is intended for those implementations that want to send and receive as much traffic as possible in an aligned and zero-copy fashion.

   +-----------------------------------+
   | +-----------+ +-----------------+ |
   | | Optimized | | Other Protocols | |
   | |  MPA/TCP  | +-----------------+ |
   | +-----------+         ||          |
   |      \\      --- socket API ---   |
   |       \\              ||          |
   |        \\          +-----+        |
   |         \\         | TCP |        |
   |          \\        +-----+        |
   |           \\         //           |
   |          +-------+                |
   |          |  IP   |                |
   |          +-------+                |
   +-----------------------------------+

       Figure 11: Optimized MPA/TCP Implementation

The diagram above shows a block diagram of a potential implementation. The network sub-system in the diagram can support traditional sockets-based connections using the normal API, as shown on the right side of the diagram. Connections for DDP/MPA/TCP are run using the facilities shown on the left side of the diagram.

The DDP/MPA/TCP connections can be started using the facilities shown on the left side using some suitable API, or they can be initiated using the facilities shown on the right side and transitioned to the left side at the point in the connection setup where MPA goes to "Full MPA/DDP Operation Phase" as described in Section 7.1.2.

The optimized MPA/TCP implementations (left side of the diagram, described below) are only applicable to MPA. All other TCP applications continue to use the standard TCP stacks and interfaces shown on the right side of the diagram.
A.1. Optimized MPA/TCP Transmitters
The various TCP RFCs allow considerable choice in segmenting a TCP stream. In order to optimize FPDU recovery at the MPA receiver, an optimized MPA/TCP implementation uses additional segmentation rules. To provide optimum performance, an optimized MPA/TCP transmit side implementation should be able to:

*  With an EMSS large enough to contain the FPDU(s), segment the outgoing TCP stream such that the first octet of every TCP segment begins with an FPDU. Multiple FPDUs may be packed into a single TCP segment as long as they are entirely contained in the TCP segment.

*  Report the current EMSS from the TCP to the MPA transmit layer.

There are exceptions to the above rule. Once an ULPDU is provided to MPA, the MPA/TCP sender transmits it or fails the connection; it cannot be repudiated. As a result, during changes in MTU and EMSS, or when TCP's Receive Window size (RWIN) becomes too small, it may be necessary to send FPDUs that do not conform to the segmentation rule above. A possible, but less desirable, alternative is to use IP fragmentation on accepted FPDUs to deal with MTU reductions or extremely small EMSS.

Even when alignment with TCP segments is lost, the sender still formats the FPDU according to the FPDU format shown in Figure 2.

On a retransmission, TCP does not necessarily preserve original TCP segmentation boundaries. This can lead to the loss of FPDU Alignment and containment within a TCP segment during TCP retransmissions. An optimized MPA/TCP sender should try to preserve original TCP segmentation boundaries on a retransmission.
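The segmentation rules above can be illustrated with a short sketch. The following C fragment is a minimal, informational example and not part of the standard; the fpdu structure and queue are hypothetical, and a real sender would also handle MULPDU changes and RWIN limits:

   /* Pack whole FPDUs into one TCP segment of at most emss octets.
    * A segment always starts on an FPDU boundary, and an FPDU is
    * never split across segments (the normal-path rule above).    */
   #include <stddef.h>
   #include <string.h>

   struct fpdu {
       const unsigned char *buf;  /* complete FPDU, per Figure 2 */
       size_t len;                /* total FPDU length in octets */
   };

   /* Returns octets placed in seg; *consumed is FPDUs packed.
    * A return of 0 means the first FPDU alone exceeds the EMSS,
    * i.e., one of the exception cases described above.           */
   size_t pack_segment(unsigned char *seg, size_t emss,
                       const struct fpdu *q, size_t n, size_t *consumed)
   {
       size_t used = 0;
       *consumed = 0;
       for (size_t i = 0; i < n; i++) {
           if (used + q[i].len > emss)
               break;            /* next FPDU would not be contained */
           memcpy(seg + used, q[i].buf, q[i].len);
           used += q[i].len;
           (*consumed)++;
       }
       return used;
   }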
A.2. Effects of Optimized MPA/TCP Segmentation

Optimized MPA/TCP senders will fill TCP segments to the EMSS with a single FPDU when a DDP message is large enough. Since the DDP message may not exactly fit into TCP segments, a "message tail" often occurs that results in an FPDU that is smaller than a single TCP segment. Additionally, some DDP messages may be considerably shorter than the EMSS. If a small FPDU is sent in a single TCP segment, the result is a "short" TCP segment.
Applications expected to see strong advantages from Direct Data Placement include transaction-based applications and throughput applications. Request/response protocols typically send one FPDU per TCP segment and then wait for a response. Under these conditions, these "short" TCP segments are an appropriate and expected effect of the segmentation.

Another possibility is that the application might be sending multiple messages (FPDUs) to the same endpoint before waiting for a response. In this case, the segmentation policy would tend to reduce the available connection bandwidth by under-filling the TCP segments.

Standard TCP implementations often utilize the Nagle [RFC896] algorithm to ensure that segments are filled to the EMSS whenever the round-trip latency is large enough that the source stream can fully fill segments before ACKs arrive. The algorithm does this by delaying the transmission of TCP segments until a ULP can fill a segment, or until an ACK arrives from the far side. The algorithm thus allows for smaller segments when latencies are shorter, to keep the ULP's end-to-end latency to reasonable levels. The Nagle algorithm is not mandatory to use [RFC1122].

When used with optimized MPA/TCP stacks, Nagle and similar algorithms can result in the "packing" of multiple FPDUs into TCP segments. If a "message tail", small DDP messages, or the start of a larger DDP message are available, MPA may pack multiple FPDUs into TCP segments. When this is done, the TCP segments can be more fully utilized, but, due to the size constraints of FPDUs, segments may not be filled to the EMSS. A dynamic MULPDU that informs DDP of the size of the remaining TCP segment space makes filling the TCP segment more effective. Note that MPA receivers do more processing of a TCP segment that contains multiple FPDUs; this may affect the performance of some receiver implementations.

It is up to the ULP to decide if Nagle is useful with DDP/MPA. Note that many of the applications expected to take advantage of MPA/DDP prefer to avoid the extra delays caused by Nagle. In such scenarios, it is anticipated there will be minimal opportunity for packing at the transmitter, and receivers may choose to optimize their performance for this anticipated behavior.
Therefore, the application is expected to set TCP parameters such that it can trade off latency and wire efficiency. Implementations should provide a connection option that disables Nagle for MPA/TCP, similar to the way the TCP_NODELAY socket option is provided for a traditional sockets interface.

When latency is not critical, the application is expected to leave Nagle enabled. In this case, the TCP implementation may pack any available FPDUs into TCP segments so that the segments are filled to the EMSS. If the amount of data available is not enough to fill the TCP segment when it is prepared for transmission, TCP can send the segment partly filled, or use the Nagle algorithm to wait for the ULP to post more data.
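As an informational illustration only (not part of the standard), a sockets-based implementation might expose such a connection option through the standard TCP_NODELAY socket option; error handling is omitted:

   /* Disable (or re-enable) Nagle for a latency-sensitive MPA/TCP
    * connection.  disable_nagle = 1 favors low latency over packing;
    * 0 leaves Nagle on so segments can fill toward the EMSS.        */
   #include <netinet/in.h>
   #include <netinet/tcp.h>
   #include <sys/socket.h>

   int mpa_set_nodelay(int sock, int disable_nagle)
   {
       return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                         &disable_nagle, sizeof(disable_nagle));
   }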
A.3. Optimized MPA/TCP Receivers

When an MPA receive implementation and the MPA-aware receive side TCP implementation support handling out-of-order ULPDUs, the TCP receive implementation performs the following functions:

1) The implementation passes incoming TCP segments to MPA as soon as they have been received and validated, even if not received in order. The TCP layer commits to keeping each segment before it can be passed to the MPA. This means that the segment must have passed the TCP, IP, and lower layer data integrity validation (i.e., checksum), must be in the receive window, must be part of the same epoch (if timestamps are used to verify this), and must have passed any other checks required by TCP RFCs.

This is not to imply that the data must be completely ordered before use. An implementation can accept out-of-order segments, SACK them [RFC2018], and pass them to MPA immediately, before the reception of the segments needed to fill in the gaps. MPA expects to utilize these segments when they are complete FPDUs or can be combined into complete FPDUs to allow the passing of ULPDUs to DDP when they arrive, independent of ordering. DDP uses the passed ULPDU to "place" the DDP segments (see [DDP] for more details).

Since MPA performs a CRC calculation and other checks on received FPDUs, the MPA/TCP implementation ensures that any TCP segments that duplicate data already received and processed (as can happen during TCP retries) do not overwrite already received and processed FPDUs. This avoids the possibility that duplicate data may corrupt already validated FPDUs.
2) The implementation provides a mechanism to indicate the ordering of TCP segments as the sender transmitted them. One possible mechanism might be attaching the TCP sequence number to each segment.

3) The implementation also provides a mechanism to indicate when a given TCP segment (and the prior TCP stream) is complete. One possible mechanism might be to utilize the leading (left) edge of the TCP Receive Window.

MPA uses the ordering and completion indications to inform DDP when a ULPDU is complete; MPA Delivers the FPDU to DDP. DDP uses the indications to "deliver" its messages to the DDP consumer (see [DDP] for more details).

DDP on MPA utilizes the above two mechanisms to establish the Delivery semantics that DDP's consumers agree to. These semantics are described fully in [DDP]. These include requirements on DDP's consumer to respect ownership of buffers prior to the time that DDP delivers them to the Consumer.

The use of SACK [RFC2018] significantly improves network utilization and performance and is therefore recommended. When combined with the out-of-order passing of segments to MPA and DDP, significant buffering and copying of received data can be avoided.
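The two indications above can be sketched in a few lines of C. This is informational only; the mpa_rx_seg structure is hypothetical, and serial-number arithmetic (as in [RFC793]) handles sequence-space wrap:

   #include <stdint.h>

   struct mpa_rx_seg {
       uint32_t tcp_seq;  /* sequence number of first payload octet:
                           * the ordering indication (mechanism 2)   */
       uint32_t len;      /* octets of TCP payload in this segment   */
   };

   /* Completion indication (mechanism 3): a segment and all prior
    * stream data are complete once the left edge of the receive
    * window (rcv_nxt) has advanced past the segment's last octet.  */
   static int seg_is_complete(const struct mpa_rx_seg *s,
                              uint32_t rcv_nxt)
   {
       uint32_t end = s->tcp_seq + s->len;  /* one past last octet */
       return (int32_t)(rcv_nxt - end) >= 0;
   }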
A.4. Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders

Since MPA senders often start FPDUs on TCP segment boundaries, a receiving optimized MPA/TCP implementation may be able to optimize the reception of data in various ways. However, MPA receivers MUST NOT depend on FPDU Alignment on TCP segment boundaries.

Some MPA senders may be unable to conform to the sender requirements because their implementation of TCP is not designed with MPA in mind. Even for optimized MPA/TCP senders, the network may contain "middleboxes" which modify the TCP stream by changing the segmentation. This is generally interoperable with TCP and its users, and MPA must be no exception.

The presence of Markers in MPA (when enabled) allows an optimized MPA/TCP receiver to recover the FPDUs despite these obstacles, although it may be necessary to utilize additional buffering at the receiver to do so.
Some of the cases that a receiver may have to contend with are listed below as a reminder to the implementer (a sketch of the aligned, fully contained cases follows the list):

*  A single aligned and complete FPDU, either in order or out of order: This can be passed to DDP as soon as validated, and Delivered when ordering is established.

*  Multiple FPDUs in a TCP segment, aligned and fully contained, either in order or out of order: These can be passed to DDP as soon as validated, and Delivered when ordering is established.

*  Incomplete FPDU: The receiver should buffer until the remainder of the FPDU arrives. If the remainder of the FPDU is already available, this can be passed to DDP as soon as validated, and Delivered when ordering is established.

*  Unaligned FPDU start: The partial FPDU must be combined with its preceding portion(s). If the preceding parts are already available, and the whole FPDU is present, this can be passed to DDP as soon as validated, and Delivered when ordering is established. If the whole FPDU is not available, the receiver should buffer until the remainder of the FPDU arrives.

*  Combinations of unaligned or incomplete FPDUs (and potentially other complete FPDUs) in the same TCP segment: If any FPDU is present in its entirety, or can be completed with portions already available, it can be passed to DDP as soon as validated, and Delivered when ordering is established.
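The first two cases (aligned, fully contained FPDUs, possibly packed) can be handled by a simple scan of the segment payload. The sketch below is informational only and ignores Markers for brevity; it assumes the FPDU layout of Figure 2 (2-octet ULPDU_Length, ULPDU, pad to a 4-octet boundary, 4-octet CRC):

   #include <stddef.h>
   #include <stdint.h>

   #define MPA_HDR 2u   /* ULPDU_Length field */
   #define MPA_CRC 4u

   /* Returns the number of complete FPDUs found in an aligned TCP
    * payload; *tail is the offset of a trailing incomplete FPDU
    * (equal to seglen if the last FPDU is fully contained).        */
   size_t scan_fpdus(const uint8_t *seg, size_t seglen, size_t *tail)
   {
       size_t off = 0, count = 0;
       while (seglen - off >= MPA_HDR) {
           uint16_t ulen = (uint16_t)((seg[off] << 8) | seg[off + 1]);
           size_t pad  = (4u - ((MPA_HDR + ulen) & 3u)) & 3u;
           size_t flen = MPA_HDR + ulen + pad + MPA_CRC;
           if (seglen - off < flen)
               break;       /* incomplete FPDU: buffer the remainder */
           /* ...check CRC, then pass the ULPDU at seg + off + MPA_HDR
            * to DDP for placement...                                 */
           off += flen;
           count++;
       }
       *tail = off;
       return count;
   }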
A.5. Receiver Implementation

Transport & Network Layer Reassembly Buffers:

The use of reassembly buffers (either TCP reassembly buffers or IP fragmentation reassembly buffers) is implementation dependent. When MPA is enabled, reassembly buffers are needed if out-of-order packets arrive and Markers are not enabled. Buffers are also needed if FPDU alignment is lost or if IP fragmentation occurs. This is because the incoming out-of-order segment may not contain enough information for MPA to process all of the FPDU. For cases where a re-segmenting middlebox is present, or where the TCP sender is not optimized, the presence of Markers significantly reduces the amount of buffering needed.

Recovery from IP fragmentation is transparent to the MPA Consumers.
A.5.1. Network Layer Reassembly Buffers
The MPA/TCP implementation should set the IP Don't Fragment bit at the IP layer. Thus, upon a path MTU change, intermediate devices drop the IP datagram if it is too large and reply with an ICMP message that tells the source TCP that the path MTU has changed. This causes TCP to emit segments conformant with the new path MTU size. Thus, IP fragments under most conditions should never occur at the receiver. But it is possible.

There are several options for implementation of network layer reassembly buffers:

1. Drop any IP fragments, and reply with an ICMP message according to [RFC792] (fragmentation needed and DF set) to tell the Remote Peer to resize its TCP segment.

2. Support an IP reassembly buffer, but have it of limited size (possibly the same size as the local link's MTU). The end node would normally never Advertise a path MTU larger than the local link MTU. It is recommended that a dropped IP fragment cause an ICMP message to be generated according to [RFC792].

3. Support multiple IP reassembly buffers, of effectively unlimited size.

4. Support an IP reassembly buffer for the largest IP datagram (64 KB).

5. Support a large IP reassembly buffer that could span multiple IP datagrams.

An implementation should support at least 2 or 3 above, to avoid dropping packets that have traversed the entire fabric.

There is no end-to-end ACK for IP reassembly buffers, so there is no flow control on the buffer. The only end-to-end ACK is a TCP ACK, which can only occur when a complete IP datagram is delivered to TCP. Because of this, under worst-case, pathological scenarios, the largest IP reassembly buffer is the TCP receive window (to buffer multiple IP datagrams that have all been fragmented).

Note that if the Remote Peer does not implement re-segmentation of the data stream upon receiving the ICMP reply updating the path MTU, it is possible to halt forward progress because the opposite peer would continue to retransmit using a transport segment size that is too large. This deadlock scenario is no different than if the fabric MTU (not last-hop MTU) was reduced after connection setup, and the remote node's behavior is not compliant with [RFC1122].
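On stacks that expose it, setting the Don't Fragment bit amounts to enabling path MTU discovery on the connection's socket. The following sketch is informational only; IP_MTU_DISCOVER and IP_PMTUDISC_DO are Linux-specific names, and other stacks provide equivalent controls:

   #include <netinet/in.h>
   #include <sys/socket.h>

   /* Ask the stack to set DF on outgoing datagrams and perform path
    * MTU discovery, so oversized packets are dropped in the network
    * with an ICMP "fragmentation needed" reply per [RFC792].        */
   int mpa_enable_pmtud(int sock)
   {
   #ifdef IP_MTU_DISCOVER
       int val = IP_PMTUDISC_DO;
       return setsockopt(sock, IPPROTO_IP, IP_MTU_DISCOVER,
                         &val, sizeof(val));
   #else
       (void)sock;
       return -1;   /* stack-specific mechanism required */
   #endif
   }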
A.5.2. TCP Reassembly Buffers
A TCP reassembly buffer is also needed. TCP reassembly buffers are needed if FPDU Alignment is lost when using TCP with MPA or when the MPA FPDU spans multiple TCP segments. Buffers are also needed if Markers are disabled and out-of-order packets arrive. Since lost FPDU Alignment often means that FPDUs are incomplete, an MPA on TCP implementation must have a reassembly buffer large enough to recover an FPDU that is less than or equal to the MTU of the locally attached link (this should be the largest possible Advertised TCP path MTU). If the MTU is smaller than 140 octets, a buffer of at least 140 octets long is needed to support the minimum FPDU size. The 140 octets allow for the minimum MULPDU of 128, 2 octets of pad, 2 of ULPDU_Length, 4 of CRC, and space for a possible Marker. As usual, additional buffering is likely to provide better performance.

Note that if the TCP segments were not stored, it would be possible to deadlock the MPA algorithm. If the path MTU is reduced, FPDU Alignment requires the source TCP to re-segment the data stream to the new path MTU. The source MPA will detect this condition and reduce the MPA segment size, but any FPDUs already posted to the source TCP will be re-segmented and lose FPDU Alignment. If the destination does not support a TCP reassembly buffer, these segments can never be successfully transmitted and the protocol deadlocks. When a complete FPDU is received, processing continues normally.
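The 140-octet figure can be checked with simple arithmetic; the constants below merely restate the text (informational only):

   /* Minimum TCP reassembly buffer for MPA:
    *   128 (minimum MULPDU) + 2 (ULPDU_Length) + 2 (pad for a
    *   128-octet ULPDU) + 4 (CRC) + 4 (one possible Marker) = 140  */
   enum {
       MPA_MIN_MULPDU = 128,
       MPA_ULPDU_LEN  = 2,
       MPA_MIN_PAD    = 2,
       MPA_CRC_LEN    = 4,
       MPA_MARKER_LEN = 4,
       MPA_MIN_REASM  = MPA_MIN_MULPDU + MPA_ULPDU_LEN + MPA_MIN_PAD
                        + MPA_CRC_LEN + MPA_MARKER_LEN   /* = 140 */
   };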
Appendix B. Analysis of MPA over TCP Operations

This appendix is for information only and is NOT part of the standard.

This appendix is an analysis of MPA on TCP and why it is useful to integrate MPA with TCP (with modifications to typical TCP implementations) to reduce overall system buffering and overhead.

One of MPA's high-level goals is to provide enough information, when combined with the Direct Data Placement Protocol [DDP], to enable out-of-order placement of DDP payload into the final Upper Layer Protocol (ULP) Buffer. Note that DDP separates the act of placing data into a ULP Buffer from that of notifying the ULP that the ULP Buffer is available for use. In DDP terminology, the former is defined as "Placement", and the latter is defined as "Delivery". MPA supports in-order Delivery of the data to the ULP, including support for Direct Data Placement in the final ULP Buffer location when TCP segments arrive out of order. Effectively, the goal is to use the
pre-posted ULP Buffers as the TCP receive buffer, where the reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and DDP) is done in place, in the ULP Buffer, with no data copies.

This appendix walks through the advantages and disadvantages of the TCP sender modifications proposed by MPA:

1) that MPA prefers that the TCP sender do Header Alignment, where a TCP segment should begin with an MPA Framing Protocol Data Unit (FPDU) (if there is payload present).

2) that there be an integral number of FPDUs in a TCP segment (under conditions where the path MTU is not changing).

This appendix concludes that the scaling advantages of FPDU Alignment are strong, based primarily on fairly drastic TCP receive buffer reduction requirements and simplified receive handling. The analysis also shows that there is little effect on TCP wire behavior.
B.1. Assumptions

B.1.1. MPA Is Layered beneath DDP
MPA is an adaptation layer between DDP and TCP. DDP requires preservation of DDP segment boundaries and a CRC32c digest covering the DDP header and data. MPA adds these features to the TCP stream so that DDP over TCP has the same basic properties as DDP over SCTP.
B.1.2. MPA Preserves DDP Message Framing

MPA was designed as a framing layer specifically for DDP and was not intended as a general-purpose framing layer for any other ULP using TCP. A framing layer allows ULPs using it to receive indications from the transport layer only when complete ULPDUs are present. As a framing layer, MPA is not aware of the content of the DDP PDU, only that it has received and, if necessary, reassembled a complete PDU for Delivery to the DDP.
B.1.3. The Size of the ULPDU Passed to MPA Is Less Than EMSS under Normal Conditions

To make reception of a complete DDP PDU on every received segment possible, DDP passes to MPA a PDU that is no larger than the EMSS of the underlying fabric. Each FPDU that MPA creates contains sufficient information for the receiver to directly place the ULP payload in the correct location in the correct receive buffer.
Edge cases when this condition does not occur are dealt with, but do not need to be on the fast path.
B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery

DDP receives complete DDP PDUs from MPA. Each DDP PDU contains the information necessary to place its ULP payload directly in the correct location in host memory.

Because each DDP segment is self-describing, it is possible for DDP segments received out of order to have their ULP payload placed immediately in the ULP receive buffer.

Data delivery to the ULP is guaranteed to be in the order the data was sent. DDP only indicates data delivery to the ULP after TCP has acknowledged the complete byte stream.
B.2. The Value of FPDU Alignment

Significant receiver optimizations can be achieved when Header Alignment and complete FPDUs are the common case. The optimizations allow utilizing significantly fewer buffers on the receiver and less computation per FPDU. The net effect is the ability to build a "flow-through" receiver that enables TCP-based solutions to scale to 10G and beyond in an economical way. The optimizations are especially relevant to hardware implementations of receivers that process multiple protocol layers -- Data Link Layer (e.g., Ethernet), Network and Transport Layer (e.g., TCP/IP), and even some ULP on top of TCP (e.g., MPA/DDP). As network speed increases, there is an increasing desire to use a hardware-based receiver in order to achieve an efficient high performance solution.

A TCP receiver, under worst-case conditions, has to allocate buffers (BufferSizeTCP) whose capacities are a function of the bandwidth-delay product. Thus:

   BufferSizeTCP = K * bandwidth [octets/second] * Delay [seconds]

where bandwidth is the end-to-end bandwidth of the connection, delay is the round-trip delay of the connection, and K is an implementation-dependent constant.

Thus, BufferSizeTCP scales with the end-to-end bandwidth (10x more buffers for a 10x increase in end-to-end bandwidth). As this buffering approach may scale poorly for hardware or software implementations alike, several approaches allow reduction in the amount of buffering required for high-speed TCP communication.
The MPA/DDP approach is to enable the ULP's Buffer to be used as the TCP receive buffer. If the application pre-posts a sufficient amount of buffering, and each TCP segment has sufficient information to place the payload into the right application buffer, then when an out-of-order TCP segment arrives it could potentially be placed directly in the ULP Buffer. However, placement can only be done when a complete FPDU with the placement information is available to the receiver, and the FPDU contents contain enough information to place the data into the correct ULP Buffer (e.g., there is a DDP header available).

For the case when the FPDU is not aligned with the TCP segment, it may take, on average, 2 TCP segments to assemble one FPDU. Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size, Non-Aligned FPDU) octets:

   BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS

where K1 and K2 are implementation-dependent constants and EMSS is the effective maximum segment size.

For example, a 1 GB/sec link with 10,000 connections and an EMSS of 1500 B would require 15 MB of memory. Often the number of connections used scales with the network speed, aggravating the situation for higher speeds.

FPDU Alignment would allow the receiver to allocate BufferSizeAF (Buffer Size, Aligned FPDU) octets:

   BufferSizeAF = K2 * EMSS

for the same conditions. An FPDU Aligned receiver may require memory in the range of ~100s of KB -- which is feasible for an on-chip memory and enables a "flow-through" design, in which the data flows through the network interface card (NIC) and is placed directly in the destination buffer. Assuming most of the connections support FPDU Alignment, the receiver buffers no longer scale with the number of connections.

Additional optimizations can be achieved in a balanced I/O sub-system -- where the system interface of the network controller provides ample bandwidth as compared with the network bandwidth. For almost twenty years this has been the case, and the trend is expected to continue. While Ethernet speeds have scaled by 1000 (from 10 megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume CPU architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus to PCI-X DDR). Under these conditions, the FPDU Alignment approach allows BufferSizeAF to be indifferent to network speed. It is primarily a function of the local processing time for a given frame.
Thus, when the FPDU Alignment approach is used, receive buffering is expected to scale gracefully (i.e., less than linear scaling) as network speed is increased.
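The scaling difference can be made concrete by evaluating the two formulas for the example above (informational only; K1 = K2 = 1 purely for illustration, since the Ks are implementation dependent):

   #include <stdio.h>

   int main(void)
   {
       const double emss  = 1500.0;   /* octets          */
       const double conns = 10000.0;  /* connections     */
       const double k1 = 1.0, k2 = 1.0;

       double naf = k1 * emss * conns + k2 * emss; /* non-aligned */
       double af  = k2 * emss;                     /* aligned     */

       printf("BufferSizeNAF ~ %.1f MB\n", naf / 1e6); /* ~15 MB  */
       printf("BufferSizeAF  ~ %.2f KB\n", af / 1e3);  /* 1.50 KB */
       return 0;
   }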
B.2.1. Impact of Lack of FPDU Alignment on the Receiver Computational Load and Complexity

The receiver must perform IP and TCP processing, and then perform FPDU CRC checks, before it can trust the FPDU header placement information. For simplicity of the description, the assumption is that an FPDU is carried in no more than 2 TCP segments. In reality, with no FPDU Alignment, an FPDU can be carried by more than 2 TCP segments (e.g., if the path MTU was reduced).

   ----++-----------------------------++-----------------------++-----
   +---||---------------+    +--------||--------+   +----------||----+
   |   TCP Seg X-1      |    |   TCP Seg X      |   |  TCP Seg X+1   |
   +---||---------------+    +--------||--------+   +----------||----+
   ----++-----------------------------++-----------------------++-----
        FPDU #N-1                      FPDU #N

      Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream

The receiver algorithm for processing TCP segments (e.g., TCP segment #X in Figure 12) carrying non-aligned FPDUs (in order or out of order) includes:

1. Data Link Layer processing (whole frame) -- typically including a CRC calculation.

2. Network Layer processing (assuming the frame is not an IP fragment, the whole Data Link Layer frame contains one IP datagram. IP fragments should be reassembled in a local buffer. This is not a performance optimization goal.)

3. Transport Layer processing -- TCP protocol processing, header and checksum checks.

   a. Classify incoming TCP segment using the 5 tuple (IP SRC, IP DST, TCP SRC Port, TCP DST Port, protocol).
4. Find FPDU message boundaries.

   a. Get MPA state information for the connection.

   b. If the TCP segment is in order, use the receiver-managed MPA state information to calculate where the previous FPDU message (#N-1) ends in the current TCP segment X. (Previously, when the MPA receiver processed the first part of FPDU #N-1, it calculated the number of bytes remaining to complete FPDU #N-1 by using the MPA Length field.)

      - Get the stored partial CRC for FPDU #N-1.
      - Complete the CRC calculation for FPDU #N-1 data (first portion of TCP segment #X).
      - Check the CRC calculation for FPDU #N-1.
      - If no FPDU CRC errors, placement is allowed. Locate the local buffer for the first portion of FPDU #N-1, CopyData(local buffer of first portion of FPDU #N-1, host buffer address, length).
      - Compute the host buffer address for the second portion of FPDU #N-1. CopyData(local buffer of second portion of FPDU #N-1, host buffer address for second portion, length).
      - Calculate the octet offset into the TCP segment for the next FPDU #N.
      - Start calculation of the CRC for the available data of FPDU #N. Store the partial CRC results for FPDU #N.
      - Store the local buffer address of the first portion of FPDU #N.
      - No further action is possible on FPDU #N before it is completely received.
   c. If the TCP segment is out of order, the receiver must buffer the data until at least one complete FPDU is received. Typically, buffering for more than one TCP segment per connection is required. Use the MPA-based Markers to calculate where FPDU boundaries are. When a complete FPDU is available, a procedure similar to the in-order algorithm above is used. There is additional complexity, though, because when the missing segment arrives, this TCP segment must be run through the CRC engine after the CRC is calculated for the missing segment.

If we assume FPDU Alignment, the following diagram and the algorithm below apply. Note that when using MPA, the receiver is assumed to actively detect the presence or loss of FPDU Alignment for every TCP segment received.

      +--------------------------+     +--------------------------+
   +--|--------------------------+  +--|--------------------------+
   |  |    TCP Seg X             |  |  |    TCP Seg X+1           |
   +--|--------------------------+  +--|--------------------------+
      +--------------------------+     +--------------------------+
          FPDU #N                          FPDU #N+1

     Figure 13: Aligned FPDU Placed Immediately after TCP Header
The receiver algorithm for FPDU Aligned frames (in order or out of order) includes:

1) Data Link Layer processing (whole frame) -- typically including a CRC calculation.

2) Network Layer processing (assuming the frame is not an IP fragment, the whole Data Link Layer frame contains one IP datagram. IP fragments should be reassembled in a local buffer. This is not a performance optimization goal.)

3) Transport Layer processing -- TCP protocol processing, header and checksum checks.

   a. Classify incoming TCP segment using the 5 tuple (IP SRC, IP DST, TCP SRC Port, TCP DST Port, protocol).

4) Check for Header Alignment (described in detail in Section 6). Assuming Header Alignment for the rest of the algorithm below.

   a. If the header is not aligned, see the algorithm defined in the prior section.

5) If the TCP segment is in order or out of order, the MPA header is at the beginning of the current TCP payload. Get the FPDU length from the FPDU header.

6) Calculate the CRC over the FPDU.

7) Check the CRC calculation for FPDU #N.

8) If no FPDU CRC errors, placement is allowed.

9) CopyData(TCP segment #X, host buffer address, length).

10) Loop to #5 until all the FPDUs in the TCP segment are consumed, in order to handle FPDU packing.

Implementation note: In both cases, the receiver has to classify the incoming TCP segment and associate it with one of the flows it maintains. In the case of no FPDU Alignment, the receiver is forced to classify incoming traffic before it can calculate the FPDU CRC. In the case of FPDU Alignment, the order of operations is left to the implementer.
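Steps 6 and 7 compute and check the CRC over each FPDU. For reference, a minimal bitwise CRC32c (the polynomial MPA uses) is shown below; this is informational only, and a fast-path implementation would use a table-driven or hardware CRC instead:

   #include <stddef.h>
   #include <stdint.h>

   /* CRC32c (Castagnoli), reflected polynomial 0x82F63B78,
    * initial value 0xFFFFFFFF, final complement.            */
   uint32_t crc32c(const uint8_t *p, size_t n)
   {
       uint32_t crc = 0xFFFFFFFFu;
       for (size_t i = 0; i < n; i++) {
           crc ^= p[i];
           for (int k = 0; k < 8; k++)
               crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
       }
       return ~crc;
   }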
The FPDU Aligned receiver algorithm is significantly simpler. There is no need to locally buffer portions of FPDUs. Accessing state information is also substantially simplified -- the normal case does not require retrieving information to find out where an FPDU starts and ends, or retrieval of a partial CRC before the CRC calculation can commence. This avoids adding internal latencies, having multiple data passes through the CRC machine, or scheduling multiple commands for moving the data to the host buffer.

The aligned FPDU approach is useful for in-order and out-of-order reception. The receiver can use the same mechanisms for data storage in both cases, and only needs to account for when all the TCP segments have arrived to enable Delivery. The Header Alignment, along with the high probability that at least one complete FPDU is found with every TCP segment, allows the receiver to perform data placement for out-of-order TCP segments with no need for intermediate buffering. Essentially, the TCP receive buffer has been eliminated and TCP reassembly is done in place within the ULP Buffer.

In case FPDU Alignment is not found, the receiver should follow the algorithm for non-aligned FPDU reception, which may be slower and less efficient.
B.2.2. FPDU Alignment Effects on TCP Wire Protocol

In an optimized MPA/TCP implementation, TCP exposes its EMSS to MPA. MPA uses the EMSS to calculate its MULPDU, which it then exposes to DDP, its ULP. DDP uses the MULPDU to segment its payload so that each FPDU sent by MPA fits completely into one TCP segment. This has no impact on wire protocol, and exposing this information is already supported on many TCP implementations, including all modern flavors of BSD networking, through the TCP_MAXSEG socket option.

In the common case, the ULP (i.e., DDP over MPA) messages provided to the TCP layer are segmented to MULPDU size. It is assumed that the ULP message size is bounded by MULPDU, such that a single ULP message can be encapsulated in a single TCP segment. Therefore, in the common case, there is no increase in the number of TCP segments emitted. For smaller ULP messages, the sender can also apply packing, i.e., the sender packs as many complete FPDUs as possible into one TCP segment. The requirement to always have a complete FPDU may increase the number of TCP segments emitted. Typically, a ULP message size varies from a few bytes to multiple EMSSs (e.g., 64 Kbytes). In some cases, the ULP may post more than one message at a time for transmission, giving the sender an opportunity for packing. In the case where more than one FPDU is available for transmission and the FPDUs are encapsulated into a TCP segment and there is no room in the TCP segment to include the next complete FPDU, another
TCP segment is sent. In this corner case, some of the TCP segments are not full size. In the worst-case scenario, the ULP may choose an FPDU size that is EMSS/2 + 1 and have multiple messages available for transmission. For this poor choice of FPDU size, the average TCP segment size is therefore about 1/2 of the EMSS, and the number of TCP segments emitted approaches 2x of what is possible without the requirement to encapsulate an integer number of complete FPDUs in every TCP segment. This is a dynamic situation that only lasts for the duration where the sender ULP has multiple non-optimal messages for transmission, and it causes a minor impact on wire utilization.

However, it is not expected that requiring FPDU Alignment will have a measurable impact on the wire behavior of most applications. Throughput applications with large I/Os are expected to take full advantage of the EMSS. Another class of applications with many small outstanding buffers (as compared to EMSS) is expected to use packing when applicable. Transaction-oriented applications are also optimal.

TCP retransmission is another area that can affect sender behavior. TCP supports retransmission of the exact, originally transmitted segment (see [RFC793], Sections 2.6 and 3.7 (under "Managing the Window") and [RFC1122], Section 4.2.2.15). In the unlikely event that part of the original segment has been received and acknowledged by the Remote Peer (e.g., due to a re-segmenting middlebox, as documented in Appendix A.4, Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders), better bandwidth utilization may be possible by retransmitting only the missing octets. If an optimized MPA/TCP retransmits complete FPDUs, there may be some marginal bandwidth loss.

Another area where a change in the TCP segment number may have impact is that of slow start and congestion avoidance. Slow-start exponential increase is measured in segments per second, as the algorithm focuses on the overhead per segment at the source for congestion that eventually results in dropped segments. Slow-start exponential bandwidth growth for optimized MPA/TCP is similar to that of any TCP implementation. Congestion avoidance allows for a linear growth in available bandwidth when recovering after a packet drop. Similar to the analysis for slow start, optimized MPA/TCP doesn't change the behavior of the algorithm. Therefore, the average size of the segment versus EMSS is not a major factor in the assessment of the bandwidth growth for a sender. Both slow start and congestion avoidance for an optimized MPA/TCP behave similarly to any TCP sender and allow an optimized MPA/TCP to enjoy the theoretical performance limits of the algorithms.
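As noted at the start of this section, many stacks already expose the EMSS through the TCP_MAXSEG socket option. Purely as an informational sketch, an MPA layer might read it as follows and derive its MULPDU by subtracting the MPA framing overhead:

   #include <netinet/in.h>
   #include <netinet/tcp.h>
   #include <sys/socket.h>

   /* Read the connection's current maximum segment size; a sender
    * would recompute MULPDU from this whenever the path MTU (and
    * hence the EMSS) changes.                                     */
   int tcp_get_emss(int sock, int *emss)
   {
       socklen_t len = sizeof(*emss);
       return getsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, emss, &len);
   }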
In summary, the ULP messages generated at the sender (e.g., the number of messages grouped for every transmission request) and the message size distribution have the most significant impact on the number of TCP segments emitted. The worst-case effect for certain ULPs (with an average message size of EMSS/2+1 to EMSS) is bounded by an increase of up to 2x in the number of TCP segments and acknowledgements. In reality, the effect is expected to be marginal.
Appendix C. IETF Implementation Interoperability with RDMA Consortium Protocols

This appendix is for information only and is NOT part of the standard.

This appendix covers methods of making MPA implementations interoperate with both IETF and RDMA Consortium versions of the protocols.

The RDMA Consortium created early specifications of the MPA/DDP/RDMA protocols, and some manufacturers created implementations of those protocols before the IETF versions were finalized. These protocols are very similar to the IETF versions, making it possible for implementations to be created or modified to support either set of specifications. For those interested, the RDMA Consortium protocol documents (draft-culley-iwarp-mpa-v1.0.pdf [RDMA-MPA], draft-shah-iwarp-ddp-v1.0.pdf [RDMA-DDP], and draft-recio-iwarp-rdmac-v1.0.pdf [RDMA-RDMAC]) can be obtained at http://www.rdmaconsortium.org/home.

In this section, implementations of MPA/DDP/RDMA that conform to the RDMAC specifications are called RDMAC RNICs. Implementations of MPA/DDP/RDMA that conform to the IETF RFCs are called IETF RNICs.

Without the exchange of MPA Request/Reply Frames, there is no standard mechanism for enabling RDMAC RNICs to interoperate with IETF RNICs. Even if a ULP uses a well-known port to start an IETF RNIC immediately in RDMA mode (i.e., without exchanging the MPA Request/Reply messages), there is no reason to believe an IETF RNIC will interoperate with an RDMAC RNIC, because of the differences in the version number in the DDP and RDMAP headers on the wire. Therefore, the ULP or other supporting entity at the RDMAC RNIC must implement MPA Request/Reply Frames on behalf of the RNIC in order to negotiate the connection parameters. The following sections describe the results following the exchange of the MPA Request/Reply Frames before the conversion from streaming to RDMA mode.
C.1. Negotiated Parameters
Three types of RNICs are considered:

Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols that has a ULP or other supporting entity that exchanges the MPA Request/Reply Frames in streaming mode before the conversion to RDMA mode.

Non-permissive IETF RNIC - an RNIC implementing the IETF protocols that is not capable of implementing the RDMAC protocols. Such an RNIC can only interoperate with other IETF RNICs.

Permissive IETF RNIC - an RNIC implementing the IETF protocols that is capable of implementing the RDMAC protocols on a per-connection basis.

The Permissive IETF RNIC is recommended for those implementers that want maximum interoperability with other RNIC implementations.

The values used by these three RNIC types for the MPA, DDP, and RDMAP versions as well as MPA Markers and CRC are summarized in Figure 14.

   +----------------++-----------+-----------+-----------+-----------+
   | RNIC TYPE      || DDP/RDMAP | MPA       | MPA       | MPA       |
   |                || Version   | Revision  | Markers   | CRC       |
   +----------------++-----------+-----------+-----------+-----------+
   | RDMAC          ||     0     |     0     |     1     |     1     |
   +----------------++-----------+-----------+-----------+-----------+
   | IETF           ||     1     |     1     |  0 or 1   |  0 or 1   |
   | Non-permissive ||           |           |           |           |
   +----------------++-----------+-----------+-----------+-----------+
   | IETF           ||  1 or 0   |  1 or 0   |  0 or 1   |  0 or 1   |
   | Permissive     ||           |           |           |           |
   +----------------++-----------+-----------+-----------+-----------+

        Figure 14: Connection Parameters for the RNIC Types

For MPA Markers and MPA CRC, enabled=1, disabled=0.

It is assumed there is no mixing of versions allowed between MPA, DDP, and RDMAP. The RNIC either generates the RDMAC protocols on the wire (version is zero) or uses the IETF protocols (version is one).
During the exchange of the MPA Request/Reply Frames, each peer provides its MPA Revision, Marker preference (M: 0=disabled, 1=enabled), and CRC preference. The MPA Revision provided in the MPA Request Frame and the MPA Reply Frame may differ.

From the information in the MPA Request/Reply Frames, each side sets the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as well as the state of the Markers for each half connection. Between DDP and RDMAP, no mixing of versions is allowed. Moreover, the DDP and RDMAP version MUST be identical in the two directions. The RNIC either generates the RDMAC protocols on the wire (version is zero) or uses the IETF protocols (version is one).

In the following sections, the figures do not discuss CRC negotiation because there is no interoperability issue for CRCs. Since the RDMAC RNIC will always request CRC use, then, according to the IETF MPA specification, both peers MUST generate and check CRCs.
C.2. RDMAC RNIC and Non-Permissive IETF RNIC

Figure 15 shows that a Non-permissive IETF RNIC cannot interoperate with an RDMAC RNIC, despite the fact that both peers exchange MPA Request/Reply Frames. For a Non-permissive IETF RNIC, the MPA negotiation has no effect on the DDP/RDMAP version, and it is unable to interoperate with the RDMAC RNIC.

The rows in the figure show the state of the Marker field in the MPA Request Frame sent by the MPA Initiator. The columns show the state of the Marker field in the MPA Reply Frame sent by the MPA Responder. Each type of RNIC is shown as an Initiator and a Responder. The connection results are shown in the lower right corner, at the intersection of the different RNIC types, where V=0 is the RDMAC DDP/RDMAP version, V=1 is the IETF DDP/RDMAP version, M=0 means MPA Markers are disabled, and M=1 means MPA Markers are enabled. The negotiated Marker state is shown as X/Y, for the receive direction of the Initiator/Responder.
   +---------------------------++-----------------------+
   | MPA                       ||          MPA          |
   | CONNECT                   ||       Responder       |
   | MODE    +-----------------++-------+---------------+
   |         | RNIC            || RDMAC |     IETF      |
   |         | TYPE            ||       | Non-permissive|
   |         |          +------++-------+-------+-------+
   |         |          |MARKER||  M=1  |  M=0  |  M=1  |
   +---------+----------+------++-------+-------+-------+
   |         | RDMAC    | M=1  ||  V=0  | close | close |
   |         |          |      || M=1/1 |       |       |
   |         +----------+------++-------+-------+-------+
   |   MPA   |          | M=0  || close |  V=1  |  V=1  |
   |Initiator|  IETF    |      ||       | M=0/0 | M=0/1 |
   |         |Non-perms.+------++-------+-------+-------+
   |         |          | M=1  || close |  V=1  |  V=1  |
   |         |          |      ||       | M=1/0 | M=1/1 |
   +---------+----------+------++-------+-------+-------+

        Figure 15: MPA Negotiation between an RDMAC RNIC and
                   a Non-Permissive IETF RNIC
C.2.1. RDMAC RNIC Initiator

If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request Frame with the Rev field set to zero and the M and C bits set to one. Because the Non-permissive IETF RNIC cannot dynamically downgrade the version number it uses for DDP and RDMAP, it would send an MPA Reply Frame with the Rev field equal to one and then gracefully close the connection.
C.2.2. Non-Permissive IETF RNIC Initiator

If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA Request Frame with the Rev field equal to one. The ULP or supporting entity for the RDMAC RNIC responds with an MPA Reply Frame that has the Rev field equal to zero and the M bit set to one. The Non-permissive IETF RNIC will gracefully close the connection after it reads the incompatible Rev field in the MPA Reply Frame.
C.2.3. RDMAC RNIC and Permissive IETF RNIC

Figure 16 shows that a Permissive IETF RNIC can interoperate with an RDMAC RNIC regardless of its Marker preference. The figure uses the same format as shown with the Non-permissive IETF RNIC.
   +---------------------------++-----------------------+
   | MPA                       ||          MPA          |
   | CONNECT                   ||       Responder       |
   | MODE    +-----------------++-------+---------------+
   |         | RNIC            || RDMAC |     IETF      |
   |         | TYPE            ||       |  Permissive   |
   |         |          +------++-------+-------+-------+
   |         |          |MARKER||  M=1  |  M=0  |  M=1  |
   +---------+----------+------++-------+-------+-------+
   |         | RDMAC    | M=1  ||  V=0  |  N/A  |  V=0  |
   |         |          |      || M=1/1 |       | M=1/1 |
   |         +----------+------++-------+-------+-------+
   |   MPA   |          | M=0  ||  V=0  |  V=1  |  V=1  |
   |Initiator|  IETF    |      || M=1/1 | M=0/0 | M=0/1 |
   |         |Permissive+------++-------+-------+-------+
   |         |          | M=1  ||  V=0  |  V=1  |  V=1  |
   |         |          |      || M=1/1 | M=1/0 | M=1/1 |
   +---------+----------+------++-------+-------+-------+

        Figure 16: MPA Negotiation between an RDMAC RNIC and
                   a Permissive IETF RNIC

A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the Rev field of the MPA Request/Reply Frames and then adjust its receive Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC. As a result, as an MPA Responder, the Permissive IETF RNIC will never return an MPA Reply Frame with the M bit set to zero. This case is shown as not applicable (N/A) in Figure 16.
C.2.4. RDMAC RNIC Initiator

When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting entity prepares an MPA Request message and sets the revision to zero and the M bit and C bit to one. The Permissive IETF Responder receives the MPA Request message and checks the revision field. Since it is capable of generating RDMAC DDP/RDMAP headers, it sends an MPA Reply message with the revision set to zero and the M and C bits set to one. The Responder must inform its ULP that it is generating version zero DDP/RDMAP messages.
C.2.5. Permissive IETF RNIC Initiator
If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA Request Frame, setting the Rev field to one. Regardless of the value of the M bit in the MPA Request Frame, the ULP or other supporting entity for the RDMAC RNIC will create an MPA Reply Frame with Rev equal to zero and the M bit set to one.

When the Initiator reads the Rev field of the MPA Reply Frame and finds that its peer is an RDMAC RNIC, it must inform its ULP that it should generate version zero DDP/RDMAP messages and enable MPA Markers and CRC.
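The outcomes in Figures 15 and 16 can be condensed into a small decision function. The sketch below is informational only; the types and the helper are hypothetical, and rev 0 denotes the RDMAC protocols while rev 1 denotes the IETF protocols:

   #include <stdbool.h>

   enum rnic_type { RDMAC, IETF_NONPERMISSIVE, IETF_PERMISSIVE };

   struct mpa_result {
       bool ok;        /* false: gracefully close the connection   */
       int  version;   /* negotiated DDP/RDMAP version (V)         */
       bool markers;   /* Markers in this peer's receive direction */
   };

   struct mpa_result negotiate(enum rnic_type local, bool local_m,
                               int peer_rev)
   {
       struct mpa_result r = { false, 0, false };
       if (peer_rev == 0) {               /* peer is an RDMAC RNIC  */
           if (local == IETF_NONPERMISSIVE)
               return r;                  /* close (Figure 15)      */
           r.ok = true; r.version = 0; r.markers = true; /* V=0,M=1 */
       } else {                           /* peer is an IETF RNIC   */
           if (local == RDMAC)
               return r;                  /* close                  */
           r.ok = true; r.version = 1;
           r.markers = local_m;  /* each receive direction follows
                                  * the receiver's own M bit        */
       }
       return r;
   }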
C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC

For completeness, Figure 17 below shows the results of MPA negotiation between a Non-permissive IETF RNIC and a Permissive IETF RNIC. The important point from this figure is that an IETF RNIC cannot detect whether its peer is a Permissive or Non-permissive RNIC.

   +---------------------------++-------------------------------+
   | MPA                       ||              MPA              |
   | CONNECT                   ||           Responder           |
   | MODE    +-----------------++---------------+---------------+
   |         | RNIC            ||     IETF      |     IETF      |
   |         | TYPE            || Non-permissive|  Permissive   |
   |         |          +------++-------+-------+-------+-------+
   |         |          |MARKER||  M=0  |  M=1  |  M=0  |  M=1  |
   +---------+----------+------++-------+-------+-------+-------+
   |         |          | M=0  ||  V=1  |  V=1  |  V=1  |  V=1  |
   |         |  IETF    |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
   |         |Non-perms.+------++-------+-------+-------+-------+
   |         |          | M=1  ||  V=1  |  V=1  |  V=1  |  V=1  |
   |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
   |   MPA   +----------+------++-------+-------+-------+-------+
   |Initiator|          | M=0  ||  V=1  |  V=1  |  V=1  |  V=1  |
   |         |  IETF    |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
   |         |Permissive+------++-------+-------+-------+-------+
   |         |          | M=1  ||  V=1  |  V=1  |  V=1  |  V=1  |
   |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
   +---------+----------+------++-------+-------+-------+-------+

      Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC
                 and a Permissive IETF RNIC
Normative References
[iSCSI]      Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. Zeidner, "Internet Small Computer Systems Interface (iSCSI)", RFC 3720, April 2004.

[RFC1191]    Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990.

[RFC2018]    Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.

[RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[RFC3723]    Aboba, B., Tseng, J., Walker, J., Rangan, V., and F. Travostino, "Securing Block Storage Protocols over IP", RFC 3723, April 2004.

[RFC793]     Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RDMASEC]    Pinkerton, J. and E. Deleganes, "Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security", RFC 5042, October 2007.
Informative References

[APPL]       Bestler, C. and L. Coene, "Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct Data Placement (DDP)", RFC 5045, October 2007.

[CRCTCP]     Stone, J. and C. Partridge, "When the CRC and TCP checksum disagree", ACM SIGCOMM, September 2000.

[DAT-API]    DAT Collaborative, "kDAPL (Kernel Direct Access Programming Library) and uDAPL (User Direct Access Programming Library)", http://www.datcollaborative.org.

[DDP]        Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct Data Placement over Reliable Transports", RFC 5041, October 2007.
[iSER]       Ko, M., Chadalapaka, M., Hufferd, J., Elzur, U., Shah, H., and P. Thaler, "Internet Small Computer System Interface (iSCSI) Extensions for Remote Direct Memory Access (RDMA)", RFC 5046, October 2007.

[IT-API]     The Open Group, "Interconnect Transport API (IT-API)", Version 2.1, http://www.opengroup.org.

[NFSv4CHAN]  Williams, N., "On the Use of Channel Bindings to Secure Channels", Work in Progress, June 2006.

[RDMA-DDP]   "Direct Data Placement over Reliable Transports (Version 1.0)", RDMA Consortium, October 2002, <http://www.rdmaconsortium.org/home/draft-shah-iwarp-ddp-v1.0.pdf>.

[RDMA-MPA]   "Marker PDU Aligned Framing for TCP Specification (Version 1.0)", RDMA Consortium, October 2002, <http://www.rdmaconsortium.org/home/draft-culley-iwarp-mpa-v1.0.pdf>.

[RDMA-RDMAC] "An RDMA Protocol Specification (Version 1.0)", RDMA Consortium, October 2002, <http://www.rdmaconsortium.org/home/draft-recio-iwarp-rdmac-v1.0.pdf>.

[RDMAP]      Recio, R., Culley, P., Garcia, D., Hilland, J., and B. Metzler, "A Remote Direct Memory Access Protocol Specification", RFC 5040, October 2007.

[RFC792]     Postel, J., "Internet Control Message Protocol", STD 5, RFC 792, September 1981.

[RFC896]     Nagle, J., "Congestion control in IP/TCP internetworks", RFC 896, January 1984.

[RFC1122]    Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989.

[RFC4960]    Stewart, R., Ed., "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RFC4296]    Bailey, S. and T. Talpey, "The Architecture of Direct Data Placement (DDP) and Remote Direct Memory Access (RDMA) on Internet Protocols", RFC 4296, December 2005.
[RFC4297]    Romanow, A., Mogul, J., Talpey, T., and S. Bailey, "Remote Direct Memory Access (RDMA) over IP Problem Statement", RFC 4297, December 2005.

[RFC4301]    Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[VERBS-RDMA] "RDMA Protocol Verbs Specification", RDMA Consortium standard, April 2003, <http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf>.
Contributors

Dwight Barron
Hewlett-Packard Company
20555 SH 249
Houston, TX 77070-2698 USA
Phone: 281-514-2769
EMail: dwight.barron@hp.com

Jeff Chase
Department of Computer Science
Duke University
Durham, NC 27708-0129 USA
Phone: +1 919 660 6559
EMail: chase@cs.duke.edu

Ted Compton
EMC Corporation
Research Triangle Park, NC 27709 USA
Phone: 919-248-6075
EMail: compton_ted@emc.com

Dave Garcia
24100 Hutchinson Rd.
Los Gatos, CA 95033
Phone: 831 247 4464
EMail: Dave.Garcia@StanfordAlumni.org

Hari Ghadia
Gen10 Technology, Inc.
1501 W Shady Grove Road
Grand Prairie, TX 75050
Phone: (972) 301 3630
EMail: hghadia@gen10technology.com
Howard C. Herbert
Intel Corporation
MS CH7-404
5000 West Chandler Blvd.
Chandler, AZ 85226
Phone: 480-554-3116
EMail: howard.c.herbert@intel.com

Jeff Hilland
Hewlett-Packard Company
20555 SH 249
Houston, TX 77070-2698 USA
Phone: 281-514-9489
EMail: jeff.hilland@hp.com

Mike Ko
IBM
650 Harry Rd.
San Jose, CA 95120
Phone: (408) 927-2085
EMail: mako@us.ibm.com

Mike Krause
Hewlett-Packard Corporation, 43LN
19410 Homestead Road
Cupertino, CA 95014 USA
Phone: +1 (408) 447-3191
EMail: krause@cup.hp.com

Dave Minturn
Intel Corporation
MS JF1-210
5200 North East Elam Young Parkway
Hillsboro, Oregon 97124
Phone: 503-712-4106
EMail: dave.b.minturn@intel.com

Jim Pinkerton
Microsoft, Inc.
One Microsoft Way
Redmond, WA 98052 USA
EMail: jpink@microsoft.com
Hemal Shah
Broadcom Corporation
5300 California Avenue
Irvine, CA 92617 USA
Phone: +1 (949) 926-6941
EMail: hemal@broadcom.com

Allyn Romanow
Cisco Systems
170 W Tasman Drive
San Jose, CA 95134 USA
Phone: +1 408 525 8836
EMail: allyn@cisco.com

Tom Talpey
Network Appliance
1601 Trapelo Road #16
Waltham, MA 02451 USA
Phone: +1 (781) 768-5329
EMail: thomas.talpey@netapp.com

Patricia Thaler
Broadcom
16215 Alton Parkway
Irvine, CA 92618
Phone: 916 570 2707
EMail: pthaler@broadcom.com

Jim Wendt
Hewlett Packard Corporation
8000 Foothills Boulevard MS 5668
Roseville, CA 95747-5668 USA
Phone: +1 916 785 5198
EMail: jim_wendt@hp.com

Jim Williams
Emulex Corporation
580 Main Street
Bolton, MA 01740 USA
Phone: +1 978 779 7224
EMail: jim.williams@emulex.com
Authors' Addresses
Paul R. Culley
Hewlett-Packard Company
20555 SH 249
Houston, TX 77070-2698 USA
Phone: 281-514-5543
EMail: paul.culley@hp.com

Uri Elzur
5300 California Avenue
Irvine, CA 92617 USA
Phone: 949.926.6432
EMail: uri@broadcom.com

Renato J Recio
IBM, Internal Zip 9043
11400 Burnett Road
Austin, Texas 78759
Phone: 512-838-3685
EMail: recio@us.ibm.com

Stephen Bailey
Sandburst Corporation
600 Federal Street
Andover, MA 01810 USA
Phone: +1 978 689 1614
EMail: steph@sandburst.com

John Carrier
Cray Inc.
411 First Avenue S, Suite 600
Seattle, WA 98104-2860
Phone: 206-701-2090
EMail: carrier@cray.com
Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.