3. SMC-R Rendezvous Architecture

"Rendezvous" is the process that SMC-R-capable peers use to dynamically discover each other's capabilities, negotiate SMC-R connections, set up SMC-R links and link groups, and manage those link groups. A key aspect of SMC-R Rendezvous is that it occurs dynamically and automatically, without requiring SMC-R link configuration to be defined by an administrator.

SMC-R Rendezvous starts with the TCP/IP three-way handshake, during which connection peers use TCP options to announce their SMC-R capabilities. If both endpoints are SMC-R capable, then Connection Layer Control (CLC) messages are exchanged between the peers' SMC-R layers over the newly established TCP connection to negotiate SMC-R credentials. The CLC message mechanism is analogous to the messages exchanged by SSL for its handshake processing. If a new SMC-R link is being set up, Link Layer Control (LLC) messages are used to confirm RDMA connectivity. LLC messages are also used by the SMC-R layers at each peer to manage the links and link groups. Once an SMC-R link is set up or agreed to by the peers, the TCP sockets are passed to the peer applications, which use them as normal. The SMC-R layer, which resides under the sockets layer, transmits the socket data between peers over RDMA using the SMC-R protocol, bypassing the TCP/IP stack.

3.1. TCP Options
During the TCP/IP three-way handshake, the client and server indicate their support for SMC-R by including experimental TCP option 254 on the three-way handshake flows, in accordance with [RFC6994] ("Shared Use of Experimental TCP Options"). The Experiment Identifier (ExID) value used is the string "SMCR" in EBCDIC (IBM-1047) encoding (0xE2D4C3D9). This ExID has been registered in the "TCP Experimental Option Experiment Identifiers (TCP ExIDs)" registry maintained by IANA.
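To make the on-the-wire bytes concrete, the following minimal sketch builds the option as laid out by RFC 6994 (Kind 254, a Length byte that counts the whole option, then the 4-byte ExID) and scans a TCP options field for it. Python's standard library has no IBM-1047 codec, so the sketch uses `cp037` instead; this relies on the fact that uppercase A-Z encode identically in both EBCDIC code pages. The function names are illustrative, not taken from any SMC-R implementation.

```python
# EBCDIC "SMCR". Uppercase letters are the same in IBM-037 and IBM-1047.
SMCR_EXID = "SMCR".encode("cp037")  # b'\xe2\xd4\xc3\xd9'
TCPOPT_EOL, TCPOPT_NOP, TCPOPT_EXP = 0, 1, 254


def build_smcr_option() -> bytes:
    """Kind 254, Length 6 (kind + length + 4-byte ExID), then the ExID."""
    return bytes([TCPOPT_EXP, 2 + len(SMCR_EXID)]) + SMCR_EXID


def has_smcr_option(options: bytes) -> bool:
    """Walk a TCP options field and report whether the SMC-R ExID is present."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option: stop scanning
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length: stop scanning
        if kind == TCPOPT_EXP and options[i + 2:i + length] == SMCR_EXID:
            return True
        i += length
    return False


# Example SYN options: MSS (kind 2, length 4), two NOPs, then the SMC-R option.
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 1]) + build_smcr_option()
print(build_smcr_option().hex())     # fe06e2d4c3d9
print(has_smcr_option(syn_options))  # True
```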
After completion of the three-way TCP handshake, each peer queries its peer's options. If both peers set the TCP option on the three-way handshake, inline SMC-R negotiation occurs using CLC messages. If neither peer, or only one peer, sets the TCP option, SMC-R cannot be used for the TCP connection, and the TCP connection completes the setup using the IP fabric.

3.2. Connection Layer Control (CLC) Messages
CLC messages are sent as data payload over the IP network using the TCP connection between SMC-R layers at the peers. They are analogous to the messages used to exchange parameters for SSL. The use of CLC messages is detailed in the following sections. The following list provides a summary of the defined CLC messages and their purposes:

o SMC Proposal: Sent from the client to propose that this TCP connection is eligible to be moved to SMC-R. The client identifies itself and its subnet to the server and passes the SMC-R elements for a suggested RoCE path via the MAC and GID.

o SMC Accept: Sent from the server to accept the client's TCP connection SMC Proposal. The server responds to the client's proposal by identifying itself to the client and passing the elements of a RoCE path that the client can use to perform RDMA writes to the server. This consists of such SMC-R link elements as RoCE MAC, GID, and RMB information.

o SMC Confirm: Sent from the client to confirm the server's acceptance of the SMC connection. The client responds to the server's acceptance by passing the elements of a RoCE path that the server can use to perform RDMA writes to the client. This consists of such SMC-R link elements as RoCE MAC, GID, and RMB information.

o SMC Decline: Sent from either the server or the client to reject the SMC connection, indicating the reason the peer must decline the SMC Proposal and allowing the TCP connection to revert to IP connectivity.

3.3. LLC Messages
Link Layer Control (LLC) messages are sent between peer SMC-R layers over an SMC-R link to manage the link or the link group. LLC messages are sent using RoCE SendMsg and are 44 bytes long. The 44-byte size is based on what can fit into a RoCE Work Queue Element (WQE) without requiring the posting of receive buffers.
LLC messages generally follow a request-reply semantic. Each message has a request flavor and a reply flavor, and each request must be confirmed with a reply, except where otherwise noted. The use of LLC messages is detailed in the following sections. The following list provides a summary of the defined LLC messages and their purposes:

o ADD LINK: Used to add a new link to a link group. Sent from the server to the client to initiate addition of a new link to the link group, or from the client to the server to request that the server initiate addition of a new link.

o ADD LINK CONTINUATION: A continuation of ADD LINK that allows the ADD LINK to span multiple commands, because all of the link information cannot be contained in a single ADD LINK message.

o CONFIRM LINK: Used to confirm that RoCE connectivity over a newly created SMC-R link is working correctly. Initiated by the server. Both this message and its reply must flow over the SMC-R link being confirmed.

o DELETE LINK: When initiated by the server, deletes a specific link from the link group or deletes the entire link group. When initiated by the client, requests that the server delete a specific link or the entire link group.

o CONFIRM RKEY: Informs the peer on the SMC-R link of the addition of an RMB to the link group.

o CONFIRM RKEY CONTINUATION: A continuation of CONFIRM RKEY that allows the CONFIRM RKEY to span multiple commands, in the event that all of the information cannot be contained in a single CONFIRM RKEY message.

o DELETE RKEY: Informs the peer on the SMC-R link of the deletion of one or more RMBs from the link group.

o TEST LINK: Verifies that an already-active SMC-R link is active and healthy.

o Optional LLC message: Any LLC message in which the two high-order bits of the opcode are b'10'. This optional message must be silently discarded by a receiving peer that does not support the opcode. No such messages are defined in this version of the architecture; however, the concept is defined to allow for toleration of possible advanced, optional functions.
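The receive-side rules above (fixed 44-byte messages, silent discard of unsupported optional opcodes) can be sketched as a small classifier. This is an illustrative sketch only: it assumes the opcode is carried in the first byte of the message (see Appendix A for the actual layout), and the opcode values in `SUPPORTED` are hypothetical placeholders, not the architected values. How a peer reacts to an unsupported non-optional opcode is outside the scope of this sketch.

```python
LLC_MSG_LEN = 44  # every LLC message is exactly 44 bytes


def handle_llc(msg: bytes, supported_opcodes) -> str:
    """Decide what a receiver does with an incoming LLC message.

    Assumption: the opcode occupies the first byte of the message.
    Returns "process", "discard" (unsupported optional opcode), or
    "unsupported" (a non-optional opcode this peer does not implement).
    """
    if len(msg) != LLC_MSG_LEN:
        raise ValueError(f"LLC messages are {LLC_MSG_LEN} bytes, got {len(msg)}")
    opcode = msg[0]
    if opcode in supported_opcodes:
        return "process"
    if opcode >> 6 == 0b10:
        # Two high-order bits are b'10': optional message, silently discarded.
        return "discard"
    return "unsupported"


# Hypothetical set of opcodes implemented by this peer.
SUPPORTED = {0x01, 0x02}
print(handle_llc(bytes([0x81]) + bytes(43), SUPPORTED))  # discard
print(handle_llc(bytes([0x01]) + bytes(43), SUPPORTED))  # process
```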
CONFIRM LINK and TEST LINK are sensitive to which link they flow on and must flow on the link being confirmed or tested. The other flows may flow over any active link in the link group. When there are multiple links in a link group, a response to an LLC message must flow over the same link that the original message flowed over, with the following exceptions:

o ADD LINK request from a server in response to an ADD LINK from a client.

o DELETE LINK request from a server in response to a DELETE LINK from a client.

3.4. CDC Messages
Connection Data Control (CDC) messages are sent over the RoCE fabric between peers using RoCE SendMsg and are 44 bytes long. The 44-byte size is based on the size that can fit into a RoCE WQE without requiring the posting of receive buffers. CDC messages are used to describe the socket application data passed via RDMA write operations, as well as TCP connection state information, including producer cursors and consumer cursors, RMBE state information, and failover data validation.

3.5. Rendezvous Flows
Rendezvous information for SMC-R is exchanged as TCP options on the TCP three-way handshake flows to indicate capability, followed by inline TCP negotiation messages to actually do the SMC-R setup. Formats of all rendezvous options and messages discussed in this section are detailed in Appendix A.

3.5.1. First Contact
First contact between RoCE peers occurs when a new SMC-R link group is being set up. This could be because no SMC-R links already exist between the peers, or because the server decides to create a new SMC-R link group in parallel with an existing one.

3.5.1.1. Pre-negotiation of TCP Options
The client and server indicate their SMC-R capability to each other using TCP option 254 on the TCP three-way handshake flows. A client that wishes to do SMC-R will include TCP option 254 using an ExID equal to the EBCDIC (codepage IBM-1047) encoding of "SMCR" on its SYN flow.
A server that supports SMC-R will include TCP option 254 with the ExID value of EBCDIC "SMCR" on its SYN-ACK flow. Because the server is listening for connections and does not know where client connections will come from, the server implementation may choose to unconditionally include this TCP option if it supports SMC-R. This may be required for server implementations where extensions to the TCP stack are not practical. For server implementations that can add code to examine and react to packets during the three-way handshake, the server should include the SMC-R TCP option on the SYN-ACK only if the client included it on its SYN packet.

A client that supports SMC-R and meets the three conditions outlined above may optionally include the TCP option for SMC-R on its ACK flow, regardless of whether or not the server included it on its SYN-ACK flow. Some TCP/IP stacks may have to include it if the SMC-R layer cannot modify the options on the socket until the three-way handshake completes. Proprietary servers should not include this option on the ACK flow, since including it on the SYN flow was sufficient to indicate the client's capabilities.

Once the initial three-way TCP handshake is completed, each peer examines the socket options. SMC-R implementations may do this by examining what was actually provided on the SYN and SYN-ACK packets or by performing a getsockopt() operation to determine the options sent by the peer. If neither peer, or only one peer, specified the TCP option for SMC-R, then SMC-R cannot be used on this connection, and it proceeds using normal IP flows and processing.

If both peers specified the TCP option for SMC-R, then the TCP connection is not started yet, and the peers proceed to SMC-R negotiation using inline data flows. The socket is not yet turned over to the applications; instead, the respective SMC layers exchange CLC messages over the newly formed TCP connection.

3.5.1.2. Client Proposal
If SMC-R is supported by both peers, the client sends an SMC Proposal CLC message to the server. It is not immediately apparent on this flow from client to server whether this is a new or existing SMC-R link, because in clustered environments a single IP address may represent multiple hosts. This type of cluster virtual IP address can be owned by a network-based or host-based Layer 4 load balancer that distributes incoming TCP connections across a cluster of servers/hosts. For purposes of high availability, other clustered environments may also support the movement of a virtual IP address dynamically from one host in the cluster to another. In summary, the client cannot predetermine that a connection is targeting the same host by simply matching the destination IP address for outgoing TCP connections. Therefore, it cannot predetermine the SMC-R link that will be used for a new TCP connection. This information will be dynamically learned, and the appropriate actions will be taken as the SMC-R negotiation handshake unfolds.

In the SMC Proposal CLC message, the initiator (client) proposes the use of SMC-R by including its peer ID, GID, and MAC addresses, as well as the IP subnet number of the outgoing interface (if IPv4) or the IP prefix list for the network over which the proposal is sent (if IPv6). At this point in the flow, the client makes no local commitments of resources for SMC-R.

When the server receives the SMC Proposal CLC message, it uses the peer ID provided by the client, plus subnet or prefix information provided by the client, to determine if it already has a usable SMC-R link with this SMC-R peer. If there are one or more existing SMC-R links with this SMC-R peer, the server then decides which SMC-R link it will use for this TCP connection. See Sections 3.5.2 and 3.5.3 for the cases of reusing an existing SMC-R link or creating a parallel SMC-R link group between SMC-R peers.

If this is a first contact between SMC-R peers, the server must validate that it is on the same LAN as the client before continuing. For IPv4, the server does this by verifying that it has an interface with an IP subnet number that matches the subnet number sent by the client in the SMC Proposal. For IPv6, it does this by verifying that it is directly attached to at least one IP prefix that was listed by the client in its SMC Proposal message.

If the server agrees to use SMC-R, the server begins the setup of a new SMC-R link by allocating local QP and RMB resources (setting its QP state to INIT) and providing its full SMC-R information in an SMC Accept CLC message to the client over the TCP connection, along with a flag set indicating that this is a first contact flow.
While the SMC Accept message could flow over any IP route back to the client depending upon Layer 3 IP routing, the SMC-R credentials provided must be for the common subnet or prefix between the server and client, as determined above. If the server cannot or does not want to do SMC-R with the client, it sends an SMC Decline CLC message to the client, and the connection data may begin flowing using normal TCP/IP flows.
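The server's first-contact LAN check and its Accept-or-Decline choice can be sketched with Python's `ipaddress` module. This is a minimal sketch under stated assumptions: the client's subnet and prefixes are passed as CIDR strings rather than in the Appendix A wire format, server interfaces are given as address/prefix strings, and `find_common_network` and `first_contact_reply` are hypothetical names. The addresses are documentation examples.

```python
import ipaddress


def find_common_network(server_ifaces, client_subnet=None, client_prefixes=()):
    """First-contact LAN check from the server's point of view.

    IPv4: some server interface must be on the subnet the client sent.
    IPv6: some server interface must be directly attached to one of the
    prefixes the client listed. Returns the shared network, or None.
    """
    wanted = []
    if client_subnet is not None:
        wanted.append(ipaddress.ip_network(client_subnet))
    wanted.extend(ipaddress.ip_network(p) for p in client_prefixes)
    for iface in server_ifaces:
        net = ipaddress.ip_interface(iface).network
        if net in wanted:
            return net
    return None


def first_contact_reply(server_ifaces, client_subnet=None, client_prefixes=()):
    """Return the CLC message the server sends and the network it is for."""
    net = find_common_network(server_ifaces, client_subnet, client_prefixes)
    if net is None:
        return ("SMC Decline", None)  # connection continues over TCP/IP
    return ("SMC Accept", net)  # SMC-R credentials must be for this network


server_ifaces = ["192.0.2.10/24", "2001:db8:1::5/64"]
print(first_contact_reply(server_ifaces, client_subnet="192.0.2.0/24"))
print(first_contact_reply(server_ifaces, client_subnet="198.51.100.0/24"))
```

Comparing whole networks (address plus prefix length) rather than bare addresses mirrors the text's "matching subnet number" and "directly attached prefix" wording.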
3.5.1.3. Server Acceptance
When the client receives the SMC Accept from the server, it determines whether this is a new or existing SMC-R link, using the combination of the following: the first contact flag, its MAC/GID and the MAC/GID returned by the server, the VLAN over which the connection is setting up, and the QP number provided by the server. If it is an existing SMC-R link and the client agrees to use that link for the TCP connection, see Section 3.5.2 ("Subsequent Contact") below. If it is a new SMC-R link between peers that already have an SMC-R link, then the server is starting a new SMC-R link group.

Assuming that either (1) this is a first contact between peers or (2) the server is starting a new SMC-R link group, the client now allocates local QP and RMB resources for the SMC-R link (setting the QP state to RTR (ready to receive)), associates them with the server QP as learned from the SMC Accept CLC message, and sends an SMC Confirm CLC message to the server over the TCP connection with its SMC-R link information included. The client also starts a timer to wait for the server to confirm the reliably connected queue pair, as described below.

3.5.1.4. Client Confirmation
Upon receipt of the client's SMC Confirm CLC message, the server associates its QP for this SMC-R link with the client's QP as learned from the SMC Confirm CLC message and sets its QP state to RTS (ready to send). The client and the server now have reliably connected queue pairs.

3.5.1.5. Link (QP) Confirmation
Since setting up the SMC-R link and its QPs did not require any network flows on the RoCE fabric, the client and server must now confirm connectivity over the RoCE fabric. To accomplish this, the server will send a CONFIRM LINK Link Layer Control (LLC) message to the client over the newly created SMC-R link, using the RoCE fabric. The CONFIRM LINK LLC message will provide the server's MAC, GID, and QP information for the connection, allow each partner to communicate the maximum number of links it can tolerate in this link group (the "link limit"), and will additionally provide two link IDs:

o a 1-byte server-assigned link number that is used by both peers to identify the link within the link group and is only unique within a link group.

o a 4-byte link user ID. This opaque value is assigned by the server for the server's local use and is provided to the client for management purposes -- for example, to use in network management displays and products.

When the server sends this message, it will set a timer for receiving confirmation from the client. When the client receives the server's confirmation in the form of a CONFIRM LINK LLC message, it will cancel the confirmation timer it set when it sent the SMC Confirm message. The client will also advance its QP state to RTS and respond over the RoCE fabric with a CONFIRM LINK response LLC message that (1) provides its MAC, GID, QP number, and link limit, (2) confirms the 1-byte link number sent by the server, and (3) provides its own 4-byte link user ID to the server.
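The QP state progression and timers described for first contact can be traced with a small model. This is an illustrative sketch, not an implementation: `QP`, `Peer`, and `first_contact_flow` are hypothetical names. The text says the server moves from INIT to RTS; the sketch assumes, as in the usual RDMA verbs QP state machine, that a QP passes through each intermediate state, so that step goes via RTR. It also assumes the server cancels its own timer when the CONFIRM LINK response arrives.

```python
from enum import IntEnum


class QP(IntEnum):
    RESET = 0
    INIT = 1
    RTR = 2  # ready to receive
    RTS = 3  # ready to send


class Peer:
    def __init__(self, name):
        self.name = name
        self.qp = QP.RESET
        self.history = [QP.RESET]
        self.timer_running = False

    def move_to(self, target):
        """Advance the QP state, passing through each intermediate state."""
        if target < self.qp:
            raise ValueError("QP state cannot move backwards in this model")
        for state in range(self.qp + 1, target + 1):
            self.qp = QP(state)
            self.history.append(self.qp)


def first_contact_flow(link_number=1):
    server, client = Peer("server"), Peer("client")
    server.move_to(QP.INIT)      # allocates QP/RMB, sends SMC Accept
    client.move_to(QP.RTR)       # allocates QP/RMB, sends SMC Confirm
    client.timer_running = True  # waits for CONFIRM LINK from the server
    server.move_to(QP.RTS)       # on receipt of SMC Confirm
    server.timer_running = True  # sends CONFIRM LINK, waits for the reply
    # Client receives CONFIRM LINK over the RoCE fabric.
    client.timer_running = False
    client.move_to(QP.RTS)
    echoed_link_number = link_number  # client confirms the server's link number
    # Server receives the CONFIRM LINK response.
    server.timer_running = False
    return server, client, echoed_link_number


server, client, link_no = first_contact_flow()
print([s.name for s in server.history])  # ['RESET', 'INIT', 'RTR', 'RTS']
```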
      Host X -- Server                      Host Y -- Client
   +--------------------+                +--------------------+
   |   Peer ID = PS1    |                |   Peer ID = PC1    |
   |           +------+ |                | +------+           |
   |   QP 8    |RNIC 1| |                | |RNIC 2|   QP 64   |
   | RToken X  |MAC MA| |                | |MAC MB| RToken Y  |
   |    |      |GID GA| |  (Subnet S1)   | |GID GB|    |      |
   |    \/     +------+ |                | +------+    \/     |
   | +--------+         |                |         +--------+ |
   | |  RMB   |         |                |         |  RMB   | |
   | +--------+         |                |         +--------+ |
   |           +------+ |                | +------+           |
   |           |RNIC 3| |                | |RNIC 4|           |
   |           |MAC MC| |                | |MAC MD|           |
   |           |GID GC| |                | |GID GD|           |
   |           +------+ |                | +------+           |
   +--------------------+                +--------------------+

                     SYN TCP options(254,"SMCR")
      <-------------------------------------------------------
                   SYN-ACK TCP options(254,"SMCR")
      ------------------------------------------------------->
                    ACK [TCP options(254,"SMCR")]
      <-------------------------------------------------------
                     SMC Proposal(PC1,MB,GB,S1)
      <-------------------------------------------------------
   SMC Accept(PS1,first contact,MA,GA,MTU,QP8,RToken=X,RMB elem index)
      ------------------------------------------------------->
     SMC Confirm(PC1,MB,GB,MTU,QP64,RToken=Y,RMB element index)
      <-------------------------------------------------------
   CONFIRM LINK(MA,GA,QP8, link lim, server link user ID, linknum)
      .......................................................>
 CONFIRM LINK rsp(MB,GB,QP64, link lim, client link user ID, linknum)
      <.......................................................

   Legend:
     ------------  TCP/IP and CLC flows
     ............  RoCE (LLC) flows
     Square brackets ("[ ]") indicate optional information

                Figure 8: First Contact Rendezvous Flows
Technically, the data for the TCP connection could now flow over the RoCE path. However, if this is a first contact, there is no alternate for this recently established RoCE path. Because the current architecture provides no failover from RoCE to IP once connection data starts flowing, a failure of this path would disrupt the TCP connection, so the level of redundancy and failover would be less than that provided by IP. Even if the network has alternate RoCE paths available, they would not be usable at this point. This situation would be unacceptable.

3.5.1.6. Second SMC-R Link Setup
Because of the unacceptable situation described above, TCP data will not be allowed to flow on the newly established SMC-R link until a second path has been set up, or at least attempted. If the server has a second RNIC available on the same LAN, it attempts to set up the second SMC-R link over that second RNIC. If it only has one RNIC available on the LAN, it will attempt to set up the second SMC-R link over that one RNIC. In the latter case, the server is attempting to set up an asymmetric link, in case the client does have a second RNIC on the LAN.

In either case, the server allocates a new QP over the RNIC it is attempting to use for the second link and assigns a link number to the new link; the server also creates an RToken for the RMB over this second QP (note that this means that the first and second QP each have their own RToken to represent the same RMB). The server provides this information, as well as the MAC and GID of the RNIC over which it is attempting to set up the second link, in an ADD LINK LLC message that it sends to the client over the SMC-R link that is already set up.

3.5.1.6.1. Client Processing of ADD LINK LLC Message from Server
When the client receives the server's ADD LINK LLC message, it examines the GID and MAC provided by the server to determine whether the server is attempting to use the same server-side RNIC as the existing SMC-R link or a different one. If the server is attempting to use the same server-side RNIC as the existing SMC-R link, then the client verifies that it has a second RNIC on the same LAN. If it does not, the client rejects the ADD LINK request from the server, because the resulting link would be a parallel link, which is not supported within a link group. If the client does have a second RNIC on the same LAN, it accepts the request, and an asymmetric link will be set up.
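The client's decision on an incoming ADD LINK, covering both the same-RNIC case above and the different-RNIC case in the following paragraph, can be sketched as a pure function. This is a minimal sketch under assumptions: RNICs are identified by hypothetical (MAC, GID) tuples using the labels from the figures, and when several other RNICs are available the sketch simply takes the first one. The function name is illustrative.

```python
def client_add_link_decision(existing_server_rnic, offered_server_rnic,
                             existing_client_rnic, client_rnics_on_lan):
    """Return (accept, client_rnic, link_kind) for a server ADD LINK request.

    link_kind is "symmetric", "asymmetric", or None when rejected.
    """
    others = [r for r in client_rnics_on_lan if r != existing_client_rnic]
    second = others[0] if others else None
    if offered_server_rnic == existing_server_rnic:
        # Same server-side RNIC: without a second client RNIC the new link
        # would be a parallel link, which a link group does not allow.
        if second is None:
            return (False, None, None)
        return (True, second, "asymmetric")
    # Different server-side RNIC: always accept.
    if second is not None:
        return (True, second, "symmetric")
    return (True, existing_client_rnic, "asymmetric")


RNIC_1, RNIC_2 = ("MA", "GA"), ("MB", "GB")  # server, client (first link)
RNIC_3, RNIC_4 = ("MC", "GC"), ("MD", "GD")  # server, client (second link)
print(client_add_link_decision(RNIC_1, RNIC_3, RNIC_2, [RNIC_2, RNIC_4]))
# (True, ('MD', 'GD'), 'symmetric')
```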
If the server is using a different server-side RNIC from the existing SMC-R link, then the client will accept the request and a second SMC-R link will be set up in this SMC-R link group. If the client has a second RNIC on the same LAN, that second RNIC will be used for the second SMC-R link, creating symmetric links. If the client does not have a second RNIC on the same LAN, it will use the same RNIC as was used for the initial SMC-R link, resulting in the setup of an asymmetric link in the SMC-R link group.

In either case, when the client accepts the server's ADD LINK request, it allocates a new QP on the chosen RNIC and creates an RKey over that new QP for the client-side RMB for the SMC-R link group, then sends an ADD LINK reply LLC message to the server providing that information as well as echoing the link number that was sent by the server. If the client rejects the server's ADD LINK request, it sends an ADD LINK reply LLC message to the server with the reason code for the rejection.

3.5.1.6.2. Server Processing of ADD LINK Reply LLC Message from Client
If the client sends a negative response to the server or no reply is received, the server frees the RoCE resources it had allocated for the new link. Having a single link in an SMC-R link group is undesirable. The server's recovery is detailed in Appendix C.8 ("Failure to Add Second SMC-R Link to a Link Group").

If the client sends a positive reply to the server with MAC/GID/QP/RKey information, the server associates its QP for the new SMC-R link to the QP that the client provided. Now, the new SMC-R link is in the same situation that the first link was in after the server processed the client's SMC Confirm CLC message: there is a reliably connected queue pair over the new RoCE path, but there have been no RoCE flows to confirm that it is actually usable. So, at this point, the client and server will exchange CONFIRM LINK LLC messages just as they did on the first SMC-R link. If either peer receives a failure during this second CONFIRM LINK LLC exchange (either an immediate failure, which implies that the message did not reach the partner, or a timeout), it sends a DELETE LINK LLC message to the partner over the first (and now only) link in the link group. This DELETE LINK LLC message must be acknowledged before data can flow on the single link in the link group.
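The server-side outcomes of an ADD LINK attempt described above can be summarized as a small decision function. This is a hedged sketch only: the reply is modeled as a hypothetical dict (or `None` for a timeout), the action strings merely paraphrase the text, and `server_add_link_outcome` is an illustrative name; the actual recovery for a missing second link is the one in Appendix C.8.

```python
def server_add_link_outcome(reply, confirm_ok):
    """Server-side result of trying to add a second link (illustrative only).

    reply      -- None if no ADD LINK reply arrived in time, otherwise a
                  dict such as {"accepted": True, "qp": 65}.
    confirm_ok -- whether the CONFIRM LINK exchange on the new link worked.
    Returns (number_of_links_in_group, list_of_actions).
    """
    if reply is None or not reply.get("accepted", False):
        # Negative reply or timeout: release what was allocated for the link.
        return 1, ["free RoCE resources for new link",
                   "recover as in Appendix C.8"]
    actions = ["associate server QP with client QP",
               "exchange CONFIRM LINK on new link"]
    if not confirm_ok:
        # Immediate failure or timeout: remove the new link via the first one.
        actions += ["send DELETE LINK over first link",
                    "hold data until DELETE LINK is acknowledged"]
        return 1, actions
    return 2, actions


print(server_add_link_outcome({"accepted": True, "qp": 65}, confirm_ok=True)[0])  # 2
```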
      Host X -- Server                      Host Y -- Client
   +--------------------+                +--------------------+
   |   Peer ID = PS1    |                |   Peer ID = PC1    |
   |           +------+ |  SMC-R Link 1  | +------+           |
   |   QP 8    |RNIC 1| |<-------------->| |RNIC 2|   QP 64   |
   | RToken X  |MAC MA| |                | |MAC MB| RToken Y  |
   |    |      |GID GA| |                | |GID GB|    |      |
   |    \/     +------+ |                | +------+    \/     |
   | +--------+         |                |         +--------+ |
   | |        |         |                |         |        | |
   | |  RMB   |         |                |         |  RMB   | |
   | |        |         |                |         |        | |
   | +--------+         |                |         +--------+ |
   |    /\     +------+ |                | +------+    /\     |
   |    |      |RNIC 3| |  SMC-R Link 2  | |RNIC 4|    |      |
   | RToken Z  |MAC MC| |<-------------->| |MAC MD| RToken W  |
   |   QP 9    |GID GC| | (being added)  | |GID GD|   QP 65   |
   |           +------+ |                | +------+           |
   +--------------------+                +--------------------+

             First SMC-R link setup as shown in Figure 8
      <-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.->
             ADD LINK request(QP9,MC,GC, link number = 2)
      .......................................................>
            ADD LINK response(QP65,MD,GD, link number = 2)
      <.......................................................
               ADD LINK CONTINUATION request(RToken=Z)
      .......................................................>
               ADD LINK CONTINUATION response(RToken=W)
      <.......................................................
        CONFIRM LINK(MC,GC,QP9, link number = 2, link user ID)
      .......................................................>
    CONFIRM LINK response(MD,GD,QP65, link number = 2, link user ID)
      <.......................................................

   Legend:
     ------------  TCP/IP and CLC flows
     ............  RoCE (LLC) flows

               Figure 9: First Contact, Second Link Setup
3.5.1.6.3. Exchange of RKeys on Second SMC-R Link
Note that in the scenario described here -- first contact -- there is only one RMB RKey to exchange on the second SMC-R link, and it is exchanged in the ADD LINK CONTINUATION request and reply. In scenarios other than first contact -- for example, adding a new SMC-R link to a longstanding link group with multiple RMBs -- additional flows will be required to exchange additional RMB RKeys. See Section 3.5.5.2.3 ("Adding a New SMC-R Link to a Link Group with Multiple RMBs") for more details on these flows.

3.5.1.6.4. Aborting SMC-R and Falling Back to IP
If either partner does not provide the SMC-R TCP option during the three-way TCP handshake, the connection falls back to normal TCP/IP. During the SMC-R negotiation that occurs after the three-way TCP handshake, either partner may break off SMC-R by sending an SMC Decline CLC message. The SMC Decline CLC message may be sent in place of any expected message and may also be sent during the CONFIRM LINK LLC exchange if there is a failure before any application data has flowed over the RoCE fabric. For more details on exactly when an SMC Decline can flow during link group setup, see Appendices C.1 ("SMC Decline during CLC Negotiation") and C.2 ("SMC Decline during LLC Negotiation").

If this fallback to IP happens while setting up a new SMC-R link group, the RoCE resources allocated for this SMC-R link group relationship are torn down, and it will be retried as a new SMC-R link group next time a connection starts between these peers with SMC-R proposed. Note that if this happens because one side does not support SMC-R, there will be very little to tear down, as the TCP option will have failed to flow on either the initial SYN or the SYN-ACK before either side had reserved any local RoCE resources.

3.5.2. Subsequent Contact
"Subsequent contact" means setting up a new TCP connection between two peers that already have an SMC-R link group between them and reusing the existing SMC-R link group. In this case, it is not necessary to allocate new QPs. However, it is possible that a new RMB has been allocated for this TCP connection, if the previous TCP connection used the last element available in the previously used RMB, or for any other implementation-dependent reason. For this reason, and for convenience and error checking, the same TCP option 254, followed by the inline negotiation method described for initial contact, will be used for subsequent contact, but the processing differs in some ways. That processing is described below.
3.5.2.1. SMC-R Proposal
When the client begins the inline negotiation with the server, it does not know if this is a first contact or a subsequent contact. The client cannot know this information until it sees the server's peer ID, to determine whether or not it already has an SMC-R link with this peer that it can use. There are several reasons why it is not sufficient to use the partner IP address, subnet, VLAN, or other IP information to make this determination. The most obvious reason is distributed systems: if the server IP address is actually a virtual IP address representing a distributed cluster, the actual host serving this TCP connection may not be the same as the host that served the last TCP connection to this same IP address.

After the TCP three-way handshake, assuming that both partners indicate SMC-R capability, the client builds and sends the SMC Proposal CLC message to the server in exactly the same manner as it does in the "first contact" case, and in fact at this point doesn't know if it's a first contact or a subsequent contact. As in the "first contact" case, the client sends its peer ID value, suggested RNIC MAC/GID, and IP subnet or prefix information.

Upon receiving the client's proposal, the server looks up the provided peer ID to determine if it already has a usable SMC-R link group with this peer. If it does already have a usable SMC-R link group, the server then needs to decide whether it will use the existing SMC-R link group or create a new link group. For the case of the new link group, see Section 3.5.3 ("First Contact Variation: Creating a Parallel Link Group") below. For this discussion, assume that the server decides to use the existing SMC-R link group for the TCP connection, which is expected to be the most common case. The server is responsible for making this decision. The server then needs to communicate that information to the client, but it is not necessary to allocate, associate, and confirm QPs for the chosen SMC-R link.
All that remains to be done is to set up RMB space for this TCP connection. If one of the RMBs already in use for this SMC-R link group has an available element that uses the appropriate buffer size, the server merely chooses one for this TCP connection and then sends an SMC Accept CLC message providing the full RoCE information for the chosen SMC-R link to the client, using the same format as the SMC Accept CLC message described in Section 3.5.1 ("First Contact") above.
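The server-side choice on a subsequent contact (look up the peer ID, then reuse a free element of the right size or fall back to allocating a new RMB) can be sketched as follows. This is a minimal sketch under assumptions: `RMB`, `choose_rmb_element`, and the dict keyed by peer ID are hypothetical, and the sketch ignores the option, described in Section 3.5.3, of creating a parallel link group even when a usable one exists; "first contact" here only means no usable group was found.

```python
from dataclasses import dataclass


@dataclass
class RMB:
    rtoken: str     # RToken the peer uses to RDMA-write into this RMB
    elem_size: int  # size of each RMB element (buffer) in bytes
    in_use: list    # one boolean per element

    def claim_free_element(self):
        """Mark and return the index of a free element, or None if full."""
        for index, used in enumerate(self.in_use):
            if not used:
                self.in_use[index] = True
                return index
        return None


def choose_rmb_element(link_groups, peer_id, elem_size):
    """Server-side choice on an SMC Proposal (illustrative only).

    link_groups maps a peer ID to the RMBs of an existing, usable link group.
    """
    rmbs = link_groups.get(peer_id)
    if rmbs is None:
        return ("first contact", None, None)
    for rmb in rmbs:
        if rmb.elem_size == elem_size:
            index = rmb.claim_free_element()
            if index is not None:
                return ("reuse element", rmb.rtoken, index)
    # No suitable free element: add a new RMB (Section 3.5.5.2.1) first.
    return ("new RMB", None, None)


groups = {"PC1": [RMB("X", 65536, [True, False])]}
print(choose_rmb_element(groups, "PC1", 65536))  # ('reuse element', 'X', 1)
print(choose_rmb_element(groups, "PC1", 65536))  # ('new RMB', None, None)
print(choose_rmb_element(groups, "PC2", 65536))  # ('first contact', None, None)
```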
The server may choose to use the SMC-R link that matches the suggested MAC/GID provided by the client in the SMC Proposal for its RDMA writes but is not obligated to do so. The final decision on which specific SMC-R link to assign a TCP connection to is an independent server and client decision.

It may be necessary for the server to allocate a new RMB for this connection. The reasons for this are implementation dependent and could include the following:

o no available space in the existing RMB or RMBs,

o a desire to allocate a new RMB that uses a different buffer size from the ones already created, or

o any other implementation-dependent reason.

In this case, the server will allocate the new RMB and then perform the flows described in Section 3.5.5.2.1 ("Adding a New RMB to an SMC-R Link Group"). Once that processing is complete, the server then provides the full RoCE information, including the new RKey, for this connection in an SMC Accept CLC message to the client.

3.5.2.2. SMC-R Acceptance
Upon receiving the SMC Accept CLC message from the server, the client examines the RoCE information provided by the server to determine whether this is a first contact for a new SMC-R link group or a subsequent contact for an existing SMC-R link group. It is a subsequent contact if the server-side peer ID, GID, MAC, and QP number provided in the packet match a known SMC-R link, and the first contact flag is not set. If this is not the case -- for example, the GID and MAC match but the QP is new -- then the server is creating a new, parallel SMC-R link group, and this is treated as a first contact. A different RMB RToken does not indicate a first contact, as the server may have allocated a new RMB or may be using several RMBs for this SMC-R link. The client needs the server's RMB information only for its RDMA writes to the server, and since there is no requirement for symmetric RMBs, this information is simply control information for the RDMA writes on this SMC-R link.

The client must validate that the RMB element being provided by the server is not in use by another TCP connection on this SMC-R link group. This validation must compare the new <rtoken, index> against all known <rtoken, index> pairs on this link group. See Section 4.4.2 ("RMB Element Reuse and Conflict Resolution") for the case in which the server tries to use an RMB element that is already in use on this link group.

Once the client has determined that this TCP connection is a subsequent contact over an existing SMC-R link, it performs an RMB allocation process similar to what the server did: it either (1) allocates an element from an RMB already associated with this SMC-R link or (2) allocates a new RMB, associates it with this SMC-R link, and then chooses an element out of it. If the client allocates a new RMB for this TCP connection, it performs the processing described in Section 3.5.5.2.1 ("Adding a New RMB to an SMC-R Link Group"). Once that processing is complete, the client provides its full RoCE information for this TCP connection in an SMC Confirm CLC message. Because an SMC-R link with a verified connected QP already exists and is being reused, there is no need for verification or alternate QP selection flows or timers.

3.5.2.3. SMC-R Confirmation
When the server receives the client's SMC Confirm CLC message on a subsequent contact, it verifies the following:
o The RMB element provided by the client is not already in use by another TCP connection on this SMC-R link group (see Section 4.4.2 ("RMB Element Reuse and Conflict Resolution") for the case in which it is).
o The MAC/GID/QP information provided by the client matches an active link within the link group. The client is free to select any valid/active link. The client is not required to select the same link as the server.
If this validation passes, the server stores the client's RMB information for this connection, and the RoCE setup of the TCP connection is complete.
3.5.2.4. TCP Data Flow Race with SMC Confirm CLC Message
On a subsequent contact TCP/IP connection, a peer may send data as soon as it has received the peer RMB information for the connection. There are no additional RoCE confirmation flows, since the QPs on the SMC-R link are already reliably connected and verified.
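On a subsequent contact, the server's RMB element is ready before the server sends the SMC Accept. Data written by the client can therefore land in the RMB before the server has processed the client's SMC Confirm. The following is a minimal Python sketch of the server-side rule: hold such data back from the socket application until the Confirm is processed. The `ServerConnection` class and its method names are hypothetical and are not part of the protocol definition.

```python
from dataclasses import dataclass, field


@dataclass
class ServerConnection:
    """Hypothetical server-side state for one subsequent-contact SMC-R connection."""
    confirmed: bool = False                         # SMC Confirm CLC fully processed?
    pending: list = field(default_factory=list)     # data held back from the application
    delivered: list = field(default_factory=list)   # data handed to the socket application

    def on_rdma_data(self, data: bytes) -> None:
        # Data that races ahead of the SMC Confirm must not reach the application yet.
        if self.confirmed:
            self.delivered.append(data)
        else:
            self.pending.append(data)

    def on_smc_confirm(self) -> None:
        # Confirm processed: the socket is opened, and held-back data is released in order.
        self.confirmed = True
        self.delivered.extend(self.pending)
        self.pending.clear()


conn = ServerConnection()
conn.on_rdma_data(b"GET /")        # arrives before the SMC Confirm is processed
print(conn.delivered)              # []
conn.on_smc_confirm()
conn.on_rdma_data(b" HTTP/1.1")
print(conn.delivered)              # [b'GET /', b' HTTP/1.1']
```
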
In the majority of cases, the first data will flow from the client to the server. The client must send the SMC Confirm CLC message before sending any connection data over the chosen SMC-R link. However, the client need not wait for confirmation of this message, and in fact there will be no such confirmation. Since the server is required to have the RMB fully set up and ready to receive data from the client before sending an SMC Accept CLC message, the client can begin sending data over the SMC-R link as soon as it has sent the SMC Confirm CLC message.
It is possible that data from the client will arrive at the server-side RMB before the server has processed the client's SMC Confirm CLC message. In this case, the server must handle this race condition. It must not provide the arrived TCP data to the socket application until the SMC Confirm CLC message has been received and fully processed, opening the socket.
If the server has initial data to send to the client that is not a response to the client (this case should be rare), it can send the data immediately upon receiving and processing the SMC Confirm CLC message from the client. The client must have opened the TCP socket to the client application upon sending the SMC Confirm CLC message, so that the client is ready to process data from the server.
3.5.3. First Contact Variation: Creating a Parallel Link Group
Recall that parallel SMC-R links (multiple SMC-R links within a link group that use the same network path) are not supported within an SMC-R link group. However, multiple SMC-R link groups between the same peers are supported. This means that if multiple SMC-R links over the same RoCE path are desired, it is necessary to use multiple SMC-R link groups. While not a recommended practice, this could be done for platform-specific reasons, such as QP separation of different workloads. Only the server can drive the creation of multiple SMC-R link groups between peers.
At a high level, when the server decides to create an additional SMC-R link group with a client with which it already has an SMC-R link group, the flows are basically the same as the normal "first contact" case described above. The following text provides more detail and clarification of processing in this case.
When the server receives the SMC Proposal CLC message from the client and, using the MAC/GID information, determines that it already has an SMC-R link group with this client, the server can either reuse the existing SMC-R link group (detailed in Section 3.5.2 ("Subsequent Contact") above) or create a new SMC-R link group in addition to the existing one.
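Both sides of this decision can be sketched in Python. The server's choice between reusing the existing group and creating a new one is local policy, so the sketch models it as a flag. The client's classification of the resulting SMC Accept follows the subsequent-contact rule: peer ID, GID, MAC, and QP number match a known link, and the first contact flag is not set. The client's <rtoken, index> uniqueness check on the server's RMB element is included as well. All names (`LinkId`, `SmcAccept`, `server_sets_first_contact`, `classify_accept`, `rmbe_conflicts`) are hypothetical. RTokens are treated as opaque strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkId:
    """Server-side identity of an SMC-R link, as carried in an SMC Accept."""
    peer_id: str
    gid: str
    mac: str
    qp: int


@dataclass(frozen=True)
class SmcAccept:
    link: LinkId
    first_contact: bool
    rtoken: str       # server RMB RToken, treated as opaque here
    rmbe_index: int   # RMB element index within that RMB


def server_sets_first_contact(has_group_with_client: bool, wants_parallel_group: bool) -> bool:
    # Only the server drives creation of an extra link group; the choice is local
    # policy (e.g. QP separation of workloads), modeled here as a flag.
    return not has_group_with_client or wants_parallel_group


def classify_accept(accept: SmcAccept, known_links: set) -> str:
    # Subsequent contact only if peer ID, GID, MAC, and QP all match a known link
    # AND the first contact flag is clear. The RToken is deliberately ignored:
    # a new server RMB is not a reason to treat this as a first contact.
    if accept.link in known_links and not accept.first_contact:
        return "subsequent"
    return "first"   # e.g. same GID/MAC with a new QP: a new, parallel link group


def rmbe_conflicts(accept: SmcAccept, in_use: set) -> bool:
    # The <rtoken, index> pair must be unique across the whole link group.
    return (accept.rtoken, accept.rmbe_index) in in_use


known = {LinkId("PS1", "GA", "MA", 8)}
a = SmcAccept(LinkId("PS1", "GA", "MA", 8), False, "X", 3)
b = SmcAccept(LinkId("PS1", "GA", "MA", 12), False, "X", 4)
print(classify_accept(a, known), classify_accept(b, known))  # subsequent first
print(rmbe_conflicts(a, {("X", 3)}))                         # True
```
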
If the server decides to create a new SMC-R link group, it does the same processing it would have done for first contact: it allocates QP and RMB resources as well as alternate QP resources, and it communicates the QP and RMB information to the client in the SMC Accept CLC message with the first contact flag set.
When the client receives the server's SMC Accept CLC message with the new QP information and the first contact flag set, it knows that the server is creating a new SMC-R link group even though it already has an SMC-R link group with the server. In this case, the client will also allocate a new QP for this new SMC-R link, allocate an RMB for it, and generate an RKey for it. Note that multiple SMC-R link groups between the same peers must access different RMB resources, so new RMBs will be required. Using the same RMBs that are in use in another SMC-R link group is not permitted. The client then associates its new QP with the server's new QP, sends its SMC Confirm CLC message back to the server providing the new QP/RMB information, and then sets its confirmation timer for the new SMC-R link.
When the server receives the client's SMC Confirm CLC message, it associates its QP with the client's QP as learned from the SMC Confirm CLC message and sends a confirmation LLC message. The rest of the flow, with the confirmation QP and setup of additional SMC-R links, unfolds just like the "first contact" case.
3.5.4. Normal SMC-R Link Termination
The normal socket API trigger points are used by the SMC-R layer to initiate SMC-R connection termination flows. The main design point for SMC-R normal connection termination flows is as follows. The SMC-R protocol is used first, to shut down the SMC-R connection and free up any SMC-R RDMA resources. Only then does the normal TCP connection termination protocol (i.e., FIN processing) drive cleanup of the TCP connection that exists on the IP fabric. This design point is very important: it ensures that RDMA resources such as the RMBEs are freed and reused only when both SMC-R endpoints are completely done with their RDMA write operations to the partner's RMBE.
When the last TCP connection over an SMC-R link group terminates, the link group can be terminated. As with the creation of SMC-R links and link groups, the primary responsibility for determining that normal termination is needed, and for initiating it, lies with the server.
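The ordering rule can be illustrated with a minimal Python sketch. An RMBE is freed only after both endpoints have finished writing to it, and TCP FIN processing follows only after that. The link group becomes terminable, by the server, once its last connection is gone. `LinkGroupTermination`, its fields, and the `local_done`/`peer_done` flags are hypothetical stand-ins for however an implementation tracks SMC-R shutdown state.

```python
from dataclasses import dataclass, field


@dataclass
class LinkGroupTermination:
    """Hypothetical bookkeeping for normal termination on one SMC-R link group."""
    is_server: bool
    active: set = field(default_factory=set)        # connection ids still using RDMA
    free_rmbes: list = field(default_factory=list)  # RMBEs available for reuse
    events: list = field(default_factory=list)

    def close_connection(self, conn_id, rmbe, local_done: bool, peer_done: bool) -> bool:
        # An RMBE may be freed and reused only when BOTH endpoints have finished
        # their RDMA writes to the partner's RMBE (SMC-R shutdown completed).
        if not (local_done and peer_done):
            return False
        self.active.discard(conn_id)
        self.free_rmbes.append(rmbe)
        self.events.append(("rmbe_freed", conn_id))
        # Only after SMC-R shutdown does normal TCP FIN processing clean up the
        # TCP connection on the IP fabric.
        self.events.append(("tcp_fin", conn_id))
        return True

    def may_terminate_group(self) -> bool:
        # Terminable once the last TCP connection is gone; the server has the
        # primary responsibility for deciding to do it.
        return self.is_server and not self.active


lg = LinkGroupTermination(is_server=True, active={1, 2})
print(lg.close_connection(1, rmbe=5, local_done=True, peer_done=False))  # False
print(lg.close_connection(1, rmbe=5, local_done=True, peer_done=True))   # True
print(lg.may_terminate_group())                                          # False
lg.close_connection(2, rmbe=6, local_done=True, peer_done=True)
print(lg.may_terminate_group())                                          # True
```
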
Implementations may opt to set timers to keep SMC-R link groups up for a specified time after the last TCP connection ends, to avoid churn in cases where TCP connections come and go regularly.
The link or link group may also be terminated as a result of a command initiated by the operator. This command can be entered at either the client or the server. If entered at the client, the client requests that the server perform link or link group termination, and the responsibility for doing so ultimately lies with the server.
When the server determines that the SMC-R link group is to be terminated, it sends a DELETE LINK LLC message to the client, with a flag set indicating that all links in the link group are to be terminated. After receiving confirmation from the adapter that the DELETE LINK LLC message has been sent, the server can clean up its end of the link group (QPs, RMBs, etc.). Upon receipt of the DELETE LINK message from the server, the client must immediately comply and clean up its end of the link group. Any TCP connections that the client believes to be active on the link group must be immediately terminated.
The client can request that the server delete the link group as well. The client does this by sending a DELETE LINK message to the server, indicating that cleanup of all links is requested. The server must comply by sending a DELETE LINK to the client and processing as described in the previous paragraph. If there are TCP connections active on the link group when the server receives this request, they are immediately terminated by sending a RST flow over the IP fabric.
3.5.5. Link Group Management Flows
3.5.5.1. Adding and Deleting Links in an SMC-R Link Group
The server has the lead role in managing the composition of the link group. Links are added to the link group by the server. The client may notify the server of new conditions that may result in the server adding a new link, but the server is ultimately responsible. In general, links are deleted from the link group by the server; however, in certain error cases the client may inform the server that a link must be deleted and treat it as deleted without waiting for action from the server. These flows are detailed in the sections that follow.
3.5.5.1.1. Server-Initiated ADD LINK Processing
As described in previous sections, the server initiates an ADD LINK exchange to create redundancy in a newly created link group. Once a link group is established, the server may also initiate ADD LINK for other reasons, including:
o Availability of additional resources on the server host to support an additional SMC-R link. This may include the provisioning of an additional RNIC, more storage becoming available to support additional QP resources, operator command, or any other implementation-dependent reason. Note that, to be available for an existing link group, a new RNIC must be attached to the same RoCE LAN that the link group is using.
o Receipt of notification from the client that additional resources on the client are available to support an additional SMC-R link. See Section 3.5.5.1.2 ("Client-Initiated ADD LINK Processing").
Server-initiated ADD LINK processing in an established SMC-R link group is the same as the ADD LINK processing described in Section 3.5.1.6 ("Second SMC-R Link Setup"), with the following changes:
o If an asymmetric SMC-R link already exists in the link group, a second asymmetric link will not be created. Only one asymmetric link is permitted in a link group.
o TCP data flow on already-existing link(s) in the link group is not halted or otherwise affected during the process of setting up the additional link.
The server will not initiate ADD LINK processing if the link group already has the maximum number of links negotiated by the partners.
3.5.5.1.2. Client-Initiated ADD LINK Processing
If an additional RNIC becomes available for an existing SMC-R link group on the client's side, the client notifies the server by sending an ADD LINK request LLC message to the server. Unlike an ADD LINK request sent by the server to the client, this ADD LINK request merely informs the server that the client has a new RNIC. If the link group lacks redundancy or has redundancy only on an asymmetric link with a single RNIC on the client side, the server must initiate an ADD LINK exchange in response to this message, to create or improve the link group's redundancy.
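The server's possible reactions to the client's ADD LINK notification can be sketched as a small decision function. The server must add a link if the group lacks redundancy, or if its only redundancy is an asymmetric link with a single client RNIC. It may add one if the group is already redundant and below the negotiated maximum. It must ignore the notification at the maximum. Either way, the client must not wait for a response. In this minimal Python sketch, `Link` and `on_client_add_link` are hypothetical. One assumption of the sketch: cases the text does not address, such as redundancy that is asymmetric only on the server side, fall through to "may_add".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    server_rnic: str
    client_rnic: str


def on_client_add_link(links: list, max_links: int) -> str:
    """Server reaction to a client ADD LINK notification.

    Returns "ignore", "must_add", or "may_add". When adding, the server starts
    its own ADD LINK exchange using the client's new RNIC.
    """
    if len(links) >= max_links:
        return "ignore"       # never exceed the negotiated maximum number of links
    if len(links) < 2:
        return "must_add"     # the link group lacks redundancy
    if len({link.client_rnic for link in links}) == 1:
        return "must_add"     # redundant only via one client RNIC (asymmetric link)
    return "may_add"          # already redundant; adding a link is optional


one = [Link("RNIC1", "RNIC2")]
asym = one + [Link("RNIC3", "RNIC2")]
sym = one + [Link("RNIC3", "RNIC4")]
print(on_client_add_link(one, 3), on_client_add_link(asym, 3))  # must_add must_add
print(on_client_add_link(sym, 3), on_client_add_link(sym, 2))   # may_add ignore
```
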
If the link group already has symmetric-link redundancy but has fewer than the negotiated maximum number of links, the server may respond by initiating an ADD LINK exchange to create a new link using the client's new resource, but it is not required to do so. If the link group already has the negotiated maximum number of links, the server must ignore the client's ADD LINK request LLC message. Because the server is not required to respond to the client's ADD LINK LLC message in all cases, the client must not wait for a response or throw an error if one does not come.
3.5.5.1.3. Server-Initiated DELETE LINK Processing
Reasons that a server may delete a link include the following:
o The link has not been used for TCP connections for an implementation-defined time interval, and deleting the link will not cause the link group to lack redundancy.
o Errors in resources supporting the link occur. These errors may include, but are not limited to, RNIC errors, QP errors, and software errors.
o The RNIC supporting this SMC-R link is being taken down, either because of an error case or because of an operator or software command.
If a link being deleted is supporting TCP connections and there are one or more surviving links in the link group, the TCP connections are moved to the surviving links. For more information on this processing, see Section 2.3 ("SMC-R Resilience and Load Balancing").
The server deletes a link from the link group by sending a DELETE LINK request LLC message to the client over any of the usable links in the link group. Because the DELETE LINK LLC message specifies which link is to be deleted, it may flow over any link in the link group. The server must not clean up its RoCE resources for the link until the client responds. The client responds to the server's DELETE LINK request LLC message by sending the server a DELETE LINK response LLC message. The client must respond positively; it cannot decline to delete the link. Once the server has received the client's DELETE LINK response, both sides may clean up their resources for the link.
Either a positive write completion or some other indication from the RNIC on the client's side is sufficient to indicate to the client that the server has received the DELETE LINK response.
[Diagram: Host X (QP 8 on RNIC 1, which has failed; QP 64 on RNIC 3) and Host Y (QP 9 on RNIC 2; QP 65 on RNIC 4), connected by SMC-R Link 1 (RNIC 1 to RNIC 2, broken) and SMC-R Link 2 (RNIC 3 to RNIC 4). Host X's RMB, labeled "Deleted RMB", has RToken X on Link 1 and RToken Z on Link 2.]
   Host X                                            Host Y
   DELETE LINK(request, link number = 1,
               reason code = RNIC failure)
   ................................................>
   DELETE LINK(response, link number = 1)
   <................................................
(Note: Architecturally, this exchange can flow over either SMC-R link but most likely flows over Link 2, since the RNIC for Link 1 has failed.)
Figure 10: Server-Initiated DELETE LINK Flow
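The server's side of this exchange can be sketched in Python. The request may be carried over any usable link, so the sketch prefers a surviving one. The link's resources are released only after the client's DELETE LINK response arrives. `ServerDeleteLink` and its methods are hypothetical. The "prefer a surviving link" choice is an assumption that mirrors the note in Figure 10; it is not a protocol requirement.

```python
from dataclasses import dataclass, field


@dataclass
class ServerDeleteLink:
    """Hypothetical server view of DELETE LINK exchanges in one link group."""
    usable: dict                                    # link number -> usable?
    pending: set = field(default_factory=set)       # deletes awaiting the client's response
    cleaned_up: list = field(default_factory=list)

    def send_request(self, link_number: int) -> int:
        # The request names the link to delete, so it may flow over any usable
        # link; prefer a surviving link. Raises IndexError if none is usable.
        ok = [n for n, up in sorted(self.usable.items()) if up]
        carrier = ([n for n in ok if n != link_number] or ok)[0]
        self.pending.add(link_number)
        return carrier

    def on_response(self, link_number: int) -> None:
        # Only after the client's DELETE LINK response may the server free the
        # link's RoCE resources; the client cannot decline.
        if link_number in self.pending:
            self.pending.discard(link_number)
            del self.usable[link_number]
            self.cleaned_up.append(link_number)


# Figure 10: RNIC 1 has failed, so Link 1 is deleted via Link 2.
srv = ServerDeleteLink(usable={1: False, 2: True})
print(srv.send_request(1))          # 2
srv.on_response(1)
print(srv.cleaned_up, srv.usable)   # [1] {2: True}
```
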
3.5.5.1.4. Client-Initiated DELETE LINK Request
The client may request that the server delete a link for the same reasons that the server may delete a link, except for inactivity timeout. Because the client depends on the server to delete links, there are two types of delete requests from client to server:
o Orderly: The client is requesting that the server delete the link when able. This would result from an operator command to bring down the RNIC or some other nonfatal reason. In this case, the server is required to delete the link but need not do so right away.
o Disorderly: The server must delete the link right away, because the client has experienced a fatal error with the link.
In either case, the server responds by initiating a DELETE LINK exchange with the client, as described in the previous section. The difference between the two is whether the server must do so immediately or can delay for an opportunity to gracefully delete the link.
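The server's handling of the two request types can be sketched in Python. This is a minimal illustration that assumes a hypothetical `DeleteScheduler` class and an `on_quiet_point()` hook. The hook stands in for whatever moment an implementation judges suitable for a graceful delete; the specification defines neither.

```python
from collections import deque
from enum import Enum


class DeleteKind(Enum):
    ORDERLY = "orderly"         # e.g. operator command to bring down the RNIC
    DISORDERLY = "disorderly"   # client has a fatal error on the link


class DeleteScheduler:
    """Hypothetical server-side handling of client DELETE LINK requests."""

    def __init__(self):
        self.deferred = deque()

    def on_client_request(self, link_number: int, kind: DeleteKind) -> list:
        # Returns the links for which the server starts its DELETE LINK exchange now.
        if kind is DeleteKind.DISORDERLY:
            return [link_number]            # must delete right away
        self.deferred.append(link_number)   # must delete, but may wait
        return []

    def on_quiet_point(self) -> list:
        # Hypothetical hook: the server chooses this moment for graceful deletes.
        started = list(self.deferred)
        self.deferred.clear()
        return started


s = DeleteScheduler()
print(s.on_client_request(1, DeleteKind.ORDERLY))      # []
print(s.on_client_request(2, DeleteKind.DISORDERLY))   # [2]
print(s.on_quiet_point())                              # [1]
```
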
[Diagram: Host X (QP 8 on RNIC 1; QP 64 on RNIC 3) and Host Y (QP 9 on RNIC 2, which has failed; QP 65 on RNIC 4), connected by SMC-R Link 1 (RNIC 1 to RNIC 2, broken) and SMC-R Link 2 (RNIC 3 to RNIC 4). Host X's RMB, labeled "Deleted RMB", has RToken X on Link 1 and RToken Z on Link 2.]
   Host X                                            Host Y
   DELETE LINK(request, link number = 1, disorderly,
               reason code = RNIC failure)
   <................................................
   DELETE LINK(request, link number = 1,
               reason code = RNIC failure)
   ................................................>
   DELETE LINK(response, link number = 1)
   <................................................
(Note: Architecturally, this exchange can flow over either SMC-R link but most likely flows over Link 2, since the RNIC for Link 1 has failed.)
Figure 11: Client-Initiated DELETE LINK Flow
3.5.5.2. Managing Multiple RKeys over Multiple SMC-R Links in a Link Group
After the initial contact sequence completes and the number of TCP connections increases, the SMC peers may add more RMBs to the link group. Recall that each peer independently manages its RMBs. Also recall that an RMB's RToken is specific to a QP. This means that when there are multiple SMC-R links in a link group, each RMB accessed within the link group requires a separate RToken for each SMC-R link in the group.
Each RMB that is added to a link must be added to all links within the link group. The set of RMBs created for the link is called the "RToken set". The RTokens must be exchanged with the peer. As RMBs are added and deleted, the RToken set must remain in sync.
3.5.5.2.1. Adding a New RMB to an SMC-R Link Group
A new RMB can be added to an SMC-R link group on either the client side or the server side. When an additional RMB is added to an existing SMC-R link group, that RMB must be associated with the QPs for each link in the link group. Therefore, when an RMB is added to an SMC-R link group, its RMB RToken for each SMC-R link's QP must be communicated to the peer. The tokens for a new RMB added to an existing SMC-R link group are communicated using CONFIRM RKEY LLC messages, as shown in Figure 12. The RToken set is specified as pairs: an SMC-R link number, paired with the new RMB's RToken over that SMC-R link. To preserve failover capability, any TCP connection that uses a newly added RMB cannot go active until all RTokens for the RMB have been communicated for all of the links in the link group.
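Building the RToken set for a CONFIRM RKEY request, and gating a connection's activation on it, can be sketched in Python. The sketch assumes hypothetical helpers `confirm_rkey_rtoken_set` and `may_go_active`. It represents the RToken set as a list of (link number, RToken) pairs, as in Figure 12. Message encoding is not modeled.

```python
def confirm_rkey_rtoken_set(rmb_rtokens: dict, link_numbers: list) -> list:
    """(link number, RToken) pairs for a new RMB's CONFIRM RKEY request.

    rmb_rtokens maps link number -> the RMB's RToken over that link's QP.
    Raises ValueError if some link in the group has no RToken for the RMB yet.
    """
    missing = [n for n in link_numbers if n not in rmb_rtokens]
    if missing:
        raise ValueError(f"RMB not yet registered on link(s) {missing}")
    return [(n, rmb_rtokens[n]) for n in sorted(link_numbers)]


def may_go_active(communicated_links: set, link_numbers: list) -> bool:
    # A connection using the new RMB stays inactive until RTokens for ALL links
    # have been communicated, so it can fail over to any link in the group.
    return set(link_numbers) <= set(communicated_links)


print(confirm_rkey_rtoken_set({1: "X", 2: "Z"}, [1, 2]))         # [(1, 'X'), (2, 'Z')]
print(may_go_active({1}, [1, 2]), may_go_active({1, 2}, [1, 2]))  # False True
```
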
[Diagram: Host X (QP 8 on RNIC 1; QP 64 on RNIC 3) and Host Y (QP 9 on RNIC 2; QP 65 on RNIC 4), connected by SMC-R Link 1 (RNIC 1 to RNIC 2) and SMC-R Link 2 (RNIC 3 to RNIC 4). Host X's new RMB has RToken X on Link 1 and RToken Z on Link 2.]
   Host X                                            Host Y
   CONFIRM RKEY(request, Add,
                RToken set((Link 1,RToken X),(Link 2,RToken Z)))
   ................................................>
   CONFIRM RKEY(response, Add,
                RToken set((Link 1,RToken X),(Link 2,RToken Z)))
   <................................................
(Note: This exchange can flow over either SMC-R link.)
Figure 12: Add RMB to Existing Link Group
Implementations may choose to proactively add RMBs to link groups in anticipation of need. For example, an implementation may add a new RMB when a certain usage threshold (e.g., percentage used) for all of its existing RMBs has been exceeded.
A new RMB may also be added to an existing link group on an as-needed basis, for example, when a new TCP connection is added to the link group but there are no available RMB elements. In this case, the CLC exchange is paused while the peer that requires the new RMB adds it. An example of this is illustrated in Figure 13.
[Diagram: Host X (server, peer ID PS1) and Host Y (client, peer ID PC1) on subnet S1. SMC-R Link 1 connects RNIC 1 (MAC MA, GID GA, QP 8) to RNIC 2 (MAC MB, GID GB, QP 64). SMC-R Link 2 connects RNIC 3 (MAC MC, GID GC, QP 9) to RNIC 4 (MAC MD, GID GD, QP 65). Host X's RMB has RToken X; Host Y's new RMB has RToken Y2 on Link 1 and RToken W2 on Link 2.]
   Host X (server)                                   Host Y (client)
   SYN / SYN-ACK / ACK TCP three-way handshake with TCP option
   <------------------------------------------------------->
   SMC Proposal(PC1,MB,GB,S1)
   <--------------------------------------------------------
   SMC Accept(PS1,not 1st contact,MA,GA,QP8,RToken=X,RMB elem index)
   -------------------------------------------------------->
   CONFIRM RKEY(request, Add,
                RToken set((Link 1,RToken Y2),(Link 2,RToken W2)))
   <........................................................
   CONFIRM RKEY(response, Add,
                RToken set((Link 1,RToken Y2),(Link 2,RToken W2)))
   ........................................................>
   SMC Confirm(PC1,MB,GB,QP64,RToken=Y2,RMB element index)
   <--------------------------------------------------------
   Legend:
   ------------  TCP/IP and CLC flows
   ............  RoCE (LLC) flows
Figure 13: Client Adds RMB during TCP Connection Setup
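The two triggers for adding an RMB are the proactive usage threshold and the as-needed case shown in Figure 13. The following minimal Python sketch expresses both as a policy function. The function name, the (free, total) element counts per RMB, and the 0.8 default threshold are all hypothetical; the specification leaves this policy to the implementation.

```python
def rmb_addition_needed(rmbs: list, threshold: float = 0.8):
    """Hypothetical policy over (free_elements, total_elements) for each existing RMB.

    Returns "as_needed" if no RMB element is free (the CLC exchange must pause
    while a new RMB is added), "proactive" if every RMB's usage exceeds the
    threshold, else None. Assumes total_elements > 0 for every RMB.
    """
    if all(free == 0 for free, _ in rmbs):   # also true when there are no RMBs yet
        return "as_needed"
    if all((total - free) / total > threshold for free, total in rmbs):
        return "proactive"
    return None


print(rmb_addition_needed([(0, 4), (0, 8)]))    # as_needed
print(rmb_addition_needed([(1, 10), (0, 4)]))   # proactive
print(rmb_addition_needed([(5, 10)]))           # None
```
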
3.5.5.2.2. Deleting an RMB from an SMC-R Link Group
Either peer can delete one or more of its RMBs, as long as they are not being used for any TCP connections. Ideally, an SMC-R peer would use a timer to avoid freeing an RMB immediately after the last TCP connection stops using it. This keeps the RMB available for later TCP connections and avoids thrashing between addition and deletion of RMBs. Once an SMC-R peer decides to delete an RMB, it sends a DELETE RKEY LLC message to its peer. It can then free the RMB once it receives a response from the peer. Multiple RMBs can be deleted in a DELETE RKEY exchange. Note that in a DELETE RKEY message, it is not necessary to specify the full RToken for a deleted RMB. The RMB's RKey over one link in the link group is sufficient to specify which RMB is being deleted.
[Diagram: Host X and Host Y connected by SMC-R Link 1 (RNIC 1, QP 8, to RNIC 2, QP 9) and SMC-R Link 2 (RNIC 3 to RNIC 4). Host X's RMB, labeled "Deleted RMB", has RToken X on Link 1 and RToken Z on Link 2.]
   Host X                                            Host Y
   DELETE RKEY(request, RKey list(RKey X))
   ................................................>
   DELETE RKEY(response, RKey list(RKey X))
   <................................................
(Note: This exchange can flow over either SMC-R link.)
Figure 14: Delete RMB from SMC-R Link Group
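RMB deletion has two parts: deciding when an idle RMB may be deleted, and identifying the RMB named in a DELETE RKEY by its RKey on a single link. Both can be sketched in Python. `RmbTable`, its methods, and the 30-second linger default are hypothetical. The receiver of a DELETE RKEY would apply the same lookup to its table of the peer's RMBs.

```python
import time


class RmbTable:
    """Hypothetical table of one peer's RMBs: RKeys per link, user counts, idle times."""

    def __init__(self, linger_seconds: float = 30.0):
        self.rkeys = {}        # rmb id -> {link number: RKey}
        self.users = {}        # rmb id -> number of TCP connections using it
        self.idle_since = {}   # rmb id -> time its last user went away
        self.linger = linger_seconds

    def add(self, rmb, per_link_rkeys: dict) -> None:
        self.rkeys[rmb] = dict(per_link_rkeys)
        self.users[rmb] = 0

    def acquire(self, rmb) -> None:
        self.users[rmb] += 1
        self.idle_since.pop(rmb, None)   # in use again: cancel any pending delete

    def release(self, rmb, now=None) -> None:
        self.users[rmb] -= 1
        if self.users[rmb] == 0:
            self.idle_since[rmb] = time.monotonic() if now is None else now

    def deletable(self, now=None) -> list:
        # Candidates for a DELETE RKEY: unused AND idle for at least the linger
        # time, to avoid thrashing between adding and deleting RMBs.
        now = time.monotonic() if now is None else now
        return [r for r, t in self.idle_since.items()
                if self.users[r] == 0 and now - t >= self.linger]

    def lookup_by_rkey(self, rkey):
        # A DELETE RKEY names an RMB by its RKey over any ONE link, so search all links.
        for rmb, per_link in self.rkeys.items():
            if rkey in per_link.values():
                return rmb
        return None


t = RmbTable(linger_seconds=30.0)
t.add("rmb1", {1: "X", 2: "Z"})
t.acquire("rmb1")
t.release("rmb1", now=100.0)
print(t.deletable(now=110.0), t.deletable(now=130.0))   # [] ['rmb1']
print(t.lookup_by_rkey("Z"))                            # rmb1
```
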
3.5.5.2.3. Adding a New SMC-R Link to a Link Group with Multiple RMBs
When a new SMC-R link is added to an existing link group, there could be multiple RMBs on each side already associated with the link group. There could also be a different number of RMBs on one side than on the other, because each peer manages its RMBs independently. Each of these RMBs will require a new RToken to be used on the new SMC-R link, and those new RTokens must then be communicated to the peer. This requires two-way communication, as the server will have to communicate its RTokens to the client and vice versa.
RTokens are communicated between peers in pairs. Each RToken pair consists of:
o The RToken for the RMB, as is already known on an existing SMC-R link in the link group.
o The RToken for the same RMB, to be used on the new SMC-R link.
These pairs are required to ensure that each peer knows which RTokens across QPs are equivalent. The ADD LINK request and response LLC messages do not have enough space to contain any RToken pairs. ADD LINK CONTINUATION LLC messages are used to communicate these pairs, as shown in Figure 15. The ADD LINK CONTINUATION LLC messages are sent on the same SMC-R link that the ADD LINK LLC messages were sent over. In both the ADD LINK and ADD LINK CONTINUATION LLC messages, the first RToken in each RToken pair will be the RToken for the RMB as known on the SMC-R link over which the LLC message is being sent.
[Diagram: Host X (server, peer ID PS1) has 3 RMBs, with RKey set X,Y,Z on SMC-R Link 1 and U,V,W on SMC-R Link 2. Host Y (client, peer ID PC1) has 4 RMBs, with RKey set Q,R,S,T on Link 1 and L,M,N,P on Link 2. Link 1 connects RNIC 1 (MAC MA, GID GA, QP 8) to RNIC 2 (MAC MB, GID GB, QP 64). Link 2, which is being added, connects RNIC 3 (MAC MC, GID GC, QP 9) to RNIC 4 (MAC MD, GID GD, QP 65).]
   Host X (server)                               Host Y (client)
   ADD LINK request (QP9,MC,GC, link number = 2)
   ............................................>
   ADD LINK response (QP65,MD,GD, link number = 2)
   <............................................
   ADD LINK CONTINUATION req(RToken pairs=((X,U),(Y,V),(Z,W)))
   ............................................>
   ADD LINK CONTINUATION rsp(RToken pairs=((Q,L),(R,M),(S,N),(T,P)))
   <............................................
   CONFIRM LINK req/rsp exchange on Link 2
   <...........................................>
   Legend:
   ------------  TCP/IP and CLC flows
   ............  RoCE (LLC) flows
Figure 15: Exchanging RKeys when a New Link Is Added to a Link Group
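Two steps from Figure 15 can be sketched in Python: building the RToken pairs that a peer sends, and applying the pairs that it receives. The first element of each pair is the RToken as known on the link carrying the message. The functions `rtoken_pairs` and `apply_peer_pairs` are hypothetical. The sketch does not model how many pairs fit in one ADD LINK CONTINUATION message, so an implementation might need several such messages.

```python
def rtoken_pairs(local_rmbs: list, existing_link: int, new_link: int) -> list:
    """Pairs for ADD LINK CONTINUATION: (RToken on the link carrying the message,
    RToken for the same RMB on the new link). One map per local RMB."""
    return [(rmb[existing_link], rmb[new_link]) for rmb in local_rmbs]


def apply_peer_pairs(peer_rmbs: list, pairs: list, existing_link: int, new_link: int) -> None:
    """Record the peer's new-link RTokens, matching each pair by its first RToken."""
    by_existing = {rmb[existing_link]: rmb for rmb in peer_rmbs}
    for old, new in pairs:
        by_existing[old][new_link] = new   # KeyError if the peer names an unknown RMB


# Figure 15, server side: 3 RMBs, continuation sent over Link 1 for new Link 2.
server_rmbs = [{1: "X", 2: "U"}, {1: "Y", 2: "V"}, {1: "Z", 2: "W"}]
print(rtoken_pairs(server_rmbs, existing_link=1, new_link=2))
# [('X', 'U'), ('Y', 'V'), ('Z', 'W')]

# Client's view of the server RMBs before the continuation arrives:
seen_by_client = [{1: "X"}, {1: "Y"}, {1: "Z"}]
apply_peer_pairs(seen_by_client, [("X", "U"), ("Y", "V"), ("Z", "W")], 1, 2)
print(seen_by_client[2])   # {1: 'Z', 2: 'W'}
```
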
3.5.5.3. Serialization of LLC Exchanges, and Collisions
LLC flows can be divided into two main groups for serialization considerations.
The first group is LLC messages that are independent and can flow at any time. These are one-time, unsolicited messages that either do not have a required response or have a simple response that does not interfere with the operations of another group of messages. These messages are as follows:
o TEST LINK from either the client or the server: This message requires a TEST LINK response to be returned but does not affect the configuration of the link group or the RKeys.
o ADD LINK from the client to the server: This message is provided as an "FYI" to the server to let it know that the client has an additional RNIC available. The server is not required to act upon or respond to this message.
o DELETE LINK from the client to the server: This message informs the server that either (1) the client has experienced an error or problem that requires a link or link group to be terminated or (2) an operator has commanded that a link or link group be terminated. The server does not respond directly to the message; rather, it initiates a DELETE LINK exchange as a result of receiving it.
o DELETE LINK from the server to the client, with the "delete entire link group" flag set: This message informs the client that the entire link group is being deleted.
The second group is LLC messages that are part of an exchange of LLC messages that affects link group configuration. Such an exchange must complete before another exchange of LLC messages that affects link group configuration can be processed. When a peer knows that one of these exchanges is in progress, it must not start another exchange. These exchanges are as follows:
o ADD LINK / ADD LINK response / ADD LINK CONTINUATION / ADD LINK CONTINUATION response / CONFIRM LINK / CONFIRM LINK response: This exchange, by adding a new link, changes the configuration of the link group.
o DELETE LINK / DELETE LINK response initiated by the server, without the "delete entire link group" flag set: This exchange, by deleting a link, changes the configuration of the link group.
o CONFIRM RKEY / CONFIRM RKEY response or DELETE RKEY / DELETE RKEY response: This exchange changes the RMB configuration of the link group. RKeys cannot change while links are being added or deleted (while an ADD LINK or DELETE LINK is in progress). However, CONFIRM RKEY and DELETE RKEY are unique in that both the client and server can independently manage (add or remove) their own RMBs. This allows each peer to change its RKeys concurrently with the other and therefore to send CONFIRM RKEY or DELETE RKEY requests concurrently. The concurrent CONFIRM RKEY or DELETE RKEY requests can be independently processed and do not represent a collision.
Because the server is in control of the configuration of the link group, many timing windows and collisions are avoided, but there are still some that must be handled.
3.5.5.3.1. Collisions with ADD LINK / CONFIRM LINK Exchange
Colliding LLC message: TEST LINK
Action to resolve: Send immediate TEST LINK reply.
Colliding LLC message: ADD LINK from client to server
Action to resolve: Server ignores the ADD LINK message. When the client receives the server's ADD LINK, the client will consider that message to be in response to its ADD LINK message, and the flow works. Since both client and server know not to start this exchange if an ADD LINK operation is already underway, this can only occur if the client sends this message before receiving the server's ADD LINK and this message crosses with the server's ADD LINK message. Therefore, the server's ADD LINK arrives at the client immediately after the client has sent this message.
Colliding LLC message: DELETE LINK from client to server, specific link specified
Action to resolve: Server queues the DELETE LINK message and processes it after the ADD LINK exchange completes. If it is an orderly link termination, it can wait until this exchange completes. If it is disorderly and the link affected is the one that the current exchange is using, the server will discover the outage when a message in this exchange fails.
Colliding LLC message: DELETE LINK from client to server, entire link group to be deleted
Action to resolve: Immediately clean up the link group.
Colliding LLC message: CONFIRM RKEY from client
Action to resolve: Send a negative CONFIRM RKEY response to the client. Once the current exchange finishes, client will have to recompute its RKey set to include the new link and then start a new CONFIRM RKEY exchange.
3.5.5.3.2. Collisions during DELETE LINK Exchange
Colliding LLC message: TEST LINK from either peer
Action to resolve: Send immediate TEST LINK response.
Colliding LLC message: ADD LINK from client to server
Action to resolve: Server queues the ADD LINK and processes it after the current exchange completes.
Colliding LLC message: DELETE LINK from client to server (specific link)
Action to resolve: Server queues the DELETE LINK message and processes it after the current exchange completes. If it is an orderly link termination, it can wait until this exchange completes. If it is disorderly and the link affected is the one that the current exchange is using, the server will discover the outage when a message in this exchange fails.
Colliding LLC message: DELETE LINK from either client or server, deleting the entire link group
Action to resolve: Immediately clean up the link group.
Colliding LLC message: CONFIRM RKEY from client to server
Action to resolve: Send a negative CONFIRM RKEY response to the client. Once the current exchange finishes, client will have to recompute its RKey set to exclude the deleted link and then start a new CONFIRM RKEY exchange.
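The serialization rule of Section 3.5.5.3 and the DELETE LINK collision actions can be combined in one Python sketch. Independent messages may flow at any time. Configuration-changing exchanges must not overlap, except that RKey exchanges started by the two different peers may run concurrently. All message labels (`"TEST_LINK"`, `"CLIENT_ADD_LINK"`, and so on) and function names are hypothetical encodings of the text, not protocol constants.

```python
from collections import deque

# Messages that may flow at any time (first group).
INDEPENDENT = {"TEST_LINK", "CLIENT_ADD_LINK", "CLIENT_DELETE_LINK", "DELETE_LINK_GROUP"}
# Configuration-changing exchanges whose members may overlap across the two peers.
RKEY_EXCHANGES = {"CONFIRM_RKEY", "DELETE_RKEY"}


def may_start(initiator: str, exchange: str, in_progress: list) -> bool:
    """Serialization gate. in_progress holds (initiator, exchange) tuples."""
    if exchange in INDEPENDENT or not in_progress:
        return True
    # RKey exchanges started independently by the two peers may run concurrently;
    # nothing may overlap an ADD LINK or server DELETE LINK exchange.
    return exchange in RKEY_EXCHANGES and all(
        ex in RKEY_EXCHANGES and who != initiator for who, ex in in_progress)


def resolve_during_delete_link(msg: str, queue: deque) -> str:
    """Server action for an LLC message that collides with a DELETE LINK exchange."""
    if msg == "TEST_LINK":
        return "send TEST LINK response"
    if msg in ("CLIENT_ADD_LINK", "CLIENT_DELETE_LINK"):
        queue.append(msg)   # processed after the current exchange completes
        return "queued"
    if msg == "DELETE_LINK_GROUP":
        return "clean up link group now"
    if msg == "CONFIRM_RKEY":
        return "send negative CONFIRM RKEY response"   # client retries later
    raise ValueError(f"unexpected LLC message: {msg}")


print(may_start("client", "CONFIRM_RKEY", [("server", "CONFIRM_RKEY")]))        # True
print(may_start("server", "CONFIRM_RKEY", [("server", "SERVER_DELETE_LINK")]))  # False
q = deque()
print(resolve_during_delete_link("CLIENT_ADD_LINK", q), list(q))  # queued ['CLIENT_ADD_LINK']
```
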
3.5.5.3.3. Collisions during CONFIRM RKEY Exchange
Colliding LLC message: TEST LINK
Action to resolve: Send immediate TEST LINK reply.
Colliding LLC message: ADD LINK from client to server
Action to resolve: Queue the ADD LINK, and process it after the current exchange completes.
Colliding LLC message: ADD LINK from server to client (CONFIRM RKEY exchange was initiated by the client, and it crossed with the server initiating an ADD LINK exchange)
Action to resolve: Process the ADD LINK. Client will receive a negative CONFIRM RKEY from the server and will have to redo this CONFIRM RKEY exchange after the ADD LINK exchange completes.
Colliding LLC message: DELETE LINK from client to server, specific link to be deleted (CONFIRM RKEY exchange was initiated by the server, and it crossed with the client's DELETE LINK request)
Action to resolve: Server queues the DELETE LINK message and processes it after the CONFIRM RKEY exchange completes. If it is an orderly link termination, it can wait until this exchange completes. If it is disorderly and the link affected is the one that the current exchange is using, the server will discover the outage when a message in this exchange fails.
Colliding LLC message: DELETE LINK from server to client, specific link deleted (CONFIRM RKEY exchange was initiated by the client, and it crossed with the server's DELETE LINK)
Action to resolve: Process the DELETE LINK. Client will receive a negative CONFIRM RKEY from the server and will have to redo this CONFIRM RKEY exchange after the DELETE LINK exchange completes.
Colliding LLC message: DELETE LINK from either client or server, entire link group deleted
Action to resolve: Immediately clean up the link group.
Colliding LLC message: CONFIRM LINK from the peer that did not start the current CONFIRM LINK exchange
Action to resolve: Queue the request, and process it after the current exchange completes.
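When a client's CONFIRM RKEY crosses a server ADD LINK or DELETE LINK, the client receives a negative response. It must then redo the exchange with an RToken set that matches the link group as it stands afterward. A minimal Python sketch of that recomputation follows. `recompute_rtoken_set` is hypothetical, and its `register` callback stands in for registering the RMB with a new link's QP (RTokens are QP-specific).

```python
def recompute_rtoken_set(rejected: dict, current_links: set, register) -> list:
    """Rebuild a new RMB's (link number, RToken) set after a negative CONFIRM RKEY.

    rejected       link number -> RToken sent in the refused request
    current_links  links in the group once the colliding ADD LINK or DELETE LINK
                   exchange has completed
    register       hypothetical callable(link) -> RToken that registers the RMB
                   with the new link's QP
    """
    updated = {n: tok for n, tok in rejected.items() if n in current_links}  # drop deleted links
    for n in current_links:
        if n not in updated:
            updated[n] = register(n)   # link added meanwhile: needs its own RToken
    return sorted(updated.items())


# Crossed with a server ADD LINK that created Link 2:
print(recompute_rtoken_set({1: "X"}, {1, 2}, lambda n: f"T{n}"))        # [(1, 'X'), (2, 'T2')]
# Crossed with a server DELETE LINK that removed Link 1:
print(recompute_rtoken_set({1: "X", 2: "Z"}, {2}, lambda n: f"T{n}"))   # [(2, 'Z')]
```
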