Appendix A. Formats
A.1. TCP Option
The SMC-R TCP option is formatted in accordance with [RFC6994] ("Shared Use of Experimental TCP Options"). The ExID value is IBM-1047 (EBCDIC) encoding for "SMCR".

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Kind = 254   |  Length = 6   |     x'E2'     |     x'D4'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 24: SMC-R TCP Option Format

A.2. CLC Messages
The following general rules apply to the formats of all CLC messages:

o  Reserved fields must be set to zero by the sender and must not be validated by the receiver.

o  Each message has an eye catcher at the start and another eye catcher at the end. These must both be validated by the receiver.

o  SMC version indicator: The only SMC-R version defined in this architecture is version 1. In the future, if peers have a mismatch of versions, the lowest common version number is used.
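The eye catcher rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names EYE_CATCHER and has_valid_eye_catchers are hypothetical, and because Python's standard library has no IBM-1047 codec, the sketch assumes cp037, which assigns the same bytes to the four letters involved.

```python
# "SMCR" in EBCDIC. Assumption: cp037 stands in for IBM-1047 here; both map
# the uppercase letters S, M, C, R to x'E2', x'D4', x'C3', x'D9'.
EYE_CATCHER = "SMCR".encode("cp037")


def has_valid_eye_catchers(msg: bytes) -> bool:
    """Return True if msg begins and ends with the SMC-R eye catcher."""
    n = len(EYE_CATCHER)
    return (len(msg) >= 2 * n
            and msg[:n] == EYE_CATCHER
            and msg[-n:] == EYE_CATCHER)
```

A receiver would typically run this check before interpreting any other field of a CLC message.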
A.2.1. Peer ID Format
All CLC messages contain a peer ID that uniquely identifies an instance of a TCP/IP stack. This peer ID is required to be universally unique across TCP/IP stacks and instances (including restarts) of TCP/IP stacks.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Instance ID          |   RoCE MAC (first 2 bytes)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RoCE MAC (last 4 bytes)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 25: Peer ID Format

Instance ID
   A 2-byte instance count that ensures that if the same RNIC MAC is later used in the peer ID for a different TCP/IP stack -- for example, if an RNIC is redeployed to another stack -- the values are unique. It also ensures that if a TCP/IP stack is restarted, the instance ID changes. The value is implementation defined, with one suggestion being 2 bytes of the system clock.

RoCE MAC
   The RoCE MAC address for one of the peer's RNICs. Note that in a virtualized environment this will be the virtual MAC of one of the peer's RNICs.
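As an illustration of Figure 25, the following sketch packs and unpacks the 8-byte peer ID. The function names are hypothetical, and network byte order for the instance ID is an assumption based on the usual reading of such bit diagrams.

```python
import struct


def pack_peer_id(instance_id: int, roce_mac: bytes) -> bytes:
    """Build an 8-byte peer ID: 2-byte instance ID followed by a 6-byte MAC."""
    if len(roce_mac) != 6:
        raise ValueError("RoCE MAC must be exactly 6 bytes")
    # Assumption: the instance ID is carried in network byte order.
    return struct.pack("!H6s", instance_id & 0xFFFF, roce_mac)


def unpack_peer_id(peer_id: bytes) -> tuple:
    """Split a peer ID into (instance_id, roce_mac)."""
    if len(peer_id) != 8:
        raise ValueError("peer ID must be exactly 8 bytes")
    return struct.unpack("!H6s", peer_id)
```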
A.2.2. SMC Proposal CLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 1    |            Length             |Version| Rsrvd |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                      Client's Peer ID                       -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                   Client's preferred GID                    -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Client's preferred RoCE                    |
   +-          MAC address         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |Offset to mask/prefix area (0) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                    Area for future growth                     .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       IPv4 Subnet Mask                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | IPv4 Mask Lgth|           Reserved            | Num IPv6 prfx |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                                                               :
   :           Array of IPv6 prefixes (variable length)            :
   :                                                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 26: SMC Proposal CLC Message Format
The fields present in the SMC Proposal CLC message are:

Eye catchers
   Like all CLC messages, the SMC Proposal has beginning and ending eye catchers to aid with verification and parsing. The hex digits spell "SMCR" in IBM-1047 (EBCDIC).

Type
   CLC message Type 1 indicates SMC Proposal.

Length
   The length of this CLC message. If this is an IPv4 flow, this value is 52. Otherwise, it is variable, depending upon how many prefixes are listed.

Version
   Version of the SMC-R protocol. Version 1 is the only currently defined value.

Client's Peer ID
   As described in Appendix A.2.1 above.

Client's preferred RoCE GID
   The IPv6 address of the client's preferred RNIC on the RoCE fabric.

Client's preferred RoCE MAC address
   The MAC address of the client's preferred RNIC on the RoCE fabric. It is required, as some operating systems do not have neighbor discovery or ARP support for RoCE RNICs.

Offset to mask/prefix area
   Provides the number of bytes that must be skipped after this field, to access the IPv4 Subnet Mask field and the fields that follow it. Allows for future growth of this signal. In this version of the architecture, this value is always zero.

Area for future growth
   In this version of the architecture, this field does not exist. This indicates where additional information may be inserted into the signal in the future. The "Offset to mask/prefix area" field must be used to skip over this area.

IPv4 Subnet Mask
   If this message is flowing over an IPv4 TCP connection, the value of the subnet mask associated with the interface over which the client sent this message. If this is an IPv6 flow, this field is all zeros. This field, along with all fields that follow it in this signal, must be accessed by skipping the number of bytes listed in the "Offset to mask/prefix area" field after the end of that field.

IPv4 Mask Lgth
   If this message is flowing over an IPv4 TCP connection, the number of significant bits in the IPv4 Subnet Mask field. If this is an IPv6 flow, this field is zero.

Num IPv6 prfx
   If this message is flowing over an IPv6 TCP connection, the number of IPv6 prefixes that follow, with a maximum value of 8. If this is an IPv4 flow, this field is zero and is immediately followed by the ending eye catcher.
Array of IPv6 prefixes
   For IPv6 TCP connections, a list of the IPv6 prefixes associated with the network over which the client sent this message, up to a maximum of eight prefixes.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                       IPv6 prefix value                       +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Prefix Length |
   +-+-+-+-+-+-+-+-+

           Figure 27: Format for IPv6 Prefix Array Element
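The field walk above, including the use of the "Offset to mask/prefix area" field, can be illustrated with a small parser. This is a sketch under stated assumptions: parse_proposal and the dictionary keys are hypothetical names, byte offsets are read directly off Figures 26 and 27, and multi-byte integers are assumed to be in network byte order.

```python
import struct

EYE = bytes([0xE2, 0xD4, 0xC3, 0xD9])  # "SMCR" in EBCDIC


def parse_proposal(msg: bytes) -> dict:
    """Parse an SMC Proposal CLC message laid out as in Figure 26."""
    if len(msg) < 52 or msg[:4] != EYE or msg[-4:] != EYE:
        raise ValueError("missing or invalid eye catcher")
    msg_type, length, ver = struct.unpack_from("!BHB", msg, 4)
    if msg_type != 1 or length != len(msg):
        raise ValueError("not a well-formed SMC Proposal")
    (offset,) = struct.unpack_from("!H", msg, 38)
    pos = 40 + offset  # skip the area for future growth (empty in version 1)
    if pos + 8 + 4 > len(msg):
        raise ValueError("offset points past the end of the message")
    mask, mask_len, num_prefixes = struct.unpack_from("!4sB2xB", msg, pos)
    pos += 8
    if num_prefixes > 8:
        raise ValueError("at most 8 IPv6 prefixes are allowed")
    # Each array element is a 16-byte prefix value plus a 1-byte length.
    if pos + 17 * num_prefixes + 4 != len(msg):
        raise ValueError("length does not match the prefix count")
    prefixes = []
    for _ in range(num_prefixes):
        prefixes.append(struct.unpack_from("!16sB", msg, pos))
        pos += 17
    return {
        "version": ver >> 4,
        "peer_id": msg[8:16],
        "gid": msg[16:32],
        "mac": msg[32:38],
        "ipv4_mask": mask,
        "ipv4_mask_len": mask_len,
        "ipv6_prefixes": prefixes,
    }
```

For an IPv4 flow the message is exactly 52 bytes and the prefix list comes back empty; each IPv6 prefix adds 17 bytes.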
A.2.3. SMC Accept CLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 2    |          Length = 68          |Version|F|Rsrvd|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                      Server's Peer ID                       -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                      Server's RoCE GID                      -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Server's RoCE                         |
   +-          MAC address         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     Server QP (bytes 1-2)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Srvr QP byte 3 |          Server RMB RKey (bytes 1-3)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Srvr RMB byte 4|Server RMB indx| Srvr RMB alert tkn (bytes 1-2)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Srvr RMB alert tkn (bytes 3-4)| Bsize |  MTU  |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                Server's RMB virtual address                 -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |    Server's initial packet sequence number    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 28: SMC Accept CLC Message Format
The fields present in the SMC Accept CLC message are:

Eye catchers
   Like all CLC messages, the SMC Accept has beginning and ending eye catchers to aid with verification and parsing. The hex digits spell "SMCR" in IBM-1047 (EBCDIC).

Type
   CLC message Type 2 indicates SMC Accept.

Length
   The SMC Accept CLC message is 68 bytes long.

Version
   Version of the SMC-R protocol. Version 1 is the only currently defined value.

F-bit
   First contact flag: A 1-bit flag that indicates that the server believes this TCP connection is the first SMC-R contact for this link group.

Server's Peer ID
   As described in Appendix A.2.1 above.

Server's RoCE GID
   The IPv6 address of the RNIC that the server chose for this SMC-R link.

Server's RoCE MAC address
   The MAC address of the server's RNIC for the SMC-R link. It is required, as some operating systems do not have neighbor discovery or ARP support for RoCE RNICs.

Server's QP number
   The number for the reliably connected queue pair that the server created for this SMC-R link.

Server's RMB RKey
   The RDMA RKey for the RMB that the server created or chose for this TCP connection.

Server's RMB element index
   Indexes which element within the server's RMB will represent this TCP connection.

Server's RMB element alert token
   A platform-defined, architecturally opaque token that identifies this TCP connection. Added by the client as immediate data on RDMA writes from the client to the server to inform the server that there is data for this connection to retrieve from the RMB element.

Bsize
   Server's RMB element buffer size in 4-bit compressed notation: x = 4 bits. Actual buffer size value is (2^(x + 4)) * 1K. Smallest possible value is 16K. Largest size supported by this architecture is 512K.

MTU
   An enumerated value indicating this peer's QP MTU size. The two peers exchange their MTU values, and whichever value is smaller will be used for the QP. This field should only be validated in the first contact exchange. The enumerated MTU values are:

      0: reserved
      1: 256
      2: 512
      3: 1024
      4: 2048
      5: 4096
      6-15: reserved

Server's RMB virtual address
   The virtual address of the server's RMB as assigned by the server's RNIC.

Server's initial packet sequence number
   The starting packet sequence number that this peer will use when sending to the other peer, so that the other peer can prepare its QP for the sequence number to expect.
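To make the fixed 68-byte layout and the compressed Bsize notation concrete, here is a minimal sketch that decodes an SMC Accept. The names parse_accept and rmbe_buffer_size are hypothetical; byte offsets are read off Figure 28, and big-endian integers are an assumption. Eye catcher validation is omitted for brevity.

```python
def rmbe_buffer_size(bsize: int) -> int:
    """Expand the 4-bit compressed Bsize value: (2^(x + 4)) * 1K bytes."""
    if not 0 <= bsize <= 15:
        raise ValueError("Bsize is a 4-bit value")
    return (1 << (bsize + 4)) * 1024


def parse_accept(msg: bytes) -> dict:
    """Decode the fields of an SMC Accept CLC message (Figure 28)."""
    if len(msg) != 68 or msg[4] != 2:
        raise ValueError("not an SMC Accept")
    return {
        "version": msg[7] >> 4,                  # upper 4 bits
        "first_contact": bool(msg[7] & 0x08),    # F-bit follows Version
        "peer_id": msg[8:16],
        "gid": msg[16:32],
        "mac": msg[32:38],
        "qp_number": int.from_bytes(msg[38:41], "big"),
        "rmb_rkey": int.from_bytes(msg[41:45], "big"),
        "rmb_index": msg[45],
        "alert_token": int.from_bytes(msg[46:50], "big"),
        "rmbe_size": rmbe_buffer_size(msg[50] >> 4),
        "mtu_code": msg[50] & 0x0F,              # enumerated, not in bytes
        "rmb_vaddr": int.from_bytes(msg[52:60], "big"),
        "initial_psn": int.from_bytes(msg[61:64], "big"),
    }
```

With this formula, a Bsize of 0 expands to 16K and a Bsize of 5 to 512K, matching the bounds stated above.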
A.2.4. SMC Confirm CLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 3    |          Length = 68          |Version| Rsrvd |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                      Client's Peer ID                       -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                      Client's RoCE GID                      -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Client's RoCE                         |
   +-          MAC address         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |     Client QP (bytes 1-2)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Clnt QP byte 3 |          Client RMB RKey (bytes 1-3)          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Clnt RMB byte 4|Client RMB indx| Clnt RMB alert tkn (bytes 1-2)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Clnt RMB alert tkn (bytes 3-4)| Bsize |  MTU  |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                Client's RMB Virtual Address                 -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    |    Client's initial packet sequence number    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 29: SMC Confirm CLC Message Format

The SMC Confirm CLC message is nearly identical to the SMC Accept, except that it contains client information and lacks a first contact flag.
The fields present in the SMC Confirm CLC message are:

Eye catchers
   Like all CLC messages, the SMC Confirm has beginning and ending eye catchers to aid with verification and parsing. The hex digits spell "SMCR" in IBM-1047 (EBCDIC).

Type
   CLC message Type 3 indicates SMC Confirm.

Length
   The SMC Confirm CLC message is 68 bytes long.

Version
   Version of the SMC-R protocol. Version 1 is the only currently defined value.

Client's Peer ID
   As described in Appendix A.2.1 above.

Client's RoCE GID
   The IPv6 address of the RNIC that the client chose for this SMC-R link.

Client's RoCE MAC address
   The MAC address of the client's RNIC for the SMC-R link. It is required, as some operating systems do not have neighbor discovery or ARP support for RoCE RNICs.

Client's QP number
   The number for the reliably connected queue pair that the client created for this SMC-R link.

Client's RMB RKey
   The RDMA RKey for the RMB that the client created or chose for this TCP connection.

Client's RMB element index
   Indexes which element within the client's RMB will represent this TCP connection.

Client's RMB element alert token
   A platform-defined, architecturally opaque token that identifies this TCP connection. Added by the server as immediate data on RDMA writes from the server to the client to inform the client that there is data for this connection to retrieve from the RMB element.

Bsize
   Client's RMB element buffer size in 4-bit compressed notation: x = 4 bits. Actual buffer size value is (2^(x + 4)) * 1K. Smallest possible value is 16K. Largest size supported by this architecture is 512K.

MTU
   An enumerated value indicating this peer's QP MTU size. The two peers exchange their MTU values, and whichever value is smaller will be used for the QP. The values are enumerated in Appendix A.2.3. This value should only be validated in the first contact exchange.

Client's RMB Virtual Address
   The virtual address of the client's RMB as assigned by the client's RNIC.

Client's initial packet sequence number
   The starting packet sequence number that this peer will use when sending to the other peer, so that the other peer can prepare its QP for the sequence number to expect.
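The MTU rule (each peer sends an enumerated value and the smaller one is used) can be sketched as follows. The names MTU_BYTES and negotiate_mtu are hypothetical; the table is the enumeration from Appendix A.2.3. Because the enumerated codes grow with the MTU size, taking the smaller code also selects the smaller size.

```python
# Enumerated MTU codes from Appendix A.2.3; codes 0 and 6-15 are reserved.
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def negotiate_mtu(local_code: int, peer_code: int) -> int:
    """Return the QP MTU in bytes: the smaller of the two peers' values."""
    for code in (local_code, peer_code):
        if code not in MTU_BYTES:
            raise ValueError(f"reserved MTU code {code}")
    return MTU_BYTES[min(local_code, peer_code)]
```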
A.2.5. SMC Decline CLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 4    |          Length = 28          |Version|S|Rsrvd|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                      Sender's Peer ID                       -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Peer Diagnosis Information                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     x'E2'     |     x'D4'     |     x'C3'     |     x'D9'     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 30: SMC Decline CLC Message Format

The fields present in the SMC Decline CLC message are:

Eye catchers
   Like all CLC messages, the SMC Decline has beginning and ending eye catchers to aid with verification and parsing. The hex digits spell "SMCR" in IBM-1047 (EBCDIC).

Type
   CLC message Type 4 indicates SMC Decline.

Length
   The SMC Decline CLC message is 28 bytes long.

Version
   Version of the SMC-R protocol. Version 1 is the only currently defined value.

S-bit
   Sync Bit. Indicates that the link group is out of sync and the receiving peer must clean up its representation of the link group.

Sender's Peer ID
   As described in Appendix A.2.1 above.

Peer Diagnosis Information
   4 bytes of diagnosis information provided by the peer. These values are defined by the individual peers, and it is necessary to consult the peer's system documentation to interpret the results.

A.3. LLC Messages
LLC messages are sent over an existing SMC-R link using RoCE SendMsg and are always 44 bytes long so that they fit into the space available in a single WQE without requiring the receiver to post receive buffers. If all 44 bytes are not needed, they are padded out with zeros.

LLC messages are in a request/response format. The message type is the same for request and response, and a flag indicates whether a message is flowing as a request or a response.

The two high-order bits of an LLC message opcode indicate how it is to be handled by a peer that does not support the opcode. If the high-order bits of the opcode are b'00', then the peer must support the LLC message and indicate a protocol error if it does not. If the high-order bits of the opcode are b'10', then the peer must silently discard the LLC message if it does not support the opcode. This requirement is included to allow for toleration of advanced, but optional, functionality. High-order bits of b'11' indicate a Connection Data Control (CDC) message as described in Appendix A.4.
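The opcode rule and the zero padding can be sketched as below. The function names and the returned strings are hypothetical labels chosen for this illustration. The text above does not describe the b'01' pattern, so the sketch reports it as unspecified rather than guessing.

```python
LLC_LENGTH = 44  # every LLC message is exactly 44 bytes


def unsupported_opcode_action(opcode: int) -> str:
    """Classify an opcode by its two high-order bits."""
    high_bits = (opcode >> 6) & 0b11
    if high_bits == 0b00:
        return "protocol-error"   # support is mandatory
    if high_bits == 0b10:
        return "discard"          # optional function, silently dropped
    if high_bits == 0b11:
        return "cdc"              # Connection Data Control message
    return "unspecified"          # b'01' is not described in this appendix


def pad_llc(body: bytes) -> bytes:
    """Pad an LLC message body with zeros to the fixed 44-byte length."""
    if len(body) > LLC_LENGTH:
        raise ValueError("LLC messages cannot exceed 44 bytes")
    return body.ljust(LLC_LENGTH, b"\x00")
```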
A.3.1. CONFIRM LINK LLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 1    |  Length = 44  |   Reserved    |R|  Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Sender's RoCE                         |
   +-          MAC address         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +-                                                             -+
   |                       Sender's RoCE GID                       |
   +-                                                             -+
   |                                                               |
   +-                              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |Sender's QP number, bytes 1-2  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Sender QP byte3|  Link number  |Sender's link userID, bytes 1-2|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Sender's link userID, bytes 3-4|   Max links   |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                          Reserved                           -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 31: CONFIRM LINK LLC Message Format

The CONFIRM LINK LLC message is required to be exchanged between the server and client over a newly created SMC-R link to complete the setup of an SMC-R link. Its purpose is to confirm that the RoCE path is actually usable.

On first contact, this message flows after the server receives the SMC Confirm CLC message from the client over the IP connection. For additional links added to an SMC-R link group, it flows after the ADD LINK and ADD LINK CONTINUATION exchange. This flow provides confirmation that the queue pair is in fact usable. Each peer echoes its RoCE information back to the other.

The contents of the CONFIRM LINK LLC message are:

Type
   Type 1 indicates CONFIRM LINK.

Length
   The CONFIRM LINK LLC message is 44 bytes long.

R
   Reply flag. When set, indicates that this is a CONFIRM LINK reply.

Sender's RoCE MAC address
   The MAC address of the sender's RNIC for the SMC-R link. It is required, as some operating systems do not have neighbor discovery or ARP support for RoCE RNICs.

Sender's RoCE GID
   The IPv6 address of the RNIC that the sender is using for this SMC-R link.

Sender's QP number
   The number for the reliably connected queue pair that the sender created for this SMC-R link.

Link number
   An identifier assigned by the server that uniquely identifies the link within the link group. This identifier is ONLY unique within a link group. Provided by the server and echoed back by the client.

Link user ID
   An opaque, implementation-defined identifier assigned by the sender and provided to the receiver solely for purposes of display, diagnosis, network management, etc. The link user ID should be unique across the sender's entire software space, including all other link groups.

Max links
   The maximum number of links the sender can support in a link group. The maximum for this link group is the smaller of the values provided by the two peers.

A.3.2. ADD LINK LLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 2    |  Length = 44  | Rsrvd |RsnCode|R|Z| Reserved  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Sender's RoCE                         |
   +-          MAC address         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +-                                                             -+
   |                       Sender's RoCE GID                       |
   +-                                                             -+
   |                                                               |
   +-                              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |Sender's QP number, bytes 1-2  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Sender QP byte3|  Link number  | Rsrvd |  MTU  |  Initial PSN  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Initial PSN (continued)    |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                              -+
   |                           Reserved                            |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 32: ADD LINK LLC Message Format

The ADD LINK LLC message is sent over an existing link in the link group when a peer wishes to add an SMC-R link to an existing SMC-R link group. It is sent by the server to add a new SMC-R link to the group, or by the client to request that the server add a new link -- for example, when a new RNIC becomes active. When sent from the client to the server, it represents a request that the server initiate an ADD LINK exchange.

This message is sent immediately after the initial SMC-R link in the group completes, as described in Section 3.5.1 ("First Contact"). It can also be sent over an existing SMC-R link group at any time as new RNICs are added and become available. Therefore, there can be as few as one new RMB RToken to be communicated, or several. RTokens will be communicated using ADD LINK CONTINUATION messages.

The contents of the ADD LINK LLC message are:

Type
   Type 2 indicates ADD LINK.

Length
   The ADD LINK LLC message is 44 bytes long.

RsnCode
   If the Z (rejection) flag is set, this field provides the reason code. Values can be:

      X'1' - no alternate path available: set when the server provides the same MAC/GID as an existing SMC-R link in the group, and the client does not have any additional RNICs available (i.e., the server is attempting to set up an asymmetric link but none is available).

      X'2' - Invalid MTU value specified.

R
   Reply flag. When set, indicates that this is an ADD LINK reply.

Z
   Rejection flag. When set on reply, indicates that the server's ADD LINK was rejected by the client. When this flag is set, the reason code will also be set.

Sender's RoCE MAC address
   The MAC address of the sender's RNIC for the new SMC-R link. It is required, as some operating systems do not have neighbor discovery or ARP support for RoCE RNICs.

Sender's RoCE GID
   The IPv6 address of the RNIC that the sender is using for the new SMC-R link.

Sender's QP number
   The number for the reliably connected queue pair that the sender created for the new SMC-R link.

Link number
   An identifier for the new SMC-R link. This is assigned by the server and uniquely identifies the link within the link group. This identifier is ONLY unique within a link group. Provided by the server and echoed back by the client.

MTU
   An enumerated value indicating this peer's QP MTU size. The two peers exchange their MTU values, and whichever value is smaller will be used for the QP. The values are enumerated in Appendix A.2.3.

Initial PSN
   The starting packet sequence number (PSN) that this peer will use when sending to the other peer, so that the other peer can prepare its QP for the sequence number to expect.
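As a sketch of how the R and Z flags and the reason code interact, the following function inspects only the first four bytes of an ADD LINK message. The name parse_add_link_header and the reason strings are hypothetical; bit positions are read off the first row of the ADD LINK format figure (RsnCode in the low 4 bits of the third byte, R and Z as the two high-order bits of the fourth).

```python
ADD_LINK_REASONS = {
    0x1: "no alternate path available",
    0x2: "invalid MTU value specified",
}


def parse_add_link_header(msg: bytes) -> tuple:
    """Return (is_reply, is_rejected, reason) for an ADD LINK LLC message."""
    if len(msg) != 44 or msg[0] != 2 or msg[1] != 44:
        raise ValueError("not an ADD LINK LLC message")
    is_reply = bool(msg[3] & 0x80)      # R flag
    is_rejected = bool(msg[3] & 0x40)   # Z flag
    reason = None
    if is_rejected:
        # The reason code is only meaningful when the Z flag is set.
        code = msg[2] & 0x0F
        reason = ADD_LINK_REASONS.get(code, f"unknown reason code {code}")
    return is_reply, is_rejected, reason
```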
A.3.3. ADD LINK CONTINUATION LLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 3    |  Length = 44  |   Reserved    |R|  Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Linknum    |  NumRTokens   |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                      RKey/RToken pair                       -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                  RKey/RToken pair or zeros                  -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 33: ADD LINK CONTINUATION LLC Message Format

When a new SMC-R link is added to an SMC-R link group, it is necessary to communicate the new link's RTokens for the RMBs that the SMC-R link group can access. This message follows the ADD LINK and provides the RTokens. The server kicks off this exchange by sending the first ADD LINK CONTINUATION LLC message, and the server controls the exchange as described below.

o  If the client and the server require the same number of ADD LINK CONTINUATION messages to communicate their RTokens, the server starts the exchange by sending the first ADD LINK CONTINUATION request to the client with its (the server's) RTokens. The client then responds with an ADD LINK CONTINUATION response with its RTokens, and so on until the exchange is completed.

o  If the server requires more ADD LINK CONTINUATION messages than the client, then after the client has communicated all of its RTokens, the server continues to send ADD LINK CONTINUATION request messages to the client. The client continues to respond, using empty (number of RTokens to be communicated = 0) ADD LINK CONTINUATION response messages.

o  If the client requires more ADD LINK CONTINUATION messages than the server, then after communicating all of its RTokens, the server will continue to send empty ADD LINK CONTINUATION messages to the client to solicit replies with the client's RTokens, until all have been communicated.

The contents of the ADD LINK CONTINUATION LLC message are:

Type
   Type 3 indicates ADD LINK CONTINUATION.

Length
   The ADD LINK CONTINUATION LLC message is 44 bytes long.

R
   Reply flag. When set, indicates that this is an ADD LINK CONTINUATION reply.

LinkNum
   The link number of the new link within the SMC-R link group for which RKeys are being communicated.

NumRTokens
   Number of RTokens remaining to be communicated (including the ones in this message). If the value is less than or equal to 2, this is the last message. If it is greater than 2, another continuation message will be required, and its value will be the value in this message minus 2, and so on until all RKeys are communicated. The maximum value for this field is 255.

RKey/RToken pairs (two or fewer)
   These consist of an RKey for an RMB that is known on the SMC-R link over which this message was sent (the reference RKey), paired with the same RMB's RToken over the new SMC-R link. A full RToken is not required for the reference, because it is only being used to distinguish which RMB it applies to, not to address it.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Reference RKey                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           New RKey                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-                     New Virtual Address                     -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 34: RKey/RToken Pair Format

The contents of the RKey/RToken pair are:

Reference RKey
   The RKey of the RMB as it is already known on the SMC-R link over which this message is being sent. Required so that the peer knows with which RMB to associate the new RToken.

New RKey
   The RKey of this RMB as it is known over the new SMC-R link.

New Virtual Address
   The virtual address of this RMB as it is known over the new SMC-R link.
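The NumRTokens countdown can be illustrated by splitting one peer's RTokens into 44-byte messages carrying two pairs each. This is a sketch only: build_add_link_cont is a hypothetical name, each RToken is assumed to be given as a (reference RKey, new RKey, new virtual address) tuple, and integers are packed big-endian following the figures.

```python
import struct


def build_add_link_cont(link_num: int, rtokens: list, reply: bool = False) -> list:
    """Return the ADD LINK CONTINUATION messages needed for rtokens."""
    if len(rtokens) > 255:
        raise ValueError("NumRTokens cannot exceed 255")
    # An empty list still yields one (empty) message, as used when the
    # other peer has more RTokens left to send.
    chunks = [rtokens[i:i + 2] for i in range(0, len(rtokens), 2)] or [[]]
    remaining = len(rtokens)
    messages = []
    for chunk in chunks:
        msg = struct.pack("!BBxB", 3, 44, 0x80 if reply else 0x00)
        msg += struct.pack("!BB2x", link_num, remaining)
        for ref_rkey, new_rkey, new_vaddr in chunk:
            msg += struct.pack("!IIQ", ref_rkey, new_rkey, new_vaddr)
        messages.append(msg.ljust(44, b"\x00"))  # unused pair slots are zeros
        remaining -= len(chunk)
    return messages
```

With three RTokens, for example, this yields two messages whose NumRTokens values are 3 and 1.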
A.3.4. DELETE LINK LLC Message Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 4    |  Length = 44  |   Reserved    |R|A|O|  Rsrvd  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Linknum    |            reason code (bytes 1-3)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |RsnCode byte 4 |                                               |
   +-+-+-+-+-+-+-+-+                                              -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                          Reserved                           -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-                                                             -+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 35: DELETE LINK LLC Message Format

When the client or server detects that a QP or SMC-R link goes down or needs to come down, it sends this message over one of the other links in the link group.

When the DELETE LINK is sent from the client, it only serves as a notification, and the client expects the server to respond by sending a DELETE LINK request. To avoid races, only the server will initiate the actual DELETE LINK request and response sequence that results from notification from the client.

The server can also initiate the DELETE LINK without notification from the client if it detects an error or if orderly link termination was initiated.

The client may also request termination of the entire link group, and the server may terminate the entire link group using this message.

The contents of the DELETE LINK LLC message are:

Type
   Type 4 indicates DELETE LINK.

Length
   The DELETE LINK LLC message is 44 bytes long.

R
   Reply flag. When set, indicates that this is a DELETE LINK reply.

A
   "All" flag. When set, indicates that all links in the link group are to be terminated. This terminates the link group.

O
   Orderly flag. Indicates orderly termination. Orderly termination is generally caused by an operator command rather than an error on the link. When the client requests orderly termination, the server may wait to complete other work before terminating.

LinkNum
   The link number of the link to be terminated. If the A flag is set, this field has no meaning and is set to 0.

RsnCode
   The termination reason code. Currently defined reason codes are:

   Request reason codes:

      X'00010000' = Lost path
      X'00020000' = Operator initiated termination
      X'00030000' = Program initiated termination (link inactivity)
      X'00040000' = LLC protocol violation
      X'00050000' = Asymmetric link no longer needed

   Response reason code:

      X'00100000' = Unknown link ID (no link)

A.3.5. CONFIRM RKEY LLC Message Format
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type = 6 | Length = 44 | Reserved |R|0|Z|C|Rsrvd | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NumTkns | New RMB RKey for this link (bytes 1-3) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |ThisLink byte 4| | +-+-+-+-+-+-+-+-+ -+ | New RMB virtual address for this link | +- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | +-+-+-+-+-+-+-+-+ -+ | | +- Other link RMB specification or zeros -+ | | +- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+ | | +- -+ | Other link RMB specification or zeros | +- +-+-+-+-+-+-+-+-+ | | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 36: CONFIRM RKEY LLC Message Format The CONFIRM RKEY flow can be sent at any time from either the client or the server, to inform the peer that an RMB has been created or deleted. The creator of a new RMB must inform its peer of the new RMB's RToken for all SMC-R links in the SMC-R link group. For RMB creation, the creator sends this message over the SMC-R link that the first TCP connection that uses the new RMB is using. This message contains the new RMB RToken for the SMC-R link over which the message is sent. It then lists the sender's SMC-R links in the link group paired with the new RToken for the new RMB for that link. This message can communicate the new RTokens for three QPs: the QP for the link over which this message is sent, and two others. If there are more than three links in the SMC-R link group, a CONFIRM RKEY CONTINUATION will be required.
The peer responds by simply echoing the message with the response flag set. If the response is a negative response, the sender must recalculate the RToken set and start a new CONFIRM RKEY exchange from the beginning. The timing of this retry is controlled by the C flag, as described below. The contents of the CONFIRM RKEY LLC message are: Type Type 6 indicates CONFIRM RKEY. Length The CONFIRM RKEY LLC message is 44 bytes long. R Reply flag. When set, indicates that this is a CONFIRM RKEY reply. 0 Reserved bit. Z Negative response flag. C Configuration Retry bit. If this is a negative response and this flag is set, the originator should recalculate the RKey set and retry this exchange as soon as the current configuration change is completed. If this flag is not set on a negative response, the originator must wait for the next natural stimulus (for example, a new TCP connection started that requires a new RMB) before retrying. NumTkns The number of other link/RToken pairs, including those provided in this message, to be communicated. Note that this value does not include the RToken for the link on which this message was sent (i.e., the maximum value is 2). If this value is 3 or less, this is the only message in the exchange. If this value is greater than 3, a CONFIRM RKEY CONTINUATION message will be required.
   Note: In this version of the architecture, eight is the maximum number of links supported in a link group.

New RMB RKey for this link
   The new RMB's RKey as assigned on the link over which this message is being sent.

New RMB virtual address for this link
   The new RMB's virtual address as assigned on the link over which this message is being sent.

Other link RMB specification
   The new RMB's specification on the other links in the link group, as shown in Figure 37.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Link number  | RMB's RKey for the specified link (bytes 1-3) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|New RKey byte 4|                                               |
+-+-+-+-+-+-+-+-+                                              -+
|         RMB's virtual address for the specified link          |
+-              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+

Figure 37: Format of Link Number/RKey Pairs

Link number
   The link number for a link in the link group.

RMB's RKey for the specified link
   The RKey used to reach the RMB over the link whose number was specified in the Link number field.

RMB's virtual address for the specified link
   The virtual address used to reach the RMB over the link whose number was specified in the Link number field.
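The layout in Figures 36 and 37 can be illustrated with a short Python sketch that packs a CONFIRM RKEY message and produces the echoed reply. This is not part of the architecture: the helper names are hypothetical, and both the use of network byte order for the RKey and virtual address fields and the flag masks (bit 0 of the figure taken as the most significant bit of the flags byte) are assumptions of the sketch.

```python
import struct

# Layout read off Figures 36 and 37 (an assumption of this sketch):
# 4-byte header, then three 13-byte groups of (1-byte field, 4-byte RKey,
# 8-byte virtual address), then 1 reserved byte. Big-endian is assumed.
CONFIRM_RKEY = struct.Struct(">BBBB" + "BIQ" * 3 + "x")

TYPE_CONFIRM_RKEY = 6
LLC_LENGTH = 44
FLAG_R = 0x80  # reply flag (bit 0 of the flags byte)
FLAG_Z = 0x20  # negative response flag (bit 2)
FLAG_C = 0x10  # Configuration Retry bit (bit 3)


def build_confirm_rkey(this_rkey, this_vaddr, others, flags=0):
    """Pack a CONFIRM RKEY request (hypothetical helper).

    others: (link_number, rkey, vaddr) tuples for the other links.
    Only the first two fit in this message; NumTkns carries the full count.
    """
    in_msg = list(others[:2])
    in_msg += [(0, 0, 0)] * (2 - len(in_msg))  # unused slots are zeros
    fields = [TYPE_CONFIRM_RKEY, LLC_LENGTH, 0, flags,
              len(others), this_rkey, this_vaddr]
    for spec in in_msg:
        fields.extend(spec)
    return CONFIRM_RKEY.pack(*fields)


def make_reply(message, negative=False, retry=False):
    """Echo a request with the reply flag set, as the peer would."""
    flags = message[3] | FLAG_R
    if negative:
        flags |= FLAG_Z
        if retry:
            flags |= FLAG_C
    return message[:3] + bytes([flags]) + message[4:]
```

For example, `build_confirm_rkey(0x11223344, 0x1000, [(2, 0xAABBCCDD, 0x2000)])` yields a 44-byte message with NumTkns set to 1 and the second "Other link RMB specification" slot left as zeros.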
A.3.6. CONFIRM RKEY CONTINUATION LLC Message Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 8    |  Length = 44  |   Reserved    |R|0|Z| Rsrvd   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NumTknsLeft  |                                               |
+-+-+-+-+-+-+-+-+                                              -+
|                                                               |
+-                Other link RMB specification                 -+
|                                                               |
+-              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |                                               |
+-+-+-+-+-+-+-+-+                                              -+
|                                                               |
+-            Other link RMB specification or zeros            -+
|                                                               |
+-                              +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                              -+
|                                                               |
+-                                                             -+
|             Other link RMB specification or zeros             |
+-                                              +-+-+-+-+-+-+-+-+
|                                               |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 38: CONFIRM RKEY CONTINUATION LLC Message Format

The CONFIRM RKEY CONTINUATION LLC message is used to communicate any additional RMB RTokens that did not fit into the CONFIRM RKEY message. Each of these messages can hold up to three RMB RTokens. The NumTknsLeft field indicates how many RMB RTokens are to be communicated, including the ones in this message. If the value is 3 or less, this is the last message of the group. If the value is 4 or higher, additional CONFIRM RKEY CONTINUATION messages will follow, and the NumTknsLeft value will be a countdown until all are communicated.

Like the CONFIRM RKEY message, the peer responds by echoing the message back with the reply flag set.
The contents of the CONFIRM RKEY CONTINUATION LLC message are:

Type
   Type 8 indicates CONFIRM RKEY CONTINUATION.

Length
   The CONFIRM RKEY CONTINUATION LLC message is 44 bytes long.

R
   Reply flag. When set, indicates that this is a CONFIRM RKEY CONTINUATION reply.

0
   Reserved bit.

Z
   Negative response flag.

NumTknsLeft
   The number of link/RToken pairs, including those provided in this message, that are remaining to be communicated. If this value is 3 or less, this is the last message in the exchange. If this value is greater than 3, another CONFIRM RKEY CONTINUATION message will be required. Note that in this version of the architecture, eight is the maximum number of links supported in a link group.

Other link RMB specification
   The new RMB's specification on other links in the link group, as shown in Figure 37.
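The NumTknsLeft countdown can be illustrated with a small Python sketch that splits the other-link RTokens across one CONFIRM RKEY message and any CONTINUATION messages that follow. The function name is hypothetical. The sketch assumes, from Figure 36, that the CONFIRM RKEY message itself has room for two other-link specifications, and, from the text above, that each CONTINUATION message carries up to three.

```python
def plan_confirm_rkey_exchange(other_tokens):
    """Return a list of (message name, count field, tokens carried).

    other_tokens: RTokens for the links other than the one the exchange
    is sent on. The count field is NumTkns for the first message and
    NumTknsLeft for each continuation (tokens still to be communicated,
    including those in that message).
    """
    total = len(other_tokens)
    plan = [("CONFIRM RKEY", total, other_tokens[:2])]
    sent = 2  # assumed capacity of the first message (Figure 36)
    while sent < total:
        chunk = other_tokens[sent:sent + 3]  # up to three per continuation
        plan.append(("CONFIRM RKEY CONTINUATION", total - sent, chunk))
        sent += 3
    return plan
```

Under these assumptions, a link group with the maximum of eight links (seven other links) produces count fields of 7, 5, and 2: one CONFIRM RKEY message followed by two CONTINUATION messages.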
A.3.7. DELETE RKEY LLC Message Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 9    |  Length = 44  |   Reserved    |R|0|Z| Rsrvd   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Count     |  Error Mask   |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      First deleted RKey                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Second deleted RKey or zeros                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Third deleted RKey or zeros                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Fourth deleted RKey or zeros                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Fifth deleted RKey or zeros                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Sixth deleted RKey or zeros                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Seventh deleted RKey or zeros                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Eighth deleted RKey or zeros                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 39: DELETE RKEY LLC Message Format

The DELETE RKEY flow can be sent at any time from either the client or the server, to inform the peer that one or more RMBs have been deleted. Because the peer already knows every RMB's RKey on each link in the link group, this message only specifies one RKey for each RMB being deleted. The RKey provided for each deleted RMB will be its RKey as known on the SMC-R link over which this message is sent.

It is not necessary to provide the entire RToken. The RKey alone is sufficient for identifying an existing RMB.

The peer responds by simply echoing the message with the response flag set.
If the peer did not recognize an RKey, a negative response flag will be set; however, no aggressive recovery action beyond logging the error will be taken.
The contents of the DELETE RKEY LLC message are:

Type
   Type 9 indicates DELETE RKEY.

Length
   The DELETE RKEY LLC message is 44 bytes long.

R
   Reply flag. When set, indicates that this is a DELETE RKEY reply.

0
   Reserved bit.

Z
   Negative response flag.

Count
   Number of RMBs being deleted by this message. Maximum value is 8.

Error Mask
   If this is a negative response, indicates which RMBs were not successfully deleted. Each bit corresponds to a listed RMB; for example, b'01010000' indicates that the second and fourth RKeys weren't successfully deleted.

Deleted RKeys
   A list of Count RKeys. Provided on the request flow and echoed back on the response flow. Each RKey is valid on the link over which this message is sent and represents a deleted RMB. Up to eight RMBs can be deleted in this message.
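The relationship between the Error Mask bits and the listed RKeys can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes big-endian RKeys and that the leftmost bit of the Error Mask (the most significant bit of the byte) corresponds to the first listed RKey, which is consistent with the b'01010000' example above.

```python
import struct

# Layout read off Figure 39: 4-byte header, Count, Error Mask, 2 reserved
# bytes, eight 4-byte RKeys, 4 reserved bytes. Big-endian is assumed.
DELETE_RKEY = struct.Struct(">BBBBBB2x8I4x")

TYPE_DELETE_RKEY = 9
LLC_LENGTH = 44


def build_delete_rkey(rkeys):
    """Pack a DELETE RKEY request for one to eight RKeys."""
    if not 1 <= len(rkeys) <= 8:
        raise ValueError("a DELETE RKEY message carries 1 to 8 RKeys")
    padded = list(rkeys) + [0] * (8 - len(rkeys))  # unused slots are zeros
    return DELETE_RKEY.pack(TYPE_DELETE_RKEY, LLC_LENGTH, 0, 0,
                            len(rkeys), 0, *padded)


def failed_rkeys(reply):
    """List the RKeys that a negative response marks as not deleted."""
    fields = DELETE_RKEY.unpack(reply)
    count, error_mask, rkeys = fields[4], fields[5], fields[6:14]
    # Bit i of the mask, counted from the left, refers to the i-th RKey.
    return [rkeys[i] for i in range(count) if error_mask & (0x80 >> i)]
```

With four RKeys in the request and an Error Mask of b'01010000' in the reply, `failed_rkeys` returns the second and fourth RKeys, which the receiver of the reply would then log.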
A.3.8. TEST LINK LLC Message Format
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 7    |  Length = 44  |   Reserved    |R|  Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-                                                             -+
|                                                               |
+-                          User Data                          -+
|                                                               |
+-                                                             -+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-                                                             -+
|                                                               |
+-                                                             -+
|                           Reserved                            |
+-                                                             -+
|                                                               |
+-                                                             -+
|                                                               |
+-                                                             -+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 40: TEST LINK LLC Message Format

The TEST LINK request can be sent from either peer to the other on an existing SMC-R link at any time to test that the SMC-R link is active and healthy at the software level. A peer that receives a TEST LINK LLC message immediately sends back a TEST LINK reply, echoing back the user data. Refer also to Section 4.5.3 ("TCP Keepalive Processing").
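The echo behavior can be sketched in Python as follows. The helper names are hypothetical, and the field sizes (16 bytes of user data followed by 24 reserved bytes) are read off Figure 40 rather than stated in the text, so they are an assumption of the sketch, as is the reply flag mask.

```python
import struct

# Layout read off Figure 40: 4-byte header, 16 bytes of user data,
# 24 reserved bytes (4 + 16 + 24 = 44).
TEST_LINK = struct.Struct(">BBBB16s24x")

TYPE_TEST_LINK = 7
LLC_LENGTH = 44
FLAG_R = 0x80  # reply flag, taken as the leftmost bit of the flags byte


def build_test_link(user_data):
    """Pack a TEST LINK request; short user data is padded with zeros."""
    if len(user_data) > 16:
        raise ValueError("user data is limited to 16 bytes")
    return TEST_LINK.pack(TYPE_TEST_LINK, LLC_LENGTH, 0, 0, user_data)


def answer_test_link(message):
    """Return the reply for a TEST LINK request, or None for a reply."""
    msg_type, length, _reserved, flags, user_data = TEST_LINK.unpack(message)
    if msg_type != TYPE_TEST_LINK or length != LLC_LENGTH:
        raise ValueError("not a TEST LINK LLC message")
    if flags & FLAG_R:
        return None  # a reply is not answered again
    # Echo the user data; reserved fields are sent as zeros.
    return TEST_LINK.pack(TYPE_TEST_LINK, LLC_LENGTH, 0, FLAG_R, user_data)
```

The sender can compare the user data in the reply with what it sent, for example a timestamp or counter, to match replies to outstanding requests; that use of the field is a design choice of the implementation, not something the format requires.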
The contents of the TEST LINK LLC message are:

Type
   Type 7 indicates TEST LINK.

Length
   The TEST LINK LLC message is 44 bytes long.

R
   Reply flag. When set, indicates that this is a TEST LINK reply.

User Data
   The receiver of this message echoes the sender's data back in a TEST LINK response LLC message.

A.4. Connection Data Control (CDC) Message Format
The RMBE control data is communicated using Connection Data Control (CDC) messages, which use RoCE SendMsg, similar to LLC messages. Also, as with LLC messages, CDC messages are 44 bytes long to ensure that they can fit into private data areas of receive WQEs without requiring the receiver to post receive buffers. Unlike LLC messages, this data is integral to the data path, so its processing must be prioritized and optimized similarly to other data path processing. While LLC messages may be processed on a slower path than data, these messages cannot be.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type = x'FE'  |  Length = 44  |        Sequence number        |
 4 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       SMC-R alert token                       |
 8 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |  Producer cursor wrap seqno   |
12 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Producer Cursor                        |
16 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |  Consumer cursor wrap seqno   |
20 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Consumer Cursor                        |
24 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |B|P|U|R|F|Rsrvd|D|C|A|                 Reserved                |
28 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
32 +-                                                             -+
   |                                                               |
36 +-                          Reserved                           -+
   |                                                               |
40 +-                                                             -+
   |                                                               |
44 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 41: Connection Data Control (CDC) Message Format

Type = x'FE'
   This type number has the two high-order bits turned on to enable processing to quickly distinguish it from an LLC message.

Length = 44
   The length of inline data that does not require the posting of a receive buffer.

Sequence number
   A 2-byte unsigned integer that represents a wrapping sequence number. The initial value is 1, and this value can wrap to 0. Incremented with every control message sent, except for the failover data validation message, and used to guard against processing an old control message out of sequence. Also used in failover data validation.

   In normal usage, if this number is less than the last received value, discard this message. If greater, process this message. Old control messages can be lost with no ill effect but cannot be processed after newer ones.

   If this is a failover validation CDC message (F flag set), then the receiver must verify that it has received and fully processed the RDMA write that was described by the CDC message with the sequence number in this message. If not, the TCP connection must be reset to guard against data loss. Details of this processing are provided in Section 4.6.1.

SMC-R alert token
   The endpoint-assigned alert token that identifies to which TCP connection on the link group this control message refers.

Producer cursor wrap seqno
   A 2-byte unsigned integer that represents a wrapping counter incremented by the producer whenever the data written into this RMBE receive buffer causes a wrap (i.e., the producer cursor wraps). This is used by the receiver to determine when new data is available even though the cursors appear unchanged, such as when a full window size write is completed (producer cursor of this RMBE sent by peer = local consumer cursor) or in scenarios where the producer cursor sent for this RMBE < local consumer cursor.

Producer Cursor
   A 4-byte unsigned integer that is a wrapping offset into the RMBE data area. Points to the next byte of data to be written by the sender. Can advance up to the receiver's consumer cursor as known by the sender. When the urgent data present indicator is on, points 1 byte beyond the last byte of urgent data. When computing this cursor, the presence of the eye catcher in the RMBE data area must be accounted for. The first writable data location in the RMBE is at offset 4, so this cursor begins at 4 and wraps to 4.

Consumer cursor wrap seqno
   A 2-byte unsigned integer that mirrors the value of the producer cursor wrap sequence number when the last read from this RMBE occurred.
   Used as an indicator of how far along the consumer is in reading data (i.e., processed last wrap point or not). The producer side can use this indicator to detect whether or not more data can be written to the partner in full window write scenarios (where the producer cursor = consumer cursor as known on the remote RMBE). In this scenario, if the consumer sequence number equals the local producer sequence number, the producer knows that more data can be written.

Consumer Cursor
   A 4-byte unsigned integer that is a wrapping offset into the sender's RMBE data area. Points to the offset of the next byte of data to be consumed by the peer in its own RMBE. When computing this cursor, the presence of the eye catcher in the RMBE data area must be accounted for. The first writable data location in the RMBE is at offset 4, so this cursor begins at 4 and wraps to 4. The sender cannot write beyond this cursor into the peer's RMBE without causing data loss.

B-bit
   Writer blocked indicator: Sender is blocked for writing. If this bit is set, sender will require explicit notification when receive buffer space is available.

P-bit
   Urgent data pending: Sender has urgent data pending for this connection.

U-bit
   Urgent data present: Indicates that urgent data is present in the RMBE data area, and the producer cursor points to 1 byte beyond the last byte of urgent data.

R-bit
   Request for consumer cursor update: Indicates that an immediate consumer cursor update is requested, regardless of whether or not one is warranted according to the window size optimization algorithm described in Section 4.5.1.

F-bit
   Failover validation indicator: Sent by a peer to guard against data loss during failover when the TCP connection is being moved to another SMC-R link in the link group. When this bit is set, the only other fields in the CDC message that are significant are the Type, Length, SMC-R alert token, and Sequence number fields. The receiver must validate that it has fully processed the RDMA write described by the previous CDC message bearing the same sequence number as this validation message. If it has, no further action is required. If it has not, the TCP connection must be reset. This processing is described in detail in Section 4.6.1.

D-bit
   Sending done indicator: Sent by a peer when it is done writing new data into the receiver's RMBE data area.

C-bit
   PeerConnectionClosed indicator: Sent by a peer when it is completely done with this connection and will no longer be making any updates to the receiver's RMBE or sending any more control messages.

A-bit
   Abnormal close indicator: Sent by a peer when the connection is abnormally terminated (for example, the TCP connection was reset). When sent, it indicates that the peer is completely done with this connection and will no longer be making any updates to this RMBE or sending any more control messages. It also indicates that the RMBE owner must flush any remaining data on this connection and generate an error return code to any outstanding socket APIs on this connection (same processing as receiving a RST segment on a TCP connection).
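The CDC layout in Figure 41 can be illustrated with a short Python parsing sketch. The names are hypothetical, and both the use of network byte order and the flag masks (bit 0 of the figure taken as the most significant bit of the bytes at offsets 24 and 25) are assumptions of the sketch rather than statements of the architecture.

```python
import struct

# Layout read off Figure 41, with reserved fields skipped as pad bytes.
# Big-endian multi-byte fields are assumed.
CDC = struct.Struct(">BBHI2xHI2xHIBB18x")

TYPE_CDC = 0xFE
CDC_LENGTH = 44
FLAGS_OFFSET_24 = {"B": 0x80, "P": 0x40, "U": 0x20, "R": 0x10, "F": 0x08}
FLAGS_OFFSET_25 = {"D": 0x80, "C": 0x40, "A": 0x20}


def is_cdc_type(type_byte):
    """CDC types have the two high-order bits on; LLC types do not."""
    return type_byte & 0xC0 == 0xC0


def parse_cdc(message):
    """Unpack a 44-byte CDC message into a dict (hypothetical helper)."""
    (msg_type, length, seqno, alert_token, prod_wrap, prod_cursor,
     cons_wrap, cons_cursor, flags24, flags25) = CDC.unpack(message)
    if msg_type != TYPE_CDC or length != CDC_LENGTH:
        raise ValueError("not a CDC message")
    flags = {name for name, mask in FLAGS_OFFSET_24.items() if flags24 & mask}
    flags |= {name for name, mask in FLAGS_OFFSET_25.items() if flags25 & mask}
    return {
        "sequence_number": seqno,
        "alert_token": alert_token,
        "producer_wrap_seqno": prod_wrap,
        "producer_cursor": prod_cursor,
        "consumer_wrap_seqno": cons_wrap,
        "consumer_cursor": cons_cursor,
        "flags": flags,
    }
```

A receiver built along these lines would check for the F flag first, since a failover validation message makes only the Type, Length, SMC-R alert token, and Sequence number fields significant.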