Data transmission MUST only happen in the ESTABLISHED, SHUTDOWN-PENDING, and SHUTDOWN-RECEIVED states. The only exception to this is that DATA chunks are allowed to be bundled with an outbound COOKIE ECHO chunk when in the COOKIE-WAIT state.
DATA chunks MUST only be received according to the rules below in ESTABLISHED, SHUTDOWN-PENDING, and SHUTDOWN-SENT states. A DATA chunk received in CLOSED is out of the blue and SHOULD be handled per Section 8.4. A DATA chunk received in any other state SHOULD be discarded.
A SACK chunk MUST be processed in ESTABLISHED, SHUTDOWN-PENDING, and SHUTDOWN-RECEIVED states. An incoming SACK chunk MAY be processed in COOKIE-ECHOED. A SACK chunk in the CLOSED state is out of the blue and SHOULD be processed according to the rules in Section 8.4. A SACK chunk received in any other state SHOULD be discarded.
For transmission efficiency, SCTP defines mechanisms for bundling of small user messages and fragmentation of large user messages. The following diagram depicts the flow of user messages through SCTP.
In this section, the term "data sender" refers to the endpoint that transmits a DATA chunk, and the term "data receiver" refers to the endpoint that receives a DATA chunk. A data receiver will transmit SACK chunks.
                  +--------------------------+
                  |      User Messages       |
                  +--------------------------+
   SCTP user            ^  |
  ======================|==|=====================================
                        |  v (1)
          +------------------+    +---------------------+
          | SCTP DATA Chunks |    | SCTP Control Chunks |
          +------------------+    +---------------------+
                   ^  |                    ^  |
                   |  v (2)                |  v (2)
                  +--------------------------+
                  |       SCTP packets       |
                  +--------------------------+
   SCTP                 ^  |
  ======================|==|=====================================
                        |  v
        Connectionless Packet Transfer Service (e.g., IP)
The following applies:
1) When converting user messages into DATA chunks, an endpoint MUST fragment large user messages into multiple DATA chunks. The size of each DATA chunk SHOULD be smaller than or equal to the Association Maximum DATA Chunk Size (AMDCS). The data receiver will normally reassemble the fragmented message from DATA chunks before delivery to the user (see Section 6.9 for details).
2) Multiple DATA and control chunks MAY be bundled by the sender into a single SCTP packet for transmission, as long as the final size of the SCTP packet does not exceed the current PMTU. The receiver will unbundle the packet back into the original chunks. Control chunks MUST come before DATA chunks in the packet.
The fragmentation and bundling mechanisms, as detailed in Sections 6.9 and 6.10, are OPTIONAL to implement by the data sender, but they MUST be implemented by the data receiver, i.e., an endpoint MUST properly receive and process bundled or fragmented data.
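As an illustration of rule 1 and of the reassembly it implies on the receiving side, the following Python sketch splits one user message into DATA chunk payloads. It assumes that AMDCS counts the 16-byte DATA chunk header (the header size implied by the "Length field is set to 16" case later in this section) and marks the first and last pieces with the DATA chunk B (beginning) and E (ending) bits. `Fragment` and `fragment_message` are hypothetical names, not part of any SCTP API.

```python
from dataclasses import dataclass
from typing import List

DATA_CHUNK_HEADER_LEN = 16  # bytes of DATA chunk header preceding the user data
TSN_MODULUS = 2**32         # TSNs are 32-bit values that wrap around


@dataclass
class Fragment:
    tsn: int        # every fragment consumes its own TSN
    begin: bool     # B bit: first fragment of the user message
    end: bool       # E bit: last fragment of the user message
    payload: bytes


def fragment_message(message: bytes, amdcs: int, first_tsn: int) -> List[Fragment]:
    """Split one user message so that each DATA chunk is at most amdcs bytes."""
    max_payload = amdcs - DATA_CHUNK_HEADER_LEN
    if max_payload <= 0:
        raise ValueError("AMDCS must leave room for user data")
    if not message:
        # An endpoint SHOULD NOT send a DATA chunk with no user data.
        raise ValueError("empty user messages are not sent")
    pieces = [message[i:i + max_payload] for i in range(0, len(message), max_payload)]
    last = len(pieces) - 1
    return [
        Fragment(tsn=(first_tsn + n) % TSN_MODULUS, begin=(n == 0), end=(n == last), payload=piece)
        for n, piece in enumerate(pieces)
    ]


# A 2500-byte message with AMDCS = 1016 yields payloads of 1000, 1000, and 500 bytes.
frags = fragment_message(b"x" * 2500, amdcs=1016, first_tsn=100)
print([(f.tsn, f.begin, f.end, len(f.payload)) for f in frags])
```

A single-fragment message has both B and E set, which is how the receiver recognizes an unfragmented message.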
This section specifies the rules for sending DATA chunks. In particular, it defines zero window probing, which is required to avoid the indefinite stalling of an association in case of a loss of packets containing SACK chunks performing window updates.
This document is specified as if there is a single retransmission timer per destination transport address, but implementations MAY have a retransmission timer for each DATA chunk.
The following general rules MUST be applied by the data sender for transmission and/or retransmission of outbound DATA chunks:
A) At any given time, the data sender MUST NOT transmit new data to any destination transport address if its peer's rwnd indicates that the peer has no buffer space (i.e., rwnd is smaller than the size of the next DATA chunk; see Section 6.2.1), except for zero window probes.
   A zero window probe is a DATA chunk sent when the receiver has no buffer space. This rule allows the sender to probe for a change in rwnd that the sender missed due to the SACK chunks having been lost in transit from the data receiver to the data sender. A zero window probe MUST only be sent when the cwnd allows (see rule B below). A zero window probe SHOULD only be sent when all outstanding DATA chunks have been cumulatively acknowledged and no DATA chunks are in flight. Senders MUST support zero window probing.
   If the sender continues to receive SACK chunks from the peer while doing zero window probing, the unacknowledged window probes SHOULD NOT increment the error counter for the association or any destination transport address. This is because the receiver could keep its window closed for an indefinite time. Section 6.2 describes the receiver behavior when it advertises a zero window. The sender SHOULD send the first zero window probe after 1 RTO when it detects that the receiver has closed its window and SHOULD increase the probe interval exponentially afterwards. Also note that the cwnd SHOULD be adjusted according to Section 7.2.1. Zero window probing does not affect the calculation of cwnd.
   The sender MUST also have an algorithm for sending new DATA chunks to avoid silly window syndrome (SWS) as described in RFC 1122. The algorithm can be similar to the one described in Section 4.2.3.4 of RFC 1122.
B) At any given time, the sender MUST NOT transmit new data to a given transport address if it has cwnd + (PMDCS - 1) or more bytes of data outstanding to that transport address. If data is available, the sender SHOULD exceed cwnd by up to (PMDCS - 1) bytes on a new data transmission if the flightsize does not currently reach cwnd. The breach of cwnd MUST constitute one packet only.
C) When the time comes for the sender to transmit, before sending new DATA chunks, the sender MUST first transmit any DATA chunks that are marked for retransmission (limited by the current cwnd).
D) When the time comes for the sender to transmit new DATA chunks, the protocol parameter 'Max.Burst' SHOULD be used to limit the number of packets sent. The limit MAY be applied by adjusting cwnd temporarily, as follows:
      if ((flightsize + Max.Burst * PMDCS) < cwnd)
         cwnd = flightsize + Max.Burst * PMDCS
   Or, it MAY be applied by strictly limiting the number of packets emitted by the output routine. When calculating the number of packets to transmit, and particularly when using the formula above, cwnd SHOULD NOT be changed permanently.
E) Then, the sender can send as many new DATA chunks as rule A and rule B allow.
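Rules A, B, D, and E can be read together as a loop that admits new DATA chunks until either window is exhausted. The Python sketch below is one possible reading, not a reference implementation: it looks at a single destination, assumes retransmissions (rule C) were already handled, applies Max.Burst through a temporary copy of cwnd (the first option in rule D), and checks rule B per chunk, which is slightly more conservative than per packet. All function and parameter names are hypothetical.

```python
from typing import List


def rule_a_allows(peer_rwnd: int, chunk_size: int, zero_window_probe: bool = False) -> bool:
    # Rule A: no new data if the peer's rwnd is smaller than the next DATA chunk,
    # except for a zero window probe.
    return zero_window_probe or peer_rwnd >= chunk_size


def rule_b_allows(flightsize: int, cwnd: int, pmdcs: int, chunk_size: int) -> bool:
    # Rule B: a new transmission may start only while flightsize is below cwnd.
    # That transmission may overshoot cwnd, but never reach cwnd + (PMDCS - 1).
    if flightsize >= cwnd:
        return False
    return flightsize + chunk_size <= cwnd + pmdcs - 1


def burst_limited_cwnd(cwnd: int, flightsize: int, max_burst: int, pmdcs: int) -> int:
    # Rule D, first option: a temporary cwnd; the caller's cwnd is left unchanged.
    limit = flightsize + max_burst * pmdcs
    return limit if limit < cwnd else cwnd


def count_sendable(pending: List[int], peer_rwnd: int, cwnd: int, flightsize: int,
                   pmdcs: int, max_burst: int) -> int:
    """Rule E: how many queued new DATA chunks (sizes in bytes) may go out now."""
    effective_cwnd = burst_limited_cwnd(cwnd, flightsize, max_burst, pmdcs)
    sent = 0
    for size in pending:
        if not rule_a_allows(peer_rwnd, size):
            break
        if not rule_b_allows(flightsize, effective_cwnd, pmdcs, size):
            break
        flightsize += size
        peer_rwnd -= size  # the peer's rwnd shrinks by each chunk sent (Section 6.2.1)
        sent += 1
    return sent
```

For example, with cwnd = 4000, PMDCS = 1000, a large rwnd, and ten 1000-byte chunks queued, the sketch admits four chunks when Max.Burst is 4 and two when Max.Burst is 2.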
Multiple DATA chunks committed for transmission MAY be bundled in a single packet. Furthermore, DATA chunks being retransmitted MAY be bundled with new DATA chunks, as long as the resulting SCTP packet size does not exceed the PMTU. A ULP can request that no bundling is performed, but this only turns off any delays that an SCTP implementation might be using to increase bundling efficiency. It does not in itself stop all bundling from occurring (e.g., in case of congestion or retransmission).
Before an endpoint transmits a DATA chunk, if any received DATA chunks have not been acknowledged (e.g., due to delayed ack), the sender SHOULD create a SACK chunk and bundle it with the outbound DATA chunk, as long as the size of the final SCTP packet does not exceed the current PMTU. See Section 6.2.
When the window is full (i.e., transmission is disallowed by rule A and/or rule B), the sender MAY still accept send requests from its upper layer but MUST transmit no more DATA chunks until some or all of the outstanding DATA chunks are acknowledged and transmission is allowed by rule A and rule B again.
Whenever a transmission or retransmission is made to any address, if the T3-rtx timer of that address is not currently running, the sender MUST start that timer. If the timer for that address is already running, the sender MUST restart the timer if the earliest (i.e., lowest TSN) outstanding DATA chunk sent to that address is being retransmitted. Otherwise, the data sender MUST NOT restart the timer.
When starting or restarting the T3-rtx timer, the timer value SHOULD be adjusted according to the timer rules defined in Sections 6.3.2 and 6.3.3.
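The start/restart rules above can be sketched per destination address as follows. This Python sketch assumes plain integer TSNs without wraparound, a monotonic clock value supplied by the caller, and that a retransmission goes back to the same address as the original; `DestinationTimer` and `on_transmit` are hypothetical names, and the RTO value itself would come from Sections 6.3.2 and 6.3.3.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DestinationTimer:
    rto: float                                # current RTO for this address, in seconds
    expires_at: Optional[float] = None        # None means T3-rtx is not running
    outstanding: Set[int] = field(default_factory=set)  # TSNs outstanding to this address

    @property
    def running(self) -> bool:
        return self.expires_at is not None


def on_transmit(dest: DestinationTimer, tsn: int, retransmission: bool, now: float) -> None:
    """Apply the T3-rtx start/restart rules to one (re)transmission of a DATA chunk."""
    dest.outstanding.add(tsn)
    if not dest.running:
        dest.expires_at = now + dest.rto      # not running: MUST start it
    elif retransmission and tsn == min(dest.outstanding):
        dest.expires_at = now + dest.rto      # earliest outstanding TSN resent: MUST restart
    # Otherwise the running timer MUST NOT be restarted.
```

Retransmitting a later TSN leaves the deadline untouched, which keeps a stream of retransmissions from postponing the timer indefinitely.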
The data sender MUST NOT use a TSN that is more than 2^31 - 1 above the beginning TSN of the current send window.
For each stream, the data sender MUST NOT have more than 2^16 - 1 ordered user messages in the current send window.
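Because TSNs are 32-bit values that wrap around, the 2^31 - 1 limit keeps "above" unambiguous when TSNs are compared with serial number arithmetic; the 2^16 - 1 limit plays the same role for 16-bit Stream Sequence Numbers. A minimal Python sketch of both send-window checks, assuming RFC 1982-style modular distance; the function names are hypothetical.

```python
TSN_MODULUS = 2**32                # TSNs are 32-bit serial numbers
MAX_TSN_AHEAD = 2**31 - 1
MAX_ORDERED_IN_WINDOW = 2**16 - 1  # Stream Sequence Numbers are 16 bits


def tsn_ahead(window_start: int, tsn: int) -> int:
    """How far tsn lies above window_start, modulo 2^32."""
    return (tsn - window_start) % TSN_MODULUS


def may_use_tsn(window_start: int, tsn: int) -> bool:
    # A TSN more than 2^31 - 1 above the start of the send window is not allowed.
    return tsn_ahead(window_start, tsn) <= MAX_TSN_AHEAD


def may_queue_ordered_message(ordered_in_window: int) -> bool:
    # Adding one more ordered message must keep the per-stream count <= 2^16 - 1.
    return ordered_in_window < MAX_ORDERED_IN_WINDOW
```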
Whenever the sender of a DATA chunk can benefit from the corresponding SACK chunk being sent back without delay, the sender MAY set the I bit in the DATA chunk header. The reason the sender set the I bit is irrelevant to the receiver.
Reasons for setting the I bit include, but are not limited to, the following (see Section 4 of RFC 7053 for a discussion of the benefits):
- The application requests that the I bit of the last DATA chunk of a user message be set when providing the user message to the SCTP implementation (see Section 11.1).
- The sender is in the SHUTDOWN-PENDING state.
- The sending of a DATA chunk fills the congestion or receiver window.
The SCTP endpoint MUST always acknowledge the reception of each valid DATA chunk when the DATA chunk received is inside its receive window.
When the receiver's advertised window is 0, the receiver MUST drop any new incoming DATA chunk with a TSN larger than the largest TSN received so far. Also, if the new incoming DATA chunk holds a TSN value less than the largest TSN received so far, then the receiver SHOULD drop the largest TSN held for reordering and accept the new incoming DATA chunk. In either case, if such a DATA chunk is dropped, the receiver MUST immediately send back a SACK chunk with the current receive window showing only DATA chunks received and accepted so far. The dropped DATA chunk(s) MUST NOT be included in the SACK chunk, as they were not accepted. The receiver MUST also have an algorithm for advertising its receive window to avoid receiver silly window syndrome (SWS), as described in RFC 1122. The algorithm can be similar to the one described in Section 4.2.3.3 of RFC 1122.
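The zero-window rule can be sketched as a small decision function. This Python sketch assumes plain integer TSNs without wraparound, that duplicates were already filtered out by the caller, and that `held` is the set of TSNs buffered out of order for reordering; the function name and return convention are hypothetical.

```python
from typing import Optional, Set, Tuple


def on_data_with_zero_window(held: Set[int], largest_tsn: int,
                             new_tsn: int) -> Tuple[bool, Optional[int]]:
    """Handle a new (non-duplicate) DATA chunk while the advertised window is 0.

    Returns (accepted, dropped_tsn). Whenever dropped_tsn is not None, the
    receiver MUST immediately send a SACK chunk that omits the dropped TSN.
    `held` is updated in place.
    """
    if new_tsn > largest_tsn:
        return False, new_tsn                 # MUST drop: above the largest TSN received
    if held:
        victim = max(held)                    # largest TSN held for reordering
        if victim > new_tsn:
            held.remove(victim)               # SHOULD drop it ...
            held.add(new_tsn)                 # ... and accept the new, lower TSN
            return True, victim
    return False, new_tsn                     # nothing larger to evict: drop the new chunk
```

Preferring the lower TSN helps fill the gap in front of the Cumulative TSN Ack, which is what eventually lets buffered data be delivered and the window reopen.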
The guidelines on the delayed acknowledgement algorithm specified in Section 4.2 of RFC 5681 SHOULD be followed. Specifically, an acknowledgement SHOULD be generated for at least every second packet (not every second DATA chunk) received and SHOULD be generated within 200 ms of the arrival of any unacknowledged DATA chunk. In some situations, it might be beneficial for an SCTP transmitter to be more conservative than the algorithms detailed in this document allow. However, an SCTP transmitter MUST NOT be more aggressive in sending SACK chunks than the following algorithms allow.
An SCTP receiver MUST NOT generate more than one SACK chunk for every incoming packet, other than to update the offered window as the receiving application consumes new data. When the window opens up, an SCTP receiver SHOULD send additional SACK chunks to update the window even if no new data is received. The receiver MUST avoid sending a large number of window updates, in particular large bursts of them. One way to achieve this is to send a window update only if the window can be increased by at least a quarter of the receive buffer size of the association.
Implementation Note: The maximum delay for generating an acknowledgement MAY be configured by the SCTP administrator, either statically or dynamically, in order to meet the specific timing requirement of the protocol being carried.
An implementation MUST NOT allow the maximum delay (protocol parameter 'SACK.Delay') to be configured to be more than 500 ms. In other words, an implementation MAY lower the value of 'SACK.Delay' below 500 ms but MUST NOT raise it above 500 ms.
Acknowledgements MUST be sent in SACK chunks unless shutdown was requested by the ULP, in which case an endpoint MAY send an acknowledgement in the SHUTDOWN chunk. A SACK chunk can acknowledge the reception of multiple DATA chunks. See Section 3.3.4 for the SACK chunk format. In particular, the SCTP endpoint MUST fill in the Cumulative TSN Ack field to indicate the latest sequential TSN (of a valid DATA chunk) it has received. Any received DATA chunks with TSN greater than the value in the Cumulative TSN Ack field are reported in the Gap Ack Block fields. The SCTP endpoint MUST report as many Gap Ack Blocks as can fit in a single SACK chunk such that the size of the SCTP packet does not exceed the current PMTU.
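To make the Cumulative TSN Ack and Gap Ack Block fields concrete, the following Python sketch derives both from the set of accepted TSNs. It assumes plain integers without TSN wraparound, reports each block as (start, end) offsets relative to the Cumulative TSN Ack (the encoding used by the SACK chunk format in Section 3.3.4), and takes a caller-supplied `max_blocks` standing in for however many blocks fit within the PMTU. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def build_sack_fields(cum_tsn_ack: int, received: Set[int],
                      max_blocks: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the new Cumulative TSN Ack and up to max_blocks Gap Ack Blocks."""
    # Advance the cumulative ack over every TSN received in sequence.
    while cum_tsn_ack + 1 in received:
        cum_tsn_ack += 1
    blocks: List[Tuple[int, int]] = []
    for tsn in sorted(t for t in received if t > cum_tsn_ack):
        offset = tsn - cum_tsn_ack
        if blocks and offset == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], offset)   # extend the current block
        elif len(blocks) < max_blocks:
            blocks.append((offset, offset))        # open a new block
        else:
            break                                  # no room for more blocks
    return cum_tsn_ack, blocks


# TSNs 101-102, 104-105, and 107 received after a previous ack of 100.
print(build_sack_fields(100, {101, 102, 104, 105, 107}, max_blocks=4))
# -> (102, [(2, 3), (5, 5)])
```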
The SHUTDOWN chunk does not contain Gap Ack Block fields. Therefore, the endpoint SHOULD use a SACK chunk instead of the SHUTDOWN chunk to acknowledge DATA chunks received out of order.
Upon receipt of an SCTP packet containing a DATA chunk with the I bit set, the receiver SHOULD NOT delay the sending of the corresponding SACK chunk, i.e., the receiver SHOULD immediately respond with the corresponding SACK chunk.
When a packet arrives with duplicate DATA chunk(s) and with no new DATA chunk(s), the endpoint MUST immediately send a SACK chunk with no delay. If a packet arrives with duplicate DATA chunk(s) bundled with new DATA chunks, the endpoint MAY immediately send a SACK chunk. Normally, receipt of duplicate DATA chunks will occur when the original SACK chunk was lost and the peer's RTO has expired. The duplicate TSN(s) SHOULD be reported in the SACK chunk as duplicate.
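The acknowledgement rules above combine into a small scheduling decision per incoming packet. The Python sketch below is one hypothetical way to structure it: it sends a SACK immediately for every second packet carrying new DATA, for a packet with the I bit set, and for a packet containing only duplicates, and otherwise arms a delayed-ack deadline of 'SACK.Delay', rejecting configurations above 500 ms. It also shows the quarter-of-the-buffer window-update heuristic mentioned above. Timestamps are integer milliseconds supplied by the caller, duplicates bundled with new data are treated like new data (the optional immediate SACK is not taken), and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SACK_DELAY_MS = 500  # 'SACK.Delay' MUST NOT be configured above 500 ms


@dataclass
class SackScheduler:
    sack_delay_ms: int = 200               # acknowledge within 200 ms, per the guideline above
    unacked_packets: int = 0               # packets with new DATA not yet acknowledged
    deadline_ms: Optional[int] = None      # delayed-ack deadline, if armed

    def __post_init__(self) -> None:
        if not 0 < self.sack_delay_ms <= MAX_SACK_DELAY_MS:
            raise ValueError("SACK.Delay must be in (0, 500] ms")

    def _sack_now(self) -> bool:
        self.unacked_packets = 0
        self.deadline_ms = None
        return True

    def on_packet(self, now_ms: int, new_data: bool, duplicates: bool, i_bit: bool) -> bool:
        """Return True if a SACK chunk should be sent immediately for this packet."""
        if duplicates and not new_data:
            return self._sack_now()        # duplicates only: MUST SACK without delay
        if not new_data:
            return False                   # no DATA chunks: nothing new to acknowledge
        if i_bit:
            return self._sack_now()        # I bit set: SHOULD NOT delay
        self.unacked_packets += 1
        if self.unacked_packets >= 2:
            return self._sack_now()        # at least every second packet
        if self.deadline_ms is None:
            self.deadline_ms = now_ms + self.sack_delay_ms
        return False

    def poll(self, now_ms: int) -> bool:
        """Return True if the delayed-ack deadline has passed."""
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            return self._sack_now()
        return False


def window_update_worthwhile(last_advertised: int, current_a_rwnd: int, buffer_size: int) -> bool:
    # Send a pure window update only if the window grew by at least a quarter of the buffer.
    return 4 * (current_a_rwnd - last_advertised) >= buffer_size
```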
When an endpoint receives a SACK chunk, it MAY use the duplicate TSN information to determine if SACK chunk loss is occurring. Further use of this data is for future study.
The data receiver is responsible for maintaining its receive buffers. The data receiver SHOULD notify the data sender in a timely manner of changes in its ability to receive data. How an implementation manages its receive buffers is dependent on many factors (e.g., operating system, memory management system, amount of memory, etc.). However, the data sender strategy defined in Section 6.2.1 is based on the assumption of receiver operation similar to the following:
A) At initialization of the association, the endpoint tells the peer how much receive buffer space it has allocated to the association in the INIT or INIT ACK chunk. The endpoint sets a_rwnd to this value.
B) As DATA chunks are received and buffered, decrement a_rwnd by the number of bytes received and buffered. This is, in effect, closing rwnd at the data sender and restricting the amount of data it can transmit.
C) As DATA chunks are delivered to the ULP and released from the receive buffers, increment a_rwnd by the number of bytes delivered to the upper layer. This is, in effect, opening up rwnd on the data sender and allowing it to send more data. The data receiver SHOULD NOT increment a_rwnd unless it has released bytes from its receive buffer. For example, if the receiver is holding fragmented DATA chunks in a reassembly queue, it SHOULD NOT increment a_rwnd.
D) When sending a SACK chunk, the data receiver SHOULD place the current value of a_rwnd into the a_rwnd field. The data receiver SHOULD take into account that the data sender will not retransmit DATA chunks that are acked via the Cumulative TSN Ack (i.e., will drop from its retransmit queue).
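Rules A through D amount to simple byte accounting on the receiver. A minimal Python sketch, assuming the receiver tracks the bytes buffered for the association; the `ReceiveBuffer` class is hypothetical. Bytes are released only when delivered to the ULP (or dropped, as discussed next), so fragments waiting in a reassembly queue keep the window closed.

```python
class ReceiveBuffer:
    """Tracks a_rwnd for one association (receiver side)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # rule A: space announced in INIT or INIT ACK
        self.buffered = 0          # bytes received but not yet released

    def on_data_buffered(self, nbytes: int) -> None:
        self.buffered += nbytes    # rule B: closes the sender's rwnd

    def on_released(self, nbytes: int) -> None:
        # Rule C: only bytes actually freed (delivered to the ULP, or dropped)
        # reopen the window; fragments still being reassembled do not.
        if nbytes > self.buffered:
            raise ValueError("cannot release more than is buffered")
        self.buffered -= nbytes

    @property
    def a_rwnd(self) -> int:
        # Rule D: value placed in the a_rwnd field of the next SACK chunk;
        # never negative even if more than the capacity is buffered.
        return max(0, self.capacity - self.buffered)
```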
Under certain circumstances, the data receiver MAY drop DATA chunks that it has received but has not released from its receive buffers (i.e., delivered to the ULP). These DATA chunks might have been acked in Gap Ack Blocks. For example, the data receiver might be holding data in its receive buffers while reassembling a fragmented user message from its peer when it runs out of receive buffer space. It MAY drop these DATA chunks even though it has acknowledged them in Gap Ack Blocks. If a data receiver drops DATA chunks, it MUST NOT include them in Gap Ack Blocks in subsequent SACK chunks until they are received again via retransmission. In addition, the endpoint SHOULD take into account the dropped data when calculating its a_rwnd.
An endpoint SHOULD NOT revoke a SACK chunk and discard data. Only in extreme circumstances (such as running out of buffer space) might an endpoint use this procedure. The data receiver SHOULD take into account that dropping data that has been acked in Gap Ack Blocks can result in suboptimal retransmission strategies in the data sender and thus in suboptimal performance.
The following example illustrates the use of delayed acknowledgements:
Endpoint A                              Endpoint Z
{App sends 3 messages; strm 0}
DATA [TSN=7,Strm=0,Seq=3] ------------> (ack delayed)
(Start T3-rtx timer)
DATA [TSN=8,Strm=0,Seq=4] ------------> (send ack)
                               /------- SACK [TSN Ack=8,block=0]
(cancel T3-rtx timer)   <-----/
DATA [TSN=9,Strm=0,Seq=5] ------------> (ack delayed)
(Start T3-rtx timer)
                                        ...
                                        {App sends 1 message; strm 1}
                                        (bundle SACK with DATA)
                                 /----- SACK [TSN Ack=9,block=0] \
                                /       DATA [TSN=6,Strm=1,Seq=2]
(cancel T3-rtx timer)   <------/        (Start T3-rtx timer)
(ack delayed)
(send ack)
SACK [TSN Ack=6,block=0] -------------> (cancel T3-rtx timer)
If an endpoint receives a DATA chunk with no user data (i.e., the Length field is set to 16), it SHOULD send an ABORT chunk with a "No User Data" error cause.
An endpoint SHOULD NOT send a DATA chunk with no user data part. This avoids the need to be able to return a zero-length user message in the API, especially in the socket API specified in RFC 6458.
Each SACK chunk an endpoint receives contains an a_rwnd value. This value represents the amount of buffer space the data receiver, at the time of transmitting the SACK chunk, has left of its total receive buffer space (as specified in the INIT/INIT ACK chunk). Using a_rwnd, Cumulative TSN Ack, and Gap Ack Blocks, the data sender can develop a representation of the peer's receive buffer space.
One of the problems the data sender takes into account when processing a SACK chunk is that a SACK chunk can be received out of order. That is, a SACK chunk sent by the data receiver can pass an earlier SACK chunk and be received first by the data sender. If a SACK chunk is received out of order, the data sender can develop an incorrect view of the peer's receive buffer space.
Since there is no explicit identifier that can be used to detect out-of-order SACK chunks, the data sender uses heuristics to determine if a SACK chunk is new.
An endpoint SHOULD use the following rules to calculate the rwnd, using the a_rwnd value, the Cumulative TSN Ack, and Gap Ack Blocks in a received SACK chunk.
A) At the establishment of the association, the endpoint initializes the rwnd to the Advertised Receiver Window Credit (a_rwnd) the peer specified in the INIT or INIT ACK chunk.
B) Any time a DATA chunk is transmitted (or retransmitted) to a peer, the endpoint subtracts the data size of the chunk from the rwnd of that peer.
C) Any time a DATA chunk is marked for retransmission, either via T3-rtx timer expiration (Section 6.3.3) or via Fast Retransmit (Section 7.2.4), add the data size of those chunks to the rwnd.
D) Any time a SACK chunk arrives, the endpoint performs the following:
   i) If Cumulative TSN Ack is less than the Cumulative TSN Ack Point, then drop the SACK chunk. Since Cumulative TSN Ack is monotonically increasing, a SACK chunk whose Cumulative TSN Ack is less than the Cumulative TSN Ack Point indicates an out-of-order SACK chunk.
   ii) Set rwnd equal to the newly received a_rwnd minus the number of bytes still outstanding after processing the Cumulative TSN Ack and the Gap Ack Blocks.
   iii) If the SACK chunk is missing a TSN that was previously acknowledged via a Gap Ack Block (e.g., the data receiver reneged on the data), then consider the corresponding DATA chunk possibly missing: count one miss indication towards Fast Retransmit as described in Section 7.2.4, and if no retransmit timer is running for the destination address to which the DATA chunk was originally transmitted, then start T3-rtx for that destination address.
   iv) If the Cumulative TSN Ack matches or exceeds the Fast Recovery exit point (Section 7.2.4), Fast Recovery is exited.
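The following Python sketch covers rules A, B, C, D.i, and D.ii; reneging (D.iii) and Fast Recovery (D.iv) are left out. It assumes plain integer TSNs without wraparound and leaves computing the bytes still outstanding after the SACK chunk is processed to the caller. `PeerWindow` and its methods are hypothetical names.

```python
class PeerWindow:
    """Data sender's view of the peer's receive window (rwnd)."""

    def __init__(self, initial_a_rwnd: int, initial_tsn: int) -> None:
        self.rwnd = initial_a_rwnd                 # rule A
        self.cum_tsn_ack_point = initial_tsn - 1   # nothing acknowledged yet

    @property
    def usable(self) -> int:
        return max(0, self.rwnd)                  # rwnd can dip below zero transiently

    def on_transmit(self, data_size: int) -> None:
        self.rwnd -= data_size                    # rule B, also for retransmissions

    def on_marked_for_retransmission(self, data_size: int) -> None:
        self.rwnd += data_size                    # rule C (T3-rtx or Fast Retransmit)

    def on_sack(self, cum_tsn_ack: int, a_rwnd: int, outstanding_after: int) -> bool:
        """Rule D; returns False if the SACK chunk was dropped as out of order."""
        if cum_tsn_ack < self.cum_tsn_ack_point:
            return False                          # D.i: older than what was already seen
        self.cum_tsn_ack_point = cum_tsn_ack
        self.rwnd = a_rwnd - outstanding_after    # D.ii
        return True
```

Rule D.ii recomputes rwnd from scratch on each accepted SACK chunk, so any drift accumulated through rules B and C is corrected whenever fresh feedback arrives.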
An SCTP endpoint uses a retransmission timer T3-rtx to ensure data delivery in the absence of any feedback from its peer. The duration of this timer is referred to as RTO (retransmission timeout).
When an endpoint's peer is multi-homed, the endpoint will calculate a separate RTO for each different destination transport address of its peer endpoint.
The computation and management of RTO in SCTP follow closely how TCP manages its retransmission timer. To compute the current RTO, an endpoint maintains two state variables per destination transport address: SRTT (smoothed round-trip time) and RTTVAR (round-trip time variation).
The rules governing the computation of SRTT, RTTVAR, and RTO are as follows:
C1) Until an RTT measurement has been made for a packet sent to the given destination transport address, set RTO to the protocol parameter 'RTO.Initial'.
C2) When the first RTT measurement R is made, perform:
      SRTT = R
      RTTVAR = R/2
      RTO = SRTT + 4 * RTTVAR
C3) When a new RTT measurement R' is made, perform:
      RTTVAR = (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      SRTT = (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
   Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment.
   After the computation, update:
      RTO = SRTT + 4 * RTTVAR
C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of 'RTO.Alpha' and 'RTO.Beta' in rule C3 above SHOULD be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round trip and using 'RTO.Alpha' and 'RTO.Beta' as given in rule C3. However, the exact nature of these adjustments remains a research issue.
C5) Karn's algorithm: RTT measurements MUST NOT be made using chunks that were retransmitted (and thus for which it is ambiguous whether the reply was for the first instance of the chunk or for a later instance).
   RTT measurements SHOULD only be made using a DATA chunk with TSN r if no DATA chunk with TSN less than or equal to r was retransmitted since the DATA chunk with TSN r was sent first.
C6) Whenever RTO is computed, if it is less than 'RTO.Min' seconds, then it is rounded up to 'RTO.Min' seconds. The reason for this rule is that RTOs that do not have a high minimum value are susceptible to unnecessary timeouts [ALLMAN99].
C7) A maximum value MAY be placed on RTO, provided it is at least 'RTO.Max' seconds.
There is no requirement for the clock granularity G used for computing RTT measurements and the different state variables, other than:
G1) Whenever RTTVAR is computed, if RTTVAR == 0, then adjust RTTVAR = G.
Experience [ALLMAN99] has shown that finer clock granularities (less than 100 ms) perform somewhat better than coarser granularities.
See Section 16 for suggested parameter values.
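Rules C1-C7, G1, and the timer back-off of rule E2 can be collected into one estimator per destination address. The Python sketch below is illustrative: its defaults (RTO.Initial = 1 s, RTO.Min = 1 s, RTO.Max = 60 s, RTO.Alpha = 1/8, RTO.Beta = 1/4, G = 1 ms) are assumptions to be checked against Section 16, it applies the optional RTO.Max cap of rules C7 and E2, and the caller remains responsible for rules C4 and C5 (one measurement per round trip, never from retransmitted chunks). The class name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RtoEstimator:
    rto_initial: float = 1.0      # seconds (assumed default)
    rto_min: float = 1.0
    rto_max: float = 60.0
    alpha: float = 1 / 8          # RTO.Alpha
    beta: float = 1 / 4           # RTO.Beta
    granularity: float = 0.001    # clock granularity G, in seconds
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.rto = self.rto_initial                      # C1: no measurement yet

    def _set_rto(self, value: float) -> None:
        self.rto = min(max(value, self.rto_min), self.rto_max)  # C6, then C7 cap

    def on_measurement(self, r: float) -> None:
        if self.srtt is None or self.rttvar is None:     # C2: first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:                                            # C3: RTTVAR uses the old SRTT
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        if self.rttvar == 0:                             # G1
            self.rttvar = self.granularity
        self._set_rto(self.srtt + 4 * self.rttvar)

    def back_off(self) -> None:
        self._set_rto(self.rto * 2)                      # E2, bounded by RTO.Max
```

For example, after a first measurement of 0.5 s, SRTT = 0.5 s, RTTVAR = 0.25 s, and RTO = 1.5 s; a second identical measurement lowers RTTVAR to 0.1875 s and RTO to 1.25 s.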
The rules for managing the retransmission timer are as follows:
R1) Every time a DATA chunk is sent to any address (including a retransmission), if the T3-rtx timer of that address is not running, start it running so that it will expire after the RTO of that address. The RTO used here is that obtained after any doubling due to previous T3-rtx timer expirations on the corresponding destination address as discussed in rule E2 below.
R2) Whenever all outstanding data sent to an address have been acknowledged, turn off the T3-rtx timer of that address.
R3) Whenever a SACK chunk is received that acknowledges the DATA chunk with the earliest outstanding TSN for that address, restart the T3-rtx timer for that address with its current RTO (if there is still outstanding data on that address).
R4) Whenever a SACK chunk is received missing a TSN that was previously acknowledged via a Gap Ack Block, start the T3-rtx for the destination address to which the DATA chunk was originally transmitted if it is not already running.
The following example shows the use of various timer rules (assuming that the receiver uses delayed acks).
Endpoint A                              Endpoint Z
{App begins to send}
DATA [TSN=7,Strm=0,Seq=3] ------------> (ack delayed)
(Start T3-rtx timer)
                                        {App sends 1 message; strm 1}
                                        (bundle ack with data)
DATA [TSN=8,Strm=0,Seq=4] ----\     /-- SACK [TSN Ack=7,Block=0]
                               \   /    DATA [TSN=6,Strm=1,Seq=2]
                                \ /     (Start T3-rtx timer)
                                 \
                                / \
(Restart T3-rtx timer)  <------/   \--> (ack delayed)
(ack delayed)
(send ack)
SACK [TSN Ack=6,Block=0] -------------> (Cancel T3-rtx timer)
                                        ...
                                        (send ack)
(Cancel T3-rtx timer)   <-------------- SACK [TSN Ack=8,Block=0]
Whenever the retransmission timer T3-rtx expires for a destination address, do the following:
E1) For the destination address for which the timer expires, adjust its ssthresh with rules defined in Section 7.2.3 and set cwnd = PMDCS.
E2) For the destination address for which the timer expires, set RTO = RTO * 2 ("back off the timer"). The maximum value discussed in rule C7 above ('RTO.Max') MAY be used to provide an upper bound to this doubling operation.
E3) Determine how many of the earliest (i.e., lowest TSN) outstanding DATA chunks for the address for which the T3-rtx has expired will fit into a single SCTP packet, subject to the PMTU corresponding to the destination transport address to which the retransmission is being sent (this might be different from the address for which the timer expires; see Section 6.4). Call this value K. Bundle and retransmit those K DATA chunks in a single packet to the destination endpoint.
E4) Start the retransmission timer T3-rtx on the destination address to which the retransmission is sent if rule R1 above indicates to do so. The RTO to be used for starting T3-rtx SHOULD be the one for the destination address to which the retransmission is sent, which, when the receiver is multi-homed, might be different from the destination address for which the timer expired (see Section 6.4 below).
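Rules E1-E3 can be sketched as follows. The Python is a simplified illustration: ssthresh handling from Section 7.2.3 is omitted, `space` stands for the room available for chunks in one packet toward the retransmission destination (deriving it from that destination's PMTU and header overheads is left to the caller), and chunk sizes are assumed to be on-the-wire sizes listed in ascending TSN order. `Destination` and `on_t3_rtx_expiry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Destination:
    cwnd: int
    pmdcs: int
    rto: float
    rto_max: float = 60.0


def on_t3_rtx_expiry(dest: Destination, outstanding_sizes: List[int], space: int) -> int:
    """Return K, the number of earliest outstanding DATA chunks to resend in one packet."""
    # E1: ssthresh is adjusted per Section 7.2.3 (not shown); cwnd collapses to PMDCS.
    dest.cwnd = dest.pmdcs
    # E2: back off the timer, optionally bounded by RTO.Max.
    dest.rto = min(dest.rto * 2, dest.rto_max)
    # E3: count the lowest-TSN chunks that fit into a single packet.
    k, used = 0, 0
    for size in outstanding_sizes:
        if used + size > space:
            break
        used += size
        k += 1
    return k
```

Chunks beyond the first K are not sent in this packet; they are marked for retransmission and sent as cwnd allows, as described below.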
After retransmitting, once a new RTT measurement is obtained (which can happen only when new data has been sent and acknowledged, per rule C5, or for a measurement made from a HEARTBEAT chunk; see Section 8.3), the computation in rule C3 is performed, including the computation of RTO, which might result in "collapsing" RTO back down after it has been subject to doubling (rule E2).
Any DATA chunks that were sent to the address for which the T3-rtx timer expired but did not fit in an SCTP packet of size smaller than or equal to the PMTU (rule E3 above) SHOULD be marked for retransmission and sent as soon as cwnd allows (normally, when a SACK chunk arrives).
The final rule for managing the retransmission timer concerns failover (see Section 6.4.1):
F1) Whenever an endpoint switches from the current destination transport address to a different one, the current retransmission timers are left running. As soon as the endpoint transmits a packet containing DATA chunk(s) to the new transport address, start the timer on that transport address, using the RTO value of the destination address to which the data is being sent, if rule R1 indicates to do so.
An SCTP endpoint is considered multi-homed if there is more than one transport address that can be used as a destination address to reach that endpoint.
Moreover, the ULP of an endpoint selects one of the multiple destination addresses of a multi-homed peer endpoint as the primary path (see Sections 5.1.2 and 11.1 for details).
By default, an endpoint SHOULD always transmit to the primary path, unless the SCTP user explicitly specifies the destination transport address (and possibly source transport address) to use.
An endpoint SHOULD transmit reply chunks (e.g., INIT ACK, COOKIE ACK, and HEARTBEAT ACK) in response to control chunks to the same destination transport address from which it received the control chunk to which it is replying.
The selection of the destination transport address for packets containing SACK chunks is implementation dependent. However, an endpoint SHOULD NOT vary the destination transport address of a SACK chunk when it receives DATA chunks coming from the same source address.
When acknowledging multiple DATA chunks received in packets from different source addresses in a single SACK chunk, the SACK chunk MAY be transmitted to one of the destination transport addresses from which the DATA or control chunks being acknowledged were received.
When a receiver of a duplicate DATA chunk sends a SACK chunk to a multi-homed endpoint, it MAY be beneficial to vary the destination address and not use the source address of the DATA chunk. The reason is that receiving a duplicate from a multi-homed endpoint might indicate that the return path (as specified in the source address of the DATA chunk) for the SACK chunk is broken.
Furthermore, when its peer is multi-homed, an endpoint SHOULD try to retransmit a chunk that timed out to an active destination transport address that is different from the last destination address to which the chunk was sent.
When its peer is multi-homed, an endpoint SHOULD send fast retransmissions to the same destination transport address to which the original data was sent. If the primary path has been changed and the original data was sent to the old primary path before the Fast Retransmit, the implementation MAY send it to the new primary path.
Retransmissions do not affect the total outstanding data count. However, if the DATA chunk is retransmitted onto a different destination address, the outstanding data counts on both the new destination address and the old destination address to which the DATA chunk was last sent are adjusted accordingly.
Some of the transport addresses of a multi-homed SCTP endpoint might become inactive due to either the occurrence of certain error conditions (see Section 8.2) or adjustments from the SCTP user.
When there is outbound data to send and the primary path becomes inactive (e.g., due to failures), or when the SCTP user explicitly requests to send data to an inactive destination transport address, the SCTP endpoint SHOULD try to send the data to an alternate active destination transport address, if one exists, before reporting an error to its ULP.
When retransmitting data that timed out, if the endpoint is multi-homed, it needs to consider each source-destination address pair in its retransmission selection policy. When retransmitting timed-out data, the endpoint SHOULD attempt to pick the most divergent source-destination pair from the original source-destination pair to which the packet was transmitted.
Note: Rules for picking the most divergent source-destination pair are an implementation decision and are not specified within this document.
Every DATA chunk MUST carry a valid stream identifier. If an endpoint receives a DATA chunk with an invalid stream identifier, it SHOULD acknowledge the reception of the DATA chunk following the normal procedure, immediately send an ERROR chunk with cause set to "Invalid Stream Identifier" (see Section 3.3.10), and discard the DATA chunk. The endpoint MAY bundle the ERROR chunk and the SACK chunk in the same packet.
The Stream Sequence Number in all the outgoing streams MUST start from 0 when the association is established. The Stream Sequence Number of an outgoing stream MUST be incremented by 1 for each ordered user message sent on that outgoing stream. In particular, when the Stream Sequence Number reaches the value 65535, the next Stream Sequence Number MUST be set to 0. For unordered user messages, the Stream Sequence Number MUST NOT be changed.
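A minimal Python sketch of these outgoing Stream Sequence Number rules, assuming one SSN is assigned per user message (all fragments of that message carry the same value). The class name is hypothetical, and 0 is used as the arbitrary filler for unordered messages.

```python
class OutgoingStream:
    """Assigns Stream Sequence Numbers for one outgoing stream."""

    SSN_MODULUS = 2**16

    def __init__(self) -> None:
        self.next_ssn = 0                  # MUST start from 0

    def assign_ssn(self, unordered: bool) -> int:
        if unordered:
            return 0                       # arbitrary filler; the counter is unchanged
        ssn = self.next_ssn
        self.next_ssn = (ssn + 1) % self.SSN_MODULUS   # 65535 wraps to 0
        return ssn
```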
Within a stream, an endpoint MUST deliver DATA chunks received with the U flag set to 0 to the upper layer according to the order of their Stream Sequence Number. If DATA chunks arrive out of order of their Stream Sequence Number, the endpoint MUST hold the received DATA chunks from delivery to the ULP until they are reordered.
However, an SCTP endpoint can indicate that no ordered delivery is required for a particular DATA chunk transmitted within the stream by setting the U flag of the DATA chunk to 1.
When an endpoint receives a DATA chunk with the U flag set to 1, it bypasses the ordering mechanism and immediately delivers the data to the upper layer (after reassembly if the user data is fragmented by the data sender).
This provides an effective way of transmitting "out-of-band" data in a given stream. Also, a stream can be used as an "unordered" stream by simply setting the U flag to 1 in all DATA chunks sent through that stream.
Implementation Note: When sending an unordered DATA chunk, an implementation MAY choose to place the DATA chunk in an outbound packet that is at the head of the outbound transmission queue if possible.
The 'Stream Sequence Number' field in a DATA chunk with the U flag set to 1 has no significance. The sender can fill the 'Stream Sequence Number' field with an arbitrary value, but the receiver MUST ignore the field.
Note: When transmitting ordered and unordered data, an endpoint does not increment its Stream Sequence Number when transmitting a DATA chunk with the U flag set to 1.
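The receive-side ordering rules can be sketched as a per-stream holding area keyed by Stream Sequence Number, with unordered messages bypassing it. This Python sketch assumes the user messages have already been reassembled and uses a plain list in place of delivery to the ULP; the class name and both simplifications are hypothetical, and duplicate or stale SSNs are not handled.

```python
from typing import Dict, List


class InboundStream:
    """Hypothetical per-stream delivery queue for already-reassembled messages."""

    def __init__(self) -> None:
        self.expected_ssn = 0                # next SSN the ULP may receive
        self.pending: Dict[int, bytes] = {}  # ordered messages held for reordering
        self.delivered: List[bytes] = []     # stands in for delivery to the ULP

    def on_message(self, ssn: int, unordered: bool, payload: bytes) -> None:
        if unordered:
            # U flag = 1: bypass ordering entirely; the SSN field is ignored.
            self.delivered.append(payload)
            return
        # U flag = 0: hold the message until every earlier SSN has been delivered.
        self.pending[ssn] = payload
        while self.expected_ssn in self.pending:
            self.delivered.append(self.pending.pop(self.expected_ssn))
            self.expected_ssn = (self.expected_ssn + 1) % 65536
```

Delivering SSN 1 before SSN 0 leaves it parked in `pending`; an unordered message arriving meanwhile is delivered at once.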
Upon the reception of a new DATA chunk, an endpoint examines the continuity of the TSNs received. If the endpoint detects a gap in the received DATA chunk sequence, it SHOULD send a SACK chunk with Gap Ack Blocks immediately. The data receiver continues sending a SACK chunk after receipt of each SCTP packet that does not fill the gap.
Based on the Gap Ack Block from the received SACK chunk, the endpoint can calculate the missing DATA chunks and make decisions on whether to retransmit them (see Section 6.2.1 for details).
Multiple gaps can be reported in one single SACK chunk (see Section 3.3.4).
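Each Gap Ack Block is a pair of offsets relative to the Cumulative TSN Ack, so a receiver whose Cumulative TSN Ack is 6 and that holds TSN 8 reports Start=2, End=2. The hypothetical helper below groups the TSNs received above the cumulative point into such blocks. It is a sketch only: it uses plain integer comparison and therefore ignores 32-bit TSN wraparound.

```python
from typing import Iterable, List, Tuple


def build_gap_ack_blocks(cum_tsn_ack: int,
                         received_above: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (start, end) offsets relative to the Cumulative TSN Ack.

    received_above holds TSNs received beyond the first missing TSN.
    TSN wraparound is ignored (hypothetical simplification).
    """
    blocks: List[Tuple[int, int]] = []
    for tsn in sorted(set(received_above)):
        offset = tsn - cum_tsn_ack
        if blocks and offset == blocks[-1][1] + 1:
            # Consecutive TSN: extend the current block.
            blocks[-1] = (blocks[-1][0], offset)
        else:
            # A hole precedes this TSN: start a new block.
            blocks.append((offset, offset))
    return blocks
```

For example, `build_gap_ack_blocks(10, [12, 13, 15, 17, 18])` yields three blocks covering offsets 2-3, 5, and 7-8.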
When its peer is multi-homed, the SCTP endpoint SHOULD always try to send the SACK chunk to the same destination address from which the last DATA chunk was received.
Upon the reception of a SACK chunk, the endpoint MUST remove all DATA chunks that have been acknowledged by the SACK chunk's Cumulative TSN Ack from its transmit queue. All DATA chunks with TSNs not included in the Gap Ack Blocks that are smaller than the highest-acknowledged TSN reported in the SACK chunk MUST be treated as "missing" by the sending endpoint. The number of "missing" reports for each outstanding DATA chunk MUST be recorded by the data sender to make retransmission decisions. See Section 7.2.4 for details.
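On the sending side, the Gap Ack Block offsets are turned back into TSNs. The following Python sketch applies these rules to a hypothetical dictionary that maps each outstanding TSN to its count of "missing" reports. Fast-retransmit thresholds (Section 7.2.4) are out of scope, and TSN wraparound is ignored. Run on the SACK in the example that follows (Cumulative TSN Ack 6, one block 2..2), it removes TSN 6 and records one missing report for TSN 7.

```python
from typing import Dict, List, Tuple


def process_sack(outstanding: Dict[int, int], cum_tsn_ack: int,
                 gap_blocks: List[Tuple[int, int]]) -> Dict[int, int]:
    """Update hypothetical sender state (TSN -> missing-report count) from one SACK."""
    # 1. Remove everything covered by the Cumulative TSN Ack.
    for tsn in [t for t in outstanding if t <= cum_tsn_ack]:
        del outstanding[tsn]
    if not gap_blocks:
        return outstanding
    # 2. Convert offsets (relative to the Cumulative TSN Ack) back into TSNs.
    gap_acked = {cum_tsn_ack + off
                 for start, end in gap_blocks
                 for off in range(start, end + 1)}
    highest_acked = cum_tsn_ack + max(end for _, end in gap_blocks)
    # 3. Outstanding TSNs below the highest acknowledged TSN that no block covers
    #    get one more "missing" report. Gap-acked chunks are not removed here;
    #    only the Cumulative TSN Ack removes chunks from the queue.
    for tsn in outstanding:
        if tsn < highest_acked and tsn not in gap_acked:
            outstanding[tsn] += 1
    return outstanding
```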
The following example shows the use of a SACK chunk to report a gap.
Endpoint A                                    Endpoint Z
{App sends 3 messages; strm 0}
DATA [TSN=6,Strm=0,Seq=2] ---------------> (ack delayed)
(Start T3-rtx timer)
DATA [TSN=7,Strm=0,Seq=3] --------> X (lost)
DATA [TSN=8,Strm=0,Seq=4] ---------------> (gap detected,
                                            immediately send ack)
                                /----- SACK [TSN Ack=6,Block=1,
                               /             Start=2,End=2]
                        <-----/
(remove 6 from out-queue,
 and mark 7 as "1" missing report)
The maximum number of Gap Ack Blocks that can be reported within a single SACK chunk is limited by the current PMTU. When a single SACK chunk cannot cover all the Gap Ack Blocks needed to be reported due to the PMTU limitation, the endpoint MUST send only one SACK chunk. This single SACK chunk MUST report the Gap Ack Blocks from the lowest to highest TSNs, within the size limit set by the PMTU, and leave the remaining higher TSNs unacknowledged.
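A rough byte budget shows how many Gap Ack Blocks fit. The SACK chunk has a 16-byte fixed part (chunk header, Cumulative TSN Ack, a_rwnd, and the two 16-bit counts), each Gap Ack Block takes 4 bytes, and the SCTP common header takes 12. The Python sketch below assumes a SACK-only packet with no Duplicate TSNs and an IPv4 header without options (20 bytes); the function name is hypothetical.

```python
from typing import List, Tuple

SCTP_COMMON_HEADER = 12  # source port, destination port, Verification Tag, checksum
SACK_FIXED_PART = 16     # chunk header + Cumulative TSN Ack + a_rwnd + two counts
GAP_BLOCK_SIZE = 4       # 16-bit Start offset + 16-bit End offset


def fit_gap_blocks(blocks: List[Tuple[int, int]], pmtu: int,
                   ip_header_len: int = 20) -> List[Tuple[int, int]]:
    """Keep the lowest Gap Ack Blocks that fit in one SACK-only packet.

    Assumes no Duplicate TSNs and nothing else bundled; ip_header_len=20 is
    IPv4 without options (an assumption, use 40 for a bare IPv6 header).
    """
    room = pmtu - ip_header_len - SCTP_COMMON_HEADER - SACK_FIXED_PART
    max_blocks = max(room // GAP_BLOCK_SIZE, 0)
    # Report from lowest to highest TSN; the highest blocks are dropped.
    return sorted(blocks)[:max_blocks]
```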
When sending an SCTP packet, the endpoint MUST strengthen the data integrity of the transmission by including the CRC32c checksum value calculated on the packet, as described below.
After the packet is constructed (containing the SCTP common header and one or more control or DATA chunks), the transmitter MUST:
1) fill in the proper Verification Tag in the SCTP common header and initialize the checksum field to 0,
2) calculate the CRC32c checksum of the whole packet, including the SCTP common header and all the chunks (refer to Appendix A for details of the CRC32c algorithm), and
3) put the resultant value into the checksum field in the common header and leave the rest of the bits unchanged.
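The three sending steps can be sketched in Python as follows. The CRC32c here is a plain bitwise implementation of the Castagnoli polynomial (reflected form 0x82F63B78), chosen for clarity rather than speed; Appendix A remains the normative description. The offsets follow the SCTP common header layout (Verification Tag at byte 4, checksum at byte 8). Writing the checksum in little-endian byte order is an assumption based on how widely used implementations store it, so confirm it against Appendix A before interoperating. `finalize_packet` is a hypothetical name.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def finalize_packet(packet: bytearray, verification_tag: int) -> bytearray:
    """Apply steps 1-3 to a fully built packet (common header + chunks)."""
    # Step 1: Verification Tag at offset 4; zero the checksum field at offset 8.
    struct.pack_into("!I", packet, 4, verification_tag)
    packet[8:12] = b"\x00\x00\x00\x00"
    # Step 2: checksum over the whole packet, header and all chunks.
    crc = crc32c(bytes(packet))
    # Step 3: store it (little-endian is an assumption, see the lead-in).
    struct.pack_into("<I", packet, 8, crc)
    return packet
```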
When an SCTP packet is received, the receiver MUST first check the CRC32c checksum as follows:
1) Store the received CRC32c checksum value aside.
2) Replace the 32 bits of the checksum field in the received SCTP packet with 0 and calculate a CRC32c checksum value of the whole received packet.
3) Verify that the calculated CRC32c checksum is the same as the received CRC32c checksum. If it is not, the receiver MUST treat the packet as an invalid SCTP packet.
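The receiving steps mirror the sending ones. This self-contained Python sketch carries its own bitwise CRC32c helper and assumes the checksum field is stored little-endian (an assumption to confirm against Appendix A). `checksum_ok` is a hypothetical name; a `False` result means the packet is treated as invalid and, by default, silently discarded.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def checksum_ok(packet: bytes) -> bool:
    """Receive-side steps 1-3; False means 'invalid SCTP packet'."""
    if len(packet) < 12:
        return False  # too short to hold the SCTP common header
    # Step 1: set the received value aside (byte order assumed little-endian).
    received = struct.unpack_from("<I", packet, 8)[0]
    # Step 2: zero the checksum field and recompute over the whole packet.
    zeroed = bytearray(packet)
    zeroed[8:12] = b"\x00\x00\x00\x00"
    # Step 3: compare.
    return crc32c(bytes(zeroed)) == received
```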
The default procedure for handling invalid SCTP packets is to silently discard them.
Any hardware implementation SHOULD permit alternative verification of the CRC in software.
An endpoint MAY support fragmentation when sending DATA chunks, but it MUST support reassembly when receiving DATA chunks. If an endpoint supports fragmentation, it MUST fragment a user message if the size of the user message to be sent causes the outbound SCTP packet size to exceed the current PMTU. An endpoint that does not support fragmentation and is requested to send a user message such that the outbound SCTP packet size would exceed the current PMTU MUST return an error to its upper layer and MUST NOT attempt to send the user message.
An SCTP implementation MAY provide a mechanism to the upper layer that disables fragmentation when sending DATA chunks. When fragmentation of DATA chunks is disabled, the SCTP implementation MUST behave in the same way as an implementation that does not support fragmentation; i.e., it rejects calls that would result in sending SCTP packets that exceed the current PMTU.
Implementation Note: In this error case, the SEND primitive discussed in Section 11.1.5 would need to return an error to the upper layer.
If its peer is multi-homed, the endpoint SHOULD choose a DATA chunk size smaller than or equal to the AMDCS.
Once a user message is fragmented, it cannot be re-fragmented. Instead, if the PMTU has been reduced, then IP fragmentation MUST be used. Therefore, an SCTP association can fail if IP fragmentation is not working on any path. See Section 7.3 for details of PMTU discovery.
When determining when to fragment, the SCTP implementation MUST take into account the SCTP packet header as well as the DATA chunk header(s). The implementation MUST also take into account the space required for a SACK chunk if bundling a SACK chunk with the DATA chunk.
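The overhead rules above translate into a simple byte budget. The Python sketch below computes the largest user-data payload a single DATA chunk can carry in one packet and how many full-size fragments a message needs, raising an error when fragmentation is unavailable or disabled. The 12-byte common header and 16-byte DATA chunk header are fixed sizes; the IP header length (20, IPv4 without options), the optional bundled SACK size, and the names `max_data_payload`, `plan_fragments`, and `MessageTooBig` are assumptions for illustration. It ignores the AMDCS, which can make real chunks smaller.

```python
SCTP_COMMON_HEADER = 12  # source port, destination port, Verification Tag, checksum
DATA_CHUNK_HEADER = 16   # type, flags, length, TSN, stream id, SSN, payload protocol id


class MessageTooBig(Exception):
    """Hypothetical error returned to the ULP when a send would exceed the PMTU."""


def max_data_payload(pmtu: int, ip_header_len: int = 20, sack_len: int = 0) -> int:
    """Largest user-data length for one DATA chunk that fits in one packet.

    sack_len is the (padded) size of a SACK chunk bundled in the same packet.
    """
    room = pmtu - ip_header_len - SCTP_COMMON_HEADER - sack_len - DATA_CHUNK_HEADER
    # Chunks are padded to a 4-byte boundary, so round the budget down.
    return max(room - room % 4, 0)


def plan_fragments(message_len: int, pmtu: int, fragmentation_enabled: bool = True,
                   ip_header_len: int = 20, sack_len: int = 0) -> int:
    """Number of full-size DATA chunks needed for one user message."""
    limit = max_data_payload(pmtu, ip_header_len, sack_len)
    if limit == 0:
        raise MessageTooBig("PMTU too small for any user data")
    if message_len <= limit:
        return 1
    if not fragmentation_enabled:
        # Fragmentation unsupported or disabled: reject, send nothing.
        raise MessageTooBig(f"{message_len} bytes exceeds {limit} bytes per packet")
    return (message_len + limit - 1) // limit
```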
Fragmentation takes the following steps:
1) The data sender MUST break the user message into a series of DATA chunks. The sender SHOULD choose a size of DATA chunks that is smaller than or equal to the AMDCS.
2) The transmitter MUST then assign, in sequence, a separate TSN to each of the DATA chunks in the series. The transmitter assigns the same Stream Sequence Number to each of the DATA chunks. If the user indicates that the user message is to be delivered using unordered delivery, then the U flag of each DATA chunk of the user message MUST be set to 1.
3) The transmitter MUST also set the B/E bits of the first DATA chunk in the series to 10, the B/E bits of the last DATA chunk in the series to 01, and the B/E bits of all other DATA chunks in the series to 00.
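Steps 1 to 3 can be sketched as a Python function that splits one user message into DATA chunk descriptors. The `DataChunk` record and the `fragment` function are hypothetical, and `max_chunk_payload` stands for whatever limit the sender derives from the PMTU and the AMDCS. TSNs wrap modulo 2^32. A message that fits in a single chunk ends up with both B and E set (B/E = 11), which marks an unfragmented message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataChunk:
    tsn: int
    stream_id: int
    ssn: int
    unordered: bool  # U flag
    begin: bool      # B bit
    end: bool        # E bit
    payload: bytes


def fragment(message: bytes, max_chunk_payload: int, first_tsn: int,
             stream_id: int, ssn: int, unordered: bool) -> List[DataChunk]:
    """Split one user message into consecutive-TSN DATA chunk descriptors."""
    if not message or max_chunk_payload <= 0:
        raise ValueError("need a non-empty message and a positive chunk size")
    # Step 1: break the message into pieces no larger than the chosen size.
    pieces = [message[i:i + max_chunk_payload]
              for i in range(0, len(message), max_chunk_payload)]
    last = len(pieces) - 1
    return [
        DataChunk(
            tsn=(first_tsn + i) % 2**32,  # step 2: consecutive TSNs, 32-bit wrap
            stream_id=stream_id,
            ssn=ssn,                      # same SSN on every fragment
            unordered=unordered,          # U flag copied to every fragment
            begin=(i == 0),               # step 3: B on the first fragment only
            end=(i == last),              # E on the last fragment only
            payload=piece,
        )
        for i, piece in enumerate(pieces)
    ]
```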
An endpoint MUST recognize fragmented DATA chunks by examining the B/E bits in each of the received DATA chunks and queue the fragmented DATA chunks for reassembly. Once the user message is reassembled, SCTP passes the reassembled user message to the specific stream for possible reordering and final dispatching.
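A minimal reassembly queue keyed by TSN shows how the B/E bits delimit a message. This Python sketch assumes, as in the fragmentation steps, that the fragments of one message carry consecutive TSNs; it ignores TSN wraparound, buffer limits, and partial delivery. The `Fragment` record and `Reassembler` class are hypothetical, and the returned message would then be handed to the stream's ordering logic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Fragment:
    tsn: int
    begin: bool  # B bit
    end: bool    # E bit
    payload: bytes


class Reassembler:
    """Hypothetical reassembly queue keyed by TSN."""

    def __init__(self) -> None:
        self.queue: Dict[int, Fragment] = {}

    def add(self, frag: Fragment) -> Optional[bytes]:
        """Queue a fragment; return the whole user message once B..E is complete."""
        self.queue[frag.tsn] = frag
        start = frag.tsn
        while not self.queue[start].begin:      # walk back to the B fragment
            prev = self.queue.get(start - 1)
            if prev is None or prev.end:        # hole, or another message's E
                return None
            start -= 1
        end = frag.tsn
        while not self.queue[end].end:          # walk forward to the E fragment
            nxt = self.queue.get(end + 1)
            if nxt is None or nxt.begin:        # hole, or another message's B
                return None
            end += 1
        return b"".join(self.queue.pop(t).payload for t in range(start, end + 1))
```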
If the data receiver runs out of buffer space while still waiting for more fragments to complete the reassembly of the message, it SHOULD dispatch part of its inbound message through a partial delivery API (see Section 11), freeing some of its receive buffer space so that the rest of the message can be received.
An endpoint bundles chunks by simply including multiple chunks in one outbound SCTP packet. The total size of the resultant SCTP packet MUST be less than or equal to the current PMTU.
If its peer endpoint is multi-homed, the sending endpoint SHOULD choose a size no larger than the PMTU of the current primary path.
When bundling control chunks with DATA chunks, an endpoint MUST place control chunks first in the outbound SCTP packet. The transmitter MUST transmit DATA chunks within an SCTP packet in increasing order of TSN.
Note: Since control chunks are placed first in a packet and since DATA chunks are transmitted before SHUTDOWN or SHUTDOWN ACK chunks, DATA chunks cannot be bundled with SHUTDOWN or SHUTDOWN ACK chunks.
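Bundling under these rules can be sketched in Python as filling a byte budget: control chunks first, then DATA chunks in increasing TSN order, each padded to a 4-byte boundary, stopping before any chunk that would not fit whole. The `bundle` function is hypothetical, takes already-encoded chunks, and assumes an IPv4 header without options; DATA chunks left out would wait for the next packet, and TSN wraparound is ignored.

```python
from typing import List, Tuple

SCTP_COMMON_HEADER = 12


def pad4(chunk: bytes) -> bytes:
    """Append zero bytes so the chunk ends on a 4-byte boundary."""
    return chunk + bytes(-len(chunk) % 4)


def bundle(control_chunks: List[bytes], data_chunks: List[Tuple[int, bytes]],
           pmtu: int, ip_header_len: int = 20) -> Tuple[bytes, List[int]]:
    """Build the chunk area of one packet; return it and the TSNs included.

    data_chunks holds (tsn, encoded DATA chunk) pairs.
    """
    room = pmtu - ip_header_len - SCTP_COMMON_HEADER
    body = bytearray()
    for chunk in control_chunks:            # control chunks are placed first
        body += pad4(chunk)
    if len(body) > room:
        raise ValueError("control chunks alone exceed the PMTU")
    included: List[int] = []
    for tsn, chunk in sorted(data_chunks, key=lambda item: item[0]):  # increasing TSN
        padded = pad4(chunk)
        if len(body) + len(padded) > room:
            break                           # never place a partial chunk
        body += padded
        included.append(tsn)
    return bytes(body), included
```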
Partial chunks MUST NOT be placed in an SCTP packet. A partial chunk is a chunk that is not completely contained in the SCTP packet; i.e., the SCTP packet is too short to contain all the bytes of the chunk as indicated by the chunk length.
An endpoint MUST process received chunks in their order in the packet. The receiver uses the Chunk Length field to determine the end of a chunk and beginning of the next chunk, taking account of the fact that all chunks end on a 4-byte boundary. If the receiver detects a partial chunk, it MUST drop the chunk.
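The receive-side parsing rule can be sketched in Python as a walk over the Chunk Length fields. `parse_chunks` is a hypothetical helper; it rounds each length up to the next multiple of 4 to find the next chunk and stops at a partial chunk, dropping it. Treating a Chunk Length below 4 as an error is an assumption made here for safety.

```python
import struct
from typing import List, Tuple


def parse_chunks(packet: bytes) -> List[Tuple[int, int, bytes]]:
    """Split an SCTP packet into (type, flags, value) tuples in packet order."""
    chunks: List[Tuple[int, int, bytes]] = []
    offset = 12                                  # skip the SCTP common header
    while offset + 4 <= len(packet):
        ctype, flags, length = struct.unpack_from("!BBH", packet, offset)
        if length < 4:
            raise ValueError("invalid Chunk Length")  # assumption, see lead-in
        if offset + length > len(packet):
            break                                # partial chunk: drop it
        chunks.append((ctype, flags, packet[offset + 4:offset + length]))
        offset += (length + 3) & ~3              # chunks end on a 4-byte boundary
    return chunks
```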
An endpoint MUST NOT bundle INIT, INIT ACK, or SHUTDOWN COMPLETE chunks with any other chunks.
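This last rule reduces to a predicate over the chunk types in a packet. The Python sketch uses the SCTP chunk type codes 1 (INIT), 2 (INIT ACK), and 14 (SHUTDOWN COMPLETE); the constant and function names are hypothetical.

```python
from typing import List

INIT, INIT_ACK, SHUTDOWN_COMPLETE = 1, 2, 14   # SCTP chunk type codes
MUST_TRAVEL_ALONE = {INIT, INIT_ACK, SHUTDOWN_COMPLETE}


def bundle_allowed(chunk_types: List[int]) -> bool:
    """False if INIT, INIT ACK, or SHUTDOWN COMPLETE shares a packet with any chunk."""
    return len(chunk_types) <= 1 or not MUST_TRAVEL_ALONE.intersection(chunk_types)
```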