RFC 4960

Stream Control Transmission Protocol

Pages: 152
Obsoletes: 2960 3309
Obsoleted by: 9260
Updated by: 6096 6335 7053 8899

Part 5 of 7 – Pages 83 to 106

RFC4960 - Page 83

6.3.  Management of Retransmission Timer

   An SCTP endpoint uses a retransmission timer T3-rtx to ensure data
   delivery in the absence of any feedback from its peer.  The duration
   of this timer is referred to as RTO (retransmission timeout).

   When an endpoint's peer is multi-homed, the endpoint will calculate a
   separate RTO for each different destination transport address of its
   peer endpoint.

   The computation and management of RTO in SCTP follow closely how TCP
   manages its retransmission timer.  To compute the current RTO, an
   endpoint maintains two state variables per destination transport
   address: SRTT (smoothed round-trip time) and RTTVAR (round-trip time
   variation).

6.3.1.  RTO Calculation

   The rules governing the computation of SRTT, RTTVAR, and RTO are as
   follows:

   C1)  Until an RTT measurement has been made for a packet sent to the
        given destination transport address, set RTO to the protocol
        parameter 'RTO.Initial'.

   C2)  When the first RTT measurement R is made, set

        SRTT <- R,

        RTTVAR <- R/2, and

        RTO <- SRTT + 4 * RTTVAR.

   C3)  When a new RTT measurement R' is made, set

        RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

        and

        SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

        Note: The value of SRTT used in the update to RTTVAR is its
        value before updating SRTT itself using the second assignment.

        After the computation, update RTO <- SRTT + 4 * RTTVAR.

   C4)  When data is in flight and when allowed by rule C5 below, a new
        RTT measurement MUST be made each round trip.  Furthermore, new
        RTT measurements SHOULD be made no more than once per round trip
        for a given destination transport address.  There are two
        reasons for this recommendation: First, it appears that
        measuring more frequently often does not in practice yield any
        significant benefit [ALLMAN99]; second, if measurements are made
        more often, then the values of RTO.Alpha and RTO.Beta in rule C3
        above should be adjusted so that SRTT and RTTVAR still adjust to
        changes at roughly the same rate (in terms of how many round
        trips it takes them to reflect new values) as they would if
        making only one measurement per round-trip and using RTO.Alpha
        and RTO.Beta as given in rule C3.  However, the exact nature of
        these adjustments remains a research issue.

   C5)  Karn's algorithm: RTT measurements MUST NOT be made using
        packets that were retransmitted (and thus for which it is
        ambiguous whether the reply was for the first instance of the
        chunk or for a later instance).

        IMPLEMENTATION NOTE: RTT measurements should only be made using
        a chunk with TSN r if no chunk with TSN less than or equal to r
        has been retransmitted since r was first sent.

   C6)  Whenever RTO is computed, if it is less than RTO.Min seconds
        then it is rounded up to RTO.Min seconds.  The reason for this
        rule is that RTOs that do not have a high minimum value are
        susceptible to unnecessary timeouts [ALLMAN99].

   C7)  A maximum value may be placed on RTO provided it is at least
        RTO.Max seconds.

   There is no requirement for the clock granularity G used for
   computing RTT measurements and the different state variables, other
   than:

   G1)  Whenever RTTVAR is computed, if RTTVAR = 0, then adjust RTTVAR
        <- G.

   Experience [ALLMAN99] has shown that finer clock granularities (<=
   100 msec) perform somewhat better than coarser granularities.
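   Rules C1 through C7 and G1 can be sketched as a small per-destination
   estimator plus an RTT sampler.  This is a minimal, non-normative
   Python sketch: the class names, the hooks, and the 1 ms granularity
   G are hypothetical, and TSNs are compared as plain integers (wraparound
   is ignored).  The constants are the defaults that Section 15 of this
   document suggests (RTO.Initial 3 s, RTO.Min 1 s, RTO.Max 60 s,
   RTO.Alpha 1/8, RTO.Beta 1/4).

```python
from dataclasses import dataclass
from typing import Optional

# Default protocol parameters suggested in RFC 4960 Section 15.
RTO_INITIAL = 3.0   # seconds
RTO_MIN = 1.0       # seconds
RTO_MAX = 60.0      # seconds
RTO_ALPHA = 1 / 8
RTO_BETA = 1 / 4


@dataclass
class RtoEstimator:
    """Per-destination SRTT/RTTVAR/RTO state (rules C1-C3, C6, C7, G1)."""
    granularity: float = 0.001        # clock granularity G (assumed: 1 ms)
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = RTO_INITIAL          # C1: RTO.Initial until the first sample

    def on_measurement(self, r: float) -> float:
        if self.srtt is None or self.rttvar is None:
            # C2: first RTT measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # C3: RTTVAR uses the SRTT value from *before* this update.
            self.rttvar = (1 - RTO_BETA) * self.rttvar + RTO_BETA * abs(self.srtt - r)
            self.srtt = (1 - RTO_ALPHA) * self.srtt + RTO_ALPHA * r
        if self.rttvar == 0:
            self.rttvar = self.granularity          # G1
        rto = self.srtt + 4 * self.rttvar
        # C6: round up to RTO.Min.  C7: cap at RTO.Max (a cap of at least RTO.Max).
        self.rto = min(max(rto, RTO_MIN), RTO_MAX)
        return self.rto


class RttSampler:
    """Times at most one TSN per round trip (C4) and applies Karn's rule (C5)."""

    def __init__(self) -> None:
        self.timed_tsn: Optional[int] = None
        self.sent_at = 0.0

    def on_send(self, tsn: int, now: float, retransmission: bool) -> None:
        if retransmission:
            # C5 implementation note: retransmitting any TSN <= the timed TSN
            # makes the pending sample ambiguous, so it is discarded.
            if self.timed_tsn is not None and tsn <= self.timed_tsn:
                self.timed_tsn = None
        elif self.timed_tsn is None:
            self.timed_tsn, self.sent_at = tsn, now

    def on_cumulative_ack(self, cum_tsn: int, now: float) -> Optional[float]:
        """Return an RTT sample once the timed TSN is cumulatively acked."""
        if self.timed_tsn is not None and cum_tsn >= self.timed_tsn:
            self.timed_tsn = None
            return now - self.sent_at
        return None
```

   A sample returned by on_cumulative_ack() would be passed to
   on_measurement() of the estimator for the same destination.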

6.3.2.  Retransmission Timer Rules

   The rules for managing the retransmission timer are as follows:

   R1)  Every time a DATA chunk is sent to any address (including a
        retransmission), if the T3-rtx timer of that address is not
        running, start it running so that it will expire after the RTO
        of that address.  The RTO used here is that obtained after any
        doubling due to previous T3-rtx timer expirations on the
        corresponding destination address as discussed in rule E2 below.

   R2)  Whenever all outstanding data sent to an address have been
        acknowledged, turn off the T3-rtx timer of that address.

   R3)  Whenever a SACK is received that acknowledges the DATA chunk
        with the earliest outstanding TSN for that address, restart the
        T3-rtx timer for that address with its current RTO (if there is
        still outstanding data on that address).

   R4)  Whenever a SACK is received missing a TSN that was previously
        acknowledged via a Gap Ack Block, start the T3-rtx timer for the
        destination address to which the DATA chunk was originally
        transmitted if it is not already running.
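   Rules R1 through R3 amount to a small amount of per-destination
   bookkeeping, sketched below in Python.  The class and method names are
   hypothetical, `acked` is assumed to hold the TSNs newly acknowledged by
   a SACK, R4 (TSNs that were Gap-Acked and later reported missing) is not
   modeled, and TSN wraparound is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class T3RtxTimer:
    """Hypothetical T3-rtx bookkeeping for one destination address (R1-R3)."""
    rto: float                                    # current, possibly doubled, RTO
    outstanding: Dict[int, int] = field(default_factory=dict)  # TSN -> bytes
    expires_at: Optional[float] = None            # None means "not running"

    def on_data_sent(self, tsn: int, size: int, now: float) -> None:
        self.outstanding[tsn] = size
        if self.expires_at is None:               # R1: start only if stopped
            self.expires_at = now + self.rto

    def on_sack(self, acked: Set[int], now: float) -> None:
        if not self.outstanding:
            return
        earliest = min(self.outstanding)
        for tsn in acked:
            self.outstanding.pop(tsn, None)
        if not self.outstanding:                  # R2: everything acknowledged
            self.expires_at = None
        elif earliest in acked:                   # R3: earliest TSN acknowledged
            self.expires_at = now + self.rto
```

   Feeding it the sends and SACKs of Endpoint A in the example that
   follows reproduces the start, restart, and cancel events shown there.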

   The following example shows the use of various timer rules (assuming
   that the receiver uses delayed acks).

   Endpoint A                                         Endpoint Z
   {App begins to send}
   DATA [TSN=7,Strm=0,Seq=3] ------------> (ack delayed)
   (Start T3-rtx timer)
                                           {App sends 1 message; strm 1}
                                           (bundle ack with data)
   DATA [TSN=8,Strm=0,Seq=4] ----\     /-- SACK [TSN Ack=7,Block=0]
                                  \   /      DATA [TSN=6,Strm=1,Seq=2]
                                   \ /     (Start T3-rtx timer)
                                    \
                                   / \
   (Restart T3-rtx timer)  <------/   \--> (ack delayed)
   (ack delayed)
   {send ack}
   SACK [TSN Ack=6,Block=0] --------------> (Cancel T3-rtx timer)
                                           ..
                                           (send ack)
   (Cancel T3-rtx timer)  <-------------- SACK [TSN Ack=8,Block=0]

                       Figure 8: Timer Rule Examples

6.3.3.  Handle T3-rtx Expiration

   Whenever the retransmission timer T3-rtx expires for a destination
   address, do the following:

   E1)  For the destination address for which the timer expires, adjust
        its ssthresh with rules defined in Section 7.2.3 and set the
        cwnd <- MTU.

   E2)  For the destination address for which the timer expires, set RTO
        <- RTO * 2 ("back off the timer").  The maximum value discussed
        in rule C7 above (RTO.Max) may be used to provide an upper bound
        to this doubling operation.

   E3)  Determine how many of the earliest (i.e., lowest TSN)
        outstanding DATA chunks for the address for which the T3-rtx has
        expired will fit into a single packet, subject to the MTU
        constraint for the path corresponding to the destination
        transport address to which the retransmission is being sent
        (this may be different from the address for which the timer
        expires; see Section 6.4).  Call this value K.  Bundle and
        retransmit those K DATA chunks in a single packet to the
        destination endpoint.

   E4)  Start the retransmission timer T3-rtx on the destination address
        to which the retransmission is sent, if rule R1 above indicates
        to do so.  The RTO to be used for starting T3-rtx should be the
        one for the destination address to which the retransmission is
        sent, which, when the receiver is multi-homed, may be different
        from the destination address for which the timer expired (see
        Section 6.4 below).

   After retransmitting, once a new RTT measurement is obtained (which
   can happen only when new data has been sent and acknowledged, per
   rule C5, or for a measurement made from a HEARTBEAT; see Section
   8.3), the computation in rule C3 is performed, including the
   computation of RTO, which may result in "collapsing" RTO back down
   after it has been subject to doubling (rule E2).

   Note: Any DATA chunks that were sent to the address for which the
   T3-rtx timer expired but did not fit in one MTU (rule E3 above)
   should be marked for retransmission and sent as soon as cwnd allows
   (normally, when a SACK arrives).
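   Rules E1 through E3, together with the note above, can be sketched as
   follows.  This is a hypothetical Python illustration that assumes the
   Section 7.2.3 timeout rule ssthresh = max(cwnd/2, 4*MTU), assumes the
   retransmission goes out on the same path, and uses an assumed 32 bytes
   of per-packet overhead (a 20-byte IPv4 header plus the 12-byte SCTP
   common header).  Chunk lengths are taken to be already padded.

```python
from dataclasses import dataclass
from typing import List, Tuple

RTO_MAX = 60.0           # Section 15 default, used as the E2 upper bound
PACKET_OVERHEAD = 32     # assumed: 20-byte IPv4 header + 12-byte SCTP common header


@dataclass
class PathState:
    cwnd: int
    ssthresh: int
    rto: float
    mtu: int


def on_t3_rtx_expiry(path: PathState,
                     outstanding: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Apply E1-E3 to (TSN, padded chunk length) pairs.

    Returns (TSNs bundled into one packet now, TSNs marked for retransmission
    as soon as cwnd allows).
    """
    # E1: ssthresh per Section 7.2.3, then cwnd <- MTU.
    path.ssthresh = max(path.cwnd // 2, 4 * path.mtu)
    path.cwnd = path.mtu
    # E2: back off the timer, bounded by RTO.Max.
    path.rto = min(path.rto * 2, RTO_MAX)
    # E3: the K lowest TSNs that fit into a single packet.
    budget = path.mtu - PACKET_OVERHEAD
    send_now: List[int] = []
    marked: List[int] = []
    for tsn, length in sorted(outstanding):
        if not marked and length <= budget:
            budget -= length
            send_now.append(tsn)
        else:
            marked.append(tsn)      # did not fit: retransmit later
    return send_now, marked
```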

   The final rule for managing the retransmission timer concerns
   failover (see Section 6.4.1):

   F1)  Whenever an endpoint switches from the current destination
        transport address to a different one, the current retransmission
        timers are left running.  As soon as the endpoint transmits a
        packet containing DATA chunk(s) to the new transport address,
        start the timer on that transport address, using the RTO value
        of the destination address to which the data is being sent, if
        rule R1 indicates to do so.

6.4.  Multi-Homed SCTP Endpoints

   An SCTP endpoint is considered multi-homed if there is more than one
   transport address that can be used as a destination address to reach
   that endpoint.

   Moreover, the ULP of an endpoint shall select one of the multiple
   destination addresses of a multi-homed peer endpoint as the primary
   path (see Section 5.1.2 and Section 10.1 for details).

   By default, an endpoint SHOULD always transmit to the primary path,
   unless the SCTP user explicitly specifies the destination transport
   address (and possibly source transport address) to use.

   An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

   However, when acknowledging multiple DATA chunks received in packets
   from different source addresses in a single SACK, the SACK chunk may
   be transmitted to one of the destination transport addresses from
   which the DATA or control chunks being acknowledged were received.

   When a receiver of a duplicate DATA chunk sends a SACK to a multi-
   homed endpoint, it MAY be beneficial to vary the destination address
   and not use the source address of the DATA chunk.  The reason is that
   receiving a duplicate from a multi-homed endpoint might indicate that
   the return path (as specified in the source address of the DATA
   chunk) for the SACK is broken.

   Furthermore, when its peer is multi-homed, an endpoint SHOULD try to
   retransmit a chunk that timed out to an active destination transport
   address that is different from the last destination address to which
   the DATA chunk was sent.

   Retransmissions do not affect the total outstanding data count.
   However, if the DATA chunk is retransmitted onto a different
   destination address, both the outstanding data counts on the new
   destination address and the old destination address to which the data
   chunk was last sent shall be adjusted accordingly.
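   The last two rules can be sketched with two hypothetical helpers.
   Choosing the first other active address is just one simple policy
   assumed here for illustration; the exact selection (for example, the
   "most divergent" pair of Section 6.4.1) is left to implementations.

```python
from typing import Dict, List


def pick_retransmit_destination(last_dest: str, active: List[str]) -> str:
    """Prefer an active destination other than the one the chunk last went to."""
    for dest in active:
        if dest != last_dest:
            return dest
    return last_dest          # no alternative: fall back to the original address


def move_outstanding(counts: Dict[str, int], old: str, new: str, size: int) -> None:
    """Shift a retransmitted chunk's bytes between per-destination counters.

    The association-wide total (the sum of all counters) does not change.
    """
    if old == new:
        return
    counts[old] -= size
    counts[new] = counts.get(new, 0) + size
```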

6.4.1.  Failover from an Inactive Destination Address

   Some of the transport addresses of a multi-homed SCTP endpoint may
   become inactive due to either the occurrence of certain error
   conditions (see Section 8.2) or adjustments from the SCTP user.

   When there is outbound data to send and the primary path becomes
   inactive (e.g., due to failures), or where the SCTP user explicitly
   requests to send data to an inactive destination transport address,
   before reporting an error to its ULP, the SCTP endpoint should try to
   send the data to an alternate active destination transport address if
   one exists.

   When retransmitting data that timed out, if the endpoint is multi-
   homed, it should consider each source-destination address pair in its
   retransmission selection policy.  When retransmitting timed-out data,
   the endpoint should attempt to pick the most divergent source-
   destination pair from the original source-destination pair to which
   the packet was transmitted.

   Note: Rules for picking the most divergent source-destination pair
   are an implementation decision and are not specified within this
   document.

6.5.  Stream Identifier and Stream Sequence Number

   Every DATA chunk MUST carry a valid stream identifier.  If an
   endpoint receives a DATA chunk with an invalid stream identifier, it
   shall acknowledge the reception of the DATA chunk following the
   normal procedure, immediately send an ERROR chunk with cause set to
   "Invalid Stream Identifier" (see Section 3.3.10), and discard the
   DATA chunk.  The endpoint may bundle the ERROR chunk in the same
   packet as the SACK as long as the ERROR follows the SACK.

   The Stream Sequence Number in all the streams MUST start from 0 when
   the association is established.  Also, when the Stream Sequence
   Number reaches the value 65535, the next Stream Sequence Number MUST
   be set to 0.

6.6.  Ordered and Unordered Delivery

   Within a stream, an endpoint MUST deliver DATA chunks received with
   the U flag set to 0 to the upper layer according to the order of
   their Stream Sequence Number.  If DATA chunks arrive out of order of
   their Stream Sequence Number, the endpoint MUST hold the received
   DATA chunks from delivery to the ULP until they are reordered.

   However, an SCTP endpoint can indicate that no ordered delivery is
   required for a particular DATA chunk transmitted within the stream by
   setting the U flag of the DATA chunk to 1.

   When an endpoint receives a DATA chunk with the U flag set to 1, it
   must bypass the ordering mechanism and immediately deliver the data
   to the upper layer (after reassembly if the user data is fragmented
   by the data sender).

   This provides an effective way of transmitting "out-of-band" data in
   a given stream.  Also, a stream can be used as an "unordered" stream
   by simply setting the U flag to 1 in all DATA chunks sent through
   that stream.
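   The receive-side rules can be sketched per stream as below.  This is a
   hypothetical Python illustration that operates on already reassembled
   user messages and assumes duplicates were filtered out earlier by TSN
   tracking.  Ordered messages are held until the expected Stream Sequence
   Number arrives; U=1 messages bypass the queue.

```python
from typing import Dict, List


class StreamReceiver:
    """Hypothetical per-stream delivery logic for reassembled user messages."""

    def __init__(self) -> None:
        self.next_ssn = 0                  # SSNs start at 0 (Section 6.5)
        self.held: Dict[int, bytes] = {}   # ordered messages waiting on a gap

    def receive(self, ssn: int, message: bytes, unordered: bool) -> List[bytes]:
        """Return the messages that can be handed to the ULP now."""
        if unordered:
            return [message]               # U=1: bypass ordering, SSN ignored
        self.held[ssn] = message
        ready = []
        while self.next_ssn in self.held:
            ready.append(self.held.pop(self.next_ssn))
            self.next_ssn = (self.next_ssn + 1) % 65536   # 65535 wraps to 0
        return ready
```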

   IMPLEMENTATION NOTE: When sending an unordered DATA chunk, an
   implementation may choose to place the DATA chunk in an outbound
   packet that is at the head of the outbound transmission queue if
   possible.

   The 'Stream Sequence Number' field in a DATA chunk with U flag set to
   1 has no significance.  The sender can fill it with an arbitrary value,
   but the receiver MUST ignore the field.

   Note: When transmitting ordered and unordered data, an endpoint does
   not increment its Stream Sequence Number when transmitting a DATA
   chunk with U flag set to 1.
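   On the sending side, the Stream Sequence Number rules of Sections 6.5
   and 6.6 can be sketched as a per-stream counter.  The class is
   hypothetical, and filling the field with 0 for U=1 chunks is just one
   choice of "arbitrary value".

```python
class StreamSender:
    """Hypothetical per-stream SSN allocation on the sending side."""

    def __init__(self) -> None:
        self.next_ssn = 0                       # Section 6.5: start at 0

    def assign_ssn(self, unordered: bool) -> int:
        if unordered:
            # The field has no significance for U=1; 0 is an arbitrary filler
            # and the ordered sequence is not advanced.
            return 0
        ssn = self.next_ssn
        self.next_ssn = (self.next_ssn + 1) % 65536   # 65535 is followed by 0
        return ssn
```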

6.7.  Report Gaps in Received DATA TSNs

   Upon the reception of a new DATA chunk, an endpoint shall examine the
   continuity of the TSNs received.  If the endpoint detects a gap in
   the received DATA chunk sequence, it SHOULD send a SACK with Gap Ack
   Blocks immediately.  The data receiver continues sending a SACK after
   receipt of each SCTP packet that doesn't fill the gap.

   Based on the Gap Ack Block from the received SACK, the endpoint can
   calculate the missing DATA chunks and make decisions on whether to
   retransmit them (see Section 6.2.1 for details).

   Multiple gaps can be reported in one single SACK (see Section 3.3.4).

   When its peer is multi-homed, the SCTP endpoint SHOULD always try to
   send the SACK to the same destination address from which the last
   DATA chunk was received.

   Upon the reception of a SACK, the endpoint MUST remove all DATA
   chunks that have been acknowledged by the SACK's Cumulative TSN Ack
   from its transmit queue.  The endpoint MUST also treat all the DATA
   chunks with TSNs not included in the Gap Ack Blocks reported by the
   SACK as "missing".  The number of "missing" reports for each
   outstanding DATA chunk MUST be recorded by the data sender in order
   to make retransmission decisions.  See Section 7.2.4 for details.

   The following example shows the use of SACK to report a gap.

       Endpoint A                                    Endpoint Z
       {App sends 3 messages; strm 0}
       DATA [TSN=6,Strm=0,Seq=2] ---------------> (ack delayed)
       (Start T3-rtx timer)

       DATA [TSN=7,Strm=0,Seq=3] --------> X (lost)

       DATA [TSN=8,Strm=0,Seq=4] ---------------> (gap detected,
                                                   immediately send ack)
                                       /----- SACK [TSN Ack=6,Block=1,
                                      /             Start=2,End=2]
                               <-----/
       (remove 6 from out-queue,
        and mark 7 as "1" missing report)

                  Figure 9: Reporting a Gap using SACK
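   The exchange in Figure 9 can be reproduced with two hypothetical
   helpers: one that builds Gap Ack Blocks as (Start, End) offsets from the
   Cumulative TSN Ack on the receiver, and one that records "missing"
   reports on the sender.  As a simplification, the sketch only counts TSNs
   below the highest Gap-Acked TSN; Section 7.2.4 gives the exact rule.
   TSN wraparound is ignored.

```python
from typing import Dict, Iterable, List, Tuple


def build_gap_ack_blocks(cum_tsn: int, received: Iterable[int]) -> List[Tuple[int, int]]:
    """Receiver side: (Start, End) offsets of TSNs received above cum_tsn."""
    blocks: List[Tuple[int, int]] = []
    for tsn in sorted({t for t in received if t > cum_tsn}):
        offset = tsn - cum_tsn
        if blocks and blocks[-1][1] + 1 == offset:
            blocks[-1] = (blocks[-1][0], offset)    # extend the current block
        else:
            blocks.append((offset, offset))         # start a new block
    return blocks


def record_missing(cum_tsn: int, blocks: List[Tuple[int, int]],
                   outstanding: Iterable[int], miss_counts: Dict[int, int]) -> None:
    """Sender side: add one "missing" report per uncovered outstanding TSN."""
    if not blocks:
        return
    acked = {cum_tsn + off for start, end in blocks for off in range(start, end + 1)}
    highest = cum_tsn + blocks[-1][1]
    for tsn in outstanding:
        if cum_tsn < tsn < highest and tsn not in acked:
            miss_counts[tsn] = miss_counts.get(tsn, 0) + 1
```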

   The maximum number of Gap Ack Blocks that can be reported within a
   single SACK chunk is limited by the current path MTU.  When a single
   SACK cannot cover all the Gap Ack Blocks needed to be reported due to
   the MTU limitation, the endpoint MUST send only one SACK, reporting
   the Gap Ack Blocks from the lowest to highest TSNs, within the size
   limit set by the MTU, and leave the remaining highest TSN numbers
   unacknowledged.
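   The MTU limit can be sketched as a simple truncation that keeps the
   lowest-TSN blocks.  The sketch assumes the SACK is the only chunk in the
   packet, reports no duplicate TSNs, and pays an assumed 32 bytes of
   IPv4 plus common-header overhead; the 16-byte fixed part and 4-byte
   block size follow the SACK layout in Section 3.3.4.

```python
from typing import List, Tuple

SACK_FIXED_LEN = 16      # type, flags, length, Cumulative TSN Ack, a_rwnd, two counts
GAP_BLOCK_LEN = 4        # 16-bit Start + 16-bit End offsets
PACKET_OVERHEAD = 32     # assumed: 20-byte IPv4 header + 12-byte SCTP common header


def fit_gap_blocks(blocks: List[Tuple[int, int]], mtu: int) -> List[Tuple[int, int]]:
    """Keep the lowest-TSN Gap Ack Blocks that fit in one SACK within the MTU."""
    room = mtu - PACKET_OVERHEAD - SACK_FIXED_LEN
    max_blocks = max(room // GAP_BLOCK_LEN, 0)
    return blocks[:max_blocks]       # blocks are assumed sorted, lowest TSN first
```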

6.8.  CRC32c Checksum Calculation

   When sending an SCTP packet, the endpoint MUST strengthen the data
   integrity of the transmission by including the CRC32c checksum value
   calculated on the packet, as described below.

   After the packet is constructed (containing the SCTP common header
   and one or more control or DATA chunks), the transmitter MUST

   1)  fill in the proper Verification Tag in the SCTP common header and
       initialize the checksum field to '0's,

   2)  calculate the CRC32c checksum of the whole packet, including the
       SCTP common header and all the chunks (refer to Appendix B for
       details of the CRC32c algorithm); and

   3)  put the resultant value into the checksum field in the common
       header, and leave the rest of the bits unchanged.

   When an SCTP packet is received, the receiver MUST first check the
   CRC32c checksum as follows:

   1)  Store the received CRC32c checksum value aside.

   2)  Replace the 32 bits of the checksum field in the received SCTP
       packet with all '0's and calculate a CRC32c checksum value of the
       whole received packet.

   3)  Verify that the calculated CRC32c checksum is the same as the
       received CRC32c checksum.  If it is not, the receiver MUST treat
       the packet as an invalid SCTP packet.

   The default procedure for handling invalid SCTP packets is to
   silently discard them.

   Any hardware implementation SHOULD be done in a way that is
   verifiable by the software.
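   The sender and receiver procedures can be sketched with a bitwise
   CRC32c (Castagnoli polynomial, reflected form 0x82F63B78).  This Python
   sketch is for clarity, not speed, and the helper names are
   hypothetical.  It assumes the Verification Tag is already in place and
   writes the value least-significant byte first, which is our reading of
   the byte swap in the Appendix B reference code.

```python
import struct

CHECKSUM_OFFSET = 8      # after source port (2), destination port (2), Verification Tag (4)


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def _checksum_bytes(packet: bytes) -> bytes:
    buf = bytearray(packet)
    buf[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 4] = bytes(4)   # zero the checksum field
    # Assumed on-the-wire byte order: least-significant byte first.
    return struct.pack("<I", crc32c(bytes(buf)))


def add_checksum(packet: bytes) -> bytes:
    """Sender steps 2-3; step 1 (Verification Tag) is assumed already done."""
    value = _checksum_bytes(packet)
    return packet[:CHECKSUM_OFFSET] + value + packet[CHECKSUM_OFFSET + 4:]


def checksum_ok(packet: bytes) -> bool:
    """Receiver steps 1-3: compare the stored value with a fresh computation."""
    if len(packet) < 12:
        return False
    return packet[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 4] == _checksum_bytes(packet)
```

   A packet for which checksum_ok() returns False would then be silently
   discarded, per the default procedure above.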

6.9.  Fragmentation and Reassembly

   An endpoint MAY support fragmentation when sending DATA chunks, but
   it MUST support reassembly when receiving DATA chunks.  If an
   endpoint supports fragmentation, it MUST fragment a user message if
   the size of the user message to be sent causes the outbound SCTP
   packet size to exceed the current MTU.  If an implementation does not
   support fragmentation of outbound user messages, the endpoint MUST
   return an error to its upper layer and not attempt to send the user
   message.

   Note: If an implementation that supports fragmentation makes
   available to its upper layer a mechanism to turn off fragmentation,
   it may do so.  However, in so doing, it MUST react just like an
   implementation that does NOT support fragmentation, i.e., it MUST
   reject sends that exceed the current Path MTU (P-MTU).

   IMPLEMENTATION NOTE: In this error case, the Send primitive discussed
   in Section 10.1 would need to return an error to the upper layer.

   If its peer is multi-homed, the endpoint shall choose a size no
   larger than the association Path MTU.  The association Path MTU is
   the smallest Path MTU of all destination addresses.

   Note: Once a message is fragmented, it cannot be re-fragmented.
   Instead, if the PMTU has been reduced, then IP fragmentation must be
   used.  Please see Section 7.3 for details of PMTU discovery.

   When determining when to fragment, the SCTP implementation MUST take
   into account the SCTP packet header as well as the DATA chunk
   header(s).  The implementation MUST also take into account the space
   required for a SACK chunk if bundling a SACK chunk with the DATA
   chunk.

   Fragmentation takes the following steps:

   1)  The data sender MUST break the user message into a series of DATA
       chunks such that each chunk plus SCTP overhead fits into an IP
       datagram smaller than or equal to the association Path MTU.

   2)  The transmitter MUST then assign, in sequence, a separate TSN to
       each of the DATA chunks in the series.  The transmitter assigns
       the same SSN to each of the DATA chunks.  If the user indicates
       that the user message is to be delivered using unordered
       delivery, then the U flag of each DATA chunk of the user message
       MUST be set to 1.

   3)  The transmitter MUST also set the B/E bits of the first DATA
       chunk in the series to '10', the B/E bits of the last DATA chunk
       in the series to '01', and the B/E bits of all other DATA chunks
       in the series to '00'.
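   Steps 1 through 3 can be sketched as follows.  The dataclass and helper
   names are hypothetical; the sketch assumes a 20-byte IPv4 header,
   ignores the space a bundled SACK would need, and rounds the per-chunk
   user data down to a multiple of 4 bytes so that padding never pushes a
   chunk past the Path MTU.  A message that fits in one chunk gets B/E bits
   '11', i.e., it is not fragmented.

```python
from dataclasses import dataclass
from typing import List

IP_HEADER = 20            # assumed: IPv4 header without options
COMMON_HEADER = 12        # SCTP common header
DATA_CHUNK_HEADER = 16    # DATA chunk header, up to the Payload Protocol Identifier


@dataclass
class DataChunk:
    tsn: int
    ssn: int
    unordered: bool       # U flag
    begin: bool           # B bit
    end: bool             # E bit
    payload: bytes


def max_fragment_payload(path_mtu: int) -> int:
    """Largest user-data size per chunk, rounded down to a 4-byte multiple."""
    return (path_mtu - IP_HEADER - COMMON_HEADER - DATA_CHUNK_HEADER) & ~3


def fragment(message: bytes, first_tsn: int, ssn: int, unordered: bool,
             path_mtu: int) -> List[DataChunk]:
    if not message:
        raise ValueError("a DATA chunk must carry at least one byte of user data")
    size = max_fragment_payload(path_mtu)
    pieces = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        DataChunk(
            tsn=(first_tsn + i) & 0xFFFFFFFF,   # step 2: consecutive TSNs
            ssn=ssn,                            # step 2: same SSN for every piece
            unordered=unordered,                # step 2: U flag on every piece
            begin=(i == 0),                     # step 3: B/E = 10, 00, ..., 01
            end=(i == len(pieces) - 1),
            payload=piece,
        )
        for i, piece in enumerate(pieces)
    ]
```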

   An endpoint MUST recognize fragmented DATA chunks by examining the
   B/E bits in each of the received DATA chunks, and queue the
   fragmented DATA chunks for reassembly.  Once the user message is
   reassembled, SCTP shall pass the reassembled user message to the
   specific stream for possible reordering and final dispatching.
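   Reassembly can be sketched by relying on the fact that the fragments of
   one user message carry consecutive TSNs.  In this hypothetical Python
   sketch a received chunk is reduced to a (TSN, B bit, E bit, user data)
   tuple, TSN wraparound is ignored, and the returned message would next go
   to the per-stream ordering logic of Section 6.6.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical minimal view of a received DATA chunk: (TSN, B bit, E bit, user data)
Fragment = Tuple[int, bool, bool, bytes]


class Reassembler:
    """Rebuilds user messages from B/E-flagged fragments with consecutive TSNs."""

    def __init__(self) -> None:
        self.pending: Dict[int, Fragment] = {}

    def add(self, frag: Fragment) -> Optional[bytes]:
        """Store a fragment; return the whole message once it is complete."""
        self.pending[frag[0]] = frag
        start = frag[0]
        while start in self.pending and not self.pending[start][1]:
            start -= 1                      # walk back toward the B fragment
        if start not in self.pending:
            return None                     # an earlier fragment is still missing
        parts: List[bytes] = []
        tsn = start
        while tsn in self.pending:
            _, _, end, data = self.pending[tsn]
            parts.append(data)
            if end:                         # E fragment reached: message complete
                for done in range(start, tsn + 1):
                    del self.pending[done]
                return b"".join(parts)
            tsn += 1
        return None                         # a later fragment is still missing
```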

   Note: If the data receiver runs out of buffer space while still
   waiting for more fragments to complete the reassembly of the message,
   it should dispatch part of its inbound message through a partial
   delivery API (see Section 10), freeing some of its receive buffer
   space so that the rest of the message may be received.

6.10.  Bundling

   An endpoint bundles chunks by simply including multiple chunks in one
   outbound SCTP packet.  The total size of the resultant IP datagram,
   including the SCTP packet and IP headers, MUST be less than or equal
   to the current Path MTU.

   If its peer endpoint is multi-homed, the sending endpoint shall
   choose a size no larger than the latest MTU of the current primary
   path.

   When bundling control chunks with DATA chunks, an endpoint MUST place
   control chunks first in the outbound SCTP packet.  The transmitter
   MUST transmit DATA chunks within an SCTP packet in increasing order
   of TSN.
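   The ordering and size rules can be sketched as a simple packer.  This
   is a hypothetical Python illustration: `budget` is assumed to be the
   Path MTU minus the IP and SCTP common headers, chunk kinds are plain
   labels, and packing stops at the first chunk that does not fit, so no
   partial chunk is ever placed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    kind: str             # illustrative label, e.g. "SACK" or "DATA"
    length: int           # Chunk Length field value (excludes padding)
    tsn: int = -1         # only meaningful for DATA chunks


def padded(length: int) -> int:
    return (length + 3) & ~3          # every chunk ends on a 4-byte boundary


def bundle(control: List[Chunk], data: List[Chunk],
           budget: int) -> Tuple[List[Chunk], List[Chunk]]:
    """Control chunks first, then DATA in increasing TSN order, within budget.

    Returns (chunks for this packet, chunks left for a later packet).
    """
    ordered = control + sorted(data, key=lambda c: c.tsn)
    packet: List[Chunk] = []
    used = 0
    for i, chunk in enumerate(ordered):
        if used + padded(chunk.length) > budget:
            return packet, ordered[i:]
        packet.append(chunk)
        used += padded(chunk.length)
    return packet, []
```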

   Note: Since control chunks must be placed first in a packet and since
   DATA chunks must be transmitted before SHUTDOWN or SHUTDOWN ACK
   chunks, DATA chunks cannot be bundled with SHUTDOWN or SHUTDOWN ACK
   chunks.

   Partial chunks MUST NOT be placed in an SCTP packet.  A partial chunk
   is a chunk that is not completely contained in the SCTP packet; i.e.,
   the SCTP packet is too short to contain all the bytes of the chunk as
   indicated by the chunk length.

   An endpoint MUST process received chunks in their order in the
   packet.  The receiver uses the Chunk Length field to determine the
   end of a chunk and beginning of the next chunk taking account of the
   fact that all chunks end on a 4-byte boundary.  If the receiver
   detects a partial chunk, it MUST drop the chunk.
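   The receive-side walk can be sketched as below.  The function name is
   hypothetical and its input is assumed to be the bytes that follow the
   12-byte common header.  Each chunk starts with a 1-byte type, 1-byte
   flags, and a 2-byte length in network byte order; stopping the walk at
   a partial or malformed chunk is a simplification of "drop the chunk".

```python
import struct
from typing import List, Tuple


def parse_chunks(body: bytes) -> List[Tuple[int, int, bytes]]:
    """Split the bytes after the common header into (type, flags, value)."""
    chunks = []
    offset = 0
    while offset + 4 <= len(body):
        ctype, flags, length = struct.unpack_from("!BBH", body, offset)
        if length < 4 or offset + length > len(body):
            break                                   # partial or malformed: drop it
        chunks.append((ctype, flags, body[offset + 4:offset + length]))
        offset += (length + 3) & ~3                 # skip padding to the next chunk
    return chunks
```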

   An endpoint MUST NOT bundle INIT, INIT ACK, or SHUTDOWN COMPLETE with
   any other chunks.

7.  Congestion Control

   Congestion control is one of the basic functions in SCTP.  For some
   applications, it may be likely that adequate resources will be
   allocated to SCTP traffic to ensure prompt delivery of time-critical
   data -- thus, it would appear to be unlikely, during normal
   operations, that transmissions encounter severe congestion
   conditions.  However, SCTP must operate under adverse operational
   conditions, which can develop upon partial network failures or
   unexpected traffic surges.  In such situations, SCTP must follow
   correct congestion control steps to recover from congestion quickly
   in order to get data delivered as soon as possible.  In the absence
   of network congestion, these preventive congestion control algorithms
   should show no impact on the protocol performance.

   IMPLEMENTATION NOTE: As long as its specific performance requirements
   are met, an implementation is always allowed to adopt a more
   conservative congestion control algorithm than the one defined below.

   The congestion control algorithms used by SCTP are based on
   [RFC2581].  This section describes how the algorithms defined in
   [RFC2581] are adapted for use in SCTP.  We first list differences in
   protocol designs between TCP and SCTP, and then describe SCTP's
   congestion control scheme.  The description will use the same
   terminology as in TCP congestion control whenever appropriate.

   SCTP congestion control is always applied to the entire association,
   and not to individual streams.

7.1.  SCTP Differences from TCP Congestion Control

   Gap Ack Blocks in the SCTP SACK carry the same semantic meaning as
   the TCP SACK.  TCP considers the information carried in the SACK as
   advisory information only.  SCTP considers the information carried in
   the Gap Ack Blocks in the SACK chunk as advisory.  In SCTP, any DATA
   chunk that has been acknowledged by SACK, including DATA that arrived
   at the receiving end out of order, is not considered fully delivered
   until the Cumulative TSN Ack Point passes the TSN of the DATA chunk
   (i.e., the DATA chunk has been acknowledged by the Cumulative TSN Ack
   field in the SACK).  Consequently, the value of cwnd controls the
   amount of outstanding data, rather than (as in the case of non-SACK
   TCP) the upper bound between the highest acknowledged sequence number
   and the latest DATA chunk that can be sent within the congestion
   window.  SCTP SACK leads to different implementations of Fast
   Retransmit and Fast Recovery than non-SACK TCP.  As an example, see
   [FALL96].

   The biggest difference between SCTP and TCP, however, is multi-
   homing.  SCTP is designed to establish robust communication
   associations between two endpoints each of which may be reachable by
   more than one transport address.  Potentially different addresses may
   lead to different data paths between the two endpoints; thus, ideally
   one may need a separate set of congestion control parameters for each
   of the paths.  The treatment here of congestion control for multi-
   homed receivers is new with SCTP and may require refinement in the
   future.  The current algorithms make the following assumptions:

   o  The sender usually uses the same destination address until being
      instructed by the upper layer to do otherwise; however, SCTP may
      change to an alternate destination in the event an address is
      marked inactive (see Section 8.2).  Also, SCTP may retransmit to a
      different transport address than the original transmission.

   o  The sender keeps a separate congestion control parameter set for
      each of the destination addresses it can send to (not each
      source-destination pair but for each destination).  The parameters
      should decay if the address is not used for a long enough time
      period.

   o  For each of the destination addresses, an endpoint does slow start
      upon the first transmission to that address.

   Note: TCP guarantees in-sequence delivery of data to its upper-layer
   protocol within a single TCP session.  This means that when TCP
   notices a gap in the received sequence number, it waits until the gap
   is filled before delivering the data that was received with sequence
   numbers higher than that of the missing data.  On the other hand,
   SCTP can deliver data to its upper-layer protocol even if there is a
   gap in TSN if the Stream Sequence Numbers are in sequence for a
   particular stream (i.e., the missing DATA chunks are for a different
   stream) or if unordered delivery is indicated.  Although this does
   not affect cwnd, it might affect rwnd calculation.

7.2.  SCTP Slow-Start and Congestion Avoidance

   The slow-start and congestion avoidance algorithms MUST be used by an
   endpoint to control the amount of data being injected into the
   network.  The congestion control in SCTP is employed in regard to the
   association, not to an individual stream.  In some situations, it may
   be beneficial for an SCTP sender to be more conservative than the
   algorithms allow; however, an SCTP sender MUST NOT be more aggressive
   than the following algorithms allow.

   Like TCP, an SCTP endpoint uses the following three control variables
   to regulate its transmission rate.

   o  Receiver advertised window size (rwnd, in bytes), which is set by
      the receiver based on its available buffer space for incoming
      packets.

      Note: This variable is kept on the entire association.

   o  Congestion control window (cwnd, in bytes), which is adjusted by
      the sender based on observed network conditions.

      Note: This variable is maintained on a per-destination-address
      basis.

   o  Slow-start threshold (ssthresh, in bytes), which is used by the
      sender to distinguish slow-start and congestion avoidance phases.

      Note: This variable is maintained on a per-destination-address
      basis.

   SCTP also requires one additional control variable,
   partial_bytes_acked, which is used during congestion avoidance phase
   to facilitate cwnd adjustment.

   Unlike TCP, an SCTP sender MUST keep a set of these control variables
   cwnd, ssthresh, and partial_bytes_acked for EACH destination address
   of its peer (when its peer is multi-homed).  Only one rwnd is kept
   for the whole association (no matter if the peer is multi-homed or
   has a single address).
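   The split between association-wide and per-destination variables can
   be sketched as a data structure.  The names are hypothetical; the
   initial cwnd follows Section 7.2.1, and starting ssthresh at the peer's
   rwnd is one of the "arbitrarily high" choices that section permits.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PathCongestion:
    """Per-destination variables (Section 7.2)."""
    cwnd: int
    ssthresh: int
    partial_bytes_acked: int = 0


@dataclass
class AssociationCongestion:
    """A single rwnd for the association, one PathCongestion per peer address."""
    rwnd: int
    paths: Dict[str, PathCongestion] = field(default_factory=dict)

    def add_destination(self, address: str, mtu: int) -> PathCongestion:
        cwnd = min(4 * mtu, max(2 * mtu, 4380))   # initial cwnd, Section 7.2.1
        path = PathCongestion(cwnd=cwnd, ssthresh=self.rwnd)
        self.paths[address] = path
        return path
```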

7.2.1.  Slow-Start

   Beginning data transmission into a network with unknown conditions or
   after a sufficiently long idle period requires SCTP to probe the
   network to determine the available capacity.  The slow-start
   algorithm is used for this purpose at the beginning of a transfer, or
   after repairing loss detected by the retransmission timer.

   o  The initial cwnd before DATA transmission or after a sufficiently
      long idle period MUST be set to min(4*MTU, max(2*MTU, 4380
      bytes)).

   o  The initial cwnd after a retransmission timeout MUST be no more
      than 1*MTU.

   o  The initial value of ssthresh MAY be arbitrarily high (for
      example, implementations MAY use the size of the receiver
      advertised window).

   o  Whenever cwnd is greater than zero, the endpoint is allowed to
      have cwnd bytes of data outstanding on that transport address.

   o  When cwnd is less than or equal to ssthresh, an SCTP endpoint MUST
      use the slow-start algorithm to increase cwnd only if the current
      congestion window is being fully utilized, an incoming SACK
      advances the Cumulative TSN Ack Point, and the data sender is not
      in Fast Recovery.  Only when these three conditions are met can
      the cwnd be increased; otherwise, the cwnd MUST not be increased.
      If these conditions are met, then cwnd MUST be increased by, at
      most, the lesser of 1) the total size of the previously
      outstanding DATA chunk(s) acknowledged, and 2) the destination's
      path MTU.  This upper bound protects against the ACK-Splitting
      attack outlined in [SAVAGE99].
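   The initial window and the slow-start increase can be sketched as two
   pure functions.  This is a hypothetical Python illustration; treating
   "fully utilized" as a flight size (before the SACK) of at least cwnd is
   an assumption of the sketch, not a definition from this section.

```python
def initial_cwnd(mtu: int) -> int:
    """Initial cwnd before data transmission or after a long idle period."""
    return min(4 * mtu, max(2 * mtu, 4380))


def slow_start_increase(cwnd: int, ssthresh: int, mtu: int, flight_size: int,
                        newly_acked: int, cum_ack_advanced: bool,
                        in_fast_recovery: bool) -> int:
    """Return cwnd after a SACK while in slow start (cwnd <= ssthresh)."""
    if cwnd > ssthresh:
        return cwnd                        # congestion avoidance applies instead
    fully_utilized = flight_size >= cwnd   # assumed meaning of "fully utilized"
    if not (fully_utilized and cum_ack_advanced and not in_fast_recovery):
        return cwnd                        # all three conditions are required
    return cwnd + min(newly_acked, mtu)    # the MTU bound guards against ACK splitting
```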

   In instances where its peer endpoint is multi-homed, if an endpoint
   receives a SACK that advances its Cumulative TSN Ack Point, then it
   should update its cwnd (or cwnds) apportioned to the destination
   addresses to which it transmitted the acknowledged data.  However, if
   the received SACK does not advance the Cumulative TSN Ack Point, the
   endpoint MUST NOT adjust the cwnd of any of the destination
   addresses.

   Because an endpoint's cwnd is not tied to its Cumulative TSN Ack
   Point, an endpoint can still use duplicate SACKs to clock out new
   data as they come in, even though they may not advance the
   Cumulative TSN Ack Point.  That is, the data newly acknowledged by
   the SACK
   diminishes the amount of data now in flight to less than cwnd, and so
   the current, unchanged value of cwnd now allows new data to be sent.
   On the other hand, the increase of cwnd must be tied to the
   Cumulative TSN Ack Point advancement as specified above.  Otherwise,
   the duplicate SACKs will not only clock out new data, but also will
   adversely clock out more new data than what has just left the
   network, during a time of possible congestion.

   o  When the endpoint does not transmit data on a given transport
      address, the cwnd of the transport address should be adjusted to
      max(cwnd/2, 4*MTU) per RTO.

7.2.2.  Congestion Avoidance

   When cwnd is greater than ssthresh, cwnd should be incremented by
   1*MTU per RTT if the sender has cwnd or more bytes of data
   outstanding for the corresponding transport address.

   In practice, an implementation can achieve this goal in the following
   way:

   o  partial_bytes_acked is initialized to 0.

   o  Whenever cwnd is greater than ssthresh, upon each SACK arrival
      that advances the Cumulative TSN Ack Point, increase
      partial_bytes_acked by the total number of bytes of all new chunks
      acknowledged in that SACK, including chunks acknowledged by the new
      Cumulative TSN Ack and by Gap Ack Blocks.

   o  When partial_bytes_acked is equal to or greater than cwnd and
      before the arrival of the SACK, the sender had cwnd or more bytes
      of data outstanding (i.e., before arrival of the SACK, flightsize
      was greater than or equal to cwnd), reset partial_bytes_acked to
      (partial_bytes_acked - cwnd) and then increase cwnd by MTU.

   o  As in slow start, when the sender does not transmit DATA on a
      given transport address, the cwnd of that transport address
      should be adjusted to max(cwnd/2, 4*MTU) per RTO.

   o  When all of the data transmitted by the sender has been
      acknowledged by the receiver, partial_bytes_acked is reset to 0.
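   The partial_bytes_acked procedure above can be sketched per
   destination as follows.  This is a minimal Python illustration; the
   class and method names are hypothetical, and the caller is assumed
   to supply the byte count of all new chunks acknowledged by the SACK
   and the flightsize measured before the SACK arrived.  The sketch
   subtracts the cwnd value in effect before cwnd grows by one MTU, and
   it also includes the idle-period adjustment from the bullet above.

```python
from dataclasses import dataclass


@dataclass
class PathCongestionState:
    """Per-destination congestion state (names are illustrative)."""
    cwnd: int
    ssthresh: int
    mtu: int
    partial_bytes_acked: int = 0

    def on_sack(self, newly_acked_bytes: int, flightsize_before_sack: int,
                cum_ack_advanced: bool) -> None:
        """Congestion-avoidance update for one incoming SACK."""
        if self.cwnd <= self.ssthresh or not cum_ack_advanced:
            return
        # Bytes of all new chunks acked via the Cumulative TSN Ack and Gap Ack Blocks.
        self.partial_bytes_acked += newly_acked_bytes
        if (self.partial_bytes_acked >= self.cwnd
                and flightsize_before_sack >= self.cwnd):
            # Subtract the cwnd in effect before the increase, then grow by one MTU.
            self.partial_bytes_acked -= self.cwnd
            self.cwnd += self.mtu

    def on_all_data_acked(self) -> None:
        self.partial_bytes_acked = 0

    def on_idle_rto(self) -> None:
        """Applied once per RTO while no DATA is sent on this address."""
        self.cwnd = max(self.cwnd // 2, 4 * self.mtu)
```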

7.2.3.  Congestion Control

   Upon detection of packet losses from SACK (see Section 7.2.4), an
   endpoint should do the following:

      ssthresh = max(cwnd/2, 4*MTU)
      cwnd = ssthresh
      partial_bytes_acked = 0

   In other words, a packet loss causes cwnd to be cut in half (but not
   below 4*MTU).

   When the T3-rtx timer expires on an address, SCTP should perform slow
   start by:

      ssthresh = max(cwnd/2, 4*MTU)
      cwnd = 1*MTU

   and ensure that no more than one SCTP packet will be in flight for
   that address until the endpoint receives acknowledgement for
   successful delivery of data to that address.
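   Both reactions in this section can be sketched as pure Python
   functions.  The names are hypothetical, cwnd/2 is taken as integer
   (floor) division, and enforcing the one-packet-in-flight limit after
   a T3-rtx expiry is left to the caller.

```python
from typing import NamedTuple, Tuple


class SackLossUpdate(NamedTuple):
    ssthresh: int
    cwnd: int
    partial_bytes_acked: int


def on_loss_from_sack(cwnd: int, mtu: int) -> SackLossUpdate:
    """Loss detected from SACK gap reports (Section 7.2.4)."""
    ssthresh = max(cwnd // 2, 4 * mtu)  # integer halving, floored at 4*MTU
    return SackLossUpdate(ssthresh=ssthresh, cwnd=ssthresh, partial_bytes_acked=0)


def on_t3_rtx_expiry(cwnd: int, mtu: int) -> Tuple[int, int]:
    """Return (ssthresh, cwnd) after T3-rtx expires on an address.

    The caller must also keep at most one SCTP packet in flight to that
    address until an acknowledgement for data sent there arrives.
    """
    ssthresh = max(cwnd // 2, 4 * mtu)
    return ssthresh, mtu
```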

7.2.4.  Fast Retransmit on Gap Reports

   In the absence of data loss, an endpoint performs delayed
   acknowledgement.  However, whenever an endpoint notices a hole in the
   arriving TSN sequence, it SHOULD start sending a SACK back every time
   a packet arrives carrying data until the hole is filled.

   Whenever an endpoint receives a SACK that indicates that some TSNs
   are missing, it SHOULD wait for two further miss indications (via
   subsequent SACKs for a total of three miss indications) on the same
   TSNs before taking action with regard to Fast Retransmit.

   Miss indications SHOULD follow the HTNA (Highest TSN Newly
   Acknowledged) algorithm.  For each incoming SACK, miss indications
   are incremented only for missing TSNs prior to the highest TSN newly
   acknowledged in the SACK.  A newly acknowledged DATA chunk is one not
   previously acknowledged in a SACK.  If an endpoint is in Fast
   Recovery and a SACK arrives that advances the Cumulative TSN Ack
   Point, the miss indications are incremented for all TSNs reported
   missing in the SACK.
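   A minimal Python sketch of HTNA-based miss-indication counting
   follows.  It assumes hypothetical inputs: the caller has already
   decoded the SACK into the TSNs it reports missing and the TSNs it
   newly acknowledges, and TSNs are compared as plain integers (a real
   implementation must use serial-number arithmetic to handle TSN
   wraparound).  Resetting a TSN's counter once Fast Retransmit starts,
   or once the TSN is acknowledged, is left to the caller.

```python
from typing import Dict, Iterable, List, Set

FAST_RTX_THRESHOLD = 3  # third miss indication triggers Fast Retransmit


def apply_sack_miss_indications(missing: Iterable[int], newly_acked: Set[int],
                                miss_counts: Dict[int, int],
                                in_fast_recovery: bool,
                                cum_ack_advanced: bool) -> List[int]:
    """Update miss indications for one SACK using HTNA.

    missing: TSNs this SACK reports as missing (holes below its Gap Ack Blocks).
    newly_acked: TSNs acknowledged by this SACK that no earlier SACK acknowledged.
    Returns the TSNs whose count just reached FAST_RTX_THRESHOLD.
    """
    if in_fast_recovery and cum_ack_advanced:
        eligible = list(missing)  # count every TSN reported missing
    elif newly_acked:
        highest_newly_acked = max(newly_acked)  # plain int compare, no wraparound
        eligible = [tsn for tsn in missing if tsn < highest_newly_acked]
    else:
        eligible = []  # nothing newly acknowledged: no miss indications
    triggered = []
    for tsn in eligible:
        miss_counts[tsn] = miss_counts.get(tsn, 0) + 1
        if miss_counts[tsn] == FAST_RTX_THRESHOLD:
            triggered.append(tsn)
    return triggered
```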

   When the third consecutive miss indication is received for one or
   more TSNs, the data sender shall do the following:

   1)  Mark the DATA chunk(s) with three miss indications for
       retransmission.

   2)  If not in Fast Recovery, adjust the ssthresh and cwnd of the
       destination address(es) to which the missing DATA chunks were
       last sent, according to the formula described in Section 7.2.3.

   3)  Determine how many of the earliest (i.e., lowest TSN) DATA chunks
       marked for retransmission will fit into a single packet, subject
       to constraint of the path MTU of the destination transport
       address to which the packet is being sent.  Call this value K.
       Retransmit those K DATA chunks in a single packet.  When a Fast
       Retransmit is being performed, the sender SHOULD ignore the value
       of cwnd and SHOULD NOT delay retransmission for this single
       packet.

   4)  Restart the T3-rtx timer only if the last SACK acknowledged the
       lowest outstanding TSN sent to that address, or the
       endpoint is retransmitting the first outstanding DATA chunk sent
       to that address.

   5)  Mark the DATA chunk(s) as being fast retransmitted and thus
       ineligible for a subsequent Fast Retransmit.  Those TSNs marked
       for retransmission due to the Fast-Retransmit algorithm that did
       not fit in the sent datagram carrying K other TSNs are also
       marked as ineligible for a subsequent Fast Retransmit.  However,
       as they are marked for retransmission they will be retransmitted
       later on as soon as cwnd allows.

   6)  If not in Fast Recovery, enter Fast Recovery and mark the highest
       outstanding TSN as the Fast Recovery exit point.  When a SACK
       acknowledges all TSNs up to and including this exit point, Fast
       Recovery is exited.  While in Fast Recovery, the ssthresh and
       cwnd SHOULD NOT change for any destinations due to a subsequent
       Fast Recovery event (i.e., one SHOULD NOT reduce the cwnd further
       due to a subsequent Fast Retransmit).
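   Step 3 amounts to packing the lowest-TSN marked chunks into one
   packet until the next one no longer fits.  The Python sketch below
   assumes hypothetical inputs: each marked chunk is given as a (TSN,
   size) pair whose size already includes the DATA chunk header and
   padding, and packet_overhead covers the IP header and SCTP common
   header.  Marked chunks it does not select stay marked, become
   ineligible for another Fast Retransmit (step 5), and wait for cwnd.

```python
from typing import Iterable, List, Tuple


def select_fast_retransmit_tsns(marked: Iterable[Tuple[int, int]],
                                path_mtu: int, packet_overhead: int) -> List[int]:
    """Pick the K lowest-TSN marked chunks that fit in one packet.

    marked: (tsn, chunk_size) pairs; chunk_size includes the DATA chunk
    header and padding.  packet_overhead: IP plus SCTP common header (assumed).
    """
    budget = path_mtu - packet_overhead
    selected = []
    for tsn, size in sorted(marked):  # earliest (lowest) TSN first
        if size > budget:
            break  # the next-earliest chunk does not fit; K chunks selected
        selected.append(tsn)
        budget -= size
    return selected
```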

   Note: Before the above adjustments, if the received SACK also
   acknowledges new DATA chunks and advances the Cumulative TSN Ack
   Point, the cwnd adjustment rules defined in Section 7.2.1 and Section
   7.2.2 must be applied first.

   A straightforward implementation of the above keeps a counter for
   each TSN hole reported by a SACK.  The counter increments for each
   consecutive SACK reporting the TSN hole.  After reaching 3 and
   starting the Fast-Retransmit procedure, the counter resets to 0.

   Because cwnd in SCTP indirectly bounds the number of outstanding
   TSNs, the effect of TCP Fast Recovery is achieved automatically with
   no adjustment to the congestion control window size.

7.3.  Path MTU Discovery

   [RFC1191] and [RFC1981] specify Path MTU Discovery for IPv4 and IPv6,
   respectively, and [RFC4821] specifies "Packetization Layer Path MTU
   Discovery".  With these mechanisms, an endpoint maintains an
   estimate of the maximum transmission unit (MTU) along a given
   Internet path and refrains from sending packets along that path that
   exceed the MTU, other than occasional attempts to probe for a change
   in the Path MTU (PMTU).  [RFC4821] is thorough in its discussion of
   the MTU discovery mechanism and strategies for determining the
   current end-to-end MTU setting as well as detecting changes in this
   value.

   An endpoint SHOULD apply these techniques, and SHOULD do so on a
   per-destination-address basis.

   There are two important SCTP-specific points regarding Path MTU
   discovery:

   1)  SCTP associations can span multiple addresses.  An endpoint MUST
       maintain separate MTU estimates for each destination address of
       its peer.

   2)  The sender should track an association PMTU that will be the
       smallest PMTU discovered for all of the peer's destination
       addresses.  When fragmenting messages into multiple parts, this
       association PMTU should be used to calculate the size of each
       fragment.  This will allow retransmissions to be seamlessly sent
       to an alternate address without encountering IP fragmentation.
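   Point 2) can be illustrated with a short Python sketch.  It assumes
   IPv4 without options (a 20-byte IP header, the 12-byte SCTP common
   header, and a 16-byte DATA chunk header per fragment), one DATA chunk
   per packet, and ignores chunk padding; the function names are
   hypothetical.

```python
from typing import Dict, List

# Assumed per-fragment overhead for IPv4 without options:
# 20-byte IP header + 12-byte SCTP common header + 16-byte DATA chunk header.
IPV4_DATA_OVERHEAD = 20 + 12 + 16


def association_pmtu(path_pmtus: Dict[str, int]) -> int:
    """Smallest PMTU discovered across all of the peer's destination addresses."""
    return min(path_pmtus.values())


def fragment_sizes(message_len: int, assoc_pmtu: int,
                   overhead: int = IPV4_DATA_OVERHEAD) -> List[int]:
    """User-data bytes per fragment so each DATA chunk fits on any path."""
    max_payload = assoc_pmtu - overhead
    sizes = []
    remaining = message_len
    while remaining > 0:
        sizes.append(min(remaining, max_payload))
        remaining -= sizes[-1]
    return sizes
```

   For example, with per-path PMTUs of 1500 and 1400 bytes, the
   association PMTU is 1400, and a 3000-byte message is split into
   fragments of 1352, 1352, and 296 bytes.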

8.  Fault Management

8.1.  Endpoint Failure Detection

   An endpoint shall keep a counter on the total number of consecutive
   retransmissions to its peer (this includes retransmissions to all the
   destination transport addresses of the peer if it is multi-homed),
   including unacknowledged HEARTBEAT chunks.  If the value of this
   counter exceeds the limit indicated in the protocol parameter
   'Association.Max.Retrans', the endpoint shall consider the peer
   endpoint unreachable and shall stop transmitting any more data to it
   (and thus the association enters the CLOSED state).  In addition, the
   endpoint MAY report the failure to the upper layer and optionally
   report back all outstanding user data remaining in its outbound
   queue.  The association is automatically closed when the peer
   endpoint becomes unreachable.

   The counter shall be reset each time a DATA chunk sent to that peer
   endpoint is acknowledged (by the reception of a SACK) or a HEARTBEAT
   ACK is received from the peer endpoint.
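   The association-wide counter can be sketched as a small Python
   class.  The names are hypothetical, and the caller is assumed to
   invoke on_retransmission for every retransmission to any of the
   peer's addresses and for every unacknowledged HEARTBEAT.

```python
from dataclasses import dataclass


@dataclass
class AssociationErrorCounter:
    """Consecutive-retransmission counter for the whole association."""
    max_retrans: int          # protocol parameter 'Association.Max.Retrans'
    count: int = 0
    peer_unreachable: bool = False

    def on_retransmission(self) -> None:
        """Any retransmission to any peer address, or an unacknowledged HEARTBEAT."""
        self.count += 1
        if self.count > self.max_retrans:
            # Stop sending data; the association enters the CLOSED state.
            self.peer_unreachable = True

    def on_peer_ack(self) -> None:
        """A DATA chunk was acknowledged by a SACK, or a HEARTBEAT ACK arrived."""
        self.count = 0
```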

8.2.  Path Failure Detection

   When its peer endpoint is multi-homed, an endpoint should keep an
   error counter for each of the destination transport addresses of the
   peer endpoint.

   Each time the T3-rtx timer expires on any address, or when a
   HEARTBEAT sent to an idle address is not acknowledged within an RTO,
   the error counter of that destination address will be incremented.
   When the value in the error counter exceeds the protocol parameter
   'Path.Max.Retrans' of that destination address, the endpoint should
   mark the destination transport address as inactive, and a
   notification SHOULD be sent to the upper layer.

   When an outstanding TSN is acknowledged, or a HEARTBEAT sent to a
   destination address is acknowledged with a HEARTBEAT ACK, the
   endpoint shall clear the error counter of the destination transport
   address to which the DATA chunk was last sent (or the HEARTBEAT was
   sent).  When the peer
   endpoint is multi-homed and the last chunk sent to it was a
   retransmission to an alternate address, there exists an ambiguity as
   to whether or not the acknowledgement should be credited to the
   address of the last chunk sent.  However, this ambiguity does not
   seem to bear any significant consequence to SCTP behavior.  If this
   ambiguity is undesirable, the transmitter may choose not to clear the
   error counter if the last chunk sent was a retransmission.

   Note: When configuring the SCTP endpoint, the user should avoid
   having the value of 'Association.Max.Retrans' larger than the
   sum of the 'Path.Max.Retrans' values of all the destination addresses
   for the remote endpoint.  Otherwise, all the destination addresses
   may become inactive while the endpoint still considers the peer
   endpoint reachable.  When this condition occurs, how SCTP chooses to
   function is implementation specific.
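   A configuration check for this note might look like the following
   Python sketch; the function name is hypothetical, and the
   per-destination limits are passed as a mapping keyed by address.

```python
from typing import Mapping


def retrans_limits_consistent(association_max_retrans: int,
                              path_max_retrans: Mapping[str, int]) -> bool:
    """True if Association.Max.Retrans does not exceed the sum of Path.Max.Retrans."""
    return association_max_retrans <= sum(path_max_retrans.values())
```

   With the suggested defaults of 10 for 'Association.Max.Retrans' and 5
   for 'Path.Max.Retrans', a peer with two destination addresses passes
   this check, while a single-homed peer does not.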

   When the primary path is marked inactive (due to excessive
   retransmissions, for instance), the sender MAY automatically transmit
   new packets to an alternate destination address if one exists and is
   active.  If more than one alternate address is active when the
   primary path is marked inactive, only ONE transport address SHOULD be
   chosen and used as the new destination transport address.

8.3.  Path Heartbeat

   By default, an SCTP endpoint SHOULD monitor the reachability of the
   idle destination transport address(es) of its peer by sending a
   HEARTBEAT chunk periodically to the destination transport
   address(es).  HEARTBEAT sending MAY begin upon reaching the
   ESTABLISHED state and is discontinued after sending either SHUTDOWN
   or SHUTDOWN ACK.  A receiver of a HEARTBEAT MUST respond to a
   HEARTBEAT with a HEARTBEAT ACK after entering the COOKIE-ECHOED state
   (INIT sender) or the ESTABLISHED state (INIT receiver), up until
   reaching the SHUTDOWN-SENT state (SHUTDOWN sender) or the
   SHUTDOWN-ACK-SENT state (SHUTDOWN receiver).

   A destination transport address is considered "idle" if, within the
   current heartbeat period of that address, neither a new chunk that
   can be used for updating the path RTT (usually including
   first-transmission DATA, INIT, COOKIE ECHO, HEARTBEAT, etc.) nor a
   HEARTBEAT has been sent to it.  This applies to both active and
   inactive destination addresses.

   The upper layer can optionally initiate the following functions:

   A) Disable heartbeat on a specific destination transport address of a
      given association,

   B) Change the HB.interval,

   C) Re-enable heartbeat on a specific destination transport address of
      a given association, and

   D) Request an on-demand HEARTBEAT on a specific destination transport
      address of a given association.

   The endpoint should increment the respective error counter of the
   destination transport address each time a HEARTBEAT is sent to that
   address and not acknowledged within one RTO.

   When the value of this counter exceeds the protocol parameter
   'Path.Max.Retrans', the endpoint should mark the corresponding
   destination address as inactive if it is not so marked, and may
   optionally report to the upper layer the change of reachability of
   this destination address.  After this, the endpoint should continue
   sending HEARTBEATs to this destination address but should stop
   increasing the counter.
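   The per-destination error counter of Sections 8.2 and 8.3 can be
   sketched as follows.  This Python sketch uses the "exceeds" threshold
   from Section 8.2; the class and method names are hypothetical, and
   notifying the upper layer is left to the caller via the return value.

```python
from dataclasses import dataclass


@dataclass
class DestinationState:
    """Per-destination error counter; names are illustrative."""
    path_max_retrans: int      # protocol parameter 'Path.Max.Retrans'
    error_count: int = 0
    active: bool = True

    def on_path_error(self) -> bool:
        """T3-rtx expiry, or a HEARTBEAT not acknowledged within one RTO.

        Returns True if this call marked the address inactive.
        """
        if not self.active:
            return False  # keep sending HEARTBEATs, but stop increasing the counter
        self.error_count += 1
        if self.error_count > self.path_max_retrans:
            self.active = False
            return True  # the caller may notify the upper layer here
        return False

    def on_tsn_acked(self) -> None:
        self.error_count = 0

    def on_heartbeat_ack(self) -> None:
        self.error_count = 0
        self.active = True
```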

   The sender of the HEARTBEAT chunk should include in the Heartbeat
   Information field of the chunk the current time when the packet is
   sent out and the destination address to which the packet is sent.

   IMPLEMENTATION NOTE: An alternative implementation of the heartbeat
   mechanism that can be used is to increment the error counter variable
   every time a HEARTBEAT is sent to a destination.  Whenever a
   HEARTBEAT ACK arrives, the sender SHOULD clear the error counter of
   the destination that the HEARTBEAT was sent to.  This in effect
   clears the error previously counted for that HEARTBEAT (and any other
   error counts as well).

   The receiver of the HEARTBEAT should immediately respond with a
   HEARTBEAT ACK that contains the Heartbeat Information TLV, together
   with any other received TLVs, copied unchanged from the received
   HEARTBEAT chunk.

   Upon the receipt of the HEARTBEAT ACK, the sender of the HEARTBEAT
   should clear the error counter of the destination transport address
   to which the HEARTBEAT was sent, and mark the destination transport
   address as active if it is not so marked.  The endpoint may
   optionally report to the upper layer when an inactive destination
   address is marked as active due to the reception of the latest
   HEARTBEAT ACK.  The receiver of the HEARTBEAT ACK must also clear the
   overall association error counter (as defined in Section 8.1).

   The receiver of the HEARTBEAT ACK should also perform an RTT
   measurement for that destination transport address using the time
   value carried in the HEARTBEAT ACK chunk.
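   The heartbeat RTT measurement can be combined with rules C1 through
   C3 of Section 6.3.1, as in the Python sketch below.  It assumes the
   suggested RTO.Alpha of 1/8, RTO.Beta of 1/4, and RTO.Initial of 3
   seconds; it models the Heartbeat Information as a hypothetical
   (destination, send time) record rather than an opaque TLV, and it
   omits the RTO.Min and RTO.Max bounds.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional

RTO_ALPHA = 1 / 8   # suggested RTO.Alpha
RTO_BETA = 1 / 4    # suggested RTO.Beta


class HeartbeatInfo(NamedTuple):
    """Hypothetical contents of the Heartbeat Information field."""
    dest: str
    sent_at: float


@dataclass
class RtoEstimator:
    """SRTT/RTTVAR/RTO per rules C1-C3 (no RTO.Min/RTO.Max clamping)."""
    rto: float = 3.0               # RTO.Initial until the first measurement (C1)
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def update(self, r: float) -> None:
        if self.srtt is None or self.rttvar is None:   # first measurement (C2)
            self.srtt, self.rttvar = r, r / 2
        else:                                          # later measurements (C3)
            # RTTVAR is updated with the SRTT value from before this update.
            self.rttvar = (1 - RTO_BETA) * self.rttvar + RTO_BETA * abs(self.srtt - r)
            self.srtt = (1 - RTO_ALPHA) * self.srtt + RTO_ALPHA * r
        self.rto = self.srtt + 4 * self.rttvar


def on_heartbeat_ack(info: HeartbeatInfo, estimators: Dict[str, RtoEstimator],
                     now: float) -> None:
    """Measure R from the echoed send time and update that destination's RTO."""
    estimators[info.dest].update(now - info.sent_at)
```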

   On an idle destination address that is allowed to heartbeat, it is
   recommended that a HEARTBEAT chunk be sent once per RTO of that
   destination address plus the protocol parameter 'HB.interval', with
   jittering of +/- 50% of the RTO value, and exponential backoff of the
   RTO if the previous HEARTBEAT is unanswered.
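   The recommended heartbeat timing can be sketched in Python as
   follows.  The function names are hypothetical; the jitter is drawn
   uniformly from +/- 50% of the RTO, and the backoff doubles the RTO
   and caps it at an RTO.Max value assumed to be supplied by the caller.

```python
import random


def next_heartbeat_delay(rto: float, hb_interval: float,
                         rng: random.Random) -> float:
    """Delay until the next HEARTBEAT on an idle destination.

    RTO plus HB.interval, with the RTO jittered by +/- 50%.
    """
    return rto + rng.uniform(-0.5 * rto, 0.5 * rto) + hb_interval


def back_off_rto(rto: float, rto_max: float) -> float:
    """Exponential backoff when the previous HEARTBEAT went unanswered."""
    return min(rto * 2, rto_max)
```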

   A primitive is provided for the SCTP user to change the HB.interval
   and turn on or off the heartbeat on a given destination address.  The
   heartbeat interval set by the SCTP user is added to the RTO of that
   destination (including any exponential backoff).  Only one heartbeat
   should be sent each time the heartbeat timer expires (if multiple
   destinations are idle).  It is an implementation decision on how to
   choose which of the candidate idle destinations to heartbeat to (if
   more than one destination is idle).

   Note: When tuning the heartbeat interval, there is a side effect that
   SHOULD be taken into account.  When this value is increased, i.e.,
   HEARTBEATs are sent less frequently, the detection of lost ABORT
   messages takes longer as well.  If a peer endpoint ABORTs the
   association for any reason and the ABORT chunk is lost, the local
   endpoint will only discover the lost ABORT by sending a DATA chunk or
   HEARTBEAT chunk (thus causing the peer to send another ABORT).  This
   must be considered when tuning the HEARTBEAT timer.  If the HEARTBEAT
   is disabled, only sending DATA to the association will discover a
   lost ABORT from the peer.

8.4.  Handle "Out of the Blue" Packets

   An SCTP packet is called an "out of the blue" (OOTB) packet if it is
   correctly formed (i.e., passed the receiver's CRC32c check; see
   Section 6.8), but the receiver is not able to identify the
   association to which this packet belongs.

   The receiver of an OOTB packet MUST do the following:

   1)  If the OOTB packet is to or from a non-unicast address, a
       receiver SHOULD silently discard the packet.  Otherwise,

   2)  If the OOTB packet contains an ABORT chunk, the receiver MUST
       silently discard the OOTB packet and take no further action.
       Otherwise,

   3)  If the packet contains an INIT chunk with a Verification Tag set
       to '0', process it as described in Section 5.1.  If, for whatever
       reason, the INIT cannot be processed normally and an ABORT has to
       be sent in response, the Verification Tag of the packet
       containing the ABORT chunk MUST be the Initiate Tag of the
       received INIT chunk, and the T bit of the ABORT chunk has to be
       set to 0, indicating that the Verification Tag is NOT reflected.

   4)  If the packet contains a COOKIE ECHO as the first chunk, process
       it as described in Section 5.1.  Otherwise,

   5)  If the packet contains a SHUTDOWN ACK chunk, the receiver should
       respond to the sender of the OOTB packet with a SHUTDOWN
       COMPLETE.  When sending the SHUTDOWN COMPLETE, the receiver of
       the OOTB packet must fill in the Verification Tag field of the
       outbound packet with the Verification Tag received in the
       SHUTDOWN ACK and set the T bit in the Chunk Flags to indicate
       that the Verification Tag is reflected.  Otherwise,

   6)  If the packet contains a SHUTDOWN COMPLETE chunk, the receiver
       should silently discard the packet and take no further action.
       Otherwise,

   7)  If the packet contains a "Stale Cookie" ERROR or a COOKIE ACK,
       the SCTP packet should be silently discarded.  Otherwise,

   8)  The receiver should respond to the sender of the OOTB packet with
       an ABORT.  When sending the ABORT, the receiver of the OOTB
       packet MUST fill in the Verification Tag field of the outbound
       packet with the value found in the Verification Tag field of the
       OOTB packet and set the T bit in the Chunk Flags to indicate that
       the Verification Tag is reflected.  After sending this ABORT, the
       receiver of the OOTB packet shall discard the OOTB packet and
       take no further action.
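   The eight OOTB rules form an ordered decision list, sketched below in
   Python.  Chunk types are represented by hypothetical string labels,
   the returned strings merely describe the action, and the ABORT that
   rule 3 may require for an unprocessable INIT (with the T bit cleared)
   is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OotbPacket:
    """Hypothetical view of a well-formed packet that matched no association."""
    chunk_types: List[str]     # e.g. ["DATA"], in packet order
    verification_tag: int
    unicast: bool = True       # False if sent to or from a non-unicast address


def handle_ootb(pkt: OotbPacket) -> str:
    types = pkt.chunk_types
    if not pkt.unicast:                                          # rule 1
        return "discard"
    if "ABORT" in types:                                         # rule 2
        return "discard"
    if "INIT" in types and pkt.verification_tag == 0:            # rule 3
        return "process INIT (Section 5.1)"
    if types and types[0] == "COOKIE ECHO":                      # rule 4
        return "process COOKIE ECHO (Section 5.1)"
    if "SHUTDOWN ACK" in types:                                  # rule 5: reflect tag
        return f"reply SHUTDOWN COMPLETE vtag={pkt.verification_tag} T=1"
    if "SHUTDOWN COMPLETE" in types:                             # rule 6
        return "discard"
    if "ERROR(Stale Cookie)" in types or "COOKIE ACK" in types:  # rule 7
        return "discard"
    return f"reply ABORT vtag={pkt.verification_tag} T=1"        # rule 8: reflect tag
```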

8.5.  Verification Tag

   The Verification Tag rules defined in this section apply when sending
   or receiving SCTP packets that do not contain an INIT, SHUTDOWN
   COMPLETE, COOKIE ECHO (see Section 5.1), ABORT, or SHUTDOWN ACK
   chunk.  The rules for sending and receiving SCTP packets containing
   one of these chunk types are discussed separately in Section 8.5.1.

   When sending an SCTP packet, the endpoint MUST fill in the
   Verification Tag field of the outbound packet with the tag value in
   the Initiate Tag parameter of the INIT or INIT ACK received from its
   peer.

   When receiving an SCTP packet, the endpoint MUST ensure that the
   value in the Verification Tag field of the received SCTP packet
   matches its own tag.  If the received Verification Tag value does not
   match the receiver's own tag value, the receiver shall silently
   discard the packet and shall not process it any further except for
   those cases listed in Section 8.5.1 below.

8.5.1.  Exceptions in Verification Tag Rules

   A) Rules for packet carrying INIT:

   -   The sender MUST set the Verification Tag of the packet to 0.

   -   When an endpoint receives an SCTP packet with the Verification
       Tag set to 0, it should verify that the packet contains only an
       INIT chunk.  Otherwise, the receiver MUST silently discard the
       packet.

   B) Rules for packet carrying ABORT:

   -   The endpoint MUST always fill in the Verification Tag field of
       the outbound packet with the destination endpoint's tag value, if
       it is known.

   -   If the ABORT is sent in response to an OOTB packet, the endpoint
       MUST follow the procedure described in Section 8.4.

   -   The receiver of an ABORT MUST accept the packet if the
       Verification Tag field of the packet matches its own tag and the
       T bit is not set, OR if the Verification Tag field is set to its
       peer's tag and the T bit is set in the Chunk Flags.  Otherwise,
       the receiver MUST silently discard the packet and take no further
       action.

   C) Rules for packet carrying SHUTDOWN COMPLETE:

   -   When sending a SHUTDOWN COMPLETE, if the receiver of the SHUTDOWN
       ACK has a TCB, then the destination endpoint's tag MUST be used,
       and the T bit MUST NOT be set.  Only where no TCB exists should
       the sender use the Verification Tag from the SHUTDOWN ACK, in
       which case it MUST set the T bit.

   -   The receiver of a SHUTDOWN COMPLETE shall accept the packet if
       the Verification Tag field of the packet matches its own tag and
       the T bit is not set, OR if the Verification Tag field is set to
       its peer's tag and the T bit is set in the Chunk Flags.
       Otherwise, the receiver MUST silently discard the packet and take
       no further action.  An endpoint MUST ignore the SHUTDOWN COMPLETE
       if it is not in the SHUTDOWN-ACK-SENT state.

   D) Rules for packet carrying a COOKIE ECHO:

   -   When sending a COOKIE ECHO, the endpoint MUST use the value of
       the Initiate Tag received in the INIT ACK.

   -   The receiver of a COOKIE ECHO follows the procedures in Section
       5.

   E) Rules for packet carrying a SHUTDOWN ACK:

   -   If the receiver is in the COOKIE-ECHOED or COOKIE-WAIT state, the
       procedures in Section 8.4 SHOULD be followed; in other words, the
       packet should be treated as an Out Of The Blue packet.
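   The Verification Tag checks of Sections 8.5 and 8.5.1 reduce to a few
   comparisons, sketched below in Python.  Here own_tag is the tag this
   endpoint expects on incoming packets and peer_tag is the tag it
   places on outgoing packets; both names are hypothetical, and the
   SHUTDOWN-ACK-SENT state check for SHUTDOWN COMPLETE is left to the
   caller.

```python
from typing import Optional, Tuple


def accept_regular(packet_vtag: int, own_tag: int) -> bool:
    """Packets without INIT, SHUTDOWN COMPLETE, COOKIE ECHO, ABORT, or SHUTDOWN ACK."""
    return packet_vtag == own_tag


def accept_abort_or_shutdown_complete(packet_vtag: int, t_bit: bool,
                                      own_tag: int,
                                      peer_tag: Optional[int]) -> bool:
    """Acceptance test shared by ABORT and SHUTDOWN COMPLETE (rules B and C)."""
    if t_bit:
        # T bit set: the tag we use toward the peer was reflected back to us.
        return peer_tag is not None and packet_vtag == peer_tag
    return packet_vtag == own_tag


def shutdown_complete_tag(tcb_exists: bool, peer_tag: Optional[int],
                          shutdown_ack_vtag: int) -> Tuple[int, bool]:
    """(Verification Tag, T bit) for an outgoing SHUTDOWN COMPLETE (rule C)."""
    if tcb_exists and peer_tag is not None:
        return peer_tag, False
    return shutdown_ack_vtag, True
```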
