Internet Engineering Task Force (IETF) G. Fairhurst, Ed. Request for Comments: 8095 University of Aberdeen Category: Informational B. Trammell, Ed. ISSN: 2070-1721 M. Kuehlewind, Ed. ETH Zurich March 2017 Services Provided by IETF Transport Protocols and Congestion Control MechanismsAbstract
This document describes, surveys, and classifies the protocol mechanisms provided by existing IETF protocols, as background for determining a common set of transport services. It examines the Transmission Control Protocol (TCP), Multipath TCP, the Stream Control Transmission Protocol (SCTP), the User Datagram Protocol (UDP), UDP-Lite, the Datagram Congestion Control Protocol (DCCP), the Internet Control Message Protocol (ICMP), the Real-Time Transport Protocol (RTP), File Delivery over Unidirectional Transport / Asynchronous Layered Coding (FLUTE/ALC) for Reliable Multicast, NACK- Oriented Reliable Multicast (NORM), Transport Layer Security (TLS), Datagram TLS (DTLS), and the Hypertext Transport Protocol (HTTP), when HTTP is used as a pseudotransport. This survey provides background for the definition of transport services within the TAPS working group. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8095.
Copyright Notice Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.Table of Contents
1. Introduction ....................................................4 1.1. Overview of Transport Features .............................4 2. Terminology .....................................................5 3. Existing Transport Protocols ....................................6 3.1. Transport Control Protocol (TCP) ...........................6 3.1.1. Protocol Description ................................6 3.1.2. Interface Description ...............................8 3.1.3. Transport Features ..................................9 3.2. Multipath TCP (MPTCP) .....................................10 3.2.1. Protocol Description ...............................10 3.2.2. Interface Description ..............................10 3.2.3. Transport Features .................................11 3.3. User Datagram Protocol (UDP) ..............................11 3.3.1. Protocol Description ...............................11 3.3.2. Interface Description ..............................12 3.3.3. Transport Features .................................13 3.4. Lightweight User Datagram Protocol (UDP-Lite) .............13 3.4.1. Protocol Description ...............................13 3.4.2. Interface Description ..............................14 3.4.3. Transport Features .................................14 3.5. Stream Control Transmission Protocol (SCTP) ...............14 3.5.1. Protocol Description ...............................15 3.5.2. Interface Description ..............................17 3.5.3. Transport Features .................................19 3.6. Datagram Congestion Control Protocol (DCCP) ...............20 3.6.1. Protocol Description ...............................21 3.6.2. Interface Description ..............................22 3.6.3. Transport Features .................................22
3.7. Transport Layer Security (TLS) and Datagram TLS (DTLS) as a Pseudotransport ...............................23 3.7.1. Protocol Description ...............................23 3.7.2. Interface Description ..............................24 3.7.3. Transport Features .................................25 3.8. Real-Time Transport Protocol (RTP) ........................26 3.8.1. Protocol Description ...............................26 3.8.2. Interface Description ..............................27 3.8.3. Transport Features .................................27 3.9. Hypertext Transport Protocol (HTTP) over TCP as a Pseudotransport ...........................................28 3.9.1. Protocol Description ...............................28 3.9.2. Interface Description ..............................29 3.9.3. Transport Features .................................30 3.10. File Delivery over Unidirectional Transport / Asynchronous Layered Coding (FLUTE/ALC) for Reliable Multicast .......................................31 3.10.1. Protocol Description ..............................31 3.10.2. Interface Description .............................33 3.10.3. Transport Features ................................33 3.11. NACK-Oriented Reliable Multicast (NORM) ..................34 3.11.1. Protocol Description ..............................34 3.11.2. Interface Description .............................35 3.11.3. Transport Features ................................36 3.12. Internet Control Message Protocol (ICMP) .................36 3.12.1. Protocol Description ..............................37 3.12.2. Interface Description .............................37 3.12.3. Transport Features ................................38 4. Congestion Control .............................................38 5. Transport Features .............................................39 6. IANA Considerations ............................................42 7. Security Considerations ........................................42 8. Informative References .........................................42 Acknowledgments ...................................................53 Contributors ......................................................53 Authors' Addresses ................................................54
1. Introduction
Internet applications make use of the services provided by a transport protocol, such as TCP (a reliable, in-order stream protocol) or UDP (an unreliable datagram protocol). We use the term "transport service" to mean the end-to-end service provided to an application by the transport layer. That service can only be provided correctly if information about the intended usage is supplied from the application. The application may determine this information at design time, compile time, or run time, and may include guidance on whether a feature is required, a preference by the application, or something in between. Examples of features of transport services are reliable delivery, ordered delivery, content privacy to in-path devices, and integrity protection. The IETF has defined a wide variety of transport protocols beyond TCP and UDP, including SCTP, DCCP, MPTCP, and UDP-Lite. Transport services may be provided directly by these transport protocols or layered on top of them using protocols such as WebSockets (which runs over TCP), RTP (over TCP or UDP) or WebRTC data channels (which run over SCTP over DTLS over UDP or TCP). Services built on top of UDP or UDP-Lite typically also need to specify additional mechanisms, including a congestion control mechanism (such as NewReno [RFC6582], TCP-Friendly Rate Control (TFRC) [RFC5348], or Low Extra Delay Background Transport (LEDBAT) [RFC6817]). This extends the set of available transport services beyond those provided to applications by TCP and UDP. The transport protocols described in this document provide a basis for the definition of transport services provided by common protocols, as background for the TAPS working group. The protocols listed here were chosen to help expose as many potential transport services as possible and are not meant to be a comprehensive survey or classification of all transport protocols.1.1. Overview of Transport Features
Transport protocols can be differentiated by the features of the services they provide. Some of these provided features are closely related to basic control function that a protocol needs to work over a network path, such as addressing. The number of participants in a given association also determines its applicability: a connection can be between endpoints (unicast), to one of multiple endpoints (anycast), or simultaneously to multiple endpoints (multicast). Unicast protocols usually support bidirectional communication, while multicast is generally
unidirectional. Another feature is whether a transport requires a control exchange across the network at setup (e.g., TCP) or whether it is connectionless (e.g., UDP). For packet delivery itself, reliability and integrity protection, ordering, and framing are basic features. However, these features are implemented with different levels of assurance in different protocols. As an example, a transport service may provide full reliability, with detection of loss and retransmission (e.g., TCP). SCTP offers a message-based service that can provide full or partial reliability and allows the protocol to minimize the head-of-line blocking due to the support of ordered and unordered message delivery within multiple streams. UDP-Lite and DCCP can provide partial integrity protection to enable corruption tolerance. Usually, a protocol has been designed to support one specific type of delivery/framing: either data needs to be divided into transmission units based on network packets (datagram service) or a data stream is segmented and re-combined across multiple packets (stream service). Whole objects such as files are handled accordingly. This decision strongly influences the interface that is provided to the upper layer. In addition, transport protocols offer a certain support for transmission control. For example, a transport service can provide flow control to allow a receiver to regulate the transmission rate of a sender. Further, a transport service can provide congestion control (see Section 4). As an example, TCP and SCTP provide congestion control for use in the Internet, whereas UDP leaves this function to the upper-layer protocol that uses UDP. Security features are often provided independently of the transport protocol, via Transport Layer Security (TLS) (see Section 3.7) or by the application-layer protocol itself. The security properties TLS provides to the application (such as confidentiality, integrity, and authenticity) are also features of the transport layer, even though they are often presently implemented in a separate protocol.2. Terminology
The following terms are used throughout this document and in subsequent documents produced by the TAPS working group that describe the composition and decomposition of transport services. Transport Feature: a specific end-to-end feature that the transport layer provides to an application. Examples include confidentiality, reliable delivery, ordered delivery, message- versus-stream orientation, etc.
Transport Service: a set of transport features, without an association to any given framing protocol, that provides a complete service to an application. Transport Protocol: an implementation that provides one or more different transport services using a specific framing and header format on the wire. Application: an entity that uses the transport layer for end-to-end delivery data across the network (this may also be an upper-layer protocol or tunnel encapsulation).3. Existing Transport Protocols
This section provides a list of known IETF transport protocols and transport protocol frameworks. It does not make an assessment about whether specific implementations of protocols are fully compliant to current IETF specifications.3.1. Transport Control Protocol (TCP)
TCP is an IETF Standards Track transport protocol. [RFC793] introduces TCP as follows: The Transmission Control Protocol (TCP) is intended for use as a highly reliable host-to-host protocol between hosts in packet- switched computer communication networks, and in interconnected systems of such networks. Since its introduction, TCP has become the default connection- oriented, stream-based transport protocol in the Internet. It is widely implemented by endpoints and widely used by common applications.3.1.1. Protocol Description
TCP is a connection-oriented protocol that provides a three-way handshake to allow a client and server to set up a connection and negotiate features and provides mechanisms for orderly completion and immediate teardown of a connection [RFC793] [TCP-SPEC]. TCP is defined by a family of RFCs (see [RFC7414]). TCP provides multiplexing to multiple sockets on each host using port numbers. A similar approach is adopted by other IETF-defined transports. An active TCP session is identified by its four-tuple of local and remote IP addresses and local and remote port numbers. The destination port during connection setup is often used to indicate the requested service.
TCP partitions a continuous stream of bytes into segments, sized to fit in IP packets based on a negotiated maximum segment size and further constrained by the effective Maximum Transmission Unit (MTU) from Path MTU Discovery (PMTUD). ICMP-based PMTUD [RFC1191] [RFC1981] as well as Packetization Layer PMTUD (PLPMTUD) [RFC4821] have been defined by the IETF. Each byte in the stream is identified by a sequence number. The sequence number is used to order segments on receipt, to identify segments in acknowledgments, and to detect unacknowledged segments for retransmission. This is the basis of the reliable, ordered delivery of data in a TCP stream. TCP Selective Acknowledgment (SACK) [RFC2018] extends this mechanism by making it possible to provide earlier identification of which segments are missing, allowing faster retransmission. SACK-based methods (e.g., Duplicate Selective ACK) can also result in less spurious retransmission. Receiver flow control is provided by a sliding window, which limits the amount of unacknowledged data that can be outstanding at a given time. The window scale option [RFC7323] allows a receiver to use windows greater than 64 KB. All TCP senders provide congestion control, such as that described in [RFC5681]. TCP uses a sequence number with a sliding receiver window for flow control. The TCP congestion control mechanism also utilizes this TCP sequence number to manage a separate congestion window [RFC5681]. The sending window at a given point in time is the minimum of the receiver window and the congestion window. The congestion window is increased in the absence of congestion and decreased if congestion is detected. Often, loss is implicitly handled as a congestion indication, which is detected in TCP (also as input for retransmission handling) based on two mechanisms: a retransmission timer with exponential back-off or the reception of three acknowledgments for the same segment, so called "duplicated ACKs" (fast retransmit). In addition, Explicit Congestion Notification (ECN) [RFC3168] can be used in TCP and, if supported by both endpoints, allows a network node to signal congestion without inducing loss. Alternatively, a delay-based congestion control scheme that reacts to changes in delay as an early indication of congestion can be used in TCP. This is further described in Section 4. Examples of different kinds of congestion control schemes are provided in Section 4. TCP protocol instances can be extended (see [RFC7414]). Some protocol features may also be tuned to optimize for a specific deployment scenario. Some features are sender-side only, requiring no negotiation with the receiver; some are receiver-side only; and some are explicitly negotiated during connection setup.
TCP may buffer data, e.g., to optimize processing or capacity usage. TCP therefore provides mechanisms to control this, including an optional "PUSH" function [RFC793] that explicitly requests the transport service not to delay data. By default, TCP segment partitioning uses Nagle's algorithm [TCP-SPEC] to buffer data at the sender into large segments, potentially incurring sender-side buffering delay; this algorithm can be disabled by the sender to transmit more immediately, e.g., to reduce latency for interactive sessions. TCP provides an "urgent data" function for limited out-of-order delivery of the data. This function is deprecated [RFC6093]. A TCP Reset (RST) control message may be used to force a TCP endpoint to close a session [RFC793], aborting the connection. A mandatory checksum provides a basic integrity check against misdelivery and data corruption over the entire packet. Applications that require end-to-end integrity of data are recommended to include a stronger integrity check of their payload data. The TCP checksum [RFC1071] [RFC2460] does not support partial payload protection (as in DCCP/UDP-Lite). TCP supports only unicast connections.3.1.2. Interface Description
The User/TCP Interface defined in [RFC793] provides six user commands: Open, Send, Receive, Close, Status, and Abort. This interface does not describe configuration of TCP options or parameters aside from the use of the PUSH and URGENT flags. [RFC1122] describes extensions of the TCP/application-layer interface for: o reporting soft errors such as reception of ICMP error messages, extensive retransmission, or urgent pointer advance, o providing a possibility to specify the Differentiated Services Code Point (DSCP) [RFC3260] (formerly, the Type-of-Service (TOS)) for segments, o providing a flush call to empty the TCP send queue, and o multihoming support.
In API implementations derived from the BSD Sockets API, TCP sockets are created using the "SOCK_STREAM" socket type as described in the IEEE Portable Operating System Interface (POSIX) Base Specifications [POSIX]. The features used by a protocol instance may be set and tuned via this API. There are currently no documents in the RFC Series that describe this interface.3.1.3. Transport Features
The transport features provided by TCP are: o connection-oriented transport with feature negotiation and application-to-port mapping (implemented using SYN segments and the TCP Option field to negotiate features), o unicast transport (though anycast TCP is implemented, at risk of instability due to rerouting), o port multiplexing, o unidirectional or bidirectional communication, o stream-oriented delivery in a single stream, o fully reliable delivery (implemented using ACKs sent from the receiver to confirm delivery), o error detection (implemented using a segment checksum to verify delivery to the correct endpoint and integrity of the data and options), o segmentation, o data bundling (optional; uses Nagle's algorithm to coalesce data sent within the same RTT into full-sized segments), o flow control (implemented using a window-based mechanism where the receiver advertises the window that it is willing to buffer), and o congestion control (usually implemented using a window-based mechanism and four algorithms for different phases of the transmission: slow start, congestion avoidance, fast retransmit, and fast recovery [RFC5681]).
3.2. Multipath TCP (MPTCP)
Multipath TCP [RFC6824] is an extension for TCP to support multihoming for resilience, mobility, and load balancing. It is designed to be as indistinguishable to middleboxes from non-multipath TCP as possible. It does so by establishing regular TCP flows between a pair of source/destination endpoints and multiplexing the application's stream over these flows. Sub-flows can be started over IPv4 or IPv6 for the same session.3.2.1. Protocol Description
MPTCP uses TCP options for its control plane. They are used to signal multipath capabilities, as well as to negotiate data sequence numbers, advertise other available IP addresses, and establish new sessions between pairs of endpoints. By multiplexing one byte stream over separate paths, MPTCP can achieve a higher throughput than TCP in certain situations. However, if coupled congestion control [RFC6356] is used, it might limit this benefit to maintain fairness to other flows at the bottleneck. When aggregating capacity over multiple paths, and depending on the way packets are scheduled on each TCP subflow, additional delay and higher jitter might be observed before in-order delivery of data to the applications.3.2.2. Interface Description
By default, MPTCP exposes the same interface as TCP to the application. [RFC6897], however, describes a richer API for MPTCP- aware applications. This Basic API describes how an application can: o enable or disable MPTCP. o bind a socket to one or more selected local endpoints. o query local and remote endpoint addresses. o get a unique connection identifier (similar to an address-port pair for TCP). The document also recommends the use of extensions defined for SCTP [RFC6458] (see Section 3.5) to support multihoming for resilience and mobility.
3.2.3. Transport Features
As an extension to TCP, MPTCP provides mostly the same features. By establishing multiple sessions between available endpoints, it can additionally provide soft failover solutions in the case that one of the paths becomes unusable. Therefore, the transport features provided by MPTCP in addition to TCP are: o multihoming for load balancing, with endpoint multiplexing of a single byte stream, using either coupled congestion control or throughput maximization, o address family multiplexing (using IPv4 and IPv6 for the same session), and o resilience to network failure and/or handover.3.3. User Datagram Protocol (UDP)
The User Datagram Protocol (UDP) [RFC768] [RFC2460] is an IETF Standards Track transport protocol. It provides a unidirectional datagram protocol that preserves message boundaries. It provides no error correction, congestion control, or flow control. It can be used to send broadcast datagrams (IPv4) or multicast datagrams (IPv4 and IPv6), in addition to unicast and anycast datagrams. IETF guidance on the use of UDP is provided in [RFC8085]. UDP is widely implemented and widely used by common applications, including DNS.3.3.1. Protocol Description
UDP is a connectionless protocol that maintains message boundaries, with no connection setup or feature negotiation. The protocol uses independent messages, ordinarily called "datagrams". It provides detection of payload errors and misdelivery of packets to an unintended endpoint, both of which result in discard of received datagrams, with no indication to the user of the service. It is possible to create IPv4 UDP datagrams with no checksum, and while this is generally discouraged [RFC1122] [RFC8085], certain special cases permit this use. These datagrams rely on the IPv4 header checksum to protect from misdelivery to an unintended endpoint. IPv6 does not permit UDP datagrams with no checksum, although in certain cases [RFC6936], this rule may be relaxed [RFC6935].
UDP does not provide reliability and does not provide retransmission. Messages may be reordered, lost, or duplicated in transit. Note that due to the relatively weak form of checksum used by UDP, applications that require end-to-end integrity of data are recommended to include a stronger integrity check of their payload data. Because UDP provides no flow control, a receiving application that is unable to run sufficiently fast, or frequently, may miss messages. The lack of congestion handling implies UDP traffic may experience loss when using an overloaded path and may cause the loss of messages from other protocols (e.g., TCP) when sharing the same network path. On transmission, UDP encapsulates each datagram into a single IP packet or several IP packet fragments. This allows a datagram to be larger than the effective path MTU. Fragments are reassembled before delivery to the UDP receiver, making this transparent to the user of the transport service. When jumbograms are supported, larger messages may be sent without performing fragmentation. UDP on its own does not provide support for segmentation, receiver flow control, congestion control, PMTUD/PLPMTUD, or ECN. Applications that require these features need to provide them on their own or use a protocol over UDP that provides them [RFC8085].3.3.2. Interface Description
[RFC768] describes basic requirements for an API for UDP. Guidance on the use of common APIs is provided in [RFC8085]. A UDP endpoint consists of a tuple of (IP address, port number). De-multiplexing using multiple abstract endpoints (sockets) on the same IP address is supported. The same socket may be used by a single server to interact with multiple clients. (Note: This behavior differs from TCP, which uses a pair of tuples to identify a connection). Multiple server instances (processes) that bind to the same socket can cooperate to service multiple clients. The socket implementation arranges to not duplicate the same received unicast message to multiple server processes. Many operating systems also allow a UDP socket to be "connected", i.e., to bind a UDP socket to a specific (remote) UDP endpoint. Unlike TCP's connect primitive, for UDP, this is only a local operation that serves to simplify the local send/receive functions and to filter the traffic for the specified addresses and ports [RFC8085].
3.3.3. Transport Features
The transport features provided by UDP are: o unicast, multicast, anycast, or IPv4 broadcast transport, o port multiplexing (where a receiving port can be configured to receive datagrams from multiple senders), o message-oriented delivery, o unidirectional or bidirectional communication where the transmissions in each direction are independent, o non-reliable delivery, o unordered delivery, and o error detection (implemented using a segment checksum to verify delivery to the correct endpoint and integrity of the data; optional for IPv4 and optional under specific conditions for IPv6 where all or none of the payload data is protected).3.4. Lightweight User Datagram Protocol (UDP-Lite)
The Lightweight User Datagram Protocol (UDP-Lite) [RFC3828] is an IETF Standards Track transport protocol. It provides a unidirectional, datagram protocol that preserves message boundaries. IETF guidance on the use of UDP-Lite is provided in [RFC8085]. A UDP-Lite service may support IPv4 broadcast, multicast, anycast, and unicast, as well as IPv6 multicast, anycast, and unicast. Examples of use include a class of applications that can derive benefit from having partially damaged payloads delivered rather than discarded. One use is to provide header integrity checks but allow delivery of corrupted payloads to error-tolerant applications or to applications that use some other mechanism to provide payload integrity (see [RFC6936]).3.4.1. Protocol Description
Like UDP, UDP-Lite is a connectionless datagram protocol, with no connection setup or feature negotiation. It changes the semantics of the UDP Payload Length field to that of a Checksum Coverage Length field and is identified by a different IP protocol/next-header value. The Checksum Coverage Length field specifies the intended checksum coverage, with the remaining unprotected part of the payload called
the "error-insensitive part". Therefore, applications using UDP-Lite cannot make assumptions regarding the correctness of the data received in the insensitive part of the UDP-Lite payload. Otherwise, UDP-Lite is semantically identical to UDP. In the same way as for UDP, mechanisms for receiver flow control, congestion control, PMTU or PLPMTU discovery, support for ECN, etc., need to be provided by upper-layer protocols [RFC8085].3.4.2. Interface Description
There is no API currently specified in the RFC Series, but guidance on use of common APIs is provided in [RFC8085]. The interface of UDP-Lite differs from that of UDP by the addition of a single (socket) option that communicates a checksum coverage length value. The checksum coverage may also be made visible to the application via the UDP-Lite MIB module [RFC5097].3.4.3. Transport Features
The transport features provided by UDP-Lite are: o unicast, multicast, anycast, or IPv4 broadcast transport (same as for UDP), o port multiplexing (same as for UDP), o message-oriented delivery (same as for UDP), o unidirectional or bidirectional communication where the transmissions in each direction are independent (same as for UDP), o non-reliable delivery (same as for UDP), o non-ordered delivery (same as for UDP), and o partial or full payload error detection (where the Checksum Coverage field indicates the size of the payload data covered by the checksum).3.5. Stream Control Transmission Protocol (SCTP)
SCTP is a message-oriented IETF Standards Track transport protocol. The base protocol is specified in [RFC4960]. It supports multihoming and path failover to provide resilience to path failures. An SCTP association has multiple streams in each direction, providing in-sequence delivery of user messages within each stream. This
allows it to minimize head-of-line blocking. SCTP supports multiple stream- scheduling schemes controlling stream multiplexing, including priority and fair weighting schemes. SCTP was originally developed for transporting telephony signaling messages and is deployed in telephony signaling networks, especially in mobile telephony networks. It can also be used for other services, for example, in the WebRTC framework for data channels.3.5.1. Protocol Description
SCTP is a connection-oriented protocol using a four-way handshake to establish an SCTP association and a three-way message exchange to gracefully shut it down. It uses the same port number concept as DCCP, TCP, UDP, and UDP-Lite. SCTP only supports unicast. SCTP uses the 32-bit CRC32c for protecting SCTP packets against bit errors and misdelivery of packets to an unintended endpoint. This is stronger than the 16-bit checksums used by TCP or UDP. However, partial payload checksum coverage as provided by DCCP or UDP-Lite is not supported. SCTP has been designed with extensibility in mind. A common header is followed by a sequence of chunks. [RFC4960] defines how a receiver processes chunks with an unknown chunk type. The support of extensions can be negotiated during the SCTP handshake. Currently defined extensions include mechanisms for dynamic reconfiguration of streams [RFC6525] and IP addresses [RFC5061]. Furthermore, the extension specified in [RFC3758] introduces the concept of partial reliability for user messages. SCTP provides a message-oriented service. Multiple small user messages can be bundled into a single SCTP packet to improve efficiency. For example, this bundling may be done by delaying user messages at the sender, similar to Nagle's algorithm used by TCP. User messages that would result in IP packets larger than the MTU will be fragmented at the sender and reassembled at the receiver. There is no protocol limit on the user message size. For MTU discovery, the same mechanism as for TCP can be used [RFC1981] [RFC4821], as well as utilization of probe packets with padding chunks, as defined in [RFC4820]. [RFC4960] specifies TCP-friendly congestion control to protect the network against overload. SCTP also uses sliding window flow control to protect receivers against overflow. Similar to TCP, SCTP also supports delaying acknowledgments. [RFC7053] provides a way for the sender of user messages to request immediate sending of the corresponding acknowledgments.
Each SCTP association has between 1 and 65536 unidirectional streams in each direction. The number of streams can be different in each direction. Every user message is sent on a particular stream. User messages can be sent unordered or ordered upon request by the upper layer. Unordered messages can be delivered as soon as they are completely received. For user messages not requiring fragmentation, this minimizes head-of-line blocking. On the other hand, ordered messages sent on the same stream are delivered at the receiver in the same order as sent by the sender. The base protocol defined in [RFC4960] does not allow interleaving of user messages. Large messages on one stream can therefore block the sending of user messages on other streams. [SCTP-NDATA] describes a method to overcome this limitation. This document also specifies multiple algorithms for the sender-side selection of which streams to send data from, supporting a variety of scheduling algorithms including priority-based methods. The stream reconfiguration extension defined in [RFC6525] allows streams to be reset during the lifetime of an association and to increase the number of streams, if the number of streams negotiated in the SCTP handshake becomes insufficient. Each user message sent is delivered to the receiver or, in case of excessive retransmissions, the association is terminated in a non-graceful way [RFC4960], similar to TCP behavior. In addition to this reliable transfer, the partial reliability extension [RFC3758] allows a sender to abandon user messages. The application can specify the policy for abandoning user messages. SCTP supports multihoming. Each SCTP endpoint uses a list of IP addresses and a single port number. These addresses can be any mixture of IPv4 and IPv6 addresses. These addresses are negotiated during the handshake, and the address reconfiguration extension specified in [RFC5061] in combination with [RFC4895] can be used to change these addresses in an authenticated way during the lifetime of an SCTP association. This allows for transport-layer mobility. Multiple addresses are used for improved resilience. If a remote address becomes unreachable, the traffic is switched over to a reachable one, if one exists. For securing user messages, the use of TLS over SCTP has been specified in [RFC3436]. However, this solution does not support all services provided by SCTP, such as unordered delivery or partial reliability. Therefore, the use of DTLS over SCTP has been specified in [RFC6083] to overcome these limitations. When using DTLS over SCTP, the application can use almost all services provided by SCTP.
[NAT-SUPP] defines methods for endpoints and middleboxes to provide NAT traversal for SCTP over IPv4. For legacy NAT traversal, [RFC6951] defines the UDP encapsulation of SCTP packets. Alternatively, SCTP packets can be encapsulated in DTLS packets as specified in [SCTP-DTLS-ENCAPS]. The latter encapsulation is used within the WebRTC [WEBRTC-TRANS] context. An SCTP ABORT chunk may be used to force a SCTP endpoint to close a session [RFC4960], aborting the connection. SCTP has a well-defined API, described in the next subsection.3.5.2. Interface Description
[RFC4960] defines an abstract API for the base protocol. This API describes the following functions callable by the upper layer of SCTP: Initialize, Associate, Send, Receive, Receive Unsent Message, Receive Unacknowledged Message, Shutdown, Abort, SetPrimary, Status, Change Heartbeat, Request Heartbeat, Get SRTT Report, Set Failure Threshold, Set Protocol Parameters, and Destroy. The following notifications are provided by the SCTP stack to the upper layer: COMMUNICATION UP, DATA ARRIVE, SHUTDOWN COMPLETE, COMMUNICATION LOST, COMMUNICATION ERROR, RESTART, SEND FAILURE, and NETWORK STATUS CHANGE. An extension to the BSD Sockets API is defined in [RFC6458] and covers: o the base protocol defined in [RFC4960]. The API allows control over local addresses and port numbers and the primary path. Furthermore, the application has fine control of parameters like retransmission thresholds, the path supervision, the delayed acknowledgment timeout, and the fragmentation point. The API provides a mechanism to allow the SCTP stack to notify the application about events if the application has requested them. These notifications provide information about status changes of the association and each of the peer addresses. In case of send failures, including drop of messages sent unreliably, the application can also be notified, and user messages can be returned to the application. When sending user messages, the application can indicate a stream id, a payload protocol identifier, and an indication of whether ordered delivery is requested. These parameters can also be provided on message reception. Additionally, a context can be provided when sending, which can be used in case of send failures. The sending of arbitrarily large user messages is supported.
o the SCTP Partial Reliability extension defined in [RFC3758] to specify for a user message the Partially Reliable SCTP (PR-SCTP) policy and the policy-specific parameter. Examples of these policies defined in [RFC3758] and [RFC7496] are: * limiting the time a user message is dealt with by the sender. * limiting the number of retransmissions for each fragment of a user message. If the number of retransmissions is limited to 0, one gets a service similar to UDP. * abandoning messages of lower priority in case of a send buffer shortage. o the SCTP Authentication extension defined in [RFC4895] allowing management of the shared keys and allowing the HMAC to use and set the chunk types (which are only accepted in an authenticated way) and get the list of chunks that are accepted by the local and remote endpoints in an authenticated way. o the SCTP Dynamic Address Reconfiguration extension defined in [RFC5061]. It allows the manual addition and deletion of local addresses for SCTP associations, as well as the enabling of automatic address addition and deletion. Furthermore, the peer can be given a hint for choosing its primary path. A BSD Sockets API extension has been defined in the documents that specify the following SCTP extensions: o the SCTP Stream Reconfiguration extension defined in [RFC6525]. The API allows triggering of the reset operation for incoming and outgoing streams and the whole association. It also provides a way to notify the association about the corresponding events. Furthermore, the application can increase the number of streams. o the UDP Encapsulation of SCTP packets extension defined in [RFC6951]. The API allows the management of the remote UDP encapsulation port. o the SCTP SACK-IMMEDIATELY extension defined in [RFC7053]. The API allows the sender of a user message to request the receiver to send the corresponding acknowledgment immediately. o the additional PR-SCTP policies defined in [RFC7496]. The API allows enabling/disabling the PR-SCTP extension, choosing the PR-SCTP policies defined in the document, and providing statistical information about abandoned messages.
Future documents describing SCTP extensions are expected to describe the corresponding BSD Sockets API extension in a "Socket API Considerations" section. The SCTP Socket API supports two kinds of sockets: o one-to-one style sockets (by using the socket type "SOCK_STREAM"). o one-to-many style socket (by using the socket type "SOCK_SEQPACKET"). One-to-one style sockets are similar to TCP sockets; there is a 1:1 relationship between the sockets and the SCTP associations (except for listening sockets). One-to-many style SCTP sockets are similar to unconnected UDP sockets, where there is a 1:n relationship between the sockets and the SCTP associations. The SCTP stack can provide information to the applications about state changes of the individual paths and the association whenever they occur. These events are delivered similarly to user messages but are specifically marked as notifications. New functions have been introduced to support the use of multiple local and remote addresses. Additional SCTP-specific send and receive calls have been defined to permit SCTP-specific information to be sent without using ancillary data in the form of additional Control Message (cmsg) calls. These functions provide support for detecting partial delivery of user messages and notifications. The SCTP Socket API allows a fine-grained control of the protocol behavior through an extensive set of socket options. The SCTP kernel implementations of FreeBSD, Linux, and Solaris follow mostly the specified extension to the BSD Sockets API for the base protocol and the corresponding supported protocol extensions.3.5.3. Transport Features
The transport features provided by SCTP are: o connection-oriented transport with feature negotiation and application-to-port mapping, o unicast transport, o port multiplexing, o unidirectional or bidirectional communication,
o message-oriented delivery with durable message framing supporting multiple concurrent streams, o fully reliable, partially reliable, or unreliable delivery (based on user-specified policy to handle abandoned user messages) with drop notification, o ordered and unordered delivery within a stream, o support for stream scheduling prioritization, o segmentation, o user message bundling, o flow control using a window-based mechanism, o congestion control using methods similar to TCP, o strong error detection (CRC32c), and o transport-layer multihoming for resilience and mobility.