
RFC 8095

Services Provided by IETF Transport Protocols and Congestion Control Mechanisms

Pages: 54
Informational

Internet Engineering Task Force (IETF)                 G. Fairhurst, Ed.
Request for Comments: 8095                        University of Aberdeen
Category: Informational                                 B. Trammell, Ed.
ISSN: 2070-1721                                       M. Kuehlewind, Ed.
                                                              ETH Zurich
                                                              March 2017


                          Services Provided by
       IETF Transport Protocols and Congestion Control Mechanisms

Abstract

This document describes, surveys, and classifies the protocol mechanisms provided by existing IETF protocols, as background for determining a common set of transport services. It examines the Transmission Control Protocol (TCP), Multipath TCP, the Stream Control Transmission Protocol (SCTP), the User Datagram Protocol (UDP), UDP-Lite, the Datagram Congestion Control Protocol (DCCP), the Internet Control Message Protocol (ICMP), the Real-Time Transport Protocol (RTP), File Delivery over Unidirectional Transport / Asynchronous Layered Coding (FLUTE/ALC) for Reliable Multicast, NACK-Oriented Reliable Multicast (NORM), Transport Layer Security (TLS), Datagram TLS (DTLS), and the Hypertext Transfer Protocol (HTTP), when HTTP is used as a pseudotransport. This survey provides background for the definition of transport services within the TAPS working group.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8095.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction ....................................................4
      1.1. Overview of Transport Features .............................4
   2. Terminology .....................................................5
   3. Existing Transport Protocols ....................................6
      3.1. Transmission Control Protocol (TCP) ........................6
           3.1.1. Protocol Description ................................6
           3.1.2. Interface Description ...............................8
           3.1.3. Transport Features ..................................9
      3.2. Multipath TCP (MPTCP) .....................................10
           3.2.1. Protocol Description ...............................10
           3.2.2. Interface Description ..............................10
           3.2.3. Transport Features .................................11
      3.3. User Datagram Protocol (UDP) ..............................11
           3.3.1. Protocol Description ...............................11
           3.3.2. Interface Description ..............................12
           3.3.3. Transport Features .................................13
      3.4. Lightweight User Datagram Protocol (UDP-Lite) .............13
           3.4.1. Protocol Description ...............................13
           3.4.2. Interface Description ..............................14
           3.4.3. Transport Features .................................14
      3.5. Stream Control Transmission Protocol (SCTP) ...............14
           3.5.1. Protocol Description ...............................15
           3.5.2. Interface Description ..............................17
           3.5.3. Transport Features .................................19
      3.6. Datagram Congestion Control Protocol (DCCP) ...............20
           3.6.1. Protocol Description ...............................21
           3.6.2. Interface Description ..............................22
           3.6.3. Transport Features .................................22
      3.7. Transport Layer Security (TLS) and Datagram TLS
           (DTLS) as a Pseudotransport ...............................23
           3.7.1. Protocol Description ...............................23
           3.7.2. Interface Description ..............................24
           3.7.3. Transport Features .................................25
      3.8. Real-Time Transport Protocol (RTP) ........................26
           3.8.1. Protocol Description ...............................26
           3.8.2. Interface Description ..............................27
           3.8.3. Transport Features .................................27
      3.9. Hypertext Transfer Protocol (HTTP) over TCP as a
           Pseudotransport ...........................................28
           3.9.1. Protocol Description ...............................28
           3.9.2. Interface Description ..............................29
           3.9.3. Transport Features .................................30
      3.10. File Delivery over Unidirectional Transport /
            Asynchronous Layered Coding (FLUTE/ALC) for
            Reliable Multicast .......................................31
           3.10.1. Protocol Description ..............................31
           3.10.2. Interface Description .............................33
           3.10.3. Transport Features ................................33
      3.11. NACK-Oriented Reliable Multicast (NORM) ..................34
           3.11.1. Protocol Description ..............................34
           3.11.2. Interface Description .............................35
           3.11.3. Transport Features ................................36
      3.12. Internet Control Message Protocol (ICMP) .................36
           3.12.1. Protocol Description ..............................37
           3.12.2. Interface Description .............................37
           3.12.3. Transport Features ................................38
   4. Congestion Control .............................................38
   5. Transport Features .............................................39
   6. IANA Considerations ............................................42
   7. Security Considerations ........................................42
   8. Informative References .........................................42
   Acknowledgments ...................................................53
   Contributors ......................................................53
   Authors' Addresses ................................................54

1. Introduction

Internet applications make use of the services provided by a transport protocol, such as TCP (a reliable, in-order stream protocol) or UDP (an unreliable datagram protocol). We use the term "transport service" to mean the end-to-end service provided to an application by the transport layer. That service can only be provided correctly if information about the intended usage is supplied from the application. The application may determine this information at design time, compile time, or run time, and may include guidance on whether a feature is required, a preference by the application, or something in between. Examples of features of transport services are reliable delivery, ordered delivery, content privacy to in-path devices, and integrity protection.

The IETF has defined a wide variety of transport protocols beyond TCP and UDP, including SCTP, DCCP, MPTCP, and UDP-Lite. Transport services may be provided directly by these transport protocols or layered on top of them using protocols such as WebSockets (which runs over TCP), RTP (over TCP or UDP) or WebRTC data channels (which run over SCTP over DTLS over UDP or TCP). Services built on top of UDP or UDP-Lite typically also need to specify additional mechanisms, including a congestion control mechanism (such as NewReno [RFC6582], TCP-Friendly Rate Control (TFRC) [RFC5348], or Low Extra Delay Background Transport (LEDBAT) [RFC6817]). This extends the set of available transport services beyond those provided to applications by TCP and UDP.

The transport protocols described in this document provide a basis for the definition of transport services provided by common protocols, as background for the TAPS working group. The protocols listed here were chosen to help expose as many potential transport services as possible and are not meant to be a comprehensive survey or classification of all transport protocols.

1.1. Overview of Transport Features

Transport protocols can be differentiated by the features of the services they provide.

Some of these provided features are closely related to basic control functions that a protocol needs to work over a network path, such as addressing. The number of participants in a given association also determines its applicability: a connection can be between two endpoints (unicast), to one of multiple endpoints (anycast), or simultaneously to multiple endpoints (multicast). Unicast protocols usually support bidirectional communication, while multicast is generally unidirectional. Another feature is whether a transport requires a control exchange across the network at setup (e.g., TCP) or whether it is connectionless (e.g., UDP).

   For packet delivery itself, reliability and integrity protection,
   ordering, and framing are basic features.  However, these features
   are implemented with different levels of assurance in different
   protocols.  As an example, a transport service may provide full
   reliability, with detection of loss and retransmission (e.g., TCP).
   SCTP offers a message-based service that can provide full or partial
   reliability and allows the protocol to minimize the head-of-line
   blocking due to the support of ordered and unordered message delivery
   within multiple streams.  UDP-Lite and DCCP can provide partial
   integrity protection to enable corruption tolerance.

   Usually, a protocol has been designed to support one specific type of
   delivery/framing: either data needs to be divided into transmission
   units based on network packets (datagram service) or a data stream is
   segmented and re-combined across multiple packets (stream service).
   Whole objects such as files are handled accordingly.  This decision
   strongly influences the interface that is provided to the upper
   layer.

   In addition, transport protocols offer a certain support for
   transmission control.  For example, a transport service can provide
   flow control to allow a receiver to regulate the transmission rate of
   a sender.  Further, a transport service can provide congestion
   control (see Section 4).  As an example, TCP and SCTP provide
   congestion control for use in the Internet, whereas UDP leaves this
   function to the upper-layer protocol that uses UDP.

   Security features are often provided independently of the transport
   protocol, via Transport Layer Security (TLS) (see Section 3.7) or by
   the application-layer protocol itself.  The security properties TLS
   provides to the application (such as confidentiality, integrity, and
   authenticity) are also features of the transport layer, even though
   they are often presently implemented in a separate protocol.

2. Terminology

The following terms are used throughout this document and in subsequent documents produced by the TAPS working group that describe the composition and decomposition of transport services.

   Transport Feature:  a specific end-to-end feature that the transport
      layer provides to an application.  Examples include
      confidentiality, reliable delivery, ordered delivery,
      message-versus-stream orientation, etc.

   Transport Service:  a set of transport features, without an
      association to any given framing protocol, that provides a
      complete service to an application.

   Transport Protocol:  an implementation that provides one or more
      different transport services using a specific framing and header
      format on the wire.

   Application:  an entity that uses the transport layer for end-to-end
      delivery of data across the network (this may also be an
      upper-layer protocol or tunnel encapsulation).

3. Existing Transport Protocols

This section provides a list of known IETF transport protocols and transport protocol frameworks. It does not assess whether specific implementations of these protocols are fully compliant with current IETF specifications.

3.1. Transmission Control Protocol (TCP)

TCP is an IETF Standards Track transport protocol. [RFC793] introduces TCP as follows:

      The Transmission Control Protocol (TCP) is intended for use as a
      highly reliable host-to-host protocol between hosts in
      packet-switched computer communication networks, and in
      interconnected systems of such networks.

Since its introduction, TCP has become the default connection-oriented, stream-based transport protocol in the Internet. It is widely implemented by endpoints and widely used by common applications.

3.1.1. Protocol Description

TCP is a connection-oriented protocol that provides a three-way handshake to allow a client and server to set up a connection and negotiate features and provides mechanisms for orderly completion and immediate teardown of a connection [RFC793] [TCP-SPEC]. TCP is defined by a family of RFCs (see [RFC7414]).

TCP provides multiplexing to multiple sockets on each host using port numbers. A similar approach is adopted by other IETF-defined transports. An active TCP session is identified by its four-tuple of local and remote IP addresses and local and remote port numbers. The destination port during connection setup is often used to indicate the requested service.

   TCP partitions a continuous stream of bytes into segments, sized to
   fit in IP packets based on a negotiated maximum segment size and
   further constrained by the effective Maximum Transmission Unit (MTU)
   from Path MTU Discovery (PMTUD).  ICMP-based PMTUD [RFC1191]
   [RFC1981] as well as Packetization Layer PMTUD (PLPMTUD) [RFC4821]
   have been defined by the IETF.

   Each byte in the stream is identified by a sequence number.  The
   sequence number is used to order segments on receipt, to identify
   segments in acknowledgments, and to detect unacknowledged segments
   for retransmission.  This is the basis of the reliable, ordered
   delivery of data in a TCP stream.  TCP Selective Acknowledgment
   (SACK) [RFC2018] extends this mechanism by making it possible to
   provide earlier identification of which segments are missing,
   allowing faster retransmission.  SACK-based methods (e.g., Duplicate
   Selective ACK) can also result in less spurious retransmission.
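
   The role of sequence numbers described above can be illustrated with
   a minimal Python sketch.  It is not taken from any TCP
   implementation: the function name "ack_state" and the (seq, length)
   representation of segments are assumptions made for illustration,
   and sequence-number wraparound and the encoding of the SACK option
   are ignored.

```python
def ack_state(initial_seq, segments):
    """Return (cumulative_ack, sack_blocks) for received segments.

    segments: iterable of (seq, length) byte ranges, in any order.
    """
    # Merge the received byte ranges into sorted, non-overlapping intervals.
    intervals = []
    for start, end in sorted((seq, seq + length) for seq, length in segments):
        if intervals and start <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])

    # The cumulative ACK is the next byte expected in order.
    cumulative_ack = initial_seq
    sack_blocks = []
    for start, end in intervals:
        if start <= cumulative_ack:
            cumulative_ack = max(cumulative_ack, end)
        else:
            # Data received beyond a hole is reported as a SACK-style block.
            sack_blocks.append((start, end))
    return cumulative_ack, sack_blocks


# Bytes 1000-1999 and 3000-3999 arrived; bytes 2000-2999 are missing.
ack, sacks = ack_state(1000, [(3000, 1000), (1000, 1000)])
print(ack, sacks)   # 2000 [(3000, 4000)]
```

   In this sketch, the block reported beyond the hole tells the sender
   that only bytes 2000-2999 need to be retransmitted.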

   Receiver flow control is provided by a sliding window, which limits
   the amount of unacknowledged data that can be outstanding at a given
   time.  The window scale option [RFC7323] allows a receiver to use
   windows greater than 64 KB.
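
   As a rough illustration of window scaling, the following Python
   sketch computes the window that a 16-bit window field represents
   once a shift count has been negotiated.  The function name is an
   assumption; the upper bound of 14 on the shift count is the limit
   given in [RFC7323].

```python
MAX_WINDOW_SHIFT = 14  # largest shift count permitted by RFC 7323


def effective_window(window_field, shift):
    """Scale the 16-bit TCP window field by the negotiated shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    shift = min(shift, MAX_WINDOW_SHIFT)
    return window_field << shift


print(effective_window(0xFFFF, 0))   # 65535, the limit without scaling
print(effective_window(0xFFFF, 7))   # 8388480
```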

   All TCP senders provide congestion control, such as that described in
   [RFC5681].  TCP uses a sequence number with a sliding receiver window
   for flow control.  The TCP congestion control mechanism also utilizes
   this TCP sequence number to manage a separate congestion window
   [RFC5681].  The sending window at a given point in time is the
   minimum of the receiver window and the congestion window.  The
   congestion window is increased in the absence of congestion and
   decreased if congestion is detected.  Loss is usually treated as an
   implicit congestion indication.  TCP detects loss (also as input for
   retransmission handling) using two mechanisms: a retransmission
   timer with exponential back-off, or the reception of three duplicate
   acknowledgments for the same segment (fast retransmit).  In
   addition, Explicit Congestion Notification (ECN) [RFC3168] can be
   used in TCP and, if supported by both endpoints, allows a network
   node to signal congestion without inducing loss.  Alternatively, a
   delay-based congestion control scheme that reacts to changes in
   delay as an early indication of congestion can be used in TCP.
   Congestion control, with examples of the different kinds of schemes,
   is described further in Section 4.
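
   The relationship between the receiver window and the congestion
   window can be sketched as follows.  This is a simplified reading of
   [RFC5681], not a complete implementation: the function names and
   the MSS value are assumptions, and fast recovery, timers, and
   byte-counting details are omitted.

```python
MSS = 1460  # assumed maximum segment size in bytes


def sending_window(rwnd, cwnd):
    """Bytes the sender may have outstanding: the minimum of both windows."""
    return min(rwnd, cwnd)


def on_ack(cwnd, ssthresh):
    """Grow cwnd when new data is acknowledged (simplified)."""
    if cwnd < ssthresh:
        return cwnd + MSS                      # slow start
    return cwnd + max(1, MSS * MSS // cwnd)    # congestion avoidance


def on_loss(flight_size):
    """Return the new slow start threshold after loss is detected."""
    return max(flight_size // 2, 2 * MSS)


cwnd, ssthresh = 2 * MSS, 8 * MSS
for _ in range(10):
    cwnd = on_ack(cwnd, ssthresh)
print(sending_window(rwnd=65535, cwnd=cwnd))
```

   The window grows by one MSS per acknowledgment during slow start
   and by roughly one MSS per round-trip time afterwards; a detected
   loss lowers the threshold to half of the data in flight.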

   TCP protocol instances can be extended (see [RFC7414]).  Some
   protocol features may also be tuned to optimize for a specific
   deployment scenario.  Some features are sender-side only, requiring
   no negotiation with the receiver; some are receiver-side only; and
   some are explicitly negotiated during connection setup.

   TCP may buffer data, e.g., to optimize processing or capacity usage.
   TCP therefore provides mechanisms to control this, including an
   optional "PUSH" function [RFC793] that explicitly requests the
   transport service not to delay data.  By default, TCP segment
   partitioning uses Nagle's algorithm [TCP-SPEC] to buffer data at the
   sender into large segments, potentially incurring sender-side
   buffering delay; this algorithm can be disabled by the sender to
   transmit more immediately, e.g., to reduce latency for interactive
   sessions.

   TCP provides an "urgent data" function for limited out-of-order
   delivery of the data.  This function is deprecated [RFC6093].

   A TCP Reset (RST) control message may be used to force a TCP endpoint
   to close a session [RFC793], aborting the connection.

   A mandatory checksum provides a basic integrity check against
   misdelivery and data corruption over the entire packet.  Applications
   that require end-to-end integrity of data are recommended to include
   a stronger integrity check of their payload data.  The TCP checksum
   [RFC1071] [RFC2460] does not support partial payload protection (as
   in DCCP/UDP-Lite).
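
   The checksum arithmetic of [RFC1071] can be sketched in a few lines
   of Python.  The function name is an assumption, and the sketch
   covers only the data passed to it; a real TCP checksum is also
   computed over a pseudo-header and the TCP header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


sample = bytes.fromhex("0001f203f4f5f6f7")   # example bytes from RFC 1071
print(hex(internet_checksum(sample)))        # 0x220d

# Swapping two 16-bit words leaves the checksum unchanged.
swapped = sample[2:4] + sample[0:2] + sample[4:]
print(internet_checksum(swapped) == internet_checksum(sample))   # True
```

   The second check shows one reason why applications that need
   end-to-end integrity are advised to add a stronger check: reordered
   16-bit words are not detected by this sum.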

   TCP supports only unicast connections.

3.1.2. Interface Description

The User/TCP Interface defined in [RFC793] provides six user commands: Open, Send, Receive, Close, Status, and Abort. This interface does not describe configuration of TCP options or parameters aside from the use of the PUSH and URGENT flags.

[RFC1122] describes extensions of the TCP/application-layer interface for:

   o  reporting soft errors such as reception of ICMP error messages,
      extensive retransmission, or urgent pointer advance,

   o  providing a possibility to specify the Differentiated Services
      Code Point (DSCP) [RFC3260] (formerly, the Type-of-Service (TOS))
      for segments,

   o  providing a flush call to empty the TCP send queue, and

   o  multihoming support.

   In API implementations derived from the BSD Sockets API, TCP sockets
   are created using the "SOCK_STREAM" socket type as described in the
   IEEE Portable Operating System Interface (POSIX) Base Specifications
   [POSIX].  The features used by a protocol instance may be set and
   tuned via this API.  There are currently no documents in the RFC
   Series that describe this interface.
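
   As a small illustration of tuning a protocol instance through a
   sockets-style API, the following sketch uses Python's standard
   "socket" module, which wraps the BSD Sockets API.  It creates a TCP
   socket and disables Nagle's algorithm with the TCP_NODELAY option;
   whether this is appropriate depends on the application.

```python
import socket

# SOCK_STREAM selects TCP when used with an Internet address family.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so that small writes are not coalesced.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))

sock.close()
```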

3.1.3. Transport Features

The transport features provided by TCP are:

   o  connection-oriented transport with feature negotiation and
      application-to-port mapping (implemented using SYN segments and
      the TCP Option field to negotiate features),

   o  unicast transport (though anycast TCP is implemented, at risk of
      instability due to rerouting),

   o  port multiplexing,

   o  unidirectional or bidirectional communication,

   o  stream-oriented delivery in a single stream,

   o  fully reliable delivery (implemented using ACKs sent from the
      receiver to confirm delivery),

   o  error detection (implemented using a segment checksum to verify
      delivery to the correct endpoint and integrity of the data and
      options),

   o  segmentation,

   o  data bundling (optional; uses Nagle's algorithm to coalesce data
      sent within the same RTT into full-sized segments),

   o  flow control (implemented using a window-based mechanism where the
      receiver advertises the window that it is willing to buffer), and

   o  congestion control (usually implemented using a window-based
      mechanism and four algorithms for different phases of the
      transmission: slow start, congestion avoidance, fast retransmit,
      and fast recovery [RFC5681]).

3.2. Multipath TCP (MPTCP)

Multipath TCP [RFC6824] is an extension to TCP to support multihoming for resilience, mobility, and load balancing. It is designed to be as indistinguishable to middleboxes from non-multipath TCP as possible. It does so by establishing regular TCP flows between a pair of source/destination endpoints and multiplexing the application's stream over these flows. Subflows can be started over IPv4 or IPv6 for the same session.

3.2.1. Protocol Description

MPTCP uses TCP options for its control plane. They are used to signal multipath capabilities, as well as to negotiate data sequence numbers, advertise other available IP addresses, and establish new sessions between pairs of endpoints.

By multiplexing one byte stream over separate paths, MPTCP can achieve a higher throughput than TCP in certain situations. However, if coupled congestion control [RFC6356] is used, it might limit this benefit to maintain fairness to other flows at the bottleneck. When aggregating capacity over multiple paths, and depending on the way packets are scheduled on each TCP subflow, additional delay and higher jitter might be observed before in-order delivery of data to the applications.
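
The idea of multiplexing one byte stream over several subflows, with data sequence numbers used to restore the order, can be sketched as follows. This is a toy model only: the round-robin scheduler, the function names, and the tuple layout are assumptions for illustration and do not reflect how [RFC6824] encodes its mappings or how real schedulers choose a subflow.

```python
def schedule(stream: bytes, subflows: int, chunk: int):
    """Split a byte stream into (subflow, dsn, payload) in round-robin order."""
    out = []
    for index, dsn in enumerate(range(0, len(stream), chunk)):
        out.append((index % subflows, dsn, stream[dsn:dsn + chunk]))
    return out


def reassemble(received):
    """Rebuild the stream by ordering chunks on their data sequence number."""
    ordered = sorted(received, key=lambda item: item[1])
    return b"".join(payload for _, _, payload in ordered)


sent = schedule(b"multipath tcp byte stream", subflows=2, chunk=4)
# Simulate a slower second path: every chunk on subflow 1 arrives last.
arrived = sorted(sent, key=lambda item: item[0])
print(reassemble(arrived))
```

Because the receiver has to wait for the chunks carried on the slower path before it can deliver data in order, the sketch also hints at the additional delay mentioned above.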

3.2.2. Interface Description

By default, MPTCP exposes the same interface as TCP to the application. [RFC6897], however, describes a richer API for MPTCP-aware applications.

This Basic API describes how an application can:

   o  enable or disable MPTCP.

   o  bind a socket to one or more selected local endpoints.

   o  query local and remote endpoint addresses.

   o  get a unique connection identifier (similar to an address-port
      pair for TCP).

The document also recommends the use of extensions defined for SCTP [RFC6458] (see Section 3.5) to support multihoming for resilience and mobility.

3.2.3. Transport Features

As an extension to TCP, MPTCP provides mostly the same features. By establishing multiple sessions between available endpoints, it can additionally provide soft failover solutions in the case that one of the paths becomes unusable.

Therefore, the transport features provided by MPTCP in addition to TCP are:

   o  multihoming for load balancing, with endpoint multiplexing of a
      single byte stream, using either coupled congestion control or
      throughput maximization,

   o  address family multiplexing (using IPv4 and IPv6 for the same
      session), and

   o  resilience to network failure and/or handover.

3.3. User Datagram Protocol (UDP)

The User Datagram Protocol (UDP) [RFC768] [RFC2460] is an IETF Standards Track transport protocol. It provides a unidirectional datagram protocol that preserves message boundaries. It provides no error correction, congestion control, or flow control. It can be used to send broadcast datagrams (IPv4) or multicast datagrams (IPv4 and IPv6), in addition to unicast and anycast datagrams. IETF guidance on the use of UDP is provided in [RFC8085].

UDP is widely implemented and widely used by common applications, including DNS.

3.3.1. Protocol Description

UDP is a connectionless protocol that maintains message boundaries, with no connection setup or feature negotiation. The protocol uses independent messages, ordinarily called "datagrams". It provides detection of payload errors and misdelivery of packets to an unintended endpoint, both of which result in discard of received datagrams, with no indication to the user of the service.

It is possible to create IPv4 UDP datagrams with no checksum, and while this is generally discouraged [RFC1122] [RFC8085], certain special cases permit this use. These datagrams rely on the IPv4 header checksum to protect from misdelivery to an unintended endpoint. IPv6 does not permit UDP datagrams with no checksum, although in certain cases [RFC6936], this rule may be relaxed [RFC6935].

   UDP does not provide reliability and does not provide retransmission.
   Messages may be reordered, lost, or duplicated in transit.  Note that
   due to the relatively weak form of checksum used by UDP, applications
   that require end-to-end integrity of data are recommended to include
   a stronger integrity check of their payload data.

   Because UDP provides no flow control, a receiving application that is
   unable to run sufficiently fast, or frequently, may miss messages.
   The lack of congestion handling implies UDP traffic may experience
   loss when using an overloaded path and may cause the loss of messages
   from other protocols (e.g., TCP) when sharing the same network path.

   On transmission, UDP encapsulates each datagram into a single IP
   packet or several IP packet fragments.  This allows a datagram to be
   larger than the effective path MTU.  Fragments are reassembled before
   delivery to the UDP receiver, making this transparent to the user of
   the transport service.  When jumbograms are supported, larger
   messages may be sent without performing fragmentation.

   UDP on its own does not provide support for segmentation, receiver
   flow control, congestion control, PMTUD/PLPMTUD, or ECN.
   Applications that require these features need to provide them on
   their own or use a protocol over UDP that provides them [RFC8085].

3.3.2. Interface Description

[RFC768] describes basic requirements for an API for UDP. Guidance on the use of common APIs is provided in [RFC8085].

A UDP endpoint consists of a tuple of (IP address, port number). De-multiplexing using multiple abstract endpoints (sockets) on the same IP address is supported.

The same socket may be used by a single server to interact with multiple clients. (Note: This behavior differs from TCP, which uses a pair of tuples to identify a connection). Multiple server instances (processes) that bind to the same socket can cooperate to service multiple clients. The socket implementation arranges to not duplicate the same received unicast message to multiple server processes.

Many operating systems also allow a UDP socket to be "connected", i.e., to bind a UDP socket to a specific (remote) UDP endpoint. Unlike TCP's connect primitive, for UDP, this is only a local operation that serves to simplify the local send/receive functions and to filter the traffic for the specified addresses and ports [RFC8085].
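
The difference between UDP and TCP demultiplexing noted above can be sketched with two lookup tables. The table contents, handler names, and function names are hypothetical, the addresses come from documentation ranges, and "connected" UDP sockets (which also filter on the remote endpoint) are not modeled.

```python
# Hypothetical demultiplexing tables; real stacks use kernel data structures.
udp_sockets = {("192.0.2.1", 53): "dns-server"}
tcp_connections = {
    ("192.0.2.1", 80, "198.51.100.7", 40001): "conn-A",
    ("192.0.2.1", 80, "198.51.100.8", 40001): "conn-B",
}


def demux_udp(dst_ip, dst_port, src_ip, src_port):
    """UDP: only the destination (IP address, port) selects the socket."""
    return udp_sockets.get((dst_ip, dst_port))


def demux_tcp(dst_ip, dst_port, src_ip, src_port):
    """TCP: the full four-tuple selects the connection."""
    return tcp_connections.get((dst_ip, dst_port, src_ip, src_port))


# Two different clients reach the same UDP socket ...
print(demux_udp("192.0.2.1", 53, "198.51.100.7", 5000))
print(demux_udp("192.0.2.1", 53, "198.51.100.8", 5000))
# ... but map to distinct TCP connections on the same server port.
print(demux_tcp("192.0.2.1", 80, "198.51.100.7", 40001))
print(demux_tcp("192.0.2.1", 80, "198.51.100.8", 40001))
```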

3.3.3. Transport Features

The transport features provided by UDP are:

   o  unicast, multicast, anycast, or IPv4 broadcast transport,

   o  port multiplexing (where a receiving port can be configured to
      receive datagrams from multiple senders),

   o  message-oriented delivery,

   o  unidirectional or bidirectional communication where the
      transmissions in each direction are independent,

   o  non-reliable delivery,

   o  unordered delivery, and

   o  error detection (implemented using a segment checksum to verify
      delivery to the correct endpoint and integrity of the data;
      optional for IPv4 and optional under specific conditions for IPv6
      where all or none of the payload data is protected).

3.4. Lightweight User Datagram Protocol (UDP-Lite)

The Lightweight User Datagram Protocol (UDP-Lite) [RFC3828] is an IETF Standards Track transport protocol. It provides a unidirectional datagram protocol that preserves message boundaries. IETF guidance on the use of UDP-Lite is provided in [RFC8085]. A UDP-Lite service may support IPv4 broadcast, multicast, anycast, and unicast, as well as IPv6 multicast, anycast, and unicast.

Examples of use include a class of applications that can derive benefit from having partially damaged payloads delivered rather than discarded. One use is to provide header integrity checks but allow delivery of corrupted payloads to error-tolerant applications or to applications that use some other mechanism to provide payload integrity (see [RFC6936]).

3.4.1. Protocol Description

Like UDP, UDP-Lite is a connectionless datagram protocol, with no connection setup or feature negotiation. It changes the semantics of the UDP Payload Length field to that of a Checksum Coverage Length field and is identified by a different IP protocol/next-header value. The Checksum Coverage Length field specifies the intended checksum coverage, with the remaining unprotected part of the payload called
   the "error-insensitive part".  Therefore, applications using UDP-Lite
   cannot make assumptions regarding the correctness of the data
   received in the insensitive part of the UDP-Lite payload.
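
   The effect of partial checksum coverage can be sketched as follows.
   This is a simplification and an assumption-laden illustration: the
   function names are invented, coverage is counted from the start of
   the payload, and the pseudo-header and UDP-Lite header that a real
   checksum also covers are left out.

```python
def checksum16(data: bytes) -> int:
    """Internet checksum: ones' complement of the 16-bit ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def partial_checksum(payload: bytes, coverage: int) -> int:
    """Checksum only the first `coverage` bytes (the sensitive part)."""
    return checksum16(payload[:coverage])


payload = b"HDR!" + b"error-tolerant media samples"
coverage = 4                      # protect only the 4-byte application header
sent = partial_checksum(payload, coverage)

# Flip one bit outside the covered part, then one bit inside it.
damaged_tail = payload[:10] + bytes([payload[10] ^ 0x01]) + payload[11:]
damaged_head = bytes([payload[0] ^ 0x01]) + payload[1:]

print(partial_checksum(damaged_tail, coverage) == sent)  # True: not covered
print(partial_checksum(damaged_head, coverage) == sent)  # False: detected
```

   Damage in the uncovered part passes the check, which is why the
   application cannot assume that this part of the data is correct.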

   Otherwise, UDP-Lite is semantically identical to UDP.  In the same
   way as for UDP, mechanisms for receiver flow control, congestion
   control, PMTU or PLPMTU discovery, support for ECN, etc., need to be
   provided by upper-layer protocols [RFC8085].

3.4.2. Interface Description

There is no API currently specified in the RFC Series, but guidance on use of common APIs is provided in [RFC8085].

The interface of UDP-Lite differs from that of UDP by the addition of a single (socket) option that communicates a checksum coverage length value. The checksum coverage may also be made visible to the application via the UDP-Lite MIB module [RFC5097].

3.4.3. Transport Features

The transport features provided by UDP-Lite are:

   o  unicast, multicast, anycast, or IPv4 broadcast transport (same as
      for UDP),

   o  port multiplexing (same as for UDP),

   o  message-oriented delivery (same as for UDP),

   o  unidirectional or bidirectional communication where the
      transmissions in each direction are independent (same as for
      UDP),

   o  non-reliable delivery (same as for UDP),

   o  non-ordered delivery (same as for UDP), and

   o  partial or full payload error detection (where the Checksum
      Coverage field indicates the size of the payload data covered by
      the checksum).

3.5. Stream Control Transmission Protocol (SCTP)

SCTP is a message-oriented IETF Standards Track transport protocol. The base protocol is specified in [RFC4960]. It supports multihoming and path failover to provide resilience to path failures.

An SCTP association has multiple streams in each direction, providing in-sequence delivery of user messages within each stream. This allows it to minimize head-of-line blocking. SCTP supports multiple stream-scheduling schemes controlling stream multiplexing, including priority and fair weighting schemes.

   SCTP was originally developed for transporting telephony signaling
   messages and is deployed in telephony signaling networks, especially
   in mobile telephony networks.  It can also be used for other
   services, for example, in the WebRTC framework for data channels.

3.5.1. Protocol Description

   SCTP is a connection-oriented protocol using a four-way handshake to
   establish an SCTP association and a three-way message exchange to
   gracefully shut it down.  It uses the same port number concept as
   DCCP, TCP, UDP, and UDP-Lite.  SCTP only supports unicast.

   SCTP uses the 32-bit CRC32c for protecting SCTP packets against bit
   errors and misdelivery of packets to an unintended endpoint.  This is
   stronger than the 16-bit checksums used by TCP or UDP.  However,
   partial payload checksum coverage as provided by DCCP or UDP-Lite is
   not supported.

   SCTP has been designed with extensibility in mind.  A common header
   is followed by a sequence of chunks.  [RFC4960] defines how a
   receiver processes chunks with an unknown chunk type.  The support of
   extensions can be negotiated during the SCTP handshake.  Currently
   defined extensions include mechanisms for dynamic reconfiguration of
   streams [RFC6525] and IP addresses [RFC5061].  Furthermore, the
   extension specified in [RFC3758] introduces the concept of partial
   reliability for user messages.

   SCTP provides a message-oriented service.  Multiple small user
   messages can be bundled into a single SCTP packet to improve
   efficiency.  For example, this bundling may be done by delaying user
   messages at the sender, similar to Nagle's algorithm used by TCP.
   User messages that would result in IP packets larger than the MTU
   will be fragmented at the sender and reassembled at the receiver.
   There is no protocol limit on the user message size.  For MTU
   discovery, the same mechanism as for TCP can be used [RFC1981]
   [RFC4821], as well as utilization of probe packets with padding
   chunks, as defined in [RFC4820].

   [RFC4960] specifies TCP-friendly congestion control to protect the
   network against overload.  SCTP also uses sliding window flow control
   to protect receivers against overflow.  Similar to TCP, SCTP also
   supports delaying acknowledgments.  [RFC7053] provides a way for the
   sender of user messages to request immediate sending of the
   corresponding acknowledgments.
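
   As an illustration of the CRC32c checksum mentioned above, the
   following is a minimal, unoptimized Python sketch.  The function
   name crc32c is chosen here for illustration and is not part of any
   SCTP API.  Real stacks typically use table-driven or
   hardware-assisted code, and [RFC4960] computes the value over the
   whole packet with the checksum field set to zero.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF                      # initial value: all ones
    for byte in data:
        crc ^= byte                       # fold the next byte into the low bits
        for _ in range(8):                # process one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion


if __name__ == "__main__":
    packet = b"123456789"
    print(hex(crc32c(packet)))            # standard check input for CRC-32C
    corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
    print(crc32c(corrupted) != crc32c(packet))  # a single bit flip is detected
```

   A receiver would recompute the value over the received packet and
   discard the packet if the result differs from the transmitted one.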
   Each SCTP association has between 1 and 65536 unidirectional streams
   in each direction.  The number of streams can be different in each
   direction.  Every user message is sent on a particular stream.  User
   messages can be sent unordered or ordered upon request by the upper
   layer.  Unordered messages can be delivered as soon as they are
   completely received.  For user messages not requiring fragmentation,
   this minimizes head-of-line blocking.  On the other hand, ordered
   messages sent on the same stream are delivered at the receiver in the
   same order as sent by the sender.
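
   The delivery rules above can be sketched with a toy receiver model
   in Python.  This is an assumption-laden illustration, not the
   reassembly logic of any real SCTP stack: the class name
   ToyStreamReceiver, the on_message method, and the use of a simple
   per-stream counter starting at 0 are all hypothetical.

```python
from collections import defaultdict


class ToyStreamReceiver:
    """Toy model of per-stream ordered delivery and unordered delivery."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # next expected sequence number per stream
        self.pending = defaultdict(dict)   # stream id -> {sequence number: payload}
        self.delivered = []                # (stream id, payload) in delivery order

    def on_message(self, stream, ssn, payload, unordered=False):
        if unordered:
            # Unordered messages are delivered as soon as they are complete.
            self.delivered.append((stream, payload))
            return
        # Ordered messages wait until all earlier ones on the SAME stream arrived.
        self.pending[stream][ssn] = payload
        while self.next_ssn[stream] in self.pending[stream]:
            ready = self.pending[stream].pop(self.next_ssn[stream])
            self.delivered.append((stream, ready))
            self.next_ssn[stream] += 1


if __name__ == "__main__":
    rx = ToyStreamReceiver()
    rx.on_message(0, 1, "b")                 # held back: stream 0 still misses 0
    rx.on_message(1, 0, "x")                 # stream 1 is not blocked by stream 0
    rx.on_message(0, 0, "a")                 # releases "a" and then "b"
    print(rx.delivered)
```

   In this model, a gap on one stream delays only later ordered
   messages on that stream; other streams and unordered messages are
   unaffected.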

   The base protocol defined in [RFC4960] does not allow interleaving of
   user messages.  Large messages on one stream can therefore block the
   sending of user messages on other streams.  [SCTP-NDATA] describes a
   method to overcome this limitation.  That document also specifies
   multiple algorithms for the sender-side selection of which streams to
   send data from, supporting a variety of scheduling algorithms
   including priority-based methods.  The stream reconfiguration
   extension defined in [RFC6525] allows streams to be reset during the
   lifetime of an association and to increase the number of streams, if
   the number of streams negotiated in the SCTP handshake becomes
   insufficient.
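
   The blocking effect described above, and how interleaving removes
   it, can be sketched with a toy sender model in Python.  The
   function send_order and the round-robin choice are hypothetical
   and are not taken from [RFC4960] or [SCTP-NDATA]; each message is
   assumed to consist of at least one fragment.

```python
from collections import deque


def send_order(queues, interleave):
    """Return the order in which fragments leave the sender.

    queues maps a stream id to a list of messages; each message is a
    non-empty list of fragments.  Streams are visited round-robin.
    """
    pending = {sid: deque(deque(msg) for msg in msgs)
               for sid, msgs in queues.items()}
    order = []
    while any(pending.values()):
        for sid in sorted(pending):
            if not pending[sid]:
                continue
            msg = pending[sid][0]
            if interleave:
                # Send one fragment, then let the next stream have a turn.
                order.append((sid, msg.popleft()))
            else:
                # Without interleaving, a message is sent in full once started.
                while msg:
                    order.append((sid, msg.popleft()))
            if not msg:
                pending[sid].popleft()
    return order


if __name__ == "__main__":
    queues = {0: [["A1", "A2", "A3"]], 1: [["B1"]]}
    print(send_order(queues, interleave=False))
    print(send_order(queues, interleave=True))
```

   With interleaving, the short message on stream 1 is sent after the
   first fragment of the large message instead of after the last one.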

   Each user message sent is delivered to the receiver or, in case of
   excessive retransmissions, the association is terminated in a
   non-graceful way [RFC4960], similar to TCP behavior.  In addition to
   this reliable transfer, the partial reliability extension [RFC3758]
   allows a sender to abandon user messages.  The application can
   specify the policy for abandoning user messages.
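
   As a rough illustration of such abandonment policies, the
   following Python sketch decides, at the moment a loss is detected,
   whether a message is retransmitted or abandoned.  The names
   ToyMessage and should_abandon and the parameters lifetime and
   max_rtx are hypothetical; they model a time limit and a
   retransmission limit and are not the API of any SCTP
   implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMessage:
    first_sent_at: float       # time of the first transmission, in seconds
    retransmissions: int = 0   # retransmissions performed so far


def should_abandon(msg: ToyMessage, now: float,
                   lifetime: Optional[float] = None,
                   max_rtx: Optional[int] = None) -> bool:
    """Return True if the sender should give up instead of retransmitting."""
    if lifetime is not None and now - msg.first_sent_at >= lifetime:
        return True            # the message has outlived its usefulness
    if max_rtx is not None and msg.retransmissions >= max_rtx:
        return True            # the retransmission budget is used up
    return False               # no limit configured or reached: fully reliable


if __name__ == "__main__":
    msg = ToyMessage(first_sent_at=0.0)
    print(should_abandon(msg, now=0.2))              # reliable: keep trying
    print(should_abandon(msg, now=0.2, max_rtx=0))   # never retransmit
    print(should_abandon(msg, now=1.5, lifetime=1.0))
```

   With neither limit set, the sketch never abandons a message, which
   corresponds to the fully reliable service of the base protocol.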

   SCTP supports multihoming.  Each SCTP endpoint uses a list of IP
   addresses and a single port number.  These addresses can be any
   mixture of IPv4 and IPv6 addresses.  These addresses are negotiated
   during the handshake, and the address reconfiguration extension
   specified in [RFC5061] in combination with [RFC4895] can be used to
   change these addresses in an authenticated way during the lifetime of
   an SCTP association.  This allows for transport-layer mobility.
   Multiple addresses are used for improved resilience.  If a remote
   address becomes unreachable, the traffic is switched over to a
   reachable one, if one exists.
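
   The failover behavior can be sketched as a simple selection
   function in Python.  The function choose_destination and the idea
   of passing in a precomputed set of reachable addresses are
   assumptions made for illustration; a real stack derives
   reachability from heartbeats and retransmission counters.

```python
from typing import Iterable, Optional, Set


def choose_destination(primary: str,
                       addresses: Iterable[str],
                       reachable: Set[str]) -> Optional[str]:
    """Prefer the primary address; fall back to any reachable alternate."""
    if primary in reachable:
        return primary
    for address in addresses:
        if address in reachable:
            return address
    return None                # no usable path is left


if __name__ == "__main__":
    peer = ["192.0.2.1", "2001:db8::1"]   # documentation addresses, IPv4 and IPv6
    print(choose_destination("192.0.2.1", peer, {"192.0.2.1", "2001:db8::1"}))
    print(choose_destination("192.0.2.1", peer, {"2001:db8::1"}))
    print(choose_destination("192.0.2.1", peer, set()))
```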

   For securing user messages, the use of TLS over SCTP has been
   specified in [RFC3436].  However, this solution does not support all
   services provided by SCTP, such as unordered delivery or partial
   reliability.  Therefore, the use of DTLS over SCTP has been specified
   in [RFC6083] to overcome these limitations.  When using DTLS over
   SCTP, the application can use almost all services provided by SCTP.
   [NAT-SUPP] defines methods for endpoints and middleboxes to provide
   NAT traversal for SCTP over IPv4.  For legacy NAT traversal,
   [RFC6951] defines the UDP encapsulation of SCTP packets.
   Alternatively, SCTP packets can be encapsulated in DTLS packets as
   specified in [SCTP-DTLS-ENCAPS].  The latter encapsulation is used
   within the WebRTC [WEBRTC-TRANS] context.

   An SCTP ABORT chunk may be used to force an SCTP endpoint to close
   an association [RFC4960], aborting it rather than shutting it down
   gracefully.

   SCTP has a well-defined API, described in the next subsection.

3.5.2. Interface Description

   [RFC4960] defines an abstract API for the base protocol.  This API
   describes the following functions callable by the upper layer of
   SCTP: Initialize, Associate, Send, Receive, Receive Unsent Message,
   Receive Unacknowledged Message, Shutdown, Abort, SetPrimary, Status,
   Change Heartbeat, Request Heartbeat, Get SRTT Report, Set Failure
   Threshold, Set Protocol Parameters, and Destroy.  The following
   notifications are provided by the SCTP stack to the upper layer:
   COMMUNICATION UP, DATA ARRIVE, SHUTDOWN COMPLETE, COMMUNICATION LOST,
   COMMUNICATION ERROR, RESTART, SEND FAILURE, and NETWORK STATUS
   CHANGE.

   An extension to the BSD Sockets API is defined in [RFC6458] and
   covers:

   o  the base protocol defined in [RFC4960].  The API allows control
      over local addresses and port numbers and the primary path.
      Furthermore, the application has fine control of parameters like
      retransmission thresholds, the path supervision, the delayed
      acknowledgment timeout, and the fragmentation point.  The API
      provides a mechanism to allow the SCTP stack to notify the
      application about events if the application has requested them.
      These notifications provide information about status changes of
      the association and each of the peer addresses.  In case of send
      failures, including drop of messages sent unreliably, the
      application can also be notified, and user messages can be
      returned to the application.  When sending user messages, the
      application can indicate a stream id, a payload protocol
      identifier, and an indication of whether ordered delivery is
      requested.  These parameters can also be provided on message
      reception.  Additionally, a context can be provided when sending,
      which can be used in case of send failures.  The sending of
      arbitrarily large user messages is supported.
   o  the SCTP Partial Reliability extension defined in [RFC3758] to
      specify for a user message the Partially Reliable SCTP (PR-SCTP)
      policy and the policy-specific parameter.  Examples of these
      policies defined in [RFC3758] and [RFC7496] are:

      *  limiting the time a user message is dealt with by the sender.

      *  limiting the number of retransmissions for each fragment of a
         user message.  If the number of retransmissions is limited to
         0, one gets a service similar to UDP.

      *  abandoning messages of lower priority in case of a send buffer
         shortage.

   o  the SCTP Authentication extension defined in [RFC4895].  The API
      allows management of the shared keys, selection of the HMAC to
      use, setting of the chunk types that are only accepted in an
      authenticated way, and retrieval of the lists of chunks that the
      local and remote endpoints accept only in an authenticated way.

   o  the SCTP Dynamic Address Reconfiguration extension defined in
      [RFC5061].  It allows the manual addition and deletion of local
      addresses for SCTP associations, as well as the enabling of
      automatic address addition and deletion.  Furthermore, the peer
      can be given a hint for choosing its primary path.

   A BSD Sockets API extension has been defined in the documents that
   specify the following SCTP extensions:

   o  the SCTP Stream Reconfiguration extension defined in [RFC6525].
      The API allows triggering of the reset operation for incoming and
      outgoing streams and the whole association.  It also provides a
      way to notify the application about the corresponding events.
      Furthermore, the application can increase the number of streams.

   o  the UDP Encapsulation of SCTP packets extension defined in
      [RFC6951].  The API allows the management of the remote UDP
      encapsulation port.

   o  the SCTP SACK-IMMEDIATELY extension defined in [RFC7053].  The API
      allows the sender of a user message to request the receiver to
      send the corresponding acknowledgment immediately.

   o  the additional PR-SCTP policies defined in [RFC7496].  The API
      allows enabling/disabling the PR-SCTP extension, choosing the
      PR-SCTP policies defined in the document, and providing
      statistical information about abandoned messages.
   Future documents describing SCTP extensions are expected to describe
   the corresponding BSD Sockets API extension in a "Socket API
   Considerations" section.

   The SCTP Socket API supports two kinds of sockets:

   o  one-to-one style sockets (by using the socket type "SOCK_STREAM").

   o  one-to-many style sockets (by using the socket type
      "SOCK_SEQPACKET").

   One-to-one style sockets are similar to TCP sockets; there is a 1:1
   relationship between the sockets and the SCTP associations (except
   for listening sockets).  One-to-many style SCTP sockets are similar
   to unconnected UDP sockets, where there is a 1:n relationship between
   the sockets and the SCTP associations.
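
   As a hedged illustration of the two socket styles, the following
   Python sketch tries to open each kind of SCTP socket through the
   standard socket module.  The helper open_sctp_socket is
   hypothetical.  The sketch assumes a Unix-like platform; it falls
   back to protocol number 132 (the IANA-assigned value for SCTP) if
   the module does not export IPPROTO_SCTP, and it returns None when
   the local kernel offers no SCTP support.

```python
import socket

# IANA protocol number for SCTP, used if the constant is not exported.
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)

SOCKET_TYPES = {
    "one-to-one": socket.SOCK_STREAM,        # one association per socket
    "one-to-many": socket.SOCK_SEQPACKET,    # many associations per socket
}


def open_sctp_socket(style):
    """Return an IPv4 SCTP socket of the given style, or None if unsupported."""
    try:
        return socket.socket(socket.AF_INET, SOCKET_TYPES[style], IPPROTO_SCTP)
    except OSError:
        return None            # e.g., SCTP is not available on this host


if __name__ == "__main__":
    for style in SOCKET_TYPES:
        sock = open_sctp_socket(style)
        print(style, "available" if sock is not None else "not available")
        if sock is not None:
            sock.close()
```

   Only socket creation is shown; the SCTP-specific send and receive
   calls and socket options of [RFC6458] are not exposed by the
   standard Python socket module and are outside this sketch.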

   The SCTP stack can provide information to the applications about
   state changes of the individual paths and the association whenever
   they occur.  These events are delivered similarly to user messages
   but are specifically marked as notifications.

   New functions have been introduced to support the use of multiple
   local and remote addresses.  Additional SCTP-specific send and
   receive calls have been defined to permit SCTP-specific information
   to be sent without using ancillary data in the form of additional
   Control Message (cmsg) calls.  These functions provide support for
   detecting partial delivery of user messages and notifications.

   The SCTP Socket API allows a fine-grained control of the protocol
   behavior through an extensive set of socket options.

   The SCTP kernel implementations of FreeBSD, Linux, and Solaris mostly
   follow the specified extension to the BSD Sockets API for the base
   protocol and the corresponding supported protocol extensions.

3.5.3. Transport Features

   The transport features provided by SCTP are:

   o  connection-oriented transport with feature negotiation and
      application-to-port mapping,

   o  unicast transport,

   o  port multiplexing,

   o  unidirectional or bidirectional communication,
   o  message-oriented delivery with durable message framing supporting
      multiple concurrent streams,

   o  fully reliable, partially reliable, or unreliable delivery (based
      on user-specified policy to handle abandoned user messages) with
      drop notification,

   o  ordered and unordered delivery within a stream,

   o  support for stream scheduling prioritization,

   o  segmentation,

   o  user message bundling,

   o  flow control using a window-based mechanism,

   o  congestion control using methods similar to TCP,

   o  strong error detection (CRC32c), and

   o  transport-layer multihoming for resilience and mobility.


