Internet Engineering Task Force (IETF)                          M. Welzl
Request for Comments: 8303                            University of Oslo
Category: Informational                                        M. Tuexen
ISSN: 2070-1721                         Muenster Univ. of Appl. Sciences
                                                              N. Khademi
                                                      University of Oslo
                                                           February 2018

On the Usage of Transport Features Provided by IETF Transport Protocols

Abstract
This document describes how the transport protocols Transmission Control Protocol (TCP), MultiPath TCP (MPTCP), Stream Control Transmission Protocol (SCTP), User Datagram Protocol (UDP), and Lightweight User Datagram Protocol (UDP-Lite) expose services to applications and how an application can configure and use the features that make up these services. It also discusses the service provided by the Low Extra Delay Background Transport (LEDBAT) congestion control mechanism. The description results in a set of transport abstractions that can be exported in a transport services (TAPS) API.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8303.
Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents
1. Introduction
2. Terminology
3. Pass 1
   3.1. Primitives Provided by TCP
      3.1.1. Excluded Primitives or Parameters
   3.2. Primitives Provided by MPTCP
   3.3. Primitives Provided by SCTP
      3.3.1. Excluded Primitives or Parameters
   3.4. Primitives Provided by UDP and UDP-Lite
   3.5. The Service of LEDBAT
4. Pass 2
   4.1. CONNECTION-Related Primitives
   4.2. DATA-Transfer-Related Primitives
5. Pass 3
   5.1. CONNECTION-Related Transport Features
   5.2. DATA-Transfer-Related Transport Features
      5.2.1. Sending Data
      5.2.2. Receiving Data
      5.2.3. Errors
6. IANA Considerations
7. Security Considerations
8. References
   8.1. Normative References
   8.2. Informative References
Appendix A. Overview of RFCs Used as Input for Pass 1
Appendix B. How This Document Was Developed
Acknowledgements
Authors' Addresses
1. Introduction
This specification describes how transport protocols offer transport services, such that applications using them are no longer directly tied to a specific protocol. Breaking this strict connection can reduce the effort for an application programmer, yet attain greater transport flexibility by pushing complexity into an underlying transport services (TAPS) system.

This design process has started with a survey of the services provided by IETF transport protocols and congestion control mechanisms [RFC8095]. The present document and [RFC8304] complement this survey with an in-depth look at the defined interactions between applications and the following unicast transport protocols: Transmission Control Protocol (TCP), MultiPath TCP (MPTCP), Stream Control Transmission Protocol (SCTP), User Datagram Protocol (UDP), and Lightweight User Datagram Protocol (UDP-Lite). We also define a primitive to enable/disable and configure the Low Extra Delay Background Transport (LEDBAT) unicast congestion control mechanism. For UDP and UDP-Lite, the first step of the protocol analysis -- a discussion of relevant RFC text -- is documented in [RFC8304].

This snapshot in time of the IETF transport protocols is published as an RFC to document the analysis by the authors and the TAPS Working Group; this generates a set of transport abstractions that can be exported in a TAPS API. It provides the basis for the minimal set of transport services that end systems supporting TAPS should implement [TAPS-MINSET].

The list of primitives, events, and transport features in this document is strictly based on the parts of protocol specifications that describe what the protocol provides to an application using it and how the application interacts with it. Transport protocols provide communication between processes that operate on network endpoints, which means that they allow for multiplexing of communication between the same IP addresses, and this multiplexing is achieved using port numbers. Port multiplexing is therefore assumed to be always provided and not discussed in this document.

Parts of a protocol that are explicitly stated as optional to implement are not covered. Interactions between the application and a transport protocol that are not directly related to the operation of the protocol are also not covered. For example, there are various ways for an application to use socket options to indicate its interest in receiving certain notifications [RFC6458]. However, for the purpose of identifying primitives, events, and transport features, the ability to enable or disable the reception of notifications is irrelevant. Similarly, "one-to-many style sockets" [RFC6458] just affect the application programming style, not how the underlying protocol operates, and they are therefore not discussed here. The same is true for the ability to obtain the unchanged value of a parameter that an application has previously set (e.g., via "get" in get/set operations [RFC6458]).

The document presents a three-pass process to arrive at a list of transport features. In the first pass (pass 1), the relevant RFC text is discussed per protocol. In the second pass (pass 2), this discussion is used to derive a list of primitives and events that are uniformly categorized across protocols. Here, an attempt is made to present or -- where text describing primitives or events does not yet exist -- construct primitives or events in a slightly generalized form to highlight similarities. This is, for example, achieved by renaming primitives or events of protocols or by avoiding a strict 1:1 mapping between the primitives or events in the protocol specification and primitives or events in the list. Finally, the third pass (pass 3) presents transport features based on pass 2, identifying which protocols implement them.

In the list resulting from the second pass, some transport features are missing because they are implicit in some protocols, and they only become explicit when we consider the superset of all transport features offered by all protocols. For example, TCP always carries out congestion control; we have to consider it together with a protocol like UDP (which does not have congestion control) before we can consider congestion control as a transport feature. The complete list of transport features across all protocols is therefore only available after pass 3.

Some protocols are connection oriented. Connection-oriented protocols often use an initial call to a specific primitive to open a connection before communication can progress and require communication to be explicitly terminated by issuing another call to a primitive (usually called 'Close'). A "connection" is the common state that some transport primitives refer to, e.g., to adjust general configuration settings. Connection establishment, maintenance, and termination are therefore used to categorize transport primitives of connection-oriented transport protocols in pass 2 and pass 3. For this purpose, UDP is assumed to be used with "connected" sockets, i.e., sockets that are bound to a specific pair of addresses and ports [RFC8304].
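
As an illustration of the "connected" UDP sockets assumed above, the following is a minimal sketch using the Python standard library. It is not taken from any of the cited RFCs: the helper names connected_udp_pair() and demo_connected_udp(), the loopback address, and the 2-second timeout are assumptions made for this example only.

```python
import socket


def connected_udp_pair():
    """Return two UDP sockets on loopback, each connected to the other.

    Hypothetical helper: after connect(), each socket is tied to one
    specific pair of addresses and ports.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Port 0 asks the operating system to pick free local ports.
    a.bind(("127.0.0.1", 0))
    b.bind(("127.0.0.1", 0))
    # connect() on a UDP socket sends nothing on the wire; it only
    # records the remote address and port for later send()/recv() calls.
    a.connect(b.getsockname())
    b.connect(a.getsockname())
    return a, b


def demo_connected_udp():
    """Send one datagram over the connected pair and return what arrived."""
    a, b = connected_udp_pair()
    try:
        b.settimeout(2.0)  # assumption: avoid blocking forever on loss
        a.send(b"hello")   # no destination argument is needed
        return b.recv(1500)
    finally:
        a.close()
        b.close()
```

Because both sockets are connected, send() needs no destination and recv() only accepts datagrams from the recorded peer, which is what lets UDP be categorized alongside connection-oriented protocols in pass 2 and pass 3.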
2. Terminology
Transport Feature: a specific end-to-end feature that the transport layer provides to an application. Examples include confidentiality, reliable delivery, ordered delivery, message-versus-stream orientation, etc.

Transport Service: a set of transport features, without an association to any given framing protocol, which provides a complete service to an application.

Transport Protocol: an implementation that provides one or more transport services using a specific framing and header format on the wire.

Transport Protocol Component: an implementation of a transport feature within a protocol.

Transport Service Instance: an arrangement of transport protocols with a selected set of features and configuration parameters that implement a single transport service, e.g., a protocol stack (RTP over UDP).

Application: an entity that uses the transport layer for end-to-end delivery of data across the network (this may also be an upper-layer protocol or tunnel encapsulation).

Endpoint: an entity that communicates with one or more other endpoints using a transport protocol.

Connection: shared state of two or more endpoints that persists across messages that are transmitted between these endpoints.

Primitive: a function call that is used to locally communicate between an application and a transport endpoint. A primitive is related to one or more transport features.

Event: a primitive that is invoked by a transport endpoint.

Parameter: a value passed between an application and a transport protocol by a primitive.

Socket: the combination of a destination IP address and a destination port number.

Transport Address: the combination of an IP address, transport protocol, and the port number used by the transport protocol.
3. Pass 1
This first iteration summarizes the relevant text parts of the RFCs describing the protocols, focusing on what each transport protocol provides to the application and how it is used (abstract API descriptions, where they are available). When presenting primitives, events, and parameters, the use of lower- and upper-case characters is made uniform for the sake of readability.

3.1. Primitives Provided by TCP
The initial TCP specification [RFC0793] states:

   The Transmission Control Protocol (TCP) is intended for use as a highly reliable host-to-host protocol between hosts in packet-switched computer communication networks, and in interconnected systems of such networks.

Section 3.8 of [RFC0793] further specifies the interaction with the application by listing several transport primitives. It is also assumed that an Operating System provides a means for TCP to asynchronously signal the application; the primitives representing such signals are called 'events' in this section. This section describes the relevant primitives.

Open: This is either active or passive, to initiate a connection or listen for incoming connections. All other primitives are associated with a specific connection, which is assumed to first have been opened. An active open call contains a socket. A passive open call with a socket waits for a particular connection; alternatively, a passive open call can leave the socket unspecified to accept any incoming connection. A fully specified passive call can later be made active by calling 'Send'.

   Optionally, a timeout can be specified, after which TCP will abort the connection if data has not been successfully delivered to the destination (else a default timeout value is used). A procedure for aborting the connection is used to avoid excessive retransmissions, and an application is able to control the threshold used to determine the condition for aborting; this threshold may be measured in time units or as a count of retransmissions [RFC1122]. This indicates that the timeout could also be specified as a count of retransmissions.

   Also optional, for multihomed hosts, the local IP address can be provided [RFC1122]. If it is not provided, a default choice will be made in case of active open calls. A passive open call will await incoming connection requests to all local addresses and then maintain usage of the local IP address where the incoming connection request has arrived.

   Finally, the 'options' parameter allows the application to specify IP options such as Source Route, Record Route, or Timestamp [RFC1122]. It is not stated on which segments of a connection these options should be applied, but probably on all segments, as this is also stated in a specification given for the usage of the Source Route IP option (Section 4.2.3.8 of [RFC1122]). Source Route is the only non-optional IP option in this parameter, allowing an application to specify a source route when it actively opens a TCP connection.

   Master Key Tuples (MKTs) for authentication can optionally be configured when calling 'Open' (Section 7.1 of [RFC5925]). When authentication is in use, complete TCP segments are authenticated, including the TCP IPv4 pseudoheader, TCP header, and TCP data.

   TCP Fast Open (TFO) [RFC7413] allows applications to immediately hand over a message from the active open to the passive open side of a TCP connection together with the first connection establishment packet (the SYN). This can be useful for applications that are sensitive to TCP's connection setup delay. [RFC7413] states that "TCP implementations MUST NOT use TFO by default, but only use TFO if requested explicitly by the application on a per-service-port basis." The size of the message sent with TFO cannot be more than TCP's maximum segment size (minus options used in the SYN). For the active open side, it is recommended to change or replace the connect() call in order to support a user data buffer argument [RFC7413]. For the passive open side, the application needs to enable the reception of Fast Open requests, e.g., via a new TCP_FASTOPEN setsockopt() socket option before listen(). The receiving application must be prepared to accept duplicates of the TFO message, as the first data written to a socket can be delivered more than once to the application on the remote host.
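
The passive open side of TFO described under 'Open' can be sketched as follows in Python. This is a hedged illustration, not an API defined by [RFC7413]: the helper name listen_with_tfo() and its parameters are hypothetical, the interpretation of the option value as a queue length for pending Fast Open requests is an assumption that matches Linux, and the socket.TCP_FASTOPEN constant is only present on platforms that expose it.

```python
import socket


def listen_with_tfo(host="127.0.0.1", port=0, queue_len=16):
    """Open a passive TCP socket and request TFO before listen().

    Hypothetical helper. Returns (listening_socket, tfo_requested), where
    tfo_requested is False if the platform does not offer the option.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))

    tfo_requested = False
    # TFO must be requested explicitly by the application; the constant
    # is missing on platforms without TFO support, hence the getattr().
    opt = getattr(socket, "TCP_FASTOPEN", None)
    if opt is not None:
        try:
            # Assumption (Linux semantics): the value is the maximum
            # number of pending Fast Open requests.
            srv.setsockopt(socket.IPPROTO_TCP, opt, queue_len)
            tfo_requested = True
        except OSError:
            # The system refused the option; fall back to a regular
            # passive open without TFO.
            pass

    srv.listen()
    return srv, tfo_requested
```

An application using such a listener still has to tolerate duplicates of the first message, since enabling the option does not change the delivery caveat stated above.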
Send: This is the primitive that an application uses to give the local TCP transport endpoint a number of bytes that TCP should reliably send to the other side of the connection. The 'urgent' flag, if set, states that the data handed over by this send call is urgent and this urgency should be indicated to the receiving process in case the receiving application has not yet consumed all non-urgent data preceding it. An optional timeout parameter can be provided that updates the connection's timeout (see 'Open'). Additionally, optional parameters allow the ability to indicate the preferred outgoing MKT (current_key) and/or the preferred incoming MKT (rnext_key) of a connection (Section 7.1 of [RFC5925]).

Receive: This primitive allocates a receiving buffer for a provided number of bytes. It returns the number of received bytes provided in the buffer when these bytes have been received and written into the buffer by TCP. The application is informed of urgent data via an 'urgent' flag: if it is on, there is urgent data; if it is off, there is no urgent data or this call to 'Receive' has returned all the urgent data. The application is also informed about the current_key and rnext_key information carried in a recently received segment via an optional parameter (Section 7.1 of [RFC5925]).

Close: This primitive closes one side of a connection. It is semantically equivalent to "I have no more data to send" but does not mean "I will not receive any more", as the other side may still have data to send. This call reliably delivers any data that has already been given to TCP (and if that fails, 'Close' becomes 'abort').

Abort: This primitive causes all pending 'Send' and 'Receive' calls to be aborted. A TCP "RESET" message is sent to the TCP endpoint on the other side of the connection [RFC0793].

Close Event: TCP uses this primitive to inform an application that the application on the other side has called the 'Close' primitive, so the local application can also issue a 'Close' and terminate the connection gracefully. See [RFC0793], Section 3.5.

Abort Event: When TCP aborts a connection upon receiving a "RESET" from the peer, it "advises the user and goes to the CLOSED state." See [RFC0793], Section 3.4.

User Timeout Event: This event is executed when the user timeout (Section 3.9 of [RFC0793]) expires (see the definition of 'Open' in this section). All queues are flushed, and the application is informed that the connection had to be aborted due to user timeout.
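
The one-sided semantics of 'Close' and the resulting 'Close Event' can be illustrated with a short Python sketch over the loopback interface. The mapping used here is an assumption about the sockets API rather than something stated in [RFC0793]: shutdown(SHUT_WR) stands in for the abstract 'Close' primitive, and a zero-length read stands in for the 'Close Event'. The function name half_close_demo() is hypothetical.

```python
import socket


def half_close_demo():
    """Show that closing the sending side still allows receiving."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    server, _ = listener.accept()
    server.settimeout(2.0)
    try:
        client.sendall(b"request")
        # Assumed equivalent of 'Close': "I have no more data to send".
        client.shutdown(socket.SHUT_WR)

        received = b""
        while True:
            chunk = server.recv(1024)
            if not chunk:
                # Zero bytes: the peer has closed its sending side
                # (assumed equivalent of the 'Close Event').
                break
            received += chunk

        # The other direction is still open, so a reply can be sent.
        server.sendall(b"reply")
        server.close()

        reply = client.recv(1024)
        return received, reply
    finally:
        for s in (client, server, listener):
            s.close()
```

The client receives the reply after it has already closed its own sending direction, which matches the statement that 'Close' does not mean "I will not receive any more".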
Error_Report event: This event informs the application of "soft errors" that can be safely ignored [RFC5461], including the arrival of an ICMP error message or excessive retransmissions (reaching a threshold below the threshold where the connection is aborted). See Section 4.2.4.1 of [RFC1122].

Type-of-Service: Section 4.2.4.2 of the requirements for Internet hosts [RFC1122] states that "The application layer MUST be able to specify the Type-of-Service (TOS) for segments that are sent on a connection." The application should be able to change the TOS during the connection lifetime, and the TOS value should be passed to the IP layer unchanged. Since then, the TOS field has been redefined. The Differentiated Services (Diffserv) model [RFC2475] [RFC3260] replaces this field in the IP header, assigning the six most significant bits to carry the Differentiated Services Code Point (DSCP) field [RFC2474].

Nagle: The Nagle algorithm delays sending data for some time to increase the likelihood of sending a full-sized segment (Section 4.2.3.4 of [RFC1122]). An application can disable the Nagle algorithm for an individual connection.

User Timeout Option: The User Timeout Option (UTO) [RFC5482] allows one end of a TCP connection to advertise its current user timeout value so that the other end of the TCP connection can adapt its own user timeout accordingly. In addition to the configurable value of the user timeout (see 'Send'), there are three per-connection state variables that an application can adjust to control the operation of the UTO: 'adv_uto' is the value of the UTO advertised to the remote TCP peer (default: system-wide default user timeout); 'enabled' (default false) is a boolean-type flag that controls whether the UTO option is enabled for a connection. This applies to both sending and receiving. 'changeable' is a boolean-type flag (default true) that controls whether the user timeout may be changed based on a UTO option received from the other end of the connection. 'changeable' becomes false when an application explicitly sets the user timeout (see 'Send').

Set/Get Authentication Parameters: The preferred outgoing MKT (current_key) and/or the preferred incoming MKT (rnext_key) of a connection can be configured. Information about current_key and rnext_key carried in a recently received segment can be retrieved (Section 7.1 of [RFC5925]).

3.1.1. Excluded Primitives or Parameters
The 'Open' primitive can be handed optional precedence or security/compartment information [RFC0793], but this was not included here because it is mostly irrelevant today [RFC7414].

The 'Status' primitive was not included because the initial TCP specification describes this primitive as "implementation dependent" and states that it "could be excluded without adverse effect" [RFC0793]. Moreover, while a data block containing specific information is described, it is also stated that not all of this information may always be available. While [RFC5925] states that 'Status' "SHOULD be augmented to allow the MKTs of a current or pending connection to be read (for confirmation)", the same information is also available via 'Receive', which, following [RFC5925], "MUST be augmented" with that functionality.

The 'Send' primitive includes an optional 'push' flag which, if set, requires data to be promptly transmitted to the receiver without delay [RFC0793]; the 'Receive' primitive can (under some conditions) yield the status of the 'push' flag. Because "push" functionality is optional to implement for both the 'Send' and 'Receive' primitives [RFC1122], this functionality is not included here.

The requirements for Internet hosts [RFC1122] also introduce keep-alives to TCP, but these are optional to implement and hence not considered here. The same document also describes that "some TCP implementations have included a FLUSH call", indicating that this call is also optional to implement; therefore, it is not considered here.

3.2. Primitives Provided by MPTCP
MPTCP is an extension to TCP that allows the use of multiple paths for a single data stream. It achieves this by creating different so-called TCP subflows for each of the interfaces and scheduling the traffic across these TCP subflows. The service provided by MPTCP is described as follows in [RFC6182]:

   Multipath TCP MUST follow the same service model as TCP [RFC0793]: in-order, reliable, and byte-oriented delivery. Furthermore, a Multipath TCP connection SHOULD provide the application with no worse throughput or resilience than it would expect from running a single TCP connection over any one of its available paths.

Further, there are some constraints on the API exposed by MPTCP, as stated in [RFC6182]:

   A multipath-capable equivalent of TCP MUST retain some level of backward compatibility with existing TCP APIs, so that existing applications can use the newer transport merely by upgrading the operating systems of the end hosts.

As such, the primitives provided by MPTCP are equivalent to the ones provided by TCP. Nevertheless, the MPTCP RFCs [RFC6824] and [RFC6897] clarify some parts of TCP's primitives with respect to MPTCP and add some extensions for better control on MPTCP's subflows. The following list gives the clarifications and extensions that the above-cited RFCs provide to TCP's primitives.

Open: "An application should be able to request to turn on or turn off the usage of MPTCP" [RFC6897]. This functionality can be provided through a socket option called 'tcp_multipath_enable'. Further, MPTCP must be disabled in case the application is binding to a specific address [RFC6897].

Send/Receive: The sending and receiving of data does not require any changes to the application when MPTCP is being used [RFC6824]. The MPTCP-layer will take one input data stream from an application, and split it into one or more subflows, with sufficient control information to allow it to be reassembled and delivered reliably and in order to the recipient application. The use of the Urgent Pointer is special in MPTCP [RFC6824], which states: "a TCP subflow MUST NOT use the Urgent Pointer to interrupt an existing mapping."

Address and Subflow Management: MPTCP uses different addresses and allows a host to announce these addresses as part of the protocol. The MPTCP API Considerations RFC [RFC6897] says "An application should be able to restrict MPTCP to binding to a given set of addresses" and thus allows applications to limit the set of addresses that are being used by MPTCP. Further, "An application should be able to obtain information on the pairs of addresses used by the MPTCP subflows."

3.3. Primitives Provided by SCTP
TCP has a number of limitations that SCTP removes (Section 1.1 of [RFC4960]). The following three removed limitations directly translate into transport features that are visible to an application using SCTP: 1) it allows for preservation of message delimiters; 2) it does not provide in-order or reliable delivery unless the application wants that; 3) multihoming is supported. In SCTP, connections are called "associations" and they can be between not only two (as in TCP) but multiple addresses at each endpoint.

Section 10 of the SCTP base protocol specification [RFC4960] specifies the interaction with the application (which SCTP calls the "Upper-Layer Protocol (ULP)"). It is assumed that the Operating System provides a means for SCTP to asynchronously signal the application; the primitives representing such signals are called 'events' in this section. Here, we describe the relevant primitives.

In addition to the abstract API described in Section 10 of [RFC4960], an extension to the sockets API is described in [RFC6458]. This covers the functionality of the base protocol [RFC4960] and some of its extensions [RFC3758] [RFC4895] [RFC5061]. For other protocol extensions [RFC6525] [RFC6951] [RFC7053] [RFC7496] [RFC7829] [RFC8260], the corresponding extensions of the sockets API are specified in these protocol specifications. The functionality exposed to the ULP through all these APIs is considered here.

The abstract API contains a 'SetProtocolParameters' primitive that allows elements of a parameter list [RFC4960] to be adjusted; it is stated that SCTP implementations "may allow ULP to customize some of these protocol parameters", indicating that none of the elements of this parameter list are mandatory to make ULP-configurable. Thus, we only consider the parameters in the abstract API that are also covered in one of the other RFCs listed above, which leads us to exclude the parameters 'RTO.Alpha', 'RTO.Beta', and 'HB.Max.Burst'. For clarity, we also replace 'SetProtocolParameters' itself with primitives that adjust parameters or groups of parameters that fit together.

Initialize: Initialize creates a local SCTP instance that it binds to a set of local addresses (and, if provided, a local port number) [RFC4960]. Initialize needs to be called only once per set of local addresses. A number of per-association initialization parameters can be used when an association is created, but before it is connected (via the primitive 'Associate' below): the maximum number of inbound streams the application is prepared to support, the maximum number of attempts to be made when sending the INIT (the first message of association establishment), and the maximum retransmission timeout (RTO) value to use when attempting an INIT [RFC6458]. At this point, before connecting, an application can also enable UDP encapsulation by configuring the remote UDP encapsulation port number [RFC6951].

Associate: This creates an association (the SCTP equivalent of a connection) that connects the local SCTP instance and a remote SCTP instance. To identify the remote endpoint, it can be given one or multiple (using "connectx") sockets (Section 9.9 of [RFC6458]). Most primitives are associated with a specific association, which is assumed to first have been created. Associate can return a list of destination transport addresses so that multiple paths can later be used. One of the returned sockets will be selected by the local endpoint as the default primary path for sending SCTP packets to this peer, but this choice can be changed by the application using the list of destination addresses. Associate is also given the number of outgoing streams to request and optionally returns the number of negotiated outgoing streams. An optional parameter of 32 bits, the adaptation layer indication, can be provided [RFC5061]. If authenticated chunks are used, the chunk types required to be sent authenticated by the peer can be provided [RFC4895]. An 'SCTP_Cant_Str_Assoc' notification is used to inform the application of a failure to create an association [RFC6458]. An application could use sendto() or sendmsg() to implicitly set up an association, thereby handing over a message that SCTP might send during the association setup phase [RFC6458]. Note that this mechanism is different from TCP's TFO mechanism: the message would arrive only once, after at least one RTT, as it is sent together with the third message exchanged during association setup, the COOKIE-ECHO chunk.

Send: This sends a message of a certain length in bytes over an association. A number can be provided to later refer to the correct message when reporting an error, and a stream id is provided to specify the stream to be used inside an association (we consider this as a mandatory parameter here for simplicity: if not provided, the stream id defaults to 0). A condition to abandon the message can be specified (for example, limiting the number of retransmissions or the lifetime of the user message). This allows control of the partial reliability extension [RFC3758] [RFC7496]. An optional maximum lifetime can specify the time after which the message should be discarded rather than sent. A choice (advisory, i.e., not guaranteed) of the preferred path can be made by providing a socket, and the message can be delivered out-of-order if the 'unordered' flag is set. An advisory flag indicates that the peer should not delay the acknowledgement of the user message provided [RFC7053]. Another advisory flag indicates whether the application prefers to avoid bundling user data with other outbound DATA chunks (i.e., in the same packet). A payload protocol-id can be provided to pass a value that indicates the type of payload protocol data to the peer. If authenticated chunks are used, the key identifier for authenticating DATA chunks can be provided [RFC4895].

Receive: Messages are received from an association, and optionally a stream within the association, with their size returned. The application is notified of the availability of data via a 'Data Arrive' notification. If the sender has included a payload protocol-id, this value is also returned. If the received message is only a partial delivery of a whole message, a 'partial' flag will indicate so, in which case the stream id and a stream sequence number are provided to the application.

Shutdown: This primitive gracefully closes an association, reliably delivering any data that has already been handed over to SCTP. A parameter lets the application control whether further receive or send operations or both are disabled when the call is issued. A return code informs about success or failure of this procedure.
Abort: This ungracefully closes an association, by discarding any locally queued data and informing the peer that the association was aborted. Optionally, an abort reason to be passed to the peer may be provided by the application. A return code informs about success or failure of this procedure.
Change Heartbeat / Request Heartbeat: This allows the application to enable/disable heartbeats and optionally specify a heartbeat frequency as well as requesting a single heartbeat to be carried out upon a function call, with a notification about success or failure of transmitting the HEARTBEAT chunk to the destination.
Configure Max. Retransmissions of an Association: The parameter 'Association.Max.Retrans' [RFC4960] (called "sasoc_maxrxt" in the SCTP sockets API extensions [RFC6458]) allows the configuration of the number of unsuccessful retransmissions after which an entire association is considered as failed; this should invoke a 'Communication Lost' notification.
Set Primary: This allows the ability to set a new primary default path for an association by providing a socket. Optionally, a default source address to be used in IP datagrams can be provided.
Change Local Address / Set Peer Primary: This allows an endpoint to add/remove local addresses to/from an association. In addition, the peer can be given a hint for which address to use as the primary address [RFC5061].
Configure Path Switchover: The abstract API contains a primitive called 'Set Failure Threshold' [RFC4960]. This configures the parameter 'Path.Max.Retrans', which determines after how many retransmissions a particular transport address is considered as unreachable. If there are more transport addresses available in an association, reaching this limit will invoke a path switchover. An extension called "SCTP-PF" adds a concept of "Potentially Failed (PF)" paths to this method [RFC7829]. When a path is in PF state, SCTP will not entirely give up sending on that path, but it will preferably send data on other active paths if such paths are available. Entering the PF state is done upon exceeding a configured maximum number of retransmissions. Thus, for all paths where this mechanism is used, there are two configurable error thresholds: one to decide that a path is in PF state, and one to decide that the transport address is unreachable.
Set/Get Authentication Parameters: This allows an endpoint to add/remove key material to/from an association. In addition, the chunk types being authenticated can be queried [RFC4895].
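The two error thresholds described under 'Configure Path Switchover' can be illustrated with a small sketch. It is not the state machine of [RFC7829]: the class name, the field names 'pf_threshold' and 'path_max_retrans', the strictly-greater comparison, and the reset-on-acknowledgement rule are assumptions made for this example only.

```python
from dataclasses import dataclass

ACTIVE = "active"
POTENTIALLY_FAILED = "potentially-failed"
UNREACHABLE = "unreachable"


@dataclass
class PathState:
    """Per-destination error counter with two thresholds (hypothetical names)."""
    pf_threshold: int        # errors tolerated before the path is marked PF
    path_max_retrans: int    # errors tolerated before the address is unreachable
    errors: int = 0

    @property
    def state(self) -> str:
        if self.errors > self.path_max_retrans:
            return UNREACHABLE
        if self.errors > self.pf_threshold:
            return POTENTIALLY_FAILED
        return ACTIVE

    def on_retransmission_timeout(self) -> str:
        self.errors += 1
        return self.state

    def on_ack(self) -> str:
        # Assumption: any acknowledged transmission clears the error counter.
        self.errors = 0
        return self.state


def pick_path(paths: dict) -> str:
    """Prefer active paths, fall back to PF paths, never use unreachable ones."""
    for wanted in (ACTIVE, POTENTIALLY_FAILED):
        for name, path in paths.items():
            if path.state == wanted:
                return name
    raise RuntimeError("no usable path")
```

With 'pf_threshold' set lower than 'path_max_retrans', a path is first deprioritized and only later declared unreachable, which is the behavior the two thresholds are meant to allow.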
Add/Reset Streams, Reset Association: This allows an endpoint to add streams to an existing association or to reset them individually. Additionally, the association can be reset [RFC6525].
Status: The 'Status' primitive returns a data block with information about a specified association, containing: an association connection state; a destination transport address list; destination transport address reachability states; current local and peer receiver window sizes; current local congestion window sizes; number of unacknowledged DATA chunks; number of DATA chunks pending receipt; a primary path; the most recent Smoothed Round-Trip Time (SRTT) on a primary path; RTO on a primary path; SRTT and RTO on other destination addresses [RFC4960]; and an MTU per path [RFC6458].
Enable/Disable Interleaving: This allows the negotiation of user message interleaving support for future associations to be enabled or disabled. For existing associations, it is possible to query whether user message interleaving support was negotiated or not on a particular association [RFC8260].
Set Stream Scheduler: This allows the ability to select a stream scheduler per association, with a choice of: First-Come, First-Served; Round-Robin; Round-Robin per Packet; Priority-Based; Fair Bandwidth; and Weighted Fair Queuing [RFC8260].
Configure Stream Scheduler: This allows the ability to change a parameter per stream for the schedulers: a priority value for the Priority-Based scheduler and a weight for the Weighted Fair Queuing scheduler.
Enable/Disable NoDelay: This turns on/off any Nagle-like algorithm for an association [RFC6458].
Configure Send Buffer Size: This controls the amount of data SCTP may have waiting in internal buffers to be sent or retransmitted [RFC6458].
Configure Receive Buffer Size: This sets the receive buffer size in octets, thereby controlling the receiver window for an association [RFC6458].
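To make the effect of 'Set Stream Scheduler' more concrete, the following sketch contrasts two of the listed schedulers, First-Come, First-Served and Round-Robin, at the granularity of whole messages. It is a simplified illustration rather than the behavior specified in [RFC8260]; the class and method names are hypothetical, and fragmentation, interleaving, priorities, and weights are deliberately left out.

```python
from collections import deque
from itertools import count


class StreamQueues:
    """Per-stream send queues; each message carries a global arrival number."""

    def __init__(self):
        self._arrival = count()
        self.queues = {}  # stream id -> deque of (arrival number, message)

    def enqueue(self, stream_id, message):
        entry = (next(self._arrival), message)
        self.queues.setdefault(stream_id, deque()).append(entry)

    def drain_fcfs(self):
        """First-Come, First-Served: always send the oldest queued message."""
        out = []
        while any(self.queues.values()):
            # Pick the stream whose head-of-line message arrived first.
            _, sid = min((q[0][0], s) for s, q in self.queues.items() if q)
            out.append(self.queues[sid].popleft()[1])
        return out

    def drain_round_robin(self):
        """Round-Robin: visit streams in id order, one message per visit."""
        out = []
        while any(self.queues.values()):
            for sid in sorted(self.queues):
                if self.queues[sid]:
                    out.append(self.queues[sid].popleft()[1])
        return out
```

With three messages queued on stream 0 and one each on streams 1 and 2, the First-Come, First-Served order follows arrival, whereas Round-Robin lets streams 1 and 2 send before the second message of stream 0.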
Configure Message Fragmentation: If a user message causes an SCTP packet to exceed the maximum fragmentation size (which can be provided by the application and is otherwise the Path MTU (PMTU) size), then the message will be fragmented by SCTP. Disabling message fragmentation will produce an error instead of fragmenting the message [RFC6458].
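The choice between fragmenting and reporting an error can be sketched as follows. This is a minimal illustration under stated assumptions: the function and exception names are hypothetical, the size limit is applied to the user data alone (packet and chunk header overhead is ignored), and the fragments are plain byte strings rather than DATA chunks.

```python
class FragmentationDisabledError(Exception):
    """Raised when a message is too large and fragmentation is turned off."""


def fragment(message: bytes, max_fragment_size: int,
             allow_fragmentation: bool = True) -> list:
    """Split a user message into pieces of at most max_fragment_size bytes."""
    if max_fragment_size <= 0:
        raise ValueError("max_fragment_size must be positive")
    if len(message) <= max_fragment_size:
        return [message]  # fits as is, nothing to do
    if not allow_fragmentation:
        # Mirrors the described behavior: an error instead of fragmentation.
        raise FragmentationDisabledError(
            f"{len(message)} bytes exceed the limit of {max_fragment_size}")
    return [message[i:i + max_fragment_size]
            for i in range(0, len(message), max_fragment_size)]
```

An application that relies on message boundaries fitting into one packet would disable fragmentation and handle the error; one that sends large messages would leave it enabled and, if desired, lower the maximum fragment size.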
Configure Path MTU Discovery: Path MTU Discovery (PMTUD) can be enabled or disabled per peer address of an association (Section 8.1.12 of [RFC6458]). When it is enabled, the current Path MTU value can be obtained. When it is disabled, the Path MTU to be used can be controlled by the application.
Configure Delayed SACK Timer: The time before sending a SACK can be adjusted; delaying SACKs can be disabled; and the number of packets that must be received before a SACK is sent without waiting for the delay timer to expire can be configured [RFC6458].
Set Cookie Life Value: The cookie life value can be adjusted (Section 8.1.2 of [RFC6458]). 'Valid.Cookie.Life' is also one of the parameters that is potentially adjustable with 'SetProtocolParameters' [RFC4960].
Set Maximum Burst: The maximum burst of packets that can be emitted by a particular association (default 4, and values above 4 are optional to implement) can be adjusted (Section 8.1.24 of [RFC6458]). 'Max.Burst' is also one of the parameters that is potentially adjustable with 'SetProtocolParameters' [RFC4960].
Configure RTO Calculation: The abstract API contains the following adjustable parameters: 'RTO.Initial'; 'RTO.Min'; 'RTO.Max'; 'RTO.Alpha'; and 'RTO.Beta'. Only the initial, minimum and maximum RTOs are also described as configurable in the SCTP sockets API extensions [RFC6458].
Set DSCP Value: The DSCP value can be set per peer address of an association (Section 8.1.12 of [RFC6458]).
Set IPv6 Flow Label: The flow label field can be set per peer address of an association (Section 8.1.12 of [RFC6458]).
Set Partial Delivery Point: This allows the ability to specify the size of a message where partial delivery will be invoked. Setting this to a lower value will cause partial deliveries to happen more often [RFC6458].
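How the five parameters named under 'Configure RTO Calculation' interact can be shown with a short sketch of the retransmission timeout rules in [RFC4960]. The class and attribute names are hypothetical, times are in seconds, and the defaults are the values suggested in [RFC4960]; clock granularity and the back-off applied on timer expiry are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    rto_initial: float = 3.0   # RTO.Initial
    rto_min: float = 1.0       # RTO.Min
    rto_max: float = 60.0      # RTO.Max
    rto_alpha: float = 1 / 8   # RTO.Alpha
    rto_beta: float = 1 / 4    # RTO.Beta
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    @property
    def rto(self) -> float:
        if self.srtt is None:
            return self.rto_initial  # no RTT measurement yet
        raw = self.srtt + 4 * self.rttvar
        return min(self.rto_max, max(self.rto_min, raw))

    def on_rtt_sample(self, r: float) -> float:
        """Fold a new RTT measurement r into SRTT and RTTVAR, return the RTO."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = ((1 - self.rto_beta) * self.rttvar
                           + self.rto_beta * abs(self.srtt - r))
            self.srtt = (1 - self.rto_alpha) * self.srtt + self.rto_alpha * r
        return self.rto
```

The sketch makes visible why 'RTO.Min' matters on short paths: with a measured RTT of 200 ms, the computed value of 600 ms is raised to the configured minimum unless the application lowers it.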
Communication Up Notification: When a lost communication to an endpoint is restored or when SCTP becomes ready to send or receive user messages, this notification informs the application process about the affected association, the type of event that has occurred, the complete set of sockets of the peer, the maximum number of allowed streams, and the inbound stream count (the number of streams the peer endpoint has requested). If interleaving is supported by both endpoints, this information is also included in this notification.
Restart Notification: When SCTP has detected that the peer has restarted, this notification is passed to the upper layer [RFC6458].
Data Arrive Notification: When a message is ready to be retrieved via the 'Receive' primitive, the application is informed by this notification.
Send Failure Notification / Receive Unsent Message / Receive Unacknowledged Message: When a message cannot be delivered via an association, the sender can be informed about it and learn whether the message has just not been acknowledged or (e.g., in case of lifetime expiry) if it has not even been sent. This can also inform the sender that a part of the message has been successfully delivered.
Network Status Change Notification: This informs the application about a socket becoming active/inactive [RFC4960] or "Potentially Failed" [RFC7829].
Communication Lost Notification: When SCTP loses communication to an endpoint (e.g., via heartbeats or excessive retransmission) or detects an abort, this notification informs the application process of the affected association and the type of event (failure OR termination in response to a shutdown or abort request).
Shutdown Complete Notification: When SCTP completes the shutdown procedures, this notification is passed to the upper layer, informing it about the affected association.
Authentication Notification: When SCTP wants to notify the upper layer regarding the key management related to authenticated chunks [RFC4895], this notification is passed to the upper layer.
Adaptation Layer Indication Notification: When SCTP completes the association setup and the peer provided an adaptation layer indication, this is passed to the upper layer [RFC5061] [RFC6458].
Stream Reset Notification: When SCTP completes the procedure for resetting streams [RFC6525], this notification is passed to the upper layer, informing it about the result.
Association Reset Notification: When SCTP completes the association reset procedure [RFC6525], this notification is passed to the upper layer, informing it about the result.
Stream Change Notification: When SCTP completes the procedure used to increase the number of streams [RFC6525], this notification is passed to the upper layer, informing it about the result.
Sender Dry Notification: When SCTP has no more user data to send or retransmit on a particular association, this notification is passed to the upper layer [RFC6458].
Partial Delivery Aborted Notification: When a receiver has begun to receive parts of a user message but the delivery of this message is then aborted, this notification is passed to the upper layer (Section 6.1.7 of [RFC6458]).
3.3.1. Excluded Primitives or Parameters
The 'Receive' primitive can return certain additional information, but this is optional to implement and therefore not considered. With a 'Communication Lost' notification, some more information may optionally be passed to the application (e.g., identification to retrieve unsent and unacknowledged data). SCTP "can invoke" a 'Communication Error' notification and "may send" a 'Restart' notification, making these two notifications optional to implement. The list provided under 'Status' includes "etc.", indicating that more information could be provided. The primitive 'Get SRTT Report' returns information that is included in the information that 'Status' provides and is therefore not discussed. The 'Destroy SCTP Instance' API function was excluded: it erases the SCTP instance that was created by 'Initialize' but is not a primitive as defined in this document because it does not relate to a transport feature. The 'Shutdown' event informs an application that the peer has sent a SHUTDOWN, and hence no further data should be sent on this socket (Section 6.1 of [RFC6458]). However, if an application would try to send data on the socket, it would get an error message anyway; thus, this event is classified as "just affecting the application programming style, not how the underlying protocol operates" and is not included here.
3.4. Primitives Provided by UDP and UDP-Lite
The set of pass 1 primitives for UDP and UDP-Lite is documented in [RFC8304].
3.5. The Service of LEDBAT
The service of the LEDBAT congestion control mechanism is described as follows:
   LEDBAT is designed for use by background bulk-transfer applications to be no more aggressive than standard TCP congestion control (as specified in RFC 5681) and to yield in the presence of competing flows, thus limiting interference with the network performance of competing flows [RFC6817].
LEDBAT does not have any primitives, as LEDBAT is not a transport protocol. According to its specification [RFC6817]:
   LEDBAT can be used as part of a transport protocol or as part of an application, as long as the data transmission mechanisms are capable of carrying timestamps and acknowledging data frequently. LEDBAT can be used with TCP, Stream Control Transmission Protocol (SCTP), and Datagram Congestion Control Protocol (DCCP), with appropriate extensions where necessary; and it can be used with proprietary application protocols, such as those built on top of UDP for peer-to-peer (P2P) applications.
At the time of writing, the appropriate extensions for TCP, SCTP, or DCCP do not exist.
A number of configurable parameters exist in the LEDBAT specification: TARGET, which is the queuing delay target at which LEDBAT tries to operate, must be set to 100 ms or less. 'allowed_increase' (should be 1, must be greater than 0) limits the speed at which LEDBAT increases its rate. 'gain', which according to [RFC6817] "MUST be set to 1 or less" to avoid a faster ramp-up than TCP Reno, determines how quickly the sender responds to changes in queueing delay. Implementations may divide 'gain' into two parameters: one for increase and a possibly larger one for decrease. We call these parameters 'Gain_Inc' and 'Gain_Dec' here. 'Base_History' is the size of the list of measured base delays, and, according to [RFC6817], "SHOULD be 10". This list can be filtered using a 'Filter' function, which is not prescribed [RFC6817], that yields a list of size 'Current_Filter'. The initial and minimum congestion windows, 'Init_CWND' and 'Min_CWND', should both be 2.
Regarding which of these parameters should be under control of an application, the possible range goes from exposing nothing on the one hand to considering everything that is not prescribed with a "MUST" in the specification as a parameter on the other hand. Function implementations are not provided as a parameter to any of the transport protocols discussed here; hence, we do not regard the 'Filter' function as a parameter. However, to avoid unnecessarily limiting future implementations, we consider all other parameters above as tunable parameters that should be exposed.
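The role of the tunable parameters can be illustrated with a sketch of the congestion window update, following the pseudocode in [RFC6817]. Several things here are assumptions for the sake of the example: the class and attribute names are hypothetical; the queuing delay passed in is taken to be already computed from the filtered current delay and the base delay (so 'Base_History' and 'Current_Filter' do not appear); the "1 or less" bound is applied to 'Gain_Inc' only; and loss handling and slow start are omitted.

```python
from dataclasses import dataclass


@dataclass
class LedbatParams:
    target: float = 0.100           # TARGET in seconds, 100 ms or less
    allowed_increase: float = 1.0   # should be 1, must be greater than 0
    gain_inc: float = 1.0           # Gain_Inc, 1 or less
    gain_dec: float = 1.0           # Gain_Dec, may be larger than Gain_Inc
    init_cwnd: int = 2              # Init_CWND, in segments
    min_cwnd: int = 2               # Min_CWND, in segments

    def validate(self) -> None:
        if not 0 < self.target <= 0.100:
            raise ValueError("TARGET must be 100 ms or less")
        if self.allowed_increase <= 0:
            raise ValueError("allowed_increase must be greater than 0")
        if not 0 < self.gain_inc <= 1:
            raise ValueError("Gain_Inc must be 1 or less")


class LedbatWindow:
    def __init__(self, params: LedbatParams, mss: int):
        params.validate()
        self.p = params
        self.mss = mss
        self.cwnd = float(params.init_cwnd * mss)  # bytes

    def on_ack(self, bytes_newly_acked: int, queuing_delay: float,
               flightsize: int) -> float:
        """Update cwnd for one acknowledgement; delays are in seconds."""
        p = self.p
        off_target = (p.target - queuing_delay) / p.target
        gain = p.gain_inc if off_target >= 0 else p.gain_dec
        self.cwnd += gain * off_target * bytes_newly_acked * self.mss / self.cwnd
        # allowed_increase caps how far cwnd may run ahead of the data in flight.
        self.cwnd = min(self.cwnd, flightsize + p.allowed_increase * self.mss)
        self.cwnd = max(self.cwnd, p.min_cwnd * self.mss)
        return self.cwnd
```

Exposing these values to an application, as suggested above, would correspond to letting it construct something like 'LedbatParams' itself, while the update rule and the 'Filter' function remain part of the implementation.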