Internet Engineering Task Force (IETF)                        R. Stewart
Request for Comments: 6458                                Adara Networks
Category: Informational                                        M. Tuexen
ISSN: 2070-1721                         Muenster Univ. of Appl. Sciences
                                                                 K. Poon
                                                      Oracle Corporation
                                                                  P. Lei
                                                     Cisco Systems, Inc.
                                                             V. Yasevich
                                                                      HP
                                                           December 2011

Sockets API Extensions for the Stream Control Transmission Protocol (SCTP)

Abstract
This document describes a mapping of the Stream Control Transmission
Protocol (SCTP) into a sockets API. The benefits of this mapping
include compatibility for TCP applications, access to new SCTP
features, and a consolidated error and event notification scheme.

Status of This Memo

This document is not an Internet Standards Track specification; it is
published for informational purposes.

This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet
Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6458.
Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Table of Contents
1. Introduction
2. Data Types
3. One-to-Many Style Interface
   3.1. Basic Operation
      3.1.1. socket()
      3.1.2. bind()
      3.1.3. listen()
      3.1.4. sendmsg() and recvmsg()
      3.1.5. close()
      3.1.6. connect()
   3.2. Non-Blocking Mode
   3.3. Special Considerations
4. One-to-One Style Interface
   4.1. Basic Operation
      4.1.1. socket()
      4.1.2. bind()
      4.1.3. listen()
      4.1.4. accept()
      4.1.5. connect()
      4.1.6. close()
      4.1.7. shutdown()
      4.1.8. sendmsg() and recvmsg()
      4.1.9. getpeername()
5. Data Structures
   5.1. The msghdr and cmsghdr Structures
   5.2. Ancillary Data Considerations and Semantics
      5.2.1. Multiple Items and Ordering
      5.2.2. Accessing and Manipulating Ancillary Data
      5.2.3. Control Message Buffer Sizing
   5.3. SCTP msg_control Structures
      5.3.1. SCTP Initiation Structure (SCTP_INIT)
      5.3.2. SCTP Header Information Structure (SCTP_SNDRCV) - DEPRECATED
      5.3.3. Extended SCTP Header Information Structure (SCTP_EXTRCV) - DEPRECATED
      5.3.4. SCTP Send Information Structure (SCTP_SNDINFO)
      5.3.5. SCTP Receive Information Structure (SCTP_RCVINFO)
      5.3.6. SCTP Next Receive Information Structure (SCTP_NXTINFO)
      5.3.7. SCTP PR-SCTP Information Structure (SCTP_PRINFO)
      5.3.8. SCTP AUTH Information Structure (SCTP_AUTHINFO)
      5.3.9. SCTP Destination IPv4 Address Structure (SCTP_DSTADDRV4)
      5.3.10. SCTP Destination IPv6 Address Structure (SCTP_DSTADDRV6)
6. SCTP Events and Notifications
   6.1. SCTP Notification Structure
      6.1.1. SCTP_ASSOC_CHANGE
      6.1.2. SCTP_PEER_ADDR_CHANGE
      6.1.3. SCTP_REMOTE_ERROR
      6.1.4. SCTP_SEND_FAILED - DEPRECATED
      6.1.5. SCTP_SHUTDOWN_EVENT
      6.1.6. SCTP_ADAPTATION_INDICATION
      6.1.7. SCTP_PARTIAL_DELIVERY_EVENT
      6.1.8. SCTP_AUTHENTICATION_EVENT
      6.1.9. SCTP_SENDER_DRY_EVENT
      6.1.10. SCTP_NOTIFICATIONS_STOPPED_EVENT
      6.1.11. SCTP_SEND_FAILED_EVENT
   6.2. Notification Interest Options
      6.2.1. SCTP_EVENTS Option - DEPRECATED
      6.2.2. SCTP_EVENT Option
7. Common Operations for Both Styles
   7.1. send(), recv(), sendto(), and recvfrom()
   7.2. setsockopt() and getsockopt()
   7.3. read() and write()
   7.4. getsockname()
   7.5. Implicit Association Setup
8. Socket Options
   8.1. Read/Write Options
      8.1.1. Retransmission Timeout Parameters (SCTP_RTOINFO)
      8.1.2. Association Parameters (SCTP_ASSOCINFO)
      8.1.3. Initialization Parameters (SCTP_INITMSG)
      8.1.4. SO_LINGER
      8.1.5. SCTP_NODELAY
      8.1.6. SO_RCVBUF
      8.1.7. SO_SNDBUF
      8.1.8. Automatic Close of Associations (SCTP_AUTOCLOSE)
      8.1.9. Set Primary Address (SCTP_PRIMARY_ADDR)
      8.1.10. Set Adaptation Layer Indicator (SCTP_ADAPTATION_LAYER)
      8.1.11. Enable/Disable Message Fragmentation (SCTP_DISABLE_FRAGMENTS)
      8.1.12. Peer Address Parameters (SCTP_PEER_ADDR_PARAMS)
      8.1.13. Set Default Send Parameters (SCTP_DEFAULT_SEND_PARAM) - DEPRECATED
      8.1.14. Set Notification and Ancillary Events (SCTP_EVENTS) - DEPRECATED
      8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      8.1.16. Get or Set the Maximum Fragmentation Size (SCTP_MAXSEG)
      8.1.17. Get or Set the List of Supported HMAC Identifiers (SCTP_HMAC_IDENT)
      8.1.18. Get or Set the Active Shared Key (SCTP_AUTH_ACTIVE_KEY)
      8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
      8.1.20. Get or Set Fragmented Interleave (SCTP_FRAGMENT_INTERLEAVE)
      8.1.21. Set or Get the SCTP Partial Delivery Point (SCTP_PARTIAL_DELIVERY_POINT)
      8.1.22. Set or Get the Use of Extended Receive Info (SCTP_USE_EXT_RCVINFO) - DEPRECATED
      8.1.23. Set or Get the Auto ASCONF Flag (SCTP_AUTO_ASCONF)
      8.1.24. Set or Get the Maximum Burst (SCTP_MAX_BURST)
      8.1.25. Set or Get the Default Context (SCTP_CONTEXT)
      8.1.26. Enable or Disable Explicit EOR Marking (SCTP_EXPLICIT_EOR)
      8.1.27. Enable SCTP Port Reusage (SCTP_REUSE_PORT)
      8.1.28. Set Notification Event (SCTP_EVENT)
      8.1.29. Enable or Disable the Delivery of SCTP_RCVINFO as Ancillary Data (SCTP_RECVRCVINFO)
      8.1.30. Enable or Disable the Delivery of SCTP_NXTINFO as Ancillary Data (SCTP_RECVNXTINFO)
      8.1.31. Set Default Send Parameters (SCTP_DEFAULT_SNDINFO)
      8.1.32. Set Default PR-SCTP Parameters (SCTP_DEFAULT_PRINFO)
   8.2. Read-Only Options
      8.2.1. Association Status (SCTP_STATUS)
      8.2.2. Peer Address Information (SCTP_GET_PEER_ADDR_INFO)
      8.2.3. Get the List of Chunks the Peer Requires to Be Authenticated (SCTP_PEER_AUTH_CHUNKS)
      8.2.4. Get the List of Chunks the Local Endpoint Requires to Be Authenticated (SCTP_LOCAL_AUTH_CHUNKS)
      8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)
      8.2.6. Get the Current Identifiers of Associations (SCTP_GET_ASSOC_ID_LIST)
   8.3. Write-Only Options
      8.3.1. Set Peer Primary Address (SCTP_SET_PEER_PRIMARY_ADDR)
      8.3.2. Add a Chunk That Must Be Authenticated (SCTP_AUTH_CHUNK)
      8.3.3. Set a Shared Key (SCTP_AUTH_KEY)
      8.3.4. Deactivate a Shared Key (SCTP_AUTH_DEACTIVATE_KEY)
      8.3.5. Delete a Shared Key (SCTP_AUTH_DELETE_KEY)
9. New Functions
   9.1. sctp_bindx()
   9.2. sctp_peeloff()
   9.3. sctp_getpaddrs()
   9.4. sctp_freepaddrs()
   9.5. sctp_getladdrs()
   9.6. sctp_freeladdrs()
   9.7. sctp_sendmsg() - DEPRECATED
   9.8. sctp_recvmsg() - DEPRECATED
   9.9. sctp_connectx()
   9.10. sctp_send() - DEPRECATED
   9.11. sctp_sendx() - DEPRECATED
   9.12. sctp_sendv()
   9.13. sctp_recvv()
10. Security Considerations
11. Acknowledgments
12. References
   12.1. Normative References
   12.2. Informative References
Appendix A. Example Using One-to-One Style Sockets
Appendix B. Example Using One-to-Many Style Sockets

1. Introduction
The sockets API has provided a standard mapping of the Internet
Protocol suite to many operating systems. Both TCP [RFC0793] and UDP
[RFC0768] have benefited from this standard representation and access
method across many diverse platforms. SCTP is a new protocol that
provides many of the characteristics of TCP but also incorporates
semantics more akin to UDP. This document defines a method to map
the existing sockets API for use with SCTP, providing both a base for
access to new features and compatibility so that most existing TCP
applications can be migrated to SCTP with few (if any) changes.

There are three basic design objectives:

1. Maintain consistency with existing sockets APIs: We define a
   sockets mapping for SCTP that is consistent with other sockets API
   protocol mappings (for instance UDP, TCP, IPv4, and IPv6).

2. Support a one-to-many style interface: This set of semantics is
   similar to that defined for connectionless protocols, such as UDP.
   A one-to-many style SCTP socket should be able to control multiple
   SCTP associations. This is similar to a UDP socket, which can
   communicate with many peer endpoints. Each of these associations
   is assigned an association identifier so that an application can
   use the ID to differentiate them. Note that SCTP is
   connection-oriented in nature, and it does not support broadcast
   or multicast communications, as UDP does.

3. Support a one-to-one style interface: This interface supports
   semantics similar to those of sockets for connection-oriented
   protocols, such as TCP. A one-to-one style SCTP socket should
   only control one SCTP association. One purpose of defining this
   interface is to allow existing applications built on other
   connection-oriented protocols to be ported to use SCTP with very
   little effort. Developers familiar with these semantics can
   easily adapt to SCTP. Another purpose is to ensure that existing
   mechanisms in most operating systems that support sockets, such as
   select(), continue to work with this style of socket.

Extensions are added to this mapping to provide mechanisms to exploit
new features of SCTP.

Goals 2 and 3 are not compatible, so this document defines two modes
of mapping, namely the one-to-many style mapping and the one-to-one
style mapping. These two modes share some common data structures and
operations, but will require the use of two different application
programming styles. Note that all new SCTP features can be used with
both styles of socket. The decision on which one to use depends
mainly on the nature of the applications.

A mechanism is defined to extract an SCTP association from a
one-to-many style socket into a one-to-one style socket.

Some of the SCTP mechanisms cannot be adequately mapped to an
existing socket interface. In some cases, it is more desirable to
have a new interface instead of using existing socket calls.
Section 9 of this document describes these new interfaces.

Please note that some elements of the SCTP sockets API are declared
as deprecated. During the evolution of this document, elements of
the API were introduced, implemented, and later on replaced by other
elements. These replaced elements are declared as deprecated, since
they are still available in some implementations and the replacement
functions are not. This applies especially to older versions of
operating systems supporting SCTP. New SCTP socket implementations
must implement at least the non-deprecated elements. Implementations
intending interoperability with older versions of the API should also
include the deprecated functions.
2. Data Types
Whenever possible, Portable Operating System Interface (POSIX) data
types defined in [IEEE-1003.1-2008] are used: uintN_t means an
unsigned integer of exactly N bits (e.g., uint16_t). This document
also assumes the argument data types from POSIX when possible (e.g.,
the final argument to setsockopt() is a socklen_t value). Whenever
buffer sizes are specified, the POSIX size_t data type is used.

3. One-to-Many Style Interface

In the one-to-many style interface, there is a one-to-many
relationship between sockets and associations.

3.1. Basic Operation
A typical server in this style uses the following socket calls in
sequence to prepare an endpoint for servicing requests:

o  socket()
o  bind()
o  listen()
o  recvmsg()
o  sendmsg()
o  close()

A typical client uses the following calls in sequence to set up an
association with a server to request services:

o  socket()
o  sendmsg()
o  recvmsg()
o  close()

In this style, by default, all of the associations connected to the
endpoint are represented with a single socket. Each association is
assigned an association identifier (the type is sctp_assoc_t) so that
an application can use it to differentiate among them. In some
implementations, the peer endpoints' addresses can also be used for
this purpose. However, for performance reasons, implementations are
not required to support this. If an implementation does not support
using addresses to differentiate between different associations, the
sendto() call can only be used to set up an association implicitly.
It cannot be used to send data to an established association, as the
association identifier cannot be specified.

Once an association identifier is assigned to an SCTP association,
that identifier will not be reused until the application explicitly
terminates the use of the association. The resources belonging to
that association will not be freed until that happens. This is
similar to the close() operation on a normal socket. The only
exception is when the SCTP_AUTOCLOSE option (Section 8.1.8) is set.
In this case, after the association is terminated gracefully and
automatically, the association identifier assigned to it can be
reused. All applications using this option should be aware of this
to avoid the possible problem of sending data to an incorrect peer
endpoint.

If the server or client wishes to branch an existing association off
to a separate socket, it is required to call sctp_peeloff() and to
specify the association identifier. The sctp_peeloff() call will
return a new one-to-one style socket that can then be used with
recv() and send() functions for message passing. See Section 9.2 for
more on branched-off associations.

Once an association is branched off to a separate socket, it becomes
completely separated from the original socket. All subsequent
control and data operations to that association must be done through
the new socket. For example, the close() operation on the original
socket will not terminate any associations that have been branched
off to a different socket.

One-to-many style socket calls are discussed in more detail in the
following subsections.

3.1.1. socket()
Applications use socket() to create a socket descriptor to represent
an SCTP endpoint.

The function prototype is

   int socket(int domain,
              int type,
              int protocol);

and one uses PF_INET or PF_INET6 as the domain, SOCK_SEQPACKET as the
type, and IPPROTO_SCTP as the protocol.
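As a minimal sketch of the call described above, the helper below
creates a one-to-many style socket. The name open_one_to_many() is
hypothetical and not part of this API, and the sketch assumes a
platform whose <netinet/in.h> defines IPPROTO_SCTP. On a system
without SCTP support, socket() is expected to fail, so the error path
reports errno instead of assuming success.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>   /* IPPROTO_SCTP */
#include <sys/socket.h>   /* socket(), PF_INET, PF_INET6, SOCK_SEQPACKET */

/* Hypothetical helper: create a one-to-many style SCTP socket.
 * domain must be PF_INET or PF_INET6.
 * Returns the socket descriptor, or -1 with errno set. */
int open_one_to_many(int domain)
{
    int sd;

    assert(domain == PF_INET || domain == PF_INET6);

    sd = socket(domain, SOCK_SEQPACKET, IPPROTO_SCTP);
    if (sd == -1) {
        int saved = errno;   /* fprintf() may overwrite errno */
        fprintf(stderr, "socket(SOCK_SEQPACKET, IPPROTO_SCTP): %s\n",
                strerror(saved));
        errno = saved;
    }
    return sd;
}
```

A caller would typically pass PF_INET6 when the endpoint should be
able to use both IPv6 and IPv4 addresses.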
The SOCK_SEQPACKET type indicates the creation of a one-to-many style
socket.

The function returns a socket descriptor, or -1 in case of an error.

Using the PF_INET domain indicates the creation of an endpoint that
can use only IPv4 addresses, while PF_INET6 creates an endpoint that
can use both IPv6 and IPv4 addresses.

3.1.2. bind()
Applications use bind() to specify with which local address and port
the SCTP endpoint should associate itself.

An SCTP endpoint can be associated with multiple addresses. For this
purpose, sctp_bindx() (see Section 9.1) lets applications associate
multiple addresses with an endpoint. But note that an endpoint can
only be associated with one local port.

The addresses associated with a socket are the eligible transport
addresses for the endpoint to send and receive data. The endpoint
will also present these addresses to its peers during the association
initialization process; see [RFC4960].

After calling bind(), if the endpoint wishes to accept new
associations on the socket, it must call listen() (see
Section 3.1.3).

The function prototype of bind() is

   int bind(int sd,
            struct sockaddr *addr,
            socklen_t addrlen);

and the arguments are

sd:  The socket descriptor returned by socket().

addr:  The address structure (struct sockaddr_in for an IPv4 address
   or struct sockaddr_in6 for an IPv6 address; see [RFC3493]).

addrlen:  The size of the address structure.

bind() returns 0 on success and -1 in case of an error.

If sd is an IPv4 socket, the address passed must be an IPv4 address.
If sd is an IPv6 socket, the address passed can be either an IPv4 or
an IPv6 address.
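The bind() and listen() steps above can be sketched as follows for an
IPv4 socket. Both helper names are hypothetical and not part of this
API. The sketch binds to the IPv4 wildcard address INADDR_ANY and
uses a backlog of 1 only because any non-zero backlog enables
listening on a one-to-many style socket.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <sys/socket.h>   /* bind(), listen() */

/* Hypothetical helper: fill in the IPv4 wildcard address for a port.
 * A port of 0 asks the system to choose an ephemeral port. */
void make_wildcard_v4(struct sockaddr_in *addr, uint16_t port)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_ANY);
}

/* Hypothetical helper: bind sd to the wildcard address and, when
 * accept_new is non-zero, allow new associations via listen().
 * Returns 0 on success, or -1 with errno set. */
int bind_and_listen_v4(int sd, uint16_t port, int accept_new)
{
    struct sockaddr_in addr;

    make_wildcard_v4(&addr, port);
    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        return -1;
    if (accept_new && listen(sd, 1) == -1)
        return -1;
    return 0;
}
```

A client that only initiates associations would pass 0 for
accept_new, so that listen() is never called on its socket.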
Applications cannot call bind() multiple times to associate multiple
addresses to an endpoint. After the first call to bind(), all
subsequent calls will return an error.

If the IP address part of addr is specified as a wildcard (INADDR_ANY
for an IPv4 address, or as IN6ADDR_ANY_INIT or in6addr_any for an
IPv6 address), the operating system will associate the endpoint with
an optimal address set of the available interfaces. If the IPv4
sin_port or IPv6 sin6_port is set to 0, the operating system will
choose an ephemeral port for the endpoint.

If bind() is not called prior to a sendmsg() call that initiates a
new association, the system picks an ephemeral port and will choose
an address set equivalent to binding with a wildcard address. One of
those addresses will be the primary address for the association.
This automatically enables the multi-homing capability of SCTP.

The completion of this bind() process does not allow the SCTP
endpoint to accept inbound SCTP association requests. Until a
listen() system call, described below, is performed on the socket,
the SCTP endpoint will promptly reject an inbound SCTP INIT request
with an SCTP ABORT.

3.1.3. listen()
By default, a one-to-many style socket does not accept new
association requests. An application uses listen() to mark a socket
as being able to accept new associations.

The function prototype is

   int listen(int sd,
              int backlog);

and the arguments are

sd:  The socket descriptor of the endpoint.

backlog:  If backlog is non-zero, enable listening, else disable
   listening.

listen() returns 0 on success and -1 in case of an error.

Note that one-to-many style socket consumers do not need to call
accept() to retrieve new associations. Calling accept() on a
one-to-many style socket should return EOPNOTSUPP. Rather, new
associations are accepted automatically, and notifications of the new
associations are delivered via recvmsg() with the SCTP_ASSOC_CHANGE
event (if these notifications are enabled). Clients will typically
not call listen(), so that they can be assured that only actively
initiated associations are possible on the socket. Server or
peer-to-peer sockets, on the other hand, will always accept new
associations, so a well-written application using server one-to-many
style sockets must be prepared to handle new associations from
unwanted peers.

Also note that the SCTP_ASSOC_CHANGE event provides the association
identifier for a new association, so if applications wish to use the
association identifier as a parameter to other socket calls, they
should ensure that the SCTP_ASSOC_CHANGE event is enabled.

3.1.4. sendmsg() and recvmsg()
An application uses the sendmsg() and recvmsg() calls to transmit
data to and receive data from its peer.

The function prototypes are

   ssize_t sendmsg(int sd,
                   const struct msghdr *message,
                   int flags);

and

   ssize_t recvmsg(int sd,
                   struct msghdr *message,
                   int flags);

using the following arguments:

sd:  The socket descriptor of the endpoint.

message:  Pointer to the msghdr structure that contains a single user
   message and possibly some ancillary data. See Section 5 for a
   complete description of the data structures.

flags:  No new flags are defined for SCTP at this level. See
   Section 5 for SCTP-specific flags used in the msghdr structure.

sendmsg() returns the number of bytes accepted by the kernel or -1 in
case of an error. recvmsg() returns the number of bytes received or
-1 in case of an error.
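As a rough illustration of the sendmsg() prototype above, the sketch
below sends one user message with no ancillary data, so the socket's
default send parameters apply. The helper name send_user_message()
is hypothetical. Only POSIX headers are used, so the sketch does not
show any SCTP-specific ancillary data such as SCTP_SNDINFO.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>   /* sendmsg(), struct msghdr */
#include <sys/types.h>    /* ssize_t */
#include <sys/uio.h>      /* struct iovec */

/* Hypothetical helper: send a single user message to the peer
 * transport address `to` (which may be NULL on a connected socket).
 * Returns the number of bytes accepted, or -1 with errno set. */
ssize_t send_user_message(int sd, const struct sockaddr *to,
                          socklen_t tolen, const void *data, size_t len)
{
    struct iovec iov;
    struct msghdr msg;

    assert(data != NULL || len == 0);

    iov.iov_base = (void *)data;   /* sendmsg() does not modify it */
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)to;     /* address of the intended receiver */
    msg.msg_namelen = tolen;
    msg.msg_iov = &iov;            /* exactly one user message */
    msg.msg_iovlen = 1;

    return sendmsg(sd, &msg, 0);
}
```

On a one-to-many style socket, passing the address of a peer for
which no association exists yet would be expected to trigger the
implicit association setup described in Section 7.5.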
As described in Section 5, different types of ancillary data can be
sent and received along with user data. When sending, the ancillary
data is used to specify the send behavior, such as the SCTP stream
number to use. When receiving, the ancillary data is used to
describe the received data, such as the SCTP stream sequence number
of the message.

When sending user data with sendmsg(), the caller sets the msg_name
field in the msghdr structure to one of the transport addresses of
the intended receiver. If there is no existing association between
the sender and the intended receiver, the sender's SCTP stack will
set up a new association and then send the user data (see Section 7.5
for more on implicit association setup). If sendmsg() is called with
no data and there is no existing association, a new one will be
established. The SCTP_INIT type ancillary data can be used to change
some of the parameters used to set up a new association. If
sendmsg() is called with NULL data, and there is no existing
association but the SCTP_ABORT or SCTP_EOF flags are set as described
in Section 5.3.4, then -1 is returned and errno is set to EINVAL.

Sending a message using sendmsg() is atomic unless explicit end of
record (EOR) marking is enabled on the socket specified by sd (see
Section 8.1.26).

If a peer sends a SHUTDOWN, an SCTP_SHUTDOWN_EVENT notification will
be delivered if that notification has been enabled, and no more data
can be sent to that association. Any attempt to send more data will
cause sendmsg() to return with an ESHUTDOWN error. Note that the
socket is still open for reading at this point, so it is possible to
retrieve notifications.

When receiving a user message with recvmsg(), the msg_name field in
the msghdr structure will be populated with the source transport
address of the user data. The caller of recvmsg() can use this
address information to determine to which association the received
user message belongs. Note that if SCTP_ASSOC_CHANGE events are
disabled, applications must use the peer transport address provided
in the msg_name field by recvmsg() to perform correlation to an
association, since they will not have the association identifier.

If all data in a single message has been delivered, MSG_EOR will be
set in the msg_flags field of the msghdr structure (see Section 5.1).

If the application does not provide enough buffer space to completely
receive a data message, MSG_EOR will not be set in msg_flags.
Successive reads will consume more of the same message until the
entire message has been delivered, and MSG_EOR will be set.
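The MSG_EOR handling described above can be sketched as a loop that
keeps calling recvmsg() until the flag is reported. The helper name
recv_whole_message() is hypothetical. The sketch assumes that no
notifications are enabled on the socket and that successive reads
return pieces of the same message, so it does not examine ancillary
data or distinguish notifications from user data.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>   /* recvmsg(), struct msghdr, MSG_EOR */
#include <sys/types.h>    /* ssize_t */
#include <sys/uio.h>      /* struct iovec */

/* Hypothetical helper: read one complete user message into buf.
 * On success, returns the message length and stores the source
 * transport address in *from and its length in *fromlen.
 * Returns -1 with errno set on error; errno is EMSGSIZE if buf is
 * full before MSG_EOR has been seen. */
ssize_t recv_whole_message(int sd, void *buf, size_t cap,
                           struct sockaddr_storage *from,
                           socklen_t *fromlen)
{
    size_t used = 0;

    assert(buf != NULL && from != NULL && fromlen != NULL);

    for (;;) {
        struct iovec iov;
        struct msghdr msg;
        ssize_t n;

        if (used == cap) {
            errno = EMSGSIZE;
            return -1;
        }

        iov.iov_base = (char *)buf + used;   /* append after earlier pieces */
        iov.iov_len = cap - used;

        memset(&msg, 0, sizeof(msg));
        msg.msg_name = from;
        msg.msg_namelen = sizeof(*from);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(sd, &msg, 0);
        if (n == -1)
            return -1;

        *fromlen = msg.msg_namelen;
        used += (size_t)n;

        /* Stop once the whole message has arrived. A return of 0 is
         * also treated as the end, to avoid looping forever. */
        if ((msg.msg_flags & MSG_EOR) || n == 0)
            return (ssize_t)used;
    }
}
```

An application that enables notifications would additionally need to
inspect msg_flags for the notification indication before treating the
buffer as user data.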
If the SCTP stack is running low on buffers, it may partially deliver
a message. In this case, MSG_EOR will not be set, and more calls to
recvmsg() will be necessary to completely consume the message. Only
one message at a time can be partially delivered in any stream. The
socket option SCTP_FRAGMENT_INTERLEAVE controls various aspects of
what interlacing of messages occurs for both the one-to-one and the
one-to-many style sockets. Please consult Section 8.1.20 for further
details on message delivery options.

3.1.5. close()
Applications use close() to perform graceful shutdown (as described
in Section 10.1 of [RFC4960]) on all of the associations currently
represented by a one-to-many style socket.

The function prototype is

   int close(int sd);

and the argument is

sd:  The socket descriptor of the associations to be closed.

0 is returned on success and -1 in case of an error.

To gracefully shut down a specific association represented by the
one-to-many style socket, an application should use the sendmsg()
call and include the SCTP_EOF flag. A user may optionally terminate
an association non-gracefully by using sendmsg() with the SCTP_ABORT
flag set and possibly passing a user-specified abort code in the data
field. Both flags SCTP_EOF and SCTP_ABORT are passed with ancillary
data (see Section 5.3.4) in the sendmsg() call.

If sd in the close() call is a branched-off socket representing only
one association, the shutdown is performed on that association only.

3.1.6. connect()
An application may use the connect() call in the one-to-many style to
initiate an association without sending data.

The function prototype is

   int connect(int sd,
               const struct sockaddr *nam,
               socklen_t len);

and the arguments are

sd:  The socket descriptor to which a new association is added.

nam:  The address structure (struct sockaddr_in for an IPv4 address
   or struct sockaddr_in6 for an IPv6 address; see [RFC3493]).

len:  The size of the address.

0 is returned on success and -1 in case of an error.

Multiple connect() calls can be made on the same socket to create
multiple associations. This is different from the semantics of
connect() on a UDP socket.

Note that SCTP allows data exchange, similar to T/TCP [RFC1644] (made
Historic by [RFC6247]), during the association setup phase. If an
application wants to do this, it cannot use the connect() call.
Instead, it should use sendto() or sendmsg() to initiate an
association. If it uses sendto() and it wants to change the
initialization behavior, it needs to use the SCTP_INITMSG socket
option before calling sendto(). Or it can use sendmsg() with
SCTP_INIT type ancillary data to initiate an association without
calling setsockopt(). Note that the implicit setup is supported for
the one-to-many style sockets.

SCTP does not support half close semantics. This means that unlike
T/TCP, MSG_EOF should not be set in the flags parameter when calling
sendto() or sendmsg() when the call is used to initiate a connection.
MSG_EOF is not an acceptable flag with an SCTP socket.

3.2. Non-Blocking Mode
Some SCTP applications may wish to avoid being blocked when calling a
socket interface function.

Once a bind() call and/or subsequent sctp_bindx() calls are complete
on a one-to-many style socket, an application may set the
non-blocking option (such as O_NONBLOCK) via fcntl(). After setting
the socket to non-blocking mode, the sendmsg() function returns
immediately. The success or failure of sending the data message
(with possible SCTP_INIT ancillary data) will be signaled by the
SCTP_ASSOC_CHANGE event with SCTP_COMM_UP or SCTP_CANT_START_ASSOC.
If user data could not be sent (due to an SCTP_CANT_START_ASSOC), the
sender will also receive an SCTP_SEND_FAILED_EVENT event. Events can
be received by the user calling recvmsg(). A server (having called
listen()) is also notified of an association-up event via the
reception of an SCTP_ASSOC_CHANGE with SCTP_COMM_UP via the calling
of recvmsg() and possibly the reception of the first data message.

To shut down the association gracefully, the user must call sendmsg()
with no data and with the SCTP_EOF flag set as described in
Section 5.3.4. The function returns immediately, and completion of
the graceful shutdown is indicated by an SCTP_ASSOC_CHANGE
notification of type SCTP_SHUTDOWN_COMP (see Section 6.1.1). Note
that this can also be done using the sctp_sendv() call described in
Section 9.12.

It is recommended that an application use caution when using select()
(or poll()) for writing on a one-to-many style socket, because the
interpretation of select() on write is implementation specific.
Generally, a positive return on a select() on write would only
indicate that one of the associations represented by the one-to-many
style socket is writable. An application that writes after the
select() returns may still block, since the association that was
writable may not be the destination association of the write call.
Likewise, select() (or poll()) for reading from a one-to-many style
socket will only return an indication that one of the associations
represented by the socket has data to be read.

An application that wishes to know that a particular association is
ready for reading or writing should either use the one-to-one style
or use the sctp_peeloff() function (see Section 9.2) to separate the
association of interest from the one-to-many style socket.

Note that some implementations may have an extended select call, such
as epoll or kqueue, that may escape this limitation and allow a
select on a specific association of a one-to-many style socket, but
this is an implementation-specific detail that a portable application
cannot depend on.

3.3. Special Considerations
The fact that a one-to-many style socket can provide access to many SCTP associations through a single socket descriptor has important implications for both application programmers and system programmers implementing this API. A key issue is how buffer space inside the sockets layer is managed. Because this implementation detail directly affects how application programmers must write their code to ensure correct operation and portability, this section provides some guidance to both implementers and application programmers.
An important feature that SCTP shares with TCP is flow control. Specifically, a sender may not send data faster than the receiver can consume it.

For TCP, flow control is typically provided for in the sockets API as follows. If the reader stops reading, the sender queues messages in the socket layer until the send socket buffer is completely filled. This results in a "stalled connection". Further attempts to write to the socket will block or return the error EAGAIN or EWOULDBLOCK for a non-blocking socket. At some point, either the connection is closed, or the receiver begins to read, again freeing space in the output queue.

For one-to-one style SCTP sockets (this includes socket descriptors that were separated from a one-to-many style socket with sctp_peeloff()), the behavior is identical. For one-to-many style SCTP sockets, there are multiple associations for a single socket, which makes the situation more complicated. If the implementation uses a single buffer space allocation shared by all associations, a single stalled association can prevent the further sending of data on all associations active on a particular one-to-many style socket.

For a blocking socket, it should be clear that a single stalled association can block the entire socket. For this reason, application programmers may want to use non-blocking one-to-many style sockets. The application should at least be able to send messages to the non-stalled associations.

But a non-blocking socket is not sufficient if the API implementer has chosen a single shared buffer allocation for the socket. A single stalled association would eventually cause the shared allocation to fill, and it would become impossible to send even to non-stalled associations.

The API implementer can solve this problem by providing each association with its own allocation of outbound buffer space. Each association should conceptually have as much buffer space as it would have if it had its own socket. As a bonus, this simplifies the implementation of sctp_peeloff().

To ensure that a given stalled association will not prevent other non-stalled associations from being writable, application programmers should either

o demand that the underlying implementation dedicate independent buffer space reservation to each association (as suggested above), or

o verify that their application-layer protocol does not permit large amounts of unread data at the receiver (this is true of some request-response protocols, for example), or

o use one-to-one style sockets for associations that may potentially stall (either from the beginning, or by using sctp_peeloff() before sending large amounts of data that may cause a stalled condition).

4. One-to-One Style Interface
The goal of this style is to follow as closely as possible the current practice of using the sockets interface for a connection-oriented protocol such as TCP. This style enables existing applications using connection-oriented protocols to be ported to SCTP with very little effort.

One-to-one style sockets can be connected (explicitly or implicitly) at most once, similar to TCP sockets.

Note that some new SCTP features and some new SCTP socket options can only be utilized through the use of sendmsg() and recvmsg() calls; see Section 4.1.8.

4.1. Basic Operation
A typical one-to-one style server uses the following system call sequence to prepare an SCTP endpoint for servicing requests:

o socket()

o bind()

o listen()

o accept()

The accept() call blocks until a new association is set up. It returns with a new socket descriptor. The server then uses the new socket descriptor to communicate with the client, using recv() and send() calls to get requests and send back responses.

Then it calls

o close()

to terminate the association.
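The server sequence above might be sketched in C as follows. This is a minimal illustration, not a reference implementation: the helper names open_listener() and echo_once() are hypothetical, the backlog of 5 and the 512-byte buffer are arbitrary, and the protocol parameter is an assumption added so that the same code can be tried with IPPROTO_TCP on a host without SCTP support. Passing IPPROTO_SCTP yields a one-to-one style SCTP socket.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* socket() + bind() + listen(): returns a listening descriptor, or -1.
 * Pass IPPROTO_SCTP for a one-to-one style SCTP socket. */
int open_listener(int protocol, uint16_t port)
{
    struct sockaddr_in addr;
    int sd = socket(PF_INET, SOCK_STREAM, protocol);

    if (sd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* wildcard address */
    addr.sin_port = htons(port);                /* 0 requests an ephemeral port */

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sd, 5) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}

/* accept() one association, echo one request back, then close() it.
 * Returns the number of bytes echoed, 0 if the peer sent nothing
 * before closing, or -1 on error. */
ssize_t echo_once(int listen_sd)
{
    char buf[512];
    ssize_t n;
    int sd = accept(listen_sd, NULL, NULL);     /* blocks until a peer connects */

    if (sd < 0)
        return -1;

    n = recv(sd, buf, sizeof(buf), 0);
    if (n > 0 && send(sd, buf, (size_t)n, 0) != n)
        n = -1;

    close(sd);                                  /* terminates the association */
    return n;
}
```

A server would typically call open_listener(IPPROTO_SCTP, port) once and then call echo_once() in a loop, one iteration per accepted association.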
A typical client uses the following system call sequence to set up an association with a server to request services:

o socket()

o connect()

After returning from the connect() call, the client uses send()/sendmsg() and recv()/recvmsg() calls to send out requests and receive responses from the server.

The client calls

o close()

to terminate this association when done.

4.1.1. socket()
Applications call socket() to create a socket descriptor to represent an SCTP endpoint.

The function prototype is

   int socket(int domain, int type, int protocol);

and one uses PF_INET or PF_INET6 as the domain, SOCK_STREAM as the type, and IPPROTO_SCTP as the protocol.

Here, SOCK_STREAM indicates the creation of a one-to-one style socket.

Using the PF_INET domain indicates the creation of an endpoint that can use only IPv4 addresses, while PF_INET6 creates an endpoint that can use both IPv6 and IPv4 addresses.

4.1.2. bind()
Applications use bind() to specify with which local address and port the SCTP endpoint should associate itself. An SCTP endpoint can be associated with multiple addresses. To do this, sctp_bindx() is introduced in Section 9.1 to help applications do the job of associating multiple addresses. But note that an endpoint can only be associated with one local port.
These addresses associated with a socket are the eligible transport addresses for the endpoint to send and receive data. The endpoint will also present these addresses to its peers during the association initialization process; see [RFC4960].

The function prototype of bind() is

   int bind(int sd, struct sockaddr *addr, socklen_t addrlen);

and the arguments are

sd: The socket descriptor returned by socket().

addr: The address structure (struct sockaddr_in for an IPv4 address or struct sockaddr_in6 for an IPv6 address; see [RFC3493]).

addrlen: The size of the address structure.

If sd is an IPv4 socket, the address passed must be an IPv4 address. If sd is an IPv6 socket, the address passed can either be an IPv4 or an IPv6 address.

Applications cannot call bind() multiple times to associate multiple addresses to the endpoint. After the first call to bind(), all subsequent calls will return an error.

If the IP address part of addr is specified as a wildcard (INADDR_ANY for an IPv4 address, or as IN6ADDR_ANY_INIT or in6addr_any for an IPv6 address), the operating system will associate the endpoint with an optimal address set of the available interfaces. If the IPv4 sin_port or IPv6 sin6_port is set to 0, the operating system will choose an ephemeral port for the endpoint.

If bind() is not called prior to the connect() call, the system picks an ephemeral port and will choose an address set equivalent to binding with a wildcard address. One of these addresses will be the primary address for the association. This automatically enables the multi-homing capability of SCTP.

The completion of this bind() process does not allow the SCTP endpoint to accept inbound SCTP association requests. Until a listen() system call, described below, is performed on the socket, the SCTP endpoint will promptly reject an inbound SCTP INIT request with an SCTP ABORT.
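As an illustration of the wildcard and ephemeral-port rules above, the following sketch binds an IPv4 socket to INADDR_ANY with port 0 and then asks the system which port it chose. The helper name bind_ephemeral() is hypothetical, and using getsockname() to read back the port is ordinary sockets practice rather than something this section prescribes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Bind sd to the IPv4 wildcard address with an ephemeral port.
 * Returns the chosen port in host byte order, or -1 on error. */
int bind_ephemeral(int sd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* wildcard: system picks the address set */
    addr.sin_port = htons(0);                   /* 0: system picks an ephemeral port */

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* Read back the local address to learn the port that was assigned. */
    if (getsockname(sd, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    return ntohs(addr.sin_port);
}
```

Calling the helper a second time on the same descriptor is expected to return -1, matching the rule that every bind() after the first returns an error.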
4.1.3. listen()
Applications use listen() to allow the SCTP endpoint to accept inbound associations.

The function prototype is

   int listen(int sd, int backlog);

and the arguments are

sd: The socket descriptor of the SCTP endpoint.

backlog: Specifies the max number of outstanding associations allowed in the socket's accept queue. These are the associations that have finished the four-way initiation handshake (see Section 5 of [RFC4960]) and are in the ESTABLISHED state. Note that a backlog of '0' indicates that the caller no longer wishes to receive new associations.

listen() returns 0 on success and -1 in case of an error.

4.1.4. accept()
Applications use the accept() call to remove an established SCTP association from the accept queue of the endpoint. A new socket descriptor will be returned from accept() to represent the newly formed association.

The function prototype is

   int accept(int sd, struct sockaddr *addr, socklen_t *addrlen);

and the arguments are

sd: The listening socket descriptor.

addr: On return, addr (struct sockaddr_in for an IPv4 address or struct sockaddr_in6 for an IPv6 address; see [RFC3493]) will contain the primary address of the peer endpoint.

addrlen: On return, addrlen will contain the size of addr.

The function returns the socket descriptor for the newly formed association on success and -1 in case of an error.
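Because the address returned by accept() may be either a struct sockaddr_in or a struct sockaddr_in6, callers often pass a struct sockaddr_storage and inspect the family afterwards. The sketch below shows one way to do that. The helper names format_addr() and accept_and_describe() are hypothetical, and the "address:port" and "[address]:port" output forms are an illustrative choice, not something this document defines.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Render an IPv4 or IPv6 socket address as text.
 * Returns 0 on success, -1 for an unsupported family or conversion error. */
int format_addr(const struct sockaddr *sa, char *out, size_t outlen)
{
    char ip[INET6_ADDRSTRLEN];

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *s4 = (const struct sockaddr_in *)sa;
        if (inet_ntop(AF_INET, &s4->sin_addr, ip, sizeof(ip)) == NULL)
            return -1;
        snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(s4->sin_port));
        return 0;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *s6 = (const struct sockaddr_in6 *)sa;
        if (inet_ntop(AF_INET6, &s6->sin6_addr, ip, sizeof(ip)) == NULL)
            return -1;
        snprintf(out, outlen, "[%s]:%u", ip, (unsigned)ntohs(s6->sin6_port));
        return 0;
    }
    return -1;
}

/* accept() one association and describe the peer's primary address.
 * Returns the new descriptor, or -1 on error. */
int accept_and_describe(int listen_sd, char *out, size_t outlen)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof(peer);       /* in: buffer size; out: address size */
    int sd;

    memset(&peer, 0, sizeof(peer));
    sd = accept(listen_sd, (struct sockaddr *)&peer, &len);
    if (sd < 0)
        return -1;

    if (format_addr((struct sockaddr *)&peer, out, outlen) < 0 && outlen > 0)
        out[0] = '\0';                  /* unknown family: leave the text empty */
    return sd;
}
```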
4.1.5. connect()
Applications use connect() to initiate an association to a peer.

The function prototype is

   int connect(int sd, const struct sockaddr *addr, socklen_t addrlen);

and the arguments are

sd: The socket descriptor of the endpoint.

addr: The peer's address (struct sockaddr_in for an IPv4 address or struct sockaddr_in6 for an IPv6 address; see [RFC3493]).

addrlen: The size of the address.

connect() returns 0 on success and -1 on error.

This operation corresponds to the ASSOCIATE primitive described in Section 10.1 of [RFC4960].

The number of outbound streams the new association has is stack dependent. Before connecting, applications can use the SCTP_INITMSG option described in Section 8.1.3 to change the number of outbound streams.

If bind() is not called prior to the connect() call, the system picks an ephemeral port and will choose an address set equivalent to binding with INADDR_ANY and IN6ADDR_ANY_INIT for IPv4 and IPv6 sockets, respectively. One of the addresses will be the primary address for the association. This automatically enables the multi-homing capability of SCTP.

Note that SCTP allows data exchange, similar to T/TCP [RFC1644] (made Historic by [RFC6247]), during the association setup phase. If an application wants to do this, it cannot use the connect() call. Instead, it should use sendto() or sendmsg() to initiate an association. If it uses sendto() and it wants to change the initialization behavior, it needs to use the SCTP_INITMSG socket option before calling sendto(). Or it can use sendmsg() with SCTP_INIT type ancillary data to initiate an association without calling setsockopt(). Note that the implicit setup is supported for the one-to-one style sockets.
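A client-side sketch of the socket() and connect() steps is given below. It is illustrative only: the helper name connect_to() is hypothetical, it handles IPv4 literals only, and the protocol parameter is an assumption added so the code can be tried with IPPROTO_TCP where SCTP is unavailable. Setting SCTP_INITMSG before connecting is omitted because it requires the SCTP-specific definitions from Section 8.1.3.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* socket() + connect() to an IPv4 peer given as a dotted-quad string.
 * Pass IPPROTO_SCTP for a one-to-one style SCTP socket.
 * Returns a connected descriptor, or -1 on any failure. */
int connect_to(int protocol, const char *ip, uint16_t port)
{
    struct sockaddr_in peer;
    int sd;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 literal */

    sd = socket(PF_INET, SOCK_STREAM, protocol);
    if (sd < 0)
        return -1;

    /* No bind() here: the system picks an ephemeral port and address set. */
    if (connect(sd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}
```

After a successful return, the caller would use send()/sendmsg() and recv()/recvmsg() on the descriptor and call close() when done.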
SCTP does not support half close semantics. This means that unlike T/TCP, MSG_EOF should not be set in the flags parameter when calling sendto() or sendmsg() when the call is used to initiate a connection. MSG_EOF is not an acceptable flag with an SCTP socket.

4.1.6. close()
Applications use close() to gracefully close down an association.

The function prototype is

   int close(int sd);

and the argument is

sd: The socket descriptor of the association to be closed.

close() returns 0 on success and -1 in case of an error.

After an application calls close() on a socket descriptor, no further socket operations will succeed on that descriptor.

4.1.7. shutdown()
SCTP differs from TCP in that it does not have half close semantics. Hence, the shutdown() call for SCTP is an approximation of the TCP shutdown() call, and solves some different problems. Full TCP compatibility is not provided, so developers porting TCP applications to SCTP may need to recode sections that use shutdown(). (Note that it is possible to achieve the same results as half close in SCTP using SCTP streams.)

The function prototype is

   int shutdown(int sd, int how);

and the arguments are

sd: The socket descriptor of the association to be closed.

how: Specifies the type of shutdown. The values are as follows:

   SHUT_RD: Disables further receive operations. No SCTP protocol action is taken.

   SHUT_WR: Disables further send operations, and initiates the SCTP shutdown sequence.

   SHUT_RDWR: Disables further send and receive operations, and initiates the SCTP shutdown sequence.

shutdown() returns 0 on success and -1 in case of an error.

The major difference between SCTP and TCP shutdown() is that SCTP SHUT_WR initiates immediate and full protocol shutdown, whereas TCP SHUT_WR causes TCP to go into the half close state. SHUT_RD behaves the same for SCTP as for TCP. The purpose of SCTP SHUT_WR is to close the SCTP association while still leaving the socket descriptor open. This allows the caller to receive back any data that SCTP is unable to deliver (see Section 6.1.4 for more information) and receive event notifications.

To perform the ABORT operation described in Section 10.1 of [RFC4960], an application can use the socket option SO_LINGER. SO_LINGER is described in Section 8.1.4.

4.1.8. sendmsg() and recvmsg()
With a one-to-one style socket, the application can also use sendmsg() and recvmsg() to transmit data to and receive data from its peer. The semantics are similar to those used in the one-to-many style (see Section 3.1.4), with the following differences:

1. When sending, the msg_name field in the msghdr is not used to specify the intended receiver; rather, it is used to indicate a preferred peer address if the sender wishes to discourage the stack from sending the message to the primary address of the receiver. If the socket is connected and the transport address given is not part of the current association, the data will not be sent, and an SCTP_SEND_FAILED_EVENT event will be delivered to the application if send failure events are enabled.

2. Using sendmsg() on a non-connected one-to-one style socket for implicit connection setup may or may not work, depending on the SCTP implementation.

4.1.9. getpeername()
Applications use getpeername() to retrieve the primary socket address of the peer. This call is for TCP compatibility and is not multi-homed. It may not work with one-to-many style sockets, depending on the implementation. See Section 9.3 for a multi-homed style version of the call.
The function prototype is

   int getpeername(int sd, struct sockaddr *address, socklen_t *len);

and the arguments are

sd: The socket descriptor to be queried.

address: On return, the peer primary address is stored in this buffer. If the socket is an IPv4 socket, the address will be IPv4. If the socket is an IPv6 socket, the address will be either an IPv6 or IPv4 address.

len: The caller should set the length of address here. On return, this is set to the length of the returned address.

getpeername() returns 0 on success and -1 in case of an error.

If the actual length of the address is greater than the length of the supplied sockaddr structure, the stored address will be truncated.
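One way to avoid the truncation case described above is to supply a struct sockaddr_storage, which is large enough for both IPv4 and IPv6 addresses. The following sketch assumes that approach. The helper name get_peer() is hypothetical, and the length comparison after the call is a defensive check rather than a requirement of this document.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fetch the peer's primary address into caller-provided storage.
 * On success, returns 0 and stores the address length in *peerlen.
 * Returns -1 if the socket has no peer or the address did not fit. */
int get_peer(int sd, struct sockaddr_storage *peer, socklen_t *peerlen)
{
    socklen_t len = sizeof(*peer);      /* in: buffer size; out: address size */

    memset(peer, 0, sizeof(*peer));
    if (getpeername(sd, (struct sockaddr *)peer, &len) < 0)
        return -1;

    /* A reported length larger than the buffer means the stored
     * address was truncated. */
    if (len > sizeof(*peer))
        return -1;

    *peerlen = len;
    return 0;
}
```

The caller can then inspect peer->ss_family to decide whether to interpret the result as a struct sockaddr_in or a struct sockaddr_in6.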