6. Sending and Receiving Data
6.1. Sending Data
To _Send a WebSocket Message_ comprising of /data/ over a WebSocket connection, an endpoint MUST perform the following steps.
1. The endpoint MUST ensure the WebSocket connection is in the OPEN state (cf. Sections 4.1 and 4.2.2.) If at any point the state of the WebSocket connection changes, the endpoint MUST abort the following steps. 2. An endpoint MUST encapsulate the /data/ in a WebSocket frame as defined in Section 5.2. If the data to be sent is large or if the data is not available in its entirety at the point the endpoint wishes to begin sending the data, the endpoint MAY alternately encapsulate the data in a series of frames as defined in Section 5.4. 3. The opcode (frame-opcode) of the first frame containing the data MUST be set to the appropriate value from Section 5.2 for data that is to be interpreted by the recipient as text or binary data. 4. The FIN bit (frame-fin) of the last frame containing the data MUST be set to 1 as defined in Section 5.2. 5. If the data is being sent by the client, the frame(s) MUST be masked as defined in Section 5.3. 6. If any extensions (Section 9) have been negotiated for the WebSocket connection, additional considerations may apply as per the definition of those extensions. 7. The frame(s) that have been formed MUST be transmitted over the underlying network connection.6.2. Receiving Data
To receive WebSocket data, an endpoint listens on the underlying network connection. Incoming data MUST be parsed as WebSocket frames as defined in Section 5.2. If a control frame (Section 5.5) is received, the frame MUST be handled as defined by Section 5.5. Upon receiving a data frame (Section 5.6), the endpoint MUST note the /type/ of the data as defined by the opcode (frame-opcode) from Section 5.2. The "Application data" from this frame is defined as the /data/ of the message. If the frame comprises an unfragmented message (Section 5.4), it is said that _A WebSocket Message Has Been Received_ with type /type/ and data /data/. If the frame is part of a fragmented message, the "Application data" of the subsequent data frames is concatenated to form the /data/. When the last fragment is received as indicated by the FIN bit (frame-fin), it is said that _A WebSocket Message Has Been Received_ with data /data/ (comprised of the concatenation of the "Application data" of the fragments) and
type /type/ (noted from the first frame of the fragmented message). Subsequent data frames MUST be interpreted as belonging to a new WebSocket message. Extensions (Section 9) MAY change the semantics of how data is read, specifically including what comprises a message boundary. Extensions, in addition to adding "Extension data" before the "Application data" in a payload, MAY also modify the "Application data" (such as by compressing it). A server MUST remove masking for data frames received from a client as described in Section 5.3.7. Closing the Connection
7.1. Definitions
7.1.1. Close the WebSocket Connection
To _Close the WebSocket Connection_, an endpoint closes the underlying TCP connection. An endpoint SHOULD use a method that cleanly closes the TCP connection, as well as the TLS session, if applicable, discarding any trailing bytes that may have been received. An endpoint MAY close the connection via any means available when necessary, such as when under attack. The underlying TCP connection, in most normal cases, SHOULD be closed first by the server, so that it holds the TIME_WAIT state and not the client (as this would prevent it from re-opening the connection for 2 maximum segment lifetimes (2MSL), while there is no corresponding server impact as a TIME_WAIT connection is immediately reopened upon a new SYN with a higher seq number). In abnormal cases (such as not having received a TCP Close from the server after a reasonable amount of time) a client MAY initiate the TCP Close. As such, when a server is instructed to _Close the WebSocket Connection_ it SHOULD initiate a TCP Close immediately, and when a client is instructed to do the same, it SHOULD wait for a TCP Close from the server. As an example of how to obtain a clean closure in C using Berkeley sockets, one would call shutdown() with SHUT_WR on the socket, call recv() until obtaining a return value of 0 indicating that the peer has also performed an orderly shutdown, and finally call close() on the socket.
7.1.2. Start the WebSocket Closing Handshake
To _Start the WebSocket Closing Handshake_ with a status code (Section 7.4) /code/ and an optional close reason (Section 7.1.6) /reason/, an endpoint MUST send a Close control frame, as described in Section 5.5.1, whose status code is set to /code/ and whose close reason is set to /reason/. Once an endpoint has both sent and received a Close control frame, that endpoint SHOULD _Close the WebSocket Connection_ as defined in Section 7.1.1.7.1.3. The WebSocket Closing Handshake is Started
Upon either sending or receiving a Close control frame, it is said that _The WebSocket Closing Handshake is Started_ and that the WebSocket connection is in the CLOSING state.7.1.4. The WebSocket Connection is Closed
When the underlying TCP connection is closed, it is said that _The WebSocket Connection is Closed_ and that the WebSocket connection is in the CLOSED state. If the TCP connection was closed after the WebSocket closing handshake was completed, the WebSocket connection is said to have been closed _cleanly_. If the WebSocket connection could not be established, it is also said that _The WebSocket Connection is Closed_, but not _cleanly_.7.1.5. The WebSocket Connection Close Code
As defined in Sections 5.5.1 and 7.4, a Close control frame may contain a status code indicating a reason for closure. A closing of the WebSocket connection may be initiated by either endpoint, potentially simultaneously. _The WebSocket Connection Close Code_ is defined as the status code (Section 7.4) contained in the first Close control frame received by the application implementing this protocol. If this Close control frame contains no status code, _The WebSocket Connection Close Code_ is considered to be 1005. If _The WebSocket Connection is Closed_ and no Close control frame was received by the endpoint (such as could occur if the underlying transport connection is lost), _The WebSocket Connection Close Code_ is considered to be 1006. NOTE: Two endpoints may not agree on the value of _The WebSocket Connection Close Code_. As an example, if the remote endpoint sent a Close frame but the local application has not yet read the data containing the Close frame from its socket's receive buffer, and the local application independently decided to close the connection and send a Close frame, both endpoints will have sent and received a
Close frame and will not send further Close frames. Each endpoint will see the status code sent by the other end as _The WebSocket Connection Close Code_. As such, it is possible that the two endpoints may not agree on the value of _The WebSocket Connection Close Code_ in the case that both endpoints _Start the WebSocket Closing Handshake_ independently and at roughly the same time.7.1.6. The WebSocket Connection Close Reason
As defined in Sections 5.5.1 and 7.4, a Close control frame may contain a status code indicating a reason for closure, followed by UTF-8-encoded data, the interpretation of said data being left to the endpoints and not defined by this protocol. A closing of the WebSocket connection may be initiated by either endpoint, potentially simultaneously. _The WebSocket Connection Close Reason_ is defined as the UTF-8-encoded data following the status code (Section 7.4) contained in the first Close control frame received by the application implementing this protocol. If there is no such data in the Close control frame, _The WebSocket Connection Close Reason_ is the empty string. NOTE: Following the same logic as noted in Section 7.1.5, two endpoints may not agree on _The WebSocket Connection Close Reason_.7.1.7. Fail the WebSocket Connection
Certain algorithms and specifications require an endpoint to _Fail the WebSocket Connection_. To do so, the client MUST _Close the WebSocket Connection_, and MAY report the problem to the user (which would be especially useful for developers) in an appropriate manner. Similarly, to do so, the server MUST _Close the WebSocket Connection_, and SHOULD log the problem. If _The WebSocket Connection is Established_ prior to the point where the endpoint is required to _Fail the WebSocket Connection_, the endpoint SHOULD send a Close frame with an appropriate status code (Section 7.4) before proceeding to _Close the WebSocket Connection_. An endpoint MAY omit sending a Close frame if it believes the other side is unlikely to be able to receive and process the Close frame, due to the nature of the error that led the WebSocket connection to fail in the first place. An endpoint MUST NOT continue to attempt to process data (including a responding Close frame) from the remote endpoint after being instructed to _Fail the WebSocket Connection_. Except as indicated above or as specified by the application layer (e.g., a script using the WebSocket API), clients SHOULD NOT close the connection.
7.2. Abnormal Closures
7.2.1. Client-Initiated Closure
Certain algorithms, in particular during the opening handshake, require the client to _Fail the WebSocket Connection_. To do so, the client MUST _Fail the WebSocket Connection_ as defined in Section 7.1.7. If at any point the underlying transport layer connection is unexpectedly lost, the client MUST _Fail the WebSocket Connection_. Except as indicated above or as specified by the application layer (e.g., a script using the WebSocket API), clients SHOULD NOT close the connection.7.2.2. Server-Initiated Closure
Certain algorithms require or recommend that the server _Abort the WebSocket Connection_ during the opening handshake. To do so, the server MUST simply _Close the WebSocket Connection_ (Section 7.1.1).7.2.3. Recovering from Abnormal Closure
Abnormal closures may be caused by any number of reasons. Such closures could be the result of a transient error, in which case reconnecting may lead to a good connection and a resumption of normal operations. Such closures may also be the result of a nontransient problem, in which case if each deployed client experiences an abnormal closure and immediately and persistently tries to reconnect, the server may experience what amounts to a denial-of-service attack by a large number of clients trying to reconnect. The end result of such a scenario could be that the service is unable to recover in a timely manner or recovery is made much more difficult. To prevent this, clients SHOULD use some form of backoff when trying to reconnect after abnormal closures as described in this section. The first reconnect attempt SHOULD be delayed by a random amount of time. The parameters by which this random delay is chosen are left to the client to decide; a value chosen randomly between 0 and 5 seconds is a reasonable initial delay though clients MAY choose a different interval from which to select a delay length based on implementation experience and particular application. Should the first reconnect attempt fail, subsequent reconnect attempts SHOULD be delayed by increasingly longer amounts of time, using a method such as truncated binary exponential backoff.
7.3. Normal Closure of Connections
Servers MAY close the WebSocket connection whenever desired. Clients SHOULD NOT close the WebSocket connection arbitrarily. In either case, an endpoint initiates a closure by following the procedures to _Start the WebSocket Closing Handshake_ (Section 7.1.2).7.4. Status Codes
When closing an established connection (e.g., when sending a Close frame, after the opening handshake has completed), an endpoint MAY indicate a reason for closure. The interpretation of this reason by an endpoint, and the action an endpoint should take given this reason, are left undefined by this specification. This specification defines a set of pre-defined status codes and specifies which ranges may be used by extensions, frameworks, and end applications. The status code and any associated textual message are optional components of a Close frame.7.4.1. Defined Status Codes
Endpoints MAY use the following pre-defined status codes when sending a Close frame. 1000 1000 indicates a normal closure, meaning that the purpose for which the connection was established has been fulfilled. 1001 1001 indicates that an endpoint is "going away", such as a server going down or a browser having navigated away from a page. 1002 1002 indicates that an endpoint is terminating the connection due to a protocol error. 1003 1003 indicates that an endpoint is terminating the connection because it has received a type of data it cannot accept (e.g., an endpoint that understands only text data MAY send this if it receives a binary message).
1004 Reserved. The specific meaning might be defined in the future. 1005 1005 is a reserved value and MUST NOT be set as a status code in a Close control frame by an endpoint. It is designated for use in applications expecting a status code to indicate that no status code was actually present. 1006 1006 is a reserved value and MUST NOT be set as a status code in a Close control frame by an endpoint. It is designated for use in applications expecting a status code to indicate that the connection was closed abnormally, e.g., without sending or receiving a Close control frame. 1007 1007 indicates that an endpoint is terminating the connection because it has received data within a message that was not consistent with the type of the message (e.g., non-UTF-8 [RFC3629] data within a text message). 1008 1008 indicates that an endpoint is terminating the connection because it has received a message that violates its policy. This is a generic status code that can be returned when there is no other more suitable status code (e.g., 1003 or 1009) or if there is a need to hide specific details about the policy. 1009 1009 indicates that an endpoint is terminating the connection because it has received a message that is too big for it to process. 1010 1010 indicates that an endpoint (client) is terminating the connection because it has expected the server to negotiate one or more extension, but the server didn't return them in the response message of the WebSocket handshake. The list of extensions that
are needed SHOULD appear in the /reason/ part of the Close frame. Note that this status code is not used by the server, because it can fail the WebSocket handshake instead. 1011 1011 indicates that a server is terminating the connection because it encountered an unexpected condition that prevented it from fulfilling the request. 1015 1015 is a reserved value and MUST NOT be set as a status code in a Close control frame by an endpoint. It is designated for use in applications expecting a status code to indicate that the connection was closed due to a failure to perform a TLS handshake (e.g., the server certificate can't be verified).7.4.2. Reserved Status Code Ranges
0-999 Status codes in the range 0-999 are not used. 1000-2999 Status codes in the range 1000-2999 are reserved for definition by this protocol, its future revisions, and extensions specified in a permanent and readily available public specification. 3000-3999 Status codes in the range 3000-3999 are reserved for use by libraries, frameworks, and applications. These status codes are registered directly with IANA. The interpretation of these codes is undefined by this protocol. 4000-4999 Status codes in the range 4000-4999 are reserved for private use and thus can't be registered. Such codes can be used by prior agreements between WebSocket applications. The interpretation of these codes is undefined by this protocol.
8. Error Handling
8.1. Handling Errors in UTF-8-Encoded Data
When an endpoint is to interpret a byte stream as UTF-8 but finds that the byte stream is not, in fact, a valid UTF-8 stream, that endpoint MUST _Fail the WebSocket Connection_. This rule applies both during the opening handshake and during subsequent data exchange.9. Extensions
WebSocket clients MAY request extensions to this specification, and WebSocket servers MAY accept some or all extensions requested by the client. A server MUST NOT respond with any extension not requested by the client. If extension parameters are included in negotiations between the client and the server, those parameters MUST be chosen in accordance with the specification of the extension to which the parameters apply.9.1. Negotiating Extensions
A client requests extensions by including a |Sec-WebSocket- Extensions| header field, which follows the normal rules for HTTP header fields (see [RFC2616], Section 4.2) and the value of the header field is defined by the following ABNF [RFC2616]. Note that this section is using ABNF syntax/rules from [RFC2616], including the "implied *LWS rule". If a value is received by either the client or the server during negotiation that does not conform to the ABNF below, the recipient of such malformed data MUST immediately _Fail the WebSocket Connection_. Sec-WebSocket-Extensions = extension-list extension-list = 1#extension extension = extension-token *( ";" extension-param ) extension-token = registered-token registered-token = token extension-param = token [ "=" (token | quoted-string) ] ;When using the quoted-string syntax variant, the value ;after quoted-string unescaping MUST conform to the ;'token' ABNF.
Note that like other HTTP header fields, this header field MAY be split or combined across multiple lines. Ergo, the following are equivalent: Sec-WebSocket-Extensions: foo Sec-WebSocket-Extensions: bar; baz=2 is exactly equivalent to Sec-WebSocket-Extensions: foo, bar; baz=2 Any extension-token used MUST be a registered token (see Section 11.4). The parameters supplied with any given extension MUST be defined for that extension. Note that the client is only offering to use any advertised extensions and MUST NOT use them unless the server indicates that it wishes to use the extension. Note that the order of extensions is significant. Any interactions between multiple extensions MAY be defined in the documents defining the extensions. In the absence of such definitions, the interpretation is that the header fields listed by the client in its request represent a preference of the header fields it wishes to use, with the first options listed being most preferable. The extensions listed by the server in response represent the extensions actually in use for the connection. Should the extensions modify the data and/or framing, the order of operations on the data should be assumed to be the same as the order in which the extensions are listed in the server's response in the opening handshake. For example, if there are two extensions "foo" and "bar" and if the header field |Sec-WebSocket-Extensions| sent by the server has the value "foo, bar", then operations on the data will be made as bar(foo(data)), be those changes to the data itself (such as compression) or changes to the framing that may "stack". Non-normative examples of acceptable extension header fields (note that long lines are folded for readability): Sec-WebSocket-Extensions: deflate-stream Sec-WebSocket-Extensions: mux; max-channels=4; flow-control, deflate-stream Sec-WebSocket-Extensions: private-extension A server accepts one or more extensions by including a |Sec-WebSocket-Extensions| header field containing one or more extensions that were requested by the client. The interpretation of
any extension parameters, and what constitutes a valid response by a server to a requested set of parameters by a client, will be defined by each such extension.9.2. Known Extensions
Extensions provide a mechanism for implementations to opt-in to additional protocol features. This document doesn't define any extension, but implementations MAY use extensions defined separately.10. Security Considerations
This section describes some security considerations applicable to the WebSocket Protocol. Specific security considerations are described in subsections of this section.10.1. Non-Browser Clients
The WebSocket Protocol protects against malicious JavaScript running inside a trusted application such as a web browser, for example, by checking of the |Origin| header field (see below). See Section 1.6 for additional details. Such assumptions don't hold true in the case of a more-capable client. While this protocol is intended to be used by scripts in web pages, it can also be used directly by hosts. Such hosts are acting on their own behalf and can therefore send fake |Origin| header fields, misleading the server. Servers should therefore be careful about assuming that they are talking directly to scripts from known origins and must consider that they might be accessed in unexpected ways. In particular, a server should not trust that any input is valid. EXAMPLE: If the server uses input as part of SQL queries, all input text should be escaped before being passed to the SQL server, lest the server be susceptible to SQL injection.10.2. Origin Considerations
Servers that are not intended to process input from any web page but only for certain sites SHOULD verify the |Origin| field is an origin they expect. If the origin indicated is unacceptable to the server, then it SHOULD respond to the WebSocket handshake with a reply containing HTTP 403 Forbidden status code. The |Origin| header field protects from the attack cases when the untrusted party is typically the author of a JavaScript application that is executing in the context of the trusted client. The client itself can contact the server and, via the mechanism of the |Origin|
header field, determine whether to extend those communication privileges to the JavaScript application. The intent is not to prevent non-browsers from establishing connections but rather to ensure that trusted browsers under the control of potentially malicious JavaScript cannot fake a WebSocket handshake.10.3. Attacks On Infrastructure (Masking)
In addition to endpoints being the target of attacks via WebSockets, other parts of web infrastructure, such as proxies, may be the subject of an attack. As this protocol was being developed, an experiment was conducted to demonstrate a class of attacks on proxies that led to the poisoning of caching proxies deployed in the wild [TALKING]. The general form of the attack was to establish a connection to a server under the "attacker's" control, perform an UPGRADE on the HTTP connection similar to what the WebSocket Protocol does to establish a connection, and subsequently send data over that UPGRADEd connection that looked like a GET request for a specific known resource (which in an attack would likely be something like a widely deployed script for tracking hits or a resource on an ad-serving network). The remote server would respond with something that looked like a response to the fake GET request, and this response would be cached by a nonzero percentage of deployed intermediaries, thus poisoning the cache. The net effect of this attack would be that if a user could be convinced to visit a website the attacker controlled, the attacker could potentially poison the cache for that user and other users behind the same cache and run malicious script on other origins, compromising the web security model. To avoid such attacks on deployed intermediaries, it is not sufficient to prefix application-supplied data with framing that is not compliant with HTTP, as it is not possible to exhaustively discover and test that each nonconformant intermediary does not skip such non-HTTP framing and act incorrectly on the frame payload. Thus, the defense adopted is to mask all data from the client to the server, so that the remote script (attacker) does not have control over how the data being sent appears on the wire and thus cannot construct a message that could be misinterpreted by an intermediary as an HTTP request. Clients MUST choose a new masking key for each frame, using an algorithm that cannot be predicted by end applications that provide data. For example, each masking could be drawn from a cryptographically strong random number generator. If the same key is used or a decipherable pattern exists for how the next key is chosen, the attacker can send a message that, when masked, could appear to be
an HTTP request (by taking the message the attacker wishes to see on the wire and masking it with the next masking key to be used, the masking key will effectively unmask the data when the client applies it). It is also necessary that once the transmission of a frame from a client has begun, the payload (application-supplied data) of that frame must not be capable of being modified by the application. Otherwise, an attacker could send a long frame where the initial data was a known value (such as all zeros), compute the masking key being used upon receipt of the first part of the data, and then modify the data that is yet to be sent in the frame to appear as an HTTP request when masked. (This is essentially the same problem described in the previous paragraph with using a known or predictable masking key.) If additional data is to be sent or data to be sent is somehow changed, that new or changed data must be sent in a new frame and thus with a new masking key. In short, once transmission of a frame begins, the contents must not be modifiable by the remote script (application). The threat model being protected against is one in which the client sends data that appears to be an HTTP request. As such, the channel that needs to be masked is the data from the client to the server. The data from the server to the client can be made to look like a response, but to accomplish this request, the client must also be able to forge a request. As such, it was not deemed necessary to mask data in both directions (the data from the server to the client is not masked). Despite the protection provided by masking, non-compliant HTTP proxies will still be vulnerable to poisoning attacks of this type by clients and servers that do not apply masking.10.4. Implementation-Specific Limits
Implementations that have implementation- and/or platform-specific limitations regarding the frame size or total message size after reassembly from multiple frames MUST protect themselves against exceeding those limits. (For example, a malicious endpoint can try to exhaust its peer's memory or mount a denial-of-service attack by sending either a single big frame (e.g., of size 2**60) or by sending a long stream of small frames that are a part of a fragmented message.) Such an implementation SHOULD impose a limit on frame sizes and the total message size after reassembly from multiple frames.
10.5. WebSocket Client Authentication
This protocol doesn't prescribe any particular way that servers can authenticate clients during the WebSocket handshake. The WebSocket server can use any client authentication mechanism available to a generic HTTP server, such as cookies, HTTP authentication, or TLS authentication.10.6. Connection Confidentiality and Integrity
Connection confidentiality and integrity is provided by running the WebSocket Protocol over TLS (wss URIs). WebSocket implementations MUST support TLS and SHOULD employ it when communicating with their peers. For connections using TLS, the amount of benefit provided by TLS depends greatly on the strength of the algorithms negotiated during the TLS handshake. For example, some TLS cipher mechanisms don't provide connection confidentiality. To achieve reasonable levels of protection, clients should use only Strong TLS algorithms. "Web Security Context: User Interface Guidelines" [W3C.REC-wsc-ui-20100812] discusses what constitutes Strong TLS algorithms. [RFC5246] provides additional guidance in Appendix A.5 and Appendix D.3.10.7. Handling of Invalid Data
Incoming data MUST always be validated by both clients and servers. If, at any time, an endpoint is faced with data that it does not understand or that violates some criteria by which the endpoint determines safety of input, or when the endpoint sees an opening handshake that does not correspond to the values it is expecting (e.g., incorrect path or origin in the client request), the endpoint MAY drop the TCP connection. If the invalid data was received after a successful WebSocket handshake, the endpoint SHOULD send a Close frame with an appropriate status code (Section 7.4) before proceeding to _Close the WebSocket Connection_. Use of a Close frame with an appropriate status code can help in diagnosing the problem. If the invalid data is sent during the WebSocket handshake, the server SHOULD return an appropriate HTTP [RFC2616] status code. A common class of security problems arises when sending text data using the wrong encoding. This protocol specifies that messages with a Text data type (as opposed to Binary or other types) contain UTF-8- encoded data. Although the length is still indicated and applications implementing this protocol should use the length to determine where the frame actually ends, sending data in an improper
encoding may still break assumptions that applications built on top of this protocol may make, leading to anything from misinterpretation of data to loss of data or potential security bugs.10.8. Use of SHA-1 by the WebSocket Handshake
The WebSocket handshake described in this document doesn't depend on any security properties of SHA-1, such as collision resistance or resistance to the second pre-image attack (as described in [RFC4270]).