2. Architecture
XMPP provides a technology for the asynchronous, end-to-end exchange of structured data by means of direct, persistent XML streams among a distributed network of globally addressable, presence-aware clients and servers. Because this architectural style involves ubiquitous knowledge of network availability and a conceptually unlimited number of concurrent information transactions in the context of a given client-to-server or server-to-server session, we label it "Availability for Concurrent Transactions" (ACT) to distinguish it from the "Representational State Transfer" [REST] architectural style familiar from the World Wide Web. Although the architecture of XMPP is similar in important ways to that of email (see [EMAIL-ARCH]), it introduces several modifications to facilitate communication in close to real time. The salient features of this ACTive architectural style are as follows.

2.1. Global Addresses
As with email, XMPP uses globally unique addresses (based on the Domain Name System) in order to route and deliver messages over the network. All XMPP entities are addressable on the network, most particularly clients and servers but also various additional services that can be accessed by clients and servers.

In general, server addresses are of the form <domainpart> (e.g., <im.example.com>), accounts hosted at a server are of the form <localpart@domainpart> (e.g., <juliet@im.example.com>, called a "bare JID"), and a particular connected device or resource that is currently authorized for interaction on behalf of an account is of the form <localpart@domainpart/resourcepart> (e.g., <juliet@im.example.com/balcony>, called a "full JID").

For historical reasons, XMPP addresses are often called Jabber IDs or JIDs. Because the formal specification of the XMPP address format depends on internationalization technologies that are in flux at the time of writing, the format is defined in [XMPP-ADDR] instead of this document. The terms "localpart", "domainpart", and "resourcepart" are defined more formally in [XMPP-ADDR].

2.2. Presence
XMPP includes the ability for an entity to advertise its network availability or "presence" to other entities. In XMPP, this availability for communication is signaled end-to-end by means of a dedicated communication primitive: the <presence/> stanza. Although knowledge of network availability is not strictly necessary for the exchange of XMPP messages, it facilitates real-time interaction because the originator of a message can know before initiating communication that the intended recipient is online and available. End-to-end presence is defined in [XMPP-IM].

2.3. Persistent Streams
Availability for communication is also built into each point-to-point "hop" through the use of persistent XML streams over long-lived TCP connections. These "always-on" client-to-server and server-to-server streams enable each party to push data to the other party at any time for immediate routing or delivery. XML streams are defined under Section 4.

2.4. Structured Data
The basic protocol data unit in XMPP is not an XML stream (which simply provides the transport for point-to-point communication) but an XML "stanza", which is essentially a fragment of XML that is sent over a stream. The root element of a stanza includes routing attributes (such as "from" and "to" addresses), and the child elements of the stanza contain a payload for delivery to the intended recipient. XML stanzas are defined under Section 8.

2.5. Distributed Network of Clients and Servers
In practice, XMPP consists of a network of clients and servers that inter-communicate (however, communication between any two given deployed servers is strictly discretionary and a matter of local service policy). Thus, for example, the user <juliet@im.example.com> associated with the server <im.example.com> might be able to exchange messages, presence, and other structured data with the user <romeo@example.net> associated with the server <example.net>. This pattern is familiar from messaging protocols that make use of global addresses, such as the email network (see [SMTP] and [EMAIL-ARCH]). As a result, end-to-end communication in XMPP is logically peer-to-peer but physically client-to-server-to-server-to-client, as illustrated in the following diagram.

   example.net <--------------> im.example.com
      ^                                ^
      |                                |
      v                                v
   romeo@example.net           juliet@im.example.com

   Figure 1: Distributed Client-Server Architecture

Informational Note: Architectures that employ XML streams (Section 4) and XML stanzas (Section 8) but that establish peer-to-peer connections directly between clients using technologies based on [LINKLOCAL] have been deployed, but such architectures are not defined in this specification and are best described as "XMPP-like"; for details, see [XEP-0174]. In addition, XML streams can be established end-to-end over any reliable transport, including extensions to XMPP itself; however, such methods are out of scope for this specification.

The following paragraphs describe the responsibilities of clients and servers on the network.

A client is an entity that establishes an XML stream with a server by authenticating using the credentials of a registered account (via SASL negotiation (Section 6)) and that then completes resource binding (Section 7) in order to enable delivery of XML stanzas between the server and the client over the negotiated stream. The client then uses XMPP to communicate with its server, other clients, and any other entities on the network, where the server is responsible for delivering stanzas to other connected clients at the same server or routing them to remote servers.
Multiple clients can connect simultaneously to a server on behalf of the same registered account, where each client is differentiated by the resourcepart of an XMPP address (e.g., <juliet@im.example.com/balcony> vs. <juliet@im.example.com/chamber>), as defined under [XMPP-ADDR] and Section 7.
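
The three address forms above (<domainpart>, bare JID, and full JID) can be told apart mechanically. The following Python fragment is a minimal sketch only: the names `Jid` and `split_jid` are hypothetical, and the code performs none of the internationalization, length, or character checks that [XMPP-ADDR] requires. It assumes the split rule that the resourcepart is whatever follows the first "/" and the localpart is whatever precedes the first "@" in the remainder.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    """Illustrative container for the three parts of an XMPP address."""
    localpart: Optional[str]
    domainpart: str
    resourcepart: Optional[str]

    def bare(self) -> str:
        """Return <localpart@domainpart>, or <domainpart> for a server address."""
        if self.localpart is None:
            return self.domainpart
        return f"{self.localpart}@{self.domainpart}"


def split_jid(text: str) -> Jid:
    """Split an address into its parts without validating them."""
    # Take the resourcepart first: it is everything after the first "/",
    # so an "@" or "/" inside the resourcepart does not confuse the split.
    head, slash, resource = text.partition("/")
    local, at, domain = head.partition("@")
    if not at:
        # No "@" present: the whole head is the domainpart.
        local, domain = None, head
    if not domain or local == "" or (slash and not resource):
        raise ValueError(f"not a well-formed JID: {text!r}")
    return Jid(local, domain, resource if slash else None)


balcony = split_jid("juliet@im.example.com/balcony")
chamber = split_jid("juliet@im.example.com/chamber")
# Two connected resources of one account share a bare JID.
same_account = balcony.bare() == chamber.bare()
```

In this sketch the two example resources compare equal on `bare()` and differ only in `resourcepart`, which is the property the server relies on to distinguish simultaneous clients of one account.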
A server is an entity whose primary responsibilities are to:

o Manage XML streams (Section 4) with connected clients and deliver XML stanzas (Section 8) to those clients over the negotiated streams; this includes responsibility for ensuring that a client authenticates with the server before being granted access to the XMPP network.

o Subject to local service policies on server-to-server communication, manage XML streams (Section 4) with remote servers and route XML stanzas (Section 8) to those servers over the negotiated streams.

Depending on the application, the secondary responsibilities of an XMPP server can include:

o Storing data that is used by clients (e.g., contact lists for users of XMPP-based instant messaging and presence applications as defined in [XMPP-IM]); in this case, the relevant XML stanza is handled directly by the server itself on behalf of the client and is not routed to a remote server or delivered to a connected client.

o Hosting add-on services that also use XMPP as the basis for communication but that provide additional functionality beyond that defined in this document or in [XMPP-IM]; examples include multi-user conferencing services as specified in [XEP-0045] and publish-subscribe services as specified in [XEP-0060].

3. TCP Binding
3.1. Scope
As XMPP is defined in this specification, an initiating entity (client or server) MUST open a Transmission Control Protocol [TCP] connection to the receiving entity (server) before it negotiates XML streams with the receiving entity. The parties then maintain that TCP connection for as long as the XML streams are in use. The rules specified in the following sections apply to the TCP binding. Informational Note: There is no necessary coupling of XML streams to TCP, and other transports are possible. For example, two entities could connect to each other by means of [HTTP] as specified in [XEP-0124] and [XEP-0206]. However, this specification defines only a binding of XMPP to TCP.
3.2. Resolution of Fully Qualified Domain Names
Because XML streams are sent over TCP, the initiating entity needs to determine the IPv4 or IPv6 address (and port) of the receiving entity before it can attempt to open an XML stream. Typically this is done by resolving the receiving entity's fully qualified domain name or FQDN (see [DNS-CONCEPTS]).

3.2.1. Preferred Process: SRV Lookup
The preferred process for FQDN resolution is to use [DNS-SRV] records as follows:

1. The initiating entity constructs a DNS SRV query whose inputs are:

   * a Service of "xmpp-client" (for client-to-server connections) or "xmpp-server" (for server-to-server connections)

   * a Proto of "tcp"

   * a Name corresponding to the "origin domain" [TLS-CERTS] of the XMPP service to which the initiating entity wishes to connect (e.g., "example.net" or "im.example.com")

2. The result is a query such as "_xmpp-client._tcp.example.net." or "_xmpp-server._tcp.im.example.com.".

3. If a response is received, it will contain one or more combinations of a port and FQDN, each of which is weighted and prioritized as described in [DNS-SRV]. (However, if the result of the SRV lookup is a single resource record with a Target of ".", i.e., the root domain, then the initiating entity MUST abort SRV processing at this point because according to [DNS-SRV] such a Target "means that the service is decidedly not available at this domain".)

4. The initiating entity chooses at least one of the returned FQDNs to resolve (following the rules in [DNS-SRV]), which it does by performing DNS "A" or "AAAA" lookups on the FQDN; this will result in an IPv4 or IPv6 address.

5. The initiating entity uses the IP address(es) from the successfully resolved FQDN (with the corresponding port number returned by the SRV lookup) as the connection address for the receiving entity.

6. If the initiating entity fails to connect using that IP address but the "A" or "AAAA" lookups returned more than one IP address, then the initiating entity uses the next resolved IP address for that FQDN as the connection address.

7. If the initiating entity fails to connect using all resolved IP addresses for a given FQDN, then it repeats the process of resolution and connection for the next FQDN returned by the SRV lookup based on the priority and weight as defined in [DNS-SRV].

8. If the initiating entity receives a response to its SRV query but it is not able to establish an XMPP connection using the data received in the response, it SHOULD NOT attempt the fallback process described in the next section (this helps to prevent a state mismatch between inbound and outbound connections).

9. If the initiating entity does not receive a response to its SRV query, it SHOULD attempt the fallback process described in the next section.

3.2.2. Fallback Processes
The fallback process SHOULD be a normal "A" or "AAAA" address record resolution to determine the IPv4 or IPv6 address of the origin domain, where the port used is the "xmpp-client" port of 5222 for client-to-server connections or the "xmpp-server" port of 5269 for server-to-server connections (these are the default ports as registered with the IANA as described under Section 14.7).

If connections via TCP are unsuccessful, the initiating entity might attempt to find and use alternative connection methods such as the HTTP binding (see [XEP-0124] and [XEP-0206]), which might be discovered using [DNS-TXT] records as described in [XEP-0156].

3.2.3. When Not to Use SRV
If the initiating entity has been explicitly configured to associate a particular FQDN (and potentially port) with the origin domain of the receiving entity (say, to "hardcode" an association from an origin domain of example.net to a configured FQDN of apps.example.com), the initiating entity is encouraged to use the configured name instead of performing the preferred SRV resolution process on the origin domain.
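
The decision logic of Sections 3.2.1 through 3.2.3 can be sketched without touching the network. The Python fragment below is illustrative only: `SrvRecord`, `order_records`, and `candidate_endpoints` are hypothetical names, the SRV response is passed in as already-parsed data (`None` standing for "no response"), and the weighted ordering is a plain rendering of the selection procedure in [DNS-SRV]. The subsequent "A"/"AAAA" lookups and the connection attempts themselves are left out.

```python
import random
from dataclasses import dataclass

# Default ports named in the fallback process.
DEFAULT_PORTS = {"xmpp-client": 5222, "xmpp-server": 5269}


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(service, origin_domain):
    """Build the query name from the Service, the Proto "tcp", and the Name."""
    return f"_{service}._tcp.{origin_domain}."


def order_records(records, rng=random):
    """Order records by ascending priority, then by weighted random choice."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # Zero-weight records go first so they are rarely (but sometimes) picked.
        group = sorted((r for r in records if r.priority == priority),
                       key=lambda r: r.weight != 0)
        while group:
            pick = rng.randint(0, sum(r.weight for r in group))
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(index))
                    break
    return ordered


def candidate_endpoints(service, origin_domain, srv_response,
                        configured=None, rng=random):
    """Return (fqdn, port) pairs to try, in order."""
    if configured is not None:
        # Explicit configuration takes the place of SRV resolution.
        return [configured]
    if srv_response is None:
        # No SRV response: fall back to the origin domain and default port.
        return [(origin_domain, DEFAULT_PORTS[service])]
    if len(srv_response) == 1 and srv_response[0].target == ".":
        # A lone Target of "." means the service is not offered here.
        return []
    # A response was received, so only SRV targets are tried (no fallback).
    return [(r.target.rstrip("."), r.port)
            for r in order_records(srv_response, rng)]
```

For instance, with one record at priority 10 and another at priority 20, the priority-10 target is always tried first; weights only matter among records that share a priority.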
3.2.4. Use of SRV Records with Add-On Services
Many XMPP servers are implemented in such a way that they can host add-on services (beyond those defined in this specification and [XMPP-IM]) at DNS domain names that typically are "subdomains" of the main XMPP service (e.g., conference.example.net for a [XEP-0045] service associated with the example.net XMPP service) or "subdomains" of the first-level domain of the underlying service (e.g., muc.example.com for a [XEP-0045] service associated with the im.example.com XMPP service). If an entity associated with a remote XMPP server wishes to communicate with such an add-on service, it would generate an appropriate XML stanza and the remote server would attempt to resolve the add-on service's DNS domain name via an SRV lookup on resource records such as "_xmpp-server._tcp.conference.example.net." or "_xmpp-server._tcp.muc.example.com.". Therefore, if the administrators of an XMPP service wish to enable entities associated with remote servers to access such add-on services, they need to advertise the appropriate "_xmpp-server" SRV records in addition to the "_xmpp-server" record for their main XMPP service. In case SRV records are not available, the fallback methods described under Section 3.2.2 can be used to resolve the DNS domain names of add-on services.

3.3. Reconnection
It can happen that an XMPP server goes offline unexpectedly while servicing TCP connections from connected clients and remote servers. Because the number of such connections can be quite large, the reconnection algorithm employed by entities that seek to reconnect can have a significant impact on software performance and network congestion. If an entity chooses to reconnect, it:

o SHOULD set the number of seconds that expire before reconnecting to an unpredictable number between 0 and 60 (this helps to ensure that not all entities attempt to reconnect at exactly the same number of seconds after being disconnected).

o SHOULD back off increasingly on the time between subsequent reconnection attempts (e.g., in accordance with "truncated binary exponential backoff" as described in [ETHERNET]) if the first reconnection attempt does not succeed.

It is RECOMMENDED to make use of TLS session resumption [TLS-RESUME] when reconnecting. A future version of this document, or a separate specification, might provide more detailed guidelines regarding methods for speeding the reconnection process.
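
One possible reading of the two reconnection recommendations is sketched below in Python. This is an assumption-laden illustration, not a prescribed algorithm: the function name `reconnect_delays` is hypothetical, the choice to double the random window on every failed attempt is one adaptation of truncated binary exponential backoff, and the 960-second ceiling (`max_window`) is an arbitrary value that does not come from this specification.

```python
import random


def reconnect_delays(attempts, rng=random, first_window=60.0, max_window=960.0):
    """Return one randomized delay (in seconds) per reconnection attempt.

    The first delay is drawn from 0..first_window; each later attempt doubles
    the window, and the window is truncated at max_window (an assumed cap).
    """
    delays = []
    for attempt in range(attempts):
        window = min(max_window, first_window * (2 ** attempt))
        delays.append(rng.uniform(0.0, window))
    return delays


# A seeded generator keeps the illustration repeatable; a deployed client
# would use an unseeded source so that peers do not reconnect in lockstep.
example = reconnect_delays(8, random.Random(7))
```

Drawing each delay from the whole window, instead of sleeping for the window length itself, keeps reconnecting entities spread out even after several failed attempts.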
3.4. Reliability
The use of long-lived TCP connections in XMPP implies that the sending of XML stanzas over XML streams can be unreliable, since the parties to a long-lived TCP connection might not discover a connectivity disruption in a timely manner. At the XMPP application layer, long connectivity disruptions can result in undelivered stanzas. Although the core XMPP technology defined in this specification does not contain features to overcome this lack of reliability, there exist XMPP extensions for doing so (e.g., [XEP-0198]).

4. XML Streams
4.1. Stream Fundamentals
Two fundamental concepts make possible the rapid, asynchronous exchange of relatively small payloads of structured information between XMPP entities: XML streams and XML stanzas. These terms are defined as follows.

Definition of XML Stream: An XML stream is a container for the exchange of XML elements between any two entities over a network. The start of an XML stream is denoted unambiguously by an opening "stream header" (i.e., an XML <stream> tag with appropriate attributes and namespace declarations), while the end of the XML stream is denoted unambiguously by a closing XML </stream> tag. During the life of the stream, the entity that initiated it can send an unbounded number of XML elements over the stream, either elements used to negotiate the stream (e.g., to complete TLS negotiation (Section 5) or SASL negotiation (Section 6)) or XML stanzas. The "initial stream" is negotiated from the initiating entity (typically a client or server) to the receiving entity (typically a server), and can be seen as corresponding to the initiating entity's "connection to" or "session with" the receiving entity. The initial stream enables unidirectional communication from the initiating entity to the receiving entity; in order to enable exchange of stanzas from the receiving entity to the initiating entity, the receiving entity MUST negotiate a stream in the opposite direction (the "response stream").

Definition of XML Stanza: An XML stanza is the basic unit of meaning in XMPP. A stanza is a first-level element (at depth=1 of the stream) whose element name is "message", "presence", or "iq" and whose qualifying namespace is 'jabber:client' or 'jabber:server'. By contrast, a first-level element qualified by any other namespace is not an XML stanza (stream errors, stream features, TLS-related elements, SASL-related elements, etc.), nor is a <message/>, <presence/>, or <iq/> element that is qualified by the 'jabber:client' or 'jabber:server' namespace but that occurs at a depth other than one (e.g., a <message/> element contained within an extension element (Section 8.4) for reporting purposes), nor is a <message/>, <presence/>, or <iq/> element that is qualified by a namespace other than 'jabber:client' or 'jabber:server'. An XML stanza typically contains one or more child elements (with accompanying attributes, elements, and XML character data) as necessary in order to convey the desired information, which MAY be qualified by any XML namespace (see [XML-NAMES] as well as Section 8.4 in this specification).

There are three kinds of stanzas: message, presence, and IQ (short for "Info/Query"). These stanza types provide three different communication primitives: a "push" mechanism for generalized messaging, a specialized "publish-subscribe" mechanism for broadcasting information about network availability, and a "request-response" mechanism for more structured exchanges of data (similar to [HTTP]). Further explanations are provided under Section 8.2.1, Section 8.2.2, and Section 8.2.3, respectively.

Consider the example of a client's connection to a server. The client initiates an XML stream by sending a stream header to the server, preferably preceded by an XML declaration specifying the XML version and the character encoding supported (see Section 11.5 and Section 11.6). Subject to local policies and service provisioning, the server then replies with a second XML stream back to the client, again preferably preceded by an XML declaration. Once the client has completed SASL negotiation (Section 6) and resource binding (Section 7), the client can send an unbounded number of XML stanzas over the stream. When the client desires to close the stream, it simply sends a closing </stream> tag to the server as further described under Section 4.4.
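
The three-part stanza test in the definition above (depth, element name, qualifying namespace) can be written down directly. The Python fragment below is a sketch under simplifying assumptions: it parses a complete, closed stream as one document with the standard library's ElementTree, whereas a real implementation has to parse incrementally because a live stream is not closed until the session ends. The function name `is_stanza` and the 'urn:example:query' payload namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

STANZA_NAMES = {"message", "presence", "iq"}
CONTENT_NAMESPACES = {"jabber:client", "jabber:server"}


def is_stanza(element, depth):
    """Apply the definition: depth 1, stanza name, content namespace."""
    if depth != 1 or not element.tag.startswith("{"):
        return False
    # ElementTree renders a qualified name as "{namespace}localname".
    namespace, _, name = element.tag[1:].partition("}")
    return namespace in CONTENT_NAMESPACES and name in STANZA_NAMES


STREAM = """<stream:stream xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>
  <stream:features/>
  <presence><show>away</show></presence>
  <message to='foo'><body>hello</body></message>
  <iq to='bar' type='get' id='q1'><query xmlns='urn:example:query'/></iq>
</stream:stream>"""

root = ET.fromstring(STREAM)
# Children of the stream root are the first-level (depth=1) elements.
first_level = [is_stanza(child, 1) for child in root]
```

Here the <stream:features/> element fails the test because it is qualified by the stream namespace, while the same <message/> element examined at any depth other than one would fail on depth alone.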
In essence, then, one XML stream functions as an envelope for the XML stanzas sent during a session and another XML stream functions as an envelope for the XML stanzas received during a session. We can represent this in a simplistic fashion as follows.
   +--------------------+--------------------+
   | INITIAL STREAM     |  RESPONSE STREAM   |
   +--------------------+--------------------+
   | <stream>           |                    |
   |--------------------|--------------------|
   |                    | <stream>           |
   |--------------------|--------------------|
   | <presence>         |                    |
   |   <show/>          |                    |
   | </presence>        |                    |
   |--------------------|--------------------|
   | <message to='foo'> |                    |
   |   <body/>          |                    |
   | </message>         |                    |
   |--------------------|--------------------|
   | <iq to='bar'       |                    |
   |     type='get'>    |                    |
   |   <query/>         |                    |
   | </iq>              |                    |
   |--------------------|--------------------|
   |                    | <iq from='bar'     |
   |                    |     type='result'> |
   |                    |   <query/>         |
   |                    | </iq>              |
   |--------------------|--------------------|
   | [ ... ]            |                    |
   |--------------------|--------------------|
   |                    | [ ... ]            |
   |--------------------|--------------------|
   | </stream>          |                    |
   |--------------------|--------------------|
   |                    | </stream>          |
   +--------------------+--------------------+

   Figure 2: A Simplistic View of Two Streams

Those who are accustomed to thinking of XML in a document-centric manner might find the following analogies useful:

o The two XML streams are like two "documents" (matching the "document" production from [XML]) that are built up through the accumulation of XML stanzas.

o The root <stream/> element is like the "document entity" for each "document" (as described in Section 4.8 of [XML]).

o The XML stanzas sent over the streams are like "fragments" of the "documents" (as described in [XML-FRAG]).
However, these descriptions are merely analogies, because XMPP does not deal in documents and fragments but in streams and stanzas.

The remainder of this section defines the following aspects of XML streams (along with related topics):

o How to open a stream (Section 4.2)
o The stream negotiation process (Section 4.3)
o How to close a stream (Section 4.4)
o The directionality of XML streams (Section 4.5)
o How to handle peers that are silent (Section 4.6)
o The XML attributes of a stream (Section 4.7)
o The XML namespaces of a stream (Section 4.8)
o Error handling related to XML streams (Section 4.9)

4.2. Opening a Stream
After connecting to the appropriate IP address and port of the receiving entity, the initiating entity opens a stream by sending a stream header (the "initial stream header") to the receiving entity.

   I: <?xml version='1.0'?>
      <stream:stream
          from='juliet@im.example.com'
          to='im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

The receiving entity then replies by sending a stream header of its own (the "response stream header") to the initiating entity.

   R: <?xml version='1.0'?>
      <stream:stream
          from='im.example.com'
          id='++TR84Sm6A3hnt3Q065SnAbbk3Y='
          to='juliet@im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

The entities can then proceed with the remainder of the stream negotiation process.

4.3. Stream Negotiation
4.3.1. Basic Concepts
Because the receiving entity for a stream acts as a gatekeeper to the domains it services, it imposes certain conditions for connecting as a client or as a peer server. At a minimum, the initiating entity needs to authenticate with the receiving entity before it is allowed to send stanzas to the receiving entity (for client-to-server streams this means using SASL as described under Section 6). However, the receiving entity can consider conditions other than authentication to be mandatory-to-negotiate, such as encryption using TLS as described under Section 5. The receiving entity informs the initiating entity about such conditions by communicating "stream features": the set of particular protocol interactions that the initiating entity needs to complete before the receiving entity will accept XML stanzas from the initiating entity, as well as any protocol interactions that are voluntary-to-negotiate but that might improve the handling of an XML stream (e.g., establishment of application-layer compression as described in [XEP-0138]).

The existence of conditions for connecting implies that streams need to be negotiated. The order of layers (TCP, then TLS, then SASL, then XMPP as described under Section 13.3) implies that stream negotiation is a multi-stage process. Further structure is imposed by two factors: (1) a given stream feature might be offered only to certain entities or only after certain other features have been negotiated (e.g., resource binding is offered only after SASL authentication), and (2) stream features can be either mandatory-to-negotiate or voluntary-to-negotiate. Finally, for security reasons the parties to a stream need to discard knowledge that they gained during the negotiation process after successfully completing the protocol interactions defined for certain features (e.g., TLS in all cases and SASL in the case when a security layer might be established, as defined in the specification for the relevant SASL mechanism). This is done by flushing the old stream context and exchanging new stream headers over the existing TCP connection.

4.3.2. Stream Features Format
If the initiating entity includes in the initial stream header the 'version' attribute set to a value of at least "1.0" (see Section 4.7.5), after sending the response stream header the receiving entity MUST send a <features/> child element (typically prefixed by the stream namespace prefix as described under Section 4.8.5) to the initiating entity in order to announce any conditions for continuation of the stream negotiation process. Each condition takes the form of a child element of the <features/> element, qualified by a namespace that is different from the stream namespace and the content namespace. The <features/> element can contain one child, contain multiple children, or be empty.

Implementation Note: The order of child elements contained in any given <features/> element is not significant.

If a particular stream feature is or can be mandatory-to-negotiate, the definition of that feature needs to do one of the following:

1. Declare that the feature is always mandatory-to-negotiate (e.g., this is true of resource binding for XMPP clients); or

2. Specify a way for the receiving entity to flag the feature as mandatory-to-negotiate for this interaction (e.g., for STARTTLS, this is done by including an empty <required/> element in the advertisement for that stream feature, but that is not a generic format for all stream features); it is RECOMMENDED that stream feature definitions for new mandatory-to-negotiate features do so by including an empty <required/> element as is done for STARTTLS.

Informational Note: Because there is no generic format for indicating that a feature is mandatory-to-negotiate, it is possible that a feature that is not understood by the initiating entity might be considered mandatory-to-negotiate by the receiving entity, resulting in failure of the stream negotiation process. Although such an outcome would be undesirable, the working group deemed it rare enough that a generic format was not needed.

For security reasons, certain stream features require the initiating entity to send a new initial stream header upon successful negotiation of the feature (e.g., TLS in all cases and SASL in the case when a security layer might be established). If this is true of a given stream feature, the definition of that feature needs to specify that a stream restart is expected after negotiation of the feature.

A <features/> element that contains at least one mandatory-to-negotiate feature indicates that the stream negotiation is not complete and that the initiating entity MUST negotiate further features.

   R: <stream:features>
        <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'>
          <required/>
        </starttls>
      </stream:features>

A <features/> element MAY contain more than one mandatory-to-negotiate feature. This means that the initiating entity can choose among the mandatory-to-negotiate features at this stage of the stream negotiation process. As an example, perhaps a future technology will perform roughly the same function as TLS, so the receiving entity might advertise support for both TLS and the future technology at the same stage of the stream negotiation process. However, this applies only at a given stage of the stream negotiation process and does not apply to features that are mandatory-to-negotiate at different stages (e.g., the receiving entity would not advertise both STARTTLS and SASL as mandatory-to-negotiate, or both SASL and resource binding as mandatory-to-negotiate, because TLS would need to be negotiated before SASL and because SASL would need to be negotiated before resource binding).

A <features/> element that contains both mandatory-to-negotiate and voluntary-to-negotiate features indicates that the negotiation is not complete but that the initiating entity MAY complete the voluntary-to-negotiate feature(s) before it attempts to negotiate the mandatory-to-negotiate feature(s).

   R: <stream:features>
        <bind xmlns='urn:ietf:params:xml:ns:xmpp-bind'/>
        <compression xmlns='http://jabber.org/features/compress'>
          <method>zlib</method>
          <method>lzw</method>
        </compression>
      </stream:features>

A <features/> element that contains only voluntary-to-negotiate features indicates that the stream negotiation is complete and that the initiating entity is cleared to send XML stanzas, but that the initiating entity MAY negotiate further features if desired.

   R: <stream:features>
        <compression xmlns='http://jabber.org/features/compress'>
          <method>zlib</method>
          <method>lzw</method>
        </compression>
      </stream:features>

An empty <features/> element indicates that the stream negotiation is complete and that the initiating entity is cleared to send XML stanzas.

   R: <stream:features/>

4.3.3. Restarts
On successful negotiation of a feature that necessitates a stream restart, both parties MUST consider the previous stream to be replaced but MUST NOT send a closing </stream> tag and MUST NOT terminate the underlying TCP connection; instead, the parties MUST reuse the existing connection, which might be in a new state (e.g., encrypted as a result of TLS negotiation). The initiating entity then MUST send a new initial stream header, which SHOULD be preceded by an XML declaration as described under Section 11.5. When the receiving entity receives the new initial stream header, it MUST generate a new stream ID (instead of reusing the old stream ID) before sending a new response stream header (which SHOULD be preceded by an XML declaration as described under Section 11.5).

4.3.4. Resending Features
The receiving entity MUST send an updated list of stream features to the initiating entity after a stream restart. The list of updated features MAY be empty if there are no further features to be advertised or MAY include any combination of features.

4.3.5. Completion of Stream Negotiation
The receiving entity indicates completion of the stream negotiation process by sending to the initiating entity either an empty <features/> element or a <features/> element that contains only voluntary-to-negotiate features. After doing so, the receiving entity MAY send an empty <features/> element (e.g., after negotiation of such voluntary-to-negotiate features) but MUST NOT send additional stream features to the initiating entity (if the receiving entity has new features to offer, preferably limited to mandatory-to-negotiate or security-critical features, it can simply close the stream with a <reset/> stream error (Section 4.9.3.16) and then advertise the new features when the initiating entity reconnects, preferably closing existing streams in a staggered way so that not all of the initiating entities reconnect at once). Once stream negotiation is complete, the initiating entity is cleared to send XML stanzas over the stream for as long as the stream is maintained by both parties.

Informational Note: Resource binding as specified under Section 7 is an historical exception to the foregoing rule, since it is mandatory-to-negotiate for clients but uses XML stanzas for negotiation purposes.

The initiating entity MUST NOT attempt to send XML stanzas (Section 8) to entities other than itself (i.e., the client's connected resource or any other authenticated resource of the client's account) or the server to which it is connected until stream negotiation has been completed. Even if the initiating entity does attempt to do so, the receiving entity MUST NOT accept such stanzas and MUST close the stream with a <not-authorized/> stream error (Section 4.9.3.12). This rule applies to XML stanzas only (i.e., <message/>, <presence/>, and <iq/> elements qualified by the content namespace) and not to XML elements used for stream negotiation (e.g., elements used to complete TLS negotiation (Section 5) or SASL negotiation (Section 6)).

4.3.6. Determination of Addresses
After the parties to an XML stream have completed the appropriate aspects of stream negotiation, the receiving entity for a stream MUST determine the initiating entity's JID. For client-to-server communication, both SASL negotiation (Section 6) and resource binding (Section 7) MUST be completed before the server can determine the client's address. The client's bare JID (<localpart@domainpart>) MUST be the authorization identity (as defined by [SASL]), either (1) as directly communicated by the client during SASL negotiation (Section 6) or (2) as derived by the server from the authentication identity if no authorization identity was specified during SASL negotiation. The resourcepart of the full JID (<localpart@domainpart/resourcepart>) MUST be the resource negotiated by the client and server during resource binding (Section 7). A client MUST NOT attempt to guess at its JID but instead MUST consider its JID to be whatever the server returns to it during resource binding. The server MUST ensure that the resulting JID (including localpart, domainpart, resourcepart, and separator characters) conforms to the canonical format for XMPP addresses defined in [XMPP-ADDR]; to meet this restriction, the server MAY replace the JID sent by the client with the canonicalized JID as determined by the server and communicate that JID to the client during resource binding.
For server-to-server communication, the initiating server's bare JID (<domainpart>) MUST be the authorization identity (as defined by [SASL]), either (1) as directly communicated by the initiating server during SASL negotiation (Section 6) or (2) as derived by the receiving server from the authentication identity if no authorization identity was specified during SASL negotiation. In the absence of SASL negotiation, the receiving server MAY consider the authorization identity to be an identity negotiated within the relevant verification protocol (e.g., the 'from' attribute of the <result/> element in Server Dialback [XEP-0220]).

Security Warning: Because it is possible for a third party to tamper with information that is sent over the stream before a security layer such as TLS is successfully negotiated, it is advisable for the receiving server to treat any such unprotected information with caution; this applies especially to the 'from' and 'to' addresses on the first initial stream header sent by the initiating entity.

4.3.7. Flow Chart
We summarize the foregoing rules in the following non-normative flow chart for the stream negotiation process, presented from the perspective of the initiating entity.

                     +---------------------+
                     | open TCP connection |
                     +---------------------+
                                |
                                v
                         +---------------+
                         | send initial  |<-------------------------+
                         | stream header |                          ^
                         +---------------+                          |
                                |                                   |
                                v                                   |
                        +------------------+                        |
                        | receive response |                        |
                        | stream header    |                        |
                        +------------------+                        |
                                |                                   |
                                v                                   |
                         +----------------+                         |
                         | receive stream |                         |
     +------------------>| features       |                         |
     ^   {OPTIONAL}      +----------------+                         |
     |                          |                                   |
     |                          v                                   |
     |       +<-----------------+                                   |
     |       |                                                      |
     |    {empty?} ----> {all voluntary?} ----> {some mandatory?}   |
     |       |      no          |          no         |             |
     |       | yes              | yes                 | yes         |
     |       |                  v                     v             |
     |       |           +---------------+    +----------------+    |
     |       |           | MAY negotiate |    | MUST negotiate |    |
     |       |           | any or none   |    | one feature    |    |
     |       |           +---------------+    +----------------+    |
     |       v                  |                     |             |
     |   +---------+            v                     |             |
     |   |  DONE   |<----- {negotiate?}               |             |
     |   +---------+   no       |                     |             |
     |                     yes  |                     |             |
     |                          v                     v             |
     |                          +--------->+<---------+             |
     |                                     |                        |
     |                                     v                        |
     +<-------------------------- {restart mandatory?} ------------>+
                    no                                      yes

                 Figure 3: Stream Negotiation Flow Chart

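The three-way decision in the middle of the flow chart can be expressed as a small function. This is only a non-normative sketch: the `Feature` type, the `next_step` name, and the returned labels are hypothetical and serve purely to illustrate the branching on an advertised feature set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    name: str        # illustrative label only, e.g. "starttls"
    mandatory: bool  # True: mandatory-to-negotiate; False: voluntary-to-negotiate


def next_step(features: List[Feature]) -> str:
    """Classify a received stream features advertisement.

    Returns "done" for an empty set, "may-negotiate" when every feature
    is voluntary, and "must-negotiate" when at least one is mandatory.
    """
    if not features:
        return "done"
    if all(not feature.mandatory for feature in features):
        return "may-negotiate"
    return "must-negotiate"
```

Whether the stream is then restarted depends on the feature that was negotiated, which this sketch does not model.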
4.4. Closing a Stream
An XML stream from one entity to another can be closed at any time, either because a specific stream error (Section 4.9) has occurred or in the absence of an error (e.g., when a client simply ends its session).

A stream is closed by sending a closing </stream> tag.

   E: </stream:stream>

If the parties are using either two streams over a single TCP connection or two streams over two TCP connections, the entity that sends the closing stream tag MUST behave as follows:

1. Wait for the other party to also close its outbound stream before terminating the underlying TCP connection(s); this gives the other party an opportunity to finish transmitting any outbound data to the closing entity before the termination of the TCP connection(s).

2. Refrain from sending any further data over its outbound stream to the other entity, but continue to process data received from the other entity (and, if necessary, act on such data).

3. Consider both streams to be void if the other party does not send its closing stream tag within a reasonable amount of time (where the definition of "reasonable" is a matter of implementation or deployment).

4. After receiving a reciprocal closing stream tag from the other party or waiting a reasonable amount of time with no response, terminate the underlying TCP connection(s).

Security Warning: In accordance with Section 7.2.1 of [TLS], to help prevent a truncation attack the party that is closing the stream MUST send a TLS close_notify alert and MUST receive a responding close_notify alert from the other party before terminating the underlying TCP connection(s).

If the parties are using multiple streams over multiple TCP connections, there is no defined pairing of streams and therefore the behavior is a matter for implementation.
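
The closing behavior for the paired-stream cases might be tracked with a small helper like the one below. It is a sketch under stated assumptions: the class and method names are hypothetical, the `send` and `terminate_tcp` callbacks stand in for whatever transport layer an implementation uses, the 30-second default is an arbitrary stand-in for a "reasonable" wait, and the TLS close_notify exchange is not modeled.

```python
import time
from typing import Callable, Optional


class ClosingHandshake:
    """Tracks the closing handshake from the side that closes first."""

    def __init__(self, send: Callable[[str], None],
                 terminate_tcp: Callable[[], None],
                 timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._terminate_tcp = terminate_tcp
        self._timeout = timeout
        self._clock = clock
        self._deadline: Optional[float] = None
        self.terminated = False

    @property
    def closing(self) -> bool:
        return self._deadline is not None

    def close(self) -> None:
        """Send the closing tag and start waiting for the peer."""
        if not self.closing:
            self._send("</stream:stream>")
            self._deadline = self._clock() + self._timeout

    def send(self, data: str) -> None:
        """Refuse further outbound data once the closing tag was sent."""
        if self.closing:
            raise RuntimeError("outbound stream already closed")
        self._send(data)

    def on_peer_closed(self) -> None:
        """The peer sent its reciprocal closing tag."""
        if self.closing:
            self._finish()

    def poll(self) -> None:
        """Give up waiting once the timeout has elapsed."""
        if self.closing and self._clock() >= self._deadline:
            self._finish()

    def _finish(self) -> None:
        if not self.terminated:
            self.terminated = True
            self._terminate_tcp()
```

Inbound data can still be handed to the application while `closing` is true; only the outbound direction is blocked.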
4.5. Directionality
An XML stream is always unidirectional, by which is meant that XML stanzas can be sent in only one direction over the stream (either from the initiating entity to the receiving entity or from the receiving entity to the initiating entity).

Depending on the type of session that has been negotiated and the nature of the entities involved, the entities might use:

o  Two streams over a single TCP connection, where the security context negotiated for the first stream is applied to the second stream. This is typical for client-to-server sessions, and a server MUST allow a client to use the same TCP connection for both streams.

o  Two streams over two TCP connections, where each stream is separately secured. In this approach, one TCP connection is used for the stream in which stanzas are sent from the initiating entity to the receiving entity, and the other TCP connection is used for the stream in which stanzas are sent from the receiving entity to the initiating entity. This is typical for server-to-server sessions.

o  Multiple streams over two or more TCP connections, where each stream is separately secured. This approach is sometimes used for server-to-server communication between two large XMPP service providers; however, this can make it difficult to maintain coherence of data received over multiple streams in situations described under Section 10.1, which is why a server MAY close the stream with a <conflict/> stream error (Section 4.9.3.3) if a remote server attempts to negotiate more than one stream (as described under Section 4.9.3.3).

This concept of directionality applies only to stanzas and explicitly does not apply to first-level children of the stream root that are used to bootstrap or manage the stream (e.g., first-level elements used for TLS negotiation, SASL negotiation, Server Dialback [XEP-0220], and Stream Management [XEP-0198]).

The foregoing considerations imply that while completing STARTTLS negotiation (Section 5) and SASL negotiation (Section 6) two servers would use one TCP connection, but after the stream negotiation process is done that original TCP connection would be used only for the initiating server to send XML stanzas to the receiving server. In order for the receiving server to send XML stanzas to the initiating server, the receiving server would need to reverse the roles and negotiate an XML stream from the receiving server to the initiating server over a separate TCP connection. This separate TCP connection is then secured using a new round of TLS and/or SASL negotiation.

Implementation Note: For historical reasons, a server-to-server session always uses two TCP connections. While that approach remains the standard behavior described in this document, extensions such as [XEP-0288] enable servers to negotiate the use of a single TCP connection for bidirectional stanza exchange.

Informational Note: Although XMPP developers sometimes apply the terms "unidirectional" and "bidirectional" to the underlying TCP connection (e.g., calling the TCP connection for a client-to-server session "bidirectional" and the TCP connection for a server-to-server session "unidirectional"), strictly speaking a stream is always unidirectional (because the initiating entity and receiving entity always have a minimum of two streams, one in each direction) and a TCP connection is always bidirectional (because TCP traffic can be sent in both directions). Directionality applies to the application-layer traffic sent over the TCP connection, not to the transport-layer traffic sent over the TCP connection itself.

4.6. Handling of Silent Peers
When an entity that is a party to a stream has not received any XMPP traffic from its stream peer for some period of time, the peer might appear to be silent. There are several reasons why this might happen:

1. The underlying TCP connection is dead.

2. The XML stream is broken despite the fact that the underlying TCP connection is alive.

3. The peer is idle and simply has not sent any XMPP traffic over its XML stream to the entity.

These three conditions are best handled separately, as described in the following sections.

Implementation Note: For the purpose of handling silent peers, we treat two unidirectional TCP connections as conceptually equivalent to a single bidirectional TCP connection (see Section 4.5); however, implementers need to be aware that, in the case of two unidirectional TCP connections, responses to traffic at the XMPP application layer will come back from the peer on the second TCP connection. In addition, the use of multiple streams in each direction (which is a somewhat frequent deployment choice for server-to-server connectivity among large XMPP service providers) further complicates application-level checking of XMPP streams and their underlying TCP connections, because there is no necessary correlation between any given initial stream and any given response stream.

4.6.1. Dead Connection
If the underlying TCP connection is dead, stream-level checks (e.g., [XEP-0199] and [XEP-0198]) are ineffective. Therefore, it is unnecessary to close the stream with or without an error, and it is appropriate instead to simply terminate the TCP connection.

One common method for checking the TCP connection is to send a space character (U+0020) between XML stanzas, which is allowed for XML streams as described under Section 11.7; the sending of such a space character is properly called a "whitespace keepalive" (the term "whitespace ping" is often used, despite the fact that it is not a ping since no "pong" is possible). However, this is not allowed during TLS negotiation or SASL negotiation, as described under Section 5.3.3 and Section 6.3.5.

4.6.2. Broken Stream
Even if the underlying TCP connection is alive, the peer might never respond to XMPP traffic that the entity sends, whether normal stanzas or specialized stream-checking traffic such as the application-level pings defined in [XEP-0199] or the more comprehensive Stream Management protocol defined in [XEP-0198]. In this case, it is appropriate for the entity to close a broken stream with a <connection-timeout/> stream error (Section 4.9.3.4).

4.6.3. Idle Peer
Even if the underlying TCP connection is alive and the stream is not broken, the peer might have sent no stanzas for a certain period of time. In this case, the peer itself MAY close the stream (as described under Section 4.4) rather than leaving an unused stream open. If the idle peer does not close the stream, the other party MAY either close the stream using the handshake described under Section 4.4 or close the stream with a stream error (e.g., <resource-constraint/> (Section 4.9.3.17) if the entity has reached a limit on the number of open TCP connections or <policy-violation/> (Section 4.9.3.14) if the connection has exceeded a local timeout policy). However, consistent with the order of layers (specified under Section 13.3), the other party is advised to verify that the underlying TCP connection is alive and the stream is unbroken (as described above) before concluding that the peer is idle. Furthermore, it is preferable to be liberal in accepting idle peers, since experience has shown that doing so improves the reliability of communication over XMPP networks and that it is typically more efficient to maintain a stream between two servers than to aggressively time out such a stream.

4.6.4. Use of Checking Methods
Implementers are advised to support whichever stream-checking and connection-checking methods they deem appropriate, but to carefully weigh the network impact of such methods against the benefits of discovering broken streams and dead TCP connections in a timely manner. The length of time between the use of any particular check is very much a matter of local service policy and depends strongly on the network environment and usage scenarios of a given deployment and connection type. At the time of writing, it is RECOMMENDED that any such check be performed not more than once every 5 minutes and that, ideally, such checks will be initiated by clients rather than servers. Those who implement XMPP software and deploy XMPP services are encouraged to seek additional advice regarding appropriate timing of stream-checking and connection-checking methods, particularly when power-constrained devices are being used (e.g., in mobile environments).

4.7. Stream Attributes
The attributes of the root <stream/> element are defined in the following sections.

Security Warning: Until and unless the confidentiality and integrity of the stream are protected via TLS as described under Section 5 or an equivalent security layer (such as the SASL GSSAPI mechanism), the attributes provided in a stream header could be tampered with by an attacker.

Implementation Note: The attributes of the root <stream/> element are not prepended by a namespace prefix because, as explained in [XML-NAMES], "[d]efault namespace declarations do not apply directly to attribute names; the interpretation of unprefixed attributes is determined by the element on which they appear."

4.7.1. from
The 'from' attribute specifies an XMPP identity of the entity sending the stream element.
For initial stream headers in client-to-server communication, the 'from' attribute is the XMPP identity of the principal controlling the client, i.e., a JID of the form <localpart@domainpart>. The client might not know the XMPP identity, e.g., because the XMPP identity is assigned at a level other than the XMPP application layer (as in the Generic Security Service Application Program Interface [GSS-API]) or is derived by the server from information provided by the client (as in some deployments of end-user certificates with the SASL EXTERNAL mechanism). Furthermore, if the client considers the XMPP identity to be private information then it is advised not to include a 'from' attribute before the confidentiality and integrity of the stream are protected via TLS or an equivalent security layer. However, if the client knows the XMPP identity then it SHOULD include the 'from' attribute after the confidentiality and integrity of the stream are protected via TLS or an equivalent security layer.

   I: <?xml version='1.0'?>
      <stream:stream
          from='juliet@im.example.com'
          to='im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

For initial stream headers in server-to-server communication, the 'from' attribute is one of the configured FQDNs of the server, i.e., a JID of the form <domainpart>. The initiating server might have more than one XMPP identity, e.g., in the case of a server that provides virtual hosting, so it will need to choose an identity that is associated with this output stream (e.g., based on the 'to' attribute of the stanza that triggered the stream negotiation attempt). Because a server is a "public entity" on the XMPP network, it MUST include the 'from' attribute after the confidentiality and integrity of the stream are protected via TLS or an equivalent security layer.

   I: <?xml version='1.0'?>
      <stream:stream
          from='example.net'
          to='im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:server'
          xmlns:stream='http://etherx.jabber.org/streams'>

For response stream headers in both client-to-server and server-to-server communication, the receiving entity MUST include the 'from' attribute and MUST set its value to one of the receiving entity's FQDNs (which MAY be an FQDN other than that specified in the 'to' attribute of the initial stream header, as described under Section 4.9.1.3 and Section 4.9.3.6).

   R: <?xml version='1.0'?>
      <stream:stream
          from='im.example.com'
          id='++TR84Sm6A3hnt3Q065SnAbbk3Y='
          to='juliet@im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

Whether or not the 'from' attribute is included, each entity MUST verify the identity of the other entity before exchanging XML stanzas with it, as described under Section 13.5.

Interoperability Note: It is possible that implementations based on [RFC3920] will not include the 'from' address on any stream headers (even ones whose confidentiality and integrity are protected); an entity SHOULD be liberal in accepting such stream headers.

4.7.2. to
For initial stream headers in both client-to-server and server-to-server communication, the initiating entity MUST include the 'to' attribute and MUST set its value to a domainpart that the initiating entity knows or expects the receiving entity to service. (The same information can be provided in other ways, such as a Server Name Indication during TLS negotiation as described in [TLS-EXT].)

   I: <?xml version='1.0'?>
      <stream:stream
          from='juliet@im.example.com'
          to='im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

For response stream headers in client-to-server communication, if the client included a 'from' attribute in the initial stream header then the server MUST include a 'to' attribute in the response stream header and MUST set its value to the bare JID specified in the 'from' attribute of the initial stream header. If the client did not include a 'from' attribute in the initial stream header then the server MUST NOT include a 'to' attribute in the response stream header.

   R: <?xml version='1.0'?>
      <stream:stream
          from='im.example.com'
          id='++TR84Sm6A3hnt3Q065SnAbbk3Y='
          to='juliet@im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

For response stream headers in server-to-server communication, the receiving entity MUST include a 'to' attribute in the response stream header and MUST set its value to the domainpart specified in the 'from' attribute of the initial stream header.

   R: <?xml version='1.0'?>
      <stream:stream
          from='im.example.com'
          id='g4qSvGvBxJ+xeAd7QKezOQJFFlw='
          to='example.net'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:server'
          xmlns:stream='http://etherx.jabber.org/streams'>

Whether or not the 'to' attribute is included, each entity MUST verify the identity of the other entity before exchanging XML stanzas with it, as described under Section 13.5.

Interoperability Note: It is possible that implementations based on [RFC3920] will not include the 'to' address on stream headers; an entity SHOULD be liberal in accepting such stream headers.

4.7.3. id
The 'id' attribute specifies a unique identifier for the stream, called a "stream ID". The stream ID MUST be generated by the receiving entity when it sends a response stream header and MUST be unique within the receiving application (normally a server).
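
One way a receiving entity might generate such an identifier is sketched below. The function name and the in-memory set of issued IDs are hypothetical; the sketch draws on Python's `secrets` module because a stream ID can be security-critical, and it assumes a single process (a clustered server would need a shared notion of uniqueness).

```python
import secrets
from typing import Set


def new_stream_id(issued: Set[str]) -> str:
    """Return a fresh stream ID that is not in the set of issued IDs.

    The ID is drawn from a cryptographically strong source; the explicit
    membership check keeps it unique within this application instance.
    """
    while True:
        candidate = secrets.token_urlsafe(20)  # 20 random bytes, URL-safe text
        if candidate not in issued:
            issued.add(candidate)
            return candidate
```

The caller keeps the `issued` set for the lifetime of the receiving application and passes it in on every call.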

Security Warning: The stream ID MUST be both unpredictable and non-repeating because it can be security-critical when reused by an authentication mechanism, as is the case for Server Dialback [XEP-0220] and the "XMPP 0.9" authentication mechanism used before RFC 3920 defined the use of SASL in XMPP; for recommendations regarding randomness for security purposes, see [RANDOM].

For initial stream headers, the initiating entity MUST NOT include the 'id' attribute; however, if the 'id' attribute is included, the receiving entity MUST ignore it.

For response stream headers, the receiving entity MUST include the 'id' attribute.

   R: <?xml version='1.0'?>
      <stream:stream
          from='im.example.com'
          id='++TR84Sm6A3hnt3Q065SnAbbk3Y='
          to='juliet@im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

Interoperability Note: In RFC 3920, the text regarding inclusion of the 'id' attribute was ambiguous, leading some implementations to leave the attribute off the response stream header.

4.7.4. xml:lang
The 'xml:lang' attribute specifies an entity's preferred or default language for any human-readable XML character data to be sent over the stream (an XML stanza can also possess an 'xml:lang' attribute, as discussed under Section 8.1.5). The syntax of this attribute is defined in Section 2.12 of [XML]; in particular, the value of the 'xml:lang' attribute MUST conform to the NMTOKEN datatype (as defined in Section 2.3 of [XML]) and MUST conform to the language identifier format defined in [LANGTAGS]. For initial stream headers, the initiating entity SHOULD include the 'xml:lang' attribute.

   I: <?xml version='1.0'?>
      <stream:stream
          from='juliet@im.example.com'
          to='im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

For response stream headers, the receiving entity MUST include the 'xml:lang' attribute. The following rules apply:

o  If the initiating entity included an 'xml:lang' attribute in its initial stream header and the receiving entity supports that language in the human-readable XML character data that it generates and sends to the initiating entity (e.g., in the <text/> element for stream and stanza errors), the value of the 'xml:lang' attribute MUST be the identifier for the initiating entity's preferred language (e.g., "de-CH").

o  If the receiving entity supports a language that matches the initiating entity's preferred language according to the "lookup scheme" specified in Section 3.4 of [LANGMATCH] (e.g., "de" instead of "de-CH"), then the value of the 'xml:lang' attribute SHOULD be the identifier for the matching language.

o  If the receiving entity does not support the initiating entity's preferred language or a matching language according to the lookup scheme (or if the initiating entity did not include the 'xml:lang' attribute in its initial stream header), then the value of the 'xml:lang' attribute MUST be the identifier for the default language of the receiving entity (e.g., "en").

   R: <?xml version='1.0'?>
      <stream:stream
          from='im.example.com'
          id='++TR84Sm6A3hnt3Q065SnAbbk3Y='
          to='juliet@im.example.com'
          version='1.0'
          xml:lang='en'
          xmlns='jabber:client'
          xmlns:stream='http://etherx.jabber.org/streams'>

If the initiating entity included the 'xml:lang' attribute in its initial stream header, the receiving entity SHOULD remember that value as the default xml:lang for all stanzas sent by the initiating entity over the current stream. As described under Section 8.1.5, the initiating entity MAY include the 'xml:lang' attribute in any XML stanzas it sends over the stream. If the initiating entity does not include the 'xml:lang' attribute in any such stanza, the receiving entity SHOULD add the 'xml:lang' attribute to the stanza when routing it to a remote server or delivering it to a connected client, where the value of the attribute MUST be the identifier for the language preferred by the initiating entity (even if the receiving entity does not support that language for human-readable XML character data it generates and sends to the initiating entity, such as in stream or stanza errors). If the initiating entity includes the 'xml:lang' attribute in any such stanza, the receiving entity MUST NOT modify or delete it when routing it to a remote server or delivering it to a connected client.

4.7.5. version
The inclusion of the version attribute set to a value of at least "1.0" signals support for the stream-related protocols defined in this specification, including TLS negotiation (Section 5), SASL negotiation (Section 6), stream features (Section 4.3.2), and stream errors (Section 4.9).

The version of XMPP specified in this specification is "1.0"; in particular, XMPP 1.0 encapsulates the stream-related protocols as well as the basic semantics of the three defined XML stanza types (<message/>, <presence/>, and <iq/> as described under Sections 8.2.1, 8.2.2, and 8.2.3, respectively).

The numbering scheme for XMPP versions is "<major>.<minor>". The major and minor numbers MUST be treated as separate integers and each number MAY be incremented higher than a single digit. Thus, "XMPP 2.4" would be a lower version than "XMPP 2.13", which in turn would be lower than "XMPP 12.3". Leading zeros (e.g., "XMPP 6.01") MUST be ignored by recipients and MUST NOT be sent.

The major version number will be incremented only if the stream and stanza formats or obligatory actions have changed so dramatically that an older version entity would not be able to interoperate with a newer version entity if it simply ignored the elements and attributes it did not understand and took the actions defined in the older specification.

The minor version number will be incremented only if significant new capabilities have been added to the core protocol (e.g., a newly defined value of the 'type' attribute for message, presence, or IQ stanzas). The minor version number MUST be ignored by an entity with a smaller minor version number, but MAY be used for informational purposes by the entity with the larger minor version number (e.g., the entity with the larger minor version number would simply note that its correspondent would not be able to understand that value of the 'type' attribute and therefore would not send it).

The following rules apply to the generation and handling of the 'version' attribute within stream headers:

1. The initiating entity MUST set the value of the 'version' attribute in the initial stream header to the highest version number it supports (e.g., if the highest version number it supports is that defined in this specification, it MUST set the value to "1.0").

2. The receiving entity MUST set the value of the 'version' attribute in the response stream header to either the value supplied by the initiating entity or the highest version number supported by the receiving entity, whichever is lower. The receiving entity MUST perform a numeric comparison on the major and minor version numbers, not a string match on "<major>.<minor>".

3. If the version number included in the response stream header is at least one major version lower than the version number included in the initial stream header and newer version entities cannot interoperate with older version entities as described, the initiating entity SHOULD close the stream with an <unsupported-version/> stream error (Section 4.9.3.25).

4. If either entity receives a stream header with no 'version' attribute, the entity MUST consider the version supported by the other entity to be "0.9" and SHOULD NOT include a 'version' attribute in the response stream header.
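
The numeric comparison and the "whichever is lower" rule can be sketched as follows. The function names are hypothetical and the sketch covers only rules 2 and 4 from the receiving entity's side; it is an illustration under those assumptions rather than a complete treatment.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_version(value: Optional[str]) -> Version:
    """Parse "<major>.<minor>" into a pair of integers.

    A missing attribute is treated as version 0.9. Leading zeros are
    ignored because int("01") == 1.
    """
    if value is None:
        return (0, 9)
    major, minor = value.split(".")
    return (int(major), int(minor))


def response_version(initial: Optional[str], supported: str = "1.0") -> Optional[str]:
    """Choose the 'version' value for a response stream header.

    Returns None when the attribute is to be left out, i.e. when the
    initial stream header carried no 'version' attribute.
    """
    if initial is None:
        return None
    # Tuples compare element by element, so this is a numeric comparison
    # on (major, minor) rather than a string match.
    major, minor = min(parse_version(initial), parse_version(supported))
    return f"{major}.{minor}"
```

Comparing tuples rather than strings is what makes "2.4" sort below "2.13", which a plain string comparison would get wrong.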
4.7.6. Summary of Stream Attributes
The following table summarizes the attributes of the root <stream/> element.

+----------+--------------------------+-------------------------+
|          | initiating to receiving  | receiving to initiating |
+----------+--------------------------+-------------------------+
| to       | JID of receiver          | JID of initiator        |
| from     | JID of initiator         | JID of receiver         |
| id       | ignored                  | stream identifier       |
| xml:lang | default language         | default language        |
| version  | XMPP 1.0+ supported      | XMPP 1.0+ supported     |
+----------+--------------------------+-------------------------+

                  Figure 4: Stream Attributes
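
As a rough illustration of the table, the attributes of a response stream header could be derived from those of the initial stream header as shown below. Everything here is a simplified sketch with hypothetical names: language selection uses an exact match only (the lookup scheme of [LANGMATCH] is not implemented), and the version is hard-coded on the assumption that both sides support exactly "1.0".

```python
from typing import Dict, Iterable


def response_header_attrs(initial: Dict[str, str],
                          own_domain: str,
                          stream_id: str,
                          supported_langs: Iterable[str] = ("en",),
                          default_lang: str = "en") -> Dict[str, str]:
    """Build response stream header attributes from the initial header."""
    # 'from' and 'id' are always present; any 'id' sent by the initiator is ignored.
    attrs = {"from": own_domain, "id": stream_id}
    if "from" in initial:
        # Echo the initiator's address back as 'to'; omit 'to' otherwise.
        attrs["to"] = initial["from"]
    lang = initial.get("xml:lang")
    # ASSUMPTION: exact match only; fall back to the receiver's default language.
    attrs["xml:lang"] = lang if lang in supported_langs else default_lang
    if "version" in initial:
        # ASSUMPTION: initiator and receiver both support exactly "1.0".
        attrs["version"] = "1.0"
    return attrs
```

For example, passing the attributes of the client-to-server initial stream header shown earlier yields 'from', 'id', 'to', 'xml:lang', and 'version' values matching the corresponding response stream header.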