B.4. State Tables
This section contains a table for each state. Each table lists all the requests and events on which the state is allowed to act. The events that are method names are, unless noted, requests with the given method in the direction client to server (C->S). In some cases, one or more requisites exist. The response column tells what type of response actions should be performed. Possible actions that are requested for an event include: response codes, e.g., 200, headers that need to be included in the response, setting of state variables, or settings of other session-related parameters. The new state column tells which state the state machine changes to.

The response to a valid request meeting the requisites is normally a 2xx (SUCCESS) unless otherwise noted in the response column. The exceptions need to be given a response according to the response column. If the request does not meet the requisite, is erroneous, or some other type of error occurs, the appropriate response code is to be sent. If the response code is a 4xx, the session state is unchanged. A response code of 3rr results in the session being ended and its state changed to Init. A response code of 304 results in no state change. However, there are restrictions on when a 3rr response may be used. A 5xx response does not result in any change of the session state, unless the error is impossible to recover from. An unrecoverable error results in the ending of the session. In the general case, if it can't be determined whether or not an error was unrecoverable, the client will be required to test. If the next request after a 5xx is responded to with a 454 (Session Not Found), the client knows that the session has ended. For any request message that cannot be responded to within the time defined in Section 10.4, a 100 response must be sent.
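The effect of non-2xx response codes on the session state, as described above, can be sketched as follows. This is a minimal illustration, not a normative implementation; the names State and session_state_after_response, and the unrecoverable flag, are hypothetical and do not appear in this specification.

```python
from enum import Enum


class State(Enum):
    INIT = "Init"
    READY = "Ready"
    PLAY = "Play"


def session_state_after_response(current: State, status: int,
                                 unrecoverable: bool = False) -> State:
    """Return the session state after a 3rr, 4xx, or 5xx response.

    2xx responses are governed by the per-state tables and are not
    handled here. `unrecoverable` is an assumption of this sketch: it
    stands for the server knowing that a 5xx error ended the session.
    """
    if status == 304:
        return current                 # 304 never changes the state
    if 300 <= status <= 399:
        return State.INIT              # other 3rr: session ends
    if 400 <= status <= 499:
        return current                 # 4xx: state unchanged
    if 500 <= status <= 599:
        # 5xx: unchanged unless the error cannot be recovered from
        return State.INIT if unrecoverable else current
    raise ValueError(f"status {status} is not covered by this sketch")
```

A client that cannot tell whether a 5xx was unrecoverable would keep its current state and probe with a further request; a 454 reply then indicates that the session is gone.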
The server will time out the session after the period of time specified in the SETUP response, if no activity from the client is detected. Therefore, there exists a timeout event for all states except Init.
In the case that NRM = 1, the presentation URI is equal to the media URI or a specified presentation URI. For NRM > 1, the presentation URI needs to be other than any of the media that are part of the session. This applies to all states.

+---------------+-----------------+---------------------------------+
| Event         | Prerequisite    | Response                        |
+---------------+-----------------+---------------------------------+
| DESCRIBE      | Needs REDIRECT  | 3rr, Redirect                   |
|               |                 |                                 |
| DESCRIBE      |                 | 200, Session description        |
|               |                 |                                 |
| OPTIONS       | Session ID      | 200, Reset session timeout      |
|               |                 | timer                           |
|               |                 |                                 |
| OPTIONS       |                 | 200                             |
|               |                 |                                 |
| SET_PARAMETER | Valid parameter | 200, change value of parameter  |
|               |                 |                                 |
| GET_PARAMETER | Valid parameter | 200, return value of parameter  |
+---------------+-----------------+---------------------------------+

Table 9: Non-State-Machine Changing Events

The methods in Table 9 do not have any effect on the state machine or the state variables. However, some methods do change other session-related parameters, for example, SET_PARAMETER, which will set the parameter(s) specified in its body. Also, all of these methods that allow the Session header will also update the keep-alive timer for the session.

+------------------+----------------+-----------+-------------------+
| Action           | Requisite      | New State | Response          |
+------------------+----------------+-----------+-------------------+
| SETUP            |                | Ready     | NRM=1, RP=0.0     |
|                  |                |           |                   |
| SETUP            | Needs Redirect | Init      | 3rr Redirect      |
|                  |                |           |                   |
| S -> C: REDIRECT | No Session hdr | Init      | Terminate all SES |
+------------------+----------------+-----------+-------------------+

Table 10: State: Init

The initial state of the state machine (Table 10) can only be left by processing a correct SETUP request. As seen in the table, the two state variables are also set by a correct request. This table also shows that a correct SETUP can in some cases be redirected to another URI or server by a 3rr response.
+-------------+------------------------+---------+------------------+
| Action      | Requisite              | New     | Response         |
|             |                        | State   |                  |
+-------------+------------------------+---------+------------------+
| SETUP       | New URI                | Ready   | NRM +=1          |
|             |                        |         |                  |
| SETUP       | URI Setup prior        | Ready   | Change transport |
|             |                        |         | param            |
|             |                        |         |                  |
| TEARDOWN    | Prs URI,               | Init    | No session hdr,  |
|             |                        |         | NRM = 0          |
|             |                        |         |                  |
| TEARDOWN    | md URI,NRM=1           | Init    | No Session hdr,  |
|             |                        |         | NRM = 0          |
|             |                        |         |                  |
| TEARDOWN    | md URI,NRM>1           | Ready   | Session hdr, NRM |
|             |                        |         | -= 1             |
|             |                        |         |                  |
| PLAY        | Prs URI, No range      | Play    | Play from RP     |
|             |                        |         |                  |
| PLAY        | Prs URI, Range         | Play    | According to     |
|             |                        |         | range            |
|             |                        |         |                  |
| PLAY        | md URI, NRM=1, Range   | Play    | According to     |
|             |                        |         | range            |
|             |                        |         |                  |
| PLAY        | md URI, NRM=1          | Play    | Play from RP     |
|             |                        |         |                  |
| PAUSE       | Prs URI                | Ready   | Return PP        |
|             |                        |         |                  |
| SC:REDIRECT | Terminate-Reason       | Ready   | Set RedP         |
|             |                        |         |                  |
| SC:REDIRECT | No Terminate-Reason    | Init    | Session is       |
|             | time parameter         |         | removed          |
|             |                        |         |                  |
| Timeout     |                        | Init    |                  |
|             |                        |         |                  |
| RedP        |                        | Init    | TEARDOWN of      |
| reached     |                        |         | session          |
+-------------+------------------------+---------+------------------+

Table 11: State: Ready

In the Ready state (Table 11), some of the actions depend on the number of media streams (NRM) in the session, i.e., aggregated or non-aggregated control. A SETUP request in the Ready state can either add one more media stream to the session or, if the media stream (same URI) already is part of the session, change the transport parameters. TEARDOWN depends on both the Request-URI and the number of media streams within the session. If the Request-URI is the presentation URI, the whole session is torn down. If a media URI is used in the TEARDOWN request and more than one media exists in the session, the session will remain and a session header is returned in the response. If only a single media stream remains in the session when performing a TEARDOWN with a media URI, the session is removed. The number of media streams remaining after tearing down a media stream determines the new state.
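The TEARDOWN handling in the Ready state can be sketched as below. This is an illustrative sketch only; the Session class, its fields, and the function teardown_in_ready are hypothetical names, and URI comparison is simplified to plain string equality.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Session:
    presentation_uri: str
    media_uris: Set[str] = field(default_factory=set)  # NRM == len(media_uris)


def teardown_in_ready(session: Session, request_uri: str) -> Tuple[str, bool]:
    """Apply a TEARDOWN in the Ready state.

    Returns (new_state, include_session_header_in_response).
    """
    if request_uri == session.presentation_uri:
        # Presentation URI: the whole session is torn down (NRM = 0).
        session.media_uris.clear()
        return ("Init", False)
    if request_uri not in session.media_uris:
        raise ValueError("Request-URI is not part of this session")
    session.media_uris.discard(request_uri)            # NRM -= 1
    if session.media_uris:
        # Other media remain: session survives, Session header returned.
        return ("Ready", True)
    # The last media stream was removed: the session is removed as well.
    return ("Init", False)
```

The number of entries left in media_uris after the removal is what selects between Ready and Init, mirroring the last sentence of the paragraph above.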
+----------------+-----------------------+--------+-----------------+
| Action         | Requisite             | New    | Response        |
|                |                       | State  |                 |
+----------------+-----------------------+--------+-----------------+
| PAUSE          | Prs URI               | Ready  | Set RP to       |
|                |                       |        | present point   |
|                |                       |        |                 |
| End of media   | All media             | Play   | Set RP = End of |
|                |                       |        | media           |
|                |                       |        |                 |
| End of range   |                       | Play   | Set RP = End of |
|                |                       |        | range           |
|                |                       |        |                 |
| PLAY           | Prs URI, No range     | Play   | Play from       |
|                |                       |        | present point   |
|                |                       |        |                 |
| PLAY           | Prs URI, Range        | Play   | According to    |
|                |                       |        | range           |
|                |                       |        |                 |
| SC:PLAY_NOTIFY |                       | Play   | 200             |
|                |                       |        |                 |
| SETUP          | New URI               | Play   | 455             |
|                |                       |        |                 |
| SETUP          | md URI                | Play   | 455             |
|                |                       |        |                 |
| SETUP          | md URI, IFI           | Play   | Change          |
|                |                       |        | transport param.|
|                |                       |        |                 |
| TEARDOWN       | Prs URI               | Init   | No session hdr  |
|                |                       |        |                 |
| TEARDOWN       | md URI,NRM=1          | Init   | No Session hdr, |
|                |                       |        | NRM=0           |
|                |                       |        |                 |
| TEARDOWN       | md URI                | Play   | 455             |
|                |                       |        |                 |
| SC:REDIRECT    | Terminate Reason with | Play   | Set RedP        |
|                | Time parameter        |        |                 |
|                |                       |        |                 |
| SC:REDIRECT    |                       | Init   | Session is      |
|                |                       |        | removed         |
|                |                       |        |                 |
| RedP reached   |                       | Init   | TEARDOWN of     |
|                |                       |        | session         |
|                |                       |        |                 |
| Timeout        |                       | Init   | Stop Media      |
|                |                       |        | playout         |
+----------------+-----------------------+--------+-----------------+

Table 12: State: Play
The Play state table (Table 12) contains a number of requests that need a presentation URI (labeled as Prs URI) to work on (i.e., the presentation URI has to be used as the Request-URI). This is due to the exclusion of non-aggregated stream control in sessions with more than one media stream.

To avoid inconsistencies between the client and server, automatic state transitions are avoided. This can be seen at, for example, an "End of media" event when all media has finished playing but the session still remains in Play state. An explicit PAUSE request needs to be sent to change the state to Ready. It may appear that there exist automatic transitions in "RedP reached" and "PP reached". However, they are requested and acknowledged before they take place. The time at which the transition will happen is known by looking at the Terminate-Reason header's time parameter and Range header, respectively. If the client sends a request close in time to these transitions, it needs to be prepared for receiving error messages, as the state may or may not have changed.

Appendix C. Media-Transport Alternatives
This section defines how certain combinations of protocols, profiles, and lower transports are used. This includes the usage of the Transport header's source and destination address parameters: "src_addr" and "dest_addr".

C.1. RTP
This section defines the interaction of RTSP with respect to the RTP protocol [RFC3550]. It also defines any necessary media-transport signaling with regard to RTP. The available RTP profiles and lower-layer transports are described below along with rules on signaling the available combinations.

C.1.1. AVP
The usage of the "RTP Profile for Audio and Video Conferences with Minimal Control" [RFC3551] when using RTP for media transport over different lower-layer transport protocols is defined below in regard to RTSP. One such case is defined within this document: the use of embedded (interleaved) binary data as defined in Section 14. The usage of this method is indicated by including the "interleaved" parameter.
When using embedded binary data, "src_addr" and "dest_addr" MUST NOT be used. Addressing and multiplexing are instead handled through channel numbers and the "interleaved" parameter.

C.1.2. AVP/UDP
This part describes the sending of RTP [RFC3550] over lower-transport-layer UDP [RFC768] according to the profile "RTP Profile for Audio and Video Conferences with Minimal Control" defined in [RFC3551]. Implementations of RTP/AVP/UDP MUST implement RTCP (Appendix C.1.6). This profile requires one or two unidirectional or bidirectional UDP flows per media stream. The first UDP flow is for RTP and the second is for RTCP. Multiplexing of RTP and RTCP (Appendix C.1.6.4) MAY be used, in which case, a single UDP flow is used for both parts. Embedding of RTP data with the RTSP messages, in accordance with Section 14, SHOULD NOT be performed when RTSP messages are transported over unreliable transport protocols, like UDP [RFC768].

The RTP/UDP and RTCP/UDP flows can be established using the Transport header's "src_addr" and "dest_addr" parameters.

In RTSP PLAY mode, the transmission of RTP packets from client to server is unspecified. The behavior in regard to such RTP packets MAY be defined in the future.

The "src_addr" and "dest_addr" parameters are used in the following way for media delivery and playback mode, i.e., Mode=PLAY:

o  The "src_addr" and "dest_addr" parameters MUST contain either 1 or 2 address specifications. Note that two address specifications MAY be provided even if RTP and RTCP multiplexing is negotiated.

o  Each address specification for RTP/AVP/UDP or RTP/AVP/TCP MUST contain either:

   *  both an address and a port number, or

   *  a port number without an address.

o  The first address specification given in either of the parameters applies to the RTP stream. The second specification, if present, applies to the RTCP stream, unless RTP and RTCP multiplexing is negotiated, in which case both RTP and RTCP will use the first specification.

o  The RTP/UDP packets from the server to the client MUST be sent to the address and port given by the first address specification of the "dest_addr" parameter.

o  The RTCP/UDP packets from the server to the client MUST be sent to the address and port given by the second address specification of the "dest_addr" parameter, unless RTP and RTCP multiplexing has been negotiated, in which case RTCP MUST be sent to the first address specification. If no second pair is specified and RTP and RTCP multiplexing has not been negotiated, RTCP MUST NOT be sent.

o  The RTCP/UDP packets from the client to the server MUST be sent to the address and port given by the second address specification of the "src_addr" parameter, unless RTP and RTCP multiplexing has been negotiated, in which case RTCP MUST be sent to the first address specification. If no second pair is specified and RTP and RTCP multiplexing has not been negotiated, RTCP MUST NOT be sent.

o  The RTP/UDP packets from the client to the server MUST be sent to the address and port given by the first address specification of the "src_addr" parameter.

o  RTP and RTCP packets SHOULD be sent from the corresponding receiver port, i.e., RTCP packets from the server should be sent from the "src_addr" parameter's second address port pair, unless RTP and RTCP multiplexing has been negotiated, in which case the first address port pair is used.

C.1.3. AVPF/UDP
The RTP profile "Extended RTP Profile for RTCP-based Feedback (RTP/AVPF)" [RFC4585] MAY be used as the RTP profile in sessions using RTP. All that is defined for AVP MUST also apply for AVPF. The usage of AVPF is indicated by the media initialization protocol used. In the case of SDP, it is indicated by media lines ("m=") containing the profile RTP/AVPF. That SDP MAY also contain further AVPF-related SDP attributes configuring the AVPF session regarding reporting interval and feedback messages to be used [RFC4585]. This configuration MUST be followed.
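Because everything defined for AVP also applies for AVPF, the Mode=PLAY address rules of Appendix C.1.2 carry over unchanged. The following sketch shows how a server might pick the destinations for RTP and RTCP from the "dest_addr" address specifications. The function names split_addr_spec and server_send_targets are hypothetical, and the parsing is deliberately simplified.

```python
from typing import Optional, Sequence, Tuple


def split_addr_spec(spec: str) -> Tuple[Optional[str], int]:
    """Split 'host:port' or ':port' into (host or None, port)."""
    host, sep, port = spec.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"address specification needs a port: {spec!r}")
    return (host or None, int(port))


def server_send_targets(dest_addr: Sequence[str],
                        rtcp_mux: bool) -> Tuple[str, Optional[str]]:
    """Return (rtp_target, rtcp_target) for server-to-client packets.

    rtcp_target is None when RTCP MUST NOT be sent, i.e., only one
    address specification was given and multiplexing was not negotiated.
    """
    if len(dest_addr) not in (1, 2):
        raise ValueError("dest_addr MUST contain 1 or 2 address specifications")
    rtp = dest_addr[0]                 # first specification is always RTP
    if rtcp_mux:
        return (rtp, rtp)              # multiplexed: RTCP shares the first
    if len(dest_addr) == 2:
        return (rtp, dest_addr[1])     # second specification is RTCP
    return (rtp, None)
```

The same selection applies to client-to-server traffic when "src_addr" is passed in place of "dest_addr".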
C.1.4. SAVP/UDP
The RTP profile "The Secure Real-time Transport Protocol (SRTP)" [RFC3711] is an RTP profile (SAVP) that MAY be used in RTSP sessions using RTP. All that is defined for AVP MUST also apply for SAVP.

The usage of SRTP requires that a security context be established. The default key-management unless otherwise signaled SHALL be MIKEY in RSA-R mode as defined in Appendix C.1.4.1 and not according to the procedure defined in "Key Management Extensions for Session Description Protocol (SDP) and Real Time Streaming Protocol (RTSP)" [RFC4567]. The reason is that RFC 4567 sends the initial MIKEY message in SDP, thus, both requiring the usage of the DESCRIBE method and forcing the server to keep state for clients performing DESCRIBE in anticipation that they might require key management.

MIKEY is selected as the default method for establishing SRTP cryptographic context within an RTSP session as it can be embedded in the RTSP messages while still ensuring confidentiality of content of the keying material, even when using hop-by-hop TLS security for the RTSP messages. This method also supports pipelining of the RTSP messages.

C.1.4.1. MIKEY Key Establishment
This method for using MIKEY [RFC3830] to establish the SRTP cryptographic context is initiated in the client's SETUP request, and the server's response to the SETUP carries the MIKEY response. This ensures that the crypto context establishment happens simultaneously with the establishment of the media stream being protected. By using MIKEY's RSA-R mode [RFC4738] the client can be the initiator and still allow the server to set the parameters in accordance with the actual media stream.

The SRTP cryptographic context establishment is done according to the following process:

1.  The client determines that SAVP or SAVPF shall be used from the media-description format, e.g., SDP. If no other key-management method is explicitly signaled, then MIKEY SHALL be used as defined herein. The use of SRTP with RTSP is only defined with MIKEY with keys established as defined in this section. Future documents may define how an RTSP implementation treats SDP that indicates some other key mechanism to be used. The need for such specification includes [RFC4567], which is not defined for use in RTSP 2.0 within this document.

2.  The client SHALL establish a TLS connection for RTSP messages, directly or hop-by-hop with the server. If hop-by-hop TLS security is used, the User method SHALL be indicated in the Accept-Credentials header. Note that using hop-by-hop does allow the proxy to insert itself as a man in the middle. This can also occur in the MIKEY exchange by the proxy providing one of its certificates rather than the server's in the Connection-Credentials header. Therefore, the client SHALL validate the server certificate.

3.  The client retrieves the server's certificate from a direct TLS connection or hop-by-hop from a Connection-Credentials header. The client then checks that the server certificate is valid and belongs to the server.

4.  The client forms the MIKEY Initiator message using RSA-R mode in unicast mode as specified in [RFC4738]. The client SHOULD use the same certificate for TLS and MIKEY to enable the server to bind the two together. The client's certificate SHALL be included in the MIKEY message. The client SHALL indicate its SRTP capabilities in the message.

5.  The MIKEY message from the previous step is base64-encoded [RFC4648] and becomes the value of the MIKEY parameter that is included in the transport specification(s) that specifies an SRTP-based profile (SAVP, SAVPF) in the SETUP request.

6.  Any proxy encountering the MIKEY parameter SHALL forward it without modification. A proxy that is required to understand the Transport specifications will need to understand SAVP/SAVPF with MIKEY to enable the default keying for SRTP-protected media streams. If such a proxy does not support SAVP/SAVPF with MIKEY, it will discard the whole transport specification. Most types of proxies can easily support SAVP and SAVPF with MIKEY. If a client encounters a proxy not supporting SAVP/SAVPF with MIKEY, the client should attempt bypassing that proxy.

7.  The server, upon receiving the SETUP request, will need to decide upon the transport specification to use, if multiple are included by the client. In the determination of which transport specifications are supported and preferred, the server SHOULD decode the MIKEY message to take the embedded SRTP parameters into account. If all transport specifications require SRTP but no MIKEY parameter or other supported keying method is included, the server SHALL respond with 403 (Forbidden).

8.  Upon generating a response, the following outcomes can occur:

    *  A transport spec not using SRTP and MIKEY is selected. Thus, the response will not contain any MIKEY parameters.

    *  A transport spec using SRTP and MIKEY is selected but an error is encountered in the MIKEY processing. In this case, an RTSP error response code of 466 (Key Management Error) SHALL be used. A MIKEY message describing the error MAY be included.

    *  A transport spec using SRTP and MIKEY is selected and a MIKEY response message can be created. The server SHOULD use the same certificate for TLS and in MIKEY to enable the client to bind the two together. If a different certificate is used, it SHALL be included in the MIKEY message. It is RECOMMENDED that the envelope key-cache type be set to 'Cache' and that a single envelope key is reused for all MIKEY messages to the client. That message is included in the MIKEY parameter part of the single selected transport specification in the SETUP response. The server will set the SRTP parameters as preferred for this media stream within the supported range by the client.

9.  The server transmits the SETUP response back to the client.

10. The client receives the SETUP response and, if the response code indicates a successful request, it decodes the MIKEY message and establishes the SRTP cryptographic context from the parameters in the MIKEY response.

In the above method, the client's certificate may be self signed in cases where the client's identity is not necessary to authenticate and the security goal is only to ensure that the RTSP signaling client is the same as the one receiving the SRTP security context.

C.1.5. SAVPF/UDP
The RTP profile "Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)" [RFC5124] is an RTP profile (SAVPF) that MAY be used in RTSP sessions using RTP. All that is defined for AVPF MUST also apply for SAVPF. The usage of SRTP requires that a cryptographic context be established. The default mechanism for establishing that security association is to use MIKEY [RFC3830] with RTSP as defined in Appendix C.1.4.1.
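Step 5 of Appendix C.1.4.1, carrying the base64-encoded MIKEY message as a transport parameter, can be sketched as below for both SAVP and SAVPF. This is a sketch under assumptions: the helper names add_mikey and extract_mikey are hypothetical, the "MIKEY=" spelling is assumed to match the Transport header syntax in Section 18.54, and the MIKEY message itself is treated as an opaque byte string produced elsewhere.

```python
import base64
from typing import Optional

SRTP_PROFILES = ("SAVP", "SAVPF")


def add_mikey(transport_spec: str, mikey_message: bytes) -> str:
    """Append a base64-encoded MIKEY message to an SRTP transport spec."""
    # The spec starts with protocol/profile[/lower-transport].
    parts = transport_spec.split(";", 1)[0].split("/")
    if len(parts) < 2 or parts[1].upper() not in SRTP_PROFILES:
        raise ValueError("MIKEY is only carried with SAVP or SAVPF profiles")
    encoded = base64.b64encode(mikey_message).decode("ascii")
    return f"{transport_spec};MIKEY={encoded}"


def extract_mikey(transport_spec: str) -> Optional[bytes]:
    """Return the decoded MIKEY message, or None if none is present."""
    for param in transport_spec.split(";"):
        # partition() splits on the first '=' only, so base64 padding
        # characters at the end of the value are preserved.
        name, sep, value = param.strip().partition("=")
        if sep and name == "MIKEY":
            return base64.b64decode(value)
    return None
```

A proxy following step 6 would pass the parameter through untouched; only the client and the server need to decode it.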
C.1.6. RTCP Usage with RTSP
RTCP has several usages when RTP is implemented for media transport as explained below. Thus, RTCP MUST be supported if an RTSP agent handles RTP.

C.1.6.1. Media Synchronization
RTCP provides media synchronization and clock-drift compensation. The initial media synchronization is available from the RTP-Info header. However, to be able to handle any clock drift between the media streams, RTCP is needed.

C.1.6.2. RTSP Session Keep-Alive
RTCP traffic from the RTSP client to the RTSP server MUST function as keep-alive. This requires an RTSP server supporting RTP to use the received RTCP packets as indications that the client desires the related RTSP session to be kept alive.

C.1.6.3. Bitrate Adaption
RTCP Receiver reports and any additional feedback from the client MUST be used to adapt the bitrate used over the transport for all cases when RTP is sent over UDP. An RTP sender without reserved resources MUST NOT use more than its fair share of the available resources. This can be determined by comparing on short-to-medium terms (some seconds) the used bitrate and adapting it so that the RTP sender sends at a bitrate comparable to what a TCP sender would achieve on average over the same path.

To ensure that the implementation's adaptation mechanism has a well-defined outer envelope, all implementations using a non-congestion-controlled unicast transport protocol, like UDP, MUST implement "Multimedia Congestion Control: Circuit Breakers for Unicast RTP Sessions" [RTP-CIRCUIT-BREAKERS].

C.1.6.4. RTP and RTCP Multiplexing
RTSP can be used to negotiate the usage of RTP and RTCP multiplexing as described in [RFC5761]. This allows servers and clients to reduce the amount of resources required for the session by only requiring one underlying transport stream per media stream instead of two when using RTP and RTCP. This lessens the server-port consumption and also the necessary state and keep-alive work when operating across NATs [RFC2663].
Content must be prepared with some consideration for RTP and RTCP multiplexing, mainly ensuring that the RTP payload types used do not collide with the ones used for RTCP packet types. This option likely needs explicit support from the content unless the RTP payload types can be remapped by the server and that is correctly reflected in the session description. Beyond that, support of this feature should come at little cost and much gain.

It is recommended that, if the content and server support RTP and RTCP multiplexing, this is indicated in the session description, for example, using the SDP attribute "a=rtcp-mux". If the SDP message contains the "a=rtcp-mux" attribute for a media stream, the server MUST support RTP and RTCP multiplexing. If indicated or otherwise desired by the client, it can include the Transport parameter "RTCP-mux" in any transport specification where it desires to use "RTCP-mux". The server will indicate if it supports "RTCP-mux". Servers and Clients SHOULD support RTP and RTCP multiplexing.

For capability exchange, an RTSP feature tag for RTP and RTCP multiplexing is defined: "setup.rtp.rtcp.mux".

To minimize the risk of negotiation failure while using RTP and RTCP multiplexing, some recommendations are provided here. If the session description includes explicit indication of support ("a=rtcp-mux" in SDP), then an RTSP agent can safely create a SETUP request with a transport specification with only a single "dest_addr" parameter address specification. If no such explicit indication is provided, then even if the feature tag "setup.rtp.rtcp.mux" is provided in a Supported header by the RTSP server or the feature tag is included in the Require header in the SETUP request, the media resource may not support RTP and RTCP multiplexing. Thus, to maximize the probability of successful negotiation, the RTSP agent is recommended to include two "dest_addr" parameter address specifications in the first or first set (if pipelining is used) of SETUP request(s) for any media resource aggregate. That way, the RTSP server can accept RTP and RTCP multiplexing and only use the first address specification or, if not, use both specifications. The RTSP agent, after having received the response for a successful negotiation of the usage of RTP and RTCP multiplexing, can then release the resources associated with the second address specification.

C.2. RTP over TCP
Transport of RTP over TCP can be done in two ways: over independent TCP connections using [RFC4571] or interleaved in the RTSP connection. In both cases, the protocol MUST be "rtp" and the lower-layer MUST be TCP. The profile may be any of the above specified ones: AVP, AVPF, SAVP, or SAVPF.
C.2.1. Interleaved RTP over TCP
The use of embedded (interleaved) binary data transported on the RTSP connection is possible as specified in Section 14. When using this declared combination of interleaved binary data, the RTSP messages MUST be transported over TCP. TLS may or may not be used. If TLS is used, both RTSP messages and the binary data will be protected by TLS.

One should, however, consider that this will result in all media streams going through any proxy. Using independent TCP connections can avoid that issue.

C.2.2. RTP over Independent TCP
In this section, the sending of RTP [RFC3550] over lower-layer transport TCP [RFC793] according to "Framing Real-time Transport Protocol (RTP) and RTP Control Protocol (RTCP) Packets over Connection-Oriented Transport" [RFC4571] is described. This section adapts the guidelines for using RTP over TCP within SIP/SDP [RFC4145] to work with RTSP.

A client codes the support of RTP over independent TCP by specifying an RTP/AVP/TCP transport option without an interleaved parameter in the Transport line of a SETUP request. This transport option MUST include the "unicast" parameter.

If the client wishes to use RTP with RTCP, two address specifications need to be included in the "dest_addr" parameter. If the client wishes to use RTP without RTCP, one address specification is included in the "dest_addr" parameter. If the client wishes to multiplex RTP and RTCP on a single transport flow (see Appendix C.1.6.4), one or two address specifications are included in the "dest_addr" parameter in addition to the "RTCP-mux" transport parameter. Two address specifications are allowed to facilitate successful negotiation when the server or content can't support RTP and RTCP multiplexing. Ordering rules of dest_addr ports follow the rules for RTP/AVP/UDP.

If the client wishes to play the active role in initiating the TCP connection, it MAY set the setup parameter (see Section 18.54) on the Transport line to be "active", or it MAY omit the setup parameter, as active is the default. If the client signals the active role, the ports in the address specifications in the "dest_addr" parameter MUST be set to 9 (the discard port).

If the client wishes to play the passive role in TCP connection initiation, it MUST set the setup parameter on the Transport line to be "passive". If the client is able to assume the active or the passive role, it MUST set the setup parameter on the Transport line to be "actpass". In either case, the "dest_addr" parameter's address specification port value for RTP MUST be set to the TCP port number on which the client is expecting to receive the TCP connection for RTP, and the "dest_addr" address specification port value for RTCP MUST be set to the TCP port number on which the client is expecting to receive the TCP connection for RTCP. In the case that the client wishes to multiplex RTP and RTCP on a single transport flow, the "RTCP-mux" parameter is included and one or two "dest_addr" parameter address specifications are included, as mentioned earlier in this section.

Upon receipt of a non-interleaved RTP/AVP/TCP SETUP request, if a server decides to accept this requested option, the 2xx reply MUST contain a Transport option that specifies RTP/AVP/TCP (without using the interleaved parameter and using the unicast parameter). The "dest_addr" parameter value MUST be echoed from the parameter value in the client request unless the destination address (only port) was not provided; in which case, the server MAY include the source address of the RTSP TCP connection with the port number unchanged.

In addition, the server reply MUST set the setup parameter on the Transport line, to indicate the role the server will play in the connection setup. Permissible values are "active" (if a client set setup to "passive" or "actpass") and "passive" (if a client set setup to "active" or "actpass"). If a server sets setup to "passive", the "src_addr" in the reply MUST indicate the ports on which the server is willing to receive a TCP connection for RTP and (if the client requested a TCP connection for RTCP by specifying two "dest_addr" address specifications) a TCP/RTCP connection. If a server sets setup to "active", the ports specified in "src_addr" address specifications MUST be set to 9. The server MAY use the "ssrc" parameter, following the guidance in Section 18.54.
The server sets only one address specification in the case that the client has indicated only a single address specification or in case RTP and RTCP multiplexing was requested and accepted by the server. Port ordering for "src_addr" follows the rules for RTP/AVP/UDP.

Servers MUST support taking the passive role and MAY support taking the active role. Servers with a public IP address take the passive role, thus giving clients behind NATs and firewalls a better chance of successfully connecting to the server by actively connecting outwards. Therefore, the clients are RECOMMENDED to take the active role.
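The role negotiation for the "setup" parameter can be sketched as follows. The function server_setup_role and the server_can_be_active flag are hypothetical, and the preference for the passive role on "actpass" is one reading of the recommendation above rather than a requirement.

```python
from typing import Optional


def server_setup_role(client_setup: Optional[str],
                      server_can_be_active: bool = False) -> str:
    """Pick the server's 'setup' value for an RTP/AVP/TCP SETUP reply."""
    # An omitted setup parameter means the client takes the active role.
    client = (client_setup or "active").lower()
    if client == "active":
        return "passive"               # every server MUST support passive
    if client == "actpass":
        return "passive"               # both allowed; prefer passive
    if client == "passive":
        if not server_can_be_active:
            # Taking the active role is optional (MAY) for servers.
            raise ValueError("server cannot take the active role")
        return "active"
    raise ValueError(f"unknown setup value: {client_setup!r}")
```

A server returning "passive" then advertises its listening ports in "src_addr", while one returning "active" sets those ports to 9.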
After sending (receiving) a 2xx reply for a SETUP method for a non-interleaved RTP/AVP/TCP media stream, the active party SHOULD initiate the TCP connection as soon as possible. The client MUST NOT send a PLAY request prior to the establishment of all the TCP connections negotiated using SETUP for the session. In case the server receives a PLAY request in a session that has not yet established all the TCP connections, it MUST respond using the 464 (Data Transport Not Ready Yet) (Section 17.4.28) error code.

Once the PLAY request for a media resource transported over non-interleaved RTP/AVP/TCP occurs, media begins to flow from server to client over the RTP TCP connection, and RTCP packets flow bidirectionally over the RTCP TCP connection, unless RTP and RTCP multiplexing has been negotiated, in which case RTP and RTCP will flow over a common TCP connection. As in the RTP/UDP case, client-to-server traffic on an RTP-only TCP session is unspecified by this memo. The packets that travel on these connections MUST be framed using the protocol defined in [RFC4571], not by the framing defined for interleaving RTP over the RTSP connection defined in Section 14.

A successful PAUSE request for media being transported over RTP/AVP/TCP pauses the flow of packets over the connections, without closing the connections. A successful TEARDOWN request signals that the TCP connections for RTP and RTCP are to be closed by the RTSP client as soon as possible.

Subsequent SETUP requests using a URI already set up in an RTSP session using an RTP/AVP/TCP transport specification may be ambiguous in the following way: does the client wish to open up a new TCP connection for RTP or RTCP for the URI, or does the client wish to continue using the existing TCP connections? The client SHOULD use the "connection" parameter (defined in Section 18.54) on the Transport line to make its intention clear (by setting "connection" to "new" if new connections are needed, and by setting "connection" to "existing" if the existing connections are to be used). After a 2xx reply for a SETUP request for a new connection, parties should close the preexisting connections, after waiting a suitable period for any stray RTP or RTCP packets to arrive.

The usage of SRTP, i.e., either SAVP or SAVPF profiles, requires that a security association be established. The default mechanism for establishing that security association is to use MIKEY [RFC3830] with RTSP as defined in Appendix C.1.4.1.
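The [RFC4571] framing required on independent TCP connections prefixes each RTP or RTCP packet with a 16-bit length in network byte order. A minimal sketch of framing and of extracting complete packets from a receive buffer is given below; the function names frame and deframe are hypothetical.

```python
import struct
from typing import List, Tuple


def frame(packet: bytes) -> bytes:
    """Prefix one RTP/RTCP packet with its 16-bit big-endian length."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return struct.pack("!H", len(packet)) + packet


def deframe(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a TCP byte buffer into complete packets.

    Returns (packets, remainder); the remainder holds an incomplete
    frame that needs more bytes from the connection.
    """
    packets: List[bytes] = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break                      # frame not fully received yet
        start = offset + 2
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

Because TCP delivers a byte stream, a receiver would append newly read bytes to the remainder and call deframe again.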
Below, a rewritten version of the example "Media on Demand" (Appendix A.1) shows the use of RTP/AVP/TCP non-interleaved:

   C->M: DESCRIBE rtsp://example.com/twister.3gp RTSP/2.0
         CSeq: 1
         User-Agent: PhonyClient/1.2

   M->C: RTSP/2.0 200 OK
         CSeq: 1
         Server: PhonyServer/1.0
         Date: Wed, 23 Jan 2013 15:36:52 +0000
         Content-Type: application/sdp
         Content-Length: 227
         Content-Base: rtsp://example.com/twister.3gp/
         Expires: Thu, 24 Jan 2013 15:36:52 +0000

         v=0
         o=- 2890844256 2890842807 IN IP4 198.51.100.34
         s=RTSP Session
         i=An Example of RTSP Session Usage
         e=adm@example.com
         c=IN IP4 0.0.0.0
         a=control: *
         a=range:npt=00:00:00-00:10:34.10
         t=0 0
         m=audio 0 RTP/AVP 0
         a=control: trackID=1

   C->M: SETUP rtsp://example.com/twister.3gp/trackID=1 RTSP/2.0
         CSeq: 2
         User-Agent: PhonyClient/1.2
         Require: play.basic
         Transport: RTP/AVP/TCP;unicast;dest_addr=":9"/":9";
                    setup=active;connection=new
         Accept-Ranges: npt, smpte, clock

   M->C: RTSP/2.0 200 OK
         CSeq: 2
         Server: PhonyServer/1.0
         Transport: RTP/AVP/TCP;unicast;
                    dest_addr=":9"/":9";
                    src_addr="198.51.100.5:53478"/"198.51.100:54091";
                    setup=passive;connection=new;ssrc=93CB001E
         Session: OccldOFFq23KwjYpAnBbUr
         Expires: Thu, 24 Jan 2013 15:36:52 +0000
         Date: Wed, 23 Jan 2013 15:36:52 +0000
         Accept-Ranges: npt
         Media-Properties: Random-Access=0.8, Immutable, Unlimited

   C->M: TCP Connection Establishment x2

   C->M: PLAY rtsp://example.com/twister.3gp/ RTSP/2.0
         CSeq: 4
         User-Agent: PhonyClient/1.2
         Range: npt=30-
         Session: OccldOFFq23KwjYpAnBbUr

   M->C: RTSP/2.0 200 OK
         CSeq: 4
         Server: PhonyServer/1.0
         Date: Wed, 23 Jan 2013 15:36:54 +0000
         Session: OccldOFFq23KwjYpAnBbUr
         Range: npt=30-623.10
         Seek-Style: First-Prior
         RTP-Info: url="rtsp://example.com/twister.3gp/trackID=1"
                   ssrc=4F312DD8:seq=54321;rtptime=2876889

C.3. Handling Media-Clock Time Jumps in the RTP Media Layer
RTSP allows media clients to control selected, non-contiguous sections of media presentations, rendering those streams with an RTP media layer [RFC3550]. Two cases occur. The first is when a new PLAY request replaces an ongoing request and the new request results in a jump in the media. This should produce a continuous media stream at the RTP layer. The second is when a client immediately follows a completed PLAY request with a new PLAY request. This will result in some gap in the media layer. The text below examines both cases.

A PLAY request that replaces an ongoing PLAY request allows the media layer rendering the RTP stream to do so continuously without being affected by jumps in media-clock time. The RTP timestamps for the new media range are set so that they become continuous with the previous media range in the previous request. The RTP sequence number for the first packet in the new range will be the next following the last packet in the previous range, i.e., monotonically increasing. The goal is to allow the media-rendering layer to work without interruption or reconfiguration across the jumps in media clock. This should be possible in all cases of replaced PLAY requests for media that has random access properties. In this case, care is needed to align frames or similar media-dependent structures.

In cases where jumps in media-clock time are a result of RTSP signaling operations arriving after a completed PLAY operation, the request timing will result in the media becoming non-continuous. The server becomes unable to send the media so that it arrives in a timely manner and still carries timestamps that make the media stream continuous. In these situations, the server will produce RTP streams where there are gaps in the RTP timeline for the media. If the media has frame structure, aligning the timestamp for the next frame with the previous structure reduces the burden to render this media. The gap should represent the time the server has not been serving media, e.g., the time between the end of the media stream or a PAUSE request and the new PLAY request. In these cases, the RTP sequence number would normally be monotonically increasing across the gap.

For RTSP sessions with media that lacks random access properties, such as live streams, any media-clock jump is commonly the result of a correspondingly long pause of delivery. The RTP timestamp will have increased in direct proportion to the duration of the paused delivery. Note also that in this case the RTP sequence number should be the next packet number. If not, the RTCP packet loss reporting will report as lost all packets not received between the point of pausing and later resuming. This may trigger congestion avoidance mechanisms.

An allowed exception from the above recommendation on monotonically increasing RTP sequence number is live media streams, likely being relayed. In this case, when the client resumes delivery, it will get the media that is currently being delivered to the server itself. For this type of basic delivery of live streams to multiple users over unicast, individual rewriting of RTP sequence numbers becomes quite a burden. For solutions that already cache media or perform time shifting, the rewriting should impose only a minor burden.

The goal when handling jumps in media-clock time is that the provided stream is continuous without gaps in RTP timestamp or sequence number. However, when delivery has been halted for some reason, the RTP timestamp, when resuming, MUST represent the duration that the delivery was halted. An RTP sequence number MUST generally be the next number, i.e., monotonically increasing modulo 65536.
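The general rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed algorithm: the function name resume_rtp_state and its parameters are hypothetical, and the halt is assumed to be measured as wall-clock seconds between sending the last packet before the halt and sending the first packet after it.

```python
SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide
TS_MODULUS = 1 << 32   # RTP timestamps are 32 bits wide


def resume_rtp_state(last_seq, last_ts, halted_seconds, clock_rate):
    """Return (seq, rtptime) for the first packet sent after a halt.

    halted_seconds is the assumed wall-clock time between sending the last
    packet before the halt and sending the first packet after it.
    """
    # The sequence number simply continues, wrapping modulo 65536.
    next_seq = (last_seq + 1) % SEQ_MODULUS
    # The timestamp advances by the halted duration in clock-rate units.
    next_ts = (last_ts + round(halted_seconds * clock_rate)) % TS_MODULUS
    return next_seq, next_ts
```

For instance, with an 8000 Hz clock, a last packet with seq = 49 and rtptime = 39200, and 112.5 ms between the two transmissions, resume_rtp_state(49, 39200, 0.1125, 8000) yields (50, 40100).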
For media resources with the properties Time-Progressing and Time-Duration=0.0, the server MAY create RTP media streams with RTP sequence number jumps in them due to the client first halting delivery and later resuming it (PAUSE and then later PLAY). However, servers utilizing this exception must take into consideration the resulting RTCP receiver reports that likely contain loss reports for all the packets that were a part of the discontinuity. A client cannot rely on the fact that a server will align when resuming play, even if it is RECOMMENDED. The RTP-Info header will provide information on how the server acts in each case. One cannot assume that the RTSP client can communicate with the RTP media agent, as the two may be independent processes. If the RTP timestamp shows the same gap as the NPT, the media agent will assume that there is a pause in the presentation. If the jump in NPT is large enough, the RTP timestamp may roll over and the media
agent may believe later packets to be duplicates of packets just played out. Having the RTP timestamp jump will also affect the RTCP measurements based on this.

As an example, assume an RTP timestamp frequency of 8000 Hz, a packetization interval of 100 ms, and an initial sequence number and timestamp of zero.

   C->S: PLAY rtsp://example.com/fizzle RTSP/2.0
         CSeq: 4
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=10-15
         User-Agent: PhonyClient/1.2

   S->C: RTSP/2.0 200 OK
         CSeq: 4
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=10-15
         RTP-Info: url="rtsp://example.com/fizzle/audiotrack"
                   ssrc=0D12F123:seq=0;rtptime=0

The ensuing RTP data stream is depicted below:

   S -> C: RTP packet - seq = 0, rtptime = 0, NPT time = 10s
   S -> C: RTP packet - seq = 1, rtptime = 800, NPT time = 10.1s
    . . .
   S -> C: RTP packet - seq = 49, rtptime = 39200, NPT time = 14.9s

Upon the completion of the requested delivery, the server sends a PLAY_NOTIFY.

   S->C: PLAY_NOTIFY rtsp://example.com/fizzle RTSP/2.0
         CSeq: 5
         Notify-Reason: end-of-stream
         Request-Status: cseq=4 status=200 reason="OK"
         Range: npt=-15
         RTP-Info: url="rtsp://example.com/fizzle/audiotrack"
                   ssrc=0D12F123:seq=49;rtptime=39200
         Session: ymIqLXufHkMHGdtENdblWK

   C->S: RTSP/2.0 200 OK
         CSeq: 5
         User-Agent: PhonyClient/1.2

Upon the completion of the play range, the client follows up with a request to PLAY from a new NPT.
   C->S: PLAY rtsp://example.com/fizzle RTSP/2.0
         CSeq: 6
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=18-20
         User-Agent: PhonyClient/1.2

   S->C: RTSP/2.0 200 OK
         CSeq: 6
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=18-20
         RTP-Info: url="rtsp://example.com/fizzle/audiotrack"
                   ssrc=0D12F123:seq=50;rtptime=40100

The ensuing RTP data stream is depicted below:

   S->C: RTP packet - seq = 50, rtptime = 40100, NPT time = 18s
   S->C: RTP packet - seq = 51, rtptime = 40900, NPT time = 18.1s
    . . .
   S->C: RTP packet - seq = 69, rtptime = 55300, NPT time = 19.9s

In this example, first, NPT 10 through 15 are played, then the client requests the server to skip ahead and play NPT 18 through 20. The first segment is presented as RTP packets with sequence numbers 0 through 49 and timestamps 0 through 39,200. The second segment consists of RTP packets with sequence numbers 50 through 69, with timestamps 40,100 through 55,300. While there is a gap in the NPT, there is no gap in the sequence-number space of the RTP data stream.

The RTP timestamp gap is present in the above example due to the time it takes to perform the second play request, in this case, 12.5 ms (100 timestamp units at 8000 Hz).

C.4. Handling RTP Timestamps after PAUSE
During a PAUSE/PLAY interaction in an RTSP session, the duration of time for which the RTP transmission was halted MUST be reflected in the RTP timestamp of each RTP stream. The duration can be calculated for each RTP stream as the time elapsed from when the last RTP packet was sent before the PAUSE request was received to when the first RTP packet was sent after the subsequent PLAY request was received. The duration includes all latency incurred and processing time required to complete the request.

RFC 3550 [RFC3550] states that: "the RTP timestamp for each unit [packet] would be related to the wallclock time at which the unit becomes current on the virtual presentation timeline".
In order to satisfy the requirements of [RFC3550], the RTP timestamp space needs to increase continuously with real time. While this is not optimal for stored media, it is required for RTP and RTCP to function as intended. Using a continuous RTP timestamp space allows the same timestamp model for both stored and live media and allows better opportunity to integrate both types of media under a single control.

As an example, assume a clock frequency of 8000 Hz, a packetization interval of 100 ms, and an initial sequence number and timestamp of zero.

   C->S: PLAY rtsp://example.com/fizzle RTSP/2.0
         CSeq: 4
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=10-15
         User-Agent: PhonyClient/1.2

   S->C: RTSP/2.0 200 OK
         CSeq: 4
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=10-15
         RTP-Info: url="rtsp://example.com/fizzle/audiotrack"
                   ssrc=0D12F123:seq=0;rtptime=0

The ensuing RTP data stream is depicted below:

   S -> C: RTP packet - seq = 0, rtptime = 0, NPT time = 10s
   S -> C: RTP packet - seq = 1, rtptime = 800, NPT time = 10.1s
   S -> C: RTP packet - seq = 2, rtptime = 1600, NPT time = 10.2s
   S -> C: RTP packet - seq = 3, rtptime = 2400, NPT time = 10.3s
The client then sends a PAUSE request:

   C->S: PAUSE rtsp://example.com/fizzle RTSP/2.0
         CSeq: 5
         Session: ymIqLXufHkMHGdtENdblWK
         User-Agent: PhonyClient/1.2

   S->C: RTSP/2.0 200 OK
         CSeq: 5
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=10.4-15

20 seconds elapse and then the client sends a PLAY request. In addition, the server requires 15 ms to process the request:

   C->S: PLAY rtsp://example.com/fizzle RTSP/2.0
         CSeq: 6
         Session: ymIqLXufHkMHGdtENdblWK
         User-Agent: PhonyClient/1.2

   S->C: RTSP/2.0 200 OK
         CSeq: 6
         Session: ymIqLXufHkMHGdtENdblWK
         Range: npt=10.4-15
         RTP-Info: url="rtsp://example.com/fizzle/audiotrack"
                   ssrc=0D12F123:seq=4;rtptime=164400

The ensuing RTP data stream is depicted below:

   S -> C: RTP packet - seq = 4, rtptime = 164400, NPT time = 10.4s
   S -> C: RTP packet - seq = 5, rtptime = 165200, NPT time = 10.5s
   S -> C: RTP packet - seq = 6, rtptime = 166000, NPT time = 10.6s

First, NPT 10 through 10.3 is played, then a PAUSE is received by the server. After 20 seconds, a PLAY is received by the server that takes 15 ms to process. The duration of time for which the session was paused is reflected in the RTP timestamp of the RTP packets sent after this PLAY request.

A client can use the RTSP Range header and RTP-Info header to map NPT time of a presentation with the RTP timestamp.

Note: in RFC 2326 [RFC2326], this matter was not clearly defined and was commonly misunderstood. However, for RTSP 2.0, it is expected that this will be handled correctly and no exception handling will be required.
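The mapping between NPT and RTP timestamp mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name rtp_to_npt is hypothetical, the anchor pair is taken from a single PLAY response (rtptime from RTP-Info, NPT from the start of the Range header), playback is assumed to run at normal scale, and only packets sent after that response are mapped.

```python
TS_MODULUS = 1 << 32  # RTP timestamps wrap at 32 bits


def rtp_to_npt(rtptime, anchor_rtptime, anchor_npt, clock_rate):
    """Map an RTP timestamp to NPT using one (rtptime, NPT) anchor pair.

    anchor_rtptime is the rtptime value from RTP-Info and anchor_npt is the
    start of the Range header in the same PLAY response (assumed inputs).
    """
    # Unsigned 32-bit difference, so a wrapped timestamp still maps forward.
    delta_ticks = (rtptime - anchor_rtptime) % TS_MODULUS
    return anchor_npt + delta_ticks / clock_rate
```

With the anchor from the second PLAY response above (rtptime=164400 at npt=10.4, 8000 Hz), a packet carrying rtptime 166000 is 1600 timestamp units past the anchor and maps to NPT 10.6 s. A new anchor is needed after every PLAY response, since each PAUSE or seek changes the offset between the two timelines.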
Note further: it may be required to reset some of the state to ensure the correct media decoding and the usual jitter-buffer handling when issuing a PLAY request.

C.5. RTSP/RTP Integration
For certain data types, tight integration between the RTSP layer and the RTP layer will be necessary. This by no means precludes the above restrictions. Combined RTSP/RTP media clients should use the RTP-Info field to determine whether incoming RTP packets were sent before or after a seek or before or after a PAUSE.

C.6. Scaling with RTP
For scaling (see Section 18.46), RTP timestamps should correspond to the rendering timing. For example, when playing video recorded at 30 frames per second at a scale of two and speed (Section 18.50) of one, the server would drop every second frame to maintain and deliver video packets with the normal timestamp spacing of 3,000 per frame, but NPT would increase by 1/15 second for each video frame.

Note: the above scaling puts requirements on the media codec or a media stream to support it. For example, motion JPEG or other non-predictive video coding can more easily handle the above example.

C.7. Maintaining NPT Synchronization with RTP Timestamps
The client can maintain a correct display of NPT by noting the RTP timestamp value of the first packet arriving after repositioning. The sequence parameter of the RTP-Info (Section 18.45) header provides the first sequence number of the next segment.

C.8. Continuous Audio
For continuous audio, the server SHOULD set the RTP marker bit at the beginning of serving a new PLAY request or at jumps in timeline. This allows the client to perform playout delay adaptation.

C.9. Multiple Sources in an RTP Session
Note that more than one SSRC MAY be sent in the media stream. If this happens, all sources are expected to be rendered simultaneously.

C.10. Usage of SSRCs and the RTCP BYE Message during an RTSP Session
The RTCP BYE message indicates the end of use of a given SSRC. If all sources leave an RTP session, it can, in most cases, be assumed to have ended. Therefore, a client or server MUST NOT send an RTCP BYE message until it has finished using an SSRC. A server SHOULD keep using an SSRC until the RTP session is terminated. Prolonging the use of an SSRC allows the established synchronization context associated with that SSRC to be used to synchronize subsequent PLAY requests even if the PLAY response is late.

An SSRC collision with the SSRC that transmits media also has consequences, as it will normally force the media sender to change its SSRC in accordance with the RTP specification [RFC3550]. However, an RTSP server may wait and see if the client changes and thus resolve the conflict to minimize the impact. For a media sender, an SSRC change will result in a loss of synchronization context and require any receiver to wait for RTCP sender reports for all media requiring synchronization before being able to play out synchronized. For these reasons, a client joining a session should take care not to select the same SSRC(s) as the server indicates in the ssrc Transport header parameter. Any SSRC signaled in the Transport header MUST be avoided. A client detecting a collision prior to sending any RTP or RTCP messages SHALL also select a new SSRC.

C.11. Future Additions
It is the intention that any future protocol or profile regarding media delivery and lower transport should be easy to add to RTSP. This section lists the steps that need to be taken.

The following things need to be considered when adding a new protocol or profile for use with RTSP:

   o  The protocol or profile needs to define a name tag representing it. This tag is required to be an ABNF "token" so that it can be used in the Transport header specification.

   o  The useful combinations of protocol, profiles, and lower-layer transport for this extension need to be defined. For each combination, declare the necessary parameters to use in the Transport header.

   o  For new media protocols, the interaction with RTSP needs to be addressed. One important factor will be the media synchronization. It may be necessary to have new headers similar to RTP-Info to carry this information.

   o  Discussion needs to occur regarding congestion control for media, especially if transport without built-in congestion control is used.
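As a rough aid for the first consideration, a candidate name tag can be checked against the "token" rule before it is proposed. The Python sketch below assumes that "token" means one or more visible ASCII characters excluding the separator set shown in the code; that set and the function name is_token are assumptions made for illustration, and the ABNF in this specification remains the authority.

```python
# Separator characters assumed to be excluded from "token"; space, tab, and
# control characters are excluded by the visible-ASCII range check below.
TSPECIALS = set('()<>@,;:\\"/[]?={}')


def is_token(tag: str) -> bool:
    """Return True if tag is non-empty visible ASCII without separators."""
    return bool(tag) and all(
        0x21 <= ord(ch) <= 0x7E and ch not in TSPECIALS for ch in tag
    )
```

Under this check, "RTP", "AVP", and "TCP" are each valid tags, while a full transport specification such as "RTP/AVP/TCP" is not a single token, because "/" separates the individual tags.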
See the IANA Considerations section (Section 22) for information on how to register new attributes.