Appendix D. Use of SDP for RTSP Session Descriptions
The Session Description Protocol (SDP, [RFC4566]) may be used to describe streams or presentations in RTSP. This description is typically returned in reply to a DESCRIBE request on a URI from a server to a client or received via HTTP from a server to a client. This appendix describes how an SDP file determines the operation of an RTSP session. Thus, it is worth pointing out that the interpretation of the SDP is done in the context of the SDP receiver, which is the one being configured. This is the same as in SAP [RFC2974]; this differs from SDP Offer/Answer [RFC3264] where each SDP is interpreted in the context of the agent providing it. SDP as is provides no mechanism by which a client can distinguish, without human guidance, between several media streams to be rendered simultaneously and a set of alternatives (e.g., two audio streams spoken in different languages). The SDP extension found in "The Session Description Protocol (SDP) Grouping Framework" [RFC5888] provides such functionality to some degree. Appendix D.4 describes the usage of SDP media line grouping for RTSP.D.1. Definitions
The terms "session-level", "media-level", and other key/attribute names and values used in this appendix are to be used as defined in SDP [RFC4566]:D.1.1. Control URI
The "a=control" attribute is used to convey the control URI. This attribute is used both for the session and media descriptions. If used for individual media, it indicates the URI to be used for controlling that particular media stream. If found at the session level, the attribute indicates the URI for aggregate control (presentation URI). The session-level URI MUST be different from any media-level URI. The presence of a session-level control attribute MUST be interpreted as support for aggregated control. The control attribute MUST be present on the media level unless the presentation only contains a single media stream; in which case, the attribute MAY be present on the session level only and then also apply to that single media stream. ABNF for the attribute is defined in Section 20.3.
Example: a=control:rtsp://example.com/foo This attribute MAY contain either relative or absolute URIs, following the rules and conventions set out in RFC 3986 [RFC3986]. Implementations MUST look for a base URI in the following order: 1. the RTSP Content-Base field; 2. the RTSP Content-Location field; 3. the RTSP Request-URI. If this attribute contains only an asterisk (*), then the URI MUST be treated as if it were an empty embedded URI; thus, it will inherit the entire base URI. Note: RFC 2326 was very unclear on the processing of relative URIs and several RTSP 1.0 implementations at the point of publishing this document did not perform RFC 3986 processing to determine the resulting URI; instead, simple concatenation is common. To avoid this issue completely, it is recommended to use absolute URIs in the SDP. The URI handling for SDPs from container files needs special consideration. For example, let's assume that a container file has the URI: "rtsp://example.com/container.mp4". Let's further assume this URI is the base URI and that there is an absolute media-level URI: "rtsp://example.com/container.mp4/trackID=2". A relative media- level URI that resolves in accordance with RFC 3986 [RFC3986] to the above given media URI is "container.mp4/trackID=2". It is usually not desirable to need to include or modify the SDP stored within the container file with the server local name of the container file. To avoid this, one can modify the base URI used to include a trailing slash, e.g., "rtsp://example.com/container.mp4/". In this case, the relative URI for the media will only need to be "trackID=2". However, this will also mean that using "*" in the SDP will result in the control URI including the trailing slash, i.e., "rtsp://example.com/container.mp4/". Note: the usage of TrackID in the above is not a standardized form, but one example out of several similar strings such as TrackID, Track_ID, StreamID that is used by different server vendors to indicate a particular piece of media inside a container file.
D.1.2. Media Streams
The "m=" field is used to enumerate the streams. It is expected that all the specified streams will be rendered with appropriate synchronization. If the session is over multicast, the port number indicated SHOULD be used for reception. The client MAY try to override the destination port, through the Transport header. The servers MAY allow this: the response will indicate whether or not this is allowed. If the session is unicast, the port numbers are the ones RECOMMENDED by the server to the client, about which receiver ports to use; the client MUST still include its receiver ports in its SETUP request. The client MAY ignore this recommendation. If the server has no preference, it SHOULD set the port number value to zero. The "m=" lines contain information about which transport protocol, profile, and possibly lower-layer are to be used for the media stream. The combination of transport, profile, and lower layer, like RTP/AVP/UDP, needs to be defined for how to be used with RTSP. The currently defined combinations are discussed in Appendix C; further combinations MAY be specified. Example: m=audio 0 RTP/AVP 31D.1.3. Payload Type(s)
The payload type or types are specified in the "m=" line. In case the payload type is a static payload type from RFC 3551 [RFC3551], no other information may be required. In case it is a dynamic payload type, the media attribute "rtpmap" is used to specify what the media is. The "encoding name" within the "rtpmap" attribute may be one of those specified in [RFC4856], a media type registered with IANA according to [RFC4855], or an experimental encoding as specified in SDP [RFC4566]). Codec-specific parameters are not specified in this field, but rather in the "fmtp" attribute described below. The selection of the RTP payload type numbers used may be required to consider RTP and RTCP Multiplexing [RFC5761], if that is to be supported by the server.D.1.4. Format-Specific Parameters
Format-specific parameters are conveyed using the "fmtp" media attribute. The syntax of the "fmtp" attribute is specific to the encoding(s) to which the attribute refers. Note that some of the
format-specific parameters may be specified outside of the "fmtp" parameters, for example, like the "ptime" attribute for most audio encodings.D.1.5. Directionality of Media Stream
The SDP attributes "a=sendrecv", "a=recvonly", and "a=sendonly" provide instructions about the direction the media streams flow within a session. When using RTSP, the SDP can be delivered to a client using either RTSP DESCRIBE or a number of RTSP external methods, like HTTP, FTP, and email. Based on this, the SDP applies to how the RTSP client will see the complete session. Thus, media streams delivered from the RTSP server to the client would be given the "a=recvonly" attribute. "a=recvonly" in an SDP provided to the RTSP client indicates that media delivery will only occur in the direction from the RTSP server to the client. SDP provided to the RTSP client that lacks any of the directionality attributes ("a=recvonly", "a=sendonly", "a=sendrecv") would be interpreted as having "a=sendrecv". At the time of writing, there exists no RTSP mode suitable for media traffic in the direction from the RTSP client to the server. Thus, all RTSP SDP SHOULD have an "a=recvonly" attribute when using the PLAY mode defined in this document. If future modes are defined for media in the client-to- server direction, then usage of "a=sendonly" or "a=sendrecv" may become suitable to indicate intended media directions.D.1.6. Range of Presentation
The "a=range" attribute defines the total time range of the stored session or an individual media. Live sessions that are not seekable can be indicated as specified below; whereas the length of live sessions can be deduced from the "t=" and "r=" SDP parameters. The attribute is both a session- and a media-level attribute. For presentations that contain media streams of the same duration, the range attribute SHOULD only be used at the session level. In case of different lengths, the range attribute MUST be given at media level for all media and SHOULD NOT be given at the session level. If the attribute is present at both media level and session level, the media-level values MUST be used. Note: usually one will specify the same length for all media, even if there isn't media available for the full duration on all media. However, that requires that the server accept PLAY requests within that range.
Servers MUST take care to provide RTSP Range (see Section 18.40) values that are consistent with what is presented in the SDP for the content. There is no reason for non dynamic content, like media clips provided on demand to have inconsistent values. Inconsistent values between the SDP and the actual values for the content handled by the server is likely to generate some failure, like 457 "Invalid Range", in case the client uses PLAY requests with a Range header. In case the content is dynamic in length and it is infeasible to provide a correct value in the SDP, the server is recommended to describe this as content that is not seekable (see below). The server MAY override that property in the response to a PLAY request using the correct values in the Range header. The unit is specified first, followed by the value range. The units and their values are as defined in Section 4.4.1, Section 4.4.2, and Section 4.4.3 and MAY be extended with further formats. Any open- ended range (start-), i.e., without stop range, is of unspecified duration and MUST be considered as content that is not seekable unless this property is overridden. Multiple instances carrying different clock formats MAY be included at either session or media level. ABNF for the attribute is defined in Section 20.3. Examples: a=range:npt=0-34.4368 a=range:clock=19971113T211503Z-19971113T220300Z Non-seekable stream of unknown duration: a=range:npt=0-D.1.7. Time of Availability
The "t=" field defines when the SDP is valid. For on-demand content, the server SHOULD indicate a stop time value for which it guarantees the description to be valid and a start time that is equal to or before the time at which the DESCRIBE request was received. It MAY also indicate start and stop times of 0, meaning that the session is always available. For sessions that are of live type, i.e., specific start time, unknown stop time, likely not seekable, the "t=" and "r=" field SHOULD be used to indicate the start time of the event. The stop time SHOULD be given so that the live event will have ended at that time, while still not being unnecessary far into the future.
D.1.8. Connection Information
In SDP used with RTSP, the "c=" field contains the destination address for the media stream. If a multicast address is specified, the client SHOULD use this address in any SETUP request as destination address, including any additional parameters, such as TTL. For on-demand unicast streams and some multicast streams, the destination address MAY be specified by the client via the SETUP request, thus overriding any specified address. To identify streams without a fixed destination address, where the client is required to specify a destination address, the "c=" field SHOULD be set to a null value. For addresses of type "IP4", this value MUST be "0.0.0.0"; and for type "IP6", this value MUST be "0:0:0:0:0:0:0:0" (can also be written as "::"), i.e., the unspecified address according to RFC 4291 [RFC4291].D.1.9. Message Body Tag
The optional "a=mtag" attribute identifies a version of the session description. It is opaque to the client. SETUP requests may include this identifier in the If-Match field (see Section 18.24) to allow session establishment only if this attribute value still corresponds to that of the current description. The attribute value is opaque and may contain any character allowed within SDP attribute values. ABNF for the attribute is defined in Section 20.3. Example: a=mtag:"158bb3e7c7fd62ce67f12b533f06b83a" One could argue that the "o=" field provides identical functionality. However, it does so in a manner that would put constraints on servers that need to support multiple session description types other than SDP for the same piece of media content.
D.2. Aggregate Control Not Available
If a presentation does not support aggregate control, no session- level "a=control" attribute is specified. For an SDP with multiple media sections specified, each section will have its own control URI specified via the "a=control" attribute. Example: v=0 o=- 2890844256 2890842807 IN IP4 192.0.2.56 s=I came from a web page e=adm@example.com c=IN IP4 0.0.0.0 t=0 0 m=video 8002 RTP/AVP 31 a=control:rtsp://audio.example.com/movie.aud m=audio 8004 RTP/AVP 3 a=control:rtsp://video.example.com/movie.vid Note that the position of the control URI in the description implies that the client establishes separate RTSP control sessions to the servers audio.example.com and video.example.com. It is recommended that an SDP file contain the complete media- initialization information even if it is delivered to the media client through non-RTSP means. This is necessary as there is no mechanism to indicate that the client should request more detailed media stream information via DESCRIBE.D.3. Aggregate Control Available
In this scenario, the server has multiple streams that can be controlled as a whole. In this case, there are both a media-level "a=control" attribute, which is used to specify the stream URIs, and a session-level "a=control" attribute, which is used as the Request- URI for aggregate control. If the media-level URI is relative, it is resolved to absolute URIs according to Appendix D.1.1 above.
Example: C->M: DESCRIBE rtsp://example.com/movie RTSP/2.0 CSeq: 1 User-Agent: PhonyClient/1.2 M->C: RTSP/2.0 200 OK CSeq: 1 Date: Wed, 23 Jan 2013 15:36:52 +0000 Expires: Wed, 23 Jan 2013 16:36:52 +0000 Content-Type: application/sdp Content-Base: rtsp://example.com/movie/ Content-Length: 227 v=0 o=- 2890844256 2890842807 IN IP4 192.0.2.211 s=I contain i=<more info> e=adm@example.com c=IN IP4 0.0.0.0 a=control:* t=0 0 m=video 8002 RTP/AVP 31 a=control:trackID=1 m=audio 8004 RTP/AVP 3 a=control:trackID=2 In this example, the client is recommended to establish a single RTSP session to the server, and it uses the URIs rtsp://example.com/movie/ trackID=1 and rtsp://example.com/movie/trackID=2 to set up the video and audio streams, respectively. The URI rtsp://example.com/movie/, which is resolved from the "*", controls the whole presentation (movie). A client is not required to issue SETUP requests for all streams within an aggregate object. Servers should allow the client to ask for only a subset of the streams.D.4. Grouping of Media Lines in SDP
For some types of media, it is desirable to express a relationship between various media components, for instance, for lip synchronization or Scalable Video Codec (SVC) [RFC5583]. This relationship is expressed on the SDP level by grouping of media lines, as described in [RFC5888], and can be exposed to RTSP.
For RTSP, it is mainly important to know how to handle grouped media received by means of SDP, i.e., if the media are under aggregate control (see Appendix D.3) or if aggregate control is not available (see Appendix D.2). It is RECOMMENDED that grouped media are handled by aggregate control, to give the client the ability to control either the whole presentation or single media.D.5. RTSP External SDP Delivery
There are some considerations that need to be made when the session description is delivered to the client outside of RTSP, for example via HTTP or email. First of all, the SDP needs to contain absolute URIs, since relative will, in most cases, not work as the delivery will not correctly forward the base URI. The writing of the SDP session availability information, i.e., "t=" and "r=", needs to be carefully considered. When the SDP is fetched by the DESCRIBE method, the probability that it is valid is very high. However, the same is much less certain for SDPs distributed using other methods. Therefore, the publisher of the SDP should take care to follow the recommendations about availability in the SDP specification [RFC4566] in Section 4.2.Appendix E. RTSP Use Cases
This appendix describes the most important and considered use cases for RTSP. They are listed in descending order of importance in regard to ensuring that all necessary functionality is present. This specification only fully supports usage of the two first. Also, in these first two cases, there are special cases or exceptions that are not supported without extensions, e.g., the redirection of media delivery to an address other than the controlling agent's (client's).E.1. On-Demand Playback of Stored Content
An RTSP-capable server stores content suitable for being streamed to a client. A client desiring playback of any of the stored content uses RTSP to set up the media transport required to deliver the desired content. RTSP is then used to initiate, halt, and manipulate the actual transmission (playout) of the content. RTSP is also required to provide the necessary description and synchronization information for the content.
The above high-level description can be broken down into a number of functions of which RTSP needs to be capable. Presentation Description: Provide initialization information about the presentation (content); for example, which media codecs are needed for the content. Other information that is important includes the number of media streams the presentation contains, the transport protocols used for the media streams, and identifiers for these media streams. This information is required before setup of the content is possible and to determine if the client is even capable of using the content. This information need not be sent using RTSP; other external protocols can be used to transmit the transport presentation descriptions. Two good examples are the use of HTTP [RFC7230] or email to fetch or receive presentation descriptions like SDP [RFC4566] Setup: Set up some or all of the media streams in a presentation. The setup itself consists of selecting the protocol for media transport and the necessary parameters for the protocol, like addresses and ports. Control of Transmission: After the necessary media streams have been established, the client can request the server to start transmitting the content. The client must be allowed to start or stop the transmission of the content at arbitrary times. The client must also be able to start the transmission at any point in the timeline of the presentation. Synchronization: For media-transport protocols like RTP [RFC3550], it might be beneficial to carry synchronization information within RTSP. This may be due to either the lack of inter-media synchronization within the protocol itself or the potential delay before the synchronization is established (which is the case for RTP when using RTCP). Termination: Terminate the established contexts. For this use case, there are a number of assumptions about how it works. These are: On-Demand content: The content is stored at the server and can be accessed at any time during a time period when it is intended to be available.
Independent sessions: A server is capable of serving a number of clients simultaneously, including from the same piece of content at different points in that presentations timeline. Unicast Transport: Content for each individual client is transmitted to them using unicast traffic. It is also possible to redirect the media traffic to a different destination than that of the agent controlling the traffic. However, allowing this without appropriate mechanisms for checking that the destination approves of this allows for Distributed DoS (DDoS).E.2. Unicast Distribution of Live Content
This use case is similar to the above on-demand content case (see Appendix E.1), the difference is the nature of the content itself. Live content is continuously distributed as it becomes available from a source; i.e., the main difference from on-demand is that one starts distributing content before the end of it has become available to the server. In many cases, the consumer of live content is only interested in consuming what actually happens "now"; i.e., very similar to broadcast TV. However, in this case, it is assumed that there exists no broadcast or multicast channel to the users, and instead the server functions as a distribution node, sending the same content to multiple receivers, using unicast traffic between server and client. This unicast traffic and the transport parameters are individually negotiated for each receiving client. Another aspect of live content is that it often has a very limited time of availability, as it is only available for the duration of the event the content covers. An example of such live content could be a music concert that lasts two hours and starts at a predetermined time. Thus, there is a need to announce when and for how long the live content is available. In some cases, the server providing live content may be saving some or all of the content to allow clients to pause the stream and resume it from the paused point, or to "rewind" and play continuously from a point earlier than the live point. Hence, this use case does not necessarily exclude playing from other than the live point of the stream, playing with scales other than 1.0, etc.
E.3. On-Demand Playback Using Multicast
It is possible to use RTSP to request that media be delivered to a multicast group. The entity setting up the session (the controller) will then control when and what media is delivered to the group. This use case has some potential for DoS attacks by flooding a multicast group. Therefore, a mechanism is needed to indicate that the group actually accepts the traffic from the RTSP server. An open issue in this use case is how one ensures that all receivers listening to the multicast or broadcast receives the session presentation configuring the receivers. This specification has to rely on an external solution to solve this issue.E.4. Inviting an RTSP Server into a Conference
If one has an established conference or group session, it is possible to have an RTSP server distribute media to the whole group. Transmission to the group is simplest when controlled by a single participant or leader of the conference. Shared control might be possible, but would require further investigation and possibly extensions. This use case assumes that there exists either a multicast or a conference focus that redistributes media to all participants. This use case is intended to be able to handle the following scenario: a conference leader or participant (hereafter called the "controller") has some pre-stored content on an RTSP server that he wants to share with the group. The controller sets up an RTSP session at the streaming server for this content and retrieves the session description for the content. The destination for the media content is set to the shared multicast group or conference focus. When desired by the controller, he/she can start and stop the transmission of the media to the conference group. There are several issues with this use case that are not solved by this core specification for RTSP: DoS: To avoid an RTSP server from being an unknowing participant in a DoS attack, the server needs to be able to verify the destination's acceptance of the media. Such a mechanism to verify the approval of received media does not yet exist; instead, only policies can be used, which can be made to work in controlled environments.
Distributing the presentation description to all participants in the group: To enable a media receiver to correctly decode the content, the media configuration information needs to be distributed reliably to all participants. This will most likely require support from an external protocol. Passing control of the session: If it is desired to pass control of the RTSP session between the participants, some support will be required by an external protocol to exchange state information and possibly floor control of who is controlling the RTSP session.E.5. Live Content Using Multicast
This use case in its simplest form does not require any use of RTSP at all; this is what multicast conferences being announced with SAP [RFC2974] and SDP are intended to handle. However, in use cases where more advanced features like access control to the multicast session are desired, RTSP could be used for session establishment. A client desiring to join a live multicasted media session with cryptographic (encryption) access control could use RTSP in the following way. The source of the session announces the session and gives all interested an RTSP URI. The client connects to the server and requests the presentation description, allowing configuration for reception of the media. In this step, it is possible for the client to use secured transport and any desired level of authentication; for example, for billing or access control. An RTSP link also allows for load balancing between multiple servers. If these were the only goals, they could be achieved by simply using HTTP. However, for cases where the sender likes to keep track of each individual receiver of a session, and possibly use the session as a side channel for distributing key-updates or other information on a per-receiver basis, and the full set of receivers is not known prior to the session start, the state establishment that RTSP provides can be beneficial. In this case, a client would establish an RTSP session for this multicast group with the RTSP server. The RTSP server will not transmit any media, but instead will point to the multicast group. The client and server will be able to keep the session alive for as long as the receiver participates in the session thus enabling, for example, the server to push updates to the client. This use case will most likely not be able to be implemented without some extensions to the server-to-client push mechanism. Here the PLAY_NOTIFY method (see Section 13.5) with a suitable extension could provide clear benefits.
Appendix F. Text Format for Parameters
A resource of type "text/parameters" consists of either 1) a list of parameters (for a query) or 2) a list of parameters and associated values (for a response or setting of the parameter). Each entry of the list is a single line of text. Parameters are separated from values by a colon. The parameter name MUST only use US-ASCII visible characters while the values are UTF-8 text strings. The media type registration form is in Section 22.16. There is a potential interoperability issue for this format. It was named in RFC 2326 but never defined, even if used in examples that hint at the syntax. This format matches the purpose and its syntax supports the examples provided. However, it goes further by allowing UTF-8 in the value part; thus, usage of UTF-8 strings may not be supported. However, as individual parameters are not defined, the implementing application needs to have out-of-band agreement or using feature tag anyway to determine if the endpoint supports the parameters. The ABNF [RFC5234] grammar for "text/parameters" content is: file = *((parameter / parameter-value) CRLF) parameter = 1*visible-except-colon parameter-value = parameter *WSP ":" value visible-except-colon = %x21-39 / %x3B-7E ; VCHAR - ":" value = *(TEXT-UTF8char / WSP) TEXT-UTF8char = <as defined in Section 20.1> WSP = <See RFC 5234> ; Space or HTAB VCHAR = <See RFC 5234> CRLF = <See RFC 5234>Appendix G. Requirements for Unreliable Transport of RTSP
This appendix provides guidance for those who want to implement RTSP messages over unreliable transports as has been defined in RTSP 1.0 [RFC2326]. RFC 2326 defined the "rtspu" URI scheme and provided some basic information for the transport of RTSP messages over UDP. The information is being provided here as there has been at least one commercial implementation and compatibility with that should be maintained.
The following points should be considered for an interoperable implementation: o Requests shall be acknowledged by the receiver. If there is no acknowledgement, the sender may resend the same message after a timeout of one round-trip time (RTT). Any retransmissions due to lack of acknowledgement must carry the same sequence number as the original request. o The RTT can be estimated as in TCP (RFC 6298) [RFC6298], with an initial round-trip value of 500 ms. An implementation may cache the last RTT measurement as the initial value for future connections. o The Timestamp header (Section 18.53) is used to avoid the retransmission ambiguity problem [Stevens98]. o The registered default port for RTSP over UDP for the server is 554. o RTSP messages can be carried over any lower-layer transport protocol that is 8-bit clean. o RTSP messages are vulnerable to bit errors and should not be subjected to them. o Source authentication, or at least validation that RTSP messages comes from the same entity becomes extremely important, as session hijacking may be substantially easier for RTSP message transport using an unreliable protocol like UDP than for TCP. There are two RTSP headers that are primarily intended for being used by the unreliable handling of RTSP messages and which will be maintained: o CSeq: See Section 18.20. It should be noted that the CSeq header is also required to match requests and responses independent whether a reliable or unreliable transport is used. o Timestamp: See Section 18.53Appendix H. Backwards-Compatibility Considerations
This section contains notes on issues about backwards compatibility with clients or servers being implemented according to RFC 2326 [RFC2326]. Note that there exists no requirement to implement RTSP 1.0; in fact, this document recommends against it as it is difficult to do in an interoperable way.
A server implementing RTSP 2.0 MUST include an RTSP-Version of "RTSP/2.0" in all responses to requests containing RTSP-Version value of "RTSP/2.0". If a server receives an RTSP 1.0 request, it MAY respond with an RTSP 1.0 response if it chooses to support RFC 2326. If the server chooses not to support RFC 2326, it MUST respond with a 505 (RTSP Version Not Supported) status code. A server MUST NOT respond to an RTSP 1.0 request with an RTSP 2.0 response. Clients implementing RTSP 2.0 MAY use an OPTIONS request with an RTSP-Version of "RTSP/2.0" to determine whether a server supports RTSP 2.0. If the server responds with either an RTSP-Version of "RTSP/1.0" or a status code of 505 (RTSP Version Not Supported), the client will have to use RTSP 1.0 requests if it chooses to support RFC 2326.H.1. Play Request in Play State
The behavior in the server when a Play is received in Play state has changed (Section 13.4). In RFC 2326, the new PLAY request would be queued until the current Play completed. Any new PLAY request now takes effect immediately replacing the previous request.H.2. Using Persistent Connections
Some server implementations of RFC 2326 maintain a one-to-one relationship between a connection and an RTSP session. Such implementations require clients to use a persistent connection to communicate with the server and when a client closes its connection, the server may remove the RTSP session. This is worth noting if an RTSP 2.0 client also supporting 1.0 connects to a 1.0 server.Appendix I. Changes
This appendix briefly lists the differences between RTSP 1.0 [RFC2326] and RTSP 2.0 for an informational purpose. For implementers of RTSP 2.0, it is recommended to read carefully through this memo and not to rely on the list of changes below to adapt from RTSP 1.0 to RTSP 2.0, as RTSP 2.0 is not intended to be backwards compatible with RTSP 1.0 [RFC2326] other than the version negotiation mechanism.
I.1. Brief Overview
The following protocol elements were removed in RTSP 2.0 compared to RTSP 1.0: o the RECORD and ANNOUNCE methods and all related functionality (including 201 (Created) and 250 (Low On Storage Space) status codes); o the use of UDP for RTSP message transport (due to missing interest and to broken specification); o the use of PLAY method for keep-alive in Play state. The following protocol elements were added or changed in RTSP 2.0 compared to RTSP 1.0: o RTSP session TEARDOWN from the server to the client; o IPv6 support; o extended IANA registries (e.g., transport headers parameters, transport-protocol, profile, lower-transport, and mode); o request pipelining for quick session start-up; o fully reworked state machine; o RTSP messages now use URIs rather than URLs; o incorporated much of related HTTP text ([RFC2616]) in this memo, compared to just referencing the sections in HTTP, to avoid ambiguities; o the REDIRECT method was expanded and diversified for different situations; o Includes a new section about how to set up different media- transport alternatives and their profiles in addition to lower- layer protocols. This caused the appendix on RTP interaction to be moved to the new section instead of being in the part that describes RTP. The section also includes guidelines what to consider when writing usage guidelines for new protocols and profiles;
o Added an asynchronous notification method PLAY_NOTIFY. This method is used by the RTSP server to asynchronously notify clients about session changes while in Play state. To a limited extent, this is comparable with some implementations of ANNOUNCE in RTSP 1.0 not intended for Recording.I.2. Detailed List of Changes
The below changes have been made to RTSP 1.0 (RFC 2326) when defining RTSP 2.0. Note that this list does not reflect minor changes in wording or correction of typographical errors. o The section on minimal implementation was deleted. Instead, the main part of the specification defines the core of RTSP 2.0. o The Transport header has been changed in the following ways: * The ABNF has been changed to define that extensions are possible and that unknown parameters result in servers ignoring the transport specification. * To prevent backwards compatibility issues, any extension or new parameter requires the usage of a feature tag combined with the Require header. * Syntax ambiguities with the Mode parameter have been resolved. * Syntax error with ";" for multicast and unicast has been resolved. * Two new addressing parameters have been defined: src_addr and dest_addr. These replace the parameters "port", "client_port", "server_port", "destination", and "source". * Support for IPv6 explicit addresses in all address fields has been included. * To handle URI definitions that contain ";" or ",", a quoted-URI format has been introduced and is required. * IANA registries for the transport header parameters, transport- protocol, profile, lower-transport, and mode have been defined. * The Transport header's interleaved parameter's text was made more strict and uses formal requirements levels. It was also clarified that the interleaved channels are symmetric and that it is the server that sets the channel numbers.
* It has been clarified that the client can't request of the server to use a certain RTP SSRC, using a request with the transport parameter SSRC. * Syntax definition for SSRC has been clarified to require 8HEX. It has also been extended to allow multiple values for clients supporting this version. * Clarified the text on the Transport header's "dest_addr" parameters regarding what security precautions the server is required to perform. o The Range formats have been changed in the following way: * The NPT format has been given an initial NPT identifier that must now be used. * All formats now support initial open-ended formats of type "npt=-10" and also format only "Range: smpte" ranges for usage with GET_PARAMETER requests. * The npt-hhmmss notation now follows ISO 8601 more strictly. o RTSP message handling has been changed in the following ways: * RTSP messages now use URIs rather than URLs. * It has been clarified that a 4xx message due to a missing CSeq header shall be returned without a CSeq header. * The 300 (Multiple Choices) response code has been removed. * Rules for how to handle the timing out RTSP messages have been added. * Extended Pipelining rules allowing for quick session startup. * Sequence numbering and proxy handling of sequence numbers have been defined, including cases when responses arrive out of order. o The HTTP references have been updated to first RFCs 2616 and 2617 and then to RFC 7230-7235. Most of the text has been copied and then altered to fit RTSP into this specification. The Public and the Content-Base headers have also been imported from RFC 2068 so that they are defined in the RTSP specification. Known effects on RTSP due to HTTP clarifications:
* Content-Encoding header can include encoding of type "identity". o The state machine section has been completely rewritten. It now includes more details and is also more clear about the model used. o An IANA section has been included that contains a number of registries and their rules. This will allow us to use IANA to keep track of RTSP extensions. o The transport of RTSP messages has seen the following changes: * The use of UDP for RTSP message transport has been deprecated due to missing interest and to broken specification. * The rules for how TCP connections are to be handled have been clarified. Now it is made clear that servers should not close the TCP connection unless they have been unused for significant time. * Strong recommendations why servers and clients should use persistent connections have also been added. * There is now a requirement on the servers to handle non- persistent connections as this provides fault tolerance. * Added wording on the usage of Connection:Close for RTSP. * Specified usage of TLS for RTSP messages, including a scheme to approve a proxy's TLS connection to the next hop. o The following header-related changes have been made: * Accept-Ranges response-header has been added. This header clarifies which range formats can be used for a resource. * Fixed the missing definitions for the Cache-Control header. Also added to the syntax definition the missing delta-seconds for max-stale and min-fresh parameters. * Put requirement on CSeq header that the value is increased by one for each new RTSP request. A recommendation to start at 0 has also been added. * Added a requirement that the Date header must be used for all messages with a message body and the Server should always include it.
* Removed the possibility of using Range header with Scale header to indicate when it is to be activated, since it can't work as defined. Also, added a rule that lack of Scale header in a response indicates lack of support for the header. feature tags for scaled playback have been defined. * The Speed header must now be responded to in order to indicate support and the actual speed going to be used. A feature tag is defined. Notes on congestion control were also added. * The Supported header was borrowed from SIP [RFC3261] to help with the feature negotiation in RTSP. * Clarified that the Timestamp header can be used to resolve retransmission ambiguities. * The Session header text has been expanded with an explanation on keep-alive and which methods to use. SET_PARAMETER is now recommended to use if only keep-alive within RTSP is desired. * It has been clarified how the Range header formats are used to indicate pause points in the PAUSE response. * Clarified that RTP-Info URIs that are relative use the Request- URI as base URI. Also clarified that the used URI must be the one that was used in the SETUP request. The URIs are now also required to be quoted. The header also expresses the SSRC for the provided RTP timestamp and sequence number values. * Added text that requires the Range to always be present in PLAY responses. Clarified what should be sent in case of live streams. * The headers table has been updated using a structure borrowed from SIP. Those tables convey much more information and should provide a good overview of the available headers. * It has been clarified that any message with a message body is required to have a Content-Length header. This was the case in RFC 2326, but could be misinterpreted. * ETag has changed its name to MTag. * To resolve functionality around MTag, the MTag and If-None- Match header have been added from HTTP with necessary clarification in regard to RTSP operation.
* Imported the Public header from HTTP (RFC 2068 [RFC2068]) since it has been removed from HTTP due to lack of use. Public is used quite frequently in RTSP. * Clarified rules for populating the Public header so that it is an intersection of the capabilities of all the RTSP agents in a chain. * Added the Media-Range header for listing the current availability of the media range. * Added the Notify-Reason header for giving the reason when sending PLAY_NOTIFY requests. * A new header Seek-Style has been defined to direct and inform how any seek operation should/have been performed. o The Protocol Syntax has been changed in the following way: * All ABNF definitions are updated according to the rules defined in RFC 5234 [RFC5234] and have been gathered in a separate section (Section 20). * The ABNF for the User-Agent and Server headers have been corrected. * Some definitions in the introduction regarding the RTSP session have been changed. * The protocol has been made fully IPv6 capable. * The CHAR rule has been changed to exclude NULL. o The Status codes have been changed in the following ways: * The use of status code 303 (See Other) has been deprecated as it does not make sense to use in RTSP. * The never-defined status code 411 "Length Required" has been completely removed. * When sending response 451 (Parameter Not Understood) and 458 (Parameter Is Read-Only), the response body should contain the offending parameters.
* Clarification on when a 3rr redirect status code can be received has been added. This includes receiving 3rr as a result of a request within an established session. This provides clarification to a previous unspecified behavior. * Removed the 201 (Created) and 250 (Low On Storage Space) status codes as they are only relevant to recording, which is deprecated. * Several new status codes have been defined: 464 (Data Transport Not Ready Yet), 465 (Notification Reason Unknown), 470 (Connection Authorization Required), 471 (Connection Credentials Not Accepted), and 472 (Failure to Establish Secure Connection). o The following functionality has been deprecated from the protocol: * The use of Queued Play. * The use of PLAY method for keep-alive in Play state. * The RECORD and ANNOUNCE methods and all related functionality. Some of the syntax has been removed. * The possibility to use timed execution of methods with the time parameter in the Range header. * The description on how rtspu works is not part of the core specification and will require external description. Only that it exists is mentioned here and some requirements for the transport are provided. o The following changes have been made in relation to methods: * The OPTIONS method has been clarified with regard to the use of the Public and Allow headers. * Added text clarifying the usage of SET_PARAMETER for keep-alive and usage without a body. * PLAY method is now allowed to be pipelined with the pipelining of one or more SETUP requests following the initial that generates the session for aggregated control. * REDIRECT has been expanded and diversified for different situations.
* Added a new method PLAY_NOTIFY. This method is used by the RTSP server to asynchronously notify clients about session changes. o Wrote a new section about how to set up different media-transport alternatives and their profiles as well as lower-layer protocols. This caused the appendix on RTP interaction to be moved to the new section instead of being in the part that describes RTP. The new section also includes guidelines what to consider when writing usage guidelines for new protocols and profiles. o Setup and usage of independent TCP connections for transport of RTP has been specified. o Added a new section describing the available mechanisms to determine if functionality is supported, called "Capability Handling". Renamed option-tags to feature tags. o Added a Contributors section with people who have contributed actual text to the specification. o Added a section "Use Cases" that describes the major use cases for RTSP. o Clarified the usage of a=range and how to indicate live content that are not seekable with this header. o Text specifying the special behavior of PLAY for live content. o Security features of RTSP have been clarified: * HTTP-based authorization has been clarified requiring both Basic and Digest support * TLS support has been mandated * If one implements RTP, then SRTP and defined MIKEY-based key- exchange must be supported * Various minor mitigations discussed or resulted in protocol changes.
Acknowledgements
This memorandum defines RTSP version 2.0, which is a revision of the Proposed Standard RTSP version 1.0 defined in [RFC2326]. The authors of RFC 2326 are Henning Schulzrinne, Anup Rao, and Robert Lanphier. Both RTSP version 1.0 and RTSP version 2.0 borrow format and descriptions from HTTP/1.1. Robert Sparks and especially Elwyn Davies provided very valuable and detailed reviews in the IETF Last Call that greatly improved the document and resolved many issues, especially regarding consistency. This document has benefited greatly from the comments of all those participating in the MMUSIC WG. In addition to those already mentioned, the following individuals have contributed to this specification: Rahul Agarwal, Claudio Allocchio, Jeff Ayars, Milko Boic, Torsten Braun, Brent Browning, Bruce Butterfield, Steve Casner, Maureen Chesire, Jinhang Choi, Francisco Cortes, Elwyn Davies, Spencer Dawkins, Kelly Djahandari, Martin Dunsmuir, Adrian Farrel, Stephen Farrell, Ross Finlayson, Eric Fleischman, Jay Geagan, Andy Grignon, Christian Groves, V. Guruprasad, Peter Haight, Mark Handley, Brad Hefta-Gaub, Volker Hilt, John K. Ho, Patrick Hoffman, Go Hori, Philipp Hoschka, Anne Jones, Ingemar Johansson, Jae-Hwan Kim, Anders Klemets, Ruth Lang, Barry Leiba, Stephanie Leif, Jonathan Lennox, Eduardo F. Llach, Chris Lonvick, Xavier Marjou, Thomas Marshall, Rob McCool, Martti Mela, David Oran, Joerg Ott, Joe Pallas, Maria Papadopouli, Sujal Patel, Ema Patki, Alagu Periyannan, Colin Perkins, Pekka Pessi, Igor Plotnikov, Pete Resnick, Peter Saint-Andre, Holger Schmidt, Jonathan Sergent, Pinaki Shah, David Singer, Lior Sion, Jeff Smith, Alexander Sokolsky, Dale Stammen, John Francis Stracke, Geetha Srikantan, Scott Taylor, David Walker, Stephan Wenger, Dale R. Worley, and Byungjo Yoon, and especially Flemming Andreasen.
Contributors
The following people have made written contributions that were included in the specification: o Tom Marshall contributed text on the usage of 3rr status codes. o Thomas Zheng contributed text on the usage of the Range in PLAY responses and proposed an earlier version of the PLAY_NOTIFY method. o Sean Sheedy contributed text on the timeout behavior of RTSP messages and connections, the 463 (Destination Prohibited) status code, and proposed an earlier version of the PLAY_NOTIFY method. o Greg Sherwood proposed an earlier version of the PLAY_NOTIFY method. o Fredrik Lindholm contributed text about the RTSP security framework. o John Lazzaro contributed the text for RTP over Independent TCP. o Aravind Narasimhan contributed by rewriting "Media-Transport Alternatives" (Appendix C) and making editorial improvements on a number of places in the specification. o Torbjorn Einarsson has done some editorial improvements of the text.
Authors' Addresses
Henning Schulzrinne Columbia University 1214 Amsterdam Avenue New York, NY 10027 United States of America Email: schulzrinne@cs.columbia.edu Anup Rao Cisco United States of America Email: anrao@cisco.com Rob Lanphier San Francisco, CA United States of America Email: robla@robla.net Magnus Westerlund Ericsson Faeroegatan 2 Stockholm SE-164 80 Sweden Email: magnus.westerlund@ericsson.com Martin Stiemerling (editor) University of Applied Sciences Darmstadt Haardtring 100 64295 Darmstadt Germany Email: mls.ietf@gmail.com URI: http://www.stiemerling.org