2. Protocol Overview
This section provides an informative overview of the different mechanisms in the RTSP 2.0 protocol to give the reader a high-level understanding before getting into all the specific details. In case of conflict with this description and the later sections, the later sections take precedence. For more information about use cases considered for RTSP, see Appendix E. RTSP 2.0 is a bidirectional request and response protocol that first establishes a context including content resources (the media) and then controls the delivery of these content resources from the provider to the consumer. RTSP has three fundamental parts: Session Establishment, Media Delivery Control, and an extensibility model described below. The protocol is based on some assumptions about existing functionality to provide a complete solution for client- controlled real-time media delivery. RTSP uses text-based messages, requests and responses, that may contain a binary message body. An RTSP request starts with a method line that identifies the method, the protocol, and version and the resource on which to act. The resource is identified by a URI and the hostname part of the URI is used by RTSP client to resolve the IPv4 or IPv6 address of the RTSP server. Following the method line are a number of RTSP headers. These lines are ended by two consecutive carriage return line feed (CRLF) character pairs. The message body, if present, follows the two CRLF character pairs, and the body's length is described by a message header. RTSP responses are similar, but they start with a response line with the protocol and version followed by a status code and a reason phrase. RTSP messages are sent over a reliable transport protocol between the client and server. RTSP 2.0 requires clients and servers to implement TCP and TLS over TCP as mandatory transports for RTSP messages.
2.1. Presentation Description
RTSP exists to provide access to multimedia presentations and content but tries to be agnostic about the media type or the actual media delivery protocol that is used. To enable a client to implement a complete system, an RTSP-external mechanism for describing the presentation and the delivery protocol(s) is used. RTSP assumes that this description is either delivered completely out of band or as a data object in the response to a client's request using the DESCRIBE method (Section 13.2). Parameters that commonly have to be included in the presentation description are the following: o The number of media streams; o the resource identifier for each media stream/resource that is to be controlled by RTSP; o the protocol that will be used to deliver each media stream; o the transport protocol parameters that are not negotiated or vary with each client; o the media-encoding information enabling a client to correctly decode the media upon reception; and o an aggregate control resource identifier. RTSP uses its own URI schemes ("rtsp" and "rtsps") to reference media resources and aggregates under common control (see Section 4.2). This specification describes in Appendix D how one uses SDP [RFC4566] for describing the presentation.2.2. Session Establishment
The RTSP client can request the establishment of an RTSP session after having used the presentation description to determine which media streams are available, which media delivery protocol is used, and the resource identifiers of the media streams. The RTSP session is a common context between the client and the server that consists of one or more media resources that are to be under common media delivery control. The client creates an RTSP session by sending a request using the SETUP method (Section 13.3) to the server. In the Transport header (Section 18.54) of the SETUP request, the client also includes all
the transport parameters necessary to enable the media delivery protocol to function. This includes parameters that are preestablished by the presentation description but necessary for any middlebox to correctly handle the media delivery protocols. The Transport header in a request may contain multiple alternatives for media delivery in a prioritized list, which the server can select from. These alternatives are typically based on information in the presentation description. When receiving a SETUP request, the server determines if the media resource is available and if one or more of the of the transport parameter specifications are acceptable. If that is successful, an RTSP session context is created and the relevant parameters and state is stored. An identifier is created for the RTSP session and included in the response in the Session header (Section 18.49). The SETUP response includes a Transport header that specifies which of the alternatives has been selected and relevant parameters. A SETUP request that references an existing RTSP session but identifies a new media resource is a request to add that media resource under common control with the already-present media resources in an aggregated session. A client can expect this to work for all media resources under RTSP control within a multimedia content container. However, a server will likely refuse to aggregate resources from different content containers. Even if an RTSP session contains only a single media stream, the RTSP session can be referenced by the aggregate control URI. To avoid an extra round trip in the session establishment of aggregated RTSP sessions, RTSP 2.0 supports pipelined requests; i.e., the client can send multiple requests back-to-back without waiting first for the completion of any of them. The client uses a client- selected identifier in the Pipelined-Requests header (Section 18.33) to instruct the server to bind multiple requests together as if they included the session identifier. The SETUP response also provides additional information about the established sessions in a couple of different headers. The Media- Properties header (Section 18.29) includes a number of properties that apply for the aggregate that is valuable when doing media delivery control and configuring user interface. The Accept-Ranges header (Section 18.5) informs the client about range formats that the server supports for these media resources. The Media-Range header (Section 18.30) informs the client about the time range of the media currently available.
2.3. Media Delivery Control
After having established an RTSP session, the client can start controlling the media delivery. The basic operations are "begin playback", using the PLAY method (Section 13.4) and "suspend (pause) playback" by using the PAUSE method (Section 13.6). PLAY also allows for choosing the starting media position from which the server should deliver the media. The positioning is done by using the Range header (Section 18.40) that supports several different time formats: Normal Play Time (NPT) (Section 4.4.2), Society of Motion Picture and Television Engineers (SMPTE) Timestamps (Section 4.4.1), and absolute time (Section 4.4.3). The Range header also allows the client to specify a position where delivery should end, thus allowing a specific interval to be delivered. The support for positioning/searching within media content depends on the content's media properties. Content exists in a number of different types, such as on-demand, live, and live with simultaneous recording. Even within these categories, there are differences in how the content is generated and distributed, which affect how it can be accessed for playback. The properties applicable for the RTSP session are provided by the server in the SETUP response using the Media-Properties header (Section 18.29). These are expressed using one or several independent attributes. A first attribute is Random- Access, which indicates whether positioning is possible, and with what granularity. Another aspect is whether the content will change during the lifetime of the session. While on-demand content will be provided in full from the beginning, a live stream being recorded results in the length of the accessible content growing as the session goes on. There also exists content that is dynamically built by a protocol other than RTSP and, thus, also changes in steps during the session, but maybe not continuously. Furthermore, when content is recorded, there are cases where the complete content is not maintained, but, for example, only the last hour. All of these properties result in the need for mechanisms that will be discussed below. When the client accesses on-demand content that allows random access, the client can issue the PLAY request for any point in the content between the start and the end. The server will deliver media from the closest random access point prior to the requested point and indicate that in its PLAY response. If the client issues a PAUSE, the delivery will be halted and the point at which the server stopped will be reported back in the response. The client can later resume by sending a PLAY request without a Range header. When the server is about to complete the PLAY request by delivering the end of the content or the requested range, the server will send a PLAY_NOTIFY request (Section 13.5) indicating this.
When playing live content with no extra functions, such as recording, the client will receive the live media from the server after having sent a PLAY request. Seeking in such content is not possible as the server does not store it, but only forwards it from the source of the session. Thus, delivery continues until the client sends a PAUSE request, tears down the session, or the content ends. For live sessions that are being recorded, the client will need to keep track of how the recording progresses. Upon session establishment, the client will learn the current duration of the recording from the Media-Range header. Because the recording is ongoing, the content grows in direct relation to the time passed. Therefore, each server's response to a PLAY request will contain the current Media-Range header. The server should also regularly send (approximately every 5 minutes) the current media range in a PLAY_NOTIFY request (Section 13.5.2). If the live transmission ends, the server must send a PLAY_NOTIFY request with the updated Media- Properties indicating that the content stopped being a recorded live session and instead became on-demand content; the request also contains the final media range. While the live delivery continues, the client can request to play the current live point by using the NPT timescale symbol "now", or it can request a specific point in the available content by an explicit range request for that point. If the requested point is outside of the available interval, the server will adjust the position to the closest available point, i.e., either at the beginning or the end. A special case of recording is that where the recording is not retained longer than a specific time period; thus, as the live delivery continues, the client can access any media within a moving window that covers, for example, "now" to "now" minus 1 hour. A client that pauses on a specific point within the content may not be able to retrieve the content anymore. If the client waits too long before resuming the pause point, the content may no longer be available. In this case, the pause point will be adjusted to the closest point in the available media.2.4. Session Parameter Manipulations
A session may have additional state or functionality that affects how the server or client treats the session or content, how it functions, or feedback on how well the session works. Such extensions are not defined in this specification, but they may be covered in various extensions. RTSP has two methods for retrieving and setting parameter values on either the client or the server: GET_PARAMETER (Section 13.8) and SET_PARAMETER (Section 13.9). These methods carry the parameters in a message body of the appropriate format. One can also use headers to query state with the GET_PARAMETER method. As an
example, clients needing to know the current media range for a time- progressing session can use the GET_PARAMETER method and include the media range. Furthermore, synchronization information can be requested by using a combination of RTP-Info (Section 18.45) and Range (Section 18.40). RTSP 2.0 does not have a strong mechanism for negotiating the headers or parameters and their formats. However, responses will indicate request-headers or parameters that are not supported. A priori determination of what features are available needs to be done through out-of-band mechanisms, like the session description, or through the usage of feature tags (Section 4.5).2.5. Media Delivery
This document specifies how media is delivered with RTP [RFC3550] over UDP [RFC768], TCP [RFC793], or the RTSP connection. Additional protocols may be specified in the future as needed. The usage of RTP as a media delivery protocol requires some additional information to function well. The PLAY response contains information to enable reliable and timely delivery of how a client should synchronize different sources in the different RTP sessions. It also provides a mapping between RTP timestamps and the content- time scale. When the server wants to notify the client about the completion of the media delivery, it sends a PLAY_NOTIFY request to the client. The PLAY_NOTIFY request includes information about the stream end, including the last RTP sequence number for each stream, thus enabling the client to empty the buffer smoothly.2.5.1. Media Delivery Manipulations
The basic playback functionality of RTSP enables delivery of a range of requested content to the client at the pace intended by the content's creator. However, RTSP can also manipulate the delivery to the client in two ways. Scale: The ratio of media-content time delivered per unit of playback time. Speed: The ratio of playback time delivered per unit of wallclock time. Both affect the media delivery per time unit. However, they manipulate two independent timescales and the effects are possible to combine.
Scale (Section 18.46) is used for fast-forward or slow-motion control as it changes the amount of content timescale that should be played back per time unit. Scale > 1.0, means fast forward, e.g., scale = 2.0 results in that 2 seconds of content being played back every second of playback. Scale = 1.0 is the default value that is used if no scale is specified, i.e., playback at the content's original rate. Scale values between 0 and 1.0 provide for slow motion. Scale can be negative to allow for reverse playback in either regular pace (scale = -1.0), fast backwards (scale < -1.0), or slow-motion backwards (-1.0 < scale < 0). Scale = 0 would be equal to pause and is not allowed. In most cases, the realization of scale means server-side manipulation of the media to ensure that the client can actually play it back. The nature of these media manipulations and when they are needed is highly media-type dependent. Let's consider two common media types, audio and video. It is very difficult to modify the playback rate of audio. Typically, no more than a factor of two is possible while maintaining intelligibility by changing the pitch and rate of speech. Music goes out of tune if one tries to manipulate the playback rate by resampling it. This is a well-known problem, and audio is commonly muted or played back in short segments with skips to keep up with the current playback point. For video, it is possible to manipulate the frame rate, although the rendering capabilities are often limited to certain frame rates. Also, the allowed bitrates in decoding, the structure used in the encoding, and the dependency between frames and other capabilities of the rendering device limits the possible manipulations. Therefore, the basic fast-forward capabilities often are implemented by selecting certain subsets of frames. Due to the media restrictions, the possible scale values are commonly restricted to the set of realizable scale ratios. To enable the clients to select from the possible scale values, RTSP can signal the supported scale ratios for the content. To support aggregated or dynamic content, where this may change during the ongoing session and dependent on the location within the content, a mechanism for updating the media properties and the scale factor currently in use, exists. Speed (Section 18.50) affects how much of the playback timeline is delivered in a given wallclock period. The default is Speed = 1 which means to deliver at the same rate the media is consumed. Speed > 1 means that the receiver will get content faster than it regularly would consume it. Speed < 1 means that delivery is slower
than the regular media rate. Speed values of 0 or lower have no meaning and are not allowed. This mechanism enables two general functionalities. One is client-side scale operations, i.e., the client receives all the frames and makes the adjustment to the playback locally. The second is delivery control for the buffering of media. By specifying a speed over 1.0, the client can build up the amount of playback time it has present in its buffers to a level that is sufficient for its needs. A naive implementation of Speed would only affect the transmission schedule of the media and has a clear impact on the needed bandwidth. This would result in the data rate being proportional to the speed factor. Speed = 1.5, i.e., 50% faster than normal delivery, would result in a 50% increase in the data-transport rate. Whether or not that can be supported depends solely on the underlying network path. Scale may also have some impact on the required bandwidth due to the manipulation of the content in the new playback schedule. An example is fast forward where only the independently decodable intra-frames are included in the media stream. This usage of solely intra-frames increases the data rate significantly compared to a normal sequence with the same number of frames, where most frames are encoded using prediction. This potential increase of the data rate needs to be handled by the media sender. The client has requested that the media be delivered in a specific way, which should be honored. However, the media sender cannot ignore if the network path between the sender and the receiver can't handle the resulting media stream. In that case, the media stream needs to be adapted to fit the available resources of the path. This can result in a reduced media quality. The need for bitrate adaptation becomes especially problematic in connection with the Speed semantics. If the goal is to fill up the buffer, the client may not want to do that at the cost of reduced quality. If the client wants to make local playout changes, then it may actually require that the requested speed be honored. To resolve this issue, Speed uses a range so that both cases can be supported. The server is requested to use the highest possible speed value within the range, which is compatible with the available bandwidth. As long as the server can maintain a speed value within the range, it shall not change the media quality, but instead modify the actual delivery rate in response to available bandwidth and reflect this in the Speed value in the response. However, if this is not possible, the server should instead modify the media quality to respect the lowest speed value and the available bandwidth.
This functionality enables the local scaling implementation to use a tight range, or even a range where the lower bound equals the upper bound, to identify that it requires the server to deliver the requested amount of media time per delivery time, independent of how much it needs to adapt the media quality to fit within the available path bandwidth. For buffer filling, it is suitable to use a range with a reasonable span and with a lower bound at the nominal media rate 1.0, such as 1.0 - 2.5. If the client wants to reduce the buffer, it can specify an upper bound that is below 1.0 to force the server to deliver slower than the nominal media rate.2.6. Session Maintenance and Termination
The session context that has been established is kept alive by having the client show liveness. This is done in two main ways: o Media-transport protocol keep-alive. RTP Control Protocol (RTCP) may be used when using RTP. o Any RTSP request referencing the session context. Section 10.5 discusses the methods for showing liveness in more depth. If the client fails to show liveness for more than the established session timeout value (normally 60 seconds), the server may terminate the context. Other values may be selected by the server through the inclusion of the timeout parameter in the session header. The session context is normally terminated by the client sending a TEARDOWN request (Section 13.7) to the server referencing the aggregated control URI. An individual media resource can be removed from a session context by a TEARDOWN request referencing that particular media resource. If all media resources are removed from a session context, the session context is terminated. A client may keep the session alive indefinitely if allowed by the server; however, a client is advised to release the session context when an extended period of time without media delivery activity has passed. The client can re-establish the session context if required later. What constitutes an extended period of time is dependent on the client, server, and their usage. It is recommended that the client terminate the session before ten times the session timeout value has passed. A server may terminate the session after one session timeout period without any client activity beyond keep-alive. When a server terminates the session context, it does so by sending a TEARDOWN request indicating the reason.
A server can also request that the client tear down the session and re-establish it at an alternative server, as may be needed for maintenance. This is done by using the REDIRECT method (Section 13.10). The Terminate-Reason header (Section 18.52) is used to indicate when and why. The Location header indicates where it should connect if there is an alternative server available. When the deadline expires, the server simply stops providing the service. To achieve a clean closure, the client needs to initiate session termination prior to the deadline. In case the server has no other server to redirect to, and it wants to close the session for maintenance, it shall use the TEARDOWN method with a Terminate-Reason header.2.7. Extending RTSP
RTSP is quite a versatile protocol that supports extensions in many different directions. Even this core specification contains several blocks of functionality that are optional to implement. The use case and need for the protocol deployment should determine what parts are implemented. Allowing for extensions makes it possible for RTSP to address additional use cases. However, extensions will affect the interoperability of the protocol; therefore, it is important that they can be added in a structured way. The client can learn the capability of a server by using the OPTIONS method (Section 13.1) and the Supported header (Section 18.51). It can also try and possibly fail using new methods or require that particular features be supported using the Require (Section 18.43) or Proxy-Require (Section 18.37) header. The RTSP, in itself, can be extended in three ways, listed here in increasing order of the magnitude of changes supported: o Existing methods can be extended with new parameters, for example, headers, as long as these parameters can be safely ignored by the recipient. If the client needs negative acknowledgment when a method extension is not supported, a tag corresponding to the extension may be added in the field of the Require or Proxy- Require headers. o New methods can be added. If the recipient of the message does not understand the request, it must respond with error code 501 (Not Implemented) so that the sender can avoid using this method again. A client may also use the OPTIONS method to inquire about methods supported by the server. The server must list the methods it supports using the Public response-header.
o A new version of the protocol can be defined, allowing almost all aspects (except the position of the protocol version number) to change. A new version of the protocol must be registered through a Standards Track document. The basic capability discovery mechanism can be used to both discover support for a certain feature and to ensure that a feature is available when performing a request. For a detailed explanation of this, see Section 11. New media delivery protocols may be added and negotiated at session establishment, in addition to extensions to the core protocol. Certain types of protocol manipulations can be done through parameter formats using SET_PARAMETER and GET_PARAMETER.3. Document Conventions
3.1. Notational Conventions
All the mechanisms specified in this document are described in both prose and the Augmented Backus-Naur form (ABNF) described in detail in [RFC5234]. Indented paragraphs are used to provide informative background and motivation. This is intended to give readers who were not involved with the formulation of the specification an understanding of why things are the way they are in RTSP. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. The word, "unspecified" is used to indicate functionality or features that are not defined in this specification. Such functionality cannot be used in a standardized manner without further definition in an extension specification to RTSP.3.2. Terminology
Aggregate control: The concept of controlling multiple streams using a single timeline, generally one maintained by the server. A client, for example, uses aggregate control when it issues a single play or pause message to simultaneously control both the audio and video in a movie. A session that is under aggregate control is referred to as an "aggregated session".
Aggregate control URI: The URI used in an RTSP request to refer to and control an aggregated session. It normally, but not always, corresponds to the presentation URI specified in the session description. See Section 13.3 for more information. Client: The client is the requester of media service from the media server. Connection: A transport-layer virtual circuit established between two programs for the purpose of communication. Container file: A file that may contain multiple media streams that often constitute a presentation when played together. The concept of a container file is not embedded in the protocol. However, RTSP servers may offer aggregate control on the media streams within these files. Continuous media: Data where there is a timing relationship between source and sink; that is, the sink needs to reproduce the timing relationship that existed at the source. The most common examples of continuous media are audio and motion video. Continuous media can be real time (interactive or conversational), where there is a "tight" timing relationship between source and sink or it can be streaming where the relationship is less strict. Feature tag: A tag representing a certain set of functionality, i.e., a feature. IRI: An Internationalized Resource Identifier is similar to a URI but allows characters from the whole Universal Character Set (Unicode/ISO 10646), rather than the US-ASCII only. See [RFC3987] for more information. Live: A live presentation or session originates media from an event taking place at the same time as the media delivery. Live sessions often have an unbound or only loosely defined duration and seek operations may not be possible. Media initialization: The datatype- or codec-specific initialization. This includes such things as clock rates, color tables, etc. Any transport-independent information that is required by a client for playback of a media stream occurs in the media initialization phase of stream setup. Media parameter: A parameter specific to a media type that may be changed before or during stream delivery.
Media server: The server providing media-delivery services for one or more media streams. Different media streams within a presentation may originate from different media servers. A media server may reside on the same host or on a different host from which the presentation is invoked. (Media) Stream: A single media instance, e.g., an audio stream or a video stream as well as a single whiteboard or shared application group. When using RTP, a stream consists of all RTP and RTCP packets created by a media source within an RTP session. Message: The basic unit of RTSP communication, consisting of a structured sequence of octets matching the syntax defined in Section 20 and transmitted over a transport between RTSP agents. A message is either a request or a response. Message body: The information transferred as the payload of a message (request or response). A message body consists of meta- information in the form of message body headers and content in the form of an arbitrary number of data octets, as described in Section 9. Non-aggregated control: Control of a single media stream. Presentation: A set of one or more streams presented to the client as a complete media feed and described by a presentation description as defined below. Presentations with more than one media stream are often handled in RTSP under aggregate control. Presentation description: A presentation description contains information about one or more media streams within a presentation, such as the set of encodings, network addresses, and information about the content. Other IETF protocols, such as SDP ([RFC4566]), use the term "session" for a presentation. The presentation description may take several different formats, including but not limited to SDP format. Response: An RTSP response to a request. One type of RTSP message. If an HTTP response is meant, it is indicated explicitly. Request: An RTSP request. One type of RTSP message. If an HTTP request is meant, it is indicated explicitly. Request-URI: The URI used in a request to indicate the resource on which the request is to be performed.
RTSP agent: Either an RTSP client, an RTSP server, or an RTSP proxy. In this specification, there are many capabilities that are common to these three entities such as the capability to send requests or receive responses. This term will be used when describing functionality that is applicable to all three of these entities. RTSP session: A stateful abstraction upon which the main control methods of RTSP operate. An RTSP session is a common context; it is created and maintained on a client's request and can be destroyed by either the client or server. It is established by an RTSP server upon the completion of a successful SETUP request (when a 200 OK response is sent) and is labeled with a session identifier at that time. The session exists until timed out by the server or explicitly removed by a TEARDOWN request. An RTSP session is a stateful entity; an RTSP server maintains an explicit session state machine (see Appendix B) where most state transitions are triggered by client requests. The existence of a session implies the existence of state about the session's media streams and their respective transport mechanisms. A given session can have one or more media streams associated with it. An RTSP server uses the session to aggregate control over multiple media streams. Origin server: The server on which a given resource resides. Seeking: Requesting playback from a particular point in the content time line. Transport initialization: The negotiation of transport information (e.g., port numbers, transport protocols) between the client and the server. URI: A Universal Resource Identifier; see [RFC3986]. The URIs used in RTSP are generally URLs as they give a location for the resource. As URLs are a subset of URIs, they will be referred to as URIs to cover also the cases when an RTSP URI would not be a URL. URL: A Universal Resource Locator is a URI that identifies the resource through its primary access mechanism rather than identifying the resource by name or by some other attribute(s) of that resource.
4. Protocol Parameters
4.1. RTSP Version
This specification defines version 2.0 of RTSP. RTSP uses a "<major>.<minor>" numbering scheme to indicate versions of the protocol. The protocol versioning policy is intended to allow the sender to indicate the format of a message and its capacity for understanding further RTSP communication rather than the features obtained via that communication. No change is made to the version number for the addition of message components that do not affect communication behavior or that only add to extensible field values. The <minor> number is incremented when the changes made to the protocol add features that do not change the general message parsing algorithm but that may add to the message semantics and imply additional capabilities of the sender. The <major> number is incremented when the format of a message within the protocol is changed. The version of an RTSP message is indicated by an RTSP- Version field in the first line of the message. Note that the major and minor numbers MUST be treated as separate integers and that each MAY be incremented higher than a single digit. Thus, RTSP/2.4 is a lower version than RTSP/2.13, which, in turn, is lower than RTSP/12.3. Leading zeros SHALL NOT be sent and MUST be ignored by recipients.4.2. RTSP IRI and URI
RTSP 2.0 defines and registers or updates three URI schemes "rtsp", "rtsps", and "rtspu". The usage of the last, "rtspu", is unspecified in RTSP 2.0 and is defined here to register the URI scheme that was defined in RTSP 1.0. The "rtspu" scheme indicates unspecified transport of the RTSP messages over unreliable transport means (UDP in RTSP 1.0). An RTSP server MUST respond with an error code indicating the "rtspu" scheme is not implemented (501) to a request that carries a "rtspu" URI scheme. The details of the syntax of "rtsp" and "rtsps" URIs have been changed from RTSP 1.0. These changes include the addition of: o Support for an IPv6 literal in the host part and future IP literals through a mechanism defined in [RFC3986]. o A new relative format to use in the RTSP elements that is not required to start with "/".
Neither should have any significant impact on interoperability. If IPv6 literals are needed in the RTSP URI, then that RTSP server must be IPv6 capable, and RTSP 1.0 is not a fully IPv6 capable protocol. If an RTSP 1.0 client attempts to process the URI, the URI will not match the allowed syntax, it will be considered invalid, and processing will be stopped. This is clearly a failure to reach the resource; however, it is not a signification issue as RTSP 2.0 support was needed anyway in both server and client. Thus, failure will only occur in a later step when there is an RTSP version mismatch between client and server. The second change will only occur inside RTSP message headers, as the Request-URI must be an absolute URI. Thus, such usages will only occur after an agent has accepted and started processing RTSP 2.0 messages, and an agent using RTSP 1.0 only will not be required to parse such types of relative URIs. This specification also defines the format of RTSP IRIs [RFC3987] that can be used as RTSP resource identifiers and locators on web pages, user interfaces, on paper, etc. However, the RTSP request message format only allows usage of the absolute URI format. The RTSP IRI format MUST use the rules and transformation for IRIs to URIs, as defined in [RFC3987]. This allows a URI that matches the RTSP 2.0 specification, and so is suitable for use in a request, to be created from an RTSP IRI. The RTSP IRI and URI are both syntax restricted compared to the generic syntax defined in [RFC3986] and [RFC3987]: o An absolute URI requires the authority part; i.e., a host identity MUST be provided. o Parameters in the path element are prefixed with the reserved separator ";". The "scheme" and "host" parts of all URIs [RFC3986] and IRIs [RFC3987] are case insensitive. All other parts of RTSP URIs and IRIs are case sensitive, and they MUST NOT be case mapped. The fragment identifier is used as defined in Sections 3.5 and 4.3 of [RFC3986], i.e., the fragment is to be stripped from the IRI by the requester and not included in the Request-URI. The user agent needs to interpret the value of the fragment based on the media type the request relates to; i.e., the media type indicated in Content-Type header in the response to a DESCRIBE request. The syntax of any URI query string is unspecified and responder (usually the server) specific. The query is, from the requester's perspective, an opaque string and needs to be handled as such.
Please note that relative URIs with queries are difficult to handle due to the relative URI handling rules of RFC 3986. Any change of the path element using a relative URI results in the stripping of the query, which means the relative part needs to contain the query. The URI scheme "rtsp" requires that commands be issued via a reliable protocol (within the Internet, TCP), while the scheme "rtsps" identifies a reliable transport using secure transport (TLS [RFC5246]); see Section 19. For the scheme "rtsp", if no port number is provided in the authority part of the URI, the port number 554 MUST be used. For the scheme "rtsps", if no port number is provided in the authority part of the URI port number, the TCP port 322 MUST be used. A presentation or a stream is identified by a textual media identifier, using the character set and escape conventions of URIs [RFC3986]. URIs may refer to a stream or an aggregate of streams; i.e., a presentation. Accordingly, requests described in Section 13 can apply to either the whole presentation or an individual stream within the presentation. Note that some request methods can only be applied to streams, not presentations, and vice versa. For example, the RTSP URI: rtsp://media.example.com:554/twister/audiotrack may identify the audio stream within the presentation "twister", which can be controlled via RTSP requests issued over a TCP connection to port 554 of host media.example.com. Also, the RTSP URI: rtsp://media.example.com:554/twister identifies the presentation "twister", which may be composed of audio and video streams, but could also be something else, such as a random media redirector. This does not imply a standard way to reference streams in URIs. The presentation description defines the hierarchical relationships in the presentation and the URIs for the individual streams. A presentation description may name a stream "a.mov" and the whole presentation "b.mov". The path components of the RTSP URI are opaque to the client and do not imply any particular file system structure for the server.
This decoupling also allows presentation descriptions to be used with non-RTSP media control protocols simply by replacing the scheme in the URI.4.3. Session Identifiers
Session identifiers are strings of a length between 8-128 characters. A session identifier MUST be generated using methods that make it cryptographically random (see [RFC4086]). It is RECOMMENDED that a session identifier contain 128 bits of entropy, i.e., approximately 22 characters from a high-quality generator (see Section 21). However, note that the session identifier does not provide any security against session hijacking unless it is kept confidential by the client, server, and trusted proxies.4.4. Media-Time Formats
RTSP currently supports three different media-time formats defined below. Additional time formats may be specified in the future. These time formats can be used with the Range header (Section 18.40) to request playback and specify at which media position protocol requests actually will or have taken place. They are also used in description of the media's properties using the Media-Range header (Section 18.30). The unqualified format identifier is used on its own in Accept-Ranges header (Section 18.5) to declare supported time formats and also in the Range header (Section 18.40) to request the time format used in the response.4.4.1. SMPTE-Relative Timestamps
A timestamp may use a format derived from a Society of Motion Picture and Television Engineers (SMPTE) specification and expresses time offsets anchored at the start of the media clip. Relative timestamps are expressed as SMPTE time codes [SMPTE-TC] for frame-level access accuracy. The time code has the format: hours:minutes:seconds:frames.subframes with the origin at the start of the clip. The default SMPTE format is "SMPTE 30 drop" format, with a frame rate of 29.97 frames per second. Other SMPTE codes MAY be supported (such as "SMPTE 25") through the use of "smpte-type". For SMPTE 30, the "frames" field in the time value can assume the values 0 through 29. The difference between 30 and 29.97 frames per second is handled by dropping the first two frame indices (values 00 and 01) of every minute, except every tenth minute. If the frame and the subframe values are zero, they may be omitted. Subframes are measured in hundredths of a frame.
Examples: smpte=10:12:33:20- smpte=10:07:33- smpte=10:07:00-10:07:33:05.01 smpte-25=10:07:00-10:07:33:05.014.4.2. Normal Play Time
Normal Play Time (NPT) indicates the stream-absolute position relative to the beginning of the presentation. The timestamp consists of two parts: The mandatory first part may be expressed in either seconds only or in hours, minutes, and seconds. The optional second part consists of a decimal point and decimal figures and indicates fractions of a second. The beginning of a presentation corresponds to 0.0 seconds. Negative values are not defined. The special constant "now" is defined as the current instant of a live event. It MAY only be used for live events and MUST NOT be used for on-demand (i.e., non-live) content. NPT is defined as in Digital Storage Media Command and Control (DSMb;CC) [ISO.13818-6.1995]: Intuitively, NPT is the clock the viewer associates with a program. It is often digitally displayed on a DVD player. NPT advances normally when in normal play mode (scale = 1), advances at a faster rate when in fast-scan forward (high positive scale ratio), decrements when in scan reverse (negative scale ratio) and is fixed in pause mode. NPT is (logically) equivalent to SMPTE time codes. Examples: npt=123.45-125 npt=12:05:35.3- npt=now-
The syntax is based on ISO 8601 [ISO.8601.2000] and expresses the time elapsed since presentation start, with two different notations allowed: o The npt-hhmmss notation uses an ISO 8601 extended complete representation of the time of the day format (Section 5.3.1.1 of [ISO.8601.2000] ) using colons (":") as separators between hours, minutes, and seconds (hh:mm:ss). The hour counter is not limited to 0-24 hours; up to nineteen (19) hour digits are allowed. * In accordance with the requirements of the ISO 8601 time format, the hours, minutes, and seconds MUST all be present, with two digits used for minutes and for seconds and with at least two digits for hours. An NPT of 7 minutes and 0 seconds is represented as "00:07:00", and an NPT of 392 hours, 0 minutes, and 6 seconds is represented as "392:00:06". * RTSP 1.0 allowed NPT in the npt-hhmmss notation without any leading zeros to ensure that implementations don't fail; for backward compatibility, all RTSP 2.0 implementations are REQUIRED to support receiving NPT values, hours, minutes, or seconds, without leading zeros. o The npt-sec notation expresses the time in seconds, using between one and nineteen (19) digits. Both notations allow decimal fractions of seconds as specified in Section 5.3.1.3 of [ISO.8601.2000], using at most nine digits, and allowing only "." (full stop) as the decimal separator. The npt-sec notation is optimized for automatic generation; the npt- hhmmss notation is optimized for consumption by human readers. The "now" constant allows clients to request to receive the live feed rather than the stored or time-delayed version. This is needed since neither absolute time nor zero time are appropriate for this case.4.4.3. Absolute Time
Absolute time is expressed using a timestamp based on ISO 8601 [ISO.8601.2000]. The date is a complete representation of the calendar date in basic format (YYYYMMDD) without separators (per Section 5.2.1.1 of [ISO.8601.2000]). The time of day is provided in the complete representation basic format (hhmmss) as specified in Section 5.3.1.1 of [ISO.8601.2000], allowing decimal fractions of seconds following Section 5.3.1.3 requiring "." (full stop) as decimal separator and limiting the number of digits to no more than nine. The time expressed MUST use UTC (GMT), i.e., no time zone offsets are allowed. The full date and time specification is the
eight-digit date followed by a "T" followed by the six-digit time value, optionally followed by a full stop followed by one to nine fractions of a second and ended by "Z", e.g., YYYYMMDDThhmmss.ssZ. The reasons for this time format rather than using "Date and Time on the Internet: Timestamps" [RFC3339] are historic. We continue to use the format specified in RTSP 1.0. The motivations raised in RFC 3339 apply to why a selection from ISO 8601 was made; however, a different and even more restrictive selection was applied in this case. Below are three examples of media time formats, first, a request for a clock format range request for a starting time of November 8, 1996 at 14 h 37 min and 20 1/4 seconds UTC playing for 10 min and 5 seconds, followed by a Media-Properties header's "Time-Limited" UTC property for the 24th of December 2014 at 15 hours and 00 minutes, and finally a Terminate-Reason header "time" property for the 18th of June 2013 at 16 hours, 12 minutes, and 56 seconds: clock=19961108T143720.25Z-19961108T144725.25Z Time-Limited=20141224T1500Z time=20130618T161256Z4.5. Feature Tags
Feature tags are unique identifiers used to designate features in RTSP. These tags are used in Require (Section 18.43), Proxy-Require (Section 18.37), Proxy-Supported (Section 18.38), Supported (Section 18.51), and Unsupported (Section 18.55) header fields. A feature tag definition MUST indicate which combination of clients, servers, or proxies to which it applies. The creator of a new RTSP feature tag should either prefix the feature tag with a reverse domain name (e.g., "com.example.mynewfeature" is an apt name for a feature whose inventor can be reached at "example.com") or register the new feature tag with the Internet Assigned Numbers Authority (IANA). (See Section 22, "IANA Considerations".) The usage of feature tags is further described in Section 11, which deals with capability handling.
4.6. Message Body Tags
Message body tags are opaque strings that are used to compare two message bodies from the same resource, for example, in caches or to optimize setup after a redirect. Message body tags can be carried in the MTag header (see Section 18.31) or in SDP (see Appendix D.1.9). MTag is similar to ETag in HTTP/1.1 (see Section 3.11 of [RFC2068]). A message body tag MUST be unique across all versions of all message bodies associated with a particular resource. A given message body tag value MAY be used for message bodies obtained by requests on different URIs. The use of the same message body tag value in conjunction with message bodies obtained by requests on different URIs does not imply the equivalence of those message bodies. Message body tags are used in RTSP to make some methods conditional. The methods are made conditional through the inclusion of headers; see Section 18.24 and Section 18.26 for information on the If-Match and If-None-Match headers, respectively. Note that RTSP message body tags apply to the complete presentation, i.e., both the presentation description and the individual media streams. Thus, message body tags can be used to verify at setup time after a redirect that the same session description applies to the media at the new location using the If-Match header.4.7. Media Properties
When an RTSP server handles media, it is important to consider the different properties a media instance for delivery and playback can have. This specification considers the media properties listed below in its protocol operations. They are derived from the differences between a number of supported usages. On-demand: Media that has a fixed (given) duration that doesn't change during the lifetime of the RTSP session and is known at the time of the creation of the session. It is expected that the content of the media will not change, even if the representation, such as encoding, or quality, may change. Generally, one can seek, i.e., request any range, within the media. Dynamic On-demand: This is a variation of the on-demand case where external methods are used to manipulate the actual content of the media setup for the RTSP session. The main example is content defined by a playlist.
Live: Live media represents a progressing content stream (such as broadcast TV) where the duration may or may not be known. It is not seekable, only the content presently being delivered can be accessed. Live with Recording: A live stream that is combined with a server- side capability to store and retain the content of the live session and allow for random access delivery within the part of the already-recorded content. The actual behavior of the media stream is very much dependent on the retention policy for the media stream; either the server will be able to capture the complete media stream or it will have a limitation in how much will be retained. The media range will dynamically change as the session progress. For servers with a limited amount of storage available for recording, there will typically be a sliding window that moves forward while new data is made available and older data is discarded. To cover the above usages, the following media properties with appropriate values are specified.4.7.1. Random Access and Seeking
Random access is the ability to specify and get media delivered starting from any time (instant) within the content, an operation called "seeking". The Media-Properties header will indicate the general capability for a media resource to perform random access. Random-Access: The media is seekable to any out of a large number of points within the media. Due to media-encoding limitations, a particular point may not be reachable, but seeking to a point close by is enabled. A floating-point number of seconds may be provided to express the worst-case distance between random access points. Beginning-Only: Seeking is only possible to the beginning of the content. No-Seeking: Seeking is not possible at all. If random access is possible, as indicated by the Media-Properties header, the actual behavior policy when seeking can be controlled using the Seek-Style header (Section 18.47).
4.7.2. Retention
The following retention policies are used by media to limit possible protocol operations: Unlimited: The media will not be removed as long as the RTSP session is in existence. Time-Limited: The media will not be removed before the given wallclock time. After that time, it may or may not be available anymore. Time-Duration: The media (on fragment or unit basis) will be retained for the specified duration.4.7.3. Content Modifications
The media content and its timeline can be of different types, e.g. pre-produced content on demand, a live source that is being generated as time progresses, or something that is dynamically altered or recomposed during playback. Therefore, a media property for content modifications is needed and the following initial values are defined: Immutable: The content of the media will not change, even if the representation, such as encoding or quality changes. Dynamic: The content can change due to external methods or triggers, such as playlists, but this will be announced by explicit updates. Time-Progressing: As time progresses, new content will become available. If the content is also retained, it will become longer as everything between the start point and the point currently being made available can be accessed. If the media server uses a sliding-window policy for retention, the start point will also change as time progresses.4.7.4. Supported Scale Factors
A particular media content item often supports only a limited set or range of scales when delivering the media. To enable the client to know what values or ranges of scale operations that the whole content or the current position supports, a media properties attribute for this is defined that contains a list with the values or ranges that are supported. The attribute is named "Scales". The "Scales" attribute may be updated at any point in the content due to content consisting of spliced pieces or content being dynamically updated by out-of-band mechanisms.
4.7.5. Mapping to the Attributes
This section shows examples of how one would map the above usages to the properties and their values. Example of On-Demand: Random Access: Random-Access=5.0, Content Modifications: Immutable, Retention: Unlimited or Time-Limited. Example of Dynamic On-Demand: Random Access: Random-Access=3.0, Content Modifications: Dynamic, Retention: Unlimited or Time-Limited. Example of Live: Random Access: No-Seeking, Content Modifications: Time- Progressing, Retention: Time-Duration=0.0 Example of Live with Recording: Random Access: Random-Access=3.0, Content Modifications: Time- Progressing, Retention: Time-Duration=7200.0