RFC 8216

HTTP Live Streaming

Pages: 60
Informational

Top   ToC   RFC8216 - Page 1
Independent Submission                                    R. Pantos, Ed.
Request for Comments: 8216                                   Apple, Inc.
Category: Informational                                           W. May
ISSN: 2070-1721                                       MLB Advanced Media
                                                             August 2017


                          HTTP Live Streaming

Abstract

   This document describes a protocol for transferring unbounded streams
   of multimedia data.  It specifies the data format of the files and
   the actions to be taken by the server (sender) and the clients
   (receivers) of the streams.  It describes version 7 of this protocol.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This is a contribution to the RFC Series, independently of any other
   RFC stream.  The RFC Editor has chosen to publish this document at
   its discretion and makes no statement about its value for
   implementation or deployment.  Documents approved for publication by
   the RFC Editor are not a candidate for any level of Internet
   Standard; see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc8216.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

   This document may not be modified, and derivative works of it may not
   be created, except to format it for publication as an RFC or to
   translate it into languages other than English.

Table of Contents

   1. Introduction to HTTP Live Streaming .............................4
   2. Overview ........................................................4
   3. Media Segments ..................................................6
      3.1. Supported Media Segment Formats ............................6
      3.2. MPEG-2 Transport Streams ...................................7
      3.3. Fragmented MPEG-4 ..........................................7
      3.4. Packed Audio ...............................................8
      3.5. WebVTT .....................................................8
   4. Playlists .......................................................9
      4.1. Definition of a Playlist ..................................10
      4.2. Attribute Lists ...........................................11
      4.3. Playlist Tags .............................................12
           4.3.1. Basic Tags .........................................12
                  4.3.1.1. EXTM3U ....................................12
                  4.3.1.2. EXT-X-VERSION .............................12
           4.3.2. Media Segment Tags .................................13
                  4.3.2.1. EXTINF ....................................13
                  4.3.2.2. EXT-X-BYTERANGE ...........................14
                  4.3.2.3. EXT-X-DISCONTINUITY .......................14
                  4.3.2.4. EXT-X-KEY .................................15
                  4.3.2.5. EXT-X-MAP .................................17
                  4.3.2.6. EXT-X-PROGRAM-DATE-TIME ...................18
                  4.3.2.7. EXT-X-DATERANGE ...........................18
                           4.3.2.7.1. Mapping SCTE-35 into
                                      EXT-X-DATERANGE ................20
           4.3.3. Media Playlist Tags ................................22
                  4.3.3.1. EXT-X-TARGETDURATION ......................22
                  4.3.3.2. EXT-X-MEDIA-SEQUENCE ......................22
                  4.3.3.3. EXT-X-DISCONTINUITY-SEQUENCE ..............23
                  4.3.3.4. EXT-X-ENDLIST .............................23
                  4.3.3.5. EXT-X-PLAYLIST-TYPE .......................24
                  4.3.3.6. EXT-X-I-FRAMES-ONLY .......................24
           4.3.4. Master Playlist Tags ...............................25
                  4.3.4.1. EXT-X-MEDIA ...............................25
                           4.3.4.1.1. Rendition Groups ...............28
                  4.3.4.2. EXT-X-STREAM-INF ..........................29
                           4.3.4.2.1. Alternative Renditions .........32
                  4.3.4.3. EXT-X-I-FRAME-STREAM-INF ..................33
                  4.3.4.4. EXT-X-SESSION-DATA ........................34
                  4.3.4.5. EXT-X-SESSION-KEY .........................35
           4.3.5. Media or Master Playlist Tags ......................35
                  4.3.5.1. EXT-X-INDEPENDENT-SEGMENTS ................35
                  4.3.5.2. EXT-X-START ...............................36
   5. Key Files ......................................................37
      5.1. Structure of Key Files ....................................37
      5.2. IV for AES-128 ............................................37
   6. Client/Server Responsibilities .................................37
      6.1. Introduction ..............................................37
      6.2. Server Responsibilities ...................................37
           6.2.1. General Server Responsibilities ....................37
           6.2.2. Live Playlists .....................................40
           6.2.3. Encrypting Media Segments ..........................41
           6.2.4. Providing Variant Streams ..........................42
      6.3. Client Responsibilities ...................................44
           6.3.1. General Client Responsibilities ....................44
           6.3.2. Loading the Media Playlist File ....................44
           6.3.3. Playing the Media Playlist File ....................45
           6.3.4. Reloading the Media Playlist File ..................46
           6.3.5. Determining the Next Segment to Load ...............47
           6.3.6. Decrypting Encrypted Media Segments ................47
   7. Protocol Version Compatibility .................................48
   8. Playlist Examples ..............................................50
      8.1. Simple Media Playlist .....................................50
      8.2. Live Media Playlist Using HTTPS ...........................50
      8.3. Playlist with Encrypted Media Segments ....................51
      8.4. Master Playlist ...........................................51
      8.5. Master Playlist with I-Frames .............................51
      8.6. Master Playlist with Alternative Audio ....................52
      8.7. Master Playlist with Alternative Video ....................52
      8.8. Session Data in a Master Playlist .........................53
      8.9. CHARACTERISTICS Attribute Containing Multiple
           Characteristics ...........................................54
      8.10. EXT-X-DATERANGE Carrying SCTE-35 Tags ....................54
   9. IANA Considerations ............................................54
   10. Security Considerations .......................................55
   11. References ....................................................56
      11.1. Normative References .....................................56
      11.2. Informative References ...................................59
   Contributors ......................................................60
   Authors' Addresses ................................................60

1. Introduction to HTTP Live Streaming

   HTTP Live Streaming provides a reliable, cost-effective means of
   delivering continuous and long-form video over the Internet.  It
   allows a receiver to adapt the bit rate of the media to the current
   network conditions in order to maintain uninterrupted playback at the
   best possible quality.  It supports interstitial content boundaries.
   It provides a flexible framework for media encryption.  It can
   efficiently offer multiple renditions of the same content, such as
   audio translations.  It offers compatibility with large-scale HTTP
   caching infrastructure to support delivery to large audiences.

   Since the Internet-Draft was first posted in 2009, HTTP Live
   Streaming has been implemented and deployed by a wide array of
   content producers, tools vendors, distributors, and device
   manufacturers.  In the subsequent eight years, the protocol has been
   refined by extensive review and discussion with a variety of media
   streaming implementors.

   The purpose of this document is to facilitate interoperability
   between HTTP Live Streaming implementations by describing the media
   transmission protocol.  Using this protocol, a client can receive a
   continuous stream of media from a server for concurrent presentation.

   This document describes version 7 of the protocol.

2. Overview

   A multimedia presentation is specified by a Uniform Resource
   Identifier (URI) [RFC3986] to a Playlist.

   A Playlist is either a Media Playlist or a Master Playlist.  Both are
   UTF-8 text files containing URIs and descriptive tags.

   A Media Playlist contains a list of Media Segments, which, when
   played sequentially, will play the multimedia presentation.

   Here is an example of a Media Playlist:

   #EXTM3U
   #EXT-X-TARGETDURATION:10

   #EXTINF:9.009,
   http://media.example.com/first.ts
   #EXTINF:9.009,
   http://media.example.com/second.ts
   #EXTINF:3.003,
   http://media.example.com/third.ts

   The first line is the format identifier tag #EXTM3U.  The line
   containing #EXT-X-TARGETDURATION says that all Media Segments will be
   10 seconds long or less.  Then, three Media Segments are declared.
   The first and second are 9.009 seconds long; the third is 3.003
   seconds.
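
   The reading of the example above can be sketched in code.  The
   following Python fragment is a minimal illustration only, assuming a
   Playlist that uses just the three tags shown; the function name
   parse_media_playlist and the EXAMPLE constant are hypothetical and
   are not defined by this document.

```python
from typing import List, Tuple

# The example Media Playlist from this section, reproduced verbatim.
EXAMPLE = """#EXTM3U
#EXT-X-TARGETDURATION:10

#EXTINF:9.009,
http://media.example.com/first.ts
#EXTINF:9.009,
http://media.example.com/second.ts
#EXTINF:3.003,
http://media.example.com/third.ts
"""


def parse_media_playlist(text: str) -> Tuple[int, List[Tuple[float, str]]]:
    """Return (target_duration, [(duration_seconds, uri), ...]).

    Hypothetical helper: it understands only #EXTM3U,
    #EXT-X-TARGETDURATION and #EXTINF, and ignores every other tag.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("a Playlist must start with #EXTM3U")

    target_duration = None
    pending_duration = None  # duration from the most recent #EXTINF
    segments: List[Tuple[float, str]] = []

    for line in lines[1:]:
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target_duration = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # Form is "#EXTINF:<duration>,[<title>]"; keep the duration.
            pending_duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif not line.startswith("#"):
            # A non-tag line is a URI; it takes the preceding #EXTINF.
            if pending_duration is None:
                raise ValueError("URI line without a preceding #EXTINF")
            segments.append((pending_duration, line))
            pending_duration = None

    if target_duration is None:
        raise ValueError("missing #EXT-X-TARGETDURATION")
    return target_duration, segments


target, segments = parse_media_playlist(EXAMPLE)
```

   Applied to the example, the sketch yields a target duration of 10
   and three segments whose durations are 9.009, 9.009, and 3.003
   seconds.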

   To play this Playlist, the client first downloads it and then
   downloads and plays each Media Segment declared within it.  The
   client reloads the Playlist as described in this document to discover
   any added segments.  Data SHOULD be carried over HTTP [RFC7230], but,
   in general, a URI can specify any protocol that can reliably transfer
   the specified resource on demand.

   A more complex presentation can be described by a Master Playlist.  A
   Master Playlist provides a set of Variant Streams, each of which
   describes a different version of the same content.

   A Variant Stream includes a Media Playlist that specifies media
   encoded at a particular bit rate, in a particular format, and at a
   particular resolution for media containing video.

   A Variant Stream can also specify a set of Renditions.  Renditions
   are alternate versions of the content, such as audio produced in
   different languages or video recorded from different camera angles.

   Clients should switch between different Variant Streams to adapt to
   network conditions.  Clients should choose Renditions based on user
   preferences.
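
   This section does not prescribe how a client should pick a Variant
   Stream.  As one possible heuristic, the Python sketch below selects
   the highest-bandwidth variant that fits within a fraction of the
   measured throughput.  The Variant type, the choose_variant function,
   the safety factor of 0.8, and the URIs are all assumptions made for
   illustration; the bandwidth field is meant to mirror the BANDWIDTH
   attribute of the EXT-X-STREAM-INF tag (Section 4.3.4.2).

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one Variant Stream of a Master Playlist."""
    bandwidth: int  # bits per second, as advertised by the server
    uri: str        # URI of the variant's Media Playlist


def choose_variant(variants: List[Variant], measured_bps: float,
                   safety: float = 0.8) -> Variant:
    """Pick the best variant that fits within safety * measured_bps.

    Falls back to the lowest-bandwidth variant when nothing fits.
    The safety margin is an assumption, not a value from this document.
    """
    if not variants:
        raise ValueError("no Variant Streams to choose from")
    budget = measured_bps * safety
    affordable = [v for v in variants if v.bandwidth <= budget]
    if affordable:
        return max(affordable, key=lambda v: v.bandwidth)
    return min(variants, key=lambda v: v.bandwidth)


variants = [
    Variant(1_280_000, "low/audio-video.m3u8"),
    Variant(2_560_000, "mid/audio-video.m3u8"),
    Variant(7_680_000, "hi/audio-video.m3u8"),
]
chosen = choose_variant(variants, measured_bps=4_000_000)
```

   With a measured throughput of 4 Mbit/s and the default margin, the
   budget is 3.2 Mbit/s, so the sketch selects the 2.56 Mbit/s variant.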

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3. Media Segments

   A Media Playlist contains a series of Media Segments that make up the
   overall presentation.  A Media Segment is specified by a URI and
   optionally a byte range.

   The duration of each Media Segment is indicated in the Media Playlist
   by its EXTINF tag (Section 4.3.2.1).

   Each segment in a Media Playlist has a unique integer Media Sequence
   Number.  The Media Sequence Number of the first segment in the Media
   Playlist is either 0 or declared in the Playlist (Section 4.3.3.2).
   The Media Sequence Number of every other segment is equal to the
   Media Sequence Number of the segment that precedes it plus one.

   Each Media Segment MUST carry the continuation of the encoded
   bitstream from the end of the segment with the previous Media
   Sequence Number, where values in a series such as timestamps and
   Continuity Counters MUST continue uninterrupted.  The only exceptions
   are the first Media Segment ever to appear in a Media Playlist and
   Media Segments that are explicitly signaled as discontinuities
   (Section 4.3.2.3).  Unmarked media discontinuities can trigger
   playback errors.

   Any Media Segment that contains video SHOULD include enough
   information to initialize a video decoder and decode a continuous set
   of frames that includes the final frame in the Segment; network
   efficiency is optimized if there is enough information in the Segment
   to decode all frames in the Segment.  For example, any Media Segment
   containing H.264 video SHOULD contain an Instantaneous Decoding
   Refresh (IDR); frames prior to the first IDR will be downloaded but
   possibly discarded.
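
   The Media Sequence Number rule above amounts to simple counting.  A
   minimal Python sketch follows; the function name
   assign_sequence_numbers and the segment URIs are hypothetical, and
   the starting value is passed in by the caller rather than parsed
   from a Playlist.

```python
from typing import List, Tuple


def assign_sequence_numbers(uris: List[str],
                            first_sequence_number: int = 0
                            ) -> List[Tuple[int, str]]:
    """Pair each segment URI with its Media Sequence Number.

    The first segment gets first_sequence_number (0 unless the Playlist
    declares another value); each later segment gets the previous
    number plus one.
    """
    if first_sequence_number < 0:
        raise ValueError("Media Sequence Numbers are non-negative")
    return [(first_sequence_number + index, uri)
            for index, uri in enumerate(uris)]


# A Playlist that declares no starting value begins at 0.
numbered = assign_sequence_numbers(["a.ts", "b.ts", "c.ts"])
# numbered == [(0, "a.ts"), (1, "b.ts"), (2, "c.ts")]
```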

3.1. Supported Media Segment Formats

   All Media Segments MUST be in a format described in this section.
   Transport of other media file formats is not defined.

   Some media formats require a common sequence of bytes to initialize a
   parser before a Media Segment can be parsed.  This format-specific
   sequence is called the Media Initialization Section.  The Media
   Initialization Section can be specified by an EXT-X-MAP tag
   (Section 4.3.2.5).  The Media Initialization Section MUST NOT contain
   sample data.

3.2. MPEG-2 Transport Streams

   MPEG-2 Transport Streams are specified by [ISO_13818].

   The Media Initialization Section of an MPEG-2 Transport Stream
   Segment is a Program Association Table (PAT) followed by a Program
   Map Table (PMT).

   Transport Stream Segments MUST contain a single MPEG-2 Program;
   playback of Multi-Program Transport Streams is not defined.  Each
   Transport Stream Segment MUST contain a PAT and a PMT, or have an
   EXT-X-MAP tag (Section 4.3.2.5) applied to it.  The first two
   Transport Stream packets in a Segment without an EXT-X-MAP tag SHOULD
   be a PAT and a PMT.
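
   As an illustration of the recommendation that a Segment begin with a
   PAT, the Python sketch below inspects the first Transport Stream
   packet.  It relies on general MPEG-2 Transport Stream facts that are
   not restated in this document: packets are 188 bytes long, begin with
   the sync byte 0x47, and carry a 13-bit PID, with PID 0 reserved for
   the PAT.  The function names are hypothetical.  The PMT is not
   checked, because its PID is announced inside the PAT and finding it
   requires parsing the table.

```python
TS_PACKET_SIZE = 188  # bytes per MPEG-2 Transport Stream packet
SYNC_BYTE = 0x47      # first byte of every Transport Stream packet
PAT_PID = 0x0000      # PID reserved for the Program Association Table


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID from one Transport Stream packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a Transport Stream packet")
    # The PID is the low 5 bits of byte 1 followed by all of byte 2.
    return ((packet[1] & 0x1F) << 8) | packet[2]


def starts_with_pat(segment: bytes) -> bool:
    """Report whether the first packet of a Segment carries the PAT PID."""
    if len(segment) < TS_PACKET_SIZE:
        return False
    try:
        return packet_pid(segment[:TS_PACKET_SIZE]) == PAT_PID
    except ValueError:
        return False


# Synthetic packets for demonstration: a 4-byte header plus zero padding.
pat_like = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
other = bytes([0x47, 0x41, 0x00, 0x10]) + bytes(184)
```

   Here starts_with_pat(pat_like) is true, while starts_with_pat(other)
   is false because that packet carries PID 0x0100.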

3.3. Fragmented MPEG-4

   MPEG-4 Fragments are specified by the ISO Base Media File Format
   [ISOBMFF].  Unlike regular MPEG-4 files that have a Movie Box
   ('moov') that contains sample tables and a Media Data Box ('mdat')
   containing the corresponding samples, an MPEG-4 Fragment consists of
   a Movie Fragment Box ('moof') containing a subset of the sample table
   and a Media Data Box containing those samples.  Use of MPEG-4
   Fragments does require a Movie Box for initialization, but that Movie
   Box contains only non-sample-specific information such as track and
   sample descriptions.

   A Fragmented MPEG-4 (fMP4) Segment is a "segment" as defined by
   Section 3 of [ISOBMFF], including the constraints on Media Data Boxes
   in Section 8.16 of [ISOBMFF].

   The Media Initialization Section for an fMP4 Segment is an ISO Base
   Media File that can initialize a parser for that Segment.

   Broadly speaking, fMP4 Segments and Media Initialization Sections are
   [ISOBMFF] files that also satisfy the constraints described in this
   section.

   The Media Initialization Section for an fMP4 Segment MUST contain a
   File Type Box ('ftyp') containing a brand that is compatible with
   'iso6' or higher.  The File Type Box MUST be followed by a Movie Box.
   The Movie Box MUST contain a Track Box ('trak') for every Track
   Fragment Box ('traf') in the fMP4 Segment, with matching track_ID.
   Each Track Box SHOULD contain a sample table, but its sample count
   MUST be zero.  Movie Header Boxes ('mvhd') and Track Header Boxes
   ('tkhd') MUST have durations of zero.  A Movie Extends Box ('mvex')
   MUST follow the last Track Box.  Note that a Common Media Application
   Format (CMAF) Header [CMAF] meets all these requirements.

   In an fMP4 Segment, every Track Fragment Box MUST contain a Track
   Fragment Decode Time Box ('tfdt').  fMP4 Segments MUST use
   movie-fragment-relative addressing.  fMP4 Segments MUST NOT use
   external data references.  Note that a CMAF Segment meets these
   requirements.

   An fMP4 Segment in a Playlist containing the EXT-X-I-FRAMES-ONLY tag
   (Section 4.3.3.6) MAY omit the portion of the Media Data Box
   following the intra-coded frame (I-frame) sample data.

   Each fMP4 Segment in a Media Playlist MUST have an EXT-X-MAP tag
   applied to it.
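
   One of the Media Initialization Section constraints in this section,
   that the File Type Box is followed by a Movie Box, can be checked by
   walking the top-level boxes.  The Python sketch below assumes the
   general ISO Base Media File Format box header (a 32-bit big-endian
   size and a four-character type, with size 1 signaling a 64-bit size
   and size 0 meaning "to end of data").  The helper names are
   hypothetical, and the sketch does not verify brands, Track Boxes,
   zero durations, or the Movie Extends Box.

```python
import struct
from typing import Iterator, Tuple


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (demonstration helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated box header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("truncated or malformed box")
        payload = data[offset + header:offset + size]
        yield box_type.decode("ascii", "replace"), payload
        offset += size


def init_section_order_ok(data: bytes) -> bool:
    """Check only that 'ftyp' comes first and 'moov' directly follows."""
    types = [box_type for box_type, _ in iter_boxes(data)]
    return len(types) >= 2 and types[0] == "ftyp" and types[1] == "moov"


# Synthetic data: an 'ftyp' with major brand iso6, then an empty 'moov'.
sample_init = (make_box(b"ftyp", b"iso6" + bytes(4) + b"iso6")
               + make_box(b"moov"))
```

   For sample_init the check succeeds; reversing the two boxes makes it
   fail.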

3.4. Packed Audio

   A Packed Audio Segment contains encoded audio samples and ID3 tags
   that are simply packed together with minimal framing and no
   per-sample timestamps.  Supported Packed Audio formats are Advanced
   Audio Coding (AAC) with Audio Data Transport Stream (ADTS) framing
   [ISO_13818_7], MP3 [ISO_13818_3], AC-3 [AC_3], and Enhanced AC-3
   [AC_3].

   A Packed Audio Segment has no Media Initialization Section.

   Each Packed Audio Segment MUST signal the timestamp of its first
   sample with an ID3 Private frame (PRIV) tag [ID3] at the beginning of
   the segment.  The ID3 PRIV owner identifier MUST be
   "com.apple.streaming.transportStreamTimestamp".  The ID3 payload MUST
   be a 33-bit MPEG-2 Program Elementary Stream timestamp expressed as a
   big-endian eight-octet number, with the upper 31 bits set to zero.
   Clients SHOULD NOT play Packed Audio Segments without this ID3 tag.
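
   The eight-octet payload layout described above can be sketched as
   follows.  The Python fragment handles only the PRIV payload, not the
   surrounding ID3 framing; the function names are hypothetical, and the
   90 kHz conversion reflects the usual MPEG-2 PES timestamp clock,
   which this section does not restate.

```python
import struct

PTS_CLOCK_HZ = 90_000     # MPEG-2 PES timestamps count 90 kHz ticks
MAX_PTS = (1 << 33) - 1   # largest value that fits in 33 bits


def encode_priv_payload(pts: int) -> bytes:
    """Encode a 33-bit timestamp as a big-endian eight-octet number."""
    if not 0 <= pts <= MAX_PTS:
        raise ValueError("timestamp does not fit in 33 bits")
    return struct.pack(">Q", pts)


def decode_priv_payload(payload: bytes) -> int:
    """Decode the payload, rejecting values with any upper bit set."""
    if len(payload) != 8:
        raise ValueError("payload must be exactly eight octets")
    (value,) = struct.unpack(">Q", payload)
    if value >> 33:
        raise ValueError("upper 31 bits must be zero")
    return value


payload = encode_priv_payload(900_000)
seconds = decode_priv_payload(payload) / PTS_CLOCK_HZ  # 10.0
```

   A timestamp of 900000 ticks encodes as the octets
   00 00 00 00 00 0D BB A0 and corresponds to 10 seconds.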

3.5. WebVTT

   A WebVTT Segment is a section of a WebVTT [WebVTT] file.  WebVTT
   Segments carry subtitles.

   The Media Initialization Section of a WebVTT Segment is the WebVTT
   header.

   Each WebVTT Segment MUST contain all subtitle cues that are intended
   to be displayed during the period indicated by the segment EXTINF
   duration.  The start time offset and end time offset of each cue MUST
   indicate the total display time for that cue, even if part of the cue
   time range is outside the Segment period.  A WebVTT Segment MAY
   contain no cues; this indicates that no subtitles are to be displayed
   during that period.

   Each WebVTT Segment MUST either start with a WebVTT header or have an
   EXT-X-MAP tag applied to it.

   In order to synchronize timestamps between audio/video and subtitles,
   an X-TIMESTAMP-MAP metadata header SHOULD be added to each WebVTT
   header.  This header maps WebVTT cue timestamps to MPEG-2 (PES)
   timestamps in other Renditions of the Variant Stream.  Its format is:

   X-TIMESTAMP-MAP=LOCAL:<cue time>,MPEGTS:<MPEG-2 time>
   e.g., X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

   The cue timestamp in the LOCAL attribute MAY fall outside the range
   of time covered by the segment.

   If a WebVTT segment does not have the X-TIMESTAMP-MAP, the client
   MUST assume that the WebVTT cue time of 0 maps to an MPEG-2 timestamp
   of 0.

   When synchronizing WebVTT with PES timestamps, clients SHOULD account
   for cases where the 33-bit PES timestamps have wrapped and the WebVTT
   cue times have not.
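
   The timestamp mapping described above can be sketched in Python.
   The fragment below is a non-authoritative illustration: the function
   names are hypothetical, the 90 kHz tick rate is the usual MPEG-2 PES
   clock rather than a value stated in this section, and the 33-bit
   wrap is modeled by reducing the result modulo 2^33.

```python
from typing import Tuple

PTS_CLOCK_HZ = 90_000   # ticks per second of MPEG-2 PES timestamps
PTS_MODULUS = 1 << 33   # 33-bit PES timestamps wrap at this value
PREFIX = "X-TIMESTAMP-MAP="


def parse_cue_time(text: str) -> float:
    """Convert a WebVTT timestamp ([hh:]mm:ss.ttt) to seconds."""
    parts = text.split(":")
    if len(parts) == 2:
        parts = ["0"] + parts
    if len(parts) != 3:
        raise ValueError("unrecognized cue time: " + text)
    hours, minutes, seconds = parts
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_timestamp_map(line: str) -> Tuple[float, int]:
    """Return (local_seconds, mpegts_ticks) from the metadata header."""
    if not line.startswith(PREFIX):
        raise ValueError("not an X-TIMESTAMP-MAP header")
    local = None
    mpegts = None
    for item in line[len(PREFIX):].split(","):
        # Split on the first colon only; LOCAL values contain colons.
        key, _, value = item.partition(":")
        if key == "LOCAL":
            local = parse_cue_time(value)
        elif key == "MPEGTS":
            mpegts = int(value)
    if local is None or mpegts is None:
        raise ValueError("both LOCAL and MPEGTS are required")
    return local, mpegts


def cue_to_mpegts(cue_seconds: float, local_seconds: float = 0.0,
                  mpegts_ticks: int = 0) -> int:
    """Map a cue time to a 33-bit timestamp, wrapping as PES values do.

    The defaults model a segment without X-TIMESTAMP-MAP, where cue
    time 0 maps to MPEG-2 timestamp 0.
    """
    delta = round((cue_seconds - local_seconds) * PTS_CLOCK_HZ)
    return (mpegts_ticks + delta) % PTS_MODULUS


local, ticks = parse_timestamp_map(
    "X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000")
mapped = cue_to_mpegts(2.5, local, ticks)  # 1125000
```

   With the example header, a cue at 2.5 seconds maps to 1125000
   ticks.  A client comparing such values against PES timestamps from
   other Renditions would apply the same modulo reduction on both
   sides.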


