
RFC 7798

RTP Payload Format for High Efficiency Video Coding (HEVC)

Pages: 86
Proposed Standard

Top   ToC   RFC7798 - Page 1
Internet Engineering Task Force (IETF)                        Y.-K. Wang
Request for Comments: 7798                                      Qualcomm
Category: Standards Track                                     Y. Sanchez
ISSN: 2070-1721                                               T. Schierl
                                                          Fraunhofer HHI
                                                               S. Wenger
                                                                   Vidyo
                                                        M. M. Hannuksela
                                                                   Nokia
                                                              March 2016


       RTP Payload Format for High Efficiency Video Coding (HEVC)

Abstract

   This memo describes an RTP payload format for the video coding
   standard ITU-T Recommendation H.265 and ISO/IEC International
   Standard 23008-2, both also known as High Efficiency Video Coding
   (HEVC) and developed by the Joint Collaborative Team on Video Coding
   (JCT-VC).  The RTP payload format allows for packetization of one or
   more Network Abstraction Layer (NAL) units in each RTP packet payload
   as well as fragmentation of a NAL unit into multiple RTP packets.
   Furthermore, it supports transmission of an HEVC bitstream over a
   single stream as well as multiple RTP streams.  When multiple RTP
   streams are used, a single transport or multiple transports may be
   utilized.  The payload format has wide applicability in
   videoconferencing, Internet video streaming, and high-bitrate
   entertainment-quality video, among others.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7798.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction ....................................................3
      1.1. Overview of the HEVC Codec .................................4
           1.1.1. Coding-Tool Features ................................4
           1.1.2. Systems and Transport Interfaces ....................6
           1.1.3. Parallel Processing Support ........................11
           1.1.4. NAL Unit Header ....................................13
      1.2. Overview of the Payload Format ............................14
   2. Conventions ....................................................15
   3. Definitions and Abbreviations ..................................15
      3.1. Definitions ...............................................15
           3.1.1. Definitions from the HEVC Specification ...........15
           3.1.2. Definitions Specific to This Memo .................17
      3.2. Abbreviations .............................................19
   4. RTP Payload Format .............................................20
      4.1. RTP Header Usage ..........................................20
      4.2. Payload Header Usage ......................................22
      4.3. Transmission Modes ........................................23
      4.4. Payload Structures ........................................24
           4.4.1. Single NAL Unit Packets ............................24
           4.4.2. Aggregation Packets (APs) ..........................25
           4.4.3. Fragmentation Units ................................29
           4.4.4. PACI Packets .......................................32
                  4.4.4.1. Reasons for the PACI Rules (Informative) ..34
                  4.4.4.2. PACI Extensions (Informative) .............35
      4.5. Temporal Scalability Control Information ..................36
      4.6. Decoding Order Number .....................................37
   5. Packetization Rules ............................................39
   6. De-packetization Process .......................................40
   7. Payload Format Parameters ......................................42
      7.1. Media Type Registration ...................................42
      7.2. SDP Parameters ............................................64
           7.2.1. Mapping of Payload Type Parameters to SDP ..........64
           7.2.2. Usage with SDP Offer/Answer Model ..................65
           7.2.3. Usage in Declarative Session Descriptions ..........73
           7.2.4. Considerations for Parameter Sets ..................75
           7.2.5. Dependency Signaling in Multi-Stream Mode ..........75
   8. Use with Feedback Messages .....................................75
      8.1. Picture Loss Indication (PLI) .............................75
      8.2. Slice Loss Indication (SLI) ...............................76
      8.3. Reference Picture Selection Indication (RPSI) .............77
      8.4. Full Intra Request (FIR) ..................................77
   9. Security Considerations ........................................78
   10. Congestion Control ............................................79
   11. IANA Considerations ...........................................80
   12. References ....................................................80
      12.1. Normative References .....................................80
      12.2. Informative References ...................................82
   Acknowledgments ...................................................85
   Authors' Addresses ................................................86


1. Introduction

   The High Efficiency Video Coding specification, formally published as
   both ITU-T Recommendation H.265 [HEVC] and ISO/IEC International
   Standard 23008-2 [ISO23008-2], was ratified by the ITU-T in April
   2013; reportedly, it provides significant coding efficiency gains
   over H.264 [H.264].

   This memo describes an RTP payload format for HEVC.  It shares its
   basic design with the RTP payload formats of [RFC6184] and [RFC6190].
   With respect to design philosophy, security, congestion control, and
   overall implementation complexity, it has similar properties to those
   earlier payload format specifications.  This is a conscious choice,
   as at least RFC 6184 is widely deployed and generally known in the
   relevant implementer communities.  Mechanisms from RFC 6190 were
   incorporated as HEVC version 1 supports temporal scalability.

   In order to help the overlapping implementer community, frequently
   only the differences between RFCs 6184 and 6190 and the HEVC payload
   format are highlighted in non-normative, explanatory parts of this
   memo.  Basic familiarity with both specifications is assumed for
   those parts.  However, the normative parts of this memo do not
   require study of RFCs 6184 or 6190.

1.1. Overview of the HEVC Codec

   H.264 and HEVC share a similar hybrid video codec design.  In this
   memo, we provide a very brief overview of those features of HEVC that
   are, in some form, addressed by the payload format specified herein.
   Implementers have to read, understand, and apply the ITU-T/ISO/IEC
   specifications pertaining to HEVC to arrive at interoperable,
   well-performing implementations.  Implementers should consider
   testing their design (including the interworking between the payload
   format implementation and the core video codec) using the tools
   provided by ITU-T/ISO/IEC, for example, conformance bitstreams as
   specified in [H.265.1].  Not doing so has historically led to systems
   that perform badly and that are not secure.

   Conceptually, both H.264 and HEVC include a Video Coding Layer (VCL),
   which is often used to refer to the coding-tool features, and a
   Network Abstraction Layer (NAL), which is often used to refer to the
   systems and transport interface aspects of the codecs.

1.1.1. Coding-Tool Features

   Similar to earlier hybrid-video-coding-based standards, including
   H.264, the following basic video coding design is employed by HEVC.
   A prediction signal is first formed by either intra- or
   motion-compensated prediction, and the residual (the difference
   between the original and the prediction) is then coded.  The gains in
   coding efficiency are achieved by redesigning and improving almost
   all parts of the codec over earlier designs.  In addition, HEVC
   includes several tools to make the implementation on parallel
   architectures easier.  Below is a summary of HEVC coding-tool
   features.

   Quad-tree block and transform structure

   One of the major tools that contributes significantly to the coding
   efficiency of HEVC is the use of flexible coding blocks and
   transforms, which are defined in a hierarchical quad-tree manner.
   Unlike H.264, where the basic coding block is a macroblock of fixed
   size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
   of 64x64.  Each CTU can be divided into smaller units in a
   hierarchical quad-tree manner and can represent smaller blocks down
   to size 4x4.  Similarly, the transforms used in HEVC can have
   different sizes, starting from 4x4 and going up to 32x32.  Utilizing
   large blocks and transforms contributes to the major gain of HEVC,
   especially at high resolutions.
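
   As a rough illustration of the block sizes mentioned above, the
   following sketch counts how many fixed-size blocks are needed to
   cover a picture.  The function name and the 1920x1080 example
   resolution are assumptions made for illustration; they do not come
   from the HEVC specification.

```python
def block_count(width: int, height: int, block_size: int) -> int:
    """Number of block_size x block_size blocks needed to cover a picture.

    Partial blocks at the right and bottom picture edges count as whole
    blocks, hence the ceiling division.
    """
    blocks_wide = -(-width // block_size)   # ceiling division
    blocks_high = -(-height // block_size)
    return blocks_wide * blocks_high


if __name__ == "__main__":
    # Illustrative 1920x1080 picture: H.264 macroblocks vs. maximum-size CTUs.
    print("16x16 macroblocks:", block_count(1920, 1080, 16))
    print("64x64 CTUs:       ", block_count(1920, 1080, 64))
```

   Each CTU may then be split further by the quad-tree, so the CTU count
   is only the number of top-level units, not the number of coded
   blocks.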
   Entropy coding

   HEVC uses a single entropy-coding engine, which is based on Context
   Adaptive Binary Arithmetic Coding (CABAC) [CABAC], whereas H.264 uses
   two distinct entropy coding engines.  CABAC in HEVC shares many
   similarities with CABAC of H.264, but contains several improvements.
   Those include improvements in coding efficiency and lowered
   implementation complexity, especially for parallel architectures.

   In-loop filtering

   H.264 includes an in-loop adaptive deblocking filter, where the
   blocking artifacts around the transform edges in the reconstructed
   picture are smoothed to improve the picture quality and compression
   efficiency.  In HEVC, a similar deblocking filter is employed but
   with somewhat lower complexity.  In addition, pictures undergo a
   subsequent filtering operation called Sample Adaptive Offset (SAO),
   which is a new design element in HEVC.  SAO basically adds a pixel-
   level offset in an adaptive manner and usually acts as a de-ringing
   filter.  It is observed that SAO improves the picture quality,
   especially around sharp edges, contributing substantially to visual
   quality improvements of HEVC.

   Motion prediction and coding

   There have been a number of improvements in this area that are
   summarized as follows.  The first category is motion merge and
   Advanced Motion Vector Prediction (AMVP) modes.  The motion
   information of a prediction block can be inferred from the spatially
   or temporally neighboring blocks.  This is similar to the DIRECT mode
   in H.264 but includes new aspects to incorporate the flexible quad-
   tree structure and methods to improve the parallel implementations.
   In addition, the motion vector predictor can be signaled for improved
   efficiency.  The second category is high-precision interpolation.
   The interpolation filter length is increased to 8-tap from 6-tap,
   which improves the coding efficiency but also comes with increased
   complexity.  In addition, the interpolation filter is defined with
   higher precision without any intermediate rounding operations to
   further improve the coding efficiency.

   Intra prediction and intra-coding

   Compared to the 8 directional intra prediction modes in H.264, HEVC
   supports angular
   intra prediction with 33 directions.  This increased flexibility
   improves both objective coding efficiency and visual quality as the
   edges can be better predicted and ringing artifacts around the edges
   can be reduced.  In addition, the reference samples are adaptively
   smoothed based on the prediction direction.  To avoid contouring
   artifacts, a new interpolative prediction generation is included to
   improve the visual quality.  Furthermore, a Discrete Sine Transform
   (DST) is utilized instead of the traditional Discrete Cosine
   Transform (DCT) for 4x4 intra-transform blocks.

   Other coding-tool features

   HEVC includes some tools for lossless coding and efficient screen-
   content coding, such as skipping the transform for certain blocks.
   These tools are particularly useful, for example, when streaming the
   user interface of a mobile device to a large display.

1.1.2. Systems and Transport Interfaces

   HEVC inherited the basic systems and transport interfaces designs
   from H.264.  These include the NAL-unit-based syntax structure, the
   hierarchical syntax and data unit structure, the Supplemental
   Enhancement Information (SEI) message mechanism, and the video
   buffering model based on the Hypothetical Reference Decoder (HRD).
   The hierarchical syntax and data unit structure consists of
   sequence-level parameter sets, multi-picture-level or picture-level
   parameter sets, slice-level header parameters, and lower-level
   parameters.  In the following, a list of differences in these aspects
   compared to H.264 is summarized.

   Video parameter set

   A new type of parameter set, called Video Parameter Set (VPS), was
   introduced.  For the first (2013) version of [HEVC], the VPS NAL unit
   is required to be available prior to its activation, while the
   information contained in the VPS is not necessary for operation of
   the decoding process.  For future HEVC extensions, such as the 3D or
   scalable extensions, the VPS is expected to include information
   necessary for operation of the decoding process, e.g., decoding
   dependency or information for reference picture set construction of
   enhancement layers.  The VPS provides a "big picture" of a bitstream,
   including what types of operation points are provided, the profile,
   tier, and level of the operation points, and some other high-level
   properties of the bitstream that can be used as the basis for session
   negotiation and content selection, etc. (see Section 7.1).

   Profile, tier, and level

   The profile, tier, and level syntax structure that can be included in
   both the VPS and Sequence Parameter Set (SPS) includes 12 bytes of
   data to describe the entire bitstream (including all temporally
   scalable layers, which are referred to as sub-layers in the HEVC
   specification), and can optionally include more profile, tier, and
   level information pertaining to individual temporally scalable
   layers.  The profile indicator shows the "best viewed as" profile
   when the bitstream conforms to multiple profiles, similar to the
   major brand concept in the ISO Base Media File Format (ISOBMFF)
   [IS014496-12] [IS015444-12] and file formats derived based on
   ISOBMFF, such as the 3GPP file format [3GPPFF].  The profile, tier,
   and level syntax structure also includes indications such as 1)
   whether the bitstream is free of frame-packed content, 2) whether the
   bitstream is free of interlaced source content, and 3) whether the
   bitstream is free of field pictures.  When the answer is yes for both
   2) and 3), the bitstream contains only frame pictures of progressive
   source.  Based on these indications, clients/players without support
   of post-processing functionalities for the handling of frame-packed,
   interlaced source content or field pictures can reject those
   bitstreams that contain such pictures.
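
   The rejection logic described above can be sketched as follows.  This
   is a minimal illustration only: the class, field, and function names
   are hypothetical and do not correspond to syntax element names in
   [HEVC]; the three booleans stand for the three indications listed in
   the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceIndications:
    """The three indications listed above (field names are hypothetical)."""
    free_of_frame_packed: bool
    free_of_interlaced_source: bool
    free_of_field_pictures: bool


def is_progressive_frames_only(ind: SourceIndications) -> bool:
    """True when indications 2) and 3) both hold."""
    return ind.free_of_interlaced_source and ind.free_of_field_pictures


def player_accepts(ind: SourceIndications, *,
                   handles_frame_packed: bool = False,
                   handles_interlaced: bool = False,
                   handles_fields: bool = False) -> bool:
    """Reject a bitstream that may contain content the player cannot post-process."""
    if not ind.free_of_frame_packed and not handles_frame_packed:
        return False
    if not ind.free_of_interlaced_source and not handles_interlaced:
        return False
    if not ind.free_of_field_pictures and not handles_fields:
        return False
    return True


if __name__ == "__main__":
    interlaced = SourceIndications(True, False, False)
    print(player_accepts(interlaced))  # basic player, no post-processing
    print(player_accepts(interlaced, handles_interlaced=True,
                         handles_fields=True))
```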

   Bitstream and elementary stream

   HEVC includes a definition of an elementary stream, which is new
   compared to H.264.  An elementary stream consists of a sequence of
   one or more bitstreams.  An elementary stream that consists of two or
   more bitstreams has typically been formed by splicing together two or
   more bitstreams (or parts thereof).  When an elementary stream
   contains more than one bitstream, the last NAL unit of the last
   access unit of each bitstream (except the last bitstream in the
   elementary stream) must be an end of bitstream NAL unit, and the
   first access unit of the subsequent bitstream must be an Intra-Random
   Access Point (IRAP) access unit.  This IRAP access unit may be a
   Clean Random Access (CRA), Broken Link Access (BLA), or Instantaneous
   Decoding Refresh (IDR) access unit.

   Random access support

   HEVC includes signaling in the NAL unit header, through NAL unit
   types, of IRAP pictures beyond IDR pictures.  Three types of IRAP
   pictures, namely IDR, CRA, and BLA pictures, are supported: IDR
   pictures are conventionally referred to as closed group-of-pictures
   (closed-GOP) random access points whereas CRA and BLA pictures are
   conventionally referred to as open-GOP random access points.  BLA
   pictures usually originate from splicing of two bitstreams or parts
   thereof at a CRA picture, e.g., during stream switching.  To enable
   better systems usage of IRAP pictures, altogether six different NAL
   units are defined to signal the properties of the IRAP pictures,
   which can be used to better match the stream access point types as
   defined in the ISOBMFF [IS014496-12] [IS015444-12], which are
   utilized for random access support in both 3GP-DASH [3GPDASH] and
   MPEG DASH [MPEGDASH].  Pictures following an IRAP picture in decoding
   order and preceding the IRAP picture in output order are referred to
   as leading pictures associated with the IRAP picture.  There are two
   types of leading pictures: Random Access Decodable Leading (RADL)
   pictures and Random Access Skipped Leading (RASL) pictures.  RADL
   pictures are decodable when decoding starts at the associated IRAP
   picture; RASL pictures are not decodable when decoding starts at the
   associated IRAP picture and are usually discarded.
   HEVC provides mechanisms to enable specifying the conformance of a
   bitstream wherein the originally present RASL pictures have been
   discarded.  Consequently, system components can discard RASL
   pictures, when needed, without worrying about causing the bitstream
   to become non-compliant.
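
   A system component that discards RASL pictures after random access
   could look roughly like the sketch below.  It assumes one
   nal_unit_type value per picture, listed in decoding order and
   starting at the IRAP picture where decoding begins; the numeric
   values are the nal_unit_type assignments of [HEVC].  Treating RASL
   pictures associated with a BLA picture as undecodable reflects the
   broken-link semantics of [HEVC] rather than anything stated in this
   overview, and the function name is hypothetical.

```python
# nal_unit_type values of the six IRAP NAL unit types defined in [HEVC].
IRAP_TYPES = {
    16: "BLA_W_LP", 17: "BLA_W_RADL", 18: "BLA_N_LP",
    19: "IDR_W_RADL", 20: "IDR_N_LP", 21: "CRA_NUT",
}
BLA_TYPES = {16, 17, 18}
RADL_TYPES = {6, 7}   # RADL_N, RADL_R: always decodable after the IRAP
RASL_TYPES = {8, 9}   # RASL_N, RASL_R: may reference pictures before the IRAP


def drop_undecodable_rasl(picture_types):
    """Return the picture types that remain after discarding RASL pictures.

    RASL pictures are dropped when their associated IRAP picture is the
    one decoding started at, or is a BLA picture.  RASL pictures after a
    later CRA picture are kept, since their references were decoded.
    """
    kept = []
    skip_rasl = False
    first_irap = True
    for t in picture_types:
        if t in IRAP_TYPES:
            skip_rasl = first_irap or t in BLA_TYPES
            first_irap = False
        if skip_rasl and t in RASL_TYPES:
            continue
        kept.append(t)
    return kept


if __name__ == "__main__":
    # CRA, two RASL, one RADL, a trailing picture, then a second CRA.
    print(drop_undecodable_rasl([21, 8, 9, 6, 1, 21, 9, 1]))
```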

   Temporal scalability support

   HEVC includes improved support for temporal scalability, through
   the signaling of TemporalId in the NAL unit header, the
   restriction that pictures of a particular temporal sub-layer cannot
   be used for inter prediction reference by pictures of a lower
   temporal sub-layer, the sub-bitstream extraction process, and the
   requirement that each sub-bitstream extraction output be a conforming
   bitstream.  Media-Aware Network Elements (MANEs) can utilize the
   TemporalId in the NAL unit header for stream adaptation purposes
   based on temporal scalability.
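
   The following sketch shows how a MANE might thin a stream by
   TemporalId.  It assumes the two-byte NAL unit header layout of
   Figure 1 in Section 1.1.4 (F: 1 bit, Type: 6 bits, LayerId: 6 bits,
   TID: 3 bits) and that TemporalId equals TID minus 1.  The class and
   function names are hypothetical, and a real MANE would also have to
   honor the packetization rules of this memo.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int          # forbidden_zero_bit
    nal_type: int   # 6-bit NAL unit type
    layer_id: int   # 6-bit layer ID
    tid: int        # 3-bit TID (TemporalId plus 1)


def parse_nal_header(nal: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into the four header fields."""
    if len(nal) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    hdr = (nal[0] << 8) | nal[1]
    return NalHeader(
        f=hdr >> 15,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x07,
    )


def temporal_id(nal: bytes) -> int:
    return parse_nal_header(nal).tid - 1


def keep_up_to(nal_units, max_temporal_id: int):
    """Keep only NAL units whose TemporalId does not exceed the target."""
    return [n for n in nal_units if temporal_id(n) <= max_temporal_id]


if __name__ == "__main__":
    units = [bytes([0x40, 0x01]), bytes([0x02, 0x01]), bytes([0x02, 0x03])]
    for u in keep_up_to(units, max_temporal_id=0):
        print(parse_nal_header(u))
```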

   Temporal sub-layer switching support

   HEVC specifies, through NAL unit types present in the NAL unit
   header, the signaling of Temporal Sub-layer Access (TSA) and Step-
   wise Temporal Sub-layer Access (STSA).  A TSA picture and pictures
   following the TSA picture in decoding order do not use pictures prior
   to the TSA picture in decoding order with TemporalId greater than or
   equal to that of the TSA picture for inter prediction reference.  A
   TSA picture enables up-switching, at the TSA picture, to the sub-
   layer containing the TSA picture or any higher sub-layer, from the
   immediately lower sub-layer.  An STSA picture does not use pictures
   with the same TemporalId as the STSA picture for inter prediction
   reference.  Pictures following an STSA picture in decoding order with
   the same TemporalId as the STSA picture do not use pictures prior to
   the STSA picture in decoding order with the same TemporalId as the
   STSA picture for inter prediction reference.  An STSA picture enables
   up-switching, at the STSA picture, to the sub-layer containing the
   STSA picture, from the immediately lower sub-layer.

   Sub-layer reference or non-reference pictures

   The concept and signaling of reference/non-reference pictures in HEVC
   are different from H.264.  In H.264, if a picture may be used by any
   other picture for inter prediction reference, it is a reference
   picture; otherwise, it is a non-reference picture, and this is
   signaled by two bits in the NAL unit header.  In HEVC, a picture is
   called a reference picture only when it is marked as "used for
   reference".  In addition, the concept of sub-layer reference picture
   was introduced.  If a picture may be used by another picture
   with the same TemporalId for inter prediction reference, it is a sub-
   layer reference picture; otherwise, it is a sub-layer non-reference
   picture.  Whether a picture is a sub-layer reference picture or sub-
   layer non-reference picture is signaled through NAL unit type values.
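
   As an illustration of this signaling, the sketch below relies on the
   nal_unit_type assignment in [HEVC], where the even values from 0 to
   14 (the "_N" types, including reserved ones) denote sub-layer
   non-reference pictures and the odd values from 1 to 15 denote
   sub-layer reference pictures.  The function name is hypothetical.

```python
def is_sublayer_non_reference(nal_unit_type: int) -> bool:
    """True for TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N and reserved _N types.

    These are the even nal_unit_type values in the range 0..14.  IRAP
    pictures (16..23) and non-VCL NAL units (32 and above) are never
    sub-layer non-reference pictures.
    """
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


if __name__ == "__main__":
    for name, value in [("TRAIL_N", 0), ("TRAIL_R", 1),
                        ("RASL_N", 8), ("CRA_NUT", 21)]:
        print(name, is_sublayer_non_reference(value))
```

   A sender or MANE that has to shed load within the highest sub-layer
   it forwards could drop such pictures first, since no other picture
   of the same sub-layer depends on them.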

   Extensibility

   Besides the TemporalId in the NAL unit header, HEVC also includes the
   signaling of a six-bit layer ID in the NAL unit header, which must be
   equal to 0 for a single-layer bitstream.  Extension mechanisms have
   been included in the VPS, SPS, Picture Parameter Set (PPS), SEI NAL
   unit, slice headers, and so on.  All these extension mechanisms
   enable future extensions in a backward-compatible manner, such that
   bitstreams encoded according to potential future HEVC extensions can
   be fed to then-legacy decoders (e.g., HEVC version 1 decoders), and
   the then-legacy decoders can decode and output the base-layer
   bitstream.

   Bitstream extraction

   HEVC includes a bitstream-extraction process as an integral part of
   the overall decoding process.  The bitstream extraction process is
   used in the process of bitstream conformance tests, which is part of
   the HRD buffering model.

   Reference picture management

   The reference picture management of HEVC, including reference picture
   marking and removal from the Decoded Picture Buffer (DPB) as well as
   Reference Picture List Construction (RPLC), differs from that of
   H.264.  Instead of the reference picture marking mechanism based on a
   sliding window plus adaptive Memory Management Control Operation
   (MMCO) described in H.264, HEVC specifies a reference picture
   management and marking mechanism based on Reference Picture Set
   (RPS), and the RPLC is consequently based on the RPS mechanism.  An
   RPS consists of a set of reference pictures associated with a
   picture, consisting of all reference pictures that are prior to the
   associated picture in decoding order, that may be used for inter
   prediction of the associated picture or any picture following the
   associated picture in decoding order.  The reference picture set
   consists of five lists of reference pictures: RefPicSetStCurrBefore,
   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr, and
   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter, and
   RefPicSetLtCurr contain all reference pictures that may be used in
   inter prediction of the current picture and that may be used in inter
   prediction of one or more of the pictures following the current
   picture in decoding order.  RefPicSetStFoll and RefPicSetLtFoll
   consist of all reference pictures that are not used in inter
   prediction of the current picture but may be used in inter prediction
   of one or more of the pictures following the current picture in
   decoding order.  RPS provides an "intra-coded" signaling of the DPB
   status, instead of an "inter-coded" signaling, mainly for improved
   error resilience.  The RPLC process in HEVC is based on the RPS, by
   signaling an index to an RPS subset for each reference index; this
   process is simpler than the RPLC process in H.264.
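
   The "intra-coded" nature of RPS signaling can be sketched as follows:
   because every picture carries the complete set of reference pictures
   it expects, a decoder can recompute the DPB reference state from the
   current RPS alone and notice losses immediately.  The data model
   below is a simplification made for illustration; pictures are
   identified by hypothetical picture-order-count integers, and
   pictures held in the DPB only for output are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePictureSet:
    """The five RPS lists named above, holding picture identifiers."""
    st_curr_before: list[int] = field(default_factory=list)
    st_curr_after: list[int] = field(default_factory=list)
    st_foll: list[int] = field(default_factory=list)
    lt_curr: list[int] = field(default_factory=list)
    lt_foll: list[int] = field(default_factory=list)

    def current(self) -> set[int]:
        """Pictures that may be referenced by the current picture."""
        return set(self.st_curr_before + self.st_curr_after + self.lt_curr)

    def all_pictures(self) -> set[int]:
        """Every picture that must stay marked as a reference."""
        return self.current() | set(self.st_foll + self.lt_foll)


def apply_rps(dpb: set[int], rps: ReferencePictureSet):
    """Return (kept, missing) for the DPB contents under the given RPS.

    kept:    pictures in the DPB that stay marked as reference pictures
    missing: pictures the RPS expects but the DPB lacks (e.g., lost)
    """
    wanted = rps.all_pictures()
    return dpb & wanted, wanted - dpb


if __name__ == "__main__":
    rps = ReferencePictureSet(st_curr_before=[4], st_curr_after=[8],
                              st_foll=[2])
    kept, missing = apply_rps({0, 4, 8, 16}, rps)
    print("kept:", sorted(kept), "missing:", sorted(missing))
```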

   Ultra-low delay support

   HEVC specifies a sub-picture-level HRD operation, for support of the
   so-called ultra-low delay.  The mechanism specifies a standard-
   compliant way to enable delay reduction below a one-picture interval.
   Coded Picture Buffer (CPB) and DPB parameters at the sub-picture
   level may be signaled, and utilization of this information for the
   derivation of CPB timing (wherein the CPB removal time corresponds to
   decoding time) and DPB output timing (display time) is specified.
   Decoders are allowed to operate the HRD at the conventional access-
   unit level, even when the sub-picture-level HRD parameters are
   present.

   New SEI messages

   HEVC inherits many H.264 SEI messages with changes in syntax and/or
   semantics making them applicable to HEVC.  Additionally, there are a
   few new SEI messages reviewed briefly in the following paragraphs.

   The display orientation SEI message informs the decoder of a
   transformation that is recommended to be applied to the cropped
   decoded picture prior to display, such that the pictures can be
   properly displayed, e.g., in an upside-up manner.

   The structure of pictures SEI message provides information on the NAL
   unit types, picture-order count values, and prediction dependencies
   of a sequence of pictures.  The SEI message can be used, for example,
   for concluding what impact a lost picture has on other pictures.

   The decoded picture hash SEI message provides a checksum derived from
   the sample values of a decoded picture.  It can be used for detecting
   whether a picture was correctly received and decoded.
   The active parameter sets SEI message includes the IDs of the active
   video parameter set and the active sequence parameter set and can be
   used to activate VPSs and SPSs.  In addition, the SEI message
   includes the following indications: 1) An indication of whether "full
   random accessibility" is supported (when supported, all parameter
   sets needed for decoding of the remainder of the bitstream when
   random accessing from the beginning of the current CVS by completely
   discarding all access units earlier in decoding order are present in
   the remaining bitstream, and all coded pictures in the remaining
   bitstream can be correctly decoded); 2) An indication of whether
   there is no parameter set within the current CVS that updates another
   parameter set of the same type preceding in decoding order.  An
   update of a parameter set refers to the use of the same parameter set
   ID but with some other parameters changed.  If this property is true
   for all CVSs in the bitstream, then all parameter sets can be sent
   out-of-band before session start.

   The decoding unit information SEI message provides information
   regarding coded picture buffer removal delay for a decoding unit.
   The message can be used in very-low-delay buffering operations.

   The region refresh information SEI message can be used together with
   the recovery point SEI message (present in both H.264 and HEVC) for
   improved support of gradual decoding refresh.  This supports random
   access from inter-coded pictures, wherein complete pictures can be
   correctly decoded or recovered after an indicated number of pictures
   in output/display order.

1.1.3. Parallel Processing Support

   The reportedly significantly higher encoding computational demand of
   HEVC over H.264, in conjunction with the ever-increasing video
   resolution (both spatially and temporally) required by the market,
   led to the adoption of VCL coding tools specifically targeted to
   allow for parallelization on the sub-picture level.  That is,
   parallelization occurs, at the minimum, at the granularity of an
   integer number of CTUs.  The targets for this type of high-level
   parallelization are multicore CPUs and DSPs as well as multiprocessor
   systems.  In a system design, to be useful, these tools require
   signaling support, which is provided in Section 7 of this memo.  This
   section provides a brief overview of the tools available in [HEVC].

   Many of the tools incorporated in HEVC were designed keeping in mind
   the potential parallel implementations in multicore/multiprocessor
   architectures.  Specifically, for parallelization, four picture
   partition strategies, as described below, are available.

   Slices are segments of the bitstream that can be reconstructed
   independently from other slices within the same picture (though there
   may still be interdependencies through loop filtering operations).
   Slices are the only tool that can be used for parallelization that is
   also available, in virtually identical form, in H.264.
   Parallelization based on slices does not require much inter-processor
   or inter-core communication (except for inter-processor or inter-core
   data sharing for motion compensation when decoding a predictively
   coded picture, which is typically much heavier than inter-processor
   or inter-core data sharing due to in-picture prediction), as slices
   are designed to be independently decodable.  However, for the same
   reason, slices can require some coding overhead.  Further, slices (in
   contrast to some of the other tools mentioned below) also serve as
   the key mechanism for bitstream partitioning to match Maximum
   Transfer Unit (MTU) size requirements, due to the in-picture
   independence of slices and the fact that each regular slice is
   encapsulated in its own NAL unit.  In many cases, the goal of
   parallelization and the goal of MTU size matching can place
   conflicting demands on the slice layout in a picture.  The
   realization of this situation led to the development of the more
   advanced tools mentioned below.

   Dependent slice segments allow for fragmentation of a coded slice
   into fragments at CTU boundaries without breaking any in-picture
   prediction mechanisms.  They are complementary to the fragmentation
   mechanism described in this memo in that they need the cooperation of
   the encoder.  As a dependent slice segment necessarily contains an
   integer number of CTUs, a decoder using multiple cores operating on
   CTUs can process a dependent slice segment without communicating
   parts of the slice segment's bitstream to other cores.
   Fragmentation, as specified in this memo, in contrast, does not
   guarantee that a fragment contains an integer number of CTUs.

   In Wavefront Parallel Processing (WPP), the picture is partitioned
   into rows of CTUs.  Entropy decoding and prediction are allowed to
   use data from CTUs in other partitions.  Parallel processing is
   possible through parallel decoding of CTU rows, where the start of
   the decoding of a row is delayed by two CTUs relative to the row
   above, so as to ensure that data related to a CTU above and to the
   right of the subject CTU is available before the subject CTU is
   decoded.  Using this
   staggered start (which appears like a wavefront when represented
   graphically), parallelization is possible with up to as many
   processors/cores as the picture contains CTU rows.

   Because in-picture prediction between neighboring CTU rows within a
   picture is allowed, the required inter-processor/inter-core
   communication to enable in-picture prediction can be substantial.
   The WPP partitioning does not result in the creation of more NAL
   units compared to when it is not applied; thus, WPP cannot be used
   for MTU size matching, though slices can be used in combination for
   that purpose.

   Tiles define horizontal and vertical boundaries that partition a
   picture into tile columns and rows.  The scan order of CTUs is
   changed to be local within a tile (in the order of a CTU raster scan
   of a tile), before decoding the top-left CTU of the next tile in the
   order of tile raster scan of a picture.  Similar to slices, tiles
   break in-picture prediction dependencies (including entropy decoding
   dependencies).  However, they do not need to be included in
   individual NAL units (same as WPP in this regard); hence, tiles
   cannot be used for MTU size matching, though slices can be used in
   combination for that purpose.  Each tile can be processed by one
   processor/core, and the inter-processor/inter-core communication
   required for in-picture prediction between processing units decoding
   neighboring tiles is limited to conveying the shared slice header in
   cases where a slice spans more than one tile, and to
   loop-filtering-related sharing of reconstructed samples and
   metadata.  In this respect, tiles are less demanding in terms of
   inter-processor communication bandwidth than WPP, due to the
   in-picture independence between two neighboring partitions.
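
   The changed CTU scan order can be sketched as follows.  The function
   name and its two parameters (tile column widths and tile row heights,
   both counted in CTUs) are assumptions made for illustration; in
   [HEVC], the tile layout is derived from syntax elements in the
   picture parameter set.  The sketch lists picture-level CTU raster
   scan addresses in the order in which they are visited when tiles are
   in use.

```python
def tile_scan_order(col_widths: list[int], row_heights: list[int]) -> list[int]:
    """Return CTU raster-scan addresses listed in tile scan order.

    col_widths and row_heights give the tile column widths and tile row
    heights in CTUs. Tiles are visited in raster order over the picture;
    inside each tile, CTUs are visited in raster order over the tile.
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for height in row_heights:
        x0 = 0
        for width in col_widths:
            # Raster scan of the CTUs inside the current tile.
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order
```

   With a single tile, the result is the plain raster scan of the
   picture; with several tiles, all CTUs of one tile are listed before
   the top-left CTU of the next tile.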

1.1.4. NAL Unit Header

   HEVC maintains the NAL unit concept of H.264 with modifications.
   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
   payload of a NAL unit refers to the NAL unit excluding the NAL unit
   header.

            +---------------+---------------+
            |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |F|   Type    |  LayerId  | TID |
            +-------------+-----------------+

           Figure 1: The Structure of the HEVC NAL Unit Header

   The semantics of the fields in the NAL unit header are as specified
   in [HEVC] and described briefly below for convenience.  In addition
   to the name and size of each field, the corresponding syntax element
   name in [HEVC] is also provided.

   F: 1 bit
      forbidden_zero_bit.  Required to be zero in [HEVC].  Note that the
      inclusion of this bit in the NAL unit header was to enable
      transport of HEVC video over MPEG-2 transport systems (avoidance
      of start code emulations) [MPEG2S].  In the context of this memo,
      the value 1 may be used to indicate a syntax violation, e.g., for
      a NAL unit resulting from aggregating a number of fragmented units
      of a NAL unit but missing the last fragment, as described in
      Section 4.4.3.

   Type: 6 bits
      nal_unit_type.  This field specifies the NAL unit type as defined
      in Table 7-1 of [HEVC].  If the most significant bit of this field
      of a NAL unit is equal to 0 (i.e., the value of this field is less
      than 32), the NAL unit is a VCL NAL unit.  Otherwise, the NAL unit
      is a non-VCL NAL unit.  For a reference of all currently defined
      NAL unit types and their semantics, please refer to Section 7.4.2
      in [HEVC].

   LayerId: 6 bits
      nuh_layer_id.  Required to be equal to zero in [HEVC].  It is
      anticipated that in future scalable or 3D video coding extensions
      of this specification, this syntax element will be used to
      identify additional layers that may be present in the CVS, wherein
      a layer may be, e.g., a spatial scalable layer, a quality scalable
      layer, a texture view, or a depth view.

   TID: 3 bits
      nuh_temporal_id_plus1.  This field specifies the temporal
      identifier of the NAL unit plus 1.  The value of TemporalId is
      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
      there is at least one bit in the NAL unit header equal to 1, so as
      to enable independent considerations of start code emulations in
      the NAL unit header and in the NAL unit payload data.
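
   The field layout above can be sketched as a small parser.  The class
   and function names are hypothetical and chosen for illustration only;
   the bit positions follow Figure 1, and the checks mirror the
   constraints stated for the Type and TID fields.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    """Fields of the two-byte HEVC NAL unit header (Figure 1)."""
    f: int          # forbidden_zero_bit, 1 bit
    nal_type: int   # nal_unit_type, 6 bits
    layer_id: int   # nuh_layer_id, 6 bits
    tid: int        # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # Types below 32 (most significant type bit 0) are VCL NAL units.
        return self.nal_type < 32


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into header fields."""
    if len(data) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    word = (data[0] << 8) | data[1]
    header = NalUnitHeader(
        f=(word >> 15) & 0x01,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )
    if header.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return header


def pack_nal_unit_header(header: NalUnitHeader) -> bytes:
    """Inverse of parse_nal_unit_header."""
    word = (
        (header.f << 15)
        | (header.nal_type << 9)
        | (header.layer_id << 3)
        | header.tid
    )
    return word.to_bytes(2, "big")
```

   For example, the header bytes 0x40 0x01 parse to F = 0, Type = 32,
   LayerId = 0, and TID = 1, that is, a non-VCL NAL unit with
   TemporalId equal to 0.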

1.2. Overview of the Payload Format

   This payload format defines the following processes required for
   transport of HEVC coded data over RTP [RFC3550]:

   o  Usage of RTP header with this payload format

   o  Packetization of HEVC coded NAL units into RTP packets using three
      types of payload structures: a single NAL unit packet, aggregation
      packet, and fragment unit

   o  Transmission of HEVC NAL units of the same bitstream within a
      single RTP stream or multiple RTP streams (within one or more RTP
      sessions), where within an RTP stream transmission of NAL units
      may be either non-interleaved (i.e., the transmission order of NAL
      units is the same as their decoding order) or interleaved (i.e.,
      the transmission order of NAL units is different from the decoding
      order)

   o  Media type parameters to be used with the Session Description
      Protocol (SDP) [RFC4566]

   o  A payload header extension mechanism and data structures for
      enhanced support of temporal scalability based on that extension
      mechanism.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14 [RFC2119].

   In this document, the above key words will convey that interpretation
   only when in ALL CAPS.  Lowercase uses of these words are not to be
   interpreted as carrying the significance described in RFC 2119.

   This specification uses the notion of setting and clearing a bit when
   bit fields are handled.  Setting a bit is the same as assigning that
   bit the value of 1 (On).  Clearing a bit is the same as assigning
   that bit the value of 0 (Off).

3. Definitions and Abbreviations

3.1. Definitions

   This document uses the terms and definitions of [HEVC].  Section
   3.1.1 lists relevant definitions from [HEVC] for convenience.
   Section 3.1.2 provides definitions specific to this memo.

3.1.1. Definitions from the HEVC Specification

   access unit: A set of NAL units that are associated with each other
   according to a specified classification rule, that are consecutive in
   decoding order, and that contain exactly one coded picture.

   BLA access unit: An access unit in which the coded picture is a BLA
   picture.

   BLA picture: An IRAP picture for which each VCL NAL unit has
   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

   Coded Video Sequence (CVS): A sequence of access units that consists,
   in decoding order, of an IRAP access unit with NoRaslOutputFlag equal
   to 1, followed by zero or more access units that are not IRAP access
   units with NoRaslOutputFlag equal to 1, including all subsequent
   access units up to but not including any subsequent access unit that
   is an IRAP access unit with NoRaslOutputFlag equal to 1.

      Informative note: An IRAP access unit may be an IDR access unit, a
      BLA access unit, or a CRA access unit.  The value of
      NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
      access unit, and each CRA access unit that is the first access
      unit in the bitstream in decoding order, is the first access unit
      that follows an end of sequence NAL unit in decoding order, or has
      HandleCraAsBlaFlag equal to 1.

   CRA access unit: An access unit in which the coded picture is a CRA
   picture.

   CRA picture: An IRAP picture for which each VCL NAL unit has
   nal_unit_type equal to CRA_NUT.

   IDR access unit: An access unit in which the coded picture is an IDR
   picture.

   IDR picture: An IRAP picture for which each VCL NAL unit has
   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

   IRAP access unit: An access unit in which the coded picture is an
   IRAP picture.

   IRAP picture: A coded picture for which each VCL NAL unit has
   nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23),
   inclusive.
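
   The picture definitions above reduce to simple checks on
   nal_unit_type.  In the sketch below, the values 16 and 23 are the
   ones stated in the IRAP picture definition; the remaining numeric
   values are taken from Table 7-1 of [HEVC] and are not spelled out in
   this memo.  The function names are hypothetical.

```python
# nal_unit_type values for IRAP pictures, per Table 7-1 of [HEVC].
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP = 19, 20
CRA_NUT = 21
RSV_IRAP_VCL22, RSV_IRAP_VCL23 = 22, 23


def is_irap(nal_unit_type: int) -> bool:
    """True for types in the range BLA_W_LP to RSV_IRAP_VCL23, inclusive."""
    return BLA_W_LP <= nal_unit_type <= RSV_IRAP_VCL23


def irap_kind(nal_unit_type: int) -> str:
    """Classify a VCL nal_unit_type as BLA, IDR, CRA, reserved, or non-IRAP."""
    if nal_unit_type in (BLA_W_LP, BLA_W_RADL, BLA_N_LP):
        return "BLA"
    if nal_unit_type in (IDR_W_RADL, IDR_N_LP):
        return "IDR"
    if nal_unit_type == CRA_NUT:
        return "CRA"
    if is_irap(nal_unit_type):
        return "reserved IRAP"
    return "non-IRAP"
```

   A picture is of the returned kind only if every VCL NAL unit of that
   picture has a matching type, so a caller would apply the check to
   each VCL NAL unit of the picture.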

   layer: A set of VCL NAL units that all have a particular value of
   nuh_layer_id and the associated non-VCL NAL units, or one of a set of
   syntactical structures having a hierarchical relationship.

   operation point: A bitstream created from another bitstream by
   operation of the sub-bitstream extraction process with that other
   bitstream, a target highest TemporalId, and a target-layer identifier
   list as inputs.

   random access: The act of starting the decoding process for a
   bitstream at a point other than the beginning of the bitstream.

   sub-layer: A temporal scalable layer of a temporal scalable bitstream
   consisting of VCL NAL units with a particular value of the TemporalId
   variable, and the associated non-VCL NAL units.

   sub-layer representation: A subset of the bitstream consisting of NAL
   units of a particular sub-layer and the lower sub-layers.
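
   A sub-layer representation can be approximated by filtering NAL
   units on TemporalId.  The sketch below is a simplification made for
   illustration and is not the sub-bitstream extraction process of
   [HEVC], which has additional rules.  It assumes that each NAL unit is
   given as a bytes object starting with the two-byte NAL unit header;
   the function names are hypothetical.

```python
def temporal_id(nal_unit: bytes) -> int:
    """TemporalId of a NAL unit: the 3-bit TID field minus 1."""
    if len(nal_unit) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    tid = nal_unit[1] & 0x07  # low three bits of the second header byte
    if tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return tid - 1


def sub_layer_representation(nal_units: list[bytes], highest_tid: int) -> list[bytes]:
    """Keep NAL units of sub-layer highest_tid and all lower sub-layers.

    The relative order of the remaining NAL units is preserved.
    """
    return [n for n in nal_units if temporal_id(n) <= highest_tid]
```

   For instance, with highest_tid equal to 0, only the NAL units of the
   lowest sub-layer remain.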

   tile: A rectangular region of coding tree blocks within a particular
   tile column and a particular tile row in a picture.

   tile column: A rectangular region of coding tree blocks having a
   height equal to the height of the picture and a width specified by
   syntax elements in the picture parameter set.

   tile row: A rectangular region of coding tree blocks having a height
   specified by syntax elements in the picture parameter set and a width
   equal to the width of the picture.

3.1.2. Definitions Specific to This Memo

   dependee RTP stream: An RTP stream on which another RTP stream
   depends.  All RTP streams in a Multiple RTP streams on a Single media
   Transport (MRST) or Multiple RTP streams on Multiple media Transports
   (MRMT), except for the highest RTP stream, are dependee RTP streams.

   highest RTP stream: The RTP stream on which no other RTP stream
   depends.  The RTP stream in a Single RTP stream on a Single media
   Transport (SRST) is the highest RTP stream.

   Media-Aware Network Element (MANE): A network element, such as a
   middlebox, selective forwarding unit, or application-layer gateway
   that is capable of parsing certain aspects of the RTP payload headers
   or the RTP payload and reacting to their contents.

      Informative note: The concept of a MANE goes beyond normal routers
      or gateways in that a MANE has to be aware of the signaling (e.g.,
      to learn about the payload type mappings of the media streams),
      and in that it has to be trusted when working with Secure RTP
      (SRTP).  The advantage of using MANEs is that they allow packets
      to be dropped according to the needs of the media coding.  For
      example, if a MANE has to drop packets due to congestion on a
      certain link, it can identify and remove those packets whose
      elimination produces the least adverse effect on the user
      experience.  After dropping packets, MANEs must rewrite RTCP
      packets to match the changes to the RTP stream, as specified in
      Section 7 of [RFC3550].

   Media Transport: As used in the MRST, MRMT, and SRST definitions
   below, Media Transport denotes the transport of packets over a
   transport association identified by a 5-tuple (source address, source
   port, destination address, destination port, transport protocol).
   See also Section 2.1.13 of [RFC7656].

      Informative note: The term "bitstream" in this document is
      equivalent to the term "encoded stream" in [RFC7656].

   Multiple RTP streams on a Single media Transport (MRST):  Multiple
   RTP streams carrying a single HEVC bitstream on a Single Transport.
   See also Section 3.5 of [RFC7656].

   Multiple RTP streams on Multiple media Transports (MRMT):  Multiple
   RTP streams carrying a single HEVC bitstream on Multiple Transports.
   See also Section 3.5 of [RFC7656].

   NAL unit decoding order: A NAL unit order that conforms to the
   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

   NAL unit output order: A NAL unit order in which NAL units of
   different access units are in the output order of the decoded
   pictures corresponding to the access units, as specified in [HEVC],
   and in which NAL units within an access unit are in their decoding
   order.

   NAL-unit-like structure: A data structure that is similar to NAL
   units in the sense that it also has a NAL unit header and a payload,
   with a difference that the payload does not follow the start code
   emulation prevention mechanism required for the NAL unit syntax as
   specified in Section 7.3.1.1 of [HEVC].  Examples of NAL-unit-like
   structures defined in this memo are packet payloads of Aggregation
   Packet (AP), PAyload Content Information (PACI), and Fragmentation
   Unit (FU) packets.

   NALU-time: The value that the RTP timestamp would have if the NAL
   unit were transported in its own RTP packet.

   RTP stream: See [RFC7656].  Within the scope of this memo, one RTP
   stream is utilized to transport one or more temporal sub-layers.

   Single RTP stream on a Single media Transport (SRST):  Single RTP
   stream carrying a single HEVC bitstream on a Single (Media)
   Transport.  See also Section 3.5 of [RFC7656].
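
   The SRST, MRST, and MRMT definitions can be told apart by counting
   RTP streams and Media Transports.  The sketch below is illustrative
   only.  It assumes that all listed RTP streams carry the same HEVC
   bitstream, that each RTP stream is keyed by its SSRC, and that any
   use of more than one 5-tuple counts as MRMT; the type and function
   names are hypothetical.

```python
from typing import NamedTuple


class MediaTransport(NamedTuple):
    """The 5-tuple that identifies a transport association."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


def transmission_mode(streams: dict[int, MediaTransport]) -> str:
    """Classify a set of RTP streams (SSRC -> transport) as SRST/MRST/MRMT."""
    if not streams:
        raise ValueError("at least one RTP stream is required")
    if len(streams) == 1:
        return "SRST"
    # Several RTP streams: the number of distinct 5-tuples decides.
    transports = set(streams.values())
    return "MRST" if len(transports) == 1 else "MRMT"
```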

   transmission order: The order of packets in ascending RTP sequence
   number order (in modulo arithmetic).  Within an aggregation packet,
   the NAL unit transmission order is the same as the order of
   appearance of NAL units in the packet.
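
   Ordering by sequence number in modulo arithmetic can be sketched as
   follows.  RTP sequence numbers are 16 bits wide [RFC3550], so they
   wrap from 65535 to 0.  The helper names are hypothetical, and the
   sort is only meaningful under the assumption that all compared
   sequence numbers lie within half of the number space (fewer than
   32768 apart).

```python
from functools import cmp_to_key


def seq_before(a: int, b: int) -> bool:
    """True if 16-bit sequence number a precedes b in modulo arithmetic."""
    return a != b and ((b - a) & 0xFFFF) < 0x8000


def transmission_order(seq_numbers: list[int]) -> list[int]:
    """Sort sequence numbers in ascending order, allowing for wraparound.

    Assumes all values are fewer than 32768 apart; otherwise the
    comparison is ambiguous.
    """
    def compare(a: int, b: int) -> int:
        if seq_before(a, b):
            return -1
        if seq_before(b, a):
            return 1
        return 0

    return sorted(seq_numbers, key=cmp_to_key(compare))
```

   With this comparison, a packet with sequence number 65535 precedes a
   packet with sequence number 0.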

3.2. Abbreviations

   AP       Aggregation Packet

   BLA      Broken Link Access

   CRA      Clean Random Access

   CTB      Coding Tree Block

   CTU      Coding Tree Unit

   CVS      Coded Video Sequence

   DPH      Decoded Picture Hash

   FU       Fragmentation Unit

   HRD      Hypothetical Reference Decoder

   IDR      Instantaneous Decoding Refresh

   IRAP     Intra Random Access Point

   MANE     Media-Aware Network Element

   MRMT     Multiple RTP streams on Multiple media Transports

   MRST     Multiple RTP streams on a Single media Transport

   MTU      Maximum Transfer Unit

   NAL      Network Abstraction Layer

   NALU     Network Abstraction Layer Unit

   PACI     PAyload Content Information

   PHES     Payload Header Extension Structure

   PPS      Picture Parameter Set

   RADL     Random Access Decodable Leading (Picture)

   RASL     Random Access Skipped Leading (Picture)

   RPS      Reference Picture Set

   SEI      Supplemental Enhancement Information

   SPS      Sequence Parameter Set

   SRST     Single RTP stream on a Single media Transport

   STSA     Step-wise Temporal Sub-layer Access

   TSA      Temporal Sub-layer Access

   TSCI     Temporal Scalability Control Information

   VCL      Video Coding Layer

   VPS      Video Parameter Set


