RFC 7798

RTP Payload Format for High Efficiency Video Coding (HEVC)

Pages: 86
Proposed Standard

Part 4 of 4 – Pages 64 to 86

RFC7798 - Page 64

7.2.  SDP Parameters

   The receiver MUST ignore any parameter unspecified in this memo.

7.2.1.  Mapping of Payload Type Parameters to SDP

   The media type video/H265 string is mapped to fields in the Session
   Description Protocol (SDP) [RFC4566] as follows:

   o  The media name in the "m=" line of SDP MUST be video.

   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
      media subtype).

   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The OPTIONAL parameters profile-space, profile-id, tier-flag,
      level-id, interop-constraints, profile-compatibility-indicator,
      sprop-sub-layer-id, recv-sub-layer-id, max-recv-level-id, tx-mode,
      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc,
      max-fps, sprop-max-don-diff, sprop-depack-buf-nalus,
      sprop-depack-buf-bytes, depack-buf-cap, sprop-segmentation-id,
      sprop-spatial-segmentation-idc, dec-parallel-cap, and include-dph,
      when present, MUST be included in the "a=fmtp" line of SDP.  These
      parameters are expressed as a media type string, in the form of a
      semicolon-separated list of parameter=value pairs.

   o  The OPTIONAL parameters sprop-vps, sprop-sps, and sprop-pps, when
      present, MUST be included in the "a=fmtp" line of SDP or conveyed
      using the "fmtp" source attribute as specified in Section 6.3 of
      [RFC5576].  For a particular media format (i.e., RTP payload
      type), sprop-vps, sprop-sps, or sprop-pps MUST NOT be both included
      in the "a=fmtp" line of SDP and conveyed using the "fmtp" source
      attribute.  When included in the "a=fmtp" line of SDP, these
      parameters are expressed as a media type string, in the form of a
      semicolon-separated list of parameter=value pairs.  When conveyed
      in the "a=fmtp" line of SDP for a particular payload type, the
      parameters sprop-vps, sprop-sps, and sprop-pps MUST be applied to
      each SSRC with the payload type.  When conveyed using the "fmtp"
      source attribute, these parameters are only associated with the
      given source and payload type as parts of the "fmtp" source
      attribute.

         Informative note: Conveyance of sprop-vps, sprop-sps, and
         sprop-pps using the "fmtp" source attribute allows for out-of-
         band transport of parameter sets in topologies like Topo-Video-
         switch-MCU as specified in [RFC7667].

   An example of media representation in SDP is as follows:

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 H265/90000
      a=fmtp:98 profile-id=1;
                sprop-vps=<video parameter sets data>
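
   The mapping above can be exercised with a small parser.  The Python
   sketch below is illustrative only: the helper name parse_fmtp and the
   KNOWN_PARAMS set are hypothetical, it assumes the attribute arrives as
   a single SDP line, and it drops parameter names not listed in this
   memo, since the receiver is required to ignore them.  Values are kept
   as opaque strings; each pair is split at its first "=" only, so base64
   padding in sprop-vps, sprop-sps, and sprop-pps is preserved.

```python
# Hypothetical helper: parse an H.265 "a=fmtp" line into (pt, params).
KNOWN_PARAMS = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
    "dec-parallel-cap", "include-dph", "sprop-vps", "sprop-sps", "sprop-pps",
}


def parse_fmtp(line):
    """Return (payload_type, params) for a line like 'a=fmtp:98 a=1;b=2'."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' or empty entries
        name, _, value = item.partition("=")  # split at the first '=' only
        name = name.strip()
        # The receiver MUST ignore any parameter unspecified in this memo.
        if name in KNOWN_PARAMS:
            params[name] = value.strip()
    return int(pt_text), params
```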

7.2.2.  Usage with SDP Offer/Answer Model

   When HEVC is offered over RTP using SDP in an offer/answer model
   [RFC3264] for negotiation for unicast usage, the following
   limitations and rules apply:

   o  The parameters identifying a media format configuration for HEVC
      are profile-space, profile-id, tier-flag, level-id, interop-
      constraints, profile-compatibility-indicator, and tx-mode.  These
      media configuration parameters, except level-id, MUST be used
      symmetrically when the answerer does not include recv-sub-layer-id
      in the answer for the media format (payload type) or the included
      recv-sub-layer-id is equal to sprop-sub-layer-id in the offer.
      The answerer MUST:

      1) maintain all configuration parameters with the values remaining
         the same as in the offer for the media format (payload type),
         with the exception that the value of level-id is changeable as
         long as the highest level indicated by the answer is not higher
         than that indicated by the offer;

      2) include in the answer the recv-sub-layer-id parameter, with a
         value less than the sprop-sub-layer-id parameter in the offer,
         for the media format (payload type), and maintain all
         configuration parameters with the values being the same as
         signaled in the sprop-vps for the chosen sub-layer
         representation, with the exception that the value of level-id
         is changeable as long as the highest level indicated by the
          answer is not higher than the level indicated by the sprop-vps
          in the offer for the chosen sub-layer representation; or

      3) remove the media format (payload type) completely (when one or
         more of the parameter values are not supported).

            Informative note: The above requirement for symmetric use
            does not apply for level-id, and does not apply for the
            other bitstream or RTP stream properties and capability
            parameters.

   o  The profile-compatibility-indicator, when offered as sendonly,
      describes bitstream properties.  The answerer MAY accept an RTP
      payload type even if the decoder is not capable of handling the
      profile indicated by the profile-space, profile-id, and
      interop-constraints parameters, provided that the decoder is
      capable of handling any of the profiles indicated by the
      profile-space, profile-compatibility-indicator, and
      interop-constraints.  However, when the
      profile-compatibility-indicator is used in a recvonly or sendrecv
      media description, the bitstream using this RTP payload type is
      required to conform to all profiles indicated by profile-space,
      profile-compatibility-indicator, and interop-constraints.

   o  To simplify handling and matching of these configurations, the
      same RTP payload type number used in the offer SHOULD also be used
      in the answer, as specified in [RFC3264].

   o  The same RTP payload type number used in the offer for the media
      subtype H265 MUST be used in the answer when the answer includes
      recv-sub-layer-id.  When the answer does not include recv-sub-
      layer-id, the answer MUST NOT contain a payload type number used
      in the offer for the media subtype H265 unless the configuration
      is exactly the same as in the offer or the configuration in the
      answer only differs from that in the offer with a different value
      of level-id.  The answer MAY contain the recv-sub-layer-id
      parameter if an HEVC bitstream contains multiple operation points
      (using temporal scalability and sub-layers) and sprop-vps is
      included in the offer with information on sub-layers present in
      the first video parameter set contained in sprop-vps.  If the
      sprop-vps is provided in an offer, an answerer MAY select a
      particular operation point indicated in the first video parameter
      set contained in sprop-vps.  When the answer includes a recv-sub-
      layer-id that is less than a sprop-sub-layer-id in the offer, all
      video parameter sets contained in the sprop-vps parameter in the
      SDP answer and all video parameter sets sent in-band for either
      the offerer-to-answerer direction or the answerer-to-offerer
      direction MUST be consistent with the first video parameter set in
      the sprop-vps parameter of the offer (see the semantics of sprop-
      vps in Section 7.1 of this document on one video parameter set
      being consistent with another video parameter set), and the
      bitstream sent in either direction MUST conform to the profile,
      tier, level, and constraints of the chosen sub-layer
      representation as indicated by the first profile_tier_level( )
      syntax structure in the first video parameter set in the sprop-vps
      parameter of the offer.

         Informative note: When an offerer receives an answer that does
         not include recv-sub-layer-id, it has to compare payload types
         not declared in the offer based on the media type (i.e.,
         video/H265) and the above media configuration parameters with
         any payload types it has already declared.  This will enable it
         to determine whether the configuration in question is new or if
         it is equivalent to a configuration already offered, since a
         different payload type number may be used in the answer.  The
         ability to perform operation point selection enables a receiver
         to utilize the temporal scalable nature of an HEVC bitstream.

   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
      sprop-depack-buf-bytes describe the properties of an RTP stream,
      and of all RTP streams it depends on (when present), that the
      offerer or the answerer is sending for the media format
      configuration.  This differs from the normal usage of the
      offer/answer parameters: normally such parameters declare the
      properties of the bitstream or RTP stream that the offerer or the
      answerer is able to receive.  When dealing with HEVC, the offerer
      assumes that the answerer will be able to receive media encoded
      using the configuration being offered.

         Informative note:  The above parameters apply for any RTP
         stream and all RTP streams the RTP stream depends on, when
         present, sent by a declaring entity with the same
         configuration.  In other words, the applicability of the above
         parameters to RTP streams depends on the source endpoint.
         Rather than being bound to the payload type, the values may
         have to be applied to another payload type when being sent, as
         they apply for the configuration.

   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb, max-
      br, max-tr, and max-tc MAY be used to declare further capabilities
      of the offerer or answerer for receiving.  These parameters MUST
      NOT be present when the direction attribute is sendonly.

   o  The capability parameter max-fps MAY be used to declare lower
      capabilities of the offerer or answerer for receiving.  This
      parameter MUST NOT be present when the direction attribute is
      sendonly.

   o  The capability parameter dec-parallel-cap MAY be used to declare
      additional decoding capabilities of the offerer or answerer for
      receiving.  Upon receiving such a declaration of a receiver, a
      sender MAY send a bitstream to the receiver utilizing those
      capabilities under the assumption that the bitstream fulfills the
      parallelism requirement.  A bitstream that is sent based on
      choosing a capability point with parallel tool type 'w' from dec-
      parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 1
      and min_spatial_segmentation_idc equal to or larger than dec-
      parallel-cap.spatial-seg-idc of the capability point.  A bitstream
      that is sent based on choosing a capability point with parallel
      tool type 't' from dec-parallel-cap MUST have
      entropy_coding_sync_enabled_flag equal to 0 and
      min_spatial_segmentation_idc equal to or larger than dec-parallel-
      cap.spatial-seg-idc of the capability point.

   o  An offerer has to include the size of the de-packetization buffer,
      sprop-depack-buf-bytes, as well as sprop-max-don-diff and sprop-
      depack-buf-nalus, in the offer for an interleaved HEVC bitstream
      or for the MRST or MRMT transmission mode when sprop-max-don-diff
      is greater than 0 for at least one of the RTP streams.  To enable
      the offerer and answerer to inform each other about their
      capabilities for de-packetization buffering in receiving RTP
      streams, both parties are RECOMMENDED to include depack-buf-cap.
      For interleaved RTP streams or in MRST or MRMT, it is also
      RECOMMENDED to consider offering multiple payload types with
      different buffering requirements when the capabilities of the
      receiver are unknown.

   o  The capability parameter include-dph MAY be used to declare the
      capability to utilize decoded picture hash SEI messages, and the
      types of hashes supported, in any HEVC RTP streams received by the
      offerer or answerer.

   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in
      the "a=fmtp" line of SDP or conveyed using the "fmtp" source
      attribute as specified in Section 6.3 of [RFC5576]), are used for
      out-of-band transport of the parameter sets (VPS, SPS, or PPS,
      respectively).

   o  The answerer MAY use either out-of-band or in-band transport of
      parameter sets for the bitstream it is sending, regardless of
      whether out-of-band parameter sets transport has been used in the
      offerer-to-answerer direction.  Parameter sets included in an
      answer are independent of those parameter sets included in the
      offer, as they are used for decoding two different bitstreams, one
      from the answerer to the offerer and the other in the opposite
      direction.  In case some RTP streams are sent before the SDP
      offer/answer settles down, in-band parameter sets MUST be used for
      those RTP stream parts sent before the SDP offer/answer.

   o  The following rules apply to transport of parameter sets in the
      offerer-to-answerer direction.

      +  An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
         If none of these parameters is present in the offer, then only
         in-band transport of parameter sets is used.

      +  If the level to use in the offerer-to-answerer direction is
         equal to the default level in the offer, the answerer MUST be
         prepared to use the parameter sets included in sprop-vps,
         sprop-sps, and sprop-pps (either included in the "a=fmtp" line
         of SDP or conveyed using the "fmtp" source attribute) for
         decoding the incoming bitstream, e.g., by passing these
         parameter set NAL units to the video decoder before passing any
         NAL units carried in the RTP streams.  Otherwise, the answerer
         MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
         included in the "a=fmtp" line of SDP or conveyed using the
         "fmtp" source attribute) and the offerer MUST transmit
         parameter sets in-band.

      +  In MRST or MRMT, the answerer MUST be prepared to use the
         parameter sets out-of-band transmitted for the RTP stream and
         all RTP streams the RTP stream depends on, when present, for
         decoding the incoming bitstream, e.g., by passing these
         parameter set NAL units to the video decoder before passing any
         NAL units carried in the RTP streams.

   o  The following rules apply to transport of parameter sets in the
      answerer-to-offerer direction.

      +  An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
         If none of these parameters is present in the answer, then only
         in-band transport of parameter sets is used.

      +  The offerer MUST be prepared to use the parameter sets included
         in sprop-vps, sprop-sps, and sprop-pps (either included in the
         "a=fmtp" line of SDP or conveyed using the "fmtp" source
         attribute) for decoding the incoming bitstream, e.g., by
         passing these parameter set NAL units to the video decoder
         before passing any NAL units carried in the RTP streams.

      +  In MRST or MRMT, the offerer MUST be prepared to use the
         parameter sets out-of-band transmitted for the RTP stream and
         all RTP streams the RTP stream depends on, when present, for
         decoding the incoming bitstream, e.g., by passing these
         parameter set NAL units to the video decoder before passing any
         NAL units carried in the RTP streams.

   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using the
      "fmtp" source attribute as specified in Section 6.3 of [RFC5576],
      the receiver of the parameters MUST store the parameter sets
      included in sprop-vps, sprop-sps, and/or sprop-pps and associate
      them with the source given as part of the "fmtp" source attribute.
      Parameter sets associated with one source (given as part of the
      "fmtp" source attribute) MUST only be used to decode NAL units
      conveyed in RTP packets from the same source (given as part of the
      "fmtp" source attribute).  When this mechanism is in use, SSRC
      collision detection and resolution MUST be performed as specified
      in [RFC5576].
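
   The dec-parallel-cap rule in the list above reduces to two checks on
   the bitstream a sender chooses to send.  The Python sketch below is a
   hypothetical illustration, not part of this specification: it models
   one chosen capability point as a CapabilityPoint record and assumes
   the sender already knows the entropy_coding_sync_enabled_flag and
   min_spatial_segmentation_idc values of its own encoder configuration.

```python
from dataclasses import dataclass


@dataclass
class CapabilityPoint:
    """One entry chosen from a receiver's dec-parallel-cap (hypothetical)."""
    tool_type: str        # 'w' or 't', as used in dec-parallel-cap
    spatial_seg_idc: int  # spatial-seg-idc of this capability point


def bitstream_fits_capability(point, entropy_coding_sync_enabled_flag,
                              min_spatial_segmentation_idc):
    """Check the two bitstream constraints tied to the chosen point."""
    if point.tool_type == "w":
        required_flag = 1  # entropy_coding_sync_enabled_flag enables WPP
    elif point.tool_type == "t":
        required_flag = 0
    else:
        raise ValueError("unknown parallel tool type: %r" % point.tool_type)
    return (entropy_coding_sync_enabled_flag == required_flag
            and min_spatial_segmentation_idc >= point.spatial_seg_idc)
```

   For example, a capability point ('w', 4) is met by a bitstream with
   entropy_coding_sync_enabled_flag equal to 1 and
   min_spatial_segmentation_idc equal to 4 or larger.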

   For bitstreams being delivered over multicast, the following rules
   apply:

      o  The media format configuration is identified by profile-space,
         profile-id, tier-flag, level-id, interop-constraints, profile-
         compatibility-indicator, and tx-mode.  These media format
         configuration parameters, including level-id, MUST be used
         symmetrically; that is, the answerer MUST either maintain all
         configuration parameters or remove the media format (payload
         type) completely.  Note that this implies that the level-id for
         offer/answer in multicast is not changeable.

      o  To simplify the handling and matching of these configurations,
         the same RTP payload type number used in the offer SHOULD also
         be used in the answer, as specified in [RFC3264].  An answer
         MUST NOT contain a payload type number used in the offer unless
         the configuration is the same as in the offer.

      o  Parameter sets received MUST be associated with the originating
         source and MUST only be used in decoding the incoming bitstream
         from the same source.

      o  The rules for other parameters are the same as above for
         unicast as long as the three above rules are obeyed.
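
   The unicast and multicast rules for the media format configuration
   differ only in whether level-id may change.  A minimal Python sketch,
   assuming configurations are held as dictionaries of the string values
   from "a=fmtp" and that level-id values (general_level_idc, i.e., 30
   times the level number, such as 93 for level 3.1) can be compared
   numerically.  The function name is hypothetical, it does not fill in
   default values for absent parameters, and it does not cover the
   recv-sub-layer-id case (rule 2 of the unicast list).

```python
CONFIG_PARAMS = ("profile-space", "profile-id", "tier-flag", "level-id",
                 "interop-constraints", "profile-compatibility-indicator",
                 "tx-mode")


def answer_config_ok(offer, answer, multicast=False):
    """True if `answer` keeps the offer's media format configuration.

    `offer` and `answer` map parameter names to string values.  Absent
    parameters are compared as absent; defaults are not substituted.
    """
    for name in CONFIG_PARAMS:
        if name == "level-id" and not multicast:
            continue  # unicast: level-id is checked separately below
        if offer.get(name) != answer.get(name):
            return False
    if not multicast:
        offered = offer.get("level-id")
        answered = answer.get("level-id")
        if offered is not None and answered is not None:
            # Unicast: the answer may lower, but never raise, the level.
            return int(answered) <= int(offered)
        return offered == answered
    return True  # multicast: level-id already required to match exactly
```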

   Table 1 lists the interpretation of all the parameters that MUST be
   used for the various combinations of offer, answer, and direction
   attributes.  Note that the two columns wherein the recv-sub-layer-id
   parameter is used only apply to answers, whereas the other columns
   apply to both offers and answers.

   Table 1.  Interpretation of parameters for various combinations of
   offers, answers, direction attributes, with and without recv-sub-
   layer-id.  Columns that do not indicate offer or answer apply to
   both.

                                       sendonly --+
         answer: recvonly, recv-sub-layer-id --+  |
           recvonly w/o recv-sub-layer-id --+  |  |
   answer: sendrecv, recv-sub-layer-id --+  |  |  |
     sendrecv w/o recv-sub-layer-id --+  |  |  |  |
                                      |  |  |  |  |
   profile-space                      C  D  C  D  P
   profile-id                         C  D  C  D  P
   tier-flag                          C  D  C  D  P
   level-id                           D  D  D  D  P
   interop-constraints                C  D  C  D  P
   profile-compatibility-indicator    C  D  C  D  P
   tx-mode                            C  C  C  C  P
   max-recv-level-id                  R  R  R  R  -
   sprop-max-don-diff                 P  P  -  -  P
   sprop-depack-buf-nalus             P  P  -  -  P
   sprop-depack-buf-bytes             P  P  -  -  P
   depack-buf-cap                     R  R  R  R  -
   sprop-segmentation-id              P  P  P  P  P
   sprop-spatial-segmentation-idc     P  P  P  P  P
   max-br                             R  R  R  R  -
   max-cpb                            R  R  R  R  -
   max-dpb                            R  R  R  R  -
   max-lsr                            R  R  R  R  -
   max-lps                            R  R  R  R  -
   max-tr                             R  R  R  R  -
   max-tc                             R  R  R  R  -
   max-fps                            R  R  R  R  -
   sprop-vps                          P  P  -  -  P
   sprop-sps                          P  P  -  -  P
   sprop-pps                          P  P  -  -  P
   sprop-sub-layer-id                 P  P  -  -  P
   recv-sub-layer-id                  X  O  X  O  -
   dec-parallel-cap                   R  R  R  R  -
   include-dph                        R  R  R  R  -

   Legend:

    C: configuration for sending and receiving bitstreams
    D: changeable configuration, same as C except possible
       to answer with a different but consistent value (see the
       semantics of the six parameters related to profile, tier,
       and level on these parameters being consistent)
    P: properties of the bitstream to be sent
    R: receiver capabilities
    O: operation point selection
    X: MUST NOT be present
    -: not usable, when present MUST be ignored
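
   Table 1 lends itself to a lookup table.  The following Python sketch,
   with hypothetical column and function names, encodes each row as a
   five-letter string in the column order of the table (left to right),
   ignores parameters marked "-" as well as parameters not specified in
   this memo, and raises an error for parameters marked "X".  It only
   classifies parameters; acting on C, D, P, R, and O values is left to
   the rest of an implementation.

```python
# Column order follows Table 1, left to right.
COLUMNS = ("sendrecv", "answer-sendrecv-sublayer", "recvonly",
           "answer-recvonly-sublayer", "sendonly")

TABLE_1 = {
    "profile-space": "CDCDP", "profile-id": "CDCDP", "tier-flag": "CDCDP",
    "level-id": "DDDDP", "interop-constraints": "CDCDP",
    "profile-compatibility-indicator": "CDCDP", "tx-mode": "CCCCP",
    "max-recv-level-id": "RRRR-",
    "sprop-max-don-diff": "PP--P", "sprop-depack-buf-nalus": "PP--P",
    "sprop-depack-buf-bytes": "PP--P", "depack-buf-cap": "RRRR-",
    "sprop-segmentation-id": "PPPPP",
    "sprop-spatial-segmentation-idc": "PPPPP",
    "max-br": "RRRR-", "max-cpb": "RRRR-", "max-dpb": "RRRR-",
    "max-lsr": "RRRR-", "max-lps": "RRRR-", "max-tr": "RRRR-",
    "max-tc": "RRRR-", "max-fps": "RRRR-",
    "sprop-vps": "PP--P", "sprop-sps": "PP--P", "sprop-pps": "PP--P",
    "sprop-sub-layer-id": "PP--P", "recv-sub-layer-id": "XOXO-",
    "dec-parallel-cap": "RRRR-", "include-dph": "RRRR-",
}


def interpret(params, column):
    """Split `params` per Table 1; raise if an 'X' parameter is present."""
    col = COLUMNS.index(column)
    kept, ignored = {}, []
    for name, value in params.items():
        code = TABLE_1.get(name)
        if code is None or code[col] == "-":
            ignored.append(name)  # unknown or not usable: MUST be ignored
        elif code[col] == "X":
            raise ValueError("%s MUST NOT be present here" % name)
        else:
            kept[name] = (code[col], value)
    return kept, ignored
```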

   Parameters used for declaring receiver capabilities are, in general,
   downgradable; i.e., they express the upper limit for a sender's
   possible behavior.  Thus, a sender MAY choose to set its encoder
   using only lower/lesser or equal values of these parameters.

   When the answer does not include a recv-sub-layer-id that is less
   than the sprop-sub-layer-id in the offer, parameters declaring a
   configuration point are not changeable, with the exception of the
   level-id parameter for unicast usage, and these parameters express
   values a receiver expects to be used and MUST be used verbatim in the
   answer as in the offer.

   When a sender's capabilities are declared with the configuration
   parameters, these parameters express a configuration that is
   acceptable for the sender to receive bitstreams.  In order to achieve
   high interoperability levels, it is often advisable to offer multiple
   alternative configurations.  It is impossible to offer multiple
   configurations in a single payload type.  Thus, when multiple
   configuration offers are made, each offer requires its own RTP
   payload type associated with the offer.  However, it is possible to
   offer multiple operation points using one configuration in a single
   payload type by including sprop-vps in the offer and recv-sub-layer-
   id in the answer.
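
   Because each alternative configuration needs its own payload type,
   an offer with several configurations lists several payload types on
   one "m=" line.  The Python sketch below builds such a media
   description; the function name is hypothetical, and it assumes
   payload type numbers are taken consecutively from the dynamic range
   96-127.

```python
def build_offer_media(port, configs, first_pt=96):
    """Emit m=/a=rtpmap/a=fmtp lines, one dynamic payload type per config."""
    pts = list(range(first_pt, first_pt + len(configs)))
    if pts and pts[-1] > 127:
        raise ValueError("ran out of dynamic payload types (96-127)")
    lines = ["m=video %d RTP/AVP %s" % (port, " ".join(map(str, pts)))]
    for pt, cfg in zip(pts, configs):
        lines.append("a=rtpmap:%d H265/90000" % pt)
        if cfg:  # one configuration per payload type
            fmtp = ";".join("%s=%s" % (k, v) for k, v in cfg.items())
            lines.append("a=fmtp:%d %s" % (pt, fmtp))
    return lines
```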

   A receiver SHOULD understand all media type parameters, even if it
   only supports a subset of the payload format's functionality.  This
   ensures that a receiver is capable of understanding when an offer to
   receive media can be downgraded to what is supported by the receiver
   of the offer.

   An answerer MAY extend the offer with additional media format
   configurations.  However, to enable their usage, in most cases a
   second offer is required from the offerer to provide the bitstream
   property parameters that the media sender will use.  This also has
   the effect that the offerer has to be able to receive this media
   format configuration, not only to send it.

7.2.3.  Usage in Declarative Session Descriptions

   When HEVC over RTP is offered with SDP in a declarative style, as in
   Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement
   Protocol (SAP) [RFC2974], the following considerations are necessary.

      o  All parameters capable of indicating both bitstream properties
         and receiver capabilities are used to indicate only bitstream
         properties.  For example, in this case, the parameters
         profile-id, tier-flag, and level-id declare the values used by
         the bitstream, not the capabilities for receiving bitstreams.
         As a result, the following interpretation of the parameters
         MUST be used:

         + Declaring actual configuration or bitstream properties:
            - profile-space
            - profile-id
            - tier-flag
            - level-id
            - interop-constraints
            - profile-compatibility-indicator
            - tx-mode
            - sprop-vps
            - sprop-sps
            - sprop-pps
            - sprop-max-don-diff
            - sprop-depack-buf-nalus
            - sprop-depack-buf-bytes
            - sprop-segmentation-id
            - sprop-spatial-segmentation-idc

         + Not usable (when present, they MUST be ignored):
            - max-lps
            - max-lsr
            - max-cpb
            - max-dpb
            - max-br
            - max-tr
            - max-tc
            - max-fps
            - max-recv-level-id
            - depack-buf-cap
            - sprop-sub-layer-id
            - dec-parallel-cap
            - include-dph

      o  A receiver of the SDP is required to support all parameters and
         values of the parameters provided; otherwise, the receiver MUST
         reject (RTSP) or not participate in (SAP) the session.  It
         falls on the creator of the session to use values that are
         expected to be supported by the receiving application.
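
   The receiver-side decision for a declarative session description can
   be sketched as follows.  This Python fragment is illustrative rather
   than normative: the supports callback is a hypothetical stand-in for
   the application's own capability check, and only the parameters in
   the list of actual configuration or bitstream properties are
   examined, since the others MUST be ignored.

```python
BITSTREAM_PROPERTIES = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator", "tx-mode",
    "sprop-vps", "sprop-sps", "sprop-pps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
}


def accept_declarative(params, supports):
    """Decide whether to use a declaratively described session.

    `supports(name, value)` is a hypothetical application callback that
    reports whether the receiver can handle a given parameter value.
    Parameters outside BITSTREAM_PROPERTIES are ignored.
    """
    for name, value in params.items():
        if name in BITSTREAM_PROPERTIES and not supports(name, value):
            return False  # reject (RTSP) or do not participate (SAP)
    return True
```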

7.2.4.  Considerations for Parameter Sets

   When out-of-band transport of parameter sets is used, parameter sets
   MAY still be additionally transported in-band unless explicitly
   disallowed by an application, and some of these additional parameter
   sets may update some of the out-of-band transported parameter sets.
   Update of a parameter set refers to the sending of a parameter set of
   the same type using the same parameter set ID but with different
   values for at least one other parameter of the parameter set.
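
   A receiver has to keep out-of-band parameter sets, notice in-band
   updates, and, when the "fmtp" source attribute or multicast is used,
   keep parameter sets separated by source.  A minimal Python sketch,
   assuming the caller has already parsed each parameter set's ID from
   its RBSP (for an SPS this requires parsing past profile_tier_level( ));
   the class and function names are hypothetical.  nal_unit_type is read
   from the two-byte HEVC NAL unit header, in which VPS, SPS, and PPS use
   types 32, 33, and 34.  Updates are detected by comparing raw NAL unit
   bytes, which is a simplification.

```python
VPS_NUT, SPS_NUT, PPS_NUT = 32, 33, 34


def nal_unit_type(nal):
    """nal_unit_type from the first byte of the HEVC NAL unit header."""
    return (nal[0] >> 1) & 0x3F


class ParameterSetStore:
    """Parameter sets keyed by (SSRC, NAL unit type, parameter set ID)."""

    def __init__(self):
        self._sets = {}

    def receive(self, ssrc, nal, ps_id):
        nut = nal_unit_type(nal)
        if nut not in (VPS_NUT, SPS_NUT, PPS_NUT):
            raise ValueError("not a parameter set NAL unit")
        key = (ssrc, nut, ps_id)
        previous = self._sets.get(key)
        self._sets[key] = bytes(nal)
        if previous is None:
            return "new"
        # Same type and same ID but different content is an update.
        return "repeat" if previous == bytes(nal) else "update"

    def lookup(self, ssrc, nut, ps_id):
        # Only sets associated with this source may decode its NAL units.
        return self._sets.get((ssrc, nut, ps_id))
```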

7.2.5.  Dependency Signaling in Multi-Stream Mode

   If MRST or MRMT is used, the rules on signaling media decoding
   dependency in SDP as defined in [RFC5583] apply.  The rules on
   "hierarchical or layered encoding" with multicast in Section 5.7 of
   [RFC4566] do not apply.  This means that the notation for Connection
   Data "c=" SHALL NOT be used with more than one address, i.e., the
   sub-field <number of addresses> in the sub-field <connection-address>
   of the "c=" field, described in [RFC4566], must not be present.  The
   order of session dependency is given from the RTP stream containing
   the lowest temporal sub-layer to the RTP stream containing the
   highest temporal sub-layer.
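
   The restriction on "c=" can be checked mechanically.  The Python
   sketch below (hypothetical function name) follows the
   connection-address forms of [RFC4566]: an IP4 multicast address
   carries a TTL and optionally a number of addresses
   (<address>/<ttl>/<number>), while an IP6 multicast address carries
   only the optional number (<address>/<number>).  A line for which it
   returns True is not acceptable in MRST or MRMT.

```python
def has_number_of_addresses(c_line):
    """True if an SDP "c=" line carries the <number of addresses> subfield."""
    if not c_line.startswith("c="):
        raise ValueError("not a c= line")
    nettype, addrtype, address = c_line[2:].split()
    if nettype != "IN":
        raise ValueError("unsupported network type")
    parts = address.split("/")
    if addrtype == "IP4":
        return len(parts) == 3  # address/ttl/number
    if addrtype == "IP6":
        return len(parts) == 2  # address/number
    raise ValueError("unsupported address type")
```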

8.  Use with Feedback Messages

   The following subsections define the use of the Picture Loss
   Indication (PLI), Slice Loss Indication (SLI), Reference Picture
   Selection Indication (RPSI), and Full Intra Request (FIR) feedback
   messages with HEVC.  The PLI, SLI, and RPSI messages are defined in
   [RFC4585], and the FIR message is defined in [RFC5104].

8.1.  Picture Loss Indication (PLI)

   As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
   media sender indicates "the loss of an undefined amount of coded
   video data belonging to one or more pictures".  Without having any
   specific knowledge of the setup of the bitstream (such as use and
   location of in-band parameter sets, non-IDR decoder refresh points,
   picture structures, and so forth), the reaction of an HEVC sender to
   the reception of a PLI SHOULD be to send an IDR picture and the
   relevant parameter sets, potentially with sufficient redundancy to
   ensure correct reception.  However, sometimes information about the
   bitstream structure is known.  For example, state could have been
   established outside of the mechanisms defined in this document that
   parameter sets are conveyed out of band only, and stay static for the
   duration of the session.  In that case, it is obviously unnecessary
   to send them in-band as a result of the reception of a PLI.  Other
   examples could be devised based on a priori knowledge of different
   aspects of the bitstream structure.  In all cases, the timing and
   congestion control mechanisms of RFC 4585 MUST be observed.

8.2.  Slice Loss Indication (SLI)

   The SLI described in RFC 4585 can be used to indicate, to a sender,
   the loss of a number of Coding Tree Blocks (CTBs) in CTB raster scan
   order of a picture.  In the SLI's Feedback Control Indication (FCI)
   field, the subfield "First" MUST be set to the CTB address of the
   first lost CTB.  Note that the CTB address is in CTB-raster-scan
   order of a picture.  For the first CTB of a slice segment, the CTB
   address is the value of slice_segment_address when present, or 0 when
   the value of first_slice_segment_in_pic_flag is equal to 1; both
   syntax elements are in the slice segment header.  The subfield
   "Number" MUST be set to the number of consecutive lost CTBs, again in
   CTB-raster-scan order of a picture.  Note that because both "First"
   and "Number" are counted in CTBs in CTB-raster-scan order of a
   picture, not in tile-scan order (which is the bitstream order of
   CTBs), multiple SLI messages may be needed to report the loss of one
   tile covering multiple CTB rows but narrower than the picture.

   The subfield "PictureID" MUST be set to the 6 least significant bits
   of a binary representation of the value of PicOrderCntVal, as defined
   in [HEVC], of the picture for which the lost CTBs are indicated.
   Note that for IDR pictures the syntax element slice_pic_order_cnt_lsb
   is not present, but then the value is inferred to be equal to 0.
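
   The FCI layout of [RFC4585] packs "First" (13 bits), "Number" (13
   bits), and "PictureID" (6 bits) into one 32-bit word.  The Python
   sketch below (hypothetical function name) derives those fields as
   described above and splits a set of lost CTB addresses into runs of
   consecutive raster-scan addresses, so a lost tile narrower than the
   picture yields several entries.  It assumes the lost addresses are
   already known and simply rejects values that do not fit in 13 bits.

```python
def sli_fcis(lost_ctb_addrs, pic_order_cnt_val):
    """Build 32-bit SLI FCI words for lost CTBs of one picture."""
    picture_id = pic_order_cnt_val & 0x3F  # 6 LSBs of PicOrderCntVal
    addrs = sorted(set(lost_ctb_addrs))
    fcis = []
    i = 0
    while i < len(addrs):
        first = addrs[i]
        number = 1
        # Extend the run while CTB addresses stay consecutive in raster scan.
        while i + number < len(addrs) and addrs[i + number] == first + number:
            number += 1
        if first >= 1 << 13 or number >= 1 << 13:
            raise ValueError("value does not fit in a 13-bit field")
        fcis.append((first << 19) | (number << 6) | picture_id)
        i += number
    return fcis
```

   For a picture four CTBs wide, losing a tile that covers columns 1-2
   of rows 0-1 (addresses 1, 2, 5, and 6) produces two entries, one per
   CTB row.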

   As described in RFC 4585, an encoder in a media sender can use this
   information to "clean up" the corrupted picture by sending intra
   information, while observing the constraints described in RFC 4585,
   for example, with respect to congestion control.  In many cases,
   error tracking is required to identify the corrupted region in the
   receiver's state (reference pictures) because of error import in
   uncorrupted regions of the picture through motion compensation.
   Reference-picture selection can also be used to "clean up" the
   corrupted picture, which is usually more efficient and less likely to
   generate congestion than sending intra information.

   In contrast to the video codecs contemplated in RFCs 4585 and 5104
   [RFC5104], in HEVC, the "macroblock size" is not fixed to 16x16 luma
   samples, but is variable.  That, however, does not create a
   conceptual difficulty with SLI, because the setting of the CTB size
   is a sequence-level functionality, and using a slice loss indication
   across CVS boundaries is meaningless as there is no prediction across
   sequence boundaries.  However, a proper use of SLI messages is not as
   straightforward as it was with older, fixed-macroblock-sized video
   codecs, as the state of the sequence parameter set (where the CTB
   size is located) has to be taken into account when interpreting the
   "First" subfield in the FCI.

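   Interpreting "First" therefore needs CtbSizeY and the picture width
   from the active SPS.  The Python sketch below (hypothetical function
   names) follows the CtbSizeY and PicWidthInCtbsY derivations of [HEVC]
   to map a CTB address to the luma position of the CTB's top-left
   sample; it shows that the same address refers to different picture
   regions for different CTB sizes.

```python
def ctb_size_y(log2_min_luma_coding_block_size_minus3,
               log2_diff_max_min_luma_coding_block_size):
    """CtbSizeY derived from the two SPS syntax elements."""
    ctb_log2 = (log2_min_luma_coding_block_size_minus3 + 3
                + log2_diff_max_min_luma_coding_block_size)
    return 1 << ctb_log2


def ctb_addr_to_luma(ctb_addr, pic_width_in_luma_samples, ctb_size):
    """Top-left luma sample (x, y) of a CTB from its raster-scan address."""
    pic_width_in_ctbs = -(-pic_width_in_luma_samples // ctb_size)  # ceil
    x = (ctb_addr % pic_width_in_ctbs) * ctb_size
    y = (ctb_addr // pic_width_in_ctbs) * ctb_size
    return x, y
```
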
8.3.  Reference Picture Selection Indication (RPSI)

   Feedback-based reference picture selection has been shown to be a
   powerful tool to stop temporal error propagation for improved error
   resilience [Girod99][Wang05].  In one approach, the decoder side
   tracks errors in the decoded pictures and informs the encoder side
   that a particular picture that has been decoded relatively earlier is
   correct and still present in the decoded picture buffer; it requests
   the encoder to use that correct picture-availability information when
   encoding the next picture, so as to stop further temporal error
   propagation.  For this approach, the decoder side should use the RPSI
   feedback message.

   Encoders can encode some long-term reference pictures as specified in
   H.264 or HEVC for the purposes described in the previous paragraph
   without the need for a huge decoded picture buffer.  As shown in
   [Wang05], with a flexible reference picture management scheme, as in
   H.264 and HEVC, even a decoded picture buffer size of two picture
   storage buffers would work for the approach described in the previous
   paragraph.

   The field "Native RPSI bit string defined per codec" is a base16
   [RFC4648] representation of the 8 bits consisting of the 2 most
   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined in
   [HEVC], followed by the 32 bits representing the value of the
   PicOrderCntVal (in network byte order), as defined in [HEVC], for the
   picture that is indicated by the RPSI feedback message.
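
   The five bytes described above can be formed with Python's standard
   struct and base64 modules, as in the sketch below (hypothetical
   function name).  It assumes PicOrderCntVal is given as a signed
   32-bit integer and encodes it in two's complement, and it uses the
   upper-case base16 alphabet of [RFC4648].

```python
import base64
import struct


def rpsi_native_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Base16 of 2 zero bits + 6-bit nuh_layer_id, then 32-bit POC."""
    if not 0 <= nuh_layer_id < 64:
        raise ValueError("nuh_layer_id is a 6-bit value")
    # '>Bi': one unsigned byte, then a signed 32-bit int in network order.
    raw = struct.pack(">Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")
```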

   The use of the RPSI feedback message as positive acknowledgement with
   HEVC is deprecated.  In other words, the RPSI feedback message MUST
   only be used as a reference picture selection request, such that it
   can also be used in multicast.

8.4.  Full Intra Request (FIR)

   The purpose of the FIR message is to force an encoder to send an
   independent decoder refresh point as soon as possible (observing, for
   example, the congestion-control-related constraints set out in
   [RFC5104]).

   Upon reception of a FIR, a sender MUST send an IDR picture.
   Parameter sets MUST also be sent, except when there is a priori
   knowledge that the parameter sets have been correctly established.  A
   typical example is an understanding between sender and receiver,
   established by means outside this document, that parameter sets are
   exclusively sent out-of-band.
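
   The following Python sketch expresses the sender-side rule above.
   It is a hypothetical helper, not part of any particular encoder API.
   It only decides which HEVC NAL unit types to emit next, in order.
   Rate limiting under the congestion-control constraints of [RFC5104]
   is left to the caller.

```python
from __future__ import annotations

from enum import IntEnum


class NalType(IntEnum):
    """HEVC nal_unit_type values used below (see [HEVC], Table 7-1)."""
    IDR_W_RADL = 19
    IDR_N_LP = 20
    VPS_NUT = 32
    SPS_NUT = 33
    PPS_NUT = 34


def nal_types_after_fir(param_sets_known_good: bool) -> list[NalType]:
    """NAL unit types a sender emits, in order, in response to a FIR."""
    response: list[NalType] = []
    if not param_sets_known_good:
        # Without a priori knowledge (e.g., an agreement that parameter sets
        # are sent only out-of-band), re-send them ahead of the IDR picture.
        response += [NalType.VPS_NUT, NalType.SPS_NUT, NalType.PPS_NUT]
    # Either IDR type satisfies the rule; IDR_W_RADL is an arbitrary choice here.
    response.append(NalType.IDR_W_RADL)
    return response
```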

9.  Security Considerations

   The scope of this Security Considerations section is limited to the
   payload format itself and to one feature of HEVC that may pose a
   particularly serious security risk if implemented naively.  The
   payload format, in isolation, does not form a complete system.
   Implementers are advised to read and understand relevant security-
   related documents, especially those pertaining to RTP (see the
   Security Considerations section in [RFC3550]), and the security of
   the call-control stack chosen (which may make use of the media type
   registration of this memo).  Implementers should also consider known
   security vulnerabilities of video coding and decoding implementations
   in general and avoid them.

   Within this RTP payload format, and with the exception of the user
   data SEI message as described below, no security threats other than
   those common to RTP payload formats are known.  In other words,
   neither the various media-plane-based mechanisms, nor the signaling
   part of this memo, seems to pose a security risk beyond those common
   to all RTP-based systems.

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any applicable RTP profile such as
   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why
   RTP Does Not Mandate a Single Media Security Solution" [RFC7202]
   discusses, it is not an RTP payload format's responsibility to
   discuss or mandate what solutions are used to meet the basic security
   goals like confidentiality, integrity and source authenticity for RTP
   in general.  This responsibility lies with anyone using RTP in an
   application, who can find guidance on available security mechanisms
   and important considerations in "Options for Securing RTP Sessions"
   [RFC7201].  Applications SHOULD use one or more appropriate strong
   security mechanisms.  The rest of this section discusses the
   security-impacting properties of the payload format itself.

   Because the data compression used with this payload format is applied
   end-to-end, any encryption needs to be performed after compression.
   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the bitstream that are complex to decode and that cause the
   receiver to be overloaded.  H.265 is particularly vulnerable to such
   attacks, as it is extremely simple to generate datagrams containing
   NAL units that affect the decoding process of many future NAL units.
   Therefore, the usage of data origin authentication and data integrity
   protection of at least the RTP packet is RECOMMENDED, for example,
   with SRTP [RFC3711].

   Like [H.264], HEVC includes a user data Supplemental Enhancement
   Information (SEI) message.  This SEI message allows inclusion of an
   arbitrary bitstring into the video bitstream.  Such a bitstring could
   include JavaScript, machine code, and other active content.  HEVC
   leaves the handling of this SEI message to the receiving system.  In
   order to avoid harmful side effects of the user data SEI message,
   decoder implementations cannot naively trust its content.  For
   example, it would be a bad and insecure implementation practice to
   forward any JavaScript a decoder implementation detects to a web
   browser.  The safest way to deal with user data SEI messages is to
   simply discard them, but doing so can degrade the user's quality of
   experience.
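
   The following Python sketch shows one way a receiver could follow
   the "discard" approach.  It assumes NAL units are already
   depacketized and start with the 2-byte NAL unit header.  It treats
   payloadType 4 (user_data_registered_itu_t_t35) and payloadType 5
   (user_data_unregistered) as user data SEI messages.  For simplicity,
   it drops the whole prefix or suffix SEI NAL unit (types 39 and 40)
   when any message in it is user data, which also discards any other
   SEI messages carried alongside.  The function names are
   hypothetical.

```python
from __future__ import annotations

PREFIX_SEI_NUT = 39
SUFFIX_SEI_NUT = 40
USER_DATA_PAYLOAD_TYPES = {4, 5}  # user_data_registered_itu_t_t35, user_data_unregistered


def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Drop each emulation prevention byte (0x03 following 0x00 0x00)."""
    out = bytearray()
    zeros = 0
    for byte in ebsp:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


def _read_ff_coded(buf: bytes, pos: int, end: int) -> tuple[int, int]:
    """Read an SEI payloadType or payloadSize value; return (value, next_pos)."""
    value = 0
    while True:
        if pos >= end:
            raise ValueError("truncated SEI message")
        byte = buf[pos]
        pos += 1
        value += byte  # each 0xFF byte adds 255 and continues the value
        if byte != 0xFF:
            return value, pos


def sei_payload_types(nal: bytes) -> list[int]:
    """Return the payloadType of every SEI message in one SEI NAL unit."""
    rbsp = remove_emulation_prevention(nal[2:])  # skip the NAL unit header
    end = len(rbsp)
    while end > 0 and rbsp[end - 1] == 0x00:  # ignore trailing zero bytes
        end -= 1
    if end == 0 or rbsp[end - 1] != 0x80:
        raise ValueError("missing rbsp_trailing_bits")
    end -= 1  # exclude the byte-aligned rbsp_trailing_bits (0x80)
    types, pos = [], 0
    while pos < end:
        payload_type, pos = _read_ff_coded(rbsp, pos, end)
        payload_size, pos = _read_ff_coded(rbsp, pos, end)
        if pos + payload_size > end:
            raise ValueError("SEI payload exceeds NAL unit")
        types.append(payload_type)
        pos += payload_size
    return types


def drop_user_data_sei(nal_units: list[bytes]) -> list[bytes]:
    """Remove SEI NAL units that carry any user data SEI message."""
    kept = []
    for nal in nal_units:
        nal_type = (nal[0] >> 1) & 0x3F
        if nal_type in (PREFIX_SEI_NUT, SUFFIX_SEI_NUT):
            try:
                if USER_DATA_PAYLOAD_TYPES & set(sei_payload_types(nal)):
                    continue
            except ValueError:
                continue  # malformed SEI: discarding is the safe choice
        kept.append(nal)
    return kept
```

   The end of the SEI RBSP is located from the trailing 0x80 byte
   rather than by testing each message's first byte for 0x80.  This
   distinction matters because payloadType 128 is itself a valid value.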

   End-to-end security with authentication, integrity, or
   confidentiality protection will prevent a MANE from performing media-
   aware operations other than discarding complete packets.  In the case
   of confidentiality protection, it will even be prevented from
   discarding packets in a media-aware way.  To be allowed to perform
   such operations, a MANE is required to be a trusted entity that is
   included in the security context establishment.

10.  Congestion Control

   Congestion control for RTP SHALL be used in accordance with RTP
   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
   If best-effort service is being used, an additional requirement is
   that users of this payload format MUST monitor packet loss to ensure
   that the packet loss rate is within an acceptable range.  Packet loss
   is considered acceptable if a TCP flow across the same network path,
   and experiencing the same network conditions, would achieve an
   average throughput, measured on a reasonable timescale, that is not
   less than what all RTP streams combined are achieving.  This
   condition can be satisfied by implementing congestion-control
   mechanisms to adapt the transmission rate or the number of layers
   subscribed for a layered multicast session, or by arranging for a
   receiver to leave the session if the loss rate is unacceptably high.
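
   This memo does not specify how to estimate the throughput a TCP flow
   would achieve.  The following Python sketch substitutes one common
   estimate: the TCP throughput equation used by TCP-Friendly Rate
   Control (RFC 5348), with the simplifications t_RTO = 4*R and b = 1
   suggested there.  The function names, and the use of a measured
   round-trip time, loss event rate, and packet size as inputs, are
   assumptions of this sketch.

```python
from __future__ import annotations

import math


def tcp_friendly_rate(packet_size: int, rtt: float, loss_rate: float, b: int = 1) -> float:
    """TCP throughput equation as used by TFRC (RFC 5348), in bytes per second.

    packet_size in bytes, rtt in seconds, loss_rate = loss event rate p.
    """
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in (0, 1]")
    t_rto = 4.0 * rtt  # simplification suggested by RFC 5348
    denom = (rtt * math.sqrt(2.0 * b * loss_rate / 3.0)
             + t_rto * (3.0 * math.sqrt(3.0 * b * loss_rate / 8.0))
             * loss_rate * (1.0 + 32.0 * loss_rate ** 2))
    return packet_size / denom


def loss_is_acceptable(rtp_rates_bps: list[float], packet_size: int,
                       rtt: float, loss_rate: float) -> bool:
    """True if a TCP flow would get at least the combined RTP bitrate."""
    if loss_rate == 0.0:
        return True  # no observed loss; the equation is undefined at p = 0
    tcp_bps = 8.0 * tcp_friendly_rate(packet_size, rtt, loss_rate)
    return sum(rtp_rates_bps) <= tcp_bps
```

   With 1200-byte packets, a 100 ms round-trip time, and a loss event
   rate of 1%, the equation gives roughly 135 kbytes/s (about 1.08
   Mbit/s).  Two RTP streams totaling 800 kbit/s would meet the
   condition; a single 2 Mbit/s stream would not.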

   The bitrate adaptation necessary for obeying the congestion control
   principle is easily achievable when real-time encoding is used, for
   example, by adequately tuning the quantization parameter.
   However, when pre-encoded content is being transmitted, bandwidth
   adaptation requires the pre-encoded bitstream to be tailored for such
   adaptivity.  The key mechanism available in HEVC is temporal
   scalability.  A media sender can remove NAL units belonging to higher
   temporal sub-layers (i.e., those NAL units with a high value of TID)
   until the sending bitrate drops to an acceptable range.  HEVC
   contains mechanisms that allow the lightweight identification of
   switching points in temporal enhancement layers, as discussed in
   Section 1.1.2 of this memo.  An HEVC media sender can send packets
   belonging to NAL units of temporal enhancement layers starting from
   these switching points to probe for available bandwidth and to
   utilize bandwidth that has been shown to be available.
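
   The following Python sketch shows the temporal thinning described
   above.  It assumes the sender operates on NAL units before
   packetization.  Each NAL unit is assumed to start with the 2-byte
   NAL unit header, whose last 3 bits are nuh_temporal_id_plus1.  It
   also assumes that per-sub-layer bitrates have been measured
   elsewhere.  The function names are hypothetical.  The base sub-layer
   is never removed.

```python
from __future__ import annotations


def temporal_id(nal: bytes) -> int:
    """TemporalId from the 2-byte NAL unit header (nuh_temporal_id_plus1 - 1)."""
    tid_plus1 = nal[1] & 0x07  # last 3 bits of the header
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return tid_plus1 - 1


def choose_max_tid(rate_per_tid: list[float], target_rate: float) -> int:
    """Highest TemporalId whose sub-layers 0..TemporalId together fit target_rate.

    rate_per_tid[t] is the measured bitrate of NAL units with TemporalId t.
    TemporalId 0 is always kept, even if it alone exceeds the target.
    """
    total = rate_per_tid[0]
    max_tid = 0
    for tid in range(1, len(rate_per_tid)):
        total += rate_per_tid[tid]
        if total > target_rate:
            break
        max_tid = tid
    return max_tid


def thin_to_sublayer(nal_units: list[bytes], max_tid: int) -> list[bytes]:
    """Keep only NAL units whose TemporalId does not exceed max_tid."""
    return [nal for nal in nal_units if temporal_id(nal) <= max_tid]
```

   For example, with measured sub-layer rates of 1000, 400, 300, and
   200 kbit/s and a target of 1500 kbit/s, choose_max_tid() returns 1.
   Sub-layers 2 and 3 are then removed.  Raising max_tid again at a
   switching point is how the sender would probe for more bandwidth.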

   The above mechanisms generally work within a defined profile and
   level; therefore, no renegotiation of the channel is required.  Only
   when non-downgradable parameters (such as profile) need to be changed
   does it become necessary to terminate and restart the RTP stream(s).
   This may be accomplished by using different RTP payload types.

   MANEs MAY remove certain unusable packets from the RTP stream when
   that RTP stream was damaged due to previous packet losses.  This can
   help reduce the network load in certain special cases.  For example,
   MANEs can remove those FUs where the leading FUs belonging to the
   same NAL unit have been lost or those dependent slice segments when
   the leading slice segments belonging to the same slice have been
   lost, because the trailing FUs or dependent slice segments are
   meaningless to most decoders.  MANEs can also remove higher temporal
   scalable layers if the outbound transmission (from the MANE's
   viewpoint) experiences congestion.
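
   The following Python sketch implements the first MANE behavior
   above: removing FUs whose leading fragments were lost.  It assumes
   the input is a sequence of (RTP sequence number, RTP payload) pairs
   already in sequence-number order.  It recognizes FUs by payload
   header Type 49 and reads the S and E bits from the FU header.  PACI
   packets, dependent slice segments, and temporal layer removal are
   not handled.  The function name is hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator

FU_TYPE = 49  # RTP payload header Type value of a fragmentation unit


def filter_orphan_fus(
    packets: Iterable[tuple[int, bytes]],
) -> Iterator[tuple[int, bytes]]:
    """Forward packets, dropping FUs whose leading fragments were lost.

    packets: (RTP sequence number, RTP payload) pairs in sequence-number order.
    """
    expected = None  # next sequence number if nothing is lost
    in_fu = False    # True while all fragments of the current NAL unit arrived
    for seq, payload in packets:
        gap = expected is not None and seq != expected
        expected = (seq + 1) & 0xFFFF  # 16-bit RTP sequence number wrap-around
        if (payload[0] >> 1) & 0x3F != FU_TYPE:
            in_fu = False  # any non-FU packet ends a fragmented NAL unit
            yield seq, payload
            continue
        fu_header = payload[2]  # S(1) E(1) FuType(6), after the 2-byte payload header
        start = bool(fu_header & 0x80)
        end = bool(fu_header & 0x40)
        if start:
            in_fu = not end  # S and E are never both set in a valid FU
            yield seq, payload
        elif in_fu and not gap:
            in_fu = not end
            yield seq, payload
        else:
            in_fu = False  # a leading fragment is missing: drop until next start
```

   The sketch drops a non-start fragment that arrives after a gap, and
   every later fragment of the same NAL unit.  It resumes forwarding
   with the next FU that has the S bit set or with a non-FU packet.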

11.  IANA Considerations

   A new media type, as specified in Section 7.1 of this memo, has been
   registered with IANA.

12.  References

12.1.  Normative References

   [H.264]   ITU-T, "Advanced video coding for generic audiovisual
             services", ITU-T Recommendation H.264, April 2013.

   [HEVC]    ITU-T, "High efficiency video coding", ITU-T Recommendation
             H.265, April 2013.

   [ISO23008-2]
             ISO/IEC, "Information technology -- High efficiency coding
             and media delivery in heterogeneous environments -- Part 2:
             High efficiency video coding", ISO/IEC 23008-2, 2013.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119,
             DOI 10.17487/RFC2119, March 1997,
             <http://www.rfc-editor.org/info/rfc2119>.

   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
             with Session Description Protocol (SDP)", RFC 3264,
             DOI 10.17487/RFC3264, June 2002,
             <http://www.rfc-editor.org/info/rfc3264>.

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
             Jacobson, "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July
             2003, <http://www.rfc-editor.org/info/rfc3550>.

   [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
             Video Conferences with Minimal Control", STD 65, RFC 3551,
             DOI 10.17487/RFC3551, July 2003,
             <http://www.rfc-editor.org/info/rfc3551>.

   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
             Norrman, "The Secure Real-time Transport Protocol (SRTP)",
             RFC 3711, DOI 10.17487/RFC3711, March 2004,
             <http://www.rfc-editor.org/info/rfc3711>.

   [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
             Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July
             2006, <http://www.rfc-editor.org/info/rfc4566>.

   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
             "Extended RTP Profile for Real-time Transport Control
             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
             DOI 10.17487/RFC4585, July 2006,
             <http://www.rfc-editor.org/info/rfc4585>.

   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
             Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
             <http://www.rfc-editor.org/info/rfc4648>.

   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
             "Codec Control Messages in the RTP Audio-Visual Profile
             with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
             February 2008, <http://www.rfc-editor.org/info/rfc5104>.

   [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
             Real-time Transport Control Protocol (RTCP)-Based Feedback
             (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
             2008, <http://www.rfc-editor.org/info/rfc5124>.

   [RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax
             Specifications: ABNF", STD 68, RFC 5234,
             DOI 10.17487/RFC5234, January 2008,
             <http://www.rfc-editor.org/info/rfc5234>.

   [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
             Attributes in the Session Description Protocol (SDP)",
             RFC 5576, DOI 10.17487/RFC5576, June 2009,
             <http://www.rfc-editor.org/info/rfc5576>.

   [RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding
             Dependency in the Session Description Protocol (SDP)",
             RFC 5583, DOI 10.17487/RFC5583, July 2009,
             <http://www.rfc-editor.org/info/rfc5583>.

12.2.  Informative References

   [3GPDASH] 3GPP, "Transparent end-to-end Packet-switched Streaming
             Service (PSS); Progressive Download and Dynamic Adaptive
             Streaming over HTTP (3GP-DASH)", 3GPP TS 26.247 12.1.0,
             December 2013.

   [3GPPFF]  3GPP, "Transparent end-to-end packet switched streaming
             service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244
             12.20, December 2013.

   [CABAC]   Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, M.,
             Clare, G., Henry, F., and A. Duenas, "Transform
             coefficient coding in HEVC", IEEE Transactions on Circuits
             and Systems for Video Technology, Vol. 22, No. 12,
             pp. 1765-1777, DOI 10.1109/TCSVT.2012.2223055, December
             2012.

   [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
             for mobile video transmission", Proceedings of the IEEE,
             Vol. 87, No. 10, pp. 1707-1723, DOI 10.1109/5.790632,
             October 1999.

   [H.265.1] ITU-T, "Conformance specification for ITU-T H.265 high
             efficiency video coding", ITU-T Recommendation H.265.1,
             October 2014.

   [HEVCv2]  Flynn, D., Naccari, M., Rosewarne, C., Sharman, K., Sole,
             J., Sullivan, G. J., and T. Suzuki, "High Efficiency Video
             Coding (HEVC) Range Extensions text specification: Draft
             7", JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting,
             Valencia, Spain, March/April 2014.

   [ISO14496-12]
             ISO/IEC, "Information technology - Coding of audio-visual
             objects - Part 12: ISO base media file format", ISO/IEC
             14496-12, 2015.

   [ISO15444-12]
             ISO/IEC, "Information technology - JPEG 2000 image coding
             system - Part 12: ISO base media file format", ISO/IEC
             15444-12, 2015.

   [JCTVC-J0107]
             Wang, Y.-K., Chen, Y., Joshi, R., and K. Ramasubramonian,
             "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th
             JCT-VC meeting, Stockholm, Sweden, July 2012.

   [MPEG2S]  ISO/IEC, "Information technology - Generic coding of moving
             pictures and associated audio information - Part 1:
             Systems", ISO International Standard 13818-1, 2013.

   [MPEGDASH] ISO/IEC, "Information technology - Dynamic adaptive
             streaming over HTTP (DASH) -- Part 1: Media presentation
             description and segment formats", ISO International
             Standard 23009-1, 2012.

   [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
             Streaming Protocol (RTSP)", RFC 2326, DOI 10.17487/RFC2326,
             April 1998, <http://www.rfc-editor.org/info/rfc2326>.

   [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
             Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
             October 2000, <http://www.rfc-editor.org/info/rfc2974>.

   [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
             Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010,
             <http://www.rfc-editor.org/info/rfc6051>.

   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
             Payload Format for H.264 Video", RFC 6184,
             DOI 10.17487/RFC6184, May 2011,
             <http://www.rfc-editor.org/info/rfc6184>.

   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis,
             "RTP Payload Format for Scalable Video Coding", RFC 6190,
             DOI 10.17487/RFC6190, May 2011,
             <http://www.rfc-editor.org/info/rfc6190>.

   [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
             Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
             <http://www.rfc-editor.org/info/rfc7201>.

   [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP Framework:
             Why RTP Does Not Mandate a Single Media Security Solution",
             RFC 7202, DOI 10.17487/RFC7202, April 2014,
             <http://www.rfc-editor.org/info/rfc7202>.

   [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
             B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for
             Real-Time Transport Protocol (RTP) Sources", RFC 7656,
             DOI 10.17487/RFC7656, November 2015,
             <http://www.rfc-editor.org/info/rfc7656>.

   [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
             DOI 10.17487/RFC7667, November 2015,
             <http://www.rfc-editor.org/info/rfc7667>.

   [RTP-MULTI-STREAM]
             Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
             "Sending Multiple Media Streams in a Single RTP Session",
             Work in Progress, draft-ietf-avtcore-rtp-multi-stream-11,
             December 2015.

   [SDP-NEG] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating
             Media Multiplexing Using Session Description Protocol
             (SDP)", Work in Progress,
             draft-ietf-mmusic-sdp-bundle-negotiation-25, January 2016.

   [Wang05]  Wang, Y.-K., Zhu, C., and H. Li, "Error resilient video
             coding using flexible reference frames", Visual
             Communications and Image Processing 2005 (VCIP 2005),
             Beijing, China, July 2005.

Acknowledgements

   Muhammed Coban and Marta Karczewicz are thanked for discussions on
   the specification of the use with feedback messages and other aspects
   in this memo.  Jonathan Lennox and Jill Boyce are thanked for their
   contributions to the PACI design included in this memo.  Rickard
   Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and Tom
   Kristensen are thanked for their contributions to signaling related
   to parallel processing.  Magnus Westerlund, Jonathan Lennox, Bernard
   Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg, Sachin
   Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, Danny Hong, Bo
   Burman, Ben Campbell, Brian Carpenter, Qin Wu, Stephen Farrell, and
   Min Wang made valuable review comments that led to improvements.

Authors' Addresses

   Ye-Kui Wang
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego, CA 92121
   United States
   Phone: +1-858-651-8345
   Email: yekui.wang@gmail.com

   Yago Sanchez
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany
   Phone: +49 30 31002-663
   Email: yago.sanchez@hhi.fraunhofer.de

   Thomas Schierl
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany
   Phone: +49-30-31002-227
   Email: thomas.schierl@hhi.fraunhofer.de

   Stephan Wenger
   Vidyo, Inc.
   433 Hackensack Ave., 7th floor
   Hackensack, NJ 07601
   United States
   Phone: +1-415-713-5473
   Email: stewe@stewe.org

   Miska M. Hannuksela
   Nokia Corporation
   P.O. Box 1000
   33721 Tampere
   Finland
   Phone: +358-7180-08000
   Email: miska.hannuksela@nokia.com