RFC 7798

RTP Payload Format for High Efficiency Video Coding (HEVC)

Pages: 86
Proposed Standard

Part 2 of 4 – Pages 20 to 42

RFC7798 - Page 20

4.  RTP Payload Format

4.1.  RTP Header Usage

   The format of the RTP header is specified in [RFC3550] (reprinted as
   Figure 2 for convenience).  This payload format uses the fields of
   the header in a manner consistent with that specification.

   The RTP payload (and the settings for some RTP header bits) for
   aggregation packets and fragmentation units are specified in Sections
   4.4.2 and 4.4.3, respectively.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 2: RTP Header According to [RFC3550]

   The RTP header information to be set according to this RTP payload
   format is set as follows:

   Marker bit (M): 1 bit

      Set for the last packet of the access unit, carried in the current
      RTP stream.  This is in line with the normal use of the M bit in
      video formats to allow efficient playout buffer handling.  When
      MRST or MRMT is in use, if an access unit appears in multiple RTP
      streams, the marker bit is set on each RTP stream's last packet of
      the access unit.

         Informative note: The content of a NAL unit does not tell
         whether or not the NAL unit is the last NAL unit, in decoding
         order, of an access unit.  An RTP sender implementation may
         obtain this information from the video encoder.  If, however,
         the implementation cannot obtain this information directly from
         the encoder, e.g., when the bitstream was pre-encoded, and also
         there is no timestamp allocated for each NAL unit, then the
         sender implementation can inspect subsequent NAL units in
         decoding order to determine whether or not the NAL unit is the
         last NAL unit of an access unit as follows.  A NAL unit is
         determined to be the last NAL unit of an access unit if it is
         the last NAL unit of the bitstream.  A NAL unit naluX is also
         determined to be the last NAL unit of an access unit if both
         the following conditions are true: 1) the next VCL NAL unit
         naluY in decoding order has the high-order bit of the first
         byte after its NAL unit header equal to 1, and 2) all NAL units
         between naluX and naluY, when present, have nal_unit_type in
         the range of 32 to 35, inclusive, equal to 39, or in the ranges
         of 41 to 44, inclusive, or 48 to 55, inclusive.

   Payload Type (PT): 7 bits

      The assignment of an RTP payload type for this new packet format
      is outside the scope of this document and will not be specified
      here.  The assignment of a payload type has to be performed either
      through the profile used or in a dynamic way.

         Informative note: It is not required to use different payload
         type values for different RTP streams in MRST or MRMT.

   Sequence Number (SN): 16 bits

      Set and used in accordance with [RFC3550].

   Timestamp: 32 bits

      The RTP timestamp is set to the sampling timestamp of the content.
      A 90 kHz clock rate MUST be used.

      If the NAL unit has no timing properties of its own (e.g.,
      parameter set and SEI NAL units), the RTP timestamp MUST be set to
      the RTP timestamp of the coded picture of the access unit in which
      the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is
      included.

      Receivers MUST use the RTP timestamp for the display process, even
      when the bitstream contains picture timing SEI messages or
      decoding unit information SEI messages as specified in [HEVC].
      However, this does not mean that picture timing SEI messages in
      the bitstream should be discarded, as picture timing SEI messages
      may contain frame-field information that is important in
      appropriately rendering interlaced video.

   Synchronization source (SSRC): 32 bits

      Used to identify the source of the RTP packets.  When using SRST,
      by definition a single SSRC is used for all parts of a single
      bitstream.  In MRST or MRMT, different SSRCs are used for each RTP
      stream containing a subset of the sub-layers of the single
      (temporally scalable) bitstream.  A receiver is required to
      correctly associate the set of SSRCs that carry parts of the
      same bitstream.
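
   The following Python sketch (informative, not part of this
   specification) illustrates two sender-side rules from the header
   field descriptions above: the Marker bit heuristic from the
   informative note and the 90 kHz RTP timestamp.  It assumes NAL units
   are available in decoding order as byte strings beginning with the
   2-byte NAL unit header.  The function names, and the choice to
   return False when no later VCL NAL unit exists, are assumptions of
   this sketch.

```python
from __future__ import annotations


def nal_type(nalu: bytes) -> int:
    # nal_unit_type occupies bits 1..6 of the first NAL unit header byte.
    return (nalu[0] >> 1) & 0x3F


def is_vcl(nalu: bytes) -> bool:
    # HEVC VCL NAL unit types are 0..31.
    return nal_type(nalu) < 32


def allowed_between(t: int) -> bool:
    # nal_unit_type values permitted between naluX and naluY (condition 2).
    return 32 <= t <= 35 or t == 39 or 41 <= t <= 44 or 48 <= t <= 55


def is_last_in_access_unit(nalus: list[bytes], i: int) -> bool:
    """Apply the informative heuristic to nalus[i] (decoding order)."""
    if i == len(nalus) - 1:
        return True  # last NAL unit of the bitstream
    for nalu in nalus[i + 1:]:
        if is_vcl(nalu):
            # Condition 1: the next VCL NAL unit (naluY) has the high-order
            # bit of the first byte after its NAL unit header equal to 1
            # (first_slice_segment_in_pic_flag of its slice segment header).
            return len(nalu) > 2 and bool(nalu[2] & 0x80)
        if not allowed_between(nal_type(nalu)):
            return False  # condition 2 fails
    return False  # assumption: no later VCL NAL unit, so not determined


def rtp_timestamp(seconds: float, base: int = 0) -> int:
    # 90 kHz RTP clock; base is the (random) initial offset, wrapped to 32 bits.
    return (base + round(seconds * 90000)) % (1 << 32)
```

   A sender would set the M bit on the packet that carries a NAL unit
   for which is_last_in_access_unit() returns True.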

4.2.  Payload Header Usage

   The first two bytes of the payload of an RTP packet are referred to
   as the payload header.  The payload header consists of the same
   fields (F, Type, LayerId, and TID) as the NAL unit header as shown in
   Section 1.1.4, irrespective of the type of the payload structure.

   The TID value indicates (among other things) the relative importance
   of an RTP packet, for example, because NAL units belonging to higher
   temporal sub-layers are not used for the decoding of lower temporal
   sub-layers.  A lower value of TID indicates a higher importance.
   More-important NAL units MAY be better protected against transmission
   losses than less-important NAL units.
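
   As an informative illustration, the payload header can be split into
   its four fields with plain bit operations.  The layout below (F: 1
   bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits, TID carrying
   nuh_temporal_id_plus1) is that of the HEVC NAL unit header; the
   class and function names are hypothetical.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    f: int         # F (forbidden_zero_bit position), 1 bit
    nal_type: int  # Type, 6 bits
    layer_id: int  # LayerId, 6 bits
    tid: int       # TID (nuh_temporal_id_plus1), 3 bits


def parse_payload_header(payload: bytes) -> PayloadHeader:
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-byte payload header")
    hdr = int.from_bytes(payload[:2], "big")  # network byte order
    return PayloadHeader(
        f=(hdr >> 15) & 0x1,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x7,
    )


def pack_payload_header(h: PayloadHeader) -> bytes:
    hdr = (h.f << 15) | (h.nal_type << 9) | (h.layer_id << 3) | h.tid
    return hdr.to_bytes(2, "big")
```

   A sender applying unequal loss protection could, for example, rank
   packets by the tid field, lower values first.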

4.3.  Transmission Modes

   This memo enables transmission of an HEVC bitstream over:

      o a Single RTP stream on a Single media Transport (SRST),

      o Multiple RTP streams over a Single media Transport (MRST), or

      o Multiple RTP streams on Multiple media Transports (MRMT).

      Informative note: While this specification enables the use of MRST
      within the H.265 RTP payload, the signaling of MRST within SDP
      offer/answer is not fully specified at the time of this writing.
      See [RFC5576] and [RFC5583] for what is supported today as well as
      [RTP-MULTI-STREAM] and [SDP-NEG] for future directions.

   When in MRMT, the dependency of one RTP stream on another RTP stream
   is typically indicated as specified in [RFC5583].  [RFC5583] can also
   be utilized to specify dependencies within MRST, but only if the RTP
   streams utilize distinct payload types.

   SRST or MRST SHOULD be used for point-to-point unicast scenarios,
   whereas MRMT SHOULD be used for point-to-multipoint multicast
   scenarios where different receivers require different operation
   points of the same HEVC bitstream, to improve bandwidth utilization
   efficiency.

      Informative note: A multicast may degrade to a unicast after all
      receivers but one have left (this is a justification of the first
      "SHOULD" instead of "MUST"), and there might be scenarios where
      MRMT is desirable but not possible, e.g., when IP multicast is not
      deployed in certain networks (this is a justification of the
      second "SHOULD" instead of "MUST").

   The transmission mode is indicated by the tx-mode media parameter
   (see Section 7.1).  If tx-mode is equal to "SRST", SRST MUST be used.
   Otherwise, if tx-mode is equal to "MRST", MRST MUST be used.
   Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used.

      Informative note: When an RTP stream does not depend on other RTP
      streams, any of SRST, MRST, or MRMT may be in use for the RTP
      stream.

   Receivers MUST support all of SRST, MRST, and MRMT.

      Informative note: The required support of MRMT by receivers does
      not imply that multicast must be supported by receivers.

4.4.  Payload Structures

   Four different types of RTP packet payload structures are specified.
   A receiver can identify the type of an RTP packet payload through the
   Type field in the payload header.

   The four different payload structures are as follows:

   o  Single NAL unit packet: Contains a single NAL unit in the payload,
      and the NAL unit header of the NAL unit also serves as the payload
      header.  This payload structure is specified in Section 4.4.1.

   o  Aggregation Packet (AP): Contains more than one NAL unit within
      one access unit.  This payload structure is specified in Section
      4.4.2.

   o  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
      This payload structure is specified in Section 4.4.3.

   o  PACI-carrying RTP packet: Contains a payload header (that differs
      from other payload headers for efficiency), a Payload Header
      Extension Structure (PHES), and a PACI payload.  This payload
      structure is specified in Section 4.4.4.
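
   A receiver dispatch on the Type field might look like the following
   informative Python sketch.  Types 48, 49, and 50 map to the AP, FU,
   and PACI structures listed above; treating 0..47 as single NAL unit
   packets and reporting 51..63 as "unknown" are assumptions of this
   sketch, and the function name is hypothetical.

```python
def payload_structure(payload: bytes) -> str:
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    t = (payload[0] >> 1) & 0x3F  # Type field of the payload header
    if t == 48:
        return "AP"
    if t == 49:
        return "FU"
    if t == 50:
        return "PACI"
    if t <= 47:
        return "single NAL unit"
    return "unknown"  # assumption: 51..63 are not used by this section
```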

4.4.1.  Single NAL Unit Packets

   A single NAL unit packet contains exactly one NAL unit, and consists
   of a payload header (denoted as PayloadHdr), a conditional 16-bit
   DONL field (in network byte order), and the NAL unit payload data
   (the NAL unit excluding its NAL unit header) of the contained NAL
   unit, as shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           PayloadHdr          |      DONL (conditional)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  NAL unit payload data                        |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 3: The Structure of a Single NAL Unit Packet

   The payload header SHOULD be an exact copy of the NAL unit header of
   the contained NAL unit.  However, the Type (i.e., nal_unit_type)
   field MAY be changed, e.g., when it is desirable to handle a CRA
   picture as a BLA picture [JCTVC-J0107].

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the contained NAL
   unit.  If sprop-max-don-diff is greater than 0 for any of the RTP
   streams, the DONL field MUST be present, and the variable DON for the
   contained NAL unit is derived as equal to the value of the DONL
   field.  Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
   streams), the DONL field MUST NOT be present.
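
   The following informative Python sketch builds and parses a single
   NAL unit packet.  It assumes the NAL unit is given with its 2-byte
   header and that the RTP header and any padding have already been
   removed on receipt; the function names and the with_donl flag
   (standing for "sprop-max-don-diff > 0 for any RTP stream") are
   hypothetical.

```python
from typing import Iterable, Optional, Tuple


def donl_present(sprop_max_don_diffs: Iterable[int]) -> bool:
    # DONL is present iff sprop-max-don-diff > 0 for any of the RTP streams.
    return any(d > 0 for d in sprop_max_don_diffs)


def build_single_nal_packet(nalu: bytes, with_donl: bool, don: int = 0) -> bytes:
    if len(nalu) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    payload_hdr, nal_payload = nalu[:2], nalu[2:]  # PayloadHdr copies the NAL header
    donl = (don & 0xFFFF).to_bytes(2, "big") if with_donl else b""  # 16 LSBs of DON
    return payload_hdr + donl + nal_payload


def parse_single_nal_packet(payload: bytes, with_donl: bool) -> Tuple[bytes, Optional[int]]:
    if not with_donl:
        return bytes(payload), None
    if len(payload) < 4:
        raise ValueError("packet too short to hold PayloadHdr and DONL")
    don = int.from_bytes(payload[2:4], "big")  # DON is derived as equal to DONL
    return bytes(payload[:2]) + bytes(payload[4:]), don
```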

4.4.2.  Aggregation Packets (APs)

   Aggregation Packets (APs) are introduced to enable the reduction of
   packetization overhead for small NAL units, such as most of the non-
   VCL NAL units, which are often only a few octets in size.

   An AP aggregates NAL units within one access unit.  Each NAL unit to
   be carried in an AP is encapsulated in an aggregation unit.  NAL
   units aggregated in one AP are in NAL unit decoding order.

   An AP consists of a payload header (denoted as PayloadHdr) followed
   by two or more aggregation units, as shown in Figure 4.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=48)       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |             two or more aggregation units                     |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 4: The Structure of an Aggregation Packet

   The fields in the payload header are set as follows.  The F bit MUST
   be equal to 0 if the F bit of each aggregated NAL unit is equal to
   zero; otherwise, it MUST be equal to 1.  The Type field MUST be equal
   to 48.  The value of LayerId MUST be equal to the lowest value of
   LayerId of all the aggregated NAL units.  The value of TID MUST be
   the lowest value of TID of all the aggregated NAL units.

      Informative note: All VCL NAL units in an AP have the same TID
      value since they belong to the same access unit.  However, an AP
      may contain non-VCL NAL units for which the TID value in the NAL
      unit header may be different than the TID value of the VCL NAL
      units in the same AP.

   An AP MUST carry at least two aggregation units and can carry as many
   aggregation units as necessary; however, the total amount of data in
   an AP obviously MUST fit into an IP packet, and the size SHOULD be
   chosen so that the resulting IP packet is smaller than the MTU size
   so as to avoid IP layer fragmentation.  An AP MUST NOT contain FUs
   specified in Section 4.4.3.  APs MUST NOT be nested; i.e., an AP must
   not contain another AP.
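
   An AP without DONL and DOND fields could be assembled as in the
   following informative Python sketch.  It derives the AP payload
   header as described above (F set if any aggregated F is set, lowest
   LayerId, lowest TID) and applies the size rules.  The max_payload
   budget (for example, the path MTU minus IP, UDP, and RTP header
   sizes) and all function names are assumptions of this sketch.

```python
from typing import List, Tuple

AP_TYPE = 48


def nal_fields(nalu: bytes) -> Tuple[int, int, int, int]:
    # F, Type, LayerId, TID from the 2-byte NAL unit header.
    f = nalu[0] >> 7
    t = (nalu[0] >> 1) & 0x3F
    layer_id = ((nalu[0] & 0x01) << 5) | (nalu[1] >> 3)
    tid = nalu[1] & 0x07
    return f, t, layer_id, tid


def ap_payload_header(nalus: List[bytes]) -> bytes:
    fields = [nal_fields(n) for n in nalus]
    f = 1 if any(x[0] for x in fields) else 0  # 0 only if every F is 0
    layer_id = min(x[2] for x in fields)       # lowest LayerId
    tid = min(x[3] for x in fields)            # lowest TID
    return ((f << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid).to_bytes(2, "big")


def build_ap(nalus: List[bytes], max_payload: int) -> bytes:
    if len(nalus) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    out = bytearray(ap_payload_header(nalus))
    for n in nalus:
        if len(n) < 2 or len(n) > 0xFFFF:
            raise ValueError("NAL unit size out of range for the 16-bit size field")
        if nal_fields(n)[1] in (48, 49, 50):
            # No nested APs, no FUs, and (per Section 4.4.4) no PACIs.
            raise ValueError("an AP must not contain APs, FUs, or PACIs")
        out += len(n).to_bytes(2, "big")  # NALU size, includes the NAL unit header
        out += n                          # the NAL unit itself, with its header
    if len(out) > max_payload:
        raise ValueError("AP exceeds the payload budget (IP fragmentation risk)")
    return bytes(out)
```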

   The first aggregation unit in an AP consists of a conditional 16-bit
   DONL field (in network byte order) followed by a 16-bit unsigned size
   information (in network byte order) that indicates the size of the
   NAL unit in bytes (excluding these two octets, but including the NAL
   unit header), followed by the NAL unit itself, including its NAL unit
   header, as shown in Figure 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   :       DONL (conditional)      |   NALU size   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   NALU size   |                                               |
   +-+-+-+-+-+-+-+-+         NAL unit                              |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 5: The Structure of the First Aggregation Unit in an AP

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the aggregated NAL
   unit.

   If sprop-max-don-diff is greater than 0 for any of the RTP streams,
   the DONL field MUST be present in an aggregation unit that is the
   first aggregation unit in an AP, and the variable DON for the
   aggregated NAL unit is derived as equal to the value of the DONL
   field.  Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
   streams), the DONL field MUST NOT be present in an aggregation unit
   that is the first aggregation unit in an AP.

   An aggregation unit that is not the first aggregation unit in an AP
   consists of a conditional 8-bit DOND field followed by a 16-bit
   unsigned size information (in network byte order) that indicates the
   size of the NAL unit in bytes (excluding these two octets, but
   including the NAL unit header), followed by the NAL unit itself,
   including its NAL unit header, as shown in Figure 6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   : DOND (cond)   |          NALU size            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       NAL unit                                |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6: The Structure of an Aggregation Unit That Is Not the
   First Aggregation Unit in an AP

   When present, the DOND field plus 1 specifies the difference between
   the decoding order number values of the current aggregated NAL unit
   and the preceding aggregated NAL unit in the same AP.

   If sprop-max-don-diff is greater than 0 for any of the RTP streams,
   the DOND field MUST be present in an aggregation unit that is not the
   first aggregation unit in an AP, and the variable DON for the
   aggregated NAL unit is derived as equal to the DON of the preceding
   aggregated NAL unit in the same AP plus the value of the DOND field
   plus 1 modulo 65536.  Otherwise (sprop-max-don-diff is equal to 0 for
   all the RTP streams), the DOND field MUST NOT be present in an
   aggregation unit that is not the first aggregation unit in an AP, and
   in this case the transmission order and decoding order of NAL units
   carried in the AP are the same as the order the NAL units appear in
   the AP.
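
   The DONL/DOND rules above can be illustrated with the following
   informative Python sketch of an AP parser.  It assumes the RTP
   header and padding are already stripped and that with_don reflects
   whether sprop-max-don-diff is greater than 0 for any RTP stream; the
   function name is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_ap(payload: bytes, with_don: bool) -> List[Tuple[Optional[int], bytes]]:
    """Split an AP into (DON or None, NAL unit) pairs in AP order."""
    if len(payload) < 2 or (payload[0] >> 1) & 0x3F != 48:
        raise ValueError("not an aggregation packet")
    pos = 2  # skip the AP PayloadHdr
    units: List[Tuple[Optional[int], bytes]] = []
    don = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(payload):
            raise ValueError("truncated aggregation unit")
        chunk = bytes(payload[pos:pos + n])
        pos += n
        return chunk

    while pos < len(payload):
        if with_don:
            if not units:
                don = int.from_bytes(take(2), "big")  # first unit: DON = DONL
            else:
                don = (don + take(1)[0] + 1) % 65536  # DON = prev + DOND + 1 (mod 2^16)
        size = int.from_bytes(take(2), "big")  # NALU size, includes the NAL header
        if size < 2:
            raise ValueError("NAL unit shorter than its header")
        units.append((don if with_don else None, take(size)))
    if len(units) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    return units
```

   Without DON fields, the list order is both the transmission order
   and the decoding order of the aggregated NAL units.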

   Figure 7 presents an example of an AP that contains two aggregation
   units, labeled as 1 and 2 in the figure, without the DONL and DOND
   fields being present.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PayloadHdr (Type=48)        |         NALU 1 Size           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 1 HDR           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
   |                   . . .                                       |
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 2 HDR    |                                               |
   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
   |                   . . .                                       |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7: An Example of an AP Packet Containing Two Aggregation
   Units without the DONL and DOND Fields

   Figure 8 presents an example of an AP that contains two aggregation
   units, labeled as 1 and 2 in the figure, with the DONL and DOND
   fields being present.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 1 Size          |            NALU 1 HDR         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 NALU 1 Data   . . .                           |
   |                                                               |
   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |  NALU 2 DOND  |          NALU 2 Size          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 2 HDR           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
   |                                                               |
   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: An Example of an AP Containing Two Aggregation Units
   with the DONL and DOND Fields

4.4.3.  Fragmentation Units

   Fragmentation Units (FUs) are introduced to enable fragmenting a
   single NAL unit into multiple RTP packets, possibly without
   cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
   unit consists of an integer number of consecutive octets of that NAL
   unit.  Fragments of the same NAL unit MUST be sent in consecutive
   order with ascending RTP sequence numbers (with no other RTP packets
   within the same RTP stream being sent between the first and last
   fragment).

   When a NAL unit is fragmented and conveyed within FUs, it is referred
   to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
   NOT be nested; i.e., an FU must not contain a subset of another FU.

   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
   time of the fragmented NAL unit.

   An FU consists of a payload header (denoted as PayloadHdr), an FU
   header of one octet, a conditional 16-bit DONL field (in network byte
   order), and an FU payload, as shown in Figure 9.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   | DONL (cond)   |                                               |
   |-+-+-+-+-+-+-+-+                                               |
   |                         FU payload                            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 9: The Structure of an FU

   The fields in the payload header are set as follows.  The Type field
   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal to
   the fields F, LayerId, and TID, respectively, of the fragmented NAL
   unit.

   The FU header consists of an S bit, an E bit, and a 6-bit FuType
   field, as shown in Figure 10.

   +---------------+
   |0|1|2|3|4|5|6|7|
   +-+-+-+-+-+-+-+-+
   |S|E|  FuType   |
   +---------------+

   Figure 10: The Structure of FU Header

   The semantics of the FU header fields are as follows:

   S: 1 bit
      When set to 1, the S bit indicates the start of a fragmented NAL
      unit, i.e., the first byte of the FU payload is also the first
      byte of the payload of the fragmented NAL unit.  When the FU
      payload is not the start of the fragmented NAL unit payload, the S
      bit MUST be set to 0.

   E: 1 bit
      When set to 1, the E bit indicates the end of a fragmented NAL
      unit, i.e., the last byte of the payload is also the last byte of
      the fragmented NAL unit.  When the FU payload is not the last
      fragment of a fragmented NAL unit, the E bit MUST be set to 0.

   FuType: 6 bits
      The field FuType MUST be equal to the field Type of the fragmented
      NAL unit.

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the fragmented NAL
   unit.

   If sprop-max-don-diff is greater than 0 for any of the RTP streams,
   and the S bit is equal to 1, the DONL field MUST be present in the
   FU, and the variable DON for the fragmented NAL unit is derived as
   equal to the value of the DONL field.  Otherwise (sprop-max-don-diff
   is equal to 0 for all the RTP streams, or the S bit is equal to 0),
   the DONL field MUST NOT be present in the FU.

   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
   the Start bit and End bit must not both be set to 1 in the same FU
   header.

   The FU payload consists of fragments of the payload of the fragmented
   NAL unit so that if the FU payloads of consecutive FUs, starting with
   an FU with the S bit equal to 1 and ending with an FU with the E bit
   equal to 1, are sequentially concatenated, the payload of the
   fragmented NAL unit can be reconstructed.  The NAL unit header of the
   fragmented NAL unit is not included as such in the FU payload, but
   rather the information of the NAL unit header of the fragmented NAL
   unit is conveyed in the F, LayerId, and TID fields of the payload
   headers of the FUs and in the FuType field of their FU headers.
   An FU payload MUST NOT be empty.
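
   A sender-side fragmenter following these rules might look like the
   informative Python sketch below.  The max_fu_payload size and the
   function name are assumptions; the optional don argument stands for
   the case where DONL is present, in which it is written only into the
   FU with the S bit set.  The returned FUs are meant to be sent
   back-to-back with ascending RTP sequence numbers.

```python
from typing import List, Optional

FU_TYPE = 49


def fragment_nal_unit(nalu: bytes, max_fu_payload: int,
                      don: Optional[int] = None) -> List[bytes]:
    if max_fu_payload < 1:
        raise ValueError("FU payload must not be empty")
    hdr0, hdr1 = nalu[0], nalu[1]
    nal_type = (hdr0 >> 1) & 0x3F
    body = nalu[2:]  # the NAL unit header is not carried in the FU payload
    if len(body) <= max_fu_payload:
        # One FU would need S = E = 1, which is forbidden.
        raise ValueError("NAL unit fits in one packet; do not use an FU")
    # PayloadHdr: keep F and the LayerId MSB from byte 0, set Type = 49,
    # copy byte 1 (LayerId LSBs and TID) unchanged.
    payload_hdr = bytes([(hdr0 & 0x81) | (FU_TYPE << 1), hdr1])
    fus = []
    for off in range(0, len(body), max_fu_payload):
        chunk = body[off:off + max_fu_payload]
        start = off == 0
        end = off + max_fu_payload >= len(body)
        fu_hdr = (int(start) << 7) | (int(end) << 6) | nal_type  # S | E | FuType
        donl = b""
        if start and don is not None:
            donl = (don & 0xFFFF).to_bytes(2, "big")  # DONL only when S = 1
        fus.append(payload_hdr + bytes([fu_hdr]) + donl + chunk)
    return fus
```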

   If an FU is lost, the receiver SHOULD discard all following
   fragmentation units in transmission order corresponding to the same
   fragmented NAL unit, unless the decoder in the receiver is known to
   be prepared to gracefully handle incomplete NAL units.

   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
   n of that NAL unit is not received.  In this case, the
   forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
   syntax violation.
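
   The receiver behavior above can be sketched in Python as follows
   (informative).  The class is hypothetical and assumes that only FUs
   of one RTP stream are fed, in RTP sequence number order, with DONL
   absent and the RTP header and padding removed.  By default a partial
   NAL unit is discarded, following the SHOULD above; with
   tolerate_incomplete=True it instead emits the received fragments
   with forbidden_zero_bit set to 1, as the MAY above permits.

```python
from typing import Optional


class FuReassembler:
    def __init__(self, tolerate_incomplete: bool = False) -> None:
        self.tolerate_incomplete = tolerate_incomplete
        self.buf: Optional[bytearray] = None
        self.expected_seq: Optional[int] = None

    def _flush_incomplete(self) -> Optional[bytes]:
        buf, self.buf = self.buf, None
        if buf is None or not self.tolerate_incomplete:
            return None  # default: discard the partial NAL unit
        buf[0] |= 0x80   # forbidden_zero_bit = 1 flags the syntax violation
        return bytes(buf)

    def push(self, seq: int, payload: bytes) -> Optional[bytes]:
        """Feed one FU; return a NAL unit when one completes or is flushed."""
        fu_hdr = payload[2]
        start, end, fu_type = fu_hdr >> 7, (fu_hdr >> 6) & 1, fu_hdr & 0x3F
        result = None
        if self.buf is not None and seq != self.expected_seq:
            result = self._flush_incomplete()  # an FU was lost
        if start:
            # Rebuild the NAL unit header: F and LayerId MSB from PayloadHdr,
            # Type from FuType, byte 1 (LayerId LSBs, TID) copied as-is.
            self.buf = bytearray([(payload[0] & 0x81) | (fu_type << 1), payload[1]])
        elif self.buf is None:
            return result  # start fragment missing: discard this FU
        self.buf += payload[3:]
        self.expected_seq = (seq + 1) & 0xFFFF
        if end:
            nalu, self.buf = bytes(self.buf), None
            return nalu
        return result
```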

4.4.4.  PACI Packets

   This section specifies the PACI packet structure.  The basic payload
   header specified in this memo is intentionally limited to the 16 bits
   of the NAL unit header so as to keep the packetization overhead to a
   minimum.  However, cases have been identified where it is advisable
   to include control information in an easily accessible position in
   the packet header, despite the additional overhead.  One such piece
   of control information is the TSCI as specified in Section 4.5.  PACI
   packets carry this and future, similar structures.

   The PACI packet structure is based on a payload header extension
   mechanism that is generic and extensible to carry payload header
   extensions.  In this section, the focus lies on the use within this
   specification.  Section 4.4.4.2 provides guidance for
   specification designers on how to employ the extension mechanism in
   future specifications.

   A PACI packet consists of a payload header (denoted as PayloadHdr),
   for which the structure follows what is described in Section 4.2.
   The payload header is followed by the fields A, cType, PHSsize,
   F[0..2], and Y.

   Figure 11 shows a PACI packet in compliance with this memo, i.e.,
   without any extensions.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Payload Header Extension Structure (PHES)              |
   |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
   |                                                               |
   |                  PACI payload: NAL unit                       |
   |                   . . .                                       |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 11: The Structure of a PACI

   The fields in the payload header are set as follows.  The F bit MUST
   be equal to 0.  The Type field MUST be equal to 50.  The value of
   LayerId MUST be a copy of the LayerId field of the PACI payload NAL
   unit or NAL-unit-like structure.  The value of TID MUST be a copy of
   the TID field of the PACI payload NAL unit or NAL-unit-like
   structure.

   The semantics of other fields are as follows:

   A: 1 bit
      Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
      structure.

   cType: 6 bits
      Copy of the Type field of the PACI payload NAL unit or NAL-unit-
      like structure.

   PHSsize: 5 bits
      Indicates the length of the PHES field.  The value is limited to
      be less than or equal to 32 octets, to simplify encoder design for
      MTU size matching.

   F0:
      This field equal to 1 specifies the presence of a temporal
      scalability support extension in the PHES.

   F1, F2:
      MUST be 0, available for future extensions, see Section 4.4.4.2.
      Receivers compliant with this version of the HEVC payload format
      MUST ignore F1=1 and/or F2=1, and also ignore any information in
      the PHES indicated as present by F1=1 and/or F2=1.

         Informative note: The receiver can do that by first decoding
         information associated with F0=1, and then skipping over any
         remaining bytes of the PHES based on the value of PHSsize.

   Y: 1 bit
      MUST be 0, available for future extensions, see Section 4.4.4.2.
      Receivers compliant with this version of the HEVC payload format
      MUST ignore Y=1, and also ignore any information in the PHES
      indicated as present by Y.

   PHES: variable number of octets
      A variable number of octets as indicated by the value of PHSsize.

   PACI Payload:
      The single NAL unit packet or NAL-unit-like structure (such as an
      FU or an AP) to be carried, not including the first two octets.

         Informative note: The first two octets of the NAL unit or NAL-
         unit-like structure carried in the PACI payload are not
         included in the PACI payload.  Rather, the respective values
         are copied in locations of the PayloadHdr of the RTP packet.
         This design offers two advantages: first, the overall structure
         of the payload header is preserved, i.e., there is no special
         case of payload header structure that needs to be implemented
         for PACI.  Second, no additional overhead is introduced.

      A PACI payload MAY be a single NAL unit, an FU, or an AP.  PACIs
      MUST NOT be fragmented or aggregated.  The following subsection
      documents the reasons for these design choices.
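
   The following informative Python sketch parses a PACI and restores
   the first two octets of the carried structure from A, cType, and the
   LayerId and TID fields of the PACI PayloadHdr.  It returns the PHES
   as raw bytes and skips it using PHSsize, so extensions signaled by
   F1, F2, or Y are ignored as required above.  The names are
   hypothetical, and the RTP header and padding are assumed to be
   already removed.

```python
from typing import NamedTuple


class Paci(NamedTuple):
    a: int
    c_type: int
    phs_size: int
    f0: int
    f1: int
    f2: int
    y: int
    phes: bytes
    carried: bytes  # PACI payload with its 2-byte header restored


def parse_paci(payload: bytes) -> Paci:
    if len(payload) < 4 or (payload[0] >> 1) & 0x3F != 50:
        raise ValueError("not a PACI packet")
    ext = int.from_bytes(payload[2:4], "big")  # A | cType | PHSsize | F0..F2 | Y
    a = ext >> 15
    c_type = (ext >> 9) & 0x3F
    phs_size = (ext >> 4) & 0x1F
    f0, f1, f2 = (ext >> 3) & 1, (ext >> 2) & 1, (ext >> 1) & 1
    y = ext & 1
    phes_end = 4 + phs_size
    if len(payload) < phes_end:
        raise ValueError("truncated PHES")
    # F comes from A, Type from cType; the LayerId MSB (bit 0 of byte 0)
    # and byte 1 (LayerId LSBs, TID) are copied from the PACI PayloadHdr.
    hdr0 = (a << 7) | (c_type << 1) | (payload[0] & 0x01)
    carried = bytes([hdr0, payload[1]]) + bytes(payload[phes_end:])
    return Paci(a, c_type, phs_size, f0, f1, f2, y,
                bytes(payload[4:phes_end]), carried)
```

   The carried bytes can then be handed to the same dispatch used for
   packets without a PACI, since they may be a single NAL unit, an FU,
   or an AP.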

4.4.4.1.  Reasons for the PACI Rules (Informative)

   A PACI cannot be fragmented.  If a PACI could be fragmented, and a
   fragment other than the first fragment got lost, access to the
   information in the PACI would not be possible.  Therefore, a PACI
   must not be fragmented.  In other words, an FU must not carry
   (fragments of) a PACI.

   A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
   from a compression viewpoint, as, in many cases, several NAL units
   to be aggregated would share identical PACI fields and values,
   which would be carried redundantly for no reason.  Most, if not all,
   of the practical effects of PACI aggregation can be achieved by
   aggregating NAL units and bundling them with a PACI (see below).
   Therefore, a PACI must not be aggregated.  In other words, an AP must
   not contain a PACI.

   The payload of a PACI can be a fragment.  Both middleboxes and
   sending systems with inflexible (often hardware-based) encoders
   occasionally find themselves in situations where a NAL unit and its
   PACI headers, combined, are larger than the MTU size.  In such a
   scenario, the middlebox or sender can fragment the NAL unit and
   encapsulate the fragment in a PACI.  Doing so preserves the payload
   header extension information for all fragments, allowing downstream
   middleboxes and the receiver to take advantage of that information.
   Therefore, a sender may place a fragment into a PACI, and a receiver
   must be able to handle such a PACI.

   The payload of a PACI can be an aggregation NAL unit.  HEVC
   bitstreams can contain unevenly sized and/or small (when compared to
   the MTU size) NAL units.  In order to efficiently packetize such
   small NAL units, APs were introduced.  The benefits of APs are
   independent of the need for a payload header extension.  Therefore,
   a sender may place an AP into a PACI, and a receiver must be able to
   handle such a PACI.

4.4.4.2.  PACI Extensions (Informative)

   This section includes recommendations for future specification
   designers on how to extend the PACI syntax to accommodate future
   extensions.  Obviously, designers are free to specify whatever
   appears to be appropriate to them at the time of their design.
   However, a lot of thought has been invested into the extension
   mechanism described below, and we suggest that deviations from it
   warrant a good explanation.

   This memo defines only a single payload header extension (TSCI,
   described in Section 4.5); therefore, only the F0 bit carries
   semantics.  F1 and F2 are already named (and not just marked as
   reserved, as a typical video spec designer would do).  They are
   intended to signal two additional extensions.  The Y bit allows one
   to, recursively, add further F and Y bits to extend the mechanism
   beyond three possible payload header extensions.  It is suggested to
   define a new packet type (using a different value for Type) when
   assigning the F1, F2, or Y bits semantics different from those
   suggested below.

   When a Y bit is set, an 8-bit flag-extension is inserted after the Y
   bit.  A flag-extension consists of 7 flags F[n..n+6], and another Y
   bit.

   The basic PACI header already includes F0, F1, and F2.  Therefore,
   the Fx bits in the first flag-extension are numbered F3, F4, ...,
   F9; the F bits in the second flag-extension are numbered F10, F11,
   ..., F16, and so forth.  As a result, at least three Fx bits are
   always in the PACI, but the number of Fx bits (and associated types
   of extensions) can be increased by setting the next Y bit and adding
   an octet of flag-extensions, carrying seven flags and another Y bit.
   The size of this list of flags is subject to the limits specified in
   Section 4.4.4 (32 octets for all flag-extensions and the PHES
   information combined).
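   A minimal sketch of walking the flag-extension chain follows.  The
   text above does not draw the flag-extension octet, so the sketch
   assumes that each one carries F[n]..F[n+6] in its seven most
   significant bits and the next Y bit in its least significant bit,
   mirroring the F0..2 and Y layout of the basic header; `parse_flags`
   is a hypothetical name.

```python
def parse_flags(base_flags, y, ext):
    """Collect F0, F1, F2 plus every flag-extension announced by Y bits.

    base_flags: (F0, F1, F2) from the basic PACI header; y: its Y bit;
    ext: the octets following that header.  Returns (flags, used), where
    flags[n] is Fn and used is the number of flag-extension octets read.
    """
    flags = list(base_flags)   # F0..F2 from the basic PACI header
    used = 0
    while y:                   # each set Y bit announces one more octet
        if used >= len(ext):
            raise ValueError("Y bit set but flag-extension octet missing")
        octet = ext[used]
        used += 1
        # Assumed bit order: F[n]..F[n+6] in bits 7..1, the next Y in bit 0.
        flags.extend((octet >> shift) & 1 for shift in range(7, 0, -1))
        y = octet & 1
    return flags, used
```

   With two flag-extensions, the returned list covers F0 through F16,
   matching the numbering described above.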

   Each of the F bits can indicate either the presence or the absence of
   certain information in the Payload Header Extension Structure (PHES).

   When a spec developer devises a new syntax that takes advantage of
   the PACI extension mechanism, he/she must follow the constraints
   listed below; otherwise, the extension mechanism may break.

      1) The fields added for a particular Fx bit MUST be fixed in
         length and not depend on what other Fx bits are set (no parsing
         dependency).

      2) The Fx bits must be assigned in order.

      3) An implementation that supports the n-th Fn bit for any value
         of n must understand the syntax (though not necessarily the
         semantics) of the fields Fk (with k < n), so as to be able to
         either use those bits when present, or at least be able to skip
         over them.
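   Under constraints 1) through 3), locating a field in the PHES reduces
   to a running sum of fixed lengths.  The sketch below assumes that
   PHES fields appear in increasing Fx order (one reading of
   constraint 2); its length table is hypothetical, apart from the
   3-octet TSCI signaled by F0 (Section 4.5).

```python
# Hypothetical length table: PHES octets contributed by each Fx bit.
# Only F0 (TSCI, 3 octets) is defined by this memo.
KNOWN_FIELD_LENGTHS = [3]


def phes_offsets(flags, lengths=KNOWN_FIELD_LENGTHS):
    """Map each set, understood Fx bit to its (offset, length) in the PHES.

    Fixed lengths with no parsing dependency (constraint 1) let the offset
    be a plain running sum over the lower-numbered fields that are present.
    """
    offsets = {}
    pos = 0
    for n, flag in enumerate(flags):
        if n >= len(lengths):
            break              # fields of higher, unknown Fx bits are not parsed
        if flag:
            offsets[n] = (pos, lengths[n])
            pos += lengths[n]
    return offsets
```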

4.5.  Temporal Scalability Control Information

   This section describes the single payload header extension defined in
   this specification, known as TSCI.  If, in the future, additional
   payload header extensions become necessary, they could be specified
   in this section of an updated version of this document, or in their
   own documents.

   When F0 is set to 1 in a PACI, this specifies that the PHES field
   includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                           ....                                |
   |               PACI payload: NAL unit                          |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 12: The Structure of a PACI with a PHES Containing a TSCI

   TL0PICIDX (8 bits)
      When present, the TL0PICIDX field MUST be set equal to
      temporal_sub_layer_zero_idx as specified in Section D.3.22 of
      [HEVC] for the access unit containing the NAL unit in the PACI.

   IrapPicID (8 bits)
      When present, the IrapPicID field MUST be set equal to
      irap_pic_id as specified in Section D.3.22 of [HEVC] for the
      access unit containing the NAL unit in the PACI.

   S (1 bit)
      The S bit MUST be set to 1 if any of the following conditions is
      true and MUST be set to 0 otherwise:

      o  The NAL unit in the payload of the PACI is the first VCL NAL
         unit, in decoding order, of a picture.

      o  The NAL unit in the payload of the PACI is an AP, and the NAL
         unit in the first contained aggregation unit is the first VCL
         NAL unit, in decoding order, of a picture.

      o  The NAL unit in the payload of the PACI is an FU with its S bit
         equal to 1 and the FU payload containing a fragment of the
         first VCL NAL unit, in decoding order, of a picture.

   E (1 bit)
      The E bit MUST be set to 1 if any of the following conditions is
      true and MUST be set to 0 otherwise:

      o  The NAL unit in the payload of the PACI is the last VCL NAL
         unit, in decoding order, of a picture.

      o  The NAL unit in the payload of the PACI is an AP and the NAL
         unit in the last contained aggregation unit is the last VCL NAL
         unit, in decoding order, of a picture.

      o  The NAL unit in the payload of the PACI is an FU with its E bit
         equal to 1 and the FU payload containing a fragment of the last
         VCL NAL unit, in decoding order, of a picture.

   RES (6 bits)
      MUST be equal to 0.  Reserved for future extensions.

   The value of PHSsize MUST be set to 3.  Receivers MUST allow other
   values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
   fields present in the PHES beyond those specified above.
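   The TSCI octet layout and the S and E rules above can be sketched as
   follows.  `Tsci`, `pack_tsci`, `parse_tsci`, and `tsci_s_e` are
   hypothetical names.  The sketch assumes the TSCI starts at the first
   PHES octet when F0 is 1 (it is the only extension defined here), and
   that the caller already knows whether each NAL unit is the first or
   last VCL NAL unit of its picture in decoding order.

```python
from dataclasses import dataclass


@dataclass
class Tsci:
    tl0picidx: int    # temporal_sub_layer_zero_idx of the access unit
    irap_pic_id: int  # irap_pic_id of the access unit
    s: int
    e: int


def pack_tsci(t: Tsci) -> bytes:
    # Third octet: S in bit 7, E in bit 6, RES (six bits) set to 0.
    return bytes([t.tl0picidx & 0xFF, t.irap_pic_id & 0xFF,
                  (t.s & 1) << 7 | (t.e & 1) << 6])


def parse_tsci(phes: bytes) -> Tsci:
    if len(phes) < 3:
        raise ValueError("PHES too short for TSCI")
    # Only the first three octets are read; RES and any further PHES
    # octets are ignored, as receivers are required to do.
    return Tsci(phes[0], phes[1], phes[2] >> 7, (phes[2] >> 6) & 1)


def tsci_s_e(kind, first_vcl, last_vcl, fu_s=False, fu_e=False):
    """Derive (S, E) for a PACI payload of the given kind.

    kind: "single", "ap", or "fu".  For "single" and "fu", first_vcl and
    last_vcl are booleans about the (fragmented) NAL unit; for "ap", they
    are lists with one entry per aggregation unit, in order.
    """
    if kind == "single":
        return int(first_vcl), int(last_vcl)
    if kind == "ap":
        return int(first_vcl[0]), int(last_vcl[-1])
    if kind == "fu":
        return int(fu_s and first_vcl), int(fu_e and last_vcl)
    raise ValueError("unknown payload kind: " + str(kind))
```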

4.6.  Decoding Order Number

   For each NAL unit, the variable AbsDon is derived, representing the
   decoding order number that is indicative of the NAL unit decoding
   order.

   Let NAL unit n be the n-th NAL unit in transmission order within an
   RTP stream.

   If sprop-max-don-diff is equal to 0 for all the RTP streams carrying
   the HEVC bitstream, AbsDon[n], the value of AbsDon for NAL unit n, is
   derived as equal to n.

   Otherwise (sprop-max-don-diff is greater than 0 for any of the RTP
   streams), AbsDon[n] is derived as follows, where DON[n] is the value
   of the variable DON for NAL unit n:

   o  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
      transmission order), AbsDon[0] is set equal to DON[0].

   o  Otherwise (n is greater than 0), the following applies for
      derivation of AbsDon[n]:

      If DON[n] == DON[n-1],
          AbsDon[n] = AbsDon[n-1]

      If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
          AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

      If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
          AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

      If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
          AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
          DON[n])

      If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
          AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
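   The derivation above translates directly into code.  In this Python
   sketch (the function name is hypothetical), `dons` holds the 16-bit
   DON values of the NAL units in transmission order, and `all_zero`
   stands for the case where sprop-max-don-diff is 0 for all RTP
   streams.

```python
def derive_abs_don(dons, all_zero=False):
    """Return AbsDon[n] for each NAL unit n, in transmission order."""
    if all_zero:
        # sprop-max-don-diff is 0 for all RTP streams: AbsDon[n] = n
        # (dons is then used only for its length).
        return list(range(len(dons)))
    abs_don = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_don.append(don)            # AbsDon[0] = DON[0]
            continue
        prev, prev_abs = dons[n - 1], abs_don[n - 1]
        if don == prev:
            abs_don.append(prev_abs)
        elif don > prev and don - prev < 32768:
            abs_don.append(prev_abs + don - prev)
        elif don < prev and prev - don >= 32768:
            abs_don.append(prev_abs + 65536 - prev + don)    # forward wrap
        elif don > prev and don - prev >= 32768:
            abs_don.append(prev_abs - (prev + 65536 - don))  # backward wrap
        else:  # don < prev and prev - don < 32768
            abs_don.append(prev_abs - (prev - don))
    return abs_don
```

   For example, DON values 65534, 65535, 0, 1 (a wrap of the 16-bit
   counter) yield AbsDon values 65534, 65535, 65536, 65537.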

   For any two NAL units m and n, the following applies:

   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
      NAL unit m in NAL unit decoding order.

   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
      of the two NAL units can be in either order.

   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
      NAL unit m in decoding order.

         Informative note: When two consecutive NAL units in the NAL
         unit decoding order have different values of AbsDon, the
         absolute difference between the two AbsDon values may be
         greater than or equal to 1.

         Informative note: There are multiple reasons to allow for the
         absolute difference of the values of AbsDon for two consecutive
         NAL units in the NAL unit decoding order to be greater than
         one.  An increment by one is not required, as at the time of
         associating values of AbsDon to NAL units, it may not be known
         whether all NAL units are to be delivered to the receiver.  For
         example, a gateway may not forward VCL NAL units of higher sub-
         layers or some SEI NAL units when there is congestion in the
         network.  In another example, the first intra-coded picture of
         a pre-encoded clip is transmitted in advance to ensure that it
         is readily available in the receiver, and when transmitting the
         first intra-coded picture, the originator does not exactly know
         how many NAL units will be encoded before the first intra-coded
         picture of the pre-encoded clip follows in decoding order.
         Thus, the values of AbsDon for the NAL units of the first
         intra-coded picture of the pre-encoded clip have to be
         estimated when they are transmitted, and gaps in values of
         AbsDon may occur.  Another example is MRST or MRMT with sprop-
         max-don-diff greater than 0, where the AbsDon values must
         indicate cross-layer decoding order for NAL units conveyed in
         all the RTP streams.

5.  Packetization Rules

   The following packetization rules apply:

   o  If sprop-max-don-diff is greater than 0 for any of the RTP
      streams, the transmission order of NAL units carried in the RTP
      stream MAY differ from the NAL unit decoding order and the
      NAL unit output order.  Otherwise (sprop-max-don-diff is equal to
      0 for all the RTP streams), the transmission order of NAL units
      carried in the RTP stream MUST be the same as the NAL unit
      decoding order and, when tx-mode is equal to "MRST" or "MRMT",
      MUST also be the same as the NAL unit output order.

   o  A NAL unit of a small size SHOULD be encapsulated in an
      aggregation packet together with one or more other NAL units in
      order to avoid the unnecessary packetization overhead for small
      NAL units.  For example, non-VCL NAL units such as access unit
      delimiters, parameter sets, or SEI NAL units are typically small
      and can often be aggregated with VCL NAL units without violating
      MTU size constraints.

   o  Each non-VCL NAL unit SHOULD, when the MTU size permits, be
      encapsulated in an aggregation packet together with its associated
      VCL NAL unit, as a non-VCL NAL unit would typically be meaningless
      without the associated VCL NAL unit being available.

   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
      unit packet MUST be used.

6.  De-packetization Process

   The general concept behind de-packetization is to get the NAL units
   out of the RTP packets in an RTP stream and all RTP streams the RTP
   stream depends on, if any, and pass them to the decoder in the NAL
   unit decoding order.

   The de-packetization process is implementation dependent.  Therefore,
   the following description should be seen as an example of a suitable
   implementation.  Other schemes may be used as well, as long as the
   output for the same input is the same as the process described below.
   The output is the same when the set of output NAL units and their
   order are both identical.  Optimizations relative to the described
   algorithms are possible.

   All normal RTP mechanisms related to buffer management apply.  In
   particular, duplicated or outdated RTP packets (as indicated by the
   RTP sequence number and the RTP timestamp) are removed.  When
   determining the exact time for decoding, factors such as a possible
   intentional delay to allow for proper inter-stream synchronization
   must be taken into account.

   NAL units with NAL unit type values in the range of 0 to 47,
   inclusive, may be passed to the decoder.  NAL-unit-like structures
   with NAL unit type values in the range of 48 to 63, inclusive, MUST
   NOT be passed to the decoder.
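   A receiver can apply this rule by inspecting the Type field in the
   first octet of each RTP payload.  The hypothetical `classify` helper
   below separates NAL units that may be passed on from the
   NAL-unit-like structures (AP = 48, FU = 49, PACI = 50) that must be
   unpacked first; other values from 51 to 63 are simply reported as
   unknown in this sketch.

```python
def classify(payload: bytes) -> str:
    """Return how a receiver should treat an RTP payload, by its Type field."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-octet NAL unit header")
    nal_type = (payload[0] >> 1) & 0x3F   # the six bits after the F bit
    if nal_type <= 47:
        return "nal"                       # single NAL unit: may go to the decoder
    # 48..63: NAL-unit-like structures, never passed to the decoder as-is
    return {48: "ap", 49: "fu", 50: "paci"}.get(nal_type, "unknown")
```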

   The receiver includes a receiver buffer, which is used to compensate
   for transmission delay jitter within individual RTP streams and
   across RTP streams, to reorder NAL units from transmission order to
   the NAL unit decoding order, and to recover the NAL unit decoding
   order in MRST or MRMT, when applicable.  In this section, the
   receiver operation is described under the assumption that there is no
   transmission delay jitter within an RTP stream and across RTP
   streams.  To distinguish it from a practical receiver buffer that is
   also used for compensation of transmission delay jitter, the receiver
   buffer is hereafter called the de-packetization buffer in this
   section.  Receivers should also prepare for transmission delay
   jitter; that is, either reserve separate buffers for transmission
   delay jitter buffering and de-packetization buffering or use a
   receiver buffer for both transmission delay jitter and
   de-packetization.  Moreover, receivers should take transmission delay
   jitter into account in the buffering operation, e.g., by additional
   initial buffering before decoding and playback start.

   When sprop-max-don-diff is equal to 0 for all the received RTP
   streams, the de-packetization buffer size is zero bytes, and the
   process described in the remainder of this paragraph applies.  When
   there is only one RTP stream received, the NAL units carried in the
   single RTP stream are directly passed to the decoder in their
   transmission order, which is identical to their decoding order.  When
   there is more than one RTP stream received, the NAL units carried in
   the multiple RTP streams are passed to the decoder in their NTP
   timestamp order.  When there are several NAL units of different RTP
   streams with the same NTP timestamp, the order to pass them to the
   decoder is their dependency order, where NAL units of a dependee RTP
   stream are passed to the decoder prior to the NAL units of the
   dependent RTP stream.  When there are several NAL units of the same
   RTP stream with the same NTP timestamp, the order to pass them to the
   decoder is their transmission order.

      Informative note: The mapping between RTP and NTP timestamps is
      conveyed in RTCP SR packets.  In addition, the mechanisms for
      faster media timestamp synchronization discussed in [RFC6051] may
      be used to speed up the acquisition of the RTP-to-wall-clock
      mapping.
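   For the multi-stream case with sprop-max-don-diff equal to 0, the
   ordering rule amounts to a three-part sort key.  This batch-style
   sketch assumes each received NAL unit has already been tagged with
   its NTP timestamp (derived from the RTCP SR mapping), a dependency
   rank for its RTP stream (lower for dependee streams, assuming a
   simple chain), and its transmission index within its own stream;
   `RxNal` and `decoding_order` are hypothetical.  A single received
   stream needs no sorting at all.

```python
from dataclasses import dataclass


@dataclass
class RxNal:
    ntp: int          # NTP timestamp mapped from the RTP timestamp (RTCP SR)
    stream_rank: int  # dependency order of the RTP stream; dependees are lower
    tx_index: int     # transmission order within its own RTP stream
    data: bytes


def decoding_order(units):
    """Order NAL units from several RTP streams for the decoder."""
    # NTP timestamp first, then dependee before dependent stream,
    # then transmission order within the same stream.
    return sorted(units, key=lambda u: (u.ntp, u.stream_rank, u.tx_index))
```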

   When sprop-max-don-diff is greater than 0 for any of the received RTP
   streams, the process described in the remainder of this section
   applies.

   There are two buffering states in the receiver: initial buffering and
   buffering while playing.  Initial buffering starts when the reception
   is initialized.  After initial buffering, decoding and playback are
   started, and the buffering-while-playing mode is used.

   Regardless of the buffering state, the receiver stores incoming NAL
   units, in reception order, into the de-packetization buffer.  NAL
   units carried in RTP packets are stored in the de-packetization
   buffer individually, and the value of AbsDon is calculated and stored
   for each NAL unit.  When MRST or MRMT is in use, NAL units of all RTP
   streams of a bitstream are stored in the same de-packetization
   buffer.  When NAL units carried in any two RTP streams are available
   to be placed into the de-packetization buffer, those NAL units
   carried in the RTP stream that is lower in the dependency tree are
   placed into the buffer first.  For example, if RTP stream A depends
   on RTP stream B, then NAL units carried in RTP stream B are placed
   into the buffer first.

   Initial buffering lasts until condition A (the difference between the
   greatest and smallest AbsDon values of the NAL units in the de-
   packetization buffer is greater than or equal to the value of sprop-
   max-don-diff of the highest RTP stream) or condition B (the number of
   NAL units in the de-packetization buffer is greater than the value of
   sprop-depack-buf-nalus) is true.

   After initial buffering, whenever condition A or condition B is true,
   the following operation is repeatedly applied until both condition A
   and condition B become false:

      o  The NAL unit in the de-packetization buffer with the smallest
         value of AbsDon is removed from the de-packetization buffer and
         passed to the decoder.

   When no more NAL units are flowing into the de-packetization buffer,
   all NAL units remaining in the de-packetization buffer are removed
   from the buffer and passed to the decoder in the order of increasing
   AbsDon values.
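   The buffering process can be sketched with a min-heap keyed on
   AbsDon.  `DepacketizationBuffer` and its methods are hypothetical;
   the caller is assumed to supply each NAL unit's AbsDon (Section 4.6)
   and to push NAL units in the reception order described above, so
   that NAL units of lower RTP streams in the dependency tree go first.
   Ties on AbsDon, whose order is unconstrained, are broken by arrival
   order.

```python
import heapq
import itertools


class DepacketizationBuffer:
    def __init__(self, max_don_diff, depack_buf_nalus):
        self.max_don_diff = max_don_diff          # sprop-max-don-diff (highest RTP stream)
        self.depack_buf_nalus = depack_buf_nalus  # sprop-depack-buf-nalus
        self.initial_buffering = True
        self._heap = []                           # (AbsDon, arrival counter, NAL unit)
        self._arrival = itertools.count()

    def _condition_a(self):
        if not self._heap:
            return False
        smallest = self._heap[0][0]
        greatest = max(entry[0] for entry in self._heap)
        return greatest - smallest >= self.max_don_diff

    def _condition_b(self):
        return len(self._heap) > self.depack_buf_nalus

    def push(self, abs_don, nal):
        """Store one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, next(self._arrival), nal))
        released = []
        if self.initial_buffering and (self._condition_a() or self._condition_b()):
            self.initial_buffering = False        # switch to buffering while playing
        if not self.initial_buffering:
            while self._condition_a() or self._condition_b():
                released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self):
        """No more input: release everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

   Because the release loop only runs while condition A or B holds, the
   `initial_buffering` flag does not change which NAL units are
   released; it is kept only to mirror the two buffering states named in
   the text.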
