RFC 6190

RTP Payload Format for Scalable Video Coding

Pages: 100
Proposed Standard

6.  De-Packetization Process

6.1.  De-Packetization Process for Single-Session Transmission

   For single-session transmission, where a single RTP session is used,
   the de-packetization process specified in Section 7 of [RFC6184]
   applies.

6.2.  De-Packetization Process for Multi-Session Transmission

   For multi-session transmission, where more than one RTP session is
   used to receive data from the same SVC bitstream, the de-
   packetization process is specified as follows.

   As for a single RTP session, the general concept behind the de-
   packetization process is to reorder NAL units from transmission order
   to the NAL unit decoding order.

   The sessions to be received MUST be identified by mechanisms
   specified in Section 7.2.3.  An enhancement RTP session typically
   contains an RTP stream that depends on at least one other RTP
   session, as indicated by mechanisms defined in Section 7.2.3.  A
   lower RTP session to an enhancement RTP session is an RTP session on
   which the enhancement RTP session depends.  The lowest RTP session
   for a receiver is the base RTP session, which does not depend on any
   other RTP session received by the receiver.  The highest RTP session
   for a receiver is the RTP session on which no other RTP session
   received by the receiver depends.

   For each of the RTP sessions, the RTP reception process as specified
   in RFC 3550 is applied.  Then the received packets are passed into
   the payload de-packetization process as defined in this memo.

   The decoding order of the NAL units carried in all the associated RTP
   sessions is then recovered by applying one of the following
   subsections, depending on which of the MST packetization modes is in
   use.

6.2.1.  Decoding Order Recovery for the NI-T and NI-TC Modes

   The following process MUST be applied when the NI-T packetization
   mode is in use.  The following process MAY be applied when the NI-TC
   packetization mode is in use.

   The process is based on RTP session dependency signaling, RTP
   sequence numbers, and timestamps.

   The decoding order of NAL units within an RTP packet stream in an
   RTP session is given by the ordering of sequence numbers SN of the
   RTP packets that contain the NAL units, and the order of appearance
   of NAL units within a packet.

   Timing information according to the media timestamp TS, i.e., the NTP
   timestamp as derived from the RTP timestamp of an RTP packet, is
   associated with all NAL units contained in the same RTP packet
   received in an RTP session.

   For NI-MTAP packets the NALU-time is derived for each contained NAL
   unit by using the "TS offset" value in the NI-MTAP packet as defined
   in Section 4.10, and is used instead of the RTP packet timestamp to
   derive the media timestamp, e.g., using the NTP wall clock as
   provided via RTCP sender reports.  NAL units contained in
   fragmentation packets are handled as defragmented, entire NAL units
   with their own media timestamps.  All NAL units associated with the
   same value of media timestamp TS are part of the same access unit
   AU(TS).  Any empty NAL units SHOULD be kept as, effectively, access
   unit indicators in the reordering process.  Empty NAL units and PACSI
   NAL units SHOULD be removed before passing access unit data to the
   decoder.

      Informative note: These empty NAL units are used to associate NAL
      units present in other RTP sessions with RTP sessions not
      containing any data for an access unit of a particular time
      instance.  They act as access unit indicators in sessions that
      would otherwise contain no data for the particular access unit.
      The presence of these NAL units is ensured by the packetization
      rules in Section 5.2.1.
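
   The derivation of a media timestamp can be sketched as follows.
   This is a minimal Python sketch, not part of the payload format.  It
   assumes a 90 kHz RTP clock, a single (RTP timestamp, NTP time) pair
   taken from one RTCP sender report, and that the NALU-time of a NAL
   unit in an NI-MTAP is the packet's RTP timestamp plus the "TS offset"
   modulo 2^32 (Section 4.10 remains the authoritative definition).  All
   function and parameter names are hypothetical.

```python
RTP_TS_MODULUS = 1 << 32   # RTP timestamps are 32 bits wide and wrap around
VIDEO_CLOCK_RATE = 90000   # Hz; assumed RTP clock rate for the video stream


def media_timestamp(rtp_ts, sr_rtp_ts, sr_ntp_seconds,
                    clock_rate=VIDEO_CLOCK_RATE):
    """Map an RTP timestamp to wall-clock seconds.

    sr_rtp_ts and sr_ntp_seconds are the RTP timestamp and NTP time
    (as seconds) carried in the same RTCP sender report.
    """
    # Interpret the 32-bit difference as signed, so that timestamps
    # slightly older than the sender report and wraparound both work.
    delta = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    return sr_ntp_seconds + delta / clock_rate


def nalu_media_timestamp(packet_rtp_ts, ts_offset, sr_rtp_ts, sr_ntp_seconds):
    """Media timestamp of one NAL unit carried in an NI-MTAP packet."""
    # Assumption: NALU-time = packet RTP timestamp + TS offset (mod 2^32).
    nalu_time = (packet_rtp_ts + ts_offset) % RTP_TS_MODULUS
    return media_timestamp(nalu_time, sr_rtp_ts, sr_ntp_seconds)
```

   NAL units from different RTP sessions whose media timestamps compare
   equal after this mapping would then be treated as parts of the same
   access unit AU(TS).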

   It is assumed that the receiver has established an operation point
   (DID, QID, and TID values), and has identified the highest
   enhancement RTP session for this operation point.  The decoding order
   of NAL units from multiple RTP streams in multiple RTP sessions MUST
   be recovered into a single sequence of NAL units, grouped into access
   units, by performing any process equivalent to the following steps.
   The general process is described in Section 4.2 of [RFC6051].  For
   convenience the instructions of [RFC6051] are repeated and applied to
   NAL units rather than to full RTP packets.  Additionally,
   SVC-specific extensions to the procedure in Section 4.2 of [RFC6051]
   are presented in the following list:

      o  The process should be started with the NAL units received in
         the highest RTP session with the first media timestamp TS (in
         NTP format) available in the session's (de-jittering) buffer.
         It is assumed that packets in the de-jittering buffer are
         already stored in RTP sequence number order.

      o  Collect all NAL units associated with the same value of media
         timestamp TS, starting from the highest RTP session, from all
         the (de-jittering) buffers of the received RTP sessions.  The
         collected NAL units will be those associated with the access
         unit AU(TS).

      o  Place the collected NAL units in the order of session
         dependency as derived by the dependency indication as specified
         in Section 7.2.3, starting from the lowest RTP session.

      o  Place the session ordered NAL units in decoding order within
         the particular access unit by satisfying the NAL unit ordering
         rules for SVC access units, as described in the informative
         algorithm provided in Section 6.2.1.1.

      o  Remove NI-MTAP and any PACSI NAL units from the access unit
         AU(TS).

      o  The access units can then be transferred to the decoder.
         Access units AU(TS) are transferred to the decoder in the order
         of appearance (given by the order of RTP sequence numbers) of
         media timestamp values TS in the highest RTP session associated
         with access unit AU(TS).

            Informative note: Due to packet loss it is possible that not
            all sessions may have NAL units present for the media
            timestamp value TS present in the highest RTP session.  In
            such a case, an algorithm may: a) proceed to the next
            complete access unit with NAL units present in all the
            received RTP sessions; or b) consider a new highest RTP
            session, the highest RTP session for which the access unit
            is complete, and apply the process above.  The algorithm may
            return to the original highest RTP session when a complete
            and error-free access unit that contains NAL units in all
            the sessions is received.

   The following gives an informative example.

   The example shown in Figure 6 refers to three RTP sessions A, B, and
   C containing an SVC bitstream transmitted as 3 sources.  In the
   example, the dependency signaling (described in Section 7.2.3)
   indicates that session A is the base RTP session, B is the first
   enhancement RTP session and depends on A, and C is the second
   enhancement RTP session and depends on A and B.  A hierarchical
   picture coding prediction structure is used, in which session A has
   the lowest frame rate and sessions B and C have the same but higher
   frame rate.

   The figure shows NAL units contained in RTP packets that are stored
   in the de-jittering buffer at the receiver for session de-
   packetization.  The NAL units are already reordered according to
   their RTP sequence number order and, if within an aggregation packet,
   according to the order of their appearance within the aggregation
   packet.  The figure indicates for the received NAL units the decoding
   order within the sessions, as well as the associated media (NTP)
   timestamps ("TS[..]").  NAL units of the same access unit within a
   session are grouped by "(.,.)" and share the same media timestamp TS,
   which is shown at the bottom of the figure.  Note that the timestamps
   are not in increasing order since, in this example, the decoding
   order is different from the output/display order.

   The process first proceeds to the NAL units associated with the first
   media timestamp TS[1] present in the highest session C.  In each of
   the de-jittering buffers of RTP sessions A, B, and C, all NAL units
   that precede (in decoding order) the NAL units with TS[1] are
   removed or ignored.  Then, starting from session C, the first media
   timestamp available in decoding order (TS[1]) is selected, and the
   NAL units with that timestamp are placed into the access unit
   AU(TS[1]) in order of the RTP session dependency as required by
   Section 7.2.3 of this memo, starting from RTP session A and
   continuing with sessions B and C (in the example for TS[1]: first
   session B and then session C, because session A contains no NAL
   units for TS[1]).  Then the next media timestamp TS[3] in order of
   appearance in the highest RTP session C is processed and the process
   described above is repeated.  Note that there may be access units
   with no NAL units present, e.g., in the lowest RTP session A (see,
   e.g., TS[1]).  With TS[8], the first access unit with NAL units
   present in all the RTP sessions appears in the buffers.

   C: ------------(1,2)-(3,4)--(5)---(6)---(7,8)(9,10)-(11)--(12)----
        |     |     |     |     |     |      |    |     |      |
   B: -(1,2)-(3,4)-(5)---(6)--(7,8)-(9,10)-(11)-(12)--(13,14)(15,16)-
        |     |                 |     |                 |      |
   A: -------(1)---------------(2)---(3)---------------(4)----(5)----
   ---------------------------------------------------decoding order-->

   TS: [4]   [2]   [1]   [3]   [8]   [6]   [5]   [7]   [12]   [10]

   Key:
   A, B, C                - RTP sessions
   Integer values in "()" - NAL unit decoding order within RTP session
   "( )"                  - groups the NAL units of an access unit
                            in an RTP session
   "|"                    - indicates corresponding NAL units of the
                            same access unit AU(TS[..]) in the RTP
                            sessions
   Integer values in "[]" - media timestamp TS, sampling time
                            as derived, e.g., from NTP timestamp
                            associated with the access unit AU(TS[..]),
                            consisting of NAL units in the sessions
                            above each TS value.

   Figure 6.  Example of decoding order recovery in multi-source
   transmission.
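
   The cross-session part of the recovery process can be sketched in
   Python as follows.  This is an illustrative sketch only: it replays
   the first eight timestamp columns of Figure 6, assumes the buffers
   are already in session order and loss free, and omits both the
   within-access-unit reordering of Section 6.2.1.1 and the removal of
   NI-MTAP, PACSI, and empty NAL units.  The names BUFFERS,
   DEPENDENCY_ORDER, and recover_access_units are hypothetical.

```python
# Per session: (media timestamp TS, NAL unit numbers) in session order,
# transcribed from the first eight columns of Figure 6.
BUFFERS = {
    "A": [(2, [1]), (8, [2]), (6, [3])],
    "B": [(4, [1, 2]), (2, [3, 4]), (1, [5]), (3, [6]),
          (8, [7, 8]), (6, [9, 10]), (5, [11]), (7, [12])],
    "C": [(1, [1, 2]), (3, [3, 4]), (8, [5]), (6, [6]),
          (5, [7, 8]), (7, [9, 10])],
}

# Session dependency order: base session first, highest session last.
DEPENDENCY_ORDER = ["A", "B", "C"]


def recover_access_units(buffers, dependency_order):
    """Return [(TS, [(session, nal_number), ...]), ...] in decoding order."""
    highest = dependency_order[-1]
    access_units = []
    # Access units follow the order of appearance of TS values in the
    # highest session; timestamps absent from that session are skipped.
    for ts, _ in buffers[highest]:
        au = []
        # Within an access unit, NAL units are placed lowest session first.
        for session in dependency_order:
            for session_ts, nal_units in buffers[session]:
                if session_ts == ts:
                    au.extend((session, n) for n in nal_units)
        access_units.append((ts, au))
    return access_units


if __name__ == "__main__":
    for ts, au in recover_access_units(BUFFERS, DEPENDENCY_ORDER):
        print(f"AU(TS[{ts}]):", au)
```

   In this sketch, the NAL units with TS[4] and TS[2] are never emitted
   because those timestamps do not appear in the highest session C,
   which mirrors the removal of preceding NAL units described above.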

6.2.1.1.  Informative Algorithm for NI-T Decoding Order Recovery within
          an Access Unit

   Within an access unit, the [H.264] specification (Sections 7.4.1.2.3
   and G.7.4.1.2.3) constrains the valid decoding order of NAL units.

   These constraints make it possible to reconstruct a valid decoding
   order for the NAL units of an access unit based only on the order of
   NAL units in each session, the NAL unit headers, and Supplemental
   Enhancement Information message headers.

   This section specifies an informative algorithm to reconstruct a
   valid decoding order for NAL units within an access unit.  Other NAL
   unit orderings may also be valid; however, any compliant NAL unit
   ordering will describe the same video stream and ancillary data as
   the one produced by this algorithm.

   An actual implementation, of course, needs only to behave "as if"
   this reordering is done.  In particular, NAL units that are discarded
   by an implementation's decoding process do not need to be reordered.

   In this algorithm, NAL units within an access unit are first ordered
   by NAL unit type, in the order specified in Table 12 below, except
   for NAL unit type 14, which is handled specially as described in the
   table.  NAL units of the same type are then ordered as specified for
   the type, if necessary.

   For the purposes of this algorithm, "session order" is the order of
   NAL units implied by their transmission order within an RTP session.
   For the non-interleaved and single NAL unit modes, this is the RTP
   sequence number order coupled with the order of NAL units within an
   aggregation unit.

   Table 12.  Ordering of NAL unit types within an Access Unit

    Type    Description / Comments
   -----------------------------------------------------------
     9      Access unit delimiter

     7      Sequence parameter set

     13     Sequence parameter set extension

     15     Subset sequence parameter set

     8      Picture parameter set

     16-18  Reserved

     6      Supplemental enhancement information (SEI)
            If an SEI message with a first payload of 0 (Buffering
            Period) is present, it must be the first SEI message.

            If SEI messages with a Scalable Nesting (30) payload and
            a nested payload of 0 (Buffering Period) are present,
            these then follow the first SEI message.  Such an SEI
            message with the all_layer_representations_in_au_flag
            equal to 1 is placed first, followed by any others,
            sorted in increasing order of DQId.

            All other SEI messages follow in any order.

     14     Prefix NAL unit in scalable extension
     1      Coded slice of a non-IDR picture
     5      Coded slice of an IDR picture

            NAL units of type 1 or 5 will be sent within only a
            single session for any given access unit.  They are
            placed in session order.  (Note: Any given access unit
            will contain only NAL units of type 1 or type 5, not
            both.)

            If NAL units of type 14 are present, every NAL unit of
            type 1 or 5 is prefixed by a NAL unit of type 14.  (Note:
            Within an access unit, every NAL unit of type 14 is
            identical, so correlation of type 14 NAL units with the
            other NAL units is not necessary.)

     12     Filler data

            The only restriction on filler data NAL units within an
            access unit is that they shall not precede the first VCL
            NAL unit within the same access unit.

     19     Coded slice of an auxiliary coded picture without
            partitioning

            These NAL units will be sent within only a single
            session for any given access unit, and are placed in
            session order.

     20     Coded slice in scalable extension
     21-23  Reserved

            Type 20 NAL units are placed in increasing order of DQId.
            Within each DQId value, they are placed in session order.

            (Note: SVC slices with a given DQId value will be sent
            within only a single session for any given access unit.)

            Type 21-23 NAL units are placed immediately following
            the non-reserved-type VCL NAL unit they follow in
            session order.

     10     End of sequence

     11     End of stream
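
   The type-based ordering of Table 12 can be approximated with a
   stable sort.  The following Python sketch is a simplification under
   stated assumptions: the input list is already in session-dependency
   and session order, the special ordering among SEI messages is not
   modeled, and reserved types 21-23 are not handled.  DQId is computed
   as (dependency_id << 4) + quality_id.  The NalUnit container and the
   function names are hypothetical.

```python
from dataclasses import dataclass

# Rank of each NAL unit type, following the row order of Table 12.
# Types 14, 1, and 5 share one rank, so the stable sort keeps every
# prefix NAL unit (type 14) directly in front of the slice it precedes
# in session order.
TYPE_RANK = {9: 0, 7: 1, 13: 2, 15: 3, 8: 4, 16: 5, 17: 5, 18: 5, 6: 6,
             14: 7, 1: 7, 5: 7, 12: 8, 19: 9, 20: 10, 10: 11, 11: 12}


@dataclass
class NalUnit:
    """Hypothetical container holding only the fields the sketch needs."""
    nal_type: int
    did: int = 0     # dependency_id, meaningful for type 20 only
    qid: int = 0     # quality_id, meaningful for type 20 only
    label: str = ""  # free-form tag used for illustration


def dqid(nal):
    return (nal.did << 4) + nal.qid


def order_access_unit(nal_units):
    """Order the NAL units of one access unit by Table 12 rank.

    Type 20 NAL units are additionally sorted by increasing DQId; ties
    keep their input (session) order because sorted() is stable.
    """
    def key(nal):
        return (TYPE_RANK[nal.nal_type],
                dqid(nal) if nal.nal_type == 20 else 0)
    return sorted(nal_units, key=key)
```

   Because the sort is stable, NAL units that compare equal under the
   key remain in session order, which is what the table requires for
   types 1, 5, 19, and for type 20 NAL units with the same DQId.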

6.2.2.  Decoding Order Recovery for the NI-C, NI-TC, and I-C Modes

   The following process MUST be used when either the NI-C or I-C MST
   packetization mode is in use.  The following process MAY be applied
   when the NI-TC MST packetization mode is in use.

   The RTP packets output from the RTP-level reception processing for
   each session are placed into a re-multiplexing buffer.

   It is RECOMMENDED to set the size of the re-multiplexing buffer (in
   bytes) equal to or greater than the value of the sprop-remux-buf-req
   media type parameter of the highest RTP session the receiver
   receives.

   The CS-DON value is calculated and stored for each NAL unit.

      Informative note: The CS-DON value of a NAL unit may rely on
      information carried in a packet other than the one containing
      the NAL unit.  This happens, e.g., when the CS-DON values need to
      be derived for non-PACSI NAL units contained in single NAL unit
      packets, as the single NAL unit packets themselves do not contain
      CS-DON information.  In this case, when no packet containing
      required CS-DON information is received for a NAL unit, this NAL
      unit has to be discarded by the receiver as it cannot be fed to
      the decoder in the correct order.  When the optional media type
      parameter sprop-mst-csdon-always-present is equal to 1, no such
      dependency exists, i.e., the CS-DON value of any particular NAL
      unit can be derived solely according to information in the packet
      containing the NAL unit, and therefore, the receiver does not need
      to discard any received NAL units.

   The receiver operation is described below with the help of the
   following functions and constants:

   o  Function AbsDON is specified in Section 8.1 of [RFC6184].

   o  Function don_diff is specified in Section 5.5 of [RFC6184].

   o  Constant N is the value of the OPTIONAL sprop-mst-remux-buf-size
      media type parameter of the highest RTP session incremented by 1.
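
   For orientation, don_diff can be sketched in Python as below.  The
   sketch follows the case analysis for 16-bit decoding order numbers
   given in Section 5.5 of [RFC6184], which remains the normative
   definition; the argument names are hypothetical and stand for the
   DON values of NAL units m and n.

```python
def don_diff(don_m, don_n):
    """Signed distance from DON(m) to DON(n) with 16-bit wraparound.

    A positive result means that n follows m in decoding order.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n:
        d = don_n - don_m
        # A gap of 32768 or more is read as n preceding m across a wrap.
        return d if d < 32768 else -(don_m + 65536 - don_n)
    d = don_m - don_n
    # A gap of 32768 or more is read as n following m across a wrap.
    return 65536 - don_m + don_n if d >= 32768 else -d
```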

   Initial buffering lasts until one of the following conditions is
   fulfilled:

   o  There are N or more VCL NAL units in the re-multiplexing buffer.

   o  If sprop-mst-max-don-diff of the highest RTP session is present,
      don_diff(m,n) is greater than the value of sprop-mst-max-don-diff
      of the highest RTP session, where n corresponds to the NAL unit
      having the greatest value of AbsDON among the received NAL units
      and m corresponds to the NAL unit having the smallest value of
      AbsDON among the received NAL units.

   o  Initial buffering has lasted for the duration equal to or greater
      than the value of the OPTIONAL sprop-remux-init-buf-time media
      type parameter of the highest RTP session.
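
   The three termination conditions for initial buffering can be
   checked as in the following Python sketch.  It is illustrative only
   and makes two assumptions: the caller supplies AbsDON values that
   are already unwrapped, so that don_diff between the smallest and the
   greatest reduces to a plain subtraction, and elapsed time is
   expressed in the same unit as sprop-remux-init-buf-time.  The
   function and parameter names are hypothetical.

```python
def initial_buffering_done(vcl_count, abs_dons, elapsed, n,
                           max_don_diff=None, init_buf_time=None):
    """Return True once any of the three conditions is fulfilled.

    vcl_count      number of VCL NAL units in the re-multiplexing buffer
    abs_dons       AbsDON values of the received NAL units
    elapsed        time spent in initial buffering so far
    n              constant N (sprop-mst-remux-buf-size plus 1)
    max_don_diff   sprop-mst-max-don-diff, or None if absent
    init_buf_time  sprop-remux-init-buf-time, or None if absent
    """
    # Condition 1: N or more VCL NAL units are buffered.
    if vcl_count >= n:
        return True
    # Condition 2: the DON spread exceeds sprop-mst-max-don-diff.
    if max_don_diff is not None and abs_dons:
        if max(abs_dons) - min(abs_dons) > max_don_diff:
            return True
    # Condition 3: the signaled initial buffering time has passed.
    if init_buf_time is not None and elapsed >= init_buf_time:
        return True
    return False
```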

   The NAL units to be removed from the re-multiplexing buffer are
   determined as follows:

   o  If the re-multiplexing buffer contains at least N VCL NAL units,
      NAL units are removed from the re-multiplexing buffer and passed
      to the decoder in the order specified below until the buffer
      contains N-1 VCL NAL units.

   o  If sprop-mst-max-don-diff of the highest RTP session is present,
      all NAL units m for which don_diff(m,n) is greater than
      sprop-mst-max-don-diff of the highest RTP session are removed from
      the re-multiplexing buffer and passed to the decoder in the order
      specified below.  Herein, n corresponds to the NAL unit having the
      greatest value of AbsDON among the NAL units in the
      re-multiplexing buffer.

   The order in which NAL units are passed to the decoder is specified
   as follows:

   o  Let PDON be a variable that is initialized to 0 at the beginning
      of the RTP sessions.

   o  For each NAL unit associated with a value of CS-DON, a CS-DON
      distance is calculated as follows.  If the value of CS-DON of the
      NAL unit is larger than the value of PDON, the CS-DON distance is
      equal to CS-DON - PDON.  Otherwise, the CS-DON distance is equal
      to 65535 - PDON + CS-DON + 1.

   o  NAL units are delivered to the decoder in increasing order of CS-
      DON distance.  If several NAL units share the same value of CS-
      DON distance, they can be passed to the decoder in any order.

   o  When a desired number of NAL units have been passed to the
      decoder, the value of PDON is set to the value of CS-DON for the
      last NAL unit passed to the decoder.
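
   The CS-DON distance rule above can be written out as the following
   Python sketch.  It is a minimal illustration in which each NAL unit
   is represented as a hypothetical (cs_don, payload) pair and a whole
   batch is handed to the decoder at once; the function names are not
   defined by this memo.

```python
def cs_don_distance(cs_don, pdon):
    """CS-DON distance of a NAL unit relative to PDON (16-bit wrap)."""
    if cs_don > pdon:
        return cs_don - pdon
    return 65535 - pdon + cs_don + 1


def delivery_order(nal_units, pdon=0):
    """Order (cs_don, payload) pairs by increasing CS-DON distance.

    Returns the ordered list and the updated PDON, i.e., the CS-DON of
    the last NAL unit passed to the decoder.
    """
    ordered = sorted(nal_units, key=lambda u: cs_don_distance(u[0], pdon))
    new_pdon = ordered[-1][0] if ordered else pdon
    return ordered, new_pdon
```

   Note that a NAL unit whose CS-DON equals PDON falls into the
   "otherwise" branch and receives the distance 65536, so it is
   delivered after all other buffered NAL units.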

7.  Payload Format Parameters

   This section specifies the parameters that MAY be used to select
   optional features of the payload format and certain features of the
   bitstream.  The parameters are specified here as part of the media
   type registration for the SVC codec.  A mapping of the parameters
   into the Session Description Protocol (SDP) [RFC4566] is also
   provided for applications that use SDP.  Equivalent parameters could
   be defined elsewhere for use with control protocols that do not use
   SDP.

   Some parameters provide a receiver with the properties of the stream
   that will be sent.  The names of all these parameters start with
   "sprop" for stream properties.  Some of these "sprop" parameters are
   limited by other payload or codec configuration parameters.  For
   example, the sprop-parameter-sets parameter is constrained by the
   profile-level-id parameter.  The media sender selects all "sprop"
   parameters rather than the receiver.  This uncommon characteristic of
   the "sprop" parameters may be incompatible with some signaling
   protocol concepts, in which case the use of these parameters SHOULD
   be avoided.

7.1.  Media Type Registration

   The media subtype for the SVC codec has been allocated from the IETF
   tree.

   The receiver MUST ignore any unspecified parameter.

      Informative note: Requiring that the receiver ignore unspecified
      parameters allows for backward compatibility of future extensions.
      For example, if a future specification that is backward compatible
      to this specification specifies some new parameters, then a
      receiver according to this specification is capable of receiving
      data per the new payload but ignoring those parameters newly
      specified in the new payload specification.  This provision is
      also present in [RFC6184].

   Media Type name:     video

   Media subtype name:  H264-SVC

   Required parameters: none

   OPTIONAL parameters:

      In the following definitions of parameters, "the stream" or "the
      NAL unit stream" refers to all NAL units conveyed in the current
      RTP session in SST, and all NAL units conveyed in the current RTP
      session and all NAL units conveyed in other RTP sessions that the
      current RTP session depends on in MST.

      profile-level-id:
         A base16 [RFC4648] (hexadecimal) representation of the
         following three bytes in the sequence parameter set or subset
         sequence parameter set NAL unit specified in [H.264]: 1)
         profile_idc; 2) a byte herein referred to as profile-iop,
         composed of the values of constraint_set0_flag,
         constraint_set1_flag, constraint_set2_flag,
         constraint_set3_flag, constraint_set4_flag,
         constraint_set5_flag, and reserved_zero_2bits, in bit-
         significance order, starting from the most-significant bit, and
         3) level_idc.  Note that reserved_zero_2bits is required to be
         equal to 0 in [H.264], but other values for it may be specified
         in the future by ITU-T or ISO/IEC.

         The profile-level-id parameter indicates the default sub-
         profile, i.e., the subset of coding tools that may have been
         used to generate the stream or that the receiver supports, and
         the default level of the stream or the one that the receiver
         supports.

         The default sub-profile is indicated collectively by the
         profile_idc byte and some fields in the profile-iop byte.
         Depending on the values of the fields in the profile-iop byte,
         the default sub-profile may be the same set of coding tools
         supported by one profile, or a common subset of coding tools of
         multiple profiles, as specified in Subsection G.7.4.2.1.1 of
         [H.264].  The default level is indicated by the level_idc byte,
         and, when profile_idc is equal to 66, 77, or 88 (the Baseline,
         Main, or Extended profile) and level_idc is equal to 11,
         additionally by bit 4 (constraint_set3_flag) of the profile-iop
         byte.  When profile_idc is equal to 66, 77, or 88 (the
         Baseline, Main, or Extended profile) and level_idc is equal to
         11, and bit 4 (constraint_set3_flag) of the profile-iop byte is
         equal to 1, the default level is Level 1b.
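
         The byte layout described above can be illustrated with the
         following Python sketch.  It only splits the three bytes and
         applies the Level 1b rule stated in this parameter definition;
         the function name and the returned dictionary keys are
         hypothetical.

```python
def parse_profile_level_id(value):
    """Split a profile-level-id string into its three bytes."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be six hexadecimal digits")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set0_flag is the most significant bit of profile-iop,
    # so constraint_set3_flag is bit 4 when the least significant bit
    # is counted as bit 0.
    constraint_set3_flag = (profile_iop >> 4) & 1
    # Level 1b rule for the Baseline (66), Main (77), and Extended (88)
    # profiles, as stated in the text above.
    level_1b = (profile_idc in (66, 77, 88)
                and level_idc == 11
                and constraint_set3_flag == 1)
    return {
        "profile_idc": profile_idc,
        "profile_iop": profile_iop,
        "level_idc": level_idc,
        "level_1b": level_1b,
    }
```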

         Table 13 lists all profiles defined in Annexes A and G of
         [H.264] and, for each of the profiles, the possible
         combinations of profile_idc and profile-iop that represent the
         same sub-profile.

         Table 13.  Combinations of profile_idc and profile-iop
         representing the same sub-profile corresponding to the full set
         of coding tools supported by one profile.  In the following, x
         may be either 0 or 1, while the profile names are indicated as
         follows.  CB: Constrained Baseline profile, B: Baseline
         profile, M: Main profile, E: Extended profile, H: High profile,
         H10: High 10 profile, H42: High 4:2:2 profile, H44: High 4:4:4
         Predictive profile, H10I: High 10 Intra profile, H42I: High
         4:2:2 Intra profile, H44I: High 4:4:4 Intra profile, C44I:
         CAVLC 4:4:4 Intra profile, SB: Scalable Baseline profile, SH:
         Scalable High profile, and SHI: Scalable High Intra profile.

             Profile     profile_idc             profile-iop
                         (hexadecimal)           (binary)

             CB          42 (B)                  x1xx0000
               same as:  4D (M)                  1xxx0000
               same as:  58 (E)                  11xx0000
             B           42 (B)                  x0xx0000
               same as:  58 (E)                  10xx0000
             M           4D (M)                  0x0x0000
             E           58                      00xx0000
             H           64                      00000000
             H10         6E                      00000000
             H42         7A                      00000000
             H44         F4                      00000000
             H10I        6E                      00010000
             H42I        7A                      00010000
             H44I        F4                      00010000
             C44I        2C                      00010000
             SB          53                      x0000000
             SH          56                      0x000000
             SHI         56                      0x010000

         For example, in the table above, profile_idc equal to 58
         (Extended) with profile-iop equal to 11xx0000 indicates the
         same sub-profile corresponding to profile_idc equal to 42
         (Baseline) with profile-iop equal to x1xx0000.  Note that other
         combinations of profile_idc and profile-iop (not listed in
         Table 13) may represent a sub-profile equivalent to the common
         subset of coding tools for more than one profile.  Note also
         that a decoder conforming to a certain profile may be able to
         decode bitstreams conforming to other profiles.
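
         A receiver that wants to recognize the equivalent combinations
         of Table 13 could match the profile-iop bits against the
         table's patterns, as in the following Python sketch.  The
         table contents are transcribed from Table 13; the names
         TABLE_13, matches, and sub_profiles are hypothetical, and
         combinations not listed in the table simply yield an empty
         result.

```python
# (profile name, profile_idc, profile-iop pattern); "x" matches 0 or 1.
TABLE_13 = [
    ("CB", 0x42, "x1xx0000"), ("CB", 0x4D, "1xxx0000"),
    ("CB", 0x58, "11xx0000"),
    ("B", 0x42, "x0xx0000"), ("B", 0x58, "10xx0000"),
    ("M", 0x4D, "0x0x0000"), ("E", 0x58, "00xx0000"),
    ("H", 0x64, "00000000"), ("H10", 0x6E, "00000000"),
    ("H42", 0x7A, "00000000"), ("H44", 0xF4, "00000000"),
    ("H10I", 0x6E, "00010000"), ("H42I", 0x7A, "00010000"),
    ("H44I", 0xF4, "00010000"), ("C44I", 0x2C, "00010000"),
    ("SB", 0x53, "x0000000"), ("SH", 0x56, "0x000000"),
    ("SHI", 0x56, "0x010000"),
]


def matches(profile_iop, pattern):
    """True if the 8 bits of profile_iop fit the pattern (MSB first)."""
    bits = format(profile_iop, "08b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def sub_profiles(profile_idc, profile_iop):
    """Names of the Table 13 rows matched by the given two bytes."""
    return [name for name, idc, pattern in TABLE_13
            if idc == profile_idc and matches(profile_iop, pattern)]
```

         For instance, profile_idc 58 with profile-iop 11000000 and
         profile_idc 42 with profile-iop 11100000 both resolve to the
         Constrained Baseline row in this sketch.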

         If profile-level-id is used to indicate stream properties, it
         indicates that, to decode the stream, the minimum subset of
         coding tools a decoder has to support is the default sub-
         profile, and the lowest level the decoder has to support is the
         default level.

         If the profile-level-id parameter is used for capability
         exchange or session setup, it indicates the subset of coding
         tools, which is equal to the default sub-profile, that the
         codec supports for both receiving and sending.  If max-recv-
         level is not present, the default level from profile-level-id
         indicates the highest level the codec wishes to support.  If
         max-recv-level is present, it indicates the highest level the
         codec supports for receiving.  For either receiving or sending,
         all levels that are lower than the highest level supported MUST
         also be supported.

            Informative note: Capability exchange and session setup
            procedures should provide means to list the capabilities for
            each supported sub-profile separately.  For example, the
            one-of-N codec selection procedure of the SDP Offer/Answer
            model can be used (Section 10.2 of [RFC3264]).  The one-of-N
            codec selection procedure may also be used to provide
            different combinations of profile_idc and profile-iop that
            represent the same sub-profile.  When there are many
            different combinations of profile_idc and profile-iop that
            represent the same sub-profile, using the one-of-N codec
            selection procedure may result in a fairly large SDP
            message.  Therefore, a receiver should understand the
            different equivalent combinations of profile_idc and
            profile-iop that represent the same sub-profile, and be
            ready to accept an offer using any of the equivalent
            combinations.

         If no profile-level-id is present, the Baseline Profile without
         additional constraints at Level 1 MUST be implied.

      max-recv-level:
         This parameter MAY be used to indicate the highest level a
         receiver supports when the highest level is higher than the
         default level (the level indicated by profile-level-id).  The
         value of max-recv-level is a base16 (hexadecimal)
         representation of the two bytes after the syntax element
         profile_idc in the sequence parameter set NAL unit specified in
         [H.264]: profile-iop (as defined above) and level_idc.  If (the
         level_idc byte of max-recv-level is equal to 11 and bit 4 of
         the profile-iop byte of max-recv-level is equal to 1) or (the
         level_idc byte of max-recv-level is equal to 9 and bit 4 of the
         profile-iop byte of max-recv-level is equal to 0), the highest
         level the receiver supports is Level 1b.  Otherwise, the
         highest level the receiver supports is equal to the level_idc
         byte of max-recv-level divided by 10.

         max-recv-level MUST NOT be present if the highest level the
         receiver supports is not higher than the default level.
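
         The level derivation described for max-recv-level can be
         sketched in Python as follows.  The function name and the
         string form of the result are hypothetical choices made for
         illustration.

```python
def highest_level(max_recv_level):
    """Derive the level name from a max-recv-level value (4 hex digits)."""
    raw = bytes.fromhex(max_recv_level)
    if len(raw) != 2:
        raise ValueError("max-recv-level must be four hexadecimal digits")
    profile_iop, level_idc = raw
    bit4 = (profile_iop >> 4) & 1  # constraint_set3_flag
    if (level_idc == 11 and bit4 == 1) or (level_idc == 9 and bit4 == 0):
        return "1b"
    # Otherwise the level is level_idc divided by 10, e.g., 31 -> 3.1.
    major, minor = divmod(level_idc, 10)
    return f"{major}.{minor}" if minor else str(major)
```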

      max-recv-base-level:
         This parameter MAY be used to indicate the highest level a
         receiver supports for the base layer when negotiating an SVC
         stream.  The value of max-recv-base-level is a base16
         (hexadecimal) representation of the two bytes after the syntax
         element profile_idc in the sequence parameter set NAL unit
         specified in [H.264]: profile-iop (as defined above) and
         level_idc.  If (the level_idc byte of max-recv-base-level is
         equal to 11 and bit 4 of the profile-iop byte of
         max-recv-base-level is equal to 1) or (the level_idc byte of
         max-recv-base-level is equal to 9 and bit 4 of the profile-iop
         byte of max-recv-base-level is equal to 0), the highest level
         the receiver supports for the base layer is Level 1b.
         Otherwise, the highest level the receiver supports for the base
         layer is equal to the level_idc byte of max-recv-base-level
         divided by 10.

      max-mbps, max-fs, max-cpb, max-dpb, and max-br:
         The common properties of these parameters are specified in
         [RFC6184].

      max-mbps: This parameter is as specified in [RFC6184].

      max-fs: This parameter is as specified in [RFC6184].

      max-cpb: The value of max-cpb is an integer indicating the maximum
         coded picture buffer size in units of 1000 bits for the VCL HRD
         parameters and in units of 1200 bits for the NAL HRD
         parameters.  Note that this parameter does not use units of
         cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [H.264]).
         The max-cpb parameter signals that the receiver has more memory
         than the minimum amount of coded picture buffer memory required
         by the signaled highest level conveyed in the value of the
         profile-level-id parameter or the max-recv-level parameter.
         When max-cpb is signaled, the receiver MUST be able to decode
         NAL unit streams that conform to the signaled highest level,
         with the exception that the MaxCPB value in Table A-1 of
         [H.264] for the signaled highest level is replaced with the
         value of max-cpb (after taking cpbBrVclFactor and
         cpbBrNALFactor into consideration when needed).  The value of
         max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into
         consideration when needed) MUST be greater than or equal to the
         value of MaxCPB given in Table A-1 of [H.264] for the highest
         level.  Senders MAY use this knowledge to construct coded video
         streams with greater variation of bitrate than can be achieved
         with the MaxCPB value in Table A-1 of [H.264].

            Informative note: The coded picture buffer is used in the
            Hypothetical Reference Decoder (HRD, Annex C) of [H.264].
            The use of the HRD is recommended in SVC encoders to verify
            that the produced bitstream conforms to the standard and to
            control the output bitrate.  Thus, the coded picture buffer
            is conceptually independent of any other potential buffers
            in the receiver, including de-interleaving, re-multiplexing,
            and de-jitter buffers.  The coded picture buffer need not be
            implemented in decoders as specified in Annex C of [H.264];
            standard-compliant decoders can have any buffering
            arrangements provided that they can decode standard-
            compliant bitstreams.  Thus, in practice, the input buffer
            for the video decoder can be integrated with the de-
            interleaving, re-multiplexing, and de-jitter buffers of the
            receiver.

      max-dpb: This parameter is as specified in [RFC6184].

      max-br: The value of max-br is an integer indicating the maximum
         video bitrate in units of 1000 bits per second for the VCL HRD
         parameters and in units of 1200 bits per second for the NAL HRD
         parameters.  Note that this parameter does not use units of
         cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [H.264]).

         The max-br parameter signals that the video decoder of the
         receiver is capable of decoding video at a higher bitrate than
         is required by the signaled highest level conveyed in the value
         of the profile-level-id parameter or the max-recv-level
         parameter.

         When max-br is signaled, the video codec of the receiver MUST
         be able to decode NAL unit streams that conform to the signaled
         highest level, with the following exceptions in the limits
         specified by the highest level:

         o  The value of max-br (after taking cpbBrVclFactor and
            cpbBrNALFactor into consideration when needed) replaces the
            MaxBR value in Table A-1 of [H.264] for the highest level.

         o  When the max-cpb parameter is not present, the result of the
            following formula replaces the value of MaxCPB in Table A-1
            of [H.264]: (MaxCPB of the signaled level) * max-br / (MaxBR
            of the signaled highest level).

         For example, if a receiver signals capability for Main profile
         Level 1.2 with max-br equal to 1550, this indicates a maximum
         video bitrate of 1550 kbits/sec for VCL HRD parameters, a
         maximum video bitrate of 1860 kbits/sec for NAL HRD parameters,
         and a CPB size of 4036458 bits (1550000 / 384000 * 1000 *
         1000).

         The value of max-br (after taking cpbBrVclFactor and
         cpbBrNALFactor into consideration when needed) MUST be greater
         than or equal to the value MaxBR given in Table A-1 of [H.264]
         for the signaled highest level.
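
         The arithmetic behind the example above can be reproduced with
         a short Python sketch.  It is illustrative only: the function
         name effective_limits and its argument names are hypothetical,
         and the level limits (MaxBR and MaxCPB from Table A-1 of
         [H.264], in VCL units of 1000 bits/s and 1000 bits) are passed
         in by the caller rather than looked up.  The derived CPB size
         is truncated to whole bits.

```python
def effective_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Return (vcl_bps, nal_bps, cpb_bits) implied by max-br.

    max_br and level_max_br are in units of 1000 bits/s (VCL);
    level_max_cpb and max_cpb are in units of 1000 bits (VCL).
    """
    if max_br < level_max_br:
        raise ValueError("max-br must be >= MaxBR of the signaled level")
    vcl_bps = max_br * 1000   # VCL HRD parameters
    nal_bps = max_br * 1200   # NAL HRD parameters
    if max_cpb is not None:
        # max-cpb is present, so it replaces MaxCPB directly.
        cpb_bits = max_cpb * 1000
    else:
        # (MaxCPB of the level) * max-br / (MaxBR of the level)
        cpb_bits = level_max_cpb * 1000 * max_br // level_max_br
    return vcl_bps, nal_bps, cpb_bits
```

         With the values used in the example (max-br of 1550, and 384
         and 1000 as the level's bitrate and CPB limits),
         effective_limits(1550, 384, 1000) returns
         (1550000, 1860000, 4036458).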

         Senders MAY use this knowledge to send higher-bitrate video as
         allowed in the level definition of SVC, to achieve improved
         video quality.

            Informative note: This parameter was added primarily to
            complement a similar codepoint in the ITU-T Recommendation
            H.245, so as to facilitate signaling gateway designs.  No
            assumption can be made from the value of this parameter that
            the network is capable of handling such bitrates at any
            given time.  In particular, no conclusion can be drawn that
            the signaled bitrate is possible under congestion control
            constraints.

      redundant-pic-cap:
         This parameter is as specified in [RFC6184].

      sprop-parameter-sets:
         This parameter MAY be used to convey any sequence parameter
         set, subset sequence parameter set, and picture parameter set
         NAL units (herein referred to as the initial parameter set NAL
         units) that can be placed in the NAL unit stream to precede any
         other NAL units in decoding order and that are associated with
         the default level of profile-level-id.  The parameter MUST NOT
         be used to indicate codec capability in any capability exchange
         procedure.  The value of the parameter is a comma (',')
         separated list of base64 [RFC4648] representations of the
         parameter set NAL units as specified in Sections 7.3.2.1,
         7.3.2.2, and G.7.3.2.1 of [H.264].  Note that the number of
         bytes in a parameter set NAL unit is typically less than 10,
         but a picture parameter set NAL unit can contain several
         hundred bytes.

            Informative note: When several payload types are offered in
            the SDP Offer/Answer model, each with its own sprop-
            parameter-sets parameter, then the receiver cannot assume
            that those parameter sets do not use conflicting storage
            locations (i.e., identical values of parameter set
            identifiers).  Therefore, a receiver should buffer all
            sprop-parameter-sets and make them available to the decoder
            instance that decodes a certain payload type.
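
         As an illustration of the value syntax of sprop-parameter-sets,
         the following Python sketch splits the comma-separated list and
         decodes each base64 item into a NAL unit.  The function name is
         hypothetical.  The sketch reads nal_unit_type from the low five
         bits of the first NAL unit byte, where 7, 8, and 15 denote a
         sequence parameter set, a picture parameter set, and a subset
         sequence parameter set, respectively.

```python
import base64

NAL_TYPE_NAMES = {7: "SPS", 8: "PPS", 15: "subset SPS"}


def parse_sprop_parameter_sets(value: str):
    """Return a list of (nal_unit_type, nal_bytes) tuples."""
    result = []
    for item in value.split(","):
        nal = base64.b64decode(item, validate=True)
        if not nal:
            raise ValueError("empty parameter set NAL unit")
        nal_unit_type = nal[0] & 0x1F  # low five bits of the NAL header
        result.append((nal_unit_type, nal))
    return result
```

         A receiver handling several payload types could keep one such
         parsed list per payload type and hand it to the decoder
         instance for that payload type, in line with the note above.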

      sprop-level-parameter-sets:
         This parameter MAY be used to convey any sequence, subset
         sequence, and picture parameter set NAL units (herein referred
         to as the initial parameter set NAL units) that can be placed
         in the NAL unit stream to precede any other NAL units in
         decoding order and that are associated with one or more levels
         different than the default level of profile-level-id.  The
         parameter MUST NOT be used to indicate codec capability in any
         capability exchange procedure.

         The sprop-level-parameter-sets parameter contains parameter
         sets for one or more levels that are different than the default
         level.  All parameter sets targeted for use when one level of
         the default sub-profile is accepted by a receiver are clustered
         and prefixed with a three-byte field that has the same syntax
         as profile-level-id.  This enables the receiver to install the
         parameter sets for the accepted level and discard the rest.
         The three-byte field is named PLId, and all parameter sets
         associated with one level are named PSL, which has the same
         syntax as sprop-parameter-sets.  Parameter sets for each level
         are represented in the form of PLId:PSL, i.e., PLId followed by
         a colon (':') and the base64 [RFC4648] representation of the
         initial parameter set NAL units for the level.  Each pair of
         PLId:PSL is also separated by a colon.  Note that a PSL can
         contain multiple parameter sets for that level, separated with
         commas (',').

         The subset of coding tools indicated by each PLId field MUST be
         equal to the default sub-profile, and the level indicated by
         each PLId field MUST be different than the default level.

            Informative note: This parameter allows for efficient level
            downgrade or upgrade in SDP Offer/Answer and out-of-band
            transport of parameter sets, simultaneously.
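
         The PLId:PSL structure described above can be parsed along the
         following lines.  This Python sketch is illustrative only, and
         the function names are hypothetical.  It relies on the fact
         that neither ':' nor ',' occurs in the base64 alphabet, so
         splitting on those characters is unambiguous.

```python
import base64


def parse_sprop_level_parameter_sets(value: str):
    """Return {PLId (six hex digits, lowercase): [NAL unit bytes, ...]}."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected PLId:PSL pairs separated by ':'")
    levels = {}
    # Fields alternate: PLId, PSL, PLId, PSL, ...
    for plid, psl in zip(fields[0::2], fields[1::2]):
        if len(bytes.fromhex(plid)) != 3:
            raise ValueError("PLId must be three bytes in base16")
        levels[plid.lower()] = [
            base64.b64decode(item, validate=True) for item in psl.split(",")
        ]
    return levels


def select_level(levels, accepted_plid: str):
    """Keep only the parameter sets for the accepted level."""
    return levels[accepted_plid.lower()]
```

         After a level is accepted, select_level returns the parameter
         sets to install, and the entries for all other levels can be
         discarded.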

      in-band-parameter-sets:
         This parameter MAY be used to indicate a receiver capability.
         The value MAY be equal to either 0 or 1.  The value 1 indicates
         that the receiver discards out-of-band parameter sets in sprop-
         parameter-sets and sprop-level-parameter-sets, therefore the
         sender MUST transmit all parameter sets in-band.  The value 0
         indicates that the receiver utilizes out-of-band parameter sets
         included in sprop-parameter-sets and/or sprop-level-parameter-
         sets.  However, in this case, the sender MAY still choose to
         send parameter sets in-band.  When the parameter is not
         present, this receiver capability is not specified, and
         therefore the sender MAY send out-of-band parameter sets only,
         or it MAY send in-band parameter sets only, or it MAY send
         both.

      packetization-mode:
         This parameter is as specified in [RFC6184].  When the mst-mode
         parameter is present, the value of this parameter is
         additionally constrained as follows.  If mst-mode is equal to
         "NI-T", "NI-C", or "NI-TC", packetization-mode MUST NOT be
         equal to 2.  Otherwise (mst-mode is equal to "I-C"),
         packetization-mode MUST be equal to 2.
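
         The additional constraint on packetization-mode can be checked
         as in the following Python sketch.  The function name is
         hypothetical, None stands for an absent mst-mode parameter,
         and the sketch assumes that 0, 1, and 2 are the only
         packetization-mode values defined in [RFC6184].

```python
MST_MODES = {"NI-T", "NI-C", "NI-TC", "I-C"}


def packetization_mode_allowed(mst_mode, packetization_mode: int) -> bool:
    """Check packetization-mode against mst-mode (None = absent)."""
    if mst_mode is None:
        # No additional constraint; only the value range is checked here.
        return packetization_mode in (0, 1, 2)
    if mst_mode not in MST_MODES:
        raise ValueError(f"unknown mst-mode token: {mst_mode!r}")
    if mst_mode == "I-C":
        return packetization_mode == 2
    # "NI-T", "NI-C", or "NI-TC": packetization-mode MUST NOT be 2.
    return packetization_mode in (0, 1)
```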

      sprop-interleaving-depth:
         This parameter is as specified in [RFC6184].

      sprop-deint-buf-req:
         This parameter is as specified in [RFC6184].

      deint-buf-cap:
         This parameter is as specified in [RFC6184].

      sprop-init-buf-time:
         This parameter is as specified in [RFC6184].

      sprop-max-don-diff:
         This parameter is as specified in [RFC6184].

      max-rcmd-nalu-size:
         This parameter is as specified in [RFC6184].

      mst-mode:
         This parameter MAY be used to signal the properties of a NAL
         unit stream or the capabilities of a receiver implementation.
         If this parameter is present, multi-session transmission MUST
         be used.  Otherwise (this parameter is not present), single-
         session transmission MUST be used.  When this parameter is
         present, the following applies.  When the value of mst-mode is
         equal to "NI-T", the NI-T mode MUST be used.  When the value of
         mst-mode is equal to "NI-C", the NI-C mode MUST be used.  When
         the value of mst-mode is equal to "NI-TC", the NI-TC mode MUST
         be used.  When the value of mst-mode is equal to "I-C", the I-C
         mode MUST be used.  The value of mst-mode MUST have one of the
         following tokens: "NI-T", "NI-C", "NI-TC", or "I-C".

         All RTP sessions in an MST MUST have the same value of mst-
         mode.

      sprop-mst-csdon-always-present:
         This parameter MUST NOT be present when mst-mode is not present
         or the value of mst-mode is equal to "NI-T" or "I-C".  This
         parameter signals the properties of the NAL unit stream.  When
         sprop-mst-csdon-always-present is present and the value is
         equal to 1, packetization-mode MUST be equal to 1, and all the
         RTP packets carrying the NAL unit stream MUST be STAP-A packets
         containing a PACSI NAL unit that further contains the DONC
         field or NI-MTAP packets with the J field equal to 1.  When
         sprop-mst-csdon-always-present is present and the value is
         equal to 1, the CS-DON value of any particular NAL unit can be
         derived solely according to information in the packet
         containing the NAL unit.

         When sprop-mst-csdon-always-present is present in the current
         RTP session, it MUST also be present in all the RTP sessions
         on which the current RTP session depends, and its value MUST
         be identical for the current RTP session and all of those RTP
         sessions.

      sprop-mst-remux-buf-size:
         This parameter MUST NOT be present when mst-mode is not present
         or the value of mst-mode is equal to "NI-T".  This parameter
         MUST be present when mst-mode is present and the value of mst-
         mode is equal to "NI-C", "NI-TC", or "I-C".

         This parameter signals the properties of the NAL unit stream.
         It MUST be set to a value one less than the minimum re-
         multiplexing buffer size (in NAL units), so that it is
         guaranteed that receivers can reconstruct NAL unit decoding
         order as specified in Subsection 6.2.2.

         The value of sprop-mst-remux-buf-size MUST be an integer in the
         range of 0 to 32767, inclusive.

      sprop-remux-buf-req:
         This parameter MUST NOT be present when mst-mode is not present
         or the value of mst-mode is equal to "NI-T".  It MUST be
         present when mst-mode is present and the value of mst-mode is
         equal to "NI-C", "NI-TC", or "I-C".

         sprop-remux-buf-req signals the required size of the re-
         multiplexing buffer for the NAL unit stream.  It is guaranteed
         that receivers can recover the decoding order of the received
         NAL units from the current RTP session and the RTP sessions the
         current RTP session depends on as specified in Section 6.2.2,
         when the re-multiplexing buffer size is at least the value of
         sprop-remux-buf-req in units of bytes.

         The value of sprop-remux-buf-req MUST be an integer in the
         range of 0 to 4294967295, inclusive.

      remux-buf-cap:
         This parameter MUST NOT be present when mst-mode is not present
         or the value of mst-mode is equal to "NI-T".  This parameter
         MAY be used to signal the capabilities of a receiver
         implementation and indicates the amount of re-multiplexing
         buffer space in units of bytes that the receiver has available
         for recovering the NAL unit decoding order as specified in
         Section 6.2.2.  A receiver is able to handle any NAL unit
         stream for which the value of the sprop-remux-buf-req parameter
         is smaller than or equal to this parameter.

         If the parameter is not present, then a value of 0 MUST be used
         for remux-buf-cap.  The value of remux-buf-cap MUST be an
         integer in the range of 0 to 4294967295, inclusive.
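
         The relation between remux-buf-cap and sprop-remux-buf-req can
         be expressed as a one-line comparison.  The following Python
         sketch is a minimal illustration with a hypothetical function
         name.  An absent remux-buf-cap is passed as None and treated
         as 0, as stated above.

```python
MAX_UINT32 = 4294967295


def receiver_can_handle(sprop_remux_buf_req: int, remux_buf_cap=None) -> bool:
    """True if the stream's buffer requirement fits the receiver's cap."""
    cap = 0 if remux_buf_cap is None else remux_buf_cap
    for value in (sprop_remux_buf_req, cap):
        if not 0 <= value <= MAX_UINT32:
            raise ValueError("value out of range 0..4294967295")
    # Both values are in units of bytes.
    return sprop_remux_buf_req <= cap
```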

      sprop-remux-init-buf-time:
         This parameter MAY be used to signal the properties of the NAL
         unit stream.  The parameter MUST NOT be present if mst-mode is
         not present or the value of mst-mode is equal to "NI-T".

         The parameter signals the initial buffering time that a
         receiver MUST wait before starting to recover the NAL unit
         decoding order as specified in Section 6.2.2 of this memo.

         The parameter is coded as a non-negative base10 integer
         representation in clock ticks of a 90-kHz clock.  If the
         parameter is not present, then no initial buffering time value
         is defined.  Otherwise, the value of sprop-remux-init-buf-time
         MUST be an integer in the range of 0 to 4294967295, inclusive.
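
         Since the value is expressed in ticks of a 90-kHz clock, a
         receiver may want to convert it to seconds.  The following
         Python sketch (hypothetical function name) performs the
         conversion exactly by using a rational number.

```python
from fractions import Fraction


def init_buf_time_seconds(value: str) -> Fraction:
    """Convert a sprop-remux-init-buf-time value to seconds."""
    if not (value.isascii() and value.isdigit()):
        raise ValueError("expected a non-negative base10 integer")
    ticks = int(value)
    if ticks > 4294967295:
        raise ValueError("value out of range 0..4294967295")
    return Fraction(ticks, 90000)  # 90000 ticks per second
```

         For example, a value of "45000" corresponds to half a second.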

      sprop-mst-max-don-diff:
         This parameter MAY be used to signal the properties of the NAL
         unit stream.  It MUST NOT be used to signal transmitter or
         receiver or codec capabilities.  The parameter MUST NOT be
         present if mst-mode is not present or the value of mst-mode is
         equal to "NI-T".  sprop-mst-max-don-diff is an integer in the
         range of 0 to 32767, inclusive.  If sprop-mst-max-don-diff is
         not present, the value of the parameter is unspecified.
         sprop-mst-max-don-diff is calculated in the same way as
         sprop-max-don-diff as specified in [RFC6184], with the decoding
         order number replaced by the cross-session decoding order
         number.
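
         The presence rules stated above for the MST-related parameters
         (from sprop-mst-csdon-always-present through sprop-mst-max-don-
         diff) can be collected in a small table.  The following Python
         sketch is one possible way to check them.  The names FORBIDDEN,
         REQUIRED, and check_presence are hypothetical, params is
         assumed to be a dictionary of the parameters of one RTP
         session, and None stands for an absent mst-mode.

```python
# mst-mode values (None = absent) for which a parameter MUST NOT be present
FORBIDDEN = {
    "sprop-mst-csdon-always-present": {None, "NI-T", "I-C"},
    "sprop-mst-remux-buf-size": {None, "NI-T"},
    "sprop-remux-buf-req": {None, "NI-T"},
    "remux-buf-cap": {None, "NI-T"},
    "sprop-remux-init-buf-time": {None, "NI-T"},
    "sprop-mst-max-don-diff": {None, "NI-T"},
}
# mst-mode values for which a parameter MUST be present
REQUIRED = {
    "sprop-mst-remux-buf-size": {"NI-C", "NI-TC", "I-C"},
    "sprop-remux-buf-req": {"NI-C", "NI-TC", "I-C"},
}


def check_presence(params: dict) -> list:
    """Return a list of violated presence rules (empty if none)."""
    mode = params.get("mst-mode")
    errors = []
    for name, modes in FORBIDDEN.items():
        if name in params and mode in modes:
            errors.append(f"{name} MUST NOT be present")
    for name, modes in REQUIRED.items():
        if name not in params and mode in modes:
            errors.append(f"{name} MUST be present")
    return errors
```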

      sprop-scalability-info:
         This parameter MAY be used to convey the NAL unit containing
         the scalability information SEI message as specified in Annex G
         of [H.264].  This parameter MAY be used to signal the contained
         layers of an SVC bitstream.  The parameter MUST NOT be used to
         indicate codec capability in any capability exchange procedure.
         The value of the parameter is the base64 [RFC4648]
         representation of the NAL unit containing the scalability
         information SEI message.  If present, the NAL unit MUST contain
         only one SEI message that is a scalability information SEI
         message.

         This parameter MAY be used in an offering or declarative SDP
         message to indicate what layers (operation points) can be
         provided.  A receiver MAY indicate its choice of one layer
         using the optional media type parameter scalable-layer-id.

      scalable-layer-id:
         This parameter MAY be used to signal a receiver's choice of the
         offered or declared operation points or layers using
         sprop-scalability-info or sprop-operation-point-info.  The
         value of scalable-layer-id is a base16 representation of the
         layer_id[ i ] syntax element in the scalability information
         SEI message as specified in Annex G of [H.264] or of the
         layer-ID contained in sprop-operation-point-info.

      sprop-operation-point-info:
         This parameter MAY be used to describe the operation points of
         an RTP session.  The value of this parameter consists of a
         comma-separated list of operation-point-description vectors.
         The values given by the operation-point-description vectors are
         the same as, or are derived from, the values that would be
         given for a scalable layer in the scalability information SEI
         message as specified in Annex G of [H.264], where the term
         scalable layer in the scalability information SEI message
         refers to all NAL units associated with the same values of
         temporal_id, dependency_id, and quality_id.  In this memo, such
         a set of NAL units is called an operation point.

         Each operation-point-description vector has ten elements,
         provided as a comma-separated list of values as defined below.
         The first value of the operation-point-description vector is
         preceded by a '<', and the last value of the operation-point-
         description vector is followed by a '>'.  If the sprop-
         operation-point-info is followed by exactly one operation-
         point-description vector, this describes the highest operation
         point contained in the RTP session.  If there are two or more
         operation-point-description vectors, the first describes the
         lowest and the last describes the highest operation point
         contained in the RTP session.

         The values given by the operation-point-description vector are
         as follows, in the order listed:

          - layer-ID: This value specifies the layer identifier of the
            operation point, which is identical to the layer_id that
            would be indicated (for the same values of dependency_id,
            quality_id, and temporal_id) in the scalability information
            SEI message.  This field MAY be empty, indicating that the
            value is unspecified.  When there are multiple operation-
            point-description vectors with layer-ID, the values of
            layer-ID do not need to be consecutive.

          - temporal-ID: This value specifies the temporal_id of the
            operation point.  This field MUST NOT be empty.

          - dependency-ID: This value specifies the dependency_id of
            the operation point.  This field MUST NOT be empty.

          - quality-ID: This value specifies the quality_id of the
            operation point.  This field MUST NOT be empty.

          - profile-level-ID: This value specifies the profile-level-idc
            of the operation point in the base16 format.  The default
            sub-profile or default level indicated by the parameter
            profile-level-ID in the sprop-operation-point-info vector
            SHALL be equal to or lower than the default sub-profile or
            default level indicated by profile-level-id, whether that
            parameter is present or its default value is used.  This field
            MAY be empty, indicating that the value is unspecified.

          - avg-framerate: This value specifies the average frame rate
            of the operation point.  This value is given as an integer
            in frames per 256 seconds.  The field MAY be empty,
            indicating that the value is unspecified.

          - width: This value specifies the width dimension in pixels of
            decoded frames for the operation point.  This parameter is
            not directly given in the scalability information SEI
            message.  This field MAY be empty, indicating that the value
            is unspecified.

          - height: This value gives the height dimension in pixels of
            decoded frames for the operation point.  This parameter is
            not directly given in the scalability information SEI.  This
            field MAY be empty, indicating that the value is
            unspecified.

          - avg-bitrate: This value specifies the average bitrate of the
            operation point.  This parameter is given as an integer in
            kbits per second over the entire stream.  Note that this
            parameter is provided in the scalability information SEI
            message in bits per second and calculated over a variable
            time window.  This field MAY be empty, indicating that the
            value is unspecified.

          - max-bitrate: This value specifies the maximum bitrate of the
            operation point.  This parameter is given as an integer in
            kbits per second and describes the maximum bitrate for each
            one-second window.  Note that this parameter is provided in
            the scalability information SEI message in bits per second
            and is calculated over a variable time window.  This field
            MAY be empty, indicating that the value is unspecified.

            Similarly to sprop-scalability-info, this parameter MAY be
            used in an offering or declarative SDP message to indicate
            what layers (operation points) can be provided.  A receiver
            MAY indicate its choice of the highest layer it wants to
            send and/or receive using the optional media type parameter
            scalable-layer-id.
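
         The vector syntax described above can be parsed as in the
         following Python sketch.  It is illustrative only.  The class
         and function names are hypothetical, numeric fields are
         assumed to be written in base10, and layer-ID and
         profile-level-ID are kept as strings.  The avg-framerate field
         is converted from frames per 256 seconds to frames per second.

```python
import re
from typing import NamedTuple, Optional


class OperationPoint(NamedTuple):
    layer_id: Optional[str]
    temporal_id: int
    dependency_id: int
    quality_id: int
    profile_level_id: Optional[str]
    avg_framerate_fps: Optional[float]
    width: Optional[int]
    height: Optional[int]
    avg_bitrate_kbps: Optional[int]
    max_bitrate_kbps: Optional[int]


def _opt_int(text: str) -> Optional[int]:
    """Empty fields mean 'unspecified' and map to None."""
    return int(text) if text else None


def parse_operation_points(value: str):
    """Parse '<...>,<...>' into a list of OperationPoint tuples."""
    points = []
    for body in re.findall(r"<([^<>]*)>", value):
        f = [x.strip() for x in body.split(",")]
        if len(f) != 10:
            raise ValueError("each vector must have ten elements")
        if not (f[1] and f[2] and f[3]):
            raise ValueError("temporal-ID, dependency-ID, and quality-ID "
                             "MUST NOT be empty")
        points.append(OperationPoint(
            f[0] or None, int(f[1]), int(f[2]), int(f[3]), f[4] or None,
            int(f[5]) / 256 if f[5] else None,  # frames per 256 s -> fps
            _opt_int(f[6]), _opt_int(f[7]), _opt_int(f[8]), _opt_int(f[9]),
        ))
    return points
```

         When two or more vectors are present, the first and last
         entries of the returned list describe the lowest and highest
         operation points of the RTP session, respectively.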

      sprop-no-NAL-reordering-required:
         This parameter MAY be used to signal the properties of the NAL
         unit stream.  This parameter MUST NOT be present when mst-mode
         is not present or the value of mst-mode is not equal to "NI-T".
         The presence of this parameter indicates that no reordering of
         non-VCL or VCL NAL units is required for the decoding order
         recovery process.

      sprop-avc-ready:
         This parameter MAY be used to indicate the properties of the
         NAL unit stream.  The presence of this parameter indicates that
         the RTP session, if used in SST, or used in MST combined with
         other RTP sessions also with this parameter present, can be
         processed by an [RFC6184] receiver.  This parameter MAY be used
         with RTP sessions with media subtype H264-SVC.

      Encoding considerations:
         This media type is framed and binary; see Section 4.8 of RFC
         4288 [RFC4288].

      Security considerations:
         See Section 8 of RFC 6190.

      Published specification:
         Please refer to RFC 6190 and its Section 13.

      Additional information:
         none

      File extensions:     none

      Macintosh file type code: none

      Object identifier or OID: none

      Person & email address to contact for further information:

         Ye-Kui Wang, yekui.wang@huawei.com

      Intended usage:      COMMON

      Restrictions on usage:
         This media type depends on RTP framing, and hence is only
         defined for transfer via RTP [RFC3550].  Transport within other
         framing protocols is not defined at this time.

      Interoperability considerations:
         The media subtype name contains "SVC" to avoid potential
         conflict with RFC 3984 and its potential future replacement RTP
         payload format for H.264 non-SVC profiles.

      Applications that use this media type:
         Real-time video applications like video streaming, video
         telephony, and video conferencing.

      Author:

         Ye-Kui Wang, yekui.wang@huawei.com

      Change controller:
         IETF Audio/Video Transport working group delegated from the
         IESG.
