6. De-Packetization Process
6.1. De-Packetization Process for Single-Session Transmission
For single-session transmission, where a single RTP session is used, the de-packetization process specified in Section 7 of [RFC6184] applies.

6.2. De-Packetization Process for Multi-Session Transmission
For multi-session transmission, where more than one RTP session is used to receive data from the same SVC bitstream, the de-packetization process is specified as follows.

As for a single RTP session, the general concept behind the de-packetization process is to reorder NAL units from transmission order to the NAL unit decoding order.

The sessions to be received MUST be identified by mechanisms specified in Section 7.2.3. An enhancement RTP session typically contains an RTP stream that depends on at least one other RTP session, as indicated by mechanisms defined in Section 7.2.3. A lower RTP session to an enhancement RTP session is an RTP session on which the enhancement RTP session depends. The lowest RTP session for a receiver is the base RTP session, which does not depend on any other RTP session received by the receiver. The highest RTP session for a receiver is the RTP session on which no other RTP session received by the receiver depends.
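The definitions of the base and highest RTP sessions above can be illustrated with a minimal sketch. It assumes the dependency signaling of Section 7.2.3 has already been parsed into a mapping from each received session to the set of sessions it depends on; the function name and the data layout are hypothetical and not part of this memo.

```python
def classify_sessions(depends_on):
    """Return (base_sessions, highest_sessions) among the received sessions.

    depends_on maps each received session id to the set of session ids it
    depends on (a hypothetical representation of the Section 7.2.3 signaling).
    """
    received = set(depends_on)
    # Collect every session that some received session depends on.
    depended_upon = set()
    for deps in depends_on.values():
        depended_upon |= deps
    # Base session: depends on no other received session.
    base = [s for s, deps in depends_on.items() if not (deps & received)]
    # Highest session: no other received session depends on it.
    highest = [s for s in depends_on if s not in depended_upon]
    return base, highest


# Sessions A, B, and C: B depends on A, and C depends on A and B.
sessions = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
base, highest = classify_sessions(sessions)
print(base, highest)
```

A receiver that subscribes only to A and B would obtain B as its highest RTP session from the same function.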
For each of the RTP sessions, the RTP reception process as specified in RFC 3550 is applied. Then the received packets are passed into the payload de-packetization process as defined in this memo.

The decoding order of the NAL units carried in all the associated RTP sessions is then recovered by applying one of the following subsections, depending on which of the MST packetization modes is in use.

6.2.1. Decoding Order Recovery for the NI-T and NI-TC Modes
The following process MUST be applied when the NI-T packetization mode is in use. The following process MAY be applied when the NI-TC packetization mode is in use.

The process is based on RTP session dependency signaling, RTP sequence numbers, and timestamps.

The decoding order of NAL units within an RTP packet stream in an RTP session is given by the ordering of sequence numbers SN of the RTP packets that contain the NAL units, and the order of appearance of NAL units within a packet.

Timing information according to the media timestamp TS, i.e., the NTP timestamp as derived from the RTP timestamp of an RTP packet, is associated with all NAL units contained in the same RTP packet received in an RTP session. For NI-MTAP packets, the NALU-time is derived for each contained NAL unit by using the "TS offset" value in the NI-MTAP packet as defined in Section 4.10, and is used instead of the RTP packet timestamp to derive the media timestamp, e.g., using the NTP wall clock as provided via RTCP sender reports. NAL units contained in fragmentation packets are handled as defragmented, entire NAL units with their own media timestamps. All NAL units associated with the same value of media timestamp TS are part of the same access unit AU(TS).

Any empty NAL units SHOULD be kept as, effectively, access unit indicators in the reordering process. Empty NAL units and PACSI NAL units SHOULD be removed before passing access unit data to the decoder.

   Informative note: These empty NAL units are used to associate NAL units present in other RTP sessions with RTP sessions not containing any data for an access unit of a particular time instance. They act as access unit indicators in sessions that would otherwise contain no data for the particular access unit. The presence of these NAL units is ensured by the packetization rules in Section 5.2.1.
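The derivation of the media timestamp TS from an RTP timestamp, as described above, can be sketched as follows. This is only an illustration under stated assumptions: each session supplies one (RTP timestamp, NTP time) pair taken from its most recent RTCP sender report, the NTP time is already expressed in seconds, and the 90 kHz clock of [RFC6184] is in use. The function name is hypothetical, and the NI-MTAP "TS offset" handling of Section 4.10 is not shown.

```python
from fractions import Fraction

RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit values that wrap around


def media_timestamp(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate=90000):
    """Map an RTP timestamp to wall-clock seconds using one sender report."""
    # Signed difference modulo 2^32, so that wraparound and packets sampled
    # slightly before the sender report are both handled.
    delta = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    # Exact rational arithmetic keeps equal sampling instants exactly equal.
    return Fraction(sr_ntp_seconds) + Fraction(delta, clock_rate)


# Two sessions with unrelated RTP timestamp offsets; both packets were
# sampled 0.5 seconds after the instant described by the sender reports.
ts_a = media_timestamp(46000, sr_rtp_ts=1000, sr_ntp_seconds=50)
ts_b = media_timestamp(37704, sr_rtp_ts=4294960000, sr_ntp_seconds=50)
print(ts_a == ts_b)
```

Exact fractions are used instead of floating point so that NAL units from different sessions can be matched by comparing TS values for equality when assembling AU(TS).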
It is assumed that the receiver has established an operation point (DID, QID, and TID values), and has identified the highest enhancement RTP session for this operation point. The decoding order of NAL units from multiple RTP streams in multiple RTP sessions MUST be recovered into a single sequence of NAL units, grouped into access units, by performing any process equivalent to the following steps. The general process is described in Section 4.2 of [RFC6051]. For convenience, the instructions of [RFC6051] are repeated and applied to NAL units rather than to full RTP packets. Additionally, SVC-specific extensions to the procedure in Section 4.2 of [RFC6051] are presented in the following list:

o  The process should be started with the NAL units received in the highest RTP session with the first media timestamp TS (in NTP format) available in the session's (de-jittering) buffer. It is assumed that packets in the de-jittering buffer are already stored in RTP sequence number order.

o  Collect all NAL units associated with the same value of media timestamp TS, starting from the highest RTP session, from all the (de-jittering) buffers of the received RTP sessions. The collected NAL units will be those associated with the access unit AU(TS).

o  Place the collected NAL units in the order of session dependency as derived by the dependency indication as specified in Section 7.2.3, starting from the lowest RTP session.

o  Place the session ordered NAL units in decoding order within the particular access unit by satisfying the NAL unit ordering rules for SVC access units, as described in the informative algorithm provided in Section 6.2.1.1.

o  Remove NI-MTAP and any PACSI NAL units from the access unit AU(TS).

o  The access units can then be transferred to the decoder. Access units AU(TS) are transferred to the decoder in the order of appearance (given by the order of RTP sequence numbers) of media timestamp values TS in the highest RTP session associated with access unit AU(TS).
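The collection and session-ordering steps listed above can be sketched roughly as follows. The sketch assumes that each de-jittering buffer is already a list of (media timestamp, NAL unit) pairs in RTP sequence number order and that the dependency signaling has been resolved into a list of sessions from lowest to highest. It does not perform the reordering within an access unit (Section 6.2.1.1) or the removal of NI-MTAP and PACSI NAL units. All names are hypothetical.

```python
from collections import defaultdict


def recover_access_units(buffers, dependency_order):
    """Group NAL units into access units across RTP sessions.

    buffers: session id -> list of (media_ts, nal) in sequence number order.
    dependency_order: session ids from the lowest (base) to the highest.
    Returns a list of (media_ts, [nal, ...]) in the order in which the
    timestamps appear in the highest session.
    """
    highest = dependency_order[-1]
    # Index every session's NAL units by media timestamp.
    per_session = {}
    for sid in dependency_order:
        by_ts = defaultdict(list)
        for ts, nal in buffers.get(sid, []):
            by_ts[ts].append(nal)
        per_session[sid] = by_ts
    # Timestamps in order of first appearance in the highest session.
    ts_order = list(dict.fromkeys(ts for ts, _ in buffers.get(highest, [])))
    access_units = []
    for ts in ts_order:
        au = []
        # Session dependency order, starting from the lowest session.
        for sid in dependency_order:
            au.extend(per_session[sid].get(ts, []))
        access_units.append((ts, au))
    return access_units


# Made-up buffer contents; labels such as "B5" are purely illustrative.
buffers = {
    "A": [(2, "A1"), (8, "A2"), (6, "A3")],
    "B": [(4, "B1"), (2, "B3"), (1, "B5"), (3, "B6"), (8, "B7"), (8, "B8")],
    "C": [(1, "C1"), (1, "C2"), (3, "C3"), (8, "C5")],
}
for ts, au in recover_access_units(buffers, ["A", "B", "C"]):
    print(ts, au)
```

NAL units whose timestamps never appear in the highest session (TS 4, 2, and 6 in this made-up input) are simply not emitted, which mirrors starting the process at the first timestamp available in the highest session.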
   Informative note: Due to packet loss, it is possible that not all sessions may have NAL units present for the media timestamp value TS present in the highest RTP session. In such a case, an algorithm may: a) proceed to the next complete access unit with NAL units present in all the received RTP sessions; or b) consider a new highest RTP session, the highest RTP session for which the access unit is complete, and apply the process above. The algorithm may return to the original highest RTP session when a complete and error-free access unit that contains NAL units in all the sessions is received.

The following gives an informative example.

The example shown in Figure 6 refers to three RTP sessions A, B, and C containing an SVC bitstream transmitted as 3 sources. In the example, the dependency signaling (described in Section 7.2.3) indicates that session A is the base RTP session, B is the first enhancement RTP session and depends on A, and C is the second enhancement RTP session and depends on A and B. A hierarchical picture coding prediction structure is used, in which session A has the lowest frame rate and sessions B and C have the same but higher frame rate.

The figure shows NAL units contained in RTP packets that are stored in the de-jittering buffer at the receiver for session de-packetization. The NAL units are already reordered according to their RTP sequence number order and, if within an aggregation packet, according to the order of their appearance within the aggregation packet. The figure indicates for the received NAL units the decoding order within the sessions, as well as the associated media (NTP) timestamps ("TS[..]"). NAL units of the same access unit within a session are grouped by "(.,.)" and share the same media timestamp TS, which is shown at the bottom of the figure. Note that the timestamps are not in increasing order since, in this example, the decoding order is different from the output/display order.

The process first proceeds to the NAL units associated with the first media timestamp TS[1] present in the highest session C and removes/ignores all NAL units that precede (in decoding order) the NAL units with TS[1] in each of the de-jittering buffers of RTP sessions A, B, and C.
Then, starting from session C, the first media timestamp available in decoding order (TS[1]) is selected, and the NAL units with that timestamp, taken first from RTP session A and then from sessions B and C, are placed in order of the RTP session dependency as required by Section 7.2.3 of this memo (in the example for TS[1]: first session B and then session C) into the access unit AU(TS[1]) associated with media timestamp TS[1]. Then the next media timestamp TS[3] in order of appearance in the highest RTP session C is processed and the process described above is repeated. Note that there may be access units with no NAL units present, e.g., in the lowest RTP session A (see, e.g., TS[1]). With TS[8], the first access unit with NAL units present in all the RTP sessions appears in the buffers.
C: ------------(1,2)-(3,4)--(5)---(6)---(7,8)(9,10)-(11)--(12)----
      |    |     |     |     |     |     |     |     |      |
B: -(1,2)-(3,4)-(5)---(6)--(7,8)-(9,10)-(11)-(12)--(13,14)(15,15)-
      |    |                 |     |                 |      |
A: -------(1)---------------(2)---(3)---------------(4)----(5)----
---------------------------------------------------decoding order-->

TS:  [4]  [2]   [1]   [3]   [8]   [6]   [5]   [7]   [12]   [10]

Key:

A, B, C                - RTP sessions

Integer values in "()" - NAL unit decoding order within RTP session

"( )"                  - groups the NAL units of an access unit in an RTP session

"|"                    - indicates corresponding NAL units of the same access unit AU(TS[..]) in the RTP sessions

Integer values in "[]" - media timestamp TS, sampling time as derived, e.g., from NTP timestamp associated with the access unit AU(TS[..]), consisting of NAL units in the sessions above each TS value.

Figure 6. Example of decoding order recovery in multi-source transmission.

6.2.1.1. Informative Algorithm for NI-T Decoding Order Recovery within an Access Unit
Within an access unit, the [H.264] specification (Sections 7.4.1.2.3 and G.7.4.1.2.3) constrains the valid decoding order of NAL units. These constraints make it possible to reconstruct a valid decoding order for the NAL units of an access unit based only on the order of NAL units in each session, the NAL unit headers, and Supplemental Enhancement Information message headers. This section specifies an informative algorithm to reconstruct a valid decoding order for NAL units within an access unit. Other NAL unit orderings may also be valid; however, any compliant NAL unit ordering will describe the same video stream and ancillary data as the one produced by this algorithm. An actual implementation, of course, needs only to behave "as if" this reordering is done. In particular, NAL units that are discarded by an implementation's decoding process do not need to be reordered.
In this algorithm, NAL units within an access unit are first ordered by NAL unit type, in the order specified in Table 12 below, except for NAL unit type 14, which is handled specially as described in the table. NAL units of the same type are then ordered as specified for the type, if necessary.

For the purposes of this algorithm, "session order" is the order of NAL units implied by their transmission order within an RTP session. For the non-interleaved and single NAL unit modes, this is the RTP sequence number order coupled with the order of NAL units within an aggregation unit.

Table 12. Ordering of NAL unit types within an Access Unit

Type   Description / Comments
-----------------------------------------------------------
9      Access unit delimiter

7      Sequence parameter set

13     Sequence parameter set extension

15     Subset sequence parameter set

8      Picture parameter set

16-18  Reserved

6      Supplemental enhancement information (SEI)
          If an SEI message with a first payload of 0 (Buffering
          Period) is present, it must be the first SEI message.

          If SEI messages with a Scalable Nesting (30) payload and
          a nested payload of 0 (Buffering Period) are present,
          these then follow the first SEI message.  Such an SEI
          message with the all_layer_representations_in_au_flag
          equal to 1 is placed first, followed by any others,
          sorted in increasing order of DQId.

          All other SEI messages follow in any order.

14     Prefix NAL unit in scalable extension
1      Coded slice of a non-IDR picture
5      Coded slice of an IDR picture
          NAL units of type 1 or 5 will be sent within only a
          single session for any given access unit.  They are
          placed in session order.  (Note: Any given access unit
          will contain only NAL units of type 1 or type 5, not
          both.)

          If NAL units of type 14 are present, every NAL unit of
          type 1 or 5 is prefixed by a NAL unit of type 14.  (Note:
          Within an access unit, every NAL unit of type 14 is
          identical, so correlation of type 14 NAL units with the
          other NAL units is not necessary.)

12     Filler data
          The only restriction of filler data NAL units within an
          access unit is that they shall not precede the first VCL
          NAL unit within the same access unit.

19     Coded slice of an auxiliary coded picture without
       partitioning
          These NAL units will be sent within only a single session
          for any given access unit, and are placed in session
          order.

20     Coded slice in scalable extension
21-23  Reserved
          Type 20 NAL units are placed in increasing order of DQId.
          Within each DQId value, they are placed in session order.
          (Note: SVC slices with a given DQId value will be sent
          within only a single session for any given access unit.)

          Type 21-23 NAL units are placed immediately following the
          non-reserved-type VCL NAL unit they follow in session
          order.

10     End of sequence

11     End of stream

6.2.2. Decoding Order Recovery for the NI-C, NI-TC, and I-C Modes
The following process MUST be used when either the NI-C or I-C MST packetization mode is in use. The following process MAY be applied when the NI-TC MST packetization mode is in use.
The RTP packets output from the RTP-level reception processing for each session are placed into a re-multiplexing buffer.

It is RECOMMENDED to set the size of the re-multiplexing buffer (in bytes) equal to or greater than the value of the sprop-remux-buf-req media type parameter of the highest RTP session the receiver receives.

The CS-DON value is calculated and stored for each NAL unit.

   Informative note: The CS-DON value of a NAL unit may rely on information carried in another packet than the packet containing the NAL unit. This happens, e.g., when the CS-DON values need to be derived for non-PACSI NAL units contained in single NAL unit packets, as the single NAL unit packets themselves do not contain CS-DON information. In this case, when no packet containing required CS-DON information is received for a NAL unit, this NAL unit has to be discarded by the receiver as it cannot be fed to the decoder in the correct order. When the optional media type parameter sprop-mst-csdon-always-present is equal to 1, no such dependency exists, i.e., the CS-DON value of any particular NAL unit can be derived solely according to information in the packet containing the NAL unit, and therefore, the receiver does not need to discard any received NAL units.

The receiver operation is described below with the help of the following functions and constants:

o  Function AbsDON is specified in Section 8.1 of [RFC6184].

o  Function don_diff is specified in Section 5.5 of [RFC6184].

o  Constant N is the value of the OPTIONAL sprop-mst-remux-buf-size media type parameter of the highest RTP session incremented by 1.

Initial buffering lasts until one of the following conditions is fulfilled:

o  There are N or more VCL NAL units in the re-multiplexing buffer.

o  If sprop-mst-max-don-diff of the highest RTP session is present, don_diff(m,n) is greater than the value of sprop-mst-max-don-diff of the highest RTP session, where n corresponds to the NAL unit having the greatest value of AbsDON among the received NAL units and m corresponds to the NAL unit having the smallest value of AbsDON among the received NAL units.

o  Initial buffering has lasted for the duration equal to or greater than the value of the OPTIONAL sprop-remux-init-buf-time media type parameter of the highest RTP session.

The NAL units to be removed from the re-multiplexing buffer are determined as follows:

o  If the re-multiplexing buffer contains at least N VCL NAL units, NAL units are removed from the re-multiplexing buffer and passed to the decoder in the order specified below until the buffer contains N-1 VCL NAL units.

o  If sprop-mst-max-don-diff of the highest RTP session is present, all NAL units m for which don_diff(m,n) is greater than sprop-mst-max-don-diff of the highest RTP session are removed from the re-multiplexing buffer and passed to the decoder in the order specified below. Herein, n corresponds to the NAL unit having the greatest value of AbsDON among the NAL units in the re-multiplexing buffer.

The order in which NAL units are passed to the decoder is specified as follows:

o  Let PDON be a variable that is initialized to 0 at the beginning of the RTP sessions.

o  For each NAL unit associated with a value of CS-DON, a CS-DON distance is calculated as follows. If the value of CS-DON of the NAL unit is larger than the value of PDON, the CS-DON distance is equal to CS-DON - PDON. Otherwise, the CS-DON distance is equal to 65535 - PDON + CS-DON + 1.

o  NAL units are delivered to the decoder in increasing order of CS-DON distance. If several NAL units share the same value of CS-DON distance, they can be passed to the decoder in any order.

o  When a desired number of NAL units have been passed to the decoder, the value of PDON is set to the value of CS-DON for the last NAL unit passed to the decoder.

7. Payload Format Parameters
This section specifies the parameters that MAY be used to select optional features of the payload format and certain features of the bitstream. The parameters are specified here as part of the media type registration for the SVC codec. A mapping of the parameters into the Session Description Protocol (SDP) [RFC4566] is also provided for applications that use SDP. Equivalent parameters could be defined elsewhere for use with control protocols that do not use SDP.

Some parameters provide a receiver with the properties of the stream that will be sent. The names of all these parameters start with "sprop" for stream properties. Some of these "sprop" parameters are limited by other payload or codec configuration parameters. For example, the sprop-parameter-sets parameter is constrained by the profile-level-id parameter. The media sender selects all "sprop" parameters rather than the receiver. This uncommon characteristic of the "sprop" parameters may be incompatible with some signaling protocol concepts, in which case the use of these parameters SHOULD be avoided.

7.1. Media Type Registration
The media subtype for the SVC codec has been allocated from the IETF tree.

The receiver MUST ignore any unspecified parameter.

   Informative note: Requiring that the receiver ignore unspecified parameters allows for backward compatibility of future extensions. For example, if a future specification that is backward compatible to this specification specifies some new parameters, then a receiver according to this specification is capable of receiving data per the new payload but ignoring those parameters newly specified in the new payload specification. This provision is also present in [RFC6184].

Media Type name: video

Media subtype name: H264-SVC

Required parameters: none

OPTIONAL parameters:

   In the following definitions of parameters, "the stream" or "the NAL unit stream" refers to all NAL units conveyed in the current RTP session in SST, and all NAL units conveyed in the current RTP session and all NAL units conveyed in other RTP sessions that the current RTP session depends on in MST.
   profile-level-id:

      A base16 [RFC4648] (hexadecimal) representation of the following three bytes in the sequence parameter set or subset sequence parameter set NAL unit specified in [H.264]: 1) profile_idc; 2) a byte herein referred to as profile-iop, composed of the values of constraint_set0_flag, constraint_set1_flag, constraint_set2_flag, constraint_set3_flag, constraint_set4_flag, constraint_set5_flag, and reserved_zero_2bits, in bit-significance order, starting from the most-significant bit, and 3) level_idc. Note that reserved_zero_2bits is required to be equal to 0 in [H.264], but other values for it may be specified in the future by ITU-T or ISO/IEC.

      The profile-level-id parameter indicates the default sub-profile, i.e., the subset of coding tools that may have been used to generate the stream or that the receiver supports, and the default level of the stream or the one that the receiver supports.

      The default sub-profile is indicated collectively by the profile_idc byte and some fields in the profile-iop byte. Depending on the values of the fields in the profile-iop byte, the default sub-profile may be the same set of coding tools supported by one profile, or a common subset of coding tools of multiple profiles, as specified in Subsection G.7.4.2.1.1 of [H.264]. The default level is indicated by the level_idc byte, and, when profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or Extended profile) and level_idc is equal to 11, additionally by bit 4 (constraint_set3_flag) of the profile-iop byte. When profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or Extended profile) and level_idc is equal to 11, and bit 4 (constraint_set3_flag) of the profile-iop byte is equal to 1, the default level is Level 1b.

      Table 13 lists all profiles defined in Annexes A and G of [H.264] and, for each of the profiles, the possible combinations of profile_idc and profile-iop that represent the same sub-profile.

      Table 13. Combinations of profile_idc and profile-iop representing the same sub-profile corresponding to the full set of coding tools supported by one profile. In the following, x may be either 0 or 1, while the profile names are indicated as follows. CB: Constrained Baseline profile, B: Baseline profile, M: Main profile, E: Extended profile, H: High profile, H10: High 10 profile, H42: High 4:2:2 profile, H44: High 4:4:4 Predictive profile, H10I: High 10 Intra profile, H42I: High 4:2:2 Intra profile, H44I: High 4:4:4 Intra profile, C44I: CAVLC 4:4:4 Intra profile, SB: Scalable Baseline profile, SH: Scalable High profile, and SHI: Scalable High Intra profile.

      Profile     profile_idc        profile-iop
                  (hexadecimal)      (binary)

      CB          42 (B)             x1xx0000
         same as: 4D (M)             1xxx0000
         same as: 58 (E)             11xx0000
      B           42 (B)             x0xx0000
         same as: 58 (E)             10xx0000
      M           4D (M)             0x0x0000
      E           58                 00xx0000
      H           64                 00000000
      H10         6E                 00000000
      H42         7A                 00000000
      H44         F4                 00000000
      H10I        6E                 00010000
      H42I        7A                 00010000
      H44I        F4                 00010000
      C44I        2C                 00010000
      SB          53                 x0000000
      SH          56                 0x000000
      SHI         56                 0x010000

      For example, in the table above, profile_idc equal to 58 (Extended) with profile-iop equal to 11xx0000 indicates the same sub-profile corresponding to profile_idc equal to 42 (Baseline) with profile-iop equal to x1xx0000. Note that other combinations of profile_idc and profile-iop (not listed in Table 13) may represent a sub-profile equivalent to the common subset of coding tools for more than one profile. Note also that a decoder conforming to a certain profile may be able to decode bitstreams conforming to other profiles.

      If profile-level-id is used to indicate stream properties, it indicates that, to decode the stream, the minimum subset of coding tools a decoder has to support is the default sub-profile, and the lowest level the decoder has to support is the default level.

      If the profile-level-id parameter is used for capability exchange or session setup, it indicates the subset of coding tools, which is equal to the default sub-profile, that the codec supports for both receiving and sending. If max-recv-level is not present, the default level from profile-level-id indicates the highest level the codec wishes to support. If max-recv-level is present, it indicates the highest level the codec supports for receiving. For either receiving or sending, all levels that are lower than the highest level supported MUST also be supported.

         Informative note: Capability exchange and session setup procedures should provide means to list the capabilities for each supported sub-profile separately. For example, the one-of-N codec selection procedure of the SDP Offer/Answer model can be used (Section 10.2 of [RFC3264]). The one-of-N codec selection procedure may also be used to provide different combinations of profile_idc and profile-iop that represent the same sub-profile. When there are many different combinations of profile_idc and profile-iop that represent the same sub-profile, using the one-of-N codec selection procedure may result in a fairly large SDP message. Therefore, a receiver should understand the different equivalent combinations of profile_idc and profile-iop that represent the same sub-profile, and be ready to accept an offer using any of the equivalent combinations.

      If no profile-level-id is present, the Baseline Profile without additional constraints at Level 1 MUST be implied.

   max-recv-level:

      This parameter MAY be used to indicate the highest level a receiver supports when the highest level is higher than the default level (the level indicated by profile-level-id). The value of max-recv-level is a base16 (hexadecimal) representation of the two bytes after the syntax element profile_idc in the sequence parameter set NAL unit specified in [H.264]: profile-iop (as defined above) and level_idc. If (the level_idc byte of max-recv-level is equal to 11 and bit 4 of the profile-iop byte of max-recv-level is equal to 1) or (the level_idc byte of max-recv-level is equal to 9 and bit 4 of the profile-iop byte of max-recv-level is equal to 0), the highest level the receiver supports is Level 1b.
      Otherwise, the highest level the receiver supports is equal to the level_idc byte of max-recv-level divided by 10.

      max-recv-level MUST NOT be present if the highest level the receiver supports is not higher than the default level.

   max-recv-base-level:

      This parameter MAY be used to indicate the highest level a receiver supports for the base layer when negotiating an SVC stream. The value of max-recv-base-level is a base16 (hexadecimal) representation of the two bytes after the syntax element profile_idc in the sequence parameter set NAL unit specified in [H.264]: profile-iop (as defined above) and level_idc. If (the level_idc byte of max-recv-base-level is equal to 11 and bit 4 of the profile-iop byte of max-recv-base-level is equal to 1) or (the level_idc byte of max-recv-base-level is equal to 9 and bit 4 of the profile-iop byte of max-recv-base-level is equal to 0), the highest level the receiver supports for the base layer is Level 1b. Otherwise, the highest level the receiver supports for the base layer is equal to the level_idc byte of max-recv-base-level divided by 10.

   max-mbps, max-fs, max-cpb, max-dpb, and max-br:

      The common properties of these parameters are specified in [RFC6184].

   max-mbps:

      This parameter is as specified in [RFC6184].

   max-fs:

      This parameter is as specified in [RFC6184].

   max-cpb:

      The value of max-cpb is an integer indicating the maximum coded picture buffer size in units of 1000 bits for the VCL HRD parameters and in units of 1200 bits for the NAL HRD parameters. Note that this parameter does not use units of cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [H.264]). The max-cpb parameter signals that the receiver has more memory than the minimum amount of coded picture buffer memory required by the signaled highest level conveyed in the value of the profile-level-id parameter or the max-recv-level parameter. When max-cpb is signaled, the receiver MUST be able to decode NAL unit streams that conform to the signaled highest level, with the exception that the MaxCPB value in Table A-1 of [H.264] for the signaled highest level is replaced with the value of max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed). The value of max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed) MUST be greater than or equal to the value of MaxCPB given in Table A-1 of [H.264] for the highest level. Senders MAY use this knowledge to construct coded video streams with greater variation of bitrate than can be achieved with the MaxCPB value in Table A-1 of [H.264].
         Informative note: The coded picture buffer is used in the Hypothetical Reference Decoder (HRD, Annex C) of [H.264]. The use of the HRD is recommended in SVC encoders to verify that the produced bitstream conforms to the standard and to control the output bitrate. Thus, the coded picture buffer is conceptually independent of any other potential buffers in the receiver, including de-interleaving, re-multiplexing, and de-jitter buffers. The coded picture buffer need not be implemented in decoders as specified in Annex C of [H.264]; standard-compliant decoders can have any buffering arrangements provided that they can decode standard-compliant bitstreams. Thus, in practice, the input buffer for video decoder can be integrated with the de-interleaving, re-multiplexing, and de-jitter buffers of the receiver.

   max-dpb:

      This parameter is as specified in [RFC6184].

   max-br:

      The value of max-br is an integer indicating the maximum video bitrate in units of 1000 bits per second for the VCL HRD parameters and in units of 1200 bits per second for the NAL HRD parameters. Note that this parameter does not use units of cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [H.264]).

      The max-br parameter signals that the video decoder of the receiver is capable of decoding video at a higher bitrate than is required by the signaled highest level conveyed in the value of the profile-level-id parameter or the max-recv-level parameter.

      When max-br is signaled, the video codec of the receiver MUST be able to decode NAL unit streams that conform to the signaled highest level, with the following exceptions in the limits specified by the highest level:

      o  The value of max-br (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed) replaces the MaxBR value in Table A-1 of [H.264] for the highest level.

      o  When the max-cpb parameter is not present, the result of the following formula replaces the value of MaxCPB in Table A-1 of [H.264]: (MaxCPB of the signaled level) * max-br / (MaxBR of the signaled highest level).

      For example, if a receiver signals capability for Main profile Level 1.2 with max-br equal to 1550, this indicates a maximum video bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum video bitrate of 1860 kbits/sec for NAL HRD parameters, and a CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000).

      The value of max-br (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed) MUST be greater than or equal to the value MaxBR given in Table A-1 of [H.264] for the signaled highest level.

      Senders MAY use this knowledge to send higher-bitrate video as allowed in the level definition of SVC, to achieve improved video quality.

         Informative note: This parameter was added primarily to complement a similar codepoint in the ITU-T Recommendation H.245, so as to facilitate signaling gateway designs. No assumption can be made from the value of this parameter that the network is capable of handling such bitrates at any given time. In particular, no conclusion can be drawn that the signaled bitrate is possible under congestion control constraints.

   redundant-pic-cap:

      This parameter is as specified in [RFC6184].

   sprop-parameter-sets:

      This parameter MAY be used to convey any sequence parameter set, subset sequence parameter set, and picture parameter set NAL units (herein referred to as the initial parameter set NAL units) that can be placed in the NAL unit stream to precede any other NAL units in decoding order and that are associated with the default level of profile-level-id. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure. The value of the parameter is a comma (',') separated list of base64 [RFC4648] representations of the parameter set NAL units as specified in Sections 7.3.2.1, 7.3.2.2, and G.7.3.2.1 of [H.264]. Note that the number of bytes in a parameter set NAL unit is typically less than 10, but a picture parameter set NAL unit can contain several hundreds of bytes.
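As an illustration of the sprop-parameter-sets value format described above, the following sketch splits the value at commas and base64-decodes each entry. The function name is hypothetical, the sample bytes are invented for the example and are not a valid parameter set, and reading nal_unit_type from the low five bits of the first byte follows the NAL unit header layout of [H.264].

```python
import base64


def parse_sprop_parameter_sets(value):
    """Return a list of (nal_unit_type, nal_unit_bytes) tuples."""
    nal_units = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray comma
        nal = base64.b64decode(item, validate=True)
        if not nal:
            raise ValueError("empty parameter set NAL unit")
        # nal_unit_type is carried in the low five bits of the first byte.
        nal_units.append((nal[0] & 0x1F, nal))
    return nal_units


# Illustrative bytes only; these are not taken from a real bitstream.
sps = bytes([0x67, 0x42, 0x00, 0x0A])  # first byte encodes type 7
pps = bytes([0x68, 0xCE, 0x38, 0x80])  # first byte encodes type 8
value = ",".join(base64.b64encode(n).decode("ascii") for n in (sps, pps))
for nal_type, nal in parse_sprop_parameter_sets(value):
    print(nal_type, nal.hex())
```

A receiver would typically check that only types 7, 8, and 15 (sequence, picture, and subset sequence parameter sets) appear before handing the decoded NAL units to the decoder.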
Informative note: When several payload types are offered in the SDP Offer/Answer model, each with its own sprop-parameter-sets parameter, then the receiver cannot assume that those parameter sets do not use conflicting storage locations (i.e., identical values of parameter set identifiers). Therefore, a receiver should buffer all sprop-parameter-sets and make them available to the decoder instance that decodes a certain payload type.

sprop-level-parameter-sets: This parameter MAY be used to convey any sequence, subset sequence, and picture parameter set NAL units (herein referred to as the initial parameter set NAL units) that can be placed in the NAL unit stream to precede any other NAL units in decoding order and that are associated with one or more levels different than the default level of profile-level-id. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure.

The sprop-level-parameter-sets parameter contains parameter sets for one or more levels that are different than the default level. All parameter sets targeted for use when one level of the default sub-profile is accepted by a receiver are clustered and prefixed with a three-byte field that has the same syntax as profile-level-id. This enables the receiver to install the parameter sets for the accepted level and discard the rest. The three-byte field is named PLId, and all parameter sets associated with one level are named PSL, which has the same syntax as sprop-parameter-sets. Parameter sets for each level are represented in the form of PLId:PSL, i.e., PLId followed by a colon (':') and the base64 [RFC4648] representation of the initial parameter set NAL units for the level. Each pair of PLId:PSL is also separated by a colon. Note that a PSL can contain multiple parameter sets for that level, separated with commas (',').

The subset of coding tools indicated by each PLId field MUST be equal to the default sub-profile, and the level indicated by each PLId field MUST be different than the default level.

Informative note: This parameter allows for efficient level downgrade or upgrade in SDP Offer/Answer and out-of-band transport of parameter sets, simultaneously.
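The PLId:PSL layout described above can be illustrated with a short parsing sketch. This is a non-normative illustration only: the function names `parse_sprop_level_parameter_sets` and `select_level` are hypothetical, and the sketch assumes that PLId is written as six base16 digits, following the statement that it shares the syntax of profile-level-id.

```python
import base64


def parse_sprop_level_parameter_sets(value):
    """Map each PLId (six hex digits) to its list of decoded parameter set NAL units."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected colon-separated PLId:PSL pairs")
    by_level = {}
    for plid, psl in zip(fields[0::2], fields[1::2]):
        # Assumption: PLId uses the profile-level-id syntax, i.e. three bytes in base16.
        plid_bytes = bytes.fromhex(plid)  # raises ValueError on non-hex input
        if len(plid) != 6 or len(plid_bytes) != 3:
            raise ValueError(f"PLId must be six hex digits, got {plid!r}")
        # A PSL has the sprop-parameter-sets syntax: comma-separated base64 items.
        nal_units = [base64.b64decode(item, validate=True) for item in psl.split(",")]
        by_level.setdefault(plid.lower(), []).extend(nal_units)
    return by_level


def select_level(by_level, accepted_plid):
    """Return the parameter sets for the accepted level; the rest can be discarded."""
    return by_level.get(accepted_plid.lower(), [])
```

A receiver that accepts one of the non-default levels would call `select_level` with the PLId of that level and install only the returned NAL units.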
in-band-parameter-sets: This parameter MAY be used to indicate a receiver capability. The value MAY be equal to either 0 or 1. The value 1 indicates that the receiver discards out-of-band parameter sets in sprop-parameter-sets and sprop-level-parameter-sets, therefore the sender MUST transmit all parameter sets in-band. The value 0 indicates that the receiver utilizes out-of-band parameter sets included in sprop-parameter-sets and/or sprop-level-parameter-sets. However, in this case, the sender MAY still choose to send parameter sets in-band. When the parameter is not present, this receiver capability is not specified, and therefore the sender MAY send out-of-band parameter sets only, or it MAY send in-band parameter sets only, or it MAY send both.

packetization-mode: This parameter is as specified in [RFC6184]. When the mst-mode parameter is present, the value of this parameter is additionally constrained as follows. If mst-mode is equal to "NI-T", "NI-C", or "NI-TC", packetization-mode MUST NOT be equal to 2. Otherwise (mst-mode is equal to "I-C"), packetization-mode MUST be equal to 2.

sprop-interleaving-depth: This parameter is as specified in [RFC6184].

sprop-deint-buf-req: This parameter is as specified in [RFC6184].

deint-buf-cap: This parameter is as specified in [RFC6184].

sprop-init-buf-time: This parameter is as specified in [RFC6184].

sprop-max-don-diff: This parameter is as specified in [RFC6184].

max-rcmd-nalu-size: This parameter is as specified in [RFC6184].

mst-mode: This parameter MAY be used to signal the properties of a NAL unit stream or the capabilities of a receiver implementation. If this parameter is present, multi-session transmission MUST be used. Otherwise (this parameter is not present), single-session transmission MUST be used.

When this parameter is present, the following applies. When the value of mst-mode is equal to "NI-T", the NI-T mode MUST be used. When the value of mst-mode is equal to "NI-C", the NI-C mode MUST be used. When the value of mst-mode is equal to "NI-TC", the NI-TC mode MUST be used. When the value of mst-mode is equal to "I-C", the I-C mode MUST be used. The value of mst-mode MUST have one of the following tokens: "NI-T", "NI-C", "NI-TC", or "I-C".

All RTP sessions in an MST MUST have the same value of mst-mode.
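The interaction between mst-mode and packetization-mode can be sketched as a small consistency check. This is an illustrative, non-normative sketch: the function names are hypothetical, the parameters are assumed to have already been parsed from an SDP fmtp line into a dictionary of strings, and an absent packetization-mode is treated as 0, following the default in [RFC6184].

```python
VALID_MST_MODES = {"NI-T", "NI-C", "NI-TC", "I-C"}


def check_mst_packetization(params):
    """Return a list of violations of the mst-mode / packetization-mode rules."""
    mst_mode = params.get("mst-mode")
    if mst_mode is None:
        # No mst-mode: single-session transmission, no additional constraint here.
        return []
    if mst_mode not in VALID_MST_MODES:
        return [f"unknown mst-mode token: {mst_mode!r}"]
    # Assumption: an absent packetization-mode means 0, as in RFC 6184.
    mode = int(params.get("packetization-mode", "0"))
    problems = []
    if mst_mode == "I-C" and mode != 2:
        problems.append("mst-mode=I-C requires packetization-mode=2")
    if mst_mode != "I-C" and mode == 2:
        problems.append(f"mst-mode={mst_mode} forbids packetization-mode=2")
    return problems


def same_mst_mode(sessions):
    """True if every RTP session of the MST carries the same mst-mode value."""
    return len({session.get("mst-mode") for session in sessions}) <= 1
```

For example, `check_mst_packetization({"mst-mode": "NI-T", "packetization-mode": "2"})` reports one violation, while the same call with packetization-mode equal to 1 reports none.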
sprop-mst-csdon-always-present: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T" or "I-C". This parameter signals the properties of the NAL unit stream. When sprop-mst-csdon-always-present is present and the value is equal to 1, packetization-mode MUST be equal to 1, and all the RTP packets carrying the NAL unit stream MUST be STAP-A packets containing a PACSI NAL unit that further contains the DONC field or NI-MTAP packets with the J field equal to 1. When sprop-mst-csdon-always-present is present and the value is equal to 1, the CS-DON value of any particular NAL unit can be derived solely according to information in the packet containing the NAL unit.

When sprop-mst-csdon-always-present is present in the current RTP session, it MUST be present also in all the RTP sessions the current RTP session depends on and the value of sprop-mst-csdon-always-present is identical for the current RTP session and all the RTP sessions on which the current RTP session depends.

sprop-mst-remux-buf-size: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T". This parameter MUST be present when mst-mode is present and the value of mst-mode is equal to "NI-C", "NI-TC", or "I-C".

This parameter signals the properties of the NAL unit stream. It MUST be set to a value one less than the minimum re-multiplexing buffer size (in NAL units), so that it is guaranteed that receivers can reconstruct NAL unit decoding order as specified in Section 6.2.2.

The value of sprop-mst-remux-buf-size MUST be an integer in the range of 0 to 32767, inclusive.

sprop-remux-buf-req: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T". It MUST be present when mst-mode is present and the value of mst-mode is equal to "NI-C", "NI-TC", or "I-C".
sprop-remux-buf-req signals the required size of the re-multiplexing buffer for the NAL unit stream. It is guaranteed that receivers can recover the decoding order of the received NAL units from the current RTP session and the RTP sessions the current RTP session depends on as specified in Section 6.2.2, when the re-multiplexing buffer size is at least the value of sprop-remux-buf-req, in units of bytes.

The value of sprop-remux-buf-req MUST be an integer in the range of 0 to 4294967295, inclusive.

remux-buf-cap: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T". This parameter MAY be used to signal the capabilities of a receiver implementation and indicates the amount of re-multiplexing buffer space in units of bytes that the receiver has available for recovering the NAL unit decoding order as specified in Section 6.2.2. A receiver is able to handle any NAL unit stream for which the value of the sprop-remux-buf-req parameter is smaller than or equal to this parameter.

If the parameter is not present, then a value of 0 MUST be used for remux-buf-cap. The value of remux-buf-cap MUST be an integer in the range of 0 to 4294967295, inclusive.

sprop-remux-init-buf-time: This parameter MAY be used to signal the properties of the NAL unit stream. The parameter MUST NOT be present if mst-mode is not present or the value of mst-mode is equal to "NI-T".

The parameter signals the initial buffering time that a receiver MUST wait before starting to recover the NAL unit decoding order as specified in Section 6.2.2 of this memo.

The parameter is coded as a non-negative base-10 integer representation in clock ticks of a 90-kHz clock. If the parameter is not present, then no initial buffering time value is defined. Otherwise, the value of sprop-remux-init-buf-time MUST be an integer in the range of 0 to 4294967295, inclusive.

sprop-mst-max-don-diff: This parameter MAY be used to signal the properties of the NAL unit stream. It MUST NOT be used to signal transmitter or receiver or codec capabilities. The parameter MUST NOT be present if mst-mode is not present or the value of mst-mode is equal to "NI-T".
sprop-mst-max-don-diff is an integer in the range of 0 to 32767, inclusive. If sprop-mst-max-don-diff is not present, the value of the parameter is unspecified. sprop-mst-max-don-diff is calculated in the same way as sprop-max-don-diff as specified in [RFC6184], with the decoding order number replaced by the cross-session decoding order number.
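The presence and range rules for the re-multiplexing parameters above can be collected into one table-driven check. The following is a hedged sketch rather than a definitive validator: the names `RULES`, `check_remux_params`, and `init_buf_time_seconds` are hypothetical, the parameters are assumed to be available as a dictionary of strings, and the rules are applied literally as worded above.

```python
U15 = (0, 32767)
U32 = (0, 4294967295)
NOT_NI_T = {"NI-C", "NI-TC", "I-C"}

# name: (mst-modes where it may appear, mst-modes where it must appear, value range)
RULES = {
    "sprop-mst-csdon-always-present": ({"NI-C", "NI-TC"}, set(), None),
    "sprop-mst-remux-buf-size": (NOT_NI_T, NOT_NI_T, U15),
    "sprop-remux-buf-req": (NOT_NI_T, NOT_NI_T, U32),
    "remux-buf-cap": (NOT_NI_T, set(), U32),
    "sprop-remux-init-buf-time": (NOT_NI_T, set(), U32),
    "sprop-mst-max-don-diff": (NOT_NI_T, set(), U15),
}


def check_remux_params(params):
    """Return a list of presence and range violations for the given parameters."""
    mst_mode = params.get("mst-mode")
    shown_mode = mst_mode if mst_mode is not None else "absent"
    problems = []
    for name, (allowed, required, value_range) in RULES.items():
        present = name in params
        if present and mst_mode not in allowed:
            problems.append(f"{name} must not be present for mst-mode={shown_mode}")
        if not present and mst_mode in required:
            problems.append(f"{name} must be present for mst-mode={shown_mode}")
        if present and value_range is not None:
            low, high = value_range
            text = params[name]
            is_decimal = text.isascii() and text.isdigit()
            if not is_decimal or not low <= int(text) <= high:
                problems.append(f"{name}={text} is outside {low}..{high}")
    return problems


def init_buf_time_seconds(ticks):
    """Convert sprop-remux-init-buf-time from 90-kHz clock ticks to seconds."""
    return ticks / 90000
```

Keeping the rules in a table makes it straightforward to compare each entry against the corresponding parameter definition.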
sprop-scalability-info: This parameter MAY be used to convey the NAL unit containing the scalability information SEI message as specified in Annex G of [H.264]. This parameter MAY be used to signal the contained layers of an SVC bitstream. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure. The value of the parameter is the base64 [RFC4648] representation of the NAL unit containing the scalability information SEI message. If present, the NAL unit MUST contain only one SEI message that is a scalability information SEI message.

This parameter MAY be used in an offering or declarative SDP message to indicate what layers (operation points) can be provided. A receiver MAY indicate its choice of one layer using the optional media type parameter scalable-layer-id.

scalable-layer-id: This parameter MAY be used to signal a receiver's choice of the offered or declared operation points or layers using sprop-scalability-info or sprop-operation-point-info. The value of scalable-layer-id is a base16 representation of the layer_id[ i ] syntax element in the scalability information SEI message as specified in Annex G of [H.264] or layer-ID contained in sprop-operation-point-info.

sprop-operation-point-info: This parameter MAY be used to describe the operation points of an RTP session. The value of this parameter consists of a comma-separated list of operation-point-description vectors. The values given by the operation-point-description vectors are the same as, or are derived from, the values that would be given for a scalable layer in the scalability information SEI message as specified in Annex G of [H.264], where the term scalable layer in the scalability information SEI message refers to all NAL units associated with the same values of temporal_id, dependency_id, and quality_id. In this memo, such a set of NAL units is called an operation point.
Each operation-point-description vector has ten elements, provided as a comma-separated list of values as defined below. The first value of the operation-point-description vector is preceded by a '<', and the last value of the operation-point-description vector is followed by a '>'. If the sprop-operation-point-info is followed by exactly one operation-point-description vector, this describes the highest operation point contained in the RTP session. If there are two or more operation-point-description vectors, the first describes the lowest and the last describes the highest operation point contained in the RTP session.

The values given by the operation-point-description vector are as follows, in the order listed:

- layer-ID: This value specifies the layer identifier of the operation point, which is identical to the layer_id that would be indicated (for the same values of dependency_id, quality_id, and temporal_id) in the scalability information SEI message. This field MAY be empty, indicating that the value is unspecified. When there are multiple operation-point-description vectors with layer-ID, the values of layer-ID do not need to be consecutive.

- temporal-ID: This value specifies the temporal_id of the operation point. This field MUST NOT be empty.

- dependency-ID: This value specifies the dependency_id of the operation point. This field MUST NOT be empty.

- quality-ID: This value specifies the quality_id of the operation point. This field MUST NOT be empty.

- profile-level-ID: This value specifies the profile-level-idc of the operation point in the base16 format. The default sub-profile or default level indicated by the parameter profile-level-ID in the sprop-operation-point-info vector SHALL be equal to or lower than the default sub-profile or default level indicated by profile-level-id, which may be either present or the default value is taken. This field MAY be empty, indicating that the value is unspecified.

- avg-framerate: This value specifies the average frame rate of the operation point. This value is given as an integer in frames per 256 seconds. The field MAY be empty, indicating that the value is unspecified.

- width: This value specifies the width dimension in pixels of decoded frames for the operation point. This parameter is not directly given in the scalability information SEI message. This field MAY be empty, indicating that the value is unspecified.

- height: This value gives the height dimension in pixels of decoded frames for the operation point. This parameter is not directly given in the scalability information SEI message. This field MAY be empty, indicating that the value is unspecified.

- avg-bitrate: This value specifies the average bitrate of the operation point. This parameter is given as an integer in kbits per second over the entire stream. Note that this parameter is provided in the scalability information SEI message in bits per second and calculated over a variable time window. This field MAY be empty, indicating that the value is unspecified.

- max-bitrate: This value specifies the maximum bitrate of the operation point. This parameter is given as an integer in kbits per second and describes the maximum bitrate per each one-second window. Note that this parameter is provided in the scalability information SEI message in bits per second and is calculated over a variable time window. This field MAY be empty, indicating that the value is unspecified.

Similarly to sprop-scalability-info, this parameter MAY be used in an offering or declarative SDP message to indicate what layers (operation points) can be provided. A receiver MAY indicate its choice of the highest layer it wants to send and/or receive using the optional media type parameter scalable-layer-id.

sprop-no-NAL-reordering-required: This parameter MAY be used to signal the properties of the NAL unit stream. This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is not equal to "NI-T". The presence of this parameter indicates that no reordering of non-VCL or VCL NAL units is required for the decoding order recovery process.

sprop-avc-ready: This parameter MAY be used to indicate the properties of the NAL unit stream.
The presence of this parameter indicates that the RTP session, if used in SST, or used in MST combined with other RTP sessions also with this parameter present, can be processed by an [RFC6184] receiver. This parameter MAY be used with RTP sessions with media subtype H264-SVC.

Encoding considerations: This media type is framed and binary; see Section 4.8 of RFC 4288 [RFC4288].
Security considerations: See Section 8 of RFC 6190.

Published specification: Please refer to RFC 6190 and its Section 13.

Additional information: none

File extensions: none

Macintosh file type code: none

Object identifier or OID: none

Person & email address to contact for further information: Ye-Kui Wang, yekui.wang@huawei.com

Intended usage: COMMON

Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Transport within other framing protocols is not defined at this time.

Interoperability considerations: The media subtype name contains "SVC" to avoid potential conflict with RFC 3984 and its potential future replacement RTP payload format for H.264 non-SVC profiles.

Applications that use this media type: Real-time video applications like video streaming, video telephony, and video conferencing.

Author: Ye-Kui Wang, yekui.wang@huawei.com

Change controller: IETF Audio/Video Transport working group delegated from the IESG.
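As an illustration of the sprop-operation-point-info syntax defined earlier in this registration (ten comma-separated values between '<' and '>', with some fields allowed to be empty), the following non-normative sketch parses a parameter value into operation points. The names `OperationPoint` and `parse_operation_point_info` are hypothetical. The sketch assumes that the numeric fields other than profile-level-ID are written in decimal, and it keeps layer-ID as a raw string because the vector definition does not state its radix.

```python
import re
from typing import NamedTuple, Optional

VECTOR_RE = re.compile(r"<([^<>]*)>")


class OperationPoint(NamedTuple):
    layer_id: Optional[str]          # kept as text; radix is not assumed
    temporal_id: int
    dependency_id: int
    quality_id: int
    profile_level_id: Optional[str]  # base16 text
    avg_framerate: Optional[int]     # frames per 256 seconds
    width: Optional[int]
    height: Optional[int]
    avg_bitrate: Optional[int]       # kbits per second
    max_bitrate: Optional[int]       # kbits per second

    @property
    def frames_per_second(self):
        if self.avg_framerate is None:
            return None
        return self.avg_framerate / 256


def _opt_int(text):
    # Assumption: non-empty numeric fields are decimal integers.
    return int(text) if text else None


def parse_operation_point_info(value):
    """Parse '<...>,<...>' into a list of OperationPoint, lowest point first."""
    points = []
    for body in VECTOR_RE.findall(value):
        fields = [field.strip() for field in body.split(",")]
        if len(fields) != 10:
            raise ValueError(f"expected 10 values, got {len(fields)}")
        layer, tid, did, qid, plid, rate, width, height, avg, peak = fields
        if not (tid and did and qid):
            raise ValueError("temporal-ID, dependency-ID, quality-ID must not be empty")
        points.append(OperationPoint(
            layer or None, int(tid), int(did), int(qid), plid or None,
            _opt_int(rate), _opt_int(width), _opt_int(height),
            _opt_int(avg), _opt_int(peak)))
    if not points:
        raise ValueError("no operation-point-description vector found")
    return points
```

With two or more vectors, the first and last elements of the returned list correspond to the lowest and highest operation points of the RTP session, and `frames_per_second` converts the frames-per-256-seconds value into a conventional frame rate.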