7.2. SDP Parameters
The receiver MUST ignore any parameter unspecified in this memo.

7.2.1. Mapping of Payload Type Parameters to SDP
The media type video/H265 string is mapped to fields in the Session Description Protocol (SDP) [RFC4566] as follows:

   o  The media name in the "m=" line of SDP MUST be video.

   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the media subtype).

   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The OPTIONAL parameters profile-space, profile-id, tier-flag, level-id, interop-constraints, profile-compatibility-indicator, sprop-sub-layer-id, recv-sub-layer-id, max-recv-level-id, tx-mode, max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc, max-fps, sprop-max-don-diff, sprop-depack-buf-nalus, sprop-depack-buf-bytes, depack-buf-cap, sprop-segmentation-id, sprop-spatial-segmentation-idc, dec-parallel-cap, and include-dph, when present, MUST be included in the "a=fmtp" line of SDP. These parameters are expressed as a media type string, in the form of a semicolon-separated list of parameter=value pairs.

   o  The OPTIONAL parameters sprop-vps, sprop-sps, and sprop-pps, when present, MUST be included in the "a=fmtp" line of SDP or conveyed using the "fmtp" source attribute as specified in Section 6.3 of [RFC5576]. For a particular media format (i.e., RTP payload type), sprop-vps, sprop-sps, or sprop-pps MUST NOT be both included in the "a=fmtp" line of SDP and conveyed using the "fmtp" source attribute. When included in the "a=fmtp" line of SDP, these parameters are expressed as a media type string, in the form of a semicolon-separated list of parameter=value pairs. When conveyed in the "a=fmtp" line of SDP for a particular payload type, the parameters sprop-vps, sprop-sps, and sprop-pps MUST be applied to each SSRC with the payload type. When conveyed using the "fmtp" source attribute, these parameters are only associated with the given source and payload type as parts of the "fmtp" source attribute.

      Informative note: Conveyance of sprop-vps, sprop-sps, and sprop-pps using the "fmtp" source attribute allows for out-of-band transport of parameter sets in topologies like Topo-Video-switch-MCU as specified in [RFC7667].

An example of media representation in SDP is as follows:

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 H265/90000
      a=fmtp:98 profile-id=1; sprop-vps=<video parameter sets data>

7.2.2. Usage with SDP Offer/Answer Model
When HEVC is offered over RTP using SDP in an offer/answer model [RFC3264] for negotiation for unicast usage, the following limitations and rules apply:

   o  The parameters identifying a media format configuration for HEVC are profile-space, profile-id, tier-flag, level-id, interop-constraints, profile-compatibility-indicator, and tx-mode. These media configuration parameters, except level-id, MUST be used symmetrically when the answerer does not include recv-sub-layer-id in the answer for the media format (payload type) or the included recv-sub-layer-id is equal to sprop-sub-layer-id in the offer. The answerer MUST do one of the following:

      1) maintain all configuration parameters with the values remaining the same as in the offer for the media format (payload type), with the exception that the value of level-id is changeable as long as the highest level indicated by the answer is not higher than that indicated by the offer;

      2) include in the answer the recv-sub-layer-id parameter, with a value less than the sprop-sub-layer-id parameter in the offer, for the media format (payload type), and maintain all configuration parameters with the values being the same as signaled in the sprop-vps for the chosen sub-layer representation, with the exception that the value of level-id is changeable as long as the highest level indicated by the answer is not higher than the level indicated by the sprop-vps in the offer for the chosen sub-layer representation; or

      3) remove the media format (payload type) completely (when one or more of the parameter values are not supported).

      Informative note: The above requirement for symmetric use does not apply for level-id, and does not apply for the other bitstream or RTP stream properties and capability parameters.

   o  The profile-compatibility-indicator, when offered as sendonly, describes bitstream properties. The answerer MAY accept an RTP payload type even if the decoder is not capable of handling the profile indicated by the profile-space, profile-id, and interop-constraints parameters, but is capable of any of the profiles indicated by the profile-space, profile-compatibility-indicator, and interop-constraints. However, when the profile-compatibility-indicator is used in a recvonly or sendrecv media description, the bitstream using this RTP payload type is required to conform to all profiles indicated by profile-space, profile-compatibility-indicator, and interop-constraints.

   o  To simplify handling and matching of these configurations, the same RTP payload type number used in the offer SHOULD also be used in the answer, as specified in [RFC3264].

   o  The same RTP payload type number used in the offer for the media subtype H265 MUST be used in the answer when the answer includes recv-sub-layer-id. When the answer does not include recv-sub-layer-id, the answer MUST NOT contain a payload type number used in the offer for the media subtype H265 unless the configuration is exactly the same as in the offer or the configuration in the answer only differs from that in the offer with a different value of level-id. The answer MAY contain the recv-sub-layer-id parameter if an HEVC bitstream contains multiple operation points (using temporal scalability and sub-layers) and sprop-vps is included in the offer where information on sub-layers is present in the first video parameter set contained in sprop-vps. If the sprop-vps is provided in an offer, an answerer MAY select a particular operation point indicated in the first video parameter set contained in sprop-vps. When the answer includes a recv-sub-layer-id that is less than a sprop-sub-layer-id in the offer, all video parameter sets contained in the sprop-vps parameter in the SDP answer and all video parameter sets sent in-band for either the offerer-to-answerer direction or the answerer-to-offerer direction MUST be consistent with the first video parameter set in the sprop-vps parameter of the offer (see the semantics of sprop-vps in Section 7.1 of this document on one video parameter set being consistent with another video parameter set), and the bitstream sent in either direction MUST conform to the profile, tier, level, and constraints of the chosen sub-layer representation as indicated by the first profile_tier_level( ) syntax structure in the first video parameter set in the sprop-vps parameter of the offer.

      Informative note: When an offerer receives an answer that does not include recv-sub-layer-id, it has to compare payload types not declared in the offer based on the media type (i.e., video/H265) and the above media configuration parameters with any payload types it has already declared. This will enable it to determine whether the configuration in question is new or if it is equivalent to a configuration already offered, since a different payload type number may be used in the answer. The ability to perform operation point selection enables a receiver to utilize the temporal scalable nature of an HEVC bitstream.

   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and sprop-depack-buf-bytes describe the properties of an RTP stream, and all RTP streams the RTP stream depends on, when present, that the offerer or the answerer is sending for the media format configuration. This differs from the normal usage of the offer/answer parameters: normally such parameters declare the properties of the bitstream or RTP stream that the offerer or the answerer is able to receive. When dealing with HEVC, the offerer assumes that the answerer will be able to receive media encoded using the configuration being offered.

      Informative note: The above parameters apply for any RTP stream and all RTP streams the RTP stream depends on, when present, sent by a declaring entity with the same configuration. In other words, the applicability of the above parameters to RTP streams depends on the source endpoint. Rather than being bound to the payload type, the values may have to be applied to another payload type when being sent, as they apply for the configuration.

   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc MAY be used to declare further capabilities of the offerer or answerer for receiving. These parameters MUST NOT be present when the direction attribute is sendonly.

   o  The capability parameter max-fps MAY be used to declare lower capabilities of the offerer or answerer for receiving. The parameter MUST NOT be present when the direction attribute is sendonly.

   o  The capability parameter dec-parallel-cap MAY be used to declare additional decoding capabilities of the offerer or answerer for receiving. Upon receiving such a declaration of a receiver, a sender MAY send a bitstream to the receiver utilizing those capabilities under the assumption that the bitstream fulfills the parallelism requirement. A bitstream that is sent based on choosing a capability point with parallel tool type 'w' from dec-parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 1 and min_spatial_segmentation_idc equal to or larger than dec-parallel-cap.spatial-seg-idc of the capability point. A bitstream that is sent based on choosing a capability point with parallel tool type 't' from dec-parallel-cap MUST have tiles_enabled_flag equal to 1 and min_spatial_segmentation_idc equal to or larger than dec-parallel-cap.spatial-seg-idc of the capability point.

   o  An offerer has to include the size of the de-packetization buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff and sprop-depack-buf-nalus, in the offer for an interleaved HEVC bitstream or for the MRST or MRMT transmission mode when sprop-max-don-diff is greater than 0 for at least one of the RTP streams. To enable the offerer and answerer to inform each other about their capabilities for de-packetization buffering in receiving RTP streams, both parties are RECOMMENDED to include depack-buf-cap. For interleaved RTP streams or in MRST or MRMT, it is also RECOMMENDED to consider offering multiple payload types with different buffering requirements when the capabilities of the receiver are unknown.

   o  The capability parameter include-dph MAY be used to declare the capability to utilize decoded picture hash SEI messages and which types of hashes in any HEVC RTP streams received by the offerer or answerer.

   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in the "a=fmtp" line of SDP or conveyed using the "fmtp" source attribute as specified in Section 6.3 of [RFC5576]), are used for out-of-band transport of the parameter sets (VPS, SPS, or PPS, respectively).

   o  The answerer MAY use either out-of-band or in-band transport of parameter sets for the bitstream it is sending, regardless of whether out-of-band parameter sets transport has been used in the offerer-to-answerer direction. Parameter sets included in an answer are independent of those parameter sets included in the offer, as they are used for decoding two different bitstreams, one from the answerer to the offerer and the other in the opposite direction. In case some RTP streams are sent before the SDP offer/answer settles down, in-band parameter sets MUST be used for those RTP stream parts sent before the SDP offer/answer.

   o  The following rules apply to transport of parameter sets in the offerer-to-answerer direction.

      +  An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps. If none of these parameters is present in the offer, then only in-band transport of parameter sets is used.

      +  If the level to use in the offerer-to-answerer direction is equal to the default level in the offer, the answerer MUST be prepared to use the parameter sets included in sprop-vps, sprop-sps, and sprop-pps (either included in the "a=fmtp" line of SDP or conveyed using the "fmtp" source attribute) for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams. Otherwise, the answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps (either included in the "a=fmtp" line of SDP or conveyed using the "fmtp" source attribute) and the offerer MUST transmit parameter sets in-band.

      +  In MRST or MRMT, the answerer MUST be prepared to use the parameter sets out-of-band transmitted for the RTP stream and all RTP streams the RTP stream depends on, when present, for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams.

   o  The following rules apply to transport of parameter sets in the answerer-to-offerer direction.

      +  An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps. If none of these parameters is present in the answer, then only in-band transport of parameter sets is used.

      +  The offerer MUST be prepared to use the parameter sets included in sprop-vps, sprop-sps, and sprop-pps (either included in the "a=fmtp" line of SDP or conveyed using the "fmtp" source attribute) for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams.

      +  In MRST or MRMT, the offerer MUST be prepared to use the parameter sets out-of-band transmitted for the RTP stream and all RTP streams the RTP stream depends on, when present, for decoding the incoming bitstream, e.g., by passing these parameter set NAL units to the video decoder before passing any NAL units carried in the RTP streams.

   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using the "fmtp" source attribute as specified in Section 6.3 of [RFC5576], the receiver of the parameters MUST store the parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps and associate them with the source given as part of the "fmtp" source attribute. Parameter sets associated with one source (given as part of the "fmtp" source attribute) MUST only be used to decode NAL units conveyed in RTP packets from the same source (given as part of the "fmtp" source attribute). When this mechanism is in use, SSRC collision detection and resolution MUST be performed as specified in [RFC5576].

For bitstreams being delivered over multicast, the following rules apply:

   o  The media format configuration is identified by profile-space, profile-id, tier-flag, level-id, interop-constraints, profile-compatibility-indicator, and tx-mode. These media format configuration parameters, including level-id, MUST be used symmetrically; that is, the answerer MUST either maintain all configuration parameters or remove the media format (payload type) completely. Note that this implies that the level-id for offer/answer in multicast is not changeable.

   o  To simplify the handling and matching of these configurations, the same RTP payload type number used in the offer SHOULD also be used in the answer, as specified in [RFC3264]. An answer MUST NOT contain a payload type number used in the offer unless the configuration is the same as in the offer.

   o  Parameter sets received MUST be associated with the originating source and MUST only be used in decoding the incoming bitstream from the same source.

   o  The rules for other parameters are the same as above for unicast as long as the three above rules are obeyed.

Table 1 lists the interpretation of all the parameters that MUST be used for the various combinations of offer, answer, and direction attributes. Note that the two columns wherein the recv-sub-layer-id parameter is used only apply to answers, whereas the other columns apply to both offers and answers.

Table 1. Interpretation of parameters for various combinations of offers, answers, direction attributes, with and without recv-sub-layer-id. Columns that do not indicate offer or answer apply to both.

                                      sendonly --+
        answer: recvonly, recv-sub-layer-id --+  |
          recvonly w/o recv-sub-layer-id --+  |  |
  answer: sendrecv, recv-sub-layer-id --+  |  |  |
    sendrecv w/o recv-sub-layer-id --+  |  |  |  |
                                     |  |  |  |  |
    profile-space                    C  D  C  D  P
    profile-id                       C  D  C  D  P
    tier-flag                        C  D  C  D  P
    level-id                         D  D  D  D  P
    interop-constraints              C  D  C  D  P
    profile-compatibility-indicator  C  D  C  D  P
    tx-mode                          C  C  C  C  P
    max-recv-level-id                R  R  R  R  -
    sprop-max-don-diff               P  P  -  -  P
    sprop-depack-buf-nalus           P  P  -  -  P
    sprop-depack-buf-bytes           P  P  -  -  P
    depack-buf-cap                   R  R  R  R  -
    sprop-segmentation-id            P  P  P  P  P
    sprop-spatial-segmentation-idc   P  P  P  P  P
    max-br                           R  R  R  R  -
    max-cpb                          R  R  R  R  -
    max-dpb                          R  R  R  R  -
    max-lsr                          R  R  R  R  -
    max-lps                          R  R  R  R  -
    max-tr                           R  R  R  R  -
    max-tc                           R  R  R  R  -
    max-fps                          R  R  R  R  -
    sprop-vps                        P  P  -  -  P
    sprop-sps                        P  P  -  -  P
    sprop-pps                        P  P  -  -  P
    sprop-sub-layer-id               P  P  -  -  P
    recv-sub-layer-id                X  O  X  O  -
    dec-parallel-cap                 R  R  R  R  -
    include-dph                      R  R  R  R  -

Legend:

    C: configuration for sending and receiving bitstreams
    D: changeable configuration, same as C except possible
       to answer with a different but consistent value (see the
       semantics of the six parameters related to profile, tier,
       and level on these parameters being consistent)
    P: properties of the bitstream to be sent
    R: receiver capabilities
    O: operation point selection
    X: MUST NOT be present
    -: not usable, when present MUST be ignored
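
For implementers, Table 1 can be held as a small lookup structure. The Python sketch below is illustrative only: the names COLUMNS, TABLE_1, and interpretation() are hypothetical and not defined by this memo, and the single-letter codes are those of the legend above.

```python
# Column order follows Table 1, left to right:
# (direction attribute, recv-sub-layer-id present in the answer).
COLUMNS = [
    ("sendrecv", False),
    ("sendrecv", True),
    ("recvonly", False),
    ("recvonly", True),
    ("sendonly", False),
]

# One five-character string per row of Table 1 (legend codes C/D/P/R/O/X/-).
TABLE_1 = {
    "profile-space": "CDCDP",
    "profile-id": "CDCDP",
    "tier-flag": "CDCDP",
    "level-id": "DDDDP",
    "interop-constraints": "CDCDP",
    "profile-compatibility-indicator": "CDCDP",
    "tx-mode": "CCCCP",
    "max-recv-level-id": "RRRR-",
    "sprop-max-don-diff": "PP--P",
    "sprop-depack-buf-nalus": "PP--P",
    "sprop-depack-buf-bytes": "PP--P",
    "depack-buf-cap": "RRRR-",
    "sprop-segmentation-id": "PPPPP",
    "sprop-spatial-segmentation-idc": "PPPPP",
    "max-br": "RRRR-",
    "max-cpb": "RRRR-",
    "max-dpb": "RRRR-",
    "max-lsr": "RRRR-",
    "max-lps": "RRRR-",
    "max-tr": "RRRR-",
    "max-tc": "RRRR-",
    "max-fps": "RRRR-",
    "sprop-vps": "PP--P",
    "sprop-sps": "PP--P",
    "sprop-pps": "PP--P",
    "sprop-sub-layer-id": "PP--P",
    "recv-sub-layer-id": "XOXO-",
    "dec-parallel-cap": "RRRR-",
    "include-dph": "RRRR-",
}


def interpretation(param, direction, has_recv_sub_layer_id=False):
    """Return the Table 1 legend code for one parameter and column.

    Raises ValueError for a column that Table 1 does not define (for
    example sendonly with recv-sub-layer-id) and KeyError for a
    parameter that is not listed in the table.
    """
    column = COLUMNS.index((direction, has_recv_sub_layer_id))
    return TABLE_1[param][column]
```

For example, interpretation("level-id", "sendrecv") yields "D", and interpretation("sprop-vps", "recvonly") yields "-", meaning the parameter is ignored when present.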

Parameters used for declaring receiver capabilities are, in general, downgradable; i.e., they express the upper limit for a sender's possible behavior. Thus, a sender MAY select to set its encoder using only lower/lesser or equal values of these parameters.

When the answer does not include a recv-sub-layer-id that is less than the sprop-sub-layer-id in the offer, parameters declaring a configuration point are not changeable, with the exception of the level-id parameter for unicast usage, and these parameters express values a receiver expects to be used and MUST be used verbatim in the answer as in the offer.

When a sender's capabilities are declared with the configuration parameters, these parameters express a configuration that is acceptable for the sender to receive bitstreams. In order to achieve high interoperability levels, it is often advisable to offer multiple alternative configurations. It is impossible to offer multiple configurations in a single payload type. Thus, when multiple configuration offers are made, each offer requires its own RTP payload type associated with the offer. However, it is possible to offer multiple operation points using one configuration in a single payload type by including sprop-vps in the offer and recv-sub-layer-id in the answer.

A receiver SHOULD understand all media type parameters, even if it only supports a subset of the payload format's functionality. This ensures that a receiver is capable of understanding when an offer to receive media can be downgraded to what is supported by the receiver of the offer.

An answerer MAY extend the offer with additional media format configurations. However, to enable their usage, in most cases a second offer is required from the offerer to provide the bitstream property parameters that the media sender will use. This also has the effect that the offerer has to be able to receive this media format configuration, not only to send it.

7.2.3. Usage in Declarative Session Descriptions
When HEVC over RTP is offered with SDP in a declarative style, as in Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement Protocol (SAP) [RFC2974], the following considerations are necessary.

   o  All parameters capable of indicating both bitstream properties and receiver capabilities are used to indicate only bitstream properties. For example, in this case, the parameter profile-tier-level-id declares the values used by the bitstream, not the capabilities for receiving bitstreams. As a result, the following interpretation of the parameters MUST be used:

      +  Declaring actual configuration or bitstream properties:

         -  profile-space
         -  profile-id
         -  tier-flag
         -  level-id
         -  interop-constraints
         -  profile-compatibility-indicator
         -  tx-mode
         -  sprop-vps
         -  sprop-sps
         -  sprop-pps
         -  sprop-max-don-diff
         -  sprop-depack-buf-nalus
         -  sprop-depack-buf-bytes
         -  sprop-segmentation-id
         -  sprop-spatial-segmentation-idc

      +  Not usable (when present, they MUST be ignored):

         -  max-lps
         -  max-lsr
         -  max-cpb
         -  max-dpb
         -  max-br
         -  max-tr
         -  max-tc
         -  max-fps
         -  max-recv-level-id
         -  depack-buf-cap
         -  sprop-sub-layer-id
         -  dec-parallel-cap
         -  include-dph

   o  A receiver of the SDP is required to support all parameters and values of the parameters provided; otherwise, the receiver MUST reject (RTSP) or not participate in (SAP) the session. It falls on the creator of the session to use values that are expected to be supported by the receiving application.
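
As a rough illustration of the interpretation rule above, the following Python sketch splits the value of an "a=fmtp" attribute into parameter=value pairs and drops the parameters listed as not usable in declarative descriptions. The function names parse_fmtp() and declarative_params() are hypothetical, and the parsing is deliberately simplified: it assumes the payload type number has already been removed and performs no validation of parameter values.

```python
# Parameters that MUST be ignored in declarative session descriptions,
# copied from the "Not usable" list above.
NOT_USABLE = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "max-recv-level-id", "depack-buf-cap",
    "sprop-sub-layer-id", "dec-parallel-cap", "include-dph",
})


def parse_fmtp(value):
    """Split a semicolon-separated parameter=value list into a dict.

    Only the first "=" separates name from value, so values that
    themselves contain "=" (e.g. base64 padding) are kept intact.
    """
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, val = item.partition("=")
        params[name.strip()] = val.strip()
    return params


def declarative_params(value):
    """Return only the parameters that are usable declaratively."""
    parsed = parse_fmtp(value)
    return {k: v for k, v in parsed.items() if k not in NOT_USABLE}
```

With the input "profile-id=1; level-id=93; max-br=2000", declarative_params() keeps profile-id and level-id and drops max-br. A real receiver would additionally have to check that it supports every remaining parameter and value before joining the session.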
7.2.4. Considerations for Parameter Sets
When out-of-band transport of parameter sets is used, parameter sets MAY still be additionally transported in-band unless explicitly disallowed by an application, and some of these additional parameter sets may update some of the out-of-band transported parameter sets. Update of a parameter set refers to the sending of a parameter set of the same type using the same parameter set ID but with different values for at least one other parameter of the parameter set.

7.2.5. Dependency Signaling in Multi-Stream Mode
If MRST or MRMT is used, the rules on signaling media decoding dependency in SDP as defined in [RFC5583] apply. The rules on "hierarchical or layered encoding" with multicast in Section 5.7 of [RFC4566] do not apply. This means that the notation for Connection Data "c=" SHALL NOT be used with more than one address, i.e., the sub-field <number of addresses> in the sub-field <connection-address> of the "c=" field, described in [RFC4566], must not be present. The order of session dependency is given from the RTP stream containing the lowest temporal sub-layer to the RTP stream containing the highest temporal sub-layer.

8. Use with Feedback Messages
The following subsections define the use of the Picture Loss Indication (PLI), Slice Loss Indication (SLI), Reference Picture Selection Indication (RPSI), and Full Intra Request (FIR) feedback messages with HEVC. The PLI, SLI, and RPSI messages are defined in [RFC4585], and the FIR message is defined in [RFC5104].

8.1. Picture Loss Indication (PLI)
As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a media sender indicates "the loss of an undefined amount of coded video data belonging to one or more pictures". Without having any specific knowledge of the setup of the bitstream (such as use and location of in-band parameter sets, non-IDR decoder refresh points, picture structures, and so forth), a reaction to the reception of a PLI by an HEVC sender SHOULD be to send an IDR picture and relevant parameter sets, potentially with sufficient redundancy so as to ensure correct reception. However, sometimes information about the bitstream structure is known. For example, state could have been established outside of the mechanisms defined in this document that parameter sets are conveyed out of band only, and stay static for the duration of the session. In that case, it is obviously unnecessary to send them in-band as a result of the reception of a PLI. Other examples could be devised based on a priori knowledge of different aspects of the bitstream structure. In all cases, the timing and congestion control mechanisms of RFC 4585 MUST be observed.

8.2. Slice Loss Indication (SLI)
The SLI described in RFC 4585 can be used to indicate, to a sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB-raster-scan order of a picture. In the SLI's Feedback Control Indication (FCI) field, the subfield "First" MUST be set to the CTB address of the first lost CTB. Note that the CTB address is in CTB-raster-scan order of a picture. For the first CTB of a slice segment, the CTB address is the value of slice_segment_address when present, or 0 when the value of first_slice_segment_in_pic_flag is equal to 1; both syntax elements are in the slice segment header.

The subfield "Number" MUST be set to the number of consecutive lost CTBs, again in CTB-raster-scan order of a picture. Note that because both "First" and "Number" are counted in CTBs in CTB-raster-scan order of a picture, not in tile-scan order (which is the bitstream order of CTBs), multiple SLI messages may be needed to report the loss of one tile covering multiple CTB rows but less wide than the picture.

The subfield "PictureID" MUST be set to the 6 least significant bits of a binary representation of the value of PicOrderCntVal, as defined in [HEVC], of the picture for which the lost CTBs are indicated. Note that for IDR pictures the syntax element slice_pic_order_cnt_lsb is not present, but then the value is inferred to be equal to 0.

As described in RFC 4585, an encoder in a media sender can use this information to "clean up" the corrupted picture by sending intra information, while observing the constraints described in RFC 4585, for example, with respect to congestion control. In many cases, error tracking is required to identify the corrupted region in the receiver's state (reference pictures) because errors are imported into uncorrupted regions of the picture through motion compensation. Reference-picture selection can also be used to "clean up" the corrupted picture, which is usually more efficient and less likely to generate congestion than sending intra information.
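
The "First", "Number", and "PictureID" rules above can be sketched as follows. This Python example is a non-normative illustration: it assumes the SLI FCI layout of RFC 4585 (13-bit "First", 13-bit "Number", 6-bit "PictureID", packed into one 32-bit word in network byte order), it assumes a two's complement representation when taking the 6 least significant bits of a negative PicOrderCntVal, and the helper names and the description of a tile by its CTB column, row, width, and height are hypothetical.

```python
def sli_runs_for_tile(pic_width_ctbs, col0, row0, tile_w, tile_h):
    """Return (First, Number) pairs covering one lost tile.

    Addresses are in CTB-raster-scan order of the picture. A tile that
    is narrower than the picture is not contiguous in that order, so it
    needs one run (one SLI message) per CTB row.
    """
    if col0 == 0 and tile_w == pic_width_ctbs:
        # Full-width tile: a single contiguous run in raster-scan order.
        return [(row0 * pic_width_ctbs, tile_w * tile_h)]
    return [((row0 + r) * pic_width_ctbs + col0, tile_w)
            for r in range(tile_h)]


def pack_sli_fci(first, number, pic_order_cnt_val):
    """Pack one SLI FCI entry into 4 bytes (network byte order)."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 0 < number < (1 << 13):
        raise ValueError("Number does not fit in 13 bits")
    # 6 least significant bits of PicOrderCntVal; Python's "&" treats
    # negative integers as two's complement, which is assumed here.
    picture_id = pic_order_cnt_val & 0x3F
    word = (first << 19) | (number << 6) | picture_id
    return word.to_bytes(4, "big")
```

For a picture 10 CTBs wide, a lost tile covering CTB columns 2 to 4 of CTB rows 1 and 2 is not contiguous in raster-scan order, so sli_runs_for_tile(10, 2, 1, 3, 2) returns two runs, (12, 3) and (22, 3), and two SLI messages are needed.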

In contrast to the video codecs contemplated in [RFC4585] and [RFC5104], in HEVC, the "macroblock size" is not fixed to 16x16 luma samples, but is variable. That, however, does not create a conceptual difficulty with SLI, because the setting of the CTB size is a sequence-level functionality, and using a slice loss indication across CVS boundaries is meaningless as there is no prediction across sequence boundaries. However, a proper use of SLI messages is not as straightforward as it was with older, fixed-macroblock-sized video codecs, as the state of the sequence parameter set (where the CTB size is located) has to be taken into account when interpreting the "First" subfield in the FCI.

8.3. Reference Picture Selection Indication (RPSI)
Feedback-based reference picture selection has been shown to be a powerful tool to stop temporal error propagation for improved error resilience [Girod99][Wang05]. In one approach, the decoder side tracks errors in the decoded pictures and informs the encoder side that a particular picture that has been decoded relatively earlier is correct and still present in the decoded picture buffer; it requests the encoder to use that correct picture-availability information when encoding the next picture, so as to stop further temporal error propagation. For this approach, the decoder side should use the RPSI feedback message.

Encoders can encode some long-term reference pictures as specified in H.264 or HEVC for purposes described in the previous paragraph without the need of a huge decoded picture buffer. As shown in [Wang05], with a flexible reference picture management scheme, as in H.264 and HEVC, even a decoded picture buffer size of two picture storage buffers would work for the approach described in the previous paragraph.

The field "Native RPSI bit string defined per codec" is a base16 [RFC4648] representation of the 8 bits consisting of the 2 most significant bits equal to 0 and 6 bits of nuh_layer_id, as defined in [HEVC], followed by the 32 bits representing the value of the PicOrderCntVal (in network byte order), as defined in [HEVC], for the picture that is indicated by the RPSI feedback message.

The use of the RPSI feedback message as positive acknowledgement with HEVC is deprecated. In other words, the RPSI feedback message MUST only be used as a reference picture selection request, such that it can also be used in multicast.

8.4. Full Intra Request (FIR)
The purpose of the FIR message is to force an encoder to send an independent decoder refresh point as soon as possible (observing, for example, the congestion-control-related constraints set out in RFC 5104).

Upon reception of a FIR, a sender MUST send an IDR picture. Parameter sets MUST also be sent, except when there is a priori knowledge that the parameter sets have been correctly established. A typical example of that is an understanding between sender and receiver, established by means outside this document, that parameter sets are exclusively sent out-of-band.

9. Security Considerations
The scope of this Security Considerations section is limited to the payload format itself and to one feature of HEVC that may pose a particularly serious security risk if implemented naively.

The payload format, in isolation, does not form a complete system. Implementers are advised to read and understand relevant security-related documents, especially those pertaining to RTP (see the Security Considerations section in [RFC3550]), and the security of the call-control stack chosen (that may make use of the media type registration of this memo). Implementers should also consider known security vulnerabilities of video coding and decoding implementations in general and avoid those.

Within this RTP payload format, and with the exception of the user data SEI message as described below, no security threats other than those common to RTP payload formats are known. In other words, neither the various media-plane-based mechanisms, nor the signaling part of this memo, seems to pose a security risk beyond those common to all RTP-based systems.

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202] discusses, it is not an RTP payload format's responsibility to discuss or mandate what solutions are used to meet the basic security goals like confidentiality, integrity, and source authenticity for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms and important considerations in "Options for Securing RTP Sessions" [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms.

The rest of this section discusses the security-impacting properties of the payload format itself.

Because the data compression used with this payload format is applied end-to-end, any encryption needs to be performed after compression. A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the bitstream that are complex to decode and that cause the receiver to be overloaded. H.265 is particularly vulnerable to such attacks, as it is extremely simple to generate datagrams containing NAL units that affect the decoding process of many future NAL units. Therefore, the usage of data origin authentication and data integrity protection of at least the RTP packet is RECOMMENDED, for example, with SRTP [RFC3711].

Like [H.264], HEVC includes a user data Supplemental Enhancement Information (SEI) message. This SEI message allows inclusion of an arbitrary bitstring into the video bitstream. Such a bitstring could include JavaScript, machine code, and other active content. HEVC leaves the handling of this SEI message to the receiving system. In order to avoid harmful side effects of the user data SEI message, decoder implementations cannot naively trust its content. For example, it would be a bad and insecure implementation practice to forward any JavaScript a decoder implementation detects to a web browser. The safest way to deal with user data SEI messages is to simply discard them, but that can have negative side effects on the quality of experience by the user.

End-to-end security with authentication, integrity, or confidentiality protection will prevent a MANE from performing media-aware operations other than discarding complete packets. In the case of confidentiality protection, it will even be prevented from discarding packets in a media-aware way. To be allowed to perform such operations, a MANE is required to be a trusted entity that is included in the security context establishment.

10. Congestion Control
Congestion control for RTP SHALL be used in accordance with RTP [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. If best-effort service is being used, an additional requirement is that users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within an acceptable range. Packet loss is considered acceptable if a TCP flow across the same network path, and experiencing the same network conditions, would achieve an average throughput, measured on a reasonable timescale, that is not less than all RTP streams combined is achieving. This condition can be satisfied by implementing congestion-control mechanisms to adapt the transmission rate, the number of layers subscribed for a layered multicast session, or by arranging for a receiver to leave the session if the loss rate is unacceptably high. The bitrate adaptation necessary for obeying the congestion control principle is easily achievable when real-time encoding is used, for example, by adequately tuning the quantization parameter.
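The acceptability criterion above compares the combined rate of all RTP streams with the throughput a TCP flow would achieve on the same path. The following Python fragment is a minimal sketch of such a check. It assumes the simplified steady-state TCP model, rate = (MSS / RTT) * sqrt(3 / (2 * p)), which this memo does not prescribe; the function names and parameters are hypothetical and chosen for illustration only.

```python
import math


def tcp_friendly_rate_bps(mss_bytes, rtt_s, loss_rate):
    """Estimate steady-state TCP throughput in bits per second.

    Assumption of this sketch: the simplified model
    rate = (MSS / RTT) * sqrt(3 / (2 * p)). The memo does not
    mandate any particular throughput model.
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be within [0, 1]")
    if loss_rate == 0.0:
        # No observed loss: the model imposes no upper bound.
        return math.inf
    return (mss_bytes * 8 / rtt_s) * math.sqrt(3.0 / (2.0 * loss_rate))


def loss_is_acceptable(rtp_stream_rates_bps, mss_bytes, rtt_s, loss_rate):
    """Return True if the combined RTP rate does not exceed the
    estimated throughput of a TCP flow under the same conditions."""
    combined = sum(rtp_stream_rates_bps)
    return combined <= tcp_friendly_rate_bps(mss_bytes, rtt_s, loss_rate)


if __name__ == "__main__":
    # Two RTP streams, 1200-byte segments, 100 ms RTT, 1% packet loss.
    print(loss_is_acceptable([800_000, 200_000], 1200, 0.1, 0.01))
```

With the values shown, the model yields roughly 1.18 Mbit/s, so a combined RTP rate of 1 Mbit/s passes the check. A sender that fails the check would reduce its transmission rate, and a receiver could reduce the number of subscribed layers or leave the session.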
When pre-encoded content is being transmitted, however, bandwidth adaptation requires the pre-encoded bitstream to be tailored for such adaptivity. The key mechanism available in HEVC is temporal scalability. A media sender can remove NAL units belonging to higher temporal sub-layers (i.e., those NAL units with a high value of TID) until the sending bitrate drops to an acceptable range. HEVC contains mechanisms that allow the lightweight identification of switching points in temporal enhancement layers, as discussed in Section 1.1.2 of this memo. An HEVC media sender can send packets belonging to NAL units of temporal enhancement layers starting from these switching points to probe for available bandwidth and to utilize bandwidth that has been shown to be available.
The above mechanisms generally work within a defined profile and level and, therefore, no renegotiation of the channel is required. Only when non-downgradable parameters (such as profile) are required to be changed does it become necessary to terminate and restart the RTP stream(s). This may be accomplished by using different RTP payload types.
MANEs MAY remove certain unusable packets from the RTP stream when that RTP stream was damaged due to previous packet losses. This can help reduce the network load in certain special cases. For example, MANEs can remove those FUs where the leading FUs belonging to the same NAL unit have been lost or those dependent slice segments when the leading slice segments belonging to the same slice have been lost, because the trailing FUs or dependent slice segments are meaningless to most decoders. MANEs can also remove higher temporal scalable layers if the outbound transmission (from the MANE's viewpoint) experiences congestion.
11. IANA Considerations
A new media type, as specified in Section 7.1 of this memo, has been registered with IANA.
12. References
12.1. Normative References
[H.264] ITU-T, "Advanced video coding for generic audiovisual services", ITU-T Recommendation H.264, April 2013.
[HEVC] ITU-T, "High efficiency video coding", ITU-T Recommendation H.265, April 2013.
[ISO23008-2] ISO/IEC, "Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 2: High efficiency video coding", ISO/IEC 23008-2, 2013.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <http://www.rfc-editor.org/info/rfc2119>.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, DOI 10.17487/RFC3264, June 2002, <http://www.rfc-editor.org/info/rfc3264>.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003, <http://www.rfc-editor.org/info/rfc3550>.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, DOI 10.17487/RFC3551, July 2003, <http://www.rfc-editor.org/info/rfc3551>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711, March 2004, <http://www.rfc-editor.org/info/rfc3711>.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July 2006, <http://www.rfc-editor.org/info/rfc4566>.
[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, DOI 10.17487/RFC4585, July 2006, <http://www.rfc-editor.org/info/rfc4585>.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, <http://www.rfc-editor.org/info/rfc4648>.
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, February 2008, <http://www.rfc-editor.org/info/rfc5104>.
[RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February 2008, <http://www.rfc-editor.org/info/rfc5124>.
[RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008, <http://www.rfc-editor.org/info/rfc5234>.
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009, <http://www.rfc-editor.org/info/rfc5576>.
[RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding Dependency in the Session Description Protocol (SDP)", RFC 5583, DOI 10.17487/RFC5583, July 2009, <http://www.rfc-editor.org/info/rfc5583>.
12.2. Informative References
[3GPDASH] 3GPP, "Transparent end-to-end Packet-switched Streaming Service (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)", 3GPP TS 26.247 12.1.0, December 2013.
[3GPPFF] 3GPP, "Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244 12.20, December 2013.
[CABAC] Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, M., Clare, G., Henry, F., and Duenas, A., "Transform coefficient coding in HEVC", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp. 1765-1777, DOI 10.1109/TCSVT.2012.2223055, December 2012.
[Girod99] Girod, B. and Faerber, F., "Feedback-based error control for mobile video transmission", Proceedings of the IEEE, Vol. 87, No. 10, pp. 1707-1723, DOI 10.1109/5.790632, October 1999.
[H.265.1] ITU-T, "Conformance specification for ITU-T H.265 high efficiency video coding", ITU-T Recommendation H.265.1, October 2014.
[HEVCv2] Flynn, D., Naccari, M., Rosewarne, C., Sharman, K., Sole, J., Sullivan, G. J., and T. Suzuki, "High Efficiency Video Coding (HEVC) Range Extensions text specification: Draft 7", JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting, Valencia, Spain, March/April 2014.
[IS014496-12] ISO/IEC, "Information technology - Coding of audio-visual objects - Part 12: ISO base media file format", ISO/IEC 14496-12, 2015.
[IS015444-12] ISO/IEC, "Information technology - JPEG 2000 image coding system - Part 12: ISO base media file format", ISO/IEC 15444-12, 2015.
[JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, K., "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th JCT-VC meeting, Stockholm, Sweden, July 2012.
[MPEG2S] ISO/IEC, "Information technology - Generic coding of moving pictures and associated audio information - Part 1: Systems", ISO International Standard 13818-1, 2013.
[MPEGDASH] ISO/IEC, "Information technology - Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats", ISO International Standard 23009-1, 2012.
[RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, DOI 10.17487/RFC2326, April 1998, <http://www.rfc-editor.org/info/rfc2326>.
[RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, October 2000, <http://www.rfc-editor.org/info/rfc2974>.
[RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010, <http://www.rfc-editor.org/info/rfc6051>.
[RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP Payload Format for H.264 Video", RFC 6184, DOI 10.17487/RFC6184, May 2011, <http://www.rfc-editor.org/info/rfc6184>.
[RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis, "RTP Payload Format for Scalable Video Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011, <http://www.rfc-editor.org/info/rfc6190>.
[RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, <http://www.rfc-editor.org/info/rfc7201>.
[RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution", RFC 7202, DOI 10.17487/RFC7202, April 2014, <http://www.rfc-editor.org/info/rfc7202>.
[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, DOI 10.17487/RFC7656, November 2015, <http://www.rfc-editor.org/info/rfc7656>.
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, DOI 10.17487/RFC7667, November 2015, <http://www.rfc-editor.org/info/rfc7667>.
[RTP-MULTI-STREAM] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, "Sending Multiple Media Streams in a Single RTP Session", Work in Progress, draft-ietf-avtcore-rtp-multi-stream-11, December 2015.
[SDP-NEG] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating Media Multiplexing Using Session Description Protocol (SDP)", Work in Progress, draft-ietf-mmusic-sdp-bundle-negotiation-25, January 2016.
[Wang05] Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video coding using flexible reference frames", Visual Communications and Image Processing 2005 (VCIP 2005), Beijing, China, July 2005.
Acknowledgements
Muhammed Coban and Marta Karczewicz are thanked for discussions on the specification of the use with feedback messages and other aspects in this memo. Jonathan Lennox and Jill Boyce are thanked for their contributions to the PACI design included in this memo. Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and Tom Kristensen are thanked for their contributions to signaling related to parallel processing. Magnus Westerlund, Jonathan Lennox, Bernard Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, Danny Hong, Bo Burman, Ben Campbell, Brian Carpenter, Qin Wu, Stephen Farrell, and Min Wang made valuable review comments that led to improvements.
Authors' Addresses
Ye-Kui Wang
Qualcomm Incorporated
5775 Morehouse Drive
San Diego, CA 92121
United States
Phone: +1-858-651-8345
Email: yekui.wang@gmail.com

Yago Sanchez
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany
Phone: +49 30 31002-663
Email: yago.sanchez@hhi.fraunhofer.de

Thomas Schierl
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany
Phone: +49-30-31002-227
Email: thomas.schierl@hhi.fraunhofer.de

Stephan Wenger
Vidyo, Inc.
433 Hackensack Ave., 7th floor
Hackensack, NJ 07601
United States
Phone: +1-415-713-5473
Email: stewe@stewe.org

Miska M. Hannuksela
Nokia Corporation
P.O. Box 1000
33721 Tampere
Finland
Phone: +358-7180-08000
Email: miska.hannuksela@nokia.com