The speech and audio media capabilities defined in this specification are primarily intended for use as content formats in 5G Media Streaming, but their use is not restricted to this case. Parameters for audio encoders/decoders, content formats and transport are defined.
The present document defines:
Media decoding capabilities: the requirements for a receiver in terms of decoding
Media encoding capabilities: the requirements for a sender in terms of encoding
Operation Points: A collection of discrete combinations of different content formats and encoding formats. Operation Points are supported by:
Bitstream Requirements: A media bitstream that conforms to an audio or speech encoding format and to a given Operation Point.
Receiver Requirements: A function that can decode and play back, in real time, any Bitstream that conforms to a given Operation Point.
Sender Requirements: A function that can process and encode, in real time, any Bitstream that conforms to a given Operation Point.
The integration of each Operation Point into 5G Media Streaming, as defined in TS 26.501 and TS 26.511.
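The relationship between Operation Points and the bitstream, receiver and sender requirements can be sketched as a small data model. This is a minimal, hypothetical Python sketch: the names `OperationPoint`, `Bitstream`, `Endpoint`, `conforms` and the example format strings are illustrative assumptions, not definitions from this specification, and real-time operation is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationPoint:
    # Hypothetical model: one discrete combination of a content format
    # and an encoding format.
    name: str
    content_format: str   # illustrative values, e.g. "mono", "stereo"
    encoding_format: str  # illustrative values, e.g. "EVS", "xHE-AAC stereo"


@dataclass(frozen=True)
class Bitstream:
    # Hypothetical descriptor of a media bitstream.
    content_format: str
    encoding_format: str


def conforms(bitstream: Bitstream, op: OperationPoint) -> bool:
    """Bitstream requirement: the bitstream uses exactly the content
    format and encoding format of the Operation Point."""
    return (bitstream.content_format == op.content_format
            and bitstream.encoding_format == op.encoding_format)


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical sender or receiver. The same check applies to both roles:
    # it must handle ANY bitstream conforming to one of its Operation Points.
    # Real-time constraints are not modelled.
    operation_points: tuple

    def handles(self, bitstream: Bitstream) -> bool:
        return any(conforms(bitstream, op) for op in self.operation_points)


op = OperationPoint("OP-example", "stereo", "xHE-AAC stereo")
receiver = Endpoint((op,))
print(receiver.handles(Bitstream("stereo", "xHE-AAC stereo")))  # True
print(receiver.handles(Bitstream("mono", "EVS")))               # False
```

The check is deliberately exact-match: an Operation Point is a discrete combination of formats rather than a range, so a bitstream either matches one of the supported combinations or it does not.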
The following speech media decoding capabilities are defined:
AMR: All decoding requirements for the AMR speech codec as specified in TS 26.071, TS 26.090, TS 26.073 and TS 26.104, including all 8 modes and source-controlled rate operation (TS 26.093).
AMR-WB: All decoding requirements for the AMR-WB codec as specified in TS 26.171, TS 26.190, TS 26.173 and TS 26.204, including all 9 modes and source-controlled rate operation (TS 26.193).
EVS: All decoding requirements for the EVS codec as specified in TS 26.441, TS 26.445, TS 26.442 and TS 26.443 as described below including functions for backwards compatibility with AMR-WB (TS 26.446) and discontinuous transmission (TS 26.450).
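The AMR and AMR-WB entries require support for every codec mode. As an illustration, the sketch below lists the mode bit rates of the two codecs (8 AMR modes from 4.75 to 12.2 kbit/s, 9 AMR-WB modes from 6.60 to 23.85 kbit/s) and reports which mode indices a decoder would be missing. The helper `missing_modes` is hypothetical, and source-controlled rate operation (DTX/SID) is not modelled.

```python
# Codec mode bit rates in kbit/s, indexed by mode number.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)                  # 8 modes
AMR_WB_MODES_KBPS = (6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85)  # 9 modes

MODE_TABLES = {"AMR": AMR_MODES_KBPS, "AMR-WB": AMR_WB_MODES_KBPS}


def missing_modes(codec: str, supported_modes: set) -> list:
    """Hypothetical helper: return (mode index, bit rate) pairs that a
    decoder lacks, since the AMR and AMR-WB capabilities require all modes.
    DTX/SID handling is not checked here."""
    table = MODE_TABLES[codec]
    return [(mode, table[mode]) for mode in range(len(table))
            if mode not in supported_modes]


print(missing_modes("AMR", set(range(8))))     # []
print(missing_modes("AMR-WB", set(range(8))))  # [(8, 23.85)]
```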
The following audio media decoding capabilities are defined:
AMR-WB+: All decoding requirements for the AMR-WB+ audio codec as specified in TS 26.290, TS 26.304 and TS 26.273.
xHE-AAC stereo: All decoding requirements for the xHE-AAC stereo audio codec as specified in the MPEG-D USAC "Extended high efficiency AAC profile" as defined in ISO/IEC 23003-3 [37] as well as all processing requirements applicable to the MPEG-D DRC loudness control profile and to the dynamic range control profile, level 1 or higher, as specified in ISO/IEC 23003-4 [38].
AAC-ELDv2: the capability to decode MPEG-4 Low Delay AAC v2 Profile Level 2 bitstreams [49] and to output them as 2-channel audio. Note that this profile contains the audio object types 23 (ER AAC LD), 39 (ER AAC ELD) and 44 (LD MPEG Surround).
The following audio media encoding capabilities are defined:
xHE-AAC stereo: All encoding requirements for the xHE-AAC stereo audio codec as specified in the MPEG-D USAC "Baseline USAC" profile defined in ISO/IEC 23003-3 [37], with the additional requirement that all encoded media contain the required metadata sets conforming to the MPEG-D DRC loudness control profile or to the dynamic range control profile, level 1 or higher, as specified in ISO/IEC 23003-4 [38].
AAC-ELDv2: the capability to encode MPEG-4 Low Delay AAC v2 Profile Level 2 bitstreams according to ISO/IEC 14496-3 [49]. Note that this profile contains the audio object types 23 (ER AAC LD), 39 (ER AAC ELD) and 44 (LD MPEG Surround).
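The AAC-ELDv2 capabilities refer to bitstreams whose audio object types are 23, 39 or 44. A minimal sketch, assuming the stream is described by an MPEG-4 AudioSpecificConfig (ISO/IEC 14496-3), reads the audio object type from its 5-bit field (the escape value 31 is followed by a 6-bit extension added to 32) and checks membership in that set. The function names are hypothetical, and only the object type is checked, not the Level 2 constraints.

```python
# Audio object types contained in the MPEG-4 Low Delay AAC v2 profile.
LD_AAC_V2_AOTS = {23: "ER AAC LD", 39: "ER AAC ELD", 44: "LD MPEG Surround"}


def read_audio_object_type(asc: bytes) -> int:
    """Read audioObjectType from the start of an AudioSpecificConfig:
    a 5-bit field where the escape value 31 is followed by a 6-bit
    extension, giving 32 + extension."""
    if len(asc) < 2:
        raise ValueError("AudioSpecificConfig too short")
    bits = int.from_bytes(asc[:2], "big")  # 16 bits cover 5 + 6 = 11 bits
    aot = bits >> 11                        # top 5 bits
    if aot == 31:
        aot = 32 + ((bits >> 5) & 0x3F)     # next 6 bits
    return aot


def in_ld_aac_v2_profile(audio_object_type: int) -> bool:
    # Hypothetical check of the object type only; Level 2 limits are not checked.
    return audio_object_type in LD_AAC_V2_AOTS


# First two bytes of a synthetic config with only the object type bits set:
# escape value 31, then extension 7, i.e. AOT 39.
eld_asc = ((31 << 11) | (7 << 5)).to_bytes(2, "big")
aot = read_audio_object_type(eld_asc)
print(aot, LD_AAC_V2_AOTS[aot], in_ld_aac_v2_profile(aot))  # 39 ER AAC ELD True
```

Object types 39 and 44 are above 31, so a parser that reads only the first 5 bits would misidentify both of them; the escape handling is what makes the check correct for ELD and LD MPEG Surround.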
Multi-instance encoding and decoding capabilities are defined as follows:
<Media-Cap>-<N>: the capability to support up to N simultaneous decoding or encoding instances, each supporting the decoding or encoding capability <Media-Cap>.
For example, the EVS-2 decoding capability is the capability to simultaneously support two decoders, each with the EVS media capability according to clause 5.2.
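The `<Media-Cap>-<N>` naming convention can be parsed mechanically, with one subtlety: several capability names themselves contain hyphens (AMR-WB, xHE-AAC, AAC-ELDv2), so only a trailing `-<positive integer>` is taken as the instance count. The sketch below uses hypothetical helpers `parse_capability` and `covers`; treating a label without a numeric suffix as a single instance is an assumption, not something this clause states.

```python
import re

# Media capability names used in this clause.
KNOWN_CAPS = {"AMR", "AMR-WB", "EVS", "AMR-WB+", "xHE-AAC stereo", "AAC-ELDv2"}

# Only a trailing "-<positive integer>" is an instance count, because some
# capability names (AMR-WB, xHE-AAC, AAC-ELDv2) contain hyphens themselves.
_MULTI_INSTANCE = re.compile(r"^(?P<cap>.+)-(?P<n>[1-9][0-9]*)$")


def parse_capability(label: str) -> tuple:
    """Split '<Media-Cap>-<N>' into (media capability, N). A label without
    a numeric suffix is treated as N = 1 (an assumption of this sketch)."""
    match = _MULTI_INSTANCE.match(label)
    cap, n = (match.group("cap"), int(match.group("n"))) if match else (label, 1)
    if cap not in KNOWN_CAPS:
        raise ValueError(f"unknown media capability: {cap!r}")
    return cap, n


def covers(offered: str, cap: str, instances: int) -> bool:
    # '<Media-Cap>-<N>' supports UP TO N simultaneous instances.
    offered_cap, n = parse_capability(offered)
    return offered_cap == cap and instances <= n


print(parse_capability("EVS-2"))      # ('EVS', 2)
print(parse_capability("AMR-WB+-3"))  # ('AMR-WB+', 3)
print(covers("EVS-2", "EVS", 3))      # False
```

With this helper, `EVS-2` covers a demand for one or two simultaneous EVS instances but not three, matching the "up to N" wording.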