The speech and audio Operation Points defined in this clause are intended primarily as content formats for 5G Media Streaming, but their use is not restricted to that context.
An Operation Point is a combination of rendering formats and media decoding capabilities.
For each Operation Point, Bitstream and Receiver requirements are detailed in the remainder of clause 6.
Table 6.1 provides an overview of the Operation Points defined in the present document.
Receivers conforming to the AMR Operation Point shall support the AMR speech media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the AMR Operation Point shall support the AMR speech media encoding capability according to clause 5.3 in real-time for any speech source format with sampling frequency 8 kHz.
Receivers conforming to the AMR-WB Operation Point shall support the AMR-WB speech media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the AMR-WB Operation Point shall support the AMR-WB speech media encoding capability according to clause 5.3 in real-time for any speech source format with sampling frequency 16 kHz.
Receivers conforming to the EVS Operation Point shall support the EVS speech media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the EVS Operation Point shall support the EVS speech media encoding capability according to clause 5.3 in real-time for any speech source format with sampling frequency 8, 16, 32 or 48 kHz.
Receivers conforming to the eAAC+ stereo Operation Point shall support the eAAC+ media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the eAAC+ stereo Operation Point shall support the eAAC+ stereo audio media encoding capability according to clause 5.3 in real-time for any stereo audio source format with sampling frequency 32 kHz, 44.1 kHz or 48 kHz.
Receivers conforming to the AMR-WB+ Operation Point shall support the AMR-WB+ media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the AMR-WB+ Operation Point shall support the AMR-WB+ audio media encoding capability according to clause 5.3 in real-time for any stereo audio source format with sampling frequency 8, 16, 32 or 48 kHz.
The following requirements apply to the xHE-AAC stereo Operation Point.
The sampling frequency shall be either 32 kHz, 44.1 kHz or 48 kHz.
The bitstream shall be encoded according to the MPEG-D USAC "Baseline USAC" profile as defined in ISO/IEC 23003-3 [37] and shall contain the metadata sets conforming to the MPEG-D DRC loudness control profile or to the dynamic range control profile, level 1 or higher, as specified in ISO/IEC 23003-4 [38].
Receivers conforming to the xHE-AAC stereo Operation Point shall support the xHE-AAC stereo media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the xHE-AAC stereo Operation Point shall support the xHE-AAC audio media encoding capability according to clause 5.3 in real-time for any stereo audio source format with sampling frequency 32 kHz, 44.1 kHz or 48 kHz.
The following requirements apply to the IVAS Operation Point.
The input audio format shall be either mono, stereo, binaural, multi-channel (5.1, 5.1.2, 5.1.4, 7.1, 7.1.4), scene-based (Ambisonics up to 3rd order), metadata-assisted spatial audio (MASA), object-based, a combined format of objects with scene-based (OSBA), or a combined format of objects with metadata-assisted spatial audio (OMASA).
The sampling frequency shall be 8 kHz (EVS interoperable coding only), 16 kHz, 32 kHz or 48 kHz (fullband audio content).
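The two IVAS bitstream constraints above can be sketched as a simple input check. This is an illustrative sketch only, not part of the requirements: the function name `check_ivas_input`, the short format labels and the `evs_interoperable` flag are assumptions introduced here, and the Ambisonics order limit is not checked.

```python
from typing import List

# Hypothetical short labels for the input audio formats listed for the IVAS Operation Point.
IVAS_INPUT_FORMATS = frozenset({
    "mono", "stereo", "binaural",
    "5.1", "5.1.2", "5.1.4", "7.1", "7.1.4",   # multi-channel layouts
    "SBA",      # scene-based (Ambisonics up to 3rd order; order not checked here)
    "MASA",     # metadata-assisted spatial audio
    "objects",  # object-based
    "OSBA",     # objects combined with scene-based audio
    "OMASA",    # objects combined with MASA
})

# Sampling frequencies in Hz listed for the IVAS Operation Point.
IVAS_SAMPLING_HZ = frozenset({8000, 16000, 32000, 48000})


def check_ivas_input(audio_format: str, sampling_hz: int,
                     evs_interoperable: bool = False) -> List[str]:
    """Return a list of constraint violations (an empty list means none found)."""
    problems = []
    if audio_format not in IVAS_INPUT_FORMATS:
        problems.append(f"input audio format {audio_format!r} is not listed")
    if sampling_hz not in IVAS_SAMPLING_HZ:
        problems.append(f"sampling frequency {sampling_hz} Hz is not listed")
    elif sampling_hz == 8000 and not evs_interoperable:
        # Assumption: 8 kHz is treated as valid only for EVS interoperable coding.
        problems.append("8 kHz is only listed for EVS interoperable coding")
    return problems
```

For example, `check_ivas_input("stereo", 48000)` returns an empty list, while `check_ivas_input("mono", 8000)` reports the 8 kHz restriction unless `evs_interoperable=True` is passed.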
Receivers conforming to the IVAS Operation Point shall support the IVAS media decoding capability according to clause 5.2 and shall support rendering and playback of the decoded signal.
Senders conforming to the IVAS Operation Point shall support the IVAS-Enc media encoding capability according to clause 5.3 in real-time for the audio formats according to the supported IVAS codec level 1, 2 or 3 as mono, stereo, binaural, multi-channel (5.1, 5.1.2, 5.1.4, 7.1, 7.1.4), scene-based (Ambisonics up to 3rd order), metadata-assisted spatial audio (MASA), or object-based, with sampling frequency 8 kHz (EVS interoperable coding only), 16 kHz, 32 kHz or 48 kHz (fullband audio content).
Receivers conforming to the AAC-ELDv2 Operation Point shall support the AAC-ELDv2 media decoding capability according to clause 5.2 and shall support playback of the decoded signal.
Senders conforming to the AAC-ELDv2 Operation Point shall support the AAC-ELDv2 audio media encoding capability according to clause 5.3 in real-time for any stereo audio source format with sampling frequency 32 kHz, 44.1 kHz or 48 kHz.
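As a reading aid, the sender-side source sampling frequencies stated in this clause can be collected into a lookup. The sketch below is illustrative and non-normative: the dictionary `SENDER_SAMPLING_HZ` and the helper `sender_accepts` are hypothetical names, and the values are transcribed from the sender requirements above, expressed in Hz.

```python
# Source sampling frequencies (Hz) named in the sender requirement of each Operation Point.
SENDER_SAMPLING_HZ = {
    "AMR": {8000},
    "AMR-WB": {16000},
    "EVS": {8000, 16000, 32000, 48000},
    "eAAC+ stereo": {32000, 44100, 48000},
    "AMR-WB+": {8000, 16000, 32000, 48000},
    "xHE-AAC stereo": {32000, 44100, 48000},
    # For IVAS, 8 kHz applies to EVS interoperable coding only (not modelled here).
    "IVAS": {8000, 16000, 32000, 48000},
    "AAC-ELDv2": {32000, 44100, 48000},
}


def sender_accepts(operation_point: str, sampling_hz: int) -> bool:
    """Return True if the sampling frequency is listed for the given Operation Point."""
    if operation_point not in SENDER_SAMPLING_HZ:
        raise ValueError(f"unknown Operation Point: {operation_point!r}")
    return sampling_hz in SENDER_SAMPLING_HZ[operation_point]
```

For instance, `sender_accepts("xHE-AAC stereo", 44100)` is true, whereas `sender_accepts("EVS", 44100)` is false because 44.1 kHz is not among the EVS source sampling frequencies.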