The codec for Immersive Voice and Audio Services (IVAS) is part of a framework comprising an encoder, a decoder, and a renderer. An overview of the audio processing functions of the receive side of the codec is shown in Figure 4.1-1. This diagram is based on [2], with the rendering features highlighted.
Interfaces:
3:	Encoded audio frames (50 frames/s), with the number of bits depending on the IVAS codec mode
4:	Encoded Silence Insertion Descriptor (SID) frames
5:	RTP payload packets
6:	Lost frame indicator (BFI)
7:	Renderer configuration data
8:	Head-tracker pose information and scene orientation control data
9:	Audio output channels (16-bit linear PCM, sampled at 8 (EVS only), 16, 32, or 48 kHz)
10:	Metadata associated with the output audio
Please note that the interface numbering is consistent with the IVAS General Overview [2].
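As an informative illustration only, the following C sketch shows how a caller might group these per-frame inputs and outputs. The type and field names are hypothetical and are not defined by the IVAS specifications; they merely mirror the numbered interfaces above.

/* Informative sketch only: the type and field names below are hypothetical and
 * are not defined by the IVAS specifications; they merely mirror the numbered
 * interfaces above. */
#include <stdint.h>
#include <stddef.h>

typedef struct
{
    /* Interfaces 3/4: encoded audio or SID frame (50 frames/s) */
    const uint8_t *frame_bits;       /* payload extracted from RTP (interface 5)   */
    size_t         frame_size_bits;  /* depends on the IVAS codec mode             */
    /* Interface 6: lost-frame indication */
    int            bfi;              /* nonzero signals a lost frame               */
    /* Interfaces 7/8: rendering control data */
    const void    *renderer_config;  /* e.g. HRIR/BRIR filter set, reverb params   */
    float          head_pose[4];     /* listener orientation, e.g. as a quaternion */
    /* Interfaces 9/10: output */
    int16_t       *pcm_out;          /* 16-bit linear PCM at 16, 32 or 48 kHz      */
    void          *out_metadata;     /* metadata associated with the output audio  */
} ivas_frame_io_sketch;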
Rendering is the process of generating digital audio output from the decoded digital audio signal. Rendering is used when the output format differs from the input format; if the output format is the same as the input format, the decoded audio channels are simply passed through to the output channels. Binaural rendering is a special case in which binaural output channels are prepared for headphone reproduction. This process includes head tracking and scene orientation control, head-related transfer function processing, and room acoustic synthesis. Rendering for loudspeaker reproduction is also supported for preset or custom loudspeaker configurations.
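As an informative illustration of the pass-through rule above, the following C sketch shows the format-dependent decision; all names are hypothetical and do not correspond to the IVAS API.

#include <stddef.h>
#include <string.h>

/* Informative sketch of the pass-through rule described above; all names are
 * hypothetical and are not part of the IVAS API. */
typedef enum { FMT_STEREO_SKETCH, FMT_5_1_SKETCH, FMT_BINAURAL_SKETCH } audio_fmt_sketch;

typedef void ( *render_fn_sketch )( audio_fmt_sketch in_fmt, audio_fmt_sketch out_fmt,
                                    const float *decoded, float *output, size_t n );

static void deliver_frame_sketch( audio_fmt_sketch in_fmt, audio_fmt_sketch out_fmt,
                                  const float *decoded, float *output, size_t n,
                                  render_fn_sketch render )
{
    if ( in_fmt == out_fmt )
    {
        /* Output format equals input format: pass the decoded channels through. */
        memcpy( output, decoded, n * sizeof( float ) );
    }
    else
    {
        /* Formats differ: render to the requested layout, e.g. a loudspeaker
         * downmix/upmix or binauralization (HRTF processing, head tracking,
         * room acoustic synthesis). */
        render( in_fmt, out_fmt, decoded, output, n );
    }
}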
IVAS rendering is available as an integral component of the IVAS decoder (internal renderer) or can be operated standalone as an external renderer. The external renderer can be applied, e.g., when rendering outputs originating from multiple sources, such as decoders or audio streams.
IVAS rendering features reflect related design constraints, including:
-	support for provisioning of HRIR/BRIR filter sets as control data for binaural rendering; the format of the HRIR/BRIR data is provided in clause 5.10 of TS 26.258,
-	support for default HRIR/BRIR sets for binaural rendering,
-	support for head-tracking data as control data for binaural audio rendering, in quaternions and in Euler notation; the format of the head-tracking data is provided in clause 5.11 of TS 26.258 (see the sketch after this list),
-	support for binaural reverb and early reflections controlled by reverb parameters; the format of the reverb parameters is provided in clause 5.14.1 and in Annex B of TS 26.258.
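As an informative illustration of the relationship between the two head-tracking notations, the following C sketch converts Euler angles to a quaternion assuming the common intrinsic yaw-pitch-roll (ZYX) convention with angles in radians; the conventions actually used for IVAS head-tracking data are those defined in clause 5.11 of TS 26.258.

#include <math.h>

/* Informative sketch of one common Euler-to-quaternion conversion: intrinsic
 * yaw-pitch-roll (ZYX) order with angles in radians. The conventions actually
 * used for IVAS head-tracking data are defined in clause 5.11 of TS 26.258. */
typedef struct { float w, x, y, z; } quat_sketch;

static quat_sketch euler_to_quat_sketch( float yaw, float pitch, float roll )
{
    const float cy = cosf( 0.5f * yaw ),   sy = sinf( 0.5f * yaw );
    const float cp = cosf( 0.5f * pitch ), sp = sinf( 0.5f * pitch );
    const float cr = cosf( 0.5f * roll ),  sr = sinf( 0.5f * roll );

    quat_sketch q;
    q.w = cr * cp * cy + sr * sp * sy;
    q.x = sr * cp * cy - cr * sp * sy;
    q.y = cr * sp * cy + sr * cp * sy;
    q.z = cr * cp * sy - sr * sp * cy;
    return q;
}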
A special feature of the renderer is its support for split operation, with pre-rendering and transcoding to a head-trackable intermediate representation that can be transmitted to a post-rendering end device. This allows a large part of the processing load and memory requirements of IVAS decoding and rendering to be moved to a (more) capable node/UE, offloading the final rendering end device. The IVAS-specific split rendering functionality is mostly described in TS 26.253, whereas more generic split rendering functionality is specified in TS 26.249.
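As an informative illustration of this split, the following C declarations sketch the two stages. The type and function names, as well as the choice of intermediate representation, are hypothetical; the normative processing is specified in TS 26.253 and TS 26.249.

#include <stdint.h>

/* Informative sketch of the split-rendering flow; the type and function names
 * and the intermediate representation are hypothetical. The normative
 * processing is specified in TS 26.253 and TS 26.249. */
typedef struct ivas_bitstream_sketch      ivas_bitstream_sketch;      /* received encoded IVAS stream     */
typedef struct intermediate_stream_sketch intermediate_stream_sketch; /* head-trackable pre-rendered data */
typedef struct { float w, x, y, z; } head_pose_sketch;                /* pose from the end-device tracker */

/* Capable node/UE: full IVAS decoding, pre-rendering and transcoding into the
 * head-trackable intermediate representation that is transmitted downstream. */
const intermediate_stream_sketch *pre_render_sketch( const ivas_bitstream_sketch *bs );

/* Lightweight end device: only the final, pose-dependent post-rendering of the
 * received intermediate representation for headphone playback. */
void post_render_sketch( const intermediate_stream_sketch *im,
                         const head_pose_sketch *pose,
                         int16_t *binaural_pcm_out );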
This document provides a high-level specification of the internal renderer (clause 5) and the external renderer (clause 6). Furthermore, the rendering library interface is provided (clause 7). Split rendering is described at a high level in clause 8. Specific rendering algorithms and processing paths are out of scope of this specification and are provided in TS 26.253.