The present document defines a frame loss concealment procedure, also termed frame substitution and muting procedure, which is executed by the Immersive Voice and Audio Services (IVAS) decoder when one or more frames (speech, audio or SID frames) are unavailable for decoding due to, e.g., packet loss, corruption of a packet or late arrival of a packet.
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
For the purposes of the present document, the terms given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.
Further IVAS codec specific definitions are found in clause 3.1 of TS 26.253.
For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
BFI	Bad Frame Indicator
Packet loss concealment serves to ensure the availability of useful audio output when valid packets are unavailable to the decoder. Such losses are typically a result of impaired channel conditions such as transmission errors or network congestion. The aim is to synthesize a substitute for the decoded audio represented by the lost packet, to prepare for a potential future packet loss, and to handle the transition from the concealment operation back to the decoded audio. The latter is also referred to as the recovery operation.
An overview of the IVAS codec's decoder operation is given in clause 6.1 of TS 26.253, where Figure 6.1-1 shows the functional structure of the decoder. To complement that picture with the packet loss concealment functionality, Figure 1 below shows the packet loss concealment (PLC) operation of the decoder.
A major part of the PLC resides in the core decoding tools, where the audio decoding is mainly handled by the core decoder based on EVS [2]. The Single Channel Element (SCE) decoder comprises one core-decoder, the Channel Pair Element (CPE) decoder comprises one or two core-decoders, and the Multichannel Coding Tool (MCT) comprises joint decoding using multiple core-decoders, all including associated PLC methods. For the Low Frequency Effect (LFE) channel of multichannel audio, an LFE decoder with an associated PLC method is available. Spatial metadata, including spatial coding parameters, are reconstructed in the spatial parameter decoders or by the associated PLC methods of the respective spatial audio formats. Spatial audio output is finally generated by a scene decoder, upmixer and renderer based on the reconstructed transport channels and the reconstructed spatial metadata.
In case of a missing or corrupted packet, a bad frame indicator (BFI) is input to the decoding tools, activating the PLC operation. Notably, scene decoding, upmixing and rendering processing are independent of the bad frame indicator.
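The role of the bad frame indicator described above can be illustrated with a minimal Python sketch. It is not taken from the IVAS specification or reference code: the names `DecoderState` and `decode_frame` are hypothetical, the single function stands in for all decoding tools (SCE, CPE, MCT, LFE and spatial parameter decoders), and plain repetition of the last output frame is used only as a placeholder for the actual substitution methods.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DecoderState:
    """Hypothetical decoder memory carried from frame to frame."""
    last_output: List[float]   # most recent output frame (decoded or concealed)
    prev_bfi: bool = False     # True if the previous frame was concealed


def decode_frame(payload: Optional[List[float]], bfi: bool,
                 state: DecoderState) -> Tuple[List[float], str]:
    """Return (output_frame, mode), mode being 'decode', 'conceal' or 'recover'."""
    if bfi or payload is None:
        # BFI set: no valid packet, so synthesize a substitute frame.
        # Placeholder method (assumption): repeat the last output frame.
        output = list(state.last_output)
        mode = "conceal"
    else:
        # Valid packet: placeholder for regular decoding.
        output = list(payload)
        # First good frame after a loss corresponds to the recovery operation.
        mode = "recover" if state.prev_bfi else "decode"

    # Update memory so that a potential future loss can be concealed.
    state.last_output = output
    state.prev_bfi = (mode == "conceal")
    return output, mode
```

Feeding the sequence good frame, lost frame, good frame through `decode_frame` yields the modes `decode`, `conceal` and `recover` in turn, mirroring the three aims listed above: substitution, preparation for a further loss, and recovery.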
The IVAS split rendering feature exposes a further interface, namely the interface between the entity carrying out pre-rendering and encoding into the intermediate audio representation and the end device performing post-rendering. To cope with potential transmission errors on that interface, the decoder of the intermediate audio representation features packet loss concealment techniques in addition to the actual decoding scheme. Note that the IVAS-specific split rendering functionality, including packet loss concealment, is mostly described in TS 26.253, whereas more generic split rendering functionality is specified in TS 26.249.
Since IVAS is based on the Codec for Enhanced Voice Services (EVS) [2], the main functionality of the core-codec is inherited from EVS. This includes the error concealment operations as described in TS 26.447, and for mono operation the functionality is implemented in a bit-exact manner. IVAS provides a few enhancements of the core-coder on top of EVS.
The following clauses in TS 26.253 describe the enhancements that have been made for the error concealment operation for the IVAS core-coder:
In addition to the mono operation with EVS compatibility, IVAS supports stereo, Independent Streams with Metadata (ISM), multi-channel audio (MC), scene-based audio (Ambisonics or SBA), metadata assisted spatial audio (MASA), combinations of objects with MASA (OMASA) and combinations of objects with scene-based audio (OSBA). To handle the variation in audio formats across the supported range of input audio channels and bit rates, several dedicated encoding and decoding modules are employed. The general principle is that the parameters are recycled from the previously decoded frame, but further concealment operations may also be performed on the parameters. The following clauses in TS 26.253 describe the error concealment operations within each of the modules decoding the various audio formats.
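The general principle stated above, recycling the parameters of the previously decoded frame with optional further processing, can be sketched as follows. This is an illustrative assumption rather than the method of any particular IVAS module: the function name `conceal_parameters`, the parameter names `gain` and `azimuth`, and the per-frame attenuation factor are hypothetical, and the actual per-format operations are those defined in TS 26.253.

```python
from typing import Dict, Optional, Sequence


def conceal_parameters(received: Optional[Dict[str, float]],
                       previous: Dict[str, float],
                       fade: float = 0.5,
                       faded_keys: Sequence[str] = ("gain",)) -> Dict[str, float]:
    """Return the parameter set to use for the current frame.

    received   -- parameters decoded from the current frame, or None if lost
    previous   -- parameters used for the previous frame (decoded or concealed)
    fade       -- hypothetical attenuation applied once per lost frame
    faded_keys -- parameters that receive the extra concealment operation
    """
    if received is not None:
        # Good frame: use the decoded parameters unchanged.
        return dict(received)

    # Lost frame: recycle the parameters of the previous frame ...
    concealed = dict(previous)
    # ... and optionally apply a further concealment operation to some of them.
    for key in faded_keys:
        if key in concealed:
            concealed[key] *= fade
    return concealed
```

Because the output of one lost frame serves as `previous` for the next, the attenuated parameters decay geometrically over a burst of losses while the remaining parameters are held constant.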
The intermediate audio format of the IVAS split rendering feature comprises coded pose correction metadata and coded binaural audio. The binaural audio may be encoded using the LCLD coding format or the LC3plus coding format. The respective PLC schemes are described in TS 26.253 as follows: