Content for TS 26.253 Word version: 18.1.1

1 Scope

The present document is a detailed description of the signal processing algorithms of the Immersive Voice and Audio Services (IVAS) coder, including the IVAS renderer.

2 References

[1] TR 21.905: "Vocabulary for 3GPP Specifications".

[2] TS 26.441: "Codec for Enhanced Voice Services (EVS); General Overview".

[3] TS 26.445: "Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description".

[4] TS 26.447: "Codec for Enhanced Voice Services (EVS); Error concealment of lost packets".

[5] TS 26.448: "Codec for Enhanced Voice Services (EVS); Jitter Buffer Management".

[6] TS 26.250: "Codec for Immersive Voice and Audio Services (IVAS); General overview".

[7] TS 26.251: "Codec for Immersive Voice and Audio Services (IVAS); C code (fixed-point)". (At the time of writing, still a draft.)

[8] TS 26.252: "Codec for Immersive Voice and Audio Services (IVAS); Test Sequences".

[9] TS 26.254: "Codec for Immersive Voice and Audio Services (IVAS); Rendering".

[10] TS 26.255: "Codec for Immersive Voice and Audio Services (IVAS); Error concealment of lost packets".

[11] TS 26.256: "Codec for Immersive Voice and Audio Services (IVAS); Jitter Buffer Management".

[12] TS 26.258: "Codec for Immersive Voice and Audio Services (IVAS); C code (floating point)".

[13] C. de Boor and K. Höllig, "B-splines without divided differences", in Geometric Modeling, G. Farin (ed.), SIAM, pp. 21-27, 1987.

[14] C. Borß, "A Polygon-Based Panning Method for 3D Loudspeaker Setups", Audio Engineering Society Convention 137, Los Angeles, USA, Oct. 2014.

[15] F. Zotter and M. Frank, "All-Round Ambisonic Panning and Decoding", J. Audio Eng. Soc., vol. 60, no. 10, pp. 807-820, Oct. 2012.

[16] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small room acoustics", Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.

[17] G. E. P. Box and D. R. Cox, "An analysis of transformations", Journal of the Royal Statistical Society, Series B, vol. 26, pp. 211-252, 1964.

[18] TS 26.118: "Virtual Reality (VR) profiles for streaming applications".

[19] J. Ivanic and K. Ruedenberg, "Rotation matrices for real spherical harmonics. Direct determination by recursion", The Journal of Physical Chemistry, vol. 100, no. 15, pp. 6342-6347, 1996.

[20] C. R. Helmrich and B. Edler, "Signal-Adaptive Transform Kernel Switching for Stereo Audio Coding", in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2015.

[21] AES69-2022: "AES standard for file exchange - Spatial acoustic data", Audio Engineering Society, 2022.

[22] F. Thomas, "Approaching Dual Quaternions From Matrix Algebra", IEEE Transactions on Robotics, vol. 30, no. 5, pp. 1037-1048, Oct. 2014.

[23] J. Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimedia" [Representation of sound fields, application to the transmission and reproduction of complex sound scenes in a multimedia context], PhD thesis, Université Paris 6, 2001.

[24] M. Chapman, "A Standard for Interchange of Ambisonic Signal Sets. Including a file standard with metadata", Ambisonics Symposium 2009, Graz, June 25-27, 2009.

[25] ISO/IEC 23091-3:2018: "Information technology - Coding-independent code points - Part 3: Audio".

[26] ISO/IEC 23008-3:2015: "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio".

[27] TS 26.249: "Immersive Audio for Split Rendering Scenarios; Detailed Algorithmic Description of Split Rendering Functions".

[28] ETSI TS 103 634: "Digital Enhanced Cordless Telecommunications (DECT); Low Complexity Communication Codec plus (LC3plus)".

[29] ETSI TR 103 633: "Digital Enhanced Cordless Telecommunications (DECT); Low Complexity Communication Codec plus (LC3plus); Performance characterization".

[30] ETSI TS 103 624: "Characterization Methodology and Requirement Specifications for the ETSI LC3plus speech codec".

[31] Bluetooth Special Interest Group, "Basic Audio Profile", version 1.0.1.

[r1] RFC 4566 (2006): "SDP: Session Description Protocol", M. Handley, V. Jacobson and C. Perkins.

[r2] TS 26.114: "IP Multimedia Subsystem (IMS); Multimedia Telephony; Media handling and interaction".

[r3] RFC 3550 (2003): "RTP: A Transport Protocol for Real-Time Applications", Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson.

[r4] RFC 3551 (2003): "RTP Profile for Audio and Video Conferences with Minimal Control", Schulzrinne, H. and S. Casner.

[r5] RFC 4867 (2007): "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie.

[r6] RFC 7160 (2014): "Support for Multiple Clock Rates in an RTP Session", Petit-Huguenin, M. and G. Zorn, Ed.

[X1] ETSI TS 103 634: "Digital Enhanced Cordless Telecommunications (DECT); Low Complexity Communication Codec plus (LC3plus)".

3 Definitions of terms, symbols and abbreviations

3.1 Terms

For the purposes of the present document, the terms given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.

frame: an array of audio samples or metadata spanning a duration of 20 ms.
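As a worked example of this definition, the number of samples in one frame, L_frame, is the sample rate multiplied by 0.02 s; at 48 kHz this gives 48 000 × 0.02 = 960 samples. The Python sketch below computes this for a few sample rates. The function name `frame_length_samples` is hypothetical and is not taken from the IVAS reference code.

```python
def frame_length_samples(fs_hz: int, frame_ms: int = 20) -> int:
    """Return the number of samples in one frame of frame_ms milliseconds.

    Hypothetical helper for illustration only; the 20 ms default follows
    the frame definition. Raises ValueError if the frame would not contain
    a whole number of samples.
    """
    n, rem = divmod(fs_hz * frame_ms, 1000)
    if rem:
        raise ValueError("sample rate does not give an integer frame length")
    return n


if __name__ == "__main__":
    # Print L_frame for some common sample rates.
    for fs in (8000, 16000, 32000, 48000):
        print(f"fs = {fs:5d} Hz -> L_frame = {frame_length_samples(fs)} samples")
```

Integer arithmetic (divmod) is used so that the result is exact and a non-integer frame length is reported rather than silently rounded.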

3.2 Symbols

For the purposes of the present document, the following symbols and conventions for mathematical expressions apply:

E: Energy

Sample rate

Gain

Metadata

L: Length of an audio buffer in samples (e.g., L_frame is the length of a frame in samples)

M: Mode (e.g., M_element is the element mode and M_core is the core-coder mode)

Number of audio channels

Bitrate

s: Audio signal in the time domain

S: Audio signal in the frequency domain (spectrum)

s(n): the nth sample of the audio signal s

E(b): the bth band of the energy vector E

S(k): the kth frequency bin

S(k,n): the nth time slot of the kth frequency bin of the discrete time-frequency spectrum

s(m;n): the nth sample within the mth subframe

s(t): the signal s at time instant t in the continuous time domain

S_i, s_i: the lower index i indicates the ith channel (input or transport) of a multi-channel signal; indexing starts from 1

s_HP20(n): HP20 high-pass-filtered time-domain signal

s_inp(n): input signal to the IVAS encoder

S^MDFT: the superscript indicates the type of frequency-domain transform (here MDFT; FFT and CLDFB are indicated in the same way)

Quantized (coded) version (of a frequency-domain audio signal)

Mean value (of energy)

s^[-1](n): the upper index indicates a particular frame, e.g., [-1] refers to the previous frame; when omitted, the current frame is assumed by default
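To make the frame and subframe indexing conventions concrete, the Python sketch below models s(n), the previous-frame notation s^[-1](n) and the subframe notation s(m;n). It is illustrative only: the class `FrameHistory`, its methods, the number of subframes per frame and the use of 0-based sample and subframe indices are all assumptions of this sketch, not part of the specification (which only states that channel indices start from 1).

```python
from typing import List, Sequence


class FrameHistory:
    """Hypothetical container illustrating the frame/subframe notation.

    frames[0] holds the current frame s(n), frames[1] the previous frame
    s^[-1](n), and so on. Sample and subframe indices are 0-based here,
    which is an assumption of this sketch.
    """

    def __init__(self, frame_len: int, n_subframes: int, depth: int = 2) -> None:
        if frame_len % n_subframes:
            raise ValueError("frame_len must be divisible by n_subframes")
        self.frame_len = frame_len
        self.n_subframes = n_subframes
        self.subfr_len = frame_len // n_subframes
        # Start with `depth` all-zero frames (current frame plus history).
        self.frames: List[List[float]] = [[0.0] * frame_len for _ in range(depth)]

    def push(self, frame: Sequence[float]) -> None:
        """Make `frame` the current frame; older frames move back by one."""
        if len(frame) != self.frame_len:
            raise ValueError("frame has the wrong length")
        self.frames.insert(0, list(frame))
        self.frames.pop()  # discard the oldest frame to keep a fixed depth

    def s(self, n: int, frame: int = 0) -> float:
        """Return s^[frame](n); frame=0 is the current frame, -1 the previous."""
        if not -len(self.frames) < frame <= 0:
            raise IndexError("frame index out of range")
        return self.frames[-frame][n]

    def s_sub(self, m: int, n: int, frame: int = 0) -> float:
        """Return s(m;n), the nth sample within the mth subframe."""
        if not 0 <= m < self.n_subframes or not 0 <= n < self.subfr_len:
            raise IndexError("subframe or sample index out of range")
        return self.s(m * self.subfr_len + n, frame)
```

For example, with `h = FrameHistory(frame_len=8, n_subframes=4)` and two pushed frames, `h.s(3)` reads sample 3 of the current frame, `h.s(3, frame=-1)` reads the same sample of the previous frame, and `h.s_sub(2, 1)` reads sample 2 × 2 + 1 = 5 of the current frame.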

3.3 Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.

ACELP: Algebraic Code-Excited Linear Prediction

AGC: Adaptive Gain Control

AllRAD: All-Round Ambisonics Decoding

BPF: Bass PostFilter

BRIR: Binaural Room Impulse Response

DirAC: Directional Audio Coding

CBR: Constant Bit Rate

CLDFB: Complex Low-Delay FilterBank

CNA: Comfort Noise Addition

CNG: Comfort Noise Generation

CPE: Channel Pair Element

DFT: Discrete Fourier Transform

DoA: Direction of Arrival

DoF: Degree of Freedom

DTX: Discontinuous Transmission

EC: Entropy Coding

EFAP: Edge Fading Amplitude Panning

FEC: Frame Erasure Concealment

FOA: First-Order Ambisonics

GCC-PHAT: Generalized Cross-Correlation with PHAse Transform

HQ: High-Quality

HOA: Higher-Order Ambisonics

HRIR: Head-Related Impulse Response

HRTF: Head-Related Transfer Function

IC-BWE: Inter-Channel BandWidth Extension

ICA: Inter-Channel Alignment

ICC: Inter-Channel Coherence

IGF: Intelligent Gap Filling

ILD: Inter-Channel Level Difference

ITD: Inter-Channel Time Difference

ISM: Independent Streams with Metadata

JBM: Jitter Buffer Management

LCLD (codec): Low-Complexity Low-Delay (codec)

LC3plus: Low Complexity Communication Codec plus

LFE: Low-Frequency Effects

LP: Linear Prediction

MASA: Metadata-Assisted Spatial Audio

McMASA: Multi-Channel MASA

MC: Multi-Channel

MCT: Multi-channel Coding Tool

MD: MetaData

MDCT: Modified Discrete Cosine Transform

MDFT: Modified Discrete Fourier Transform

MDST: Modified Discrete Sine Transform

OLA: OverLap-Add

OMASA: Objects with MASA

OSBA: Objects with SBA

PCA: Principal Component Analysis

PLC: Packet Loss Concealment

SAD: Sound Activity Detection

SBA: Scene-Based Audio

SCE: Single Channel Element

SH: Spherical Harmonics

SNS: Spectral Noise Shaping

SPAR: Spatial Reconstruction

STFT: Short-Time Fourier Transform

TCX: Transform-Coded eXcitation

TD: Time-Domain

TF: Time-Frequency

TNS: Temporal Noise Shaping

TSM: Time Scale Modification

VAD: Voice Activity Detection

VBAP: Vector Base Amplitude Panning

VBR: Variable Bit Rate