
Content for  TS 26.253  Word version:  18.1.1


1  Scope

The present document is a detailed description of the signal processing algorithms of the Immersive Voice and Audio Services (IVAS) coder including the IVAS renderer.

2  References

[1]
TR 21.905: "Vocabulary for 3GPP Specifications".
[2]
TS 26.441: "Codec for Enhanced Voice Services (EVS); General Overview".
[3]
TS 26.445: "Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description".
[4]
TS 26.447: "Codec for Enhanced Voice Services (EVS); Error concealment of lost packets".
[5]
TS 26.448: "Codec for Enhanced Voice Services (EVS); Jitter Buffer Management".
[6]
TS 26.250: "Codec for Immersive Voice and Audio Services (IVAS); General overview".
[7]
TS 26.251: "Codec for Immersive Voice and Audio Services (IVAS); C code (fixed-point)".
→ to date, still a draft
[8]
TS 26.252: "Codec for Immersive Voice and Audio Services (IVAS); Test Sequences".
[9]
TS 26.254: "Codec for Immersive Voice and Audio Services (IVAS); Rendering".
[10]
TS 26.255: "Codec for Immersive Voice and Audio Services (IVAS); Error concealment of lost packets".
[11]
TS 26.256: "Codec for Immersive Voice and Audio Services (IVAS); Jitter Buffer Management".
[12]
TS 26.258: "Codec for Immersive Voice and Audio Services (IVAS); C code (floating point)".
[13]
C. de Boor and K. Höllig (1987), B-splines without divided differences, in Geometric Modeling, G. Farin ed., SIAM, 21-27.
[14]
Borß, C. A Polygon-Based Panning Method for 3D Loudspeaker Setups. In Audio Engineering Society Convention 137, Los Angeles, USA, Oct. 2014.
[15]
Zotter, F. and Frank, M., All-Round Ambisonic Panning and Decoding, J. Audio Eng. Soc., vol. 60, no. 10, pp. 807-820 (Oct. 2012).
[16]
Allen, J. B., & Berkley, D. A., Image method for efficiently simulating small room acoustics. Journal of the Acoustical Society of America, 65(4), 943-950, (1979).
[17]
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations, Journal of the Royal Statistical Society, Series B, 26, 211-252.
[18]
TS 26.118: "Virtual Reality (VR) profiles for streaming applications".
[19]
Ivanic, J., & Ruedenberg, K., Rotation matrices for real spherical harmonics. Direct determination by recursion. The Journal of Physical Chemistry, 100(15), 6342-6347, 1996.
[20]
C. R. Helmrich and B. Edler, "Signal-Adaptive Transform Kernel Switching for Stereo Audio Coding," in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2015.
[21]
AES, "AES69-2022: AES standard for file exchange - Spatial acoustic data", Audio Engineering Society, 2022.
[22]
F. Thomas, "Approaching Dual Quaternions From Matrix Algebra," in IEEE Transactions on Robotics, vol. 30, no. 5, pp. 1037-1048, Oct. 2014.
[23]
J. Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimedia", Thèse de doctorat de l'Université Paris 6, 2001.
[24]
M. Chapman, "A Standard for Interchange of Ambisonic Signal Sets. Including a file standard with metadata", Ambisonics Symposium 2009, Graz, June 25-27.
[25]
ISO/IEC 23091-3:2018 - Information technology - Coding-independent code points - Part 3: Audio.
[26]
ISO/IEC 23008-3:2015 - Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio.
[27]
TS 26.249: "Immersive Audio for Split Rendering Scenarios; Detailed Algorithmic Description of Split Rendering Functions".
[28]
ETSI TS 103 634: "Digital Enhanced Cordless Telecommunications (DECT); Low Complexity Communication Codec plus (LC3plus)".
[29]
ETSI TR 103 633: "Digital Enhanced Cordless Telecommunications (DECT); Low Complexity Communication Codec plus (LC3plus); Performance characterization".
[30]
ETSI TS 103 624: "Characterization Methodology and Requirement Specifications for the ETSI LC3plus speech codec".
[31]
Bluetooth Special Interest Group, 'Basic Audio Profile', version 1.0.1.
[r1]
RFC 4566 (2006): "SDP: Session Description Protocol", M. Handley, V. Jacobson and C. Perkins.
[r2]
TS 26.114: "IP Multimedia Subsystem (IMS); Multimedia Telephony; Media handling and interaction".
[r3]
RFC 3550 (2003): "RTP: A Transport Protocol for Real-Time Applications", Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson.
[r4]
RFC 3551 (2003): "RTP Profile for Audio and Video Conferences with Minimal Control", Schulzrinne, H. and S. Casner.
[r5]
RFC 4867 (2007): "RTP Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie.
[r6]
RFC 7160 (2014): "Support for Multiple Clock Rates in an RTP Session", Petit-Huguenin, M. and G. Zorn, Ed.
[X1]
ETSI TS 103 634: "Digital Enhanced Cordless Telecommunications (DECT); Low Complexity Communication Codec plus (LC3plus)".

3  Definitions of terms, symbols and abbreviations

3.1  Terms

For the purposes of the present document, the terms given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.
frame:
an array of audio samples or metadata spanning a 20-ms time duration.
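
As an informal illustration of this definition (not part of the specification), the number of samples per channel in one 20-ms frame follows directly from the sample rate: frame length = F × 0.02 s. The Python sketch below assumes a hypothetical helper name, frame_length, and a set of sample rates chosen only for the example.

```python
def frame_length(sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Return the number of samples per channel in one frame.

    frame_ms defaults to 20, matching the 20-ms frame defined above.
    """
    samples, remainder = divmod(sample_rate_hz * frame_ms, 1000)
    if remainder:
        # The frame would not contain a whole number of samples.
        raise ValueError(
            f"{sample_rate_hz} Hz does not give an integer frame length"
        )
    return samples


# Sample rates assumed for illustration only; they are not taken from this clause.
for rate in (8000, 16000, 32000, 48000):
    print(f"F = {rate} Hz -> {frame_length(rate)} samples per frame")
```

Integer arithmetic is used here so that a sample rate that does not divide evenly into 20-ms frames is rejected rather than silently rounded.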

3.2  Symbols

For the purposes of the present document, the following symbols and conventions for mathematical expressions apply:
E
Energy
F
Sample rate
g
Gain
I
Metadata
L
Length of an audio buffer in samples (e.g., L_frame is the length of a frame in samples)
M
Mode (e.g., M_element is the element mode or M_core is the core-coder mode)
N
Number of audio channels
R
Bitrate
s
Audio signal in time domain
S
Audio signal in frequency domain (spectrum)
s(n)
(n) indicates the nth sample of the audio signal s
E(b)
(b) indicates the bth band in the energy vector E
S(k)
(k) indicates the kth frequency bin
S(k,n)
(k,n) indicates the nth time slot of the kth frequency bin of the discrete time-frequency spectrum
s(m;n)
(m;n) indicates the nth sample within the mth subframe
s(t)
(t) indicates the time instant t in the continuous time domain
S_i, s_i
Lower index i indicates the ith channel (input or transport) of a multi-channel signal. The indexing starts from 1
s_HP20(n)
HP20 filtered time domain signal
s_inp(n)
Input signal to IVAS encoder
S^MDFT
Superscript MDFT indicates the type of frequency-domain transform; also FFT and CLDFB
Ŝ
Quantized (coded) version (of frequency-domain audio signal)
Ē
Mean value (of energy)
s^[-1](n)
Upper index indicates a particular frame, e.g., [-1] refers to the previous frame. When omitted, current frame is assumed by default
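
As an informal aid to reading these conventions (not part of the specification), the Python sketch below shows one way the channel index, the subframe notation (m;n) and the frame index [-1] could map onto zero-based arrays. Several things here are assumptions made for the example: the helper names channel and subframe_sample, the subframe count of four, the 48-kHz frame length, and the choice to count m and n from 0 with contiguous subframes. Only the channel index starting from 1 is taken from the list above.

```python
L_FRAME = 960                        # samples in a 20-ms frame at 48 kHz (illustrative)
N_SUBFRAMES = 4                      # hypothetical subframe count
L_SUBFRAME = L_FRAME // N_SUBFRAMES  # 240 samples per subframe


def channel(frame, i):
    """Return channel i of a frame; channels are counted from 1 as in clause 3.2."""
    if not 1 <= i <= len(frame):
        raise IndexError(f"channel index {i} outside 1..{len(frame)}")
    return frame[i - 1]


def subframe_sample(s, m, n, l_subframe=L_SUBFRAME):
    """Return s(m;n): sample n of subframe m, both assumed to count from 0."""
    if not 0 <= n < l_subframe:
        raise IndexError(f"sample index {n} outside 0..{l_subframe - 1}")
    return s[m * l_subframe + n]


# Two-channel test frames whose sample values make the indexing visible.
current = [[float(n) for n in range(L_FRAME)],
           [1000.0 + n for n in range(L_FRAME)]]
previous = [[-1.0] * L_FRAME, [-2.0] * L_FRAME]

# Frame index 0 is the current frame (the default); -1 is the previous frame.
history = {0: current, -1: previous}

s_2 = channel(history[0], 2)               # second channel of the current frame
print(subframe_sample(s_2, 1, 5))          # sample 5 of subframe 1
print(channel(history[-1], 1)[0])          # first sample of channel 1, previous frame
```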

3.3  Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
ACELP
Algebraic Code-Excited Linear Prediction
AGC
Adaptive Gain Control
AllRAD
All-Round Ambisonics Decoding
BPF
Bass PostFilter
BRIR
Binaural Room Impulse Response
DirAC
Directional Audio Coding
CBR
Constant Bit Rate
CLDFB
Complex Low-Delay FilterBank
CNA
Comfort Noise Addition
CNG
Comfort Noise Generation
CPE
Channel Pair Element
DFT
Discrete Fourier Transform
DoA
Direction of Arrival
DoF
Degree of Freedom
DTX
Discontinuous Transmission
EC
Entropy Coding
EFAP
Edge Fading Amplitude Panning
FEC
Frame Erasure Concealment
FOA
First-Order Ambisonics
GCC-PHAT
Generalized Cross-Correlation PHAse Transform
HQ
High-Quality
HOA
Higher-Order Ambisonics
HRIR
Head-Related Impulse Response
HRTF
Head-Related Transfer Function
IC-BWE
Inter-Channel BandWidth Extension
ICA
Inter-Channel Alignment
ICC
Inter-Channel Coherence
IGF
Intelligent Gap Filling
ILD
Inter-Channel Level Difference
ITD
Inter-Channel Time Difference
ISM
Independent Streams with Metadata
JBM
Jitter Buffer Management
LCLD (codec)
Low-Complexity Low-Delay (codec)
LC3plus
Low Complexity Communication Codec plus
LFE
Low-Frequency Effects
LP
Linear Prediction
MASA
Metadata-Assisted Spatial Audio
McMASA
Multi-Channel MASA
MC
Multi-Channel
MCT
Multi-channel Coding Tool
MD
MetaData
MDCT
Modified Discrete Cosine Transform
MDFT
Modified Discrete Fourier Transform
MDST
Modified Discrete Sine Transform
OLA
OverLap-Add
OMASA
Objects with MASA
OSBA
Objects with SBA
PCA
Principal Component Analysis
PLC
Packet Loss Concealment
SAD
Sound Activity Detection
SBA
Scene-Based Audio
SCE
Single Channel Element
SH
Spherical Harmonics
SNS
Spectral Noise Shaping
SPAR
Spatial Reconstruction
STFT
Short-Time Fourier Transform
TCX
Transform-Coded eXcitation
TD
Time-Domain
TF
Time-Frequency
TNS
Temporal Noise Shaping
TSM
Time Scale Modification
VAD
Voice Activity Detection
VBAP
Vector Base Amplitude Panning
VBR
Variable Bit Rate