TS 26.452
Codec for Enhanced Voice Services (EVS) –
ANSI C code –
Alternative Fixed-Point using Updated Basic Operators

V18.0.0 (PDF)  2024/03  12 p.
V17.0.0  2022/03  12 p.
V16.4.0  2021/12  12 p.
Rapporteur:
Dr. Pawate, Raj
Cadence Design Systems Inc.

Content for  TS 26.452  Word version:  18.0.0

1  Scope

The present document contains an electronic copy of the ANSI C code for an alternative fixed-point implementation of the Enhanced Voice Services (EVS) Codec using updated basic operators [13]. The ANSI C code is necessary for a bit-exact implementation of the EVS Codec (TS 26.445), Voice Activity Detection (VAD) (TS 26.451), Comfort Noise Generation (CNG) (TS 26.449), Discontinuous Transmission (DTX) (TS 26.450), Packet Loss Concealment (PLC) (TS 26.447), Jitter Buffer Management (JBM) (TS 26.448), and AMR-WB Interoperable Function (TS 26.446).
Requirements for any implementation of the EVS codec to be standard compliant are specified in TS 26.444 (Test sequences).

2  References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
  • References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
  • For a specific reference, subsequent revisions do not apply.
  • For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
[1]  TR 21.905: "Vocabulary for 3GPP Specifications".
[2]  TS 26.445: "Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description".
[3]  TS 26.451: "Codec for Enhanced Voice Services (EVS); Voice Activity Detection (VAD)".
[4]  TS 26.449: "Codec for Enhanced Voice Services (EVS); Comfort Noise Generation (CNG) Aspects".
[5]  TS 26.450: "Codec for Enhanced Voice Services (EVS); Discontinuous Transmission (DTX)".
[6]  TS 26.447: "Codec for Enhanced Voice Services (EVS); Error Concealment of Lost Packets".
[7]  TS 26.448: "Codec for Enhanced Voice Services (EVS); Jitter Buffer Management".
[8]  TS 26.446: "Codec for Enhanced Voice Services (EVS); AMR-WB Backward Compatible Functions".
[9]  TS 26.444: "Codec for Enhanced Voice Services (EVS); Test Sequences".
[10]  RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
[11]  Recommendation ITU-T G.191 (03/10): "Software tools for speech and audio coding standardization".
[12]  Recommendation ITU-T G.192: "A common digital parallel interface for speech standardization activities".
[13]  TR 26.973: "Update to fixed-point basic operators".

3  Definitions and abbreviations

3.1  Definitions

Definitions of terms used in the present document can be found in TS 26.445, TS 26.451, TS 26.449, TS 26.450, TS 26.447, TS 26.448 and TS 26.446.

3.2  Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
ACELP    Algebraic Code-Excited Linear Prediction
AMR-WB   Adaptive Multi Rate Wideband (codec)
CNG      Comfort Noise Generator
DTX      Discontinuous Transmission
EVS      Enhanced Voice Services
FB       Fullband
FEC      Frame Erasure Concealment
IP       Internet Protocol
JBM      Jitter Buffer Management
MSB      Most Significant Bit
MTSI     Multimedia Telephony Service for IMS
NB       Narrowband
PS       Packet Switched
PSTN     Public Switched Telephone Network
SAD      Sound Activity Detection
SC-VBR   Source Controlled - Variable Bit Rate
SID      Silence Insertion Descriptor
SWB      Super WideBand
VAD      Voice Activity Detection
WB       Wideband
WMOPS    Weighted Millions of Operations Per Second

4  C code structure

4.0  General

This clause gives an overview of the structure, contents and organization of the bit-exact C code attached to the present document.
The C code has been verified on the following systems:
  • IBM PC compatible computers with Windows 7 or 8 operating system and Microsoft Visual C++ 2017 compiler, 32-bit builds.
  • IBM PC compatible computers with Linux operating system and GNU gcc compiler version 4.3.x, 32-bit builds.
ANSI C was selected as the programming language because portability was desirable.

4.1  Contents of the C source code

The C code distribution is organized as follows:
Directory          Description
README.txt         Information on how to compile
Makefile           UNIX style encoder Makefile
Workspace_msvc/    Directory for the MSVC 2017 project files
basic_op/          Source code files containing all ITU-T fixed-point basic operators
basic_math/        Source code files containing mathematical fixed-point functions
lib_com/           Source code files used in encoder and decoder
lib_dec/           Source code files used solely in the decoder
lib_enc/           Source code files used solely in the encoder
The distributed files with suffix "c" contain the source code and the files with suffix "h" are the header files. The ROM data is contained in files named "rom_xxx" with suffix "c".
Makefiles are provided for the platforms on which the C code has been verified (listed in clause 4.0). Once the software is installed, this directory will contain compiled versions of the encoder (named EVS_cod) and the decoder (named EVS_dec).

4.2  Program execution

The codec for Enhanced Voice Services is implemented in two programs:
  • EVS_cod: speech/audio encoder;
  • EVS_dec: speech/audio decoder.
The programs are invoked as follows:
  • EVS_cod [encoder options] <speech/audio input file> <parameter file>;
  • EVS_dec [decoder options] <parameter file> <speech/audio output file>.
The speech/audio files contain 16-bit linear encoded PCM speech/audio samples and the parameter files contain encoded speech/audio data.
The encoder and decoder options are listed when the applications are run without input arguments. See the file README.txt for more information on how to run the encoder and decoder programs.

5  File formats

5.0  General

This clause describes the file formats used by the encoder and decoder programs. The test sequences defined in TS 26.444 also use the file formats described here.

5.1  Speech file (encoder input / decoder output)

Speech files read by the encoder and written by the decoder consist of 16-bit words, one per speech/audio sample. The byte order depends on the host architecture (e.g. LSByte first on PCs). Both the encoder and the decoder program process only complete frames (20 ms each; for example, 640 samples at a 32 kHz sampling frequency).
The encoder pads the last frame so that the input becomes an integer multiple of 20 ms frames, i.e. n speech frames are produced from an input file with a length in the interval [(n-1)*20 ms + 1 sample; n*20 ms]. The files produced by the decoder always have a length of n*20 ms.
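The frame arithmetic above can be illustrated with a short C sketch. The helper names samples_per_frame and frames_for_input are hypothetical and are not taken from the reference code; they only restate the 20 ms framing and the round-up rule described in this clause.

```c
/* Hypothetical helper (not part of the reference code):
   number of samples in one 20 ms frame at sampling rate fs_hz. */
static long samples_per_frame(long fs_hz)
{
    return fs_hz / 50;  /* 20 ms = 1/50 s, e.g. 32000 / 50 = 640 */
}

/* Hypothetical helper: number of 20 ms frames produced for an input of
   num_samples samples when the last partial frame is padded to full length. */
static long frames_for_input(long num_samples, long fs_hz)
{
    long frame_len = samples_per_frame(fs_hz);
    return (num_samples + frame_len - 1) / frame_len;  /* round up */
}
```

Under these assumptions, an input of 641 samples at 32 kHz falls in the interval for n = 2, so two frames are produced and the decoder output holds 1280 samples.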

5.2  Rate switching profile (encoder input)

The encoder program can optionally read in a rate switching profile which specifies the encoding bitrate for each frame of speech processed. The file is a binary file, generated by 'gen-rate-profile', which is part of STL 2009, as contained in ITU-T G.191 [11]. The rate switching profile can contain EVS primary mode bitrates and AMR-WB IO mode bitrates in any order, i.e. switching between the two modes can be specified by the rate switching profile.

5.3  Parameter bitstream file (encoder output / decoder input)

5.3.0  General

The files produced by the speech/audio encoder and expected by the speech/audio decoder contain an arbitrary number of frames in one of the following formats.

5.3.1  ITU-T G.192 compliant format

 
| SYNC_WORD | DATA_LENGTH | B1 | B2 | ... | Bnn |
 
Each box corresponds to one Word16 value in the bitstream file, for a total of 2+nn words or 4+2nn bytes per frame, where nn is the number of encoded bits in the frame. Each encoded bit is represented as follows: Bit 0 = 0x007f, Bit 1 = 0x0081. The fields have the following meaning:
  • SYNC_WORD: Word to ensure correct frame synchronization between the encoder and the decoder. It is also used to indicate the occurrences of bad frames.
    In the encoder output: 0x6b21
    In the decoder input: good frames 0x6b21, bad frames 0x6b20
  • DATA_LENGTH: Length of the speech data. The codec mode and frame type are extracted in the decoder using this parameter.
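The frame layout above can be sketched in C as follows. This is a minimal illustration and not the reference implementation: the function names are hypothetical, uint16_t from C99 is used in place of the Word16 type, and DATA_LENGTH is assumed to equal nn, the number of encoded bits, which is what the 2+nn word count suggests.

```c
#include <stddef.h>
#include <stdint.h>

#define G192_SYNC_GOOD 0x6B21  /* encoder output / good frame at decoder input */
#define G192_SYNC_BAD  0x6B20  /* bad frame at decoder input */
#define G192_BIT_0     0x007F
#define G192_BIT_1     0x0081

/* Hypothetical helper: pack nn bits (one 0/1 value per element of bits[])
   into one frame. out[] must hold at least 2 + nn words.
   Returns the number of 16-bit words written. */
static size_t g192_pack_frame(const uint8_t *bits, uint16_t nn, uint16_t *out)
{
    size_t i;
    out[0] = G192_SYNC_GOOD;
    out[1] = nn;  /* DATA_LENGTH, assumed to be the bit count */
    for (i = 0; i < nn; i++) {
        out[2 + i] = bits[i] ? G192_BIT_1 : G192_BIT_0;
    }
    return (size_t)nn + 2;
}

/* Hypothetical helper: unpack one frame from in[] (avail words available).
   Returns the number of bits, or -1 if the frame is malformed.
   On success, *bad_frame is set to 1 for a bad-frame sync word, else 0. */
static long g192_unpack_frame(const uint16_t *in, size_t avail,
                              uint8_t *bits, int *bad_frame)
{
    size_t i;
    uint16_t nn;
    if (avail < 2) return -1;
    if (in[0] != G192_SYNC_GOOD && in[0] != G192_SYNC_BAD) return -1;
    nn = in[1];
    if (avail < (size_t)nn + 2) return -1;
    for (i = 0; i < nn; i++) {
        if (in[2 + i] == G192_BIT_1)      bits[i] = 1;
        else if (in[2 + i] == G192_BIT_0) bits[i] = 0;
        else return -1;  /* neither 0x007f nor 0x0081 */
    }
    *bad_frame = (in[0] == G192_SYNC_BAD);
    return (long)nn;
}
```

A frame with four encoded bits therefore occupies six 16-bit words, i.e. 12 bytes, in line with the 4+2nn byte count given above.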

5.3.2  Compact storage format file

The encoder and decoder programs can optionally write and read a file in the octet-based compact storage format. The compact storage format is specified in clause A.2.6 of TS 26.445.

5.4  VoIP parameter bitstream file (decoder input)

 
| Packet size | Arrival time | RTP header | G.192 format (see clause 5.3.1) |
 
The fields have the following size and meaning:
  • Packet size: 32-bit unsigned integer (= 12 + 2 + DATA_LENGTH).
  • Arrival time: 32-bit unsigned integer, in ms.
  • RTP header: 96 bits (see RFC 3550), including RTP timestamp and SSRC.
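As a rough illustration of the fields listed above, the following C sketch models the part of a record that precedes the G.192 frame and restates the packet size formula. The type and function names are hypothetical. The formula is reproduced as written; this clause does not state its unit, and the sketch does not assume one. The struct is an in-memory view only and says nothing about byte order or padding in the file.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the fields preceding the G.192 frame. */
typedef struct {
    uint32_t packet_size;      /* = 12 + 2 + DATA_LENGTH */
    uint32_t arrival_time_ms;  /* arrival time in ms */
    uint8_t  rtp_header[12];   /* 96-bit RTP header (RFC 3550) */
} voip_record_header;

/* Hypothetical helper: packet size value for a given DATA_LENGTH,
   exactly as the formula in this clause states it. */
static uint32_t voip_packet_size(uint16_t data_length)
{
    return 12u + 2u + (uint32_t)data_length;
}
```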

5.5  Bandwidth switching profile (encoder input)

The encoder program can optionally read in a bandwidth switching profile, which specifies the encoding bandwidth for each frame of speech processed. The file is a text file where each line contains "nb_frames B". B specifies the signal bandwidth, which is one of the four supported bandwidths, i.e. NB, WB, SWB or FB. "nb_frames" is an integer number of frames that specifies how long the accompanying signal bandwidth B stays active.
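A single line of such a profile could be parsed as sketched below. The names bandwidth_t and parse_bw_profile_line are hypothetical and do not come from the reference code. The sketch also assumes that the bandwidth names are written in upper case and that nb_frames is positive; neither is stated in this clause.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical enumeration of the four supported bandwidths. */
typedef enum { BW_NB, BW_WB, BW_SWB, BW_FB } bandwidth_t;

/* Hypothetical helper: parse one "nb_frames B" line.
   Returns 1 on success and fills *nb_frames and *bw, 0 otherwise. */
static int parse_bw_profile_line(const char *line, long *nb_frames,
                                 bandwidth_t *bw)
{
    char name[8];
    long n;
    if (sscanf(line, "%ld %7s", &n, name) != 2) return 0;
    if (n <= 0) return 0;  /* assumption: a duration must be positive */
    if (strcmp(name, "NB") == 0)       *bw = BW_NB;
    else if (strcmp(name, "WB") == 0)  *bw = BW_WB;
    else if (strcmp(name, "SWB") == 0) *bw = BW_SWB;
    else if (strcmp(name, "FB") == 0)  *bw = BW_FB;
    else return 0;         /* not one of the four supported bandwidths */
    *nb_frames = n;
    return 1;
}
```

With this sketch, an illustrative line such as "150 SWB" would select SWB for 150 frames, i.e. 3 seconds at 20 ms per frame.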

5.6  Channel-aware configuration file (encoder input and decoder output)

The encoder program can optionally read in a configuration file which specifies the values of the FEC indicator p (LO or HI) and the FEC offset o (2, 3, 5 or 7, in number of frames). Each line of the configuration file contains the values of p and o separated by a space.
The channel-aware configuration file is meant to simulate channel feedback from a receiver to a sender: the decoder would generate FEC indicator and FEC offset values for receiver feedback that correspond to the current transmission channel characteristics. The encoder, when in the channel-aware mode, applies these values, which allows the transmission to be optimized.
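One line of the configuration file could be read and validated as in the following sketch. The function name parse_ca_config_line and the 0/1 encoding of the FEC indicator are illustrative choices, not taken from the reference code; only the allowed values (LO or HI; 2, 3, 5 or 7) come from this clause.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one "p o" line.
   On success returns 1, sets *fec_hi to 0 (LO) or 1 (HI) and
   *fec_offset to 2, 3, 5 or 7. Returns 0 for any other input. */
static int parse_ca_config_line(const char *line, int *fec_hi,
                                int *fec_offset)
{
    char p[4];
    int o;
    int hi;
    if (sscanf(line, "%3s %d", p, &o) != 2) return 0;
    if (strcmp(p, "LO") == 0)      hi = 0;
    else if (strcmp(p, "HI") == 0) hi = 1;
    else return 0;
    if (o != 2 && o != 3 && o != 5 && o != 7) return 0;
    *fec_hi = hi;
    *fec_offset = o;
    return 1;
}
```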

5.7  JBM trace file (decoder output)

The decoder can generate a JBM trace file with the -Tracefile switch as a by-product of the decoder operation in case of JBM operation (which is triggered with the -VOIP switch on the decoder side).
The trace file is a CSV file with a semicolon as separator. The trace file starts with one header line that contains the column names in the following order:
rtpSeqNo;rtpTs;rcvTime;playtime;active
For each played out speech frame, one entry is written to the trace file. The interval between playtime values is usually 20 ms, but may differ depending on the JBM operation. Each entry is a line in the trace file that contains values as specified in Table 2.
Name       Unit     Description
rtpSeqNo   1        RTP sequence number of played out speech frame. -1 if no corresponding RTP packet for the speech frame exists.
rtpTs      ms       RTP time stamp of played out speech frame. -1 if no corresponding RTP packet for the speech frame exists.
rcvTime    ms       Absolute reception time of the RTP packet that corresponds to the speech frame. -1 if no corresponding RTP packet for the speech frame exists.
playtime   ms       Absolute play time (i.e. the time at which the PCM data is made available by the decoder). Can be floating-point value.
active     0 or 1   Binary entry, which is set to 1 for active speech frames (i.e. frames that are neither SID nor NO_DATA).
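A consumer of the trace file could parse each entry as sketched below. The struct and function names are hypothetical, and the sketch is written in C99 (long long) for brevity rather than strict ANSI C. It relies only on the column order and separator described above; the header line is rejected because it does not start with a number.

```c
#include <stdio.h>

/* Hypothetical record for one trace entry. */
typedef struct {
    long long rtp_seq_no;  /* -1 if no corresponding RTP packet exists */
    long long rtp_ts;      /* ms, -1 if no corresponding RTP packet exists */
    long long rcv_time;    /* ms, -1 if no corresponding RTP packet exists */
    double    playtime;    /* ms, may be a floating-point value */
    int       active;      /* 0 or 1 */
} jbm_trace_entry;

/* Hypothetical helper: parse one semicolon-separated trace line.
   Returns 1 on success, 0 for the header line or malformed input. */
static int parse_jbm_trace_line(const char *line, jbm_trace_entry *e)
{
    int n = sscanf(line, "%lld;%lld;%lld;%lf;%d",
                   &e->rtp_seq_no, &e->rtp_ts, &e->rcv_time,
                   &e->playtime, &e->active);
    return n == 5 && (e->active == 0 || e->active == 1);
}
```

Entries for which no RTP packet exists carry -1 in the first three columns, so a caller can test rtp_seq_no against -1 before using rtp_ts or rcv_time.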

Change history

