Content for TR 26.952 Word version: 18.0.0


1 Scope

The present document provides information on the Selection, Verification and Characterization Phases of the Enhanced Voice Services (EVS) codec, which were run using the fixed-point code (TS 26.442). Experimental results from the subjective quality testing are reported to illustrate the behaviour of the EVS codec. Additional information is provided on the implementation complexity of the EVS codec and on objective test results. The verification results for the floating-point version of the EVS codec (TS 26.443) are also presented.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1]

TR 21.905: "Vocabulary for 3GPP Specifications".

[2]

TS 26.441: "Codec for Enhanced Voice Services (EVS); General overview".

[3]

TS 26.442: "Codec for Enhanced Voice Services (EVS); ANSI C code (fixed-point)".

[4]

TS 26.443: "Codec for Enhanced Voice Services (EVS); ANSI C code (floating-point)".

[5]

TS 26.444: "Codec for Enhanced Voice Services (EVS); Test Sequences".

[6]

TS 26.445: "Codec for Enhanced Voice Services (EVS); Detailed algorithmic description".

[7]

TS 26.446: "Codec for Enhanced Voice Services (EVS); Adaptive Multi-Rate - Wideband (AMR-WB) backward compatible functions".

[8]

TS 26.447: "Codec for Enhanced Voice Services (EVS); Error concealment of lost packets".

[9]

TS 26.448: "Codec for Enhanced Voice Services (EVS); Jitter buffer management".

[10]

TS 26.449: "Codec for Enhanced Voice Services (EVS); Comfort Noise Generation (CNG) aspects".

[11]

TS 26.450: "Codec for Enhanced Voice Services (EVS); Discontinuous Transmission (DTX)".

[12]

TS 26.451: "Codec for Enhanced Voice Services (EVS); Voice Activity Detection (VAD)".

[13]

TS 26.114: "IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction".

[14]

TS 26.131: "Terminal acoustic characteristics for telephony; Requirements".

[15]

3GPP SP-100202: "EVS Work Item Description".

[16]

TR 22.813: "Study of use cases and requirements for enhanced voice codecs for the Evolved Packet System (EPS)".

[17]

EVS-3 Permanent Document: "EVS Performance Requirements".

[18]

EVS-4 Permanent Document: "EVS Design Constraints".

[19]

EVS-5b Permanent Document: "EVS Selection Rules".

[20]

EVS-6b Permanent Document: "EVS Selection Deliverables".

[21]

EVS-7b Permanent Document: "Processing Plan for the EVS Selection Phase".

[22]

EVS-8b Permanent Document: "Test Plan for the EVS Selection Phase".

[23]

EVS-7c Permanent Document: "Processing Plan for the EVS Characterization Phase".

[24]

EVS-8c Permanent Document: "Test Plan for the EVS Characterization Phase".

[25]

Recommendation ITU-T P.800: "Methods for subjective determination of transmission quality".

[26]

TR 22.105: "Services and service capabilities".

[27]

Recommendation ITU-T G.191: "Software tools for speech and audio coding standardization", 03/2010, electronic attachment: STL2009 Software Tool Library.

[28]

Recommendation ITU-T G.100.1: "The use of the decibel and of relative levels in speechband telecommunications", 11/2001.

[29]

TS 26.132: "Speech and video telephony terminal acoustic test specification (Release 12)".

[30]

Recommendation ITU-T P.501: "Test signals for use in telephonometry", 01/2012.

3 Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.

ACELP

Algebraic Code-Excited Linear Prediction

ACR

Absolute Category Rating

AMR

Adaptive Multi-Rate

AMR-WB

Adaptive Multi-Rate Wideband

CCR

Comparison Category Rating

CI

Confidence Interval

CMOS

Comparison MOS

CNG

Comfort Noise Generation

CS

Circuit Switched

CuT

Codec under Test

DCR

Degradation Category Rating

DMOS

Differential MOS

DS

Direct Source

DTMF

Dual Tone Multi Frequency

DTX

Discontinuous Transmission

EDGE

Enhanced Data rates for GSM Evolution

EFR

Enhanced Full-Rate

ETSI

European Telecommunications Standards Institute

EVS

Enhanced Voice Services

FB

Fullband

FEC

Frame Erasure Concealment

FER

Frame Erasure Rate

FR

Full-Rate

GAL

Global Analysis Laboratory

GERAN

GSM/EDGE Radio Access Network

GSM

Global System for Mobile communications

HD

High Definition

HR

Half-Rate

IO

Interoperable

ITU-T

International Telecommunication Union - Telecommunications Standardisation Sector

IP

Internet Protocol

JICO

Jitter Induced Concealment Operation

JBM

Jitter Buffer Management

LL

Listening Laboratory

MNRU

Modulated Noise Reference Unit

MOPS

Million Operations Per Second

MOS

Mean Opinion Score

MSB

Most Significant Bit

MTSI

Multimedia Telephony Service for IMS

NB

Narrowband

PS

Packet Switched

PSTN

Public Switched Telephone Network

REF

Reference

TSG-SA

Technical Specification Group - Service and System Aspects

SA4

Service and System Aspects Working Group 4 (TSG-SA WG4)

SAD

Sound Activity Detection

SC-VBR

Source Controlled - Variable Bit Rate

SID

Silence Insertion Descriptor

SNR

Signal-to-Noise Ratio

SWB

Super Wideband

TFO

Tandem Free Operation

TSG

Technical Specification Group

UMTS

Universal Mobile Telecommunications System

UTRAN

Universal Terrestrial Radio Access Network

VAD

Voice Activity Detection

WID

Work Item Description

WB

Wideband

wMOPS

weighted Million Operations Per Second

4 General

4.1 Project History

In 2010, 3GPP finalized the Enhanced Voice Services (EVS) study item with the publication of TR 22.813. This study focused on how 3GPP could maintain the high value and competitiveness of its voice services and whether the new Evolved Packet System (EPS) with LTE (Long Term Evolution) access could open up new opportunities for major voice service enhancements. Mobile use cases pertinent to LTE access that may benefit from improved audio quality were studied. Part of the study examined any potential need for enhanced codecs beyond AMR and AMR-WB, the codecs now used in 3GPP voice services. Envisioned use cases for enhanced voice services included improvements beyond classical telco-grade telephony (typically realized as IMS Multimedia Telephony), high-quality multi-party conferencing, call on hold or audio-visual communication, offering a 'being-there' quality of experience. The study also considered how enhanced voice services could complement the existing voice service. Streaming and offline delivery of voice and audio were also considered as application scenarios for the EVS codec.

Based on the conclusions of the study item in TR 22.813, 3GPP immediately launched a work item targeting the standardization of a new speech codec for Enhanced Voice Services, the EVS codec. The goal of the work item, with its WID objectives, was to provide a clear benefit in terms of overall service quality, service efficiency and interoperability in 3GPP LTE networks. As a result of the study item, it is anticipated that enhanced voice services based on the new EVS codec will become the dominant voice service in 3GPP LTE networks. It is further envisioned that enhanced voice services with EVS will extend beyond the scope of the 3GPP LTE system, ranging from deployments in circuit-switched networks to other mobile and wireless (WiFi) networks, fixed networks and the Internet. In that context, the performance of the EVS codec is of interest not only in comparison with existing 3GPP and ITU-T codecs but also in comparison with other state-of-the-art codecs.

Thirteen companies declared their intention to submit codecs to the Qualification Phase. Each codec was evaluated in 12 subjective experiments, each conducted twice: once in the candidate's own laboratory and once in a laboratory selected at random from the other 12 (see EVS-7b [21] and EVS-8b [22]). Tests were blinded, with all of the processing conducted by a dedicated Host laboratory (Dynastat Inc.). Each candidate was evaluated against the requirements by an independent (non-codec proponent) Global Analysis Laboratory (GAL, Dynastat Inc.). At the 3GPP SA4#72bis meeting in March 2013, the top five candidates were judged to have qualified, although all 13 codecs had passed more than 95 % of the 296 requirements tested, duplicated in two languages. After the qualification process, companies declared several collaborations around the qualified candidates. Note that the test results of the Qualification Phase are not included in the present document because they reflect different coders than the final standard.

As a result of examining the codec high level descriptions provided by each candidate at the Qualification meeting, it became clear to the various collaboration groups that all of the qualified candidates were based upon very similar coding principles.

In September 2013, the 12 companies (Ericsson, Fraunhofer IIS, Huawei, Nokia, NTT, NTT DOCOMO, Orange, Panasonic, Qualcomm, Samsung, VoiceAge and ZTE Corporation) that had confirmed their intent to submit a codec for selection declared their intention to work together and develop a single joint candidate for the Selection Phase by merging the best elements of the codecs from the different collaboration groups.

Even though only a single codec entered the Selection Phase, the strict 3GPP process for codec selection was maintained. The subjective Selection testing comprised 24 experiments, each conducted in two languages. Independent (non-codec proponent) Host Lab (Dynastat Inc.), Cross-check Lab (Audio Research Labs, LLC), Listening Labs (Dynastat Inc., DELTA, and Mesaqin.com s.r.o. (Ltd.)) and Global Analysis Lab (Dynastat Inc.) were used. This testing allowed the codec to be evaluated against 389 requirements, duplicated in two languages. The codec exhibited only two systematic failures (in both languages) at the 95 % confidence level. One of these failures was subsequently addressed, as it was found to be the result of a software bug. Objective testing was also performed.

The single joint candidate was selected at the 3GPP SA4#80bis meeting in August 2014, and the EVS codec specifications were approved at 3GPP TSG-SA#65 in September 2014. The selected EVS codec fulfils the project targets.

A Verification Phase was then launched, and several organizations volunteered to verify that the code supplied to 3GPP complied with the design constraints and requirements.

The Characterization Phase is the latest phase. During this phase the codec was tested in a more complete manner than in the Selection Phase. In order to evaluate the selected codec in the broadest possible way, a further set of 17 subjective experiments was designed. Five of these experiments were conducted in two different languages, for a total of 22 tests. The aim of these additional experiments, and of other objective evaluations, was to evaluate features of the codec that remained untested in previous phases or to highlight areas of interest to 3GPP such as tandeming cases, fullband cases, and multi-bandwidth comparisons. The same listening laboratories used for selection were again employed in characterization.

3GPP has also specified a floating-point version of the EVS codec (TS 26.443). This work was completed by 3GPP TSG-SA#66 in December 2014.

4.2 Overview of the EVS Codec Work Item

As historical background, this clause provides an overview of the objectives set before the actual work started. The standardized EVS codec fulfilled all project objectives [15].

With the advent of increasingly compact yet powerful mobile devices and the proliferation of high-speed wireless access to telecommunications networks around the globe, users of mobile devices expect and demand growing sophistication in the communication services being offered. Multi-modal interfaces supporting rich multimedia services for content and conversation are commonplace on the desktop, with demand for smart mobile devices with similar functionality steadily growing.

The identification of this potential was the background for 3GPP to launch a study investigating and defining the use cases and requirements for an Enhanced Voice Service in the Evolved Packet System, leading to TR 22.813. TR 22.813 defines a new set of high-level technical recommendations and recommended requirements for a new codec for the Enhanced Voice Service and concludes that substantially enhanced voice services will become possible with a codec meeting them. It recommends starting an EVS codec development work item with the target of meeting the requirements and recommendations set in it.

The overall objective of this work item is to develop a codec suitable for the Enhanced Voice Service in the EPS. The following objectives should be achieved with the new codec:

Enhanced quality and coding efficiency for narrowband (NB) and wideband (WB) speech services, leading to improved user experience and system efficiency. This should also be achieved in interoperation with 3GPP pre-Rel-10 systems and services employing WB voice.
Enhanced quality by the introduction of super-wideband (SWB) speech, leading to improved user experience.
Enhanced quality for mixed content and music in conversational applications (for example, in-call music), leading to improved user experience for cases when selection of dedicated 3GPP audio codecs is not possible.
Robustness to packet loss and delay jitter, leading to optimized behaviour in IP application environments like MTSI within the EPS.
Backward interoperability to the 3GPP AMR-WB codec by having some WB EVS modes supporting the AMR-WB codec format used throughout 3GPP conversational speech telephony service (including CS). The AMR-WB interoperable operation modes of the EVS codec may be either identical to those in the AMR-WB codec or different but bitstream interoperable with them.

These objectives are to be achieved while meeting all design constraints and performance requirements set forth in TR 22.813. It is further desirable that the codec fulfils the needs for enhanced voice services in other 3GPP systems, such as CS. The developments under this work item should lead to a set of new specifications defining, among other things, a textual description of the coding algorithm and the VAD/DTX/CNG scheme.

Following 3GPP practice, fixed-point and floating-point C code and associated test vectors should also be part of this set of specifications. The included AMR-WB interoperable coding format may become an alternative implementation for AMR-WB operation, provided that the enhancements are consistently significant. Jitter buffer management and packet loss concealment should be specified as part of the set of EVS specifications.

The EVS codec enhances coding efficiency and quality for NB and WB over a large bit rate range, starting from 5.9 kbps VBR. It further provides a significant step in quality over these traditional telephony bandwidths with SWB and FB operation starting from 9.6 and 16.4 kbps, respectively. The maximum bit rate is 128 kbps with support for WB, SWB, and FB. The ability to switch the bit rate at every 20-ms frame allows the codec to easily adapt to changes in channel capacity. The codec features discontinuous transmission (DTX) with algorithms for voice/sound activity detection (VAD) and comfort noise generation (CNG). An error concealment mechanism mitigates the quality impact of channel errors resulting in lost packets. A system for jitter buffer management (JBM) is included. The codec also features a channel-aware mode to further improve frame/packet error resilience. Enhanced interoperation with AMR-WB is provided over all nine bit rates between 6.6 kbps and 23.85 kbps.
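The relationship between the bit rates quoted above and the 20-ms frame can be illustrated with a short calculation: at a constant bit rate, each frame carries (bit rate in kbps) × 20 bits. The following is a minimal sketch, not part of the EVS specifications. The function name `bits_per_frame` is hypothetical, and the list of the nine AMR-WB rates is an assumption taken from the AMR-WB codec definition rather than from this clause, which only gives the end points. The 5.9 kbps VBR mode is left out because it is an average rate, so its frame size is not fixed.

```python
from fractions import Fraction

FRAME_MS = 20  # frame duration stated in the text


def bits_per_frame(bit_rate_kbps: str) -> int:
    """Return the number of bits in one 20-ms frame at a constant bit rate."""
    # kbit/s multiplied by ms gives bits; Fraction avoids floating-point rounding.
    bits = Fraction(bit_rate_kbps) * FRAME_MS
    if bits.denominator != 1:
        raise ValueError(f"{bit_rate_kbps} kbps is not a whole number of bits per frame")
    return int(bits)


# Constant bit rates quoted in the text (SWB start, FB start, maximum).
EVS_RATES_KBPS = ["9.6", "16.4", "128"]

# Assumption: the nine AMR-WB rates, listed from the AMR-WB codec definition.
AMR_WB_RATES_KBPS = [
    "6.6", "8.85", "12.65", "14.25", "15.85",
    "18.25", "19.85", "23.05", "23.85",
]

if __name__ == "__main__":
    for rate in EVS_RATES_KBPS + AMR_WB_RATES_KBPS:
        print(f"{rate:>6} kbps -> {bits_per_frame(rate)} bits per frame")
```

Because the bit rate may change at every frame, the payload size can differ from one 20-ms frame to the next; the calculation applies per frame.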

4.3 Presentation of the Following Clauses

Clause 5 outlines the Terms of reference for the EVS project. In clause 6, the selection process in 3GPP is presented. An overview of selection and characterization tests can be found in clause 7. The subjective tests provide statistical data which are subject to variations; important notes about interpretation of results are described in clause 8.

The actual test results are presented in clause 9 (narrowband), clause 10 (wideband), and clause 11 (super-wideband). Clause 12 contains the results of the mixed-bandwidth and fullband tests, while clause 13 presents the results of objective evaluations.