In 2010, 3GPP finalized the Enhanced Voice Services (EVS) study item with the publication of TR 22.813. This study focused on how 3GPP could maintain the high value and competitiveness of its voice services and whether the new Evolved Packet System (EPS) with LTE (Long Term Evolution) access could open up new opportunities for major voice service enhancements. Mobile use cases pertinent to LTE access that may benefit from improved audio quality were studied. Part of the study examined any potential need for enhanced codecs beyond AMR and AMR-WB, the codecs now used in 3GPP voice services. Envisioned use cases for enhanced voice services included improvements beyond classical telco-grade telephony (typically realized as IMS Multimedia Telephony), high-quality multi-party conferencing, call on hold or audio-visual communication, offering a 'being-there' quality of experience. Additional aspects of the study included how enhanced voice services could complement the existing voice service. Streaming voice and audio, as well as offline voice and audio delivery, were also considered as application scenarios for the EVS codec.
Based on the conclusions of the study item in TR 22.813, 3GPP immediately launched a work item targeting the standardization of a new speech codec for Enhanced Voice Services, the EVS codec. The goal of the work item with its WID objectives was to provide clear benefit in terms of overall service quality, service efficiency and interoperability in 3GPP LTE networks. As a result of the study item, it is anticipated that enhanced voice services based on the new EVS codec will become the dominant voice service in 3GPP LTE networks. It is further envisioned that enhanced voice services with EVS will extend beyond 3GPP LTE system scope, ranging from deployments in circuit switched, to other mobile and wireless (WiFi) networks, fixed networks and the Internet. In that context, the performance of the EVS codec is of interest not only in comparison to existing 3GPP and ITU-T codecs but also in comparison to other state-of-the-art codecs.
Thirteen companies declared their intention to submit codecs to the Qualification Phase. Each codec was evaluated in 12 subjective experiments, each conducted twice: once in the candidate's own laboratory and once in a laboratory selected at random from the other 12, see EVS-7b [21] and EVS-8b [22]. Tests were blinded, with all of the processing being conducted by a dedicated Host laboratory (Dynastat Inc.). Each of the candidates was evaluated against the requirements by an independent (non-codec proponent) Global Analysis Laboratory (GAL, Dynastat Inc.). At the 3GPP SA4#72bis meeting in March 2013, the top five candidates were judged to have qualified, although all 13 codecs had passed more than 95 % of the 296 requirements tested, duplicated in two languages. After the qualification process, companies declared several collaborations around the qualified candidates. Note that the test results of the Qualification Phase are not included in the present document because they reflect different coders than the final standard.
As a result of examining the codec high level descriptions provided by each candidate at the Qualification meeting, it became clear to the various collaboration groups that all of the qualified candidates were based upon very similar coding principles.
In September 2013, the 12 companies (Ericsson, Fraunhofer IIS, Huawei, Nokia, NTT, NTT DOCOMO, Orange, Panasonic, Qualcomm, Samsung, VoiceAge and ZTE Corporation) that had confirmed their intent to submit a codec for selection declared that they would work together to develop a single joint candidate for the Selection Phase by merging the best elements of the codecs from each of the different collaboration groups.
Even though only a single codec entered the Selection Phase, the strict 3GPP process for codec selection was maintained. The subjective Selection testing comprised 24 experiments, each conducted in two languages. Independent (non-codec proponent) Host Lab (Dynastat Inc.), Cross-check Lab (Audio Research Labs, LLC), Listening Labs (Dynastat Inc., DELTA, and Mesaqin.com s.r.o. (Ltd.)) and Global Analysis Lab (Dynastat Inc.) were used. This testing allowed the codec to be evaluated against 389 requirements, duplicated in two languages. The codec exhibited only two systematic failures (in both languages) at the 95 % confidence level. One of these failures was subsequently addressed, as it was found to be the result of a software bug. Objective testing was also performed.
The single joint candidate was selected at the 3GPP SA4#80bis meeting in August 2014 and the EVS codec specifications were approved at 3GPP TSG-SA#65 in September 2014. The selected EVS codec fulfils the project targets.
The Verification Phase was launched and several organizations volunteered to verify that the code supplied to 3GPP complied with the design constraints and requirements.
The Characterization Phase is the latest phase. During this phase the codec was tested in a more complete manner than in the Selection Phase. In order to evaluate the selected codec in the broadest possible way, a further set of 17 subjective experiments was designed. Five of these experiments were conducted in two different languages, for a total of 22 tests. The aim of these additional experiments, and other objective evaluations, was to evaluate features of the codec which remained untested in previous phases or to highlight areas of interest to 3GPP such as tandeming cases, fullband cases, and multi-bandwidth comparisons. The same listening laboratories used for selection were again employed in characterization.
3GPP has also specified a floating-point version of the EVS codec (TS 26.443). This work was completed by 3GPP TSG-SA#66 in December 2014.
This clause provides an overview of the objectives before the actual work started, as a historical background. The standardized EVS codec fulfilled all project objectives [15].
With the advent of increasingly compact yet powerful mobile devices and the proliferation of high-speed wireless access to telecommunications networks around the globe, users of mobile devices expect and demand growing sophistication in the communication services being offered. Multi-modal interfaces supporting rich multimedia services for content and conversation are commonplace on the desktop, with demand for smart mobile devices with similar functionality steadily growing.
The identification of this potential was the background for 3GPP to launch a study investigating and defining the use cases and requirements for an Enhanced Voice Service in the Evolved Packet System leading to TR 22.813. The present document defines a new set of high-level technical recommendations and recommended requirements for a new codec for the Enhanced Voice Service and concludes that substantially enhanced voice services will become possible with a codec meeting them. The present document recommends starting an EVS codec development work item with the target to meet the requirements and recommendations set in it.
The overall objective of this work item is to develop a codec suitable for the Enhanced Voice Service in the EPS. The following objectives should be achieved with the new codec:
- Enhanced quality and coding efficiency for narrowband (NB) and wideband (WB) speech services, leading to improved user experience and system efficiency. This should also be achieved in interoperation with 3GPP pre-Rel-10 systems and services employing WB voice.
- Enhanced quality by the introduction of super-wideband (SWB) speech, leading to improved user experience.
- Enhanced quality for mixed content and music in conversational applications (for example, in-call music), leading to improved user experience for cases when selection of dedicated 3GPP audio codecs is not possible.
- Robustness to packet loss and delay jitter, leading to optimized behaviour in IP application environments like MTSI within the EPS.
- Backward interoperability to the 3GPP AMR-WB codec by having some WB EVS modes supporting the AMR-WB codec format used throughout 3GPP conversational speech telephony service (including CS). The AMR-WB interoperable operation modes of the EVS codec may be either identical to those in the AMR-WB codec or different but bitstream interoperable with them.
These objectives are to be achieved while meeting all design constraints and performance requirements set forth in TR 22.813. It is further desirable that the codec fulfills needs for enhanced voice services in other 3GPP systems, such as CS. The developments under this work item should lead to a set of new specifications defining, among other things, a textual description of the coding algorithm and the VAD/DTX/CNG scheme.
Following 3GPP practice, fixed-point and floating-point C code and associated test vectors should also be part of this set of specifications. The included AMR-WB interoperable coding format may become an alternative implementation for AMR-WB operation, provided that the enhancements are consistently significant. Jitter buffer management and packet loss concealment should be specified as part of the set of EVS specifications.
The EVS codec enhances coding efficiency and quality for NB and WB for a large bit rate range, starting from 5.9 kbps VBR. It further provides a significant step in quality over these traditional telephony bandwidths with SWB and FB operation starting from 9.6 and 16.4 kbps, respectively. Maximum bit rate is 128 kbps with support for WB, SWB, and FB. The ability to switch the bit rate at every 20-ms frame allows the codec to easily adapt to changes in channel capacity. The codec features discontinuous transmission (DTX) with algorithms for voice/sound activity detection (VAD) and comfort noise generation (CNG). An error concealment mechanism mitigates the quality impact of channel errors resulting in lost packets. A system for jitter buffer management (JBM) is included. The codec also features a channel-aware mode to further improve frame/packet error resilience. Enhanced interoperation with AMR-WB is provided over all nine bit rates between 6.6 kbps and 23.85 kbps.
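As a rough illustration of what per-frame rate switching means for payload size, the sketch below converts some of the constant bit rates named above into coded bits per 20-ms frame. The function name and the rate table are illustrative only and do not come from the EVS specifications. The figures are plain arithmetic (bit rate multiplied by frame duration) and ignore any packetization or header overhead. The 5.9 kbps VBR mode is left out because a variable-bit-rate mode has no single per-frame size.

```python
FRAME_MS = 20  # frame duration stated in the text


def bits_per_frame(bit_rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Return the number of coded bits in one frame at a constant bit rate.

    Hypothetical helper for illustration; works in integer bits per second
    and milliseconds to avoid floating-point rounding.
    """
    total = bit_rate_bps * frame_ms
    if total % 1000:
        raise ValueError("bit rate does not give a whole number of bits per frame")
    return total // 1000


# Constant bit rates quoted in the paragraph above, in bits per second.
# The labels are descriptive only.
RATES_BPS = {
    "AMR-WB interoperable, lowest (6.6 kbps)": 6_600,
    "SWB entry point (9.6 kbps)": 9_600,
    "FB entry point (16.4 kbps)": 16_400,
    "AMR-WB interoperable, highest (23.85 kbps)": 23_850,
    "maximum (128 kbps)": 128_000,
}

if __name__ == "__main__":
    for label, rate in RATES_BPS.items():
        print(f"{label}: {bits_per_frame(rate)} bits per {FRAME_MS}-ms frame")
```

Because the rate may change at every frame, a receiver cannot assume a fixed payload size across a call; each frame's size follows from the rate in force for that frame.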
Clause 5 outlines the Terms of reference for the EVS project. In clause 6, the selection process in 3GPP is presented. An overview of selection and characterization tests can be found in clause 7. The subjective tests provide statistical data which are subject to variations; important notes about interpretation of results are described in clause 8.
The actual test results are presented in clause 9 (narrowband), clause 10 (wideband), and clause 11 (super-wideband). Clause 12 contains the results of mixed-bandwidth and full-band tests, while clause 13 presents the results of objective evaluations.