Written by Anna Llagostera | March 24, 2023
The EVS codec was developed by 3GPP (the creator of the 3G, 4G and 5G mobile communication standards) and released in 2014 to replace the AMR-WB codec for VoLTE and VoNR mobile telephony. The codec improves speech quality thanks to an enhanced coding scheme, extended audio bandwidth (up to 20 kHz) and improved compensation for delay jitter and packet loss.
EVS AMR-WB interoperable (IO) mode
One of the requirements when developing the new EVS codec was to retain compatibility with AMR-WB. As a result, the new codec has an AMR-WB IO mode that is compatible with systems that do not support the EVS codec. EVS AMR-WB IO is increasingly used in current mobile networks and terminals.
When the resulting bitstream is decoded with a legacy AMR-WB decoder, similar speech quality is expected from the EVS AMR-WB IO encoder and the legacy AMR-WB encoder. In contrast, the EVS AMR-WB IO decoder can reproduce better speech quality from an AMR-WB encoded bitstream than a classical AMR-WB decoder. To verify this, we used speech files decoded in the EVS AMR-WB IO mode in a standardized, formal listening test in our speech quality lab.
Listening test
We designed a listening test with speech files ranging from excellent to very poor quality and invited people to evaluate the perceived quality of the recordings using Absolute Category Rating (ACR). In ACR, listeners rate each audio sample on a 5-point scale against the speech quality they would expect over HiFi headphones, where 5 is excellent and 1 is bad.
The test covered speech signals ranging from fullband to narrowband. It contained plain audio bandwidth limitations and offline processing conditions that use common speech codecs such as EVS, Opus*, AMR-WB and AMR. It also included live recordings from R&S real field measurements, collected under good, average and poor network conditions, that reflect typical codec and bitrate usage in mobile communications.
We also added conditions that reflect current EVS AMR-WB IO usage in mobile networks, where the mode helps improve perceived quality. We opted for AMR-WB encoding followed by EVS AMR-WB IO decoding at bitrates of 23.85 kbit/s and 12.65 kbit/s. The goal was to see how much listening quality improved relative to the legacy AMR-WB codec.
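As a rough illustration of how ACR votes are typically condensed, the sketch below averages the 1-to-5 ratings of one test condition into a mean opinion score (MOS) and adds an approximate 95% confidence interval. The function name `mos_with_ci`, the condition labels and all votes are hypothetical and made up for this example; they are not data from the listening test described here, and the normal-approximation interval is an assumption rather than the analysis actually used.

```python
import math
import statistics

# ACR category labels on the 5-point scale (5 = excellent, 1 = bad).
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes independent votes and a normal approximation (z = 1.96).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical votes for two hypothetical conditions (NOT measured data).
votes = {
    "condition_a": [3, 4, 3, 3, 4, 2, 3, 4],
    "condition_b": [4, 4, 5, 4, 3, 4, 4, 5],
}

for name, ratings in votes.items():
    mos, ci = mos_with_ci(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Two conditions are usually considered clearly different only when the gap between their scores is large relative to these intervals, which is one reason formal tests use a sufficiently large listening panel.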
The listening test was conducted in the R&S SwissQual speech quality lab in Switzerland and used the four German reference speech samples standardized in ITU-T Recommendation P.501 Annex C. The speech samples were well-balanced, high-quality two-sentence speech recordings with a 24 kHz audio bandwidth. We invited 24 people to listen to the speech files, none of whom had any prior knowledge of the test background or of voice coding techniques. The listeners were normal telephone users. The listeners and the listening conditions in the lab complied with ITU-T standards, a precondition for ITU to consider the test a standard-conformant, formal listening test.
Detailed results from the listening test were presented to ITU-T in January 2023.
Results
The table below shows the mean opinion scores (MOS) for that particular test design and listening panel. The reference speech samples are encoded with the AMR-WB codec and decoded with (a) the legacy AMR-WB decoder and (b) the EVS AMR-WB IO decoder.
We found that the perceived speech quality increased substantially for both bitrates included in the listening test (23.85 and 12.65 kbit/s) when decoding an AMR-WB bitstream with an EVS AMR-WB IO decoder instead of the legacy AMR-WB decoder. The improvement in listening quality was larger at the lower of the two bitrates (12.65 kbit/s). The listening test confirmed the better decoding capabilities of EVS AMR-WB IO and helped quantify the expected improvement in speech quality enabled by this EVS feature.
Extended audio bandwidth with EVS AMR-WB IO
The extended audio bandwidth reproduced by the EVS AMR-WB IO decoder is one reason for the clear gain in speech quality. The figure below shows the average signal spectrum for a speech signal encoded with the AMR-WB codec and decoded with (a) the standard AMR-WB decoder (in blue) and (b) the EVS AMR-WB IO decoder (in orange).
Extended audio bandwidth with EVS AMR-WB IO as decoder (orange) instead of AMR-WB (blue) to decode an AMR-WB stream with a 23.85 kbit/s bitrate
The EVS AMR-WB IO decoder can reproduce a speech signal with roughly 7.8 kHz of audio bandwidth, while the legacy AMR-WB decoder has ‘just’ 7 kHz of audio bandwidth.
In addition to the extended audio bandwidth, EVS AMR-WB IO delivers better speech quality in conditions without packet loss thanks to the improved EVS post-processing modules, which reduce the number of coding artifacts.
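To make the notion of audio bandwidth more concrete, the following minimal sketch estimates the upper band edge of a signal as the highest frequency whose spectral level lies within a chosen threshold of the spectral peak. It is only an illustration under stated assumptions: the signals are synthetic sums of sine tones (not decoded speech), the function names, the 32 kHz sample rate and the -40 dB threshold are hypothetical choices, and a naive DFT is used so that the example needs only the Python standard library. A real analysis would average windowed spectra over speech recordings.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def upper_band_edge_hz(samples, sample_rate_hz, threshold_db=-40.0):
    """Highest frequency whose level is within threshold_db of the peak."""
    mags = magnitude_spectrum(samples)
    floor = max(mags) * 10 ** (threshold_db / 20)
    edge_bin = max(k for k, m in enumerate(mags) if m >= floor)
    return edge_bin * sample_rate_hz / len(samples)


def synthetic_tones(freqs_hz, sample_rate_hz=32000, n=640):
    """Sum of unit-amplitude sine tones; a stand-in for a decoded signal."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate_hz)
                for f in freqs_hz)
            for i in range(n)]


# Hypothetical test signals: content up to 7 kHz versus up to 7.8 kHz.
narrower = synthetic_tones([1000, 3000, 7000])
wider = synthetic_tones([1000, 3000, 7000, 7800])

print("narrower signal edge (Hz):", upper_band_edge_hz(narrower, 32000))
print("wider signal edge (Hz):", upper_band_edge_hz(wider, 32000))
```

The tone frequencies are chosen to fall exactly on DFT bins (50 Hz spacing here), which avoids spectral leakage; with real speech a window function and a more careful threshold would be needed.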
Speech quality prediction by Recommendation ITU-T P.863 ‘POLQA’
The quality of the speech samples recorded with our Rohde & Schwarz mobile network measurement tools is evaluated with Recommendation ITU-T P.863 ‘POLQA’, which Rohde & Schwarz SwissQual AG co-invented and for which it holds intellectual property rights. Since EVS AMR-WB IO decoding is now increasingly used in real-field environments, validating the ITU-T P.863 prediction accuracy under such conditions is especially important.
We computed the average ITU-T P.863 ‘POLQA’ scores for the legacy AMR-WB decoder and the EVS AMR-WB IO decoder based on the reference speech samples standardized in ITU-T P.501 Annex D. Rohde & Schwarz uses the ITU-T P.501 Annex D reference speech samples in all its measurement tools, since the samples are specifically prepared for ITU-T P.863 ‘POLQA’. The samples use sentences spoken by male and female speakers, and their ITU-T P.863 ‘POLQA’ scores are close to the average obtained across many speech samples.
The table below shows the average ITU-T P.863 ‘POLQA’ scores in fullband (FB) mode for (a) the legacy AMR-WB decoder and (b) the EVS AMR-WB IO decoder.
The perceptible improvement in speech quality when decoding with the EVS AMR-WB IO decoder instead of the legacy AMR-WB decoder can be clearly reproduced with ITU-T P.863 ‘POLQA’ for the two most common AMR-WB bitrates of 23.85 and 12.65 kbit/s.
Conclusion
Using the AMR-WB IO mode of EVS together with the legacy AMR-WB codec can improve speech quality for telephone services in mobile networks. EVS AMR-WB IO decoders are significantly better than legacy AMR-WB decoders at decoding AMR-WB bitstreams. Because this solution is inexpensive to implement, we expect to see the EVS AMR-WB IO mode used more often in networks and devices.
_______________________________
* The Opus codec, released in 2012 and standardized by the Internet Engineering Task Force (IETF) as RFC 6716, is used in OTT applications, most notably in WhatsApp VoIP audio calls. Like the EVS codec, the Opus codec can deliver an audio bandwidth up to fullband.