ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Technologies, Tampere, Finland

ROBUST SPEECH CODING WITH EVS
Anssi Rämö, Adriana Vasilache and Henri Toukomaa
Nokia Technologies, Tampere, Finland
2015-12-16

OUTLINE
- Very short introduction to EVS
- Robustness
- EVS LSF robustness features
- Listening test results
- More results
- Summary
- Questions?

INTRODUCTION TO EVS
- EVS stands for Enhanced Voice Services
- Latest-generation voice and audio codec for 3GPP and VoIP networks
- Introduces SWB and FB at low bitrates of 9.6 and 16.4 kbit/s
- Also supports legacy narrowband and wideband bandwidths
- Supports internal resampling between all supported sampling frequencies: 8, 16, 32 and 48 kHz
- Bitrates from 5.9 to 128 kbit/s
- State-of-the-art quality with both speech and generic audio
- Communications codec with delay less than 32 ms
- Very robust against frame loss
- DTX available for all bitrates and bandwidths

ROBUSTNESS
- Robustness is needed in real communication networks
- Whenever a frame is lost in the communication channel, the decoder has to replace it in real time with the best possible approximation
- If nothing is done, the result is either silent gaps or, at the other extreme, loud bangs when the current signal model is not stable
- EVS has several novel methods in several different domains to enhance robustness
- This paper discusses the spectral modelling robustness features related to LSF quantization
- Listening test results account for all of the EVS robustness-increasing methods
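The replacement of a lost frame can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the EVS concealment algorithm: frames are plain lists of samples, and a lost frame is replaced by a faded copy of the previous output, which avoids both a hard silent gap and an uncontrolled signal model. The names `erasure_mask`, `conceal` and the `attenuation` value are hypothetical.

```python
import random

def erasure_mask(num_frames, fer, seed=0):
    """Return a list of booleans; True marks a frame lost in the channel."""
    rng = random.Random(seed)
    return [rng.random() < fer for _ in range(num_frames)]

def conceal(frames, lost, attenuation=0.5):
    """Replace each lost frame with an attenuated copy of the previous output.

    frames: list of frames, each a list of float samples.
    lost:   list of booleans with the same length as frames.
    """
    output = []
    previous = [0.0] * len(frames[0])  # nothing received yet: start from silence
    for frame, is_lost in zip(frames, lost):
        if is_lost:
            # Repeat the last output, faded, so consecutive losses decay to silence
            current = [attenuation * s for s in previous]
        else:
            current = list(frame)
        output.append(current)
        previous = current
    return output

frames = [[1.0, -1.0], [0.8, -0.8], [0.6, -0.6], [0.4, -0.4]]
lost = [False, True, True, False]
concealed = conceal(frames, lost)
```

`erasure_mask(1000, 0.10)` would give a random loss pattern at roughly 10% frame erasure rate for driving such a toy decoder.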

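LSF ROBUSTNESS FEATURES.

Mode        NB       WB at bitrates <9.6 kbps   WB at bitrates ≥9.6 kbps
Inactive    MA       MA                         MA
Unvoiced    MA       MA                         MA
Voiced      SN/AR    SN/AR                      SN/AR
Generic     SN/AR    SN/AR                      MA
Transition  SN       SN                         SN
Audio       SN/AR    SN/AR                      MA

MA = moving-average predictive, AR = auto-regressive predictive, SN = safety net (non-predictive), SN/AR = switched between the two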

LSF ROBUSTNESS FEATURES..
- The purely predictive quantizer uses a moving average (MA) predictor.
- Auto-regressive (AR) prediction has higher coding gain but also a longer recovery time after a frame loss.
- To limit sensitivity to frame losses, the AR-predictive quantizer is used in conjunction with the safety net.
- Transition mode always uses the non-predictive quantizer, because the signal is by definition changing rapidly.
- Unvoiced and inactive modes always use MA-predictive coding.
- Voiced, audio and generic modes use switched non-predictive/AR-predictive LSF coding at low bitrates. For higher bitrates the MA predictor is used for generic and audio modes.
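The difference in recovery time between AR and MA prediction can be seen in a scalar toy model. The sketch below is an illustration only, assuming first-order predictors, no quantization error, and a decoder that uses a zero residual for a lost frame; the coefficients 0.8 and 0.5 and the function names are hypothetical, and real LSF quantizers work on vectors.

```python
def encode(x, coeff, kind):
    """Return unquantized prediction residuals for a first-order AR or MA predictor."""
    residuals, state = [], 0.0
    for value in x:
        r = value - coeff * state
        residuals.append(r)
        # AR predicts from the previous reconstructed value, MA from the previous residual
        state = value if kind == "AR" else r
    return residuals

def decode(residuals, lost, coeff, kind):
    """Reconstruct the sequence; a lost frame is decoded with a zero residual."""
    output, state = [], 0.0
    for r, is_lost in zip(residuals, lost):
        if is_lost:
            r = 0.0  # the residual never arrived, so only the prediction is used
        value = coeff * state + r
        output.append(value)
        state = value if kind == "AR" else r
    return output

x = [1.0] * 10                      # toy stationary parameter track
lost = [i == 3 for i in range(10)]  # frame 3 is erased

errors = {}
for kind, coeff in (("AR", 0.8), ("MA", 0.5)):
    decoded = decode(encode(x, coeff, kind), lost, coeff, kind)
    errors[kind] = [abs(a - b) for a, b in zip(x, decoded)]
```

In this model the MA decoder is exact again once the lost residual has left its one-frame memory, while the AR decoder carries a decaying error through every following frame, because each reconstruction feeds the next prediction.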

LSF ROBUSTNESS FEATURES...
In case of switched coding, the predictor usage is selected in closed loop, based on several criteria:
- If the non-predictive quantizer is good enough (SD < ~1.0), use it.
- If prediction helps only very little, use the non-predictive quantizer.
- If there is already a very long streak of predictive frames, prefer a non-predictive frame from time to time.
In practice this means that for stable signal segments predictive coding is used quite often (over 85%), but when the signal is more unstable the quantizer automatically inserts non-predictive LSF codebook entries.
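The three criteria above can be sketched as a small decision function. This is a hedged illustration, not the EVS decision logic: only the "good enough" threshold of about 1.0 comes from the slide, while `min_gain`, `max_streak` and all function names are hypothetical values chosen for the example.

```python
def choose_lsf_quantizer(sd_safety_net, sd_predictive, predictive_streak,
                         good_enough_sd=1.0, min_gain=0.1, max_streak=6):
    """Pick "safety_net" (non-predictive) or "predictive" for one frame.

    sd_safety_net / sd_predictive: spectral distortion obtained by actually
    quantizing the frame with each scheme (closed loop).
    predictive_streak: number of consecutive predictive frames so far.
    """
    if sd_safety_net < good_enough_sd:
        return "safety_net"          # non-predictive is already good enough
    if sd_safety_net - sd_predictive < min_gain:
        return "safety_net"          # prediction helps only very little
    if predictive_streak >= max_streak:
        return "safety_net"          # break a long run of predictive frames
    return "predictive"

def select_for_frames(sd_pairs):
    """Apply the decision frame by frame while tracking the predictive streak."""
    decisions, streak = [], 0
    for sd_sn, sd_pred in sd_pairs:
        choice = choose_lsf_quantizer(sd_sn, sd_pred, streak)
        streak = streak + 1 if choice == "predictive" else 0
        decisions.append(choice)
    return decisions

# Eight identical frames where prediction clearly helps (hypothetical SD values)
decisions = select_for_frames([(1.5, 1.0)] * 8)
```

With these assumed settings a stable segment is coded predictively most of the time, and a safety-net frame is forced in after six predictive frames so that a decoder can resynchronise after a loss.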

LSF OBJECTIVE RESULTS
- Even at a high frame erasure rate of 10%, fewer than 5% of frames have a spectral distortion larger than 4 dB.
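An outlier statistic of this kind can be computed as sketched below. The slides do not state how spectral distortion was measured, so this assumes the common definition: the root-mean-square difference, in dB, between two LPC envelopes sampled over the full band. The function names and the 256-point frequency grid are assumptions.

```python
import cmath
import math

def lpc_power_db(a, omega):
    """Power in dB of the all-pole filter 1/A(z), with A(z) = 1 + sum a[k] z^-(k+1)."""
    A = 1.0 + sum(c * cmath.exp(-1j * omega * (k + 1)) for k, c in enumerate(a))
    return -20.0 * math.log10(abs(A))

def spectral_distortion(a_ref, a_test, num_points=256):
    """RMS log-spectral difference in dB between two LPC envelopes."""
    total = 0.0
    for i in range(num_points):
        omega = math.pi * (i + 0.5) / num_points  # frequencies between 0 and pi
        diff = lpc_power_db(a_ref, omega) - lpc_power_db(a_test, omega)
        total += diff * diff
    return math.sqrt(total / num_points)

def outlier_percentage(sd_values, threshold_db=4.0):
    """Percentage of frames whose spectral distortion exceeds the threshold."""
    return 100.0 * sum(sd > threshold_db for sd in sd_values) / len(sd_values)

# First-order toy envelopes; the coefficients are illustrative only
sd_same = spectral_distortion([-0.9], [-0.9])
sd_diff = spectral_distortion([-0.9], [-0.5])
```

Applying `outlier_percentage` to the per-frame distortions of a decoded file, with the 4 dB threshold, gives a figure comparable in form to the one quoted on the slide.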

LISTENING TESTING
- AMR, AMR-WB and EVS were compared against each other
- Tested frame erasure rates: 0%, 3%, 6%, 10% and 15%
- The listening test consisted of two tests: clean speech (DTX enabled) and noisy speech (DTX disabled)
- ACR9 test methodology was used: a scale from 1 (very bad) to 9 (excellent) without reference, i.e. a MOS test
- Tested bitrates: around 12.2-13.2 kbit/s for AMR, AMR-WB and EVS, with additional test points at around 24 kbit/s (comparison to AMR-WB). EVS was also tested at 8, 9.6, 16.4, 32 and 48 kbit/s at various bandwidths.
- 24 naïve listeners in both tests; Finnish language; Sennheiser HD-650 headphones; diotic listening
- Noise types: street, cafeteria, car, and classical music at -15 dB
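For one test condition, the MOS is the mean of the listeners' votes on the 1 to 9 scale. The sketch below also adds a 95% confidence interval using a normal approximation, which is an assumption (the slides do not say how intervals were computed, and with 24 listeners a t-distribution would be slightly wider). The votes are made-up illustration data, not results from the test.

```python
import math
import statistics

def mos_with_ci(votes, z=1.96, scale=(1, 9)):
    """Return (mean opinion score, half-width of the confidence interval)."""
    low, high = scale
    if any(v < low or v > high for v in votes):
        raise ValueError("vote outside the ACR scale")
    mean = statistics.mean(votes)
    # Standard error of the mean from the sample standard deviation
    half_width = z * statistics.stdev(votes) / math.sqrt(len(votes))
    return mean, half_width

# Hypothetical votes from four listeners for a single condition
mos, ci = mos_with_ci([5, 6, 7, 6])
```

Two conditions whose intervals do not overlap can be treated as clearly different; overlapping intervals call for a proper statistical test before any claim of equivalence.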

CLEAN SPEECH RESULTS.

CLEAN SPEECH RESULTS..

CLEAN SPEECH RESULTS
- EVS is significantly more robust than either AMR or AMR-WB at all bitrates.
- Especially impressive is that EVS-WB and EVS-SWB at 13.2 kbit/s with 15 % FER provide approximately the same quality as AMR 12.2 at 3 % FER and AMR-WB 12.65 at 6 % FER.
- Also worth noting is that EVS-FB at 48 kbit/s provides better quality than direct NB voice even at the maximum tested FER of 15 %.

NOISY SPEECH RESULTS.

NOISY SPEECH RESULTS..

NOISY SPEECH RESULTS
- Noisy speech results are very similar to the clean speech results.
- For some reason, at 10 % FER EVS seems to work somewhat better with noisy speech than with clean speech. Background noise likely masks some artifacts that are audible in clean speech.
- Overall the quality drops very linearly with increasing frame erasure rate.

COMBINED RESULTS AT LOW RATES

RESULTS AT LOW BITRATES..
- As can be seen, EVS-SWB at 13.2 kbit/s with 6 % FER provides better quality than any AMR or AMR-WB coding mode in a clean channel.
- Overall, EVS can be estimated to provide an additional 5-6 percentage points of FER robustness margin compared to AMR-WB, and about 10 percentage points compared to AMR 12.2 kbit/s.
- Thus EVS provides the same voice quality as the earlier-generation voice codecs at the same bitrate, even when the channel contains significantly more errors.

COMBINED RESULTS HIGH RATES

RESULTS AT HIGH BITRATES
- AMR-WB 23.85 kbit/s is at least 1.2 MOS points worse than EVS at 16.4 kbit/s at all tested FERs.
- EVS-FB 48 kbit/s provides statistically equivalent quality to the direct FB signal at 0 % FER.
- Even at the extremely high FER of 15 %, EVS-FB 48 kbit/s is better than the direct narrowband signal or AMR-WB 23.85 kbit/s at 6 % FER.

DEMO SAMPLES
- AMR 12.2 0%
- AMR-WB 12.65 0%
- EVS 13.2 no FER
- AMR 12.2 10%
- AMR-WB 12.65 10%
- EVS 13.2 10%
- AMR-WB 23.85 10%
- EVS 48 0%
- EVS 48 10%

SUMMARY
- EVS is extremely robust against frame erasures
- In a clean channel it is transparent to the original (FB, 48 kbit/s)

QUESTIONS?

BACKUP SLIDES
- Combined results in full screen by FER
- Combined results in full screen by bitrate

COMBINED CURVES

COMBINED CURVES