Digital Speech Coding

Similar documents
Speech-Coding Techniques. Chapter 3

GSM Network and Services

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

Synopsis of Basic VoIP Concepts

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Data Compression. Audio compression

A MULTI-RATE SPEECH AND CHANNEL CODEC: A GSM AMR HALF-RATE CANDIDATE

2.4 Audio Compression

Principles of Audio Coding

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

Audio and video compression

Mahdi Amiri. February Sharif University of Technology

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

Application of wavelet filtering to image compression

Abstract. 1. Introduction

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network

Speech and audio coding

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

RTP implemented in Abacus

ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Techonologies, Tampere, Finland

Open AMR Initiative. Technical Documentation. Version 1.0 Revision

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

ETSI TS V ( )

AN EFFICIENT TRANSCODING SCHEME FOR G.729 AND G SPEECH CODECS: INTEROPERABILITY OVER THE INTERNET. Received July 2010; revised October 2011

Voice Over LTE (VoLTE) Technology. July 23, 2018 Tim Burke

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

White Paper Voice Quality Sound design is an art form at Snom and is at the core of our development utilising some of the world's most advance voice

Extraction and Representation of Features, Spring Lecture 4: Speech and Audio: Basics and Resources. Zheng-Hua Tan

ETSI TS V2.1.3 ( )

MOHAMMAD ZAKI BIN NORANI THESIS SUBMITTED IN FULFILMENT OF THE DEGREE OF COMPUTER SCIENCE (COMPUTER SYSTEM AND NETWORKING)

Lecture 7: Audio Compression & Coding

The Steganography In Inactive Frames Of Voip

Implementation of G.729E Speech Coding Algorithm based on TMS320VC5416 YANG Xiaojin 1, a, PAN Jinjin 2,b

Audio Coding and MP3

Investigation of Algorithms for VoIP Signaling

Optical Storage Technology. MPEG Data Compression

Assessing Call Quality of VoIP and Data Traffic over Wireless LAN

Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols

ITNP80: Multimedia! Sound-II!

Overcoming Barriers to High-Quality Voice over IP Deployments

Preface Preliminaries. Introduction to VoIP Networks. Public Switched Telephone Network (PSTN) Switching Routing Connection hierarchy Telephone

ARIB STD-T53-C.S Circuit-Switched Video Conferencing Services

COPYRIGHTED MATERIAL. Introduction. 1.1 Introduction

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Design of a CELP Speech Coder and Study of Complexity vs Quality Trade-offs for Different Codebooks.

Designing Apps using DSP s. Sandeep Harpalani. Residential Gateway market. Analog Devices. - VoIP Applications for

ABSTRACT. that it avoids the tolls charged by ordinary telephone service

ITU-T G.113. Transmission impairments due to speech processing

6MPEG-4 audio coding tools

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Alcatel OmniPCX Enterprise

Ai-Chun Pang, Office Number: 417. Homework x 3 30% One mid-term exam (5/14) 40% One term project (proposal: 5/7) 30%

On the Importance of a VoIP Packet

Wireless systems overview

The Effect of Bit-Errors on Compressed Speech, Music and Images

The BroadVoice Speech Coding Algorithm. Juin-Hwey (Raymond) Chen, Ph.D. Senior Technical Director Broadcom Corporation March 22, 2010

RESIDUAL-EXCITED LINEAR PREDICTIVE (RELP) VOCODER SYSTEM WITH TMS320C6711 DSK AND VOWEL CHARACTERIZATION

Audio-coding standards

Multimedia Communications

Building Residential VoIP Gateways: A Tutorial Part Three: Voice Quality Assurance For VoIP Networks

The MPEG-4 General Audio Coder

Chapter 5. Voice Network Concepts. Voice Network Concepts. Voice Communication Concepts and Technology

ETSI TS V5.0.0 ( )

White Paper. Optimal Codec Selection in International IP based Voice Networks. (Release 2.0) May 2010

Interconnected Systems Modern Telecommunication Systems and their Impact on QoS and Q oe

Presents 2006 IMTC Forum ITU-T T Workshop

Ch. 5: Audio Compression Multimedia Systems

4G Mobile Communications

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

ETSI EN V8.0.1 ( )

End-to-end speech and audio quality evaluation of networks using AQuA - competitive alternative for PESQ (P.862) Endre Domiczi Sevana Oy

Audio-coding standards

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

MATLAB Apps for Teaching Digital Speech Processing

TECHNICAL PAPER. Fraunhofer Institute for Integrated Circuits IIS

Date. Next Generation in Speech Quality ETSI STQ Workshop, Nov 2012 Dr. Imre Varga Qualcomm Inc.

Digital Speech Interpolation Advantage of Statistical Time Division Multiplexer

Setting up a National Transmission Plan & QoS for International Connections. Senior Engineer, TIS ETSI STQ Vice Chair Joachim POMY

Transporting audio-video. over the Internet

Audio 1. Audio and Speech

Audio and Speech. anti-aliasing filter. amplifier. codec A D. G.7xx. 1mV A D. G.7xx. Digital sound. Digital audio. Audio coding

Nokia Q. Xie Motorola April 2007

Introduction to Quality of Service

Throughput Performance Analysis for VOIP over UMTS Networks

VoIP Forgery Detection

Lost VOIP Packet Recovery in Active Networks

On Improving the Performance of an ACELP Speech Coder

ETSI TR V7.2.0 ( )

ORION TELECOM NETWORKS INC.

ETSI TS V ( )

REAL-TIME DIGITAL SIGNAL PROCESSING

VOICE OVER INTERNET PROTOCOL (VOIP)

VALLIAMMAI ENGINEERING COLLEGE

DuSLIC Infineons High Modem Performance Codec

Transcription:

Digital Speech Processing David Tipper Associate Professor Graduate Program of Telecommunications and Networking University of Pittsburgh Telcom 2700/INFSCI 1072 Slides 7 http://www.sis.pitt.edu/~dtipper/tipper.html Digital Speech Coding Digital Speech Convert analog speech to digital form and transmit digitally Applications Telephony: (cellular, wired and Internet- VoIP) Speech Storage (Automated call-centers) High-Fidelity recordings/voice Text-to-speech (machine generated speech) Issues Efficient use of bandwidth Compress to lower bit rate per user => more users Speech Quality Want tollgrade or better quality in a specific transmission environment Environment ( BER, packet lost, packet out of order, delay, etc.) Hardware complexity Speed (coding/decoding delay), computation requirement and power consumption Telcom 2700 2 Digital Speech Processing Speech coding in wireless systems All 1G systems have analog speech transmission 2G and 3G systems have digital speech Type of source coding Motivation for digital speech Increase system capacity Compression possible Quality/bandwidth tradeoffs can be made Improve quality of speech Error control coding possible, equalization, etc. Improve security as encryption possible for privacy Reduce Cost and Operations and Maintenance (OAM) Telcom 2700 3 1

Typical Wireless Communication System Source Source Encoder Channel Encoder Modulator Channel Destination Source Decoder Channel Decoder Demod -ulator Telcom 2700 4 Characteristics of Speech Bandwidth Most of energy between 20 Hz to about 7KHz, Human ear sensitive to energy between 50 Hz and 4KHz Time Signal High correlation Short term stationary Classified into four categories Voiced : created by air passed through vocal cords (e.g., ah, v) Unvoiced : created by air through mouth and lips (e.g., s, f ) Mixed or transitional Silence Telcom 2700 5 Characteristics of Speech Typical Voiced speech v Typical Unvoiced speech s Telcom 2700 6 2

Digital Speech Speech Coder: device that converts speech to digital Types of speech coders Waveform coders Convert any analog signal to digital form Vocoders (Parametric coders) Try to exploit special properties of speech signal to reduce bit rate Build model of speech transmit parameters of model Hybrid Coders Combine features of waveform and vocoders Telcom 2700 7 Speech Quality of Various Coders Mean Opinion Score is a subjective measure of quality Tradeoff in quality vs. data rate vs. complexity Telcom 2700 8 Waveform Coders (e.g.,pcm) Waveform Coders Convert any analog signal to digital - basically A/D signal sampled > twice highest frequency- then quantized into ` n bit samples Uniform quantization Example Pulse Code Modulation band limit speech < 4000 Hz pass speech through μ law compander sample 8000 Hz, 8 bit samples 64 Kbps DS0 rate Characteristics Quality High Complexity Low Bit rate High Delay - Low Robustness - High Telcom 2700 10 3

PCM Speech Coding input compressor PAM Sampleand-hold circuit -to- Digital PCM μ law compander Transmission medium output expander Hold circuit PAM Digital-to μ law expander Pulse code modulation (PCM) system with analog companding then digital conversion ITU G.700 standard basis for speech coding In PSTN in 60 s Telcom 2700 11 Companding 1 μ-law companding Output 0.9 5 15 0.8 100 40 μ = 255 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 (no compression) Compander emphasizes small values, de-emphasizes large values in-order to equalize SNR across samples. Reverse the mapping at the receiver with an expandor 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Input Telcom 2700 12 PCM Speech Coding Digitally companded PCM system ITU G.711 standard better quality speech than analog companding input Linear -to- PAM Digital PCM Digital compressor Sampleand-hold circuit Compressed PCM PCM transmitter Transmission medium output Hold circuit Linear Digital-to PAM PCM Digital expander PCM receiver Differential PCM (DPCM) : reduce bit rate from 64 Kbps to 32 Kbps) since change is small between sample transmit 1 sample then on transmit difference between samples use 4 bits to quantize adaptively adjust range of quantizer improves quality (ADPCM ITU G.726 ) Telcom 2700 13 4

DPCM Speech Coding input Low-pass Differentiator (summer) -to- Digital Encoded difference samples Accumulated signal level Integrator Digital-to DPCM transmitter DPCM input Digital-to Integrator Hold circuit Low-pass output DPCM receiver Telcom 2700 14 Subband Speech Coding Filter 1 A/D 1 speech Filter 2 A/D 2 Mux Channel encoder Filter 3 A/D 3 Partition signal into non-overlapping frequency bands use different A/D quantizer for each band Example: 3 subbands 5600+12000 + 13600 = 31.2 Kbps band Range encoding ---------------------------------------- 1 50-700 Hz 4 bits 2 700-2000 Hz 3 bits 3 2000-3400Hz 2 bits Telcom 2700 15 Vocoders Vocoders (Parametric Coders) Models the vocalization of speech Speech sampled and broken into frames (~25 msec) Instead of transmitting digitized speech 1. Build model of speech 2. Transmit parameters of model 3. Synthesize approximation of speech Linear Predictive Coders (LPC) basic Vocoder model Models vocal tract as a Filter excitation periodic pulse (voiced speech) or noise (unvoiced speech) Transmitted parameters: gain, voiced/unvoiced decision, pitch (if voiced), LPC parameters Telcom 2700 16 5

Vocoders Linear Predictive Coders (LPC) Excitation periodic pulse (voiced speech) or noise (unvoiced speech) Transmitted parameters: gain, voiced/unvoiced decision, pitch (if voiced), LPC parameters Telcom 2700 17 Vocoders Example Tenth Order Linear Predictive Coder Samples Voice at 8000 Hz buffer 240 samples => 30 msec Filter Model (M=10 is order, G is gain, z -1 unit delay, b k are coefficients) H ( z ) = 1 + M G k = 1 b k z G = 5 bits, b k = 8 bits each, voiced/unvoiced decision = 1 bit, pitch = 6 bits => 92 bits/30 msec = 3067 bps k Telcom 2700 18 Vocoders LPC coders can achieve low bit rates 1.2 4.8 Kbps Characteristics of LPC Quality Low Complexity Moderate Bit Rate Low Delay Moderate Robustness Low Quality of pure LPC vocoder to low for cellular telephony - try to improve quality by using hybrid coders Try to improve the quality by refining model of speech, improve accuracy of model improve input to speech coder Telcom 2700 19 6

Vocoders Hybrid Coders Combine Vocoder and Waveform Coder concept Residual LPC (RELP) Codebook excited LPC (CELP) Telcom 2700 20 RELP Vocoder Residual Excited LPC improve quality of LPC by transmitting error (residue) along with LPC parameters s(n) Buffer/ Window residue v/u decision gain, pitch ST-LP Analysis LP parameters LPC Synthesis ENCODER Encoded output Block diagram of a RELP encoder Telcom 2700 21 GSM Speech Coding speech Low-pass A/D 104 kbps 13 kbps RPE-LTP Channel speech encoder encoder 8000 samples/s, 13 bits/sample GSM uses Regular Pulse Excited -- Linear Predictive Coder (RPE-- LPC) for speech Basically combine DPCM concept with LPC Information from previous samples used to predict the current sample. The LPC coefficients, plus an encoded form of the residual (predicted - actual sample = error), represent the signal. Telcom 2700 22 7

GSM Speech Coding (cont) Regular pulse excited - long term prediction (RPE-LRP) speech encoder (RELP speech coder 160 samples/ 20 ms from A/D (= 2080 bits) RPE-LTP speech encoder 36 LPC bits/20 ms 9 LTP bits/5 ms 47 RPE bits/5 ms 260 bits/20 ms to channel encoder LPC: linear prediction coding LTP: long term prediction pitch + input RPE: Residual Prediction Error: Telcom 2700 23 GSM Speech Coding (cont) Channel encoder 260 bits/ 20 ms = 13 kb/s 50 class 1a bits 3-bit CRC 182 class 1b bits 78 class 2 bits 53 bits 4 tail bits* (2,1,5) convolution coder 470 bits Bit interleaver Class 1a: CRC (3-bit error detection) and convolutional coding (error correction) Class 1b: convolutional coding Class 2: no error protection *tail bits to periodically reset convolutional coder 456 bits/ 20 ms = 22.8 kb/s Telcom 2700 24 Hybrid Vocoders Codebook Excited LPC Problem with simple LPC is the voiced/unvoiced decision and pitch estimation doesn t model transitional speech well, and not always accurate Codebook approach pass speech through an analyzer to find closest match to a set of possible excitations (codebook) Transmit codebook pointer + LPC parameters NA-TDMA standard, IS-95, 3G, ITU G.729 standard Telcom 2700 25 8

Typical CELP Encoder Telcom 2700 26 CELP Speech Coders General CELP architecture Telcom 2700 27 Evaluating Speech Coders Qualitative Comparison based on subjective procedures in ITU-T Rec. P. 830 Major Procedures Absolute Category Rating Subjects listen to samples and rank them on an absolute scale - result is a mean opinion score (MOS) Comparison Category Rating Subjects listen to coded samples and original uncoded sample (PCM or analog), the two are compared on a relative scale result is a comparison mean opinion score (CMOS) Mean Opinion Score (MOS) ------------------------------------- Excellent 5 Good 4 Fair 3 Poor 2 Bad 1 Comparison MOS (CMOS) ------------------------------------- Much Better 3 Better 2 Slightly Better 1 About the Same 0 Slightly Worse -1 Worse -2 Much Worse -3 Telcom 2700 29 9

Evaluating Speech Coders MOS for clear channel environment no errors Result vary a little with language and speaker gender Standard Speech coder Bit rate MOS PCM Waveform 64 Kbps 4.3 CT2 ADPCM 32 Kbps 4.1 DECT ADPCM 32 Kbps 4.1 GSM Hybrid RELPC 13 kbps 3.54 QCELP Hybrid CELP 14.4 Kbps 3.4 4.0 QCELP Hybrid CELP 9.6 Kbps 3.4 LPC Vocoder 2.4 Kbps 2.5 ITU G.729 Hybrid CELP 8.Kbps 3.9 Qualcom Codebook Excited LP coder (cdmaone standard) Telcom 2700 30 Evaluating Speech Coders Types of environments recommended for testing coder quality Clean Channel no background noise Vehicle : emulate car background noise Street : emulate pedestrian environment Hoth : emulate background noise in office environment (voice band interference) Consider environments above for cases of Perfect Channel no transmission errors Random channel errors Bursty channel errors May consider repeated encoding/decoding (e.g., mobile to mobile call) Telcom 2700 31 Evaluating Speech Coders Background noise and errors degrade quality Repeated coding degrades quality Telcom 2700 32 10

Codec Selection For cellular need to consider Quality, Complexity, Delay, Compression Rate ITU Coder G.711 Bit Rate 64 Kbps Coding Delay 0 Decoding Delay 0 Complexity Low G.729 8 Kbps 15 ms 7.5 ms Medium G.723.a,b 6.4/5.3 Kbps 35.5 ms 18.75 ms High Telcom 2700 33 Silence Compression Much of a conversation is Silence (~40%) no need to transmit Voice Activity Detector (VAD) Hardware to detect silence period quickly 1. Variable Bit Rate Coder Approach reduce bit rate when silence detected increase compression Cdmaone and CDMA2000 codec use variable bit rate approach 2. Discontinuous transmission (DTX) Approach Stop transmitting frames Send minimal # of frames to keep connection up Comfort Noise Generator (CNG) Synthesize background noise - avoids: Did you hang up? Random noise or reproduce speaker s ambient background GSM, UMTS and popular VoIP G.723.1 codec use VAD/DTX/CNG Telcom 2700 35 Silence Compression Telcom 2700 36 11

Voice Coding Basic Voice Coding Approaches Waveform Vocoders Hybrid Vocoders Evaluation of Vocoder Quality Codebook based vocoders use in new technology 3GPP and ITU recently standardized a AMR wideband CELP input 50-7000 HZ rather than 300-3400 Hz of current systems more natural quality speech slightly higher bit rate Telcom 2700 37 12