Speech-Coding Techniques. Chapter 3

Introduction. Efficient speech-coding techniques bring clear advantages for VoIP, where speech is carried as digital streams of ones and zeros and each coding scheme is identified by an RTP payload type. In general, the lower the bandwidth, the lower the quality. Processing power also matters: better quality for a given bandwidth requires a more complex algorithm, so choosing a codec is a balance between quality and cost.

Voice Quality. Bandwidth is easily quantified, but voice quality is subjective. It is expressed as a Mean Opinion Score (MOS), defined in ITU-T Recommendation P.800, on a five-point scale: Excellent 5, Good 4, Fair 3, Poor 2, Bad 1. A minimum of 30 people listen to voice samples or take part in conversations and rate what they hear.

P.800 recommendations. These cover the selection of participants, the test environment, the explanations given to listeners, and the analysis of results. Toll quality means a MOS of 4.0 or higher.
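The scoring described above can be sketched in a few lines. This is only an illustration of the arithmetic: the function names and the sample panel are hypothetical, and P.800 itself specifies far more about test procedure than a simple average.

```python
from statistics import mean

TOLL_QUALITY_MOS = 4.0  # threshold quoted in the text
MIN_LISTENERS = 30      # minimum panel size quoted in the text


def mean_opinion_score(ratings):
    """Average listener ratings (1 = Bad ... 5 = Excellent) into a MOS."""
    if len(ratings) < MIN_LISTENERS:
        raise ValueError("a MOS test needs at least 30 listeners")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def is_toll_quality(mos):
    """Return True when the score reaches the toll-quality threshold."""
    return mos >= TOLL_QUALITY_MOS


# Hypothetical panel: 10 listeners say Excellent, 15 Good, 5 Fair.
panel = [5] * 10 + [4] * 15 + [3] * 5
score = mean_opinion_score(panel)
print(round(score, 2), is_toll_quality(score))
```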

Subjective and objective quality-testing techniques. PSQM (Perceptual Speech Quality Measurement, ITU-T P.861) is an objective method intended to faithfully represent human judgement and perception. It is an algorithmic comparison between the output signal and a known input, and it takes into account the type of speaker, loudness, delay, active/silence frames, clipping, and environmental noise.

A Little About Speech. Speech is produced by air pushed from the lungs past the vocal cords and along the vocal tract. The vocal cords supply the basic vibrations, and the sound is altered by the disposition of the vocal tract (tongue and mouth). The vocal tract can be modeled as a filter whose shape changes relatively slowly; the vibrations at the vocal cords are the excitation signal.

Speech sounds: voiced sound. The vocal cords vibrate open and closed, interrupting the air flow and producing quasi-periodic pulses of air. The rate of opening and closing determines the pitch. Voiced sounds show a high degree of periodicity at the pitch period, 2-20 ms.

Voiced speech: power spectral density (figure).

Unvoiced sounds. These are produced by forcing air at high velocity through a constriction while the glottis is held open. The result is noise-like turbulence that shows little long-term periodicity, although short-term correlations are still present.

Unvoiced speech: power spectral density (figure).

Plosive sounds. A complete closure is made in the vocal tract; air pressure builds up and is released suddenly, as in the p in "pit" or the d in "dog". Speech comprises a vast array of sounds, yet the speech signal is relatively predictable over time, so the reduction in transmission bandwidth can be significant.

Voice Sampling. Analog-to-digital conversion takes discrete samples of the waveform and represents each sample by some number of bits. A signal can be reconstructed if it is sampled at a minimum of twice its maximum frequency. Human speech occupies 300-3800 Hz, so it is sampled at 8000 samples per second.
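The sampling rule can be checked with simple arithmetic. The sketch below uses hypothetical function names and takes the 3800 Hz upper limit and the 8000 samples-per-second rate from the slide.

```python
def minimum_sampling_rate_hz(max_frequency_hz):
    """Twice the highest frequency present in the signal."""
    return 2 * max_frequency_hz


SPEECH_MAX_HZ = 3800      # upper edge of the speech band quoted in the text
TELEPHONY_RATE_HZ = 8000  # sampling rate quoted in the text

required_hz = minimum_sampling_rate_hz(SPEECH_MAX_HZ)
sample_interval_us = 1_000_000 / TELEPHONY_RATE_HZ

print(required_hz, TELEPHONY_RATE_HZ >= required_hz, sample_interval_us)
```

At 8000 samples per second one sample is taken every 125 microseconds, which leaves a small margin above the 7600 Hz minimum.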

Quantization. Quantization determines how many bits are used to represent a sample. Quantization noise is the difference between the actual level of the input analog signal and the quantized level that represents it.

Using more bits reduces quantization noise, but with diminishing returns. With uniform quantization levels, louder talkers sound better: the same error of 0.2 is minor when 11.2 is coded as 11 but significant when 2.2 is coded as 2. Non-uniform quantization uses smaller quantization steps at smaller signal levels, which spreads the signal-to-noise ratio more evenly.
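The effect can be illustrated numerically. The sketch below first reproduces the 11.2/11 versus 2.2/2 comparison, then contrasts a uniform 8-bit quantizer with a companded one. It assumes the continuous mu-law curve with mu = 255 as the non-uniform scheme; this is a textbook approximation, not the segmented encoding that G.711 actually specifies, and all function names are hypothetical.

```python
import math

MU = 255  # assumption: the constant commonly associated with mu-law companding


def relative_error(actual, coded):
    """Quantization error as a fraction of the actual level."""
    return abs(actual - coded) / abs(actual)


def mu_compress(x):
    """Compress x in [-1, 1] so that small levels are spread over more steps."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_uniform(x, bits=8):
    """Round x in [-1, 1] to the nearest of evenly spaced levels."""
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps


def quantize_companded(x, bits=8):
    """Compress, quantize uniformly, then expand: a non-uniform quantizer."""
    return mu_expand(quantize_uniform(mu_compress(x), bits))


# The slide's example: the same absolute error hurts the quiet talker more.
print(relative_error(11.2, 11), relative_error(2.2, 2))

# Quiet (0.01) and loud (0.9) levels through both quantizers.
for level in (0.01, 0.9):
    print(level,
          relative_error(level, quantize_uniform(level)),
          relative_error(level, quantize_companded(level)))
```

For the quiet level the companded quantizer gives a much smaller relative error than the uniform one, at the cost of a somewhat larger error for the loud level, which is the evening-out described above.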

Types of Speech Coders. Waveform codecs sample and code the signal directly; they give high quality and are not complex, but they use a large amount of bandwidth. Source codecs (vocoders) match the incoming signal to a mathematical model: a linear-predictive filter models the vocal tract, with a voiced/unvoiced flag for the excitation. The model information is sent rather than the signal. Vocoders achieve low bit rates but sound synthetic, and higher bit rates do not improve them much.

Hybrid codecs. These attempt to provide the best of both: they perform a degree of waveform matching while utilizing the sound production model, and they give quite good quality at low bit rates.

G.711. The most commonplace codec, used in the circuit-switched telephone network. It is PCM (Pulse-Code Modulation). With uniform quantization, 12 bits * 8 k samples/s = 96 kbps; with non-uniform quantization, 8 bits per sample give 64 kbps, the DS0 rate. There are two variants: mu-law, used in North America, and A-law, used in other countries and a little friendlier to lower signal levels. G.711 has an MOS of about 4.3.
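The two bit rates follow directly from bits per sample times samples per second. A minimal sketch of that arithmetic, with a hypothetical helper name and assuming 8 bits per sample for the 64 kbps case:

```python
SAMPLE_RATE_HZ = 8000  # telephony sampling rate


def pcm_bit_rate_bps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate of a PCM stream with a fixed number of bits per sample."""
    return bits_per_sample * sample_rate_hz


uniform_bps = pcm_bit_rate_bps(12)     # uniform quantization
companded_bps = pcm_bit_rate_bps(8)    # non-uniform (mu-law or A-law)

print(uniform_bps, companded_bps, 1 - companded_bps / uniform_bps)
```

Non-uniform quantization therefore saves one third of the bandwidth relative to the 12-bit uniform case.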

ADPCM. ADPCM builds on DPCM (Differential PCM), which transmits only the difference between the predicted value and the actual value. Because voice changes relatively slowly, the value of a sample can be predicted from the values of previous samples, and the receiver performs the same prediction. The simplest form uses no prediction and has no algorithmic delay.

ADPCM (Adaptive DPCM) predicts sample values from past samples while factoring in some knowledge of how speech varies over time. The prediction error is quantized and transmitted, so fewer bits are required. G.721 runs at 32 kbps; G.726 converts A-law/mu-law PCM to 16, 24, 32, or 40 kbps. ADPCM has an MOS of about 4.0 at 32 kbps.
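The differential idea can be shown with a toy coder. The sketch below is plain DPCM with a fixed step size and a "previous reconstructed sample" predictor; it is not G.721 or G.726, which adapt both the predictor and the quantizer. The function names, step size, and sample values are hypothetical.

```python
def dpcm_encode(samples, step=4):
    """Code each sample as the quantized difference from the prediction."""
    codes = []
    predicted = 0  # both ends start from the same known state
    for sample in samples:
        code = round((sample - predicted) / step)
        codes.append(code)
        # Track what the decoder will reconstruct, not the true sample,
        # so quantization errors do not accumulate.
        predicted += code * step
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the signal by running the same prediction as the encoder."""
    decoded = []
    predicted = 0
    for code in codes:
        predicted += code * step
        decoded.append(predicted)
    return decoded


signal = [0, 10, 22, 30, 33, 31, 24]  # hypothetical slowly varying samples
codes = dpcm_encode(signal)
rebuilt = dpcm_decode(codes)
print(codes)
print(rebuilt)
```

Because the signal changes slowly, the transmitted codes stay small even when the samples themselves are large, which is why fewer bits are needed per sample.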

Analysis-by-Synthesis (AbS) Codecs. These hybrid codecs fill the gap between waveform and source codecs; the most successful and commonly used are time-domain AbS codecs. The excitation is not a simple two-state voiced/unvoiced choice: different excitation signals are tried, and the one that produces output closest to the original waveform is selected. Variants include MPE (Multi-Pulse Excited), RPE (Regular-Pulse Excited), and CELP (Code-Excited Linear Predictive).

G.728 LD-CELP. CELP codecs use a filter whose characteristics change over time and a codebook of acoustic vectors, where a vector is a set of elements representing various characteristics of the excitation. A CELP coder transmits the filter coefficients, the gain, and a pointer to the chosen vector. Low-Delay CELP is a backward-adaptive coder: it uses previous samples to determine the filter coefficients, operates on five samples at a time, and has a delay of less than 1 ms. Only the pointer is transmitted.

The codebook holds 1024 vectors, so the pointer (index) is 10 bits, which gives 16 kbps. The LD-CELP encoder minimizes a frequency-weighted mean-square error.
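The 16 kbps figure and the codebook search can both be sketched briefly. The code below is a deliberately simplified illustration: it picks the codebook entry with the smallest plain squared error against a five-sample target, whereas the real encoder passes each candidate through the synthesis filter and uses a frequency-weighted error. The random codebook and all function names are hypothetical.

```python
import math
import random

SAMPLE_RATE_HZ = 8000


def index_bits(codebook_size):
    """Bits needed to point at one entry of the codebook."""
    return math.ceil(math.log2(codebook_size))


def bit_rate_bps(codebook_size, samples_per_vector):
    """One index is sent for every block of samples_per_vector samples."""
    return index_bits(codebook_size) * SAMPLE_RATE_HZ // samples_per_vector


def best_index(target, codebook):
    """Return the index of the vector closest to target (squared error)."""
    def squared_error(vector):
        return sum((t - v) ** 2 for t, v in zip(target, vector))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
codebook = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(1024)]

print(index_bits(1024), bit_rate_bps(1024, 5))
print(best_index(codebook[37], codebook))
```

Ten bits for every five samples at 8000 samples per second is 1600 indices per second, or 16 000 bits per second.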

LD-CELP decoder. G.728 has an MOS of about 3.9 at one-quarter of the G.711 bandwidth.

G.723.1 ACELP. Operates at 6.3 or 5.3 kbps; both rates are mandatory, and the coder can change from one to the other during a conversation. The coder takes a band-limited input speech signal sampled at 8 kHz with 16-bit uniform PCM quantization and operates on blocks of 240 samples at a time. With a look-ahead of 7.5 ms, the total algorithmic delay is 37.5 ms, plus other delays. A high-pass filter removes any DC component.
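The 37.5 ms figure is the frame duration plus the look-ahead. A small sketch of that calculation, with a hypothetical helper name:

```python
def algorithmic_delay_ms(frame_samples, lookahead_ms, sample_rate_hz=8000):
    """Time to collect one frame plus the look-ahead the coder needs."""
    frame_ms = 1000 * frame_samples / sample_rate_hz
    return frame_ms + lookahead_ms


# G.723.1 values from the text: 240-sample blocks, 7.5 ms look-ahead.
g7231_delay = algorithmic_delay_ms(240, 7.5)
print(g7231_delay)
```

A block of 240 samples at 8 kHz lasts 30 ms, so the coder cannot emit anything until 30 + 7.5 ms of speech has arrived. The same function applies to any frame-based codec once its frame size and look-ahead are known.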

Various operations determine the appropriate filter coefficients. The 5.3 kbps rate uses Algebraic Code-Excited Linear Prediction (ACELP); the 6.3 kbps rate uses Multi-pulse Maximum Likelihood Quantization (MP-MLQ). The coder transmits linear prediction coefficients, gain parameters, and an excitation codebook index, in 24-octet frames at 6.3 kbps and 20-octet frames at 5.3 kbps.

G.723.1 Annex A. Defines Silence Insertion Description (SID) frames of four octets. The two least significant bits of the first octet identify the frame type:
00    6.3 kbps     24 octets/frame
01    5.3 kbps     20 octets/frame
10    SID frame     4 octets/frame
G.723.1 has an MOS of about 3.8 and a delay of at least 37.5 ms.
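Because the frame type is carried in the first octet, a receiver can split a stream of back-to-back frames without any external length field. The sketch below assumes only the three codes listed above; the value 11 is not described in the text, so this illustration simply rejects it. Function names and the sample stream are hypothetical.

```python
# Frame type code (two least significant bits) -> (description, size in octets)
FRAME_TYPES = {
    0b00: ("6.3 kbps speech", 24),
    0b01: ("5.3 kbps speech", 20),
    0b10: ("SID", 4),
}


def classify_frame(first_octet):
    """Look up the frame type from the two LSBs of the first octet."""
    code = first_octet & 0b11
    if code not in FRAME_TYPES:
        raise ValueError(f"frame type {code:02b} is not covered by this sketch")
    return FRAME_TYPES[code]


def split_frames(stream):
    """Split concatenated frames into (description, frame_bytes) pairs."""
    frames = []
    position = 0
    while position < len(stream):
        description, size = classify_frame(stream[position])
        if position + size > len(stream):
            raise ValueError("truncated frame at end of stream")
        frames.append((description, stream[position:position + size]))
        position += size
    return frames


# Hypothetical stream: one 6.3 kbps frame, one SID frame, one 5.3 kbps frame.
stream = bytes([0b00] + [0] * 23) + bytes([0b10] + [0] * 3) + bytes([0b01] + [0] * 19)
for description, frame in split_frames(stream):
    print(description, len(frame))
```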

G.729. Runs at 8 kbps on input frames of 10 ms (80 samples at the 8 kHz sampling rate) with a 5 ms look-ahead, for an algorithmic delay of 15 ms. Each 10 ms of speech produces an 80-bit frame. It is a complex codec. G.729A (Annex A) introduces a number of simplifications while keeping the same frame structure, so G.729 and G.729A encoders and decoders are interchangeable, at slightly lower quality.

G.729 has an MOS of about 4.0 and G.729A an MOS of about 3.7. G.729B adds VAD (Voice Activity Detection), based on analysis of several parameters of the input over the current frame plus the two preceding frames; DTX (Discontinuous Transmission), which sends nothing or sends an SID frame containing the information needed to generate comfort noise; and CNG (Comfort Noise Generation).
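The interplay of VAD and DTX can be sketched with a toy decision loop. This is not the G.729B algorithm: it substitutes a single parameter (frame energy against a fixed threshold) for the several parameters the standard analyses, and keeps only the idea of looking at the current frame plus the two preceding ones. The threshold, function names, and test frames are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def dtx_decisions(frames, threshold=100.0):
    """Decide per frame whether to send speech, an SID frame, or nothing."""
    decisions = []
    history = []       # energies of up to two preceding frames
    sid_sent = False   # True once the current silence has been announced
    for frame in frames:
        energy = frame_energy(frame)
        # Voice is declared active if the current or either of the two
        # preceding frames is above the threshold.
        active = max([energy] + history) > threshold
        if active:
            decisions.append("speech")
            sid_sent = False
        elif not sid_sent:
            decisions.append("sid")   # describes the noise for the receiver
            sid_sent = True
        else:
            decisions.append("none")  # receiver keeps generating comfort noise
        history = (history + [energy])[-2:]
    return decisions


loud = [100] * 80   # hypothetical 10 ms frame of speech
quiet = [1] * 80    # hypothetical 10 ms frame of background noise
result = dtx_decisions([loud, quiet, quiet, quiet, quiet, loud])
print(result)
```

Looking back over preceding frames keeps the coder from cutting off the tail of a word, at the cost of sending a couple of extra speech frames after the talker stops.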

G.729 Annex D is a lower-rate extension: 6.4 kbps, with 64 bits per 10 ms frame of speech, and an MOS comparable to that of 6.3 kbps G.723.1. G.729 Annex E is a higher-bit-rate enhancement: the linear prediction filter has 30 coefficients instead of the 10 in G.729, and the codebook has 44 bits instead of 35, giving 118 bits per frame, or 11.8 kbps.
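The three G.729 rates all follow from the number of bits produced per 10 ms frame. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def frame_bit_rate_bps(bits_per_frame, frame_ms=10):
    """Bit rate of a codec that emits bits_per_frame every frame_ms."""
    return bits_per_frame * 1000 // frame_ms


# Bits per 10 ms frame, taken from the text.
variants = {"G.729": 80, "G.729 Annex D": 64, "G.729 Annex E": 118}
for name, bits in variants.items():
    print(name, frame_bit_rate_bps(bits) / 1000, "kbps")
```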

Other Codecs. CDMA QCELP (Qualcomm Code-Excited Linear Predictor), defined in IS-733, is a variable-rate coder. The two most common rates are the high rate, 13.3 kbps, and a lower rate, 6.2 kbps. It supports silence suppression. Its use with RTP is defined in RFC 2658.

GSM Enhanced Full-Rate (EFR), GSM 06.60. An enhanced version of GSM Full-Rate and an ACELP-based codec, with the same bit rate and the same overall packing structure, at 12.2 kbps. It supports discontinuous transmission. Its use with RTP is covered by RFC 1890.

GSM Adaptive Multi-Rate (AMR) codec, GSM 06.90. It has eight different modes from 4.75 kbps to 12.2 kbps; the 12.2 kbps mode is GSM EFR and the 7.4 kbps mode is IS-641 (TDMA cellular systems). The mode can change at any time, and the codec offers discontinuous transmission. It is the coding choice of many 3G wireless networks.

The MOS values are for laboratory conditions. G.711 does not deal with lost packets. G.729 can accommodate a lost frame by interpolating from previous frames, but this causes errors in subsequent speech frames. Processing power: G.728 or G.729 needs about 40 MIPS; G.726 needs about 10 MIPS.

Cascaded Codecs. For example, a G.711 stream passed through a G.729 encoder/decoder. The quality might not even come close to that of G.729 alone, because each coder generates only an approximation of its incoming signal.

Tones, Signals, and DTMF Digits. The hybrid codecs are optimized for human speech, but other data may need to be transmitted: tones (fax tones, dial tone, busy tone) and DTMF digits for two-stage dialing or voice mail. G.711 handles these acceptably; through G.723.1 and G.729 they can become unintelligible. The ingress gateway therefore needs to intercept the tones and DTMF digits and use an external signaling system.

This is easy at the start of a call but difficult in the middle of a call. The alternative is to encode the tones differently from the speech and send them along the same media path. Either an RTP packet provides the name of the tone and its duration, or a dynamic RTP profile is used, with an RTP packet containing the frequency, volume, and duration. RFC 2198, an RTP payload format for redundant audio data, allows both types of RTP payload to be sent.

RTP Payload Format for DTMF Digits. An Internet Draft that covers both methods described before and defines a large number of tones and events: DTMF digits, a busy tone, a congestion tone, a ringing tone, etc. In the named-events payload, E marks the end of the tone and R is reserved.

Payload format (figure).
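A named-event payload can be packed and unpacked in a few lines. The sketch below assumes the four-octet layout that the draft was later published with in RFC 2833: an 8-bit event code, the E bit, the R bit, a 6-bit volume, and a 16-bit duration in timestamp units. The function names are hypothetical, and the slide's own figure is not available to confirm the field widths.

```python
import struct


def pack_event(event, end, volume, duration):
    """Build the 4-octet named-event payload (R bit left at zero)."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags_and_volume = (int(end) << 7) | volume
    return struct.pack("!BBH", event, flags_and_volume, duration)


def unpack_event(payload):
    """Decode the first four octets of a named-event payload."""
    event, flags_and_volume, duration = struct.unpack("!BBH", payload[:4])
    return {
        "event": event,
        "end": bool(flags_and_volume & 0x80),       # E: end of the tone
        "reserved": bool(flags_and_volume & 0x40),  # R: reserved
        "volume": flags_and_volume & 0x3F,
        "duration": duration,
    }


# DTMF digit 5, final packet of the tone, 800 timestamp units
# (100 ms at an 8000 Hz clock).
payload = pack_event(event=5, end=True, volume=10, duration=800)
print(payload.hex(), unpack_event(payload))
```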