Source Coding Basics and Speech Coding. Yao Wang, Polytechnic University, Brooklyn, NY 11201. http://eeweb.poly.edu/~yao

Outline: Why we need to compress speech signals; basic components of a source coding system; variable-length binary encoding (Huffman coding); speech coding overview; predictive coding (DPCM, ADPCM); vocoder; hybrid coding; speech coding standards.

Why do we need to compress? Raw PCM speech (sampled at 8 kHz, represented with 8 bits/sample) has a data rate of 64 kbps. Speech coding refers to a process that reduces the bit rate of a speech signal. Speech coding enables a telephone company to carry more voice calls on a single fiber or cable. Speech coding is necessary for cellular phones, which have a limited data rate per user (<= 16 kbps is desired!); for the same mobile user, the lower the bit rate for a voice call, the more other services (data/image/video) can be accommodated. Speech coding is also necessary for voice-over-IP and audio-visual teleconferencing, to reduce the bandwidth consumed over the Internet.

Basic Components in a Source Coding System: input samples -> Transformation -> transformed parameters -> Quantization -> quantized parameters -> Binary Encoding -> binary bitstream. Transformation options include prediction, transforms, and model fitting; quantization (scalar Q or vector Q) is the lossy step; binary encoding (fixed length, or variable length such as Huffman, arithmetic, or LZW) is lossless. Motivation for the transformation: to yield a more efficient representation of the original samples.

Example Coding Methods. ZIP: no transformation or quantization; apply VLC (LZW) directly to the stream of letters (symbols) in a file; lossless coding. PCM for speech: no transformation; quantize the speech samples directly using a mu-law quantizer; apply fixed-length binary coding. ADPCM for speech: apply prediction to the original samples, with the predictor adapted from one speech frame to the next; quantize the prediction error; code the error symbols using fixed-length binary coding. JPEG for images: apply the discrete cosine transform to blocks of image pixels; quantize the transformed coefficients; code the quantized coefficients using variable-length coding (run-length + Huffman coding).

Binary Encoding. Binary encoding represents a finite set of symbols using binary codewords. Fixed-length coding: N levels are represented by ceil(log2 N) bits. Variable-length coding (VLC): more frequently appearing symbols are represented by shorter codewords (Huffman, arithmetic, LZW = zip). The minimum number of bits required to represent a source is bounded by its entropy.

Entropy Bound on Bit Rate. Shannon's theorem: consider a source with a finite number of symbols {s_1, s_2, ..., s_N}, where symbol s_n has probability Prob(s_n) = p_n. If s_n is given a codeword with l_n bits, the average bit rate (bits/symbol) is l_bar = sum_n p_n l_n. The average bit rate is bounded by the entropy of the source H: H <= l_bar <= H + 1, where H = -sum_n p_n log2(p_n). For this reason, variable-length coding is also known as entropy coding. The goal in VLC is to reach the entropy bound as closely as possible.
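As a quick numerical check of the bound, here is a small Python sketch (not part of the original slides) that computes the entropy and the average codeword length for the four-symbol source used in the Huffman example that follows; the symbol-to-probability assignment follows the reconstructed table on the next slide.

import math

# Probabilities and Huffman codeword lengths from the example on the next slide.
probs = {2: 36/49, 3: 8/49, 1: 4/49, 0: 1/49}
lengths = {2: 1, 3: 2, 1: 3, 0: 3}

H = -sum(p * math.log2(p) for p in probs.values())   # entropy of the source
avg = sum(probs[s] * lengths[s] for s in probs)      # average bits/symbol of the code

print(f"H = {H:.2f} bits/symbol, average rate = {avg:.2f} bits/symbol")  # ~1.16 and ~1.37
assert H <= avg <= H + 1   # Shannon bound: H <= l_bar <= H + 1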

Huffman Coding Example (probabilities and the resulting code):

Symbol  Prob   Codeword  Length
2       36/49  1         1
3       8/49   01        2
1       4/49   001       3
0       1/49   000       3

(Intermediate node probabilities formed while building the tree: 5/49 = 4/49 + 1/49 and 13/49 = 8/49 + 5/49.)

Average length: l_bar = (36/49)*1 + (8/49)*2 + (4/49)*3 + (1/49)*3 = 67/49, about 1.37 bits/symbol. Entropy: H = -sum_k p_k log2(p_k), about 1.16 bits/symbol.

Huffman Coding Example Continued. Code the sequence of symbols {3,2,2,0,1,1,2,3,2,2} using the Huffman code designed previously. Code table: symbol 0 -> 000, 1 -> 001, 2 -> 1, 3 -> 01. Coded sequence: {01,1,1,000,001,001,1,01,1,1}. Average bit rate: 18 bits / 10 symbols = 1.8 bits/symbol, versus a fixed-length coding rate of 2 bits/symbol; the saving is more obvious for a longer sequence of symbols. Decoding: table look-up.

Steps in Huffman Code Design. Step 1: arrange the symbol probabilities in decreasing order and consider them as leaf nodes of a tree. Step 2: while there is more than one node, find the two nodes with the smallest probabilities, assign the one with the lower probability a 0 and the other a 1 (or the other way around, but be consistent), merge the two nodes into a new node whose probability is the sum of the two merged nodes, and repeat. Step 3: for each symbol, determine its codeword by tracing the assigned bits from the corresponding leaf node to the top of the tree; the bit at the leaf node is the last bit of the codeword.
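The steps above translate directly into code. Below is a minimal Python sketch (not from the original course material) of the tree construction and of table-lookup decoding; applied to the four-symbol example above it reproduces the codeword lengths 1, 2, 3, 3 and the 18-bit encoding.

import heapq
from itertools import count

def huffman_code(probs):
    """Build {symbol: codeword} by repeatedly merging the two least probable nodes."""
    tie = count()  # unique tiebreaker so equal probabilities never compare the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # lowest probability  -> gets bit '0'
        p1, _, group1 = heapq.heappop(heap)   # next lowest         -> gets bit '1'
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def huffman_decode(bits, table):
    """Decode by greedy prefix matching (valid because the code is prefix-free)."""
    inverse = {c: s for s, c in table.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    return symbols

table = huffman_code({2: 36/49, 3: 8/49, 1: 4/49, 0: 1/49})
bits = "".join(table[s] for s in [3, 2, 2, 0, 1, 1, 2, 3, 2, 2])
print(table, len(bits))             # 18 bits for 10 symbols = 1.8 bits/symbol
print(huffman_decode(bits, table))  # recovers [3, 2, 2, 0, 1, 1, 2, 3, 2, 2]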

Another Example of Huffman Coding. [Figure: Huffman tree construction over eight symbols b(0)-b(7); Sections I-VIII show the successive merges of the two least probable nodes and the 0/1 bit assignments, ending at a root probability of 1.0, from which the codewords are read off.] Average bit rate = sum_n p_n l_n = 2.71 bits; entropy = -sum_n p_n log2(p_n) = 2.68 bits.

More on Huffman Coding. If the probability distribution is known and accurate, Huffman coding is very good: its average length reaches the entropy upper bound, H <= l_bar < H + 1 (off from the entropy by at most 1 bit). One can code one symbol at a time (scalar coding) or a group of symbols at a time (vector coding). Adaptive Huffman coding estimates the probabilities from the sequence online. Other lossless coding methods: arithmetic coding reaches the entropy lower bound more closely and can adapt to changing probabilities more easily, but it is more complex than Huffman coding; Lempel-Ziv-Welch (LZW) does not employ a probability table and is universally applicable, but it is less efficient.

Speech Coding. Speech coders are lossy coders, i.e., the decoded signal differs from the original. The goal in speech coding is to minimize the distortion at a given bit rate, or to minimize the bit rate needed to reach a given distortion. Figure of merit: the objective measure of distortion is SNR (signal-to-noise ratio), but SNR does not correlate well with perceived speech quality. Speech quality is therefore often measured by MOS (mean opinion score): 5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad. PCM at 64 kbps with mu-law or A-law has MOS = 4.5 to 5.
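For reference, the objective SNR measure mentioned above can be computed as in the generic Python sketch below (not from the slides); as noted, a high SNR does not guarantee a high MOS.

import numpy as np

def snr_db(original, decoded):
    """SNR in dB: signal power divided by the power of the coding error."""
    x = np.asarray(original, dtype=float)
    e = x - np.asarray(decoded, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(e ** 2))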

Speech Quality Versus Bit Rate for Common Classes of Codecs. From: J. Woodard, Speech coding overview, http://www-mobile.ecs.soton.ac.uk/speech_codecs

Predictive Coding (LPC or DPCM). Observation: adjacent samples are often similar. Predictive coding: predict the current sample from previous samples, then quantize and code the prediction error instead of the original sample. If the prediction is accurate most of the time, the prediction error is concentrated near zero and can be coded with fewer bits than the original signal. Usually a linear predictor is used (linear predictive coding): x_p(n) = sum_{k=1..P} a_k x(n-k). These are waveform-based coders. Other names: non-predictive coding (uniform or non-uniform quantization) -> PCM; predictive coding -> differential PCM or DPCM.

Encoder Block Diagram: x(n) -> (subtract x_p(n)) -> d(n) -> Q(.) -> d_hat(n) -> binary encoder -> c(n); a buffer and predictor operate on the reconstructed samples x_hat(n) to form x_p(n). Equations: x_p(n) = sum_k a_k x_hat(n-k); d(n) = x(n) - x_p(n); x_hat(n) = x_p(n) + d_hat(n). Note: the predictor is closed-loop: it is based on reconstructed (not original) past values.

Decoder Block Diagram: c(n) -> binary decoder -> Q^{-1}(.) -> d_hat(n) -> (add x_p(n)) -> x_hat(n); a buffer and predictor operate on the reconstructed samples to form x_p(n). Note: the decoder predictor MUST track the encoder predictor, hence BOTH are based on reconstructed (not original) past values.
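The two block diagrams can be captured in a few lines of Python. This is an illustrative sketch, not the course's reference implementation; the predictor and quantizer are passed in as functions, and the previous-sample predictor plus 3-level quantizer mirror the example on the next slide.

def dpcm_encode(x, predict, quantize):
    """Closed-loop DPCM encoder: predict from past *reconstructed* samples."""
    errors, recon = [], []
    for n, sample in enumerate(x):
        xp = predict(recon, n)        # x_p(n)
        dq = quantize(sample - xp)    # d_hat(n) = Q[x(n) - x_p(n)]
        errors.append(dq)
        recon.append(xp + dq)         # x_hat(n) = x_p(n) + d_hat(n)
    return errors, recon

def dpcm_decode(errors, predict):
    """DPCM decoder: runs the same predictor on the same reconstructed values."""
    recon = []
    for n, dq in enumerate(errors):
        recon.append(predict(recon, n) + dq)
    return recon

# Previous-sample predictor and 3-level quantizer from the example that follows;
# the first sample (1) is assumed known, so it serves as the initial prediction.
predict = lambda recon, n: recon[n - 1] if n > 0 else 1
quantize = lambda d: 2 if d >= 1 else (-2 if d <= -1 else 0)
errors, recon = dpcm_encode([1, 3, 4, 4, 7, 8, 6, 5, 3, 1], predict, quantize)
# recon -> [1, 3, 5, 3, 5, 7, 5, 5, 3, 1]; the decoder reproduces it from `errors`.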

Example. Code the following sequence of samples using a DPCM coder: {1,3,4,4,7,8,6,5,3,1,...}. Use a simple predictor that predicts the current value from the previous one: x_p(n) = a_1 x(n-1) with a_1 = 1 (the previous reconstructed value in the closed-loop coder). Use a 3-level quantizer: Q(d) = 2 if d >= 1, 0 if -1 < d < 1, -2 if d <= -1. Show the reconstructed sequence; for simplicity, assume the first sample is known. Show the coded binary stream if the following code is used for the difference signal: error 0 -> 1, error 2 -> 01, error -2 -> 00.

Delta-Modulation (DM). Quantize the prediction error to 2 levels, +Delta and -Delta, so each sample takes 1 bit and the bit rate equals the sampling rate. d_hat(n) = Q[d(n)] = +Delta with c(n) = 1 if d(n) >= 0, and -Delta with c(n) = 0 if d(n) < 0.
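A delta modulator is the 1-bit special case of the DPCM loop above. The sketch below (illustrative, not from the slides) uses a previous-sample predictor and a fixed step Delta; with Delta = 2 and the first sample known, it can be used for the exercise on the next slide.

def dm_encode(x, delta, x0=0.0):
    """Delta modulation: transmit 1 bit/sample; step the reconstruction by +/- delta."""
    bits, recon = [], []
    prev = x0                           # previous reconstructed value = prediction
    for sample in x:
        bit = 1 if sample - prev >= 0 else 0
        prev = prev + delta if bit else prev - delta
        bits.append(bit)
        recon.append(prev)
    return bits, recon

# Exercise setup: first sample (1) known, delta = 2, code the remaining samples.
bits, recon = dm_encode([3, 4, 4, 7, 8, 6, 5, 3, 1], delta=2, x0=1)
# bits -> [1, 1, 0, 1, 1, 0, 1, 0, 0], recon -> [3, 5, 3, 5, 7, 5, 7, 5, 3]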

Example. Code the following sequence of samples using DM: {1,3,4,4,7,8,6,5,3,1,...}, using Delta = 2 and assuming the first sample is known. Show the reconstructed sequence and the coded binary sequence.

DM With a Simple Predictor. Predictor: predict the current sample by the previous reconstructed one, x_p(n) = x_hat(n-1). Stepsize: Delta = 35. [Figure: illustration of linear delta modulation, showing the original signal and, at each sampling instant, the original sample, the predicted value, and the reconstructed value.]

How to determine the stepsize? [Figure (demo_sindm.m): three panels comparing stepsizes on a sinusoid - Q = 20: too small, the reconstruction lags behind the signal; Q = 35: OK, it hugs the original; Q = 80: too large, it oscillates around the signal. Legend: * original value, o predicted value, x reconstructed value.]

Selecting DM Parameters Based on Signal Statistics. [Figure: histogram of the difference between adjacent samples, computed in MATLAB as L=length(x); d=x(2:L)-x(1:L-1); hist(d,20);] Delta should be chosen as a difference value that occurs with a relatively large percentage; in the above example, Delta is roughly 0.02 to 0.03. In adaptive DM, Delta is varied automatically so that the reconstructed signal hugs the original signal.
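The adaptive DM ("ADM") used in the listening examples varies Delta on the fly. The Python sketch below shows one common adaptation rule (grow the step while the output bits repeat, shrink it when they alternate); the factor P = 1.5 matches the "ADM P=1.5" label in the demos, but the exact rule used to generate those files is an assumption.

def adm_encode(x, delta0, P=1.5, x0=0.0):
    """Adaptive delta modulation sketch with a multiplicative step-size rule."""
    bits, recon = [], []
    prev, delta, last = x0, delta0, None
    for sample in x:
        bit = 1 if sample - prev >= 0 else 0
        if last is not None:
            # Repeated bits suggest slope overload -> enlarge the step;
            # alternating bits suggest granular noise -> reduce it.
            delta = delta * P if bit == last else delta / P
        prev = prev + delta if bit else prev - delta
        bits.append(bit)
        recon.append(prev)
        last = bit
    return bits, recon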

Uniform Quantization vs. DM Applied to Audio. [Figure: four waveform panels - original (Mozart_short.wav, 705 kbps); uniform quantization with 32 levels (Mozart_q32_short.wav, 220 kbps); DM with a stepsize of about 0.025 (Mozart_DM_Q_.25.wav, 44 kbps); adaptive DM with P = 1.5 (Mozart_ADM_P1.5.wav, 44 kbps).]

Uniform Quantization vs. DM at Different Sampling Rates. [Figure: two waveform panels - 11 kHz sampling, 32-level uniform quantization, 55 kbps (Mozart_f_11_q32.wav); 44 kHz sampling, ADM, 44 kbps (Mozart_ADM_P1.5.wav).]

Adaptive Predictive Coding (ADPCM). The optimal linear predictor for a signal depends on the correlation between adjacent samples. The short-term correlation in a speech signal is time-varying, depending on the sound being produced. To reduce the prediction error, the predictor should therefore be adapted from frame to frame, where a frame is a group of speech samples, typically 20 ms long (160 samples at 8 kHz sampling rate). The quantizer for the prediction error is also adapted.

Forward vs. Backward Adaptation. Forward adaptation: before coding a frame of data, find the optimal predictor and quantizer for that frame; the resulting predictor and quantizer information must be transmitted along with the prediction error. Backward adaptation: the predictor and quantizer are adapted based on the past data frame, and the decoder performs the same adaptation as the encoder; the predictor and quantizer information does not need to be transmitted, so the side-information data rate is lower, but the prediction error is typically larger than with forward adaptation.
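For forward adaptation, the per-frame predictor can be obtained by solving the normal (Yule-Walker) equations on that frame. The numpy sketch below is a generic autocorrelation-method illustration, not the specific procedure used in any standard.

import numpy as np

def lpc_coeffs(frame, order):
    """Fit a_1..a_P of x_p(n) = sum_k a_k x(n-k) to one frame
    (e.g. 160 samples = 20 ms at 8 kHz) via the autocorrelation method."""
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])  # Toeplitz
    return np.linalg.solve(R, r[1:order + 1])

In a forward-adaptive coder the (quantized) coefficients are sent as side information for each frame; in backward adaptation the same fit would be computed from past reconstructed samples at both encoder and decoder, so nothing extra is transmitted.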

How to Further Reduce the Bit Rate. ADPCM cannot produce satisfactory quality when the bit rate is lower than 16 kbps. To further reduce the bit rate, the speech production model must be exploited -> model-based coding, or vocoder. Non-model-based methods are called waveform-based coding (waveform coders).

[Figure illustrating speech production (phonation), from http://www.phon.ox.ac.uk/~jcoleman/phonation.htm]

Speech Production Model. Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract. Speech production in humans can be crudely modeled as an excitation signal driving a linear system (modeled by an IIR filter). The filter coefficients depend on the shape of the vocal tract, which changes when producing different sounds as the positions of the tongue and jaw change. Vowels are produced when the excitation signal is a periodic pulse train (with different periods); consonants are generated when the excitation signal is noise-like.

The Speech Model. Vocal tract model = short-term predictor: x_p(n) = sum_{k=1..P} a_k x(n-k). Excitation model: in a vocoder, the excitation is a pitch pulse train or a noise sequence; in an analysis-by-synthesis coder, it is an optimal excitation sequence. Long-term predictor: e_p(n) = b_p x(n-M), where M is the pitch period.
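The source-filter model can be demonstrated with a few lines of code. The sketch below is illustrative only: the all-pole coefficients a = [1.3, -0.6] are made-up (hypothetical) values, and a real coder would obtain them from an LPC analysis of the speech frame (e.g. with lpc_coeffs above).

import numpy as np

def synthesize(a, excitation):
    """All-pole synthesis: x(n) = e(n) + sum_k a_k x(n-k) (the vocal-tract filter)."""
    a = np.asarray(a, dtype=float)
    x = np.zeros(len(excitation))
    for n, e in enumerate(excitation):
        past = [x[n - k] if n - k >= 0 else 0.0 for k in range(1, len(a) + 1)]
        x[n] = e + float(np.dot(a, past))
    return x

fs, pitch_hz = 8000, 100
voiced = np.zeros(fs // 4); voiced[::fs // pitch_hz] = 1.0   # periodic pulses -> vowel-like
unvoiced = 0.1 * np.random.randn(fs // 4)                    # noise excitation -> consonant-like
a = [1.3, -0.6]                                              # hypothetical, stable filter
vowel, fricative = synthesize(a, voiced), synthesize(a, unvoiced)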

Vocoder Principle. For every short frame: code the filter coefficients; code the excitation signal by sending an indicator for voiced/unvoiced and, for voiced frames, the pitch period. A vocoder produces unnatural but intelligible sound at very low bit rates (<= 2.4 kbps).

Vocoder. [Block diagram by S. Quackenbush: the input x(n) and prediction residual d(n) feed a pitch/noise detector that produces the excitation parameters, while the predictor analysis produces the prediction parameters; at the decoder, a pitch/noise synthesizer drives the predictor to resynthesize the speech.]

Hybrid Coder. To produce more natural sound, let the excitation signal be arbitrary, chosen so that the synthesized speech waveform matches the actual waveform as closely as possible. A hybrid coder codes the filter model plus the excitation signal as a waveform. A code-excited linear prediction (CELP) coder chooses the excitation signal from codewords in a predesigned codebook. This principle gives acceptable speech quality in the 4.8-16 kbps range and is used in various wireless cellular phone systems.
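The codebook search at the heart of CELP can be sketched as a brute-force analysis-by-synthesis loop. This toy Python example (my own illustration, with a random codebook and scipy's lfilter as the synthesis filter) only shows the principle; real CELP coders add adaptive codebooks, gain quantization, and perceptual weighting.

import numpy as np
from scipy.signal import lfilter

def celp_search(target, codebook, a):
    """Pick the excitation vector (and gain) whose synthesized output is closest
    to the target speech frame; only the index and the gain need to be transmitted."""
    den = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # 1 - sum_k a_k z^-k
    best = None
    for idx, excitation in enumerate(codebook):
        synth = lfilter([1.0], den, excitation)                 # all-pole synthesis
        gain = np.dot(target, synth) / max(np.dot(synth, synth), 1e-12)
        err = np.sum((target - gain * synth) ** 2)
        if best is None or err < best[0]:
            best = (err, idx, gain)
    return best[1], best[2]

# Hypothetical usage: 40-sample subframe, 64-entry random codebook.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 40))
index, gain = celp_search(rng.standard_normal(40), codebook, a=[1.3, -0.6])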

Hybrid Coder. [Block diagram by S. Quackenbush: an excitation vector drives the predictor to synthesize speech; the error between the synthesized output and x(n) is minimized over the candidate excitations, and the chosen excitation parameters and prediction parameters are transmitted.]

Speech Coding Standards. Toll-quality speech coders (digital wireline telephony): G.711 (A-law and mu-law PCM at 64 kbits/sec); G.721 (ADPCM at 32 kbits/sec); G.723 (ADPCM at 40 and 24 kbps); G.726 (ADPCM at 16, 24, 32, 40 kbps). Low bit rate speech coders (cellular phone / IP phone): G.728 low delay (16 kbps, delay < 2 ms, same or better quality than G.721); G.723.1 (CELP based, 5.3 and 6.4 kbits/sec); G.729 (CELP based, 8 kbps); GSM 6.10 (13 and 6.5 kbits/sec, simple to implement, used in GSM phones).

Speech Quality Versus Bit Rate for Common Classes of Codecs. From: J. Woodard, Speech coding overview, http://www-mobile.ecs.soton.ac.uk/speech_codecs

Demonstration of speech quality with different standards: PCM, mu-law, 64 kbps; GSM 6.10 vocoder, 13 kbps; FED-STD-1016 CELP at 4.8 kbps; FED-STD-1015 LPC-10 at 2.4 kbps. Speech files downloaded from http://people.qualcomm.com/karn/voicedemo/

What Should You Know. Huffman coding: given a probability table, be able to construct a Huffman code, apply it to a string of symbols, and decode from the bit stream; understand the general idea of variable-length coding. Predictive coding: understand the predictive coding process (the encoder and decoder block diagrams); understand how a simple predictive coder (including the delta modulator) works and be able to apply it to a sequence; understand the need for adapting both the predictor and the quantizer, and why ADPCM improves performance. Vocoder and hybrid coding: know the principle of model-based coding and understand the concepts of the vocoder and the hybrid speech coder. Standards for speech coding: know why we need different standards and the differences between them.

References. Yao Wang, EL514 Lab Manual, Exp. 3: Speech and audio compression, Sec. 2.2, 2.3 (copies provided). Z. N. Li and M. Drew, Fundamentals of Multimedia, Prentice Hall, 2004, Sec. 6.3 (Quantization and transmission of audio) and Chap. 13 (Basic audio compression techniques). A. Spanias, "Speech coding: a tutorial review," Proc. IEEE, vol. 82, pp. 1541-1582, 1994 (and several other articles in the same issue). Jason Woodard, Speech Coding Overview, http://www-mobile.ecs.soton.ac.uk/speech_codecs/. Lossless compression algorithms, http://www.cs.sfu.ca/coursecentral/365/d1/23-1/material/notes/chap4/chap4.1/chap4.1.html. Dominik Szopa, The Lossless Compression (Squeeze) Page, http://www.cs.sfu.ca/coursecentral/365/li/squeeze/index.html (has good demos/applets for Huffman coding and LZ coding).