Mahdi Amiri. February Sharif University of Technology


Course Presentation Multimedia Systems Speech II Mahdi Amiri February 2014 Sharif University of Technology

Speech Compression: Road Map
Based on time-domain analysis:
- Differential Pulse-Code Modulation (DPCM)
- Adaptive DPCM (ADPCM)
Based on frequency-domain analysis:
- Linear Predictive Coding (LPC)
- Code Excited Linear Prediction (CELP)

Differential PCM (DPCM): Idea
Take advantage of data redundancy.
PCM samples: [ 110 112 111 112 112 114 115 115 114 114 ]
Differences between consecutive samples: [ +2 -1 +1 0 +2 +1 0 -1 0 ]
Figure: histogram of PCM samples in a chunk of digitized audio (typical chunk length: 50 ms).

Differential PCM (DPCM): Prediction
Simplest prediction: quantize the difference between the current sample and the previous sample. This predicts the current sample by assuming it equals the previous one.
A better prediction: quantize the difference between the current sample and the average of, e.g., the 2 previous samples.
A much better prediction: quantize the difference between the current sample and a weighted average of, e.g., the 10 previous samples.
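The predictors above can be sketched as one function that takes a list of weights. This is a minimal illustration, not the course's code: the function name `prediction_residual` and the choice to skip the first samples (where there is not enough history) are assumptions. The input is the sample run from the "Idea" slide.

```python
def prediction_residual(samples, weights):
    """Return samples[n] minus the weighted sum of the previous len(weights) samples.

    weights[0] multiplies the most recent sample. Samples without enough
    history (n < len(weights)) are skipped.
    """
    order = len(weights)
    residual = []
    for n in range(order, len(samples)):
        predicted = sum(w * samples[n - 1 - i] for i, w in enumerate(weights))
        residual.append(samples[n] - predicted)
    return residual


pcm = [110, 112, 111, 112, 112, 114, 115, 115, 114, 114]

# Simplest predictor: the previous sample.
previous_sample = prediction_residual(pcm, [1.0])
# Better predictor: the average of the two previous samples.
two_sample_average = prediction_residual(pcm, [0.5, 0.5])

print(previous_sample)
print(two_sample_average)
```

The residuals span a much smaller range than the raw samples, which is what lets a DPCM coder spend fewer bits per sample.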

Differential PCM (DPCM): Basic Scheme
General predictive coding: the predictor output is $\sum_i a_i x_{n-i}$.
Delta Modulation (DM): the predictor is a one-sample delay, $z^{-1}$.
Problem?

Differential PCM (DPCM): Error Propagation
The output of the dequantizer in the decoder is not equal to the input of the quantizer in the encoder. As a result, the predictor input in the decoder is not the same as the predictor input in the encoder. This is the source of error propagation.
Example (general predictive coding structure): the input signal increases constantly by +0.6 and the quantization step size is 1, so, e.g., values from -0.5 through 0.5 are quantized as 0 and values from 0.5 through 1.5 are quantized as 1.
Encoder:
x_n  : 0   0.6 1.2 1.8 2.4 3.0 3.6 4.2
p_n  : 0   0   0.6 1.2 1.8 2.4 3.0 3.6
d_n  : 0   0.6 0.6 0.6 0.6 0.6 0.6 0.6
Q    : 0   1   1   1   1   1   1   1
Decoder:
^Q   : 0   1   1   1   1   1   1   1
^d_n : 0   1   1   1   1   1   1   1
^p_n : 0   0   1   2   3   4   5   6
^x_n : 0   1   2   3   4   5   6   7
Err  : 0   0.4 0.8 1.2 1.6 2.0 2.4 2.8
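The error-propagation example can be reproduced with a few lines of code. This is a sketch under stated assumptions: the names `quantize` and `open_loop_dpcm` are illustrative, the quantizer rounds to the nearest multiple of the step, and the encoder predicts from the original previous sample while the decoder predicts from its own reconstruction.

```python
import math


def quantize(value, step=1.0):
    """Round to the nearest multiple of step (halves round up)."""
    return step * math.floor(value / step + 0.5)


def open_loop_dpcm(samples, step=1.0):
    """Encode with a predictor fed by original samples, decode from quantized differences."""
    decoded = []
    prev_input = 0.0    # encoder predictor state: previous original sample
    prev_decoded = 0.0  # decoder predictor state: previous reconstructed sample
    for x in samples:
        d_hat = quantize(x - prev_input, step)  # quantized prediction error
        x_hat = prev_decoded + d_hat            # decoder reconstruction
        decoded.append(x_hat)
        prev_input, prev_decoded = x, x_hat
    return decoded


ramp = [0.6 * n for n in range(8)]
decoded = open_loop_dpcm(ramp)
errors = [round(x_hat - x, 1) for x_hat, x in zip(decoded, ramp)]
print(decoded)
print(errors)
```

Each quantized difference overshoots by 0.4, and because the decoder accumulates those differences the error grows by 0.4 per sample.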

Differential PCM (DPCM): Better Structure
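The block diagram for this slide is not part of the transcript. Assuming it shows the usual closed-loop arrangement, in which the encoder predicts from the reconstructed samples the decoder will also have, the behaviour can be sketched as below. The names `quantize` and `closed_loop_dpcm` are illustrative, and the input is a ramp that rises by 0.6 per sample with a quantization step of 1.

```python
import math


def quantize(value, step=1.0):
    """Round to the nearest multiple of step (halves round up)."""
    return step * math.floor(value / step + 0.5)


def closed_loop_dpcm(samples, step=1.0):
    """Predict from the previous reconstructed sample, so encoder and decoder stay in sync."""
    decoded = []
    prev_decoded = 0.0  # shared predictor state: previous reconstructed sample
    for x in samples:
        d_hat = quantize(x - prev_decoded, step)  # quantized prediction error
        x_hat = prev_decoded + d_hat              # reconstruction, identical in encoder and decoder
        decoded.append(x_hat)
        prev_decoded = x_hat
    return decoded


ramp = [0.6 * n for n in range(8)]
decoded = closed_loop_dpcm(ramp)
max_error = max(abs(x_hat - x) for x_hat, x in zip(decoded, ramp))
print(decoded)
print(max_error)
```

Because the quantization error of one sample is folded into the next prediction error, the reconstruction error stays within half a quantization step instead of accumulating.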

Adaptive DPCM (ADPCM): Idea
Problem?

Adaptive DPCM (ADPCM)
Delta Modulation (DM): 1-bit quantizer, where 0 means $+\Delta$ and 1 means $-\Delta$. $\Delta$ is the size of the quantization step.
Adaptive Delta Modulation (ADM):
$\Delta[n] = M \, \Delta[n-1]$
$M = P > 1$ if $c[n] = c[n-1]$
$M = Q < 1$ if $c[n] \neq c[n-1]$
$P = 2$, $Q = 1/2$
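A minimal sketch of the ADM rule follows, with P = 2 and Q = 1/2 as on the slide. The function names, the initial step size `step0`, and the starting estimate of 0 are assumptions made for illustration. The decoder repeats the same step adaptation from the bit stream alone.

```python
def adm_encode(samples, step0=1.0, P=2.0, Q=0.5):
    """Return (bits, reconstruction). Bit 0 means +step, bit 1 means -step."""
    bits, recon = [], []
    estimate, step, prev_bit = 0.0, step0, None
    for x in samples:
        bit = 0 if x >= estimate else 1
        if prev_bit is not None:
            # Same bit as before: grow the step. Different bit: shrink it.
            step *= P if bit == prev_bit else Q
        estimate += step if bit == 0 else -step
        bits.append(bit)
        recon.append(estimate)
        prev_bit = bit
    return bits, recon


def adm_decode(bits, step0=1.0, P=2.0, Q=0.5):
    """Rebuild the signal from the bits using the same adaptation rule."""
    recon = []
    estimate, step, prev_bit = 0.0, step0, None
    for bit in bits:
        if prev_bit is not None:
            step *= P if bit == prev_bit else Q
        estimate += step if bit == 0 else -step
        recon.append(estimate)
        prev_bit = bit
    return recon


bits, encoder_recon = adm_encode([10.0] * 8)
decoder_recon = adm_decode(bits)
print(bits)
print(decoder_recon)
```

On a sudden jump the step doubles until the estimate catches up, then shrinks once the bits start alternating.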

Speech Compression Concepts: FFT, No Time Localization
The FFT (Joseph Fourier, 1768-1830) is localized only in frequency.
Figure: speech signal and its FFT.

Speech Compression Concepts: FFT, No Time Localization, Ex. 1
See Power Spectral Density (PSD) examples in MATLAB.

Speech Compression Concepts: FFT, No Time Localization, Ex. 2
Figure (Ex. 1): three tones at 400 Hz, 900 Hz and 1600 Hz (x1, x2, x3) with their PSDs (PSD x1, PSD x2, PSD x3). Axes: time (s) and frequency (Hz, 0 to 8000).
Figure (Ex. 2): x4 = (x1 + x2 + x3) / 3; x5 = x1 then x2 then x3; x6 = x3 then x1 then x2. Panels show each signal over 0 to 1 s, zoomed parts of x4, x5 and x6, and PSD x4, PSD x5, PSD x6 against frequency (Hz, 0 to 8000).
See Power Spectral Density (PSD) examples in MATLAB.
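The point of x5 and x6 can be checked numerically: reordering the same three tones changes the signal but not the magnitude of its spectrum. The sketch below uses only the Python standard library and evaluates the DFT at the three tone frequencies. The 16 kHz sampling rate (suggested by the 8000 Hz axis limit), the 0.1 s segment length, and all function names are assumptions.

```python
import cmath
import math

FS = 16000   # assumed sampling rate in Hz
SEG = 1600   # samples per tone segment (0.1 s)


def tone(freq, n_samples=SEG, fs=FS):
    """Unit-amplitude sine at freq Hz."""
    return [math.sin(2 * math.pi * freq * n / fs) for n in range(n_samples)]


def dft_magnitude(signal, freq, fs=FS):
    """Magnitude of the DFT of the whole signal at a single frequency."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, s in enumerate(signal))
    return abs(acc)


x1, x2, x3 = tone(400), tone(900), tone(1600)
x5 = x1 + x2 + x3   # list concatenation: x1 then x2 then x3
x6 = x3 + x1 + x2   # x3 then x1 then x2

mags5 = [dft_magnitude(x5, f) for f in (400, 900, 1600)]
mags6 = [dft_magnitude(x6, f) for f in (400, 900, 1600)]
print(mags5)
print(mags6)
```

The two lists agree to rounding error even though x5 and x6 differ sample by sample, so the spectrum alone cannot tell when each tone occurred.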

Speech Compression Concepts: STFT
The STFT (Dennis Gabor, 1900-1979) has fixed time and frequency localization.
Figure: speech signal and its STFT.
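A short-time analysis recovers the ordering that a single FFT over the whole signal hides. The sketch below is a deliberately reduced STFT, assuming a rectangular window, non-overlapping 50 ms frames, a 16 kHz sampling rate, and evaluation at only three candidate frequencies. All names are illustrative.

```python
import cmath
import math

FS = 16000                      # assumed sampling rate in Hz
FRAME = 800                     # frame length in samples (50 ms)
CANDIDATES = (400, 900, 1600)   # frequencies examined in each frame


def tone(freq, n_samples, fs=FS):
    """Unit-amplitude sine at freq Hz."""
    return [math.sin(2 * math.pi * freq * n / fs) for n in range(n_samples)]


def frame_magnitude(frame, freq, fs=FS):
    """DFT magnitude of one frame at a single frequency."""
    return abs(sum(s * cmath.exp(-2j * math.pi * freq * n / fs)
                   for n, s in enumerate(frame)))


def dominant_per_frame(signal, frame_len=FRAME):
    """For each non-overlapping frame, return the strongest candidate frequency."""
    result = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        result.append(max(CANDIDATES, key=lambda f: frame_magnitude(frame, f)))
    return result


x1, x2, x3 = tone(400, 1600), tone(900, 1600), tone(1600, 1600)
x5 = x1 + x2 + x3   # x1 then x2 then x3
x6 = x3 + x1 + x2   # x3 then x1 then x2

print(dominant_per_frame(x5))
print(dominant_per_frame(x6))
```

The frame length is the fixed trade-off the slide refers to: shorter frames locate events more precisely in time but resolve frequency more coarsely.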

Speech Compression Concepts: Spectrogram
Figure: 3D surface spectrogram of part of a music piece.

Speech Compression Concepts: Spectrogram
Figure: spectrogram of a male voice saying "nineteenth century".

Speech Compression Concepts: Spectrogram Display in Audacity
Figure: waveform view and spectrogram view.

Speech Compression Concepts: Spectrogram Display in Audacity
Audacity > Edit > Preferences > Spectrograms > FFT Window > Window size
Figure: spectrograms with FFT window size 128 and FFT window size 1024.

Speech Compression Concepts: Spectrogram, Demonstration
Bat echolocation call
Flute by Jean Pierre Rampal
Singing voice
Face!

Speech Compression Concepts: Formant
Figure: time-domain and frequency-domain presentation of the vowels /a/, /i/, and /u/.

Speech Compression Concepts: Sample Application
Design a reliable speech recognition unit for the IBM Watson project, a computing system that answers questions posed in natural language.
www-943.ibm.com/innovation/us/watson/
Figures: Dr. David Ferrucci, Watson Principal Investigator; Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson.

Linear Predictive Coding (LPC): Modeling, Simplified

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
Model: buzzer followed by a filter.
Chunks: 30 to 50 frames/sec.
Speech = Formants + Residue
Predictor for each frame:
$\tilde{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i]$
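The predictor and the "Speech = Formants + Residue" split can be sketched as a pair of filters: an analysis step that subtracts the prediction to get the residue, and a synthesis step that adds it back. This is an illustration only. The function names are hypothetical, the coefficients are chosen by hand for a pure sine (which obeys x[n] = 2cos(w) x[n-1] - x[n-2]) rather than estimated from speech, and the 8 kHz sampling rate is an assumption.

```python
import math


def lpc_residual(signal, coeffs):
    """Residue e[n] = x[n] - sum_i a_i * x[n-i], with zero history before n = 0."""
    residual = []
    for n, x in enumerate(signal):
        predicted = sum(a * signal[n - i]
                        for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        residual.append(x - predicted)
    return residual


def lpc_synthesize(residual, coeffs):
    """Inverse step: x[n] = e[n] + sum_i a_i * x[n-i] (a recursive, IIR filter)."""
    signal = []
    for n, e in enumerate(residual):
        predicted = sum(a * signal[n - i]
                        for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        signal.append(e + predicted)
    return signal


omega = 2 * math.pi * 400 / 8000                 # 400 Hz tone, assumed 8 kHz sampling
frame = [math.sin(omega * n) for n in range(180)]
coeffs = [2 * math.cos(omega), -1.0]             # exact 2nd-order predictor for a sine

residual = lpc_residual(frame, coeffs)
rebuilt = lpc_synthesize(residual, coeffs)
print(max(abs(e) for e in residual[2:]))
```

When the predictor fits the frame, almost nothing is left in the residue after the first few samples. A real coder estimates the a_i for each frame, which this sketch does not attempt.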

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
The human vocal tract as an infinite impulse response (IIR) system.
Figures: vowel /a/; LPC block diagram.

Linear Predictive Coding (LPC): Original Paper, Atal-Hanauer 1971
Figure (original vs. synthetic): comparison of wide-band sound spectrograms for synthetic and original speech signals for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker.

Linear Predictive Coding (LPC): Voiced Frame Example
Figure: original vs. synthetic, in the time domain and the frequency domain. 180 samples, pitch period: 75.

Linear Predictive Coding (LPC): Unvoiced Frame Example
Figure: original vs. synthetic (white noise with uniform distribution), in the time domain and the frequency domain. 180 samples.

Code Excited Linear Prediction (CELP)
Problem of LPC: frames where there is both hiss and buzz.
Solution: encode the residue.
Method: vector quantization (codebook).
Figures: encoder; decoder.
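The slide only says that the residue is encoded with a codebook. A common way CELP coders choose the entry is analysis-by-synthesis: pass every codeword through the LPC synthesis filter and keep the one whose output, after scaling by its best gain, is closest to the target frame. The sketch below assumes that approach. The tiny codebook, the single filter coefficient, and all names are made up for illustration.

```python
def synthesize(excitation, coeffs):
    """LPC synthesis filter: y[n] = e[n] + sum_i a_i * y[n-i], zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(a * out[n - i]
                       for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        out.append(e + feedback)
    return out


def search_codebook(target, codebook, coeffs):
    """Return (index, gain, squared_error) of the best-matching codeword."""
    best = None
    for index, codeword in enumerate(codebook):
        shaped = synthesize(codeword, coeffs)
        energy = sum(s * s for s in shaped)
        if energy == 0.0:
            continue  # an all-zero codeword cannot be scaled to match anything
        # Least-squares optimal gain for this codeword.
        gain = sum(t * s for t, s in zip(target, shaped)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, shaped))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


coeffs = [0.9]                        # hypothetical one-tap predictor
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
]
# Build a target from codeword 2 at gain 0.5, then check that the search finds it.
target = [0.5 * s for s in synthesize(codebook[2], coeffs)]
index, gain, error = search_codebook(target, codebook, coeffs)
print(index, gain, error)
```

The encoder then transmits only the codebook index, the gain, and the filter parameters, and the decoder runs the same synthesis filter on the selected codeword.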

Vector Quantization: Block Diagram

Vector Quantization: Example
Sample scalar quantizer: there are 3 possible colors for each square, so each square can be quantized with 2 bits (28 * 2 = 56 bits for all 28 (7*4) squares).
Sample vector quantizer: there are 8 forms in the codebook, so each form can be quantized with 3 bits (7 * 3 = 21 bits for all 28 (7*4) squares).
Figure: codebook.
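The bit counts in this example, and the nearest-codeword lookup behind a vector quantizer, can be sketched as follows. The slide's figure is not in the transcript, so the codebook below (8 forms of 4 squares each, colors coded 0 to 2) is invented for illustration, as are the function names.

```python
import math


def bits_per_symbol(n_choices):
    """Bits needed to index one of n_choices alternatives."""
    return math.ceil(math.log2(n_choices))


def vq_encode(vectors, codebook):
    """Map each vector to the index of the nearest codeword (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
            for v in vectors]


squares, colors, forms, squares_per_form = 28, 3, 8, 4

scalar_bits = squares * bits_per_symbol(colors)
vector_bits = (squares // squares_per_form) * bits_per_symbol(forms)
print(scalar_bits, vector_bits)

# Hypothetical codebook: 8 forms, each a group of 4 squares with colors 0, 1 or 2.
codebook = [
    (0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (0, 1, 0, 1),
    (1, 0, 1, 0), (0, 2, 0, 2), (2, 0, 2, 0), (1, 2, 1, 2),
]
blocks = [(0, 1, 0, 1), (2, 2, 2, 1), (1, 2, 1, 2)]
print(vq_encode(blocks, codebook))
```

The second block is not in the codebook and is mapped to its nearest form. That approximation is the price of the lower bit count.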

Vector Quantization: Codebook Design

Comparison of Speech Coders: Sample Speech
"A lathe is a big tool."
"Grab every dish of sugar."

Comparison of Speech Coders: Demonstration
Original, ADPCM, LPC, CELP

Speech Coding: ITU-T Standards
G.711: PCM (u-law, A-law); 64, 80 and 96 kbps
G.722: ADPCM; 48, 56 and 64 kbps
G.728: a form of CELP; 16 kbps
Check out a complete list at http://en.wikipedia.org/wiki/list_of_codecs#audio_codecs
A comparison of Internet audio compression formats: http://www.sericyb.com.au/audio.html
Vocoders

Speech Coding: HawkVoice
Free and open source code: http://hawksoft.com/hawkvoice/
Check out voice samples of HawkVoice codecs at http://hawksoft.com/hawkvoice/codecs.shtml

Multimedia Systems: Speech II
Thank You
Next Session: Entropy Coding
Find out more at:
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/