Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

Similar documents
Mahdi Amiri. February Sharif University of Technology

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

MATLAB Apps for Teaching Digital Speech Processing

Speech-Coding Techniques. Chapter 3

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Digital Speech Coding

Principles of Audio Coding

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

Data Compression. Audio compression

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

2.4 Audio Compression

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Speech and audio coding

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Lecture 7: Audio Compression & Coding

Multimedia Systems Speech I Mahdi Amiri September 2015 Sharif University of Technology

Extraction and Representation of Features, Spring Lecture 4: Speech and Audio: Basics and Resources. Zheng-Hua Tan

Mahdi Amiri. February Sharif University of Technology

Audio Coding and MP3

RESIDUAL-EXCITED LINEAR PREDICTIVE (RELP) VOCODER SYSTEM WITH TMS320C6711 DSK AND VOWEL CHARACTERIZATION

GSM Network and Services

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Audio and video compression

The Effect of Bit-Errors on Compressed Speech, Music and Images

Ch. 5: Audio Compression Multimedia Systems

Transporting audio-video. over the Internet

Application of wavelet filtering to image compression

Optical Storage Technology. MPEG Data Compression

CS 335 Graphics and Multimedia. Image Compression

Synopsis of Basic VoIP Concepts

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

JPEG: An Image Compression System

Parametric Coding of High-Quality Audio

MATLAB Functionality for Digital Speech Processing. Graphical User Interface. GUI LITE 2.5 Design Process. GUI25 Initial Screen

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

Course Syllabus. Website Multimedia Systems, Overview

AN EFFICIENT TRANSCODING SCHEME FOR G.729 AND G SPEECH CODECS: INTEROPERABILITY OVER THE INTERNET. Received July 2010; revised October 2011

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

ITNP80: Multimedia! Sound-II!

MPEG-4 General Audio Coding

VALLIAMMAI ENGINEERING COLLEGE

JPEG: An Image Compression System. Nimrod Peleg update: Nov. 2003

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

Topics in Linguistic Theory: Laboratory Phonology Spring 2007

VoIP Forgery Detection

Bluray (

5: Music Compression. Music Coding. Mark Handley

The Steganography In Inactive Frames Of Voip

Source Coding Techniques

Improved Tamil Text to Speech Synthesis

REAL-TIME DIGITAL SIGNAL PROCESSING

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Assignment 1: Speech Production and Models EN2300 Speech Signal Processing

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Text-Independent Speaker Identification

Wireless Communication

6MPEG-4 audio coding tools

Design of a CELP Speech Coder and Study of Complexity vs Quality Trade-offs for Different Codebooks.

On Improving the Performance of an ACELP Speech Coder

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

The Sensitivity Matrix

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Fundamentals of Multimedia

Audio-coding standards

Preface. I Introduction and Multimedia Data Representations 1

Implementation of G.729E Speech Coding Algorithm based on TMS320VC5416 YANG Xiaojin 1, a, PAN Jinjin 2,b

Lost VOIP Packet Recovery in Active Networks

Chapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution

Audio-coding standards

Evaluating MMX Technology Using DSP and Multimedia Applications

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Hardware for Speech and Audio Coding

Embedded lossless audio coding using linear prediction and cascade coding

The MPEG-4 General Audio Coder

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Center for Multimedia Signal Processing

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network

ABSTRACT AUTOMATIC SPEECH CODEC IDENTIFICATION WITH APPLICATIONS TO TAMPERING DETECTION OF SPEECH RECORDINGS

DigiPoints Volume 1. Student Workbook. Module 8 Digital Compression

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Missing Frame Recovery Method for G Based on Neural Networks

7.5 Dictionary-based Coding

EE482: Digital Signal Processing Applications

SPEAKER RECOGNITION. 1. Speech Signal

The BroadVoice Speech Coding Algorithm. Juin-Hwey (Raymond) Chen, Ph.D. Senior Technical Director Broadcom Corporation March 22, 2010

Effect of MPEG Audio Compression on HMM-based Speech Synthesis

ASIC Implementation and FPGA Validation of IMA ADPCM Encoder and Decoder Cores using Verilog HDL

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Implementation of a MPEG 1 Layer I Audio Decoder with Variable Bit Lengths

Chapter 14 MPEG Audio Compression

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 3: Audio

1 Introduction. 2 Speech Compression

Overview: motion-compensated coding

Transcription:

Course Presentation Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 25 Sharif University of Technology

Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code Modulation (DPCM) Adaptive DPCM (ADPCM) Based on Frequency Domain analysis Linear Predictive Coding (LPC) Code Excited Linear Prediction (CELP) Page

Differential PCM (DPCM) Idea Take advantage of data redundancy [ 2 2 2 4 5 5 4 4 ] Or histogram of PCM samples in a chunk of digitized audio. Typ. Chunk length: 5 ms [ +2 - + +2 + - ] Page 2

Differential PCM (DPCM) Prediction Simplest prediction: The difference between the current sample value and previous sample is quantized. It means we predicted current sample by assuming it to be equal to the previous sample. Another better prediction: The difference between the current sample value and the average of e.g. 2 previous samples. Much better prediction: The difference between the current sample value and the weighted average of e.g. previous samples. Page 3

Differential PCM (DPCM) Basic Scheme General Predictive Coding Delta Modulation (DM): ai xn i Problem? z Page 4

Differential PCM (DPCM) Error Propagation The output of dequantizer in decoder is not equal with the input of the quantizer in the encoder The input of predictor in decoder is not the same as input values of predictor in encoder This is the source of error propagation. General Predictive Coding Error propagation example in above structure when input signal increases constantly by +.6. Quantization step size is ; Therefore, e.g. -.5 thr..5 quantized as and.5 thr..5 quantized as. x_n:.6.2.8 2.4 3. 3.6 4.2 p_n:.6.2.8 2.4 3. 3.6 d_n:.6.6.6.6.6.6.6 Q : ^Q : ^dn: ^pn: 2 3 4 5 6 ^xn: 2 3 4 5 6 7 Err:.4.8.2.6 2. 2.4 2.8 Page 5

Differential PCM (DPCM) Better Structure Page 6

Adaptive DPCM (ADPCM) Idea Problem? Page 7

Adaptive DPCM (ADPCM) Delta Modulation (DM) bit quantizer: means + and means Size of Quantization Step Adaptive Delta Modulation (ADM) ADM: [ n] M [ n ] M P if c[ n] c[ n ] M Q if c[ n] c[ n ] P 2, Q 2 Page 8

Speech Compression Concepts FFT, No Time Localization Speech Signal Joseph Fourier, 768-83 FFT (is only localized in frequency) Page 9

Speech Compression Concepts FFT, No Time Localization, Ex. See Power Spectral Density (PSD) examples in MATLAB Page

Speech Compression Concepts x, freq.: 4 Hz -.5..5 x, freq.: 9 Hz -.5..5 x, freq.: 6 Hz -.5..5 Zoomed Parts Part of x4 Page 5 5 PSD x 2 3 4 5 6 7 8 Frequency (Hz) PSD x2 5 5 2 3 4 5 6 7 8 Frequency (Hz) PSD x3 5 5 2 3 4 5 6 7 8 x4 = x+x2+x3 x4 = x+x2+x3 Frequency (Hz) 5 FFT, No Time Localization, Ex. 2.5 -.5 x4 = x+x2+x3 x4 = x+x2+x3 - -.2.2.4.4.6.6.8.8.5 x5 = x then x5 x2 = x then then x3x2 then x3 PSD x4 5 PSD x4 x4 = x+x2+x3 PSD x4 4.5 5 4.5 -.5 -.5 3 3 2 4 3 2 3 2 -.5 2 -.5 - -2 - -.5.2.4.63 4.8 6 2 3 4 5 6 7 8-3.5..2.4.6.8 2 5 7 2 8 3 4 5 6 7 8 Frequency (Hz) x5 = x then x2 then x3 PSD x5 Frequency (Hz) 5 x5 = x then x2 then x3 5 PSD x5 PSD x5 x5 = x then x2 then x3 5 4.5 4.5 3 4.5 Part of - -.2.2.4.4.6.6.8.8 3.5 x6 = Frequency x3 then x6 (Hz) x = x3 then then x2x then x2 2 3 -.5 2 2 -. x5.2.3.4.6.7.8.9 -.5 2 3 4 5 6 7 8 -.5 Frequency (Hz) x6 = x3 then x then x2 - PSD x6 5 -.325.33.335.34 2 3 4 5 6 7 8.325.33.335.34 2 3 4 5 6 7 8 4.5 -.5 -.5 Frequency (Hz) x6 = x3 then x then x2 Frequency (Hz) PSD x6 3 x6 = x3 then x then x2 PSD x6 5 2 5 Part of - 4-.2.2.4.4.6.6.8.8 -.5.5 4.5-3..2.3.4.5.6.7.8.9 2 3 4 5 6 7 8 3 Frequency (Hz) 2 x6 2 -.5 -.5 - -.325.33.335.34.345 2 3 4 5 6 7 8.325.33.335.34.345 2 3 4 5 6 7 8.5 -.5.5.5 X4 = (x + x2 + x3) / 3 X5 = x then x2 then x3 X6 = x3 then x then x2 Frequency (Hz) Frequency (Hz) 5 4 3 2 PSD x4 2 3 24 35 46 57 68 7 Frequency Frequency (Hz) (Hz) PSD x5 PSD x5 5 5 4 3 2 2 3 24 35 46 57 68 7 Frequency Frequency (Hz) (Hz) PSD x6 PSD x6 5 5 4 3 2 2 3 24 35 46 57 68 7 Frequency Frequency (Hz) (Hz) See Power Spectral Density (PSD) examples in MATLAB 5 4 3 2 4 3 2 4 3 2 PSD x4

Speech Compression Concepts STFT Speech Signal Dennis Gabor, 9-979 STFT (fixed time and frequency localization) Page 2

Speech Compression Concepts Spectrogram 3D surface spectrogram of a part from a music piece. Page 3

Speech Compression Concepts Spectrogram Spectrogram of a male voice saying nineteenth century. Page 4

Speech Compression Concepts Spectrogram Display in AudaCity Waveform Spectrogram Page 5

Speech Compression Concepts Spectrogram Display in AudaCity AudaCity Edit Preferences Spectrograms FFT Window Window size FFT Window size:28 FFT Window size:24 Page 6

Speech Compression Concepts Spectrogram, Demonstration Bat Echolocation Call Flute by Jean Pierre Rampal Singing Voice Face! Page 7

Speech Compression Concepts Formant The time and frequency domain presentation of vowels /a/, /i/, and /u/ /a/ /i/ /u/ Page 8

Speech Compression Concepts A computing system to answer questions posed in natural language. Sample Application Design a reliable speech recognition unit for IBM Watson project. www-943.ibm.com/innovation/us/watson/ Dr. David Ferrucci, Watson Principal Investigator Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson Page 9

Linear Predictive Coding (LPC) Modeling Simplified Page 2

Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) Buzzer Filter Chuncks: 3 thr. 5 frames/sec. Speech = Formants + Residue Predictor for each frame: P x[ n] a x[ n i] i i Page 2

Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) The human vocal tract as an infinite impulse response (IIR) system Vowel /a/ LPC Block Diagram Page 22

Linear Predictive Coding (LPC) Original Paper, Atal-Hanauer 97 Original Synthetic Comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker Page 23

Linear Predictive Coding (LPC) Voiced Frame Example Original Synthetic Time Domain Frequency Domain 8 samples, Pitch period: 75 Page 24

Linear Predictive Coding (LPC) Unvoiced Frame Example Original Synthetic: White noise with uniform distribution Time Domain Frequency Domain 8 samples Page 25

Code Excited Linear Prediction CELP Problem of LPC Where there is both Hiss and Buzz Solution Encode residue Encoder Method Vector Quantization (Codebook) Decoder Page 26

Vector Quantization Block Diagram Page 27

Vector Quantization Example Sample scalar quantizer We have 3 possible colors for each square; so we can quantize each square with 2 bits (28 * 2 = 56 bits for all 28 (7*4) squares. Sample vector quantizer We have 8 forms in the codebook; so we can quantize each form with 3 bits (7 * 3 = 2 bits for all 28 (7*4) squares. Codebook Page 28

Vector Quantization Codebook Design Page 29

Comparison of Speech Coders Sample Speech A lathe is a big tool. Grab every dish of sugar. Page 3

Comparison of Speech Coders Demonstration Original ADPCM LPC CELP Page 3

Speech Coding G.7 PCM u-law, a-law 64, 8 and 96 kbps G.722 ADPCM 48, 56 and 64 kbps G.728 A form of CELP 6 kbps ITU-T Standards Check out a complete list at http://en.wikipedia.org/wiki/list_of_codecs#audio_codecs A comparison of Internet audio compression formats http://www.sericyb.com.au/audio.html Vocoders Page 32

Speech Coding HawkVoice Free and Open Source Code http://hawksoft.com/hawkvoice/ Check out voice samples of HawkVoice codecs at http://hawksoft.com/hawkvoice/codecs.shtml Page 33

Multimedia Systems Speech II Thank You Next Session: Entropy Coding FIND OUT MORE AT.... http://sharif.edu/~rabiee/ 2. http://www.dml.ir/ Page 34