Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Course Presentation Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Homework: Speech Quantization. Signal chain: Original sound → Compander (companding parameter µ) → Uniform Quantizer (quantization bit number) → Dequantizer → Expander → µ-law encoded sound. Then: SNR calculation, plot and play. Deliverable: MATLAB code or GUI implementation. (Take a look at the Speech noise test MATLAB codes for a sample input signal and to find out more about how to plot and play the sounds.) Page 1
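The homework chain above can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB the slide asks for; the function names, the test tone, and the choices µ = 255 and 8 bits are assumptions made here, not values given on the slide.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Compander: map x in [-1, 1] onto [-1, 1] along the mu-law curve."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Expander: inverse of mu_law_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def uniform_quantize(x, bits):
    """Uniform quantizer on [-1, 1]; returns integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    codes = np.floor((x + 1.0) / 2.0 * levels).astype(int)
    return np.clip(codes, 0, levels - 1)

def uniform_dequantize(codes, bits):
    """Dequantizer: map each code to the centre of its interval."""
    levels = 2 ** bits
    return (codes + 0.5) / levels * 2.0 - 1.0

def snr_db(x, x_hat):
    """Signal-to-noise ratio in dB between a signal and its reconstruction."""
    noise = x - x_hat
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

def mu_law_codec(x, bits=8, mu=255.0):
    """Compander -> uniform quantizer -> dequantizer -> expander."""
    codes = uniform_quantize(mu_law_compress(x, mu), bits)
    return mu_law_expand(uniform_dequantize(codes, bits), mu)

# Hypothetical test input: a quiet 200 Hz tone sampled at 8 kHz.
t = np.arange(8000) / 8000.0
x = 0.05 * np.sin(2 * np.pi * 200.0 * t)

snr_uniform = snr_db(x, uniform_dequantize(uniform_quantize(x, 8), 8))
snr_mulaw = snr_db(x, mu_law_codec(x, bits=8))
print(f"uniform: {snr_uniform:.1f} dB, mu-law: {snr_mulaw:.1f} dB")
```

For a low-amplitude input like this one, companding should give a noticeably higher SNR than plain uniform quantization at the same bit depth, which is the effect the homework is meant to expose.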

Differential PCM (DPCM) Idea: take advantage of data redundancy. Samples: [ 110 112 111 112 112 114 115 115 114 114 ] Differences: [ +2 -1 +1 0 +2 +1 0 -1 0 ] Page 2
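The difference sequence on this slide can be reproduced directly. The sketch below uses the slide's ten sample values; the helper names are made up for illustration.

```python
samples = [110, 112, 111, 112, 112, 114, 115, 115, 114, 114]

def dpcm_differences(x):
    """Differences between consecutive samples: d[n] = x[n] - x[n-1]."""
    return [b - a for a, b in zip(x, x[1:])]

def dpcm_reconstruct(first, diffs):
    """Rebuild the samples from the first value and the differences."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

diffs = dpcm_differences(samples)
print(diffs)
print(dpcm_reconstruct(samples[0], diffs) == samples)
```

The samples lie between 110 and 115, while the differences lie between -1 and +2, so the differences can be coded with far fewer bits per value than the raw samples.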

Differential PCM (DPCM) Basic Scheme General Predictive Coding: predictor $\hat{x}[n] = \sum_{i} a_i\, x[n-i]$. Delta Modulation (DM): predictor $z^{-1}$ (one-sample delay). Problem? Page 3

Differential PCM (DPCM) Error Propagation General Predictive Coding: The output of the dequantizer in the decoder is not equal to the input of the quantizer in the encoder. As a result, the input of the predictor in the decoder is not the same as the input of the predictor in the encoder. This is the source of error propagation. Page 4

Differential PCM (DPCM) Better Structure Page 5
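The diagram for this slide is not part of the transcript. A common remedy, assumed here, is to let the encoder predict from the same reconstructed samples the decoder will have, so that quantization errors cannot accumulate. The sketch below compares that closed-loop structure with the open-loop scheme, using a previous-sample predictor and a hypothetical ramp input.

```python
import numpy as np

def quantize(e, step):
    """Uniform quantizer: round the residual to the nearest multiple of step."""
    return step * np.round(e / step)

def dpcm_open_loop(x, step):
    """Encoder predicts from ORIGINAL samples; the decoder only sees quantized residuals."""
    x = np.asarray(x, dtype=float)
    residual = np.diff(x, prepend=0.0)   # e[n] = x[n] - x[n-1]
    return np.cumsum(quantize(residual, step))  # decoder: y[n] = y[n-1] + eq[n]

def dpcm_closed_loop(x, step):
    """Encoder predicts from RECONSTRUCTED samples, exactly as the decoder does."""
    y = np.zeros(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        eq = quantize(xn - prev, step)   # quantized prediction error
        prev = prev + eq                 # reconstruction shared with the decoder
        y[n] = prev
    return y

# Hypothetical input: a slow ramp whose per-sample change is below half a step.
x = 0.3 * np.arange(100)
step = 1.0
open_err = np.max(np.abs(x - dpcm_open_loop(x, step)))
closed_err = np.max(np.abs(x - dpcm_closed_loop(x, step)))
print(open_err, closed_err)
```

In the open-loop version every residual rounds to zero and the reconstruction error keeps growing, while the closed-loop error stays within half a quantization step.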

Adaptive DPCM (ADPCM) Idea Problem? Page 6

Adaptive DPCM (ADPCM) Delta Modulation (DM): size of quantization step. Adaptive Delta Modulation (ADM): $\Delta[n] = M\,\Delta[n-1]$, where $M = P > 1$ if $c[n] = c[n-1]$ and $M = Q < 1$ if $c[n] \neq c[n-1]$; for example $P = 2$, $Q = 1/2$. Page 7
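The ADM step-size rule on this slide (multiply the step by P when two consecutive output bits agree, by Q when they differ, with P = 2 and Q = 1/2) can be sketched as follows. The initial step, the step-size limits, and the function names are assumptions added for illustration and do not come from the slide.

```python
import numpy as np

def _next_step(step, bit, prev_bit, P, Q, step_min, step_max):
    """Apply the ADM rule, then clamp the step (the clamp is an added safeguard)."""
    if prev_bit is not None:
        step *= P if bit == prev_bit else Q
    return min(max(step, step_min), step_max)

def adm_encode(x, step0=0.1, P=2.0, Q=0.5, step_min=1e-3, step_max=10.0):
    """One bit per sample: 1 if the input is at or above the running estimate."""
    bits = np.zeros(len(x), dtype=int)
    recon = np.zeros(len(x))
    step, prev_bit, y = step0, None, 0.0
    for n, xn in enumerate(x):
        bit = 1 if xn >= y else 0
        step = _next_step(step, bit, prev_bit, P, Q, step_min, step_max)
        y += step if bit else -step
        bits[n], recon[n], prev_bit = bit, y, bit
    return bits, recon

def adm_decode(bits, step0=0.1, P=2.0, Q=0.5, step_min=1e-3, step_max=10.0):
    """The decoder repeats the same step adaptation using only the bit stream."""
    recon = np.zeros(len(bits))
    step, prev_bit, y = step0, None, 0.0
    for n, bit in enumerate(bits):
        step = _next_step(step, bit, prev_bit, P, Q, step_min, step_max)
        y += step if bit else -step
        recon[n], prev_bit = y, bit
    return recon

# Hypothetical input: four periods of a sine wave.
x = np.sin(2 * np.pi * np.arange(200) / 50.0)
bits, recon = adm_encode(x)
decoded = adm_decode(bits)
```

Because the step size is derived from the bits alone, no side information has to be transmitted: the decoder tracks the encoder's step exactly.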

Speech Compression Concepts FFT, No Time Localization Speech Signal Joseph Fourier, 1768-1830 FFT (is only localized in frequency) Page 8

Speech Compression Concepts FFT, No Time Localization See Power Spectral Density (PSD) examples in MATLAB Page 9

Speech Compression Concepts STFT Speech Signal Dennis Gabor, 1900-1979 STFT (fixed time and frequency localization) Page 10

Speech Compression Concepts Spectrogram 3D surface spectrogram of an excerpt from a piece of music. Page 11

Speech Compression Concepts Spectrogram Spectrogram of a male voice saying "nineteenth century". Page 12

Speech Compression Concepts Spectrogram Display in Audacity: Waveform, Spectrogram Page 13

Speech Compression Concepts Spectrogram Display in Audacity. Setting: Audacity > Edit > Preferences > Spectrograms > FFT Window > Window size. Comparison: FFT window size 128 vs. FFT window size 1024. Page 14
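The effect of the FFT window size can be checked numerically. The sketch below is a bare-bones STFT written with NumPy only; it is not how Audacity computes its display, and the 8 kHz sampling rate, the Hann window, the 50% overlap, and the 1 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def stft_magnitude(x, win_size, hop=None):
    """Magnitude STFT: Hann-windowed frames, one real FFT per frame."""
    hop = hop or win_size // 2
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack([x[i * hop: i * hop + win_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, win_size // 2 + 1)

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of signal
x = np.sin(2 * np.pi * 1000.0 * t)         # hypothetical 1 kHz test tone

results = {}
for win_size in (128, 1024):
    S = stft_magnitude(x, win_size)
    results[win_size] = S
    freq_res_hz = fs / win_size            # spacing between frequency bins
    time_res_ms = 1000.0 * win_size / fs   # duration covered by one frame
    print(win_size, S.shape, freq_res_hz, time_res_ms)
```

At this sampling rate a 128-sample window spans 16 ms with 62.5 Hz between bins, while a 1024-sample window spans 128 ms with about 7.8 Hz between bins: a short window localizes events in time, a long one resolves frequency.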

Speech Compression Concepts Spectrogram, Demonstration Bat Echolocation Call Flute by Jean Pierre Rampal Singing Voice Face! Page 15

Speech Compression Concepts Formant: time-domain and frequency-domain representations of the vowels /a/, /i/, and /u/ Page 16

Speech Compression Concepts Sample Application: IBM Watson, a computing system to answer questions posed in natural language. www-943.ibm.com/innovation/us/watson/ Dr. David Ferrucci, Watson Principal Investigator. Pictured: Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson Page 17

Linear Predictive Coding (LPC) Modeling Page 18

Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) Buzzer, Filter. Chunks: 30 to 50 frames/sec. Speech = Formants + Residue. Predictor for each frame: $\hat{x}[n] = \sum_{i=1}^{P} a_i\, x[n-i]$ Page 19
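The per-frame predictor, where each sample is estimated as a weighted sum of the previous P samples, needs its coefficients estimated from the frame. The slide does not say how; one standard choice, assumed here, is the autocorrelation method. The test signal is a synthetic second-order autoregressive process with made-up coefficients, not speech.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method: solve the normal equations R a = r for a_1..a_P."""
    frame = np.asarray(frame, dtype=float)
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    return np.linalg.solve(R, r[1:])

def predict(frame, a):
    """x_hat[n] = sum_{i=1..P} a_i * x[n-i], treating samples before the frame as zero."""
    frame = np.asarray(frame, dtype=float)
    x_hat = np.zeros(len(frame))
    for i, ai in enumerate(a, start=1):
        x_hat[i:] += ai * frame[:-i]
    return x_hat

# Hypothetical signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + white noise.
rng = np.random.default_rng(0)
w = rng.standard_normal(4000)
x = np.zeros(4000)
for n in range(2, 4000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + w[n]

a = lpc_coefficients(x, order=2)
residual = x - predict(x, a)   # the "residue" left after prediction
print(a, np.sum(residual ** 2) / np.sum(x ** 2))
```

The estimated coefficients should land close to the ones used to generate the signal, and the residual carries much less energy than the signal itself, which is why transmitting coefficients plus a compact description of the residue is cheaper than transmitting samples.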

Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) The human vocal tract as an infinite impulse response (IIR) system Vowel /a/ LPC Block Diagram Page 20

Linear Predictive Coding (LPC) Original Paper, Atal-Hanauer 1971 Original Synthetic Comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker Page 21

Linear Predictive Coding (LPC) Voiced Frame Example Original Synthetic Time Domain Frequency Domain 180 samples, Pitch period: 75 Page 22

Linear Predictive Coding (LPC) Unvoiced Frame Example Original Synthetic: White noise with uniform distribution Time Domain Frequency Domain 180 samples Page 23

Code Excited Linear Prediction (CELP). Problem of LPC: frames where there is both hiss and buzz. Solution: encode the residue. Method: vector quantization (codebook). Diagrams: Encoder, Decoder Page 24

Vector Quantization Block Diagram Page 25

Vector Quantization Example. Sample scalar quantizer: we have 3 possible colors for each square, so we can quantize each square with 2 bits (28 * 2 = 56 bits for all 28 (7*4) squares). Sample vector quantizer: we have 8 forms in the codebook, so we can quantize each form with 3 bits (7 * 3 = 21 bits for all 28 (7*4) squares). Codebook Page 26
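The bit counts on this slide follow from rounding log2 of the number of choices up to a whole bit. The sketch below redoes that arithmetic and adds a nearest-codeword encoder. It assumes each codebook form covers 4 squares (inferred from 7 forms covering 28 squares), and the numeric codebook is a made-up placeholder, not the one pictured on the slide.

```python
import math
import numpy as np

def bits_per_symbol(n_choices):
    """Whole bits needed to index one of n_choices alternatives."""
    return math.ceil(math.log2(n_choices))

squares = 7 * 4
scalar_bits = squares * bits_per_symbol(3)   # 3 colors per square
forms = squares // 4                         # assumed: one form = 4 squares
vector_bits = forms * bits_per_symbol(8)     # 8 forms in the codebook
print(scalar_bits, vector_bits)

def vq_encode(blocks, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each block."""
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def vq_decode(indices, codebook):
    """Replace each index by its codeword."""
    return codebook[indices]

# Placeholder codebook: 8 codewords of dimension 4.
codebook = np.arange(32, dtype=float).reshape(8, 4)
true_idx = np.array([3, 0, 7, 7, 1, 5, 2])
blocks = codebook[true_idx] + 0.25           # slightly perturbed inputs
indices = vq_encode(blocks, codebook)
```

Only the 3-bit indices are transmitted; the decoder looks them up in its own copy of the codebook.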

Vector Quantization Codebook Design Page 27

Comparison of Speech Coders Sample Speech A lathe is a big tool. Grab every dish of sugar. Page 28

Comparison of Speech Coders Demonstration Original ADPCM LPC CELP Page 29

Speech Coding ITU-T Standards. G.711: PCM (µ-law, A-law), 64 kbps (80 and 96 kbps in the G.711.1 wideband extension). G.722: ADPCM, 48, 56 and 64 kbps. G.728: a form of CELP, 16 kbps. Check out a complete list at http://en.wikipedia.org/wiki/list_of_codecs#audio_codecs A comparison of Internet audio compression formats: http://www.sericyb.com.au/audio.html Vocoders Page 30
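The 64 kbps figure for G.711 is simply the telephone sampling rate times the bits per sample (8000 samples/s at 8 bits each). A short sketch of that arithmetic follows, with the 16-bit linear PCM case added here as a reference point that is not on the slide.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of an uncompressed mono PCM stream in kbps."""
    return sample_rate_hz * bits_per_sample / 1000.0

g711_kbps = pcm_bitrate_kbps(8000, 8)        # companded 8-bit PCM
linear16_kbps = pcm_bitrate_kbps(8000, 16)   # 16-bit linear PCM, for comparison
g728_kbps = 16.0                             # rate listed on the slide
print(g711_kbps, linear16_kbps, g711_kbps / g728_kbps)
```

On these numbers, G.728 carries telephone-band speech at a quarter of the G.711 rate.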

Speech Coding HawkVoice Free and Open Source Code http://hawksoft.com/hawkvoice/ Check out voice samples of HawkVoice codecs at http://hawksoft.com/hawkvoice/codecs.shtml Page 31

Multimedia Systems Speech II Thank You Next Session: Entropy Coding FIND OUT MORE AT... 1. http://ce.sharif.edu/~m_amiri/ 2. http://www.dml.ir/ Page 32