Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Homework: Speech Quantization
Processing chain: Original Sound > Compander (companding parameter µ) > Uniform Quantizer (quantization bit no.) > Dequantizer > Expander > µ-law encoded sound, followed by SNR calculation, then plot and play.
Deliverable: MATLAB code or GUI implementation. (Take a look at the Speech noise test MATLAB codes for a sample input signal and to find out more about how to plot and play the sounds.) Page 1
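
The chain on this slide can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB deliverable: the function names, the mid-rise quantizer, the default µ = 255, and the quiet 200 Hz test tone are all assumptions made for the example.

```python
import numpy as np

def mu_compress(x, mu=255.0):
    """Compander: mu-law compression of samples in [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_expand(y, mu=255.0):
    """Expander: inverse of mu_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def uniform_quantize(y, bits):
    """Mid-rise uniform quantizer on [-1, 1]; returns integer codes."""
    levels = 2 ** bits
    idx = np.floor((y + 1.0) / 2.0 * levels).astype(int)
    return np.clip(idx, 0, levels - 1)

def uniform_dequantize(idx, bits):
    """Map integer codes back to the centre of each quantization cell."""
    levels = 2 ** bits
    return (idx + 0.5) / levels * 2.0 - 1.0

def snr_db(x, x_hat):
    """Signal-to-noise ratio in dB between original and reconstruction."""
    noise = x - x_hat
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

def mu_law_codec(x, mu=255.0, bits=8):
    """Compander -> uniform quantizer -> dequantizer -> expander."""
    codes = uniform_quantize(mu_compress(x, mu), bits)
    return mu_expand(uniform_dequantize(codes, bits), mu)

# Assumed stand-in input: a quiet tone, where companding helps most.
t = np.arange(8000) / 8000.0
x = 0.05 * np.sin(2 * np.pi * 200 * t)
x_mu = mu_law_codec(x, mu=255.0, bits=8)
x_uniform = uniform_dequantize(uniform_quantize(x, 8), 8)
print("mu-law SNR (dB): ", snr_db(x, x_mu))
print("uniform SNR (dB):", snr_db(x, x_uniform))
```

Varying `mu` and `bits` and comparing the two SNR values is one way to explore the effect of the companding parameter and the bit count.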

Differential PCM (DPCM): Idea. Take advantage of data redundancy by coding the differences between successive samples: [ 110 112 111 112 112 114 115 115 114 114 ] becomes [ +2 -1 +1 0 +2 +1 0 -1 0 ]. Page 2
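
The slide's numbers can be reproduced directly. In this small sketch the helper names are illustrative, and the first sample is assumed to be sent as-is so the decoder has a starting point.

```python
samples = [110, 112, 111, 112, 112, 114, 115, 115, 114, 114]

def to_differences(values):
    """Difference between each sample and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

def from_differences(first, diffs):
    """Rebuild the samples by accumulating differences from the first sample."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

diffs = to_differences(samples)
restored = from_differences(samples[0], diffs)
print(diffs)
print(restored == samples)
```

The differences here only span -1 to +2, so they fit in far fewer bits than the raw samples.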

Differential PCM (DPCM): Basic Scheme. General predictive coding: the predictor is $\sum_i a_i x_{n-i}$. Delta Modulation (DM): the predictor is the one-sample delay $z^{-1}$. Problem? Page 3

Differential PCM (DPCM): Error Propagation. General predictive coding: the output of the dequantizer in the decoder is not equal to the input of the quantizer in the encoder. As a result, the input of the predictor in the decoder is not the same as the input of the predictor in the encoder. This is the source of error propagation. Page 4
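
A small numeric experiment makes the mismatch visible. This sketch assumes a previous-sample predictor, a rounding quantizer with step 4, and a ramp input; the closed-loop variant is the standard textbook fix, where the encoder predicts from its own reconstructed samples, and is not necessarily the exact structure drawn on the next slide.

```python
import math

def quantize(e, step):
    """Round a residual to the nearest multiple of step."""
    return step * math.floor(e / step + 0.5)

def dpcm_decode(first, residuals):
    """Decoder: accumulate quantized residuals starting from the first sample."""
    y = [first]
    for r in residuals:
        y.append(y[-1] + r)
    return y

def encode_open_loop(x, step):
    """Encoder predicts from the ORIGINAL previous sample (mismatched with decoder)."""
    return [quantize(x[n] - x[n - 1], step) for n in range(1, len(x))]

def encode_closed_loop(x, step):
    """Encoder predicts from the RECONSTRUCTED previous sample, as the decoder does."""
    residuals, y_prev = [], x[0]
    for n in range(1, len(x)):
        r = quantize(x[n] - y_prev, step)
        residuals.append(r)
        y_prev += r
    return residuals

def max_abs_error(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x = list(range(50))   # slow ramp: every difference is 1
step = 4
open_err = max_abs_error(x, dpcm_decode(x[0], encode_open_loop(x, step)))
closed_err = max_abs_error(x, dpcm_decode(x[0], encode_closed_loop(x, step)))
print("open-loop max error:  ", open_err)
print("closed-loop max error:", closed_err)
```

In the open-loop case every difference of 1 is quantized to 0, so the decoder never moves and the error grows with the ramp; in the closed-loop case the error stays within half a quantization step.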

Differential PCM (DPCM) Better Structure Page 5

Adaptive DPCM (ADPCM) Idea Problem? Page 6

Adaptive DPCM (ADPCM): Delta Modulation (DM), Size of Quantization Step. Adaptive Delta Modulation (ADM): $\Delta[n] = M\,\Delta[n-1]$, where $M = P > 1$ if $c[n] = c[n-1]$ and $M = Q < 1$ if $c[n] \neq c[n-1]$, with $P = 2$, $Q = 1/2$. Page 7
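
The step-size rule can be tried out with a short sketch. Several details below are assumptions not given on the slide: the output bit c[n] is taken as +1 or -1, the first step is used unchanged, the reconstruction starts at 0, and the step size is clamped to a range.

```python
def _next_delta(delta, c, prev, P, Q, delta_min, delta_max):
    """Grow the step (P > 1) when the bit repeats, shrink it (Q < 1) when it flips."""
    if prev is None:
        return delta
    m = P if c == prev else Q
    return min(max(m * delta, delta_min), delta_max)

def adm_encode(x, delta0=1.0, P=2.0, Q=0.5, delta_min=1e-3, delta_max=1e3):
    """Return the bit stream c[n] and the encoder's own reconstruction."""
    bits, recon = [], []
    y, delta, prev = 0.0, delta0, None
    for sample in x:
        c = 1 if sample >= y else -1
        delta = _next_delta(delta, c, prev, P, Q, delta_min, delta_max)
        y += c * delta
        bits.append(c)
        recon.append(y)
        prev = c
    return bits, recon

def adm_decode(bits, delta0=1.0, P=2.0, Q=0.5, delta_min=1e-3, delta_max=1e3):
    """Decoder: repeat the same step adaptation using only the bits."""
    recon = []
    y, delta, prev = 0.0, delta0, None
    for c in bits:
        delta = _next_delta(delta, c, prev, P, Q, delta_min, delta_max)
        y += c * delta
        recon.append(y)
        prev = c
    return recon

bits, recon = adm_encode([10.0] * 6)
print(bits)
print(recon)
```

Because the decoder adapts the step from the bits alone, no step-size side information has to be transmitted.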

Speech Compression Concepts: FFT, No Time Localization. Speech signal and its FFT (Joseph Fourier, 1768-1830): the FFT is only localized in frequency. Page 8
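
One way to see the lack of time localization is to swap the order of two tones. In this sketch the sample rate, tone frequencies, and lengths are arbitrary choices; the two signals sound different, yet their magnitude spectra match.

```python
import numpy as np

fs = 8000                      # assumed sample rate in Hz
half = 2048                    # samples per tone
t = np.arange(half) / fs
low = np.sin(2 * np.pi * 500 * t)
high = np.sin(2 * np.pi * 2000 * t)

low_then_high = np.concatenate([low, high])
high_then_low = np.concatenate([high, low])

# Magnitude spectra of the whole signals.
mag_a = np.abs(np.fft.rfft(low_then_high))
mag_b = np.abs(np.fft.rfft(high_then_low))

print("same waveform:          ", np.allclose(low_then_high, high_then_low))
print("same magnitude spectrum:", np.allclose(mag_a, mag_b, atol=1e-6))
```

The second signal is a circular shift of the first, which changes only the phase of the transform, so the magnitude spectrum cannot tell which tone came first.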

Speech Compression Concepts FFT, No Time Localization See Power Spectral Density (PSD) examples in MATLAB Page 9

Speech Compression Concepts: STFT. Speech signal and its STFT (Dennis Gabor, 1900-1979): the STFT has fixed time and frequency localization. Page 10
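
A minimal STFT can be written with NumPy alone. In this sketch the Hann window, the window size of 256, the hop of 128, and the two-tone test signal are all illustrative assumptions.

```python
import numpy as np

def stft_magnitude(x, window_size, hop):
    """Magnitude STFT: one row per windowed frame, one column per frequency bin."""
    window = np.hanning(window_size)
    starts = range(0, len(x) - window_size + 1, hop)
    frames = np.array([x[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(4000) / fs
x = np.concatenate([np.sin(2 * np.pi * 500 * t),    # first half: 500 Hz
                    np.sin(2 * np.pi * 2000 * t)])  # second half: 2000 Hz

spec = stft_magnitude(x, window_size=256, hop=128)
freqs = np.fft.rfftfreq(256, d=1.0 / fs)
peak_per_frame = freqs[np.argmax(spec, axis=1)]
print(peak_per_frame[0], peak_per_frame[-1])
```

Each row of `spec` is the spectrum of one short frame, so the dominant frequency can be followed over time, which a single FFT of the whole signal cannot do.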

Speech Compression Concepts Spectrogram 3D surface spectrogram of a part from a music piece. Page 11

Speech Compression Concepts: Spectrogram. Spectrogram of a male voice saying "nineteenth century". Page 12

Speech Compression Concepts: Spectrogram Display in Audacity. Waveform; Spectrogram. Page 13

Speech Compression Concepts: Spectrogram Display in Audacity. Audacity > Edit > Preferences > Spectrograms > FFT Window > Window size. Examples: FFT window size 128; FFT window size 1024. Page 14
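
The two window sizes trade time resolution against frequency resolution. The arithmetic below is a sketch that assumes a 44100 Hz sample rate, which the slide does not state.

```python
def stft_resolution(window_size, sample_rate):
    """Return (frequency spacing of FFT bins in Hz, window duration in seconds)."""
    return sample_rate / window_size, window_size / sample_rate

sample_rate = 44100  # assumed; substitute the rate of the actual recording
freq_128, time_128 = stft_resolution(128, sample_rate)
freq_1024, time_1024 = stft_resolution(1024, sample_rate)

print(f"128-point window:  {freq_128:.1f} Hz per bin, {time_128 * 1000:.1f} ms per window")
print(f"1024-point window: {freq_1024:.1f} Hz per bin, {time_1024 * 1000:.1f} ms per window")
```

A short window localizes events in time but smears them in frequency; a long window does the opposite.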

Speech Compression Concepts Spectrogram, Demonstration Bat Echolocation Call Flute by Jean Pierre Rampal Singing Voice Face! Page 15

Speech Compression Concepts Formant The time and frequency domain presentation of vowels /a/, /i/, and /u/ /a/ /i/ /u/ Page 16

Speech Compression Concepts A computing system to answer questions posed in natural language Sample Application www-943.ibm.com/innovation/us/watson/ Dr. David Ferrucci, Watson Principal Investigator Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson Page 17

Linear Predictive Coding (LPC) Modeling Page 18

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz). Buzzer, Filter. Chunks: 30 to 50 frames/sec. Speech = Formants + Residue. Predictor for each frame: $\hat{x}[n] = \sum_{i=1}^{P} a_i\, x[n-i]$ Page 19
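
The predictor coefficients for a frame can be estimated in several ways. The sketch below uses the autocorrelation method, which is one common choice and not necessarily the one used in the lecture, and it tests the result on a synthetic second-order signal whose true coefficients (1.3 and -0.6) are made up for the example.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Solve the autocorrelation normal equations R a = r for a_1..a_P."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[lags], r[1:order + 1])

def predict(frame, a):
    """Prediction sum_i a_i * x[n-i]; the first len(a) samples are left at 0."""
    order = len(a)
    pred = np.zeros(len(frame))
    for n in range(order, len(frame)):
        pred[n] = np.dot(a, frame[n - order:n][::-1])
    return pred

# Synthetic test signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise (assumed values).
rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
signal = np.zeros(20000)
for n in range(2, 20000):
    signal[n] = 1.3 * signal[n - 1] - 0.6 * signal[n - 2] + noise[n]

a = lpc_coefficients(signal, order=2)
residue = signal - predict(signal, a)
print("estimated coefficients:", a)
print("residue / signal energy:", np.sum(residue ** 2) / np.sum(signal ** 2))
```

The residue carries much less energy than the signal, which is what makes "formants plus residue" an efficient description.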

Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) The human vocal tract as an infinite impulse response (IIR) system Vowel /a/ LPC Block Diagram Page 20

Linear Predictive Coding (LPC) Original Paper, Atal-Hanauer 1971 Original Synthetic Comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker Page 21

Linear Predictive Coding (LPC) Voiced Frame Example Original Synthetic Time Domain Frequency Domain 180 samples, Pitch period: 75 Page 22

Linear Predictive Coding (LPC) Unvoiced Frame Example Original Synthetic: White noise with uniform distribution Time Domain Frequency Domain 180 samples Page 23
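
The two frame examples differ only in the excitation fed to the filter. The sketch below takes the frame length of 180 samples, the pitch period of 75, and the uniform white noise from the slides; the filter coefficients and the function names are hypothetical.

```python
import numpy as np

def all_pole_filter(excitation, a, gain=1.0):
    """IIR synthesis: y[n] = gain * e[n] + sum_i a_i * y[n-i]."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * out[n - i]
        out[n] = acc
    return out

def voiced_excitation(length, pitch_period):
    """Buzz: an impulse train with one pulse per pitch period."""
    e = np.zeros(length)
    e[::pitch_period] = 1.0
    return e

def unvoiced_excitation(length, rng):
    """Hiss: white noise with a uniform distribution."""
    return rng.uniform(-1.0, 1.0, size=length)

a = [1.3, -0.6]                      # hypothetical predictor coefficients
rng = np.random.default_rng(0)
voiced = all_pole_filter(voiced_excitation(180, 75), a)
unvoiced = all_pole_filter(unvoiced_excitation(180, rng), a)
print(np.nonzero(voiced_excitation(180, 75))[0])
```

A decoder needs only the coefficients, the gain, the voiced/unvoiced flag, and the pitch period for each frame, which is why LPC reaches very low bit rates.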

Code Excited Linear Prediction (CELP). Problem of LPC: frames where there is both hiss and buzz. Solution: encode the residue. Method: vector quantization (codebook). Encoder, Decoder. Page 24

Vector Quantization Block Diagram Page 25

Vector Quantization: Example. Sample scalar quantizer: we have 3 possible colors for each square, so we can quantize each square with 2 bits (28 * 2 = 56 bits for all 28 (7*4) squares). Sample vector quantizer: we have 8 forms in the codebook, so we can quantize each form with 3 bits (7 * 3 = 21 bits for all 28 (7*4) squares). Codebook. Page 26
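
The bit counts follow from rounding up log2 of the number of alternatives. The sketch below reproduces them and adds a generic nearest-codeword search; the two-dimensional toy codebook is invented for illustration and is not the codebook on the slide.

```python
import math

def bits_per_symbol(num_symbols):
    """Smallest whole number of bits that can index num_symbols alternatives."""
    return math.ceil(math.log2(num_symbols))

squares = 7 * 4
scalar_bits = squares * bits_per_symbol(3)   # one code per square, 3 colors
forms = 7
vector_bits = forms * bits_per_symbol(8)     # one code per form, 8 codebook entries
print(scalar_bits, vector_bits)

def nearest_codeword(vector, codebook):
    """Index of the codebook entry with the smallest squared distance to vector."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])))

toy_codebook = [(0, 0), (1, 0), (0, 1), (1, 1)]   # hypothetical 2-bit codebook
index = nearest_codeword((0.9, 0.1), toy_codebook)
print(index, toy_codebook[index])
```

Only the index is transmitted; the decoder looks the vector up in its own copy of the codebook.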

Vector Quantization Codebook Design Page 27

Comparison of Speech Coders Sample Speech A lathe is a big tool. Grab every dish of sugar. Page 28

Comparison of Speech Coders Demonstration Original ADPCM LPC CELP Page 29

Speech Coding: ITU-T Standards
G.711: PCM (µ-law, A-law); 64, 80 and 96 kbps
G.722: ADPCM; 48, 56 and 64 kbps
G.728: a form of CELP; 16 kbps
Check out a complete list at http://en.wikipedia.org/wiki/list_of_codecs#audio_codecs
A comparison of Internet audio compression formats: http://www.sericyb.com.au/audio.html
Vocoders Page 30

Speech Coding HawkVoice Free and Open Source Code http://hawksoft.com/hawkvoice/ Check out voice samples of HawkVoice codecs at http://hawksoft.com/hawkvoice/codecs.shtml Page 31

Multimedia Systems Speech II Thank You Next Session: Entropy Coding FIND OUT MORE AT... 1. http://ce.sharif.edu/~m_amiri/ 2. http://www.dml.ir/ Page 32