Course Presentation Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 25 Sharif University of Technology
Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code Modulation (DPCM) Adaptive DPCM (ADPCM) Based on Frequency Domain analysis Linear Predictive Coding (LPC) Code Excited Linear Prediction (CELP) Page
Differential PCM (DPCM) Idea Take advantage of data redundancy [ 2 2 2 4 5 5 4 4 ] Or histogram of PCM samples in a chunk of digitized audio. Typ. Chunk length: 5 ms [ +2 - + +2 + - ] Page 2
Differential PCM (DPCM) Prediction Simplest prediction: The difference between the current sample value and previous sample is quantized. It means we predicted current sample by assuming it to be equal to the previous sample. Another better prediction: The difference between the current sample value and the average of e.g. 2 previous samples. Much better prediction: The difference between the current sample value and the weighted average of e.g. previous samples. Page 3
Differential PCM (DPCM) Basic Scheme General Predictive Coding Delta Modulation (DM): ai xn i Problem? z Page 4
Differential PCM (DPCM) Error Propagation The output of dequantizer in decoder is not equal with the input of the quantizer in the encoder The input of predictor in decoder is not the same as input values of predictor in encoder This is the source of error propagation. General Predictive Coding Error propagation example in above structure when input signal increases constantly by +.6. Quantization step size is ; Therefore, e.g. -.5 thr..5 quantized as and.5 thr..5 quantized as. x_n:.6.2.8 2.4 3. 3.6 4.2 p_n:.6.2.8 2.4 3. 3.6 d_n:.6.6.6.6.6.6.6 Q : ^Q : ^dn: ^pn: 2 3 4 5 6 ^xn: 2 3 4 5 6 7 Err:.4.8.2.6 2. 2.4 2.8 Page 5
Differential PCM (DPCM) Better Structure Page 6
Adaptive DPCM (ADPCM) Idea Problem? Page 7
Adaptive DPCM (ADPCM) Delta Modulation (DM) bit quantizer: means + and means Size of Quantization Step Adaptive Delta Modulation (ADM) ADM: [ n] M [ n ] M P if c[ n] c[ n ] M Q if c[ n] c[ n ] P 2, Q 2 Page 8
Speech Compression Concepts FFT, No Time Localization Speech Signal Joseph Fourier, 768-83 FFT (is only localized in frequency) Page 9
Speech Compression Concepts FFT, No Time Localization, Ex. See Power Spectral Density (PSD) examples in MATLAB Page
Speech Compression Concepts x, freq.: 4 Hz -.5..5 x, freq.: 9 Hz -.5..5 x, freq.: 6 Hz -.5..5 Zoomed Parts Part of x4 Page 5 5 PSD x 2 3 4 5 6 7 8 Frequency (Hz) PSD x2 5 5 2 3 4 5 6 7 8 Frequency (Hz) PSD x3 5 5 2 3 4 5 6 7 8 x4 = x+x2+x3 x4 = x+x2+x3 Frequency (Hz) 5 FFT, No Time Localization, Ex. 2.5 -.5 x4 = x+x2+x3 x4 = x+x2+x3 - -.2.2.4.4.6.6.8.8.5 x5 = x then x5 x2 = x then then x3x2 then x3 PSD x4 5 PSD x4 x4 = x+x2+x3 PSD x4 4.5 5 4.5 -.5 -.5 3 3 2 4 3 2 3 2 -.5 2 -.5 - -2 - -.5.2.4.63 4.8 6 2 3 4 5 6 7 8-3.5..2.4.6.8 2 5 7 2 8 3 4 5 6 7 8 Frequency (Hz) x5 = x then x2 then x3 PSD x5 Frequency (Hz) 5 x5 = x then x2 then x3 5 PSD x5 PSD x5 x5 = x then x2 then x3 5 4.5 4.5 3 4.5 Part of - -.2.2.4.4.6.6.8.8 3.5 x6 = Frequency x3 then x6 (Hz) x = x3 then then x2x then x2 2 3 -.5 2 2 -. x5.2.3.4.6.7.8.9 -.5 2 3 4 5 6 7 8 -.5 Frequency (Hz) x6 = x3 then x then x2 - PSD x6 5 -.325.33.335.34 2 3 4 5 6 7 8.325.33.335.34 2 3 4 5 6 7 8 4.5 -.5 -.5 Frequency (Hz) x6 = x3 then x then x2 Frequency (Hz) PSD x6 3 x6 = x3 then x then x2 PSD x6 5 2 5 Part of - 4-.2.2.4.4.6.6.8.8 -.5.5 4.5-3..2.3.4.5.6.7.8.9 2 3 4 5 6 7 8 3 Frequency (Hz) 2 x6 2 -.5 -.5 - -.325.33.335.34.345 2 3 4 5 6 7 8.325.33.335.34.345 2 3 4 5 6 7 8.5 -.5.5.5 X4 = (x + x2 + x3) / 3 X5 = x then x2 then x3 X6 = x3 then x then x2 Frequency (Hz) Frequency (Hz) 5 4 3 2 PSD x4 2 3 24 35 46 57 68 7 Frequency Frequency (Hz) (Hz) PSD x5 PSD x5 5 5 4 3 2 2 3 24 35 46 57 68 7 Frequency Frequency (Hz) (Hz) PSD x6 PSD x6 5 5 4 3 2 2 3 24 35 46 57 68 7 Frequency Frequency (Hz) (Hz) See Power Spectral Density (PSD) examples in MATLAB 5 4 3 2 4 3 2 4 3 2 PSD x4
Speech Compression Concepts STFT Speech Signal Dennis Gabor, 9-979 STFT (fixed time and frequency localization) Page 2
Speech Compression Concepts Spectrogram 3D surface spectrogram of a part from a music piece. Page 3
Speech Compression Concepts Spectrogram Spectrogram of a male voice saying nineteenth century. Page 4
Speech Compression Concepts Spectrogram Display in AudaCity Waveform Spectrogram Page 5
Speech Compression Concepts Spectrogram Display in AudaCity AudaCity Edit Preferences Spectrograms FFT Window Window size FFT Window size:28 FFT Window size:24 Page 6
Speech Compression Concepts Spectrogram, Demonstration Bat Echolocation Call Flute by Jean Pierre Rampal Singing Voice Face! Page 7
Speech Compression Concepts Formant The time and frequency domain presentation of vowels /a/, /i/, and /u/ /a/ /i/ /u/ Page 8
Speech Compression Concepts A computing system to answer questions posed in natural language. Sample Application Design a reliable speech recognition unit for IBM Watson project. www-943.ibm.com/innovation/us/watson/ Dr. David Ferrucci, Watson Principal Investigator Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson Page 9
Linear Predictive Coding (LPC) Modeling Simplified Page 2
Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) Buzzer Filter Chuncks: 3 thr. 5 frames/sec. Speech = Formants + Residue Predictor for each frame: P x[ n] a x[ n i] i i Page 2
Linear Predictive Coding (LPC) Modeling (Hiss or Buzz) The human vocal tract as an infinite impulse response (IIR) system Vowel /a/ LPC Block Diagram Page 22
Linear Predictive Coding (LPC) Original Paper, Atal-Hanauer 97 Original Synthetic Comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker Page 23
Linear Predictive Coding (LPC) Voiced Frame Example Original Synthetic Time Domain Frequency Domain 8 samples, Pitch period: 75 Page 24
Linear Predictive Coding (LPC) Unvoiced Frame Example Original Synthetic: White noise with uniform distribution Time Domain Frequency Domain 8 samples Page 25
Code Excited Linear Prediction CELP Problem of LPC Where there is both Hiss and Buzz Solution Encode residue Encoder Method Vector Quantization (Codebook) Decoder Page 26
Vector Quantization Block Diagram Page 27
Vector Quantization Example Sample scalar quantizer We have 3 possible colors for each square; so we can quantize each square with 2 bits (28 * 2 = 56 bits for all 28 (7*4) squares. Sample vector quantizer We have 8 forms in the codebook; so we can quantize each form with 3 bits (7 * 3 = 2 bits for all 28 (7*4) squares. Codebook Page 28
Vector Quantization Codebook Design Page 29
Comparison of Speech Coders Sample Speech A lathe is a big tool. Grab every dish of sugar. Page 3
Comparison of Speech Coders Demonstration Original ADPCM LPC CELP Page 3
Speech Coding G.7 PCM u-law, a-law 64, 8 and 96 kbps G.722 ADPCM 48, 56 and 64 kbps G.728 A form of CELP 6 kbps ITU-T Standards Check out a complete list at http://en.wikipedia.org/wiki/list_of_codecs#audio_codecs A comparison of Internet audio compression formats http://www.sericyb.com.au/audio.html Vocoders Page 32
Speech Coding HawkVoice Free and Open Source Code http://hawksoft.com/hawkvoice/ Check out voice samples of HawkVoice codecs at http://hawksoft.com/hawkvoice/codecs.shtml Page 33
Multimedia Systems Speech II Thank You Next Session: Entropy Coding FIND OUT MORE AT.... http://sharif.edu/~rabiee/ 2. http://www.dml.ir/ Page 34