CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Similar documents
Speech-Coding Techniques. Chapter 3

Digital Speech Coding

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

Principles of Audio Coding

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Data Compression. Audio compression

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

2.4 Audio Compression

Application of wavelet filtering to image compression

Mahdi Amiri. February Sharif University of Technology

ITNP80: Multimedia! Sound-II!

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

Chapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution

Audio Coding and MP3

Audio-coding standards

Speech and audio coding

GSM Network and Services

Audio-coding standards

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Mpeg 1 layer 3 (mp3) general overview

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio and video compression

The Effect of Bit-Errors on Compressed Speech, Music and Images

ELL 788 Computational Perception & Cognition July November 2015

Chapter 14 MPEG Audio Compression

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Ch. 5: Audio Compression Multimedia Systems

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Optical Storage Technology. MPEG Data Compression

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Appendix 4. Audio coding algorithms

Filterbanks and transforms

Parametric Coding of High-Quality Audio

Lecture 7: Audio Compression & Coding

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony

Extraction and Representation of Features, Spring Lecture 4: Speech and Audio: Basics and Resources. Zheng-Hua Tan

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

EE482: Digital Signal Processing Applications

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Audio Coding Standards

DRA AUDIO CODING STANDARD

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

Lecture 16 Perceptual Audio Coding

Implementation of a MPEG 1 Layer I Audio Decoder with Variable Bit Lengths

MATLAB Apps for Teaching Digital Speech Processing

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Design of a CELP Speech Coder and Study of Complexity vs Quality Trade-offs for Different Codebooks.

Multimedia Communications. Audio coding

Bluray (

Audio 1. Audio and Speech

MPEG-4 General Audio Coding

ijdsp Interactive Illustrations of Speech/Audio Processing Concepts

Module 8: Video Coding Basics Lecture 42: Sub-band coding, Second generation coding, 3D coding. The Lecture Contains: Performance Measures

Chapter 4: Audio Coding

White Paper Voice Quality Sound design is an art form at Snom and is at the core of our development utilising some of the world's most advance voice

Transporting audio-video. over the Internet

Synopsis of Basic VoIP Concepts

The MPEG-4 General Audio Coder

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Lecture Information Multimedia Video Coding & Architectures

Audio and Speech. anti-aliasing filter. amplifier. codec A D. G.7xx. 1mV A D. G.7xx. Digital sound. Digital audio. Audio coding

1 Introduction. 2 Speech Compression

I D I A P R E S E A R C H R E P O R T. October submitted for publication

7.5 Dictionary-based Coding

5: Music Compression. Music Coding. Mark Handley

ENHANCED MODIFIED BARK SPECTRAL DISTORTION (EMBSD): AN OBJECTIVE SPEECH QUALITY MEASURE BASED ON AUDIBLE DISTORTION AND COGNITION MODEL

MOHAMMAD ZAKI BIN NORANI THESIS SUBMITTED IN FULFILMENT OF THE DEGREE OF COMPUTER SCIENCE (COMPUTER SYSTEM AND NETWORKING)

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network

Multimedia Systems Speech I Mahdi Amiri September 2015 Sharif University of Technology

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

VALLIAMMAI ENGINEERING COLLEGE

MPEG-4 aacplus - Audio coding for today s digital media world

Perceptual Quality Measurement and Control: Definition, Application and Performance

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

The Sensitivity Matrix

Effect of MPEG Audio Compression on HMM-based Speech Synthesis

DuSLIC Infineons High Modem Performance Codec

An investigation of non-uniform bandwidths auditory filterbank in audio coding

AUDIOVISUAL COMMUNICATION

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Investigation of Algorithms for VoIP Signaling

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Lossy compression. CSCI 470: Web Science Keith Vertanen

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

CISC 7610 Lecture 3 Multimedia data and data formats

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Lecture 5: Compression I. This Week s Schedule

New Directions in Subband Coding

Transcription:

CT516 Advanced Digital Communications Lecture 7: Speech Encoder Yash M. Vasavada Associate Professor, DA-IICT, Gandhinagar 2nd February 2017 Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 1 / 8

Overview of Today s Talk 1 Introduction 2 Temporal Coding 3 Spectral Coding 4 Model Based Techniques 5 Performance of Vocoders Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 2 / 8

Overview of Today s Talk 1 Introduction 2 Temporal Coding 3 Spectral Coding 4 Model Based Techniques 5 Performance of Vocoders Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 2 / 8

Overview of Today s Talk 1 Introduction 2 Temporal Coding 3 Spectral Coding 4 Model Based Techniques 5 Performance of Vocoders Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 2 / 8

Overview of Today s Talk 1 Introduction 2 Temporal Coding 3 Spectral Coding 4 Model Based Techniques 5 Performance of Vocoders Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 2 / 8

Overview of Today s Talk 1 Introduction 2 Temporal Coding 3 Spectral Coding 4 Model Based Techniques 5 Performance of Vocoders Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 2 / 8

Speech Encoding All speech coding techniques require quantization. Many of them employ additional properties of speech [Temporal Waveform Coding:] attempts to represent time domain samples of speech [Spectral Waveform Coding:] attempts to represent spectral characteristics of speech [Model based Coding:] replicates a model of the process by which the Human auditory tract generates the speech Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 3 / 8

Speech Encoding All speech coding techniques require quantization. Many of them employ additional properties of speech [Temporal Waveform Coding:] attempts to represent time domain samples of speech [Spectral Waveform Coding:] attempts to represent spectral characteristics of speech [Model based Coding:] replicates a model of the process by which the Human auditory tract generates the speech Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 3 / 8

Speech Encoding All speech coding techniques require quantization. Many of them employ additional properties of speech [Temporal Waveform Coding:] attempts to represent time domain samples of speech [Spectral Waveform Coding:] attempts to represent spectral characteristics of speech [Model based Coding:] replicates a model of the process by which the Human auditory tract generates the speech Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 3 / 8

Temporal Waveform Coding 1 Pulse Code Modulation (PCM): directly quantizes the samples of speech Companding: In human speech, probability of low amplitudes is greater than that of the high amplitudes. Need to use non-uniform quantization. An alternate is nonlinear compander followed by the uniform quantizer Nonlinearity for µ-law companding: y = log (1 + µ x ) log (1 + µ) Example: 64 kbps PCM for telephone lines: Speech signal (with a bandwidth of 4 khz) is sampled at 8 ksps Encodes each sample with 8 bit uniform quantizer Uses companding with µ = 255 2 Differential PCM (DPCM): speech samples are strongly correlated from one sampling instant to the next. DPCM quantizes the difference between the current sample and the predicted value of this sample using the past samples. Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 4 / 8

Temporal Waveform Coding 1 Pulse Code Modulation (PCM): directly quantizes the samples of speech Companding: In human speech, probability of low amplitudes is greater than that of the high amplitudes. Need to use non-uniform quantization. An alternate is nonlinear compander followed by the uniform quantizer Nonlinearity for µ-law companding: y = log (1 + µ x ) log (1 + µ) Example: 64 kbps PCM for telephone lines: Speech signal (with a bandwidth of 4 khz) is sampled at 8 ksps Encodes each sample with 8 bit uniform quantizer Uses companding with µ = 255 2 Differential PCM (DPCM): speech samples are strongly correlated from one sampling instant to the next. DPCM quantizes the difference between the current sample and the predicted value of this sample using the past samples. Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 4 / 8

Temporal Waveform Coding 3 Adaptive PCDM Perceptual quality is determined by the relative accuracy of the quantization to the size of speech Adaptive quantizers vary the step size between the quantization levels depending on whether the speech is loud or soft Adaptive DPCM at 32 kbps is a common and easily implemented, and used in Digital European Cordless Telephone or DECT standard 4 Delta Modulation An extreme case of PCM in which signal is oversampled and R = 1 bit/sample, i.e., needs a very simple one bit quantizer Equivalent to just knowing the zero-crossing of the speech signal Adaptive Delta Modulation can produce reasonable quality speech at 16 kbps Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 5 / 8

Temporal Waveform Coding 3 Adaptive PCDM Perceptual quality is determined by the relative accuracy of the quantization to the size of speech Adaptive quantizers vary the step size between the quantization levels depending on whether the speech is loud or soft Adaptive DPCM at 32 kbps is a common and easily implemented, and used in Digital European Cordless Telephone or DECT standard 4 Delta Modulation An extreme case of PCM in which signal is oversampled and R = 1 bit/sample, i.e., needs a very simple one bit quantizer Equivalent to just knowing the zero-crossing of the speech signal Adaptive Delta Modulation can produce reasonable quality speech at 16 kbps Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 5 / 8

Spectral Waveform Coding 1 Subband Coding Human perception of speech quality depends on the frequency band. Higher MSE may be tolerated at very low and very high frequencies Subband coders filter the speech signal into multiple bands using Quadrature Mirror Filters (QMFs) Filtered signals are quantied using standard PCM, with a different value of R for different bands 2 Adaptive Transform Coding Correlated time domain signals are transformed into frequency domain samples using FFT or Discrete Cosine Transform These frequency domain signals are represented using their perceptual importance Often combined with VQ and Linear Prediction Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 6 / 8

Spectral Waveform Coding 1 Subband Coding Human perception of speech quality depends on the frequency band. Higher MSE may be tolerated at very low and very high frequencies Subband coders filter the speech signal into multiple bands using Quadrature Mirror Filters (QMFs) Filtered signals are quantied using standard PCM, with a different value of R for different bands 2 Adaptive Transform Coding Correlated time domain signals are transformed into frequency domain samples using FFT or Discrete Cosine Transform These frequency domain signals are represented using their perceptual importance Often combined with VQ and Linear Prediction Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 6 / 8

Linear Predictive Speech Coding Human speech is modeled as noise (air from the lungs) exciting a linear filter (throat, vocal cords and the mouth) Following are quantized and transmitted to the receiver: excitation sequence, filter coefficients, gain VSELP (Vector Sum Excited Linear Prediction) 20 ms frames with 159 bits per frame. Data rate of approx 8 kbps. Two stage VQ is used to quantize the excitation sequence Bits (such as those representing the filter gain) are more critical for perceptual quality. These are protected by error correction coding Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 7 / 8

Linear Predictive Speech Coding Human speech is modeled as noise (air from the lungs) exciting a linear filter (throat, vocal cords and the mouth) Following are quantized and transmitted to the receiver: excitation sequence, filter coefficients, gain VSELP (Vector Sum Excited Linear Prediction) 20 ms frames with 159 bits per frame. Data rate of approx 8 kbps. Two stage VQ is used to quantize the excitation sequence Bits (such as those representing the filter gain) are more critical for perceptual quality. These are protected by error correction coding Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 7 / 8

Relative Performance Table: Vocoder Comparison Type Rate Complexity Delay Quality (kbps) (MIPS) (ms) PCM 64 0.01 0 High ADPCM 32 0.1 0 High Subband 16 1 25 High VSELP 8 100 35 Fair How to measure the quality or the distortion of the speech encoder and quantizers? MSE is a possible measure, however, it does not relate to how good the encoded and quantized speech sounds to the human ear Alternative distortion measures: Perceptually Weighted MSE, Segmental SNR, Itakura-Saito, Log Spectral Distance, etc. No single measure has been found to accurately quantify the perceptual quality Subjective assessment of speech quality involves Mean Opinion Score (MOS) testing: subjects rate speech on a scale of 1 (unintelligible) to 5 (perfect). Toll quality telephone speech rates around 4.3. Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 8 / 8

Relative Performance Table: Vocoder Comparison Type Rate Complexity Delay Quality (kbps) (MIPS) (ms) PCM 64 0.01 0 High ADPCM 32 0.1 0 High Subband 16 1 25 High VSELP 8 100 35 Fair How to measure the quality or the distortion of the speech encoder and quantizers? MSE is a possible measure, however, it does not relate to how good the encoded and quantized speech sounds to the human ear Alternative distortion measures: Perceptually Weighted MSE, Segmental SNR, Itakura-Saito, Log Spectral Distance, etc. No single measure has been found to accurately quantify the perceptual quality Subjective assessment of speech quality involves Mean Opinion Score (MOS) testing: subjects rate speech on a scale of 1 (unintelligible) to 5 (perfect). Toll quality telephone speech rates around 4.3. Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 8 / 8

References N. S. Jayant, Coding Speech at Low Bit Rates, IEEE Spectrum, August 1986. N. S. Jayant, et al., Coding of Speech and Wideband Audio, AT&T Technical Journal, October 1990. Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February 2017 9 / 8