Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011


Outline
- Audio Fundamentals: sampling, digitization, quantization, μ-law and A-law
- Sound Compression: PCM, DPCM, ADPCM, CELP, LPC
- Audio Compression: MPEG audio codec, perceptual coding, MPEG-1 Layer 1, Layer 2, Layer 3 (MP3)

Sound, Sound Waves, Acoustics
Sound is a continuous wave that travels through a medium. A sound wave carries energy as a disturbance in the medium, made up of pressure differences (we measure the pressure level at a location). Acoustics is the study of sound: the generation, transmission, and reception of sound waves.
Example: striking a drum. The head of the drum vibrates and disturbs the air molecules close to it, creating regions where the pressure is above and below equilibrium. The sound is transmitted by molecules bumping into each other.

The Characteristics of Sound Waves
Frequency: the rate at which the sound wave oscillates, measured in cycles per second, or Hertz (Hz). Frequency determines the pitch of the sound as heard by our ears: the higher the frequency, the clearer and sharper the sound, and the higher the pitch.
Amplitude: the sound's intensity, or loudness. The louder the sound, the larger the amplitude.
All sounds have a duration, and a pattern of successive musical sounds is called rhythm.

The Characteristics of Sound Waves (cont.)
[Figure: a sample sound wave, annotated with its amplitude, the time for one cycle (which determines pitch), and the cycle distance along the wave.]

Audio Digitization
To get audio or video into a computer, we must digitize it. Audio signals are continuous in both time and amplitude, so an audio signal must be digitized in both dimensions to be represented in binary form.
[Diagram: voice → sinusoid wave → Analog-to-Digital Converter (ADC) → 10010010000100]

Human Perception
Speech is a complex waveform. Vowels (a, i, u, e, o) and bass sounds are low frequencies; consonants (s, sh, k, t, ...) are high frequencies. Humans are most sensitive to low frequencies; the most important region is 2 kHz to 4 kHz. Hearing also depends on the room and environment, and sounds can be masked by overlapping sounds (temporal and frequency masking).

Audio Digitization (cont.)
Step 1. Sampling: divide the horizontal axis (the time dimension) into discrete pieces. Uniform sampling (samples taken at equal intervals) is used almost universally.

Nyquist Theorem
Suppose we are sampling a waveform. How often do we need to sample it to capture its frequency content? The Nyquist Theorem, also known as the sampling theorem, is a principle that engineers follow when digitizing analog signals. For analog-to-digital conversion (ADC) to result in a faithful reproduction of the signal, slices of the analog waveform, called samples, must be taken frequently. The number of samples per second is called the sampling rate, or sampling frequency.

Nyquist Theorem (cont.)
Nyquist rate: it can be proven that a bandwidth-limited signal can be fully reconstructed from its samples if the sampling rate is at least twice the highest frequency in the signal. If the highest frequency component of a given analog signal is f_max (in hertz), then according to the Nyquist Theorem the sampling rate must be at least 2·f_max, twice the highest analog frequency component.
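
As a numerical illustration of what goes wrong below the Nyquist rate (not from the slides): a 3 kHz sine sampled at only 4 kHz, below its Nyquist rate of 6 kHz, produces exactly the same samples as a phase-inverted 1 kHz sine, so after sampling the two tones cannot be told apart.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples):
    """Return n_samples of a unit-amplitude sine at freq_hz, sampled at fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 4000                                        # below the 6 kHz Nyquist rate for a 3 kHz tone
original = sample_sine(3000, fs, 16)             # the 3 kHz tone we meant to capture
alias = [-x for x in sample_sine(1000, fs, 16)]  # a 1 kHz tone, phase-inverted

# The sample sequences coincide: 3 kHz has aliased onto 1 kHz.
assert all(abs(a - b) < 1e-9 for a, b in zip(original, alias))
```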

Quantization
Step 2. Quantization: convert the actual amplitude sample values (usually voltage measurements) into an integer approximation. There is a tradeoff between the number of bits required and the error introduced; human perceptual limits and the specific application determine the allowable error. Once samples have been captured (discrete in time, by Nyquist-rate sampling), they must be made discrete in amplitude as well.
Two approaches to quantization:
- Round the sample to the closest integer (e.g., round 3.14 to 3).
- Build a quantizer table that generates a staircase pattern of values based on a step size.
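
The two quantization approaches above can be sketched in a few lines of Python (an illustration only):

```python
def quantize_round(sample):
    """Round the sample to the closest integer (e.g. 3.14 -> 3)."""
    return round(sample)

def quantize_staircase(sample, step):
    """Map the sample onto a staircase of levels spaced `step` apart."""
    return step * round(sample / step)

print(quantize_round(3.14))           # rounds to the closest integer
print(quantize_staircase(3.14, 0.5))  # snaps to the closest multiple of the step
```

Either way, the reconstructed value is within half a step of the input, which is exactly the quantization error discussed next.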

Reconstruction
The Analog-to-Digital Converter (ADC) produces the sampled and quantized binary code. The Digital-to-Analog Converter (DAC) converts the quantized binary code back into an approximation of the analog signal (reversing the quantization process) by clocking the code out at the same sample rate used for the ADC conversion.

Quantization Error
Quantization is only an approximation: after quantization, some information is lost and errors (noise) are introduced. The difference between the original sample value and the rounded value is called the quantization error. The Signal-to-Noise Ratio (SNR) is the ratio of the relative sizes of the signal values and the errors: the higher the SNR, the smaller the average error with respect to the signal value, and the better the fidelity.
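
To make the SNR/bit-depth tradeoff concrete, this sketch (not from the slides) quantizes a full-scale sine at two bit depths and measures the resulting SNR; each extra bit buys roughly 6 dB of fidelity.

```python
import math

def quantization_snr_db(n_bits, n_samples=10000):
    """Quantize a full-scale sine to n_bits and return the resulting SNR in dB."""
    step = 2.0 / (2 ** n_bits)   # quantizer step size over the range [-1, 1]
    signal = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
    noise = [s - step * round(s / step) for s in signal]   # quantization error
    p_signal = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return 10 * math.log10(p_signal / p_noise)

# Higher bit depth -> smaller average error -> higher SNR (better fidelity).
print(round(quantization_snr_db(8)), round(quantization_snr_db(16)))
```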

μ-law, A-law
Non-uniform quantizers are difficult to make and expensive. The solution is companding: compress the signal, apply a uniform quantizer, then expand at the receiver.

μ-law, A-law (cont.)
[Figure: the μ-law companding curve, used in North America and Japan, and the A-law curve, used in Europe.]

μ-law vs. A-law
8-bit μ-law is used in the US for telephony (ITU-T Recommendation G.711); 8-bit A-law is used in Europe for telephony and on international circuits. The two are similar, with slightly different curves; both are linear approximations to a logarithmic curve, and both give quality similar to 12-bit linear encoding. At 8000 samples/sec × 8 bits per sample, both produce a 64 kb/s data rate.

μ-law Encoding [figure]

μ-law Decoding [figure]
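
The μ-law encoding and decoding curves above can be approximated with the continuous μ-law formula. The sketch below is the idealized curve only, not the piecewise-linear 8-bit quantizer that G.711 actually specifies:

```python
import math

MU = 255  # μ parameter used in North American / Japanese telephony

def mu_law_compress(x):
    """Compress a sample in [-1, 1] along the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert the compression (the decoder side)."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.1
y = mu_law_compress(x)
assert abs(mu_law_expand(y) - x) < 1e-12  # the round trip recovers the sample
assert y > x  # quiet samples are boosted before uniform quantization
```

Because quiet samples are expanded before the uniform quantizer, they get relatively finer quantization steps, which is the whole point of companding.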

AUDIO & SOUND COMPRESSION TECHNIQUES

Sound Compression
Some techniques for sound compression:
- PCM: send every sample.
- DPCM: send differences between samples.
- ADPCM: send differences, but adapt how we code them.
- LPC: a linear model of speech formation.
- CELP: use LPC as a base, but also spend some bits to code corrections for the things LPC gets wrong.
PCM, DPCM, and ADPCM code the received sound signal directly; LPC and CELP parameterize a model of the sound source (model-based coding).

Pulse-code Modulation (PCM)
μ-law and A-law PCM have already reduced the data sent, but each sample is still independently encoded. In reality, samples are correlated, and we can exploit this correlation to reduce the data sent.

Pulse-code Modulation (PCM)
PCM is the digital representation of an analog signal via sampling and quantization. Its parameters are the sampling rate (samples per second) and the number of quantization levels (bits per sample).

Adaptive DPCM (ADPCM)
ADPCM makes a simple prediction of the next sample based on a weighted combination of the previous n samples. Normally the difference between samples is relatively small and can be coded with fewer than 8 bits; typically 6 bits are used for the difference rather than 8 bits for the absolute value. The compression is lossy, as not all differences can be coded exactly.

Differential PCM (DPCM)
Adaptive coding improves on plain DPCM in two ways:
- Adaptive quantization: the quantization levels adapt to the content of the audio.
- Adaptive prediction: the predictor adapts to the signal.
The receiver runs the same prediction algorithm and adaptive quantization levels to reconstruct the speech.
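
The closed-loop difference coding behind DPCM can be sketched as follows (a toy previous-sample predictor with a fixed step, not any particular standard):

```python
def dpcm_encode(samples, step=2):
    """Code each sample as a quantized difference from the reconstructed previous one."""
    codes, prediction = [], 0
    for s in samples:
        code = round((s - prediction) / step)  # small difference -> few bits
        codes.append(code)
        prediction += code * step              # track what the decoder will rebuild
    return codes

def dpcm_decode(codes, step=2):
    """Accumulate the quantized differences to reconstruct the waveform."""
    out, prediction = [], 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out

samples = [0, 3, 7, 12, 14, 13, 9]
decoded = dpcm_decode(dpcm_encode(samples))
# Lossy, but reconstruction stays within half a quantizer step of each sample.
assert all(abs(a - b) <= 1 for a, b in zip(samples, decoded))
```

Predicting from the reconstructed value (rather than the original) keeps the encoder and decoder in lockstep, so quantization errors do not accumulate.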

Model-based Coding
PCM, DPCM, and ADPCM directly code the received audio signal. An alternative approach is to build a parameterized model of the sound source (e.g., the human voice). For each time slice (e.g., 20 ms):
- Analyze the audio signal to determine how the signal was produced.
- Determine the model parameters that fit.
- Send the model parameters.
At the receiver, synthesize the voice from the model and the received parameters.

Speech Formation
Voiced sounds: a series of pulses of air as the larynx opens and closes; the basic tone is then shaped by the changing resonance of the vocal tract. Unvoiced sounds: the larynx is held open, and turbulent noise is made in the mouth.

LPC
Linear Predictive Coding (LPC) was introduced in the 1960s. It is a low-bitrate encoder (1.2–4 kb/s) but sounds very synthetic, so basic LPC is mostly used where bitrate really matters (e.g., in military applications). Most modern voice codecs (e.g., GSM) are based on enhanced LPC encoders.

LPC
Digitize the signal and split it into segments (e.g., 20 ms). For each segment, determine:
- The pitch of the signal (i.e., the basic formant frequency).
- The loudness of the signal.
- Whether the sound is voiced or unvoiced (voiced: vowels, m, v, l; unvoiced: f, s).
- The vocal tract excitation parameters (LPC coefficients).

Limitations of the LPC Model
The LPC linear predictor is very simple. For it to work, the vocal tract "tube" must not have any side branches (those would require a more complex model). This is fine for vowels, where a tube is a reasonable model, but for nasal sounds the nose cavity forms a side branch. In practice this is ignored in pure LPC; more complex codecs attempt to code the residue signal, which helps correct this.

Code Excited Linear Prediction (CELP)
The goal is to efficiently encode the residue signal, improving speech quality over LPC without increasing the bit rate too much. CELP codecs use a codebook of typical residue values. The analyzer compares the residue to the codebook values, chooses the closest one, and sends its code. The receiver looks up the code in its codebook, retrieves the residue, and uses it to excite the LPC formant filter.

CELP (cont.)
The problem is that a single codebook would need different residue values for every possible voice pitch: the codebook search would be slow, and each code would require many bits to send. One solution is to use two codebooks: one fixed by the codec designers, just large enough to represent one pitch period of residue, and one dynamically filled with copies of the previous residue delayed by various amounts (the delay provides the pitch). A CELP algorithm using these techniques can provide good quality at 4.8 kb/s.
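
The codebook search at the heart of CELP reduces to a nearest-vector lookup. The sketch below uses a tiny hypothetical fixed codebook and ignores the gain terms and adaptive codebook described above:

```python
def nearest_codeword(residue, codebook):
    """Return the index of the codebook entry closest to the residue (squared error)."""
    def dist(entry):
        return sum((r - c) ** 2 for r, c in zip(residue, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

codebook = [
    [0.0, 0.0, 0.0, 0.0],    # silence-like residue
    [1.0, -1.0, 1.0, -1.0],  # alternating pulse pattern
    [0.5, 0.5, -0.5, -0.5],  # low-frequency pattern
]
residue = [0.9, -1.1, 0.8, -0.9]
index = nearest_codeword(residue, codebook)
# Only the index is transmitted; the receiver looks up the residue vector.
print(index)
```

Sending an index instead of the residue samples is what keeps the bit rate low; the cost is the search over the codebook at the encoder.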

Enhanced LPC Usage
- GSM (Groupe Spécial Mobile): Residual Pulse Excited LPC, 13 kb/s
- LD-CELP: Low-Delay Code-Excited Linear Prediction (G.728), 16 kb/s
- CS-ACELP: Conjugate-Structure Algebraic CELP (G.729), 8 kb/s
- MP-MLQ: Multi-Pulse Maximum Likelihood Quantization (G.723.1), 6.3 kb/s

Audio Compression
LPC-based codecs model the sound source to achieve good compression. That works well for voice but is terrible for music. What if you can't model the source? Model the limitations of the human ear instead: not all sounds in the sampled audio can actually be heard, so analyze the audio, send only the sounds that can be heard, and quantize more coarsely where noise will be less audible.
Audio vs. speech: audio has a higher quality requirement and a wider frequency range.

Psychoacoustics Model
Dynamic range is the ratio of the maximum signal amplitude to the minimum signal amplitude, measured in decibels:
D = 20 log10(A_max / A_min) dB
Human hearing has a dynamic range of about 96 dB. The sensitivity of the ear depends on frequency and is greatest in the range 2–5 kHz.
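
Plugging 16-bit samples into the dynamic-range formula above yields roughly the 96 dB figure quoted for human hearing:

```python
import math

def dynamic_range_db(a_max, a_min):
    """D = 20 * log10(A_max / A_min), in decibels."""
    return 20 * math.log10(a_max / a_min)

# 16-bit samples: largest amplitude 2^16 units, smallest nonzero amplitude 1 unit.
print(dynamic_range_db(2 ** 16, 1))   # about 96 dB
```

This is why CD-quality audio uses 16 bits per sample: its dynamic range roughly matches that of human hearing.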

Psychoacoustics Model
[Figure: the ear's amplitude-sensitivity threshold as a function of frequency; frequencies are only heard if they exceed this sensitivity threshold.]

Psychoacoustics Model: Frequency Masking
The sensitivity threshold curve is distorted by the presence of loud sounds: frequencies just above and below the frequency of a loud sound must be louder than the normal minimum amplitude before they can be heard. To think about: if there is an 8 kHz signal at 60 dB, can we hear another 9 kHz signal at 40 dB?

Masking Curves for Loud Sounds [figure]

Psychoacoustics Model: Temporal Masking
If we hear a loud sound and then it stops, it takes a little while before we can hear a soft tone nearby: after a loud sound, the ear is deaf to quieter sounds in the same frequency range for a short time.

Principles of Audio Compression
Perceptual coding takes advantage of the psychoacoustics model, distinguishing signals by the ear's sensitivity to them:
- Signals the ear is highly sensitive to get more bits for coding; signals it is less sensitive to get fewer bits.
- Exploit frequency masking: don't encode a masked signal (the range of masking is one critical band).
- Exploit temporal masking: don't encode a masked signal.

AUDIO CODING STANDARDS

Audio Coding Standards
- G.711: A-law/μ-law encodings (8 bits/sample)
- G.721: ADPCM (32 kb/s, 4 bits/sample)
- G.723: ADPCM (24 kb/s and 40 kb/s)
- G.728: CELP (16 kb/s)
- LPC (FIPS-1015): Linear Predictive Coding (2.4 kb/s)
- CELP (FIPS-1016): Code-Excited LPC (4.8 kb/s)
- G.729: CS-ACELP (8 kb/s)
- MPEG-1/MPEG-2, AC-3: 16–384 kb/s; mono, stereo, and 5+1 channels
You can find a complete table of coding standards on the course web page.

MPEG Audio Codec
MPEG (Moving Picture Experts Group) and ISO (the International Organization for Standardization) have published several standards for digital audio coding: MPEG-1 Layers 1, 2, and 3 (MP3); MPEG-2 AAC; and MPEG-4 AAC and TwinVQ. These have been widely used in consumer electronics, digital audio broadcasting, DVDs, movies, etc.

MPEG Audio Codec
Procedure of MPEG audio coding:
1. Decompose the audio into 32 frequency subbands that approximate the critical bands (sub-band filtering).
2. Use the psychoacoustics model in bit allocation: if the amplitude of the signal in a band is below the masking threshold, don't encode it; otherwise, allocate bits based on the ear's sensitivity to the signal.
3. Multiplex the output of the 32 bands into one bitstream.
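
The masking-driven bit allocation described above can be sketched as follows (hypothetical per-band levels and a simplified 6-dB-per-bit rule, not the actual MPEG algorithm):

```python
import math

def allocate_bits(band_levels_db, masking_db, db_per_bit=6):
    """Give a subband 0 bits if masked, else enough bits to push noise below the mask."""
    bits = []
    for level, threshold in zip(band_levels_db, masking_db):
        if level <= threshold:
            bits.append(0)                  # inaudible: don't encode this band at all
        else:
            headroom = level - threshold    # audible signal above the mask, in dB
            bits.append(math.ceil(headroom / db_per_bit))  # ~6 dB of SNR per bit
    return bits

levels = [60, 30, 45, 10]   # hypothetical per-band signal levels (dB)
masks  = [20, 35, 40, 12]   # hypothetical per-band masking thresholds (dB)
print(allocate_bits(levels, masks))   # masked bands get 0 bits
```

Bits saved on masked bands are what lets perceptual coders cut the bitrate without audible loss.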

MPEG-1 Audio Layer 1
MPEG-1 audio allows sampling rates of 32, 44.1, and 48 kHz (MPEG-2 adds 16, 22.05, and 24 kHz). MPEG-1 filters the input audio into 32 bands, each coded in groups of 12 samples.
[Diagram: audio samples → filtering and downsampling → 12 samples per subband → normalization by scale factor → perceptual coder.]

MPEG-1 Audio Layer 2
Layer 2 is very similar to Layer 1, but groups three sets of 12 samples (36 samples) together in coding. It also improves the scale-factor quantization and groups three audio samples together in bit assignment.
[Diagram: audio samples → filtering and downsampling → 36 samples per subband → normalization by scale factor → perceptual coder.]

MPEG Audio Layer 3: MP3
MP3 is a further layer built on top of MPEG-1 audio Layer 2. It then uses Huffman coding to further compress the bit stream losslessly.

MPEG Audio [figure]

Next Session: Image