Audio-coding standards

Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD-quality audio coders are based on the so-called psychoacoustic model of the human auditory system. This model allows the coder to remove the parts of the signal that a human cannot perceive; moreover, the amount of quantization noise that remains inaudible can be calculated. Audio coders work in the range from 20 Hz to 20 kHz. The human auditory system has remarkable detection capability, with a dynamic range of over 120 dB from very quiet to very loud sounds. The absolute threshold of hearing T_q characterizes the amount of energy in a pure tone such that it can be detected by a listener in a noiseless environment.
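The slides do not give a formula for T_q. A commonly used empirical approximation (due to Terhardt, widely quoted in the perceptual-coding literature) is sketched below; the formula is an outside reference, not part of these lecture notes.

```python
import numpy as np

def absolute_threshold_db(f_hz):
    """Approximate absolute threshold of hearing T_q(f) in dB SPL.

    Terhardt's empirical fit, commonly used in perceptual audio coders
    (an outside reference, not taken from these slides).
    """
    f = np.asarray(f_hz, dtype=float) / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The threshold is lowest (hearing is most sensitive) around 3-4 kHz.
for f in (100, 1000, 3500, 10000, 16000):
    print(f, "Hz:", round(float(absolute_threshold_db(f)), 1), "dB SPL")
```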

Audio coding standards

Audio coding standards However, louder sounds mask, or hide, weaker ones. The psychoacoustic model takes this masking property into account: masking occurs whenever the presence of a strong audio signal makes weaker audio signals in its temporal or spectral neighborhood imperceptible. The noise detection threshold is a modified version of the absolute threshold, with its shape determined by the input signal at any given time. In order to estimate this threshold we first have to understand how the ear performs spectral analysis.

Psychoacoustic model It is based on the following observations: The cochlea can be viewed as a bank of highly overlapping bandpass filters. The cochlear filter passbands are of non-uniform bandwidth, and the bandwidths increase with increasing frequency. The dependence of resolution on frequency in the human auditory system can be expressed in terms of critical bandwidths. A critical band is a range of frequencies over which the masking SNR remains more or less constant. Critical bandwidths are less than 100 Hz at the lowest audible frequencies and more than 4 kHz at the highest. Noise and tones have different masking properties.
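As a rough numerical illustration of the critical-band figures quoted above, the sketch below uses the Zwicker/Terhardt approximations for the critical bandwidth and the Bark (critical-band) scale; these particular formulas come from the psychoacoustics literature, not from the slides.

```python
import math

def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth around centre frequency f_hz (Zwicker/Terhardt fit)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def bark(f_hz):
    """Approximate critical-band (Bark) number of frequency f_hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Consistent with the slide: ~100 Hz wide at low frequencies,
# several kHz wide near the top of the audio band.
for f in (100, 500, 1000, 4000, 16000):
    print(f, "Hz:", round(critical_bandwidth_hz(f)), "Hz wide,", round(bark(f), 1), "Bark")
```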

Psychoacoustic model If B is the critical-band number, then:
Tone masking noise: E_N = E_T − (14.5 + B) dB, (14.1)
Noise masking tone: E_T = E_N − K dB, (14.2)
where E_N and E_T are the noise and tone energies, respectively, and K is in the range of 3-6 dB. Speech and audio signals are neither pure tones nor pure noise but rather a mixture of both. Equations (14.1) and (14.2) capture only the contributions of individual tone-like or noise-like maskers; typically each frame contains a collection of both masker types. These individual masking thresholds are combined to form a global masking threshold.
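A tiny worked example of (14.1) and (14.2) as reconstructed above; the value K = 5 dB is an arbitrary choice within the quoted 3-6 dB range.

```python
def tone_masking_noise_threshold(tone_energy_db, band_b):
    """Noise level just masked by a tone in critical band B (eq. 14.1)."""
    return tone_energy_db - (14.5 + band_b)

def noise_masking_tone_threshold(noise_energy_db, k_db=5.0):
    """Tone level just masked by noise; K is typically 3-6 dB (eq. 14.2)."""
    return noise_energy_db - k_db

# Example: a 70 dB tone in band B = 10 masks noise below 45.5 dB,
# while 70 dB of noise masks tones below about 65 dB (with K = 5 dB).
print(tone_masking_noise_threshold(70.0, 10))   # 45.5
print(noise_masking_tone_threshold(70.0))       # 65.0
```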

Psychoacoustic model Inter-band masking means that a masker centered within one critical band has some predictable effect on the detection thresholds in other critical bands. This effect is known as the spread of masking and is modeled by a spreading function.

Psychoacoustic model (block diagram): the short-term signal spectrum feeds cochlear filter modeling and a tonality calculation; threshold estimation combines the absolute threshold with the masking thresholds for the subbands to produce the estimated psychoacoustic threshold and the signal-to-mask ratio.

Psychoacoustic model Using the short-term spectrum, the frequency components of each subband are classified as tone-like or noise-like. Individual masking thresholds are computed for the tone-like and noise-like components. These individual masking thresholds and the absolute threshold T_q are combined to form a global threshold, and the minimum over the components of each subband is computed. The signal-to-mask ratio for each subband is computed. The mask-to-noise ratio for each subband is computed.

Generic perceptual audio coder and decoder (block diagram). Encoder: PCM audio input → coding filter bank (computing transform coefficients) → bit allocation, quantization and coding (controlled by the psychoacoustic model) → bitstream formatting → encoded bitstream. Decoder: encoded bitstream → bitstream unpacking → transform coefficient reconstruction → inverse transform → decoded PCM audio.

Generic perceptual audio coder In the audio standards the audio stream passes through a filter bank that divides the input into multiple frequency subbands; this type of transform coding is called subband coding, and the transform is overlapped. The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband. The quantization and coding block uses these signal-to-mask ratios to decide how to apportion the total number of code bits available for quantizing the subband signals so as to minimize the audibility of the quantization noise. Usually uniform scalar quantization of the transform coefficients is used, and the quantized coefficients are coded with a Huffman code.
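A minimal sketch of the quantization-and-coding stage just described, taking the filter-bank outputs and the signal-to-mask ratios as given. The function names, the crude SMR-proportional bit split, and the quantizer are illustrative placeholders, not the MPEG reference algorithm (the Huffman stage is omitted).

```python
import numpy as np

def uniform_quantize(x, bits):
    """Uniform scalar quantizer with 2**bits levels over the observed range of x."""
    if bits == 0:
        return np.zeros_like(x, dtype=int), 1.0
    step = (np.max(np.abs(x)) + 1e-12) / 2 ** (bits - 1)
    return np.round(x / step).astype(int), step

def encode_frame(subband_samples, smr_db, bit_pool):
    """One frame of a generic perceptual coder (hypothetical helper, illustrative only).

    subband_samples: list of arrays, one per subband (analysis filter-bank output)
    smr_db:          signal-to-mask ratio per subband from the psychoacoustic model
    bit_pool:        total bits available for sample data in this frame
    """
    # Crude allocation: subbands with larger SMR (less masking headroom) get more bits per sample.
    weights = np.maximum(np.asarray(smr_db, dtype=float), 0.0)
    shares = weights / (weights.sum() + 1e-12)
    samples_per_band = np.array([len(s) for s in subband_samples])
    bits = np.floor(shares * bit_pool / samples_per_band).astype(int)

    frame = []
    for samples, b in zip(subband_samples, bits):
        q, step = uniform_quantize(samples, int(b))
        frame.append((int(b), step, q))   # would be Huffman-coded and packed in practice
    return frame

# Usage sketch with random placeholder data.
subbands = [np.random.randn(12) for _ in range(32)]   # e.g. 12 samples per subband
smr = np.random.uniform(-5, 25, size=32)              # placeholder SMRs in dB
frame = encode_frame(subbands, smr, bit_pool=1536)
```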

Overview of audio standards The ISO Moving Picture Experts Group audio standard (ISO MPEG/audio) is one part of a three-part compression standard that also covers video and systems. The original MPEG (sometimes referred to as MPEG-1) was created for one- and two-channel sound and has three layers, each providing a greater compression ratio. MPEG-2 was created to provide multichannel audio capability. The MPEG advanced audio coder (MPEG AAC) provides the same quality as MPEG-2 but at half the bit rate. MPEG-4 audio is a complete toolbox covering everything from low bit rate speech coding to high-quality audio coding and music synthesis; it contains the High-Efficiency AAC (HE-AAC) and HE-AAC v2 modes.

MPEG-1 has three layers. Layer I, the simplest, provides bit rates above 128 kb/s per channel (compression ratios of about 3-4). Layer II has intermediate complexity and provides bit rates around 128 kb/s per channel (compression ratios 5-6); its main applications are storage of video on CD-ROM and transmission of audio over ISDN channels. Layer III is the most complex and provides bit rates around 64 kb/s per channel (compression ratios 10-12); it is used in low-rate compression systems.

Transform The polyphase filter bank is common to Layers I and II. Layer III uses a hybrid transform that combines the filter bank with a modified DCT. The polyphase filter bank splits the audio signal into 32 equal-width frequency subbands; at a 48 kHz sampling rate each subband is 750 Hz wide. The equal widths of the subbands do not accurately reflect the behavior of the human auditory system, the filter bank and its inverse are not lossless transforms, and adjacent filter bands have a major frequency overlap.

Polyphase filter bank (diagram): each analysis step processes a 512-sample window consisting of 480 previous samples plus 32 new input samples (480 + 32 = 512) and produces one transform coefficient for each of the 32 subbands; the window then slides forward by 32 samples.

Polyphase filter bank The transform coefficients s_i(n) represent the filter bank outputs
y_i(n) = sum_{m=0}^{511} x(n − m) H_i(m)
decimated by a factor of 32, that is, s_i(n) = y_i(32n), where x(n) is the input audio stream and
H_i(m) = h(m) cos( π (i + 1/2)(m − 16) / 32 )
is the pulse response of the i-th filter.

Polyphase filter bank All filters are obtained from one prototype lowpass filter with pulse response h(m), where C(m) is the transform window:
h(m) = −C(m), if ⌊m/64⌋ is odd,
h(m) = C(m), otherwise.
A direct implementation of the above equations requires 32 × 512 = 16384 multiplications and 32 × 511 = 16352 additions to compute the 32 filter outputs.

Polyphase filter bank Taking into account the periodicity of the cosine function, we can obtain the equivalent but computationally more efficient representation
s_i(n) = sum_{k=0}^{63} M(i, k) sum_{j=0}^{7} C(k + 64 j) x(n − (k + 64 j)),   (14.3)
where C(k) is one of the 512 coefficients of the transform window and
M(i, k) = cos( π (i + 1/2)(k − 16) / 32 ),  i = 0, ..., 31,  k = 0, ..., 63.
This is possible because the cosine term has period 128 in m, while h(m) is equal to ±C(m) with the sign changing with period 64.

Polyphase filter bank Using this formula we perform only 512 + 32 × 64 = 2560 multiplications and 64 × 7 + 32 × 63 = 2464 additions. An even more efficient implementation of the filtering can be obtained by reducing (14.3) to a modified cosine transform, which can be implemented using the fast Fourier transform. Layer III of MPEG-1, besides the polyphase filter bank, also uses the modified cosine transform (MDCT), with two possible block lengths.
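A direct sketch of (14.3) in Python. The 512 window coefficients C(m) are tabulated in the standard; a placeholder window is used here, and the convention that x[0] is the newest sample is an assumption of this sketch.

```python
import numpy as np

def polyphase_analysis(x, C):
    """Compute the 32 subband samples for one shift of the 512-sample window
    using the partial-sum form (14.3).

    x: the most recent 512 input samples, x[0] being the newest (so x[m] = x(n - m)).
    C: the 512-coefficient analysis window (tabulated in the standard; a placeholder here).
    """
    # Windowed partial sums z(k) = sum_j C(k + 64 j) x(n - (k + 64 j)), k = 0..63
    z = np.zeros(64)
    for k in range(64):
        for j in range(8):
            z[k] += C[k + 64 * j] * x[k + 64 * j]

    # Matrixing: s_i = sum_k M(i, k) z(k), with M(i, k) = cos(pi (i + 1/2)(k - 16) / 32)
    i = np.arange(32)[:, None]
    k = np.arange(64)[None, :]
    M = np.cos(np.pi * (i + 0.5) * (k - 16) / 32.0)
    return M @ z

# Usage sketch with a placeholder window (NOT the standard's coefficient table).
C = np.hanning(512) / 64.0
x_window = np.random.randn(512)              # newest sample first
subbands = polyphase_analysis(x_window, C)   # 32 subband samples
```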

Modified DCT (processing steps): the input block has length M; the (L − 1) previous blocks together with the input block form a block of length N = LM; this block is component-wise multiplied by the analysis window W_A(n) and then multiplied by the matrix A of size M × N, giving the vector of M transform coefficients. For synthesis, the vector of M transform coefficients is multiplied by A^T, the resulting block is component-wise multiplied by the synthesis window W_S(n), and the output block is shifted by M with respect to the previous block and added to the output sequence (overlap-add).

MDCT The transform matrix A of size M × N can be written as A = T Y, where T, of size M × M, is the matrix of the DCT-IV transform with entries
t_{kn} = sqrt(2/M) cos( (π/M)(k + 1/2)(n + 1/2) ),  k, n = 0, ..., M − 1,
and Y = [Y_0 Y_1 Y_0 Y_1 ...], of size M × N, describes the preprocessing (folding) step. The blocks Y_0 and Y_1 have the form
Y_0 = [0 J; I 0],  Y_1 = [I 0; 0 J]  (block rows separated by semicolons),
where I is the identity matrix of size M/2 × M/2 and J is the contra-identity matrix.
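For the common case L = 2 (N = 2M), the MDCT and its inverse can also be computed directly from the cosine definition instead of through T and Y. The sketch below does that and checks perfect reconstruction by overlap-add with a sine window, one window choice satisfying the Princen-Bradley condition w(n)^2 + w(n + M)^2 = 1; the slides do not prescribe a particular window.

```python
import numpy as np

def mdct(block, window):
    """MDCT of one block of length N = 2M -> M coefficients (direct definition)."""
    N = len(block); M = N // 2
    n = np.arange(N)[None, :]
    k = np.arange(M)[:, None]
    basis = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
    return basis @ (window * block)

def imdct(coeffs, window):
    """Inverse MDCT -> windowed block of length N = 2M, to be overlap-added with shift M."""
    M = len(coeffs); N = 2 * M
    n = np.arange(N)[None, :]
    k = np.arange(M)[:, None]
    basis = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
    return window * (2.0 / M) * (coeffs @ basis)

# Perfect-reconstruction check with a sine window and 50 % overlap-add.
M = 8
N = 2 * M
w = np.sin(np.pi / N * (np.arange(N) + 0.5))   # satisfies w(n)^2 + w(n + M)^2 = 1
x = np.random.randn(4 * M)
out = np.zeros(len(x) + M)
for start in range(0, len(x) - M, M):
    blk = x[start:start + N]
    out[start:start + N] += imdct(mdct(blk, w), w)
# Interior samples (away from the first and last half-blocks) are reconstructed exactly.
print(np.allclose(out[M:3 * M], x[M:3 * M]))
```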

(Diagrams: MDCT analysis — consecutive blocks of length N = LM are multiplied by the analysis window w_a and transformed by the MDCT into M coefficients each; IMDCT synthesis — each group of M coefficients is inverse-transformed, multiplied by the synthesis window w_s, and overlap-added with a shift of M samples.)

PAM-I and PAM-II The psychoacoustic models (PAM) I and II use a separate, independent time-to-frequency transform because they need finer frequency resolution for an accurate calculation of the masking thresholds: PAM-I uses a 512-point FFT and PAM-II a 1024-point FFT. Main steps of computing the signal-to-mask ratio: 1. Calculate the sound pressure level for each subband. 2. Calculate the absolute threshold T_q(i) for the i-th spectral component of each subband; this threshold is the lower bound on the audibility of sound. 3. Separate the spectral components into tonal and noise-like components.

PAM-I and PAM-II PAM-I identifies tonal components based on local peaks of the audio power spectrum; the remaining components are summed into a single non-tonal component per critical band, with frequency index equal to the geometric mean of the enclosing critical band. PAM-II instead computes a tonality index as a function of frequency, based on a measure of predictability; this index is used to interpolate between pure tone-masking-noise and noise-masking-tone values. 4. Apply a spreading function. 5. Compute the global masking threshold for each spectral component i:
T_g(i) = f( T_q(i), T_t(i, j), T_n(i, k) ),  j = 1, ..., m,  k = 1, ..., n,
where T_t(i, j) and T_n(i, k) are the thresholds of the tone and noise components.

PAM-I and PAM-II 6. Find the minimum masking threshold for each subband:
T_min(n) = min_i T_g(i) dB,  n = 1, ..., 32,
where the minimum is taken over all spectral components i of the n-th subband. 7. Calculate the signal-to-mask ratio for each subband:
SMR(n) = L_sb(n) − T_min(n),
where L_sb(n) is the sound pressure level of the subband. For each subband the mask-to-noise ratio is then computed as
MNR(n) = SNR(n) − SMR(n),
where SNR(n) is determined by the quantizer and is tabulated.
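Steps 6-7 feed the coder's bit-allocation loop. The sketch below is a simple greedy allocator that repeatedly gives one more bit to the subband with the lowest mask-to-noise ratio; the idealized 6.02 dB-per-bit SNR rule is used in place of the standard's tabulated SNR(n).

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Greedy bit allocation driven by MNR(n) = SNR(n) - SMR(n).

    smr_db: signal-to-mask ratio per subband (from the psychoacoustic model), in dB.
    The idealized SNR(n) ~ 6.02 * bits rule replaces the standard's SNR table.
    """
    bits = [0] * len(smr_db)
    remaining = total_bits
    while remaining > 0:
        # Current mask-to-noise ratio of every subband.
        mnr = [db_per_bit * b - s for b, s in zip(bits, smr_db)]
        # Give one more bit to the subband whose quantization noise is most audible,
        # i.e. the one with the smallest MNR that can still accept bits.
        candidates = [n for n in range(len(bits)) if bits[n] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=lambda n: mnr[n])
        bits[worst] += 1
        remaining -= 1
    return bits

# Example: subbands with high SMR (little masking headroom) receive more bits.
print(allocate_bits([20.0, 5.0, -3.0, 12.0], total_bits=12))
```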

Stereo coding 1. Intensity stereo: a scaled sum of the left and right channels. 2. Middle-side (MS) stereo: the sum and difference of the left and right channels (sketched below). 3. Parametric stereo: the stereo image is synthesized from a mono signal using parameters such as the inter-channel phase difference, inter-channel correlation, and so on. HE-AAC combines AAC with Spectral Band Replication (the high-frequency part is reconstructed as a properly scaled copy of the low-frequency part shifted to the high-frequency region); HE-AAC v2 additionally uses parametric stereo.
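A minimal sketch of middle-side stereo (option 2 above, with one common 1/2 scaling convention), showing why it pays off for strongly correlated channels.

```python
import numpy as np

def ms_encode(left, right):
    """Middle-side stereo: code the sum (mid) and difference (side) of the channels."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def ms_decode(mid, side):
    """Recover left/right from mid/side (exact if mid/side are not quantized)."""
    return mid + side, mid - side

left = np.random.randn(1024)
right = 0.9 * left + 0.1 * np.random.randn(1024)   # strongly correlated channels
mid, side = ms_encode(left, right)
# For correlated material the side signal has far less energy than left/right,
# so it can be coded with far fewer bits.
print(np.var(side) < 0.1 * np.var(left))
l2, r2 = ms_decode(mid, side)
print(np.allclose(l2, left) and np.allclose(r2, right))
```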

Demo Original (1.44 Mb/s), 64 kb/s, 32 kb/s, 8 kb/s.

Shannon-Fano-Elias Coding

Shannon-Fano-Elias Coding

Graphical Interpretation of Shannon-Fano-Elias Coding

Shannon-Fano-Elias Coding

Shannon-Fano-Elias coding

Gilbert-Moore Coding

Graphical Interpretation of Gilbert-Moore code

Gilbert-Moore Coding

Gilbert-Moore Coding

Arithmetic coding

Implementation of arithmetic coding

Arithmetic coding

Implementation of arithmetic coding

Implementation of arithmetic coding

Implementation of arithmetic coding

Decoding of arithmetic code

Implementation of arithmetic coding Problems: the algorithm requires high computational accuracy (theoretically infinite precision), and the computational delay equals the length of the sequence to be encoded.
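The accuracy problem can be made concrete with a naive floating-point coder that only narrows the interval and never renormalizes: after a few hundred symbols the interval width falls below the spacing of double-precision numbers and collapses, which is why practical implementations use finite-precision integer arithmetic with renormalization. The sketch below is illustrative only, not a complete encoder.

```python
def naive_arithmetic_interval(symbols, probs):
    """Narrow the coding interval [low, high) symbol by symbol, with no renormalization.

    probs: dict symbol -> probability; cumulative sub-intervals are derived from it.
    Returns the final interval; its width shrinks by p(symbol) at every step.
    """
    # Cumulative distribution: symbol -> (cum_low, cum_high)
    cum, c = {}, 0.0
    for s, p in probs.items():
        cum[s] = (c, c + p)
        c += p

    low, high = 0.0, 1.0
    for s in symbols:
        width = high - low
        lo_frac, hi_frac = cum[s]
        low, high = low + width * lo_frac, low + width * hi_frac
    return low, high

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
low, high = naive_arithmetic_interval("abc" * 400, probs)
# After ~1200 symbols the interval width has collapsed to (essentially) zero in
# double precision, so no codeword could be recovered from it.
print(high - low)
```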