Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Similar documents
Bit or Noise Allocation

Audio-coding standards

Audio-coding standards

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Wavelet filter bank based wide-band audio coder

Mpeg 1 layer 3 (mp3) general overview

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

5: Music Compression. Music Coding. Mark Handley

Filterbanks and transforms

Appendix 4. Audio coding algorithms

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Ch. 5: Audio Compression Multimedia Systems

Optical Storage Technology. MPEG Data Compression

Design and implementation of a DSP based MPEG-1 audio encoder

DAB. Digital Audio Broadcasting

Multimedia Communications. Audio coding

Lecture 16 Perceptual Audio Coding

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

ELL 788 Computational Perception & Cognition July November 2015

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Chapter 14 MPEG Audio Compression

Principles of Audio Coding

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION

THE DESIGN OF A HYBRID FILTER BANK FOR THE PSYCHOACOUSTIC MODEL IN ISOIMPEG PHASES 1,2 AUDIO ENCODER'

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

EE482: Digital Signal Processing Applications

Efficient Signal Adaptive Perceptual Audio Coding

Decision tree learning

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

Parametric Coding of High-Quality Audio

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

AUDIOVISUAL COMMUNICATION

CS 335 Graphics and Multimedia. Image Compression

ELEC Dr Reji Mathew Electrical Engineering UNSW

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform

Audio Coding Standards

Memory Access and Computational Behavior. of MP3 Encoding

Audio Coding and MP3

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

Digital Signal Processing Lecture Notes 22 November 2010

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

An introduction to JPEG compression using MATLAB

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

AUDIOVISUAL COMMUNICATION

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Principles of MPEG audio compression

CSCD 443/533 Advanced Networks Fall 2017

Image, video and audio coding concepts. Roadmap. Rationale. Stefan Alfredsson. (based on material by Johan Garcia)

CISC 7610 Lecture 3 Multimedia data and data formats

Digital Signal Processing. Soma Biswas

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Image Enhancement in Spatial Domain. By Dr. Rajeev Srivastava

MUSIC A Darker Phonetic Audio Coder

Lossy compression. CSCI 470: Web Science Keith Vertanen

Digital Image Processing. Image Enhancement in the Frequency Domain

ESE532: System-on-a-Chip Architecture. Today. Message. Project. Expect. Why MPEG Encode? MPEG Encoding Project Motion Estimation DCT Entropy Encoding

2.4 Audio Compression

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

AN AUDIO WATERMARKING SCHEME ROBUST TO MPEG AUDIO COMPRESSION

HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER

An Experimental High Fidelity Perceptual Audio Coder

Assignment 3: Edge Detection

User Manual for the Testing.vi program

Digital Audio Compression

Fourier Transform in Image Processing. CS/BIOEN 6640 U of Utah Guido Gerig (slides modified from Marcel Prastawa 2012)

Linked Structures Songs, Games, Movies Part IV. Fall 2013 Carola Wenk

Further Studies of a FFT-Based Auditory Spectrum with Application in Audio Classification

Video Compression An Introduction

Seismic Noise Attenuation Using Curvelet Transform and Dip Map Data Structure

Classification of Hyperspectral Breast Images for Cancer Detection. Sander Parawira December 4, 2009

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

ijdsp Interactive Illustrations of Speech/Audio Processing Concepts

A Parallel Reconfigurable Architecture for DCT of Lengths N=32/16/8

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

The MPEG-4 General Audio Coder

MRT based Fixed Block size Transform Coding

Statistical Image Compression using Fast Fourier Coefficients

ISO/IEC INTERNATIONAL STANDARD

A Review of Algorithms for Perceptual Coding of Digital Audio Signals

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Robust Audio Watermarking Based on Secure Spread Spectrum and Auditory Perception Model

Audio Coding. C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval

AUDIO MEDIA CHAPTER Background

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

Chapter 4: Audio Coding

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Improved Audio Coding Using a Psychoacoustic Model Based on a Cochlear Filter Bank

convolution shift invariant linear system Fourier Transform Aliasing and sampling scale representation edge detection corner detection

Transcription:

Module 9 AUDIO CODING

Lesson 32 Psychoacoustic Models

Instructional Objectives At the end of this lesson, the students should be able to 1. State the basic objectives of both the psychoacoustic models. 2. Identify the tonal components from an auditory spectrum. 3. Identify the non-tonal components from an auditory spectrum. 4. Prune the list of tonal and non-tonal components using a sliding window. 5. Define masking index, masking function and masking threshold for both tonal and non-tonal components. 6. Calculate the global masking thresholds. 7. Explain the partition-domain transformation for psychoacoustic model-ii. 32.0 Introduction In the last few lessons, we have discussed the basic philosophy of audio encoding and the bit allocation policy followed in MPEG-1 audio standard. The MPEG-1 audio standard follows two psychoacoustic models. In this lesson, we are going to cover the two psychoacoustic models in details. 32.1 Psychoacoustic model classification Psychoacoustic model Model -1 (Layer 1 &2) Model -2 ( Layer 3) Model 1: is computationally simple. has high accuracy at high bit rate. Model 2: is computationally complex. has high accuracy at low bit rate.

Essential philosophies of both the models: Compute Fourier power spectrum of the signal. (512 point FFT for layer 1 & 2 / 1024 point FFT for layer 3). Map the spectrum into critical band domain. Distinguish between the tonal and non-tonal components. Calculate the masking function. Map these functions back to the sub-band domain. 32.2 Psychoacoustic model I The auditory spectrum is approximated by a list of tonal and non-tonal components. Tonal components are selected by identifying the maxima in the spectrum whose height is greatest in the neighbourhood. All the remaining spectral lines are used for calculating the non-tonal components. They are grouped into critical bands. Within each critical band, a non-tonal component is represented. Then, the list of tonal and noise components are decimated by eliminating those components which are below the auditory threshold or are less than one half of a critical band width from a neighbouring component. To compute the masking effect of a tonal or non-tonal component on the neighbouring spectral frequencies, the strength of the component is summed with two terms called the masking index and the masking function. o Masking index: An attenuation term which depends on the critical band rate of the component and whether it is tonal or non-tonal. o Masking function: An attenuation factor which depends on

Displacement of the component from neighbouring frequency. The component signal strength.

For a tonal component j, at critical band rate z(j), the masking threshold LT tm (j,i) at critical band rate z(i) is given by LT tm (j,i) = X tm (j) + av tm (z(j)) + vf[z(i) z(j), X tm (j)] where, X tm (j) is the strength of tonal component at frequency j, av tm (j) is the tonal masking index at the critical band rate z(j), vf (i,j) is the masking function with i representing displacement and j representing signal strength. For non-tonal components the masking index can be calculated as: LT nm (j, i) = X nm (j) + av nm (j) + vf[z(i) - z(j), X nm (j)]. The global masking thresholds are computed for all spectral frequencies by adding the masking thresholds computed above, for all the neighbouring tonal & non-tonal components, with the threshold of hearing. Then, the minimum masking threshold function is determined for each sub-band from the minimum of all the global masking thresholds contributing to that sub-band. Finally, signal to mask ratio (SMR) is computed.

32.3 Psychoacoustic model II It does not make a distinction between the tonal and non-tonal components. Spectral data is transformed into a partition domain. 1024 point FFT computation is used. Tonality is decided by the unpredictability of the spectrum with time. 32.3.1 Layer III encoding There are several new features in the Layer-III encoding scheme: Each of the 32 sub-bands in now split up into 18 spectral lines using the Modified Discrete Cosine Transform(MDCT) with window length 36 followed by a sequence of alias reduction butterflies. Variable frequency/time resolution is used to control transient effects.

Variable bit rate coding using a choice of 32 Huffman tables is used to encode the data.