Perceptual coding


Perceptual coding

Both LPC and CELP are used primarily for telephony applications and hence for the compression of a speech signal. Perceptual encoders, in contrast, have been designed for the compression of general audio, such as that associated with a digital television broadcast. They use a psychoacoustic model whose role is to exploit a number of the limitations of the human ear. With this approach, sampled segments of the source audio waveform are analyzed, as with CELP-based coders, but only those features that are perceptible to the ear are transmitted. The model exploits three perceptual properties:

Sensitivity: although the human ear is sensitive to signals in the range 15 Hz through to 20 kHz, the level of sensitivity is a non-linear function of frequency; that is, the ear is more sensitive to some frequencies than to others.

Frequency masking: when multiple signals are present, as is the case with general audio, a strong signal may reduce the ear's sensitivity to other signals that are near to it in frequency.

Temporal masking: when the ear hears a loud sound, a short but finite time must elapse before it can hear a quieter one.

Perceptual Properties of the Ear: Sensitivity as a Function of Frequency

A psychoacoustic model is used to identify those signals that are influenced by these effects. They are then eliminated from the transmitted signal and, in so doing, this reduces the amount of information to be transmitted. The ear is most sensitive in the range 2-5 kHz: in the accompanying figure (plotted over a 96 dB dynamic range), tone A is audible while tone B is not, although both have the same relative amplitude.
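The non-linear sensitivity curve can be approximated numerically. The sketch below uses a widely quoted closed-form fit for the absolute threshold of hearing (the formula is not part of the original slides; it is a standard approximation used in perceptual-coding literature):

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL as a function
    of frequency, using a common closed-form fit. A tone whose level
    falls below this curve is inaudible and need not be transmitted."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The threshold dips where the ear is most sensitive (roughly 2-5 kHz):
# a tone near 3.3 kHz is audible at a level that would be inaudible
# at 100 Hz or at 15 kHz.
```

Evaluating the function across the audible band reproduces the shape of the sensitivity curve described above, with its minimum in the 2-5 kHz region.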

Perceptual Properties of the Ear: Frequency Masking

Effect of frequency masking: a loud tone suppresses a quieter one that is near to it in frequency. In the figure, tone B masks tone A: tone B is audible while tone A is not, even though tone A would be audible by itself.

Variation with frequency: the masking effect is a function of the frequency band. The width of each masking curve at a particular sound level is known as the critical bandwidth. Experiments show that for frequencies below 500 Hz the critical bandwidth is about 100 Hz, and that above 500 Hz it increases approximately linearly in multiples of 100 Hz; for example, for a signal of 1 kHz (2 x 500 Hz) the critical bandwidth is about 200 Hz.

Temporal Masking Caused by a Loud Signal

After the ear hears a loud sound, there is a short delay before it can hear a quieter one. During this decay time, signals whose amplitudes are below the decay envelope will not be heard and hence need not be transmitted.

MPEG audio coders

The Motion Pictures Expert Group (MPEG) was formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the use of video with sound. The coders associated with the audio compression part of these standards are known as MPEG audio coders, and a number of them use perceptual coding.
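The critical-bandwidth rule quoted above can be expressed as a small helper. This is only a sketch of the rule of thumb from the text, not one of the more precise empirical formulas (such as the Bark scale):

```python
def critical_bandwidth_hz(f_hz):
    """Rule-of-thumb critical bandwidth: about 100 Hz for frequencies
    below 500 Hz, then growing linearly (100 Hz for every 500 Hz of
    centre frequency) above that, as described in the text."""
    if f_hz <= 500.0:
        return 100.0
    return 100.0 * (f_hz / 500.0)

# e.g. a 1 kHz tone (2 x 500 Hz) has a critical bandwidth of about 200 Hz
```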

MPEG Perceptual Audio Coding

Perceptual encoding is a lossy compression technique: the decoded data is not an exact replica of the original digital audio data. Instead, the digital audio data is compressed in such a way that, despite the high compression rate, the decoded audio sounds exactly (or as closely as possible) like the original. This is achieved by adapting the encoding process to the characteristics of human sound perception: the parts of the audio signal that humans perceive distinctly are coded with high accuracy, the less distinctive parts are coded less accurately, and the parts of the sound we do not hear at all are largely discarded or replaced by quantization noise.

MPEG Encoder

The time-varying audio input signal is first sampled and quantized using PCM, the sampling rate and number of bits per sample being determined by the specific application. The bandwidth that is available for transmission is then divided into a number of frequency subbands using a bank of analysis filters which, because of their role, are also known as critical-band filters.

MPEG Encoder (2)

Each frequency subband is of equal width and, essentially, the bank of filters maps each set of 32 (time-related) PCM samples into an equivalent set of 32 frequency samples, one per subband. Each is known as a subband sample and indicates the magnitude of each of the 32 frequency components present in a segment of the audio input signal whose duration is equal to 32 PCM samples. For example, assuming 32 subbands and a sampling rate of 32 ksps (that is, a maximum signal frequency of 16 kHz), each subband has a bandwidth of 500 Hz.
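The subband arithmetic above can be checked with a couple of small helpers. This is a sketch of the bookkeeping only; the actual encoder uses a 512-tap polyphase filterbank rather than a simple frequency-to-band mapping:

```python
def subband_width_hz(fs_hz=32000, n_subbands=32):
    """Each of the equal-width subbands covers (fs/2)/n Hz."""
    return (fs_hz / 2) / n_subbands

def subband_index(f_hz, fs_hz=32000, n_subbands=32):
    """Index (counted from 0) of the subband a frequency component
    falls into."""
    width = subband_width_hz(fs_hz, n_subbands)
    return min(int(f_hz // width), n_subbands - 1)

# 32 ksps and 32 subbands -> 500 Hz per subband, as in the text;
# a 1.25 kHz component lands in subband 2.
```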

MPEG Encoder (3)

In a basic encoder, the time duration of each sampled segment of the audio input signal is the time taken to accumulate 12 successive sets of 32 PCM (and hence subband) samples; that is, a duration equal to 384 (12 x 32) PCM samples. In addition, the analysis filter bank determines the maximum amplitude of the 12 subband samples in each subband; each maximum is known as the scaling factor for its subband. The scaling factors are passed both to the psychoacoustic model and, together with the set of frequency samples in each subband, to the corresponding quantizer block.

MPEG Encoder (4)

The processing associated with both frequency and temporal masking is carried out by the psychoacoustic model, which runs concurrently with the filtering and analysis operations. The 12 sets of 32 PCM samples are first transformed into an equivalent set of frequency components using a mathematical technique known as the discrete Fourier transform (DFT). Then, using the known hearing thresholds and masking properties of each subband, the model determines the various masking effects of this set of signals. The output of the model is a set of signal-to-mask ratios (SMRs), which indicate those frequency components whose amplitude is below the related audible threshold.

MPEG Encoder (5)

In addition, the set of scaling factors is used to determine the quantization accuracy, and hence the bit allocation, for each of the audible components. This is done so that frequency components in the regions of highest sensitivity can be quantized with more accuracy (bits), and hence less quantization noise, than those in regions where the ear is less sensitive.
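The SMR-driven bit allocation can be sketched as a greedy loop. This is an illustrative simplification, assuming the common rule of thumb of roughly 6 dB of quantizer SNR per bit; the standard's actual allocation uses fixed tables and is more involved:

```python
def allocate_bits(smr_db, bit_pool, max_bits=15):
    """Greedy bit-allocation sketch: repeatedly give one more bit to the
    subband whose mask-to-noise ratio (quantizer SNR minus SMR) is
    currently worst, so that strongly audible subbands end up with the
    most accurate quantization."""
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        # ~6.02 dB of SNR per bit for a uniform quantizer
        worst = min(candidates, key=lambda i: 6.02 * bits[i] - smr_db[i])
        bits[worst] += 1
    return bits

# Subbands with a high SMR (strongly audible) receive the most bits:
# allocate_bits([30, 10, -5], 10) -> [6, 3, 1]
```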

MPEG Encoder (6)

The quantization is performed in two stages using a form of companding. The scaling factor in each subband is first quantized using 6 bits, giving 1 of 64 levels, and a further 4 bits are then used to quantize each of the 12 frequency components in the subband relative to this level. Collectively this is known as the subband sample (SBS) format. The frame header contains information such as the sampling frequency that has been used and, in this way, all the information necessary for decoding is carried within each frame.

MPEG Decoder

In the decoder, after the magnitude of each set of 32 subband samples has been determined by the dequantizers, the samples are passed to the synthesis filter bank. The latter produces the corresponding set of PCM samples, which are decoded to produce the time-varying analog output segment. The ancillary data field at the end of a frame is optional and is used, for example, to carry additional coded samples associated with, say, the surround sound that is present with some digital video broadcasts.
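The two-stage companding scheme can be sketched as follows. This is a simplified uniform quantizer with hypothetical helper names; the standard quantizes the scaling factor against a 64-entry table and uses non-uniform quantization for the samples:

```python
def encode_subband(samples, sample_bits=4):
    """Two-stage companding sketch: code the subband's peak amplitude
    (the scaling factor) once, then code each of the 12 samples relative
    to that peak with a small uniform quantizer."""
    scale = max(abs(s) for s in samples) or 1.0
    levels = 2 ** sample_bits                  # 16 levels for 4 bits
    step = 2.0 / levels                        # sample/scale lies in [-1, 1]
    codes = [min(int((s / scale + 1.0) / step), levels - 1) for s in samples]
    return scale, codes

def decode_subband(scale, codes, sample_bits=4):
    """Dequantize: reconstruct each sample at the centre of its level."""
    step = 2.0 / (2 ** sample_bits)
    return [scale * ((c + 0.5) * step - 1.0) for c in codes]

# The round-trip error is bounded by half a quantization step times the
# scaling factor, i.e. scale / 16 for 4-bit samples.
```

Coding the samples relative to the per-subband peak is what makes 4 bits per sample workable: quiet subbands reuse the full 16 levels over a small amplitude range instead of wasting precision on a fixed full-scale range.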