Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Similar documents
Mpeg 1 layer 3 (mp3) general overview

Audio-coding standards

Audio-coding standards

AUDIOVISUAL COMMUNICATION

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Multimedia Communications. Audio coding

AUDIOVISUAL COMMUNICATION

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Lecture 16 Perceptual Audio Coding

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Principles of Audio Coding

Chapter 14 MPEG Audio Compression

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Ch. 5: Audio Compression Multimedia Systems

Appendix 4. Audio coding algorithms

5: Music Compression. Music Coding. Mark Handley

ELL 788 Computational Perception & Cognition July November 2015

Optical Storage Technology. MPEG Data Compression

CISC 7610 Lecture 3 Multimedia data and data formats

EE482: Digital Signal Processing Applications

Wavelet filter bank based wide-band audio coder

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

2.4 Audio Compression

Networking Applications

Audio and video compression

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform

Bit or Noise Allocation

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Data Compression. Audio compression

CS 074 The Digital World. Digital Audio

Parametric Coding of High-Quality Audio

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Compression; Error detection & correction

Port of a fixed point MPEG2-AAC encoder on a ARM platform

Rich Recording Technology Technical overall description

DAB. Digital Audio Broadcasting

Digital Recording and Playback

Principles of MPEG audio compression

The Gullibility of Human Senses

Lossy compression. CSCI 470: Web Science Keith Vertanen

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

ijdsp Interactive Illustrations of Speech/Audio Processing Concepts

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

MPEG-4 aacplus - Audio coding for today s digital media world

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

DRA AUDIO CODING STANDARD

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

Audio Coding Standards

The MPEG-4 General Audio Coder

Perceptual Coding of Digital Audio

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

AUDIO MEDIA CHAPTER Background

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER

CSCD 443/533 Advanced Networks Fall 2017

Lecture 7: Audio Compression & Coding

Digital Media. Daniel Fuller ITEC 2110

Compression; Error detection & correction

Improved Audio Coding Using a Psychoacoustic Model Based on a Cochlear Filter Bank

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

MP3. Panayiotis Petropoulos

Design and implementation of a DSP based MPEG-1 audio encoder

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Bluray (

PERCEPTUAL AUDIO CODING THAT SCALES TO LOW BITRATES SRIVATSAN A KANDADAI, B.E., M.S. A dissertation submitted to the Graduate School

University of Pennsylvania Department of Electrical and Systems Engineering Digital Audio Basics

Digital Audio for Multimedia

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

Transporting audio-video. over the Internet

Digital video coding systems MPEG-1/2 Video

I D I A P R E S E A R C H R E P O R T. October submitted for publication

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

A Review of Algorithms for Perceptual Coding of Digital Audio Signals

CHAPTER 10: SOUND AND VIDEO EDITING

An Experimental High Fidelity Perceptual Audio Coder

MPEG-4 General Audio Coding

S.K.R Engineering College, Chennai, India. 1 2

3 Sound / Audio. CS 5513 Multimedia Systems Spring 2009 LECTURE. Imran Ihsan Principal Design Consultant

Music & Engineering: Digital Encoding and Compression

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION

How is sound processed in an MP3 player?

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

Speech and audio coding

Music & Engineering: Digital Encoding and Compression

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

A Generic Audio Classification and Segmentation Approach for Multimedia Indexing and Retrieval

EEC-484/584 Computer Networks

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

Transcription:

Fundamentals of Perceptual Audio Encoding Craig Lewiston HST.723 Lab II 3/23/06

Goals of Lab Introduction to fundamental principles of digital audio & perceptual audio encoding Learn the basics of psychoacoustic models used in perceptual audio encoding. Run 2 experiments exploring some fundamental principles behind the psychoacoustic models of perceptual audio encoding.

Quantization Digital Audio

Quantization Quantization Noise is the difference between the analog signal and the digital representation, and arises as a result of the error in the quantization of the analog signal. N Bits => 2 N levels

Quantization Bits Levels 3 8 4 16 5 32 8 256 16 65536 With each increase in the bit level, the digital representation of the analog signal increases in fidelity, and the quantization noise becomes smaller.

Digital Audio CD Audio: 16 bit encoding 2 Channels (Stereo) 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction, etc.) CD Audio = 4.32 Mb/s

Compression High data rates, such as CD audio (4.32 Mb/s), are incompatible with internet & wireless applications. Audio data must somehow be compressed to a smaller size (less bits), while not affecting signal quality (minimizing quantization noise). Perceptual Audio Encoding is the encoding of audio signals, incorporating psychoacoustic knowledge of the auditory system, in order to reduce the amount of bits necessary to faithfully reproduce the signal. MPEG-1 Layer III (aka mp3) MPEG-2 Advanced Audio Coding (AAC)

MPEG MPEG = Motion Picture Experts Group MPEG is a family of encoding standards for digital multimedia information MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM). Layer I Layer II Layer III (aka MP3) MPEG-2: standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications. Advanced Audio Coding (AAC) MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for those channels with very limited bandwidths (e.g., wireless channels). MPEG-7: a content representation standard for information search

Overview of Perceptual Encoding General Perceptual Audio Encoder (Painter & Spanias, 2000): Psychoacoustic analysis => masking thresholds Basic principle of Perceptual Audio Encoder: use masking pattern of stimulus to determine the least number of bits necessary for each frequency sub-band, so as to prevent the quantization noise from becoming audible.

Masking

Quantization Noise 3 15 250 200 2 10 150 Amplitude (Quantization Levels) 1 0 1 Amplitude (Quantization Levels) 5 0 5 Amplitude (Quantization Levels) 100 50 0 50 100 2 10 150 200 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Time (ms) 15 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Time (ms) 250 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Time (ms) 30 20 10 Level (db) 0 10 20 30 10 2 10 3 10 4 Frequency (Hz)

Sub-band Coding 30 20 10 Level (db) 0 10 20 30 10 2 10 3 10 4 Frequency (Hz)

Sub-band Coding 30 20 m-1 m 10 m+1 Level (db) 0 10 20 30 10 2 10 3 10 4 Frequency (Hz)

Masking/Bit Allocation The number of bits used to encode each frequency sub-band is equal to the least number of bits with a quantization noise that is below the minimum masking threshold for that sub-band.

Example: MPEG-1 Psychoacoustic Model I 1. Spectral Analysis and SPL Normalization

Example: MPEG-1 Psychoacoustic Model I 2. Identification of Tonal Maskers & calculation of individual masking thresholds

Example: MPEG-1 Psychoacoustic Model I 2. Identification of Noise Maskers & calculation of individual masking thresholds

Example: MPEG-1 Psychoacoustic Model I 4. Calculation of Global Masking Thresholds

Example: MPEG-1 Psychoacoustic Model I A - Some portions of the input spectrum require SNR s > 20 db B A C D B - Other portions require less than 3 db SNR C - Some high frequency portions are masked by the signal itself D - Very high frequency portions fall below the absolute threshold of hearing.

Example: MPEG-1 Psychoacoustic Model I 5. Sub-band Bit Allocation

Lab Experiments Exp 1: Masking Pattern Measure absolute hearing thresholds in quiet Measure absolute hearing thresholds in presence of narrowband noise masker Exp 2: Masking Threshold Measure masking threshold of a 1 khz tone in the presence of four different maskers: Tone Gaussian Noise Multiplied Noise Low-noise Noise

Method of Adjustment Georg von Bekesy Method of Adjustment (aka Békésy tracking method) Target tone is swept through frequency range, and subject must adjust intensity of target tone so that it is just barely detectable

Exp 1: Masking Pattern Masker Threshold in quiet Masked threshold Masked Sounds

Exp 2: Masking Thresholds Calculation of tonal & noise masking thresholds: Tonal & noise maskers have different masking effects

Asymmetry of Simultaneous Masking Tone masker SNR ~ 24 db Noise masker SNR ~ 4 db

Asymmetry of Simultaneous Masking Why do tones and noises have different masking effects? Signal = A(t) ejω(t) + φ(t) For narrowband Gaussian noise, e jω(t) is approximately the same as a tone centered at the same frequency. Asymmetry effect is either due to the amplitude term A(t) or to the phase term φ(t), or a combination of both.

Asymmetry of Simultaneous Masking Measure masking effects of modified noises: Multiplied Noise: generated by multiplying a sinusoid at 1 khz with a low-pass Gaussian noise. Amplitude => Gaussian Noise Phase => Pure Tone Low-Noise Noise: Gaussian noise with a temporal envelope that has been smoothed. Amplitude => Pure Tone Phase => Gaussian Noise

Exp 2: Masking Thresholds Target (Quantization noise) Gaussian noise Gaussian noise Gaussian noise Gaussian noise Masker (Desire signal) Tone Gaussian noise Multiplied noise Low-noise noise 1. Measure masking threshold for four different types of masker 2. Comparing the modified noise thresholds with the tone & Gaussian noise thresholds should indicate which component of the Gaussian noise (Amplitude and/or Phase) contributes to the asymmetry effect.

Method: Adaptive Procedure Trial Number Intensity 1 2 3 4 5 6 7 8 9 10 11 12 75 74 73 72 71 Y 70 Y Y 69 Y N Y 68 N Y N 67 N 66 65 Threshold = average of reversal points (usually 6 or 7) N Y

Lab Write-up 1) Describe the methods of Experiment 1 and the results you obtained. Explain how the threshold results obtained relate to the masking thresholds used in perceptual audio encoding. 2) Describe the methods of Experiment 2 and the results you obtained, highlighting the amplitude and phase characteristics of the two modified noises used. Based on your data, indicate which component (amplitude and/or phase) contributes to the asymmetry of simultaneous masking observed. LAB WRITE-UP DUE Monday, March 7, 2005