Mpeg 1 layer 3 (mp3) general overview

Similar documents
Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Audio-coding standards

Audio-coding standards

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Chapter 14 MPEG Audio Compression

5: Music Compression. Music Coding. Mark Handley

Lecture 16 Perceptual Audio Coding

Multimedia Communications. Audio coding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

ELL 788 Computational Perception & Cognition July November 2015

Appendix 4. Audio coding algorithms

EE482: Digital Signal Processing Applications

Principles of Audio Coding

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Ch. 5: Audio Compression Multimedia Systems

Optical Storage Technology. MPEG Data Compression

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION

Principles of MPEG audio compression

2.4 Audio Compression

CISC 7610 Lecture 3 Multimedia data and data formats

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Audio Coding and MP3

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Parametric Coding of High-Quality Audio

Audio Coding Standards

Audio and video compression

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Data Compression. Audio compression

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform

The MPEG-4 General Audio Coder

AN AUDIO WATERMARKING SCHEME ROBUST TO MPEG AUDIO COMPRESSION

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Bit or Noise Allocation

MPEG-4 General Audio Coding

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

ITNP80: Multimedia! Sound-II!

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

DAB. Digital Audio Broadcasting

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

Efficient Signal Adaptive Perceptual Audio Coding

Parametric Coding of Spatial Audio

Lossy compression. CSCI 470: Web Science Keith Vertanen

CSCD 443/533 Advanced Networks Fall 2017

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

DRA AUDIO CODING STANDARD

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Music & Engineering: Digital Encoding and Compression

Music & Engineering: Digital Encoding and Compression

Audio coding for digital broadcasting

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

Lecture 7: Audio Compression & Coding

Chapter 4: Audio Coding

MUSIC A Darker Phonetic Audio Coder

MPEG-4 aacplus - Audio coding for today s digital media world

Memory Access and Computational Behavior. of MP3 Encoding

HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Rich Recording Technology Technical overall description

Digital Audio for Multimedia

Wavelet filter bank based wide-band audio coder

Compression; Error detection & correction

MPEG-1 Bitstreams Processing for Audio Content Analysis

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

Digital Media. Daniel Fuller ITEC 2110

S.K.R Engineering College, Chennai, India. 1 2

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

AUDIO MEDIA CHAPTER Background

Digital Recording and Playback

Digital Audio Compression

MP3. Panayiotis Petropoulos

Speech and audio coding

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

ISO/IEC INTERNATIONAL STANDARD

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Transporting audio-video. over the Internet

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

Networking Applications

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Bluray (

Port of a fixed point MPEG2-AAC encoder on a ARM platform

A Review of Algorithms for Perceptual Coding of Digital Audio Signals

6MPEG-4 audio coding tools

An Experimental High Fidelity Perceptual Audio Coder

Audio Segmentation and Classification. Abdillahi Hussein Omar

PQMF Filter Bank, MPEG-1 / MPEG-2 BC Audio. Fraunhofer IDMT

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

Transcription:

Mpeg 1 layer 3 (mp3) general overview 1

Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction, etc.) CD Audio = 4.32 Mb/s 2

Bit rate reduction 3

Quantization! Quantization Noise! the difference between the analog signal and the digital representation,! a result of the error in the quantization of the analog signal. N Bits => 2 N levels 4

Quantization! With each increase in the bit level,! the digital representation of the analog signal increases in fidelity,! the quantization noise becomes smaller. Bits Levels 3 8 4 16 5 32 8 256 16 65536 5

Compression! High data rates, such as CD audio (4.32 Mb/s), are incompatible with internet & wireless applications.! Audio data must somehow be compressed to a smaller size (less bits), while not affecting signal quality (minimizing quantization noise).! Perceptual Audio Encoding is the encoding of audio signals, incorporating psychoacoustic knowledge of the auditory system, in order to reduce the amount of bits necessary to faithfully reproduce the signal.! MPEG-1 Layer III (aka mp3)! MPEG-2 Advanced Audio Coding (AAC) 6

MPEG! MPEG = Motion Picture Experts Group! MPEG is a family of encoding standards for digital multimedia information! MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).! Layer I! Layer II! Layer III (aka MP3)! MPEG-2: standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.! Advanced Audio Coding (AAC)! MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for those channels with very limited bandwidths (e.g., wireless channels).! MPEG-7: a content representation standard for information search 7

9.1.2 - Overview of Perceptual Encoding! General Perceptual Audio Encoder! Psychoacoustic analysis => masking thresholds! irrelevancy is identified and then removed! Basic principle of Perceptual Audio Encoder:! use masking pattern of stimulus to determine the least number of bits necessary for each frequency sub-band,! to prevent the quantization noise from becoming audible. 8

Masking! Two major types of masking take place for encoding:! Frequency Masking:! areas of the spectrum that have high energy and are close together can be filtered to eliminate spikes in the spectrum! Temporal Masking:! loud noises that occur close to each other in time (about 3 to 5 milliseconds) can be approximated by a single loud noise 9

Threshold in quiet! Human auditory system has limitations! Frequency range: 20 Hz to 20 khz, sensitive at 2 to 4 KHz.! Dynamic range (quietest to loudest) is about 96 db! Moreover, based on psycho-acoustic characteristics of human hearing, algorithms perform some tricks to further reduce data rate 10

Masking Effects: Frequency Masking! Frequency Masking: If a tone of a certain frequency and amplitude is present, then other tones or noise of similar frequency cannot be heard by the human ear! the louder tone (masker) maskes the softer tone (maskee)! => no need to encode and transfer the softer tone! Different masking when masking and maskee are tones or noise! tone masking noise => strong! noise masking tone => weak 11

Masking Effects (Cont )! Simultaneous masking changes with frequency! Repeat for various frequencies of masking tones! Masking Threshold: Given a certain masker, the maximum nonperceptible amplitude level of the softer tone 12

Masking Pattern Masker Threshold in quiet Masked Sounds 13

Masking 14

Masking/Bit Allocation The number of bits used to encode each frequency sub-band is equal to the least number of bits with a quantization noise that is below the minimum masking threshold for that sub-band. 15

Masking Effects temporal masking! Temporal Masking: If we hear a loud sound, then it stops, it takes a little while until we can hear a soft tone nearby.! The Masking Threshold is used by the audio encoder to determine the maximum allowable quantization noise at each frequency to minimize noise perceptibility: remove parts of signal that we cannot perceive 16

Net effect of masking: 17

MPEG 1 Layer 3 audio coding 9.1.3 18

Layers of MPEG Audio! Many different standard sampling rates:! 16kHz, 22.05kHz, 24kHz, 32 khz, 44.1 khz, 48 khz! Layer I! 12 samples per sub-band (384 total samples)! Compression ratio: approx 4:1! Around 384kbps (depends on chosen sampling rate)! Layer II! 36 samples per sub-band (1152 total samples)! Compression ratio: approx 6:1 to 8:1! Around 256kbps to 192kbps! Layer III! 12 samples per sub-band (384 total samples)! Compression ratio: approx 10:1 to 12:1 19

Perceptual Audio Coder Filtered Audio Signal Audio Input Filterbank Quantization and Rate Control Encoding of Bitstream Perceptual Model Perceptual Threshold Quantized filterbank values, side information Coded Bitstream 20

MPEG 1 Layer 3 (MP3) Encoder Digital Audio Signal (PCM) (768 kbits/s Analysis FilterBank (32 Subbands) FFT 1024 Points 32 Sub- Bands Psycho Acoustic Model MDCT 576 Lines Distortion Control Loop Non Uniform Quantization Rate Control Loop Huffman Encoding Coded Audio Signal at 32-192 KBits/s Multiplexing Perceptual Model 21

MP3 Components! Perceptual model: An estimate of the actual (time and frequency dependent) masking threshold is computed by using rules known from psychoacoustics.! Filter bank: A hybrid polyphase / MDCT filter bank is used to decompose the input signal into sub-sampled spectral components. Together with the corresponding inverse filter bank in the decoder it forms an analysis/synthesis system.! Quantization and coding: The spectral components are quantized and coded with the aim of keeping the noise introduced by the quantization below the masking threshold.! Distortion Control Loop! Non-uniform Quantization Control Loop! Huffman Coding! Multiplexing: A bit stream formatter is used to assemble the bit stream, which consists of the quantized and coded spectral coefficients and some side information, e.g. bit allocation information. 22

Perceptual Model! The perceptual model consists of outputs values for the masking threshold or allowed noise for each coder partition.! In Layer-3, these coder partitions are roughly equivalent to the critical bands of human hearing.! The the compression result should be indistinguishable from the original signal If the quantization noise can be kept below the masking threshold for each coder partition 23

Psychoacoustic Model! Time align audio data! The psychoacoustic model must account for both the delay of the audio data through the filter bank and a data off-set so that the relevant data is centered within its analysis window! Convert audio to spectral domain! The psychoacoustic model uses a time-to-frequency map-ping such as a 512- or 1,024-point Fourier transform! A standard Hanning window, applied to audio data before Fourier transformation, conditions the data to reduce the edge effects of the transform window.! Partition spectral values into critical bands! To simplify the psychoacoustic calculations, the model groups the frequency values into perceptual quanta 24

MPEG Audio Filter Bank Boundaries Finer resolution at lower frequencies 25

Psychoacoustic Model Functions! Incorporate threshold in quiet! This threshold is the lower bound for noise masking and is determined in the absence of masking signals! Separate into tonal and non-tonal components! The model must identify and separate the tonal and noiselike components of the audio signal! Apply spreading function! The model determines the noise-masking thresholds by applying an empirically determined masking or spreading function to the signal components 26

Psychoacoustic Model Functions! Find the minimum masking threshold for each sub-band! The psychoacoustic model calculates the masking thresholds with a higher-frequency resolution than provided by the filter banks.! Where the filter band is wide relative to the critical band (at the lower end of the spectrum), the model selects the minimum of the masking thresholds covered by the filter band.! Where the filter band is narrow relative to the critical band, the model uses the average of the masking thresholds covered by the filter band. 27

Encoding model for Layer I =========è =========è ========è 28

Determine Masking Threshold! algorithm 1. Time align audio data 2. Convert audio to spectral domain 3. Partition spectral values into critical bands 4. Incorporate threshold in quiet 5. Separate into tonal and non-tonal components 6. Apply spreading function 7. Find the minimum masking threshold for each sub-band 8. Calculate signal-to-mask ratio 29

Perceptual Audio Decoder Input Bitstream Decoding of bitstream Inverse Quantization Synthesis Filterbank Output Audio Signal Quantized filterbank values, side information reconstructed frequency lines 30

Stereo coding: problems! Binaural Masking Level Depression (BLMD)! low frequency effect! phase of interaural signals taken into account! è a noise image and a tone image can be in different places.! reduce the masking threshold by up to 20dB! Image distortion or elimination! high frequency effect! a signal with a distorted high-frequency envelope may not provide the same imaging effects in the stereo 31

Stereo coding strategies! Left-Right (L/R) or "Simple" Stereo (SS)! each band coded separately! Intensity Stereo coding (IS)! (left + right) signal (L+R)! and left and right gains gl gr! in high frequency bands! for lower quality coding, equivalent of a pan-pot.! Middle/Side (M/S) stereo coding! M = L + R! S = L R! good for signals:! with strong central images! with a strong surround component 32

Discussion on mp3 coding! Features! Open standard! Availability of encoders and decoders! Supporting technologies! Normative versus Informative! Quality Considerations:! Common types of artifacts! Loss of bandwidth! Pre-echoes! Roughness, double-speak 33

Pre-echoes! Source signal! Reconstructed signal! with block size 1024! with block size 256! Variable length window! better frequency resolution where it is needed! Mixes require transition blocks 34

Bit reservoir! Problem:! A frame with little audio interest may require few bits to encode! with a constant bitrate these bits may be unused! A frame with substantial audio interest may require more bits to encode! Solution:! with a constant bitrate the audio quality may decrease! Allow frames to give to or take from a reservoir! Each frame that save spaces, allows subsequent frames to store bits if needed! typical situation of a silence (few bits) followed by an attack (many bits) 35

Confronti original! Qualità telefonica: 96:1 (2.5 khz / mono / 8 kbps)! Radio AM: 24:1 (7.5 khz / mono / 32 kbps)! 24 kbps, mono! Radio FM: 26...24:1 (11 khz / stereo / 56...64 kbps)! 64 kbps, stereo! quasi-cd: 16:1 (15 khz / stereo / 96 kbps)! CD: 14..12:1 (>15 khz / stereo / 112..128 kbps)! approx. 1MB/minuto di spazio hard-disk! 128 kbps, stereo! Oltre: 8 4:1 per registrazioni moderne ad alta qualità 36