Audio Coding and MP3

Similar documents
2.4 Audio Compression

Audio-coding standards

Audio-coding standards

Principles of Audio Coding

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

5: Music Compression. Music Coding. Mark Handley

Mpeg 1 layer 3 (mp3) general overview

ELL 788 Computational Perception & Cognition July November 2015

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Speech and audio coding

Optical Storage Technology. MPEG Data Compression

Data Compression. Audio compression

Lecture 16 Perceptual Audio Coding

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Ch. 5: Audio Compression Multimedia Systems

Chapter 14 MPEG Audio Compression

CISC 7610 Lecture 3 Multimedia data and data formats

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

ITNP80: Multimedia! Sound-II!

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Parametric Coding of High-Quality Audio

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Audio and video compression

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Filterbanks and transforms

The MPEG-4 General Audio Coder

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Audio Coding Standards

EE482: Digital Signal Processing Applications

Appendix 4. Audio coding algorithms

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

MPEG-4 General Audio Coding

Chapter 4: Audio Coding

Lecture 7: Audio Compression & Coding

CS 335 Graphics and Multimedia. Image Compression

Pyramid Coding and Subband Coding

Application of wavelet filtering to image compression

Transporting audio-video. over the Internet

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Pyramid Coding and Subband Coding

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Compression; Error detection & correction

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Speech-Coding Techniques. Chapter 3

CHAPTER 6 Audio compression in practice

Parametric Coding of Spatial Audio

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

DRA AUDIO CODING STANDARD

Lecture Information Multimedia Video Coding & Architectures

PQMF Filter Bank, MPEG-1 / MPEG-2 BC Audio. Fraunhofer IDMT

Image, video and audio coding concepts. Roadmap. Rationale. Stefan Alfredsson. (based on material by Johan Garcia)

Principles of MPEG audio compression

Lossy compression. CSCI 470: Web Science Keith Vertanen

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Implementation of a MPEG 1 Layer I Audio Decoder with Variable Bit Lengths

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

Audio issues in MIR evaluation

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Mahdi Amiri. February Sharif University of Technology

DAB. Digital Audio Broadcasting

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

DigiPoints Volume 1. Student Workbook. Module 8 Digital Compression

Multimedia Communications. Audio coding

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

GSM Network and Services

Digital Speech Coding

Audio coding for digital broadcasting

Port of a fixed point MPEG2-AAC encoder on a ARM platform

Bit or Noise Allocation

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

CS 074 The Digital World. Digital Audio

MPEG-l.MPEG-2, MPEG-4

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

Implementation of FPGA Based MP3 player using Invers Modified Discrete Cosine Transform

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

Skill Area 214: Use a Multimedia Software. Software Application (SWA)

Fundamental of Digital Media Design. Introduction to Audio

Digital Media. Daniel Fuller ITEC 2110

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

MP3. Panayiotis Petropoulos

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

ISO/IEC INTERNATIONAL STANDARD

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Compression; Error detection & correction

Rich Recording Technology Technical overall description

REAL-TIME DIGITAL SIGNAL PROCESSING

Transcription:

Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1

Analogue audio frequencies: 20Hz - 20kHz mono: x(t) scalar stereo: xr ( t) x( t) = xl ( t) Audio Compression small files, low data rate at transmission reconstruction must be (as much as possible) similar to original signal redundancy (lossless coding) irrelevancy (do not code what you cannot hear) 2

Data rates Quality Sample Rate Bit/Sample Channels Data Rate kb/s Frequency Telephone 8.000 8 Mono 64,00 200-3400 MW 11.025 8 Mono 88,00 UKW 22.050 16 Stereo 705,60 CD 44.100 16 Stereo 1411,00 20-20000 DAT 48.000 16 Stereo 1536,00 20-20000 Dynamics compression A-Law A abs( S) sign( S) S' = 1+ ln A 1+ ln( A abs( S)) sign( S) 1+ ln A for else abs(s) 1 A µ-law 1+ ln(1 + µ abs( S)) S' = sign( S), µ = 255 ln(1 + µ ) 3

Masking Masking Threshold for human ear Threshold changes: neighbouring frequencies (Example 0.5, 1, 4, 8 khz) in time 4

Sampling When x(t) is bandwidth-limited: f > ω x( f ) = 0 then n= [] n x ( t) = x g( t n t) with = 1 < 1 2ω t x[ n] = x( n t) f s sin(2πωt) g( t) = 2πωt Quantisation x Q(x) k bits { y,, } K 1 y n L = 2 k representations x y x y Q( x) = i j y i 5

PCM = Pulse Code Modulation Sampling: Quantisation: Coding: { x( t) } { x[ n] } { x[ n] } { Q( x[ n] )} Q( { x[ n] }) { } n i redundancy irrelevancy Play: y ( t) Q( x[ n ]) g( t n t) = i i Stereo CD Audio Data rate: 2 16 bit 44.1 10 = 1411.2 10-3 3 s 1 bit s 6

MPEG compression factors MPEG 1 Audio: PCM 32, 44.1, 48 khz, max 448 kbit/s MPEG 2 Audio: PCM 16, 22.05, 24, 32, 44.1, 48 khz, max 384 KBit/s MPEG Audio Layer I,II,III Layer I Layer II Digital TV Layer III MP3 7

MP3 - MPEG 1 Audio Layer 3 Sampling: 16 khz - 48 khz Bit rate: 32 kb/s - 192 kb/s (CD Audio: 44.1 khz, 1411 kb/s) www.iis.fhg.de/amm/gallery/index.html Karlheinz Brandenburg: MP3 and AAC explained http://www.exp-math.uni-essen.de/~dreibh/diplom/bra99.pdf perceptual encoding / decoding 8

Filterbank Ideal sub-band coder impossible: ideal sub-band coder downsampling aliasing possible: nearly perfect H m 1 for f Dm, m = 1, K, M ( f ) = 0 else 9

Downsampling from M f s back to sub-bandwidth B, upper frequency is multiple of B f s can sample at f s = 2B = 2M B f s (instead of ) x m [] n [ k] y m M [ k] = x [ k M ] m y m Filterbank in MPEG-1 audio layer 1-3 Polyphase filterbank 32 subbands 512 tap FIR-filters 80 + and * per output Equal width Not perfect reconstruction Frequency overlap 10

A closer look The subbands overlap at 3 db to the adjacent bands. The leakage to the other bands is small. The total response almost adds up to one (0 db). White noise The white noise run through the filterbank. The samples from each band are played in the order of the subbands. The subsampled filtered sequence. The samples from each band are played in the order of the subbands. The reconstruction error is 84 db. 11

Nonideal filterbanks Y( e M 1 n= 1 jω X( e ) = X( e jω 2 πn j ω M In a perfect filterbank the first part is the only part. M 1 1 R jω A jω ) Hk ( e ) Hk ( e ) + k= 0 14 M 444 24444 3 1 M 1 1 j ω R jω A M ) Hk ( e ) Hk ( e ) k= 0 14 M 4444 244444 3 0 2πn The second part consists of the aliasing terms. The filterbank is designed so that the aliasing is small. Tubthumper, a time domain view The red line is the reconstruction error after splitting the signal in subbands, down sampling and applying the synthesis filterbank. The reconstruction error is 84 db and sounds like 12

Tubthumper, frequency view Subband 1 2 4 8 16 32 Center frequency [khz] No subsampling 0.3 1.0 2.4 5.2 10.7 21.7 Subsampled 32 times Filterbank MPEG polyphase 12 samples 12 samples 12 samples filterbank band 1 band 2... Layer I frame 384 samples band 31 Layer II/III frame 1152 samples 13

Critical Bands Heinrich Barkhausen (1881-1956) psycho-acoustic width measured in bark f /100 1bark = 9 + 4 log( f /1000) for else f < 500 MPEG - Sub bands Layer I: 32 bands, 625 Hz each, Fourier transform Layer II: 32 bands, three frames, time masking Layer III: Division according to critical bands 14

MPEG masking Psycho-acoustic model masking of neighbouring bands signals are coded when above masking threshold MUSICAM (Masking-pattern adapted Universal Subband Integrated Coding and Multiplexing) Layer I: simplified, Layer II: entirely, Layer III: with other methods Example: Masking MPEG Audio band level masking coding 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 8 12 10 6 2 10 60 35 20 15 2 3 5 3 1?????? 12 x 15????????????? - x x??????? 15

MPEG-1 Layer 3 encoder MP3 Filter bank - sub bands Series MDCT fine grain frequency resolution non-uniform quantisation perception model Huffman coding 16

MP3 (vs. Layer I/II) modified DCT (Series MDCT vs. FFT) critical bands Huffman coding entropy reduction dynamics compression difference and sum of stereo signals MPEG Audio Layer I,II,III Layer I: 19 ms delay, FFT, 384 samples, frequency masking, equal bands Layer II: 35 ms delay, FFT, 1152 samples, frequency masking, time simulated, equal bands Layer III: 59 ms delay, DCT, 1152 samples, frequency and time masking, bands as in bark scale 17

MPEG Layer I, II, III subj. quality bandwidth compression 1 min audio Audio CD CD 1400 1:1 10.58 MB MPEG1 Layer I CD 384 3.6:1 2.88 MB MPEG1 Layer II CD 256 5.5:1 1.92 MB MPEG1 Layer III CD 128 11:1 962 kb MPEG2 Layer III Radio 64 22:1 481 kb MPEG2 Layer III Telephone 16 88:1 120 kb CS-ACELP Speech 5,30 264:1 40 kb MPEG-2 AAC 18

Audio Formats PCM - Pulse Code Modulation ITU G.711; speech data 4kHz bandwidth, 64 kb/s data rate ADPCM (Adaptive Differential PCM) ITU G.726, G.727; 16, 24, 32, 40 kbit/s. Standard for CCITT G.721 SB-ADPCM (Sub-Band ADPCM) ISDN, G.722; 7 khz bandwidth in 64 kbit/s streams Audio Formats AIFF - Audio Interchange File Format Apple (extension from IFF by Electronic Arts) Wave (by Microsoft and IBM) Part of RIFF (Resource Interchange File Format) NeXT/Sun Audio File Format! big endian 19

Proprietary Audio Formats AT&T Proprietary Compression Algorithm EPAC (Bell Labs) Microsoft Windows Media Audio (WMA) AC-3 Audio Code No. 3 - Dolby Digital Surround Speech compression formats GSM 06-10: 160 13-bit values in 260 Bit (33 Byte) are compressed; 8000 samples/s result in data rate of 1650 Byte/s CELP (Code Excited Linear Prediction): analytical model LD-CELP (Low Delay CELP): G.728 LPC-10E (Linear Prediction Coder (Enhanced): military coder, analytical model, 2.4 kbit/s understandable, but low quality. 20

End of Part Thank you for your attention! 21