Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Compressed Audio Demystified

Why Music Producers Need to Care About Compressed Audio Files
- Download sales up
- CD sales down
- High-definition hasn't caught on yet
- Consumers don't seem to care about high fidelity

Downloading Audio
- Digital Rights Management (DRM)
- Peer-to-Peer (P2P) file sharing
- Apple's iTunes (128-256 Kbps AAC)
- Amazon MP3 (256 Kbps MP3)

Downloading Audio: iTunes
- 128 Kbps AAC files with DRM
- EMI files are 256 Kbps without DRM

Downloading Audio: iTunes

Downloading Audio: Amazon MP3
- 256 Kbps MP3 files, no DRM

Downloading Audio: Amazon MP3

Goals of Compression Algorithms
- Transmission size
- Archive size
- Maintain audio quality
- Nice idea, but I don't think it will work

Uncompressed Audio Formats
- AIFF: Amiga and Apple's uncompressed format
- WAV: IBM and Microsoft's uncompressed format
- PCM: Pulse Code Modulation; each sample uses all available bits
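Because every PCM sample uses all available bits, the data rate of an uncompressed file follows directly from sample rate, bit depth and channel count. The short sketch below works that out for 16 bit 44.1 kHz stereo and compares it with a 128 Kbps stream. The function name is made up for illustration, and 1 Kbps is assumed to mean 1000 bits per second.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM in Kbps (assuming 1 Kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000.0


# 16 bit, 44.1 kHz, stereo: the format used throughout these slides
cd_rate = pcm_bit_rate_kbps(44_100, 16, 2)
print(cd_rate)  # 1411.2

# How many times smaller a 128 Kbps stream is than the PCM it came from
ratio_128 = cd_rate / 128.0
print(round(ratio_128, 1))  # 11.0
```

So a 128 Kbps lossy file keeps roughly one bit in eleven of the original data rate.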

Compressed Audio Formats
- FLAC: open source, lossless
- MP3: MPEG, lossy
- AAC: MPEG, lossy (the format Apple uses for iTunes)
- OGG (Ogg Vorbis): open source, lossy

Problems with Lossy Codecs
- Pre-Echo & Time Smearing
- Non-Harmonic Distortion
- Loss of Bandwidth
- Birdies

Problems with Lossy Codecs: Loss of Bandwidth
- When the encoder runs out of bits to encode a block of data, frequencies (almost always the high ones) get deleted
- Effectively, the codec becomes a Low Pass Filter (LPF)
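The low-pass behaviour can be pictured with a deliberately simplified toy: bands are paid for from the lowest frequency upward, and whatever is left when the bits run out is dropped. This is not how any real encoder allocates bits. The function names, the equal-width bands and the per-band costs are all assumptions made for illustration.

```python
def keep_bands(band_costs_bits, bit_budget):
    """Pay for bands from low to high; drop everything once the budget is spent."""
    kept = []
    remaining = bit_budget
    for cost in band_costs_bits:
        if cost <= remaining:
            kept.append(True)
            remaining -= cost
        else:
            kept.append(False)
            remaining = 0  # out of bits: all higher bands are deleted too
    return kept


def cutoff_hz(kept, nyquist_hz):
    """Upper edge of the highest surviving band, assuming equal-width bands."""
    band_width = nyquist_hz / len(kept)
    return band_width * sum(kept)


# Hypothetical block: 8 equal bands up to 22.05 kHz, each costing 100 bits
costs = [100] * 8
generous = keep_bands(costs, 800)
starved = keep_bands(costs, 550)
print(cutoff_hz(generous, 22_050.0))  # 22050.0
print(cutoff_hz(starved, 22_050.0))   # 13781.25
```

With the smaller budget the top three bands vanish, which is the low-pass effect the frequency response plots show at lower bit rates.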

Frequency Response: 96K MP3

Frequency Response: 128K MP3

Frequency Response: 160K MP3

Frequency Response: 256K MP3

Problems with Lossy Codecs: Pre-Echo & Time-Smear
- Quantization noise is spread across an entire window
- If the transient occurs late in the window, the noise can actually occur before the attack
Brandenburg, Karlheinz. "MP3 and AAC Explained." AES 17th International Conference, Florence, Italy. New York, NY: AES, 1999.
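A small experiment makes the pre-echo mechanism visible. The sketch below takes one 32-sample window that is silent until a sudden attack near the end, transforms it, quantizes the coefficients coarsely, and transforms back. It uses a plain DCT rather than the MDCT of a real codec, and the window length, attack position and step size are arbitrary values chosen for illustration.

```python
import math


def dct(x):
    """Orthonormal DCT-II of one block."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k)
                               for t in range(n)))
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * coeffs[k]
                * math.cos(math.pi / n * (t + 0.5) * k) for k in range(n))
            for t in range(n)]


def quantize(coeffs, step):
    """Round every coefficient to the nearest multiple of step."""
    return [round(c / step) * step for c in coeffs]


N = 32
ATTACK = 24  # the transient arrives late in the window
block = [0.0] * ATTACK + [1.0] * (N - ATTACK)

exact = idct(dct(block))                        # no quantization
decoded = idct(quantize(dct(block), step=0.1))  # coarse quantization

# Energy in the part of the window that was silent in the original
pre_echo_energy = sum(v * v for v in decoded[:ATTACK])
print(pre_echo_energy > 0.0)  # True: noise now precedes the attack
```

Each quantized coefficient corresponds to a cosine that spans the whole window, so its rounding error cannot stay confined to the samples after the attack.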

Visual Time Alignment

Visual Time Alignment: Zoom 1

Visual Time Alignment: Zoom 2

Problems with Lossy Codecs: Double-Speak
- A single transient gets moved in time, so the stereo channels no longer agree on when the transient occurred
- More common at low bit rates such as 64 Kbps and 96 Kbps

16 bit 44.1 kHz WAV format
- WAV 16 bit 44.1 kHz uncompressed
- Left marker is negative peak top
- Right marker is negative peak bottom

16 bit 44.1 kHz MP3 @ 96 Kbps
- MP3 96 Kbps
- Time smear: negative peak in both waves is now in the same sample

16 bit 44.1 kHz WAV format and 16 bit 44.1 kHz MP3 @ 96 Kbps
- WAV and MP3 96 Kbps
- Overlay of the 2 waveforms

16 bit 44.1 kHz MP3 @ 128 Kbps
- MP3 128 Kbps

16 bit 44.1 kHz WAV format and 16 bit 44.1 kHz MP3 @ 128 Kbps
- WAV and MP3 128 Kbps

16 bit 44.1 kHz MP3 @ 160 Kbps
- MP3 160 Kbps

16 bit 44.1 kHz WAV format and 16 bit 44.1 kHz MP3 @ 160 Kbps
- WAV and MP3 160 Kbps

16 bit 44.1 kHz MP3 @ 256 Kbps
- MP3 256 Kbps

16 bit 44.1 kHz WAV format and 16 bit 44.1 kHz MP3 @ 256 Kbps
- WAV and MP3 256 Kbps

Phase Scope Views
- 16 bit 44.1 kHz WAV
- 96 Kbps MP3
- 128 Kbps MP3
- 256 Kbps MP3

Psychoacoustic or Perceptual Modeling
- Creating models of how people hear
- Limitations in physiology inform which components of an audio signal can be eliminated
- Higher frequencies can be eliminated
- Masked sounds can be eliminated
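One concrete piece of such a model is the threshold of hearing in quiet: a component quieter than this curve is inaudible even with nothing masking it. The slides do not name a formula. The sketch below uses Terhardt's widely quoted approximation, which is an assumption brought in here for illustration, and the function name is likewise made up.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


for freq in (100, 1000, 3300, 10000, 16000):
    print(freq, round(threshold_in_quiet_db(freq), 1))
```

The curve is lowest around 3 to 4 kHz and climbs steeply toward the top of the audible range, which is one reason the highest frequencies are the first candidates for removal.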

Perceptual Encoding/Decoding System
Brandenburg, Karlheinz. "MP3 and AAC Explained." AES 17th International Conference, Florence, Italy. New York, NY: AES, 1999.

MP3 Encoder
Brandenburg, Karlheinz. "MP3 and AAC Explained." AES 17th International Conference, Florence, Italy. New York, NY: AES, 1999.

MP3 Encoder Components
- Hybrid Filterbank: a 32-band polyphase filterbank, then additional subdivision of each band with an MDCT, down to 576 total frequency lines
- Perceptual Model: either uses its own filterbank for its calculations, or combines its masking calculations with the main filterbank data

MP3 Encoder cont.
- Quantization uses 2 nested loops
- Inner Loop (rate loop): quantizes and codes the block of data; if the result needs more bits than are available, it adjusts the global gain (a coarser quantization step) until the block conforms to the allowed number of bits
- Outer Loop (noise control loop): shapes the quantization noise in each frequency band with scale factors until the noise is below the masking threshold generated by the perceptual model
- When encoded, each frame has a header containing a sync word and the bit rate, among other things
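The interplay of the two loops is easier to follow in code. The following is a toy sketch only: the bit-count estimate stands in for real Huffman coding, the per-band `scales` play the role of scale factors, and every name, step size and threshold is a hypothetical choice rather than anything taken from the MP3 standard.

```python
def quantize_band(values, step, scale):
    """Amplify a band by its scale, then round to integer multiples of step."""
    return [round(v * scale / step) for v in values]


def band_noise(values, quantized, step, scale):
    """Mean squared error of a band after undoing the scale."""
    return sum((v - q * step / scale) ** 2
               for v, q in zip(values, quantized)) / len(values)


def bits_needed(quantized_bands):
    """Crude stand-in for entropy coding: 1 bit per zero, sign + magnitude otherwise."""
    return sum(1 if q == 0 else 1 + abs(q).bit_length()
               for band in quantized_bands for q in band)


def inner_loop(bands, scales, bit_budget, step=0.01):
    """Rate loop: coarsen the global step until the frame fits the bit budget."""
    if bit_budget < sum(len(b) for b in bands):
        raise ValueError("budget too small even for an all-zero frame")
    while True:
        quantized = [quantize_band(b, step, s) for b, s in zip(bands, scales)]
        if bits_needed(quantized) <= bit_budget:
            return step, quantized
        step *= 2 ** 0.25


def outer_loop(bands, thresholds, bit_budget, max_rounds=20):
    """Noise loop: boost bands whose noise exceeds their masking threshold."""
    scales = [1.0] * len(bands)
    for _ in range(max_rounds):
        step, quantized = inner_loop(bands, scales, bit_budget)
        noisy = [i for i, (b, q) in enumerate(zip(bands, quantized))
                 if band_noise(b, q, step, scales[i]) > thresholds[i]]
        if not noisy:
            break
        for i in noisy:
            scales[i] *= 2 ** 0.5  # finer effective quantization for this band
    return step, scales, quantized


# Hypothetical frame: three bands of four spectral values each
bands = [[0.9, -0.7, 0.5, 0.3], [0.2, -0.15, 0.1, 0.05], [0.02, -0.01, 0.015, 0.0]]
thresholds = [1e-3, 1e-4, 1e-4]
BUDGET = 40
step, scales, quantized = outer_loop(bands, thresholds, BUDGET)
print(step, scales, bits_needed(quantized))
```

The loop gives up after `max_rounds` passes, so with a tight budget some bands may still end up above their threshold. That is the situation in which artifacts become audible.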

AAC Encoder Explained
- The filterbank is similar to MP3's, but AAC uses only an MDCT, with 1024 frequency lines (MP3 uses 576)
- Temporal Noise Shaping (TNS): originally designed for better speech encoding at lower bit rates; it works by open-loop prediction in the frequency domain

Ogg Vorbis and How It Works
- An open source (free!), lossy audio compression codec
- Ogg is the container that holds the Vorbis data
- Designed as the better sounding, open source lossy compression replacement for MP3

Ogg Vorbis
- Like MPEG, uses the Modified Discrete Cosine Transform (MDCT) on blocks of audio
- Once the MDCT has frequency information for each block, the noise floor is separated from the rest of the components
- Quantized using a variable bit rate based on a psychoacoustic model, lowering the bit rate of sounds that will be masked
- Encoding is always variable bit rate
- Bit rate varies from sample to sample
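The MDCT works on overlapping blocks: each block of 2N samples yields N coefficients, and the aliasing that creates cancels when neighbouring blocks are overlap-added. Below is a minimal, slow, pure-Python sketch of that round trip with a sine window. The block size and test signal are arbitrary, and this is not the transform code of any particular codec.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[t]^2 + w[t + length/2]^2 = 1."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]


def mdct(block, window):
    """2N windowed samples -> N coefficients."""
    n = len(block) // 2
    return [sum(block[t] * window[t]
                * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed samples that still contain aliasing."""
    n = len(coeffs)
    return [window[t] * (2.0 / n)
            * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                  for k in range(n))
            for t in range(2 * n)]


def round_trip(signal, n):
    """Transform 50%-overlapping blocks and overlap-add the inverse transforms."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        coeffs = mdct(signal[start:start + 2 * n], window)
        for offset, value in enumerate(imdct(coeffs, window)):
            out[start + offset] += value
    return out


N = 8
signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
rebuilt = round_trip(signal, N)

# The first and last N samples are covered by only one block, so compare the middle
worst = max(abs(a - b) for a, b in zip(signal[N:-N], rebuilt[N:-N]))
print(worst < 1e-9)  # True
```

Without quantization the transform itself loses nothing. In a lossy codec the loss comes from quantizing the coefficients between `mdct` and `imdct`.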

Ogg Vorbis = better??
- Differs from MP3 in its failure mode (when the bit rate is pushed so low that perceptible loss occurs)
- Can raise the noise floor to cover those distortions, which is often heard as reverberation rather than the metallic birdies of MP3 compression

FLAC and How It Works
- Free Lossless Audio Codec
- Does not remove data from the audio stream
- Allows for data compression rates in the 30-50% range

Flac Processing
I. Blocking - the input is broken into blocks of data of different sizes. The ideal size for each block is determined by examining many factors, including sample rate, spectral content, etc. (As in most compression codecs, blocks with transient material are typically given a smaller size.)
II. Interchannel Decorrelation - the encoder creates mid and side signals from the left and right input channels.

Flac Cont.
III. Prediction - each block goes through an encoder, which tries to make a mathematical approximation of the signal. Only the parameters of the predictor need to be included in the compressed file.
- 4 types of prediction (verbatim, constant, fixed linear predictor, and FIR linear prediction)
- Flac can change prediction types for each block
IV. Residual Coding - the predicted signal is subtracted from the original signal, leaving the residue (residual) to be coded losslessly. The residual signal requires fewer bits to encode.
Encode-Decode-Verify
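Steps II to IV can be sketched on a handful of integer samples. The example below forms mid and side channels, runs a fixed second-order linear predictor over the mid channel, and checks that the decoder recovers every sample exactly. The sample values and function names are made up for illustration, and the entropy coding of the residual is left out.

```python
def to_mid_side(left, right):
    """Interchannel decorrelation: mid is the floored average, side the difference."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Exact inverse: the low bit of side restores the bit lost by >> 1."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


def residual_order2(samples):
    """Fixed predictor: guess x[n] = 2*x[n-1] - x[n-2] and keep only the error."""
    warmup = samples[:2]  # stored as-is so the decoder can start predicting
    residual = [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                for n in range(2, len(samples))]
    return warmup, residual


def restore_order2(warmup, residual):
    """Decoder: rebuild each sample from its prediction plus the stored error."""
    out = list(warmup)
    for r in residual:
        out.append(r + 2 * out[-1] - out[-2])
    return out


# Hypothetical smooth stereo block
left = [0, 10, 19, 27, 34, 40]
right = [0, 8, 17, 25, 30, 37]

mid, side = to_mid_side(left, right)
warmup, residual = residual_order2(mid)
print(mid)       # [0, 9, 18, 26, 32, 38]
print(residual)  # [0, -1, -2, 0]

# Encode-Decode-Verify: the round trip must be bit-exact
decoded_mid = restore_order2(warmup, residual)
decoded_left, decoded_right = from_mid_side(decoded_mid, side)
print(decoded_left == left and decoded_right == right)  # True
```

The residual values are far smaller than the samples they replace, which is why they take fewer bits to store, and nothing is discarded along the way.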

Cut Audio Into Blocks of 1024 Samples

Selecting 1st 1024 Sample Block

Checking Spectrum in each Block

Checking Spectrum 2nd Block

Checking Spectrum 3rd Block

Checking Spectrum 4th Block

Spectrum Comparison: Block 1 and 2

Spectrum Comparison: Block 2 and 3

Spectrum Comparison: Block 3 and 4

File Size Comparisons in KB
[Bar chart; vertical axis from 0 to 60000 KB. Bars: 16 bit 44.1 kHz; FLAC 8, FLAC 5, FLAC 3, FLAC 0; MP3 96, 128, 160, 256; OGG 96, 128, 160, 256; AAC 96, 128, 160, 256]
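For constant bit rate files, the size can be estimated from bit rate and duration alone. The sketch below does that for the four lossy bit rates in the chart. The 300 second duration is a hypothetical value, not the length of the track measured in the slides, and container overhead and tags are ignored.

```python
def cbr_file_size_kb(bit_rate_kbps, duration_s):
    """Approximate audio payload in KB (1 KB = 1000 bytes, 8 bits per byte)."""
    return bit_rate_kbps * duration_s / 8.0


DURATION_S = 300  # assumed track length: 5 minutes

for rate in (96, 128, 160, 256):
    print(rate, cbr_file_size_kb(rate, DURATION_S))
```

Variable bit rate encodes (Ogg Vorbis, VBR MP3) and FLAC do not follow this formula, because their size depends on the material itself.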

FLAC and How It Works
- I wasn't able to hear ANY difference in quality between FLAC at any of the compression settings and the 16 bit 44.1 kHz uncompressed PCM

Phase Inversion Comparisons
- Comparing 3 formats to 16/44.1 PCM: MP3 256 Kbps, OGG 256 Kbps, FLAC at 0 compression
- DAW visual comparison of 4 tracks

Hey Nineteen Listening Examples: Phase Inversion Mixes
- Uncompressed PCM 24 bit 96 kHz
- 256 Kbps MP3
- 256 Kbps OGG
- FLAC with all compression settings

Methodology
- Convert the compressed formats back up to 24 bit 96 kHz PCM
- Mix the original PCM with the upconverted compressed files with phase inverted
- Time-align the waveforms and repeat the mix procedure
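The same null test can be sketched offline on lists of samples: find the offset that best lines the decoded file up with the original, then add the original to the polarity-inverted decode and measure what is left. The helper names and the synthetic test signal are assumptions for illustration. A real comparison would read the sample data from the audio files.

```python
import math


def best_lag(reference, other, max_lag):
    """Offset (in samples) of `other` that correlates best with `reference`."""
    def score(lag):
        return sum(reference[t] * other[t + lag]
                   for t in range(len(reference)) if 0 <= t + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=score)


def null_residual(reference, other, lag=0):
    """Mix the reference with the polarity-inverted, lag-compensated other signal."""
    return [reference[t] - other[t + lag]
            for t in range(len(reference)) if 0 <= t + lag < len(other)]


def rms_db(samples):
    """RMS level in dB relative to full scale 1.0 (-inf for digital silence)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return -math.inf if rms == 0 else 20 * math.log10(rms)


# Synthetic stand-in for a track: a decaying tone, and a copy delayed by 3 samples
reference = [math.exp(-t / 30) * math.sin(0.9 * t) for t in range(200)]
delayed = [0.0] * 3 + reference[:-3]

lag = best_lag(reference, delayed, max_lag=10)
print(lag)  # 3

unaligned = null_residual(reference, delayed)      # large residual
aligned = null_residual(reference, delayed, lag)   # perfect null
print(rms_db(unaligned) > rms_db(aligned))  # True
```

A lossless format should null completely once aligned, while whatever remains for a lossy format is the material the codec changed. Skipping the alignment step makes even identical audio appear different.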

Hey Nineteen Listening Examples: Phase Inversion Mixes with Time Alignment
- Uncompressed PCM 24 bit 96 kHz
- 256 Kbps MP3
- 256 Kbps OGG
- FLAC with 0 compression setting

Summary of Different Formats: Benefits/Problems
- FLAC: lossless, sounds the best
- Ogg Vorbis and AAC: high quality at bit rates of 160 Kbps and better; bigger files sound better
- MP3: sound is passable at 200 Kbps VBR and 256 Kbps fixed bit rate

Who Wins the Golden Headphones?
- 1st Place: FLAC (It Really IS Lossless!)
- 2nd Place: Ogg Vorbis
- 3rd Place: AAC
- Honorable Mention: MP3