Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Compressed Audio Demystified

Why Music Producers Need to Care About Compressed Audio Files
Download sales are up; CD sales are down.
High-definition audio hasn't caught on yet.
Consumers don't seem to care about high fidelity.

Downloading Audio
Digital Rights Management (DRM)
Peer-to-peer (P2P) file sharing
Apple's iTunes (128-256 kbps AAC)
Amazon MP3 (256 kbps MP3)

Downloading Audio: iTunes
128 kbps AAC files with DRM
EMI files are 256 kbps without DRM

Downloading Audio: iTunes

Downloading Audio: Amazon MP3
256 kbps MP3 files, no DRM

Downloading Audio: Amazon MP3

Goals of Compression Algorithms
Reduce transmission size
Reduce archive size
Maintain audio quality
"Nice idea, but I don't think it will work"

Uncompressed Audio Formats
AIFF: Apple's uncompressed format (based on the IFF format common on the Amiga)
WAV: IBM and Microsoft's uncompressed format
PCM (pulse-code modulation): each sample uses all available bits
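Because every PCM sample uses all available bits, the size of an uncompressed stream follows directly from sample rate, bit depth and channel count. A minimal Python sketch; the function names are hypothetical, and mapping floats in [-1.0, 1.0] to signed integer codes is one common convention, assumed here:

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bit rate of uncompressed PCM: every sample in every channel uses all bits."""
    return sample_rate_hz * bit_depth * channels


def float_to_pcm(sample: float, bit_depth: int) -> int:
    """Quantize a float in [-1.0, 1.0] to a signed integer code, clipping overloads."""
    max_code = 2 ** (bit_depth - 1) - 1
    min_code = -(2 ** (bit_depth - 1))
    return max(min_code, min(max_code, round(sample * max_code)))


cd_rate = pcm_bitrate_bps(44_100, 16, 2)
print(cd_rate)             # 1411200 bits per second, about 1411 kbps
print(cd_rate / 256_000)   # 5.5125: a 256 kbps file needs about 1/5.5 of the bits
print(float_to_pcm(1.0, 16), float_to_pcm(-2.0, 16))  # 32767 -32768 (clipped)
```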

Compressed Audio Formats
FLAC: open-source lossless
MP3: MPEG lossy
AAC: MPEG lossy (popularized by Apple)
Ogg Vorbis: open-source lossy

Problems with Lossy Codecs
Pre-echo and time smearing
Non-harmonic distortion
Loss of bandwidth
Birdies

Problems with Lossy Codecs: Loss of Bandwidth
When the encoder runs out of bits to encode a block of data, frequencies (almost always the high ones) get deleted.
Effectively, the codec becomes a low-pass filter (LPF).
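The low-pass effect can be illustrated with a deliberately simplified model: spend a block's bit budget on frequency bands from the bottom up and drop whatever no longer fits. This is a sketch, not how any particular encoder allocates bits; the band layout, per-band costs and the function name are all hypothetical:

```python
def kept_bandwidth_hz(band_costs_bits, budget_bits, band_width_hz):
    """Toy model: fund bands from lowest to highest frequency and drop the
    rest once the bit budget is exhausted. Returns the effective cutoff in Hz."""
    spent = 0
    kept = 0
    for cost in band_costs_bits:
        if spent + cost > budget_bits:
            break  # out of bits: this band and every band above it are deleted
        spent += cost
        kept += 1
    return kept * band_width_hz


# Hypothetical block: 32 equal bands covering 0 to 22,050 Hz, 100 bits each.
band_width = 22_050 / 32
costs = [100] * 32
for budget in (3200, 2400, 1600):
    print(budget, kept_bandwidth_hz(costs, budget, band_width))
# 3200 22050.0
# 2400 16537.5
# 1600 11025.0
```

Fewer bits per block means a lower effective cutoff, which matches the trend in the frequency-response plots at lower bit rates.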

Frequency Response: 96 kbps MP3

Frequency Response: 128 kbps MP3

Frequency Response: 160 kbps MP3

Frequency Response: 256 kbps MP3

Problems with Lossy Codecs: Pre-Echo and Time Smearing
Quantization noise is spread across an entire transform window.
If a transient occurs late in the window, the noise can actually occur before the attack.
Brandenburg, Karlheinz. "MP3 and AAC Explained." 17th International AES Conference, Florence, Italy. New York, NY: AES, 1999.
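The mechanism can be reproduced in a few lines. A minimal sketch, assuming a plain orthonormal DCT-II as a stand-in for the MDCT that MP3 and AAC actually use, a uniform quantizer with a hypothetical step size, and a single 64-sample block that is silent until a late attack:

```python
import math


def dct(x):
    """Orthonormal DCT-II (a stand-in for the codec's MDCT)."""
    n_len = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n_len)
            * sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
            for k in range(n_len)]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n_len = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n_len) * coeffs[k]
                * math.cos(math.pi / n_len * (n + 0.5) * k) for k in range(n_len))
            for n in range(n_len)]


def pre_echo_energy(block_len=64, attack_at=56, step=0.25):
    """Energy of decoded noise landing before the attack (the original is silent there)."""
    block = [0.0] * attack_at + [1.0] * (block_len - attack_at)  # late, sudden attack
    coarse = [step * round(c / step) for c in dct(block)]        # uniform quantizer
    decoded = idct(coarse)
    return sum(s * s for s in decoded[:attack_at])


print(pre_echo_energy())  # > 0: quantization noise is present before the attack
```

Each quantization error in the frequency domain turns into a cosine spanning the whole block, which is why the noise cannot stay behind the transient.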

Visual Time Alignment

Visual Time Alignment: Zoom 1

Visual Time Alignment: Zoom 2

Problems with Lossy Codecs: Double-Speak
A single transient gets moved in time so that the stereo channels no longer agree on when the transient occurred.
This is more common at low bit rates such as 64 kbps and 96 kbps.
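One way to check for double-speak is to compare where the transient lands in each channel before and after encoding, in the spirit of the negative-peak markers used in the waveform views. A minimal sketch; the function names and all sample values are hypothetical:

```python
def negative_peak_index(samples):
    """Index of the most negative sample (like the markers in the waveform views)."""
    return min(range(len(samples)), key=lambda i: samples[i])


def interchannel_peak_offset(left, right):
    """Samples by which the right channel's negative peak trails the left's."""
    return negative_peak_index(right) - negative_peak_index(left)


# Hypothetical stereo transient: in the original, the right channel peaks 3 samples later.
orig_left = [0, 0, -9, -2, 0, 0, 0, 0]
orig_right = [0, 0, 0, 0, -1, -9, 0, 0]
# Hypothetical decoded version: the right channel's transient has moved.
dec_left = [0, -1, -8, -3, 0, 0, 0, 0]
dec_right = [0, -1, -8, -2, 0, 0, 0, 0]

print(interchannel_peak_offset(orig_left, orig_right))  # 3
print(interchannel_peak_offset(dec_left, dec_right))    # 0: peaks now on the same sample
```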

16-bit 44.1 kHz WAV (uncompressed)
Left marker: negative peak of the top waveform. Right marker: negative peak of the bottom waveform.

16-bit 44.1 kHz MP3 at 96 kbps
Time smear: the negative peaks of both waveforms now fall on the same sample.

16-bit 44.1 kHz WAV vs. MP3 at 96 kbps: overlay of the two waveforms

16-bit 44.1 kHz MP3 at 128 kbps

16-bit 44.1 kHz WAV vs. MP3 at 128 kbps

16-bit 44.1 kHz MP3 at 160 kbps

16-bit 44.1 kHz WAV vs. MP3 at 160 kbps

16-bit 44.1 kHz MP3 at 256 kbps

16-bit 44.1 kHz WAV vs. MP3 at 256 kbps

Phase Scope Views: 16-bit 44.1 kHz WAV, 96 kbps MP3, 128 kbps MP3, 256 kbps MP3

Psychoacoustic or Perceptual Modeling
Creating models of how people hear
Limitations in physiology inform which components of an audio signal can be eliminated:
Higher frequencies can be eliminated
Masked sounds can be eliminated
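The "higher frequencies can be eliminated" point follows from the threshold of hearing, which rises steeply above a few kHz. A minimal sketch using Terhardt's widely cited approximation of the threshold in quiet; the spectrum values and function names are hypothetical, and a real perceptual model would also compute masking between neighbouring components, which this sketch leaves out:

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).
    freq_hz must be greater than 0."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db_spl) pairs above the threshold in quiet.
    No masking is modeled here."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]


# Hypothetical spectrum: the same 40 dB SPL level at three frequencies.
spectrum = [(1_000, 40.0), (4_000, 40.0), (18_000, 40.0)]
print(audible_components(spectrum))  # [(1000, 40.0), (4000, 40.0)]
```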

Perceptual Encoding/Decoding System
Brandenburg, Karlheinz. "MP3 and AAC Explained." 17th International AES Conference, Florence, Italy. New York, NY: AES, 1999.

MP3 Encoder
Brandenburg, Karlheinz. "MP3 and AAC Explained." 17th International AES Conference, Florence, Italy. New York, NY: AES, 1999.

MP3 Encoder Components
Hybrid filterbank: a 32-band polyphase filterbank, followed by an MDCT that subdivides each band further, for 576 frequency lines in total (32 × 18)
Perceptual model: either uses its own filterbank (an FFT) for its calculations, or combines its masking calculations with the main filterbank data
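The 576 figure is 32 subbands times 18 MDCT lines per subband (long blocks), and each MDCT turns 36 overlapping subband samples into 18 coefficients. A minimal sketch of a direct MDCT with a sine window; it illustrates the transform and the arithmetic only, not the full Layer III hybrid filterbank, and the names are illustrative:

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT with a sine window: 2N input samples -> N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    windowed = [x * math.sin(math.pi / two_n * (n + 0.5)) for n, x in enumerate(block)]
    return [sum(windowed[n] * math.cos(math.pi / n_half
                                       * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]


SUBBANDS = 32            # first stage: polyphase filterbank
LINES_PER_SUBBAND = 18   # second stage: MDCT over 36 subband samples (long blocks)

total_lines = SUBBANDS * LINES_PER_SUBBAND
print(total_lines)                                 # 576
print(44_100 / 2 / total_lines)                    # about 38.3 Hz per line at 44.1 kHz
print(len(mdct([0.0] * (2 * LINES_PER_SUBBAND))))  # 18
```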

MP3 Encoder (cont.)
Quantization uses two nested loops:
Inner loop (rate loop): quantizes the spectrum and, if the result needs more bits than are available, increases the global quantizer step size (lowers the global gain) until the data fits.
Outer loop (noise control loop): compares the quantization noise in each band with the masking threshold generated by the perceptual model; where the noise is too high, it amplifies that band (finer quantization) and reruns the inner loop.
When encoded, each frame has a header containing a sync word, the bit rate, and other information.
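The interplay of the two loops can be sketched in a few dozen lines. This is a simplified sketch under stated assumptions: it uses a uniform quantizer (real MP3 uses a power-law quantizer), a hypothetical bit-cost estimate instead of Huffman table lookups, and illustrative step and amplification factors; none of the names come from the MP3 standard:

```python
import math


def quantize(coeffs, step):
    """Uniform quantizer (real MP3 uses a power-law quantizer)."""
    return [round(c / step) for c in coeffs]


def bit_cost(values):
    """Hypothetical bit count: 1 bit for a zero, else sign + magnitude bits."""
    return sum(1 if v == 0 else 1 + math.ceil(math.log2(abs(v) + 1)) for v in values)


def band_noise(coeffs, values, step):
    """Energy of the quantization error in one band."""
    return sum((c - v * step) ** 2 for c, v in zip(coeffs, values))


def two_loop_quantize(bands, allowed_noise, bit_budget, max_outer=16):
    """bands: coefficient lists, one per band.
    allowed_noise: masking threshold (noise energy) per band from the perceptual model.
    Returns (global_step, per-band amplification, quantized bands)."""
    if bit_budget < sum(len(b) for b in bands) or max_outer < 1:
        raise ValueError("budget must cover 1 bit per coefficient; max_outer >= 1")
    amp = [1.0] * len(bands)  # per-band amplification; larger means finer steps
    for attempt in range(max_outer):
        # Inner (rate) loop: coarsen the global step until the bits fit.
        step = 2.0 ** -4
        while True:
            q = [quantize(b, step / a) for b, a in zip(bands, amp)]
            if sum(bit_cost(qb) for qb in q) <= bit_budget:
                break
            step *= 2.0 ** 0.25
        # Outer (noise control) loop: find bands whose noise exceeds the threshold.
        noisy = [i for i, (b, qb, a) in enumerate(zip(bands, q, amp))
                 if band_noise(b, qb, step / a) > allowed_noise[i]]
        if not noisy or attempt == max_outer - 1:
            return step, amp, q
        for i in noisy:
            amp[i] *= 2.0 ** 0.5  # amplify the band, then rerun the inner loop


bands = [[0.9, -0.4, 0.2], [0.05, 0.03, -0.02]]
step, amp, q = two_loop_quantize(bands, allowed_noise=[1e-3, 1e-4], bit_budget=20)
print(step, amp, q)
```

The outer loop can only ask for more precision; the inner loop then claws bits back by coarsening everything, which is why the search is capped at `max_outer` rounds.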

AAC Encoder Explained
The filterbank is similar in purpose to MP3's, but AAC uses a pure MDCT (no hybrid stage) with 1024 frequency lines (MP3 uses 576).
Temporal Noise Shaping (TNS): originally designed for better speech coding at lower bit rates; it applies open-loop linear prediction in the frequency domain, which shapes the quantization noise in time.
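Why predict across frequency? A sharp transient in time becomes a smooth oscillation from one spectral coefficient to the next, so it is highly predictable, and filtering with that predictor is what lets the encoder shape noise in time. A minimal sketch, assuming an orthonormal DCT-II as a stand-in for AAC's MDCT and an order-2 predictor found by the Levinson-Durbin recursion; this is not the AAC TNS tool itself, and the names are illustrative:

```python
import math


def dct(x):
    """Orthonormal DCT-II, a simple stand-in for AAC's MDCT."""
    n_len = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n_len)
            * sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
            for k in range(n_len)]


def levinson_durbin(r, order):
    """Predictor a[0..order-1] with x[k] ~ sum(a[i] * x[k - 1 - i])."""
    a = [0.0] * (order + 1)  # a[0] unused; a[m] is the lag-m coefficient
    err = r[0]
    for m in range(1, order + 1):
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def prediction_gain(seq, order=2):
    """Energy ratio of a sequence to its linear-prediction residual."""
    r = [sum(seq[n] * seq[n - lag] for n in range(lag, len(seq)))
         for lag in range(order + 1)]
    a = levinson_durbin(r, order)
    residual = [seq[k] - sum(a[i] * seq[k - 1 - i] for i in range(order))
                for k in range(order, len(seq))]
    signal_energy = sum(v * v for v in seq[order:])
    return signal_energy / max(sum(e * e for e in residual), 1e-30)


click = [0.0] * 64
click[40] = 1.0                      # a sharp transient late in the block
print(prediction_gain(dct(click)))   # large: the click's spectrum is a smooth cosine
```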

Ogg Vorbis and How It Works
An open-source (free!), lossy audio compression codec
Ogg is the container that holds the Vorbis data
Designed as a better-sounding, open-source lossy replacement for MP3

Ogg Vorbis
Like the MPEG codecs, Vorbis uses a modified discrete cosine transform (MDCT) to turn blocks of audio into frequency information.
Once the MDCT has produced frequency information for each block, the floor (a coarse spectral envelope, essentially the noise floor) is separated from the rest of the components (the residue).
The data is quantized according to a psychoacoustic model, lowering the bit rate for sounds that will be masked.
Encoding is always variable bit rate: the bit rate varies from block to block.
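The floor/residue split can be pictured as dividing each block's spectrum by a smooth envelope. A minimal sketch loosely modeled on Vorbis's piecewise-linear floor; the evenly spaced posts, the dB conversion, the lack of any quantization, and the function names are all simplifying assumptions, not the Vorbis I algorithm:

```python
import math


def piecewise_linear_floor(log_mags, post_spacing):
    """Hypothetical floor: sample the dB spectrum every `post_spacing` bins
    and linearly interpolate between those posts."""
    n_bins = len(log_mags)
    posts = list(range(0, n_bins, post_spacing))
    if posts[-1] != n_bins - 1:
        posts.append(n_bins - 1)
    floor = []
    for lo, hi in zip(posts, posts[1:]):
        for b in range(lo, hi):
            t = (b - lo) / (hi - lo)
            floor.append((1 - t) * log_mags[lo] + t * log_mags[hi])
    floor.append(log_mags[-1])
    return floor


def split_floor_residue(coeffs, post_spacing=4):
    """Return (floor as linear gains, residue) with coeffs == floor * residue."""
    log_mags = [20 * math.log10(abs(c) + 1e-12) for c in coeffs]
    floor_lin = [10 ** (d / 20) for d in piecewise_linear_floor(log_mags, post_spacing)]
    residue = [c / f for c, f in zip(coeffs, floor_lin)]
    return floor_lin, residue


mdct_coeffs = [8.0, -6.0, 5.0, 3.5, -2.0, 1.5, -1.0, 0.6, 0.4]  # hypothetical block
floor, residue = split_floor_residue(mdct_coeffs)
print([round(r, 2) for r in residue])  # residue is near +/-1 around the envelope
```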

Ogg Vorbis = Better?
Its failure mode (what happens when the bit rate would drop so low that perceptible loss would occur) differs from MP3's.
Vorbis can raise the noise floor to cover those distortions, which is often heard as reverberation rather than the metallic "birdies" of MP3 compression.

FLAC and How It Works
Free Lossless Audio Codec
Does not remove any data from the audio stream
Typically reduces file size by roughly 30-50%

FLAC Processing
I. Blocking: the input is broken into blocks of data of different sizes. The ideal size for each block is determined by examining many factors, including sample rate and spectral content. (As in most compression codecs, blocks with transient material are typically given a smaller size.)
II. Interchannel decorrelation: the encoder creates mid and side signals from the left and right channels.
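Mid/side has to be done with integers to stay lossless. A minimal sketch of the integer scheme FLAC uses (side = L - R, mid = (L + R) >> 1); the bit dropped by the shift is recovered from the side channel because L + R and L - R always have the same parity. Note the side channel needs one extra bit (17 bits for 16-bit input). Function names are illustrative:

```python
def encode_mid_side(left, right):
    """Integer mid/side: side = L - R, mid = (L + R) >> 1."""
    mid = [(lv + rv) >> 1 for lv, rv in zip(left, right)]
    side = [lv - rv for lv, rv in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Lossless inverse of encode_mid_side()."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)  # restores L + R (same parity as L - R)
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


mid, side = encode_mid_side([1000, -3, 7], [998, -4, 7])
print(mid, side)  # [999, -4, 7] [2, 1, 0]: similar channels give a tiny side signal
```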

FLAC (cont.)
III. Prediction: each block goes through a predictor that tries to make a mathematical approximation of the signal. Only the predictor's parameters (plus the residual, step IV) need to be stored in the compressed file.
There are 4 types of prediction: verbatim, constant, fixed linear prediction, and FIR linear prediction (LPC).
FLAC can change the prediction type for each block.
IV. Residual coding: the predicted signal is subtracted from the original signal, leaving the residual, which is coded losslessly. The residual requires fewer bits to encode.
Encode, decode, verify.
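The prediction, residual and verify steps can be sketched with FLAC's fixed polynomial predictors (orders 0 to 4) and a Rice-code size estimate. This is a simplified sketch: real FLAC splits the residual into partitions with their own Rice parameters and also offers LPC; the test signal and function names are hypothetical. The reference `flac` encoder performs the decode-and-compare step with its `--verify` option.

```python
import math

# FLAC's fixed predictors: prediction = sum(c * x[n - 1 - i]).
FIXED_COEFFS = {0: [], 1: [1], 2: [2, -1], 3: [3, -3, 1], 4: [4, -6, 4, -1]}


def fixed_residual(samples, order):
    """Warm-up samples (stored verbatim) and the prediction residual."""
    coeffs = FIXED_COEFFS[order]
    residual = [samples[n] - sum(c * samples[n - 1 - i] for i, c in enumerate(coeffs))
                for n in range(order, len(samples))]
    return samples[:order], residual


def restore(warmup, residual, order):
    """Decoder side: rebuild the samples from warm-up plus residual."""
    coeffs = FIXED_COEFFS[order]
    out = list(warmup)
    for e in residual:
        n = len(out)
        out.append(e + sum(c * out[n - 1 - i] for i, c in enumerate(coeffs)))
    return out


def rice_bits(values, k):
    """Bits to Rice-code signed integers with parameter k."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1  # fold sign: 0, -1, 1, -2 -> 0, 1, 2, 3
        total += (u >> k) + 1 + k            # unary quotient, stop bit, k low bits
    return total


samples = [round(1000 * math.sin(2 * math.pi * n / 50)) for n in range(200)]
for order in range(5):
    warmup, residual = fixed_residual(samples, order)
    assert restore(warmup, residual, order) == samples  # verify: lossless round trip
    print(order, min(rice_bits(residual, k) for k in range(16)))
```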

Cut Audio into Blocks of 1024 Samples

Selecting the 1st 1024-Sample Block

Checking the Spectrum in Each Block

Checking the Spectrum: 2nd Block

Checking the Spectrum: 3rd Block

Checking the Spectrum: 4th Block

Spectrum Comparison: Blocks 1 and 2

Spectrum Comparison: Blocks 2 and 3

Spectrum Comparison: Blocks 3 and 4
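The block-by-block spectrum comparison in these slides can be automated. A minimal sketch that cuts audio into 1024-sample blocks, takes each block's magnitude spectrum with a radix-2 FFT (rectangular window, an assumption), and scores neighbouring blocks by the sum of absolute magnitude differences; the slides compare spectra visually, so this metric and the function names are assumptions:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def block_spectra(samples, block_size=1024):
    """Magnitude spectrum (0 to Nyquist) of each full block."""
    spectra = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        bins = fft(samples[start:start + block_size])
        spectra.append([abs(b) for b in bins[: block_size // 2 + 1]])
    return spectra


def spectral_difference(a, b):
    """Sum of absolute magnitude differences between two block spectra."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical test signal: two identical blocks of one tone, then a new tone.
block_a = [math.sin(2 * math.pi * 32 * n / 1024) for n in range(1024)]
block_b = [math.sin(2 * math.pi * 64 * n / 1024) for n in range(1024)]
spectra = block_spectra(block_a * 2 + block_b * 2)
for i in range(len(spectra) - 1):
    print(i + 1, i + 2, spectral_difference(spectra[i], spectra[i + 1]))
```

Blocks 1 and 2 score 0.0 and blocks 2 and 3 score high; an encoder can use a jump like that to spot a change in the signal, such as a transient.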

File Size Comparisons in KB (chart, 0 to 60,000 KB): 16-bit 44.1 kHz PCM; FLAC at compression levels 8, 5, 3 and 0; MP3, Ogg and AAC at 96, 128, 160 and 256 kbps
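For constant bit-rate formats, file size is just bit rate times duration. A minimal sketch for a hypothetical 5-minute track (these numbers are not the chart's measured values); FLAC and VBR sizes depend on the music and cannot be computed this way:

```python
def cbr_file_size_kb(bitrate_bps, seconds):
    """Size in kilobytes (1 KB = 1,000 bytes here) of a constant bit-rate stream."""
    return bitrate_bps * seconds / 8 / 1000


track_seconds = 300  # hypothetical 5-minute track
print("PCM 16/44.1", cbr_file_size_kb(44_100 * 16 * 2, track_seconds))  # 52920.0
for kbps in (96, 128, 160, 256):
    print(kbps, cbr_file_size_kb(kbps * 1000, track_seconds))
# 96 3600.0
# 128 4800.0
# 160 6000.0
# 256 9600.0
```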

FLAC and How It Works
I wasn't able to hear ANY difference in quality between FLAC at any of the compression settings and the 16-bit 44.1 kHz uncompressed PCM.

Phase Inversion Comparisons
Comparing 3 formats to 16-bit 44.1 kHz PCM: 256 kbps MP3, 256 kbps Ogg, and FLAC at compression level 0
DAW visual comparison of the 4 tracks

"Hey Nineteen" Listening Examples: Phase Inversion Mixes
Uncompressed PCM, 24-bit 96 kHz
256 kbps MP3
256 kbps Ogg
FLAC at all compression settings

Methodology
Convert the compressed formats back up to 24-bit 96 kHz PCM.
Mix the original PCM with the phase-inverted, upconverted compressed files.
Time-align the waveforms and repeat the mix procedure.
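Mixing a signal with a phase-inverted (polarity-inverted) copy is the same as subtracting the samples, so the null test and the time-alignment step can be sketched directly. A minimal sketch, assuming both signals are already at the same sample rate and that cross-correlation is an acceptable way to find the alignment; the 3-sample decoder delay and the function names are hypothetical:

```python
import math


def best_lag(reference, test, max_lag):
    """Lag in samples (positive = test is late) that maximizes cross-correlation."""
    def corr(lag):
        return sum(reference[n] * test[n + lag]
                   for n in range(len(reference)) if 0 <= n + lag < len(test))
    return max(range(-max_lag, max_lag + 1), key=corr)


def null_residual(reference, test, lag=0):
    """Mix the reference with the polarity-inverted test signal (subtract it)
    after shifting the test signal by `lag` samples."""
    return [reference[n] - (test[n + lag] if 0 <= n + lag < len(test) else 0.0)
            for n in range(len(reference))]


def rms_dbfs(samples):
    """RMS level in dB relative to full scale 1.0; -inf is a perfect null."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return float("-inf") if mean_square == 0 else 10 * math.log10(mean_square)


reference = [math.sin(0.3 * n) * math.exp(-0.01 * n) for n in range(400)]
decoded = [0.0] * 3 + reference  # hypothetical lossless decode with a 3-sample delay
print(rms_dbfs(null_residual(reference, decoded)))            # poor null: not aligned
lag = best_lag(reference, decoded, max_lag=10)
print(lag, rms_dbfs(null_residual(reference, decoded, lag)))  # 3 -inf
```

A lossless format such as FLAC should null completely once aligned; with a lossy file, whatever survives the null is what the codec changed.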

"Hey Nineteen" Listening Examples: Phase Inversion Mixes with Time Alignment
Uncompressed PCM, 24-bit 96 kHz
256 kbps MP3
256 kbps Ogg
FLAC at compression level 0

Summary of Different Formats: Benefits and Problems
FLAC: lossless; sounds the best
Ogg Vorbis and AAC: high quality at bit rates of 160 kbps and above; bigger files sound better
MP3: sound is passable at 200 kbps VBR and 256 kbps constant bit rate

Who Wins the Golden Headphones?
1st place: FLAC (it really IS lossless!)
2nd place: Ogg Vorbis
3rd place: AAC
Honorable mention: MP3