Compressed Audio Demystified
Why Music Producers Need to Care About Compressed Audio Files
- Download sales up
- CD sales down
- High-definition hasn't caught on yet
- Consumers don't seem to care about high fidelity
Downloading Audio
- Digital Rights Management (DRM)
- Peer-to-Peer (P2P) file sharing
- Apple's iTunes (128-256 Kbps AAC)
- Amazon MP3 (256 Kbps MP3)
Downloading Audio: iTunes
- 128 Kbps AAC files with DRM
- EMI files are 256 Kbps without DRM
Downloading Audio: iTunes
Downloading Audio: Amazon MP3
- 256 Kbps MP3 files, no DRM
Downloading Audio: Amazon MP3
Goals of Compression Algorithms
- Transmission size
- Archive size
- Maintain audio quality
- Nice idea, but I don't think it will work
Uncompressed Audio Formats
- AIFF: Amiga and Apple's uncompressed format
- WAV: IBM and Microsoft's uncompressed format
- PCM: Pulse Code Modulation; each sample uses all available bits
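Because every PCM sample uses all of its bits, the data rate of an uncompressed file follows directly from sample rate, bit depth, and channel count. A minimal sketch of that arithmetic, assuming stereo audio and 1 Kbps = 1000 bits per second (the function names are illustrative, not from the presentation):

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM in Kbps (assuming 1 Kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of the raw PCM audio data in bytes, ignoring any file header."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


# 16-bit 44.1 kHz stereo, the CD-quality format used in the comparisons.
cd_kbps = pcm_bit_rate_kbps(44100, 16, 2)
one_minute = pcm_size_bytes(44100, 16, 2, 60)

print(f"PCM bit rate: {cd_kbps} Kbps")
print(f"One minute of audio: {one_minute} bytes")
print(f"Ratio to a 128 Kbps lossy file: {cd_kbps / 128:.1f}:1")
```

This is why a 128 Kbps download is roughly one eleventh the size of the same track as 16-bit 44.1 kHz PCM.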
Compressed Audio Formats
- FLAC: open source, lossless
- MP3: MPEG, lossy
- AAC: MPEG, lossy (the format Apple uses for iTunes)
- Ogg Vorbis: open source, lossy
Problems with Lossy Codecs
- Pre-Echo & Time Smearing
- Non-Harmonic Distortion
- Loss of Bandwidth
- Birdies
Problems with Lossy Codecs: Loss of Bandwidth
- When the encoder runs out of bits to encode a block of data, frequencies (almost always the high ones) get deleted
- Effectively, the codec becomes a low-pass filter (LPF)
Frequency Response: 96 Kbps MP3
Frequency Response: 128 Kbps MP3
Frequency Response: 160 Kbps MP3
Frequency Response: 256 Kbps MP3
Problems with Lossy Codecs: Pre-Echo & Time Smearing
- Quantization noise is spread across an entire window
- If the transient occurs late in the window, the noise can actually occur before the attack
Brandenburg, Karlheinz. "MP3 and AAC Explained." AES 17th International Conference, Florence, Italy. New York, NY: AES, 1999.
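The pre-echo mechanism can be reproduced in a few lines. The sketch below is a toy, not an MP3 encoder: it assumes a plain DCT over a single 32-sample block in place of the codec's overlapping MDCT windows, and a uniform quantizer with an arbitrary step size. The block is silent until a transient at sample 24, yet after coarse quantization in the frequency domain the decoded block contains noise before the attack.

```python
import math


def dct(x):
    """Unnormalized DCT-II of one block."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi * (t + 0.5) * k / n)
                            for k in range(1, n))) * 2 / n
            for t in range(n)]


def quantize(coeffs, step):
    """Uniform quantizer: snap every coefficient to a multiple of step."""
    return [round(c / step) * step for c in coeffs]


def pre_attack_noise(block, attack_index, step):
    """Energy of the coding error in the samples before the transient."""
    decoded = idct(quantize(dct(block), step))
    return sum((decoded[t] - block[t]) ** 2 for t in range(attack_index))


BLOCK_SIZE = 32   # assumed toy window length
ATTACK = 24       # the transient occurs late in the window

block = [0.0] * BLOCK_SIZE
block[ATTACK] = 1.0

noise_before_attack = pre_attack_noise(block, ATTACK, step=0.5)
print(f"Error energy before the attack: {noise_before_attack:.6f}")
```

The original block is exactly zero before sample 24, so any energy reported there is quantization noise that the transform has spread backward in time.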
Visual Time Alignment
Visual Time Alignment: Zoom 1
Visual Time Alignment: Zoom 2
Problems with Lossy Codecs: Double-Speak
- A single transient gets moved in time so that the stereo channels no longer agree on when the transient occurred
- More common at low bit rates like 64 Kbps and 96 Kbps
16-bit 44.1 kHz WAV (uncompressed)
- Left marker is the negative peak of the top wave
- Right marker is the negative peak of the bottom wave
16-bit 44.1 kHz MP3 @ 96 Kbps
- Time smear: the negative peak in both waves is now in the same sample
16-bit 44.1 kHz WAV and MP3 @ 96 Kbps
- Overlay of the 2 waveforms
16-bit 44.1 kHz MP3 @ 128 Kbps
16-bit 44.1 kHz WAV and MP3 @ 128 Kbps
16-bit 44.1 kHz MP3 @ 160 Kbps
16-bit 44.1 kHz WAV and MP3 @ 160 Kbps
16-bit 44.1 kHz MP3 @ 256 Kbps
16-bit 44.1 kHz WAV and MP3 @ 256 Kbps
Phase Scope Views
- 16-bit 44.1 kHz WAV
- 96 Kbps MP3
- 128 Kbps MP3
- 256 Kbps MP3
Psychoacoustic or Perceptual Modeling
- Creating models of how people hear
- Limitations in physiology inform which components of an audio signal can be eliminated
- Higher frequencies can be eliminated
- Masked sounds can be eliminated
Perceptual Encoding/Decoding System
Brandenburg, Karlheinz. "MP3 and AAC Explained." AES 17th International Conference, Florence, Italy. New York, NY: AES, 1999.
MP3 Encoder
Brandenburg, Karlheinz. "MP3 and AAC Explained." AES 17th International Conference, Florence, Italy. New York, NY: AES, 1999.
MP3 Encoder Components
- Hybrid filterbank: a 32-band polyphase filterbank, then additional subdivision with an MDCT down to 576 total divisions
- Perceptual model: either uses its own filterbank for calculations, or just combines its masking calculations with the main filterbank data
MP3 Encoder, cont.
- Quantization uses 2 loops
- Inner loop: quantizes and codes each block of data, and can adjust the global gain (giving a coarser quantization step) to conform to the allowed number of bits
- Outer loop: controls noise by adjusting the scale factor of each frequency band until its quantization noise is below the masking threshold generated by the perceptual model
- When encoded, each frame has a header containing a sync word and the bit rate, among other things
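The inner loop's job, fitting a block into its bit allowance by coarsening the quantizer, can be sketched as follows. This is a simplified illustration under stated assumptions: `bits_needed` is a crude stand-in for MP3's Huffman coding cost, the step growth factor is arbitrary, and the outer noise control loop and the perceptual model are not modelled at all.

```python
def quantize(coeffs, step):
    """Quantize spectral coefficients to integers with one global step size."""
    return [round(c / step) for c in coeffs]


def bits_needed(qvals):
    """Crude cost model (an assumption, not real Huffman coding):
    1 bit for a zero, otherwise magnitude bits plus a sign bit."""
    return sum(1 if q == 0 else abs(q).bit_length() + 1 for q in qvals)


def rate_loop(coeffs, bit_budget, step=1.0, growth=2 ** 0.25):
    """Coarsen the global step until the quantized block fits the budget."""
    if bit_budget < len(coeffs):
        raise ValueError("budget is below the 1-bit-per-coefficient floor")
    qvals = quantize(coeffs, step)
    while bits_needed(qvals) > bit_budget:
        step *= growth          # larger step -> smaller integers -> fewer bits
        qvals = quantize(coeffs, step)
    return qvals, step


# Hypothetical spectrum for one block: strong lows, weak highs.
spectrum = [100.0, -60.0, 25.0, 8.0, -3.0, 1.0, 0.5, 0.2]
quantized, final_step = rate_loop(spectrum, bit_budget=24)

print("Quantized values:", quantized)
print("Final step size:", round(final_step, 3))
print("Bits used:", bits_needed(quantized))
```

Note how the weakest, highest-frequency coefficients are the first to quantize to zero as the step grows, which is the loss-of-bandwidth effect described earlier.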
AAC Encoder Explained
- Filterbank is similar to MP3's, but AAC uses 1024 bands (MP3 uses 576) and just uses an MDCT
- Temporal Noise Shaping (TNS): originally designed for better speech encoding at lower bit rates; predicts in a loop in the frequency domain
Ogg Vorbis and How It Works
- An open source (free!), lossy audio compression codec
- Ogg is the container that holds the Vorbis data
- Designed as the better-sounding, open source lossy compression replacement for MP3
Ogg Vorbis
- Like MPEG, uses the Modified Discrete Cosine Transform (MDCT) to separate the audio into blocks
- Once the MDCT has frequency information for each block, the noise floor is separated from the rest of the components
- Quantized using a variable bit rate, based on a psychoacoustic model, lowering the bit rate of sounds that will be masked
- Encoding is always variable bit rate
- Bit rate varies from sample to sample
Ogg Vorbis = better??
- Differs from MP3 in its failure mode (when the bit rate is lowered so far that perceptible loss would occur)
- Can raise the noise floor to cover those distortions, which is often heard as reverberation rather than the metallic birdies of MP3 compression
FLAC and How It Works
- Free Lossless Audio Codec
- Does not remove data from the audio stream
- Allows for data compression rates in the 30-50% range
FLAC Processing
I. Blocking - the input is broken into blocks of data of different sizes. The ideal size for each block is determined by examining many factors, including sample rate, spectral content, etc. (As in most compression codecs, blocks with transient material are typically given a smaller size.)
II. Interchannel Decorrelation - the encoder creates both mid and side signals based on the input of the right and left channels.
FLAC, cont.
III. Prediction - each block goes through an encoder which tries to make a mathematical approximation of the signal. Only the parameters of the predictor need to be included in the compressed file.
 - 4 types of prediction (verbatim, constant, fixed linear predictor, and FIR linear prediction)
 - FLAC can change prediction types for each block
IV. Residual Coding - the predicted signal is subtracted from the original signal, leaving the residue (residual) to be coded losslessly. The residual signal requires fewer bits to encode.
Encode-Decode-Verify
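Steps II through IV can be sketched on integer samples to show why nothing is lost. This is an illustration rather than the FLAC encoder itself: it assumes a single second-order fixed predictor for every block, stores the first two samples verbatim as warm-up, and leaves out blocking and the entropy coding of the residual. The function names and sample values are made up for the example.

```python
def to_mid_side(left, right):
    """Interchannel decorrelation: mid is the average, side the difference."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Exact inverse of to_mid_side."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)   # restore the bit dropped by >> 1
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


def residual_order2(samples):
    """Fixed linear predictor: predict x[n] as 2*x[n-1] - x[n-2]."""
    residual = list(samples[:2])     # warm-up samples stored verbatim
    for n in range(2, len(samples)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        residual.append(samples[n] - prediction)
    return residual


def restore_order2(residual):
    """Decoder side: rebuild the samples from warm-up plus residual."""
    samples = list(residual[:2])
    for n in range(2, len(residual)):
        samples.append(residual[n] + 2 * samples[n - 1] - samples[n - 2])
    return samples


# Hypothetical, smoothly varying stereo samples.
left = [0, 100, 198, 293, 383, 466, 540]
right = [0, 98, 195, 290, 380, 461, 533]

mid, side = to_mid_side(left, right)
mid_residual = residual_order2(mid)

# Encode-decode-verify: the round trip must return the exact input.
decoded = from_mid_side(restore_order2(mid_residual), side)
print("Residual of left channel:", residual_order2(left)[2:])
print("Lossless round trip:", decoded == (left, right))
```

The residual values are far smaller than the samples they replace, which is where the bit savings come from, and the round trip reproduces every sample exactly.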
Cut Audio Into Blocks of 1024 Samples
Selecting the 1st 1024-Sample Block
Checking Spectrum in Each Block
Checking Spectrum: 2nd Block
Checking Spectrum: 3rd Block
Checking Spectrum: 4th Block
Spectrum Comparison: Block 1 and 2
Spectrum Comparison: Block 2 and 3
Spectrum Comparison: Block 3 and 4
File Size Comparisons in KB (bar chart, axis from 0 to 60,000 KB): 16-bit 44.1 kHz, FLAC 8, FLAC 5, FLAC 3, FLAC 0, MP3 96, MP3 128, MP3 160, MP3 256, OGG 96, OGG 128, OGG 160, OGG 256, AAC 96, AAC 128, AAC 160, AAC 256
FLAC and How It Works
- I wasn't able to hear ANY difference in quality between FLAC at any of the compression settings and the 16-bit 44.1 kHz uncompressed PCM
Phase Inversion Comparisons
- Comparing 3 formats to 16/44.1 PCM: MP3 256 Kbps, OGG 256 Kbps, and FLAC at 0 compression
- DAW visual comparison of 4 tracks
Hey Nineteen Listening Examples: Phase Inversion Mixes
- Uncompressed PCM 24-bit 96 kHz
- 256 Kbps MP3
- 256 Kbps OGG
- FLAC with all compression settings
Methodology
- Convert the compressed formats back up to 24-bit 96 kHz PCM
- Mix the original PCM with the upconverted compressed files with phase inverted
- Time-align the waveforms and repeat the mix procedure
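The phase inversion (null) test in this methodology amounts to subtracting the decoded file from the original and measuring what is left. A minimal sketch on short integer sample lists, assuming 24-bit full scale and a hypothetical 2-sample decoder delay to show why the time alignment step matters (the sample values are invented for illustration):

```python
import math


def null_mix(original, decoded):
    """Mix the original with the phase-inverted decoded signal."""
    return [o + (-d) for o, d in zip(original, decoded)]


def peak_dbfs(samples, full_scale=2 ** 23):
    """Peak level of the residue relative to 24-bit full scale."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak / full_scale)


def time_align(decoded, delay):
    """Drop the leading samples added by the codec's delay."""
    return decoded[delay:]


original = [0, 1000, -2000, 3000, -1500, 500]

lossless = list(original)                      # bit-identical decode
lossy = [0, 998, -2003, 3001, -1499, 502]      # hypothetical small errors
delayed = [0, 0] + original                    # same audio, 2 samples late

print("Lossless residue:", peak_dbfs(null_mix(original, lossless)), "dBFS")
print("Lossy residue:", round(peak_dbfs(null_mix(original, lossy)), 1), "dBFS")
print("Unaligned residue:", round(peak_dbfs(null_mix(original, delayed)), 1), "dBFS")
print("Aligned residue:", peak_dbfs(null_mix(original, time_align(delayed, 2))), "dBFS")
```

A lossless decode nulls completely, a lossy decode leaves a residue that is exactly what the codec changed, and an unaligned file leaves a large residue that says nothing about codec quality until the delay is removed.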
Hey Nineteen Listening Examples: Phase Inversion Mixes with Time Alignment
- Uncompressed PCM 24-bit 96 kHz
- 256 Kbps MP3
- 256 Kbps OGG
- FLAC with 0 compression setting
Summary of Different Formats: Benefits/Problems
- FLAC: lossless, sounds the best
- Ogg Vorbis and AAC: high quality at bit rates of 160 Kbps and better; bigger files sound better
- MP3: sound is passable at 200 Kbps VBR and 256 Kbps fixed bit rate
Who Wins the Golden Headphones?
- 1st Place: FLAC (It Really IS Lossless!)
- 2nd Place: Ogg Vorbis
- 3rd Place: AAC
- Honorable Mention: MP3