Audio Compression

CD-quality audio:
  - Sampling rate = 44.1 kHz
  - Quantization = 16 bits/sample
  - Bit rate = ~700 Kb/s (1.41 Mb/s for 2-channel stereo)

Telephone-quality speech:
  - Sampling rate = 8 kHz
  - Quantization = 16 bits/sample
  - Bit rate = 128 Kb/s

Absolute Threshold

  - A tone is audible only if its power is above the absolute threshold level
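
The bit rates on the Audio Compression slide follow from multiplying the sampling rate, the bits per sample, and the number of channels. A minimal sketch of that arithmetic (the function name `pcm_bit_rate` is illustrative, not from the slides):

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """Return the uncompressed PCM bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels


cd_mono = pcm_bit_rate(44_100, 16)        # 705,600 b/s, i.e. ~700 Kb/s
cd_stereo = pcm_bit_rate(44_100, 16, 2)   # 1,411,200 b/s, i.e. ~1.41 Mb/s
telephone = pcm_bit_rate(8_000, 16)       # 128,000 b/s, i.e. 128 Kb/s

print(cd_mono, cd_stereo, telephone)
```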

Masking Effect

  - If a tone of a certain frequency and amplitude is present, the audibility threshold curve changes
  - Other tones or noise of similar frequency, but of much lower amplitude, are not audible
  - [Audio examples: Example 1 (330 Hz and 300 Hz); Example 2 (300 Hz and 500 Hz)]

Masking Effect (Single Masker)

  [Figure: masking threshold across subbands n-1, n, and n+1; labels: "Requires fewer bits", "Requires more bits"]

Masking Effect (Multiple Maskers)

  [Figure: masking threshold with multiple maskers]
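
The single-masker case above can be sketched numerically. This is a toy model under stated assumptions, not the perceptual model of any standard: the absolute threshold uses Terhardt's well-known approximation (not given in the slides), and the masking threshold is a hypothetical triangle in log frequency whose parameters (`offset_db`, `slope_db_per_octave`) and the tone levels (70 dB masker, 40 dB probe) are chosen only for illustration.

```python
import math


def absolute_threshold_db(freq_hz):
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def masked_threshold_db(freq_hz, masker_hz, masker_db,
                        offset_db=10.0, slope_db_per_octave=30.0):
    """Toy masking threshold: highest at the masker, falling off with
    distance in octaves. Shape and parameters are assumptions."""
    octaves = abs(math.log2(freq_hz / masker_hz))
    return masker_db - offset_db - slope_db_per_octave * octaves


def is_audible(freq_hz, level_db, masker_hz, masker_db):
    """A tone is audible only if it exceeds both thresholds."""
    threshold = max(absolute_threshold_db(freq_hz),
                    masked_threshold_db(freq_hz, masker_hz, masker_db))
    return level_db > threshold


# A 70 dB masker at 300 Hz and a 40 dB probe tone (assumed levels).
near = is_audible(330, 40, masker_hz=300, masker_db=70)  # False: masked
far = is_audible(500, 40, masker_hz=300, masker_db=70)   # True: audible
print(near, far)
```

In this model the 330 Hz tone sits close enough to the masker to fall under the raised threshold, while the 500 Hz tone at the same level does not.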

Temporal Masking

  - A loud tone of finite duration masks a softer tone that follows it (for around 30 ms)
  - A similar effect also occurs when the softer tone precedes the louder tone

Perceptual Coding

  - Perceptual coding tries to minimize the perceptual distortion in a transform coding scheme
  - Basic concept: allocate more bits (more quantization levels, less error) to the channels that are most audible, and fewer bits (more error) to the channels that are least audible
  - The encoder must continuously analyze the signal with a perceptual model to determine the current audibility threshold curve

Audio Coding: Main Standards

  - MPEG (Moving Picture Experts Group) family (note: the standard specifies only the decoder)
      - MPEG-1
          - Layer 1
          - Layer 2
          - Layer 3 (MP3)
      - MPEG-2
          - Backward compatible
          - AAC (non-backward compatible)
  - Dolby AC-3
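
The basic concept of perceptual coding described above can be sketched as a greedy bit allocation. This is a simplified illustration, not the allocation procedure of any particular standard: it assumes the perceptual model has already produced a signal-to-mask ratio (SMR) per channel, uses the rule of thumb that each extra bit of a uniform quantizer lowers quantization noise by about 6.02 dB, and stops once the noise in every channel is below the masking threshold. The SMR values, the name `allocate_bits`, and the `max_bits_per_band` cap are hypothetical.

```python
DB_PER_BIT = 6.02  # approximate SNR gain per extra quantizer bit


def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Give bits one at a time to the band whose noise is most audible."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: positive means the noise is still audible.
        nmr = [smr - DB_PER_BIT * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i, b in enumerate(bits) if b < max_bits_per_band]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all remaining noise is under the masking threshold
        bits[worst] += 1
    return bits


smr_db = [20.0, 5.0, -3.0, 12.0]   # made-up SMR per band, in dB
allocation = allocate_bits(smr_db, total_bits=8)
print(allocation)  # [4, 1, 0, 2]
```

The band with a negative SMR is already fully masked and receives no bits, and the allocator leaves one of the eight available bits unused because no audible noise remains.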

MPEG-1 Audio Coder

Layer 1
  - Deemed transparent at 384 Kb/s per channel
  - Subband coding with 32 channels
  - Input divided into groups of 12 input samples
  - Coefficient normalization (extracts the scale factor)
  - For each block, chooses among 15 quantizers for perceptual quantization
  - No entropy coding after transform coding
  - Decoder is much simpler than the encoder

Intensity stereo mode
  - The stereo effect at middle and high frequencies depends less on differences in channel content than on differences in channel amplitude
  - Middle and upper subbands of the left and right channels are added together, and only the resulting summed samples are quantized
  - The scale factor is sent for both channels so that amplitudes can be controlled independently during playback

MPEG-1 Audio Coder (cont'd)

Layer 2
  - Transparent at 256 Kb/s per channel
  - Improved perceptual model (more computationally intensive)
  - Finer-resolution quantizers

Layer 3 (MP3)
  - Transparent at 96 Kb/s per channel
  - Applies a variable-size modified DCT to the samples of each subband channel
  - Uses non-uniform quantizers
  - Has an entropy coder (Huffman), which requires buffering
  - Much more complex than Layers 1 and 2
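
The Layer 1 normalization and quantization steps listed above can be sketched for a single block of 12 subband samples. This is a simplified illustration under assumptions: the scale factor is taken to be the block's peak magnitude (a real encoder picks it from a fixed table), the quantizer is a uniform midtread quantizer with 2^bits - 1 levels, and the sample values and function names are made up.

```python
def half_range(bits):
    """Largest code magnitude for a quantizer with 2**bits - 1 levels."""
    return (2 ** bits - 2) // 2


def encode_block(samples, bits):
    """Normalize by the block peak (scale factor), then quantize uniformly."""
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero
    half = half_range(bits)
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def decode_block(scale, codes, bits):
    """Undo the quantization and restore the original amplitude range."""
    half = half_range(bits)
    return [c / half * scale for c in codes]


# One block of 12 subband samples (made-up values), quantized with 4 bits.
block = [0.02, -0.11, 0.30, 0.25, -0.07, 0.01,
         -0.29, 0.18, 0.05, -0.14, 0.22, -0.03]
scale, codes = encode_block(block, bits=4)
decoded = decode_block(scale, codes, bits=4)
max_error = max(abs(a - b) for a, b in zip(block, decoded))
print(scale, codes, max_error)
```

Because the samples are normalized before quantization, the reconstruction error is bounded by half a quantizer step times the scale factor, regardless of how loud the block is.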

MPEG-1 Layers 1 and 2 Audio Encoder/Decoder (Single Channel)

  [Figure: encoder/decoder block diagram]

MPEG-1 Layer 3 (MP3) Audio Encoder/Decoder (Single Channel)

  [Figure: encoder/decoder block diagram]

Example

  - Original sound (mono, 44.1 kHz, 16 bits/sample, 705.6 Kb/s)
  - Subsampled by 8 (without prefiltering)
  - Subsampled by 8 (with prefiltering)
  - Quantized to 2 bits/sample
  - Coded using MP3 (64 Kb/s)
  - Coded using MP3 (96 Kb/s)
  - Coded using MP3 (32 Kb/s)
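
The difference between the two subsampled examples above comes down to aliasing. The sketch below subsamples a 5 kHz test tone by 8, once directly and once after a crude moving-average low-pass filter. The test tone, the choice of a moving average as the prefilter, and the function names are assumptions for illustration; the slides do not say which prefilter was used.

```python
import math


def decimate(signal, factor, prefilter=False):
    """Keep every `factor`-th sample, optionally after a moving average."""
    if prefilter:
        # Crude low-pass: average each run of `factor` consecutive samples.
        signal = [sum(signal[i:i + factor]) / factor
                  for i in range(len(signal) - factor + 1)]
    return signal[::factor]


def mean_power(signal):
    return sum(s * s for s in signal) / len(signal)


fs = 44_100
# One second of a 5 kHz tone. After subsampling by 8 the new Nyquist
# frequency is about 2.76 kHz, so this tone cannot be represented.
tone = [math.sin(2 * math.pi * 5_000 * n / fs) for n in range(fs)]

raw = decimate(tone, 8)                       # tone aliases at full strength
filtered = decimate(tone, 8, prefilter=True)  # tone is mostly removed first

print(mean_power(raw), mean_power(filtered))
```

Without prefiltering, the tone folds back into the audible band at nearly its original power; with the moving average, the aliased component is attenuated to a small fraction of that.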

Middle-side Stereo Mode

  - Frequency ranges that would normally be coded as left and right are instead coded as Middle (left + right) and Side (left - right)
  - The Side channel can be coded with fewer bits (because the two channels are highly correlated)

MPEG-2 Audio Coder

Backward compatible (i.e., MPEG-1 decoders can decode a portion of the MPEG-2 bit-stream):
  - Original goal: provide theater-style surround-sound capabilities
  - Modes of operation:
      - monaural
      - stereo
      - three channel (left, right, and center)
      - four channel (left, right, center, and rear surround)
      - five channel (left, right, center, and two rear surround)
  - Full five-channel surround at 640 Kb/s

MPEG-2 Audio Coder (cont'd)

Non-backward compatible (AAC):
  - At 320 Kb/s, judged equivalent to backward-compatible MPEG-2 at 640 Kb/s for five-channel surround sound
  - Can operate with any number of channels (between 1 and 48) and output bit rates from 8 Kb/s to 182 Kb/s per channel
  - Sampling rate can be as low as 8 kHz and as high as 96 kHz per channel
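
The Middle-side Stereo Mode transform above is lossless before quantization, which a short sketch makes concrete. It uses the definitions from the slide (Middle = left + right, Side = left - right); real codecs may scale the two sums differently, and the sample values and function names here are made up.

```python
def ms_encode(left, right):
    """Convert left/right samples into Middle (sum) and Side (difference)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right: L = (M + S) / 2, R = (M - S) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right


# Two highly correlated channels (made-up sample values).
left = [1000, 2000, -1500, 300]
right = [990, 2010, -1480, 310]

mid, side = ms_encode(left, right)
restored_left, restored_right = ms_decode(mid, side)

print(mid)   # [1990, 4010, -2980, 610]
print(side)  # [10, -10, -20, -10]
```

Because the channels are nearly identical, the Side values are small compared with the Middle values, which is why the Side channel can be coded with fewer bits.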

Dolby AC-3

  - Used in movie theaters as part of the Dolby Digital film system
  - Selected for USA Digital TV (DTV) and DVD
  - Bit rate: 320 Kb/s for 5.1-channel surround
  - Uses a 512-point modified DCT (can be switched to 256-point)
  - Floating-point conversion into exponent-mantissa pairs (mantissas quantized with a variable number of bits)
  - Does not transmit the bit allocation, only the perceptual model parameters

Dolby AC-3 Encoder

  [Figure: encoder block diagram; labels: PCM samples, frequency-domain transform, transform coefficients, block floating-point, exponents, bit allocation, mantissas, mantissa quantization, quantized mantissas, bitstream packing, encoded audio]

References

  B. Haskell, A. Puri, A. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall, 1997, pp. 55-79