Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1
Analogue audio frequencies: 20Hz - 20kHz mono: x(t) scalar stereo: xr ( t) x( t) = xl ( t) Audio Compression small files, low data rate at transmission reconstruction must be (as much as possible) similar to original signal redundancy (lossless coding) irrelevancy (do not code what you cannot hear) 2
Data rates Quality Sample Rate Bit/Sample Channels Data Rate kb/s Frequency Telephone 8.000 8 Mono 64,00 200-3400 MW 11.025 8 Mono 88,00 UKW 22.050 16 Stereo 705,60 CD 44.100 16 Stereo 1411,00 20-20000 DAT 48.000 16 Stereo 1536,00 20-20000 Dynamics compression A-Law A abs( S) sign( S) S' = 1+ ln A 1+ ln( A abs( S)) sign( S) 1+ ln A for else abs(s) 1 A µ-law 1+ ln(1 + µ abs( S)) S' = sign( S), µ = 255 ln(1 + µ ) 3
Masking Masking Threshold for human ear Threshold changes: neighbouring frequencies (Example 0.5, 1, 4, 8 khz) in time 4
Sampling When x(t) is bandwidth-limited: f > ω x( f ) = 0 then n= [] n x ( t) = x g( t n t) with = 1 < 1 2ω t x[ n] = x( n t) f s sin(2πωt) g( t) = 2πωt Quantisation x Q(x) k bits { y,, } K 1 y n L = 2 k representations x y x y Q( x) = i j y i 5
PCM = Pulse Code Modulation Sampling: Quantisation: Coding: { x( t) } { x[ n] } { x[ n] } { Q( x[ n] )} Q( { x[ n] }) { } n i redundancy irrelevancy Play: y ( t) Q( x[ n ]) g( t n t) = i i Stereo CD Audio Data rate: 2 16 bit 44.1 10 = 1411.2 10-3 3 s 1 bit s 6
MPEG compression factors MPEG 1 Audio: PCM 32, 44.1, 48 khz, max 448 kbit/s MPEG 2 Audio: PCM 16, 22.05, 24, 32, 44.1, 48 khz, max 384 KBit/s MPEG Audio Layer I,II,III Layer I Layer II Digital TV Layer III MP3 7
MP3 - MPEG 1 Audio Layer 3 Sampling: 16 khz - 48 khz Bit rate: 32 kb/s - 192 kb/s (CD Audio: 44.1 khz, 1411 kb/s) www.iis.fhg.de/amm/gallery/index.html Karlheinz Brandenburg: MP3 and AAC explained http://www.exp-math.uni-essen.de/~dreibh/diplom/bra99.pdf perceptual encoding / decoding 8
Filterbank Ideal sub-band coder impossible: ideal sub-band coder downsampling aliasing possible: nearly perfect H m 1 for f Dm, m = 1, K, M ( f ) = 0 else 9
Downsampling from M f s back to sub-bandwidth B, upper frequency is multiple of B f s can sample at f s = 2B = 2M B f s (instead of ) x m [] n [ k] y m M [ k] = x [ k M ] m y m Filterbank in MPEG-1 audio layer 1-3 Polyphase filterbank 32 subbands 512 tap FIR-filters 80 + and * per output Equal width Not perfect reconstruction Frequency overlap 10
A closer look The subbands overlap at 3 db to the adjacent bands. The leakage to the other bands is small. The total response almost adds up to one (0 db). White noise The white noise run through the filterbank. The samples from each band are played in the order of the subbands. The subsampled filtered sequence. The samples from each band are played in the order of the subbands. The reconstruction error is 84 db. 11
Nonideal filterbanks Y( e M 1 n= 1 jω X( e ) = X( e jω 2 πn j ω M In a perfect filterbank the first part is the only part. M 1 1 R jω A jω ) Hk ( e ) Hk ( e ) + k= 0 14 M 444 24444 3 1 M 1 1 j ω R jω A M ) Hk ( e ) Hk ( e ) k= 0 14 M 4444 244444 3 0 2πn The second part consists of the aliasing terms. The filterbank is designed so that the aliasing is small. Tubthumper, a time domain view The red line is the reconstruction error after splitting the signal in subbands, down sampling and applying the synthesis filterbank. The reconstruction error is 84 db and sounds like 12
Tubthumper, frequency view Subband 1 2 4 8 16 32 Center frequency [khz] No subsampling 0.3 1.0 2.4 5.2 10.7 21.7 Subsampled 32 times Filterbank MPEG polyphase 12 samples 12 samples 12 samples filterbank band 1 band 2... Layer I frame 384 samples band 31 Layer II/III frame 1152 samples 13
Critical Bands Heinrich Barkhausen (1881-1956) psycho-acoustic width measured in bark f /100 1bark = 9 + 4 log( f /1000) for else f < 500 MPEG - Sub bands Layer I: 32 bands, 625 Hz each, Fourier transform Layer II: 32 bands, three frames, time masking Layer III: Division according to critical bands 14
MPEG masking Psycho-acoustic model masking of neighbouring bands signals are coded when above masking threshold MUSICAM (Masking-pattern adapted Universal Subband Integrated Coding and Multiplexing) Layer I: simplified, Layer II: entirely, Layer III: with other methods Example: Masking MPEG Audio band level masking coding 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 8 12 10 6 2 10 60 35 20 15 2 3 5 3 1?????? 12 x 15????????????? - x x??????? 15
MPEG-1 Layer 3 encoder MP3 Filter bank - sub bands Series MDCT fine grain frequency resolution non-uniform quantisation perception model Huffman coding 16
MP3 (vs. Layer I/II) modified DCT (Series MDCT vs. FFT) critical bands Huffman coding entropy reduction dynamics compression difference and sum of stereo signals MPEG Audio Layer I,II,III Layer I: 19 ms delay, FFT, 384 samples, frequency masking, equal bands Layer II: 35 ms delay, FFT, 1152 samples, frequency masking, time simulated, equal bands Layer III: 59 ms delay, DCT, 1152 samples, frequency and time masking, bands as in bark scale 17
MPEG Layer I, II, III subj. quality bandwidth compression 1 min audio Audio CD CD 1400 1:1 10.58 MB MPEG1 Layer I CD 384 3.6:1 2.88 MB MPEG1 Layer II CD 256 5.5:1 1.92 MB MPEG1 Layer III CD 128 11:1 962 kb MPEG2 Layer III Radio 64 22:1 481 kb MPEG2 Layer III Telephone 16 88:1 120 kb CS-ACELP Speech 5,30 264:1 40 kb MPEG-2 AAC 18
Audio Formats PCM - Pulse Code Modulation ITU G.711; speech data 4kHz bandwidth, 64 kb/s data rate ADPCM (Adaptive Differential PCM) ITU G.726, G.727; 16, 24, 32, 40 kbit/s. Standard for CCITT G.721 SB-ADPCM (Sub-Band ADPCM) ISDN, G.722; 7 khz bandwidth in 64 kbit/s streams Audio Formats AIFF - Audio Interchange File Format Apple (extension from IFF by Electronic Arts) Wave (by Microsoft and IBM) Part of RIFF (Resource Interchange File Format) NeXT/Sun Audio File Format! big endian 19
Proprietary Audio Formats AT&T Proprietary Compression Algorithm EPAC (Bell Labs) Microsoft Windows Media Audio (WMA) AC-3 Audio Code No. 3 - Dolby Digital Surround Speech compression formats GSM 06-10: 160 13-bit values in 260 Bit (33 Byte) are compressed; 8000 samples/s result in data rate of 1650 Byte/s CELP (Code Excited Linear Prediction): analytical model LD-CELP (Low Delay CELP): G.728 LPC-10E (Linear Prediction Coder (Enhanced): military coder, analytical model, 2.4 kbit/s understandable, but low quality. 20
End of Part Thank you for your attention! 21