Chapter 4: Audio Coding


1 Chapter 4: Audio Coding
Lossy and lossless audio compression: Traditional lossless data compression methods usually do not work well when applied directly to audio signals. Many audio coders are therefore lossy, e.g., ADPCM, MP3 and MPEG AAC. For high-fidelity audio, lossless coding is preferred, e.g., MLP compression for DVD-Audio and MPEG-4 ALS coding.
Issues of compression performance: Lossy audio coding generally achieves much higher compression than lossless audio coding; for example, MP3 has an average compression ratio of about 12, while MPEG-4 ALS achieves an average compression ratio of only about 2.
Issues of audio quality: Lossless coding reproduces the original audio signal exactly, so there is no quality issue. For lossy coding, the reproduced signal contains distortion (coding error or noise), which may be audible and degrade the perceived quality when listening to it. A quality measure is therefore needed to rate lossy coders: SNR and spectral distortion are objective measures, the Mean Opinion Score is a subjective measure, and PESQ is an objective algorithm designed to predict subjective scores.

2 Linear Prediction for Audio Coding
Linear prediction has been widely applied in speech and audio processing and coding, for example the backward predictor in ADPCM, LTP and TNS in MPEG-2/4 AAC, the high-order predictor in MPEG-4 ALS, and the predictor/decorrelator in MLP coding.
Basic concept of linear prediction: consider predicting the value of a stationary random process $x(n)$ one step ahead from a weighted linear combination of its past values. The predicted value is
$\hat{x}(n) = \sum_{k=1}^{P} a_k\, x(n-k)$,
where the $a_k$ are the prediction coefficients of the one-step linear predictor of order $P$. The difference between $x(n)$ and $\hat{x}(n)$ is the prediction error $e(n) = x(n) - \hat{x}(n)$. The mean-square value of the linear prediction error is
$E = \mathbb{E}\!\left[e^2(n)\right] = r(0) - 2\sum_{k=1}^{P} a_k\, r(k) + \sum_{j=1}^{P}\sum_{k=1}^{P} a_j a_k\, r(j-k)$,
where $r(m)$ is the autocorrelation of $x(n)$. Note that $E$ is a quadratic function of the predictor coefficients.

3 Linear Prediction for Audio Coding
The objective of linear predictive analysis is to obtain the prediction coefficients $a_k$ such that the mean-square error $E$ is minimized. Since $E$ is quadratic, setting $\partial E / \partial a_k = 0$ for $k = 1, \ldots, P$ yields the optimum prediction coefficients as the solution of the set of linear equations
$\sum_{k=1}^{P} a_k\, r(j-k) = r(j), \quad j = 1, \ldots, P.$
These are called the normal equations for the coefficients of the linear predictor and can be written in matrix form as $\mathbf{R}\mathbf{a} = \mathbf{r}$, where $\mathbf{R}$ is the $P \times P$ autocorrelation (Toeplitz) matrix. The optimum predictor can be computed by inverting the matrix, i.e., $\mathbf{a} = \mathbf{R}^{-1}\mathbf{r}$. A fast algorithm such as the Levinson-Durbin recursion can be used to solve the system efficiently.
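As an illustration of how the normal equations can be solved in practice, here is a minimal Python sketch of the Levinson-Durbin recursion; the function name and the random test signal are illustrative choices, not part of any standard.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations R a = r for the LP coefficients.

    r: autocorrelation values r[0..order]; returns (a, prediction-error power).
    """
    a = np.zeros(order)
    err = r[0]
    for m in range(order):
        # reflection coefficient for the order-(m+1) predictor
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / err
        a_new = a.copy()
        a_new[m] = k
        a_new[:m] = a[:m] - k * a[m - 1::-1]
        a = a_new
        err *= (1.0 - k * k)
    return a, err

# toy usage: fit a 4th-order predictor to a windowed signal segment
x = np.convolve(np.random.randn(1024), [1.0, 0.8, 0.4], mode="same")  # correlated noise
xw = x * np.hamming(len(x))                                           # window before autocorrelation
r = np.array([np.dot(xw[:len(xw) - m], xw[m:]) for m in range(5)])
a, e = levinson_durbin(r, 4)
print("prediction coefficients:", a, "residual power:", e)
```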

4 Linear Prediction for Audio Coding
In practice, to compute the autocorrelation we apply a window, e.g., a Hamming window, so that the analysed signal segment has finite duration.
Linear prediction can be viewed as linear filtering, where the predictor is embedded in a linear filter: the prediction-error filter, also called the inverse prediction filter. This FIR filter produces the prediction error as its output. The z-transform relationship between input and output is
$E(z) = A(z)\,X(z)$, where $A(z) = 1 - \sum_{k=1}^{P} a_k z^{-k}$.
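A minimal sketch of running a signal through the prediction-error filter $A(z)$ and its inverse, assuming the coefficients `a` were obtained as in the Levinson-Durbin example above; the helper names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(x, a):
    """Filter x with A(z) = 1 - sum_k a_k z^{-k} to obtain the LP residual."""
    b = np.concatenate(([1.0], -np.asarray(a)))   # FIR numerator of the inverse filter
    return lfilter(b, [1.0], x)

def lp_synthesis(e, a):
    """Apply 1/A(z) to the residual; exactly undoes the prediction-error filtering."""
    b = np.concatenate(([1.0], -np.asarray(a)))
    return lfilter([1.0], b, e)

x = np.sin(2 * np.pi * 0.05 * np.arange(512)) + 0.01 * np.random.randn(512)
a = [1.6, -0.9]                                    # example 2nd-order coefficients
e = prediction_error(x, a)
assert np.allclose(lp_synthesis(e, a), x)          # analysis and synthesis are exact inverses
print("residual energy / signal energy:", np.sum(e**2) / np.sum(x**2))
```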

5 Lossy Audio Coding
Waveform coding vs psychoacoustic coding: Waveform coding techniques try to code the signal as faithfully as possible so that the reproduced signal is close to the original waveform (or spectrum), e.g., ADPCM and subband coding. The compression performance of waveform coding is generally lower than that of psychoacoustic coding, with typical compression ratios from 2 to 4.
Psychoacoustic coding techniques exploit human psychoacoustic properties such as temporal and frequency masking so that the coding noise falls below the perceptual masking levels and becomes inaudible; MP3 and MPEG-2 AAC are well-known psychoacoustic audio coders. The compression ratios achieved are generally higher than those of waveform coders, typically 6 to 20.

6 Audio Waveform Coding
Adaptive Differential Pulse Code Modulation (ADPCM)
Tries to code the signal waveform as closely to the original as possible. The difference between an input sample and a predicted sample is quantized and encoded. The quantization levels are adapted to the signal strength so that the quantization errors are minimized. Because of the quantization, ADPCM is a lossy coding technique. The compression ratio achieved by ADPCM is modest, between 2 and 4; quantization noise becomes prominent and the audio quality drops significantly if the compression ratio is pushed further.
Some international standards:
ITU G.721, 32 kbps ADPCM at 8 kHz sampling frequency
ITU G.722, 64 kbps subband ADPCM, with the signal split into two subbands
ITU G.726, 40, 32, 24 and 16 kbps at 8 kHz sampling frequency
Digital Theater Systems (DTS), subband ADPCM with some perceptual coding, 5.1 channels at 1.5 Mbps

7 Adaptive Differential Pulse Code Modulation (ADPCM)
[Figure: G.726 ADPCM encoder and decoder block diagrams]

8 Adaptive Differential Pulse Code Modulation (ADPCM)
Principle of the ADPCM coder: A difference signal is obtained by subtracting an estimate of the input signal, $\hat{s}(n)$, from the input signal itself, $s(n)$. An adaptive linear quantizer quantizes the difference signal, and the quantized codeword is transmitted to the decoder. An inverse quantizer produces a quantized difference signal $d_q(n)$ from the codeword. The signal estimate is added to the quantized difference signal to produce the reconstructed version of the input signal. Both the reconstructed signal and the quantized difference signal are operated upon by an adaptive predictor, which produces the estimate of the input signal, thereby closing the feedback loop. The ADPCM decoder contains a structure identical to the feedback portion of the encoder.
In the ITU 32 kbit/s ADPCM coding standard, the adaptive predictor consists of a pole-zero filter with 2 poles and 6 zeros. The G.726 standard defines a multiplier constant $\alpha$ that changes for every difference value, depending on the current scale of the signal. A scaled difference signal is defined as
$e(n) = s(n) - \hat{s}(n), \qquad g(n) = e(n)/\alpha,$
where $\hat{s}(n)$ is the predicted signal value. $g(n)$ is then sent to the quantizer. By changing $\alpha$, the quantizer can adapt to changes in the range of the difference signal.

9 Adaptive Differential Pulse Code Modulation (ADPCM)
Backward adaptive quantizer. Basic principle: if too many values are quantized to levels far from zero, the quantizer step size is too small; if too many values fall close to zero too much of the time, the step size is too large. A backward adapter adjusts the step size after receiving just one output: it expands the step size if the quantized input falls in the outer levels of the quantizer and reduces it if the input is near zero. This is done by assigning a multiplier $M_l$ to each quantizer level $l$, with values smaller than unity for levels near zero and values larger than unity for the outer levels. The step size for sample $n$ is updated from the quantized level $l(n-1)$ of the previous sample by the simple formula
$\Delta(n) = M_{l(n-1)}\,\Delta(n-1).$
G.726 uses fixed quantizer decision levels based on the logarithm of the scaled difference signal $g(n)$.
[Figure: G.726 quantizer characteristic]
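A minimal sketch of such a backward-adaptive (Jayant) quantizer in Python; the 2-bit quantizer and the multiplier table are illustrative values, not those of G.726.

```python
import numpy as np

def jayant_quantize(d, step0=0.5, multipliers=(0.8, 1.6)):
    """2-bit backward-adaptive (Jayant) quantizer for a difference signal d.

    Returns (codes, reconstructed values). A decoder can run the same step-size
    recursion because it is driven only by the transmitted codes.
    """
    step = step0
    codes = np.empty(len(d), dtype=int)
    recon = np.empty(len(d))
    for i, x in enumerate(d):
        level = 0 if abs(x) < step else 1            # inner / outer magnitude level
        sign = 1.0 if x >= 0.0 else -1.0
        codes[i] = level + (0 if sign > 0 else 2)    # 4 codewords in total
        recon[i] = sign * (level + 0.5) * step       # mid-rise reconstruction value
        step = float(np.clip(step * multipliers[level], 1e-4, 10.0))  # shrink or expand
    return codes, recon

d = np.random.randn(2000) * np.repeat([0.2, 2.0], 1000)   # quiet half, loud half
codes, dq = jayant_quantize(d)
print("SNR (dB):", 10 * np.log10(np.sum(d ** 2) / np.sum((d - dq) ** 2)))
```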

10 Adaptive Differential Pulse Code Modulation (ADPCM)
G.726 backward predictor: the signal estimate is computed from the reconstructed signal $s_r(n)$ and the quantized difference signal $d_q(n)$ as
$\hat{s}(n) = \sum_{i=1}^{2} a_i\, s_r(n-i) + \sum_{i=1}^{6} b_i\, d_q(n-i).$
In the z-domain the predictor therefore consists of a 6-tap all-zero path and a 2-tap all-pole path, giving a pole-zero transfer function (2 poles, 6 zeros). Both sets of predictor coefficients are updated using a simplified (sign-sign) gradient algorithm; for example, the first coefficient of the second-order (pole) section is adapted from the signs of the current and previous prediction errors.

11 Audio Compression by Exploiting Psychoacoustic Properties
Coding of audio signals by trying to follow the waveform shape cannot achieve a compression ratio higher than about 4 if quality is to be maintained. For applications such as a flash-card music player, a compression ratio of at least 10 is required. To achieve this, it is necessary to exploit the properties of psychoacoustics so that the quantization errors inherent in the coding process are inaudible. This is so-called transparent coding: the coded audio and the source audio are indistinguishable even to expert listeners.
[Figure: basic structure of a psychoacoustic coder]

12 Basic Structure of a Psychoacoustic Coder
Principle: the input signal is first divided into several sub-bands using a filter bank, typically a quadrature mirror filter (QMF) bank implemented with a polyphase structure. Each sub-band signal is then quantized according to the bit-allocation information obtained from masking analysis. The masking analysis is done via an FFT, critical-band grouping of spectral components, and computation of the signal-to-mask ratio (SMR). The quantized sub-band signals are then packed into the output bit stream.
Filter bank: divides the signal band into several sub-bands. The sub-band filtered signals are then downsampled to the sub-band rate for further processing.

13 Basic Structure of a Psychoacoustic Coder
Filter bank: after downsampling, the output of each analysis filter contains the wanted baseband term plus aliasing terms. It is impossible to construct an ideal (brick-wall) sub-band filter, so aliasing results after downsampling. In general, sub-band filtering is therefore not a perfect-reconstruction process; however, it is possible to build near-perfect sub-band filter banks by designing the filters so that the aliasing is small (or cancels in the synthesis bank).
Downsampling: the signal is re-sampled from the input sampling rate $f_s$ to the sub-band rate $f_s/M$, where $M$ is the number of sub-bands and each sub-band has bandwidth $f_s/(2M)$, with upper band edges at multiples of $f_s/(2M)$.
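As a toy illustration of sub-band splitting and downsampling (not the 32-band polyphase bank used in MPEG audio), the sketch below uses the 2-band Haar QMF pair, which happens to give perfect reconstruction:

```python
import numpy as np

def haar_analysis(x):
    """Split x into low and high sub-bands and downsample by 2 (2-band Haar QMF)."""
    x = x[: len(x) // 2 * 2]                       # force even length
    low = (x[0::2] + x[1::2]) / np.sqrt(2)         # averaging filter, decimated
    high = (x[0::2] - x[1::2]) / np.sqrt(2)        # differencing filter, decimated
    return low, high

def haar_synthesis(low, high):
    """Upsample and recombine the two sub-bands; exact inverse of the analysis."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.random.randn(64)
lo, hi = haar_analysis(x)
assert np.allclose(haar_synthesis(lo, hi), x)      # perfect reconstruction
```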

14 Basic Structure of a Psychoacoustic Coder
Masking analysis: critical bands mask neighbouring bands according to the psychoacoustic model. Signals are coded only when they are above the masking threshold and ignored if they are below it; the bit allocations for quantizing the coded signals are dynamic, depending on how far they are above the threshold.
MUSICAM (Masking-pattern adapted Universal Subband Integrated Coding And Multiplexing) algorithm example: after analysis, the levels of the first 16 of the 32 bands are obtained (in this example, band 7 is at 10 dB, band 8 at 60 dB and band 9 at 35 dB). If the 60 dB signal in the 8th band gives a masking of 12 dB in the 7th band and 15 dB in the 9th band, then:
the level in the 7th band is 10 dB (< 12 dB), so it is inaudible and is ignored;
the level in the 9th band is 35 dB (20 dB above the 15 dB mask), so it is sent. Only the amount above the masking level needs to be coded, so instead of using 6 bits we can use 4 bits, giving 24 dB SQNR; the quantization noise then sits at 35 - 24 = 11 dB, still 4 dB below the 15 dB mask, saving 2 bits.
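The sketch below reproduces this arithmetic for arbitrary band levels and masking thresholds, assuming the usual rule of thumb of roughly 6 dB of SQNR per bit; the threshold values are the ones quoted in the example above.

```python
import math

def bits_needed(level_db, mask_db, db_per_bit=6.0):
    """Bits so that the quantization noise (level - 6*bits) stays below the mask."""
    if level_db <= mask_db:
        return 0                              # fully masked: do not code the band at all
    return math.ceil((level_db - mask_db) / db_per_bit)

# the MUSICAM example: band 7 masked at 12 dB, band 9 masked at 15 dB
print(bits_needed(10, 12))   # 0 bits -> inaudible, skip the band
print(bits_needed(35, 15))   # 4 bits -> 24 dB SQNR, noise ends up 4 dB under the mask
```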

15 Psychoacoustics Coding: MPEG Coding Standards
MPEG, which stands for Moving Picture Experts Group, is the name of a family of standards used for coding audio-visual information (e.g., movies, video, music) in a digital compressed format.
History of MPEG Audio:
MPEG-1 two-channel coding standard (Nov. 1992)
MPEG-2 extension towards Lower Sampling Frequencies (LSF) (1994)
MPEG-2 backwards-compatible multichannel coding (1994)
MPEG-2 higher-quality multichannel standard (MPEG-2 AAC) (1997)
MPEG-4 Advanced Audio Coding (AAC)
MPEG-4 with added functionalities, e.g., Spectral Band Replication (SBR), 2003; SinuSoidal Coding (SSC), 2004
MPEG-4 ALS (Audio Lossless Coding), 2006
MPEG Surround, 2007
MPEG Spatial Audio Object Coding (SAOC)

16 Psychoacoustics Coding: MPEG-1 Audio Coding
The MPEG-1 standard (ISO/IEC 11172) has a gross bit rate of 1.5 Mbit/s for audio and video: about 1.2 Mbit/s for video and 0.3 Mbit/s for audio. Compression factors range from 2.7 to 24 (typically around 12). With a compression ratio of 6:1 (16-bit stereo sampled at 48 kHz reduced to 256 kbit/s) and optimal listening conditions, expert listeners could not distinguish between coded and original audio clips.
MPEG-1 audio supports sampling frequencies of 32, 44.1 and 48 kHz and one or two audio channels in one of four modes:
Monophonic -- a single audio channel
Dual-monophonic -- two independent channels, e.g., English and French
Stereo -- stereo channels that share bits but do not use joint-stereo coding
Joint-stereo -- takes advantage of the correlation between the stereo channels
The basic algorithm:
1. Use a quadrature mirror filter bank to divide the audio signal (e.g., 48 kHz sound) into 32 frequency sub-bands (sub-band filtering).
2. Determine the amount of masking for each band caused by nearby bands using the psychoacoustic model, via a separate FFT analysis.
3. If the power in a band is below the masking threshold, do not encode it.
4. Otherwise, determine the number of bits needed to represent the coefficients such that the noise introduced by quantization stays below the masking level (recall that one fewer bit of quantization adds about 6 dB of noise).
5. Format the bitstream.

17 MPEG-1 Audio Coding
MPEG provides 3 layers of compression with various coding complexities and bit rates. The basic model is the same, but codec complexity increases with each layer. MP3 coders implement Layer 3 of the MPEG-1 audio coding standard.
[Figure: block diagram of the MPEG audio coder]

18 MPEG-1 Audio Coding
Polyphase filterbank in the MPEG-1 audio coder:
32 sub-bands of equal width
512-tap FIR prototype filter
non-perfect reconstruction, with frequency overlap between bands
low complexity: about 80 multiplications/additions per output sample
The sub-bands overlap at 3 dB with the adjacent bands; the leakage into other bands is small, and the total response almost adds up to unity (0 dB).

19 MPEG-1 Audio Coding
Layer I coding: divides the data into frames, each containing 384 samples (a duration of 8 ms at 48 kHz sampling), i.e., 12 samples from each of the 32 filtered sub-bands. The sub-bands are equally spaced in frequency, i.e., they are not critical bands. The psychoacoustic model uses frequency masking only.
The Layer 1 psychoacoustic model uses a 512-point FFT to get detailed spectral information about the signal. Both tonal (sinusoidal) and non-tonal (noise) maskers are derived from the FFT spectrum. Each masker produces a masking threshold depending on its frequency, intensity and tonality. For each sub-band, the individual masking thresholds are combined into a global masking threshold. The masking threshold is compared with the maximum signal level for the sub-band, producing a signal-to-mask ratio (SMR) which determines the bit allocation for the quantizer.

20 MPEG-1 Audio Coding
Layer I psychoacoustic model: the spectral intensity of the input signal is computed from the FFT spectrum $X(k)$, with a gain-correction factor of 8/3 to compensate for the Hanning analysis window, and the input amplitude is assumed to be limited to a fixed reference level. From the spectral intensities the model derives the signal intensity, the intensity within each scale-factor band, and the relative intensity of the masked threshold (this is also the basis of the perceptual-entropy concept). Tonal maskers are detected as local maxima of the spectrum.

21 MPEG-1 Audio Coding
Layer I psychoacoustic model (continued): the SMR is the difference in level between a signal component and the masking threshold at a given frequency. The spread of masking is expressed in terms of a frequency-dependent factor on the critical-band (Bark) scale $z$ and a constant $c$ of about 3 to 6 dB.
Addition of masking: the combined masking curve can be obtained either by simply adding the intensities of the individual masking curves (used in MPEG-1) or by taking the highest, i.e., dominant, individual masking curve (used in MPEG-2 and AC-2).

22 MPEG-1 Audio Coding
Layer I coding, quantization and frame packing: the Layer 1 quantizer/encoder first examines each sub-band's samples, finds the maximum absolute value of these samples, and quantizes it to 6 bits; this is the scale factor for the sub-band. It then determines the bit allocation for each sub-band by minimizing the total noise-to-mask ratio with respect to the bits allocated to each sub-band. (Heavily masked sub-bands may end up with zero bits, so that no samples are encoded.) Finally, the sub-band samples are linearly quantized to the bit allocation for that sub-band.
Layer I frame packing: each frame starts with header information for synchronization and bookkeeping (32 bits, detailed on the next slide) and an optional 16-bit cyclic redundancy check (CRC) for error detection. Each of the 32 sub-bands gets 4 bits to describe its bit allocation and 6 bits for its scale factor. The remaining bits in the frame carry the sub-band samples, with an optional trailer for extra (ancillary) information.
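A minimal sketch of the per-sub-band scaling and linear quantization described above (12 samples per sub-band, a scale factor taken from the block maximum, then a uniform mid-tread quantizer at the allocated word length); the normalization details are simplified relative to the real Layer I tables.

```python
import numpy as np

def quantize_subband(samples, bits):
    """Scale a block of 12 sub-band samples and quantize them uniformly."""
    scale = np.max(np.abs(samples)) + 1e-12        # block scale factor (6 bits in Layer I)
    if bits == 0:
        return scale, np.zeros(len(samples), dtype=int), np.zeros(len(samples))
    levels = 2 ** bits - 1                         # odd number of levels (mid-tread)
    normalized = samples / scale                   # now in [-1, 1]
    codes = np.round((normalized + 1.0) / 2.0 * (levels - 1)).astype(int)
    dequant = (codes / (levels - 1)) * 2.0 - 1.0
    return scale, codes, dequant * scale

block = 0.3 * np.sin(2 * np.pi * 0.1 * np.arange(12))
scale, codes, recon = quantize_subband(block, bits=4)
print("max error:", np.max(np.abs(block - recon)))
```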

23 MPEG-1 Audio Coding
Layer I coding, quantization and frame packing (continued):
Header = 12-bit sync word + 20 bits of system information: 1 bit for ID (1 = MPEG), 2 bits for layer (I, II or III), 1 bit for error protection, 4 bits for the bit-rate index, 2 bits for sampling frequency, 1 padding bit, 1 private bit, 2 bits for mode, 2 bits for mode extension, 1 copyright bit, 1 bit for original/copy, 2 bits for emphasis.
Scale factor: 6 bits covering indices 0 to 62, representing a quantized gain ranging downward from 2.0 with a dynamic range of 120 dB.
Bit allocation: 4 bits per sub-band. An allocation of one bit is not used because a mid-tread quantizer is employed; zero bits are allocated to heavily masked (low-SMR) bands; the allocation value 15 is forbidden.
The highest quality is achieved at a bit rate of 384 kbps. Typical applications of Layer 1 include digital recording on tapes, hard disks or magneto-optical disks, which can tolerate the high bit rate.

24 MPEG-1 Audio Coding
Layer II coding: uses three frames in filtering (previous, current and next, a total of 3 x 384 = 1152 samples), which models a little of the temporal masking. The Layer 2 time-frequency mapping is the same as in Layer 1. The Layer 2 psychoacoustic model is similar to the Layer 1 model, but it uses a 1024-point FFT for greater frequency resolution; it uses the same procedure as the Layer 1 model to produce signal-to-mask ratios for each of the 32 sub-bands.
The Layer 2 quantizer/encoder is similar to that of Layer 1. However, Layer 2 frames are three times as long as Layer 1 frames, so Layer 2 allows each sub-band a sequence of three successive scale factors, and the encoder uses one, two or all three, depending on how much they differ from each other. This gives, on average, a factor-of-2 reduction in the bit rate for the scale factors compared with Layer 1. Bit allocations are computed in a similar way to Layer 1.
Layer 2 processes the input signal in frames of 1152 PCM samples; at 48 kHz, each frame carries 24 ms of sound. The highest quality is achieved at a bit rate of 256 kbps, but quality is often good down to 64 kbps. Typical applications of Layer 2 include audio broadcasting, television, consumer and professional recording, and multimedia. Audio files on the World Wide Web with the extension .mpeg2 or .mp2 are encoded with MPEG-1 Layer 2.

25 MPEG-1 Audio Coding
Layer II coding (continued): the Layer 2 frame packer uses the same header and CRC structure as Layer 1. The number of bits used to describe the bit allocation varies with sub-band: 4 bits for the low sub-bands, 3 bits for the middle sub-bands, and 2 bits for the high sub-bands (this follows the critical bandwidths). The scale factors (one, two or three depending on the data) are encoded along with a 2-bit code describing which combination of scale factors is used. The sub-band samples are quantized according to the bit allocation and then combined into groups of three (called granules); each granule is encoded with one code word. This allows Layer 2 to exploit much more of the redundant signal information than Layer 1.

26 MPEG-1 Audio Coding Psychoacoustics Coding Layer III coding Layer 3 uses a better critical band filter (non-equal frequencies), psychoacoustic model includes temporal masking effects, takes into account stereo redundancy, and uses Huffman coder for further lossless compression of the quantized signals The filter bank used in MPEG Layer-3 is a hybrid filter bank which consists of a 32 channel polyphase filter bank and a Modified Discrete Cosine Transform (MDCT). This provides a fine grain frequency resolution. This hybrid form was chosen for reasons of compatibility to its predecessors, Layer-1 and Layer-2 because MDCT is applied after the polyphase filter. 26

27 MPEG-1 Audio Coding
Layer III coding: Layer 3 is substantially more complicated than Layer 2. It uses both polyphase and (modified) discrete cosine transform filter banks, a polynomial-prediction psychoacoustic model, and sophisticated non-uniform quantization and encoding schemes allowing variable-length frames. The frame packer includes a bit reservoir which allows more bits to be used for portions of the signal that need them. Layer 3 is intended for applications where a critical need for a low bit rate justifies the expensive and sophisticated encoding system; it allows high-quality results at bit rates as low as 64 kbps. Typical applications are in telecommunications and professional audio, such as commercially published music and video. The widely used MP3 format is Layer III coding of MPEG-1.
Stereo redundancy coding:
Intensity stereo coding -- in the upper-frequency sub-bands, encode the summed signal instead of independent signals for the left and right channels.
Middle/Side (MS) stereo coding -- encode the middle (sum of left and right) and side (difference of left and right) channels.

28 MPEG-1 Audio Coding
Layer III vs Layer I and Layer II:
modified DCT (MDCT) filter bank
critical bands on the Bark scale
Huffman coding for entropy reduction
dynamics compression
coding of the difference and sum of the stereo signals
Effectiveness of MPEG audio. Quality factor: 5 - perfect, 4 - just noticeable, 3 - slightly annoying, 2 - annoying, 1 - very annoying. The real delay (encoding + transmission + decoding) is about 3 times the theoretical minimum delay.
Layer 1: target 192 kb/s, compression ratio 4:1, theoretical minimum delay 19 ms
Layer 2: target 128 kb/s, compression ratio 6:1 - 8:1, quality (MOS) 2.1 to 2.6 at 64 kb/s, theoretical minimum delay 35 ms
Layer 3: target 64 kb/s, compression ratio 10:1 - 12:1, quality (MOS) 3.6 to 3.8 at 64 kb/s, theoretical minimum delay 59 ms
[Figure: MPEG audio bit-stream format]

29 MPEG-2 Audio Coding
MPEG-2 BC is a multichannel standard that is backward compatible with the MPEG-1 coder. The number of channels can be between 1 and 48. It allows for ITU-R "indistinguishable" quality at data rates of 320 kbps for five full-bandwidth audio channels. It supports scalable sampling rates from 8 kHz to 96 kHz with low and high complexities. The multichannel MPEG-2 format uses a basic five-channel approach sometimes referred to as 3/2+1 stereo (3 front and 2 surround channels + subwoofer); the low-frequency effects (LFE) subwoofer channel is optional. An encoder matrix allows a two-channel decoder to decode a compatible two-channel signal that is a subset of the multichannel bit stream.

30 MPEG-2 Audio Coding
The MPEG-1 left and right channels are replaced by matrixed MPEG-2 left and right channels, and these are encoded into backward-compatible MPEG frames with an MPEG-1 encoder. Additional multichannel data is placed in the expanded ancillary data field; a standard two-channel decoder ignores the ancillary information and reproduces the front main channels.

31 MPEG-2 Advanced Audio Coding (AAC)
Initially, MPEG-2 AAC was called MPEG-2 non-backward-compatible (NBC) coding. The standardization of MPEG-2 NBC started after the completion of the MPEG-2 BC multichannel standard; MPEG-2 NBC was renamed MPEG-2 AAC and was adopted in 1997 as part of the MPEG-2 standard. The MPEG-2 AAC format codes stereo or multichannel sound at a bit rate of about 64 kbps per channel, and provides 5.1-channel coding at an overall rate of 384 kbps. MPEG-2 AAC is not backward compatible with MPEG-1. AAC supports input channel configurations of 1/0 (mono), 2/0 (two-channel stereo) and multichannel configurations up to 3/2+1, with provision for up to 48 channels.
[Figure: bit-rate comparison of the audio coding systems used in DAB, DVB and DVD, including MPEG-1/2 Audio Layers I-III, MPEG-2 "LSF", Dolby AC-3, CCITT G.722 and NICAM]

32 MPEG-2 Advanced Audio Coding (AAC)
MPEG-2 AAC profiles: to allow flexibility in audio quality versus processing requirements, the AAC format defines three profiles: main, low complexity, and scalable sampling rate. The main profile provides the highest audio quality at any given bit rate, and all the features of AAC are employed; a main-profile decoder can also decode low-complexity bit streams.
Main profile: all AAC tools except the preprocessing tool are used.
Low Complexity (LC) profile: the prediction and preprocessing tools are not used and the TNS module has a limited order.
Scalable Sampling Rate (SSR) profile: the preprocessing tools are required; they comprise a polyphase quadrature filter, gain detectors and gain modifiers. In this profile the prediction tool is not used and the TNS order and bandwidth are limited.

33 MPEG-2 Advanced Audio Coding (AAC)
[Figure: MPEG-2 AAC encoder block diagram]

34 MPEG-2 Advanced Audio Coding (AAC) Psychoacoustics Coding MPEG-2 AAC Encoder The crucial differences between MPEG-2 AAC and its predecessor MPEG- 1 Audio Layer-3 are shown as follows: Filter bank: in contrast to the hybrid filter bank of ISO/MPEG-1 Audio Layer-3 - chosen for reasons of compatibility but displaying certain structural weaknesses - MPEG-2 AAC uses a plain Modified Discrete Cosine Transform (MDCT). Together with the increased window length (1024 instead of 576 spectral lines per transform) the MDCT outperforms the filter banks of previous coding methods. Temporal Noise Shaping (TNS): A true novelty in the area of time/frequency coding schemes. It allows controlling the fine structure of quantization noise within the filter bank window and shapes the distribution of quantization noise in time by prediction in the frequency domain. In particular voice signals experience considerable improvement through TNS. Prediction: with better signal prediction. It benefits from the fact that certain types of audio signals are easy to predict so incorporating a better signal prediction algorithm can improve the performance. Quantization: by allowing finer control of quantization resolution, the given bit rate can be used more efficiently. Bit-stream format: the information to be transmitted undergoes entropy coding in order to keep redundancy as low as possible. The optimization of these coding methods together with a flexible bit-stream structure has made further improvement of the coding efficiency possible. 34

35 MPEG-2 Advanced Audio Coding (AAC)
Principle of MPEG AAC encoding - preprocessing (used in the SSR profile only): it includes a polyphase quadrature filter (PQF), gain detectors and gain modifiers. The PQF produces four bandwidth-limited outputs; for a signal with 48 kHz sampling, it outputs signals with bandwidths of 24 kHz, 18 kHz, 12 kHz and 6 kHz. The pre-echo effect can be suppressed by using the gain-control tool: the amplitude of each PQF band can be manipulated independently by the gain detectors and modifiers, and the gain control can be used together with all types of window sequences. The time resolution of the gain control is about 0.7 ms at a 48 kHz sampling rate.

36 MPEG-2 Advanced Audio Coding (AAC)
Principle of MPEG AAC encoding - filter bank: the AAC encoder uses the modified discrete cosine transform (MDCT) as the input filter bank. The MDCT adopts a technique called time-domain aliasing cancellation (TDAC), in which the windowed signals are overlapped and added to achieve perfect reconstruction.
Forward MDCT: $X(k) = \sum_{n=0}^{N-1} x(n)\cos\!\left(\frac{2\pi}{N}\,(n + n_0)\,(k + \tfrac{1}{2})\right), \quad k = 0, \ldots, N/2 - 1$
Inverse MDCT: $x(n) = \frac{4}{N}\sum_{k=0}^{N/2-1} X(k)\cos\!\left(\frac{2\pi}{N}\,(n + n_0)\,(k + \tfrac{1}{2})\right)$
where $N$ is the transform block length and $n_0 = (N/2 + 1)/2$.
The MPEG AAC filter bank was designed to enable a smooth change from one window shape to another in order to better adapt to input signal conditions. It uses transform blocks of 2048 samples for stationary signals and 256 samples for transient signals. Two window shapes are employed in the filter bank - one is used when perceptually important components are spaced closer than 140 Hz, and the other when components are spaced more than 220 Hz apart.
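A numerical sketch of the MDCT/IMDCT pair and the TDAC property, using a sine window that satisfies the Princen-Bradley condition; this illustrates the transform itself, not the exact normalization or window tables of the AAC standard.

```python
import numpy as np

def mdct(frame, window):
    """MDCT of one 2M-sample windowed frame -> M coefficients."""
    two_m = len(frame)
    m = two_m // 2
    n = np.arange(two_m)
    k = np.arange(m)
    basis = np.cos(np.pi / m * (n[None, :] + 0.5 + m / 2.0) * (k[:, None] + 0.5))
    return basis @ (window * frame)

def imdct(coeffs, window):
    """Inverse MDCT -> 2M time-aliased samples, to be windowed and overlap-added."""
    m = len(coeffs)
    n = np.arange(2 * m)
    k = np.arange(m)
    basis = np.cos(np.pi / m * (n[None, :] + 0.5 + m / 2.0) * (k[:, None] + 0.5))
    return (2.0 / m) * window * (basis.T @ coeffs)

M = 64
win = np.sin(np.pi / (2 * M) * (np.arange(2 * M) + 0.5))   # sine window: w[n]^2 + w[n+M]^2 = 1
x = np.random.randn(4 * M)
y = np.zeros_like(x)
for start in range(0, 2 * M + 1, M):                        # 50%-overlapped frames
    frame = x[start:start + 2 * M]
    y[start:start + 2 * M] += imdct(mdct(frame, win), win)
# the time-domain aliasing cancels in the fully overlapped region
assert np.allclose(y[M:3 * M], x[M:3 * M])
```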

37 MPEG-2 Advanced Audio Coding (AAC)
Principle of MPEG AAC encoding - window switching: kept synchronized at the encoder and decoder. It allows the encoder to control temporal pre-echo noise: since quantization is performed in the frequency domain, the quantization error spreads over the whole window in the time domain, producing pre-echo noise for transient signals, so transient signals are best encoded with a short transform length. A typical window-switching sequence is: (a) long window, (b) long-start window, (c) eight short windows, (d) long-stop window, (e) long window.
Sine or Kaiser-Bessel Derived (KBD) window: the encoder can select the optimum window shape according to the characteristics of the input signal. To maintain perfect reconstruction, the shape of the left half of each window must always match the shape of the right half of the preceding window.

38 MPEG-2 Advanced Audio Coding (AAC) Principle of MPEG AAC Encoding Temporal noise shaping Psychoacoustics Coding Perceptual coding is especially difficult if there are temporal mismatches between the masking threshold and the quantization noise (pre-echo). The TNS technique allows the encoder to control the fine temporal structure of the quantization noise even within a filterbank window. TNS is basically a technique that performs noise shaping by applying linear prediction at MDCT coefficients. The TNS filtering block implements an in-place filtering operation on the spectral values, which means that it replaces the set of spectral coefficients input to the TNS filtering block with the corresponding prediction residual. Prediction coefficients of the TNS filtering are obtained by applying a linear predictive analysis of the spectral coefficients. The combination of the encoder filterbank and the adaptive TNS prediction filter is a compound continuous signal adaptive filterbank which adapts between a high-frequency resolution filterbank (for stationary signals) and a high-time resolution filterbank (for transient signals) dynamically. 38
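A rough sketch of the TNS idea, assuming LPC coefficients are estimated from the MDCT coefficients of one window and the spectrum is then replaced in place by the prediction residual; the real tool restricts the filter order and the frequency range, which is omitted here.

```python
import numpy as np
from scipy.signal import lfilter

def tns_analysis(spectrum, order=4):
    """Filter the spectrum with its own prediction-error filter A(z);
    returns (residual spectrum, prediction coefficients)."""
    r = np.array([np.dot(spectrum[: len(spectrum) - m], spectrum[m:])
                  for m in range(order + 1)])
    toeplitz = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(toeplitz, r[1:order + 1])   # normal equations R a = r
    b = np.concatenate(([1.0], -a))
    return lfilter(b, [1.0], spectrum), a

def tns_synthesis(residual, a):
    """Decoder side: run the inverse (all-pole) filter over the residual spectrum."""
    b = np.concatenate(([1.0], -a))
    return lfilter([1.0], b, residual)

spectrum = np.random.randn(1024) * np.linspace(2.0, 0.1, 1024)   # toy MDCT coefficients
res, a = tns_analysis(spectrum)
assert np.allclose(tns_synthesis(res, a), spectrum)              # in-place filtering is invertible
```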

39 MPEG-2 Advanced Audio Coding (AAC)
Principle of MPEG AAC encoding - joint stereo coding: exploits binaural psychoacoustic effects to reduce the bit rate for stereophonic signals significantly below the rate required for coding the input channels separately. Two techniques are used:
M/S stereo coding: also known as sum/difference coding. M/S stereo coding can be used to overcome the problem caused by binaural masking level depression, where a signal at lower frequencies (< 2 kHz) can show as much as a 20 dB difference in masking threshold depending on the phase of the signal and the noise present. M/S stereo coding is applied to each channel pair of the multichannel audio source, i.e., the left/right front and left/right surround channels. M/S coding can be switched on and off in time or in frequency depending on the characteristics of the input signal.
Intensity stereo coding: similar to techniques based on dynamic crosstalk or channel coupling. The technique is based on the fact that the perception of high-frequency sound components relies mainly on the analysis of their energy-time envelopes rather than on the waveforms themselves. It is therefore possible to transmit only a single set of spectral values shared among several audio channels while still achieving excellent sound quality.
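The M/S matrixing itself is just a sum/difference rotation of the channel pair; a minimal sketch follows (the per-band on/off switching that AAC applies on top of this is omitted):

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side matrixing: an exactly invertible channel rotation."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def ms_decode(mid, side):
    return mid + side, mid - side

left = np.random.randn(1024)
right = 0.9 * left + 0.1 * np.random.randn(1024)    # highly correlated channels
mid, side = ms_encode(left, right)
l2, r2 = ms_decode(mid, side)
assert np.allclose(l2, left) and np.allclose(r2, right)
# most of the energy ends up in the mid channel, so the side channel is cheap to code
print("side/mid energy ratio:", np.sum(side**2) / np.sum(mid**2))
```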

40 MPEG-2 Advanced Audio Coding (AAC) Principle of MPEG AAC Encoding Psychoacoustics Coding Prediction is used to further reduce the redundancy of signals, especially stationary signals. Each spectral component at the filterbank s output (up to 16kHz) are input to the prediction module. The predictor is a backward adaptive predictor which exploits the correlation of spectral components of consecutive frames. An LMS adaptation algorithm is used to calculate the predictor coefficients (order two) on a frame-by-frame basis. Prediction can be switched on and off dynamically according to the bit saving achieved with prediction or not. 40

41 MPEG-2 Advanced Audio Coding (AAC)
Principle of MPEG AAC encoding - quantization and coding:
A global gain, quantized to 8 bits.
49 scale factors, differentially quantized and then Huffman coded.
A non-uniform quantizer, approximately of the form $x_q = \mathrm{sgn}(x)\,\mathrm{nint}\!\big[(|x|\cdot 2^{-QS/4})^{3/4}\big]$, where QS is a global quantization step size with a resolution of 1.5 dB.
Two constraints exist during quantization: meet the requirements of the psychoacoustic model, and keep the total number of bits below a certain limit. The strategy adopted by AAC is to use two nested iteration loops, the inner and outer iteration loops.
Main features of AAC quantization:
Non-uniform quantization increases SNR; the finest step is 1.5 dB.
Huffman coding of spectral values with different probability models; probability tables of two and four dimensions are used.
Noise shaping by amplifying groups of spectral values; the amplification information is stored in the scale factors, shaping the quantization noise in units similar to the critical bands of the human auditory system.
Huffman coding of the differences between scale factors.
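A sketch of such a 3/4-power companding quantizer and its inverse in Python; the rounding offset used by real encoders is omitted, so this only illustrates the companding and the 1.5 dB granularity implied by the $2^{QS/4}$ step-size factor (all names are illustrative).

```python
import numpy as np

def aac_like_quantize(x, qs):
    """Non-uniform (power-law) quantization controlled by a global step index qs."""
    return np.sign(x) * np.round((np.abs(x) * 2.0 ** (-qs / 4.0)) ** 0.75)

def aac_like_dequantize(q, qs):
    """Inverse companding: expand by the 4/3 power and undo the step-size scaling."""
    return np.sign(q) * (np.abs(q) ** (4.0 / 3.0)) * 2.0 ** (qs / 4.0)

x = np.random.randn(1024) * 100.0
for qs in (0, 4, 8):     # each +4 in qs doubles the effective step (+6 dB); +1 is 1.5 dB
    err = x - aac_like_dequantize(aac_like_quantize(x, qs), qs)
    print(f"qs={qs:2d}  rms error={np.sqrt(np.mean(err**2)):.3f}")
```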

42 MPEG-2 Advanced Audio Coding (AAC)
[Figure: MPEG AAC quantization and coding module]

43 MPEG-2 Advanced Audio Coding (AAC)
Principle of MPEG AAC encoding - quantization and coding (continued):
The task of the inner iteration loop is to control the quantizer step size so that the given spectral data can be coded within the number of available bits: start with an initial quantizer step size, quantize, count the bits, and repeat with an increased step size if the bit count exceeds the available limit.
The task of the outer iteration loop is to change the amplification (scale factors) of the spectral coefficients in all scale-factor bands so that the demands of the psychoacoustic model are fulfilled as far as possible.
The inner and outer loops are applied iteratively until the best result is achieved or a termination condition is reached.
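A toy version of the inner rate loop, reusing the power-law quantizer sketched earlier and a crude stand-in for the bit count (a real encoder would count the actual Huffman code bits):

```python
import numpy as np

def quantize(x, qs):
    return np.sign(x) * np.round((np.abs(x) * 2.0 ** (-qs / 4.0)) ** 0.75)

def estimate_bits(q):
    """Stand-in bit count: roughly log2 of each magnitude plus a sign bit."""
    return int(np.sum(np.ceil(np.log2(np.abs(q) + 1)) + (q != 0)))

def inner_rate_loop(x, bit_budget, qs=0):
    """Increase the global step size until the spectrum fits in the bit budget."""
    while True:
        q = quantize(x, qs)
        if estimate_bits(q) <= bit_budget:
            return qs, q
        qs += 1                      # one step = 1.5 dB coarser quantization

x = np.random.randn(1024) * 50.0
qs, q = inner_rate_loop(x, bit_budget=3000)
print("chosen step index:", qs, "bits used:", estimate_bits(q))
```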

44 MPEG-2 Advanced Audio Coding (AAC) Quantization and coding Noiseless coding Group 4 quantized coefficients as magnitudes in excess of one, with a value of +/- 1 left in the quantized coefficient array to indicate the sign. The clipped coefficients are coded by integer magnitudes and an offset from the base of the coefficient array to mark their location. Each set of 1024 quantized coefficients is separated into sections so that a single Huffman codebook can be used to code each section. Sectioning is performed dynamically in order to minimize the required number of bits to represent the full set of quantized spectral coefficients. Grouping and interleaving For the case of 8 short windows, the set of 1024 coefficients is actually a matrix of 8 by 128 frequency coefficients. More coding gain can be achieved if the coefficients of these short windows are grouped and interleaved. Scale factors Psychoacoustics Coding There is a global gain that normalizes the scale factors. The global gain is represented by an 8-bit unsigned integer. There are 49 scale factors, most scale factor bands have 32 coefficients. Both the global gain and scale factors are quantized in 1.5dB steps. The difference between each scale factor and the previous scale factor are Huffman coded. 44

45 MPEG-4 Audio
MPEG-4 is a standard for every application that requires the use of advanced sound compression, synthesis, manipulation or playback. MPEG-4 audio integrates:
low bit rates with high-quality compression;
synthetic with natural sound processing;
speech with audio coding;
single-channel with multichannel audio configurations;
traditional with interactive and virtual-reality content.
Overview of MPEG-4 capabilities:
Speech tools: used for the transmission and decoding of natural or synthetic speech.
Audio tools: used for the transmission and decoding of recorded music and other audio soundtracks.
Synthesis tools: used for very-low-bit-rate description, transmission and synthesis of synthetic music and other sounds.
Composition tools: used for object-based coding, interactive functionality, and audiovisual synchronization.
Scalability tools: used for creating bit streams that can be transmitted, without re-coding, at several different bit rates.

46 MPEG-4 Audio Psychoacoustics Coding MPEG-4 General Audio Coding Tools MPEG-4 standard is capable of coding of natural audio at a wide bit range, including bit rates from 6kbit/s up to several hundred kbit/s per audio channel for mono, two-channel, and multichannel signals. In the upper bit rate range of MPEG-4 general audio coder, high-quality compression can be achieved by MPEG-2 AAC standard with certain improvements within the MPEG-4 tool set. Tools: Speech, General Audio, Scalability, Synthesis, Composition, Streaming, Error Protection General Frame Work of MPEG-4 Audio 46

47 MPEG-4 Audio Psychoacoustics Coding MPEG-4 Additions to AAC Besides the building blocks provided by MPEG-2 AAC, several new tools are added in the MPEG-4 T/F coder in order to improve the coding efficiency and offer new functionalities. Perceptual noise substitution (PNS) Long-term prediction (LTP) Transform-domain weighted interleave vector quantization (TwinVQ) coding kernel Low-delay AAC (AAC-LD) Error-resilience (ER) Bit Slice Arithmetic Coding (BSAC) instead of Huffman coding Perceptual noise substitution (PNS) The principle of PNS is to represent noise-like components of the input signal with a very compact parameter representation. It is based on the fact that the subjective sensation stimulated by a noise-like signal is determined by the input signal s spectral and temporal fine structure rather than its actual waveform. 47

48 MPEG-4 Audio Psychoacoustics Coding Perceptual noise substitution (PNS) The encoder analyses the input signal and determines noise-like signal components for each scalefactor band in each coding frame. If a particular scalefactor band is considered to be noise-like, no quantization and entropy coding is performed. Instead, these two steps are omitted and a noise substitution flag and the total power of the substituted set of spectral coefficients are transmitted to the decoder. The decoder analyses the transmitted information. If a noise-substitution flag is detected, pseudorandom noise with a total noise power equal to the transmitted level are inserted into the reconstructed data to replace the actual spectral coefficients. Since only a signaling flag and energy information are transmitted for each selected scalefactor band, PNS results in a highly compact representation for noiselike components in the input signals. It was found that PNS tool has the ability to enhance the coding efficiency for complex musical signals when coded at low bit rates. 48
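On the decoder side, PNS amounts to filling a scale-factor band with pseudo-random values scaled to the transmitted energy; a minimal sketch (the band boundaries and the way the energy is coded are illustrative):

```python
import numpy as np

def pns_encode_band(coeffs):
    """Encoder side: transmit only the total energy of a noise-like band."""
    return float(np.sum(coeffs ** 2))

def pns_decode_band(energy, width, rng):
    """Decoder side: insert pseudo-random noise with the transmitted total energy."""
    noise = rng.standard_normal(width)
    noise *= np.sqrt(energy / np.sum(noise ** 2))   # match the band energy exactly
    return noise

rng = np.random.default_rng(0)
band = rng.standard_normal(16) * 0.3                # a noise-like scale-factor band
energy = pns_encode_band(band)                      # only this number is transmitted
reconstructed = pns_decode_band(energy, len(band), rng)
print(energy, float(np.sum(reconstructed ** 2)))    # energies match; the waveforms differ
```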

49 MPEG-4 Audio
Long-term prediction (LTP): based on a well-known technique for coding speech signals, where it is used to reduce the redundancy of voiced speech, which is periodic in nature.

50 MPEG-4 Audio
Long-term prediction: the LTP tool uses the quantized spectral values of preceding frames to predict the input signal, as follows. An inverse TNS filter and a synthesis filter bank map the quantized spectral values back to a time-domain representation. The optimum parameters for the delay (long-term lag) and amplitude scaling (gain) are calculated by matching this reconstructed time signal to the actual input signal; they are then used to form the predicted signal. An analysis filter bank and a forward TNS filter transform both the input and the predicted signal into their spectral representations. A residual signal is obtained by subtracting the spectral values of the predicted signal from those of the input signal, and this difference signal is sent for quantization and entropy coding. To achieve the best performance, either the difference signal or the original input can be selected for quantization and coding, depending on the resulting bit rate.

51 MPEG-4 Audio Psychoacoustics Coding Transform-domain weighted interleave vector quantization (TwinVQ) TwinVQ is an additional quantization/coding process provided by MPEG-4 T/F coder other than the MPEG-2 AAC. It is an alternative coding kernel and is mainly designed to be used with the MPEG-4 scalable T/F audio coder. This TwinVQ tool is capable of providing good coding performance at extremely low-bit-rates for general types of audio signal, including music. Two steps involved in TwinVQ: Spectral normalization: flatten the amplitudes of spectral coefficients to a desirable range by using the LPC technique. An LPC model is used for representing the overall coarse spectral envelope of the input signal. The LPC model parameters are quantized and sent to the decoder as side information. Weighted vector quantization: the flattened spectral coefficients are first interleaved and divided into subvectors. Perceptual shaping of the quantization noise can be achieved by using an adaptive weighted distortion measure that is controlled by a perceptual model. Vector quantization is then applied on the weighted subvector with equal bit allocation for all subvectors. 51

52 MPEG-4 Audio
[Figure: block diagram of the TwinVQ quantization scheme]

53 MPEG-4 Audio
Low-Delay AAC (AAC-LD): the standard MPEG-4 T/F coder provides excellent coding performance for general audio at low bit rates, but there is a penalty to pay: its algorithmic delay can be up to several hundred milliseconds, which is not well suited to applications requiring low delay, such as real-time bidirectional communication. A low-delay version of MPEG-4 AAC LTP was developed to enable coding with an algorithmic delay down to 20 ms. It uses a frame length of 512 or 480 samples at 48 kHz sampling in low-delay mode. The look-ahead delay is avoided by disabling window switching: a sine window is used for non-transient parts of the signal, whereas a low-overlap window is applied to transient signals so that optimum TNS performance can be achieved and the effects of temporal aliasing from the MDCT filter bank are reduced. The use of the bit reservoir is minimized or disabled at the encoder.

54 MPEG-4 Audio
MPEG-4 scalable audio coding: the traditional perceptual audio coder is designed for a fixed target bit rate, which is not appropriate for applications where bit streams must be distributed over transmission channels with time-varying capacity. MPEG-4 audio provides a scalability functionality, integrated into the AAC coding framework, that generates a single bit stream which adapts to different channel characteristics.
Large-step scalable audio coding: the generated bit stream is composed of several partial bit streams that can be independently decoded and then combined. The base-layer coder is the first coder in the scheme and codes the input signal at a basic perceptual quality. The residual obtained by subtracting a local decoder's output from the input is coded by an enhancement-layer coder. The process continues to refine the signal by adding more enhancement layers to the bit stream until the target bit rate is reached.

55 MPEG-4 Audio
[Figures: architecture of the MPEG-4 large-step scalable audio encoder and decoder]

56 MPEG-4 Audio
MPEG-4 High Efficiency AAC (HE-AAC): an extension of low-complexity AAC (AAC-LC) optimized for low-bit-rate applications such as streaming audio. The HE-AAC version 1 profile uses spectral band replication (SBR) to enhance compression efficiency in the frequency domain. The HE-AAC version 2 profile couples SBR with Parametric Stereo (PS) to enhance the compression efficiency of stereo signals. It is a standardized and improved version of the aacPlus codec. HE-AAC is used in the DAB+ digital radio standard.
Spectral Band Replication: based on extending the harmonic spectrum of the low-frequency band into the high-frequency band. The codec encodes and transmits the low and mid frequencies of the spectrum, while SBR reconstructs the higher-frequency content at the decoder by transposing harmonics up from the low and mid frequencies; some guidance information for reconstructing the high-frequency spectral envelope is transmitted as side information.
Parametric Stereo: performs sparse coding in the spatial domain, somewhat as SBR does in the frequency domain. The stereo audio is down-mixed to mono, and additional PS side information, about 2.3 kbps, is sent along with the encoded mono stream.

57 Other Lossy Audio Waveform Coders
AC-3 (Dolby Digital) was preceded by the AC-1 and AC-2 codecs.
AC-1 uses adaptive delta modulation combined with analog companding. It is not a perceptual coder and has approximately a 3:1 compression ratio. It is used in satellite relays of television and FM programming as well as cable radio services.
AC-2 is a perceptual coder. It uses a 512-point FFT with 50% overlap and a Kaiser-Bessel window. Coefficients are grouped into subbands containing 1 to 15 coefficients to model critical bandwidths. It provides high-quality audio at a data rate of 256 kbps per channel. AC-2 is a registered .wav type, so AC-2 files are interchangeable between computer platforms.
AC-3 (Dolby Digital) is an outgrowth of the AC-2 encoding format. It is in widespread use in commercial cinema and is widely used to convey multichannel audio in applications such as DVD-Video, DTV and DBS. AC-3 is a perceptual coder. It can code from 1 to 6 channels as 3/2, 3/1, 3/0, 2/2, 2/1, 2/0 or 1/0, plus an optional LFE channel. It codes 5.1 (i.e., six) channels at 48 kHz at a nominal rate of 384 kbps (compression ratio about 13:1) and also supports bit rates from 32 to 640 kbps. The AC-3 coder is backward compatible with matrix surround sound, two-channel stereo and monaural formats.

58 Other Lossy Audio Waveform Coders
AC-3 coder description:
Filter bank: implements the MDCT using a 512-point transform yielding 256 spectral coefficients, with a Kaiser-Bessel-based window and 50% overlap. There is a total of 50 bands between 0 and 24 kHz; the bandwidths vary between 3/4 and 1/4 of the critical-bandwidth values. A transient detector can dynamically reduce the transform length from 512 to 256 for wideband transient signals.
Spectral coefficient quantization: each frequency coefficient is processed in a floating-point representation with a mantissa (0 to 16 bits) and a 5-bit exponent to maintain dynamic range. The coded exponents act as scale factors for the mantissas and represent the signal's spectral envelope. The spectral envelope is coded differentially from the lower-frequency adjacent bands; the first (DC) term is coded as an absolute value, and each differential is quantized to one of 5 possible values. The differential exponents are grouped for coding, e.g., a group of three differentials is coded in a 7-bit word; the choice of grouping depends on the signal characteristics, trading frequency against time resolution. The bit allocation for coding the mantissas follows the masking criteria; assignment is performed globally across all channels from a common bit pool. Quantized mantissas are scaled and offset, and dither is optionally employed when zero bits are allocated to a mantissa.

59 Other Lossy Audio Waveform Coders
[Figure: AC-3 encoder block diagram]

60 Psychoacoustics Coding Other Lossy Audio Waveform Coders Digital Theater System (DTS) The DTS perceptual coding algorithm (known as Coherent Acoustics) nominally codes 5.1 channels at a bit rate of 1.5 Mbps. It can operate over a range of bit rates (e.g., 8 to 512 kbps/channel), sampling frequencies (e.g., 24 to 192 khz) and resolution (e.g., 16 to 25 bits). It is a subband ADPCM coder with 32 uniform subbands. Input signal is divided into frame of 256, 512, 1024, 2048 or 4096 samples depending on the sampling frequency and output bit rate. Each subband is ADPCM coded. Audio signal is examined for psychoacoustic and transient information. A global bit management system allocates bits over all the coded subbands. The algorithm calculates scale factors and bit allocation indices and ultimately quantizes the ADPCM samples using from 0 to 24 bits. Words can be represented using variable-length entropy coding. The LFE channel is coded independently by decimating a full-bandwidth input, yielding a LFE bandwidth; ADPCM coding is then applied. 60

61 Performance Evaluation of Perceptual Audio Coders
Traditional audio devices are measured according to their small deviations from linearity; perceptual coders are highly nonlinear. The best objective test for a perceptual coder is an "artificial ear": to measure perceived accuracy, the algorithm contains a model that emulates the human hearing response.
Subjective listening test - Mean Opinion Score (MOS): subband coders can have unmasked quantization noise that appears as a burst of noise in a processing block; a coder with a long block length can exhibit a pre-echo burst of noise just before a transient, or there might be a tinkling sound. Expert listeners can be engaged to perform the subjective listening tests. The CCIR has developed a five-point impairment scale for the subjective evaluation of compression algorithms:
5 - Imperceptible
4 - Perceptible, but not annoying
3 - Slightly annoying
2 - Annoying
1 - Very annoying
Panels of listeners rate the compression algorithms on a continuous scale from 5.0 to 1.0. Original uncompressed material may receive an average score of 4.8 on this scale; when a coder also obtains an average score of 4.8, it is said to be transparent. Lower scores indicate how far from transparency a coder is; higher compression ratios generally score lower.

62 Performance Evaluation of Perceptual Audio Coders
Coding margin as noise-to-mask ratio (NMR): the original signal and the error signal (the difference between the original and coded signals) are subjected to FFT analysis, and the resulting spectra are divided into subbands. The masking threshold (maximum masked error energy) in each original-signal subband is estimated; the actual error energy in each coded-signal subband is determined and compared with the masking threshold; the ratio of error energy to masking threshold is the NMR in each subband. A positive NMR (in dB) indicates an audible artifact. The NMR values can be linearly averaged and expressed in dB; the NMR thus measures the remaining audibility headroom in the coded signal. NMR can be plotted over time to identify areas of coding difficulty. A masking flag, generated whenever the NMR exceeds 0 dB in a subband, can be used to count the impairments in a coded signal.
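A small sketch of this measurement, assuming the per-subband masking thresholds have already been obtained from a psychoacoustic model (here they are simply given as an array):

```python
import numpy as np

def nmr_db(original, coded, mask_energy, n_bands=8):
    """Per-band noise-to-mask ratio in dB from FFT analysis of the error signal."""
    err_spec = np.abs(np.fft.rfft(original - coded)) ** 2
    bands = np.array_split(err_spec, n_bands)              # crude uniform subbands
    err_energy = np.array([b.sum() for b in bands])
    return 10.0 * np.log10(err_energy / mask_energy)

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
x_coded = x + 0.01 * rng.standard_normal(1024)             # pretend coding error
mask = np.full(8, 10.0)                                    # assumed masked energy per band
nmr = nmr_db(x, x_coded, mask)
print("mean NMR (dB):", nmr.mean(), "masking flags:", int(np.sum(nmr > 0.0)))
```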

63 Performance Evaluation of Perceptual Audio Coders
Perceptual Evaluation of Audio Quality (PEAQ): PEAQ is a standardized algorithm for objectively measuring perceived audio quality, developed by the ITU in 2001 (ITU-R Recommendation BS.1387). It uses software to simulate the perceptual properties of the human ear and then integrates multiple model output variables (MOVs) into a single metric. PEAQ characterizes the perceived audio quality as subjects would in a listening test conducted according to the corresponding ITU-R BS listening-test methodology. PEAQ results principally model mean opinion scores (MOS) covering a scale from 1 (bad) to 5 (excellent).

64 Performance Evaluation of Perceptual Audio Coders
Perceptual Evaluation of Audio Quality (PEAQ): the model follows the fundamental properties of the auditory system and its different stages of physiological and psychoacoustic effects. The first part models the peripheral processing of the signal with a DFT and filter banks; the other part provides cognitive processing, as the human brain does. From the model comparison of the test signal with the (original) reference signal, a number of model output variables (MOVs) are derived; each MOV may measure a different psychoacoustic dimension. In the final stage the MOV values are combined to produce a MOS-like result that corresponds to subjective quality assessment. Two model versions are defined in the standard, basic and advanced: the basic version uses 11 MOVs for the final mapping to a quality measure, whereas the advanced version uses 5 MOVs.

65 Lossless Audio Coding: Audio Coding Formats for the Compact Disc (CD) and Digital Versatile Disc (DVD)
The compact disc: the CD-DA (Digital Audio) standard was developed by Philips and Sony and first introduced as a commercial product in 1982. It was followed by CD-ROM (1984), CD-I (1986), CD-WO (1988), Video CD (1994), CD-RW (1996) and SACD (1999). It uses a 780 nm laser and stores about 780 MB.
Digital Versatile Disc (DVD): uses a 650 nm laser with 4.7 GB storage per layer.
Blu-ray disc: introduced in 2006, uses a blue-violet laser (405 nm), with 27 GB on a single layer and 54 GB on a dual-layer disc; supported by Apple, Hitachi, Philips, Samsung, Sharp and Sony.
HD-DVD: 15 GB storage, by Toshiba; it lost the format war, so production has ended.

66 Audio Coding Formats for CD and DVD
CD-DA format: 16-bit PCM data sampled at 44.1 kHz, giving a bit rate of 1.41 Mbps; additional overhead such as error correction, synchronization and modulation is required. A disc holds 784 MB of user information, or 74 minutes of stereo audio. Information is contained in pits impressed into the disc's plastic substrate. The disc diameter is 12 cm; a pit is about 0.6 µm wide and a disc may hold about two billion of them. Each pit edge represents a binary 1; flat areas between pits, or areas within pits, are decoded as binary 0s. Data is read from the disc as a change in intensity of the reflected laser light. The pits are aligned in a spiral track running from the inside diameter of the disc to the outside; there are 22,188 revolutions across the disc's signal surface of 35.5 mm. The data are encoded with eight-to-fourteen modulation (EFM) for greater storage density, and with the Cross-Interleaved Reed-Solomon Code (CIRC) for error correction. EFM is an efficient and highly structured (2,10) run-length-limited (RLL) code; it is very tolerant of imperfections, provides very high density, and promotes stable clock recovery by a self-clocking decoder.

67 Audio Coding Formats for CD and DVD
CD-DA format (continued): all data on a CD is formatted in frames; all of the required data is placed into the frame format during encoding. Each frame consists of 588 channel bits. Six 32-bit PCM audio samples (left and right channels) are grouped into a frame, eight 8-bit parity symbols are generated per frame, and one subcode symbol is added per frame. Subcodes contain information describing where tracks begin and end, track numbers, disc timing, index points and other parameters. There are eight subcode channels (P, Q, R, S, T, U, V and W); CD-DA uses only the P and Q subcodes. The data in frame format is modulated using EFM, which increases the data density and helps facilitate control of the spindle-motor speed. Blocks of 14 channel bits are linked by three merging bits to maintain the proper run length between words, as well as to suppress dc content and aid clock synchronization.

68 Audio Coding Formats for CD and DVD
CD-DA format: a frame contains 588 channel bits. The three merging bits added between consecutive EFM words are also chosen to drive the running Digital Sum Value (DSV) towards zero.
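The 588-bit figure can be checked from the frame contents listed above; a short script doing the arithmetic (the 24-bit sync pattern plus its merging bits is the one item not spelled out in the text and is added here as an assumption):

```python
# CD frame bit budget (channel bits after EFM)
audio_bytes = 6 * 4          # six 32-bit (L+R 16-bit) PCM samples = 24 data bytes
parity_bytes = 8             # eight 8-bit CIRC parity symbols
subcode_bytes = 1            # one subcode symbol per frame
symbols = audio_bytes + parity_bytes + subcode_bytes   # 33 eight-bit symbols
channel_bits = symbols * (14 + 3)    # each symbol -> 14 EFM bits + 3 merging bits
channel_bits += 24 + 3               # frame sync pattern + its merging bits (assumption)
print(channel_bits)                  # 588
```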

69 Audio Coding Formats for CD and DVD
Super Audio CD (SACD): introduced by Philips and Sony in 1999. Supports discrete-channel (two-channel and multichannel) audio recording and uses the one-bit Direct Stream Digital (DSD) coding method. It has the same dimensions as a CD (12 cm diameter and 1.2 mm thickness); the laser wavelength is 650 nm and the pit length 0.4 µm. It holds 4.7 GB of data; for two-channel stereo this provides about 110 minutes of playing time. There are single-layer, dual-layer and hybrid disc constructions (4.7 GB per layer). All SACD discs incorporate an invisible watermark that is physically embedded in the substrate of the disc; the watermark is used for mutual authentication of the player and the disc, and an SACD player will reject any disc that does not bear an authentic watermark. SACD players can play back both SACD and CD discs, and text and graphics can be included on an SACD disc.
Direct Stream Digital (DSD) coding for SACD: a one-bit pulse-density representation produced by sigma-delta modulation. It is similar to a one-bit sigma-delta A/D converter, but DSD does not apply decimation filtering; instead, the original sampling frequency is retained and the one-bit data is recorded directly on the disc.

70 Audio Coding Formats for CD and DVD Lossless Audio Coding Direct Stream Digital (DSD) coding for SACD DSD does not employ interpolation (oversampling) filtering in the playback process. DSD uses a sampling frequency that is 64 times 44.1 kHz, or 2.8224 MHz, with one-bit quantization. The overall bit rate is 4 times higher than on a CD. A lossless coding algorithm known as Direct Stream Transfer (DST) has been adopted for the SACD format; it uses an adaptive prediction filter and arithmetic coding to effectively double the disc capacity. Eight DSD channels (6 surround plus a stereo mix) on a 4.7 GB data layer allow a playing time of 27 minutes; with DST compression, a 74-minute playing time is accommodated. Sigma-Delta Modulation in SACD The block H(z) represents the noise-shaping filter. Its function is to shape the quantization noise, introduced by the coarse 1-bit quantization, in such a way that virtually all of the quantization noise falls outside the 0-20 kHz frequency band. H(z) is required to be a low-pass filter with very high gain in the audio band. In order to achieve a dynamic range that is good enough for SACD, e.g., 120 dB, the order of the noise-shaping filter should be at least five. Advantages: the high sampling frequency reduces wrap-around (aliasing) of high harmonics caused by nonlinear processing, and it reduces distortion due to warping and ringing caused by sharp anti-alias filters. Disadvantage: the signal must be converted to linear PCM format for post-editing. 70
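A minimal first-order sigma-delta modulator, shown only to illustrate the noise-shaping idea described above; the actual SACD modulator uses a noise-shaping loop of at least fifth order, which this sketch does not attempt to reproduce:

```python
import numpy as np

# First-order sigma-delta modulator (illustration only, not the SACD loop).
fs = 64 * 44_100                          # DSD sampling rate, 2.8224 MHz
t = np.arange(200_000) / fs
x = 0.5 * np.sin(2 * np.pi * 1_000 * t)   # 1 kHz test tone, amplitude 0.5

y = np.empty_like(x)
integrator = 0.0
for n, xn in enumerate(x):
    integrator += xn - (y[n - 1] if n else 0.0)   # feed back previous 1-bit output
    y[n] = 1.0 if integrator >= 0 else -1.0       # coarse 1-bit quantizer

# The 1-bit stream y still carries the tone; most of the quantization noise
# is pushed far above the 0-20 kHz audio band and can be removed by a
# low-pass filter at the receiver.
```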

71 Audio Coding Formats for CD and DVD Direct Stream Transfer (DST) Lossless Audio Coding DST is a lossless compression scheme with a compression ratio of at least 2.7; the resulting bit rate is between 5.6 and 8.4 Mb/s for multi-channel audio. Unlike linear PCM, DSD data is one-bit data, so standard lossless audio compression algorithms that work on multi-bit PCM data cannot be used. The main building blocks of DST are framing, prediction, and entropy encoding. Framing: 37,632 bits per frame. Predictor: a look-ahead linear predictor whose order ranges between 1 and 128; although it predicts a 1-bit signal, the predicted signal itself is a multi-bit signal. Probability table: based on the predicted multi-bit signal, an error probability table is calculated. The difference between the quantized predicted signal (converted from multi-bit back to 1 bit) and the input signal gives the error signal, which is then encoded by an entropy coder. The filter coefficients and the entropy-coded error signal are packed and stored on the disc. 71
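As a small check, the 37,632-bit frame size quoted above corresponds to exactly 1/75 of a second of a single DSD channel:

```python
# DST frame size: one frame = 1/75 s of one DSD channel.
dsd_rate = 64 * 44_100           # 2,822,400 one-bit samples per second
print(dsd_rate // 75)            # 37,632 bits per frame
```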

72 Audio Coding Formats for CD and DVD Digital Versatile Disc (DVD) Lossless Audio Coding The DVD family of formats was developed by a consortium of manufacturers known as the DVD Forum. A preliminary DVD format was announced in 1995. The DVD family includes formats for video, audio, and computer applications. The DVD-Video and DVD-ROM families were first introduced in 1996. 72

73 Audio Coding Formats for CD and DVD Lossless Audio Coding Physical specification of DVD Uses the same diameter (120 mm) and thickness (1.2 mm) as CD. The read-only formats for DVD-Video, DVD-Audio and DVD-ROM share the same disc construction, modulation code and error correction. Single- or dual-layer-per-substrate construction. Track pitch 0.74 µm, minimum pit length 0.4 µm, laser wavelength 635 or 650 nm (CD: 780 nm). A DVD layer can store 4.7 GB of data; multiple layers provide greater capacity. DVD data layers are embedded deep within the disc and are thus less vulnerable to damage than those of a CD. DVD-Audio The first version was finalized in 1999. All DVD-Audio discs must contain an uncompressed or MLP-compressed LPCM version of the DVD-Audio portion of the program. DVD-Audio discs may also include video programs with Dolby Digital, DTS and/or LPCM tracks. Two types of DVD-Audio discs are defined. An Audio-only disc contains only music information; it can optionally include still pictures (one per track), text information, and a visual menu. An Audio with Video (AV) disc can contain motion-video information formatted as a subset of the DVD-Video format. 73

74 Audio Coding Formats for CD and DVD Coding Formats Supported for DVD-Audio Lossless Audio Coding

Parameter          | DVD-Audio                              | SACD         | CD
Audio coding       | 16-, 20-, or 24-bit LPCM               | 1-bit DSD    | 16-bit LPCM
Sampling rate      | 44.1, 48, 88.2, 96, 176.4, or 192 kHz  | 2,822.4 kHz  | 44.1 kHz
Channels           | up to 6                                | up to 6      | 2
Compression        | Yes (MLP)                              | Yes (DST)    | None
Content protection | Yes                                    | Yes          | No
Playback time      | 62-843 min *                           | —            | 74 min
Frequency response | DC-96 kHz                              | DC-100 kHz   | DC-20 kHz
Dynamic range      | Up to 144 dB                           | Over 120 dB  | 96 dB

* For 62 min, 96 kHz sampling, 20-bit samples, and five channels are assumed. For 843 min, 44.1 kHz sampling, 16-bit samples, and one channel are assumed. 74
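The footnote playing times can be checked with simple arithmetic; the sketch below assumes a usable audio payload of about 4.46 GB (slightly below the nominal 4.7 GB), which is the value implied by the quoted 62- and 843-minute figures:

```python
# Rough check of the footnote playing times (straight LPCM, no overhead).
def minutes(capacity_bytes, fs, bits, channels):
    rate_bytes_per_s = fs * bits * channels / 8
    return capacity_bytes / rate_bytes_per_s / 60

payload = 4.46e9                               # assumed usable payload in bytes
print(minutes(payload, 96_000, 20, 5))         # ~62 min
print(minutes(payload, 44_100, 16, 1))         # ~843 min
```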

75 Lossless Audio Coding Audio Coding Formats for CD and DVD Parameter Configurations Supported for DVD-Audio (table of the allowed sampling-rate, word-length and channel combinations; not reproduced here). 75

76 Audio Coding Formats for CD and DVD Meridian Lossless Packing (MLP) Coding for DVD-Audio MLP is the lossless coding standard for DVD-Audio; it allows 74 minutes of high-quality multichannel music to be recorded on a single-layer 4.7 GB DVD. Problems with lossless coding: it generally achieves a low compression ratio; the compression ratio achieved depends very much on the data (low compression for random signals, high compression for silent or near-silent signals); and it produces a variable data rate on normal audio content. Fortunately, it turns out that real acoustic signals tend not to present full-scale white noise in all channels for any significant duration. Lossless Audio Coding MLP tackles this by attempting to maximize the compression at all times using this set of techniques: looking for "dead air"; exploiting channels that do not exercise all the available word length; exploiting channels that do not use the available bandwidth; removing interchannel correlations; efficiently coding the residual information; and smoothing the coded data rate by buffering. 76

77 Audio Coding Formats for CD and DVD MLP Encoder Important novel techniques used: lossless processing; lossless matrixing; lossless use of infinite impulse response (IIR) filters; managed first-in, first-out (FIFO) buffering across transmission; decoder lossless self-check; and operation on heterogeneous channels and sampling frequencies. Lossless Audio Coding 77

78 Audio Coding Formats for CD and DVD MLP Encoding Incoming channels may be re-mapped to optimize the use of substreams. The MLP stream contains a hierarchical structure of substreams; incoming channels can be matrixed into two (or more) substreams. This method allows simpler decoders to access a substream of the overall signal. Each channel is shifted to recover unused capacity, e.g., less than 24-bit precision or less than full scale. A lossless matrix technique optimizes the channel use by reducing the interchannel correlations. The signal in each channel is decorrelated using a separate predictor for each channel. The decorrelated audio is further optimized using entropy coding. Each substream is buffered using a FIFO memory system to smooth the encoded data rate. Multiple data substreams are interleaved. Lossless Audio Coding The stream is packetized at a fixed or variable data rate for the target carrier. 78

79 Audio Coding Formats for CD and DVD MLP Encoding: Lossless Matrix Lossless Audio Coding In general, the encoded data rate is minimized by reducing the commonality between channels, e.g., by rotating a stereo mix from left/right to sum/difference. Conventional matrixing is not lossless, since the inverse matrix reconstructs the original signals with rounding errors. The MLP encoder therefore decomposes the general matrix into a cascade of affine transformations. Each affine transformation modifies just one channel by adding a quantized linear combination of the other channels. If the encoder subtracts a particular linear combination, then the decoder must add it back. The quantizers Q ensure constant input-output word width and lossless operation on different computing platforms. A Single Lossless Matrix Encode and Decode (figure) 79
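The following sketch illustrates one lossless affine matrixing step of the kind described above. The coefficient value and the quantizer (a simple fixed-point shift) are assumptions for illustration, not the MLP syntax; the point is that the decoder recomputes exactly the same quantized value from the untouched channels and adds it back, so the integer samples are recovered bit-exactly:

```python
import numpy as np

def encode_step(x, target, coeffs, shift=14):
    """Return a copy of x in which channel `target` has a quantized linear
    combination of the other channels subtracted (one affine encode step)."""
    y = x.copy()
    others = [c for c in range(x.shape[0]) if c != target]
    acc = np.zeros(x.shape[1], dtype=np.int64)
    for c, a in zip(others, coeffs):
        acc += a * x[c]                  # fixed-point coefficient products
    q = acc >> shift                     # quantizer Q (integer shift)
    y[target] = x[target] - q            # encoder subtracts
    return y

def decode_step(y, target, coeffs, shift=14):
    """Inverse step: recompute the identical quantized value and add it back."""
    x = y.copy()
    others = [c for c in range(y.shape[0]) if c != target]
    acc = np.zeros(y.shape[1], dtype=np.int64)
    for c, a in zip(others, coeffs):
        acc += a * y[c]                  # the other channels are untouched
    q = acc >> shift
    x[target] = y[target] + q            # decoder adds it back
    return x

rng = np.random.default_rng(0)
x = rng.integers(-2**23, 2**23, size=(2, 1024), dtype=np.int64)   # 24-bit samples
coeffs = [11585]                                                   # ~0.707 in Q14 (assumed value)
y = encode_step(x, target=0, coeffs=coeffs)
assert np.array_equal(decode_step(y, target=0, coeffs=coeffs), x)  # exactly lossless
```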

80 Audio Coding Formats for CD and DVD MLP Encoding: Prediction The function of a decorrelator (predictive filter) is to decorrelate the input signal such that there is no correlation between the currently transmitted difference signal and its previous values. A decorrelator can make significant gains by flattening the spectrum of the audio signal; ideally, the transmitted difference signal should have a flat (white-noise-like) spectrum. The average power of the decorrelated difference signal is significantly lower than that of the original signal, hence the reduction in data rate. The MLP encoder uses a separate predictor for each encoded channel. The encoder is free to select IIR or FIR filters up to eighth order from a wide palette. Most lossless compression schemes use FIR filters; however, IIR filters have advantages in some situations where control of the peak data rate is important and the input spectrum exhibits an extremely wide dynamic range. (Figure: lowest curve, MLP nominal compression rate; middle curve, lossless matrix switched off; upper curve, constrained to FIR prediction only; the top line shows the 9.6 Mb/s data-rate limit for DVD-Audio.) Lossless Audio Coding 80

81 Audio Coding Formats for CD and DVD MLP Encoding: Entropy Coding Once the cross-channel and inter-sample correlations have been removed, it remains to encode the individual samples of the decorrelated signal as efficiently as possible. Audio signals, even after decorrelation, tend to be peaky, with a distribution resembling a Laplacian (a two-sided decaying exponential); there is therefore a coding gain to be had from entropy coding. The MLP encoder may choose from a number of entropy coding methods, including Huffman and arithmetic coding. Buffering Lossless Audio Coding Normal audio signals can be well predicted; however, there are occasional fragments, such as sibilants, synthesized noise, or percussive events, that have high entropy. MLP uses a particular form of stream buffering that can reduce the variations in transmitted data rate, absorbing transients that are hard to compress. FIFO memory buffers are used in the encoder and decoder. These buffers are configured to give a constant notional delay across encode and decode; this overall delay is small, typically of the order of 75 ms. FIFO management minimizes the part of the delay due to the decoder buffer, so this buffer is normally empty and fills only ahead of sections with a high instantaneous data rate. During these sections, the decoder's buffer empties and is thus able to deliver data to the decoder core at a higher rate than the transmission channel is able to provide. In the context of a disc, this strategy has the effect of moving excess data away from the stress peaks. The encoder can use the buffering for a number of purposes, e.g., keeping the data rate below a preset (format) limit, or minimizing the peak data rate over an encoded section. 81
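To see why entropy coding pays off on Laplacian-shaped residuals, the sketch below measures the empirical entropy of synthetic integer residuals; the scale and word length are assumed values, and this does not use MLP's actual Huffman tables:

```python
import numpy as np

# Empirical entropy of Laplacian-like integer residuals vs. a fixed word length.
rng = np.random.default_rng(1)
residual = np.rint(rng.laplace(scale=200.0, size=100_000)).astype(int)

values, counts = np.unique(residual, return_counts=True)
p = counts / counts.sum()
entropy = -(p * np.log2(p)).sum()
print(f"empirical entropy: {entropy:.1f} bits/sample")   # ~10 bits for this scale
print("fixed word length: 16 bits/sample")               # what straight packing would cost
```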

82 Audio Coding Formats for CD and DVD MLP Encoding: Lossless Audio Coding Use of Substreams The MLP stream contains a hierarchical structure of substreams. Incoming channels can be matrixed into two (or more) substreams. This method allows simpler decoders to access a subset of the overall signal. 82

83 Audio Coding Formats for CD and DVD MLP Encoding: Two-Channel Downmix Lossless Audio Coding It is often useful to provide a means for accessing high-resolution multichannel audio streams on two-channel playback devices. In an application such as DVD-Audio, the content provider can place separate multichannel and two-channel streams on the disc; however, doing so requires separate mix, mastering, and authoring processes and uses more disc capacity. In cases where only one multichannel stream is available, a fixed or guided downmix is required, which means it is first necessary to decode the full multichannel signal. MLP provides an elegant and unique solution: the encoder combines lossless matrixing with the use of two substreams in such a way as to optimally encode both the L0/R0 downmix and the multichannel version. Two Substreams in Encoding (figure) 83

84 Audio Coding Formats for CD and DVD MLP Encoding: Downmix in the Lossless Encoder Lossless Audio Coding Downmix instructions are fed to matrix 1 to determine some of the coefficients for the lossless matrix. The matrices then perform a rotation such that the two channels in substream 0 decode to the desired stereo mix and combine with substream 1 to provide the full multichannel signal. Decoding Two Substreams Because the two-channel downmix is a linear combination of the multichannel mix, strictly speaking no new information has been added; in practice there is only a modest increase in overall data rate (typically 1 bit per sample). The advantages of this method are: the quality of the mix-down is guaranteed, and the producer can listen to it at the encoding stage; a two-channel-only playback device does not need to decode the multichannel stream and then perform a mix-down, since the lossless decoder only needs to decode substream 0; a more complex decoder may access both the two-channel and multichannel versions losslessly; and the downmix coefficients do not have to be constant for a whole track, but can be varied under artist control. 84

85 Audio Coding Formats for CD and DVD MLP Compression Rate Lossless Audio Coding (Table: peak and average data-rate reduction, in bits/sample/channel, for each supported sampling rate; not reproduced here.) The MLP decoder is of relatively low complexity: a decoder capable of extracting a two-channel stream at 192 kHz requires approximately 27 MIPS, while 40 MIPS is required to decode 6 channels at 96 kHz. Playing Time on DVD-Audio A DVD-Audio disc holds approximately 4.7 GB of data and has a maximum data transfer rate of 9.6 Mb/s for an audio stream. Six channels of 96 kHz/24-bit LPCM audio have a data rate of 13.824 Mb/s, which is well in excess of 9.6 Mb/s; in addition, at 13.824 Mb/s the data capacity of the disc would be used up in approximately 45 min. MLP meets the industry-norm requirement of 74 min. Here are some examples of playing times that can be obtained: 5.1 channels, 96 kHz/24-bit: 100 min; 6 channels, 96 kHz/24-bit: 86 min; 2 channels, 96 kHz/24-bit: 4 hours; 2 channels, 192 kHz/24-bit: 2 hours; 2 channels, 44.1 kHz/16-bit: 12 hours; 1 channel, 44.1 kHz/16-bit: 25 hours. 85
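The LPCM figures above can be verified directly:

```python
# Uncompressed 6-channel 96 kHz / 24-bit LPCM against the DVD-Audio limits.
rate = 6 * 96_000 * 24                 # LPCM bit rate in bit/s
print(rate / 1e6)                      # 13.824 Mb/s, well above the 9.6 Mb/s limit

disc_bits = 4.7e9 * 8                  # nominal single-layer capacity in bits
print(disc_bits / rate / 60)           # ~45 minutes before the disc is full
```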

86 MPEG-4 Audio Lossless Coding (MPEG-4 ALS) Lossless Audio Coding MPEG-4 ALS is an extension to the MPEG-4 Part 3 audio coding standard (ISO/IEC 14496-3) that allows lossless compression. MPEG-4 ALS Encoder (block diagram) MPEG-4 ALS Decoder (block diagram) 86

87 MPEG-4 Audio Lossless Coding (MPEG-4 ALS) MPEG-4 ALS Predictor Lossless Audio Coding The predictor used in MPEG-4 ALS is a combination of a short-term and a long-term predictor. The long-term predictor aims to remove the long-range time correlation in audio signals, which are largely periodic in nature; it operates around a tap delay (lag) with M non-zero taps (M = 5 in MPEG-4 ALS). The short-term predictor is an order-K linear predictor; the predicted signal is computed from previous samples using coefficients represented in fixed point with Q bits. The computation of the linear predictor coefficients can be done using standard techniques such as the autocorrelation approach with the Levinson-Durbin algorithm. In MPEG-4 ALS, the PARCOR (reflection) coefficients are quantized to 8 bits each after an arcsine transformation. 87
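A minimal sketch of the two-stage predictor structure described above. The coefficient values, lag, tap gains, and the exact way MPEG-4 ALS cascades and quantizes the two stages are assumptions for illustration, not the normative specification:

```python
import numpy as np

# Toy correlated integer signal standing in for audio samples.
rng = np.random.default_rng(2)
x = np.cumsum(rng.integers(-50, 50, size=4_000)).astype(np.int64)

def short_term_predict(sig, n, coeffs, Q=15):
    """Order-K linear predictor with Q-bit fixed-point coefficients."""
    acc = sum(int(c) * int(sig[n - k - 1]) for k, c in enumerate(coeffs))
    return acc >> Q                               # integer division by 2**Q

def long_term_predict(res, n, lag, gains):
    """Five-tap long-term predictor centred on the lag (M = 5 non-zero taps)."""
    return int(round(sum(g * res[n - lag + j] for g, j in zip(gains, range(-2, 3)))))

coeffs = [24000, -9000, 3000]                     # hypothetical Q15 short-term coefficients
e = np.array([int(x[n]) - short_term_predict(x, n, coeffs)
              for n in range(3, len(x))], dtype=np.int64)     # short-term residual

lag, gains = 400, [0.1, 0.2, 0.4, 0.2, 0.1]       # hypothetical lag and tap gains
n = 1000
e2 = int(e[n]) - long_term_predict(e, n, lag, gains)          # residual after the long-term stage
print(e[n], e2)
```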
