Appendix 4. Audio coding algorithms


1 Introduction

The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Audio signals recorded on CDs and digital audio tapes are typically sampled at 44.1 or 48 kHz, and each sample is represented by a 16-bit integer. Uncompressed two-channel stereo CD-quality audio therefore requires 2 × 44.1 (48) × 16 = 1.41 (1.54) Mb/s for transmission.

Unlike speech compression systems, audio codecs process sounds generated by arbitrary sources and cannot exploit specific features of the input signals. However, almost all modern audio codecs are based on a model of the human auditory system. The key idea behind so-called perceptual coding is to remove the parts of the input signal that a human listener cannot perceive. The imperceptible information removed by the perceptual coder is called the irrelevancy. Since audio signals, like speech signals, can be interpreted as outputs of sources with memory, perceptual coders remove both irrelevancy and redundancy in order to provide the lowest possible bit rate for a given quality.

An important part of a perceptual audio coder is the psychoacoustic model of human hearing. This model is used to estimate the amount of quantization noise that is inaudible. In the next section we consider the physical phenomena exploited by the psychoacoustic model.

2 Basics of perceptual coding

The frequency range that audio coders deal with is roughly 20 Hz to 20 kHz. The human auditory system has a remarkable detection capability, with a range (called the dynamic range) of 120 dB from the quietest to the loudest sounds. However, digitized CD-quality audio signals have a dynamic range of only about 96 dB because they use a 16-bit sample representation.

The absolute threshold of hearing $T_q(f)$ characterizes the amount of energy in a pure tone such that it can be detected by a listener in a noiseless environment. This absolute threshold is computed as

$$T_q(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4 \tag{1}$$

where $f$ denotes the frequency in Hz [?]. The threshold is measured as a sound pressure level (SPL), normalized by a minimum value, and expressed in dB. The frequency dependence of this threshold is shown in Fig. 1.

Using the absolute threshold of hearing to estimate the amount of inaudible quantization noise is only the first step in constructing the psychoacoustic model. It turns out that the threshold of hearing at a given frequency depends on the input signal, because louder input sounds can mask, or hide, weaker ones. The psychoacoustic model takes into account this masking property, which occurs when a strong audio signal makes weaker audio signals that are close to it in time or frequency imperceptible. The coder analyzes the audio signal and computes the amount of available noise masking as a function of frequency. In other words, the noise detection threshold is a modified version of the absolute threshold $T_q(f)$, with its shape determined by the input signal at any given time.

In order to estimate this threshold we first have to consider the concept of critical bands. The inner ear (cochlea) can be interpreted as a bank of highly overlapping nonlinear (level-dependent) bandpass filters. Different areas of the cochlea, each containing a set of neural receptors, correspond to different frequency bands called critical bands. The bandwidths of the cochlea filters (critical bandwidths) increase with increasing frequency. They are less than 100 Hz for the lowest audible frequencies and more than 4 kHz at the highest frequencies. In this sense the human auditory system has a limited, frequency-dependent resolution.
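The numbers quoted so far can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, the dynamic range is taken to be 20 log10(2^bits), and the threshold in quiet uses the widely quoted closed-form approximation $3.64 (f/1000)^{-0.8} - 6.5 e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4$ dB SPL, with $f$ in Hz, as equation (1).

```python
import math


def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_dynamic_range_db(bits_per_sample=16):
    """Ratio of the largest to the smallest quantizer step, in dB."""
    return 20.0 * math.log10(2 ** bits_per_sample)


def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) at f_hz."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


if __name__ == "__main__":
    print(pcm_bit_rate(44100) / 1e6, "Mb/s at 44.1 kHz")
    print(pcm_bit_rate(48000) / 1e6, "Mb/s at 48 kHz")
    print(round(pcm_dynamic_range_db(), 2), "dB for 16-bit samples")
    for f in (100, 1000, 3300, 15000):
        print(f, "Hz:", round(threshold_in_quiet_db(f), 2), "dB SPL")
```

Under this approximation the ear is most sensitive around 3 to 4 kHz, and the threshold rises steeply toward both ends of the audible range.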
The critical bandwidth is a function of frequency that quantifies the cochlea filter passbands. More than 60 years ago, experiments by H. Fletcher showed that a critical band is a range of frequencies over which the masking signal-to-noise ratio (SNR) remains more or less constant. For example, Fletcher found that a tone at 700 Hz can mask narrow-band noise of lower energy in the vicinity of approximately ±70 Hz. If the noise is outside the range 630-770 Hz, then to be inaudible it has to have much less energy than the tone. Thus, the critical bandwidth is 140 Hz.

[Figure 1: The absolute threshold of hearing in a quiet environment. Axes: frequency f, Hz; sound pressure level (SPL), dB.]

The phenomenon of masking has a simple explanation. A strong tone or noise masker present in a given critical band blocks detection of a weaker signal because it creates a strong excitation of the corresponding area of the cochlea. Fig. 2 illustrates the masking property.

A second observation about masking is that noise and tone have different masking properties. Let $B$ denote a critical-band number and let $E_T$ and $E_N$ be the tone and noise masker energy levels, respectively, for this critical band. Then the following empirical rules have been observed:

$$TH_N = E_T - (14.5 + B) \ \text{dB} \tag{2}$$

and

$$TH_T = E_N - K \ \text{dB} \tag{3}$$

where $TH_N$ is the noise masking threshold for the tone-masking-noise type of masking and $TH_T$ is the tone masking threshold for the noise-masking-tone type. The parameter $K$ takes values in the range 3-6 dB. In other words, in the tone-masking-noise case, if the energy of the noise (in dB) is less than $TH_N$, then the noise is imperceptible. In the noise-masking-tone case, if the energy of the tone (in dB) is less than $TH_T$, then the tone is imperceptible.

[Figure 2: Masking of weaker signals by a strong tone masker. Axes: frequency f, Hz; sound pressure level (SPL), dB. Labels: masker, masked sound, masking threshold, threshold in quiet, audible sound, inaudible sound.]

The problem is that speech and audio signals are neither pure tones nor pure noise but rather a mixture of both. The degree to which a signal within a critical band appears tone-like or noise-like determines its masking properties. In perceptual coding, masking signals are usually classified as tone or noise, and then the thresholds (2) and (3) are computed.

The masking effect of a given masker often spreads from its own critical band to the neighboring critical bands and affects the detection thresholds there. This effect, known as the spread of masking or inter-band masking, is modeled in coding applications by a spreading function. Each subband of an audio signal to be encoded typically contains a collection of maskers of both tone and noise types. The individual masking thresholds computed for each of the maskers are modified by the spreading function and then combined for each subband into a global masking threshold. The noise spectrum shaped according to the computed global masking threshold is often called the Just Noticeable Distortion (JND). The absolute threshold of hearing $T_q(f)$ is also considered when shaping the noise spectra. Let $T_{q\min}$ denote the minimal value of the absolute threshold of hearing in a given subband; then in this subband $\max\{\text{JND}, T_{q\min}\}$ is used as the permissible distortion threshold.

Exactly how the human auditory system works is still a topic of active research. For the purpose of constructing perceptual audio coders, all the phenomena considered above are described by formulas and exploited to reduce the bit rate of the coder.

[Figure 3: Typical psychoacoustic model for audio signal. Blocks: cochlea filter modeling, tonality calculation, threshold estimation. Signals: short-term signal spectrum, tonality estimate, absolute threshold, masking thresholds for subbands, signal-to-mask ratio, estimated psychoacoustic threshold.]

Fig. 3 is a block diagram of a typical psychoacoustic model of hearing. The input signal in this diagram is actually the short-term signal spectrum. The cochlea filter model computes the short-term cochlea energy model, i.e., a distribution of the energy along the cochlea. This information is used in two ways. The short-term signal spectrum and the cochlea energy for a given subband are used to compute the tonality of the signal in this subband. Tonality is needed because, as (2) and (3) show, tone and noise maskers produce very different thresholds: noise masks tones better than tones mask noise. The cochlea energy plus the tonality information allow us to compute the threshold of audibility for the signal in the subband. This is a spectral estimate indicating what noise level will be audible as a function of frequency. As long as the quantization noise is below the threshold of audibility at all frequencies, the quantization noise will be imperceptible.

Fig. 4 is a diagram of an entire perceptual audio coder. The psychoacoustic model is one of its blocks. The input signal of the coder is a discrete-time audio signal with sampling rate 32, 44.1, or 48 kHz. Each sample is represented by a 16-bit integer value. The input audio stream is split into frames, and for each frame transform coefficients are computed.
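The threshold rules above are simple enough to sketch in code. The following is a minimal illustration, not the procedure of any particular standard: the function names are hypothetical, rule (2) is assumed to have the commonly quoted offset of 14.5 + B dB, and K is a free parameter restricted to the 3-6 dB range stated in the text.

```python
def tone_masking_noise_threshold_db(tone_energy_db, band, offset_db=14.5):
    """Rule (2): noise below this level is masked by a tone in band `band`.

    The 14.5 dB offset is an assumption (the commonly quoted value).
    """
    return tone_energy_db - (offset_db + band)


def noise_masking_tone_threshold_db(noise_energy_db, k_db=5.0):
    """Rule (3): a tone below this level is masked by noise of given energy."""
    if not 3.0 <= k_db <= 6.0:
        raise ValueError("K is expected to lie in the range 3-6 dB")
    return noise_energy_db - k_db


def permissible_distortion_db(jnd_db, tq_min_db):
    """Distortion allowed in a subband: max{JND, T_qmin}."""
    return max(jnd_db, tq_min_db)


def signal_to_mask_ratio_db(signal_energy_db, mask_threshold_db):
    """Ratio of signal energy to masking threshold, expressed in dB."""
    return signal_energy_db - mask_threshold_db


if __name__ == "__main__":
    # A 70 dB masker in critical band 10, once as a tone and once as noise.
    th_n = tone_masking_noise_threshold_db(70.0, band=10)
    th_t = noise_masking_tone_threshold_db(70.0, k_db=5.0)
    print("tone-masking-noise threshold:", th_n, "dB")
    print("noise-masking-tone threshold:", th_t, "dB")
    print("SMR for the tone masker:", signal_to_mask_ratio_db(70.0, th_n), "dB")
```

With these assumed constants, a noise masker hides a tone only a few dB weaker than itself, whereas a tone masker hides noise only when the noise is more than 20 dB weaker. This asymmetry is why the tonality estimate matters.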
The audio coder is based on the transform coding technique.

[Figure 4: Generic perceptual audio coder and decoder. Encoder blocks: PCM audio input, coding filter bank (computing transform coefficients), psychoacoustic model, bit allocation, quantization and coding, bitstream formatting, encoded bitstream. Decoder blocks: encoded bitstream, bitstream unpacking, transform coefficient reconstruction, inverse transform, decoded PCM audio.]

From the theoretical point of view, any kind of transform, such as the DFT, the DCT, the so-called Modified DCT (MDCT), or a transform based on a filter bank, can be used in the first block of the coder. In audio standards the audio stream passes through a filter bank that divides the input into multiple frequency subbands. This special type of transform coding is called subband coding [?]. Moreover, as will be shown below, some particular cases of subband coding can be implemented as the MDCT. The distinguishing feature of all transforms used in audio coding is that they are overlapped transforms. Using nonoverlapped transforms, as for example in video coding, would lead to audible clicks at the block frequency.

The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold (the signal-to-mask ratio, SMR) for each subband. The quantization and coding block uses the SMRs to decide how to allocate the total available number of bits over the subbands so as to minimize the audibility of the quantization noise. The transform coefficients are then quantized according to the chosen bit allocation based on the perceptual thresholds. Usually uniform scalar quantization is used. The outputs of the quantizer are then further compressed using lossless coding, most often a Huffman coder. In recent standards, context-based arithmetic coding is used instead of Huffman coding. The resulting bitstream contains the encoded transform coefficients and side information (bit allocation). The decoder decodes this bitstream, restores the quantized subband values, and reconstructs the audio signal from the subband values.

3 Overview of audio standards

The most important standards in this domain were developed by the ISO Motion Pictures Experts Group (ISO-MPEG) subgroup on high-quality audio. Although perfectly suitable for audio-only applications, MPEG Audio is actually one part of a three-part compression standard that also includes video and systems. The original MPEG Audio standard was created for mono and two-channel stereo sound systems and had three layers, each providing a greater compression ratio. MPEG-2 was created to provide multichannel audio capability. The MPEG-2 Advanced Audio Coder (MPEG-AAC) duplicates the performance of MPEG-2 at half the bit rate. More recently the AAC technique was further enhanced and included in the MPEG-4 Audio standard as two efficient coding modes: High Efficiency AAC (HE-AAC) and HE-AAC v2.

The original MPEG coder is now sometimes referred to as MPEG-1. It contains three independent layers of compression, which provide a wide range of trade-offs between codec complexity and compressed audio quality.

Layer I has the lowest complexity and is best suited to bit rates above 128 kb/s per channel. For example, the Philips Digital Compact Cassette (DCC), developed in 1992, used layer I compression at 192 kb/s per channel.

Layer II has intermediate complexity and targets bit rates around 128 kb/s per channel. The main applications of this layer are audio coding for Digital Audio Broadcasting (DAB) and for Digital Video Broadcasting. It uses a more complicated quantization procedure than layer I.

Layer III is the most complex, but it provides good audio quality at bit rates around 64 kb/s per channel.
It can be used for transmitting audio over ISDN channels and for storing audio on CD. The widely used MP3 player contains a decoder for audio signals compressed according to layer III of the MPEG Audio standard. The specific features of this layer are hybrid transform coding based on the MDCT and filter bank coding, a more complicated psychoacoustic model, and the so-called echo-suppression system. Variable-length coding of the transform coefficients is used at this layer.
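To put the per-channel bit rates quoted for the three layers in perspective, they can be compared with uncompressed PCM. The sketch below assumes 16-bit samples at 44.1 kHz as the reference; the function name and the table of rates are illustrative only and simply restate the figures given in the text.

```python
def compression_ratio(coded_kbps, sample_rate_khz=44.1, bits_per_sample=16):
    """Ratio of the uncompressed per-channel PCM rate to the coded rate."""
    return sample_rate_khz * bits_per_sample / coded_kbps


# Per-channel bit rates (kb/s) quoted in the text for each layer.
LAYER_RATES_KBPS = {"Layer I (DCC)": 192, "Layer II": 128, "Layer III": 64}

if __name__ == "__main__":
    for layer, rate in LAYER_RATES_KBPS.items():
        ratio = compression_ratio(rate)
        print(f"{layer}: {rate} kb/s per channel, ratio about {ratio:.1f}:1")
```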

[Figure 5: M-band analysis and synthesis filter banks. The input $x(n)$ passes through the analysis filters $h_0(n), h_1(n), \dots, h_{M-1}(n)$, giving $v_i(n)$, which are decimated to $y_i(n)$; the upsampled sequences $z_i(n)$ pass through the synthesis filters $g_0(n), g_1(n), \dots, g_{M-1}(n)$ and are combined into $\hat{x}(n)$.]

The digital transform implemented as a polyphase filter bank is common to layers I and II of the MPEG Audio compression algorithm. This filter bank divides the audio signal into 32 frequency subbands of equal bandwidth. Each subband is 750 Hz wide (for a sampling rate of 48 kHz). Clearly, subbands of equal width do not accurately reflect the frequency-dependent behavior of the human auditory system. Since the critical bandwidth at low frequencies is about 100 Hz, the low-frequency subbands cover more than one critical band. As a result, the critical band with the least noise masking determines the number of bits (number of quantization levels) for the entire subband. At high frequencies, where critical bandwidths are about 4 kHz, one critical band covers several subbands, but this circumstance is less critical for bit allocation.

Notice that the polyphase filter bank and its inverse are not always lossless transforms. Even without quantization, the inverse transform sometimes cannot perfectly recover the original signal. However, the error introduced by the filter bank is negligible and inaudible. Adjacent filter bands in this filter bank have a major frequency overlap, so a signal at a single frequency can affect two adjacent filter bank outputs.

At each step of the coding procedure, $M = 32$ samples of the audio signal enter the $M$-band filter bank as shown in Fig. 5. These samples together with the $L = 480$ previous samples form an input frame of length $N = L + M = 512$. Using these $N$ samples the filter bank computes $M$ transform coefficients. The output of the $i$th filter, $i = 0, 1, \dots, M-1$, is computed as

$$v_i(n) = \sum_{m=0}^{N-1} x(n-m)\,h_i(m) \tag{4}$$

where $x(n)$ are the input samples. After decimation by a factor of $M$ we obtain

$$y_i(n) = v_i(nM) = \sum_{m=0}^{N-1} x(nM-m)\,h_i(m)$$

where $h_i(n)$ is the pulse response of the $i$th analysis filter. In the decoder the quantized transform coefficients $\hat{y}_i$ are upsampled by a factor of $M$ in order to form the intermediate sequences

$$z_i(n) = \begin{cases} \hat{y}_i(n/M), & n = 0, M, 2M, \dots \\ 0, & \text{otherwise.} \end{cases}$$

The obtained values $z_i(n)$ are then filtered by the synthesis filter bank, where $g_i(n) = h_i(N-1-n)$, $i = 0, 1, \dots, M-1$, is the pulse response of the $i$th synthesis filter.

The ISO-MPEG coding uses a cosine modulated filter bank. The pulse responses of the analysis and synthesis filters are cosine modulated versions of the pulse response of a lowpass prototype filter, that is,

$$h_i(m) = h(m)\cos\left(\frac{\pi (i + 1/2)(m - 16)}{32}\right)$$

where

$$h(m) = \begin{cases} -w(m), & \text{if } \lfloor m/64 \rfloor \text{ is odd} \\ w(m), & \text{otherwise} \end{cases}$$

denotes the pulse response of the lowpass prototype filter and $w(m)$, $m = 0, 1, \dots, 511$, is the transform window. The pulse response $h(n)$ is multiplied by a cosine term to shift the lowpass response to the appropriate frequency band. The center frequencies of these filters, normalized by $f_s$, are $(2i+1)\pi/64$, $i = 0, 1, \dots, 31$, where $f_s$ is the sampling frequency.

Although the presented form of the filter bank is convenient for analysis, it is not efficient from an implementation point of view. A direct implementation of (4) requires 32 × 512 = 16384 multiplications and 32 × 511 = 16352 additions to compute the 32 filter outputs. Taking into account the periodicity of the cosine function, we can obtain the equivalent but computationally more efficient representation

$$v_i(n) = \sum_{k=0}^{63} M(i,k) \sum_{j=0}^{7} w(k + 64j)\,x(n - (k + 64j)) \tag{5}$$

where $w(k)$ is one of the 512 coefficients of the transform window and

$$M(i,k) = \cos\left(\frac{\pi (i + 1/2)(k - 16)}{32}\right), \quad i = 0, \dots, 31, \quad k = 0, \dots, 63.$$

In order to compute the 32 filter outputs according to (5) we perform only 64 × 8 + 32 × 64 = 2560 multiplications and 64 × 7 + 32 × 63 = 2464 additions, or roughly 80 multiplications and additions per output.

The cosine modulated filter bank used by the MPEG standard does not correspond to any invertible transform. However, it can be shown that there exist generalized cosine modulated filter banks which provide perfect reconstruction of the input signal. Such filter banks can be implemented more efficiently by reducing the convolution to the MDCT, which in turn can be implemented using the FFT.

Layer III of the MPEG-1 standard uses the MDCT based on the DCT-IV. The inputs of the transform block are blocks of samples $x_t = (x(n + tM), x(n + tM + 1), \dots, x(n + tM + M - 1))$ of length $M$ each. Together with the $(L-1)$ previous blocks they form a new block $s_t$ of length $N = LM$. First, this block is multiplied component-wise by the analysis window $w_a(n)$, $n = 0, 1, \dots, N-1$. Then, by multiplying the obtained vector $v_t$ of length $N$ by the matrix $A$ of size $N \times M$ (the transform matrix), specified below, we obtain the vector $y_t$ of transform coefficients:

$$y_t = v_t A.$$

To perform the inverse transform, the vector of transform coefficients $y_t$ of length $M$ is first multiplied by the matrix $A^T$:

$$\hat{s}_t = y_t A^T.$$

Then the obtained vector $\hat{s}_t$ of length $N$ is multiplied component-wise by the synthesis window $w_s(n)$, $n = 0, 1, \dots, N-1$. The resulting vector $\hat{v}_t$ is shifted by $M$ positions with respect to the previous block $\hat{v}_{t-1}$ and is added to the output sequence of the inverse transform block. Each block $\hat{x}$ at the output of the decoder is the sum of $L$ shifted blocks of length $N$. The blocks overlap in $(L-1)M$ samples. The overlapped analysis and overlap-add synthesis are shown in Fig. 6.

The transform matrix $A$ of size $N \times M$ can be written in the form

$$A = Y T$$

where the matrix $T$ of size $M \times M$ is the matrix of the DCT-IV transform with entries

$$t_{kn} = \sqrt{\frac{2}{M}}\cos\left(\left(k + \frac{1}{2}\right)\left(n + \frac{1}{2}\right)\frac{\pi}{M}\right), \quad k, n = 0, 1, \dots, M-1,$$

and the matrix $Y$ describes the preprocessing step. It has the form

$$Y = [\,Y_0 \;\; Y_1 \;\; {-Y_0} \;\; {-Y_1} \;\; Y_0 \;\; Y_1 \;\dots\,]^T$$

where the submatrices $Y_0$ and $Y_1$ of size $M \times M$ in turn consist of submatrices of size $M/2 \times M/2$:

$$Y_0 = \begin{pmatrix} 0 & 0 \\ I & -J \end{pmatrix}, \qquad Y_1 = \begin{pmatrix} -J & -I \\ 0 & 0 \end{pmatrix}$$

where $I$ is the identity matrix of size $M/2 \times M/2$ and $J$ is the contra-identity matrix of the same size, which has ones along the second diagonal and zeros elsewhere. With these signs the entries of $A$ are $a_{nk} = \sqrt{2/M}\,\cos\left(\frac{\pi}{M}\left(n + \frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right)$, the usual MDCT kernel.

To complete the description of the modified discrete cosine transform it is necessary to specify the window functions $w_a(n)$ and $w_s(n)$. The analysis and synthesis windows can be chosen to be equal, $w_a(n) = w_s(n) = w(n)$. The requirement of perfect reconstruction can be written as

$$\sum_{l=0}^{L-2s-1} w(n + lM)\,w(n + lM + 2sM) = \begin{cases} 1, & s = 0 \\ 0, & s \neq 0 \end{cases} \tag{6}$$

for $n = 0, 1, \dots, M/2 - 1$. Often the MDCT with $N = 2M$ is used, that is, each block overlaps the adjacent block in half of its length. This corresponds to the parameter $L = 2$. In this case (6) takes the form

$$w^2(n) + w^2(n + M) = 1 \tag{7}$$

for $n = 0, \dots, M/2 - 1$. An example of a window which satisfies (7) is

$$w(n) = \sin\left(\frac{\pi}{2M}\left(n + \frac{1}{2}\right)\right), \quad n = 0, \dots, N-1.$$

[Figure 6: Modified DCT. Overlapped analysis: blocks of length N = LM are multiplied by the window w_a and passed to the MDCT. Overlap-add synthesis: IMDCT outputs of length N = LM are multiplied by the window w_s and added with a shift.]

All perceptual codecs based on transforms suffer from a specific artifact called pre-echo. This artifact occurs when a powerful signal with fast changes begins near the end of a transform block that otherwise contains a low-energy region. Such a phenomenon is called a sound attack. The encoder computes an average SMR for the block, and the decoder spreads the quantization error over the entire block. This quantization noise is easy to hear during the low-energy part of the signal. As a result, the reconstructed block contains an unmasked distortion in the low-energy region which precedes the sound attack in time.

In order to suppress the pre-echo, the MPEG standard uses adaptive switching of the transform block length. The block length is larger for stationary parts of the signal and is reduced if the encoder detects a transient region.

Layer III of the MPEG standard improves on the performance of layers I and II by using a hybrid filter bank and adaptive switching of the transform block length for pre-echo suppression. The hybrid filter bank combines the 32-band polyphase filter bank with an adaptive MDCT. In order to improve the frequency resolution, each of the 32 subbands is subjected to an 18-point MDCT. Using this transform improves the frequency resolution from 750 Hz to 41.67 Hz. However, introducing the second transform step increases the length of the transform block. To improve pre-echo suppression, the 18-point MDCT is switched to a 6-point MDCT when a sound attack is detected.

For coding stereo signals, layers I and II of the MPEG standard use the Intensity Stereo method. The basic idea of intensity stereo is that for some high-frequency subbands, instead of transmitting the signals of the left and right channels separately, only the sum of the signals from the two channels is transmitted, together with scale factors for each of the channels.
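The windowed MDCT with half-overlapping blocks (N = 2M, L = 2) described above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the layer III implementation: it evaluates the transform directly from the kernel $\sqrt{2/M}\cos\left(\frac{\pi}{M}(n + \frac{1}{2} + \frac{M}{2})(k + \frac{1}{2})\right)$ instead of going through a DCT-IV or FFT, it assumes the sine window $w(n) = \sin\left(\frac{\pi}{2M}(n + \frac{1}{2})\right)$ for both analysis and synthesis, and all function names are hypothetical.

```python
import math
import random


def sine_window(m):
    """Sine window of length 2M; it satisfies w^2(n) + w^2(n + M) = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def kernel(n, k, m):
    """Entry a_nk of the N x M transform matrix for N = 2M."""
    return math.sqrt(2.0 / m) * math.cos(
        math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))


def mdct(block, window):
    """Window a block of 2M samples and return M transform coefficients."""
    m = len(block) // 2
    v = [w * s for w, s in zip(window, block)]
    return [sum(v[n] * kernel(n, k, m) for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs, window):
    """Map M coefficients back to 2M windowed (still aliased) samples."""
    m = len(coeffs)
    return [window[n] * sum(coeffs[k] * kernel(n, k, m) for k in range(m))
            for n in range(2 * m)]


def analysis_synthesis(signal, m):
    """Overlapped analysis followed by overlap-add synthesis with hop M."""
    window = sine_window(m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        coeffs = mdct(signal[start:start + 2 * m], window)
        for n, value in enumerate(imdct(coeffs, window)):
            out[start + n] += value
    return out


if __name__ == "__main__":
    random.seed(0)
    m = 8
    x = [random.uniform(-1.0, 1.0) for _ in range(8 * m)]
    y = analysis_synthesis(x, m)
    # The first and last M samples are covered by one block only,
    # so perfect reconstruction is expected only in between.
    err = max(abs(a - b) for a, b in zip(x[m:-m], y[m:-m]))
    print("max reconstruction error:", err)
```

Each block on its own is not invertible, since 2M samples are mapped to M coefficients. The aliasing introduced in one block is cancelled by the neighboring blocks during overlap-add, which is why only the interior samples are compared.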
Layer III supports intensity stereo coding but also uses a more efficient method of stereo coding called Middle-side (MS) stereo. The encoder uses the sum of the signals in the left and right channels (middle) and the difference between them (side).

The psychoacoustic model uses a separate, independent time-to-frequency transform instead of the polyphase filter bank because it needs finer frequency resolution for accurate calculation of the masking thresholds. There are two psychoacoustic models in MPEG audio, the so-called PAM-I and PAM-II. PAM-I is less complex than PAM-II and makes more compromises to simplify the calculations. Each model works for any of the layers of compression. However, PAM-I is recommended for layers I and II, while PAM-II is recommended for layer III. PAM-I uses a 512-point FFT for layer I and a 1024-point FFT for layer II. PAM-II uses a 1024-point FFT for all layers.
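The need for a separate transform in the psychoacoustic model can be illustrated by comparing frequency resolutions. The sketch below assumes a 48 kHz sampling rate, for which the 32 equal subbands are 750 Hz wide, and takes the FFT resolution to be simply the bin spacing f_s / (FFT size); the function names are hypothetical.

```python
def subband_width_hz(sample_rate_hz, subbands=32):
    """Width of each of the equal-width polyphase subbands."""
    return sample_rate_hz / 2 / subbands


def fft_bin_spacing_hz(sample_rate_hz, fft_size):
    """Spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


if __name__ == "__main__":
    fs = 48000
    print("polyphase subband width:", subband_width_hz(fs), "Hz")
    for size in (512, 1024):
        print(f"{size}-point FFT bin spacing:", fft_bin_spacing_hz(fs, size), "Hz")
    # Splitting each subband with an 18-point MDCT, as in layer III.
    print("hybrid filter bank resolution:", round(subband_width_hz(fs) / 18, 2), "Hz")
```

Even the 512-point FFT resolves frequencies eight times more finely than the polyphase filter bank, and its bin spacing is comparable to the narrowest critical bandwidths of about 100 Hz.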


More information

Audio Coding and MP3

Audio Coding and MP3 Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1 Analogue audio frequencies: 20Hz - 20kHz mono: x(t)

More information

Wavelet filter bank based wide-band audio coder

Wavelet filter bank based wide-band audio coder Wavelet filter bank based wide-band audio coder J. Nováček Czech Technical University, Faculty of Electrical Engineering, Technicka 2, 16627 Prague, Czech Republic novacj1@fel.cvut.cz 3317 New system for

More information

DSP-CIS. Part-IV : Filter Banks & Subband Systems. Chapter-10 : Filter Bank Preliminaries. Marc Moonen

DSP-CIS. Part-IV : Filter Banks & Subband Systems. Chapter-10 : Filter Bank Preliminaries. Marc Moonen DSP-CIS Part-IV Filter Banks & Subband Systems Chapter-0 Filter Bank Preliminaries Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven marc.moonen@esat.kuleuven.be www.esat.kuleuven.be/stadius/ Part-III Filter

More information
