Principles of Audio Coding

1 Principles of Audio Coding

2 Topics today: Introduction; VOCODERS; Psychoacoustics (Equal-Loudness Curve, Frequency Masking, Temporal Masking). (CSIT 410)

3 Introduction. Speech compression algorithms focus on exploiting temporal redundancy: PCM, DPCM, ADPCM. Variants of these algorithms take the properties of speech into consideration. [Figure: (a) linear PCM at 16 bits per sample, sampled at 8 kHz; (b) speech restored from G.721-compressed audio at 4 bits per sample; (c) difference between (a) and (b).]

4 Introduction [2]. G.726 ADPCM supersedes G.721 and G.723. It defines a multiplier constant that changes for every difference value e_n, depending on the current scale of the signal. The scaled difference signal g_n is obtained by scaling e_n with this constant.

5 Introduction [3]. g_n is sent to the quantizer. The quantizer is backward adaptive: it notices when too many values are quantized far from 0, or when too many fall close to 0, and changes the size of its steps accordingly.
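The backward-adaptive idea above can be sketched in a few lines. This is only an illustration, not the G.726 algorithm: the scaled difference is assumed to be g_n = e_n / alpha, the predictor is assumed to be the previous reconstructed sample, and the growth and shrink factors (1.5 and 0.9), the limits on alpha and the function names are all hypothetical choices.

```python
def _adapt(alpha, code, levels):
    """Backward adaptation: grow the step when codes land far from 0, shrink it otherwise."""
    if abs(code) >= levels // 2:      # code fell in the outer half of the quantizer
        alpha *= 1.5                  # hypothetical growth factor
    else:
        alpha *= 0.9                  # hypothetical shrink factor
    return min(max(alpha, 1.0), 4096.0)


def adpcm_encode(samples, bits=4, alpha=16.0):
    """Return (codes, reconstruction) for a toy ADPCM coder."""
    levels = 2 ** (bits - 1)          # codes lie in [-levels, levels - 1]
    predicted, codes, recon = 0.0, [], []
    for s in samples:
        e = s - predicted             # difference signal e_n
        g = e / alpha                 # scaled difference g_n (assumed form)
        code = max(-levels, min(levels - 1, round(g)))
        predicted += code * alpha     # the decoder computes exactly the same value
        codes.append(code)
        recon.append(predicted)
        alpha = _adapt(alpha, code, levels)
    return codes, recon


def adpcm_decode(codes, bits=4, alpha=16.0):
    """Rebuild the signal from the codes alone; alpha is never transmitted."""
    levels = 2 ** (bits - 1)
    predicted, out = 0.0, []
    for code in codes:
        predicted += code * alpha
        out.append(predicted)
        alpha = _adapt(alpha, code, levels)
    return out
```

Because the step size is updated only from codes that were already sent, the decoder can track it without side information, which is what "backward adaptive" means here.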

6 VOCODERS. Voice coders (vocoders) are concerned with modeling speech, capturing its features in as few bits as possible. They model the speech waveform either in the time domain (Linear Predictive Coding) or in the frequency domain (channel vocoders and formant vocoders).

7 VOCODERS - Phase Insensitivity. Phase is a shift in the time argument. Perceptually, the sound waves cos(ωt) + cos(2ωt + π/2) and cos(ωt) + cos(2ωt) sound similar, so the energy spectrum is what matters, not the shape of the waveform.

8 VOCODERS - Phase Insensitivity [2]. [Figure: the solid line shows the phase-shifted superposition of two cosine waves; the dashed line shows the unshifted superposition.]
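A quick numerical check of the phase-insensitivity claim, as a sketch: the two superpositions have different waveforms but identical magnitude spectra. The sample count N and the helper names are arbitrary choices for the illustration.

```python
import cmath
import math

N = 64  # samples in one period of the fundamental (arbitrary choice)


def tone_pair(phase):
    """cos(wt) + cos(2wt + phase), sampled over one period of the fundamental."""
    return [math.cos(2 * math.pi * n / N) + math.cos(4 * math.pi * n / N + phase)
            for n in range(N)]


def dft_magnitude(x, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / len(x))
                   for n, v in enumerate(x)))


shifted = tone_pair(math.pi / 2)
unshifted = tone_pair(0.0)

# The waveforms differ sample by sample, yet bins 1 and 2 carry the same energy.
max_waveform_gap = max(abs(a - b) for a, b in zip(shifted, unshifted))
```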

9 VOCODERS - Channel Vocoder. A channel vocoder uses subband filtering (subband coding). It works with the energy in each band, so the waveform is rectified to its absolute value. ITU G.722, for instance, filters the signal into two bands: (1) 50 Hz to 3.5 kHz at 48 kbps, and (2) from there up to 7 kHz at 16 kbps. Vocoders can operate at low bitrates, 1-2 kbps. [Figure: waveform for the word "audio".]

10 VOCODERS - Channel Vocoder [2]. Besides the absolute value, it also analyzes the pitch and excitation of the speech. Excitation is concerned with whether a sound is voiced or unvoiced. An unvoiced signal looks like noise (s, f); a voiced signal is fairly periodic (a, e, o).

11 VOCODERS - Channel Vocoder [3]. It uses a vocal-tract model to generate a vector of excitation parameters that describe the sound, guesses whether the sound is voiced or unvoiced and, if the sound is voiced, identifies its period. This is done using 2400 bps.

12 VOCODERS - Channel Vocoder [4]. Voiced sounds: periodic wave generator. Unvoiced sounds: pseudo-noise generator plus an estimate of the energy given by the band-pass filter. Achieves intelligible synthetic voice at 2400 bps.
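The two excitation sources of the synthesizer can be sketched as follows. This is a minimal illustration under stated assumptions: the voiced source is modeled as an impulse train, the unvoiced source as uniform pseudo-noise, and the function name, the default period of 80 samples and the seed are hypothetical. A real channel vocoder would then shape this excitation with the per-band energy estimates.

```python
import random


def excitation(voiced, n_samples, period=80, gain=1.0, seed=0):
    """Generate the excitation signal that drives the synthesis filters.

    voiced=True  -> periodic impulse train with the given pitch period (in samples)
    voiced=False -> pseudo-noise in [-gain, gain], reproducible through the seed
    """
    if voiced:
        return [gain if i % period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
```

For example, `excitation(True, 200)` places impulses at samples 0, 80 and 160, which at an 8 kHz sampling rate would correspond to a 100 Hz pitch.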

13 VOCODERS - Channel Vocoder [5]. [Figure: channel vocoder.]

14 VOCODERS - Formant Vocoder. Not all frequencies present in speech are equally represented: certain frequency components are strong while others are weak. The important frequency peaks are called formants. A formant vocoder works by encoding only the most important frequencies, and can produce intelligible speech at 1000 bps. [Figure: formants of two signals.]

15 VOCODERS - Linear Predictive Coding. LPC extracts the features of the signal directly from the waveform, without converting to the frequency domain. What is sent is a set of parameters modeling the shape and excitation of the vocal tract, not actual signals or differences. Bitrates using LPC are small because we send instructions rather than the sound itself (somewhat like MIDI).

16 Psychoacoustics - Introduction. The range of human hearing is about 20 Hz to 20 kHz. The range of the human voice is about 500 Hz to 4 kHz. Temporal masking: ever attended a musical performance and found that for some time afterward you hear nothing? Frequency masking: have you noticed the band's singing drowned out by the lead guitar?

17 Psychoacoustics - Introduction [2]. Any coding technique that takes advantage of such a psychoacoustic model of hearing is referred to as perceptual coding.

18 Psychoacoustics - Equal-Loudness Relations. The ear does not hear low and high frequencies as well as those in the middle. Fletcher-Munson curves (equal-loudness curves) plot perceived loudness (in phons) for a given sound level (dB) against frequency (Hz).

19 Psychoacoustics - Equal-Loudness Relations [2]. [Figure: Fletcher-Munson equal-loudness response curves.]

20 Psychoacoustics - Equal-Loudness Relations [3]. At 4 kHz, 2 dB gives the perception of 10 dB. At 10 kHz, 20 dB gives the perception of 10 dB. At 0.1 kHz, 30 dB gives the perception of 10 dB.

21 Psychoacoustics - Equal-Loudness Relations [4]. Observe the curves in the 2.5 kHz to 4 kHz range: the ear is very sensitive there. Reason: the ear canal amplifies frequencies from 2.5 kHz to 4 kHz.

22 Psychoacoustics - Frequency Masking. Frequency masking answers two questions: how does one tone interfere with another, and at what level does one frequency drown out another? Masking curves answer these questions.

23 Psychoacoustics - Frequency Masking [2]. Scenarios: A lower tone can effectively mask higher tones. A higher tone does not mask a lower tone as effectively as a lower tone masks a higher one. The greater the power in the masking tone, the wider its influence, that is, the broader the range of frequencies it can mask. If two tones are widely separated in frequency, little masking occurs.

24 Psychoacoustics - Frequency Masking [3]. Threshold of hearing: 1. Generate one particular frequency (say 1 kHz). 2. Reduce its volume to 0 in a quiet room. 3. Turn the volume up until the sound is barely audible. Generate data points this way for all audible frequencies and plot them.

25 Psychoacoustics - Frequency Masking [4]. Threshold of hearing: a sound can be heard only if it is above its threshold level. The threshold curve can be approximated by a closed-form formula in frequency; the threshold is expressed in dB.
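A widely used approximation of the threshold in quiet is T(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 10^-3 (f/1000)^4, with f in Hz and T in dB. It is assumed here for illustration and may not be the exact formula that was shown on the slide; the function names are hypothetical.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Approximate threshold of hearing in dB for a frequency in Hz (assumed formula)."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


def is_audible(f_hz, level_db):
    """A tone in a quiet room is audible only if it lies above the threshold curve."""
    return level_db > threshold_in_quiet_db(f_hz)
```

The curve dips below 0 dB around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both ends of the hearing range.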

26 Psychoacoustics - Frequency Masking [5]. Frequency masking curves are generated by playing a pure tone (say at 1 kHz) at a loud volume and determining how this tone affects our ability to hear tones at nearby frequencies: play a 1 kHz masking tone at 60 dB, then raise the level of a nearby tone, say 1.1 kHz, until it is just audible.

27 Psychoacoustics - Frequency Masking [6]. The higher the frequency of the masking tone, the broader the range of its influence. [Figure: effect of masking tones.]

28 Psychoacoustics - Frequency Masking [7]. Masking by loudness. [Figure: effect of loudness of tones.]

29 Psychoacoustics - Critical Bands. A critical band represents the ear's resolving power for simultaneous tones. The human hearing range is divided into critical bands. The auditory frequency-analysis mechanism cannot resolve inputs whose frequency difference is smaller than the critical bandwidth. The result is reduced audibility of a sound signal in the presence of a second signal of higher intensity within the same critical band.

30 Psychoacoustics - Critical Bands [2]. At low frequencies, the critical bandwidth is approximately 100 Hz. For frequencies above 500 Hz, the critical bandwidth increases approximately linearly with frequency. The ear is not very discriminating within a critical band, because of masking.

31 Psychoacoustics - Critical Bands [3]. [Table: critical bands and their bandwidths.]

32 Psychoacoustics - Bark Unit. The higher the masking tone frequency, the broader the range of frequencies masked. The Bark is an alternative frequency unit chosen so that the masking curves all have the same width. The unit is named after Heinrich Barkhausen. One Bark corresponds to the width of one critical band, for any masking frequency.

33 Psychoacoustics - Bark Unit [2]. [Figure: effects of masking tones expressed in Bark units.] Frequency is converted to critical band number (Bark) by a closed-form formula.
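One commonly used conversion from frequency to critical band number is b = 13 arctan(0.76 f) + 3.5 arctan((f / 7.5)^2), with f in kHz. It is assumed here for illustration and may differ from the formula shown on the slide; the function name is hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Map a frequency in Hz to a critical band number in Bark (assumed formula)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)
```

On this scale equal steps correspond to roughly equal numbers of critical bands, so 100 Hz of separation counts for much more near 200 Hz than near 5 kHz.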

34 Psychoacoustics - Temporal Masking. It takes quite a while for our hearing to return to normal after a musical performance. Any loud tone causes the hearing receptors in the inner ear to become saturated, and they require time to recover. Human eyes show a similar effect.

35 Psychoacoustics - Temporal Masking [2]. Masking experiment: 1. Play a masking tone, say at 1 kHz, at a level of 60 dB. 2. Play another (test) tone at 1.1 kHz, at 40 dB. It may not be heard in the presence of the masking tone. 3. Turn off the masking tone. It takes a while before the test tone can be heard. 4. Now turn off the test tone shortly after the masking tone. Adjust this delay to the point at which the test tone can just be distinguished. It may take 500 ms to discern the test tone after a masking tone at 60 dB is turned off.

36 Psychoacoustics - Temporal Masking [3]. The louder the test tone, the shorter the delay before it can be heard after the masking signal is removed.

37 Psychoacoustics - Temporal Masking [4]. [Figure: solid line, masking tone played for 200 ms; dashed line, masking tone played for 100 ms.] The saturation effect also depends on how long the masking signal is applied.

38 MPEG - Introduction. First, a filter bank is applied to the input to break it into frequency components. A psychoacoustic model is applied to the data, and its output drives a bit-allocation block. The number of bits allocated is then used to quantize the information from the filter bank.

39 MPEG Layers. The audio part of the MPEG standard defines 3 downward-compatible layers, each able to understand the lower layers. Higher layers use a more complex psychoacoustic model and give better compression, with more delay. Layer 1: DAT (good quality at a high bitrate). Layer 2: DAB. Layer 3 (MP3): audio transmission over ISDN.

40 MPEG Layers [2]. Each layer uses a different frequency transform and psychoacoustic model. Encoders are the more complex part; decoders are simpler (MP3 players, for instance). Quality in terms of listening-test scores at 64 kbps, on a scale of 5: Layer … to 2.6; Layer … to 3.8.

41 MPEG Audio Strategy. Compression is called for. It relies on quantization, but also uses the idea of critical bands. MPEG-1 aims at 256 kbps for audio. The encoder employs a bank of filters that analyze the frequency components of the audio signal. Frequency masking is used to estimate the just-noticeable noise level. The encoder balances the masking behavior against the available number of bits by discarding inaudible frequencies.

42 MPEG Audio Strategy [2]. All frequency-analysis filters have uniform width: 32 overlapping subbands. For each frequency band, the amount by which the sound level exceeds the masking level dictates how many bits must be assigned to code the signal values. Quantization noise is kept below the masking level, so it cannot be heard.

43 MPEG Audio Strategy [3]. Layer 1 uses only frequency masking. Bitrates range from 32 kbps (mono) to 448 kbps (stereo). Layer 2 also uses temporal masking, by accumulating more blocks of samples and comparing the current block with the neighboring blocks. Ranges from … (mono) to … kbps (stereo). Layer 3 is directed toward lower-bitrate applications and uses more sophisticated subband analysis, nonuniform quantization and entropy coding. Ranges from … kbps.

44 Audio Compression Algorithm. [Figure: MPEG audio encoder and decoder.]

45 Audio Compression Algorithm [2]. The input is divided into 32 frequency subbands by a filter bank, which takes 32 PCM samples (in time) as input and produces 32 frequency coefficients as output. If the sampling rate is fs = 48 ksps, the maximum frequency mapped is fs/2 = 24 kHz (by the Nyquist theorem).
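The arithmetic behind the uniform subbands, as a small sketch: with 32 equal-width bands covering 0 to fs/2, each band is fs/64 wide. The function name is hypothetical, and the ideal band edges ignore the overlap of real filters.

```python
def subband_edges(fs_hz, n_bands=32):
    """Return (low, high) edges in Hz of equal-width subbands covering 0 .. fs/2."""
    width = fs_hz / 2 / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]


edges = subband_edges(48000)
# edges[0] covers 0 to 750 Hz; the last band ends at the Nyquist frequency, 24 kHz.
```

Note that 750 Hz is much wider than the roughly 100 Hz critical bands at low frequencies, so uniform subbands are only a coarse match to the ear's resolution there.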

46 Audio Compression Algorithm [3]. Layer 1: the sets of 32 values are assembled into 12 groups of 32 (a segment), which means a delay to accumulate 384 samples. Quantization is decided for each segment. Consider the segment as a 32 x 12 matrix.
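The segment bookkeeping can be sketched as follows: 12 groups of 32 values give 384 samples, which at 48 kHz is 8 ms of audio, and transposing the groups yields one row of 12 values per subband. The function names are hypothetical.

```python
def frame_stats(fs_hz, groups=12, bands=32):
    """Return (samples per segment, buffering delay in milliseconds)."""
    n_samples = groups * bands
    return n_samples, 1000.0 * n_samples / fs_hz


def to_subband_rows(groups):
    """Turn 12 filter-bank outputs (32 values each) into 32 rows of 12 values."""
    return [list(row) for row in zip(*groups)]
```

For example, `frame_stats(48000)` gives 384 samples and 8.0 ms; with 36 groups instead of 12 the same arithmetic gives 1152 samples and 24.0 ms.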

47 Audio Compression Algorithm [4]. For each of the 32 subbands the quantization is set. The maximum amplitude in a row of 12 samples is taken as the scaling factor of that subband and subsequently dictates the bit allocation. A decision is made whether the signal is noise-like or tonal. This decision and the scaling factor are used to calculate the masking threshold for each band, which is then compared to the threshold of hearing. The output of the frequency masking model is the SMR (Signal-to-Mask Ratio): the ratio of short-term signal power to the minimum masking threshold for the subband. The SMR directs the amplitude resolution and influences the bit allocation.
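How the SMR can steer bit allocation is sketched below as a greedy loop. This is an illustration under stated assumptions, not the standard's procedure: the quantizer SNR is approximated as 6.02 dB per bit (real coders use tabulated values), the loop stops once all noise is masked or the bit pool runs out, and the function and parameter names are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, samples_per_band=12, max_bits=15):
    """Greedily give bits to the subband whose quantization noise is most audible.

    smr_db   : signal-to-mask ratio per subband, in dB
    bit_pool : bits available for the sample values of this segment
    """
    bits = [0] * len(smr_db)
    while bit_pool >= samples_per_band:
        # Mask-to-noise ratio: negative means noise sticks out above the mask.
        candidates = [(6.02 * bits[i] - smr_db[i], i)
                      for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        mnr, worst = min(candidates)
        if mnr >= 0:                  # noise is already below the masking level everywhere
            break
        bits[worst] += 1              # one more bit for each of the band's 12 samples
        bit_pool -= samples_per_band
    return bits
```

With SMRs of 20, 5, -3 and 12 dB and a generous pool, the bands end up with 4, 1, 0 and 2 bits: the band whose signal is already below the mask gets nothing.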

48 Audio Compression Algorithm [5]. More bits are used in the regions where hearing is more sensitive. The scaling factor is quantized using 6 bits. The 12 values in each subband are quantized. The bit allocation for each subband is also transmitted. The maximum resolution of the quantizer is 15 bits.
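Scaling and quantizing the 12 values of one subband can be sketched as below. The sketch assumes a uniform midtread quantizer with 2^bits - 1 levels and at least 2 bits, and it keeps the scaling factor as a plain float rather than quantizing it to 6 bits; the function names are hypothetical.

```python
def quantize_subband(samples, bits):
    """Normalize by the largest magnitude, then quantize uniformly. Requires bits >= 2."""
    scale = max(abs(s) for s in samples) or 1.0   # scaling factor of the subband
    half = (2 ** bits - 1) // 2                   # codes lie in [-half, half]
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def dequantize_subband(scale, codes, bits):
    """Invert quantize_subband up to the quantization error."""
    half = (2 ** bits - 1) // 2
    return [c / half * scale for c in codes]
```

The reconstruction error per sample is at most half a quantizer step, that is scale / (2 * half), which is why adding a bit roughly halves the noise in that subband.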

49 Audio Compression Algorithm [6]. [Figure: MPEG audio frame size.]

50 Audio Compression Algorithm [7]. Layer 2: reduced bitrate, improved quality and increased complexity. Three groups of 12 samples are encoded in each frame, and temporal masking is brought into play. If the scaling factor is similar for each of the three groups, only one needs to be sent. Bit allocation is applied to a window length of 36 samples (before, current and next) instead of 12 as in Layer 1. Increased quantizer resolution: 16 bits.

51 Audio Compression Algorithm [8]. Layer 3: takes stereo redundancy into account. Uses the MDCT, a refinement of the Fourier transform that addresses the problems the DCT had at the boundaries of the window used. The window size can optionally be reduced to 12 samples, or a mixture of the two window sizes can be used (18 for lower frequencies). Better compression ratio. [Table: MP3 compression performance.]
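Why the MDCT behaves well at window boundaries can be seen in a small sketch: consecutive windows overlap by half, and the aliasing introduced in each block cancels when the inverse transforms are overlap-added. The sketch assumes a sine window and a direct O(N^2) evaluation; the block size of 18 coefficients per 36-sample window, the 2/N scaling convention and the function names are choices made for the illustration.

```python
import math


def _window(n_coeffs):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coeffs)) for n in range(2 * n_coeffs)]


def _basis(n, k, n_coeffs):
    return math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))


def mdct(block, n_coeffs):
    """2N windowed input samples -> N coefficients."""
    w = _window(n_coeffs)
    return [sum(w[n] * block[n] * _basis(n, k, n_coeffs) for n in range(2 * n_coeffs))
            for k in range(n_coeffs)]


def imdct(coeffs, n_coeffs):
    """N coefficients -> 2N windowed, time-aliased output samples."""
    w = _window(n_coeffs)
    return [w[n] * 2.0 / n_coeffs * sum(coeffs[k] * _basis(n, k, n_coeffs)
                                        for k in range(n_coeffs))
            for n in range(2 * n_coeffs)]


def analyze_synthesize(x, n_coeffs=18):
    """Run MDCT then IMDCT on half-overlapping blocks and overlap-add the results."""
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n_coeffs + 1, n_coeffs):
        y = imdct(mdct(x[start:start + 2 * n_coeffs], n_coeffs), n_coeffs)
        for n, v in enumerate(y):
            out[start + n] += v
    return out
```

Every sample covered by two overlapping blocks is reconstructed exactly (up to rounding), even though each block stores only half as many coefficients as it has input samples; only the first and last half-blocks, which lack a neighbor, are not recovered.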

52 MPEG-2 AAC (Advanced Audio Coding). Standard for DVDs; adopted by XM radio. Capable of delivering high-quality sound on 5 channels, so that it can be played from 5 directions, at 320 kbps. A 5.1-channel system adds a low-frequency enhancement (LFE) channel for the woofer. Also capable of delivering good-quality stereo sound at 128 kbps. Supports 3 profiles: Main, Low Complexity (LC) and Scalable Sampling Rate (SSR).

53 MPEG-4, 7, 21. MPEG-4 integrates several audio coders, perceptual and structured: speech compression, perceptually based coders, text-to-speech, MIDI. MPEG-7 promotes the search for audio objects; coding is based on audio objects, not on a complete model; ASR is supported. MPEG-21 is an ongoing effort addressing interoperability.

54 Reference: Chapters 13 and 14.


More information

CSCD 443/533 Advanced Networks Fall 2017

CSCD 443/533 Advanced Networks Fall 2017 CSCD 443/533 Advanced Networks Fall 2017 Lecture 18 Compression of Video and Audio 1 Topics Compression technology Motivation Human attributes make it possible Audio Compression Video Compression Performance

More information

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1 Multimedia What is multimedia? Media types + Text +Graphics +Audio +Image +Video Interchange formats Petri Vuorimaa 1 What is multimedia? Multimedia = many media User interaction = interactivity Script

More information

Chapter 4: Audio Coding

Chapter 4: Audio Coding Chapter 4: Audio Coding Lossy and lossless audio compression Traditional lossless data compression methods usually don't work well on audio signals if applied directly. Many audio coders are lossy coders,

More information

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology Course Presentation Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 25 Sharif University of Technology Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code

More information

Wavelet filter bank based wide-band audio coder

Wavelet filter bank based wide-band audio coder Wavelet filter bank based wide-band audio coder J. Nováček Czech Technical University, Faculty of Electrical Engineering, Technicka 2, 16627 Prague, Czech Republic novacj1@fel.cvut.cz 3317 New system for

More information

Lossy compression. CSCI 470: Web Science Keith Vertanen

Lossy compression. CSCI 470: Web Science Keith Vertanen Lossy compression CSCI 470: Web Science Keith Vertanen Digital audio Overview Sampling rate Quan5za5on MPEG audio layer 3 (MP3) JPEG s5ll images Color space conversion, downsampling Discrete Cosine Transform
