# CHAPTER 6 Audio compression in practice

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 CHAPTER 6 Audio compression in practice In earlier chapters we have seen that digital sound is simply an array of numbers, where each number is a measure of the air pressure at a particular time. This representation makes it simple to manipulate sound on a computer by doing calculations with these numbers. We have seen examples of simple filters which change the sound in some preferred way like removing high or low frequencies or adding echo. We have also seen how an audio signal can be rewritten in terms of a multiresolution analysis which may be advantageous if we want to compress the data that represent the sound. In this chapter we will very briefly review some practical aspects of sound processing. In particular we will briefly discuss how a sound may be represented in terms of trigonometric functions and review the most important digital audio formats. 6.1 Wavelet based compression In chapter 10 in the Norwegian lecture notes, we described a way to decompose a set of points by comparing representations at different resolutions with the help of piecewise linear functions, a so-called multiresolution analysis. This illustrated the idea behind a family of functions called wavelets (although wavelets should have some additional properties that we have ignored to keep the presentation simple). In this section we will try to compress sound with the multiresolution analysis. Example 6.1. Figure 6.1 shows an example with real audio data. The data in (a) were decomposed once which resulted in the coarser signal in (b) and the error signal in (c). Then all error values smaller than 0.02 set to zero which leads to the signal in (d). The reconstruction algorithm was then run with the data in (b) and (d); the result is shown in (e). Figure (f) shows the difference between the two signals. The same computations were performed with a larger data set consisting of points from the same song. Out of error values were smaller than 0.02 in absolute value. When these were set to 0 and the sound reconstructed with the perturbed data, the music still sounded quite good. The 103

2 104 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE (a) (b) (c) (d) (e) (f) Figure 6.1. An example of data decomposition with the multiresolution analysis from chapter 10 of the Norwegian lecture notes. Figure (a) shows the piecewise linear interpolant to data taken from the song Here Comes the Sun by the Beatles. Figure (b) shows the piecewise linear interpolant to every other data point, and figure (c) shows the error function. In (d) we have set all error values smaller than 0.02 to zero, altogether 148 out of 200 values. In (e) the data have been reconstructed from the functions in (c) and (d). Figure (f) shows the difference between the original data and the reconstructed data.

3 6.2. FOURIER ANALYSIS AND THE DCT (a) (b) Figure 6.2. Trigonometric approximation of a cubic polynomial on the interval [0,2π]. In (a) both functions are shown while in (b) the approximation is plotted on the interval [ 10,10]. maximum difference between this perturbed signal and the original was just under To check the possibility of compression, the data were written to a file as 32-bit floating-point numbers and then compressed with gzip. This resulted in a file with bytes. When the original data were written to file and compressed with gzip, the file size was bytes. In other words, the sound has been compressed down to 55 % with a very simple algorithm, without too much deterioration in the sound quality. 6.2 Fourier analysis and the DCT As we have seen, multiresolution analysis and wavelets provide a convenient framework for compression of audio files. The theory behind wavelets was developed in the late 1980s and was not introduced into much commercial software until the late 1990s. Common audio formats like MP3, AAC and Ogg Vorbis were developed before this time and use a mathematical operation called the Discrete Cosine Transform (DCT) to transform the audio data to a form that lends itself well to compression. Before we consider the basics of the DCT, we need some background information. A very common technique in mathematics is to approximate a general function by a function from some appropriate family. We have seen several examples of this, for example approximation by Taylor polynomials and approximation by piecewise linear functions. Another possibility is to approximate functions by combinations of trigonometric functions. This is a technique that is used in many different areas of mathematics and is usually referred to as Fourier analysis.

4 106 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE In Fourier analysis, general functions are approximated by combinations of the functions 1,sin x,cos x,sin2x,cos2x,sin3x,cos3x,sin4x,cos4x,... (6.1) A simple example is shown in figure 6.2. There a cubic polynomial has been approximated by a trigonometric function on the form a 0 + b 1 sin x + a 1 cos x + b 2 sin2x + a 2 cos2x. (6.2) The coefficients have been determined in such a way that the error should be small on the interval [0,1]. This has been quite successful in that we can hardly differentiate between the two functions in figure 6.2a. In figure 6.2b the approximation has been plotted on the interval [ 10,10]. From this plot we see that the approximation is periodic, as is the case for all trigonometric functions, and therefore very different from the polynomial as we come outside the interval [0,1]. The fact that the error was small in figure 6.2a was no accident; it can be proved that quite general functions can be approximated arbitrarily well by trigonometric functions. Recall that sinkt corresponds to a pure tone of frequency k/(2π). If we think of a certain sound as a function, the fact that any function can be expressed as a combination of the trigonometric functions in (6.1) means that any sound can be generated by combining a set of pure tones with different weights, just as in (6.2). So given a sound, it can always de decomposed into different amounts of pure tones be represented by suitable trigonometric functions. Fact 6.2 (Fourier analysis). Any reasonable function f can be approximated arbitrarily well on the interval [0,2π] by a combination a 0 + N (a k coskt + b k sinkt), (6.3) k=1 where N is a suitable integer. This means that any sound can be decomposed into different amounts (the coefficients {a k } and {b k }) of pure tones (the functions {coskt} and {sinkt}). This is referred to by saying that f has been converted to the frequency domain and the coefficients {a k } and {b k } are referred to as the spectrum of f. When a given sound has been decomposed as in (6.3), we can adjust the sound by adjusting the coefficients. The trigonometric functions with large values of k correspond to high frequencies and the ones with small values of k correspond to low frequencies. We can therefore adjust the treble and bass by manipulating the appropriate coefficients.

5 6.2. FOURIER ANALYSIS AND THE DCT 107 There is an extensive theory for audio processing based on Fourier analysis that is studied in the field of signal processing The Discrete Cosine Transform We are particularly interested in digital audio, i.e., sound given by values sampled from an audio signal at regular intervals. We have seen how digital sound can be compressed by using a multiresolution analysis, but this can also be done with ideas from Fourier analysis. For the purpose of compressing sound it is not so common to use Fourier analysis as indicated above. Instead only cosine functions are used for decomposition. Definition 6.3 (Discrete Cosine Transform (DCT)). Suppose the sequence of numbers u = {u s } n 1 s=0 are given. The DCT of u is given by the sequence v s = 1 n 1 ( (2r + 1)sπ ) u r cos, for s = 0,..., n 1. (6.4) n 2n r =0 The new sequence v generated by the DCT tells us how much the sequence u contains of the different frequencies. For each s = 0, 1,..., n 1, the function cos sπt is sampled at the points t r = (2r + 1)/(2n) for r = 0, 1,..., n 1 which results in the values ( sπ ) cos, cos 2n ( 3sπ 2n ), cos ( 5sπ 2n ),..., cos ( (2n 1)sπ These are then multiplied by the u r and everything is added together. Plots of these values for n = 6 are shown in figure 6.3. We note that as s increases, the functions oscillate more and more. This means that v 0 gives a measure of how much constant content there is in the data, while v 5 gives a measure of how much content there is with maximum oscillation. In other words, the DCT of an audio signal shows the proportion of the different frequencies in the signal. Once the DCT of u has been computed, we can analyse the frequency content of the signal. If we want to reduce the bass we can decrease the v s -values with small indices and if we want to increase the treble we can increase the v s - values with large indices. In a typical audio signal there will be most information in the lower frequencies, and some frequencies will be almost completely absent, i.e., some of the v s -values will be almost zero. This can exploited for compression: We change the small v s -values a little bit and set them to 0, and then 2n ).

6 108 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE (a) (b) (c) (d) (e) (f) Figure 6.3. The 6 different versions of the cos function used in DCT for n = 6. The plots show piecewise linear functions, but this is just to make the plots more readable: Only the values at the integers 0,..., 5 are used.

7 6.2. FOURIER ANALYSIS AND THE DCT (a) (b) (c) (d) Figure 6.4. The signal in (a) is a small part of a song (the same as in figure 6.1b). The plot in (b) shows the DCT of the signal. In (c), all values of the DCT that are smaller than 0.02 in absolute value have been set to 0, a total of 309 values. In (c) the signal has been reconstructed from these perturbed values of the DCT. Note that all signals are discrete; the values have been connected by straight lines to make it easier to interpret the plots. store the signal by storing the DCT-values. This leaves us with one challenge: How to convert the perturbed DCT-values back to a normal signal. It turns out that this can be accomplished with formulas very similar to the DCT itself. Theorem 6.4 (Inverse Discrete Cosine Transform). Suppose that the sequence v = {v s } n 1 s=0 is the DCT of the sequence u = {u r } n 1 r =0 as in (6.4). Then u can be recovered from v via the formulas u r = 1 n 1 ( (2r + 1)sπ ) (v ) v s cos, for r = 0,..., n 1. (6.5) n 2n s=1 We now see the resemblance with multiresolution analysis. The DCT provides another way to rewrite a signal in an alternative form where many values become small or even 0, and this can then be exploited for compression. Example 6.5. Let us test a naive compression strategy similar to the one in ex-

8 110 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE ample 6.1 where we just replace the multiresolution decomposition and reconstruction formulas with the DCT and its inverse. The plots in figure 6.4 illustrate the principle. A signal is shown in (a) and its DCT in (b). In (d) all values of the DCT with absolute value smaller than 0.02 have been set to zero. The signal can then be reconstructed with the inverse DCT of theorem 6.4; the result of this is shown in (c). The two signals in (a) and (b) visually look almost the same even though the signal in (c) can be represented with less than 25 % of the information present in (a). The larger data set from example 6.1 consists of points. We compute the DCT and set all values smaller than a suitable tolerance to 0. It turns out that we need to use a tolerance as small as to get an error in the signal itself that is comparable to the corresponding error in example 6.1. With a tolerance of 0.04, a total of values are set to zero. When we then reconstruct the sound with the inverse DCT, we obtain a signal that differs at most from the original signal which is about the same as in example 6.1. We can store the signal by storing a gzip ed version of the DCT-values (as 32-bit floating-point numbers) of the perturbed signal. This gives a file with bytes, which is 88 % of the gzip ed version of the original data. Recall that when we used the wavelet transform in example 6.1, we ended up with a file size of , which is considerably less than what we obtained here, without the error being much bigger. From this simple example, wavelets therefore seem promising for compression of audio. The approach to compression that we have outlined in the above examples is essentially what is used in practice. The difference is that commercial software does everything in a more sophisticated way and thereby get better compression rates. Fact 6.6 (Basic idea behind audio compression). Suppose a digital audio signal u is given. To compress u, perform the following steps: 1. Rewrite the signal u in a new format where frequency information becomes accessible. 2. Remove those frequencies that only contribute marginally to human perception of the sound. 3. Store the resulting sound by coding the remaining frequency information with some lossless coding method. All the compression strategies used in the commercial formats that we re-

9 6.3. PSYCHO-ACOUSTIC MODELS 111 view below, use the strategy in fact 6.6. In fact they all use a modified version of the DCT in step 1 and a variant of Huffman coding in step 3. Where they vary the most is probably in deciding what information to remove from the signal. To do this well requires some knowledge of human perception of sound. 6.3 Psycho-acoustic models In the previous sections, we have outlined a simple strategy for compressing sound. The idea is to rewrite the audio signal in an alternative mathematical representation where many of the values are small, set the smallest values to 0, store this perturbed signal and compress it with a lossless compression method. This kind of compression strategy works quite well, and is based on keeping the difference between the original signal and the compressed signal small. However, in certain situations a listener will not be able to perceive the sound as being different even if this difference is quite large. This is due to how our auditory system interprets audio signals and is referred to as psycho-acoustic effects. When we hear a sound, there is a mechanical stimulation of the ear drum, and the amount of stimulus is directly related to the size of the sample values of the digital sound. The movement of the ear drum is then converted to electric impulses that travel to the brain where they are perceived as sound. The perception process uses a Fourier-like transformation of the sound so that a steady oscillation in air pressure is perceived as a sound with a fixed frequency. In this process certain kinds of perturbations of the sound are hardly noticed by the brain, and this is exploited in lossy audio compression. The most obvious psycho-acoustic effect is that the human auditory system can only perceive frequencies in the range 20 Hz Hz. An obvious way to do compression is therefore to remove frequencies outside this range, although there are indications that these frequencies may influence the listening experience inaudibly. Another phenomenon is masking effects. A simple example of this is that a loud sound will make a simultaneous quiet sound inaudible. For compression this means that if certain frequencies of a signal are very prominent, most of the other frequencies can be removed, even when they are quite large. These kinds of effects are integrated into what is referred to as a psychoacoustic model. This model is then used as the basis for simplifying the spectrum of the sound in way that is hardly noticeable to a listener, but which allows the sound to be stored with must less information than the original.

10 112 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE 6.4 Digital audio formats Digital audio first became commonly available when the CD was introduced in the early 1980s. As the storage capacity and processing speeds of computers increased, it became possible to transfer audio files to computers and both play and manipulate the data. However, audio was represented by a large amount of data and an obvious challenge was how to reduce the storage requirements. Lossless coding techniques like Huffman and Lempel-Ziv coding were known and with these kinds of techniques the file size could be reduced to about half of that required by the CD format. However, by allowing the data to be altered a little bit it turned out that it was possible to reduce the file size down to about ten percent of the CD format, without much loss in quality. In this section we will give a brief description of some of the most common digital audio formats, both lossy and lossless ones Audio sampling PCM The basis for all digital sound is sampling of an analog (continuous) audio signal. This is usually done with a technique called Pulse Code Modulation (PCM). The audio signal is sampled at regular intervals and the sampled values stored in a suitable number format. Both the sampling rate and the number format varies for different kinds of audio. For telephony it is common to sample the sound 8000 times per second and represent each sample value as a 13-bit integer. These integers are then converted to a kind of 8-bit floating-point format with a 4-bit mantissa. Telephony therefore generates bits per second. The classical CD-format samples the audio signal times per second and stores the samples as 16-bit integers. This works well for music with a reasonably uniform dynamic range, but is problematic when the range varies. Suppose for example that a piece of music has a very loud passage. In this passage the samples will typically make use of almost the full range of integer values, from to When the music enters a more quiet passage the sample values will necessarily become much smaller and perhaps only vary in the range 1000 to 1000, say. Since 2 10 = 1024 this means that in the quiet passage the music would only be represented with 10-bit samples. This problem can be avoided by using a floating-point format instead, but very few audio formats appear to do this. Newer formats with higher quality are available. Music is distributed in various formats on DVDs (DVD-video, DVD-audio, Super Audio CD) with sampling rates up to and up to 24 bits per pixel. These formats also support surround sound (up to seven channels as opposed to the two stereo channels on CDs).

11 6.4. DIGITAL AUDIO FORMATS 113 Both the number of samples per second and the number of bits per sample influence the quality of the resulting sound. For simplicity the quality is often measured by the number of bits per second, i.e., the product of the sampling rate and the number of bits per sample. For standard telephony we saw that the bit rate is bits per second or 64 kb/s. The bit rate for CD-quality stereo sound is bits/s = kb/s. This quality measure is particularly popular for lossy audio formats where the uncompressed audio usually is the same (CD-quality). However, it should be remembered that even two audio files in the same file format and with the same bit rate may be of very different quality because the encoding programs me be of different quality. All the audio formats mentioned so far can be considered raw formats; it is a description of how the sound is digitised. When the information is stored on a computer, the details of how the data is organised must be specified, and there are several popular formats for this Lossless formats The two most common file formats for CD-quality audio are AIFF and WAV, which are both supported by most commercial audio programs. These formats specify in detail how the audio data should be stored in a file. In addition, there is support for including the title of the audio piece, album and artist name and other relevant data. All the other audio formats below (including the lossy ones) also have support for this kind of additional information. AIFF. Audio Interchange File Format was developed by Apple and published in AIFF supports different sample rates and bit lengths, but is most commonly used for storing CD-quality audio at samples per second and 16 bits per sample. No compression is applied to the data, but there is also a variant that supports lossless compression, namely AIFF-C. WAV. Waveform audio data is a file format developed by Microsoft and IBM. As AIFF, it supports different data formats, but by far the most common is standard CD-quality sound. WAV uses a 32-bit integer to specify the file size at the beginning of the file which means that a WAV-file cannot be larger than 4 GB. Microsoft therefore developed the W64 format to remedy this. Apple Lossless. After Apple s ipods became popular, the company in 2004 introduced a lossless compressed file format called Apple Lossless. This format is used for reducing the size of CD-quality audio files. Apple has not published the algorithm behind the Apple Lossless format, but most of the details have been

12 114 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE worked out by programmers working on a public decoder. The compression phase uses a two step algorithm: 1. When the nth sample value x n is reached, an approximation y n to x n is computed, and the error e n = x n y n is stored instead of x n. In the simplest case, the approximation y n would be the previous sample value x n 1 ; better approximations are obtained by computing y n as a combination of several of the previous sample values. 2. The error e n is coded by a variant of the Rice algorithm. This is an algorithm which was developed to code integer numbers efficiently. It works particularly well when small numbers are much more likely than larger numbers and in this situation it achieves compression rates close to the entropy limit. Since the sample values are integers, the step above produces exactly the kind of data that the Rice algorithm handles well. FLAC. Free Lossless Audio Code is another compressed lossless audio format. FLAC is free and open source (meaning that you can obtain the program code). The encoder uses an algorithm similar to the one used for Apple Lossless, with prediction based on previous samples and encoding of the error with a variant of the Rice algorithm Lossy formats All the lossy audio formats described below apply a modified version of the DCT to successive groups (frames) of sample values, analyse the resulting values, and perturb them according to a psycho-acoustic model. These perturbed values are then converted to a suitable number format and coded with some lossless coding method like Huffman coding. When the audio is to be played back, this process has to be reversed and the data translated back to perturbed sample values at the appropriate sample rate. MP3. Perhaps the best known audio format is MP3 or more precisely MPEG-1 Audio Layer 3. This format was developed by Philips, CCETT (Centre commun d études de télévision et télécommunications), IRT (Institut für Rundfunktechnik) and Fraunhofer Society, and became an international standard in Virtually all audio software and music players support this format. MP3 is just a sound format and does not specify the details of how the encoding should be done. As a consequence there are many different MP3 encoders available, of varying quality. In particular, an encoder which works well for higher bit rates (high quality sound) may not work so well for lower bit rates.

13 6.4. DIGITAL AUDIO FORMATS 115 MP3 is based on applying a variant of the DCT (called the Modified Discrete Cosine Transform, MDCT) to groups of 576 (in special circumstances 192) samples. These MDCT values are then processed according to a psycho-acoustic model and coded efficiently with Huffman coding. MP3 supports bit rates from 32 to 320 kbit/s and the sampling rates 32, 44.1, and 48 khz. The format also supports variable bit rates (the bit rate varies in different parts of the file). AAC. Advanced Audio Coding has been presented as the successor to the MP3 format by the principal MP3 developer, Fraunhofer Society. AAC can achieve better quality than MP3 at the same bit rate, particularly for bit rates below 192 kb/s. AAC became well known in April 2003 when Apple introduced this format (at 128 kb/s) as the standard format for their itunes Music Store and ipod music players. AAC is also supported by many other music players, including the most popular mobile phones. The technologies behind AAC and MP3 are very similar. AAC supports more sample rates (from 8 khz to 96 khz) and up to 48 channels. AAC uses the MDCT, just like MP3, but AAC processes samples at time. AAC also uses much more sophisticated processing of frequencies above 16 khz and has a number of other enhancements over MP3. AAC, as MP3, uses Huffman coding for efficient coding of the MDCT values. Tests seem quite conclusive that AAC is better than MP3 for low bit rates (typically below 192 kb/s), but for higher rates it is not so easy to differentiate between the two formats. As for MP3 (and the other formats mentioned here), the quality of an AAC file depends crucially on the quality of the encoding program. There are a number of variants of AAC, in particular AAC Low Delay (AAC- LD). This format was designed for use in two-way communication over a network, for example the Internet. For this kind of application, the encoding (and decoding) must be fast to avoid delays (a delay of at most 20 ms can be tolerated). Ogg Vorbis. Vorbis is an open-source, lossy audio format that was designed to be free of any patent issues and free to use, and to be an improvement on MP3. At our level of detail Vorbis is very similar MP3 and AAC: It uses the MDCT to transform groups of samples to the frequency domain, it then applies a psychoacoustic model, and codes the final data with a variant of Huffman coding. In contrast to MP3 and AAC, Vorbis always uses variable length bit rates. The desired quality is indicated with an integer in the range 1 (worst) to 10 (best). Vorbis supports a wide range of sample rates from 8 khz to 192 khz and up to

14 116 CHAPTER 6. AUDIO COMPRESSION IN PRACTICE 255 channels. In comparison tests with the other formats, Vorbis appear to perform well, particularly at medium quality bit rates. WMA. Windows Media Audio is a lossy audio format developed by Microsoft. WMA is also based on the MDCT and Huffman coding, and like AAC and Vorbis, it was explicitly designed to improve the deficiencies in MP3. WMA supports sample rates up to 48 khz and two channels. There is a more advanced version, WMA Professional, which supports sample rates up to 96 khz and 24 bit samples, but this has limited support in popular software and music players. There is also a lossless variant, WMA Lossless. At low bit rates, WMA generally appears to give better quality than MP3. At higher bit rates, the quality of WMA Pro seems to be comparable to that of AAC and Vorbis.

### Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6 Audio Retrieval 6 Audio Retrieval 6.1 Basics of

### 2.4 Audio Compression

2.4 Audio Compression 2.4.1 Pulse Code Modulation Audio signals are analog waves. The acoustic perception is determined by the frequency (pitch) and the amplitude (loudness). For storage, processing and

Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn t caught on yet Consumers don t seem to care about high fidelity

### EE482: Digital Signal Processing Applications

Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

### Chapter 14 MPEG Audio Compression

Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew c Prentice Hall 2003 14.1

### Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

### UNDERSTANDING MUSIC & VIDEO FORMATS

ComputerFixed.co.uk Page: 1 Email: info@computerfixed.co.uk UNDERSTANDING MUSIC & VIDEO FORMATS Are you confused with all the different music and video formats available? Do you know the difference between

### Audio Coding and MP3

Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1 Analogue audio frequencies: 20Hz - 20kHz mono: x(t)

### Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Introduction to Digital Audio Compression B. Cavagnolo and J. Bier Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, CA 94704 (510) 665-1600 info@bdti.com http://www.bdti.com INTRODUCTION

### Lecture 16 Perceptual Audio Coding

EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero

### CHAPTER 10: SOUND AND VIDEO EDITING

CHAPTER 10: SOUND AND VIDEO EDITING What should you know 1. Edit a sound clip to meet the requirements of its intended application and audience a. trim a sound clip to remove unwanted material b. join

### Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

### Digital Audio Basics

CSC 170 Introduction to Computers and Their Applications Lecture #2 Digital Audio Basics Digital Audio Basics Digital audio is music, speech, and other sounds represented in binary format for use in digital

### AAMS Auto Audio Mastering System V3 Manual

AAMS Auto Audio Mastering System V3 Manual As a musician or technician working on music sound material, you need the best sound possible when releasing material to the public. How do you know when audio

### Skill Area 214: Use a Multimedia Software. Software Application (SWA)

Skill Area 214: Use a Multimedia Application (SWA) Skill Area 214: Use a Multimedia 214.4 Produce Audio Files What is digital audio? Audio is another meaning for sound. Digital audio refers to a digital

### 15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION

15 Data Compression Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories:

### Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

Data Compression Media Signal Processing, Presentation 2 Presented By: Jahanzeb Farooq Michael Osadebey What is Data Compression? Definition -Reducing the amount of data required to represent a source

### COS 116 The Computational Universe Laboratory 4: Digital Sound and Music

COS 116 The Computational Universe Laboratory 4: Digital Sound and Music In this lab you will learn about digital representations of sound and music, especially focusing on the role played by frequency

### Wavelet filter bank based wide-band audio coder

Wavelet filter bank based wide-band audio coder J. Nováček Czech Technical University, Faculty of Electrical Engineering, Technicka 2, 16627 Prague, Czech Republic novacj1@fel.cvut.cz 3317 New system for

### What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Multimedia What is multimedia? Media types +Text + Graphics + Audio +Image +Video Interchange formats What is multimedia? Multimedia = many media User interaction = interactivity Script = time 1 2 Most

### Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Multimedia What is multimedia? Media types + Text +Graphics +Audio +Image +Video Interchange formats Petri Vuorimaa 1 What is multimedia? Multimedia = many media User interaction = interactivity Script

### MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

MPEG-1 Overview of MPEG-1 1 Standard Introduction to perceptual and entropy codings Contents History Psychoacoustics and perceptual coding Entropy coding MPEG-1 Layer I/II Layer III (MP3) Comparison and

### Mobile Peer-to-Peer Audio Streaming

Mobile Peer-to-Peer Audio Streaming Andreas Lüthi Bachelor Thesis Computer Science Department ETH Zürich 8092 Zürich, Switzerland Email: aluethi@student.ethz.ch Abstract A peer-to-peer network has several

### For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

For Mac and iphone James McCartney Core Audio Engineer Eric Allamanche Core Audio Engineer 2 3 James McCartney Core Audio Engineer 4 Topics About audio representation formats Converting audio Processing

### The Gullibility of Human Senses

The Gullibility of Human Senses Three simple tricks for producing LBSC 690: Week 9 Multimedia Jimmy Lin College of Information Studies University of Maryland Monday, April 2, 2007 Images Video Audio But

### IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

IMAGE COMPRESSION Image Compression Why? Reducing transportation times Reducing file size A two way event - compression and decompression 1 Compression categories Compression = Image coding Still-image

### A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach

www.ijcsi.org 402 A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach Gunjan Nehru 1, Puja Dhar 2 1 Department of Information Technology, IEC-Group of Institutions

### Parametric Coding of High-Quality Audio

Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits

### Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM Master s Thesis Computer Science and Engineering Program CHALMERS UNIVERSITY OF TECHNOLOGY Department of Computer Engineering

### MPEG-4 aacplus - Audio coding for today s digital media world

MPEG-4 aacplus - Audio coding for today s digital media world Whitepaper by: Gerald Moser, Coding Technologies November 2005-1 - 1. Introduction Delivering high quality digital broadcast content to consumers

### DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

DSP The Technology Presented to the IEEE Central Texas Consultants Network by Sergio Liberman Abstract The multimedia products that we enjoy today share a common technology backbone: Digital Signal Processing

### CS 335 Graphics and Multimedia. Image Compression

CS 335 Graphics and Multimedia Image Compression CCITT Image Storage and Compression Group 3: Huffman-type encoding for binary (bilevel) data: FAX Group 4: Entropy encoding without error checks of group

DAB Digital Audio Broadcasting DAB history DAB has been under development since 1981 at the Institut für Rundfunktechnik (IRT). In 1985 the first DAB demonstrations were held at the WARC-ORB in Geneva

### Overcompressing JPEG images with Evolution Algorithms

Author manuscript, published in "EvoIASP2007, Valencia : Spain (2007)" Overcompressing JPEG images with Evolution Algorithms Jacques Lévy Véhel 1, Franklin Mendivil 2 and Evelyne Lutton 1 1 Inria, Complex

### A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

International Journal of Engineering Research and General Science Volume 3, Issue 4, July-August, 15 ISSN 91-2730 A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

### Data Representation and Networking

Data Representation and Networking Instructor: Dmitri A. Gusev Spring 2007 CSC 120.02: Introduction to Computer Science Lecture 3, January 30, 2007 Data Representation Topics Covered in Lecture 2 (recap+)

### Compression; Error detection & correction

Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some

### Huffman Code Application. Lecture7: Huffman Code. A simple application of Huffman coding of image compression which would be :

Lecture7: Huffman Code Lossless Image Compression Huffman Code Application A simple application of Huffman coding of image compression which would be : Generation of a Huffman code for the set of values

### ECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University

ECE 499/599 Data Compression & Information Theory Thinh Nguyen Oregon State University Adminstrivia Office Hours TTh: 2-3 PM Kelley Engineering Center 3115 Class homepage http://www.eecs.orst.edu/~thinhq/teaching/ece499/spring06/spring06.html

DIGITAL IMAGE PROCESSING WRITTEN REPORT ADAPTIVE IMAGE COMPRESSION TECHNIQUES FOR WIRELESS MULTIMEDIA APPLICATIONS SUBMITTED BY: NAVEEN MATHEW FRANCIS #105249595 INTRODUCTION The advent of new technologies

### Image Compression for Mobile Devices using Prediction and Direct Coding Approach

Image Compression for Mobile Devices using Prediction and Direct Coding Approach Joshua Rajah Devadason M.E. scholar, CIT Coimbatore, India Mr. T. Ramraj Assistant Professor, CIT Coimbatore, India Abstract

### Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding

### Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Ralf Geiger 1, Gerald Schuller 1, Jürgen Herre 2, Ralph Sperschneider 2, Thomas Sporer 1 1 Fraunhofer IIS AEMT, Ilmenau, Germany 2 Fraunhofer

### Fourier transforms and convolution

Fourier transforms and convolution (without the agonizing pain) CS/CME/BioE/Biophys/BMI 279 Oct. 26, 2017 Ron Dror 1 Why do we care? Fourier transforms Outline Writing functions as sums of sinusoids The

### Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Dr. Jürgen Herre 11/07 Page 1 Jürgen Herre für (IIS) Erlangen, Germany Introduction: Sound Images? Humans

### Lossless Image Compression having Compression Ratio Higher than JPEG

Cloud Computing & Big Data 35 Lossless Image Compression having Compression Ratio Higher than JPEG Madan Singh madan.phdce@gmail.com, Vishal Chaudhary Computer Science and Engineering, Jaipur National

### oit Digital Audio Basics with Audacity UMass Office of Information Technologies Get Started with Digital Audio...

oit UMass Office of Information Technologies Digital Audio Basics with Audacity Get Started with Digital Audio... 2 The Audacity Interface... 3 Edit Your Audio... 4 Export Your Audio Project... 5 Record

### Introduction to Audacity

IMC Innovate Make Create http://library.albany.edu/imc/ 518 442-3607 Introduction to Audacity NOTE: This document illustrates Audacity 2.x on the Windows operating system. Audacity is a versatile program

### IMAGE PROCESSING USING DISCRETE WAVELET TRANSFORM

IMAGE PROCESSING USING DISCRETE WAVELET TRANSFORM Prabhjot kour Pursuing M.Tech in vlsi design from Audisankara College of Engineering ABSTRACT The quality and the size of image data is constantly increasing.

### Module 7 VIDEO CODING AND MOTION ESTIMATION

Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five

### CS1114 Section 8: The Fourier Transform March 13th, 2013

CS1114 Section 8: The Fourier Transform March 13th, 2013 http://xkcd.com/26 Today you will learn about an extremely useful tool in image processing called the Fourier transform, and along the way get more

### VISUAL QUICKSTART GUIDE QUICKTIME PRO 4. Judith Stern Robert Lettieri. Peachpit Press

VISUAL QUICKSTART GUIDE QUICKTIME PRO 4 Judith Stern Robert Lettieri Peachpit Press Visual QuickStart Guide QuickTime Pro 4 Judith Stern Robert Lettieri Peachpit Press 1249 Eighth Street Berkeley, CA 94710

### Multimedia Technology

Multimedia Application An (usually) interactive piece of software which communicates to the user using several media e.g Text, graphics (illustrations, photos), audio (music, sounds), animation and video.

### Teaching KS3 Computing. Session 5 Theory: Representing text and sound Practical: Building on programming skills

Teaching KS3 Computing Session 5 Theory: Representing text and sound Practical: Building on programming skills Today s session 5:00 6:00 Representing text and sound 6.00 7.00 While loops and consolidation

### Section 7.6 Graphs of the Sine and Cosine Functions

Section 7.6 Graphs of the Sine and Cosine Functions We are going to learn how to graph the sine and cosine functions on the xy-plane. Just like with any other function, it is easy to do by plotting points.

### Image Coding and Data Compression

Image Coding and Data Compression Biomedical Images are of high spatial resolution and fine gray-scale quantisiation Digital mammograms: 4,096x4,096 pixels with 12bit/pixel 32MB per image Volume data (CT

### Audio Processing on ARM Cortex -M4 for Automotive Applications

Audio Processing on ARM Cortex -M4 for Automotive Applications Introduction By Pradeep D, Ittiam Systems Pvt. Ltd. Automotive infotainment systems have become an integral part of the in-car experience.

### Audio and video compression

Audio and video compression 4.1 introduction Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and

### JPEG Compression Using MATLAB

JPEG Compression Using MATLAB Anurag, Sonia Rani M.Tech Student, HOD CSE CSE Department, ITS Bhiwani India ABSTRACT Creating, editing, and generating s in a very regular system today is a major priority.

### Inserting multimedia objects in Dreamweaver

Inserting multimedia objects in Dreamweaver To insert a multimedia object in a page, do one of the following: Place the insertion point in the Document window where you want to insert the object, then

### Lossy Coding 2 JPEG. Perceptual Image Coding. Discrete Cosine Transform JPEG. CS559 Lecture 9 JPEG, Raster Algorithms

CS559 Lecture 9 JPEG, Raster Algorithms These are course notes (not used as slides) Written by Mike Gleicher, Sept. 2005 With some slides adapted from the notes of Stephen Chenney Lossy Coding 2 Suppose

### Video Coding in H.26L

Royal Institute of Technology MASTER OF SCIENCE THESIS Video Coding in H.26L by Kristofer Dovstam April 2000 Work done at Ericsson Radio Systems AB, Kista, Sweden, Ericsson Research, Department of Audio

### A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad

### Audio Compression for Acoustic Sensing

Institut für Technische Informatik und Kommunikationsnetze Audio Compression for Acoustic Sensing Semester Thesis Martin Lendi lendim@student.ethz.ch Computer Engineering and Networks Laboratory Department

Chapter 1 Data Storage 2007 Pearson Addison-Wesley. All rights reserved Chapter 1: Data Storage 1.1 Bits and Their Storage 1.2 Main Memory 1.3 Mass Storage 1.4 Representing Information as Bit Patterns

### Anatomy of a Video Codec

Anatomy of a Video Codec The inner workings of Ogg Theora Dr. Timothy B. Terriberry Outline Introduction Video Structure Motion Compensation The DCT Transform Quantization and Coding The Loop Filter Conclusion

### Table of Content. Nero Express Manual

Table of Content 1. Start Successfully... 4 1.1. About Nero Express... 4 1.2. Versions of Nero Express... 4 1.3. Working with Nero Express... 6 1.4. Starting Nero Express... 6 2. User Interface... 7 2.1.

### freetunes Engelmann Media GmbH

freetunes 3.0 Contents 3 Table of Contents Part I Introduction 6 1 System... requirements 6 2 Installation... 7 3 Program... start 7 4 Demo... 7 5 Copyright... 7 Part II Converter 10 1 Source... 10 2

### Robert Matthew Buckley. Nova Southeastern University. Dr. Laszlo. MCIS625 On Line. Module 2 Graphics File Format Essay

1 Robert Matthew Buckley Nova Southeastern University Dr. Laszlo MCIS625 On Line Module 2 Graphics File Format Essay 2 JPEG COMPRESSION METHOD Joint Photographic Experts Group (JPEG) is the most commonly

### Novel Lossy Compression Algorithms with Stacked Autoencoders

Novel Lossy Compression Algorithms with Stacked Autoencoders Anand Atreya and Daniel O Shea {aatreya, djoshea}@stanford.edu 11 December 2009 1. Introduction 1.1. Lossy compression Lossy compression is

### CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM

74 CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM Many data embedding methods use procedures that in which the original image is distorted by quite a small

### Image Compression. CS 6640 School of Computing University of Utah

Image Compression CS 6640 School of Computing University of Utah Compression What Reduce the amount of information (bits) needed to represent image Why Transmission Storage Preprocessing Redundant & Irrelevant

### Data Storage. Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J.

Data Storage Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J. Glenn Brookshear Copyright 2012 Pearson Education, Inc. Data Storage Bits

### Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Patrick Brown EE382C Embedded Software Systems May 10, 2000 \$EVWUDFW MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio.

### MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding Heiko Purnhagen Laboratorium für Informationstechnologie University of Hannover, Germany Outline Introduction What is "Parametric Audio Coding"?

### GUIDELINES FOR THE CREATION OF DIGITAL COLLECTIONS

GUIDELINES FOR THE CREATION OF DIGITAL COLLECTIONS Digitization Best Practices for Audio This document sets forth guidelines for digitizing audio materials for CARLI Digital Collections. The issues described

### Introduction to Wavelets

Lab 11 Introduction to Wavelets Lab Objective: In the context of Fourier analysis, one seeks to represent a function as a sum of sinusoids. A drawback to this approach is that the Fourier transform only

### Wavelet Transform (WT) & JPEG-2000

Chapter 8 Wavelet Transform (WT) & JPEG-2000 8.1 A Review of WT 8.1.1 Wave vs. Wavelet [castleman] 1 0-1 -2-3 -4-5 -6-7 -8 0 100 200 300 400 500 600 Figure 8.1 Sinusoidal waves (top two) and wavelets (bottom

### Basic Compression Library

Basic Compression Library Manual API version 1.2 July 22, 2006 c 2003-2006 Marcus Geelnard Summary This document describes the algorithms used in the Basic Compression Library, and how to use the library

### ESKIAV3 (SQA Unit Code - F9AM 04) Audio and Video Software

Overview This is the ability to use a software application designed to record and edit audio and video sequences. ESKIAV3 (SQA Unit Code - F9AM 04) 1 Performance criteria You must be able to: You must

### Rich Recording Technology Technical overall description

Rich Recording Technology Technical overall description Ari Koski Nokia with Windows Phones Product Engineering/Technology Multimedia/Audio/Audio technology management 1 Nokia s Rich Recording technology

### Performance Analysis of Discrete Wavelet Transform based Audio Watermarking on Indian Classical Songs

Volume 73 No.6, July 2013 Performance Analysis of Discrete Wavelet Transform based Audio ing on Indian Classical Songs C. M. Juli Janardhanan Department of ECE Government Engineering College, Wayanad Mananthavady,

### CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION In chapter 4, SVD based watermarking schemes are proposed which met the requirement of imperceptibility, having high payload and

### REAL-TIME DIGITAL SIGNAL PROCESSING

REAL-TIME DIGITAL SIGNAL PROCESSING FUNDAMENTALS, IMPLEMENTATIONS AND APPLICATIONS Third Edition Sen M. Kuo Northern Illinois University, USA Bob H. Lee Ittiam Systems, Inc., USA Wenshun Tian Sonus Networks,

### CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover

38 CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING Digital image watermarking can be done in both spatial domain and transform domain. In spatial domain the watermark bits directly added to the pixels of the

### IMAGE COMPRESSION. Chapter - 5 : (Basic)

Chapter - 5 : IMAGE COMPRESSION (Basic) Q() Explain the different types of redundncies that exists in image.? (8M May6 Comp) [8M, MAY 7, ETRX] A common characteristic of most images is that the neighboring

### ESKIAV2 (SQA Unit Code - F9AL 04) Audio and Video Software

Overview This is the ability to use a software application designed to record and edit audio and video sequences. ESKIAV2 (SQA Unit Code - F9AL 04) 1 Performance criteria You must be able to: Use audio

### Appearance. License logo s on top. Front buttons all behind a flap

SR6003 Sales sheet SR6003 AV Receiver Overview The demand for the A/V Receiver is raised to another level: Quality requested for home entertainment has been evolved greatly by the Full HD Video contents

### Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Technical PapER Extended HE-AAC Bridging the gap between speech and audio coding One codec taking the place of two; one unified system bridging a troublesome gap. The fifth generation MPEG audio codec

### Tips for Mixing - From a Mastering Point of View

Mixing for Mastering Tips Plug-ins on the Master Output If you are using plug-ins on the master output during mixing, then export two versions: A) one version with plug-ins enabled on the master output,

### SA-10 SUPER AUDIO CD PLAYER. the new reference

SA-10 SUPER AUDIO CD PLAYER the new reference dedicated to detail For many years, the Marantz Reference Series of MA-9, SC-7 and SA-7 has set the standards for playback and amplification: not only does

### Marantz releases a new mid range Network Audio Player, the NA6005 features high resolution audio playback

Marantz releases a new mid range Network Audio Player, the NA6005 features high resolution audio playback Mahwah, NJ January 15 th, 2015 Marantz, a world leader in advanced audio technologies, today announced

### 2.2: Images and Graphics Digital image representation Image formats and color models JPEG, JPEG2000 Image synthesis and graphics systems

Chapter 2: Representation of Multimedia Data Audio Technology Images and Graphics Video Technology Chapter 3: Multimedia Systems Communication Aspects and Services Chapter 4: Multimedia Systems Storage

### Sometimes dreams come true

aria piccolo+ has the genes of aria, it s fanless, it has internal HDD/SSD storage and includes an audiophile grade DAC supporting PCM and DSD music both for stereo and multichannel. aria piccolo+ is compatible

### Recording oral histories

Florida International University FIU Digital Commons Works of the FIU Libraries FIU Libraries 3-2017 Recording oral histories Rebecca Bakker Florida International University Follow this and additional

### Digital-to- Analog Converter

Since 1984 2120 Digital-to- Analog Converter An introduction to the technology within the Boulder 2120 Digital-to-Analog Converter. Welcome We are living in a world of continual change. Consumer technology

### ENEE408G Multimedia Signal Processing Design Project on Digital Audio Processing

The Goals ENEE408G Multimedia Signal Processing Design Project on Digital Audio Processing 1. Learn the fundamentals of perceptual coding of audio and intellectual rights protection from multimedia. 2.

### INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC29/WG11 N15071 February 2015, Geneva,