The Effect of Bit-Errors on Compressed Speech, Music and Images

The University of Manchester
School of Computer Science

The Effect of Bit-Errors on Compressed Speech, Music and Images

Initial Project Background Report 2010
By Manjari Kuppayil Saji
Student Id:
MSc. Advanced Computer Science and IT Management

Abstract

The need to reduce transmission data rates and storage space for data has given rise to various compression techniques. These techniques exploit the limitations of human perception to discard parts of the data that are likely to be imperceptible to most humans. Multimedia data transmitted over wireless networks includes speech, music and image files. Data sent over wireless LAN networks is often subject to bit-errors due to interference, collisions and so on. While any data is prone to such errors, the effect of bit-errors on compressed data is likely to be more pronounced. This project aims to study how these errors affect the perceived quality of the data. Various studies have been performed to determine a good metric that can objectively derive the quality of speech, music and images based on human perception models. Such metrics are intended to approximate the corresponding subjective quality ratings, which are more expensive to obtain. The project studies the underlying theory behind the different objective metrics. Some of the available software implementations for calculating objective scores will be tested and appropriate ones will be used for the study. The project uses the MATLAB audio and image processing toolkits as the main software tools to simulate bit-errors and calculate quality scores.

14th May 2010, ManjariKuppayilSaji.pdf

TABLE OF CONTENTS

Abstract
1. INTRODUCTION
   1.1 Description of Project
   1.2 Main Objectives
   1.3 Scope of Project
   1.4 Applications
   1.5 Outline of Report
2. PROJECT BACKGROUND AND RELATED WORK
   2.1 Mobile Communication over WLAN
   2.2 Compression Techniques
   2.3 Compression standards in speech
   2.4 Compression standards in music
   2.5 Compression standards in images
   2.6 Perceptual Quality Measurement
      2.6.1 Speech Quality Measurement
      2.6.2 Music Quality Measurement
      2.6.3 Image Quality Measurement
3. RESEARCH METHODS
   Project Aims and Objectives
   Software Tools
   Project Development
   Implementation
   Deliverables
   Evaluation of Results
4. CONCLUSION
References
Appendix A: Project Plan

1. INTRODUCTION

1.1 Description of Project

Internet and mobile connectivity have grown tremendously in the last few decades, and bandwidth has become an important and costly resource. The aim of compression is to allow information to be represented with fewer bits. For example, if we use a sampling frequency of Fs (samples per second) and represent each sample with N bits, we would be required to transmit N x Fs bits per second. In a WLAN, the available frequency bandwidth is shared among many users. This means that if we can represent information with fewer bits (reduce N) and make a compromise on quality by taking fewer samples per second (reduce Fs), we can use the available bandwidth more economically and efficiently. The bandwidth could be shared by more people and information could also be sent faster. Using fewer bits to represent information is also an advantage when storing large amounts of data such as images or music files, which are generally many megabits in size.

Data packets are often subject to bit-errors. In WLAN networks, data packets are error prone as they are subject to interference in the medium, collisions, multipath propagation effects, scattering of waves and so on. Error detection and correction protocols, such as Forward Error Correction (FEC) techniques, try to correct such bit-errors. If correction by these methods is not possible, typically a re-transmission protocol for the damaged packet is followed. But in some real-time applications the delay incurred by waiting for a re-transmission is not acceptable, and the damaged packet itself is used, typically with some Packet Loss Concealment (PLC) technique. Advanced PLC techniques follow a model-based approach that interpolates the damaged packet based on speech models.
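The N x Fs data-rate arithmetic above can be illustrated with a short sketch (the example values correspond to G.711 narrowband speech, discussed later in this report):

```python
def bit_rate(fs_hz, bits_per_sample):
    """Raw data rate in bits per second: N x Fs."""
    return fs_hz * bits_per_sample

# G.711 narrowband speech: 8000 samples/s, 8 bits per sample.
print(bit_rate(8000, 8))    # 64000 bit/s, i.e. 64 kbit/s
# The same sampling rate with uncompressed 16-bit samples:
print(bit_rate(8000, 16))   # 128000 bit/s, i.e. 128 kbit/s
```

Halving N from 16 to 8 bits halves the required rate, which is exactly the trade-off between bit budget and quality described above.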
As the original data is compressed, each bit that is transmitted potentially carries much more information than a bit of uncompressed data. Some compression techniques use a form of prediction, which means that an error in one bit could have a cascading effect on the rest of the data, further reducing the quality of the information. Therefore, the effect of bit-errors on compressed data is likely to be more pronounced than on normal uncompressed data. This effect is what will be investigated in this project. If the overall effect of bit-errors results in an acceptable quality, it would perhaps lessen the need for PLC techniques.

1.2 Main Objectives

This project looks at the effect bit-errors have on compressed data. The main objective is to study and evaluate how bit-errors in compressed multimedia affect its perceived quality. To evaluate and compare perceived quality, there needs to be a method of quantifying it. In the case of speech, a metric known as the Mean Opinion Score (MOS) was developed [1] to subjectively measure the quality of narrowband speech. This standard score gave a numerical value to telephone speech quality on a rating of 1 (worst) to 5 (best), calculated by taking the average quality rating given by a number of human testers. As it is unrealistic to carry out subjective tests very often, a number of objective metrics are being studied to replace the need for human testers. They can be calculated using software implementations based on human perception models. The success of objective metrics lies in how well they match subjective scores. As subjective scores like MOS are not as well-defined for music and images, one objective of this project will be to define a MOS-like score for them based on the same principles as for speech.
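The MOS calculation just described is simply a mean over a panel of ratings; a minimal sketch (the rating values here are invented for illustration):

```python
def mean_opinion_score(ratings):
    """Average the 1 (worst) to 5 (best) ratings from a panel of listeners."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 MOS scale")
    return sum(ratings) / len(ratings)

# Five hypothetical listeners rating the same speech clip.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```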

Once the software implementations are evaluated and appropriate ones selected, the quality rating of uncompressed and compressed files to which bit-errors are introduced will be calculated. The calculation will be repeated for different bit-error probability rates. This will provide the data to compare and interpret the effects of bit-errors on the different data formats. A set of graphs plotting objective quality score vs. bit-error probability rate will also be produced. At the end of the project, an application will be produced with the following functionality:
- Ability to read the required speech, music and image files in both uncompressed and compressed formats.
- Functionality to convert the input files into bit format, allowing the introduction of different bit-error probability rates.
- For speech and music, functionality to play back the modified files and allow users to listen to the damaged file; similarly for images, functionality that allows users to see the modified images with bit-errors introduced.
- Computation of a quality rating for the modified files, implemented using automated software relevant to the different file formats.
- Production of objective quality score vs. bit-error probability rate results.

These results will then be studied and a conclusion will be drawn regarding the effect of bit-errors on compressed data. The results could help in selecting appropriate compression techniques and data rates depending on the application.

1.3 Scope of Project

This project focuses on three kinds of data: speech, music and images.
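The bit-error injection functionality described in the objectives above could be sketched as follows; this is a minimal illustration in which the error probability is applied independently to every bit of a file's bytes (function names are illustrative, not from the project's actual code):

```python
import random

def inject_bit_errors(data, p, seed=None):
    """Flip each bit of `data` independently with probability `p`."""
    rng = random.Random(seed)  # seeding makes an experiment repeatable
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < p:
                out[i] ^= 1 << bit  # flip this bit
    return bytes(out)

# A compressed file would be read as bytes and damaged at a chosen rate:
damaged = inject_bit_errors(b"example payload", p=0.01, seed=42)
```

Repeating this for a range of p values, then scoring each damaged file, yields the objective-quality vs. bit-error-rate curves the project sets out to plot.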
It concentrates on studying the effects of bit-errors on the following commonly used compression techniques:

Speech:
1) G.711 Pulse Code Modulation (PCM)
2) G.726 Adaptive Differential Pulse Code Modulation (ADPCM)
3) G.729 Conjugate Structure CELP (Code Excited Linear Prediction)

Music: MP3 (MPEG-1 Audio Layer 3)

Images: JPEG (Joint Photographic Experts Group)

As an extension, the effect of bit-errors on the quality of Moving Picture Experts Group (MPEG) compressed video files could also be studied. Short video clips are quite popular, and the study of bit-error effects could help improve the way they are compressed and transmitted. To simulate a WLAN environment, bit-errors will be introduced into files with compressed data. Perceived quality is a subjective measurement, and ideally calculating the quality would mean training a large group of human subjects to rate the different multimedia, then calculating a mean value over all the subjects. This would be time-consuming and inefficient. For this project, the quality of the various data will be measured using automated software. Bit-errors will

be introduced to compressed data and a rating will be given to the quality of the resulting data. By increasing the percentage of bit-errors, the effect can be clearly observed. The application program developed will not implement the compression algorithms themselves; rather, speech, music and image files in the formats listed above will be modified to introduce bit-errors.

1.4 Applications

Calculating objective quality scores for speech has obvious applications in mobile telephony: these scores have been used to assess network quality, and various experiments have been carried out to determine the objective quality scores of the different compression algorithms in use today. Image and music quality metrics are not yet used as extensively. The following are some applications where the results from this project may be used.

Objective Scores for Music Quality: One application where calculating quality could be useful is digital radio and television broadcasting. Broadcasting uses compressed signals and, since it is a real-time application, re-transmission requests are not feasible. By calculating objective quality scores of transmitted data, it would be possible to decide whether a packet can be used with an acceptable loss in quality. Such scores could even be used in feedback mechanisms that allow the bit rate to be adjusted depending on the perceived quality. A similar application is Internet radio. Another application is the creation of an internet choir [2]. In this real-time application the performers interact over the internet, so delays cannot be tolerated during a performance. Hence an objective quality score would provide the information needed to reduce the number of re-transmissions for damaged packets.

Objective Scores for Image Quality: Transmitting images digitally almost always requires compression, as a single image can take megabytes of data.
Sending images over wireless networks is becoming quite common, with images being sent between mobile phones using Bluetooth. An objective quality measure can be useful in allowing mobile application developers to decide transmission bit-rates. Another interesting application is in the field of teleradiology, where images such as CT scans are transmitted so that specialist doctors can analyze and diagnose a patient without having to be physically near the patient. In such cases, an objective quality score can help determine acceptable compression techniques and also help decide whether an image has to be re-transmitted. A related study is given in [3]. Beyond these, applications of objective quality measurement include video conferencing, which combines audio and moving images. As it is a real-time application, having a quality rating can be useful in improving the way it is implemented.

1.5 Outline of Report

The rest of this report is organized in the following manner. Chapter 2 introduces the main topics related to this project. First, the characteristics of mobile communication over a WLAN network are described. The main concerns are elaborated, along with a brief description of how wireless networks contrast with traditional wired networks. Next, it outlines

the compression algorithms used in the formats that are going to be used in this project. It explains how these algorithms exploit the human perception of sight and sound to improve their efficiency. Finally, it looks at some of the objective quality indices that are used for speech, music and images. Chapter 3 outlines how this project will be conducted. It describes the aims and objectives of the project, then gives a description of the software tools that will be used in completing it. Finally, the implementation details are given along with a list of deliverables and how the results will be evaluated. The project has been broken down into tasks; a timeline has been given for completing them and a Gantt chart is provided in Appendix A. The final chapter gives a conclusion to the report.

2. PROJECT BACKGROUND AND RELATED WORK

This section gives background information on the current literature about the main topics related to this project. It also outlines some of the theory behind the compression algorithms that will be used in the project.

2.1 Mobile Communication over WLAN

The radio spectrum covers frequencies below 300 GHz. The use of this spectrum is regulated, and various applications use different frequency bands. Some frequency ranges are reserved for military communication, while others are used for TV signals, RFID etc. IEEE 802.11b, the standard for wireless LAN communications, specifies 2.4 GHz as the frequency band for WLAN use. This band is split into 14 channels, with adjacent channels overlapping. When data is sent over traditional wired networks, there is hardly any interference, as the path (wires) through which the data packets are sent is well defined. It is also possible to predict degradation of signals, as the properties of the medium are known. In wireless networks, on the other hand, interference can occur due to a variety of factors outlined in [4]. As radio signals travel through the atmosphere, they interact with physical structures like buildings and trees. This causes the signal to become weaker; even particles of dust in the atmosphere can affect signals. This effect is called path loss. Another phenomenon that affects wireless networks is multipath propagation of signals. Signals can be reflected off physical structures, so different parts of the signal reach the receiver through different paths with different delays. This can lead to transmission errors as the data received through the different paths interferes; this effect is known as Inter Symbol Interference (ISI). Additionally, in mobile communications the sender or receiver or both could be moving, introducing further complexities such as constantly changing multipath propagation effects.
This increases the probability of errors being introduced in the signal. Unlike wired communications, the medium (air) in wireless communications is shared among all users, which means that different signals can interfere with one another. Thus for WLAN communications the limited bandwidth has to be shared among all users. Commonly used techniques for sharing include multiplexing over the time, frequency or code domains. IEEE 802.11b specifies the Direct Sequence Spread Spectrum (DSSS) technique for transmission. From this discussion it is clear that bandwidth is a costly resource. The main advantage of a WLAN is that it enables people to be mobile. Most applications these days require transmission of large amounts of data, which in turn increases usage of the available frequency bandwidth. To make transmissions more efficient, we need to reduce the amount of data to be transmitted, and this is why compression techniques are essential. Effective compression techniques help to represent the same information using fewer bits. This has immediate advantages: it reduces the amount of data to be transferred over the WLAN and also takes up less storage space when the data has to be saved on disk. Compression techniques for different kinds of data are discussed in the following sections.

2.2 Compression Techniques

There are broadly two classes of compression techniques: lossless and lossy compression. In lossless compression, as the name suggests, the compressed data at the receiver's end can be uncompressed to re-create an exact representation of the original data. One property that lossless compression algorithms exploit is the inherent redundancy of data. For example, in text, if a single character repeats 100 times, it is more efficient to transmit that character and the number of times it repeats (100), rather than sending the character 100 times. Another example of exploiting redundancy uses properties of the English language: the letter q is almost always followed by a u, so it is rarely necessary to send the character after q. Using a good encoding scheme to represent the data in binary format also helps in reducing the bandwidth usage. Lossless compression is generally used with text and data files. It also has applications in reducing the space required for storing information. Some common encoding algorithms used for this purpose are Huffman coding and the Lempel-Ziv-Welch (LZW) algorithm. In lossy compression, on the other hand, the decompressed data at the receiver's end will not exactly match the original signal. The advantage is that far fewer bits are required to represent the information. Lossy compression is generally used for multimedia applications including speech, music and images. These techniques exploit the fact that a small drop in the quality of multimedia data will not be perceptible to humans, or can be tolerated depending on the application. For example, the human ear can hear sounds only in a particular range of frequencies and the human eye can see only certain wavelengths. Lossy compression algorithms include Linear Predictive Coding (LPC) and A-law for speech, MP3 for music and JPEG for images.
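The repeated-character example above is the idea behind run-length encoding, one of the simplest lossless techniques; a minimal sketch:

```python
def rle_encode(text):
    """Run-length encode: 'aaab' -> [('a', 3), ('b', 1)]."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs):
    return "".join(ch * n for ch, n in runs)

# A character repeated 100 times compresses to a single (char, count) pair.
runs = rle_encode("a" * 100 + "b")
print(runs)                                   # [('a', 100), ('b', 1)]
assert rle_decode(runs) == "a" * 100 + "b"    # lossless: exact round trip
```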
The next section takes a look at some of the lossy compression algorithms in detail.

2.3 Compression standards in speech

G.711 Pulse Code Modulation (PCM)

PCM is a waveform speech compression technique. The analog speech signal is first digitized by taking samples of the signal at intervals, and the values at those intervals are then represented in binary format. In G.711, speech signals are sampled at 8000 samples/sec and the values are encoded using 8 bits. G.711 uses companding, or non-uniform quantization, to achieve good quality speech using just 8 bits. It ensures that the signal to quantization noise ratio (SQNR) remains acceptable for speech with amplitudes ranging from soft to loud. This results in a 64 kbit/s bit stream output. The North American G.711 standard is called the µ-law; other countries use a slightly different version known as the A-law. PCM first converts speech signals to digital signals, i.e. the signals are represented as 0s and 1s. To do this, the amplitude of the signal is noted a fixed number of times a second; this is known as the sampling frequency. Each amplitude value is then represented using a fixed number of bits. This step is called quantization. Depending on the number of bits used to represent the sample, the signal range is divided into a fixed number of quantization levels. Quantization introduces noise into the original signal because it involves an approximation or rounding error when amplitude values are mapped to these quantization levels. When the quantization levels are evenly spaced, this is known as uniform quantization. The noise introduced can be measured by a metric known as the Signal to Quantization Noise Ratio (SQNR). If this ratio is sufficiently high, the noise introduced will be imperceptible to the human ear.
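The uniform quantization and SQNR ideas above can be sketched as follows. This is a pure-Python illustration using a synthetic sine signal, not the G.711 companding law itself; the bit widths and signal are chosen only to show how SQNR falls as bits are removed:

```python
import math

def quantize_uniform(x, bits):
    """Uniformly quantize x in [-1, 1] to 2**bits evenly spaced levels."""
    step = 2.0 / (2 ** bits)
    q = round(x / step) * step          # round to the nearest level
    return max(-1.0, min(1.0 - step, q))  # clamp to the representable range

def sqnr_db(signal, quantized):
    """Signal-to-quantization-noise ratio in decibels."""
    p_sig = sum(s * s for s in signal)
    p_noise = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return 10 * math.log10(p_sig / p_noise)

sine = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
q8 = [quantize_uniform(s, 8) for s in sine]
q4 = [quantize_uniform(s, 4) for s in sine]
print(sqnr_db(sine, q8))  # roughly 6 dB per bit, so on the order of 50 dB
print(sqnr_db(sine, q4))  # on the order of 25 dB: clearly noisier
```

The roughly 6 dB-per-bit behaviour is why dropping from 16 to 8 bits without companding would audibly degrade quiet speech, motivating the non-uniform quantization described next.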

When uniform quantization is used, speech signals with small amplitudes that lie between the quantization levels, and speech signals with very large amplitudes that lie beyond the quantization range, will be represented with a large margin of error. This would reduce the SQNR to an unacceptable value. To overcome this, non-uniform quantization is used, where the quantization levels lie closer together for small amplitudes and further apart for larger amplitudes. Figure 1 below illustrates non-uniform quantization (companding). In this figure, the samples are each represented with 8 bits.

Figure 1: Converting uniform quantization to non-uniform quantization (companding) [5]

The effect of non-uniform quantization with 8 bits (the PCM sample size) can be achieved by compressing the original signal and then quantizing it uniformly using 12 bits. Most computers represent integers using 16 bits, hence in PCM speech samples are represented by 16-bit values. To reduce the data bit rate needed to represent the speech samples, the G.711 A-law compression algorithm converts each 16-bit value into an 8-bit value in a floating-point-like format. At the receiver's end, the 8-bit value is mapped back to the corresponding 16-bit value and the original signal is obtained. Thus, at an 8 kHz sampling frequency, G.711 reduces the required data transfer rate from 128 kbit/s to 64 kbit/s, achieving 50% compression.

G.726 Adaptive Differential Pulse Code Modulation (ADPCM)

In differential coding, rather than encoding the amplitude values as in PCM, the difference between two adjacent samples is encoded. This allows representation using fewer bits. ADPCM uses a prediction algorithm to predict the value of the next sample based on previous samples. The difference between the predicted sample and the actual sample is transmitted. The receiver uses the same prediction algorithm and can thus reconstruct the sample values.
The term adaptive refers to the fact that bits needed to encode the quantization levels are adjusted dynamically

according to the speech signal. The G.726 standard reduces the bit stream output to 16, 24, 32 or 40 kbit/s, using 2, 3, 4 or 5 bits respectively to encode the sample differences. An important consideration in differential coding is that if there is an error in the prediction, all subsequent samples will also be in error. To make sure the error does not persist, the receiver's end is modified to decay the effect of an error. Sub-band ADPCM is a technique by which speech is split into bands, and more bits are used to represent the frequencies which are more important for comprehending speech, with fewer bits for the bands that convey, for example, emotion.

G.729 CS-CELP (Conjugate Structure-Code Excited Linear Prediction)

This is a source speech compression technique, as opposed to a waveform codec. It is based on the source-filter model of speech production, a model built on the basis of how human sound is actually produced with the help of the vocal cords, the vocal tract and other speech organs. Speech formation is a complex process; the source-filter model helps to synthesize speech using mathematical equations. To achieve compression, CELP uses two codebooks, one basic and one adaptive, to represent characteristic residue signals.

Source-Filter Model [6]: Human speech is created by mainly two kinds of source signals, voiced and unvoiced sounds. Voiced sounds involve vibration of the vocal cords, while unvoiced sounds do not. In the English language, all vowels are voiced sounds. The frequency at which the vocal cords vibrate determines the perceived pitch of the sound and is known as the fundamental frequency of voiced sounds. Constantly changing fundamental frequencies are what constitute emotion in human speech.

Figure 2: Source-Filter Model [7]

Once the source is produced, the air has to pass through the vocal tract. The human vocal tract acts as a filter, suppressing some frequencies and amplifying others (resonance).
The position and alignment of

the various speech organs is what determines how the filter performs. The frequencies which are amplified by the vocal tract are called formants, and at these frequencies the amplitude of the signal is maximum. Characteristic frequencies called anti-formants are also present in nasal sounds and are necessary to accurately model speech. Voiced sounds are represented using their fundamental frequency, and unvoiced sounds can be represented by white noise. The vocal tract filter is characterized by formant and anti-formant frequencies. The effect of the lips on human speech is called lip radiation and can be modeled using a high-pass filter. A pictorial representation of the source-filter model is shown in Figure 2. Once these characteristics are determined, speech can be synthesized by passing the source signals through a filter that produces the formant frequencies.

Linear Predictive Coding [8][9]: The formants can be modeled using a set of equations which predict the next output as a function of the previous outputs. Linear prediction computes coefficients for a time-varying filter that represents the formants. To transmit speech, the sender first computes the linear prediction coefficients that model the formants, and inverse-filters the original speech with this filter to obtain the source signal. At the receiver's end, the formant coefficients are used to create a filter through which the source is passed, resulting in synthesized human speech. In practice, the residue left behind after the formants are calculated also contains useful information. Since it is not practical to send the full residue information, a codebook of typical residue values is stored. The sender checks which codebook entry most closely matches the residue to be sent (least error) and sends this code to the receiver. The receiver uses the code to retrieve the residue, and the synthesized speech is of better quality. To improve the efficiency of the codebook, G.729 uses two codebooks.
A basic codebook contains residue samples for one pitch. A second, adaptive codebook is filled during the transmission and provides pitch information by mapping time delays to changes in pitch. The G.729 standard achieves a bit stream output rate of 8 kbit/s.

2.4 Compression standards in music

MP3 (MPEG-1 Audio Layer 3)

When music is digitized and stored on a CD, it is sampled 44,100 times a second using 16 bits to represent each sample. This works out to roughly 1.4 Mbit/s, a data rate that is highly impractical for transferring music over a WLAN. The MP3 algorithm is based on the perceptual model of human hearing: by modeling the characteristics of the human ear and removing parts of the music signal that will not be heard, the bits needed to represent the music can be reduced. The human ear is most sensitive to mid-range frequencies between 2 kHz and 4 kHz. At lower and very high frequencies, a tone has to be at a higher loudness level (decibels) to be perceived by humans. MP3 uses two phenomena called frequency masking (or simultaneous masking) and temporal masking to achieve compression [10]. In frequency masking, a loud tone masks softer ones at nearby frequencies. Equal loudness contour graphs, shown in Figure 3 [11], describe the threshold of hearing of the human ear at different frequencies. As an example, the graph indicates that a 20-decibel tone at 1000 Hz is perceived as being as loud as a 50-decibel tone at 100 Hz. This is taken into account during simultaneous masking. The dotted line in the graph marked as the threshold of audibility is the minimum decibel level needed by signals at different

frequencies to be perceived by humans. A phon is the unit of measurement for perceived loudness level; a tone of n phons is defined to be as loud as an n dB SPL tone at 1 kHz. This unit measures the loudness of a tone by taking into consideration the effect of frequency on perceived loudness. [14]

Figure 3: Equal Loudness Contour Graphs

When loud tones are present, this threshold graph changes and more sounds will be masked at adjacent frequencies. This is shown in Figure 4.

Figure 4: Effect of simultaneous masking on the audibility threshold

The human auditory range is divided into about 24 critical bands. Frequencies within the same band are harder for the ear to distinguish. Thus, the effects of simultaneous masking can be modeled better within

these critical bands. A music signal is therefore first divided into 32 sub-bands using filters. Each band is processed separately, with mid-range frequency bands being allocated more bits. Temporal masking exploits the fact that quieter tones present a short time before or after a loud tone are inaudible. The time gap over which this effect is present is a function of the amplitude difference between the tones. MP3 also uses encoding techniques like Huffman coding, which represents more frequent values with fewer bits, further improving bit rates. Using MP3 compression, data transfer rates can be reduced to between 96 and 320 kbit/s. Depending on the application, a data rate that provides sufficient quality can be chosen.

2.5 Compression standards in images

JPEG (Joint Photographic Experts Group)

Digital images are represented by pixels, which are numerical representations of colour and brightness information. Colour images are typically represented as 3-D arrays with 8-bit values for each of the Red, Green and Blue components. The following are the broad steps followed [15]. In the first step of JPEG compression, the pixels in an image are arranged into 8x8 blocks. JPEG uses the human eye's perception of colour and brightness to reduce the size of images. The human eye is more sensitive to brightness than to colour, so the colour information in an image can be coded at lower accuracy. Hence RGB images are first converted into a YUV format, a representation in terms of luminance (brightness) and chrominance (colour). Chrominance values of blocks of pixels are averaged, as it is not required to depict them with complete accuracy. In the next step, a discrete cosine transform (DCT) is performed on each block, representing it as a set of 64 DCT coefficients. The first coefficient (the DC coefficient) represents the average value of the block; the remaining 63 are the AC coefficients.
In the quantization step, two different tables of constants are used for the luminance and chrominance components. The DCT coefficients are divided by these constants and then rounded to the nearest integer. In this step, information is lost. Smaller DCT coefficients which are not very important to human perception are almost eliminated by giving them a higher quantization factor. These quantization tables are also sent along with the compressed image in the JPEG header so that the receiver can decode the compressed image. Huffman coding is used in JPEG to represent the quantized values. Figure 5 depicts a block diagram of how the JPEG algorithm works.
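The divide-and-round quantization step described above can be sketched as follows. The coefficient values and the table row here are invented for illustration; a real JPEG encoder takes its tables from the standard or derives them from a quality setting:

```python
def quantize_block(dct_coeffs, q_table):
    """Divide each DCT coefficient by its table entry and round: the lossy step."""
    return [round(c / q) for c, q in zip(dct_coeffs, q_table)]

def dequantize_block(quantized, q_table):
    """Receiver side: multiply back; the rounding error cannot be recovered."""
    return [v * q for v, q in zip(quantized, q_table)]

# A made-up row of 8 DCT coefficients and 8 quantization constants.
coeffs = [235, -21, -10, -7, 3, 1, 0, -1]
q_row  = [16, 11, 10, 16, 24, 40, 51, 61]
quant  = quantize_block(coeffs, q_row)
print(quant)                           # small high-frequency values collapse to 0
print(dequantize_block(quant, q_row))  # only an approximation of the original row
```

Note how the larger constants at the high-frequency end wipe out the small coefficients entirely, which is exactly how perceptually unimportant detail is discarded.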

Figure 5: Steps in the JPEG Compression Algorithm [16]

2.6 Perceptual Quality Measurement

The Mean Opinion Score (MOS) is a scale used to subjectively measure the quality of narrowband speech. The MOS scale ranges from 1 (worst) to 5 (best). This scale is generally used to measure the quality of compressed data or data sent over a communication channel. Interpretation of quality differs from individual to individual; therefore an MOS score is determined by averaging the scores given by a large group of test participants. Subjective scores for images and video are not well-defined; they could be calculated in the same way as an MOS score by taking the average rating of a group of individuals. Collecting subjective scores is a time-consuming and expensive process, and it is not feasible to perform subjective quality tests at short notice. Automating these scores, to get objective quality ratings which approximate human judgment, is a much more effective strategy. The algorithms designed to calculate these scores are typically full-reference algorithms, which take both the original and the processed/transmitted signal as inputs. For example, as a very simple measure, the signal to noise ratio could be calculated from the inputs and mapped to a rating level. But this would not give a very accurate indication of quality from a human perspective; hence the algorithms have to be based on human perception models. The following are the standards laid down by the ITU Telecommunication Standardization Sector (ITU-T) and Radiocommunication Sector (ITU-R) for objective quality measurement:

Speech: ITU-T Recommendation P.862 (02/01), Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs.

Music (Audio): ITU-R Recommendation BS.1387, Method for objective measurements of perceived audio quality.
Images: No such standard exists for still images, though there are ITU recommendations for video and large-screen digital imagery.

Video: ITU-T Recommendation J.247 (08/08) - Objective perceptual multimedia video quality measurement in the presence of a full reference. It specifies a score called Perceptual Evaluation of Video Quality (PEVQ).
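The naive signal-to-noise measure mentioned above can be sketched in a few lines of Python (purely for illustration of why it is too simple; the project uses perception-based metrics instead):

```python
import math

def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB between a reference signal and a
    degraded copy, treating their sample-wise difference as noise."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    if noise_power == 0:
        return float("inf")   # identical signals: no noise at all
    return 10.0 * math.log10(signal_power / noise_power)

# Illustration: a small additive error on a toy signal.
ref = [1.0, -1.0, 1.0, -1.0]
deg = [1.1, -0.9, 1.0, -1.0]
print(round(snr_db(ref, deg), 1))   # -> 23.0
```

The weakness is visible even here: SNR weights every sample error equally, whereas human listeners and viewers are far more sensitive to some distortions than to others.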

2.6.1 Speech Quality Measurement

The first standard for telephone-band quality measurement was ITU-T Recommendation P.861, called the Perceptual Speech Quality Measure (PSQM), introduced in 1996. This algorithm had some drawbacks; for example, it could not assess the quality of signals that were not time-aligned. In modern situations where time delay is quite prevalent, such as VoIP, the algorithm was inadequate, and a new recommendation had to be established. Research in the field led to the development of the new recommendation, PESQ, in February 2001 [17].

PESQ - Perceptual Evaluation of Speech Quality

As shown in Figure 6, the PESQ algorithm uses a perceptual model to derive internal representations of the input signals after time alignment. The difference between these representations is used to predict quality using a cognitive model.

Figure 6: The basic structure of the PESQ Algorithm [17]

The PESQ algorithm has been used to assess the quality of the waveform codecs (G.711, G.726) as well as the G.729 CELP codec at data rates above 4 kbit/s. PESQ is recommended, among other uses, for testing in the presence of environmental noise, which is the key focus of this project. For an accurate objective score, the original signal must be uncompressed and free of noise; the modified signal will be the compressed signal with errors introduced.

2.6.2 Music Quality Measurement

PEAQ - Perceptual Evaluation of Audio Quality

PEAQ is the algorithm specified in ITU-R BS.1387 [18]. As with PESQ, it uses human auditory models to determine the quality of music. The algorithm takes the reference signal and the signal to be tested as inputs; both must be sampled at 48 kHz and must be time-aligned. PEAQ analyses the audio samples frame by frame: for each of the two input signals, a set of ear-model output values is calculated for every frame.
These values are obtained by passing each sample through an FFT-based ear model (as defined for the Basic version of ITU-R BS.1387; the Advanced version uses both an FFT-based and a filter-bank-based ear model). The ear-model values are based on loudness, modulation, masking and adaptation [19] effects as interpreted by the human brain. The differences between the values for corresponding frames of the test and reference signals are then determined and mapped to a set of Model Output Variables (MOVs) [20], which capture the audible differences between the reference and test signals that are likely to be perceived by the human ear. The Basic version of PEAQ defines 11 MOVs and the Advanced version defines 5. The MOVs are mapped to a single value called the Objective Difference Grade (ODG) using a cognitive model; for PEAQ this cognitive model is an artificial neural network trained on data collected from subjective listening tests. Figure 7 shows the steps involved in PEAQ.

Figure 7: Block Diagram of the PEAQ algorithm (Basic Version)

2.6.3 Image Quality Measurement

Earlier, image quality was measured using error-sensitivity-based approaches. In these approaches the images are split into different channels using methods such as the discrete cosine transform (DCT), and the corresponding channels of the test and reference images are compared using a Contrast Sensitivity Function (CSF), with each channel given a different weighting factor based on visual-sensitivity models. The weaknesses of this approach are outlined in [21]. The main weakness pointed out is that all kinds of distortion are treated alike, even though some errors do not affect the perceived quality as much as others. The error-sensitivity approaches also only work well for simple patterns; in more complex images, the interaction between channels needs to be taken into consideration as well. Finally, the correlation between error sensitivity and perceived quality is not very strong. Based on these observations, a new approach to image quality measurement is outlined in [22] and [23]. Both of the resulting indices, the Universal Quality Index (UQI) [22] and the Structural Similarity (SSIM) index [23], are based on the theory that image quality can be measured as a comparison of structural similarity between the test and reference images; SSIM is a generalized version of UQI.

SSIM - Structural Similarity Index

In this approach, the luminance, contrast and structure of the images are compared separately. Luminance effects are removed from the images before their structures are compared, since luminance does not affect structural differences. The SSIM index is calculated over the image in small 8x8-pixel windows, and the overall image quality is then calculated as the mean value over the entire image. Mathematically, luminance is measured as the mean value of the image samples and contrast as the standard deviation of the samples. The SSIM algorithm has been implemented in MATLAB, and this implementation will be used for the project. Figure 8 shows a block diagram of how the SSIM index is derived.

Figure 8: Block Diagram of the SSIM algorithm [23]

The resulting SSIM index lies in the range -1 to 1. A comparison of the UQI and SSIM indices with other measures such as the Mean Square Error (MSE) and a multidimensional quality measure using SVD (Singular Value Decomposition) is given in [24] and [25]. A related study [26] uses a metric called the Image Quality Scale; the material available on this measure is limited, however, and a suitable software implementation was not available for further study.
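The per-window computation can be illustrated with a short Python sketch (the project itself uses the MATLAB ssim.m implementation; C1 and C2 below are the stabilizing constants suggested in [23] for 8-bit images, and the single-window formula is the standard SSIM expression combining luminance, contrast and structure):

```python
from statistics import mean

C1 = (0.01 * 255) ** 2   # stabilizing constants for 8-bit images
C2 = (0.03 * 255) ** 2   # (dynamic range L = 255), as in Wang et al. [23]

def ssim_window(x, y):
    """SSIM for one window (e.g. the 64 pixels of a flattened 8x8 block):
    luminance is compared through the means, contrast through the
    variances, and structure through the covariance."""
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean((v - mu_x) ** 2 for v in x)
    var_y = mean((v - mu_y) ** 2 for v in y)
    cov = mean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

window = list(range(64))              # a toy 8x8 window, flattened
print(ssim_window(window, window))    # identical windows -> 1.0
```

The overall image score is then the mean of this value over all windows; distorting a window (even by a uniform brightness shift) pulls its score below 1.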

3. RESEARCH METHODS

This chapter lays out the methodology that will be followed to complete the project. It describes the software tools that will be used and how the project will be developed. The project has been separated into tasks, and a project plan and Gantt chart have been created; these are given in Appendix A. The project involves introducing bit-errors into uncompressed and compressed speech, music and image files. The effect of the bit-errors will be presented as a series of graphs of bit-error probability against perceived quality in terms of a mean-opinion-style score.

3.1 Project Aims and Objectives

1) Study the effect of bit-errors on uncompressed and compressed speech, music and image files.
2) Study the effect of an increasing percentage of bit-errors on uncompressed and compressed narrowband speech (G.711, G.726 and G.729 codecs).
3) Study the effect of an increasing percentage of bit-errors on uncompressed and compressed music files (e.g. MP3).
4) Study the effect of an increasing percentage of bit-errors on uncompressed and compressed images (e.g. JPEG).
5) Extend this study to include video files (e.g. MPEG).

3.2 Software Tools

MATLAB is the software environment that will be used to carry out the study. It is a technical tool that allows programming in a language similar to C, and it contains toolkits that allow both images and audio to be processed easily. Introducing errors into the compressed files and plotting graphs of the results will be done in MATLAB. In addition, MATLAB implementations of objective quality assessment software will be used. These objective quality scores are based on the ITU standards mentioned in Chapter 2. The following implementations will be used:

1) Music: a MATLAB implementation of PEAQ, PQevalAudio.m; a related report is available in [27].
2) Images: a MATLAB implementation of SSIM, ssim.m [31].

To compress the speech and music files, the audio editing tool Audacity will be used.
The Audacity beta (Unicode) version was downloaded for this purpose [32]; it uses the Lame_v encoder. The uncompressed files will be imported and then exported into the various compressed formats as required. To compress image files to the JPEG format, Microsoft Paint will be used.

3.3 Project Development

3.3.1 Implementation

The implementation will follow the structure listed below for compressed files:

Step 1: Compressing the original files
Step 2: Reading the compressed files into MATLAB
Step 3: Introducing bit-errors into the files
Step 4: Calculating the objective quality score
Step 5: Producing bit-error probability vs. objective score graphs

Step 6: Varying the bit-error rate and repeating (the feedback loop in Figure 9)

Figure 9: Block Diagram Showing the Implementation Steps in the Project

The project will be divided into three main parts: speech analysis, music analysis and image analysis.

Speech Analysis

A sample WAV speech file will be taken as the original uncompressed speech. This file will then be compressed into the required formats (G.711, G.726 and G.729) using the audio editing tool Audacity. The compressed sample file will be read as a bit array. Errors are introduced using an XOR computation: a bit that is XORed with 1 is inverted, so an error bit array is created containing 1s at random positions, their number depending on the required bit-error probability. XORing the two arrays gives the modified file. To listen to the modified file, the individual bits will be converted back into integer samples and played using the sound function in MATLAB. Both the original WAV file and the modified file are then passed to the PESQ software to calculate the speech quality. This value will be calculated for the different bit-error probabilities, from which the graph can be plotted.

Music Analysis

An objective quality score for determining perceptual music quality will first be chosen by experimenting with the different metrics outlined in the literature; the comparison of the metrics will be based on the tests described in the section on Evaluation of Results (3.3.3). To evaluate the effect of bit-errors on compressed music, uncompressed music samples will be obtained by ripping a file from a CD in WAV format. The music editing tool Audacity will be used to compress this file into MP3 format. Errors will be introduced in the same way as for the speech files. To calculate an objective quality score, the original and modified files will be passed to code that calculates the chosen metric.
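The XOR-based error injection described above can be sketched in Python (the project performs this in MATLAB; the function name and byte-level handling here are illustrative):

```python
import random

def inject_bit_errors(data, p, seed=0):
    """Flip each bit of `data` independently with probability p by XORing
    it with a randomly generated error mask; XOR with 1 inverts a bit,
    XOR with 0 leaves it unchanged."""
    rng = random.Random(seed)          # seeded for a reproducible experiment
    corrupted = bytearray(data)
    for i in range(len(corrupted)):
        mask = 0
        for bit in range(8):
            if rng.random() < p:       # this bit position joins the error mask
                mask |= 1 << bit
        corrupted[i] ^= mask
    return bytes(corrupted)

clean = bytes(1000)                        # 8000 zero bits
noisy = inject_bit_errors(clean, p=0.01)
flipped = sum(bin(b).count("1") for b in noisy)
print(flipped, "of 8000 bits flipped")     # close to the expected 80 at p = 0.01
```

Sweeping `p` and scoring each corrupted file against the clean original yields exactly the bit-error-probability vs. quality curves the project plans to plot.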

Image Analysis

As with music, an objective quality score for images will be determined first. To evaluate the effect of bit-errors, a digital image in GIF format will be used as the uncompressed version and compressed into JPEG using the Microsoft Paint software. The resulting JPEG file will be read using the imread function in MATLAB; the integer values will then be converted into bits and stored in a bit array. Errors are introduced in the same way as for the speech files. To display the modified file, the bit values will be converted back into integer values and displayed using the imview function in MATLAB. To calculate the objective quality of an image, it will first be converted into grayscale, for which MATLAB provides functions. The original and modified images will then be passed to the SSIM code, which provides a structural similarity rating.

3.3.2 Deliverables

- Establishing a MOS-like subjective score for music and images.
- Determining an appropriate objective quality measure for music and images by testing the different measures outlined in the literature.
- Calculating an objective quality score for uncompressed and compressed data using appropriate software.
- An interpretation of the effect of bit-errors on uncompressed and compressed data using the calculated objective scores.
- A comparison of the effect of bit-errors on the various compression algorithms studied.

3.3.3 Evaluation of Results

I. Objective Quality Measures

In the case of speech files, the objective opinion score (PESQ) is a standard implementation for perceptual speech quality calculation [28]. This score has been tested and shown to closely match Mean Opinion Scores (MOS) for telephone-quality speech. In the case of music and image files, the different objective measures specified in the literature need to be evaluated. The only way to test the accuracy of the objective quality ratings obtained is to compare these scores with results from a subjective analysis.
As music and image data do not have a well-defined MOS-like subjective score of the kind used for speech, a subjective test will be carried out to define one. ITU BS [29] provides standards for conducting listening tests, and these guidelines will be followed when collecting the subjective scores for music files. A similar guideline for assessing the quality of television images is outlined in [30] and can be used for the image tests. As part of the test, a group of individuals will be asked to rate an image or music file on a scale of 1 to 5 based on the noticeable impairment of the file:

5 - Imperceptible
4 - Perceptible but not annoying
3 - Slightly annoying
2 - Annoying
1 - Highly annoying

The test will be carried out on different samples of uncompressed files, and on files compressed in the different formats.

In the literature, some objective quality scores do not lie in the range 1 to 5; in such cases the subjective score values obtained from the tests will be converted to the range of the objective metric. The objective quality scores can then be tested for accuracy by comparing them with the subjective scores obtained. Once a suitable objective quality score is determined, it can be used to calculate the objective scores of files with bit-errors introduced into them.

II. Interpretation of the effect of bit-errors on compressed data

Once the objective scores are obtained for files with bit-errors introduced, conclusions can be drawn regarding:

- The robustness of each compression technique to bit-errors.
- Whether re-transmission requests for damaged packets over WLAN networks can be avoided for some applications, such as real-time rendition of data.
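The range conversion mentioned in the evaluation plan amounts to a linear rescale between score ranges. A small Python sketch (the direction and ranges here are illustrative, e.g. relating an SSIM index in [-1, 1] to a 1-to-5 MOS-style scale):

```python
def rescale(score, old_min, old_max, new_min=1.0, new_max=5.0):
    """Linearly map a score from [old_min, old_max] onto [new_min, new_max]."""
    fraction = (score - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

# e.g. relating an SSIM index in [-1, 1] to a MOS-like 1-5 scale:
print(rescale(1.0, -1.0, 1.0))   # perfect similarity -> 5.0
print(rescale(0.0, -1.0, 1.0))   # -> 3.0
```

The same function run in the other direction (swapping the range arguments) converts the 1-to-5 subjective ratings into the objective metric's range, as the evaluation plan requires.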

4. CONCLUSION

This background report has laid the groundwork for completing the project "The Effect of Bit-Errors on Compressed Speech, Music and Images". It has outlined the key concerns of the project and its main aims and objectives, and has elaborated the principles on which the project is based, including current research material. The report has also described the research methods that will be used to complete the project, focusing on the implementation and development of the code that will be used to study and demonstrate the effect of bit-errors on compressed data. Finally, it gives a timeline that will be followed until submission of the final dissertation report.

The final dissertation report will present the results obtained from the tests in the form of graphs, together with an evaluation of those results and comments on any difficulties encountered while carrying out the tests.

Compression techniques are becoming commonplace in data transmission, improving bandwidth efficiency and reducing storage requirements. They are also getting increasingly complex as they use human perception models to reduce the amount of data to be transmitted. With such a wide variety of applications using compressed data, the results of this project could be used in many ways, for example in feedback mechanisms that adapt transmission data rates to maintain quality, or to reduce the number of re-transmissions required in the case of damaged packets. In the case of speech, the PESQ objective measure has had many uses in mobile telephony, including determining the quality of a network and which compression technique should be used for a particular application. In the same way, an appropriate method for objectively measuring the quality of compressed music, images and video will become more useful as the number of applications using these kinds of data increases.
It appears that such a study has not been conducted before, so the results should prove interesting.


Synopsis of Basic VoIP Concepts APPENDIX B The Catalyst 4224 Access Gateway Switch (Catalyst 4224) provides Voice over IP (VoIP) gateway applications for a micro branch office. This chapter introduces some basic VoIP concepts. This chapter

More information

Digital Media. Daniel Fuller ITEC 2110

Digital Media. Daniel Fuller ITEC 2110 Digital Media Daniel Fuller ITEC 2110 Daily Question: Digital Audio What values contribute to the file size of a digital audio file? Email answer to DFullerDailyQuestion@gmail.com Subject Line: ITEC2110-09

More information

Compression; Error detection & correction

Compression; Error detection & correction Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some

More information

ITEC310 Computer Networks II

ITEC310 Computer Networks II ITEC310 Computer Networks II Chapter 29 Multimedia Department of Information Technology Eastern Mediterranean University 2/75 Objectives After completing this chapter you should be able to do the following:

More information

Networking Applications

Networking Applications Networking Dr. Ayman A. Abdel-Hamid College of Computing and Information Technology Arab Academy for Science & Technology and Maritime Transport Multimedia Multimedia 1 Outline Audio and Video Services

More information

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck Compression Part 2 Lossy Image Compression (JPEG) General Compression Design Elements 2 Application Application Model Encoder Model Decoder Compression Decompression Models observe that the sensors (image

More information

Compression of Stereo Images using a Huffman-Zip Scheme

Compression of Stereo Images using a Huffman-Zip Scheme Compression of Stereo Images using a Huffman-Zip Scheme John Hamann, Vickey Yeh Department of Electrical Engineering, Stanford University Stanford, CA 94304 jhamann@stanford.edu, vickey@stanford.edu Abstract

More information

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology Course Presentation Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology Homework Original Sound Speech Quantization Companding parameter (µ) Compander Quantization bit

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami to MPEG Prof. Pratikgiri Goswami Electronics & Communication Department, Shree Swami Atmanand Saraswati Institute of Technology, Surat. Outline of Topics 1 2 Coding 3 Video Object Representation Outline

More information

Image, video and audio coding concepts. Roadmap. Rationale. Stefan Alfredsson. (based on material by Johan Garcia)

Image, video and audio coding concepts. Roadmap. Rationale. Stefan Alfredsson. (based on material by Johan Garcia) Image, video and audio coding concepts Stefan Alfredsson (based on material by Johan Garcia) Roadmap XML Data structuring Loss-less compression (huffman, LZ77,...) Lossy compression Rationale Compression

More information

Wireless Communication

Wireless Communication Wireless Communication Systems @CS.NCTU Lecture 6: Image Instructor: Kate Ching-Ju Lin ( 林靖茹 ) Chap. 9 of Fundamentals of Multimedia Some reference from http://media.ee.ntu.edu.tw/courses/dvt/15f/ 1 Outline

More information

Part 1 of 4. MARCH

Part 1 of 4. MARCH Presented by Brought to You by Part 1 of 4 MARCH 2004 www.securitysales.com A1 Part1of 4 Essentials of DIGITAL VIDEO COMPRESSION By Bob Wimmer Video Security Consultants cctvbob@aol.com AT A GLANCE Compression

More information

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

International Journal of Emerging Technology and Advanced Engineering Website:   (ISSN , Volume 2, Issue 4, April 2012) A Technical Analysis Towards Digital Video Compression Rutika Joshi 1, Rajesh Rai 2, Rajesh Nema 3 1 Student, Electronics and Communication Department, NIIST College, Bhopal, 2,3 Prof., Electronics and

More information

Lecture 8 JPEG Compression (Part 3)

Lecture 8 JPEG Compression (Part 3) CS 414 Multimedia Systems Design Lecture 8 JPEG Compression (Part 3) Klara Nahrstedt Spring 2012 Administrative MP1 is posted Today Covered Topics Hybrid Coding: JPEG Coding Reading: Section 7.5 out of

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK VIII SEMESTER EC6018 - MULTIMEDIA COMPRESSION AND COMMUNICATION Regulation

More information

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK Professor Laurence S. Dooley School of Computing and Communications Milton Keynes, UK How many bits required? 2.4Mbytes 84Kbytes 9.8Kbytes 50Kbytes Data Information Data and information are NOT the same!

More information

Principles of MPEG audio compression

Principles of MPEG audio compression Principles of MPEG audio compression Principy komprese hudebního signálu metodou MPEG Petr Kubíček Abstract The article describes briefly audio data compression. Focus of the article is a MPEG standard,

More information

DAB. Digital Audio Broadcasting

DAB. Digital Audio Broadcasting DAB Digital Audio Broadcasting DAB history DAB has been under development since 1981 at the Institut für Rundfunktechnik (IRT). In 1985 the first DAB demonstrations were held at the WARC-ORB in Geneva

More information

Mahdi Amiri. February Sharif University of Technology

Mahdi Amiri. February Sharif University of Technology Course Presentation Multimedia Systems Speech II Mahdi Amiri February 2014 Sharif University of Technology Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code Modulation (DPCM)

More information

Does your Voice Quality Monitoring Measure Up?

Does your Voice Quality Monitoring Measure Up? Does your Voice Quality Monitoring Measure Up? Measure voice quality in real time Today s voice quality monitoring tools can give misleading results. This means that service providers are not getting a

More information

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio: Audio Compression Audio Compression CD quality audio: Sampling rate = 44 KHz, Quantization = 16 bits/sample Bit-rate = ~700 Kb/s (1.41 Mb/s if 2 channel stereo) Telephone-quality speech Sampling rate =

More information

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

VIDEO SIGNALS. Lossless coding

VIDEO SIGNALS. Lossless coding VIDEO SIGNALS Lossless coding LOSSLESS CODING The goal of lossless image compression is to represent an image signal with the smallest possible number of bits without loss of any information, thereby speeding

More information

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology

Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 2015 Sharif University of Technology Course Presentation Multimedia Systems Speech II Hmid R. Rabiee Mahdi Amiri February 25 Sharif University of Technology Speech Compression Road Map Based on Time Domain analysis Differential Pulse-Code

More information

CHAPTER 10: SOUND AND VIDEO EDITING

CHAPTER 10: SOUND AND VIDEO EDITING CHAPTER 10: SOUND AND VIDEO EDITING What should you know 1. Edit a sound clip to meet the requirements of its intended application and audience a. trim a sound clip to remove unwanted material b. join

More information

Speech and audio coding

Speech and audio coding Institut Mines-Telecom Speech and audio coding Marco Cagnazzo, cagnazzo@telecom-paristech.fr MN910 Advanced compression Outline Introduction Introduction Speech signal Music signal Masking Codeurs simples

More information

Introduction ti to JPEG

Introduction ti to JPEG Introduction ti to JPEG JPEG: Joint Photographic Expert Group work under 3 standards: ISO, CCITT, IEC Purpose: image compression Compression accuracy Works on full-color or gray-scale image Color Grayscale

More information

Lecture Information Multimedia Video Coding & Architectures

Lecture Information Multimedia Video Coding & Architectures Multimedia Video Coding & Architectures (5LSE0), Module 01 Introduction to coding aspects 1 Lecture Information Lecturer Prof.dr.ir. Peter H.N. de With Faculty Electrical Engineering, University Technology

More information

Mahdi Amiri. February Sharif University of Technology

Mahdi Amiri. February Sharif University of Technology Course Presentation Multimedia Systems Overview of the Course Mahdi Amiri February 2014 Sharif University of Technology Course Syllabus Website http://ce.sharif.edu/courses/92-93/2/ce342-1/ Page 1 Course

More information

Index. 1. Motivation 2. Background 3. JPEG Compression The Discrete Cosine Transformation Quantization Coding 4. MPEG 5.

Index. 1. Motivation 2. Background 3. JPEG Compression The Discrete Cosine Transformation Quantization Coding 4. MPEG 5. Index 1. Motivation 2. Background 3. JPEG Compression The Discrete Cosine Transformation Quantization Coding 4. MPEG 5. Literature Lossy Compression Motivation To meet a given target bit-rate for storage

More information

Course Syllabus. Website Multimedia Systems, Overview

Course Syllabus. Website   Multimedia Systems, Overview Course Syllabus Website http://ce.sharif.edu/courses/93-94/2/ce342-1/ Page 1 Course Syllabus Textbook Z-N. Li, M.S. Drew, Fundamentals of Multimedia, Pearson Prentice Hall Upper Saddle River, NJ, 2004.*

More information

Data Storage. Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J.

Data Storage. Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J. Data Storage Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J. Glenn Brookshear Copyright 2012 Pearson Education, Inc. Data Storage Bits

More information

Image and Video Compression Fundamentals

Image and Video Compression Fundamentals Video Codec Design Iain E. G. Richardson Copyright q 2002 John Wiley & Sons, Ltd ISBNs: 0-471-48553-5 (Hardback); 0-470-84783-2 (Electronic) Image and Video Compression Fundamentals 3.1 INTRODUCTION Representing

More information

So, what is data compression, and why do we need it?

So, what is data compression, and why do we need it? In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The

More information

INF5063: Programming heterogeneous multi-core processors. September 17, 2010

INF5063: Programming heterogeneous multi-core processors. September 17, 2010 INF5063: Programming heterogeneous multi-core processors September 17, 2010 High data volumes: Need for compression PAL video sequence 25 images per second 3 bytes per pixel RGB (red-green-blue values)

More information

7.5 Dictionary-based Coding

7.5 Dictionary-based Coding 7.5 Dictionary-based Coding LZW uses fixed-length code words to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text LZW encoder and decoder

More information

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

2014 Summer School on MPEG/VCEG Video. Video Coding Concept 2014 Summer School on MPEG/VCEG Video 1 Video Coding Concept Outline 2 Introduction Capture and representation of digital video Fundamentals of video coding Summary Outline 3 Introduction Capture and representation

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6 Audio Retrieval 6 Audio Retrieval 6.1 Basics of

More information

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1) Multimedia Video Coding & Architectures (5LSE0), Module 01 Introduction to coding aspects 1 Lecture Information Lecturer Prof.dr.ir. Peter H.N. de With Faculty Electrical Engineering, University Technology

More information

Lecture #3: Digital Music and Sound

Lecture #3: Digital Music and Sound Lecture #3: Digital Music and Sound CS106E Spring 2018, Young In this lecture we take a look at how computers represent music and sound. One very important concept we ll come across when studying digital

More information

CS 260: Seminar in Computer Science: Multimedia Networking

CS 260: Seminar in Computer Science: Multimedia Networking CS 260: Seminar in Computer Science: Multimedia Networking Jiasi Chen Lectures: MWF 4:10-5pm in CHASS http://www.cs.ucr.edu/~jiasi/teaching/cs260_spring17/ Multimedia is User perception Content creation

More information

3G Services Present New Challenges For Network Performance Evaluation

3G Services Present New Challenges For Network Performance Evaluation 3G Services Present New Challenges For Network Performance Evaluation 2004-29-09 1 Outline Synopsis of speech, audio, and video quality evaluation metrics Performance evaluation challenges related to 3G

More information

Chapter 1. Digital Data Representation and Communication. Part 2

Chapter 1. Digital Data Representation and Communication. Part 2 Chapter 1. Digital Data Representation and Communication Part 2 Compression Digital media files are usually very large, and they need to be made smaller compressed Without compression Won t have storage

More information

Fundamentals of Video Compression. Video Compression

Fundamentals of Video Compression. Video Compression Fundamentals of Video Compression Introduction to Digital Video Basic Compression Techniques Still Image Compression Techniques - JPEG Video Compression Introduction to Digital Video Video is a stream

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Fundamentals of Image Compression DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE LONDON Compression New techniques have led to the development

More information

Ch. 5: Audio Compression Multimedia Systems

Ch. 5: Audio Compression Multimedia Systems Ch. 5: Audio Compression Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Chapter 5: Audio Compression 1 Introduction Need to code digital

More information