Signal processing in sound engineering


Assessment of speech quality in MP3 compression

Stefan Brachmański
Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, Wrocław

Summary

The development of telecommunication services demands more effective use of the band assigned to transmission. Audio signals, including speech, are therefore transformed before transmission (i.e. compressed or coded). The aim of the presented research was to determine the influence of bit rate on the assessed quality of speech coded with the MP3 technique, and to find the minimum bit rate that gives satisfactory quality of the coded speech signal. Quality was evaluated with the subjective ACR and DCR methods recommended by the International Telecommunication Union and with the objective PESQ method. In the tests, the sentence lists were read by a male and a female speaker. The results obtained with the ACR method indicate that very good speech quality (MOS over 4.5) is reached at a bit rate of at least 128 kb/s, while with the DCR method it is reached at 56 kb/s. Good speech transmission quality was reported by listeners at bit rates over 64 kb/s in the ACR method and over 32 kb/s in the DCR method.

1. Introduction

Telecommunication technology has developed significantly in recent years, among others in the transmission of video and audio signals, including speech. As a result, more effective use of the band assigned to signal transmission is required. Many current solutions transform the speech signal in various ways to make its transmission, acquisition or recognition more efficient. Various algorithms called codecs (a coder on the sending side and a decoder on the receiving side) are applied for this purpose. Codecs are characterized, among others, by their bandwidth requirements, the delay they introduce, the compression level and the quality of the reproduced speech signal.
All these factors can, however, degrade the quality of transmitted speech, which is one of the crucial elements in the overall assessment of telecommunication services. Quality of service is now fundamental for everyone: end users, operators, service providers and even hardware suppliers, so proper quality assessment of the transmitted speech signal plays a significant role. One of the factors influencing the quality of the transmitted signal is the compression technique applied. Among the various compression techniques, this paper focuses on the MP3 standard, which is extremely popular and closely related to the MPEG-1 Audio Layer 2 coding used in DAB digital radio. In the presented tests, the influence of bit rate on speech transmission quality was examined. Speech transmission quality can be assessed by subjective or objective methods. The big disadvantage of subjective methods is that they are expensive and time-consuming; hence, for many years there has been a search for an objective method whose results would be consistent with user opinion. Despite considerable progress in developing new objective methods [1],[3],[12],[13],[16],[17],[18],[25],[26], the only way to verify them is still by methods based on subjective tests [4],[5],[6],[7],[8],[9],[15],[20],[21],[22],[24]. Among the many subjective methods for assessing the quality of transmitted and coded speech, the most popular today are those that yield a direct five-grade rating, such as the ACR (Absolute Category Rating) method and the DCR (Degradation Category Rating) method recommended by ITU-T in P.800 [15]. In the presented research both methods were applied to assess the quality of MP3-coded speech at various bit rates.

2. MPEG-1 Audio Layer 3 (MP3) Standard

The MPEG-1 Audio Layer 3 standard, known as MP3, was designed at the Fraunhofer Institute in cooperation with the Thomson company in 1991 and approved by ISO as an international standard [14]. MPEG-1 Audio was realized in three versions named Layer 1, Layer 2 and Layer 3, whose basic parameters in relation to PCM are given in Table 1.

Tab. 1. Basic MPEG audio compression parameters in relation to a CD-quality stereo signal

Compression          Compression ratio    Required bit rate
PCM (CD quality)     1:1                  1.4 Mb/s
MPEG-1 Layer 1       1:4                  384 kb/s
MPEG-1 Layer 2       1:8                  192 kb/s
MPEG-1 Layer 3       1:…                  … kb/s

All MPEG audio standards rest on the same idea: reducing the audio stream by removing the part of the signal that is unimportant from the listener's point of view. They exploit the imperfections of the human ear, in particular the masking effect (Fig. 1). Weaker sounds appearing close to stronger ones can be removed without the ear noticing, so that the remaining useful signal carries less information [23]. The MP3 standard also exploits the fact that, owing to the low speed at which neural stimuli are transmitted to the brain, a listener cannot distinguish weak sounds that appear shortly before or after stronger ones. MP3 uses this to extend the masking range: masking occurs in a very short interval of 2 to 5 ms before the masking signal (some sources give about 20 ms) and in a much longer interval of 50 to 200 ms after it. The MP3 coder combines two kinds of compression, lossy and lossless, but lossy compression dominates and its algorithm is much more complex [19],[27].

Fig. 1. Masking effect (white bars: sounds that can be masked during compression; gray bar: audible sound).
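The figures in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes the usual CD parameters (44.1 kHz sampling, 16 bits per sample, two channels), which the "1.4 Mb/s" entry implies but the text does not spell out; the function names are illustrative only. It suggests that the ratios quoted in the table are rounded values.

```python
def pcm_bit_rate(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(coded_bit_rate, reference_bit_rate=None):
    """How many times smaller the coded stream is than the PCM reference."""
    if reference_bit_rate is None:
        reference_bit_rate = pcm_bit_rate()  # assumed CD-quality stereo
    return reference_bit_rate / coded_bit_rate


cd_rate = pcm_bit_rate()                    # 1 411 200 bit/s, i.e. about 1.4 Mb/s
layer1_ratio = compression_ratio(384_000)   # about 3.7, quoted as 1:4 in Table 1
layer2_ratio = compression_ratio(192_000)   # about 7.35, quoted as 1:8 in Table 1

print("CD PCM bit rate [bit/s]:", cd_rate)
print("Layer 1 ratio at 384 kb/s:", layer1_ratio)
print("Layer 2 ratio at 192 kb/s:", layer2_ratio)
```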

Fig. 2. Illustration of inaudible sounds in the presence of a strong signal.

The first phase of coding divides the digital audio signal into 32 bands and computes a 1024-point Fast Fourier Transform (FFT). Signal components lying outside the human audible range are omitted. In the next phase the Modified Discrete Cosine Transform (MDCT) and the psychoacoustic model are used. The MDCT is the primary element that distinguishes MPEG-1 Layer 3 from Layers 1 and 2. It is a lapped transform, which avoids artifacts at block boundaries that would otherwise be audible every few milliseconds. The output of this block is a set of MDCT coefficients. The psychoacoustic model assumes that, given the characteristics of the human ear and brain, a listener is not able to receive and process all the acoustic information carried by sound. The Layer 3 psychoacoustic model assumes an audible range from 20 Hz to 16 kHz (Layer 2 takes the range 20 Hz to 20 kHz), with maximum ear sensitivity between 2 kHz and 4 kHz. Based on the operations performed in the psychoacoustic model block, the coder decides which data should be represented more precisely and which are less relevant. Data that the model classifies as inaudible are rejected. The signal is then passed to the quantizer and coder. For more efficient coding, nonlinear quantization, adaptive segmentation and Huffman coding are applied. In the final phase the audio signal is divided into small parts called frames. Each MP3 file consists of frames containing the data for a fraction of the recording to be reconstructed by the decoder. Each frame has a header with 32 bits of additional information describing, among others, the kind and parameters of the sound (Fig. 3). The header also carries all the information needed for proper reconstruction of the audio data held in the remaining part of the frame.
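To illustrate why a lapped transform avoids block-boundary artifacts, here is a minimal sketch of an MDCT analysis/synthesis pair in plain Python. It is not the MP3 filter bank: the block size, the sine window and the direct O(N^2) summation are assumptions chosen for brevity, and all function names are illustrative. Each block overlaps its neighbour by half, and the aliasing introduced in one block is cancelled by the next one when the windowed outputs are added.

```python
import math


def sine_window(block_len):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for block_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block):
    """Direct-form MDCT: 2N input samples -> N coefficients."""
    half = len(block) // 2
    shift = 0.5 + half / 2.0
    return [
        sum(block[n] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
            for n in range(2 * half))
        for k in range(half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (2/N scaling)."""
    half = len(coeffs)
    shift = 0.5 + half / 2.0
    return [
        2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                         for k in range(half))
        for n in range(2 * half)
    ]


def analyse_synthesise(signal, half):
    """Window, transform, invert and overlap-add blocks with hop size `half`."""
    window = sine_window(2 * half)
    tail = (-len(signal)) % half                 # pad to a whole number of hops
    padded = [0.0] * half + list(signal) + [0.0] * (tail + half)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = [padded[start + n] * window[n] for n in range(2 * half)]
        restored = imdct(mdct(block))
        for n in range(2 * half):
            output[start + n] += restored[n] * window[n]
    return output[half:half + len(signal)]


test_signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
rebuilt = analyse_synthesise(test_signal, half=8)
largest_error = max(abs(a - b) for a, b in zip(test_signal, rebuilt))
print("largest reconstruction error:", largest_error)
```

No quantization is applied here, so the overlap-added output should match the input up to rounding error; in a real coder the coefficients are quantized between the two transforms.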

Fig. 3. Frame block in MP3

3. Absolute Category Rating (ACR) method

The Absolute Category Rating (ACR) method is recommended by the International Telecommunication Union (ITU) [15] for assessing the quality of a speech signal. Test lists comprise simple, short (2-3 s), semantically unrelated sentences. A test list is divided into groups of five sentences. The experimenter must decide how many sentences are required in each group to constitute a speech sample; a minimum of two and a maximum of five are recommended. The test material should be properly prepared and recorded. The speaker should pronounce the sentences fluently and should not have any speech defects. To reduce the influence of the individual characteristics of a speaker's voice on the result, several speakers should take part in the experiment. ITU-T Recommendation P.800 permits the utterances to be recorded beforehand on high-quality equipment such as a conventional two-track tape recorder (high-grade tape), a two-channel digital audio processor, a high-quality video cassette recorder, a digital audio tape (DAT) recorder, or a computer with acoustic input and output. Speech is recorded with a linear microphone and a low-noise amplifier with a flat frequency response. The recording should be carried out in a room with a volume of … m3, a reverberation time of less than 0.5 s (preferably in the range … s) and a noise level no higher than 30 dBA. The recording level should be set 20-30 dB below the overload level of the recording system. At the beginning of each recording, a 20-second calibration tone of known level is recorded. The calibration tone is usually a 1000 Hz sine signal, unless the system is sensitive to this frequency (i.e. the frequency is used for other purposes); in such cases a tone of a different frequency can be used. The recorded calibration tone can be used to set the listening level.
Listening is carried out in a room with the same parameters as for recording the test lists. It is recommended to measure the noise level at least twice, i.e. at the beginning and at the end of the measurements. If the two readings differ considerably, the person conducting the test should judge whether this can influence the final scores. The listening part of the experiment should take place in a room with a noise level below 30 dBA. Listeners are chosen at random from the normal telephone-using population.

Listeners read the instructions for the experiment before the measurements begin. Various scales may be used for different purposes. The operator presents one of the following opinion scales recommended by the ITU:

a) listening-quality scale (Excellent - 5, Good - 4, Fair - 3, Poor - 2, Bad - 1),
b) listening-effort scale (Complete relaxation possible; no effort required - 5, Attention necessary; no appreciable effort required - 4, Moderate effort required - 3, Considerable effort required - 2, No meaning understood with any feasible effort - 1),
c) loudness-preference scale (Much louder than preferred - 5, Louder than preferred - 4, Preferred - 3, Quieter than preferred - 2, Much quieter than preferred - 1).

Listeners listen to the sentences and give their opinions on a five-point scale. The mean value is calculated over listeners and speakers for each speech transmission condition. Male and female voices have different characteristics, which is why both types of voice should be included and the results analyzed separately. When the differences between the results for the two types of voice are insignificant, the final results can be averaged.

4. Degradation Category Rating (DCR)

The Degradation Category Rating (DCR) method is recommended by the ITU [15] as an alternative to the ACR method, which is not sensitive enough for high-quality systems. The measurement consists of comparing the tested system with a high-quality reference system. Speech samples, i.e. different sentences (the same sentence lists as in ACR), are selected from a larger, balanced test list and presented to the listeners in single pairs (A-B) or repeated pairs (A-B-A-B), where A is the reference sample and B is the tested sample. Each pair is rated separately. It is recommended to include several null pairs (A-A) to verify the quality and sensitivity of the ratings given by the participants. Samples A and B should be separated by an interval of … s.
In the repeated-pair procedure (A-B-A-B), the separation between the two pairs should be … s. The listeners evaluate the degree of deterioration in the quality of the second sentence in comparison with the first one, using a five-point degradation scale. In the DCR method the listeners respond to the request "Please rate the degradation of the second sample relative to the first". They hear two sentences (original and transmitted) and give their opinions on a five-point scale (degradation is inaudible - 5, degradation is perceived but not annoying - 4, degradation is slightly annoying - 3, degradation is annoying - 2, degradation is very annoying - 1). The average rating (Degradation Mean Opinion Score, DMOS) is calculated over the listeners and the speakers for each tested speech transmission condition. The requirements for room acoustics, recording, listening sessions, selection of the listening group and test material are the same as for ACR.

5. Comparison Category Rating (CCR)

The Comparison Category Rating (CCR) method [15] is similar to DCR. The lists are recorded and played back in the same way, but the reference and tested samples are played in random order (the A-B pairs are formed randomly). The listeners' task is to compare samples A and B and to assess whether the quality of the first signal is the same as or different from that of the second. A seven-point scale from 3 to -3 is used (3 - the quality of the first signal compared with the second is much better, 2 - better, 1 - slightly better, 0 - about the same, -1 - slightly worse, -2 - worse, -3 - much worse). The resulting score is the mean of all individual ratings and is called the CMOS (Comparison Mean Opinion Score).

6. Perceptual Evaluation of Speech Quality Method

The Perceptual Evaluation of Speech Quality (PESQ) method [17] is an improved version of PSQM (Perceptual Speech Quality Measure) [16], based on transforming physical signals by means of psychoacoustic modeling [2]. PSQM was recommended by ITU-T as P.861 [16], and in 2001 ITU-T adopted PESQ as the new standard P.862 [17], which replaced the previously recommended PSQM method. Like PSQM, PESQ measurement is based on a so-called internal representation, which reflects a theoretical form of the speech signal in the human brain. Previously recorded male and female voices (one sentence per voice) are used as the reference signal. The original signal prepared in this way is transmitted through the telecommunication channel under investigation, and at the channel output it is distorted (degraded). The two signals are then compared in a psychoacoustic domain that reflects the human impression of speech. The transformation from the physical form into the psychoacoustic representation takes place in three stages: time-frequency mapping, frequency scaling into critical bands, and scaling of the signal levels. In the first stage the time signals are mapped to the time-frequency domain using a short-term FFT with a Hann window of 32 ms (N = 256 samples at a sampling frequency of 8 kHz, or N = 512 samples at 16 kHz). The overlap between successive time windows is 50%. The second stage takes into account that the hearing system has poorer frequency discrimination at higher frequencies than in the lower band. This fact, together with signal-by-noise masking modeled by a filter bank of critical bands on the Bark scale, leads to a model of the hearing process for particular stimuli. The continuous spectrum represents the distribution of stimuli over the nerves connected to the basilar membrane and reflects complex phenomena such as nonlinear smoothing in critical bands.
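The framing used in the first stage follows directly from the 32 ms window duration. The sketch below only reproduces the numbers quoted in the text (N = 256 at 8 kHz, N = 512 at 16 kHz, 50% overlap) and is not the P.862 reference implementation. The periodic form of the Hann window, the frame-counting rule and the helper names are assumptions made for illustration.

```python
import math

WINDOW_SECONDS = 0.032   # 32 ms analysis window, as quoted in the text
OVERLAP = 0.5            # 50% overlap between successive windows


def frame_length(sample_rate_hz):
    """Number of samples in one analysis window."""
    return round(sample_rate_hz * WINDOW_SECONDS)


def hop_length(sample_rate_hz):
    """Number of samples between the starts of successive windows."""
    return int(frame_length(sample_rate_hz) * (1.0 - OVERLAP))


def hann_window(length):
    """Periodic Hann window (assumed form); halves overlap-add to 1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / length)) for n in range(length)]


def frame_count(num_samples, sample_rate_hz):
    """Whole frames that fit into the signal (no padding assumed)."""
    length = frame_length(sample_rate_hz)
    if num_samples < length:
        return 0
    return 1 + (num_samples - length) // hop_length(sample_rate_hz)


for rate in (8_000, 16_000):
    print(rate, "Hz:", frame_length(rate), "samples per frame,",
          hop_length(rate), "samples hop")
```

With these assumptions, one second of speech sampled at 8 kHz yields `frame_count(8000, 8000)` full frames, each of which would then be passed to the FFT.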
Scaling of the signal levels in dB into loudness levels in phons, and finally into loudness in sones [28], is the third step of the transformation. This step reflects the nonlinear relation between signal level and loudness impression. At the end of the processing chain the cognitive model is applied, and the final decision results from comparing the two internal representations (spectra) of the tested and reference signals. The output value is the PESQ score, which ranges from -0.5 to 4.5. The PESQ score can be mapped onto a subjective listening-quality, MOS-like scale between 1.0 and 4.5, the normal range of MOS values found in an ACR experiment.

7. Experiment

The aims of the tests were:
- to check the influence of bit rate on the assessed quality of MP3-coded speech,
- to determine the minimum bit rate giving satisfactory quality of the coded speech signal,
- to examine the differences between the results obtained with the ACR and DCR methods,
- to check whether a relation can be established between the subjective ACR and DCR methods and the objective PESQ method.

The ACR and DCR measurements were carried out in a room at the Chair of Acoustics and Multimedia of Wroclaw University of Technology. The room met the requirements of ITU-T Recommendation P.800 [15]. The listeners were from 20 to 30 years old. Two native speakers of Polish (a woman and a man) participated in the study. The listeners were students at the Wroclaw University of Technology whose age ranged from 20 to 23 years. The test material consisted of phonetically balanced sentence lists (Fig. 4) read by the male and female speakers. The reference sentence lists were recorded on a digital tape recorder with a sampling frequency of 44.1 kHz and 16-bit resolution. The test lists were created with the GX Transcoder converter, which converts between most popular audio formats. In the tests carried out for the MP3 format, the bit rate was varied from 24 kb/s to 320 kb/s. For each measurement point (bit rate) there were two lists of 50 sentences each, one read by the male and one by the female speaker. The listeners gave their marks on a five-point scale from 1 to 5. The test signals were presented to the listeners over earphones using computer software created in the Laboratory of Analysis and Processing of Acoustic Signals at the Chair of Acoustics and Multimedia, Faculty of Electronics, Wroclaw University of Technology. The software supports tests with the ACR and DCR methods. Each listener sits in front of a screen, keyboard and mouse. Following the instructions of the person conducting the test, the listener chooses the measurement option (ACR or DCR). In the ACR method only one (tested) signal is presented to the listener, whereas in the DCR method two signals are presented: the original (first) and the tested one (second). Before the measurements start, instructions on the measurement method and on how to give a mark are displayed on the screen. After listening to a test sequence, the listener gives a mark on the five-point scale. The mark must be given within a set time limit, and the elapsed time is shown to the listener on the screen. If the listener does not give a mark within the allotted time, the program assigns the lowest possible mark, i.e. 1. The software records the marks for each listener separately and, after the session ends, calculates the mean opinion score (MOS) of each listener for each tested transmission condition (bit rate).
All computers taking part in the experiment can be connected in a network, and the result can then be presented as the mean mark of each individual listener or as the mean over all listeners, with full statistical information. Measurement over the network requires all listeners to be gathered at the same time, because they evaluate simultaneously. In contrast, single-station measurement allows tests to be carried out at different times, but the same measurement conditions must be maintained. The MOS results obtained with the ACR method for the male and female voices are presented in Fig. 5, and the results obtained with the DCR method in Fig. 6. The measurements reported here were partially done within the framework of a diploma project carried out at the Faculty of Electronics of Wroclaw University of Technology [11].
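The scoring rule used by the test software (lowest mark on a missed time limit, a mean per listener, a mean over all listeners with statistics) can be sketched as follows. This is not the laboratory's program, whose code the paper does not show: the data layout (one list of marks per listener, with None for a missed time limit) and the 95% interval based on a normal approximation are assumptions made for illustration. The marks in the usage example are made-up placeholders, not measured results.

```python
import math
import statistics

LOWEST_MARK = 1  # mark assigned when the listener misses the time limit


def apply_timeout_rule(marks):
    """Replace missing marks (None) with the lowest possible mark."""
    return [LOWEST_MARK if mark is None else mark for mark in marks]


def listener_mos(marks):
    """Mean opinion score of one listener for one transmission condition."""
    return statistics.mean(apply_timeout_rule(marks))


def condition_mos(marks_by_listener):
    """Mean over all marks for one condition and a 95% interval half-width."""
    pooled = [mark
              for marks in marks_by_listener.values()
              for mark in apply_timeout_rule(marks)]
    mean = statistics.mean(pooled)
    if len(pooled) < 2:
        return mean, 0.0
    # Assumed normal approximation; the paper does not state its statistics.
    half_width = 1.96 * statistics.stdev(pooled) / math.sqrt(len(pooled))
    return mean, half_width


# Placeholder marks for a single bit rate, NOT data from the paper.
example_marks = {
    "listener_1": [4, 5, None, 4],   # third sentence timed out
    "listener_2": [3, 4, 4, 5],
}

for name, marks in example_marks.items():
    print(name, "MOS:", listener_mos(marks))
overall_mean, overall_half_width = condition_mos(example_marks)
print("all listeners MOS:", overall_mean, "+/-", overall_half_width)
```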

Fig. 4. Sample sentence list

The analysis of the results showed no significant differences between the marks obtained for male and female voices in either the ACR or the DCR method. According to Recommendation P.800 [15], in such a case the results for male and female voices can be averaged. The mean results obtained for the ACR and DCR methods are presented in Fig. 7. The objective PESQ measurements were carried out with the same sentence lists and identical transmission conditions. Their results are also presented in Fig. 7.
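The averaging of male and female results, and the search for the lowest bit rate that reaches a given quality level, can be sketched as below. The numeric MOS values are placeholders invented purely to make the example runnable; they are not the measurements shown in Figs. 5-7. The function names and the dictionary layout (bit rate in kb/s mapped to MOS) are likewise assumptions.

```python
def average_voices(male_mos, female_mos):
    """Average male and female MOS for every bit rate present in both sets."""
    shared_rates = sorted(set(male_mos) & set(female_mos))
    return {rate: (male_mos[rate] + female_mos[rate]) / 2 for rate in shared_rates}


def minimum_bit_rate(mos_by_rate, threshold):
    """Lowest bit rate whose MOS exceeds the threshold, or None if none does.

    Assumes MOS does not drop again at higher bit rates; a non-monotonic
    curve would need a stricter rule.
    """
    for rate in sorted(mos_by_rate):
        if mos_by_rate[rate] > threshold:
            return rate
    return None


# Placeholder values only, NOT data from the paper (bit rate in kb/s -> MOS).
male = {24: 2.0, 32: 3.0, 64: 4.0, 128: 4.6}
female = {24: 2.2, 32: 3.2, 64: 4.2, 128: 4.8}

mean_mos = average_voices(male, female)
print("averaged MOS per bit rate:", mean_mos)
print("lowest bit rate with MOS above 4.5:", minimum_bit_rate(mean_mos, 4.5))
```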

Fig. 5. MOS speech quality ratings obtained with the ACR method as a function of bit rate (ACR-M: male voice, ACR-F: female voice)

Fig. 6. MOS speech quality ratings obtained with the DCR method as a function of bit rate (DCR-M: male voice, DCR-F: female voice)

Fig. 7. Mean MOS speech quality ratings obtained with the ACR, DCR and objective PESQ methods as a function of bit rate.

8. Summary

The results of the presented experiment show that, as could be expected, in both subjective methods, Absolute Category Rating (ACR) and Degradation Category Rating (DCR), increasing the bit rate improves speech quality. There were no significant differences between the results for male and female voices, which allowed the results for both voices to be averaged. The results for the ACR method indicate that very good speech quality (MOS over 4.5) is achieved at a minimum bit rate of 128 kb/s, whereas for the DCR method the minimum is 56 kb/s. Listeners rated the speech transmission quality as good for bit rates from 64 kb/s to 128 kb/s in the ACR method and from 32 kb/s to 56 kb/s in the DCR method. Comparing the two methods, the MOS rises more steeply with bit rate in the DCR method than in the ACR method. Similar research was carried out to evaluate speech transmission in the DAB+ digital radio system. The results were presented at the 134th Convention of the Audio Engineering Society in Rome [4]. That research used sound samples taken after transmission through the chain: multiplex - radio transmitter - radio receiver. The MPEG-2/4 AAC and HE-AAC v1 systems were used for coding during an experimental broadcast carried out in Wroclaw. The tested samples (male and female voices) were transmitted at six different bit rates (136 kbit/s, 128 kbit/s, 96 kbit/s, 64 kbit/s, 48 kbit/s and 24 kbit/s) at a sampling frequency of 48 kHz. For each bit rate two versions of the signal were recorded: with the SBR processor switched on and switched off. CD recordings, which were the audio source, served as the reference samples.
The tested samples were recorded from the analog output of the DAB CLINT Audio 01 radio receiver with a Tascam DA 30 DAT digital tape recorder, at a sampling frequency of 44.1 kHz and 16-bit resolution. That research confirmed that the DCR method gives very good ratings for bit rates from 96 kb/s, and the ACR method from 128 kb/s. The results obtained with the ACR method in the two experiments agree. In both cases very good speech transmission ratings were obtained at bit rates of at least 128 kb/s, and the results agree despite the different speech coding techniques (MP3 and AAC). The differences lie in the results obtained with the DCR method. In the presented research (MP3 coding) very good ratings were obtained from a bit rate of 56 kb/s, whereas in the experiments with DAB+ digital radio transmission (AAC coding) they were obtained from a bit rate of 96 kb/s. The measurements carried out with the objective PESQ method are burdened with error for bit rates exceeding 128 kb/s. PESQ allows a maximum sampling frequency of 16 kHz with 16 bits, which gives 128 kb/s. It can be expected that for higher bit rates the rating will not change and will stay at its maximum value. At the same time it was stated that the results obtained with the ACR method are higher than with the DCR method, the opposite of what was observed in this paper for MP3 coding. Clarifying this requires further research with a larger group of listeners and more diverse test material. After the speech transmission quality measurements were finished, the listeners were asked to share their impressions of the individual methods. They emphasized the lack of a well-defined rating scale in the ACR method and the absence of a reference sample against which the tested sample could be compared, which made marking difficult. In the DCR method the reference signal exists, but even so the listeners had problems with marking the tested sample, and for this method too there were complaints that the rating scale is not defined precisely enough.

Literature

[1] ANSI S3.5, (1997), Methods for the calculation of the speech intelligibility index (SII).
[2] Barbedo J.G.A., Lopes A., (2005), A new cognitive model for objective assessment of audio quality, J. Audio Eng. Soc., 53, 1/2, 22-31.
[3] Beerends J.G., Stemerdink J.A., (1994), A Perceptual Speech-Quality Measure Based on a Psychoacoustic Sound Representation, J. Audio Eng. Soc., 42, 3.
[4] Brachmański S., Kin M., (2013), Assessment of speech quality in Digital Audio Broadcasting (DAB+) system, 134th Convention Audio Engineering Society, Convention paper 8829, Rome, Italy.
[5] Brachmański S., (2012), Automation of Subjective Measurements of Logatom Intelligibility in Classrooms, Automation, ed. by Florian Kongoli, InTech.
[6] Brachmański S., (2008), Automation of subjective measurements of speech intelligibility in analogue telecommunication channels, Archives of Acoustics, 33, 3.
[7] Brachmański S., Kula S., (2003), Badanie jakości mowy w połączeniach głosowych. Stara usługa - nowe problemy, Przegląd Telekomunikacyjny i Wiadomości Telekom., 8-9.
[8] Brachmański S., (1999), Subiektywne metody oceny jakości transmisji mowy w cyfrowych kanałach telekomunikacyjnych, Krajowe Sympozjum Telekom. 1999, Tom B, Bydgoszcz, Poland.
[9] Brachmański S., (2001), Automatyzacja subiektywnych pomiarów jakości transmisji mowy metodą ACR, XLVIII Open Seminar on Acoustics, Wrocław-Polanica Zdrój, Poland.
[10] Brachmański S., (2001), Fonetyczna struktura materiału testowego stosowanego w pomiarach jakości transmisji mowy metodą ACR, XLVIII Open Seminar on Acoustics, Wrocław-Polanica Zdrój, Poland.
[11] Dończyk R., (2013), Wpływ techniki kompresji na jakość mowy, Praca dyplomowa, Politechnika Wrocławska, Wrocław.
[12] French N.R., Steinberg J.C., (1947), Factors governing the intelligibility of speech sounds, J. Acoust. Soc. Am., 19.
[13] Houtgast T., Steeneken H.J.M., (1973), The Modulation Transfer Function in room acoustics as a predictor of speech intelligibility, Acustica, 28.
[14] ISO/IEC, (1993), Information Technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, Part 3: Audio; Standard ISO/IEC.
[15] ITU-T Recommendation P.800, (1996), Method for subjective determination of transmission quality.
[16] ITU-T Recom. P.861, (1996), Objective Quality Measurement of Telephone-band (… Hz) Speech Codecs.
[17] ITU-T Recom. P.862, (2007), Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
[18] ITU-T Recom. P.863, (2011), Methods for Objective and Subjective Assessment of Speech Quality. Perceptual Objective Listening Quality Assessment.
[19] Li Z.N., Drew M.S., (2004), Fundamentals of multimedia, Pearson Education Inc.
[20] Majewski W., Myślecki W., Baściuk K., Brachmański S., (1998), Application of modified logatom intelligibility test in telecommunications, audiometry and room acoustics, Proc. 9th Mediterranean Electrotechnical Conf. Melecon 98, 25-28, Tel-Aviv, Israel.
[21] Polska Norma PN-90/T-05100, (1990), Analogowe łańcuchy telefoniczne. Wymagania i metody pomiaru wyrazistości logatomowej, Warszawa.
[22] Polska Norma PN V, (1999), Cyfrowe łańcuchy telefoniczne. Wymagania i metoda pomiaru wyrazistości logatomowej, Wyd. Norm., Warszawa.
[23] Rabiner L.R., Schafer R.W., (2011), Theory and applications of digital speech processing, Pearson Education Inc.
[24] Sotschek J., (1976), Methoden zur Messung der Sprachgüte I: Verfahren zur Bestimmung der Satz- und der Wortverständlichkeit, Der Fernmelde Ingenieur, 10.
[25] Voran S., (1999), Objective Estimation of Perceived Speech Quality, Part I: Development of the Measuring Normalizing Block Technique, IEEE Trans. Speech Audio Process., 7.
[26] Voran S., (1999), Objective Estimation of Perceived Speech Quality, Part II: Evaluation of the Measuring Normalizing Block Technique, IEEE Trans. Speech Audio Process., 7.
[27] Zölzer U., (2008), Digital Audio Signal Processing, John Wiley & Sons Ltd.
[28] Zwicker E., Feldtkeller R., (1967), Das Ohr als Nachrichtenempfänger, S. Hirzel Verlag, Stuttgart.

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb1. Subjective

More information

2.4 Audio Compression

2.4 Audio Compression 2.4 Audio Compression 2.4.1 Pulse Code Modulation Audio signals are analog waves. The acoustic perception is determined by the frequency (pitch) and the amplitude (loudness). For storage, processing and

More information

Principles of Audio Coding

Principles of Audio Coding Principles of Audio Coding Topics today Introduction VOCODERS Psychoacoustics Equal-Loudness Curve Frequency Masking Temporal Masking (CSIT 410) 2 Introduction Speech compression algorithm focuses on exploiting

More information

Mpeg 1 layer 3 (mp3) general overview

Mpeg 1 layer 3 (mp3) general overview Mpeg 1 layer 3 (mp3) general overview 1 Digital Audio! CD Audio:! 16 bit encoding! 2 Channels (Stereo)! 44.1 khz sampling rate 2 * 44.1 khz * 16 bits = 1.41 Mb/s + Overhead (synchronization, error correction,

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

ELL 788 Computational Perception & Cognition July November 2015

ELL 788 Computational Perception & Cognition July November 2015 ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog)

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

Chapter 14 MPEG Audio Compression

Chapter 14 MPEG Audio Compression Chapter 14 MPEG Audio Compression Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew © Prentice Hall 2003 14.1

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

5: Music Compression. Music Coding. Mark Handley

5: Music Compression. Music Coding. Mark Handley 5: Music Compression Mark Handley Music Coding LPC-based codecs model the sound source to achieve good compression. Works well for voice. Terrible for music. What if you can't model the source? Model the

Principles of MPEG audio compression

Principles of MPEG audio compression Principles of MPEG audio compression Principy komprese hudebního signálu metodou MPEG Petr Kubíček Abstract The article describes briefly audio data compression. Focus of the article is a MPEG standard,

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 13 Audio Signal Processing 14/04/01 http://www.ee.unlv.edu/~b1morris/ee482/

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

Audio Coding and MP3

Audio Coding and MP3 Audio Coding and MP3 contributions by: Torbjørn Ekman What is Sound? Sound waves: 20Hz - 20kHz Speed: 331.3 m/s (air) Wavelength: 165 cm - 1.65 cm 1 Analogue audio frequencies: 20Hz - 20kHz mono: x(t)

Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications

Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications Peter Počta {pocta@fel.uniza.sk} Department of Telecommunications

Multimedia Communications. Audio coding

Multimedia Communications. Audio coding Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding. Introduction to Digital Audio Compression B. Cavagnolo and J. Bier Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, CA 94704 (510) 665-1600 info@bdti.com http://www.bdti.com INTRODUCTION

MPEG-4 aacplus - Audio coding for today's digital media world

MPEG-4 aacplus - Audio coding for today's digital media world MPEG-4 aacplus - Audio coding for today's digital media world Whitepaper by: Gerald Moser, Coding Technologies November 2005 1. Introduction Delivering high quality digital broadcast content to consumers

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding 2013 Dolby Laboratories,

Parametric Coding of High-Quality Audio

Parametric Coding of High-Quality Audio Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits

CHAPTER 6 Audio compression in practice

CHAPTER 6 Audio compression in practice CHAPTER 6 Audio compression in practice In earlier chapters we have seen that digital sound is simply an array of numbers, where each number is a measure of the air pressure at a particular time. This

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved. Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn't caught on yet Consumers don't seem to care about high fidelity

Optical Storage Technology. MPEG Data Compression

Optical Storage Technology. MPEG Data Compression Optical Storage Technology MPEG Data Compression MPEG-1 1 Audio Standard Moving Pictures Expert Group (MPEG) was formed in 1988 to devise compression techniques for audio and video. It first devised the

Lecture 16 Perceptual Audio Coding

Lecture 16 Perceptual Audio Coding EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today's lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network Abstract 132 Voice Quality Assessment for Mobile to SIP Call over Live 3G Network G.Venkatakrishnan, I-H.Mkwawa and L.Sun Signal Processing and Multimedia Communications, University of Plymouth, Plymouth,

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd Introducing Audio Signal Processing & Audio Coding Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd Overview Audio Signal Processing Applications @ Dolby Audio Signal Processing Basics

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,

CHAPTER 10: SOUND AND VIDEO EDITING

CHAPTER 10: SOUND AND VIDEO EDITING CHAPTER 10: SOUND AND VIDEO EDITING What should you know 1. Edit a sound clip to meet the requirements of its intended application and audience a. trim a sound clip to remove unwanted material b. join

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding Perceptual Pre-weighting and Post-inverse weighting for Speech Coding Niranjan Shetty and Jerry D. Gibson Department of Electrical and Computer Engineering University of California, Santa Barbara, CA,

DAB. Digital Audio Broadcasting

DAB. Digital Audio Broadcasting DAB Digital Audio Broadcasting DAB history DAB has been under development since 1981 at the Institut für Rundfunktechnik (IRT). In 1985 the first DAB demonstrations were held at the WARC-ORB in Geneva

The MPEG-4 General Audio Coder

The MPEG-4 General Audio Coder The MPEG-4 General Audio Coder Bernhard Grill Fraunhofer Institute for Integrated Circuits (IIS) grl 6/98 page 1 Outline MPEG-2 Advanced Audio Coding (AAC) MPEG-4 Extensions: Perceptual Noise Substitution

CISC 7610 Lecture 3 Multimedia data and data formats

CISC 7610 Lecture 3 Multimedia data and data formats CISC 7610 Lecture 3 Multimedia data and data formats Topics: Perceptual limits of multimedia data JPEG encoding of images MPEG encoding of audio MPEG and H.264 encoding of video Multimedia data: Perceptual

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio: Audio Compression Audio Compression CD quality audio: Sampling rate = 44 KHz, Quantization = 16 bits/sample Bit-rate = ~700 Kb/s (1.41 Mb/s if 2 channel stereo) Telephone-quality speech Sampling rate =

Subjective Audiovisual Quality in Mobile Environment

Subjective Audiovisual Quality in Mobile Environment Vienna University of Technology Faculty of Electrical Engineering and Information Technology Institute of Communications and Radio-Frequency Engineering Master of Science Thesis Subjective Audiovisual

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Audio Fundamentals, Compression Techniques & Standards Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Outlines Audio Fundamentals Sampling, digitization, quantization μ-law

Wavelet filter bank based wide-band audio coder

Wavelet filter bank based wide-band audio coder Wavelet filter bank based wide-band audio coder J. Nováček Czech Technical University, Faculty of Electrical Engineering, Technicka 2, 16627 Prague, Czech Republic novacj1@fel.cvut.cz 3317 New system for

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION Armin Taghipour 1, Maneesh Chandra Jaikumar 2, and Bernd Edler 1 1 International Audio Laboratories Erlangen, Am Wolfsmantel

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015 AUDIO Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015 Key objectives How do humans generate and process sound? How does digital sound work? How fast do I have to sample audio?

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings MPEG-1 Overview of MPEG-1 1 Standard Introduction to perceptual and entropy codings Contents History Psychoacoustics and perceptual coding Entropy coding MPEG-1 Layer I/II Layer III (MP3) Comparison and

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 6 Audio Retrieval 6 Audio Retrieval 6.1 Basics of

1 Introduction. 2 Speech Compression

1 Introduction. 2 Speech Compression Abstract In this paper, the effect of MPEG audio compression on HMM-based speech synthesis is studied. Speech signals are encoded with various compression rates and analyzed using the GlottHMM vocoder.

Transporting audio-video. over the Internet

Transporting audio-video. over the Internet Transporting audio-video over the Internet Key requirements Bit rate requirements Audio requirements Video requirements Delay requirements Jitter Inter-media synchronization On compression... TCP, UDP

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes: Page 1 of 8 1. SCOPE This Operational Practice sets out guidelines for minimising the various artefacts that may distort audio signals when low bit-rate coding schemes are employed to convey contribution

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING Pravin Ramadas, Ying-Yi Li, and Jerry D. Gibson Department of Electrical and Computer Engineering, University of California,

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model 1 M. Chinna Rao M.Tech,(Ph.D) Research scholar, JNTUK,kakinada chinnarao.mortha@gmail.com 2 Dr. A.V.S.N. Murthy Professor of Mathematics,

Rich Recording Technology Technical overall description

Rich Recording Technology Technical overall description Rich Recording Technology Technical overall description Ari Koski Nokia with Windows Phones Product Engineering/Technology Multimedia/Audio/Audio technology management 1 Nokia's Rich Recording technology

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing. SAOC and USAC Spatial Audio Object Coding / Unified Speech and Audio Coding Lecture Audio Coding WS 2013/14 Dr.-Ing. Andreas Franck Fraunhofer Institute for Digital Media Technology IDMT, Germany SAOC

Parametric Coding of Spatial Audio

Parametric Coding of Spatial Audio Parametric Coding of Spatial Audio Ph.D. Thesis Christof Faller, September 24, 2004 Thesis advisor: Prof. Martin Vetterli Audiovisual Communications Laboratory, EPFL Lausanne Parametric Coding of Spatial

MP3. Panayiotis Petropoulos

MP3. Panayiotis Petropoulos MP3 By Panayiotis Petropoulos Overview Definition History MPEG standards MPEG 1 / 2 Layer III Why audio compression through Mp3 is necessary? Overview MPEG Applications Mp3 Devices Mp3PRO Conclusion Definition

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS Technical PapER Extended HE-AAC Bridging the gap between speech and audio coding One codec taking the place of two; one unified system bridging a troublesome gap. The fifth generation MPEG audio codec

Effect of MPEG Audio Compression on HMM-based Speech Synthesis

Effect of MPEG Audio Compression on HMM-based Speech Synthesis Effect of MPEG Audio Compression on HMM-based Speech Synthesis Bajibabu Bollepalli 1, Tuomo Raitio 2, Paavo Alku 2 1 Department of Speech, Music and Hearing, KTH, Stockholm, Sweden 2 Department of Signal

Speech-Coding Techniques. Chapter 3

Speech-Coding Techniques. Chapter 3 Speech-Coding Techniques Chapter 3 Introduction Efficient speech-coding techniques Advantages for VoIP Digital streams of ones and zeros The lower the bandwidth, the lower the quality RTP payload types

Audio and video compression

Audio and video compression Audio and video compression 4.1 introduction Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and

S.K.R Engineering College, Chennai, India. 1 2

S.K.R Engineering College, Chennai, India. 1 2 Implementation of AAC Encoder for Audio Broadcasting A.Parkavi 1, T.Kalpalatha Reddy 2. 1 PG Scholar, 2 Dean 1,2 Department of Electronics and Communication Engineering S.K.R Engineering College, Chennai,

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Dr. Jürgen Herre 11/07 Page 1 Jürgen Herre für (IIS) Erlangen, Germany Introduction: Sound Images? Humans

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06 Fundamentals of Perceptual Audio Encoding Craig Lewiston HST.723 Lab II 3/23/06 Goals of Lab Introduction to fundamental principles of digital audio & perceptual audio encoding Learn the basics of psychoacoustic

CSCD 443/533 Advanced Networks Fall 2017

CSCD 443/533 Advanced Networks Fall 2017 CSCD 443/533 Advanced Networks Fall 2017 Lecture 18 Compression of Video and Audio 1 Topics Compression technology Motivation Human attributes make it possible Audio Compression Video Compression Performance

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC29/WG11 N15071 February 2015, Geneva,

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Nobuhiko Kitawaki University of Tsukuba 1-1-1, Tennoudai, Tsukuba-shi, 305-8573 Japan. E-mail: kitawaki@cs.tsukuba.ac.jp

Digital Image Processing

Digital Image Processing Digital Image Processing Fundamentals of Image Compression DR TANIA STATHAKI READER (ASSOCIATE PROFESSOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE LONDON Compression New techniques have led to the development

Digital Recording and Playback

Digital Recording and Playback Digital Recording and Playback Digital recording is discrete a sound is stored as a set of discrete values that correspond to the amplitude of the analog wave at particular times Source: http://www.cycling74.com/docs/max5/tutorials/msp-tut/mspdigitalaudio.html

Digital Speech Coding

Digital Speech Coding Digital Speech Processing David Tipper Associate Professor Graduate Program of Telecommunications and Networking University of Pittsburgh Telcom 2700/INFSCI 1072 Slides 7 http://www.sis.pitt.edu/~dtipper/tipper.html

Networking Applications

Networking Applications Networking Dr. Ayman A. Abdel-Hamid College of Computing and Information Technology Arab Academy for Science & Technology and Maritime Transport Multimedia Multimedia 1 Outline Audio and Video Services

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman DSP The Technology Presented to the IEEE Central Texas Consultants Network by Sergio Liberman Abstract The multimedia products that we enjoy today share a common technology backbone: Digital Signal Processing

Quality Aspects in Digital Broadcasting and Webcasting Systems: Bitrate versus Loudness

Quality Aspects in Digital Broadcasting and Webcasting Systems: Bitrate versus Loudness Paper Quality Aspects in Digital Broadcasting and Webcasting Systems: Bitrate versus Loudness Przemysław Gilski, Sławomir Gajewski, and Jacek Stefański Faculty of Electronics, Telecommunications and Informatics,

ITNP80: Multimedia! Sound-II!

ITNP80: Multimedia! Sound-II! Sound compression (I) Compression of sound data requires different techniques from those for graphical data Requirements are less stringent than for video data rate for CD-quality audio is much less than

MPEG-4 General Audio Coding

MPEG-4 General Audio Coding MPEG-4 General Audio Coding Jürgen Herre Fraunhofer Institute for Integrated Circuits (IIS) Dr. Jürgen Herre, hrr@iis.fhg.de 1 General Audio Coding Solid state players, Internet audio, terrestrial and

3 Sound / Audio. CS 5513 Multimedia Systems Spring 2009 LECTURE. Imran Ihsan Principal Design Consultant

3 Sound / Audio. CS 5513 Multimedia Systems Spring 2009 LECTURE. Imran Ihsan Principal Design Consultant LECTURE 3 Sound / Audio CS 5513 Multimedia Systems Spring 2009 Imran Ihsan Principal Design Consultant OPUSVII www.opuseven.com Faculty of Engineering & Applied Sciences 1. The Nature of Sound Sound is

Ch. 5: Audio Compression Multimedia Systems

Ch. 5: Audio Compression Multimedia Systems Ch. 5: Audio Compression Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Chapter 5: Audio Compression 1 Introduction Need to code digital

Lecture #3: Digital Music and Sound

Lecture #3: Digital Music and Sound Lecture #3: Digital Music and Sound CS106E Spring 2018, Young In this lecture we take a look at how computers represent music and sound. One very important concept we'll come across when studying digital

Perceptual Quality Measurement and Control: Definition, Application and Performance

Perceptual Quality Measurement and Control: Definition, Application and Performance Perceptual Quality Measurement and Control: Definition, Application and Performance A. R. Prasad, R. Esmailzadeh, S. Winkler, T. Ihara, B. Rohani, B. Pinguet and M. Capel Genista Corporation Tokyo, Japan

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Patrick Brown EE382C Embedded Software Systems May 10, 2000 Abstract MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio.

Does your Voice Quality Monitoring Measure Up?

Does your Voice Quality Monitoring Measure Up? Does your Voice Quality Monitoring Measure Up? Measure voice quality in real time Today's voice quality monitoring tools can give misleading results. This means that service providers are not getting a

MAXIMIZING AUDIOVISUAL QUALITY AT LOW BITRATES

MAXIMIZING AUDIOVISUAL QUALITY AT LOW BITRATES MAXIMIZING AUDIOVISUAL QUALITY AT LOW BITRATES Stefan Winkler Genista Corporation Rue du Theâtre 5 1 Montreux, Switzerland stefan.winkler@genista.com Christof Faller Audiovisual Communications Lab Ecole

For Mac and iPhone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

For Mac and iPhone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer For Mac and iPhone James McCartney Core Audio Engineer Eric Allamanche Core Audio Engineer 2 3 James McCartney Core Audio Engineer 4 Topics About audio representation formats Converting audio Processing

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform by Romain Pagniez romain@felinewave.com A Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science

End-to-end speech and audio quality evaluation of networks using AQuA - competitive alternative for PESQ (P.862) Endre Domiczi Sevana Oy

End-to-end speech and audio quality evaluation of networks using AQuA - competitive alternative for PESQ (P.862) Endre Domiczi Sevana Oy End-to-end speech and audio quality evaluation of networks using AQuA - competitive alternative for PESQ (P.862) Endre Domiczi Sevana Oy Overview Significance of speech and audio quality Problems with

And you thought we were famous

And you thought we were famous And you thought we were famous for our cassette recorders! Pro-Installation Solid-State Recorder PMD570 For years, Marantz Professional has led the industry in the recording and gathering of audio for

Digital Media. Daniel Fuller ITEC 2110

Digital Media. Daniel Fuller ITEC 2110 Digital Media Daniel Fuller ITEC 2110 Daily Question: Digital Audio What values contribute to the file size of a digital audio file? Email answer to DFullerDailyQuestion@gmail.com Subject Line: ITEC2110-09

Data Compression. Audio compression

Data Compression. Audio compression 1 Data Compression Audio compression Outline Basics of Digital Audio 2 Introduction What is sound? Signal-to-Noise Ratio (SNR) Digitization Filtering Sampling and Nyquist Theorem Quantization Synthetic

Fundamental of Digital Media Design. Introduction to Audio

Fundamental of Digital Media Design. Introduction to Audio Fundamental of Digital Media Design Introduction to Audio by Noraniza Samat Faculty of Computer Systems & Software Engineering noraniza@ump.edu.my OER Fundamental of Digital Media Design by Noraniza Samat

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1) Multimedia Video Coding & Architectures (5LSE0), Module 01 Introduction to coding aspects 1 Lecture Information Lecturer Prof.dr.ir. Peter H.N. de With Faculty Electrical Engineering, University Technology

Lecture Information Multimedia Video Coding & Architectures

Lecture Information Multimedia Video Coding & Architectures Multimedia Video Coding & Architectures (5LSE0), Module 01 Introduction to coding aspects 1 Lecture Information Lecturer Prof.dr.ir. Peter H.N. de With Faculty Electrical Engineering, University Technology

Voice Analysis for Mobile Networks

Voice Analysis for Mobile Networks White Paper VIAVI Solutions Voice Analysis for Mobile Networks Audio Quality Scoring Principals for Voice Quality of experience analysis for voice... 3 Correlating MOS ratings to network quality of service...

Missing Frame Recovery Method for G.723.1 Based on Neural Networks

Missing Frame Recovery Method for G.723.1 Based on Neural Networks Missing Frame Recovery Method for G.723.1 Based on Neural Networks JARI TURUNEN & PEKKA LOULA Information Technology, Pori Tampere University of Technology Pohjoisranta 11, POBox 300, FIN-28101 Pori FINLAND

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

Evaluation of VoIP Speech Quality Using Neural Network

Evaluation of VoIP Speech Quality Using Neural Network Journal of Communication and Computer 12 (2015) 237-243 doi: 10.17265/1548-7709/2015.05.003 D DAVID PUBLISHING Evaluation of VoIP Speech Quality Using Neural Network Angel Garabitov and Aleksandar Tsenov

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia? Multimedia What is multimedia? Media types +Text + Graphics + Audio +Image +Video Interchange formats What is multimedia? Multimedia = many media User interaction = interactivity Script = time 1 2 Most

3G Services Present New Challenges For Network Performance Evaluation

3G Services Present New Challenges For Network Performance Evaluation 3G Services Present New Challenges For Network Performance Evaluation 2004-29-09 1 Outline Synopsis of speech, audio, and video quality evaluation metrics Performance evaluation challenges related to 3G

Lossy compression. CSCI 470: Web Science Keith Vertanen

Lossy compression. CSCI 470: Web Science Keith Vertanen Lossy compression CSCI 470: Web Science Keith Vertanen Digital audio Overview Sampling rate Quantization MPEG audio layer 3 (MP3) JPEG still images Color space conversion, downsampling Discrete Cosine Transform

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends Contents List of Figures List of Tables Contributing Authors xiii xxi xxiii Introduction Karlheinz Brandenburg and Mark Kahrs xxix 1 Audio quality determination based on perceptual measurement techniques

White Paper Voice Quality Sound design is an art form at Snom and is at the core of our development utilising some of the world's most advance voice

White Paper Voice Quality Sound design is an art form at Snom and is at the core of our development utilising some of the world's most advance voice White Paper Voice Quality Sound design is an art form at and is at the core of our development utilising some of the world's most advance voice quality engineering tools White Paper - Audio Quality Table

DRA AUDIO CODING STANDARD

DRA AUDIO CODING STANDARD Applied Mechanics and Materials Online: 2013-06-27 ISSN: 1662-7482, Vol. 330, pp 981-984 doi:10.4028/www.scientific.net/amm.330.981 2013 Trans Tech Publications, Switzerland DRA AUDIO CODING STANDARD Wenhua

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1 Multimedia What is multimedia? Media types + Text +Graphics +Audio +Image +Video Interchange formats Petri Vuorimaa 1 What is multimedia? Multimedia = many media User interaction = interactivity Script
