On Improving the Performance of an ACELP Speech Coder


ARI HEIKKINEN, SAMULI PIETILÄ, VESA T. RUOPPILA, AND SAKARI HIMANEN
Nokia Research Center, Speech and Audio Systems Laboratory
P.O. Box, FIN-337 Tampere, Finland

Abstract: - In this paper we evaluate the performance of a variety of techniques to improve the parameter analysis in CELP speech coders. These methods include using an extended cost horizon in the fixed codebook search process, as well as joint optimization and delayed decision coding of the adaptive and fixed codebook parameters. Based on our simulations for the IS- speech coder, substantial improvements in objective performance are achieved, especially by using delayed decision coding, while the subjective improvements are more marginal. This paper also presents the justification for efficient coding methods based on the distribution of adaptive and algebraic codebook indices in the modified IS- coders, and demonstrates the performance improvements achieved by using a shaped lattice structure and adaptive pulse positioning to encode the adaptive and algebraic codebook indices. While the simulations were made using the IS- speech coder or a modified version of it, the results and observations can be generalized to most ACELP and CELP coders. At lower bit rates the importance of each approach described in this paper is expected to increase.

Key-Words: - algebraic code excited linear prediction

1 Introduction
In recent years, code excited linear prediction (CELP) [1] has been the most popular approach for high quality speech coding at bit rates approximately above kbps. This is especially true for a derivative of CELP coders called algebraic CELP (ACELP), and different ACELP coders have been widely accepted in recent speech coding standardization processes in 3GPP, ITU-T, ETSI and TIA. One example of such a coder is the 7. kbps IS- speech coder adopted by TIA [2].
However, at bit rates below kbps the quality of CELP coders in general deteriorates rapidly, as is partly evidenced by the recent efforts in ITU-T to standardize a high quality kbps speech coder [3]. To improve the performance of CELP coders and simultaneously make them more amenable to lower bit rates, methods including relaxed waveform matching [4] and phase dispersion [5] have been suggested, which efficiently exploit the properties of the human speech perception mechanism. On the other hand, to tackle the limitations concerning the parameter analysis in CELP coding, extended cost horizon [6], joint optimization [7] and delayed decision coding [8] have been proposed. For increased coding efficiency, methods exploiting the uneven distribution of the excitation parameters of a CELP coder have also been presented, see e.g. [9, 10, 11].

In this paper, we evaluate the performance of using extended cost horizon, joint estimation and delayed decision coding in the excitation search process of the IS- speech coder. Furthermore, the justification for the enhanced methods employing the uneven distribution of the adaptive and algebraic codebook indices is given, together with the simulation results of the proposed approaches for their efficient coding.

This paper is organized as follows. In Section 2 the structure of the IS- speech coder is briefly described. The simulation results for using the extended cost horizon, joint optimization and delayed decision coding are presented in Section 3. The empirically found distributions for the adaptive and algebraic codebook indices are shown in Section 4. The concepts of shaped lattice and adaptive pulse positioning for efficient coding of adaptive and fixed codebook indices are also briefly described, together with the simulation results. Finally, conclusions are drawn.

V.T. Ruoppila is presently with VoiceAge Corp. in Montreal, Canada. S. Pietilä is presently with Nokia Mobile Phones in Tampere, Finland.
2 IS- Speech Coder
In the ACELP speech coder, a cascade of a time-variant pitch predictor and a linear prediction (LP) filter is used to filter an excitation signal, see Fig. 1.

Fig. 1. Block diagrams of ACELP encoder (left) and decoder (right).

An all-pole LP filter

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_p z^{-p}} \qquad (1)$$

where $a_1, \dots, a_p$ are the coefficients, is used to model the short-time spectral envelope of the speech signal. A pitch predictor of the form

$$\frac{1}{B(z)} = \frac{1}{1 - b z^{-\tau}} \qquad (2)$$

utilizes the pitch periodicity of speech to model the fine structure of the spectrum. The gain $b$ is bounded to the interval of -., and the pitch period $\tau$, also called the pitch lag, to the interval of -3 samples (the sampling frequency is 8 kHz). The pitch predictor is also referred to as the long-term predictor (LTP) filter. In Fig. 1, the LTP filter is represented by the feedback loop consisting of the delay $z^{-\tau}$ and the gain $b$. The LTP memory can also be seen as a codebook consisting of overlapping codevectors. This codebook is usually referred to as the LTP or adaptive codebook.

An algebraic excitation signal, or more generally a fixed excitation signal, $u_c(n)$ is multiplied by a gain $g$ to form an input signal to the filter cascade. The algebraic excitation signal is composed of pulses having a value of ±1 and zeros, and the corresponding codebook is called the algebraic codebook. The output of the filter cascade is a synthesized speech signal $\hat{s}(n)$. An error signal $e(n)$ is computed by subtracting the synthesized speech signal $\hat{s}(n)$ from the original speech signal $s(n)$. The optimal adaptive and algebraic codevectors are selected sequentially by minimizing the weighted sum-squared error. The purpose of the weighting filter $W(z)$ is to shape the spectrum of the error signal so that it is less audible.

The frame length used in the IS- coder is ms, and a frame is further divided into four subframes of equal length. One set of LP coefficients is derived for each frame and it is encoded with bits. The other parameters are derived subframe-wise.
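The synthesis path described above (gain-scaled algebraic excitation, LTP feedback loop, all-pole LP filter) can be illustrated with a minimal sketch. It is not the IS- reference implementation: the function name and arguments are hypothetical, only integer pitch lags are modelled, the postfilter is omitted, and the LP polynomial is assumed to have the sign convention $A(z) = 1 + \sum_i a_i z^{-i}$.

```python
def synthesize_subframe(code, g, b, lag, past_exc, a, past_out):
    """Decode one subframe: excitation -> LTP loop -> LP synthesis 1/A(z).

    code      algebraic codevector u_c(n) with entries in {-1, 0, +1}
    g, b      algebraic codebook gain and pitch (adaptive codebook) gain
    lag       integer pitch lag in samples (fractional lags are not modelled)
    past_exc  earlier excitation samples, most recent last (len >= lag)
    a         LP coefficients a_1..a_p, ASSUMING A(z) = 1 + sum_i a_i z^-i
    past_out  earlier synthesized samples, most recent last (len >= p)
    """
    exc = list(past_exc)
    out = list(past_out)
    n0 = len(exc)
    for n, c in enumerate(code):
        # LTP feedback loop: add the excitation delayed by the pitch lag.
        u = g * c + b * exc[n0 + n - lag]
        exc.append(u)
        # All-pole LP synthesis: s(n) = u(n) - sum_i a_i * s(n - i).
        s = u - sum(a_i * out[-i] for i, a_i in enumerate(a, start=1))
        out.append(s)
    return exc[n0:], out[len(past_out):]


# Toy call: a single +1 pulse, pitch lag of 2 samples, first-order LP filter.
excitation, speech = synthesize_subframe(
    [1, 0, 0, 0], g=1.0, b=0.5, lag=2,
    past_exc=[0.0, 0.0], a=[-0.5], past_out=[0.0])
```

Because the excitation buffer grows inside the loop, a lag shorter than the subframe feeds back samples of the current subframe, which is the behaviour of the recursive LTP filter rather than of a fixed codebook lookup.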
The pitch lag is encoded by bits (8585) while 8 bits ( 7) are used to code the pulse positions together with their signs. The pitch gain and the algebraic codebook gains are vector quantized by 8 bits ( 7). The decoder receives the parameters from the channel, see Fig. 1, and determines the algebraic excitation signal from the received index and gain. The algebraic excitation signal is filtered through the LTP-LP filter cascade to produce the synthesized speech signal. Finally, a postfilter P(z) is employed to enhance the perceptual speech quality.

3 Modified Parameter Analysis
In a typical CELP coder, there are two important limitations in the parameter estimation process, both of which can partly be justified by reduced complexity. First, the parameters are optimized sequentially rather than jointly. Second, the cost function used to find the excitation signals (adaptive and fixed) minimizes the sum-squared error within the current subframe, but does not take into account the effect that the excitation signal has on the subsequent subframes. One result of subframe-based error minimization is that, due to LP filtering, the excitation samples at the first positions of the subframe contribute more to the cost function than the samples at the last positions.

To alleviate these problems, it has been proposed in [6] that the cost function of the fixed codebook search be extended to cover the beginning of the next subframe. In the presented approach, the target signal and the synthesized speech signal are extended by concatenating their free evolutions (the output for zero-valued excitation) to the original signals. In [7], the adaptive and fixed codebook parameters were searched jointly instead of sequentially. A solution described in [8] is the delayed decision method, where a predetermined number of fixed and adaptive codebook parameter candidates are chosen for each subframe in the current frame. After the last subframe, the parameter combination that gives the best total performance over the whole frame is chosen. The advantages of this approach include simultaneous optimization of the adaptive and fixed codebook excitation parameters, as well as taking into account the influence of the current subframe parameters on the successive subframes.

In delayed decision coding various kinds of tree coding algorithms can be used, which are mainly classified by the decision timing. In the first of the two most typical methods, a decision is made simultaneously for all subframes in a frame by selecting the best path in the tree. In the other widely used method, the decision is made for each subframe s by considering the cumulative distortion from the sth to the (s+N)th subframe. In our simulations the second approach was used with N set to one, resulting in an additional coder delay of one frame. This delay is needed to determine the excitation parameters for the last subframe of the current frame. To evaluate the performance of the three methods described above, we implemented them in the IS- speech coder.

Fig. 2. Simulation results for joint optimization and delayed decision coding of adaptive and algebraic codebook parameters in the IS- speech coder. (Panels: whole speech, voiced speech, unvoiced speech; SegSNR versus Max{NUM_ALG, NUM_ADA}; curves for joint optimization and for delayed decision configurations.)
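As an illustration of the tree search idea, the following sketch implements the first variant mentioned above (one decision for the whole frame, taken after the last subframe) with pruning to a fixed number of surviving paths. It is not the configuration used in the simulations, which decided per subframe with a lookahead of N = 1. The function names, the `beam` parameter and the toy distortion values are hypothetical.

```python
def delayed_decision(num_subframes, candidates, distortion, beam):
    """Keep the `beam` best partial paths per subframe, decide at frame end.

    candidates(s, path)         -> iterable of parameter choices for subframe s
    distortion(s, path, choice) -> error added by `choice`, given earlier choices
    Returns (total distortion, tuple of chosen parameters).
    """
    paths = [(0.0, ())]  # (cumulative distortion, choices so far)
    for s in range(num_subframes):
        extended = [
            (cost + distortion(s, path, c), path + (c,))
            for cost, path in paths
            for c in candidates(s, path)
        ]
        extended.sort(key=lambda item: item[0])
        paths = extended[:beam]  # prune to limit the explosion of paths
    return paths[0]


# Toy model (hypothetical numbers): choice 0 looks best in subframe 0,
# but it makes subframe 1 expensive.
def toy_candidates(s, path):
    return (0, 1)

def toy_distortion(s, path, c):
    if s == 0:
        return 1.0 if c == 0 else 2.0
    return (5.0 if path[-1] == 0 else 1.0) + c

sequential = delayed_decision(2, toy_candidates, toy_distortion, beam=1)
delayed = delayed_decision(2, toy_candidates, toy_distortion, beam=2)
```

With `beam=1` the search degenerates to the usual sequential, subframe-by-subframe decision and ends with a total distortion of 6.0 in this toy model; with `beam=2` the initially worse candidate survives and the total drops to 3.0.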
Based on our simulation results, a maximum increase of. dB in segmental SNR was achieved by using the extended cost horizon approach for the algebraic codebook search. This improvement was achieved with an extension length of eight samples, while the other extension lengths in the range of - samples performed approximately.-. dB better than the original coder. In general, the improvements were bigger for voiced than for unvoiced speech. In computing the extended excitation signal, no pitch sharpening was applied to the extended algebraic excitation segment.

In Fig. 2, the simulation results for different delayed decision configurations in the IS- speech coder are shown. In the figure, the numbers of adaptive and algebraic codebook parameter sets derived at each stage are denoted by NUM_ADA and NUM_ALG, respectively. The explosion of the number of paths was restricted by considering only the NUM_ADA NUM_ALG best candidates at each stage in the tree. Unquantized gain values were used in the simulations. In addition to the different delayed decision configurations, the performance of joint optimization of the adaptive and algebraic codebook parameters within each subframe is illustrated in Fig. 2.

As can be observed from Fig. 2, clear improvements in segmental SNR can be achieved by using delayed decision coding. Joint optimization of the adaptive and algebraic codebook excitation parameters also brings improvements, although better performance is achieved by delayed decision coding. In informal listening experiments, the improvements achieved by all tested methods were judged to be rather marginal. At lower bit rates, however, the subjective importance of these methods is expected to be higher.

Fig. 3. The differences between successive pitch periods in the modified IS- speech coder.

Fig. 4. A three-dimensional lattice for delta periods in the modified IS- speech coder.

4 Distribution of Codebook Indices

4.1 Adaptive Codebook Indices
In the IS- speech coder, the smooth evolution of the pitch contour during voiced speech is exploited by using differential coding for every other pitch value. The absolute pitch period is searched from the range of 9 / 3-3 samples for the first and third subframes. In the range of 9 / 3-8 / 3 samples, a resolution of /3 is used, while integer values are used in the range of 85-3 samples. For the second and fourth subframes the pitch periods are searched from the neighborhood of the pitch period in the previous subframe. The range of the search for the delta pitch periods is - / 3 to 5 / 3 samples using a resolution of /3.

Generally speaking, coding of n successive delta pitch periods can be described as an n-dimensional lattice where each dimension represents a pitch period in a corresponding subframe [11]. In typical lattice coding of delta periods, attention is paid only to the selection of the boundary values while the rectangular shape of the lattice is maintained. No further care is taken to choose a set of points that covers only the most likely combinations. Since the pitch period usually evolves smoothly during voiced speech, the rectangular lattice also covers points that are rarely used. Thus, the coding efficiency can be increased by shaping the lattice to eliminate unlikely pitch period combinations from the resulting coding scheme.
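The bit saving obtained by discarding unlikely combinations can be illustrated by counting lattice points. The sketch below is only a toy comparison: delta periods are expressed in hypothetical integer resolution steps, and the shaping rule (a bound on the summed absolute deltas) is an assumption made for illustration, not a lattice derived from measured distributions.

```python
from itertools import product
from math import ceil, log2


def lattice_bits(points):
    """Bits needed to index every point of a lattice with a fixed-length code."""
    return ceil(log2(len(points)))


def cubic_lattice(max_steps, dims=3):
    """Rectangular lattice: every delta lies in [-max_steps, max_steps] steps."""
    axis = range(-max_steps, max_steps + 1)
    return list(product(axis, repeat=dims))


def shaped_lattice(max_steps, budget, dims=3):
    """HYPOTHETICAL shaping rule: drop combinations whose summed |delta|
    exceeds `budget`, i.e. combinations with several large jumps."""
    return [p for p in cubic_lattice(max_steps, dims)
            if sum(abs(d) for d in p) <= budget]


cubic = cubic_lattice(6)        # 13 values per dimension, three dimensions
shaped = shaped_lattice(6, 6)   # same per-axis range, unlikely corners removed
bits_cubic = lattice_bits(cubic)
bits_shaped = lattice_bits(shaped)
```

In this toy setting the cubic lattice has 2197 points (12 bits) while the shaped subset has 377 points (9 bits), although both allow the same maximum delta along each single axis.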

In [11] we proposed a shaped lattice structure derived from the empirically found distribution of delta periods in a modified IS- coder. In the modified coder, the absolute pitch period is used only for the first subframe while delta pitch periods are used for the other subframes. The distribution of delta periods over a large database is shown in Fig. 3, where the difference between the pitch periods of the (i+1)th subframe and the ith subframe is denoted by $d_i$. The proposed shaped lattice is given in Fig. 4, and is composed of a union of non-overlapping hypercubes $D_i$, which are defined by the delta period range and the resolution used in each dimension. Different hypercubes are marked by the dashed lines in the figure, and can be defined by their unique edges. For example, the hypercube D is defined by the edges a, b and c in the figure.

The lattice structure used for the simulations was symmetric with respect to the axes $d_1$, $d_2$ and $d_3$. The point distribution in the last three dimensions was uniform and a resolution of /3 was used. Because of the symmetry, the three-dimensional lattice can be unambiguously defined by one corner point of the projection of D onto the axes $d_1$ and $d_2$, see Fig. 4. In the optimal index search from the lattice, a single open-loop pitch estimate was first derived jointly for the last three subframes. The closed-loop pitch was then derived from the neighborhood of the derived open-loop pitch.

In the simulations, three different shaped lattices $S_A$, $S_B$, and $S_C$ were implemented for the modified IS- coder with corner points ( / 3, / 3 ), ( / 3, / 3 ), and ( / 3, / 3 ), respectively. As a reference, two cubic lattices $L_A$ and $L_B$ with maximum delta periods of / 3 and / 3 were used. These ranges were selected based on the distributions presented in Fig. 3. The simulation results are presented in Table 1.
The results are expressed as segmental SNRs between the voiced sections of the prefiltered input speech and the synthesized and postfiltered speech, together with the number of bits needed for the coding of the delta periods in each frame. As can be seen from Table 1, the coding efficiency of successive pitch periods can be increased by using the shaped lattice structure.

Scheme                 SegSNR (dB)   Bits
Lattice L_A            8..
Lattice L_B
Shaped Lattice S_A
Shaped Lattice S_B     8..
Shaped Lattice S_C

Table 1. Segmental SNRs and the number of bits needed for different three-dimensional lattices.

4.2 Algebraic Codebook Indices
In low bit rate CELP coders, the target signal for the fixed codebook search is highly periodic due to the inability of the adaptive codebook to model the periodicity of input speech. In ACELP coders, periodicity is thus introduced to the algebraic excitation signal by the pitch sharpening procedure, where the gain-scaled algebraic excitation is repeated at the pitch interval. To further exploit the periodicity of the target signal, an adaptive algebraic codebook was presented in [10]. The presented approach was based on the assumption that the distribution of the pulses in the algebraic codebook is related to the locations of pitch pulses during voiced speech.

In our experiments, we first wanted to verify the assumption that pulses in the algebraic excitation are located in the vicinity of pitch pulses during voiced speech in the IS- coder. We first located the pitch pulses in the voiced regions of speech using the time domain energy contour of the LP residual signal. Subsequently, we encoded the same signal with a modified IS- coder. In the modification, all excitation pulse combinations instead of the tabulated positions were allowed in the coder in order to give more reliable results about the desired pulse positions. Finally, we compared the pitch pulse locations and the excitation pulse positions. Fig. 5 depicts the relative distribution of the excitation pulses with respect to the pitch pulse locations. As can be seen from the figure, the pitch pulse position and its vicinity clearly dominate the graph. In addition, it was observed in the experiments that positive pulses dominated this region over negative pulses.

Fig. 5. Histogram of excitation pulse locations with respect to pitch pulse locations. (Axes: percentage versus distance from closest pitch pulse in normalized pitch periods.)

Based on these observations, a simplistic approach derived from the one described in [10] was taken to generate an adaptive algebraic codebook for simulation purposes. In the original IS- coder, 7 bits are used to code four positive or negative pulses per subframe (indices,5,,35;,,,3;,7,,37; 3,,8,9,38,39). In our modification, we replaced the positions,9,,39 of the fourth pulse by adaptive locations centered on the largest energy peak of the adaptive codebook excitation, which typically indicates a pitch pulse. After this modification, an increase of. dB in segmental SNR during voiced speech was achieved compared to the original method. It should be noted that the improvements from adaptive pulse positioning are expected to be higher at lower bit rates due to the sparser algebraic codebook. Also, it is likely that further improvements can be achieved by using more sophisticated methods for defining the adaptive pulse positions.

5 Conclusion
In this paper the performance of different techniques to improve the parameter analysis in CELP speech coders was evaluated using the IS- speech coder as the simulation platform. The evaluated methods included using an extended cost horizon in the algebraic excitation search process, as well as joint optimization and delayed decision coding of the adaptive and algebraic codebook parameters. Also, justification for efficient coding methods based on the distribution of adaptive and algebraic codebook indices in the modified IS- coders was given, and the performance of the shaped lattice and adaptive pulse positioning for coding the codebook indices was demonstrated.

Based on the simulations, substantial improvements in objective performance are achieved, especially by using delayed decision coding, while the improvements in subjective speech quality were found to be more marginal. On the other hand, it is expected that higher subjective improvements are achieved with the described methods when the bit rate is lowered from around 7. kbps. While the simulations were made using the IS- speech coder or a modified version of it, the conclusions can be generalized to a majority of ACELP and CELP coders.

References:
[1] M.R. Schroeder and B.S. Atal, Code-excited linear prediction (CELP): high-quality speech at very low bit rates, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp , 1985.
[2] T. Honkanen, J. Vainio, K. Järvinen, P. Haavisto, R. Salami, C. Laflamme and J.-P. Adoul, Enhanced full rate speech codec for IS-3 digital cellular system, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp , 1997.
[3] ITU-T, Q./ Rapporteur's Meeting Report, September, 1999.
[4] W.B. Kleijn, P. Kroon and D. Nahumi, The RCELP speech coding algorithm, European Transactions on Telecommunications, Vol. 5, No. 5, pp , 99.
[5] R. Hagen, E. Ekudden, B. Johansson and W.B. Kleijn, Removal of sparse-excitation artifacts in CELP, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 5-8, 1998.
[6] S. Cucci, M. Fratti and M. Ronchi, On improving performance of analysis by synthesis speech coders, IEEE Transactions on Speech and Audio Processing, Vol., No. 3, pp. 3-7, 99.
[7] L. Zhang, T. Wang and V. Cuperman, A CELP variable rate speech codec with low average rate, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp , 1997.
[8] K. Mano and T. Moriya,.8 kbit/s delayed decision CELP coder using tree coding, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. -, 99.
[9] T. Eriksson and J. Sjöberg, Dynamic bit allocation in CELP excitation coding, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 7 7, 1993.
[10] T. Amada, K. Miseki and M. Akamine, CELP speech coding based on an adaptive pulse position codebook, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3-, 1999.
[11] A. Heikkinen, V.T. Ruoppila and S. Pietilä, A shaped lattice quantizer for successive pitch periods, Proceedings of EUROSPEECH, pp ,.


More information

The Steganography In Inactive Frames Of Voip

The Steganography In Inactive Frames Of Voip The Steganography In Inactive Frames Of Voip This paper describes a novel high-capacity steganography algorithm for embedding data in the inactive frames of low bit rate audio streams encoded by G.723.1

More information

Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7 10 Munich, Germany

Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7 10 Munich, Germany Audio Engineering Society Convention Paper Presented at the 126th Convention 2009 May 7 10 Munich, Germany 7712 The papers at this Convention have been selected on the basis of a submitted abstract and

More information

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015 AUDIO Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015 Key objectives How do humans generate and process sound? How does digital sound work? How fast do I have to sample audio?

More information

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

More information

ABSTRACT AUTOMATIC SPEECH CODEC IDENTIFICATION WITH APPLICATIONS TO TAMPERING DETECTION OF SPEECH RECORDINGS

ABSTRACT AUTOMATIC SPEECH CODEC IDENTIFICATION WITH APPLICATIONS TO TAMPERING DETECTION OF SPEECH RECORDINGS ABSTRACT Title of thesis: AUTOMATIC SPEECH CODEC IDENTIFICATION WITH APPLICATIONS TO TAMPERING DETECTION OF SPEECH RECORDINGS Jingting Zhou, Master of Engineering, 212 Thesis directed by: Professor Carol

More information

Real Time Implementation of TETRA Speech Codec on TMS320C54x

Real Time Implementation of TETRA Speech Codec on TMS320C54x Real Time Implementation of TETRA Speech Codec on TMS320C54x B. Sheetal Kiran, Devendra Jalihal, R. Aravind Department of Electrical Engineering, Indian Institute of Technology Madras Chennai 600 036 {sheetal,

More information

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 3, MARCH

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 3, MARCH IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 3, MARCH 2014 697 Cascaded Long Term Prediction for Enhanced Compression of Polyphonic Audio Signals Tejaswi Nanjundaswamy,

More information

Convention Paper 7215

Convention Paper 7215 Audio Engineering Society Convention Paper 7215 Presented at the 123rd Convention 2007 October 5 8 New York, NY, USA The papers at this Convention have been selected on the basis of a submitted abstract

More information

Joint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding

Joint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 12, NO. 3, MAY 2004 265 Joint Matrix Quantization of Face Parameters and LPC Coefficients for Low Bit Rate Audiovisual Speech Coding Laurent Girin

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

CT516 Advanced Digital Communications Lecture 7: Speech Encoder CT516 Advanced Digital Communications Lecture 7: Speech Encoder Yash M. Vasavada Associate Professor, DA-IICT, Gandhinagar 2nd February 2017 Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February

More information

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define

More information

Application of wavelet filtering to image compression

Application of wavelet filtering to image compression Application of wavelet filtering to image compression LL3 HL3 LH3 HH3 LH2 HL2 HH2 HL1 LH1 HH1 Fig. 9.1 Wavelet decomposition of image. Application to image compression Application to image compression

More information

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of

More information

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP TRADEOFF BETWEEN COMPLEXITY AND MEMORY SIZE IN THE 3GPP ENHANCED PLUS DECODER: SPEED-CONSCIOUS AND MEMORY- CONSCIOUS DECODERS ON A 16-BIT FIXED-POINT DSP Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama

More information

Optimal Estimation for Error Concealment in Scalable Video Coding

Optimal Estimation for Error Concealment in Scalable Video Coding Optimal Estimation for Error Concealment in Scalable Video Coding Rui Zhang, Shankar L. Regunathan and Kenneth Rose Department of Electrical and Computer Engineering University of California Santa Barbara,

More information

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

International Journal of Emerging Technology and Advanced Engineering Website:   (ISSN , Volume 2, Issue 4, April 2012) A Technical Analysis Towards Digital Video Compression Rutika Joshi 1, Rajesh Rai 2, Rajesh Nema 3 1 Student, Electronics and Communication Department, NIIST College, Bhopal, 2,3 Prof., Electronics and

More information

S.K.R Engineering College, Chennai, India. 1 2

S.K.R Engineering College, Chennai, India. 1 2 Implementation of AAC Encoder for Audio Broadcasting A.Parkavi 1, T.Kalpalatha Reddy 2. 1 PG Scholar, 2 Dean 1,2 Department of Electronics and Communication Engineering S.K.R Engineering College, Chennai,

More information

REAL-TIME DIGITAL SIGNAL PROCESSING

REAL-TIME DIGITAL SIGNAL PROCESSING REAL-TIME DIGITAL SIGNAL PROCESSING FUNDAMENTALS, IMPLEMENTATIONS AND APPLICATIONS Third Edition Sen M. Kuo Northern Illinois University, USA Bob H. Lee Ittiam Systems, Inc., USA Wenshun Tian Sonus Networks,

More information

A CUSTOM VLSI ARCHITECTURE FOR IMPLEMENTING LOW-DELAY ANALY SIS-BY-SYNTHESIS SPEECH CODING ALGORITHMS

A CUSTOM VLSI ARCHITECTURE FOR IMPLEMENTING LOW-DELAY ANALY SIS-BY-SYNTHESIS SPEECH CODING ALGORITHMS A CUSTOM VLSI ARCHITECTURE FOR IMPLEMENTING LOW-DELAY ANALY SIS-BY-SYNTHESIS SPEECH CODING ALGORITHMS Peter Dean Schuler B.A.Sc., Simon Fraser University, 1989 A THESIS SUBMlTT'ED IN PARTIAL FULFlLLMENT

More information

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony

Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Nobuhiko Kitawaki University of Tsukuba 1-1-1, Tennoudai, Tsukuba-shi, 305-8573 Japan. E-mail: kitawaki@cs.tsukuba.ac.jp

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

Parametric Coding of High-Quality Audio

Parametric Coding of High-Quality Audio Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits

More information

Ch. 5: Audio Compression Multimedia Systems

Ch. 5: Audio Compression Multimedia Systems Ch. 5: Audio Compression Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Chapter 5: Audio Compression 1 Introduction Need to code digital

More information

Dusseldorf, Germany Agenda item: th -20 th June, Status Report of SMG11 at SMG#32

Dusseldorf, Germany Agenda item: th -20 th June, Status Report of SMG11 at SMG#32 ETSI TC SMG#32 Tdoc SMG P-00-269 Dusseldorf, Germany Agenda item: 6.10 19 th -20 th June, 2000 Source: Chairman, SMG11 * Status Report of SMG11 at SMG#32 Executive Summary This document provides an overview

More information

Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols

Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols Lopamudra Roychoudhuri and Ehab S. Al-Shaer School of Computer Science, Telecommunications and Information Systems, DePaul University,

More information

DRA AUDIO CODING STANDARD

DRA AUDIO CODING STANDARD Applied Mechanics and Materials Online: 2013-06-27 ISSN: 1662-7482, Vol. 330, pp 981-984 doi:10.4028/www.scientific.net/amm.330.981 2013 Trans Tech Publications, Switzerland DRA AUDIO CODING STANDARD Wenhua

More information

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Efficient MPEG- to H.64/AVC Transcoding in Transform-domain Yeping Su, Jun Xin, Anthony Vetro, Huifang Sun TR005-039 May 005 Abstract In this

More information

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek

More information

Information technology MPEG audio technologies Part 3: Unified speech and audio coding

Information technology MPEG audio technologies Part 3: Unified speech and audio coding INTERNATIONAL STANDARD ISO/IEC 23003-3:2012 TECHNICAL CORRIGENDUM 3 Published 2015-04-01 Corrected version 2016-10-01 INTERNATIONAL ORGANIZATION FOR STANDARDIZATION МЕЖДУНАРОДНАЯ ОРГАНИЗАЦИЯ ПО СТАНДАРТИЗАЦИИ

More information

Synopsis of Basic VoIP Concepts

Synopsis of Basic VoIP Concepts APPENDIX B The Catalyst 4224 Access Gateway Switch (Catalyst 4224) provides Voice over IP (VoIP) gateway applications for a micro branch office. This chapter introduces some basic VoIP concepts. This chapter

More information

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR 2011-2012 / ODD SEMESTER QUESTION BANK SUB.CODE / NAME YEAR / SEM : IT1301 INFORMATION CODING TECHNIQUES : III / V UNIT -

More information

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

Hybrid Speech Synthesis

Hybrid Speech Synthesis Hybrid Speech Synthesis Simon King Centre for Speech Technology Research University of Edinburgh 2 What are you going to learn? Another recap of unit selection let s properly understand the Acoustic Space

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

Lecture 16 Perceptual Audio Coding

Lecture 16 Perceptual Audio Coding EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero

More information

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding Heiko Purnhagen, Bernd Edler University of AES 109th Convention, Los Angeles, September 22-25, 2000 1 Introduction: Parametric

More information

VoIP Forgery Detection

VoIP Forgery Detection VoIP Forgery Detection Satish Tummala, Yanxin Liu and Qingzhong Liu Department of Computer Science Sam Houston State University Huntsville, TX, USA Emails: sct137@shsu.edu; yanxin@shsu.edu; liu@shsu.edu

More information

Blind Measurement of Blocking Artifact in Images

Blind Measurement of Blocking Artifact in Images The University of Texas at Austin Department of Electrical and Computer Engineering EE 38K: Multidimensional Digital Signal Processing Course Project Final Report Blind Measurement of Blocking Artifact

More information

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

Context based optimal shape coding

Context based optimal shape coding IEEE Signal Processing Society 1999 Workshop on Multimedia Signal Processing September 13-15, 1999, Copenhagen, Denmark Electronic Proceedings 1999 IEEE Context based optimal shape coding Gerry Melnikov,

More information

Text-Independent Speaker Identification

Text-Independent Speaker Identification December 8, 1999 Text-Independent Speaker Identification Til T. Phan and Thomas Soong 1.0 Introduction 1.1 Motivation The problem of speaker identification is an area with many different applications.

More information

An Iterative Joint Codebook and Classifier Improvement Algorithm for Finite- State Vector Quantization

An Iterative Joint Codebook and Classifier Improvement Algorithm for Finite- State Vector Quantization An Iterative Joint Codebook and Classifier Improvement Algorithm for Finite- State Vector Quantization Keren 0. Perlmutter Sharon M. Perlmutter Michelle Effrost Robert M. Gray Information Systems Laboratory

More information

AUDIOVISUAL COMMUNICATION

AUDIOVISUAL COMMUNICATION AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis

More information

Spectral modeling of musical sounds

Spectral modeling of musical sounds Spectral modeling of musical sounds Xavier Serra Audiovisual Institute, Pompeu Fabra University http://www.iua.upf.es xserra@iua.upf.es 1. Introduction Spectral based analysis/synthesis techniques offer

More information

Module 8: Video Coding Basics Lecture 42: Sub-band coding, Second generation coding, 3D coding. The Lecture Contains: Performance Measures

Module 8: Video Coding Basics Lecture 42: Sub-band coding, Second generation coding, 3D coding. The Lecture Contains: Performance Measures The Lecture Contains: Performance Measures file:///d /...Ganesh%20Rana)/MY%20COURSE_Ganesh%20Rana/Prof.%20Sumana%20Gupta/FINAL%20DVSP/lecture%2042/42_1.htm[12/31/2015 11:57:52 AM] 3) Subband Coding It

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb1. Subjective

More information

Rate Distortion Optimization in Video Compression

Rate Distortion Optimization in Video Compression Rate Distortion Optimization in Video Compression Xue Tu Dept. of Electrical and Computer Engineering State University of New York at Stony Brook 1. Introduction From Shannon s classic rate distortion

More information

Module 7 VIDEO CODING AND MOTION ESTIMATION

Module 7 VIDEO CODING AND MOTION ESTIMATION Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five

More information

On the Importance of a VoIP Packet

On the Importance of a VoIP Packet On the Importance of a VoIP Packet Christian Hoene, Berthold Rathke, Adam Wolisz Technical University of Berlin hoene@ee.tu-berlin.de Abstract If highly compressed multimedia streams are transported over

More information

Stereo Image Compression

Stereo Image Compression Stereo Image Compression Deepa P. Sundar, Debabrata Sengupta, Divya Elayakumar {deepaps, dsgupta, divyae}@stanford.edu Electrical Engineering, Stanford University, CA. Abstract In this report we describe

More information

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,

More information

Audio and video compression

Audio and video compression Audio and video compression 4.1 introduction Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and

More information

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames Ki-Kit Lai, Yui-Lam Chan, and Wan-Chi Siu Centre for Signal Processing Department of Electronic and Information Engineering

More information

Robust Shape Retrieval Using Maximum Likelihood Theory

Robust Shape Retrieval Using Maximum Likelihood Theory Robust Shape Retrieval Using Maximum Likelihood Theory Naif Alajlan 1, Paul Fieguth 2, and Mohamed Kamel 1 1 PAMI Lab, E & CE Dept., UW, Waterloo, ON, N2L 3G1, Canada. naif, mkamel@pami.uwaterloo.ca 2

More information

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved. Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn t caught on yet Consumers don t seem to care about high fidelity

More information

Chapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution

Chapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution Chapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution Sangita Roy, Dola B. Gupta, Sheli Sinha Chaudhuri and P. K. Banerjee Abstract In the last

More information

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010 ROW.mp3 Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010 Motivation The realities of mp3 widespread use low quality vs. bit rate when compared to modern codecs Vision for row-mp3 backwards

More information

Multimedia Communications. Audio coding

Multimedia Communications. Audio coding Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated

More information

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy JPEG JPEG Joint Photographic Expert Group Voted as international standard in 1992 Works with color and grayscale images, e.g., satellite, medical,... Motivation: The compression ratio of lossless methods

More information

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad

More information

signal-to-noise ratio (PSNR), 2

signal-to-noise ratio (PSNR), 2 u m " The Integration in Optics, Mechanics, and Electronics of Digital Versatile Disc Systems (1/3) ---(IV) Digital Video and Audio Signal Processing ƒf NSC87-2218-E-009-036 86 8 1 --- 87 7 31 p m o This

More information

A Synchronization Scheme for Hiding Information in Encoded Bitstream of Inactive Speech Signal

A Synchronization Scheme for Hiding Information in Encoded Bitstream of Inactive Speech Signal Journal of Information Hiding and Multimedia Signal Processing c 2016 ISSN 2073-4212 Ubiquitous International Volume 7, Number 5, September 2016 A Synchronization Scheme for Hiding Information in Encoded

More information

Presents 2006 IMTC Forum ITU-T T Workshop

Presents 2006 IMTC Forum ITU-T T Workshop Presents 2006 IMTC Forum ITU-T T Workshop G.729EV: An 8-32 kbit/s scalable wideband speech and audio coder bitstream interoperable with G.729 Presented by Christophe Beaugeant On behalf of ETRI, France

More information

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes: Page 1 of 8 1. SCOPE This Operational Practice sets out guidelines for minimising the various artefacts that may distort audio signals when low bit-rate coding schemes are employed to convey contribution

More information