On Improving the Performance of an ACELP Speech Coder
ARI HEIKKINEN, SAMULI PIETILÄ, VESA T. RUOPPILA, AND SAKARI HIMANEN
Nokia Research Center, Speech and Audio Systems Laboratory
P.O. Box, FIN-337 Tampere, Finland

Abstract: - In this paper we evaluate the performance of a variety of techniques for improving the parameter analysis in CELP speech coders. These methods include an extended cost horizon in the fixed codebook search, as well as joint optimization and delayed decision coding of the adaptive and fixed codebook parameters. Based on our simulations with the IS-641 speech coder, substantial improvements in objective performance are achieved, especially with delayed decision coding, while the subjective improvements are more marginal. This paper also presents the justification for efficient coding methods based on the distribution of the adaptive and algebraic codebook indices in modified IS-641 coders, and demonstrates the performance improvements achieved by using a shaped lattice structure and adaptive pulse positioning to encode the adaptive and algebraic codebook indices, respectively. Although the simulations were made using the IS-641 speech coder or a modified version of it, the results and observations can be generalized to most ACELP and CELP coders. At lower bit rates, the importance of each approach described in this paper is expected to increase.

Key-Words: - algebraic code excited linear prediction

1 Introduction
In recent years, code excited linear prediction (CELP) [1] has been the most popular approach for high-quality speech coding at bit rates above approximately 4 kbps. This is especially true for a derivative of CELP called algebraic CELP (ACELP); different ACELP coders have been widely accepted in recent speech coding standardization processes in 3GPP, ITU-T, ETSI and TIA. One example of such a coder is the 7.4 kbps IS-641 speech coder adopted by TIA [2].
However, at bit rates below 4 kbps the quality of CELP coders generally deteriorates rapidly, as is partly evidenced by the recent efforts in ITU-T to standardize a high-quality 4 kbps speech coder [3]. To improve the performance of CELP coders and simultaneously make them more amenable to lower bit rates, methods including relaxed waveform matching [4] and phase dispersion [5] have been suggested; these efficiently exploit the properties of the human speech perception mechanism. On the other hand, to tackle the limitations of the parameter analysis in CELP coding, extended cost horizon [6], joint optimization [7] and delayed decision coding [8] have been proposed. For increased coding efficiency, methods exploiting the uneven distribution of the excitation parameters of a CELP coder have also been presented, see e.g. [9, 10, 11].

In this paper, we evaluate the performance of extended cost horizon, joint optimization and delayed decision coding in the excitation search process of the IS-641 speech coder. Furthermore, we give the justification for enhanced methods that exploit the uneven distribution of the adaptive and algebraic codebook indices, together with simulation results for the proposed approaches to coding these indices efficiently.

This paper is organized as follows. In Section 2 the structure of the IS-641 speech coder is briefly described. The simulation results for extended cost horizon, joint optimization and delayed decision coding are presented in Section 3. The empirically found distributions of the adaptive and algebraic codebook indices are shown in Section 4, where the concepts of shaped lattice and adaptive pulse positioning for efficient coding of the adaptive and fixed codebook indices are also briefly described, together with simulation results. Finally, conclusions are drawn in Section 5.

(V.T. Ruoppila is presently with VoiceAge Corp. in Montreal, Canada. S. Pietilä is presently with Nokia Mobile Phones in Tampere, Finland.)
2 IS-641 Speech Coder
In the ACELP speech coder, a cascade of a time-variant pitch predictor and a linear prediction (LP) filter is used to filter an excitation signal, see Fig. 1.

Fig. 1. Block diagrams of ACELP encoder (left) and decoder (right).

An all-pole LP filter

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}} \qquad (1)$$

where $a_1, \ldots, a_p$ are the LP coefficients, is used to model the short-time spectral envelope of the speech signal. A pitch predictor of the form

$$\frac{1}{B(z)} = \frac{1}{1 - b z^{-\tau}} \qquad (2)$$

utilizes the pitch periodicity of speech to model the fine structure of the spectrum. The gain $b$ is bounded to the interval 0–1.2, and the pitch period $\tau$, or pitch lag, to the interval 20–143 samples (the sampling frequency is 8 kHz). The pitch predictor is also referred to as the long-term predictor (LTP) filter. In Fig. 1, the LTP filter is represented by the feedback loop consisting of the delay $z^{-\tau}$ and the gain $b$. The LTP memory can also be seen as a codebook consisting of overlapping codevectors; this codebook is usually referred to as the LTP or adaptive codebook. An algebraic (more generally, fixed) excitation signal $u_c(n)$ is multiplied by a gain $g$ to form the input signal to the filter cascade. The algebraic excitation signal is composed of pulses with values of ±1 and zeros, and the corresponding codebook is called the algebraic codebook. The output of the filter cascade is the synthesized speech signal $\hat{s}(n)$. An error signal $e(n)$ is computed by subtracting $\hat{s}(n)$ from the original speech signal $s(n)$. The optimal adaptive and algebraic codevectors are selected sequentially by minimizing the weighted sum-squared error. The purpose of the weighting filter $W(z)$ is to shape the spectrum of the error signal so that it is less audible.

The frame length used in the IS-641 coder is 20 ms, and a frame is further divided into four subframes of equal length. One set of LP coefficients is derived for each frame and encoded with 26 bits. The other parameters are derived subframe-wise.
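To make the decoder side of the filter cascade concrete, the Python sketch below passes a gain-scaled fixed excitation g·c(n) through the pitch synthesis filter 1/B(z) = 1/(1 - b z^-τ) and then through the LP synthesis filter 1/A(z). It is an illustration under stated assumptions, not the IS-641 reference code: it assumes the sign convention A(z) = 1 - Σ a_i z^-i, an integer lag τ (the coder itself also uses fractional lags), zero filter memory unless past samples are supplied, and no postfilter. The function names are hypothetical.

```python
from typing import List, Sequence


def ltp_synthesis(c: Sequence[float], g: float, b: float, tau: int,
                  past_u: Sequence[float] = ()) -> List[float]:
    """Filter g*c(n) through 1/B(z) = 1/(1 - b z^-tau).

    Implements u(n) = g*c(n) + b*u(n - tau) for an integer lag tau.
    past_u holds earlier excitation samples (the adaptive codebook
    memory); any history not supplied is treated as zero.
    """
    buf = list(past_u)
    start = len(buf)
    for n, cn in enumerate(c):
        k = start + n - tau                  # index of u(n - tau) in buf
        delayed = buf[k] if k >= 0 else 0.0
        buf.append(g * cn + b * delayed)
    return buf[start:]


def lp_synthesis(u: Sequence[float], a: Sequence[float],
                 past_s: Sequence[float] = ()) -> List[float]:
    """Filter u(n) through 1/A(z), assuming A(z) = 1 - sum_i a_i z^-i.

    Implements s(n) = u(n) + sum_{i=1..p} a_i * s(n - i).
    """
    buf = list(past_s)
    start = len(buf)
    for n, un in enumerate(u):
        acc = un
        for i, ai in enumerate(a, start=1):
            k = start + n - i                # index of s(n - i) in buf
            if k >= 0:
                acc += ai * buf[k]
        buf.append(acc)
    return buf[start:]


def synthesize(c, g, b, tau, a):
    """Decoder-side LTP-LP cascade (postfilter omitted)."""
    return lp_synthesis(ltp_synthesis(c, g, b, tau), a)


if __name__ == "__main__":
    # A unit pulse with tau = 2 and b = 0.5 recurs every 2 samples,
    # halving each period; a = [] leaves the LTP output unfiltered.
    print(synthesize([1, 0, 0, 0, 0, 0], g=1.0, b=0.5, tau=2, a=[]))
    # [1.0, 0.0, 0.5, 0.0, 0.25, 0.0]
```

Keeping the two filters separate mirrors the structure of Fig. 1: the LTP memory (`past_u`) is exactly what the encoder treats as the adaptive codebook, while the LP state (`past_s`) carries over between subframes independently.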
The pitch lag is encoded with 26 bits (8 + 5 + 8 + 5 over the four subframes), while 68 bits (4 × 17) are used to code the pulse positions together with their signs. The pitch gain and the algebraic codebook gain are vector quantized with 28 bits (4 × 7). The decoder receives the parameters from the channel, see Fig. 1, and determines the algebraic excitation signal from the received index and gain. The algebraic excitation signal is filtered through the LTP-LP filter cascade to produce the synthesized speech signal. Finally, a postfilter P(z) is employed to enhance the perceptual speech quality.

3 Modified Parameter Analysis
In a typical CELP coder, there are two important limitations in the parameter estimation process, which can partly be justified by the reduced complexity. Firstly, the parameters are optimized sequentially instead of jointly. Secondly, the cost function used to find the excitation signals (adaptive and fixed) minimizes the sum-squared error within the current subframe, but it does not take into account the effect that the excitation signal has on the subsequent subframes. One consequence of subframe-based error minimization is that, because of LP filtering, the excitation samples at the first positions of the subframe contribute more to the cost function than the samples at the last positions. To alleviate these problems, it has been proposed in [6] that the cost function of the fixed codebook search be extended to cover the beginning of the next subframe. In that approach, the target signal and the synthesized speech signal are extended by concatenating their free evolutions (the filter outputs for a zero-valued excitation) to the original signals. In [7], the adaptive and fixed codebook parameters were searched jointly instead of sequentially. A solution described in [8] is the delayed decision method, where a predetermined number of fixed and adaptive codebook parameter candidates are chosen for each subframe in the current frame. After the last subframe, the parameter combination that gives the best total performance over the whole frame is chosen. The advantages of this approach include simultaneous optimization of the adaptive and fixed codebook excitation parameters, as well as taking into account the influence of the current subframe parameters on the subsequent subframes.

Fig. 2. Simulation results for joint optimization and delayed decision coding of adaptive and algebraic codebook parameters in the IS-641 speech coder (panels: whole, voiced and unvoiced speech; SegSNR plotted against max{NUM_ALG, NUM_ADA} for joint optimization and three delayed decision configurations).

In delayed decision coding, various kinds of tree coding algorithms can be used; they are mainly classified by their decision timing. In the first of the two most typical methods, a decision is made simultaneously for all subframes in a frame by selecting the best path in the tree. In the other widely used method, the decision for each subframe s is made by considering the cumulative distortion from the sth to the (s + N)th subframe. In our simulations the second approach was used with N set to one, which results in an additional coder delay of one frame. This delay is needed to determine the excitation parameters for the last subframe of the current frame, whose decision depends on the first subframe of the next frame. To evaluate the performance of the three methods described above, we implemented them in the IS-641 speech coder.
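The tree search behind delayed decision coding can be sketched as an M-best (beam) search. The Python sketch below is a generic illustration, not the authors' IS-641 implementation: a hypothetical callback `expand` returns (candidate label, incremental distortion) pairs for a subframe given the choices made so far, only the `beam` lowest cumulative distortions survive each stage (the role played by the limit on candidates per stage), and the best complete path is chosen at the end of the frame. This corresponds to the first decision-timing variant described above; the sliding-window variant would instead commit subframe s after looking N subframes ahead.

```python
import heapq
from typing import Callable, Hashable, List, Sequence, Tuple

Path = Tuple[Hashable, ...]
# expand(history, subframe) -> [(candidate_label, incremental_cost), ...]
Expand = Callable[[Path, int], Sequence[Tuple[Hashable, float]]]


def delayed_decision(num_subframes: int, expand: Expand,
                     beam: int) -> Tuple[Path, float]:
    """M-best tree search over the subframes of one frame.

    beam=1 reduces to ordinary subframe-by-subframe (greedy) selection.
    """
    paths: List[Tuple[float, Path]] = [(0.0, ())]
    for s in range(num_subframes):
        # Grow every surviving path by every candidate for subframe s.
        children = [
            (cost + step, history + (label,))
            for cost, history in paths
            for label, step in expand(history, s)
        ]
        # Prune: keep only the `beam` lowest cumulative distortions.
        paths = heapq.nsmallest(beam, children, key=lambda p: p[0])
    best_cost, best_path = min(paths, key=lambda p: p[0])
    return best_path, best_cost


# Toy two-subframe example (hypothetical costs): the locally best first
# choice "A" makes the second subframe expensive, which only a search
# that keeps more than one path can notice.
TOY_COSTS = {
    (): [("A", 1.0), ("B", 1.5)],
    ("A",): [("x", 5.0), ("y", 6.0)],
    ("B",): [("x", 0.5), ("y", 2.0)],
}


def toy_expand(history: Path, subframe: int):
    return TOY_COSTS[history]


if __name__ == "__main__":
    print(delayed_decision(2, toy_expand, beam=1))  # (('A', 'x'), 6.0)
    print(delayed_decision(2, toy_expand, beam=2))  # (('B', 'x'), 2.0)
```

Passing the full history to `expand` is what lets a candidate's cost depend on earlier choices, as it does in a real coder through the filter states and the adaptive codebook memory left by previous subframes.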
Based on our simulation results, the largest increase in segmental SNR obtained with the extended cost horizon approach for the algebraic codebook search was achieved with an extension length of eight samples, while the other extension lengths tested also performed better than the original coder. In general, the improvements were larger for voiced than for unvoiced speech. When computing the extended excitation signal, no pitch sharpening was applied to the extended algebraic excitation segment.

Fig. 2 shows the simulation results for different delayed decision configurations in the IS-641 speech coder. In the figure, the numbers of adaptive and algebraic codebook parameter sets derived at each stage are denoted by NUM_ADA and NUM_ALG, respectively. The explosion in the number of paths was restricted by considering only the NUM_ADA × NUM_ALG best candidates at each stage of the tree. Unquantized gain values were used in the simulations. In addition to the different delayed decision configurations, Fig. 2 illustrates the performance of joint optimization of the adaptive and algebraic codebook parameters within each subframe.

As can be observed from Fig. 2, delayed decision coding yields clear improvements in segmental SNR. Joint optimization of the adaptive and algebraic codebook excitation parameters also improves performance, although delayed decision coding performs better. In informal listening experiments, however, the improvements achieved by all tested methods were judged to be rather marginal. At lower bit rates, the subjective importance of these methods is expected to be higher.

4 Distribution of Codebook Indices
4.1 Adaptive Codebook Indices
In the IS-641 speech coder, the smooth evolution of the pitch contour during voiced speech is exploited by differentially coding every other pitch value. The absolute pitch period is searched from the range of 19 1/3–143 samples for the first and third subframes. In the range of 19 1/3–84 2/3 samples a resolution of 1/3 is used, while integer values are used in the range of 85–143 samples. For the second and fourth subframes, the pitch periods are searched from the neighborhood of the pitch period in the previous subframe. The search range for the delta pitch periods is -4 2/3 to 5 2/3 samples with a resolution of 1/3.

Fig. 3. The differences between successive pitch periods in the modified IS-641 speech coder.

Fig. 4. A three-dimensional lattice for delta periods in the modified IS-641 speech coder (axes d_1, d_2 and d_3; hypercubes D_i; edges a, b and c).

Generally speaking, coding of n successive delta pitch periods can be described as an n-dimensional lattice where each dimension represents the pitch period in the corresponding subframe [11]. In typical lattice coding of delta periods, attention is paid only to the selection of the lattice boundary values while its rectangular shape is maintained; no further care is taken to choose a set of points that covers only the most likely combinations. Since the pitch period usually evolves smoothly during voiced speech, the rectangular lattice also covers points that are rarely used. Thus, the coding efficiency can be increased by shaping the lattice to eliminate unlikely pitch period combinations from the coding scheme.
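The per-subframe bit counts can be checked by enumerating the lag values, which also shows how quickly a rectangular delta lattice grows. The Python sketch below assumes the IS-641 lag ranges: absolute lags at 1/3 resolution on 19 1/3–84 2/3 plus integers 85–143, and delta lags from -4 2/3 to 5 2/3 at 1/3 resolution. The three-dimensional rectangular lattice built from that same delta range is purely illustrative (the reference lattices L_A and L_B of the modified coder use their own ranges), and the function names are hypothetical.

```python
import math
from fractions import Fraction


def absolute_lags():
    """Absolute lags (subframes 1 and 3): 1/3 steps on [19 1/3, 84 2/3],
    integer steps on [85, 143]."""
    fractional = [Fraction(k, 3) for k in range(58, 255)]  # 58/3 .. 254/3
    integer = [Fraction(k) for k in range(85, 144)]
    return fractional + integer


def delta_lags():
    """Delta lags (subframes 2 and 4): -4 2/3 .. 5 2/3 in steps of 1/3."""
    return [Fraction(k, 3) for k in range(-14, 18)]


def bits_needed(num_points: int) -> int:
    """Bits for a fixed-length index into num_points entries."""
    return math.ceil(math.log2(num_points))


if __name__ == "__main__":
    n_abs, n_delta = len(absolute_lags()), len(delta_lags())
    print(n_abs, bits_needed(n_abs))        # 256 8
    print(n_delta, bits_needed(n_delta))    # 32 5
    # Rectangular lattice for three successive deltas with the same range.
    n_rect = n_delta ** 3
    print(n_rect, bits_needed(n_rect))      # 32768 15
```

Under these assumptions the counts reproduce the 8 + 5 + 8 + 5 = 26 bit pitch allocation exactly, and every point of the rectangular three-dimensional lattice costs index space whether or not that combination of deltas ever occurs.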
In [11] we proposed a shaped lattice structure derived from the empirically found distribution of delta periods in a modified IS-641 coder. In the modified coder, the absolute pitch period is used only for the first subframe, while delta pitch periods are used for the other subframes. The distribution of delta periods over a large database is shown in Fig. 3, where the difference between the pitch periods of the (i-1)th and the ith subframe is denoted by d_i. The proposed shaped lattice is shown in Fig. 4; it is composed of a union of non-overlapping hypercubes D_i, which are defined by the delta period range and the resolution used in each dimension. The different hypercubes are marked by dashed lines in the figure and can be defined by their unique edges; for example, one of the hypercubes in Fig. 4 is defined by the edges a, b and c. The lattice structure used for the simulations was symmetric with respect to the axes d_1, d_2 and d_3. The point distribution in the last three dimensions was uniform, with a resolution of 1/3. Because of the symmetry, the three-dimensional lattice can be unambiguously defined by one corner point of the projection of a hypercube onto a plane spanned by two of the axes, see Fig. 4. In the optimal index search from the lattice, a single open-loop pitch estimate was first derived jointly for the last three subframes. The closed-loop pitch was then derived from the neighborhood of this open-loop estimate. In the simulations, three different shaped lattices S_A, S_B and S_C, each with a different corner point, were implemented for the modified IS-641 coder. As a reference, two cubic lattices L_A and L_B with different maximum delta periods were used; these ranges were selected based on the distributions presented in Fig. 3. The simulation results are presented in Table 1.
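A shaped lattice of this kind can be represented as the set of grid points covered by a union of axis-aligned boxes, with each point indexed by its position in an enumeration. In the Python sketch below, coordinates are delta periods in units of 1/3 sample, and the boxes (a small central cube plus one elongated arm along each axis) are hypothetical, chosen only to illustrate the index saving; they are not the lattices S_A to S_C used in the simulations. Overlapping boxes are merged, so unlike the hypercubes D_i they need not be disjoint, and quantization is done by exhaustive nearest-point search rather than the open-loop/closed-loop procedure described above.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, ...]           # (d1, d2, d3) in units of 1/3 sample
Box = Sequence[Tuple[int, int]]   # inclusive (low, high) range per axis


class ShapedLattice:
    def __init__(self, boxes: Sequence[Box]):
        covered = set()
        for box in boxes:
            covered.update(product(*(range(lo, hi + 1) for lo, hi in box)))
        self.points: List[Point] = sorted(covered)
        self._index: Dict[Point, int] = {p: i for i, p in enumerate(self.points)}

    @property
    def bits(self) -> int:
        """Fixed-length index size for all lattice points."""
        return math.ceil(math.log2(len(self.points)))

    def encode(self, point: Point) -> int:
        return self._index[point]

    def decode(self, index: int) -> Point:
        return self.points[index]

    def quantize(self, target: Sequence[float]) -> int:
        """Index of the lattice point closest to target (squared error)."""
        return min(range(len(self.points)),
                   key=lambda i: sum((p - t) ** 2
                                     for p, t in zip(self.points[i], target)))


def cube(r: int, dims: int = 3) -> Box:
    return [(-r, r)] * dims


def arm(axis: int, reach: int, width: int = 1, dims: int = 3) -> Box:
    """Box elongated along one axis and narrow in the others."""
    return [(-reach, reach) if d == axis else (-width, width) for d in range(dims)]


if __name__ == "__main__":
    shaped = ShapedLattice([cube(3)] + [arm(axis, 12) for axis in range(3)])
    rect = ShapedLattice([cube(12)])          # rectangular, same extremes
    print(len(shaped.points), shaped.bits)    # 829 10
    print(len(rect.points), rect.bits)        # 15625 14
    print(shaped.decode(shaped.quantize([10.2, 0.4, -0.6])))  # (10, 0, -1)
```

With these hypothetical boxes the shaped lattice needs 10 bits instead of 14 for a rectangular lattice reaching the same extreme deltas; whether the excluded combinations cost quality depends on how rarely they occur, which is what the empirical distribution in Fig. 3 is used to judge.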
The results are expressed as segmental SNRs between the voiced sections of the prefiltered input speech and the synthesized and postfiltered speech, together with the number of bits per frame needed to code the delta periods. As can be seen from Table 1, the coding efficiency of successive pitch periods can be increased by using the shaped lattice structure.

Table 1. Segmental SNRs and the number of bits needed for different three-dimensional lattices.

Scheme                 SegSNR (dB)   Bits
Lattice L_A            8.
Lattice L_B
Shaped Lattice S_A
Shaped Lattice S_B     8.
Shaped Lattice S_C

4.2 Algebraic Codebook Indices
In low bit rate CELP coders, the target signal for the fixed codebook search is highly periodic because the adaptive codebook is unable to model the periodicity of the input speech. In ACELP coders, periodicity is therefore introduced into the algebraic excitation signal by the pitch sharpening procedure, where the gain-scaled algebraic excitation is repeated at the pitch interval. To further exploit the periodicity of the target signal, an adaptive algebraic codebook was presented in [10]. That approach was based on the assumption that the distribution of the pulses in the algebraic codebook is related to the locations of the pitch pulses during voiced speech.

In our experiments, we first wanted to verify the assumption that the pulses of the algebraic excitation are located in the vicinity of the pitch pulses during voiced speech in the IS-641 coder. To do so, we first located the pitch pulses in the voiced regions of speech using the time-domain energy contour of the LP residual signal. Subsequently, we encoded the same signal with a modified IS-641 coder, in which all excitation pulse combinations were allowed instead of only the tabulated positions, in order to obtain more reliable results about the desired pulse positions. Finally, we compared the pitch pulse locations with the excitation pulse positions. Fig. 5 depicts the relative distribution of the excitation pulses with respect to the pitch pulse locations. As can be seen from the figure, the pitch pulse position and its vicinity clearly dominate the distribution. In addition, the experiments showed that positive pulses dominated this region over negative pulses.

Fig. 5. Histogram of excitation pulse locations with respect to pitch pulse locations (horizontal axis: distance from closest pitch pulse in normalized pitch periods; vertical axis: percentage).

Based on these observations, a simplistic approach derived from the one described in [10] was taken to generate an adaptive algebraic codebook for simulation purposes. In the original IS-641 coder, 17 bits are used to code four positive or negative pulses per subframe (positions 0, 5, …, 35; 1, 6, …, 36; 2, 7, …, 37; and 3, 4, 8, 9, …, 38, 39). In our modification, we replaced the positions 4, 9, …, 39 of the fourth pulse by adaptive locations centered on the largest energy peak of the adaptive codebook excitation, which typically indicates a pitch pulse. This modification increased the segmental SNR during voiced speech compared to the original method. The improvements from adaptive pulse positioning are expected to be higher at lower bit rates because of the sparser algebraic codebook. It is also likely that further improvements can be achieved by using more sophisticated methods for defining the adaptive pulse positions.

5 Conclusion
In this paper, the performance of different techniques for improving the parameter analysis in CELP speech coders was evaluated using the IS-641 speech coder as the simulation platform. The evaluated methods included an extended cost horizon in the algebraic excitation search, as well as joint optimization and delayed decision coding of the adaptive and algebraic codebook parameters. In addition, the justification for efficient coding methods based on the distribution of the adaptive and algebraic codebook indices in the modified IS-641 coders was given, and the performance of the shaped lattice and adaptive pulse positioning for coding the codebook indices was demonstrated. Based on the simulations, substantial improvements in objective performance are achieved, especially with delayed decision coding, while the improvements in subjective speech quality were found to be more marginal. On the other hand, larger subjective improvements are expected from the described methods as the bit rate is lowered from around 7.4 kbps. Although the simulations were made using the IS-641 speech coder or a modified version of it, the conclusions can be generalized to the majority of ACELP and CELP coders.

References:
[1] M.R. Schroeder and B.S. Atal, Code-excited linear prediction (CELP): high-quality speech at very low bit rates, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1985.
[2] T. Honkanen, J. Vainio, K. Järvinen, P. Haavisto, R. Salami, C. Laflamme and J.-P. Adoul, Enhanced full rate speech codec for IS-136 digital cellular system, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.
[3] ITU-T, Q./ Rapporteur's Meeting Report, September 1999.
[4] W.B. Kleijn, P. Kroon and D. Nahumi, The RCELP speech coding algorithm, European Transactions on Telecommunications, Vol. 5, No. 5, 1994.
[5] R. Hagen, E. Ekudden, B. Johansson and W.B. Kleijn, Removal of sparse-excitation artifacts in CELP, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1998.
[6] S. Cucci, M. Fratti and M. Ronchi, On improving performance of analysis by synthesis speech coders, IEEE Transactions on Speech and Audio Processing, No. 3, 99.
[7] L. Zhang, T. Wang and V. Cuperman, A CELP variable rate speech codec with low average rate, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.
[8] K. Mano and T. Moriya, 4.8 kbit/s delayed decision CELP coder using tree coding, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 99.
[9] T. Eriksson and J. Sjöberg, Dynamic bit allocation in CELP excitation coding, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993.
[10] T. Amada, K. Miseki and M. Akamine, CELP speech coding based on an adaptive pulse position codebook, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999.
[11] A. Heikkinen, V.T. Ruoppila and S. Pietilä, A shaped lattice quantizer for successive pitch periods, Proceedings of EUROSPEECH.
More informationQUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose
QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,
More informationCT516 Advanced Digital Communications Lecture 7: Speech Encoder
CT516 Advanced Digital Communications Lecture 7: Speech Encoder Yash M. Vasavada Associate Professor, DA-IICT, Gandhinagar 2nd February 2017 Yash M. Vasavada (DA-IICT) CT516: Adv. Digital Comm. 2nd February
More informationModule 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur
Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define
More informationApplication of wavelet filtering to image compression
Application of wavelet filtering to image compression LL3 HL3 LH3 HH3 LH2 HL2 HH2 HL1 LH1 HH1 Fig. 9.1 Wavelet decomposition of image. Application to image compression Application to image compression
More informationOptimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform
Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of
More information14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP
TRADEOFF BETWEEN COMPLEXITY AND MEMORY SIZE IN THE 3GPP ENHANCED PLUS DECODER: SPEED-CONSCIOUS AND MEMORY- CONSCIOUS DECODERS ON A 16-BIT FIXED-POINT DSP Osamu Shimada, Toshiyuki Nomura, Akihiko Sugiyama
More informationOptimal Estimation for Error Concealment in Scalable Video Coding
Optimal Estimation for Error Concealment in Scalable Video Coding Rui Zhang, Shankar L. Regunathan and Kenneth Rose Department of Electrical and Computer Engineering University of California Santa Barbara,
More informationInternational Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)
A Technical Analysis Towards Digital Video Compression Rutika Joshi 1, Rajesh Rai 2, Rajesh Nema 3 1 Student, Electronics and Communication Department, NIIST College, Bhopal, 2,3 Prof., Electronics and
More informationS.K.R Engineering College, Chennai, India. 1 2
Implementation of AAC Encoder for Audio Broadcasting A.Parkavi 1, T.Kalpalatha Reddy 2. 1 PG Scholar, 2 Dean 1,2 Department of Electronics and Communication Engineering S.K.R Engineering College, Chennai,
More informationREAL-TIME DIGITAL SIGNAL PROCESSING
REAL-TIME DIGITAL SIGNAL PROCESSING FUNDAMENTALS, IMPLEMENTATIONS AND APPLICATIONS Third Edition Sen M. Kuo Northern Illinois University, USA Bob H. Lee Ittiam Systems, Inc., USA Wenshun Tian Sonus Networks,
More informationA CUSTOM VLSI ARCHITECTURE FOR IMPLEMENTING LOW-DELAY ANALY SIS-BY-SYNTHESIS SPEECH CODING ALGORITHMS
A CUSTOM VLSI ARCHITECTURE FOR IMPLEMENTING LOW-DELAY ANALY SIS-BY-SYNTHESIS SPEECH CODING ALGORITHMS Peter Dean Schuler B.A.Sc., Simon Fraser University, 1989 A THESIS SUBMlTT'ED IN PARTIAL FULFlLLMENT
More informationPerspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony
Perspectives on Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-based Telephony Nobuhiko Kitawaki University of Tsukuba 1-1-1, Tennoudai, Tsukuba-shi, 305-8573 Japan. E-mail: kitawaki@cs.tsukuba.ac.jp
More informationNew Results in Low Bit Rate Speech Coding and Bandwidth Extension
Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without
More informationParametric Coding of High-Quality Audio
Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1 Waveform vs Parametric Waveform Filter-bank approach Mainly exploits
More informationCh. 5: Audio Compression Multimedia Systems
Ch. 5: Audio Compression Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Chapter 5: Audio Compression 1 Introduction Need to code digital
More informationDusseldorf, Germany Agenda item: th -20 th June, Status Report of SMG11 at SMG#32
ETSI TC SMG#32 Tdoc SMG P-00-269 Dusseldorf, Germany Agenda item: 6.10 19 th -20 th June, 2000 Source: Chairman, SMG11 * Status Report of SMG11 at SMG#32 Executive Summary This document provides an overview
More informationReal-time Audio Quality Evaluation for Adaptive Multimedia Protocols
Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols Lopamudra Roychoudhuri and Ehab S. Al-Shaer School of Computer Science, Telecommunications and Information Systems, DePaul University,
More informationDRA AUDIO CODING STANDARD
Applied Mechanics and Materials Online: 2013-06-27 ISSN: 1662-7482, Vol. 330, pp 981-984 doi:10.4028/www.scientific.net/amm.330.981 2013 Trans Tech Publications, Switzerland DRA AUDIO CODING STANDARD Wenhua
More informationEfficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Efficient MPEG- to H.64/AVC Transcoding in Transform-domain Yeping Su, Jun Xin, Anthony Vetro, Huifang Sun TR005-039 May 005 Abstract In this
More information19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007
19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 SUBJECTIVE AND OBJECTIVE QUALITY EVALUATION FOR AUDIO WATERMARKING BASED ON SINUSOIDAL AMPLITUDE MODULATION PACS: 43.10.Pr, 43.60.Ek
More informationInformation technology MPEG audio technologies Part 3: Unified speech and audio coding
INTERNATIONAL STANDARD ISO/IEC 23003-3:2012 TECHNICAL CORRIGENDUM 3 Published 2015-04-01 Corrected version 2016-10-01 INTERNATIONAL ORGANIZATION FOR STANDARDIZATION МЕЖДУНАРОДНАЯ ОРГАНИЗАЦИЯ ПО СТАНДАРТИЗАЦИИ
More informationSynopsis of Basic VoIP Concepts
APPENDIX B The Catalyst 4224 Access Gateway Switch (Catalyst 4224) provides Voice over IP (VoIP) gateway applications for a micro branch office. This chapter introduces some basic VoIP concepts. This chapter
More informationKINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK
KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR 2011-2012 / ODD SEMESTER QUESTION BANK SUB.CODE / NAME YEAR / SEM : IT1301 INFORMATION CODING TECHNIQUES : III / V UNIT -
More informationPerceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.
Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general
More informationHybrid Speech Synthesis
Hybrid Speech Synthesis Simon King Centre for Speech Technology Research University of Edinburgh 2 What are you going to learn? Another recap of unit selection let s properly understand the Acoustic Space
More informationAUDIOVISUAL COMMUNICATION
AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis
More informationLecture 16 Perceptual Audio Coding
EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ Hero
More informationPerceptual Audio Coders What to listen for: Artifacts of Parametric Coding
Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding Heiko Purnhagen, Bernd Edler University of AES 109th Convention, Los Angeles, September 22-25, 2000 1 Introduction: Parametric
More informationVoIP Forgery Detection
VoIP Forgery Detection Satish Tummala, Yanxin Liu and Qingzhong Liu Department of Computer Science Sam Houston State University Huntsville, TX, USA Emails: sct137@shsu.edu; yanxin@shsu.edu; liu@shsu.edu
More informationBlind Measurement of Blocking Artifact in Images
The University of Texas at Austin Department of Electrical and Computer Engineering EE 38K: Multidimensional Digital Signal Processing Course Project Final Report Blind Measurement of Blocking Artifact
More informationBoth LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.
Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general
More informationContext based optimal shape coding
IEEE Signal Processing Society 1999 Workshop on Multimedia Signal Processing September 13-15, 1999, Copenhagen, Denmark Electronic Proceedings 1999 IEEE Context based optimal shape coding Gerry Melnikov,
More informationText-Independent Speaker Identification
December 8, 1999 Text-Independent Speaker Identification Til T. Phan and Thomas Soong 1.0 Introduction 1.1 Motivation The problem of speaker identification is an area with many different applications.
More informationAn Iterative Joint Codebook and Classifier Improvement Algorithm for Finite- State Vector Quantization
An Iterative Joint Codebook and Classifier Improvement Algorithm for Finite- State Vector Quantization Keren 0. Perlmutter Sharon M. Perlmutter Michelle Effrost Robert M. Gray Information Systems Laboratory
More informationAUDIOVISUAL COMMUNICATION
AUDIOVISUAL COMMUNICATION Laboratory Session: Audio Processing and Coding The objective of this lab session is to get the students familiar with audio processing and coding, notably psychoacoustic analysis
More informationSpectral modeling of musical sounds
Spectral modeling of musical sounds Xavier Serra Audiovisual Institute, Pompeu Fabra University http://www.iua.upf.es xserra@iua.upf.es 1. Introduction Spectral based analysis/synthesis techniques offer
More informationModule 8: Video Coding Basics Lecture 42: Sub-band coding, Second generation coding, 3D coding. The Lecture Contains: Performance Measures
The Lecture Contains: Performance Measures file:///d /...Ganesh%20Rana)/MY%20COURSE_Ganesh%20Rana/Prof.%20Sumana%20Gupta/FINAL%20DVSP/lecture%2042/42_1.htm[12/31/2015 11:57:52 AM] 3) Subband Coding It
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 213 http://acousticalsociety.org/ ICA 213 Montreal Montreal, Canada 2-7 June 213 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb1. Subjective
More informationRate Distortion Optimization in Video Compression
Rate Distortion Optimization in Video Compression Xue Tu Dept. of Electrical and Computer Engineering State University of New York at Stony Brook 1. Introduction From Shannon s classic rate distortion
More informationModule 7 VIDEO CODING AND MOTION ESTIMATION
Module 7 VIDEO CODING AND MOTION ESTIMATION Lesson 20 Basic Building Blocks & Temporal Redundancy Instructional Objectives At the end of this lesson, the students should be able to: 1. Name at least five
More informationOn the Importance of a VoIP Packet
On the Importance of a VoIP Packet Christian Hoene, Berthold Rathke, Adam Wolisz Technical University of Berlin hoene@ee.tu-berlin.de Abstract If highly compressed multimedia streams are transported over
More informationStereo Image Compression
Stereo Image Compression Deepa P. Sundar, Debabrata Sengupta, Divya Elayakumar {deepaps, dsgupta, divyae}@stanford.edu Electrical Engineering, Stanford University, CA. Abstract In this report we describe
More informationSPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL
SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,
More informationAudio and video compression
Audio and video compression 4.1 introduction Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and
More informationA Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames
A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames Ki-Kit Lai, Yui-Lam Chan, and Wan-Chi Siu Centre for Signal Processing Department of Electronic and Information Engineering
More informationRobust Shape Retrieval Using Maximum Likelihood Theory
Robust Shape Retrieval Using Maximum Likelihood Theory Naif Alajlan 1, Paul Fieguth 2, and Mohamed Kamel 1 1 PAMI Lab, E & CE Dept., UW, Waterloo, ON, N2L 3G1, Canada. naif, mkamel@pami.uwaterloo.ca 2
More informationCompressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.
Compressed Audio Demystified Why Music Producers Need to Care About Compressed Audio Files Download Sales Up CD Sales Down High-Definition hasn t caught on yet Consumers don t seem to care about high fidelity
More informationChapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution
Chapter 2 Studies and Implementation of Subband Coder and Decoder of Speech Signal Using Rayleigh Distribution Sangita Roy, Dola B. Gupta, Sheli Sinha Chaudhuri and P. K. Banerjee Abstract In the last
More informationROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010
ROW.mp3 Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010 Motivation The realities of mp3 widespread use low quality vs. bit rate when compared to modern codecs Vision for row-mp3 backwards
More informationMultimedia Communications. Audio coding
Multimedia Communications Audio coding Introduction Lossy compression schemes can be based on source model (e.g., speech compression) or user model (audio coding) Unlike speech, audio signals can be generated
More informationFeatures. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy
JPEG JPEG Joint Photographic Expert Group Voted as international standard in 1992 Works with color and grayscale images, e.g., satellite, medical,... Motivation: The compression ratio of lossless methods
More informationA NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO
International journal of computer science & information Technology (IJCSIT) Vol., No.5, October A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO Pranab Kumar Dhar *, Mohammad
More informationsignal-to-noise ratio (PSNR), 2
u m " The Integration in Optics, Mechanics, and Electronics of Digital Versatile Disc Systems (1/3) ---(IV) Digital Video and Audio Signal Processing ƒf NSC87-2218-E-009-036 86 8 1 --- 87 7 31 p m o This
More informationA Synchronization Scheme for Hiding Information in Encoded Bitstream of Inactive Speech Signal
Journal of Information Hiding and Multimedia Signal Processing c 2016 ISSN 2073-4212 Ubiquitous International Volume 7, Number 5, September 2016 A Synchronization Scheme for Hiding Information in Encoded
More informationPresents 2006 IMTC Forum ITU-T T Workshop
Presents 2006 IMTC Forum ITU-T T Workshop G.729EV: An 8-32 kbit/s scalable wideband speech and audio coder bitstream interoperable with G.729 Presented by Christophe Beaugeant On behalf of ETRI, France
More informationThe following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:
Page 1 of 8 1. SCOPE This Operational Practice sets out guidelines for minimising the various artefacts that may distort audio signals when low bit-rate coding schemes are employed to convey contribution
More information