Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7-10 Munich, Germany


Audio Engineering Society Convention Paper 7712

The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York, USA. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

for transitions between LPC-based and non-LPC based audio coding

Jérémie Lecomte 1, Philippe Gournay 2, Ralf Geiger 1, Bruno Bessette 2 and Max Neuendorf 1
1 Fraunhofer IIS, Erlangen, 91058, Germany
2 Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
Correspondence should be addressed to Jérémie Lecomte (amm-info@iis.fraunhofer.de)

ABSTRACT
The reference model selected by MPEG for the forthcoming unified speech and audio codec (USAC) switches between a non-LPC based coding mode (based on AAC) operating in the transform domain and an LPC-based coding mode (derived from AMR-WB+) operating either in the time domain (ACELP) or in the frequency domain (wLPT). Seamlessly switching between these different coding modes required the design of a new set of cross-fade windows optimized to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC based coding. This paper presents the new set of windows, which was designed to provide an adequate trade-off between overlap duration and time/frequency resolution, and to maintain the benefits of critical sampling through all coding modes.

1. INTRODUCTION
It is widely known that time domain codecs based on a Linear Predictive Coding (LPC) representation, such as CELP and its derivatives, perform better on speech signals, while frequency domain codecs based on a Modified Discrete Cosine Transform (MDCT) decomposition, such as the Advanced Audio Coding (AAC) codec, perform better on music and general audio signals. In the past few years, there has been a growing demand for a codec capable of good performance on both speech and music signals at low bit rates (below 64 kbps). To fulfill this need, MPEG issued a Call for Proposals for a unified speech and audio codec (USAC) in 2007 [1], and a first reference model was selected in July 2008. This reference model is a switched codec which uses either a non-LPC based transform domain audio core codec or an LPC-based core codec.

Seamlessly switching between these different core codecs requires properly designed cross-fade windows. This paper presents the new set of windows which was designed for the USAC codec. The goal of this new set of windows is to minimize the amount of overhead information sent during transitions between LPC-based and non-LPC based coding, to provide an adequate trade-off between overlap duration and time/frequency resolution, and to maintain the benefits of critical sampling through all coding modes.

The paper is organized as follows. The AMR-WB+ and HE-AAC codecs, which form the basis of the USAC codec, are briefly reviewed in section 2. Section 3 gives an overview of the USAC codec. The set of windows developed for transitions between LPC-based and non-LPC based core coding in USAC is described in section 4. Finally, conclusions are drawn in section 6.

2. STATE-OF-THE-ART SPEECH AND AUDIO CODECS
This section briefly describes the main characteristics of the two standards, AMR-WB+ and HE-AAC, which form the basis of the new USAC codec.

2.1. An LPC-based codec: AMR-WB+
The Extended AMR-WB (AMR-WB+) [2] audio coder is a multi-rate audio coder capable of encoding mono and stereo signals at bit rates ranging from 6 to 48 kbps. Based on LPC, the coder uses a multi-mode encoding model which can switch, on a frame-by-frame basis, between time domain and frequency domain encoding. In time domain mode, the input signal is encoded using the Algebraic Code Excited Linear Prediction (ACELP) encoder from the 3GPP AMR-WB speech coding standard [3]. In frequency domain mode, an LPC-weighted version of the input signal is encoded in the FFT domain using Transform Coded Excitation (TCX). The input signal is split into super-frames of 1024 samples. Each super-frame can be divided into frames of 256, 512 or 1024 samples.
Only short frames (nominally 256 samples) are used in time domain mode, while short, medium or long frames (nominally 256, 512 or 1024 samples) can be used in frequency domain mode. A super-frame can therefore be encoded using 26 different ACELP/TCX mode combinations. Although some open-loop strategies exist, the mode combination is normally determined by a closed-loop mode selection procedure which minimizes the total weighted error.

The AMR-WB+ codec also includes tools for bandwidth and stereo extension. In bandwidth extension, the upper half band of the signal is encoded at a very low bit rate (typically 800 bps) using a parametric approach which relies on spectral folding and spectral envelope shaping (using an LP filter). The stereo image of the input audio signal is encoded using a mid/side representation and a sub-band coding approach.

2.2. A non-LPC based codec: HE-AAC
MPEG-4 Advanced Audio Coding (AAC) [4] is a generic audio coding scheme. It uses the Modified Discrete Cosine Transform (MDCT) [5] to represent the audio signal in the frequency domain. The quantization and coding of the MDCT spectrum are controlled by a perceptual model. Several additional coding tools ensure that the codec provides efficient coding for general audio signals. These tools include a time-variant MDCT filter bank, which allows switching between finer frequency resolution and finer time resolution. To this end, two transform lengths, 1024 and 128 samples, are available. Table 1 shows the AAC standard windows. The two MDCT lengths are realized in the long and short windows respectively. The transitional start and stop windows are used to lead over from one transform length to the other.
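The sequencing constraint behind the start and stop windows can be sketched in a few lines. This is a simplified model, not the AAC or USAC encoder logic: each 1024-sample frame is only flagged as containing an attack or not, and a long-transform window must match the slope of its neighbours. The names `Window`, `choose_windows` and `allow_stop_start` are hypothetical. With `allow_stop_start=False` the model mimics plain AAC, where (in this model) an attack-free frame squeezed between two short frames falls back to short windows; with `True` it uses the combined stop_start window that USAC RM0 introduces (section 4.3).

```python
from enum import Enum


class Window(Enum):
    LONG = "long window"
    START = "start window"
    SHORT = "eight short windows"
    STOP = "stop window"
    STOP_START = "stop_start window"  # USAC RM0 addition, absent from plain AAC


# Windows whose right-hand slope is short (matching a short-window neighbour).
SHORT_RIGHT_SLOPE = {Window.START, Window.SHORT, Window.STOP_START}


def choose_windows(attacks, allow_stop_start=False):
    """Pick one window per frame from per-frame attack flags.

    attacks[i] is True when frame i contains a transient and needs short
    windows. A long-transform frame needs a short left slope if the previous
    window ends with a short slope, and a short right slope if the next
    frame uses short windows.
    """
    windows = []
    for i, attack in enumerate(attacks):
        if attack:
            windows.append(Window.SHORT)
            continue
        left_short = i > 0 and windows[i - 1] in SHORT_RIGHT_SLOPE
        right_short = i + 1 < len(attacks) and attacks[i + 1]
        if left_short and right_short:
            # No AAC long window has two short slopes: fall back to short
            # windows, unless the USAC stop_start window is available.
            windows.append(Window.STOP_START if allow_stop_start else Window.SHORT)
        elif left_short:
            windows.append(Window.STOP)
        elif right_short:
            windows.append(Window.START)
        else:
            windows.append(Window.LONG)
    return windows


attacks = [False, True, False, True, False]
print([w.name for w in choose_windows(attacks)])
# ['START', 'SHORT', 'SHORT', 'SHORT', 'STOP']
print([w.name for w in choose_windows(attacks, allow_stop_start=True)])
# ['START', 'SHORT', 'STOP_START', 'SHORT', 'STOP']
```

With two attacks one frame apart, the AAC-like path spends short windows on the attack-free middle frame, while the stop_start path keeps it as an isolated long block.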
Table 1: Transform windows in AAC standard (rows: long window, start window, eight short windows, stop window; the "General Design" column showing each window shape is not reproduced)

To further reduce the bit rate, High Efficiency AAC (HE-AAC) [4] combines an AAC core in the low frequency band with a parametric coding approach for the high frequency band (Spectral Band Replication, SBR). The high frequency band is reconstructed from replicated low frequency signal portions and is controlled by parameter sets containing level, noise and tonality adjustment parameters.
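The replicate-and-adjust idea of SBR can be illustrated with a toy magnitude spectrum. This is only a sketch under strong assumptions: real SBR operates on QMF subband samples and also transmits noise and tonality parameters, which are ignored here; `toy_copy_up` and its arguments are hypothetical. Low-band bins are copied into each high-band sub-band and scaled to a transmitted level.

```python
import math


def toy_copy_up(low_band, target_energies, band_width):
    """Toy high-band reconstruction in the spirit of SBR.

    low_band: magnitudes of the decoded low-frequency bins.
    target_energies: one transmitted energy per high-band sub-band.
    band_width: number of bins per high-band sub-band.
    """
    high_band = []
    for band, target in enumerate(target_energies):
        # Replicate low-band bins (cyclically) into this high-band sub-band.
        start = band * band_width
        source = [low_band[(start + k) % len(low_band)] for k in range(band_width)]
        # Scale the copy so that its energy matches the transmitted level.
        energy = sum(v * v for v in source)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high_band.extend(gain * v for v in source)
    return high_band


print(toy_copy_up([1.0, 2.0, 3.0, 4.0], [20.0, 100.0], band_width=2))
# prints [2.0, 4.0, 6.0, 8.0]
```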

HE-AAC inherits its generic M/S stereo and multichannel coding capabilities from AAC. The further enhanced HE-AACv2 provides a parametric stereo coding tool, which extracts binaural cues from the input channels; these cues are transmitted in addition to a mono downmix. Similarly, HE-AAC can be paired with MPEG Surround, a generic parametric multichannel extension, to code multiple channels efficiently [6].

3. UNIFIED SPEECH AND AUDIO CODING (USAC)
This section describes the context of the USAC development and gives a brief overview of its reference model version 0 (RM0).

3.1. Motivation for USAC
At the 82nd MPEG meeting in Shenzhen, China, in 2007, the MPEG Audio Subgroup issued a Call for Proposals (CfP) on Unified Speech and Audio Coding [1]. The aim of this activity is to standardize an audio codec which performs consistently and equally well for speech, music and mixed content over a large bit rate range, and at the same time reaches the quality of the best performing state-of-the-art codecs for each type of content.

As a characteristic feature, the core coder derived from AMR-WB+ applies an LPC tool to the input signal as one of the first processing steps, so that the rest of the signal flow operates in an LPC-filtered domain (LPD). The signal is then encoded with either time domain or frequency domain coding, and the switch between the two modes can be made on a frame-by-frame basis. The time domain mode of the LPD coder is based on ACELP technology, which is known for its excellent speech coding capability. Alternatively, an MDCT-based transform coder codes the weighted LPC-filtered signal, similar to the TCX known from AMR-WB+. To distinguish the original TCX from the new MDCT-based coding used in RM0, the latter is called weighted Linear Predictive Transform (wLPT) coding throughout this paper. The LPD coding path is typically activated for speech-like signals.
The other, AAC-derived, non-LPD coding algorithm is used for other general audio and music signals. At high bit rates it is usually the only core coder activated, because it scales towards transparency, as known from existing MPEG technologies.

Seven candidates were submitted in response to the CfP. Their evaluation was based on nine listening tests covering bit rates from 12 kbps mono up to 64 kbps stereo and assessing performance on all signal types. Eventually, a candidate designed jointly by VoiceAge Corporation and Fraunhofer IIS was selected as the RM0 for USAC. At this stage, the proposed technology already performed as well as or better than the two state-of-the-art reference codecs at all test points.

3.2. RM0 system overview
The technology in USAC RM0 combines state-of-the-art MPEG technology such as AAC, SBR and MPEG Surround with state-of-the-art LPC-based speech coder technology such as ACELP and TCX. At the core of USAC RM0 is a hybrid coding scheme with two core codecs, derived from AAC and AMR-WB+. A mode switch selects one of the two coders (see Figure 1).

Figure 1: Simplified encoder diagram of the USAC RM0

For bandwidth extension and parametric stereo coding, modified and enhanced versions of SBR [4] and MPEG Surround [6] are used on top of the switched AAC and ACELP/wLPT core. More details on USAC reference model 0 can be found in [7, 8].

4. THE NEW SET OF WINDOWS DEVELOPED FOR USAC
Using two fundamentally different coding paradigms in one unified system poses a series of problems at the transition points where one core codec switches over to the other: the risk of blocking artifacts, the possible overhead of information required by transitions, and the need for constant framing. In the USAC framework all this is particularly challenging because the non-LPD core codec uses an MDCT. The MDCT allows adjacent blocks to overlap by up to 50% without introducing additional overhead. This is particularly helpful to smooth blocking artifacts, but it introduces Time Domain Aliasing (TDA), which has to be canceled out during synthesis [5]. This Time Domain Aliasing Cancellation (TDAC) is achieved by an adequate overlap-add operation of adjacent MDCT blocks on the synthesis side.

In USAC, however, adjacent blocks can be coded using the LPD coder, which has either:
a) Time Domain Aliasing (TDA) in a weighted LPC domain (not in the signal domain), or
b) no TDA at all.
To allow proper aliasing cancellation with the non-LPD mode (which introduces aliasing in the signal domain), the required aliasing components must be converted into the signal domain (case a) or introduced artificially by simulating the MDCT operations of analysis windowing, folding, unfolding and synthesis windowing (case b). Another solution to this problem is the design of MDCT analysis/synthesis windows without a TDAC region. The overlap-add operation is then the same as a simple cross-fade over the range of the window slope. Both methods are used in USAC RM0.
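The two aliasing-cancellation mechanisms described above can be checked numerically with a minimal pure-Python sketch (not the USAC implementation). It uses a direct O(N^2) MDCT with a sine window (one of the AAC window shapes) and a toy size N = 8; all function names are hypothetical. First, two 50%-overlapping MDCT frames reconstruct their common N samples by overlap-add. Second, for case b, a block with no TDA (time samples known exactly, as for an ACELP synthesis) receives artificial aliasing by windowing, folding, unfolding and windowing again, which cancels the aliasing of the following MDCT frame just as well.

```python
import math
import random


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    n_coeffs = len(block) // 2
    n0 = 0.5 + n_coeffs / 2  # phase offset that makes TDAC work
    return [
        sum(window[n] * block[n] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for n in range(len(block)))
        for k in range(n_coeffs)
    ]


def imdct(coeffs, window):
    """Windowed IMDCT: N coefficients -> 2N time-aliased samples."""
    n_coeffs = len(coeffs)
    n0 = 0.5 + n_coeffs / 2
    return [
        window[n] * (2.0 / n_coeffs) * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def artificial_tda_right(segment, right_window):
    """Case b: give known time samples the aliasing that the right half of an
    MDCT frame would carry. For this MDCT phase, folding then unfolding the
    right half adds its time-reversed copy: w * (v + reverse(v)), v = w * x."""
    v = [w * s for w, s in zip(right_window, segment)]
    return [w * (a + b) for w, a, b in zip(right_window, v, reversed(v))]


N = 8  # toy transform size (AAC uses 1024 or 128)
window = sine_window(2 * N)
random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(3 * N)]

# Two MDCT frames hopping by N samples; the middle N samples overlap.
frame_a = imdct(mdct(x[0:2 * N], window), window)
frame_b = imdct(mdct(x[N:3 * N], window), window)
tdac = [a + b for a, b in zip(frame_a[N:], frame_b[:N])]

# Same overlap, but the first block has no TDA of its own: its aliasing is
# simulated in the time domain and still cancels the MDCT frame's aliasing.
acelp_like = artificial_tda_right(x[N:2 * N], window[N:])
artificial = [a + b for a, b in zip(acelp_like, frame_b[:N])]

print(max(abs(p - q) for p, q in zip(tdac, x[N:2 * N])))        # rounding-level error
print(max(abs(p - q) for p, q in zip(artificial, x[N:2 * N])))  # rounding-level error
```

The artificial term is identical to the second half of `frame_a`, which is why the overlap-add result is the same in both cases; a single frame on its own, by contrast, still contains aliasing.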
In order to get the necessary and appropriate overlap areas for cross-fade and TDAC, a slightly different time alignment between the two coding modes had to be introduced, as explained in section 4.4.

4.1. Categories of Windows
The complete set of windows is divided into 4 categories depending on the coding mode of the previous, current and following frames:
- The first category of windows is used when the core coder stays in the LPD mode. This case is presented in section 4.2.
- The second category of windows is used when the core coder stays in the non-LPD mode. This case is presented in section 4.3.
- The last two categories deal with transitions between the two coding modes. Transitions from the non-LPD mode to the LPD mode are presented in section 4.4.1. Transitions from the LPD mode to the non-LPD mode are presented in section 4.4.2; two special cases are considered there, depending on the coding mode (wLPT or ACELP) of the LPD frame.

Figure 2 represents a basic scheme for switching back and forth between the LPD and non-LPD modes. In the presented case, the LPC-processed block corresponds to four AMR-WB+ frames or ACELP frames (of 256 samples each).

4.2. LPD mode to LPD mode
When comparing the LPD mode in USAC to the original AMR-WB+ codec, it can be seen that the TCX filterbank was replaced by an MDCT. In wLPT, the aliasing is computed in the weighted LPC domain (i.e. after the weighting filter W(z)), as shown in Figure 3. Therefore, the original window switching procedure presented in section 5.3 of [2] had to be modified. In the original TCX, the right-hand slope of the window covers 1/9 of the entire window length, and the left-hand slope length is equal to the right-hand slope length of the previous frame to achieve perfect reconstruction. In wLPT, the use of the MDCT allows larger and homogeneous overlap regions; therefore the overlap size is fixed to 128 samples.

Figure 3: MDCT computation for the wLPT: aliasing occurs in the weighted LPC domain

4.3. Non-LPD mode to non-LPD mode
If both the previous and the following frames are encoded in the non-LPD mode, the regular AAC transform windows presented in Table 1 are generally used. Short windows, which have a better time resolution, are used to avoid pre- or post-echoes on transients caused by temporal smearing of the quantization noise. However, since coding efficiency is generally worse at lower frequency resolution, the coder switches to short windows only when necessary and back to normal long windows as soon as possible. Unfortunately, the windowing specification presented in [4] does not allow switching from short windows to an isolated long window and immediately back to short windows. Instead, transition windows, called stop and start windows, must be used to switch from short to long windows and vice versa.

In addition to the regular AAC transition windows, a new combined stop_start window producing an isolated long block was introduced in USAC RM0. Figure 4 shows this new window, which is useful to avoid undesirable latency for closely spaced transients, e.g. transients occurring at a time interval longer than one long frame but shorter than two. In such a case it would make no sense to apply the short-stop-start-short sequence authorized by the AAC standard, because the codec might miss the transient and would also apply short windows to a possibly stationary signal.

Figure 4: Stop_start window sequence

4.4. Transitions between LPD and non-LPD
The following transition procedures represent the main innovation of this paper. The main challenge is to connect two different domains smoothly.
Section 4.4.1 presents transitions from the non-LPD to the LPD mode, while section 4.4.2 presents transitions from the LPD to the non-LPD mode.

4.4.1. Non-LPD mode to LPD mode
The LPD codec includes some predictors and internal filters which, during start-up, need a short time to reach a state that ensures an accurate synthesis. Using a rectangular window at the beginning of the first LPD frame and resetting the LPD-based codec to a zero state is therefore not a good option for these transitions, because it would not leave enough time for the LPD codec to build up a good signal and, as a result, would introduce blocking artifacts.

Using a rectangular window but properly resetting the internal state of the LPD codec, including the filter memories and the adaptive codebook used by ACELP, from past synthesis samples of the previous non-LPD frame was also considered. This operation requires, among other things, decoding the previous non-LPD frame, performing an LPC analysis, and applying the LPC analysis filter to the non-LPD synthesis signal. The resulting quality improvement is, however, minimal, and hence this approach was not pursued further given the large increase in complexity.

Introducing time domain aliasing in the original signal before LPD coding is not feasible either, because time domain aliasing is not compatible with prediction-based time domain coding such as ACELP. Another possibility was to introduce artificial aliasing at the beginning of the LPD segment and to apply TDAC in the same way as for ACELP to non-LPD transitions (see 4.4.2). However, in this case the artificial aliasing is produced from the synthesis signal instead of the original one. Since the synthesis signal is inaccurate, especially at the LPD start-up, introducing artificial TDA would emphasize this error rather than reduce artifacts.
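The start-up problem of predictive filters can be illustrated with a toy LPC analysis filter. The sketch assumes an ideal second-order predictor for a pure sinusoid, a stand-in for the LPD predictors and filters rather than the actual USAC ones; `lpc_residual` is a hypothetical helper. When the filter memory holds the true preceding samples, the prediction residual is zero from the first sample on; when the memory is reset to zero, the first samples carry large prediction errors until the memory fills with real samples.

```python
import math


def lpc_residual(frame, coeffs, memory):
    """LPC analysis filter e[n] = x[n] - sum_k a_k * x[n-1-k].

    memory holds the samples preceding the frame (oldest first) and must
    contain at least len(coeffs) values."""
    history = list(memory)
    residual = []
    for sample in frame:
        prediction = sum(a * history[-1 - k] for k, a in enumerate(coeffs))
        residual.append(sample - prediction)
        history.append(sample)
    return residual


OMEGA, PHASE = 2 * math.pi * 0.05, 1.0
signal = [math.sin(OMEGA * n + PHASE) for n in range(-2, 64)]
past, frame = signal[:2], signal[2:]

# Ideal 2nd-order predictor for a sinusoid: x[n] = 2cos(w) x[n-1] - x[n-2].
coeffs = [2 * math.cos(OMEGA), -1.0]

warm = lpc_residual(frame, coeffs, past)        # memory from the past signal
cold = lpc_residual(frame, coeffs, [0.0, 0.0])  # memory reset to zero

print(max(abs(e) for e in warm))  # rounding-level: prediction is exact
print(cold[:2])                   # large start-up errors
```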

To avoid these problems, a modified start window without any time domain aliasing on its right side was designed. The right part of this window, which is represented in Figure 5, finishes before the centre of the TDA (i.e. the folding point) of the MDCT. Consequently, the modified start window is free of time domain aliasing on its right side. Compared to the standard short window, which has an overlap of 128 samples (including TDA), the overlap region of the modified start window is reduced to 64 samples. This overlap region is nevertheless sufficient to smooth the blocking effect. Furthermore, it reduces the impact of the inaccuracy due to the start-up of the LPD coder by feeding it with a faded-in input. Note that this transition requires an overhead of 64 samples, i.e. 64 samples are coded by both the non-LPD codec and the LPD codec. This results in a small difference in alignment between the non-LPD and the LPD core codecs. This small misalignment is compensated when the codec switches back to the non-LPD codec, as explained in section 4.4.2.

Figure 5: Window scheme for transitions from the non-LPD mode to the LPD mode

4.4.2. LPD mode to non-LPD mode
This section describes transitions from the LPD to the non-LPD mode. Two cases are considered: first, transitions from wLPT to the non-LPD mode; then, transitions from ACELP to the non-LPD mode. The examples are given for the stop-like case only, i.e. when the second half of the first non-LPD window has a 1024-sample overlap. As explained in section 5, the complete family of transition windows also includes the stop_start-like case, i.e. when the second half of the first non-LPD window has a 128-sample overlap.

wLPT to non-LPD mode
Figure 6 shows an example of a transition from the LPD to the non-LPD mode when the previous frame was coded using wLPT. Unlike ACELP, which only codes the samples inside its frame, wLPT naturally provides some overlap beyond the end of the (previous) wLPT frame. Therefore, the standard AAC window length (1024-sample kernel) can be used and critical sampling is preserved. Note that in this case the overlap provided by wLPT is sufficient to compensate the misalignment between the LPD and non-LPD core codecs.

Figure 6: Transitions from the LPD to non-LPD mode, when the previous frame was coded using wLPT

During these transitions, the TDA introduced by the non-LPD mode in the overlap region (which is 128 samples long) has to be canceled out. The normal way to do this is to use the aliasing present in the previous frame, that is, in the overlap part of the MDCT-based wLPT frame. But TDAC is not straightforward in this case, because the aliasing in the two frames was produced in two different domains (LPD and non-LPD mode). The MDCT in wLPT is not computed directly in the signal domain, but after filtering the signal with a filter W(z) based on the LPC coefficients. W(z) is called the weighted analysis filter; it both whitens the input signal and shapes the quantization noise with a formant-based curve, in line with psychoacoustic principles. Therefore, in order to have the aliasing contribution of the wLPT overlap part in the same domain as in AAC, i.e. in the signal domain, the weighting filter W(z) is moved between the folding operation and the DCT-IV of the MDCT in wLPT. Figure 7 shows this modification compared to Figure 3, in the specific case of overlap-and-add with a non-LPD frame.
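A toy check of why W(z) has to sit between the folding operation and the DCT-IV. The sketch looks only at the overlap region (here N = 16 samples instead of 128), replaces each MDCT/IMDCT pair by its exact windowing, folding, unfolding and windowing equivalent (valid without quantization, so the DCT-IV and its inverse are omitted), and uses a hypothetical first-order W(z) = 1 - 0.9 z^-1 with zero filter states instead of the LPC-based weighting filter. All function names are illustrative. With W(z) moved between folding and DCT-IV, the wLPT and non-LPD contributions add up to the original samples; with W(z) applied before the MDCT (the arrangement of Figure 3), the aliasing stays in the weighted domain and does not cancel.

```python
import math

B1 = 0.9  # hypothetical weighting coefficient: W(z) = 1 - B1 z^-1


def weight(x):
    """Apply W(z) = 1 - B1 z^-1 with zero initial state."""
    return [s - B1 * (x[n - 1] if n else 0.0) for n, s in enumerate(x)]


def unweight(y):
    """Apply 1/W(z) with zero initial state; exactly undoes weight()."""
    out = []
    for n, s in enumerate(y):
        out.append(s + B1 * (out[n - 1] if n else 0.0))
    return out


def fold_right(v):
    """Fold the right half [c, d] of an MDCT block into c + reverse(d)."""
    h = len(v) // 2
    return [a + b for a, b in zip(v[:h], reversed(v[h:]))]


def unfold_right(f):
    """Unfold back to [f, reverse(f)], i.e. v + reverse(v)."""
    return f + f[::-1]


def mul(a, b):
    return [p * q for p, q in zip(a, b)]


def non_lpd_left_half(u, w):
    """Non-LPD first-half contribution: w * (w*u - reverse(w*u))."""
    v = mul(w, u)
    return [p * (a - b) for p, a, b in zip(w, v, reversed(v))]


def wlpt_right_half_moved_w(u, w):
    """Transition arrangement: window, fold, W(z), (DCT-IV omitted), then
    1/W(z), unfold, window on the decoder side."""
    folded = weight(fold_right(mul(w, u)))
    return mul(w, unfold_right(unweight(folded)))


def wlpt_right_half_weighted_domain(u, w):
    """Figure 3 arrangement: W(z) before the MDCT and 1/W(z) after
    synthesis, so the aliasing term lives in the weighted domain."""
    v = mul(w, weight(u))
    return unweight(mul(w, unfold_right(fold_right(v))))


N = 16  # toy overlap length
rising = [math.sin(math.pi * (m + 0.5) / (2 * N)) for m in range(N)]
falling = [math.cos(math.pi * (m + 0.5) / (2 * N)) for m in range(N)]
u = [math.cos(0.3 * m) + 0.5 for m in range(N)]  # overlap-region samples

aac_part = non_lpd_left_half(u, rising)
err_moved = max(abs(a + b - s) for a, b, s in
                zip(wlpt_right_half_moved_w(u, falling), aac_part, u))
err_weighted = max(abs(a + b - s) for a, b, s in
                   zip(wlpt_right_half_weighted_domain(u, falling), aac_part, u))
print(err_moved, err_weighted)  # rounding-level vs. clearly non-zero
```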

Figure 7: Time domain aliasing introduction in the wLPT overlap segment for wLPT to non-LPD mode transitions

ACELP to non-LPD mode
Figure 8 shows an example of a transition from the LPD mode to the non-LPD mode when the previous frame was encoded using ACELP. This case is characterized by a transition from a codec operating in the LPC residual domain to a codec operating directly in the signal domain. The time domain aliasing introduced by the AAC codec in the overlap region (which is 128 samples long) is kept unchanged. The corresponding aliasing, required for perfect reconstruction, is artificially introduced in the right end of the synthesized ACELP frame delivered by the LPC-based codec. Time domain aliasing cancellation is achieved by windowing, folding, unfolding and windowing again the ACELP contribution, then overlap-adding it to the non-LPD mode contribution, in an MDCT/IMDCT manner. This process is illustrated in Figure 9. This approach requires the introduction of only 64 overhead samples.

Figure 8: Window scheme for transitions from the LPD mode to the non-LPD mode, when the previous frame was coded using ACELP

Figure 9: Artificial time domain aliasing introduction and time domain aliasing cancellation for ACELP to non-LPD mode transitions

Unlike wLPT, ACELP does not provide any overlap region which can be used to compensate the difference in alignment between the LPD and the non-LPD modes. Since the non-LPD mode is very flexible in terms of window length, number of transmitted spectral coefficients and bit allocation, it was chosen to let this codec provide the necessary number of overhead samples. The window size for this transition has therefore been enlarged from 2048 to 2304 samples. The flat region on the left side of this window is 128 samples longer than the standard length of 448 samples, to compensate for the misalignment. A new MDCT kernel of 1152 samples was consequently introduced in the USAC codec.

5. OVERVIEW OF THE NEW FAMILY OF WINDOWS
Figure 10 shows all possible windows and the allowed changeovers between windows. Here, circles represent the different windows; lines indicate allowed sequences of windows. Please note that circles marked with an asterisk (*) actually represent a group of windows, which all share the same characteristics in terms of underlying transform length (short or long), window slope (short or long) and coding mode (LPD or non-LPD), but may vary in detail depending on the adjacent windows. The general appearance is hinted at by small icons in the lower part of the circles. For example, as shown in section 4.4.2, two stop windows are available depending on the core mode of the previous frame, one of which has an increased transform length (1152 instead of 1024), but the overall appearance is the same for both (non-LPD mode, short slope on the left, long transform, long slope on the right). Similarly, there are two start windows, depending on the following core mode. For the stop_start window, as many as four different variants are conceivable.

The lines indicating the allowed succession of windows are labeled with a three-digit code [x1 x2 x3], which indicates the condition under which a particular line is followed. The first digit x1 indicates whether the frame will be coded in the LPD mode (=1) or in the non-LPD mode (=0). If in non-LPD mode (first digit is 0), the second digit x2 indicates whether an attack (i.e. a transient audio event) is present in the frame (=1) or not (=0), essentially triggering the use of short windows. The last digit x3 looks one frame further ahead and indicates whether the frame following the current frame will be a non-LPD frame without attacks (=0) or whether it contains an attack or will be coded in LPD mode (in both cases =1). A wildcard in a digit position means that the value can be either 0 or 1 and is ignored. As an example, if the encoder decides to encode the next frame in non-LPD mode (x1 = 0), this frame is reasonably stationary (no attacks, x2 = 0), and the frame after it will be coded in LPD mode (x3 = 1), the encoder follows the line [001]. The meaning of the digits is summarized in Table 2.

Table 2: Legend for the window state transitions of Figure 10
x1, Mode decision: 0 = non-LPD; 1 = LPD
x2, Attack in present frame?: 0 = No; 1 = Yes; wildcard = ignored
x3, Following frame: 0 = attack-free, non-LPD; 1 = attack or LPD mode; wildcard = ignored

6. CONCLUSION
This paper presented a new family of windows designed for transitions between an LPC-domain codec such as AMR-WB+ and a purely transform-based codec such as HE-AAC.
The major problem solved was the transition between a frequency domain codec with time domain aliasing (such as both wLPT and AAC) and a time domain codec which is normally incompatible with time domain aliasing because of its use of long-term prediction (ACELP). The proposed family of windows provides smooth transitions (no blocking artifacts) and introduces a minimum of overhead (in terms of the number of samples coded twice in the overlap regions and in terms of bits transmitted). This undoubtedly contributed to the success of the candidate technology selected as reference model version 0 of the MPEG unified speech and audio codec USAC.

7. REFERENCES
[1] ISO/IEC JTC1/SC29/WG11 MPEG2007/N9519, "Call for Proposals on Unified Speech and Audio Coding."
[2] 3GPP Technical Specification TS 26.290, "Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions," March 2007.
[3] 3GPP Technical Specification TS 26.171, "Adaptive Multi-Rate Wideband (AMR-WB) speech codec; General description," 2002.
[4] ISO/IEC, "Information technology: Coding of audio-visual objects, Part 3: Audio."
[5] J. Princen and A. Bradley, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, no. 5, Oct. 1986.
[6] ISO/IEC, "Information technology: MPEG audio technologies, Part 1: MPEG Surround," 2007.
[7] M. Neuendorf et al., "A Novel Scheme for Low Bitrate Unified Speech and Audio Coding - MPEG RM0," paper submitted to the 126th AES Convention, Munich, Germany, May 2009.
[8] M. Neuendorf, P. Gournay, M. Multrus, J. Lecomte, B. Bessette, R. Geiger, S. Bayer, G. Fuchs, J. Hilpert, N. Rettelbach, R. Salami, G. Schuller, R. Lefebvre, and B. Grill, "Unified Speech and Audio Coding Scheme for High Quality at Low Bitrates," accepted for publication at ICASSP 09, Taipei, Taiwan.

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing. SAOC and USAC Spatial Audio Object Coding / Unified Speech and Audio Coding Lecture Audio Coding WS 2013/14 Dr.-Ing. Andreas Franck Fraunhofer Institute for Digital Media Technology IDMT, Germany SAOC

More information

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS Technical PapER Extended HE-AAC Bridging the gap between speech and audio coding One codec taking the place of two; one unified system bridging a troublesome gap. The fifth generation MPEG audio codec

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

ELL 788 Computational Perception & Cognition July November 2015

ELL 788 Computational Perception & Cognition July November 2015 ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog)

More information

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding INTERNATIONAL STANDARD This is a preview - click here to buy the full publication ISO/IEC 23003-3 First edition 2012-04-01 Information technology MPEG audio technologies Part 3: Unified speech and audio

More information

The MPEG-4 General Audio Coder

The MPEG-4 General Audio Coder The MPEG-4 General Audio Coder Bernhard Grill Fraunhofer Institute for Integrated Circuits (IIS) grl 6/98 page 1 Outline MPEG-2 Advanced Audio Coding (AAC) MPEG-4 Extensions: Perceptual Noise Substitution

More information

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define

More information
