Convention Paper 7215

Audio Engineering Society
Convention Paper 7215
Presented at the 123rd Convention, 2007 October 5-8, New York, NY, USA

The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York, USA. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

A Very Low Bit Rate Protection Layer to Increase the Robustness of the AMR-WB+ Codec against Bit Errors

Philippe Gournay
Université de Sherbrooke, 2500 boul. de l'Université, Sherbrooke (Québec) J1K 2R1, Canada
Philippe.Gournay@USherbrooke.ca

ABSTRACT
Audio codecs face various channel impairments when used in challenging applications such as digital radio. The standard AMR-WB+ audio codec includes a concealment procedure to handle lost frames. It is also inherently robust to bit errors, although some bits within any given frame are more sensitive than others. Motivated by this observation, the present paper makes two contributions. First, a detailed study of the sensitivity of individual bits in AMR-WB+ frames is provided. All the bits in a frame are then divided into three sensitivity classes so that efficient unequal error protection (UEP) schemes can be designed. Then, a very low bit rate protection layer to increase the robustness of the codec against bit errors is proposed and assessed using the results of subjective audio quality tests. Remarkably, in contrast to the standard codec, where some errors have a very discernible effect, the protection layer ensures that the decoded audio is free of major channel artifacts even at a significant 0.5% bit error rate.

1. INTRODUCTION
The AMR-WB+ audio codec [1-3] uses a hybrid coding model that switches automatically, depending on the characteristics of the input signal, between an ACELP (Algebraic Code-Excited Linear Prediction) and a TCX (Transform Coded eXcitation) coding model. AMR-WB+ performs well for speech as well as for music, accepts both mono and stereo inputs, accommodates a wide audio bandwidth range (from 8 to 48 kHz), and is scalable in bit rate from 6 to 36 kbps for mono and 7 to 48 kbps for stereo encoding. Moreover, it is backward compatible with the AMR-WB/G.722.2 standard [4], which was the first speech codec to be adopted for both wireless and wireline services.

AMR-WB+ was standardized in 2004 for streaming and multimedia messaging services in Global System for Mobile communications (GSM) and Third Generation (3G) cellular systems by the 3rd Generation Partnership Project (3GPP). This codec has also been standardized as a low bit rate audio option for DVB-H Mobile TV applications [5]. Currently, because of its excellent performance at low bit rates, AMR-WB+ is also drawing increasing interest for other applications such as digital radio.

Depending on the application, audio codecs have to face various channel impairments that typically translate into lost frames/packets and/or bit errors. AMR-WB+ includes a frame loss concealment procedure to help mitigate the impact of lost frames. AMR-WB+ is also inherently robust to bit errors. However, as with any other codec, some bits within a given frame are more sensitive than others, in the sense that errors in these bits cause a greater degradation of the decoded and perceived sound quality.

With this in mind, the present paper makes two contributions. First, a detailed study of the sensitivity of individual bits in AMR-WB+ frames is provided. This information is required to design efficient unequal error protection (UEP) schemes. Then, a very low bit rate protection layer is proposed to increase the robustness of the codec to bit errors, and it is assessed using subjective audio quality tests. The protection layer adds only a few bits to the AMR-WB+ frame, which typically contains hundreds of bits of encoded audio.

This paper is organized as follows. Section 2 gives the necessary insight into the AMR-WB+ codec architecture and bitstream structure. Section 3 presents the results of the bit sensitivity study, including a classification of AMR-WB+ bitstream frames into three sensitivity classes. The protection scheme is then presented in section 4. Finally, the results of a subjective quality evaluation are presented in section 5 and conclusions are drawn in section 6.

2. THE AMR-WB+ CODEC
This section gives a brief overview of the AMR-WB+ codec, emphasizing its multi-mode nature, its flexible variable-length frame structure, and the embedded organization of its bitstream.

2.1. Overview of the codec
AMR-WB+ is a hybrid codec that switches between a time-domain coding model and a transform-domain coding model.
The time-domain coding model is actually the AMR-WB 3GPP mandatory standard for wideband speech communication [4] (also standardized by the ITU-T as G.722.2), which is a multi-rate codec for wideband speech sampled at 16 kHz that uses ACELP (Algebraic Code-Excited Linear Prediction). The transform coding model is called Transform Coded eXcitation (TCX) [6] and is designed to switch seamlessly to and from the ACELP coding model. As shown in Fig. 1, the AMR-WB+ encoder selects between the ACELP and TCX coding models based on the characteristics of the input signal. Mode selection can be done either in closed loop, in which case the coding model that maximizes a perceptually weighted Signal-to-Noise Ratio (SNR) is selected, or in open loop for reduced complexity.

Fig. 1: An overview of the AMR-WB+ encoder (bandwidth extension and stereo extension not shown)

Under normal operation, the input audio signal is first down-mixed to mono and down-sampled to 25.6 kHz. It is further decomposed into two bands: a lower band (0 to 6.4 kHz) sampled at 12.8 kHz, and an upper band containing all frequencies between 6.4 and 12.8 kHz. The lower band is segmented into super-frames of 1024 samples that are in turn segmented into four short frames of 256 samples. These frames are then fed to the core ACELP/TCX coder for mode selection. A super-frame is subsequently encoded using one of the 26 possible combinations of four core coding modes, these modes being: ACELP spanning one frame, and TCX spanning one frame (short TCX), two frames (medium TCX) or four frames (long TCX). Three of those 26 possible coding configurations are shown in Fig. 2.

Fig. 2: Three out of the 26 possible coding configurations: (a) four ACELP frames, (b) one short TCX frame followed by one ACELP and one medium TCX frame, (c) one long TCX frame

The packetization process (also called multiplexing), which is critical for transmission, consists of building four packets from one encoded super-frame. For the ACELP and short TCX coding modes, packetization is rather straightforward as one coded frame fills exactly one packet. For medium and long TCX, however, coded frames need to be split between several packets. In those cases, the packetization process takes into account the possibility of losing some packets. In the case of long TCX frames, for example, some important parameters are duplicated and sent in several packets to avoid losing an entire super-frame when a single packet is lost.

Not represented in Fig. 1 are the bandwidth and stereo extensions. The upper band (6.4 kHz to 12.8 kHz) is encoded at a very low bit rate (800 bits/s) using a parametric approach called BandWidth Extension (BWE). BWE is based on spectral folding and spectral envelope shaping (using an LP filter). Proper scaling is also applied to ensure continuity between the lower and upper frequency bands.

The stereo image of the input audio signal is encoded using a mid/side representation and a sub-band coding approach. The lower band (0 to 6.4 kHz) of the mid signal is encoded using the hybrid ACELP/TCX model described above for mono signals. Regarding the side signal, its lower band (up to 1 kHz) is encoded using a waveform coding approach similar to the core codec except that the ACELP coding mode is not used. Four stereo coding modes are available: short, medium and long TCX, plus a special short TCX mode that uses pre-echo reduction to improve transients. Note that the stereo coding mode is independent of the core coding mode. A balance factor that represents the ratio between the mid and the side signals is also transmitted. The middle band (up to 6.4 kHz) of the side signal is encoded using a time-domain filtering approach that resembles an inter-channel predictive technique.
For the upper band (6.4 kHz to 12.8 kHz), BWE is applied twice, once for each channel (left and right).

The codec's attributes (bit rate and audio bandwidth) are controlled by two input parameters: the mode index and the Internal Sampling Frequency (ISF). The mode index sets the number of bits per frame, and determines how that number of bits is shared between the core codec and the optional stereo extension (there are 47 possible combinations). The ISF parameter is used to tweak the bit rate and the bandwidth of the codec. By default the internal sampling frequency of the codec is 25.6 kHz, which sets the frame duration at 20 ms. The internal sampling frequency can be altered by a factor varying between 0.5 and 1.5. The frame duration (in ms), and consequently the bit rate (in kbps), changes accordingly.

2.2. The AMR-WB+ bitstream
The AMR-WB+ bitstream is organized as shown in Fig. 3. A packet begins with the core coding mode, which is either 0 for ACELP or 1, 2 or 3 for short, medium and long TCX, respectively. Then, there is the core (ACELP or TCX) bitstream. For mono signals, the packet ends with the bandwidth extension information. For stereo signals, however, when the mode index calls for it, an optional stereo extension is inserted between the core bitstream and the bandwidth extension. That extension contains first the stereo coding mode, then the stereo low band, mid band and bandwidth extension.

Fig. 3: Embedded Structure of the AMR-WB+ Bitstream (one packet). Core codec fields: Core Mode, Core Bitstream (ACELP or TCX), BWE (mono/right). Stereo extension fields: Stereo Mode, Low-band, Mid-band, BWE (left).

Fig. 3 represents one single packet only. As explained in section 2.1, for medium and long TCX, the packetization procedure is responsible for distributing the bitstream among the required number of packets.

2.3. File format headers
The 3GPP software simulation that is used as a reference to check compliance with the standard provides support for two different bitstream file formats.
These file formats contain an additional header either for each packet (in the AMR-WB+ Transport Interface Format) or for each group of four packets that corresponds to a super-frame (in the AMR-WB+ file storage format). This header is mainly used to indicate to the decoder the coding parameters (coding mode, ISF). In the case of the Transport Interface Format, a transport frame index (from 0 to 3) gives the position of the frame within the super-frame.
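The relation between the ISF factor, the frame duration and the bit rate described in section 2 can be checked with a few lines of arithmetic. The sketch below assumes that the super-frame duration (four 20 ms frames at the default ISF) scales inversely with the ISF factor; the function names are illustrative only, and the bit counts are the super-frame sizes quoted later in the bit sensitivity study.

```python
# Sketch only: assumes super-frame duration = 80 ms / ISF factor.
SUPERFRAME_MS_AT_DEFAULT_ISF = 80.0  # four frames of 20 ms each


def superframe_duration_ms(isf_factor: float) -> float:
    """Duration of one super-frame (four frames) for a given ISF factor."""
    return SUPERFRAME_MS_AT_DEFAULT_ISF / isf_factor


def bit_rate_kbps(bits_per_superframe: int, isf_factor: float) -> float:
    """Bit rate in kbps; bits per millisecond is the same as kbit per second."""
    return bits_per_superframe / superframe_duration_ms(isf_factor)


if __name__ == "__main__":
    # (bits per super-frame, ISF factor) for the three studied configurations
    for bits, isf in [(832, 0.8333), (1696, 1.125), (1920, 1.333)]:
        print(f"{bits} bits, ISF={isf}: {bit_rate_kbps(bits, isf):.2f} kbps")
```

With these inputs the 832-bit and 1920-bit super-frames come out at roughly 8.67 and 32 kbps, which matches the rates quoted for those configurations.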

It is important to note that the content of those headers is not taken into consideration in the following bit sensitivity study. The information contained in these headers is obviously critical, as it determines the size of the packets. It is therefore most likely to be sent separately to the decoder. For an application that requires a fixed configuration (fixed bit rate and audio bandwidth), for example, the mode index and ISF could be sent to the decoder with a high level of protection during session initiation only.

3. BIT SENSITIVITY STUDY
As shown in section 2, encoded AMR-WB+ frames are composed of a core (mono) part and an optional stereo extension. In this section, the sensitivity of individual bits in AMR-WB+ frames is closely examined. Since the structure of encoded frames depends on the coding mode, a sensitivity study is done for each core mode and for each stereo mode. All the bits in a frame are then divided into three sensitivity classes (class A/B/C with high/moderate/little-or-no sensitivity respectively), as is usually done for speech codecs.

3.1. Impact of Bit Errors
The impact of bit errors most likely depends on the type of audio signal. To assess the sensitivity of AMR-WB+ bits, we therefore selected a one-minute stereo recording with moderately rich mixed content (speech over music). We then conducted the experiments described in sections 3.1.1 (for the core codec) and 3.1.2 (for the stereo extension) for three different sets of coding parameters that we considered representative of the operating range of AMR-WB+:
8.67 kbps mono (mode index=16, ISF=0.8333);
23.85 kbps stereo (mode index=37, ISF=1.125);
32 kbps stereo (mode index=40, ISF=1.333).

3.1.1. Core coder
Instead of using the closed-loop or the open-loop mode selection of the encoder, we forced it to use the same coding mode over the entirety of the recording. This situation is somewhat artificial, but it provides a convenient way to study the sensitivity of individual bits.
At the super-frame level, we therefore tested the following four coding mode configurations: four ACELP; four short TCX; two medium TCX; one long TCX.

For each set of coding parameters and for each coding mode configuration, we assessed the sensitivity of every bit within the super-frame. There are 832 bits at 8.67 kbps, 1696 bits at 23.85 kbps, and 1920 bits at 32 kbps (including the optional stereo extension, which is in fact studied independently in section 3.1.2). As mentioned in section 2.3, we did not take into account the file format headers. The sensitivity of a given bit was assessed by systematically (i.e. for all the super-frames of the recording) inverting that bit before decoding the bitstream, then computing the segmental Signal-to-Noise Ratio (SNR) of the decoded audio with respect to the audio signal decoded without errors.

Fig. 4, which can be found after the references section, shows the SNR as a function of the bit position for the 8.67 kbps mono experiment. The SNR is displayed for one packet only (208 bits) for the ACELP and short TCX modes, for two packets (416 bits) for the medium TCX mode, and for four packets (832 bits) for the long TCX mode. For the medium and long TCX modes, a black triangle pointing up indicates the beginning of a new packet. The segmental SNR is clipped at 100 dB. Therefore, that value indicates that the bit is not error-sensitive. See the last column (for 208 bits per frame) of Tables 14 to 17d in reference [1] for the correspondence between bit numbers and coding parameters.

It seems reasonable to presume that the two mode bits, M1 and M2, located at the beginning of every packet would be highly sensitive to bit errors, since an error in one of these bits would result in the entire packet being misinterpreted. However, as shown in Table 1, this presumption seems to be valid only for the ACELP and short TCX modes.
The reason why M1 (for the medium TCX mode), and both M1 and M2 (for the long TCX mode), seem non-sensitive (100 dB) is that these bits are duplicated in the multiple packets of these modes (i.e., two packets for medium TCX or four packets for long TCX). To deal with lost packets, the standard decoder declares that two (four) consecutive packets are medium (long) TCX packets if it has received at least one medium (long) TCX packet out of those two (four) packets. Nevertheless, M1 and M2 are obviously not robust at all to multiple errors; it is therefore still legitimate to view them as highly sensitive.
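The measurement procedure described above (invert one bit, decode, compare with the error-free decoding) can be sketched as follows. The paper does not state the segment length, the bit ordering within a byte, or whether clipping is applied per segment or to the average, so the 256-sample segments, MSB-first ordering and per-segment clipping below are assumptions, as are the function names.

```python
import math
from typing import Sequence


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Return a copy of `frame` with one bit inverted (MSB-first, assumed)."""
    out = bytearray(frame)
    out[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(out)


def segmental_snr_db(reference: Sequence[float],
                     degraded: Sequence[float],
                     seg_len: int = 256,
                     clip_db: float = 100.0) -> float:
    """Average of per-segment SNRs in dB, each clipped at `clip_db`.

    `reference` is the audio decoded without errors and `degraded` the
    audio decoded from the corrupted bitstream. Silent reference segments
    are skipped; an untouched segment counts as `clip_db`.
    """
    snrs = []
    for start in range(0, len(reference) - seg_len + 1, seg_len):
        ref = reference[start:start + seg_len]
        deg = degraded[start:start + seg_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if signal == 0.0:
            continue
        if noise == 0.0:
            snrs.append(clip_db)
        else:
            snrs.append(min(clip_db, 10.0 * math.log10(signal / noise)))
    return sum(snrs) / len(snrs) if snrs else clip_db
```

Under these assumptions, a bit whose inversion never changes the decoded audio yields exactly the clipping value, which is how the 100 dB entries in the sensitivity curves should be read.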

Table 1: Sensitivity (segmental SNR in dB) of the first (M1) and second (M2) bits of the mode compared to the average bit sensitivity (8.67 kbps mono). Columns: ACELP, Short TCX, Medium TCX, Long TCX; rows: M1, M2, Average (numerical values missing from this transcription).

On average, the ACELP mode appears to be highly sensitive to bit errors, as demonstrated by the top curve in Fig. 4. This is mainly because of the extensive use it makes of prediction. For ACELP, the most sensitive parameters are:
1. the first subvectors of the multistage ISP (Immittance Spectral Pairs) quantizer, which are located at the beginning of the packet;
2. the pitch value (also called the adaptive codebook index); and
3. the joint gain quantizer.
These last two parameters are transmitted four times per frame (once per subframe), and the subframe structure is clearly apparent on the SNR curve.

For the TCX, the most sensitive parameters are the first subvectors of the LPC quantizer, and the global gain. The measured sensitivity of Algebraic Vector Quantizer (AVQ) bits, which make up the largest component of the bitstream, is highly variable. At 8.67 kbps mono and in the short TCX mode, AVQ bits are located between bit numbers 58 and 191. As can be seen on curve (b) of Fig. 4, the sensitivity of AVQ bits decreases gradually (increasing SNR) until roughly bit position 150, then increases steadily (decreasing SNR) until bit 191, which is the last AVQ bit. The difference in SNR between the most and least sensitive AVQ bits is more than 20 dB, which is far from marginal.

This wide range of sensitivity can be explained by the way the TCX codec operates. The audio signal is first windowed and frequency transformed. The resulting set of frequency bins, which are complex-valued, are grouped four by four and quantized as a series of 8-dimensional vectors called subvectors.
Depending on the window length (short, medium or long), the spectrum is organized in interlaced tracks of subvectors (one track in short TCX, two tracks in medium TCX, and four tracks in long TCX). Apart from the special overflow case where one track encroaches on another packet, each track of AVQ-quantized subvectors is normally packetized in its own packet.

The AVQ uses two parameters to encode one subvector: a codebook number that indicates how many bits are used to code that subvector, and a codebook index that gives the value of that subvector. Codebook numbers are further encoded using a unary code where 0 is represented by the string 0, 2 by the string 10, 3 by the string 110 and so on (codebook number 1 does not exist). As shown in Fig. 5, codebook numbers are multiplexed starting from the end of the AVQ bitstream, downwards, while codebook indices are multiplexed starting from the beginning of the AVQ bitstream, upwards. The boundary between codebook numbers and indices depends on the signal, but roughly 80% of the AVQ bitstream is used for indices while only 20% is used for codebook numbers.

Fig. 5: Multiplexing strategy for the algebraic V.Q. in the core TCX bitstream, showing relative sensitivity. Codebook indices i_0, i_1, i_2, ... are written from the beginning of the bitstream (i_k occupies 4 n_k bits); codebook numbers n_0, n_1, n_2, ... (one per subvector, unary encoded) are written from the end.

The impact of bit errors within the AVQ bitstream depends on several factors. Overall, bit errors have more impact when they hit a codebook number rather than a codebook index. This is because codebook numbers are encoded using a variable-length code, which is vulnerable to error propagation. To be more precise, the impact of one bit error in a multiplexed codebook number depends on the bit pattern, as shown in Table 2. Changing a 0 into a 1 suppresses one subvector, while changing a 1 into a 0 introduces a false subvector (by splitting in two the codebook index of one subvector).
Suppressing (or inserting) a subvector shifts the remainder of the spectrum to the left (or to the right). An error occurring within the codebook-number part of the bitstream also has an impact on the decoding of codebook indices, as it can lead the codebook-index decoder to read one or two extra 4-bit packets. Interestingly, the bit pattern for which errors have no impact on the decoding of codebook indices (last row in Table 2) is the most probable pattern during rich and energetic segments of signals.
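The effect of a single bit error on the unary-coded codebook numbers can be reproduced with a toy encoder and decoder. This is a minimal sketch of the unary code as described in the text (0 as "0", n >= 2 as n-1 ones followed by a zero); it works on bits in the order the decoder reads them, ignores the actual multiplexing from the end of the AVQ bitstream, and uses hypothetical function names.

```python
from typing import List


def encode_numbers(numbers: List[int]) -> List[int]:
    """Unary-encode codebook numbers: 0 -> 0, 2 -> 10, 3 -> 110, ..."""
    bits: List[int] = []
    for n in numbers:
        if n < 0 or n == 1:
            raise ValueError("codebook number must be 0 or >= 2")
        bits.extend([1] * (n - 1 if n else 0) + [0])
    return bits


def decode_numbers(bits: List[int]) -> List[int]:
    """Inverse of encode_numbers; trailing unterminated ones are ignored."""
    numbers, ones = [], 0
    for b in bits:
        if b:
            ones += 1
        else:
            numbers.append(ones + 1 if ones else 0)
            ones = 0
    return numbers


def index_bits(numbers: List[int]) -> int:
    """Total size of the codebook indices (index k occupies 4 * n_k bits)."""
    return 4 * sum(numbers)


if __name__ == "__main__":
    clean = encode_numbers([2, 0, 3])          # bits: 1 0 0 1 1 0
    for pos in (0, 1):
        corrupted = clean.copy()
        corrupted[pos] ^= 1                    # single bit error
        decoded = decode_numbers(corrupted)
        print(pos, decoded, index_bits(decoded) - index_bits([2, 0, 3]))
```

In this toy example, turning the 0 at position 1 into a 1 merges two codes (one subvector fewer, one extra 4-bit packet read by the index decoder), while turning the 1 at position 0 into a 0 splits a code (one subvector more, two 4-bit packets fewer). When both neighbouring codebook numbers are non-zero, merging them leaves the total index size unchanged.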

Table 2: Impact of changing one bit in the multiplexed codebook numbers on the decoding of the TCX subvectors and codebook indices (all signs are changed when the arrow direction is reversed). Columns: Cbk. numbers (bit pattern), Subvectors, Shift in indices (4-bit packets); table entries missing from this transcription.

Since codebook numbers are decoded starting from the end of the AVQ bitstream (which corresponds to lower frequencies) downwards, the effect of one error propagates to upper frequencies. Therefore, due to error propagation combined with the fact that lower frequencies are generally more energetic than higher frequencies (especially on speech signals), bit errors that occur in the lower part of the spectrum tend to have more impact than bit errors that occur in the upper part of the spectrum. This explains the inverted-V shape of the SNR curve sketched in Fig. 5 and observable in Fig. 4.

As shown in Fig. 4, the longer TCX modes appear to be generally more error-sensitive than the shorter ones. In the case of medium and long TCX modes, some bits appear to be totally non-sensitive to errors (100 dB). These bits are in fact redundancy bits for the global gain, and in the standard codec they are used only when some packets are lost.

For all modes (except short TCX, which is on average the least sensitive mode), the bits that appear as least sensitive are those for the BWE. We see two main reasons for that. First, unlike ACELP and TCX, which are both waveform coders, BWE is a parametric coder that does not accurately reproduce the waveform of the signal. Therefore the signal-to-noise ratio is ineffective at measuring the impact of bit errors at that level. Second, the BWE bitstream consists merely of two vector quantizers: one for the gain, the other (a two-stage vector quantizer) for the spectral envelope. As will be discussed in section 3.1.3, the index assignment for those quantizers was properly done.
Therefore they are inherently robust to single bit errors.

3.1.2. Stereo extension
We conducted the same experiment as above for the four possible stereo extension modes. For this experiment, the standard mode selection procedure was used to determine the core coding mode, but the stereo extension mode was forced over the entirety of the recording. Unsurprisingly, the most sensitive parameters were found to be the stereo coding mode, and the balance factor and global gain for the lower band. In stereo modes 0 and 1, the balance factor and global gain are sent within the same packet, while they are sent over different packets in the other modes. Overall, for the core codec as well as for the stereo extension, we found that the results were highly consistent across the different bit rates.

3.1.3. Index assignment adequacy
Like any other speech or audio codec, AMR-WB+ makes extensive use of vector quantization. It is well known that a vector quantizer will be more sensitive to bit errors if the index assignment is inadequate (i.e. a bad pairing between indexes and codewords). Using a procedure similar to binary switching [7], we verified that the index assignment was correctly done for all vector quantizers in the AMR-WB+ standard. Therefore introducing robust vector quantization techniques such as pseudo-Gray coding in the AMR-WB+ codec was not expected to be an effective protection measure. In addition, altering the index assignment scheme would have rendered the robust version of the AMR-WB+ codec incompatible with the standard version. In other words, even if there was a little something to be gained by changing the index assignment, it was not worth losing compatibility with the standard.
This investigation backed up our choice to append an optional very low bit rate protection layer to the standard (unmodified) AMR-WB+ codec.

3.2. Sensitivity Classes
To define sensitivity classes for the AMR-WB+ bitstream, we used the sensitivity curves measured as described in section 3.1, and corroborated them by listening to the decoded audio files. As is usually done for speech codecs, three classes of bits were defined:
Class A: Highly sensitive, contains all bits that do not tolerate any error. This class requires strong error correction and detection, and the whole frame must be declared as lost when one of these bits is in error.

Class B: Moderately sensitive, contains all bits that exhibit potentially significant sensitivity to errors. Under error-prone conditions, bits belonging to this class might require a certain level of error correction. But contrary to Class A bits, frames can be decoded even with a certain level of residual errors in Class B bits.
Class C: Not sensitive, contains bits that are not sensitive enough to require any protection against errors.

Sensitivity classes for the ACELP and the short, medium and long TCX modes are given in Tables 3 to 5. Sensitivity classes for the stereo extension modes 0 (short TCX without pre-echo reduction) to 3 (long TCX) are given in Tables 6 to 8. These tables (which can be found at the end of the paper) represent the 23.85 kbps coding configuration only, but their generalization to any other coding configuration is rather straightforward. The main differences between the various rates are the number of algebraic-codebook bits (for ACELP) and the number of AVQ bits (for TCX and stereo extension), which both depend on the mode index. All those bits fall within the least sensitive class C.

Comparison to AMR-WB bit classification
For the ACELP mode, the classification we give is similar to the classification defined by the 3GPP for the AMR-WB standard [8]. However, there are also some notable differences. First of all, the AMR-WB classification does not make use of class C. For example, in the AMR-WB codec at 15.85 kbps (which forms the core of our 23.85 kbps configuration), 72 bits out of the 317 bits that compose a frame fall within class A. All remaining bits fall within class B. We shifted all bits one class down (A to B and B to C) and created a more sensitive class populated only by the mode bits.

The remaining differences are minor. First, the AMR-WB bitstream includes a Voice Activity Detection (VAD) bit classed as sensitive (class A) which does not exist in the AMR-WB+ bitstream.
Conversely, the AMR-WB+ ACELP bitstream includes a two-bit mean energy parameter which does not exist in the AMR-WB bitstream and which we classed as sensitive (class B). Finally, concerning the ISP (Immittance Spectral Pairs) quantizer, we intentionally reduced the number of bits declared as sensitive so that class B for ACELP is not too big when compared to the same class for other coding modes. In comparison with AMR-WB, we removed bits 11 and 13 to 16 (2nd ISP subvector), bits 24 to 27 (4th ISP subvector), and bit 32 (5th ISP subvector) from the error-sensitive class.

4. THE VERY LOW BIT RATE PROTECTION LAYER
This section presents the very low bit rate protection layer. The standard AMR-WB+ bitstream frame is kept unchanged, but an extra layer of 16 bits per frame is added to allow for error detection and correction at the decoder. The protection scheme depends on both the core and the stereo coding modes.

4.1. Protection of the core coder
The protection layer includes 14 bits for error detection or detection/correction of the core codec. The exact use of these bits depends on the core coding mode.

4.1.1. The core protection layer
The core coding mode (class A bits) is protected by the customized Hamming-like systematic block code shown in Table 9. The codeword length is 6 (two mode bits located in the core bitstream and four redundancy bits sent in the protection layer). The minimum Hamming distance for this code is 4, which means that single bit errors can be detected and corrected, and that double bit errors can be detected but not corrected. Codewords containing three or more bit errors cannot be corrected properly and will result in an erroneous core coding mode.

Table 9: Error detecting and correcting code used to protect the core coding mode. Columns: Mode, Mode (binary), Redundancy (table entries missing from this transcription).

Since the core coding mode is by far the most sensitive parameter, the probability of having residual errors at that level is a critical consideration.
Suppose that bit errors are uniformly distributed within the bitstream, with $p$ the bit error probability. The probability of having $k$ errors within $n$ bits is:

$$C_n^k \, p^k (1-p)^{n-k}, \qquad (1)$$

where $C_n^k$ is the number of $k$-combinations from a set with $n$ elements:

$$C_n^k = \frac{n!}{k!\,(n-k)!}. \qquad (2)$$

It is well known that a code with a minimum Hamming distance $H$ can detect up to $\lfloor H/2 \rfloor$ bit errors but correct only up to $\lfloor (H-1)/2 \rfloor$ errors. For a code with $N$ data bits and $K$ redundancy bits, the probability $p_e$ of having residual errors after decoding is therefore equal to the probability of having more than $\lfloor H/2 \rfloor$ bad bits within the $N+K$ bits:

$$p_e = 1 - \sum_{i=0}^{\lfloor H/2 \rfloor} C_{N+K}^{i} \, p^{i} (1-p)^{N+K-i}. \qquad (3)$$

Note that, in our case, this calculation does not consider the possibility of further corrections using the inherent redundancy of modes 2 and 3 to help correct erroneous bits (in those multi-packet modes, the same mode information is sent in several packets).

The error probability $p_e$ holds for one decoding only (i.e. one frame or one packet). The probability of having at least one bad mode when performing $L$ successive decoding operations is:

$$P_e = 1 - (1 - p_e)^{L}. \qquad (4)$$

In our case, for a bit error rate $p = 0.1\%$ and with $N = 2$ mode bits and $K = 4$ redundancy bits, $p_e$ is equal to $2 \times 10^{-8}$. With a typical frame duration of 20 ms, the number of frames per hour is $L = 3600 \times 50$. The probability of getting at least one bad mode within one hour is therefore as small as $P_e = 0.36\%$ which, as we will see in section 5, is likely to be acceptable for most applications using AMR-WB+. By comparison, under the same hypotheses but without mode protection, the probability of having at least one bad mode would be over 90% in less than 25 seconds.

For all core coding modes, the first four bits of the protection layer (Tables 10 and 11) are therefore used to protect the core coding mode. The remainder of the protection layer is used for parameters that were identified as moderately sensitive (class B) in section 3.2. These bits are protected using either plain error detection (using parity bits) or error detection and correction (using systematic block codes).
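The residual error figures quoted above follow directly from equations (3) and (4) and can be recomputed in a few lines. The sketch below evaluates the tail of the binomial sum directly (mathematically the same as one minus the sum in equation (3), but numerically safer for very small probabilities); the function names are illustrative, and the unprotected case is modelled as a 2-bit "code" with minimum distance 1, which is an assumption of this sketch.

```python
from math import comb, expm1, log1p


def residual_error_prob(p: float, n_bits: int, h_min: int) -> float:
    """Probability of more than floor(h_min / 2) bit errors among n_bits.

    Equivalent to equation (3): 1 - sum_{i=0}^{floor(H/2)} C(n,i) p^i (1-p)^(n-i).
    """
    t = h_min // 2
    return sum(comb(n_bits, i) * p**i * (1.0 - p) ** (n_bits - i)
               for i in range(t + 1, n_bits + 1))


def prob_at_least_one(pe: float, decodings: int) -> float:
    """Equation (4): 1 - (1 - pe)^L, computed stably for small pe."""
    return -expm1(decodings * log1p(-pe))


if __name__ == "__main__":
    p = 1e-3                         # 0.1% bit error rate
    frames_per_hour = 3600 * 50      # 20 ms frames

    pe = residual_error_prob(p, n_bits=6, h_min=4)   # 2 mode + 4 redundancy bits
    print(f"p_e (protected)      = {pe:.3e}")
    print(f"P_e over one hour    = {prob_at_least_one(pe, frames_per_hour):.2%}")

    # Without protection: any error in the 2 mode bits corrupts the mode.
    pe_raw = residual_error_prob(p, n_bits=2, h_min=1)
    print(f"P_e over 25 s (raw)  = {prob_at_least_one(pe_raw, 25 * 50):.1%}")
```

Running this reproduces the orders of magnitude given in the text: about 2e-8 per decoding and about 0.36% per hour with protection, and more than 90% within 25 seconds without it.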
For the ACELP core coding mode, one parity bit covers the two bits of the mean energy parameter. Another parity bit is applied to the 16 bits of the quantization indices for the 1st and 2nd ISP subvectors. Then, two parity bits are used to protect each subframe: the first one is applied to the pitch parameter (also called adaptive codebook index) and the second one to the codebook gain. Regarding the pitch, the parity bit covers the seven most significant bits (MSBs) for the 1st and 3rd subframes, and the two MSBs only for the 2nd and 4th subframes. Regarding the gain, the parity bit covers all seven bits of the quantization index for all subframes.

For the short TCX core coding mode, one parity bit covers the three MSBs of the global gain. The remainder of the protection layer is used to protect the quantization indices for the 1st and 3rd ISP subvectors. The 1st ISP subvector (8 bits) is protected using a block code with five redundancy bits and a minimum Hamming distance of 4. This block code can correct one bit error and detect up to two bit errors. The 3rd ISP subvector (6 bits) is protected using a block code with four redundancy bits and a minimum Hamming distance of 3. This block code can detect and correct single bit errors.

The protection layer applied to the first packet of the medium TCX core coding mode is similar to the protection layer for the short TCX, except that the 2nd ISP subvector (8 bits) is addressed instead of the 3rd one in the 1st packet. This 2nd ISP subvector is also protected using a block code with four redundancy bits and a minimum Hamming distance of 3. For the 2nd packet, the protection layer covers the 3rd ISP subvector (using the same error correcting code as in short TCX) and the first eight bits of the AVQ codebook numbers (using the block code with four redundancy bits and a minimum Hamming distance of 3). Two bits are left unused.
Those two bits could have been used to strengthen the error correcting code for the AVQ codebook numbers. However, two additional redundancy bits would only increase the minimum Hamming distance of the block code by 1. This would have enabled detecting (but not correcting) one more bit error. Detecting additional bit errors would not have been useful in this context since AMR-WB+ does not have a concealment algorithm to deal with erroneous AVQ codes.

The protection layer for the long TCX mode uses a similar structure to the one used for medium TCX to protect sensitive ISP subvectors and the first AVQ codebook numbers in each packet, as depicted in Table 11.
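The block codes themselves are not specified in the text. As an illustration only, the sketch below assumes a shortened Hamming construction, which happens to match the stated parameters: 8 data bits with 4 redundancy bits and a minimum distance of 3, or with 5 redundancy bits and a minimum distance of 4 once an overall parity bit is appended. The parity-check columns and all function names are hypothetical. Decoding is brute-force minimum-distance decoding that corrects at most one bit error and otherwise reports a detected, uncorrectable error.

```python
# Hypothetical parity-check columns for the 8 data bits: distinct 4-bit
# values of weight >= 2, so they never collide with the unit columns
# (1, 2, 4, 8) that belong to the 4 redundancy bits.
DATA_COLUMNS = [3, 5, 6, 7, 9, 10, 11, 12]

def weight(x):
    """Number of bits set in x."""
    return bin(x).count("1")

def encode(data, extended=False):
    """Return the systematic codeword for an 8-bit data word.

    extended=False: 12-bit codeword, minimum distance 3.
    extended=True:  13-bit codeword (extra overall parity), minimum distance 4.
    """
    red = 0
    for i, col in enumerate(DATA_COLUMNS):
        if (data >> i) & 1:
            red ^= col
    word = (data << 4) | red
    if extended:
        word = (word << 1) | (weight(word) & 1)
    return word

def decode(received, extended=False):
    """Correct at most one bit error by searching for the nearest codeword.

    Returns the data word, or None when an error is detected but cannot
    be corrected (the caller would then conceal the parameter).
    """
    best_data, best_dist = None, None
    for data in range(256):
        dist = weight(encode(data, extended) ^ received)
        if best_dist is None or dist < best_dist:
            best_data, best_dist = data, dist
    return best_data if best_dist <= 1 else None

codeword = encode(0xA5, extended=True)
print(decode(codeword ^ (1 << 7), extended=True) == 0xA5)   # one bit error
print(decode(codeword ^ (1 << 7) ^ (1 << 2), extended=True))  # two bit errors
```

With the distance-4 variant, a double error is always at distance 2 from the transmitted codeword and at least 2 from any other, so it is flagged rather than miscorrected. With the distance-3 variant, a double error can be silently decoded to a wrong data word, which is the trade-off discussed above.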

| Bit | ACELP | Short TCX | Medium TCX, packet 1 | Medium TCX, packet 2 |
|---|---|---|---|---|
| 0-3 | Mode redundancy | Mode redundancy | Mode redundancy | Mode redundancy |
| 4 | Mean energy | Global gain | Global gain | 3rd ISP subvector |
| 5 | 1st and 2nd ISP subvectors | 1st ISP subvector | 1st ISP subvector | 3rd ISP subvector |
| 6 | Pitch SF1 | 1st ISP subvector | 1st ISP subvector | 3rd ISP subvector |
| 7 | Gain SF1 | 1st ISP subvector | 1st ISP subvector | 3rd ISP subvector |
| 8 | Pitch SF2 | 1st ISP subvector | 1st ISP subvector | First 8 bits of AVQ codebook numbers |
| 9 | Gain SF2 | 1st ISP subvector | 1st ISP subvector | First 8 bits of AVQ codebook numbers |
| 10 | Pitch SF3 | 3rd ISP subvector | 2nd ISP subvector | First 8 bits of AVQ codebook numbers |
| 11 | Gain SF3 | 3rd ISP subvector | 2nd ISP subvector | First 8 bits of AVQ codebook numbers |
| 12 | Pitch SF4 | 3rd ISP subvector | 2nd ISP subvector | unused |
| 13 | Gain SF4 | 3rd ISP subvector | 2nd ISP subvector | unused |

Table 10: Protection layer for the core coder: modes 0 (ACELP), 1 (short TCX) and 2 (medium TCX)

| Bit | Long TCX, packet 1 | Long TCX, packet 2 | Long TCX, packet 3 | Long TCX, packet 4 |
|---|---|---|---|---|
| 0-3 | Mode redundancy | Mode redundancy | Mode redundancy | Mode redundancy |
| 4 | Global gain | 3rd ISP subvector | First 8 bits of AVQ codebook numbers | First 8 bits of AVQ codebook numbers |
| 5-7 | 1st ISP subvector | 3rd ISP subvector | First 8 bits of AVQ codebook numbers | First 8 bits of AVQ codebook numbers |
| 8-9 | 1st ISP subvector | First 8 bits of AVQ codebook numbers | Next 8 bits of AVQ codebook numbers | Next 8 bits of AVQ codebook numbers |
| 10-11 | 2nd ISP subvector | First 8 bits of AVQ codebook numbers | Next 8 bits of AVQ codebook numbers | Next 8 bits of AVQ codebook numbers |
| 12-13 | 2nd ISP subvector | unused | unused | unused |

Table 11: Protection layer for the core coder: mode 3 (long TCX)

Decoding of the core protection layer

How the AMR-WB+ decoder uses the protection layer to mitigate the effects of bit errors within the core bitstream is the focus of this subsection.

The AMR-WB+ decoder first uses the first four bits of the protection layer to detect and correct bit errors within the core coding mode. The mode is decoded (using a minimal distance criterion) for the four packets of the super-frame. A packet is declared as erased if an error is detected but cannot be corrected. Then, the AMR-WB+ decoder uses the natural redundancy of core coding modes 2 and 3 to correct some of the residual errors (this is a part of the standard AMR-WB+ decoder): the core coding mode for the whole super-frame (four packets) is set to 3 if at least one valid (i.e. not erased) packet is in mode 3. The same procedure is used for mode 2, once for the first two packets and another time for the last two packets. Packets that are declared as erased are concealed as provided by the standard AMR-WB+ decoder.
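The redundancy rule for modes 2 and 3 can be sketched as follows. This is an illustrative Python reading of the rule as stated above, not the code of the standard decoder; the function names and the use of `None` for an erased packet are assumptions. Enumerating the mode patterns that satisfy the multi-packet constraints also reproduces the count of 26 possible combinations per super-frame mentioned in the conclusion.

```python
from itertools import product

def is_valid(modes):
    """Check the multi-packet constraints on the four core modes of a super-frame."""
    if 3 in modes:                        # long TCX spans all four packets
        return all(m == 3 for m in modes)
    for a, b in (modes[:2], modes[2:]):   # medium TCX spans a pair of packets
        if (a == 2) != (b == 2):
            return False
    return True

# All mode patterns a valid super-frame can carry.
VALID_COMBINATIONS = [m for m in product(range(4), repeat=4) if is_valid(m)]

def correct_modes(modes):
    """Apply the redundancy rule; erased packets are passed as None."""
    modes = list(modes)
    if 3 in modes:                        # one valid mode 3 fixes the whole super-frame
        return [3, 3, 3, 3]
    for start in (0, 2):                  # one valid mode 2 fixes its pair of packets
        if 2 in modes[start:start + 2]:
            modes[start:start + 2] = [2, 2]
    return modes

print(len(VALID_COMBINATIONS))
print(correct_modes((None, 3, 0, 1)))
print(correct_modes((2, None, 0, 1)))
```

The sketch only repairs the mode values. How the recovered mode of an erased packet is then used by the decoder is left out here.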
Other packets are decoded using the normal decoding process. During the decoding process, parity bits are checked right before the related parameter is decoded. When a parity bit indicates the presence of an error, the concealment procedure is used instead of the normal decoding procedure, but for that parameter only. This is equivalent to replacing the traditional bad frame indicator (BFI) by partial BFIs.

Regarding parameters protected by a Hamming code, error correction is performed (using minimal distance decoding) right before decoding that parameter. When an error is detected but cannot be corrected, the appropriate concealment procedure is used instead of the normal decoding procedure, for that parameter only.

Protection of the stereo extension

The two remaining bits of the protection layer are used to protect the optional stereo extension. When the codec operates in mono, those two bits are left unused.

The stereo protection layer

The first bit is used to detect errors that may affect the stereo mode. A parity bit is not the best solution for that purpose, because some errors are more critical than others. Consider for example single bit errors affecting the stereo mode. Mistaking a mode 0 for a mode 1 (or vice versa) has very little impact on the decoding of the stereo extension, since the bit allocation is the same for those two modes (see Table 6). However, mistaking a mode 0 for a mode 2, or a mode 1 for a mode 3, is more critical as it can lead to a flawed decoding of the balance factor or the global gain. Mistaking a mode 2 for a mode 3 (or conversely) would also have serious consequences on the decoding of the stereo extension. This latter type of error, however, is less likely to happen because of the inherent redundancy of the mode bits in these multi-packet modes.

Therefore, instead of including a parity bit which is independent of the type of error, the protection layer includes a control bit (0 for modes 0 and 1, and 1 for modes 2 and 3) to distinguish between single-packet modes and multi-packet modes. This control bit, in conjunction with the inherent redundancy for modes 2 and 3, gives a good level of protection for the stereo mode.

The second protection bit is used to protect the balance factor and the global gain, which were both identified in section as the most sensitive parameters for the stereo extension. The exact use of this bit depends on the stereo mode, and more specifically on the number of packets used by the stereo extension (Tables 12 and 13). For stereo modes 0 and 1, one parity bit covers both the four most significant bits of the balance factor and the four most significant bits of the global gain. For stereo modes 2 and 3, as the balance factor and the global gain are sent separately, one parity bit covers the four most significant bits of the parameter that is contained in each packet. In mode 3, when none of those two parameters is present in a packet, this protection bit is not used.

It must be noted that the parity bit for the balance factor and the global gain also enables detection of 50% of residual errors in the stereo modes, should they occur.

| Bit | Mode 0 | Mode 1 | Mode 2, packet 1 | Mode 2, packet 2 |
|---|---|---|---|---|
| 14 | Mode ctrl. | Mode ctrl. | Mode ctrl. | Mode ctrl. |
| 15 | Bal. & gain | Bal. & gain | Balance | Gain |

Table 12: Protection layer for the stereo extension: stereo modes 0, 1 and 2

| Bit | Mode 3, packet 1 | Mode 3, packet 2 | Mode 3, packet 3 | Mode 3, packet 4 |
|---|---|---|---|---|
| 14 | Mode ctrl. | Mode ctrl. | Mode ctrl. | Mode ctrl. |
| 15 | Balance | unused | Gain | unused |

Table 13: Protection layer for the stereo extension: stereo mode 3

Decoding the stereo protection layer

The AMR-WB+ decoder first declares as erased the stereo extension of packets that are supposed to be in stereo modes 2 or 3 but do not have the proper (multi-packet) control bit.
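The two stereo protection bits and the erasure rule just stated can be sketched in Python as follows. The parity convention (even parity), the 7-bit index width used in the example, and all function names are assumptions made for illustration. The last function shows the partial bad-frame-indicator idea, where a failed parity check replaces only the affected parameter by its concealed value.

```python
def msb_parity(index, width, n_msb=4):
    """Even parity over the n_msb most significant bits of a width-bit index."""
    msbs = index >> (width - n_msb)
    return bin(msbs).count("1") & 1

def joint_parity(balance, gain, width):
    """Single parity bit covering the MSBs of both parameters (stereo modes 0 and 1)."""
    return msb_parity(balance, width) ^ msb_parity(gain, width)

def stereo_control_bit(stereo_mode):
    """0 for single-packet modes (0, 1), 1 for multi-packet modes (2, 3)."""
    return 1 if stereo_mode >= 2 else 0

def stereo_extension_erased(decoded_mode, control_bit):
    """Erase when a packet claims mode 2 or 3 without the multi-packet control bit."""
    return decoded_mode in (2, 3) and control_bit != 1

def decode_parameter(index, width, parity_bit, concealed_value):
    """Return the received index, or the concealed value if the parity check fails."""
    if msb_parity(index, width) != parity_bit:
        return concealed_value
    return index

sent = 0b1011000                           # hypothetical 7-bit balance index
parity = msb_parity(sent, 7)
corrupted = sent ^ (1 << 6)                # bit error in the most significant bit
print(decode_parameter(corrupted, 7, parity, concealed_value=sent))
```

A single parity bit cannot detect an even number of errors among the covered bits, and errors in the unprotected least significant bits pass through unchanged, which matches the intent of protecting only the most sensitive bits.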
Then, it uses the same logic as for the core coding mode to correct some of the residual mode errors (this is a part of the standard AMR-WB+ decoder). The stereo image of packets that are declared as erased is decoded using the concealment procedure that is included in the standard decoder; otherwise the normal decoding procedure is used. In the course of the normal decoding procedure, for packets which have not been erased, the corresponding parity bit is checked right before decoding the balance factor and/or the global gain. If the parity bit indicates the presence of an error, then the concealed value of the parameter is used instead of the decoded value.

5. SUBJECTIVE PERFORMANCE EVALUATION

In this section, the performance of the protection layer is assessed using the results of subjective quality tests.

Test procedure

The test procedure was an adaptation of the MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) methodology [9]. The adaptation mostly consisted in removing the band-limited anchors, to force listeners to focus on coding artifacts and transmission impairments rather than on the audio bandwidth. Listeners had to rate different audio recordings created under different processing conditions on a 1 to 5 scale having 0.1 steps, with 1 meaning bad and 5 meaning excellent.

The processing conditions used were: the original signal ("direct"), the audio signal produced by the AMR-WB+ codec without bit errors ("clear channel"), and the audio signal produced by the standard and the protected AMR-WB+ codecs, both at two different bit error rates. The identities of the processing conditions under which each of the recordings was generated were of course unknown to the listeners.

The two Bit Error Rates (BER) considered were 0.1% and 0.5%. This represents a range which is well aligned with typical radio-communications environments. The same BER was applied to the standard codec and to the protected AMR-WB+ codec.
This assumes that the difference in bit rate between the two codecs (16 bits per packet for the protection layer) is not big enough to induce a difference in the residual BER.

The standard AMR-WB+ compression algorithm was used as a reference to assess the performance of the protection layer. The MP3 and the E-AAC+ codecs could also have been used as references. However, when subjected to these bit error rate conditions, these alternative codecs became ineffective and stopped operating. This is most likely due to their extensive reliance on variable-length coders such as Huffman coding. Notably, however, neither the standard nor the protected AMR-WB+ decoder exhibited this type of adverse behavior during the sensitivity study or when processing audio samples for the subjective test, although they also make use of a variable-length coder (the unary code used for AVQ codebook numbers).

We conducted three separate experiments using three different coding configurations: 8.67 kbps mono (mode index=16, ISF=0.8333), kbps stereo (mode index=37, ISF=1.125) and 32 kbps stereo (mode index=40, ISF=1.333). The 16 bits added for the protection layer raise the effective bit rates of these configurations to 9.33 kbps, kbps and kbps, respectively.

Each experiment was done using a set of 16 audio tracks, with four tracks belonging to each of the following four categories: speech, music, speech between music, and speech over music. The selected audio tracks were each between 5 and 10 seconds in duration. Nine distinct experienced listeners participated in the test.

Test results

The mean scores obtained by the different processing conditions (taking all audio categories into account) are shown in Fig. 6 for the 8.67 kbps mono experiment, Fig. 7 for the kbps stereo experiment and Fig. 8 for the 32 kbps stereo experiment.

In the kbps experiment, the clear channel recording scores 4.2 on the 1 to 5 scale. At a 0.1% BER, the protection layer raises the score of the coded recording from 2.7 to 3.5. This means that the protection layer makes up for about half of the quality degradation caused by bit errors. At a 0.5% BER, the protection layer raises the score from 1.47 to 2.17, which is also very significant.
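The arithmetic behind these statements can be reproduced with a short Python sketch. The helper names are illustrative. The overhead calculation assumes that the frame duration scales as 20 ms divided by the ISF, which is not stated in this section but is consistent with the 8.67 kbps and 9.33 kbps figures quoted above.

```python
def recovered_fraction(clear, unprotected, protected):
    """Share of the score lost to bit errors that the protection layer wins back."""
    return (protected - unprotected) / (clear - unprotected)

def protection_overhead_kbps(isf, bits_per_packet=16, nominal_frame_ms=20.0):
    """Bit rate added by the protection layer (bits per millisecond equals kbps)."""
    frame_ms = nominal_frame_ms / isf     # assumed scaling of the frame duration
    return bits_per_packet / frame_ms

at_low_ber = recovered_fraction(clear=4.2, unprotected=2.7, protected=3.5)
at_high_ber = recovered_fraction(clear=4.2, unprotected=1.47, protected=2.17)
mono_rate = 8.67 + protection_overhead_kbps(isf=0.8333)

print(f"recovered at 0.1% BER: {at_low_ber:.0%}")
print(f"recovered at 0.5% BER: {at_high_ber:.0%}")
print(f"protected mono bit rate: {mono_rate:.2f} kbps")
```

The first ratio is what supports the statement that roughly half of the degradation is recovered at a 0.1% BER; at 0.5% the absolute gain is similar but the recovered share of the degradation is smaller.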
In the presence of bit errors, the output of the standard AMR-WB+ decoder (without the protection layer) is contaminated by some rather annoying channel artifacts (distortions or problems in the stereo image). The protection layer ensures that the decoded audio is free of major channel artifacts even at a 0.5% BER.

Fig. 6: Subjective test results for the 8.67 kbps mono experiment

Fig. 7: Subjective test results for the kbps stereo experiment

Fig. 8: Subjective test results for the 32 kbps stereo experiment

The quality improvement brought by the protection layer, though noticeable in all experiments, seems greater at higher bit rates than at lower bit rates. This is probably due to the fact that, at very low bit rates (8 kbps), coding artifacts tend to mask channel artifacts.

6. CONCLUSION

The AMR-WB+ codec is inherently robust to packet losses and bit errors. In this paper we presented a protection layer to further enhance its robustness to bit errors. This very low bit rate (16 bits per frame) protection layer ensures that the decoded audio signal is free of major channel artifacts, even at a 0.5% BER which is significant.

Further investigation of AMR-WB+ robustness improvements is focused around using the proposed 16-bit protection layer more effectively. The core coding mode could be protected more efficiently, for example, if we were to take into consideration the fact that over a super-frame eight core coding mode bits are sent to the decoder but only 26 core coding mode combinations are possible. In addition, we could also leverage the unused protection bits. When the codec operates in mono, the two bits that are currently reserved to protect the stereo extension could be used to provide additional protection to the core coded bits. For this purpose, we would design two different protection schemes, one for mono and the other for stereo operation. Taking advantage of the redundancy already present in the TCX bitstream (global gain redundancy in medium and long TCX for example) is another obvious consideration.

Finally, it would be interesting to improve the robustness of the algebraic vector quantizer of the TCX (focusing, for example, on preventing bit errors from propagating throughout the spectrum, as explained in section 3.1.1). This level of increased robustness would most likely require a higher bit rate than the proposed 16-bit protection layer, however.

7. ACKNOWLEDGEMENTS

The author wishes to express his sincere gratitude to Joanne Davidson and Baris Demir from VoiceAge Corporation for their help and dedication in reviewing this paper, and for their many valuable comments and suggestions. This work was funded by NSERC and VoiceAge Corporation.

8. REFERENCES

[1] 3GPP Technical Specification TS , Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions, June

[2] R. Salami, R. Lefebvre, A. Lakaniemi, K. Kontola, S. Bruhn and A. Taleb, Extended AMR-WB for High-Quality Audio on Mobile Devices, IEEE Communications Magazine, Vol. 44, No. 5, pp , May

[3] J. Mäkinen, B. Bessette, S. Bruhn, P. Ojala, R. Salami, A. Taleb, AMR-WB+: a new audio coding standard for 3rd generation mobile audio services, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2005), pp , Philadelphia, USA, March

[4] B. Bessette, R. Salami, R. Lefebvre, M. Jelinek, J. Rotola-Pukkila, J. Vainio, H. Mikkola, K. Järvinen, The adaptive multirate wideband speech codec (AMR-WB), IEEE Transactions on Speech and Audio Processing, vol. 10, no. 8, pp , November

[5] ETSI Technical Specification TS V1.2.1, Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in DVB services delivered directly over IP protocols, April 2006

[6] R. Salami, R. Lefebvre, and C. Laflamme, A wideband codec at 16/24 kbit/s with 10 ms frames, 1997 IEEE Workshop on Speech Coding, pp , Pocono Manor, Pennsylvania USA, September 7-10,

[7] K. Zeger, A. Gersho, Pseudo-Gray Coding, IEEE Transactions on Communications, vol. 38, no. 12, pp , December

[8] 3GPP Technical Specification TS , AMR Wideband Speech Codec; Frame Structure, June

[9] ITU-R Recommendation BS , Method for the subjective assessment of intermediate quality levels of coding systems, January 2003

Fig. 4: Segmental SNR (dB) as a function of the position of a systematically-reversed bit. AMR-WB+ operating at 8.67 kbps mono (mode index=16, ISF=0.833). (a) ACELP, (b) short TCX, (c) medium TCX, (d) long TCX.

| | ACELP | Short TCX |
|---|---|---|
| Parameters | Mode; 1st to 7th ISP subvectors; index of mean energy; for each of the four subframes: adaptive CB index, LTP-filtering flag (1 bit; bit number 59 in the first subframe), algebraic CB indices, codebook gains; index of HF ISP; index of HF gain | Mode; 1st to 7th ISP subvectors; noise factor; global gain; algebraic VQ; index of HF ISP; index of HF gain |
| Class A | 2 bits | 2 bits |
| Class B | 62 bits | 25 bits |
| Class C | 272 bits | 309 bits |

Table 3: Bit sensitivity classification for the ACELP and short TCX core coding modes, kbps stereo (mode index=37, ISF=1.125)


More information

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC Ralf Geiger 1, Gerald Schuller 1, Jürgen Herre 2, Ralph Sperschneider 2, Thomas Sporer 1 1 Fraunhofer IIS AEMT, Ilmenau, Germany 2 Fraunhofer

More information

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network

Voice Quality Assessment for Mobile to SIP Call over Live 3G Network Abstract 132 Voice Quality Assessment for Mobile to SIP Call over Live 3G Network G.Venkatakrishnan, I-H.Mkwawa and L.Sun Signal Processing and Multimedia Communications, University of Plymouth, Plymouth,

More information

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS E. Masala, D. Quaglia, J.C. De Martin Λ Dipartimento di Automatica e Informatica/ Λ IRITI-CNR Politecnico di Torino, Italy

More information

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Audio Fundamentals, Compression Techniques & Standards Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011 Outlines Audio Fundamentals Sampling, digitization, quantization μ-law

More information

ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Techonologies, Tampere, Finland

ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Techonologies, Tampere, Finland ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Techonologies, Tampere, Finland 2015-12-16 1 OUTLINE Very short introduction to EVS Robustness EVS LSF robustness features

More information

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings MPEG-1 Overview of MPEG-1 1 Standard Introduction to perceptual and entropy codings Contents History Psychoacoustics and perceptual coding Entropy coding MPEG-1 Layer I/II Layer III (MP3) Comparison and

More information

ITNP80: Multimedia! Sound-II!

ITNP80: Multimedia! Sound-II! Sound compression (I) Compression of sound data requires different techniques from those for graphical data Requirements are less stringent than for video data rate for CD-quality audio is much less than

More information

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201 Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding

More information

Audio-coding standards

Audio-coding standards Audio-coding standards The goal is to provide CD-quality audio over telecommunications networks. Almost all CD audio coders are based on the so-called psychoacoustic model of the human auditory system.

More information

Chapter 14 MPEG Audio Compression

Chapter 14 MPEG Audio Compression Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew c Prentice Hall 2003 14.1

More information

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING Pravin Ramadas, Ying-Yi Li, and Jerry D. Gibson Department of Electrical and Computer Engineering, University of California,

More information

Module 6 STILL IMAGE COMPRESSION STANDARDS

Module 6 STILL IMAGE COMPRESSION STANDARDS Module 6 STILL IMAGE COMPRESSION STANDARDS Lesson 19 JPEG-2000 Error Resiliency Instructional Objectives At the end of this lesson, the students should be able to: 1. Name two different types of lossy

More information

Voice Analysis for Mobile Networks

Voice Analysis for Mobile Networks White Paper VIAVI Solutions Voice Analysis for Mobile Networks Audio Quality Scoring Principals for Voice Quality of experience analysis for voice... 3 Correlating MOS ratings to network quality of service...

More information

Open AMR Initiative. Technical Documentation. Version 1.0 Revision

Open AMR Initiative. Technical Documentation. Version 1.0 Revision VoiceAge Corporation 750 Chemin Lucerne, Suite 250 Ville Mont-Royal (Quebec) H3R 2H6 Canada (514) 737-4940 Fax (514) 908-2037 www.voiceage.com Open AMR Initiative Technical Documentation Version 1.0 Revision

More information

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL SPREAD SPECTRUM WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL 1 Yüksel Tokur 2 Ergun Erçelebi e-mail: tokur@gantep.edu.tr e-mail: ercelebi@gantep.edu.tr 1 Gaziantep University, MYO, 27310, Gaziantep,

More information

HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER

HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER Rob Colcord, Elliot Kermit-Canfield and Blane Wilson Center for Computer Research in Music and Acoustics,

More information

Channel-Adaptive Error Protection for Scalable Audio Streaming over Wireless Internet

Channel-Adaptive Error Protection for Scalable Audio Streaming over Wireless Internet Channel-Adaptive Error Protection for Scalable Audio Streaming over Wireless Internet GuiJin Wang Qian Zhang Wenwu Zhu Jianping Zhou Department of Electronic Engineering, Tsinghua University, Beijing,

More information

ELL 788 Computational Perception & Cognition July November 2015

ELL 788 Computational Perception & Cognition July November 2015 ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog)

More information

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec / / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec () **Z ** **=Z ** **= ==== == **= ==== \"\" === ==== \"\"\" ==== \"\"\"\" Tim O Brien Colin Sullivan Jennifer Hsu Mayank

More information

A New Technique for Transceiver Location Data Over LTE Voice Channels

A New Technique for Transceiver Location Data Over LTE Voice Channels International Journal of Research in Engineering and Science (IJRES) ISSN (Online): 2320-9364, ISSN (Print): 2320-9356 Volume 4 Issue 10 ǁ October. 2016 ǁ PP.15-19 A New Technique for Transceiver Location

More information

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur Module 9 AUDIO CODING Lesson 29 Transform and Filter banks Instructional Objectives At the end of this lesson, the students should be able to: 1. Define the three layers of MPEG-1 audio coding. 2. Define

More information

3GPP TS V ( )

3GPP TS V ( ) TS 26.101 V10.0.0 (2011-03) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Mandatory speech codec speech processing functions; Adaptive

More information

Synopsis of Basic VoIP Concepts

Synopsis of Basic VoIP Concepts APPENDIX B The Catalyst 4224 Access Gateway Switch (Catalyst 4224) provides Voice over IP (VoIP) gateway applications for a micro branch office. This chapter introduces some basic VoIP concepts. This chapter

More information

Voice Over LTE (VoLTE) Technology. July 23, 2018 Tim Burke

Voice Over LTE (VoLTE) Technology. July 23, 2018 Tim Burke Voice Over LTE (VoLTE) Technology July 23, 2018 Tim Burke Range of Frequencies Humans Can Hear 20,000 Hz 20 Hz Human Hearing 8,000 Hz 10,000 Hz 14,000 Hz 12,000 Hz Range of Frequencies Designed For Entertainment

More information

MPEG-4 Structured Audio Systems

MPEG-4 Structured Audio Systems MPEG-4 Structured Audio Systems Mihir Anandpara The University of Texas at Austin anandpar@ece.utexas.edu 1 Abstract The MPEG-4 standard has been proposed to provide high quality audio and video content

More information

Nokia Q. Xie Motorola April 2007

Nokia Q. Xie Motorola April 2007 Network Working Group Request for Comments: 4867 Obsoletes: 3267 Category: Standards Track J. Sjoberg M. Westerlund Ericsson A. Lakaniemi Nokia Q. Xie Motorola April 2007 RTP Payload Format and File Storage

More information

Optimizing A/V Content For Mobile Delivery

Optimizing A/V Content For Mobile Delivery Optimizing A/V Content For Mobile Delivery Media Encoding using Helix Mobile Producer 11.0 November 3, 2005 Optimizing A/V Content For Mobile Delivery 1 Contents 1. Introduction... 3 2. Source Media...

More information

Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols

Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols Real-time Audio Quality Evaluation for Adaptive Multimedia Protocols Lopamudra Roychoudhuri and Ehab S. Al-Shaer School of Computer Science, Telecommunications and Information Systems, DePaul University,

More information

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION

CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION CHAPTER 5 AUDIO WATERMARKING SCHEME INHERENTLY ROBUST TO MP3 COMPRESSION In chapter 4, SVD based watermarking schemes are proposed which met the requirement of imperceptibility, having high payload and

More information

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ)

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) 5 MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) Contents 5.1 Introduction.128 5.2 Vector Quantization in MRT Domain Using Isometric Transformations and Scaling.130 5.2.1

More information

ELEC 691X/498X Broadcast Signal Transmission Winter 2018

ELEC 691X/498X Broadcast Signal Transmission Winter 2018 ELEC 691X/498X Broadcast Signal Transmission Winter 2018 Instructor: DR. Reza Soleymani, Office: EV 5.125, Telephone: 848 2424 ext.: 4103. Office Hours: Wednesday, Thursday, 14:00 15:00 Slide 1 In this

More information

ETSI TS V ( )

ETSI TS V ( ) TS 126 446 V12.0.0 (2014-10) TECHNICAL SPECIFICATION Universal Mobile Telecommunications System (UMTS); LTE; EVS Codec AMR-WB Backward Compatible Functions (3GPP TS 26.446 version 12.0.0 Release 12) 1

More information

Parametric Coding of Spatial Audio

Parametric Coding of Spatial Audio Parametric Coding of Spatial Audio Ph.D. Thesis Christof Faller, September 24, 2004 Thesis advisor: Prof. Martin Vetterli Audiovisual Communications Laboratory, EPFL Lausanne Parametric Coding of Spatial

More information

RTP implemented in Abacus

RTP implemented in Abacus Spirent Abacus RTP implemented in Abacus 编号版本修改时间说明 1 1. Codec that Abacus supports. G.711u law G.711A law G.726 G.726 ITU G.723.1 G.729 AB (when VAD is YES, it is G.729AB, when No, it is G.729A) G.729

More information

Assessing Call Quality of VoIP and Data Traffic over Wireless LAN

Assessing Call Quality of VoIP and Data Traffic over Wireless LAN Assessing Call Quality of VoIP and Data Traffic over Wireless LAN Wen-Tzu Chen and Chih-Yuan Lee Institute of Telecommunications Management, National Cheng Kung University, No. 1 University Road, Tainan

More information

Video coding. Concepts and notations.

Video coding. Concepts and notations. TSBK06 video coding p.1/47 Video coding Concepts and notations. A video signal consists of a time sequence of images. Typical frame rates are 24, 25, 30, 50 and 60 images per seconds. Each image is either

More information

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES Venkatraman Atti 1 and Andreas Spanias 1 Abstract In this paper, we present a collection of software educational tools for

More information

RECOMMENDATION ITU-R BT.1720 *

RECOMMENDATION ITU-R BT.1720 * Rec. ITU-R BT.1720 1 RECOMMENDATION ITU-R BT.1720 * Quality of service ranking and measurement methods for digital video broadcasting services delivered over broadband Internet protocol networks (Question

More information

S.K.R Engineering College, Chennai, India. 1 2

S.K.R Engineering College, Chennai, India. 1 2 Implementation of AAC Encoder for Audio Broadcasting A.Parkavi 1, T.Kalpalatha Reddy 2. 1 PG Scholar, 2 Dean 1,2 Department of Electronics and Communication Engineering S.K.R Engineering College, Chennai,

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

Advanced Video Coding: The new H.264 video compression standard

Advanced Video Coding: The new H.264 video compression standard Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission

More information

Recommended Readings

Recommended Readings Lecture 11: Media Adaptation Scalable Coding, Dealing with Errors Some slides, images were from http://ip.hhi.de/imagecom_g1/savce/index.htm and John G. Apostolopoulos http://www.mit.edu/~6.344/spring2004

More information

Modified SPIHT Image Coder For Wireless Communication

Modified SPIHT Image Coder For Wireless Communication Modified SPIHT Image Coder For Wireless Communication M. B. I. REAZ, M. AKTER, F. MOHD-YASIN Faculty of Engineering Multimedia University 63100 Cyberjaya, Selangor Malaysia Abstract: - The Set Partitioning

More information

A Synchronization Scheme for Hiding Information in Encoded Bitstream of Inactive Speech Signal

A Synchronization Scheme for Hiding Information in Encoded Bitstream of Inactive Speech Signal Journal of Information Hiding and Multimedia Signal Processing c 2016 ISSN 2073-4212 Ubiquitous International Volume 7, Number 5, September 2016 A Synchronization Scheme for Hiding Information in Encoded

More information

Networking Applications

Networking Applications Networking Dr. Ayman A. Abdel-Hamid College of Computing and Information Technology Arab Academy for Science & Technology and Maritime Transport Multimedia Multimedia 1 Outline Audio and Video Services

More information

COMPUTER NETWORKS UNIT I. 1. What are the three criteria necessary for an effective and efficient networks?

COMPUTER NETWORKS UNIT I. 1. What are the three criteria necessary for an effective and efficient networks? Question Bank COMPUTER NETWORKS Short answer type questions. UNIT I 1. What are the three criteria necessary for an effective and efficient networks? The most important criteria are performance, reliability

More information

Appendix 4. Audio coding algorithms

Appendix 4. Audio coding algorithms Appendix 4. Audio coding algorithms 1 Introduction The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically

More information

DAB. Digital Audio Broadcasting

DAB. Digital Audio Broadcasting DAB Digital Audio Broadcasting DAB history DAB has been under development since 1981 at the Institut für Rundfunktechnik (IRT). In 1985 the first DAB demonstrations were held at the WARC-ORB in Geneva

More information