
Applied Mechanics and Materials Online: 2013-06-27
ISSN: 1662-7482, Vol. 330, pp 981-984
doi:10.4028/www.scientific.net/amm.330.981
© 2013 Trans Tech Publications, Switzerland

DRA AUDIO CODING STANDARD

Wenhua Ma 1,a, Yuanzhe Ma 2,b, Yu-Li You 3,c

1 School of Informatics, Guangdong University of Foreign Studies, 2 Baiyundadao
2 Department of Biomedical, South China University of Technology, 381 Wushanlu
3 Provincial Key Lab for Digital Audio Technology, Digital Rise Technology Co. Ltd., 6th Floor, Bldg. 2, Science and Tech Park, South China University of Technology
a myz122@yahoo.com.cn, b reika2009@yeah.net, c yuliyou@usa.com

Keywords: Audio coding, standard, listening test, adaptive transform coder.

Abstract. China's DRA audio coding standard is shown to be a bare-bone transform coder that uses a transient-localized MDCT for improved pre-echo suppression and statistic allocation of codebooks for high entropy coding efficiency. A signal path of up to 24 bits is provided throughout the codec, so the highest audio quality can be delivered when the bit rate suffices. Results of five ITU-R BS.1116 compliant subjective listening tests are presented.

Introduction

There has been extensive standardization activity in audio coding over the past twenty years. MPEG-1 was the first international standard for high-quality perceptual audio coding [1]. It is essentially a subband coder built on a 32-band QMF (quadrature mirror filter) bank. Its Layer 3 adds a switched MDCT (modified discrete cosine transform) to the subband signals output by the QMF to increase frequency resolution. MPEG-1 was extended by MPEG-2 BC (backward compatible) to support lower sample rates and multichannel surround sound [1]. MPEG-2 AAC abandons backward compatibility with MPEG-1 to achieve a significant improvement in coding efficiency [1]. AAC uses an MDCT that switches between 1024 and 128 spectral lines.
AAC was carried over into MPEG-4 AAC with the addition of more coding tools, such as Perceptual Noise Substitution and Long Term Prediction, and more coder configurations [1]. Dolby AC-3 is probably the most commercially successful audio coding standard [1]. It uses an MDCT that switches between 128 and 256 spectral lines. WMA (Windows Media Audio), offered by Microsoft, uses an MDCT that switches among 64, 128, 256, 512, 1024, and 2048 spectral lines [2]. Vorbis, an open-source codec offered by the Xiph.Org Foundation, uses an MDCT that switches between 256 and 1024 spectral lines [3]. Modern audio codecs have thus converged on the MDCT as the tool for time-frequency analysis.

The DRA algorithm [4], adopted as China's national standard for its electronics industry, uses a transient-localized MDCT that improves pre-echo suppression with small bit and computation overheads. It uses statistic allocation of Huffman codebooks to improve the coding efficiency of the Huffman codes. Its quantization units and Huffman codebooks are designed so that a signal path of up to 24 bits is provided throughout the codec, allowing the highest audio quality to be delivered when the bit rate allows. Although simple, the DRA standard delivers state-of-the-art coding efficiency, as shown by five ITU-R BS.1116 compliant subjective listening tests.

The Algorithm

As shown in Figure 1, the DRA audio encoder is a simple, essentially bare-bone, adaptive transform coder. Its major components are described as follows.

Transient-Localized MDCT. An MDCT that switches between 128 and 1024 spectral lines provides the time-frequency analysis. To improve its pre-echo suppression, a special brief window is introduced whose effective size is reduced from 256 to 160 samples: the brief window is nonzero only within its central 160 samples, with zeros in its first 48 and last 48 samples [5]. To switch between this brief window and the long (2048-sample) and short (256-sample) windows, a few additional transitional windows are introduced to satisfy the perfect reconstruction conditions [1]. All the windows are based on the sine window [1]. The brief window is applied only to the block of samples containing a transient, while the short and/or the appropriate transitional windows are applied to the quasi-stationary samples in the remainder of the transient frame.

Some example window sequences are shown in Figure 2:
(a) the conventional approach;
(b) a transient in the first block of the frame, so a brief window is deployed for that block;
(c) a transient in the third block;
(d) transients in the third and sixth blocks, so a brief window is placed in each of these blocks;
(e) a transient in the last block.

Because the effective size of the brief window is reduced from 256 to 160 samples, better pre-echo suppression is achieved for two reasons. First, a finer temporal resolution is applied to the transient samples, and the high bit rates associated with transients are confined to fewer samples. Second, the spread of quantization noise is reduced.
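The window construction can be illustrated with a short Python sketch. It assumes that the nonzero core of the brief window is itself a 160-sample sine window (the text says only that all DRA windows are based on the sine window), builds it alongside the 256-sample short sine window, and measures each window's effective size in samples and in milliseconds at 48 kHz. It also checks the power-complementary (Princen-Bradley) condition w[n]^2 + w[n + N/2]^2 = 1: the short sine window satisfies it, while the brief window does not when overlapped with itself, which is consistent with the need for transitional windows. The names `sine_window`, `brief_window`, `support`, and `princen_bradley_ok` are illustrative, not part of the standard.

```python
import math

SAMPLE_RATE_HZ = 48_000  # the "typical" sample rate used in the text


def sine_window(length):
    """Sine window w[n] = sin(pi * (n + 0.5) / length), n = 0..length-1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def brief_window(total=256, zero_pad=48):
    """Hypothetical brief window: zero_pad zeros at each end, sine-shaped core.

    Assumption: the nonzero core is a sine window of length total - 2*zero_pad
    (160 samples with the defaults); the exact DRA shape is set by the standard.
    """
    core = sine_window(total - 2 * zero_pad)
    return [0.0] * zero_pad + core + [0.0] * zero_pad


def support(window):
    """Number of nonzero samples, i.e. the window's effective size."""
    return sum(1 for w in window if w != 0.0)


def samples_to_ms(samples, rate=SAMPLE_RATE_HZ):
    return 1000.0 * samples / rate


def princen_bradley_ok(window, tol=1e-9):
    """Check w[n]^2 + w[n + N/2]^2 == 1 for a window overlapping a copy of itself."""
    half = len(window) // 2
    return all(abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) < tol
               for n in range(half))


short = sine_window(256)  # short window: 128 spectral lines
brief = brief_window()    # brief window: same length, 160 nonzero samples

for name, win in (("short", short), ("brief", brief)):
    n = support(win)
    print(f"{name}: {n} nonzero samples = {samples_to_ms(n):.2f} ms, "
          f"Princen-Bradley: {princen_bradley_ok(win)}")
```

Running it prints 256 nonzero samples (5.33 ms, condition satisfied) for the short window and 160 nonzero samples (3.33 ms, condition not satisfied) for the brief window.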
The spread of quantization noise is 256 samples for the short window but only 160 samples for the brief window. At a typical sample rate of 48 kHz, these amount to 5.33 ms and 3.33 ms, respectively. Given that significant premasking tends to last about 1-2 ms [1], the portion of the quantization-noise spread that may be audible is reduced from 3.3-4.3 ms to 1.3-2.3 ms, a significant reduction.

Linear Scalar Quantization. Linear scalar quantization is used to quantize the MDCT spectral lines. A group of spectral lines, referred to as a quantization unit, is bounded in the frequency domain by a critical band and in the time domain by the statistically similar MDCT blocks. All lines in the unit share one quantization step size, which is itself logarithmically quantized with a resolution of 0.2 dB. When the quantization step size is one, the maximum allowed quantization index is $\pm(2^{23}-1)$, and the Huffman codebooks are designed to accommodate this range. Consequently, a signal path of up to 24 bits is provided throughout the codec, so audio quality far exceeding the perceptual capability of the human ear can be delivered if the bit rate suffices.
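A quantization unit's shared step size and the 24-bit index range can be sketched as follows. The mapping from an integer step index to a step size (0.2 dB per index, dB taken on an amplitude scale, index 0 giving a step of one) is an assumption for illustration; the standard defines the actual table. `step_size`, `quantize_unit`, and `dequantize_unit` are hypothetical names.

```python
MAX_INDEX = 2**23 - 1  # largest allowed |quantization index| per the text
STEP_DB = 0.2          # resolution of the logarithmic step-size quantizer


def step_size(step_index):
    """Step size for a hypothetical integer step index, spaced STEP_DB apart.

    Assumption: amplitude dB (20*log10) and index 0 -> step size of one.
    """
    return 10.0 ** (step_index * STEP_DB / 20.0)


def quantize_unit(lines, step):
    """Linear scalar quantization of one quantization unit with a shared step."""
    indexes = []
    for x in lines:
        q = round(x / step)                                  # nearest integer index
        indexes.append(max(-MAX_INDEX, min(MAX_INDEX, q)))   # clamp to +/-(2^23 - 1)
    return indexes


def dequantize_unit(indexes, step):
    """Reconstruct spectral lines from their indexes and the shared step size."""
    return [q * step for q in indexes]


# With a step size of one, integer-valued lines within 24 bits round-trip exactly.
lines = [8388607.0, -8388607.0, 12.0, -3.0]
step = step_size(0)
assert dequantize_unit(quantize_unit(lines, step), step) == lines
```

The round trip at a step size of one is the sense in which the text speaks of a 24-bit signal path; larger step sizes trade precision for fewer bits.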

Statistic Allocation of Codebooks. In the conventional approach to codebook allocation, all the spectral lines within a quantization unit share one Huffman codebook, namely the smallest codebook that can accommodate the largest quantization index in the unit. Consequently, once the quantization step size is fixed, all the quantization indexes within the unit are fixed, and so is the Huffman codebook; there is no other option. Because the quantization indexes within a quantization unit do not necessarily share the same statistical properties, this approach provides a poor match, if any, between the statistics of the Huffman codebooks and those of the quantization indexes. This motivates a statistically adaptive approach to codebook assignment, whose steps are as follows:
1. Group the quantization indexes into granules of four, and assign to each granule the smallest codebook that can accommodate its largest quantization index.
2. Segment these granule codebook indexes into large segments based on their local statistical properties.
3. Select the largest codebook within each segment as the codebook for that segment.

The advantage of this approach is illustrated in Figure 3. Because the largest quantization index falls into quantization unit d, the conventional method assigns a large codebook to that entire unit, which is clearly a poor match because most of the indexes in the unit are much smaller. With the DRA approach, however, the largest quantization index is placed in segment C, where it shares a codebook with other large quantization indexes, while all the quantization indexes in segment D are small, so a small codebook is selected for that segment. This results in fewer bits for coding the quantization indexes.

Other Components. At low bit rates, the DRA algorithm may deploy joint channel coding.
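The three codebook-allocation steps can be sketched in Python on a toy sequence of quantization indexes. The codebook table (`CODEBOOK_LIMIT`) and the segmentation rule are assumptions: the paper says only that segments are formed from local statistical properties, so this sketch grows a segment while its granule codebooks stay within a fixed tolerance of one another. The data is a made-up unit echoing the Figure 3 discussion, not the figure's actual values.

```python
# Hypothetical codebook table: CODEBOOK_LIMIT[k] is the largest |index| codebook k codes.
CODEBOOK_LIMIT = [0, 1, 2, 4, 8, 16, 32, 64, 128]


def smallest_codebook(values):
    """Smallest codebook able to code the largest |index| in values."""
    peak = max(abs(v) for v in values)
    for cb, limit in enumerate(CODEBOOK_LIMIT):
        if peak <= limit:
            return cb
    raise ValueError(f"|index| {peak} exceeds the hypothetical codebook table")


def granule_codebooks(indexes, granule=4):
    """Step 1: one codebook per granule of four quantization indexes."""
    return [smallest_codebook(indexes[i:i + granule])
            for i in range(0, len(indexes), granule)]


def segment(codebooks, tolerance=1):
    """Steps 2 and 3: group granules into segments, one codebook per segment.

    Assumed rule: a segment grows while its granule codebooks stay within
    `tolerance` of each other; each segment uses its largest codebook.
    Returns (first_granule, end_granule_exclusive, codebook) triples.
    """
    segments = []
    start, lo, hi = 0, codebooks[0], codebooks[0]
    for g in range(1, len(codebooks)):
        new_lo, new_hi = min(lo, codebooks[g]), max(hi, codebooks[g])
        if new_hi - new_lo > tolerance:
            segments.append((start, g, hi))  # close the current segment
            start, lo, hi = g, codebooks[g], codebooks[g]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(codebooks), hi))
    return segments


# A toy quantization unit containing a few large indexes among small ones.
unit = [0, 1, 0, 1, 1, 0, 1, 0, 30, 12, 9, 20, 1, 0, 0, 1]
conventional = smallest_codebook(unit)  # one codebook for the whole unit
per_granule = granule_codebooks(unit)
print("conventional:", conventional)
print("segments:", segment(per_granule))
```

Here the conventional method assigns codebook 6 to all sixteen indexes, whereas the segmented assignment uses codebook 6 only for the granule holding the large indexes and codebook 1 elsewhere. An encoder could instead choose segment boundaries by comparing actual Huffman bit counts; the fixed-tolerance rule is only the simplest illustration.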
DRA's sum/difference coding is implemented in the regular way, but its joint intensity coding is somewhat different: instead of joining stereo pairs, it joins all channels into the left channel, which provides a significant bit rate reduction when surround sound is involved. The perceptual model and global bit allocation are necessary components of the DRA encoder, but they are not part of the DRA decoder and there is little, if any, restriction on their implementation, so they are not specified in the DRA standard and are not discussed here.

Subjective Listening Tests

During its standardization process, DRA went through five rounds of ITU-R BS.1116 [1] compliant subjective listening tests. Their results are shown in Table 1.

Table 1. Scores for the ITU-R BS.1116 compliant subjective listening tests (- = no result reported).

Lab      Date (MM/YY)   Stereo, 128 kbps   5.1, 320 kbps   5.1, 384 kbps
NTICRT   08/04          4.6                -               4.7
SLDST    10/04          4.2                -               4.0
SLDST    01/05          4.1                -               4.2
SLDST    07/05          4.7                -               4.5
SLDST    08/06          -                  4.6             4.9

All these tests specified the bit rate as an absolute upper limit that could not be exceeded in any frame. For example, at a sample rate of 48 kHz, a bit rate of 128 kbps translates into 2730 bits per frame, because a DRA frame consists of 1024 samples (128,000 × 1024 / 48,000 ≈ 2730.7, rounded down). No frame could use more than 2730 bits, and no bit reservoir was allowed.

The first test was conducted by the National Testing and Inspection Center for Radio and TV Products of China (NTICRT) in August 2004. Ten stereo sound tracks, selected mostly from the SQAM CD [6], and five 5.1 surround sound tracks were used in the test. The test subjects were all expert listeners: conductors, musicians, recording engineers, and audio engineers.

The other four tests were all performed by the State Lab for DTV System Testing (SLDST) under the State Administration for Radio, Film, and TV of China. Apart from a few Chinese sound tracks, most of the test materials were selected from the SQAM CD [6] and a pool of surround sound tracks used by the EBU and MPEG, including Pitch pipe, Harpsichord, and Elloit1. The last test, though still conducted by SLDST, was actually ordered and supervised by China Central TV (CCTV) as part of its DTV technology evaluation program. CCTV was interested only in surround sound, so DRA was tested at 384 kbps and 320 kbps. This test compared DRA with two major international codecs, and DRA came out as the clear winner.

Conclusion

The DRA audio coding standard was shown to be essentially a bare-bone transform coder that uses a transient-localized MDCT for improved pre-echo suppression and statistic allocation of codebooks for better entropy coding efficiency. Its quantizer and Huffman codebooks are designed so that a signal path of up to 24 bits is provided throughout the codec, allowing the highest audio quality to be delivered when the bit rate suffices.
Its coding efficiency has been evaluated in five ITU-R BS.1116 compliant subjective listening tests.

Acknowledgements

Supported by the 2012 Guangdong Science and Technology Plan for Commercialization of Advanced and New Technologies under contract 2012B010100033.

References

[1] T. Painter and A. Spanias, Perceptual coding of digital audio, Proceedings of the IEEE, vol. 88, no. 4, pp. 451-513, April 2000.
[2] Wikipedia, Windows Media Audio, http://en.wikipedia.org/wiki/Windows_Media_Audio, October 2007.
[3] Xiph.Org Foundation, Vorbis I specification, 2004.
[4] Yu-Li You, Weixiong Zhang, Mao Xu, and Subin Zhang, Electronics Industry Standard: Multichannel Digital Audio Coding Technology, SJ/T11368-2006, Ministry of Information Industry, People's Republic of China, 2007.
[5] Yu-Li You and Wenhua Ma, Temporal transient localization for enhanced pre-echo suppression, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing, 2008.
[6] EBU, Sound Quality Assessment Material Recordings for Subjective Tests, Tech. 3253, April 1988.

Materials Engineering and Automatic Control II, doi:10.4028/www.scientific.net/AMM.330