LOSSLESS AUDIO COMPRESSION USING INTEGER MODIFIED DISCRETE COSINE TRANSFORM. Yoshikazu Yokotani and Soontorn Oraintara

2003 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2003), Awaji Island, Japan, December 7-10, 2003. C14

Department of Electrical Engineering, University of Texas at Arlington, 416 Yates St., Arlington, TX 76019-0016, USA
Phone: +1-817-272-3482, Fax: +1-817-272-2253
Email: yoshi@msp.uta.edu, oraintar@uta.edu

ABSTRACT

Recently, an MPEG-2 AAC [1] based lossless audio codec using the integer MDCT (IntMDCT) was proposed [2]. The IntMDCT is constructed with the lifting scheme [3] so that perfect reconstruction (PR) holds. In this paper, we evaluate the IntMDCT implemented in fixed-point arithmetic with quantized lifting coefficients in MPEG-2 AAC based lossless audio coding. The results indicate that there is a tradeoff between the computational complexity of the IntMDCT and the coding efficiency in this scheme, and that the computational complexity of the IntMDCT can be reduced while a certain level of coding efficiency is maintained.

1. INTRODUCTION

Lossless audio coding has received attention from audio researchers and engineers in recent years. Following the demands of network operating companies and service providers, the MPEG committee started working to define lossless audio coding technology last year [4]. Recently, a lossless audio codec using the integer modified discrete cosine transform (IntMDCT), implemented by the lifting scheme, was proposed [5, 2]. The IntMDCT is an approximation of the MDCT employed in current audio coding standards, and it is a reversible process.

In [5], the IntMDCT is employed in the MPEG Audio Layer III (MP3) coder and decoder by factorizing the 3- and 9-point DCTs used in the 12- and 36-point MDCTs into a set of Givens rotations and approximating them by lifting.
In [2], a similar idea is applied at a different transform size: the 2048-point MDCT is approximated. Since it is very difficult to factorize a large orthogonal matrix into Givens rotations, the 512-point discrete Fourier transform (DFT) is used in the fast structure. In this paper, the IntFFT proposed in [10] is used in the implementation of our coder, since its complexity can be adjusted to different levels, which may be useful in different situations. Since the IntMDCT inherits the properties of the MDCT and is a reversible process, it is a natural choice for a lossless audio coder.

The IntMDCT can be used to implement a lossless audio codec in two ways. First, it can replace the conventional MDCT directly. Since it maps integers to integers with reversibility preserved, one can simply apply an entropy coder at the output of the transform. The resulting codec is called a nonscalable codec. Second, the conventional (lossy) audio codec using the MDCT is taken as the base layer, and an enhancement layer is created by applying the IntMDCT to the audio signal. The difference between the outputs of the two versions of the transform serves as the refining detail that complements the base layer bit stream. Since the base layer is compatible with the MPEG AAC standard and the enhancement layer can be an add-on component, this is called a scalable codec [6].

The key idea in approximating an orthogonal transform with irrational coefficients while preserving reversibility is to factorize the matrix into a number of Givens rotations, and then to factorize each rotation further into lower and upper triangular matrices. Each of these triangular matrices can be implemented using a two-port network with only one branch, called a lifting step, whose coefficient can be truncated [3]. As long as the forward and inverse lifting coefficients are approximated properly and in the same way, the input of the forward transform can be perfectly reconstructed (PR) at the output of its inverse.
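The reversibility argument above can be illustrated with a small sketch. It assumes the standard three-step lifting form of a plane rotation, with lifting coefficients (cos θ - 1)/sin θ and sin θ rounded to dyadic rationals, and it requires sin θ to be nonzero. The function names and the `n_bits` parameter are hypothetical and are not taken from the paper. Because the inverse subtracts exactly the rounded quantities that the forward pass added, the integer inputs are recovered exactly, whatever the quantization error of the coefficients.

```python
import math

def quantize(coeff, n_bits):
    """Round a real lifting coefficient to an integer k, meaning k / 2**n_bits."""
    return round(coeff * (1 << n_bits))

def lift(value, k, n_bits):
    """Integer approximation of value * k / 2**n_bits (add half, then shift)."""
    return (value * k + (1 << (n_bits - 1))) >> n_bits

def lifting_coefficients(theta, n_bits):
    """Quantized coefficients of the assumed three-step factorization."""
    p = quantize((math.cos(theta) - 1.0) / math.sin(theta), n_bits)
    u = quantize(math.sin(theta), n_bits)
    return p, u

def rotate_forward(x, y, theta, n_bits):
    """Integer-to-integer approximation of a rotation by theta."""
    p, u = lifting_coefficients(theta, n_bits)
    x += lift(y, p, n_bits)   # first lifting step (upper triangular)
    y += lift(x, u, n_bits)   # second lifting step (lower triangular)
    x += lift(y, p, n_bits)   # third lifting step (upper triangular)
    return x, y

def rotate_inverse(x, y, theta, n_bits):
    """Undo rotate_forward by subtracting the same terms in reverse order."""
    p, u = lifting_coefficients(theta, n_bits)
    x -= lift(y, p, n_bits)
    y -= lift(x, u, n_bits)
    x -= lift(y, p, n_bits)
    return x, y
```

For instance, `rotate_inverse(*rotate_forward(1000, -250, 0.3, 12), 0.3, 12)` returns the original pair `(1000, -250)`, even though the forward output only approximates the exact rotation. The multiplication inside `lift` is kept for brevity; with dyadic coefficients it could be expanded into additions and shifts.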
Such a technique has been applied to many existing transforms, including the dyadic wavelet transform [7], the discrete cosine transform (DCT) [8, 9], the discrete Fourier transform [10], and the MDCT [5]. In this paper, the IntMDCT is implemented in fixed-point arithmetic with quantized lifting coefficients and is evaluated experimentally in lossless audio coding, in both the nonscalable and the scalable codecs.

This paper is organized as follows. A fast structure of the IntMDCT is described in Section 2. Section 3 discusses the computational complexity of the IntMDCT, and the structures of the nonscalable and scalable codecs are described in Section 4 together with the coding results. In Section 5, the future work is described. Finally, the conclusions are stated.

2. A FAST AND MULTIPLIERLESS STRUCTURE OF THE INTMDCT

A fast structure of the IntMDCT can be constructed from a fast structure of the MDCT by factorizing each of its Givens rotations via the lifting scheme. Each rotation is specified by a pair of butterfly coefficients, which are the cosine and the sine of the rotation angle. This factorization is formulated in (1), in which the factored matrix is a Givens rotation matrix. We then introduce two parameters, the lifting coefficients, which are calculated from the butterfly coefficients by (2) and (3).
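Equations (1) to (3) express the lifting factorization of a plane rotation. A common textbook form is sketched below. The symbols c, s, p, and u and the sign convention of the rotation matrix are assumptions made for illustration and may differ from the notation of the original equations.

```latex
\begin{aligned}
R(\theta) &=
\begin{bmatrix} c & -s \\ s & c \end{bmatrix}
=
\begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix}
\begin{bmatrix} 1 & p \\ 0 & 1 \end{bmatrix},
\qquad c = \cos\theta,\quad s = \sin\theta, \\[4pt]
p &= \frac{c - 1}{s} = -\tan\frac{\theta}{2}, \\[4pt]
u &= s .
\end{aligned}
```

Multiplying out the three triangular factors gives diagonal entries 1 + pu = c and off-diagonal entries p(1 + c) = -s and u = s, which confirms the identity. In this form both |p| and |u| stay at or below 1 whenever |θ| ≤ π/2.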

Figure 1: The fast MDCT implementation via the FFT. The signal flow graph takes the inputs x(0), ..., x(2N-1) through a window operation (angles θsw), a pre-FFT twiddle operation (angles θpr), an N/2-point FFT (angles θfft1 and θfft3), and a post-FFT twiddle operation (angles θpo) to produce the outputs X(0), ..., X(N-1).

[...] (4)
[...] (5)
[...] (6)

Factorization of the N-point MDCT can be done separately in the following four blocks: the N-point sine window operation, the N-point pre-FFT twiddle operation, the N/2-point FFT, and the post-FFT twiddle operation. Figure 1 depicts a fast structure of the MDCT which is composed of these four blocks, with the rotation angles defined below.

The N-point sine window operation: θsw(n) = [...] (7)
The N-point pre-FFT twiddle operation: θpr(n) = [...] (8)
The N/2-point split-radix FFT: θfft1(k) = [...] (9)
θfft3(k) = [...] (10)
The post-FFT twiddle operation: θpo(n) = [...] (11)

where n = 0, ..., [...] and k = 0, ..., [...]. The inverse MDCT can be implemented simply by applying the rotation angles with the opposite signs. Here, equation (7) is obtained by the method reported in [2], and (8) and (11) are calculated from the fast DCT-IV structure described in [11]. Equations (9) and (10) are the twiddle factors of the N/2-point split-radix FFT, whose input signal is defined by (5) and (6).
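The floating-point transform that the IntMDCT approximates can be checked against a direct O(N^2) evaluation. The sketch below assumes the common MDCT definition with 2N windowed inputs and N outputs and a sine window. The function names and the 2/N scaling of the inverse are choices made for this illustration and are not taken from equations (4) to (11); the fast FFT-based structure of Figure 1 is not reproduced.

```python
import math

def sine_window(N):
    """Sine window of length 2N (satisfies the Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    """Direct MDCT: 2N windowed input samples to N coefficients."""
    w = sine_window(N)
    return [
        sum(w[n] * block[n]
            * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs, N):
    """Direct inverse MDCT with synthesis window: N coefficients to 2N samples."""
    w = sine_window(N)
    return [
        w[n] * (2.0 / N)
        * sum(coeffs[k]
              * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]

def analysis_synthesis(signal, N):
    """Transform 50%-overlapped blocks and overlap-add the inverse outputs."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        y = imdct(mdct(signal[start:start + 2 * N], N), N)
        for n in range(2 * N):
            out[start + n] += y[n]
    return out
```

Each block on its own contains time-domain aliasing; the aliasing cancels only where two consecutive blocks overlap, so the first and last N samples of a finite signal are not reconstructed by this sketch.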
Now, a fast IntMDCT can be constructed by calculating the lifting coefficients for all the rotations, as well as for the following matrix with a given rotation angle: [...]

Given a rotation angle, a pair of lifting coefficients is computed by equations (2) and (3). Here, it should be noted that these equations have to be changed for different ranges of the rotation angle in order to keep the absolute values of the lifting coefficients less than or equal to 1, as reported in [10].

The lifting coefficients obtained from the factorization are generally irrational and must be approximated by finite-precision numbers. Moreover, if the coefficients are represented as dyadic numbers, the IntMDCT can be implemented with additions and shifts only, so no multipliers are necessary. In what follows, the number of bits used to quantize these lifting coefficients is called the coefficient word length. The word length is directly related to the accuracy of the transform. Since this is a highly complex structure, there is no simple mathematical expression that relates the accuracy of the transform to the word length. Figure 2 shows the absolute values of the MDCT spectrum and of the IntMDCT spectrum with word lengths of 8, 12, and 15 bits when the 1 kHz sine wave in the SQAM [12] is the input. The size of the transforms is [...]. With 12 and 15 bits, the IntMDCT spectrum is closer to the MDCT spectrum than with 8 bits.

3. COMPUTATIONAL COMPLEXITY OF THE INTMDCT

The coefficient word length relates to the computational complexity as well as to the accuracy of the IntMDCT, since the shorter the word length, the fewer additions and shifts are required. However, it is difficult to evaluate quantitatively how much computational complexity can be saved by decreasing the word length. Moreover, the word length alone cannot show the advantage of using the lifting scheme rather than the butterfly structure in terms of computational complexity. Let us call FxpMDCT the case where the MDCT is implemented in fixed-point arithmetic and the butterfly structure is used to implement each Givens rotation, as in the conventional MDCT. In this section, the computational complexity is discussed in terms of the numbers of real multiplications and real additions needed to perform the FxpMDCT and the IntMDCT.

Figure 3 shows the numbers of real additions and shifts needed to calculate the [...]-point FxpMDCT and IntMDCT. The same word length is used to represent the butterfly coefficients in the FxpMDCT and the lifting coefficients in the IntMDCT. From the figure, it can be observed that the IntMDCT has at most [...] percent less computational complexity than the FxpMDCT. Moreover, reducing the word length from 15 to 12 bits saves about 28 percent of the computational complexity.

4. IMPLEMENTATION OF THE INTMDCT IN LOSSLESS AUDIO CODING

As proposed in [2, 6], two different types of lossless audio codecs can be constructed. One is a nonscalable codec, in which the IntMDCT is directly followed by an entropy coder.
Another is a scalable codec, in which the base layer stream is compatible with the MPEG AAC standard and the enhancement layer is the residual needed to complete a losslessly compressed audio signal [6].

4.1. Nonscalable codec

Figure 5 depicts the structure of the nonscalable codec implemented with the IntMDCT. In this codec, the psychoacoustic model defined in MPEG-2 AAC is used to switch the window shape for the IntMDCT. After the IntMDCT, the coefficients are compressed by a context-based arithmetic coder. A context model is a probability model of the current coding symbol constructed from the neighboring symbols. The model can be used in an entropy coding scheme so that the compression ratio is improved by exploiting the conditional entropy. In lossless audio coding, this encoding scheme has been applied to the linear prediction error in the time domain [13].

Figure 3: Comparison of the computational complexities of the FxpMDCT and the IntMDCT: (a) the number of additions, (b) the number of shifts. Both panels show one curve for the IntMDCT and one for the FxpMDCT, with the vertical axis scaled by 10^5.

Table 1 compares the bit rates of the nonscalable codec, with lifting coefficients quantized to 8, 12, and 15 bits, with the bit rates of the lossless AAC [2] and of a linear-prediction-based codec, FLAC (Free Lossless Audio Codec) [14]. The test audio files are chosen from the SQAM CD [12]. As mentioned in [2], in each SQAM audio file the audio signal is preceded and followed by zero frames. To make a fair comparison, these zero frames are omitted. It can be observed that the coding efficiencies with 12 and 15 bits are almost the same. With 8 bits, the coding efficiency is degraded by at most [...] bits/sample.
This is because decreasing the word length of the lifting coefficients introduces more approximation error into the IntMDCT spectrum, as observed in Figure 2, and this randomness pushes up the bit rate, even though it reduces the computational complexity of the IntMDCT. These results imply that the optimum word length could be around [...] bits to maintain the same coding efficiency as with 15 bits. Thus, for the nonscalable codec, a high word length is desirable to keep a certain level of coding efficiency.

Figure 2: Absolute values of (a) the MDCT spectrum, (b) the IntMDCT spectrum with 8-bit lifting coefficients, (c) the IntMDCT spectrum with 12-bit lifting coefficients, and (d) the IntMDCT spectrum with 15-bit lifting coefficients.

Table 1: Comparison of the bit rates (bits/sample) of the nonscalable codec

Test audio   Nonscalable codec (coefficient word length)   Lossless   FLAC
             8 bits     12 bits    15 bits                  AAC [2]    [14]
Piano        4.7        4.28       4.24                     4.         3.79
Soprano      7.26       6.91       6.88                     6.65       7.7
Orchestra    7.         6.69       6.63                     6.5        6.25
Pop          8.44       8.32       8.31                     8.3        7.56

4.2. Scalable codec

A lossless scalable codec can be realized by combining the MPEG-2 AAC perceptual audio coder with a residual of the quantized IntMDCT coefficients. Figures 6 and 7 illustrate the MPEG-2 AAC based lossless scalable codec [6]. Here, it can be seen that the base layer stream is compatible with the bit stream defined by the MPEG-2 AAC standard. Figure 4 shows the bit rates of both the base layer and the enhancement layer of the scalable codec with word lengths of [...] and 15 bits for the four test audio files used in Table 1. Ideally, the bit rates of the nonscalable codec and the scalable codec would be the same, since both tests use the same set of audio files. However, it is observed that the bit rates of the scalable codec are higher than those of the nonscalable codec. Moreover, when the bit rate of the base layer is equal to [...] kbps, the total bit rate increases by at most [...] bits/sample. This may be explained as follows: increasing the bit rate of the base layer cannot make the approximation error of the IntMDCT in the enhancement layer smaller, so the bit rate of the enhancement layer does not decrease much. As a result, the total bit rate increases. In addition, similar to the result in Table 1, the coding efficiency with 12 bits is similar to the one obtained with 15 bits in the scalable codec.

5. CONCLUSIONS

In this paper, we implemented the IntMDCT in fixed-point arithmetic with quantized lifting coefficients and evaluated its computational complexity as well as its coding efficiency when it is applied in the MPEG-2 AAC based lossless audio coding scheme. The results indicate that there is a tradeoff between the computational complexity and the coding efficiency for lossless audio coding, and that it is balanced at a word length of [...] bits in this simulation.

6. REFERENCES

[1] ISO/IEC JTC1/SC29/WG11 (MPEG). International standard ISO/IEC 13818-7, Generic coding of moving pictures and associated audio: Advanced audio coding. 1997.
[2] R. Geiger, T. Sporer, J. Koller, and K. Brandenburg. Audio coding based on integer transform. 111th AES Convention, Preprint 5471, 2001.
[3] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. Technical report, Bell Laboratories, Lucent Technologies, 1996.
[4] ISO/IEC JTC1/SC29/WG11 (MPEG). MPEG meeting document N54. 2002.
[5] T. Krishnan and S. Oraintara. A fast and lossless forward and inverse structure for the MDCT in MPEG audio coding. Proc. of the International Symposium on Circuits and Systems, May 2002.
[6] R. Geiger, J. Herre, J. Koller, and K. Brandenburg. IntMDCT - A link between perceptual and lossless audio coding. Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, 2:1813-1816, 2002.
[7] A. R. Calderbank et al. Lossless image compression using integer to integer wavelet transforms. Proc. IEEE International Conf. on Image Processing, 1:596-599, 1997.
[8] T. D. Tran. The binDCT: Fast multiplierless approximation of the DCT. IEEE Signal Processing Letters, 7(6):141-144, June 2001.
[9] S. C. Chan and P. M. Yiu. Multiplierless discrete sinusoidal and lapped transforms using sum-of-powers-of-two (SOPOT) coefficients. IEEE International Symposium on Circuits and Systems, 2:13-16, May 2001.
[10] S. Oraintara, Y. J. Chen, and T. Nguyen. Integer fast Fourier transform. IEEE Trans. on Signal Processing, 50:607-618, March 2002.
[11] H. S. Malvar. Signal Processing with Lapped Transforms. Artech House, 1992.
[12] SQAM (Sound Quality Assessment Material). CD 422 204-2. European Broadcasting Union, 1988.
[13] T. Qiu. Lossless audio coding based on high order context modeling. IEEE Fourth Workshop on Multimedia Signal Processing, pages 575-580, 2001.
[14] Josh Coalson. FLAC: Free Lossless Audio Codec.

Figure 4: Comparison of the bit rates (bits/sample) of the scalable codec for (a) SQAM60 (piano), (b) SQAM61 (soprano), (c) SQAM65 (orchestra), and (d) SQAM70 (pop). The vertical axis of each panel is the total bit rate of the scalable codec (bits/sample).

Figure 5: A structure of the nonscalable codec. Encoder: PCM audio, IntMDCT, context-based coding, with the psychoacoustic model as control. Decoder: context-based decoding, inverse IntMDCT. The legend distinguishes signals in float, signals in integer, and control.
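The context-based coding block of Figure 5 relies on the conditional entropy of a symbol given its context being lower than its unconditional entropy. The sketch below estimates both quantities empirically, using only the previous symbol as context. This is a deliberate simplification: the coder cited in Section 4.1 uses high-order context modeling with an arithmetic coder, and the function names here are hypothetical.

```python
import math
from collections import Counter

def entropy(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(symbols):
    """Empirical entropy of a symbol given the previous symbol as its context."""
    pairs = Counter(zip(symbols[:-1], symbols[1:]))   # (context, symbol) counts
    contexts = Counter(symbols[:-1])                  # context counts
    total = len(symbols) - 1
    return -sum(
        n / total * math.log2(n / contexts[prev])
        for (prev, _), n in pairs.items()
    )
```

For a strictly alternating sequence such as `[0, 1] * 50`, the zeroth-order entropy is 1 bit per symbol, while the conditional entropy is 0 because each context fully determines the next symbol. Real IntMDCT coefficients are far less predictable, so the gain in practice is smaller.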

Figure 6: A structure of the lossless scalable encoder. MPEG-2 AAC encoder: 16-bit PCM audio, psychoacoustic model with block switching, MDCT, quantizer Q, Huffman coding, base layer. Lossless encoding scheme: inverse quantizer Q^-1, integer MDCT, context-based coding, enhancement layer.

Figure 7: A structure of the lossless scalable decoder. MPEG-2 AAC decoder: base layer, Huffman decoding, inverse quantizer Q^-1, IMDCT, reconstructed audio with lossy compression process. Lossless decoding scheme: enhancement layer, context-based decoding, integer IMDCT, reconstructed audio with lossless compression process.
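The layering in Figures 6 and 7 can be sketched with a toy example. It assumes a uniform quantizer with a hypothetical step size `step` as a stand-in for the MPEG-2 AAC quantizer, which is not uniform, and it takes the floating-point MDCT coefficients and the integer IntMDCT coefficients as given lists. The function names are illustrative only. The enhancement layer carries the integer difference between the IntMDCT coefficients and the rounded, inverse-quantized base layer, so the decoder can restore the integers exactly.

```python
def encode_scalable(float_coeffs, int_coeffs, step):
    """Return (base layer indices, integer enhancement residual)."""
    base = [round(c / step) for c in float_coeffs]        # lossy base layer
    residual = [ic - round(b * step)                      # integer difference
                for ic, b in zip(int_coeffs, base)]
    return base, residual

def decode_lossy(base, step):
    """Base layer only: inverse-quantized coefficients."""
    return [b * step for b in base]

def decode_lossless(base, residual, step):
    """Base layer plus enhancement layer: exact integer coefficients."""
    return [round(b * step) + r for b, r in zip(base, residual)]
```

With `float_coeffs = [103.4, -57.9, 12.2, 0.3]`, `int_coeffs = [103, -58, 12, 0]`, and `step = 8.0`, the residual is `[-1, -2, -4, 0]` and `decode_lossless` returns the integer coefficients unchanged. The residual is small only when the two transforms agree closely, which is consistent with the observation in Section 4.2 that the IntMDCT approximation error limits how far the enhancement layer bit rate can drop.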