Volume 4, Issue 8, August 2014 ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com

Audio Compression Using Biorthogonal Wavelet, Modified Run Length, High Shift Encoding

Zainab T. Drweesh, Loay E. George
Dept. of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq

Abstract: Audio compression has become one of the basic technologies and has been the subject of much research and experimentation throughout the last two decades. The need to compress audio data is motivated by both storage requirements and transmission requirements. The purpose of this research is to design and implement a low-complexity, efficient audio coding scheme based on the biorthogonal tap 9/7 wavelet filter. The proposed system consists of audio normalization, followed by the wavelet transform (tap 9/7), progressive hierarchical quantization, modified run length encoding, and finally high order shift coding to produce the final bit stream. To reduce the effect of quantization noise, which is noticeable in the low-energy segments of the audio signal, a post-processing filtering stage is introduced as the final stage of the decoding process. The performance of the suggested audio encoding method has been measured using peak signal to noise ratio (PSNR) and compression ratio (CR). The attained results indicate that the compression performance of the system is promising; it achieved better results than the DCT-based scheme. The compression ratio increases with the number of passes. The post-processing stage improved the subjective quality of the reconstructed audio signal, and it also improved the fidelity level of the reconstructed signal when PSNR is less than 38 dB.

Keywords: Audio Compression, Biorthogonal tap 9/7 wavelet filter, Hierarchical Quantization, Lossless Coding

I. INTRODUCTION
The need to compress media data is motivated by both storage requirements and transmission requirements. In the earlier days of the information age, disk space was limited and expensive, and bandwidth was limited, so compression was a must [1]. Data compression is popular for two reasons: (1) people like to accumulate data and hate to throw anything away; no matter how big a storage device one has, sooner or later it is going to overflow, and data compression is useful because it delays this inevitability; and (2) people hate to wait a long time for data transfers. When sitting at the computer, waiting for a Web page to come in or for a file to download, we naturally feel that anything longer than a few seconds is a long time to wait [2].

The need for audio compression algorithms that can simultaneously satisfy the conflicting demands of high compression ratios and transparent quality for high fidelity audio signals led to the establishment of several coding methodologies over the last two decades. In general, audio compression schemes employ design techniques that exploit both perceptual irrelevancies and statistical redundancies. The most popular audio coders are based on two techniques: sub-band coding and transform coding. Sub-band coding splits the signal into a number of sub-bands using band-pass filters, like the wavelet transform [3]. Transform coding uses a mathematical transformation like the FFT or the DCT. Considerable interest has arisen in recent years regarding the wavelet as a new transform technique for both image and audio processing applications. Like other transform coding techniques, wavelet coding is based on the idea that the transform decorrelates the sample values of an audio signal, so that most of the signal energy is concentrated in a small fraction of the coefficients, which can then be coded more efficiently than the original sample values themselves [4].
Several algorithms have been developed to improve audio coding (compression). Khalifa et al. [5] proposed audio compression using the wavelet transform to achieve transparent coding of audio and speech signals at the lowest possible data rates. Their wavelet based compression system reached a compression ratio of 1.88 with a signal to noise ratio of 34.5 dB when using the Daubechies-10 wavelet. Dhubkarya and Dubey [4] presented a high quality audio codec at low bit rate using the wavelet transform, and improved the reconstructed wave using post filtering. Harmanpreet Kaur and Ramanpreet Kaur [6] proposed a speech compression method using different transform techniques: the signal is compressed using the DWT, the compressed signal is compressed again using the DCT, and the result is then decompressed using the DWT. They investigated the use of the DWT and DCT as analysis tools for speech signal coding, and used the Peak Signal to Noise Ratio and the Normalized Root Mean Square Error (NRMSE) to evaluate the effectiveness of different filters of the wavelet family. Patil et al. [7] proposed a simple audio compression scheme based on the discrete wavelet transform and the DCT. They implemented it using MATLAB; the experimental results indicated that, in general, the DWT based technique improves the compression gain and the signal to noise ratio.

2014, IJARCSSE All Rights Reserved Page 63
II. AUDIO COMPRESSION SYSTEM
In our previous work [8], we proposed the use of the DCT with run-length encoding and a high-order shift encoder to compress audio signals. In this paper we propose the use of the biorthogonal tap 9/7 wavelet transform instead of the discrete cosine transform (DCT). Some modifications to the run length encoder and the high-order shift encoder were also made to make the proposed wavelet based compression scheme more efficient. We have chosen the biorthogonal wavelet decomposition because of its efficiency and its wide use in lossy and near-lossless image compression, taking into consideration its use in the ISO JPEG2000 standard. In this research a modified run length encoding (RLE) method is developed; the modifications overcome the weakness of traditional run length encoding (i.e., the gain of RLE is reduced significantly when short runs occur in the input stream). Also, an improved high-order shift encoder is introduced as a stable and robust entropy encoder to reduce the size of the transform stream. The shift encoder produces a list of codewords to encode the input sequence of numbers.

The rest of this paper is structured as follows. Section III contains a description of the proposed system. The proposed system is tested using different audio files, and the test results are discussed in Section IV. Finally, the derived conclusions are listed in Section V. The layout of the proposed system is illustrated in Figure (1).

Fig. 1 The Proposed Encoding System Layout

A. Biorthogonal Wavelet Transform
In order to gain greater flexibility in the construction of wavelet bases, the orthogonality condition is relaxed, allowing semi-orthogonal, biorthogonal or non-orthogonal wavelet bases [9]. The proposed system uses the biorthogonal 9/7 wavelet. This wavelet is part of the Cohen-Daubechies-Feauveau (CDF) family of symmetric biorthogonal wavelets.
The Daubechies 9/7 filters (also called tap 9/7 because the filter lengths are 9 and 7 for the low-pass and high-pass filters, respectively) have risen to special prominence because they were selected as the kernel transform of the JPEG2000 standard [10]. In most cases, the filters used in wavelet transforms have floating point coefficients. Since the input data have integer entries, the filter output no longer consists of integers, and losses will result due to rounding [11]. The 9/7 wavelet transform is computed by four lifting steps followed by two scaling steps. The lifting scheme of the biorthogonal 9/7 transform goes through four steps: two prediction operators and two update operators, as shown in Figure (2) [12].

B. Quantization
Quantization is simply the process of reducing the number of bits needed to store coefficient values by reducing their precision (e.g., rounding from float type to integer). The goal of quantization is to reduce most of the less important high frequency coefficients to zero. In the proposed system, uniform scalar quantization was adopted to quantize the coefficients of each subband individually; this step reduces the number of bits needed to represent the coefficients approximately, and prepares them for the run length and shift coding steps.
Fig. 2 Split, Predict and Update steps of the forward CDF 9/7 wavelet using the lifting scheme: (a) lifting implementation of the analysis side of the CDF 9/7 filter bank; (b) structure of the CDF 9/7 filter.

The coefficients of each subband are quantized with an appropriate quantization step value (Q_stp). The transform coefficients are categorized according to their subband membership (L_n, H_n, ..., H_2, H_1). The approximation subband (L_n) coefficients are quantized using a low quantization step, which is always smaller than the quantization step used for the detail subbands' coefficients. Also, the quantization step of the high level detail subband coefficients is smaller than that of the low level subbands.

C. Modified Run Length Coder
A modified run length encoding (RLE) step is used to prune the long runs of zeros, because shift coding cannot handle long runs efficiently; it can only prune the short runs through its pairing step. As mentioned earlier, the modification added to the applied RLE overcomes the known weakness of the traditional run length encoder. A selective mechanism was adopted to improve RLE performance; the selectivity criterion is imposed on the size of the run of 0's such that "if the length of a run is less than (m) zeros then this run is handled as a sequence of symbols and not as a zero run". The attributes of the modified run length encoder are explained in detail in the next section.

D. High Order Shift Coding
As in our previous work [8], the developed high order shift encoder is applied, as the last step, to the output stream coming from the RLE stage. The proposed method consists of the following steps:
1. Map the symbols' values to be always positive.
2. Bound the dynamic range of the input symbols' values to be always lower than a certain range.
3. Detect the most redundant pairs of symbols and represent each pair by a new single symbol.
4. Apply the shift coding optimizer to find the best lengths for the short and long codewords.

III. PROPOSED SYSTEM WORKFLOW
The developed audio compression system consists of two major units: the Encoding unit and the Decoding unit.

A. Encoding Unit
The system flow involves the following stages:

Stage-1: Pre-processing Stage
This stage involves the following major steps:
1. Loading the waveform audio file (in WAVE format). This process includes reading the header data to get the basic file and signal specification information (i.e., number of samples, number of channels, sampling rate, and sampling resolution). Then the audio signal data is loaded as an array of unsigned bytes when the sample resolution is 8 bit/sample, and as an array of signed integers when the sample resolution is 16 bit/sample.
2. Normalizing the loaded audio data values to the range [-1, 1] using the following equations:
(i) For 8 bit sampling resolution:
$$W_n(i) = \frac{W(i) - 128}{128} \tag{1}$$
(ii) For 16 bit sampling resolution:
$$W_n(i) = \frac{W(i)}{32768} \tag{2}$$
Where W(i) is the i-th element value of the loaded audio data.
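The normalization of Eqs. (1) and (2) can be illustrated with a minimal Python sketch. The function names `normalize` and `denormalize` are hypothetical and are not taken from the authors' C# implementation; the inverse mapping, with rounding and clamping to the legal sample range, is an assumption about how the decoder returns to integer samples.

```python
def normalize(samples, bits):
    """Map raw WAVE samples to [-1, 1) following Eqs. (1) and (2)."""
    if bits == 8:                       # unsigned bytes, 0..255
        return [(w - 128) / 128 for w in samples]
    if bits == 16:                      # signed integers, -32768..32767
        return [w / 32768 for w in samples]
    raise ValueError("only 8-bit and 16-bit samples are handled")


def denormalize(values, bits):
    """Assumed inverse: scale back, round, and clamp to the sample range."""
    if bits == 8:
        return [min(255, max(0, round(v * 128 + 128))) for v in values]
    if bits == 16:
        return [min(32767, max(-32768, round(v * 32768))) for v in values]
    raise ValueError("only 8-bit and 16-bit samples are handled")
```

For example, the 8-bit value 128 (silence) maps to 0.0, and the 16-bit value -32768 maps to -1.0.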
Stage-2: Compressor Stage
Biorthogonal (9/7) Wavelet Transform: in this step the analysis filter bank (CDF 9/7), followed by down sampling, is applied to decompose the input audio signal. The resulting approximation subband may be input again to the analysis filter bank and down sampled again. The process is repeated according to the number of wavelet transform passes, a parameter predefined by the user. The following set of equations describes the four lifting steps and the two scaling steps applied to accomplish the biorthogonal (9/7) wavelet decomposition:
$$Y(2n+1) = X(2n+1) + a\,[X(2n) + X(2n+2)] \tag{3a}$$
$$Y(2n) = X(2n) + b\,[Y(2n-1) + Y(2n+1)] \tag{3b}$$
$$Y(2n+1) = Y(2n+1) + c\,[Y(2n) + Y(2n+2)] \tag{3c}$$
$$Y(2n) = Y(2n) + d\,[Y(2n-1) + Y(2n+1)] \tag{3d}$$
$$Y(2n+1) = K\,Y(2n+1) \tag{3e}$$
$$Y(2n) = \frac{1}{K}\,Y(2n) \tag{3f}$$
The values of the (a, b, c, d, K) parameters are:
a = -1.58613434
b = -0.0529801185
c = 0.8829110762
d = 0.4435068522
K = 1.149604398

Stage-3: Scalar Quantization
Uniform scalar quantization is applied to quantize the wavelet coefficients of each subband. The quantization step Q_high(n) of the n-th detail subband is higher than that of the neighboring higher-level detail subband, Q_high(n+1); the quantization step is increased using a linear progressive relationship. Since the approximation subband is the most energetic one, its quantization step (Q_low) is lower than those of the detail subbands. The equation used for uniform quantization is:
$$w_q(i) = \mathrm{round}\!\left(\frac{w(i)}{Q}\right) \tag{4}$$
Where Q = Q_low when the quantization is applied to the approximation subband, and Q = Q_high(n) when it is applied to the n-th detail subband; w(i) is the i-th coefficient of the wavelet transform, and w_q(i) is the corresponding quantization index value. The progressive linear equation applied for assigning the quantization steps of the detail subbands is:
$$Q_{high}(n-1) = \beta\, Q_{high}(n) \tag{5}$$
Where β is the rate of increase of the quantization step.
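A single pass of the lifting steps (3a)-(3f), together with the quantization rules of Eqs. (4) and (5), can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' C# code: the function names are hypothetical, the signal length is assumed even, symmetric (mirror) extension at the two boundaries is an assumption since the paper does not describe its boundary handling, and the constants are the commonly published CDF 9/7 lifting values (with the additive form used here, c and d are positive).

```python
# Commonly published CDF 9/7 lifting constants (assumed here).
A, B, C, D, K = (-1.586134342, -0.05298011854,
                 0.8829110762, 0.4435068522, 1.149604398)


def _predict(even, odd, coef):
    """odd[n] += coef * (even[n] + even[n+1]); mirror at the right edge."""
    last = len(even) - 1
    for n in range(len(odd)):
        odd[n] += coef * (even[n] + even[min(n + 1, last)])


def _update(even, odd, coef):
    """even[n] += coef * (odd[n-1] + odd[n]); mirror at the left edge."""
    for n in range(len(even)):
        even[n] += coef * (odd[max(n - 1, 0)] + odd[n])


def cdf97_forward(x):
    """One wavelet pass: returns (approximation, detail) half-length lists."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even number of samples")
    even, odd = [float(v) for v in x[0::2]], [float(v) for v in x[1::2]]
    _predict(even, odd, A)   # (3a)
    _update(even, odd, B)    # (3b)
    _predict(even, odd, C)   # (3c)
    _update(even, odd, D)    # (3d)
    return [e / K for e in even], [K * o for o in odd]   # (3f), (3e)


def cdf97_inverse(low, high):
    """Undo cdf97_forward by running the lifting steps backwards."""
    even, odd = [K * v for v in low], [v / K for v in high]
    _update(even, odd, -D)
    _predict(even, odd, -C)
    _update(even, odd, -B)
    _predict(even, odd, -A)
    x = [0.0] * (2 * len(even))
    x[0::2], x[1::2] = even, odd
    return x


def detail_steps(q_high, beta, n_passes):
    """Eq. (5): Q_high(n-1) = beta * Q_high(n), from Q_high(n_passes) = q_high."""
    steps = {n_passes: q_high}
    for n in range(n_passes, 1, -1):
        steps[n - 1] = beta * steps[n]
    return steps


def quantize(coeffs, q):
    """Eq. (4). Python's round() resolves exact ties to the even integer."""
    return [round(w / q) for w in coeffs]


def dequantize(indices, q):
    """Decoder side: scale the indices back by the same step."""
    return [i * q for i in indices]
```

Because every lifting step is undone exactly by subtracting the same term, the inverse reconstructs the input up to floating point error regardless of the constants; the constants only determine which filter pair is realized. A multi-pass decomposition would call `cdf97_forward` repeatedly on the approximation output.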
Stage-4: Selective Run Length Encoder
In our proposed system, the run length encoding (RLE) step is applied because of the redundant occurrence of sequences of 0's; some of these sequences consist of long runs of zeros. Due to the high probability of occurrence of short zero runs, the efficiency of RLE could be reduced significantly. So, some additional steps were added to the RLE steps to ensure that the encoder handles only the long zero runs and deals with short runs as a successive sequence of symbols. Shift coding can handle the repeated sequences of short zero runs through its symbol pairing step (as clarified in the next stage). The steps of the introduced selective run length encoder are:
1. Mapping to positive: All samples of the input stream to RLE are mapped to be positive numbers; this step simplifies the next coding steps. The following mapping equation is used to convert the signed samples into positive samples:
$$C'(i) = \begin{cases} 2\,C(i) & \text{if } C(i) \ge 0 \\ -2\,C(i) - 1 & \text{if } C(i) < 0 \end{cases} \tag{6}$$
Where C(i) is the i-th element value of the signed quantization index.
2. Increase the values of all C'() elements by 1.
3. Detect the long runs of ones (the original zeros, after steps 1 and 2), such that the length of the run is higher than a predefined threshold (T_run) value. In the conducted tests the value of T_run is set to 5.
4. Calculate the histogram of the output coefficients to find the lowest element value (J) whose histogram value is smaller than the histogram of the zero value. Then replace each zero with this specific element (J), and decrease by one each coefficient whose value does not exceed (J):
    J ← 0
    For I = 0 to All Histogram Elements
        If (His(0) > His(I)) Then
            J ← I
            Break
        End If
    End For
    For I = 0 to All C'() Elements
        If (C'(I) = 0) Then
            C'(I) ← J
        Else If (C'(I) ≤ J) Then
            C'(I) ← C'(I) − 1
        End If
    End For
5. Save the lengths of the detected long zero runs in a separate array and encode it using the traditional shift coding method.

Stage-5: High Order Shift Coding
This stage is the same as that illustrated in our previous paper [8]. The steps of the high order shift encoder are summarized in the following:
1. Range Bounding: In this step the following parameters are calculated for the input stream: (i) the highest value (Max), (ii) the mean value (m), and (iii) the mean absolute deviation (μ) from the mean. Then the parameter R_max is determined using the following criterion:
R_max = min(m + 2μ, Max, 241)
Then all element values of C'() that are equal to or higher than R_max are represented using the triple set of values (241, C'(i) mod R_max, C'(i) div R_max).
2. Pairing: This step is applied to detect the most redundant pair of C'() symbols and replace the pair by a single value (R_max + 1), then increment R_max by 1. The pairing operation is repeated a number of times (at least 10 times).
3. Shift Coding Optimizer: In this step the shift coding optimizer is applied to find the optimal sizes of the short and long codewords needed to represent the values of the sequence.
4. Shift Coding: Perform the traditional shift coding operation, and then save the produced codewords into the compressed binary output file.

B. Decoding Unit
The Decoding unit consists of the inverse operations to those applied in the encoding process, applied in reverse order. The operations are: (i) shift decoding, (ii) long runs expansion, (iii) de-quantization, (iv) inverse wavelet transform, and (v) mapping to the byte range [0, 255] in the case of 8-bit sample resolution, and to the integer range [-32768, 32767] in the case of 16-bit sample resolution. In our proposed system a post-processing stage is added as the last stage of the decoding module.
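The selective run length encoder of Stage-4 and its inverse, the long runs expansion performed by the decoding unit, can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, each detected long run is assumed to be replaced by the marker value 0 (which is free after step 2), and the value J is assumed to be passed to the decoder as side information.

```python
from collections import Counter


def map_to_positive(c):
    """Eq. (6): fold a signed index onto the non-negative integers."""
    return 2 * c if c >= 0 else -2 * c - 1


def unmap(p):
    """Inverse of Eq. (6)."""
    return p // 2 if p % 2 == 0 else -(p + 1) // 2


def selective_rle_encode(indices, t_run=5):
    # Steps 1-2: map to positive and add 1, so original zeros become ones.
    symbols = [map_to_positive(c) + 1 for c in indices]
    out, runs, i = [], [], 0
    while i < len(symbols):
        j = i
        while j < len(symbols) and symbols[j] == 1:
            j += 1
        if j - i > t_run:            # step 3: only long runs are pruned
            out.append(0)            # assumed run marker
            runs.append(j - i)       # step 5: lengths kept in a separate array
            i = j
        else:                        # short runs stay as ordinary symbols
            out.append(symbols[i])
            i += 1
    # Step 4: move the marker to the lowest value J that is rarer than it.
    hist = Counter(out)
    j_sym = 0
    for v in range(max(out, default=0) + 1):
        if hist[0] > hist[v]:
            j_sym = v
            break
    out = [j_sym if s == 0 else (s - 1 if s <= j_sym else s) for s in out]
    return out, runs, j_sym


def selective_rle_decode(stream, runs, j_sym):
    run_iter = iter(runs)
    symbols = []
    for s in stream:
        if s == j_sym:               # marker: expand the stored run of ones
            symbols.extend([1] * next(run_iter))
        else:                        # undo the step-4 shift
            symbols.append(s + 1 if s < j_sym else s)
    return [unmap(p - 1) for p in symbols]
```

With the threshold of 5 used in the conducted tests, a run of two zeros is left in the symbol stream for the pairing step of the shift coder, while runs of six or more zeros collapse to a single marker each.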
The post-processing stage is a selective low pass filter used to reduce the subjective effect of the produced noise, which is perceptible in the sound segments that have low power. Also, the conducted tests indicated that the error level due to lossy compression is reduced when the proposed post-processing is applied. Figure (1) shows the proposed decoding system. The proposed low pass filter can be described mathematically as follows [8]:
$$Wav_{new}(i) = \sum_{j=-2}^{2} c_j \, Wav(i+j) \tag{8}$$
Where,
$$c_0 = \exp\!\big(-1.2\,(1 - |Wav(i)|)\big) \tag{9}$$
Or
$$c_0 = \exp\!\big(-1.2\,(1 - Wav(i)^2)\big) \tag{10}$$
$$c_{-1} = c_1 = 2B, \qquad c_{-2} = c_2 = B \tag{11}$$
Where,
$$B = \frac{1}{6}\,(1 - c_0) \tag{12}$$
With this choice the five weights sum to one (c_0 + 6B = 1), so the filter preserves the local signal level while smoothing low-amplitude samples more strongly than high-amplitude ones.

IV. TESTS RESULTS
We present the results of applying the introduced audio compression method to several audio files in order to evaluate its performance. For evaluation purposes, objective quality measures (the Mean Square Error, MSE, and the Peak Signal to Noise Ratio, PSNR) were utilized. The system was implemented using the C# programming language, and all additional programs for testing purposes were developed using the same language. The effects of several parameters were studied.
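The paper does not spell out its formulas for these measures, so the following Python sketch uses the standard definitions as an assumption: MSE is the mean squared difference between the original and reconstructed samples, PSNR is 10·log10(peak²/MSE), and CR is the original size divided by the compressed size. The function names and the choice of peak value (for example 255 for 8-bit samples) are hypothetical.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equal-length sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)


def psnr(original, reconstructed, peak):
    """Peak signal to noise ratio in dB; infinite for identical signals."""
    err = mse(original, reconstructed)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def compression_ratio(original_bytes, compressed_bytes):
    """CR = size of the original file / size of the compressed file."""
    return original_bytes / compressed_bytes
```

Under these definitions a higher PSNR means a reconstruction closer to the original, and a CR of 8 means the compressed file is one eighth of the original size.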
The effects of the following control parameters have been investigated: (i) the quantization step for the low band coefficients, Q_low; (ii) the quantization step of the highest detail subband, Q_high(NoPasses); (iii) the progressive rate parameter (β); (iv) the number of wavelet transform passes (N_passes); (v) the sampling rate; (vi) the sampling resolution; and (vii) post filtering. As performance indicators, the compression ratio (CR) and PSNR have been calculated. Table I shows the characteristics of the audio files used in the conducted tests. Figure (3) presents the waveform patterns of these samples. Table II presents the adopted default values of the considered control parameters; these values were selected after comprehensive tests, choosing the best setup of parameters. The effect of each parameter is explored by varying its value while keeping the other parameters fixed at their default values.

Table I: The Attributes of the Audio Test Samples
| Attribute | Test1.Wav | Test2.Wav | Test3.Wav |
| Sampling Rate (KHz) | 32 | 44.1 | 44.1 |
| Sample Resolution (bps) | 8 | 8 | 16 |
| Size (KB) | 848 | 447 | 893 |
| Audio Type | Soft Music | Song | Animal Voice |

Fig. 3 The Wave Form of the Tested Waveform Files: (a) the wave form of Test1.wav; (b) the wave form of Test2.wav; (c) the wave form of Test3.wav.

Table II: The Default Values of the Control Parameters
| Parameter | Default Value | Range |
| N_passes | 5 passes | [2, 9] |
| Sampling Rate | 30 KHz | {44100, 30000, 22050, 11025} |
| Sample Resolution | 8 bps | {8, 16} |
| Q_l | 0.02 | [0.02, 0.047] |
| Q_h | 0.04 | [0.04, 0.06] |
| β | 1.2 | [1, 2] |

Figure (4) presents the effects of the number of wavelet transform passes (N_passes) on PSNR and compression ratio. The results indicate that when high compression is the main concern, a high N_passes is recommended, while for the target case of "low compression gain and high fidelity" the choice of N_passes equal to 5 (or fewer, down to 1) is more suitable.
Drweesh et al., International Journal of Advanced Research in Computer Science and Software Engineering 4(8),

Fig. 4 The Effect of Number of Wavelet Passes on the Proposed System Performance (the audio sample is Test1.wav)

Figures (5) and (6) present the effect of Q_L on PSNR and compression ratio, and Figures (7) and (8) show the effect of Q_h on PSNR and compression ratio. Figures (9) and (10) present the effects of the (β) parameter. It is clear that increasing any of these parameters increases the attained compression gain while decreasing the fidelity level.

Fig. 5 The effect of Approximation Band Quantization Step on PSNR (the audio sample is Test1.wav). [Plot "Effect of QL on PSNR": PSNR (dB) versus Approximation Quantization Step (QL).]

Fig. 6 The effect of Approximation Band Quantization Step on Compression Ratio (the audio sample is Test1.wav)
Fig. 7 The effect of Initial Detail Band Quantization Step on PSNR (the audio sample is Test1.wav). [Plot "The Effect of QH on PSNR": PSNR (dB) versus Initial Detail Quantization Step (QH).]

Fig. 8 The effect of Initial Detail Band Quantization Step on Compression Ratio (the audio sample is Test1.wav). [Plot: Compression Ratio (CR) versus Initial Detail Quantization Step (QH).]

Fig. 9 The effect of Detail Band Progressive Rate on PSNR (the audio sample is Test1.wav). [Plot "The effect of Progressive Rate on PSNR": PSNR (dB) versus Progressive Rate.]

Fig. 10 The effect of Detail Band Progressive Rate on Compression Ratio (the audio sample is Test1.wav)
Figure (11) shows the effect of sampling resolution on the performance of our compression scheme. The results indicate that the impact of sampling resolution is significant; the 16-bit sampling resolution gives the better result. In this set of tests, a conversion from 16-bit (2's complement integer representation) to 8-bit (unsigned byte representation) is performed to get the corresponding low resolution audio sample.

Fig. 11 The effect of sampling resolution on the relation between PSNR and compression ratio (the audio sample is Test3.wav)

Figure (12) presents the effect of sampling rate on the relation between PSNR and compression ratio; the results indicate that the 44.1 KHz sampling rate attains the highest compression ratio.

Fig. 12 The effect of sampling rate on the relation between PSNR and compression ratio (the audio sample is Test2.wav)

Figure (13) shows the effect of the post-processing filter on PSNR and compression ratio. We have used the post-processing algorithm as the last step in the decoding unit in order to enhance the subjective quality of the audio signal. The results indicate that when the reconstructed audio signal has high fidelity, the post-processing filter enhances only the subjective quality without improving the PSNR level. When the reconstructed audio signal has low fidelity, the post filtering algorithm enhances the subjective quality and also increases the fidelity level (PSNR).

Fig. 13 The effect of the post-processing filter on the relation between PSNR and compression ratio (the audio sample is Test2.wav)
The above listed results indicate that the proposed audio compression system can lead to a high compression ratio without significant degradation in quality. Also, post filtering improved the quality of the reconstructed audio signal. Using 9 passes gives better compression results, but it increases the encoding/decoding time. For comparison purposes, Figures (14a) and (14b) present the attained relationship (i.e., PSNR versus CR) for the DCT method [8] and the wavelet method. It is clear that the wavelet based method performs better than the DCT for both cases (8 bps and 16 bps).

Fig. 14 A comparison between the results of the DCT and Wavelet audio compression methods (the test sample is Test3.wav): (a) 8 bits/sample; (b) 16 bits/sample.

V. CONCLUSION
In this paper, an audio compression scheme using the biorthogonal tap 9/7 wavelet with modified run length and high order shift coding has been introduced. The test results indicate that the proposed scheme is encouraging, and that it outperforms the DCT compression scheme [8]. The new selective run length coding successfully improved the compression gain. The post filtering algorithm enhances the audio quality and improves the fidelity level (in terms of MSE and PSNR), especially at low PSNR levels. As future work, the developed system could be improved by merging the two transforms (Wavelet and DCT) in one coding scheme for audio compression, so that the advantages of both transforms can be exploited to get a better compression gain.

REFERENCES
[1] P. Havaldar, G. Medioni, "Multimedia Systems Algorithms Standards and Industry Practices", Book, Cengage Learning, Boston, MA, USA, 2010.
[2] D. Salomon, "Data Compression: The Complete Reference", Book, Springer, New York, 2004.
[3] D. Katz, R. Gentile, "Embedded Media Processing", Book, Elsevier Science, September 2005.
[4] Dhubkarya, and Dubey, "High Quality Audio Coding at Low Bit Rate Using Wavelet and Wavelet Packet Transform", Journal of Theoretical and Applied Information Technology, Vol. 6, No. 2, Pp. 194-200, 2009.
[5] O.O. Khalifa, S.H. Harding, and A.H.A. Hashim, "Compression Using Wavelet Transform", 2nd Edition, Signal Processing: An International Journal, Vol. 2, Issue 5, Pp. 17-26, 2007.
[6] H. Kaur, R. Kaur, "Speech compression and decompression using DWT and DCT", Int. J. Computer Technology & Applications, Vol. 3, Issue 4, Pp. 1501-1503, 2012.
[7] M. V. Patil, A. Gupta, A. Varma, S. Salil, "Audio and Speech Compression Using DCT and DWT Techniques", International Journal of Innovative Research in Science, Engineering and Technology, Vol. 2, Issue 5, Pp. 1712-1719, May 2013.
[8] Z. T. Drweesh, L. E. George, "Audio Compression Based on Discrete Cosine Transform, Run Length and High Order Shift Encoding", International Journal of Engineering and Innovative Technology (IJEIT), Vol. 4, Issue 1, Pp. 45-51, July 2014.
[9] Teena, V., Vidya, C., and Dipti, P., "The Haar Wavelet and The Biorthogonal Wavelet Transforms of an Image", International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, National Conference on Emerging Trends in Engineering & Technology (VNCET-30 Mar 12), Pp. 288-291, March 2012.
[10] H. Bekkouche, M. Barret and J. Oksman, "Adaptive lifting scheme for lossless image coding", SUPELEC, Equipe Signaux at Electrolux Systems, France, February 2002.
[11] Jan E. Odegard, C. Sidney Burrus, "Smooth Biorthogonal Wavelets For Applications In Image Compression", Department of Electrical and Computer Engineering, Rice University, Houston, Texas 77005-1892, USA, odegard@rice.edu, http://wwwdsp.rice.edu, 1997.
[12] Mohammed B., Abdelhafid B., Abdelmounaim M., and Abdelmalik T., "Improving Quality of Medical Image Compression Using Biorthogonal CDF Wavelet Based on Lifting Scheme and SPIHT Coding", Serbian Journal of Electrical Engineering, Vol. 8, No. 2, Pp. 163-179, May 2011.