Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model

Size: px

Start display at page:

Download "Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model"

Olivia Shelton
5 years ago
Views:

1 Audio Compression Using Decibel chirp Wavelet in Psycho- Acoustic Model 1 M. Chinna Rao M.Tech,(Ph.D) Research scholar, JNTUK,kakinada chinnarao.mortha@gmail.com 2 Dr. A.V.S.N. Murthy Professor of Mathematics, Srinivasa Institute Of Engg & Tech avsnmurthy2005@gmail.com 3 Dr.Ch.Satyanarayana Assoc. Professor of CSE, JNTUK,kakinada Abstract: The advanced technologies, transfer of data over the internet and multimedia era have become one of the basic needs in communication world. So that the design of multimedia workstations along with high quality of audio transmission and storage is a big challenge. In this paper we present a new design of a psycho-acoustic model by using Decibelchirp for audio coding following the model used in the standard MPEG-1 audio layer 3. The goal is to achieve transparent coding of audio and speech signals at the lowest possible data rates. This architecture is based on appropriate wavelet packet decomposition instead of a short term Fourier transformation. It s is important characteristic is to propose an analysis of the frequency bands that come closer to the critical bands of the ear. This study shows the best performance of the existing. Keywords-Audio Compression,MPEG-1,Psycho-acoustic model,wavelet Packet Decomposition, FFT, Frequency bands, Critical bands. 1. INTRODUCTION Audio compression has become one of the basic technologies of the multimedia age. The change in the telecommunication infrastructure, in recent years, from circuit switched to packet switched systems has also reflected on the way that speech and audio signals are carried in present systems. In many applications, such as the design of multimedia workstations and high quality audio transmission and storage, the goal is to achieve transparent coding of audio and speech signals at the lowest possible data rates. In other words, bandwidth cost money, therefore, the transmission and storage of information becomes costly. The MPEG/Audio is a standard for both transmitting and recording compressed ratio. Compression refers to a variety of ways to reduce the size of a data file. Audio compression is form of data compression. To obtain this different methods for compression have been designed; from the simplest one that consists to make a banal under sampling, to the most advanced that takes account of the sensitivity of the human ear. The MPEG algorithm achieves compression by exploiting the perceptual limitation of the human ear. Audio compression algorithms are used to obtain compact digital representations of highfidelity audio signals for the purpose of IJCSIET-ISSUE2-VOLUME2-SERIES1 Page 1

2 efficient transmission. The main objective in audio coding is to represent the signal with a minimum number of bits while achieving transparent signal reproduction. The majority of MPEG1coders apply a psychoacoustic model for coding applications using filter bank to approximate the frequency selectivity of the human auditory system [1]. The ear is an organ of large sensibility, presenting a high resolution and a great dynamic of the signal [10]. Most MP3 coders use the Fourier Transform whose fundamental roles are limited only to the spectral properties of the signal. In this paper we present a new design for modelling auditory masking is based on wavelet packet decomposition. Wavelet [2] unlike FFT [2], whose basic functions are sinusoids, is based on small waves, called wavelets, of varying frequency and limited duration. Its most important characteristics are conceived to analyze temporal and spectral properties of non-stationary signals [4] such as audio. A new architecture of a psycho-acoustic model based on wavelets decomposition can be applied to the audio coders in sub-bands approximating the critical bands. Models of human auditory pathway are commonly used as a part of audio coding systems or algorithms for objective evaluation of sound quality. In order to properly simulate the nonlinear, signal-dependent behavior of the cochlea, physiologically accurate models were developed, such as the Munich model [10] which was concepted by Zwicker and as for the Cambridge [11] one by Moore. Our study is made up of fourth parts. The first part will focus on the proposed psychoacoustic model. The second part compares Munich and Cambridge with new one. The third part deals with a comparison between the compressions ratios of the proposed coders (FFT and wavelet). Finally, a sound quality evaluation will be carried out to conclude this work. II. RELATED WORK Many existing psycho-acoustic models [10][11], employ an FFT-based transform to derive a spectral decomposition of the audio signal into uniform sub-bands with equal bandwidths. Currently the filter banks use the frequency domain, which gives only localization in frequency. This can be detrimental to some instruments, which are very localized in time. Wavelets, however, give both localization in time and frequency. Therefore, using filter banks that utilize wavelets might allow for better audio compression. The cochlear filter bank is based on a novel structure that supports the time-frequency resolution necessary to simulate psychophysical data closely related to cochlear spectral decomposition properties. It will be shown that this filter bank is able to closely mimic the spectral and temporal properties of frequency decomposition of the human peripheral auditory system. III.DECIBELCHIRP WITH PSYCHO-ACOUSTIC MODEL Psycho-acoustic studies show that each frequency takes place in the cochlea, along the basilar membrane. Therefore, the cochlea can be viewed from a signalprocessing perspective as a bank of highly overlapping band-pass filters [9]. The efficient algorithm of the new design is based on Wavelet packet decomposition is applied on small frames. In order to compare our work to some proposed models using various subdivisions of filter bank (28, 29 sub-bands), an approximation of the critical bands is adopted using 26 sub bands as shown in Figure 2. We choose this subdivision because it represents a good approximation of the real critical band (25 sub-bands) [8]. Then, we determine tonal and non tonal components. This step begins with the determination of the local maxima, followed by extracting the tonal components IJCSIET-ISSUE2-VOLUME2-SERIES1 Page 2

3 (sinusoidal) and non tonal components (noise) in every bandwidth of critical bands. The selective suppression of tonal and non tonal components of masking is a procedure used to reduce the number of maskers taken into account for the calculation of the global masking threshold [1]. The remaining tonal and non tonal components are those which are above the hearing absolute threshold. Individual masking threshold takes into account the masking threshold for each remaining component. Lastly, global masking threshold is calculated by the whole of tonal and non tonal components which are deduced from the spectrum of the wavelet packet decomposition. In our psycho-acoustic model, wavelet packet decomposition is developed for khz sampling frequency. The new model analyzes the input signal. The main important step is that wavelet packet decomposition is designed to closely mimic the critical bands in a psycho-acoustic model. The wavelet filters are usually chosen with many taps and a narrow transition band. Hence, aliasing between bands is minimized. The calculation of the spectrum of the input signal was carried out using decomposition of wavelets packets whose connections are selected in such a way that sub-bands correspond to the best possible one to the critical bands. The wavelet packet decomposition induces an organization of information according to a frequential segmentation as shown in Figure 2. Note: Fig 1 and Fig 2 have been taken from fundamental base papers along with mathematical formulae. Fig 2. The proposed wavelet decompostion Fig 1. The psycho acoustic model using wavelet decompostion In fact, on each level of the tree, the information corresponding to the entire signal is divided into frequency bands. IV. EXISTING MORLET FILTER BANKS Morlet wavelet is the mother wavelet to decompose the audible spectrum for the IJCSIET-ISSUE2-VOLUME2-SERIES1 Page 3

4 cochlear filters bank. In the time domain, the mother wavelet is a high-frequency vibration whose amplitude decreases when the time varies from zero to infinity. The Morlet wavelet transform has proved to be beneficial for many types of wave signal processing and has shown a good performance in tasks like audio coding [2]. The Morlet wavelet shape is not simply analytical, but also very resolute in time and frequency, regular, locally periodic and with non compact support. It is formed by complex values of the shape of a complex sinus modulated by a Gaussian envelope [5]. It is characterized by narrow frequency response, which offers a higher spectral resolution. This wavelet is defined as: is the Gaussian envelope, fo is the center frequency of the mother wavelet and B is the equivalent rectangular bandwidth of the cochlea [6]. The peripheral auditory system is equivalent to a band pass filter bank [11] whose respective bands would have the critical bandwidth. To approximate these critical bands, we adopted in our work two models called Munich and Cambridge [8]. In fact, there is a similarity between the widths of bands of these models and those of the ear. Those models are defined by two functions [8]: Considering a constant surface of elementary paving in time frequency domain, we can prove that [15]: Which F is the central frequency of the band (in KHz). Consequently, the Morlet Munich and Cambridge wavelet are given by: A mother wavelet must have finite energy [14] : Our cochlea is characterized by a number of critical bands whose frequency distribution varies logarithmically. When comparing those filter banks, we remark that Munich and Cambridge filter bank is more adequate to the cochlear filter bank than the new model. In fact, its bandwidths grow faster than those of Cambridge. V. RATIO OF SOUND COMPRESSION EVALUATION WITH DECIBELCHIRP IJCSIET-ISSUE2-VOLUME2-SERIES1 Page 4

In order to evaluate Morlet Munich and Cambridge wavelet compression, we used for various bitrates between 64 kbits/s and 160 Kbits/s some types of sound such as Soul, Slow, Rock, Arabic music and

Table1 contains the type of the sound files used for the test, the average of compression ratio (Comp), their capacity (Cp), their duration (t) and the compression ratio calculated for each bit rate.

5 In order to evaluate Morlet Munich and Cambridge wavelet compression, we used for various bitrates between 64 kbits/s and 160 Kbits/s some types of sound such as Soul, Slow, Rock, Arabic music and voice. The evaluation is based on the compression ratio defined as the quotient between the size of the original file and MP3 file. Table1 contains the type of the sound files used for the test, the average of compression ratio (Comp), their capacity (Cp), their duration (t) and the compression ratio calculated for each bit rate. The weakest compression ratio is given for the flow of 160 Kbits/s and the highest is given for the flow of 64 Kbits/s. However, the weaker the bit rate, the higher the compression ratio is and the less intelligible its quality becomes. Based on the results obtained in table1, the Morlet Munich filters bank takes account of the masking phenomenon and takes account of the critical bands. The second part of our evaluation will focus on comparing the compression ratio obtained from the coder based on wavelet analysis and the coder based on an FFT analysis versus using a bit rate of 128 kbits/s which gives the best compromise between compression ratio and sound quality [10]. It reveals that sound compression using Morlet Munich wavelet is the best one. In fact, their average compression ratios present an improvement of 13.59% and 3.2% in comparison to the FFT and Decibelchirp coders. models. Those figures show that high frequencies are canceled for compression. We remark that the Munich model is quiet better than the Decibelchirp. In fact, for Munich model, high frequencies are canceled from Hz, as for the Decibelchirp they are canceled from 18,500 Hz. Fig. 4. Spectrum of original signal and compressed signal for Munich model Sound FET Cambridge Munich Decibelchirp Voice Soul Arabic Comp Rock Table 1.Comparison beteen FFT and Wavelet Compression This new psychoacoustic model has been implemented in reference coder based on the standard MPEG-1 audio layer 3 [1]. Figures 5 and 6 shows the spectrum of the original and compressed signal using the two Fig.5.Spectrum of original signal and compressed signal for Decibelchirp model IJCSIET-ISSUE2-VOLUME2-SERIES1 Page 5

6 VI. EVALUATION OF SOUND QUALITY Approval and intelligibility are the two most significant criteria to consider subjective quality. For this purpose, we invited several subjects to hear some mp3 files resulting from the coder based on FFT analysis, then on Munich and Decibelchirp Morlet wavelet transforms. The protocol of evaluation consists in listening to the compressed sound file. Then, the possibility is offered to the listeners to listen to it as long as they wish. The listeners are 24: 12 men and 12 women just heard on a five-point opinion scale, ranging from bad to excellent. The ratings are assigned integer scores ranging from 1 for bad to 5 for excellent. This comparison, based on results obtained from the statistic card, shows that Morlet Munich coder gives satisfactory results especially for Slow, Soul and Arabic sound. We noticed that for those sound files our coder received a significant number of notes ranging between 3 and 4. Consequently; it gives the best statistics for quality in comparison to the FFT and Cambridge coders. packet decomposition offers a good performance. In fact, it gives the best compression ratio and sound quality in comparison to the FFT and Cambridge coders. A high selectivity was noticed and can lead to some interesting perspectives on audio coding using Decibelchirp of psychoacoustic model. REFERENCES [1] ISO /IEC (F), Information technology - Coding of moving picture and associated audio for digital storage media at up to about 1.5Mbits/s Part3: Audio, [2] G. Hernandez, M. Mendoza, B. Reusch, and L. Salinas, Shiftability and filter bank design using Morlet wavelet, Computer Science Society, Nov. 2004, pp [3] D. Sinha and A. Tewfik, Low Bitrate Transparent Audio Compression using adapted wavelets, Proc. IEEE Trans.Sig, Dec. 1993, pp [4] B. Carnero and A. Drygajlo, Perceptual speech coding and enhancement using frame-synchronized fast wavelet packet transform algorithm, Proc. IEEE Transaction on speech and audio processing, vol 47, NO 6, June [5] I. Daubechies, Ten lectures on wavelet, SIAM, Philadelphia, PA, [6] J. O. SmithIII and J. S. Abel, Bark and ERB Bilinear transforms, Proc. IEEE Transactions on Speech and Audio Processing, [7] G. Davidson, L. Fielder, and M. Antill, High-quality audio transform coding at 128 kbits/s, Proc. IEEE ICASSP, vol.2, 1990, pp Figure. 6. Sound quality histogram VII.CONCLUSION This paper studies audio compression using new psychoacoustic model with Decibelchirp for wavelet transforms. Our goal is to identify the best model that can be applied to the audio coders in sub-bands approximating the critical bands and finally to improve the psycho-acoustic model. This study shows that the Munich Morlet wavelet [8] F. Baumgarte, A computationally efficient cochlear filter bank for perceptual audio coding storage, Proc. IEEE ICASSP, vol.5, 2001, pp [9] W. W. Chang and C. T. Wang, Auditory patterns, Proc. IEEE Transactions on speech and audio processing, vol. 4, NO 2, March [10] E. Zwicker and E. Terhardt, Analytical expressions for critical band rate and critical bandwidth as a function of frequency, J. Acoust. Soc. Am, vol. 68, NO 5, Nov [11] C. Wang and Y. C. Tong, An improved critical-band transform processor for speech applications, Circuits and Systems, May 2004, pp IJCSIET-ISSUE2-VOLUME2-SERIES1 Page 6

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general