Zhitao Lu and William A. Pearlman. Rensselaer Polytechnic Institute. Abstract

An Efficient, Low-Complexity Audio Coder Delivering Multiple Levels of Quality for Interactive Applications

Zhitao Lu and William A. Pearlman
Electrical, Computer and Systems Engineering Department
Rensselaer Polytechnic Institute
Troy, NY 12180
July 6, 2000

Abstract

This paper proposes an efficient, rate-scalable, low-complexity audio coder based on the SPIHT (set partitioning in hierarchical trees) coding algorithm [5], which has achieved notable success in still image coding. A wavelet packet transform is used to decompose the audio signal into 29 frequency subbands corresponding roughly to the critical subbands of the human auditory system. A psychoacoustic model, which for simplicity is based on MPEG model 1, is used to calculate the signal-to-mask ratio and then the bit rate allocation among subbands. We divide the subbands into two groups: the low frequency group, which contains the first 17 subbands corresponding to 0-3.4 kHz, and the high frequency group, which contains the remaining subbands. The SPIHT algorithm is used to encode and decode the low frequency group, and a reverse sorting process plus arithmetic coding is used to encode and decode the high frequency group. Experiments show that this coder yields nearly transparent quality at bit rates of 55-66 kbits/second and degrades only gradually at lower rates. The low complexity of this coding system shows its potential for interactive applications with levels of quality from good to perceptually transparent.

1 Introduction

Source coding of wideband audio signals for storage and/or transmission over bandlimited channels is currently a research topic receiving considerable attention. Its applications are in the fields of audio production, program distribution and exchange, digital audio broadcasting, digital storage, video conferencing and multimedia.
The industrial standard for wideband audio uses a sampling rate of 44.1 kHz, which covers the entire audible frequency range of the human hearing system, and quantizes each sample to 16 bits. Without compression, the bit rate is 705.6 kbits/second for one channel. The goal of audio data compression is to make the bit rate as low as possible without perceptible distortion.

Most proposed audio coders are transform coders or subband coders. They consist mainly of three parts: subband decomposition or transform, dynamic bit allocation, and the coding algorithm. First the original audio data is transformed into subband signals; then the target bit rate is dynamically allocated among the subbands through a psychoacoustic model; finally each subband signal is encoded into a bit stream.

Wavelet packet decomposition is widely used since, by changing the time resolution and frequency resolution, it represents the audio signal more efficiently. The subbands are usually similar to the critical subbands of the human hearing system. The algorithms proposed in [6] and [4] find the optimal basis for each data frame. In [3] the audio signal is decomposed into harmonic components and a noise-like residue. MPEG layer III also uses 6-point and 18-point DCTs in the subbands to get better high-frequency resolution.

For dynamic bit allocation and quantization, the audio data is usually fed into a psychoacoustic model to calculate the signal-to-mask ratio in each subband and to determine how many bits should be used for each coefficient in every subband [4]. In [6] a scalar quantizer is used, and in [1] a hybrid quantizer (a scalar quantizer for the low frequency band and a vector quantizer for the high frequency band) is used to quantize the coefficients in each subband.

In this paper, we present a coder based on the set partitioning in hierarchical trees (SPIHT) coding algorithm [5] and a reverse sorting process plus arithmetic coding. The SPIHT algorithm, first used for still image compression, combines quantizer and coder, and exploits the dependence of the coefficients in different subbands by setting up similarity trees. Using a wavelet packet transform, dynamic bit allocation, and the SPIHT coding algorithm, which also features fast encoding and decoding, this coder achieves nearly transparent quality at 55-66 kbits/second. The system is also capable of delivering lower-rate service from the same bitstream. The degradation of quality at lower bit rates appears to be quite gradual, making the system attractive for delivering different qualities of service that require near-real-time to real-time execution, such as interactive applications.

2 Wavelet Packet Transform and Dynamic Bit Allocation

We use the tree structure of filter banks proposed in [6], which has 29 subbands that mimic the critical subbands of the human hearing system.
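As a rough illustration of how a tree of two-channel filter banks yields subbands of unequal width, the sketch below decomposes one frame along a small binary tree. It is only a sketch under stated assumptions: it uses the two-tap Haar pair rather than the longer filters used by the coder, and the five-band tree, the helper names `haar_split` and `wavelet_packet`, and the test frame are all hypothetical, not the 29-band tree of [6].

```python
import math

def haar_split(x):
    """One two-channel analysis step (Haar, an assumption of this sketch).

    Returns (lowpass, highpass), each half the length of x.
    """
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * s for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * s for k in range(half)]
    return low, high

def wavelet_packet(x, tree):
    """Decompose x along a binary tree of filter banks.

    tree is None for a leaf (keep the band as is) or a pair
    (low_subtree, high_subtree) to split the band once more.
    Returns the leaves as (depth, coefficients), in tree order
    (which is not necessarily frequency order).
    """
    def walk(band, node, depth):
        if node is None:
            return [(depth, band)]
        low, high = haar_split(band)
        return walk(low, node[0], depth + 1) + walk(high, node[1], depth + 1)
    return walk(list(x), tree, 0)

# Hypothetical tree: the low band is split three times, the high band twice.
tree = (((None, None), None), (None, None))
frame = [float((7 * k) % 11) for k in range(16)]
bands = wavelet_packet(frame, tree)
depths = [d for d, _ in bands]
```

Deeper leaves are narrower in frequency and hold fewer coefficients per frame, which is how a tree of this kind can approximate the narrow critical bands at low frequencies and the wide ones at high frequencies.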
The lowpass and highpass filters are the length-20 Daubechies filter pair [2], which provides sufficiently good bandpass characteristics. In parallel with execution of the wavelet packet transform (WPT), the audio source data is fed into a psychoacoustic model based on ISO/MPEG model 1. After calculating the signal-to-mask ratio $\mathrm{SMR}_m$ in each subband and combining it with the coefficients in each subband, we perform the dynamic bit allocation [7]. The rate-distortion relation is modeled as

$$d_m(r_m) = w_m \, g \, 2^{-2 r_m}$$

where $d_m$ is the distortion in subband $m$, $r_m$ is the bit rate in subband $m$, $w_m$ is the subband weight calculated from $\mathrm{SMR}_m$, and $g$ is a constant. For each subband with $n_m$ samples, the signal-to-noise ratio must satisfy $\mathrm{SNR}_m > \mathrm{SMR}_m$ in order to get transparent quality coding. Allocating bit rate to meet this constraint is a constrained optimization problem, formally solved by use of the Kuhn-Tucker theorem. However, the problem can be solved iteratively by calculating the bit rate for each subband by the formula

$$r_m = \begin{cases} \dfrac{1}{2} \log_2 \left[ \dfrac{w_m \sigma_m^2}{\theta \, n_m} \right], & w_m \sigma_m^2 / n_m > \theta \\ 0, & w_m \sigma_m^2 / n_m \le \theta \end{cases}$$

where $n_m$ is the number of samples in the $m$-th subband.
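The thresholded rate formula lends itself to a short numerical sketch. The version below is illustrative only and rests on assumptions not stated in the text: $\theta$ is found by bisection, the budget is expressed as $\sum_m n_m r_m$ bits per frame, and the weights, variances and subband sizes in the example are made up. The names `rates_for` and `allocate` are hypothetical.

```python
import math

def rates_for(theta, w, var, n):
    """Evaluate the thresholded rate formula for every subband at a given theta."""
    rates = []
    for wm, vm, nm in zip(w, var, n):
        x = wm * vm / nm
        rates.append(0.5 * math.log2(x / theta) if x > theta else 0.0)
    return rates

def allocate(w, var, n, target_bits, iters=60):
    """Bisect on theta until sum(n_m * r_m) meets target_bits (assumed budget form)."""
    def total(theta):
        return sum(nm * rm for nm, rm in zip(n, rates_for(theta, w, var, n)))

    hi = max(wm * vm / nm for wm, vm, nm in zip(w, var, n))
    if hi <= 0.0:
        raise ValueError("at least one subband needs positive weighted variance")
    lo = hi                      # at theta = hi every rate is zero
    while total(lo) < target_bits:
        lo /= 4.0                # lowering theta raises the rates
    for _ in range(iters):
        mid = math.sqrt(lo * hi) # geometric midpoint: rates are linear in log(theta)
        if total(mid) > target_bits:
            lo = mid             # too many bits, theta must be larger
        else:
            hi = mid             # within budget, try a smaller theta
    return hi, rates_for(hi, w, var, n)

# Made-up example: four subbands, 64 bits for the frame.
w = [4.0, 2.0, 1.0, 0.5]
var = [100.0, 25.0, 4.0, 0.01]
n = [8, 8, 16, 32]
theta, rates = allocate(w, var, n, target_bits=64)
```

Because every rate is clipped at zero, subbands whose weighted variance falls below the threshold receive no bits, and the remaining budget is shared among the others.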

In the rate formula, $\sigma_m^2$ is the variance of the coefficients in the $m$-th subband. The parameter $\theta$ is continually adjusted until the overall target bit rate is met with nonnegative $r_m$'s. After we get the bit rate for each subband, we determine the final minimum quantization intervals, or stepsizes, for each subband.

3 Coding Algorithm

After obtaining the quantizer stepsize for each subband, we separate the subbands into two groups: the low frequency group, which contains the first 17 subbands (frequency range 0-3.4 kHz), and the high frequency group, which contains the remaining higher frequency subbands. The SPIHT algorithm is used to encode the coefficients in the low frequency group. The coefficients in the high frequency group are quantized by a uniform quantizer with the stepsizes determined in the rate allocation procedure, and then encoded losslessly by the reverse sorting process plus arithmetic coding.

3.1 SPIHT Algorithm

The principles of the SPIHT algorithm are partial ordering of the transform coefficients by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an audio wavelet packet transform.

3.1.1 Similarity Trees

In a typical audio signal, most of the energy is concentrated in the low frequency bands, but there are fundamental-frequency and harmonic components in different subbands. It has been observed that there is a temporal self-similarity among different subbands analogous to the spatial self-similarity trees in the 2D wavelet transform of an image. The coefficients are expected to be better magnitude-ordered if we move down following the same similarity tree.

Figure 3: Definition of the similarity tree. When the depth of subband i+1 in the wavelet packet tree is $l_{i+1} = l_i + 1$, then a, b are the offspring of 1; c, d are the offspring of 2; e, f are the offspring of 3; g, h are the offspring of 4.
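The parent-offspring relation of the similarity tree can be sketched in a few lines. The sketch assumes an index convention that the text does not spell out: a point at time index `k` in subband `i` has as offspring the $2^{l_{i+1}-l_i}$ consecutive points starting at index $2^{l_{i+1}-l_i} \cdot k$ in subband `i + 1`, so that parent and offspring cover the same time span. The function names and the depth lists are hypothetical.

```python
def offspring(i, k, depth):
    """Offspring of point k of subband i, as (subband, index) pairs.

    depth[i] is the depth l_i of subband i, taken as given.
    The contiguous, time-aligned indexing is an assumption of this sketch.
    """
    if i + 1 >= len(depth):
        return []                       # last subband: no offspring
    diff = depth[i + 1] - depth[i]
    if diff < 0:
        raise ValueError("sketch assumes depth does not decrease along the tree")
    fan = 2 ** diff
    return [(i + 1, fan * k + j) for j in range(fan)]

def descendants(i, k, depth):
    """All descendants of point k of subband i, level by level."""
    out = []
    frontier = offspring(i, k, depth)
    while frontier:
        out.extend(frontier)
        frontier = [c for (s, t) in frontier for c in offspring(s, t, depth)]
    return out

def grand_descendants(i, k, depth):
    """Descendants that are not direct offspring."""
    direct = set(offspring(i, k, depth))
    return [p for p in descendants(i, k, depth) if p not in direct]

# Two subbands whose depths differ by one: each point has two offspring,
# so points 0..3 map to indices (0,1), (2,3), (4,5), (6,7).
pairs = [offspring(0, k, [0, 1]) for k in range(4)]
```

With the depth list `[0, 1]`, the four points of the first subband play the role of points 1 to 4 in the figure and their offspring the role of a to h.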
We define the similarity tree in Figure 3 so that every point in subband $i$ corresponds to $2^{l_{i+1}-l_i}$ points in the next subband $i+1$, where $l_i$ is the depth of the $i$-th subband in the filter bank tree in Figure 1. This definition is analogous to that of spatial orientation trees in image coding with SPIHT. By defining the similarity trees, we organize the coefficients into subsets as follows:

O(i): set of all offspring of point i.
D(i): set of all descendants of point i.
H: set of all roots of similarity trees.
L(i): D(i) - O(i).

Each point i and its descendant and offspring sets represent the same time period in different frequency subbands.

3.1.2 Set Partitioning Sorting Algorithm

The same set partitioning rule is defined in the encoder and decoder. A subset $T_m$ is declared significant or insignificant by the magnitude test

$$\max_{i \in T_m} \{ |c_i| \} > 2^n$$

on the subband coefficients $c_i$ in $T_m$. If the subset is insignificant, a 0 is sent to the decoder; if it is significant, a 1 is sent to the decoder and the subset is further split according to the similarity tree until all the significant sets are single significant points. After this sorting, the indices of the coefficients are put onto three lists: the list of insignificant points (LIP), the list of insignificant sets (LIS), and the list of significant points (LSP). Only bits related to the LSP entries and the binary outcomes of the magnitude tests are transmitted to the decoder. The partitioning proceeds as follows in this so-called sorting pass:

1) The initial partition is formed with point i and its descendants D(i).
2) If D(i) is significant, it is partitioned into L(i) and the offspring of i, O(i).
3) If L(i) is significant, it is partitioned into D(k), for every k in O(i).

3.1.3 Coding Algorithm

After each sorting pass, we have the ordered significant points for the threshold $2^n$, and we then send to the decoder the n-th most significant bit of every coefficient found significant at a higher threshold. The binary representation of the magnitude-ordered coefficients is shown in Table 1, where the arrows indicate the order of transmission. By transmitting the bit stream in this ordered bit plane fashion, we always transmit the most valuable remaining bits to the decoder. The outline of the full coding algorithm is as follows:

1) Initialization. Set the list of significant points (LSP) as empty. Put the roots of the similarity trees in the list of insignificant points (LIP) and the list of insignificant sets (LIS). Set the significance threshold $2^n$ with $n = \lfloor \log_2 ( \max_i |c_i| ) \rfloor$.
2) Sorting pass. Using the set partitioning algorithm, distribute the appropriate indices of the coefficients to the LIP, LIS, and LSP.
3) Refinement pass. For each entry in the LSP found significant at a higher n, send the n-th most significant bit to the decoder.
4) Decrement n by one and return to step 2 until the specified bit rate is reached.
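The alternation of sorting and refinement passes can be illustrated with a deliberately simplified sketch. It is not SPIHT proper: the set partitioning (the LIS and the tree-based tests) is left out and every point is tested individually, so only the ordered bit plane transmission is shown. Other assumptions of the sketch: coefficients are integers, significance is tested with `>=` so that a magnitude exactly equal to $2^n$ is caught in the first pass, and the bit budget is replaced by a number of passes. All names are hypothetical.

```python
def bitplane_encode(coeffs, passes):
    """Emit (kind, value) symbols, most significant bit plane first."""
    top = max(abs(c) for c in coeffs)
    if top == 0:
        raise ValueError("all coefficients are zero")
    n = top.bit_length() - 1          # floor(log2(max |c_i|)) for integers
    lip = list(range(len(coeffs)))    # insignificant points (no sets here)
    lsp = []                          # significant points, in order found
    out = [("n", n)]
    for _ in range(passes):
        if n < 0:
            break
        newly = []
        # Sorting pass: one significance bit per still-insignificant point.
        for i in list(lip):
            sig = abs(coeffs[i]) >= (1 << n)
            out.append(("sig", int(sig)))
            if sig:
                out.append(("sign", int(coeffs[i] < 0)))
                lip.remove(i)
                newly.append(i)
        # Refinement pass: bit n of points found at higher thresholds.
        for i in lsp:
            out.append(("ref", (abs(coeffs[i]) >> n) & 1))
        lsp.extend(newly)
        n -= 1
    return out

def bitplane_decode(symbols, count):
    """Mirror the encoder's list handling; a truncated stream gives an approximation."""
    it = iter(symbols)
    n = next(it)[1]
    mags, signs = [0] * count, [1] * count
    lip, lsp = list(range(count)), []
    try:
        while n >= 0:
            newly = []
            for i in list(lip):
                if next(it)[1]:
                    signs[i] = -1 if next(it)[1] else 1
                    mags[i] = 1 << n
                    lip.remove(i)
                    newly.append(i)
            for i in lsp:
                mags[i] |= next(it)[1] << n
            lsp.extend(newly)
            n -= 1
    except StopIteration:
        pass                          # stream ended: keep what was decoded
    return [s * m for s, m in zip(signs, mags)]

coeffs = [13, -6, 3, 0, -1, 9]
coarse = bitplane_decode(bitplane_encode(coeffs, 1), len(coeffs))
exact = bitplane_decode(bitplane_encode(coeffs, 4), len(coeffs))
```

Stopping after any pass still yields a usable approximation, which is the property that lets a lower-rate service be delivered from the same bitstream.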

Table 1: Binary representation of the magnitude-ordered coefficients

sign  s  s  s  s  s  s  s  s  s  s  s
5     1  1  0  0  0  0  0  0  0  0  0
4     →  →  1  1  0  0  0  0  0  0  0
3     →  →  →  →  1  1  1  1  0  0  0
2     →  →  →  →  →  →  →  →  1  1  1
1     →  →  →  →  →  →  →  →  →  →  →
0     →  →  →  →  →  →  →  →  →  →  →

3.2 Reverse Sorting Process and Arithmetic Coding

We use a reverse sorting process and arithmetic coding to encode the quantized coefficients in the high frequency group. After uniform quantization, the quantized coefficients in the high frequency group have small magnitudes with high probability. This is the motivation for reverse sorting. In this algorithm, we maintain a list of significant points whose magnitudes are greater than the threshold. The sorting process is as follows:

1. Initialization: put all coefficients in the high frequency group into the significant set. Set the threshold to 1.
2. Find the maximum value of the coefficients in the significant set, and transmit it.
3. Pass through the whole significant set: for each coefficient whose magnitude is greater than the threshold, send 1; otherwise, send 0 and remove it from the significant set.
4. Pass through the whole significant set: for each positive coefficient, send 1; otherwise, send 0.
5. Increase the threshold by 1, and pass through the significant set: for each coefficient whose magnitude is greater than the threshold, send 1; otherwise, send 0 and remove it from the significant set.
6. Repeat the last step until the significant set is empty.

After the reverse sorting, arithmetic coding is used to encode the bitstream. This improves the compression ratio by about 10-30 percent. The decoder can reproduce the lists and sets produced at the encoder from the received maximum values and the decision and sign bits, and so is able to reconstruct the same quantized coefficients using the received refinement bits without additional overhead information.

4 The Presented Audio Coder

The diagram of the proposed audio coder is shown in Figure 4. An audio frame of 1024 samples is input into the psychoacoustic model to calculate the SMR.
In parallel with this, the signal is transformed into 29 subband signals. Then, for each subband, according to the desired total bit rate, the SMR, and the coefficients, we calculate the stepsizes of the quantizers used in the SPIHT coding algorithm and the reverse sorting process. These stepsizes are also sent as side information to the decoder. The transform coefficients of the low frequency group are encoded by the SPIHT algorithm, and those of the high frequency group are encoded by the reverse sorting process plus arithmetic coding.

Table 2: Test results of the proposed encoder

signal   total   spiht   arc   SNR (dB)
sting    1.28    .5      .8    13.08
fm       1.32    .5      .8    13.13
stp      1.39    .5      .9    10.90
pj       1.30    .6      .7    11.40
mozart   1.48    .5      1.    11.18
mahler   1.49    .5      1.    14.23

total is the total bitrate (bits/sample); spiht is the bitrate used for the low frequency group (bits/sample); arc is the bitrate used for the high frequency group (bits/sample); SNR is (signal variance / mean square error) in dB. All of the reconstructed signals achieve nearly transparent quality at the corresponding bitrate.

5 Results

Several pieces of music are used to test the performance of this coder. Mozart and Mahler are classical music, FM and Sting are soft rock songs, and PJ and STP are rock songs. All of them are sampled at 44.1 kHz with 16 bits of precision. Informal subjective testing reveals that the coder achieves nearly transparent quality at bit rates of 55-66 kbits/second. The results are shown in Table 2.

In this paper, we investigated the performance of an audio encoder based on the SPIHT algorithm. The experiments show that it is efficient and has low implementation complexity. We will compare the performance of our encoder with the MPEG-2 and MPEG-4 standard encoders. We shall also present results for lower coding rates for non-transparent quality applications, where current indications are that the SPIHT audio coder performs very well. We believe this coder to be a strong candidate for interactive applications requiring good to perceptually transparent quality.

References

[1] S. Boland and M. Deriche, "Audio Coding Using the Wavelet Packet Transform and a Combined Scalar-Vector Quantization," Proc. IEEE Intern. Conf. Acoust., Speech, and Sig. Processing (ICASSP), Vol. 2, pp. 1041-1044, 1996.
[2] I. Daubechies, "Orthonormal Bases of Compactly Supported Wavelets," Commun. Pure Appl. Math., Vol. 41, pp. 909-996, Nov. 1988.
[3] K. N. Hamdy, M. Ali and A. H. Tewfik, "Low Bit Rate High Quality Audio Coding with Combined Harmonic and Wavelet Representations," Proc. IEEE Intern. Conf. Acoust., Speech, and Sig. Processing (ICASSP), Vol. 2, pp. 1045-1048, 1996.
[4] M. Purat and P. Noll, "Audio Coding with a Dynamic Wavelet Packet Decomposition Based on Frequency-Varying Modulated Lapped Transforms," Proc. IEEE Intern. Conf. Acoust., Speech, and Sig. Processing (ICASSP), Vol. 2, pp. 1021-1024, 1996.
[5] A. Said and W. A. Pearlman, "A New, Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, pp. 243-250, June 1996.
[6] D. Sinha and A. H. Tewfik, "Low Bit Rate Transparent Audio Compression Using Adapted Wavelets," IEEE Trans. on Signal Processing, Vol. 41, No. 12, pp. 3463-3479, Dec. 1993.
[7] T. R. Trinkaus, Wideband Audio Compression Using Subband Coding and Entropy-Constrained Scalar Quantization, Master's Thesis, Rensselaer Polytechnic Institute, 1995.