Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA


Audio Engineering Society Convention Paper
Presented at the 121st Convention, 2006 October 5-8, San Francisco, CA, USA

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Error-Robust Frame Splitting for Audio Streaming over the Lossy Packet Network

Jong Kyu Kim 1, Hwan Sik Yun 1, Jung Su Kim 1, Joon-Hyuk Chang 2, and Nam Soo Kim 1

1 School of Electrical Engineering, Seoul National University, Seoul 151-744, Korea
2 School of Electronic and Electrical Engineering, Inha University, Incheon 402-751, Korea

Correspondence should be addressed to Chong Kyu Kim (ckkim@hi.snu.ac.kr)

ABSTRACT

In this paper, we propose a novel audio streaming scheme for perceptual audio coders over the packet-switching network. Each frame is split into several subframes, matched to the specified packet size, which can be decoded independently for robust error concealment. We further improve the subframe splitting technique by allocating the spectral lines to each subframe adaptively. Through an informal listening test, we found that our approach enhances the audio quality under lossy packet network environments.

1. INTRODUCTION

Audio streaming has become one of the most popular data services in mobile communications these days. Most audio streaming services are based on the packet-switching network, where messages are divided into packets and each packet is transmitted individually. In the mobile packet-switching network, one of the most typical types of error is the packet loss.
Packet loss may arise in many different forms on the Internet or in wireless networks [1]. Under such packet loss conditions, it is crucial to guarantee the user-perceived quality of service (QoS). There are several practical techniques for audio streaming, such as the error resilience (ER) and error protection (EP) tools in the MPEG Advanced Audio Coding (AAC) standard [2]. These tools can be applied to cope with bit errors caused by packet losses. When a packet loss occurs, error concealment is usually applied to substitute the lost part with suitable data. Error concealment algorithms are implemented at the receiver of the audio stream and usually do not require any side information from the transmitter [3]. The major objective of packet error concealment is to regenerate the lost data so that it is perceptually indistinguishable from the original.

There have also been several proposals on packetization schemes for error-robust audio streaming over packet-switching networks. The RTP payload format [4] defines a general and configurable payload structure to transport MPEG-4 elementary streams, which includes detection of the loss of crucial information in the bitstream, optional interleaving of audio frames, and retransmission or forward error correction with due consideration to congestion control. A more specific strategy for the packetization of audio bitstreams, based on the internal structure of the encoded audio frame, has been proposed in [5]. This strategy arranges MPEG-AAC frames in different packets according to proportional priority, considering the tradeoff between redundancy overhead and retransmission delay.

In this paper, we propose a novel frame splitting scheme for robust audio streaming over packet-switching networks. The proposed scheme can be applied to cases in which a single audio frame must be split into a number of separate packets. Such a situation arises when the size of the packet is smaller than that of the audio frame, or when the audio frame must be segmented into subblocks and interleaved. The proposed technique proves effective in enhancing audio quality that would otherwise be degraded by packet losses and by the mismatch between the audio frame and packet sizes specified in the network and audio codec configurations.

The rest of this paper is organized as follows. In Section 2, we address the general structure of a perceptual audio coder and its bitstream, and its defects when applied to transmission over a packet-switching network. We then describe the proposed frame splitting scheme in Section 3 and present an adaptive frame splitting technique which prevents deterioration of coding efficiency in Section 4. Following the experimental results in Section 5, we conclude this paper in Section 6.

2. STRUCTURE OF AUDIO BITSTREAM

Generally, perceptual audio coders are developed with little attention to transmission errors.
In this section, we consider the problem of streaming compressed audio data over a lossy packet network. In a conventional audio coding algorithm, each block of audio samples is converted to a frame of bitstream which is decoded independently. A block of input samples is transformed into a set of spectral lines in the frequency domain through a time-frequency transformation. Perceptual audio coding algorithms achieve a high coding gain by exploiting both the perceptual irrelevancies and the statistical redundancies in the spectral domain [6]. Perceptually irrelevant components are removed by adjusting the quantization step sizes depending on the masking level computed from the psychoacoustic model. Statistical redundancies, on the other hand, are removed with an entropy coding technique such as Huffman coding or DPCM. Consequently, the compressed audio bitstream consists of the entropy-encoded spectral information, side information used to decode each spectral line, and header information which conveys the configuration, e.g., sampling rate, number of channels, and so on. In general, this information is written sequentially as shown in Fig. 1.

As mentioned above, the audio frame is the smallest unit that can be decoded independently in a perceptual audio coder. For the transmission of compressed audio data over a packet-switching network, the bitstream must be segmented into several packets of appropriate size. If the audio encoder has been developed without knowledge of the network specification, there usually exist mismatches between the audio frame and packet sizes. If the frame border does not coincide with the packet border, the information in a frame spans two adjacent packets. When either of these two packets is missing at the receiver, the frame cannot be decoded perfectly; more precisely, part of the frame can be decoded depending on which part of the frame was lost.
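The frame/packet mismatch described above can be made concrete with a small sketch. The helper below is ours, not part of any codec, and the sizes in the example are illustrative assumptions rather than values from this paper:

```python
# Sketch: which audio frames are damaged when packets are lost, assuming
# frames are written back-to-back and the bitstream is cut into
# fixed-size packets. All sizes are illustrative, not from a codec spec.

def damaged_frames(frame_bits, packet_bits, n_frames, lost_packets):
    """Return the set of frame indices that overlap any lost packet."""
    damaged = set()
    for f in range(n_frames):
        start, end = f * frame_bits, (f + 1) * frame_bits  # bits [start, end)
        first_pkt = start // packet_bits
        last_pkt = (end - 1) // packet_bits
        if any(p in lost_packets for p in range(first_pkt, last_pkt + 1)):
            damaged.add(f)
    return damaged

# A 600-bit frame cut into 168-bit packets: losing a single packet can
# damage two adjacent frames when a frame border crosses that packet.
print(damaged_frames(600, 168, 5, {3}))  # → {0, 1}
```

When frame and packet sizes coincide, each loss damages exactly one frame; the mismatch is what spreads a single packet loss across frame borders.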
However, the possibility of such partial decoding is low, since the audio bitstream is highly vulnerable to the consecutive bit errors caused by a missing packet. An even worse case arises when a single frame is segmented into several packets. This happens when the packet size must be smaller than the frame size, or when a frame is segmented into partitions and interleaved over several packets. In this case, the loss of a single packet results in the loss of the whole audio frame even though all the other packets are received successfully.

Fig. 1: Structure of the bitstream in conventional audio coding algorithms

Fig. 2: Synchronization between audio frames and packets

3. FRAME SPLITTING

In order to cope with the mismatch between the audio frame and packet sizes, and to achieve efficient audio streaming which is robust to packet losses, we modify the conventional audio encoding technique. Our approach splits each audio frame into several subframes such that the size of a subframe matches the size of a packet and each subframe can be decoded independently. Even though the basic idea can be applied to various perceptual audio coders, we focus on the modification of MPEG-AAC in this work.

3.1. Splitting into Subframes

Every time a frame is encoded, the available number of packets is given so as to maintain time synchrony. This comes from the network bandwidth configuration. As a result, the audio coder operates in a variable rate mode. An example of assigning each packet to the corresponding audio frame is shown in Fig. 2, where we can see that the number of packets varies from frame to frame. One drawback of this scheme is that a frame which is split into a smaller number of packets is given fewer bits for audio compression irrespective of its spectral contents. This effect becomes weaker as the packet size gets smaller.

In conventional perceptual audio coders, spectral lines are grouped into frequency bands which are referred to as scalefactor bands (as in MPEG-AAC). The spectral lines in each scalefactor band are entropy coded, with side information added to the bitstream. Since the spectral lines in each scalefactor band are coded jointly, it is desirable to split the audio frame by treating each scalefactor band as the basic unit.

3.2. Scalefactor Band Allocation Rule

The rule according to which the scalefactor bands are allocated to each packet can be chosen arbitrarily, as long as both the encoder and the decoder know it exactly. It is easy to devise two simple splitting rules: sequential and interleaving. In the sequential splitting scheme, adjacent scalefactor bands are allocated to the same packet as shown in Fig. 3. A major shortcoming of this scheme is that it creates a large spectral gap in the frequency domain when a packet is missing. To alleviate this deterioration, the interleaving scheme interleaves the order of the scalefactor bands before sequentially assigning them to each packet as shown in Fig. 4. Since the interleaving operation disperses the effect of a packet loss, the missing spectral lines which would appear as a large spectral gap under sequential splitting are replaced by multiple small gaps, which are perceptually preferred. This also helps error concealment, because missing spectral lines can be predicted based on the correlation with the neighboring spectral lines.

Fig. 3: Sequential splitting

Fig. 4: Interleaving splitting

A disadvantage of the interleaving scheme is that it decreases the coding efficiency, since spectral lines collected over a wide frequency range have low redundancy. This results in a deterioration of the perceived audio quality at the same bitrate. Consequently, an optimal splitting rule should be designed based on a tradeoff between coding efficiency and error concealment.

In the splitting rule, the number of scalefactor bands allocated to each subframe is an important factor that determines the audio quality. This is due to the fact that the spectral lines are distributed unequally over the whole frequency range. For instance, there is usually more spectral content in low frequency bands than in high frequency bands. If every packet is assigned an equal number of scalefactor bands, the low frequency bands are likely to be coded with fewer bits than required. Adjusting the splitting rule according to the overall statistical distribution of the audio signals can alleviate this effect to some degree. However, the distribution of spectral information varies rapidly over time, and a fixed rule may not guarantee a proper splitting for some frames, leading to a degradation of the audio quality. A suboptimal splitting rule will be discussed in the following section.

3.3. Encoding

After the allocation of scalefactor bands according to the splitting rule mentioned above, each subframe is independently encoded. Encoding is executed based on the general audio coding algorithm with a slight modification. First, the number of available bits is given as a parameter for the separate encoding of each subframe. This parameter is used for rate control. Since each subframe should be quantized independently, it is unavoidable to modify the conventional audio coding algorithm.
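The sequential and interleaving rules of Section 3.2 can be sketched as follows; the helper names are ours and the band/packet counts are arbitrary, not part of MPEG-AAC:

```python
# Sketch of the two splitting rules: scalefactor band indices are assigned
# to packets either as contiguous chunks or in an interleaved order.

def sequential_split(n_bands, n_packets):
    """Adjacent bands go to the same packet (contiguous chunks), so a
    packet loss produces one large spectral gap."""
    base, extra = divmod(n_bands, n_packets)
    out, b = [], 0
    for p in range(n_packets):
        size = base + (1 if p < extra else 0)
        out.append(list(range(b, b + size)))
        b += size
    return out

def interleaved_split(n_bands, n_packets):
    """Band i goes to packet i mod n_packets, so a packet loss is
    dispersed into many small spectral gaps."""
    out = [[] for _ in range(n_packets)]
    for band in range(n_bands):
        out[band % n_packets].append(band)
    return out

print(sequential_split(10, 3))   # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(interleaved_split(10, 3))  # → [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Losing the first packet removes bands 0-3 under the sequential rule but only every third band under the interleaved rule, which is the dispersion effect described above.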
Instead of computing the global gain over all the scalefactor bands, a separate global gain is obtained for each subframe, considering only the scalefactor bands that belong to it. Once the global gain is obtained, each subframe is fed into the rate control loop, which iteratively determines the quantization levels of the scalefactor bands according to the bitrate constraint [2]. The rate control loop is almost the same as that of the conventional audio coder; the only difference is that in our approach the scalefactor bands belonging to the same subframe are considered simultaneously. After the bit allocation, the encoded data of each subframe is written as a separate bitstream for later packetization. The bitstream structure of each subframe is not much different from that of the normal audio frame specified in conventional audio coding. For decoding robust to packet loss, a header is added to every subframe's data. Even though this may be considered an overhead on the limited network bandwidth, the header data is usually much smaller than the other data that describes the audio contents.

4. ADAPTIVE SPLITTING

What remains now is how to optimally split the audio frame into a finite number of subframes. Splitting here means a mapping that allocates each scalefactor band to a specific subframe or packet. As mentioned in the previous section, a fixed allocation is not desirable for achieving high audio quality, despite its advantage that it does not require any side information to be delivered to the receiver. A more promising approach is to split the audio frame such that all the subframes are encoded with an equal level of coding efficiency, so that no specific subframe causes low audio quality. To measure the coding efficiency of each subframe, we apply the noise-to-mask ratio (NMR), which represents the ratio of the quantization noise to the masking threshold [2]. Let R_i denote the NMR for the i-th scalefactor band.
Then,

R_i = N_i / M_i    (1)

where N_i is the power of the quantization noise and M_i is the masking threshold computed from the psychoacoustic model for the i-th scalefactor band. Maintaining a constant NMR over all scalefactor bands is an objective of the rate-distortion control module of MPEG-AAC when the number of available bits is higher or lower than the number of required bits [2]. Analogous to this method, we also aim to allocate scalefactor bands to subframes such that all the subframes have almost the same level of NMR.

For the adaptive frame splitting, we propose an algorithm that operates in an iterative manner. A flowchart of the overall algorithm is shown in Fig. 5. In the initial phase, scalefactor bands are allocated to each subframe with a default splitting rule, and then each subframe is encoded. After encoding, the NMR of each subframe is calculated, and it is checked whether the NMRs are equally distributed. If the NMRs are found to be unbalanced, the scalefactor bands are reallocated by increasing the number of scalefactor bands in the subframe with the maximum NMR while decreasing the number in the subframe with the minimum NMR. Then the process of encoding and NMR computation is executed again. As this iteration continues, the number of scalefactor bands allocated to each subframe converges, as shown in Fig. 6, in which the number of scalefactor bands in each subframe is plotted. The iteration stops when the frame splitting does not change any further.

Information on the adaptive splitting should be included in the bitstream of each subframe; the decoder arranges the decoded scalefactor bands according to it. For independent decoding, each packet should carry the starting location index as well as the number of scalefactor bands its subframe holds. In our implementation, we assign 5 bits to the starting location index and another 5 bits to the number of scalefactor bands. An example of frame splitting with the relevant information to be coded is given in Table 1.

Subframe Index                 1    2    3    4    5
First Scalefactor Band Index   0    8   19   28   33
Last Scalefactor Band Index    7   18   27   32   37
Starting Location Index        0    8   19   28   33
Number of Scalefactor Bands    8   11    9    5    5

Table 1: Representation of allocation information.

5. TEST RESULTS

To evaluate the performance of the proposed scheme, we implemented the frame splitting module on the MPEG-AAC platform. For simplicity, we made several modifications to the original MPEG-AAC algorithm.
First, we did not apply the block switching technique, so that the audio analysis was performed based only on the long block. Second, additional encoding tools such as temporal noise shaping (TNS) and gain control were not applied. Finally, the bit reservoir was not adopted.

Fig. 5: A flowchart of the overall frame splitting algorithm

Specifications for the audio frame, packet size, and network conditions used in the experiments are shown in Tables 2 and 3. These specifications were derived from an audio streaming application in which the input signal is compressed by MPEG-AAC and then transmitted over a Code Division Multiple Access (CDMA) packet-switching network.

Sampling Rate        11,025 Hz
Frame Size           92.88 ms
Input File Length    40 s
Number of Channels   Mono

Table 2: Test audio coder specifications.

Bitrate                    8.4 kbps
Packet Length              20 ms
Packet Size                168 bits
PER (Packet Error Rate)    over 20%

Table 3: Packet network specifications.

When a packet loss occurred, an error concealment algorithm was applied to reconstruct the missing spectral lines. For the error concealment, we took a simple repetition strategy, in which the lost spectral lines were substituted with the most recently received spectral components. If packet losses occurred continuously (burst packet loss), the corresponding spectral lines were faded out exponentially and muted after a certain number of consecutive packet losses. The same error concealment scheme was applied to both the original audio coder and the proposed one.

A comparison of the waveforms decoded from an audio bitstream damaged by packet losses is given in Fig. 7. The first plot shows the original input waveform, and the second plot is the waveform obtained from the conventional MPEG-AAC decoder. The third plot displays the waveform obtained from the proposed frame splitting algorithm. The graph at the bottom illustrates the applied error sequence, where 0 represents no packet error and 1 indicates a packet loss. At the locations where packet losses occurred, the original algorithm could not decode the received frames and faded out the waveform. In contrast, the proposed algorithm recovered the partly lost audio frames, and the lost spectral lines could be concealed more faithfully. This example clearly demonstrates the advantage of our frame splitting technique in lossy packet environments.

Fig. 6: An iteration to find the numbers of scalefactor bands assigned to the subframes

Fig. 7: Decoded waveforms

For a further evaluation of the performance, an informal listening test was carried out with ten listeners. All ten subjects expressed the opinion that the decoded audio obtained from the proposed approach had much less interruption caused by packet losses than that from the original MPEG-AAC.

6. CONCLUSIONS

In this paper, we have proposed a frame splitting scheme for perceptual audio coding algorithms. Each subframe is independently encoded such that it fits the specified packet size. Received packets are independently decoded without being affected by other missing packets. An informal subjective listening evaluation has shown that the suggested scheme dramatically improves the audio streaming quality under lossy packet network environments.

7. ACKNOWLEDGEMENT

This work was supported by SK Telecom, and the authors would like to thank Dr. D. H. Lee, Dr. S. S. Park, and D. S. Woo at SK Telecom for their helpful discussions.

8. REFERENCES

[1] Y. Wang, A. Ahmaniemi, D. Isherwood, and W. Huang, "Content-based UEP: A new scheme for packet loss recovery in music streaming," ACM Multimedia Conference, Berkeley, CA, USA, Nov. 2003.

[2] ISO, Information Technology - Coding of Audio-Visual Objects, ISO/IEC JTC1/SC29 WG11, ISO/IEC IS-14496 (Part 3, Audio), 1999.

[3] B. W. Wah, X. Su, and D. Lin, "A survey of error concealment schemes for real-time audio and video transmissions over the Internet," IEEE International Symposium on Multimedia Software Engineering, Taipei, Taiwan, pp. 17-24, Dec. 2000.

[4] J. van der Meer, D. Mackie, V. Swaminathan, D. Singer, and P. Singer, "RTP payload format for transport of MPEG-4 elementary streams," IETF RFC 3640, 2003.

[5] J. Korhonen, Y. Wang, and D. Isherwood, "Toward bandwidth-efficient and error-robust audio streaming over lossy packet networks," Multimedia Systems Journal (MMSJ), 2005.

[6] T. Painter and A. Spanias, "Perceptual coding of digital audio," Proc. IEEE, April 2000.