DRA AUDIO CODING STANDARD


Applied Mechanics and Materials, Vol. 330, pp. 981-984. Online: 2013-06-27. ISSN: 1662-7482. doi:10.4028/www.scientific.net/amm.330.981. © 2013 Trans Tech Publications, Switzerland

Wenhua Ma 1,a, Yuanzhe Ma 2,b, Yu-Li You 3,c

1 School of Informatics, Guangdong University of Foreign Studies, 2 Baiyundadao
2 Department of Biomedical, South China University of Technology, 381 Wushanlu
3 Provincial Key Lab for Digital Audio Technology, Digital Rise Technology Co. Ltd., 6th Floor, Bldg. 2, Science and Tech Park, South China University of Technology
a myz122@yahoo.com.cn, b reika2009@yeah.net, c yuliyou@usa.com

Keywords: Audio coding, standard, listening test, adaptive transform coder.

Abstract. China's DRA audio coding standard is shown to be a bare-bones transform coder that uses transient-localized MDCT for improved pre-echo suppression and statistic allocation of codebooks for high entropy coding efficiency. A signal path of up to 24 bits is provided throughout the codec so that the highest audio quality can be delivered if the bit rate suffices. Results of five ITU-R BS.1116 compliant subjective listening tests are presented.

Introduction

There have been extensive standardization activities in audio coding in the past twenty years. MPEG-1 was the first international standard for perceptual high-quality audio coding [1]. It is essentially a subband coder that deploys a 32-band PQMF (pseudo quadrature mirror filter) bank. Its Layer 3 adds a switched MDCT (modified discrete cosine transform) to the subband signals output from the filter bank for increased frequency resolution. MPEG-1 was extended by MPEG-2 BC (backward compatible) to provide for lower sample rates and multichannel surround sound [1]. MPEG-2 AAC abandons backward compatibility with MPEG-1 in order to achieve a significant improvement in coding efficiency [1]. AAC uses an MDCT that switches between 1024 and 128 spectral lines. AAC is carried over into MPEG-4 AAC with the addition of more coding tools, such as Perceptual Noise Substitution and Long Term Prediction, and more coder configurations [1].

Dolby AC-3 is probably the most commercially successful audio coding standard [1]. It uses an MDCT that switches between 128 and 256 spectral lines. WMA (Windows Media Audio), offered by Microsoft, uses an MDCT that switches between 64, 128, 256, 512, 1024, and 2048 spectral lines [2]. Vorbis, an open source codec offered by the Xiph.Org Foundation, uses an MDCT that switches between 256 and 1024 spectral lines [3]. Modern audio codecs have apparently converged on the MDCT as the tool for time-frequency analysis.

The DRA algorithm [4], adopted as China's national standard for its electronics industry, uses a transient-localized MDCT that provides improved pre-echo suppression with small bit and computation overheads. It uses statistic allocation of Huffman codebooks to enhance the efficiency of Huffman coding. Its quantization unit and Huffman codebooks are designed in such a way that a signal path of up to 24 bits is provided throughout the codec, so that the highest audio quality can be delivered if the bit rate allows. Although simple, the DRA standard delivers state-of-the-art coding efficiency, as shown by the five ITU-R BS.1116 compliant subjective listening tests.

The Algorithm

As shown in Figure 1, the DRA audio encoder is a simple, essentially bare-bones, adaptive transform coder. Its major components are described as follows.

Transient-Localized MDCT. An MDCT that switches between 128 and 1024 spectral lines is used to provide time-frequency analysis. To improve its capability of pre-echo suppression, a special brief window is introduced, whose effective size is reduced from 256 to 160 samples. In particular, the brief window is nonzero only within the central 160 samples, with zeros for the first 48 and the last 48 samples of the window [5]. To switch between this brief window and the long (2048-sample) and short (256-sample) windows, a few more transitional windows are introduced to satisfy the perfect reconstruction conditions [1]. All the windows are based on the sine window [1].

The brief window is applied only to the block of samples containing a transient, while the short and/or the appropriate transitional windows are applied to the quasi-stationary samples in the remainder of the transient frame. Some example window sequences are shown in Figure 2. (a) is an example of the conventional approach. (b) shows a transient occurring in the first block of the frame, so a brief window is deployed for this block. (c) shows a transient occurring in the third block. (d) shows two transients occurring in the third and sixth blocks, so two brief windows are placed accordingly. (e) shows a transient occurring in the last block.

Since the effective size of the brief window is reduced from 256 to 160 samples, better pre-echo suppression is achieved for two reasons. The first is that a finer temporal resolution is applied to the transient samples, so the high bit rates associated with transients are confined to fewer samples. The second is that the spread of quantization noise is reduced.
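As a rough illustration of the brief window described above, the following Python sketch builds a 256-sample window that is zero for its first and last 48 samples and nonzero over the central 160. The exact window coefficients are defined by the standard and are not reproduced in this paper; the shape used here (32-sample sine slopes around a flat top) and the names `brief_window`, `TAPER`, and `princen_bradley_error` are assumptions, chosen so that two adjacent brief windows, placed 128 samples apart, satisfy the Princen-Bradley perfect reconstruction condition.

```python
import math

SHORT_LEN = 256        # short MDCT window length (from the text)
HOP = SHORT_LEN // 2   # 128 new samples per short block
ZERO_PAD = 48          # leading and trailing zeros (from the text)
TAPER = 32             # ASSUMED slope length: 160 nonzero samples - 128 hop


def brief_window():
    """Return a 256-sample window that is nonzero only in its central 160 samples."""
    # Rising sine slope; its mirror image is the falling (cosine-like) slope.
    rise = [math.sin(math.pi / 2 * (n + 0.5) / TAPER) for n in range(TAPER)]
    flat = [1.0] * (SHORT_LEN - 2 * ZERO_PAD - 2 * TAPER)  # 96 samples of 1.0
    fall = rise[::-1]
    return [0.0] * ZERO_PAD + rise + flat + fall + [0.0] * ZERO_PAD


def princen_bradley_error(w):
    """Largest deviation of w[n]^2 + w[n + HOP]^2 from 1 over one hop."""
    return max(abs(w[n] ** 2 + w[n + HOP] ** 2 - 1.0) for n in range(HOP))


if __name__ == "__main__":
    w = brief_window()
    print(len(w), sum(1 for x in w if x != 0.0))
    print(princen_bradley_error(w))
```

Under these assumptions the window has exactly 160 nonzero samples, and the Princen-Bradley error is zero up to floating-point rounding. Transitional windows to and from the long and short windows would need their own slopes and are not sketched here.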
The spread of quantization noise for the short window is 256 samples, but only 160 samples for the brief window. At a typical sample rate of 48 kHz, these amount to 5.33 ms and 3.33 ms, respectively. Given that significant premasking tends to last about 1-2 ms [1], the spread of quantization noise that may be audible is reduced from 4.3-3.3 ms to 2.3-1.3 ms, a significant reduction.

Linear Scalar Quantization. Linear scalar quantization is used to quantize the MDCT spectral lines. A group of spectral lines, referred to as a quantization unit, is bounded in the frequency domain by the critical bands and in the time domain by the MDCT blocks that are statistically similar. The lines in a unit share a quantization step size, which is itself logarithmically quantized with a resolution of 0.2 dB. When the quantization step size is one, the maximum allowed quantization index is $\pm(2^{23} - 1)$, and the Huffman codebooks are designed to accommodate this. Consequently, a signal path of up to 24 bits is provided throughout the codec, so that audio quality far exceeding the perceptual capability of the human ear can be delivered if the bit rate suffices.
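The quantizer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper gives the 0.2 dB resolution and the index limit, but not how a step-size index maps to a step size, so the mapping in `step_size_from_index` (amplitude decibels, index 0 meaning a step size of one) and all function names are hypothetical.

```python
import math

STEP_DB = 0.2             # step-size resolution in dB (from the text)
MAX_INDEX = 2 ** 23 - 1   # largest index magnitude at step size one (from the text)


def step_size_from_index(k):
    """ASSUMED mapping: each step-size index is worth 0.2 dB of amplitude."""
    return 10.0 ** (k * STEP_DB / 20.0)


def nearest_step_index(step):
    """Inverse of the assumed mapping, rounded to the nearest 0.2 dB."""
    return round(20.0 * math.log10(step) / STEP_DB)


def quantize(lines, step):
    """Uniform scalar quantization of one quantization unit, clipped to the limit."""
    return [max(-MAX_INDEX, min(MAX_INDEX, round(x / step))) for x in lines]


def dequantize(indexes, step):
    """Reconstruct spectral lines from quantization indexes."""
    return [q * step for q in indexes]


if __name__ == "__main__":
    step = step_size_from_index(0)            # step size one
    print(quantize([8388607.4, -9.0e6, 2.6], step))
    print(MAX_INDEX.bit_length() + 1)         # magnitude bits plus one sign bit
```

The index magnitude needs 23 bits, and one more bit for the sign gives the 24-bit signal path mentioned in the text.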

Statistic Allocation of Codebooks. With the conventional approach to codebook allocation, all the spectral lines within a quantization unit share a Huffman codebook. The codebook assigned to such a unit is the smallest one that can accommodate the largest quantization index within the unit. Consequently, once the quantization step size is fixed, all the quantization indexes within the unit are fixed, and so is the Huffman codebook. There is no other option. Since the quantization indexes within a quantization unit do not necessarily share the same statistical properties, the conventional approach does not provide a good match, if any, between the statistical properties of the Huffman codebooks and those of the quantization indexes. This motivates a statistically adaptive approach to codebook assignment, whose steps are outlined as follows:

1. Group the quantization indexes into granules of four, and assign to each granule the smallest codebook that can accommodate the largest quantization index in the granule.
2. Segment the codebook indexes of these granules into larger segments based on their local statistical properties.
3. Select the largest codebook within each segment as the codebook for that segment.

The advantage of this approach is illustrated in Figure 3. Because the largest quantization index falls into quantization unit d, the conventional approach assigns a large codebook to that unit, which is a poor match because most of the indexes in the unit are much smaller. With the DRA approach, however, the largest quantization index falls into segment C and so shares a codebook with other large quantization indexes. All the quantization indexes in segment D are small, so a small codebook is selected for it. This results in fewer bits for coding the quantization indexes.

Other Components. At low bit rates, the DRA algorithm may deploy joint channel coding.
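Returning to the three codebook allocation steps listed above, the following Python sketch shows one way they could fit together. It is not the procedure stipulated by the standard: the codebook capacities in `CAPACITY` are hypothetical, and because the paper does not say how segments are formed from "local statistical properties", the rule in `segment` (merge neighbouring granules whose codebook numbers differ by at most `tolerance`) is an assumption made purely for illustration.

```python
# HYPOTHETICAL capacities: codebook c can code magnitudes up to CAPACITY[c].
CAPACITY = [0, 1, 2, 4, 8, 15, 255]
GRANULE = 4  # quantization indexes per granule (from the text)


def smallest_codebook(max_abs):
    """Smallest codebook whose capacity covers the given magnitude."""
    for c, cap in enumerate(CAPACITY):
        if max_abs <= cap:
            return c
    raise ValueError("index too large for the hypothetical codebooks")


def granule_codebooks(indexes):
    """Step 1: one codebook per granule of four quantization indexes."""
    return [smallest_codebook(max(abs(q) for q in indexes[i:i + GRANULE]))
            for i in range(0, len(indexes), GRANULE)]


def segment(books, tolerance=1):
    """Step 2 (ASSUMED rule): open a new segment when a granule's codebook
    differs from that of the segment's first granule by more than tolerance."""
    segments = []
    for g, book in enumerate(books):
        if segments and abs(book - books[segments[-1][0]]) <= tolerance:
            segments[-1].append(g)
        else:
            segments.append([g])
    return segments


def allocate(indexes):
    """Steps 1-3: return (first_granule, last_granule, codebook) per segment."""
    books = granule_codebooks(indexes)
    return [(seg[0], seg[-1], max(books[g] for g in seg))
            for seg in segment(books)]


if __name__ == "__main__":
    idx = [0, 1, 0, -1, 1, 0, 1, 1, 12, -9, 14, 7,
           10, 13, -8, 11, 1, 0, 0, 1, 0, -1, 1, 0]
    print(allocate(idx))
```

For the sample indexes, only the two middle granules receive the large codebook, whereas the conventional rule would apply `smallest_codebook(14)` to the whole unit.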
While the implementation of sum/difference coding is conventional, joint intensity coding is a little different. Instead of joining stereo pairs, it joins all channels into the left channel, thereby providing a significant bit rate reduction when surround sound is involved.

While the perceptual model and global bit allocation are necessary components of the DRA encoder, they are not part of the DRA decoder, and there is little, if any, restriction on their implementation. They are therefore not stipulated in the DRA standard and are not discussed here.

Subjective Listening Tests

During its standardization process, DRA went through five rounds of ITU-R BS.1116 [1] compliant subjective listening tests. The results are shown in Table 1.

Table 1. Scores for ITU-R BS.1116 compliant subjective listening tests.

Lab      Date    Stereo, 128 kbps   5.1, 320 kbps   5.1, 384 kbps
NTICRT   08/04   4.6                -               4.7
SLDST    10/04   4.2                -               4.0
SLDST    01/05   4.1                -               4.2
SLDST    07/05   4.7                -               4.5
SLDST    08/06   -                  4.6             4.9

All these tests specified the bit rate as an upper limit that must not be exceeded in any frame. For example, at a sample rate of 48 kHz, a bit rate of 128 kbps translates into 2730 bits per frame (128,000 × 1024 / 48,000 ≈ 2730.7, rounded down), because a DRA frame consists of 1024 samples. No frame can use more than 2730 bits, and no bit reservoir is allowed.

The first test was conducted by the National Testing and Inspection Center for Radio and TV Products of China (NTICRT) in August 2004. Ten stereo sound tracks, selected mostly from the SQAM CD [6], and five 5.1 surround sound tracks were used in the test. The test subjects were all expert listeners, consisting of conductors, musicians, recording engineers, and audio engineers.

The other four tests were all performed by the State Lab for DTV System Testing (SLDST) under the State Administration for Radio, Film, and TV of China. Other than a few Chinese sound tracks, most of the test materials were selected from the SQAM CD [6] and a pool of surround sound tracks used by EBU and MPEG, including Pitch pipe, Harpsichord, and Elloit1.

The last test, though still conducted by SLDST, was ordered and supervised by China Central TV (CCTV) as part of its DTV technology evaluation program. CCTV was only interested in surround sound, so DRA was tested at 384 kbps and 320 kbps. This test compared DRA with two major international codecs, and DRA came out as the clear winner.

Conclusion

The DRA audio coding standard was shown to be essentially a bare-bones transform coder that uses transient-localized MDCT for improved pre-echo suppression and statistic allocation of codebooks for better entropy coding efficiency. Its quantizer and Huffman codebooks are designed in such a way that a signal path of up to 24 bits is provided throughout the codec, so that the highest audio quality can be delivered if the bit rate suffices. Its coding efficiency has been evaluated by five ITU-R BS.1116 compliant subjective listening tests.

Acknowledgements

Supported by the 2012 Guangdong Science and Technology Plan for Commercialization of Advanced and New Technologies under contract 2012B010100033.

References

[1] T. Painter and A. Spanias, Perceptual coding of digital audio, Proceedings of the IEEE, vol. 88, no. 4, pp. 451-513, April 2000.
[2] Wikipedia, Windows Media Audio, http://en.wikipedia.org/wiki/Windows_Media_Audio, October 2007.
[3] Vorbis I specification, Xiph.org Foundation, 2004.
[4] Yu-Li You, Weixiong Zhang, Mao Xu, and Subin Zhang, Electronics Industry Standard: Multichannel Digital Audio Coding Technology, SJ/T11368-2006, Ministry of Information Industry, People's Republic of China, 2007.
[5] Yu-Li You and Wenhua Ma, Temporal transient localization for enhanced pre-echo suppression, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing, 2008.
[6] EBU, Sound Quality Assessment Material Recordings for Subjective Tests, Tech. 3253, April 1988.

Materials Engineering and Automatic Control II, doi:10.4028/www.scientific.net/AMM.330
DRA Audio Coding Standard, doi:10.4028/www.scientific.net/AMM.330.981