Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Similar documents
application Bulletin Fraunhofer Institute for Integrated Circuits IIS

TECHNICAL PAPER. Personal audio mix for broadcast programs. Fraunhofer Institute for Integrated Circuits IIS

WHITE PAPER. Fraunhofer Institute for Integrated Circuits IIS

TECHNICAL PAPER. Fraunhofer Institute for Integrated Circuits IIS

FAST FACTS. Fraunhofer Institute for Integrated Circuits IIS

WHITE PAPER. Director Prof. Dr.-Ing. Albert Heuberger Am Wolfsmantel Erlangen

EXCELLENCE IN AUDIO AND MEDIA ENGINEERING

The AAC audio Coding Family For

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

WHITE PAPER Fraunhofer DCPServer Professional

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

ELL 788 Computational Perception & Cognition July November 2015

2.4 Audio Compression

The MPEG-4 General Audio Coder

Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7 10 Munich, Germany

MPEG-4 General Audio Coding

Experiencing MultiChannel Sound in

Parametric Coding of High-Quality Audio

Principles of Audio Coding

5: Music Compression. Music Coding. Mark Handley

Audio-coding standards

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Audio-coding standards

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Chapter 14 MPEG Audio Compression

MPEG-4 aacplus - Audio coding for today s digital media world

Convention Paper 8654 Presented at the 132nd Convention 2012 April Budapest, Hungary

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Optical Storage Technology. MPEG Data Compression

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Audio coding for digital broadcasting

Fraunhofer Institute for Integrated Circuits IIS

Multimedia Communications. Audio coding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Mpeg 1 layer 3 (mp3) general overview

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

6MPEG-4 audio coding tools

3GPP TS V6.2.0 ( )

Speech and audio coding

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Data Compression. Audio compression

Audio and video compression

Bluray (

Parametric Coding of Spatial Audio

Speech-Coding Techniques. Chapter 3

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Audio Coding and MP3

MP3. Panayiotis Petropoulos

Appendix 4. Audio coding algorithms

Proceedings of Meetings on Acoustics

PRESS RELEASE. Fraunhofer IIS, Qualcomm and Technicolor to Demonstrate the World s First Live Broadcast of MPEG-H Interactive and Immersive TV Audio

DRA AUDIO CODING STANDARD

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Digital Speech Coding

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

DAB. Digital Audio Broadcasting

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

CISC 7610 Lecture 3 Multimedia data and data formats

EE482: Digital Signal Processing Applications

Voice Over LTE (VoLTE) Technology. July 23, 2018 Tim Burke

S.K.R Engineering College, Chennai, India. 1 2

Professional Monitoring Receiver for

Transporting audio-video. over the Internet

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Networking Applications

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Lecture 16 Perceptual Audio Coding

CSCD 443/533 Advanced Networks Fall 2017

Implementation of G.729E Speech Coding Algorithm based on TMS320VC5416 YANG Xiaojin 1, a, PAN Jinjin 2,b

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

MPEG-4 Advanced Audio Coding

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

AAC-ELD based Audio Communication on ios A Developer s Guide

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

Opus, a free, high-quality speech and audio codec

Application of wavelet filtering to image compression

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Techonologies, Tampere, Finland

Efficiënte audiocompressie gebaseerd op de perceptieve codering van ruimtelijk geluid

ISO/IEC Information technology High efficiency coding and media delivery in heterogeneous environments. Part 3: 3D audio

GSM Network and Services

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Advanced Functionality & Commercial Aspects of DRM. Vice Chairman DRM Technical Committee,

ITNP80: Multimedia! Sound-II!

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

MPEG-4 Structured Audio Systems

ARIB STD-T53-C.S Circuit-Switched Video Conferencing Services

AUDIOVISUAL COMMUNICATION

Principles of MPEG audio compression

Introducing the MPEG-H Audio Alliance s New Interactive and Immersive TV Audio System

Chapter 4: Audio Coding

ETSI TS V (201

Transcription:

Technical PapER Extended HE-AAC Bridging the gap between speech and audio coding One codec taking the place of two; one unified system bridging a troublesome gap. The fifth generation MPEG audio codec Extended High Efficiency AAC (xhe-aac) is the first to deliver high quality for speech, audio and mixed signals at very low bit-rates. This paper will take a brief look at the inner workings of xhe-aac, in the process answering two key questions: what are the working principles of audio and speech codecs, and how was their structure taken into account when developing xhe-aac. Fraunhofer Institute for Integrated Circuits IIS Management of the institute Prof. Dr.-Ing. Albert Heuberger (executive) Dr.-Ing. Bernhard Grill Am Wolfsmantel 33 91058 Erlangen www.iis.fraunhofer.de Contact Matthias Rose Phone +49 9131 776-6175 matthias.rose@iis.fraunhofer.de Contact USA Fraunhofer USA, Inc. Digital Media Technologies* Phone +1 408 573 9900 codecs@dmt.fraunhofer.org Contact China Toni Fiedler china@iis.fraunhofer.de Contact Japan Fahim Nawabi Phone: +81 90-4077-7609 fahim.nawabi@iis.fraunhofer.de Contact Korea Youngju Ju Phone: +82 2 948 1291 youngju.ju@iis-extern.fraunhofer.de * Fraunhofer USA Digital Media Technologies, a division of Fraunhofer USA, Inc., promotes and supports the products of Fraunhofer IIS in the U. S. 1 / 7

Unrivalled audio quality for all signal types Prior to the development of xhe-aac, there was the speech coding scheme on the one hand and the audio coding scheme on the other. Both are eminently capable of delivering good quality when applied to their individual operating fields, but reveal weaknesses when deployed elsewhere. Consequently, speech codecs are perfect for speech but show deficiencies when it comes to music at low bit-rates. Meanwhile, audio codecs perform very well for music signals, but cave in when it comes to speech at very low bit-rates. This predicament provides audio developers with a challenge as there are plenty of services and applications that move through the domains of both speech and music. They work with various signal types and therefore require a suitably flexible codec. Streaming and broadcast are two prominent examples. Until now, providers of low bit-rate services have been forced to prioritize either sound or speech and the matching codec, resulting in poor audio quality for the other signal world. Alternatively, they have attempted to test and integrate two codecs at considerable expense in terms of both time and money. This undesirable situation has been further amplified by significant changes on the receiver end. When originally developed, mobile phones were intended only for straightforward voice and texting applications. But during the past few years, major technological developments have allowed users to stream videos and music onto their mobile devices, listen to their favorite digital radio stations, and download apps to deliver audio books, magazines and other content. xhe-aac is the perfect solution for all of these requirements. Surpassing its goal of performing at least as good as the best speech codec, the 3GPP Adaptive Multi-Rate Wideband Plus (AMR-WB+), and the best audio codec, MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC), it unites the strongest features of these two technologies, delivering unrivalled audio quality starting at bit-rates as low as 6 kbit/s per channel for any type of signal. Behind the scenes: Codec structures Nowadays, audio and speech codecs rely on the following three main elements: a core-coder for the representation of low and intermediate frequency signal components; a parametric bandwidth extension which uses various parameters to generate higher frequency content from low frequency portions; a parametric stereo coder to create additional stereo signals on the basis of a mono downmix and spatial parameters. At low bit-rates, the parametric tools mentioned above provide increased coding efficiency while maintaining good audio quality. The tools can be selectively deactivated when the codec operates at higher bit-rates and the core-coder is able to independently handle a wider bandwidth as well as the discrete coding of several channels. 2 / 7

LP Uncompressed PCM Audio L Low Spatial / Encoder and Downmix HP R High Bandwidth Extension Encoder Core Decoder Low Bitstream De-Multiplex Merging Bandwidth Extension Decoder High Figure 1: Depiction of a typical audio codec structure, including the core codec and parametric tools for bandwidth extension and stereo signals. a) The encoder b) The corresponding decoder structure. Bold arrows represent audio signals. Thin arrows depict side information and control data. Core Encoder Spatial / Decoder and Upwnmix L R Bitstream Multiplex Uncompressed PCM Audio a b Composition of an audio coding scheme As with other general transform coding schemes, the working principle of Advanced Audio Coding (AAC) is based on psychoacoustics. Aiming for irrelevance removal, it takes advantage of temporal and simultaneous masking effects. The resulting audio coding scheme relies on the following three main steps: the conversion from time to frequency; the quantization of the frequency spectrum, during which process information from a psychoacoustic model helps to control the quantization error; the encoding of quantized spectral coefficients and corresponding side information with minimum entropy. These steps guarantee the codec s impressive flexibility when handling different kinds of signal types at various operating points. The audio codec HE-AAC v2 provides an even greater coding efficiency since it blends the AAC core with the tools Spectral Band Replication (SBR) and Parametric (PR): To enhance the operational range of speech or audio codecs, SBR regenerates higher frequency content based on similar sounding signals coming from lower and mid frequencies. Additional side information is also transmitted to reconstruct this high-frequency content later on. The SBR model profits from the human ear s tendency to sense lower frequencies much more accurately than higher frequencies. 3 / 7

In combination with parameters for inter-channel level, phase and correlation, PS can generate stereo content from a mono downmix. Composition of a speech coding scheme Speech coding models, e.g. Algebraic Code Excited Linear Prediction (ACELP), perform very well for speech signals at low bit-rates. This level of quality is possible because the time domain source-filter is closely related to the development of human speech in the human vocal tract. The three most essential elements in modern speech coding schemes are: a short-term linear predictive coding scheme (LPC) for moulding the coded sound according to the shape of the human vocal tract; a long-term predictor (LTP) exploiting the pattern by which the vocal chords vibrate, also known as Adaptive Codebook ; a so-called Innovation Codebook which is responsible for encoding the speech signal s non-predictive portions. Following the ACELP principle, AMR-WB employs an algebraic representation for the Innovation Codebook comprising a small number of individual non-zero pulses within a given time segment represented by means of an algebraic allocation formula. The speech codec s parameters - i.e. the LPC coefficients, the LTP lag and gain, and the innovative excitation - are efficiently coded in each frame. As a result, this coding model delivers high quality audio for speech signals at high as well as at low bit-rates. To perform equally well with music signals, a transform coding mode for the excitation signal (TCX) as well as a parametric frequency and stereo extension were added to AMR- WB, resulting in AMR-WB+. Despite these enhancements, the codec cannot compete with HE-AAC v2 when it comes to music signals. The unified system: The design principle of xhe-aac While generally maintaining the structure of HE-AAC v2 shown in Figure 1, xhe-aac features updated parametric tools for even greater efficiency. These include enhanced SBR (esbr) and improved parametric stereo coding. As illustrated in Figure 2, the core coder unites an AAC based transform coder and speech coding technology, including ACELP. Typically, encoder structures in MPEG are not fixed and implementers are only obliged to create valid bit-streams. The same principle works for xhe-aac, resulting in the possibility of continuously improving the encoder s performance in the years to come. xhe-aac shares all capabilities and tools of AAC, including features for discrete stereo or multi-channel operations. xhe-aac decoders are compatible with AAC-LC, HE-AAC and HE-AACv2 streams. For a comprehensive introduction to the xhe-aac coding scheme, please refer to the AES Convention Paper MPEG Unified Speech and Audio Coding The ISO/MPEG Standard for High-Efficiency Audio Coding of all Content Types. 4 / 7

Figure 2: Working principle of the xhe-aac Bitstream De-Multiplex core decoder. AAC based coding ACELP Bandwidth Extension Processing Uncompressed PCM Audio Extended HE-AAC Figure 3: The new member of the AAC codec HE-AAC v2 family is an upgrade of HE-AAC v2 with inte- HE-AAC v1 AAC-LC SBR PS Unified Speech and Audio Coding grated speech coding tools and more efficient tools for general audio signal coding. Low Complexity AAC For greater flexibility and efficiency, xhe-aac introduces the following new features: a more efficient context-adaptive arithmetic decoder replacing the AAC Huffman decoder; a Frequency Domain LPC Noise Shaping (FDNS) mechanism complementing the AAC scale factor mechanism, responsible for controlling the quantization noise shaping; a larger set of window lengths within the xhe-aac MDCT for a better timefrequency decomposition. Perceived Audio Quality Figure 4: Qualitative illustration of perceived audio quality. Perceptual quality Extended HE-AAC Mono HE-AAC v2 HE-AAC AAC-LC 0 8 16 32 64 128 Bitrate (kb/s) 5 / 7

Excellent Speech Perceived Quality Figure 5: Speech perceived quality. From ISO/ IEC JTC1/SC29/WG11 USAC Verification Test Report. Good Mono Perceptual quality Fair Poor Extended HE-AAC Mono HE-AAC, v2 Bad from ISO/IEC JTC1/SC29/WG11 USAC Verification Test Report 0 8 16 32 64 128 Bitrate (kb/s) Where xhe-aac can make a change The latest addition to the MPEG AAC codec family delivers high audio quality for a wide variety of bit-rates, making it especially valuable for applications where bit-rates vary or are set very low. This includes the streaming of media content, especially on to mobile devices. Since unstable network conditions are no obstacle for the codec, users can enjoy their multimedia experience with fewer dropouts and shorter buffering time. The codec s bit-rate efficiency also creates new space which can either be used for new programs or tracks, or can simply be converted into cost savings a feature digital broadcasting services can benefit from. The first radio system to adopt xhe-aac and benefit from its features is Digital Radio Mondiale (DRM). 6 / 7

INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. INFORMATION IN THIS DOCUMENT IS OWNED AND COPYRIGHTED BY THE FRAUNHOFER- GESELLSCHAFT AND MAY BE CHANGED AND/OR UPDATED AT ANY TIME WITHOUT FURTHER NOTICE. PERMISSION IS HEREBY NOT GRANTED FOR RESALE OR COMMERCIAL USE OF THIS SERVICE, IN WHOLE OR IN PART, NOR BY ITSELF OR INCORPORATED IN ANOTHER PRODUCT. Copyright March 2017 Fraunhofer-Gesellschaft ABOUT FRAUNHOFER IIS The division of Fraunhofer IIS has been an authority in its field for more than 25 years, starting with the creation of mp3 and co-development of AAC formats. Today, there are more than 10 billion licensed products worldwide with Fraunhofer s media technologies, and over one billion new products added every year. Besides the global successes mp3 and AAC, the Fraunhofer technologies that improve consumers audio experiences include Cingo (spatial VR audio), Symphoria (automotive 3D audio), xhe-aac (adaptive streaming and digital radio), the 3GPP EVS VoLTE codec (crystal clear telephone calls), and the interactive and immersive MPEG-H TV Audio System. With the test plan for the Digital Cinema Initiative and the recognized software suite easydcp, Fraunhofer IIS significantly pushed the digitization of cinema. The most recent technological achievement for moving pictures is Realception, a tool for light-field data processing. Fraunhofer IIS, based in Erlangen, Germany, is one of 69 divisions of Fraunhofer-Gesellschaft, Europe s largest application-oriented research organization. For more information, contact amm-info@iis.fraunhofer.de, or visit www.iis.fraunhofer.de/amm. 7 / 7