Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Similar documents
Parametric Coding of Spatial Audio

MPEG-4 General Audio Coding

The MPEG-4 General Audio Coder

Mpeg 1 layer 3 (mp3) general overview

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

The Reference Model Architecture for MPEG Spatial Audio Coding

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

MPEG SURROUND: THE FORTHCOMING ISO STANDARD FOR SPATIAL AUDIO CODING

Principles of Audio Coding

MPEG-4 aacplus - Audio coding for today s digital media world

ELL 788 Computational Perception & Cognition July November 2015

Chapter 14 MPEG Audio Compression

Audio-coding standards

Efficiënte audiocompressie gebaseerd op de perceptieve codering van ruimtelijk geluid

Audio-coding standards

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Parametric Coding of High-Quality Audio

EE482: Digital Signal Processing Applications

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Audio coding for digital broadcasting

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

2.4 Audio Compression

Lecture 16 Perceptual Audio Coding

JND-based spatial parameter quantization of multichannel audio signals

AUDIOVISUAL COMMUNICATION

Audio Coding and MP3

AUDIOVISUAL COMMUNICATION

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

5: Music Compression. Music Coding. Mark Handley

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Multimedia Communications. Audio coding

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

Audio Coding Standards

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

SAOC FOR GAMING THE UPCOMING MPEG STANDARD ON PARAMETRIC OBJECT BASED AUDIO CODING

Optical Storage Technology. MPEG Data Compression

Appendix 4. Audio coding algorithms

Spatial Audio Object Coding (SAOC) The Upcoming MPEG Standard on Parametric Object Based Audio Coding

ISO/IEC Information technology High efficiency coding and media delivery in heterogeneous environments. Part 3: 3D audio

Convention Paper Presented at the 124th Convention 2008 May Amsterdam, The Netherlands

MPEG Surround binaural rendering Surround sound for mobile devices (Binaurale Wiedergabe mit MPEG Surround Surround sound für mobile Geräte)

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Bluray (

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends

Chapter 4: Audio Coding

MULTI-CHANNEL GOES MOBILE: MPEG SURROUND BINAURAL RENDERING

Data Compression. Audio compression

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

6MPEG-4 audio coding tools

Chapter 5.5 Audio Programming

3GPP TS V6.2.0 ( )

MPEG-4 Structured Audio Systems

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

EE Multimedia Signal Processing. Scope & Features. Scope & Features. Multimedia Signal Compression VI (MPEG-4, 7)

Audio Engineering Society. Convention Paper. Presented at the 123rd Convention 2007 October 5 8 New York, NY, USA

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

TECHNICAL PAPER. Personal audio mix for broadcast programs. Fraunhofer Institute for Integrated Circuits IIS

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

Surrounded by High-Definition Sound

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Low Bitrate Coding of Spot Audio Signals for Interactive and Immersive Audio Applications

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Audio Engineering Society. Convention Paper. Presented at the 116th Convention 2004 May 8 11 Berlin, Germany

DAB. Digital Audio Broadcasting

ijdsp Interactive Illustrations of Speech/Audio Processing Concepts

MP3. Panayiotis Petropoulos

Networking Applications

CISC 7610 Lecture 3 Multimedia data and data formats

MPEG-4 Advanced Audio Coding

Preview only reaffirmed Fifth Avenue, New York, New York, 10176, US

Proceedings of Meetings on Acoustics

Audio and video compression

ISO/IEC INTERNATIONAL STANDARD

Convention Paper Presented at the 140 th Convention 2016 June 4 7, Paris, France

Advanced Functionality & Commercial Aspects of DRM. Vice Chairman DRM Technical Committee,

AUDIO MEDIA CHAPTER Background

An investigation of non-uniform bandwidths auditory filterbank in audio coding

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES

Ch. 5: Audio Compression Multimedia Systems

AUDIO SIGNAL PROCESSING FOR NEXT- GENERATION MULTIMEDIA COMMUNI CATION SYSTEMS

Digital video coding systems MPEG-1/2 Video

MPEG Spatial Audio Coding Multichannel Audio for Broadcasting

Efficient Signal Adaptive Perceptual Audio Coding

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

DVB Audio. Leon van de Kerkhof (Philips Consumer Electronics)

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Audio Engineering Society. Convention Paper

Lecture 3 Image and Video (MPEG) Coding

Enhanced Audio Features for High- Definition Broadcasts and Discs. Roland Vlaicu Dolby Laboratories, Inc.

PHABRIX. Dolby Test & Measurement Application Notes. broadcast excellence. Overview. Dolby Metadata Detection. Dolby Metadata Analysis

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

IMMERSIVE AUDIO WITH MPEG 3D AUDIO STATUS AND OUTLOOK

HAVE YOUR CAKE AND HEAR IT TOO: A HUFFMAN CODED, BLOCK SWITCHING, STEREO PERCEPTUAL AUDIO CODER

Transcription:

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio Dr. Jürgen Herre 11/07 Page 1 Jürgen Herre für (IIS) Erlangen, Germany

Introduction: Sound Images? Humans live in a world of sound continuously listen to the sound surrounding us perceive the acoustic waves reaching the ears and interpret them (re)construct an acoustic scene In analogy to visual perception, we form a sound image... This talk: How to efficiently represent and reproduce sound images! Dr. Jürgen Herre 11/07 Page 2

Overview Part I Part II Part III What constitutes a sound image? Efficient coding of spatial sound images Perceptual audio coding Coding of multi-channel / surround sound Spatial Audio Coding & MPEG Surround Next-generation interactive coding / rendering of sound images: Spatial Audio Object Coding Demonstration Dr. Jürgen Herre 11/07 Page 3

Part I: What Constitutes A Sound Image? Dr. Jürgen Herre 11/07 Page 4

Spatial Sound Perception The Physics Part We are permanently listening to sound through our two sensors (ears) Ears receive a complex sound mixture of direct sound that is radiated from several sources (frequently concurrently), and many reflections from our acoustic environment ( ambient sound ) Dr. Jürgen Herre 11/07 Page 5

Spatial Sound Perception (2) The Perceptual Part Some Key Words Humans interpret sound to make sense of it Map it to an internal sound scene which is a perceptual correlate of the actual physical scene (not necessarily identical!) Very complex & sophisticated process - by far not fully understood! Psychoacoustics [Zwicker, Moore, Fletcher, Blauert, ] Perceptual streaming, auditory scene analysis [Bregman, ] How to form a consistent interpretation of the acoustic world from the stream of received information fragments Dr. Jürgen Herre 11/07 Page 6

Spatial Sound Perception (3) A Note of Caution In the following, we will look at the phenomenon of spatial audio perception in a grossly simplified, yet illustrative and very useful way Allows immediate technical application to efficient coding of spatial sound Highlight some analogies between sound images and visual images Frequency Selectivity The Human Auditory System processes sound in a frequency selective way (roughly logarithmic, ca. 1/3 octave BARK and ERB scales) Dr. Jürgen Herre 11/07 Page 7

Spatial Sound Perception (4) Sound Localization Complex process, mostly driven by interaural signal differences L R Depending on its position, a sound source produces differences in the ear signals Most important perceptual cues contained in the ear signals: e.g. [Blauert 1984] Interaural Level Differences (ILD) Interaural Phase/Time Differences (IPD/ITD) (relevant up to ca. 4 khz frequency) Temporal signal envelope (high frequencies) ILD, IPD/ITD determine perceived lateral position of sound sources! Dr. Jürgen Herre 11/07 Page 8

Spatial Sound Perception (5) In realistic closed rooms, reflections from the walls (and other objects) occur These blur the simple ILD, IPD/ITD relations by decorrelating the ear signals Decorrelated sound at the two ears represents reflections and thus provides information on the acoustic environment Distinct early reflections nearby walls Late reverberation statistical summary of acoustic environment / room properties Auditory system needs to parse sound into direct & ambient sound! Dr. Jürgen Herre 11/07 Page 9

Spatial Sound Perception (6) e1 [Faller 2004] Dr. Jürgen Herre 11/07 Page 10 e2 Important role of Interaural Coherence (IC): Determines spatial extent (=width) of sound event [Schuijers03] [Faller03] Perceived width of auditory event increases as coherence decreases (1 4); eventually separates into 2 distinct events Max. of normalized cross-correlation: IC = max d $ % n=#$ $ e 1 (n) " e 2 (n + d) % e 2 1 (n) " % e 2 2 (n + d) Key to source width and source/ambience perception! n=#$ $ n=#$

Some A V Analogies in Scene Attributes Visual Domain Auditory Domain ( auditory cues) Foreground object Sound sources ( high IC) - directional Object position - Object position ( ILD/ITD/IPD for lat. position) Object size - Object size ( IC) Background Ambience ( low IC) - non-directional Dr. Jürgen Herre 11/07 Page 11

Part II: Efficient Coding Of Spatial Sound Images Dr. Jürgen Herre 11/07 Page 12

Basics of Audio Coding Goal Predominant Approach Represent audio data as compactly as possible while maintaining sound quality (ideally: transparent coding) Concept of perceptual audio coding Optimize subjective quality rather than objective distortion metrics (e.g. MSE/SNR) Use knowledge about signal receiver: Psychoacoustics gives limits of perception Keep coding distortion below limits! No universal source model available (unlike in speech coding) Dr. Jürgen Herre 11/07 Page 13

Psychoacoustics 80 db L T 60 40 Masking Threshold Masker 20 Threshold in quiet Inaudible Signal Masked Sound 0 0,02 0,05 0,1 0,2 0,5 1 2 5 10 20 khz f T Dr. Jürgen Herre 11/07 Page 14

Demonstration: The "13 db Miracle" Historic demonstration by James D. Johnston and Karlheinz Brandenburg at AT&T Bell Laboratories in 1990 using the best psychoacoustic model available at that time... Original signal Original + white noise, SNR = 13,6 db Original + noise at threshold, SNR = 13,6 db Difference signal: White noise Difference signal: Noise at threshold SNR does not adequately describe subjective sound quality! Putting psychoacoustics to work makes a huge difference! Dr. Jürgen Herre 11/07 Page 15

Basic Paradigm of (Monophonic) Perceptual Audio Coding audio in Analysis Filterbank Quantization & Coding Encoding of Bitstream bitstream out Perceptual Model bitstream in Decoding of Bitstream Inverse Quantization Synthesis Filterbank audio out Dr. Jürgen Herre 11/07 Page 16

A Real Audio Coder (MPEG-2 AAC, 1997) Bitstream Output Bitstream Multiplexer Input time signal Gain Control Perceptual Model Filter Bank TNS Intensity/ Coupling Prediction M/S Scale Factors Quant. Rate/Distortion Control Noiseless Coding Dr. Jürgen Herre 11/07 Page 17

The Better Spatial Sound Image: Surround Sound / Multi-Channel Audio Significantly increased spatial realism over stereo, envelopment Origins in movie sound (5.1); now also for music, broadcasting Increasingly adopted in consumers homes Dr. Jürgen Herre 11/07 Page 18 Fraunhofer Institut

Traditional Delivery Formats For Surround Sound Matrixed Surround (Prologic, Neo6, ) Discrete Surround (AAC, AC-3, ) Downmix of 5.1 sound into stereo signal, upmix at the receiver side Efficient in terms of transmission bandwidth (same bitrate as stereo) Backward compatible to stereo delivery Limited computation necessary Significant loss in subjective audio quality Separate transmission of each channel Significantly higher bitrate than stereo Moderate amount of computation High subjective audio quality possible Dr. Jürgen Herre 11/07 Page 19

A Major Step Ahead: Spatial Audio Coding Rather recent development Main Characteristics Compression efficiency: Transmits multi-channel audio at bitrates used for 2-channel stereo (or even mono) Backward compatibility: SAC multi-channel audio is coded in a backward compatible way existing infrastructures can be seamlessly upgraded to multi-channel / surround! High subjective audio quality Heavily based on exploiting perception rather than waveform coding! Dr. Jürgen Herre 11/07 Page 20

The Spatial Audio Coding (SAC) Concept Spatial Audio Coding = Downmix + Parametric Spatial Synthesis e.g. ICLD, ICTD/ICPD, ICC Extracts & reproduces Inter-Channel Counterparts of Inter-Aural Parameters Dr. Jürgen Herre 11/07 Page 21

Related Technology Generalization / Extension of Binaural Cue Coding Parametric Coding of Multi-Channel into a Mono Channel Parametric Stereo Parametric Coding of 2-Channel Stereo into a Mono Channel Matrix Surround (Dolby Prologic, Logic 7, Circle Surround ) First commercial Application (2003) MP3 Surround Backward compatible extension of stereo MP3 towards 5.1 surround sound at 128-192kbit/s Dr. Jürgen Herre 11/07 Page 22

Example: MP3 Surround Encoding Principle: Surround Stereo + Side Information L R C Ls Rs Compatible Downmix Lc Rc Standard MP3 Encoder MP3 Surround Bitstream Surround BCC Encoder Ancillary Data Surround Enhancement [Herre et al. 2004] Dr. Jürgen Herre 11/07 Page 23

Example: MP3 Surround Decoding Principle: Stereo + Side Information Surround MP3 Surround Bitstream Standard MP3 Decoder Lc Rc Advanced Surround Stereo Multi- Playback Channel (simple Decoderdecoder) Decoding L R C Ls Rs Ancillary Data Surround Enhancement [Herre et al. 2004] Dr. Jürgen Herre 11/07 Page 24

MPEG Spatial Audio Coding / MPEG Surround Work item Spatial Audio Coding (SAC) Technical development 3/2004-7/2006 Main contributors: Fraunhofer IIS, Agere Systems, Coding Technologies and Philips Renamed into MPEG Surround (MPEG-D) Applications: Efficient & backward compatible upgrade of audio distribution to multi-channel, e.g.: Music download service Multi-channel streaming / Internet radio Digital audio broadcasting Dr. Jürgen Herre 11/07 Page 25

MPEG Surround Synthesis: General Concept Decoder Structure Input 1 hybrid analysis hybri d synthesi s Output 1 Input 2 hybrid analysis Spatial synthesis hybri d synthesi s Output 2 hybri d synthesi s Output N Spatial parameters hybrid = QMF filterbank + 2nd stage non-uniform frequency resolution relating to frequ. resolution of human auditory system Same QMF filterbank as in MPEG-4 HE-AAC (AAC + SBR) Dr. Jürgen Herre 11/07 Page 26

MPEG Surround Synthesis: General Concept (2) Frequency selective processing: Dynamic matrix + decorrelation Many more details, please see dedicated publications! Side information rate typ. 3-32 kbit/s Dr. Jürgen Herre 11/07 Page 27

Underlying Idea: Hierarchical En/decoding Encoder 5-2-5 Coding Decoder l f l b OTT encoder element OTT decoder element l f l b c lfe r f r b OTT encoder element OTT encoder element TTT encoder element l 0 r 0 l 0 r 0 TTT decoder element OTT decoder element OTT decoder element c lfe r f r b Spatial parameters Spatial parameters Dr. Jürgen Herre 11/07 Page 28

MPEG Surround Concepts Generalization Similar trees for mono-based operation ( 5-1-5 ), other modes ( 7-2-7, 7-5-7 ) Spatial Parameters Channel Level Differences (CLDs) Inter-Channel Correlations (ICCs) Channel Prediction Coefficients (CPCs) Prediction errors (residuals) Other Aspects Decorrelation by QMF-domain all-pass filters Several tools for handling fine temporal envelope structure (both without and with additional side information) Dr. Jürgen Herre 11/07 Page 29

MPEG Surround: Additional Functionalities Artistic Downmix Matrix Surround Compatibility Enhanced Matrix Mode Binaural Rendering for headphone Externally created downmixes can be used Stereo downmix can be made compatible with common matrix surround decoders MPEG Surround decoder can decode matrix surround signal (i.e. work without side info) Downmix can be generated as virtual surround, or MPEG Surround can be decoded directly into virtual surround very efficiently Rich set of attractive features for practical application Dr. Jürgen Herre 11/07 Page 30

MPEG Surround: Recent Verification Test MPEG Surround MPEG Surround without side-info Matrixed Surround (Dolby Prologic II) Dr. Jürgen Herre 11/07 Page 31 Music-Store test scenario: Stereo downmix coded using AAC@160kbit

Part III: Next Generation Interactive Coding / Rendering of Sound Images From Spatial Audio Coding to Spatial Audio Object Coding Dr. Jürgen Herre 11/07 Page 32

Classic MPEG-4 Interactive Scene Composition (1996ff) audiovisual objects hierarchically multiplexed downstream control / data hierarchically multiplexed upstream control / data video compositor projection plane voice sprite 2D background scene coordinate system z y x audiovisual presentation user events 3D objects audio compositor voice Hierarchy of objects person sprite 2D background scene globe furniture desk audiovisual presentation Dr. Jürgen Herre 11/07 Page 33 hypothetical viewer speaker display user input Scene is composed of multiple A/V objects and can be rendered interactively

MPEG-4 Object Based (De)Coding Discrete approach comes at a rather high price: Bitrate and decoding complexity grow with number of objects Structural complexity Dr. Jürgen Herre 11/07 Page 34

From Spatial Audio Coding (SAC) to SAOC Regular Spatial Audio Coding: Channel-oriented scheme (MPEG Surround) Chan. #1 Chan. #2 Chan. #3 SAC Encoder Downmix signal(s) SAC Decoder Chan. #1 Chan. #2 Chan. #3 Chan. #4... Side Info Chan. #4... Dr. Jürgen Herre 11/07 Page 35

From SAC to SAOC (2) Alternative: Object-oriented Spatial Audio Coding Obj. #1 Obj. #2 Obj. #3 Obj. #4... SAOC Encoder Downmix signal(s) Side Info SAOC Decoder obj. #1 obj. #2 obj. #3 obj. #4... Renderer Chan. #1 Chan. #2... User Interaction / Ctrl. Processes object signals instead of channel signals Mixing /rendering parameters vary according to user interaction Combined obj. decoding & rendering computationally efficient! Previous work by Faller & Baumgarte [2001ff] and Faller [2006] Dr. Jürgen Herre 11/07 Page 36

Real-time interactive rendering of audio objects from a mono audio downmix + SAOC side information Dr. Jürgen Herre 11/07 Page 37

New MPEG Standardization Activities Work on Spatial Audio Object Coding (SAOC) started Transcoding approach: SAOC + rendering info MPEG Surround SAOC Decoder Reference model and working draft recently established (10/2007) Dr. Jürgen Herre 11/07 Page 38

Conclusions Sound Images carry some analogy to images in the visual world Spatial Audio Coding schemes code surround sound based on perception (rather than on waveform match) Object positions are represented by perceptual spatial parameters Audio Object Texture is coded using regular mono/stereo coder Such schemes can bring surround sound into existing infrastructures High compression factor (surround sound at 64kbps and below!) Stereo / mono backward compatibility Extension towards efficient interactive, object-based scene coding / rendering (Spatial Audio Object Coding) is currently on its way Dr. Jürgen Herre 11/07 Page 39

Dr. Jürgen Herre 11/07 Page 40