Parametric Coding of Spatial Audio

Similar documents
Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Audio-coding standards

Audio-coding standards

5: Music Compression. Music Coding. Mark Handley

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

Mpeg 1 layer 3 (mp3) general overview

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

EE482: Digital Signal Processing Applications

Distributed Signal Processing for Binaural Hearing Aids

JND-based spatial parameter quantization of multichannel audio signals

ELL 788 Computational Perception & Cognition July November 2015

Optical Storage Technology. MPEG Data Compression

MPEG-4 aacplus - Audio coding for today s digital media world

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Principles of Audio Coding

Audio Engineering Society. Convention Paper. Presented at the 116th Convention 2004 May 8 11 Berlin, Germany

Efficiënte audiocompressie gebaseerd op de perceptieve codering van ruimtelijk geluid

Audio Coding and MP3

Multimedia Communications. Audio coding

Parametric Coding of High-Quality Audio

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

MPEG SURROUND: THE FORTHCOMING ISO STANDARD FOR SPATIAL AUDIO CODING

Appendix 4. Audio coding algorithms

Lecture 16 Perceptual Audio Coding

Proceedings of Meetings on Acoustics

The Reference Model Architecture for MPEG Spatial Audio Coding

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Audio Engineering Society. Convention Paper. Presented at the 123rd Convention 2007 October 5 8 New York, NY, USA

Chapter 14 MPEG Audio Compression

The MPEG-4 General Audio Coder

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends

Dolby Atmos Immersive Audio From the Cinema to the Home

Audio coding for digital broadcasting

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

AUDIOVISUAL COMMUNICATION

MPEG-4 Structured Audio Systems

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

Audio Engineering Society. Convention Paper

MULTI-CHANNEL GOES MOBILE: MPEG SURROUND BINAURAL RENDERING

I D I A P R E S E A R C H R E P O R T. October submitted for publication

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Speech and audio coding

ETSI TR V ( )

2.4 Audio Compression

Data Compression. Audio compression

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Chapter 4: Audio Coding

AUDIOVISUAL COMMUNICATION

MP3. Panayiotis Petropoulos

MPEG Surround binaural rendering Surround sound for mobile devices (Binaurale Wiedergabe mit MPEG Surround Surround sound für mobile Geräte)

Spatial Audio Object Coding (SAOC) The Upcoming MPEG Standard on Parametric Object Based Audio Coding

Convention Paper Presented at the 140 th Convention 2016 June 4 7, Paris, France

Convention Paper Presented at the 124th Convention 2008 May Amsterdam, The Netherlands

MC-12 Software Version 2.0. Release & Errata Notes

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

Introduction to HRTFs

Chapter 5.5 Audio Programming

MUSIC A Darker Phonetic Audio Coder

AUDIBLE AND INAUDIBLE EARLY REFLECTIONS: THRESHOLDS FOR AURALIZATION SYSTEM DESIGN

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

MAXIMIZING AUDIOVISUAL QUALITY AT LOW BITRATES

Lecture Information Multimedia Video Coding & Architectures

Surrounded by High-Definition Sound

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Remote Collaborative Real-Time Multimedia Experience over the Future Internet ROMEO. Grant Agreement Number: D4.3

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Perceptually motivated Sub-band Decomposition for FDLP Audio Coding

MPEG-4 General Audio Coding

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Audio and video compression

SAOC FOR GAMING THE UPCOMING MPEG STANDARD ON PARAMETRIC OBJECT BASED AUDIO CODING

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

ITNP80: Multimedia! Sound-II!

Principles of MPEG audio compression

Low Bitrate Coding of Spot Audio Signals for Interactive and Immersive Audio Applications

Lecture Information. Mod 01 Part 1: The Need for Compression. Why Digital Signal Coding? (1)

Speech-Coding Techniques. Chapter 3

Loudness Variation When Downmixing

DAB. Digital Audio Broadcasting

ijdsp Interactive Illustrations of Speech/Audio Processing Concepts

Using Noise Substitution for Backwards-Compatible Audio Codec Improvement

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION

6MPEG-4 audio coding tools

Evaluation of a new Ambisonic decoder for irregular loudspeaker arrays using interaural cues

3GPP TS V ( )

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

Audio Coding Standards

Subjective Consumer Evaluation of Multichannel

MPEG-4 ALS International Standard for Lossless Audio Coding

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

ETSI TS V8.0.0 ( ) Technical Specification

Transcription:

Parametric Coding of Spatial Audio Ph.D. Thesis Christof Faller, September 24, 2004 Thesis advisor: Prof. Martin Vetterli Audiovisual Communications Laboratory, EPFL Lausanne

Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

Audio Coding Audio coding Transmission x Encoder Decoder x Convert audio signal into a representation suitable for: - Transmission - Storage Minimize bitrate - Optimal for storage - Optimal for transmission Storage

Audio Coding Lossless coding x Encoder Decoder x 50% Memory/bitrate reduction (Put music of 10 CDs on 1 CD) x = x Redundancy reduction: -Time to frequency transform -Entropy coding

Perceptual audio coding Audio Coding x Encoder Decoder x 90% Memory/bitrate reduction (Put music of 10 CDs on 1 CD) x x x x x x

Parametric audio coding Audio Coding x Encoder Decoder x >90% Memory/bitrate reduction Drawback: Quality loss

Thesis Motivation Receiver properties and source properties have been extensively considered in audio coding for monaural audio signals. This thesis: Extensively consider receiver and source properties for stereo and multi-channel audio signals.

Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

Background Spatial audio playback Two-channel stereo (headphone and loudspeaker playback) 5.1 Surround LFE Left Center Right Left Right Left Right Rear left Rear right Spatial audio playback: - perception of an auditory spatial image

Background Interaural differences One source, free-field s Source azimuth and ITD/ILD: (Narrowband signal) φ = 90 ILD φ = 0 ITD φ φ = -90 distance difference, head shadowing interaural time and level difference (ITD and ILD)

Background Inter-channel differences: Inter-channel time-difference (ICTD) Inter-channel level-difference (ICLD) Inter-channel coherence (ICC) ICTD, ICLD ICC

Background Mixing stereo signals: a 1 a 1 d 1 d 1 s 1 a 2 s 1 a 2 d 2 d 2 Σ Σ Σ Σ a 3 a 3 d 3 d 3 s 2 a 4 s 2 a 4 d 4 d 4

Background Other auditory spatial image attributes: Auditory event distance Auditory event width Listener envelopment - Power of ear-input signals - Ratio of power of direct to reflected sound - Lateral fraction - Late lateral energy fraction Spatial audio playback: These attributes are controlled by adding reflections to the signal channels.

Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

Binaural Cue Coding (BCC) Conventional multi-channel transmission Proposed multi-channel transmission MONO SIGNAL ANALYSIS + DOWNMIX SYNTHESIS

Binaural Cue Coding (BCC) Estimation of inter-channel cues: ICTD, ICLD, and ICC Frequency ICC ICTD, ICLD Time ICTD/ICLD: 1-3 kb/s (per channel pair) ICC: 1-3 kb/s

Binaural Cue Coding (BCC) ICTD/ICLD/ICC synthesis s FB s d 1 d 2 a 1 a 2 h 1 h 2 x 1 x 2 IFB IFB x 1 x 2 d C a C h C x C IFB x C d i : delays a i : scale factors h i : ICTD ICLD h i : filters f f

Binaural Cue Coding (BCC) ICTD, ICLD, and ICC and auditory spatial image attributes Auditory event localization: ICTD, ICLD Effect of late reflections: (ambience, listener envelopment, auditory event distance) ICC (de-correlation) ICLD (reverberation decay) Effect of early reflections: (auditory event width, coloration) ICC (de-correlation) ICLD (comb filter)

Binaural Cue Coding (BCC) Subjective evaluation: Stereo audio quality GRADING 100 80 60 40 20 0 anchor 1 anchor 2 no ICC ICC reference INDIVIDUAL ITEMS A B C D E F G H I J K L M N AVERAGE Excellent Good Fair Poor Bad MUSHRA (ITU-R BS.1534), headphone listening, 7 experienced subjects BCC: "excellent"

Surround 5.1 Demo: Binaural Cue Coding (BCC) Original Mono Synthesized ANALYSIS + DOWNMIX SYNTHESIS 15 kb/s

Binaural Cue Coding (BCC) Surround 5.1 Demo: 44 kb/s (lowest bitrate ever demonstrated) Memory of 1 CD = 32 CDs worth of music! Not stereo, but surround! Original Synthesized ANALYSIS + DOWNMIX 32 kb/s HE-AAC SYNTHESIS 12 kb/s

Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

Variations of BCC "MP3 Surround": Stereo backwards compatible coding of 5.1 Stereo-downmix: 2-to-5 Synthesis: Lt FB y 1 Upmixing.. s 1 s 4 d 1 d 4 a 1 a 4 A x 1 x 4 IFB IFB L sl + s 3 a 3 x 3 IFB C Lt = L + 0.7C + sl Rt = R + 0.7C + sr Rt FB y 2.. s 2 s 5 d 2 d 5 a 2 a 5 A x 2 x 5 IFB IFB R sr

Variations of BCC Low complexity 5-to-2 BCC: Subjective evaluation: MUSHRA (ITU-R BS.1534) Hidden reference, Anchor, AAC, 5-to-2 BCC + MP3, Dolby Prologic II 256 kb/s 192 kb/s PCM

Variations of BCC MP3 Surround Demo: Lt Rt L LFE C R Ls Rs MP3 Surround

Variations of BCC Variations of BCC Conventional flexible rendering BCC for flexible rendering ANALYSIS + DOWNMIX MONO SIGNAL

Variations of BCC BCC for Flexible Rendering Side information: Time-frequency structure of sum signal FREQUENCY 2 1 2 3 1 TIME SUBBAND INDEX... 1 1 1 2 3 3 1 1 1 2 3 3 2 2 2 2 1 1... TIME INDEX k

Binaural Cue Coding (BCC) BCC for Flexible Rendering Demo: 4 Instruments playing together. ANALYSIS + DOWNMIX MONO SIGNAL 2.5kb/s

Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

Source Localization in Complex Listening Scenarios Model for source localization in complex listening scenarios: - concurrently active sources - direct and reflected sound

Source Localization in Complex Listening Scenarios IC and concurrent sources s 1 s 2 e 1 e 2 1 IC 0.5 0 30 20 10 0 10 20 30 p 1 /p 2 [db] IC can be used as a "single-source-active indicator"

Source Localization in Complex Listening Scenarios IC and reflections s' e 1 s e 2 e 1 e 2 IC 1 n 1 n 2 n IC can be used as a "first-wavefront indicator"

Source Localization in Complex Listening Scenarios Model for source localization Psychological aspect of model Cue selection: Use ITD and ILD when IC > c 0 ITD, ILD, IC Binaural processor Inner ear (cochlea) Middle ear [db] [db] f [Hz] External ear f [Hz] h 1, h 2

Source Localization in Complex Listening Scenarios Two concurrent male speech sources, free-field 500Hz 1 IC c 0.9 c 0 = 0.95 s 1 s 2 0.8 5 40 40 Δ L [db] ILD 0 5 1 ITD τ [ms] 0 1 0 0.2 0.4 0.6 0.8 1 TIME [s]

Source Localization in Complex Listening Scenarios Precedence effect: (Broadband pulses, 5ms lead/lag delay, free-field) 500Hz 1 s 1 s 2 200 ms IC c 0.9 0.8 c 0 = 0.95 5 ms s 1 s 2 Δ L [db] ILD 5 0 5 40 40 ITD τ [ms] 1 0 1 0 0.1 0.2 0.3 0.4 TIME [s]

Source Localization in Complex Listening Scenarios Source localization in reverberant environment: (2 male speech sources, 500Hz: RT = 2s, 2kHz: RT = 1.4s) 5 p(δl) Without cue (B) selection ΔL [db] ILD 0 s 1 s 2 5 p(τ) 30 30 1 0 1 τ [ms] ITD p(δl c>c 0 ) With cue (D) selection 5 ΔL [db] ILD 0 5 p(τ c>c 0 ) 1 0 1 τ [ms] ITD

Parametric Coding of Spatial Audio Contents: - Audio Coding and Thesis Motivation - Background - Binaural Cue Coding (BCC) - Variations of BCC - Source Localization in Complex Listening Scenarios - Conclusions

Parametric Coding of Spatial Audio Conclusions Binaural Cue Coding (BCC): - low bitrate coding of stereo and multi-channel audio signals - low bitrate transmission of independent sources - bridging between different audio formats Proposed source localization model: - speculates about role of IC for "cue selection" - attempts at explaining source localization in complex listening

Parametric Coding of Spatial Audio Future work: Multi-channel BCC parameters: Different more efficient and flexible parametrization. BCC applied to binaural recordings: Further investigate. Psychoacoustic experiments with BCC stimuli. Source localization model: - Adaptation of cue selection threshold - More simulations, psychoacoustic experiments - Can model be related to other attributes of auditory spatial image?

Parametric Coding of Spatial Audio Acknowledgements: Martin Vetterli, Peter Kroon, and all colleagues I collaborated with. Thanks to all of you for coming to this defense!

Audio Coding Hybrid audio coding: Intensity stereo coding (ISC): Frequency Frequency Frequency Azimuth parameters Frequency Spectral bandwidth replication (SBR): Frequency Spectral parameters Frequency

Binaural Cue Coding (BCC) Downmix with equalization x 1 x 2 FB FB x 1 x 2! x e s IFB s x C FB x C Numerical example: x 1 x 1 x 2 x 2 x + x 1 +x 2 2 1 0.5 0 1 0.5 0 1 0.5 0 0 5 10 15 20 TIME [ms] Time [ms] X 1 [db] X 1 X 2 X 2 [db] X 1 [db] +X 2 10 5 0 5 10 10 5 0 5 10 10 5 0 5 10 0 0.5 1 1.5 2 FREQUENCY [khz] Frequency [Hz]

Binaural Cue Coding (BCC) Alternative ICTD/ICLD/ICC synthesis s FB s d 1 a 1 b 1! x 1 IFB x 1 LR LR s 1 s 2 d 2 a 2 b 2! x 2 IFB x 2 a i, b i : scale factors LR: late reverberation filter Coherent/incoherent channel power:

Binaural Cue Coding (BCC) Subjective evaluation: 5-channel audio quality GRADING 5 4 3 2 1 INDIVIDUAL ITEMS a b c d e f g h AVERAGE Imperceptible Perceptible but not annoying Slightly annoying Annoying Very annoying Hidden reference test (ITU-R BS.1116), loudspeaker listening, 9 subjects

Binaural Cue Coding (BCC) Subjective evaluation: ICLD time resolution stable Auditory spatial image stability image stability grading instable 5 grading 4 3 2 1 speech singing percussions applause 32 2048 ms 16 1024 ms 8512 ms 4256 ms window size Overall overall quality speech singing percussions applause 32048 ms 16 1024 ms 8512 ms 4256 ms window size imperceptible perceptible but not annoying slightly annoying annoying very annoying

Variations of BCC C-to-E BCC x 1 x 2 x C ENCODER DOWNMIX C-to-E y 1 y E DECODER ICTD, ICLD, ICC SYNTHESIS x 1 x 2 x C ICTD, ICLD, ICC ESTIMATION SIDE INFORMATION - Potentially better quality than C-to-1 BCC - E-channel backwards compatible coding of C-channel audio

Variations of BCC 6-to-5 BCC: 5.1 backwards compatible coding of 6.1 Downmix: 5-to-6 Synthesis: y 1 y 2 Upmixing delay delay x 1 x 2 y 3 delay x 3 y 4 FB y 4. + s 4 s 6 a 4 a 6 x 4 x 6 IFB IFB x 4 x 6 Dolby Surround EX loudspeaker positioning y 5 FB y 5. s 5 a 5 x 5 IFB x 5

Variations of BCC 7-to-5 BCC: 5.1 backwards compatible coding of 7.1 Downmix: 5-to-7 Synthesis: y 1 y 2 y 3 y 4 FB y 1 Upmixing. s 4 s 6 delay delay delay a 4 d 4 a 6 d 6 A x 4 x 6 IFB IFB x 1 x 2 x 3 x 4 x 6 Lexicon Logic 7 loudspeaker positioning y 5 FB y 2. s 5 s 7 d 5 d 7 a 5 a 7 A x 5 x 7 IFB IFB x 5 x 7

Variations of BCC Hybrid BCC Frequency Frequency Frequency f 0 ICTD, ICLD ICC Frequency 100 QUALITY Hybrid BCC Conventional coder BCC BIT RATE Differences to intensity stereo coding: - consider more cues (not only intensities) - use separate filterbank (better performance)

Variations of BCC Hybrid BCC 100 anchor 1 anchor 2 f 0 =0kHz f 0 =1kHz f 0 =2kHz f 0 =6kHz f 0 =16kHz AVERAGE 80 GRADING 60 40 20 0 x 1 x 2 FB FB BCC ENCODER 59 kb/s 68 kb/s y 1 IFB y 2 IFB 74 kb/s FB FB 84 kb/s BCC DECODER 101 kb/s IFB IFB x 1 x 2 x C FB IFB y C FB IFB x C SIDE INFORMATION

Source Localization in Complex Listening Scenarios Three and five concurrent male speech sources: (-30, 0, 30, -80, 80, free-field) Without cue selection With cue selection ΔL [db] ILD ΔL [db] ILD p(δl) (A) 5 0 5 p(τ) 1 0 1 τ [ms] ITD p(δl c 12 >c 0 ) (C) 5 0 ΔL [db] ILD ILD ΔL [db] p(δl) (B) 5 0 5 p(τ) 1 0 1 τ [ms] ITD p(δl c 12 >c 0 ) (D) 5 0 5 5 p(τ c 12 >c 0 ) 1 0 1 τ [ms] ITD p(τ c 12 >c 0 ) 1 0 1 τ [ms] ITD

Source Localization in Complex Listening Scenarios Cue selection: Three phases of the precedence effect, 500Hz critical band WITHOUT Without CUE cue SELECTION selection 200 ms ITD τ [ms] ILD Δ L [db] ITD τ [ms] 1 0 1 5 0 5 1 0 0 0.2 0.4 0.6 0.8 1 ICI [ms] With cue selection WITH CUE SELECTION 5 10 15 20 ICI [ms] lead/lag delay ILD Δ L [db] p 0, c 0 1 5 0 5 1 0.5 p 0 c 0 0 0 0.2 0.4 0.6 0.8 1 ICI [ms] lead/lag delay [ms] 5 10 15 20 ICI [ms]

Source Localization in Complex Listening Scenarios Amplitude panning, free-field, 500Hz critical band Reference REFERENCE BCC BCC (no without ICC) 1 IC c 0.9 c 0 = 0.995 c 0 = 0.995 c 0 = 0.995 0.8 0.5 ILD τ [ms] 0 0.5 5 Δ L [db] ITD Standard stereo setup Two male speech sources +-8dB amplitude panning. 0 5 0 1 2 3 4 TIME [s] 0 1 2 3 4 TIME [s] 0 1 2 3 4 TIME [s]