SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Similar documents
Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

The MPEG-4 General Audio Coder

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

Audio coding for digital broadcasting

Parametric Coding of High-Quality Audio

MPEG-4 General Audio Coding

Convention Paper 8654 Presented at the 132nd Convention 2012 April Budapest, Hungary

Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7 10 Munich, Germany

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

Optical Storage Technology. MPEG Data Compression

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

5: Music Compression. Music Coding. Mark Handley

Parametric Coding of Spatial Audio

3GPP TS V6.2.0 ( )

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Lecture 16 Perceptual Audio Coding

Audio Coding Standards

ELL 788 Computational Perception & Cognition July November 2015

MPEG-4 aacplus - Audio coding for today s digital media world

Audio-coding standards

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Audio-coding standards

6MPEG-4 audio coding tools

2.4 Audio Compression

Data Compression. Audio compression

Speech and audio coding

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Digital Speech Coding

MPEG SURROUND: THE FORTHCOMING ISO STANDARD FOR SPATIAL AUDIO CODING

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Appendix 4. Audio coding algorithms

EE482: Digital Signal Processing Applications

Mpeg 1 layer 3 (mp3) general overview

Speech-Coding Techniques. Chapter 3

Chapter 4: Audio Coding

Multimedia Communications. Audio coding

Principles of Audio Coding

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

SAOC FOR GAMING THE UPCOMING MPEG STANDARD ON PARAMETRIC OBJECT BASED AUDIO CODING

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

ETSI TS V (201

System Identification Related Problems at SMN

Spatial Audio Object Coding (SAOC) The Upcoming MPEG Standard on Parametric Object Based Audio Coding

Lecture 7: Audio Compression & Coding

Efficiënte audiocompressie gebaseerd op de perceptieve codering van ruimtelijk geluid

System Identification Related Problems at

ISO/IEC Information technology High efficiency coding and media delivery in heterogeneous environments. Part 3: 3D audio

Implementation of G.729E Speech Coding Algorithm based on TMS320VC5416 YANG Xiaojin 1, a, PAN Jinjin 2,b

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Audio Coding and MP3

System Identification Related Problems at SMN

Chapter 14 MPEG Audio Compression

Convention Paper Presented at the 140 th Convention 2016 June 4 7, Paris, France

Convention Paper Presented at the 124th Convention 2008 May Amsterdam, The Netherlands

The Reference Model Architecture for MPEG Spatial Audio Coding

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Enhanced MPEG-4 Low Delay AAC - Low Bitrate High Quality Communication

Proceedings of Meetings on Acoustics

MPEG Spatial Audio Coding Multichannel Audio for Broadcasting

Presents 2006 IMTC Forum ITU-T T Workshop

Bluray (

ITNP80: Multimedia! Sound-II!

Opus, a free, high-quality speech and audio codec

GSM Network and Services

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

25*$1,6$7,21,17(51$7,21$/('(1250$/,6$7,21,62,(&-7&6&:* &2',1*2)029,1*3,&785(6$1'$8',2 &DOOIRU3URSRVDOVIRU1HZ7RROVIRU$XGLR&RGLQJ

TECHNICAL PAPER. Fraunhofer Institute for Integrated Circuits IIS

the Audio Engineering Society. Convention Paper Presented at the 120th Convention 2006 May Paris, France

MPEG-4 Advanced Audio Coding

DRA AUDIO CODING STANDARD

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Transporting audio-video. over the Internet

ISO/IEC INTERNATIONAL STANDARD

Chapter 5.5 Audio Programming

ROBUST SPEECH CODING WITH EVS Anssi Rämö, Adriana Vasilache and Henri Toukomaa Nokia Techonologies, Tampere, Finland

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 3: Audio

Structural analysis of low latency audio coding schemes

CISC 7610 Lecture 3 Multimedia data and data formats

application Bulletin Fraunhofer Institute for Integrated Circuits IIS

Ch. 5: Audio Compression Multimedia Systems

Advanced Video Coding: The new H.264 video compression standard

MPEG-1 Bitstreams Processing for Audio Content Analysis

Convention Paper 7215

Distributed Signal Processing for Binaural Hearing Aids

S.K.R Engineering College, Chennai, India. 1 2

ETSI TR V ( )

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends

Estimating MP3PRO Encoder Parameters From Decoded Audio

Transcription:

SAOC and USAC Spatial Audio Object Coding / Unified Speech and Audio Coding Lecture Audio Coding WS 2013/14 Dr.-Ing. Andreas Franck Fraunhofer Institute for Digital Media Technology IDMT, Germany

SAOC Spatial Audio Object Coding Outline Introduction From Spatial Audio Coding to SAOC Audio objects SAOC Decoding Applications Performance Evaluation Conclusion 2

SAOC - Introduction Perceptual audio coding for multichannel signals is widely used Spatial Audio Coding, for instance MPEG Surround Existing Spatial Audio Coders are channel-based Designed for a specific reproduction setup Spatial Audio Object Coding Continuation of the Spatial Audio Coding paradigm Transmit audio objects instead of channel signals ISO/IEC 23003-2:2010 Standard 3

SAOC From Spatial Audio Coding to SAOC Spatial Audio Coding (e.g., MPEG Surround) Channel-oriented Downmix (mono or stereo) Transmit downmix using standard audio codec (AAC) Additional parameter data (parametric coding) Output channels for specific reproduction setup 5.1, 7.1 x 1 x 2 x N SAC Encoder Do wnmix Parameter estimation & coding Audio codec Parameter bitstream SAC Decoder Rendering Parameter decoding ŷ 1 ŷ 2 ŷ M Transmitting side Receiving side [1] Herre et.al 2012: MPEG Spatial Audio Object Coding, J. Audio Eng. Soc. 60:9, 2012 4

SAOC Audio Objects Audio objects instead of channels SAOC encoder: Stereo or mono downmix plus SAOC parameters SAOC decoder: Use SAOC parameters to transform downmix into audio objects Rendering to loudspeaker configuration (Rendering matrix ) Obj. #1 Obj. #2 Obj. #3 Obj. #4... SAOC Encoder Downmix signal(s) SAOC Decoder obj. #1 obj. #2 obj. #3 obj. #4... Renderer Chan. #1... Chan. #2 SAOC parameters Interaction 5

SAOC Audio Objects Advantages of object-based processing Coding efficiency: SAOC parameters only a few kbit/s per audio object Coding and transmission independent of reproduction setup Rendering on arbitrary loudspeaker setups 5.1, 7.1, 10.2, 22.2, Binaural reproduction, Wave field synthesis, Rendering controllable (real-time user interaction) Control over individual audio objects Change gain, equalization, effects, 6

SAOC Decoding Modes Decoder Processing Mode Rendering integrated into decoding (efficient) SAOC decoder For mono and stereo output, incl. binaural reproduction Downmix Downmix processor Outpu t Rendering matrix: realtime control of rendering HRTF parameters for binaural Open SAOC interface SAOC bitstream SAOC parameter processor Enables use of individual HRTFs Efficient parametric representation Rendering matrix HRTF parameters 7

SAOC Decoding Modes Transcoder Processing Mode For multichannel output (MPEG Surround - MPS) SAOC encoder works as transcoder Transcoding of SAOC parameters to MPS bitstream Adjustment of downmix panning (only for stereo downmix) Highly Efficient Operates in transform domain Avoid unnecessary (de)quantization and decoding Downmix SAOC bitstream SAOC transcoder Downmix preprocessor SAOC parameter processor Rendering matrix Preprocessed downmix MPS bitstream MPS decoder Output 8

SAOC Bitstream Contains parametric description of audio objects: SAOC parameters Typical: 2-3 kbit/s per audio object (plus 3 kbit/s per audio scene) SAOC bitstream embedded in ancillary data of core audio coder Enables backward compatibility Parameters transmitted in flexible time/frequency grid Adaptation to bitrate demands and/or signal characteristics Same time/frequency grid as in MPEG Surround Lossless, efficient transcoding 9

SAOC Enhanced Audio Objects Allow arbitrary attenuation or amplification of objects Karaoke Solo voices, SAOC bitstream contains residual signal Reconstruction from downmix and residual Efficient transmission of residual signal (AAC) 10

SAOC - Applications Interactive Remix / Karaoke Interactive remixes Equalization, room simulation, (for individual objects) For channel-based formats, only applicable to whole scene Modification of specific audio objects (instruments, voices, ) Karaoke, vocal solo Suppress main voice or background music Advantageous: Enhanced Audio Objects Future extensions of digital broadcasting Clean-audio dialogs Additional objects for interactivity 11

SAOC - Applications Teleconferencing Today: Mainly monophonic reproduction Suboptimal for multi-user scenarios Key benefits of SAOC Adjustment of individual speaker signals Spatial representation of audio scene Match between visual and audio scene Improved intelligibility and listening comfort Transmission efficiency Backward compatibility 12

SAOC - Applications Rich Media / Gaming Applications of Rich Media Interactive audio-visual interfaces Games Platforms Mobile Flash- or Java-Based Limited audio rendering capabilities (audio scene size) Key advantages Low complexity (number of output channels instead of scene size) Interactivity (adjust level of objects and background music) Efficient transmission, backward compatibility 13

SAOC Performance Evaluation Listening Test Remix scenario Part of MPEG verification tests (5 sites, 125 participants) MUSHRA test (ITU BS.1534-1) Simulate adjustments to a mix of audio objects Core coder: High Efficiency AAC (HE-AAC) HR: Hidden reference LP: 3.5 khz Lowpass HE-AAC REF: Individual objects, high bitrate HE-AAC MO: Individual objects, same bitrate HE-AAC SAOC: standard SAOC HE-AAC SAOC LP: low power 14

SAOC Performance Evaluation Listening Test Teleconferencing Part of MPEG verification tests Teleconferencing application: Simulate adjustments of a participant Core coder: MPEG-4 Enhanced Low Delay AAC (AAC-ELD) HR: Hidden reference LP: 3.5 khz Lowpass AAC-ELD MO: Individual objects AAC-ELD SAOC-LD: low delay 15

SAOC - Summary Highly efficient transport/storage of audio objects and flexible/interactive audio scene rendering Backwards compatible downmix for reproduction on legacy devices Flexible rendering configurations (loudspeaker setups) ISO/MPEG standard Very interesting applications, e.g.: Remixing / Karaoke Gaming / Rich media Teleconferencing Interactivity for broadcast applications 16

USAC Unified Speech and Audio Coding Outline Introduction Differences between speech and audio coding Codec structure Improvements to coding tools Performance Evaluation Applications Summary 17

USAC - Introduction Status quo: General audio coding and speech coding are largely separate worlds Problem: Increased demand for audio coders that handle all types of inputs Broadcasting Audio books, multimedia Mobile devices for all types of content (often low bandwidths) Objective (initiated by MPEG) Universal codec that handles all types of content at least as well as the best current speech or audio codec 18

USAC Differences Between Speech and Audio Coding Audio Coding Information sink model Characteristics of human hearing Typically transform- or subbandbased approaches Divide signal in multiple bands and apply psychoacoustics Speech Coding Information source model Characteristics of vocal tract Typically based on prediction coding Predicted filter for the vocal tract and an excitation signal Not well-suited for speech (at bit rates typically used by speech coders ) Poor quality for music 19

USAC Hybrid coding approach Combine state-of the art audio and speech coders HE-AACv2 AMR-WB+ Switch between coders based on content Signal classification Scalefactors Arithm. Dec. Inv. Quant. Scaling Bitstream De-Multiplex LPC Dec. LPC to Freq. Dom. ACELP LPC Synth Filter Share common functionality Take care of artifacts due to switches Figure: Neuendorf et.al: MPEG Unified Speech and Audio Coding, J. Audio Eng. Soc., 61:12, Dec. 2013 AAC MDCT (time-warped) Bandwidth Extension AMR-WB+ Windowing, Overlap-Add Uncompressed PCM Audio FAC Stereo Processing 20

USAC AMR-WB+ (1) Adaptive Multi-Rate Wideband State-of-the-art speech coder Based on ACELP (Algebraic codeexcited linear prediction) CELP: Encode signal by LPC coefficients LTP coefficients: long term prediction (delay and gain) Innovation codebook : excitation signal, sparse pulses ACELP: Algebraic representation of innovation codebook CELP encoder LPC Figure: A. Valin: Speex: A Free Coder for Free Speech, 2006 21

USAC AMR-WB+ (2) Extended Adaptive Multi-Rate Wideband The + in AMR-WB+ Additional transform-domain coder for music signals Parametric high frequency LPC extension Parametric stereo extensions But: For music, still inferior to CELP encoder good audio coders Figure: A. Valin: Speex: A Free Coder for Free Speech, 2006 22

USAC Coder/Encoder Structure General structure of modern audio codecs Encoder Spatial coding Parametric bandwidth extension Core coding Decoder In opposite order Uncompressed PCM Audio Spatial / Stereo Encoder and Do wnmix LP low freq. L HP high freq. R Bandwidt h Extension Encoder Core Decoder Bitstream De -Multi plex low freq. Combine Bandwidth Extension Decoder high freq. Core Encoder Spatial / Ste reo D ecoding and Upmix L R Figure: Neuendorf et.al: MPEG Unified Speech and Audio Coding, J. Audio Eng. Soc., 61:12, Dec. 2013 Bitstream Multiplex Encoder Uncompressed PCM Audio Decoder 23

USAC Decoder Structure Here: Focus on decoder Only decoder is standardized Follows general codec structure Left part: audio coder (HE-AACv2) Right part: speech coder (AMR-WB+) Some tools shared (LPC decoding) Challenge: Switching between modes Figure: Neuendorf et.al: MPEG Unified Speech and Audio Coding, J. Audio Eng. Soc., 61:12, Dec. 2013 Scalefactors Arithm. Dec. Inv. Quant. Scaling MDCT (time-warped) Bandwidth Extension Bitstream De-Multiplex LPC Dec. LPC to Freq. Dom. Windowing, Overlap-Add Uncompressed PCM Audio ACELP LPC Synth Filter FAC Stereo Processing 24

USAC Transition Handling Encoder switches between two modes Signal classifier (speech or music) Transition handling without audible errors or loss of coding efficiency HE-AAC: MDCT: Transform (frequency) domain, overlapping windows, time-domain alias cancellation (TDAC) AMR-WB+ Time-domain, no overlap Solution: Forward Aliasing Cancellation (FAC) In case of transitions, transmit the alias cancellation information for TDAC ed) ndowing, Overlap-Add compressed PCM Audio FAC Stereo Processing 25

USAC Improvements to Coding Tools USAC is not just a combination of HE-AAC and AMR-WB+ Multiple improvements to both parts Context-adaptive arithmetic coder for transform coding Additional quantization modes Alternate LPC-based noise shaping Additional MDCT window sizes Time-Warped MDCT Enhanced Spectral Bandwidth Replication Unified Stereo Coding 26

USAC Time-Warped MDCT (1) Transform coding good for stationary tonal signals Sparse spectrum, few nonzero spectral coefficients to code High coding gain Problematic: Pitch changes within signal Typical signal: Voiced speech Smearing of energy over many spectral coefficients Decreased coding efficiency 27

USAC Time-Warped MDCT (2) Solution: Time-Warped MDCT Reduce variations of fundamental frequency Basic algorithm (encoder side) Apply a time-variant resampling prior to the MDCT Adjust MDCT windows to preserve TDAC Transmit resampling ratio as side information Original spectrum Figure: Edler et.al: A time-warped MDCT approach to speech transform coding, AES 126th Convention, May 2009 Time-warped spectrum 28

USAC Enhanced Spectral Band Replication (1) State of the Art Spectral Band Replication Basis: SBR of HE-AAC Operates in QMF domain Copy low-frequency spectrum to higher frequencies Adjust HF copies based on parameters (side info) Tonality Envelope Additional noise, sinusoids Level [db] Level [db] 50 0-50 50 0 A AAC lo wband signal 0 5 10 15 20 Freq uency [khz] HF Generated high band signal B 1 2 3 Freq uencyrangeof first HF patch Freq uencyrangeof secondhf patch Frequency ranges of inverse filtering bands Figure: Neuendorf et.al: MPEG Unified Speech and Audio Coding, J. Audio Eng. Soc., 61:12, Dec. 2013-50 0 5 10 15 20 Freq uency [khz] 29

USAC Enhanced Spectral Band Replication (2) Alternative Sampling Rate Ratios HE-AAC SBR performs a 2:1 upsampling in QMF domain Bandwidth doubled USAC: Additional ratios bitstream AAC Decoder audio signal f s SBR Decoder QMF Analysis (32 bands) SBR processing QMF Synthesis (64 bands) audio output 2 f s 4:1 (16 QMF analysis bands) Four times the core bandwidth Good for very low bit rates 8:3 (24 QMF analysis bands) Halfway between 2:1 and 4:1 Best tradeoff for medium bit rates ( ~ 24 kbit/s) 2:1 bitstream 4:1 AAC Decoder SBR data audio signal f s SBR data SBR Decoder QMF Analysis (16 bands) SBR processing QMF Synthesis (64 bands) audio output 42 f ss 30

USAC Enhanced Spectral Band Replication (3) Harmonic Transposition HE-AAC SBR: Spectral copies Frequency shifts Bad match for harmonics of tonal signals (integer multiples) USAC esbr: Harmonic transposer Map sinusoid with frequency ω to sinusoid with frequency, integer Supported orders 2,3,4 Frequency shifts for higher orders Other improvements in esbr (not covered here) Predictive vector coding for SBR spectral envelopes 31

Stereo Coding in USAC Unified Stereo Coding (1) Discrete Stereo Coding Strives to preserve waveforms Joint coding techniques (e.g., M/S) Used with higher bit rates Parametric Stereo Coding Mono downmix and side info (parameters) Typically used with low bit rates Unified Stereo Coding Extends and combines discrete and parametric stereo coding Additional parameter: IPD (inter-channel phase difference) Transmit parameters and residual signal Use parameters to minimize residual 32

Stereo Coding in USAC Unified Stereo Coding (2) Prediction factor α (complex-valued) Gain normalization and α determined from parametric stereo parameters Figure: Neuendorf et.al: MPEG Unified Speech and Audio Coding, J. Audio Eng. Soc., 61:12, Dec. 2013 33

USAC Performance Evaluation Verification Test Results Question: Whether USAC performs as least as good as the better of the best speech or audio coder Part of verification tests for approval by ISO/IEC 3 tests, 13 test sites, 60-25 participants MUSHRA methodology Test subjects USAC HE-AACv2 AMR-WB+ Virtual coder (VC): The better of HE-AACv2 and AMR-WB+ Determined separately for each test item and bit rate 34

USAC Performance Evaluation Listening Test Mono, Low Bit Rates Speech Mixed 100 Excellent Music All Good 80 60 Fair 40 Poor 20 Bad 0 8 12 16 24 8 12 16 24 8 12 16 24 bitrate [kbps] 8 12 16 24 USAC VC HE-AAC AMR 35

USAC Performance Evaluation Listening Test Stereo, Low Bit Rates Speech Mixed 100 Excellent Music All 80 Good 60 Fair 40 Poor 20 Bad 0 16 20 24 16 20 24 16 20 24 bitrate [kbps] 16 20 24 USAC VC HE-AAC AMR 36

USAC Performance Evaluation Listening Test Stereo, High Bit Rates Speech Mixed 100 Excellent Music All 80 Good 60 Fair 40 Poor 20 Bad 0 32 48 64 96 32 48 64 96 32 48 64 96 bitrate [kbps] 32 48 64 96 USAC VC HE-AAC AMR 37

USAC - Applications Multimedia streaming Mobile devices Scalability is key feature Significant quality improvements for low bit rates Broadcasting applications Coding efficiency saves bandwidth Audio books Mainly speech Guarantees good quality for music and effects 38

USAC Unified Speech and Audio Coding Summary First audio codec that successfully merges general audio and speech coding For music signals, improved quality especially at low to very low bit rates Moderately increased computational complexity Standardized as ISO/IEC 23003-3:2012 MPEG-D Unified Speech and Audio Coding Applicable as general-purpose codec at all bit rates 39