Parametric Coding of High-Quality Audio

Similar documents
MPEG-4 General Audio Coding

The MPEG-4 General Audio Coder

ELL 788 Computational Perception & Cognition July November 2015

5: Music Compression. Music Coding. Mark Handley

Perceptual Audio Coding Using Adaptive Preand Post-Filters and Lossless Compression

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Mpeg 1 layer 3 (mp3) general overview

Multimedia Communications. Audio coding

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Principles of Audio Coding

Audio-coding standards

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Audio-coding standards

Lecture 16 Perceptual Audio Coding

Chapter 14 MPEG Audio Compression

Optical Storage Technology. MPEG Data Compression

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

MPEG-4 aacplus - Audio coding for today s digital media world

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Audio Coding and MP3

Audio Coding Standards

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Chapter 4: Audio Coding

2.4 Audio Compression

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Appendix 4. Audio coding algorithms

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

S.K.R Engineering College, Chennai, India. 1 2

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

EE482: Digital Signal Processing Applications

Audio coding for digital broadcasting

Audio and video compression

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Embedded lossless audio coding using linear prediction and cascade coding

Speech and audio coding

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

CISC 7610 Lecture 3 Multimedia data and data formats

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Data Compression. Audio compression

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Parametric Coding of Spatial Audio

Ch. 5: Audio Compression Multimedia Systems

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Networking Applications

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Lecture 7: Audio Compression & Coding

Estimating MP3PRO Encoder Parameters From Decoded Audio

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

DAB. Digital Audio Broadcasting

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Principles of MPEG audio compression

MPEG-4 Advanced Audio Coding

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

ENTROPY CODING OF QUANTIZED SPECTRAL COMPONENTS IN FDLP AUDIO CODEC

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

MP3. Panayiotis Petropoulos

Proceedings of Meetings on Acoustics

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Aalborg Universitet. Published in: Proc. of Asilomar Conference on Signals, Systems, and Computers. Publication date: 2008

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

3GPP TS V6.2.0 ( )

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

AUDIOVISUAL COMMUNICATION

Application of wavelet filtering to image compression

ijdsp Interactive Illustrations of Speech/Audio Processing Concepts

ISO/IEC INTERNATIONAL STANDARD

Efficient Signal Adaptive Perceptual Audio Coding

Convention Paper Presented at the 118th Convention 2005 May Barcelona, Spain

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

Compression; Error detection & correction

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

Introduction to LAN/WAN. Application Layer 4

Enhanced MPEG-4 Low Delay AAC - Low Bitrate High Quality Communication

Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

MPEG-1 Bitstreams Processing for Audio Content Analysis

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Lossy compression. CSCI 470: Web Science Keith Vertanen

Compression; Error detection & correction

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION

DRA AUDIO CODING STANDARD

CSCD 443/533 Advanced Networks Fall 2017

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

ETSI TS V (201

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Lecture Information Multimedia Video Coding & Architectures

Perceptually motivated Sub-band Decomposition for FDLP Audio Coding

PQMF Filter Bank, MPEG-1 / MPEG-2 BC Audio. Fraunhofer IDMT

Transcription:

Parametric Coding of High-Quality Audio Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau Technical University Ilmenau, Germany 1

Waveform vs Parametric Waveform Filter-bank approach Mainly exploits limitations of human auditory system Mature technology Parametric Source model approach Exploits both source model as well as limitations of human auditory system most audio coders use a combination of both 2

1. Perceptual Noise Substitution (PNS) 3

Perceptual Noise Substitution (1) Background: Parametric coding of signals gives a very compact signal representation Parametric coding of noise-like signal components has been used widely e.g. in speech coding Can similar techniques be used in perceptual audio coding? MPEG-4: Perceptual Noise Substitution (PNS) permits a frequency selective parametric coding of noise-like signal components 4

Perceptual Noise Substitution (2) 5

Perceptual Noise Substitution (3) Principle: Noise-like signal components are detected on a scalefactor band basis Corresponding groups of spectral coefficients are excluded from quantization/coding Instead, only a "noise substitution flag" plus total power of the substituted band is transmitted in the bitstream Decoder inserts pseudo random vectors with desired target power as spectral coefficients -> Highly compact representation for noise-like spectral components 6

2. Spectral Band Replication Bandwidth Extension (1) Background Concept Audio coding at very low bitrates artifacts To avoid excessive artifacts, bandwidth is usually sacrificed at low bitrates (<40kbit/s/ch) Signal sounds unattractive (muffled) re-generate HF signal content at decoder end from LF part (and some helper information) 7

Bandwidth Extension (2) Idea of Spectral Band Replication (SBR) Compatibility Re-generate HF-part of signal spectrum by means of transposition of transmitted spectrum ensures preservation of harmonic structure Subsequent shaping of signal towards original time/spectral envelope by an adaptive filter (envelope adjuster) Some more provisions for handling special situations SBR bitstream elements (ca. 2 kbit/s/ch) can be stored in AAC bitstream in a compatible way Standard AAC decoders decode AAC part only MPEG-4 Audio Extension #1 SBR is used e.g. in High-Efficiency AAC (aacplus) and MP3Pro 8

Bandwidth Extension (3) Spectral Band Replication Scheme (Principle) AAC Core Decoder Delay AAC-SBR Bitstream Bitstream Parser Bitstream Demultiplexer Huffman Decoding & Dequantization Analysis QMF Bank Transposer Envelope Adjuster Synthesis QMF Bank Output PCM Samples 9

Bandwidth Extension (4) Dietz e.a, Spectral Band Replication, a novel aproach in audio coding 10

Bandwidth Extension (5) Dietz e.a, Spectral Band Replication, a novel aproach in audio coding 11

Bandwidth Extension (6) Dietz e.a, Spectral Band Replication, a novel aproach in audio coding 12

Bandwidth Extension (7) Dietz e.a, Spectral Band Replication, a novel aproach in audio coding 13

Bandwidth Extension (8) Dietz e.a, Spectral Band Replication, a novel aproach in audio coding 14

Bandwidth Extension (9) Dietz e.a, Spectral Band Replication, a novel aproach in audio coding 15

Audio Coding for Communication Applications Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau University of Technology 16

Introduction (1) Audio coding/compression is used in: Multimedia applications Storage Digital broadcast: Digital radio Examples: Sirius, XM-Radio, ibiquity, DAB MP3 (MPEG-1 Layer 3) MPEG-2/4 AAC AC-3 PAC 17

Introduction (2) New networks: Higher rate wireless services (for instance with space-time block coding) Quality of service (->low delay) In-home or local networks 18

New Communication Applications High quality teleconferencing Reporting for radio or TV stations, using wireless networks Delay wise most critical: Virtual presence (for instance musicians playing together over long distance) Concerts with wireless microphones and speakers (data compression for transmitted power and bandwidth) Desired delay < 10 ms (Ultra Low Delay) 19

Goals Audio coding for communications: Ultra low encoding/decoding delay (<10ms) High quality for music and speech Bit-rates about 50-100 kb/s 20

Problems Conventional audio coders: Good audio quality But very high encoding/decoding delay (>100 ms) Speech coders: Low encoding/decoding delay (order of 10 50ms), suitable for communications applications But not high quality audio, don t perform well on non-speech signals like music or room noise 21

Conventional Audio Coders Audio Input Analysis Filter Bank Q Q Coder Coder Synthesis Filter Bank Decoded Signal Psychoacoustic model 22

Compression Irrelevance (Psycho Acoustics) What the receiver (ear) cannot detect Sound below the threshold of hearing In general sound below the psychoacoustic masking threshold Redundancy The predictability or statistical dependencies in a signal 23

Basics of Psychoacoustics (Irrelevance) Temporal masking threshold Spectral masking threshold 24

Major Sources of Delay System delay of analysis and synthesis filter bank Buffering for bit-rate smoothing Previous Approaches: Low delay filter banks MPEG-4 low delay coder: reduced number of subbands 25

Limitations of previous Approaches High coding gain requires high numbers of subbands Low delay filter banks: delay lower bounded by downsampling factor (= number of bands) MPEG 4 low delay coder: reduced coding efficiency (higher bit-rate), not very low delay (ca. 30 ms at 32 khz sampling). Reason: subband coding leads to trade-off between coding efficiency and delay. 26

New Approach Subband coding has same asymptotic gain as predictive coding (Jayant, Noll, 1984; Nitadori, 1970). But predictive coding has lower delay Replace filter bank by predictor 27

How to apply predictive Coding Problem: output of psycho-acoustic model is a time/frequency description. Approach: Separate stages for application of irrelevance (psycho-acoustics) and redundancy reduction Apply psycho-acoustic quantization noise shaping with linear filters (irrelevance red.) Use lossless predictive coding after quantization (redundancy reduction) 28

Pre- and Post-Filter Approach Encoder full band connection Decoder Audio Input Pre-Filter Q Lossless Coding Lossless Decoding Post-filter Audio Output Psychoacoustic model Irrelevance Reduction Redundancy Reduction 29

Function Pre- and post-filters form the quantization noise over frequency and time Post-filter is inverse of pre-filter Pre-filter normalizes the signal to its psychoacoustic masking threshold Simple uniform constant step size quantizer is used (rounding operation) Added benefit: more precise control over quantization noise shape than conventional approach 30

Pre-Filter Structure Short delay for synchronization with psycho-acoustic model (our implementation: 128 samples + 128 samples blocking delay) Audio Input Delay Gain x + _ Q prefilterd quantized signal FIR Analysis Filter Bank Psychoacoustic model LPC LSF Gain 31

Properties Disadvantage: Computationally complex structure But advantage: No inherent delay, suitable for communications applications 32

Example Frequency Response Spectrum of the signal (i) compared to masking threshold (ii) and freq. resp. of post-filter (iii) 33

Redundancy Reduction Irrelevance removed after pre-filter and quantizer Lossless compression needed for perceptual lossless coding Audio Irrelevance Reduction Redundancy Reduction Coded Audio Pre-filter and quantization Lossless coding or compression 34

Lossless Coding Unit Goals: high compression ratios low delay Previous approaches: General purpose or text: Lempel-Ziv, PPMZ Audio: Shorten (Softsound, GB), LPAC, LTAC (TU-Berlin, Germany), WaveZip (Soundspace, CA), MLP (Meridian, GB, for DVD) 35

Lossless Coding Previous approaches: Made for file compression Based on forward (block based) prediction, or transforms Problems: High encoding/decoding delay Compression ratio can be improved New approach: Backward adaptation (based on past) for low delay Cascading predictors for improved compression 36

Lossless Predictive Coding Encoder Signal (quant.) 1 z Pred. h Q (residual) + - e(n) Coding Contains e.g. Huffman Coder Coded Signal For backward adaptation: Predictor coefficient vector h updated with LMS algorithm. 37

Lossless Predictive Coding Decoder Decoding e(n) + Decoded Signal Q Pred. h 1 z Observe: Quantization / rounding of predicted value does not affect lossless property 38

Low Coding Delay Backward adaptation with LMS algorithm Define a vector of input samples: x T The predicted value T P( n) round ( x ( n 1) h( n ( n) x( n L 1),..., x( n) )) Update of the predictor coefficient vector h with normalized LMS (Widrow, Hoff, 1960). Prediction error: e( n) x( n) P( n) e( n) h( n 1) h( n) x( n) 2 1 x( n) 39

Increased Compression Ratio Cascading LMS predictors, using the final output, has advantages: Increased adaptation speed Improved prediction accuracy Better numerical stability (see Prandoni, Vetterli, 1998, for a special case) 40

Cascading and Combining Predictors For us important: Availability of predictors of different orders as additional outputs. Reasons: Very non-stationary signals (attacks) require fast adaptation/ short filters Stationary signals require long filters Approach: Combine predictors of different orders adaptively (analog to block switching) 41

Combination of Predictors (1) Assume predictors P 1, P 2, P 3 with different orders for different signal statistics. How can they be combined? Use predictive minimum description length principle for the optimal combination of predictors P 3 i 1 w i P i w i : probability of P i being correct on past signal. 42

Combination of Predictors (2) Assume that the prediction error has a Laplacian distribution ( error get: p(x)=e c x e(n)= x (n) P i (n) n ), with the prediction and the weights w i we w i c n e x( n) P i ( n) Weights reward predictors with good past performance The weights are normalised such that they add up to 1. 43

Weighted Cascaded LMS (WCLMS) Prediction x(n) 1 z Pred. e ( n 1 ) + e ( n) 2 + - - x ˆ( n) eˆ1 ( n) 1 z Pred. 1 z Pred. e ( n) ˆ2 w1 ( n 1) P1 ( x( n 1)) + w2 ( n 1) P2 ( x( n 1)) + P3 ( x( n 1)) w3 ( n 1) + Q [ P( x( n 1))] The w s are adapted based on previous prediction errors, 0<w<1, to adapt to signal statistics (orders, number of coefficients: 120, 80, 40) 44

Entropy Coding of Residuals Take known algorithms, e.g.: Adaptive Golomb-Rice Codes Adaptive Arithmetic Coding Block based Huffman using pre-calculated code books No additional delay introduced, because ULD implementation already is block-based (128 samples) Inherently variable bit rate 45

Comparison of different lossless compression schemes Signals are at 32 khz sampling rate, bit-rate in bit/sample. Application after pre-filter and quantization. Signal Cascaded LMS Shorten Wavezip LPAC Pop 1.94 2.52 3.22 2.23 Jazz 1.99 2.67 3.35 2.48 mixed 2.16 2.58 3.19 2.35 Speech 1.96 2.48 3.09 2.12 46

Controlling the Bit-Rate: constant bit rate mode A factor of less than 1 leads to quantization noise above threshold of audibility, but a reduced bit-rate By adapting the attenuation factor and iterating the lossless coder a target bit rate can be approximated Audio Pre-filter X Q Coding Encoded Audio Attenuation 47

Alternative Predictor Structure Closed-Loop Predictor instead of Open-Loop Predictor Advantage: only one quantizer Disadvantage: Quantization and Prediction not separated anymore (Irrelevance and Redundancy Reduction) e(n) [e(n)] Pred 48

Conclusions Predictive coding can be used to obtain Ultra Low Delay audio coders Obtained delay: 6 ms (<10 ms) Subjective audio quality comparable to conventional high delay coder at same bitrate Price: higher complexity Can also be used to obtain high audio quality (unlike speech coders) 49