The MPEG-4 General Audio Coder

Similar documents
MPEG-4 General Audio Coding

Multimedia Communications. Audio coding

5: Music Compression. Music Coding. Mark Handley

Parametric Coding of High-Quality Audio

Audio-coding standards

Audio-coding standards

Chapter 4: Audio Coding

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

Audio coding for digital broadcasting

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Mpeg 1 layer 3 (mp3) general overview

Optical Storage Technology. MPEG Data Compression

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Chapter 14 MPEG Audio Compression

Speech and audio coding

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Principles of Audio Coding

2.4 Audio Compression

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

MPEG-4 Advanced Audio Coding

Audio Coding Standards

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Lecture 16 Perceptual Audio Coding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

ISO/IEC INTERNATIONAL STANDARD

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Embedded lossless audio coding using linear prediction and cascade coding

Appendix 4. Audio coding algorithms

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

6MPEG-4 audio coding tools

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Audio Coding and MP3

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

3GPP TS V6.2.0 ( )

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

Ch. 5: Audio Compression Multimedia Systems

Bit or Noise Allocation

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Lecture 7: Audio Compression & Coding

ELL 788 Computational Perception & Cognition July November 2015

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

Structural analysis of low latency audio coding schemes

ISO/IEC INTERNATIONAL STANDARD. Information technology Coding of audio-visual objects Part 3: Audio

Opus, a free, high-quality speech and audio codec

PQMF Filter Bank, MPEG-1 / MPEG-2 BC Audio. Fraunhofer IDMT

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

EE482: Digital Signal Processing Applications

S.K.R Engineering College, Chennai, India. 1 2

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

Data Compression. Audio compression

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Presents 2006 IMTC Forum ITU-T T Workshop

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA

ETSI TS V (201

The Sensitivity Matrix

DRA AUDIO CODING STANDARD

MPEG-4 aacplus - Audio coding for today s digital media world

)3/)%# *4#3#7'. -0%' /CTOBER -ELBOURNE!UDIO 3UBGROUP 2EVISED 2EPORT ON #OMPLEXITY OF -0%'!!# 4OOLS

Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7 10 Munich, Germany

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

CISC 7610 Lecture 3 Multimedia data and data formats

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Convention Paper 8654 Presented at the 132nd Convention 2012 April Budapest, Hungary

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform

_äìé`çêé. Audio Compression Codec Specifications and Requirements. Application Note. Issue 2

MPEG-4 BSAC Technology

Parametric Coding of Spatial Audio

DAB. Digital Audio Broadcasting

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

Information technology Coding of audio-visual objects Part 3: Audio

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

EE Multimedia Signal Processing. Scope & Features. Scope & Features. Multimedia Signal Compression VI (MPEG-4, 7)

Motion Estimation. Original. enhancement layers. Motion Compensation. Baselayer. Scan-Specific Entropy Coding. Prediction Error.

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

Speech-Coding Techniques. Chapter 3

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Digital Audio for Multimedia

Bluray (

Music & Engineering: Digital Encoding and Compression

MPEG SURROUND: THE FORTHCOMING ISO STANDARD FOR SPATIAL AUDIO CODING

The following bit rates are recommended for broadcast contribution employing the most commonly used audio coding schemes:

the Audio Engineering Society. Convention Paper Presented at the 120th Convention 2006 May Paris, France

MP3. Panayiotis Petropoulos

An Experimental High Fidelity Perceptual Audio Coder

The Reference Model Architecture for MPEG Spatial Audio Coding

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

PERCEPTUAL AUDIO CODING THAT SCALES TO LOW BITRATES SRIVATSAN A KANDADAI, B.E., M.S. A dissertation submitted to the Graduate School

Digital Speech Coding

Proceedings of Meetings on Acoustics

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Transcription:

The MPEG-4 General Audio Coder Bernhard Grill Fraunhofer Institute for Integrated Circuits (IIS) grl 6/98 page 1

Outline MPEG-2 Advanced Audio Coding (AAC) MPEG-4 Extensions: Perceptual Noise Substitution (PNS) Long Term Prediction TwinVQ Coding Core The MPEG-4 Scalable General Audio Coder Results of Listening Tests Demonstration of a Real-Time Player grl 6/98 page 2

MPEG-2 AAC Encoder Overview: Bitstream Output Bitstream Multiplexer Input time signal Gain Control Perceptual Model Filter Bank TNS Intensity/ Coupling Prediction M/S Scale Factors Quant. Rate/Distortion Control Noiseless Coding grl 6/98 page 3

Extension: Perceptual Noise Substitution (PNS) Bitstream Output Bitstream Multiplexer Input time signal Gain Control Perceptual Model Filter Bank TNS Intensity/ Coupling PNS Prediction M/S Scale Factors Quant. Rate/Distortion Control Noiseless Coding grl 6/98 page 4

Perceptual Noise Substitution (2) Background: Parametric coding of noise-like signal components has been used widely e.g. in speech coding MPEG-4: Perceptual Noise Substitution (PNS) permits a frequency selective parametric coding of noise-like signal components Noise-like signal components are detected on a scalefactor band basis Corresponding groups of spectral coefficients are excluded from quantization/coding Instead, only a "noise substitution flag" plus total power of the substituted band is transmitted in the bitstream Decoder inserts pseudo random vectors with desired target power as spectral coefficients grl 6/98 page 5

Perceptual Noise Substitution (3) "Perceptual Noise Substitution" (PNS): Perceptual coder + parametric represent. of noise-like signals Audio Input Encoder Perceptual Model Analysis Filterbank Noise Detection Quantization & Coding Noise subst. signaling Substituted signal energies Bitstream Multiplexer Bitstream Out Decoder Audio Output Synthesis Filterbank Inverse Quantization Bitstream Demultiplexer Bitstream In Noise Generator Noise subst. signaling Substituted signal energies grl 6/98 page 6

Extension 2: Long Term Prediction Bitstream Output Bitstream Multiplexer Input time signal Gain Control Perceptual Model Filter Bank TNS Intensity/ Coupling Prediction M/S Scale Factors Quant. Rate/Distortion Control Noiseless Coding grl 6/98 page 7

Long Term Prediction (2) Motivation: The MPEG-4 General Audio Coder Tone-like signals require much higher coding precision than noise-like signals (e.g. 20 db vs. 6 db) Tonal signal components are predictable MPEG-2 AAC: Prediction of each spectral coefficient with backward adaptive predictor High complexity (ca. 50% of decoder computation & RAM) MPEG-4: Long Term Predictor (LTP) as known from speech coding Lower complexity: Saving of approx. 50% in terms of computation and memory over MPEG-2 predictors Comparable performance to MPEG-2 predictors grl 6/98 page 8

Extension 3: Twin-VQ The MPEG-4 General Audio Coder Bitstream Output Bitstream Multiplexer Input time signal Gain Control Perceptual Model Filter Bank TNS Intensity/ Coupling Prediction M/S Scale Factors Quant. Rate/Distortion Control Noiseless Coding grl 6/98 page 9

Transform-Domain Weighted Interleave VQ (2) Background: Audio coding at extremely low bitrates (6-8 kbit/s) CELP speech coders do not perform well for music 0.5 Bits per frequency line at these data rates!! MPEG-4: Transform-Domain Weighted Interleave Vector Quantization (TwinVQ) as alternative coding kernel Vector selection under control of the perceptual model Fully integrated into MPEG-4 AAC coding system: Uses same spectral representation as AAC coder Makes use of other MPEG-4 tools (e.g. LTP, TNS, joint stereo coding) grl 6/98 page 10

Transform-Domain Weighted Interleave VQ (3) Structure: Normalization of spectral coefficients: LPC envelope (overall spectral shape) Periodic component coding (harmonic components) Bark-scale envelope coding (additional flattening) Vector Quantization (VQ) process: Interleaving of spectral coefficients into new sub-vectors Vector quantization (two sets of codebooks, weighted distortion measure allows distortion control by perceptual model) no bit/noise allocation or rate control iteration grl 6/98 page 11

Scalability Definition: Types of Scalability: Capability to decode useful sub-sets of the bitstream SNR / NMR (Noise to Mask Ratio) Scalability: Extension layers improve the SNR/NMR of the coded signal Audio Bandwidth Scalability: Extension layers increase the decodable audio band width Restriction of Generality: Very low bit rate core coder optimized for special signals, e.g speech. Additional layers provide good quality for all types of signals. Implementation Complexity: grl 6/98 page 12

Application examples The MPEG-4 General Audio Coder Network based (packetized) transmission Requires routers which know about the importance of a packet Less important (outer layer) packets may be dropped if the available bandwidth decreases Broadcast The most important (inner layer) packets are transmitted with a better error protection scheme Music data base High quality content is encoded and stored Access to a lower quality version is possible without recoding to allow for pre-listening with a lower quality grl 6/98 page 13

Scalable GA Coder (I) Encoder Block Diagram MDCT Q&C Reconstruct Spectrum - + F S S Q&C Perceptual Model Encoding of the error signal of an AAC or Twin-VQ Quantization and Coding (Q&C) module in a second, or third, or n-th similar quantization module in the frequency domain Solutions using only AAC, or only Twin-VQ modules possible Additionally, Twin-VQ / AAC combinations defined Useful for large enhancement steps of >= 8 kbit/s per step grl 6/98 page 14

Scalable GA Coder (II) The MPEG-4 General Audio Coder Decoder Block Diagram Reconstruct Spectrum Reconstruct Spectrum IMDCT Inv. FSS + Add Twin-VQ Q&C Modules 8 kbit/s fixed step size Vector Quantizer (VQ) modules optional 6 kbit/s in first layer first choice for a 6 or 8 kbit/s base layer for the coding of general audio signals AAC Q&C modules Any step size possible Reasonable step sizes from 8 to >64 kbit/s The same end quality can be achieved as from a single step AAC coder However, a higher bit rate may be required for the same audio quality grl 6/98 page 15

Scalable GA Coder : Combination with CELP Coder (I) CELP CODEC MDCT FSS Quantization & Coding MDCT Encoder Perceptual. model Very low bitrate core coder ( e.g. speech coder) Core coder typically operating at a lower sampling frequency MDCT used for efficient up-sampling grl 6/98 page 16

Scalable GA Coder : Combination with Core Coder (II) CELP DECODER MDCT IFSS IMDCT Requantization Requantization + Decoder grl 6/98 page 17

Scalable Stereo Coding: Stereo / Stereo grl 6/98 page 18

Scalable Stereo Coding: Mono / Stereo grl 6/98 page 19

Scalable Stereo Coding: Mono Core / Mono GA / Stereo GA grl 6/98 page 20

Scalable GA Coder : The MPEG-4 General Audio Coder Typical Configurations Some successfully tested mono/mono combinations: 6 kbit/s CELP + 18 kbit/s AAC 6 kbit/s TwinVQ + 18 kbit/s AAC 8 kbit/s TwinVQ + 8 kbit/s TwinVQ 6 kbit/s CELP + 18 kbit/s + 24 kbit/s AAC Mono/stereo combinations 6 kbit/s mono CELP + 18 kbit/s mono + 24 kbit/s stereo AAC 24 kbit/s mono + 16 kbit/s stereo + 16 kbit/s stereo AAC 24 kbit/s mono + 72 kbit/s stereo AAC Stereo/stereo combinations 2 x 6 kbit/s mono CELP + 36 kbit/s stereo AAC grl 6/98 page 21

Results (I) Mono Configurations 4,5 4 3,5 3 2,5 2 1,5 1 0,5 0 Layer 3 24 kbit/s AAC 24 kbit/s CELP+A AC 24 kbit/s Tw inv Q + AAC 24 kbit/s AAC 18 kbit/s Series1 Series2 3,5 4,1 3,85 3,65 3,27 grl 6/98 page 22

Results (II) 5 4,5 Mono Mono / Stereo Configuration Stereo 4 3,5 3 2,5 2 1,5 1 0,5 0 L3 24 kbit/s AAC 24 kbit/s scal 24 24 kbit/s L3 40 kbit/s AAC 40 kbit/s scal 40 kbit/s L3 56 kbit/s AAC 56 kbit/s scal 56 kbit/s grl 6/98 page 23

Conclusions Highest quality coding with proven AAC technology PNS, LTP and TwinVQ further enhance the very low bitrate performance Mono, Stereo, and Multi-channel Stereo supported Bitrate range 6 - ~300 kbit/s per channel at 8-96 khz SR Additional flexibility with the scalable coding modes Unique capabilities through the availability of the monostereo coding modes Overall complexity within the limits of today s hardware ==> The MPEG-4 GA coder the most versatile audio coding system available today Low-Delay and Error Resilience Additions in MPEG-4 Version 2 grl 6/98 page 24