ELL 788 Computational Perception & Cognition July November 2015

Similar documents
Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

5: Music Compression. Music Coding. Mark Handley

Mpeg 1 layer 3 (mp3) general overview

Audio-coding standards

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Audio-coding standards

Multimedia Communications. Audio coding

Principles of Audio Coding

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Parametric Coding of High-Quality Audio

Chapter 14 MPEG Audio Compression

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Lecture 16 Perceptual Audio Coding

Audio Coding and MP3

EE482: Digital Signal Processing Applications

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

2.4 Audio Compression

Appendix 4. Audio coding algorithms

Optical Storage Technology. MPEG Data Compression

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

MPEG-4 General Audio Coding

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

Data Compression. Audio compression

MPEG-4 aacplus - Audio coding for today s digital media world

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

CSCD 443/533 Advanced Networks Fall 2017

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

DAB. Digital Audio Broadcasting

Audio and video compression

AUDIOVISUAL COMMUNICATION

Technical PapER. between speech and audio coding. Fraunhofer Institute for Integrated Circuits IIS

Fundamentals of Video Compression. Video Compression

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

MP3. Panayiotis Petropoulos

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

AUDIOVISUAL COMMUNICATION

CISC 7610 Lecture 3 Multimedia data and data formats

Parametric Coding of Spatial Audio

Lossy compression. CSCI 470: Web Science Keith Vertanen

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

Audio coding for digital broadcasting

Ch. 5: Audio Compression Multimedia Systems

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

CHAPTER 6 Audio compression in practice

Audio Coding Standards

Compression; Error detection & correction

The MPEG-4 General Audio Coder

Implementation of FPGA Based MP3 player using Invers Modified Discrete Cosine Transform

I D I A P R E S E A R C H R E P O R T. October submitted for publication

ISO/IEC INTERNATIONAL STANDARD

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

ISO/IEC INTERNATIONAL STANDARD. Information technology MPEG audio technologies Part 3: Unified speech and audio coding

6MPEG-4 audio coding tools

Bluray (

CHAPTER 6. 6 Huffman Coding Based Image Compression Using Complex Wavelet Transform. 6.3 Wavelet Transform based compression technique 106

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

S.K.R Engineering College, Chennai, India. 1 2

Video Compression An Introduction

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

MPEG-4 Advanced Audio Coding

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Filterbanks and transforms

Chapter 4: Audio Coding

DRA AUDIO CODING STANDARD

Networking Applications

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

ITNP80: Multimedia! Sound-II!

MUSIC A Darker Phonetic Audio Coder

MPEG-l.MPEG-2, MPEG-4

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Estimating MP3PRO Encoder Parameters From Decoded Audio

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Design and Implementation of an MPEG-1 Layer III Audio Decoder KRISTER LAGERSTRÖM

Compression; Error detection & correction

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Wavelet filter bank based wide-band audio coder

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Principles of MPEG audio compression

Efficient Signal Adaptive Perceptual Audio Coding

FINE-GRAIN SCALABLE AUDIO CODING BASED ON ENVELOPE RESTORATION AND THE SPIHT ALGORITHM

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

3GPP TS V6.2.0 ( )

Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Transcription:

ELL 788 Computational Perception & Cognition July November 2015 Module 11 Audio Engineering: Perceptual coding

Coding and decoding Signal (analog) Encoder Code (Digital) Code (Digital) Decoder Signal (analog) Transmission Channel / Storage Coding For digitizing (recording / transmission) Signal to be satisfactorily reproduced Compact (Compression) Time-efficient processing Aims at reproducing an audio event NOT reproducing the original signal

Principle perceptual coding Shape permissible quantization noise in frequency domain Source: Dietz, et al.

Approach Use the psycho-accoustic model Cochlear time-frequency resolution Discard frequencies that we cannot hear Auditory masking (Simultaneous and Temporal) Allow noise to the extent that cannot be perceived Stereo-coding issue just two channels do not help Not optimal Localization problem (Random uncorrelated noise) Sophisticated methods vs. Commercial systems Commodity encoder / decoder

Time-frequency based audio coding Source: Brandenburg 2013 Analysis filter-banks: decompose the input signal into a time sequence of vectors with as many elements as spectral components Perceptual model: actual (time- and frequency-dependent) masking threshold is computed using rules derived from psychoacoustics Quantization and coding: goal is to keep the resulting error signal (usually referred to as quantization noise) below the masking threshold Bit stream multiplex: quantized and coded spectral coefficients and some side information + metadata

Sub-band coding Modified DCT (MDCT) used to realize polyphase filter-banks DCT is like DFT; uses only cosine functions MDCT operates on overlapping blocks avoids artefacts arising out of block boundaries High frequency resolution poor time resolution A set of time samples a vector of spectral energy components These spectral components are orthogonal -- Avoids redundancy Retain only that human ear will recognize Filter-banks cover range of human auditory system Apply masking in close frequencies Key for perceptual compression

Quantization and coding The energy co-efficients at different frequencies need to be quantized and coded Quantization error introduces noise Should be below hearing threshold varies with frequency Each quantized level is represented with a symbol Entropy coding: Allocate less bits for frequently used symbols e.g. Huffman coding Arithmetic coding, or range coding Represents entire signal segment with one number More optimal than Entropy coding

Fixed rate and variable rate coding Bit-rate depends on Filter banks Quantization Coding Entropy coding leads to variable rate coding Fixed rate transmission is often desirable VBR coding can still be used if delay is allowed Typically in one-way transmission, e.g. music system, broadcasting, etc. NOT for telephony Buffer control is important

Perceptual audio codecs: MP3 Sampling frequencies: 32, 44.1 and 48 khz Sampling frequencies Frame-length 24 ms @ 48 khz sampling frequency 1152 time samples Auditory masking to remove redundant signal MDCT for coding 2 filter-bank blocks with 576 sub-bands each Quantization based on power law (loudness) Huffman coding: a set of predefined tables Bit-rate: 32 320 kbps, or VBR High encoding complexity but low decoding complexity Bit-reservoir and back frames Bits preserved during coding of silent frames can be used to encode audio frames with more audio content 2-channel stereo

Quality Trade-off between bit-rate and the sound quality Depends on nature of audio being compressed Random sound is difficult to compress Compression artifacts

Advance Audio Coding (AAC) Family Improved audio quality for coding at low bit rates < 48 kb/s per channel Spectral Band Replication (SBR) for bandwidth extension Low frequency components of signal are transmitted High frequency components are reconstructed Results in major savings in bit-rate 5.1 channel stereo

Sprectral Band Reconstruction (SBR) Based on the principles Generally, there is a large dependency between the lower and higher frequency parts of an audio signal. The psychoacoustic part of the human brain tends to analyse higher frequencies with less accuracy Encoder codes only the low and mid frequencies The high frequency part is can be efficiently reconstructed from the low frequency part Some low bit rate SBR control data is needed

Hi-frequency reconstruction using SBR Spectral envelop and some other control information about the high-frequency components are extracted while coding Source: Dietz, et al.

SBR encoder and decoder (Basic block diagrams) Encoder Decoder

Delays Not suitable for 2-way / multi-party applications Telephony, A/V conferencing, Telepresence Requirement: max delay ~20 ms

Codecs with low delay: AAC-LD Uses shorter transform size 480 or 512 subband MDC (vs. 1024 for AAC) Avoid look-ahead Bit reservoir is avoided / restricted to only a small number of bits (around 100) Less compression Can be used over normal telephone lines / ISDN 32 / 64 kbps or higher The achieved coding quality scales up with bitrate The audio quality of AAC LD is slightly better compared to MP3 at the same bitrate.

Spatial audio-coding (Channel-based spatial coding: stereo / 5.1) Encoding each channel separately is suboptimal Also, quality is poor because of uncorrelated noise The multi-channel signals are coded as a downmix All signal components (disregarding spatial aspects) present in all channels parameters describing perceptually relevant differences (in terms of spatial hearing) between the original audio channels Very low bit-rate (order of 2 lower) In the decoder, the channels are synthesized from the downmix Close to the original audio channels

References Brandenburg, et al. Perceptual Coding of High-Quality Digital Audio. Proc. IEEE, Sept 2013 Wang and Vilermo. Modified discrete cosine transform... Dietz, et al. Spectral Band Replication, a novel approach in audio coding Lutzky, et.al. A guideline to audio codec delay