Using Noise Substitution for Backwards-Compatible Audio Codec Improvement

Similar documents
ROW.mp3. Colin Raffel, Jieun Oh, Isaac Wang Music 422 Final Project 3/12/2010

Spectral modeling of musical sounds

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

Speech and audio coding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

EE482: Digital Signal Processing Applications

5: Music Compression. Music Coding. Mark Handley

ELL 788 Computational Perception & Cognition July November 2015

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

Lecture 16 Perceptual Audio Coding

Parametric Coding of High-Quality Audio

Audio-coding standards

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Parametric Coding of Spatial Audio

Audio-coding standards

MPEG-4 aacplus - Audio coding for today s digital media world

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

Packet Loss Concealment for Audio Streaming based on the GAPES and MAPES Algorithms

Distributed Signal Processing for Binaural Hearing Aids

Appendix 4. Audio coding algorithms

Optical Storage Technology. MPEG Data Compression

The MPEG-4 General Audio Coder

AN EFFICIENT TRANSCODING SCHEME FOR G.729 AND G SPEECH CODECS: INTEROPERABILITY OVER THE INTERNET. Received July 2010; revised October 2011

Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends

A NEW DCT-BASED WATERMARKING METHOD FOR COPYRIGHT PROTECTION OF DIGITAL AUDIO

Digital Signal Processing Lecture Notes 22 November 2010

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

DRA AUDIO CODING STANDARD

The Sensitivity Matrix

Modeling the Spectral Envelope of Musical Instruments

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Sonic Studio Mastering EQ Table of Contents

Lecture 7: Audio Compression & Coding

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

Real-time Complex Cepstral Signal Processing for Musical Applications

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Image Denoising Based on Hybrid Fourier and Neighborhood Wavelet Coefficients Jun Cheng, Songli Lei

Mpeg 1 layer 3 (mp3) general overview

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

ON-LINE SIMULATION MODULES FOR TEACHING SPEECH AND AUDIO COMPRESSION TECHNIQUES

Audio Engineering Society. Convention Paper. Presented at the 126th Convention 2009 May 7 10 Munich, Germany

Chapter X Sampler Instrument

University of Pennsylvania Department of Electrical and Systems Engineering Digital Audio Basics

REAL-TIME DIGITAL SIGNAL PROCESSING

FFT and Spectrum Analyzer

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

6MPEG-4 audio coding tools

MPEG-4 Structured Audio Systems

Multimedia Communications. Audio coding

Ch. 5: Audio Compression Multimedia Systems

CISC 7610 Lecture 3 Multimedia data and data formats

Compression of Stereo Images using a Huffman-Zip Scheme

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

twisted wave twisted wave [an introduction]

/ / _ / _ / _ / / / / /_/ _/_/ _/_/ _/_/ _\ / All-American-Advanced-Audio-Codec

ENTROPY CODING OF QUANTIZED SPECTRAL COMPONENTS IN FDLP AUDIO CODEC

Perceptual Audio Coders What to listen for: Artifacts of Parametric Coding

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes

University of Mustansiriyah, Baghdad, Iraq

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

MPEG-4 Version 2 Audio Workshop: HILN - Parametric Audio Coding

Assignment 1: Speech Production and Models EN2300 Speech Signal Processing

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

Chapter 4: Audio Coding

Two-Dimensional Fourier Processing of Rasterised Audio. Chris Pike

Multimedia Communications. Transform Coding

Content-Based Adaptive Binary Arithmetic Coding (CABAC) Li Li 2017/2/9

An Environmental Audio Based Context Recognition System Using Smartphones

Neural Networks Based Time-Delay Estimation using DCT Coefficients

Error Protection of Wavelet Coded Images Using Residual Source Redundancy

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

MATLAB Apps for Teaching Digital Speech Processing

Data preprocessing Functional Programming and Intelligent Algorithms

A PSYCHOACOUSTIC MODEL WITH PARTIAL SPECTRAL FLATNESS MEASURE FOR TONALITY ESTIMATION

MUSIC A Darker Phonetic Audio Coder

Visually Improved Image Compression by Combining EZW Encoding with Texture Modeling using Huffman Encoder

On Pre-Image Iterations for Speech Enhancement

1 Introduction. 2 Speech Compression

Wavelet filter bank based wide-band audio coder

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Classification and Vowel Recognition

ERROR RECOGNITION and IMAGE ANALYSIS

ECE4703 B Term Laboratory Assignment 2 Floating Point Filters Using the TMS320C6713 DSK Project Code and Report Due at 3 pm 9-Nov-2017

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Effect of MPEG Audio Compression on HMM-based Speech Synthesis

CHROMA AND MFCC BASED PATTERN RECOGNITION IN AUDIO FILES UTILIZING HIDDEN MARKOV MODELS AND DYNAMIC PROGRAMMING. Alexander Wankhammer Peter Sciri

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Image and Video Coding I: Fundamentals

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

RECOMMENDATION ITU-R BS Procedure for the performance test of automated query-by-humming systems

LabROSA Research Overview

Wake Vortex Tangential Velocity Adaptive Spectral (TVAS) Algorithm for Pulsed Lidar Systems

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

ME scope Application Note 18 Simulated Multiple Shaker Burst Random Modal Test

Simple Watermark for Stereo Audio Signals with Modulated High-Frequency Band Delay

Chapter 14 MPEG Audio Compression

(12) United States Patent (10) Patent No.: US 6,453,252 B1. Laroche (45) Date of Patent: Sep. 17, 2002

Transcription:

Using Noise Substitution for Backwards-Compatible Audio Codec Improvement Colin Raffel AES 129th Convention San Francisco, CA February 16, 2011

Outline Introduction and Motivation Coding Error Analysis Synthesis Example: row-mp3

Introduction and Motivation Problem: Many widely used audio codecs are out of date compared to the state-of-the-art because they were not made to be improved upon in a backwards-compatible way

Introduction and Motivation Problem: Many widely used audio codecs are out of date compared to the state-of-the-art because they were not made to be improved upon in a backwards-compatible way Observation: Adding the coding process s residual, or coding error, back into the audio file will result in a lossless audio quality

Introduction and Motivation Problem: Many widely used audio codecs are out of date compared to the state-of-the-art because they were not made to be improved upon in a backwards-compatible way Observation: Adding the coding process s residual, or coding error, back into the audio file will result in a lossless audio quality Technique: Find a way to represent the coding error with a small amount of data and include the information in the coded audio file

Introduction and Motivation Problem: Many widely used audio codecs are out of date compared to the state-of-the-art because they were not made to be improved upon in a backwards-compatible way Observation: Adding the coding process s residual, or coding error, back into the audio file will result in a lossless audio quality Technique: Find a way to represent the coding error with a small amount of data and include the information in the coded audio file Proposal: Store frame-by-frame, per-critical-band residual levels in the audio codec s metadata and re-synthesize the coding error as colored noise when decoding

Coding Error Achieving lower data rates requires some information loss We can define coding error as (original audio) (coded audio) Tends to be noisy Modeling as colored noise is cheap 1 0.8 Normalized sample autocorrelation comparison Original file Coding error 0.6 Autocorrelation 0.4 0.2 0 0.2 0.4 0.6 0 0.5 1 1.5 Correlation lag 2 2.5 3 x 10 4

Residual Analysis: Spectral Flux Idea: Model only the non-stationary component of the error Simple method: Spectral flux, defined as SF(n) = N 1 ( X [n, k] X [n 1, k] ) 2 k=0 Stationary signal components get subtracted out Roughly speaking, Full proof is in the paper SF(n) RMS(x[n]) Proportionality only holds for Gaussian noise and non-overlapping rectangular windows

Residual Analysis: Spectral Flux Coding error does not satisfy proportionality criterion The proportionality still roughly holds in practice Comparison of RMS and Spectral Flux 0.18 0.16 Coding error RMS Spectral flux 0.14 RMS/Spectral Flux 0.12 0.1 0.08 0.06 0.04 0.02 0 0 0.5 1 1.5 2 Time (seconds)

Residual Analysis: Spectral Flux To determine coloring, evaluate the flux on a per-band basis Band levels tended to change too rapidly from frame-to-frame However, RMS proportionality holds in practice and makes this technique useful 50 40 Spectra of coding error and flux based synthesized noise Error Noise coded 30 Magnitude (db) 20 10 0 10 10 2 10 3 10 4 Frequency (log)

Residual Analysis: Smoothed Cepstrum Obtain spectral envelope by windowing the real cepstrum and taking the DFT C[n] = R ( E[k] = R 1 N N 1 k=0 ( N 1 n=0 log( X (k) )e j2πnk/n ) w[n]c[n]e j2πnk/n ) Works well for relatively peak-free spectra Per-band level can be found by averaging over bins in band

Residual Analysis: Smoothed Cepstrum Generally results in band levels which are smooth from band to band and frame to frame Smoothed cepstrum of coding error spectrum 0 Coding error spectrum Smoothed Cepstrum 10 20 Magnitude (db) 30 40 50 60 70 2000 4000 6000 8000 10000 12000 14000 16000 18000 Frequency (Hz)

Residual Analysis: Comparison Flux is analytically clean, but varies rapidly because it is intentionally uncorrelated Smoothed cepstrum provides a reasonable estimate which is smoother in time and band 0.5 0.45 0.4 0.35 Critical band level estimates Spectral Flux Smoothed Cepstrum Spectral Mean Magnitude (linear) 0.3 0.25 0.2 0.15 0.1 0.05 0 5 10 15 20 25 Band Number

Residual Synthesis Generate coding error representation by applying critical band envelopes to a random spectra Envelope differences from frame-to-frame cause coloring discontinuities We can generate any amount of colored noise by generating a larger spectrum So, create additional noise per-frame and crossfade Transients in the residual result in frames of noise in the error representation Traditional methods for detecting and representing transients are not effective The coded audio and coding error s envelopes are similar We can modulate residual representation with the coded audio s envelope

Residual Synthesis We can parametrize the amount of envelope modulation by y[n] = ((1 α) + αl[n])) x[n] 0.25 0.2 Comparison of error level to matched and unmatched synthesized noise Original MP3 Coding error Modulated error estimate Unmodulated error estimate Amplitude estimate (linear) 0.15 0.1 0.05 0 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Time (seconds)

Implementation: row-mp3 The MP3 codec is highly pervasive but somewhat out-of-date To allow backwards-compatibility, we can store information in the ID3 (metadata) tag row-mp3 -aware decoders can use the information, while others will simply ignore it Including per-frame critical band levels results in a relatively small data overhead For example, with a 23.2 ms frame size and 8-bit quantized band level values we have (.0232) (8) (25) = 8.6 kbit/s/channel Data overhead can be reduced by using different quantization schemes or compression such as Huffman coding

Implementation: row-mp3 Created a simple MUSHRA-like web-based test to determine codec s effectiveness row-mp3 files used spectral flux method with no envelope modulation 60 subjects tended to rate the row-mp3 version about 150% better for low MP3 bit rates Further, more controlled testing with all error analysis and synthesis methods is needed!"##$% % /2B /2. /2A /20 /23 /21 /2, /2- /2% / *+,)%-. *+,),-/ *+,)01,23)4"5)678+'99 :#;#<#=># :78?@+,)%-. :78?@+,)01

Conclusions Audio coding error can be effectively modeled as colored noise Flux provided a theoretically-sound coloring estimate Cepstral smoothing works better in practice Synthesis by scaling random spectra Cross-fading and interpolation prevented coloring discontinuities Level-modulated error estimate helped prevent smeared transients row-mp3 codec and accompanying listening tests suggest feasibility

Future work Investigating the optimal number and spacing of bands Testing the effectiveness of other analysis techniques Evaluating different methods for dealing with transients Applying similar techniques to spectral modeling and other processes with residual Implementing inclusion schemes in other audio codecs Generating residual levels solely from the coded audio (as a sound enhancement)

Acknowledgements Jieun Oh and Isaac Wang for creating the row-mp3 codec Prof. Marina Bosi for her instruction in the field of audio coding Prof. Julius Smith for helpful advice and discussion on various topics

Sound examples and code http://ccrma.stanford.edu/ e craffel/software/noise/ http://ccrma.stanford.edu/ e craffel/software/rowmp3/