ROW.mp3
Colin Raffel, Jieun Oh, Isaac Wang
Music 422 Final Project, 3/12/2010


Motivation

The realities of mp3: it is in widespread use, but its quality at a given bit rate is low compared with modern codecs. Vision for row-mp3: backwards compatible with mp3 for easy adoption, higher quality, and a minimal increase in data rate.

Approach

Coding the difference between the original and the mp3: a lossless approach (as in mp3HD) is impractical, so we exploit features specific to the difference data, which is "noisy" (largely stochastic) and has a "flat" spectrum (take a listen to the difference files).
Using ID3 tags in the metadata section of the mp3: ID3v2.x tags can store up to 16 megabytes of data per tag, and we use the TXXX (user-defined text information) tag. Players that are unaware of row-mp3 play the mp3 as usual, while row-mp3 decoders play a higher-quality file.

Overview: Encoder and Decoder

Flow Diagram for ROW.mp3 Encoder

Flow Diagram for ROW.mp3 Decoder

Implementation

Noise shaping; non-stochastic error coding; Huffman coding; using the ID3 tags; time matching the error and the mp3; dependencies.

Noise shaping

Exploit the "helpful" properties of noise and of hearing: humans can't easily differentiate between different noise signals, noisiness is (somewhat) easily measured, and hearing works on a per-critical-band basis. So don't code the noise itself; just code the noise level in each band, with the level estimate based on spectral flux. Decode by synthesizing a weighted noise signal, using overlap-add to prevent discontinuities and interpolating between noise levels.
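A minimal sketch of the noise-level idea in Python with NumPy, assuming a sine window with 50% overlap and Zwicker-style critical-band edges (both assumptions; the slides only specify 25 bands and 1024-sample blocks). The encoder keeps one RMS level per band instead of the noise itself; a plain per-band energy is used here in place of the spectral-flux-based estimate, whose details are not given. The decoder fills each band with random-phase complex values at that band's level and windows and overlap-adds the blocks, which crossfades between consecutive levels. The names `BAND_EDGES_HZ`, `noise_levels`, and `synthesize_noise` are hypothetical.

```python
import numpy as np

FS = 44100        # sample rate (Hz)
N = 1024          # samples per block, as on the slides
HOP = N // 2      # assumption: 50% overlap for overlap-add
N_BANDS = 25

# Assumption: Zwicker-style critical-band edges in Hz (25 bands), not from the slides.
BAND_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                          1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                          6400, 7700, 9500, 12000, 15500, 22050])

# Sine window: w[n]**2 + w[n + HOP]**2 == 1, so 50% overlap-add crossfades smoothly.
WINDOW = np.sin(np.pi * (np.arange(N) + 0.5) / N)


def band_of_bin(n=N, fs=FS):
    """Band index (0..24) for every rfft bin of an n-sample block."""
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    idx = np.searchsorted(BAND_EDGES_HZ, freqs, side="right") - 1
    return np.clip(idx, 0, N_BANDS - 1)


BAND_OF_BIN = band_of_bin()


def noise_levels(block):
    """Encoder side: per-band RMS magnitude of one windowed block of the error.
    (A plain energy estimate standing in for the spectral-flux-based one.)"""
    mags = np.abs(np.fft.rfft(WINDOW * block))
    return np.array([np.sqrt(np.mean(mags[BAND_OF_BIN == b] ** 2))
                     for b in range(N_BANDS)])


def synthesize_noise(levels_per_block, rng=None):
    """Decoder side: random-phase spectrum scaled by each band's level, inverse FFT,
    window, and overlap-add so adjacent blocks' levels are interpolated."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros(HOP * (len(levels_per_block) + 1))
    for i, levels in enumerate(levels_per_block):
        mags = np.asarray(levels)[BAND_OF_BIN]  # every bin gets its band's level
        phases = rng.uniform(0.0, 2.0 * np.pi, size=mags.shape)
        block = np.fft.irfft(mags * np.exp(1j * phases), n=N)
        out[i * HOP:i * HOP + N] += WINDOW * block
    return out
```

Only 25 numbers per block travel to the decoder; the actual noise waveform is thrown away and regenerated, which works because one noise signal is perceptually interchangeable with another of the same band levels.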

Synthesized noise spectrum

Non-stochastic (tonal) error coding

Tonal component separation is difficult: the algorithms are complex and costly, and they work poorly for high-noise signals (like coding error). Instead, we use "inverse flux": look for stationary spectral components, using a quotient approach for smoother output, with a power parameter that determines how much importance repetition across blocks carries. The tonal error is then coded with PAC at a low bit rate; because this signal is simple, PAC's job is easier.
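The slide does not give the "inverse flux" formula, so the following is only one hypothetical reading of it: for each bin, take the quotient of the smaller to the larger magnitude in two consecutive blocks (close to 1 when the bin is stationary) and raise it to a power parameter. The functions `stationarity_weights` and `split_tonal` are illustrative names, not the project's code.

```python
import numpy as np


def stationarity_weights(mag_prev, mag_curr, power=2.0, eps=1e-12):
    """Hypothetical 'inverse flux' weight per bin: ratio of the smaller to the larger
    magnitude in consecutive blocks, raised to `power`.
    1.0 means the bin did not change (stationary/tonal); 0.0 means it appeared or vanished."""
    lo = np.minimum(mag_prev, mag_curr)
    hi = np.maximum(mag_prev, mag_curr)
    return (lo / (hi + eps)) ** power


def split_tonal(spec_prev, spec_curr, power=2.0):
    """Split the current block's complex spectrum into tonal and noisy parts
    with soft stationarity weights (tonal + noisy == spec_curr)."""
    w = stationarity_weights(np.abs(spec_prev), np.abs(spec_curr), power)
    tonal = w * spec_curr
    return tonal, spec_curr - tonal
```

A soft weight rather than a hard tonal/noise decision avoids bins flipping between the two paths from block to block, and a larger `power` demands more exact repetition before a bin is treated as tonal.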

Huffman coding

row-mp3 applies Huffman coding to the noise level data (25 floating-point numbers per block of 1024 samples). This reduces the size of the mantissas by ~50% (when quantized to 4 bits), assuming we generate a Huffman table specific to each sound file; since the Huffman table is not very big, storing it with each file is acceptable. Huffman coding could potentially also be applied to the PAC coding stage: in experiments with PAC coding at 0.3 bits/sample using 3 scale-factor bits and 2 mantissa bits, the Huffman-coded mantissas took ~70% of their original size and the scale factors ~90%.
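Huffman coding works on discrete symbols, so the floating-point noise levels must first be mapped to a small set of integer codes. The sketch below assumes uniform quantization in dB over a fixed range with 4 bits per level; the range (`FLOOR_DB`, `CEIL_DB`) and function names are illustrative assumptions, and the range would have to match however the levels are actually scaled.

```python
import numpy as np

# Assumptions (not from the slides): levels are quantized uniformly in dB over
# [FLOOR_DB, CEIL_DB]; anything outside the range is clipped.
FLOOR_DB, CEIL_DB = -60.0, 0.0


def quantize_levels(levels, n_bits=4):
    """Map noise levels to integer codes 0 .. 2**n_bits - 1."""
    steps = 2 ** n_bits - 1
    db = 20.0 * np.log10(np.maximum(levels, 1e-12))
    frac = (np.clip(db, FLOOR_DB, CEIL_DB) - FLOOR_DB) / (CEIL_DB - FLOOR_DB)
    return np.round(frac * steps).astype(int)


def dequantize_levels(codes, n_bits=4):
    """Inverse mapping used by the decoder (reconstructs the level at each step)."""
    steps = 2 ** n_bits - 1
    db = FLOOR_DB + np.asarray(codes) / steps * (CEIL_DB - FLOOR_DB)
    return 10.0 ** (db / 20.0)
```

With 16 codes over 60 dB each step is 4 dB. Real noise levels tend to cluster on a few of these codes, and that skewed distribution is what a per-file Huffman table exploits.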

Huffman coding: modules

huffmancode.py creates a Huffman binary tree from a list of (symbol, frequency) pairs. For quick look-up of symbols and codes, it also creates two dictionaries from this tree: Symbol2Code and Code2Symbol.
The trainnoise method in traindata.py takes the array of all noise levels for a file as input and outputs the Code2Symbol dictionary and the Huffman-coded quantized noise values.
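A compact sketch of what huffmancode.py and trainnoise are described as doing, not the original code: build a Huffman tree from symbol frequencies with `heapq`, derive the Symbol2Code and Code2Symbol dictionaries named on the slide, and encode one file's quantized noise levels with a table trained on that same file. Bits are kept as a '0'/'1' string for readability; `build_huffman`, `train_noise`, and `decode` are hypothetical names.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman(freqs):
    """Build (Symbol2Code, Code2Symbol) dictionaries from {symbol: frequency}."""
    tiebreak = count()  # keeps heap entries comparable without comparing subtrees
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single symbol still needs one bit
        sym = heap[0][2]
        return {sym: "0"}, {"0": sym}
    while len(heap) > 1:  # repeatedly merge the two least frequent subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    symbol2code = {}
    stack = [(heap[0][2], "")]
    while stack:  # walk the tree: left edge = '0', right edge = '1'
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            symbol2code[node] = prefix
    code2symbol = {code: sym for sym, code in symbol2code.items()}
    return symbol2code, code2symbol


def train_noise(quantized_levels):
    """Hypothetical stand-in for trainnoise: build a per-file table and encode."""
    symbol2code, code2symbol = build_huffman(Counter(quantized_levels))
    bits = "".join(symbol2code[s] for s in quantized_levels)
    return code2symbol, bits


def decode(bits, code2symbol):
    """Prefix-free decoding: extend the current code until it matches a symbol."""
    out, code = [], ""
    for b in bits:
        code += b
        if code in code2symbol:
            out.append(code2symbol[code])
            code = ""
    return out
```

Because the table is trained per file, Code2Symbol has to travel with the encoded bits, which is why the Huffman table is part of the data stored in the ID3 tag.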

Using the ID3 tags

ID3 tag specifications: each tag can hold up to 16 MB, and we use the TXXX (user-defined text information) tag. Tags can only hold Unicode strings, so we use the Python pickle module to serialize the data as strings, and the eyed3 Python library to read and write the tags. The extra data for the error and noise stored in the ID3v2.x tags consists of the arrays of mantissas, scale factors, and bit allocations for the PAC-coded error; the Huffman-encoded noise levels; and the Huffman table.
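A sketch of the tag round trip. It assumes the current eyed3 API (`eyed3.load`, `initTag`, `tag.user_text_frames`), which differs from the eyed3 v0.6.17 used in the project, and it assumes base64 to turn pickle's bytes into a valid text string. The TXXX description `"row-mp3"` and all function names are hypothetical.

```python
import base64
import pickle

MAX_TAG_BYTES = 16 * 2 ** 20  # 16 MB per tag, per the slide


def to_tag_text(payload):
    """Serialize arbitrary Python data (arrays, tables) into an ASCII string.
    pickle produces bytes, so base64 makes them valid text (assumption)."""
    text = base64.b64encode(pickle.dumps(payload)).decode("ascii")
    if len(text) > MAX_TAG_BYTES:
        raise ValueError("payload too large for a single tag")
    return text


def from_tag_text(text):
    """Inverse of to_tag_text. Only unpickle data from files you trust."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


def write_row_tag(mp3_path, payload, description="row-mp3"):
    """Store the payload in a TXXX frame (modern eyed3 API, not the 0.6.17 one)."""
    import eyed3
    audio = eyed3.load(mp3_path)
    if audio.tag is None:
        audio.initTag()
    audio.tag.user_text_frames.set(to_tag_text(payload), description)
    audio.tag.save()


def read_row_tag(mp3_path, description="row-mp3"):
    """Return the stored payload, or None for a plain mp3 without the tag."""
    import eyed3
    audio = eyed3.load(mp3_path)
    if audio.tag is None:
        return None
    frame = audio.tag.user_text_frames.get(description)
    return None if frame is None else from_tag_text(frame.text)
```

Returning None for a file without the tag lets a row-mp3 decoder fall back to plain mp3 playback. Since unpickling can execute arbitrary code, a plain data format such as JSON would be a safer choice for files from untrusted sources.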

Time matching error and mp3
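The slide gives no details on how the error and the mp3 are time-matched. Because mp3 encoding and decoding shift the signal by an encoder delay and padding, the decoded file has to be aligned before it is subtracted from the original. One simple approach, sketched here as an assumption for a single channel, is to search a range of non-negative lags for the peak cross-correlation over a one-second segment; `max_lag`, `seg_len`, and the function names are illustrative.

```python
import numpy as np


def estimate_delay(original, decoded, max_lag=4096, seg_len=44100):
    """Number of samples by which `decoded` lags `original`: the lag in 0..max_lag
    that maximizes the cross-correlation over one segment."""
    original = np.asarray(original, dtype=float)  # avoid int16 overflow in np.dot
    decoded = np.asarray(decoded, dtype=float)
    n = min(seg_len, len(original), len(decoded) - max_lag)
    if n <= 0:
        raise ValueError("signals too short for the requested max_lag")
    ref = original[:n]
    scores = [np.dot(ref, decoded[lag:lag + n]) for lag in range(max_lag + 1)]
    return int(np.argmax(scores))


def coding_error(original, decoded, max_lag=4096):
    """Align the decoded mp3 to the original and return (error, lag)."""
    original = np.asarray(original, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    lag = estimate_delay(original, decoded, max_lag)
    aligned = decoded[lag:lag + len(original)]
    n = min(len(original), len(aligned))
    return original[:n] - aligned[:n], lag
```

Even a one-sample misalignment would turn the small coding error into a large difference signal dominated by the music itself, so this step matters before any noise or tonal analysis.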

Dependencies

LAME v3.98.3: wav to mp3 encoder
mpg123 v1.10.1: mp3 to wav decoder
eyed3 v0.6.17: ID3 tag manipulation
scipy v0.8.0: wav file reading/writing
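How the command-line dependencies fit together can be sketched with `subprocess`: LAME's `-b` option sets the bit rate in kbps, mpg123's `-w` option writes the decoded output as a wav file, and scipy reads both files back. The wrapper functions are hypothetical, and behavior may differ between the versions listed above and current releases.

```python
import subprocess


def lame_cmd(wav_path, mp3_path, kbps=128):
    """LAME: encode a wav file to an mp3 at the given bit rate."""
    return ["lame", "-b", str(kbps), wav_path, mp3_path]


def mpg123_cmd(mp3_path, wav_path):
    """mpg123: decode an mp3 and write the result as a wav file."""
    return ["mpg123", "-w", wav_path, mp3_path]


def mp3_round_trip(wav_in, mp3_out, wav_decoded, kbps=128):
    """Encode, decode, and read both wav files back with scipy."""
    from scipy.io import wavfile
    subprocess.run(lame_cmd(wav_in, mp3_out, kbps), check=True)
    subprocess.run(mpg123_cmd(mp3_out, wav_decoded), check=True)
    fs, original = wavfile.read(wav_in)
    fs_dec, decoded = wavfile.read(wav_decoded)
    if fs != fs_dec:
        raise ValueError("sample rates differ")
    return fs, original, decoded
```

The encoder needs this round trip because the error it codes is defined relative to what an mp3 decoder actually outputs, not relative to the mp3 bitstream.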

Evaluation

Data rate analysis; listening test.

Data Rate Analysis

Error levels: 25 bands at 8 bits per band, one set per block of 1024 samples, at 44,100 samples per second, with a 50% Huffman coding gain ≈ 4.3 kbps per channel.
PAC tonal error: 0.2 bits per sample ≈ 8.8 kbps per channel.
Total data rate: the mp3 data rate per channel plus about 13 kbps per channel.
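The per-channel overhead can be worked out from the parameters above, assuming one set of 25 noise levels per 1024 new samples (with overlapping blocks the noise-level rate would scale up accordingly):

```latex
\begin{aligned}
R_{\text{noise}} &= \frac{25 \times 8\ \text{bits}}{1024\ \text{samples}} \times 44100\ \tfrac{\text{samples}}{\text{s}} \times 0.5 \approx 4307\ \text{bits/s} \approx 4.3\ \text{kbps} \\
R_{\text{tonal}} &= 0.2\ \tfrac{\text{bits}}{\text{sample}} \times 44100\ \tfrac{\text{samples}}{\text{s}} = 8820\ \text{bits/s} \approx 8.8\ \text{kbps} \\
R_{\text{extra}} &= R_{\text{noise}} + R_{\text{tonal}} \approx 13.1\ \text{kbps per channel}
\end{aligned}
```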

Listening Test: MUSHRA

Formats: reference file (lossless, 44.1 kHz 16-bit PCM); 3.5 kHz low-pass filtered reference (as required by MUSHRA); 128 kbps mp3; 128 kbps row-mp3; 64 kbps mp3; 64 kbps row-mp3; 320 kbps mp3.
Audio sources: dance/electronic music, pop/country music, rock/blues music, glockenspiel, harpsichord, male speech, castanets.
https://ccrma.stanford.edu/~craffel/etc/mp3challenge/
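MUSHRA calls for a 3.5 kHz low-pass version of the reference as an anchor. A minimal way to produce one with SciPy (version 1.2 or later, for the `fs` argument of `butter`) is shown below; the 8th-order Butterworth filter and zero-phase filtering are assumptions, not a description of how the test files were made.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def mushra_anchor(x, fs=44100, cutoff_hz=3500.0, order=8):
    """Low-pass anchor for a MUSHRA test (filter type and order are assumptions).
    Filters along axis 0, so (samples, channels) arrays from wavfile work as-is."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(x, dtype=float), axis=0)
```

The output is floating point; it would need to be scaled and cast back to 16-bit integers before being written out as PCM alongside the other test files.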

Listening Test: Results

Listeners preferred row-mp3 at low bit rates for music: at 64 kbps, row-mp3 was ranked significantly higher than plain mp3 for "complex"/music signals, while at 128 kbps row-mp3 was ranked roughly equivalent to mp3.

Future Work

Possible future directions: an intelligent algorithm that analyzes an mp3 file and predicts the error without access to the original lossless file; noise synthesis in the time domain with a scaled filter bank, rather than with random complex numbers in the frequency domain; block switching when extracting the noisy component, to handle the poor coding of transients; direct coding of missing transients in the time domain; a more intelligent tonal algorithm with better reconstruction in the time domain; a perceptual audio codec for the tonal component that is especially well suited to low data rates and highly tonal sounds; and applying Huffman coding to the perceptual audio coder component to further reduce file size.

Conclusion

In summary, row-mp3 stores (lossless audio file) - (mp3) in the ID3 tag of the mp3. It is backwards compatible with mp3 and needs little extra storage. It exploits the noisy nature of the error by transmitting quantized, Huffman-coded per-critical-band noise levels; for the remainder of the error, it uses basic tonal extraction and a standard perceptual audio coder to keep the file size small. With some further improvements, the row-mp3 codec could provide a viable, backwards-compatible solution to low-quality mp3s at low bit rates.

Acknowledgments

Special thanks to Professor Bosi for great lectures, advice, and feedback; Craig Sapp for help with course materials; and everyone who participated in the "mp3 challenge"!