Squeeze Play: The State of Ady0 Cmprshn. Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Similar documents
Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Lecture 16 Perceptual Audio Coding

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Chapter 5.5 Audio Programming

Data Compression. Audio compression

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

ITNP80: Multimedia! Sound-II!

Optimizing Audio for Mobile Development. Ben Houge Berklee College of Music Music Technology Innovation Valencia, Spain

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

5: Music Compression. Music Coding. Mark Handley

Principles of Audio Coding

Mpeg 1 layer 3 (mp3) general overview

EE482: Digital Signal Processing Applications

Digital Speech Coding

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

Skill Area 214: Use a Multimedia Software. Software Application (SWA)

Chapter 14 MPEG Audio Compression

2.4 Audio Compression

Audio Coding and MP3

Parametric Coding of Spatial Audio

Audio and video compression

Audio-coding standards

Audio-coding standards

AUDIOVISUAL COMMUNICATION

Appendix 4. Audio coding algorithms

ELL 788 Computational Perception & Cognition July November 2015

VS1063 ENCODER DEMONSTRATION

CS 074 The Digital World. Digital Audio

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY ACADEMIC YEAR / ODD SEMESTER QUESTION BANK

Parametric Coding of High-Quality Audio

AUDIOVISUAL COMMUNICATION

Compression; Error detection & correction

Speech-Coding Techniques. Chapter 3

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Ch. 5: Audio Compression Multimedia Systems

Multimedia Communications. Audio coding

Transporting audio-video. over the Internet

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

AUDIO. Henning Schulzrinne Dept. of Computer Science Columbia University Spring 2015

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

1 Audio quality determination based on perceptual measurement techniques 1 John G. Beerends

Fundamental of Digital Media Design. Introduction to Audio

UNDERSTANDING MUSIC & VIDEO FORMATS

Digital Audio Basics

the gamedesigninitiative at cornell university Lecture 15 Game Audio

Export Audio Mixdown

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

CT516 Advanced Digital Communications Lecture 7: Speech Encoder

Speech and audio coding

CHAPTER 6 Audio compression in practice

Digital Media. Daniel Fuller ITEC 2110

Lecture #3: Digital Music and Sound

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

2.1 Transcoding audio files

Embedding Audio into your RX Application

Perceptual Pre-weighting and Post-inverse weighting for Speech Coding

CHAPTER 10: SOUND AND VIDEO EDITING

Audio Coding Standards

Optical Storage Technology. MPEG Data Compression

AET 1380 Digital Audio Formats

19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007

Audio for Everybody. OCPUG/PATACS 21 January Tom Gutnick. Copyright by Tom Gutnick. All rights reserved.

FILE CONVERSION AFTERMATH: ANALYSIS OF AUDIO FILE STRUCTURE FORMAT

Chapter 4: Audio Coding

ITP 342 Mobile App Dev. Audio

Technical Specification for the OPERA Objective Perceptual Analyzer OPR-1XX-XXX-P

DRA AUDIO CODING STANDARD

Proceedings of Meetings on Acoustics

SAOC and USAC. Spatial Audio Object Coding / Unified Speech and Audio Coding. Lecture Audio Coding WS 2013/14. Dr.-Ing.

_APP B_549_10/31/06. Appendix B. Producing for Multimedia and the Web

The MPEG-4 General Audio Coder

ABSTRACT. that it avoids the tolls charged by ordinary telephone service

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Efficient Representation of Sound Images: Recent Developments in Parametric Coding of Spatial Audio

CISC 7610 Lecture 3 Multimedia data and data formats

The Effect of Bit-Errors on Compressed Speech, Music and Images

Application of wavelet filtering to image compression

MPEG-4 aacplus - Audio coding for today s digital media world

Design Brief CD15 Prisma Compact Disc and Digital Music Player

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Networking Applications

Lossy compression. CSCI 470: Web Science Keith Vertanen

Principles of MPEG audio compression

SPREAD SPECTRUM AUDIO WATERMARKING SCHEME BASED ON PSYCHOACOUSTIC MODEL

ITEC310 Computer Networks II

MULTIMODE TREE CODING OF SPEECH WITH PERCEPTUAL PRE-WEIGHTING AND POST-WEIGHTING

Lecture 7: Audio Compression & Coding

Multimedia Systems Speech II Mahdi Amiri February 2012 Sharif University of Technology

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

3 Sound / Audio. CS 5513 Multimedia Systems Spring 2009 LECTURE. Imran Ihsan Principal Design Consultant

VLSI Solution. VS10XX - Plugins. Plugins, Applications. Plugins. Description. Applications. Patches. In Development. Public Document.

Digital Audio. Amplitude Analogue signal

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Transcription:

Squeeze Play: The State of Ady0 Cmprshn Scott Selfon Senior Development Lead Xbox Advanced Technology Group Microsoft

Agenda Why compress? The tools at present Measuring success A glimpse of the future

The Philosophy of Compression

The tools of the present Black box codecs Parameters that may or may not have wellunderstood meaning Results that may or may not be appropriate Compression targets Iteration slow enough to be discouraged Bulk quality settings

Compression formats, ca. 2012 Lossless codecs (<3:1): FLAC, Apple Lossless Lossy codecs Reductions (up to :1): sample rate, bit depth, channel count, noise floor, culling Time domain: A-law/u-law, ADPCM (~4:1) Perceptual (6-40+:1): MP3, Ogg Vorbis, XMA, etc. Hybrids (vary): AAL, WavPack, MP3 variants

PCM Yes, still compression! Pulse Code Modulation Analog signal regularly sampled and stored digitally Bit depth: Storage representation of a sample Linear PCM = linear quantization Sampling rate: Frequency of analog signal capture or reproduction Nyquist frequency (SR/2)

PCM and Quantization Frequency quantization 44,100 Hz can represent sound frequencies up to 22,050 Hz Amplitude quantization 16 bits: 20 log 216 8 bits: 20 log 28 2 2 = ~90 db range = ~42 db range

PCM A-Law/µ-Law (G.711) Pulse Code Modulation (1972, ITU 1988) Adds compander support A-Law (13 bit signed 8 bit signed) µ-law (14 bit signed 8 bit signed) Encodes location of most significant non-zero bit, drops one or more LSBs Designed for telephony (8 khz, 8 bit)

ADPCM (G.726) Adaptive Differential Pulse Code Modulation (ITU 1970s, IMA 1990s) Stores difference between samples Quantized to a step size lookup table ~4:1 compression (16 bits 4 bits) Cheap to decode on CPU, straightforward to HW accelerate

ADPCM Artifacts Codec assumption: Signal slope doesn t change suddenly Poor response to transients, quick attacks Settling time before silence Challenged particularly at lower sampling rates (<32 khz) Step size quantization errors PCM source ADPCM output

Perceptual Compression MP3, WMA, XMA, AAC, Ogg Vorbis, ATRAC, AC-3 Psychoacoustic: based on human frequency sensitivities Frequency-domain compression Take advantage of limits of auditory perception

Perceptual Compression Strategies Frequency sensitivities Nominally 20 khz, often realistically 16 khz Most sensitive to speech range Absolute threshold of hearing Masking

Acoustic Masking Frequency Masking 20 50 100 200 500 1000 2000 4000 8000 16000 Time Masking Forward masking Backward masking A narrow 1200 Hz noise band masks sounds at higher frequencies (Scharf 1975)

Perceptual Codec Artifacts Time frequency domain artifacts Window size limits accuracy for transients: ringing or pre-echoes Loss of phase information: warbles, underwater Channel collapse/recreation artifacts Spatial loss and cross-talk

Game-Specific Perceptual Artifacts (Or, Games are from Mars, Codecs are from Venus) Pitch shifting Mixing / Synchronization Repetition and Reuse Looping

New Dog, Old Tricks Sample rate reduction Bit depth reduction Channel reduction Normalization can all be less effective (or ineffective) with perceptual codecs

Choosing a Compression Format Support (device platform, middleware) Performance tradeoffs (CPU or hardware) Licensing (or lack thereof)

Evaluating Codec Capabilities Storage and bandwidth Decode latency Multichannel support (and leveraging) Looping accuracy Seamless seeking Perceptual quality

Measuring Success Critical listening and perceptual codecs

Squeeze Play: The Game Show Which wave is more compressed? A B C PCM (46 KB) XMA q60 (8 KB, ~6:1) ADPCM (12.5 KB, ~3.6:1)

Which wave is more compressed? Input (44.1 khz PCM) 1.85 MB A B Output (XMA, quality 1) 140 KB [13:1 compression] Output (xwma, 48 kbps) 76 KB [24:1 compression]

Measuring Success Critical listening and perceptual codecs Visual evaluations

Which wave is more compressed? Input (32 khz PCM) 298 KB A B C Output (ADPCM) 82 KB [3.6:1 compression] Output (xwma, 20 kbps) 16 KB [18.6:1 compression] Output (XMA, quality 1) 28 KB [10.6:1 compression]

Measuring Success Critical listening and perceptual codecs Visual evaluations Delta evaluations (Taylor, 2011)

Delta Evaluations

Measuring Success Critical listening and perceptual codecs Visual evaluations Delta evaluations (Taylor, 2011) Automated evaluation PESQ/POLQA (ITU-T Rec. P.863) PEAQ (ITU BS.1387-1) Noise to Mask Ratio (NMR)

NMR Evaluation Noise to Mask Ratio Windowed evaluation of Signal-to-Mask Ratio (SMR) minus Signal-to-Noise Ratio (SNR) NMR at three XMA quality settings (Mathews 2012)

The Compression of the Future? Self-correcting/adjusting compression Communicating more with less Linguistic sounds and speech synthesis MIDI music: the revenge? Parameterized procedural synthesis Case study: impacts

frequency Impacts Resonant decay + transient Compress as modes + residual (>150:1) Lloyd, Raghuvanshi, Govindaraju (ACM, 2011) = + time Original Modal ( Clean ) Residual ( Noise )

Conclusions Know thy artifacts And use appropriate techniques to counter What s the playback context? More robust qualitative evaluation Avoid the bulk knob Consider automating listening tests

Questions? scottsel@microsoft.com Xbox LIVE Gamertag: Timmmmmay