Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd


Introducing Audio Signal Processing & Audio Coding
Dr Michael Mason, Snr Staff Eng., Team Lead (Applied Research), Dolby Australia Pty Ltd
2013 Dolby Laboratories, Inc.

Overview
- Audio Signal Processing Applications
- Audio Signal Processing Basics
  - Sampling
  - What is an audio signal?
  - Signal Processing Domains
- Case Study 1: Headphone Virtualisation
  - Frequency Response
  - FIR filtering
  - Computational Complexity
- Case Study 2: Perceptual Audio Coding
  - Psychoacoustics

Audio Signal Processing Applications: Cinema
- Delivering channel-based audio (5.1, 7.1)
  - Distribute movies to multiple screens in a multiplex
  - Cinemas use speaker arrays rather than single speakers, so processing is required to fill the arrays from single-channel feeds
- Rendering object-based audio (Dolby Atmos)
  - The cinema soundtrack is expressed as individual objects and locations; in every cinema the movie is rendered for that specific cinema's speaker locations
- Speaker equalisation & protection
  - Process the audio sent to each speaker to compensate for the frequency response of the speaker cones
  - Ensure that audio sent to the speakers doesn't overdrive the speaker cones, which would damage them

Audio Signal Processing Applications: Broadcast / Home Theatre
- Compression of audio for DVD / Blu-ray Disc
  - Perceptual audio coding (case study later)
  - Multi-channel audio coding
  - Multiple languages
  - Multiple playback formats (stereo / 5.1 / etc.)
- Broadcast end-to-end
  - Capture, coding, transmission, playback
- AV Receivers (AVRs), Set Top Boxes (STBs)
- Games consoles

Audio Signal Processing Applications: Personal Audio Devices
- Mobile phones (feature phones & smart phones)
- Tablets
- Music players
- PCs
- Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers)
- Headphone playback is a big use case (case study later)

Audio Signal Processing Applications: Voice Processing
- Many of the same basic challenges, but because speech has characteristics that differ from general audio, different solutions exist
  - Speech coders use different approaches than audio codecs
  - What makes a good codec is measured differently
  - The transmission bandwidth available for the data is much more limited
- Conferencing & Telephony

Audio Signal Processing Basics: Sampling
- Digital signals have samples which are discrete in time and magnitude
- The process of converting a continuous signal to the digital domain is sampling
- Two key questions when sampling: how often to sample, and how precisely?
- Signal chain: Analogue to Digital Converter (ADC) -> Digital Signal Processing -> Digital to Analogue Converter (DAC)

Audio Signal Processing Basics: Sampling Frequency Fs (how often?)
- Number of samples per second
- Nyquist rate: greater than twice the highest frequency in the signal

Audio Signal Processing Basics: Resolution (how precisely?)
- Each sample is represented by a number; how many bits should we use?
- Converting a continuous value to a discrete value requires quantisation
[Figure: quantisation error for a signal in the range -1.0 to +1.0]

Audio Signal Processing Basics: Resolution (how precisely?)
- By using more bits, we reduce the error
- Skipping all the math: each additional bit improves SNR (signal-to-noise ratio) by about 6.02 dB
[Figure: 3-bit quantiser codes (000 to 101 shown) over the range -1.0 to +1.0]
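The per-bit SNR gain can be checked numerically. The following is a minimal sketch, not taken from the slides: it assumes a simple uniform quantiser over -1.0 to +1.0 and a full-scale 997 Hz test tone sampled at 48 kHz (the function names and the test tone are illustrative choices).

```python
import math

def quantise(x, bits):
    """Uniform quantiser over [-1.0, +1.0] with 2**bits levels (assumed design)."""
    step = 2.0 / (2 ** bits)
    half = 2 ** (bits - 1)
    # Index of the quantisation cell, clamped so +1.0 maps to the top cell.
    idx = max(-half, min(half - 1, math.floor(x / step)))
    # Reconstruct at the centre of the cell.
    return (idx + 0.5) * step

def snr_db(bits, n=48000, fs=48000, tone_hz=997):
    """Measured SNR of a full-scale sine after quantisation to `bits` bits."""
    sig_pow = 0.0
    err_pow = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * tone_hz * i / fs)
        e = x - quantise(x, bits)
        sig_pow += x * x
        err_pow += e * e
    return 10 * math.log10(sig_pow / err_pow)

# Average SNR gain per extra bit between 8 and 12 bits.
gain_per_bit = (snr_db(12) - snr_db(8)) / 4
print(f"SNR gain per bit: {gain_per_bit:.2f} dB")
```

The measured gain should land close to 20*log10(2), which is where the figure of roughly 6 dB per bit comes from.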

Audio Signal Processing Basics: Audio Signal
- Sampling frequency
  - Human perception: 20 Hz to 20,000 Hz
  - Nyquist says Fs >= 40 kHz
  - CD Audio: 44.1 kHz
  - Blu-ray (and before that DAT): 48 kHz
- Bit depth
  - Range of loudness relative to human hearing
    - Threshold of hearing: 0 dB
    - Jet engines: 110-140 dB
    - Busy road (standing at the curb): 100 dB
    - Sustained exposure will cause damage: 85 dB
  - 16 bits per sample gives ~96 dB of dynamic range
  - 24 bits per sample gives ~144 dB

Audio Signal Processing Basics: Audio Signal
- Raw data rate
  - 48 kHz x 16 bits per sample = 768 kbps per channel
  - About 3.86 GiB (4.15 GB) for a 2 hr movie (5.1 channels, counted as 6 channels)
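The raw data rate arithmetic can be reproduced with a short sketch. It assumes that 5.1 is counted as six full-rate channels and that the movie-size figure is in binary gigabytes (GiB); the function names are illustrative.

```python
def raw_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def storage_bytes(bitrate_bps, seconds):
    """Bytes needed to store a stream of the given bit rate and duration."""
    return bitrate_bps * seconds / 8

per_channel = raw_bitrate_bps(48_000, 16)
# Assumption: 5.1 is treated as 6 full-rate channels.
movie_bytes = storage_bytes(raw_bitrate_bps(48_000, 16, channels=6), 2 * 3600)

print(per_channel / 1000, "kbps per channel")
print(round(movie_bytes / 2**30, 2), "GiB for a 2-hour movie")
```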

Audio Signal Processing Basics: Processing domains
- Sampled audio, i.e. Pulse Code Modulated (PCM) data, is in the time domain
- Not everything we want to do with audio is formulated as a time domain operation
  - E.g. flattening the frequency response of a speaker
- The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain
- Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient
- Signal processing also has other useful transform domains which may offer advantages for specific types of processing (e.g. image coding often uses the discrete cosine transform, DCT)

Case Study 1: HEADPHONE VIRTUALISATION

Headphone Virtualisation
- How do you get surround sound out of a pair of headphones?

Headphone Virtualisation
Two things we need to achieve:
1. Make it sound like the audio is coming from different directions
2. Make it sound like the listener is in a room
Both can be achieved by filtering the signal, using the head-related transfer functions for direction and the impulse response of the room for the room.

Headphone Virtualisation: Room impulse response
- By measuring how a short impulsive sound is altered by a room, the room's reflections and echoes can be characterised to create an impulse response
  - https://www.youtube.com/watch?v=pkzjihtj4jc
- The impulse response can in turn be used to filter any signal, to make it sound like it was in the room
- The process of filtering a signal using an impulse response is convolution:

$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n-k]$$
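As an illustration of the convolution sum for finite-length signals, here is a minimal direct-form sketch in plain Python. The toy impulse response (direct sound plus one echo) is an invented example, not a measured room.

```python
def convolve(x, h):
    """Direct-form FIR filtering: y[n] = sum over k of h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        acc = 0.0
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]  # one multiply-accumulate (MAC)
        y[n] = acc
    return y

# Toy "room": direct sound plus one echo 3 samples later at half amplitude.
h = [1.0, 0.0, 0.0, 0.5]
x = [1.0, 2.0, 3.0]
y = convolve(x, h)
print(y)  # [1.0, 2.0, 3.0, 0.5, 1.0, 1.5]
```

Each output sample costs one multiply-accumulate per impulse response tap, which is why the length of the impulse response drives the computational load.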

Headphone Virtualisation: Room impulse response
- How many points would be required to capture a room? (i.e. how long is the impulse response?)
  - Limiting the impulse response to 30 ms gives us 1440 points (@48 kHz)
- Considering the computational cost
  - 1440 taps * 48k samples/sec > 69 MFLOPS

Headphone Virtualisation: Computational load
- On a DSP chip with a single-cycle MAC -> 69 MIPS
- On an ARM, MACs take ~3.5 cycles each -> ~240 MIPS
- 5.1 channels -> 10 filters = 2,400 MIPS
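The load figures follow from simple arithmetic, sketched below. The constants are the ones quoted on the slides (1440 taps, 48 kHz, about 3.5 cycles per MAC on an ARM, 10 filters); the variable names are illustrative.

```python
FS = 48_000               # samples per second
TAPS = 1440               # impulse response length in samples
CYCLES_PER_MAC_ARM = 3.5  # approximate ARM cost per multiply-accumulate
FILTERS = 10              # filters needed for 5.1 playback, per the slide

macs_per_second = TAPS * FS                  # one MAC per tap per output sample
dsp_mips = macs_per_second / 1e6             # single-cycle MAC
arm_mips = dsp_mips * CYCLES_PER_MAC_ARM     # one filter on an ARM
total_arm_mips = arm_mips * FILTERS          # all filters on an ARM

print(f"DSP: {dsp_mips:.1f} MIPS per filter")
print(f"ARM: {arm_mips:.1f} MIPS per filter, {total_arm_mips:.0f} MIPS total")
```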

Headphone Virtualisation: The solution?
- Convolution in the time domain <-> multiplication in the frequency domain
- Fourier transform the impulse response and the signal
  - Block based, e.g. blocks of 2048
  - O(N log2 N) -> k * 22,528 ~ 78,848
- Operate in the frequency domain
  - Complex multiplies -> 4 * 2048 -> 8,192
- Transform the result back to the time domain
  - Same cost as the forward transform
- Blocks per second? ~23 blocks/sec -> ~4 MFLOPS / filter
- What about the HRTFs?
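A minimal sketch of frequency-domain filtering follows, using a textbook recursive radix-2 FFT written in plain Python so the example is self-contained. It transforms the whole signal in one block for simplicity, whereas a real-time system would process successive blocks (for example with overlap-add). The cost estimate assumes k = 3.5, a value inferred from the slide's own numbers (78,848 / 22,528), and counts one forward and one inverse transform per block.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fast_convolve(x, h):
    """Linear convolution via multiplication in the frequency domain."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:  # zero-pad to a power of two to avoid circular wrap
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    Y = [a * b for a, b in zip(X, H)]
    y = fft(Y, inverse=True)
    return [v.real / n for v in y[:out_len]]  # divide by n to normalise

def cost_per_second(block=2048, fs=48_000, k=3.5):
    """Rough operations per second for one filter (k is an assumed constant)."""
    log2n = block.bit_length() - 1
    fft_ops = k * block * log2n          # one transform
    multiply_ops = 4 * block             # complex multiplies per block
    per_block = 2 * fft_ops + multiply_ops
    return per_block * fs / block

result = fast_convolve([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.5])
print([round(v, 6) for v in result])
print(f"{cost_per_second() / 1e6:.2f} million operations per second per filter")
```

The same toy signal and impulse response as a direct convolution give the same output, while the estimated cost drops from roughly 69 million MACs per second to a few million operations per second per filter.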

Headphone Virtualisation: Head-related Transfer Function
- Measured on a dummy
- Applied as filters
- The same computational arguments lead us to the need to apply these in the frequency domain
- NB: we don't need to go back to the time domain between the two sets of filters

Case Study 2: PERCEPTUAL AUDIO CODING

Perceptual Audio Coding
- How do you reduce the storage and transmission bandwidth requirements of audio signals?
- Bitrates:
  - Uncompressed: 768 kbps / ch
  - DVD (AC3): 448 kbps (5.1 channels)

Perceptual Audio Coding: Audio Coding is Lossy
- Lossless compression must perfectly reconstruct its source (e.g. zip files)
- Lossy compression can throw away data if it isn't needed. The reconstruction need only be good enough
- Deciding which bits to throw away, and what is good enough, is the hard part

Perceptual Audio Coding
[Block diagram of the encoder; blocks: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation, Entropy coding]

Perceptual Audio Coding: Psychoacoustics
- The study of sound perception
- Perception implies the human experience, which includes physiological and psychological factors
- It is at the heart of the question of which parts of an audio signal are important or unimportant

Perceptual Audio Coding: Psychoacoustics
- Most perceptual quantities are non-linear and subjective
- Loudness
  - Non-linearly related to sound pressure
  - Scales include: sone, phon
- Pitch
  - Non-linearly related to frequency
  - Scales include: Bark, Mel, ERB
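To make the non-linear relationship between pitch scales and frequency concrete, the sketch below uses two widely quoted closed-form approximations. These formulas do not come from the slides: the Bark expression is the Zwicker and Terhardt approximation and the mel expression is the common 2595*log10(1 + f/700) form; other variants exist.

```python
import math

def hz_to_bark(f_hz):
    """Approximate Bark value (Zwicker & Terhardt style approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def hz_to_mel(f_hz):
    """Approximate mel value using a common logarithmic formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The same 1 kHz step covers far more Bark at low frequencies than at high ones.
low_step = hz_to_bark(2000) - hz_to_bark(1000)
high_step = hz_to_bark(11000) - hz_to_bark(10000)
print(f"1-2 kHz spans {low_step:.2f} Bark, 10-11 kHz spans {high_step:.2f} Bark")

for f in (100, 1000, 5000, 15000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark, {hz_to_mel(f):7.1f} mel")
```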

Perceptual Audio Coding: Frequency Masking

Perceptual Audio Coding: Temporal Masking

Perceptual Audio Coding
- Time/Frequency analysis
  - Break the incoming signal into time blocks and transform into the frequency domain
  - Coding is always block based
  - The frequency representation is analysed in bins of equal perceptual bandwidth (Bark)
- Psychoacoustic analysis
  - Use the frequency representation of the current block to calculate the masking curve
  - Use the frequency masking curves from previous frames to account for temporal masking

Perceptual Audio Coding: Masking Curve
- Areas of the spectrum where the masking curve is above the signal energy represent things we can't hear
- If we can't hear them, we shouldn't spend bits encoding them

Perceptual Audio Coding
- Bit allocation
  - Using the masking curve, we can calculate the allowed signal-to-noise ratio in each of the frequency bands
  - Knowing that allocating a bit to a quantiser improves SNR by about 6 dB, iteratively allocate the bits available in the bit pool to bands until we either run out of bits or meet the SNR requirements in all bands (any leftover bits can be used to code the next frame)
  - The bit distribution must be sent to the decoder
- Quantiser
  - Quantise the frequency domain representation to send to the decoder

Perceptual Audio Coding: Decoding is simple
- Recreate the frequency representation of each frame
- Transform back to the time domain
- Additional processing can be used to enhance the reconstructed signal
