Audio and Speech


Audio 1 Audio and Speech

Audio 2 Digital sound
[block diagram: microphone signal (~1 mV) → amplifier → anti-aliasing filter → A/D converter → codec (G.7xx) → packetization]

Audio 3 Digital audio
sample each audio channel and quantize: pulse-code modulation (PCM)
Nyquist bound: need to sample at twice (+ ɛ) the maximum signal frequency
analog telephony: 300 Hz - 3400 Hz → 8 kHz sampling, 8 bits/sample → 64 kb/s
FM radio: 15 kHz
audio CD: 44,100 Hz sampling, 16 bits/sample (based on the video equipment used for early recordings)
more bits → more dynamic range, lower distortion
audio is highly redundant → compression
almost all codecs are fixed rate
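
A quick back-of-the-envelope check of these bit rates as a small C sketch; it contains nothing beyond the arithmetic implied by the slide:

#include <stdio.h>

/* Uncompressed PCM bit rate = sampling rate * bits per sample * channels. */
static long pcm_bitrate(long sample_rate, int bits_per_sample, int channels)
{
    return sample_rate * bits_per_sample * channels;
}

int main(void)
{
    /* telephony: 8 kHz, 8 bits, mono   -> 64 kb/s (G.711) */
    printf("telephony: %ld kb/s\n", pcm_bitrate(8000, 8, 1) / 1000);
    /* audio CD: 44.1 kHz, 16 bits, stereo -> 1411 kb/s */
    printf("CD:        %ld kb/s\n", pcm_bitrate(44100, 16, 2) / 1000);
    return 0;
}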

Audio 4 Audio coding

quality        frequency      sampling   A/D-D/A bits   application
telephone      300-3400 Hz    8 kHz      12-13          PSTN
wide band      50-7000 Hz     16 kHz     14-15          conferencing
high-quality   30-15000 Hz    32 kHz     16             FM, TV
               20-20000 Hz    44.1 kHz   16             CD
               10-22000 Hz    48 kHz     24             pro audio

Audio 5 Digital audio: sampling
[figure, panels (a)-(c): amplitude (-1.00 to 1.00) vs. time in multiples of the sampling interval T]
distortion: signal-to-(quantization) noise ratio

Audio 6 Digital audio: compression
Alternatives for compression:
companding: non-linear quantization, e.g., µ-law (G.711)
waveform: exploit statistical correlation between samples
model: model the voice, extract parameters (e.g., pitch)
subband: split the signal into bands (e.g., 32) and code them individually (MPEG audio coding)
Newer codings make use of the masking properties of the human ear.

Audio 7 Judging a codec
bitrate
quality
delay: algorithmic delay, processing delay
robustness to loss
complexity: MIPS, floating vs. fixed point, encode vs. decode
tandem performance
can the codec be embedded?
non-speech performance: music, voiceband data, fax, tones, ...

Audio 8 Quality metrics
speech vs. music
communications vs. toll quality
mean opinion score (MOS) and degradation MOS (DMOS):

score   MOS                   DMOS                        listening effort
5       excellent             inaudible                   no effort required
4       good (toll quality)   audible, but not annoying   no appreciable effort
3       fair                  slightly annoying           moderate effort
2       poor                  annoying                    considerable effort
1       bad                   very annoying               no meaning understood

diagnostic rhyme test (DRT) for low-rate codecs (96 pairs like "dune" vs. "tune"); 90% ≈ toll quality

Audio 9 Companding: µ-law for G.711 ("PCMU")
[figure: µ-law output code (roughly 120-260) vs. 16-bit linear input (0-35000), showing the logarithmic compression curve]
Also: A-law in Europe
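
To make the companding idea concrete, here is a hedged C sketch of the usual segmented µ-law approximation (sign bit, 3-bit segment, 4-bit mantissa, code transmitted inverted). It is illustrative only, not a drop-in G.711 implementation; real systems typically use a tested reference implementation or a precomputed lookup table.

#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132: bias added before the segment search */
#define ULAW_CLIP 32635

/* Map a 16-bit linear PCM sample to an 8-bit mu-law codeword (sketch). */
static uint8_t linear_to_ulaw(int16_t sample)
{
    int sign = (sample < 0) ? 0x80 : 0x00;
    int magnitude = (sample < 0) ? -sample : sample;
    int exponent, mantissa;

    if (magnitude > ULAW_CLIP)
        magnitude = ULAW_CLIP;
    magnitude += ULAW_BIAS;

    /* Segment (exponent): position of the highest set bit above bit 7. */
    for (exponent = 7; exponent > 0; exponent--)
        if (magnitude & (1 << (exponent + 7)))
            break;

    /* Four mantissa bits taken just below the segment bit. */
    mantissa = (magnitude >> (exponent + 3)) & 0x0F;

    /* The codeword is transmitted inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}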

Audio 10 Silence detection (VAD)
avoid transmitting silence during sentence pauses and/or while the other person is talking
detect silence based on energy, sound
hangover: unvoiced segments at the end of words
conferencing!
comfort noise: white noise or shaped noise, with periodic updates; transmit an update (4 bytes) when things change

Audio 11 Audio silence detection
needed in conferences to avoid drowning in fan noise
also reduces data rate
in use in transoceanic telephony since the 1950s (TASI: time-assigned speech interpolation)
use an energy estimate (µ-law is already close) or spectral properties (difficult)
difficulty: background noise, levels vary
vary the noise threshold: threshold = running average + hysteresis (see the sketch after this slide)
if above threshold, increase the running average by one for each block
if below threshold, update the running average
speech has soft (unvoiced) beginnings and endings → hang-over, pre-talkspurt burst
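
The adaptive threshold described above can be sketched in C roughly as follows; the constants (hysteresis margin, hangover length, averaging factor) are illustrative assumptions, not values from any standard.

#include <stdint.h>
#include <stdlib.h>

#define VAD_HYSTERESIS  512   /* margin above the noise estimate (illustrative) */
#define VAD_HANGOVER    10    /* keep "speech" for 10 blocks (~200 ms at 20 ms) */

struct vad {
    long noise_avg;   /* running estimate of the background energy */
    int  hangover;    /* blocks left before declaring silence */
};

/* Mean absolute amplitude of one block of linear samples. */
static long block_energy(const int16_t *block, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += labs((long)block[i]);
    return sum / n;
}

/* Returns 1 if the block should be transmitted as speech, 0 if silence. */
static int vad_decide(struct vad *v, const int16_t *block, int n)
{
    long e = block_energy(block, n);

    if (e > v->noise_avg + VAD_HYSTERESIS) {
        v->noise_avg += 1;            /* creep up slowly while talking */
        v->hangover = VAD_HANGOVER;
        return 1;
    }
    /* Below threshold: track the background level. */
    v->noise_avg = (7 * v->noise_avg + e) / 8;
    if (v->hangover > 0) {
        v->hangover--;                /* hangover: still send this block */
        return 1;
    }
    return 0;
}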

Audio 12 Speech codecs
waveform codecs exploit sample correlation: 24-32 kb/s
linear predictive (vocoder) on frames of 10-30 ms (stationary): remove correlation, error is white noise
vector quantization
hybrid, analysis-by-synthesis
entropy coding: frequent values have shorter codes
runlength coding

Audio 13 Digital audio: compression

coding            kb/s       MOS   use
LPC-10            2.4        2.3   robotic, secure telephone
G.723.1           5.3/6.3    3.8   videotelephony (room for video)
GSM HR            5.6        3.5   GSM 2.5G networks
IS-641            7.4        4.0   TDMA (N. America) mobile (new)
IS-54/136         7.95       3.5   TDMA (N. America) mobile (old)
G.729             8.0        4.0   mobile telephony
GSM EFR           12.2       4.0   GSM 2.5G
GSM               13.0       3.5   European mobile phone
G.728             16.0       4.0   low-delay
G.726             16-40            low-complexity (ADPCM)
G.726             32         4.1   low-complexity (ADPCM)
DVI               32.0             toll-quality (Intel, Microsoft)
G.722             64.0             7 kHz codec (subband)
G.711             64.0       4.5   telephone (µ-law, A-law)
MPEG L3           56-128     N/A   CD stereo
16 bit/44.1 kHz   1411             compact disc

Audio 14 (Adaptive) Differential Pulse Code Modulation
[block diagram: input signal minus estimate → error signal → fixed or adaptive quantizer → channel; an inverse quantizer and a linear predictor (fixed or adaptive coefficients) form the estimate and the reconstructed signal]
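
The predict/quantize feedback loop in this diagram can be illustrated with a deliberately simplified DPCM sketch in C: a fixed first-order predictor and a uniform quantizer stand in for the adaptive versions that G.726 actually uses, so this is illustrative only.

#include <stdint.h>

#define DPCM_STEP 256   /* quantizer step size (illustrative) */

struct dpcm { int predicted; };   /* predictor state, shared by both ends */

static int clamp16(int x)
{
    if (x >  32767) return  32767;
    if (x < -32768) return -32768;
    return x;
}

/* Encode one sample: return the quantized error index to put on the channel. */
static int dpcm_encode(struct dpcm *s, int16_t sample)
{
    int error = sample - s->predicted;
    int code  = error / DPCM_STEP;              /* quantize the prediction error */
    /* Update the predictor with the *reconstructed* value, exactly as the
     * decoder will, so encoder and decoder stay in lockstep. */
    s->predicted = clamp16(s->predicted + code * DPCM_STEP);
    return code;
}

/* Decode one error index back into a sample. */
static int16_t dpcm_decode(struct dpcm *s, int code)
{
    s->predicted = clamp16(s->predicted + code * DPCM_STEP);
    return (int16_t)s->predicted;
}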

Audio 15 Linear predictive codec
[block diagram: impulse generator (pitch frequency f) or white-noise generator driving a vocal-tract filter]
E.g., LPC-10 at 2.4 kb/s:
sampling rate: 8 kHz
frame length: 180 samples = 22.5 ms
linear predictive filter: 10 coefficients = 42 bits
pitch and voicing: 7 bits
gain information: 5 bits

Audio 16 Linear predictive codec
[block diagram: excitation passed through a chain of delay elements (D) and summers, i.e., a linear (IIR) filter, to produce speech]
infinite impulse response (IIR) vs. finite impulse response (FIR)
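
A minimal C sketch of the synthesis (all-pole, IIR) filter shown in the diagram, computing y(n) = e(n) + sum_k a_k y(n-k) from an excitation signal. Real decoders use fixed-point arithmetic and per-frame coefficient interpolation, so this is illustrative only.

#define LPC_ORDER 10

static void lpc_synthesize(const double a[LPC_ORDER],   /* predictor coefficients */
                           const double *excitation,    /* e(n), len samples */
                           double *out,                 /* y(n), len samples */
                           int len)
{
    double history[LPC_ORDER] = {0};   /* y(n-1) ... y(n-P) */

    for (int n = 0; n < len; n++) {
        double y = excitation[n];
        for (int k = 0; k < LPC_ORDER; k++)
            y += a[k] * history[k];
        /* shift the delay line (the "D" boxes in the diagram) */
        for (int k = LPC_ORDER - 1; k > 0; k--)
            history[k] = history[k - 1];
        history[0] = y;
        out[n] = y;
    }
}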

Audio 17 G.723 audio codec
analysis-by-synthesis codec
5.3 or 6.3 kb/s bit rate
30 ms frames with 7.5 ms look-ahead
MOS of 3.98
requires 40% of a 100 MHz Pentium, or 22 DSP MIPS and 16 kb of code
popular for videoconferencing with modems

Audio 18 G.729 audio codec
8 kb/s bit rate
10 ms frames with 5 ms look-ahead
CS-ACELP (conjugate-structure algebraic code-excited linear prediction): filter coefficients plus a codebook for excitations
requires 25% of a 100 MHz Pentium
MOS of 3.7/3.2/2.8 with 0/3/5% packet loss
deals with frame erasure
Annex A: low complexity; Annex B: VAD; Annex D: 6.4 kb/s; Annex E: 11.8 kb/s

Audio 19 Audio perceptual encoding
ear ≈ 24-26 overlapping bandpass filters (100 Hz to 5,000 Hz)
maximum sensitivity between 1,000 and 5,000 Hz
a tone is masked by a stronger signal in close frequency proximity
ISO MPEG-1 Layer I, II, III: analysis filter bank
AAC: 64 kb/s mono for almost CD quality

Audio 20 Distortion: signal-to-noise ratio
error (noise): $r(n) = x(n) - y(n)$, with variances $\sigma_x^2$, $\sigma_y^2$, $\sigma_r^2$
power for a signal with pdf $p(x)$ and range $-V \ldots +V$:
$\sigma_x^2 = \int_{-V}^{+V} (x - \bar{x})^2 \, p(x) \, dx$
$\sigma_u^2 = \frac{1}{M} \sum_{n=1}^{M} u^2(n)$
$\mathrm{SNR} = 6.02\,N + 1.73$ dB for a uniform quantizer with $N$ bits
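
As a quick sanity check of this rule of thumb for the two sample sizes used earlier:

\[
N = 8:\ \mathrm{SNR} \approx 6.02 \cdot 8 + 1.73 \approx 49.9\ \mathrm{dB},
\qquad
N = 16:\ \mathrm{SNR} \approx 6.02 \cdot 16 + 1.73 \approx 98.1\ \mathrm{dB}.
\]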

Audio 21 Distortion measures
SNR is not a good measure of perceptual quality
segmental SNR: time-averaged over blocks (say, 16 ms); see the sketch below
frequency weighting
subjective measures: A-B preference
subjective SNR: comparison with additive noise
MOS (mean opinion score, 1-5), DRT, DAM, ...
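
A hedged C sketch of segmental SNR over 16-ms blocks (128 samples at 8 kHz): average the per-block SNR in dB instead of computing one global ratio. Practical implementations usually also clamp the per-block SNR to a range such as 0-35 dB and skip silent blocks; those refinements are omitted here.

#include <math.h>

#define SEG_LEN 128   /* 16 ms at 8 kHz */

static double segmental_snr(const double *x,   /* reference signal */
                            const double *y,   /* coded/degraded signal */
                            int len)
{
    double sum_db = 0.0;
    int blocks = 0;

    for (int start = 0; start + SEG_LEN <= len; start += SEG_LEN) {
        double sig = 0.0, noise = 0.0;
        for (int i = start; i < start + SEG_LEN; i++) {
            double r = x[i] - y[i];       /* error (noise) r(n) = x(n) - y(n) */
            sig   += x[i] * x[i];
            noise += r * r;
        }
        if (noise > 0.0 && sig > 0.0) {
            sum_db += 10.0 * log10(sig / noise);
            blocks++;
        }
    }
    return blocks ? sum_db / blocks : 0.0;
}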

Audio 22 MOS vs. packet loss
[figure: MOS (1.5-4.5) vs. loss probability p_u (0-20%) for G.711 with Bernoulli losses (10 ms packets), G.711 with bursty losses (10 ms) and G.729 with bursty losses (p_c = 30%, 20 ms packets)]

Audio 23 Objective speech quality measurement
approximate human perception of noise and other distortions
distortion due to encoding and packet loss (gaps, interpolation at the decoder)
examples: PSQM (P.861), PESQ (P.862), MNB, EMBSD
compare a reference signal to the distorted signal
either generate MOS scores or distance metrics
much cheaper than subjective tests
only for telephone-quality audio so far

Audio 24 Objective quality measures
PSQM: perceptual distance; can't handle delay offsets
PESQ: MOS scores; automatically detects and compensates for time-varying delay offsets between the reference and the degraded signal
time-frequency mapping (FFT)
frequency warping from the Hertz scale to the critical-band domain (Bark spectrum)
noise disturbance computed as the difference in compressed loudness (sone) in each band between the two signals, with threshold masking
asymmetry modeling (adding an unrelated frequency component is worse than omitting a component of the reference signal)

Audio 25 Objective vs. subjective MOS
Objective MOS tools don't always handle loss impairments correctly:
[scatter plot: objective perceptual quality (0-12) vs. subjective MOS (1.5-4.5) for EMBSD, PSQM, PSQM+, MNB1 and MNB2]

Audio 26 Audio traffic models
talkspurt: constant bit rate, one packet every 20...100 ms; mean duration 1.67 s
silence period: usually nothing is sent (maybe transmit a background-noise value); mean 1.34 s for telephone conversation
both roughly exponentially distributed (a simulation sketch follows below)
double talk for hand-off
may vary between conversations... only holds in aggregate
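
A small illustrative simulation of this on-off model in C: exponentially distributed talkspurts and silence periods with the means above, and one packet every 20 ms during talkspurts. The means and packet interval come from the slide; everything else (horizon, random-number handling) is an assumption for illustration.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Exponentially distributed random variable with the given mean. */
static double exp_random(double mean)
{
    double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* u in (0,1) */
    return -mean * log(u);
}

int main(void)
{
    const double packet_interval = 0.020;   /* one packet per 20 ms while talking */
    double t = 0.0, horizon = 3600.0;       /* simulate one hour of conversation */
    long packets = 0;

    while (t < horizon) {
        double talk = exp_random(1.67);     /* talkspurt length (mean 1.67 s) */
        packets += (long)(talk / packet_interval);
        t += talk + exp_random(1.34);       /* plus silence period (mean 1.34 s) */
    }
    /* Roughly 1.67/(1.67+1.34) = 55% activity, i.e. ~55% of the CBR packet count. */
    printf("packets in %g s: %ld\n", horizon, packets);
    return 0;
}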

Audio 27 Multiplexing traffic
In a diff-serv buffer, with R = 0.5 = reserved/peak:
[figure: out-of-profile packet probability p_o (log scale, 1 down to 0.0001) vs. token-bucket buffer size B (in packets, 0-100), showing the effect of the multiplexing factor N (5, 30, 100) and the token rate R = 0.5; curves labeled expo, CDF and trace]
G.729B: about 42-43% silence

Audio 28 Audio on Solaris
cat sample.au > /dev/audio works; see samples in /usr/demo/sound/sounds/
audiocontrol to change ports and volume
audiotool for recording and playback
audioplay for playing back sound files

Audio 29 Audio on Solaris

#include <sys/audioio.h>
#include <fcntl.h>
#include <unistd.h>

audio_info_t ai;
int ac, au;

ac = open("/dev/audioctl", O_RDWR);
au = open("/dev/audio", O_RDWR);

AUDIO_INITINFO(&ai);                  /* initialize the info structure */
ai.record.port = AUDIO_MICROPHONE;
ai.play.port = AUDIO_SPEAKER;
ioctl(ac, AUDIO_SETINFO, &ai);

bytes = read(au, buffer, bytes);
write(au, buffer, bytes);

Careful when opening audio input - keep a bit bucket handy!

Audio 30 Audio on Solaris
read() blocks until audio has been read; set the device to non-blocking if only the currently available audio is needed
write() returns when the data has been copied to the device, but playout may last longer
typical loop for the world's most expensive microphone amplifier:

while (1) {
    b = read(au, buffer, bytes);
    /* audio processing */
    write(au, buffer, b);
}

ioctl(au, AUDIO_DRAIN, 0); blocks until the audio has been played out
use select() to handle both network and audio input

Audio 31 Event-based programs
read() is blocking → a server would only work with a single socket
audio and network input need I/O multiplexing → event-based programming
also need to handle time-outs and connection requests
all events (mouse clicks, window events, etc.) handled by an event loop:
  do { wait for event(s); handle the event (hopefully short) } forever
harder to maintain state, use recursion
alternative 1: interrupts (signals), called at any time
alternative 2: threads (separate scheduling, same address space)

Audio 32 Multiplexing with select()

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

block until one or more file descriptors have something to be read or written, or an exception, or the timeout expires
set the bit mask of descriptors to watch using FD_SET()
returns with the bits for ready descriptors set; check with FD_ISSET()
cannot specify the amount of data that must be ready
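
A minimal sketch of such a select() loop serving both the audio device and a network socket; handle_audio() and handle_packet() are hypothetical application callbacks defined here only as placeholders, not library functions.

#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* hypothetical application callbacks (placeholders assumed by this sketch) */
static void handle_audio(int fd)  { (void)fd; /* read and process one audio block */ }
static void handle_packet(int fd) { (void)fd; /* read and process one packet */ }

static void event_loop(int au, int sock)
{
    for (;;) {
        fd_set readfds;
        struct timeval timeout = { 1, 0 };          /* wake up at least once a second */
        int maxfd = (au > sock ? au : sock) + 1;

        FD_ZERO(&readfds);
        FD_SET(au, &readfds);                       /* audio device */
        FD_SET(sock, &readfds);                     /* network socket */

        int n = select(maxfd, &readfds, NULL, NULL, &timeout);
        if (n < 0)
            break;                                  /* error (EINTR handling omitted) */
        if (n == 0)
            continue;                               /* timeout: update timers, etc. */
        if (FD_ISSET(au, &readfds))
            handle_audio(au);
        if (FD_ISSET(sock, &readfds))
            handle_packet(sock);
    }
}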

Audio 33 Audio timing
Need to write a block of audio to the speaker every t ms (t = 20...100 ms)
1. timer: OS overhead, may not be accurate; errors accumulate (time between timers); the clock may differ from the audio sampling clock
2. use audio input: for every block read, write one audio block → stays in sync
3. but: doesn't work for half-duplex audio cards

Audio 34 Audio on Linux
/dev/audio for a µ-law device
/dev/dsp for general samples
aumix to configure the mixer
use fuser -v /dev/dsp to find out who's using the device
watch for endianness - Linux supports both LE and BE
guide at http://www.4front-tech.com/pguide/audio.html

Audio 35 Audio on Linux: example

#include <sys/ioctl.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/soundcard.h>

#define BUF_SIZE 4096

int format = AFMT_S16_LE, stereo = 1, speed = 11025;

if ((audio_fd = open("/dev/dsp", open_mode, 0)) == -1) { /* error */ }
if (ioctl(audio_fd, SNDCTL_DSP_SETFMT, &format) == -1) { /* error */ }
if (ioctl(audio_fd, SNDCTL_DSP_STEREO, &stereo) == -1) { /* error */ }
if (ioctl(audio_fd, SNDCTL_DSP_SPEED, &speed) == -1) { /* error */ }
/* set to full duplex */
if (ioctl(audio_fd, SNDCTL_DSP_SETDUPLEX, 0) == -1) { /* error */ }

Audio 36 Java sound interface
Java Sound API (java.sun.com/products/java-media/sound/)
higher layer: Java Media Framework (JMF), includes RTP
javax.sound.sampled.spi as service provider

Audio 37 Java sound API: example

TargetDataLine line;
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
    // format is an AudioFormat object

if (!AudioSystem.isLineSupported(info)) {
    // Handle the error...
}

// Obtain and open the line.
try {
    line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(format, bufferSize);
} catch (LineUnavailableException ex) {
    // Handle the error...
}

// Begin audio capture.
line.start();
byte[] data = new byte[line.getBufferSize() / 5];
int numBytesRead = line.read(data, offset, data.length);

Audio 38 Audio mixing
for conferences: several concurrent talkers (+ open mikes), at least temporarily
adding two voices ≈ twice as loud (+3 dB)
mixing audio = adding linear samples (a sketch follows below)
for waveform-encoded (e.g., µ-law) samples: use a lookup table
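
A small C sketch of mixing two linear PCM blocks with saturation (clipping to the 16-bit range), since the sum of two loud talkers would otherwise wrap around. For µ-law input one would first expand each sample to linear (e.g., via a 256-entry table), mix, then compress again; that step is omitted here.

#include <stdint.h>

/* Add two linear samples and clip the result to the 16-bit range. */
static int16_t mix2(int16_t a, int16_t b)
{
    int sum = (int)a + (int)b;
    if (sum >  32767) sum =  32767;
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}

/* Mix two blocks of n samples into out (out may alias a or b). */
static void mix_blocks(const int16_t *a, const int16_t *b, int16_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = mix2(a[i], b[i]);
}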

Audio 39 References
J. Bellamy, Digital Telephony, 2nd ed., Wiley, 1991.
N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, 1984.
R. Steinmetz and K. Nahrstedt, Multimedia: Computing, Communications and Applications, Prentice-Hall, Upper Saddle River, New Jersey, 1995.
O. Hersent, D. Gurle and J.-P. Petit, IP Telephony, Addison-Wesley, 2000.
L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.
See also http://www.cs.columbia.edu/~hgs/audio