CISC 7610 Lecture 3 Multimedia data and data formats

Similar documents
Audio-coding standards

Audio-coding standards

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Mpeg 1 layer 3 (mp3) general overview

Lecture 16 Perceptual Audio Coding

5: Music Compression. Music Coding. Mark Handley

CSCD 443/533 Advanced Networks Fall 2017

Chapter 14 MPEG Audio Compression

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

DigiPoints Volume 1. Student Workbook. Module 8 Digital Compression

Audio Compression. Audio Compression. Absolute Threshold. CD quality audio:

MPEG-1. Overview of MPEG-1 1 Standard. Introduction to perceptual and entropy codings

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

What is multimedia? Multimedia. Continuous media. Most common media types. Continuous media processing. Interactivity. What is multimedia?

Appendix 4. Audio coding algorithms

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

EE482: Digital Signal Processing Applications

Multimedia. What is multimedia? Media types. Interchange formats. + Text +Graphics +Audio +Image +Video. Petri Vuorimaa 1

Principles of Audio Coding

Ch. 5: Audio Compression Multimedia Systems

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

Lecture 8 JPEG Compression (Part 3)

Computer and Machine Vision

Lossy compression. CSCI 470: Web Science Keith Vertanen

Video Compression An Introduction

Optical Storage Technology. MPEG Data Compression

Lossy compression CSCI 470: Web Science Keith Vertanen Copyright 2013

VIDEO COMPRESSION STANDARDS

Image, video and audio coding concepts. Roadmap. Rationale. Stefan Alfredsson. (based on material by Johan Garcia)

Compression; Error detection & correction

Features. Sequential encoding. Progressive encoding. Hierarchical encoding. Lossless encoding using a different strategy

Advanced Video Coding: The new H.264 video compression standard

Audio Coding and MP3

Fundamentals of Perceptual Audio Encoding. Craig Lewiston HST.723 Lab II 3/23/06

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013

Lecture 5: Compression I. This Week s Schedule

Image and Video Coding I: Fundamentals

Multimedia Communications. Audio coding

Index. 1. Motivation 2. Background 3. JPEG Compression The Discrete Cosine Transformation Quantization Coding 4. MPEG 5.

Lecture 7: Audio Compression & Coding

Audio and video compression

Digital Video Processing

Audio Fundamentals, Compression Techniques & Standards. Hamid R. Rabiee Mostafa Salehi, Fatemeh Dabiran, Hoda Ayatollahi Spring 2011

Introduction ti to JPEG

ELL 788 Computational Perception & Cognition July November 2015

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Senior Manager, CE Technology Dolby Australia Pty Ltd

Image and Video Coding I: Fundamentals

Compressed Audio Demystified by Hendrik Gideonse and Connor Smith. All Rights Reserved.

Multimedia Standards

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

JPEG. Wikipedia: Felis_silvestris_silvestris.jpg, Michael Gäbler CC BY 3.0

Compression Part 2 Lossy Image Compression (JPEG) Norm Zeck

Introducing Audio Signal Processing & Audio Coding. Dr Michael Mason Snr Staff Eng., Team Lead (Applied Research) Dolby Australia Pty Ltd

Bit or Noise Allocation

Digital Image Representation Image Compression

Video coding. Concepts and notations.

Image/video compression: howto? Aline ROUMY INRIA Rennes

Figure 1. Generic Encoder. Window. Spectral Analysis. Psychoacoustic Model. Quantize. Pack Data into Frames. Additional Coding.

CS 335 Graphics and Multimedia. Image Compression

Lecture 3 Image and Video (MPEG) Coding

Bluray (

Lecture 8 JPEG Compression (Part 3)

DAB. Digital Audio Broadcasting

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

CMPT 365 Multimedia Systems. Media Compression - Image

Digital video coding systems MPEG-1/2 Video

CS 074 The Digital World. Digital Audio

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Georgios Tziritas Computer Science Department

VC 12/13 T16 Video Compression

Parametric Coding of High-Quality Audio

Video Compression MPEG-4. Market s requirements for Video compression standard

The MPEG-4 General Audio Coder

2.4 Audio Compression

Compression; Error detection & correction

Data Compression. Audio compression

PREFACE...XIII ACKNOWLEDGEMENTS...XV

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Cross Layer Protocol Design

MULTIMEDIA AND CODING

Multimedia Systems Speech I Mahdi Amiri February 2011 Sharif University of Technology

Networking Applications

AN AUDIO WATERMARKING SCHEME ROBUST TO MPEG AUDIO COMPRESSION

( ) ; For N=1: g 1. g n

Audio Coding Standards

CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology

Compression of Stereo Images using a Huffman-Zip Scheme

Fundamentals of Video Compression. Video Compression

<< WILL FILL IN THESE SECTIONS THIS WEEK to provide sufficient background>>

JPEG Compression. What is JPEG?

7.5 Dictionary-based Coding

Principles of MPEG audio compression

Interactive Progressive Encoding System For Transmission of Complex Images

Multimedia Networking ECE 599

Introduction to Video Coding

Port of a Fixed Point MPEG-2 AAC Encoder on a ARM Platform

The Gullibility of Human Senses

Transcription:

CISC 7610 Lecture 3 Multimedia data and data formats Topics: Perceptual limits of multimedia data JPEG encoding of images MPEG encoding of audio MPEG and H.264 encoding of video

Multimedia data: Perceptual limits Hearing 20 20,000 Hz frequency range 120 db dynamic range of loudness (1,000,000 : 1) Vision 30 frames per second 300 dots per inch at one foot viewing distance 226M highest-res pixels to cover field of view (Deering, 1998) 140 db dynamic range (10,000,000 : 1) over time

Audio can be fully captured Hearing 20 20,000 Hz frequency range 140 db dynamic range of loudness (10,000,000 : 1) CD quality recording Sampling rate: 44,100 Hz max freq 22,050 Hz Bit depth: 16 bits dynamic range 96 db Bit rate: 44,100 samps/s x 16 bits x 2 chan. = 1.4 Mb/s Typical MP3 compressed recording Bit rate: 128 kb/s (11x compression)

Video streams can still be improved Vision 30 frames per second 300 dots per inch at one foot viewing distance 226M highest-res pixels to cover field of view 140 db brightness dynamic range (10,000,000 : 1) over time HD broadcast quality video = 1.5 Gb/s 1920 x 1080 pixels/frame x 30 frames/s x 24 bits/pixel Typical H.264 compressed recording = 30 Mb/s (>50x) 25 GB Blu-ray disc holds 2 hours, including audio

Compression lets us store and process these data efficiently Remove data that are redundant or irrelevant Redundant: implicit in remaining data Can be fully reconstructed (lossless compression) Irrelevant: unique but unnecessary For example: imperceptible to humans (lossy compression) (estimated) 130 120 110 100 90 80 70 60 50 40 30 100 phon 20 10 (threshold) 0-10 10 100 1000 10k 100k Equal-loudness contours (red) (from ISO 226:2003 revision) Fletcher Munson curves shown (blue) for comparison 80 60 40 20

Considerations for compression Latency: Amount of preceding signal that needs to be observed to compress a given sample Important for real-time applications Locality: Amount of decoded signal that would be affected by changing one bit in encoded signal Important for error robustness Generality: Variety of signals that can be encoded (efficiently) by a given encoder-decoder Decoder complexity vs encoder complexity

Image compression: JPEG encoding 8x8 Blocks YCbCr DCT DesertWall.jpg Encoding Quantization Images from: http://www.ams.org/samplings/feature-column/fcarc-image-compression

Break into 8x8 pixel blocks

Break into 8x8 pixel blocks

Break into 8x8 pixel blocks

Break into 8x8 pixel blocks

Transform from RGB to YCbCr R G B Y Cb Cr

Transform from RGB to YCbCr

Discrete cosine transform (DCT) Forward: F ω = 1 7 2 C ω x=0 f x cos( π(2 x+1)ω 16 ) Inverse: f x = 1 C 2 ω F ω cos( ω=0 7 π(2 x+1)ω 16 ) ω=0 ω=1 First few bases ω=2 ω=3

2D DCT is a 1D DCT in x and y Forward: F u, v = 1 7 4 C uc v x=0 7 y=0 f x, y cos( π(2 x+1)u 16 )cos( π(2 y+1)v 16 ) Bases (cos(...)):

Quantize 2D DCT coefficients Global quality parameter Frequency-dependent sensitivity parameter F ' u, v =round ( F u, v αq u, v ) Bigger Q or α means fewer quantized values

Quantized DCT coefficients for each channel Y Cb Cr Y Cb Cr

Quantized coefficients are losslessly encoded Zig-zag pattern turns 2D matrix into 1D sequence Top left (DC) is encoded relative to previous block Rest of sequence is run-length encoded Each non-zero entry is coded as its value and the number of preceding zeros These symbols are Huffman coded

Reconstructed YCbCr channels Y Cb Cr Y Cb Cr

Reconstructed blocks Orig Orig Recon (Q=50) Recon (Q=10)

Reconstructed images Q=50 size=32kb Q=10 Size=1.7kB

Image compression: JPEG encoding For more information, see: Austin, D. Image Compression: Seeing What's Not There. AMS feature column, Jan 2008. Wallace, Gregory K. "The JPEG still picture compression standard." IEEE Transactions on Consumer Electronics, 38.1, 1992 8x8 Blocks YCbCr DCT DesertWall.jpg Encoding Quantization

Audio coding: two types Based on properties of sound production Properties of the voice or other sound source Control a sound synthesizer accurately enough May only work for source of interest Based on properties of sound reception Properties of the ear Represent sound accurately enough Works for any sound Images from: https://www.ee.columbia.edu/~dpwe/e6820/lectures/l07-coding.pdf

Perceptual audio coding: MPEG Audio We saw with JPEG that quantization is key to coding efficiency Approximate the true image with one that is close, but shorter to encode Perceptual audio codecs approximate the true sound with one that is close, but shorter to encode Two signals that are imperceptibly different should have the same quantized representation

Quantization in audio Represent waveform with discrete levels Q[x] 5 4 3 2 1 0-1 -2-3 -4-5 -5-4 -3-2 -1 0 1 2 3 4 5 x 6 4 2 0-2 Q[x[n]] error e[n] = x[n] - Q[x[n]] 0 5 10 15 20 25 30 35 40 Equivalent to adding an error of uniform noise x[n] Q[x[n]] = round(x[n]/d) p(e[n]) -D /2 +D /2

Quantization in audio Represent waveform with discrete levels Q[x] 5 4 3 2 1 0-1 -2-3 -4-5 -5-4 -3-2 -1 0 1 2 3 4 5 x 6 4 2 0-2 Q[x[n]] error e[n] = x[n] - Q[x[n]] 0 5 10 15 20 25 30 35 40 Equivalent to adding an error of uniform noise x[n] level / db 0-20 -40 Quantized at 7 bits -60-80 0 1000 2000 3000 4000 5000 6000 7000 freq / Hz

MPEG Audio basic idea Break audio into different frequencies Quantize each frequency as much as possible While remaining imperceptible Hide quantization behind louder signals Need psychoacoustic model of hiding Scale indices Scale normalize Quantize Input 32 band polyphase filterbank Scale normalize Quantize Format & pack bitstream Bitstream Data frame Control & scales Psychoacoustic masking model Per-band masking margins 32 chans x 36 samples 24 ms

MPEG psychoacoustic model Model masking of sounds by tonal & noisy sounds Difficulties Masking is non-linear in frequency and intensity Complex-dynamic sounds not well understood SPL / db SPL / db SPL / db 60 50 40 30 20 10 0-10 60 50 40 30 20 10 0-10 60 50 40 30 20 10 0-10 Signal spectrum Tonal peaks 1 3 5 7 9 11 13 15 17 19 21 23 25 Masking spread Non-tonal energy 1 3 5 7 9 11 13 15 17 19 21 23 25 Resultant masking 1 3 5 7 9 11 13 15 17 19 21 23 25 freq / Bark

MPEG bit allocation Result of psychoacoustic model is maximum tolerable noise per subband Safe noise level required signal-to-noise ratio required number of bits For fixed bit-rates, may not be able to satisfy all subbands Masking tone level SNR ~ 6 B Masked threshold Safe noise level Quantization noise Subband N freq

MPEG Audio Layer III (MP3) Do an additional frequency analysis on subbands of layers I and II Adapts to content (better time or frequency resolution) Digital Audio Signal (PCM) (768 kbit/s) Subband 31 Filterbank 32 Subbands MDCT FFT 1024 Points 0 Window Switching Line 575 0 Psychoacoustic Model Distortion Control Loop Nonuniform Quantization Rate Control Loop Huffman Encoding Coding of Sideinformation Coded Audio Signal 192 kbit/s 32 kbit/s External Control

Perceptual audio coding For more information see: Pan, D. "A tutorial on MPEG/audio compression." IEEE multimedia 2, pp60-74, 1995. Painter, T., and A. Spanias. "Perceptual coding of digital audio." Proceedings of the IEEE, 88.4, pp451-515, 2000.

Video coding: MPEG2 and H.264 Take advantage of redundancy in space and time Predict pieces of frames from other frames Take advantage of irrelevance using quantization like JPEG Final lossless coding to squeeze out last bits Video codecs rely more heavily on analysis-bysynthesis than audio and image codecs

MPEG2 video encoding basics 0110 0101 1101 1 011 000010 Huffman Coding RGB _.. 0110110 0101100 1101111 0110 0101 1101 Quantization YUV Removespatial redundancy Blocks Macroblocks Removetemporal redundancy Zig Zag Scan DiscreteCosineTransform I B B P H Authoring Division Name File Name Security Notice (if required) Used in DVD, digital (HD) TV Image from: http://www.keysight.com/upload/cmc_upload/all/6c06mpegtutorial1.pdf

MPEG2 data is composed of streams

MPEG2 data is composed of streams Streams for different types of data Audio, video, program information Streams are multiplexed together Well-defined bistream syntax for keeping track of what came from where Different conceptual streams for coding, broadcast, disc-based storage Enforce synchronization through different clocks Information to make the stream seek-able

MPEG2 data is composed of streams Elementary stream (ES): e.g., Frames I-frame Variable P-frame data Variable Fixed Variable Fixed Variable Packetized ES (PES) PES Header I-frame data PES Header P-frame data Transport stream TS Header Fixed Fixed Fixed Fixed PES data TS Header PES data TS Header PES data TS Header PES data

H.264 analysis-by-synthesis

H.264 Advanced video coding Also known as MPEG4 part 10 Almost same stream format as MPEG2 But about 2x compression advantage Used in Blu-ray, YouTube Widespread browser support Improvements over MPEG2 Better support for periodic movement Better support for small motion blocks Better lossless coding (arithmetic coding)

Video coding For more information: Hewlett-Packard. "MPEG-2: The Basics of How It Works." 1999. Navarro, A. Introduction to digital television. Slideshare, 2012 http://www.slideshare.net/chetanrao2012/introdgitaltv Wiegand, Thomas, and Gary J. Sullivan. "The H. 264/AVC video coding standard." IEEE Signal Processing Magazine, 24.2, pp148-153, 2007.

Summary Compression lets us store data efficiently Remove data that are redundant or irrelevant Redundant: implicit in remaining data Can be fully reconstructed (lossless compression) Irrelevant: unique but unnecessary For example: imperceptible to humans (lossy compression)