Audio and video compression


4.1 Introduction

Unlike text and images, both audio and most video signals are continuously varying analog signals. Compression algorithms associated with digitized audio and video are different from those associated with text and images.

4.2 Audio compression

Speech and non-speech signals are encoded using different approaches.

4.2.1 Speech coding

Differential pulse code modulation (DPCM) is a derivative of standard PCM. It exploits the fact that, for most audio signals, the range of the differences in amplitude between successive samples of the audio waveform is less than the range of the actual sample amplitudes, so the differences can be encoded with fewer bits. (G.711 itself uses standard PCM with companding.)

In adaptive differential PCM (ADPCM), fewer bits are used to encode smaller difference values than larger ones. (G.721, G.722 & G.726)

DPCM and ADPCM can also be used to encode non-speech signals.

In linear predictive coding (LPC), a speech signal is analyzed to extract its perceptual features, including pitch and formant frequencies, and these features are then encoded. (LPC-10, G.728, G.723.1 & G.729)

CYH/MMT/CmpAV/p.1 CYH/MMT/CmpAV/p.2
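The DPCM idea in section 4.2.1 can be sketched in a few lines of Python. This is a minimal illustration only: it stores exact differences (no quantizer), and the 200 Hz test tone, the 8 kHz sampling rate and the function names are assumptions chosen for the example, not part of any of the standards named above.

```python
import math

def dpcm_encode(samples):
    """Return the first sample followed by the successive differences."""
    diffs = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        diffs.append(cur - prev)
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    samples = [diffs[0]]
    for d in diffs[1:]:
        samples.append(samples[-1] + d)
    return samples

# Hypothetical test signal: a 200 Hz tone sampled at 8 kHz, amplitude 100.
samples = [round(100 * math.sin(2 * math.pi * 200 * n / 8000)) for n in range(80)]
diffs = dpcm_encode(samples)

# The differences span a much smaller range than the samples themselves,
# so they could be represented with fewer bits per value.
sample_range = max(samples) - min(samples)
diff_range = max(diffs[1:]) - min(diffs[1:])
print(sample_range, diff_range)
```

A practical coder would quantize the differences and predict from the reconstructed signal; that step is left out here to keep the range comparison visible.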

Summary of speech compression standards and their applications:

Standard   Compression technique       Compressed bit rate (kbps)   Quality     Example applications
G.711      PCM + companding            64                           Good        PSTN/ISDN telephony
G.721      ADPCM                       32                           Good        Telephony at reduced bit rates
                                       16                           Fair
G.722      ADPCM with subband coding   64                           Excellent   Audio conferencing
                                       56/48                        Good
G.726      ADPCM with subband coding   40/32                        Good        General telephony at reduced bit rates
                                       24/16                        Fair
LPC-10     LPC                         2.4/1.2                      Poor        Telephony in military networks
G.728      Code-excited LPC (CELP)     16                           Good        Low delay/low bit rate telephony
G.729      CELP                        8                            Good        Telephony in cellular networks
G.729(A)   CELP                        8                            Good        Simultaneous telephony and data (fax)
G.723.1    CELP                        6.3                          Good        Video and internet telephony
                                       5.3                          Fair

4.2.2 Perceptual coding

The audio signal is coded based on a psychoacoustic model which describes the limitations of the human ear. The ear is more sensitive to some signals than to others.

- Frequency masking: a strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.
- Temporal masking: when the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.
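Frequency masking can be illustrated with a deliberately crude toy model. The sketch below treats a spectral component as inaudible when a much louder component lies close to it in frequency. The 200 Hz neighborhood and the 20 dB level gap are hypothetical values picked for the example; a real psychoacoustic model uses frequency-dependent masking curves that the notes do not specify.

```python
def audible_components(components, mask_width_hz=200.0, mask_gap_db=20.0):
    """Return the components that are not masked by a louder neighbor.

    components: list of (frequency_hz, level_db) pairs.
    A component counts as masked when another component within
    mask_width_hz is louder by at least mask_gap_db (toy thresholds).
    """
    kept = []
    for freq, level in components:
        masked = any(
            abs(freq - other_freq) <= mask_width_hz
            and other_level - level >= mask_gap_db
            for other_freq, other_level in components
            if (other_freq, other_level) != (freq, level)
        )
        if not masked:
            kept.append((freq, level))
    return kept

# A strong 1 kHz tone, a weak tone next to it, and a weak tone far away.
tones = [(1000.0, 70.0), (1100.0, 40.0), (3000.0, 40.0)]
kept = audible_components(tones)
print(kept)  # the 1100 Hz tone is dropped, the 3000 Hz tone survives
```

A perceptual coder would spend no bits on the masked component, which is where the saving comes from.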


MPEG audio coders

An international standard based on this approach is defined in ISO Recommendation 11172-3.

Summary of MPEG layer 1, 2 and 3 perceptual encoders:

Layer   Application                                    Compressed bit rate   Quality                                   Example input-to-output delay
1       Digital audio cassette                         32-448 kbps           Hi-fi quality at 192 kbps per channel     20 ms
2       Digital audio and digital video broadcasting   32-192 kbps           Near CD-quality at 128 kbps per channel   40 ms
3       CD-quality audio over low bit rate channel     64 kbps               CD-quality at 64 kbps per channel         60 ms

A higher layer makes better use of the psychoacoustic model, and hence a higher compression rate can be achieved. The 3 layers require increasing levels of complexity (and hence cost) to achieve a particular perceived quality. The choice of layer and bit rate is therefore often a compromise between the desired perceived quality and the available bit rate.
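To put the per-channel bit rates in the layer summary into perspective, they can be compared with uncompressed CD audio. The calculation below assumes the usual CD parameters (44.1 kHz sampling, 16 bits per sample), which are not stated in the notes themselves.

```python
# Assumed CD parameters (not given in the notes): 44.1 kHz, 16 bits per sample.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
cd_kbps_per_channel = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE / 1000

# Per-channel bit rates at which each layer's quality is quoted.
layer_kbps = {1: 192, 2: 128, 3: 64}

ratios = {layer: cd_kbps_per_channel / kbps for layer, kbps in layer_kbps.items()}
for layer, ratio in ratios.items():
    print(f"Layer {layer}: about {ratio:.1f}:1 relative to CD PCM")
```

Under these assumptions the compression ratio roughly doubles from one layer to the next, in line with the remark that a higher layer makes better use of the psychoacoustic model.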

Dolby audio coders

- In AC-1, the bit allocation information of the quantized subband samples is directly encoded and embedded in the bit-stream.
- In AC-2, this information is indirectly encoded and has to be estimated at the decoder.
- In AC-3, additional information is transmitted to compensate for the estimation error.

The acoustic quality of the MPEG and Dolby audio coders was found to be comparable.

Summary of compression standards for general audio:

Standard              Compressed bit rate   Quality                     Example applications
MPEG audio Layer 1    32-448 kbps           Hi-fi quality at 192 kbps   Digital audio cassettes
MPEG audio Layer 2    32-192 kbps           Near CD at 128 kbps         Digital audio and digital video broadcasting
MPEG audio Layer 3    64 kbps               CD quality                  CD-quality over low bit rate channels
Dolby AC-1            512 kbps              Hi-fi quality               Radio and television satellite relays
Dolby AC-2            256 kbps              Hi-fi quality               PC sound cards
Dolby AC-3            192 kbps              Near CD quality             Digital video broadcasting

4.3 Video compression

There is not just a single standard associated with video but rather a range of standards, each targeted at a particular application domain.

4.3.1 Video compression principles

Video is simply a sequence of digitized pictures, and it is also referred to as moving pictures. A video sequence can be encoded frame by frame with the JPEG algorithm; this approach is known as motion JPEG.

In addition to the spatial redundancy present in each frame, considerable redundancy is often present between successive frames, which motion JPEG does not exploit. To exploit it, frames are classified as one of three basic frame types (I-, P- and B-frames) and each type is encoded differently.

I-frames:

- I-frames are encoded independently using the JPEG algorithm.
- I-frames are inserted into the output stream relatively frequently.
- I-frames are used as access points for random access and FF/FR functionality in the bit stream.

P-frames:

- Frames are partitioned into blocks of size 16x16 (macroblocks).
- To encode a P-frame, the contents of each macroblock in the target frame are compared on a pixel-by-pixel basis with the contents of the reference frame to find a best-matched block of equal size. The reference frame can be a P- or I-frame.
- The (x,y) offset between the macroblock being encoded and the best-matched block is known as the motion vector.
- This motion-vector-searching process is known as motion estimation.
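The motion estimation step described for P-frames can be sketched as an exhaustive block-matching search. The sketch assumes a sum-of-absolute-differences cost and a small square search window; neither is prescribed by the notes, and the function names, the 12x12 test frames and the 4x4 block size used in the demonstration are hypothetical. The vector is reported as the offset from the block being encoded to its best match in the reference frame.

```python
def sad(target, reference, tx, ty, rx, ry, size):
    """Sum of absolute pixel differences between two size x size blocks."""
    return sum(
        abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )

def find_motion_vector(target, reference, tx, ty, size=16, search=7):
    """Full search around (tx, ty); returns ((dx, dy), cost) of the best match."""
    height, width = len(reference), len(reference[0])
    best_vector = (0, 0)
    best_cost = sad(target, reference, tx, ty, tx, ty, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = tx + dx, ty + dy
            # Only consider candidate blocks that lie fully inside the frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(target, reference, tx, ty, rx, ry, size)
                if cost < best_cost:
                    best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost

# Hypothetical frames: the target is the reference moved 2 pixels right, 1 down.
WIDTH = HEIGHT = 12
reference = [[16 * y + x for x in range(WIDTH)] for y in range(HEIGHT)]
target = [
    [reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(WIDTH)]
    for y in range(HEIGHT)
]

vector, cost = find_motion_vector(target, reference, tx=4, ty=4, size=4, search=3)
print(vector, cost)  # (-2, -1) 0: the block came from 2 pixels left, 1 pixel up
```

Full search is the simplest strategy; practical encoders usually use faster search patterns because the cost grows with the square of the search range.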

A prediction of the target frame is made from the reference frame based on the motion vectors obtained. The difference between the predicted frame and the actual target frame is known as the prediction error.

Motion compensation: additional bits are required to encode the prediction error so as to compensate for the difference if necessary.

B-frames:

- To encode a B-frame, any motion is estimated with reference to both the immediately preceding I- or P-frame and the immediately succeeding P- or I-frame.
- B-frames provide the highest level of compression.
- B-frames are not involved in the coding of other frames and hence they do not propagate errors.
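The relationship between prediction, prediction error and reconstruction can be made concrete with a small sketch. It assumes the B-frame prediction is formed by averaging the motion-compensated blocks from the preceding and succeeding reference frames, which is one option an encoder may use and is not spelled out in the notes. The 2x2 blocks and function names are hypothetical.

```python
def bidirectional_prediction(forward_block, backward_block):
    """Average the forward and backward predictions, rounding to nearest."""
    return [
        [(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(forward_block, backward_block)
    ]

def prediction_error(actual_block, predicted_block):
    """Pixel-wise difference that the encoder has to transmit."""
    return [
        [a - p for a, p in zip(a_row, p_row)]
        for a_row, p_row in zip(actual_block, predicted_block)
    ]

def reconstruct(predicted_block, error_block):
    """Decoder side: add the transmitted error back onto the prediction."""
    return [
        [p + e for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(predicted_block, error_block)
    ]

# Hypothetical motion-compensated blocks from the two reference frames.
from_previous = [[10, 20], [30, 40]]
from_next = [[14, 24], [34, 44]]
actual = [[12, 23], [32, 42]]

predicted = bidirectional_prediction(from_previous, from_next)
error = prediction_error(actual, predicted)
print(error)  # [[0, 1], [0, 0]]: mostly zeros, so it is cheap to encode
```

When the prediction is good, the error block is close to zero everywhere, which is why only a few additional bits are needed for it.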

The number of frames between successive I-frames is known as a group of pictures (GOP). The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span.

The order of encoding and transmission of the frames is changed to minimize the time required to decode the frames.

A fourth type of frame, known as a PB-frame, has also been defined. Two neighboring P- and B-frames are encoded as if they were a single frame.

A fifth type of frame, known as a D-frame, has been defined for use in movie/video-on-demand applications.
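The frame reordering mentioned above follows from the B-frame definition: a B-frame cannot be decoded until the reference frame that follows it in display order has arrived, so that reference is sent first. A minimal sketch, with a hypothetical frame-naming scheme (type letter plus display index):

```python
def coding_order(display_order):
    """Move each reference frame (I or P) ahead of the B-frames that precede it."""
    ordered, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            # Hold B-frames back until their succeeding reference is emitted.
            pending_b.append(frame)
        else:
            ordered.append(frame)
            ordered.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no later reference in this list; emit them last.
    ordered.extend(pending_b)
    return ordered

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(coding_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The decoder restores display order after decoding, using the frame timing information carried in the stream.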

Basic bitstream format:

- Type: type of frame (I, P or B)
- Address: identifies the location of the macroblock in the frame
- Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock
- Motion vector: encoded vector
- Block present: indicates which blocks in the macroblock are present

Typical figures of the compression ratios:

- I-frames: 10~20:1
- P-frames: 20~30:1
- B-frames: 30~50:1
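The per-frame-type ratios combine into an overall ratio that depends on the mix of frame types. The sketch below assumes one value from each quoted range (10:1, 20:1 and 50:1) and a hypothetical 12-frame pattern; both are illustrative choices rather than values fixed by any standard.

```python
def overall_compression_ratio(pattern, ratios):
    """Overall ratio for a frame pattern, assuming equal-size uncompressed frames."""
    # Each frame shrinks to 1/ratio of its original size.
    compressed_size = sum(1 / ratios[frame_type] for frame_type in pattern)
    return len(pattern) / compressed_size

# Assumed ratios, one picked from each typical range quoted above.
ratios = {"I": 10, "P": 20, "B": 50}

# Hypothetical pattern: 1 I-frame, 3 P-frames and 8 B-frames.
pattern = "IBBPBBPBBPBB"
overall = overall_compression_ratio(pattern, ratios)
print(f"{overall:.1f}:1")
```

With these assumptions the single I-frame accounts for about a quarter of the compressed data, which shows why inserting I-frames more often lowers the overall ratio.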

4.3.2 H.261

H.261 has been defined by the ITU-T for the provision of video telephony and videoconferencing services over an ISDN. It supports I- and P-frames only.

Encoding format:

- Type: indicates if the macroblock is intracoded or intercoded
- Address: identifies the location of the macroblock in the frame
- Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock
- Motion vector: encoded vector
- Coded block pattern: indicates which blocks in the macroblock are present
- Picture start code: indicates the start of a new frame
- Temporal reference: a timestamp for the decoder to synchronize the video information with the audio information
- Picture type: indicates if the frame is encoded as an I- or P-frame
- GOB start code: a marker used for resynchronization in case of error

A group of (macro)blocks (GOB) is a structure consisting of 3x11 macroblocks.
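The GOB structure determines how many resynchronization points a frame contains. The short calculation below assumes the usual CIF (352x288) and QCIF (176x144) luminance resolutions, which the notes name but do not give in pixels.

```python
MACROBLOCK_SIZE = 16             # macroblocks are 16x16 pixels
MACROBLOCKS_PER_GOB = 3 * 11     # a GOB is 3 rows of 11 macroblocks

def gob_layout(width, height):
    """Return (macroblocks per frame, GOBs per frame) for a frame size."""
    macroblocks = (width // MACROBLOCK_SIZE) * (height // MACROBLOCK_SIZE)
    return macroblocks, macroblocks // MACROBLOCKS_PER_GOB

# Assumed resolutions: CIF = 352x288, QCIF = 176x144.
print("CIF :", gob_layout(352, 288))   # (396, 12)
print("QCIF:", gob_layout(176, 144))   # (99, 3)
```

Because each GOB begins with its own start code, a decoder that loses synchronization inside one GOB can resume at the next one.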

4.3.3 H.263

H.263 has been defined by the ITU-T for use in a range of real-time video applications over wireless networks and PSTNs. The applications include video telephony, videoconferencing, security surveillance, interactive game playing and so on.

The H.263 standard has a number of advanced coding options compared with H.261:

- Progressive scanning with a refresh rate of either 15 or 7.5 fps.
- Support for I-, P-, B- and PB-frames.
- Motion vectors, if necessary, are allowed to point outside of the frame area.
- Schemes such as error tracking, independent segment decoding and reference picture selection are included in the standard; they aim at minimizing the effects of errors on neighboring GOBs.
- An error concealment scheme is incorporated into the decoder to mask the error from the viewer.
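When a motion vector points outside the frame area, the decoder still needs pixel values for the referenced block. One common way to supply them, assumed here because the notes do not say how it is done, is to repeat the nearest edge pixel by clamping the coordinates. The function name is hypothetical.

```python
def reference_pixel(frame, x, y):
    """Read a reference pixel, repeating the nearest edge pixel when (x, y)
    lies outside the frame (assumed edge-extension behavior)."""
    height, width = len(frame), len(frame[0])
    clamped_x = min(max(x, 0), width - 1)
    clamped_y = min(max(y, 0), height - 1)
    return frame[clamped_y][clamped_x]

frame = [
    [1, 2],
    [3, 4],
]
print(reference_pixel(frame, -3, 0))  # left of the frame: edge value 1
print(reference_pixel(frame, 5, 5))   # beyond the bottom-right corner: 4
```

This lets objects that enter or leave the picture at its border still be predicted from the reference frame.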

4.3.4 MPEG

The Moving Picture Experts Group (MPEG) was formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the use of video with sound.

Typical figures of the compression ratios:

- I-frames: 10:1
- P-frames: 20:1
- B-frames: 50:1

MPEG1: ISO Recommendation 11172

- Similar video compression technique to H.261.
- Progressive scanning with a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).
- Supports I-, P- and B-frames.
- I-frames must be used for the various random-access functions associated with VCRs.

Improvements with respect to H.261:

1. A new layer called a slice is added to the structure of the stream so that the decoder can resynchronize more quickly in case of error.
2. Support for B-frames.
3. A larger search window for motion vectors and a finer resolution of their representation.

Bitstream format:

- Sequence start code: indicates the start of a sequence
  Video parameters: specify the screen size and aspect ratio
  Bitstream parameters: indicate the bit rate and the size of the memory/frame buffers that are required
  Quantization parameters: contain the contents of the quantization tables that are to be used
- GOP start code: indicates the start of a GOP
  Time stamp: used for synchronization purposes
  Parameters: define the particular sequence of frame types that are used in each GOP (e.g. IPPBPP)
- Picture start code: indicates the start of a frame
  Type: indicates if it is an I-, P- or B-frame
  Buffer parameters: indicate how full the buffer should be before the decoding operation should start
  Encode parameters: indicate the resolution of a motion vector
- Slice start code: indicates the start of a slice
  Vertical position: indicates the scan line in which the slice is
  Quantization parameters: indicate the scaling factor that applies to this slice

MPEG2: ISO Recommendation 13818

It supports four levels (low, main, high 1440 and high), each targeted at a particular application domain. There are 5 profiles associated with each level: simple, main, spatial resolution, quantization accuracy and high. The different combinations of levels and profiles form a framework for all standards activities associated with MPEG-2.

One of the most popular settings is the MP@ML standard, which is used for digital television broadcasting.

There are 3 standards associated with HDTV: advanced television (ATV) in North America, digital video broadcast (DVB) in Europe, and multiple sub-Nyquist sampling encoding (MUSE) in Japan.

                      ATV              DVB                  MUSE
Aspect ratio          16/9             4/3                  16/9
Resolution            1280x720         1440x1152            1920x1035
Compression (video)   MP@HL of MPEG2   SSP@H1440 of MPEG2   Similar to MP@HL
Compression (audio)   Dolby AC-3       MP2
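The layered MPEG-1 bitstream format (sequence, GOP, picture, slice) can be explored by scanning a byte stream for start codes. The sketch below assumes the byte values used in the MPEG-1 video syntax: a 0x000001 prefix followed by 0xB3 for a sequence header, 0xB8 for a GOP, 0x00 for a picture and 0x01 to 0xAF for slices. The notes name the start codes without giving their values, so these should be checked against ISO/IEC 11172-2 before being relied on. The test stream is fabricated filler, not real video.

```python
START_CODE_PREFIX = b"\x00\x00\x01"

def classify(code):
    """Map the byte after the prefix to a layer name (assumed MPEG-1 values)."""
    if code == 0xB3:
        return "sequence"
    if code == 0xB8:
        return "GOP"
    if code == 0x00:
        return "picture"
    if 0x01 <= code <= 0xAF:
        return "slice"
    return "other"

def scan_start_codes(data):
    """Return a list of (byte offset, layer name) for every start code found."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify(data[pos + 3])))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return found

# Fabricated stream: four start codes separated by 0xAA filler bytes.
stream = (
    b"\x00\x00\x01\xb3" + b"\xaa" * 4
    + b"\x00\x00\x01\xb8" + b"\xaa" * 2
    + b"\x00\x00\x01\x00" + b"\xaa" * 2
    + b"\x00\x00\x01\x01" + b"\xaa" * 3
)
layers = scan_start_codes(stream)
print(layers)
```

Because each layer begins with a unique, byte-aligned code, a decoder can resynchronize at the next slice after an error, which is the improvement over H.261 noted earlier.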

Summary of video compression standards:

Standard            Digitization format   Compressed bit rate     Example applications
H.261               CIF/QCIF              p x 64 kbps             Video telephony/conferencing over ISDN and LANs
H.263               S-QCIF/QCIF           <64 kbps                Video telephony/conferencing and security surveillance over low bit rate channels
MPEG-1/ISO11172     SIF                   <1.5 Mbps               Storage of VHS-quality video on CD-ROMs
MPEG-2/ISO13818
  Low               SIF                   <4 Mbps                 Recording of VHS-quality video
  Main              4:2:0                 <15 Mbps                Digital video broadcasting
                    4:2:2                 <20 Mbps
  High 1440         4:2:0                 <60 Mbps                HDTV (4/3 aspect ratio)
                    4:2:2                 <80 Mbps
  High              4:2:0                 <80 Mbps                HDTV (16/9 aspect ratio)
                    4:2:2                 <100 Mbps
MPEG-4              Various               5 kbps - tens of Mbps   Versatile multimedia coding standard
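The compressed bit rates in the summary are easier to judge against the uncompressed rate of the source video. The calculation below assumes CIF is 352x288 pixels, 4:2:0 sampling with 8-bit samples (12 bits per pixel on average) and 30 frames per second; none of these figures is stated in the notes, and p = 30 is used only as an example value for H.261.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate in bits per second (12 bits/pixel for 4:2:0, 8-bit)."""
    return width * height * bits_per_pixel * fps

# Assumed source format: CIF (352x288) at 30 frames per second.
cif_raw_bps = raw_bit_rate(352, 288, 30)

# H.261 channel of p x 64 kbps, with p = 30 chosen for illustration.
channel_bps = 30 * 64_000

required_ratio = cif_raw_bps / channel_bps
print(f"raw: {cif_raw_bps / 1e6:.1f} Mbps, required compression: {required_ratio:.1f}:1")
```

Smaller values of p raise the required ratio in proportion, which is why low bit rate channels are paired with the smaller QCIF and S-QCIF formats.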