TECHNICAL RESEARCH REPORT

An Advanced Image Coding Algorithm that Utilizes Shape-Adaptive DCT for Providing Access to Content

by R. Haridasan

CSHCN T.R. 97-5 (ISR T.R. 97-16)

The Center for Satellite and Hybrid Communication Networks is a NASA-sponsored Commercial Space Center also supported by the Department of Defense (DOD), industry, the State of Maryland, the University of Maryland and the Institute for Systems Research. This document is a technical report in the CSHCN series originating at the University of Maryland.

Web site: http://www.isr.umd.edu/cshcn/

An Advanced Image Coding Algorithm that Utilizes Shape-Adaptive DCT for Providing Access to Content

Radhakrishnan Haridasan
University of Maryland, College Park, MD

This work was carried out under the supervision of Kiran Challapali, Video Communications Department, Philips Research, Briarcliff Manor, NY 10510. This report has previously appeared as a Technical Report at Philips Research.

February 18, 1997

Contents

1 Objective
2 Background
  2.1 MPEG-2: The current video coding standard
  2.2 MPEG-4: The future video coding standard
  2.3 Using an MPEG-2 like syntax for MPEG-4
3 Shape-Adaptive Discrete Cosine Transform (SA-DCT)
  3.1 The SA-DCT Algorithm
4 The MPEG-2 video bitstream: Syntax and Semantics
5 Incorporating Content-Based Accessibility into the MPEG-2 framework
6 Results
7 Conclusion
A Appendix

1 Objective

To determine the efficiency of the Shape-Adaptive Discrete Cosine Transform (SA-DCT) for coding arbitrarily shaped objects.

2 Background

2.1 MPEG-2: The current video coding standard

The Moving Picture Experts Group (MPEG) was formed under the auspices of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) in 1988 to establish standards for coding of moving pictures and associated audio for various applications such as digital storage media, distribution and communication [1][2]. After the success of the first-generation video coding standard MPEG-1, which was optimized for progressive video at 1.5 Mbit/sec, the MPEG-2 work item started in order to cover a wide range of applications, including all-digital transmission of broadcast-quality audio-video television signals. In addition to the "coding tools" provided in MPEG-1, the MPEG-2 standard provides for SNR scalability, temporal and spatial scalability, and also allows for data partitioning. These extensions are needed to address applications such as video telecommunications, video on Asynchronous Transfer Mode (ATM) networks, interworking of video standards, HDTV with embedded TV, etc. Adopted as the international standard in November 1994, MPEG-2 has become widely accepted for generic coding of video and audio worldwide.

The recent trend towards wireless communications, interactive computer applications and the integration of audio-visual data into an ever increasing number of applications is resulting in new expectations and requirements not addressed by current video coding algorithms. A whole spectrum of new applications, including interactive mobile multimedia communications, videophone, multimedia electronic mail, remote sensing, electronic newspapers, interactive multimedia databases, multimedia videotex, games, interactive computer imagery and sign language captioning, are envisaged for the future [3].

MPEG-2 was not designed to accommodate functionalities such as support for manipulation of the content of audio-visual data, or content-based database access, which are useful for these new applications. Hence there is a need to develop a new video coding standard suited to such applications.

2.2 MPEG-4: The future video coding standard

At the MPEG meeting in September 1993 it became clear that a new work item was needed to accommodate novel as well as conventional functionalities for video coding. MPEG-4's main focus shifted from low bit rate coding of video to deal with, in addition to compression, content-based interactivity and universal accessibility over wireless as well as wired networks. Among the functionalities designated as content-based were multimedia data access, manipulation, bitstream editing, and scalability. The remaining functionalities include hybrid natural and synthetic data coding, improved temporal random access, coding of multiple concurrent data streams (an example of which is stereoscopic coding), and robustness in error-prone environments. To realize these functionalities, greater freedom to create a coding algorithm of choice is sought. This flexibility is provided by combining various coding tools and giving the decoder all the rules required to decipher the coded bitstream. MPEG-4 would then find the most essential coding tools and standardize them, making them "normative". It would also standardize a set of flexible rules that allow for various combinations of these tools, in order to accommodate the functionalities needed by particular applications. This would happen using a flexible syntax, in the form of the MPEG-4 Syntactic Description Language (MSDL), which would be object-oriented. The MSDL would also be extensible, so as to allow different combinations of these tools to be developed in the future. To summarize, the basic elements of the proposed MPEG-4 standard, to be finalized by 1997, would be:

Tools: Techniques accessible through the syntax or described using the syntax (examples of tools are motion compensation and contour representation).

Algorithms: Organized collections of tools providing one or more functionalities (examples of algorithms are MPEG-2 video, MPEG-1 systems, etc.).

Profiles: Algorithms or combinations of algorithms constrained in a specific way to address a particular class of applications (an example is MPEG-2 MP@ML).

Figure 1: MPEG-4 elements: tools are combined through the MPEG-4 Syntactic Description Language (MSDL) into algorithms, which in turn are constrained into profiles.

2.3 Using an MPEG-2 like syntax for MPEG-4

As noted in the preceding discussion, the key enabling technology of MPEG-4 applications is the ability to access the objects that comprise an image. At present the entire image is treated as a single entity. In order to provide the new functionality, arbitrarily shaped image segments should be encoded efficiently, in a manner that lets each object be decoded independently of the other objects. One extension of MPEG-2 to incorporate content-based addressability is to use the Shape-Adaptive DCT in lieu of the standard block-based DCT. This provides for segment- or object-based coding of images and video while retaining backward compatibility with existing coding standards.

In Section 5 and Appendix A, specific details on using an MPEG-2 like syntax for MPEG-4 are provided.

3 Shape-Adaptive Discrete Cosine Transform (SA-DCT)

The Shape-Adaptive DCT is a transform coding scheme which is of low complexity and can be used to extend current block-based DCT schemes to accomplish generic coding of segmented video over a wide range of bit-rates [4]. In using this transform, an image is first partitioned into adjacent blocks of M×M pixels. Each M×M block is then encoded using the SA-DCT. The SA-DCT coefficients are quantized and run-length coded. The contour information is assumed to be encoded separately and transmitted to the receiver for decoding the image.

3.1 The SA-DCT Algorithm

The SA-DCT algorithm is based on representing M×M blocks in terms of predefined orthogonal sets of basis functions. The algorithm does not require any more computations than the ordinary DCT. Figure 2 illustrates the steps involved in coding an arbitrarily shaped image foreground segment contained within an 8×8 reference block. Figure 2(a) shows an image block segmented into a foreground (dark) and background (light). To perform the vertical SA-DCT of the foreground, the vector length N (0 < N < 9) of each column j (0 < j < 9) of the foreground is calculated and the columns are shifted and aligned to the upper border of the 8×8 reference block (Figure 2(b)). Depending on the vector size N of each particular column of the segment, a DCT transform matrix DCT_N containing a set of N DCT_N basis vectors is selected, where

    DCT_N(p, k) = c_0 cos( p (k + 1/2) π / N ),    p, k = 0, 1, ..., N-1.    (1)

Here c_0 = sqrt(1/2) if p = 0, and c_0 = 1 otherwise; p denotes the p-th DCT basis vector.
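As a concrete illustration of equation (1), the short Python sketch below builds the DCT_N transform matrix for a given column or row length N and checks the orthogonality property that makes the inverse transform of equation (3) work. The helper is our own illustration (assuming NumPy is available), not code from the report.

import numpy as np

def build_dct_matrix(N):
    # Row p holds the p-th DCT-N basis vector of equation (1):
    # DCT_N(p, k) = c_0 * cos(p * (k + 1/2) * pi / N), c_0 = sqrt(1/2) for p = 0.
    dct = np.zeros((N, N))
    for p in range(N):
        c0 = np.sqrt(0.5) if p == 0 else 1.0
        for k in range(N):
            dct[p, k] = c0 * np.cos(p * (k + 0.5) * np.pi / N)
    return dct

# With the 2/N scaling of equation (2), DCT_N is inverted by its transpose,
# since DCT_N @ DCT_N.T = (N/2) * I:
A = build_dct_matrix(3)
print(np.allclose(A @ A.T, 1.5 * np.eye(3)))   # True

The same helper is reused in the SA-DCT sketch given at the end of this section.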

Figure 2: Successive steps to perform a forward SA-DCT on an arbitrarily shaped image foreground segment: (a) original segment; (b) ordering of pels and vertical SA-DCT used; (c) location of pels after vertical SA-DCT; (d) location of pels prior to horizontal SA-DCT; (e) ordering of pels and horizontal SA-DCT used; (f) location of 2-D SA-DCT coefficients.

The N vertical DCT coefficients c_j for each segment column of data x_j are calculated according to the formula

    c_j = (2/N) DCT_N x_j.    (2)

In Figure 2(b) the 4th column belonging to the object is transformed using DCT_3 basis vectors. After the SA-DCT in the vertical direction, the lowest DCT coefficients (DC values) of the segment columns are found along the upper border of the 8×8 reference block (Figure 2(c)). To perform the horizontal DCT transformation (Figure 2(e)), the length of each row is calculated, the rows are shifted to the left border of the 8×8 reference block, and a horizontal DCT adapted to the size of each row is calculated using equations (1) and (2). Note that the horizontal SA-DCT is performed along vertical SA-DCT coefficients with the same index (i.e. all vertical DC coefficients are grouped together and are SA-DCT transformed in the horizontal dimension). Figure 2(f) shows the final location of the resulting DCT coefficients within the 8×8 image block.

Performing the SA-DCT results in a final number of DCT coefficients that is identical to the number of pels contained in the image segment. Also, the coefficients are located in positions comparable to those in a standard 8×8 block. The DC coefficient is located at the upper left border of the reference block and, depending on the actual shape of the segment, the remaining coefficients are concentrated around the DC coefficient. The contour of the segment is transmitted to the decoder using chain-difference coding [6][7]. The decoder performs the inverse Shape-Adaptive DCT using the equation

    x_j = DCT_N^T c_j    (3)

in both the horizontal and the vertical segment direction. Here c_j denotes a horizontal or vertical coefficient vector of size N.
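To make the shift-align-transform procedure of Figure 2 concrete, here is a minimal Python sketch of the forward and inverse SA-DCT of one 8×8 block. It reuses build_dct_matrix from the previous sketch; the function names and the boolean-mask representation of the segment are our own illustrative choices, not code from the report.

import numpy as np   # build_dct_matrix as defined in the previous sketch

def sa_dct_forward(block, mask):
    # block: 8x8 pixel array; mask: 8x8 boolean array, True on the foreground segment.
    # Returns an 8x8 array with the SA-DCT coefficients packed as in Figure 2(f).
    M = block.shape[0]
    col_len = mask.sum(axis=0)                      # N for every column
    tmp = np.zeros((M, M))
    for j in range(M):                              # vertical pass (Figure 2(b)-(c))
        N = int(col_len[j])
        if N:
            x = block[mask[:, j], j].astype(float)  # shift segment pixels to the top
            tmp[:N, j] = (2.0 / N) * build_dct_matrix(N) @ x
    out = np.zeros((M, M))
    for i in range(M):                              # horizontal pass (Figure 2(d)-(f))
        present = col_len > i                       # columns contributing to row i
        L = int(present.sum())
        if L:
            out[i, :L] = (2.0 / L) * build_dct_matrix(L) @ tmp[i, present]
    return out

def sa_dct_inverse(coeff, mask):
    # Undo the horizontal pass, then the vertical pass, using equation (3).
    M = coeff.shape[0]
    col_len = mask.sum(axis=0)
    tmp = np.zeros((M, M))
    for i in range(M):
        present = col_len > i
        L = int(present.sum())
        if L:
            tmp[i, present] = build_dct_matrix(L).T @ coeff[i, :L]
    block = np.zeros((M, M))
    for j in range(M):
        N = int(col_len[j])
        if N:
            block[mask[:, j], j] = build_dct_matrix(N).T @ tmp[:N, j]
    return block

# Round trip on synthetic data: the number of coefficients equals the number of
# foreground pels, and the segment is recovered exactly (background stays zero).
rng = np.random.default_rng(0)
blk = rng.integers(0, 256, (8, 8))
msk = np.zeros((8, 8), dtype=bool)
msk[2:7, 1:6] = True
rec = sa_dct_inverse(sa_dct_forward(blk, msk), msk)
print(np.allclose(rec[msk], blk[msk]))              # True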

4 The MPEG-2 video bitstream: Syntax and Semantics

MPEG standards specify a syntax for the bitstream and the decoding process. An input video sequence is divided into units of groups of pictures (GOPs), which can be decoded and reconstructed independently of other GOPs if necessary (Figure 3). Each GOP has three kinds of pictures, depending on how the picture is coded: Intra-coded (I) pictures, Predictive-coded (P) pictures and Bi-directionally predictive-coded (B) pictures [5]. An I-picture is coded using information only from itself and can be decoded independently of other pictures. A P-picture is a picture encoded using motion-compensated prediction from a past I-picture or P-picture. A B-picture is a picture which is coded using motion-compensated prediction from a past and/or future I- or P-picture. The order of the coded pictures in the coded bitstream is the order in which the decoder processes them, and not necessarily the order in which the pictures are displayed.

Each picture within the GOP is composed of one or more slices. A slice consists of a horizontal row of macroblocks. Slices occur in the bitstream in the order in which they are encountered, starting at the upper-left corner of the picture and proceeding in raster-scan order from left to right and top to bottom (see Figure 4). A macroblock consists of a 16×16 block of luminance and the associated 8×8 chrominance blocks. Each luminance block consists of 8×8 pixels. The 8×8 blocks are the fundamental units of SA-DCT based transform coding.
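The display-order versus bitstream-order relationship of Figure 3 can be illustrated with a small sketch. The reordering rule used here (each B-picture is emitted after the reference picture that follows it in display order) reproduces the example in the figure, but the function is our own simplification rather than a normative MPEG-2 procedure.

def display_to_bitstream(pictures):
    # pictures: list of (type, display_number) pairs in display order.
    # B-pictures are buffered and emitted after the next I- or P-picture,
    # matching the ordering shown in Figure 3(b).
    coded, pending_b = [], []
    for ptype, num in pictures:
        if ptype == "B":
            pending_b.append((ptype, num))
        else:                        # I or P: a reference picture
            coded.append((ptype, num))
            coded.extend(pending_b)
            pending_b.clear()
    coded.extend(pending_b)          # any trailing B-pictures
    return coded

display = [("I", 1), ("B", 2), ("B", 3), ("P", 4), ("B", 5), ("B", 6), ("P", 7)]
print(display_to_bitstream(display))
# [('I', 1), ('P', 4), ('B', 2), ('B', 3), ('P', 7), ('B', 5), ('B', 6)]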

Figure 3: Example of a coded video sequence in (a) display order and (b) bitstream order (FMC: forward motion compensation; MC: backward motion compensation).

Figure 4: (a) General slice structure; (b) luminance macroblocks and blocks.

5 Incorporating Content-Based Accessibility into the MPEG-2 framework

It was decided to use the current MPEG-2 syntax and to modify its semantics involving the DCT to accommodate the Shape-Adaptive DCT. Due to time constraints, we restricted ourselves to intra-frame coding of monochrome images and video sequences. To provide content-based addressability we introduced object-based slices. This means that we now have as many slices on a horizontal row of macroblocks as there are objects (Figure 5). The introduction of object-based slices allows us to decode only those slices which pertain to the object of interest. To incorporate this new concept into the MPEG-2 slice structure we used the extra information slice data element to specify up to 256 different objects. We also had to break away from the MPEG-2 specification to allow for skipped macroblocks in the case of I-pictures and for overlap of slices.
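The object-based slice idea can be sketched as follows. The Slice class and its fields are purely hypothetical stand-ins (the object identifier plays the role of the extra information slice element mentioned above); the report does not define such an API.

from dataclasses import dataclass

@dataclass
class Slice:
    mb_row: int       # macroblock row the slice covers
    object_id: int    # 0..255, assumed to be carried in the extra-information-slice field
    data: bytes       # coded macroblock data for this object on this row

def slices_for_object(slices, object_id):
    # A decoder interested in one object keeps only that object's slices;
    # the other slices on the same macroblock row can be skipped entirely.
    return [s for s in slices if s.object_id == object_id]

# Two objects on macroblock row 0, one on row 1 (illustrative values):
frame = [Slice(0, 1, b"..."), Slice(0, 2, b"..."), Slice(1, 1, b"...")]
foreground_only = slices_for_object(frame, 1)   # decode object 1 alone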

6 Results

Still images such as Lena and Barbara were segmented by hand. The mask so obtained was used along with the input image to perform a Shape-Adaptive DCT of the 8×8 image blocks. The SA-DCT coefficients were zig-zag scanned and run-length coded. The MPEG-2 intra VLC tables were then used to Huffman-encode the coefficients. At the decoder the reverse process was carried out in such a manner that only the objects of interest are decoded. The block diagram of the system is shown in Figure 6.

The images were encoded as explained above. A software simulation of the end-to-end system was built. Selected objects in the image were decoded, thus verifying access to objects in the image. In order to verify that we were not gaining this new functionality of accessing objects at the cost of compression efficiency, the bits per pixel (bpp) needed to encode Lena and Barbara at various quantization levels were computed. The results are tabulated in Table 1. Figure 7 shows Lena and Barbara coded using the scheme described at around 0.5 and 1.0 bpp.
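The zig-zag scan and run-length step mentioned above can be sketched as below; the scan order is the conventional 8×8 zig-zag pattern, and the (run, level) pairing is a simplified stand-in for the MPEG-2 intra VLC stage, whose code tables we do not reproduce here.

def zigzag_order(n=8):
    # Visit the block along anti-diagonals, alternating direction, as in the
    # conventional zig-zag scan: (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(levels):
    # Encode a scanned coefficient sequence as (zero-run, level) pairs;
    # trailing zeros are implied by the end of the block.
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, int(level)))
            run = 0
    return pairs

def scan_block(coeff_block):
    # coeff_block: 8x8 quantized coefficient block (any 2-D indexable container).
    n = len(coeff_block)
    return run_length([coeff_block[r][c] for r, c in zigzag_order(n)])

# scan_block(quantized_block) yields pairs such as [(0, 35), (2, -3), ...]
# (illustrative values only), which would then be coded with the intra VLC tables.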

Figure 5: MPEG-4 slice structure: (a) frame containing object 1 and object 2; (b) slice for object 1; (c) slice for object 2.

Table 1: Quantization vs. BPP

             Lena               Barbara
  Qscale   bpp    SNR (dB)    bpp    SNR (dB)
     1     2.93    45.43      4.01    44.17
     6     0.73    38.41      1.12    36.01
    11     0.48    35.32      0.73    32.57
    16     0.41    33.69      0.59    30.73
    21     0.36    32.50      0.50    29.33
    26     0.33    31.61      0.44    28.25
    31     0.31    30.83      0.40    27.35

7 Conclusion

The suitability of the SA-DCT for object-based encoding was established by designing an encoder and a decoder based on the MPEG-2 framework. It was verified that we would not be trading off compression ratio to gain this additional functionality. Tests were also run to find out the overhead involved in encoding the object boundary using chain-difference coding. It was found that the contour information contributed an insignificant (< 1%) part of the coded data. We now have a method for object-based encoding that is efficient. Efforts are underway to develop methods for automatic segmentation of objects and to extend the SA-DCT based codec to inter-frame coding and full-color image sequences.

Figure 6: Block diagram of the system. Encoder: extract DCT blocks (using the segmentation mask), shape-adaptive DCT, quantize, entropy coding; the coded data are sent over the channel. Decoder: entropy decoding, dequantize, inverse shape-adaptive DCT, reconstitute frame.

A Appendix

The following are the changes that had to be made to the existing MPEG-2 code:

1. Allowing skipped macroblocks in INTRA frames.
2. Using SA-DCT routines instead of the DCT.
3. Using the extra information slice data element to specify the object to be encoded/decoded.
4. Reconstituting the frame using slices belonging to different objects and allowing for overlap of slices.

References

[1] Information Technology - Generic Coding of Moving Pictures and Associated Audio, Recommendation ITU-T H.262, ISO/IEC 13818-2.

[2] MPEG (Moving Picture Experts Group) FAQ, http://www.crs4.it/~luigi/mpeg/mpegfaq.html.

[3] MPEG-4 Proposal Package Description (PPD) - Revision 2 (Lausanne Revision), ISO/IEC JTC1/SC29/WG11, MPEG 95, March 1995.

[4] Thomas Sikora and Bela Makai, "Shape-Adaptive DCT for Generic Coding of Video", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 1, February 1995.

[5] Kiran Challapali, "Video Compression for Digital Television Applications", Philips Journal of Research, August 1995.

[6] H. Freeman, "On the Encoding of Arbitrary Geometric Configurations", IRE Transactions on Electronic Computers, Vol. EC-10, pp. 260-268, June 1961.

[7] H. Freeman, "Computer Processing of Line-Drawing Images", Computing Surveys, Vol. 6, No. 1, pp. 57-97, March 1974.

Figure 7(a): From top left corner, clockwise: Original Lena, decoded Lena, decoded foreground, decoded background, at 0.52 bpp.

Figure 7(b): From top left corner, clockwise: Original Lena, decoded Lena, decoded foreground, decoded background, at 1.1 bpp.

Figure 7(c): From top left corner, clockwise: Original Barbara, decoded Barbara, decoded foreground, decoded background, at 0.51 bpp.

Figure 7(d): From top left corner, clockwise: Original Barbara, decoded Barbara, decoded foreground, decoded background, at 0.97 bpp.