Compression of Light Field Images Using a Projective 2-D Warping Method and Block Matching


A Project Report for EE 398A
Anand Kamat Tarcar
Electrical Engineering, Stanford University, CA (anandkt@stanford.edu)

Abstract: Light fields are 2-D arrays of images of a 3-D object taken from various viewpoints around the object. Accurately representing the light field of an object requires massive amounts of data, so compression is essential. In this paper we propose a 2-D projective warping method to predict a view from a previous view, and use it together with motion-compensated block matching to compress such data sets. The observed performance is better than that of motion compensation alone.

Keywords: Light Fields, Motion Compensation, Projective Warping

I. INTRODUCTION

Light fields capture the amount of light reflected from an object in all directions. The most direct way to obtain one is to photograph the object of interest from all possible directions. Sometimes users are interested in only certain views of the object, in which case the images may be taken from several equally spaced points in a plane in front of the object. This yields a 2-D array of 2-D images, so overall a light field represents a 4-D signal.

Accurate representation of a light field requires a number of images on the order of the resolution of each image, which can amount to tens of thousands of images. Compressing these gigabytes of data therefore becomes necessary. Example light fields are shown in Fig. 1 (Crystal) and Fig. 2 (Lego), which we use in our simulations. The classic paper [1] describes what light fields are and how they can be represented and captured; this will not be discussed here. However, since these images are taken from predetermined, equidistant camera positions at known angles, consecutive views are closely related to one another. The consecutive views are very similar, so exploiting this redundancy between views yields large compression gains.

Fig. 1 Crystal Light Field

In this paper we propose to predict each consecutive view from the previous view using the 2-D warping method explained shortly, and to code the

residual only, along with the side information required to decode the views.

II. RELATED WORK

A lot of work has been done on light field compression. Shape adaptation was used in [2], where only the portion relevant to the shape of the object is encoded and transmitted while the background is discarded. Another major technique that has been extensively studied is disparity compensation. In [3], a recursive algorithm for compressing light fields using disparity maps is proposed: with the four corner views of the image array as references, the remaining views are predicted recursively using disparity maps, and the residuals are encoded and transmitted. Compression ratios as high as 1000:1 were obtained at acceptable image quality. Other refinements combine disparity compensation with the lifting structure of wavelet transforms and with shape adaptation, as in [4], [5]. All of these methods use the same basic principle: predict views from one another and encode the residual. We follow the same approach using a 2-D projective warping technique.

Fig. 2 Lego Light Field

III. MOTIVATION

Each consecutive view is, in effect, a 2-D warped version of the previous view. If we knew the camera parameters for each view, i.e., if we had the geometry model, then predicting each view from another would be straightforward. Since we do not have any camera parameters at our disposal, we instead project the previous image using 2-D warping, which uses feature matching between the two views to obtain a projection matrix. The projected view gives a very good prediction of what the next view should look like. By a simple application of motion compensation we can then encode the residual and send it to the decoder along with the motion vectors and the projection matrix.
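Concretely, once a 3x3 projection matrix H has been estimated from matched features, warping the previous view amounts to a homogeneous transform per pixel followed by nearest-neighbor interpolation. A minimal Python/numpy sketch, for illustration only (the project itself used Matlab, and `warp_view` is a hypothetical name):

```python
import numpy as np

def warp_view(prev, H):
    """Warp `prev` by the 3x3 projection matrix H using inverse
    mapping and nearest-neighbor interpolation: each output pixel
    (x, y) is sampled from H^-1 @ (x, y, 1) in the previous view."""
    h, w = prev.shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = Hinv @ pts
    # Normalize the homogeneous coordinates, then round to the
    # nearest source pixel (nearest-neighbor interpolation).
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    out = np.zeros_like(prev)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out.ravel()[ok] = prev[sy[ok], sx[ok]]
    return out
```

Inverse mapping (sampling the source through H^-1) avoids holes in the output; pixels that map outside the previous view are left at zero, which is one source of the edge artifacts discussed in Section IV.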
IV. ALGORITHM AND MODEL

IV.A. Model

The 2-D warping algorithm predicts each consecutive view from a previous view using a projection matrix. This 3x3 matrix relates a pair of views and is uniquely defined for each pair: applied to the previous view, it gives an estimate of how the next view will look. Since each pixel of the previous image is projected in the same way into the next view, the matrix is shared by the whole pair of views and need be transmitted only once, though it differs from pair to pair. The matrix is quantized, encoded, and transmitted as side information along with the residual image described next, so that the views can be reconstructed at the decoder.

Fig. 3 System Model

To keep the encoder and decoder synchronized during decoding, we use the reconstructed image at each step

to project the next view of the sequence. Given the reconstructed image from the previous step and its projection, we apply a block matching algorithm, as in motion compensation, to obtain the actual prediction and hence the residual image. The block diagram of the system model is shown in Fig. 3.

Why add motion compensation? When we project the image using 2-D warping, artifacts appear at the edges of the projected image: the projection is derived from the previous view, which lacks information that enters the frame as the camera moves along the 2-D plane capturing new views. Other artifacts and blurring may also be introduced by the projection. We therefore keep the previous reconstructed image as a second prediction option for the next view.

The current view is divided into equal-sized blocks, and motion compensation is applied using two references: the previous reconstructed view and its projection. Each block of the current view is moved around a 2-D search area in both references, and the best candidate in each is found by block matching. Applying a Lagrangian cost function selects the better of the two options, and the difference gives the residual block. A discrete cosine transform (DCT) is then applied to the resulting residual image, which is scalar quantized and entropy coded. This is transmitted to the decoder along with a 2-D motion vector describing the horizontal and vertical displacement of the current block within the chosen reference view. The side information also includes one bit per block indicating which reference was used for prediction: the previous reconstructed view or its projection.
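A simplified sketch of the per-block search over the two candidate references (numpy assumed, function names hypothetical). For brevity the matching criterion here is plain sum-of-absolute-differences rather than the full Lagrangian cost used in the report, which also charges for the rate:

```python
import numpy as np

def best_match(block, ref, top, left, radius=8):
    """Exhaustive search for `block` in `ref` around (top, left).
    Returns (sad, dy, dx): the smallest sum of absolute differences
    and the motion vector that achieves it."""
    h, w = block.shape
    best = (np.inf, 0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate block falls outside the reference view
            sad = np.abs(block - ref[y:y + h, x:x + w]).sum()
            if sad < best[0]:
                best = (sad, dy, dx)
    return best

def predict_block(block, top, left, ref_prev, ref_proj, radius=8):
    """Pick the better of the two candidate references per block.
    Returns (mode, dy, dx), where mode is 0 for the previous
    reconstructed view and 1 for its 2-D warped projection; this is
    the one bit of per-block side information described above."""
    cands = [best_match(block, r, top, left, radius)
             for r in (ref_prev, ref_proj)]
    mode = int(cands[1][0] < cands[0][0])
    sad, dy, dx = cands[mode]
    return mode, dy, dx
```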
The decoder also needs the projection matrix for each pair of views; this too is transmitted as side information. Since it is just a 3x3 matrix without fixed statistics, we encode it with fixed-length codes. Once the decoder has all the side information, it simply de-quantizes the residual, performs the inverse DCT, and obtains the reconstruction using the motion vectors and the previous reconstructed view or its projection. Note that the first view is always intra coded, as it cannot be predicted from other views. View sequencing also plays an important role in how the algorithm works, as discussed shortly.

IV.B. Assumptions

We made a few assumptions in our implementation. First, we assume that the motion vector components have a Laplacian distribution and that a displacement of 10 in the horizontal or vertical direction can be encoded using 16 bits. Assuming a Laplacian distribution for motion vectors is reasonable; the same is done in [6], where variable block size motion estimation is implemented for H.264. Second, we encode the projection matrix using fixed-length coding with 16 bits, since it has no specific statistics; being just a 3x3 matrix per pair of views, it adds little to the overall bits per pixel.

IV.C. View Sequencing

View sequencing is an important part of the overall scheme. We want consecutive views to differ as little as possible, so we read the 2-D array of images in a zigzag pattern, as shown in Fig. 4. This keeps the camera movement between consecutive views minimal, making prediction easier and giving smaller residuals and better results.

Fig. 4 View Sequencing
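Fig. 4 is not reproduced in this transcription; a serpentine (boustrophedon) scan is the natural reading of the zigzag pattern, since it guarantees that consecutive views are always immediate grid neighbors. A small sketch under that assumption:

```python
def zigzag_order(rows, cols):
    """Serpentine scan of a rows x cols camera grid: left-to-right
    on even rows, right-to-left on odd rows, so each view in the
    sequence is a grid neighbor of the previous one."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order
```

For the 4x4 grid used in the experiments, this visits row 0 left to right, then row 1 right to left, and so on, so the camera displacement between consecutive views is always a single grid step.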

Fig. 5 View to View Transformation

IV.D. Obtaining the Projected Image

To obtain the projection matrix relating two views, we first need corresponding matching points between them, capturing how pixels from the previous view are projected into the next view in the sequence. We use the Harris corner detector, a feature detection algorithm that yields a set of corner points in each view of a pair. From these sets, only the pairs of points that match across both views with high correlation are kept. Using these points, we apply the Random Sample Consensus (RANSAC) algorithm to obtain the best-fit projection matrix; RANSAC is an iterative method for estimating the parameters of a model from observed data containing outliers. With this matrix, a normalized 2-D homogeneous transform is applied to the previous view, and nearest-neighbor interpolation yields the projected view. Fig. 5 shows the transformation of the reconstructed previous view into the projected view and how the residual image is obtained.

IV.E. Algorithm and Implementation

Step 1: Read the images in the sequential order described above.
Step 2: If it is the first image, encode it with intra coding: apply the DCT and quantize.
Step 3: Otherwise, use the 2-D warping technique to obtain the projection of the previous reconstructed view as a prediction for the current view, along with the projection matrix. Include the rate for encoding the quantized matrix in the total rate.
Step 4: Process the current view block-wise. For each block, perform block matching in the previous reconstructed view and in its projection.

Step 5: Select the best block based on the Lagrangian cost functions

  C1 = D1 + λ·R1  and  C2 = D2 + λ·R2,

where C1, D1, and R1 are the total cost, the distortion relative to the block in the current view, and the rate required to encode the residual block and motion vector for the match in the previous reconstructed view; the second equation gives the same quantities for the projection of the previous reconstructed view. In the default case λ is tied to the quantizer by

  λ = 0.85 · Q²,

where Q is the quantization step size.
Step 6: Select the best overall block based on the minimum-cost criterion, adding the rate for the side information indicating which option was chosen.
Step 7: Apply the 2-D DCT to the residual, quantize it, and transmit it to the decoder. Also perform the inverse operations (de-quantization, inverse DCT, and motion compensation using the motion vector side information) to obtain the reconstructed frame used in the next encoding stage.
Step 8: Compute the total distortion as the mean squared error between the original view and the reconstructed view. The total rate is the rate of encoding the residual frame plus the side information mentioned above. The peak signal-to-noise ratio (PSNR) is computed from the distortion as

  PSNR = 10 · log10(255² / d),

where d is the average distortion per pixel.

Important note: a training module runs before the actual encoding. In it, (i) all images are intra coded and a probability density function of the DCT coefficients is obtained; and (ii) all images except the first are coded using block matching with 2-D warping, except that the Lagrangian cost includes only the side-information rate, not the rate for encoding the block. This gives the probability density of the residual DCT coefficients. These pdfs are used in the encoding stage.

V. RESULTS

V.A. Implementation Parameters

Since the algorithm is computationally expensive, we tested our system on a small data set: a 4x4 grid of 256x256-resolution images.
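The cost and quality measures from Steps 5 and 8 can be sketched in a few lines of Python; λ = 0.85·Q² is taken as the default here, a common choice in video coding, since the report's constant is an assumption on our part:

```python
import math

def lagrangian_cost(distortion, rate_bits, q_step):
    """Rate-distortion cost C = D + lambda * R (Step 5), with the
    common default lambda = 0.85 * Q^2 (assumed constant)."""
    lam = 0.85 * q_step ** 2
    return distortion + lam * rate_bits

def psnr(mse):
    """Peak signal-to-noise ratio in dB for 8-bit images (Step 8):
    PSNR = 10 * log10(255^2 / d), where d is the mean squared
    error per pixel."""
    return 10.0 * math.log10(255.0 ** 2 / mse)
```

The mode decision of Step 6 then reduces to comparing `lagrangian_cost` for the two candidate references and keeping the smaller one, with the one-bit mode flag added to the winner's rate.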
The light field data sets we chose are the Crystal and Lego light fields, shown in Fig. 1 and Fig. 2 above. Only the luminance component was compressed. The original images were of 1024x1024 resolution; they were downsampled with bicubic interpolation to avoid aliasing, yielding the smaller images. The default block size was 8x8. For the rate-PSNR curves, the quantization step size was varied as 2^i, with i = 3, 4, 5, 6, 7, 8.

V.B. Rate-Distortion Behavior

Here we compare the 2-D projective warping method with block matching against motion compensation with integer-pixel accuracy on the two data sets. Fig. 6 and Fig. 7 show that the projective warping method gives higher PSNR at lower rate. The projective method is a better predictor of the next consecutive view, yielding residuals that are less distorted and cheaper to encode, whereas motion compensation alone needs longer motion vectors to compensate for the camera movement and produces more distorted residuals.

Fig. 6 Rate-PSNR curve for Crystal

We also see that the Crystal light field has a lower rate-PSNR curve than Lego. Lego has a nearly flat background wall, while Crystal has a sharp, detailed background: as the camera moves between views, the flat background changes little, producing smaller residuals and better performance than for Crystal, where the camera movement affects the sharp background more.

Fig. 7 Rate-PSNR curve for Lego

V.C. Compression Ratio Performance

For these results we use the default block size of 8. The compression ratios for the Crystal light field reach about 60 at the large quantization step size of 2^8, while those for the Lego light field reach about 100. Fig. 8 and Fig. 9 show that motion compensation gives lower compression ratios than the projective 2-D warping method. Again, the projection is a better predictor than the previous reconstructed view alone, so the residuals and motion vectors are cheaper to code, leading to better compression.

Fig. 8 Compression Ratio for Crystal
Fig. 9 Compression Ratio for Lego

V.D. Rate-Distortion Behavior for Different Block Sizes

We use three block sizes: 8x8, 16x16, and 32x32. Fig. 10 and Fig. 11 show, for both data sets, that the 2-D warping method performs better with smaller blocks: smaller blocks allow a better estimate of the current view from the previous reconstructed frame and its projection, which means lower distortion and hence higher PSNR.

Fig. 10 Rate-PSNR for varying block size: Crystal

VI. SUMMARY AND CONCLUSION

Light fields amount to gigabytes of data and can be compressed by exploiting the redundancy between views as much as possible. Motion compensation does this to a degree, but the projection obtained with the 2-D warping method turns out to be a better predictor

of consecutive views, so combining it with motion-compensation-style block matching gives much better results than motion compensation alone. Our method exploits the fact that consecutive views are related through the specific way the camera moves and can therefore be predicted from one another using a projection matrix. The way the views are sequenced also plays a critical role in achieving good compression. Our aim is to minimize both distortion and rate, and our method achieves this better than motion compensation.

Fig. 11 Rate-PSNR for varying block size: Lego

VII. FUTURE WORK

In our simulations we used small data sets obtained by sub-sampling the light fields, yet we obtained very good compression ratios at acceptable image quality. Larger data sets could give much higher ratios; this remains to be verified. We also only considered algorithms with up to integer-pixel accuracy. Sub-pel algorithms, with interpolation of blocks to obtain the residuals, could be developed and evaluated; I suspect they would give much better results than the current version. Extra modes, such as a copy mode, could also be introduced.

ACKNOWLEDGEMENTS

I would like to thank Professor Girod and Mina Makar for their help and advice throughout the project. Thanks to Professor Peter Kovesi and Professor Levoy's group for access to the open-source Matlab library and the light field data sets. Special thanks to Chuo-Ling Chang for his DAPBT code (which we ultimately did not use in our simulations) and to Huizhong Chen for his help.

REFERENCES

[1] M. Levoy and P. Hanrahan, "Light Field Rendering," Proceedings of SIGGRAPH '96, pp. 31-42, Aug. 1996.
[2] C.-L. Chang, X. Zhu, P. Ramanathan, and B. Girod, "Shape adaptation for light field compression," IEEE International Conference on Image Processing, Vol. 1, Sept. 2003.
[3] M. Magnor and B. Girod, "Hierarchical coding of light fields with disparity maps," IEEE International Conference on Image Processing, Vol. 3, pp. 334-338, Oct. 1999.
[4] B. Girod, C.-L. Chang, P. Ramanathan, and X. Zhu, "Light field compression using disparity-compensated lifting," IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 373-376, July 2003.
[5] C.-L. Chang, X. Zhu, P. Ramanathan, and B. Girod, "Light field compression using disparity-compensated lifting and shape adaptation," IEEE Transactions on Image Processing, Vol. 15, No. 4, April 2006.
[6] T.-Y. Kuo and C.-H. Chan, "Fast variable block size motion estimation for H.264 using likelihood and correlation of motion field," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No. 10, Oct. 2006.

APPENDIX

Division of Work

Task | Anand | Shinjini
Choice of data sets | 50% | 50%
View Sequencing | 100% | -
Projection Function* | 20% | 40%
Motion Compensation and Mode Selection | 100% | -
Block matching | 100% | -
Simulation and all results | 100% | -
Presentation | 40% | 60%
Report | 100% | -

*We had help from Prof. Peter Kovesi's open-source Matlab function library.