Transforms for the Motion Compensation Residual


The MIT Faculty has made this article openly available.

Citation: Kamisli, F., and J. S. Lim. "Transforms for the motion compensation residual." Acoustics, Speech and Signal Processing, 2009 (ICASSP 2009), IEEE International Conference on, pp. 789-792. © 2009 IEEE.
As Published: http://dx.doi.org/.9/icassp.29.49972
Publisher: Institute of Electrical and Electronics Engineers
Version: Final published version
Citable Link: http://hdl.handle.net/72./94
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Transforms for the Motion Compensation Residual
Fatih Kamisli and Jae S. Lim
Research Laboratory of Electronics, Massachusetts Institute of Technology
E-mail: fkamisli@mit.edu

Abstract—The Discrete-Cosine-Transform (DCT) is the most widely used transform in image and video compression. Its use in image compression is often justified by the notion that it is the statistically optimal transform for first-order Markov signals, which have been used to model images. In standard video codecs, the motion-compensation residual (MC-residual) is also compressed with the DCT. The MC-residual may, however, possess different characteristics from an image. Hence, the question that arises is whether other transforms can be developed that perform better on the MC-residual than the DCT. Inspired by recent research on direction-adaptive image transforms, we provide an adaptive auto-covariance characterization for the MC-residual that shows some statistical differences between the MC-residual and the image. Based on this characterization, we propose a set of block transforms. Experimental results indicate that these transforms can improve the compression efficiency of the MC-residual.

Index Terms—Discrete cosine transforms, Motion compensation, Video coding

I. INTRODUCTION

The Discrete-Cosine-Transform (DCT) is the most widely used transform in image and video compression. Its use in image compression is often justified by its being the statistically optimal transform for a class of signals, known as first-order Markov signals, which have been used to model images. In video coding, however, what is often transformed is the motion-compensation residual (MC-residual). In standard video codecs, the MC-residual is also transformed with the DCT. The MC-residual is related to the images from which it has been obtained. However, its spatial characteristics may be different from those of an image.
A number of studies report that the statistical characteristics of the MC-residual differ in some ways from those of images [1], [2], [3]. Hence, the questions that arise are what those characteristics are, whether the DCT can effectively exploit them, and whether we can develop other transforms that exploit these characteristics better. Typically, motion compensation works well in smooth and slowly moving regions, and the MC-residual in such regions is small. Regions where motion compensation fails are occlusion regions. A major portion of the regions where motion compensation works only to some extent are object boundaries. Since motion-compensated prediction can only account for translational motion, whereas real-world objects also undergo other motions such as rotation, the shapes of objects tend to change slightly from frame to frame. As a result, the prediction around object boundaries is not successful. Therefore, the high-magnitude pixels in the MC-residual tend to concentrate along object boundaries, exposing one-dimensional structures in the MC-residual. An example is shown in Figure 1. It appears that using two-dimensional transforms that have basis functions with square support is not the best choice. Recently, there has been a great deal of research on transforms that can take advantage of locally anisotropic features in images [4], [5], [6], [7]. Conventionally, the 2-D DCT or the 2-D Discrete Wavelet Transform (DWT) is carried out as a separable transform by cascading two 1-D transforms in the vertical and horizontal directions. This approach does not take advantage of the locally anisotropic features present in images because it favors horizontal or vertical features over others. The main idea of these other approaches is to adapt to locally anisotropic features by performing the filtering along the direction in which the image intensity variations are smaller.
This is achieved by performing filtering and subsampling on oriented sublattices of the sampling grid [6], by directional lifting implementations of the wavelet transform [5], or by various other means. Even though most of this work is based on the wavelet transform, similar ideas have also been applied to DCT-based image compression [7]. However, it appears that the applicability of these ideas to modeling and compressing the MC-residual has not been investigated. In this paper, we develop block transforms for the MC-residual. Using insights obtained from the research on direction-adaptive image transforms, we investigate how locally anisotropic features of images affect the MC-residual. In the next section, we obtain adaptive auto-covariance characterizations of the MC-residual and the image, which reveal some statistical differences between the MC-residual and the image.

Fig. 1. Motion compensation residual (Mobile sequence at CIF resolution).
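For reference, the conventional separable 2-D DCT mentioned above can be written as two cascaded 1-D DCT passes, one down the columns and one across the rows. The sketch below is a generic illustration of that construction, not code from the paper; the function names are ours.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal 1-D DCT-II as a matrix; row k holds the k-th basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.where(np.arange(n) == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return scale[:, None] * c

def separable_dct2(block):
    """Conventional separable 2-D DCT: a 1-D DCT down each column,
    then a 1-D DCT across each row of the result."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T
```

Because the two 1-D passes run strictly along the vertical and horizontal axes, the basis functions are axis-aligned, which is exactly why this construction favors horizontal or vertical features over oblique ones.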

Based on this characterization, we propose in Section III a set of block transforms which can be used, together with the 2-D DCT, to compress the MC-residual. Experimental results, presented in Section IV, demonstrate that these transforms can improve the compression efficiency of the MC-residual.

II. CHARACTERIZATION OF THE MOTION COMPENSATION RESIDUAL

Previous characterizations of the MC-residual have focused on representing its auto-covariance with functions that provide a fit to experimental data [1], [2], [3]. These studies have used one global model for the entire MC-residual. However, the lessons learned from the research on direction-adaptive image transforms indicate that adapting the representation to local anisotropic features in images may improve the compression performance. We consider how such local anisotropic features of images affect the MC-residual. To analyze this, we characterize the auto-covariance of the image and the MC-residual using a more general version of the conventionally used separable first-order Markov model. The separable model and the generalized model are given below in (1) and (2), respectively.

R(I, J) = ρ1^|I| · ρ2^|J|    (1)

R(θ, I, J) = ρ1^|I cos(θ) + J sin(θ)| · ρ2^|−I sin(θ) + J cos(θ)|    (2)

The generalized model is a rotated version of the separable model, where θ represents the amount of rotation. Setting θ to zero reduces the generalized model to the separable model. We estimate the parameters ρ1 and ρ2 for the separable model, and the parameters ρ1, ρ2 and θ for the generalized model, from blocks of 8x8 pixels of an image (Mobile sequence) and its MC-residual. We use the biased estimator to estimate the auto-covariance of the blocks and then find the parameters ρ1, ρ2 and θ that minimize the mean-square error between the auto-covariance estimate and the models in (1) and (2). Figures 2-a and 2-b show scatter plots of ρ1 and ρ2 estimated from the image for the separable and the generalized auto-covariance models.
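The estimation procedure above can be sketched as follows: compute a biased auto-covariance estimate of an 8x8 block, then search for the (ρ1, ρ2, θ) that minimize the mean-square error against model (2). The paper does not specify the optimization method; the grid search, parameter ranges, and function names below are our assumptions.

```python
import numpy as np

def biased_autocov(block):
    """Biased auto-covariance estimate of an NxN block, normalized so the
    zero-lag value is 1 (assumes the block is not constant)."""
    x = block - block.mean()
    n = x.shape[0]
    r = np.zeros((2 * n - 1, 2 * n - 1))
    for di in range(-(n - 1), n):
        for dj in range(-(n - 1), n):
            a = x[max(0, di):n + min(0, di), max(0, dj):n + min(0, dj)]
            b = x[max(0, -di):n + min(0, -di), max(0, -dj):n + min(0, -dj)]
            r[di + n - 1, dj + n - 1] = (a * b).sum() / (n * n)  # biased: divide by N^2
    return r / r[n - 1, n - 1]

def model(theta, rho1, rho2, n):
    """Rotated separable first-order Markov model, Eq. (2), over lags -(n-1)..(n-1)."""
    I, J = np.meshgrid(np.arange(-(n - 1), n), np.arange(-(n - 1), n), indexing="ij")
    return (rho1 ** np.abs(I * np.cos(theta) + J * np.sin(theta))
            * rho2 ** np.abs(-I * np.sin(theta) + J * np.cos(theta)))

def fit(block, n_rho=20, n_theta=16):
    """Grid-search (rho1, rho2, theta) minimizing the MSE against the estimate."""
    r = biased_autocov(block)
    n = block.shape[0]
    best_mse, best_params = np.inf, None
    for theta in np.linspace(0, np.pi, n_theta, endpoint=False):
        for rho1 in np.linspace(0.05, 0.95, n_rho):
            for rho2 in np.linspace(0.05, 0.95, n_rho):
                mse = np.mean((r - model(theta, rho1, rho2, n)) ** 2)
                if mse < best_mse:
                    best_mse, best_params = mse, (rho1, rho2, theta)
    return best_params
```

For a block whose rows are identical copies of one vector, the fitted model reports near-perfect correlation along the vertical direction and a small coefficient along the other, which is the kind of anisotropy the scatter plots visualize.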
Figures 2-c and 2-d show scatter plots of the same parameters estimated from the MC-residual. We use the Mobile sequence for these experiments because this sequence contains various moving objects that cause unsuccessful motion-compensated prediction at object boundaries. Characteristics similar to those in the plots of Figure 2 have been observed with other sequences as well. The scatter plots of ρ1 and ρ2 obtained from the image using the separable and the generalized auto-covariance models are plotted in Figures 2-a and 2-b. In the plot of the generalized model (Figure 2-b), the region along the ρ1 = ρ2 line is less populated with data points than in the plot of the separable model (Figure 2-a). On the other hand, there are almost no data points beyond the ρ1 = 0.8 and ρ2 = 0.8 lines in the separable model. From these observations, we can see that in the generalized case, the data points lying below the ρ1 = ρ2 line have been shifted along the ρ1 axis towards the right, and the data points lying above the ρ1 = ρ2 line have been shifted along the ρ2 axis towards the top. This implies that by allowing an additional parameter θ in the model, higher correlation coefficients ρ1 or ρ2 become more likely. The parameter θ adjusts itself such that either ρ1 or ρ2 points along more regular variations than in the separable model, which is consistent with the resampling and lifting methods in [4] and [5]. From the plots for the MC-residual in Figure 2, we can observe that in the plot with the separable auto-covariance model (Figure 2-c), the points seem to fill up evenly a region between a ρ2 = k/ρ1 curve and the axes. In the plot with the generalized auto-covariance model (Figure 2-d), the points seem to concentrate towards the tails of that curve, forming semi-disks centered on the axes. Better concentration of data points in a region means better predictability of the data, which in turn implies that the model in (2) can provide a more faithful characterization of the data.
The significant improvement of the characterization with an additional parameter, θ, is important since a better parametrization of a source can potentially lead to a better compression of the source. Figure 2 also illustrates the effect of the locally anisotropic features of images on the MC-residual. Consider the generalized auto-covariance characterizations of the image and the MC-residual (Figures 2-b and 2-d). While the characterization of the image has data points which, roughly described, fill up two disks tangent to the ρ1 and ρ2 axes, the characterization of the MC-residual has its data points concentrated in two semi-disks centered on those axes. In other words, the data points move closer to the axes in the MC-residual case. This means that for any data point (ρ1, ρ2) in the image characterization, the smaller covariance factor becomes even smaller in the MC-residual characterization. This is how locally anisotropic features of images affect, or propagate to, the MC-residual. It is a major difference in the statistical characteristics between the image and the MC-residual. Therefore, direction-adaptive transforms proposed for images may not work as well on MC-residuals. Indeed, in Section IV, we compare the performance of the direction-adaptive block-based image transform in [7] with the transforms we propose. The results show the superiority of the proposed transforms on the MC-residual.

III. ONE-DIMENSIONAL TRANSFORMS

The ρ1 vs. ρ2 scatter plot of the MC-residual obtained with the generalized model in Figure 2-d indicates that often one of the two correlation coefficients, either ρ1 or ρ2, is significantly larger than the other. Hence, we propose to decorrelate the data along the direction with the larger correlation coefficient. Unlike the direction-adaptive transforms applied to images, we choose not to perform any decorrelation along the other direction, which has the smaller correlation coefficient.
For low correlation coefficients, decorrelation may not perform well. It may even worsen the energy compaction, especially for signals such as the MC-residual. Unlike in an image, the energy of the MC-residual is not uniformly distributed in the spatial domain. Even within a block, many pixels may have zero intensity and the energy may often be concentrated in a region of the

Fig. 2. Scatter plots of the estimated parameters ρ1 and ρ2 using the separable and generalized auto-covariance models from an image (Mobile sequence at CIF resolution) and an MC-residual: (a) ρ1 vs. ρ2 from the separable auto-covariance (image); (b) ρ1 vs. ρ2 from the generalized auto-covariance (image); (c) ρ1 vs. ρ2 from the separable auto-covariance (MC-residual); (d) ρ1 vs. ρ2 from the generalized auto-covariance (MC-residual).

block (see Figure 1). Performing a spatial transform with support on the whole block may not produce the best energy compaction. The transforms that we propose to use for the MC-residual, in addition to the 2-D DCT, are shown in Figure 3. They are one-dimensional DCTs over groups of pixels in 8x8 blocks. One group of pixels, in the first transform for example (leftmost in Figure 3), consists of the eight leftmost pixels sitting on top of each other. The groups of pixels are chosen such that they form lines roughly pointing in one direction, which is the direction of the larger correlation coefficient. We choose sixteen directions that cover 180°, and thus we have sixteen transforms. Only the first five are shown in Figure 3 due to space limitations. Each group of pixels in each of the shown transforms is represented with an arrow that traverses those pixels. The scanning patterns of the resulting transform coefficients depend on the orientation of the transforms. They are designed such that lower-frequency coefficients are scanned before higher-frequency coefficients and such that the scans are continuous. The best transform for each block is chosen in a Rate-Distortion (RD) optimized manner. The block is encoded with each available transform, including the 2-D DCT. Then, for each transform, a cost function is formed as a linear combination of the distortion (MSE) of the block and the number of bits spent on the block (both to encode the quantized coefficients and the side information).
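A minimal sketch of such a one-dimensional transform: an orthonormal 1-D DCT applied to each pixel group of an 8x8 block. Only the vertical grouping (the leftmost transform in Figure 3) is spelled out below; the exact pixel groupings of the other fifteen directional transforms are defined by the paper's Figure 3 and are not reproduced here. The function names are ours.

```python
import numpy as np

def dct1d(x):
    """Orthonormal 1-D DCT-II of a vector (the basis of the proposed transforms)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.sqrt(np.where(np.arange(n) == 0, 1.0 / n, 2.0 / n))[:, None]
    return (scale * basis) @ x

def directional_transform(block, groups):
    """Apply a 1-D DCT along each pixel group of a block.
    `groups` is a list of coordinate lists that covers the block exactly once."""
    coeffs = []
    for g in groups:
        pixels = np.array([block[i, j] for (i, j) in g])
        coeffs.append(dct1d(pixels))
    return coeffs

# The leftmost transform in Fig. 3: each group is one column of eight
# vertically stacked pixels.
vertical_groups = [[(i, j) for i in range(8)] for j in range(8)]
```

Since each group's 1-D DCT is orthonormal and the groups partition the block, the transform preserves the block's energy while decorrelating only along the chosen direction, leaving the low-correlation direction untouched as the text argues.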
The transform with the smallest cost function is chosen for each block. The chosen transform for each block is encoded using variable-length codes (VLC). Since the 2-D DCT is chosen about half of the time on average, we use a 1-bit codeword to represent the 2-D DCT and 5-bit codewords to represent each of the sixteen one-dimensional transforms. This choice was made because of its simple implementation and because it is close to the optimal Huffman code for the average frequencies of the transforms. More sophisticated methods may yield improvements in coding the side information.

IV. EXPERIMENTAL RESULTS

We present experimental results to illustrate the performance of the proposed transforms within the H.264/AVC reference software. We use QCIF resolution sequences at 30 frames per second. Some of the important encoder parameters are as follows. The first frame is encoded as an I-frame, and all the remaining frames are coded as P-frames. Quarter-pixel-resolution full-search motion estimation with adaptive block sizes (16x16, 16x8, 8x16, 8x8) is used. Entropy coding is performed with context-adaptive variable-length codes (CAVLC). The results of the experiments are reported with the Bjontegaard-Delta (BD) bitrate metric [8] using the following quantization parameters: 24, 28, 32, 36. The BD-bitrate metric indicates the average bitrate savings (as a percentage) of the codec with the proposed transforms with respect to the codec with the conventional transform, the 2-D DCT. Figure 4-a shows the results obtained with the proposed transforms. The upper graph shows results obtained when taking the side information, which signals the chosen transform for each block, into account. The lower graph shows results obtained without taking the side information into account. The lower graph is included to show the ultimate performance of the proposed transforms and scans.
These performances may be achievable if the side information is coded more intelligently and becomes negligibly small. For QCIF resolution sequences, the proposed transforms can achieve bitrate savings up to .8% with an average of 2%. When the side information bits are neglected, the ultimate achievable bitrate savings can be as large as 4.3% with an average of 2.7%. We have also performed experiments to evaluate the performance of the direction-adaptive image transforms in [7] on the MC-residual. In [7], the performance on images is reported. The transforms in [7] are 2-D directional DCTs together with a DC separation and ΔDC correction method borrowed from [9]. For these experiments, we complemented the six transforms in [7] with another eight transforms to achieve finer directional adaptivity, comparable to the adaptivity of our proposed transforms. We also used the scans proposed in [7]. All remaining parts of the system were unchanged. The results of these experiments are shown in Figure 4-b. The average bitrate savings are 4.8%, which is less than half of the 2% achieved with the proposed transforms.
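The Bjontegaard-Delta bitrate metric [8] used above can be sketched as follows: fit a cubic polynomial of log-rate as a function of PSNR for each codec from its four RD points, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference to a percentage. This follows the commonly used implementation of [8]; the function and variable names are our assumptions.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-Delta bitrate [8]: average bitrate difference (%) of the
    test codec vs. the anchor, from four RD points per codec (e.g. the
    QP 24/28/32/36 runs used in the paper). Negative means the test saves bits."""
    lr_a = np.log10(rate_anchor)
    lr_t = np.log10(rate_test)
    # Cubic fit of log-rate as a function of PSNR (four points: exact fit).
    pa = np.polyfit(psnr_anchor, lr_a, 3)
    pt = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the overlapping PSNR range.
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

For example, a codec that needs 10% fewer bits at every quality level yields a BD-bitrate of -10%, i.e. 10% average bitrate savings.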

Fig. 3. One-dimensional transforms for the motion compensation residual. The arrows traversing a set of pixels indicate a one-dimensional DCT on those pixels. A total of 16 transforms, pointing in different directions, covers 180°. Only the first five transforms are shown here; the remaining transforms are symmetric versions of these five and can be easily derived.

Fig. 4. Bjontegaard-Delta bitrate results for QCIF resolution sequences, with and without the side information taken into account. (a) BD-bitrate results of the proposed transforms; average bitrate savings w.r.t. the 2-D DCT is 2%. (b) BD-bitrate results of the direction-adaptive image transforms in [7]; average bitrate savings w.r.t. the 2-D DCT is 4.8%.

V. CONCLUSION

We have analyzed the effect of locally anisotropic features of images on the motion-compensation residual (MC-residual). The analysis reveals statistical differences between the MC-residual and the image. We have proposed transforms based on this analysis and reported experimental results indicating that these transforms can improve the compression efficiency of the MC-residual. Future research efforts will focus on improving these transforms and on investigating the effects of locally anisotropic features of images on other prediction residuals, such as the intra-prediction residual in H.264/AVC, the resolution-enhancement residual in scalable video coding, and the disparity-compensation residual in multiview video coding.

REFERENCES

[1] C.-F. Chen and K. Pang, "The optimal transform of motion-compensated frame difference images in a hybrid coder," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 40, no. 6, pp. 393-397, Jun. 1993.
[2] W. Niehsen and M. Brunig, "Covariance analysis of motion-compensated frame differences," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 536-539, Jun. 1999.
[3] K.-C. Hui and W.-C. Siu, "Extended analysis of motion-compensated frame difference for block-based motion prediction error," IEEE Transactions on Image Processing, vol. 16, no. 5, pp. 1232-1245, May 2007.
[4] E. Le Pennec and S. Mallat, "Sparse geometric image representations with bandelets," IEEE Transactions on Image Processing, vol. 14, no. 4, pp. 423-438, Apr. 2005.
[5] C.-L. Chang and B. Girod, "Direction-adaptive discrete wavelet transform for image compression," IEEE Transactions on Image Processing, vol. 16, no. 5, pp. 1289-1302, May 2007.
[6] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P. L. Dragotti, "Directionlets: anisotropic multidirectional representation with separable filtering," IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 1916-1933, Jul. 2006.
[7] B. Zeng and J. Fu, "Directional discrete cosine transforms for image coding," IEEE International Conference on Multimedia and Expo, pp. 721-724, Jul. 2006.
[8] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," VCEG Contribution VCEG-M33, Apr. 2001.
[9] P. Kauff and K. Schuur, "Shape-adaptive DCT with block-based DC separation and ΔDC correction," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 3, pp. 237-242, Jun. 1998.