On Spatial Adaptation of Motion Field Smoothness in Video Coding

Prakash Ishwar and Pierre Moulin

Abstract

Most motion compensation methods in the literature make strong assumptions about the smoothness of the underlying motion field. For instance, block-matching algorithms assume a blockwise-constant motion field and are adequate for translational motion models; control grid interpolation assumes a blockwise-bilinear motion field and captures zooming and warping fairly well. Time-varying imagery, however, often contains both types of motion (as well as others) and hence exhibits a high degree of spatial variability in the smoothness of its motion field. We develop a simple method to spatially adapt the smoothness of the motion field. To this end, we introduce the notion of a motion field that is characterized by a set of labels. The labels provide the flexibility to switch adaptively, at the local level, between two motion models that have very different smoothness properties. The switched framework for motion compensation performs significantly better than each of its constituent motion models, in terms of both visual quality and signal-to-noise ratio ( dB on the average), and it demonstrates substantial improvements in video quality over a wide range of bit rates. Finally, we develop an extension of this method that enhances the overlapped block motion compensation scheme by allowing spatial adaptation of the window function.

Index Terms: video compression, motion estimation, block matching, overlapped block motion compensation, interpolation methods.

The authors are with the Department of Electrical and Computer Engineering, the Coordinated Science Laboratory, and the Beckman Institute at the University of Illinois, 405 N. Mathews Avenue, Urbana, IL. E-mail: {ishwar,moulin}@ifp.uiuc.edu. Ph: (217) , (217) Fax: (217). Part of this work was presented at ICIP 97. This research was supported by NSF under grant MIP

I. INTRODUCTION

The principal drawback of the conventional block matching algorithm (BMA) [1] in video compression is the creation of unnatural blocking artifacts in the motion-compensated prediction of the current frame. This arises, for example, when multiple objects with significantly different types of motion lie within the same block. Blocking artifacts may become perceptually prominent at medium to low bit rates.

Blocking artifacts can be avoided altogether. One approach uses motion-interpolative techniques such as control grid interpolation (CGI) [2], triangle motion compensation (TMC) [3], and motion compensation based on mesh and affine motion parameterization [4]. CGI performs well in regions where the true two-dimensional (2-D) motion field is smooth, such as those that result from a zoom, pan, or rotation of the camera. CGI, however, does not perform consistently better than BMA, primarily because it cannot perform any kind of motion segmentation (i.e., decoupling of motion in adjacent blocks).

A principal drawback common to these motion compensation methods is a lack of flexibility in the type of motion approximation they can perform. CGI uses a piecewise bilinear, TMC a piecewise planar, and BMA a piecewise constant motion field. These methods do not have the structure to represent motion fields that exhibit local deviations in their degree of smoothness. CGI produces smooth motion field predictions and performs poorly at locations in the frame where the motion field changes abruptly. If a block is part of a region that undergoes monolithic translation, then BMA will outperform CGI in this block. On the other hand, block matching is at a disadvantage when attempting to capture nontranslational motion. Specifically, for zooming, rotational motion, or any other motion for which regions within a block undergo warping in successive frames, CGI outperforms BMA.

Variable-size block matching [5] would only partially cure the problem. Note that the BMA and CGI schemes can produce the same solution only in the pathological case of a constant motion field, and it appears to be difficult to simultaneously preserve the advantages of both methods. The key idea in our study is to overcome that apparent limitation. We introduce a method that preserves the benefits of higher-order interpolative schemes such as CGI while adding the flexibility of a BMA-type 2-D zeroth-order hold interpolation for some blocks. This method is well suited to scenes that exhibit translational motion of some objects and warping or rotational motion of other objects. We characterize the motion field by a set of

motion vectors and labels defined at regular grid points. The labels make it possible to switch locally between motion models such as BMA and CGI. We call this new scheme switched CGI (SCGI).

To our knowledge, spatial adaptation in the motion compensation literature has been applied almost exclusively to the spatial resolution of the motion field [4]-[8]. An exception is the paper [9], which uses a bank of interpolation filters to provide flexibility in bilinear motion interpolation. A different adaptive approach consists of switching locally between a global and a local motion compensation model [10], [11]. In contrast with [10], [11], we switch between purely local motion models and use an iterative optimization framework to do so. Additional details about our method may be found in [12].

The organization of this paper is as follows. In Section II, we briefly review the block matching and CGI algorithms and introduce our switched CGI scheme. In Section III, we extend this approach and develop an adaptive version of overlapped block motion compensation (OBMC). In Section IV, we present video coding results comparing smoothness-adapted and nonadapted motion-compensated prediction methods at a fixed spatial resolution. These results demonstrate the advantages of spatially adapting the smoothness of motion fields.

II. SWITCHED CONTROL GRID INTERPOLATION

The motion-compensated prediction methods of this section can be described by the following equation:

$$\hat{F}_{\mathrm{current}}(\mathbf{s}) = F_{\mathrm{previous}}\big(\mathbf{s} + \mathrm{MV}^{(\mathrm{model})}(\mathbf{s})\big). \tag{1}$$

Here, $\hat{F}_{\mathrm{current}}(\mathbf{s})$ is the predicted intensity at position $\mathbf{s}$ in the current frame, $F_{\mathrm{previous}}$ is the previous frame available at the decoder, and $\mathrm{MV}^{(\mathrm{model})}(\mathbf{s})$ is the motion vector at position $\mathbf{s}$ according to the motion model in use. Motion vectors are not integer-valued in general, in which case the prediction can be obtained by a bilinear interpolation of the intensities at neighboring pixels [1].
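Equation (1) with fractional motion vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the list-of-rows frame layout, the (dy, dx) ordering of each vector, and the clamping of out-of-frame coordinates are all assumptions made for the example.

```python
import math

def sample_bilinear(frame, y, x):
    """Intensity of `frame` at real-valued (y, x), by bilinear interpolation.

    Coordinates are clamped to the frame; this border policy is an assumption.
    """
    h, w = len(frame), len(frame[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1 - fy) * top + fy * bottom

def predict_frame(previous, motion):
    """Equation (1): predicted[s] = previous[s + MV(s)] for every pixel s.

    `motion[y][x]` is a hypothetical (dy, dx) pair, possibly fractional.
    """
    return [
        [sample_bilinear(previous, y + motion[y][x][0], x + motion[y][x][1])
         for x in range(len(previous[0]))]
        for y in range(len(previous))
    ]

# A 3 x 3 frame displaced by half a pixel to the right everywhere.
previous = [[0.0, 10.0, 20.0],
            [30.0, 40.0, 50.0],
            [60.0, 70.0, 80.0]]
half_pel_right = [[(0.0, 0.5)] * 3 for _ in range(3)]
predicted = predict_frame(previous, half_pel_right)
```

The motion models discussed in this section differ only in how the per-pixel field passed as `motion` is derived from the grid-point motion vectors.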
The motion compensation models in this section can be described as follows. The current frame is partitioned into $M \times M$ square blocks by a uniform grid. Each grid point (or block) is associated with a motion vector (see Fig. 1). The motion vector for every other pixel is a function of the grid-point motion vectors. It is precisely the form of this functional dependence that characterizes a specific motion model in this framework. Given the functional dependence, a

variety of algorithms can be used to determine the parameters of the model and the grid motion vectors by minimizing an estimation criterion such as the mean-squared prediction error. Some of these algorithms may be optimal with respect to the estimation criterion, whereas others may be suboptimal but fast.

A. Block Matching

In the block matching motion model, all pixels in the $M \times M$ block $B$ are constrained to have the same motion vector. This block-constant motion vector is obtained by minimizing the estimation criterion for the block. Referring to Fig. 1 and Equation (1), the BMA motion model can be described by:

$$\mathrm{MV}^{\mathrm{BMA}}(\mathbf{s}) = \mathrm{MV}(\mathbf{s}_1), \quad \mathbf{s} \in B \tag{2}$$

$$\hat{F}^{\mathrm{BMA}}_{\mathrm{current}}(\mathbf{s}) = F_{\mathrm{previous}}\big(\mathbf{s} + \mathrm{MV}^{\mathrm{BMA}}(\mathbf{s})\big). \tag{3}$$

Here, $\mathbf{s}_1$ is the top-left pixel in block $B$. The resulting motion field is constant within each block. Equation (2) can also be interpreted (from the decoder's point of view) as a 2-D interpolation of the motion vectors assigned to control points (top-left block corner pixels) using a 2-D rectangular interpolating kernel. In other words, block matching can be viewed as a 2-D zeroth-order motion hold; however, the block-constant motion vectors $\mathrm{MV}(\mathbf{s}_i)$ are usually chosen so as to minimize the estimation criterion over the entire block.

B. Control Grid Interpolation

The CGI motion model may be succinctly described by the following equation (see Fig. 1):

$$\mathrm{MV}^{\mathrm{CGI}}(\mathbf{s}) = w(\mathbf{s} - \mathbf{s}_1)\,\mathrm{MV}(\mathbf{s}_1) + w(\mathbf{s} - \mathbf{s}_2)\,\mathrm{MV}(\mathbf{s}_2) + w(\mathbf{s} - \mathbf{s}_3)\,\mathrm{MV}(\mathbf{s}_3) + w(\mathbf{s} - \mathbf{s}_4)\,\mathrm{MV}(\mathbf{s}_4). \tag{4}$$

Here, $w(\mathbf{s})$ is the bilinear window function centered at $\mathbf{0}$, although the representation in (4) accommodates arbitrary interpolating kernels. The parameters of the CGI model, namely the control-point motion vectors $\mathrm{MV}(\mathbf{s}_i)$, are estimated by optimizing the estimation criterion. This is commonly done using an iterative algorithm. The control point motion vectors (CPMVs) are assigned initial values (typically zero).
This is followed by successively determining, for each control point, the locally optimal motion vector (over a search range) that minimizes the estimation criterion over the control point's region of influence (ROI; see Footnote 1). The motion vectors of neighboring control points are held fixed during this minimization. This process, when carried through for all the control points (typically in raster-scan order), constitutes one iteration. The process is repeated until either there is no significant change in the control-point motion vectors between successive passes or a maximum number of iterations has been completed. We refer the reader to [2] for details of the method.

Footnote 1. The region of influence of a control point is the set of pixels whose motion vectors depend on the motion vector assigned to the control point. Note that the motion vector for each pixel is a bilinear function of the four corner control-point motion vectors of the block to which the pixel belongs. For estimation criteria such as mean squared error and mean absolute error, the region of influence of a control point is a $(2M-1) \times (2M-1)$ square region centered at the control point.

C. Switched CGI

The drawbacks of CGI mentioned in the Introduction can be at least partially attributed to the inability of CGI to decouple motion in adjacent blocks so as to accommodate spatially abrupt changes in the smoothness of the underlying motion field across block boundaries. Such abrupt changes in motion across block boundaries are well captured by block matching. In our proposed SCGI scheme, we assign an additional label (a single bit) to each control point that indicates whether or not the motion field is constant over the block that has the control point at its top-left corner. In other words, the labels provide a means of adaptively eliminating the motion dependence across neighboring blocks in CGI. The motion vectors and labels at the control points are jointly optimized to minimize the estimation criterion, in this case the mean-squared prediction error. Suboptimal heuristic techniques could be used but are not given further attention in this section. For example, one could use the motion vectors obtained by block matching and optimize only over the labels. The SCGI motion model can be described by:

$$\mathrm{MV}^{\mathrm{SCGI}}(\mathbf{s}) = \begin{cases} \mathrm{MV}^{\mathrm{BMA}}(\mathbf{s}) & \text{if } \mathrm{label}(\mathbf{s}) = 0 \\ \mathrm{MV}^{\mathrm{CGI}}(\mathbf{s}) & \text{if } \mathrm{label}(\mathbf{s}) = 1. \end{cases} \tag{5}$$

Here, $\mathrm{label}(\mathbf{s}) = \mathrm{label}(\mathbf{s}_i)$, where $\mathbf{s}_i$ is the top-left control point of the block $i$ to which $\mathbf{s}$ belongs. In other words, all pixels within the same block have the same label, and this label can be regarded as the label of the block's control point. $\mathrm{MV}^{\mathrm{BMA}}(\mathbf{s})$ and $\mathrm{MV}^{\mathrm{CGI}}(\mathbf{s})$ are given by (2) and (4), respectively.

Fig. 2 illustrates the idea for the one-dimensional case, in which the horizontal axis represents the one-dimensional spatial location and the vertical axis the corresponding motion vector (here a scalar) at each position. The figure illustrates the nature of the motion field typical of each model and may also be interpreted as a sectional (horizontal or vertical) plot of one component of the actual motion vectors in a video sequence. In the figure, control points are assumed to be at locations that are multiples of 8 and form the one-dimensional counterpart of a 2-D uniform grid of pixels. In SCGI, a label value of one for a control point tells the decoder to linearly interpolate the motion vector components in the segment between that control point and the control point immediately to its right. A label value of zero indicates that the motion field is constant in the same segment.

The labels have an interpretation that can be physically visualized: the local degree of smoothness of the motion field, where blockiness corresponds to the least smooth model (BMA). Such local variations in the smoothness of the motion field are characteristic of real-life sequences. The idea is by no means restricted to the CGI-BMA hybridization scheme described in this section. Similar notions may be exploited for locally switching among a class of competing motion models if there is sufficient reason to believe that a given sequence has motion with significant traits of each model at a large number of spatiotemporal locations.

Hybridization, however, comes at a slight cost. One additional bit per block is needed to tell the decoder which model to switch to. This additional cost is justified if the resulting prediction error can be coded better in a rate-distortion sense. In other words, the reduction in the number of bits needed to encode the prediction error should at least offset the additional overhead incurred. That this is indeed the case for the BMA-CGI hybridization scheme is amply evidenced by our simulation results, which are elaborated on in Section IV.
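The one-dimensional picture described for Fig. 2 can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `scgi_field_1d`, the control-point spacing of 4 (the figure uses 8), and the sample motion values are assumptions chosen to keep the output short.

```python
def scgi_field_1d(cpmv, labels, block):
    """Per-pixel motion values for a 1-D switched CGI field, as in (5).

    cpmv   : motion value at each control point (len(labels) + 1 entries)
    labels : one bit per segment; 0 = hold the left control-point value
             (block matching), 1 = interpolate linearly to the control
             point immediately to the right (CGI)
    block  : control-point spacing M, in pixels
    """
    if len(cpmv) != len(labels) + 1:
        raise ValueError("need one more control point than segments")
    field = []
    for i, label in enumerate(labels):
        left, right = cpmv[i], cpmv[i + 1]
        for k in range(block):
            if label == 0:
                field.append(float(left))      # zeroth-order hold
            else:
                t = k / block
                field.append((1 - t) * left + t * right)  # first-order hold
    return field

cpmv = [0, 4, 4, 0]                              # hypothetical motion values
bma = scgi_field_1d(cpmv, [0, 0, 0], 4)          # all labels 0: block matching
cgi = scgi_field_1d(cpmv, [1, 1, 1], 4)          # all labels 1: pure CGI
mixed = scgi_field_1d(cpmv, [1, 0, 1], 4)        # switched field
```

Setting every label to zero or to one recovers the two constituent models, which is why a single extra bit per block is enough to describe the switched field.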
We now describe an iterative coordinate-descent algorithm to implement SCGI. The iterative framework is the same as the one described in the previous subsection for CGI, except that a label must also be updated for each control point. All labels are initialized to unity, which means that the initial motion model is purely CGI for all blocks. In each iteration, the optimal motion vector of a control point is determined by comparing the local prediction errors, over the region of influence of the control point, that result when a blockwise-constant motion field is assumed for the block (label value zero) and when bilinear interpolation is performed (label value unity). The best motion vector is retained, and the label of the control point is switched if necessary. The algorithm is greedy and is guaranteed to converge to a stationary point in a finite number of iterations because the estimation criterion is bounded below (by zero), is nonincreasing at each iteration, and there are only a finite number of combinations of labels and motion vectors.

III. SWITCHED OVERLAPPED BLOCK MOTION COMPENSATION

The CGI algorithm described in the previous section uses a first-order motion model that helps eliminate the blocking artifacts characteristic of BMA. A similar technique is often employed in the intensity domain to reduce blocking artifacts due to BMA. Indeed, such artifacts can be smoothed out by interpolating the prediction intensities corresponding to motion vectors from neighboring blocks. The weights of the interpolation are determined by a window function (typically bilinear) centered at each block. This forms the basis of the OBMC algorithm [13], [14]. OBMC globally smooths the prediction intensity field for all blocks using the same window:

$$\hat{F}^{\mathrm{OBMC}}_{\mathrm{current}}(\mathbf{s}) = \sum_{\mathbf{b} \in M\mathbb{Z}^2} w(\mathbf{s} - \mathbf{b} - \mathbf{c})\, F_{\mathrm{previous}}\big(\mathbf{s} + \mathrm{MV}^{\mathrm{OBMC}}(\mathbf{b})\big) \tag{6}$$

where $\mathbf{b} \in M\mathbb{Z}^2 = \{(kM, lM)^T,\ k, l \in \mathbb{Z}\}$ denotes the coordinates of a block (its top-left corner pixel), $\mathbf{c} = \left(\frac{M-1}{2}, \frac{M-1}{2}\right)^T$, and $w(\mathbf{s})$ is a 2-D non-negative interpolating kernel (window function) that satisfies $\sum_{\mathbf{b} \in M\mathbb{Z}^2} w(\mathbf{s} - \mathbf{b} - \mathbf{c}) = 1$ for each $\mathbf{s}$. Typically, the window is symmetric around $\mathbf{0}$ and has support $2M \times 2M$. Note that BMA can be considered a special case of OBMC that uses an appropriate rectangular window of weights.

OBMC is not without drawbacks: the same window function is used for all blocks of the frame, with the result that sharp features such as edges are oversmoothed in the motion-compensated prediction. Just as CGI cannot decouple adjacent blocks in the motion domain, OBMC has no means of eliminating the interdependencies of the prediction intensities across block boundaries in the intensity domain. As in SCGI, we can adaptively eliminate this inter-block dependency in OBMC by using binary labels for each block.
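The prediction takes the form:

$$\hat{F}^{\mathrm{SOBMC}}_{\mathrm{current}}(\mathbf{s}) = \begin{cases} \hat{F}^{\mathrm{BMA}}_{\mathrm{current}}(\mathbf{s}) & \text{if } \mathrm{label}(\mathbf{s}) = 0 \\ \hat{F}^{\mathrm{OBMC}}_{\mathrm{current}}(\mathbf{s}) & \text{if } \mathrm{label}(\mathbf{s}) = 1 \end{cases} \tag{7}$$

where, as before, $\mathrm{label}(\mathbf{s}) = \mathrm{label}(\mathbf{s}_i)$ (that is, all pixels within the same block have the same label), and $\hat{F}^{\mathrm{BMA}}_{\mathrm{current}}(\mathbf{s})$ and $\hat{F}^{\mathrm{OBMC}}_{\mathrm{current}}(\mathbf{s})$ are given by (3) and (6), respectively. The labels therefore provide a convenient mechanism for regulating the extent of the interdependencies of prediction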

intensities across block boundaries. We call this scheme switched OBMC (SOBMC). The labels and grid-point motion vectors can be jointly optimized under a single iterative scheme similar to the one described in the previous section for SCGI. To facilitate practical implementations, however, we have used the motion vectors obtained by BMA, following standard OBMC practice [15]. The optimization is then only over the labels at each block. The optimal decision is made by comparing the squared prediction errors for BMA and OBMC. Thus the labels can be found independently for each block.

The current H.263 standard (version 1 or 2) provides no option to turn OBMC on or off on a macroblock basis, either directly or indirectly [6]. The Advanced Prediction Mode of H.263 is signaled by bit 12 of PTYPE, which is frame specific and not macroblock specific. Thus, the Advanced Prediction Mode of H.263 includes OBMC that is applied to all macroblocks in P pictures. Our results show definite advantages to switching OBMC on and off on a macroblock basis.

IV. VIDEO CODING RESULTS

In this section, we describe our simulation setup and discuss the results of simulations performed over a variety of bit rates. We compare SCGI with BMA and CGI, which are the constituent models of SCGI. We also compare SOBMC with its constituent models, namely BMA and OBMC. This second comparison allows us to assess whether our proposed adaptation of smoothness in the motion domain produces improvements comparable to those resulting from adaptation of smoothness in the intensity domain.

The estimation criterion for all methods is the mean-squared prediction error. The motion vectors in all methods are estimated with half-pixel accuracy. The search for optimal motion vectors in BMA, CGI, and SCGI is exhaustive within the search range. In CGI, we use the iterative algorithm described in Section II.B to determine the control-point motion vectors.
In SCGI, we jointly optimize the labels and motion vectors using the iterative procedure outlined in Section II.C. The motion vectors in both cases are initialized to zero, and the labels in SCGI to unity (which means that each block is initially assumed to use CGI). For both OBMC and SOBMC, we use the standard bilinear window and use the BMA motion vectors as model parameters. Hence, unlike in SCGI, where the optimal label for any block depends on the label values of neighboring blocks, in the simple implementation of SOBMC that we chose to work with, the label for each block is determined independently.

Displaced frame differences (DFDs) are lossily encoded using the recently developed Estimation-Quantization (E-Q) wavelet coder [16], with lossless encoding of the output using an arithmetic coder [17]. The E-Q coder is a state-of-the-art still image coder that models the wavelet coefficients of an image in each subband by a generalized Gaussian distribution (GGD) whose shape and variance parameters are estimated locally in the subband. This coder was chosen to encode DFDs because its quantizers adapt to local image statistics. The wavelet coefficients are quantized in an optimal rate-distortion sense on the basis of the GGD model for each coefficient. The E-Q coder was operated at a specified value of λ (the slope of the rate-distortion curve of the coder) for the entire sequence. The value of λ was chosen so that the average overall bit rate of the sequence would be close to the target bit rate.

The overall bit rate for each frame, measured in bits per pixel (bpp), is the sum of the total motion-information bit rate (which includes the additional label bits for the SCGI and SOBMC models) and the bit rate of the DFD. The total number of bits for the motion vectors in any frame is estimated using first-order entropies of the motion vectors. While coding performance is coupled to both motion compensation and DFD coding, we have consistently used the same DFD coder for all the motion compensation schemes that we compare. We are primarily interested in investigating how assumptions on the smoothness of the underlying correspondence field relate to good prediction.

We compare the performance of the switched methods for sequences spanning three different bit rates intended for different applications: low bit rate (16 kbit/s) for video-phone applications, medium bit rate (128 kbit/s) for video-conferencing applications, and high bit rate (1.5 Mbit/s) for SDTV (standard-definition TV).
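The motion-rate accounting described above (first-order entropy of the motion vectors, plus one label bit per block for the switched schemes) can be sketched as follows. The function names and sample vectors are hypothetical, and treating each vector as a single symbol, rather than coding its components separately, is an assumption; the paper does not say which variant was used.

```python
import math
from collections import Counter

def first_order_entropy(symbols):
    """Empirical first-order entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def motion_rate_bpp(motion_vectors, num_blocks, block, with_labels):
    """Estimated motion side information for one frame, in bits per pixel.

    Motion vectors are costed at their first-order entropy; switched schemes
    add one label bit per block. All blocks are assumed to be block x block.
    """
    bits = len(motion_vectors) * first_order_entropy(motion_vectors)
    if with_labels:
        bits += num_blocks           # one label bit per block
    return bits / (num_blocks * block * block)

# Four hypothetical 8 x 8 blocks, one motion vector each.
vectors = [(0, 0), (0, 0), (1, 0), (0, -1)]
plain = motion_rate_bpp(vectors, num_blocks=4, block=8, with_labels=False)
switched = motion_rate_bpp(vectors, num_blocks=4, block=8, with_labels=True)
```

Under these assumptions the label overhead is exactly 1/M² bits per pixel, independent of the motion content; for 8 × 8 blocks that is 1/64 ≈ 0.0156 bpp.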
The coding results are very encouraging and illustrate the benefits of spatial switching between BMA and CGI interpolative schemes. Note that several coding refinements are possible but have not been considered here. For instance, we are not concerned with developing fast algorithms; we do not perform bidirectional prediction; we do not use quadtrees for the motion field; and we have not optimized the quantizers for coding DFDs.

A. Coding of Susie at 16 kbit/s

We encode the Susie sequence at 16 kbit/s and 7.5 frames per second (fps) using blocks. We used ±16 pixels for the motion vector search range and the standard bilinear window for OBMC and SOBMC. Table I and Figs. 3 and 4 show coding results for frames 1-117 in the subsampled Susie sequence. Fig. 3 compares the performance of the SCGI, CGI, and BMA schemes. Fig. 4 compares the performance of SOBMC, OBMC, and BMA. Frames 1 and 61 were intraframe-coded using the E-Q coder at bit rates of 0.70 bpp and 0.54 bpp, respectively. For each motion model, the DFDs were coded with a fixed value of the λ parameter of the DFD coder (see Table I) to ensure that the mean total bit rate is very close to the target rate of 16 kbit/s.

The important results of the simulation are summarized in Table I. From Figs. 3(d) and 4(d) and Table I, it can be observed that the switched schemes perform much better than any scheme that does not switch. The switched methods in fact perform almost uniformly better than nonswitched methods, as can be inferred from Figs. 3(d) and 4(d). An examination of Figs. 3(a) and 4(a) reveals that the bit rate for the motion information is higher for the switched schemes than for the nonswitched ones (compare the mean bit rates in Table I). This is due in part to the additional label bits that require bpp, constituting less than 4.65% of the total bit rate. In terms of peak signal-to-noise ratio (PSNR), however, the switched schemes demonstrate a significant gain over the nonswitched ones over most of the sequence. On average, SCGI improves by 1.04 dB over BMA and by 0.67 dB over CGI. Likewise, SOBMC improves by 0.68 dB over BMA and by 0.52 dB over OBMC on average. It is also noteworthy that switching in the motion domain provides noticeable improvements over switching in the intensity domain (by 0.37 dB).

In addition to the improvement in PSNR, the switched schemes also exhibit significant improvement in the visual quality of the decoded sequence. Fig. 5 compares the visual quality of decoded frame 29 produced by the different methods. Observe the prominent blocking artifacts in Fig. 5(b) in the hair above the forehead and near the chin and right eye of Susie.
Although some of these artifacts may also be found in the SCGI and SOBMC reconstructions, they are less prominent and fit much better into their surroundings because of the first-order spatial/motion interpolation effects in the neighboring blocks. Warping artifacts in the hair above the forehead can also be seen in the CGI reconstruction. The SCGI reconstruction captures the eyes quite well, a characteristic that may be attributed to CGI (the CGI reconstruction also captures the eye region quite well). Figs. 3(e) and 4(e) show an important statistic of the switched schemes, namely the fraction of blocks that use a label value equal to unity as a function of frame number. This may be interpreted as a measure of the degree of smoothness of the motion field (Fig. 3(e)) or of the reconstructed frame (Fig. 4(e)).
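The PSNR figures quoted in this subsection can be related to mean-squared error through the usual definition. The sketch below assumes 8-bit imagery with a peak value of 255, which the text does not state explicitly; the function name and the list-of-rows frame layout are also illustrative.

```python
import math

def psnr_db(original, decoded, peak=255.0):
    """PSNR in dB between two equally sized frames given as lists of rows.

    The default peak of 255 assumes 8-bit samples.
    """
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, decoded):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf              # identical frames
    return 10.0 * math.log10(peak * peak / mse)

reference = [[100, 110], [120, 130]]
off_by_one = [[101, 109], [121, 129]]    # every pixel differs by 1, so MSE = 1
value = psnr_db(reference, off_by_one)
```

Because PSNR is logarithmic in the mean-squared error, a gain of 1.04 dB corresponds to an error energy that is lower by a factor of about 10^0.104 ≈ 1.27, that is, roughly 21% less mean-squared error.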

B. Coding of Football at 1.5 Mbit/s

We encode the Football sequence at 1.5 Mbit/s and 30 fps using an 8 × 8 block size. We used ±16 pixels for the motion vector search range and the standard bilinear window for OBMC and SOBMC. All relevant coding results for frames 1-30 of the Football sequence are shown in Table II and Fig. 6. The first frame was intraframe-encoded at 2.15 bpp. In this case, the motion information contributes very little to the total bit rate, so the additional overhead of encoding the labels is negligible. It can be seen from the coding results that the switched schemes perform considerably better than those that do not switch. The PSNR improvements of SCGI are 0.92 dB and 0.99 dB over the BMA and CGI schemes, respectively. In the case of SOBMC, the respective improvements in PSNR over BMA and OBMC are 0.18 dB and 0.91 dB. Once again, SCGI improves uniformly over BMA and CGI, as can be seen from the PSNR plots of Fig. 6(d).

Even at this high bit rate, there is a small improvement in the visual quality of the decoded frames for the switched methods over the nonswitched ones, as can be seen in Fig. 7. Here, blocking artifacts of the BMA reconstruction can be observed in the field background and on the back of player 75. Warping artifacts characteristic of CGI can be observed in the track suit of the player crouched in the bottom-left portion of the figure. Although not shown, smoothing artifacts were observed in the OBMC reconstruction. OBMC ends up destroying fine texture details that can be coded at the higher available bit rate. The performance of SCGI was 0.74 dB higher than that of SOBMC.

C. Coding of Salesman at 128 kbit/s

We also encode the Salesman sequence at 128 kbit/s and 15 fps, using an 8 × 8 block size and a motion vector search range of ±16 pixels. We use the standard bilinear window for OBMC and SOBMC. Table III shows coding results for frames 1-59 in the subsampled (15 fps) Salesman sequence. Frames 1 and 31 were intraframe-encoded using the E-Q coder at a bit rate of 1.61 bpp. As with the Susie sequence, we found that the switched schemes outperform methods that do not switch for most frames of the sequence. Again, SCGI outperforms SOBMC (by 0.57 dB). See Table III for the performance gains of the different methods.

Finally, a comment that applies to all of our experiments: we have observed that the SCGI scheme required, on average, one to three fewer iterations than the CGI scheme to converge to a local minimum (we allowed a maximum of 30 iterations per frame for both CGI and SCGI).
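The iterative procedure of Section II.C, together with the iteration cap mentioned above, can be illustrated on a one-dimensional toy signal. The sketch below is not the authors' implementation: the function names, the integer search grid, the border clamping, and the test signal are assumptions, and the error is evaluated over the whole signal rather than over each control point's region of influence, which gives the same comparisons in exact arithmetic but is slower.

```python
def motion_at(cpmv, labels, block, s):
    """SCGI motion value at pixel s: hold (label 0) or linear blend (label 1)."""
    i = s // block
    if labels[i] == 0:
        return float(cpmv[i])
    t = (s % block) / block
    return (1 - t) * cpmv[i] + t * cpmv[i + 1]

def sample(signal, x):
    """Linearly interpolated sample, clamped at the borders (assumed policy)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(signal) - 1)
    return (1 - (x - x0)) * signal[x0] + (x - x0) * signal[x1]

def prediction_error(cur, prev, cpmv, labels, block):
    """Sum of squared differences between cur and its prediction from prev."""
    return sum((cur[s] - sample(prev, s + motion_at(cpmv, labels, block, s))) ** 2
               for s in range(len(cur)))

def fit_scgi_1d(cur, prev, block, search=2, max_iters=30):
    """Greedy coordinate descent over control-point motion values and labels.

    len(cur) is assumed to be a multiple of `block`.
    """
    segments = len(cur) // block
    cpmv = [0] * (segments + 1)       # motion values start at zero
    labels = [1] * segments           # labels start at one (pure CGI)
    best_err = prediction_error(cur, prev, cpmv, labels, block)
    history = [best_err]
    for _ in range(max_iters):
        changed = False
        for i in range(segments + 1):
            keep = (cpmv[i], labels[i] if i < segments else None)
            choice = keep
            for v in range(-search, search + 1):
                # the last control point owns no segment, hence no label
                for lab in ((0, 1) if i < segments else (None,)):
                    cpmv[i] = v
                    if lab is not None:
                        labels[i] = lab
                    err = prediction_error(cur, prev, cpmv, labels, block)
                    if err < best_err:            # accept strict improvements only
                        best_err, choice = err, (v, lab)
            cpmv[i] = choice[0]                   # commit the best candidate
            if choice[1] is not None:
                labels[i] = choice[1]
            changed = changed or choice != keep
        history.append(best_err)
        if not changed:                           # stationary point reached
            break
    return cpmv, labels, history

prev = [float((x * x) % 11) for x in range(16)]
cur = [sample(prev, s + 1.0) for s in range(16)]  # prev displaced by one pixel
cpmv, labels, history = fit_scgi_1d(cur, prev, block=4)
```

Since a candidate is accepted only when it strictly lowers the error, the sequence in `history` is nonincreasing, which mirrors the convergence argument given in Section II.C; the point reached is a stationary point and not necessarily the global minimum.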

Although there are efficient (but usually suboptimal) methods for estimating motion for TMC and OBMC [3], [15] that are also applicable, with minor modifications, to the CGI and SCGI schemes, speed has not been the principal concern in this work, and we have not attempted to optimize the CGI and SCGI schemes for fast execution.

V. DISCUSSION AND CONCLUSION

Our experimental results have demonstrated clear advantages to switching locally between two different motion-compensation methods. Our switching model, SCGI, is readily seen as a method of spatially adapting the smoothness of the motion field. This spatial adaptation encompasses both CGI and BMA as special instances, obtained when all binary labels are equal to one or zero, respectively. The purpose of such spatial adaptation is to better model motion fields in typical video sequences, especially those that contain various types of motion, such as translational motion, warping, or rotational motion. A better model would hopefully produce smoother DFDs that are easier to encode in a rate-distortion sense; coding performance is intimately related to both the specifics of coding DFDs and the motion model. That SCGI is indeed a good model to work with has been borne out by the results of our simulations. An extension of the switching idea would be to use multivalued labels, in which case it would be possible to produce motion fields that can locally have higher orders of smoothness.

We have also developed a switched OBMC method (SOBMC), which performs spatial adaptation of smoothness in the intensity domain. The constituent models of SOBMC (OBMC and BMA) differ in the values of the weights used for combining the predictions corresponding to individual motion vectors. How different these weights are for neighboring pixels depends on the smoothness of the underlying window function. As with SCGI, the labels in SOBMC allow one to switch between two different window functions that have very different smoothness properties.

In conclusion, local spatial adaptation of the smoothness properties of the motion field and intensity can offer substantial improvements in video coding quality over a variety of bit rates. The use of labels as indicators of local smoothness properties in video provides a simple but effective framework for this adaptive switching.

References

[1] A. M. Tekalp, Digital Video Processing. Upper Saddle River, NJ: Prentice-Hall.
[2] G. J. Sullivan and R. L. Baker, "Motion compensation for video compression using control grid interpolation," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc.
[3] Y. Nakaya and H. Harashima, "An iterative motion estimation method using triangular patches for motion compensation," in Proc. SPIE Conf. on Vis. Commun. and Image Proc., vol. 1605.
[4] Y. Altunbasak, A. M. Tekalp, and G. Bozdagi, "Two-dimensional object-based coding using a content-based mesh and affine motion parameterization," in Proc. IEEE Int. Conf. on Image Proc., vol. II, (Arlington, VA).
[5] M. H. Chan, Y. B. Yu, and A. G. Constantinides, "Variable size block matching motion compensation with applications to video coding," Proc. IEEE, vol. 137, Aug.
[6] "Video coding for low bitrate communication," ITU-T Recommendation H.263 version 2, Feb.
[7] C. L. Huang and C. Y. Hsu, "A new motion compensation method for image sequence coding using hierarchical grid interpolation," IEEE Trans. Circ. and Syst. for Video Technology, vol. 4, Feb.
[8] R. Krishnamurthy, P. Moulin, and J. W. Woods, "Multiscale modeling and estimation of motion fields for video coding," IEEE Trans. Image Proc., vol. 6, Dec.
[9] P. Hsu, K. J. Liu, and T. Chen, "An adaptive interpolation scheme for 2-D mesh motion compensation," in Proc. IEEE Int. Conf. on Image Proc., vol. III, (Santa Barbara, CA).
[10] H. Josawa, K. Kamikura, A. Sagata, H. Kotera, and H. Watanabe, "Two-stage motion compensation using adaptive global MC and local affine MC," IEEE Trans. Circ. and Syst. for Video Technology, vol. 7, Feb.
[11] Z. Sun and A. M. Tekalp, "Trifocal motion modeling for object-based video compression and manipulation," IEEE Trans. Circ. and Syst. for Video Technology, vol. 8, Sept.
[12] P. Ishwar, "On spatial adaptation of smoothness of motion and intensity fields in video coding," MS thesis, ECE dept., Univ. Illinois at Urbana-Champaign, Dec. Postscript file available at: ishwar/publications/msthesis.ps.
[13] H. Watanabe and S. Singhal, "Windowed motion compensation," in Proc. SPIE Conf. on Vis. Commun. and Image Proc., vol. 1605, Nov.
[14] M. T. Orchard and G. J. Sullivan, "Overlapped block motion compensation: An estimation theoretic approach," IEEE Trans. Image Proc., vol. 3, Sept.
[15] R. Rajagopalan, E. Feig, and M. Orchard, "Motion optimization of ordered blocks for overlapped block motion compensation," IEEE Trans. Circ. and Syst. for Video Technology, vol. 8, Apr.
[16] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast Estimation Quantization framework," in Proc. IEEE Data Compression Conference, (Snowbird, UT), Mar.
[17] G. G. Langdon, Jr., "An introduction to arithmetic coding," IBM J. Res. Develop., vol. 28, Mar.

[Figure: an 8 × 8 block B with four control points carrying motion vectors MV(s_1) to MV(s_4), and a pixel position s inside the block.]
Fig. 1. Notation describing grid-based motion compensation.

[Figure: panels for BMA, CGI, and SCGI. Labels shown: (1), (0), (0), (1).]
Fig. 2. One-dimensional version of the interpolation schemes used in BMA, CGI, and SCGI.
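The one-dimensional picture in Fig. 2 can be illustrated with a rough sketch. It assumes that block matching acts as a zero-order hold (one motion value reused across a block), that CGI acts as a first-order hold (linear interpolation between the control points bounding a block), and that a binary label per block selects between the two. The function name, the label convention (1 = first-order hold), and the use of the left control point for the hold are illustrative assumptions, not details taken from the paper.

```python
def interpolate_motion_1d(control_mvs, labels, block_size):
    """Return a per-pixel 1-D motion field (hypothetical helper).

    control_mvs: motion values at the block boundaries
                 (one more entry than there are blocks).
    labels:      one label per block; 1 selects a first-order hold
                 (linear, CGI-like), 0 selects a zero-order hold
                 (constant, block-matching-like). This convention
                 is an assumption made for illustration.
    """
    if len(control_mvs) != len(labels) + 1:
        raise ValueError("need one more control point than blocks")
    field = []
    for b, label in enumerate(labels):
        left, right = control_mvs[b], control_mvs[b + 1]
        for i in range(block_size):
            if label == 1:
                # Linear blend between the two bounding control points.
                t = i / block_size
                field.append((1 - t) * left + t * right)
            else:
                # Hold the left control point's value over the block.
                field.append(float(left))
    return field


control = [0.0, 4.0, 8.0]  # two blocks of four pixels each
constant_field = interpolate_motion_1d(control, [0, 0], 4)
linear_field = interpolate_motion_1d(control, [1, 1], 4)
switched_field = interpolate_motion_1d(control, [1, 0], 4)
```

With these inputs, the all-zero labels give a piecewise-constant field, the all-one labels give a piecewise-linear field, and the mixed labels give a field that is linear over the first block and constant over the second, which is the kind of local switching the labels are meant to allow.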

Fig. 3. Comparative results for Susie (BMA: solid line, CGI: dot-dash line, SCGI: dashed line): (a) total motion rate (kbit/s), (b) compressed DFD bit rate (kbit/s), (c) total bit rate (kbit/s), (d) PSNR (dB) at decoder, and (e) percentage of blocks using first-order hold interpolation.

Fig. 4. Comparative results for Susie (BMA: solid line, OBMC: dot-dash line, SOBMC: dashed line): (a) total motion rate (kbit/s), (b) compressed DFD bit rate (kbit/s), (c) total bit rate (kbit/s), (d) PSNR (dB) at decoder, and (e) percentage of blocks that use a bilinear window in SOBMC.

Fig. 5. Frame 29 of the Susie sequence coded at 16 kbit/s. (a) Original, (b) BMA, (c) CGI, (d) SCGI, (e) OBMC, and (f) SOBMC.

Fig. 6. Comparative results for Football (BMA: solid line, CGI: dot-dash line, SCGI: dashed line): (a) total motion rate (kbit/s), (b) compressed DFD bit rate (kbit/s), (c) total bit rate (kbit/s), (d) PSNR (dB) at decoder, and (e) percentage of blocks using first-order hold.

Fig. 7. Frame 26 (cropped) of the Football sequence coded at 1.5 Mbit/s. (a) Original, (b) BMA, (c) CGI, (d) SCGI.

TABLE I
Video-coding results for 30 frames of the Susie sequence at 16 kbit/s, 7.5 fps.
Columns: Motion model | λ | Mean motion rate (kbit/s) | Mean DFD rate (kbit/s) | Mean total rate (kbit/s) | Mean PSNR (dB)
Rows: BMA, CGI, SCGI, OBMC, SOBMC

TABLE II
Video-coding results for 30 frames of the Football sequence at 1.5 Mbit/s, 30 fps.
Columns: Motion model | λ | Mean motion rate (kbit/s) | Mean DFD rate (kbit/s) | Mean total rate (kbit/s) | Mean PSNR (dB)
Rows: BMA, CGI, SCGI, OBMC, SOBMC

TABLE III
Video-coding results for 30 frames of the Salesman sequence at 128 kbit/s, 15 fps.
Columns: Motion model | λ | Mean motion rate (kbit/s) | Mean DFD rate (kbit/s) | Mean total rate (kbit/s) | Mean PSNR (dB)
Rows: BMA, CGI, SCGI, OBMC, SOBMC
