Czerepinski, P. J., & Bull, D. R. (1996). Coder-oriented matching criteria for motion estimation. In Proc. 1st Intl Workshop on Wireless Image and Video Communications (pp. 38-42). Institute of Electrical and Electronics Engineers (IEEE). DOI: 10.1109/WIVC.1996.624640

Peer reviewed version.

General rights: This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/pure/about/ebrterms
IEEE COMSOC EURASIP First International Workshop on Wireless Image/Video Communications, September 1996

CODER-ORIENTED MATCHING CRITERIA FOR MOTION ESTIMATION

P. J. Czerepinski and D. R. Bull
Image Communications Group, Centre for Communications Research, University of Bristol, Merchant Venturers Bldg., Woodland Rd., Bristol BS8 1UB, UK
email: dave.bull@bristol.ac.uk, p.j.czerepinski@bristol.ac.uk
Tel. +44 117 9545195; Fax: +44 117 9255265

Abstract

Classical matching criteria for motion estimation determine match quality by operating in the spatial domain. The resulting DFD signal is, however, compressed by means of frequency transformations, and any optimal matching criterion should estimate the cost of coding the composite bit stream. Such criteria are presented in this paper and shown to offer some improvement in the PSNR of the reconstructed sequence.

1 INTRODUCTION

In the context of video coding, motion estimation and compensation [1][2][3][4] represent a well established means of removing the temporal redundancy from a signal. Motion information is generally extracted by comparing the contents of adjacent video frames. For simplicity, a block-based translational motion model is often assumed, and motion vectors are found by determining a match between corresponding regions in the current and previous frames. Classical matching criteria such as MSE or MAD minimise a measure of the prediction error, with the aim of producing an estimate which is visually pleasing. What pleases the eye, however, need not please the coder, and suboptimal (in the MSE sense) predictions may exist which produce a differential signal more amenable to coding.

A simple example involving a transform coder is shown in figure 1. The current frame (fig. 1b) is predicted using the previous frame (fig. 1a). In the MSE sense, the previous frame's central block is the best prediction for the current frame's all-black block.
Due to sharp edges in the resulting displaced frame difference (DFD) signal (fig. 1c), ringing artefacts are bound to appear in the reconstruction if the transform coefficients are quantised. On the other hand, choosing any of the all-white blocks as the prediction would produce an all-black differential signal (fig. 1d), which could be coded with the DC coefficient alone.

Figure 1 a) b) c) d): The current frame (b) is predicted using the previous frame (a). Displaced frame differences produced using different matching criteria: (c), (d).

For video coding, especially at lower bit rates, it is important that the DFD signal can be coded efficiently. In this paper we discuss the features that make a good matching criterion and reflect on the relevance of the widely used MSE and MAD measures. As an alternative to these, we investigate a coder-oriented approach to the problem.

2 WHAT IS AN IDEAL MATCHING CRITERION?

A block diagram of a typical block discrete cosine transform (DCT) video coder is shown in figure 2.

Authorized licensed use limited to: UNIVERSITY OF BRISTOL. Downloaded on March 9, 2009 at 10:11 from IEEE Xplore. Restrictions apply.
For a predetermined quantiser and entropy coding algorithm, motion estimation with its associated matching criterion remains the only degree of freedom in the compression process. The quality of the reconstructed image will depend entirely on the content of the displaced frame difference signal, since it affects the quantisation (some signals produce smaller quantisation errors than others), the residual entropy and the motion field entropy. An ideal matching criterion should therefore satisfy the following, sometimes mutually exclusive, conditions:

- good match: the discrepancy between the original frame and its prediction should be minimised, i.e. blocks that minimise the MAD or MSE should be chosen;
- good compression of the residual signal: the entropy of the DCT-coded displaced frame difference should be as low as possible;
- good reconstruction quality (for a fixed compression ratio): the distortion of the reconstructed signal should be minimised, i.e. blocks that produce the smallest quantisation error should be chosen;
- good compression of the motion field: the motion field should be as uniform as possible to facilitate efficient coding.

Figure 2: Coder block diagram. DCT: discrete cosine transform; Q: quantiser; IQ: inverse quantiser; IDCT: inverse discrete cosine transform; FM: frame memory; VLC: variable length coding; MEMC: motion estimation and compensation.

Of the above list, the first condition (minimum MAD or MSE) was found by the authors to be of lowest relevance to the compression efficiency: it estimates match quality in the spatial domain, whereas the quantisation and entropy coding processes operate on the transform domain data. The widespread application of these criteria can, however, be justified by their simplicity. We conclude that the efficiency of the motion estimation process is a function of the quantisation error $Q_e$ and the bit rates $B_r$ and $B_m$ of the DCT-coded residual signal and the motion field respectively.
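The gap between spatial-domain match quality and transform-domain coding cost can be illustrated with a small sketch. It is not the authors' codec: the two 8x8 difference blocks, their pixel values and the uniform quantiser step of 16 are assumptions chosen only to mimic the situation of figure 1, where a block with a sharp edge has the lower MSE but a flat block needs fewer coefficients.

```python
import math

N = 8  # DCT sub-block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimised form)."""
    def alpha(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[v][u] = alpha(u) * alpha(v) * total
    return coeffs


def mse(block):
    """Mean squared value of a difference block (spatial-domain criterion)."""
    return sum(p * p for row in block for p in row) / (N * N)


def nonzero_quantised(block, step=16):
    """Count DCT coefficients that survive a uniform quantiser (assumed step)."""
    return sum(1 for row in dct2(block) for c in row if round(c / step) != 0)


# Hypothetical difference blocks: a partial mismatch with a sharp vertical edge,
# and a uniform difference such as an all-white prediction of an all-black block.
edge_dfd = [[0] * 6 + [-100] * 2 for _ in range(N)]
flat_dfd = [[-255] * N for _ in range(N)]

for name, dfd in (("edge", edge_dfd), ("flat", flat_dfd)):
    print(name, "MSE:", mse(dfd), "non-zero coefficients:", nonzero_quantised(dfd))
```

In this toy case the edge block wins on MSE while the flat block is represented by the DC coefficient alone, which is the kind of disagreement a coder-oriented criterion is meant to detect.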
For optimum performance, $Q_e$, $B_r$ and $B_m$ must be taken into account in the design of the matching criterion.

3 MATCHING CRITERION DESIGN

The following matching criteria have been proposed for testing:

(1) $D_1 = \sum_i |\hat{c}_i|$,

i.e. minimising the sum of the magnitudes of the quantised DCT coefficients $\hat{c}_i$ of the displaced frame difference signal.

(2) $D_2 = B_r + B_m$,

i.e. minimising the bit rate required to code the residual signal ($B_r$) and the motion vectors ($B_m$). Conditional statistical models are used when coding both. In the case of the DCT-coded DFD, coefficient values are conditioned on the coefficients' positions in an 8x8 block, whereas the motion vectors are coded with respect to the mean of their three causal half-plane neighbours.
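A minimal sketch of how criteria (1) and (2) might be evaluated for one candidate is given below. The paper does not specify its statistical models, so the two-sided geometric distribution, its position-dependent parameter, the table sizes and all function names are assumptions made for illustration; only the ideas of conditioning coefficient cost on position and of predicting a motion vector from the mean of three causal neighbours come from the text.

```python
import math


def code_length_table(r, max_mag):
    """Bits for magnitudes 0..max_mag under P(v) = (1 - r) / (1 + r) * r**abs(v).

    This two-sided geometric model is a stand-in for a trained model.
    """
    norm = (1.0 - r) / (1.0 + r)
    return [-math.log2(norm * r ** m) for m in range(max_mag + 1)]


MAX_COEFF = 255
MAX_MV = 64
# Assumed position-conditioned model: one table per zigzag position, with
# later (higher-frequency) positions favouring values closer to zero.
COEFF_TABLES = [code_length_table(0.9 - 0.8 * p / 63, MAX_COEFF) for p in range(64)]
MV_TABLE = code_length_table(0.6, MAX_MV)


def criterion_1(qcoeffs):
    """Sum of magnitudes of the quantised DCT coefficients."""
    return sum(abs(c) for c in qcoeffs)


def residual_bits(qcoeffs):
    """Estimated residual bit rate; qcoeffs are in zigzag order."""
    return sum(COEFF_TABLES[p][min(abs(c), MAX_COEFF)]
               for p, c in enumerate(qcoeffs))


def mv_bits(mv, neighbours):
    """Estimated motion vector bit rate, coded relative to the neighbour mean."""
    pred_x = sum(n[0] for n in neighbours) / len(neighbours)
    pred_y = sum(n[1] for n in neighbours) / len(neighbours)
    dx = min(abs(round(mv[0] - pred_x)), MAX_MV)
    dy = min(abs(round(mv[1] - pred_y)), MAX_MV)
    return MV_TABLE[dx] + MV_TABLE[dy]


def criterion_2(qcoeffs, mv, neighbours):
    """Total estimated bits for the residual and the motion vector."""
    return residual_bits(qcoeffs) + mv_bits(mv, neighbours)


neighbours = [(2, 2), (2, 2), (2, 2)]  # three causal neighbours (hypothetical)
block = [12, -3, 1] + [0] * 61         # quantised coefficients (hypothetical)
print(criterion_1(block), criterion_2(block, (2, 2), neighbours))
```

Because every cost is a table lookup, a candidate can be scored during the search without running the entropy coder itself.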
Figure 3: Mean square error of the reconstruction with respect to the bit rate [bits/macroblock]; Table Tennis sequence.

Next, consider curve A in figure 3. This illustrates the relationship between the average number of bits used to code a displaced frame difference macroblock (16x16) and the resulting mean squared error for the Table Tennis sequence. A is simply a coder performance curve where, for the purposes of this article, the axes have been labelled bits/macroblock and MSE rather than the usual bits/second and PSNR. The clustered (B, MSE) vectors correspond to the number of bits required to encode a single frame difference macroblock and its resulting MSE. Every (B, MSE) vector is associated with a candidate motion vector evaluated during an example search. The goal of this approach is to choose the translation that offers the best B-MSE trade-off. In order to accomplish this, bit rate and distortion must be weighted according to their contribution to reconstruction quality. These weights ($k$ and $l$ respectively) can be derived from the averaged coder performance curve, A. Their ratio at a point $(B_0, MSE_0) \in A$ is equal to the differential

$$\frac{k}{l} = \frac{\Delta MSE_0}{\Delta B_0}.$$

Since this value varies along the curve, the investigation is restricted to the bit rate interval 30-130 bits/macroblock, over which $k/l$ remains approximately constant (≈4). In this context, two additional matching criteria are proposed:

(3) $D_3 = k(B_r + B_m) + l\,Q_e$,

i.e. minimising a weighted sum of the bit rates required to code the residual signal and motion vectors ($B_r$ and $B_m$) and the quantising distortion ($Q_e$). The weighting of the components corresponds to their contribution to objective reconstruction quality. Distortion is measured in terms of the mean square error.

(4) $MV_0 = \arg\min_{MV} \lVert (k \times B,\; l \times MSE) \rVert$,

i.e. selecting the motion vector $MV_0$ that minimises the norm of the corresponding $(k \times B,\; l \times MSE)$ vector.
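The two weighted criteria can be sketched as follows. The candidate motion vectors and their (bits, MSE) pairs are invented for illustration, only the ratio k/l of about 4 is taken from the text, and the norm in criterion (4) is assumed here to be Euclidean, which the paper does not state.

```python
import math

# Only the ratio k/l (about 4) comes from the text; the scale is arbitrary.
K, L = 4.0, 1.0


def d3(bits, mse):
    """Criterion (3): weighted sum of total bits and quantisation distortion."""
    return K * bits + L * mse


def d4(bits, mse):
    """Criterion (4): norm of the weighted (bits, MSE) vector (Euclidean assumed)."""
    return math.hypot(K * bits, L * mse)


def mse_only(bits, mse):
    """Classical criterion for comparison: ignores the bit cost."""
    return mse


# Hypothetical candidates: motion vector -> (bits per macroblock, MSE).
candidates = {
    (0, 0): (120.0, 20.0),
    (1, 0): (60.0, 90.0),
    (2, -1): (45.0, 170.0),
}


def best(cost):
    """Return the candidate motion vector with the lowest cost."""
    return min(candidates, key=lambda mv: cost(*candidates[mv]))


for name, cost in (("MSE", mse_only), ("D3", d3), ("D4", d4)):
    print(name, "selects", best(cost))
```

With these made-up numbers each criterion selects a different vector, which shows that a weighted sum and a norm rank the same candidates differently even with identical weights.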
4 CODING

The criteria described in section 3 have been applied to the Table Tennis and Akiyo sequences (240x352x8bpp, 30fps). The codec employed half-pixel block matching motion estimation, with 16x16 macroblocks and 8x8 DCT sub-blocks. Full search has been performed, with the search area restricted to ±15 pixels (Table Tennis) and ±6 pixels (Akiyo).

DCT Coefficient Coding

Match evaluation requires that the DCT is performed on DFD blocks during the search. Additionally, criteria (2), (3) and (4) require an estimate of the bit rate associated with a given coefficient bit stream. After quantisation, a zigzag scan is performed on the DCT sub-blocks. Our entropy coder uses a fixed conditional model, where the states are identified with the coefficients' positions within the sub-block. In order to speed up the bit rate calculation for motion estimation purposes, the model is converted into a lookup table. This does not affect the precision, since arithmetic coding is eventually employed to compress the quantised coefficients, and this method is known to introduce very little overhead and to operate in close accordance with the model.

Motion Vector Coding

Motion vectors are coded relative to a prediction formed as the mean of their three causal half-plane neighbours. As is the case with the DCT coefficients, lookup tables are used during motion estimation and arithmetic coding is finally applied to the motion field.

5 RESULTS

Results achieved by applying the proposed matching criteria to the two test sequences are shown in figure 4. In the case of Table Tennis, criteria (1), (2) and (4) yield slightly better performance than MSE alone at higher bit rates but deteriorate at lower bit rates. Criterion (3) is superior to MSE at all bit rates considered and has therefore been selected for further evaluation using Akiyo.
The coding gain, however, does not exceed 0.2 dB in the case of Table Tennis and is smaller in the case of Akiyo. This is due to the low motion content of the latter, which gives the criterion fewer opportunities to demonstrate its capabilities.

6 CONCLUSIONS

In this article, a number of alternative matching criteria for motion compensated DCT coding have been presented. Their performance has been compared to that of the conventional MSE criterion for both high and low activity sequences. These experiments demonstrate the performance bounds of block matching motion estimation and show that only modest quality improvements are possible with more complex criteria. In addition, the computational cost required to achieve these bounds is unlikely to be justified by the quality improvements.
Figure 4a: Performance of different matching criteria against bit rate [kbps], Table Tennis.

Figure 4b: Performance of different matching criteria against bit rate [kbps], Akiyo.

Acknowledgements

The authors wish to acknowledge the support of Sony Broadcast and Professional Europe, the CVCP and the University of Bristol for this work.

References

[1] Le Gall D., MPEG: A Video Compression Standard for Multimedia Applications, Communications of the ACM, vol. 34, no. 4, Apr. 91, pp. 47-58
[2] Musmann H.G., Pirsch P., Grallert H.J., Advances in Picture Coding, Proc. IEEE, vol. 73, no. 4, Apr. 85, pp. 523-548
[3] Netravali A.N., Robbins J.D., Motion Compensated Television Coding: Part I, Bell Syst. Tech. J., vol. 58, March 79, pp. 631-70
[4] Wang Q., Clarke R.J., Motion Estimation and Compensation for Image Sequence Coding, Signal Processing: Image Communication, 4 (1992), pp. 161-174
[5] Witten I.H., Neal R.M., Cleary J.G., Arithmetic Coding for Data Compression, Communications of the ACM, vol. 30, no. 6, June 87, pp. 520-540