Fast Block-Based True Motion Estimation Using Distance Dependent Thresholds Golam Sorwar

Similar documents
IEEE Proof. MOTION estimation (ME) plays a vital role in video

Open Research Online The Open University s repository of research publications and other research outputs

Prediction-based Directional Search for Fast Block-Matching Motion Estimation

Module 7 VIDEO CODING AND MOTION ESTIMATION

Enhanced Hexagon with Early Termination Algorithm for Motion estimation

Pattern based Residual Coding for H.264 Encoder *

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

An Adaptive Cross Search Algorithm for Block Matching Motion Estimation

Motion Vector Estimation Search using Hexagon-Diamond Pattern for Video Sequences, Grid Point and Block-Based

A New Fast Motion Estimation Algorithm. - Literature Survey. Instructor: Brian L. Evans. Authors: Yue Chen, Yu Wang, Ying Lu.

Redundancy and Correlation: Temporal

AN ADJUSTABLE BLOCK MOTION ESTIMATION ALGORITHM BY MULTIPATH SEARCH

Motion Estimation for Video Coding Standards

Efficient Block Matching Algorithm for Motion Estimation

Transactions Briefs. An Adaptive Search Length Algorithm for Block Matching Motion Estimation

Semi-Hierarchical Based Motion Estimation Algorithm for the Dirac Video Encoder

Module 7 VIDEO CODING AND MOTION ESTIMATION

Enhanced Hexagonal Search for Fast Block Motion Estimation

Fast Block-Matching Motion Estimation Using Modified Diamond Search Algorithm

International Journal of Scientific & Engineering Research, Volume 5, Issue 7, July ISSN

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation

Fobe Algorithm for Video Processing Applications

BI-DIRECTIONAL AFFINE MOTION COMPENSATION USING A CONTENT-BASED, NON-CONNECTED, TRIANGULAR MESH

Predictive Motion Vector Field Adaptive Search Technique (PMVFAST) - Enhancing Block Based Motion Estimation

A Novel Hexagonal Search Algorithm for Fast Block Matching Motion Estimation

A Study on Block Matching Algorithms for Motion Estimation

A Novel Statistical Distortion Model Based on Mixed Laplacian and Uniform Distribution of Mpeg-4 FGS

Tunnelling-based Search Algorithm for Block-Matching Motion Estimation

Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform

Video Compression System for Online Usage Using DCT 1 S.B. Midhun Kumar, 2 Mr.A.Jayakumar M.E 1 UG Student, 2 Associate Professor

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

ELEC Dr Reji Mathew Electrical Engineering UNSW

Efficient Method for Half-Pixel Block Motion Estimation Using Block Differentials

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

AN AUTOMATED ALGORITHM FOR APPROXIMATION OF TEMPORAL VIDEO DATA USING LINEAR BEZIER FITTING

Directional Cross Diamond Search Algorithm for Fast Block Motion Estimation

An Efficient Mode Selection Algorithm for H.264

Simplified Block Matching Algorithm for Fast Motion Estimation in Video Compression

A Comparative Approach for Block Matching Algorithms used for Motion Estimation

Joint Adaptive Block Matching Search (JABMS) Algorithm

DATA hiding [1] and watermarking in digital images

COMPARATIVE ANALYSIS OF BLOCK MATCHING ALGORITHMS FOR VIDEO COMPRESSION

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Using temporal seeding to constrain the disparity search range in stereo matching

Logical Templates for Feature Extraction in Fingerprint Images

Low Complexity Block Motion Estimation Using Morphological-based Feature Extraction and XOR Operations

Motion Estimation. There are three main types (or applications) of motion estimation:

Low Discrepancy Sequences Applied in Block Matching Motion Estimation Algorithms

CS 565 Computer Vision. Nazar Khan PUCIT Lectures 15 and 16: Optic Flow

Block Matching 11.1 NONOVERLAPPED, EQUALLY SPACED, FIXED SIZE, SMALL RECTANGULAR BLOCK MATCHING

DIGITAL video compression is essential for the reduction. Two-Bit Transform for Binary Block Motion Estimation

International Journal of Advance Engineering and Research Development

Dense Motion Field Reduction for Motion Estimation

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

An Investigation of Block Searching Algorithms for Video Frame Codecs

Video Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Comparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV

Compression of Stereo Images using a Huffman-Zip Scheme

Motion Detection and Projection Based Block Motion Estimation Using the Radon Transform for Video Coding

A Sum Square Error based Successive Elimination Algorithm for Block Motion Estimation

Module 7 VIDEO CODING AND MOTION ESTIMATION

By Charvi Dhoot*, Vincent J. Mooney &,

Classification-Based Adaptive Search Algorithm for Video Motion Estimation

Displacement estimation

Video Image Sequence Super Resolution using Optical Flow Motion Estimation

A New Configuration of Adaptive Arithmetic Model for Video Coding with 3D SPIHT

International Journal of Advance Engineering and Research Development

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH

Texture Segmentation by Windowed Projection

Fast Motion Estimation for Shape Coding in MPEG-4

A Miniature-Based Image Retrieval System

Optical Flow-Based Motion Estimation. Thanks to Steve Seitz, Simon Baker, Takeo Kanade, and anyone else who helped develop these slides.

VIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING

A MOTION MODEL BASED VIDEO STABILISATION ALGORITHM

Fast Optical Flow Using Cross Correlation and Shortest-Path Techniques

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Mesh Based Interpolative Coding (MBIC)

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

Motion Estimation using Block Overlap Minimization

Motion estimation for video compression

Local Image Registration: An Adaptive Filtering Framework

IN RECENT years, multimedia application has become more

Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform

Segmentation and Tracking of Partial Planar Templates

AUTOMATIC OBJECT DETECTION IN VIDEO SEQUENCES WITH CAMERA IN MOTION. Ninad Thakoor, Jean Gao and Huamei Chen

The Template Update Problem

Representing Moving Images with Layers. J. Y. Wang and E. H. Adelson MIT Media Lab

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation

FRAME-RATE UP-CONVERSION USING TRANSMITTED TRUE MOTION VECTORS

Feature Tracking and Optical Flow

EE795: Computer Vision and Intelligent Systems

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

Overview: motion-compensated coding

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain

Locating 1-D Bar Codes in DCT-Domain

[2006] IEEE. Reprinted, with permission, from [Wenjing Jia, Huaifeng Zhang, Xiangjian He, and Qiang Wu, A Comparison on Histogram Based Image

A Robust Wipe Detection Algorithm

An Acceleration Scheme to The Local Directional Pattern

Transcription:

Fast Block-Based True Motion Estimation Using Distance Dependent Thresholds Golam Sorwar School of Multimedia and Information Technology Southern Cross University Coffs Harbour, NSW 2457, Australia Email: gsorwar@scu.edu.au Fax: +61-2-6659-3612 Manzur Murshed and Laurence Dooley Gippsland School of Computing and Information Technology Monash University Churchill, Vic 3842, Australia Email: {Manzur.Murshed,Laurence.Dooley}@infotech.monash.edu.au Fax: +61-3-9902-6879 A fast motion estimation algorithm, called distance-dependent thresholding search (DTS), is presented for block-based true motion estimation applications, and introduces the novel concept of variable distance dependent thresholds. The performance of the DTS algorithm is analysed and quantitatively compared with both the traditional and exhaustive full-search (FS) technique, and the computationally faster, non-exhaustive three-step-search (TSS) algorithm. Experimental results show that by applying an appropriate threshold function, the DTS algorithm not only matches the speed of the TSS algorithm, but both retains a block distortion error comparable to the global minimum produced by the FS algorithm, and avoids the problem of identifying a large number of spurious motion vectors in the search process. ACM Classification: I.4 (Image processing and computer vision) 1. INTRODUCTION With advances in both communication and multimedia technologies, there is a critical need to have visual management systems or user-friendly tools to assist in information retrieval from digital multimedia databases. Amongst the salient features that could be used to define visual content for example, are pixel intensity, colour, texture, shape and motion. Of these, motion is the most obvious and effective feature to provide global and local understanding as well as describing the dynamic content within a video sequence. The extraction of motion parameters from video sequences has therefore, been one of the key elements in a range of applications from computer vision through to Copyright 2004, Australian Computer Society Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that the JRPIT copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Australian Computer Society Inc. Manuscript received: 3 June 2003 Communicating Editor: Robyn Owens 157

popular video compression standards such as the Motion Picture Expert Group (MPEG-1/2/4) family. Motion in video sequences may generally be categorised by: camera movement, the movement of objects within a frame, and movement of both camera and objects. Many different motion estimation algorithms have been proposed, including pel-recursive (Robbins and Netravali, 1983; Walker and Rao, 1984), block-matching (Jain and Jain, 1981), and the optical flow-based method (Horn and Schunck, 1981; Lucas and Kanade, 1981). The block-matching algorithm (BMA) has proved to be very popular because of its simplicity, robustness, and ease of implementation. The algorithm, which estimates motion on a block-by-block basis, has been widely exploited in video coding standards such as MPEG-1/2 and H.261/263. One important feature (Dufaux and Moscheni, 1995) of the BMA is that it exhibits superior performance for larger-sized pixel block displacements. The exhaustive BMA, known as the full search (FS) algorithm, searches each candidate block for the closest match within the entire search region to minimise the block-distortion measure (BDM). The BDM of image blocks may be measured using various criteria such as, the mean absolute error (MAE), the mean square error (MSE), and the matching pel-count (MPC). Since the FS algorithm exhaustively searches for a global minimum block-difference error for each candidate block, it generally provides the lowest possible distortion error of any BMA. The algorithm however, suffers two major drawbacks. Its exhaustive nature inevitably means it is very computationally expensive and in addition, the algorithm tends to capture many false motion vectors even when there is no object motion within the search region. This is due to the fact that the distortion of an object in a video frame is relates to its velocity as well as the zoom factor of the camera and therefore, as the length of a motion vector grows so does the block difference distortion error. Although this observation has very little impact when the algorithm is used for video coding, severe artifacts can arise when the algorithm is applied to estimate the true motion vectors, where both object and/or camera motion is present. A number of fast non-exhaustive block matching approaches have been proposed including the three-step search algorithm (TSS) by Koga et al (1981), the new three-step search algorithm (NTSS) by Li et al (1994), the 2D-logarithmic search algorithm (2DLOG) by Jain and Jain (1981), the four-step search algorithm (4SS) by Po and Ma (1996), and the cross-search algorithm by Ghanbari (1990). Of these, TSS has been recommended by both the Reference Model 8 (RM8) of CCITT and Simulation Model 3 (SM3) of MPEG, because of its simplicity, regularity and performance (Po and Ma, 1996; Kim and Choi, 1998). It is still today considered one of the best algorithms against which to compare the performance of new fast algorithms (Zhou and Chen, 2001; Lai and Wong, 2002; Nam et al, 2000). All the aforementioned fast algorithms have been based upon the assumption that the BDM increases as the checking points move away from the global minima. According to Chow and Liou (1993) however, this assumption does not hold true for real world video sequences. Any directional search algorithm can, therefore, be ambiguous and converge to one of the local minima. Moreover, none of the above fast algorithms address the key issue of avoiding the capture of significant numbers of spurious motion vectors in the search process (Dufaux and Moscheni, 1995). This paper directly addresses these issues by introducing a new distance dependent thresholding search algorithm (DTS) which not only avoid picking a large number of false motion vectors, but also simultaneously exhibits the characteristics of a fast search and low BDM. It is our assumption that true motion, caused by moving objects, will produce an error surface that would stretch the search even with thresholding. 158

The paper is structured as follows. Section 2 explains the FS and the TSS algorithms, while the novel distance dependent threshold search (DTS) algorithm using both linear and exponential thresholding functions, is described in Section 3. Experimental results to verify the performance of the DTS algorithm in terms of both its search speed and corresponding BDM error measure are presented in Section 4, which also discusses the selection of the threshold function and related parameters, as well as explaining how the DTS algorithm avoids a large number of spurious motion vectors in the search process. Conclusions are provided in Section 5. 2. BLOCK MATCHING ALGORITHMS Block-based motion estimation algorithm assumes that objects are rigid, move in a translational movement for at least a few frames and occlusion of one object by another, and uncovered background, are neglected. In a block-matching algorithm, the current frame is divided into equisized non-overlapping small rectangular blocks, of M N pixels, as shown in Figure 1. Throughout the paper, the pixels in a frame are numbered using the Cartesian coordinate system with the origin being in the upper-left corner. B n (k,l) denotes the M N sized block containing all the pixels p x, y of frame number n, where k x < k + N and l y < l+m. For each block of the current frame n, a motion vector is obtained by finding a suitably matched block, within the search window, of the next frame n+1. For example, in Figure 1, block B n+1 (k + u, l + v) of the next frame is suitably matched with the block B n (k,l) of the current frame, so that the motion vector for the block B n (k,l) is computed as (u, v). 2.1 The full search (FS) algorithm In selecting a suitably matched block, the FS algorithm searches the entire search region for a block such that the BDM is a global minimum. If more than one block generates a minimum BDM, the FS algorithm selects the block whose motion vector has the smallest magnitude, in order to exploit the centre-biased motion-vector distribution characteristics of a real-world video sequence (Li et al, 1994; Po and Ma, 1996). To achieve this, checking points are used in a spiral trajectory starting at the centre of the search region. If the maximum displacement of a motion vector in both the horizontal and vertical directions is ± d pixels, the total number of search points used to locate the motion vector for each block can be as high as (2d + 1) 2. The spiral trajectory of the checking points used by the FS algorithm with the maximum displacement, d = 7, is shown in Figure 2. Figure 1: Frame-block coordinate system 159

Figure 2: The spiral trajectory of the checking points in the FS algorithm 2.2 The three step search (TSS) algorithm The TSS algorithm is based on a coarse-to-fine approach with logarithmically decreasing step sizes as shown in the example of Figure 3, which has a maximum displacement d = 7. The initial step size is d/2, where d is the maximum motion displacement. At each step, nine checking points are matched and the point with the minimum BDM is chosen as the starting centre of the next step. It is straightforward to prove that the total number of checking points used with maximum motion d is 1+ 8 log 2 (d + 1). 3. THE DISTANCE DEPENDENT THRESHOLDING SEARCH (DTS) ALGORITHM In the FS algorithm, the suitability of a block match is measured based on the optimal (minimum) BDM. FS works well when there is no distortion, but as alluded in Section 1, the level of distortion in any video frame increases with the velocity of the moving objects and/or the zoom factor used First step Second step Third step Figure 3: Three step search path 160

by the camera. Locating a block with the minimum difference, but with a motion vector of high magnitude, is both ineffectual in the prevailing distorted search space, and may also lead to many false motion vectors being erroneously selected. The exhaustive FS algorithm therefore, becomes increasingly inefficient as the spiral trajectory (search pattern) expands. The basis for the solution proposed in this paper is that the suitability measure of the FS algorithm is relaxed from the optimal criterion as the spiral search trajectory moves from the centre, and becomes distance-dependent based thereby exploiting the aforementioned observation, i.e. a Distance-dependent Thresholding Search (DTS) algorithm. Definition 1: Search Squares SS i The search space with maximum displacement d, centred at pixel p cx,cy, can be divided into d+1 mutually exclusive concentric search squares SS i, for all 0 i d, such that a checking point at pixel p x,y is SS k if and only if max( x cx, y cy )=k, for all d+cx x d+cx and d+cy y d+cy. The checking points used in the first three search squares SS 0, SS 1 and SS 2 are clearly shown in Figure 4. From this figure it can be easily identified that (1) Checking points in SS 0 Checking points in SS 1 Checking points in SS 2 Figure 4: DTS search squares 3.1 The Formal DTS Algorithm Like all block-base motion estimation search techniques, the DTS algorithm starts at the centre of the search space. The search then progresses outwards by using search squares, SS i, in order while monitoring the current minimum MAE. A parametric thresholding function, Threshold(i), is used to determine the various thresholds to be used in the search involving each SS i. After searching each SS i, the current minimum MAE is compared against the threshold value of that specific search square and the search is terminated if this MAE value is not higher than the threshold value. The DTS algorithm is formally presented in Figure 5. 161

Precondition: Pixel p cx,cy is the centre of the search space with maximum displacement d. Initialisation: MinMAE = MAE( cx,cy )(0,0) MV = (0,0) Body: If MinMAE > 0 Then For i = 1, 2,, d For each checking point p x,y in SS i e = MAE (cx,cy) (x-cx,y-cy) If e < MinMAE Then MinMAE = e MV = (x-cx,y-cy) If MinMAE Threshold(i) Then Stop Postcondition: MV contains the motion vector and MinMAE contains the distortion error of the respective block. Figure 5: The DTS algorithm Threshold(i) in the DTS algorithm is a monotonically increasing function with respect to i, which can have a linear, exponential, or any other complex analytic form. The following two variable distance dependent threshold functions are applied in this paper. 3.2 Linear thresholding (LT) Let the centre of the search region be at pixel p cx,cy, which also defines the starting point of the search. In LT, the search will terminate at search square SS i when: Assuming b-bit gray level intensity, the maximum value of the MAE will be 2 b 1, since the pixel intensity is measured using 2 b levels with values 0,1,,2 b 1. As (d,d) is the longest possible motion vector within the search region, an upper bound may be set for constant C L such that: (2) (3) 3.3 Exponential thresholding (ET) In ET, the search terminates at search square SS i when Using a similar argument to that in Section 3.1, an upper limit can be set for the constant C E : (4) (5) 162

It is interesting to observe that by setting either C L = 0 or C E = 0 in (2) or (4) respectively, it transforms the DTS algorithm into the exhaustive FS algorithm. It is also clear that the search time for the DTS algorithm decreases as the value of the constant (C L or C E ), used in the respective threshold function is increased. 4. EXPERIMENTAL RESULTS The performance of the DTS, TSS, and FS algorithms was evaluated using the luminance (Ycomponent) signal of a large number of standard and non-standard video sequences such as: Football (320 240 pixels), Flower Garden (352 240 pixels), Miss America (176 144 pixels), Table Tennis (352 240 pixels), Foreman (176 144 pixels), Rocket (176 144 pixels), Ballet (352 240 pixels), and Son (360 288 pixels). The Table Tennis, Football, Rocket, Ballet, Foreman, and Son sequences comprise various kinds of motions, including translation, zooming, and panning, the Flower Garden sequence mainly consists of high portions of fast panning motion, while Miss America is a very low motion video conferencing sequence. All sequences are uniformly quantized to an 8-bit gray level intensity. In the experiments, the block size dimensions were M = N = 16 and d = ±7, i.e., each frame was divided into 16 16 pixel blocks and within each frame, a maximum of (2d+1) 2 = 225 checking points were used. Both linear and exponential threshold functions were used to assess the performance of the DTS algorithm. In the following results, the linear threshold function with constant C L is denoted as LT(C L ) and the exponential threshold function with constant C E is denoted as ET(C E ). To quantitatively evaluate the performance of the DTS algorithm for both the LT and ET functions, the following three specific measures were identified: The average MSE between the reconstructed and the corresponding original frames. The average number of search points. The average percentage of true object motion vector captured. 4.1 MSE results The performance of the DTS algorithm using both the LT and ET functions in terms of the average MSE between the estimated and original frames is shown in Table 1 and Table 2 for the first 80 frames of the Table Tennis, Flower Garden, Miss America and Football sequences. It can be observed that all DTS algorithm variants compared very favorably with the FS algorithm for all test video sequences except Football, even when the search points was comparable with the TSS algorithm, as for example in the Tennis LT(8) and Miss America LT(2) cases. It can be also observed that for the Miss America sequence, with LT(4), the speed improvement factor was almost 20 times faster whereas the average MSE was very similar (within 0.24%) to the optimal average MSE of the FS algorithm. In contrast, the performance of the TSS algorithm was significantly inferior, especially in respect of the motion involved in the Tennis video sequence, as being a directional search algorithm, TSS tends to converge to one of the local minima as explained in the introduction. Table 2 illustrates that the error performance of the DTS algorithm was not so satisfactory for the complex motion video sequence, Football. In Figure 6, we plotted the MSE performances of the DTS algorithm against the FS and TSS algorithms for Flower Garden and Table Tennis sequences. For the sake of clarity in plotting, we only considered the threshold function for the DTS algorithm that uses search points comparable to the TSS algorithm. 4.2 Search points results The performance of the FS, TSS, and DTS algorithms in terms of the average search points to estimate motion vectors is presented in Table 1 and Table 2. Again it is clear that in the Tennis case, 163

DTS with LT(16) is more than 15 times faster than the FS algorithm while the MSE is comparable to the TSS algorithm (9 times faster). For all values beyond LT(6) in Table 2, for the Miss America sequence, while the search speed increased, the quality performance in terms of average MSE remained constant, indicating that although the speed-up factor was high compared to TSS, it still provided better prediction quality. For the Football sequence however, the search speed of the DTS algorithm was not so satisfactory compared to TSS algorithm. The results also proved that by choosing a suitable constant for the selected threshold function, the average search points required by the DTS algorithm could be considerably less, while concomitantly having a significantly lower average MSE. In Figure 7, we plotted the search point performances of the DTS algorithm against the FS and TSS algorithms. For the clarity in plotting, we only considered the threshold function for the DTS algorithm that generates MSE comparable to the TSS algorithm. BMA Flower Garden MSE(avg) Search points(avg) BMA Tennis MSE(avg) Search points(avg) FS(±7) 270.46 199.76 FS(±7) 126.34 196.98 LT(2) 270.97 155.02 LT(2) 127.25 75.68 LT(4) 275.80 73.48 LT(4) 131.64 38.14 LT(6) 283.98 45.93 LT(6) 138.26 26.03 LT(8) 293.61 33.59 LT(8) 147.59 20.35 LT(12) 318.73 23.05 LT(12) 166.71 14.65 LT(16) 353.03 18.60 LT(16) 183.65 12.36 ET(1) 275.96 70.16 ET(1) 135.22 49.22 ET(2) 270.49 155.02 ET(2) 126.5 120.80 ET(4) 270.47 175.65 ET(4) 126.35 171.16 ET(8) 270.47 180.65 ET(8) 126.35 182.36 TSS 322.88 23.22 TSS 190.81 22.75 Table 1: Average MSE and average search points per motion vector for Flower Garden and Tennis sequences (1 80 frames) BMA Miss America MSE(avg) Search points(avg) BMA Football MSE(avg) Search points(avg) FS(±7) 5.386 168.21 FS(±7) 335.70 202.05 LT(2) 5.395 15.14 LT(2) 335.96 123.71 LT(4) 5.399 14.60 LT(4) 340.67 82.57 LT(6) 5.408 9.01 LT(6) 356.53 57.10 LT(8) 5.408 7.90 LT(8) 380.03 41.75 LT(12) 5.408 7.25 LT(12) 430.26 26.88 LT(16) 5.408 7.22 LT(16) 471.12 20.51 ET(1) 5.400 12.81 ET(1) 360.90 74.89 ET(2) 5.398 26.61 ET(2) 335.73 154.95 ET(4) 5.397 48.20 ET(4) 335.70 191.31 ET(8) 5.397 64.31 ET(8) 335.70 197.71 TSS 5.511 19.67 TSS 370.32 23.11 Table 2: Average MSE and average search points per motion vector for Miss America and Football sequences (1-80 frames) 164

Another finding from the results in Table 1 and Table 2 was that the LT function consistently provided a better performance in comparison with the ET function for a wide range of different video sequences. This was not surprising due to the fact that the distortion of an object in any frame was linearly proportional to its velocity as well as the zoom factor of the camera. 4.3 True motion result The performance of the DTS algorithm was also evaluated in terms of how effectively it could capture true object motion. As the block-based technique captures both object and camera motion, to remove camera motion when capturing true object motion vectors, a well known four parameter (pan and zoom) global motion estimation model and Iterative-Least-Square Estimation technique (Rath and Makur, 1999) have been used to estimate the global motion parameters. Having compensated for global motion, true object motion vectors should usually be clustered in the blocks (a) (b) Figure 6: MSE Comparison: (a) Flower Garden; (b) Tennis video sequences (a) (b) Figure 7: Comparison of average search points required per motion vector for the (a) Flower Garden; (b) Tennis sequences 165

containing one or more objects. However, as block motion estimation cannot be performed with full accuracy due to the limitations of block-based estimation techniques, false motion vectors appear as noise, together with the true object motion vectors. To retain only the true object motion vectors, these false motion vectors are removed. A simple strategy is to define a false motion vector elimination threshold T, with the decision based on the value of T (Kim and Ro, 1999). This approach is flawed however, since thresholding only performs well if the true and false motion vectors have different magnitudes (lengths). In the case where the lengths are similar, the technique will not separate true from false motion vectors. An alternative strategy for eliminating false vectors is to gradually increase the length of the true motion vectors by a larger amount compared with false motion vectors, so that the true motion vector lengths can be separated by thresholding. To achieve this goal, the Mean Accumulated Threshold (MAT) filter design proposed by (Sorwar, Murshded and Dooley, 2001) has been applied. This was implemented as a two-stage combination of accumulation of the mean vector length, in addition to the original vector length, and thresholding. The performance of the MAT filter proved to Figure 8: Comparison of LT, ET, FS and TSS in terms of the percentage of true motion vectors captured with different noise tolerance thresholds Figure 9: Average number of true object motion vector captured by different algorithms 166

significantly increase the length of true object motion vectors compared with all other vectors, within a relatively small number of iterations (for certain sequences this can be as low as 2). The above process was applied to six different standard video sequences (Table Tennis, Foreman, Salesman, Rocket, Son and Ballet) from Berkeley MPEG library (Mpeg-lib, 1997) and the average values are plotted in Figure 8 and Figure 9. Figure 9 shows the highest percentage of true motion vectors over two iterations, obtained after filtering all the vectors using the MAT filter and manually selecting using a priori knowledge concerning the moving objects in the frame, captured by different search algorithms. From this plot it can be seen that DTS with the LT function captures approximately 75% of the true motion vectors, where as the FS and TSS algorithms capture only around 40% of true motion vectors. This clearly confirms the superiority of the DTS algorithm, for both LT and ET functions in capturing the true object motion vectors, for all noise tolerance threshold levels considered. The graphs also reaffirm the earlier judgment that LT is a better thresholding function for the DTS algorithm. Figure 10 shows the motion vectors captured by all three algorithms for the pair of frames 32 and 33 from the Tennis sequence. Besides low camera motion (zoom out), the only moving objects appearing in these frames are the ball, the bat, and a portion of the hand holding the bat. The figure (a) FS (b) LT(8) (c) ET(1) (d) TSS Figure 10: The motion vectors obtained from all four search algorithms applied to the frame pair 32 and 33 of the Tennis sequence 167

reveals that the FS and TSS algorithms perform far worse compared to the DTS algorithm, by capturing a large number of spurious true motion vectors. The variable distance threshold of the DTS algorithm eliminates many of the false vectors so ensuring an overall superior performance. 5. CONCLUSIONS This paper has presented a novel Distance-dependent Thresholding Search (DTS) algorithm for block-based motion estimation in video coding and true object motion estimation for object motion based video analysis. An important feature of DTS is that the FS as well as other fast searching modes are encompassed, with different threshold settings, enabling differing quality-of-service levels. A unique characteristic of the DTS algorithm is that its flexibility in being able to trade predicted picture quality (MSE) for search speed in different video sequences. This flexibility has considerable potential for exploitation in a wide range of applications ranging from low-bit rate video conferencing, through to adaptive high-quality video coding. The performance of DTS was examined and shown, in comparison to both the full-search (FS) and the fast three-step-search (TSS) algorithms for all test sequences. With the exception of the very high motion Football sequence, it provided comparable speed performance, while retaining a distortion error comparable to the optimum value produced by the FS algorithm. Both linear and exponential thresholding functions were examined with the former consistently providing better performance. The variable thresholding feature of DTS also avoided identifying large numbers of spurious motion vectors in the search process. 6. ACKNOWLEDGEMENT The authors would like to formally acknowledge both anonymous reviewers for their insightful comments, suggestions and criticisms, which considerably improved the quality of this paper. REFERENCES CHOW, H.-K. and LIOU, M.L. (1993): Genetic motion search algorithm for video compression. IEEE Trans. on Circuits and Systems for Video Tech. 3: 440 445. DUFAUX, F. and MOSCHENI, F. (1995): Motion estimation techniques for digital TV: a review and a new contribution. Proc. of the IEEE 83: 858 876. GHANBARI, M. (1990): The cross-search algorithm for motion estimation (image coding). IEEE Trans. on Comm. 38: 950 953. HORN, K.P. and SCHUNCK, B.G. (1981): Determining optical flow. Artificial Intell. 17: 185 203. JAIN, J.R. and JAIN, A.K. (1984): Displacement measurement and its application in inter frame image coding. IEEE Trans. Comm. 29: 1799 1808. KIM, J.N. and CHOI, T.S. (1998): A fast three-step search algorithm with minimum checking points using unimodal error surface assumption. IEEE Trans. Consumer Electronics 44: 638 648. KIM, S.-Y. and RO, Y.M. (1999): Fast content-based MPEG video indexing using object motion. Proceedings of IEEE Region 10 Conference 2: 1506 1509. KOGA, T., IINUMA, K., HIRANO, A., IIJIMA, Y., and ISHIGURO, T. (1981): Motion-compensated inter frame coding for videoconferencing. IEEE Nat. Telecomm. Conf. 4: 1 5. LAI, K.C. and WONG, S.C. (2002): A fast motion estimation using a three-dimensional reference motion vector. Int. Conf. on Acoustics, Speech, and Signal Processing 4: 3429 3432. LI, R., ZENG, B., and LIOU, M.L. (1994): A new three-step search algorithm for block motion estimation. IEEE Trans. on Circuits & Systems for Video Tech. 4: 438 442. LUCAS, B.D. and KANADE, T. (1981): An iterative image registration technique with an application to stereo vision. Proc. DARPA Image Understanding Workshop, 121 130. MPEG-LIB, 1997. http://bmrc.berkeley.edu/ftp/pub/multimedia/mpeg/movies. NAM, J.-Y., SEO, J.-S., KWAK, J.-S., LEE, M.-H., and YEONG, H.H. (2000):. New fast-search algorithm for block matching motion estimation using temporal and spatial correlation of motion vector. IEEE Trans. on Consumer Electronics 46: 934 942. PO, L.-M. and MA, W.-C.(1996): Novel four-step search algorithm for fast block motion estimation. IEEE Trans. on Circuits & Systems for Video Tech. 6: 313 317. 168

RATH, G.B. and MAKUR, A. (1999): Iterative least squares and compression based estimations for a four-parameter linear global motion model and global motion compensation. IEEE Trans. on Circuits & Systems for Video Tech. 9: 1075 1099. ROBBINS, J.D. and NETRAVALI, A.N. (1983): Recursive motion compensation: a review. Image sequence processing and dynamic sconce analysis, Springer-Verlag, 76 103. SORWAR, G., MURSHED, M. and DOOLEY, L. (2002): A novel filter for block based motion estimation. Proc. of the International Conference on Digital Image Computing Techniques and Applications. 201 206. WALKER, D.R. and RAO, K.R. (1984): Improved pel-recursive motion compensation. IEEE Trans. Comm. 32: 1128 1134. YU, Y., ZHOU, J., and CHEN, C.W. (2001): A novel fast block motion estimation algorithm based on combined subsamplings on pixels and search candidates. Journal of Visual Comm. and Image Representation 12: 96 105. BIOGRAPHICAL NOTES Golam Sorwar received his B.Sc (Hons.) in Electrical and Electronic Engineering in 1994 from Bangladesh University of Engineering and Technology (BUET), M.Sc. in Electrical, Electronic and Systems Eng. in 1998 from the National University of Malaysia, and Ph.D. in IT from Monash University, Australia in 2003. He is currently a lecturer at the School of Multimedia and Information Technology at Southern Cross University, Australia, where his major research interests are in the fields of image/video coding, indexing and retrieval, motion estimation, shot detection, multimedia communication and artificial intelligence. Dr. Sorwar is a member of ACS, IEEE, and Institute of Engineers of Bangladesh (IEB). Manzur Murshed received his B.Sc.Engg. (Hons.) in Computer Science and Engineering from Bangladesh University of Engineering and Technology (BUET) in 1994 and Ph.D. in Computer Science from the Australian National University in 1999. He is currently the Director of Research and a Senior Lecturer at Gippsland School of Computing and Information Technology, Monash University, Australia, where his major research interests are in the fields of multimedia signal processing and communications, parallel and distributed computing, algorithms, and multilingual systems development. He has published more than 40 journal and peer-reviewed research publications. Dr. Murshed is a member of the IEEE. Laurence S. Dooley received his B.Sc.(Hons), M.Sc. and Ph.D. degrees in Electrical Engineering from the University of Wales, Swansea in 1981, 1983 and 1987 respectively. He is currently Professor of Multimedia Technology at the Gippsland School of Computing & IT, Monash University, Australia, where his major research interests are in the fields of multimedia signal processing, mobile multimedia communications and video technology. He has published more than 80-international scientific peer-reviewed journal, book chapters and conference papers. He is also Executive Director of the Monash Regional Centre for ICT (MRCICT). Professor Dooley is a Senior Member of the IEEE, a Chartered Engineer (C.Eng). Golam Sorwar Manzur Murshed Laurence Dooley 169