Transactions Briefs. An Adaptive Search Length Algorithm for Block Matching Motion Estimation


906 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 6, DECEMBER 1997

Transactions Briefs

An Adaptive Search Length Algorithm for Block Matching Motion Estimation

Mark R. Pickering, John F. Arnold, and Michael R. Frater

Abstract: This paper presents a new fast search algorithm for block matching motion estimation called the adaptive search length (ASL) algorithm. The ASL algorithm adaptively varies the number of positions searched for each block while still maintaining control of the average number of searches per block for each frame. Experimental results show that the peak signal-to-noise ratio (PSNR) of decoded sequences which were coded using the ASL algorithm is within 0.25 dB of the PSNR of decoded sequences which were coded using the full search block matching algorithm. It is also shown that the ASL algorithm requires only 10% of the computations required by the full search algorithm to achieve this level of decoded image quality.

Index Terms: Image coding, motion compensation.

I. INTRODUCTION

Motion estimation is an essential component of all modern standard video coding algorithms (e.g., CCITT Rec. H.261 [1], ISO MPEG-1 [2], and MPEG-2 [3]). It is included in these standard algorithms to reduce the redundancy between successive frames of a video sequence. The method adopted to estimate the motion between frames is the block matching algorithm (BMA) [4]. For the full search BMA, a measure of the difference between every block in a search window from the previous frame and the current block is calculated. The most commonly used difference measure is the mean absolute difference (MAD) between the current block and the block in the search window [5].
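The full search procedure with the MAD measure can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `mad` and `full_search` are hypothetical, frames are plain lists of rows, and blocks are addressed by their top-left corner rather than by the block center used in the paper.

```python
def mad(cur, ref, top, left):
    """Mean absolute difference between the block `cur` and the same-sized
    region of `ref` whose top-left corner is at row `top`, column `left`."""
    n, m = len(cur), len(cur[0])
    total = 0
    for u in range(n):
        for v in range(m):
            total += abs(cur[u][v] - ref[top + u][left + v])
    return total / (n * m)


def full_search(cur, ref, top, left, max_dy, max_dx):
    """Evaluate the MAD at every displacement in the search window and
    return (smallest MAD, (dy, dx)). `top`, `left` locate the co-sited
    block in the reference frame (an assumption of this sketch)."""
    n, m = len(cur), len(cur[0])
    best = None
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > len(ref) or x + m > len(ref[0]):
                continue
            d = mad(cur, ref, y, x)
            if best is None or d < best[0]:
                best = (d, (dy, dx))
    return best
```

Every candidate position costs one full MAD evaluation, which is why the number of positions searched is the quantity the fast algorithms try to reduce.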
Other difference measures which may be used include the normalized cross-correlation function (NCCF) [6], the statistical correlation measure [7], the mean squared error (MSE) [4], and the number of pixels which are classified as matching pixels by the pel difference classification (PDC) algorithm [8]. The full search BMA will always find the optimum motion vector corresponding to the optimum value from all possible values of the difference measure. However, the price paid for this optimum performance is a high computational cost.

A number of fast search algorithms have been proposed which may find a suboptimum motion vector but have a greatly reduced computational complexity. Most of these algorithms reduce the number of computations required by calculating the difference measure at positions coarsely spread over the search window according to some pattern and then repeating this procedure with finer resolution around the position with the minimum difference measure found from the preceding step. These algorithms include: the two-dimensional logarithmic search [4], three step search [5], modified motion estimation [9], orthogonal search [10], variable-stage motion search [11], cross search [12], dynamic search-window adjustment with interlaced search [13], and new three-step search algorithms [14].

Manuscript received August 8, 1995. This paper was recommended by Associate Editor C.-C. Jay Kuo. This work was supported by the Australian Research Council.
The authors are with the School of Electrical Engineering, University College, The University of New South Wales, Australian Defence Force Academy, Australia.
Publisher Item Identifier S 1051-8215(97)08389-4.

The simplified conjugate direction search algorithm, known as the one-at-a-time search (OTS) algorithm, adopts a different approach [15].
In this algorithm, the difference measure is first calculated at the center of the search window and then at positions which follow a path of descending difference measure values until a minimum is found.

All these algorithms rely on the assumption that the difference measure decreases monotonically as the search position moves closer to the optimum position. If this assumption does not hold then these fast algorithms may not find the global minimum. The genetic motion search (GMS) algorithm [16] attempts to overcome this problem by first choosing a random selection of search positions and then using an algorithm similar to the genetic processes of mutation and evolution to find the global minimum of the difference measure. However, for a search window of identical size, the number of computations required by the other fast search algorithms outlined above is approximately 15% of that required by the GMS algorithm.

The fast algorithms proposed in [17] reduce the complexity of the BMA by reducing the number of points from each block which are used to calculate the difference measure and by reducing the number of blocks from each frame for which a motion vector is calculated directly.

For all of these fast search algorithms, the number of times the difference measure is calculated remains approximately constant for each block in the frame. This results in an inefficient use of the total number of computations allowed for each frame. Consider the surfaces shown in Fig. 1 which represent the value of the MAD difference measure for each position in the search window for three different blocks.
The MAD at position $(i, j)$ in the search window is denoted by $D(i, j)$ and is given by

$$D(i, j) = \frac{1}{n \times m} \sum_{u=-n/2}^{(n/2)-1} \; \sum_{v=-m/2}^{(m/2)-1} \left| I(u, v) - S(u + i, v + j) \right| \qquad (1)$$

where $I(u, v)$ is the pixel value $u$ rows down and $v$ columns to the right of the center of the current $n \times m$ block and $S(u + i, v + j)$ is the pixel value $u + i$ rows down and $v + j$ columns to the right of the center of the search window.

For the surface shown in Fig. 1(a), a simple fast search algorithm such as the OTS or TSS algorithm would require a small number of searches to determine the global optimum position for this block. For the surface shown in Fig. 1(b), it is unlikely that the previously described fast search algorithms (with the exception of the GMS algorithm) would converge to the global minimum. For the surface shown in Fig. 1(c), there is no need to find the global minimum position since any of the local minimum positions will correspond to a satisfactory prediction block as $D(i, j)$ is uniformly small.

Fig. 1. (a), (b), (c) Surfaces which represent the value of the MAD difference measure for each position in the search window for three different blocks.

The new algorithm presented in this paper, called the adaptive search length (ASL) algorithm, adaptively varies the number of positions searched (and hence the number of times the difference measure is calculated) for each block, while still maintaining control of the average number of searches per block for each frame. By allowing this variation, a small number of searches are used where it is likely that the first local minimum found corresponds to the global minimum or where the difference measure at a local minimum is sufficiently small, and hence a large number of searches may be used for difficult cases such as the one shown in Fig. 1(b).

The ASL algorithm is described in Section II of this paper. The results of experiments which compare the ASL algorithm to the full search and other algorithms are presented in Section III and discussed in Section IV.

II. THE ASL ALGORITHM

The adaptive search length algorithm can best be described as an extension of the OTS algorithm. When using the OTS algorithm, the first position searched is at the center of the search window and then the search continues in alternating horizontal and vertical directions until a local minimum of the difference measure is found. For the new ASL algorithm, a similar procedure is conducted for up to 25 different starting positions spread across the search window. These starting positions are distributed evenly over the search window and the order they are used is based on their proximity to the center of the search window. Those close to the center are used before those at the edges of the search window. The order in which these positions are used for the next block in the frame is recalculated after the current block has been searched. The ASL algorithm is defined by the following steps.

1) For the first starting position, a modified OTS is conducted to find a local minimum value of the MAD difference measure, $D(i, j)$.
2) Calculate the maximum search length, denoted by $L_m$, allowed for the current search window.
3) If the number of searches used to find the local minimum is greater than the maximum search length, the search is stopped and it is assumed that the local minimum is the global minimum. If the number of searches used is less than the maximum search length, Steps 1)-3) are repeated beginning at the next starting position.
4) After the search has been completed, calculate the order in which the starting positions are to be used for the next block in the frame.

Each of these steps will now be explained in detail.

A. The Modified OTS

The modified OTS used in the ASL algorithm is similar to the algorithm defined in [18]. The main difference between this algorithm and the original OTS algorithm is that both horizontal and vertical directions are searched for the starting position and the direction with the smallest value of the MAD is chosen as the first direction to be searched.

B. Calculating the Maximum Search Length

The second step in the ASL algorithm requires a maximum search length for the current search window to be calculated. This step is performed every time the modified OTS finds a local minimum. The maximum search length is determined according to three different maximum values for the search length denoted by $L_1$, $L_2$, and $L_3$. The maximum search length for the current search window is then taken as the minimum of these three values and is given by

$$L_m = \min[L_1, L_2, L_3]. \qquad (2)$$

In order to calculate the value for $L_1$, it is first necessary to estimate the number of local minima for $D(i, j)$ which are likely to occur in the search window. That is, to determine whether the surface which represents the value of the $D(i, j)$ difference measure for each position in the search window will look more like Fig. 1(a) or more like Fig. 1(b) or (c). The number of local minima is estimated by measuring the average search path length of the searches already conducted on the search window. If the surface contains a large number of local minima as in Fig. 1(b) then the average search path length will be small because the average distance between each starting position and the closest local minimum will be small.
If the surface contains only one minimum as in Fig. 1(a), then the average distance between each starting position and the global minimum will be large. Hence, after a local minimum is found, the longer the average search path length, the more likely it is that the local minimum is actually the global minimum and no more positions need to be searched. The average search path length for a search window is denoted by $\Gamma$ and is given by

$$\Gamma = \frac{\Lambda}{\Phi} \qquad (3)$$

where $\Lambda$ is the number of positions already searched which lie on a search path and $\Phi$ is the number of starting positions already searched. Note that a more accurate estimate of $\Gamma$ is obtained when only those positions which lie on a search path are used to calculate $\Lambda$ rather than using the total number of positions searched. (These positions are shown connected by a straight line for the example shown in Fig. 3.)

An experiment was conducted to determine the relationship between the average search path length and the number of searches required to find the global minimum. This experiment was conducted using a software simulation of the MPEG-2 standard video coder to code 23 different five-frame sequences. As the sequences were interlaced sequences, the coder was restricted to coding only field motion vectors in order to eliminate the effect of the interlacing on the search algorithms. The coder operated with five frames per group of pictures (GOP) and the first frame was coded as an I-frame and the remaining four frames were coded as P-frames. No intra coded macroblocks were allowed in the P-frames to eliminate the effect of the intra/inter decision method. With the coder operating in this way, the allowed range for the motion vectors of each macroblock was $\pm 7$ pixels in the vertical direction and $\pm 15$ pixels in the horizontal direction. The full search BMA in this case would search $15 \times 31 = 465$ positions to find the global minimum value for $D(i, j)$. After the position corresponding to the global minimum is found, the decoded version of the search window is used to find the minimum value of the MAD from the current position and the eight surrounding positions corresponding to a half pixel increase or decrease in each motion vector.
In the first part of the experiment, the full search algorithm was used to determine the position of the global minimum value for $D(i, j)$. The coder was then run using the modified OTS algorithm to find local minimum values for $D(i, j)$ in each search window. The search procedure was halted when the local minimum found coincided with the global minimum previously determined by the full search algorithm. For each macroblock, the value of $\Gamma$ and the total number of searches required to find the global minimum were recorded. A plot of the average path length versus the number of searches required to find the global minimum for each macroblock in the 23 test sequences is shown in Fig. 2.

It can be seen from Fig. 2 that the maximum number of searches required to find the global minimum is approximately inversely proportional to the average search path length of the modified OTS search. Therefore, when a local minimum is found and a value for $\Gamma$ is calculated, the value for $L_1$ should represent the maximum number of searches required to find the global minimum for a search window with an average search path length equal to $\Gamma$. Hence, for the coder operating in this mode, the equation relating $L_1$ and $\Gamma$ was chosen as

$$L_1 = \begin{cases} N_{\max}, & \text{if } \Gamma < 8 \\[4pt] \dfrac{512}{\Gamma - 8} + 48, & \text{otherwise} \end{cases} \qquad (4)$$

where $N_{\max}$ is the maximum number of searches possible in the search window. The curve for this equation is also shown in Fig. 2.

Now by using this first maximum search length value $L_1$, more search positions are allowed in search windows which contain many local minimum values for $D(i, j)$ than in search windows which have monotonically decreasing values of $D(i, j)$. However, there is still some inefficient use of the total number of search positions allowed per frame. Consider the search window with the MAD surface shown in Fig. 1(c).
The modified OTS will find many local minima in this search window, the value calculated for $\Gamma$ will be small, and consequently, if only the value of $L_1$ is used, many search positions will be allowed for this search window. It can be seen however that the value for $D(i, j)$ at all the local minimum positions found will be very small, and hence, any of these positions will correspond to a good prediction for the current block. Therefore, a second maximum search length value is needed to limit the number of allowed search positions in a search window where $D(i, j)$ at the local minimum positions has already reached an acceptably small value. This second maximum search length value is given by

$$L_2 = \begin{cases} N_{\max}, & \text{if } D_{\min} > \dfrac{N_{\max} + 64}{16} \\[4pt] 0, & \text{if } D_{\min} < 4 \\[4pt] 16 D_{\min} - 64, & \text{otherwise} \end{cases} \qquad (5)$$

where $D_{\min}$ is the minimum of the values for $D(i, j)$ already found by the modified OTS. The parameters for (5) were found experimentally to give the most efficient use of the total number of search positions allowed per frame.

Fig. 2. The average path length versus the number of searches required to find the global minimum for each macroblock in the 23 test sequences. The shades of grey for the points indicate the frequency at which each point on the plot occurred. A darker shade of grey indicates a higher frequency.

It can be seen from (5) that the maximum number of searches allowed for a search window increases linearly according to the minimum of the values for $D(i, j)$ already found. Therefore, if $L_m$ was taken as the minimum of $L_1$ and $L_2$, more searches would be allowed for a search window where there was a large number of local minima and the minimum MAD value found so far was not acceptably small. However, if only the minimum of $L_1$ and $L_2$ is used, the total number of positions searched for each frame will depend on the characteristics of the individual search windows and could vary markedly for different sequences.
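The two limits discussed so far can be sketched in a few lines. This is only an illustration under stated assumptions: it takes equation (4) to be 512 divided by (average path length minus 8), plus 48, and equation (5) to be 16 times the minimum MAD, minus 64, with the upper threshold at (N_max + 64)/16. The function names are hypothetical, and the guard uses `<=` rather than the paper's strict `<` purely to avoid a division by zero at an average path length of exactly 8.

```python
def l1_limit(avg_path_len, n_max):
    """Limit based on the average search path length (assumed form of (4))."""
    if avg_path_len <= 8:  # sketch-only guard; the paper states "< 8"
        return n_max
    return 512.0 / (avg_path_len - 8) + 48


def l2_limit(d_min, n_max):
    """Limit based on the smallest MAD found so far (assumed form of (5))."""
    if d_min < 4:
        return 0
    if d_min > (n_max + 64) / 16:
        return n_max
    return 16 * d_min - 64


def search_limit(avg_path_len, d_min, n_max, l3=None):
    """Maximum search length as the minimum of the available limits.
    `l3` is the optional per-block budget limit."""
    limits = [l1_limit(avg_path_len, n_max), l2_limit(d_min, n_max)]
    if l3 is not None:
        limits.append(l3)
    return min(limits)
```

With `n_max = 465`, an average path length of 16 and a minimum MAD of 10, the first limit is 112 and the second is 96, so the second one governs unless a tighter budget limit is supplied.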
Therefore, a third maximum search length value, $L_3$, is used to ensure that a predetermined value of the mean number of searches per block is not exceeded. The value of $L_3$ for the current block to be searched is calculated after the search is completed for the previous block and before the search begins for the current block. The value of $L_3$ then remains constant until the search is completed for the current block.

In order to explain the procedure for calculating $L_3$, it is necessary to define the following parameters. Let the allowed mean number of searches per block be denoted by $\bar{N}$, the total number of positions searched for the previous block be denoted by $N$, and the number of excess search positions available for the next block be denoted by $N^{+}$. The procedure for calculating $L_3$ is defined as follows. For the first block in the frame, the value of $L_3$ is set to $\bar{N}$ and the value of $N^{+}$ is set to zero. The ASL algorithm with the maximum search length $L_m$ given by (2) is then used to determine a prediction block for the current block and the value for $N$ is set to the number of positions searched for this block. The value for $N^{+}$ is then adjusted using the equation

$$N^{+} = N^{+} + \bar{N} - N \qquad (6)$$

and the value of $L_3$ for the next block is calculated using the equation

$$L_3 = \bar{N} + N^{+}. \qquad (7)$$

A prediction block for the next block is then found using the ASL algorithm and the process continues for the remaining blocks in the frame. By calculating the value of $L_3$ in this way, if the ASL algorithm searches fewer than $\bar{N}$ positions to find the global minimum in a block, the searches which are not used can be used for later blocks which may require more than $\bar{N}$ positions to be searched to find the global minimum. In addition, the average number of searches per block for the frame can never exceed $\bar{N}$ since no extra searches can be used for a block unless they have not been used in a previous block. The value of $N^{+}$ may become slightly negative when the number of search positions required to reach a local minimum slightly exceeds the value of $L_3$ (the value of $N$ is only compared with $L_3$ when a local minimum is reached). However, a negative value of $N^{+}$ must make the value of $L_3$ less than $\bar{N}$, and for the following macroblocks $(\bar{N} - N)$ will almost certainly be positive. Hence, a negative feedback loop is formed which will guarantee that $N^{+}$ becomes positive within a few macroblocks.

As an example of how this part of the ASL algorithm operates, consider the search window with the MAD surface shown in Fig. 1(b). The search paths taken by the ASL algorithm for this search window are shown in Fig. 3 superimposed over a grey-scale representation of the values for $D(i, j)$ at each position in the search window. A lighter shade of grey corresponds to a larger value for the MAD at that position. It can be seen from Fig. 3 that 21 starting positions are used to find the global minimum value of the MAD. The order of the starting positions is taken as the original order. The values of the parameters calculated when the local minimum was reached for each starting position are shown in Table I. The value of $L_3$ for this block was 1329, indicating that the average number of search positions used so far in the frame was considerably lower than $\bar{N}$.

Fig. 3. The search paths taken by the ASL algorithm for the search window with the MAD surface shown in Fig. 1(b) superimposed over a grey-scale representation of the values for $D(i, j)$ at each position in the search window.

TABLE I. Values of the parameters calculated when the local minimum was reached for each starting position.
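The per-frame budget behind the third limit can be sketched as a small running account. This is a hedged illustration, not the authors' code: the class name `SearchBudget` and its method names are hypothetical, and it assumes the update rules are "excess = excess + allowed mean - searches used" and "limit for next block = allowed mean + excess".

```python
class SearchBudget:
    """Tracks unused searches so that later blocks in a frame may spend them."""

    def __init__(self, mean_allowed):
        self.mean_allowed = mean_allowed  # allowed mean searches per block
        self.excess = 0                   # unused searches carried forward

    def limit_for_next_block(self):
        """Third search-length limit for the block about to be searched."""
        return self.mean_allowed + self.excess

    def record_block(self, searches_used):
        """Update the carried-forward excess after a block has been searched."""
        self.excess += self.mean_allowed - searches_used


budget = SearchBudget(mean_allowed=50)
for used in (20, 85, 40):           # illustrative per-block search counts
    limit = budget.limit_for_next_block()
    budget.record_block(used)
```

In this run the second block overshoots its limit of 80 by five searches, so the excess briefly goes negative and the third block is held to a limit below the allowed mean, which is the feedback behaviour described in the text.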

Fig. 4. The order of the starting positions after the frame containing the search window with the MAD surface shown in Fig. 1(b) has been searched.

C. Calculating the Starting Position Order

It can be seen from Fig. 3 that, because the global minimum position is not close to the center of the search window, a large number of starting positions are used before the global minimum position is found. This search window is taken from a block in the center of a frame in a sequence where the camera is panning across the scene. Consequently, a large number of search windows in this frame will have approximately the same global minimum position. Therefore, so that positions in frames such as these are not searched unnecessarily, the order in which the starting positions are used is recalculated for each block.

The starting position order for each block is calculated using the mean over the sequence of $D(i, j)$ values at the local minimum reached from each starting position. These values are denoted by $\bar{D}(\hat{i}_n, \hat{j}_n)$ for $n = 1, \ldots, 25$ and are given by

$$\bar{D}(\hat{i}_n, \hat{j}_n) = \frac{1}{B} \sum_{k=1}^{B} D_k(\hat{i}_n, \hat{j}_n) \quad \text{for } n = 1, \ldots, 25 \qquad (8)$$

where $B$ is the number of blocks in the sequence searched so far, $n$ denotes the original order of each starting position, and $D_k(\hat{i}_n, \hat{j}_n)$ denotes the MAD value for the $k$th block searched at the local minimum found when starting at position $n$. These values are recalculated after each block has been searched and reset when a frame of the sequence is coded in intra mode. Then, for each block, the first starting position is taken as the starting position with the minimum value of $\bar{D}(\hat{i}_n, \hat{j}_n)$ and the remaining starting positions are used in the order of ascending values for $\bar{D}(\hat{i}_n, \hat{j}_n)$. For example, Fig. 4 shows the order of the starting positions for the next block after the frame containing the search window in Fig. 1(b) has been searched.
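The reordering rule can be sketched as a running mean per starting position followed by a sort. This is an illustration under assumptions the text does not settle: the class name `StartOrder` is hypothetical, positions are indexed from 0 rather than 1, each mean is taken over the blocks in which that starting position was actually used, and positions with no data yet are placed last in their original order.

```python
class StartOrder:
    """Orders starting positions by the mean MAD reached from each one."""

    def __init__(self, num_positions=25):
        self.sums = [0.0] * num_positions
        self.counts = [0] * num_positions

    def record(self, position, local_min_mad):
        """Store the MAD at the local minimum reached from `position`."""
        self.sums[position] += local_min_mad
        self.counts[position] += 1

    def reset(self):
        """Clear the statistics, e.g. when a frame is coded in intra mode."""
        self.sums = [0.0] * len(self.sums)
        self.counts = [0] * len(self.counts)

    def order(self):
        """Return position indices in ascending order of mean MAD."""
        def key(n):
            if self.counts[n] == 0:
                # Assumption: unused positions go last, in original order.
                return (1, 0.0, n)
            return (0, self.sums[n] / self.counts[n], n)
        return sorted(range(len(self.sums)), key=key)
```

Ties are broken by the original index, so with no statistics recorded the sketch falls back to the original center-first order.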
The mean number of searches per block for the frame was $\bar{N} = 50$.

This method of reordering the starting positions performs well even when there are several different objects moving in different directions in the frame. In this case the starting positions which are closest to the motion vectors of each object will be the first positions used in the search. The order of these first few starting positions used will depend on the average of the MAD values found for each of the starting positions in the previous frames. The position reordering algorithm could be improved by adding some extra priority to the starting positions according to the local motion vector field. This would have the effect of choosing between the first few starting positions according to the motion vectors found in the immediate neighborhood of the current block.

D. Implementation Issues

Motion estimation forms a significant fraction of the total computational effort in conventional motion-compensated DCT video compression standards. Any algorithm that reduces the amount of computation required to achieve satisfactory performance in this area is therefore of value. Currently, most video encoders are implemented in hardware. The algorithm described in this paper probably has limited use in full hardware implementations since it does not have the regularity and modularity in computation which are exhibited by techniques like the full search or three-step search block matching algorithms and which are important for hardware implementation. While it is not expected that the algorithm proposed here would be suitable for direct hardware implementation, the scalability of the computational cost provides significant advantages for software encoder implementations, which are expected to become more important in the future.
As an example of this trend, the emphasis in studying implementation complexity in MPEG-2 was on hardware implementations, whereas the emphasis in the current work toward developing the MPEG-4 standard is on software implementations.

The scalability of the computational cost is potentially of particular advantage in real-time operating systems. An important issue in real-time operating systems is providing guarantees of quality of service. In the Nemesis operating system [19], each task is given a statistical guarantee for its access to various resources, including memory, computational power, and communications. If the operating system is unable to meet its commitments to a given task, it will inform the task of the level of resources that will be provided. Given this information, the task can then make an optimal internal allocation of resources and thereby achieve a graceful degradation in performance. In the case of a video compression system employing the motion estimation algorithm described here, it would be relatively easy to reduce the level of computational resources made available to the motion estimation algorithm without causing a catastrophic reduction in performance.

This type of algorithm will also be very useful for designing video encoding and decoding software for the new style of programmable DSP chips which have recently been developed for real-time multimedia applications. An example of such a chip is the Texas Instruments TMS320C80, which includes four parallel processors and dedicated on-board memory designed specifically for real-time software encoding and decoding of video data. The algorithm presented in this paper could easily be executed on such a system, providing a dramatic reduction in processing time compared with the time required by the full search algorithm.

III. RESULTS

In order to compare the performance of the ASL algorithm to other standard fast search algorithms, a software simulation of the MPEG-2 standard video coder was used to code four different 24-frame sequences. The coder operated with the settings described in Section II-B but with 12 frames per GOP instead of five; the first frame was coded as an I-frame, and the remaining 11 frames in the GOP were coded as P-frames. The four sequences chosen were Bus, which shows fast camera panning, BBC Disk, which shows rotational motion, Football, which shows fast moving objects on a stationary background, and Table Tennis, which shows camera zooming. These four sequences contain all of the commonly occurring types of motion found in typical video sequences, and hence it can be assumed that the performance of the ASL algorithm for these sequences can be extended to all typical video sequences.

The four sequences were coded at a bit rate of 4 Mb/s using the ASL algorithm with several values of N. The peak signal-to-noise ratio (PSNR) found for these values is then compared with the PSNR obtained when using the OTS, TSS, and new three-step search (NTSS) algorithms. An extended version of the OTS algorithm (OTS2) was also simulated with the starting position set to the average of the

motion vectors found for the blocks immediately surrounding the current block in the previous frame. This simulation was included to demonstrate the superior performance gained from using multiple starting positions in conjunction with starting position reordering. In addition, the PSNR values for the 4:1 pixel subsampling (PSS) and the combined 4:1 pixel subsampling and 2:1 motion vector field subsampling (PMVSS) approaches proposed in [17] were also compared. The two-dimensional logarithmic, modified motion estimation, orthogonal search, and cross search algorithms were not simulated, as these approaches were shown in [12] to be similar in performance to the TSS algorithm. The genetic motion search algorithm was not simulated, as the approximate number of searches required was much larger than that for the proposed algorithm.

Fig. 5. The PSNR of the decoded sequences plotted against the mean number of searches per block, and the PSNR obtained when using the full search BMA for the sequences (a) Bus, (b) BBC Disk, (c) Table Tennis, and (d) Football.

The PSNR of the decoded sequences plotted against the mean number of searches per block and the PSNR obtained when using the full search algorithm are shown for the four sequences in Fig. 5. Note that because the range of motion vectors is restricted to ±7 pixels in the vertical direction, the TSS algorithm requires 19 instead of the 25 search points which would be required if the search window was square. Also note that the PSNR obtained from the full search algorithm is shown as a dotted line to represent the theoretical upper limit of the PSNR, rather than as a single point at 465 search points. The time required to calculate the MAD distortion measure for all search points is the only calculation time which needs to be considered for the ASL algorithm.
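The search-point counts quoted above can be checked with a short calculation. The Python sketch below is illustrative only: the horizontal range of ±15 pixels is an assumption, chosen because it is consistent with the 465 full-search points given in the text, and the `mad` function follows the usual definition of the mean absolute difference rather than any particular implementation.

```python
def full_search_points(range_x, range_y):
    """Number of candidate positions in a (2*range_x + 1) x (2*range_y + 1) window."""
    return (2 * range_x + 1) * (2 * range_y + 1)


def mad(block, candidate):
    """Mean absolute difference between two equally sized blocks (lists of rows)."""
    total = 0
    count = 0
    for row_a, row_b in zip(block, candidate):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


# Assumed window: +/-15 pixels horizontally, +/-7 pixels vertically.
points = full_search_points(15, 7)
print(points)                  # 31 * 15 = 465
print(round(50 / points, 3))   # N = 50 is roughly 0.108 of the full search

# MAD of a toy 2x2 block against one candidate block.
print(mad([[1, 2], [3, 4]], [[1, 4], [0, 4]]))  # (0 + 2 + 3 + 0) / 4 = 1.25
```

Since one MAD evaluation is needed per search point, the ratio of search points is a reasonable proxy for the ratio of computation times.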
The additional time required to calculate the maximum search length varies according to the number of local minima in the search window but is typically less than 1% of the total computational time. The time required to calculate the starting position order is also less than 1% of the total time and may be neglected.

It can be seen from Fig. 5 that the ASL algorithm using N = 50 performs significantly better than the OTS, TSS, NTSS, and OTS2 algorithms for the sequences Bus and BBC Disk. Even so, this value of N corresponds to only about 10% of the calculations required by the full search algorithm. In these sequences, the correct motion vectors for a large proportion of the macroblocks in the frame are found at the edges of the search window. By recalculating the start position order, the ASL algorithm can adapt to the predominant motion in

the frame, whereas the TSS, OTS, and NTSS algorithms must start searching at the center of the search window for every macroblock. The performance of the OTS2 algorithm was better than that of the OTS algorithm for these sequences, but as only one starting position is used, the PSNR for this method was still significantly lower than that for the ASL algorithm. For sequences such as Table Tennis and Football, in which the correct motion vectors are close to the center of the search window, the performance of the TSS, OTS, NTSS, and OTS2 algorithms is closer to the full search performance but still worse than the performance of the ASL algorithm.

The performance of the PSS and PMVSS algorithms is close to that of the ASL algorithm for the corresponding number of searches. However, an advantage of the ASL algorithm over these two techniques is the ability to vary the calculation time required using N. It is interesting to note that since these algorithms reduce the calculation time required by subsampling the pixels used for the MAD calculation and the motion field, it would be possible to combine these techniques with the ASL algorithm to further reduce the calculation time required. Combining the subsampling techniques with the ASL algorithm is currently the subject of further research.

For the sequence BBC Disk, the PSNR plots show that the highest PSNR value for the decoded sequence can be obtained when using the ASL algorithm with a mean search length per block of N = 20. This result occurs because of the suboptimal performance of the MAD as a difference measure. Since the motion vector which produces the best MAD value does not necessarily produce a decoded block with the best PSNR, occasional irregularities can occur.
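A toy calculation shows how the MAD and the PSNR can rank two candidates differently. The Python sketch below is an illustration under simplifying assumptions: it compares raw prediction errors for two hypothetical four-pixel candidates, whereas the PSNR values reported here are measured on decoded sequences after the complete coding process, so the example indicates the mechanism only.

```python
import math


def mad(errors):
    """Mean absolute difference of a list of pixel errors."""
    return sum(abs(e) for e in errors) / len(errors)


def psnr(errors, peak=255):
    """PSNR in dB computed from the mean squared error (errors must not all be zero)."""
    mse = sum(e * e for e in errors) / len(errors)
    return 10 * math.log10(peak * peak / mse)


# Hypothetical prediction errors for two candidate motion vectors.
candidate_a = [4, 0, 0, 0]  # one large error: MAD = 1.0, MSE = 4.0
candidate_b = [2, 2, 1, 1]  # several small errors: MAD = 1.5, MSE = 2.5

print(mad(candidate_a) < mad(candidate_b))    # True: A wins on MAD
print(psnr(candidate_a) < psnr(candidate_b))  # True: B wins on PSNR
```

Because the PSNR depends on squared errors while the MAD depends on absolute errors, a candidate with a few large errors can have the lower MAD and still give the lower PSNR.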
For BBC Disk, even though not all of the motion vectors which produce the best MAD value were found, some of the motion vectors which were found produced better PSNR values. This effect can also be seen for the PSS and PMVSS algorithms, where subsampling the motion field actually produces slightly better PSNR values than using all the motion vectors found by the PSS method. However, it can be seen from Fig. 5 that, in general, the more search positions which are used in the ASL algorithm, the better the PSNR of the resulting decoded sequence.

IV. CONCLUSIONS

In this paper, we have presented a new fast search algorithm, known as the ASL algorithm, for block-matching motion estimation. The primary advantage of this technique over other techniques proposed previously, such as TSS, OTS, NTSS, OTS2, and PMVSS, is that a tradeoff can be made between computational requirements and prediction quality, and that this tradeoff can be adjusted online. This feature makes possible the graceful degradation of quality in software coders when reduced computational resources are available. The performance of the new technique either matches or exceeds that of previous techniques for the same usage of computational capacity. Furthermore, it can be seen from the experimental results that as N approaches 50, the PSNR value for the ASL algorithm approaches the PSNR for the full search algorithm. When N = 50, the ASL algorithm requires 10% of the computations required by the full search algorithm for no significant difference in the PSNR of the decoded image.

REFERENCES

[1] "Video codec for audio visual services at p × 64 kbit/s," CCITT Recommendation H.261, 1990.
[2] "Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s," ISO/IEC 11172, Aug. 1993.
[3] "Information technology - Generic coding of moving pictures and associated audio," ISO/IEC 13818, Mar. 1995.
[4] J. R. Jain and A. K. Jain, "Displacement measurement and its application in interframe image coding," IEEE Trans. Commun., vol. COM-29, pp. 1799–1808, Dec. 1981.
[5] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for video conferencing," in Proc. Nat. Telecommunications Conf., New Orleans, LA, Nov. 29–Dec. 3, 1981, pp. G5.3.1–G5.3.5.
[6] H. G. Musmann, P. Pirsch, and H.-J. Grallert, "Advances in picture coding," Proc. IEEE, vol. 73, pp. 523–548, Apr. 1985.
[7] W. K. Pratt, "Correlation techniques for image registration," IEEE Trans. Aerosp. Electron. Syst., vol. 10, no. 3, pp. 353–358, May 1974.
[8] H. Gharavi and M. Mills, "Blockmatching motion estimation algorithms–New results," IEEE Trans. Circuits Syst., vol. 37, pp. 649–651, May 1990.
[9] S. Kappagantula and K. R. Rao, "Motion compensated interframe image prediction," IEEE Trans. Commun., vol. COM-33, pp. 1011–1015, Sept. 1985.
[10] A. Puri, H. M. Hang, and D. L. Schilling, "An efficient block-matching algorithm for motion compensated coding," in Proc. IEEE ICASSP, Apr. 1987, pp. 25.4.1–25.4.4.
[11] S. C. Kwatra, C.-M. Lin, and W. A. Whyte, "An adaptive algorithm for motion compensated color image coding," IEEE Trans. Commun., vol. COM-35, pp. 747–754, July 1987.
[12] M. Ghanbari, "The cross-search algorithm for motion estimation," IEEE Trans. Commun., vol. 38, pp. 950–953, July 1990.
[13] L.-W. Lee, J.-F. Wang, J.-Y. Lee, and J.-D. Shie, "Dynamic search-window adjustment and interlaced search for block-matching algorithm," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 85–87, Feb. 1993.
[14] R. Li, B. Zeng, and M. L. Liou, "A new three-step search algorithm for block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 438–442, Aug. 1994.
[15] R. Srinivasan and K. R. Rao, "Predictive coding based on efficient motion estimation," IEEE Trans. Commun., vol. COM-33, pp. 888–896, Aug. 1985.
[16] K. H.-K. Chow and M. L. Liou, "Genetic motion search algorithm for video compression," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 440–445, Dec. 1993.
[17] B. Liu and A. Zaccarin, "New fast algorithms for the estimation of block motion vectors," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 148–157, Apr. 1993.
[18] X. Zhang, M. C. Cavenor, and J. F. Arnold, "An efficient motion compensation scheme for coding videophone sequences for the broadband ISDN," in Proc. Australian Broadband Switching and Services Symp., Melbourne, Australia, July 1992, pp. 365–372.
[19] S. J. Mullender, I. M. Leslie, and D. McAuley, "Operating system support for distributed multimedia," in Proc. Summer 1994 USENIX Conf., Boston, MA, June 1994, pp. 209–219.