Tunnelling-based Search Algorithm for Block-Matching Motion Estimation

María Santamaría*, María Trujillo
*Universidad del Valle, Colombia, maria.santamaria@correounivalle.edu.co
Universidad del Valle, Colombia, maria.trujillo@correounivalle.edu.co

Keywords: Block-matching algorithms, motion estimation, tunnelling search.

Abstract

Video compression techniques rely on motion estimation, which consists of estimating the displacement of image content from one frame to another. Block-matching is commonly used to calculate such displacement since it is an efficient technique for reducing temporal redundancy in video coding and has been adopted by video coding standards. It is also perhaps the most reliable and robust technique for this purpose. New algorithms have been proposed in order to reduce the computational cost of block-matching without degrading estimation quality. However, these approaches may fall into local minima since the search space is not covered completely. In this paper, a tunnelling-based search (TBS) approach is proposed in order to avoid falling into local minima. The search is guided by the direction of the gradient. TBS shows a considerable reduction in the number of explored blocks without degrading the quality of prediction.

1 Introduction

Motion estimation refers to the estimation of the displacement of image content from one frame to another in a time sequence of 2-D images. Region matching algorithms are a common technique for motion estimation. In this technique it is assumed that all pixels within a region have the same motion activity [1]. Nowadays, in many application fields such as video compression, the most widely used motion estimation methods are a type of region matching method: block-based techniques, which are called block-matching algorithms (BMA). In a BMA the current frame is divided into non-overlapping blocks and, for each analysed block, the algorithm searches for the block of the same dimensions that best matches it.
The search is made within a search window in the reference frame by minimising an error function called the block distortion measure (BDM), defined as:

\[ \text{error} = \iint E\big(f(x, y, t_i),\, f(x - d_x, y - d_y, t_{i-1})\big)\, dx\, dy. \tag{1} \]

The relative position between the reference block and its best matched block is represented as a motion vector (d_x, d_y).

The brute-force algorithm, the full-search, is simple and guarantees high accuracy in finding the best match [2]. However, it involves a high computational cost because it evaluates all positions in a search area of size (2W + 1) x (2W + 1).

Since motion estimation is a computationally expensive operation, different approaches have been proposed in order to improve the accuracy and efficiency of block-matching algorithms. However, because they do not cover the search area completely, some approaches may fall into local minimum matching error points, which degrades their quality of prediction.

Many proposed approaches, principally the ones that use a fixed set of search patterns [3–7], are based on two assumptions. First, the matching error decreases as the search approaches the position of the global minimum error. Second, the error surface is uni-modal with a global minimum error point [8].

An error surface represents estimation errors; its shape is determined by the optimised objective function and the data [9]. Video sequences with a low amount of movement generally have uni-modal error surfaces and are characterised by homogeneous error regions around the global minimum error point. Video sequences with a medium or high amount of movement tend to have non-uni-modal error surfaces with multiple local minimum error points.

Fig. 1 shows examples of 3-dimensional error surfaces. (a) represents a smooth movement, in which it is easy to identify the global minimum, which is surrounded by a somewhat homogeneous surface.
(b) represents a more complex movement, in which a couple of local minima scattered over the surface are visible (blue points).

It is important to consider that many BMAs are not capable of reaching a global minimum located far from the search centre, since they are not able to surpass peaks. Taking this into account, this paper presents an approach based on a search guided by a tunnel, which allows the exploration of another promising neighbourhood.

The rest of the paper is organised as follows. Section 2 presents some previous work on avoiding local minimum error points in block-based motion estimation techniques. Section 3 introduces the proposed Tunnelling-based Search (TBS). Section 4 is focused on the experimental evaluation and Section 5 includes final comments as conclusions.
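As a minimal sketch of the block-matching formulation in this introduction, the Python fragment below uses the sum of absolute differences (SAD) as the distortion E and scans every displacement in the (2W + 1) x (2W + 1) area. The function names, the list-of-lists frame layout and the toy frames are assumptions made for illustration; this is not the authors' C++ implementation.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` at (bx - dx, by - dy), following the sign convention of Eq. (1)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[by - dy + j][bx - dx + i])
    return total


def full_search(cur, ref, bx, by, n, w):
    """Evaluate every displacement in [-w, w] x [-w, w] whose candidate block
    lies inside the reference frame.
    Returns (best_dx, best_dy, best_sad, explored_blocks)."""
    height, width = len(ref), len(ref[0])
    best = None
    explored = 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x, y = bx - dx, by - dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue  # candidate block falls outside the frame
            explored += 1
            err = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    return best[0], best[1], best[2], explored


# Toy frames (hypothetical data): the content of `ref` moves by (2, 1) in `cur`.
SIZE = 12
ref = [[x * x + 3 * y * y + x * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(SIZE)]
       for y in range(SIZE)]

best_dx, best_dy, best_err, explored = full_search(cur, ref, bx=4, by=4, n=4, w=3)
print(best_dx, best_dy, best_err, explored)
```

With W = 3 and a block far enough from the frame border, all (2W + 1)^2 = 49 positions are evaluated, which is the cost that fast BMAs try to reduce.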
Figure 1. 3D SAD values using a 16x16 block size and a search area of 33x33 pixels. (a) Akiyo's uni-modal error surface with a global minimum error point. (b) Stefan's non-uni-modal error surface with multiple local minimum error points.

2 Related Works

Different approaches have been developed to try to avoid local minimum error points in block-based motion estimation techniques. Four of them are presented here.

2.1 Adaptive Rood Pattern Search

Nie and Ma [8] proposed a BMA which consists of two sequential stages: initial search and refined local search. The initial stage uses an adaptive rood pattern to reduce the risk of being trapped in local minimum matching error points. Then, in the refined local search, a unit-size rood pattern is applied repeatedly, and unrestrictedly, until the final MV is found.

2.2 Multi-Directional Gradient Descent Search

Po et al., 2008 [10] proposed a BMA that starts by evaluating the distortion in the central block and, independently, in each of the eight surrounding directions: upper, lower, left, right, upper-left, upper-right, lower-left and lower-right. It follows a straight path, pixel by pixel, in each of these directions as long as each step reduces the distortion; otherwise, the search stops in that direction. The algorithm stops when there is no reduction in the distortion in any of the eight directions mentioned above.

2.3 Fast Directional Gradient Descent Search

Po et al., 2009 [11] proposed the fast directional gradient descent search (FDGDS), an improvement on the MDGDS that increases the speed of the algorithm at the cost of a small loss in quality of prediction. The improvement consists of detecting when the minimum found in one direction is clearly better than the current search centre. In that case, the algorithm stops evaluating the remaining directions and starts a new stage at the minimum found. They propose the Relative Distortion Ratio (RDR) as a criterion to determine whether a particular block is better than the reference block. When the path ends in one direction, the RDR is compared with a threshold and, if it is lower, the search leaps to that minimum and explores the directions again from there.

2.4 Iterative Random Search

Porto et al. [12] proposed the iterative random search (IRS), an iterative BMA for high-definition videos which divides the search window into four areas. The algorithm selects five initial candidates: the search centre and one random block in each of the aforementioned areas. It uses a cross pattern to determine their neighbours. The IRS evaluates the distortion for a candidate and its neighbours and takes the best of them as the new candidate. This process is repeated until there is no improvement for a centre candidate. Once the IRS has the five best blocks, it selects the best of them as the best matched block. The IRS algorithm employs random candidate selection as a strategy to increase the probability of avoiding local minima.

3 Tunnelling-based Search (TBS)

3.1 Overview of the TBS Algorithm

The main purpose of the proposed search strategy is to allow a search in two promising neighbourhoods, with the aim of avoiding local minima. The average gradient direction of the analysed block, rotated 180 degrees anticlockwise (ACW), is used to orient one of the neighbourhoods. The other neighbourhood remains at the analysed block. In the first case, the objective is to allow TBS to explore another promising region.

3.2 Window Centred Search

The TBS explores the neighbourhood of the block located at the position of the analysed block to guarantee a good estimation of motion vectors when the sequence has a low amount of movement, because in such cases the best matched block is near the search window centre. Such exploration is made using a cross pattern since any motion vector can be decomposed into two components: one vertical and one horizontal [8].

3.3 Direction for Guiding the Search

Let θ be the gradient direction. θ − π is taken as the guided direction to determine an alternative search centre, because θ represents the direction in which the image content (intensity or colour) is changing most rapidly. The proposed orientation might be a good choice to find a best matched block, since BDM is a similarity function and the search would be directed towards a region whose content is similar to the target image content. The gradient direction can be calculated by the formula:

\[ \theta(i, j) = \tan^{-1}\!\left( \frac{\partial f}{\partial y}(i, j),\ \frac{\partial f}{\partial x}(i, j) \right). \tag{2} \]

In a block, the gradient direction is calculated for each pixel within it and the average gradient direction is obtained as:

\[ \bar{\theta} = \frac{1}{n^2} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \theta(i, j), \tag{3} \]

where n is the block size and (i, j) is the pixel position. Finally, the average direction θ̄ is discretised as shown in Figure 2.

Figure 2. Discretisation of the gradient direction.

Therefore, any gradient direction falling within the yellow range (0 to 45 or 315 to 360 degrees) is set to 0 degrees. Any gradient direction falling in the blue range (45 to 135 degrees) is set to 90 degrees. Any gradient direction falling in the green range (135 to 225 degrees) is set to 180 degrees. Finally, any gradient direction falling within the violet range (225 to 315 degrees) is set to 270 degrees.

3.4 Tunnelling Search

Once the direction θ + π (the same direction as θ − π, modulo 2π) has been estimated, the algorithm starts a tunnel exploration (a straight path) in that direction until a peak is surpassed or until a given number of iterations has been reached. Then, a refinement process is carried out until there is no improvement on the candidate block.

Figure 3. TBS algorithm flowchart.
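The steps of Sections 3.3 and 3.4 can be sketched in Python as follows. This is only an illustrative reading of the text: the finite-difference gradient, the handling of range boundaries (e.g. exactly 45 degrees), the mapping of angles to pixel steps, and the rule used to decide that "a peak is surpassed" (the error rises and then falls again) are all assumptions, and every function name is hypothetical. The final refinement stage is omitted.

```python
import math


def average_gradient_direction(block):
    """Mean per-pixel gradient direction in degrees, as in Eqs. (2)-(3).
    Assumption: central differences inside the block, one-sided at borders.
    Requires a block of at least 2 x 2 pixels."""
    n = len(block)
    total = 0.0
    for j in range(n):
        for i in range(n):
            i0, i1 = max(i - 1, 0), min(i + 1, n - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, n - 1)
            gx = (block[j][i1] - block[j][i0]) / (i1 - i0)
            gy = (block[j1][i] - block[j0][i]) / (j1 - j0)
            total += math.degrees(math.atan2(gy, gx)) % 360.0
    return total / (n * n)


def discretise(theta):
    """Map an angle in degrees to 0, 90, 180 or 270 (ranges of Figure 2).
    Assumption: each boundary value belongs to the range that starts at it."""
    theta %= 360.0
    if theta < 45.0 or theta >= 315.0:
        return 0
    if theta < 135.0:
        return 90
    if theta < 225.0:
        return 180
    return 270


def guided_direction(block):
    """Discretised average gradient direction rotated by 180 degrees."""
    return (discretise(average_gradient_direction(block)) + 180) % 360


# Assumed mapping from a discretised direction to a one-pixel step (dx, dy).
STEP = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}


def tunnel(error_at, start, direction, max_steps):
    """Walk pixel by pixel from `start` along `direction`. Stop when the
    error has risen and then dropped again (peak behind us) or after
    `max_steps` steps. Returns the position where the walk stopped."""
    sx, sy = STEP[direction]
    x, y = start
    prev = error_at(x, y)
    climbed = False
    for _ in range(max_steps):
        x, y = x + sx, y + sy
        err = error_at(x, y)
        if err > prev:
            climbed = True
        elif climbed and err < prev:
            break
        prev = err
    return (x, y)


# Hypothetical 1-D error profile along x with a peak at x = 2.
profile = [5, 6, 8, 7, 3, 4]
new_centre = tunnel(lambda x, y: profile[x], (0, 0), 0, max_steps=5)
```

In this toy profile the walk climbs from 5 to 8, detects the drop to 7 and stops at x = 3; a local refinement started there could then reach the lower error at x = 4, which a purely descending search from x = 0 would not find.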
4 Experimental Results

The tunnelling search algorithm was implemented in the C++ programming language and compared to other well-known algorithms from the literature: full-search (FS), hexagon-based search (HEXBS), adaptive rood pattern search (ARPS), multi-directional gradient descent search (MDGDS), fast directional gradient descent search (FDGDS), and iterative random search (IRS).

The video sequences used to test the proposed approach were taken from the Xiph.Org Foundation [13]. In total, 9 of them were used (see Table 1 and Fig. 4).

Video Sequence    Size      No. of Frames   Motion
Akiyo             352x288   300             Low
Coastguard        352x288   300             High
Football          352x288   260             High
Foreman           352x288   300             Medium
Garden            352x240   115             Medium
Mobile            352x288   300             Medium
Mother daughter   352x288   300             Low
Silent            352x288   300             Low
Stefan            352x240   300             High

Table 1. Characteristics of the test sequences. Classification taken from [14].

Figure 4. Test sequences used in the study: (a) Akiyo, (b) Coastguard, (c) Football, (d) Foreman, (e) Garden, (f) Mobile, (g) Mother daughter, (h) Silent, and (i) Stefan.

The HEXBS algorithm is considered in the comparison because it is a well-known block-matching algorithm and it is used in H.264 encoders.

The experimental evaluation was made using different criteria. The sum of absolute differences (SAD) is used as BDM. Since SAD is computed for each block explored in the analysed frame, the complexity/efficiency of a BMA is proportional to (1) the number of explored blocks (EXB). (2) Peak signal-to-noise ratio (PSNR) is employed to evaluate the quality of prediction of a BMA. These results were obtained by comparing the original frame with the frame reconstructed through the motion vectors produced by the BMA (motion compensation process).

Table 2 shows the results of the comparison based on reconstructing the frames of the complete selected video sequences, using a block size of 16x16 and a search window of 33x33.

Figure 5. EXB performance of TBS compared to HEXBS, ARPS, MDGDS, and FDGDS, on the complete Akiyo sequence.

The ARPS algorithm explores the lowest number of blocks, but it incurs a considerable degradation of PSNR when compared against the other algorithms. In video sequences with a low amount of movement, TBS is the most time-consuming of the algorithms analysed (see Fig. 5). However, when it is applied to video sequences with a medium or high amount of movement, the EXB of TBS is lower in most cases than that of MDGDS, FDGDS and IRS (see Fig. 6).

The PSNR degradation of TBS amounts to 7.83% [0.202, 2.775] dB when compared against full-search. The quality of prediction of TBS is not far from that of MDGDS (98%; see Fig. 8), and in most cases it is higher than that of HEXBS, ARPS and IRS, as shown in Fig. 7. This is a reasonable degradation taking into account that the EXB of TBS is on average 6 points (25.01%) lower than the EXB analysed by MDGDS. The lower PSNR values obtained by TBS correspond to video sequences with a high or medium amount of movement: Coastguard, Football and Foreman.

The performance of the IRS algorithm is lower than that of the other algorithms compared. Not only is the number of explored blocks relatively higher, but the quality of prediction is also lower.
It is important to consider that this algorithm evaluates more search points than usual, since it performs five refinement processes for each block searched.

Figure 6. EXB performance of TBS compared to HEXBS, ARPS, MDGDS, and FDGDS, on the complete Football sequence.

5 Conclusions

In this paper, a simple block-matching algorithm called tunnelling-based search (TBS) has been proposed. The algorithm searches in another promising region in order to avoid falling into local minimum error points. This is done by exploring a neighbourhood located at π radians with respect to the gradient direction. TBS presents a considerable reduction (25.01%) in search points when compared against the analysed algorithm with the best quality of prediction (MDGDS), without degrading that quality too much (1.642 dB).

The TBS algorithm can be implemented in parallel, since the analysed neighbourhoods are independent of each other. Thereby, the search can be extended to more than two neighbourhoods.
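The 1.642 dB figure quoted above can be checked against the PSNR rows of Table 2. The short Python fragment below is a worked calculation under the assumption that the figure is the mean of the per-sequence PSNR differences between MDGDS and TBS; the variable names are illustrative.

```python
# PSNR values (dB) copied from the MDGDS and TBS columns of Table 2.
psnr_mdgds = {
    "Akiyo": 42.940, "Coastguard": 30.371, "Football": 24.515,
    "Foreman": 31.392, "Garden": 23.579, "Mobile": 24.521,
    "Mother daughter": 40.373, "Silent": 35.421, "Stefan": 22.216,
}
psnr_tbs = {
    "Akiyo": 42.738, "Coastguard": 27.596, "Football": 21.971,
    "Foreman": 29.094, "Garden": 22.846, "Mobile": 22.335,
    "Mother daughter": 39.605, "Silent": 34.320, "Stefan": 20.040,
}

# Per-sequence PSNR gap between MDGDS and TBS.
gaps = {name: psnr_mdgds[name] - psnr_tbs[name] for name in psnr_tbs}

mean_gap = sum(gaps.values()) / len(gaps)
min_gap = min(gaps.values())
max_gap = max(gaps.values())

print(f"mean gap {mean_gap:.4f} dB, range [{min_gap:.3f}, {max_gap:.3f}] dB")
```

Under that assumption the mean gap is about 1.64 dB, and the smallest and largest gaps (Akiyo and Coastguard) are 0.202 dB and 2.775 dB.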
Sequence          Measure   FS        TBS      HEXBS    ARPS     MDGDS    FDGDS    IRS
Akiyo             PSNR      42.944    42.738   42.586   41.853   42.940   42.939   40.836
                  EXB       984.919   10.479   10.313    1.689    8.563    8.518   30.514
Coastguard        PSNR      30.477    27.596   30.341   29.085   30.371   30.369   24.249
                  EXB       984.919   12.292   10.350    8.279   14.125   12.218   40.811
Football          PSNR      25.673    21.971   21.151   22.345   24.515   24.456   19.161
                  EXB       984.919   15.526   10.357   11.910   28.560   25.769   39.057
Foreman           PSNR      32.119    29.094   29.305   28.820   31.392   31.305   25.776
                  EXB       984.919   12.952   10.336    9.580   19.271   16.941   34.833
Garden            PSNR      23.794    22.846   22.653   18.931   23.579   23.547   15.189
                  EXB       973.703   13.587   10.294    8.097   15.249   12.180   41.865
Mobile            PSNR      24.588    22.335   24.243   21.102   24.521   24.519   20.454
                  EXB       984.919    9.695   10.318    8.552   11.231   11.065   30.016
Mother daughter   PSNR      40.473    39.605   39.854   38.993   40.373   40.365   37.540
                  EXB       984.919   10.890   10.332    2.871   10.994   10.826   31.745
Silent            PSNR      35.973    34.320   33.942   33.972   35.421   35.395   31.458
                  EXB       984.919   10.166   10.314    3.467   10.499   10.119   32.573
Stefan            PSNR      24.104    20.040   20.868   20.285   22.216   22.212   17.978
                  EXB       973.703   11.948   10.275    7.893   16.671   15.438   37.194

Table 2. Performance and quality of prediction comparison of TBS with HEXBS, ARPS, MDGDS, FDGDS, and IRS. PSNR values are in dB.

Figure 7. PSNR performance of TBS compared to HEXBS, ARPS, MDGDS, and FDGDS, on the complete Akiyo sequence.

Figure 8. PSNR performance of TBS compared to HEXBS, ARPS, MDGDS, and FDGDS, on the complete Football sequence.

TBS has been shown to offer a good trade-off between quality of prediction and number of explored blocks. The considered direction seems to be a good direction for searching for a best matched block. As future work, it is planned to propose a strategy for better determining the length of the tunnel search.

References

[1] Y. Luo, "Fast adaptive block based motion estimation for video compression," Ph.D. dissertation, Ohio University, 2009.
[2] J. Huska and P. Kulla, "Trends in block-matching motion estimation algorithms," in 6th International Scientific Conference Radioelektronika 200, 2004, pp. 161–164.
[3] L.-M. Po, K.-H. Ng, K.-M. Wong, and K.-W. Cheung, "Motion compensated interframe coding for video conferencing," in Proceedings on National Telecommunications Conference (NTC81), Nov. 1981, pp. G5.3.1–G5.3.3.
[4] L.-M. Po and W.-C. Ma, "A novel four-step search algorithm for fast block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 313–317, 1996.
[5] J. Y. Tham, S. Ranganath, M. Ranganath, and A. Kassim, "A novel unrestricted center-biased diamond search algorithm for block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 4, pp. 369–377, 1998.
[6] C. Zhu, X. Lin, L.-P. Chau, K.-P. Lim, H.-A. Ang, and C.-Y. Ong, "A novel hexagon-based search algorithm for fast block motion estimation," in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 3, 2001, pp. 1593–1596.
[7] C.-H. Cheung and L.-M. Po, "Novel cross-diamond-hexagonal search algorithms for fast block motion estimation," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 16–22, 2005.
[8] Y. Nie and K.-K. Ma, "Adaptive rood pattern search for fast block-matching motion estimation," IEEE Transactions on Image Processing, vol. 11, no. 12, pp. 1442–1449, 2002.
[9] J. Ylipaavalniemi, "Variability of independent components in functional magnetic resonance imaging," Master's thesis, Helsinki University of Technology, 2005.
[10] L.-M. Po, K.-H. Ng, K.-M. Wong, and K.-W. Cheung, "Multi-direction search algorithm for block-based motion estimation," in IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2008, pp. 1466–1469.
[11] L.-M. Po, K.-H. Ng, K.-W. Cheung, K.-M. Wong, Y. M. S. Uddin, and C.-W. Ting, "Novel directional gradient descent searches for fast block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 8, pp. 1189–1195, 2009.
[12] M. Porto, C. Cristani, P. Dall'Oglio, M. Grellert, J. Mattos, S. Bampi, and L. Agostini, "Iterative random search: a new local minima resistant algorithm for motion estimation in high-definition videos," Multimedia Tools and Applications, pp. 107–127, 2013.
[13] Xiph.org video test media [derf's collection]. [Online]. Available: http://media.xiph.org/video/derf/
[14] V. Padilla, "Algoritmos de block-matching para compresión de video," Final Career Project, Systems Engineering Program, Universidad del Valle, 2009.