Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

th International Conference on Advanced Computing and Communications

Avishek Saha
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, WB 721302, India
avishek@cse.iitkgp.ernet.in

Abstract

In this paper, we present some results on the use of N-queen sub-sampling lattices for motion estimation in H.264. For MPEG-4, N-queen has been shown to give better results than other existing pixel decimation lattices in terms of spatial homogeneity and directional coverage. We aim to develop a generalized algorithm to select an M-length pattern from an N×N block such that the selected pattern is optimal with respect to these two metrics. In the process, we observe and present a few interesting pixel decimation patterns that might be useful for motion estimation.

1 Introduction

Motion estimation is an integral part of well-known video compression standards such as MPEG-1/MPEG-2, H.261, H.263 and H.264. It achieves compression by exploiting the temporal redundancy in successive frames of a video sequence. Limited processing power, battery life and memory capacity call for encoders of reduced complexity. Motion estimation, being the most computationally intensive module, is an ideal candidate for optimization. There are six categories [] of techniques to improve motion estimation, namely, (a) reduction in search positions [16, ], (b) simplification of the matching criterion [14], (c) bitwidth reduction [], (d) predictive search [4], (e) hierarchical search [9] and (f) fast full search [3]. The pixel decimation technique can be easily combined with most of these approaches.
According to the N-queen pixel decimation approach [], the spatial information of an N×N block can be fully represented by the least number of pixels only when at least one pixel is selected from each row, column and diagonal. The N-queen sampling lattice has been analyzed in terms of spatial homogeneity and directional coverage. Spatial homogeneity is measured by the average ($\mu_d$) and variance ($\sigma_d^2$) of the spatial distances from each skipped pixel to its nearest selected pixel, as given below:

$$\mu_d = \frac{1}{N^2 - K} \sum_{x=1,\,y=1}^{N} \lVert (x, y) - S(x, y) \rVert \tag{1}$$

$$\sigma_d^2 = \frac{1}{N^2 - K} \sum_{x=1,\,y=1}^{N} \left( \lVert (x, y) - S(x, y) \rVert - \mu_d \right)^2 \tag{2}$$

where N is the size of the block, S(x, y) is the location of the selected pixel nearest to the pixel at location (x, y) and K is the number of selected pixels. The lower the values of $\mu_d$ and $\sigma_d^2$, the greater the spatial homogeneity of the sampling lattice. Directional coverage is given by the ratio of the number of edges that contain at least one selected pixel to the total number of edges. Edges are lines passing through the N×N block in any of the 0°, 45°, 90° and 135° directions. It has been shown [13] that the N-queen pattern is better than the Quarter [2], Hexagonal [5], Quincunx [10] and Yu's [] patterns in terms of spatial homogeneity and directional coverage.

Previous work on N-queen based pixel decimation involved the selection of two patterns, namely 4-queen and 8-queen, and their implementation on the MPEG-4 reference software. The patterns were also analyzed in terms of the aforementioned criteria. However, no explanation was provided as to why these patterns perform better in terms of those criteria. The question of whether an optimal pattern exists was also left unanswered. In this work, we explore the existence of a pattern that is optimal in terms of the performance criteria used to evaluate the N-queen patterns. The salient contributions of our work are:

1. suitability study of N-queen patterns on H.264,
2. search for optimal patterns in 8×8 and 16×16 blocks.
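As an illustration of Eqs. (1) and (2) and of the directional coverage measure, the following Python sketch evaluates a sampling lattice given as a list of selected pixel coordinates. It is only a sketch under stated assumptions: the distance is taken to be Euclidean (the text says only "spatial distance"), the assignment of the two diagonal families to 45° and 135° depends on the axis convention, and all function names are hypothetical.

```python
import math
from itertools import product

# One 4-queen solution on a 4 x 4 block, given as (x, y) coordinates.
FOUR_QUEEN = [(0, 1), (1, 3), (2, 0), (3, 2)]


def tile(pattern, size, reps):
    """Repeat a size x size pattern reps x reps times to fill a larger block."""
    return [(x + i * size, y + j * size)
            for i in range(reps) for j in range(reps) for x, y in pattern]


def spatial_homogeneity(selected, n):
    """Return (mu_d, sigma_d^2) over the skipped pixels of an n x n block.

    Assumption: Euclidean distance to the nearest selected pixel.
    """
    chosen = set(selected)
    dists = [
        min(math.dist(p, s) for s in chosen)
        for p in product(range(n), repeat=2)
        if p not in chosen
    ]
    if not dists:  # every pixel is selected, so nothing is skipped
        return 0.0, 0.0
    mu = sum(dists) / len(dists)            # Eq. (1): divide by N^2 - K
    var = sum((d - mu) ** 2 for d in dists) / len(dists)  # Eq. (2)
    return mu, var


def directional_coverage(selected, n):
    """Return (covered, total) line counts for each of the four directions."""
    rows = {y for _, y in selected}
    cols = {x for x, _ in selected}
    diag_a = {x + y for x, y in selected}   # one diagonal family
    diag_b = {x - y for x, y in selected}   # the other diagonal family
    return {
        "0": (len(rows), n),
        "90": (len(cols), n),
        "45": (len(diag_a), 2 * n - 1),     # labelling is an assumption
        "135": (len(diag_b), 2 * n - 1),
    }


pattern = tile(FOUR_QUEEN, 4, 2)            # 16 selected pixels in an 8 x 8 block
mu_d, var_d = spatial_homogeneity(pattern, 8)
coverage = directional_coverage(pattern, 8)
```

For this tiled 4-queen lattice every skipped pixel has a selected horizontal or vertical neighbour, so the sketch yields a mean distance of 1 and a variance of 0, and all 8 rows and 8 columns are covered.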

The paper is organized as follows. Section 2 provides some insight into our optimal N-length pattern search. It also discusses the use of Genetic Algorithms in the search for optimal pixel decimation patterns. Experimental results of the GA-based search and the performance of the obtained GA-based sampling lattices on standard test sequences are presented in Section 3. Finally, Section 4 concludes this paper and provides future directions.

2 Optimal N-Length Pattern Selection

Let us denote the number of pixels to be selected by M and the size of the block by N×N. The selection of an M-length pattern from an N×N block can be mapped onto the simpler problem of selecting M positions from the available set of N×N = N² positions, such that no position occurs more than once in the M-length sequence. We intend to select an optimal M-length pattern from the $\binom{N^2}{M}$ possible patterns. For N = 8, M ≥ 8 and N = 16, M ≥ 32, this computation becomes prohibitively large. Hence, we use a genetic algorithm (GA) to search for the optimal M-length patterns in an N×N block. A genetic algorithm [6] has five components:

1. Encoding: We encode the chromosome as a sequence of M numbers $p_i$, where 0 ≤ i ≤ (M − 1), such that no $p_i$ is repeated in any given chromosome and every $p_i$ lies in the range 0 ≤ $p_i$ ≤ (N² − 1). The chromosome length M is an input parameter. A sample chromosome for the pattern length M = 8 is represented as: 9 26 3 4 51 54. The corresponding pixel decimation pattern is shown in Fig. 2b. The numbers have been assigned to the smaller cells in a row-major fashion.

2. Initial Population: We use both random and improved initial populations.

3. Fitness Function: The best M-length sampling lattice preserves the maximum amount of texture and edge information. This information can be represented by the parameters of spatial homogeneity [Eqs. (1) and (2)] and directional coverage. Smaller mean and variance indicate a more spatially homogeneous [] sampling lattice. By definition, a lower value of $\mu_d$ automatically lowers the values of $\sigma_d^2$ and $\sigma_d/\mu_d$. So, we use only $\mu_d$ as the fitness function.

4. Genetic Operators: The traditional genetic operators are selection, crossover and mutation. In this work, we adopt the concept of an elitist model [6]. The offspring are randomly selected for crossover. Crossover is performed only within the mating pool of the previous generation. In our application, mutation is applied with a small probability to the offspring produced by crossover.

5. Other Parameters: Parameters such as population size, probability of applying genetic operators and number of elites have been determined by experimentation.

For an 8×8 block, we try to find the best patterns of length M, where 8 ≤ M ≤ 16. The minimum value of M is taken to be 8 because any value of M lower than 8 will result in an increase in the value of $\mu_d$. The maximum value of M is taken to be 16 because, for an 8×8 block, $\mu_d$ reaches its lowest value at M = 16, as discussed in the following lemma.

Lemma 2.1 For a block of size 8×8, $\mu_d$ achieves its lowest possible value of 1 when the number of selected pixels is greater than or equal to 16.

Proof. Let us consider the patterns in Fig. 1. It can be seen that the lowest possible distance between every skipped pixel and its nearest selected pixel is 1. The given patterns use 16 pixels and have a $\mu_d$ value of 1. For sequences of length greater than 16, the newly selected pixel has to be placed in one of the unshaded locations. But this new pixel can in no way lower the current value of $\mu_d$ (= 1). Thus, investigating $\mu_d$ values for sequence lengths greater than 16 is not necessary.

Figure 1. 8×8 pixel decimation patterns. (a) Pattern obtained by tiling four 4-queen patterns. (b) Pattern obtained by GA-based search.

For 16×16 blocks, we investigate patterns whose lengths are 4 times those considered for 8×8 blocks. This facilitates comparison of the results for 16×16 blocks with those for 8×8 blocks.

3 Experimental Results

In this section, we provide the results of our GA-based search. We also present the performance of the obtained pixel patterns in the encoding of standard test sequences.
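Before turning to the results, the GA of Section 2 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the crossover operator (sampling M distinct genes from the union of two parents), the Euclidean distance, the parameter defaults and every function name are assumptions, since the text fixes only the encoding, the $\mu_d$ fitness, the elitist model and a small mutation probability.

```python
import math
import random


def search_space(n, m):
    """Number of distinct m-pixel patterns in an n x n block: C(n^2, m)."""
    return math.comb(n * n, m)


def decode(chromosome, n):
    """Map row-major cell numbers to (row, col) coordinates."""
    return [divmod(p, n) for p in chromosome]


def mean_distance(chromosome, n):
    """Fitness mu_d: mean distance from each skipped pixel to its nearest
    selected pixel (Euclidean distance is an assumption)."""
    chosen = set(decode(chromosome, n))
    dists = [min(math.dist((r, c), s) for s in chosen)
             for r in range(n) for c in range(n) if (r, c) not in chosen]
    return sum(dists) / len(dists) if dists else 0.0


def crossover(a, b, m, rng):
    """Assumed operator: draw m distinct genes from the union of both parents,
    which keeps every gene unique within the child."""
    return rng.sample(sorted(set(a) | set(b)), m)


def mutate(chromosome, n, rate, rng):
    """With probability `rate`, move one selected pixel to an unselected cell."""
    child = list(chromosome)
    free = [p for p in range(n * n) if p not in child]
    if free and rng.random() < rate:
        child[rng.randrange(len(child))] = rng.choice(free)
    return child


def ga_search(n=8, m=8, pop_size=30, elites=4, generations=40, rate=0.1, seed=0):
    """Elitist GA: the best `elites` chromosomes survive unchanged, the rest of
    the next generation comes from crossover within the previous generation."""
    rng = random.Random(seed)
    population = [rng.sample(range(n * n), m) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: mean_distance(c, n))
        next_gen = [list(c) for c in population[:elites]]
        while len(next_gen) < pop_size:
            a, b = rng.sample(population, 2)   # parents from previous generation
            next_gen.append(mutate(crossover(a, b, m, rng), n, rate, rng))
        population = next_gen
    return min(population, key=lambda c: mean_distance(c, n))


best = ga_search(n=8, m=8, pop_size=10, elites=2, generations=3)
```

Here `search_space(8, 8)` evaluates to 4426165368, which gives a sense of why exhaustive enumeration is avoided even for the smallest case, and `decode([9, 26], 8)` maps the first two genes of the sample chromosome to cells (1, 1) and (3, 2) under zero-based row-major numbering.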

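The performance experiments in Section 3.2 apply each decimation pattern inside a SAD-based full search. A minimal Python sketch of that idea is given here: the SAD is accumulated only over the pixels of the sampling lattice. The frame representation (lists of rows of luma values), the function names and the tie-breaking rule are hypothetical, and the speed-up formula N²/M is an assumption that simply counts pixels per SAD evaluation; it is consistent with the SUF of 4 listed for the 64-pixel lattices on 16×16 blocks.

```python
def decimated_sad(cur, ref, pattern, bx, by, dx, dy):
    """SAD between the block at (bx, by) in `cur` and the block displaced by
    (dx, dy) in `ref`, summed only over the (row, col) offsets in `pattern`."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r, c in pattern)


def full_search(cur, ref, pattern, bx, by, n, search_range):
    """Test every displacement within +/- search_range that keeps the n x n
    candidate block inside `ref`; return (cost, dx, dy) of the first minimum."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue  # candidate block falls outside the reference frame
            cost = decimated_sad(cur, ref, pattern, bx, by, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def speedup_factor(n, m):
    """Assumed SUF: pixels in the full n x n block divided by pixels sampled."""
    return (n * n) / m
```

Because only M of the N² pixels are visited per candidate, the cost of each SAD evaluation drops in proportion to M, while the number of candidate positions examined by the full search is unchanged.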
Table 1. Results of GA-based Search.

Columns: Patt Len; Spatial Homogeneity ($\mu_d$, $\sigma_d^2$, $\sigma_d/\mu_d$); Directional Coverage (0°, 90°, 45°, 135°)

Block Size 8×8
Q 1.320 0.14 2.% 1.234 0.04 23.4% 4 9 1.166 0.041 1.40% 6 10 1.130 0.03 1.01% 6 1.109 0.033 16.45% 1.064 0.022 14.04% 13 1.049 0.01.2% 14 1.033 0.013 10.% 1.01 0.006.05% 16Q 1 0 0 16 1 0 0

Block Size 16×16
4 6 6 6 13 13 13 32Q 1.305 0.135 2.14% 16 16 20 32 1.216 0.09 23.% 13 14 20 36 1.6 0.040 1.36% 13 22 40 1.3 0.036 16.4% 13 20 44 1.094 0.030.4% 16 25 4 1.05 0.021 13.56% 16 16 2 52 1.045 0.01.29% 16 16 2 56 1.029 0.0 10.2% 16 16 2 60 1.013 0.005 0.04% 16 16 29 16 16 22 64Q 1 0 0 16 16 29 64 1 0 0
20 23 22 1 22 2 2 26 2 22 29

Q denotes a pattern obtained using the N-queen technique.
Patt Len denotes the length of the decimation pattern.

3.1 GA-based Search Results

Table 1 shows the results of our GA-based search with different pattern lengths M, on block sizes of 8×8 and 16×16. For block size 8×8, there exist patterns with a lower $\mu_d$ than the 8-queen pattern. This can be seen for M = 8, where the GA-based pattern attains $\mu_d$ = 1.234 against $\mu_d$ = 1.32 for the 8-queen pattern (Fig. 2).

Figure 2. Sampling lattices for (a) 8-queen based N = 8, M = 8 ($\mu_d$ = 1.32), (b) improved GA-based for N = 8, M = 8 ($\mu_d$ = 1.234), (c) improved GA-based for N = 16, M = 32 ($\mu_d$ = 1.216).

3.2 Performance of GA-based sampling lattices

Our experiments were performed on a typical motion estimation software (MEPackage) without any encoding, and on the H.264 JM 10.2 reference software [1] (see footnote 1). The distortion metric used was the sum of absolute differences (SAD). We carried out our experiments on various M-length sampling lattices, where 8 ≤ M ≤ 16 for N = 8 and 32 ≤ M ≤ 64 for N = 16. When M ≤ 16, the sampling lattices for 16×16 macroblocks were constructed by tiling 4 smaller 8×8 sampling lattices. These pixel decimation patterns have been implemented with the full search strategy.

Footnote 1. Encoder parameter configuration: High profile Level 3.3, Period of I-Frames = 10, Quantization parameter for I and P Slices (0-51) = 2, No frames skipped, Subpixel motion estimation disabled, Number of previous references frame = 2, Only InterSearch16x16 enabled, No B-frame used, SP-Picture Periodicity disabled, Entropy coding method = CABAC, RD-optimized mode decision = 1, Initial QP for rate control = 24.

Tables 2 and 3 present the results on MEPackage and H.264 for the slow-motion sequence Container and the fast-motion sequence Foreman. The column P represents the PSNR value. The column ΔP denotes the fall in PSNR value for a particular method with respect to the Full Sampling (FS) lattice. For the 8×8 case, FS 4x8, FS 4x9, etc., denote that the 16×16 sampling lattices were constructed by tiling 4 smaller 8×8 sampling lattices with M = 8, M = 9, and so on. For the 16×16 case, FS 32, FS 36, etc., denote that the 16×16 sampling lattices were constructed by selecting 32, 36, etc., pixels, respectively. For all cases, FS 16x4Q and FS 4x8Q denote that the 16×16 sampling lattices were obtained by tiling sixteen 4-queen patterns and four 8-queen patterns, respectively. The column SUF denotes the SpeedUp Factor (SUF) obtained by using our pixel decimation patterns over the Full Sampling (FS) lattice.

Table 2. Performance of GA-based sampling lattices on MEPackage [P: PSNR_Y, ΔP: ΔPSNR_Y, SUF: SpeedUp Factor]

8×8 sampling lattices                    16×16 sampling lattices
Method      P           ΔP               Method      P           ΔP          SUF

Container, QCIF
FS          43.0925     -                FS          43.0925     -           1
FS 16x4Q    43.042      0.00003          FS 16x4Q    43.042      0.00003     4
FS 4x16     43.0241     0.0093           FS 64       43.0241     0.0093      4
FS 4x15     43.0545     0.0166           FS 60       43.03405    0.005       4.2
FS 4x14     43.03666    0.0149           FS 56       43.06359    0.096       4.5
FS 4x13     43.0600     0.035            FS 52       43.060005   0.032       4.92
FS 4x12     43.0516     0.0349           FS 48       43.05442    0.03413     5.33
FS 4x11     43.060326   0.029            FS 44       43.02242    0.06963     5.1
FS 4x10     43.0336     0.05544          FS 40       43.0464     0.044391    6.4
FS 4x9      43.0465     0.04539          FS 36       43.040      0.0454      .
FS 4x8Q     43.003960   0.0195           FS 4x8Q     43.003960   0.0195
FS 4x8      43.035000   0.055            FS 32       42.994339   0.0916

Foreman, QCIF
FS          .42520      -                FS          .42520      -           1
FS 16x4Q    .6215       0.00345          FS 16x4Q    .6215       0.00345     4
FS 4x16     .915        0.063345         FS 64       .915        0.063345    4
FS 4x15     .4          0.1006           FS 60       .564        0.06033     4.2
FS 4x14     .6534       0.396            FS 56       .256        0.64        4.5
FS 4x13     .6093       0.442            FS 52       .655        0.6645      4.92
FS 4x12     .693256     0.149264         FS 48       .6992       0.16254     5.33
FS 4x11     .534        0.2646           FS 44       .69060      0.346       5.1
FS 4x10     .65693      0.1553           FS 40       .64260      0.19426     6.4
FS 4x9      .5341       0.30369          FS 36       .549410     0.291       .
FS 4x8Q     .56401      0.249            FS 4x8Q     .56401      0.249
FS 4x8      .590530     0.25199          FS 32       .52001      0.322502

From Tables 1, 2 and 3, we make the following observations:

1. It can be seen from Table 1 that the M = 8 pattern obtained in the GA-based search has a lower $\mu_d$ and $\sigma_d^2$ than the 8-queen pattern for both N = 8 and N = 16. However, Table 2 shows that the 8-queen pattern gives better results in terms of PSNR than the GA-based M = 8 pattern for N = 16 but not for N = 8. In Table 3, the 8-queen pattern always gives better results than the GA-based M = 8 pattern.

2. In Table 1, the $\mu_d$ and $\sigma_d^2$ values are identical for the 4-queen based 16-pixel pattern (16Q) and the GA-based M = 16 pattern (16). In addition, the directional coverage of the GA-based M = 16 pattern (16) is better than that of the 4-queen based 16-pixel pattern (16Q).
However, it can be seen in Tables 2 and 3 that the PSNR results obtained for the two cases are not identical, with the 4-queen based FS 16x4Q pattern giving better results in all cases for both N = 8 and N = 16.

3. Again, the $\mu_d$ and $\sigma_d^2$ values of the M-length patterns (32 ≤ M ≤ 60) for N = 16 are lower than those of their corresponding N = 8 patterns. However, in most cases, the PSNR values for N = 8 are comparable to, and at times even much better than, those for N = 16.

The better performance of the N-queen patterns proposed in [] was rationalized only in terms of spatial homogeneity and directional coverage. However, our explorations and analysis show that there exist even better patterns in terms of the prescribed criteria of spatial homogeneity and directional coverage, which do not always lead to better performance in terms of PSNR. Hence, the metrics of spatial homogeneity and directional coverage cannot give us complete information about the optimality of a sampling lattice.

Table 3. Performance of GA-based sampling lattices on H.264 [P: PSNR_Y, ΔP: ΔPSNR_Y, SUF: SpeedUp Factor]

8×8 sampling lattices                    16×16 sampling lattices
Method      P           ΔP               Method      P           ΔP          SUF

Input: Container, QCIF, ± 16, .5 Hz, 10 kbps
FS          30.9        -                FS          30.9        -           1
FS 16x4Q    30.59       0.39             FS 16x4Q    30.59       0.39        4
FS 4x16     30.55       0.43             FS 64       30.55       0.43        4
FS 4x15     30.53       0.45             FS 60       30.49       0.49        4.2
FS 4x14     30.43       0.55             FS 56       30.55       0.43        4.5
FS 4x13     30.44       0.54             FS 52       30.44       0.54        4.92
FS 4x12     30.43       0.55             FS 48       30.43       0.55        5.33
FS 4x11     30.42       0.56             FS 44       30.3        0.61        5.1
FS 4x10     30.35       0.63             FS 40       30.2        0.1         6.4
FS 4x9      30.24       0.4              FS 36       30.2        0.1         .
FS 4x8Q     30.21       0.               FS 4x8Q     30.21       0.
FS 4x8      30.         0.3              FS 32       30.1        0.1

Input: Foreman, QCIF, ± 16, 10 Hz, 2 kbps
FS          32.93       -                FS          32.93       -           1
FS 16x4Q    32.49       0.44             FS 16x4Q    32.49       0.44        4
FS 4x16     32.49       0.44             FS 64       32.49       0.44        4
FS 4x15     32.45       0.4              FS 60       32.45       0.4         4.2
FS 4x14     32.4        0.53             FS 56       32.41       0.52        4.5
FS 4x13     32.36       0.5              FS 52       32.35       0.5         4.92
FS 4x12     32.29       0.64             FS 48       32.29       0.64        5.33
FS 4x11     32.21       0.2              FS 44       32.23       0.          5.1
FS 4x10     32.         0.1              FS 40       32.10       0.3         6.4
FS 4x9      32.01       0.92             FS 36       .99         0.94        .
FS 4x8Q     .           1.05             FS 4x8Q     .           1.05
FS 4x8      .6          1.0              FS 32       .           1.06

4 Conclusions and Future Work

In this paper, we have presented an in-depth exploration of various M-length pixel patterns within an N×N block. Our aim was to develop a generalized algorithm for selecting an optimal M-length pixel decimation pattern from an N×N block. In the process, we have shown that patterns with better spatial homogeneity and directional coverage than the N-queen patterns exist, but that these patterns do not always lead to better performance in terms of reconstructed video quality. Thus, it may be inferred that, in addition to the aforementioned metrics, there may exist other criteria which need to be considered for a better and more accurate estimate of sampling lattice quality. Future work lies in finding an optimal criterion and developing an algorithm to find an optimal M-length pattern for any given block dimensions.

Acknowledgment

This work has been supported by a research grant from the Department of Science and Technology (DST), Govt. of India, under Research Grant No. SR/S3/EECE/024/2003.

References

[1] JVT Model JM10.2.
http://iphome.hhi.de/suehring/tml/.
[2] M. Bierling. Displacement estimation by hierarchical block matching. Proc. SPIE Conf. Visual Comm. Pro., 1001:942-951, 19.
[3] M. Brunig and W. Niehsen. Fast full-search block-matching. IEEE Trans. on CSVT, (2):241-24, 2001.
[4] J. Chalidabhongse and C. Kuo. Fast motion vector estimation using multiresolution-spatio-temporal correlations. IEEE Trans. on CSVT, (3):4-4, 199.
[5] K. Choi, S. Chan, and T. Ng. A new fast motion estimation algorithm using hexagonal subsampling pattern and multiple candidate search. In Proc. IEEE ICIP, pages 49-500, 1996.
[6] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, third Indian reprint edition, 2000.
[7] Y. Huang, S. Ma, C. Shen, and L. Chen. Predictive line search: An efficient motion estimation algorithm for MPEG-4 encoding systems on multimedia processors. IEEE Trans. on CSVT, 13(1):1, 2003.
[8] Y. W. Huang, C. Y. Chen, C. H. Tsai, C. F. Shen, and L. G. Chen. Survey on block matching motion estimation algorithms and architectures with new results. Jrnl. of VLSI Sig. Pro., 42(3):29-320, March 2006.
[9] J. Lee and N. Lee. Variable block size motion estimation algorithm and its hardware architecture for H.264. In Proc. of IEEE Int. Symp. Circuits Syst. (ISCAS), pages 40-43, 2004.
[10] K. Lengwehasatit and A. Ortega. Probabilistic partial-distance fast matching algorithms for motion estimation. IEEE Trans. on CSVT, :139-2, February 2001.
[11] J. Luo, C. Wang, and T. Chiang. A novel all-binary motion estimation (ABME) with optimized hardware architectures. IEEE Trans. on CSVT, ():00, 2002.
[12] C. N. Wang, S. W. Yang, C. M. Liu, and T. Chiang. A hierarchical decimation lattice based on N-queen with an application for motion estimation. IEEE Sig. Pro. Lett., 10():22-2, Aug 2003.
[13] C. N. Wang, S. W. Yang, C. M. Liu, and T. Chiang. A hierarchical N-queen decimation lattice and hardware architecture for motion estimation. IEEE Trans. on CSVT, 14(4):429-440, April 2004.
[14] Y. Wang, Y. Wang, and H. Kuroda. A globally adaptive pixel decimation algorithm for block-motion estimation. IEEE Trans. on CSVT, 10(6):1006-10, 2000.
[15] Y. Yu, J. Zhou, and C. W. Chen. A novel fast block motion estimation algorithm based on combined subsamplings on pixels and search candidates. Jrnl. of Vis. Comm. and Image Repr., :96-105, 2001.
[16] S. Zhu and K. Ma. A new diamond search algorithm for fast block-matching motion estimation. IEEE Trans. on Image Pro., 9(2):2-290, 2000.