Fast Template Matching Using Correlation-based Adaptive Predictive Search


Shijun Sun, HyunWook Park, David R. Haynor, Yongmin Kim
Image Computing Systems Laboratory, Departments of Electrical Engineering, Bioengineering, and Radiology, University of Washington, Seattle, WA

Received 30 April 2003; accepted 10 June 2003

ABSTRACT: We have developed Correlation-based Adaptive Predictive Search (CAPS) as a fast search strategy for multidimensional template matching. A 2D template is analyzed, and certain characteristics are computed from its autocorrelation. The extracted information is then used to speed up the search procedure. This method provides a significant improvement in computation time while retaining the accuracy of traditional full-search matching. We have extended CAPS to three and higher dimensions. One example of a third dimension is rotation, where rotated targets can be located while again substantially reducing the computational requirements. CAPS can also be applied in multiple steps to further speed up the template matching process. Experiments were conducted to evaluate the performance of the 2D, 3D, and multiple-step CAPS algorithms. Compared to the conventional full-search method, we achieved speedup ratios of up to 66.5 and 145 with 2D and 3D CAPS, respectively. © 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 13, 2003; published online in Wiley InterScience.

Key words: Correlation-based Adaptive Predictive Search (CAPS); template matching; correlation coefficient; step sizes; 3D CAPS; multiple-step CAPS

I. INTRODUCTION

Template matching is the process of locating the position(s) at which a specified template occurs inside an image or search area of a larger size.
The matching process involves (1) moving the template within the search area, (2) computing, at each template location, the similarity between the template and the image area over which the template is positioned, and (3) determining the positions where a good similarity measure is obtained. The correlation between the template and the image window has been used as a measure of similarity in template matching and image registration since the 1970s (Rosenfeld, 1969; Pratt, 1974). However, computing the correlation coefficient is extremely expensive, so this approach has seldom been used in practice. Instead, the Sum of Absolute Differences (SAD) or other similar measures are used to reduce the computational burden (Jain, 1981; ISO, 2001a). The advantages of the correlation coefficient approach are its reliability and accuracy: the correlation coefficient is independent of any offset or linear transform in the data sets.

During the past decades, several approaches have been developed to achieve faster template matching (Jain, 1981; Rosenfeld, 1977; Goshtasby, 1984; Nagel, 1972; Barnes, 1972). These approaches can be categorized into two classes. Coarse-fine matching (Rosenfeld, 1977), or the similar two-stage matching proposed by other researchers (Goshtasby, 1984), subsamples the template and matches it to the subsampled image first. Wherever the subsampled correlation coefficient exceeds a predetermined threshold, the full-resolution template and image are used to calculate the correlation coefficient. This method speeds up the matching process but carries a small false-dismissal probability. The other class of fast search algorithms is the three-step search (Jain, 1981), which is widely used in motion estimation for digital video compression and processing.

Correspondence to: Y. Kim, Departments of Electrical Engineering and Bioengineering, University of Washington, Seattle, WA; ykim@u.washington.edu
In the first search step, a step size of 4 pixels is used. Once an optimal point is found, the step size is reduced to 2 pixels to evaluate the neighborhood of this previously determined optimal point and choose the next search point. In the third step, all the neighboring points of the second search point are evaluated to find the final best-matched point. Certainly, these fast search methods speed up the search process, but mismatches or suboptimal matches can occur.

With the continuing advancement in computer architectures, especially with powerful modern processors, it is possible to implement template matching based on the correlation coefficient in a reasonable time. However, many applications (e.g., in machine vision and digital video) can benefit from a search method with both high speed and good accuracy. In this paper, such an algorithm, called Correlation-based Adaptive Predictive Search (CAPS), is presented. First, a 2D image template is analyzed, and certain characteristics are computed from its autocorrelation. The extracted information is then used to speed up the search procedure by properly choosing a set of search step sizes. Sections II and III present the basic CAPS algorithm and experimental results. We have extended CAPS to three and higher dimensions: Section IV shows an example with the third dimension as rotation, where rotated targets can be located while again substantially reducing the computational requirements. Section V demonstrates how to apply CAPS in multiple steps to further speed up the

template matching process. Discussions and conclusions on the CAPS algorithm are given in Sections VI and VII, respectively.

Figure 1. Block diagram of the CAPS algorithm in template matching. V_C and T_M represent the CAPS cut value and matching threshold, respectively.

II. PRINCIPLE OF 2D CAPS

For two data sets, {a} and {b}, where {a} and {b} could be functions, matrices, images, etc., the correlation coefficient (corr) between {a} and {b} is defined as

    corr[a, b] = E[(a - E[a]) (b - E[b])] / (sd[a] sd[b])    (1)

which is usually simplified to

    corr[a, b] = (E[a b] - E[a] E[b]) / (sd[a] sd[b])    (2)

Figure 2. The test image where three example templates are defined. Area #1 is the eye template, #2 is the feather template, and #3 is the shoulder template.
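As a minimal sketch (not the paper's implementation), Eq. (2) can be evaluated directly with NumPy; the offset and linear-transform invariance noted above follows immediately from the definition:

```python
import numpy as np

def corr(a, b):
    # Eq. (2): corr[a, b] = (E[ab] - E[a]E[b]) / (sd[a] sd[b])
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return (np.mean(a * b) - a.mean() * b.mean()) / (a.std() * b.std())

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
r1 = corr(a, 2.0 * a + 7.0)   # -> 1.0: invariant to offset and positive scaling
r2 = corr(a, -a)              # -> -1.0: perfectly anti-correlated
```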

Figure 3. Solid line: the autocorrelation coefficients of the eye template in Fig. 2 along the horizontal and vertical lines through (0, 0). Dashed line: the cross correlation between the eye template and the test image along the horizontal and vertical lines through the position of the eye template. The horizontal and vertical widths are defined for the autocorrelation by a V_C of 0.5.

In Eqs. (1) and (2), E[x] is the expected value or mean of a data set {x} and sd[x] is the standard deviation of {x}. The correlation coefficient takes a value in the range of -1.0 to 1.0 and gives a quantitative measure of how similar the two data sets are. Calculating correlation coefficients for every possible search point during template matching is extremely time-consuming; thus, a search method with both high speed and accuracy could play an important role in making the correlation coefficient method computationally reasonable in practical systems.

Correlation-based Adaptive Predictive Search (CAPS) is based on the statistics of the template. It utilizes the high correlation that typically exists between neighboring template pixels. Like the three-step search, CAPS calculates the correlation coefficient at a sublattice of points in the image and then searches a neighborhood of each of the matched points in the sublattice to determine the final matched locations. CAPS differs from the three-step search in that the spacing of the sublattice depends on the autocorrelation of the template rather than being fixed.

Figure 1 shows the block diagram of the CAPS algorithm. In the first step, the autocorrelation of the template is analyzed. To calculate the template's autocorrelation, the template is expanded by either periodic padding or padding with a constant value. After generating a padded template, the correlation coefficient between the original template and the padded template is computed over the 2D full-search range, giving the autocorrelation of the template.
The resulting 2D autocorrelation has a peak at the center of the autocorrelation plane. We have found that the shape of this central peak is not sensitive to the padding method employed. In the following study, constant padding with the mean value of the template was used.

Figure 4. Mechanism to derive the horizontal and vertical widths.
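The autocorrelation step described above can be sketched as follows, assuming constant padding with the template mean (the variant the authors settled on); the function and argument names are illustrative:

```python
import numpy as np

def template_autocorrelation(t, pad):
    """Autocorrelation of template t: the correlation coefficient (Eq. 2)
    between t and a mean-padded copy of t, at every 2D shift in [-pad, pad]^2."""
    t = np.asarray(t, dtype=float)
    th, tw = t.shape
    padded = np.full((th + 2 * pad, tw + 2 * pad), t.mean())
    padded[pad:pad + th, pad:pad + tw] = t          # constant (mean) padding
    ac = np.zeros((2 * pad + 1, 2 * pad + 1))
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            w = padded[pad + dy:pad + dy + th, pad + dx:pad + dx + tw]
            ac[dy + pad, dx + pad] = ((t * w).mean() - t.mean() * w.mean()) / (t.std() * w.std())
    return ac

t = np.random.default_rng(0).random((16, 16))
ac = template_autocorrelation(t, 8)
# ac has its central peak ac[8, 8] = 1.0 (zero shift), falling off with distance
```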

Figure 5. CAPS search lattice with CAPS horizontal and vertical step sizes, SS_h and SS_v.

Next, the central peak of the autocorrelation is analyzed. Figure 2 shows a test image and three example templates. The horizontal and vertical cross sections through (0, 0) of the computed autocorrelation of the eye template are shown in Fig. 3 (solid line). Given a cut value V_C, 0 < V_C < 1, we can measure the extent of the autocorrelation peak; for example, V_C = 0.5 is used in Fig. 3. The cross correlation along the horizontal and vertical lines between the eye template and the test image in Fig. 2 is also shown in Fig. 3 (dashed line). We can see that the cross correlation peak near the eye position is very similar to the corresponding autocorrelation peak. Therefore, the extracted information, e.g., the horizontal and vertical widths of the peak derived from the autocorrelation of the template, can be used to guide the matching process.

Figure 4 illustrates how CAPS derives the horizontal and vertical widths corresponding to a given V_C, which will then determine the horizontal and vertical skip distances in template matching. Starting out from the central peak at (0, 0), the template's autocorrelation coefficient decreases in all directions. As soon as it becomes equal to or less than V_C while traversing only along the horizontal and vertical axes, the horizontal and vertical widths of the peak are determined; in this way, we avoid calculating the autocorrelation coefficient at every point in Fig. 4. The search skip distances, or CAPS step sizes, horizontal (SS_h) and vertical (SS_v), are defined as one half of the horizontal and vertical widths of the autocorrelation peak of the 2D template. The upper limits for SS_h and SS_v are the template's width and height, respectively. To perform template matching, a matching threshold (T_M), e.g., 0.9 or 0.8 depending on the image characteristics and the specific application, is chosen.
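The width-to-step-size rule can be sketched as below. This is a simplified reading that assumes a roughly symmetric peak, so each step size equals the one-sided extent of the peak (half the full width); names and the synthetic test peak are illustrative:

```python
import numpy as np

def caps_step_sizes(ac, vc=0.5, t_shape=None):
    """Walk outward from the central peak of autocorrelation ac along the
    horizontal and vertical axes until the coefficient drops to vc or below;
    the CAPS step size per direction is half the peak width."""
    cy, cx = ac.shape[0] // 2, ac.shape[1] // 2
    def extent(line, c):
        w = 0
        while c + w + 1 < len(line) and line[c + w + 1] > vc:
            w += 1
        return w
    ss_h = max(1, extent(ac[cy, :], cx))
    ss_v = max(1, extent(ac[:, cx], cy))
    if t_shape is not None:                  # upper limit: template width/height
        ss_h = min(ss_h, t_shape[1])
        ss_v = min(ss_v, t_shape[0])
    return ss_h, ss_v

# synthetic separable peak: linear falloff over 10 px horizontally, 6 px vertically
vx = 1.0 - np.abs(np.arange(-10, 11)) / 10.0
vy = 1.0 - np.abs(np.arange(-6, 7)) / 6.0
ss_h, ss_v = caps_step_sizes(np.outer(vy, vx), vc=0.5)   # -> (4, 2)
```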
The CAPS algorithm allows us to perform the calculation of 2D correlation coefficients initially on a coarse grid spaced by SS_h and SS_v without degrading the reliability of template matching. Figure 5 illustrates that when the correlation coefficient (corr) between the template and the search area at a specific point is smaller than or equal to V_C * T_M, it is very unlikely that a target is located within the neighborhood of this point. Thus, we can move to another search point guided by SS_h and SS_v. Whenever we find corr > V_C * T_M, a full search is launched around this search point, as shown in Fig. 5. After the whole search process, the final matched point(s) are determined by the criteria that (a) corr at each matched point is greater than T_M and (b) it is the local maximum of the cross correlation over an area equal to the template size.

A lower cut value leads to a larger step size. However, a larger step size does not always mean better performance in computation time. If too low a cut value is chosen, it is likely that more points will have correlation coefficients exceeding V_C * T_M in the first stage of the search, each requiring a local full search. In addition, more correlation coefficients have to be computed in each local full search because of its larger full-search area.

III. EXPERIMENTS AND SIMULATIONS FOR 2D CAPS

We have implemented CAPS and performed experiments on a Pentium 4 machine running at 1.7 GHz with 512 Mbytes of memory. In Sections III and IV, a cut value (V_C) of 0.5 was used. Studies on the influence of the cut value are included in Sections V and VI. For the eye template and the test image shown in Fig. 2, locating the template using full search took 10.7 seconds on the Pentium 4 machine with a matching threshold (T_M) of 0.9. Most (> 99.5%) of the time was spent calculating correlation coefficients.
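The two-stage procedure described above (coarse grid scan, then a triggered local full search) can be sketched as follows. The helper names are hypothetical, and the local-maximum post-filtering over a template-sized area is omitted for brevity:

```python
import numpy as np

def corr2(t, w):
    # normalized correlation coefficient of Eq. (2); tiny epsilon guards flat windows
    t, w = t.ravel(), w.ravel()
    return ((t * w).mean() - t.mean() * w.mean()) / (t.std() * w.std() + 1e-12)

def caps_search(image, template, ss_h, ss_v, vc=0.5, tm=0.9):
    """Stage 1: scan a coarse grid spaced by (ss_h, ss_v). Whenever
    corr > vc*tm, stage 2 runs a local full search around that grid point.
    Returns {(y, x): corr} for points exceeding the matching threshold tm."""
    th, tw = template.shape
    H, W = image.shape
    hits = {}
    for y in range(0, H - th + 1, ss_v):
        for x in range(0, W - tw + 1, ss_h):
            if corr2(template, image[y:y + th, x:x + tw]) > vc * tm:
                for yy in range(max(0, y - ss_v), min(H - th, y + ss_v) + 1):
                    for xx in range(max(0, x - ss_h), min(W - tw, x + ss_h) + 1):
                        c = corr2(template, image[yy:yy + th, xx:xx + tw])
                        if c > tm:
                            hits[(yy, xx)] = c
    return hits

img = np.random.default_rng(2).random((64, 64))
tpl = img[20:36, 12:28].copy()                 # plant an exact 16x16 target
hits = caps_search(img, tpl, ss_h=4, ss_v=4)
best = max(hits, key=hits.get)                 # -> (20, 12), corr close to 1.0
```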
Therefore, the overall speed of template matching can be increased with a search strategy that reduces the number of correlation coefficients calculated.

A. CAPS with a Single Matched Point. With the same template and image, CAPS took well under a second on the Pentium 4 machine. The time necessary to compute the template's autocorrelation and determine the CAPS step sizes, SS_h and SS_v (7 and 6, respectively), took less than 1% of the total time. Thus, CAPS in this specific example is 57.8 times faster than full search.

Figure 6. A group of templates of various sizes extracted from the test image in Fig. 2.

Table I. Computation times with the test image in Fig. 2 (columns: template size, full-search time, CAPS time, CAPS speedup ratio).

Figure 6 shows various templates of different sizes extracted from the test image in Fig. 2. In the case of the smaller template, SS_h and SS_v were computed to be 6 and 5, respectively. With the same T_M of 0.9, the full search took 4.86 seconds, while CAPS required only a small fraction of that time. For a larger template, SS_h and SS_v were determined to be 9 and 13, respectively; the full search took 31.1 seconds, while CAPS again took only a small fraction of that time. Table I summarizes these results. We can see that CAPS is very effective in lowering the computational requirements of template matching. For the eye templates in Fig. 6 and a T_M of 0.9, the 2D CAPS speedup ratio is a function of the template size, ranging from 33.5 to 66.5, as shown in Fig. 7. Usually, when the template size increases, the CAPS step sizes increase, reducing the number of correlation coefficients calculated in the first stage; on the other hand, the local full-search area expands, increasing the number of correlation coefficients calculated in the second stage. Figure 7 shows that for this specific case, the CAPS speedup ratio peaks at an intermediate template size.

B. CAPS with Multiple Matched Points. The goal of a fast search algorithm is not only high speed, but also high retrieval probability in template matching. We have already shown the computational speed improvements achieved with CAPS. To test the reliability of CAPS, we conducted the following experiment. Figure 8(a) was generated following the block map in Fig. 8(b). The original eye template in Fig.
2 was replicated into each block, then offset by a constant (±32) along the horizontal direction, repeatedly blurred by a 3 x 3 boxcar lowpass filter toward the bottom of the image, and contaminated toward the top with additive white Gaussian noise with a standard deviation of 16. It is not possible to detect all the multiple eye images using difference-based methods such as SAD or MAD (Mean Absolute Difference) because of their sensitivity to noise and offsets in an image. With the three-step fast search method (Jain, 1981), we can locate only one matched point in the test image because it is not capable of handling multiple matched points. With the traditional full-search approach based on normalized correlation coefficients, 19 out of 30 eyes were identified with T_M = 0.9 (positions shown in Table II, left column), 29 eyes were detected when T_M was 0.8 (Table II, middle), and all 30 eyes were correctly located when T_M was set to 0.7 (Table II, right column). With CAPS, we obtained the same results as with the traditional full-search approach at all T_M values tested. The CAPS algorithm did not lose any of the targets, which demonstrates its high reliability. Even for the above case with multiple targets, our fast search was 11.4 times faster than the full-search method when T_M was 0.7. These computational results with Fig. 8 are summarized in Table III. It is an open question for all matching algorithms how a matching threshold can be determined optimally for an intended application. We will discuss in Section VI some methods for choosing T_M based on the noise level in images.

IV. EXTENSION TO 3D CAPS WITH 2D ROTATIONS

We have shown that CAPS is computationally efficient for template matching with one or multiple translated targets. In practice, it is also common to face situations where the image contains one or multiple targets with different rotations.
However, only 2D translational motion is typically considered in template matching because the computation required to handle rotation is exorbitantly large (Yoshimura, 1994). We have extended the idea of CAPS to three dimensions, where the third dimension can be, for example, rotation; rotated targets in the 2D image can then be located. This approach, which we call 3D CAPS, offers a clear computational advantage over 3D template matching based on full search.

To perform template matching with rotation, we first need to select the resolution used in the angular direction. For example, the angular resolution (Δθ) in radians can be calculated from the template size via

    Δθ = 2π / roundup[2π max(t_x, t_y)]    (3)

where t_x and t_y are the width and height of the template, respectively, and roundup[x] means rounding x up to the next integer. Thus, roundup[2π max(t_x, t_y)] is the total number of times we rotate an image (or template) in template matching.

Figure 7. 2D CAPS speedup ratios for the templates in Fig. 6.
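Assuming a 64 x 64 eye template (an assumption; the template size is not given in this excerpt, but this choice reproduces the 403 angles quoted for the example), Eq. (3) works out as:

```python
import math

def angular_resolution(tx, ty):
    # Eq. (3): dtheta = 2*pi / roundup[2*pi * max(tx, ty)]
    n_angles = math.ceil(2 * math.pi * max(tx, ty))   # total rotation angles searched
    return 2 * math.pi / n_angles, n_angles

dtheta, n_angles = angular_resolution(64, 64)
# n_angles -> 403 rotation angles; dtheta is just under one degree
```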

Figure 8. Test image with multiple targets.

For the eye template shown in Fig. 2, Eq. (3) yields a total of 403 angles; if we used the full-search method in this specific example, searching the image over all 403 angles would be prohibitively time-consuming.

In extending the CAPS idea to incorporate rotation, the autocorrelation of the template needs to be computed in 3D, with the third dimension being the rotational angle. As in 2D CAPS, it is not necessary to calculate the autocorrelation coefficients over the whole 3D space. We can derive the rotational width by starting from the center (rotation angle of 0) and traversing along the rotation-angle axis, as shown in Fig. 9. When the autocorrelation drops below the cut value in both the positive and negative directions, we can stop calculating autocorrelation coefficients. The rotational skip distance, or step size (SS_r), is one half of this rotational width.

There are two ways to handle the relative rotation between the template and the image in template matching. First, we can rotate the template. Because the template is usually smaller than the image, rotating the template is faster. However, due to the rotation of the template, a new alpha mask has to be generated for each rotational angle, so for every match a template-size subimage has to be extracted based on the alpha mask. The second way is to rotate the image. More computation is required to rotate the image than to rotate the template; however, no alpha mask is required, and the computation can be more easily supported by modern microprocessors with powerful computing instructions. Both rotation methods yield the same final matching results. In the following discussion, we used the method of rotating the image and set V_C = 0.5 as before.

Figure 10 illustrates the mechanism of 3D CAPS after SS_h, SS_v, and SS_r have been generated. The slices shown represent the rotated images at different rotational angles.
As an example, six slices are shown in Fig. 10. The following steps are taken in 3D CAPS. First, we select rotational slices spaced by SS_r (shown as solid-line frames) and perform 2D CAPS on these selected slices with SS_h and SS_v; the solid-line circles represent the points scanned in this step. The points with corr > V_C * T_M are marked, shown as black points A and B in Fig. 10. Second, we project all the black points onto the other slices (shown as dashed-line frames) within an interval of SS_r, creating the points labeled A' and B' in Fig. 10. Correlation coefficients at A' and B' (shown as solid circles) in the dashed-line frames are calculated; if corr > V_C * T_M at a specific point, it is marked gray. In Fig. 10, for example, there are two projected points in each of the two middle slices, but only one point is marked gray in each slice. Third, we launch a 2D full search locally in the shaded areas centered at the gray points in each slice. After the whole search process is completed, the position and rotation of one or multiple targets can be determined by the local-maximum criterion.

Table II. Full-search and CAPS template matching results on Fig. 8 with T_M values of 0.9, 0.8, and 0.7 (left to right columns). Blocks marked Yes were located by template matching.

Table III. Computation times with the multiple-eye image shown in Fig. 8 (columns: T_M, full-search time, CAPS time, CAPS speedup ratio; the speedup ratio was 11.4 at T_M = 0.7).

Experiments were performed on the Pentium 4. First, the test image in Fig. 2 was studied with a T_M of 0.9. SS_h, SS_v, and SS_r of the eye template in Fig. 2 were determined to be 7 pixels, 6 pixels, and 33 times the angular resolution, respectively. The full search with this template took 4,846 seconds, while 3D CAPS took 33.8 seconds; a speedup factor of 143 was achieved. Since the image was not rotated, the matched angle of 0 degrees was obtained, as expected. Many other rotation angles were tested. For example, the test image in Fig. 2 was rotated by 285 (or -75) degrees, clipped to the original image size, and tested with 3D CAPS. The full search took 2,456 seconds using the same template; 3D CAPS took 16.9 seconds, a speedup factor of 145. The determined matched angle was 319 times the angular resolution, which corresponds to the applied rotation.

The third search dimension in 3D CAPS can be some parameter other than rotation, e.g., another spatial dimension, scaling (zooming or shrinking), frequency band, time, or the level of a hierarchical decomposition. An nD CAPS with n > 3 is a straightforward extension based on the fundamental idea of CAPS.

V. MULTIPLE-STEP CAPS

From Figs. 3, 4, and 9, we see that a high cut value leads to small CAPS step sizes, but fewer triggers during the coarse search and a smaller local full-search area. On the other hand, a low cut value leads to large step sizes, but more triggers during the coarse search and a larger local full-search area for each. To take advantage of these characteristics of CAPS and further speed up the matching process, CAPS can be applied iteratively in multiple steps. Figure 11 illustrates a two-step CAPS example.
The first-step CAPS with a lower cut value (V_C1) is used over the whole search area. Then, CAPS is applied again with a higher cut value (V_C2) in the second-step CAPS area. This process can continue as long as the step size is larger than 2. The similarities and differences between one-step CAPS and multiple-step CAPS can be observed in Figs. 5 and 11. In Fig. 5, a full search is launched locally in the shaded area. In contrast, Fig. 11 shows that a second CAPS with smaller horizontal and vertical step sizes (SS_h2 and SS_v2) is launched in the second-step CAPS area, and the final full search is performed in a much smaller, lightly shaded area.

Experiments were conducted with the test image in Fig. 2 and the different templates shown in Fig. 6 on the same Pentium 4 machine. V_C values of 0.4, 0.5, 0.6, and 0.7 were used in single-step CAPS, while 0.3/0.65, 0.4/0.7, and 0.5/0.75 were used for V_C1/V_C2 in two-step CAPS. For each template, the computation time was recorded for the different cut value(s). Figure 12 compares the computation times of one-step CAPS and two-step CAPS. For each template, the speedup ratio was calculated by dividing the best (minimum) single-step CAPS computation time by the best two-step CAPS computation time. The speedup ratio ranged from 1.33 to 1.78, with an average of about 1.5. Similarly, with the feather and shoulder templates in Fig. 2, the speedup ratios with two-step CAPS were 1.40 and 1.84, respectively. We have tested our algorithm further on other images to confirm the computational improvement of multiple-step CAPS over single-step CAPS. For example, we used the barbara image and a group of 12 templates of various sizes extracted from it. Values of 0.4, 0.5, 0.6, and 0.7 were used for V_C in single-step CAPS, while 0.4/0.7, 0.5/0.75, and 0.6/0.8 were used for V_C1/V_C2 in two-step CAPS. With these templates, the average speedup ratio was 1.65, which is a bit (10%) better than that (shown in Fig.
12) obtained from the test image in Fig. 2. The reliability of two-step CAPS in detecting the template locations was identical to that of single-step CAPS. The method can easily be extended to more than two steps, and the same idea is applicable to 3D CAPS. Usually, if the autocorrelation peak is wide, more than two steps of CAPS can be used to further facilitate the search; when the autocorrelation peak is narrow and sharp, one or two steps are enough.

Figure 9. Autocorrelation coefficients as a function of rotational angle for the eye template in Fig. 2.

Figure 10. 3D CAPS mechanism.

VI. DISCUSSION

In contrast to existing fast search algorithms, CAPS adjusts the search step size(s) based on the characteristics of a given template. If there is high correlation between adjacent pixels in the template, the correlation coefficients can be computed on a coarse lattice (large CAPS step sizes). For example, for the shoulder template in Fig. 2, when V_C is set at 0.5, SS_h and SS_v are 15 and 36, respectively. On the other hand, SS_h and SS_v are small when there is little correlation in the template, e.g., for the feather template in Fig. 2, where SS_h and SS_v are 2 and 4, respectively. For a multi-image search, the CAPS step sizes need be computed only once; these predetermined step sizes can then be used in all subsequent matching with the same template.

The advantage of CAPS is that, compared to the full-search method, we can be confident that we will not miss any instance of the template, while the computational burden is cut substantially by utilizing autocorrelation information from the template. However, false dismissals caused by significant object deformation can certainly happen, not only with CAPS but also with full search.

The fundamental assumption of the CAPS algorithm is that the search step sizes used in CAPS are reliable compared to those determined from cross correlation, i.e., the width derived from autocorrelation should not be greater than the width based on cross correlation. The 12 templates in Fig. 6 and another 12 templates extracted from the barbara image were used to verify this assumption with the cut value (V_C) set to 0.5. As shown in Fig. 13, the width by autocorrelation is always less than or equal to the width by cross correlation. Since reliability is one of the critical goals of the CAPS algorithm, the width by autocorrelation might not always be the optimal width for template matching in a specific image.

CAPS offers little gain when the template size is near the image size. In addition, when the template size is very small or the content of the template is very busy (with a very sharp autocorrelation peak), SS_h and SS_v (as well as the rotational step size in the 3D case) may be 1, and CAPS then reduces to template matching by the full-search method.

The choice of proper cut values is still an unsolved issue. In our single-step CAPS experiments, we used V_C around 0.5; in our two-step CAPS experiments, we chose V_C2 to be halfway between 1.0 and V_C1, which produced reasonably good results. This could certainly be one option in practice. For applications where more optimization is needed, further experiments with various combinations of cut values could be performed to determine the set of parameters that minimizes the overall computation time.

As mentioned in Section III, determining a reliable matching threshold (T_M) for a specific application remains an open problem. When the images and/or templates are contaminated with additive white noise, the T_M to be used in correlation coefficient methods, including CAPS, can be estimated from the noise level in the images.
Mathematically, the correlation coefficient between a data set {a} and a contaminated data set {a + N}, where {N} represents white Gaussian noise, can be approximated as

    corr[a, a + N] ≈ 1 / sqrt(1 + N_T^2)    (4)

where N_T is the relative noise level defined as

    N_T = sd[N] / sd[a]    (5)

with sd[N] and sd[a] the standard deviations of the noise set {N} and the data set {a}, respectively. If we define {a} to be the template and {a + N} to be the template part of the contaminated image, T_M can then be estimated as a function of the relative noise level N_T (Eq. (6)).

Figure 11. Two-step CAPS. The first-step CAPS with a lower cut value (V_C1) is used over the whole search area, followed by another CAPS with a higher cut value (V_C2) in the area localized by the first step.

Figure 12. The speedup ratio of two-step CAPS compared to one-step CAPS.

    T_M = 1 / sqrt(1 + N_T^2)    (6)

Since the pixel dynamic range in digital images is limited and pixel values are discretized by truncation or rounding, T_M may differ slightly from this ideal value; however, our studies show that Eq. (6) is a good approximation in the case of additive white Gaussian noise.

We have shown that CAPS can speed up the computation in template matching while producing reliable matching results identical to those of the full-search method. In addition to being fast, CAPS has the advantage that it can handle multiple matched points, i.e., multiple occurrences of a given template in a search area, which is useful in some applications, e.g., machine vision; the three-step fast search method cannot support this feature.

Assuming that there is only one target, we compared the CAPS algorithm with traditional fast search methods in terms of computation speed. The eye template and the test image in Fig. 2 were used in this experiment, with the original image truncated symmetrically to different sizes. Below a certain image size, the speedup ratio drops below one, which means that single-step 2D CAPS loses its computational advantage over the three-step search. Similar results were observed for the two-level coarse-fine matching process. Therefore, when the search area is large compared to the template size, CAPS is faster than the traditional fast search methods. When the search area is small and one target does exist in the area, the traditional fast search methods may be faster than single-step 2D CAPS, but a mismatch or a suboptimal match can occur and/or multiple targets in the search area cannot be handled. When no target exists in the search area, single-step 2D CAPS is more advantageous than the traditional fast search methods even when the search area is small.
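The noise-based threshold of Eqs. (4)-(6) is easy to check numerically; the sketch below compares the predicted T_M with an empirical correlation coefficient for synthetic white Gaussian noise (the sample sizes and standard deviations are illustrative):

```python
import numpy as np

def matching_threshold(noise_sd, signal_sd):
    # Eqs. (5)-(6): N_T = sd[N]/sd[a];  T_M = 1/sqrt(1 + N_T^2)
    n_t = noise_sd / signal_sd
    return 1.0 / np.sqrt(1.0 + n_t ** 2)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 32.0, 200_000)          # "template" samples, sd[a] = 32
noise = rng.normal(0.0, 16.0, 200_000)      # additive white Gaussian noise, sd[N] = 16
predicted = matching_threshold(16.0, 32.0)  # -> about 0.894
measured = np.corrcoef(a, a + noise)[0, 1]  # Eq. (4): agrees to ~2 decimal places
```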
This is because CAPS is performed on a coarse grid spaced by SS_h and SS_v, with only a few local full searches. For example, in an experiment with the eye template and a group of truncated Barbara images, the speedup ratios of single-step 2D CAPS over the three-step search and the coarse-fine search were 2.29 and 2.20, respectively, at the larger image sizes. With the image size reduced to 96 x 96, the speedup ratio over the three-step search dropped to 1.46. 2D, 3D, and multiple-step CAPS would be especially useful for searching a large database for infrequently occurring targets.

Figure 13. 2D plot for comparison of width (in both horizontal and vertical directions) derived from autocorrelation and corresponding width derived from cross correlation.

Vol. 13, (2003) 177

VII. CONCLUSION

CAPS has been developed as a fast search algorithm for template matching that provides a significant improvement in computation time while maintaining the accuracy of traditional full-search matching. It is computationally advantageous over other fast search methods when the ratio of the search-area size to the template size is large. Furthermore, CAPS is able to track multiple targets at the same time. The CAPS concept has been extended to higher dimensions, including rotation, and it has been further improved by applying it iteratively in multiple steps. The algorithm could be used in many machine vision applications. Other interesting applications include tracking video objects in MPEG-4 (ISO, 2001b; Sun, 2003) applications and performing efficient database searches in MPEG-7 (ISO, 2001c) applications.

REFERENCES

D.I. Barnes and H.F. Silverman, A class of algorithms for fast digital image registration, IEEE Trans Comp 21 (1972).
A. Goshtasby, S.H. Gage and J.F. Bartholic, A two-stage cross correlation approach to template matching, IEEE Trans Pattern Anal Mach Intell 6 (1984).
ISO/IEC MPEG-4 Video verification model version. ISO/IEC JTC1/SC29/WG11 N3908.
ISO/IEC MPEG-4 Overview (Singapore Version). ISO/IEC JTC1/SC29/WG11 N4030.
ISO/IEC Overview of the MPEG-7 Standard (Singapore Version). ISO/IEC JTC1/SC29/WG11 N4031.
J.R. Jain and A.K. Jain, Displacement measurement and its application in interframe image coding, IEEE Trans Commun 29 (1981).
R.N. Nagel and A. Rosenfeld, Ordered search techniques in template matching, IEEE Proc 60 (1972).
W.K. Pratt, Correlation techniques of image registration, IEEE Trans Aero Elec Syst 10 (1974).
A. Rosenfeld, Picture processing by computer, New York: Academic Press, 1969.
A. Rosenfeld and G.J. Vanderbrug, Coarse-fine template matching, IEEE Trans Syst, Man and Cybernetics 7 (1977).
S. Sun, D.R. Haynor and Y. Kim, Semiautomatic video object segmentation using VSnakes, IEEE Trans Circ Syst Vid 1 (2003).
S. Yoshimura and T. Kanade, Fast template matching based on the normalized correlation by using multiresolution eigenimages, Proc IROS 94, Munich, Germany, 1994.


More information

Comparison of Some Motion Detection Methods in cases of Single and Multiple Moving Objects

Comparison of Some Motion Detection Methods in cases of Single and Multiple Moving Objects Comparison of Some Motion Detection Methods in cases of Single and Multiple Moving Objects Shamir Alavi Electrical Engineering National Institute of Technology Silchar Silchar 788010 (Assam), India alavi1223@hotmail.com

More information

Introduction to Medical Imaging (5XSA0) Module 5

Introduction to Medical Imaging (5XSA0) Module 5 Introduction to Medical Imaging (5XSA0) Module 5 Segmentation Jungong Han, Dirk Farin, Sveta Zinger ( s.zinger@tue.nl ) 1 Outline Introduction Color Segmentation region-growing region-merging watershed

More information

Motion Estimation. There are three main types (or applications) of motion estimation:

Motion Estimation. There are three main types (or applications) of motion estimation: Members: D91922016 朱威達 R93922010 林聖凱 R93922044 謝俊瑋 Motion Estimation There are three main types (or applications) of motion estimation: Parametric motion (image alignment) The main idea of parametric motion

More information

Image Processing. BITS Pilani. Dr Jagadish Nayak. Dubai Campus

Image Processing. BITS Pilani. Dr Jagadish Nayak. Dubai Campus Image Processing BITS Pilani Dubai Campus Dr Jagadish Nayak Image Segmentation BITS Pilani Dubai Campus Fundamentals Let R be the entire spatial region occupied by an image Process that partitions R into

More information

Schedule for Rest of Semester

Schedule for Rest of Semester Schedule for Rest of Semester Date Lecture Topic 11/20 24 Texture 11/27 25 Review of Statistics & Linear Algebra, Eigenvectors 11/29 26 Eigenvector expansions, Pattern Recognition 12/4 27 Cameras & calibration

More information

Vehicle Logo Recognition using Image Matching and Textural Features

Vehicle Logo Recognition using Image Matching and Textural Features Vehicle Logo Recognition using Image Matching and Textural Features Nacer Farajzadeh Faculty of IT and Computer Engineering Azarbaijan Shahid Madani University Tabriz, Iran n.farajzadeh@azaruniv.edu Negin

More information

A Robust and Efficient Motion Segmentation Based on Orthogonal Projection Matrix of Shape Space

A Robust and Efficient Motion Segmentation Based on Orthogonal Projection Matrix of Shape Space A Robust and Efficient Motion Segmentation Based on Orthogonal Projection Matrix of Shape Space Naoyuki ICHIMURA Electrotechnical Laboratory 1-1-4, Umezono, Tsukuba Ibaraki, 35-8568 Japan ichimura@etl.go.jp

More information

Mesh Based Interpolative Coding (MBIC)

Mesh Based Interpolative Coding (MBIC) Mesh Based Interpolative Coding (MBIC) Eckhart Baum, Joachim Speidel Institut für Nachrichtenübertragung, University of Stuttgart An alternative method to H.6 encoding of moving images at bit rates below

More information

Coarse-to-Fine Search Technique to Detect Circles in Images

Coarse-to-Fine Search Technique to Detect Circles in Images Int J Adv Manuf Technol (1999) 15:96 102 1999 Springer-Verlag London Limited Coarse-to-Fine Search Technique to Detect Circles in Images M. Atiquzzaman Department of Electrical and Computer Engineering,

More information

Texture Analysis. Selim Aksoy Department of Computer Engineering Bilkent University

Texture Analysis. Selim Aksoy Department of Computer Engineering Bilkent University Texture Analysis Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Texture An important approach to image description is to quantify its texture content. Texture

More information