INTELLIGENT AND OPTIMAL NORMALIZED CORRELATION FOR HIGH-SPEED PATTERN MATCHING
Swami Manickam, Scott D. Roth, Thomas Bushman
Datacube Inc., 300 Rosewood Drive, Danvers, MA 01923, U.S.A.

Abstract

The vision industries have used normalized correlation to reliably locate patterns with high accuracy. However, normalized correlation is computationally very expensive. Researchers were able to reduce its computational complexity by using data pyramids, enabling patterns to be found in real time, but only in spatially translated images. Attempts to extend normalized correlation to search for rotated and scaled patterns proved too slow for most vision applications, and other attempts to speed it up, such as skipping pixels, deteriorated its performance. In addition, today's vision industries need to handle nonlinear changes in brightness, process variations such as multilayer buildup in wafer production, blurring, and perspective distortion. In this paper, we discuss an algorithm developed to resolve all of these issues. The algorithm extracts only the essential information from patterns through a training process built upon normalized correlation. Using an intelligent search technique and normalized correlation as the matching criterion, it locates the positions, orientations, and scales (both horizontal and vertical) of one or more trained patterns in real time, while maintaining high accuracy.

Overview

Registration of an image with respect to a reference pattern has a wide range of applications, such as:
- Wafer alignment, using arbitrary artwork as the pattern to recognize.
- Fiducial recognition for PCB assembly by a pick-and-place robot.
- Registration of alignment marks on printed material to be inspected (e.g., wallpaper rolls, currency sheets).
- Robot vision guidance, locating objects on conveyor belts, pallets, and trays.
Vision industries and researchers have employed a mathematical procedure known as normalized gray-scale correlation (NGC) to locate the reference pattern within an image under consideration. In this method, the reference pattern is shifted to every location in the image, its values are multiplied by the pixels they overlay, and the total is stored at that position, forming an image that shows where regions identical or similar to the reference pattern are located. To normalize the result of this pattern matching so that the absolute brightness of the region does not bias the results, the operation is usually calculated as the sum of products of the pixel brightnesses divided by their geometric mean (Russ, 1992).
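A minimal NumPy sketch of this exhaustive, zero-mean normalized procedure might look as follows (the function names are illustrative, not from the authors' implementation):

```python
import numpy as np

def ngc_score(window, template):
    """Normalized correlation of one image window against the template."""
    i = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((i * i).sum() * (t * t).sum())
    return float((i * t).sum() / denom) if denom > 0 else 0.0

def ngc_search(image, template):
    """Exhaustive NGC: score the template at every translation of the image."""
    th, tw = template.shape
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            scores[y, x] = ngc_score(image[y:y + th, x:x + tw], template)
    return scores

# Embed the template with a linear brightness change (gain 2, offset 10);
# NGC still peaks at 1.0 at the embedding location.
template = np.arange(25, dtype=float).reshape(5, 5)
image = np.zeros((20, 20))
image[3:8, 7:12] = 2.0 * template + 10.0
scores = ngc_search(image, template)
```

Because both the window and the template are mean-subtracted and divided by their norms, the score at the embedding location is exactly 1.0 despite the gain and offset, which is the linear-brightness invariance of NGC described below.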
Over the years, NGC has proven to be a robust and reliable method (Rosenfeld et al., 1982). If sub-pixel accuracy is required, the correlation surface may be interpolated, and accuracy better than 1/16th of a pixel can be achieved (Gleason et al., 1990). Conventional NGC is invariant to linear changes in brightness, but that is where its invariance ends: it has very little tolerance to changes in rotation, scale, perspective distortion, nonlinear changes in brightness, or multilayer buildup in wafer manufacturing. In addition, NGC is computationally very expensive (O(n^4) to find translated patterns). The formula for the NGC score NC is:

NC = \frac{\sum_{x,y}(I - \bar{I})(T - \bar{T})}{\sqrt{\sum_{x,y}(I - \bar{I})^2 \; \sum_{x,y}(T - \bar{T})^2}}, \qquad \text{where } \bar{I} = \frac{\sum_{x,y} I}{N} \text{ and } \bar{T} = \frac{\sum_{x,y} T}{N},

T and I are the template and image data, respectively, and N is the number of template pixels.

To speed up NGC, Datacube and others have used pipeline image processing. While this allows more pixels to be processed per second, the computational cost of NGC remains the same (O(n^4) to find translated patterns). Several strategies, such as skipping every other pixel or using random pixels, have been used to reduce the number of pixels processed, but these significantly deteriorate the reliability of the correlation. Another frequently used strategy is image pyramids. An image pyramid is a succession of images in which each image is down-sampled from its predecessor. In this approach, a coarse match is found at the top of the pyramid, and a hill-climbing strategy is used to traverse the successive pyramid images. This significantly reduces the number of pixels used in correlation. With pyramiding, NGC is able to find patterns in real time, but with variance only in translation. However, in many applications the given template and the observed image are not only spatially translated but also relatively rotated, and may even be scaled.
In such cases, the NGC maximum of a 2-D pattern has to be searched in a 4-D parameter space: translation X, translation Y, rotation, and scale. This can become quite impractical (O(n^6)) unless reasonable estimates of the scale and rotation are given (Jain, 1989). Today's vision industries require a pattern-matching algorithm that can quickly locate a pattern that may have been rotated, scaled, blurred, occluded, distorted by perspective, or illuminated differently (exhibiting a nonlinear change in brightness). Instead of attempting to adapt the reliable NGC to these new requirements through intelligent optimization, researchers have moved on to other techniques, such as geometry-based pattern-recognition algorithms. In this paper, we discuss an intelligent optimization of NGC that addresses these new requirements. The resulting algorithm finds patterns faster and more accurately than the newer geometry-based pattern-recognition algorithms. The developed procedure, known as vsfind in its implemented form, consists of two algorithms:
1. Automatic training, during which the template is distinguished from other objects and the background within the search space, and
2. Run-time pattern recognition, during which the trained template is located in a new image.

Development of an Intelligent and Optimal NGC Adaptation

The major weakness of NGC is its computational complexity. Since NGC computes dot products, its computational complexity is directly proportional to the template size and the image search area. NGC can be sped up only by reducing the number of data points involved in the correlation. However, care must be taken not to degrade the accuracy and reliability of NGC. Do all pixels contribute equally, or do some pixels contribute more than others to the correlation score? If the latter is true, how do we identify the influential pixels?

In 1994, Krattenthaler et al. analyzed correlation-based template matching and reported that not all pixels contribute equally to the correlation. They proposed Point Correlation, in which matching is performed with a pre-computed set of points from the reference pattern, or template. In their method, a set of correlation-sensitive points (the points of the template with the greatest influence on template matching) is selected in a training session. Intuitively, the selected set of points corresponds to the most important features of the template, given the set of possible template transformations (translation, rotation, scaling, etc.). To determine this crucial set of points, they proposed the following iterative algorithm:

1. Compute a point set P_M with M initial points (randomly select three points, preferably on edges).
2. Iteration step: assume a point set P_L consisting of L points, where L >= M, has already been computed. Then find the new set P_{L+1} in the following manner:
   a. For each point X_j = (x_j, y_j) in the template with X_j not in P_L: compute the correlation result R_j(i) for all transformations i, 1 <= i <= N, using point correlation with the point set P_L ∪ {X_j}; then compute a correlation measure CM_j of the correlation result R_j(i) that determines the quality of the point X_j.
   b. Choose as the new element of P_{L+1} the point X_j whose correlation measure CM_j is a maximum.

This technique clearly maintains NGC's accuracy and reliability by selecting the correlation-sensitive points. By reducing the number of points used in the correlation (from O(n^2) to O(n)), the run-time computations are reduced from O(n^6) to O(n^5). However, the training procedure for selecting such a set of points is computationally very expensive. For example:
Let the template be n x n pixels, let the number of possible translations be of the order of n x n, and let the possible combinations of rotation and scale be O(n^2). Then N in step 2.a would be O(n^4). Computing the correlation score for a single transform involves O(n) multiplies and additions. To select the most influential template pixel, all of the not-yet-selected template pixels (O(n^2)) are considered in each iteration. Therefore, the number of computations required to select one pixel would be O(n^4) * O(n) * O(n^2) = O(n^7). To select up to O(n^2) pixels, the required number of computations would be O(n^2) * O(n^7), which is O(n^9).

Even with modern computers, training time on the order of O(n^9) is impractical for the vision industries, where both run-time and training speed are important. Krattenthaler et al. did not propose any criterion to terminate the training other than setting the number of points to some fixed value. The number of correlation-sensitive points of a template depends on the complexity of the template and the range of transformations (rotation, scaling, etc.). Hence, a criterion for determining the size of the point set is also important in making the training practical.

This point-correlation technique has the potential to meet some of the requirements of the vision industries (rotational and scale invariance, accuracy, and reliability). However, the computational complexity of the training makes its application impractical. The following section describes a procedure that optimizes the training through additional data-reduction techniques and patent-pending algorithms.

Optimization of point-correlation training

To reduce the training computational complexity, the data-reduction strategies are:
1. Data pyramiding is an independent data-reduction approach and can be combined with point correlation. Pyramiding reduces both the template and image data approximately from O(n^2) to O(n).
2. The pyramiding concept can be extended to the transformations as well: use coarser rotational and scale transformations for the reduced data (top of the pyramid), with gradually finer transformations at the successive pyramid depths. This reduces the number of transformations approximately from O(n^2) to O(n).
3. Data pyramiding reduces the data with loss of detail. Hence, the points selected at the reduced resolution must be tolerant to the set of coarser transformations. Patent-pending rules can be applied to select transformation-tolerant pixels.
4. A heuristic (patent-pending) can be applied to the procedure of selecting the most influential pixel without compromising the optimality of the point-correlation template matching (Krattenthaler et al.). In addition, other calculations determine an optimal number of points for a given template and transformation set.
5. Points selected at the spatially reduced resolution are extrapolated to the successive, higher resolutions. At full resolution, additional points (a pre-specified number based on the desired accuracy) are selected using the same strategy (#4, above) to achieve the full accuracy of the pattern-matching algorithm.

With these optimizations, the training computational complexity has been reduced from O(n^9) to O(n^3.5), which is practical. Fig. 1 below shows the points selected by the training algorithm for concentric squares, with and without rotational transformations. The points selected at the spatially reduced resolution (and extrapolated for illustration) are shown as black "x" marks; the additional points selected at full resolution to improve accuracy are shown as white "+" marks. With rotational transformations, the points selected by the training algorithm cluster around the corners, since corners are good indicators of the orientation of a square-shaped template and are the farthest from the centroid of the template, thereby providing higher angular accuracy. Without rotational transformations, the selected points are spread out along all the edges.

Fig. 1. Points selected by vsfind's training engine to recognize a template of concentric squares, with and without rotational transformations (panels: with rotation; without rotation).

Fig. 2 below shows the points selected to distinguish a ring from a disk of the same diameter. Since only the interior pixels corresponding to the hole are necessary to distinguish the two circular objects, the training algorithm selected only those points at the top of the pyramid. Points along the edges are selected at full resolution to improve accuracy.
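The coarse-to-fine search over an averaging-built data pyramid can be illustrated with a translation-only sketch. This is a simplified assumption-laden illustration, not the authors' method: the rotation/scale pyramiding and the patent-pending point selection are omitted, and all names and the 2x2-averaging scheme are illustrative.

```python
import numpy as np

def ngc(window, template):
    """Normalized correlation score of one window against a template."""
    i = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((i * i).sum() * (t * t).sum())
    return float((i * t).sum() / denom) if denom > 0 else 0.0

def downsample(img):
    """One pyramid level: average each 2x2 block."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def pyramid(img, levels):
    levs = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        levs.append(downsample(levs[-1]))
    return levs  # levs[0] is full resolution, levs[-1] the coarsest

def best_in_region(image, template, y_range, x_range):
    """Best-scoring translation within the given index ranges."""
    th, tw = template.shape
    best, best_pos = -2.0, (y_range[0], x_range[0])
    for y in y_range:
        for x in x_range:
            s = ngc(image[y:y + th, x:x + tw], template)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos

def coarse_to_fine_find(image, template, levels=3, radius=2):
    """Exhaustive search at the top of the pyramid, local refinement below."""
    imgs, tpls = pyramid(image, levels), pyramid(template, levels)
    ih, iw = imgs[-1].shape
    th, tw = tpls[-1].shape
    y, x = best_in_region(imgs[-1], tpls[-1],
                          range(ih - th + 1), range(iw - tw + 1))
    for lev in range(levels - 2, -1, -1):   # walk back down to full resolution
        y, x = 2 * y, 2 * x                 # project estimate to the finer level
        ih, iw = imgs[lev].shape
        th, tw = tpls[lev].shape
        ys = range(max(0, y - radius), min(ih - th, y + radius) + 1)
        xs = range(max(0, x - radius), min(iw - tw, x + radius) + 1)
        y, x = best_in_region(imgs[lev], tpls[lev], ys, xs)
    return y, x

# Find an 8x8 gradient patch embedded (with a brightness offset) in a 32x32 scene.
tpl = np.add.outer(np.arange(8.0), np.arange(8.0))
scene = np.zeros((32, 32))
scene[8:16, 12:20] = tpl + 5.0
loc = coarse_to_fine_find(scene, tpl, levels=3)
```

Note how the refinement step only evaluates a small neighborhood at each finer level, which is what makes the pyramid approach so much cheaper than exhaustive full-resolution correlation.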
Fig. 2. Points selected by vsfind's training engine to distinguish a ring from a disk.

Run-time pattern matching algorithm

At run time, an exhaustive search is performed with the selected set of points representing the template and a coarse set of rotation and scaling transformations on the spatially reduced image data. The computed correlation scores are sorted, and the location, orientation, and scale that produced the best score are refined further for a possible match at the successive data pyramids. A hill-climbing approach is employed at each level of the data pyramid to converge toward the best score. At full spatial resolution, another hill-climbing approach is used to achieve sub-pixel accuracy in X, Y, orientation, and scale.

Point correlation combined with data and transformation pyramiding is capable of finding a trained template in an image with O(n^2.5) computations (translations: O(n); rotation and scale transformations: O(n); number of template points used in correlation: O(n^0.5)). In addition, the multimedia extensions of today's computers (e.g., MMX on Pentiums) help to speed up the repetitive multiplies. As a result, almost real-time performance is achieved.

Achieving Other Requirements

The vision industries need a pattern-matching algorithm that can handle non-square pixel data, perspective distortion, blurring, and nonlinear changes in brightness. The following sections describe vsfind's enhancements for handling these.

Perspective distortion and non-square pixel data

Calibration is recommended before training and run time. The benefits of calibration are:
1. Pixels do not have to be square.
2. A template can be trained on a full, interlaced image and later recognized in an image field.
3. Perspective distortion and skew are corrected, increasing accuracy.

Fig. 3 illustrates the effect of perspective distortion on shape.
A transformation that includes rotation, scaling, and correction for perspective distortion, combined with bilinear interpolation, is applied to the sparse template during the search and final refinement for high accuracy.

Fig. 3. Effect of perspective distortion on shape (object with perspective).

Recognizing blurred objects

Blurring occurs when the object scene is out of focus, or when the depth of field is narrow and objects vary in height. The typical effect of blurring is a loss of detail, like the output of a low-pass filter. Since the image pyramids are built by averaging, the points selected at the spatially reduced resolution are inherently tolerant to blurring. In addition, NGC's use of gray-scale data tolerates gradual variations such as the smoothing caused by blurring.

vsfind versus geometry-based finders

Machine vision is a crucial element of wafer manufacturing, for both alignment and inspection. Contrast can change nonlinearly and unpredictably during the manufacturing of a wafer. By operating on the edges of the image instead of the raw gray-scale data, NGC becomes invariant to nonlinear changes in brightness and contrast. During process variations, object interiors may exhibit unpredictable brightness changes; however, irrespective of the brightness changes, the object boundaries are always visible. Essentially, only the shapes of the objects remain the same.

Geometry-based finders handle unpredictable brightness variations by operating only on the shape features of a template. In this case, a shape is nothing but a linked list of significant edge pixels. Unless the complete geometry of the object to be recognized is acquired from a CAD model, how does a geometry-based finder know the object's exact geometry from a single image of the object? Quantization of the edges in the image creates subtle variations in lines and curves; shadows and highlights may look like geometric features but are in fact lighting artifacts. Is the shape in the following image (Fig. 4) a perfect square?
Fig. 4. Square or five-sided polygon?

To determine whether this is a square or a five-sided polygon, the user must explicitly edit the model or at least specify an error tolerance for fitting lines. Any geometry-based recognition algorithm must use an error tolerance for fitting lines, circular arcs, and other curves. This procedure introduces approximations (and errors) into the model before run-time recognition. Then, at run time, as lines and curves are fit to the edges in the new image, additional errors are introduced. NGC, on the other hand, operates on the image without estimating edge geometry; errors are not introduced during training or run time.

To handle nonlinear changes in contrast, vsfind operates on edge images (e.g., the result of a Sobel filter). This is typically the first step in geometry-based processing, too, because lines and curves are fit to the edges in the image. If a nonlinear contrast transformation is applied to the gray-scale image, it affects the magnitudes of the edges in the edge image, but not the sub-pixel locations of the edges. vsfind fits the gray-scale edges in the template to the gray-scale edges in the image for precise recognition and registration.

Finally, the cost of geometry-based analysis increases at least linearly with the complexity of the scene, which can be measured by the number of separate lines and curves in the image or by the number of pixels corresponding to edge zero crossings above some threshold. The speed of vsfind, however, is essentially independent of scene complexity because it is based on NGC.

Handling multilayer buildup in wafers

As semiconductor layers are added to a wafer, new features appear in the image. The edges in the image are first thresholded, using either an automatic or a specified threshold. Then isolated edge pixels are filtered out, and only the significant edge pixels are kept and used in training.
In this way, only these pixels (most probably the boundary pixels) are selected, and hence any objects that may clutter the interiors at a later time have no effect on the correlation score.
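A sketch of that edge preprocessing, under the assumption of a Sobel operator and a simple 8-neighbor isolation test (the paper does not spell out the actual patent-pending thresholding and filtering rules):

```python
import numpy as np

# 3x3 Sobel kernels for horizontal and vertical gradients.
KX = np.array([[-1.0, 0.0, 1.0],
               [-2.0, 0.0, 2.0],
               [-1.0, 0.0, 1.0]])
KY = KX.T

def sobel_magnitude(img):
    """Gradient magnitude via Sobel; border pixels are left at zero."""
    img = np.asarray(img, dtype=float)
    mag = np.zeros_like(img)
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            gx = (win * KX).sum()
            gy = (win * KY).sum()
            mag[y, x] = np.hypot(gx, gy)
    return mag

def significant_edges(img, threshold):
    """Threshold edge magnitudes, then drop isolated edge pixels."""
    edges = sobel_magnitude(img) > threshold
    keep = np.zeros_like(edges)
    for y in range(edges.shape[0]):
        for x in range(edges.shape[1]):
            if edges[y, x]:
                nbhd = edges[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
                if nbhd.sum() > 1:          # the pixel itself plus >= 1 neighbor
                    keep[y, x] = True
    return keep

# A bright square on a dark background: only its boundary survives.
scene = np.zeros((12, 12))
scene[3:9, 3:9] = 100.0
mask = significant_edges(scene, threshold=50.0)
```

Training on `mask` rather than the raw gray levels is what confines the selected points to object boundaries, so later interior clutter cannot perturb the correlation score.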
Handling multiple instances of a template

Since an exhaustive search is performed at the top of the data pyramid to locate the trained template, and the correlation scores are then sorted, no additional computations (except for the hill climbing) are necessary to find multiple instances of a template. The next-best matches are simply pursued through the successive levels of the data pyramid.

Handling partly visible, overlapping, touching templates

By definition, sparse (point) correlation correlates only on a set of pre-selected points to locate the best match. Hence, it can handle situations in which parts of the templates are cluttered, overlapping, touching, or not visible due to clipping by the field of view. However, if the portion of the template that is missing happens to be where most of the points were selected during training, the algorithm does fail. The training can, however, be forced to select uniformly distributed points to avoid this failure.

Handling templates of arbitrary shapes

In sparse correlation, the points that are not selected during training are masked out of the template data during the correlation-score computations. Users can also mask out portions of the template that are not important for recognition. In fact, any arbitrarily shaped template can be defined.

Performance of the Proposed Pattern-Matching Algorithm: vsfind

To test the performance of vsfind, the following target sheet (Fig. 5) was placed on a micrometer stage and repeatedly moved under the camera. For all tests, the square bulls-eye target in the middle was trained as the template. The template was 191 x 191 pixels within a 640 x 480 image. Accuracy tests were performed on both linear and rotating micrometer stages. As the target was moved under the camera, images were acquired and the template was found by vsfind. When the target was moved on the linear stage, vsfind was run with an angular range of ±0 degrees, so only an (X,Y) translation was sought.
When the target was moved on the rotating stage, both the angle and (X,Y) translation were sought.
Fig. 5. Target image used in the accuracy tests.

Twenty pictures each were taken, processed, and stored for the translation test and the rotation test. The accuracy parameter to vsfind (on a scale of 1 to 100) was set to 30 for these tests. For the translation test, the target was moved almost vertically in steps of about 1/3 pixel, and a line was least-squares fit to the twenty found locations. The "vsfind: Nonlinearity in Pixels" plot (Fig. 6) shows the deviation from the fitted line for each of the twenty sample images.

These tests raised questions about the accuracy of the micrometer stage and any minute motion of the camera. To answer them, the four squares in the corners of the images were used as fiducials. The centers of the four square fiducials were found by applying a high-precision line fitter to each side of each square. By intersecting the four lines around each square, four accurate corners were derived for each square. The (X,Y) locations of the four corners of each square were averaged to get the centroid of each square, and finally the centroids of the four squares were averaged to get a single (X,Y) location representing the centroid of the square fiducials. The linearity of the fiducials' travel on the micrometer stage was then analyzed in the same way as the linearity of the finder's results, above. The graph of nonlinearity is shown in Fig. 7.
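The corner-from-intersection computation used for the fiducials can be sketched as follows. This is a hypothetical helper, since the authors' high-precision line fitter is not described; here each side's line is fit by total least squares (smallest singular vector of the centered points):

```python
import numpy as np

def fit_line(points):
    """Total-least-squares line through points, as (unit normal n, offset c) with n.p = c."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The line normal is the singular vector of the centered points
    # with the smallest singular value.
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, float(n @ centroid)

def intersect(l1, l2):
    """Intersection point of two lines n.p = c."""
    (n1, c1), (n2, c2) = l1, l2
    return np.linalg.solve(np.array([n1, n2]), np.array([c1, c2]))

def square_centroid(sides):
    """Centroid of a quadrilateral from edge-point samples of its four sides, in order."""
    lines = [fit_line(s) for s in sides]
    corners = [intersect(lines[i], lines[(i + 1) % 4]) for i in range(4)]
    return np.mean(corners, axis=0)

# Demo: a unit square sampled along its four sides.
sides = [
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],   # bottom
    [(1.0, 0.0), (1.0, 0.5), (1.0, 1.0)],   # right
    [(1.0, 1.0), (0.5, 1.0), (0.0, 1.0)],   # top
    [(0.0, 1.0), (0.0, 0.5), (0.0, 0.0)],   # left
]
center = square_centroid(sides)
```

Averaging the four intersection-derived corners, and then the four fiducial centroids, suppresses independent per-edge noise, which is why the fiducial centroid serves as a stable reference for the error analysis that follows.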
Fig. 6. vsfind: nonlinearity in pixels (deviation vs. sample image; Std. Dev.: 0.016 pixels).

Fig. 7. Fiducials: nonlinearity in pixels (deviation vs. sample image; Std. Dev.: 0.013 pixels).

Clearly, the errors from vsfind and from the metrology (line-fitter) tools correspond, even though the two tools compute the target location using completely different software and methodologies. Consequently, movement of the camera or nonlinear movement of the stage caused some of the error. To determine an absolute error independent of these movements, the finder locations were simply subtracted from the combined fiducial centroids in each image. The plot of these relative errors is shown in Fig. 8. For the rotation test, the target was rotated in steps of about 0.42 degrees, so the twenty images covered a total of about 8 degrees of travel. Fig. 9 compares the angle reported by vsfind for the rotated template with the combined rotational angle of the four square fiducials.
When solving for rotation and translation rather than translation only, the error in translation increases, as depicted in Fig. 10.

Fig. 8. vsfind's error in translation relative to fiducials (error vs. sample image; Std. Dev.: 0.011 pixels).

Fig. 9. vsfind's error in rotation relative to fiducials (error vs. sample image, in degrees).
Fig. 10. vsfind's error in translation relative to fiducials while also solving for rotation (error vs. sample image; Std. Dev.: 0.023 pixels).

All of the above graphs illustrate the results of tests performed with the finder's accuracy parameter set to 30. The following table summarizes comparable results for accuracy settings of 10 and 20, including the search times on a 400 MHz Pentium II for an angular search range of ±45 degrees. Its columns are: accuracy setting; recognition time (ms) and one-standard-deviation (S.D.) error (pixels) for the translation test; and recognition time (ms), one-S.D. error in translation (pixels), and one-S.D. error in rotation (degrees) for the rotation test.

Summary and Conclusions

A pattern-matching technique called vsfind was described. vsfind extracts only the essential information from patterns through a patent-pending training process built upon normalized correlation (NGC). Using an intelligent search technique and normalized correlation as the matching criterion, vsfind locates trained patterns almost in real time, while maintaining high accuracy. vsfind compared favorably to geometry-based finders for accurately locating patterns on wafers for registration. Also described was the adaptation of NGC to meet the vision industries' requirements for invariance to scale, rotation, calibrated perspective distortion, focus, contrast reversal, missing features, overlapping and touching templates, and linear and nonlinear brightness changes.

References
1. Gleason, S.S., M.A. Hunt, and W.B. Jatko. 1990. Subpixel measurement of image features based on paraboloid surface fit. SPIE Machine Vision Systems Integration in Industry.
2. Gonzalez, R.C., and R.E. Woods. Digital Image Processing. Addison-Wesley Publishing Company, Inc., New York.
3. Jain, A.K. 1989. Fundamentals of Digital Image Processing. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
4. Krattenthaler, W., K.J. Mayer, and M. Zeiller. 1994. Point correlation: a reduced-cost template matching technique. First IEEE International Conference on Image Processing, November, Austin, Texas.
5. Rosenfeld, A., and A.C. Kak. 1982. Digital Picture Processing. Academic Press, New York.
6. Russ, J.C. 1992. The Image Processing Handbook. CRC Press Inc., Ann Arbor, Michigan.
More informationEE368 Project: Visual Code Marker Detection
EE368 Project: Visual Code Marker Detection Kahye Song Group Number: 42 Email: kahye@stanford.edu Abstract A visual marker detection algorithm has been implemented and tested with twelve training images.
More informationTracking Trajectories of Migrating Birds Around a Skyscraper
Tracking Trajectories of Migrating Birds Around a Skyscraper Brian Crombie Matt Zivney Project Advisors Dr. Huggins Dr. Stewart Abstract In this project, the trajectories of birds are tracked around tall
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 04 130131 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Histogram Equalization Image Filtering Linear
More informationCPSC 425: Computer Vision
CPSC 425: Computer Vision Image Credit: https://docs.adaptive-vision.com/4.7/studio/machine_vision_guide/templatematching.html Lecture 9: Template Matching (cont.) and Scaled Representations ( unless otherwise
More informationA New Technique of Extraction of Edge Detection Using Digital Image Processing
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) A New Technique of Extraction of Edge Detection Using Digital Image Processing Balaji S.C.K 1 1, Asst Professor S.V.I.T Abstract:
More informationEECS490: Digital Image Processing. Lecture #19
Lecture #19 Shading and texture analysis using morphology Gray scale reconstruction Basic image segmentation: edges v. regions Point and line locators, edge types and noise Edge operators: LoG, DoG, Canny
More informationTwo ducial marks are used to solve the image rotation []. Tolerance of about 1 is acceptable for NCS due to its excellent matching ability and reliabi
Fast Search Algorithms for IC rinted Mark Quality Inspection 1 Ming-Ching Chang Hsien-Yei Chen y Chiou-Shann Fuh z June 7, 1999 Abstract This paper presents an eective and general purpose search algorithm
More informationSUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS
SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract
More informationRange Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation
Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical
More informationImage Sampling and Quantisation
Image Sampling and Quantisation Introduction to Signal and Image Processing Prof. Dr. Philippe Cattin MIAC, University of Basel 1 of 46 22.02.2016 09:17 Contents Contents 1 Motivation 2 Sampling Introduction
More informationSolving Word Jumbles
Solving Word Jumbles Debabrata Sengupta, Abhishek Sharma Department of Electrical Engineering, Stanford University { dsgupta, abhisheksharma }@stanford.edu Abstract In this report we propose an algorithm
More informationMotion Estimation. There are three main types (or applications) of motion estimation:
Members: D91922016 朱威達 R93922010 林聖凱 R93922044 謝俊瑋 Motion Estimation There are three main types (or applications) of motion estimation: Parametric motion (image alignment) The main idea of parametric motion
More informationImage Sampling & Quantisation
Image Sampling & Quantisation Biomedical Image Analysis Prof. Dr. Philippe Cattin MIAC, University of Basel Contents 1 Motivation 2 Sampling Introduction and Motivation Sampling Example Quantisation Example
More informationMultimedia Technology CHAPTER 4. Video and Animation
CHAPTER 4 Video and Animation - Both video and animation give us a sense of motion. They exploit some properties of human eye s ability of viewing pictures. - Motion video is the element of multimedia
More informationCSE 252B: Computer Vision II
CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribes: Jeremy Pollock and Neil Alldrin LECTURE 14 Robust Feature Matching 14.1. Introduction Last lecture we learned how to find interest points
More informationFeature Detectors - Canny Edge Detector
Feature Detectors - Canny Edge Detector 04/12/2006 07:00 PM Canny Edge Detector Common Names: Canny edge detector Brief Description The Canny operator was designed to be an optimal edge detector (according
More informationComplex Sensors: Cameras, Visual Sensing. The Robotics Primer (Ch. 9) ECE 497: Introduction to Mobile Robotics -Visual Sensors
Complex Sensors: Cameras, Visual Sensing The Robotics Primer (Ch. 9) Bring your laptop and robot everyday DO NOT unplug the network cables from the desktop computers or the walls Tuesday s Quiz is on Visual
More informationDepth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth
Common Classification Tasks Recognition of individual objects/faces Analyze object-specific features (e.g., key points) Train with images from different viewing angles Recognition of object classes Analyze
More informationHUMAN COMPUTER INTERFACE BASED ON HAND TRACKING
Proceedings of MUSME 2011, the International Symposium on Multibody Systems and Mechatronics Valencia, Spain, 25-28 October 2011 HUMAN COMPUTER INTERFACE BASED ON HAND TRACKING Pedro Achanccaray, Cristian
More informationSchedule for Rest of Semester
Schedule for Rest of Semester Date Lecture Topic 11/20 24 Texture 11/27 25 Review of Statistics & Linear Algebra, Eigenvectors 11/29 26 Eigenvector expansions, Pattern Recognition 12/4 27 Cameras & calibration
More informationVisionGauge OnLine Spec Sheet
VisionGauge OnLine Spec Sheet VISIONx INC. www.visionxinc.com Powerful & Easy to Use Intuitive Interface VisionGauge OnLine is a powerful and easy-to-use machine vision software for automated in-process
More informationHistograms of Oriented Gradients
Histograms of Oriented Gradients Carlo Tomasi September 18, 2017 A useful question to ask of an image is whether it contains one or more instances of a certain object: a person, a face, a car, and so forth.
More informationCapturing, Modeling, Rendering 3D Structures
Computer Vision Approach Capturing, Modeling, Rendering 3D Structures Calculate pixel correspondences and extract geometry Not robust Difficult to acquire illumination effects, e.g. specular highlights
More informationEstimation of common groundplane based on co-motion statistics
Estimation of common groundplane based on co-motion statistics Zoltan Szlavik, Laszlo Havasi 2, Tamas Sziranyi Analogical and Neural Computing Laboratory, Computer and Automation Research Institute of
More informationTexture Image Segmentation using FCM
Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M
More informationUnderstanding Tracking and StroMotion of Soccer Ball
Understanding Tracking and StroMotion of Soccer Ball Nhat H. Nguyen Master Student 205 Witherspoon Hall Charlotte, NC 28223 704 656 2021 rich.uncc@gmail.com ABSTRACT Soccer requires rapid ball movements.
More informationHISTOGRAMS OF ORIENTATIO N GRADIENTS
HISTOGRAMS OF ORIENTATIO N GRADIENTS Histograms of Orientation Gradients Objective: object recognition Basic idea Local shape information often well described by the distribution of intensity gradients
More informationAvailable online at ScienceDirect. Energy Procedia 69 (2015 )
Available online at www.sciencedirect.com ScienceDirect Energy Procedia 69 (2015 ) 1885 1894 International Conference on Concentrating Solar Power and Chemical Energy Systems, SolarPACES 2014 Heliostat
More informationDigital Makeup Face Generation
Digital Makeup Face Generation Wut Yee Oo Mechanical Engineering Stanford University wutyee@stanford.edu Abstract Make up applications offer photoshop tools to get users inputs in generating a make up
More informationSegmentation and Tracking of Partial Planar Templates
Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract
More informationChapter 11 Arc Extraction and Segmentation
Chapter 11 Arc Extraction and Segmentation 11.1 Introduction edge detection: labels each pixel as edge or no edge additional properties of edge: direction, gradient magnitude, contrast edge grouping: edge
More informationCentre for Digital Image Measurement and Analysis, School of Engineering, City University, Northampton Square, London, ECIV OHB
HIGH ACCURACY 3-D MEASUREMENT USING MULTIPLE CAMERA VIEWS T.A. Clarke, T.J. Ellis, & S. Robson. High accuracy measurement of industrially produced objects is becoming increasingly important. The techniques
More informationBlur Space Iterative De-blurring
Blur Space Iterative De-blurring RADU CIPRIAN BILCU 1, MEJDI TRIMECHE 2, SAKARI ALENIUS 3, MARKKU VEHVILAINEN 4 1,2,3,4 Multimedia Technologies Laboratory, Nokia Research Center Visiokatu 1, FIN-33720,
More informationDiscuss Proven technologies that addresses
Robotics and Machine Vision for assembly -Auto Teach, Vision guidance, Color & 3D Mar 5-12 2007 Agenda Discuss Proven technologies that addresses o Quick Tool Bring up o Using Non-touch Vision based Auto
More informationComputer Vision Lecture 17
Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics 13.01.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar in the summer semester
More informationThe Lucas & Kanade Algorithm
The Lucas & Kanade Algorithm Instructor - Simon Lucey 16-423 - Designing Computer Vision Apps Today Registration, Registration, Registration. Linearizing Registration. Lucas & Kanade Algorithm. 3 Biggest
More informationECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS
ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/ TEXTURE ANALYSIS Texture analysis is covered very briefly in Gonzalez and Woods, pages 66 671. This handout is intended to supplement that
More informationComputer Vision Lecture 17
Announcements Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics Seminar in the summer semester Current Topics in Computer Vision and Machine Learning Block seminar, presentations in 1 st week
More informationAn Image Based Approach to Compute Object Distance
An Image Based Approach to Compute Object Distance Ashfaqur Rahman * Department of Computer Science, American International University Bangladesh Dhaka 1213, Bangladesh Abdus Salam, Mahfuzul Islam, and
More informationAnno accademico 2006/2007. Davide Migliore
Robotica Anno accademico 6/7 Davide Migliore migliore@elet.polimi.it Today What is a feature? Some useful information The world of features: Detectors Edges detection Corners/Points detection Descriptors?!?!?
More informationAutomatic Hybrid Genetic Algorithm Based Printed Circuit Board Inspection
Automatic Hybrid Genetic Algorithm Based Printed Circuit Board Inspection Syamsiah Mashohor, Jonathan R. Evans and Ahmet T. Erdogan School of Engineering and Electronics University of Edinburgh Edinburgh
More informationFeature Detectors - Sobel Edge Detector
Page 1 of 5 Sobel Edge Detector Common Names: Sobel, also related is Prewitt Gradient Edge Detector Brief Description The Sobel operator performs a 2-D spatial gradient measurement on an image and so emphasizes
More informationDetection of Edges Using Mathematical Morphological Operators
OPEN TRANSACTIONS ON INFORMATION PROCESSING Volume 1, Number 1, MAY 2014 OPEN TRANSACTIONS ON INFORMATION PROCESSING Detection of Edges Using Mathematical Morphological Operators Suman Rani*, Deepti Bansal,
More informationRecall: Derivative of Gaussian Filter. Lecture 7: Correspondence Matching. Observe and Generalize. Observe and Generalize. Observe and Generalize
Recall: Derivative of Gaussian Filter G x I x =di(x,y)/dx Lecture 7: Correspondence Matching Reading: T&V Section 7.2 I(x,y) G y convolve convolve I y =di(x,y)/dy Observe and Generalize Derivative of Gaussian
More informationComparison of Some Motion Detection Methods in cases of Single and Multiple Moving Objects
Comparison of Some Motion Detection Methods in cases of Single and Multiple Moving Objects Shamir Alavi Electrical Engineering National Institute of Technology Silchar Silchar 788010 (Assam), India alavi1223@hotmail.com
More informationSegmentation of Range Data for the Automatic Construction of Models of Articulated Objects
Segmentation of Range Data for the Automatic Construction of Models of Articulated Objects A. P. Ashbrook Department of Artificial Intelligence The University of Edinburgh Edinburgh, Scotland anthonya@dai.ed.ac.uk
More informationPerspective Projection Describes Image Formation Berthold K.P. Horn
Perspective Projection Describes Image Formation Berthold K.P. Horn Wheel Alignment: Camber, Caster, Toe-In, SAI, Camber: angle between axle and horizontal plane. Toe: angle between projection of axle
More informationCS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University
CS443: Digital Imaging and Multimedia Binary Image Analysis Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University Outlines A Simple Machine Vision System Image segmentation by thresholding
More informationAdvanced Vision System Integration. David Dechow Staff Engineer, Intelligent Robotics/Machine Vision FANUC America Corporation
Advanced Vision System Integration David Dechow Staff Engineer, Intelligent Robotics/Machine Vision FANUC America Corporation Advanced Vision System Integration INTRODUCTION AND REVIEW Introduction and
More informationDigital Image Processing Fundamentals
Ioannis Pitas Digital Image Processing Fundamentals Chapter 7 Shape Description Answers to the Chapter Questions Thessaloniki 1998 Chapter 7: Shape description 7.1 Introduction 1. Why is invariance to
More informationRKUniversity, India. Key Words Digital image processing, Image enhancement, FPGA, Hardware design languages, Verilog.
Volume 4, Issue 2, February 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Image Enhancement
More informationELEC Dr Reji Mathew Electrical Engineering UNSW
ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion
More informationOptical Flow Estimation with CUDA. Mikhail Smirnov
Optical Flow Estimation with CUDA Mikhail Smirnov msmirnov@nvidia.com Document Change History Version Date Responsible Reason for Change Mikhail Smirnov Initial release Abstract Optical flow is the apparent
More informationDrywall state detection in image data for automatic indoor progress monitoring C. Kropp, C. Koch and M. König
Drywall state detection in image data for automatic indoor progress monitoring C. Kropp, C. Koch and M. König Chair for Computing in Engineering, Department of Civil and Environmental Engineering, Ruhr-Universität
More informationHOUGH TRANSFORM CS 6350 C V
HOUGH TRANSFORM CS 6350 C V HOUGH TRANSFORM The problem: Given a set of points in 2-D, find if a sub-set of these points, fall on a LINE. Hough Transform One powerful global method for detecting edges
More information3D Rasterization II COS 426
3D Rasterization II COS 426 3D Rendering Pipeline (for direct illumination) 3D Primitives Modeling Transformation Lighting Viewing Transformation Projection Transformation Clipping Viewport Transformation
More informationCS 534: Computer Vision Texture
CS 534: Computer Vision Texture Spring 2004 Ahmed Elgammal Dept of Computer Science CS 534 Ahmed Elgammal Texture - 1 Outlines Finding templates by convolution What is Texture Co-occurrence matrecis for
More informationError-Diffusion Robust to Mis-Registration in Multi-Pass Printing
Error-Diffusion Robust to Mis-Registration in Multi-Pass Printing Zhigang Fan, Gaurav Sharma, and Shen-ge Wang Xerox Corporation Webster, New York Abstract Error-diffusion and its variants are commonly
More informationExperiments with Edge Detection using One-dimensional Surface Fitting
Experiments with Edge Detection using One-dimensional Surface Fitting Gabor Terei, Jorge Luis Nunes e Silva Brito The Ohio State University, Department of Geodetic Science and Surveying 1958 Neil Avenue,
More informationManipulating the Boundary Mesh
Chapter 7. Manipulating the Boundary Mesh The first step in producing an unstructured grid is to define the shape of the domain boundaries. Using a preprocessor (GAMBIT or a third-party CAD package) you
More informationBiometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)
Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) References: [1] http://homepages.inf.ed.ac.uk/rbf/hipr2/index.htm [2] http://www.cs.wisc.edu/~dyer/cs540/notes/vision.html
More informationEdge detection. Stefano Ferrari. Università degli Studi di Milano Elaborazione delle immagini (Image processing I)
Edge detection Stefano Ferrari Università degli Studi di Milano stefano.ferrari@unimi.it Elaborazione delle immagini (Image processing I) academic year 2011 2012 Image segmentation Several image processing
More informationCS4733 Class Notes, Computer Vision
CS4733 Class Notes, Computer Vision Sources for online computer vision tutorials and demos - http://www.dai.ed.ac.uk/hipr and Computer Vision resources online - http://www.dai.ed.ac.uk/cvonline Vision
More informationHIGH SPEED WEIGHT ESTIMATION BY IMAGE ANALYSIS
Abstract HIGH SPEED WEIGHT ESTIMATION BY IMAGE ANALYSIS D.G. Bailey*, K.A. Mercer*, C. Plaw*, R. Ball**, H. Barraclough** *Institute of Information Sciences and Technology Massey University **Institute
More informationDEFORMABLE MATCHING OF HAND SHAPES FOR USER VERIFICATION. Ani1 K. Jain and Nicolae Duta
DEFORMABLE MATCHING OF HAND SHAPES FOR USER VERIFICATION Ani1 K. Jain and Nicolae Duta Department of Computer Science and Engineering Michigan State University, East Lansing, MI 48824-1026, USA E-mail:
More informationImage Segmentation. 1Jyoti Hazrati, 2Kavita Rawat, 3Khush Batra. Dronacharya College Of Engineering, Farrukhnagar, Haryana, India
Image Segmentation 1Jyoti Hazrati, 2Kavita Rawat, 3Khush Batra Dronacharya College Of Engineering, Farrukhnagar, Haryana, India Dronacharya College Of Engineering, Farrukhnagar, Haryana, India Global Institute
More informationCHAPTER 5 MOTION DETECTION AND ANALYSIS
CHAPTER 5 MOTION DETECTION AND ANALYSIS 5.1. Introduction: Motion processing is gaining an intense attention from the researchers with the progress in motion studies and processing competence. A series
More informationA threshold decision of the object image by using the smart tag
A threshold decision of the object image by using the smart tag Chang-Jun Im, Jin-Young Kim, Kwan Young Joung, Ho-Gil Lee Sensing & Perception Research Group Korea Institute of Industrial Technology (
More informationCHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS
CHAPTER 6 PERCEPTUAL ORGANIZATION BASED ON TEMPORAL DYNAMICS This chapter presents a computational model for perceptual organization. A figure-ground segregation network is proposed based on a novel boundary
More informationAssignment 3: Edge Detection
Assignment 3: Edge Detection - EE Affiliate I. INTRODUCTION This assignment looks at different techniques of detecting edges in an image. Edge detection is a fundamental tool in computer vision to analyse
More informationModel Based Perspective Inversion
Model Based Perspective Inversion A. D. Worrall, K. D. Baker & G. D. Sullivan Intelligent Systems Group, Department of Computer Science, University of Reading, RG6 2AX, UK. Anthony.Worrall@reading.ac.uk
More informationImage Segmentation Techniques for Object-Based Coding
Image Techniques for Object-Based Coding Junaid Ahmed, Joseph Bosworth, and Scott T. Acton The Oklahoma Imaging Laboratory School of Electrical and Computer Engineering Oklahoma State University {ajunaid,bosworj,sacton}@okstate.edu
More informationA Survey of Light Source Detection Methods
A Survey of Light Source Detection Methods Nathan Funk University of Alberta Mini-Project for CMPUT 603 November 30, 2003 Abstract This paper provides an overview of the most prominent techniques for light
More information