Improving object recognition accuracy and speed through non-uniform sampling


Boaz J. Super*
Computer Vision and Robotics Laboratory, Computer Science Department, University of Illinois at Chicago, 851 South Morgan St., Chicago, IL, USA

ABSTRACT

Silhouette-based shape retrieval and recognition have been well studied, because silhouettes are compact representations of object shape, and because they can be reliably extracted in controlled-environment applications such as digitizing museum collections. In past work, we developed a fast and accurate method for retrieval and recognition of object silhouettes and other closed planar contours. The method is based on a combination of alignment, correspondence, eigenspace dimensionality reduction, and example-based retrieval. Its efficiency and accuracy result from the particular forms of each of these components and the way they are combined. This paper presents two improvements to the method: non-uniform sampling and a new similarity measure. The improved method ranks first in retrieval accuracy in comparison with eight prior methods tested on a benchmark database of 1,400 shapes. Its classification accuracy is 96.8% for the first-ranked class hypothesis, and it returns the correct classification in the top ten hypotheses 99.8% of the time. Average time for retrieval and recognition is approximately 0.6 seconds in Matlab on a PC.

Keywords: shape classification, shape retrieval, shape matching, object recognition, silhouette, alignment, correspondence, dimensionality reduction, eigenspace methods, principal component analysis

1. INTRODUCTION

Several applications of shape matching, such as visual information retrieval and object recognition, must solve a one-to-many matching problem: a query shape is compared to many shapes in a database to find the one(s) most similar to the query.
As shape databases grow in size, it becomes infeasible to compare the query with every database shape, so techniques such as database clustering, indexing, and prefiltering are used to reduce the number of pairwise matches. Even with such techniques, however, a pairwise shape matcher that is both accurate and fast is essential: for a given time constraint determined by the application, the faster of two equally accurate shape matchers allows more pairwise matches to be tested, increasing the probability of finding a better match.

The aim of our recent research is to develop a pairwise shape matcher for the one-to-many shape matching problem. Recent experiments have shown that it achieves both high accuracy and high speed. In fact, the method in Super 23 is so fast that we have not yet found it necessary to implement database clustering, indexing, or prefiltering: for example, a query shape can be matched to all 1,400 shapes in an MPEG-7 test database in about 2/3 of a second on a personal computer. The method ranked second in accuracy in a comparison with eight prior methods, but it is much faster than the most accurate method. This paper presents two enhancements to the original system: non-uniform shape sampling and a new similarity measure. We show that these techniques increase the accuracy of the system while maintaining its high speed. With these enhancements, the proposed method is more accurate than all eight prior methods.

* super@uic.edu; Phone: (312)
Preprint. This paper will be published in the Proceedings of the SPIE Conference on Intelligent Robots and Computer Vision XXI: Algorithms, Techniques, and Active Vision, Providence, RI, October 27-31, 2003.

In its current form, the system is designed to match shapes represented as closed planar contours. Typically these contours will be silhouettes of 2-D or 3-D objects. It is an example-based system: representative examples and views of objects in a class are included in a database, allowing the system to handle both shape and viewpoint variation. Databases of object silhouettes can be created by imaging individual objects against contrasting backgrounds, so that their silhouettes can be reliably extracted by automatic thresholding and boundary-following algorithms. A turntable can be used to acquire multiple viewpoints of an object. 26 Museum objects, scientific specimens, and other object collections can be acquired in this way.

Two basic operations supported by a shape database are retrieval and recognition. Retrieval means finding individual instances in the database that are similar to the query shape. Recognition means identifying the type of object, or class, of a query shape.* In an example-based method such as ours, recognition is performed by returning the label of the database shape most similar to the query shape. Recognition requires that the objects in the database be labeled by class, whereas retrieval requires no class labels. The system described here can be used for both retrieval and recognition.

The shape matching method has three components: (1) alignment of shapes within a canonical reference frame, (2) correspondence matching of the resulting invariant curves, and (3) eigenspace-based dimensionality reduction. Its speed comes from three choices: a normalization procedure in Step 1 that generates a number of poses that grows only linearly with the number of key points, the use of fixed correspondences as an approximation to flexible correspondences in Step 2, and the dimensionality reduction in Step 3. Retrieval is performed by using the shape matching method to compare the query shape with the database shapes.
Recognition is performed using retrieval as a first step, to find database examples similar to the shape to be classified. We call the system RACER, for Recognition by Alignment, Correspondence, Eigenspace, and Retrieval.

Related work is discussed in Section 2, and the original RACER system is reviewed in Section 3. The two enhancements to the system are described and motivated in Section 4. Section 5 evaluates the accuracy and speed of the improved system in retrieval and recognition tasks.

2. RELATED WORK

Shape matching methods have been studied for object recognition, information retrieval, and other applications. Many shape matching methods can be classified as feature, structural, transform, or direct matching approaches.

Feature methods compare vectors of shape measurements such as circularity, eccentricity, and moments. 8,9 Feature methods are fast but have relatively low discriminability, which has limited their accuracy.

Structural methods represent shapes as parts and relationships, using graphs, trees, or strings. 4,13,27 Because they explicitly model shape structures and differences between structures, these methods can handle significant shape variation. However, structural matching methods tend to be computationally costly. A compromise approach is to use parts and not relations. For example, shapes can be indexed by the parts they contain. 21,22 Part-based indexing is fast, but because it uses less information, it must be used with other techniques to achieve acceptable accuracy.

Transform methods match shapes represented in alternative domains; for example, by Fourier 17 or wavelet 5 decompositions, or by curvature scale-space images. 16 For these methods, the cost of generating the alternative representation of the query shape must be considered.

Direct matching methods match shapes in their original form as spatial data. These methods align (register) two shapes and measure their residual difference. There are two general approaches.
One approach searches for the best correspondence and alignment together. 3 This approach is too costly to be practical for the one-to-many shape matching problem. The other approach uses minimal-size sets of shape points to align the two shapes. 11 Two, three, and four points are needed for planar similarity, affine, and projective invariance, respectively. Since multiple pairings of ordered point sets from the two shapes must be tested to find a good alignment, this approach is also costly, even with the use of methods that have been developed to reduce the cost (e.g., refs. [12, 18]). Nevertheless, the direct matching approach is appealing because the cost of generating alternative representations (such as structural or scale-space representations) is avoided, and because perceptually similar shapes are often spatially similar up to a low-order transformation. The development of RACER was intended to take advantage of these benefits, by finding ways to perform direct matching fast enough for one-to-many shape matching. The main result of Super is that the accuracy and speed of a direct matching approach can be competitive with other approaches for shape recognition and retrieval.

* In this paper, recognition is synonymous with classification. Some authors use recognition to mean the identification of a specific instance of an object instead.

3. SHAPE RETRIEVAL AND CLASSIFICATION SYSTEM

To make this paper self-contained, this section reviews the original RACER method presented in Super. The method has three major components: alignment, correspondence, and dimensionality reduction.

3.1 Alignment

Sampling
Let $s$ denote a closed planar contour representing a shape. Each contour $s$ is re-sampled with $n$ points at uniform intervals of arc length. The number of sample points $n$ should be large enough to capture the perceptually significant structure of the shapes in the database; we have found $n = 100$ or $200$ to be sufficient on all examples we have tested. Discretization noise and small-scale variation are removed by smoothing the x- and y-coordinate functions of the contour with a small Gaussian filter. In the experiments, the $\sigma$ of the Gaussian was always set to 1.5% of $n$ (i.e., 1.5 sample spacings when $n = 100$).

Detection of key points
The method depends on detecting a small set of key points on the contour. Ideally, key points should be distinctive and should persist across minor variation in the shapes and viewpoints of objects in a class.
The current system uses positive local maxima and negative local minima of curvature, which often correspond to projections and concavities of the shape. Other types of key points could also be used. Key points that lie on shallow projections or concavities are discarded: specifically, a key point is discarded if it lies closer than $\sigma$ to the chord connecting its neighboring key points (Figure 1a). The remaining key points are mapped to their locations on the original, unsmoothed contour $s$. This set of key points will be denoted by $C$ (Figure 1b).

Normalization and pose generation
Each shape contour is normalized with respect to position, orientation, scale, reflection, and cyclic starting point by transforming it to a canonical reference frame and renumbering its sample points. The following procedure is used:
(1) A key point $x_{\mathrm{key}}$ is selected from $C$.
(2) The point $y$ on $s$ that is furthest from $x_{\mathrm{key}}$ is found; i.e., $y = \arg\max_{x \in s} \|x - x_{\mathrm{key}}\|^2$. The point $y$ is selected from the set of all sample points of the contour, not just the key points.
(3) The shape $s$ is translated, rotated, and scaled such that $x_{\mathrm{key}}$ is mapped to $(0,0)$ and $y$ is mapped to $(1,0)$.
(4) The transformed sample point furthest from the x-axis is found; this point will be denoted $z$. If $z$ is below the x-axis, the transformed shape is reflected about the x-axis, so that the point furthest from the x-axis always lies above it.

(5) The sample points of the normalized contour are renumbered from the origin (i.e., from the transformed key point $x_{\mathrm{key}}$), always in the same direction (clockwise or counterclockwise). The renumbered sample points will be denoted by $x_1, \ldots, x_n = (x_1, y_1), \ldots, (x_n, y_n)$.

Steps 1-3 provide invariance to position, orientation, and scale. Step 4 enables a query shape to match a reflected version in the database. Step 5 provides invariance to starting point. Figures 1c-e show examples of normalized contours.

The triple $(x_{\mathrm{key}}, y, z)$ determines the transformation of a contour to a normalized contour. Since the points $y$ and $z$ are found by a deterministic geometric construction, the choice of key point $x_{\mathrm{key}}$ determines the normalized contour of a particular shape. Thus, each key point generates one normalized contour of a shape. As a convenient shorthand, the term pose, which usually refers to a transformation, will be used to refer to the transformed (normalized) contour instead. In our experiments, there were typically poses per shape.

Single key point alignment
The core operation of the RACER system is matching a pair of poses from two shapes. Since each pose of a shape is determined by one key point, the choice of one key point from shape $s_1$ and one key point from shape $s_2$ determines the relative alignment of $s_1$ and $s_2$ within the canonical reference frame. We refer to this as single key point alignment. The key points of the two poses coincide at the origin of the canonical frame (Figure 2).

The use of only one key point to determine a pose is critical to the efficiency of the method. The number of poses generated is linear in the number of key points, in contrast with methods that generate poses from pairs or triples of key points (e.g., refs. [12, 18]). As a result, the number of pose-to-pose matches of two shapes is quadratic in the number of key points instead of quartic or higher. For example, two shapes with 15 key points each yield $15 \times 15 = 225$ pose pairs, whereas poses built from ordered key-point pairs would yield $(15 \cdot 14)^2 = 44{,}100$.

Figure 1. (a) Key points that are closer than $\sigma$ to the line connecting their neighboring key points are discarded. (b) The final set of key points $C$ on an example shape contour. Key points often lie on significant projections or concavities of the shape. (c)-(e) Three of the 15 normalized contours, or poses, of the shape shown in (b). There is one pose per key point. The transformed positions of $x_{\mathrm{key}}$, $y$, and $z$ are shown.
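The Sampling and key-point steps of Section 3.1 can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the author's Matlab implementation: the function names are hypothetical, curvature is estimated by central differences, the $\sigma$ threshold for shallow key points (defined in samples) is converted to length units by multiplying by the sample spacing, and shallow key points are pruned greedily (smallest chord distance first) rather than in one pass.

```python
# Illustrative sketch of Section 3.1 sampling and key-point detection.
# Function names and the greedy pruning order are assumptions, not the paper's code.
import math

def resample_uniform(points, n):
    """Resample a closed polygon at n points spaced uniformly in arc length."""
    m = len(points)
    cum = [0.0]  # cumulative arc length at each vertex; the last edge closes the loop
    for i in range(m):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % m]
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total, out, seg = cum[-1], [], 0
    for k in range(n):
        t = total * k / n  # arc length of the k-th sample
        while cum[seg + 1] < t:
            seg += 1
        (x0, y0), (x1, y1) = points[seg], points[(seg + 1) % m]
        span = cum[seg + 1] - cum[seg]
        u = 0.0 if span == 0 else (t - cum[seg]) / span
        out.append((x0 + u * (x1 - x0), y0 + u * (y1 - y0)))
    return out

def smooth_closed(values, sigma):
    """Circular convolution of a periodic sequence with a normalized Gaussian."""
    n, r = len(values), max(1, math.ceil(3 * sigma))
    w = {j: math.exp(-0.5 * (j / sigma) ** 2) for j in range(-r, r + 1)}
    norm = sum(w.values())
    return [sum(wj * values[(i + j) % n] for j, wj in w.items()) / norm for i in range(n)]

def curvature(xs, ys):
    """Signed curvature by central differences (positive on convex parts of a CCW contour)."""
    n, kappa = len(xs), []
    for i in range(n):
        xp, xm, yp, ym = xs[(i + 1) % n], xs[i - 1], ys[(i + 1) % n], ys[i - 1]
        dx, dy = (xp - xm) / 2, (yp - ym) / 2
        ddx, ddy = xp - 2 * xs[i] + xm, yp - 2 * ys[i] + ym
        kappa.append((dx * ddy - dy * ddx) / max((dx * dx + dy * dy) ** 1.5, 1e-12))
    return kappa

def chord_distance(p, a, b):
    """Distance from point p to the chord (segment) from a to b."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    len2 = vx * vx + vy * vy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / len2))
    return math.hypot(p[0] - a[0] - t * vx, p[1] - a[1] - t * vy)

def key_points(pts, min_dist, eps=1e-9):
    """Indices of positive curvature maxima and negative minima, minus shallow ones."""
    kappa, n = curvature([p[0] for p in pts], [p[1] for p in pts]), len(pts)
    keys = [i for i in range(n)
            if (kappa[i] > eps and kappa[i] > kappa[i - 1] and kappa[i] >= kappa[(i + 1) % n])
            or (kappa[i] < -eps and kappa[i] < kappa[i - 1] and kappa[i] <= kappa[(i + 1) % n])]
    while len(keys) > 3:  # greedily drop the shallowest key point while it is below min_dist
        d = [chord_distance(pts[keys[i]], pts[keys[i - 1]], pts[keys[(i + 1) % len(keys)]])
             for i in range(len(keys))]
        i_min = min(range(len(keys)), key=d.__getitem__)
        if d[i_min] >= min_dist:
            break
        del keys[i_min]
    return keys

# Usage on a 4 x 2 rectangle traced counter-clockwise (perimeter 12).
n = 100
sigma = 0.015 * n                                   # 1.5% of n, measured in samples
pts = resample_uniform([(0, 0), (4, 0), (4, 2), (0, 2)], n)
smoothed = list(zip(smooth_closed([p[0] for p in pts], sigma),
                    smooth_closed([p[1] for p in pts], sigma)))
keys = key_points(smoothed, min_dist=sigma * 12.0 / n)  # sigma times the sample spacing
```

Because smoothing keeps the sample indexing, the indices in `keys` also address the unsmoothed samples `pts`, which is one simple way to map key points back to the original contour as the text describes.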
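Steps 1-5 of the normalization procedure can be sketched with Python complex numbers, which turn the similarity transform of Step 3 into a single division. This is a hedged illustration rather than the paper's code: `normalize_pose`, `pose_vector`, and `similarity` are hypothetical names, and "always in the same direction" in Step 5 is assumed here to mean counter-clockwise in the canonical frame, decided from the sign of the signed area after the optional reflection.

```python
# Illustrative sketch of normalization Steps 1-5; names are hypothetical.
import math

def normalize_pose(pts, key):
    """Map a closed contour (list of (x, y) samples) to the canonical frame of one key point."""
    z = [complex(x, y) for x, y in pts]
    zk = z[key]                                         # step 1: chosen key point x_key
    zy = max(z, key=lambda p: abs(p - zk))              # step 2: furthest sample point y
    w = [(p - zk) / (zy - zk) for p in z]               # step 3: x_key -> (0,0), y -> (1,0)
    if max(w, key=lambda p: abs(p.imag)).imag < 0:      # step 4: point z furthest from x-axis
        w = [p.conjugate() for p in w]                  #         reflect so that z lies above it
    n = len(w)
    twice_area = sum((w[i].conjugate() * w[(i + 1) % n]).imag for i in range(n))
    step = 1 if twice_area > 0 else -1                  # step 5: number counter-clockwise
    ordered = [w[(key + step * i) % n] for i in range(n)]   # starting from x_key
    return [(p.real, p.imag) for p in ordered]

def pose_vector(pose):
    """X = (x_1, ..., x_n, y_1, ..., y_n), the 2n-vector used for PCA in Section 3.3."""
    return [x for x, _ in pose] + [y for _, y in pose]

# Rotating, scaling, translating, re-starting, or reflecting the contour
# leaves the pose of a corresponding key point unchanged.
shape = [(0, 0), (3, 0), (4, 1), (2, 3), (1, 2.5), (-0.5, 1.2)]
a, b = 2 * complex(math.cos(0.5), math.sin(0.5)), complex(5, -1)

def similarity(p):
    """Hypothetical test transform: rotate by 0.5 rad, scale by 2, translate by (5, -1)."""
    q = a * complex(*p) + b
    return (q.real, q.imag)

moved = [similarity(shape[(i + 2) % 6]) for i in range(6)]  # original index 2 -> index 0
mirrored = [(x, -y) for x, y in reversed(shape)]             # original index 2 -> index 3
pose = normalize_pose(shape, 2)
pose_moved = normalize_pose(moved, 0)
pose_mirrored = normalize_pose(mirrored, 3)
```

Each shape yields one such pose per key point, so matching two shapes compares every pose of one with every pose of the other.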

Figure 2. Fixed-correspondence match between corresponding poses of two similar shapes. The key points of the two poses are at the origin (indicated by the dot).

Figure 3. Example of non-uniform sampling of a pose. The key point is at the origin.

The use of only one key point to determine a pose also contributes to the accuracy of the method. A practical distinction between a key point and a sample point is that a segmentation or detection procedure is required to find the key points. Therefore, reliance on pairs or triples of key points would increase the probability of failing to find corresponding poses of two similar shapes.

3.2 Correspondence

Poses are compared by computing the sum of squared differences (SSD) of corresponding points. Computing flexible correspondences between points (e.g., ref. [3]) is computationally too costly for one-to-many shape matching. Therefore, fixed correspondence is used: i.e., the i-th point of one pose is matched to the i-th point of another pose (Figure 2). One of the reasons for renumbering the sample points in Step 5 of the normalization procedure is to make fixed correspondence possible.

3.3 Eigenspace dimensionality reduction

Even with fixed correspondences, matching is too slow for typical numbers of sample points. The consistent numbering of the sample points, combined with the use of fixed correspondences, makes it possible to use principal component analysis 7 (PCA) to reduce the dimensionality of the pose matching problem. It is convenient to represent each pose by a pose vector $X = (x_1, \ldots, x_n, y_1, \ldots, y_n)^T$ of length $2n$. A pose vector represents the normalized contour as a single point in $2n$-dimensional space. The SSD between two poses $p_1$ and $p_2$ from shapes $s_1$ and $s_2$ is given by $(X_1 - X_2)^T (X_1 - X_2)$.

The PCA of the database of pose vectors is computed as follows. Let $DB_X = \{X_1, \ldots, X_N\}$ be the set of all pose vectors of all shapes in the database, and let $\bar{X}$ be its mean.
Let $A$ denote the matrix whose columns are the vectors $X - \bar{X}$ for all $X \in DB_X$. Let $S$ denote the scatter matrix $AA^T$, with eigenvector matrix $P$ and eigenvalue matrix $\Lambda$, so that $S = P \Lambda P^T$. Let $P_K$ contain the $K$ eigenvectors corresponding to the $K$ largest eigenvalues, for $K \le 2n$. These are the first $K$ principal components. It can be shown that the best $K$-dimensional approximation of the pose vectors in $DB_X$, in a sum-of-squared-errors sense, is obtained by projecting $DB_X$ to the subspace spanned by the eigenvectors in $P_K$. 7 We will refer to this subspace as the eigenspace from now on. Then the $K$-dimensional approximation $\hat{X}$ of an arbitrary pose vector $X$ can be written:

$$\hat{X} = \bar{X} + P_K a_K \approx X, \qquad (1)$$
where $a_K$ is a coefficient vector of length $K$ that multiplies the $K$ principal component vectors; because the eigenvectors are orthonormal, $a_K = P_K^T (X - \bar{X})$. A pair of poses $p_1$ and $p_2$ are compared by computing the sum of squared differences of the coefficient vectors, which approximates the SSD of corresponding sample points of the pose vectors:
$$D_{pose}(p_1, p_2) = (a_{K1} - a_{K2})^T (a_{K1} - a_{K2}) \approx (X_1 - X_2)^T (X_1 - X_2), \qquad (2)$$
where the subscripts 1 and 2 identify the pose. The approximation in (2) can be derived from (1) using the orthonormality of the eigenvectors. The difference between a pair of shapes $s_1$ and $s_2$ is defined as the difference between the best-matching pair of poses from those shapes; i.e.,
$$D_{shape}(s_1, s_2) = \min_{i,j} D_{pose}(p_1^i, p_2^j), \qquad (3)$$
where $p_1^i$ and $p_2^j$ are poses from $s_1$ and $s_2$, respectively. In other words, this is the match resulting from the best single key point alignment of the two shapes.

The system stores the coefficient vectors, which have dimensionality $K$, instead of the pose vectors, which have dimensionality $2n$. The number of principal components $K$ is typically chosen so that $K \ll 2n$. We have found that good results are obtained when $K$ is large enough to account for approximately 95% of the variance of $DB_X$. Typical choices for $n$ are 100 to 200 and for $K$ are 20 to 30, resulting in speedup and storage compression ratios $2n/K$ of 6.7 to 20 (from $200/30 \approx 6.7$ to $400/20 = 20$).

This is not the first use of PCA to match shapes represented as vectors in a shape space. In Zhu and Yuille 27, PCA was used to characterize shape parts (worms and circles) as part of a structural, graph-based shape matching system; here, we use it to match whole shapes. In Cootes et al., 6 PCA was used to develop statistical models of shape, called point distribution models (PDMs). Here, we also use PCA to model shape, with two important differences.
First, PDMs use PCA to model the variation of shapes within individual object classes, whereas the current work uses PCA to model the total (intraclass + interclass) variation of shapes from many different object classes. Second, PDMs require that accurately placed landmarks and their correspondences be given. These have usually been provided manually, although recently there has been research on automating this step. 10 In contrast, the current method is fully automated.

3.4 Retrieval and recognition

The system processes a retrieval request as follows. First, the poses of a query contour $q$ are generated using the normalization procedure. Second, the query pose vectors are projected to the reduced-dimensionality eigenspace of the database to generate the query's coefficient vectors. Third, $D_{shape}(q, s)$ is computed for each database shape $s$. Last, a list of database shapes ranked in order of increasing $D_{shape}(q, s)$ is presented to the user.

To perform recognition, class labels must be stored with the database shapes. Recognition is performed using a nearest-neighbor approach. Let $c_1, \ldots, c_m$ denote the $m$ classes in the database. Multiple examples of each class are stored in the database, and each example is labeled with its class identifier. Given an input shape $q$ to be recognized, $q$ is compared to each shape in the database. The match score for each class $c$ is computed as the minimum shape difference between $q$ and every database shape $s$ in class $c$:
$$D_{class}(q, c) = \min_{s \in c} D_{shape}(q, s). \qquad (4)$$
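The eigenspace construction of Section 3.3 and the coefficient vectors of Eq. (1) can be sketched in dependency-free Python. This is an illustration under assumptions, not the paper's Matlab code: the eigen-decomposition uses a textbook cyclic Jacobi iteration as a stand-in for a library symmetric eigensolver, the helper names are hypothetical, and the toy pose vectors are synthetic vectors lying in a two-dimensional affine subspace, so that $K = 2$ reproduces the pose SSD of Eq. (2) exactly.

```python
# Illustrative PCA/eigenspace sketch (Section 3.3); helper names are hypothetical.
import math
import random

def jacobi_eigh(S, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V), with the eigenvectors stored as the columns of V."""
    n = len(S)
    A = [[float(v) for v in row] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * sum(A[i][i] ** 2 for i in range(n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for M in (A, V):                       # columns p, q: M <- M R
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):                     # rows p, q: A <- R^T A
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V

def fit_eigenspace(pose_vectors, K):
    """Mean pose vector and P_K (2n x K): top-K eigenvectors of the scatter matrix A A^T."""
    N, d = len(pose_vectors), len(pose_vectors[0])
    mean = [sum(X[r] for X in pose_vectors) / N for r in range(d)]
    A = [[X[r] - mean[r] for X in pose_vectors] for r in range(d)]   # columns are X - mean
    S = [[sum(u * v for u, v in zip(A[r], A[c])) for c in range(d)] for r in range(d)]
    vals, V = jacobi_eigh(S)
    top = sorted(range(d), key=lambda i: vals[i], reverse=True)[:K]
    return mean, [[V[r][c] for c in top] for r in range(d)]

def coefficients(X, mean, P_K):
    """a_K = P_K^T (X - mean), so that X is approximated by mean + P_K a_K (Eq. 1)."""
    return [sum(P_K[r][c] * (X[r] - mean[r]) for r in range(len(X))) for c in range(len(P_K[0]))]

def d_pose(a1, a2):
    """Eq. (2): SSD of coefficient vectors, approximating the SSD of the pose vectors."""
    return sum((u - v) ** 2 for u, v in zip(a1, a2))

# Toy pose vectors of length 2n = 8 lying in a 2-D affine subspace.
rng = random.Random(0)
center = [rng.uniform(-1, 1) for _ in range(8)]
u1 = [rng.uniform(-1, 1) for _ in range(8)]
u2 = [rng.uniform(-1, 1) for _ in range(8)]
db = []
for _ in range(12):
    s, t = rng.uniform(-2, 2), rng.uniform(-2, 2)
    db.append([c + s * x + t * y for c, x, y in zip(center, u1, u2)])
mean, P_K = fit_eigenspace(db, K=2)
a = [coefficients(X, mean, P_K) for X in db]
ssd = sum((x - y) ** 2 for x, y in zip(db[0], db[1]))
print(d_pose(a[0], a[1]), ssd)   # equal up to rounding, since the K = 2 eigenspace spans the data
```

On real pose vectors the data do not lie in a low-dimensional subspace, so $K$ would instead be chosen to capture roughly 95% of the variance, and (2) becomes an approximation.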
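Given coefficient vectors, Eqs. (2)-(4) and the retrieval and recognition procedure of Section 3.4 reduce to nested minimizations and a sort. The sketch below is illustrative only; the database layout (a list of `(label, list of coefficient vectors)` pairs) and all function names are assumptions, and the toy two-dimensional coefficient vectors are made up for the example.

```python
# Illustrative sketch of Eqs. (2)-(4); the data layout and names are hypothetical.

def d_pose(a1, a2):
    """Eq. (2): SSD between two coefficient vectors."""
    return sum((u - v) ** 2 for u, v in zip(a1, a2))

def d_shape(poses1, poses2):
    """Eq. (3): best single key point alignment, the minimum over all pose pairs."""
    return min(d_pose(a, b) for a in poses1 for b in poses2)

def retrieve(query_poses, database):
    """Rank database shapes by increasing D_shape; returns (distance, index) pairs."""
    return sorted((d_shape(query_poses, poses), i) for i, (_, poses) in enumerate(database))

def classify(query_poses, database):
    """Eq. (4) as post-processing: keep the first-ranked instance of each class label."""
    seen, hypotheses = set(), []
    for dist, i in retrieve(query_poses, database):
        label = database[i][0]
        if label not in seen:
            seen.add(label)
            hypotheses.append((label, dist))   # dist equals D_class(q, label)
    return hypotheses

database = [
    ("fish", [[0.0, 0.0], [1.0, 0.0]]),
    ("fish", [[0.2, 0.1]]),
    ("hand", [[5.0, 5.0], [4.0, 6.0]]),
    ("wrench", [[-3.0, 2.0]]),
]
query = [[0.1, 0.0], [4.5, 5.5]]
print([i for _, i in retrieve(query, database)])          # [0, 1, 2, 3]
print([label for label, _ in classify(query, database)])  # ['fish', 'hand', 'wrench']
```

Because `classify` keeps only the first-ranked instance of each label, the distance attached to each hypothesis is exactly the per-class minimum of Eq. (4); truncating the list to its first 1, 5, or 10 entries gives class hypotheses of the kind evaluated in Table 2.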

Eqn. (4) is used to compute the nearest neighbor of $q$ in each class $c_1, \ldots, c_m$ in the database. Finally, the classes are ranked in order of increasing $D_{class}$. The system can return the single best-matching class label, or it can return several labels as class hypotheses to be resolved by further processing. Classification can be implemented as a simple post-processing step after retrieval: the list of labels corresponding to the ranked list of retrieved shapes is assembled, and then all but the first-ranked instance of each distinct class label are discarded. RACER automatically performs classification after retrieval if it detects class labels in the database.

4. METHOD IMPROVEMENTS

The use of fixed correspondence contributes to the speed of RACER in two ways. First, the cost of finding flexible correspondences between the poses is avoided. Second, fixed correspondence in combination with the consistent numbering of pose sample points makes the dimensionality reduction possible, because PCA requires that the shape space dimensions be consistent across the pose vectors. However, the SSD of two poses under fixed correspondence can be significantly larger than the SSD under a flexible correspondence such as closest-point correspondence. Why, then, does the method work as well as it did in previous experiments? There are two reasons. First, the discrepancy between fixed correspondence and flexible correspondence is likely to be smaller when the shapes are more similar. If there are a sufficient number of examples in the database, then it is more likely that there will be database shapes similar to the query shape. In Super 24, we showed that the number of examples required is not large: on the MPEG-7 test database, 7 examples per class resulted in accuracy nearly as high as when 20 examples per class were used.
Second, the speed enabled by the use of fixed correspondence allows the system to match a large number of poses within the time constraints of the task, which increases the probability of finding a good match.

We have developed two techniques for improving the accuracy of the fixed-correspondence matching scheme while maintaining its speed and its compatibility with PCA. First, the poses are resampled non-uniformly, such that the sample density decreases as a function of arc length from the key point. Second, the similarity measure is normalized by the arc lengths of the eigenspace projections of the poses.

4.1 Non-uniform contour sampling

As discussed earlier, fixed correspondence is used as an approximation to flexible correspondence. The accuracy of this approximation varies over different sample points. By construction, any two poses match exactly at the key point. Typically, the approximation is good near the key point and degrades further away. This observation suggests giving greater influence to portions nearer to the key point. One way to implement this idea is to sample the poses non-uniformly, with the sample rate decreasing as a function of arc length from the key point. Non-uniform sampling is compatible with fixed-correspondence matching as long as the same sampling pattern is used for every pose of every shape.

The experiments in this paper used a logarithmic sampling pattern. An example is shown in Figure 3. The sampling was defined as follows. Let $n$ denote the number of sample points, and let $k = 0, \ldots, n-1$ index the samples beginning with the key point at $k = 0$. Suppose for now that the pose has an arc length of one. Let $l(k)$ be the arc length along the curve from the key point to sample point $k$, always in the same direction. The following function was used:
$$l(k) = \begin{cases} \dfrac{e^{\lambda k} - 1}{\lambda_0}, & k = 0, \ldots, \dfrac{n}{2}, \\[1ex] 1 - \dfrac{e^{\lambda (n-k)} - 1}{\lambda_0}, & k = \dfrac{n}{2} + 1, \ldots, n-1, \end{cases} \qquad (5)$$
where $\lambda_0 = 2\left(e^{\lambda n/2} - 1\right)$ is a normalization factor that ensures the sample points all fit on a curve of unit arc length. For simplicity, $n$ is assumed to be even. The two parts of (5) define a sampling pattern whose density decreases symmetrically in both directions away from the key point; in particular, $l(n/2) = 1/2$ and $l(k) + l(n-k) = 1$. The parameter $\lambda$ determines how rapidly the sampling rate decreases, with higher values of $\lambda$ corresponding to greater non-uniformity. Since the sampling pattern is independent of any specific pose, it can be computed once, in advance. To apply it to a pose $p$, it is multiplied by the arc length of the pose; i.e., pose $p$ is sampled at arc lengths $l(k) L(p)$, where $L(p)$ is the total arc length of $p$.

The change to the method introduced by non-uniform sampling occurs in Step 5 of the alignment procedure. In the original version, since uniform sampling was used, the same set of sample points was used for each pose of the same shape: only a renumbering of the sample points was necessary. With non-uniform sampling, the sample points are different for each pose of the same shape; thus, new sample points interpolating between the original uniformly spaced sample points are computed for each pose. This adds to the computational cost of database preparation and to the cost of preprocessing the query shape. It does not affect the computational cost of the pairwise shape matches.

4.2 Length-normalized similarity measure

Different poses of the same shape can have significantly different arc lengths, as shown in Figure 1c-e, because Step 3 scales each pose so that the distance from $x_{\mathrm{key}}$ to $y$ is one, and that distance depends on the key point. This variability in the arc length of the poses introduces a confounding factor in the computation of $D_{shape}$ defined in (3). Suppose that non-identical shapes $s_a$ and $s_b$ are similar. Suppose that $(p_{a1}, p_{b1})$ and $(p_{a2}, p_{b2})$ are two pairs of poses of $s_a$ and $s_b$ that match about equally well perceptually, but that $p_{a1}$ and $p_{b1}$ are significantly larger than $p_{a2}$ and $p_{b2}$.
Then $D_{pose}(p_{a1}, p_{b1})$ is greater than $D_{pose}(p_{a2}, p_{b2})$ because the pointwise differences between the poses are proportionally larger. A simple solution is to scale the measured SSD of two poses by their arc lengths. The modified pose-difference measure is
$$D^*_{pose}(p_1, p_2) = \frac{D_{pose}(p_1, p_2)}{L(\hat{p}_1)\, L(\hat{p}_2)}, \qquad (6)$$
where $\hat{p}$ denotes the projection of pose $p$ to the eigenspace and $L(\hat{p})$ denotes the total arc length of $\hat{p}$. The arc lengths of the eigenspace projections are used because it is these $K$-dimensional approximations that are matched to compute $D_{pose}$ in (2). The modified versions of $D_{shape}$ and $D_{class}$ based on $D^*_{pose}$ will be denoted by $D^*_{shape}$ and $D^*_{class}$, respectively. The arc lengths can be precomputed during database creation for the database shapes and during query preprocessing for the query shape. Thus, the additional cost of pairwise shape matching due to using $D^*_{pose}$ instead of $D_{pose}$ is small.

5. EXPERIMENTS

The parameter values $n = 100$, $K = 25$, and $\lambda = 0.1$ were used for all experiments in this paper. Super 23 presented experimental results showing that the original method was not sensitive to the values of $n$ and $K$ within a broad range. Similarly, the accuracy of the improved method appears to change gradually as a function of all three parameters.

We first demonstrate the RACER system on a database 22 of 131 shapes in six classes: rabbits, hands, staplers, wrenches, and two types of fish. For the demonstration, each database shape was used as a query. The match to each query's copy in the database was excluded from the results of that query. Figure 4 shows retrieval and recognition results for several example queries. Overall, the closest match was correct for 130 of the 131 test queries (99.24%). This is also the number of correct classifications. The single error, shown in the last row of Figure 4, is due to the large shape difference between the query hand and all of the other hands in the database, examples of which are shown in row 3 of the figure.

[Figure 4 panels: Query | Top five retrieved shapes | Classification; classifications by row: fish-1, fish-2, hand, wrench, rabbit, stapler, fish-2]
Figure 4. Example retrieval and classification results from the shape database used in Super 22. The left shape in each row is the query. The rest of each row shows the top five non-identical shapes retrieved by the system, and the system's classification of the query. The closest match was correct 130 times out of 131 test queries (99.24%). The first six rows show a typical query result from each of the six classes in the database. The last row shows the query with the poorest results, due to the large shape difference between the query hand and all of the other hands in the database. The wrench and rabbit shapes were provided by Stan Sclaroff and the fish shapes were provided by Farzin Mokhtarian.

For a more challenging test of the method, we measured the retrieval accuracy of the RACER system using an MPEG-7 evaluation procedure, accuracy measure, and test database. This allows a direct comparison of RACER with several other systems that were tested the same way. 15,2,20 The MPEG-7 test database consists of 70 classes with 20 examples per class, for a total of 1,400 shapes. These shapes come from a variety of sources and represent both natural and artificial objects. The database is challenging due to the presence of examples that are visually dissimilar from other members of their class, and examples that are highly similar to members of other classes. Thus, retrieval methods based only on visual shape matching are not expected to achieve a perfect score on this database. 15 The same two conditions, outliers and overlapping classes, can cause retrieval errors in RACER as well.
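Returning to Section 4.1, the logarithmic sampling pattern of Eq. (5) and its application to a pose at arc lengths $l(k) L(p)$ can be sketched as follows. This is a hedged illustration with hypothetical function names; it assumes the pose is given as a closed polyline whose first vertex is the key point, and it uses linear interpolation between the existing samples to produce the new sample points the text describes.

```python
# Illustrative sketch of Eq. (5); function names are hypothetical.
import math

def log_sampling_pattern(n, lam):
    """Eq. (5): arc-length fractions l(k), k = 0..n-1, of a unit-length curve (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    lam0 = 2 * (math.exp(lam * n / 2) - 1)    # normalization factor lambda_0
    return [(math.exp(lam * k) - 1) / lam0 if k <= n // 2
            else 1 - (math.exp(lam * (n - k)) - 1) / lam0
            for k in range(n)]

def resample_pose(pose, fractions):
    """Sample a closed pose (first vertex = key point) at arc lengths l(k) * L(p)."""
    m = len(pose)
    cum = [0.0]                                 # cumulative arc length, closing the loop
    for i in range(m):
        (x0, y0), (x1, y1) = pose[i], pose[(i + 1) % m]
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total, out, seg = cum[-1], [], 0
    for f in fractions:                         # fractions must be non-decreasing
        t = f * total
        while seg < m - 1 and cum[seg + 1] < t:
            seg += 1
        (x0, y0), (x1, y1) = pose[seg], pose[(seg + 1) % m]
        span = cum[seg + 1] - cum[seg]
        u = 0.0 if span == 0 else (t - cum[seg]) / span
        out.append((x0 + u * (x1 - x0), y0 + u * (y1 - y0)))
    return out

pattern = log_sampling_pattern(100, 0.1)       # n = 100, lambda = 0.1 as in Section 5
square = [(0, 0), (1, 0), (1, 1), (0, 1)]      # toy pose with its key point at the origin
samples = resample_pose(square, pattern)
print(pattern[1] - pattern[0], pattern[50] - pattern[49])  # fine spacing near the key point, coarse opposite it
```

Since the pattern does not depend on the pose, it is computed once; only `resample_pose` runs per pose, during database preparation or query preprocessing, which matches the text's remark that pairwise matching cost is unaffected.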
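The length-normalized measure of Eq. (6) needs only the closed-contour arc length of each eigenspace projection $\hat{p}$, which can be computed once per pose from its reconstruction $\bar{X} + P_K a_K$. The sketch below is hypothetical (names and data layout are assumptions) and, to stay short, uses a toy full-dimensional eigenspace with zero mean and identity $P_K$, so that a pose vector is its own coefficient vector. It reproduces the motivating case of Section 4.2: scaling both poses by 2 multiplies $D_{pose}$ by 4 but leaves $D^*_{pose}$ unchanged.

```python
# Illustrative sketch of Eq. (6); names and the toy eigenspace are hypothetical.
import math

def reconstruct(a, mean, P_K):
    """Pose vector of the eigenspace projection: mean + P_K a (Eq. 1)."""
    return [mean[r] + sum(P_K[r][c] * a[c] for c in range(len(a))) for r in range(len(mean))]

def arc_length(X):
    """Closed-contour length of a pose vector X = (x_1..x_n, y_1..y_n)."""
    n = len(X) // 2
    xs, ys = X[:n], X[n:]
    return sum(math.hypot(xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i]) for i in range(n))

def d_pose_star(a1, a2, L1, L2):
    """Eq. (6): D_pose divided by the arc lengths of the two projected poses."""
    return sum((u - v) ** 2 for u, v in zip(a1, a2)) / (L1 * L2)

# Toy eigenspace: zero mean and identity P_K, so coefficients equal pose vectors.
d = 8
mean = [0.0] * d
P_K = [[float(r == c) for c in range(d)] for r in range(d)]
p1 = [0, 1, 1, 0, 0, 0, 1, 1]          # unit square: x-coordinates, then y-coordinates
p2 = [0, 1.1, 1, 0, 0, 0, 0.9, 1]      # a slightly distorted square
big1, big2 = [2 * v for v in p1], [2 * v for v in p2]

def scores(a1, a2):
    """Hypothetical helper returning (D_pose, D*_pose) for two coefficient vectors."""
    L1 = arc_length(reconstruct(a1, mean, P_K))   # precomputable once per pose
    L2 = arc_length(reconstruct(a2, mean, P_K))
    return sum((u - v) ** 2 for u, v in zip(a1, a2)), d_pose_star(a1, a2, L1, L2)

print(scores(p1, p2), scores(big1, big2))
```

With a real eigenspace the mean is not zero, so the scale invariance holds only approximately; the point of the sketch is the bookkeeping, one stored arc length per pose and one extra division per pose pair.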
The following procedure was used in the prior published evaluations on this database. 15,2,20 Each database shape was used as a test query. A retrieval was counted as correct if it was in the same class as the query. The number of correct retrievals in the top 40 ranks was counted, including the identical copy of the query. Each system's accuracy was reported as a percentage of the maximum possible number of correct retrievals, which is 28,000 (1,400 shapes × 20 correct retrievals per shape). 15 This percentage score will be referred to as retrieval accuracy. We note that RACER does not use class labels for shape retrieval. Class labels are used only for recognition (and for system evaluation).

Table 1 displays retrieval accuracy for the proposed method and for the five most accurate prior methods. The performance of each system was measured by its creators, who are best able to select good parameter values for their respective systems. 15 The accuracies of the curvature scale space method 16 and the part correspondence method 13 were reported in Latecki et al. 15 The accuracy of the shape contexts method was reported in Belongie et al., 2 and the accuracy of the curve edit distance method was reported in Sebastian et al. 20 Latecki et al. 15 also report the results of four additional methods, based on skeleton graphs, wavelets, Zernike moments, and multilayer eigenvectors. These are not shown in Table 1 because they have significantly lower accuracies, in the range of 60-70%. The enhanced RACER system presented here has the highest retrieval accuracy score of all 10 methods.

Table 1. Retrieval accuracy on the MPEG-7 shape database.
Method                      Retrieval accuracy (%)
Improved RACER system       79.09
Curve Edit Distance
Original RACER system
Shape Contexts
Part Correspondence
Curvature Scale Space

Table 1 shows that the improved RACER system has higher accuracy than the original. A second experiment, not shown here, demonstrated that both non-uniform sampling and arc-length normalization contribute to the increase in accuracy. We note that the increase in accuracy relative to the original RACER system can be traded off for an increase in speed by reducing $K$.

We then evaluated RACER's ability to perform recognition, using the MPEG-7 database. Each database shape was used as a test shape, and the number of correctly classified test shapes was counted.
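The retrieval accuracy measure used for Table 1 (same-class shapes counted in the top 40 ranks, including the query's own copy, as a percentage of the maximum possible) can be sketched as follows. This is an illustrative reimplementation, not the MPEG-7 evaluation code; `retrieval_accuracy` and the ranking callback are hypothetical, and the toy example uses one-dimensional features with a window of 2 because its classes have only two members.

```python
# Illustrative sketch of the retrieval accuracy score; names are hypothetical.
from collections import Counter

def retrieval_accuracy(labels, ranked, window=40):
    """labels[i]: class of shape i; ranked(i): database indices ordered by distance to
    query i, including i itself. Returns the percentage of same-class shapes found in
    the top `window` ranks, relative to the maximum possible number."""
    sizes = Counter(labels)
    hits = sum(sum(labels[j] == labels[i] for j in ranked(i)[:window])
               for i in range(len(labels)))
    best = sum(min(sizes[lab], window) for lab in labels)   # 1,400 x 20 = 28,000 for MPEG-7
    return 100.0 * hits / best

features = [0.0, 0.1, 1.0, 5.0]
labels = ["a", "a", "b", "b"]

def ranked(i):
    """Toy ranking by absolute feature difference."""
    return sorted(range(len(features)), key=lambda j: abs(features[j] - features[i]))

print(retrieval_accuracy(labels, ranked, window=2))   # 87.5: query 2 misses shape 3
```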
For each test shape, the match to its identical copy in the database was excluded in the computation of its classification. Many recognition systems consist of multiple stages; therefore, we also assessed the potential use of the method as a first stage in a multistage system, by measuring the number of test shapes for which the correct classification was in the top 5 or 10 class hypotheses. The results for all three recognition accuracy measures are shown in Table 2. Table 2. The number of times the correct classification is returned in the top 1, 5, or 10 hypotheses. Successes Errors 1st ranked hypothesis 1,355 (96.8%) 45 (3.2%) In hypotheses ranked 1-5 1,393 (99.5%) 7 (0.5%) In hypotheses ranked ,397 (99.8%) 3 (0.2%) A direct comparison of recognition performance with the other methods in Table 1 is not possible, since the published results reported only shape retrieval and not shape recognition. 15,2,20 However, with an error
rate of only 3.2% and an average retrieval and recognition time of 0.6 seconds, the system is highly successful as a fast one-stage recognition system. As the first stage of a hypothetical multistage system, if RACER passes the best shape of each of the top 10 class hypotheses to the next stage, it reduces the number of shapes to be tested from 1,400 to 10, a 140-fold reduction, with a false negative rate of only 3/1,400 (0.2%).

The current system is implemented in interpreted Matlab running on a Pentium … GHz PC. Offline preparation of the complete 1,400-shape database takes about 1.5 minutes. Complete preprocessing of a query contour, from sampling through finding the coefficient vectors of all of its poses, takes 48 milliseconds on average. A pairwise match takes only 0.4 milliseconds on average; this is the time to compare all the poses of the query with all the poses of one database shape. The entire retrieval and recognition process on the MPEG-7 database takes approximately 0.6 seconds, consistent with 48 ms of query preprocessing plus 1,400 × 0.4 ms = 560 ms of pairwise matching. This total includes preprocessing the query, matching it to every one of the 1,400 database shapes, and ranking the results for output.

6. DISCUSSION

For any system there is an upper limit on database size beyond which it is not feasible to match the query shape to every database shape. Beyond this limit, general techniques such as database clustering, indexing, and prefiltering are used to reduce the number of database shapes that are matched. We deliberately did not use such techniques in the experiments in this paper, in order to demonstrate that RACER is fast enough to handle O(10^3)-size databases in interactive time without them. We are currently adding database clustering and indexing to RACER.

One way to compare the efficiency of systems that do or do not use clustering, indexing, or prefiltering is to compare pairwise matching times. Sebastian et al. [20] report more than 1 second per pairwise match. Belongie et al.
[2] report 200 milliseconds per pairwise match, including preprocessing. Latecki and Lakämper [14] report 50 milliseconds per pairwise match and 5 milliseconds for preprocessing. Mokhtarian et al. [16] do not report pairwise matching time. Even allowing for differences in implementation, hardware, and the way timing results are reported, RACER's pairwise matching time of 0.4 milliseconds appears to be significantly faster than that of the other systems for which timing data are available: it is about 125 times shorter than the fastest of them (50 milliseconds [14]).

7. CONCLUSION

In previous work [23-25], we developed a fast and accurate method for shape retrieval and shape recognition. This paper presented two improvements to the original method: non-uniform sampling of the normalized shape contours, and arc-length normalization of the similarity measure. The improved method is the most accurate of 10 methods evaluated on a test of retrieval accuracy. The method is also fast: it operates in interactive time, requiring about 0.6 seconds per query in Matlab on a PC. Research on this promising approach is continuing; we are currently incorporating database clustering and indexing, and extending the method to partial shapes.

ACKNOWLEDGEMENTS

The author thanks Drs. Mokhtarian, Sclaroff, and Latecki for generously making contour data files available. The rabbit data [19] were derived from images in ref. [1]. Figures 1-4 are used by permission of CVRL. This research was supported in part by NSF grant EIA

REFERENCES

1. P. Alden. Peterson First Guides: Mammals. Houghton-Mifflin, Boston.
2. S. Belongie, J. Malik, and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts," IEEE Trans. Pattern Analysis and Machine Intelligence 24(4).
3. P. Besl and N. D. McKay, "A Method for Registration of 3-D Shapes," IEEE Trans. Pattern Analysis and Machine Intelligence 14(2), 1992.
4. S. W. Chen, S. T. Tung, C. Y. Fang, S. Cherng, and A. K. Jain, "Extended Attributed String Matching for Shape Recognition," Computer Vision and Image Understanding 70(1), 36-50.
5. G. C.-H. Chuang and C.-C. J. Kuo, "Wavelet Descriptor of Planar Curves: Theory and Applications," IEEE Transactions on Image Processing 5(1), 56-70.
6. T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active Shape Models - Their Training and Application," Computer Vision and Image Understanding 61(1), 38-59.
7. R. Duda, P. Hart, and D. Stork. Pattern Classification (2nd ed.). Wiley, New York.
8. M. Flickner et al., "Query by Image and Video Content: The QBIC System," IEEE Computer 28, 23-32.
9. L. Gupta and M. D. Srinath, "Contour Sequence Moments for the Classification of Closed Planar Shapes," Pattern Recognition 20(3).
10. A. Hill, C. J. Taylor, and A. D. Brett, "A Framework for Automatic Landmark Identification Using a New Method of Nonrigid Correspondence," IEEE Trans. Pattern Analysis and Machine Intelligence 22(3).
11. D. P. Huttenlocher and S. Ullman, "Recognizing Solid Objects by Alignment with an Image," International Journal of Computer Vision 5(2).
12. Y. Lamdan, J. T. Schwartz, and H. J. Wolfson, "Object Recognition by Affine Invariant Matching," Proc. Conf. Computer Vision and Pattern Recognition.
13. L. Latecki and R. Lakämper, "Shape Similarity Measure Based on Correspondence of Visual Parts," IEEE Trans. Pattern Analysis and Machine Intelligence 22(10).
14. L. Latecki and R. Lakämper, "Application of planar shape comparison to object retrieval in image databases," Pattern Recognition 35, 15-29.
15. L. Latecki, R. Lakämper, and U. Eckhardt, "Shape Descriptors for Non-rigid Shapes with a Single Closed Contour," Proc. IEEE Conf. Computer Vision and Pattern Recognition.
16. F. Mokhtarian, S. Abbasi, and J. Kittler, "Efficient and Robust Retrieval by Shape Content through Curvature Scale Space," in A. Smeulders and R. Jain, Eds., Image Databases and Multi-Media Search, World Scientific, New Jersey, 51-58.
17. T. Pavlidis, Structural Pattern Recognition. Berlin: Springer Verlag.
18. C. A. Rothwell, A. Zisserman, D. A. Forsyth, and J. L. Mundy, "Planar Object Recognition using Projective Shape Representation," International Journal of Computer Vision 16, 57-99.
19. S. Sclaroff, "Distance to Deformable Prototypes: Encoding Shape Categories for Efficient Search," in A. Smeulders and R. Jain, Eds., Image Databases and Multi-Media Search, World Scientific, New Jersey.
20. T. Sebastian, P. Klein, and B. Kimia, "On Aligning Curves," IEEE Trans. Pattern Analysis and Machine Intelligence 25(1).
21. F. Stein and G. Medioni, "Structural Indexing: Efficient 2-D Object Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence 14(12).
22. B. J. Super, "Fast Retrieval of Isolated Visual Shapes," Computer Vision and Image Understanding 85(1), 1-21.
23. B. J. Super, "Fast Correspondence-based System for Shape Retrieval," Computer Vision and Robotics Laboratory Technical Report CVRL, University of Illinois at Chicago. (In submission.)
24. B. J. Super, "Generalization Accuracy of a Fast 2-D Shape Retrieval Method," Proceedings of the 2003 International Conference on Imaging Science, Systems, and Technology, Las Vegas, NV, June 2003.
25. B. J. Super, "Fast Correspondence-based System for 2-D Shape Classification," Proceedings of the 2003 International Conference on Imaging Science, Systems, and Technology, Las Vegas, NV, June 2003.
26. B. J. Super and H. Lu, "Evaluation of a Hypothesizer for Silhouette-Based 3-D Object Recognition," Pattern Recognition 36(1), 69-78.
27. S. C. Zhu and A. L. Yuille, "FORMS: A Flexible Object Recognition and Modelling System," International Journal of Computer Vision 20(3), 1996.