Retrieval from Shape Databases Using Chance Probability Functions and Fixed Correspondence 1


Boaz J. Super
Motorola Labs, Schaumburg, Illinois, USA
motorola.com

Abstract

Similarity-based retrieval from shape databases typically employs a pairwise shape matcher and one or more indexing techniques. In this paper, we focus specifically on the design of a pairwise matcher for retrieval of 2-D shape contours. In the past, the matchers used for the one-to-many problem of shape retrieval were often designed for the problem of matching an isolated pair of shapes. This approach fails to exploit two characteristics of the one-to-many matching problem that distinguish it from the one-to-one matching problem. First, the output of shape retrieval systems tends to be dominated by matches to relatively similar shapes. In this paper, we demonstrate that by not expending computational resources on unneeded accuracy of matching, both the speed and the accuracy of retrieval can be increased. Second, the shape database is a large statistical sample of the population of shapes. We introduce a probabilistic model for exploiting that statistical knowledge to further increase retrieval accuracy. The model has several benefits: (1) It does not require class labels on the database shapes, thus supporting unlabeled retrieval. (2) It does not require feature independence. (3) It is parameter-free. (4) It has a fast runtime implementation. The probabilistic model is general and thus potentially applicable to other one-to-many matching problems.

Keywords: similarity-based retrieval, shape retrieval, chance probability functions, fixed correspondence, alignment

Footnote 1: Work performed while author was affiliated with the University of Illinois at Chicago.

1. Introduction

Similarity-based retrieval from shape databases has received significant research attention, in part because of domain-specific applications (e.g., [12,13]), and in part because of the MPEG-7 multimedia content description standard, which provides for explicit coding of shape data. This paper addresses a commonly studied subproblem of shape retrieval: the retrieval of shapes represented by closed planar contours [3,9,16,21,29]. Such contours can be obtained from silhouettes of 3-D objects and from other sources. For example, silhouettes can be easily acquired at the same time as full images of objects during the digitization of collections from museums and scientific laboratories. Typically, a turntable and controlled lighting are used [34].

Much of the research on matching shapes has focused on two subproblems: (1) methods for matching a pair of shapes, and (2) methods for accessing a large database efficiently (indexing). In the past, pairwise matchers have often been designed with the goal of being able to match two relatively dissimilar shapes. Such flexible matchers are computationally expensive, so in order to return search results in a reasonable time, generic information retrieval techniques are used to reduce the number of pairwise matches between the query and database shapes. Such generic techniques fall into three categories: clustering and prototyping [1], indexing with search structures [25], and prefiltering with low-cost features [21].

We suggest that the problem of matching one shape to many shapes in order to find the best matching shapes (the one-to-many matching problem) is fundamentally different from the problem of matching a pair of shapes in isolation (the isolated matching problem). A consequence is that a good pairwise matcher for one-to-many matching is not necessarily the same as a good pairwise matcher for matching two shapes in isolation.
This paper identifies two characteristics of the one-to-many matching problem that can be exploited to increase both the accuracy and the speed of shape retrieval. The first characteristic is that if there are sufficient examples in the database, the output of the retrieval system is typically determined by matches to shapes that are relatively similar to the query shape. In this case, the extra computational cost incurred by flexible matchers designed to handle highly dissimilar shapes is simply wasted work. The strategy presented in this paper essentially reallocates computational resources away from matching each pair of shapes accurately and towards performing a larger number of shape matches. The result is a system that is both faster and more accurate than prior approaches.

The second characteristic of the one-to-many matching problem is that the shape database is also a large statistical sample of shapes. This paper presents a formalism for exploiting that statistical knowledge base in order to increase retrieval accuracy. The formalism is based on learning a set of chance probability functions (CPFs) at database setup time. Unlike category-based retrieval methods, which require class labels on database shapes [26], the approach presented here can be used with unlabeled database shapes, as in, e.g., [1,9,10,18,27].

The central focus of this paper is the design of a pairwise shape matcher specifically for retrieval from shape databases, arrived at by exploiting the characteristics of the one-to-many matching problem that distinguish it from the isolated matching problem. Although indexing is an essential component of retrieval systems, this paper does not report any research on indexing. In fact, since our method is fast enough to apply to the largest of the widely used test databases [18] without indexing, we did not find it necessary to implement any indexing methods. As databases grow larger, the generic indexing techniques mentioned above (clustering and prototyping, search structures, and prefiltering) can be used with the proposed approach just as they can with any other retrieval method. However, at any given database size, the high speed of the proposed approach implies that it can perform a greater number of direct matches within a given time than prior methods can. All else being equal, this is likely to lead to higher performance at that database size.
Finally, it is important to note that the CPF formalism does not depend specifically on shape data. Therefore, it is potentially applicable to other types of information retrieval and to other pattern recognition problems besides retrieval.

Key elements of this work have appeared in preliminary form in our earlier papers. In this paper, the accuracy of the core matcher [30,31] has been improved by the use of the arc length normalization scheme given in Section 3.3, and the accuracy of the chance probability function method [32] has been improved by using information from multiple pose matches of the same two shapes (Section 6). In addition, the explanation of the chance probability function method has been made clearer. All of the empirical results presented in this paper are new.

The rest of the paper is organized as follows. Section 2 discusses prior approaches to shape matching to provide a context for the current work. Section 3 presents the core shape matcher for use in one-to-many matching applications. Section 4 presents the single-pose CPF model introduced in a recent workshop paper [32]. Section 5 experimentally evaluates the retrieval accuracy and speed of the method and compares its performance with prior methods in the literature. Section 6 presents a multi-pose extension of the single-pose CPF model that increases retrieval accuracy by aggregating information from multiple matches. Section 7 presents an application of the system to nearest-neighbor shape classification in the case where database shape labels are available. Finally, Section 8 analyzes the reasons for the method's performance.

2. Background

Our approach to designing a shape retrieval system that is both fast and accurate was guided by consideration of the characteristics and performance of earlier shape matching methods. This section summarizes that background. (A recent general survey of shape matching can be found in Veltkamp and Hagedoorn [35].)

Most shape matching methods fall into a few broad categories: methods that compare global shape features, methods that compare structural representations, and methods that match spatial data. Our method is in this last category, but we briefly discuss all three approaches in order to provide context.
Methods that use global shape features, such as invariant moments [8,15], support very fast retrieval since the basic match operation is the comparison of two feature vectors. However, the accuracy of these methods in empirical tests has been less than that of other approaches. The underlying reason is that the non-locality of the representations limits the ability to discriminate between different database shapes. For this reason, we rejected the global feature approach for designing a shape retrieval method that is accurate and fast.

Structural approaches represent shapes by graphs, trees, or strings, and compare them using approximate graph-, tree-, or string-matching methods. Some structural approaches compare object skeletons (e.g., Zhu and Yuille [36], Sharvit et al. [28], and Liu and Geiger [20]); others compare object boundaries or contours (e.g., Del Bimbo and Pala [5], Gdalyahu and Weinshall [9], and Latecki and Lakamper [16]). An interesting recent development, combining elements of structural and spatial approaches, is a continuous version of edit-distance matching of two contours (Sebastian et al. [27]). The appeal of structural methods is their ability to match shapes that are relatively dissimilar. However, structural matching methods are computationally intensive, so their flexibility comes at the cost of speed.

Spatial matching approaches, also called template matching approaches, compare spatial data directly. Typically two subproblems are solved: finding corresponding points on the two shapes, and finding a transformation that aligns the two shapes. Transformations may be global [2] or local (deformable templates) [3,4]. Solving for the transformation is often interleaved with solving the correspondence problem [2]. Recent developments in solving the correspondence problem include the use of regional information (labeled distance sets [10]) or global information (shape contexts [1]) to disambiguate point correspondences. In general, the solution of the correspondence problem incurs a significant computational cost. Our approach is in the category of spatial matching methods, and more specifically, in the subcategory of alignment methods.
Alignment methods hypothesize the matching transformation of two shapes based on minimal-size point sets on each contour and then verify hypothesized matches by finding correspondences between the transformed shape(s) [14,23,24]. Prior approaches for speeding up verification include cascading verification stages or using distance transforms [14,23,24]. The current approach avoids solving the correspondence problem in the verification stage; that is, it does not attempt to determine which points of one shape correspond to which points of the other shape. Instead, it assumes a fixed (predetermined) correspondence. As a result, verification can be performed by a single vector match, which is a fast operation; furthermore, the vectors are significantly smaller than would be required for a distance-transform based method. The speed of the verification stage allows many hypothesized transformations to be tested, thereby enabling the method to fully exploit the inherent redundancy of the alignment approach.

It may at first seem unexpected that the strategy of using a data-independent correspondence would work at all, let alone work more accurately and faster than other shape matching approaches. However, the proposed approach exploits the observation that the output of shape retrieval systems is usually driven by fairly similar matches, so that highly accurate pairwise matching is not needed. By optimizing speed rather than accuracy, a much larger number of candidate matches can be tested, and this boosts the retrieval accuracy of the system. Indeed, one way to interpret the significance of this work is that it suggests that approximate comparison of the input shape with a large number of example shapes is more valuable than accurate comparison of the input shape with a few examples.

In order to compare the proposed method with as many other methods as possible, we used a test database and a shape retrieval accuracy benchmark introduced for the development of the MPEG-7 multimedia content description standard [18]. It is the largest and most widely used test shape database available (some of the other widely used test databases [26] contain subsets of the MPEG-7 database and vice versa). It is composed of a large number of different types of shapes: 70 classes of shapes with 20 examples of each class, for a total of 1,400 shapes. The standard benchmark [18] measures the percent of correct retrievals in the first 40 ranks, where "correct retrieval" means "in the same class as the query," and each of the 1,400 shapes is used as a query to the database.
It must be emphasized that the class information is used only to evaluate the accuracy of retrieval; it is not used to perform the retrieval itself. Thus, unlabeled retrieval is being tested. Rows 1-9 of Table 1 (in Section 5) show the published retrieval accuracy scores of nine prior systems tested on this benchmark. Each system's accuracy is reported by the creators of that system since they are best able to optimize the performance of their system [18].

3. Shape matcher

This paper explores retrieval of planar shapes represented as single closed planar contours. With straightforward modifications, the method can also handle open contours. The current research was motivated in part by the goal of performing silhouette-based classification of 3-D objects; however, any source of planar contours can be used, including the bounding contours of purely two-dimensional patterns.

3.1 Preprocessing

Let $X = \{x_1, \ldots, x_\alpha\}$ denote the ordered set of 2-D sample points of a digital contour. A set of key points, $X_{\mathrm{key}} = \{x_{i_1}, \ldots, x_{i_k}\} \subseteq X$, are detected on each shape. In the current version, the key points are positive local maxima and negative local minima of curvature. Noisy key points due to discretization are eliminated in two steps. First, the contour is smoothed slightly using a 1-D Gaussian with $\sigma = 3$ pixels prior to computing the curvature; then key points that are within $\sigma$ of the chord connecting their neighbors are discarded. The remaining key points tend to correspond to perceptually salient projections and concavities (see top of Figure 1). Other types of key points such as zero-crossings of curvature or domain-specific interest points are also possible.

Figure 1. Top: Example shape with key points. Key points often lie on significant projections or concavities. Bottom: Three of the 15 normalized contours, or poses, of the shape shown at the top. There is one pose per key point. The transformed positions of x, y, and z are indicated by the small circles.

Most shapes we have examined (about 1,500) have ten to forty key points. As with any other form of bottom-up detection of local features, the detection of key points is not completely satisfactory, with both "missing" and "extra" key points. We use an efficient alignment-based method to overcome this problem by exploiting the redundancy afforded by having multiple key points. This method is described next.

3.2 Alignment

Shapes are matched under the planar similarity transformation group including reflection. Each shape is transformed to an invariant reference frame using three points: one key point and two other points that are deterministically generated from the contour by the choice of the key point. The transformed coordinates of the remaining $\alpha - 3$ contour points are invariant to planar similarity transformations and can be matched directly. The current method is thus in the general category of alignment methods that use invariant reference frames, such as Rothwell et al. [24].

The advantage of alignment methods is the redundancy they afford. Since there are usually multiple possible 3-point bases, if one does not yield a good match, another one often does. There are different ways to transform a specific shape to the invariant reference frame, because different point triples can be used to determine the normalizing transformation. Each possible transformation of the shape to the invariant reference frame will be referred to as a pose of that shape. The cost of matching two shapes is the cost of matching all poses of one shape to all poses of the other shape. To keep the cost of matching low, it is necessary to minimize the number of poses generated by each shape.
We do this by using a simple geometric construction that, given an arbitrarily selected key point $x \in X_{\mathrm{key}}$, deterministically selects two additional points $y, z \in X$ to define the pose. The number of poses per shape is thus the same as the number of key points; hence, the cost of matching two shapes is only quadratic in the number of key points. In the rest of the paper, the term pose will be used to refer to both the transformation and to the normalized contour resulting from the transformation. The meaning will be clear from the context.

The geometric construction is defined as follows. Given an arbitrary key point $x \in X_{\mathrm{key}}$, the point $y \in X$ furthest from $x$ in Euclidean distance is found, and then the point $z \in X$ furthest from the line through $x$ and $y$ is found (Figure 2):

$$y = \arg\max_{w \in X} \lVert w - x \rVert \tag{1}$$

$$z = \arg\max_{w \in X} \left| (w - x)^T (y - x)^{\perp} \right|, \tag{2}$$

where column vectors are assumed, and $v^{\perp}$ denotes a vector orthogonal to $v$. Given $x$, the normalization is performed by applying a planar similarity transformation $T_x$ such that $T_x(x) = (0,0)$, $T_x(y) = (1,0)$, and $T_x(z)$ has a positive second coordinate. (See Figure 2.) The last condition enables a query to match a reflected version in the database. The bottom row of Figure 1 shows several of the poses generated from an example shape. Note that by construction, all shape points in the invariant reference frame are bounded by a unit circle centered at the origin. In the following, $\mathbf{u} = (u, v)$ will denote coordinates in the invariant reference frame.

Figure 2. Transformation of a shape contour to an invariant reference frame using a three-point geometric construction.
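The three-point construction of equations (1) and (2) can be illustrated with a short sketch. This is an illustration only, not the paper's implementation (which was written in Matlab): the function name `normalize_pose` and the representation of a contour as a list of `(x, y)` tuples are assumptions, and resampling of the normalized contour is omitted.

```python
import math

def normalize_pose(contour, key_index):
    """Map a contour to the invariant reference frame of one key point.

    contour   -- list of (x, y) sample points (hypothetical representation)
    key_index -- index of the chosen key point x within the contour
    """
    x = contour[key_index]
    # Equation (1): y is the sample point farthest from x.
    y = max(contour, key=lambda w: math.dist(w, x))
    dx, dy = y[0] - x[0], y[1] - x[1]
    scale2 = dx * dx + dy * dy  # squared length of the base segment x -> y
    if scale2 == 0.0:
        raise ValueError("degenerate contour: all points coincide with x")

    def transform(w):
        # Similarity transform sending x to (0, 0) and y to (1, 0).
        wx, wy = w[0] - x[0], w[1] - x[1]
        return ((wx * dx + wy * dy) / scale2, (wy * dx - wx * dy) / scale2)

    pose = [transform(w) for w in contour]
    # Equation (2): z is the point farthest from the line through x and y.
    # In the normalized frame that line is the u-axis, so the distance is |v|.
    z = max(pose, key=lambda u: abs(u[1]))
    if z[1] < 0.0:
        # Reflect so that z lies above the u-axis; this is what lets a
        # shape match its mirror image.
        pose = [(u, -v) for u, v in pose]
    return pose
```

Because y is the farthest point from x, every normalized point has distance at most 1 from the origin, matching the unit-circle bound noted above.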

Although $x$ is a key point (i.e., $x \in X_{\mathrm{key}}$), the points $y$ and $z$ are sample points (i.e., $y, z \in X$). Unlike key points, sample points are not produced by the key-point detection process and thus are not subject to failures of that process. Thus, using only one key point but two sample points to determine the normalizing transformation reduces the sensitivity of the method to key-point detection.

Finally, we note that even if similar key points $x$ and $x'$ are chosen on two similar contours $s$ and $s'$, respectively, the points $y, y'$ or $z, z'$ may lie on completely different parts of $s$ and $s'$. However, in practice, the probability of this occurring for all key points of a shape is small. The matching method in the next section requires only one good match of a key point in $s$ and a key point in $s'$ in order to match the shapes successfully.

3.3 Matching

Solving the correspondence problem is computationally expensive. To avoid that cost, the system uses a fixed-correspondence scheme as an approximation to flexible correspondence. All poses of all shapes are resampled in the invariant reference frame with the same number of points in the same pattern. Let $\mathbf{u}_1, \ldots, \mathbf{u}_n$ denote a sequence of $n$ new sample points of the normalized contour, starting with $\mathbf{u}_1 = T_x(x) = (0,0)$. Two poses are compared by measuring the sum of squared distances between points with the same index in $1, \ldots, n$. This is equivalent to representing each normalized contour (pose) as a vector $p = (u_1, \ldots, u_n, v_1, \ldots, v_n)^T$ in a $2n$-dimensional shape space, and measuring the difference between two poses $p$ and $p'$ as the squared Euclidean distance between the $2n$-dimensional points in the shape space (Figure 3):

$$D^*_{\mathrm{pose}}(p, p') = (p - p')^T (p - p'). \tag{3}$$

All poses of all shapes coincide at the transformed key point, i.e., at the origin of the invariant reference frame.
Generally, the accuracy of the fixed-correspondence approximation is highest near to the origin and tends to decrease with distance along the contour from the origin.
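The fixed-correspondence comparison of equation (3) amounts to a single vector subtraction and dot product. A minimal sketch follows, assuming each pose has already been resampled to the same number of points; the helper names `pose_vector` and `pose_distance_sq` are hypothetical.

```python
def pose_vector(points):
    """Stack n resampled points (u_k, v_k) as (u_1..u_n, v_1..v_n)."""
    return [u for u, _ in points] + [v for _, v in points]

def pose_distance_sq(p, p_prime):
    """Equation (3): squared Euclidean distance between two pose vectors.

    The k-th entry of one vector is always compared with the k-th entry of
    the other, so no point correspondence has to be solved for.
    """
    if len(p) != len(p_prime):
        raise ValueError("poses must be resampled to the same length")
    return sum((a - b) ** 2 for a, b in zip(p, p_prime))
```

For example, two three-point poses that differ only by 0.25 in the v-coordinate of their last point have a difference of 0.25 squared, i.e. 0.0625.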

Figure 3. Pose-to-pose matching in the canonical reference frame. The short line segments show the fixed point-correspondences implied by the componentwise vector match. For display purposes, only every second sample point is shown in this figure.

The reason we align key points instead of centroids is to ensure that the fixed-correspondence approximation will be accurate over at least part of the contour. In order to further reduce the error of the fixed-correspondence approximation, a non-uniform sampling pattern is used to give more influence to the parts of the contour that are nearer to the origin. The current system uses a logarithmic sampling pattern as shown in Figure 4. The expression in (4) fits $n$ sample points on a closed contour of unit arc length, such that the spacing pattern is the same in both directions from $\mathbf{u}_1$:

$$l(k) = \begin{cases} \dfrac{e^{\lambda (k-1)} - 1}{\lambda_0}, & k = 1, \ldots, \dfrac{n}{2} + 1, \\[2ex] 1 - \dfrac{e^{\lambda (n-k+1)} - 1}{\lambda_0}, & k = \dfrac{n}{2} + 2, \ldots, n. \end{cases} \tag{4}$$

In (4), $l(k)$ is the position of the $k$th sample point, expressed as an arc-length distance from the origin, $\mathbf{u}_1$, and $\lambda$ is a parameter that controls how rapidly the sampling rate decreases away from $\mathbf{u}_1$. The factor $\lambda_0 = 2(e^{\lambda n/2} - 1)$ ensures that the sample points lie on a closed curve of unit arc length. For convenience, $n$ is assumed to be even. The sampling pattern for a pose contour $p$ of arbitrary arc length $L(p)$ is $L(p)\,l(k)$. The effect of the non-uniform sampling pattern is to cause the method to perform hybrid local-global matching, in which global shape is considered but local information around the key point has more influence.

Figure 4. Logarithmic sampling pattern.

Different poses of the same shape may have different arc lengths, as demonstrated in the bottom row of Figure 1. As a consequence, one pair of similar poses from a pair of shapes being matched may be larger than another pair of similar poses from the same shape pair. The larger pose pair will tend to have a larger $D^*_{\mathrm{pose}}$ value. Therefore, the pose difference measure in (3) is normalized by the squared average of the arc lengths of the two poses being matched:

$$D_{\mathrm{pose}}(p_1, p_2) = \frac{D^*_{\mathrm{pose}}(p_1, p_2)}{\left[ (L(p_1) + L(p_2)) / 2 \right]^2}. \tag{5}$$

The denominator is squared because $D^*_{\mathrm{pose}}$ is a squared error measure. The normalization in (5) allows the difference values of multiple pairs of poses from the same two shapes to be compared commensurately. The normalization in (5) yields higher system accuracy than the one used in [30-32]. The arc lengths of pose contours in the database can be computed in advance, so only the arc lengths of the query shape's poses must be computed at runtime.

The difference between two shapes $s$ and $s'$ is computed as the minimum of all pairwise pose differences between the shapes:

$$D_{\mathrm{shape}}(s, s') = \min_{i,j} D_{\mathrm{pose}}(p_i, p'_j), \tag{6}$$

where $i, j$ index the poses $p_i$, $p'_j$ from $s$, $s'$, respectively. For a concrete example, suppose that there are 10 key points in $s$ and 15 key points in $s'$. Then $D_{\mathrm{shape}}(s, s')$ is computed from 150 pose-to-pose matches, corresponding to pairs of key points from $s$ and $s'$.

Finally, to perform retrieval, the query shape is compared to the database shapes using (6), and the database shapes are ranked in order of increasing $D_{\mathrm{shape}}$. Typically, the $R$ shapes with the smallest $D_{\mathrm{shape}}$ scores are returned, where $R$ is specified by the user.

4. Chance probability functions

4.1 Overview

The distribution of shapes in shape space is generally not uniform. As a consequence, equal distances from the input shape to different stored shapes do not have the same significance in high-density versus low-density regions of the space. This suggests that replacing a shape-space distance measure such as (3) or (5) with a probability measure based on the distribution of the shapes in the space would lead to improved accuracy.

Suppose shapes have a probability distribution in the shape space, and assume that the database shapes and query shapes are randomly and independently drawn according to this distribution. The proposed method is based on the probability of a query shape being close to each of the database shapes. The less likely it is that an observed dissimilarity between a query shape $q$ and a stored shape $s$ occurred by chance, the more similar $q$ and $s$ are considered to be. For a given query $q$ we compute this "chance probability" (CP) for each database shape $s$, and rank the database shapes by increasing chance probability instead of by increasing shape-space (geometric) distance.

In order to estimate CPs rapidly during retrieval, a set of chance probability functions (CPFs) are learned during the database setup phase. Each CPF is a mapping from geometric distance to chance probabilities for one shape pose in the database.
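For reference, the geometric matcher of Section 3.3, whose distance-based ranking the chance probabilities replace, can be sketched as follows. The sketch covers the sampling pattern (4), the normalized pose difference (5), the shape difference (6), and the final ranking. All function names are hypothetical; a pose is assumed to be stored as a `(vector, arc_length)` pair and the database as a dictionary from shape name to a list of such pairs; equation (4) is taken as a mirrored exponential spacing with $\lambda_0 = 2(e^{\lambda n/2} - 1)$.

```python
import math

def sampling_positions(n, lam):
    """Equation (4): arc-length positions l(1..n) on a unit-length contour."""
    if n < 2 or n % 2 != 0 or lam <= 0:
        raise ValueError("n must be even and >= 2, lam must be positive")
    half = n // 2
    lam0 = 2.0 * (math.exp(lam * half) - 1.0)
    positions = []
    for k in range(1, n + 1):
        if k <= half + 1:
            positions.append((math.exp(lam * (k - 1)) - 1.0) / lam0)
        else:
            # Mirror image of the first half, measured back from the origin.
            positions.append(1.0 - (math.exp(lam * (n - k + 1)) - 1.0) / lam0)
    return positions

def pose_difference(p, p_prime, len_p, len_p_prime):
    """Equation (5): squared distance over squared mean arc length."""
    d_star = sum((a - b) ** 2 for a, b in zip(p, p_prime))
    mean_len = (len_p + len_p_prime) / 2.0
    return d_star / (mean_len ** 2)

def shape_difference(poses_s, poses_t):
    """Equation (6): minimum over all pose pairs of the two shapes."""
    return min(pose_difference(p, q, lp, lq)
               for p, lp in poses_s
               for q, lq in poses_t)

def retrieve(query_poses, database, r):
    """Return the names of the r database shapes with smallest D_shape."""
    ranked = sorted(database,
                    key=lambda name: shape_difference(query_poses, database[name]))
    return ranked[:r]
```

With `n = 100` and `lam = 0.1` (the values reported in Section 5), the first position is 0, the position at index `n/2` is 0.5, and the spacing grows with distance from the origin in both directions.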
Learning makes the method feasible: since each CPF depends on all of the poses of all of the shapes in the database, it would be expensive to compute the CPFs at run time. CPFs are storage-efficient since the properties of nearest-neighbor methods can be exploited to compress the representation of each CPF into a relatively small lookup table.

The empirical distribution of the shape poses in the database, rather than an assumed model distribution, is used to compute the CPFs. This approach has four useful properties. First, it does not require class membership information. Thus, it can be used for the unlabeled retrieval problem that is the focus of this paper. In contrast, Bayesian methods that depend on knowledge of class-conditional probabilities cannot be used for the unlabeled retrieval problem. Second, the computation of chance probabilities is parameter-free. This is an advantage over using kernel-based methods to explicitly estimate the distribution in shape space, since kernel-based methods require a window size parameter. Third, the proposed approach has a fast run-time implementation using compressed lookup tables. Fourth, the chance probabilities are computed directly from similarities in the shape space, instead of from the composition of models for individual feature dimensions. This has an important consequence: feature independence is not required. Indeed, in our application the features are highly non-independent. It also implies that in practice, our method can be used with high-dimensional shape spaces without concern for the numerical inaccuracy caused by summing a large number of independent log probability estimates. For example, in this paper, the dimensionality of the shape space is 200.

The direct computation of chance probabilities distinguishes the method from works in perceptual grouping and in object recognition which compute the chance probabilities of complex events by composing the chance probabilities of multiple simpler events (e.g., [6,11,22]).

4.2 Chance probability functions

Let $F(p)$ denote the probability distribution of pose vectors $p$ in the $2n$-dimensional pose vector space defined in Section 3.
Let $P_p(d)$ denote the probability of a pose occurring within $D_{\mathrm{pose}}$-distance $d$ of $p$:

$$P_p(d) = \mathrm{Prob}\{\, p' : D_{\mathrm{pose}}(p, p') \le d \,\}. \tag{7}$$

We estimate $P_p(d)$ from the set $S$ of all poses of all shapes in the database, as follows:

$$P_p(d) = \frac{\left|\{\, p' \in S : D_{\mathrm{pose}}(p, p') \le d \,\}\right|}{|S|}, \tag{8}$$

where $|\cdot|$ denotes set cardinality. In the rest of this paper, $P_p(d)$ will refer to the estimate (8) instead of to (7). The CPF method can be used for the shape retrieval problem precisely because a large sample of shapes is available in the shape database to provide the estimates of $P_p(d)$ for each $p$.

Using the database poses to estimate $P_p(d)$ in (8) results in a simple structure for $P_p(d)$. Suppose that the poses in $S$ are ranked by distance from $p$; i.e., number the poses $p_1, p_2, \ldots, p_m$, where $m = |S|$, $p_1 = p$, and $D_{\mathrm{pose}}(p, p_i) \le D_{\mathrm{pose}}(p, p_{i+1})$ for $1 \le i < m$. Then

$$P_p[D_{\mathrm{pose}}(p, p_i)] = i/m \tag{9}$$

for $i = 1, \ldots, m$. These probabilities are simply the rank orderings of the training shapes by their $D_{\mathrm{pose}}$ distance from $p$, normalized by $m$. The $P_p(d)$ function for each $p$ is a cumulative probability distribution. It is a stair function as illustrated in Figure 5. It is straightforward to store these cumulative probability functions as lookup tables indexed by $d$. A complete lookup table requires $m$ entries; however, since in similarity-based retrieval the results are usually determined by the closest examples, it is only necessary to keep the first few entries $1, \ldots, m_1$ of the table plus a small number ($m_2$) of the remaining entries, which can be interpolated if necessary. Then the size of the lookup table is $m_1 + m_2$. The experiments in this paper used logarithmic spacing of the higher table entries.

Figure 5. Chance probability function for one pose, $p = p_1$. The piecewise linearly interpolated version is used in the method.

Now let $q$ represent a pose of an unknown shape. The chance probability of $q$ relative to $p$ is defined as

$$CP_p(q) \equiv P_p[D_{\mathrm{pose}}(p, q)] = \mathrm{Prob}\{\, p' \in S : D_{\mathrm{pose}}(p, p') \le D_{\mathrm{pose}}(p, q) \,\}. \tag{10}$$

This is the probability of getting a pose as close or closer to $p$ than $D_{\mathrm{pose}}(p, q)$ by chance, as estimated by the actual distribution of the database poses. The smaller the value of $CP_p(q)$, the less likely it is that $q$ is near to $p$ by chance. Therefore, when $CP_p(q)$ is small, we infer that $q$ is similar to $p$.

Initial experimentation demonstrated that better results are obtained when $P_p(d)$ is interpolated between the steps, as shown by the continuous function in Figure 5, since otherwise ties occur frequently because $P_p(d)$ increases in discrete steps. In the rest of the paper $P_p(d)$ will refer to this interpolated form, which we call the chance probability function (CPF) of $p$. Since the functions $P_p(d)$ are stored as lookup tables, the interpolation is performed during retrieval rather than during learning. Any remaining ties in $CP_p(q)$ with respect to different database poses $p$ are resolved using the $D_{\mathrm{pose}}(p, q)$ values as tie breakers.

The chance probabilities are used for shape retrieval as follows. Given an unknown input shape $q$, $D_{\mathrm{pose}}(p, q)$ is computed for every pose $q$ of the input shape and every pose $p$ of every database shape, using the matching method of Section 3. Then, for each $p$ and $q$, the chance probability $CP_p(q)$ is found using the lookup table for the CPF of $p$. The retrieval results are computed as in Section 3 except that the chance probabilities $CP_p(q)$ are substituted for the distances $D_{\mathrm{pose}}(p, q)$. Intuitively, each distance is replaced by the significance of that distance.
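The rank-based estimate (8)-(9) and its piecewise linear interpolation can be sketched as a sorted-list lookup. This is a simplified illustration under stated assumptions: the function names are hypothetical, and the full table of $m$ distances is kept rather than the compressed table of $m_1 + m_2$ entries described above.

```python
import bisect

def learn_cpf(distances_from_p):
    """Build the lookup table for one database pose p.

    distances_from_p -- D_pose(p, p') for every pose p' in S, including
    p itself (distance 0). After sorting, the entry at 0-based index i has
    rank i + 1, so by equation (9) its probability is (i + 1) / m.
    """
    return sorted(distances_from_p)

def chance_probability(table, d):
    """Interpolated CPF: estimate of Prob{D_pose(p, p') <= d}."""
    m = len(table)
    if d < table[0]:
        return 0.0          # no database pose is this close to p
    if d >= table[-1]:
        return 1.0          # every database pose is at least this close
    i = bisect.bisect_right(table, d)   # table[i-1] <= d < table[i]
    lo, hi = table[i - 1], table[i]
    frac = (d - lo) / (hi - lo)         # position between two stair steps
    return (i + frac) / m
```

At the tabulated distances the function reproduces the stair values $i/m$ of equation (9); between them it rises linearly, which avoids the frequent ties of the uninterpolated stair function.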
Although the chance probabilities are used as dissimilarity values they are not metric distances: neither symmetry nor the triangle inequality is satisfied. In the experimental results below, we use a symmetric version $CP(p, q) = \max[CP_p(q), CP_q(p)]$. The second term, $CP_q(p)$, is the estimated probability of getting a database shape as close to or closer to $q$ than $D_{\mathrm{pose}}(p, q)$ by chance. The max operator enforces a conservative inference strategy, in which the two poses are considered similar if both of them are unlikely to be that close to the other by chance. Note that the computation of the new chance probability function, $CP_q$, depends only on the distances between the query poses and the database poses. These must be computed by the base matcher at retrieval time anyway, so in practice, the additional cost of computing $CP_q$ is small (footnote 2).

The use of chance probabilities differs fundamentally from the use of class-conditional probabilities $p(q \mid c)$ or posterior probabilities $p(c \mid q)$, where $c$ is a class hypothesis. In class-based approaches, high similarity is usually inferred from high evidence for the class hypothesis. In our approach, high similarity is inferred from low evidence for the chance hypothesis. With the chance probability method, estimates of class-conditional or posterior probability densities are not required.

5. Experimental evaluation of retrieval performance

To enable comparison with other approaches, we evaluated the proposed system on a widely used benchmark test of retrieval accuracy [18]. The test database, which was developed as part of the MPEG-7 standards project, consists of 70 classes of objects with 20 examples per class, for a total of 1,400 shapes. The shapes in the database come from a variety of sources and include both natural and artificial objects. The database is challenging due to the presence of outliers (examples that are visually dissimilar from other members of their own class) and overlapping classes (examples that are highly similar to members of other classes) [18]. Figure 6 shows a few example shapes from the database.
The figure shows pairs of shapes from the same classes in order to illustrate the within-class variability in this database. 2 By contrast, the computation of the CPFs CP p for each database pose p requires computation of the distances between all pairs of database poses, which is performed during database setup. 17

Figure 6. Example shapes from the test database [18].

The standard test procedure is as follows [1,10,18,27]. Each database shape is used as a test query. A retrieval is counted as correct if it is in the same class as the query. The number of correct retrievals in the top 40 ranks is counted. The self-match is included in the count. The retrieval accuracy of each system is reported as a percentage of the maximum possible number of correct retrievals, which is 28,000 (1,400 shapes × 20 correct retrievals per shape). We will refer to this percentage score as the retrieval accuracy or bullseye score, following [18]. It is important to emphasize that the class labels are used only for evaluation purposes; they are used neither by our retrieval method nor by any of the other methods in the comparison.

Preprocessing of the 1,400 shapes resulted in 20,035 poses (an average of about 14 poses per shape). The accuracy of the method is not sensitive to the number of sample points, $n$, as long as $n$ is large enough to capture the complexity of the shapes. We found that $n = 100$ satisfied this requirement for the MPEG-7 test database. The method is also not very sensitive to the value of the sampling parameter $\lambda$. Accuracy appears to peak at $\lambda = 0.1$ and decreases slowly away from this value.

The retrieval accuracy of the base method, i.e., using the pairwise matcher from Section 3 but not the CPFs, was 80.78%. This score is higher than those of the prior methods (Table 1).
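The test procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the experiments: the function name `bullseye_score` and its arguments are hypothetical, and it assumes that a full matrix of shape-to-shape dissimilarities (self-matches included) is already available.

```python
import numpy as np

def bullseye_score(dist, labels, top_k=40):
    """Percentage of possible same-class retrievals found in the top_k ranks.

    dist   : (n, n) array, dist[q, s] = dissimilarity of query q to shape s
             (hypothetical input; the self-match dist[q, q] is included).
    labels : length-n sequence of class labels, used for evaluation only.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    correct = 0
    max_correct = 0
    for q in range(len(labels)):
        # Rank every database shape, the query itself included.
        ranked = np.argsort(dist[q], kind="stable")[:top_k]
        correct += int(np.sum(labels[ranked] == labels[q]))
        # A query cannot retrieve more correct shapes than its class holds.
        class_size = int(np.sum(labels == labels[q]))
        max_correct += min(class_size, top_k)
    return 100.0 * correct / max_correct
```

For the MPEG-7 database, each class has 20 members, so the denominator is 1,400 × 20 = 28,000, as stated in the text.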

The benchmark test specified only accuracy, not speed, and speed results were reported for only some of the prior systems tested. We have compiled the speed results for the methods of Table 1 where available, and standardized them by reporting the time it takes to match a single query shape with a single database shape (pairwise matching time). For our method, the pairwise matching time of 0.3 milliseconds includes the time required to match every pose of the query shape with every pose of the database shape; i.e., it is the time to compute $D_{shape}$ and not $D_{pose}$. Even taking platform differences into consideration, it is clear that our method is the fastest by a significant margin. (Our timing results were measured using an interpreted Matlab implementation on a 1.8 GHz Pentium PC.) In fact, less than one second is required to perform a complete retrieval, including preprocessing of the query contour and exhaustive matching with every one of the 1,400 database shapes.

Table 1. Retrieval accuracy on 1,400 test objects (MPEG-7 test database) measured as the percent of correct retrievals in the top 40 ranks (bullseye score) [18], and pairwise shape matching time. The accuracy scores for methods 4-9 are from [18]. The references given for methods 4-9 in the table below are for the papers introducing the methods on which the test systems were based, not the systems themselves. See [18] for more details. For our methods (top 3 rows) the shape matching time includes all pose-to-pose matches between two shapes.

Method                                       Retrieval accuracy   Pairwise shape matching time (ms)
base matcher + aggregated-pose CPFs          84.05%               3.7
base matcher + CPFs                          83.04%               2.6
base matcher: fixed correspondence method    80.78%               0.3
1. Distance Sets [10]                        78.38%
2. Curve edit distance [27]                  78.17%
3. Shape contexts [1]                        76.51%
4. Part correspondence [16,17]               76.45%
5. Curvature scale space [21]                75.44%
6. Multilayer eigenvectors                   70.33%
7. Zernike moments [15]                      70.22%               n/a
8. Wavelet [3]                               67.76%
9. Skeleton DAG [19]                         60%

We then tested the hypothesis that using chance probabilities instead of shape-space distance would increase the accuracy of the method. The CPFs were learned from the entire set of 20,035 poses offline, during database setup. There were no parameters to set for learning CPFs. Each CPF lookup table was then compressed 100-to-1, from 20,035 to 200 entries ($m_1 = 100$, $m_2 = 100$). The bullseye score was 83.04%, a gain of more than two percentage points over the version without CPFs. The pairwise matching time of 2.6 ms was higher than for the non-CPF version, due to the additional time required to perform the table lookups at retrieval time. However, this is still faster than the prior methods for which data are available, and much faster than most of them (Table 1).
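To make the CPF-based scoring more concrete, the sketch below assumes that the CPF of a pose is simply the empirical fraction of database poses lying at or within a given distance of it. This matches the verbal definition of a chance probability, but it may differ in detail from the estimator and from the compressed 200-entry lookup tables used in the experiments, which are omitted here. All function names are hypothetical.

```python
import numpy as np

def learn_cpf(dists_to_database):
    """Learn an (uncompressed) CPF for one pose.

    dists_to_database: distances from this pose to the other database poses.
    Assumption: the CPF is represented by the sorted distance sample itself.
    """
    return np.sort(np.asarray(dists_to_database, dtype=float))

def chance_probability(cpf, d):
    """Empirical probability that a database pose lies at distance <= d."""
    count = np.searchsorted(cpf, d, side="right")
    return count / len(cpf)

def symmetric_cp(cpf_p, cpf_q, d_pq):
    """Symmetric chance probability CP(p, q) = max[CP_p(q), CP_q(p)]."""
    return max(chance_probability(cpf_p, d_pq),
               chance_probability(cpf_q, d_pq))
```

Because each lookup is a binary search in a sorted array, the retrieval-time cost per pose pair stays small, which is in the spirit of the table lookups described above.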

Offline database setup times are reasonably low. For the method without CPFs, it took 1.1 minutes to set up the database (including preprocessing all contours), whereas for the method with CPFs, it took 11.5 minutes. The additional time was due to the computation of the CPFs.³

³ Note that dimensionality reduction can be used to decrease database storage requirements [31]. For example, using only the first 25 principal components [7] of the 200-dimensional pose vectors provides an 8-to-1 compression of the vectors, at the cost of reducing retrieval accuracy from 83.04% to 82.36%.

6. Aggregating information from multiple pose matches between two shapes

Let $p_1$, $p_2$ be two poses from one shape and $q_1$, $q_2$ be two poses from another shape. In general, the values $D_{pose}(p_1,q_1)$ and $D_{pose}(p_2,q_2)$ are not independent; if two shapes are similar, they are likely to be similar in more than one pose-to-pose match. A complete model for exploiting the information from multiple pose-to-pose matches between the same two shapes would have to take this non-independence into account. However, we hypothesized that an assumption of pose independence might nevertheless provide a computationally tractable approximation for exploiting information from multiple pose matches between the same two shapes. To avoid confusion, note that here we are discussing the independence of poses, not the independence of features. As discussed in Section 4, our method does not assume feature independence.

We extend the CPF method as follows. Let $M = \{(p_i, q_j)\}$, for $i = 1,\ldots,m_s$ and $j = 1,\ldots,m_q$, be the set of all pose-to-pose matches between the $m_s$ poses of a database shape and the $m_q$ poses of the query shape. A greedy bipartite matching of the poses of the two shapes is performed by repeatedly selecting the match $M^*$ in $M$ with the lowest chance probability,

$$M^* = (p^*, q^*) = \arg\min_{(p,q) \in M} \{CP(p,q)\}, \qquad (11)$$

and removing from $M$ all other matches that contain either $p^*$ or $q^*$. This yields a sequence of non-overlapping matches $M_k = (p_k, q_k)$, $k = 1,2,\ldots,N$, ordered by increasing chance probability value.

Figure 7. Retrieval accuracy (%) as a function of $N$, the number of poses used to compute an aggregated CP.

We assume that non-overlapping matches (matches that do not share any poses in common) are independent, and compute the probability for the set of matches $M_1,\ldots,M_N$ by multiplying the probabilities $CP(p_k,q_k)$ for each $k = 1,\ldots,N$:

$$CP(s,q) = \prod_{k=1}^{N} CP(p_k, q_k), \qquad (12)$$

where $s$, $q$ are the database and query shapes, respectively. For numerical reasons we sum the logs of the probabilities instead of multiplying the probabilities:

$$\log CP(s,q) = \sum_{k=1}^{N} \log CP(p_k, q_k). \qquad (13)$$

Figure 7 shows the performance of the aggregated-pose CP method for different values of $N$. Empirically, the maximum benefit is obtained at $N = 3$, which yields an accuracy score of 84.05%. This corresponds to an additional 915 shapes correctly retrieved, compared to the base method without CPFs, and an additional 1,588 shapes correctly retrieved compared to the most accurate prior method in Table 1 (Distance Sets [10]). At this level of performance, the method retrieves nearly 17 of the 20 possible correct shapes for each query, on average. The pairwise matching time is 3.7 ms. Even with the additional computation, the method is still faster than prior approaches. In the following, when results for the aggregated-pose version of the method are given, $N = 3$ will be assumed.
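The greedy selection of Eq. (11) and the log-sum of Eq. (13) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the function name `aggregated_log_cp` is hypothetical, the matrix of pose-to-pose chance probabilities is assumed to be given, and the small floor `eps` that guards against taking the log of zero is an assumption not discussed in the text.

```python
import math

def aggregated_log_cp(cp, n_matches=3, eps=1e-12):
    """Sum of log chance probabilities over greedily chosen pose matches.

    cp[i][j]  : chance probability CP(p_i, q_j) between database pose i and
                query pose j (hypothetical input).
    n_matches : N, the number of non-overlapping matches to aggregate.
    eps       : assumed floor to avoid log(0).
    """
    # Sorting once and skipping conflicts is equivalent to repeatedly taking
    # the minimum and deleting every match that shares a pose with it.
    candidates = sorted(
        (cp[i][j], i, j) for i in range(len(cp)) for j in range(len(cp[i]))
    )
    used_p, used_q = set(), set()
    log_cp = 0.0
    selected = 0
    for value, i, j in candidates:
        if selected == n_matches:
            break
        if i in used_p or j in used_q:
            continue  # overlaps an already selected match
        used_p.add(i)
        used_q.add(j)
        log_cp += math.log(max(value, eps))
        selected += 1
    return log_cp
```

If the two shapes have fewer than N poses in common, the sketch simply aggregates as many non-overlapping matches as exist; how the original system handles that case is not stated here.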

7. Shape classification

The method presented here is designed for the problem of unlabeled shape retrieval. However, if the database shapes have class labels, then the current method can also be used for classification. In this section, we report the results of a nearest neighbor (1-NN) classifier. The classifier returns the label of the best-matching database shape:

$$\hat{s} = \arg\min_{s \in \text{database}} D_{shape}(q, s), \qquad (14)$$

$$\hat{c} = \mathrm{class}(\hat{s}), \qquad (15)$$

where $q$ is the input shape, $s$ is a database shape, $\hat{s}$ is the database shape closest to $q$, $\mathrm{class}: s \to c$ maps shapes to their class labels, and $\hat{c}$ is the hypothesized class.

We measured the classification accuracy on the same MPEG-7 1,400-shape database used for evaluation of retrieval accuracy. Every one of the 1,400 shapes was used as a test input. A classification was considered correct if the closest non-identical match in the database was in the same class as the query shape. The classification accuracies of the base method, the method with CPFs, and the method with aggregated-pose CPFs are shown in Table 2. Since even the base method generates few errors in an absolute sense (42 out of 1,400 test inputs), improvements due to CPFs and aggregated-pose CPFs will necessarily be small. The most accurate method (aggregated-pose CPF) generates only 36 errors out of 1,400, a 2.6% error rate.

Table 2. Classification accuracy on 1,400 test shapes (MPEG-7 database).

Method                                         Successes        Errors
Fixed correspondence                           1,358 (97.0%)    42 (3.0%)
Fixed correspondence + CPFs                    1,361 (97.2%)    39 (2.8%)
Fixed correspondence + aggregated-pose CPFs    1,364 (97.4%)    36 (2.6%)
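The classification test of Eqs. (14) and (15) amounts to a leave-one-out nearest-neighbor evaluation, which might be sketched as below. The function names are hypothetical, and the sketch assumes that the only identical match for each query is the query itself, so that masking the diagonal of the dissimilarity matrix is sufficient to exclude it.

```python
import numpy as np

def nearest_neighbor_labels(dist, labels):
    """Label of the closest non-identical database shape for every query.

    dist   : (n, n) array of shape-to-shape dissimilarities (hypothetical input).
    labels : length-n sequence of class labels.
    """
    dist = np.array(dist, dtype=float)   # copy so the caller's matrix is kept
    labels = np.asarray(labels)
    np.fill_diagonal(dist, np.inf)       # exclude the self-match
    nearest = np.argmin(dist, axis=1)    # Eq. (14)
    return labels[nearest]               # Eq. (15)

def classification_accuracy(dist, labels):
    """Fraction of queries whose nearest neighbor shares their class."""
    predicted = nearest_neighbor_labels(dist, labels)
    return float(np.mean(predicted == np.asarray(labels)))
```

Since the dissimilarities are already computed during retrieval, this step adds only an argmin per query.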

The classification errors were due to two causes. In some cases a shape from one class was similar to a shape from another class; for example, quadrupeds such as horses and dogs were occasionally confused. In other error cases, there was no database shape example similar to the query shape. It is interesting to note that the correct class was missing from the top 5 class hypotheses only 3 times out of 1,400 trials (0.2%).

Direct comparison with the other methods in Table 1 is not possible because classification accuracy was not reported in those papers. However, it appears that the classification accuracy of the proposed method is high in absolute terms. Furthermore, after retrieval has been performed, classification takes negligible additional time.

8. Discussion

The basic characteristic of a one-to-many shape matching problem is that a large sample of shapes is available. We exploited that characteristic in two ways to develop a system that is both accurate and fast in comparison with earlier methods. First, we used the statistics of the large sample to obtain a better chance probability model than would be obtained by assuming a uniform distribution in shape space. (The latter is equivalent to the method without CPFs.) Second, we used the fact that the output of a shape retrieval system is usually driven by matches with relatively similar shapes if the training and test sets are drawn from the same population. Thus, costly flexible matching is not required.

More specifically, the system is fast because (1) a linear number of poses is generated for each shape, (2) individual pose-to-pose matches are performed without solving either a correspondence problem or a structural matching problem, and (3) the chance probability functions are learned in advance and stored in lookup tables. The use of a fixed correspondence as an approximation to flexible correspondence is key.

The system is accurate because the alignment approach provides a high level of redundancy, which we can exploit successfully because the speed of the core matcher enables exhaustive pairwise matching of the poses of two shapes. This increases the probability of finding a good match if one exists. Thus, sacrificing accuracy for speed at the level of individual matches leads to an increased accuracy at the system level. The chance probability model increases accuracy further by exploiting implicit information about the density of the shape space in the vicinity of each database shape. Furthermore, the chance probability model is convenient to use with high-dimensional spaces because it is a function on a one-dimensional domain; i.e., distance from a point in feature space. In this respect it differs from kernel-based density estimation, since density is a function on the shape space itself, which is a $2n$-dimensional domain.

The results reported in this paper were obtained without the use of generic information retrieval techniques for reducing the number of pairwise matches (i.e., database indexing, clustering, or prefiltering). Since these techniques can be used with any one-to-many shape matching system, the current system could employ them for use with larger databases. These techniques could also be used to reduce the cost of learning the CPFs in the offline database setup stage. This is valuable since the cost of learning all of the CPFs is $O(m^2 \log m)$, where $m$ is the number of database poses.

The CPF method is a generic technique that can be applied to other shape retrieval methods that are based on distances in a feature space, and to retrieval of other types of data besides shapes. However, in contrast to the generic techniques mentioned above, the CPF method is used to increase accuracy rather than to decrease the number of matches.

We are currently exploring applications of the ideas of this paper to other problems, including the problem of finding correspondences between two isolated shapes [33].

Acknowledgements

The author thanks Drs. Latecki and Mokhtarian for generously making contour data files available. Versions of Figures 1 and 3-5 have appeared in [30-32]. This research was supported in part by NSF

References

[1] S. Belongie, J. Malik, and J. Puzicha, Shape Matching and Object Recognition Using Shape Contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(4).
[2] P. Besl and N. D. McKay, A Method for Registration of 3-D Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2).
[3] G. C.-H. Chuang and C.-C. J. Kuo, Wavelet Descriptor of Planar Curves: Theory and Applications, IEEE Transactions on Image Processing 5(1), 56-70.
[4] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, Active Shape Models - Their Training and Application, Computer Vision and Image Understanding 61(1), 38-59.
[5] A. Del Bimbo and P. Pala, Shape Indexing by Multi-Scale Representation, Image and Vision Computing 17.
[6] A. Desolneux, L. Moisan, and J.-M. Morel, Meaningful Alignments, International Journal of Computer Vision 40(1), 7-23.
[7] R. Duda, P. Hart, and D. Stork, Pattern Classification (2nd ed.), Wiley, New York.
[8] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz, Efficient and Effective Querying by Image Content, Journal of Intelligent Information Systems 3.
[9] Y. Gdalyahu and D. Weinshall, Flexible Syntactic Matching of Curves and Its Application to Automatic Hierarchical Classification of Silhouettes, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(12).
[10] C. Grigorescu and N. Petkov, Distance Sets for Shape Filters and Shape Recognition, IEEE Transactions on Image Processing 12(10).
[11] W. E. L. Grimson and D. P. Huttenlocher, On the Verification of Hypothesized Matches in Model-Based Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(12).
[12] G. van der Heijden and M. Worring, Domain Concept to Feature Mapping for a Plant Variety Image Database, in A. Smeulders and R. Jain (Eds.), Image Databases and Multi-Media Search, World Scientific, New Jersey.
[13] Y. Hicks, D. Marshall, R. R. Martin, P. L. Rosin, M. M. Bayer, and D. G. Mann, Automatic Landmarking for Building Biological Shape Models, Proceedings of the International Conference on Image Processing.
[14] D. P. Huttenlocher and S. Ullman, Recognizing Solid Objects by Alignment with an Image, International Journal of Computer Vision 5(2).
[15] A. Khotanzad and Y. H. Hong, Invariant Image Recognition by Zernike Moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12(5).
[16] L. Latecki and R. Lakämper, Shape Similarity Measure Based on Correspondence of Visual Parts, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10).
[17] L. Latecki and R. Lakämper, Application of Planar Shape Comparison to Object Retrieval in Image Databases, Pattern Recognition 35, 15-29.
[18] L. Latecki, R. Lakämper, and U. Eckhardt, Shape Descriptors for Non-rigid Shapes with a Single Closed Contour, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[19] I.-J. Lin and S. Y. Kung, Coding and Comparison of DAGs as a Novel Neural Structure with Application to On-line Handwritten Recognition, IEEE Transactions on Signal Processing 45(11).
[20] T.-L. Liu and D. Geiger, Approximate Tree Matching and Shape Similarity, Proceedings of the International Conference on Computer Vision.
[21] F. Mokhtarian, S. Abbasi, and J. Kittler, Efficient and Robust Retrieval by Shape Content through Curvature Scale Space, in A. Smeulders and R. Jain (Eds.), Image Databases and Multi-Media Search, World Scientific, New Jersey.
[22] P. Muse, F. Sur, F. Cao, and Y. Gousseau, Unsupervised Thresholds for Shape Matching, Proceedings of the International Conference on Image Processing, Vol. II.
[23] C. F. Olson and D. P. Huttenlocher, Automatic Target Recognition by Matching Oriented Edge Pixels, IEEE Transactions on Image Processing 6(1).
[24] C. A. Rothwell, A. Zisserman, D. A. Forsyth, and J. L. Mundy, Planar Object Recognition using Projective Shape Representation, International Journal of Computer Vision 16, 57-99.
[25] H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, Reading, MA.
[26] T. Sebastian, P. Klein, and B. Kimia, Shock-based Indexing into Large Shape Databases, Proceedings of the European Conference on Computer Vision.
[27] T. Sebastian, P. Klein, and B. Kimia, On Aligning Curves, IEEE Transactions on Pattern Analysis and Machine Intelligence 25(1).
