Shape Retrieval Based on Dynamic Programming


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 1, JANUARY 2000

Evangelos Milios and Euripides G. M. Petrakis

Abstract: We propose a shape matching algorithm for deformed shapes based on dynamic programming. Our algorithm is capable of grouping together segments at finer scales in order to come up with appropriate correspondences with segments at coarser scales. We illustrate the effectiveness of our algorithm in retrieval of shapes by content on two different two-dimensional (2-D) datasets, one of static hand gesture shapes and another of marine life shapes. We also demonstrate the superiority of our approach over traditional approaches to shape matching and retrieval, such as Fourier descriptors and geometric and sequential moments. Our evaluation is based on human relevance judgments following a well-established methodology from the information retrieval field.

Index Terms: Dynamic programming, image database, query by example, relevance judgments, shape retrieval.

Manuscript received December 1, 1998; revised July 27. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. This work was performed while E. G. M. Petrakis was visiting York University. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Thomas S. Huang. E. Milios was with the Department of Computer Science, York University, Toronto, Ont. M3J 1P3, Canada (e-mail: eem@cs.yorku.ca). He is now with the Faculty of Computer Science, Dalhousie University, Halifax, N.S. B3J 2X4, Canada (e-mail: eem@cs.dal.ca). E. G. M. Petrakis is with the Department of Electronic and Computer Engineering, Technical University of Crete, Chania, Crete, Greece (e-mail: petrakis@ced.tuc.gr).

I. INTRODUCTION

Object recognition is an important problem in computer vision and has received considerable attention in the literature. Most approaches to object recognition are model-based [1], emphasizing the accuracy of recognition. They are limited to specific image types and require that all shapes are preprocessed and labeled prior to storage. However, the increasing amounts of image data in many application domains have generated additional interest in real-time management and retrieval of shapes [2], [3]. There, the emphasis is not only on accuracy, but also on efficiency (i.e., speed) of retrieval. Due to large image numbers, less emphasis may be given to preprocessing and labeling.

The performance of any object recognition system ultimately depends on the types of shape representations used and on the matching algorithms applied [4]. An important class of methods relies on symbolic entities obtained from the representation of the shape's inflection points at various scales and is referred to as curvature scale space (CSS) representation [5]. In [6], matching is performed through interval trees, which are computed by tracking the CSS representation from coarser to finer scales. In [7], matching is based on the maxima of CSS curves. Recently, matching in scale-space has been addressed with dynamic programming (DP) [8].

Regarding image database retrieval by shape content, [2] reports experiments with traditional shape representation and matching methods (i.e., Fourier descriptors, moment-based methods, combinations of such methods) on 500 trademark images. More recently, the effectiveness of such shape methods in conjunction with color features is investigated in [3] using 1100 trademark images taken from two different datasets.

In this work, we focus on shape content and shape similarity retrievals. The contributions of this work are the following.

- We propose a shape matching algorithm that is particularly effective for shape retrieval.
- We establish the superiority of our method over traditional shape matching methods such as Fourier descriptors and sequential and geometric moments. We tested our algorithm on a data set of 980 two-dimensional (2-D) hand gesture shapes and on a marine life database with 1100 shapes.
- We introduce to the computer vision community a well-established method from information retrieval for the empirical evaluation of retrieval results obtained by many competing methods.
We assume that shapes have already been extracted from images in the form of closed sequences of points. Automatic shape extraction from images (for example via region segmentation or edge following) is a nontrivial problem, and it is outside the scope of this paper. For our hand gesture dataset, contours are extracted from images by taking the polygonal approximation of the hand boundary after thresholding. The shapes of the marine dataset are already available in the desired form.

The major problem with segmented representations is that small perturbations to the shape can yield large changes in the segmentation. Therefore, the matching algorithm must be robust to segmentation changes. A standard fix is to represent the shape at multiple scales of resolution (smoothing), and either use a full scale-space representation for matching [6], or, like [8] and our approach, have the algorithm choose the appropriate scales for different parts of the shape.

The rest of this work is organized as follows: Work related to our proposed algorithm is discussed in Section II. Our shape matching algorithm is presented in Section III. The database set-up, the evaluation criteria along with the experimental results are presented in Section IV. Conclusions and issues for future research are discussed in Section V.

II. RELATED WORK

Our algorithm is a substantial extension of the DP algorithm of [8], in the following ways.

- We propose that the computation of the CSS representation be removed from the matching algorithm. The CSS representation has two drawbacks: First, it tends to diffuse the effects of a feature far away from its location as coarser scales are considered. This may be undesirable if such features have perceptual significance. Second, it is computationally expensive. Our algorithm is not only faster but also achieves matching accuracy comparable to that of the original formulation [9].
- Merging of neighboring segments in [8] is allowed only if such segments merge at some scale (not necessarily the same for the two shapes) in the CSS representation. We present a formulation of the algorithm that does not use scale space to restrict the search for segment merges. Our algorithm allows all segment merges, and relies only on the minimization of the overall matching cost to select the merges.
- We have implemented a different set of cost measures from the original algorithm and have demonstrated their improved performance [9].
- The algorithm in [8] performs a best-only search in a DP table as it looks for minimum-cost paths in the DP framework. In [10], we have identified instances in which a best-only search strategy fails to find a valid match between two shapes, although one exists [9]. We have extended the algorithm to perform k-best search and we have demonstrated that for a small k (e.g., 5), the additional space and time requirements are modest and the algorithm can solve matching problems where the original one fails (see also Section III-E).

III. SHAPE MATCHING

The shape matching algorithm that lies at the core of our methodology takes in two shapes and computes: 1) their distance and 2) the correspondences between similar parts of the two shapes. In retrievals, only the distances are used. However, the correspondences help assess the plausibility of the distance computation, if necessary.

A. Definitions

Let $A$ and $B$ be the two shapes to be matched and let $A = a_0, a_1, \ldots, a_{N-1}$ and $B = b_0, b_1, \ldots, b_{M-1}$ be the convex/concave segment sequences of the two shapes, with $a_i$ being the segment between two consecutive inflection points (i.e., points of change of curvature) $p_i$ and $p_{i+1}$; similarly for $b_j$. Then, $a(i-n|i)$, $n \ge 0$, is the sequence of segments $a_{i-n}, a_{i-n+1}, \ldots, a_i$; similarly for $b(j-m|j)$, $m \ge 0$.
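The group notation $a(i-n|i)$ can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `segment_group` and the list-of-labels representation are hypothetical, and the indices are taken modulo the number of segments on the assumption that the contour is closed.

```python
def segment_group(segments, i, n):
    """Return a(i-n | i): the n + 1 consecutive segments ending at index i.

    Indices wrap around (modulo len(segments)); this cyclic indexing is an
    assumption made here because the contours are closed.
    """
    size = len(segments)
    return [segments[(i - n + s) % size] for s in range(n + 1)]


# Hypothetical shape with six alternating convex (C) / concave (V) segments.
shape_a = ["C", "V", "C", "V", "C", "V"]

print(segment_group(shape_a, 4, 0))  # a(4|4): the single segment a_4
print(segment_group(shape_a, 1, 2))  # a(-1|1): wraps around to a_5, a_0, a_1
```

With `n` even, the group contains an odd number of segments and starts and ends with the same segment type, which is the only kind of group the merging grammar produces.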
The algorithm searches for segment correspondences at various levels of shape detail by allowing matching of merged sequences of consecutive segments, if this leads to the minimization of a cost function. Each merging is a recursive application of the grammar rules $CVC \rightarrow C$ and $VCV \rightarrow V$, where $C$ and $V$ denote convex and concave segments, respectively [11]. The number of merged segments is always odd. A complete match is a correspondence between groups of consecutive segments in order, such that no segments are left unassociated.

The goal is to find the best association of segments in shape $A$ to segments in shape $B$. This is formulated as a minimization problem that is solved efficiently by dynamic programming: A table of partial costs is built and the optimal matching is searched in the form of a path in the DP table that minimizes a total dissimilarity cost. The DP table has $2N$ columns and $2M$ rows, corresponding to segments of shape $A$ and shape $B$, respectively, repeated twice (to force the algorithm to consider all possible relative rotations between the two shapes). All subscripts below are modulo $N$ and $M$, respectively.

A link between cells $(i_{w-1}, j_{w-1})$ and $(i_w, j_w)$ denotes the matching of segments $a(i_{w-1}+1|i_w)$ with $b(j_{w-1}+1|j_w)$. A path is a linked sequence of cells $(i_0, j_0), (i_1, j_1), \ldots, (i_w, j_w)$, not necessarily adjacent, indicating a partial match. A complete match has $(i_0, j_0) = (i_w, j_w)$. Function $\psi(a(i_{w-1}+1|i_w), b(j_{w-1}+1|j_w))$ represents the dissimilarity cost between its two arguments and is defined in Section III-C.

Each cell $\mathrm{cell}(i, j)$ in the DP table contains the cost array $g[k_w]$ and associated bookkeeping data $t_1[k_w]$, $t_2[k_w]$, $\mathrm{index}[k_w]$, $g_n[k_w]$, $g_m[k_w]$, where $k_w$ varies from 0 to $K-1$ and refers to the $k_w$th best path ($k_w$th option), or partial match, up to and including $\mathrm{cell}(i_w, j_w)$. Each cell keeps up to $K$ best paths. Specifically, $g[k_w]$ holds the cost of the path, $t_1[k_w]$ and $t_2[k_w]$ hold the number of unmatched segments in shapes $A$ and $B$, respectively, for the path, and $\mathrm{index}[k_w]$, $g_n[k_w]$ and $g_m[k_w]$ hold the back links for the path and allow the backward tracing of the path. If $g_n[k_w] = n_w$ and $g_m[k_w] = m_w$ for $\mathrm{cell}(i_w, j_w)$, then the previous cell and option in the path are $\mathrm{cell}(i_{w-1}, j_{w-1}) = \mathrm{cell}(i_w - 2n_w - 1, j_w - 2m_w - 1)$ and $\mathrm{index}[k_w]$, respectively, where $n_w$ and $m_w$ are nonnegative integers. Notice that $i_{w-1} = i_w - 2n_w - 1$ and $j_{w-1} = j_w - 2m_w - 1$, since the number of segments which are merged is always odd.

Fig. 1. Example of a DP table.

Fig. 1 illustrates the lower half of a DP table computed for the matching of two shapes with eight segments each. Two incomplete paths are shown ending at $\mathrm{cell}(10, 6)$. Numbers in cells indicate accumulated costs. The DP table consists of the initial value area (left half) and the calculation area (right half). In the initial value area all $g$ terms are initialized to zero, $t_1$ to $N$ and $t_2$ to $M$, implying that each of these cells can act as the first cell in a path. The calculation area is computed and finally the optimal path is searched. A path is complete when its corresponding $t_1$ and $t_2$ at the final cell of the path both become zero simultaneously.

B. Algorithm

The main idea in the algorithm is to fill the DP table cells by scanning forward and then search for the optimal complete path by tracing backward. We outline the algorithm as follows.

    for i_w = ... do
        for j_w = ... do
            fill the K options in cell(i_w, j_w);
            check if a complete path has been found;
        end for
    end for
    find the complete path with the lowest cost;
    retrace segment matches by following the backward links.

The for loop for $j_w$ does not run over all the indicated values, but only over those values that do not involve convex to concave segment associations, which are not possible. To describe the filling of the $k_w$th option cost $g[k_w]$ of $\mathrm{cell}(i_w, j_w)$ of the DP table, we need to compute the following $\phi(n, m, k)$ for all possible $n$, $m$, and $k$, and select the $k_w$th smallest value

$$\phi(n, m, k) = g(i_w - 2n - 1,\; j_w - 2m - 1,\; k) + \psi\big(a(i_w - 2n|i_w),\; b(j_w - 2m|j_w)\big), \qquad n, m \ge 0,\; 0 \le k < K. \qquad (1)$$

The cost $\psi$ in the above equation is the cost of association of segments $a(i_w - 2n|i_w)$ with $b(j_w - 2m|j_w)$ and consists of a merging cost component and a dissimilarity cost component (see Section III-C below). Additional constraints on acceptable pairs $(n, m)$ are that either $t_1 = t_2 = 0$ (a complete match has been found), or $t_1 > 0$, $t_2 > 0$ (there exist unmatched segments in both shapes).

C. Cost Components

The cost term in (1) can be rewritten as $\psi(a(i_{w-1}+1|i_w), b(j_{w-1}+1|j_w))$. Following the notation of [8], this cost term consists of three additive components:

$$\psi\big(a(i_{w-1}+1|i_w),\, b(j_{w-1}+1|j_w)\big) = \lambda\,\text{Merging Cost}\big(a(i_{w-1}+1|i_w)\big) + \lambda\,\text{Merging Cost}\big(b(j_{w-1}+1|j_w)\big) + \text{Dissim Cost}\big(a(i_{w-1}+1|i_w),\, b(j_{w-1}+1|j_w)\big) \qquad (2)$$

where $\lambda$ represents the relative importance of the merging and dissimilarity costs. In this work, $\lambda$ was set to one. The first two terms in (2) represent the cost of merging segments $a(i_{w-1}+1|i_w)$ in shape $A$ and segments $b(j_{w-1}+1|j_w)$ in shape $B$, respectively, while the last term is the cost of associating the merged $a(i_{w-1}+1|i_w)$ with the merged $b(j_{w-1}+1|j_w)$.

Requirements for reliable cost computation are the following.

- Merging should follow the process grammar rules [11] (i.e., each allowable merging should be a recursive application of the grammar rules $CVC \Rightarrow C$ and $VCV \Rightarrow V$). This is enforced by the DP algorithm.
- Merging a visually prominent segment (i.e., a large segment with high curvature) into a merged segment of the opposite type (convex or concave) should incur a high cost. To specify this requirement, we need to define visual prominence in geometric terms.
- The partial cost components arising from different features of the shape should be combined into a total cost in a meaningful way.

The heuristic cost computations that follow attempt to satisfy the above requirements. First, we define the geometric quantities (features) needed in the specification of visual prominence of a segment according to Fig. 2.

- Rotation angle $\theta_i$: the angle traversed by the tangent to the segment from inflection point $p_i$ to inflection point $p_{i+1}$; it shows how strongly a segment is curved.
- Length $l_i$: the length of segment $a_i$.
- Area $a_i$: the area enclosed between the chord and the arc between the inflection points $p_i$ and $p_{i+1}$.
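The three features defined above can be estimated for a segment given as a polyline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `segment_features` is hypothetical, the rotation angle is approximated by the accumulated turning of the polyline edges (the paper defines it through the tangent of the underlying curve), and the area is the polygon area enclosed by the polyline and its chord.

```python
import math


def segment_features(points):
    """Estimate (rotation angle, length, area) of one segment.

    points: list of (x, y) samples from inflection point p_i to p_{i+1}.
    All three estimates are polyline approximations (an assumption here).
    """
    # Length: sum of the edge lengths along the polyline.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))

    # Rotation angle: accumulated change of edge direction, with each
    # turning step wrapped into [-pi, pi).
    rotation = 0.0
    for p, q, r in zip(points, points[1:], points[2:]):
        before = math.atan2(q[1] - p[1], q[0] - p[0])
        after = math.atan2(r[1] - q[1], r[0] - q[0])
        rotation += (after - before + math.pi) % (2 * math.pi) - math.pi

    # Area between chord and arc: shoelace formula on the polygon obtained
    # by closing the polyline with the chord from the last to the first point.
    closed = list(points) + [points[0]]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(closed, closed[1:]))

    return abs(rotation), length, abs(twice_area) / 2.0


# A right-angle corner: two unit edges turning by 90 degrees.
theta, l, a = segment_features([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(theta, l, a)
```

Because every segment lies between two inflection points, the turning direction does not change within it, so taking the absolute value of the accumulated angle is a reasonable simplification for this sketch.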

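The cell-filling rule of Eq. (1) in Section III-B can be illustrated on a deliberately simplified problem. The sketch below is not the authors' algorithm: it assumes open (non-cyclic) segment sequences whose matching starts at the first segment of each shape, so it omits the doubled table for rotations and the $t_1$, $t_2$ bookkeeping. The names `toy_cost` and `k_best_match`, the `(type, length)` segment representation, and the cost itself are hypothetical stand-ins for the cost of Eq. (2).

```python
import heapq


def toy_cost(group_a, group_b):
    """Hypothetical stand-in for the cost of Eq. (2); NOT the paper's cost.

    A segment is a (type, length) pair. Merging is charged by the length of
    the absorbed (opposite-type) segments, dissimilarity by the difference
    of the total group lengths.
    """
    def absorbed(group):
        return sum(length for kind, length in group if kind != group[0][0])

    def total(group):
        return sum(length for _, length in group)

    return absorbed(group_a) + absorbed(group_b) + abs(total(group_a) - total(group_b))


def k_best_match(a, b, K=2, cost=toy_cost):
    """Return up to K cheapest complete matches of a[0..N-1] with b[0..M-1].

    g[(i, j)] holds up to K options (cost, links) for matching a[0..i] with
    b[0..j]; each link pairs a group a(i-2n|i) with a group b(j-2m|j).
    """
    N, M = len(a), len(b)
    g = {(-1, -1): [(0.0, [])]}  # virtual start cell: nothing matched yet
    for i in range(N):
        for j in range(M):
            if a[i][0] != b[j][0]:
                continue  # convex-to-concave associations are not allowed
            candidates = []
            n = 0
            while i - 2 * n >= 0:          # odd group a(i-2n | i)
                m = 0
                while j - 2 * m >= 0:      # odd group b(j-2m | j)
                    previous = g.get((i - 2 * n - 1, j - 2 * m - 1), [])
                    if previous:
                        step = cost(a[i - 2 * n:i + 1], b[j - 2 * m:j + 1])
                        link = ((i - 2 * n, i), (j - 2 * m, j))
                        for prev_cost, links in previous:
                            candidates.append((prev_cost + step, links + [link]))
                    m += 1
                n += 1
            if candidates:
                # Keep the K smallest values of Eq. (1) for this cell.
                g[(i, j)] = heapq.nsmallest(K, candidates, key=lambda c: c[0])
    return g.get((N - 1, M - 1), [])


# A small concavity in shape A is absorbed to match one long convex segment.
shape_a = [("C", 2.0), ("V", 0.1), ("C", 2.0)]
shape_b = [("C", 4.0)]
for total_cost, links in k_best_match(shape_a, shape_b):
    print(total_cost, links)
```

In this open-sequence setting a single option per cell already yields the cheapest match; the extra options simply expose the runner-up matches. The motivation for keeping K options in the full cyclic algorithm is discussed in Section III-E.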
Fig. 2. Geometric quantities for defining the prominence of a segment.

1) Dissimilarity Cost Computation: We assign a higher cost to segments (or groups of segments) with large differences in more than one feature:

$$\text{Dissim Cost} = W \cdot \max_{\text{all features } f} \{ d_f \} \qquad (3)$$

where $f = l$, $\theta$, or $a$. We choose the max operation instead of product (as in [8]) because in the product a small cost in terms of one feature can cancel the effect of a high cost in terms of another feature, something that may lead to a visually implausible outcome. The max operation addresses this problem. Factor $W$ equals the number of features for which $d_f$ is greater than $0.75 \times \max\{d_f\}$, where $f = l, \theta, a$. Thus, if all three features have uniformly large $d_f$, then the dissimilarity cost is multiplied by three. The term $d_f$, for $f = \theta$, is defined as

$$d_\theta = \frac{|\Theta_A - \Theta_B|}{\Theta_A + \Theta_B} \qquad (4)$$

where $\Theta_A = \sum_{s=i_{w-1}+1}^{i_w} \theta_s$ and $\Theta_B = \sum_{s=j_{w-1}+1}^{j_w} \theta_s$, with $\theta_s$ being the rotation angle of the segment with index $s$ of shape $A$ and shape $B$, respectively. The term $d_f$, for $f$ being length ($l$) or area ($a$), is defined as

$$d_f = \left| \frac{f_A}{F_A} - \frac{f_B}{F_B} \right| \qquad (5)$$

where $F_A = \sum_{s=0}^{N-1} f_s$ and $f_A = \sum_{s=i_{w-1}+1}^{i_w} f_s$ for shape $A$, and similarly for $F_B$, $f_B$ of shape $B$.

2) Merging Cost Computation: Let the types of the segments being merged be $CVC \cdots VC$. The opposite case is obtained by switching $C$ and $V$. The merging cost is defined as follows:

$$\text{Merging Cost} = \max_{\text{all features } f} \{ w_f\, c_f \} \qquad (6)$$

where subscript $f$ refers to a feature (length, area or rotation angle). We choose the above maximization formula instead of the sum of products of terms comparing consecutive segments (as in [8]) for the following reasons. We use max instead of product because in a product a small cost in terms of one feature can cancel the effect of a high cost in terms of another. We abandon the sum over consecutive segments because it implies that the plausibility of merging several segments can be reduced to the similarity of consecutive segments, which may not necessarily be true. Consider, for example, the case of a short and flat segment next to a long and curved one. In this case, it is plausible to merge the two, while the merging cost in [8] will be high. Another drawback of the use of a sum is that the merging cost increases with the number of segments merged, even if several very short segments are being merged into a large one.

For $f$ being length or area:

(7) [equation garbled in the source; the surviving fragments are "$c_f = 1 -$", a sum over "C segs of group", and a sum over "all segs of shape"]

For $f$ being rotation angle:

(8) [equation garbled in the source; the surviving fragments are "$c_f = 1 -$" and sums over "C segs of group" combined with "$-$" and "$+$"]

The intuition behind these formulae is that they measure the visual prominence of the features of the absorbed segments (of type $V$) relative to the absorbing segments (of type $C$). All costs $c_f$ are within the interval [0, 2]. Cost $c_f$ is close to 0 if the convex segments visually dominate the concave ones (hence it is plausible to absorb the concave ones), while it is close to two if the concave segments visually dominate the convex ones (hence it is not plausible to perform the merge, and the merging cost should be high). For $f$ being any feature (length, area, rotation angle) the weight term is

(9) [equation garbled in the source; the surviving fragments are "$w_f = \frac{N}{2}$" and a sum over "V segs of shape"]

The intuition behind the weight term is to measure the visual prominence of the absorbed segments within the shape as a whole. Factor $N/2$ is heuristic.

D. Matching Examples

Fig. 3 illustrates segment correspondences (indicated by consecutive lines connecting the starting and ending points of the associated segments) obtained by matching hand silhouettes (left) and fish silhouettes (right). One of the shapes has been scaled to 50% of the original. Notice that the algorithm tried to come up with plausible segment associations by matching groups of segments which, in the case of the fish shapes, are due to shape detail.

Fig. 3. (a), (b) Segment associations reported by the matching algorithm on representative matches from the gestures and the marine life databases.

E. K-Option DP Search

The algorithm in [8], except that it uses the CSS representation to constrain the search for possible matches, is a special case of the above K-option DP algorithm for $K = 1$. The motivation for introducing the K-option method is that, by using only one option at each cell, the optimal path is sometimes missed. This happens because the $(n, m)$ pair chosen at $\mathrm{cell}(i, j)$ is the one which leads to the least cost path only up to $\mathrm{cell}(i, j)$. This path may not be the one that leads to the formation of a complete path. It is possible that after the entire DP table is computed, there exist only incomplete paths, in which case the algorithm fails to produce a match. Also, the best choice at $\mathrm{cell}(i, j)$ may lead to an expensive match of the remaining unmatched segments.

This is illustrated in Fig. 1, which shows the DP table for matching two shapes with eight and four segments, respectively. Cells in the table are marked as follows: (4, 0) as B, (6, 0) as C, (8, 2) as E, (9, 4) as A, and (10, 6) as D. At each cell, the number in bold represents the cost of the path up to that cell using the $K = 1$ option (dashed line). Numbers in italics represent the cost along the alternate path ($K = 2$, dotted line). Alternate path BEAD is ignored by the $K = 1$ option algorithm because of a least cost choice made at A. This problem is addressed by the K-option DP method, where the choice of a best subpath is not made at cell A but deferred until a later point when more information is available.

IV. SHAPE RETRIEVAL

In our experiments we used the following datasets.

- GESTURES (footnote 1): Consists of 980 synthetic shapes that are generated from 17 original hand gestures. We took the 17 shapes in pairs and, for each such pair, we produced a number of blended shapes by transforming one shape to the other using the shape morphing algorithm of [12]. To evaluate our method we took the 17 original shapes as queries.
- SQUID (footnote 2): Consists of 1100 shapes of marine species. We carefully selected 20 shapes from the SQUID database and we used them as queries.

Footnote 1: We have made our database available on the Internet at http://
Footnote 2: SQUID is available from

Each shape is represented by its contour. All contours are preprocessed to contain between 80 and 100 points. The fish silhouettes of the SQUID database contain much finer contour detail than the hand silhouettes of the GESTURES database.

The experiments are designed to illustrate the superiority of our shape matching algorithm over traditional methods of shape matching and retrieval. All measurements below correspond to averages over 17 and 20 queries, respectively.

A. Comparison with Other Methods

The competitors to our method are as follows.

- Fourier Descriptors [13]: This is known to be one of the most successful methods for the recognition of closed shapes. We computed the first (lower order) 20 coefficients of the Fourier transform.
- Sequential Moments [14]: This is one of the most effective moment-based methods for closed shapes.
  For each shape, a representation of four moment coefficients is computed from its bounding contour.
- Geometric Moments [15]: This is the original and the most characteristic representative of a wide class of methods based on area moments. A representation of seven moment coefficients of the shape is computed from the area it occupies.

Additional reasons for choosing these methods for our evaluation are: 1) they are translation, rotation and scale invariant (the same as our method) and 2) they all filter out shape detail so that they can detect shape similarity at a coarse view-scale. Our method has these advantages too but, in addition, matching with our method is local (as opposed to the global matching of the above methods): it reports all associations between similar segments, choosing (possibly) different scales for different parts of the shapes depending on noise or shape detail.

Fourier descriptors and sequential and geometric moments are precomputed and stored in separate files in the database along with the original contours. When a query is given, its representation is computed and it is matched with similar representations stored in the database. This typically takes less than 5 s on a SUN Ultra 1 for each data set. For our method, no precomputed information is stored. Instead, the actual shape contours are used to search the database. For this reason, our method is the slowest, requiring approximately 1 s per shape match on a 200 MHz Pentium PC. Certain optimizations that could speed up our method are possible, such as the precomputation and storage of the convex and concave segments of all shapes in the database, and the selection of a number of options $K$ that ensures an optimal tradeoff between accuracy and amount of computation.

B. Evaluation Criteria

We used human relevance judgments to compute the effectiveness of each method. Two shapes (i.e., a query and a stored shape) are considered similar if a human judges that they represent the same figure. To measure effectiveness, for each candidate method we computed:

1) Precision: defined as the percentage of similar shapes retrieved with respect to the total number of retrieved shapes.
2) Recall: defined as the percentage of similar shapes retrieved with respect to the total number of similar shapes in the database.

Because we do not have the resources to compare every query with each database shape (i.e., this would require, for each method, 17 × 980 = 16 660 visual judgments for the GESTURES dataset and 20 × 1100 = 22 000 visual judgments for the SQUID dataset), for each query we merged the answers obtained by all candidate methods and we considered this as the database which is manually inspected for relevant entries. Notice, however, that this method does not allow for absolute judgments such as "method A misses 10% of the total similar answers in the database." It provides, however, a fair basis for comparisons between methods, allowing judgments such as "method A returns 5% fewer correct answers than method B."

C. Experimental Results

Each query retrieves the best 50 shapes. For answer sets containing between one and 50 entries, we computed the average values of precision and recall. Precision and recall values are represented in precision-recall plots: The horizontal axis corresponds to recall and the vertical axis corresponds to precision. Each method is represented by a curve. Each point in such a curve is the average over 17 queries (for the GESTURES database) and 20 queries (for the SQUID database), respectively. The total number of points in each curve is 50 (i.e., we compute precision and recall for answers containing between one and 50 shapes). Therefore, the top-left point of the diagram corresponds to the precision/recall values for the best answer (best match), while the bottom-right point corresponds to the precision/recall values for the entire answer set with 50 retrieved shapes.

Fig. 4. Precision-recall diagram for the GESTURES database corresponding to (a) our shape matching algorithm, (b) Fourier descriptors, (c) sequential moments, and (d) geometric moments.

Fig. 5. Precision-recall diagram for the SQUID database corresponding to (a) our shape matching algorithm, (b) Fourier descriptors, (c) sequential moments, and (d) geometric moments.

Fig. 4 illustrates the precision-recall diagram for the GESTURES database. For small answer sets returning up to 12 shapes (corresponding to the left-most 12 points of a curve), our method and Fourier descriptors perform about equally in terms of both precision and recall. Notice that, for such small answer sets, both methods achieve precision close to one, that is, their answers are almost 100% correct. For larger answer sets, our method performs clearly better than any other method, achieving up to 25% better recall and 20% better precision than the second best method (Fourier descriptors). This result demonstrates that our method is very well suited for image retrieval, where one is (typically) interested in retrieving more than ten or 20 and up to 50 shapes.

Fig. 5 illustrates the precision-recall diagram for the SQUID database.
Our method performs clearly better than any other method, achieving up to 18% better recall and 15% better precision. An important observation is that all methods achieved lower values of precision and recall than those achieved on the GESTURES database. Presumably, all methods are sensitive to noise and contour detail, which mainly exist in the shapes of the SQUID database.

We see that our method outperforms the others on both the GESTURES and the SQUID databases. Although it is slower than its competitors, the selection of a method for shape retrieval should be based mainly on effectiveness (i.e., quality of results) rather than on efficiency (i.e., speed).

V. CONCLUSIONS

We propose a shape matching algorithm for handling shape similarity retrievals in image databases. Our algorithm is based on dynamic programming, operates implicitly at multiple scales and allows the matching of deformed shapes. We demonstrate the superiority of our approach over traditional approaches to shape matching and retrieval (Fourier descriptors, geometric and sequential moments) using two different datasets with 980 and 1100 shapes, respectively. We also introduce to the computer vision community a well-established methodology for the evaluation of the retrieval results obtained by several competing methods.
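The evaluation methodology mentioned here (Section IV-B and IV-C) can be sketched in a few lines. This is an illustration under assumptions, not the authors' evaluation code: the function names, the shape identifiers and the relevance set are hypothetical, and the set of relevant shapes is assumed to be non-empty and to come from human inspection of the pooled answers of all methods.

```python
def pooled_candidates(answers_by_method):
    """Union of the answers of all methods for one query.

    Only this pool is inspected by a human for relevant shapes.
    """
    pool = set()
    for answers in answers_by_method.values():
        pool.update(answers)
    return pool


def precision_recall_points(ranked, relevant):
    """One (precision, recall) point per answer-set size 1..len(ranked).

    ranked: retrieved shape ids, best match first.
    relevant: non-empty set of ids judged similar to the query.
    """
    points = []
    hits = 0
    for size, shape_id in enumerate(ranked, start=1):
        if shape_id in relevant:
            hits += 1
        points.append((hits / size, hits / len(relevant)))
    return points


# Hypothetical answers of two methods for a single query.
answers = {"dp": ["s1", "s4", "s2"], "fourier": ["s1", "s3", "s5"]}
pool = pooled_candidates(answers)
relevant = {"s1", "s2", "s3"}  # hypothetical human judgments over the pool
for precision, recall in precision_recall_points(answers["dp"], relevant):
    print(round(precision, 3), round(recall, 3))
```

Averaging the point for each answer-set size over all queries of a dataset would give one curve per method, as in the precision-recall diagrams. Because recall is measured against the pool rather than the whole database, the resulting values support comparisons between methods but not absolute statements.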

Current research is directed toward extending our matching algorithm for open curves. Future work includes the experimentation with more datasets and methods, and the handling of combined queries involving more than one feature (e.g., shape, color, text). Recently, we have proposed an indexing mechanism for our method which achieves up to three orders of magnitude speed-up over sequential scanning with very high accuracy, providing also for clustering visualization and browsing of a data set [9].

ACKNOWLEDGMENT

The authors would like to thank P. Elinas for his help in the experiments, Z. Rao for valuable discussions and for a shape morphing program to generate the GESTURES database, M. Roussou and P. Economopoulos for implementing the shape matching algorithm in C++, J. Baid, who introduced the K-option idea, and Prof. Mokhtarian, Centre for Vision, Speech, and Signal Processing Laboratory, University of Surrey, U.K., for providing the SQUID database for our experiments.

REFERENCES

[1] R. T. Chin and C. R. Dyer, "Model-based recognition in robot vision," ACM Comput. Surv., vol. 18, pp.
[2] B. M. Mehtre, M. S. Kankanhalli, and W. F. Lee, "Shape measures for content based image retrieval: A comparison," Inform. Process. Manage., vol. 33, pp.
[3] A. K. Jain and A. Vailaya, "Shape-based retrieval: A case study with trademark image databases," Pattern Recognit., vol. 31, pp.
[4] S. Loncaric, "A survey of shape analysis techniques," Pattern Recognit., vol. 31, pp.
[5] A. Witkin, "Scale space filtering," in Proc. 8th IJCAI, Karlsruhe, West Germany, 1983, pp.
[6] F. Mokhtarian and A. Mackworth, "Scale-based description of planar curves and two-dimensional shapes," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp.
[7] F. Mokhtarian, "Silhouette-based object recognition through curvature scale space," IEEE Trans. Pattern Anal. Machine Intell., vol. 17, May
[8] N. Ueda and S. Suzuki, "Learning visual models from shape contours using multiscale convex/concave structure matching," IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp., Apr.
[9] E. Milios and E. Petrakis, "Efficient shape matching and retrieval at multiple scales," Dept. Comput. Sci., York Univ., Toronto, Ont., Canada, Dec.
[10] J. Baid and E. Milios, "Deformed shape recognition using dynamic programming," Dept. Comput. Sci., York Univ., Toronto, Ont., Canada, Tech. Rep.
[11] E. Milios, "Shape matching using curvature processes," Comput. Vis., Graph., Image Process., vol. 47, pp.
[12] T. W. Sederberg and E. Greenwood, "A physically based approach to 2-D shape blending," Comput. Graph., vol. 26, pp.
[13] T. P. Wallace and P. A. Wintz, "An efficient three-dimensional aircraft recognition algorithm using normalized Fourier descriptors," Comput. Graph. Image Process., vol. 13, pp.
[14] L. Gupta and M. D. Srinath, "Contour sequence moments for the classification of closed planar shapes," Pattern Recognit., vol. 20, pp.
[15] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inform. Theory, vol. IT-8, pp.

Automatic Text Detection and Tracking in Digital Video

Huiping Li, David Doermann, and Omid Kia

Abstract: Text that appears in a scene or is graphically added to video can provide an important supplemental source of index information as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum of squared difference (SSD) based module to find the initial position and a contour-based module to refine the position. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.

Index Terms: Digital libraries, neural network, text detection, text tracking, video indexing.

I. INTRODUCTION

The continued proliferation of large amounts of digital video has increased demand for true content-based indexing and retrieval systems. Traditionally, content has been indexed primarily by manual annotation [1], closed caption [2], or transcribed audio [3], but some work has also been done on the content analysis of the video itself. One area where significant progress is being made is in the detection and recognition of text. Text which either appears in a scene or is graphically added to video provides an important supplemental source of index information. For example, sports scores, product names, scene locations, speaker names, movie credits, program introductions and special announcements often appear in the image text and supplement or summarize the visual content, but may not be present in the transcript. Searches can easily be refined if access to this textual content is available.

At a high level, text in digital video can be divided into two classes, scene text and graphic text. Scene text appears within the scene and is captured by the camera. Examples of scene text include street signs, billboards, text on trucks and writing on shirts. Graphic text, on the other hand, is text that is mechanically added to video frames to supplement the visual and audio content. Since it is purposefully added, it is often more structured and more closely related to the subject than scene text. In some domains such as sports, however, scene text can be used to uniquely identify objects (participants in the clip). Most related previous work has focused on the extraction of graphic text [4]-[6]. Although scene text is often difficult to detect and extract due to its virtually unlimited range of poses, sizes, shapes and colors, it is important in applications such as navigation, surveillance, video classification, or analysis of sporting events.

Text often spans tens or even hundreds of frames in digital video.
Exploiting the temporal coherence of text by tracking it is useful not

Manuscript received December 11, 1998; revised June 21, The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Hong Jiang Zhang.
H. Li and D. S. Doermann are with the Language and Media Processing Laboratory, Center for Automation Research, University of Maryland, College Park, MD, USA (huiping@car.umd.edu; doermann@car.umd.edu).
O. Kia is with the National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA.
