SHAPE RECOGNITION METHOD BASED ON THE k-NEAREST NEIGHBOR RULE

Dorina Purcaru
Faculty of Automation, Computers and Electronics, University of Craiova
13 Al. I. Cuza Street, Craiova RO-1100, ROMANIA
E-mail: dpurcaru@electronics.ucv.ro

Abstract: In artificial intelligence, each shape is represented by the vector of its characteristic features and therefore corresponds to a point in the d-dimensional space of the descriptors. An unknown shape is either rejected or identified with one or more models (known shapes learned beforehand). Shape recognition is based on the similarity between the unknown shape and each model, and the distance between the two associated vectors is used to estimate the similarity between any two shapes. The paper presents a recognition method based on the k-nearest neighbors rule. The method comprises two stages: model classification and shape identification. Only invariant descriptors are used for the model classification, which is performed at several levels and only once, in the learning stage of the recognition process. The shape identification consists of identifying the most similar class of models, identifying the nearest neighbors and identifying the most similar model. A shape is identified with a model if that shape lies inside the region of accepted tolerance of the model. The proposed method assures fast and simple shape recognition in robotics.

Key words: shape, model, descriptor, recognition, classification, identification.

1 INTRODUCTION

Each shape x can be described by its vector of characteristic features, x = [x_1 x_2 ... x_d]^T, and it represents a point X(x_1, x_2, ..., x_d) in the d-dimensional space of the descriptors. An unknown shape can be identified with one or more models (known shapes or prototypes that form the training set). The shape identification is based on the analysis of shape similarity (Purcaru 2003). The simplest and most intuitive method for shape classification and identification is the distance-based one. Many distances are presented in (Dougherty 1988, Belaïd 1992, Kunt 2000, Purcaru 1999), and the choice of the proper distance depends on several factors:
- the performance imposed on the shape recognition process;
- the similarity between the models;
- the recognition procedure.
An adequate reference distance is very difficult to find: a value that is too small sometimes causes the rejection of shapes identical with some learned models, while the shape discrimination capability decreases if the reference distance is too large. Many methods are recommended in (Belaïd 1992, Kunt 2000) for shape classification and decision: functional discrimination, automatic classification, the Bayes classification procedure, the k-nearest neighbor rule, stochastic methods, structural methods, etc.

2 THE k-NEAREST NEIGHBORS RULE

The k-nearest neighbor rule (k-NN) was introduced by Fix and Hodges in 1951. According to this nonparametric procedure, an unknown shape x is assigned to the class that contains the majority of its k nearest neighbors in the training set (Hastie 1996). The main advantages of this classification method are the following: good performance in practical applications; remarkable convergence properties when the number of models in the training set tends to infinity; and the identification of the class that contains x does not require complete statistical knowledge of the conditional density functions of each class (Denoeux 1995). If err_B is the error of the Bayes decision and err_kNN the error of the k-NN decision, then

    err_B ≤ err_kNN ≤ ... ≤ err_2NN ≤ err_1NN ≤ 2·err_B    (1)

and

    lim_{k→∞} err_kNN = err_B.    (2)
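As a concrete illustration of the basic rule (not of the method proposed later in this paper), a minimal k-NN classifier might look as follows; the Euclidean distance, the function name and the toy data are assumptions of this sketch.

```python
from collections import Counter
import numpy as np

def knn_classify(x, models, labels, k=3):
    """Assign the unknown shape x to the class holding the majority of its
    k nearest neighbors among the model feature vectors (basic k-NN rule).
    A Euclidean distance in the d-dimensional descriptor space is assumed."""
    dists = np.linalg.norm(models - x, axis=1)      # distance to every model
    nearest = np.argsort(dists)[:k]                 # indices of the k nearest models
    votes = Counter(labels[i] for i in nearest)     # class frequencies among the neighbors
    return votes.most_common(1)[0][0]               # majority class

# Illustrative use: 4 models described by d = 2 features, 2 classes.
models = np.array([[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]])
labels = ["A", "A", "B", "B"]
print(knn_classify(np.array([1.05, 1.0]), models, labels, k=3))   # -> "A"
```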
When not all classes can be assumed to be represented in the training set, it may be wise to consider that a shape lying far away from any previously observed pattern most probably belongs to an unknown class, and that shape should be rejected (Dubuisson 1993).
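One simple way to obtain such a rejection option is to refuse classification whenever even the nearest model lies beyond a reference distance; the sketch below illustrates the idea, with an arbitrary threshold value that is not taken from the paper.

```python
import numpy as np

def knn_classify_with_reject(x, models, labels, k=3, reject_dist=2.0):
    """k-NN vote with a rejection option: if the nearest model is farther than
    reject_dist, x is assumed to belong to an unknown class and is rejected."""
    dists = np.linalg.norm(models - x, axis=1)
    if dists.min() > reject_dist:                 # far from every learned model
        return None                               # reject: unknown class
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique([labels[i] for i in nearest], return_counts=True)
    return values[np.argmax(counts)]              # majority class among the k neighbors
```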

Dasarathy introduced in (Dasarathy 1980) the concept of the acceptable neighbor, i.e. a neighbor whose distance to the unknown shape (that must be classified) is smaller than a threshold value learned from the training set.

The first version of the k-NN method is considered slow for shape identification because all distances between the unknown shape and each model must be computed in the d-dimensional descriptor space. Many authors have proposed faster versions, which can be grouped in four categories (Belaïd 1992), depending on their principle: the compression of the descriptor space dimension; the paving, when the descriptor space is divided in cells and the study is limited to some cells; the sorting, when the models are separated by the axes of the descriptor space; and the hierarchy, which supposes different levels of detail in the study of the models. All these improved versions decrease the number of analyzed models and, in this way, the shape identification time decreases too. Each technique imposes first a pre-processing stage of the training set, and the efficiency of the method decreases when the dimension of the descriptor space increases.

The classification algorithm of Kim and Park, presented in (Belaïd 1992), is based on the sorting principle: all the known shapes (models) m_i = [m_{i,1} m_{i,2} ... m_{i,d}]^T, i = 1, ..., p, are sorted and form a tree with L levels of classification (L ≤ L_max ≤ d − 1). The sorting at each level is realized depending on the values of one descriptor. Let c_i, i = 1, ..., L, denote the division factor of the value domain of the descriptor used for the level i of classification; there is at least one node at this level and c_i branches leave from each such node. The algorithm of Kim and Park establishes the optimal path, in the tree of the known shapes, for finding the nearest (most similar) neighbor of an unknown shape; the maximum number of comparisons required for the shape identification is drastically reduced by this algorithm. An analysis of the maximum number of distances that must be computed for the shape identification is presented in Table 1. This number is N_C = p when the unknown shape is successively compared with each model m_i, i = 1, ..., p. The classification based on the algorithm of Kim and Park reduces the maximum number of distances computed between the unknown shape and a model to N'_C. A class division stops when that class contains at most p̄ models. N_C and N'_C are computed with the following relations:

    N_C = c_1 · c_2 · ... · c_{L_max} · p̄,    (3)
    N'_C = c_1 + c_2 + ... + c_{L_max} + p̄.    (4)

The same division factor c_i = c is considered for all classification levels in Table 1.

Table 1

    c    L_max   p̄    N'_C     N_C
    3      1     10    13        30
    3      4     10    22       810
    3      4      2    14       162
    3      5      5    20      1215
    5      3      5    20       625
    5      3      2    17       250
    10     3      2    32      2000
    10     4      2    42     20000
    10     4     10    50    100000

The algorithm of Kim and Park is more efficient (faster) when the following values increase: L_max, p̄ and/or c_i, i = 1, ..., L_max.

3 PRINCIPLE OF THE SHAPE RECOGNITION METHOD

The modeling of the known shapes is presented in (Purcaru 2003). Let us consider d descriptors: s invariant and the other (d − s) quasi-invariant to the shape localization in the sensory space. Each known shape is explored v times, in the same conditions, and v vectors of characteristic features result: m_i^t = [m_{i,1}^t m_{i,2}^t ... m_{i,d}^t]^T, t = 1, ..., v, with m_{i,j}^1 = m_{i,j}^2 = ... = m_{i,j}^v for j = 1, ..., s. One vector of characteristic features, m̄_i = [m̄_{i,1} m̄_{i,2} ... m̄_{i,d}]^T, and the associated vector of accepted tolerance, ε_i = [ε_{i,1} ε_{i,2} ... ε_{i,d}]^T, are defined in (Purcaru 2003) for each known shape (model):

    m̄_{i,j} = m_{i,j}^1                          for j = 1, ..., s
    m̄_{i,j} = (1/v) · Σ_{t=1..v} m_{i,j}^t       for j = s+1, ..., d    (5)

and

    ε_{i,j} = 0                                                             for j = 1, ..., s
    ε_{i,j} = 3 · [ (1/(v−1)) · Σ_{t=1..v} (m_{i,j}^t − m̄_{i,j})² ]^{1/2}   for j = s+1, ..., d.    (6)
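Relations (5) and (6) can be turned into a short routine that builds a model from its v exploration vectors; the helper name, the array layout and the use of the sample standard deviation (which requires at least two explorations) are assumptions of this sketch.

```python
import numpy as np

def build_model(explorations, s):
    """Build the model vector m_bar and the tolerance vector eps from the v
    feature vectors obtained by exploring the same shape v times.
    `explorations` is a (v, d) array; the first s descriptors are invariant."""
    E = np.asarray(explorations, dtype=float)
    m_bar = E.mean(axis=0)                  # relation (5) for the quasi-invariant descriptors
    m_bar[:s] = E[0, :s]                    # invariant descriptors: identical in every exploration
    eps = 3.0 * E.std(axis=0, ddof=1)       # relation (6): three times the sample standard deviation
    eps[:s] = 0.0                           # no tolerance on the invariant descriptors
    return m_bar, eps
```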
In conclusion, around the point M_i(m̄_{i,1}, m̄_{i,2}, ..., m̄_{i,d}) corresponding to the model m̄_i there is a region of accepted tolerance R_{at,i} in the d-dimensional descriptor space:

    R_{at,i} = { x : |x_j − m̄_{i,j}| ≤ ε_{i,j}, j = 1, ..., d }.    (7)

If an unknown shape x is inside R_{at,i}, that shape can be identified with the model m̄_i. A shape x with

    |x_j − m̄_{i,j}| ≤ ε_{i,j},  1 ≤ j ≤ d,    (8)

is considered similar with the model m̄_i according to descriptor j. All models that satisfy this condition form the nearest neighbors of x according to descriptor j.
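The membership tests of relations (7) and (8) reduce to component-wise comparisons; a possible sketch, with assumed names, follows.

```python
import numpy as np

def inside_tolerance_region(x, m_bar, eps):
    """Relation (7): x can be identified with the model only if every descriptor
    of x lies within the accepted tolerance around the model vector m_bar."""
    return bool(np.all(np.abs(np.asarray(x, dtype=float) - m_bar) <= eps))

def similar_on_descriptor(x, m_bar, eps, j):
    """Relation (8): similarity with the model according to descriptor j only."""
    return abs(x[j] - m_bar[j]) <= eps[j]
```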

The shape recognition method presented in this paper comprises model classification and shape identification. The model classification is realized only once, in the learning stage of the recognition process; this model sorting is realized at several levels, according to the invariant descriptors. The shape identification is realized in three stages:
- identification of the most similar class;
- identification of the nearest neighbors;
- identification of the most similar model.

3.1 Model Classification

The invariant descriptors are used to classify the models, at h levels (1 ≤ h ≤ s) of classification.

Level 1. The models are classified depending on the values m̄_{i,1}, i = 1, ..., p. Each distinct value of the first descriptor generates a class C_i, i = 1, ..., c_1, with N(C_i) models. The classification stops, for the models contained by C_i, if N(C_i) ≤ p̄, 1 ≤ i ≤ c_1; otherwise the classification continues.

Level 2. Each class C_i with N(C_i) > p̄ is divided into c_{i,2} classes C_{i,j}, j = 1, ..., c_{i,2}, depending on the values of the second descriptor for the models contained by C_i; each distinct value of the second descriptor generates a class C_{i,j}, with N(C_{i,j}) models. The classification stops, for the models contained by C_{i,j}, if N(C_{i,j}) ≤ p̄, 1 ≤ j ≤ c_{i,2}; otherwise, the model classification continues.

The model classification stops when a) each group of shapes obtained at the last level of classification contains at most p̄ models, or b) the procedure passes through all s levels of classification. At the level h of classification, 1 ≤ h ≤ s, each class is identified by a characteristic value of the descriptor h; all models contained by that class have the same value of this descriptor. The models of a class obtained at level ℓ of classification are therefore similar with respect to the first ℓ invariant descriptors.

3.2 Shape Identification

The class of the models most similar with x is established first. The nearest neighbors are then identified based on the quasi-invariant descriptors. The unknown shape can finally be identified with the most similar model, which represents the nearest neighbor of x.

Identification of the most similar class. The class C that contains the models most similar with x is identified in several stages. The value x_1 establishes the class C_i (with the characteristic value m̄_{i,1} = x_1). The value x_2 establishes the class C_{i,j} (with the characteristic value x_2 on the second descriptor) if N(C_i) > p̄. The class identification stops when all values x_1, x_2, ..., x_s have been used or the last resulting class contains at most p̄ models. If N(C) > p̄, the procedure continues with the identification of the nearest neighbors; otherwise, it continues with the identification of the most similar model.

Identification of the nearest neighbors. Among the models of class C, the k nearest neighbors are identified in several levels (stages), using the quasi-invariant descriptors. At each level i of identification, the values x_{s+i} and m̄_{ℓ,s+i} (for the models m̄_ℓ contained by the studied class) are represented on the axis of the descriptor (s + i), and the similarity between x and each model is studied according to this descriptor. The reference value

    ε_{M,s+i} = max_ℓ ε_{ℓ,s+i}    (9)

is computed at each level i, over the analyzed models. At the level i, the study starts with the nearest model and stops when there is no model left on the explored semiaxis or the condition

    |x_{s+i} − m̄_{ℓ,s+i}| > ε_{M,s+i}    (10)

is satisfied. The k_i nearest neighbors (established at the level i of identification) are the models similar with x according to descriptors 1, 2, ..., s + i. Such a model m̄_ℓ satisfies the condition

    |x_j − m̄_{ℓ,j}| ≤ ε_{ℓ,j},  j = 1, ..., s + i,    (11)

and these models form the class G_i, with N(G_i) = k_i. The neighbor identification stops if k_i ≤ p̄; otherwise it continues at the level (i + 1). The group obtained at the last neighbor identification level is denoted G, with N(G) = k. The k nearest neighbors are the models denoted n_i = [n_{i,1} n_{i,2} ... n_{i,d}]^T, i = 1, ..., k.
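As a rough illustration of this level-by-level search, the sketch below keeps, at each level, the models whose value on the current quasi-invariant descriptor lies within the reference tolerance ε_M of relations (9)-(10); the per-model condition (12) is checked afterwards, in the last stage. The function name, the data layout (one row per model of the class C, plus the matching tolerance rows) and this reading of the stopping rule are assumptions of the sketch, not taken verbatim from the paper.

```python
import numpy as np

def identify_neighbors(x, class_models, class_tols, s, p_bar):
    """Level-by-level nearest-neighbor identification inside the most similar
    class C, using the quasi-invariant descriptors s+1, ..., d.
    Returns the row indices of the k nearest neighbors."""
    x = np.asarray(x, dtype=float)
    M = np.asarray(class_models, dtype=float)   # model vectors of class C (one row each)
    T = np.asarray(class_tols, dtype=float)     # the associated tolerance vectors
    keep = np.arange(M.shape[0])                # candidates still under study
    for j in range(s, x.size):                  # one identification level per quasi-invariant descriptor
        if keep.size <= p_bar:                  # stopping rule: k_i <= p_bar
            break
        eps_M = T[keep, j].max()                # reference tolerance, relation (9)
        keep = keep[np.abs(M[keep, j] - x[j]) <= eps_M]   # relations (10)-(11)
    return keep
```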
Identification of the most similar model. The unknown shape x can be identified with the model n_i if

    |n_{i,j} − x_j| ≤ ε_j(n_i),  1 ≤ i ≤ k,  j = 1, ..., d,    (12)

where ε(n_i) is the vector of accepted tolerance associated with the model n_i. The condition (12) is already verified for some descriptors at the previous levels of identification.
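Continuing the same sketch, the final stage is a direct application of relation (12) to the retained neighbors (names again assumed, not the paper's own).

```python
import numpy as np

def identify_most_similar(x, models, tols, neighbor_idx):
    """Relation (12): x is identified with a neighbor whose vector of accepted
    tolerances covers x on every descriptor; None means the shape is rejected."""
    x = np.asarray(x, dtype=float)
    for i in neighbor_idx:
        if np.all(np.abs(x - models[i]) <= tols[i]):
            return i
    return None
```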

4 EXPERIMENTAL RESULTS

Many shapes were recognized using the method presented in this paper. For example, the models are 60 square shapes (p = 60) with or without square holes; a model can contain at most 6 holes. Let S and S_Ht denote the side of the exterior square and the side of the hole t, respectively, and S̄ and S̄_Ht the relative (dimensionless) values of S and S_Ht. The model distribution, depending on N_H, S̄ and S̄_Ht, is presented in Tables 1, 2 and 3: Table 1 covers the models with N_H = 0 and 1 holes, Table 2 the models with N_H = 2 and 3 holes, and Table 3 the models with N_H = 4, 5 and 6 holes.

The selected descriptors are the following: the number of holes, m̄_{i,1} = N_H; the relative area of the shape limited by the exterior outline, m̄_{i,2} = Ā_E; and the relative area of each hole contained by the explored shape, m̄_{i,2+t} = Ā_{Ht}, t = 1, ..., 6. The first descriptor is invariant to the shape localization in the sensory space, and m̄_{i,2}, ..., m̄_{i,8} are quasi-invariant descriptors; so d = 8 and s = 1. A model m̄_i with N_H holes has zero values for the last (6 − N_H) descriptors. The nominal values m̄_{i,2} and the associated accepted tolerances ε_{i,2} are specified in Table 4 for the models without holes.

Table 4

    i    m̄_{i,2}   ε_{i,2}
    1      2.2      1.265
    2      8.9      2.625
    3     16        1.999
    4     24.7      2.469
    5     48.7      5.109
    6     63.6      4.05
    7     81.2      4.17
    8    103        4.527
    9    121        3.162

The unknown shape x = [1; 103; 53.5; 0; 0; 0; 0; 0]^T must be recognized. The model classification depends only on N_H; the resulting classes are C_i, i = 1, ..., 7. The number of models contained by each class is specified in Table 5.

Table 5

    i        1    2    3    4    5    6    7
    N(C_i)   9   23   10    7    6    3    2

Let us consider p̄ = 3. Because x_1 = 1, the class most similar with x is C_2, which contains 23 models. The identification of the nearest neighbors starts with the study of similarity according to the second descriptor, Ā_E. The models of the class C_2 have only 6 distinct values of Ā_E (24.7; 48.7; 63.6; 81.2; 103; 121) and the maximum value of their accepted tolerance is ε_{M,2} = 5.109. Only one value of Ā_E satisfies the condition (11) because x_2 = 103; so all 5 models with Ā_E = 103 form the k_1 nearest neighbors of x. The next level of neighbor identification studies the similarity according to the third descriptor, Ā_{H1}. The k_1 nearest neighbors have the following distinct values of Ā_{H1}: 2.2; 8.9; 16; 24.7; 48.7. Only the last value satisfies the condition (11) because x_3 = 53.5 and ε_{M,3} = 5.109. The shape x is identified with the model m̄_26 = [1; 103; 48.7; 0; 0; 0; 0; 0]^T because the condition (12) is satisfied.
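The two identification levels of this example can be replayed with the values quoted above; the short check below reproduces the filtering at each level.

```python
# Level 1: similarity according to A_E (second descriptor); eps_M = 5.109, x_2 = 103.
a_e_values = [24.7, 48.7, 63.6, 81.2, 103, 121]
level1 = [a for a in a_e_values if abs(a - 103) <= 5.109]
print(level1)                      # [103] -> the 5 models with A_E = 103 remain

# Level 2: similarity according to A_H1 (third descriptor); eps_M = 5.109, x_3 = 53.5.
a_h1_values = [2.2, 8.9, 16, 24.7, 48.7]
level2 = [a for a in a_h1_values if abs(a - 53.5) <= 5.109]
print(level2)                      # [48.7] -> x is identified with the model m_26
```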

5 CONCLUSION

This paper presents a method for shape recognition based on the identification of the nearest neighbors of the shape that must be recognized. Invariant and quasi-invariant characteristic features assure the description of each analyzed shape. The recognition method comprises the classification of the models and the shape identification. The method has the following main advantages: it assures simple and fast recognition and good performance in practical applications; it establishes the models most similar with the unknown shape, at different levels of similarity; it is adequate for known shapes with spreads in their descriptor values; and the unknown shape must be explored only once, regardless of its location in the sensory space.

REFERENCES

Belaïd, A. and Belaïd, Y., Reconnaissance des formes. Méthodes et applications, InterEditions, 1992.
Dasarathy, B.V., Nosing around the neighborhood: A new system structure and classification rule for recognition in partially exposed environments, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 2, No. 1, 1980, pp. 67-71.
Denoeux, T., A k-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE Transactions on Systems, Man and Cybernetics, Vol. 25, No. 5, 1995, pp. 804-813.
Dougherty, E., Mathematical Methods for Artificial Intelligence and Autonomous Systems, Prentice-Hall International Editions, 1988.
Dubuisson, B. and Masson, M., A statistical decision rule with incomplete knowledge about classes, Pattern Recognition, Vol. 26, No. 1, 1993, pp. 155-165.
Hastie, T. and Tibshirani, R., Discriminant adaptive nearest neighbor classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 6, 1996, pp. 607-615.
Kunt, M., Coray, G. et al., Traitement de l'information, Vol. 3: Reconnaissance des formes et analyse de scènes, Presses Polytechniques et Universitaires Romandes, 2000.
Purcaru, D.M. and Iordache, S., 2D Shape Recognition Using Maximum Weighted Distance, in: Advances in Intelligent Systems and Computer Science, WSES Press (Greece), 1999, pp. 69-72.
Purcaru, D.M., On the Estimation of Shape Similarity in Robotics, in: Computational Methods in Circuits and Systems Applications, WSES Press (Greece), 2003, pp. 120-125.