An efficient weighted nearest neighbour classifier using vertical data representation


Int. J. Business Intelligence and Data Mining, Vol. x, No. x, xxxx

An efficient weighted nearest neighbour classifier using vertical data representation

William Perrizo
Department of Computer Science, North Dakota State University, P.O. Box 5164, Fargo, ND, USA
E-mail: william.perrizo@ndsu.edu

Qin Ding*
Department of Computer Science, Pennsylvania State University at Harrisburg, Middletown, PA 17057, USA
Fax: E-mail: qding@psu.edu
*Corresponding author

Maleq Khan
Department of Computer Science, Purdue University, 250 N. University St., West Lafayette, IN 47907, USA
E-mail: mmkhan@cs.purdue.edu

Anne Denton
Department of Computer Science and Operations Research, North Dakota State University, P.O. Box 5164, Fargo, ND, USA
E-mail: anne.denton@ndsu.edu

Qiang Ding
Jiangsu Telecom Co., Ltd., Huan Cheng Nan Lu #88, Nantong, Jiangsu, China
E-mail: qding74@gmail.com

Abstract: The k-nearest neighbour (KNN) technique is a simple yet effective method for classification. In this paper, we propose an efficient weighted nearest neighbour classification algorithm, called PINE, using vertical data representation. A metric called HOBBit is used as the distance metric. The PINE algorithm applies a Gaussian podium function to set weights to different neighbours. We compare PINE with classical KNN methods using horizontal and vertical representation with different distance metrics. The experimental results show that PINE outperforms other KNN methods in terms of classification accuracy and running time.

Copyright 200x Inderscience Enterprises Ltd.

Keywords: nearest neighbours; classification; data mining; vertical data; spatial data.

Reference to this paper should be made as follows: Perrizo, W., Ding, Q., Khan, M., Denton, A. and Ding, Q. (xxxx) 'An efficient weighted nearest neighbour classifier using vertical data representation', Int. J. Business Intelligence and Data Mining, Vol. x, No. x, pp.xxx-xxx.

Biographical notes: William Perrizo is a Professor of Computer Science at North Dakota State University. He holds a PhD Degree from the University of Minnesota, a Master's Degree from the University of Wisconsin and a Bachelor's Degree from St. John's University. He has been a Research Scientist at the IBM Advanced Business Systems Division and the USA Air Force Electronic Systems Division. His areas of expertise are data mining, knowledge discovery, database systems, distributed database systems, high-speed computer and communications networks, precision agriculture and bioinformatics. He is a member of ISCA, ACM, IEEE, IAAA and AAAS.

Qin Ding is an Assistant Professor in Computer Science at Pennsylvania State University at Harrisburg, USA. She received her PhD in Computer Science from North Dakota State University, USA, and her MS and BS in Computer Science from Nanjing University, China. Her research interests include data mining, databases, bioinformatics and spatial data.

Maleq Khan received his BS in Computer Science and Engineering from Bangladesh University of Engineering and Technology, Bangladesh, and his MS in Computer Science from North Dakota State University, ND, USA. He is currently working toward the PhD Degree in Computer Science at Purdue University, IN, USA. His research interests include distributed algorithms, communication networks, wireless sensor networks, bioinformatics and data mining.

Anne Denton is an Assistant Professor in Computer Science at North Dakota State University, USA. Her research interests are in data mining, knowledge discovery in scientific data and bioinformatics.
Specific interests include data mining of diverse data, in which objects are characterised by a variety of properties such as numerical and categorical attributes, graphs, sequences, time-dependent attributes and others. She received her PhD in Physics from the University of Mainz, Germany and her MS in Computer Science from North Dakota State University, Fargo, ND, USA.

Qiang Ding received his MS and PhD in Computer Science from North Dakota State University, USA, and his BS in Computer Science from Nanjing University, China. He was an Assistant Professor in Computer Science at Concordia College, MN, USA, before he joined Jiangsu Telecom Co., Ltd., China. His research interests include databases and data mining.

1 Introduction

The nearest neighbour technique (Cover and Hart, 1967; Cover, 1968; Dasarathy, 1991; Duda and Hart, 1973; McLachlan, 1992; Safar, 2005) is a simple yet effective method for classification. Given a set of training data, a KNN classifier predicts the class value for a new sample X by searching the training set for the KNNs of X based on a certain distance

metric and then assigning to X the most frequent class occurring in its KNNs. The value of k, i.e., the number of neighbours, is pre-specified. Nearest neighbour classifiers are lazy learners in that a classifier is not built until a new sample needs to be classified. They differ from eager classifiers, such as decision tree induction (Breiman et al., 1984; Quinlan, 1993; Jeong and Lee, 2005; James, 1985; Fu, 2005).

In classical KNN methods, data are represented in horizontal format. Each training observation is a tuple that consists of n features (also called attributes). Each time a new sample needs to be classified, the entire training data set is scanned to find the KNNs of this new sample. The database scan can be costly if the training data set is extremely large.

In this paper, we propose a nearest neighbour algorithm, called Podium Incremental Neighbour Evaluator (PINE), using vertical data representation. Vertical data representation can usually provide a high compression ratio for large datasets. Vertical representation is particularly suitable for spatial data, not only because a spatial data set is typically large, but also because the continuity of a spatial neighbourhood provides potential for an even higher compression ratio using vertical representation.

In most KNN methods, each of the KNNs casts an equal vote for the class of the new sample. In the PINE algorithm, we apply a Gaussian podium function to assign different weights to different neighbours. The podium function is chosen in such a way that the neighbours close to the sample have higher impact than others in determining the class of the sample. In addition, all the samples in the database are considered neighbours, though some samples have very low or even zero weights. By doing so, there is no need to pre-specify the k value and, more importantly, high classification accuracy is achieved.

The idea of using weights in KNN is not new.
It is common to assign weights to different features in the distance metrics, since some features may be more relevant than others in terms of closeness or distance (Fu and Wang, 2005). Weights can also be assigned to the instances. Cost and Salzberg presented a method of attaching weights to different instances for learning with symbolic features (Cost and Salzberg, 1993). Some instances are more reliable and are considered exemplars. These reliable exemplars are given smaller weights to make them appear closer to a new sample. In our work, based on the distances from the new sample, we partition the entire training set into different groups (called 'rings') and then assign a weight to each ring using a podium function. The podium function establishes a riser height for each step of the podium weighting function as the distance from the sample grows. The idea of a podium function is similar to the concept of a radial basis function (Neifeld and Psaltis, 1993; Ali and Smith, 2005).

In this paper, we use a vertical data representation called the P-tree¹ (Perrizo et al., 2001a; Ding et al., 2002). P-trees are a compact and data-mining-ready structure, which provides a lossless representation of the original horizontal data. P-trees represent bit information that is obtained from the data through a separation into bit planes. The multi-level structure is used to achieve high compression. A consistent multi-level structure is maintained across all bit planes of all attributes. This is done so that a simple multi-way logical AND operation can be used to reconstruct count information for any attribute value or tuple.

Different metrics can be defined for the closeness of two data points. In this paper, we use a metric called High Order Basic Bit similarity (HOBBit) (Khan et al., 2002). The HOBBit distance metric is particularly suitable for spatial data.

We run the PINE algorithm on very large real image datasets. Our experimental results show that it is much faster to build a KNN classifier using vertical data representation than using horizontal data representation for very large datasets. They also show that the PINE algorithm achieves higher classification accuracy than other KNN methods.

The rest of the paper is organised as follows. In Section 2, we review the vertical P-tree structure. In Section 3, we introduce the HOBBit metric and detail our PINE algorithm. Performance analysis is given in Section 4. Section 5 concludes the paper.

2 The vertical P-tree structure review

We use a vertical structure, called a Peano Count Tree (P-tree), to represent the data. We first split each attribute value into bits. For example, for image data, each band is an attribute and the value is represented as a byte (8 bits). The file for each individual bit is called a bSQ file. For an attribute with m-bit values, we have m bSQ files. We organise each bSQ bit file, B_ij (the file constructed from the jth bits of the ith attribute), into a P-tree. An example of an 8 x 8 bSQ file and its P-tree is given in Figure 1.

Figure 1 P-tree and PM-tree

In this example, 55 is the count of 1's in the bSQ file, called the root count. The numbers at the next level, 16, 8, 15 and 16, are the 1-bit counts for the four major quadrants. Since the first and last quadrants are made up entirely of 1-bits and 0-bits respectively (called a pure1 and a pure0 quadrant respectively), we do not need sub-trees for these two quadrants. This pattern is continued recursively. Recursive raster ordering is called Peano or Z-ordering in the literature, hence the name P-trees. The process will eventually terminate at the leaf level where each quadrant is a 1-row-1-column quadrant.

For each band, assuming 8-bit data values, we get 8 basic P-trees, one for each bit position. For band B_i we will label the basic P-trees P_{i,1}, P_{i,2}, ..., P_{i,8}, where P_{i,j} is a lossless representation of the jth bit of the values from the ith band.
However, the P_{i,j} provide much more information and are structured to facilitate many important data mining processes.

For efficient implementation, we use a variation of P-trees, called the PM-tree (Pure Mask tree), in which masks instead of counts are used. In the PM-tree, 3-value logic is used, i.e., 11 represents a pure1 quadrant, 00 represents a pure0 quadrant and 01 represents a mixed quadrant. To simplify, we use 1 for pure1, 0 for pure0 and m for mixed. This is illustrated in the third part of Figure 1.
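As an illustration, building a PM-tree from one bit plane can be sketched as follows. This is a simplified model, not the authors' implementation: quadrants are recursed in the order upper-left, upper-right, lower-left, lower-right, and the function names are ours.

```python
# Minimal PM-tree sketch over one bit plane (bSQ file) of a square image.
# A node is '1' (pure1 quadrant), '0' (pure0 quadrant), or ('m', children)
# for a mixed quadrant with four sub-quadrants.

def build_pm_tree(bits):
    """bits: square list-of-lists of 0/1 values (side a power of two)."""
    n = len(bits)
    flat = [b for row in bits for b in row]
    if all(b == 1 for b in flat):
        return '1'
    if all(b == 0 for b in flat):
        return '0'
    h = n // 2
    quads = [[row[:h] for row in bits[:h]],   # upper-left
             [row[h:] for row in bits[:h]],   # upper-right
             [row[:h] for row in bits[h:]],   # lower-left
             [row[h:] for row in bits[h:]]]   # lower-right
    return ('m', [build_pm_tree(q) for q in quads])

def root_count(tree, size):
    """1-bit count represented by a (sub)tree covering size x size cells."""
    if tree == '1':
        return size * size
    if tree == '0':
        return 0
    return sum(root_count(child, size // 2) for child in tree[1])
```

For an 8 x 8 plane whose upper-left quadrant is all 1-bits, the root is mixed, the first child is pure1, the others are pure0, and the root count is 16.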

P-tree algebra contains the operators AND, OR, NOT and XOR, which are the pixel-by-pixel logical operations on P-trees (Ding et al., 2002). The NOT operation is a straightforward translation of each count to its quadrant-complement. The AND and OR operations are shown in Figure 2.

Figure 2 P-tree algebra (AND and OR)

The basic P-trees can be combined using simple logical operations to produce P-trees for the original values (at any level of precision: 1-bit precision, 2-bit precision, etc.). We let P_{b,v} denote the P-tree for attribute b and value v, where v can be expressed in any bit precision. Using 8-bit precision for values, P_{b,v} can be constructed from the basic P-trees as:

P_{b,v} = P*_{b,1} AND P*_{b,2} AND ... AND P*_{b,8}

where P*_{b,j} = P_{b,j} if the jth bit of v is 1, and P*_{b,j} = P'_{b,j} otherwise; ' indicates the NOT operation. The AND operation is simply the pixel-wise AND of the bits.

Similarly, data in the relational format can also be represented as P-trees. For any combination of values (v_1, v_2, ..., v_n), where v_i is from attribute i, the quadrant-wise count of occurrences of this combination of values is given by:

P_{(v_1, v_2, ..., v_n)} = P_{1,v_1} AND P_{2,v_2} AND ... AND P_{n,v_n}

P-trees are implemented in an efficient way, which facilitates fast P-tree operations and considerable saving of space. More details can be found in Ding et al. (2002) and Perrizo et al. (2001b).
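The value P-tree construction above amounts to an AND over complemented or uncomplemented bit planes. A sketch of that logic on uncompressed bit planes (Python integers standing in for P-trees, one bit per tuple; names are ours, not the paper's API):

```python
# Value "P-tree" on flat bit planes: AND the plane for each bit position,
# complementing (NOT) the planes where the target value has a 0-bit.

def value_ptree(planes, v, n_tuples):
    """planes[j]: int whose bit t is the jth bit (j = 0 is the most
    significant) of tuple t's 8-bit attribute value. Returns the bit
    vector of tuples whose attribute value equals v exactly."""
    mask = (1 << n_tuples) - 1
    result = mask
    for j in range(8):
        bit = (v >> (7 - j)) & 1
        plane = planes[j] if bit == 1 else (~planes[j] & mask)  # NOT for 0-bits
        result &= plane                                          # P-tree AND
    return result

def root_count(bitvec):
    """Number of qualifying tuples (the root count of the result)."""
    return bin(bitvec).count('1')
```

For example, with attribute values [105, 105, 96, 255] over four tuples, the root count of the value P-tree for 105 is 2.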

3 Distance-weighted nearest neighbour classification using vertical data representation

In classical KNN classification techniques, there is a limit placed on the number of neighbours. In our distance-weighted neighbour classification approach, the podium or distance weighting function establishes a riser height for each step of the podium weighting function as the distance from the sample grows. This approach gives users maximum flexibility in choosing just the right level of influence for each training sample in the entire training set. The real question is: can this level of flexibility be offered without imposing a severe penalty with respect to the speed of the classifier? Traditionally, sub-sampling, neighbour-limiting and other restrictions are introduced precisely to ensure that the algorithm will finish its classification in reasonable time. The use of the compressed, data-mining-ready data structure, the P-tree, in fact makes PINE even faster than traditional methods. This is critically important in classification, since data are typically never discarded and therefore the training set will grow without bound. The classification technique must scale well or it will quickly become unusable in this setting. PINE scales well, since its accuracy increases as the training set grows while its speed remains very reasonable (see the performance study below). Furthermore, since PINE is lazy (it does not require a training phase in which a closed-form classifier is pre-built), it does not incur the expensive delays required for rebuilding a classifier when new training data arrive. Thus, PINE gives us a faster and more accurate classifier.

Before explaining how the distance-weighted neighbour classifier PINE using P-trees works, we give an overview of the technique of KNN classification and illustrate how to calculate KNN using P-trees without considering weights. In KNN, the basic idea is that the tuples that most likely belong to the same class are those that are similar in the other attributes. This continuity assumption is consistent with the properties of a spatial neighbourhood.
Based on some pre-selected distance metric or similarity measure, such as Euclidean distance, classical KNN finds the k most similar or nearest training samples to an unclassified sample and assigns the plurality class of those k samples to the new sample (Dasarathy, 1991; Neifeld and Psaltis, 1993). The value for k is pre-selected by the user based on the accuracy required (usually, the larger the value of k, the more accurate the classifier) and the delay time acceptable for classifying with that k-value (usually, the larger the value of k, the slower the classifier). The steps of the classification process are:

- determine a suitable distance metric
- find the KNNs using the selected distance metric
- find the plurality class of the KNNs (voting on the class labels of the KNNs)
- assign that class to the sample to be classified.

We use a distance metric called HOBBit, which provides an efficient method of computation based on P-trees. Instead of examining individual training samples to find the nearest neighbours, we start our initial neighbourhood of the target sample within a specified distance in the feature space based on this metric, and then successively expand the neighbourhood area until there are at least k tuples in the neighbourhood set.

Of course, there may be more boundary neighbours equidistant from the sample than are necessary to complete the KNN set, in which case one can either use the larger set or arbitrarily ignore some of them. Traditional methods find the exact KNN set, since that is easiest using traditional techniques, although it is clear that allowing some samples at that distance to vote and not others will skew the result. Instead, the P-tree-based KNN approach builds a closed nearest neighbour set (closed-KNN) (Khan et al., 2002), that is, we include all of the boundary neighbours. The inductive definition of the closed-KNN set is given below:

a If x is in KNN, then x is in closed-KNN.
b If x is in closed-KNN and d(T, y) <= d(T, x), then y is in closed-KNN, where d(T, x) is the distance of x from the target T.
c Closed-KNN does not contain any tuple which cannot be produced by steps (a) and (b).

Experimental results show that closed-KNN yields higher classification accuracy than KNN does. If there are many tuples on the boundary, inclusion of some but not all of them skews the voting mechanism. The P-tree implementation requires no extra computation to find the closed-KNN. Our neighbourhood expansion mechanism automatically includes the entire boundary of the neighbourhood. P-tree algorithms avoid the examination of individual data points, which improves the classification efficiency.

3.1 Higher Order Basic bit (HOBBit) distance

Using a suitable distance or similarity metric is very important in KNN methods because it can strongly affect performance. There is no metric that works best for all cases. Research has been done to find the right metric for the right problem (Short and Fukunaga, 1980, 1981; Friedman, 1994; Myles and Hand, 1990; Hastie and Tibshirani, 1996; Domeniconi et al., 2002). The most commonly used distance metric is the Euclidean metric.
For two data points, X = <x_1, x_2, x_3, ..., x_{n-1}> and Y = <y_1, y_2, y_3, ..., y_{n-1}>, the Euclidean similarity function is defined as

\[ d_2(X, Y) = \sqrt{\sum_{i=1}^{n-1} (x_i - y_i)^2}. \]

It can be generalised to the Minkowski similarity function,

\[ d_q(X, Y) = \left( \sum_{i=1}^{n-1} w_i \, |x_i - y_i|^q \right)^{1/q}. \]

If q = 2, this gives the Euclidean function. If q = 1, it gives the Manhattan distance, which is

\[ d_1(X, Y) = \sum_{i=1}^{n-1} |x_i - y_i|. \]

If q = infinity, it gives the max function

\[ d_\infty(X, Y) = \max_{i=1}^{n-1} |x_i - y_i|. \]

In this paper we use the HOBBit distance metric (Khan et al., 2002), which is a natural choice for spatial data. The HOBBit metric measures distance based on the most significant consecutive bit positions starting from the left (the highest order bit). Similarity, or closeness, is of interest. When comparing two values bitwise from left to right, once a difference is found, any further comparisons are not needed. The HOBBit similarity between two integers A and B is defined by

\[ S_H(A, B) = \max \{ s \mid 0 \le i \le s \Rightarrow a_i = b_i \} \tag{1} \]

where a_i and b_i are the ith bits of A and B respectively, counting from the left. The HOBBit distance between two tuples X and Y is defined by

\[ d_H(X, Y) = \max_{i=1}^{n-1} \{ m - S_H(x_i, y_i) \} \tag{2} \]

where m is the number of bits in the binary representations of the values; n - 1 is the number of attributes used for measuring distance (the nth being the class attribute); and x_i and y_i are the ith attributes of tuples X and Y. The HOBBit distance between two tuples is a function of the least similar pairing of attribute values in them. From our experiments, we found that the HOBBit distance is more suitable for spatial data than other distance metrics.

3.2 Closed-KNN

To find the closed-KNN set, first we look for the tuples which are identical to the target tuple in all 8 bits of all bands, i.e., the tuples X having distance from the target T, d_H(X, T) = 0. If, for instance, t_1 = 105 (01101001_b = 105_d) is a target attribute value, the initial interval of interest is [105, 105] ([01101001, 01101001]). If over all tuples the number of matches is less than k, we compare attributes on the basis of the seven most significant bits, not caring about the 8th bit. The expanded interval of interest would be [104, 105] ([01101000, 01101001]). If k matches still have not been found, removing one more bit from the right gives the interval [104, 107] ([01101000, 01101011]). Continuing to remove bits from the right, we get the intervals [104, 111], then [96, 111] and so on. This process is implemented using P-trees as follows.
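Before turning to the P-tree implementation, the HOBBit definitions in equations (1) and (2) and the neighbourhood expansion just described can be sketched on plain Python integers. This is an illustration on flat arrays, not the P-tree version; function names are ours.

```python
M = 8  # bits per attribute value

def hobbit_similarity(a, b):
    """Equation (1): number of matching bits from the most significant
    position down, stopping at the first disagreement."""
    s = 0
    for i in range(M - 1, -1, -1):      # high-order bit first
        if ((a >> i) & 1) != ((b >> i) & 1):
            break
        s += 1
    return s

def hobbit_distance(x, y):
    """Equation (2): the distance is driven by the least similar
    pairing of attribute values in the two tuples."""
    return max(M - hobbit_similarity(a, b) for a, b in zip(x, y))

def closed_knn(train, target, k):
    """Neighbourhood expansion: require a match in the j highest order
    bits of every attribute, lowering j from M until at least k tuples
    qualify. Returns (j, indices of the closed-KNN set)."""
    for j in range(M, -1, -1):
        members = [idx for idx, t in enumerate(train)
                   if all((a >> (M - j)) == (b >> (M - j))
                          for a, b in zip(t, target))]
        if len(members) >= k:
            return j, members
    return 0, list(range(len(train)))
```

For the target value 105, lowering j from 8 to 7 widens the interval of interest from [105, 105] to [104, 105], exactly as in the worked example above.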
P_{i,j} is the basic P-tree for bit j of band i, and P'_{i,j} is the complement of P_{i,j}. Let b_{i,j} be the jth bit of the ith band of the target tuple, and for implementation purposes let the representation of the P-tree depend on the value of b_{i,j}. Define:

Pt_{i,j} = P_{i,j} if b_{i,j} = 1,
        = P'_{i,j} otherwise.

Then the root count of Pt_{i,j} is the number of tuples in the training dataset having the same value as the jth bit of the ith band of the target tuple. Define:

\[ Pv_{i,1\text{-}j} = Pt_{i,1} \,\&\, Pt_{i,2} \,\&\, Pt_{i,3} \,\&\, \cdots \,\&\, Pt_{i,j} \tag{3} \]

where & is the P-tree AND operator and n is the number of bands. Pv_{i,1-j} counts the tuples having the same bit values as the target tuple in the higher order j bits of the ith band. Then a neighbourhood P-tree can be formed as follows:

\[ Pnn(j) = Pv_{1,1\text{-}j} \,\&\, Pv_{2,1\text{-}j} \,\&\, Pv_{3,1\text{-}j} \,\&\, \cdots \,\&\, Pv_{n-1,1\text{-}j} \tag{4} \]

We calculate the initial neighbourhood P-tree, Pnn(8), matching exactly in all bands, considering 8-bit values. Then we calculate Pnn(7), matching in the seven higher order bits; then Pnn(6) and so on. We continue as long as the root count of Pnn(j) is less than k. Let us denote the final Pnn(j) by Pcnn. Pcnn represents the closed-KNN set, and the root count of Pcnn is the number of the nearest tuples. A bit of one in Pcnn for a tuple means that the tuple is in the closed-KNN set.

For the purpose of classification, we do not need to consider all bits in the class band. If the class band is 8 bits long, there are 256 possible classes. Instead, we partition the class band values into fewer groups, say 8, by truncating the 5 least significant bits. The 8 classes are 0, 1, 2, ..., 7. Using the leftmost 3 bits we construct the value P-trees P_n(0), P_n(1), ..., P_n(7). The P-tree Pcnn & P_n(i) represents the tuples having class value i that are in the closed-KNN set, Pcnn. A class i which yields the maximum root count of Pcnn & P_n(i) is the plurality class; that is,

\[ \text{predicted class} = \arg\max_i \{ RC(Pcnn \,\&\, P_n(i)) \} \tag{5} \]

where RC(P) is the root count of P.

3.3 PINE: weighted nearest neighbour classifier using vertical data representation

The continuity assumption of KNN tells us that tuples that are more similar to a given tuple have more influence on classification than tuples that are less similar. Therefore, giving more voting weight to closer tuples than to distant tuples increases the classification accuracy.
Instead of considering only the KNNs, we include all of the points, using the largest weight, 1, for those matching exactly, and the smallest weight, 0, for those furthest away. Figure 3 shows the neighbourhood rings using the HOBBit metric. Many weighting functions which decrease with distance can be used (e.g., Gaussian, Kriging, etc.). A simple example of such a function is a linear podium function (shown in Figure 4), which decreases step-by-step with distance.

Note that the HOBBit distance metric is ideally suited to the definition of neighbourhood rings, because the range of points that are considered equidistant grows exponentially with distance from the centre. Adjusting weights is particularly important for small to intermediate distances, where the podiums are small. At larger distances, where fine-tuning is less important, the HOBBit distance remains unchanged over a large range, i.e., the podiums are wider. Ideally, the 0-weighted ring should include all training samples that are judged to be too far away (by a domain expert) to influence class.

Figure 3 Neighbourhood rings using the HOBBit metric

Figure 4 An example of a linear podium function

We number the rings from 0 (outermost) to m (innermost). Let w_j be the weight associated with ring j. Let c_ij be the number of neighbour tuples in ring j belonging to class i. Then the total weighted vote for class i is given by:

\[ V(i) = \sum_{j=0}^{m} w_j c_{ij} \tag{6} \]

This can easily be transformed to:

\[ V(i) = w_0 \sum_{k=0}^{m} c_{ik} + \sum_{j=1}^{m} \left( (w_j - w_{j-1}) \sum_{k=j}^{m} c_{ik} \right) \tag{7} \]
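Equations (6) and (7) can be checked with a small sketch (our own variable names): the ring form and the telescoped circle form give the same vote, and the circle form only needs the cumulative counts that P-trees deliver directly.

```python
# Podium vote for one class i: rings are numbered 0 (outermost) to m
# (innermost), w[j] is the weight of ring j, c_i[j] the number of
# class-i tuples in ring j.

def vote_rings(c_i, w):
    """Equation (6): V(i) = sum_j w_j * c_ij."""
    return sum(wj * cj for wj, cj in zip(w, c_i))

def vote_circles(c_i, w):
    """Equation (7): the same vote written over cumulative 'circle'
    counts sum_{k=j}^{m} c_ik (circle j = ring j plus all inner rings)."""
    m = len(w) - 1
    circle = [sum(c_i[j:]) for j in range(m + 1)]
    return w[0] * circle[0] + sum((w[j] - w[j - 1]) * circle[j]
                                  for j in range(1, m + 1))
```

For example, with ring counts [6, 3, 1] and weights [0, 0.5, 1], both forms give a vote of 2.5.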

Let circle j be the circle formed by the rings j, j + 1, ..., m, that is, ring j including all of its inner rings. Referring to equation (4), the P-tree Pnn(j) represents all of the tuples in circle j. Therefore, {Pnn(j) & P_n(i)} represents the tuples in circle j and class i, where P_n(i) is the P-tree for class i. Hence:

\[ \sum_{k=j}^{m} c_{ik} = RC\{ Pnn(j) \,\&\, P_n(i) \} \tag{8} \]

\[ V(i) = w_0 \, RC\{ Pnn(0) \,\&\, P_n(i) \} + \sum_{j=1}^{m} \left[ (w_j - w_{j-1}) \, RC\{ Pnn(j) \,\&\, P_n(i) \} \right] \tag{9} \]

A class i which yields the maximum weighted vote, V(i), is the plurality class or the predicted class; that is:

\[ \text{predicted class} = \arg\max_i \{ V(i) \} \tag{10} \]

4 Performance analysis

We have performed experiments to evaluate PINE on real data sets. An example data set is given in Figure 5. This data set includes an aerial TIFF image (with Red, Green and Blue bands) and associated ground data, such as nitrate, moisture and yield, of the Oaks area in North Dakota. In this data set, yield is the class label attribute, i.e., the goal is to predict the yield based on the values of the other attributes. This data set and some other data sets are available at

Figure 5 An image dataset: (a) TIFF image; (b) nitrate map; (c) moisture map and (d) yield map

We tested KNN with the Manhattan, Euclidean, Max and HOBBit distance metrics; closed-KNN with the HOBBit metric; and PINE. In PINE, HOBBit was used as the distance function and a Gaussian function was used as the podium function. The function is exp(-2^{2d} / (2 sigma^2)), where d is the HOBBit distance and the variance sigma^2 is 2^4. The mapping of the function is given in Table 1.

Table 1 Gaussian weights as a function of HOBBit distance (columns: HOBBit distance, Gaussian weight)

The comparison of the accuracy is given in Figure 6. We observed that PINE achieves higher classification accuracy than closed-KNN does. Especially when the training set size increases, the improvement of PINE over closed-KNN is more apparent. In addition, PINE performs much better than classical KNN methods using various metrics. All these classifiers work better than raw guessing, which is 12.5% in this data set with eight classes.

Figure 6 Comparison of classification accuracy for KNN using different metrics, closed-KNN and PINE

In terms of running time, from Figure 7, we observed that both closed-KNN and PINE are much faster than classical KNN methods using various metrics (notice that both the size and the classification time are plotted in logarithmic scale). The reason is that both PINE and closed-KNN use vertical data representation, which provides fast calculation of nearest neighbours. We can also see that PINE is slightly slower than
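The Gaussian podium weights can be sketched as follows. Note that the garbled source leaves the variance parameter ambiguous (sigma vs. sigma squared); we read it as sigma^2 = 2^4, which should be checked against the original Table 1.

```python
import math

def gaussian_podium_weight(d, sigma2=2 ** 4):
    """Weight of the ring at HOBBit distance d (d = 0 is an exact match).
    sigma2 = 2**4 is our reading of the paper's variance parameter."""
    return math.exp(-(2 ** (2 * d)) / (2 * sigma2))
```

Because the argument grows as 2^{2d}, the weight falls off very sharply: rings at small HOBBit distances keep weights near 1, while distant rings contribute essentially nothing, matching the intent that far-away samples carry zero weight.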

closed-KNN. This is expected because in PINE all the training samples are included as neighbours, while in closed-KNN only k samples (with possibly extra points on the boundary) are selected as neighbours to be involved in the calculation. However, this additional time cost is relatively small. On average, PINE is eight times faster than classical KNN, and closed-KNN is ten times faster. Both PINE and closed-KNN increase at a lower rate than classical KNN methods do when the training set size increases.

Figure 7 Comparison of classification time per sample (size and classification time are plotted in logarithmic scale)

5 Conclusions

In this paper, we propose a weighted nearest neighbour classifier, called PINE. PINE uses a vertical data structure, the P-tree, and a distance metric called HOBBit. A Gaussian podium function is used to weight the different neighbours based on their distance from the sample data. PINE does not require providing the number of neighbours in advance. PINE is particularly useful for classification on large data sets, such as spatial data sets, which have a high compression ratio using vertical data representation. Performance analysis shows that PINE outperforms classical KNN methods in terms of accuracy and speed of classification on spatial data.

In addition, PINE has potential for efficient classification on data streams. In data streams, new data keep arriving, so classification efficiency is an important issue. PINE provides high efficiency as well as accuracy for classification on data streams. As the areas of data mining applications grow (Chen and Liu, 2005), our approach has the potential to be extended to some of these areas, such as DNA microarray data analysis and medical image analysis.

Acknowledgement

This work is partially supported by GSA Grant K.

References

Ali, S. and Smith, K.A. (2005) 'Kernel width selection for SVM classification: a meta-learning approach', International Journal of Data Warehousing and Mining, Idea Group Inc., Vol. 1, No. 4, pp.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees, Wadsworth, Belmont.
Chen, S.Y. and Liu, X. (2005) 'Data mining from 1994 to 2004: an application-orientated review', International Journal of Business Intelligence and Data Mining, Inderscience Publishers, Vol. 1, No. 1, pp.
Cost, S. and Salzberg, S. (1993) 'A weighted nearest neighbour algorithm for learning with symbolic features', Machine Learning, Vol. 10, No. 1, pp.
Cover, T.M. (1968) 'Rates of convergence for nearest neighbour procedures', Proceedings of Hawaii International Conference on Systems Sciences, Honolulu, Hawaii, USA, pp.
Cover, T.M. and Hart, P. (1967) 'Nearest neighbour pattern classification', IEEE Trans. on Information Theory, pp.
Dasarathy, B.V. (1991) Nearest-Neighbour Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA.
Ding, Q., Khan, M., Roy, A. and Perrizo, W. (2002) 'The P-tree algebra', Proceedings of ACM Symposium on Applied Computing, pp.
Domeniconi, C., Peng, J. and Gunopulos, D. (2002) 'Locally adaptive metric nearest neighbour classification', IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 9, pp.1281-1285.
Duda, R.O. and Hart, P.E. (1973) Pattern Classification and Scene Analysis, Wiley, New York.
Friedman, J. (1994) Flexible Metric Nearest Neighbour Classification, Technical report, Stanford University, Stanford, CA, USA.
Fu, L. (2005) 'Novel efficient classifiers based on data cube', International Journal of Data Warehousing and Mining, Idea Group Inc., Vol. 1, No. 3, pp.
Fu, X. and Wang, L.
(2005) 'Data dimensionality reduction with application to improving classification performance and explaining concepts of data sets', International Journal of Business Intelligence and Data Mining, Inderscience Publishers, Vol. 1, No. 1, pp.
Hastie, T. and Tibshirani, R. (1996) 'Discriminant adaptive nearest neighbour classification', IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, No. 6, pp.
James, M. (1985) Classification Algorithms, John Wiley & Sons, New York.
Jeong, M. and Lee, D. (2005) 'Improving classification accuracy of decision trees for different abstraction levels of data', International Journal of Data Warehousing and Mining, Idea Group Inc., Vol. 1, No. 3, pp.
Khan, M., Ding, Q. and Perrizo, W. (2002) 'k-nearest neighbour classification on spatial data streams using P-trees', Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence, Vol. 2336, pp.
McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition, Wiley, New York.
Myles, J.P. and Hand, D.J. (1990) 'The multi-class metric problem in nearest neighbour discrimination rules', Pattern Recognition, Vol. 23, pp.1291-1297.

Neifeld, M.A. and Psaltis, D. (1993) 'Optical implementations of radial basis classifiers', Applied Optics, Vol. 32, No. 8, pp.
Perrizo, W., Ding, Q., Ding, Q. and Roy, A. (2001a) 'On mining satellite and other remotely sensed images', Proceedings of ACM Workshop on Research Issues on Data Mining and Knowledge Discovery, Santa Barbara, CA, USA, pp.
Perrizo, W., Ding, Q., Ding, Q. and Roy, A. (2001b) 'Deriving high confidence rules from spatial data using Peano count trees', Lecture Notes in Computer Science, Vol. 2118, pp.
Quinlan, J.R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann.
Safar, M. (2005) 'K nearest neighbour search in navigation systems', Mobile Information Systems, IOS Press, Vol. 1, No. 3, pp.
Short, R. and Fukunaga, K. (1980) 'A new nearest neighbour distance measure', Proceedings of International Conference on Pattern Recognition, Miami Beach, FL, USA, pp.
Short, R. and Fukunaga, K. (1981) 'The optimal distance measure for nearest neighbour classification', IEEE Trans. on Information Theory, Vol. 27, pp.

Note

1 P-tree technology is patented by North Dakota State University (William Perrizo, primary inventor of record); patent number 6,941,303, issued September 6. Website: TIFF image data sets, available at


More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

A Similarity Measure Method for Symbolization Time Series

A Similarity Measure Method for Symbolization Time Series Research Journal of Appled Scences, Engneerng and Technology 5(5): 1726-1730, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: July 27, 2012 Accepted: September 03, 2012

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

THE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY

THE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY Proceedngs of the 20 Internatonal Conference on Machne Learnng and Cybernetcs, Guln, 0-3 July, 20 THE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY JUN-HAI ZHAI, NA LI, MENG-YAO

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

A Lazy Ensemble Learning Method to Classification

A Lazy Ensemble Learning Method to Classification IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 344 A Lazy Ensemble Learnng Method to Classfcaton Haleh Homayoun 1, Sattar Hashem 2 and Al

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram Shape Representaton Robust to the Sketchng Order Usng Dstance Map and Drecton Hstogram Department of Computer Scence Yonse Unversty Kwon Yun CONTENTS Revew Topc Proposed Method System Overvew Sketch Normalzaton

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

USING GRAPHING SKILLS
