An efficient weighted nearest neighbour classifier using vertical data representation
Int. J. Business Intelligence and Data Mining, Vol. x, No. x, xxxx

An efficient weighted nearest neighbour classifier using vertical data representation

William Perrizo
Department of Computer Science, North Dakota State University, P.O. Box 5164, Fargo, ND, USA
E-mail: william.perrizo@ndsu.edu

Qin Ding*
Department of Computer Science, Pennsylvania State University at Harrisburg, Middletown, PA 17057, USA
Fax:
E-mail: qding@psu.edu
*Corresponding author

Maleq Khan
Department of Computer Science, Purdue University, 250 N. University St., West Lafayette, IN 47907, USA
E-mail: mmkhan@cs.purdue.edu

Anne Denton
Department of Computer Science and Operations Research, North Dakota State University, P.O. Box 5164, Fargo, ND, USA
E-mail: anne.denton@ndsu.edu

Qiang Ding
Jiangsu Telecom Co., Ltd., Huan Cheng Nan Lu #88, Nantong, Jiangsu, China
E-mail: qding74@gmail.com

Abstract: The k-nearest neighbour (KNN) technique is a simple yet effective method for classification. In this paper, we propose an efficient weighted nearest neighbour classification algorithm, called PINE, using vertical data representation. A metric called HOBBit is used as the distance metric. The PINE algorithm applies a Gaussian podium function to assign weights to different neighbours. We compare PINE with classical KNN methods using horizontal and vertical representations with different distance metrics. The experimental results show that PINE outperforms other KNN methods in terms of classification accuracy and running time.

Copyright 200x Inderscience Enterprises Ltd.
Keywords: nearest neighbours; classification; data mining; vertical data; spatial data.

Reference to this paper should be made as follows: Perrizo, W., Ding, Q., Khan, M., Denton, A. and Ding, Q. (xxxx) 'An efficient weighted nearest neighbour classifier using vertical data representation', Int. J. Business Intelligence and Data Mining, Vol. x, No. x, pp.xxx-xxx.

Biographical notes: William Perrizo is a Professor of Computer Science at North Dakota State University. He holds a PhD degree from the University of Minnesota, a Master's degree from the University of Wisconsin and a Bachelor's degree from St. John's University. He has been a Research Scientist at the IBM Advanced Business Systems Division and the USA Air Force Electronic Systems Division. His areas of expertise are data mining, knowledge discovery, database systems, distributed database systems, high-speed computer and communications networks, precision agriculture and bioinformatics. He is a member of ISCA, ACM, IEEE, IAAA and AAAS.

Qin Ding is an Assistant Professor in Computer Science at Pennsylvania State University at Harrisburg, USA. She received her PhD in Computer Science from North Dakota State University, USA, and her MS and BS in Computer Science from Nanjing University, China. Her research interests include data mining, databases, bioinformatics and spatial data.

Maleq Khan received his BS in Computer Science and Engineering from Bangladesh University of Engineering and Technology, Bangladesh, and his MS in Computer Science from North Dakota State University, ND, USA. He is currently working toward the PhD degree in Computer Science at Purdue University, IN, USA. His research interests include distributed algorithms, communication networks, wireless sensor networks, bioinformatics and data mining.

Anne Denton is an Assistant Professor in Computer Science at North Dakota State University, USA. Her research interests are in data mining, knowledge discovery in scientific data and bioinformatics.
Specific interests include data mining of diverse data, in which objects are characterised by a variety of properties such as numerical and categorical attributes, graphs, sequences, time-dependent attributes and others. She received her PhD in Physics from the University of Mainz, Germany, and her MS in Computer Science from North Dakota State University, Fargo, ND, USA.

Qiang Ding received his MS and PhD in Computer Science from North Dakota State University, USA, and his BS in Computer Science from Nanjing University, China. He was an Assistant Professor in Computer Science at Concordia College, MN, USA, before he joined Jiangsu Telecom Co., Ltd., China. His research interests include databases and data mining.

1 Introduction

The nearest neighbour technique (Cover and Hart, 1967; Cover, 1968; Dasarathy, 1991; Duda and Hart, 1973; McLachlan, 1992; Safar, 2005) is a simple yet effective method for classification. Given a set of training data, a KNN classifier predicts the class value for a new sample X by searching the training set for the k nearest neighbours (KNNs) of X based on a certain distance
metric, and then assigning to X the most frequent class occurring among its KNNs. The value of k, i.e., the number of neighbours, is pre-specified. Nearest neighbour classifiers are lazy learners in that a classifier is not built until a new sample needs to be classified. They differ from eager classifiers, such as decision tree induction (Breiman et al., 1984; Quinlan, 1993; Jeong and Lee, 2005; James, 1985; Fu, 2005).

In classical KNN methods, data are represented in horizontal format. Each training observation is a tuple that consists of n features (also called attributes). Each time a new sample needs to be classified, the entire training data set is scanned to find the KNNs of this new sample. The database scan can be costly if the training data set is extremely large.

In this paper, we propose a nearest neighbour algorithm, called Podium Incremental Neighbour Evaluator (PINE), using vertical data representation. Vertical data representation can usually provide a high compression ratio for large datasets. Vertical representation is particularly suitable for spatial data, not only because spatial data sets are typically large, but also because the continuity of a spatial neighbourhood provides potential for an even higher compression ratio under vertical representation.

In most KNN methods, each of the KNNs casts an equal vote for the class of the new sample. In the PINE algorithm, we apply a Gaussian podium function to give different neighbours different weights. The podium function is chosen in such a way that the neighbours close to the sample have a higher impact than others in determining the class of the sample. In addition, all the samples in the database are considered neighbours, though some samples have very low or even zero weights. By doing so, there is no need to pre-specify the k value and, more importantly, high classification accuracy is achieved.

The idea of using weights in KNN is not new.
It is common to assign weights to different features in the distance metric, since some features may be more relevant than others in terms of closeness or distance (Fu and Wang, 2005). Weights can also be assigned to the instances. Cost and Salzberg presented a method of attaching weights to different instances for learning with symbolic features (Cost and Salzberg, 1993). Some instances are more reliable and are considered exemplars. These reliable exemplars are given smaller weights to make them appear closer to a new sample. In our work, based on the distances from the new sample, we partition the entire training set into different groups (called 'rings') and then assign a weight to each ring using a podium function. The podium function establishes a riser height for each step of the podium weighting function as the distance from the sample grows. The idea of a podium function is similar to the concept of a radial basis function (Neifeld and Psaltis, 1993; Ali and Smith, 2005).

In this paper, we use a vertical data representation called the P-tree (Note 1) (Perrizo et al., 2001a; Ding et al., 2002). P-trees are a compact and data-mining-ready structure, which provides a lossless representation of the original horizontal data. P-trees represent bit information that is obtained from the data through a separation into bit planes. The multi-level structure is used to achieve high compression. A consistent multi-level structure is maintained across all bit planes of all attributes. This is done so that a simple multi-way logical AND operation can be used to reconstruct count information for any attribute value or tuple.

Different metrics can be defined for the closeness of two data points. In this paper, we use a metric called High Order Basic Bit similarity (HOBBit) (Khan et al., 2002). The HOBBit distance metric is particularly suitable for spatial data.
We ran the PINE algorithm on very large real image datasets. Our experimental results show that it is much faster to build a KNN classifier using vertical data representation than using horizontal data representation for very large datasets. They also show that the PINE algorithm achieves higher classification accuracy than other KNN methods.

The rest of the paper is organised as follows. In Section 2, we review the vertical P-tree structure. In Section 3, we introduce the HOBBit metric and detail our PINE algorithm. Performance analysis is given in Section 4. Section 5 concludes the paper.

2 The vertical P-tree structure review

We use a vertical structure, called a Peano Count Tree (P-tree), to represent the data. We first split each attribute value into bits. For example, for image data, each band is an attribute and the value is represented as a byte (8 bits). The file for each individual bit is called a bSQ file. For an attribute with m-bit values, we have m bSQ files. We organise each bSQ bit file, B_{i,j} (the file constructed from the jth bits of the ith attribute), into a P-tree. An example of an 8 x 8 bSQ file and its P-tree is given in Figure 1.

Figure 1 P-tree and PM-tree

In this example, 39 is the count of 1s in the bSQ file, called the root count. The numbers at the next level, 16, 8, 15 and 0, are the 1-bit counts for the four major quadrants. Since the first and last quadrants are made up entirely of 1-bits and 0-bits respectively (called pure-1 and pure-0 quadrants respectively), we do not need sub-trees for these two quadrants. This pattern is continued recursively. Recursive raster ordering is called Peano or Z-ordering in the literature, hence the name P-trees. The process eventually terminates at the leaf level, where each quadrant is a 1-row-1-column quadrant. For each band, assuming 8-bit data values, we get 8 basic P-trees, one for each bit position. For band B_i we label the basic P-trees P_{i,1}, P_{i,2}, ..., P_{i,8}, where P_{i,j} is a lossless representation of the jth bit of the values from the ith band.
However, the P_{i,j} provide much more information than the raw bits and are structured to facilitate many important data mining processes.

For efficient implementation, we use a variation of P-trees, called the PM-tree (Pure Mask tree), in which masks instead of counts are used. In the PM-tree, 3-value logic is used, i.e., 11 represents a pure-1 quadrant, 00 represents a pure-0 quadrant and 01 represents a mixed quadrant. To simplify, we use 1 for pure-1, 0 for pure-0 and m for mixed. This is illustrated in the third part of Figure 1.
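The recursive pure-1/pure-0/mixed classification of quadrants can be sketched as follows. This is a toy PM-tree builder over a 2^n x 2^n bit grid; the function name, the tuple representation and the Z-order of the children are illustrative choices, not the paper's implementation, and no compression tricks beyond pruning pure quadrants are attempted.

```python
def pm_tree(grid, r=0, c=0, size=None):
    """Build a nested PM-tree: '1' for a pure-1 quadrant, '0' for a
    pure-0 quadrant, otherwise ('m', [four child sub-trees in Z-order])."""
    size = size or len(grid)
    bits = [grid[r + i][c + j] for i in range(size) for j in range(size)]
    if all(b == 1 for b in bits):
        return '1'                                # pure-1: no sub-tree needed
    if all(b == 0 for b in bits):
        return '0'                                # pure-0: no sub-tree needed
    h = size // 2
    return ('m', [pm_tree(grid, r,     c,     h),   # upper-left
                  pm_tree(grid, r,     c + h, h),   # upper-right
                  pm_tree(grid, r + h, c,     h),   # lower-left
                  pm_tree(grid, r + h, c + h, h)])  # lower-right

grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1]]
print(pm_tree(grid))
```

Only the mixed quadrants recurse, which is what gives the structure its compression on spatially continuous data.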
P-tree algebra contains the operators AND, OR, NOT and XOR, which are the pixel-by-pixel logical operations on P-trees (Ding et al., 2002). The NOT operation is a straightforward translation of each count to its quadrant-complement. The AND and OR operations are shown in Figure 2.

Figure 2 P-tree algebra (AND and OR)

The basic P-trees can be combined using simple logical operations to produce P-trees for the original values (at any level of precision: 1-bit precision, 2-bit precision, etc.). We let P_{b,v} denote the P-tree for attribute b and value v, where v can be expressed in any bit precision. Using 8-bit precision for values, P_{b,v} is constructed from the basic P-trees by ANDing, for each bit position j, the basic P-tree P_{b,j} if the jth bit of v is 1, and its complement P'_{b,j} otherwise, where ' indicates the NOT operation. The AND operation is simply the pixel-wise AND of the bits.

Similarly, data in relational format can also be represented as P-trees. For any combination of values (v_1, v_2, ..., v_n), where v_i is from attribute i, the quadrant-wise count of occurrences of this combination of values is given by:

P(v_1, v_2, ..., v_n) = P_{1,v_1} AND P_{2,v_2} AND ... AND P_{n,v_n}

P-trees are implemented in an efficient way, which facilitates fast P-tree operations and considerable saving of space. More details can be found in Ding et al. (2002) and Perrizo et al. (2001b).
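The count-by-AND idea above can be sketched without the tree compression: store each bit plane of an attribute as a Python integer bitmask (tuple t contributes bit t) and AND the planes or their complements to count tuples matching a value. This is a minimal flat-bitmap sketch with illustrative names, not the compressed quadrant structure.

```python
def bit_planes(values, m=8):
    """Split m-bit attribute values into m bit planes, each stored as
    an integer bitmask; planes[0] holds the most significant bits."""
    planes = []
    for j in range(m):
        shift = m - 1 - j
        mask = 0
        for t, v in enumerate(values):
            if (v >> shift) & 1:
                mask |= 1 << t
        planes.append(mask)
    return planes

def value_count(planes, v, n_tuples, m=8):
    """Count tuples equal to v by ANDing each plane or its complement,
    mirroring the construction of P_{b,v} from the basic P-trees."""
    full = (1 << n_tuples) - 1              # bitmask covering all tuples
    acc = full
    for j in range(m):
        bit = (v >> (m - 1 - j)) & 1
        acc &= planes[j] if bit else (~planes[j] & full)
    return bin(acc).count("1")              # the "root count"

values = [105, 105, 104, 96, 255]
planes = bit_planes(values)
print(value_count(planes, 105, len(values)))  # -> 2 (tuples 0 and 1)
```

The flat bitmap does the same logical work as the AND of P-trees; the real structure additionally prunes pure quadrants so the AND touches far fewer nodes.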
3 Distance-weighted nearest neighbour classification using vertical data representation

In classical KNN classification techniques, a limit is placed on the number of neighbours. In our distance-weighted neighbour classification approach, the podium or distance weighting function instead establishes a riser height for each step of the weighting function as the distance from the sample grows. This gives users maximum flexibility in choosing just the right level of influence for each training sample in the entire training set. The real question is whether this level of flexibility can be offered without imposing a severe penalty on the speed of the classifier. Traditionally, sub-sampling, neighbour-limiting and other restrictions are introduced precisely to ensure that the algorithm finishes its classification in reasonable time. The use of the compressed, data-mining-ready P-tree structure, in fact, makes PINE even faster than traditional methods. This is critically important in classification, since data are typically never discarded and the training set therefore grows without bound. The classification technique must scale well or it will quickly become unusable in this setting. PINE scales well, since its accuracy increases as the training set grows while its speed remains very reasonable (see the performance study below). Furthermore, since PINE is lazy (it does not require a training phase in which a closed-form classifier is pre-built), it does not incur the expensive delays required for rebuilding a classifier when new training data arrive. Thus, PINE gives us a faster and more accurate classifier.

Before explaining how the distance-weighted neighbour classifier PINE works using P-trees, we give an overview of KNN classification and illustrate how to calculate KNN using P-trees without considering weights. In KNN, the basic idea is that the tuples most likely to belong to the same class are those that are similar in the other attributes. This continuity assumption is consistent with the properties of a spatial neighbourhood.
Based on some pre-selected distance metric or similarity measure, such as Euclidean distance, classical KNN finds the k most similar or nearest training samples to an unclassified sample and assigns the plurality class of those k samples to the new sample (Dasarathy, 1991; Neifeld and Psaltis, 1993). The value for k is pre-selected by the user based on the accuracy required (usually the larger the value of k, the more accurate the classifier) and the delay time acceptable for classifying with that k-value (usually the larger the value of k, the slower the classifier). The steps of the classification process are:

- determine a suitable distance metric
- find the KNNs using the selected distance metric
- find the plurality class of the KNNs (voting on the class labels of the KNNs)
- assign that class to the sample to be classified.

We use a distance metric called HOBBit, which provides an efficient method of computation based on P-trees. Instead of examining individual training samples to find the nearest neighbours, we start with an initial neighbourhood of the target sample within a specified distance in the feature space based on this metric, and then successively expand the neighbourhood area until there are at least k tuples in the neighbourhood set.
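The four steps above amount to the following sketch of a classical (unweighted) KNN classifier. Euclidean distance and all names here are illustrative; this is the baseline the paper improves on, not the P-tree method.

```python
import math
from collections import Counter

def knn_classify(training, labels, x, k):
    """Classical KNN: rank training tuples by Euclidean distance to x,
    take the k nearest, and return the plurality class of their labels."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    ranked = sorted(range(len(training)), key=lambda i: dist(training[i], x))
    votes = Counter(labels[i] for i in ranked[:k])   # vote of the k nearest
    return votes.most_common(1)[0][0]

training = [(0, 0), (1, 0), (0, 1), (9, 9), (8, 9)]
labels = ['a', 'a', 'a', 'b', 'b']
print(knn_classify(training, labels, (1, 1), k=3))   # -> 'a'
```

Note that the full sort scans every training tuple per query, which is exactly the cost the vertical representation is designed to avoid.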
Of course, there may be more boundary neighbours equidistant from the sample than are necessary to complete the KNN set, in which case one can either use the larger set or arbitrarily ignore some of them. Traditional methods find the exact KNN set, since that is easiest using traditional techniques, although it is clear that allowing some samples at that distance to vote and not others will skew the result. Instead, the P-tree-based KNN approach builds a closed nearest neighbour set (closed-KNN) (Khan et al., 2002); that is, we include all of the boundary neighbours. The inductive definition of the closed-KNN set is given below:

a  if x is in KNN, then x is in closed-KNN
b  if x is in closed-KNN and d(T, y) <= d(T, x), then y is in closed-KNN, where d(T, x) is the distance of x from the target T
c  closed-KNN does not contain any tuple which cannot be produced by steps (a) and (b).

Experimental results show that closed-KNN yields higher classification accuracy than KNN does. If there are many tuples on the boundary, inclusion of some but not all of them skews the voting mechanism. The P-tree implementation requires no extra computation to find the closed-KNN: our neighbourhood expansion mechanism automatically includes the entire boundary of the neighbourhood. P-tree algorithms avoid the examination of individual data points, which improves the classification efficiency.

3.1 Higher Order Basic Bit (HOBBit) distance

Using a suitable distance or similarity metric is very important in KNN methods because it can strongly affect performance. There is no metric that works best for all cases. Research has been done to find the right metric for the right problem (Short and Fukunaga, 1980, 1981; Friedman, 1994; Myles and Hand, 1990; Hastie and Tibshirani, 1996; Domeniconi et al., 2002). The most commonly used distance metric is the Euclidean metric.
For two data points, X = <x_1, x_2, x_3, ..., x_{n-1}> and Y = <y_1, y_2, y_3, ..., y_{n-1}>, the Euclidean similarity function is defined as

d_2(X, Y) = \sqrt{\sum_{i=1}^{n-1} (x_i - y_i)^2}

It can be generalised to the Minkowski similarity function,

d_q(X, Y) = \left( \sum_{i=1}^{n-1} w_i |x_i - y_i|^q \right)^{1/q}

If q = 2, this gives the Euclidean function. If q = 1, it gives the Manhattan distance, which is

d_1(X, Y) = \sum_{i=1}^{n-1} |x_i - y_i|
If q = \infty, it gives the max function

d_\infty(X, Y) = \max_{i=1}^{n-1} |x_i - y_i|

In this paper we use the HOBBit distance metric (Khan et al., 2002), which is a natural choice for spatial data. The HOBBit metric measures distance based on the most significant consecutive bit positions, starting from the left (the highest order bit). Similarity, or closeness, is what is of interest: when comparing two values bitwise from left to right, once a difference is found, no further comparisons are needed. The HOBBit similarity between two integers A and B is defined by

S_H(A, B) = \max\{ s \mid 0 \le i \le s \Rightarrow a_i = b_i \}    (1)

where a_i and b_i are the ith bits of A and B respectively. The HOBBit distance between two tuples X and Y is defined by

d_H(X, Y) = \max_{i=1}^{n-1} \{ m - S_H(x_i, y_i) \}    (2)

where m is the number of bits in the binary representations of the values; n - 1 is the number of attributes used for measuring distance (the nth being the class attribute); and x_i and y_i are the ith attributes of tuples X and Y. The HOBBit distance between two tuples is thus a function of the least similar pairing of attribute values in them. From our experiments, we found that the HOBBit distance is more suitable for spatial data than other distance metrics.

3.2 Closed-KNN

To find the closed-KNN set, we first look for the tuples which are identical to the target tuple in all 8 bits of all bands, i.e., the tuples X having distance d_H(X, T) = 0 from the target T. If, for instance, t_1 = 105 (01101001_b = 105_d) is a target attribute value, the initial interval of interest is [105, 105] ([01101001, 01101001]). If, over all tuples, the number of matches is less than k, we compare attributes on the basis of the seven most significant bits, not caring about the 8th bit. The expanded interval of interest would be [104, 105] ([01101000, 01101001]). If k matches still have not been found, removing one more bit from the right gives the interval [104, 107] ([01101000, 01101011]). Continuing to remove bits from the right, we get the intervals [104, 111], then [96, 111] and so on. This process is implemented using P-trees as follows.
P_{i,j} is the basic P-tree for bit j of band i, and P'_{i,j} is the complement of P_{i,j}. Let b_{i,j} be the jth bit of the ith band of the target tuple and, for implementation purposes, let the representation of the P-tree depend on the value of b_{i,j}. Define:

Pt_{i,j} = P_{i,j} if b_{i,j} = 1, and Pt_{i,j} = P'_{i,j} otherwise.
Then the root count of Pt_{i,j} is the number of tuples in the training dataset having the same value as the jth bit of the ith band of the target tuple. Define:

Pv_{i,1..j} = Pt_{i,1} & Pt_{i,2} & Pt_{i,3} & ... & Pt_{i,j}    (3)

where & is the P-tree AND operator and n is the number of bands. Pv_{i,1..j} counts the tuples having the same bit values as the target tuple in the higher-order j bits of the ith band. Then a neighbourhood P-tree can be formed as follows:

Pnn(j) = Pv_{1,1..j} & Pv_{2,1..j} & Pv_{3,1..j} & ... & Pv_{n-1,1..j}    (4)

We calculate the initial neighbourhood P-tree, Pnn(8), matching exactly in all bands, considering 8-bit values. Then we calculate Pnn(7), matching in the seven higher-order bits; then Pnn(6) and so on. We continue as long as the root count of Pnn(j) is less than k. Let us denote the final Pnn(j) by Pcnn. Pcnn represents the closed-KNN set, and its root count is the number of nearest tuples. A bit of 1 in Pcnn for a tuple means that the tuple is in the closed-KNN set.

For the purpose of classification, we do not need to consider all bits in the class band. If the class band is 8 bits long, there are 256 possible classes. Instead, we partition the class band values into fewer groups, say 8, by truncating the 5 least significant bits. The 8 classes are 0, 1, 2, ..., 7. Using the leftmost 3 bits, we construct the value P-trees P_n(0), P_n(1), ..., P_n(7). The P-tree Pcnn & P_n(i) represents the tuples in the closed-KNN set Pcnn having class value i. The i which yields the maximum root count of Pcnn & P_n(i) is the plurality class; that is,

predicted class = \arg\max_i \{ RC(Pcnn \& P_n(i)) \}    (5)

where RC(P) is the root count of P.

3.3 PINE: weighted nearest neighbour classifier using vertical data representation

The continuity assumption of KNN tells us that tuples that are more similar to a given tuple have more influence on classification than tuples that are less similar. Therefore, giving more voting weight to closer tuples than to distant tuples increases the classification accuracy.
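Before adding weights, the closed-KNN search of equations (1)-(4) can be sketched with flat integer bitmasks standing in for P-trees (no quadrant compression; all names are illustrative). A tuple belongs to Pnn(j) exactly when every attribute matches the target in its j high-order bits, i.e., when its HOBBit distance from the target is at most m - j.

```python
def hobbit_distance(x, y, m=8):
    """d_H(X, Y) = max_i (m - S_H(x_i, y_i)), equations (1)-(2):
    per attribute, count matching bits from the most significant down."""
    def sim(a, b):
        s = 0
        for j in range(m - 1, -1, -1):       # from the high-order bit
            if (a >> j) & 1 != (b >> j) & 1:
                break
            s += 1
        return s
    return max(m - sim(a, b) for a, b in zip(x, y))

def closed_knn(bands, target, k, m=8):
    """Shrink the matched prefix length j (Pnn(8), Pnn(7), ...) until the
    neighbourhood holds at least k tuples; return its bitmask, eq. (4)."""
    n_tuples = len(bands[0])
    pnn = 0
    for j in range(m, -1, -1):
        pnn = (1 << n_tuples) - 1
        for band, t in zip(bands, target):
            mask = 0                         # flat stand-in for Pv_{i,1..j}
            for idx, v in enumerate(band):
                if (v >> (m - j)) == (t >> (m - j)):
                    mask |= 1 << idx
            pnn &= mask
        if bin(pnn).count("1") >= k:         # root count >= k: done
            break
    return pnn

# two bands, five training tuples; target (105, 200), k = 3
bands = [[105, 104, 96, 105, 7], [200, 201, 200, 12, 13]]
mask = closed_knn(bands, (105, 200), k=3)
print(bin(mask))                             # -> 0b111: tuples 0, 1, 2
```

Because the expansion stops only when a whole prefix-match ring is included, every tuple at the final boundary distance is caught, which is exactly the closed-KNN property.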
Instead of considering only the KNNs, we include all of the points, using the largest weight, 1, for those matching exactly, and the smallest weight, 0, for those furthest away. Figure 3 shows the neighbourhood rings under the HOBBit metric. Many weighting functions which decrease with distance can be used (e.g., Gaussian, Kriging, etc.). A simple example of such a function is a linear podium function (shown in Figure 4), which decreases step by step with distance. Note that the HOBBit distance metric is ideally suited to the definition of neighbourhood rings, because the range of points that are considered equidistant grows exponentially with distance from the centre. Adjusting weights is particularly important for small to intermediate distances, where the podiums are small. At larger distances, where fine-tuning is less important, the HOBBit distance remains unchanged over a large range, i.e., the podiums are wider. Ideally, the 0-weighted ring should include all training samples that are judged (by a domain expert) to be too far away to influence the class.
Figure 3 Neighbourhood rings using the HOBBit metric

Figure 4 An example of a linear podium function

We number the rings from 0 (outermost) to m (innermost). Let w_j be the weight associated with ring j, and let c_{i,j} be the number of neighbour tuples in ring j belonging to class i. Then the total weighted vote V(i) for class i is given by:

V(i) = \sum_{j=0}^{m} w_j c_{i,j}    (6)

This can easily be transformed to:

V(i) = w_0 \sum_{k=0}^{m} c_{i,k} + \sum_{j=1}^{m} \left[ (w_j - w_{j-1}) \sum_{k=j}^{m} c_{i,k} \right]    (7)
Let circle j be the circle formed by the rings j, j + 1, ..., m, that is, ring j together with all of its inner rings. Referring to equation (4), the P-tree Pnn(j) represents all of the tuples in circle j. Therefore, Pnn(j) & P_n(i) represents the tuples in circle j with class i, where P_n(i) is the P-tree for class i. Hence:

\sum_{k=j}^{m} c_{i,k} = RC\{ Pnn(j) \& P_n(i) \}    (8)

V(i) = w_0 \, RC\{ Pnn(0) \& P_n(i) \} + \sum_{j=1}^{m} (w_j - w_{j-1}) \, RC\{ Pnn(j) \& P_n(i) \}    (9)

The i which yields the maximum weighted vote V(i) is the plurality class, i.e., the predicted class:

predicted class = \arg\max_i \{ V(i) \}    (10)

4 Performance analysis

We have performed experiments to evaluate PINE on real data sets. An example data set is given in Figure 5. This data set includes an aerial TIFF image (with red, green and blue bands) and associated ground data, such as nitrate, moisture and yield, of the Oaks area in North Dakota. In this data set, yield is the class label attribute, i.e., the goal is to predict the yield based on the values of the other attributes. This data set and some other data sets are available at

Figure 5 An image dataset: (a) TIFF image; (b) nitrate map; (c) moisture map and (d) yield map
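The ring-weighted vote of equations (6)-(10) can be evaluated with the same flat-bitmask stand-ins for P-trees: one mask per circle Pnn(j), one mask per class P_n(i), and a telescoping sum of weighted root counts. The masks and weights below are small made-up examples; all names are illustrative.

```python
def pine_vote(circle_masks, class_masks, weights):
    """Weighted vote V(i) per equation (9): circle_masks[j] is the bitmask
    for Pnn(j) (j = 0 is the outermost circle, i.e., all tuples),
    class_masks[i] that for P_n(i), weights[j] = w_j (non-decreasing
    toward the centre). Returns (predicted class, votes)."""
    def rc(mask):                       # root count = number of 1 bits
        return bin(mask).count("1")
    votes = {}
    for i, cmask in class_masks.items():
        v = weights[0] * rc(circle_masks[0] & cmask)
        for j in range(1, len(circle_masks)):
            v += (weights[j] - weights[j - 1]) * rc(circle_masks[j] & cmask)
        votes[i] = v
    return max(votes, key=votes.get), votes

# 6 tuples; circles shrink toward the target, weights grow toward it
circles = [0b111111, 0b011110, 0b000110]     # Pnn(0), Pnn(1), Pnn(2)
classes = {0: 0b000111, 1: 0b111000}         # P_n(0), P_n(1)
pred, votes = pine_vote(circles, classes, weights=[0.1, 0.5, 1.0])
print(pred, votes)                           # class 0 wins: 2.1 vs 1.1
```

The telescoping form matters because the circles Pnn(j), not the individual rings, are what the expansion loop produces for free; each ring count would otherwise need an extra subtraction.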
We tested KNN with the Manhattan, Euclidean, Max and HOBBit distance metrics; closed-KNN with the HOBBit metric; and PINE. In PINE, HOBBit was used as the distance function and a Gaussian function was used as the podium function. The function is \exp(-2^{2d} / (2\sigma^2)), where d is the HOBBit distance and σ is 2^4. The mapping of the function is given in Table 1.

Table 1 Gaussian weights as a function of HOBBit distance

The comparison of accuracy is given in Figure 6. We observed that PINE achieves higher classification accuracy than closed-KNN does. Especially as the training set size increases, the improvement of PINE over closed-KNN becomes more apparent. In addition, PINE performs much better than classical KNN methods using various metrics. All these classifiers work better than random guessing, which yields 12.5% accuracy in this data set with eight classes.

Figure 6 Comparison of classification accuracy for KNN using different metrics, closed-KNN and PINE

In terms of running time, from Figure 7 we observed that both closed-KNN and PINE are much faster than classical KNN methods using various metrics (notice that both the size and the classification time are plotted on logarithmic scales). The reason is that both PINE and closed-KNN use vertical data representation, which provides fast calculation of the nearest neighbours. We can also see that PINE is slightly slower than
closed-KNN. This is expected, because in PINE all the training samples are included as neighbours, while in closed-KNN only k samples (with possibly extra points on the boundary) are selected as neighbours to be involved in the calculation. However, this additional time cost is relatively small. On average, PINE is eight times faster than classical KNN, and closed-KNN is ten times faster. Both PINE and closed-KNN also scale at a lower rate than classical KNN methods do as the training set size increases.

Figure 7 Comparison of classification time per sample (size and classification time are plotted on logarithmic scales)

5 Conclusions

In this paper, we propose a weighted nearest neighbour classifier, called PINE. PINE uses a vertical data structure, the P-tree, and a distance metric called HOBBit. A Gaussian podium function is used to weight the different neighbours based on their distance from the sample data. PINE does not require the number of neighbours to be provided in advance. PINE is particularly useful for classification on large data sets, such as spatial data sets, which achieve a high compression ratio under vertical data representation. Performance analysis shows that PINE outperforms classical KNN methods in terms of accuracy and speed of classification on spatial data.

In addition, PINE has potential for efficient classification on data streams, where new data keep arriving and classification efficiency is therefore an important issue. PINE provides high efficiency as well as accuracy for classification on data streams. As the areas of data mining application grow (Chen and Liu, 2005), our approach has the potential to be extended to some of these areas, such as DNA microarray data analysis and medical image analysis.
Acknowledgement

This work is partially supported by GSA Grant K

References

Ali, S. and Smith, K.A. (2005) 'Kernel width selection for SVM classification: a meta-learning approach', International Journal of Data Warehousing and Mining, Idea Group Inc., Vol. 1, No. 4.

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees, Wadsworth, Belmont.

Chen, S.Y. and Liu, X. (2005) 'Data mining from 1994 to 2004: an application-orientated review', International Journal of Business Intelligence and Data Mining, Inderscience Publishers, Vol. 1, No. 1.

Cost, S. and Salzberg, S. (1993) 'A weighted nearest neighbour algorithm for learning with symbolic features', Machine Learning, Vol. 10, No. 1.

Cover, T.M. (1968) 'Rates of convergence for nearest neighbour procedures', Proceedings of Hawaii International Conference on Systems Sciences, Honolulu, Hawaii, USA.

Cover, T.M. and Hart, P. (1967) 'Nearest neighbour pattern classification', IEEE Trans. on Information Theory.

Dasarathy, B.V. (1991) Nearest-Neighbour Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA.

Ding, Q., Khan, M., Roy, A. and Perrizo, W. (2002) 'The P-tree algebra', Proceedings of ACM Symposium on Applied Computing.

Domeniconi, C., Peng, J. and Gunopulos, D. (2002) 'Locally adaptive metric nearest neighbour classification', IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 9, pp.1281-1285.

Duda, R.O. and Hart, P.E. (1973) Pattern Classification and Scene Analysis, Wiley, New York.

Friedman, J. (1994) Flexible Metric Nearest Neighbour Classification, Technical report, Stanford University, Stanford, CA, USA.

Fu, L. (2005) 'Novel efficient classifiers based on data cube', International Journal of Data Warehousing and Mining, Idea Group Inc., Vol. 1, No. 3.

Fu, X. and Wang, L. (2005) 'Data dimensionality reduction with application to improving classification performance and explaining concepts of data sets', International Journal of Business Intelligence and Data Mining, Inderscience Publishers, Vol. 1, No. 1.

Hastie, T. and Tibshirani, R. (1996) 'Discriminant adaptive nearest neighbour classification', IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, No. 6.

James, M. (1985) Classification Algorithms, John Wiley & Sons, New York.

Jeong, M. and Lee, D. (2005) 'Improving classification accuracy of decision trees for different abstraction levels of data', International Journal of Data Warehousing and Mining, Idea Group Inc., Vol. 1, No. 3.

Khan, M., Ding, Q. and Perrizo, W. (2002) 'K-nearest neighbour classification on spatial data streams using P-trees', Proceedings of Pacific and Asia Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence, Vol. 2336.

McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition, Wiley, New York.

Myles, J.P. and Hand, D.J. (1990) 'The multi-class metric problem in nearest neighbour discrimination rules', Pattern Recognition, Vol. 23, pp.1291-1297.

Neifeld, M.A. and Psaltis, D. (1993) 'Optical implementations of radial basis classifiers', Applied Optics, Vol. 32, No. 8.

Perrizo, W., Ding, Q., Ding, Q. and Roy, A. (2001a) 'On mining satellite and other remotely sensed images', Proceedings of ACM Workshop on Research Issues on Data Mining and Knowledge Discovery, Santa Barbara, CA, USA.

Perrizo, W., Ding, Q., Ding, Q. and Roy, A. (2001b) 'Deriving high confidence rules from spatial data using Peano count trees', Lecture Notes in Computer Science, Vol. 2118.

Quinlan, J.R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann.

Safar, M. (2005) 'K nearest neighbour search in navigation systems', Mobile Information Systems, IOS Press, Vol. 1, No. 3.

Short, R. and Fukunaga, K. (1980) 'A new nearest neighbour distance measure', Proceedings of International Conference on Pattern Recognition, Miami Beach, FL, USA.

Short, R. and Fukunaga, K. (1981) 'The optimal distance measure for nearest neighbour classification', IEEE Trans. on Information Theory, Vol. 27.

Note

1 P-tree technology is patented by North Dakota State University (William Perrizo, primary inventor of record); patent number 6,941,303, issued September 6.

Website

TIFF image data sets, available at
More informationWishing you all a Total Quality New Year!
Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma
More informationOutline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1
4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:
More informationA Fast Content-Based Multimedia Retrieval Technique Using Compressed Data
A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,
More informationSmoothing Spline ANOVA for variable screening
Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory
More informationEdge Detection in Noisy Images Using the Support Vector Machines
Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona
More informationSupport Vector Machines
Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned
More information12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification
Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero
More informationThe Research of Support Vector Machine in Agricultural Data Classification
The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More informationImprovement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration
Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,
More informationParallel matrix-vector multiplication
Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more
More informationX- Chart Using ANOM Approach
ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are
More informationOptimizing Document Scoring for Query Retrieval
Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More informationLoad Balancing for Hex-Cell Interconnection Network
Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,
More informationCompiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz
Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster
More informationNUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS
ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana
More informationProblem Set 3 Solutions
Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,
More informationClassifier Selection Based on Data Complexity Measures *
Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.
More informationHermite Splines in Lie Groups as Products of Geodesics
Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the
More informationA Fast Visual Tracking Algorithm Based on Circle Pixels Matching
A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationFast Feature Value Searching for Face Detection
Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com
More informationA Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems
A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty
More informationMULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION
MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and
More information6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour
6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the
More informationVirtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory
Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process
More informationAn Optimal Algorithm for Prufer Codes *
J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,
More informationAn Image Fusion Approach Based on Segmentation Region
Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua
More informationLecture 5: Multilayer Perceptrons
Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented
More informationMachine Learning 9. week
Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below
More informationMeta-heuristics for Multidimensional Knapsack Problems
2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,
More informationFEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur
FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents
More informationBOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET
1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School
More informationSorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions
Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place
More informationSkew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach
Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research
More informationAssignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.
Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton
More informationSteps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices
Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between
More informationThe Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique
//00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationS1 Note. Basis functions.
S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type
More informationFace Recognition University at Buffalo CSE666 Lecture Slides Resources:
Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural
More informationA Background Subtraction for a Vision-based User Interface *
A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton
More informationHelsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)
Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute
More informationQuery Clustering Using a Hybrid Query Similarity Measure
Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan
More informationUser Authentication Based On Behavioral Mouse Dynamics Biometrics
User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA
More informationRandom Kernel Perceptron on ATTiny2313 Microcontroller
Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department
More informationBAYESIAN MULTI-SOURCE DOMAIN ADAPTATION
BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,
More informationCSCI 104 Sorting Algorithms. Mark Redekopp David Kempe
CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal
More informationConcurrent Apriori Data Mining Algorithms
Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng
More informationCSE 326: Data Structures Quicksort Comparison Sorting Bound
CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the
More informationA Similarity Measure Method for Symbolization Time Series
Research Journal of Appled Scences, Engneerng and Technology 5(5): 1726-1730, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: July 27, 2012 Accepted: September 03, 2012
More informationClassifying Acoustic Transient Signals Using Artificial Intelligence
Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)
More informationOutline. Type of Machine Learning. Examples of Application. Unsupervised Learning
Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton
More informationMachine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)
Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes
More informationTHE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY
Proceedngs of the 20 Internatonal Conference on Machne Learnng and Cybernetcs, Guln, 0-3 July, 20 THE CONDENSED FUZZY K-NEAREST NEIGHBOR RULE BASED ON SAMPLE FUZZY ENTROPY JUN-HAI ZHAI, NA LI, MENG-YAO
More informationJournal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data
Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray
More informationA Deflected Grid-based Algorithm for Clustering Analysis
A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan
More informationA Lazy Ensemble Learning Method to Classification
IJCSI Internatonal Journal of Computer Scence Issues, Vol. 7, Issue 5, September 2010 ISSN (Onlne): 1694-0814 344 A Lazy Ensemble Learnng Method to Classfcaton Haleh Homayoun 1, Sattar Hashem 2 and Al
More informationPruning Training Corpus to Speedup Text Classification 1
Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan
More informationOutline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:
Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A
More informationType-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data
Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES
More informationPerformance Evaluation of Information Retrieval Systems
Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence
More informationAn Entropy-Based Approach to Integrated Information Needs Assessment
Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology
More informationSHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE
SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro
More informationDetermining the Optimal Bandwidth Based on Multi-criterion Fusion
Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn
More informationLECTURE : MANIFOLD LEARNING
LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors
More informationShape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram
Shape Representaton Robust to the Sketchng Order Usng Dstance Map and Drecton Hstogram Department of Computer Scence Yonse Unversty Kwon Yun CONTENTS Revew Topc Proposed Method System Overvew Sketch Normalzaton
More informationSome Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.
Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,
More informationSolving two-person zero-sum game by Matlab
Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by
More informationSequential search. Building Java Programs Chapter 13. Sequential search. Sequential search
Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to
More informationFace Recognition Method Based on Within-class Clustering SVM
Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class
More informationLearning-Based Top-N Selection Query Evaluation over Relational Databases
Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **
More informationUSING GRAPHING SKILLS
Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp
More informationSum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints
Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan
More informationSemantic Image Retrieval Using Region Based Inverted File
Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:
More informationCSE 326: Data Structures Quicksort Comparison Sorting Bound
CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson
More informationVirtual Machine Migration based on Trust Measurement of Computer Node
Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on
More informationInvestigating the Performance of Naïve- Bayes Classifiers and K- Nearest Neighbor Classifiers
Journal of Convergence Informaton Technology Volume 5, Number 2, Aprl 2010 Investgatng the Performance of Naïve- Bayes Classfers and K- Nearest Neghbor Classfers Mohammed J. Islam *, Q. M. Jonathan Wu,
More informationCourse Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms
Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques
More informationUB at GeoCLEF Department of Geography Abstract
UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department
More informationUsing Fuzzy Logic to Enhance the Large Size Remote Sensing Images
Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract
More informationFAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks
2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng
More informationArray transposition in CUDA shared memory
Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some
More informationCS 534: Computer Vision Model Fitting
CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust
More informationClassification / Regression Support Vector Machines
Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM
More informationConditional Speculative Decimal Addition*
Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant
More informationLecture #15 Lecture Notes
Lecture #15 Lecture Notes The ocean water column s very much a 3-D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal
More informationBIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING
An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College
More informationA fast algorithm for color image segmentation
Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au
More information