Robust Subspace Outlier Detection in High Dimensional Space


Abstract: Rare data in a large-scale database are called outliers, and they reveal significant information in the real world. Subspace-based outlier detection is regarded as a feasible approach in very high dimensional space. However, the outliers found in subspaces are in fact only part of the true outliers in high dimensional space. The outliers hidden inside normal-clustered points are sometimes neglected in the projected dimensional subspaces. In this paper, we propose a robust subspace method for detecting such inner outliers in a given dataset, which uses two dimensional-projections: detecting outliers in subspaces with a local density ratio in the first projected dimensions, and finding outliers by comparing neighbors' positions in the second projected dimensions. Each point's weight is calculated by summing up all related values obtained in the two steps' projected dimensions, and then the points scoring the largest weight values are taken as outliers. Through a series of experiments with the number of dimensions ranging from 10 to 10000, the results show that our proposed method achieves high precision in the case of extremely high dimensional space and also works well in low dimensional space.

Keywords: Outlier detection; High dimensional subspace; Dimension projection; k-NS

I. INTRODUCTION

Finding rare and valuable data is always a significant issue in the data mining field. These worthy data are called anomaly data, being different from the rest of the normal data based on some measure; they are also called outliers, being located far in distance from the others. Outlier detection has many practical applications in different domains, such as medicine development, fraud detection, sports statistics analysis, public health management, and so on. According to different perspectives, many definitions of outliers have been proposed. The widely accepted definition is Hawkins': an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism[7]. This definition not only describes the difference of the data from the observation point of view but also points out the essential difference of the data in mechanism; indeed, some synthetic data are generated according to this concept in order to verify outlier detection methods.

Although outlier detection itself does not have a special requirement for high dimensional space, large-scale data are more practicable in the real world. There are two issues for outlier detection in high dimensional space: the first one is to overcome the complexity of high dimensional space, and the other is to meet the requirements of real applications with the tremendous growth of high dimensional data. In low dimensional space, outliers can be considered as points far from the normal points based on the distance. However, in high dimensional space, the distance no longer gives an exact description of the difference between outliers and normal data. In this case, outlier detection falls into two categories: distance-based and subspace-based methods. The first one uses robust distance or density in high dimensional space, i.e. LOF[1], HilOut[8], LOCI[3], Grid[14], ABOD[4], etc. These methods are suitable for outlier detection in moderately high dimensional space. However, in very high dimensional space, they perform poorly because of the curse of dimensionality.

Figure 1. Sample data plotted in three-dimensional space and in two-dimensional spaces: (a) 3-Dimension, (b) X-Y Dimension, (c) X-Z Dimension, (d) Y-Z Dimension. Four red outliers separated in (a) are observed; in (b), (c) and (d), only two red outliers are observed, and the other two outliers are hidden in the normal clusters.
The other one, subspace-based detection, is regarded as an optimal method for finding outliers in high dimensional space. It is based on the assumption that the outliers found in all low projected dimensional subspaces are taken as the real outliers in high dimensional space. This solution includes Aggarwal's Fraction[2], GLS-SOD[16], CURIO[15], SPOT[22], Grid-Clustering[23], etc. Since outliers are easily found in low projected dimensions using optimized search algorithms to find suitable cell-grids that are divisions of the subspace, it is widely used for outlier detection in high dimensional space. Recent advances in geo-spatial data, bioinformatics, genetics and particle physics also require more robust subspace detection methods for growing high dimensional data. However, one key issue is still uncertain:

- Is it true that the outliers detected in subspaces are all of the outliers in high dimensional space?

In fact, subspace-based detection methods can find some outliers that differ from the normal points in the projected dimensional spaces, but they ignore the outliers hidden inside the region of the normal data. These inner outliers are still different from the normal data in high dimensional space. We show a simple example of the difference between these two types of outliers in three-dimensional space and in projected two-dimensional subspaces, as shown in Fig. 1. A total of 24 points are distributed in a three-dimensional space, including 20 normal points in six clusters and 4 outliers in red. In (a), the four outliers can be told apart because they do not belong to any normal cluster. The outliers O3 and O4 are detected as different in any of the projected dimensional spaces, while the inner outliers O1 and O2 are hidden inside the clusters in the projected dimensional spaces; therefore, detecting O1 and O2 fails. All subspace-based methods fail to detect these inner outliers, as shown in (b), (c) and (d). Given the above, how to find all outliers with a subspace-based method is still an open issue.

In this paper, we try to solve this issue by utilizing two dimensional-projections and propose a robust subspace detection method called k-NS (k-Nearest Sections). It calculates the ldr (local density ratio) in the first projected dimensional subspaces and the nearest neighbors' ldr in the second projected dimensional subspaces. Then each point's weight is summed statistically, and the outliers are those scoring the largest weights. The main features and contributions of this paper are summarized as follows:

- We apply two dimensional-projections to calculate the weight values in all projected dimensions. For each point, we supply the (m + m(m-1)) weight values in order to compare it with the others extensively.
- Our proposed method employs k-NS (k-Nearest Sections), based on the k-NN (k Nearest Neighbor) concept, for the local density calculation in the second projected dimensional space. The inner outliers are detected successfully by evaluating the neighbors' ldr after projecting them into other dimensions.
- We execute a series of experiments with dimensions ranging from 10 to 10000 to evaluate our proposed algorithm. The experimental results show that our proposed algorithm has advantages over other algorithms in stability and precision on high dimensional datasets.
- We also consider the difference between outliers and noisy data. The outliers are obviously different from noisy data in high dimensional space, while they are mixed together in low dimensional space.

This paper is organized as follows. In Section 2, we give a brief overview of related work on high dimensional outlier detection. In Section 3, we introduce our concept and our approach, and we describe our algorithm. In Section 4, we evaluate the proposed method by experiments on datasets of different dimensions, both artificially generated and real. At last, we conclude our findings in Section 5.

II. RELATED WORKS

As an important part of data mining, outlier detection has been developed for more than ten years, and many results have been achieved for large-scale databases. We categorize them into the following five groups.

Distance and Density Based Outlier Detection: distance-based outlier detection is a conventional method because it comes from the original outlier definition, i.e. outliers are those points that are far from other points based on distance measures, e.g. HilOut[8]. This algorithm detects a point with its k nearest neighbors by distance and uses a space-filling curve to map the high dimensional space.
The most well known LOF[1] uses a k-NN and density based algorithm, which detects outliers locally by their k-nearest distance neighbor points and measures them by lrd (local reachability density) and LOF (Local Outlier Factor). This algorithm runs smoothly in low dimensional space and is still effective in relatively high dimensional space. LOCI[3] is an improved algorithm based on LOF, which is more sensitive to local distance than LOF. However, LOCI does not perform as well as LOF in high dimensional space.

Subspace Clustering Based Outlier Detection: since it is difficult to find outliers in high dimensional space, these methods try to find points behaving abnormally in low dimensional space. Subspace clustering is a feasible method for outlier detection in high dimensional space. This approach assumes that outliers always deviate from the others in low dimensional space if they are different in high dimensional space. Aggarwal[2] uses equi-depth ranges in each dimension, with the expected fraction and deviation of points in a k-dimensional cube D given by N·f^k and √(N·f^k·(1−f^k)). This method detects outliers by calculating the sparsity coefficient S(D) of the cube D.

Outlier Detection with Dimension Reduction: another method is dimension reduction from high dimensional space to low dimensional space, such as SOM (Self-Organizing Map)[18,19], mapping several dimensions to two dimensions and then detecting the outliers in the two-dimensional space. FindOut[11] detects outliers by removing the clusters and reduces dimensions with a wavelet transform on multi-dimensional data. However, this approach may cause information loss when the dimension is reduced; the result is not as robust as expected, and it is seldom applied to outlier detection.

Information-theory Based Outlier Detection: in a subspace, the distribution of points in each dimension can be coded for data compression. Hence, the high dimensional issue is changed into an information statistics issue in each dimension. Christian Böhm proposed the CoCo[9] method with MDL (Minimum Description Length) for outlier detection, and he also applies this idea to the clustering issue, e.g. Robust Information-theoretic Clustering[5,12].

Other Outlier Detection Methods: besides the above four groups, some detection measurements are also distinctive and useful. One notable approach is ABOD (Angle-Based Outlier Detection)[4]. It is based on the concept of angle computed with the vector product and scalar product; the outliers usually have smaller angles than normal points.

The above methods have reduced the high dimensional curse to some extent, and they get correct results in some special cases. However, the problem still exists and affects the detection accuracy. Christian Böhm's information-theory based method is similar to the subspace clustering methods and suffers the same problems as subspace-based outlier detection methods. In summary, seeking a general approach, or improving the existing subspace-based methods, to detect outliers in high dimensional space is still a key issue that needs to be solved.

III. PROPOSED METHOD

It is known from the last section that not all outliers can be found in the projected dimensional subspaces. The outliers failing to be detected in the subspaces are called inner outliers. The inner outliers are mixed into normal clusters in the projected dimensional subspaces, but they are detected as anomalies in high dimensional space. From another point of view, the inner outliers belong to several normal clusters in different subspaces, but they do not belong to any cluster as a whole. In this paper, the key mission is to find such inner outliers in high dimensional space.

A. General Idea

Learning from the subspace detection methods, we know that the high dimensional issue can be transformed into a statistical issue by loop detection in all projected dimensional subspaces. Moreover, the points' distribution is independent in different dimensions. By observing these points and learning from existing outlier definitions, we have found that the outliers are placed in a cluster of normal points in a certain dimension and deviate in other dimensions. In other words, outliers are clustered with different normal points in different dimensions, while normal points are always clustered together. Therefore, our proposed method needs to solve two sub-issues: how to find outliers effectively in all projected-dimensional subspaces, and how to detect the deviation of points of the same region in one dimension when these points are projected to other dimensions.

Our proposal can be divided into four steps. First, we divide the entire range of data into many small regions in each dimension; here, we call the small region a section. Based on the section division, we construct a new data structure called the section space. Second, we calculate the sparsity of a point in each section in each dimension by computing the ldr against the average value in that dimension. Third, we calculate the scattering of the points of the same section by ldr after projecting them from the original dimension to other dimensions. Last, we sum up all the results as a weight for each point, and then compare all the points with this score. The outliers are the points scoring the largest values of weight.

Figure 2. Section Space Division and Dimension Projection.

B. Section Data Structure

Our proposed method is based on the section data structure. The mechanism of how to compose this section structure and transform the Euclidean data space into our proposed section space is introduced below. We divide the space into the same number of equi-width sections in each dimension, so the space looks like a cell-grid. The conventional data space information is composed of points and dimensions, while our proposed data structure represents the data distribution by point, dimension and section. This structure has two advantages. First, the section in which a point lies is easily found in all dimensions; therefore, we can use all related section results to denote the point's weight value. Second, it is easy to calculate the distribution change by checking the points' section positions while projecting them to different dimensions.
The data structures of PointInfo (point information) and SectionInfo (section information) used in our proposal are as follows:

PointInfo[Dimension ID, Point ID]: section ID of the point
SectionInfo[Dimension ID, Section ID]: number of points in the section

PointInfo records each point's section position in the different dimensions. SectionInfo records the number of points of each section in the different dimensions. The main calculations of the sparsity of points and of the dimension projections are processed on these two data structures.

The transformation from the original data space to the proposed section-based space is explained using an example of a two-dimensional dataset, as shown in Fig. 2. The dataset includes 23 points in two-dimensional space, as shown in Fig. 2(a). The original data distributed in the data space based on Euclidean distance are shown in Fig. 2(b). In our proposed section-based structure, we construct the PointInfo structure as in Fig. 2(c) and the SectionInfo structure as in Fig. 2(d). The range of each dimension is divided into five sections in this example; the section division is shown in Fig. 2(b) with blue lines.

The data range of each dimension may be different. If we set the same data range for every dimension, covering the maximum of a certain dimension, it would produce too many empty sections in some dimensions. The empty sections, producing meaningless values of 0, would affect the result markedly in the following calculations. Therefore, we set the minimum data range in each dimension, covering only the area where points exist. In order to avoid the two end-sections having a larger density than that of the other sections, we extend the border by enlarging the original range by 0.1%. Taking the data in Fig. 2 as an example of how to generate the data range in each dimension: the original data range in the x dimension is (15, 23), and its length is 8; the extended range is obtained by enlarging this length by 0.1%. The original data range in the y dimension is (16, 25), with length 9, and it is extended in the same way. The length of a section is then the extended length divided by the number of sections (five in this example) in each dimension.
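The construction above can be sketched in a few lines of R. The code below is a minimal illustration of the section-space idea, assuming an n x m numeric matrix data and treating the 0.1% extension as a symmetric padding of each dimension's range; the function and variable names (build_section_space, PointInfo, SectionInfo) follow the description above but are otherwise illustrative, not the paper's implementation.

build_section_space <- function(data, scn) {
  n <- nrow(data); m <- ncol(data)
  PointInfo   <- matrix(0L, nrow = n, ncol = m)    # [point, dimension] -> section ID
  SectionInfo <- matrix(0L, nrow = scn, ncol = m)  # [section, dimension] -> number of points
  for (i in seq_len(m)) {
    lo <- min(data[, i]); hi <- max(data[, i])
    pad <- (hi - lo) * 0.001 / 2                   # enlarge the range by 0.1% in total
    breaks <- seq(lo - pad, hi + pad, length.out = scn + 1)
    sec <- findInterval(data[, i], breaks, rightmost.closed = TRUE)
    sec <- pmin(pmax(sec, 1L), scn)                # guard against rounding at the borders
    PointInfo[, i] <- sec
    SectionInfo[, i] <- tabulate(sec, nbins = scn)
  }
  list(PointInfo = PointInfo, SectionInfo = SectionInfo)
}

For a small dataset like the one in Fig. 2, build_section_space(data, 5) yields the two structures in the shape of Fig. 2(c) and Fig. 2(d).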

C. Definitions

Some definitions of the notations used in our proposal are given in Table 1.

Table 1. Definition of Notations
P (point): the information of a point. p_j refers to the j-th point of all points; p_{i,j} refers to the j-th point in the i-th dimension.
Section: the range of data in each dimension is divided into the same number of equi-width parts, which are called sections.
scn (number of sections): the number of sections for each dimension. It is decided by the number of total points and the average section density. scn is defined equally in each dimension.
d (section density): the number of points in one section, d for short.
dsts (section distance): the section distance, used for evaluating the section difference among points in all projected dimensions, as defined in (1).
ldr: local density ratio; after introducing the Section, it is replaced by sdr.
sdr: section density ratio; its calculation is defined in (4) and (5).
SI (statistic information): the statistic information of each point, composed of all weights, as defined in (6).

The section density d with different subscripts has a specific meaning in each of the following cases. Case 1: within a section, all points of this section in a dimension have the same section density, and d_{i,j} means the section density value for the points in the j-th section of the i-th dimension. Case 2: the section density is compared with the average density in the dimension, so a low section density means a low ratio against the average section density in a dimension; d̄_i means the average section density in the i-th dimension. Case 3: if the section density of a point is needed, the expression includes the point; d_i(p) means the section density value of point p in the i-th dimension.

In the section-based subspace, the section denotes the point's local area. The local density is replaced by d, and the ldr is then replaced by sdr (section density ratio). The process of the two dimensional-projections in our proposal is as follows. Projecting the points to each one-dimensional subspace is the first projection; all points are checked in all the projected dimensional subspaces. After that, the points in the projected dimensions still need to be checked between different subspaces in order to detect inner outliers. Therefore, the points are projected again, from the first projected dimension to the other dimensions, and their distribution changes are compared with each other; this is called the second dimension projection. The whole procedure projects the points twice: from high dimension to one dimension, and from one dimension to the other dimensions.

D. k-Nearest Sections

In this section, we describe the detection method in two steps. In the first step, the sdr is employed to evaluate the sparsity of points in the first projected dimensions. In the second step, which is the key part of this proposal, the scattering of points after their second projections to other projected dimensions is calculated based on k-NS (k-Nearest Sections). At last, we summarize the results of the two steps statistically. Before introducing the concept of k-NS, dsts needs to be clarified in advance.
DEFINITION 1 (dsts of points) Let points p, q ∈ Section_i, where p and q are in the i-th dimension. When p and q are projected from dimension i to dimension j, the section distance between them corresponds to the difference of their section IDs:

dsts(p, q) = |SecId(p_j) − SecId(q_j)| + 1    (1)

Definition 1 is used to measure the points' scatter in the second projections. In the dimension before the second projection, the points p and q are assumed to be in the same section. After applying the second projection from dimension i to dimension j, the points p and q may be located in different sections with different section IDs, so we can compare the distance of the two points by the subtraction between SecId(p_j) and SecId(q_j) as in (1). The dsts(p, q) is defined as the absolute difference between the two points' section IDs, plus 1 in order to avoid values of 0 in the later computations. In the k-NS algorithm, dsts supplies an effective factor for evaluating the scatter of the points in the second projected dimensions.

The definition of an outlier in k-NS is a statistical weight value, which is decided by its related calculated results in all projected dimensions.

DEFINITION 2 (Outlier in k-NS) The x_ns of a given point x ∈ Section_i in the database D ⊆ R^m is defined as follows:

x_ns = { x ∈ D | x' ∈ D, x, x' ∈ Section_i, p ∈ Section_i :
         Σ_{i=1}^{m} d_i(x) ≪ Σ_{i=1}^{m} d̄_i, or
         Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} dsts(x_j, p) ≫ Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} dsts(x'_j, p) }    (2)

Here x, x' and p are points in the same section in dimension i, and the point p is any of the neighbor points by the measure of dsts after applying the second projection. x_ns is a statistical result summarizing all the values of x in the two dimensional-projections, which means the x_ns of x can be used as a final result to detect outliers. By the k-NS definition, outliers satisfying either of the following two conditions are detected: first, outliers that can be detected in the first projection; second, outliers that can still be detected by dsts-based k-NS in the second projection, even if the point does not appear abnormal in the first projection.
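As a small illustration, the section distance of Definition 1 reduces to one line once the PointInfo structure sketched earlier is available. The helper below assumes that PointInfo layout (rows are points, columns are dimensions); the name dsts and the argument order are assumptions of this sketch.

# Section distance between points p and q after projection to dimension j.
dsts <- function(PointInfo, p, q, j) {
  abs(PointInfo[p, j] - PointInfo[q, j]) + 1L
}

# Example: dsts(PointInfo, 3, 7, 2) gives the section distance between
# points 3 and 7 when they are projected onto dimension 2.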

Although the x_ns in (2) can reflect the outlier result, it is difficult to calculate for each point. Therefore, the general statistical information for each point is defined in (3).

DEFINITION 3 (General Statistical Information of a Point) Let sdr_{Proj_i}(p_{i,k}) be the value calculated for p_k in the first projected dimension i, and sdr_{Proj_{i→j}}(p_{j,k}) be the value calculated for p_k after the second projection from dimension i to dimension j. Let ω1 and ω2 be the weight parameters for these two values. Then the statistical information value of p_k is expressed as follows:

SI_ns(p_k) = ω1 · Σ_{i=1}^{m} sdr_{Proj_i}(p_{i,k}) + ω2 · Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} sdr_{Proj_{i→j}}(p_{j,k})    (3)

sdr is used to calculate the density ratio of a point in the two dimensional-projections; the detailed calculation is introduced in (4) and (5). SI (Statistic Information) is the point's final score, by which all the points are evaluated. An outlier's SI value is obviously different from a normal point's. For different datasets, adjusting the weight values may bring better results.

1) Section Density Ratio Calculated in the First Projected Dimension

Outliers always appear more sparsely than most normal points if they can be detected in the projected dimensions. Therefore, the section density of outliers is lower than the average section density in that dimension. In our proposal, the sdr (Section Density Ratio) is used for this calculation. The sdr of a point not only reflects its sparsity compared with the others in that dimension, but also keeps this value independent between different dimensions.

DEFINITION 4 (Section Density Ratio) Let point p_{i,j} ∈ Section_{i,γ} in dimension i, where j is the point ID and γ is the section ID. d_{i,γ} is the section density of point p_{i,j} in dimension i, and d̄_i is the average section density in dimension i. The sdr of p_{i,j} is denoted by the sdr of Section_{i,γ}, which is defined as follows:

sdr_{Proj_i}(p_{i,j}) = sdr_{Proj_i}(Section_{i,γ}) = d_{i,γ} / d̄_i    (4)

One point to be noticed is that one sdr(Section) does not correspond to only one point: it is shared by all the points in the same section. Hence the section's sdr_{Proj_i}(Section_{i,γ}) is assigned to each point's sdr_{Proj_i}(p_{i,k}). In total, m sdr_{Proj_i} values are obtained from all dimensions for each point.

Lemma 1. Given a data set DB and a point p of DB in a section of dimension i, with Card(Section_i) = scn, d_i(p) = Count(Section_i(p)) and d̄_i = (1/scn) Σ_{k=1}^{scn} Count(Section_{i,k}): if p is an outlier, then d_i(p) / d̄_i < 1. Here Card(Section_i) is the number of sections in dimension i, Section_i(p) refers to the section the point p is in, and Count(Section_i(p)) is the number of points in Section_i(p).

Proof. First, we assume the outlier p is not inside a normal cluster in the projected dimension (otherwise Definition 4 does not apply and the case is handled by the second projection). Then, for most points q, Count(Section_i(p)) ≤ Count(Section_i(q)), so

d_i(p) = Count(Section_i(p)) ≤ (1/n) Σ_{j=1}^{n} Count(Section_i(q_j)).

Since the section density of the outlier p is less than that of most points according to the outlier's definition,

d_i(p) ≤ (1/n) Σ_{j=1}^{n} Count(Section_i(q_j)) = (1/scn) Σ_{k=1}^{scn} Count(Section_{i,k}) = d̄_i,

and therefore d_i(p) / d̄_i < 1.

2) k-Nearest Sections Calculated in the Second Projected Dimension

If outliers do not appear clearly in the low dimensions, they cannot be detected by the first step, since they are hidden among the normal points and have similar distance or density to the others. Nevertheless, these points can still be detected in the second projected dimensions.
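A minimal sketch of this first-projection score, assuming the PointInfo/SectionInfo layout used above and the non-empty-section average d̄_i described later in the algorithm section; the function name sdr_first_projection is an assumption of this sketch.

sdr_first_projection <- function(PointInfo, SectionInfo) {
  n <- nrow(PointInfo); m <- ncol(PointInfo)
  sdr1 <- matrix(0, nrow = n, ncol = m)           # [point, dimension] -> sdr of Definition 4
  for (i in seq_len(m)) {
    counts <- SectionInfo[, i]
    d_bar <- n / sum(counts > 0)                  # average density of the non-empty sections
    sdr1[, i] <- counts[PointInfo[, i]] / d_bar   # every point inherits its section's ratio
  }
  sdr1
}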
This step aims to separate outliers from normal points by projecting these points into different dimensions. The section distance measurement describes the sparsity of the points when checking them in the second projected dimensions. Based on the section distance concept, and referring to the k-Nearest Neighbor concept[10], we can get the sdr of the nearest sections of a point in the projected dimensions.

DEFINITION 5 (Nearest Sections in a Projected Dimension) In the second dimension projection, the dimension is projected from i to j. Let p_j, p_f, q ∈ Section_{i,γ}, with Count(Section_{i,γ}) = s. The nearest section neighbors N_kn(p) of the point p are defined as

N_kn(p) = { q ∈ Section_{i,γ} | dsts(p, q) ≤ dsts_k(p) },

where dsts_k(p) is the section distance from p to its k-th nearest neighbor point, so q is one of the k nearest neighbor points, and |N_kn| is the number of p's neighbors. Then sdr_{Proj_{i→j}} of point p_k is defined as follows:

sdr_{Proj_{i→j}}(p_{j,k}) = sdr_{Proj_{i→j}}(Section_{j,γ}) =
    [ (1/s) Σ_{f=1}^{s} (1/|N_kn|) Σ_{q ∈ N_kn(p_f)} dsts(p_{j,f}, q) ] / [ (1/|N_kn|) Σ_{q ∈ N_kn(p_k)} dsts(p_{j,k}, q) ]    (5)

We calculate p_k's dsts with its k nearest neighbor points, and then take the ratio of the average value over the points of the same section against it; a scattered point thus receives a small value. While a point is projected to another dimension, a single sdr_{Proj_{i→j}} value is calculated for each projection. In total, m·(m−1) sdr_{Proj_{i→j}} values are obtained from all the projected dimensions for each point.
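The second-projection score can be sketched as follows for the points of a single section. The helper name, the use of all section-mates as neighbor candidates, and the orientation of the ratio (section average over the point's own value, so that scattered points score low) follow the reading of Definition 5 above and are assumptions of this sketch.

sdr_second_projection <- function(PointInfo, i, j, sec, k) {
  ids  <- which(PointInfo[, i] == sec)          # points sharing section `sec` in dimension i
  proj <- PointInfo[ids, j]                     # their section IDs after projection onto dimension j
  mean_knn <- sapply(seq_along(proj), function(a) {
    d <- abs(proj[-a] - proj[a]) + 1            # dsts to the other points of the section
    mean(sort(d)[seq_len(min(k, length(d)))])   # mean dsts to the k nearest of them
  })
  mean(mean_knn) / mean_knn                     # section average against each point's own value
}

The returned vector is aligned with the points of the section; it assumes the section holds more than one point, which the algorithm guarantees by skipping near-empty sections (see the cases discussed after Table 2).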

Lemma 2. Given a dataset DB and points o, p, q ∈ Section_i in a dimension i, let C be a cluster of normal points with p, q ∈ C. After the second projection, the points p, q, o are projected to dimension j; p is o's k-th nearest neighbor, and q is p's k-th nearest neighbor. If o is an outlier, then dsts(o, p) ≥ dsts(p, q).

Proof. Normal points belong to a cluster in all dimensions; therefore p, q ∈ C in dimension j. o is an outlier, so o ∉ C. If q is among o's k neighbors: if p and q are on the same side of o, then dsts(o, p) ≥ dsts(p, q); if p and q are on both sides of o, then o lies between points of C, which implies o ∈ C, a contradiction. If q is not among o's k neighbors, then q is on the other side of p; if dsts(p, q) > dsts(o, p), then o would again lie inside the cluster, so o ∈ C, a contradiction. Therefore dsts(o, p) ≥ dsts(p, q).

3) Statistical Information Values for Each Point

Through the above two-step calculation, each point gets m sdr_{Proj_i} values in the first projection and m·(m−1) sdr_{Proj_{i→j}} values in the second projection. Suitable weights for SI in (3) are considered in order to give a sharp boundary for comparing points. By evaluating different weighting values and their performance, we choose simple and clear values: we take the reciprocal of the number of sdr_{Proj_i} and sdr_{Proj_{i→j}} values, so we set the weights ω1 = 1/m and ω2 = 1/(m·(m−1)). The outliers then have an obviously larger SI than the normal points.

DEFINITION 6 (Statistic Information of a Point)

SI_ns(p_k) = 2m / [ Σ_{i=1}^{m} sdr_{Proj_i}(p_{i,k}) + (1/(m−1)) Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} sdr_{Proj_{i→j}}(p_{j,k}) ]    (6)

Equation (6) sums up the sdr values in all projected dimensions. In low dimensions, the SI value of a normal point should be close to 1, and an outlier's SI value should be obviously larger than 1. However, this is no longer exactly true in high dimensional space: the normal points' SI gets closer to the outliers' SI. Nevertheless, the outliers' SI is still obviously higher than the normal points', so outliers can be detected simply by finding the points with the top largest SI values.
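A hedged sketch of this final combination, assuming sdr1 is an n x m matrix of first-projection values and sdr2 an n x m x m array of second-projection values (sdr2[p, i, j] for the projection of dimension i onto dimension j, with unused diagonal entries left as NA); the 2m / (...) form follows Definition 6 as reconstructed above, and the function name si_score is illustrative.

si_score <- function(sdr1, sdr2) {
  m    <- ncol(sdr1)
  sum1 <- rowSums(sdr1)                       # m first-projection values per point
  sum2 <- apply(sdr2, 1, sum, na.rm = TRUE)   # up to m*(m-1) second-projection values per point
  2 * m / (sum1 + sum2 / (m - 1))             # close to 1 for normal points, larger for outliers
}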
E. Algorithm

We now focus on how to implement the k-NS method in the R language. How to obtain PointInfo and SectionInfo efficiently for the different sections and dimensions is a key issue that needs to be considered in detail. The proposed algorithm is shown in Table 2 in pseudo-R code. Here, the dataset has n points in an m-dimensional space, and the range of the data is divided into scn sections in each dimension.

Table 2. k-NS Algorithm

Algorithm: k-Nearest Sections
Input: k, data[n, m], scn
Begin
  Initialize(PointInfo[n, m], SectionInfo[scn, m])
  For i = 1 to m
    d_bar_i = n / length(SectionInfo[SectionInfo[, i] != 0, i])
    For j = 1 to n
      Get sdr_Proj_i(Section_{i,γ}) with the section density ratio in (4)
      sdr_Proj_i(Section_{i,γ}) denotes sdr_Proj_i(p_{i,j})   (PointInfo[j, i] = γ)
    End For j
  End For i
  For c = 1 to 10
    resort the dimensions in random order
    For i = 1 to m
      For j = 1 to scn
        PtNum <- SectionInfo[j, i]
        If (PtNum == 0) next                       # case 1: empty section
        PtId <- which(PointInfo[, i] == j)
        If (PtNum < 3k/2) next                     # case 3: too few points, already handled
        else
          For each (p in PtId) {
            if (i < m) j' = i + 1 else j' = 1      # target dimension of the second projection
            Get dsts(p, PtId, j') with Definition 1
            Get sdr_Proj_{i→j'} with Definition 5
          }
      End For j
    End For i
  End For c
  Get the SI value with Definition 6 for each point
End
Output: outliers by Point ID (SI(p) much larger than the average SI, or the top SI scores)

Three points need to be clarified in this algorithm.

The first point is how to decide the average section density d̄ in each dimension. The d̄ value is obtained from the definition of the average section density, n/scn, which would make d̄ the same in each dimension. However, we consider the special case in which most points lie in a few sections and no point lies in the other sections; in this case, d̄ becomes very low, even close to an outlier's section density. Therefore, we only count the sections that contain points, and subsequently d̄ varies between dimensions. Hence, the ratio of the section density against d̄ in Definition 4 can measure the sparsity of points in the different sections of a dimension.
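For illustration, the helpers sketched in the previous subsections can be wired together as follows. Unlike the pseudo-code of Table 2, this sketch projects every dimension onto every other dimension instead of using ten random re-orderings, so it trades speed for simplicity; the function name k_ns and the top parameter are assumptions.

k_ns <- function(data, scn, k, top = 10) {
  n <- nrow(data); m <- ncol(data)
  sp   <- build_section_space(data, scn)
  sdr1 <- sdr_first_projection(sp$PointInfo, sp$SectionInfo)
  sdr2 <- array(NA_real_, dim = c(n, m, m))
  for (i in seq_len(m)) {
    for (sec in which(sp$SectionInfo[, i] >= 3 * k / 2)) {   # skip empty or sparse sections
      ids <- which(sp$PointInfo[, i] == sec)
      for (j in setdiff(seq_len(m), i)) {
        sdr2[ids, i, j] <- sdr_second_projection(sp$PointInfo, i, j, sec, k)
      }
    }
  }
  si <- si_score(sdr1, sdr2)
  order(si, decreasing = TRUE)[seq_len(top)]     # point IDs with the largest SI values
}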

The second point is the number of points in one section; there are three different cases. Case 1: no point in the section. In this case, the algorithm just passes this section and goes to the next section. Case 2: many points in the section. In this case, the nearest sections method is used directly to evaluate the points. Case 3: only a few points in the section. In this case, the point distribution is difficult to judge from just these several points; in addition, the section density ratio in the first step must be very low, so these points have already been detected by the previous step, and we pass this section too. The threshold value separating case 2 and case 3 is related to k. k should not be large, because k is less than d̄ in step 2. Through experiments with values from 4 to 20 to find a suitable value of k and of the threshold on the number of points in one section, we have found that a threshold value of 3k/2 is the best solution, usable in most situations.

F. Complexity Analysis

The three-step procedure is considered separately to state the complexity of the k-NS algorithm. In the first step, the section density is calculated in each projected dimension; the time complexity is O(m·n). In the second step, the k-nearest sections density is calculated between projected dimensions; the time complexity is O(scn·(m−1)·m). Noticing that all points of a section are used, the time complexity expression becomes O(n·(m−1)·m). In the last step, summing up all weight values for each point takes O(n). Hence, the total time complexity is

T(n) = O_1 + O_2 + O_3 = O(m·n) + O(n·(m−1)·m) + O(n) ≈ O(n·m²).

k-NS spends most of its processing time in the loop of dimension projections and in finding each point's related section in each dimension. The space complexity of k-NS is

S(n) = O(2·scn·m + 3·m·n + m + n) ≈ O(3·m·n), i.e. O(m·n).

We need to record the necessary information and intermediate results for the points and the sections; the temporary space needed during the procedure is small.

G. Distinction between Outliers and Noisy Points

The concepts of outlier and noisy point have been discussed for more than ten years. According to them, an outlier is regarded as abnormal data, generated by a different mechanism and containing valuable information, while noisy data are regarded as a side product of clustering points, carrying no useful information but greatly affecting the correctness of the result. In the data space, outliers are the points that are farther from the others by some measure, while the noisy points always appear around the outliers. Since the noisy points are also far away from the normal points, in low dimensional space it is difficult to draw a distinct boundary between outliers and noisy points. Based on this frustrating observation, some researchers even consider a noisy point as a kind of outlier, with no difference in detecting the abnormal data by any method. Hence, it is a meaningful issue to distinguish outliers from noisy points not only in concept but also in the detection measures.

In this paper, we explain the distinction between outliers and noisy points in two aspects. The first is that they have different data generation processes: outliers are generated by a distribution different from that of the normal points, whereas noisy points have the same distribution as the normal points. The second is that their abnormal states differ across the dimensional space: outliers appear abnormal in most of the dimensions, while noisy points only appear abnormal in several dimensions and appear normal in the other dimensions. From the view of the whole dimensions, the noisy data still conform to the same distribution as the normal data.

Figure 3. (a) Noisy data of Dataset 5 projected to two-dimensional space; (b) Dataset 3 projected to two-dimensional space.
Outliers may appear in the same way in low dimensional space, but they conform to a distribution mechanism different from that of the normal points; therefore, the difference between outliers and noisy points shows in some projected dimensional spaces. An example of noisy data is shown in Fig. 3(a). The data are retrieved from Dataset 8 as introduced in Section 4, which contains 1000 points in 10000 dimensions. The outliers are placed in the middle region and can be distinguished from the normal points, while the noisy points, labeled with a cloud symbol, look very different in this projected two-dimensional space. Another example is shown in Fig. 3(b): the outliers are not at all obvious in the low projected dimensional space, while the noisy points, distributed on the marginal area of both dimensions, look like abnormal points.

IV. EVALUATION

We have implemented our algorithm and applied it to several high dimensional datasets, and then made a comparison between k-NS, LOF and LOCI. In order to compare these algorithms under fair conditions, we implemented them in the R language, on a MacBook Pro with a 2.53 GHz Intel Core 2 CPU and 4 GB of memory.

A. Synthetic Datasets

A critical issue in evaluating outlier detection algorithms is that there are no benchmark datasets available in the real world that provide an explicit division between outliers and normal points.

For the points that are found as outliers in some real dataset, it is impossible to provide a reasonable explanation of why these points are picked out as outliers. On the other hand, what we have learned from statistical knowledge is helpful for generating artificial datasets: if some points have distributions apparently different from those of the normal points, these points can be regarded as outliers. Hence, we generate the synthetic data based on this assumption. We generate eight synthetic datasets with 500 or 1000 points and with dimensions from 10 to 10000. The normal points conform to normal distributions, while the outliers conform to random distributions in a fixed region; the normal points are distributed in five clusters with random μ and σ, and 10 outliers are distributed randomly in the middle of the normal points' range. More details about the parameters of each dataset are shown in Table 3.

Table 3. Experiment Datasets.

The experiment datasets are generated by the rule that the outliers' range should be within the range of the normal points in every dimension; therefore, the outliers cannot be found in low dimensional space. A data distribution example is shown in Fig. 3(b), where Dataset 3 is projected to two-dimensional space with the outliers labeled in red. It is clearly shown that the outliers are within the range of the normal points and appear no different from the normal points in this two-dimensional space. Noisy points placed on the margin of the distributed area are more likely to be regarded as abnormal points. Hence, outliers and normal data cannot be separated just by straight observation of the different distributions.
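A hedged R sketch of this generation rule is given below; the cluster centres, spreads and the quartile-based "middle" region are illustrative assumptions, since the actual parameters vary per dataset in Table 3, and the function name make_dataset is not the paper's.

make_dataset <- function(n_normal = 990, n_outlier = 10, m = 100, n_cluster = 5) {
  cluster <- sample(n_cluster, n_normal, replace = TRUE)
  normal <- sapply(seq_len(m), function(i) {
    mu    <- runif(n_cluster, 0, 100)                # random cluster centres per dimension
    sigma <- runif(n_cluster, 1, 5)                  # random cluster spreads per dimension
    rnorm(n_normal, mean = mu[cluster], sd = sigma[cluster])
  })
  lo <- apply(normal, 2, quantile, 0.25)             # "middle of the normal points' range"
  hi <- apply(normal, 2, quantile, 0.75)
  outlier <- sapply(seq_len(m), function(i) runif(n_outlier, lo[i], hi[i]))
  list(data  = rbind(normal, outlier),
       label = c(rep(0, n_normal), rep(1, n_outlier)))   # 1 marks an outlier
}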
B. Effectiveness

First, we conduct a two-dimensional experiment using the dataset of Fig. 2. The results show that all three algorithms perform well, so our proposed algorithm can also run on low dimensional datasets. Next, our proposed algorithm is evaluated thoroughly by a series of experiments and compared with LOF; LOCI is excluded from the comparison because it performs poorly on every dataset. In order to measure the performance of these algorithms with precision and recall, the 10 outliers are retrieved one by one. In the evaluation of all eight dataset experiments, we obtained 10 precisions and 10 recalls in every dataset, and from them 10 F-measures. We pick the highest F-measure from each dataset to demonstrate the experimental performance of LOF and k-NS.

At the beginning, we need to set appropriate parameters for the eight experimental datasets. The parameters are the best ones for the prepared datasets, and they are changed according to the data size and the number of dimensions. The parameter Knn of LOF is set around 10 in all the experiments, since the dataset size is only 500 or 1000 points; this is a reasonable ratio of neighbor points against the whole dataset size. For our algorithm, the parameters d̄ and scn are the inverse of each other: the product of d̄ and scn is equal to n. We set scn a little larger than d̄, because these combinations of parameters have shown better experimental results.

The 10-dimensional experiment result is shown in Fig. 4(a). LOF performs best in this 10-dimensional experiment; especially, LOF can detect two outliers with very high precision. Nevertheless, the precision of LOF falls sharply as the recall increases from 20% to 40%, and in the end its precision in detecting all outliers correctly is worse than that of k-NS. As a whole, the performance of k-NS is below LOF. The reason the performance is poor for both algorithms is that the outliers are placed in the center of the normal data in our datasets, which prevents these outliers from being found in low dimensional space. Therefore, it is difficult to find the exact outliers in 10-dimensional space.

Figure 4. Effectiveness comparison between LOF and k-NS on the eight datasets from dimension 10 to 10000: (a) Precision-Recall on Dataset 1 with 10 dimensions; (b) Precision-Recall on Dataset 2 with 100 dimensions; (c) F-measure of Datasets 1-8.
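The precision-recall protocol described above (retrieving the 10 known outliers one by one from the ranked scores) can be sketched as follows; the function name pr_curve and the use of a label vector are assumptions of this sketch.

pr_curve <- function(score, label, n_out = sum(label == 1)) {
  ranked <- label[order(score, decreasing = TRUE)]    # true labels in descending score order
  t(sapply(seq_len(n_out), function(r) {
    cutoff    <- which(cumsum(ranked) == r)[1]        # depth at which the r-th outlier is found
    precision <- r / cutoff
    recall    <- r / n_out
    c(precision = precision, recall = recall,
      f_measure = 2 * precision * recall / (precision + recall))
  }))
}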

When the number of dimensions increases to 100, the precision and recall on the 2nd dataset clearly show the effectiveness of these algorithms. Different from the first dataset, k-NS achieves 100% precision at every recall, all the time. LOF, in contrast, obviously reduces its precision from 100% to 43.48% as the recall increases from 70% to 100%, as shown in Fig. 4(b). In fact, k-NS keeps the perfect result in 100 dimensions, while LOF performs much more poorly in terms of both precision and recall.

The experiments on datasets 1 to 8 are shown in Fig. 4(c). LOF needs to pick the largest F-measure for each dataset, while k-NS only needs to pick the largest F-measure for the first dataset; in addition, the F-measures of k-NS are always 1 on datasets 2 to 8. The experiments show that k-NS performs perfectly in finding inner outliers in high dimensional space, while LOF suffers greatly from the curse of high dimensionality. We also find that the precision of k-NS becomes better when the dataset size is increased, but this is not the case for LOF.

C. Efficiency

We also compare these algorithms in terms of running time. In R, the reported running time includes user time, system time and total time, so we only use the user time to compare them. As shown in Fig. 5, LOF is faster in all experiments. Both algorithms take more time when the number of dimensions or the data size increases. The reason is that there is no dimension-loop calculation for LOF, because it only processes the distance between a point and its neighbors, whereas our proposed algorithm calculates values in all the first projected dimensions and all the second projected dimensions.

Figure 5. Running time (sec) of LOF and k-NS on the eight datasets.

D. Performance on Real World Data

In this subsection, we compare these algorithms on a real-world dataset publicly available at the UCI machine learning repository[24]. We use the Arcene dataset provided by the ARCENE group. The task of the group is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables, and the dataset is one of five datasets from the NIPS 2003 feature selection challenge. The original dataset includes 900 instances in total, with 10000 attributes, split into a training dataset, a validation dataset and a test dataset; each sub-dataset is labeled with positive and negative except for the test dataset. For the 700 instances in the test dataset, we only know that 310 instances are positive and 390 instances are negative. The best_svm_result is available at [25]: 308 instances are labeled positive, and 392 instances are labeled negative. We use this SVM result for evaluating LOF and our proposal, where we create a dataset by adding 10 randomly selected negative instances to the 308 positive instances retrieved by SVM. The first evaluation uses this dataset with 318 instances in total. The second evaluation uses the retrieved 392 negative instances, and we apply the two algorithms to detect outliers among them.

The result of the first experiment is shown in Fig. 6. The top 20 points are chosen by both algorithms; SI is the score of a point, and Pt ID is the point ID. The points with Pt ID larger than 308 are true outliers. In the top ten points, three outliers and two outliers are detected by LOF and k-NS respectively; in total, five outliers are detected by the mixed result combining both algorithms. In the top twenty points, seven outliers and three outliers are detected by LOF and k-NS, and in total nine outliers are detected in the mixed result. In both results, LOF is better than k-NS. However, k-NS helps to increase the detection accuracy from 30% to 50% in 10 points, and from 70% to 90% in 20 points.

Figure 6. Top 20 points detected in the Arcene data (SI scores and point IDs for LOF, k-NS and the mixed result).
In other words, k-NS supplies a reasonable alternative solution for increasing the precision of the results. As a contrast, we also give the LOCI result, which outputs the point IDs (8, 20, 48, 95, 53, 89, 93, 242, 307, 3, 35, 37). Its recall is 30%, the same as k-NS; however, all the outliers detected by LOCI are also detected by LOF.

Table 4. Recall % of the top points detected in the Arcene data.

In the second experiment, there are two positive points mis-classified by SVM, and finding these two points is the task of this experiment. As seen in Table 4, which shows the results, the point IDs 29 and 82 are the most probable outliers according to the intersection of the LOF and k-NS results; it is noted that both points appear in the top three detected points of both results. If we consider the LOCI result, the intersection point IDs are 53 and 75, which is entirely different from k-NS. Nevertheless, in contrast with the former results, the first conclusion seems more reasonable.

V. CONCLUSION

In this paper, we introduce a new definition of the inner outlier, and then present a novel method, called k-NS, designed to detect such inner outliers with the top largest scores in a high dimensional dataset. The algorithm is based on a statistical method with three steps. (1) Calculate the section density ratio of each point in each dimension after the first projection. (2) Compute the nearest sections' density ratio of each point in all projected dimensions after the second projection. (3) Summarize all sdr values of each point, denote the result as a weight value (SI), and then compare each SI with those of the other points. Each point gets in total m + m·(m−1) values to be compared. Experimental results on synthetic datasets with dimensions from 10 to 10000 have shown that our proposed k-NS algorithm has the following advantages: it is immune to the curse of high dimensionality, it adapts to various outlier distributions, and it shows outstanding performance in detecting inner outliers in high dimensional data space.

The difference between outliers and noisy data is also discussed in this paper; this distinction is difficult in low dimensional space. In our experiments, the noisy data and outliers are found to be different by comparing the distributions in the projected dimensions and in the whole dimensions, and in our cases the noisy data even seem to be more abnormal than the outliers in some projected dimensional spaces.

As ongoing and future work, we will continue to improve the algorithm by finding the best relationship between the two steps' sdr. Besides datasets with high dimensions, datasets with large-scale data size, or with incremental updates instead of computation over the entire dataset, need to be addressed for outlier detection. Another issue is the expensive processing time in high dimensional space; any solution to reduce the processing time needs to be investigated, and one of the approaches may be the use of parallel processing.

REFERENCES

[1] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander. LOF: Identifying density-based local outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.
[2] Charu C. Aggarwal, Philip S. Yu. Outlier detection for high dimensional data. Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.
[3] Spiros Papadimitriou, Hiroyuki Kitagawa, Phillip B. Gibbons. LOCI: fast outlier detection using the local correlation integral. IEEE 19th International Conference on Data Engineering.
[4] Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek. Angle-based outlier detection in high dimensional data. The 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[5] Christian Böhm, Christos Faloutsos, et al. Robust information theoretic clustering. The 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[6] Zhana, Wataru Kameyama. A Proposal for Outlier Detection in High Dimensional Space. The 73rd National Convention of the Information Processing Society of Japan, 2011.
[7] D. Hawkins. Identification of Outliers. Chapman and Hall, London, 1980.
[8] Fabrizio Angiulli and Clara Pizzuti. Outlier mining in large high-dimensional data sets. IEEE Transactions on Knowledge and Data Engineering (TKDE), 17(2):203-215, February 2005.
[9] Christian Böhm, Katrin Haegler. CoCo: coding cost for parameter-free outlier detection. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[10] Alexander Hinneburg, Charu C. Aggarwal, Daniel A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the 26th VLDB Conference, 2000.
[11] Dantong Yu, et al. FindOut: finding outliers in very large datasets. Knowledge and Information Systems (2002) 4.
[12] Christian Böhm, Christos Faloutsos, et al. Outlier-robust clustering using independent components. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data.
[13] De Vries, T., Chawla, S., Houle, M.E. Finding Local Anomalies in Very High Dimensional Space. 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 128-137, 13-17 Dec. 2010.
[14] Anny Lai-mei Chiu and Ada Wai-chee Fu. Enhancements on Local Outlier Detection. Proceedings of the Seventh International Database Engineering and Applications Symposium (IDEAS'03).
[15] Aaron Ceglar, John F. Roddick and David M. W. Powers. CURIO: A fast outlier and outlier cluster detection algorithm for large datasets. AIDM '07: Proceedings of the 2nd International Workshop on Integrating Artificial Intelligence and Data Mining, Australia.
[16] Feng Chen, Chang-Tien Lu, Arnold P. Boedihardjo. GLS-SOD: a generalized local statistical approach for spatial outlier detection. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[17] Michal Valko, Branislav Kveton, et al. 2011. Conditional Anomaly Detection with Soft Harmonic Functions. In Proceedings of the IEEE 11th International Conference on Data Mining (ICDM '11).
[18] Ashok K. Nag, Amit Mitra, et al. Multiple outlier detection in multivariate data using self-organizing maps. Computational Statistics.
[19] Teuvo Kohonen. The self-organizing map. Proceedings of the IEEE, Vol. 78, No. 9, September 1990.
[20] Naoki Abe, Bianca Zadrozny, John Langford. Outlier Detection by Active Learning. Proceedings of the 12th ACM SIGKDD International Conference.
[21] Ji Zhang, et al. Detecting projected outliers in high dimensional data streams. In Proceedings of the 20th International Conference on Database and Expert Systems Applications (DEXA '09).
[22] Alexander Hinneburg, Daniel A. Keim. Optimal grid-clustering: towards breaking the curse of dimensionality in high dimensional clustering. The 25th VLDB Conference, 1999.
[23] Amol Ghoting, et al. Fast Mining of Distance-Based Outliers in High-Dimensional Datasets. Data Mining and Knowledge Discovery, Vol. 16, 2008.
[24] UCI Machine Learning Repository (visited on May 16th, 2012).
[25] Best SVM result for the Arcene dataset (visited on May 16th, 2012).


An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Design of Structure Optimization with APDL

Design of Structure Optimization with APDL Desgn of Structure Optmzaton wth APDL Yanyun School of Cvl Engneerng and Archtecture, East Chna Jaotong Unversty Nanchang 330013 Chna Abstract In ths paper, the desgn process of structure optmzaton wth

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Random Varables and Probablty Dstrbutons Some Prelmnary Informaton Scales on Measurement IE231 - Lecture Notes 5 Mar 14, 2017 Nomnal scale: These are categorcal values that has no relatonshp of order or

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Modular PCA Face Recognition Based on Weighted Average

Modular PCA Face Recognition Based on Weighted Average odern Appled Scence odular PCA Face Recognton Based on Weghted Average Chengmao Han (Correspondng author) Department of athematcs, Lny Normal Unversty Lny 76005, Chna E-mal: hanchengmao@163.com Abstract

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS J.H.Guan, F.B.Zhu, F.L.Ban a School of Computer, Spatal Informaton & Dgtal Engneerng Center, Wuhan Unversty, Wuhan, 430079,

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches Proceedngs of the Internatonal Conference on Cognton and Recognton Fuzzy Flterng Algorthms for Image Processng: Performance Evaluaton of Varous Approaches Rajoo Pandey and Umesh Ghanekar Department of

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

A Comparative Study for Outlier Detection Techniques in Data Mining

A Comparative Study for Outlier Detection Techniques in Data Mining A Comparatve Study for Outler Detecton Technques n Data Mnng Zurana Abu Bakar, Rosmayat Mohemad, Akbar Ahmad Department of Computer Scence Faculty of Scence and Technology Unversty College of Scence and

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Local Quaternary Patterns and Feature Local Quaternary Patterns

Local Quaternary Patterns and Feature Local Quaternary Patterns Local Quaternary Patterns and Feature Local Quaternary Patterns Jayu Gu and Chengjun Lu The Department of Computer Scence, New Jersey Insttute of Technology, Newark, NJ 0102, USA Abstract - Ths paper presents

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

An Anti-Noise Text Categorization Method based on Support Vector Machines *

An Anti-Noise Text Categorization Method based on Support Vector Machines * An Ant-Nose Text ategorzaton Method based on Support Vector Machnes * hen Ln, Huang Je and Gong Zheng-Hu School of omputer Scence, Natonal Unversty of Defense Technology, hangsha, 410073, hna chenln@nudt.edu.cn,

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

An efficient iterative source routing algorithm

An efficient iterative source routing algorithm An effcent teratve source routng algorthm Gang Cheng Ye Tan Nrwan Ansar Advanced Networng Lab Department of Electrcal Computer Engneerng New Jersey Insttute of Technology Newar NJ 7 {gc yt Ansar}@ntedu

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information