Analyzing Popular Clustering Algorithms from Different Viewpoints

Journal of Software, Vol.13, No.8, 2002

QIAN Wei-ning, ZHOU Ao-ying
(Department of Computer Science, Fudan University, Shanghai, China)
(Laboratory for Intelligent Information Processing, Fudan University, Shanghai, China)
Received September 3, 2001; accepted February 25, 2002

Abstract: Clustering is widely studied in the data mining community. It partitions a data set into clusters so that intra-cluster data are similar and inter-cluster data are dissimilar. Different clustering methods use different similarity definitions and techniques. Several popular clustering algorithms are analyzed from three different viewpoints: (1) clustering criteria, (2) cluster representation, and (3) algorithm framework. Furthermore, some newly built algorithms, which mix or generalize other algorithms, are introduced. Since the analysis is made from several viewpoints, it can cover and distinguish most of the existing algorithms. It is the basis of research on self-tuning algorithms and clustering benchmarks.

Key words: data mining; clustering; algorithm

Clustering is an important data-mining technique used to find data segmentation and pattern information. It is widely used in applications such as financial data classification, spatial data processing, satellite photo analysis, and automatic detection in medical images. The problem of clustering is to partition the data set into segments (called clusters) so that intra-cluster data are similar and inter-cluster data are dissimilar. It can be formalized as follows:

Definition 1. Given a data set $V = \{v_1, v_2, \ldots, v_n\}$, in which the $v_i$ ($i = 1, 2, \ldots, n$) are called data points, the process of partitioning $V$ into $\{C_1, C_2, \ldots, C_k\}$, with $C_i \subseteq V$ ($i = 1, 2, \ldots, k$) and $\bigcup_{i=1}^{k} C_i = V$, based on the similarity between data points is called clustering. The $C_i$ ($i = 1, 2, \ldots, k$) are called clusters.

The definition does not define the similarity between data points. In fact, different methods use different criteria. Clustering is also known as an unsupervised learning process, since there is no prior knowledge about the data set.
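As a small illustration of Definition 1, the following sketch checks whether a proposed set of clusters is a valid clustering of V, that is, whether every cluster is a subset of V and the clusters together cover V. The function name `is_clustering` and the optional `require_disjoint` flag are hypothetical and not part of the paper; the definition itself only states the subset and cover conditions, and the sketch assumes the data points are distinct.

```python
def is_clustering(data, clusters, require_disjoint=False):
    """Check the two conditions of Definition 1: C_i is a subset of V, and the C_i cover V."""
    universe = set(data)
    # Every cluster must contain only points of V.
    if any(not set(c) <= universe for c in clusters):
        return False
    # The union of all clusters must equal V.
    if set().union(*map(set, clusters)) != universe:
        return False
    if require_disjoint:
        # Given full coverage, the clusters are disjoint exactly when sizes add up to |V|.
        return sum(len(set(c)) for c in clusters) == len(universe)
    return True


V = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
print(is_clustering(V, [V[:2], V[2:]]))   # both conditions hold
print(is_clustering(V, [V[:2]]))          # the clusters do not cover V
```

Which partition is actually chosen depends on the similarity criterion, which the definition deliberately leaves open.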
Because it needs no prior knowledge, clustering analysis usually acts as a preprocessing step for other KDD operations, so the quality of the clustering result is important for the whole KDD process. As with other data mining operations, high performance and scalability are two further requirements besides accuracy. Thus, a good clustering algorithm should satisfy the following requirements: independence of in-advance knowledge; only easy-to-set parameters; accuracy; speed; and good scalability.

Supported by the National Grand Fundamental Research 973 Program of China under Grant No.G[...]; the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No.[...]. QIAN Wei-ning was born in [...]. He is a Ph.D. candidate at the Department of Computer Science, Fudan University. His research interests are clustering, data mining and Web data management. ZHOU Ao-ying was born in [...]. He is a professor and doctoral supervisor at the Department of Computer Science, Fudan University. His current research interests include Web data management, data mining, and object management over peer-to-peer networks.

Much research work has been done on building clustering algorithms. Each uses novel techniques to improve its ability to handle data sets with certain characteristics. However, different algorithms use different criteria, as mentioned above. Since there is no benchmark for clustering methods, it is difficult to compare these algorithms with a common measurement. Nevertheless, a detailed comparison is necessary, because: (1) the advantages and disadvantages should be analyzed, so that existing algorithms can be improved; (2) the user should be able to choose the right algorithm for a certain data set, so that the optimal result and performance can be obtained; (3) a detailed comparison is the basis for building a clustering benchmark.

In this paper, we analyze several existing popular algorithms from different aspects. Our work differs from other surveys [1~3] in that we compare the algorithms universally from different viewpoints. Other surveys either try to generalize some methods into a certain framework, as in Refs.[1,2], which can cover only a limited set of algorithms, or just introduce clustering algorithms one by one as a tutorial [3], so that no comparison among algorithms is made. Since different algorithms use different criteria and techniques, those surveys can cover only some of the algorithms. Furthermore, some algorithms cannot be distinguished, since they use the same technique and therefore fall into the same category of a given framework.

The rest of this paper is organized as follows. Sections 1 to 3 analyze the clustering algorithms from three different viewpoints, namely clustering criteria, cluster representation, and algorithm framework. Section 4 introduces some methods that are mixtures or generalizations of other algorithms. Section 5 introduces research on automatic detection of clusters. Finally, Section 6 gives concluding remarks.
It should be noted that, from each single viewpoint, although we try to classify as many algorithms as we can, some are still missing, and some algorithms may fall into the same category. However, when the algorithms are observed from all these viewpoints together, different algorithms can be distinguished. This is the motivation of our work.

1 Criteria

The basis of clustering analysis is the definition of similarity. Usually, the definition of similarity contains two parts: (1) the similarity between data points; (2) the similarity between sets of data points. Not all clustering methods need both of them; some algorithms use only one. The clustering criteria can be classified into three categories: distance-based, density-based, and linkage-based. Distance-based and density-based clustering is usually applied to data in Euclidean space, while linkage-based clustering can be applied to data in arbitrary metric space.

1.1 Distance-Based clustering

The basic idea of distance-based clustering is that a cluster consists of data points close to each other. The distance between two data points is easy to define in Euclidean space. Widely used distance definitions include Euclidean distance and Manhattan distance. However, there are several choices for the similarity definition between two sets of data points, as follows:

$$\mathrm{Similarity}_{rep}(C_i, C_j) = \mathrm{distance}(rep_i, rep_j) \tag{1}$$

or

$$\mathrm{Similarity}_{avg}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{v_i \in C_i,\, v_j \in C_j} \mathrm{distance}(v_i, v_j) \tag{2}$$

or

$$\mathrm{Similarity}_{max}(C_i, C_j) = \max\{\mathrm{distance}(v_i, v_j) \mid v_i \in C_i,\ v_j \in C_j\} \tag{3}$$

or

$$\mathrm{Similarity}_{min}(C_i, C_j) = \min\{\mathrm{distance}(v_i, v_j) \mid v_i \in C_i,\ v_j \in C_j\} \tag{4}$$

In (1), $rep_i$ and $rep_j$ are the representatives of $C_i$ and $C_j$, respectively; in (2), $n_i$ and $n_j$ are the numbers of data points in $C_i$ and $C_j$. The representative of a data set is usually the mean, as in k-means [4]. Single representative methods usually employ Definition (1). The complexity of (2), (3), and (4) is $O(|C_i| \cdot |C_j|)$ in each case, which is inefficient for large data sets. Although they are more global definitions, they are usually not directly applied as the similarity definition for sub-clusters or clusters. The only exception is BIRCH [5], in which the CF-vector and CF-tree are employed to accelerate the computation. Some trade-off approaches are taken, as will be discussed in Section 2.1, in which a detailed analysis of single representative methods is also given.

The advantage of distance-based clustering is that distance is easy to compute and to understand. Distance-based clustering algorithms usually need either the parameter K, which is the number of final clusters the user wants, or the minimum distance used to distinguish two clusters. Their distinct disadvantage is that they are noise-sensitive. Although some of them introduce techniques to address this, these techniques result in other serious problems. CURE [6] uses a representative-shrinking technique to reduce the impact of noise. However, this causes it to fail to identify clusters of hollow shape, as the result of our experiment in Fig.1 shows. This shortcoming counteracts the advantage of multi-representatives, namely that the algorithm can identify arbitrary-shaped clusters.

Fig.1 Hollow-Shaped cluster identified by CURE

BIRCH, which is the first clustering algorithm to consider noise, introduces a new parameter T, which is essentially a density-related parameter. It is hard for the user to understand this parameter unless the page storage ability of the CF-tree is known (page_size/entry_size/T is an approximation of the density in that page). In addition, it may cause the loss of small clusters and long-shaped clusters.
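The four set-similarity choices, Definitions (1) to (4) of Section 1.1, can be made concrete with a short sketch. It assumes Euclidean distance and the mean as the representative; the function names are illustrative and not taken from any of the surveyed algorithms. Definitions (2) to (4) iterate over all cross pairs, which is where the quadratic cost comes from.

```python
import math
from itertools import product


def mean(cluster):
    """Coordinate-wise mean, used here as the single representative of a cluster."""
    n = len(cluster)
    return tuple(sum(xs) / n for xs in zip(*cluster))


def similarity_rep(ci, cj):
    # Definition (1): distance between the two representatives.
    return math.dist(mean(ci), mean(cj))


def similarity_avg(ci, cj):
    # Definition (2): average distance over all |Ci| * |Cj| cross pairs.
    return sum(math.dist(p, q) for p, q in product(ci, cj)) / (len(ci) * len(cj))


def similarity_max(ci, cj):
    # Definition (3): largest cross-pair distance.
    return max(math.dist(p, q) for p, q in product(ci, cj))


def similarity_min(ci, cj):
    # Definition (4): smallest cross-pair distance.
    return min(math.dist(p, q) for p, q in product(ci, cj))


Ci = [(0.0, 0.0), (0.0, 2.0)]
Cj = [(3.0, 0.0), (3.0, 2.0)]
for f in (similarity_rep, similarity_avg, similarity_max, similarity_min):
    print(f.__name__, f(Ci, Cj))
```

Only `similarity_rep` avoids touching every pair of points, which is why single representative methods favor Definition (1).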
For lack of space, a detailed discussion is omitted here.

1.2 Density-Based clustering

Unlike distance-based clustering methods, density-based clustering regards clusters as dense areas. Therefore, the similarity definition of data points is based on whether they belong to connected dense regions: data points belonging to the same connected dense region belong to the same cluster. Based on how density is computed, density-based clustering can be further classified into Nearest-Neighbor methods (called NN in the rest of this paper) and cell-based methods. The difference between them is that the former define density based on the data set, and the latter define it based on the data space. Whichever kind a density-based clustering algorithm belongs to, it always needs a minimum-density threshold parameter, which is the key to defining a dense region.

1.2.1 NN methods

NN methods treat only those points that have more than k neighbors within a hyper-sphere of radius ε as data points in clusters. Since the neighbors of each point must be counted, index structures that support region queries, such as the R*-tree or X-tree, are always employed. Because of the curse of dimensionality [7], these methods do not scale well with dimensionality. Furthermore, NN methods result in frequent I/O when the data sets are very large. However, for most multi-dimensional data sets, these methods are efficient. In short, the shortcomings of this kind of method are those of the index structures they are based on. Traditional NN methods, such as DBSCAN and its descendants [8~10], need two parameters: the density threshold and ε. Recently, OPTICS [11], whose basic idea is the same as that of DBSCAN, focuses on automatic identification of cluster structures. Since the novel techniques in OPTICS do not belong to the topic of this sub-section, we will discuss them in Section 5.

1.2.2 Cell-Based methods

Cell-based methods count density information based on units (cells). STING [12], WaveCluster [13], DBCLASD [14], CLIQUE [15], and OptiGrid [16] all fall into this category. Cell-based methods have the shortcoming that cells are only approximations of dense areas. Some methods introduce techniques to solve this problem, as will be introduced in Section 2.3.

Density-based clustering methods all run into problems when data sets contain clusters or sub-clusters whose granularity is smaller than the granularity of the units used for computing density. A well-known example is the dumbbell-shaped cluster, as shown in our experimental result, Fig.2. However, density-based clustering methods can easily remove noise if the parameters are properly set. That is to say, they are robust to noise.

Fig.2 Dumbbell-Shaped clusters identified by a density-based algorithm (DBSCAN)

1.3 Linkage-Based clustering

Unlike distance-based or density-based clustering, linkage-based clustering can be applied to arbitrary metric spaces. Furthermore, since distance and density information are not sufficient for clustering in high-dimensional space, linkage-based clustering is often employed there. Algorithms of this kind include ROCK [17], CHAMELEON [18], ARHP [19,20], STIRR [21], CACTUS [22], etc. Linkage-based methods are based on a graph or hyper-graph model.
They usually map the data set into a graph or hyper-graph, then cluster the data points based on the edge or hyper-edge information, so that highly connected data points are assigned to the same cluster. The difference between the graph model and the hyper-graph model is that the former reflects the similarity of pairs of nodes, while the latter usually reflects co-occurrence information. ROCK and CHAMELEON use the graph model, while ARHP, PDDP, STIRR, and CACTUS use the hyper-graph model. Although the developers of CACTUS did not state that it is a hyper-graph-model-based algorithm, it belongs to that kind.

The quality of a linkage-based clustering result depends on the definition of the link or hyper-edge. Since it is impossible to handle a complete graph, the graph or hyper-graph model always eliminates the edges or hyper-edges whose weight is low, so that the graph or hyper-graph is sparse. However, this gain in efficiency may reduce the accuracy.

The algorithms in this category use different frameworks. ROCK and CHAMELEON are agglomerative hierarchical clustering methods, while ARHP is a divisive method, and STIRR uses a dynamical system model. Furthermore, since the co-occurrence problem is similar to the association rule mining problem, ARHP and CACTUS both borrow the Apriori algorithm [23] to find the clusters. Another algorithm that employs an Apriori-like algorithm is CLIQUE. In CLIQUE, however, the monotonicity lemma is used to find high-dimensional clusters based on the clusters found in subspaces. CLIQUE is not a linkage-based clustering method, which distinguishes it from the other algorithms discussed in this subsection. A detailed discussion of algorithm frameworks will be given in Section 3. Since CHAMELEON uses both link and distance information, it will be discussed separately in Section 4.1.

2 Cluster Representation

The purpose of clustering is to identify the data clusters, which are summaries of similar data. Each algorithm must represent the clusters and sub-clusters in some form. Although labeling each data point with a cluster identity is a straightforward idea, most methods do not employ this approach. This may be because: (1) the summary, which should be easily understandable, is more than a set of (data-point, cluster-id) pairs; (2) it is time- and space-expensive to label all the data points in the process of clustering; (3) some methods employ accurate, compact cluster representatives, which make the time-consuming process of labeling unnecessary. We classify the cluster representation techniques into four kinds, as discussed in the following.

2.1 Representative points

Most distance-based clustering methods use some points to represent clusters. These points are called representative points. The representatives may be data points, or other points that do not exist in the database, such as the means of some sets of data points. The data representation techniques falling into this category can be further classified into three classes.

2.1.1 Single representative

The simplest approach is to use one point as the representative of each cluster. Each data point is assigned to the cluster whose representative is the closest one. The representative point may be the mean of the cluster, as in k-means [4] methods, or the data point in the database that is closest to the center, as in k-medoids methods.
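A minimal sketch of the single representative idea follows. It assumes Euclidean distance and implements the two choices of representative exactly as described above: the mean, or the data point closest to the center. The helper names are hypothetical, and the sketch shows only the representation and assignment steps, not a full k-means or k-medoids procedure.

```python
import math


def mean_point(cluster):
    """Mean of the cluster; generally not a point of the database."""
    n = len(cluster)
    return tuple(sum(xs) / n for xs in zip(*cluster))


def closest_to_center(cluster):
    """Data point of the cluster that lies closest to its mean."""
    center = mean_point(cluster)
    return min(cluster, key=lambda p: math.dist(p, center))


def assign(points, representatives):
    """For each point, the index of the cluster whose representative is nearest."""
    return [
        min(range(len(representatives)), key=lambda i: math.dist(p, representatives[i]))
        for p in points
    ]


cluster = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(mean_point(cluster))         # (2.0, 0.0), not a data point
print(closest_to_center(cluster))  # (1.0, 0.0), a data point
print(assign([(0.5, 0.0), (4.0, 0.0)], [(1.0, 0.0), (5.0, 0.0)]))  # [0, 1]
```

Because every point simply goes to its nearest representative, the cluster boundaries are fixed by the representatives alone, regardless of the actual shape or size of the clusters.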
Other single representative algorithms include BIRCH [5], CLARA [24], and CLARANS [25]. The different effects of k-means and k-medoids methods on the clustering result are introduced in detail in Ref.[25]. Since this is not related to the motivation of this paper, we do not survey it here. The shortcomings of the single representative approach are obvious: (1) only spherical clusters can be identified; and (2) a large cluster with a small cluster beside it will be split, while some data points in the large cluster will be assigned to the small cluster. These two conditions are shown in Fig.3 (the right part of this figure is borrowed from Ref.[6], Fig.1(b)). Therefore, this approach fails when processing data sets with arbitrary-shaped clusters or clusters that differ greatly in scale.

Fig.3 Non-Spherical clusters and clusters with different scales identified by single representative methods

2.1.2 All data points

Using all the data points in a cluster to represent it is another straightforward approach. However, it is time-expensive, since: (1) the data sets are always large, so the label information cannot fit in memory, which leads to frequent disk access; and (2) computing intra- and inter-cluster information requires access to all data points. Furthermore, the label information is hard to understand. Therefore, no popular algorithm takes this approach.

2.1.3 Multi-Representatives

The multi-representatives approach is introduced in CURE as a trade-off between the single-point and all-points methods. The first representative is the data point farthest from the mean of the cluster. After that, the data point whose distance to the nearest existing representative is the largest is chosen each time, until the number of representatives is large enough. In Ref.[6], the experiments show that for most data sets, 10 representatives lead to a satisfactory result. In the long version, Ref.[26], the developers of CURE also mention that more representatives are needed for complex data sets. However, the complexity of the clusters is unknown before clustering, and the relationship between the complexity of clusters and the number of representatives is not clear. This forces the user to choose a large number of representatives. Since the time complexity of CURE is $O(n^2 \log n)$, in which n is the number of data points at the beginning, a large number of representatives in the initial sub-clusters affects the efficiency, as shown in our experimental result, Fig.4. (Sub-clusters exist because a simple partitioning technique is used in CURE [6]. The time complexity with respect to the number of representatives c is $O(c \log c)$ if the number of initial sub-clusters is fixed.) Furthermore, because of its technique for handling outliers (the shrinking of representatives), CURE fails to identify clusters of hollow shape, as already discussed in Section 1.1 and shown in Fig.1. However, it outperforms the single-point and all-points approaches when both effectiveness and efficiency are considered.

Fig.4 Performance of CURE vs. number of representatives in a cluster (x-axis: number of representatives in a cluster; y-axis: time (s))

2.2 Dense area

Some density-based clustering algorithms use dense areas to denote clusters and sub-clusters. DBSCAN [8], its descendants [9,10], and OPTICS [11] belong to this category. The dense area representation is similar to the all-data-points method, except that only core points are used. Core points are those data points that have more neighbors within a certain region than the threshold. Only core points are used to expand a sub-cluster, and expansion stops when no further expansion can be applied from core points. Dense areas can capture arbitrary-shaped clusters, with the exception of dumbbell-shaped clusters.
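The core-point expansion described in Section 2.2 can be sketched as below. This is a simplified illustration in the spirit of DBSCAN, not its published implementation: it assumes Euclidean distance, counts a point as its own neighbor, treats a point as core when it has at least `min_pts` neighbors within `eps` (both parameter names are assumptions), and replaces the R*-tree region query with a linear scan.

```python
import math
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (linear scan instead of an index)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dense_area_clusters(points, eps, min_pts):
    neighbors = [region_query(points, i, eps) for i in range(len(points))]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * len(points)  # None means noise or not yet assigned
    cluster_id = 0
    for i in range(len(points)):
        if not core[i] or labels[i] is not None:
            continue
        # Start a new sub-cluster at an unassigned core point and expand it.
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            j = queue.popleft()
            for k in neighbors[j]:
                if labels[k] is None:
                    labels[k] = cluster_id
                    if core[k]:  # only core points expand the sub-cluster further
                        queue.append(k)
        cluster_id += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 5)]
print(dense_area_clusters(pts, eps=1.5, min_pts=3))
```

In this example the two squares form two clusters and the isolated point at (5, 5) stays unlabeled. Every point triggers one region query, which is the cost the index structure is meant to reduce.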
However, computing core points is expensive, so special index structures are needed. In the DBSCAN series and OPTICS, the R*-tree is used to support region queries. Since these methods need to scan the whole database, and each point may cause a region query, they always result in frequent I/O when applied to large databases, as shown in the experiments given in Section 4.2.

2.3 Cells

Some grid-based methods use cells to summarize the clusters, such as STING [12], WaveCluster [13], CLIQUE [15], DBCLASD [14], and OptiGrid [16]. Unlike dense areas, which are condensations of dense data points, cells are partitions of the data space. Therefore, a cell is an approximation of the data points falling into it. This makes the algorithms taking this approach inaccurate under some conditions. In Ref.[12], the authors argue that under a sufficient condition, STING can ensure that the result is accurate. However, this conclusion assumes that the characteristics of the queries are known a priori. WaveCluster exploits the multi-resolution property of wavelets to identify clusters at different resolutions, which ensures that the clusters at the highest resolution are accurate.

The advantages of using cells to represent clusters are straightforward. Firstly, the number of cells is much smaller than the size of the database. Therefore, the amount of data to be processed is limited, which leads to the high scalability of these approaches. Secondly, the cost of computing the properties of cells is low compared to finding dense areas, which needs the support of complex data structures. This is because cells are data independent, while dense areas depend on the data distribution. Lastly, like dense areas, cells can reflect the data distribution of a local area, although only approximately.

Since the number of neighboring relationships explodes as the dimensionality increases, algorithms that use the neighboring information of cells are usually inefficient for high-dimensional data. The only exception is CLIQUE. Unlike other cell representation methods, CLIQUE finds dense units (cells) from low-dimensional subspaces up to high-dimensional subspaces. Therefore, it scales well with dimensionality.
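To illustrate why cells are cheap to compute yet only approximate, the sketch below bins points into an equal-width grid, keeps the cells holding at least a minimum number of points, and shows how fast the number of adjacent cells grows with dimensionality. The uniform grid, the parameters `cell_size` and `min_count`, and the function names are assumptions made for illustration; they do not reproduce the grid construction of STING, WaveCluster, CLIQUE, or OptiGrid.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Grid coordinates of the cell containing the point; independent of the other data."""
    return tuple(int(x // cell_size) for x in point)


def dense_cells(points, cell_size, min_count):
    """Cells that contain at least min_count points, found in one pass over the data."""
    counts = Counter(cell_of(p, cell_size) for p in points)
    return {cell for cell, c in counts.items() if c >= min_count}


def neighbor_cell_count(dimensions):
    """Number of cells adjacent to one cell, diagonals included, in a d-dimensional grid."""
    return 3 ** dimensions - 1


data = [(0.1, 0.2), (0.4, 0.9), (0.8, 0.3), (2.5, 2.5), (7.0, 7.2)]
print(dense_cells(data, cell_size=1.0, min_count=2))   # {(0, 0)}
print([neighbor_cell_count(d) for d in (2, 3, 10)])    # [8, 26, 59048]
```

The single pass over the data explains the scalability of cell-based methods, while the growth of `neighbor_cell_count` illustrates why methods relying on neighboring cells become inefficient in high-dimensional spaces.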
Although OptiGrid is a cell-based clustering method, it does not use neighboring information, so it is efficient for high-dimensional data sets.

2.4 Probability

Some methods use probability to denote the degree to which a data point belongs to a cluster. EM [27,28] and AutoClass [29] belong to this category. The problem of classifying a data point into more than one cluster is also known as fuzzy clustering or soft clustering. In most cases, the performance of soft clustering is unsatisfactory. Reference [2] provides a detailed survey of fuzzy clustering. For lack of space, we do not go into detail here.

3 Algorithm Framework

In the above two sections, we discussed the clustering criteria and cluster representation, which are the two most important factors for clustering effectiveness. In this section, the algorithm framework is discussed. The algorithm framework determines the time complexity of the algorithms and the parameters needed. Furthermore, the algorithm framework also affects the preprocessing techniques. These are the focuses of the following three subsections.

3.1 Optimization methods

Optimization methods usually try to optimize a certain measure. Traditional optimization methods are also known as partitioning methods. The most famous ones include k-means [4] (including its variants k-modes [30] and k-prototypes [30]) and k-medoids (including PAM [24], CLARA [24], CLARANS [25], etc.). Some newly built algorithms also fall into this category, including STIRR [21].

K-means methods try to minimize a dissimilarity criterion (typically the squared-error criterion). K-means algorithms are usually linear in the size of the data set. However, they are usually sensitive to outliers, and often terminate at a local optimum. Therefore, the quality of the result is not satisfactory. Furthermore, they are usually designed as memory-resident algorithms, which limits their scalability.

Unlike k-means, k-medoids methods use data points to represent a cluster. Since noise and outliers have less influence on the medoids, these methods are more robust than k-means. However, k-medoids algorithms are also more expensive. PAM, CLARA, and CLARANS are the three most famous k-medoids algorithms. PAM is the first k-medoids method. CLARA and CLARANS both use sampling techniques: CLARA uses fixed samples, while CLARANS does not. Furthermore, CLARANS exploits randomized search. Therefore, CLARANS is more scalable than PAM and CLARA.

Unlike k-means or k-medoids, some newly built optimization algorithms, such as STIRR, do not use representatives. STIRR is designed to handle categorical data, for which means or medoids are difficult to define. It maps the data set into a hyper-graph and then employs dynamical system techniques to find basins, which are fixed points of the system. Therefore, it can be viewed as a process of finding an optimum of the system configuration.

3.2 Agglomerative methods

Agglomerative algorithms treat data points or data set partitions as sub-clusters at the beginning. They then merge the sub-clusters iteratively until the final clusters are obtained. BIRCH [5], CURE [6], ISAAC [31], ROCK [17], STING [12], and CHAMELEON [18] all fall into this category. Agglomerative methods have the shortcoming that their time complexity is at least $O(n^2)$. Therefore, several techniques are employed to accelerate the processing. Since the number of merge operations depends on the number of initial objects, some preprocessing techniques are used to reduce the number of objects to be processed. Sampling and partitioning are two widely used preprocessing techniques.
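The basic merge loop of an agglomerative method can be sketched as follows. The sketch is deliberately naive: it assumes Euclidean distance, uses the minimum cross-pair distance of Definition (4) as the merging measure, stops when `k` clusters remain, and recomputes all pairwise measures in every round. None of the surveyed algorithms works this way in practice, since they rely on sampling, partitioning, and index structures to avoid exactly this cost.

```python
import math
from itertools import combinations


def single_link(ci, cj):
    """Merging measure: smallest distance between a point of ci and a point of cj."""
    return min(math.dist(p, q) for p in ci for q in cj)


def agglomerate(points, k, linkage=single_link):
    clusters = [[p] for p in points]  # every data point starts as its own sub-cluster
    while len(clusters) > max(k, 1):
        # Find the closest pair of sub-clusters (a < b by construction).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 9)]
print(agglomerate(pts, k=2))
```

Starting from n sub-clusters, the loop performs n - k merges, which is why reducing the number of initial objects pays off directly.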
The developers of CURE proved that a small sample can guarantee the quality of clustering, while CURE, STING, and CHAMELEON all use partitioning before merging the sub-clusters. Another technique used to accelerate the processing is indexing. Nearly all agglomerative algorithms exploit a special index structure. BIRCH uses the CF-tree, CURE uses a k-d-tree and a heap, ROCK uses a two-level heap, STING uses a quad-tree-like index, and CHAMELEON uses a k-d-tree and a heap-based priority queue.

Agglomerative methods usually need a parameter known as the stop condition, which determines when the merge operations should stop. This parameter may be k, the number of final clusters, or a threshold that denotes the minimum value of the merging measurement.

3.3 Divisive methods

Divisive methods are hierarchical methods, as agglomerative methods are. Divisive methods begin with one large cluster that contains all the data points, and then partition the cluster recursively based on dissimilarity until some stop condition is reached. ARHP [19,20], PDDP [20], and OptiGrid [16] fall into this category. ARHP uses the hyper-graph model. The whole data set is first mapped to a hyper-graph by using association rule discovery techniques. Then the sub-graphs whose fitness is larger than a threshold are partitioned out. Finally, the vertices are assigned to the clusters they are highly connected to. Unlike ARHP, which uses fitness to partition the clusters, PDDP and OptiGrid use a hyper-plane to split a cluster in each iteration.

Like agglomerative methods, divisive methods also need a stop condition parameter. It can be either the number of final clusters, k, or a threshold for partitioning, such as a fitness threshold.

The advantage of divisive methods is that, for the graph or hyper-graph model, mature research work such as HMETIS [32] can be employed. In fact, even CHAMELEON [18], an agglomerative method, has a divisive step as preprocessing to get the initial sub-clusters. Since it is only preprocessing, the parameter is easy to set.
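The divisive framework with a hyper-plane split can be sketched in a few lines. This is a simplified stand-in, not the splitting rule of PDDP or OptiGrid: as an assumption made for brevity, the hyper-plane is axis-aligned, passes through the cluster mean, and cuts the coordinate with the largest variance. The stop condition is the number of final clusters `k`, and the cluster with the largest scatter is split next; all names are hypothetical.

```python
def scatter_per_axis(cluster):
    """Per-coordinate mean and sum of squared deviations of a cluster."""
    dims = range(len(cluster[0]))
    means = [sum(p[d] for p in cluster) / len(cluster) for d in dims]
    scatter = [sum((p[d] - means[d]) ** 2 for p in cluster) for d in dims]
    return means, scatter


def bisect(cluster):
    """Split by the hyper-plane x[axis] = mean[axis] on the widest-spread axis."""
    means, scatter = scatter_per_axis(cluster)
    axis = max(range(len(scatter)), key=lambda d: scatter[d])
    left = [p for p in cluster if p[axis] <= means[axis]]
    right = [p for p in cluster if p[axis] > means[axis]]
    return left, right


def divisive(points, k):
    clusters = [list(points)]  # begin with one cluster holding all data points
    while len(clusters) < k:
        # Split the cluster with the largest total scatter next.
        idx = max(range(len(clusters)), key=lambda i: sum(scatter_per_axis(clusters[i])[1]))
        left, right = bisect(clusters[idx])
        if not left or not right:  # nothing left to separate
            break
        clusters[idx:idx + 1] = [left, right]
    return clusters


pts = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
print(divisive(pts, k=3))
```

Each iteration touches only the cluster being split, in contrast to the agglomerative loop, which must compare sub-clusters pairwise.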

4 Mixed or Generalized Clustering Approaches

As analyzed above, algorithms using a single criterion may fail on some kinds of data sets. Some recent research focuses on combining or generalizing different criteria. In this section, three algorithms of this kind are introduced and analyzed.

4.1 CHAMELEON: distance + connectivity method

CHAMELEON [18] is an algorithm combining several existing clustering techniques. From the clustering criteria viewpoint, it combines a distance measurement (relative closeness) with a linkage measurement (relative inter-connectivity). Furthermore, it generalizes the classic distance measurement in that it uses relative criteria, which were first introduced in linkage-based clustering [19]. From the algorithm framework viewpoint, it uses a divisive method as the partitioning step to generate the initial sub-clusters, and the main phase of the algorithm employs the agglomerative framework. From the cluster representation viewpoint, it is an all-points method. However, the points here may be the initial sub-clusters.

The advantages and shortcomings of CHAMELEON can be derived easily from the multiple-viewpoint analysis. It is strong at identifying arbitrary-shaped clusters and highly intra-connected clusters, since relative distance and relative connectivity are used. However, it needs two parameters as the thresholds of relative distance and relative connectivity, respectively. Furthermore, the divisive partitioning needs another parameter. This is the shortcoming of combining so many techniques. In addition, the framework requires an index structure that supports region queries (e.g. a k-d-tree) and a heap. Although the time complexity is analyzed theoretically, no scaling-up technique or experiment is provided in the paper.

4.2 Hybrid: distance + density method

The Hybrid algorithm is a clustering method combining distance and density criteria [33]. From the viewpoint of criteria, it uses distance and density information. From the cluster representation viewpoint, it uses the multi-representative technique.
Although cells are employed to enable scaling up, they are not used to represent the clusters, so the cluster representation can be more accurate. From the framework viewpoint, it is an agglomerative algorithm.

As before, the advantages and shortcomings are straightforward after the analysis. The algorithm can identify arbitrary-shaped clusters and is insensitive to noise and outliers, since both distance and density information are used. However, this introduces three parameters: one for distance computing and the other two for density computing. Furthermore, the framework determines the use of a k-d-tree and a heap structure. Unlike CHAMELEON, it is designed to handle very large databases. The cell-based indexing not only reduces the data to be processed, but also accelerates the labeling process. As shown in our experiments, Fig.5, it outperforms two popular clustering algorithms, DBSCAN and CURE, since the R*-tree incurs high overhead when processing large data sets, while CURE fails when the data set grows beyond the main memory. A detailed description of the experiments can be found in Ref.[33].

4.3 DENCLUE: generalized density method

DENCLUE [34] is a density-based clustering method that tries to generalize several other clustering algorithms. It can be viewed as a kind of survey of density-based clustering algorithms, since it can cover almost all density-based algorithms by using different influence functions and density functions. The developers of DENCLUE also state that it can generalize hierarchical algorithms and partitioning algorithms (called traditional optimization algorithms in this paper). However, it can only denote the framework of those algorithms. It cannot cover the algorithms that use representatives, whatever functions or parameters are set.
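The role of the influence function can be illustrated with a short sketch: the density at a location is the sum of the influences of all data points on it, and swapping the influence function changes which notion of density is modeled. The Gaussian and square-wave forms below, the parameter name `sigma`, and the function names are assumptions of this sketch rather than details given in this survey; see Ref.[34] for the exact definitions used by DENCLUE.

```python
import math


def gaussian_influence(x, y, sigma):
    """Smooth influence that decays with the squared distance between x and y."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def square_wave_influence(x, y, sigma):
    """Influence 1 inside radius sigma and 0 outside."""
    return 1.0 if math.dist(x, y) <= sigma else 0.0


def density(x, data, sigma, influence=gaussian_influence):
    """Density at x: the sum of the influences of all data points on x."""
    return sum(influence(x, p, sigma) for p in data)


data = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
x = (0.0, 0.0)
print(density(x, data, sigma=1.0, influence=square_wave_influence))  # 2.0
print(density(x, data, sigma=1.0))
```

With the square-wave influence, the density at a point is simply the number of data points within radius `sigma`, which corresponds to the neighbor counting of the NN methods in Section 1.2.1.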

Since DENCLUE is in fact a density-based method, it needs parameters for calculating density, and it is robust to noise. Furthermore, the cell-based technique requires the use of a tree-based index, so that it can handle very large data sets. It also employs a filtering technique to reduce the complexity of handling high-dimensional data. However, this introduces another parameter.

Fig.5 Scaling-up experiments of CURE, DBSCAN, and the Hybrid algorithm (x-axis: data size; y-axis: time (s))

5 Automatic and Visualization Approaches

Since clustering is a process of unsupervised learning, setting appropriate parameters is a problem for many algorithms. The above analysis shows that most clustering algorithms need some parameters. Although these may be straightforward in some cases, they are difficult to set in many environments. Furthermore, current cluster representation techniques can be easily understood only when the data is in a low-dimensional space. Therefore, some algorithms are built for automatic clustering. Meanwhile, other efforts have been made to visualize the process of clustering, so that the user can set the parameters easily and the result is more understandable.

OPTICS [11] is an algorithm designed to discover cluster structure. It is essentially a density-based clustering algorithm, as DBSCAN is. The difference between OPTICS and other density-based methods is that it uses reachability-plots to visualize the process of clustering. Furthermore, it introduces an automatic technique to detect the steep points, so that clusters can be discovered. By using different parameters, it can discover clusters at different density levels. Therefore, a cluster structure is an organization of clusters of different densities.

In Ref.[35], the authors introduced an algorithm to build a multi-granularity cluster-tree.
They argue that an accurate multi-granularity cluster-tree should be vertically distinguished, horizontally distinguished, and complete, which ensures that each node in the cluster-tree denotes a cluster at a certain granularity, while any cluster at any granularity has a corresponding node in the cluster-tree. The construction of the multi-granularity cluster-tree employs distance-based clustering in the agglomerative framework, which is the main difference between the multi-granularity cluster-tree and the cluster structure in Ref.[11]. Therefore, OPTICS treats clusters of different densities as clusters at different levels, and may treat clusters of different scales as clusters at the same level, while the multi-granularity cluster-tree treats them the opposite way, as shown in Fig.6. The difference exists because the motivation for building the multi-granularity cluster-tree is to provide a cluster management facility that eases the understanding of the clustering result, while OPTICS is designed for automatic or interactive clustering.

Fig.6

Some researchers in computer graphics have also developed algorithms to visualize the clustering process, such as H-BLOB [36]. However, the basic idea is similar: (1) visualize the clustering process, so that the user can see how clusters are constructed; (2) clusters may exist at different levels when different parameters are used, whichever criterion is used.

6 Conclusions

In this paper, we analyze the existing popular clustering algorithms both theoretically and experimentally from three different viewpoints: clustering criteria, cluster representation, and algorithm framework, so that most algorithms can be covered and distinguished. This work can be the basis of:
(1) Clustering algorithm advantage/disadvantage analysis;
(2) Clustering algorithm selection for data mining users;
(3) Clustering algorithm auto-selection for different data sets;
(4) Self-tuning clustering algorithm development;
(5) Clustering benchmark construction.

The analysis shows that most current algorithms have their shortcomings, while being effective or efficient for data sets with particular characteristics. Furthermore, three algorithms, which generalize or mix other algorithms, are introduced and analyzed from the three viewpoints introduced in this paper. Finally, some automatic and visualization algorithms for clustering are introduced. They are attempts by researchers to push the unsupervised learning process to a more understandable and automatic stage.

Acknowledgement We would like to thank Dr. Wen Jin at Simon Fraser University for his suggestions on the outline and draft of this paper. We would also like to thank Dr. Joerge Sander for providing the source code of DBSCAN, and Ms. Hailei Qian for helping us to implement the CURE and Hybrid algorithms.

References:
[1] Fasulo, D. An analysis of recent work on clustering algorithms. Technical Report, Department of Computer Science and Engineering, University of Washington,
[2] Baraldi, A., Blonda, P. A survey of fuzzy clustering algorithms for pattern recognition. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 1999,29:786~801.
[3] Keim, D.A., Hinneburg, A. Clustering techniques for large data sets from the past to the future. Tutorial Notes for ACM SIGKDD 1999 International Conference on Knowledge Discovery and Data Mining. San Diego, CA, ACM, ~181.
[4] McQueen, J. Some methods for classification and analysis of multivariate observations. In: LeCam, L., Neyman, J., eds. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability ~297.
[5] Zhang, T., Ramakrishnan, R., Livny, M. BIRCH: an efficient data clustering method for very large databases. In: Jagadish, H.V., Mumick, I.S., eds. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. Quebec: ACM Press, ~114.
[6] Guha, S., Rastogi, R., Shim, K. CURE: an efficient clustering algorithm for large databases. In: Haas, L.M., Tiwary, A., eds. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, ~84.

[7] Beyer, K.S., Goldstein, J., Ramakrishnan, R., et al. When is nearest neighbor meaningful? In: Beeri, C., Buneman, P., eds. Proceedings of the 7th International Conference on Database Theory, ICDT 99. LNCS1540, Jerusalem, Israel: Springer, ~235.
[8] Ester, M., Kriegel, H.-P., Sander, J., et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis, E., Han, J., Fayyad, U.M., eds. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD 96). AAAI Press, ~231.
[9] Ester, M., Kriegel, H.-P., Sander, J., et al. Incremental clustering for mining in a data warehousing environment. In: Gupta, A., Shmueli, O., Widom, J., eds. Proceedings of the 24th International Conference on Very Large Data Bases. New York: Morgan Kaufmann, ~333.
[10] Sander, J., Ester, M., Kriegel, H.-P., et al. Density-Based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery, 1998,2(2):169~194.
[11] Ankerst, M., Breunig, M.M., Kriegel, H.-P., et al. OPTICS: ordering points to identify the clustering structure. In: Delis, A., Faloutsos, C., Ghandeharizadeh, S., eds. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. Philadelphia: ACM Press, ~60.
[12] Wang, W., Yang, J., Muntz, R. STING: a statistical information grid approach to spatial data mining. In: Jarke, M., Carey, M.J., Dittrich, K.R., et al., eds. Proceedings of the 23rd International Conference on Very Large Data Bases. Athens: Morgan Kaufmann, ~195.
[13] Sheikholeslami, G., Chatterjee, S., Zhang, A. WaveCluster: a multi-resolution clustering approach for very large spatial databases. In: Gupta, A., Shmueli, O., Widom, J., eds. Proceedings of the 24th International Conference on Very Large Data Bases. New York: Morgan Kaufmann, ~438.
[14] Xu, X., Ester, M., Kriegel, H.-P., et al. A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the 14th International Conference on Data Engineering. Orlando: IEEE Computer Society Press, ~331.
[15] Agrawal, R., Gehrke, J., Gunopulos, D., et al. Automatic subspace clustering of high dimensional data for data mining applications. In: Haas, L.M., Tiwary, A., eds. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, ~105.
[16] Hinneburg, A., Keim, D.A. Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. In: Atkinson, M.P., Orlowska, M.E., Valduriez, P., et al., eds. Proceedings of the 25th International Conference on Very Large Data Bases. Edinburgh: Morgan Kaufmann, ~517.
[17] Guha, S., Rastogi, R., Shim, K. ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th International Conference on Data Engineering. Sydney: IEEE Computer Society Press, ~521.
[18] Karypis, G., Han, E.H., Kumar, V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 1999,32(8):68~75.
[19] Han, E.H., Karypis, G., Kumar, V., et al. Hypergraph based clustering in high-dimensional data sets: a summary of results. Data Engineering Bulletin, 1998,21(1):15~22.
[20] Boley, D., Gini, M., Gross, R., et al. Partitioning-Based clustering for web document categorization. Decision Support System Journal, 1999,27(3):329~341.
[21] Gibson, D., Kleinberg, J.M., Raghavan, P. Clustering categorical data: an approach based on dynamical systems. In: Gupta, A., Shmueli, O., Widom, J., eds. Proceedings of the 24th International Conference on Very Large Data Bases. New York: Morgan Kaufmann, ~322.
[22] Ganti, V., Gehrke, J., Ramakrishnan, R. CACTUS, clustering categorical data using summaries. In: Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining. San Diego: ACM Press, ~83.
[23] Agrawal, R., Srikant, R. Fast algorithms for mining association rules. In: Bocca, J.B., Jarke, M., Zaniolo, C., eds. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB 94). Santiago: Morgan Kaufmann, ~499.
[24] Kaufman, L., Rousseeuw, P.J. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons,
[25] Ng, R.T., Han, J. Efficient and effective clustering methods for spatial data mining. In: Bocca, J.B., Jarke, M., Zaniolo, C., eds. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB 94). Santiago: Morgan Kaufmann, ~155.
[26] Guha, S., Rastogi, R., Shim, K. CURE: an efficient clustering algorithm for large databases. Information System Journal, 1998,26(1):35~58.
[27] Dempster, A.P., Laird, N.M., Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B), 1977,29(1):1~38.

[28] Lauritzen, S.L. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 1995,19:191~201.
[29] Cheeseman, P., Stutz, J. Bayesian classification (AutoClass): theory and results. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., et al., eds. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, ~180.
[30] Huang, Z. Extensions to the K-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 1998,2:283~304.
[31] Talavera, L., Bejar, J. Efficient construction of comprehensible hierarchical clustering. In: Zytkow, J.M., Quafalou, M., eds. Principles of Data Mining and Knowledge Discovery, Proceedings of the 2nd European Symposium, PKDD 98. LNCS1510, Nantes: Springer-Verlag, ~101.
[32] Karypis, G., Aggarwal, R., Kumar, V., et al. Multilevel hypergraph partitioning: application in VLSI domain. In: Proceedings of the 34th Conference on Design Automation. Anaheim, CA: ACM Press, ~529.
[33] Zhou, A., Qian, W., Qian, H., et al. A hybrid approach to clustering in very large databases. In: Cheung, D., Williams, G.J., Li, Q., eds. Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. LNCS2035, Hong Kong: Springer-Verlag, ~524.
[34] Hinneburg, A., Keim, D.A. An efficient approach to clustering in large multimedia databases with noise. In: Agrawal, R., Stolorz, P.E., Piatetsky-Shapiro, G., eds. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD 98). New York: AAAI Press, ~65.
[35] Zhou, A., Qian, W., Qian, H., et al. SACT: automatic cluster-tree construction for very large spatial databases. Technical Report, Computer Science Department, Fudan University,
[36] Sprenger, T.C., Brunella, R., Gross, M.H. H-BLOB: a hierarchical visual clustering method using implicit surfaces. Technical Report No.341, Computer Science Department, ETH Zürich, ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/3xx/341.pdf.
CLC number: TP311   Document code: A

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS J.H.Guan, F.B.Zhu, F.L.Ban a School of Computer, Spatal Informaton & Dgtal Engneerng Center, Wuhan Unversty, Wuhan, 430079,

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Clustering. A. Bellaachia Page: 1

Clustering. A. Bellaachia Page: 1 Clusterng. Obectves.. Clusterng.... Defntons... General Applcatons.3. What s a good clusterng?. 3.4. Requrements 3 3. Data Structures 4 4. Smlarty Measures. 4 4.. Standardze data.. 5 4.. Bnary varables..

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

A Similarity Measure Method for Symbolization Time Series

A Similarity Measure Method for Symbolization Time Series Research Journal of Appled Scences, Engneerng and Technology 5(5): 1726-1730, 2013 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scentfc Organzaton, 2013 Submtted: July 27, 2012 Accepted: September 03, 2012

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Face Recognition Method Based on Within-class Clustering SVM

Face Recognition Method Based on Within-class Clustering SVM Face Recognton Method Based on Wthn-class Clusterng SVM Yan Wu, Xao Yao and Yng Xa Department of Computer Scence and Engneerng Tong Unversty Shangha, Chna Abstract - A face recognton method based on Wthn-class

More information

Clustering is a discovery process in data mining.

Clustering is a discovery process in data mining. Cover Feature Chameleon: Herarchcal Clusterng Usng Dynamc Modelng Many advanced algorthms have dffculty dealng wth hghly varable clusters that do not follow a preconceved model. By basng ts selectons on

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

STING : A Statistical Information Grid Approach to Spatial Data Mining

STING : A Statistical Information Grid Approach to Spatial Data Mining STING : A Statstcal Informaton Grd Approach to Spatal Data Mnng We Wang, Jong Yang, and Rchard Muntz Department of Computer Scence Unversty of Calforna, Los Angeles {wewang, jyang, muntz}@cs.ucla.edu February

More information

Outlier Detection Methodologies Overview

Outlier Detection Methodologies Overview Outler Detecton Methodologes Overvew Mohd. Noor Md. Sap Department of Computer and Informaton Systems Faculty of Computer Scence and Informaton Systems Unverst Teknolog Malaysa 81310 Skuda, Johor Bahru,

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Clustering Algorithm of Similarity Segmentation based on Point Sorting

Clustering Algorithm of Similarity Segmentation based on Point Sorting Internatonal onference on Logstcs Engneerng, Management and omputer Scence (LEMS 2015) lusterng Algorthm of Smlarty Segmentaton based on Pont Sortng Hanbng L, Yan Wang*, Lan Huang, Mngda L, Yng Sun, Hanyuan

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Bidirectional Hierarchical Clustering for Web Mining

Bidirectional Hierarchical Clustering for Web Mining Bdrectonal Herarchcal Clusterng for Web Mnng ZHONGMEI YAO & BEN CHOI Computer Scence, College of Engneerng and Scence Lousana Tech Unversty, Ruston, LA 71272, USA zya001@latech.edu, pro@bencho.org Abstract

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

A new segmentation algorithm for medical volume image based on K-means clustering

A new segmentation algorithm for medical volume image based on K-means clustering Avalable onlne www.jocpr.com Journal of Chemcal and harmaceutcal Research, 2013, 5(12):113-117 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCRC5 A new segmentaton algorthm for medcal volume mage based

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Clustering algorithms and validity measures

Clustering algorithms and validity measures Clusterng algorthms and valdty measures M. Hald, Y. Batstas, M. Vazrganns Department of Informatcs Athens Unversty of Economcs & Busness Emal: {mhal, yanns, mvazrg}@aueb.gr Abstract Clusterng ams at dscoverng

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task
