Bidirectional Hierarchical Clustering for Web Mining


ZHONGMEI YAO & BEN CHOI
Computer Science, College of Engineering and Science
Louisiana Tech University, Ruston, LA 71272, USA

Abstract

In this paper we propose a new bidirectional hierarchical clustering system for addressing challenges of web mining. The key feature of our approach is that it aims to maximize the intra-cluster similarity in the bottom-up cluster-merging phase and to minimize the inter-cluster similarity in the top-down refinement phase. This two-pass approach achieves better clustering than existing one-pass approaches. We also propose a new cluster-merging criterion that allows more than two clusters to be merged in each step, and a new measure of similarity that takes into consideration not only the inter-connectivity between clusters but also the internal connectivity within the clusters. These reduce the average complexity of creating the final hierarchical structure of clusters from $O(n^2)$ to $O(n)$. The hierarchical structure represents a semantic structure between concepts of clusters and is directly applicable to the future of semantic net.

1. Introduction

The World Wide Web, with its explosive growth and ever-broadening reach, has become the default knowledge resource for many areas of endeavor. It is becoming increasingly important to devise sophisticated schemes to find interesting concepts and relations between concepts from this resource. Clustering is one of the techniques that can solve this problem. Clustering is an unsupervised discovery process for partitioning a set of data such that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized [1,2]. The application of clustering techniques to web mining has been facing a number of challenges [3,4], such as huge amount of resources, retrieval time, high dimensionality, quality, and meaningful interpretation. In this paper we propose a new Bidirectional Hierarchical Clustering system in a high dimensional space based in part on the graph partitioning model [5,6].
Our system first uses the all-k-nearest neighbors [7] to sparsify the graph and to eliminate outliers. In our bottom-up cluster-merging phase, we define a new edge matching [5,6] method that takes into consideration not only the inter-connectivity between vertices but also the internal connectivity within the vertices. This edge matching method also discovers the hierarchical structure of clusters much faster than the usual hierarchical clustering. Our top-down refinement processing then eliminates errors that occurred in the greedy cluster-merging phase. The final step is to extract concepts from the clusters organized in the hierarchical structure.

The rest of this paper is organized as follows. Section 2 reviews related work. Our proposed Bidirectional Hierarchical Clustering system is presented in Section 3. Section 4 discusses the computational complexity of our algorithm. Section 5 contains conclusions and future work.

2. Related Work

Numerous clustering algorithms appear in literature [1-4,8-19]. Clustering techniques can be broadly categorized into partitional clustering and hierarchical clustering [1,2], which differ in whether they produce flat partitions or a hierarchy of clusters. The k-means is a partitional clustering algorithm which has $O(n)$ time complexity in terms of the number of data points [8,19]. While the k-means is sensitive to outliers, the medoid-based method, typified by PAM and CLARANS [9], eliminates this problem. But the k-medoids have $O(n^2)$ time complexity. The limitations of these two partitional schemes are that they are sensitive to initial seeds and they fail when clusters have arbitrary shapes or widely different sizes.

This research was supported in part by Center for Entrepreneurship and Information Technology (CEnIT), Louisiana Tech University, Grant CSe

Proceedings of the IEEE/WIC International Conference on Web Intelligence (WI'03)

Hierarchical clustering creates a nested sequence of clusters. There are variations of hierarchical agglomerative clustering (HAC) algorithms which differ primarily in how they compute the distance between clusters [1,2]. For instance, the single link method can find clusters of arbitrary shape or different sizes, but it is susceptible to noise and outliers. The complete link method is less used because of its $O(n^3)$ time complexity. An efficient method is the group average method, which defines the average pair-wise distance as the cluster distance.

Other density-based or grid-based clustering methods were presented, e.g. GDBSCAN [13] and OptiGrid [14]. Nevertheless they do not work effectively in a very high dimensional space [4,15]. Another clustering approach is the probabilistic approach [16]. This approach tends to impose structure on the data and the selected distribution family may not be appropriate [4].

More recently, clustering algorithms for mining large databases have been proposed [10-12]. Most of these are variants of hierarchical clustering, e.g. BIRCH [11], CURE [12], and CHAMELEON [10].

In summary, only k-means methods, HAC methods and graph partitioning algorithms [10] have been applied in very high dimensional datasets. HAC algorithms produce higher-quality clusters and are more versatile than the k-means algorithm. The major limitations of HAC methods [8,10,17] are their $O(n^2)$ time complexity and the errors that may occur during the greedy cluster-merging procedure (Figure 1). In the following sections we present our new algorithm that overcomes the limitations of common HAC methods.

Figure 1. An example of HAC (Steps 1 to 7). Note that the greedy decision can lead to an incorrect solution. The correct solution in this case is (ABCD) and (EFGH).

3. Our New Bidirectional Hierarchical Clustering (BHC) System

In this section we propose our new BHC approach.
This approach consists of the following five major steps: (1) representing web pages by vector-space model; (2) generating the matrix of k-nearest neighbors of web pages; (3) bottom-up cluster merging phase; (4) top-down refinement phase; and (5) extracting concepts of clusters.

3.1. Representing Web Pages

We convert a web page into a vector of features, and only text in the web page is represented: $(w_{i1}, \ldots, w_{ik}, \ldots, w_{im})$, where $w_{ik}$ is the weight of the term $t_k$ in the $i$-th web page, and $m$ is the number of distinct terms (dimensionality) in the dataset. Hereafter $m$ denotes the dimensionality and $n$ denotes the number of web pages.

A major difficulty of text clustering is the high dimensionality of the feature space [20,21]. After removing stop terms and stemming terms in web pages, we remove those terms whose document frequencies [21] are less than a threshold in order to reduce dimensionality. The document frequency thresholding can be reliably used, because it eliminated 90% or more unique terms with either an improvement or no loss in accuracy of performance, and it also has the lowest cost [21]. We then use the term frequency inverse document frequency (tf-idf) [19] to determine $w_{ik}$ ($1 \le k \le m$ and $1 \le i \le n$):

$$w_{ik} = \frac{tf_{ik} \cdot \log(n / df_k)}{s_i},$$

where $tf_{ik}$ is the frequency of the term $t_k$ in the $i$-th web page, $df_k$ is the document frequency of term $t_k$, and $s_i$ is the normalization component. Finally, the length of each web page vector is normalized to have unit $L_2$ norm [19], that is,

$$s_i = \left( \sum_{k=1}^{m} \left( tf_{ik} \cdot \log(n / df_k) \right)^2 \right)^{1/2}.$$

The normalization ensures that web pages dealing with the same subject matter, but differing in length, lead to similar web page vectors [19]. The cosine measure is then applied to compute similarity between vectors $d_i$ and $d_j$:

$$\cos(d_i, d_j) = \frac{d_i \cdot d_j}{\|d_i\| \, \|d_j\|},$$

where $\cdot$ denotes the dot-product of vectors and $\|d\|$ is the length of the vector. Since each web page vector is normalized to have unit length, the above formula is simplified to $\cos(d_i, d_j) = d_i \cdot d_j$.
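The weighting and similarity steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `tfidf_vectors` and `cosine` are hypothetical, documents are assumed to arrive as already tokenized, stop-word-filtered and stemmed term lists, and document-frequency thresholding is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) where each vector holds unit-length tf-idf weights.

    docs is a list of token lists (hypothetical input format).
    """
    n = len(docs)
    # Document frequency df_k: number of documents that contain term k.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Unnormalized weight tf_ik * log(n / df_k).
        raw = [tf[t] * math.log(n / df[t]) for t in vocab]
        # Normalization component s_i (the L2 norm of the raw weights).
        s = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / s if s > 0 else 0.0 for x in raw])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two unit-length vectors, i.e. their dot product."""
    return sum(a * b for a, b in zip(u, v))
```

Because every vector is already scaled to unit length, `cosine` needs no division by the vector norms, which mirrors the simplification stated in the text.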
Cosine measure has been tested to be one of the best similarity measures compared to the Dice coefficient, extended Jaccard, Euclidean, and Pearson correlation measures in the web page domain [4,20,22].

3.2. Generating All-k-nearest-neighbor Matrix

Finding the k nearest neighbors of each web page can be solved by brute force using $O(n^2)$ similarity computations. Fortunately, there are fast algorithms to solve the all-k-nearest-neighbor (Aknn) problem. We apply the fast algorithm presented in [7]. The fast Aknn algorithm starts with a rough guess of the set of k-nearest neighbors and refines it when more information is available throughout the process. A pivot-based index is used to index the set of nearest neighbors. The pivot-based indexing algorithm [7] relies on the triangle inequality. However, general similarity functions do not obey the triangle inequality. Thus we have to transform the similarity $s$ into the distance $t = -\log(s)$ [4] just for the Aknn problem. The algorithm has $O(n^{\alpha})$ ($\alpha \le 2$) time complexity. The value of $\alpha$ depends on how good the index is to search in the vector space.

The $n \times k$ Aknn matrix is used to construct a sparse graph, in which a vertex represents a web page and each vertex is connected with its k-nearest neighbors. Edges in the graph are weighted by the pair-wise similarities among vertices. We denote the maximum edge weight as Max, which will be used to determine thresholds in the following phase. This k-nearest-neighbor graph approach reduces redundancy, outliers and overall execution time.

3.3. The Bottom-up Cluster Merging Phase

Our bottom-up cluster merging approach operates on the idea of matching [5,6] in graph partitioning. If the edge between two vertices in the graph $G_i = (V_i, E_i)$ ($V_i$ is the set of vertices and $E_i$ is the set of edges) has been matched, it is collapsed and a multi-node consisting of these two vertices is created. A coarser graph $G_{i+1}$ is obtained by collapsing the matched adjacent vertices in $G_i$ (Figure 2 [5,6]). Each vertex in the original graph $G_0$ is regarded as a single cluster. A hierarchical structure of clusters is created in the graph coarsening procedure.

Figure 2. Matching vertices to coarsen a graph.

We define a new matching method called Heavy Connectivity Matching (HCM) for allowing more than two vertices to be merged in each stage. We also define a new similarity measure called edge connectivity for taking into consideration the inter-connectivity between vertices and the internal connectivity within the vertices.
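As a point of reference for the graph that this phase coarsens, the sparse k-nearest-neighbor graph of Section 3.2 can be sketched as follows. This is a hedged illustration only: it uses the brute-force $O(n^2)$ search mentioned in the text rather than the pivot-based index of [7], the function name `knn_graph` and the edge-dictionary layout are assumptions, and pairs with non-positive similarity are simply skipped because $-\log(s)$ is undefined for them.

```python
import math


def knn_graph(vectors, k):
    """Brute-force all-k-nearest-neighbor graph over unit-length vectors.

    Returns {(i, j): similarity} with i < j for every kept edge.
    """
    n = len(vectors)
    edges = {}
    for i in range(n):
        candidates = []
        for j in range(n):
            if i == j:
                continue
            s = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if s <= 0:
                continue  # -log(s) is undefined here; treat as "not a neighbor"
            # Distance t = -log(s): higher similarity means smaller distance.
            candidates.append((-math.log(s), j, s))
        # Keep the k candidates with the smallest distance.
        for _, j, s in sorted(candidates)[:k]:
            edges[(min(i, j), max(i, j))] = s
    return edges


# The maximum edge weight ("Max" in the text) is then max(edges.values()).
```

Storing each undirected edge once under a sorted key keeps the graph sparse even when two pages list each other as neighbors.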
Edge connectivity between vertices $u$ and $v$ is defined as:

$$\frac{\mathrm{in\_edge}(u) + \mathrm{in\_edge}(v) + \mathrm{cr\_edge}(u, v)}{|u \cup v|}$$

where in_edge(u) is the sum of the weights of edges connecting sub-vertices in vertex $u$ if $u$ contains more than one vertex, and 0 otherwise; cr_edge(u,v) is the weight of the edge crossing between vertices $u$ and $v$; and $|u \cup v|$ is the number of edges in the union of $u$ and $v$.

Our Heavy Connectivity Matching method proceeds by visiting vertices in an arbitrary order. If a vertex $u$ has not been matched yet, we select its unmatched adjacent vertices such that the edge connectivity between $u$ and its unmatched adjacent vertices is larger than a threshold. Vertex $u$ and its matched adjacent vertices are then combined to form a multi-node for the next coarser graph.

In order to preserve the connectivity information in the coarser graph, we update edge weights after each stage of coarsening the graph. Let $V_i^v$ be the set of vertices of $G_i$ combined to form vertex $v$ of the next coarser graph $G_{i+1}$. We compute in_edge(v) to be the sum of weights of edges connecting the vertices within $V_i^v$. In the case where more than one vertex of $V_i^v$ contains edges to another vertex $u$, the weight of cr_edge(v,u) is updated as the sum of the weights of edges connecting $v$ and $u$.

HCM is applied successively to coarsen the graph. In each stage the threshold $\theta$ is divided by a decay factor $\beta$ ($\beta > 1$) [18]. $\theta$ equals $Max/\beta$ for the first stage. During the $i$-th stage, the edges which have weights in the range $(Max/\beta^{i} \sim Max/\beta^{i+1})$ are matched and collapsed. $\beta$ controls the speed of coarsening and guarantees that a certain number of edges are matched and collapsed during each stage. The hierarchical structure of clusters is thus created during this merging phase. The merging procedure stops when the highest edge connectivity in the coarsest graph is below a stopping factor that is a function of $Max/\beta$.

3.4. The Top-down Refinement Phase

After grouping vertices in the greedy hierarchical way, we successively refine the clusters as we project the coarser graph $G_{i+1}$ down to the larger finer graph $G_i$.
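Looking back at the merging phase of Section 3.3 before the refinement details continue, one stage of Heavy Connectivity Matching can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the names `edge_connectivity` and `hcm_stage` are hypothetical, each multi-node is assumed to carry its internal weight sum and internal edge count, each crossing entry is assumed to carry a weight sum and an edge count, and the update of weights for the next coarser graph is left out.

```python
def edge_connectivity(node_u, node_v, crossing):
    """(in_edge(u) + in_edge(v) + cr_edge(u, v)) / |u U v|, with |u U v| counted in edges.

    node_u, node_v: (internal_weight_sum, internal_edge_count)
    crossing:       (crossing_weight_sum, crossing_edge_count)
    """
    total_weight = node_u[0] + node_v[0] + crossing[0]
    total_edges = node_u[1] + node_v[1] + crossing[1]
    return total_weight / total_edges


def hcm_stage(nodes, cross, threshold):
    """One coarsening stage; returns the groups of node ids merged into multi-nodes.

    nodes: {id: (internal_weight_sum, internal_edge_count)}
    cross: {(a, b): (weight_sum, edge_count)} for adjacent nodes a and b
    """
    adjacency = {}
    for (a, b), wc in cross.items():
        adjacency.setdefault(a, []).append((b, wc))
        adjacency.setdefault(b, []).append((a, wc))

    matched = set()
    groups = []
    for u in nodes:  # arbitrary visiting order
        if u in matched:
            continue
        group = [u]
        matched.add(u)
        # Pull in every unmatched neighbor whose connectivity to u exceeds the threshold.
        for v, wc in adjacency.get(u, []):
            if v not in matched and edge_connectivity(nodes[u], nodes[v], wc) > threshold:
                group.append(v)
                matched.add(v)
        groups.append(group)
    return groups
```

For two single vertices the measure reduces to the weight of the edge between them, which is why the first stage can use a threshold derived directly from Max.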
Obtaining the larger $G_i$ from the coarser $G_{i+1}$ is done simply by transforming multi-node $v$ of $G_{i+1}$ back to individual vertices, $V_i^v$, of $G_i$. Since $G_i$ is finer, it provides more degrees of freedom that can be used to refine clustering. The refinement algorithm is used to reduce the inter-connectivity between clusters (or multi-nodes). The inter-connectivity between clusters A and B is defined as:

$$gain_{A,B} = \frac{\sum_{i \in A,\, j \in B} weight(i, j)}{|A| \, |B|},$$

where vertex $i$ belongs to cluster A (or multi-node A), vertex $j$ belongs to cluster B, and $|A|$ is the size of cluster A. If a vertex in A is swapped to cluster B and decreases the value of gain, then the vertex should be moved to cluster B. The gain is similar to the ratio-cut heuristic in [17].

Given the definition of gain, for each vertex $u$, we compute the improvement if $u$ is moved from the cluster it belongs to, to one of the other clusters that $u$ is connected to. The improvement is indicated by the heuristic value of (gain-before-swap - gain-after-swap). The Kernighan-Lin algorithm (KL) [5,6] then proceeds by repeatedly selecting a vertex $u$ with the highest heuristic value and moving it to the desired cluster. After moving $u$, $u$ will not be moved again and the heuristic values of the vertices adjacent to $u$ are updated to reflect the change. In each finer graph, the KL algorithm is terminated when no more vertex moving will decrease the inter-connectivity between clusters. This refinement algorithm is applied at each successive finer graph.

For the example in Figure 1, we compute the improvement if any vertex is moved to the other cluster which it is connected to. As we can see, D will be moved to the other cluster (ABC) since its value of (gain-before-swap - gain-after-swap) is ( =0.37). Further moving will not yield any improvement. This illustrates that our refinement method can improve clustering and obtain the correct solution.

We can see that the top-down (coarsest-finest) refinement approach operates at different representation scales and can easily identify groups of vertices to be moved together. Thus this multi-level refinement approach can climb out of local minima very effectively [5,6].

3.5. Concept Extraction

We extract cluster concepts by selecting the most important terms from each cluster. We apply the most frequent and predictive term method to extract the concepts of clusters, since it achieved the best performance over the $\chi^2$ method, the most frequent term method, and the most predictive term method [24]. The most frequent and predictive word method selects terms based on the product of local frequency and predictiveness:

$$p(term \mid cluster) \cdot \frac{p(term \mid cluster)}{p(term)},$$

where $p(term \mid cluster)$ is the frequency of the term in the cluster and $p(term)$ is the term's frequency in the whole collection.
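The scoring rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the method of [24] as implemented there: the function name `top_terms` is hypothetical, probabilities are estimated as relative term counts, the cluster's documents are assumed to be a subset of the collection (so $p(term)$ is never zero), and ties are broken alphabetically.

```python
from collections import Counter


def top_terms(cluster_docs, all_docs, k):
    """Rank terms of a cluster by p(term|cluster) * p(term|cluster) / p(term)."""
    cluster_tf = Counter(t for doc in cluster_docs for t in doc)
    collection_tf = Counter(t for doc in all_docs for t in doc)
    n_cluster = sum(cluster_tf.values())
    n_collection = sum(collection_tf.values())

    scores = {}
    for term, count in cluster_tf.items():
        p_term_cluster = count / n_cluster               # local frequency
        p_term = collection_tf[term] / n_collection      # frequency in the whole collection
        scores[term] = p_term_cluster * (p_term_cluster / p_term)

    # Highest score first; alphabetical order breaks ties deterministically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A term that is frequent in the cluster but also frequent everywhere is penalized by the division by $p(term)$, while a rare term must still occur often in the cluster to score well.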
The k highest-ranking terms are thus extracted from the cluster to represent the concept.

4. Analysis of Computational Complexity

The overall computational complexity of our new algorithm depends on the time complexity of building the all-k-nearest-neighbor matrix and the amount of time it requires to perform the bottom-up and top-down phases of the clustering algorithm. The time complexity of finding the Aknn has been discussed in the previous section, which is $O(n^{\alpha})$ ($\alpha \le 2$).

The amount of time required by the merging phase depends on the rate at which the size of successively coarser graphs decreases. If the size of successively coarser graphs decreases by a constant factor, then the complexity of the algorithm is linear in the number of vertices and the number of edges in the graph [5,6]. In our new edge matching approach, since Max and the decay factor $\beta$ control the speed of coarsening the graph, an appropriate value of $\beta$ may guarantee that a number of edges are matched and collapsed during each stage. In this case, the bottom-up cluster merging phase has $O(n)$ time complexity, because in an Aknn sparse graph the number of edges is linear in the number of vertices. In the worst case, when the size of successively coarser graphs decreases by only a few vertices at a time, the complexity of the merging algorithm will be quadratic in the number of vertices in the graph.

The complexity of the refinement phase is the same as that of the merging phase since both of them are multilevel algorithms. (The KL algorithm improved by Fiduccia and Mattheyses [23] reduces complexity to $O(|E|)$ by using appropriate data structures.) Therefore the average complexity of the overall procedure is determined by constructing the Aknn graph, which takes $O(n^{\alpha})$ time.

5. Conclusions and Future Work

In this paper we presented the comprehensive process for clustering web pages and extracting cluster concepts. More importantly, we proposed a new BHC algorithm based in part on multilevel graph partitioning.
We defined a new edge matching method that prefers merging the sub-clusters whose edge connectivity is high in the bottom-up cluster-merging phase. We also used an objective function for the top-down refinement procedure that decreases the inter-connectivity between different clusters. Thus the new algorithm tries to maximize the intra-cluster similarity in the bottom-up cluster-merging phase and to minimize the inter-cluster similarity in the top-down refinement phase. The advantages of our algorithm are that it eliminates the errors occurring in greedy clustering algorithms and that its multilevel refinement procedure is very effective in climbing out of local minima. The average time complexity of our new algorithm is $O(n^{\alpha})$, which is also faster than the common HAC algorithm ($O(n^2)$).

We believe that the new algorithm will have good performance in the near future, since the Aknn algorithm, the multilevel graph partitioning and the KL algorithm were well studied and implemented. However, as we can see, the choice of proper objective functions is essential for the overall success of our algorithm. Another problem is the method we used to extract concepts of clusters. Using important terms to represent concepts of clusters is the simplest way, but more sophisticated methods remain to be developed. Our future work includes investigating more sophisticated methods for clustering based on contextual meaning of web pages and incorporating them with our proposed classification system [25,26] into our web-page Classification and Search Engine.

References

[1] B. S. Everitt, S. Landau, and M. Leese, Cluster Analysis, Arnold, London, Great Britain.
[2] A. K. Jain, M. N. Murty, and P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys, Vol. 31, No. 3, September 1999.
[3] O. Zamir and O. Etzioni, Web Document Clustering: A Feasibility Demonstration, in Proc. 21st Annu. Int. ACM SIGIR Conf., 1998.
[4] A. Strehl, Relationship-based Clustering and Cluster Ensembles for High-dimensional Data Mining, Dissertation, The University of Texas at Austin, May.
[5] G. Karypis and V. Kumar, Multilevel k-way Partitioning Scheme for Irregular Graphs, Journal of Parallel and Distributed Computing, 48(1), 1998.
[6] G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal of Scientific Computing, 20(1), 1999.
[7] E. Chavez, K. Figueroa, and G. Navarro, A Fast Algorithm for the All K Nearest Neighbors Problem in General Metric Spaces, umich.mx/~elchavez/publica/.
[8] M. Steinbach, G. Karypis, V. Kumar, A Comparison of Document Clustering Techniques, KDD 2000, Technical report of University of Minnesota.
[9] R. Ng and J. Han, Efficient and Effective Clustering Methods for Spatial Data Mining, VLDB-94.
[10] G. Karypis, E.-H. Han, V. Kumar, CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling, IEEE Computer, 32(8), August 1999.
[11] T. Zhang, R. Ramakrishnan and M. Livny, BIRCH: an Efficient Data Clustering Method for Very Large Databases, Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, 1996.
[12] S. Guha, R. Rastogi, and K. Shim, CURE: A Clustering Algorithm for Large Databases, Proceedings of the ACM SIGMOD Conference on Management of Data, 1998.
[13] J. Sander, M. Ester, H. P. Kriegel and X. Xu, Density-based Clustering in Spatial Databases: The Algorithm GDBSCAN and its Applications, An International Journal 2(2), Kluwer Academic Publishers, Norwell, MA., June 1998.
[14] A. Hinneburg and D. A. Keim, An Optimal Grid-clustering: Towards Breaking the Curse of Dimensionality in High-dimensional Clustering, VLDB-99.
[15] B. Liu, Y. Xia, P. S. Yu, Clustering Through Decision Tree Construction, SIGMOD.
[16] M. Goldszmidt and M. Sahami, A Probabilistic Approach to Full-Text Document Clustering, Technical Report ITAD-433-MS, SRI International.
[17] G. Karypis, E.-H. Han, and V. Kumar, Multilevel Refinement for Hierarchical Clustering, fismat.umich.mx/~elchavez/publica/.
[18] K. Rajaraman and H. Pan, Document Clustering using 3-tuples, PRICAI'2000 International Workshop on Text and Web Mining, Melbourne, Australia, Sep. 2000, p88-9.
[19] I. S. Dhillon, J. Fan and Y. Guan, Efficient Clustering of Very Large Document Collections, Data Mining for Scientific and Engineering Applications, Kluwer Academic Publishers.
[20] C. J. Rijsbergen, Information Retrieval, Butterworths.
[21] Y. Yang and J. O. Pedersen, A Comparative Study on Feature Selection in Text Categorization, ~yiming/papers.yy/ml97.ps.
[22] A. Strehl, J. Ghosh, and R. Mooney, Impact of Similarity Measures on Web-page Clustering, Proceedings of the AAAI2002 Workshop on Artificial Intelligence for Web Search, AAAI/MIT Press, Austin, Texas, July 2002, pp. 58-64.
[23] C. M. Fiduccia and R. M. Mattheyses, A Linear Time Heuristic for Improving Network Partitions, Proceedings 19th IEEE Design Automation Conference, 1982.
[24] A. Popescul and L. H. Ungar, Automatic Labeling of Document Clusters, http: // ~popescul/publications.html.
[25] X. Peng & B. Choi, Automatic Web Page Classification in a Dynamic and Hierarchical Way, IEEE International Conference on Data Mining, 2002.
[26] B. Choi, Making Sense of Search Results by Automatic Web-page Classifications, WebNet 2001, 2001.

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Clustering is a discovery process in data mining.

Clustering is a discovery process in data mining. Cover Feature Chameleon: Herarchcal Clusterng Usng Dynamc Modelng Many advanced algorthms have dffculty dealng wth hghly varable clusters that do not follow a preconceved model. By basng ts selectons on

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Analyzing Popular Clustering Algorithms from Different Viewpoints

Analyzing Popular Clustering Algorithms from Different Viewpoints 1000-9825/2002/13(08)1382-13 2002 Journal of Software Vol.13, No.8 Analyzng Popular Clusterng Algorthms from Dfferent Vewponts QIAN We-nng, ZHOU Ao-yng (Department of Computer Scence, Fudan Unversty, Shangha

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Web Mining: Clustering Web Documents A Preliminary Review

Web Mining: Clustering Web Documents A Preliminary Review Web Mnng: Clusterng Web Documents A Prelmnary Revew Khaled M. Hammouda Department of Systems Desgn Engneerng Unversty of Waterloo Waterloo, Ontaro, Canada 2L 3G1 hammouda@pam.uwaterloo.ca February 26,

More information

Clustering. A. Bellaachia Page: 1

Clustering. A. Bellaachia Page: 1 Clusterng. Obectves.. Clusterng.... Defntons... General Applcatons.3. What s a good clusterng?. 3.4. Requrements 3 3. Data Structures 4 4. Smlarty Measures. 4 4.. Standardze data.. 5 4.. Bnary varables..

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

Graph-based Clustering

Graph-based Clustering Graphbased Clusterng Transform the data nto a graph representaton ertces are the data ponts to be clustered Edges are eghted based on smlarty beteen data ponts Graph parttonng Þ Each connected component

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS

SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS SCALABLE AND VISUALIZATION-ORIENTED CLUSTERING FOR EXPLORATORY SPATIAL ANALYSIS J.H.Guan, F.B.Zhu, F.L.Ban a School of Computer, Spatal Informaton & Dgtal Engneerng Center, Wuhan Unversty, Wuhan, 430079,

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

Problem 1. We reduce vertex cover to MAX-SAT with weights, such that the

Concurrent Apriori Data Mining Algorithms

Vassil Halatchev, Department of Electrical Engineering and Computer Science, York University, Toronto. October 8, 2015. Outline: Why it is important; Introduction to Association Rule Mining

Efficient Segmentation and Classification of Remote Sensing Image Using Local Self Similarity

ISSN (Online): 2320-9801, ISSN (Print): 2320-9798. International Journal of Innovative Research in Computer and Communication Engineering (An ISO 3297: 2007 Certified Organization), Vol. 2, Special Issue 1, March 2014. Proceedings

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

Xiao Fang and Olivia R. Liu Sheng, Department of Management Information Systems, University of Arizona, AZ 8572. {xfang,sheng}@bpa.arizona.edu. Submitted

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

http://www.face-rec.org/algorithms/ Overview of face recognition algorithms. Correlation: pixel-based correspondence between two face images. Structural

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

I. Dissimilarity Index. Measurement: The following formula can be used to measure the evenness between

An Optimal Algorithm for Prufer Codes *

J. Software Engineering & Applications, 2009, 2: 111-115. doi:10.4236/jsea.2009.22016. Published Online July 2009 (www.scirp.org/journal/jsea). Xiaodong Wang 1, 2, Lei Wang 3,

Advanced in Control Engineering and Information Science

Available online at www.sciencedirect.com. Procedia Engineering 15 (2011) 1642-1646. www.elsevier.com/locate/procedia

An Improved Image Segmentation Algorithm Based on the Otsu Method

13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Mengxing Huang, Wenjiao Yu,

K-means and Hierarchical Clustering

Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your

BRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conf

1. Introduction. Abstract

Image Retrieval Using a Hierarchy of Clusters. Daniela Stan & Ishwar K. Sethi, Intelligent Information Engineering Laboratory, Department of Computer Science & Engineering, Oakland University, Rochester, Michigan 48309-4478

Module Management Tool in Software Development Organizations

Journal of Computer Science (5): 8-, 7 ISSN 59-66 7 Science Publications. Ahmad A. Al-Rababah and Mohammad A. Al-Rababah, Faculty of IT, Al-Ahliyyah Amman University,

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

17th European Symposium on Computer Aided Process Engineering, ESCAPE17. V. Plesu and P.S. Agachi (Editors). 2007 Elsevier B.V. All rights reserved.

A New Approach For the Ranking of Fuzzy Sets With Different Heights

Pushpinder Singh, School of Mathematics and Computer Applications, Thapar University, Patiala-147 004, India. pushpindersnl@gmail.com. Abstract: Ranking of fuzzy sets plays

Keyword-based Document Clustering

Seung-Shik Kang, School of Computer Science, Kookmin University & AIrc, Chungnung-dong, Songbuk-gu, Seoul 36-72, Korea. sskang@kookmin.ac.kr. Abstract: Document clustering is an aggregation of

Optimizing Document Scoring for Query Retrieval

Brent Ellwein, baellwe@cs.stanford.edu. Abstract: The goal of this project was to automate the process of tuning a document query engine. Specifically, I used machine learning

Study of Data Stream Clustering Based on Bio-inspired Model

pp. 412-418, http://dx.doi.org/10.14257/astl.2014.53.86. Yingmei Li, Min Li, Jingbo Shao, Gaoyang Wang, College of Computer Science and Information Engineering,

From Comparing Clusterings to Combining Clusterings

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008). Zhiwu Lu and Yuxin Peng and Jianguo Xiao, Institute of Computer Science and Technology,

Feature Reduction and Selection

Dr. Shuang LIANG, School of Software Engineering, TongJi University, Fall 2012. Today's Topics: Introduction; Problems of Dimensionality; Feature Reduction; Statistic methods; Principal Components

Private Information Retrieval (PIR)

Levente Buttyán. Problem formulation: Alice wants to obtain information from a database, but she does not want the database to learn which information she wanted, e.g., Alice is an investor querying a stock-market

APPLIED MACHINE LEARNING

Methods for Clustering: K-means, Soft K-means, DBSCAN. Objectives: Learn basic techniques for data clustering: K-means and soft K-means, GMM (next lecture), DBSCAN. Understand the issues and major challenges in clustering

An Image Fusion Approach Based on Segmentation Region

Rong Wang, Li-Qun Gao, Shu Yang, Yu-Hua Chai, and Yan-Chun Liu

CS 534: Computer Vision Model Fitting

Spring 2004, Ahmed Elgammal, Dept of Computer Science. Outlines: Model fitting is important; Least-squares fitting; Maximum likelihood estimation; MAP estimation; Robust

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Australian Journal of Basic and Applied Sciences, 2(4): 1204-1208, 2008, ISSN 1991-8178. Sanjay Jain and Kailash Lachhwani

Document Representation and Clustering with WordNet Based Similarity Rough Set Model

IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 5, No 3, September 2011, ISSN (Online): 1694-0814, www.ijcsi.org

Performance Evaluation of Information Retrieval Systems

Why System Evaluation? Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE) who in turn adapted them from Prof. Dik Lee (Univ. of Science

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF 2006. Miguel E. Ruiz (1), Stuart Shapiro (2), June Abbas (1), Silvia B. Southwick (1) and David Mark (3), State University of New York at Buffalo. (1) Department of Library and Information Studies (2) Department

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

NAWARA CHANSIRI, SIRIPORN SUPRATID, CHOM KIMPAN. Faculty of Information Technology, Rangsit University, Muang-Ake, Paholyotin Road, Patumtani,

Classic Term Weighting Technique for Mining Web Content Outliers

International Conference on Computational Techniques and Artificial Intelligence (ICCTAI'2012), Penang, Malaysia. W.R. Wan Zulkifeli, N. Mustapha, and A. Mustapha

Object-Based Techniques for Image Retrieval

Zhang, Gao, & Luo. Chapter VII. Y. J. Zhang, Tsinghua University, China; Y. Y. Gao, Tsinghua University, China; Y. Luo, Tsinghua University, China. ABSTRACT: To overcome the

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram

Department of Computer Science, Yonsei University, Kwon Yun. CONTENTS: Review Topic; Proposed Method; System Overview; Sketch Normalization

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Edgar Moyotl-Hernández, Héctor Jiménez-Salazar, Facultad de Ciencias de la Computación, B. Universidad Autónoma de Puebla, 14 Sur

Optimal Workload-based Weighted Wavelet Synopses

Yossi Matias, School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel, matias@tau.ac.il. Daniel Urieli, School of Computer Science, Tel Aviv University, Tel Aviv 69978,

CHAPTER 2 DECOMPOSITION OF GRAPHS

INTRODUCTION. A graph H is called a Supersubdivision of a graph G if H is obtained from G by replacing every edge uv of G by a bipartite graph $K_{2,m}$ (m may vary for each edge) by identifying

Keywords - Web page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

(IJCSIS) International Journal of Computer Science and Information Security. Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration. Wongkot Sriurai, Phayung Meesad, Choochart Haruechaiyasak

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

Paul W.H. Kwan, Kazuo Toraichi 2, Keisuke Kameyama 2, Junbin Gao 3, Nobuyuki Otsu 4. School of Mathematics, Statistics and Computer Science,

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Nizhni Novgorod, 2016. Contents: Specification of Polyhedron software; Theoretical background; 1. Interface of Polyhedron; 1.1.

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design, Spring 2014. USC / Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, California 90292. pedro@isi.edu. Register

Survey of Cluster Analysis and its Various Aspects

Harminder Kaur et al, International Journal of Computer Science and Mobile Computing, Vol. 4, Issue 10, October 2015, pg. 353-363. Available Online at www.ijcsmc.com

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China

Guangdong University of Technology, Guangdong, 0503, China. E-mail: 6085@qq.com. Mei Zhang, Guangdong University of Technology, Guangdong, 0503, China. E-mail: 64605455@qq.com. Database clustering

Detection of an Object by using Principal Component Analysis

1. G. Nagaveni, 2. Dr. T. Sreenivasulu Reddy. 1. M.Tech, Department of EEE, SVUCE, Tirupathi, India. 2. Assoc. Professor, Department of ECE, SVUCE, Tirupathi,

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Self-Organizing Maps (SOM). Turgay İBRİKÇİ, PhD. Outline: Introduction; Structures of SOM; SOM Architecture; Neighborhoods; SOM Algorithm; Examples; Summary. Unsupervised Hebbian Learning. US Hebbian Learning, Cntd.

X- Chart Using ANOM Approach

ISSN 1684-8403, Journal of Statistics, Volume 17, 2010, pp. 3-3. Gullapalli Chakravarthi and Chaluvadi Venkateswara Rao. Abstract: Control limits for individual measurements (X) chart are

Clustering Algorithm of Similarity Segmentation based on Point Sorting

International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015). Hanbing Li, Yan Wang*, Lan Huang, Mingda Li, Ying Sun, Hanyuan

A Binarization Algorithm specialized on Document Images and Photos

Ergina Kavallieratou, Dept. of Information and Communication Systems Engineering, University of the Aegean. kavallieratou@aegean.gr. Abstract: In this paper, a

Personalized Concept-Based Clustering of Search Engine Queries

IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID. Kenneth Wai-Ting Leung, Wilfred Ng, and Dik Lun Lee. Abstract: The exponential growth of information

A Knowledge Management System for Organizing MEDLINE Database

Hyunki Kim, Su-Shing Chen, Computer and Information Science Engineering Department, University of Florida, Gainesville, Florida 32611, USA. With the explosion

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

ARPN Journal of Engineering and Applied Sciences. 2006-2017 Asian Research Publishing Network (ARPN). All rights reserved. Igor Grigoryev, Svetlana

Collaboratively Regularized Nearest Points for Set Based Recognition

Academic Center for Computing and Media Studies, Kyoto University. Yang Wu, Michihiko Minoh, Masayuki Mukunoki, Kyoto University. 9/1/013, BMVC 2013 @ Bristol,

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Malaysian Journal of Mathematical Sciences 11(S) April: 35-46 (2017). Special Issue: The 2nd International Conference and Workshop on Mathematical Analysis (ICWOMA 2016).

A Comparative Study for Outlier Detection Techniques in Data Mining

Zuriana Abu Bakar, Rosmayati Mohemad, Akbar Ahmad, Department of Computer Science, Faculty of Science and Technology, University College of Science and

Problem Set 3 Solutions

Introduction to Algorithms, October 4, 2002. Massachusetts Institute of Technology, 6.046J/18.410J. Professors Erik Demaine and Shafi Goldwasser. Handout 14. (Exercises were not to be turned in,

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Daoqiang Zhang, Songcan Chen and Zhi-Hua Zhou*. National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China; Department

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

B. Liu, Q. Chen and Q. Zhang, J. J. Liang, P. N. Suganthan, B. Y. Qu. Department of Computing, Glyndwr University, UK; Facility

A Topology-aware Random Walk

Inkwan Yu, Richard Newman, Dept. of CISE, University of Florida, Gainesville, Florida, USA. Abstract: When a graph can be decomposed into clusters of well connected subgraphs, it is possible

Network Intrusion Detection Based on PSO-SVM

TELKOMNIKA Indonesian Journal of Electrical Engineering, Vol.1, No., February 2014, pp. 150 ~ 1508. DOI: http://dx.doi.org/10.11591/telkomnika.v1.386. Changsheng Xiang*

Hierarchical agglomerative. Cluster Analysis. Christine Siedle Clustering 1

Christine Siedle, 19-3-2004. Classification: Basic (unconscious & conscious) human strategy to reduce complexity. Always biased. Cluster analysis to find or confirm types

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Göksel Biricik, Banu Diri, Yildiz Technical University, Computer Engineering Department. Abstract: This paper introduces a new algorithm for dimensionality

Clustering algorithms and validity measures

M. Halkidi, Y. Batistakis, M. Vazirgiannis, Department of Informatics, Athens University of Economics & Business. Email: {mhalk, yannis, mvazirg}@aueb.gr. Abstract: Clustering aims at discovering

Support Vector Machines

MIST.6060 Business Intelligence and Data Mining. What are Support Vector Machines? Support Vector Machines (SVMs) are supervised learning techniques that analyze data and recognize patterns.

Outlier Detection Methodologies Overview

Mohd. Noor Md. Sap, Department of Computer and Information Systems, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor Bahru,

On the Network Partitioning of Large Urban Transportation Networks

Hamideh Etemadnia and Khaled Abdelghany. Abstract: This paper aims at developing a traffic network partitioning mechanism for distributed traffic management applications.

An Entropy-Based Approach to Integrated Information Needs Assessment

Distribution Statement A: Approved for public release; distribution is unlimited. June 8, 2004. William J. Farrell, Lockheed Martin Advanced Technology

Associative Based Classification Algorithm For Diabetes Disease Prediction

International Journal of Engineering Trends and Technology (IJETT), Volume-41 Number-3, November 2016. N. Gnana Deepika, Y. Surekha, G. Lalitha

Unsupervised Learning and Clustering

Why consider unlabeled samples? 1. Collecting and labeling large set of samples is costly: getting recorded speech is free, labeling is time consuming. 2. Classifier could be designed

A Novel Term_Class Relevance Measure for Text Categorization

D S Guru, Mahamad Suhil, Department of Studies in Computer Science, University of Mysore, Mysore, India. Abstract: In this paper, we introduce a new measure

Correlative features for the classification of textural images

M A Turkova and A V Gaidel. Samara National Research University, Moskovskoe Shosse 34, Samara, Russia, 443086; Image Processing Systems Institute

Lecture 5: Multilayer Perceptrons

Roger Grosse. 1 Introduction. So far, we've only talked about linear models: linear regression and linear binary classifiers. We noted that there are functions that can't be represented

Support Vector Machines

Decision surface is a hyperplane (line in 2D) in feature space (similar to the Perceptron). Arguably, the most important recent discovery in machine learning. In a nutshell: map the data to a predetermined

A Similarity Measure Method for Symbolization Time Series

Research Journal of Applied Sciences, Engineering and Technology 5(5): 1726-1730, 2013. ISSN: 2040-7459; e-ISSN: 2040-7467. Maxwell Scientific Organization, 2013. Submitted: July 27, 2012. Accepted: September 03, 2012

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Course Topics; Exams, Labs, Projects; A quick look at a few algorithms. Advanced Data Structures and Algorithms. Description: We are going to discuss algorithm complexity analysis, algorithm design techniques

On the Efficiency of Swap-Based Clustering

Pasi Fränti and Olli Virmajoki, Department of Computer Science, University of Joensuu, Finland. {franti, ovirma}@cs.joensuu.fi. Abstract. Random swap-based clustering is very simple