/02/$ IEEE

A Modified Fuzzy ART for Soft Document Clustering

Ravikumar Kondadadi and Robert Kozma
Division of Computer Science, Department of Mathematical Sciences
University of Memphis, Memphis, TN

ABSTRACT

Document clustering has become a very useful application, especially with the advent of the World Wide Web. Most of the existing document clustering algorithms either produce clusters of poor quality or are computationally very expensive. In this paper we propose a document-clustering algorithm, KMART, that uses an unsupervised Fuzzy Adaptive Resonance Theory (Fuzzy ART) neural network. A modified version of Fuzzy ART is used to enable a document to be in multiple clusters. The number of clusters is determined dynamically. Experiments are reported that compare the efficiency and execution time of our algorithm with other document-clustering algorithms such as Fuzzy C-Means. The results show that KMART is both effective and efficient.

1. INTRODUCTION

Clustering is an important tool in data mining and knowledge discovery. The ability to automatically group similar items together enables one to discover hidden similarity and key concepts. Clustering also enables one to summarize a large amount of data into a small number of groups. This serves as an invaluable tool for users to comprehend a large amount of data. The World Wide Web search engines serve as a good example of this. Clustering is used in many different fields, such as data mining [5], image compression [15] and information retrieval [16]. Reference [10] provides an extensive survey of various clustering techniques.

The World Wide Web is a large repository of many kinds of information. Its sheer size makes it hard for any user to find relevant information. Many search engines now exist that allow users to query the Web, usually via keyword search. However, since each keyword is associated with many different subjects, and the typical amount of information (web documents) returned is very large, the user is not able to get a good grasp of the output.
Usually the search results are listed by some sort of relevance measure. However, even documents on vastly different subjects can share the same high relevance scores. Thus, one needs a way to cluster the results from the web search engine to help users. Some search engines have pre-defined subjects that are used to categorize the output of search engines (for instance, yahoo.com). However, few search engines (such as Teoma.com and wisenut.com) provide a dynamic clustering mechanism, i.e., one in which clustering algorithms are applied only to the documents returned by the query. We believe that this is an important service for any search engine over the Web and is highly beneficial to users.

While there are many traditional clustering algorithms available, document clustering brings along many distinctive issues. One such issue is representation. A document is typically represented as a vector (document vector), where each dimension corresponds to a term (word), and the value denotes whether a term is present or not. In addition, similarity between documents is typically measured by some non-Euclidean measure between the vectors. This means that a document vector cannot be manipulated like normal vectors. For instance, we cannot average document vectors. This implies that algorithms that require a cluster center, like K-means [9,19], need to be modified significantly.

There are multiple ways of looking at the clustering problem. According to [11], there are four different kinds of clustering algorithms: agglomerative hierarchical algorithms, partition algorithms, model fitting algorithms and density-based algorithms. Agglomerative hierarchical clustering algorithms [7] use a bottom-up methodology to merge smaller clusters into larger ones, using techniques such as the minimal spanning tree. Partition algorithms such as K-means try to divide data into subgroups such that the partition optimizes certain criteria, like inter-cluster distance or intra-cluster distances. They typically take an iterative approach. Model fitting algorithms attempt to fit the data as a mixture of easily parameterized distributions (e.g.
multivariate normal) and estimate their parameters. Density-based algorithms, such as DBSCAN [8], view clustering as locating high-density regions.

The goal of document clustering is to categorize the documents so that all the documents in a cluster are similar. Most of the early work [9,19] applied traditional clustering algorithms like K-means to the sets of documents to be clustered. Willett [24] provided a survey on applying hierarchical clustering algorithms to document clustering. Cutting et al. [6] proposed speeding up partition-based clustering by using techniques that provide good initial clusters. Two techniques, Buckshot and Fractionation, are mentioned. Buckshot selects a small sample of documents, pre-clusters them using a standard clustering algorithm and assigns the rest of the documents to the clusters formed. Fractionation splits the N documents into m buckets, where each bucket contains N/m documents. Fractionation takes an input parameter ρ, which indicates the reduction factor for each bucket. The standard clustering algorithm is applied so that if there are n documents in each bucket, they are clustered into n/ρ clusters. Each of these clusters is then treated as if it were an individual document, and the whole process is repeated until there are only K clusters.

Most of the algorithms above use a word-based approach to find the similarity between two documents. In [26] a phrase-based approach called STC (suffix-tree clustering) was proposed. STC is a linear-time clustering algorithm. The phrase-based approach allows STC to form clusters depending not only on individual words but also on the ordering of the words.
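As a rough illustration of the Fractionation schedule described above, the following Python sketch only tracks how many items remain after each round, assuming (as in the description above) that every round shrinks the collection by a reduction factor ρ > 1 until K clusters remain. The function name `fractionation_schedule` and the rounding rule (round down, never below K) are illustrative assumptions and are not taken from [6]; the clustering performed inside each bucket is omitted.

```python
import math


def fractionation_schedule(n_docs, rho, k):
    """Return the number of items left after each Fractionation round.

    n_docs: initial number of documents (N in the text)
    rho:    reduction factor per round, assumed to be greater than 1
    k:      target number of clusters (K in the text)
    """
    if rho <= 1:
        raise ValueError("rho must be greater than 1 for the collection to shrink")
    if k < 1:
        raise ValueError("k must be at least 1")

    counts = [n_docs]
    while counts[-1] > k:
        # Each bucket of n items is reduced to about n / rho clusters, so the
        # whole collection shrinks by roughly the same factor (assumed rounding).
        reduced = math.floor(counts[-1] / rho)
        counts.append(max(k, reduced))
    return counts


print(fractionation_schedule(1000, 4, 10))  # [1000, 250, 62, 15, 10]
```

Under these assumptions the number of rounds grows only logarithmically with the number of documents, which is why the technique is attractive as a way to obtain initial clusters.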

In [18], a new method was proposed for clustering related documents using association rules and hyper-graph partitioning. This method first finds sets of terms that frequently occur together in documents using the Apriori algorithm [1]. These frequent item sets are then used to group items into hyper-graph edges, and a hyper-graph partitioning algorithm is used to find the item clusters. The similarity among items is captured implicitly by the frequent item sets. The main advantage of this method is that it does not require any distance measure to find the similarity between documents.

The clustering techniques above can be categorized as hard clustering, as every item is assigned to a single cluster. Soft clustering allows each item to be associated with multiple clusters, by introducing a membership function $W_{ij}$ between each cluster-item pair to measure the degree of association. In this paper, we propose a soft document-clustering algorithm using a modified Fuzzy Adaptive Resonance Theory network [4]. A brief description of soft clustering and of some soft document clustering algorithms is given in the next section. In the rest of this paper, we briefly discuss ART networks and then present our proposed algorithm together with our experimental results. We show that our clustering technique overcomes the problems of standard hard clustering algorithms mentioned above, without paying any price in efficiency.

2. SOFT DOCUMENT CLUSTERING

A single document very often contains multiple themes. For example, this paper can be classified into the fields of fuzzy clustering as well as neural networks. Many of the clustering algorithms mentioned above assign each document to a single cluster, thus making it hard for a user to discover such information. To remedy this situation, we can employ soft clustering. That is, each document can belong to multiple clusters, and there is a measure that determines the association between each cluster and each document. This has the following advantages:

- A document can belong to multiple clusters, so we can discover the multiple themes of a document.
- Clusters can contain combinations of themes. For instance, in our experiments, when the document set had documents related to baseball, movies and baseball movies respectively, KMART formed three clusters, one for each of these groups, whereas hard clustering algorithms like K-means failed to produce a cluster for baseball movies.
- The measure of association between clusters and documents can be used as a relevance measure to order the documents appropriately.

Many soft clustering algorithms employ the idea of fuzziness in their methods. One of the most common fuzzy clustering algorithms is Fuzzy C-Means (FCM). It was first reported by Dunn in 1972 and subsequently generalized by Bezdek [3]. FCM is based on the partition clustering approach, iterating over the data set until the values of the membership function stabilize. FCM has been used in many applications such as medical diagnosis, image analysis, irrigation design and automatic target recognition. Other fuzzy techniques, such as Self-Organizing Maps [14], also abound. Baraldi and Blonda [2] provide a good survey of such algorithms.

However, one drawback of fuzzy algorithms is that they are slow compared to non-fuzzy algorithms. Fuzzy clustering algorithms tend to be iterative, and typical fuzzy clustering algorithms require repeatedly calculating the associations between every cluster/document pair.

SISC and WBSC [12,13] are two soft document-clustering algorithms developed by one of the authors of this paper. SISC uses a modified Fuzzy C-Means algorithm to cluster documents. It uses a randomization approach that enables it to avoid many of the computations needed in a traditional fuzzy clustering algorithm. At each iteration, it computes a similarity measure between a cluster and a document with a probability proportional to the proximity of the similarity measure to the threshold measure. It also has a robust outlier-handling mechanism. WBSC [13] uses a word-based approach. It starts with each term as a cluster and clusters the terms depending on the documents they appear in.
It is a hierarchical clustering algorithm.

There has also been work on applying self-organizing maps to cluster documents. For instance, [20] discusses an adaptive approach that uses self-organizing maps to cluster documents, takes feedback from the user and re-clusters the documents. Approaches based on neural networks include one based on an adaptive bilinear retrieval model [25], and a hierarchical model based on fuzzy adaptive resonance theory [17].

In this paper, we propose a modification to the traditional Fuzzy ART algorithm, which is a hard clustering algorithm, to make it a soft clustering algorithm. The modification also removes part of the iterative search process in Fuzzy ART, making it much faster than some of the existing document-clustering algorithms. We briefly discuss ART networks in the next section.

3. ART NETWORKS

ART (Adaptive Resonance Theory) neural networks were developed by Grossberg [4] to address the stability-plasticity dilemma. A network is plastic if it can adapt to new inputs indefinitely. A network is stable if it can withstand noise. A traditional neural network uses the training data to adapt to the input, but does not adapt to test data, so it is not plastic. Also, if the training data contains erroneous information, the network adapts to that erroneous data, so it is not stable. The stability-plasticity dilemma can be posed as follows: how can a learning system be designed to remain plastic, or adaptive, and at the same time remain stable in the face of irrelevant events?

The ART networks proposed by Grossberg solve this problem. ART is an incremental algorithm, so it adapts to new inputs indefinitely. At the same time, it does not let a new input change any stored pattern unless the input pattern matches the stored pattern within a certain tolerance. This means that an ART network has both plasticity and stability: new categories can be formed when the environment does not match any of the stored patterns, but the environment cannot change stored patterns unless they are sufficiently similar. The general structure of an ART network is shown in Figure 1.

Figure 1: Architecture of an ART network

A typical ART network consists of two layers: an input layer (F1) and an output layer (F2). There are no hidden layers. The input layer contains N nodes, where N is the number of components of an input pattern. The number of nodes in the output layer is decided dynamically. Every node in the output layer has a corresponding prototype vector. The network's dynamics are governed by two subsystems: an attention subsystem and an orienting subsystem. The attention subsystem proposes a winning neuron (or category) and the orienting subsystem decides whether to accept it or not. The network is said to be in a resonant state when the orienting subsystem accepts a winning category (i.e., when the winning prototype vector matches the current input pattern closely enough).

There are many versions of ART algorithms: ART1, ART2, ARTMAP, Fuzzy ART, Fuzzy ARTMAP etc. ART1 is the basic ART network and is used for binary data. Fuzzy ART is an extension of ART1 to analog data. It uses the fuzzy AND operator instead of the crisp operator. The basic Fuzzy ART algorithm is described below.

Fuzzy ART takes three input parameters: the choice parameter ($\beta > 0$), the vigilance parameter ($0 \le \rho \le 1$) and the learning rate ($0 \le \lambda \le 1$).

Step 1: Initialization. Initialize all the parameters.

Step 2: Apply input pattern. Let $I$ := the next input vector and let $\mathcal{P}$ := the set of candidate prototype vectors.

Step 3: Category choice. Find the closest prototype vector $P \in \mathcal{P}$, i.e., the one that maximizes

$$\frac{|I \wedge P|}{\beta + |P|} \qquad (1)$$

Here $\wedge$ denotes the fuzzy AND (the component-wise minimum) and $|\cdot|$ denotes the sum of the components of a vector. $\beta$ acts as a tie breaker when multiple prototype vectors are subsets of the input pattern, and favors larger magnitude prototypes.
Step 4: Vigilance test. The prototype selected in the previous step undergoes a vigilance test that compares the similarity between the winning prototype and the current input pattern against the user-defined vigilance parameter:

$$\frac{|I \wedge P|}{|I|} \ge \rho \qquad (2)$$

If the prototype passes the vigilance test, it is adapted to the given input pattern (Step 5). Otherwise, the current prototype is deactivated for the current input pattern and the other prototypes in the F2 layer undergo the vigilance test in turn until one of them passes. If none of them passes the test, a new prototype is created for the current input pattern, and the algorithm returns to Step 2 to continue with the next input.

Step 5: Matched prototype update. The matched prototype is updated to move closer to the current input pattern according to

$$P = \lambda (I \wedge P) + (1 - \lambda) P \qquad (3)$$

where $\lambda$ is the learning rate. If $\lambda$ is 1, this is called fast learning. After the update, all the prototypes are reactivated and the algorithm continues with the next input (Step 2).

The Fuzzy ART algorithm described above is a hard clustering algorithm. We modified Fuzzy ART to make it a soft clustering algorithm. The resulting algorithm is called KMART (Kondadadi & Kozma Modified ART). In the next section we present KMART.

4. KMART

Although Fuzzy ART has the word fuzzy in its name, this refers only to its ability to work with fuzzy data. It still categorizes a given set of data items into different partitions, i.e., it is a hard clustering algorithm, so it cannot be used effectively for soft document clustering. KMART can be broadly divided into three stages: pre-processing, cluster building and keyword selection.

4.1 Pre-processing:

In this stage, stop words are removed from all the documents. The algorithm maintains a common list of stop words such as articles, prepositions, auxiliary verbs etc. Then the words of all documents are combined and redundant terms are removed to form a list of the unique words in all the documents together. A document vector is formed for each document. The length of the vector is the total number of unique words in all documents, and each component of the vector is the frequency of the corresponding word if the word appears in the document and zero otherwise.

4.2 Cluster Building:

A modified version of Fuzzy ART is used for cluster building. We propose a change to the existing Fuzzy ART algorithm to make it a soft clustering algorithm. Instead of choosing the category with maximum similarity and applying the vigilance test to check whether it is close enough to the input pattern, we check every category in the F2 layer: we apply the vigilance test to each category, and if the category passes, the input document is put into that category. The similarity measure computed in the vigilance test defines the degree of membership of the given input pattern in the current cluster. This enables a document to be in multiple clusters with varying degrees of membership. All the prototypes that pass the vigilance test are updated according to (3).

This modification has other advantages apart from allowing soft clustering. Fuzzy ART is generally time consuming because it involves an iterative search for a winning category that satisfies the vigilance test. In our modification there is no search, because every F2 node is checked. This makes it computationally less expensive. Another advantage is that by eliminating the category choice step we avoid the use of the choice parameter, thereby reducing the number of user-defined parameters in the system.

This modification also does not violate the underlying principle of ART networks, i.e., resolving the stability-plasticity dilemma. KMART is still an incremental clustering algorithm and is thus plastic. It is also stable: before learning a new input it checks the input against the stored patterns, and an existing pattern is updated only if the input matches it within a certain tolerance.

4.3 Keyword selection:

The final step in KMART is to display representative keywords for each cluster formed in the previous stage. This allows users to distinguish among different clusters.
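The pre-processing and cluster-building stages of Sections 4.1 and 4.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the stop-word list, the whitespace tokenizer and the function names (`build_vectors`, `kmart_clusters`) are hypothetical, and the rule that a new prototype is created when no existing prototype passes the vigilance test is carried over from Step 4 of Fuzzy ART.

```python
from collections import Counter

# Hypothetical, tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "is", "in", "and", "to"}


def build_vectors(docs):
    """Section 4.1: term-frequency vectors over the unique non-stop words."""
    tokenized = [[w for w in d.lower().split() if w not in STOP_WORDS] for d in docs]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def kmart_clusters(vectors, rho, lam):
    """Section 4.2: apply the vigilance test (2) to every prototype.

    Returns the prototypes and, for each input, a dict that maps the index
    of each cluster it joined to its degree of membership.
    """
    prototypes = []
    memberships = []
    for vec in vectors:
        joined = {}
        size = sum(vec)  # |I|
        for j, proto in enumerate(prototypes):
            if size == 0:
                break  # an empty document cannot match anything
            overlap = [min(x, p) for x, p in zip(vec, proto)]  # I AND P
            match = sum(overlap) / size  # left-hand side of (2)
            if match >= rho:
                joined[j] = match
                # Update rule (3): P = lam * (I AND P) + (1 - lam) * P
                prototypes[j] = [lam * o + (1 - lam) * p
                                 for o, p in zip(overlap, proto)]
        if not joined and size > 0:
            # Assumption: no prototype matched, so the input founds a new cluster.
            prototypes.append(list(vec))
            joined[len(prototypes) - 1] = 1.0
        memberships.append(joined)
    return prototypes, memberships


# Toy example: the third vector shares half of its weight with each of the first two.
toy = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
protos, members = kmart_clusters(toy, rho=0.5, lam=1.0)
print(members[2])  # {0: 0.5, 1: 0.5}
```

In the toy run the third input passes the vigilance test for both existing prototypes and therefore belongs to both clusters with membership 0.5, which is the soft-clustering behaviour that the category choice step of Fuzzy ART would have prevented. Each input is compared once with each prototype, so there is no repeated search.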
For each cluster, we rank the words in that cluster according to the number of documents in the cluster in which the word appears and the similarity (as defined by the vigilance test) of the documents in which the word appears. We generally display the first 7-10 words as keywords.

5. EXPERIMENTS

In this section, we describe the experiments conducted and analyze the results. We compared our algorithm with soft clustering algorithms like SISC [12] and with hard clustering algorithms like K-means [19] and Fractionation [6].

5.1 Data & Experimental Environment:

We manually downloaded 2000 documents from the World Wide Web that belong to different categories such as food, agents, virus, cricket, football, genetic algorithms etc. We also downloaded another 2000 documents from the UCI KDD archive [22], which has various documents from different newsgroups. All the experiments were carried out on a 733 MHz PC with 256 MB of RAM. We ran the algorithms to get the clusters and compared the quality of the clusters formed. We also compared the execution times of all the algorithms for document sets of different sizes; to be more accurate, we ran all the algorithms on several different document sets. Since all clustering algorithms except ours take the number of clusters as input, we made all of them produce the same number of clusters. All the results shown are averages taken over 20 different runs.

5.2 Quality of the Clusters:

We compared the clusters formed by each algorithm against the documents in the original categories and matched the clusters with the categories one-to-one. The number of matches can be used to measure the quality of the clusters formed. The matching was computed using a bipartite matching algorithm [21]. Figure 2 compares the quality of the clusters formed by KMART to those of Fuzzy ART, SISC, K-means and Fractionation.
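One way to read the quality measure of Section 5.2 is sketched below in Python. The sketch assumes that a "match" is a document that appears both in a cluster and in the category paired with that cluster, and that clusters and categories are equal in number; both are assumptions, since the text does not define the count precisely. For brevity it tries every one-to-one pairing by brute force, which is only practical for a handful of clusters and stands in for the bipartite matching algorithm of [21]. The function name `count_matches` is hypothetical.

```python
from itertools import permutations


def count_matches(clusters, categories):
    """Best total overlap over all one-to-one cluster/category pairings.

    clusters, categories: equally long lists of sets of document ids.
    """
    if len(clusters) != len(categories):
        raise ValueError("this sketch assumes as many clusters as categories")
    k = len(clusters)
    best = 0
    for assignment in permutations(range(k)):
        # assignment[i] is the category paired with cluster i
        total = sum(len(clusters[i] & categories[assignment[i]]) for i in range(k))
        best = max(best, total)
    return best


clusters = [{1, 2, 3}, {4, 5}, {6}]
categories = [{4, 5, 6}, {1, 2}, {3}]
print(count_matches(clusters, categories))  # 4
```

Because the clusters are plain sets, the same function works for soft clusterings in which a document id occurs in more than one cluster.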
[Figure 2: Comparison of quality of the clusters. Number of matches versus number of documents for KMART, Fuzzy ART, SISC, K-Means and Fractionation.]

As the figure shows, KMART formed clusters of better quality than SISC, K-means and Fractionation, and of quality almost comparable to that of the traditional Fuzzy ART.

5.3 Execution time:

We also compared the execution time of our approach with those of Fuzzy ART, SISC, K-Means and Fractionation. Figure 3 compares the execution time of KMART with the other algorithms. The execution time of KMART is linear in the number of documents. It can be seen from the figure that our algorithm runs much faster than all the hard clustering algorithms and that its execution time is almost comparable to that of SISC. KMART also runs much faster than Fuzzy ART. This is because KMART avoids the time-consuming search in the category choice step by eliminating that step from the Fuzzy ART algorithm.

[Figure 3: Comparison of execution times. Execution time (in minutes) versus number of documents for SISC, KMART, Fuzzy ART, Fractionation and K-Means.]

This shows that KMART is effective and efficient in terms of both the quality of the clusters and the execution time.

6. CONCLUSIONS AND FUTURE WORK

We proposed a modification to the traditional Fuzzy ART that adapts it to the document-clustering domain, makes it a soft clustering algorithm and also reduces the execution time. The experimental results show that our approach forms clusters of better quality, and does so faster, than the other algorithms. The main advantage of KMART over most other fuzzy clustering algorithms is that the number of clusters is decided dynamically. Currently it is practical to work with around 1500 documents from a web search perspective. Our future work involves making the algorithm more efficient and reducing the response time by adopting better data structures. We are also considering ways of automatically tuning the values of the vigilance and learning rate parameters depending on the input document set, thereby deriving a parameter-free Fuzzy ART network.

References:

[1] Rakesh Agrawal and Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, In Proceedings of the 1994 International Conference on Very Large Databases.
[2] A. Baraldi, P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition, Technical Report, International Computer Science Institute, Berkeley, CA.
[3] J.L. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms, Plenum Press, New York, NY.
[4] G.A. Carpenter, S. Grossberg, D. Rosen, Fuzzy ART: Fast Stable Learning of Analog Patterns by an Adaptive Resonance System, Neural Networks, 4.
[5] M.S. Chen, J. Han, and P.S. Yu, Data Mining: An Overview from a Database Perspective, IEEE Transactions on Knowledge and Data Engineering, 8(6).
[6] Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey, Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, In Proceedings of the Fifteenth Annual International ACM SIGIR Conference.
[7] F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, The Computer Journal, 26(4).
[8] Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI Press.
[9] D. R. Hill, A vector clustering technique, in: Samuelson (Ed.), Mechanized Information Storage, Retrieval and Dissemination, North-Holland, Amsterdam.
[10] A.K. Jain, M.N. Murty and P.J. Flynn, Data Clustering: A Review, ACM Computing Surveys, 31(3).
[11] W.J. Krzanowski and F.H. Marriott, Multivariate Analysis: Classification, Covariance Structures and Repeated Measurements, Arnold, London.
[12] King-Ip Lin, Ravikumar Kondadadi, A Similarity based Soft clustering algorithm for documents, In Proceedings of the 7th International Conference on Database Systems for Advanced Applications (DASFAA-2001), pp. 40-47.
[13] King-Ip Lin, Ravikumar Kondadadi, A Word based soft clustering algorithm for documents, In Proceedings of the 16th International Conference on Computers and Their Applications (CATA-2001).
[14] T. Kohonen, The self-organizing map, Proceedings of the IEEE, 78(9).
[15] Y. Linde, A. Buzo and R.M. Gray, An Algorithm for Vector Quantization Design, IEEE Transactions on Communications, 28(1).
[16] M.N. Murty and A.K. Jain, Knowledge-based clustering scheme for collection management and retrieval of library books, Pattern Recognition, 28.
[17] Alberto Munoz, Compound key word generation from document databases using a Hierarchical clustering ART Model, Intelligent Data Analysis, 1(1).
[18] Jerome Moore, Eui-Hong (Sam) Han, Daniel Boley, Maria Gini, Robert Gross, Kyle Hastings, George Karypis, Vipin Kumar, and Bamshad Mobasher, Web Page Categorization and Feature Selection Using Association Rule and Principal Component Clustering, In Proceedings of the Seventh Workshop on Information Technologies and Systems (WITS'97).
[19] J. J. Rocchio, Document retrieval systems optimization and evaluation, Ph.D. Thesis, Harvard University.
[20] Dmitri Roussinov, Kristine Tolle, Marshall Ramsey and Hsinchun Chen, Interactive Internet search through Automatic clustering: an empirical study, In Proceedings of the International ACM SIGIR Conference.
[21] Robert E. Tarjan, Data Structures and Network Algorithms, Society for Industrial and Applied Mathematics.
[22] UCI KDD Archive.
[23] P. Willett, V. Winterman and D. Bawden, Implementation of Nearest Neighbour Searching in an Online Chemical Structure Search System, Journal of Chemical Information and Computer Sciences, 26, 36-41, 1986.
[24] P. Willett, Recent trends in hierarchical document clustering: a critical review, Information Processing and Management, 24.
[25] S.K.M. Wong, Y.J. Cai, and Y.Y. Yao, Computation of Term Association by Neural Network, In Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[26] O. Zamir, O. Etzioni, Web document clustering: a feasibility demonstration, In Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 98), 1998.


More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information
