Keyword-based Document Clustering


Seung-Shik Kang
School of Computer Science, Kookmin University & AITrc
Chungnung-dong, Songbuk-gu, Seoul, Korea

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

Abstract

Document clustering is the aggregation of related documents into a cluster, based on evaluating the similarity between documents and the representatives of clusters. Terms and their discriminating features are the clue to the clustering, and the discriminating features are based on term and document frequencies. Feature selection on the basis of frequency statistics limits how far the clustering algorithm can be improved, because it does not consider the contents of the cluster objects. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.

Keywords: Document Clustering, Weighting Scheme, Feature Selection

1 Introduction

Document clustering is an aggregation of documents that discriminates the relevant documents from the irrelevant ones. The relevance of any two documents is determined by a similarity measure and the representatives of the documents [2, 3, 4]. There are several similarity measures, such as the Dice coefficient, Jaccard's coefficient, and the cosine measure. These similarity measures require that documents be represented as document vectors, and the similarity of two documents is calculated from operations on those vectors. In general, the representatives of a document or a cluster are document vectors that consist of <term, weight> pairs, and document similarities are determined by the terms, and their weighting values, that are extracted from the document [7, 9].

Previous studies on document clustering focused on the clustering algorithm, and the document representation methodology was not treated as an important issue.
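The three similarity measures named above can be illustrated in a few lines. The following is a minimal sketch, not code from the paper: it assumes a document is given either as a set of terms (for Dice and Jaccard) or as a dictionary of term weights (for cosine), and the function names are hypothetical.

```python
import math


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard's coefficient: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u: dict, v: dict) -> float:
    """Cosine measure between two sparse <term, weight> vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

All three return a value between 0 and 1 for non-negative weights, so a single decision threshold can be applied whichever measure is chosen.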
Document vectors are simply constructed from the term frequency (TF) and the inverse document frequency (IDF). This term weighting method starts from the precondition that the terms or keywords representing a document are calculated by TF-IDF. Term weighting by TF-IDF is generally used to construct a document vector, but we cannot say that it is the best way of representing a document. We therefore suppose that there is a limit to how much the accuracy of a clustering system can be improved by improving only the clustering algorithm, without changing the document/cluster representation method.

Document clustering also requires a large amount of memory to keep the representatives of documents and clusters and the similarity measures [6, 8]. Given N documents to be clustered, an N × N similarity matrix is needed to store the document similarity measures. In addition, the repeated similarity calculation and the reconstruction of the cluster representatives need a huge number of computations.

In this paper, we propose a new clustering method that is based on the keyword weighting approach. The clustering algorithm starts from seed documents, and a cluster is expanded by keyword relationships. The evolution of the cluster stops when no more documents are added to the cluster, and irrelevant documents are removed from the cluster candidates.

2 Keyword-based Weighting Scheme

In general, the construction of a document vector depends on the term frequency and document frequency. If keywords are determined by the frequency information of the document, we are apt to make the error of extracting high-frequency nouns that are used often regardless of the substance of the document. A clustering method that focuses on similarity calculation considers all words except stopwords as the representative of the document, and constitutes a document vector whose weight values are calculated from the term frequency and document frequency.
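To make the frequency-based representation concrete, the sketch below builds <term, weight> vectors from term frequency and document frequency. It is an illustration under stated assumptions, not the paper's implementation: it uses one common TF-IDF variant, tf × log(N / df), expects stopwords to have been removed already, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse <term, weight> vector per tokenised document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    vectors = []
    for terms in docs:
        tf = Counter(terms)  # term frequency within this document
        vectors.append({t: f * math.log(n_docs / df[t]) for t, f in tf.items()})
    return vectors
```

Note that a term occurring in every document gets weight 0 under this variant, which shows how the weight depends only on counts and not on the role the term plays in the text.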
It is common that terms and their weight values represent a document, and <term, weight> pairs are the unique elements of the document vector. When we construct a document vector, term frequency and document frequency are the most important features for calculating the weight of a term. As for the terms and their weight values, the weight value of a term is a ranking score that expresses its importance to the document. Term weighting can therefore be seen as an evaluation of the term as a keyword or a stopword for the document. The weighting function w(t), which maps a term to its weight, is described in expression (1).

\[
w : \text{term} \rightarrow \text{weight}
\]

\[
w(t) =
\begin{cases}
0 & \text{if } t \text{ is a stopword} \\
1 & \text{if } t \text{ is a keyword} \\
a & \text{otherwise}
\end{cases}
\tag{1}
\]

For the weighting scheme of terms, there are two points of view on the representation of a document: (1) a discriminative value that distinguishes or characterizes the document from others; (2) an importance measure as a keyword or a stopword.

Frequency-based term weighting (FBW) is a statistical measure of terms in an inter-document relationship. This weighting scheme is a very efficient method for distinguishing and characterizing a document from others, and it performs well in document classification and clustering applications in information retrieval systems. However, the only evaluation measure used to characterize a document in the frequency-based weighting scheme is frequency statistics, and term frequencies are not the best measure for characterizing a document by its terms.

Another weighting scheme is keyword-based term weighting (KBW), which is based on the keyword importance factors in a document. It is an analytic approach that analyses the contents of a document to get a keyword list from the document. The weight value of a word is calculated as a combination of keyword-weighting factors, that is, its importance factors as a keyword in the document, and the terms are ordered by the keyword ranking score. The ranking scores in this weighting scheme are calculated from the analysis results of the document. Keyword-based term weighting is expected to overcome the limitation of the frequency-based weighting scheme. Keywords in a text are the terms that represent a document, and the candidate keywords are extracted from the analysis results of the document.
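The weighting function of expression (1) is a three-way case split, which the sketch below spells out. This is a minimal illustration under assumptions: it takes 0 for stopwords and 1 for keywords, uses a hypothetical default of a = 0.5 for other terms, and receives the keyword and stopword sets as inputs rather than deriving them from content analysis as the paper does.

```python
def term_weight(term, keywords, stopwords, a=0.5):
    """Weight of a term: 0 for a stopword, 1 for a keyword, a otherwise."""
    if term in stopwords:
        return 0.0
    if term in keywords:
        return 1.0
    return a  # assumed intermediate weight for ordinary terms


def rank_terms(terms, keywords, stopwords, a=0.5):
    """Order the distinct terms by ranking score, highest first."""
    weights = {t: term_weight(t, keywords, stopwords, a) for t in set(terms)}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `rank_terms(["the", "cluster", "document", "the"], {"cluster"}, {"the"})` places `cluster` first, `document` second, and `the` last.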
The keyword ranking method depends on several factors of a term, such as the type of document and the location and role of words in a sentence or a paragraph [5]. Thematic words of a document are representative terms for the document. Thematic words are extracted from a text by analysing its contents, but keyword extraction depends on the type of text. Keywords are easily found in the title or abstract of a research paper, which consists of a title, abstract, body, experiment, and conclusion. Likewise, a newspaper article contains keywords in the title or the first part of the text.

There are several clues for determining a keyword, and we may classify them as word-level, sentence-level, paragraph-level, and text-level features. Word-level features are the part-of-speech type and case-role information. The part-of-speech of a Korean noun is divided into common noun, compound noun, proper noun, and numeral. Syntactic or sentence-level features are the type of phrase or clause, the sentence location, and the sentence type. The importance of a sentence is computed from the rhetorical words in it, and the terms in a sentence are affected by the type of the sentence. Also, the weight of a term in the subjective clause is not equal to that of the same term appearing in an auxiliary clause or a modifying clause. A basic term weight is assigned by the type of the term and is recomputed from the features that accompany it in the text. That is, the weight value of a term is also determined by the characteristics of the word, sentence, phrase, and clause from which the term is extracted.

3 Keyword-based Document Clustering

Keyword-based document clustering creates a cluster from the keywords of each document. Suppose that $C$ is the set of clusters finally created by the clustering algorithm. If $n$ is the number of clusters in $C$, then $C$ is a set of clusters $C_1, C_2, \ldots, C_n$.

\[
C = \{ C_1, C_2, \ldots, C_n \}
\]

Each cluster $C_i$ is initialised with a document $D_i$ that is not assigned to the existing clusters, and $D_i$ is the seed document of $C_i$.
When a new cluster is created, expansion and reduction steps are repeated until the cluster reaches a stable state from the start state. In each evolution step for cluster $C_i$, $C_i^j$ is the j-th state of $C_i$.

$C_i^j$ : the j-th state of cluster $C_i$

The characteristic vector of a cluster is a set of <keyword, weight value> pairs that represents the cluster. If $K_D$ is the keyword set of a document $D$ and $K_{C_i}$ is the keyword set of cluster $C_i$, then $K_{C_i}^j$ is the keyword set of the j-th state of cluster $C_i$. Figure 1 shows the keyword-based clustering algorithm for the cluster $C_i$. Given the keyword sets for each document, cluster $C_i$ is created by the self-expanding algorithm.

3.1 Cluster Initialisation

The first step of the clustering algorithm is the creation and initialisation of a new cluster. A document $D$ is selected that does not belong to any other cluster, and it is assigned to a new cluster $C_i^0$, the initial state of cluster $C_i$.

\[
C_i^0 = \{ D \}
\]

The first document in the new cluster is called the seed document (or initialisation document). The seed document is randomly selected among the documents that do not belong to the clusters $C_1$ ~ $C_{i-1}$. The keyword set $K_D$ of a document $D$ is the set of keywords $k_1, k_2, \ldots, k_n$ that are extracted from document $D$. The initial state of the keyword set, $K_{C_i}^0$, is initialised by $K_D$.

\[
K_{C_i}^0 = K_D
\]

    K_D = { k | k is a keyword that is extracted from D }
    C_i^0 = { D }
    K_{C_i}^0 = K_D
    C_i^1 = { x | document x where k ∈ K_x for k such that k ∈ K_{C_i}^0 }
    j = 1
    do {
        K_{C_i}^j = ∪ K_x where x ∈ C_i^j
        C_i^{j+1} = C_i^j
        for all x ∈ C_i^j
        begin
            s = sim(x, K_{C_i}^j)
            if (s < threshold) C_i^{j+1} = C_i^{j+1} − { x }
        end for
        j = j + 1
    } while ( seleteDocument() )
    C_i = C_i^j

Figure 1. Keyword-based clustering algorithm

3.2 Expanding the Cluster

In the initialisation step, a new cluster $C_i^0$, the initial state of cluster $C_i$, is established from the seed document, and the keyword set $K_{C_i}^0$ is initialised by the keyword set of the seed document. In the expanding step, the cluster is expanded by adding related documents, namely the documents that include the keywords of the seed document. That is, the cluster is expanded to its next state $C_i^1$ by adding all the documents in which any keyword of $K_{C_i}^0$ (the keywords extracted from the seed document) appears.

\[
C_i^1 = \{ x \mid k \in K_x \text{ where } k \in K_{C_i}^0 \}
\]

Cluster expansion is performed by iterating keyword expansion and cluster expansion. More documents are added to a cluster through the similarity evaluation between the keyword set and the document. If a new document is added to a cluster, then the keywords in the added document are also added to the keyword set of the cluster. The first expansion is performed with the keyword set extracted from the seed document. The second expansion is performed with the new keywords that were added to the cluster as a result of the first expansion. In general, the j-th expansion is performed with the (j-1)-th state of the keyword set. The number of iterations is decided through experiment.
When a cluster is expanded from $C_i^0$ to $C_i^1$, the keyword set $K_{C_i}^0$ is also expanded to a new keyword set $K_{C_i}^1$ that appears in all the documents of the cluster. The keyword set $K_{C_i}^1$ of $C_i^1$ is the union of the keyword sets of all the documents in $C_i^1$.

\[
K_{C_i}^1 = \{ k \mid k \in K_x,\ x \in C_i^1 \}
\]

The keyword set of the cluster is used to calculate the characteristic vector of each cluster. The characteristic vector consists of the weight values calculated from the term frequency (TF) and inverse document frequency (IDF) of the keywords, and it is used to calculate the similarity measure between a document and the cluster.

3.3 Cluster Reduction and Completion

This step produces a complete cluster by removing the documents that are not related to the cluster. For the cluster $C_i^j$, documents with a low similarity to the cluster are removed through the similarity computation with the cluster. The result of cluster reduction is a filtering of the unrelated documents, and the cluster $C_i^{j+1}$ is generated as the next state of the cluster $C_i^j$. Ultimately, the cluster $C_i$ is completed, consisting of the related documents that remain after the unrelated documents are filtered out. When a cluster $C_i$ is completed, the next cluster $C_{i+1}$ is created through the same process. Clustering terminates when all the documents are clustered or no more clusters are created.
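The initialisation, expansion, and reduction steps of this section can be sketched end to end as follows. This is a simplified illustration, not the author's implementation, and it makes several assumptions: each document is given as a set of keywords, the seed is chosen in sorted order rather than at random so the result is reproducible, the seed is never removed, and the similarity is a stand-in for the TF-IDF characteristic vector (the mean, over a document's keywords, of the share of cluster members containing that keyword). The names `build_cluster`, `cluster_documents`, `threshold`, and `max_rounds` are hypothetical.

```python
from collections import Counter


def build_cluster(seed, doc_keywords, threshold=0.6, max_rounds=10):
    """Grow one cluster from a seed document, then filter unrelated documents."""
    # Initialisation: the cluster and its keyword set start from the seed.
    seed_keywords = set(doc_keywords[seed])
    # Expansion: add every document that shares a keyword with the seed.
    cluster = {seed} | {d for d, ks in doc_keywords.items() if ks & seed_keywords}
    for _ in range(max_rounds):
        # Keyword set of the cluster: union of member keywords, weighted by
        # the share of members that contain each keyword (assumed weighting).
        counts = Counter(k for d in cluster for k in doc_keywords[d])
        weights = {k: c / len(cluster) for k, c in counts.items()}

        def sim(doc):
            ks = doc_keywords[doc]
            return sum(weights[k] for k in ks) / len(ks) if ks else 0.0

        # Reduction: drop documents whose similarity is below the threshold.
        kept = {d for d in cluster if d == seed or sim(d) >= threshold}
        if kept == cluster:  # stable state reached
            break
        cluster = kept
    return cluster


def cluster_documents(doc_keywords, threshold=0.6):
    """Create clusters one after another until every document is assigned."""
    unassigned = set(doc_keywords)
    clusters = []
    for seed in sorted(doc_keywords):
        if seed not in unassigned:
            continue
        pool = {d: doc_keywords[d] for d in unassigned}
        members = build_cluster(seed, pool, threshold)
        clusters.append(members)
        unassigned -= members
    return clusters
```

With four toy documents, two about an election and two about football, where one football article also contains the keyword "vote", the football article is first pulled into the election cluster by expansion and then removed by reduction, so it ends up seeding the second cluster. No N × N similarity matrix is built at any point; each round only compares cluster candidates against the cluster's own keyword weights.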

Figure 2. Overall architecture of keyword-based clustering (stages shown: Input Document, Keyword Extraction, Create Inverted-File, Create a Cluster / Init. Cluster, Expand Cluster, Reduce/Complete Cluster, Clusters)

4 Design and Implementation

The structure of the keyword-based clustering system is shown in Figure 2. First, keywords are extracted from each input document and their weight values are computed. Keywords and their scores are stored in an inverted-file structure. The inverted-file structure is well suited to expanding a cluster, because it directly gives the documents that include a keyword so that they can be added to the initial cluster.

Figure 3 shows an example of the operation of the document clustering system: initialization, expansion, reduction, and completion of clusters. A new cluster is created and it includes a seed document $D$. The initial set of keywords for the initial state of a cluster is the keyword set $K_D$ of document $D$.

\[
K_D = \{ T_1, T_2, \ldots, T_n \}
\]

For the terms in $K_D$, documents that contain the same term are added as candidate documents in the cluster. Let the candidate documents be $D_{1a}, D_{1b}, D_{2a}, D_{2b}, \ldots, D_{na}, D_{nb}$; then $D_{xy}$ is a document that is expanded by term $T_x$. The keyword set of the cluster is reconstructed from the new set of documents. In each step of the cluster expansion, the number of keywords used for the expansion and the threshold of the weight value are decided through experiments, considering the maximum number of document candidates in a cluster. Also, the <keyword, weight> pairs, as an intermediate representative of the cluster, are a very important factor in the cluster expansion.

Figure 3. Example of keyword-based clustering

Now a new keyword set that is limited to the cluster candidates is constructed to get the cluster documents. Through the similarity calculation between each document and the candidate centroid of the cluster, relevant documents are selected as members of the cluster.
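The role of the inverted file in cluster expansion can be sketched as follows. This is an illustrative sketch only: it assumes each document is a dictionary of keyword scores, and the function names and the `top_k` parameter (the number of seed keywords used for expansion) are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(doc_keywords):
    """Map each keyword to its postings: {doc_id: score}."""
    inverted = defaultdict(dict)
    for doc_id, scored in doc_keywords.items():
        for keyword, score in scored.items():
            inverted[keyword][doc_id] = score
    return dict(inverted)


def candidate_documents(seed, doc_keywords, inverted, top_k=3):
    """For each of the seed's top_k keywords, list the other documents containing it."""
    # Highest-scoring keywords first; ties broken alphabetically.
    top = sorted(doc_keywords[seed].items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {kw: sorted(d for d in inverted[kw] if d != seed) for kw, _ in top}
```

Looking up a keyword costs one dictionary access, so the candidates for a seed document are found without comparing the seed against every other document.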
Through the iterations of keyword selection and reconstruction of the related documents, a new cluster is completed when it reaches a stable state with a strong relationship between the keyword set and the document set.

5 The Experiments

We implemented our clustering algorithm and applied it to the clustering of similar documents. The test documents for the experiment were collected from three days of newspaper articles. The total number of articles is 383, and an average of 32 terms are extracted from each article. We performed document clustering by applying different criteria for term selection: 1) frequency-based term selection; 2) percentage-based keyword selection; and 3) keyword selection by absolute number of keywords.

Figure 4 shows the result of similarity clustering by frequency-based term selection. In this experiment, three types of term selection are performed:

- all terms are used for the clustering
- terms with frequency more than 2
- terms with frequency more than 3

In each experiment, we varied the similarity decision ratio by the percentage of term matches. Figure 4 shows that term selection by frequency 2 or 3 is not good for the representation of a document.

Figure 4. Frequency-based keyword selection (axis label: term match ratio)

In the experiment on percentage-based keyword selection, terms with high weight values are selected for the similarity calculation of the document. All the curves in Figure 5 have a similar shape except for the % selection. In the case of the % selection, we guess that fewer than % of the keywords are not sufficient for the similarity decision, and auxiliary keywords are also needed for accuracy. Another point in this experiment is that 3%~6% keyword selection gave better results than the selection of all terms.

Figure 5. Percentage-based keyword selection (axis label: term match ratio)

We compared the F-measure for the selection of the maximum number of keywords. All the experiments in Figure 6 gave better results than the experiment using all the terms in the document. Also, 3~7 keywords with a 6%~7% match ratio gave good performance for the comparison of document similarity.

Figure 6. Keyword selection by maximum term match ratio

6 Conclusion

It is common for a clustering algorithm to be based on similarity computation by frequency-based statistics to aggregate related documents. This metric is an important factor for term weighting. We proposed a term weighting method that is based on keyword features, and we tried to compensate for the drawback of the frequency-based metric. Based on the keyword weighting scheme, documents with the same keywords are grouped into a cluster candidate, and a new cluster is created by removing irrelevant documents. We performed an experiment on the clustering of similar documents, and the results showed that the keyword-based weighting scheme is better than the frequency-based method.
Our keyword-based algorithm uses 3%~6% of the terms for clustering and does not need a similarity matrix, so it should be well suited to clustering a huge number of documents. We also expect that this algorithm will be good for the topic tracking of special events. In the experiment we selected the seed document randomly, and the algorithm is somewhat sensitive to the choice of seed document. Our next research will therefore focus on minimizing the effect of the seed document by obtaining representative keywords before starting the clustering.

References

[1] Anderberg, M. R., Cluster Analysis for Applications, New York: Academic, 1973.
[2] Can, F. and E. A. Ozkarahan, Dynamic Cluster Maintenance, Information Processing & Management, Vol. 25.
[3] Dubes, R. and A. K. Jain, Clustering Methodologies in Exploratory Data Analysis, Advances in Computers, Vol. 19.
[4] Frakes, W. B. and R. Baeza-Yates, Information Retrieval, Prentice Hall, 1992.
[5] Kang, S. S., H. G. Lee, S. H. Son, G.. Hong, and B. J. Moon, Term Weighting Method by Postposition and Compound Noun Recognition, Proceedings of 3 th Conference on Korean Language Computing.
[6] Murtagh, F., Complexities of Hierarchic Clustering Algorithms: State of the Art, Computational Statistics Quarterly, Vol. 1.
[7] Perry, S. A. and P. Willett, A Review of the Use of Inverted Files for Best Match Searching in Information Retrieval Systems, Journal of Information Science, Vol. 6.
[8] Sibson, R., SLINK: an Optimally Efficient Algorithm for the Single-Link Cluster Method, Computer Journal, Vol. 16.
[9] Willett, P., Document Clustering Using an Inverted File Approach, Journal of Information Science, Vol. 2.
[10] Willett, P., Recent Trends in Hierarchic Document Clustering: A Critical Review, Information Processing and Management, Vol. 24, No. 5.


Chinese Word Segmentation based on the Improved Particle Swarm Optimization Neural Networks Chnese Word Segmentaton based on the Improved Partcle Swarm Optmzaton Neural Networks Ja He Computatonal Intellgence Laboratory School of Computer Scence and Engneerng, UESTC Chengdu, Chna Department of

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Comparison of Performance in Text Mining using Categorization of Unstructured Data

Comparison of Performance in Text Mining using Categorization of Unstructured Data Indan Journal of Scence and Technology, Vol 9(4), DOI: 0.7485/jst/06/v94/9648, June 06 ISSN (Prnt) : 0974-6846 ISSN (Onlne) : 0974-5645 Comparson of Performance n Text Mnng usng Categorzaton of Unstructured

More information

BRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering

BRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering 015 IEEE 17th Internatonal Conference on Hgh Performance Computng and Communcatons (HPCC), 015 IEEE 7th Internatonal Symposum on Cyberspace Safety and Securty (CSS), and 015 IEEE 1th Internatonal Conf

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Revealing Paths of Relevant Information in Web Graphs

Revealing Paths of Relevant Information in Web Graphs Revealng Paths of Relevant Informaton n Web Graphs Georgos Kouzas 1, Vassleos Kolas 2, Ioanns Anagnostopoulos 1 and Eleftheros Kayafas 2 1 Unversty of the Aegean Department of Fnancal and Management Engneerng

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Intra-Parametric Analysis of a Fuzzy MOLP

Intra-Parametric Analysis of a Fuzzy MOLP Intra-Parametrc Analyss of a Fuzzy MOLP a MIAO-LING WANG a Department of Industral Engneerng and Management a Mnghsn Insttute of Technology and Hsnchu Tawan, ROC b HSIAO-FAN WANG b Insttute of Industral

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Efficient Content Representation in MPEG Video Databases

Efficient Content Representation in MPEG Video Databases Effcent Content Representaton n MPEG Vdeo Databases Yanns S. Avrths, Nkolaos D. Doulams, Anastasos D. Doulams and Stefanos D. Kollas Department of Electrcal and Computer Engneerng Natonal Techncal Unversty

More information

From Comparing Clusterings to Combining Clusterings

From Comparing Clusterings to Combining Clusterings Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

PCA Based Gait Segmentation

PCA Based Gait Segmentation Honggu L, Cupng Sh & Xngguo L PCA Based Gat Segmentaton PCA Based Gat Segmentaton Honggu L, Cupng Sh, and Xngguo L 2 Electronc Department, Physcs College, Yangzhou Unversty, 225002 Yangzhou, Chna 2 Department

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks 2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng

More information

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China for Database Clusterng Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal: 6085@qq.com Me Zhang Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal:64605455@qq.com Database clusterng

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database

A Multi-step Strategy for Shape Similarity Search In Kamon Image Database A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,

More information

Study of Data Stream Clustering Based on Bio-inspired Model

Study of Data Stream Clustering Based on Bio-inspired Model , pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,

More information

MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST

MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN, ICED 07 28-31 AUGUST 2007, CITE DES SCIENCES ET DE L'INDUSTRIE, PARIS, FRANCE MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAIMIZE PRODUCT VARIETY AND

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Single Document Keyphrase Extraction Using Neighborhood Knowledge

Single Document Keyphrase Extraction Using Neighborhood Knowledge Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (2008) Sngle Document Keyphrase Extracton Usng Neghborhood Knowledge Xaoun Wan and Janguo Xao Insttute of Computer Scence and Technology

More information

Clustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b

Clustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b Internatonal Conference on Advances n Mechancal Engneerng and Industral Informatcs (AMEII 05) Clusterng Algorthm Combnng CPSO wth K-Means Chunqn Gu, a, Qan Tao, b Department of Informaton Scence, Zhongka

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

AUTOMATED METHOD FOR STATISTICAL PROCESSING OF AE TESTING DATA

AUTOMATED METHOD FOR STATISTICAL PROCESSING OF AE TESTING DATA AUTOMATED METHOD FOR STATISTICAL PROCESSING OF AE TESTING DATA V. A. Barat and A. L. Alyakrtsky Research Dept, Interuns Ltd., bld. 24, corp 3-4, Myasntskaya str., Moscow, 0000, Russa Keywords: sgnal processng,

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System Fuzzy Modelng of the Complexty vs. Accuracy Trade-off n a Sequental Two-Stage Mult-Classfer System MARK LAST 1 Department of Informaton Systems Engneerng Ben-Guron Unversty of the Negev Beer-Sheva 84105

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Classic Term Weighting Technique for Mining Web Content Outliers

Classic Term Weighting Technique for Mining Web Content Outliers Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha

More information