Keyword-based Document Clustering
Seung-Shik Kang
School of Computer Science, Kookmin University & AITrc, Chungnung-dong, Songbuk-gu, Seoul, Korea

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

Abstract

Document clustering is an aggregation of related documents to a cluster based on the similarity evaluation task between documents and the representatives of clusters. Terms and their discriminating features are the clue to the clustering, and the discriminating features are based on the term and document frequencies. A feature selection method on the basis of frequency statistics has a limitation to the enhancement of the clustering algorithm because it does not consider the contents of the cluster objects. In this paper we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.

Keywords: Document Clustering, Weighting Scheme, Feature Selection

1 Introduction

Document clustering is an aggregation of documents by discriminating the relevant documents from the irrelevant documents. The relevance determination criteria of any two documents are a similarity measure and the representatives of the documents [234]. There are some similarity measures such as the Dice coefficient, Jaccard's coefficient, and the cosine measure. These similarity measures require that the documents are represented in document vectors, and the similarity of two documents is calculated from the operation of document vectors. In general, the representatives of a document or a cluster are document vectors that consist of <term, weight> pairs, and the document similarities are determined by the terms and their weighting values that are extracted from the document [79]. In the previous studies on document clustering we focused on the clustering algorithm, but the document representation methodology was not the important issue.
Document vectors are simply constructed from the term frequency (TF) and the inverse document frequency (IDF). This representation of the term weighting method starts from the precondition that terms or keywords representing the document are calculated by TF-IDF. The term weighting method by TF-IDF is generally used to construct a document vector, but we cannot say that it is the best way of representing a document. So we suppose that there is a limitation to improving the accuracy of the clustering system only by improving the clustering algorithm without changing the document/cluster representation method. Also, document clustering requires a large amount of memory space to keep the representatives of documents/clusters and the similarity measures [6 8 ]. Given N documents to be clustered, an N × N similarity matrix is needed to store document similarity measures. Also, the recursive iteration of similarity calculation and the reconstruction of the representatives of the clusters need a huge number of computations.

In this paper we propose a new clustering method that is based on the keyword weighting approach. The clustering algorithm starts from the seed documents and the cluster is expanded by the keyword relationship. The evolution of the cluster stops when no more documents are added to the cluster and irrelevant documents are removed from the cluster candidates.

2 Keyword-based Weighting Scheme

In general, the construction of a document vector depends on the term frequency and document frequency. If keywords are determined by frequency information of the document, we are apt to generate an error: nouns that are often used regardless of the substance of the document, and words of a high frequency, are extracted. The clustering method which is focused on similarity calculation considers all the words except stopwords as the representative of the document, and constitutes a document vector that is calculated by the weight value from the term frequency and document frequency.
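The frequency-based document vector and the cosine measure discussed above can be illustrated with a minimal Python sketch. It assumes the common `tf * log(N / df)` form of TF-IDF, which the paper does not spell out, and the function names `tfidf_vectors` and `cosine` as well as the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} vector per tokenized document.

    Assumption: weight = tf * log(N / df); other TF-IDF variants exist.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine measure between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): two related articles and one unrelated article.
docs = [
    ["election", "vote", "party", "vote"],
    ["election", "party", "poll"],
    ["soccer", "goal", "match"],
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # shares "election" and "party"
sim_02 = cosine(vecs[0], vecs[2])  # shares no term
```

Storing every pairwise value of `cosine` for N documents is what leads to the N × N similarity matrix mentioned in the introduction.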
It is common that terms and their weight values represent a document, and <term, weight> pairs are the unique elements of the document vector. When we construct a document vector, term frequency and document frequency are the most important features to calculate the weight of a term. As for the terms and their weight values, the weight value of a term means a ranking score, just as an importance factor to the document. So the term weighting can be seen as an evaluation of the term as a keyword or a stopword to the document. The weighting function w(t) from a term to its weight is described in expression (1).

$$
w : \text{term} \rightarrow \text{weight}, \qquad
w(t) =
\begin{cases}
0 & \text{if } t \text{ is a stopword} \\
1 & \text{if } t \text{ is a keyword} \\
a & \text{otherwise } (0 < a < 1)
\end{cases}
\tag{1}
$$

For the weighting scheme of terms, there are two points of view as the representation of a document: (1) a discriminative value that distinguishes or characterizes the document from others; (2) an importance measure as a keyword or a stopword.

Frequency-based term weighting (FBW) is a statistical measure of terms in an inter-document relationship. This weighting scheme is a very efficient method for distinguishing and characterizing a document from others, and it performs well for the applications of document classification or clustering in the information retrieval system. The only evaluation measure to characterize a document in the frequency-based weighting scheme is frequency statistics, but term frequencies are not the best measures to characterize the document by terms.

Another weighting scheme is the keyword-based term weighting (KBW) method that is based on the keyword importance factors in a document. It is an analytic approach that analyses the contents of a document to get a keyword list from the document. The weight value of a word is calculated by its importance factors as a keyword in a document. The weight value of a word is a combination value of keyword-weighting factors, and the terms are ordered by the keyword ranking score. The ranking scores in this weighting scheme are calculated from the analysis results of the document. Keyword-based term weighting will be a good solution to overcome the limitation of the frequency-based weighting scheme. Keywords in a text are the terms that represent a document, and the candidate keywords are extracted from the analysis results of the document.
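The weighting function w(t) described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: stopwords receive weight 0, keywords receive weight 1, and every other term receives an intermediate constant `a` with 0 < a < 1. The names `term_weight` and `weighted_terms` and the default `a=0.5` are hypothetical.

```python
def term_weight(term, keywords, stopwords, a=0.5):
    """Map a term to its weight: 0 for a stopword, 1 for a keyword, a otherwise.

    Assumption: the intermediate weight a lies strictly between 0 and 1.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    if term in stopwords:
        return 0.0
    if term in keywords:
        return 1.0
    return a


def weighted_terms(terms, keywords, stopwords, a=0.5):
    """Return <term, weight> pairs ordered by ranking score (ties alphabetical)."""
    pairs = {t: term_weight(t, keywords, stopwords, a) for t in terms}
    return sorted(pairs.items(), key=lambda p: (-p[1], p[0]))


# Toy example with hypothetical keyword and stopword lists.
ranked = weighted_terms(
    ["the", "cluster", "paper", "keyword"],
    keywords={"cluster", "keyword"},
    stopwords={"the"},
)
```

In a keyword-based scheme, the constant weights would be replaced by ranking scores computed from the analysis of the document; the sketch only shows the keyword/stopword view of term weighting.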
The keyword ranking method depends on several factors of a term, such as the type of a document and the location and the role of words in a sentence or a paragraph [5]. Thematic words of a document are representative terms for the document. Thematic words are extracted from a text by analysing the contents of the text, but keyword extraction depends on the type of text. Keywords are easily found in the title or the abstract of a research paper that consists of a title, abstract, body, experiment, and conclusion. Also, a newspaper article contains a keyword in the title or the first part of the text.

There are some clues for determining a keyword, and we may classify them as word-level, sentence-level, paragraph-level, and text-level features. Word-level features are the type of part-of-speech and case-role information. The part-of-speech of a Korean noun is divided into common noun, compound noun, proper noun, and numeral. Syntactic or sentence-level features are the type of a phrase or a clause, sentence location, and sentence type. From the rhetoric word in a sentence, the importance of the sentence is computed, and the terms in a sentence are affected by the type of the sentence. Also, the weight of a term in the subjective clause is not equal to that of the same term appearing in an auxiliary clause or in a modifying clause. A basic term weight is assigned by the type of a term and recomputed by the features that accompany it in the text. That is, the weight value of a term is also determined by the characteristics of the word, sentence, phrase, and clause where the term is extracted.

3 Keyword-based Document Clustering

Keyword-based document clustering creates a cluster by the keywords of each document. Suppose that $C$ is a set of clusters that is finally created by the clustering algorithm. If $n$ is the number of clusters in $C$, then $C$ is a set of clusters $C_1, C_2, \ldots, C_n$.

$$C = \{C_1, C_2, \ldots, C_n\}$$

Each cluster $C_i$ is initialised by a document $d$ that is not assigned to the existing clusters, and $d$ is a seed document of $C_i$.
When a new cluster is created, expansion and reduction steps are repeated until it reaches a stable state from the start state. In each evolution step for cluster $C_i$, $C_i^j$ is the $j$-th state of $C_i$.

$C_i^j$: the $j$-th state of a cluster $C_i$

The characteristic vector of a cluster is a set of <keyword, weight value> pairs that represents the cluster. If $K_d$ is a keyword set of a document $d$ and $K_{C_i}$ is a keyword set of cluster $C_i$, then $K_{C_i}^j$ is the keyword set for the $j$-th state of cluster $C_i$. Figure 1 shows a keyword-based clustering algorithm for the cluster $C_i$. Given the keyword sets for each document, cluster $C_i$ is created by the self-expanding algorithm.

3.1 Cluster Initialisation

The first step of the clustering algorithm is the creation and initialisation of a new cluster. A document $d$ is selected that does not belong to any other cluster, and it is assigned to a new cluster $C_i^0$ that is an initial state of cluster $C_i$.

$$C_i^0 = \{d\}$$

At this time, the document $d$ that is the first document in the new cluster is called a seed document (or an initialisation document). The seed document is randomly selected among the documents that do not belong to the clusters $C_1 \sim C_{i-1}$. The keyword set $K_d$ of a document $d$ is a set of keywords $k_1, k_2, \ldots, k_n$ that are extracted from document $d$. The initial state of the keyword set $K_{C_i}^0$ is initialised by $K_d$.

$$K_{C_i}^0 = K_d$$

    K_d = { k | k is a keyword that is extracted from d }
    C_i^0 = { d }
    K_{C_i}^0 = K_d
    C_i^1 = { x | document x where k ∈ K_x for k such that k ∈ K_{C_i}^0 }
    j = 1
    do {
        K_{C_i}^j = ∪ K_x where x ∈ C_i^j
        C_i^(j+1) = C_i^j
        for all x ∈ C_i^j
        begin
            s = sim(x, K_{C_i}^j)
            if (s < threshold) C_i^(j+1) = C_i^(j+1) − { x }
        end for
        j = j + 1
    } while ( isDeleteDocument() )
    C_i = C_i^j

Figure 1. Keyword-based clustering algorithm

3.2 Expanding the Cluster

In the initialisation step, a new cluster $C_i^0$, an initial state of cluster $C_i$, is established as the seed document, and the keyword set $K_{C_i}^0$ is initialised by the keyword set of the seed document. In the expanding step, the cluster is expanded by adding more related documents to the cluster: documents that include the keywords of the seed document are regarded as the related documents of the seed document. That is, the cluster is expanded by adding all the documents in which a keyword of $K_{C_i}^0$ (the keywords extracted from the seed document) appears to the cluster $C_i^1$, the next state of cluster $C_i^0$.

$$C_i^1 = \{\, x \mid k \in K_x \text{ where } k \in K_{C_i}^0 \,\}$$

The cluster expansion is performed by the iteration of keyword expansion and cluster expansion. More documents are added to a cluster by the similarity evaluation between the keyword set and the document. If a new document is added to a cluster, then the keywords in the added document are also added to the keyword set of the cluster. The first expansion is performed by the keyword set extracted from the seed document. The second expansion is performed by new keywords that are added to a cluster as a result of the first expansion. And the $j$-th expansion is performed by the $(j-1)$-th state of the keyword set. The number of iterations is decided through the experiment.
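The initialisation and expansion steps can be sketched in Python as below. This is a simplified illustration, not the implementation used in the paper: it assumes each document is already reduced to a keyword set, it omits the similarity check that gates which documents are added, and the names `build_inverted_index` and `expand_cluster` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(doc_keywords):
    """Map each keyword to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword].add(doc_id)
    return dict(index)


def expand_cluster(seed, doc_keywords, index, iterations=1):
    """Start from a seed document and alternate cluster and keyword expansion.

    Assumption: every document sharing a keyword with the current keyword
    set is added; the number of iterations is a tunable parameter.
    """
    cluster = {seed}                      # initial state: the seed document
    keywords = set(doc_keywords[seed])    # initial keyword set of the cluster
    for _ in range(iterations):
        expanded = set(cluster)
        for keyword in keywords:
            expanded |= index.get(keyword, set())
        if expanded == cluster:           # nothing new was added
            break
        cluster = expanded
        # The keyword set of the new state is the union over its documents.
        keywords = set().union(*(doc_keywords[x] for x in cluster))
    return cluster, keywords


# Toy data (hypothetical keyword sets).
doc_keywords = {
    "d1": {"election", "vote"},
    "d2": {"vote", "party"},
    "d3": {"party", "poll"},
    "d4": {"soccer", "goal"},
}
index = build_inverted_index(doc_keywords)
cluster1, keywords1 = expand_cluster("d1", doc_keywords, index, iterations=1)
cluster2, keywords2 = expand_cluster("d1", doc_keywords, index, iterations=2)
```

With one iteration only documents sharing a seed keyword are added; a second iteration follows the keywords introduced by those documents, which is why the iteration count needs to be limited in practice.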
When a cluster is expanded from $C_i^{j-1}$ to $C_i^j$, the keyword set $K_{C_i}^{j-1}$ is also expanded to a new keyword set $K_{C_i}^j$ that appears in all the documents of the cluster. The keyword set $K_{C_i}^j$ of $C_i^j$ is the union of the keyword sets of the documents in $C_i^j$.

$$K_{C_i}^j = \bigcup_{x \in C_i^j} K_x$$

The keyword set of the cluster is used to calculate the characteristic vector of each cluster. The characteristic vector is constituted by the weight values calculated from the term frequency (TF) and inverse document frequency (IDF) of the keywords, and this is used to calculate the similarity measure between a document and the cluster.

3.3 Cluster Reduction and Completion

This step produces a complete cluster by removing the documents that are not related to the cluster. For the cluster $C_i^j$, documents of a low similarity to the cluster are removed through the similarity computation with the cluster. The result of cluster reduction is a filtering of documents that are not related to the cluster, and the cluster $C_i^{j+1}$ is generated as the next step of the cluster $C_i^j$. Ultimately the cluster $C_i$ is completed, consisting of the related documents after filtering out the non-related documents. If a cluster $C_i$ is completed, the next cluster $C_{i+1}$ is created through the same process. Clustering is terminated if all the documents are clustered or no more clusters are created.
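The reduction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the characteristic vector is approximated by the number of cluster documents containing each keyword (a stand-in for the TF-IDF weights used in the paper), the similarity is a cosine between that vector and the binary keyword vector of a document, and the names `characteristic_vector`, `similarity`, and `reduce_cluster` are hypothetical.

```python
import math
from collections import Counter


def characteristic_vector(cluster, doc_keywords):
    """Weight each keyword by the number of cluster documents containing it.

    Assumption: a simple stand-in for TF-IDF weights of the cluster keywords.
    """
    counts = Counter()
    for doc_id in cluster:
        counts.update(doc_keywords[doc_id])
    return counts


def similarity(doc_kws, vector):
    """Cosine between a document's binary keyword vector and the cluster vector."""
    dot = sum(vector[k] for k in doc_kws)  # Counter yields 0 for missing keys
    norm_doc = math.sqrt(len(doc_kws))
    norm_cluster = math.sqrt(sum(w * w for w in vector.values()))
    if norm_doc == 0.0 or norm_cluster == 0.0:
        return 0.0
    return dot / (norm_doc * norm_cluster)


def reduce_cluster(candidates, doc_keywords, threshold):
    """Remove low-similarity documents until no document is deleted."""
    cluster = set(candidates)
    while cluster:
        vector = characteristic_vector(cluster, doc_keywords)
        kept = {x for x in cluster
                if similarity(doc_keywords[x], vector) >= threshold}
        if kept == cluster:  # stable state: nothing was removed
            break
        cluster = kept
    return cluster


# Toy candidate set (hypothetical keyword sets); d4 is only loosely related.
doc_keywords = {
    "d1": {"election", "vote", "party"},
    "d2": {"election", "vote", "poll"},
    "d3": {"election", "party", "poll"},
    "d4": {"vote", "soccer", "goal"},
}
completed = reduce_cluster(doc_keywords.keys(), doc_keywords, threshold=0.7)
```

The characteristic vector is rebuilt after every removal, so documents are always compared against the current state of the cluster rather than the initial candidate set.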
Figure 2. Overall architecture of keyword-based clustering (Input Document → Keyword Extraction → Create Inverted-File → Create a Cluster → Init. Cluster → Expand Cluster → Reduce/Complete Cluster → Clusters)

4 Design and Implementation

The structure of a keyword-based clustering system is shown in Figure 2. At first, keywords are extracted from each input document and their weight values are computed. Keywords and their scores are stored in an inverted-file structure. The inverted-file structure is good for the expansion of the cluster and for adding the documents that include a keyword to the initial cluster.

Figure 3 shows an example of the operation of the document clustering system: initialization, expansion, reduction, and completion of clusters. A new cluster $C_i$ is created and it includes a seed document $d$. An initial set of keywords for the initial state of a cluster is the keyword set $K_d$ of document $d$.

$$K_d = \{T_1, T_2, \ldots, T_n\}$$

For the terms in $K_d$, documents that contain the same term are added as candidate documents in the cluster. Let the candidate documents be $D_{1a}, D_{1b}, D_{2a}, D_{2b}, \ldots, D_{na}, D_{nb}$; then $D_{xy}$ is a document that is expanded by term $T_x$. The keyword set of the cluster is reconstructed by the new set of documents. In each step of the cluster expansion, the number of keywords that are used for the expansion and the threshold of the weight value are decided through experiments, considering the maximum number of document candidates in a cluster. Also, <keyword, weight> pairs as an intermediate representative of the cluster are a very important factor of the cluster expansion.

Figure 3. Example of keyword-based clustering

Now a new keyword set that is limited to the cluster candidates is constructed to get cluster documents. Through the similarity calculation between the document and the candidate centroid of the cluster, relevant documents are selected to be members of the cluster.
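The inverted-file lookup used for candidate selection can be sketched in Python as below. It is an illustration only: the posting layout (keyword mapped to a list of document and score pairs), the parameters `max_keywords` and `min_weight`, and the function names are assumptions, and the weight threshold is applied only to the keywords of the seed document.

```python
from collections import defaultdict


def build_inverted_file(doc_keyword_scores):
    """Store keyword -> [(doc_id, score), ...] postings."""
    inverted = defaultdict(list)
    for doc_id, scores in doc_keyword_scores.items():
        for keyword, score in scores.items():
            inverted[keyword].append((doc_id, score))
    return dict(inverted)


def candidate_documents(seed, doc_keyword_scores, inverted,
                        max_keywords, min_weight):
    """For each selected seed keyword, list the documents it expands to.

    Assumption: the top max_keywords seed keywords with score >= min_weight
    are used for the expansion; both values would be tuned by experiment.
    """
    ranked = sorted(doc_keyword_scores[seed].items(),
                    key=lambda kv: kv[1], reverse=True)
    selected = [kw for kw, score in ranked[:max_keywords] if score >= min_weight]
    candidates = {}
    for keyword in selected:
        postings = inverted.get(keyword, [])
        candidates[keyword] = sorted(d for d, _ in postings if d != seed)
    return candidates


# Toy keyword scores (hypothetical values).
doc_keyword_scores = {
    "d1": {"election": 0.9, "vote": 0.7, "today": 0.1},
    "d2": {"vote": 0.8, "party": 0.6},
    "d3": {"election": 0.5, "poll": 0.4},
    "d4": {"today": 0.2, "soccer": 0.9},
}
inverted = build_inverted_file(doc_keyword_scores)
candidates = candidate_documents("d1", doc_keyword_scores, inverted,
                                 max_keywords=2, min_weight=0.5)
```

Here the low-scoring term "today" is not used for expansion, so the unrelated document "d4" never becomes a candidate even though it contains that term.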
Through the iterations of keyword selection and the reconstruction of the related documents, a new cluster is completed that reaches a stable status with a strong relationship between the keyword set and the document set.

5 The Experiments

We implemented our clustering algorithm and applied it to the clustering of similar documents. The test documents for the experiment are collected from three days of newspaper articles. The total number of articles is 383 and on average 32 terms are extracted from the articles. We performed document clustering by applying different criteria for term selection: 1) frequency-based term selection; 2) percentage-based keyword selection; and 3) keyword selection by absolute number of keywords. Figure 4 shows the result of similarity clustering by frequency-based term selection. In this experiment, three types of term selection are performed.
- all terms are used for the clustering
- terms with a frequency of more than 2
- terms with a frequency of more than 3

In each experiment we varied the similarity decision ratio by the percentage of term matches. Figure 4 shows that term selection by frequency 2 or 3 is not good for the representation of a document.

Figure 4. Frequency-based keyword selection (axis label: term match ratio)

Figure 5. Percentage-based keyword selection (axis label: term match ratio)

In the experiment of percentage-based keyword selection, terms of high weight values are selected for the similarity calculation of the document. All the curves in Figure 5 have a similar shape except for % selection. In case of % selection, we guess that less than % of keywords are not sufficient for the similarity decision, and auxiliary keywords are also needed for the accuracy. Another point in this experiment is that 3%~6% keyword selection gave better results than the selection of all terms.

We compared the F-measure for the selection of maximum keywords. All the experiments in Figure 6 gave better results than the experiment using all the terms in the document. Also, 3~7 keywords with a 6%~7% match ratio gave a good performance for the comparison of document similarity.

Figure 6. Keyword selection by maximum term match ratio

6 Conclusion

It is common that a clustering algorithm is based on the similarity computation by frequency-based statistics to aggregate the related documents. This metric is an important factor for term weighting. We proposed a term weighting method that is based on the keyword features, and we tried to complement the drawback of the frequency-based metric. Based on the keyword weighting scheme, documents with the same keywords are grouped into a cluster candidate, and a new cluster is created by removing irrelevant documents. We performed an experiment for the clustering of similar documents, and the results showed that the keyword-based weighting scheme is better than the frequency-based method.
Our keyword-based algorithm uses 3%~6% of the terms for clustering, and the similarity matrix is not a necessity, so it will be good for the clustering of a huge number of documents. We also expect that this algorithm will be good for the topic tracking of special events. In the experiment we randomly selected a seed document, and the algorithm is a bit sensitive to the seed document. So our next research will be focused on minimizing the effect of the seed document by getting representative keywords before starting the clustering.
References

[1] Anderberg, M. R., Cluster Analysis for Applications, New York: Academic, 1973.
[2] Can, F. and E. A. Ozkarahan, Dynamic Cluster Maintenance, Information Processing & Management, Vol. 25, pp.
[3] Dubes, R. and A. K. Jain, Clustering Methodologies in Exploratory Data Analysis, Advances in Computers, Vol. 19, pp.
[4] Frakes, W. B. and R. Baeza-Yates, Information Retrieval, Prentice Hall, 1992.
[5] Kang, S. S., H. G. Lee, S. H. Son, G. . Hong, and B. J. Moon, Term Weighting Method by Postposition and Compound Noun Recognition, Proceedings of 3 th Conference on Korean Language Computing, pp.
[6] Murtagh, F., Complexities of Hierarchic Clustering Algorithms: State of the Art, Computational Statistics Quarterly, Vol. 1, pp.
[7] Perry, S. A. and P. Willett, A Review of the Use of Inverted Files for Best Match Searching in Information Retrieval Systems, Journal of Information Science, Vol. 6, pp.
[8] Sibson, R., SLINK: an Optimally Efficient Algorithm for the Single-Link Cluster Method, Computer Journal, Vol. 16, pp.
[9] Willett, P., Document Clustering Using an Inverted File Approach, Journal of Information Science, Vol. 2, pp.
[10] Willett, P., Recent Trends in Hierarchic Document Clustering: A Critical Review, Information Processing and Management, Vol. 24, No. 5, pp.
More informationComparison of Performance in Text Mining using Categorization of Unstructured Data
Indan Journal of Scence and Technology, Vol 9(4), DOI: 0.7485/jst/06/v94/9648, June 06 ISSN (Prnt) : 0974-6846 ISSN (Onlne) : 0974-5645 Comparson of Performance n Text Mnng usng Categorzaton of Unstructured
More informationBRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering
015 IEEE 17th Internatonal Conference on Hgh Performance Computng and Communcatons (HPCC), 015 IEEE 7th Internatonal Symposum on Cyberspace Safety and Securty (CSS), and 015 IEEE 1th Internatonal Conf
More informationRelated-Mode Attacks on CTR Encryption Mode
Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory
More informationRevealing Paths of Relevant Information in Web Graphs
Revealng Paths of Relevant Informaton n Web Graphs Georgos Kouzas 1, Vassleos Kolas 2, Ioanns Anagnostopoulos 1 and Eleftheros Kayafas 2 1 Unversty of the Aegean Department of Fnancal and Management Engneerng
More informationImprovement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration
Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,
More informationProblem Set 3 Solutions
Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,
More informationSmoothing Spline ANOVA for variable screening
Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory
More informationIntra-Parametric Analysis of a Fuzzy MOLP
Intra-Parametrc Analyss of a Fuzzy MOLP a MIAO-LING WANG a Department of Industral Engneerng and Management a Mnghsn Insttute of Technology and Hsnchu Tawan, ROC b HSIAO-FAN WANG b Insttute of Industral
More informationEdge Detection in Noisy Images Using the Support Vector Machines
Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona
More informationSolving two-person zero-sum game by Matlab
Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by
More informationBOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET
1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School
More informationMULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION
MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and
More informationEfficient Content Representation in MPEG Video Databases
Effcent Content Representaton n MPEG Vdeo Databases Yanns S. Avrths, Nkolaos D. Doulams, Anastasos D. Doulams and Stefanos D. Kollas Department of Electrcal and Computer Engneerng Natonal Techncal Unversty
More informationFrom Comparing Clusterings to Combining Clusterings
Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,
More informationImpact of a New Attribute Extraction Algorithm on Web Page Classification
Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty
More informationA Method of Hot Topic Detection in Blogs Using N-gram Model
84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna
More informationTsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance
Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for
More informationPCA Based Gait Segmentation
Honggu L, Cupng Sh & Xngguo L PCA Based Gat Segmentaton PCA Based Gat Segmentaton Honggu L, Cupng Sh, and Xngguo L 2 Electronc Department, Physcs College, Yangzhou Unversty, 225002 Yangzhou, Chna 2 Department
More informationAn Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem
An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r
More informationFAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks
2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng
More informationA Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China
for Database Clusterng Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal: 6085@qq.com Me Zhang Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal:64605455@qq.com Database clusterng
More informationType-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data
Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES
More informationA Multi-step Strategy for Shape Similarity Search In Kamon Image Database
A Mult-step Strategy for Shape Smlarty Search In Kamon Image Database Paul W.H. Kwan, Kazuo Torach 2, Kesuke Kameyama 2, Junbn Gao 3, Nobuyuk Otsu 4 School of Mathematcs, Statstcs and Computer Scence,
More informationStudy of Data Stream Clustering Based on Bio-inspired Model
, pp.412-418 http://dx.do.org/10.14257/astl.2014.53.86 Study of Data Stream lusterng Based on Bo-nspred Model Yngme L, Mn L, Jngbo Shao, Gaoyang Wang ollege of omputer Scence and Informaton Engneerng,
More informationMODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAXIMIZE PRODUCT VARIETY AND MINIMIZE FAMILY COST
INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN, ICED 07 28-31 AUGUST 2007, CITE DES SCIENCES ET DE L'INDUSTRIE, PARIS, FRANCE MODULE DESIGN BASED ON INTERFACE INTEGRATION TO MAIMIZE PRODUCT VARIETY AND
More informationS1 Note. Basis functions.
S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type
More informationPrivate Information Retrieval (PIR)
2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market
More informationSingle Document Keyphrase Extraction Using Neighborhood Knowledge
Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (2008) Sngle Document Keyphrase Extracton Usng Neghborhood Knowledge Xaoun Wan and Janguo Xao Insttute of Computer Scence and Technology
More informationClustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b
Internatonal Conference on Advances n Mechancal Engneerng and Industral Informatcs (AMEII 05) Clusterng Algorthm Combnng CPSO wth K-Means Chunqn Gu, a, Qan Tao, b Department of Informaton Scence, Zhongka
More informationFace Recognition University at Buffalo CSE666 Lecture Slides Resources:
Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural
More informationConcurrent Apriori Data Mining Algorithms
Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng
More informationTN348: Openlab Module - Colocalization
TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages
More informationAUTOMATED METHOD FOR STATISTICAL PROCESSING OF AE TESTING DATA
AUTOMATED METHOD FOR STATISTICAL PROCESSING OF AE TESTING DATA V. A. Barat and A. L. Alyakrtsky Research Dept, Interuns Ltd., bld. 24, corp 3-4, Myasntskaya str., Moscow, 0000, Russa Keywords: sgnal processng,
More informationQuerying by sketch geographical databases. Yu Han 1, a *
4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,
More informationA NOTE ON FUZZY CLOSURE OF A FUZZY SET
(JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,
More informationEfficient Distributed File System (EDFS)
Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate
More informationAn Entropy-Based Approach to Integrated Information Needs Assessment
Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology
More informationFuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System
Fuzzy Modelng of the Complexty vs. Accuracy Trade-off n a Sequental Two-Stage Mult-Classfer System MARK LAST 1 Department of Informaton Systems Engneerng Ben-Guron Unversty of the Negev Beer-Sheva 84105
More informationCMPS 10 Introduction to Computer Science Lecture Notes
CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not
More informationEnhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques
Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland
More informationAn Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed
More informationFeature Selection as an Improving Step for Decision Tree Construction
2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor
More informationPruning Training Corpus to Speedup Text Classification 1
Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan
More informationThe Shortest Path of Touring Lines given in the Plane
Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He
More informationClassic Term Weighting Technique for Mining Web Content Outliers
Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha
More information