Online Text Mining System based on M2VSM


FR-E2-1, SCIS & ISIS 2008

Yasufumi Takama (1), Takashi Okada (1), Toru Ishibashi (2)
1. Tokyo Metropolitan University, 2. Tokyo Metropolitan Institute of Technology
6-6 Asahigaoka, Hino, Tokyo, Japan
email: ytakama@sd.tmu.ac.jp

Abstract

This paper proposes an online text mining system developed on the basis of M2VSM (Meta keyword-based Modified VSM). When the conventional vector space model (VSM) is applied to document clustering, it is difficult to adjust the granularity of clusters in terms of topic. To solve this problem, M2VSM has been proposed as an extended VSM that treats meta keywords, such as adjectives and adverbs, as additional values of indexing terms. The similarity between documents is calculated by considering the matching of meta keywords for each index term, which makes it possible to cluster documents with various granularities in terms of topic. The online text mining system is developed with MUSASHI, one of the most popular open source data mining tools. Using the system, users can perform a series of text mining processes online, including preprocessing, feature selection, clustering, and visualization of results. Experimental results show that the clustering results by M2VSM match those by test subjects in both rough and detailed clustering. It is also shown that the system can process a database containing 5,000 documents within 7 minutes.

I. INTRODUCTION

In recent years, huge databases can easily be found on the Web, owing to breakthroughs in techniques for information acquisition and the dramatic fall in the price of mass storage devices. The volume of such databases is already beyond the human ability to process information, and intelligent support by information technologies, including information retrieval and data mining, is required. Various kinds of data mining and information retrieval techniques have been developed based on the vector space model (VSM) because of its several advantages. One of them is the ability to rank documents in order of how well they are expected to match a user's query.
However, with conventional VSM it is difficult to adjust the granularity of clusters in terms of topic. For example, when VSM is applied to a document database of a specific field such as medicine, the documents tend to form dense clusters in the vector space because of the high similarity between them; therefore, the performance decreases [3]. Employing additional words as index terms is one of the usual solutions, because increasing the number of dimensions by increasing the number of index terms can make the vector space sparse. However, this sometimes leads to problems such as the curse of dimensionality, which prevents the accurate expression of the relationships between documents. Furthermore, clusters found in such a sparse space tend not to have corresponding topics, which makes them hard for humans to interpret.

In order to solve the above-mentioned problems, M2VSM (Meta keyword-based Modified VSM) has been proposed by extending conventional VSM [3, 4]. M2VSM makes use of meta keywords, such as adjectives and adverbs, as additional values of indexing terms, and the similarity between documents is calculated by considering the matching of meta keywords for each index term.

This paper proposes a text mining system developed on the basis of M2VSM. It is designed for analyzing a large volume of documents, from preprocessing, such as the selection of index terms and meta keywords, to document clustering. It is developed with MUSASHI, one of the most popular open source data mining tools. Using the system, users can perform a series of text mining processes online, including preprocessing, feature selection, clustering, and visualization of results. Experimental results show that M2VSM can generate clusters that match those generated by test subjects in both rough and detailed clustering. It is also shown that the developed system can analyze 5,000 documents within 400 seconds, which means it is suitable for practical use in terms of processing speed.

II. M2VSM

A. Vector Space Model

The VSM has been widely used in the traditional information retrieval field.
The VSM creates a multi-dimensional space in which both documents and queries are represented by vectors. For a fixed collection of documents, an $N_w$-dimensional vector is generated for each document and query from sets of terms with associated weights, where $N_w$ is the number of indexing terms in the document collection. The similarity between documents, including the query, is then calculated by the cosine measure. In VSM, the weight $w_{in}$ associated with the term $t_n$ in document $D_i$ is often calculated by the TFIDF (Term Frequency Inverse Document Frequency) measure [11], which is given by Eq. (1):

\[ w_{in} = \mathrm{TFIDF}(t_n, D_i) = \frac{m_{in}}{M_i} \log \frac{N_D}{DF(t_n)}, \tag{1} \]

where $m_{in}$ represents the number of occurrences (frequency) of term $t_n$ in document $D_i$, $M_i$ represents the total frequency of indexing terms in $D_i$, $N_D$ is the total number of documents, and $DF(t_n)$ is the number of documents containing $t_n$. The similarity $sim(D_i, D_j)$ between documents $D_i$ and $D_j$ is defined as the cosine of the document vectors (Eq. (2)):

\[ sim(D_i, D_j) = \frac{\sum_{n=1}^{N_w} w_{in} w_{jn}}{|D_i| \, |D_j|}. \tag{2} \]

B. Outline of M2VSM

As mentioned in Section I, when the conventional VSM is applied to cluster documents, it is difficult to adjust the granularity of clusters in terms of topic. In particular, VSM is not good at dividing documents in terms of detailed topic. Therefore, when it is applied to a database in a specific field such as medicine, it can crowd the documents in the vector space [3, 4]. Furthermore, even when hierarchical clustering such as AHC [1] is employed, the obtained hierarchy does not always correspond to a topical hierarchy like that of Web directory services. One of the reasons for this problem is the existence of indexing terms that appear in many documents because they have general meanings in the field. Therefore, increasing the number of indexing terms not only fails to resolve the problem but, at worst, also causes the curse of dimensionality.

M2VSM assumes that if the same indexing term has different meta keywords (adjectives, adverbs, etc.) as its modifiers in different documents, those documents refer to different topics. In other words, index terms on their own represent a topic in a general sense, whereas index terms combined with meta keywords represent a topic in more detail. That is, meta keywords give additional value to indexing terms. Given the collection of meta keywords $S_M$, we define the similarity by Eqs. (3)-(5):

\[ sim(D_i, D_j) = \frac{\sum_{n=1}^{N_w} \alpha_n w_{in} w_{jn}}{|D_i|_\alpha \, |D_j|_\alpha}, \tag{3} \]

\[ |D_i|_\alpha = \sqrt{\sum_{n=1}^{N_w} \alpha_n w_{in}^2}, \tag{4} \]

\[ \alpha_n = \alpha^{k_n}, \quad k_n = |ME_{in} \cap ME_{jn}|, \tag{5} \]

where $ME_{in}$ (a subset of $S_M$) represents the set of meta keywords of indexing term $t_n$ in document $D_i$, and $\alpha_n$ reflects the degree to which meta keywords of $t_n$ co-occur between $D_i$ and $D_j$ in the document similarity calculation. The parameter $\alpha$ (> 1) adjusts the effect of meta keywords and is set to 3 in the experiments of this paper.

In the previous study [3, 4], the range of $\alpha_n$ was defined as [0, 1], which means that a meta keyword appearing in only one of the documents (i.e., either $D_i$ or $D_j$) decreases the similarity.
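As a rough illustration of the TFIDF weight of Eq. (1) and the meta-keyword-weighted similarity of Eqs. (3)-(5), the following Python sketch may help. It is not the authors' implementation: the function names, the dictionary-based document representation, and the toy weights are assumptions made for this example only.

```python
import math


def tfidf(term_counts, doc_freq, n_docs):
    """Eq. (1): (m_in / M_i) * log(N_D / DF(t_n)) for one document.

    term_counts: {term: frequency in this document} (hypothetical layout)
    doc_freq:    {term: number of documents containing the term}
    """
    total = sum(term_counts.values())  # M_i: total frequency of index terms
    return {t: (m / total) * math.log(n_docs / doc_freq[t])
            for t, m in term_counts.items()}


def m2vsm_similarity(w_i, w_j, me_i, me_j, alpha=3.0):
    """Eqs. (3)-(5): cosine similarity with a per-term factor alpha ** k_n.

    w_i, w_j:   {term: weight} for the two documents
    me_i, me_j: {term: set of meta keywords modifying that term}
    """
    num = norm_i = norm_j = 0.0
    for t in set(w_i) | set(w_j):
        # k_n: number of meta keywords shared by the two documents for t_n
        k = len(me_i.get(t, set()) & me_j.get(t, set()))
        a = alpha ** k  # alpha_n >= 1 because alpha > 1 and k_n >= 0
        wi, wj = w_i.get(t, 0.0), w_j.get(t, 0.0)
        num += a * wi * wj
        norm_i += a * wi * wi
        norm_j += a * wj * wj
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return num / math.sqrt(norm_i * norm_j)


# Toy documents sharing the index term "talks" with the same meta keyword.
w_a = {"talks": 0.5, "policy": 0.5}
w_b = {"talks": 0.5, "economy": 0.5}
me_a = {"talks": {"six-party"}}
me_b = {"talks": {"six-party"}}
plain = m2vsm_similarity(w_a, w_b, {}, {})         # no meta keywords: plain cosine
boosted = m2vsm_similarity(w_a, w_b, me_a, me_b)   # shared meta keyword, alpha = 3
```

With no shared meta keywords every $\alpha_n$ equals 1 and the function reduces to the ordinary cosine of Eq. (2); in the toy data, the shared meta keyword raises the similarity from 0.5 to 0.75.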
Compared with the previous study, $\alpha_n$ (≥ 1) in this paper reflects the influence of meta keywords more positively in the similarity calculation.

C. Selection of Meta Keyword

In Sec. II-B, adjectives and adverbs are referred to as meta keywords. More specifically, we select meta keywords from adjectives, adverbs, adnominal nouns, and adjective verbs. These parts of speech are used for describing the characteristics or state of a target object, sentiment, emotion, etc., which makes them suitable as meta keywords. This paper focuses on processing documents written in Japanese. Basically, a meta keyword of an index term t is defined as an adjective, adverb, adnominal noun, or adjective verb that has a modification relation with t. In this paper, nouns are used as index terms unless they are used as adnominal nouns. Note that the text mining system in Sec. III allows the parts of speech for index terms and meta keywords to be specified interactively. In order to identify index terms and meta keywords in Japanese documents, this paper employs the Japanese dependency parser CaboCha [6].

III. TEXT MINING SYSTEM BASED ON M2VSM

Fig. 1 shows the system architecture of the developed text mining system based on M2VSM. The current version of the system can only handle Japanese documents. It consists of four processing components: preprocessing, indexing / meta keyword selection, clustering, and visualization. Given a set of documents to be analyzed, the preprocessing component performs morphological analysis, syntactic and dependency parsing, removal of words belonging to parts of speech that are not used as indexing terms or meta keywords, and rejoining of words that have been excessively segmented. The result is stored in a database in order to speed up the subsequent processing.

[Fig. 1. System architecture of text mining system based on M2VSM. Processing components: Preprocessing, Indexing / Meta-keyword selection, Clustering, Visualization. User inputs: document set selection, method selection, cluster selection.]

In the next step, a set of index terms and a set of meta keywords, which are used for similarity calculation by M2VSM, are selected. This step is performed interactively with the help of a user. In the third step, document clusters are generated from the target document set based on the document-document similarity calculated by M2VSM. The system employs single-pass clustering [8, 9] in order to process a large number of documents within a reasonable time. When performing clustering, three similarity measures can be applied: the single-linkage, complete-linkage, and average-linkage methods. It is also possible to calculate the similarity based on ordinary VSM. The result of clustering is presented to a user either in table format or using information visualization [5, 10].
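The clustering step described above can be sketched in Python as follows. This is a generic single-pass procedure under stated assumptions, not the system's MUSASHI-based implementation: the function name, the rule that a document joins the best-scoring cluster when the score reaches the threshold, and the toy similarity function are all hypothetical.

```python
from statistics import mean

# How document-to-member similarities are combined into a cluster score.
LINKAGE = {"single": max, "complete": min, "average": mean}


def single_pass_clustering(docs, similarity, threshold, linkage="average"):
    """Assign each document, in input order, to the most similar existing
    cluster, or open a new cluster if no score reaches the threshold."""
    combine = LINKAGE[linkage]
    clusters = []  # each cluster is a list of indices into docs
    for idx, doc in enumerate(docs):
        best_score, best_cluster = None, None
        for cluster in clusters:
            score = combine([similarity(doc, docs[m]) for m in cluster])
            if best_score is None or score > best_score:
                best_score, best_cluster = score, cluster
        if best_cluster is not None and best_score >= threshold:
            best_cluster.append(idx)
        else:
            clusters.append([idx])
    return clusters


def toy_similarity(a, b):
    """Stand-in for a document similarity; 1.0 means identical."""
    return 1.0 / (1.0 + abs(a - b))


points = [0.0, 0.1, 5.0, 5.2, 0.2]
result = single_pass_clustering(points, toy_similarity, threshold=0.7)
```

Each document is visited only once, which is what keeps the processing time manageable for large collections, although the result depends on the order in which documents arrive. Any similarity function can be passed in, for example one based on M2VSM or on ordinary VSM.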

The system is implemented using MUSASHI [2, 7, 12], a well-known open source data mining tool. MUSASHI provides a set of commands for processing vast amounts of data, as shown in Table 1. Using MUSASHI is expected to make it possible to develop a stable and efficient system in a relatively short development time.

Table 1. Part of command set of MUSASHI

  Command          Brief description
  xtagg            Aggregation of records
  xtbar            Generation of bar graph (SVG format)
  xtcat            Merge of multiple XML tables
  xtcomb           Calculation of combination
  xtcount          Counting the number of rows
  xtcut            Selection of items
  xtcorrelation    Calculation of correlation coefficient

A. Selection of index terms / meta keywords

In the developed system, a user can interactively select the index terms and meta keywords to be used for the analysis. First, the user selects index terms, and then selects meta keywords from the remaining words. When selecting index terms, words are filtered based on the parts of speech specified by the user. The result is further filtered by specifying minimum and maximum df values (DF in Eq. (1)). To help the user specify appropriate df values, the histogram of df values is presented to the user, as shown in Fig. 2. Finally, the user can examine each of the words obtained by those filtering processes and remove unwanted words.

[Fig. 2. Histogram of DF values. Horizontal axis: DF; vertical axis: # of index terms; the range used for filtering by DF values is marked.]

The selection of meta keywords is performed in a similar way, except that typical index terms are presented for each candidate meta keyword in the last step. A user can select meta keywords by examining their relations with the corresponding index terms.

B. Output of clustering results

The result of clustering is presented to a user in two types of formats: a table format and visualization by Keyword Map [5]. The table format is generated as HTML files, which a user can view with an ordinary HTML browser. There are three types of tables: a table comparing the clustering results with different thresholds, a table showing the summary of a clustering result, and a table showing the detail of a cluster.

Since one of the advantages of M2VSM is that it can perform both rough and detailed clustering, as discussed in Sec. II-B, the developed system can perform several clustering processes with different thresholds in the same trial. The first table shows the summary of these clustering results. For each clustering result, the threshold used for clustering, the number of obtained clusters, and the number of documents in each cluster are presented.

By selecting one or more interesting clustering results, a summary of the results is shown as the second table. This table contains the number of documents, the numbers of index terms and meta keywords, and the frequencies of index terms and meta keywords for each cluster.

By selecting one or more interesting clusters, detailed information about the clusters is shown as the third table. This table contains typical index terms together with the corresponding meta keywords, and typical documents for the selected clusters. Typical index terms and meta keywords are selected based on their frequencies, and up to 5 documents close to the centroid of a cluster are selected as typical documents of the cluster.

Other detailed information about a cluster, such as the relationships between index terms, between index terms and meta keywords, between meta keywords, and between cluster centroids and index terms, is also displayed by Keyword Map, as shown in Fig. 3. Keyword Map treats each index term, meta keyword, and cluster centroid as a node, which is arranged according to its relationships with other nodes so that related nodes form a cluster on the map. Keyword Map employs the spring model for drawing a map.

[Fig. 3. Analyzed result presented by Keyword Map]

IV. EXPERIMENTS

A. Performance of M2VSM

Experiments are performed with three document sets written in Japanese. The purpose of the experiments is to show the effectiveness of the proposed M2VSM against conventional VSM in document clustering at two levels: clustering by general topic (rough clustering) and by detailed topic (detailed clustering). For that purpose, the clustering results by M2VSM, VSM, PVSM (Phrase-based VSM), and test subjects are compared. PVSM is a simple extension of VSM in which a phrase (the combination of a noun and its meta keywords) is used as an index term instead of an independent word. It is expected that PVSM can generate clusters corresponding to more detailed topics than normal VSM.

The documents used for the experiments are editorial articles of 7 Japanese newspaper companies: Asahi Shimbun, Yomiuri Shimbun, Nikkei Shimbun, Kobe Shimbun, Chugoku Shimbun, Hokkaido Shimbun, and Kahoku Shimpo. The total number of articles, which were collected from June 1, 2005 to December 29, 2005, is 2,298. In order to reduce the burden on test subjects, we first applied M2VSM, VSM, and PVSM to the collected documents and found small subsets of documents that belong to the same cluster by any of the 3 methods. By adding some noise articles to those subsets, we obtained 3 document sets, A, B, and C, each of which contains 20 documents. That is, each document set forms a single cluster under a general topic, but is divided into several clusters under specific topics. In this process, the single-linkage method is applied and the same threshold is used for the 3 methods. The topics of the document sets are as follows. Here, RC (rough cluster) means the topic of the entire document set, and DC (detailed clusters) means the topics of the clusters when a document set is divided in detail.

- Set A: (RC) international issues, (DC) six-party talks, postwar period, Iran
- Set B: (RC) North Korea, (DC) Japan-North Korea talks, six-party talks
- Set C: (RC) IT, (DC) Rakuten-TBS problem, media and the Internet

Ten test subjects are asked to cluster each of the 3 document sets at two levels. First, they are asked to roughly divide the documents in terms of topic. Then, the obtained clusters are further divided into clusters in terms of more detailed topics. There is no constraint on the number of clusters at each level.
The clustering results by M2VSM, VSM, and PVSM and those by test subjects are compared with the following measure:

\[ \mathrm{Match}(method, subject_i, D) = \frac{d_p(method, subject_i)}{{}_{|D|}C_2}, \tag{6} \]

where $method$ indicates either M2VSM, VSM, or PVSM, and $subject_i$ indicates a test subject ($i = 1, \ldots, 10$). $D$ is a document set (A, B, or C), and $d_p(method, subject_i)$ is the number of document pairs that are clustered in the same way by both the method and the subject. For example, consider the case where a document set contains 3 documents {1, 2, 3}, a method divides it as {1, 2} and {3}, and a subject divides it as {1, 2, 3}. In this case, the total number of document pairs (i.e., the denominator in Eq. (6)) is 3 (1-2, 1-3, and 2-3), and only the pair 1-2 is clustered in the same way (i.e., belonging to the same cluster) by both of them, so the matching score is 1/3 = 0.33. When the clustering results of a method and a subject are completely the same, Eq. (6) is equal to 1.

Each method is compared with the 10 test subjects using Eq. (6). Tables 2, 3, and 4 summarize the comparison results for document sets A, B, and C, respectively. In these tables, the left column (RC) for each method shows the result for rough clustering, and the right column (DC) shows the result for detailed clustering. That is, the single-linkage method is applied to the document set, and rough clustering is performed by cutting the obtained dendrogram with a low similarity threshold, whereas detailed clustering is performed with a high similarity threshold. Thresholds are determined for each method so that the average score (Eq. (6)) over the test subjects is as high as possible. In the tables, NUM shows the number of clusters generated by each method, TH is the threshold used, and Avg., Max, and Min are the average, maximum, and minimum matching scores over the 10 test subjects, respectively (the number in parentheses is the rank among the three methods).

[Table 2. Experimental results for set A. Rows: NUM, TH, Avg, Max, Min.]

[Table 3. Experimental results for set B. Rows: NUM, TH, Avg, Max, Min.]

It can be seen from the tables that M2VSM and VSM obtain the same results for all data sets in the case of rough clustering.
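For reference, the matching score of Eq. (6) can be sketched in Python as below. The sketch assumes that "clustered in the same way" means a pair is either placed together by both the method and the subject or separated by both, which is the reading under which identical clusterings score 1; the function names are hypothetical.

```python
from itertools import combinations


def labels_from_clusters(clusters):
    """Map each document id to the index of the cluster containing it."""
    return {doc: k for k, cluster in enumerate(clusters) for doc in cluster}


def match_score(method_clusters, subject_clusters):
    """Eq. (6): fraction of document pairs treated the same way by both
    clusterings (assumed: together in both, or apart in both)."""
    a = labels_from_clusters(method_clusters)
    b = labels_from_clusters(subject_clusters)
    pairs = list(combinations(sorted(a), 2))  # all |D|-choose-2 pairs
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


# The worked example from the text: method {1, 2}, {3}; subject {1, 2, 3}.
example = match_score([[1, 2], [3]], [[1, 2, 3]])
identical = match_score([[1, 2], [3]], [[1, 2], [3]])
```

In the worked example only the pair 1-2 agrees, so the score is 1/3, and two identical clusterings give 1.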
However, when detailed clustering is performed, the performance of VSM is lower than that of M2VSM. PVSM tends to outperform VSM when detailed clustering is performed, but its performance tends to be worse than that of the other two methods in rough clustering. Most importantly, the proposed M2VSM obtains the best results for all 3 data sets, in both rough and detailed clustering. These results show that M2VSM is capable of adjusting the granularity of clusters in terms of topic.

[Table 4. Experimental results for set C. Rows: NUM, TH, Avg, Max, Min.]

B. Evaluation of M2VSM-based Text Mining System

The performance of the developed text mining system is evaluated in terms of processing time. The documents used for the experiments are editorial articles of the 7 Japanese newspaper companies listed in Sec. IV-A. The total number of articles is 5,672, which were collected from May 1, 2005 to Jan 31, [year missing in source]. Table 5 shows the parameter values used and the specification of the machine on which the system was run. The processing time of the whole analysis process, i.e., from preprocessing to document clustering, is measured while varying the number of documents from 200 to 5,000. Note that the time spent on the user's interaction with the system is excluded from the processing time. Fig. 4 shows the relationship between processing time and the number of documents. It can be seen that the developed system can process 5,000 documents within 400 seconds.

Table 5. Parameters used for experiment

  Parameter                                  Value
  # of index terms / # of meta keywords      2,[...] (values incomplete)
  Clustering method                          Average linkage
  Threshold for clustering                   0.7
  CPU                                        Pentium [...] GHz
  Cache size                                 512KB
  Memory                                     755MB
  Swap                                       1.5GB

[Fig. 4. Relationship between the number of documents and processing time. Horizontal axis: # of documents; vertical axis: Time (s).]

V. CONCLUSIONS

M2VSM, a modified VSM based on meta keywords, is proposed for clustering documents with various granularities in terms of topic. M2VSM makes use of adjectives, adverbs, adnominal nouns, and adjective verbs as meta keywords for index terms, and calculates document similarity while considering the effect of meta keywords. Experiments are performed with sets of editorial articles from Japanese newspapers, and the results show that the obtained clustering results correspond to the results by test subjects in both rough and detailed clustering. A text mining system is developed based on M2VSM, and the experimental result shows that it has sufficient processing speed for practical use. In future work, we plan to provide the system to experts in a specific domain, such as management engineers.
Although only documents written in Japanese are considered in this paper, M2VSM itself can be applied to documents written in other languages, such as English. Since preprocessing, i.e., meta keyword extraction, differs from language to language, it should be studied for each language. Our future study includes the application of M2VSM to English documents. In the experiments reported in this paper, clustering at only two levels, rough and detailed, is performed. Applying M2VSM to the generation of multi-level (more than 2 levels) topical structures, such as those of Web directory services, is also a challenging task.

REFERENCES

[1] S. Chakrabarti, "Chapter 4: Similarity and Clustering," Mining the Web, Morgan Kaufmann.
[2] Y. Hamuro, N. Katoh, K. Yada, "MUSASHI: Flexible and Efficient Data Preprocessing Tool for KDD based on XML," Proceedings of the First International Workshop on Data Cleaning and Preprocessing, pp. 38-49.
[3] T. Ishibashi and Y. Takama, "Proposal of M2VSM and Its Comparison with Conventional VSM," AM2004, Vol. ICS-128, pp. 1-6.
[4] T. Ishibashi and Y. Takama, "Proposal of M2VSM for Information Retrieval in the Specific Field," SCIS&ISIS2004, THP-3-3.
[5] T. Kajinami and Y. Takama, "Interactive Keyword Map Equipped with Keywords Arrangement Support Functions for Emphasizing User's Intention," Trans. Information Processing Society of Japan, Vol. 48, No. 3, 2007 (written in Japanese).
[6] T. Kudo and Y. Matsumoto, "Japanese dependency analysis using cascaded chunking," Proc. of 6th Conference on Natural Language Learning, Vol. 20, pp. 1-7.
[7] MUSASHI: Mining Utilities and System Architecture for Scalable processing of HIstorical data.
[8] R. Papka and J. Allan, "On-line New Event Detection Using Single-pass Clustering," UMASS Computer Science Technical Report, UM-CS.
[9] M. Spitters and W. Kraaij, "TNO at TDT2001: Language Model-Based Topic Detection," Topic Detection and Tracking (TDT) Workshop 2001.
[10] Y. Takama, T. Kajinami, and A. Matsumura, "Application of Keyword Map-based Relevance Feedback to Interactive Blog Search," AMT 2005.
[11] T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization," in Proceedings of the 14th International Conference on Machine Learning.
[12] K. Yada, Y. Hamuro, N. Katoh, T. Washio, I. Fusamoto, "Data Mining Oriented CRM System Based on MUSASHI: C-MUSASHI," Proceedings of Second International Workshop on Active Mining, pp. 52-61.


More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Relevance Feedback Document Retrieval using Non-Relevant Documents

Relevance Feedback Document Retrieval using Non-Relevant Documents Relevance Feedback Document Retreval usng Non-Relevant Documents TAKASHI ONODA, HIROSHI MURATA and SEIJI YAMADA Ths paper reports a new document retreval method usng non-relevant documents. From a large

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval

Fuzzy C-Means Initialized by Fixed Threshold Clustering for Improving Image Retrieval Fuzzy -Means Intalzed by Fxed Threshold lusterng for Improvng Image Retreval NAWARA HANSIRI, SIRIPORN SUPRATID,HOM KIMPAN 3 Faculty of Informaton Technology Rangst Unversty Muang-Ake, Paholyotn Road, Patumtan,

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu

More information

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks Decson Strateges for Ratng Objects n Knowledge-Shared Research etwors ALEXADRA GRACHAROVA *, HAS-JOACHM ER **, HASSA OUR ELD ** OM SUUROE ***, HARR ARAKSE *** * nsttute of Control and System Research,

More information

Clustering Algorithm of Similarity Segmentation based on Point Sorting

Clustering Algorithm of Similarity Segmentation based on Point Sorting Internatonal onference on Logstcs Engneerng, Management and omputer Scence (LEMS 2015) lusterng Algorthm of Smlarty Segmentaton based on Pont Sortng Hanbng L, Yan Wang*, Lan Huang, Mngda L, Yng Sun, Hanyuan

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China

A Simple Methodology for Database Clustering. Hao Tang 12 Guangdong University of Technology, Guangdong, , China for Database Clusterng Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal: 6085@qq.com Me Zhang Guangdong Unversty of Technology, Guangdong, 0503, Chna E-mal:64605455@qq.com Database clusterng

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals

LinkSelector: A Web Mining Approach to. Hyperlink Selection for Web Portals nkselector: A Web Mnng Approach to Hyperlnk Selecton for Web Portals Xao Fang and Olva R. u Sheng Department of Management Informaton Systems Unversty of Arzona, AZ 8572 {xfang,sheng}@bpa.arzona.edu Submtted

More information

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD User Tweets based Genre Predcton and Move Recommendaton usng LSI and SVD Saksh Bansal, Chetna Gupta Department of CSE/IT Jaypee Insttute of Informaton Technology,sec-62 Noda, Inda sakshbansal76@gmal.com,

More information

ANALYSIS OF ADAPTIF LOCAL REGION IMPLEMENTATION ON LOCAL THRESHOLDING METHOD

ANALYSIS OF ADAPTIF LOCAL REGION IMPLEMENTATION ON LOCAL THRESHOLDING METHOD Nusantara Journal of Computers and ts Applcatons ANALYSIS F ADAPTIF LCAL REGIN IMPLEMENTATIN N LCAL THRESHLDING METHD I Gust Agung Socrates Ad Guna 1), Hendra Maulana 2), Agus Zanal Arfn 3) and Dn Adn

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB V. Hotař, A. Hotař Techncal Unversty of Lberec, Department of Glass Producng Machnes and Robotcs, Department of Materal

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

Rules for Using Multi-Attribute Utility Theory for Estimating a User s Interests

Rules for Using Multi-Attribute Utility Theory for Estimating a User s Interests Rules for Usng Mult-Attrbute Utlty Theory for Estmatng a User s Interests Ralph Schäfer 1 DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbrücken Ralph.Schaefer@dfk.de Abstract. In ths paper, we show that Mult-Attrbute

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

A Clustering Algorithm for Key Frame Extraction Based on Density Peak

A Clustering Algorithm for Key Frame Extraction Based on Density Peak Journal of Computer and Communcatons, 2018, 6, 118-128 http://www.scrp.org/ournal/cc ISSN Onlne: 2327-5227 ISSN Prnt: 2327-5219 A Clusterng Algorthm for Key Frame Extracton Based on Densty Peak Hong Zhao

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps

Visual Thesaurus for Color Image Retrieval using Self-Organizing Maps Vsual Thesaurus for Color Image Retreval usng Self-Organzng Maps Chrstopher C. Yang and Mlo K. Yp Department of System Engneerng and Engneerng Management The Chnese Unversty of Hong Kong, Hong Kong ABSTRACT

More information

Classic Term Weighting Technique for Mining Web Content Outliers

Classic Term Weighting Technique for Mining Web Content Outliers Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Intrinsic Plagiarism Detection Using Character n-gram Profiles

Intrinsic Plagiarism Detection Using Character n-gram Profiles Intrnsc Plagarsm Detecton Usng Character n-gram Profles Efstathos Stamatatos Unversty of the Aegean 83200 - Karlovass, Samos, Greece stamatatos@aegean.gr Abstract: The task of ntrnsc plagarsm detecton

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Concept Forest: A New Ontology-assisted Text Document Similarity Measurement Method

Concept Forest: A New Ontology-assisted Text Document Similarity Measurement Method Concept Forest: A New Ontology-asssted Text Document Smlarty Measurement Method James Z. Wang Wllam Taylor School of Computng Clemson Unversty, Box 340974 Clemson, SC 29634-0974, USA +1-864-656-7678 {jzwang,

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information