Concept Forest: A New Ontology-assisted Text Document Similarity Measurement Method

James Z. Wang, William Taylor
School of Computing, Clemson University, Box Clemson, SC , USA
{jzwang, wptaylo}@cs.clemson.edu

Abstract

Although using ontologies to assist information retrieval and text document processing has recently attracted more and more attention, existing ontology-based approaches have not shown advantages over the traditional keyword-based Latent Semantic Indexing (LSI) method. This paper proposes an algorithm to extract a concept forest (CF) from a document with the assistance of a natural language ontology, the WordNet lexical database. Using concept forests to represent the semantics of text documents, we then measure the semantic similarity of these documents as the commonality of their concept forests. Performance studies of text document clustering based on different document similarity measurement methods show that the CF-based similarity measurement is an effective alternative to the existing keyword-based methods. In particular, this CF-based approach has clear advantages over the existing keyword-based methods, including LSI, in processing text abstracts or in P2P environments where it is impractical to collect the entire document corpus for analysis.

1. Introduction

Currently, keyword-based techniques are commonly used in various information retrieval and text mining applications. Among them, the Vector Space Model (VSM) [1] and Latent Semantic Indexing (LSI) [2] are the most widely adopted. Using VSM, a text document is represented by a vector of the frequencies of the terms appearing in the document. The similarity between two text documents is measured as the cosine coefficient between their term frequency vectors. However, a major drawback of the keyword-based VSM approach is its inability to handle the polysemy and synonymy phenomena of natural language.
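To make the VSM baseline concrete, here is a minimal Python sketch of the cosine coefficient between two term-frequency vectors. It assumes documents are already tokenized into lowercase terms (stop-word removal and stemming are left out), and the helper names are illustrative rather than taken from any particular library.

```python
import math
from collections import Counter


def term_frequency_vector(tokens):
    """Count how often each term occurs in a tokenized document."""
    return Counter(tokens)


def cosine_coefficient(tf1, tf2):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared = set(tf1) & set(tf2)
    dot = sum(tf1[t] * tf2[t] for t in shared)
    norm1 = math.sqrt(sum(v * v for v in tf1.values()))
    norm2 = math.sqrt(sum(v * v for v in tf2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm1 * norm2)


d1 = term_frequency_vector("java coffee java bean".split())
d2 = term_frequency_vector("java programming language".split())
d3 = term_frequency_vector("dog".split())
d4 = term_frequency_vector("canine".split())
```

With these toy documents, the coffee document and the programming document score about 0.47 because both contain java. The dog document and the canine document score 0. Keyword matching alone cannot account for polysemy or synonymy.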
Because the meanings of words and the understanding of concepts differ across communities, different users might use the same word for different concepts (polysemy) or use different words for the same concept (synonymy). Data sources and data repositories are also heterogeneous and independent. Thus, matching only keywords may not accurately reveal the semantic similarity among text documents, or between search criteria and text documents. For example, the keyword java can represent three different concepts: coffee, an island, or a programming language. Conversely, the keywords dog and canine may represent the same concept in different documents. LSI tries to overcome the limitation of VSM by using statistically derived conceptual indices to represent text documents and queries. LSI assumes that there is an underlying latent structure in word usage that is partially obscured by variability in word choice. It tries to address the polysemy and synonymy problems by modeling the co-occurrence of keywords in documents. Earlier studies contend that LSI may implicitly reveal concepts through the co-occurrence of keywords. However, we found that the co-occurrence of keywords does not necessarily mean that they are contextually related in the document, especially in multi-disciplinary research papers. This is exactly why using LSI-based tools to extract terms from commercial web documents, which may contain ads, headlines, and news feeds, is a questionable practice. Moreover, it is not clear how to map an LSI-based conceptual index to the underlying concept, making it difficult to visualize the text mining results. In addition, some text document archives, such as the MEDLINE database [3] and web blogging entries, contain primarily short articles or abstracts instead of long papers. These short documents may not provide sufficient co-occurrence information for LSI-based semantic similarity measurement. Furthermore, in dynamic environments, such as live news feeds or P2P systems, it is impractical to collect the entire document corpus for analysis.
In this paper, to address the weaknesses of existing keyword-based approaches, we propose an ontology-assisted text document similarity measurement method that builds a concept forest to represent the semantics of a text document. The rest of this paper is organized as follows. We first discuss the existing ontology-based approaches and their weaknesses in Section 2. We then discuss our ontology-assisted concept forest construction algorithm and the associated similarity measurement method in Section 3. In Section 4, we cluster various text document corpora based on the similarity values obtained by different methods to validate the advantages of our CF-based approach. Finally, we give our conclusion and discuss future work in Section 5.

2. Background and Existing Approaches

Recently, many studies have tried to use ontologies to assist information retrieval and text document processing in order to address the problems of keyword-based approaches. These ontology-based approaches can be divided into two categories. One category of ontology-based methods [4, 5, 6, 7] applies machine learning methods, such as clustering analysis and fuzzy logic, to construct ontologies from text documents. These ontologies are then used to assist information retrieval and text document processing [8, 9]. However, these methods require analyzing the entire document corpus to construct a good ontology, and the performance of information retrieval and text document processing depends on how good the constructed ontologies are. During the corpus analysis, terms that rarely appear in the document corpus are often ignored because of their low frequencies of occurrence. According to information theory, however, these rare terms carry high information content and are therefore valuable for information retrieval. Ignoring these terms in the constructed ontologies may affect the performance of information retrieval and text document processing. Moreover, these ontology-based methods have not been fully evaluated against the keyword-based LSI method, arguably the best keyword-based method.
Another group of ontology-based methods uses an existing ontology, such as WordNet [10], to assist information retrieval and text document processing. These methods take three different approaches to exploiting the existing ontological knowledge.

The first approach [11, 12] uses WordNet to find synonyms or hypernyms of terms in order to improve the performance of information retrieval and text document processing. However, this approach may introduce noise by adding semantic content that is not present in the document corpus. For instance, given a document about beef and a document about pork, a hypernym-based method may replace both beef and pork with meat, because the two terms share the hypernym meat. This approach oversimplifies, or overgeneralizes, the problem. It becomes impossible to distinguish documents containing beef from documents containing pork. Another problem with this approach is that it does not perform word sense disambiguation. Instead, all synonyms or hypernyms related to a keyword are used to replace the keyword. These weaknesses often lead to disappointing information retrieval and text document processing performance [13, 14].

The second approach focuses on word sense disambiguation [15, 16, 17, 18] to address the synonymy and polysemy problems in natural language processing. However, this approach tries to determine an exact sense for each term, which often results in misclassified terms. It also ignores how the semantic similarities and relationships among different terms in the same text document affect the performance of information retrieval and text document processing.

To address the problems in the first two approaches, the third approach applies various techniques [19, 20, 21, 22] to discover the semantic similarities and relationships of terms. These are then used to enhance keyword-based information retrieval and text document processing methods, such as VSM. However, the techniques used to discover the term relationships and similarities have their own weaknesses.
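The over-generalization problem can be shown in a few lines. The sketch below uses a hypothetical one-level hypernym table in place of WordNet. It scores documents by the overlap of their term sets (a Jaccard ratio, used here purely for illustration).

```python
# Hypothetical one-level hypernym table standing in for WordNet.
HYPERNYM = {"beef": "meat", "pork": "meat"}


def generalize(terms):
    """Replace each term that has a listed hypernym by that hypernym."""
    return [HYPERNYM.get(t, t) for t in terms]


def overlap(a, b):
    """Shared distinct terms divided by all distinct terms (Jaccard ratio)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


beef_doc = ["beef", "price", "rise"]
pork_doc = ["pork", "price", "rise"]
before = overlap(beef_doc, pork_doc)
after = overlap(generalize(beef_doc), generalize(pork_doc))
```

Before generalization, the beef and pork documents share two of four distinct terms (0.5). Once both meats are replaced by meat, the documents become identical and score 1.0, so nothing distinguishes them any more.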
Sedding [19] used a naive, syntax-based disambiguation approach for document clustering. It assigns each word a part-of-speech (POS) tag and enriches the bag-of-words data representation with synonyms and hypernyms extracted from WordNet. Unfortunately, this study found that including synonyms and hypernyms disambiguated only by POS tags does not improve the effectiveness of text document clustering. The authors attributed this underperformance to noise from incorrect senses retrieved from WordNet. They concluded that disambiguation by POS alone cannot reveal the full potential of adding background knowledge to information retrieval and text document processing. To further investigate this issue, Simone [20] proposed a document search technique that uses other methods in addition to POS tagging. It clusters search results into meaningful categories according to the words that modify the original search term in the text document. This work examines whether the antonymy relation, instead of synonyms and hypernyms, can be applied to the modifiers found in documents to decompose a set of search results into a hierarchy of sub-clusters. Unfortunately, their experimental studies again suggest that this approach cannot improve the performance of information retrieval. These two studies [19, 20] suggest that exploiting term relationships or similarities from WordNet may not improve information retrieval and text document processing. However, other studies using different methods imply that it is possible to use term relationships or similarities to improve the performance of the keyword-based VSM. Hung [21] improved the performance of the traditional VSM with a guided self-organizing map (SOM) that merges statistical methods, competitive neural models, and semantic relationships obtained from WordNet. However, some human involvement is required to build the guided SOM. Jing [22] calculates a mutual information matrix for all terms in the documents from information obtained from WordNet. This mutual information is then used to enhance the keyword-based VSM method. However, automatically computing term mutual information (TMI) is sometimes problematic and may lead to wrong conclusions about the quality of the learned mutual similarity [23]. SOM and TMI can improve the performance of the keyword-based VSM. However, their performance has not been compared with LSI, the best keyword-based method. Furthermore, like VSM and LSI, these methods require analyzing the entire document corpus. To address the problems in existing ontology-based methods, we propose a new ontology-assisted method to measure the semantic similarity of text documents. This new method constructs a concept forest (CF) from a text document, based on the co-occurrence of terms and their semantic relationships found in WordNet. Using the CF to represent the semantics of text documents, we propose a simple method to measure the semantic similarity of two text documents. A unique feature of our CF-based method is that the concept forest is derived only by analyzing the co-occurrences and relationships of terms within a single document. In contrast, existing approaches all require analyzing the entire text document corpus to determine the semantic similarity of two text documents. Therefore, our CF-based method is a practical alternative to existing information retrieval and text document processing methods in dynamic environments, such as P2P systems and live news feeds, where it is impractical to collect the entire document corpus for analysis.

3. Concept Forest and Semantic Similarity

Our CF-based method includes three steps: concept forest construction, semantic content purification, and similarity measurement.

3.1 Concept Forest Construction

We use WordNet [10] to assist our concept forest construction. WordNet is a large lexical database of English words. Its nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), and each synset represents a distinct concept. Synsets are interlinked by means of conceptual and lexical relations. WordNet organizes approximately 150,000 words in over 115,000 synsets. Every synset contains a group of synonymous words or collocations, and different senses (concepts) of a word are in different synsets. Most synsets are connected to other synsets through semantic relations, such as hypernymy and hyponymy. The dominant semantic relationship in WordNet is the hypernym, or is-a, relationship. Most nouns and verbs are organized into hierarchies defined by hypernym (is-a) relationships. For example, Figure 1 depicts the hypernym hierarchy for the first sense of the word dog.

Figure 1: Hierarchy of Hypernym Relationships.

Given a text document, we first extract all keywords and their occurrence frequencies from the document. Stop words, such as pronouns, common verbs, common nouns, adjectives, and frilly words, are excluded. According to previous studies [1, 2], these words add little or no value in determining the document's semantic content. We then use a simple WordNet morphology interface (the function morphstr()) to stem these keywords, i.e., to map inflected (or sometimes derived) words to their stem, base, or root form. For instance, cared, cares, and caring are all mapped to the root word care. After word stemming, we determine the proper synset for each word based on the co-occurrence of terms in the document and the semantic relationships among senses defined in WordNet. In WordNet, each set of synonyms (synset) shares some common properties, such as a gloss (or dictionary) definition, and is indexed by a unique ID called a synsetid.
However, one word may be related to several synsets due to the polysemy of natural language. For instance, the word java has three different senses: (1) Coffee, cafe (synsetid: ); (2) Programming, Programming Language (synsetid: ); (3) An Island (synsetid: ). Therefore, simply retrieving all senses of the stemmed words to represent the semantic content of a document introduces a lot of noise [13, 14]. To address this issue, for each stemmed word obtained from a document, we use only the sense that clearly represents the concept of the word in this document for our concept forest construction. Our procedure checks every pair of stemmed words obtained from the text document to determine whether there are semantic relationships between their senses defined in WordNet. We consider only the hypernym relationship in this study. According to our experimental studies, the hypernym relationship dominates among the terms in text documents, so it alone is adequate for measuring the semantic similarity of documents. Given two terms T1 and T2 from the same text document, suppose their respective synsets S1 and S2 have a hypernym relationship. The synsetids of S1 and S2 then represent the concepts of T1 and T2, respectively, and the other senses of T1 and T2 are discarded. Meanwhile, an is-a relationship link is formed between the synsetids of S1 and S2. This process completes when all pairs of stemmed words have been investigated. For instance, suppose a document contains the words disease, sickness, influenza, drug and medicine. We can construct a concept tree for the terms disease, sickness and influenza using is-a relationship links, based on the hypernym relationships among these terms shown in Figure 2. Similarly, a concept tree can be built for the terms drug and medicine. These two concept trees form the concept forest depicted in Figure 3. Note that Figure 3 shows the terms rather than their synsetids for demonstration only. Actual concept forests use synsetids to represent the concepts whenever possible. Note also that a concept tree may contain only a single stemmed word.

S: (n) influenza, flu, grippe
    S: (n) contagious disease, contagion
        S: (n) communicable disease
            S: (n) disease
                S: (n) illness, unwellness, malady, sickness
Figure 2: Hypernym hierarchy for the terms influenza, disease and sickness.

influenza --is-a--> disease --is-a--> sickness
medicine --is-a--> drug
Figure 3: Concept forest derived from the terms influenza, disease, sickness, drug and medicine.
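The pairwise linking step can be sketched as follows. Querying WordNet requires an external database, so the sketch uses a tiny hypothetical stand-in. SENSES maps each stemmed word to candidate synset ids, and HYPERNYM_OF maps each synset id to its direct hypernym. The ids (e.g. "illness#1") are invented labels, not real WordNet synsetids. Two senses are treated as hypernym-related when one appears among the other's inherited hypernyms. Checking every pair also links influenza directly to sickness. The final prune_implied step drops a link when another linked sense lies between its endpoints. This step is an assumption added here so that the result matches the drawing in Figure 3.

```python
from itertools import combinations

# Hypothetical miniature stand-in for WordNet (ids are invented labels that
# only mimic the hierarchy in Figure 2).
SENSES = {
    "disease": ["disease#1"],
    "sickness": ["illness#1", "nausea#1"],
    "influenza": ["influenza#1"],
    "drug": ["drug#1"],
    "medicine": ["medicine#1", "medicine#2"],
}
HYPERNYM_OF = {  # synset id -> direct hypernym synset id
    "influenza#1": "contagious_disease#1",
    "contagious_disease#1": "communicable_disease#1",
    "communicable_disease#1": "disease#1",
    "disease#1": "illness#1",
    "medicine#1": "drug#1",
}


def ancestors(sense):
    """Inherited hypernyms of a sense, nearest first."""
    chain = []
    parent = HYPERNYM_OF.get(sense)
    while parent is not None:
        chain.append(parent)
        parent = HYPERNYM_OF.get(parent)
    return chain


def build_concept_forest(words):
    """Check every pair of words; when two of their senses are hypernym-related,
    keep those senses and add an is-a link (child, parent)."""
    kept = {}
    links = set()
    for w1, w2 in combinations(words, 2):
        for s1 in SENSES.get(w1, []):
            for s2 in SENSES.get(w2, []):
                if s2 in ancestors(s1):
                    links.add((s1, s2))
                elif s1 in ancestors(s2):
                    links.add((s2, s1))
                else:
                    continue  # no hypernym relation between these senses
                kept.setdefault(w1, set()).add(s1)
                kept.setdefault(w2, set()).add(s2)
    return kept, links


def prune_implied(links):
    """Drop child->ancestor links that pass through another linked sense."""
    nodes = {s for link in links for s in link}
    result = set()
    for child, parent in links:
        chain = ancestors(child)
        between = chain[: chain.index(parent)]
        if not any(s in nodes for s in between):
            result.add((child, parent))
    return result


kept, links = build_concept_forest(["disease", "sickness", "influenza", "drug", "medicine"])
forest = prune_implied(links)
```

Here sickness keeps only its illness#1 sense and medicine keeps only medicine#1. Their second senses have no hypernym link to any other keyword and are discarded. Words that link to nothing do not appear in kept at all. The remaining rules of Section 3.1 decide how such words enter the forest.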
Some existing approaches [1, 2] use all terms in all synsets of the stemmed words to represent the semantic content of a document. In contrast, we treat keywords differently according to their synset properties and the semantic relationships among their synsets. If a keyword has only one sense, its synsetid is used in the concept forest. Suppose instead that a keyword has more than one sense and no other keyword's senses are semantically related to its senses. Then the keyword is kept as its original stemmed word in the concept forest, since we cannot disambiguate its sense. Finally, suppose a keyword has several senses and one or more of them are related to the senses of other keywords in the text document. Then only the synsetids of these related senses are kept in the concept forest. The other senses of this keyword are discarded, since they are irrelevant to the semantic content of the text document.

3.2 Semantic Content Purification

A concept forest constructed by the method described in Section 3.1 may contain terms or synsetids that are not closely related to the main topics of the text document. Such terms or synsetids may introduce noise into information retrieval and text document processing. To address this issue, we use the frequencies of terms occurring in the text document to calculate a semantic content rate (SCR) for each concept tree in the concept forest. Each stemmed word obtained from a text document has an associated word frequency value: the number of times the word occurs in the document. When a stemmed word is mapped to a particular synsetid during the CF construction, its word frequency value is transferred to the synsetid. If several stemmed words are mapped to the same synsetid, the word frequency value of this synsetid is the sum of the word frequency values of these words.
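The three sense-selection rules at the end of Section 3.1 and the frequency transfer just described can be sketched in Python. The candidate senses and related senses are assumed to come from the pairwise hypernym check of Section 3.1. All synset ids are hypothetical labels. The text does not say how to credit the count of a word that keeps several related senses, so this sketch assumes each kept sense receives the full count.

```python
from collections import defaultdict


def concept_nodes(word, senses, related):
    """Node label(s) a stemmed keyword contributes to the concept forest.

    senses:  candidate synset ids for the word (hypothetical lookup result)
    related: those senses that share a hypernym link with a sense of some
             other keyword in the same document
    """
    if len(senses) == 1:
        return set(senses)  # unambiguous word: use its only synset id
    if not related:
        return {word}       # cannot disambiguate: keep the stemmed word
    return set(related)     # keep related senses, discard the rest


def node_frequencies(word_freq, word_nodes):
    """Transfer each word's occurrence count to its node(s), summing counts
    when several words map to the same synset id."""
    freq = defaultdict(int)
    for word, count in word_freq.items():
        for node in word_nodes[word]:
            freq[node] += count  # assumption: every kept sense gets the full count
    return dict(freq)


nodes = {
    "influenza": concept_nodes("influenza", ["influenza#1"], set()),
    "flu": concept_nodes("flu", ["influenza#1"], set()),
    "java": concept_nodes("java", ["java#coffee", "java#island", "java#language"], set()),
    "medicine": concept_nodes("medicine", ["medicine#1", "medicine#2"], {"medicine#1"}),
}
freq = node_frequencies({"influenza": 4, "flu": 2, "java": 1, "medicine": 3}, nodes)
```

Influenza and flu share one synset id, so their counts add up to 6. The ambiguous java stays a plain word, and medicine keeps only its related sense.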
We further define the semantic content weight of a concept tree as the sum of the word frequency values of all its associated synsetids. For a single-node tree, the semantic content weight is the word frequency value of that single node. Assume the semantic content weights of the concept trees in a concept forest are $w_1, w_2, \ldots, w_n$, respectively. The semantic content rate of concept tree $i$ is then defined as:

$$SCR_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \qquad (1)$$

The SCR values in a concept forest indicate the semantic organization of the associated text document. A concept forest obtained from a clearly and concisely written single-topic abstract may contain a concept tree with an SCR value greater than 75%. In contrast, a concept forest obtained from a long multi-topic text document may contain several concept trees with much smaller SCR values. To purify the semantic content of a concept forest, we use a threshold (e.g., 5%) to filter out concept trees with low SCR values. Any concept tree whose SCR value falls below this threshold is removed from the final purified concept forest.
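Equation 1 and the threshold filter can be written as a short Python function. The sketch assumes each concept tree is given as a list of node labels and that per-node frequencies have already been accumulated as described above. The node labels and counts in the example are made up.

```python
def semantic_content_rates(trees, node_freq):
    """SCR_i = w_i / sum_j w_j, where w_i sums the frequencies of tree i's nodes."""
    weights = [sum(node_freq.get(node, 0) for node in tree) for tree in trees]
    total = sum(weights)
    return [w / total if total else 0.0 for w in weights]


def purify(trees, node_freq, threshold=0.05):
    """Drop every concept tree whose SCR falls below the threshold."""
    rates = semantic_content_rates(trees, node_freq)
    return [tree for tree, scr in zip(trees, rates) if scr >= threshold]


node_freq = {"influenza#1": 15, "disease#1": 6, "illness#1": 3,
             "drug#1": 4, "medicine#1": 2, "weather": 1}
trees = [["influenza#1", "disease#1", "illness#1"],
         ["drug#1", "medicine#1"],
         ["weather"]]
rates = semantic_content_rates(trees, node_freq)
kept = purify(trees, node_freq)
```

The first tree holds 24 of 31 occurrences (an SCR of about 77%, typical of a focused abstract). The second holds about 19%. The single-node weather tree holds about 3% and is removed by the 5% threshold.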

3.3 Semantic Similarity Measurement

Since a concept forest represents the semantic content of a text document, the semantic similarity of two text documents can be determined by comparing their concept forests. Formally, a concept forest is defined as a Directed Acyclic Graph (DAG): $CF = [T, E, R]$. Here $T = \{t_1, t_2, \ldots, t_n\}$ is a set of stemmed words or synsetids, and $E = \{e_1, e_2, \ldots, e_m\}$ is a set of edges connecting synsetids with relationships defined in $R = \{r_1, r_2, \ldots, r_k\}$. Specifically, an edge $e$ is defined as a triplet $[t_1, t_2, r_j]$, where $t_1, t_2 \in T$ and $r_j \in R$. In addition, two terms can be linked by only one relationship, that is,

$$\forall\, l \neq k:\quad [t_i, t_j, r_k] \in E \;\Rightarrow\; [t_i, t_j, r_l] \notin E.$$

For instance, the concept forest in Figure 3 can be represented as CF = [{disease, sickness, influenza, drug, medicine}, {[influenza, disease, is-a], [disease, sickness, is-a], [medicine, drug, is-a]}, {is-a}].

Given two documents $D_1$ and $D_2$ with concept forests $CF_1 = [T_1, E_1, R_1]$ and $CF_2 = [T_2, E_2, R_2]$, respectively, a full comparison would consider the similarities of the term sets, edge sets, and relationship sets of the two forests. However, we use only the hypernym (is-a) relationship to construct the CF, so the relationship set R is the same for all CFs. Moreover, the selection of terms during the CF construction implies their relationships. Therefore, we calculate the semantic similarity of two text documents simply by comparing the term sets ($T_1$ and $T_2$) of their concept forests. We expect that this simple measurement is sufficient for information retrieval and text document processing. That is:

$$Sim(D_1, D_2) = \frac{|T_1 \cap T_2|}{|T_1 \cup T_2|} \qquad (2)$$

4. Experimental Studies

To evaluate whether the obtained concept forests can represent the semantic content of a document, we cluster text documents based on their semantic similarity values calculated by Equation 2. The clustering results are then compared with the results of document clustering based on VSM and LSI, respectively.
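The CF = [T, E, R] structure and Equation 2 translate directly into a small Python sketch. It assumes Equation 2 is the ratio of shared terms to all terms of the two forests (a Jaccard coefficient). It checks the one-relationship-per-pair constraint when a forest is built. For two empty forests it returns 0, a convention chosen here rather than taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptForest:
    """CF = [T, E, R]: terms, (term, term, relation) edges, and relations."""
    terms: frozenset
    edges: frozenset = frozenset()
    relations: frozenset = frozenset({"is-a"})

    def __post_init__(self):
        seen = set()
        for t1, t2, r in self.edges:
            # every edge must join two terms of T by a relationship in R ...
            if t1 not in self.terms or t2 not in self.terms or r not in self.relations:
                raise ValueError(f"edge {(t1, t2, r)} is not in T x T x R")
            # ... and two terms may be linked by only one relationship
            if (t1, t2) in seen:
                raise ValueError(f"{t1!r} and {t2!r} are linked more than once")
            seen.add((t1, t2))


def cf_similarity(cf1, cf2):
    """Equation 2 (as reconstructed): |T1 & T2| / |T1 | T2|."""
    union = cf1.terms | cf2.terms
    return len(cf1.terms & cf2.terms) / len(union) if union else 0.0


figure3 = ConceptForest(
    frozenset({"disease", "sickness", "influenza", "drug", "medicine"}),
    frozenset({("influenza", "disease", "is-a"),
               ("disease", "sickness", "is-a"),
               ("medicine", "drug", "is-a")}),
)
other = ConceptForest(frozenset({"disease", "influenza", "vaccine"}))
score = cf_similarity(figure3, other)
```

The Figure 3 forest and the hypothetical forest {disease, influenza, vaccine} share 2 of 6 distinct terms, giving a similarity of 1/3. Equation 2 looks only at T, so the edge set is stored and validated but never enters the score.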
4.1 Text Document Corpus

As in many previous studies, we derive our document corpora from the Reuters Text Categorization Collection in the UCI KDD archive [24]. The Reuters dataset is a collection of documents that appeared on the Reuters newswire; the documents were assembled and indexed with categories. The dataset consists of approximately 21,500 files covering 132 (possibly overlapping) categories, with article sizes ranging from 12 to 900 words. As discussed previously, LSI is not effective for information retrieval and text document processing on short text documents, because short documents contain insufficient co-occurrence information. We want to study whether our CF-based text document similarity measurement method can address this issue, so that it can be used for information retrieval in text abstract databases such as MEDLINE. Therefore, we intentionally select text documents containing fewer than 400 words. As shown in Table 1, four text document corpora containing 50 to 500 documents are selected for our experimental studies.

Table 1: Selected text document corpora
Corpus   Corpus Characteristics
C-1      50 documents, 2 categories (Oil, Nat-Gas), 25 documents in each category
C-2      100 documents, 2 categories (Coffee, Sugar), 50 documents in each category
C-3      200 documents, 4 categories (Grain, Wheat, Ship, Crude), 50 documents in each category
C-4      500 documents, 2 categories (Wheat, Grain), 250 documents in each category

4.2 Performance Evaluation Method

Many studies have used the K-means clustering algorithm or its variants for text document clustering [13, 14]. However, K-means is not suitable for text document clustering with our CF-based similarity measurement, because it does not make sense to calculate a mean similarity among a set of documents. Therefore, we use an agglomerative hierarchical clustering algorithm in our performance study. Given a text document corpus, each document initially belongs to its own individual cluster.
We set the initial similarity threshold to 1 and decrease it in small intervals, so that documents with similar semantics are gradually merged into the same group. The category of each document is already known. Clustering therefore stops once the majority of documents from different categories have fallen into their respective clusters, and any further decrease would merge clusters made up primarily of documents from two different categories. After clustering, we calculate the clustering accuracy as the number of documents correctly clustered into their categories divided by the total number of documents.
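The evaluation procedure can be sketched as below. The text does not say how two clusters are compared at a given threshold or how "the majority of documents" is judged, so this sketch makes three assumptions. First, it uses single linkage: two clusters merge when any cross pair reaches the threshold. Second, the threshold drops in fixed steps of 1/20. Third, the process halts before the first step that would merge two clusters of at least two documents each whose majority categories differ. A document counts as correctly clustered when its category matches its cluster's majority category.

```python
from collections import Counter


def single_link_clusters(n, sim, threshold):
    """Partition documents 0..n-1 so that every pair with sim >= threshold
    shares a cluster (single linkage, computed with union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim(i, j) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def majority(cluster, labels):
    """Most common category among the documents of a cluster."""
    return Counter(labels[i] for i in cluster).most_common(1)[0][0]


def cluster_until_mixing(n, sim, labels, steps=20, min_size=2):
    """Lower the threshold from 1 in steps of 1/steps; stop just before the
    step that would merge two clusters (each with at least min_size
    documents) dominated by different categories."""
    clusters = single_link_clusters(n, sim, 1.0)
    for k in range(1, steps):
        candidate = single_link_clusters(n, sim, 1.0 - k / steps)
        owner = {i: c_id for c_id, c in enumerate(candidate) for i in c}
        majorities = {}
        for c in clusters:
            if len(c) >= min_size:
                majorities.setdefault(owner[c[0]], set()).add(majority(c, labels))
        if any(len(found) > 1 for found in majorities.values()):
            break  # the next step would mix two established categories
        clusters = candidate
    return clusters


def clustering_accuracy(clusters, labels):
    """Share of documents whose category equals their cluster's majority."""
    correct = sum(
        sum(1 for i in c if labels[i] == majority(c, labels)) for c in clusters
    )
    return correct / len(labels)


# Toy corpus: five documents with hypothetical pairwise similarities.
labels = ["oil", "oil", "gas", "gas", "gas"]
SIM = {(0, 1): 0.83, (0, 2): 0.31, (0, 3): 0.05, (0, 4): 0.76, (1, 2): 0.22,
       (1, 3): 0.12, (1, 4): 0.50, (2, 3): 0.72, (2, 4): 0.10, (3, 4): 0.15}


def sim(i, j):
    return SIM[(min(i, j), max(i, j))]


clusters = cluster_until_mixing(len(labels), sim, labels)
accuracy = clustering_accuracy(clusters, labels)
```

On the five toy documents, document 4 (gas) is pulled into the oil cluster. The run stops before the 0.30 step, with clusters {0, 1, 4} and {2, 3} and an accuracy of 4/5 = 80%.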

Besides clustering text documents with our CF-based similarity measurement method, we also cluster them using VSM and LSI as the document similarity measurement methods. For VSM, the cosine coefficients of the document vectors are used as the similarity measures. For LSI, we calculate the rank-k approximation of the term vector for each document and measure similarity by the cosine coefficients of these term vectors. We then use these similarity values to cluster the text documents. We repeat the same process for different k values and report the best clustering results for LSI.

4.3 Performance Results

We conducted our experimental studies on a DELL desktop computer equipped with a 1.0 GHz Intel Pentium IV processor and 512 MB RAM, running Red Hat Enterprise Linux. We cluster the text document corpora listed in Table 1 with three different similarity measurement methods: VSM, LSI, and the CF-based method. Table 2 lists the clustering accuracy of each method. In addition to the clustering accuracy, we also measure the total time needed to complete the corpus analysis and document clustering. The results are reported in Figure 4.

Table 2: Clustering accuracies on the text corpora listed in Table 1
Corpus   VSM    LSI      CF
C-1      64%    64%      74%
C-2      50%    62%      80%
C-3      25%    34%      48%
C-4      50%    56.8%    68%

Figure 4 (corpora C-1 to C-4; series CF, VSM): Total time (minutes) needed to complete the corpus analysis and text document clustering using different methods on the text corpora listed in Table 1.

The performance results in Table 2 show that document clustering based on our CF-based similarity measurement is much more accurate than clustering based on VSM or LSI. Meanwhile, the clustering accuracies based on LSI are better than those based on VSM. The execution times depicted in Figure 4 show the runtime efficiency of our CF-based text document processing method. The CF-based method spends much less total time on corpus analysis and document clustering than VSM or LSI.

5. Conclusion and Future Studies

In this paper, we propose a novel algorithm to extract a concept forest (CF) from a text document with the assistance of a natural language ontology, the WordNet lexical database. Using concept forests to represent the semantics of text documents, we measure the semantic similarity of two text documents by simply comparing the term sets in their respective concept forests. This CF-based similarity measurement does not require analyzing the entire document corpus. This is an advantage over most existing document similarity measurement methods, including the popular VSM and LSI. It allows our CF-based text document similarity measurement method to be used in P2P environments, where collecting the entire document corpus for analysis is impractical. Our experimental studies also show that the CF-based method performs much better than both VSM and LSI when document sizes are relatively small. Furthermore, our CF-based document similarity measurement method is much more efficient in terms of the total execution time for corpus analysis and document clustering. Therefore, we believe the CF-based approach is a practical alternative to the existing keyword-based methods for information retrieval and text mining in text abstract databases, such as MEDLINE. We are currently designing a graph-matching-based method to compare the similarity of two concept forests. We hope it will provide a more sophisticated text document similarity measurement and improve text document clustering accuracy. We are also implementing a CF-based information retrieval system to effectively retrieve text abstracts from the MEDLINE database.

6. References

[1] G. Salton, A. Wong, and C. S. Yang, A Vector Space Model for Automatic Indexing, Communications of the ACM, vol. 18, no. 11, 1975.
[2] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, 41(6), 1990.
[3] MEDLINE Fact Sheet.
[4] C. S. Lee, Z. W. Jian, and L. K. Huang, A Fuzzy Ontology and Its Application to News Summarization, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 35, no. 5.
[5] L. Dey, A. C. Rastogi, and S. Kumar, Generating Concept Ontologies through Text Mining, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06).
[6] O. S. Chin, N. Kulathuramaiyer, and A. W. Yeo, Automatic Discovery of Concepts from Text, IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), December 2006.
[7] B. Fortuna, D. Mladenic, and M. Grobelnik, Semi-Automatic Construction of Topic Ontology, Conference on Data Mining and Data Warehouses (SiKDD 2005) at multiconference IS.
[8] R. Navigli, P. Velardi, and A. Gangemi, Ontology Learning and Its Application to Automated Terminology Translation, IEEE Intelligent Systems, vol. 18, no. 1.
[9] V. Sugumaran and V. C. Storey, Ontologies for Conceptual Modeling: Their Creation, Use, and Management, Data and Knowledge Engineering, vol. 42, no. 3.
[10] C. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, May.
[11] S. Scott and S. Matwin, Text Classification Using WordNet Hypernyms, in S. Harabagiu (ed.), Use of WordNet in Natural Language Processing Systems: Proceedings of the Conference, Association for Computational Linguistics, Somerset, New Jersey.
[12] D. Koller and M. Sahami, Hierarchically Classifying Documents Using Very Few Words, Proceedings of the 14th International Conference on Machine Learning ECML98.
[13] A. Kehagias, V. Petridis, V. G. Kaburlasos, and P. Fragkou, A Comparison of Word- and Sense-Based Text Categorization Using Several Classification Algorithms, Journal of Intelligent Information Systems, 21(3).
[14] A. Hotho, S. Staab, and G. Stumme, Ontologies Improve Text Document Clustering, Proceedings of the IEEE International Conference on Data Mining.
[15] D. Mavroeidis et al., Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification, in A. Jorge et al. (eds.), PKDD 2005, LNAI 3721.
[16] Y. Chung and J.-H. Lee, Practical Word-Sense Disambiguation Using Co-occurring Concept Codes, Machine Translation 19, 2005.
[17] E. W. De Luca and A. Nürnberger, Using Clustering Methods to Improve Ontology-Based Query Term Disambiguation, International Journal of Intelligent Systems 21(7), 2006.
[18] Y. Liu, P. Scheuermann, X. Li, and X. Zhu, Using WordNet to Disambiguate Word Senses for Text Classification, in Y. Shi et al. (eds.), ICCS 2007, Part III, LNCS 4489.
[19] J. Sedding and D. Kazakov, WordNet-based Text Document Clustering, Proceedings of the Third Workshop on Robust Methods in Analysis of Natural Language Data (ROMAND), Geneva.
[20] T. de Simone and D. Kazakov, Using WordNet Similarity and Antonymy Relations to Aid Document Retrieval, Recent Advances in Natural Language Processing (RANLP 2005), Borovets, Bulgaria, September 2005.
[21] C. Hung, S. Wermter, and P. Smith, Hybrid Neural Document Clustering Using Guided Self-Organization and WordNet, IEEE Intelligent Systems, vol. 19, no. 2.
[22] L. Jing, L. Zhou, M. Ng, and J. Huang, Ontology-based Distance Measure for Text Clustering, SIAM Text Mining 2006 Workshop.
[23] M. Sabou, Learning Web Service Ontologies: an Automatic Extraction Method and its Evaluation, in P. Buitelaar, P. Cimiano, and B. Magnini (eds.), Ontology Learning and Population, IOS Press, 2005.
[24] Reuters Text Categorization Collection, s21578.html


Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval LRD: Latent Relaton Dscovery for Vector Space Expanson and Informaton Retreval Techncal Report KMI-06-09 March, 006 Alexandre Gonçalves, Janhan Zhu, Dawe Song, Vctora Uren, Roberto Pacheco In Proc. of

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Online Text Mining System based on M2VSM

Online Text Mining System based on M2VSM FR-E2-1 SCIS & ISIS 2008 Onlne Text Mnng System based on M2VSM Yasufum Takama 1, Takash Okada 1, Toru Ishbash 2 1. Tokyo Metropoltan Unversty, 2. Tokyo Metropoltan Insttute of Technology 6-6 Asahgaoka,

More information

Ontology Generator from Relational Database Based on Jena

Ontology Generator from Relational Database Based on Jena Computer and Informaton Scence Vol. 3, No. 2; May 2010 Ontology Generator from Relatonal Database Based on Jena Shufeng Zhou (Correspondng author) College of Mathematcs Scence, Laocheng Unversty No.34

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS

HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS VIJAY SONAWANE 1, D.RAJESWARA.RAO 2 1 Research Scholar, Department of CSE, K.L.Unversty, Green Felds, Guntur, Andhra Pradesh

More information

Learning Topic Structure in Text Documents using Generative Topic Models

Learning Topic Structure in Text Documents using Generative Topic Models Learnng Topc Structure n Text Documents usng Generatve Topc Models Ntsh Srvastava CS 397 Report Advsor: Dr Hrsh Karnck Abstract We present a method for estmatng the topc structure for a document corpus

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

Remote Sensing Image Retrieval Algorithm based on MapReduce and Characteristic Information

Remote Sensing Image Retrieval Algorithm based on MapReduce and Characteristic Information Remote Sensng Image Retreval Algorthm based on MapReduce and Characterstc Informaton Zhang Meng 1, 1 Computer School, Wuhan Unversty Hube, Wuhan430097 Informaton Center, Wuhan Unversty Hube, Wuhan430097

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks 2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Query classification using topic models and support vector machine

Query classification using topic models and support vector machine Query classfcaton usng topc models and support vector machne Deu-Thu Le Unversty of Trento, Italy deuthu.le@ds.untn.t Raffaella Bernard Unversty of Trento, Italy bernard@ds.untn.t Abstract Ths paper descrbes

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Domain Thesaurus Construction from Wikipedia *

Domain Thesaurus Construction from Wikipedia * Internatonal Conference on Computer, Networks and Communcaton Engneerng (ICCNCE 2013) Doman Thesaurus Constructon from Wkpeda * WenKe Yn 1, Mng Zhu 2, TanHao Chen 2 1 Department of Electronc Engneerng

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

A Resources Virtualization Approach Supporting Uniform Access to Heterogeneous Grid Resources 1

A Resources Virtualization Approach Supporting Uniform Access to Heterogeneous Grid Resources 1 A Resources Vrtualzaton Approach Supportng Unform Access to Heterogeneous Grd Resources 1 Cunhao Fang 1, Yaoxue Zhang 2, Song Cao 3 1 Tsnghua Natonal Labatory of Inforamaton Scence and Technology 2 Department

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD User Tweets based Genre Predcton and Move Recommendaton usng LSI and SVD Saksh Bansal, Chetna Gupta Department of CSE/IT Jaypee Insttute of Informaton Technology,sec-62 Noda, Inda sakshbansal76@gmal.com,

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Personalized Concept-Based Clustering of Search Engine Queries

Personalized Concept-Based Clustering of Search Engine Queries IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1
