Document Representation and Clustering with WordNet Based Similarity Rough Set Model

Size: px
Start display at page:

Download "Document Representation and Clustering with WordNet Based Similarity Rough Set Model"

Transcription

1 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): Document Representaton and Clusterng wth WordNet Based Smlarty Rough Set Model Nguyen Ch Thanh and Koch Yamada Department of Management and Informaton System Scence, Nagaoka Unversty of Technology, Nagaoka-sh, Japan Abstract Most studes on document clusterng tll date use Vector Space Model (VSM) to represent documents n the document space, where documents are denoted by a vector n a word vector space. The standard VSM does not take nto account the semantc relatedness between terms. Thus, terms wth some semantc smlarty are dealt wth n the same way as terms wth no semantc relatedness. Snce ths unconcern about semantcs reduces the qualty of clusterng results, many studes have proposed varous approaches to ntroduce knowledge of semantc relatedness nto VSM model. Those approaches gve better results than the standard VSM. However they stll have ther own ssues. We propose a new approach as a combnaton of two approaches, one of whch uses Rough Sets theory and cooccurrence of terms, and the other uses WordNet knowledge to solve these ssues. Experments for ts evaluaton show advantage of the proposed approach over the others. Keywords: document clusterng, document representaton, rough sets, text mnng.. Introducton Document clusterng s an mportant text mnng technque to generate useful nformaton from text collectons such as news artcles, research papers, books, dgtal lbrares, e- mal messages, and web pages. Text-based document clusterng attempts to group documents nto clusters where each cluster mght represent a topc that s dfferent from topcs of the other clusters. Document clusterng algorthms are dvded nto two categores n general: parttonal clusterng and herarchcal clusterng. Parttonal clusterng dvdes a document collecton nto groups n a sngle level, whle herarchcal clusterng creates a tree structure of documents. There are varous document clusterng methods proposed n recent years, ncludng herarchcal clusterng algorthms usng results from a k-way parttonal clusterng soluton [], sphercal k-means [2], bsectng k-means [3], frequent term meanng sequences based method [4], k-means wth Harmony Search Optmzaton [5]. Vector space model s a popular model for document representaton n document clusterng ncludng the above methods. Documents are represented by vectors of weghts, where each weght n a vector denotes mportance of a term n the document. In the standard VSM, however, semantc relatons between terms are not taken nto account. Two terms wth a close semantc relaton and two other terms wth no semantc relaton are both treated n the same way. Ths unconcern about semantcs could reduce qualty of the clusterng result. There are some approaches proposed to deal wth ths problem. Tolerance Rough Set model (TRSM) [6] and Smlarty Rough Set Model (SRSM) [7] extended the vector space model usng Rough Sets theory and cooccurrence of terms. TRSM and SRSM have been successfully appled to document clusterng. However, the results showed that SRSM had better performance than TRSM and some other conventonal methods [7]. There are other approaches that employ WordNet based semantc smlarty to enhance the performance of document clusterng [8, 9]. They modfed the VSM model by readustng term weghts n the document vectors based on ts relatonshps wth other terms co-occurrng n the document. SRSM and WordNet based methods performed better results than the standard VSM. However, they stll have ther own ssues as dscussed later. We propose a new method by combnng ther strength and reducng ther weakness. The new method uses both Rough Sets theory and WordNet based semantc smlarty to defne a new representaton model of documents. Expermental results show that t gves better clusterng results than the other methods dscussed n the paper. The paper s organzed by sx sectons. In Secton 2 and Secton 3 we dscuss SRSM and WordNet semantc smlarty based methods, respectvely. Secton 4 descrbes our proposed method. Secton 5 presents the results of our experments on document collectons. Fnally, Secton 6 concludes wth a summary and dscusson about future research.

2 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, September 20 ISSN (Onlne): Smlarty rough set model Smlarty Rough Set Model s a mathematcal model extended from Pawlak s Rough Set model [0] usng smlarty relaton nstead of equvalence relaton [7]. It s also an expanson from Tolerance Rough Set Model [6] wth a tolerance relaton. Equvalence, tolerance and smlarty relatons are bnary relatons that could be used to represent relatons between terms n document clusterng. An equvalence relaton must satsfy reflexve, symmetrc and transtve propertes, whle a tolerance relaton does not have to satsfy transtve one. A smlarty relaton must be reflexve, but not requred to be symmetrc and transtve [, 2]. TRSM based on a tolerance relaton was successfully appled to nformaton retreval and document clusterng n [6, 3, 4]. Recently, SRSM based on a smlarty relaton was proposed and appled to document clusterng by authors of ths paper [7]. It showed that SRSM produces better results than TRSM both n qualty and robustness, where co-occurrence of terms was used to obtan tolerance and smlarty relatons, respectvely. SRSM could be defned as follows: Let the par apr = (U, R) be an approxmaton space, where U s the unverse, and R U x U s a smlarty relaton on U. r(x): U 2 U s an uncertanty functon whch corresponds to the smlarty relaton R understood as yrx y r(x), whch mght represent that y s smlar to x. r(x) s a smlarty class of all obects that are consdered to have smlar nformaton to x. The functon r(x) satsfes reflexve property: x r(x), however t s not necessary symmetrc and transtve. Gven an arbtrary set X U, X can be characterzed by a par of lower and upper approxmatons as follows: apr( X ) x U r ( x) X, () apr( X ) r( x), (2) r x xx where ( ) denotes the nverse relaton of R, whch s the class of referent obects to whch x s smlar: r ( x) y U xry (3) We proposed a new model of document representaton for document clusterng usng the above generalzed rough set theory Smlarty Rough Set Model [7]. The new model s defned as follows. The unverse U of the approxmaton space (U, R) s the set of all terms T used n the document vectors. The bnary relaton R s defned by t Rt f t, t ). f ( t ), (4) D( D where f D (t, t ) s the number of documents n the document set D n whch term t and t co-occur, f D (t ) s the number of documents n D n whch term t occurs and s a parameter (0 < < ). The relaton R defned above s a smlarty relaton that satsfes only reflexvty. An uncertanty functon I (t ) correspondng to the smlarty relaton s defned as I (t ) = {t U t Rt }, (5) where I (t ) s a set of all terms smlar to t. The lower and upper approxmaton of any subset X T based on ths model can be obtaned usng equatons () and (2), where U and r are replaced by T and I, respectvely. I In ths case, ( t ) s the set of terms to whch t s smlar, and s defned as I ( t ) t f D ( t, t ). f In the document clusterng wth SRSM (referred to as SRSM later, whle the ordnary approach s referred to as VSM), we appled sphercal k-means algorthm [2] to term vectors that conssts of terms n upper approxmatons of ordnary document vectors (term sets). The usage of upper approxmaton could gve us better clusterng results, because two documents become smlar to each other, f one contans many terms smlar (n the sense of eq. (4)) to terms n the other even f the two documents do not have many common terms. Snce there are many synonyms n natural language n general and people use dfferent terms to represent a certan thng, the upper approxmaton would gve a postve effect on document clusterng. There would be another advantage of usng the upper approxmaton. The number of terms n a document s usually relatvely small n comparson wth the number of terms n a corpus. Therefore, the document vectors are usually hgh dmensonal and sparse. Hence, document smlarty measurements often yeld zero values, whch can lead to the poor clusterng results. Snce the proposed approach puts addtonal terms nto document vectors wthout ncreasng the dmenson, the unwelcome tendency mght be mtgated to some extent. We use tf df weghtng scheme to calculate the weghts of terms n upper approxmatons of the document vectors. The term weghtng method s extended to defne weghts of terms that are not contaned n documents but n the upper approxmatons. It ensures that such terms have a weght smaller than the weght of any other term n the document. The weght a of term t n the upper approxmaton of document d s then defned as follows. D ( t ) (6)

3 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 4, July 20 ISSN (Onlne): N f log f t d t ) N log (7) a f D( t ) mnt f ( ) \ h d wh t apr d d log N t ) 0 f t apr( d ) where f s the frequency of term n document, N s number of documents, d s a set of terms appearng n document, t h s the term wth the smallest weght n the document and w h s the orgnal weght of term t h n the document. Then normalzaton s appled to the upper approxmatons of document vectors. The cosne smlarty measure s used to calculate the smlarty between two vectors. The algorthm s descrbed as follows [7]:. Preprocessng (word stemmng, stopwords removal). 2. Create document vectors. 2.a. Obtan sets of terms appearng n documents. 2.b. Create document vectors usng tf df. 2.b. Generate smlarty classes of terms based on ther co-occurrences. 2.c. Create vectors of upper approxmatons of documents usng equaton (7) and then the vectors are normalzed. 3. Apply the clusterng algorthm 3.a. Start wth a random parttonng of the vectors of upper approxmatons of documents, namely C (0) = {C (0), C (0) 2,..., C (0) k }. Let c (0), c (0) (0) 2,..., c k denote the centrods of the gven parttonng wth the ndex of teraton t = 0. 3.b. For each document vector x, N, fnd the centrod closest n cosne smlarty to ts upper approxmaton apr ( x ). Then, compute the new parttonng C (t + ) based on the old centrods c (t), c (t) 2,..., c (t) k : C (t+) s the set of all document vectors whose upper approxmatons are closest to the centrod c (t). If the upper approxmaton of a document s closest to more than one centrod, then t s randomly assgned to one of the clusters. 3.c. Compute the new centrods: s apr( x ), ( t ) c s s, k, where c (t+) denotes the centrod or the mean of the upper approxmatons of documents n cluster C (t+). 3.d. If some stoppng crteron s met, then set C * = C (t + ) and set c * = c (t + ) for k, and ext. Otherwse, ncrement t by, and go to step 3.b above. In our mplementaton, the teraton stops when the centrods of the generated clusters are dentcal to those generated n the prevous teraton. In SRSM, we used co-occurrence of terms to calculate the semantc relaton between terms. The usage of cooccurrence gves us a mert that lets us defne smlarty relatons automatcally wthout any knowledge base. However, t mght also have a weakness that n some cases co-occurrence of terms does not necessarly mean they have a smlar meanng. In the case, terms that do not appear n a document nor smlar to any term n the document may be contaned n the upper approxmaton. 3. WordNet semantc smlarty based model WordNet s an electronc lexcal database of Englsh, avalable to researchers n computatonal lngustcs and natural language processng [5]. WordNet was developed and s beng mantaned by the Cogntve Scence Laboratory of Prnceton Unversty. In WordNet, a concept represents a meanng of a term. Terms whch have the same concept are grouped n a synset. Each synset has ts defnton (gloss) and lnks wth other synsets hgher or lower n the herarchy by dfferent types of semantc relatons. There are dfferent methods to compute semantc smlarty of terms usng WordNet, whch can be dvded nto four categores: path based, nformaton content based, gloss based and vector based methods. Path based methods use length of the path between concept nodes to calculate the smlarty relatedness [6, 7]. Informaton content based methods [8, 9] measure the relatedness of the two concepts usng the nformaton content of the most specfc shared parent. In gloss based methods [20, 2], glosses of concepts are used to determne the relatedness of concepts. In vector based methods [22, 23], the relatedness between terms are computed usng concept vectors derved from glosses. Recently, some studes used WordNet-based semantc smlarty to enhance performance of document clusterng [8, 9]. They modfed the VSM model by readustng weghts of terms n the documents. The basc dea s that a term s consdered more mportant f other terms semantcally related to t appear n the same document. They ncrease weght values of such terms wth the followng equaton: w ~ w sm( t, t ) w (8) t d

4 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, September 20 ISSN (Onlne): where w d, sm( t, t ) 2 s the orgnal weghts of term t n document s the semantc smlarty between the two terms calculated usng a WordNet based measure. They proposed mproved VSM model based on ths dea and showed that the clusterng performance based on the new model was better than that based on the VSM. The advantage of ths approach s the hgh relablty of smlarty gven by the WordNet. The basc dea behnd eq. (8) also seems adequate. A possble weak pont mght come from the general property of WordNet. Snce t s a general dctonary, t mght not work for documents n a specfc feld. Another s that t utlzes the knowledge of smlarty only to adust the mportance of terms n a document. It does not let us fnd smlarty between two documents where one contans many terms smlar to ones n the other but the two do not have many common terms. 4. WordNet based smlarty rough set model for document clusterng In document clusterng, the effect of semantc smlarty between terms s large, and must be taken nto account to enhance the performance of VSM. In SRSM, the semantc relaton between terms s calculated usng co-occurrence of terms. However, there seem cases when terms have hgh co-occurrence but have low semantc smlarty. WordNetbased approaches measure the relatedness of terms usng the lexcal database. Based on the ontology structure of terms or defntons of terms n WordNet, we can compute scores of semantc relatedness. However, as a general dctonary, WordNet does not cover all terms and term meanngs n every specfc subect. Moreover, n dfferent felds, the semantc relaton of terms may be dfferent. Our dea s to explot both approaches to get better clusterng results. In SRSM, we defned the smlarty class of terms usng the relaton R gven by eq. (4). Here, we propose a new relaton that ntegrates WordNet knowledge to elmnate terms havng no smlar meanng but a hgh frequency of co-occurrence. t Rt f t, t ). f ( t ) (( t not n WordNet) D( D ( t not n WordNet) sm( t, t ) > )), (9) where s a threshold value. The relaton defned by Eq. (9) s a smlarty relaton, because t s reflexve, non-symmetrc and non-transtve. The basc dea s that term t s smlar to t when t s smlar to t from the vewpont of co-occurrence and they are also smlar n the semantcs of WordNet. If t or t s not n WordNet, we use only the co-occurrence smlarty. Then we can defne a new representaton model based on ths relaton n the smlar way to the one n secton 2. Let the par apr = (U, R) be an approxmaton space, where U s the set of all ndex terms T n the same way as SRSM, and R U x U s a smlarty relaton on U. r(x): U 2 U s an uncertanty functon whch corresponds to the relaton R understood as yrx y r(x), whch mght represent that y s smlar to x. r(x) s a smlarty class of all obects that are consdered to have smlar nformaton to x. The functon r(x) satsfes reflexve property: x r(x), however t s not necessary symmetrc and transtve. Gven an arbtrary set X U, X can be characterzed by a par of lower and upper approxmatons as equatons () and (2). The bnary relaton R s a relaton that corresponds to an uncertanty functon defned by eq. (9). That s, I θ (t ) = {t U t Rt }. (0) R s a smlarty relaton because t only satsfes the propertes of reflexvty. In SRSM, we assgned weghts to terms that do not occur n the document but belong to smlarty classes of terms n the document, and do not change the weght values of terms n the document. In the new method we mprove the SRSM by readustng weght values of terms based on the dea of WordNet based methods. The weght a of term t n the upper approxmaton of document d s then defned as follows. N f log sm( t, tk ) a t ) tkd k N log a mn t ) thd a h log N t ) 0 k f t f t d apr( d f t apr( d ) ) \ d () The new proposed approach could be regarded as a combnaton of SRSM and WSSM (WordNet Semantc Smlarty based Model) whch ncorporate the advantages of both the models. WSSM s completely ncluded n the proposed approach because weghts of terms n a document are adusted usng eq. (8). In addton, t s an mproved verson of SRSM, because t calculates the upper approxmaton of the term set of a document and uses t as the document vector. The mprovements are the smlarty relaton (eq. (9)) used to calculate the upper approxmaton, and eq. () to readust the weghts of terms that are contaned n the document.

5 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 4, July 20 ISSN (Onlne): Expermental results In the experments, we use two test collectons to evaluate the proposed approach n comparson wth SRSM, WSSM and methods n CLUTO toolkt [24]. The algorthms provded n CLUTO toolkt are based on the parttonal, agglomeratve, and graph-parttonng paradgms. They are denoted as rb, rbr, drect, agglo, graph, bagglo. The rb s a repeated bsectng approach. The rbr s the same as the repeated bsectng method except that at the end the overall soluton s globally optmzed. The drect s a partton method whch uses an teratve refnement algorthm to optmze a global clusterng crteron functon. The agglo s an agglomeratve clusterng algorthm. The graph uses a nearest-neghbor graph to model documents, and then dvdes the graph nto k clusters usng a mn-cut graph parttonng algorthm. In the bagglo, agglomeraton process s used to cluster documents after the document collecton s splt nto N clusters usng the rb method. The frst test collecton s a classc data set obtaned by combnng CACM, CISI, CRANFIELD, and MEDLINE abstracts whch s avalable from [25]. The dataset ncludes abstracts of papers n dfferent felds. CACM contans 3204 abstracts from Communcatons of ACM, CISI contans 460 abstracts of nformaton scence papers, CRANDFIELD contans 400 abstract of aeronautcal papers, MEDLINE contans 033 abstracts of medcne papers. The clusterng algorthms are supposed to cluster the dataset contanng 7097 abstracts nto four groups. After preprocessng (stemmng, stop-words elmnaton, and hgh frequency word prunng), we have 377 terms n the document collecton. Wth the 377 terms we created 7097 document vectors usng tf df weghtng scheme, each document vector has 377 dmensons. We evaluate clusterng results obtaned by each algorthm wth three commonly-used measures: entropy, F measure and mutual nformaton [3, 26]. There are dfferent clusterng qualty measures renderng dfferent results. However, f a method performs better than the others on many of these measures then we could say that the method s better than the others. Entropy, F measure and mutual nformaton measures are external qualty measures whch evaluate the clusterng results by comparng the clusters produced by the algorthm to the known classes of documents. Wth the entropy measure method, the clusterng qualty s better f the entropy s smaller. Whle wth F measure and mutual nformaton method, the hgher the evaluated values are the better clusterng result s. We run the experments wth the proposed method, SRSM and WSSM. We also run the test collecton wth the CLUTO toolkt. The WordNet-based smlarty measure used n the experment s the Wu and Palmer measure [7], whch s a path-based method. It computes the relatedness of two concepts usng the lowest common subsumer of two concepts lcs(c, c 2 ) whch s the frst shared concept on the paths from the concepts to the root concept of the ontology herarchy. 2 depth( lcs) sm( c, c2) (2) l( c, lcs) l( c, lcs) 2 depth( lcs) 2 where l(c,lcs) s the length of the path between the two nodes and depth(lcs) s the number of nodes on the path from lcs to root. We ran the experment usng the proposed method wth the value of threshold = 0.3. Table shows the evaluaton of clusterng results from CLUTO toolkt s algorthms, Table 2 shows the evaluaton of clusterng results of SRSM and the newly proposed method wth dfferent values of parameter. The best evaluaton n each qualty measure s shaded n Table 2. As for WSSM, the evaluaton of the clusterng result was 0.363, 0.332, for entropy, mutual nformaton and F measure respectvely. As seen n these results, the best case s the clusterng by the proposed approach wth = 0.55 n all three evaluaton measures. Wth SRSM, co-occurrence of terms s used to determne the smlarty classes of terms. In the proposed method, we use both co-occurrence of terms and WordNet based semantc smlarty. The new approach, as suggested by the fgures n the column of "sze of smlarty classes" n Table 2, can remove rrelevant terms from smlarty classes. For example, wth the SRSM mplementaton n our experment, smlarty class of photon contans ntegraton and algebra, whch have low semantc relatedness wth photon tself. Wth the new method, ntegraton and algebra are removed from the smlarty class. Another example would be the term program, whch for SRSM s n the smlarty class of glossary, whle for the new approach t s not the case. The removal of rrelevant terms mproves the qualty of smlarty classes and could gve better clusterng results. Table : Clusterng results of the frst data set from CLUTO toolkt [7] CLUTO Method Entropy nformaton F measure Rb Rbr Drect Agglo Bagglo

6 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): Table 2: Evaluaton of clusterng results wth SRSM and the new method for the frst data set SRSM New method Entropy nformaton F measure Sze of smlarty Sze of smlarty classes Entropy Informaton F measure classes Max Avg Max Avg The maxmum, the mnmum and the average szes of smlarty classes of SRSM and the proposed method are shown n Table 2. Number of terms that are not n WordNet s 4753 among 377 terms. It s around 36%. We can see that szes of smlarty classes of the proposed method are smaller than those of SRSM. The dfference s the result of removng terms wth low WordNet based semantc relatedness from smlarty classes n the proposed method. As defned by eq. (9), a smlarty class of a term t conssts of terms that satsfy both the condton of co-occurrence wth t and one of the followng condtons: ) at least one of the two terms does not exst n WordNet database; 2) the WordNet based smlarty measure between the two terms s greater than a threshold value. For example, when = 0.55, the average number of smlarty classes defned only by the co-occurrence condton s 2.45 (SRSM), whle the one defned by eq. (9) s 2.07 (the proposed method), whch means that 0.38 terms n average are removed from smlarty classes of SRSM because they do not satsfy the above condton ) nor 2). Then, among the remanng 2.07 terms of smlarty classes of the proposed method, 0.88 terms satsfy the condton ) and.9 satsfy condton 2), n average. The contngency table of the best case of the proposed method s shown n Table 3. Precson and recall of CACM, CISI, CRANFIELD, and MEDLINE are 0.964, 0.797, 0.930, and 0.85, 0.952, 0.976, 0.983, respectvely. The computaton tme of the new method s almost same as the one of the SRSM method whch has the tme Table 3: Contngency table of the best case of the proposed method CACM CISI CRANFIELD MEDLINE Cluster Cluster Cluster Cluster complexty of O(MlogM) [7], where M s the number of terms n the text collecton. The dfference between the new and SRSM methods s the computaton of term semantc relatonshp based on WordNet. The computaton of semantc relatonshp s fast because we use a path based method and the maxmum depth of the word herarchy n WordNet s sxteen [9], a very small number n comparson wth number of terms n a text collecton. The second test collecton used n our experment s abstracts of papers from several IEEE ournals of several felds. We formed a collecton of 00 documents from IEEE Transactons on Knowledge and Data Engneerng (378 abstracts), IEEE Transactons on Bomedcal Engneerng (3 abstracts) and IEEE Transactons on Nanotechnology (32 abstracts). These categores of documents are denoted as KDE, BIO and NANO. We use the clusterng methods to cluster the data set nto three clusters. After removng stopwords and stemmng words, we have 5690 terms n the document collecton. Wth 5690 terms, the algorthm created 00 document vector usng tf df weghtng scheme, each document vector has 5690 dmensons. Table 4: Clusterng results of the second data set from CLUTO toolkt [7] CLUTO Method Entropy nformaton F measure rb rbr drect agglo graph bagglo

7 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 4, July 20 ISSN (Onlne): Table 5: Evaluaton of clusterng results wth SRSM and the new method for the second data set SRSM New method Entropy nformaton F measure Entropy nformaton F measure Table 4 shows the evaluaton of clusterng results from CLUTO toolkt s algorthms. Table 5 shows the evaluaton of clusterng results of SRSM and the newly proposed method wth dfferent values of parameter. The best evaluaton n each qualty measure values s shaded n Table 5. For the WordNet semantc smlarty based method, the evaluaton of the clusterng result was 0.363, 0.332, for entropy, mutual nformaton and F measure respectvely. The results show that clusterng results of the newly proposed method are better than those of the other methods n all three evaluaton measures. 6. Conclusons The vector space model s wdely used n the feld of document clusterng. It represents a document as a vector of terms. However, the smple VSM treats terms ndependent to each other and the semantc relatonshps between terms are not consdered. Therefore, t reduces the effectveness of document clusterng methods. SRSM method and WordNet semantc smlarty based method use the semantc relaton between terms to mprove the performance of document clusterng. However, these methods have ther own ssues as we dscussed n the prevous sectons. We proposed a new method that s a combnaton of SRSM and WordNet semantc smlarty based method to solve these ssues. Our experment results show that the qualty of the clusterng wth the proposed method s better than the ones wth SRSM and WordNet semantc smlarty based method. Its clusterng results are also better than results of other methods n the CLUTO toolkt. In addton to WordNet, Wkpeda and Wktonary are also promsng tools for semantc relatedness measurement and analyss [22]. In our future work, we wll explot these tools to further mprove document clusterng methods. References [] Y. Zhao and G. Karyps, Herarchcal clusterng algorthms for document datasets, Data Mnng and Knowledge Dscovery, 0 (2), pp. 4-68, [2] I.S. Dhllon and D.S. Modha, Concept decompostons for large sparse text data usng clusterng, Machne Learnng, 42 (-2), pp , 200. [3] M. Stenbach, G. Karyps and V. Kumar, A comparson of document clusterng technques, Proceedngs of the KDD Workshop on Text Mnng, [4] Y. L, S.M. Chung and J.D. Holt, Text document clusterng based on frequent word meanng sequences, Data and Knowledge Engneerng, 64 (), pp , [5] M. Mahdav and H. Abolhassan, Harmony K-means algorthm for document clusterng, Data Mnng and Knowledge Dscovery, pp. -22, [6] T.B. Ho and K. Funakosh, Informaton retreval usng rough sets, Journal of Japanese Socety for Artfcal Intellgence, 3 (3), pp , 997. [7] N.C. Thanh, K. Yamada and M. Unehara, A Smlarty Rough Set Model for document representaton and document clusterng, Journal of Advanced Computatonal Intellgence and Intellgent Informatcs, 5 (2), pp , 20. [8] W.K. Gad and M.S. Kamel, Enhancng text clusterng performance usng semantc smlarty, Lecture Notes n Busness Informaton Processng, 24 LNBIP, pp , [9] L. Jng, M.K. Ng and J.Z. Huang, Knowledge-based vector space model for text clusterng, Knowledge and Informaton Systems, 25 (), pp , 200. [0] Z. Pawlak, Rough sets, Int. J. of Informaton and Computer Scences, (5), pp , 982. [] R. Slownsk and D. Vanderpooten, Smlarty Relaton as a bass for rough approxmaton, Advances n Machne Intellgence and Soft Computng, Vol.4, pp. 7-33, 997. [2] R. Slownsk and D. Vanderpooten, A generalzed defnton of rough approxmatons based on smlarty, IEEE Trans. on Knowledge and Data Engneerng, 2 (2), pp , [3] T.B. Ho and N.B. Nguyen, Nonherarchcal document clusterng based on a tolerance rough set model, Internatonal Journal of Intellgent Systems, 7 (2), pp , [4] X.-J. Meng, Q.-C. Chen, and X.-L. Wang, A tolerance rough set based semantc clusterng method for web search results, Informaton Technology Journal, 8 (4), pp , [5] Prnceton Unversty, "About WordNet", WordNet, Prnceton Unversty. 200, [6] R. Rada, H. Ml, E. Bcknell and M. Blettner, Development and applcaton of a metrc on semantc nets, IEEE Transactons on Systems, Man and Cybernetcs, v (n), pp. 7-30, 989. [7] Z. Wu and M. Palmer, Verbs semantcs and lexcal selecton, Proceedngs of the 32nd annual meetng on Assocaton for Computatonal Lngustcs (ACL '94), pp , 994. [8] P. Resnk, Usng nformaton content to evaluate semantc smlarty, Proceedngs of the 4th Internatonal Jont Conference on Artfcal Intellgence, pp , 995.

8 IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, September 20 ISSN (Onlne): [9] J. J. Jang and D. W. Conrath. Semantc smlarty based on corpus statstcs and lexcal taxonomy, Proceedngs of the 0th Internatonal Conference on Research n Computatonal Lngustcs, Tape, Tawan, 997. [20] M. Lesk, Automatc sense dsambguaton usng machne readable dctonares: How to tell a pne cone from an ce cream cone, Proceedngs of the 5th Annual Internatonal Conference on Systems Documentaton, pp , 986. [2] S. Baneree and T. Pedersen, An adapted lesk algorthm for word sense dsambguaton usng WordNet, CICLng ' 02: Proceedngs of the Thrd Internatonal Conference on Computatonal Lngustcs and Intellgent Text Processng, pp , [22] T. Zesch and I. Gurevych, Wsdom of crowds versus wsdom of lngusts - Measurng the semantc relatedness of words, Natural Language Engneerng, 6 (), pp , 200. [23] S. Patwardhan and T. Pedersen, Usng WordNet Based Context Vectors to Estmate the Semantc Relatedness of Concepts, Proceedngs of the EACL 2006 Workshop Makng Sense of Sense - Brngng Computatonal Lngustcs and Psycholngustcs Together, pp. -8, Trento, Italy, [24] G. Karyps, CLUTO - A Clusterng Toolkt, 2003, [25] ftp://ftp.cs.cornell.edu/pub/smart [26] A. Strehl, J. Ghosh and R. Mooney, Impact of smlarty measures on web-page clusterng, Proceedngs of the 7th Natonal Conference on Artfcal Intellgence: Workshop of Artfcal Intellgence for Web search (AAAI 2000), Austn, TX, pp , July 2000.

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Online Text Mining System based on M2VSM

Online Text Mining System based on M2VSM FR-E2-1 SCIS & ISIS 2008 Onlne Text Mnng System based on M2VSM Yasufum Takama 1, Takash Okada 1, Toru Ishbash 2 1. Tokyo Metropoltan Unversty, 2. Tokyo Metropoltan Insttute of Technology 6-6 Asahgaoka,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Graph-based Clustering

Graph-based Clustering Graphbased Clusterng Transform the data nto a graph representaton ertces are the data ponts to be clustered Edges are eghted based on smlarty beteen data ponts Graph parttonng Þ Each connected component

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval LRD: Latent Relaton Dscovery for Vector Space Expanson and Informaton Retreval Techncal Report KMI-06-09 March, 006 Alexandre Gonçalves, Janhan Zhu, Dawe Song, Vctora Uren, Roberto Pacheco In Proc. of

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Classic Term Weighting Technique for Mining Web Content Outliers

Classic Term Weighting Technique for Mining Web Content Outliers Internatonal Conference on Computatonal Technques and Artfcal Intellgence (ICCTAI'2012) Penang, Malaysa Classc Term Weghtng Technque for Mnng Web Content Outlers W.R. Wan Zulkfel, N. Mustapha, and A. Mustapha

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm Internatonal Journal of Advancements n Research & Technology, Volume, Issue, July- ISS - on-splt Restraned Domnatng Set of an Interval Graph Usng an Algorthm ABSTRACT Dr.A.Sudhakaraah *, E. Gnana Deepka,

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1

A Clustering Algorithm for Chinese Adjectives and Nouns 1 Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

DOCUMENT clustering is a special version of data clustering

DOCUMENT clustering is a special version of data clustering INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2011, VOL. 57, NO. 3, PP. 271 277 Manuscrpt receved June 19, 2011; revsed September 2011. DOI: 10.2478/v10177-011-0036-5 Document Clusterng Concepts,

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

F Geometric Mean Graphs

F Geometric Mean Graphs Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 2 (December 2015), pp. 937-952 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) F Geometrc Mean Graphs A.

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Personalized Concept-Based Clustering of Search Engine Queries

Personalized Concept-Based Clustering of Search Engine Queries IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton

More information

From Comparing Clusterings to Combining Clusterings

From Comparing Clusterings to Combining Clusterings Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

(1) The control processes are too complex to analyze by conventional quantitative techniques.

(1) The control processes are too complex to analyze by conventional quantitative techniques. Chapter 0 Fuzzy Control and Fuzzy Expert Systems The fuzzy logc controller (FLC) s ntroduced n ths chapter. After ntroducng the archtecture of the FLC, we study ts components step by step and suggest a

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 97-735 Volume Issue 9 BoTechnology An Indan Journal FULL PAPER BTAIJ, (9), [333-3] Matlab mult-dmensonal model-based - 3 Chnese football assocaton super league

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Concept Forest: A New Ontology-assisted Text Document Similarity Measurement Method

Concept Forest: A New Ontology-assisted Text Document Similarity Measurement Method Concept Forest: A New Ontology-asssted Text Document Smlarty Measurement Method James Z. Wang Wllam Taylor School of Computng Clemson Unversty, Box 340974 Clemson, SC 29634-0974, USA +1-864-656-7678 {jzwang,

More information

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Information Retrieval

Information Retrieval Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are

More information

On-line Hot Topic Recommendation Using Tolerance Rough Set Based Topic Clustering

On-line Hot Topic Recommendation Using Tolerance Rough Set Based Topic Clustering JOURNAL OF COMPUTERS, VOL. 5, NO. 4, APRIL 2010 549 On-lne Hot Topc Recommendaton Usng Tolerance Rough Set Based Topc Clusterng Yonghu Wu, Yuxn Dng, Xaolong Wang, Jun Xu Intellgence Computng Research Center

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

A Method of Hot Topic Detection in Blogs Using N-gram Model

A Method of Hot Topic Detection in Blogs Using N-gram Model 84 JOURNAL OF SOFTWARE, VOL. 8, NO., JANUARY 203 A Method of Hot Topc Detecton n Blogs Usng N-gram Model Xaodong Wang College of Computer and Informaton Technology, Henan Normal Unversty, Xnxang, Chna

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS

HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS HCMX: AN EFFICIENT HYBRID CLUSTERING APPROACH FOR MULTI-VERSION XML DOCUMENTS VIJAY SONAWANE 1, D.RAJESWARA.RAO 2 1 Research Scholar, Department of CSE, K.L.Unversty, Green Felds, Guntur, Andhra Pradesh

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment

A Webpage Similarity Measure for Web Sessions Clustering Using Sequence Alignment A Webpage Smlarty Measure for Web Sessons Clusterng Usng Sequence Algnment Mozhgan Azmpour-Kv School of Engneerng and Scence Sharf Unversty of Technology, Internatonal Campus Ksh Island, Iran mogan_az@ksh.sharf.edu

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

An Internal Clustering Validation Index for Boolean Data

An Internal Clustering Validation Index for Boolean Data BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 6 Specal ssue wth selecton of extended papers from 6th Internatonal Conference on Logstc, Informatcs and Servce Scence

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Conditional Speculative Decimal Addition*

Conditional Speculative Decimal Addition* Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB

SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB SURFACE PROFILE EVALUATION BY FRACTAL DIMENSION AND STATISTIC TOOLS USING MATLAB V. Hotař, A. Hotař Techncal Unversty of Lberec, Department of Glass Producng Machnes and Robotcs, Department of Materal

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information