Using Wikipedia Anchor Text and Weighted Clustering Coefficient to Enhance the Traditional Multi-Document Summarization


Niraj Kumar, Kannan Srinathan, Vasudeva Varma. In 13th International Conference on Intelligent Text Processing and Computational Linguistics, Indian Institute of Technology Delhi, New Delhi, India. Report No: IIIT/TR/2012/-1. Centre for Search and Information Extraction Lab, International Institute of Information Technology, Hyderabad, INDIA. March 2012.

Niraj Kumar, Kannan Srinathan, Vasudeva Varma
IIIT-Hyderabad, Hyderabad, INDIA

Abstract. Similar to the traditional approach, we consider the task of summarization as the selection of top-ranked sentences from ranked sentence clusters. To achieve this goal, we rank the sentence clusters using the importance of words, calculated by applying the PageRank algorithm to a reverse directed word graph of sentences. Next, to rank the sentences in every cluster, we introduce the use of the weighted clustering coefficient, which we calculate from the PageRank scores of words. Finally, the most important issue is the presence of many noisy entries in the text, which degrades the performance of most text mining algorithms. To solve this problem, we introduce a Wikipedia anchor text based phrase mapping scheme. Our experimental results on the DUC-2002 and DUC-2004 datasets show that our system performs better than unsupervised systems and better than or comparable with novel supervised systems in this area.

Keywords: Multi-document summarization, sentence clusters, weighted clustering coefficient, PageRank, Wikipedia anchor text.

1 Introduction

Generic summaries reflect the main topics of a document without any additional clues or prior knowledge. According to [5], generic summaries outperform (1) query-based and (2) hybrid summaries in browsing tasks, so the document context of generic summaries helps users in browsing. These days digital libraries, the internet, etc. contain a huge amount of text resources, such as text articles, web pages, news documents and educational materials. All of these contain a huge amount of information, and we have little time to go through it. Notably, such documents do not always come with human-supplied summaries. We believe that an unsupervised approach that generates an extract summary using limited linguistic resources is essential, because it gives quick access to large quantities of such information. Finally, the use of learning/training based systems makes us dependent on a corpus or dataset. That is why we focus on developing an unsupervised generic multi-document summarization system that can generate a high-quality extract summary without heavy linguistic resources or learning/training.

1.1 Related Work

Many methods have been proposed for multi-document summarization. The most frequently used techniques are sentence vector representations (where each row represents a sentence and each column represents a term) and graph-based methods (where each node is a sentence and each edge represents the pairwise relationship between the corresponding sentences). All these methods rank the sentences according to scores calculated from a set of predefined features, such as term frequency inverse sentence frequency (TF-ISF) [16], [14], sentence or term position [20], and number of keywords [20]. Some state-of-the-art methods and their key features are: centroid-based methods (e.g., MEAD [16]), graph-ranking based methods (e.g., LexPageRank [10]), non-negative matrix factorization (NMF) based methods (e.g., [11]), conditional random field (CRF) based summarization [18], and LSA based methods [11].

1.2 Problem Setup and Motivation

In this section we present some basic issues in traditional multi-document summarization and the motivation behind the techniques we use to address them.

Using Wikipedia anchor texts and document titles to handle noisy terms: The presence of noisy words in documents generally reduces the performance of most summarization algorithms, because noisy words often receive a good score from linguistic, statistical or graph-theoretical scoring systems. The use of TF-IDF (term frequency and inverse document frequency), WordNet, etc. yields some improvement, but more is still needed. To solve this issue, we use Wikipedia anchor text and the titles of documents to identify the informative terms in the given documents. The anchor texts in Wikipedia have great semantic value, i.e. they provide alternative names, morphological variations and related phrases for the target article. This step has two benefits: (1) it reduces the chance that noisy words receive high importance and (2) it improves the performance of the overall system.

Using PageRank score on reverse directed word graph of sentences to rank the sentence clusters: The use of sentence clusters in multi-document summarization is not new. We use GAAC (group average agglomerative clustering algorithm) to cluster the sentences. To rank the identified sentence clusters, we use the PageRank score of words, calculated on a reverse directed word graph of sentences. This scheme helps in effective ranking of words through voting. In general writing behaviour, we describe a term after writing it. The PageRank score on the reverse directed word graph of sentences captures this effectively.

Use of weighted clustering coefficient: The weighted clustering coefficient helps us identify the strength of a node's ties with strong nodes. Before going into detail, we first describe the clustering coefficient and then the need for a weighted clustering coefficient.

The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. There are two types of clustering coefficient:
a) Global clustering coefficient: it is designed to give an overall indication of the clustering in the network.
b) Local clustering coefficient: it indicates the embeddedness of a single node.

We use the notion of the local clustering coefficient. It can be defined as follows.

a) In an undirected network, the local clustering coefficient $C_{V_i}$ of a node $V_i$ can be defined as:

$$C_{V_i} = \frac{2\, e_{V_i}}{K_{V_i}\,(K_{V_i} - 1)} \tag{1}$$

where $K_{V_i}$ = number of neighbours (degree) of $V_i$ and $e_{V_i}$ = number of connected pairs among all neighbours of $V_i$.

b) In a directed network, the local clustering coefficient $C_{V_i}$ of a node $V_i$ can be defined as:

$$C_{V_i} = \frac{e_{V_i}}{K_{V_i}\,(K_{V_i} - 1)} \tag{2}$$

Main aim behind the use of weighted clustering coefficients: We believe that each word in a document may have a different level of importance (beyond what is captured by the degree of its node in the graph), and we cannot ignore this fact. The unweighted clustering coefficient obtained from the word graph of sentences helps us identify how strongly a word is embedded among other words in the graph. Using the importance of words in the clustering coefficient (i.e. the weighted clustering coefficient) instead helps us identify how strongly a word is embedded among other important words in the graph. This mirrors general social networking behaviour, where the strength or status of any node or person depends upon (1) the strength of that person/node and (2) the strength of its ties with strong friends. By using the PageRank score of words in the calculation of the weighted clustering coefficient, we try to capture both levels of strength. Our system uses the weighted clustering coefficient score of words to calculate the importance of sentences in a sentence cluster. The improvements in the quality of results also support our view (see sub-section 4.2 for results).

2 Framework and Algorithm

2.1 Input Cleaning

Our input cleaning task includes: (1) removal of noisy entries from the entire document collection and (2) sentence filtration. Finally, we stem the entire text using the Porter stemming algorithm.
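Before moving on to the framework steps, the unweighted local clustering coefficient of Eq-1 (undirected case), on which the weighted variant of sub-section 2.5 builds, can be illustrated with a minimal sketch. The function name `local_clustering`, the adjacency-dictionary representation and the toy graph are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Unweighted local clustering coefficient of node v (Eq-1, undirected case).

    adj maps every node to the set of its neighbours.
    """
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no neighbour pair can be connected
    # e_v: number of neighbour pairs that are themselves linked
    e_v = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * e_v / (k * (k - 1))


# Toy graph (illustrative): a triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for node in sorted(adj):
    print(node, local_clustering(adj, node))
```

Node `b` has two neighbours that are linked to each other, so its coefficient is 1, while node `a` has three neighbours of which only one pair is linked, giving 1/3.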

2.2 Calculation of Importance of Words

The calculation of the importance of words is very important, as we use it to calculate the importance of the identified sentence clusters in the next step. To calculate the importance of all distinct words of the given collection, we concatenate all the documents of the collection into a single file. Next, we calculate the PageRank score of every word on the reverse directed word graph of sentences. The preparation of the reverse directed word graph of sentences and the calculation of PageRank are described below.

Preparing reverse directed word graph of sentences: Suppose we have a set of sentences $S = \{S_1, S_2, \ldots, S_n\}$ from the given collection. To prepare the reverse directed word graph of sentences, we add a reverse directed link for every adjacent word pair of every sentence in the set (see Figure-1). We denote $G = (V, E)$ as a directed graph, where $V = \{V_1, V_2, \ldots, V_n\}$ denotes the vertex set and the link set $E$ contains $(V_j, V_i)$ if there is a link from $V_j$ to $V_i$.

Figure-1: Reverse directed word graph of sentences. Here S1, S2 and S3 represent the sentences of the document and a, b, c, d, e, f, g, h and i represent the distinct words.

Calculating PageRank score: For any given vertex $V_i$, let $IN(V_i)$ be the set of vertices that point to it (predecessors), and let $OUT(V_i)$ be the set of vertices that vertex $V_i$ points to (successors). Then the PageRank score of vertex $V_i$ can be defined as [3]:

$$S(V_i) = \frac{1 - \lambda}{N} + \lambda \sum_{j \in IN(V_i)} \frac{S(V_j)}{|OUT(V_j)|} \tag{3}$$

where:
$S(V_i)$ = rank/score of word/vertex $V_i$.
$S(V_j)$ = rank/score of word/vertex $V_j$, from which an incoming link comes to word/vertex $V_i$.
$N$ = number of words/vertices in the word graph of sentences.
$\lambda$ = damping factor (we use a fixed value of 0.85, as used in [3]).
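The graph construction and the PageRank iteration of Eq-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "reverse directed" is read here as an edge from the later word to the earlier word of each adjacent pair, the fixed iteration count is an arbitrary choice, and the function names and toy sentences are hypothetical.

```python
from collections import defaultdict


def reverse_word_graph(sentences):
    """Build out-links with an edge w[k+1] -> w[k] for every adjacent word pair."""
    nodes, out_links = set(), defaultdict(set)
    for words in sentences:
        nodes.update(words)
        for left, right in zip(words, words[1:]):
            if left != right:
                out_links[right].add(left)  # reversed: the later word points back
    return nodes, out_links


def pagerank(nodes, out_links, damping=0.85, iterations=50):
    """Iterate Eq-3; words without out-links simply cast no votes (assumption)."""
    n = len(nodes)
    in_links = defaultdict(set)
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].add(src)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        score = {
            v: (1.0 - damping) / n
            + damping * sum(score[u] / len(out_links[u]) for u in in_links[v])
            for v in nodes
        }
    return score


# Toy input (illustrative): two already tokenised and stemmed sentences.
sentences = [["a", "b", "c"], ["b", "c", "d"]]
nodes, out_links = reverse_word_graph(sentences)
scores = pagerank(nodes, out_links)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 4))
```

With the edges reversed, words that are introduced early and then followed by other words collect votes from the words written after them, which is the behaviour sub-section 1.2 describes.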

2.3 Preparing Sentence Clusters and Ranking

To identify the topics covered in the documents we use the group average agglomerative clustering scheme (GAAC). In our case a topic is considered to be a set of sentences related to the same concept. There are three major agglomerative clustering algorithms: single-link, complete-link, and average-link clustering. Single-link clustering can lead to elongated clusters. Complete-link clustering is strongly affected by outliers. Average-link clustering is a compromise between the two extremes and generally avoids both problems. This is the main reason for using the group average agglomerative clustering algorithm to cluster the sentences. GAAC uses the average similarity across all pairs within the merged cluster to measure the similarity of two clusters. In this scheme the average similarity between two clusters (say, $c_i$ and $c_j$) can be computed as:

$$sim(c_i, c_j) = \frac{1}{|c_i \cup c_j|\,(|c_i \cup c_j| - 1)} \sum_{x \in (c_i \cup c_j)} \; \sum_{y \in (c_i \cup c_j):\, y \neq x} sim(x, y) \tag{4}$$

where $sim(x, y)$ = count of co-occurring words in $x$ and $y$.

To apply GAAC to sentences we use a sentence vector representation of the documents of the entire collection. Here, each row represents a sentence and each column represents a term. In the entire evaluation, we use the threshold 0.4.

Calculating importance of sentence clusters or topics: To calculate the weighted importance of any sentence cluster or topic, we calculate the sum of the weighted importance of all words in the given sentence cluster:

$$W_C = \sum W_{wd} \tag{5}$$

where $W_C$ = weight of the given sentence cluster $C$ and $\sum W_{wd}$ = sum of the weights of all words in the given sentence cluster (see sub-section 2.2, Eq-3, for the weight of words).

Next, we calculate the percentage of weighted information of every identified sentence cluster. The percentage weighted importance of any identified sentence cluster can be calculated as:

$$\%W_C = \frac{W_C}{\sum W_C} \times 100 \tag{6}$$

where:
$\%W_C$ = percentage weight of the given sentence cluster $C$.
$\sum W_C$ = sum of the weights of all identified sentence clusters.
$W_C$ = weight of the given sentence cluster $C$.
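The group-average similarity of Eq-4 and the cluster weighting of Eq-5 and Eq-6 can be sketched as below. This is an illustrative sketch only: a sentence is assumed to be a list of stemmed words, a cluster a list of sentences, and each distinct word of a cluster is assumed to be counted once in Eq-5 (the text does not say whether repeated occurrences count). How the 0.4 threshold is applied during merging is not specified, so the merge loop itself is left out. All names and toy values are hypothetical.

```python
def sim(x, y):
    """Count of words co-occurring in sentences x and y."""
    return len(set(x) & set(y))


def group_average_sim(ci, cj):
    """Eq-4: average pairwise similarity over all ordered pairs in the merged cluster."""
    merged = list(ci) + list(cj)
    m = len(merged)
    if m < 2:
        return 0.0
    total = sum(
        sim(x, y)
        for a, x in enumerate(merged)
        for b, y in enumerate(merged)
        if a != b
    )
    return total / (m * (m - 1))


def cluster_percentages(clusters, word_score):
    """Eq-5 and Eq-6: cluster weight as summed word scores, then percentage share."""
    weights = []
    for cluster in clusters:
        words = {w for sentence in cluster for w in sentence}  # distinct words (assumption)
        weights.append(sum(word_score.get(w, 0.0) for w in words))
    total = sum(weights)
    return [100.0 * w / total if total else 0.0 for w in weights]


# Toy data (illustrative): word scores would come from the PageRank step of 2.2.
c1 = [["wiki", "anchor", "text"], ["anchor", "text", "map"]]
c2 = [["page", "rank", "score"]]
word_score = {"wiki": 0.1, "anchor": 0.3, "text": 0.2, "map": 0.1,
              "page": 0.1, "rank": 0.1, "score": 0.1}
print(group_average_sim(c1, c2))
print(cluster_percentages([c1, c2], word_score))
```

Sorting the clusters by their percentage weight then gives the cluster ranking used later when the summary is assembled.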

2.4 Mapping Phrases by using Wikipedia Anchor Text

We use Wikipedia anchor text to identify the informative terms in every identified sentence cluster. For this, we first fix the phrase boundary. Following the scheme defined in [2], we treat stopwords and punctuation marks as phrase boundaries. Next, we stem the entire anchor text collection and find the longest matching Wikipedia anchor text sequence in every word sequence within a phrase boundary. We repeat this process for every word sequence inside the predefined phrase boundaries. We also find the words that match the titles of the entire collection. We remove the rest of the words from every sentence. Thus every sentence in the collection contains only a sequence of Wikipedia anchor texts or words from the titles of the collection. We use this mapping of phrases in the calculation of weighted clustering coefficients.

2.5 Calculating Weighted Clustering Coefficient

After step 2.4, the sentences of every identified sentence cluster contain only sequences of Wikipedia anchor text words or words from the titles of documents. Now, we calculate the weighted clustering coefficient of all such words in every sentence cluster. For this we create an undirected word graph of sentences. The sparse nature of the word graph of sentences is the main reason for selecting an undirected graph for the calculation of the weighted clustering coefficient. The process to calculate the weighted clustering coefficient of every distinct word of a given sentence cluster is given below.

Preparing word graph of sentences: We treat every distinct word as a node of the graph and prepare the undirected word graph of sentences by adding an undirected edge for every adjacent word pair.

Figure-2: Undirected word graph of sentences. Here S1, S2 and S3 denote the sentences and A, B, C, D and N denote the words that are common to Wikipedia anchor text or the titles of documents.
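The longest-match phrase mapping of sub-section 2.4 and the undirected word graph just described can be sketched together. This is a simplified sketch under several assumptions: tokens are already lowercased and stemmed, anchor texts are stored as a set of word tuples, any non-alphanumeric token counts as punctuation, and `max_len` caps the phrase length tried. The names `map_phrases`, `undirected_word_graph` and all toy data are hypothetical.

```python
from collections import defaultdict


def is_boundary(token, stopwords):
    """Stopwords and punctuation tokens delimit phrases."""
    return token in stopwords or not token.isalnum()


def map_phrases(tokens, anchors, title_words, stopwords, max_len=5):
    """Keep only words covered by the longest matching anchor text, or title words."""
    kept, i = [], 0
    while i < len(tokens):
        if is_boundary(tokens[i], stopwords):
            i += 1
            continue
        end = i  # find the end of the current boundary-delimited word sequence
        while end < len(tokens) and not is_boundary(tokens[end], stopwords):
            end += 1
        matched = 0
        for length in range(min(max_len, end - i), 0, -1):  # longest match first
            if tuple(tokens[i:i + length]) in anchors:
                matched = length
                break
        if matched:
            kept.extend(tokens[i:i + matched])
            i += matched
        elif tokens[i] in title_words:
            kept.append(tokens[i])
            i += 1
        else:
            i += 1  # neither anchor text nor title word: treated as noise
    return kept


def undirected_word_graph(sentences):
    """Add an undirected edge for every adjacent word pair of every sentence."""
    adj = defaultdict(set)
    for words in sentences:
        for a, b in zip(words, words[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


# Toy data (illustrative only).
stopwords = {"the", "of"}
anchors = {("page", "rank"), ("multi", "document"), ("multi", "document", "summarization")}
title_words = {"score"}
tokens = ["the", "page", "rank", "score", "of", "noisy", "words", ",",
          "multi", "document", "summarization"]
mapped = map_phrases(tokens, anchors, title_words, stopwords)
print(mapped)
adj = undirected_word_graph([mapped])
print(dict(adj))
```

Trying the longest candidate first means that "multi document summarization" is kept as one match instead of stopping at the shorter anchor "multi document".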
Graph theoretical notation: We denote $G = (V, E)$ as an undirected word graph of sentences, where $V = \{V_1, V_2, \ldots, V_n\}$ denotes the vertex set and the link set $E$ contains $(V_j, V_i)$ if there is a link between $V_j$ and $V_i$.

Calculating link weight: We use the PageRank score of words (see sub-section 2.2 for the calculation of the weight of every word) in the calculation of link weight. The link weight of any edge $E(V_i, V_j)$ can be calculated as:

$$W(V_i, V_j) = L_c(V_i, V_j) \times \frac{\dfrac{Score(V_i)}{Degree(V_i)} + \dfrac{Score(V_j)}{Degree(V_j)}}{2} \tag{7}$$

where:
$W(V_i, V_j)$ = link weight of the link between nodes $V_i$ and $V_j$

$Score(V_i)$ = PageRank score of node (word) $V_i$
$Score(V_j)$ = PageRank score of node (word) $V_j$
$Degree(V_i)$ = degree of node (word) $V_i$
$Degree(V_j)$ = degree of node (word) $V_j$
$L_c(V_i, V_j)$ = count of the number of links between nodes $V_i$ and $V_j$

By using this scheme, we calculate the link weight of every edge of the graph.

Calculating weighted clustering coefficient: We use the link weights calculated from the PageRank scores in the calculation of the weighted clustering coefficient. In this vein, we maintain the properties of unweighted clustering coefficients on undirected graphs (as described in [4]). The value of the weighted clustering coefficient of any node satisfies $\tilde{C} \in [0, 1]$. In the unweighted case, the number of triangles at a node determines its clustering property. In the weighted case, clustering should be determined by some weighted characteristic of the triangles. For each triangle, all three edges should be taken into account. For each triangle, the weighted characteristic should be invariant to permutation of the weights. When any edge weight of a triangle approaches zero, the weighted characteristic of that triangle should likewise approach zero. When vertex $V_i$ participates in the maximum number $\frac{1}{2} K_{V_i}(K_{V_i} - 1)$ of triangles, where each edge weight is maximal, the weighted clustering coefficient should also be maximal, i.e. $\tilde{C}_{V_i} = 1$. To obtain the weighted clustering coefficient, [4] replaces $e_{V_i}$ (see Eq-1) by the sum of triangle intensities. The weighted clustering coefficient of any node $V_i$ can then be defined as:

$$\tilde{C}_{V_i} = \frac{2}{K_{V_i}\,(K_{V_i} - 1)} \sum_{j,k} \left( \tilde{W}(V_i, V_j)\, \tilde{W}(V_j, V_k)\, \tilde{W}(V_k, V_i) \right)^{1/3} \tag{8}$$

$$\tilde{W}(V_i, V_j) = \frac{W(V_i, V_j)}{\max(W)} \tag{9}$$

where the sum in Eq-8 runs over the triangles $(V_i, V_j, V_k)$ in which $V_i$ participates, $W(V_i, V_j)$ is the link weight of the link between nodes $V_i$ and $V_j$ (see Eq-7), and $\max(W)$ is the maximum of all edge weights in the given graph. The normalization used in Eq-9 and the use of the sum of triangle intensities fulfil the conditions given in [4].

2.6 Ranking Sentences inside Every Sentence Cluster

To rank the sentences in every sentence cluster, we use the weighted clustering coefficient of the words in the sentences. We add the weighted clustering coefficient scores of the words to calculate the weight of a sentence.
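The chain from link weights to sentence weights can be sketched as follows. Several assumptions are made and should be read as such: the link weight is taken to be the link count times the average of the two endpoints' score-per-degree values (one possible reading of Eq-7), the degree is the number of distinct neighbours, and a sentence weight sums the coefficient of each distinct word once. Function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def link_weights(sentences, score):
    """Assumed Eq-7 form: count * (score_a/degree_a + score_b/degree_b) / 2."""
    link_count, adj = Counter(), defaultdict(set)
    for words in sentences:
        for a, b in zip(words, words[1:]):
            if a != b:
                link_count[frozenset((a, b))] += 1
                adj[a].add(b)
                adj[b].add(a)
    weights = {}
    for edge, count in link_count.items():
        a, b = tuple(edge)
        weights[edge] = count * (score[a] / len(adj[a]) + score[b] / len(adj[b])) / 2.0
    return adj, weights


def weighted_clustering(adj, weights):
    """Eq-8 and Eq-9: sum of triangle intensities over max-normalised link weights."""
    if not weights:
        return {v: 0.0 for v in adj}
    w_max = max(weights.values())
    norm = {edge: w / w_max for edge, w in weights.items()}
    wcc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        intensity = 0.0
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:  # v, a and b form a triangle
                product = (norm[frozenset((v, a))] * norm[frozenset((a, b))]
                           * norm[frozenset((b, v))])
                intensity += product ** (1.0 / 3.0)
        wcc[v] = 2.0 * intensity / (k * (k - 1)) if k > 1 else 0.0
    return wcc


def sentence_weight(words, wcc):
    """Sum of weighted clustering coefficients of the distinct words of a sentence."""
    return sum(wcc.get(w, 0.0) for w in set(words))


# Toy cluster (illustrative): equal word scores stand in for PageRank scores.
sentences = [["a", "b", "c"], ["c", "a", "d"]]
score = {w: 0.25 for w in "abcd"}
adj, weights = link_weights(sentences, score)
wcc = weighted_clustering(adj, weights)
ranked = sorted(sentences, key=lambda s: sentence_weight(s, wcc), reverse=True)
print(wcc)
print(ranked)
```

In the toy cluster the word `d` takes part in no triangle and contributes nothing, so the sentence containing `b` outranks the one containing `d`.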
We finally rank the sentences in descending order of their weight. The scheme to calculate the weight of a sentence can be given as:

$$Wt(S_r) = \sum W_{WCC} \tag{10}$$

where $Wt(S_r)$ = weight of sentence $S_r$ in the given sentence cluster, and $\sum W_{WCC}$ = sum of the weights of all words (nodes/vertices) that exist in the given sentence $S_r$, obtained by using the weighted clustering coefficient (see sub-section 2.5, Eq-8). Next, we rank the sentences of the given sentence cluster in descending order of their weight.

2.7 Generating Extract Summary

To generate the extract summary, we select the single top-ranked sentence from every identified sentence cluster and arrange the selected sentences according to the rank of their parent sentence cluster (see sub-section 2.3 for the ranking of identified sentence clusters). If the number of sentence clusters is small, we use the percentage weight of every sentence cluster to fix the number of top sentences to be extracted from each sentence cluster. To calculate the percentage weight/importance of any given sentence cluster $C$ we use the following scheme:

$$\%W_C = \frac{W_C}{\sum W_C} \times 100 \tag{11}$$

where:
$\%W_C$ = percentage weight of the given sentence cluster $C$.
$\sum W_C$ = sum of the weighted importance of all identified sentence clusters.
$W_C$ = weight of the given sentence cluster $C$ (see sub-section 2.3 for the weight of any given sentence cluster).

The count of sentences to be extracted from sentence cluster $C$ is then the nearest higher integer value of $(\%W_C / 100) \times$ (total number of required sentences).

NOTE: If the length of a sentence is more than 40 words, we discard it and pick the next highest-ranked sentence from the same sentence cluster.

3 Pseudo Code

INPUT: ASCII text document.
OUTPUT: Required number of extracted sentences as summary. We truncate the final output to meet the required number of words.
ALGORITHM:

Step 1. Apply input cleaning (see sub-section 2.1).
Step 2. Calculate the importance/weight of every distinct word of the entire text collection (see sub-section 2.2).
Step 3. Identify all sentence clusters from the given collection and rank every identified sentence cluster in descending order of importance/score (see sub-section 2.3).
Step 4. Use Wikipedia anchor text and words from the titles of the document collection to identify the informative words in every identified sentence cluster (see sub-section 2.4).
Step 5. Calculate the weighted clustering coefficients of the informative words of every identified sentence cluster (see sub-section 2.5).
Step 6. Use the weighted clustering coefficient of informative words to rank the sentences in descending order of their weight, in every identified sentence cluster (see sub-section 2.6).
Step 7. Apply the sentence extraction scheme to produce the required number of sentences (see sub-section 2.7).

4 Evaluation

We have done two different experiments. In the first experiment we compare our devised system with state-of-the-art supervised and unsupervised systems. In the second experiment, we test the effect of the weighted clustering coefficient. The details of the dataset, evaluation metrics and results are given below.

Details of dataset: We use the DUC2002 and DUC2004 data sets to evaluate our devised system. The DUC datasets are open benchmark data sets from the Document Understanding Conference (DUC) for generic automatic summarization. Table-1 gives a brief description of the datasets.

Table-1: Details of the DUC-2002 and DUC-2004 datasets

| | DUC2002 | DUC2004 |
|---|---|---|
| number of document collections | | |
| number of documents in each collection | | |
| data source | TREC | TDT |
| summary length | 200 words | 665 bytes |

Evaluation metric: We use the ROUGE toolkit (version 1.5.5) to measure the summarization performance. To properly evaluate the summary we use ROUGE-1, ROUGE-2, ROUGE-SU and ROUGE-L based measures. The rest of the details and the package are available at [13].

4.1 Experiment-1

In this experiment we empirically compare our devised system's results with the published results of [6].
The systems used in the experimental evaluation of [6] are described below.

Systems used in evaluation: We use the published results of the following widely used document summarization methods as the baseline systems to compare with our devised system.
(1) Random: The method selects sentences randomly for each document collection.
(2) Centroid: The method applies the MEAD algorithm [16] to extract sentences according to the following three parameters: centroid value, positional value, and first-sentence overlap.
(3) LexPageRank: The method first constructs a sentence connectivity graph based on cosine similarity and then selects important sentences based on the concept of eigenvector centrality [10].
(4) LSA: The method performs latent semantic analysis on the terms-by-sentences matrix to select sentences having the greatest combined weights across all important topics [11].
(5) NMF: The method performs non-negative matrix factorization (NMF) on the terms-by-sentences matrix and then ranks the sentences by their weighted scores [12].
(6) KM: The method applies the K-means algorithm to the terms-by-sentences matrix to cluster the sentences and then chooses the centroids for each sentence cluster.
(7) FGB: The FGB method is proposed in [19].
(8) BSTM: The published results of the BSTM method [6].

Results: Results are given in Table-2 and Table-3. Table-2 contains the evaluation results on the DUC-2002 dataset and Table-3 contains the evaluation results on the DUC-2004 dataset. The highest evaluation score for every ROUGE evaluation metric is presented in bold font. From the experimental results (as given in Table-2 and Table-3), it is clear that our devised system performs better than all unsupervised systems and better than or comparable with a supervised system like BSTM [6].

Table-2: Evaluation results on DUC-2002 dataset
Columns: Systems, ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-SU
Rows: DUC Best, Random, Centroid, LexPageRank, LSA, NMF, KM, FGB, BSTM, Our System
[scores not preserved in this transcription]

Table-3: Evaluation results on DUC-2004 dataset
Columns: Systems, ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-SU
Rows: DUC Best, Random, Centroid, LexPageRank, LSA, NMF, KM, FGB, BSTM, Our System
[scores not preserved in this transcription]
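To make the ROUGE-1 and ROUGE-2 columns of the tables above more concrete, the following sketch computes a plain ROUGE-N recall: the number of reference n-grams also found in the candidate summary (with clipped counts), divided by the total number of reference n-grams. It is a simplified illustration and not the ROUGE 1.5.5 toolkit used in the evaluation; options such as stemming, stopword removal and length truncation are ignored, and the function names and example sentences are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram matches over all references / total reference n-grams."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
    return matched / total if total else 0.0


# Toy example (illustrative only).
candidate = "the cat sat on the mat".split()
references = ["the cat was on the mat".split()]
r1 = rouge_n_recall(candidate, references, n=1)
r2 = rouge_n_recall(candidate, references, n=2)
print(round(r1, 4), round(r2, 4))
```

ROUGE-L and ROUGE-SU rely on longest common subsequences and skip-bigrams respectively and are not covered by this sketch.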

4.2 Experiment-2

We use this experiment to justify the use of the weighted clustering coefficient for ranking the sentences in every identified sentence cluster. For this we make a simple change: we use the unweighted clustering coefficient given in Eq-1 in place of Eq-8 (see sub-section 2.5) and run the entire system. The comparative results (i.e. with the weighted clustering coefficient and with the unweighted clustering coefficient) on the DUC-2002 and DUC-2004 datasets are given in Figure-3 and Figure-4 respectively. The results given in Figure-3 and Figure-4 clearly indicate the benefits of using the weighted clustering coefficient.

Figure-3: Experiments using DUC-2002 dataset

Figure-4: Experiments using DUC-2004 dataset

5 Conclusion and Future Work

In this paper we introduce the use of Wikipedia anchor text and the weighted clustering coefficient for multi-document summarization. Additionally, we limit the use of linguistic resources to stopwords, stemmers and punctuation marks. The experimental results show that our devised system performs better than unsupervised systems and better than or comparable with supervised systems in this area. As future work, we plan to use the relations between Wikipedia anchor texts to improve summary quality. We believe that such relations can improve the weighted clustering coefficient score of informative terms and hence may improve the summary quality.

References

1. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. In Advances in Neural Information Processing Systems.
2. Niraj Kumar and Kannan Srinathan. Automatic keyphrase extraction from scientific documents using N-gram filtration technique. Proceeding of the eighth ACM symposium on Document engineering, DocEng '08. Sao Paulo, Brazil.
3. L. Page, S. Brin, R. Motwani and T. Winograd. The PageRank citation ranking: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
4. Jari Saramaki, Jukka-Pekka Onnela, Janos Kertesz and Kimmo Kaski. Characterizing Motifs in Weighted Complex Networks.
5. Daniel M. McDonald and Hsinchun Chen. Summary in context: searching versus browsing. ACM Transactions on Information Systems, vol. 24, no. 1, January 2006.
6. Dingding Wang, Shenghuo Zhu, Tao Li, Yihong Gong. Multi-Document Summarization using Sentence-based Topic Models. Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Suntec, Singapore, 4 August 2009. ACL and AFNLP.
7. C. Ding and X. He. K-means clustering and principal component analysis. In Proceedings of ICML.
8. Chris Ding, Xiaofeng He, and Horst Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of SIAM Data Mining.
9. Chris Ding, Tao Li, Wei Peng, and Haesun Park. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proceedings of SIGKDD.
10. G. Erkan and D. Radev. LexPageRank: Prestige in multi-document text summarization. In Proceedings of EMNLP.
11. Y. Gong and X. Liu. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of SIGIR.
12. Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems.
13. C-Y. Lin and E. Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL.
14. C-Y. Lin and E. Hovy. From single to multi-document summarization: A prototype system and its evaluation. In Proceedings of ACL.
15. I. Mani. Automatic summarization. John Benjamins Publishing Company.
16. D. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization of multiple documents. Information Processing and Management.
17. B. Ricardo and R. Berthier. Modern information retrieval. ACM Press.
18. D. Shen, J-T. Sun, H. Li, Q. Yang, and Z. Chen. Document summarization using conditional random fields. In Proceedings of IJCAI.
19. Dingding Wang, Shenghuo Zhu, Tao Li, Yun Chi, and Yihong Gong. Integrating clustering and multi-document summarization to improve document understanding. In Proceedings of CIKM.
20. W-T. Yih, J. Goodman, L. Vanderwende, and H. Suzuki. Multi-document summarization by maximizing informative content-words. In Proceedings of IJCAI 2007.


Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base

Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base Rankng Technques for Cluster Based Search Results n a Textual Knowledge-base Shefal Sharma Fetch Technologes, Inc 841 Apollo St, El Segundo, CA 90254 +1 (310) 414-9849 ssharma@fetch.com Sofus A. Macskassy

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information
