Single Document Keyphrase Extraction Using Neighborhood Knowledge

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)

Xiaojun Wan and Jianguo Xiao
Institute of Computer Science and Technology
Peking University, Beijing, China
{wanxiaojun, ...}

Copyright 2008, American Association for Artificial Intelligence. All rights reserved.

Abstract

Existing methods for single document keyphrase extraction usually make use of only the information contained in the specified document. This paper proposes to use a small number of nearest neighbor documents to provide more knowledge for improving single document keyphrase extraction. A specified document is expanded to a small document set by adding a few neighbor documents close to it, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results demonstrate the effectiveness and robustness of the proposed approach.

Introduction

A keyphrase is defined as a meaningful and significant expression consisting of one or more words in a document. Appropriate keyphrases can serve as a highly condensed summary of a document: they can be used as a label for the document to supplement or replace the title or summary, or they can be highlighted within the body of the document to help users browse and read quickly. Moreover, document keyphrases have been successfully used in the following IR and NLP tasks: document indexing (Gutwin et al., 1999), document classification (Krulwich and Burkey, 1996), document clustering (Hammouda et al., 2005) and document summarization (Berger and Mittal, 2000).

Keyphrases are usually assigned manually by authors, especially for journal or conference articles. However, the vast majority of documents (e.g. news articles, magazine articles) do not have keyphrases, so it is beneficial to automatically extract a few keyphrases from a given document to convey its main content. Here, keyphrases are selected from within the body of the input document, without a predefined list (i.e. a controlled vocabulary). Though keyphrase extraction is an important research topic in the NLP and IR fields, it has received less attention than it deserves. Most previous works focus on keyphrase extraction for journal or conference articles, while this paper focuses on keyphrase extraction for news articles, because news is one of the most popular document genres on the web and most news articles have no author-assigned keyphrases.

Existing methods conduct the keyphrase extraction task using only the information contained in the specified document, including the phrase's TFIDF, position and other syntactic information in the document. One common assumption of existing methods is that documents are independent of each other, so keyphrase extraction is conducted separately for each document without interactions. However, topic-related documents actually have mutual influences and contain useful clues that can help to extract keyphrases from each other. For example, two documents about the same topic "earthquake" would share a few common phrases, e.g. "earthquake" and "victim", and they can provide additional knowledge for each other to better evaluate and extract salient keyphrases. Therefore, given a specified document, we can retrieve a few documents topically close to it from a large corpus through search engines; these neighbor documents are deemed beneficial for evaluating and extracting keyphrases from the specified document because they provide more knowledge and clues.

This study proposes to construct an appropriate knowledge context for a specified document by leveraging a few neighbor documents close to it. The neighborhood knowledge can be used in the keyphrase extraction process and helps to extract salient keyphrases from the document.
In particular, the graph-based ranking algorithm is employed for single document keyphrase extraction by making use of both the word relationships in the specified document and the word relationships in the neighbor documents, where the former reflect the local information in the specified document and the latter reflect the global information in the neighborhood. Experiments have been performed on a dataset consisting of 308 news articles with human-annotated keyphrases, and the results demonstrate the effectiveness of the proposed approach: the use of neighborhood knowledge significantly improves the performance of single document keyphrase extraction. We also investigate how the size of the neighborhood influences keyphrase extraction performance, and it is encouraging that a small number of neighbor documents can improve the performance.

Related Work

The methods for keyphrase (or keyword) extraction can be roughly categorized as either unsupervised or supervised. In this study, we focus on unsupervised methods.

Unsupervised methods usually involve assigning a saliency score to each candidate phrase by considering various features. Krulwich and Burkey (1996) use heuristics to extract keyphrases from a document. The heuristics are based on syntactic clues, such as the use of italics, the presence of phrases in section headers, and the use of acronyms. Barker and Cornacchia (2000) propose a simple system for choosing noun phrases from a document as keyphrases. Muñoz (1996) uses an unsupervised learning algorithm based on Adaptive Resonance Theory (ART) neural networks to discover two-word keyphrases. Steier and Belew (1993) use mutual information statistics to discover two-word keyphrases. Tomokiyo and Hurst (2003) use pointwise KL-divergence between multiple language models to score both the phraseness and the informativeness of phrases. More recently, Mihalcea and Tarau (2004) propose the TextRank model to rank keywords based on the co-occurrence links between words. Such algorithms make use of voting or recommendations between words to extract keyphrases.

Supervised machine learning algorithms have been proposed to classify a candidate phrase as either a keyphrase or not. GenEx (Turney, 2000) and Kea (Frank et al., 1999; Witten et al., 1999) are two typical systems, and the most important features for classifying a candidate phrase are the frequency and location of the phrase in the document. More linguistic knowledge has been explored by Hulth (2003). Statistical associations between keyphrases have been used to enhance the coherence of the extracted keyphrases (Turney, 2003). Song et al. (2003) present an information gain-based keyphrase extraction system called KPSpotter. Medelyan and Witten (2006) propose KEA++, which enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus.
Nguyen and Kan (2007) focus on keyphrase extraction in scientific publications by using new features that capture salient morphological phenomena found in scientific keyphrases.

All the above methods make use of only the information contained in the specified document; the use of neighbor documents to improve single document keyphrase extraction has not been investigated yet. Other related works include web page keyword extraction (Kelleher and Luz, 2005) and advertising keyword finding (Yih et al., 2006). It is noteworthy that collaborative techniques have been successfully used in information filtering (Xue et al., 2005), document summarization (Wan et al., 2007) and web mining (Wong et al., 2006).

Proposed Approach

Overview

Given a specified document d_0 for keyphrase extraction, the proposed approach first finds a few neighbor documents for d_0. The neighbor documents are topically close to the specified document and constitute its neighborhood knowledge context. In other words, document d_0 is expanded to a small document set D, which provides more knowledge and clues for keyphrase extraction from d_0. Given the expanded document set, the proposed approach adopts the graph-based ranking algorithm to incorporate both the word relationships in d_0 (local information) and the word relationships in the neighbor documents (global information) for keyphrase extraction from d_0. Figure 1 gives the framework of the proposed approach.

1. Neighborhood Construction: Expand the specified document d_0 to a small document set D={d_0, d_1, d_2, …, d_k} by adding k neighbor documents. The neighbor documents d_1, d_2, …, d_k can be obtained by using document similarity search techniques;
2. Keyphrase Extraction: Given document d_0 and the expanded document set D, perform the following steps to extract keyphrases for d_0:
   a) Neighborhood-level Word Evaluation: Build a global affinity graph G based on all candidate words, restricted by syntactic filters, in all the documents of the expanded document set D, and employ the graph-based ranking algorithm to compute the global saliency score of each word.
   b) Document-level Keyphrase Extraction: For the specified document d_0, evaluate the candidate phrases in the document based on the scores of the words they contain, and finally choose a few phrases with the highest scores as the keyphrases of the document.
Figure 1: The framework of the proposed approach

For the first step in the above framework, different similarity search techniques can be adopted to obtain neighbor documents close to the specified document. The number k of neighbor documents influences keyphrase extraction performance and is investigated in the experiments. For the second step, substep a) aims to evaluate all candidate words in the expanded document set with the graph-based ranking algorithm. The global affinity graph reflects the neighborhood-level co-occurrence relationships between all candidate words in the expanded document set, and the saliency scores computed from it indicate how much information about the main topic each word reflects. Substep b) evaluates the candidate phrases in the specified document based on the neighborhood-level word scores and then chooses a few salient phrases as the keyphrases of the document.

Neighborhood Construction

Given a specified document d_0, neighborhood construction aims to find a few nearest neighbors of the document in a text corpus or on the Web. The k neighbor documents d_1, d_2, …, d_k and the specified document d_0 build the expanded document set D={d_0, d_1, d_2, …, d_k} for d_0, which can be considered the expanded knowledge context for document d_0.

The neighbor documents can be obtained by document similarity search, which finds documents similar to a query document in a text corpus and returns a ranked list of similar documents to users. The effectiveness of document similarity search relies on the function used to evaluate the similarity between two documents. In this study, we use the widely-used cosine measure to evaluate document similarity, and term weights are computed by TFIDF. The similarity sim_doc(d_i, d_j) between documents d_i and d_j is defined as the normalized inner product of their term vectors \vec{d_i} and \vec{d_j}:

$$\mathrm{sim}_{doc}(d_i, d_j) = \frac{\vec{d_i} \cdot \vec{d_j}}{\|\vec{d_i}\| \times \|\vec{d_j}\|} \tag{1}$$

In the experiments, we simply use the cosine measure to compute the pairwise similarity between the specified document d_0 and the documents in the corpus, and then choose the k documents (different from d_0) with the largest similarity values as the nearest neighbors of d_0. The expanded document set therefore contains k+1 documents in total. For the document set D={d_0, d_1, d_2, …, d_k}, the pairwise cosine similarity values between documents are calculated and recorded for later use. The efficiency of document similarity search can be significantly improved by adopting an index structure in the implemented system, such as the K-D-B tree, R-tree, SS-tree, SR-tree or X-tree (Böhm & Berchtold, 2001).

The use of neighborhood information deserves more discussion. Because neighbor documents might not be sampled from the same generative model as the specified document, we probably do not want to trust them as much as the specified document.
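As an illustration only, the neighborhood construction described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: documents are assumed to be pre-tokenized lists of words, the TF-IDF variant (raw term frequency times log(N/df)) is an assumption because the paper only says "TFIDF", and the helper names tfidf_vectors, cosine and build_neighborhood are hypothetical. The returned similarity values are the sim_doc(d_0, d_p) values that the paper goes on to reuse as confidence values.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight) for tokenized documents.

    Assumed variant: raw term frequency times log(N / df); the paper only
    states that term weights are computed by TFIDF.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Equation (1): normalized inner product of two sparse term vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_neighborhood(corpus, target, k):
    """Expanded set D for corpus[target] as [(doc_index, confidence), ...].

    d_0 comes first with confidence 1.0, matching sim_doc(d_0, d_0); the k
    most similar other documents follow, each paired with sim_doc(d_0, d_p).
    """
    vectors = tfidf_vectors(corpus)
    scored = [
        (i, cosine(vectors[target], vectors[i]))
        for i in range(len(corpus))
        if i != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(target, 1.0)] + scored[:k]


corpus = [
    "earthquake victims rescue earthquake".split(),
    "earthquake rescue teams victims".split(),
    "stock market shares fall".split(),
]
neighborhood = build_neighborhood(corpus, target=0, k=1)
# d_0 (index 0) comes first, then document 1, the only one sharing terms with d_0
```

A brute-force scan over the corpus keeps the sketch short; for a large corpus, one of the index structures mentioned above would take its place.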
Thus, a confidence value is associated with every document in the expanded document set, reflecting our belief that the document is sampled from the same underlying model as the specified document. When a document is close to the specified one, the confidence value is high; when it is farther apart, the confidence value is reduced. Heuristically, we use the cosine similarity between a document and the specified document as the confidence value. The confidence values of the neighbor documents are incorporated into the keyphrase extraction algorithm.

Keyphrase Extraction

a) Neighborhood-Level Word Evaluation

Like the PageRank algorithm (Page et al., 1998), the graph-based ranking algorithm employed in this study is essentially a way of deciding the importance of a vertex within a graph based on global information recursively drawn from the entire graph. The basic idea is that of voting or recommendation between the vertices. A link between two vertices is considered a vote cast from one vertex to the other. The score associated with a vertex is determined by the votes that are cast for it and by the scores of the vertices casting these votes.

Formally, given the expanded document set D, let G=(V, E) be an undirected graph that reflects the relationships between words in the document set. V is the set of vertices, and each vertex is a candidate word¹ in the document set. Because not all words in the documents are good indicators of keyphrases, the words added to the graph are restricted by syntactic filters, i.e., only words with certain parts of speech are added. As in Mihalcea and Tarau (2004), the documents are tagged by a POS tagger, and only nouns and adjectives are added to the vertex set². E is the set of edges, which is a subset of V×V. Each edge e_ij in E is associated with an affinity weight aff(v_i, v_j) between words v_i and v_j. The weight is computed from the co-occurrence relation between the two words, controlled by the distance between word occurrences. The co-occurrence relation can express cohesion relationships between words.
Two vertices are connected if the corresponding words co-occur at least once within a window of at most w words, where w can be set anywhere from 2 to 20. The affinity weight aff(v_i, v_j) is simply set to the count of the controlled co-occurrences between words v_i and v_j in the whole document set, weighted by the confidence of each document:

$$\mathrm{aff}(v_i, v_j) = \sum_{d_p \in D} \mathrm{sim}_{doc}(d_0, d_p) \times \mathrm{count}_{d_p}(v_i, v_j) \tag{2}$$

where count_{d_p}(v_i, v_j) is the count of the controlled co-occurrences between words v_i and v_j in document d_p, and sim_doc(d_0, d_p) is the similarity factor reflecting the confidence value for using document d_p (0 ≤ p ≤ k) in the expanded document set. Because the graph is built on the whole document set, it reflects the global information in the neighborhood; we call it the Global Affinity Graph. We use an affinity matrix M to describe G, with each entry corresponding to the weight of an edge in the graph. M = (M_{i,j})_{|V|×|V|} is defined as follows:

$$M_{i,j} = \begin{cases} \mathrm{aff}(v_i, v_j), & \text{if } v_i \text{ links with } v_j \text{ and } i \neq j; \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Then M is normalized to \tilde{M} as follows to make the sum of each row equal to 1:

$$\tilde{M}_{i,j} = \begin{cases} M_{i,j} \Big/ \sum_{j=1}^{|V|} M_{i,j}, & \text{if } \sum_{j=1}^{|V|} M_{i,j} \neq 0; \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Based on the global affinity graph G, the saliency score WordScore(v_i) of word v_i can be deduced from the scores of all other words linked with it, and it can be formulated recursively as in the PageRank algorithm:

$$\mathrm{WordScore}(v_i) = \mu \sum_{\text{all } j \neq i} \mathrm{WordScore}(v_j) \cdot \tilde{M}_{j,i} + \frac{(1-\mu)}{|V|} \tag{5}$$

In matrix form:

$$\vec{\lambda} = \mu \tilde{M}^T \vec{\lambda} + \frac{(1-\mu)}{|V|} \vec{e} \tag{6}$$

where \vec{\lambda} = [WordScore(v_i)]_{|V|×1} is the vector of word saliency scores, \vec{e} is a vector with all elements equal to 1, and μ is the damping factor, usually set to 0.85 as in the PageRank algorithm. The above process can be viewed as a Markov chain that takes the words as states, with transition matrix \mu \tilde{M}^T + \frac{(1-\mu)}{|V|} \vec{e}\vec{e}^T. The stationary probability distribution over the states is given by the principal eigenvector of the transition matrix. For implementation, the initial scores of all words are set to 1 and the iteration in Equation (5) is applied to compute new scores for the words. Convergence is usually declared when the difference between the scores computed at two successive iterations falls below a given threshold for every word (… in this study).

¹ The original words are used without stemming.
² The corresponding POS tags of the candidate words are JJ, NN, NNS, NNP and NNPS. We used the Stanford log-linear POS tagger (Toutanova and Manning, 2000) in this study.

b) Document-Level Keyphrase Extraction

After the scores of all candidate words in the document set have been computed, candidate phrases (either single-word or multi-word) are selected and evaluated for the specified document d_0. The candidate words (i.e. nouns and adjectives) of d_0, which form a subset of V, are marked in the text of document d_0, and sequences of adjacent candidate words are collapsed into multi-word phrases. Phrases ending with an adjective are not allowed; only phrases ending with a noun are collected as candidate phrases for the document. For instance, in the sentence "Mad/JJ cow/NN disease/NN has/VBZ killed/VBN 10,000/CD cattle/NNS", the candidate phrases are "Mad cow disease" and "cattle". The score of a candidate phrase p is computed by summing the neighborhood-level saliency scores of the words contained in the phrase.
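Steps a) and b) can be sketched together in Python under stated assumptions. This is a hypothetical illustration, not the authors' code: each document in D is assumed to arrive already POS-tagged as a list of (word, tag) pairs (the paper uses the Stanford tagger), vertices are lower-cased words (the paper does not say whether case is folded), the windowing rule in cooccurrence_counts is one reading of "within a window of at most w words", and tol=1e-4 is a placeholder because the paper's threshold value is not recoverable here. The damping factor 0.85 and w=10 follow the text; scores are computed by iterating Equation (5), and phrase scores are the word-score sums formalized in Equation (7). All function names are illustrative.

```python
from collections import defaultdict

# Syntactic filter: nouns and adjectives (the Penn Treebank tags listed in note 2).
CANDIDATE_TAGS = {"JJ", "NN", "NNS", "NNP", "NNPS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def cooccurrence_counts(tagged, w):
    """count_d(v_i, v_j) for one POS-tagged document [(word, tag), ...].

    Assumption: two candidate tokens co-occur when their positions differ by
    less than w; every pair of occurrences is counted once, in both directions.
    """
    counts = defaultdict(float)
    for a, (word_a, tag_a) in enumerate(tagged):
        if tag_a not in CANDIDATE_TAGS:
            continue
        for word_b, tag_b in tagged[a + 1 : a + w]:
            va, vb = word_a.lower(), word_b.lower()
            if tag_b in CANDIDATE_TAGS and va != vb:  # i != j, as in Equation (3)
                counts[(va, vb)] += 1
                counts[(vb, va)] += 1
    return counts


def word_scores(expanded, w=10, mu=0.85, tol=1e-4, max_iter=200):
    """Equations (2)-(5) over D given as [(tagged_doc, confidence), ...]."""
    vertices, aff = set(), defaultdict(float)
    for tagged, confidence in expanded:
        vertices.update(wd.lower() for wd, tag in tagged if tag in CANDIDATE_TAGS)
        for pair, count in cooccurrence_counts(tagged, w).items():
            aff[pair] += confidence * count  # Equation (2)
    if not vertices:
        return {}
    row_sum = defaultdict(float)
    for (vi, _), weight in aff.items():
        row_sum[vi] += weight
    # Equation (4): row-normalize; rows that sum to zero stay all-zero (omitted).
    m_tilde = {pair: weight / row_sum[pair[0]]
               for pair, weight in aff.items() if row_sum[pair[0]] > 0}
    n = len(vertices)
    scores = dict.fromkeys(vertices, 1.0)  # initial scores are set to 1
    for _ in range(max_iter):
        new = dict.fromkeys(vertices, (1 - mu) / n)
        for (vj, vi), weight in m_tilde.items():
            new[vi] += mu * scores[vj] * weight  # Equation (5): v_j votes for v_i
        delta = max(abs(new[v] - scores[v]) for v in vertices)
        scores = new
        if delta < tol:
            break
    return scores


def candidate_phrases(tagged):
    """Collapse runs of adjacent candidate words; keep runs ending in a noun."""
    phrases, run = [], []
    for word, tag in tagged + [("", "<END>")]:  # sentinel flushes the last run
        if tag in CANDIDATE_TAGS:
            run.append((word, tag))
            continue
        if run and run[-1][1] in NOUN_TAGS:
            phrases.append(tuple(wd for wd, _ in run))
        run = []
    return phrases


def extract_keyphrases(expanded, m=10, w=10):
    """Top-m phrases of d_0 (expanded[0]) ranked by summed word scores."""
    scores = word_scores(expanded, w=w)
    unique = dict.fromkeys(candidate_phrases(expanded[0][0]))  # dedupe, keep order
    ranked = sorted(
        unique,
        key=lambda phrase: sum(scores.get(wd.lower(), 0.0) for wd in phrase),
        reverse=True,
    )
    return [" ".join(phrase) for phrase in ranked[:m]]


sentence = [("Mad", "JJ"), ("cow", "NN"), ("disease", "NN"), ("has", "VBZ"),
            ("killed", "VBN"), ("10,000", "CD"), ("cattle", "NNS")]
print(candidate_phrases(sentence))  # [('Mad', 'cow', 'disease'), ('cattle',)]
```

Keeping the affinity graph as a sparse dictionary of word pairs avoids materializing the |V|×|V| matrix M. With only d_0 in D (k=0), the same sketch plays the role of the single-document SingleRank setting.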
Formally, the phrase score is:

$$\mathrm{PhraseScore}(p) = \sum_{v_j \in p} \mathrm{WordScore}(v_j) \tag{7}$$

All the candidate phrases in document d_0 are ranked in decreasing order of their phrase scores, and the top m phrases are selected as the keyphrases of d_0; m ranges from 1 to 20 in this study.

Empirical Evaluation

Evaluation Setup

To our knowledge, there was no gold-standard news dataset with assigned keyphrases for evaluation, so we manually annotated the DUC2001 dataset (Over, 2001) and used the annotated dataset for evaluation in this study. The dataset was originally used for document summarization. It consisted of 309 news articles collected from TREC-9, two of which were duplicates (i.e. d05a\fbis and d05a\fbis-41815~), so the actual number of documents was 308. The articles could be categorized into 30 news topics, and the average document length was 740 words. Two graduate students were employed to manually label the keyphrases for each document. At most 10 keyphrases could be assigned to each document. The annotation process lasted two weeks. The Kappa statistic for measuring inter-annotator agreement was …. Annotation conflicts between the two annotators were then resolved by discussion. Finally, 2488 keyphrases were labeled for the dataset. The average number of keyphrases per document was 8.08, and the average number of words per keyphrase was ….

In the experiments, the DUC2001 dataset was used as the corpus for document expansion; this corpus could easily be extended by adding more documents. Each specified document was expanded by adding the k documents (different from the specified document) most similar to it. To evaluate the keyphrase extraction results, the automatically extracted keyphrases were compared with the manually labeled keyphrases. The words in each keyphrase were converted to their basic forms by word stemming before comparison.
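The comparison against the human-labeled keyphrases can be sketched as follows, together with the precision, recall and F-measure defined in the next paragraph. The paper does not name its stemmer, so toy_stem is a hypothetical stand-in that only lower-cases words and strips a plural "s"; a real stemmer (for example Porter's) would be substituted in practice. Counts are totalled over all documents before dividing, following the definitions of count_correct, count_system and count_human; treating phrases that become identical after stemming as one phrase is an additional assumption.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: lower-case, strip a plural -s."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(phrase, stem=toy_stem):
    """Map a keyphrase to a tuple of stemmed words for comparison."""
    return tuple(stem(token) for token in phrase.split())


def evaluate(system, human, stem=toy_stem):
    """Precision, recall and F-measure with counts totalled over all documents.

    system, human: one list of keyphrase strings per document. Phrases that
    become identical after stemming are counted once (an assumption).
    """
    correct = n_system = n_human = 0
    for sys_phrases, gold_phrases in zip(system, human):
        sys_set = {normalize(p, stem) for p in sys_phrases}
        gold_set = {normalize(p, stem) for p in gold_phrases}
        correct += len(sys_set & gold_set)   # count_correct
        n_system += len(sys_set)             # count_system
        n_human += len(gold_set)             # count_human
    precision = correct / n_system if n_system else 0.0
    recall = correct / n_human if n_human else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


p, r, f = evaluate([["Mad cow disease", "cattle", "farmers"]],
                   [["mad cow diseases", "cattle", "BSE", "beef"]])
# 2 of the 3 system phrases match: p = 2/3, r = 2/4, f = 4/7
```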
Precision p = count_correct/count_system, recall r = count_correct/count_human and F-measure F = 2pr/(p+r) were used as evaluation metrics, where count_correct was the total number of correct keyphrases extracted by the system, count_system was the total number of automatically extracted keyphrases, and count_human was the total number of human-labeled keyphrases.

Evaluation Results

The proposed approach (i.e. ExpandRank) is compared with baseline methods that rely only on the specified document (i.e. SingleRank and TFIDF). The SingleRank baseline uses the graph-based ranking algorithm to compute word scores for each single document based on the local graph of the specified document. The TFIDF baseline computes the word scores for each single document from each word's TFIDF value in the specified document. Neither baseline makes use of neighborhood knowledge.

Table 1 gives the comparison results of the baseline methods and the proposed ExpandRank method with different neighbor numbers (k=1, 5, 10). In the experiments, the keyphrase number m is typically set to 10 because at most 10 keyphrases could be manually labeled for each document, and the co-occurrence window size w is also simply set to 10.

Table 1. Keyphrase Extraction Results. Systems: TFIDF, SingleRank, ExpandRank (k=1), ExpandRank (k=5), ExpandRank (k=10); metrics: precision, recall, F-measure.

As seen from Table 1, ExpandRank with each of the tested neighbor numbers outperforms the baseline methods SingleRank and TFIDF on all three metrics. The results demonstrate the effectiveness of the proposed method.

To investigate how the size of the neighborhood influences keyphrase extraction performance, we conduct experiments with different values of the neighbor number k. Figure 2 shows the performance curves of the ExpandRank method for k ranging from 0 to 15. Note that when k=0, ExpandRank degenerates into the baseline SingleRank method. We can see from the figure that ExpandRank (i.e. k>0) always outperforms the baseline SingleRank method (i.e. k=0), no matter how many neighbor documents are used. We can also see that the performance of ExpandRank first increases and then decreases as k grows. This trend shows that very few or very many neighbors deteriorate the results: very few neighbors cannot provide sufficient knowledge, and very many neighbors may introduce noisy knowledge. As seen from the figure, it is not necessary to use many neighbors for ExpandRank; the neighbor number can be set to a comparatively small number (e.g. 5), which improves computational efficiency and makes the proposed approach more applicable.

Figure 2: ExpandRank (m=10, w=10) performance vs. neighbor number k

To investigate how the co-occurrence window size influences keyphrase extraction performance, we conduct experiments with different window sizes w. Figures 3 and 4 show the performance curves of ExpandRank for w ranging from 2 to 20. In Figure 3 the neighbor number is set to 5, and in Figure 4 it is set to 10.
We can see from the figures that performance is almost unaffected by the window size, except when w is set to ….

Figure 3: ExpandRank (k=5, m=10) performance vs. window size w
Figure 4: ExpandRank (k=10, m=10) performance vs. window size w

In the above experiments, the keyphrase number is set to 10. We further conduct experiments with different keyphrase numbers m to investigate how the keyphrase number influences keyphrase extraction performance. Figures 5 and 6 show the performance curves of ExpandRank for m ranging from 1 to 20. In Figure 5 the neighbor number is set to 5, and in Figure 6 it is set to 10. We can see from the figures that precision decreases and recall increases as m increases, while the F-measure first increases and then tends to decrease.

Figure 5: ExpandRank (k=5, w=10) performance vs. keyphrase number m

Figure 6: ExpandRank (k=10, w=10) performance vs. keyphrase number m

It is noteworthy that the proposed approach has higher computational complexity than the baseline approach because it involves more documents. Its efficiency can be improved by conducting single document keyphrase extraction collaboratively in batch mode. Suppose keyphrases are to be extracted separately from multiple documents: we can group the documents into clusters and, within each cluster, use all other documents as the neighbors of a specified document. The mutual influences between all documents can thus be incorporated into the keyphrase extraction algorithm, and all the words and phrases in the documents of a cluster are evaluated collaboratively, yielding keyphrases for every single document in one batch.

Conclusion and Future Work

This paper proposes a novel approach to single document keyphrase extraction that leverages the neighborhood knowledge of the specified document. In future work, other keyphrase extraction algorithms will be integrated into the proposed framework, and we will use more test data to validate the robustness of the proposed approach.

Acknowledgements

This work was supported by the National Science Foundation of China (No. …) and the Research Fund for the Doctoral Program of Higher Education of China (No. …).

References

Berger, A., and Mittal, V. 2000. OCELOT: A system for summarizing Web pages. In Proceedings of SIGIR2000.
Barker, K., and Cornacchia, N. 2000. Using noun phrase heads to extract document keyphrases. In Canadian Conference on AI.
Böhm, C., and Berchtold, S. 2001. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys, 33(3).
Frank, E.; Paynter, G. W.; Witten, I. H.; Gutwin, C.; and Nevill-Manning, C. G. 1999. Domain-specific keyphrase extraction. In Proceedings of IJCAI-99.
Gutwin, C.; Paynter, G. W.; Witten, I. H.; Nevill-Manning, C. G.; and Frank, E. 1999. Improving browsing in digital libraries with keyphrase indexes. Journal of Decision Support Systems, 27.
Hammouda, K. M.; Matute, D. N.; and Kamel, M. S. 2005. CorePhrase: Keyphrase extraction for document clustering. In Proceedings of MLDM2005.
Hulth, A. 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of EMNLP2003.
Kelleher, D., and Luz, S. 2005. Automatic hypertext keyphrase detection. In Proceedings of IJCAI2005.
Krulwich, B., and Burkey, C. 1996. Learning user information interests through the extraction of semantically significant phrases. In AAAI 1996 Spring Symposium on Machine Learning in Information Access.
Medelyan, O., and Witten, I. H. 2006. Thesaurus based automatic keyphrase indexing. In Proceedings of JCDL2006.
Mihalcea, R., and Tarau, P. 2004. TextRank: Bringing order into texts. In Proceedings of EMNLP2004.
Muñoz, A. 1996. Compound key word generation from document databases using a hierarchical clustering ART model. Intelligent Data Analysis, 1(1).
Nguyen, T. D., and Kan, M.-Y. 2007. Keyphrase extraction in scientific publications. In Proceedings of ICADL2007.
Over, P. 2001. Introduction to DUC-2001: An intrinsic evaluation of generic news text summarization systems. In Proceedings of DUC2001.
Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Libraries.
Song, M.; Song, I.-Y.; and Hu, X. 2003. KPSpotter: A flexible information gain-based keyphrase extraction system. In Proceedings of WIDM2003.
Steier, A. M., and Belew, R. K. 1993. Exporting phrases: A statistical analysis of topical language. In Proceedings of the Second Symposium on Document Analysis and Information Retrieval.
Tomokiyo, T., and Hurst, M. 2003. A language model approach to keyphrase extraction. In Proceedings of the ACL Workshop on Multiword Expressions.
Toutanova, K., and Manning, C. D. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of EMNLP/VLC 2000.
Turney, P. D. 2000. Learning algorithms for keyphrase extraction. Information Retrieval, 2.
Turney, P. D. 2003. Coherent keyphrase extraction via web mining. In Proceedings of IJCAI-03.
Wan, X.; Yang, J.; and Xiao, J. 2007. Single document summarization with document expansion. In Proceedings of AAAI2007.
Witten, I. H.; Paynter, G. W.; Frank, E.; Gutwin, C.; and Nevill-Manning, C. G. 1999. KEA: Practical automatic keyphrase extraction. In Proceedings of Digital Libraries 99 (DL'99).
Wong, T.-L.; Lam, W.; and Chan, S.-K. 2006. Collaborative information extraction and mining from multiple web documents. In Proceedings of SDM2006.
Xue, G.-R.; Lin, C.; Yang, Q.; Xi, W.; Zeng, H.-J.; Yu, Y.; and Chen, Z. 2005. Scalable collaborative filtering using cluster-based smoothing. In Proceedings of SIGIR2005.
Yih, W.-T.; Goodman, J.; and Carvalho, V. R. 2006. Finding advertising keywords on web pages. In Proceedings of WWW2006.


More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Improving Web Search Results Using Affinity Graph

Improving Web Search Results Using Affinity Graph Improvng Web Search Results Usng Affnty Graph Benyu Zhang, Hua L 2, Y Lu 3, Le J 4, Wens X 5, Weguo Fan 5, Zheng Chen, We-Yng Ma Mcrosoft Research Asa, 49 Zhchun Road, Bejng, 00080, P. R. Chna {byzhang,

More information

Using Wikipedia Anchor Text and Weighted Clustering Coefficient to Enhance the Traditional Multi-Document Summarization

Using Wikipedia Anchor Text and Weighted Clustering Coefficient to Enhance the Traditional Multi-Document Summarization Usng Wkpeda Anchor Text and Weghted Clusterng Coeffcent to Enhance the Tradtonal Mult-Document Summarzaton by Nraj Kumar, Kannan Srnathan, Vasudeva Varma n 13th Internatonal Conference on Intellgent Text

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

A Clustering Algorithm for Key Frame Extraction Based on Density Peak

A Clustering Algorithm for Key Frame Extraction Based on Density Peak Journal of Computer and Communcatons, 2018, 6, 118-128 http://www.scrp.org/ournal/cc ISSN Onlne: 2327-5227 ISSN Prnt: 2327-5219 A Clusterng Algorthm for Key Frame Extracton Based on Densty Peak Hong Zhao

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Domain Thesaurus Construction from Wikipedia *

Domain Thesaurus Construction from Wikipedia * Internatonal Conference on Computer, Networks and Communcaton Engneerng (ICCNCE 2013) Doman Thesaurus Constructon from Wkpeda * WenKe Yn 1, Mng Zhu 2, TanHao Chen 2 1 Department of Electronc Engneerng

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval

LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval LRD: Latent Relaton Dscovery for Vector Space Expanson and Informaton Retreval Techncal Report KMI-06-09 March, 006 Alexandre Gonçalves, Janhan Zhu, Dawe Song, Vctora Uren, Roberto Pacheco In Proc. of

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Laplacian Eigenmap for Image Retrieval

Laplacian Eigenmap for Image Retrieval Laplacan Egenmap for Image Retreval Xaofe He Partha Nyog Department of Computer Scence The Unversty of Chcago, 1100 E 58 th Street, Chcago, IL 60637 ABSTRACT Dmensonalty reducton has been receved much

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Personalized Concept-Based Clustering of Search Engine Queries

Personalized Concept-Based Clustering of Search Engine Queries IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID 1 Personalzed Concept-Based Clusterng of Search Engne Queres Kenneth Wa-Tng Leung, Wlfred Ng, and Dk Lun Lee Abstract The exponental growth of nformaton

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Structural Analysis of Musical Signals for Indexing and Thumbnailing

Structural Analysis of Musical Signals for Indexing and Thumbnailing Structural Analyss of Muscal Sgnals for Indexng and Thumbnalng We Cha Barry Vercoe MIT Meda Laboratory {chawe, bv}@meda.mt.edu Abstract A muscal pece typcally has a repettve structure. Analyss of ths structure

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Unsupervised Negation Focus Identification with Word-Topic Graph Model

Unsupervised Negation Focus Identification with Word-Topic Graph Model Unsupervsed Negaton Focus Identfcaton th Word-Topc Graph Model Boe Zou Qaomng Zhu Guodong Zhou * Natural Language Processng Lab, School of Computer Scence and Technology Soocho Unversty, Suzhou, 215006,

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Information Retrieval

Information Retrieval Anmol Bhasn abhasn[at]cedar.buffalo.edu Moht Devnan mdevnan[at]cse.buffalo.edu Sprng 2005 #$ "% &'" (! Informaton Retreval )" " * + %, ##$ + *--. / "#,0, #'",,,#$ ", # " /,,#,0 1"%,2 '",, Documents are

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques

Enhancement of Infrequent Purchased Product Recommendation Using Data Mining Techniques Enhancement of Infrequent Purchased Product Recommendaton Usng Data Mnng Technques Noraswalza Abdullah, Yue Xu, Shlomo Geva, and Mark Loo Dscplne of Computer Scence Faculty of Scence and Technology Queensland

More information

Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base

Ranking Techniques for Cluster Based Search Results in a Textual Knowledge-base Rankng Technques for Cluster Based Search Results n a Textual Knowledge-base Shefal Sharma Fetch Technologes, Inc 841 Apollo St, El Segundo, CA 90254 +1 (310) 414-9849 ssharma@fetch.com Sofus A. Macskassy

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

The Effect of Similarity Measures on The Quality of Query Clusters

The Effect of Similarity Measures on The Quality of Query Clusters The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Cross-Language Information Retrieval

Cross-Language Information Retrieval Feature Artcle: Cross-Language Informaton Retreval 19 Cross-Language Informaton Retreval Jan-Yun Ne 1 Abstract A research group n Unversty of Montreal has worked on the problem of cross-language nformaton

More information

Feature-Based Matrix Factorization

Feature-Based Matrix Factorization Feature-Based Matrx Factorzaton arxv:1109.2271v3 [cs.ai] 29 Dec 2011 Tanq Chen, Zhao Zheng, Quxa Lu, Wenan Zhang, Yong Yu {tqchen,zhengzhao,luquxa,wnzhang,yyu}@apex.stu.edu.cn Apex Data & Knowledge Management

More information

Classifier Swarms for Human Detection in Infrared Imagery

Classifier Swarms for Human Detection in Infrared Imagery Classfer Swarms for Human Detecton n Infrared Imagery Yur Owechko, Swarup Medasan, and Narayan Srnvasa HRL Laboratores, LLC 3011 Malbu Canyon Road, Malbu, CA 90265 {owechko, smedasan, nsrnvasa}@hrl.com

More information

Modular PCA Face Recognition Based on Weighted Average

Modular PCA Face Recognition Based on Weighted Average odern Appled Scence odular PCA Face Recognton Based on Weghted Average Chengmao Han (Correspondng author) Department of athematcs, Lny Normal Unversty Lny 76005, Chna E-mal: hanchengmao@163.com Abstract

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

From Comparing Clusterings to Combining Clusterings

From Comparing Clusterings to Combining Clusterings Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,

More information

Unsupervised Content Discovery in Composite Audio Rui Cai Department of Computer Science and Technology, Tsinghua Univ. Beijing, , China

Unsupervised Content Discovery in Composite Audio Rui Cai Department of Computer Science and Technology, Tsinghua Univ. Beijing, , China Unsupervsed Content Dscovery n Composte Audo Ru Ca Department of Computer Scence and Technology, Tsnghua Unv. Bejng, 100084, Chna caru01@mals.tsnghua.edu.cn Le Lu Mcrosoft Research Asa No. 49 Zhchun Road

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Resolving Surface Forms to Wikipedia Topics

Resolving Surface Forms to Wikipedia Topics Resolvng Surface Forms to Wkpeda Topcs Ypng Zhou Lan Ne Omd Rouhan-Kalleh Flavan Vasle Scott Gaffney Yahoo! Labs at Sunnyvale {zhouy,lanne,omd,flavan,gaffney}@yahoo-nc.com Abstract Ambguty of entty mentons

More information

Semantic Illustration Retrieval for Very Large Data Set

Semantic Illustration Retrieval for Very Large Data Set Semantc Illustraton Retreval for Very Large Data Set Song Ka, Huang Te-Jun, Tan Yong-Hong Dgtal Meda Lab, Insttute of Computng Technology, Chnese Academy of Scences Beng, 00080, R Chna Insttute for Dgtal

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Improving the Quality of Information Retrieval Using Syntactic Analysis of Search Query

Improving the Quality of Information Retrieval Using Syntactic Analysis of Search Query Improvng the Qualty of Informaton Retreval Usng Syntactc Analyss of Search Query Nadezhda Yarushkna 1[0000-0002-5718-8732], Aleksey Flppov 1[0000-0003-0008-5035], and Mara Grgorcheva 1[0000-0001-7492-5178]

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Innovation Typology. Collaborative Authoritativeness. Focused Web Mining. Text and Data Mining In Innovation. Generational Models

Innovation Typology. Collaborative Authoritativeness. Focused Web Mining. Text and Data Mining In Innovation. Generational Models Text and Data Mnng In Innovaton Joseph Engler Innovaton Typology Generatonal Models 1. Lnear or Push (Baroque) 2. Pull (Romantc) 3. Cyclc (Classcal) 4. Strategc (New Age) 5. Collaboratve (Polyphonc) Collaboratve

More information