Lexical ambiguity in cross-language image retrieval: a preliminary analysis.

Size: px
Start display at page:

Download "Lexical ambiguity in cross-language image retrieval: a preliminary analysis."

Transcription

1 Lexical ambiguity in cross-language image retrieval: a preliminary analysis. Borja Navarro-Colorado, Marcel Puchol-Blasco, Rafael M. Terol, Sonia Vázquez and Elena Lloret. Natural Language Processing Research Group (GPLSI) Department of Software and Computing Systems University of Alicante. {borja,marcel,rafamt,svazquez,elloret}@dlsi.ua.es Abstract In this paper we calculate and analyse the lexical ambiguity of queries in a crosslingual Image Retrieval (Flickling) and compare it with the results obtained by users. We want to know to what extent the lexical ambiguity of a query influences the correct localization of an image in a multilingual framework. With this, our final objective is to determine the necessity of Word Sense Disambiguation systems in Information Retrieval tasks, according to user behaviour. The results show that users really deal with lexical ambiguity: in general terms, users find the correct image with low ambiguity queries. Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 Information Search and Retrieval; I.2 [Artificial Intelligence]: I Natural Language Processing General Terms Human Factors, Measurement, Performance, Experimentation Keywords Lexical ambiguity, Interactive Image Retrieval, Word Sense Disambiguation 1 Introduction In this paper 1 we present a preliminary study about how users deal with lexical ambiguity during the interaction with an Information Retrieval system. Taking advantage of Flickling system 2, we have analysed user behaviour with lexical ambiguity in a multilingual image retrieval task. The Natural Language Processing community has not reached an agreement as to whether Word Sense Disambiguation (WSD) systems are useful or not in Information Retrieval (IR) tasks. In other words, there is not an agreement about IR systems improving the retrieval process with a WSD system. 1 This paper has been supported by the Spanish Government, project TEXT-MESS TIN C Elena Lloret is funded by the FPI grant (BES ) from the Spanish Ministry of Science and Innovation, under this project. Marcel Puchol-Blasco is funded by the research grant BFPI06/182 from the Valencian Regional Government (Generalitat Valenciana). 2

2 In general terms, on the one hand, papers like [14, 15] said that a WSD system really does not improve the text retrieval process. On the other hand, papers like [5, 8] said that indexing the text collection with senses, the retrieval process improves. During the CLEF 2008 some papers argued about this topic. For example, [9] said that only in most specific cases, would a WSD system improve the success of IR system; while [6, 7, 10, 1] said that there is no improvement with WSD systems. However, papers like [11, 12, 3] prove that there is a real improvement on Information Retrieval with WSD systems. At present, it seems that only for ambiguous words whose senses do not have semantic relations among them (homographs), a WSD system is useful in IR tasks. There are different approaches to WSD, with different disambiguation algorithms and lexical resources, and different problems [2]. However, all of them introduce errors and are time-consuming in an IR system, therefore nowadays it is difficult to know to what extent they could be useful in IR tasks. In our opinion, the key question is to know if lexical ambiguity is really a decisive factor in IR tasks for a user: if it is, WSD systems are necessary in IR. If not, they are unnecessary. Following Robins [13], in order to design an effective IR system, it is necessary to know how users interact with these systems. Taking advantage of iclef 2009 task, we have analysed the lexical ambiguity of the queries used in Flickling system by different users, and we have compared it with the results obtained by them. With this, we have extracted interesting data of user behaviour about lexical ambiguity on an IR task that could be useful to throw light on this question. In the next section, we introduce the mathematical formula created to calculate the lexical ambiguity of a query. Then we show some data extracted from the iclef log: the lexical ambiguity and the image retrieved by the system, the lexical ambiguity and the images correctly found by the users, and the lexical ambiguity of a specific user. Finally we extract some preliminary results and future work. 2 The lexical ambiguity of the queries In order to achieve our objective, we need to know the general ambiguity of each query, and then compare it with the success of the query. That is, according to the general ambiguity of a query, we want to know if the image has been located or not, and also if both aspects are related. In this sense, we must represent each query with a value that shows the general ambiguity of this query. On the one hand, the ambiguity of each word is the number of senses that it has in WordNet [4] or EuroWordNet [16]. On the other hand, a query is made up by more than one word. This set of words has a relation of co-occurrence (for the user, the image will probably be annotated with these words in the title, description or tags) and the senses of each word have influence on one another. With this frame, we have developed a formula to calculate the general ambiguity of a query with two variables: the number of words in the query and the number of senses of each word. α = (1 1 Q s ) 1 n where α represents the general ambiguity of the query, s the number of senses for each word in WordNet or EuroWordNet, and n the number of words in the query. Next, we are going to split the formula into different parts, in order to explain each in more detail. The ambiguity of one word is uniquely related to the number of associated senses to it in the WordNet or EuroWordNet lexical resource, but the ambiguity of a query is not the sum of the word senses. The user introduces a word in a query with a specific sense. Therefore, we take into account the probability of each sense for each word. The formula 1 s calculates the ambiguity of the word: the probability that the user is using only one sense for a word. The scores obtained by this formula are in the range between 0 (upper ambiguity) and 1 (lower ambiguity). Moreover, considering that a query is composed of a set of n words, the ambiguity of a query will be associated to the ambiguity of its n related words. The formula that computes this ambiguity

3 is defined such as 1 s 1 1 s s n that simplified is defined as such Q n. Finally, with the i=1 si aim of considering the opposite situation (values near zero indicate lower ambiguity while values near one indicate upper ambiguity), we introduce the opposite element 1 in the formula being 1 1 the final formula that computes the ambiguity of each query. Q n i=1 si The second part of the formula ( 1 n ) is the reciprocal of the number of words in the query. The number of words in a query has a specific weight during the retrieval process and in the general ambiguity of the query too. The words that make up a query are the context in which each ambiguous word is disambiguated. That is, for a user, each word has a specific sense in this specific context (the rest of words in the query). Therefore, the number of words in the context influences the ambiguity of the query. From a semantic point of view, the fact that different languages are used in the same query is a method of lexical disambiguation. In this sense, users employ translation to deal with the semantic ambiguity. The same word in different languages could have a different amount of senses (for example, the Spanish word jugar -5 senses- and the English translation to play -29 senses-). In these cases, we consider this set of words (that is, a word and its translation to another language) as the same word. Considering that users use translation to reduce the ambiguity, we can conclude that the ambiguity of these complex words are the intersection of their senses, that is, we consider the language with less senses for the same word. In the previous example, we considered that the ambiguity of jugar - to play is the amount of senses of only jugar : 5 senses. There is a general problem with words that do not appear in WordNet or EuroWordNet. In these cases, we assume that that word is either a proper noun or a technical term. In both cases, we consider that these words do not have ambiguity, and assume only one sense for each 3. In a nutshell, from our point of view, a query with only one word and five senses has more ambiguity that a query with five words each with one sense. In each of these queries, the number of senses is the same, but not the number of words. This formula tries to represent this fact. 3 Log analysis and results For this paper we take into account only three languages: English, Spanish and Italian 4. We have calculated the general lexical ambiguity of each query with the previous formula, and we have extracted some data related to the success of these queries. Specifically, we have extracted data about these three aspects: lexical ambiguity of the queries and images retrieved by the system, lexical ambiguity of the queries and images found by the users, lexical ambiguity of the queries and images found by a specific user. Let s see the data extracted for each of these aspects. 3.1 Lexical ambiguity of the queries and images retrieved by the system The average number of images returned for a specific ambiguity in a question has been extracted. Graph 3.1 shows the average of images retrieved by the system according to the level of lexical ambiguity within the queries. 3 Another possibility is that a word does not appear in the lexicon due to a typographic error and spelling mistake by the user. This case is not taken into account. However, as we will show later, this assumption has introduced some errors in the results. 4 Portuguese and Dutch have not been taken into account. The ambiguity associated with words of these languages will be the minimum number of senses where we have translation. Queries with no results are not relevant for our experiments because the words contained in the queries usually are words with spelling mistakes by the users.

4 These data show that the amount of image retrieved by the system increases according to the level of the queries lexical ambiguity. It is meaningful the increment of images retrieved from an ambiguity level of 0.5 to 0.9. With these data, we can say that the system has more precision with low ambiguity queries. However, it is curious the amount of retrieved images with non-ambiguous queries (0). It is higher than the images retrieved with ambiguity 0.1 or 0.2. This is an exception to the general behaviour of data. The reason for this exception is the words that do not appear in the lexicon. As we said before, the words that do not appear in the lexicon are considered non-ambiguous words (proper nouns or technical terms). However, it has been a hard decision due to, on one hand, proper nouns and technical terms possibly being ambiguous; and, on the other hand, there are ambiguous words that simply do not appear in the lexicon and words written in other languages. In all these cases, we have considered these words as non-ambiguous words, and, in many times, it is not true Ambiguity and average of image retrieved Images Ambiguity This is the reason why the amount of queries with no ambiguity is high. However, not all these queries really have no ambiguity. It must be also taken into account that, in a multilingual framework, we depend totally on the completeness of each lexicon. This aspect of the formula must be reviewed. Independently of this fact, the data show a clear increase in the image retrieved according to the increment of the queries ambiguity. With the maximum ambiguity of the queries (0.8 and 0.9), the system retrieves the maximum amount of images allowed (500). With this we can make a preliminary conclusion: with more lexical ambiguity in the query, a higher number of images are retrieved and the system has less precision. In this sense, a WSD system could be useful for IR tasks. 3.2 Ambiguity of the queries and images found by the users Table 1 and graphs 3.2 and 3.2b show the average of images located correctly by all users according to the queries lexical ambiguity. Images not located by the users are included too. To clarify the data, we have divided them into two graphs. The first one shows the data with queries with 0 ambiguity up to 0.5 ambiguity. The second one, from 0.5 ambiguity to 0.9.

5 Ambiguity Images not found Images found Total (23.59%) 1046 (74.4%) (21.77%) 963 (78.22%) (20.1%) 1220 (79.89%) (24.56%) 476 (75.43%) (16.94%) 250 (83%) (54.21%) 38 (45.78%) (64.28%) 10 (35.71%) (100%) 0 (0%) (50%) 7 (50%) (50%) 1 (50%) 2 Total 1184 (22.8%) 4011 (77.2%) 5195 Table 1: Ambiguity and image found and not found Ambiguity and images found Not found Found 800 Images Ambiguity These data clearly show that with low ambiguous queries, the users correctly find more much images than with queries with high ambiguity: 71.3% of the images have been correctly located with queries with low ambiguity (between 0 to 0.3). However, an unambiguous query doesn t always mean that the user correctly finds an image, or an ambiguous query signify that the user doesn t find the image. For example, as shown the table, with queries with ambiguity 0, a 74.4% of the images are correctly located and a 23.5% are not located. For queries with high ambiguity (second graph) the situation is different. Actually, these data do not show relevant behaviour: the amount of images located and not located by the user are more or less the same (approximately 50%), but the total amount of images located and not located by the users with higher ambiguity queries is low. With these data we can achieve interesting conclusions. The great majority of images correctly located has queries with low ambiguity: 1046 images for queries with ambiguity 0; 963 images for queries with ambiguity 0.1, or 1220 images for queries with ambiguity 0.2, from a total of % of the images correctly located have queries with low ambiguity (between 0 and 0.3). However, the 88.93% of the images not located have queries with the same level of ambiguity. The reasons why an image is not located could be different: for example, cases where the user does not know the language of the image or the correct translation, or cases where the user does not find the correct words.

6 However, we notice that, in the majority of cases, when a user correctly find an image, the ambiguity of the query is low. From the total amount of images, 71.3% have been correctly located with queries with low ambiguity (between 0 to 0.3), against the 20.26% of images not located with queries with the same level of ambiguity Images b. Ambiguity and images found Not found Found Ambiguity According to these data we can conclude that, in the majority of cases, the query s lexical ambiguity influences the precision of the user finding images with a Multilingual Image Retrieval system. However, at the same time, due to the users correctly finding some images with queries with high ambiguity and they can not find correctly some images with queries with low ambiguity, our conclusion can not be categorical: the ambiguity of the query is an important (and maybe decisive) factor, but it is not the unique factor for a correct image retrieval by a user. In this sense, the use of a WSD system in order to improve the IR tasks is important, and maybe decisive. In any case, it is interesting that, according to these data, it is not necessary to disambiguate with one sense per word. Up a 0.3 level of ambiguity is acceptable for a user to find images correctly. Therefore, a coarse-grained WSD system 5 could be useful for IR tasks. 3.3 Ambiguity of the queries and images found by a specific user Another interesting fact is how a specific user deals with the lexical ambiguity of the queries. In order to show this, we have extracted the lexical ambiguity of the queries from the best user (the user that has located more images correctly). They are shown in graph 3.3 and table 2. We have taken into account only the last query: the query in which the user has found the image. 5 That is, WSD systems that could disambiguate with a low level of granularity, with more than one related sense per word, or with general semantic categories.

7 Ambiguity of the query Amount of images found Total 396 Table 2: Ambiguity for the best user

8 Queries Ambiguity for the best user Ambiguity In this case, the data are clear: the majority of images has been correctly found with nonambiguous queries (47.9% images). With low ambiguity (between 0.1 and 0.3) the 27.5% (109 images) has been correctly found. However, 17.9% (71 images) of the images have been found with high ambiguity queries (between 0.5 and 0.9). The behaviour of data with a specific user is similar to the average of users: the majority of images has been found correctly with low ambiguity queries (75.4% for queries with ambiguity between 0 and 0.3). However, the user can find images correctly with high ambiguous queries (17.9%). Therefore, the conclusion is the same. For this user, according to these data, the lexical ambiguity of the query is an important factor: with less ambiguous queries, the user can find more images correctly. However, it is not the unique factor, because there are some cases in which the user can find images correctly with high ambiguous queries. 4 Conclusions and future work According to all these data, we assume the following preliminary conclusions: In general terms, the lexical ambiguity of the queries influences the precision of a user finding images: with less ambiguous queries, the user can find more images correctly. The user can find images correctly with low ambiguous queries: ambiguity between 0 and 0.3. For a human, it is not necessary to have a fine grain disambiguation: with an ambiguity of 0 to 0.3, he can find the majority of images correctly. The user tries to make up the least ambiguous query to improve the retrieval task. Although the lexical ambiguity of the query is an important factor to find images correctly, it is not the unique nor absolute factor: they can find some images correctly with ambiguous queries (on average, less than with low ambiguous queries). Extrapolating these results to the general case about the usefulness of WSD systems in IR tasks, we think that it is necessary to apply these kind of systems to improve the precision of the IR systems. However, it is not necessary to have a fine-grain disambiguation. Otherwise, coarsegrained disambiguation could be useful and enough. For example, to disambiguate with more than one related sense (in WordNet), to use lexical resources with low ambiguity, to disambiguate with semantic classes, generic concepts or domains, etc. As a future work, there are some aspects that must be extensively developed:

9 The formula must be improved in order to correctly represent words that do not appear in the lexical resource. Some options are: to use more than one lexicon (general and technical) or to apply Named Entities Recognition systems to the query. In order to prove that coarse-grained disambiguation is enough for an IR task, it is necessary to repeat the experiment with other lexical resources with less granularity than WordNet, or to reduce the granularity of WordNet with some well-known technique (domains, clustering of related senses, semantic classes, etc.). Finally, all these data have been extracted from an Image Retrieval system. It is necessary to complete the analysis with a more generic Text Retrieval system. References [1] Otavio Costa Acosta, André Pinto Geraldo, Viviane Moreira Orengo, and Aline Villavicencio. UFRGS@CLEF2008: Indexing Multiword Expressions for Information Retrieval. In Working Notes for the CLEF 2008 Workshop, [2] Eneko Agirre and Philip Glenny Edmonds, editors. Word Sense Disambiguation: algorithms and applications. Springer, [3] Pierpaolo Basile, Annalina Caputo, and Giovanni Semeraro. UNIBA-SENSE at CLEF 2008: SEmantic N-levels Search Engine. In Working Notes for the CLEF 2008 Workshop, [4] Christiane Fellbaum, editor. WordNet. An Electronic Lexical Database. MIT Press, [5] Julio Gonzalo, Felisa Verdejo, Irina Chugur, and Juan M. Cigarrán. Indexing with WordNet synsets can improve Text Retrieval. In Usage of WordNet in Natural Language Processing Systems. Coling-ACL Workshop., [6] Jacques Guyot, Gilles Falquet, Saïd Radhouani, and Karim Benzineb. UNIGE Experiments on Robust Word Sense Disambiguation. In Working Notes for the CLEF 2008 Workshop, [7] Andreas Juffinger, Roman Kern, and Michael Granitzer. Exploiting Co-occurrence on Corpus and Document Level for Fair Cross-language Retrieval. In Working Notes for the CLEF 2008 Workshop, [8] Robert Krovetz. On the Importance of Word Sense Disambiguation for Information Retrieval. In Creating and Using Semantics for Information Retrieval and Filtering. State of the Art and Future Research. Third International Conference on Language Resources and Evaluation (LREC) workshop, [9] Fernando Martínez-Santiago, José M. Perea-Ortega, and Miguel A. García-Cumbreras. SINAI at Robust WSD CLEF 2008: When WSD is a Good Idea for Information Retrieval tasks? In Working Notes for the CLEF 2008 Workshop, [10] Sergio Navarro, Fernando Llopis, and Rafael Muñoz. IRn in the CLEF Robust WSD Task In Working Notes for the CLEF 2008 Workshop, [11] Arantxa Otegi, Eneko Agirre, and German Rigau. IXA at CLEF 2008 Robust-WSD Task: using Word Sense Disambiguation for (Cross Lingual) Information Retrieval. In Working Notes for the CLEF 2008 Workshop, [12] José R. Pérez-Agüera and Hugo Zaragoza. UCM-Y!R at CLEF 2008 Robust and WSD tasks. In Working Notes for the CLEF 2008 Workshop, 2008.

10 [13] D. Robins. Interactive Information Retrieval: contexts and basic notions. Informing Science, 3(2):51 61, [14] Mark Sanderson. Word Sense Disambiguation and Information Retrieval. In Proceedings of the 17th ACM SIGIR Conference, pages , [15] Ellen M. Voorhees. Using WordNet to Disambiguate Word Senses for Text Retrieval. In Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pages , [16] Piek Vossen, editor. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, 1998.

Analysis of Word Sense Disambiguation-Based Information Retrieval

Analysis of Word Sense Disambiguation-Based Information Retrieval Analysis of Word Sense Disambiguation-Based Information Retrieval Jacques Guyot and Gilles Falquet and Saïd Radhouani and Karim Benzineb Centre Universitaire dinformatique, University of Geneva Route de

More information

Improving Retrieval Experience Exploiting Semantic Representation of Documents

Improving Retrieval Experience Exploiting Semantic Representation of Documents Improving Retrieval Experience Exploiting Semantic Representation of Documents Pierpaolo Basile 1 and Annalina Caputo 1 and Anna Lisa Gentile 1 and Marco de Gemmis 1 and Pasquale Lops 1 and Giovanni Semeraro

More information

Evaluating wordnets in Cross-Language Information Retrieval: the ITEM search engine

Evaluating wordnets in Cross-Language Information Retrieval: the ITEM search engine Evaluating wordnets in Cross-Language Information Retrieval: the ITEM search engine Felisa Verdejo, Julio Gonzalo, Anselmo Peñas, Fernando López and David Fernández Depto. de Ingeniería Eléctrica, Electrónica

More information

arxiv:cmp-lg/ v1 5 Aug 1998

arxiv:cmp-lg/ v1 5 Aug 1998 Indexing with WordNet synsets can improve text retrieval Julio Gonzalo and Felisa Verdejo and Irina Chugur and Juan Cigarrán UNED Ciudad Universitaria, s.n. 28040 Madrid - Spain {julio,felisa,irina,juanci}@ieec.uned.es

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Alicante at CLEF 2009 Robust-WSD Task

Alicante at CLEF 2009 Robust-WSD Task Alicante at CLEF 2009 Robust-WSD Task Javi Fernández, Rubén Izquierdo, and José M. Gómez University of Alicante, Department of Software and Computing Systems San Vicente del Raspeig Road, 03690 Alicante,

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation Eneko Agirre, Aitor Soroa IXA NLP Group University of Basque Country Donostia, Basque Contry a.soroa@ehu.es Abstract

More information

LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval

LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval Ernesto William De Luca and Andreas Nürnberger 1 Abstract. The problem of word sense disambiguation in lexical resources

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

INTERCONNECTING AND MANAGING MULTILINGUAL LEXICAL LINKED DATA. Ernesto William De Luca

INTERCONNECTING AND MANAGING MULTILINGUAL LEXICAL LINKED DATA. Ernesto William De Luca INTERCONNECTING AND MANAGING MULTILINGUAL LEXICAL LINKED DATA Ernesto William De Luca Overview 2 Motivation EuroWordNet RDF/OWL EuroWordNet RDF/OWL LexiRes Tool Conclusions Overview 3 Motivation EuroWordNet

More information

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network Roberto Navigli, Simone Paolo Ponzetto What is BabelNet a very large, wide-coverage multilingual

More information

Web People Search with Domain Ranking

Web People Search with Domain Ranking Web People Search with Domain Ranking Zornitsa Kozareva 1, Rumen Moraliyski 2, and Gaël Dias 2 1 University of Alicante, Campus de San Vicente, Spain zkozareva@dlsi.ua.es 2 University of Beira Interior,

More information

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336

More information

Overview of iclef 2008: search log analysis for Multilingual Image Retrieval

Overview of iclef 2008: search log analysis for Multilingual Image Retrieval Overview of iclef 2008: search log analysis for Multilingual Image Retrieval Julio Gonzalo Paul Clough Jussi Karlgren UNED U. Sheffield SICS Spain United Kingdom Sweden julio@lsi.uned.es p.d.clough@sheffield.ac.uk

More information

Noida institute of engineering and technology,greater noida

Noida institute of engineering and technology,greater noida Impact Of Word Sense Ambiguity For English Language In Web IR Prachi Gupta 1, Dr.AnuragAwasthi 2, RiteshRastogi 3 1,2,3 Department of computer Science and engineering, Noida institute of engineering and

More information

A graph-based method to improve WordNet Domains

A graph-based method to improve WordNet Domains A graph-based method to improve WordNet Domains Aitor González, German Rigau IXA group UPV/EHU, Donostia, Spain agonzalez278@ikasle.ehu.com german.rigau@ehu.com Mauro Castillo UTEM, Santiago de Chile,

More information

DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation

DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation Steve L. Manion University of Canterbury Christchurch, New Zealand steve.manion @pg.canterbury.ac.nz Raazesh Sainudiin University

More information

CACAO PROJECT AT THE 2009 TASK

CACAO PROJECT AT THE 2009 TASK CACAO PROJECT AT THE TEL@CLEF 2009 TASK Alessio Bosca, Luca Dini Celi s.r.l. - 10131 Torino - C. Moncalieri, 21 alessio.bosca, dini@celi.it Abstract This paper presents the participation of the CACAO prototype

More information

WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia

WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia Dong Nguyen, Arnold Overwijk, Claudia Hauff, Dolf R.B. Trieschnigg, Djoerd Hiemstra, and Franciska M.G. de

More information

Disambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity

Disambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity Disambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity Tomáš Kramár, Michal Barla and Mária Bieliková Faculty of Informatics and Information Technology Slovak University

More information

Putting ontologies to work in NLP

Putting ontologies to work in NLP Putting ontologies to work in NLP The lemon model and its future John P. McCrae National University of Ireland, Galway Introduction In natural language processing we are doing three main things Understanding

More information

University of Santiago de Compostela at CLEF-IP09

University of Santiago de Compostela at CLEF-IP09 University of Santiago de Compostela at CLEF-IP9 José Carlos Toucedo, David E. Losada Grupo de Sistemas Inteligentes Dept. Electrónica y Computación Universidad de Santiago de Compostela, Spain {josecarlos.toucedo,david.losada}@usc.es

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Multilingual Information Retrieval

Multilingual Information Retrieval Proposal for Tutorial on Multilingual Information Retrieval Proposed by Arjun Atreya V Shehzaad Dhuliawala ShivaKarthik S Swapnil Chaudhari Under the direction of Prof. Pushpak Bhattacharyya Department

More information

Integrating Spanish Linguistic Resources in a Web Site Assistant

Integrating Spanish Linguistic Resources in a Web Site Assistant Integrating Spanish Linguistic Resources in a Web Site Assistant Paloma Martínez*, Ana García-Serrano, Alberto Ruiz-Cristina * Universidad Carlos III de Madrid Avd. Universidad 30, 28911 Leganés, Madrid,

More information

Measuring conceptual distance using WordNet: the design of a metric for measuring semantic similarity

Measuring conceptual distance using WordNet: the design of a metric for measuring semantic similarity Measuring conceptual distance using WordNet: the design of a metric for measuring semantic similarity Item Type text; Article Authors Lewis, William D. Publisher University of Arizona Linguistics Circle

More information

Evaluating a Conceptual Indexing Method by Utilizing WordNet

Evaluating a Conceptual Indexing Method by Utilizing WordNet Evaluating a Conceptual Indexing Method by Utilizing WordNet Mustapha Baziz, Mohand Boughanem, Nathalie Aussenac-Gilles IRIT/SIG Campus Univ. Toulouse III 118 Route de Narbonne F-31062 Toulouse Cedex 4

More information

A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP

A fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP A fully-automatic approach to answer geographic queries: at GikiP Johannes Leveling Sven Hartrumpf Intelligent Information and Communication Systems (IICS) University of Hagen (FernUniversität in Hagen)

More information

Serbian Wordnet for biomedical sciences

Serbian Wordnet for biomedical sciences Serbian Wordnet for biomedical sciences Sanja Antonic University library Svetozar Markovic University of Belgrade, Serbia antonic@unilib.bg.ac.yu Cvetana Krstev Faculty of Philology, University of Belgrade,

More information

Graph-based Entity Linking using Shortest Path

Graph-based Entity Linking using Shortest Path Graph-based Entity Linking using Shortest Path Yongsun Shim 1, Sungkwon Yang 1, Hyunwhan Joe 1, Hong-Gee Kim 1 1 Biomedical Knowledge Engineering Laboratory, Seoul National University, Seoul, Korea {yongsun0926,

More information

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks University of Amsterdam at INEX 2010: Ad hoc and Book Tracks Jaap Kamps 1,2 and Marijn Koolen 1 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Faculty of Science,

More information

Word Sense Disambiguation for Cross-Language Information Retrieval

Word Sense Disambiguation for Cross-Language Information Retrieval Word Sense Disambiguation for Cross-Language Information Retrieval Mary Xiaoyong Liu, Ted Diamond, and Anne R. Diekema School of Information Studies Syracuse University Syracuse, NY 13244 xliu03@mailbox.syr.edu

More information

Multilingual Image Search from a user s perspective

Multilingual Image Search from a user s perspective Multilingual Image Search from a user s perspective Julio Gonzalo, Paul Clough, Jussi Karlgren QUAERO-Image CLEF workshop, 16/09/08 Finding is a matter of two fast stupid smart slow great potential for

More information

WordNet-based User Profiles for Semantic Personalization

WordNet-based User Profiles for Semantic Personalization PIA 2005 Workshop on New Technologies for Personalized Information Access WordNet-based User Profiles for Semantic Personalization Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Ignazio Palmisano LACAM

More information

Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries

Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Reza Taghizadeh Hemayati 1, Weiyi Meng 1, Clement Yu 2 1 Department of Computer Science, Binghamton university,

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

Antonio Fernández Orquín, Andrés Montoyo, Rafael Muñoz

Antonio Fernández Orquín, Andrés Montoyo, Rafael Muñoz UMCC_DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation Yoan Gutiérrez, Yenier Castañeda, Andy González,

More information

Identifying Poorly-Defined Concepts in WordNet with Graph Metrics

Identifying Poorly-Defined Concepts in WordNet with Graph Metrics Identifying Poorly-Defined Concepts in WordNet with Graph Metrics John P. McCrae and Narumol Prangnawarat Insight Centre for Data Analytics, National University of Ireland, Galway john@mccr.ae, narumol.prangnawarat@insight-centre.org

More information

Research Article. August 2017

Research Article. August 2017 International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-8) a Research Article August 2017 English-Marathi Cross Language Information Retrieval

More information

Using External Knowledge to Solve Multi-Dimensional Queries. FALQUET, Gilles, RADHOUANI, Saïd

Using External Knowledge to Solve Multi-Dimensional Queries. FALQUET, Gilles, RADHOUANI, Saïd Proceedings Chapter Using External Knowledge to Solve Multi-Dimensional Queries FALQUET, Gilles, RADHOUANI, Saïd Reference FALQUET, Gilles, RADHOUANI, Saïd. Using External Knowledge to Solve Multi-Dimensional

More information

Are Users Willing To Search CL?

Are Users Willing To Search CL? UNED@iCLEF: Are Users Willing To Search CL? Cross Language Evaluation Forum 2006 Javier Artiles, Fernando López Ostenero, Julio Gonzalo, Víctor Peinado NLP & IR Group Dept. de Lenguajes y Sistemas Informáticos

More information

Abstract. Cross-lingual information retrieval, query translation, word sense disambiguation, Wikipedia, comparable corpus

Abstract. Cross-lingual information retrieval, query translation, word sense disambiguation, Wikipedia, comparable corpus WikiTranslate: Query Translation for Cross-lingual Information Retrieval using only Wikipedia D. Nguyen, A.Overwijk, C.Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong Twente University dong.p.ng@gmail.com,

More information

Towards the Automatic Creation of a Wordnet from a Term-based Lexical Network

Towards the Automatic Creation of a Wordnet from a Term-based Lexical Network Towards the Automatic Creation of a Wordnet from a Term-based Lexical Network Hugo Gonçalo Oliveira, Paulo Gomes (hroliv,pgomes)@dei.uc.pt Cognitive & Media Systems Group CISUC, University of Coimbra Uppsala,

More information

Patent Terminlogy Analysis: Passage Retrieval Experiments for the Intellecutal Property Track at CLEF

Patent Terminlogy Analysis: Passage Retrieval Experiments for the Intellecutal Property Track at CLEF Patent Terminlogy Analysis: Passage Retrieval Experiments for the Intellecutal Property Track at CLEF Julia Jürgens, Sebastian Kastner, Christa Womser-Hacker, and Thomas Mandl University of Hildesheim,

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

Enhanced retrieval using semantic technologies:

Enhanced retrieval using semantic technologies: Enhanced retrieval using semantic technologies: Ontology based retrieval as a new search paradigm? - Considerations based on new projects at the Bavarian State Library Dr. Berthold Gillitzer 28. Mai 2008

More information

Web Query Translation with Representative Synonyms in Cross Language Information Retrieval

Web Query Translation with Representative Synonyms in Cross Language Information Retrieval Web Query Translation with Representative Synonyms in Cross Language Information Retrieval August 25, 2005 Bo-Young Kang, Qing Li, Yun Jin, Sung Hyon Myaeng Information Retrieval and Natural Language Processing

More information

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es

More information

The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage

The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage Eneko Agirre 1, Paul Clough 2, Samuel Fernando 2, Mark Hall 2, Arantxa Otegi 1,

More information

Testing Concept Indexing in Crosslingual Medical Text Classification

Testing Concept Indexing in Crosslingual Medical Text Classification Testing Concept Indexing in Crosslingual Medical Text Classification Francisco Carrero, Jose Carlos Cortizo Universidad Europea de Madrid {francisco.carrero, josecarlos.cortizo}@uem.es Jose Maria Gomez

More information

A proposal for improving WordNet Domains

A proposal for improving WordNet Domains A proposal for improving WordNet Domains Aitor González-Agirre, Mauro Castillo, German Rigau IXA group UPV/EHU, Donostia Spain, Donostia Spain agonzalez278@ikasle.ehu.com, german.rigau@ehu.com UTEM, Santiago

More information

Using an Image-Text Parallel Corpus and the Web for Query Expansion in Cross-Language Image Retrieval

Using an Image-Text Parallel Corpus and the Web for Query Expansion in Cross-Language Image Retrieval Using an Image-Text Parallel Corpus and the Web for Query Expansion in Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen * Department of Computer Science and Information Engineering National

More information

KNOW At The Social Book Search Lab 2016 Suggestion Track

KNOW At The Social Book Search Lab 2016 Suggestion Track KNOW At The Social Book Search Lab 2016 Suggestion Track Hermann Ziak and Roman Kern Know-Center GmbH Inffeldgasse 13 8010 Graz, Austria hziak, rkern@know-center.at Abstract. Within this work represents

More information

Query-based Text Normalization Selection Models for Enhanced Retrieval Accuracy

Query-based Text Normalization Selection Models for Enhanced Retrieval Accuracy Query-based Text Normalization Selection Models for Enhanced Retrieval Accuracy Si-Chi Chin Rhonda DeCook W. Nick Street David Eichmann The University of Iowa Iowa City, USA. {si-chi-chin, rhonda-decook,

More information

International Journal of Advance Engineering and Research Development SENSE BASED INDEXING OF HIDDEN WEB USING ONTOLOGY

International Journal of Advance Engineering and Research Development SENSE BASED INDEXING OF HIDDEN WEB USING ONTOLOGY Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 04, April -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 SENSE

More information

LSI UNED at M-WePNaD: Embeddings for Person Name Disambiguation

LSI UNED at M-WePNaD: Embeddings for Person Name Disambiguation LSI UNED at M-WePNaD: Embeddings for Person Name Disambiguation Andres Duque, Lourdes Araujo, and Juan Martinez-Romo Dpto. Lenguajes y Sistemas Informáticos Universidad Nacional de Educación a Distancia

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

1.

1. * 390/0/2 : 389/07/20 : 2 25-8223 ( ) 2 25-823 ( ) ISC SCOPUS L ISA http://jist.irandoc.ac.ir 390 22-97 - :. aminnezarat@gmail.com mosavit@pnu.ac.ir : ( ).... 00.. : 390... " ". ( )...2 2. 3. 4 Google..

More information

Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization

Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization Jakub Kanis, Luděk Müller University of West Bohemia, Department of Cybernetics, Univerzitní 8, 306 14 Plzeň, Czech Republic {jkanis,muller}@kky.zcu.cz

More information

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm Rekha Jain 1, Sulochana Nathawat 2, Dr. G.N. Purohit 3 1 Department of Computer Science, Banasthali University, Jaipur, Rajasthan ABSTRACT

More information

CS674 Natural Language Processing

CS674 Natural Language Processing CS674 Natural Language Processing A question What was the name of the enchanter played by John Cleese in the movie Monty Python and the Holy Grail? Question Answering Eric Breck Cornell University Slides

More information

Personalized Terms Derivative

Personalized Terms Derivative 2016 International Conference on Information Technology Personalized Terms Derivative Semi-Supervised Word Root Finder Nitin Kumar Bangalore, India jhanit@gmail.com Abhishek Pradhan Bangalore, India abhishek.pradhan2008@gmail.com

More information

Word Disambiguation in Web Search

Word Disambiguation in Web Search Word Disambiguation in Web Search Rekha Jain Computer Science, Banasthali University, Rajasthan, India Email: rekha_leo2003@rediffmail.com G.N. Purohit Computer Science, Banasthali University, Rajasthan,

More information

An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments

An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments Hui Fang ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign Abstract In this paper, we report

More information

Semantic Annotation of Web Resources Using IdentityRank and Wikipedia

Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Norberto Fernández, José M.Blázquez, Luis Sánchez, and Vicente Luque Telematic Engineering Department. Carlos III University of Madrid

More information

HITS and Misses: Combining BM25 with HITS for Expert Search

HITS and Misses: Combining BM25 with HITS for Expert Search HITS and Misses: Combining BM25 with HITS for Expert Search Johannes Leveling and Gareth J. F. Jones School of Computing and Centre for Next Generation Localisation (CNGL) Dublin City University Dublin

More information

CROSS LANGUAGE INFORMATION ACCESS IN TELUGU

CROSS LANGUAGE INFORMATION ACCESS IN TELUGU CROSS LANGUAGE INFORMATION ACCESS IN TELUGU by Vasudeva Varma, Aditya Mogadala Mogadala, V. Srikanth Reddy, Ram Bhupal Reddy in Siliconandhrconference (Global Internet forum for Telugu) Report No: IIIT/TR/2011/-1

More information

A Comprehensive Analysis of using Semantic Information in Text Categorization

A Comprehensive Analysis of using Semantic Information in Text Categorization A Comprehensive Analysis of using Semantic Information in Text Categorization Kerem Çelik Department of Computer Engineering Boğaziçi University Istanbul, Turkey celikerem@gmail.com Tunga Güngör Department

More information

University of Alicante at NTCIR-9 GeoTime

University of Alicante at NTCIR-9 GeoTime University of Alicante at NTCIR-9 GeoTime Fernando S. Peregrino fsperegrino@dlsi.ua.es David Tomás dtomas@dlsi.ua.es Department of Software and Computing Systems University of Alicante Carretera San Vicente

More information

Ontology Based Search Engine

Ontology Based Search Engine Ontology Based Search Engine K.Suriya Prakash / P.Saravana kumar Lecturer / HOD / Assistant Professor Hindustan Institute of Engineering Technology Polytechnic College, Padappai, Chennai, TamilNadu, India

More information

Building Multilingual Resources and Neural Models for Word Sense Disambiguation. Alessandro Raganato March 15th, 2018

Building Multilingual Resources and Neural Models for Word Sense Disambiguation. Alessandro Raganato March 15th, 2018 Building Multilingual Resources and Neural Models for Word Sense Disambiguation Alessandro Raganato March 15th, 2018 About me alessandro.raganato@helsinki.fi http://wwwusers.di.uniroma1.it/~raganato ERC

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy

Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy Martin Rajman, Pierre Andrews, María del Mar Pérez Almenta, and Florian Seydoux Artificial Intelligence

More information

SINAI at CLEF ehealth 2017 Task 3

SINAI at CLEF ehealth 2017 Task 3 SINAI at CLEF ehealth 2017 Task 3 Manuel Carlos Díaz-Galiano, M. Teresa Martín-Valdivia, Salud María Jiménez-Zafra, Alberto Andreu, and L. Alfonso Ureña López Department of Computer Science, Universidad

More information

Department of Electronic Engineering FINAL YEAR PROJECT REPORT

Department of Electronic Engineering FINAL YEAR PROJECT REPORT Department of Electronic Engineering FINAL YEAR PROJECT REPORT BEngCE-2007/08-HCS-HCS-03-BECE Natural Language Understanding for Query in Web Search 1 Student Name: Sit Wing Sum Student ID: Supervisor:

More information

Question Answering Approach Using a WordNet-based Answer Type Taxonomy

Question Answering Approach Using a WordNet-based Answer Type Taxonomy Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering

More information

Utilizing Semantic Equivalence Classes of Japanese Functional Expressions in Machine Translation

Utilizing Semantic Equivalence Classes of Japanese Functional Expressions in Machine Translation Utilizing Semantic Equivalence Classes of Japanese Functional Expressions in Machine Translation Akiko Sakamoto Takehito Utsuro University of Tsukuba Tsukuba, Ibaraki, 305-8573, JAPAN Suguru Matsuyoshi

More information

WebMining: An unsupervised parallel corpora web retrieval system

WebMining: An unsupervised parallel corpora web retrieval system WebMining: An unsupervised parallel corpora web retrieval system Jesús Tomás Instituto Tecnológico de Informática Universidad Politécnica de Valencia jtomas@upv.es Jaime Lloret Dpto. de Comunicaciones

More information

Using Ontology Dimensions and Negative Expansion to solve Precise Queries in CLEF Medical Task

Using Ontology Dimensions and Negative Expansion to solve Precise Queries in CLEF Medical Task Using Ontology Dimensions and Negative Expansion to solve Precise Queries in CLEF Medical Task Jean-Pierre Chevallet IPAL-CNRS Institute for Infocomm Research 21 Heng Mui Keng Terrace Singapore 119613

More information

Automatically Generating Queries for Prior Art Search

Automatically Generating Queries for Prior Art Search Automatically Generating Queries for Prior Art Search Erik Graf, Leif Azzopardi, Keith van Rijsbergen University of Glasgow {graf,leif,keith}@dcs.gla.ac.uk Abstract This report outlines our participation

More information

Tag Semantics for the Retrieval of XML Documents

Tag Semantics for the Retrieval of XML Documents Tag Semantics for the Retrieval of XML Documents Davide Buscaldi 1, Giovanna Guerrini 2, Marco Mesiti 3, Paolo Rosso 4 1 Dip. di Informatica e Scienze dell Informazione, Università di Genova, Italy buscaldi@disi.unige.it,

More information

EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH

EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH Andreas Walter FZI Forschungszentrum Informatik, Haid-und-Neu-Straße 10-14, 76131 Karlsruhe, Germany, awalter@fzi.de

More information

Introducing the Arabic WordNet Project

Introducing the Arabic WordNet Project Introducing the Arabic WordNet Project William BLACK, Sabri ELKATEB, School of Informatics University of Manchester Sackville Street, Manchester, M60 1QD, w.black@manchester.ac.uk, sabrikom@hotmail.com

More information

Semantics Isn t Easy Thoughts on the Way Forward

Semantics Isn t Easy Thoughts on the Way Forward Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University

More information

Random Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li

Random Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li Random Walks for Knowledge-Based Word Sense Disambiguation Qiuyu Li Word Sense Disambiguation 1 Supervised - using labeled training sets (features and proper sense label) 2 Unsupervised - only use unlabeled

More information

R 2 D 2 at NTCIR-4 Web Retrieval Task

R 2 D 2 at NTCIR-4 Web Retrieval Task R 2 D 2 at NTCIR-4 Web Retrieval Task Teruhito Kanazawa KYA group Corporation 5 29 7 Koishikawa, Bunkyo-ku, Tokyo 112 0002, Japan tkana@kyagroup.com Tomonari Masada University of Tokyo 7 3 1 Hongo, Bunkyo-ku,

More information

A Document Graph Based Query Focused Multi- Document Summarizer

A Document Graph Based Query Focused Multi- Document Summarizer A Document Graph Based Query Focused Multi- Document Summarizer By Sibabrata Paladhi and Dr. Sivaji Bandyopadhyay Department of Computer Science and Engineering Jadavpur University Jadavpur, Kolkata India

More information

in the NTU Multilingual Corpus (NTU-MC) January 15, 2016

in the NTU Multilingual Corpus (NTU-MC) January 15, 2016 . Sentiment Annotation in the NTU Multilingual Corpus (NTU-MC). 2 nd Wordnet Bahasa Workshop (WBW2016) Francis Bond, Tomoko Ohkuma, Luis Morgado Da Costa, Yasuhide Miura, Rachel Chen, Takayuki Kuribayashi,

More information

Influence of Word Normalization on Text Classification

Influence of Word Normalization on Text Classification Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we

More information

Using DEB Services for Knowledge Representation within the KYOTO Project

Using DEB Services for Knowledge Representation within the KYOTO Project Using DEB Services for Knowledge Representation within the KYOTO Project Aleš Horák and Adam Rambousek Faculty of Informatics, Masaryk University Botanická 68a, 602 00 Brno, Czech Republic {hales,xrambous}@fi.muni.cz

More information

Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System

Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System Eun Ji Kim and Mun Yong Yi (&) Department of Knowledge Service Engineering, KAIST, Daejeon,

More information

GoNTogle: A Tool for Semantic Annotation and Search

GoNTogle: A Tool for Semantic Annotation and Search GoNTogle: A Tool for Semantic Annotation and Search Giorgos Giannopoulos 1,2, Nikos Bikakis 1,2, Theodore Dalamagas 2, and Timos Sellis 1,2 1 KDBSL Lab, School of ECE, Nat. Tech. Univ. of Athens, Greece

More information

Evaluation of Synset Assignment to Bi-lingual Dictionary

Evaluation of Synset Assignment to Bi-lingual Dictionary Evaluation of Synset Assignment to Bi-lingual Dictionary Thatsanee Charoenporn 1, Virach Sornlertlamvanich 1, Chumpol Mokarat 1, Hitoshi Isahara 2, Hammam Riza 3, and Purev Jaimai 4 1 Thai Computational

More information

Open Source Dutch WordNet

Open Source Dutch WordNet Open Source Dutch WordNet Marten Postma Piek Vossen Release 1.0 Date December 1, 2014 m.c.postma@vu.nl p.t.j.m.vossen@vu.nl License CC BY-SA 4.0 Website http://wordpress.let.vupr.nl/odwn/ This project

More information

Automatic Word Sense Disambiguation Using Wikipedia

Automatic Word Sense Disambiguation Using Wikipedia Automatic Word Sense Disambiguation Using Wikipedia Sivakumar J *, Anthoniraj A ** School of Computing Science and Engineering, VIT University Vellore-632014, TamilNadu, India * jpsivas@gmail.com ** aanthoniraja@gmail.com

More information

Annotated Suffix Trees for Text Clustering

Annotated Suffix Trees for Text Clustering Annotated Suffix Trees for Text Clustering Ekaterina Chernyak and Dmitry Ilvovsky National Research University Higher School of Economics Moscow, Russia echernyak,dilvovsky@hse.ru Abstract. In this paper

More information

Linking Entities in Chinese Queries to Knowledge Graph

Linking Entities in Chinese Queries to Knowledge Graph Linking Entities in Chinese Queries to Knowledge Graph Jun Li 1, Jinxian Pan 2, Chen Ye 1, Yong Huang 1, Danlu Wen 1, and Zhichun Wang 1(B) 1 Beijing Normal University, Beijing, China zcwang@bnu.edu.cn

More information