The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval

Size: px
Start display at page:

Download "The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval"

Transcription

1 University of Massachusetts Amherst Amherst Computer Science Department Faculty Publication Series Computer Science 1997 The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval Mirna Adriani Universitas Indonesia W. Bruce Croft University of Massachusetts - Amherst Follow this and additional works at: Part of the Computer Sciences Commons Recommended Citation Adriani, Mirna and Croft, W. Bruce, "The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval" (1997). Computer Science Department Faculty Publication Series Retrieved from This Article is brought to you for free and open access by the Computer Science at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Computer Science Department Faculty Publication Series by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact scholarworks@library.umass.edu.

2 The Effectiveness of a Dictionary-Based Technique for Indonesian-English Cross-Language Text Retrieval Mirna Adriani Fakultas Ilmu Komputer Universitas Indonesia Depok Indonesia mirna@cs.ui.ac.id W. Bruce Croft Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts, Amherst Amherst, MA , USA croft@cs.umass.edu May 1997 Abstract We evaluate the effectiveness of a dictionary-based cross-language text retrieval technique which uses a two-way dictionary for translating queries from their original language into the language of the text documents. As can be expected, the translated queries are not as effective as queries formulated by the users using the same language as the text documents. We then apply a local-feedback technique to expand the translated queries in order to improve their retrieval effectiveness. Our empirical results show that the technique is effective for English-Indonesian and Indonesian-English cross-language retrieval. 1. Introduction Internet which has connected computer users all over the world has become a major medium for textual information exchange involving users located in various parts of the world. As a result, we see a dramatic increase in the availability and accessibility of text documents in various languages. It opens the possibility for a user to conduct research using published materials or references written in other languages. For instance, a traveller can obtain information about countries that he/she wants to visit. The problem is that, as with the Internet in general, finding or selecting relevant texts among all that is available online is difficult, let alone if the texts are written in foreign languages. Since, obviously, not many users are fluent in foreign languages to be able find foreign text documents relevant to their information needs, an information retrieval system that takes a query written in the user s native language and retrieves documents written in another language would be very helpful. Such a system is called a cross-language information retrieval (CLIR) system. This situation has motivated the recent increasing interest in CLIR which is focused toward overcoming the above language barriers.

3 Dictionary-based CLIR techniques are retrieval techniques which involve translations of textual data from one language to the other using a dictionary. There are two main strategies in dictionary-based CLIR, first, by translating the original documents into the language of the queries, and second, by translating the queries into the language of the documents. In this study, we employ the second strategy, i.e., the query translation strategy. This strategy is more efficient than the first strategy, i.e., the document translation strategy, in that it does not require the expensive overhead cost of translating all documents, especially when new documents are added frequently and not all of the documents are of interest to the users. We evaluate this strategy on English-Indonesian and Indonesian-English cross-language text retrievals. Our preliminary results, as can be expected, showed that the retrieval effectiveness of the translated queries is lower than that of queries written in the same language as the documents (mono-language retrieval). To improve the queries recallprecision performance, we apply a query expansion techniques which uses local-feedback technique [Attar & Fraenkel, 1977]. A query is expanded by adding potentially relevant keywords to it in order to make it more specific. 2. Other CLIR Research One of the earlier work in CLIR is by Salton [Salton, 1970]. In his study, the effectiveness of English and German queries is compared with that of queries obtained using a bilingual thesaurus for retrieving documents in both languages. Much work in CLIR has been done by researchers recently. One of the well-know methods in CLIR is the parallel corpus method. This method requires the availability of documents in both languages, hence the name parallel corpus, which may be obtained using manual or machine translation, or by using comparable documents of the same topic. Landauer and Littman [Landauer & Littman, 1990] apply Cross-Language Latent Semantic Indexing (CL-LSI) to parallel document collections. This technique transforms the correlation among words into a compact vector representation. In this technique, a query (or a document) does not need to be translated into the other language, instead its vector representation is simply compared with the documents vectors. A study using parallel corpus has also been conducted by Sheridan and Ballerini [Sheridan & Ballerini, 1996]. They showed that adding terms obtained from thesaurus to the query results in better retrieval performance for multilingual documents. 2

4 Another well-known method, besides the parallel corpus method, is the dictionary-based method. There have been many studies which propose dictionary-based CLIR techniques, particularly those that employ the query translation strategy. The effectiveness of translating a query word-by-word based by looking up in a dictionary has been studied by Hull and Grefenstette [Hull & Grefenstette, 1996]. They identified the cause of poor performance of the translated queries such as word ambiguities and the difficulty in recognising phrases. Ballesteros and Croft [Ballesteros & Croft, 1996] proposes a technique to improve the retrieval effectiveness of the translated query by applying a query expansion technique based on the local feedback technique. They demonstrated that adding terms obtained using the local feedback to the query at pre-translation, at post-translation and at pretranslation and post-translation combined, results in better retrieval performance than the base retrieval performance of the query without expansion. Davis and Ogden [Davis & Ogden, 1997] combined the parallel corpus method and the dictionary-based method. They used a bilingual dictionary to translate the English queries to Spanish for retrieving parallel documents. To reduce ambiguity, they used a statistically-based parts-of-speech tagger to identify the syntactic roles of the terms in the query before translating them. The retrieval is done by comparing the vector representation of the query and the vector representations of the parallel documents. 3. Indonesian-English CLIR study In this study, we apply a similar technique as the one used by Ballesteros and Croft [Ballesteros & Croft, 1996] for Indonesian-English query. The technique uses the local feedback to expand the queries. The local feedback technique [Attar & Fraenkel, 1977] extracts terms from a number of top documents retrieved by the original query based on a presumption that those documents are relevant. The terms are then added to the query resulting in an expanded query. The query expansion can be performed before, after, or both before and after the query translation. In this study we compare all of the three schemes. We construct the Indonesian queries based on TREC's Spanish query topics SP26-45 [Harman, 1995] by translating the queries to Indonesian and modifying them to make them relevant with Indonesia's national affairs. The English queries are chosen from TREC topics To obtain the terms for the pre-translation query expansion, the local feedback process is performed on the Associated Press collection from TREC (78,321 articles) and on Tempo collections (5,601 articles from Tempo, Indonesian weekly magazine) for, respectively, English-Indonesian and Indonesian-English retrievals. The retrieval performance evaluation and the post-translation query expansion are conducted on the 2GB TREC (vol. 2) English collection and the 68 MB newspaper article collection 3

5 from Kompas (an Indonesian daily newspaper), respectively, for Indonesian-English and English-Indonesian retrievals. In the first step of the query translation process, we delete stop phrases, stop words, and translate plural words into their singular forms. Then each word in the queries is replaced with the first definition listed in a two-way Indonesian-English dictionary [Echols & Shaddily, 1992]. Words which do not have any translation are kept in their original form in the query. Our experiment is conducted using INQUERY information retrieval system. INQUERY is a probabilistic information retrieval system developed at the University of Massachusetts [Callan et.al, 1992]. In order to handle the Indonesian texts, we added an Indonesian word-stemmer module into INQUERY. 3.1 Translation Process The first step in this study is to do the translation for each word in the queries. Each word is replaced by the first definition found in the dictionary. Note that a dictionary entry may contain a number of terms that do not have the same meaning as the original word such as shown in Table 1. English word culture Translated Indonesian word from the dictionary kesopanan, kebudayaan control pengawasan, pengawas, penilikan, pengaturan, penguasaan, pembatasan Table 1. The Examples of some translated words The translated queries for both Indonesian to English and English to Indonesian cases result in lower retrieval effectiveness as compared with that of the original language's queries. The terms from the dictionary that are not related to the original query term might have caused this poor performance. The results (Table 2.) show that the performance of both query translations is below that of the monolingual retrieval. The English to Indonesian query translation shows 33.2% performance drop compared to the original queries. Meanwhile, the Indonesian to English query translation shows greater performance decrease, that is a drop of 62.1%. The cause of such a big drop may be due to the fact that English vocabulary is much larger, consisting of more specific concept words, than Indonesian vocabulary. As a result, 4

6 mapping words from Indonesian to English is more difficult or less accurate than from English to Indonesia. Moreover, the translation words might not relate to the original query. Monolingual Query Translated Query English (-62.1) Indonesian (-33.2) Table 2. The average recall-precision from the original queries and the translated queries. In order to improve the performance of the translated query, we modify the query using the local feedback technique. The modification is done before and after the query translation to add more terms. 3.2 Query Expansion Pre-Translation The query expansion that is done before the translation process add more terms to the Indonesian and English queries (pre-translation feedback). Furthermore, the new queries are translated word by word to the other language. Table 3. and 4. show some examples of the resulting queries in the pre-translation query expansions for Indonesian and English queries. English word English query Translation of English query English query & pre-translation feedback Translation of the English query & pre-translation feedback Translated Indonesian word from the dictionary heritage culture indonesia [warisan, pusaka] [kesopanan, kebudayaan] indonesia heritage culture indonesia [indonesian jakarta sumatra jaya archipelago] warisan, pusaka kesopanan, kebudayaan indonesia [orang Indonesia jakarta sumatra jaya nusantara] Table 3. The Examples of the translated queries using pre-translation feedback for English queries 5

7 Indonesian word Indonesian query Translation of Indonesian query Indonesian query & pretranslation feed-back Translation of the Indonesian query & pre-translation feedback Translated English word from the dictionary tumpahan minyak buruk [something spilled] [oil, grease, fat] [old, worn out, dilapidated] tumpahan minyak buruk translation [menteri barel opec ekonomi kurs] [something spilled] [oil, grease, fat] [old, worn out, dilapidated] [cabinet minister] barel opec [economics] [a rate of exchange] Table 4. The Examples of the translated query using pre-translation feedback for Indonesian queries The results show that the pre-translation feedback for Indonesian to English query translation does not add useful terms to the queries so that the performance is 17-21% lower than the base result (Table 5). The reason is that the Associated Press collection that we use for the training set does not contain enough documents concerning Indonesia. The terms that we get from the local feedback for most of the queries come from the same small set of documents. No. of terms Relevant docs Avg. Precision (-17.8) (-21.8) (-19.5) (-17.3) Table 5. The average precision of pre-translation feedback result for English to Indonesian translated queries On the other hand, the Indonesian to English query translation shows better result (Table 6). The feedback terms help to increase the retrieval performance by 10-15% as compared to the base result. 6

8 No. of terms Relevant docs Avg. Precision (10.4) (10.4) (15.2) Table 6. The average precision of pre-translation feedback result for the Indonesian to English translated queries 3.3 Query Expansion Post-Translation Feedback The query expansion that is done after the translation showed greater improvement than the base query. The effect of adding terms through the local feedback is positive because the documents that are used for the feedback is the same as the documents used for the evaluation (Table 7). English word English query Translation of English query English query & posttranslation feedback Translation of the English query & post-translation feedback Translated Indonesian word from the dictionary potensial weakness Indonesian navy [kesanggupan, tenaga] [kelemahan, kekurangan] [orang Indonesia] [Angkatan Laut] [kesanggupan, tenaga] [kelemahan, kekurangan] [orang Indonesia] [Angkatan Laut] [indonesia laut kompas organisasi negeri luas keamanan] [kesanggupan, tenaga] [kelemahan, kekurangan] [orang Indonesia] [Angkatan Laut] [indonesia laut kompas organisasi negeri luas keamanan] Table 7. The example of the English to Indonesian translated query using post-translation feedback. The retrieval performance of the English to Indonesian query translation as shown in Table 9. improves by 10-15%. Likewise, the retrieval performance of Indonesian to English query translation improves by 46-68%. The better result in the last experiment is due to that more relevant words from a larger set of relevant documents being added to the queries. 7

9 No. of terms Relevant docs Avg. Precision (22.6) (23.6) (23.1) (26.9) Table 8. The average precision using post-translation feedback for English to Indonesian translated queries. No. of terms Relevant docs Avg. Precision (68.9) (66.5) (61.8) (46.4) Table 9. The average precision using post-translation feedback for Indonesian to English translated queries. 3.4 Combining Pre- and Post-Translation Feedback We also studied the possibility of adding terms before and after the translation were conducted. At first, we applied expanded the queries before the translation (pre-translation feedback). Then we applied the feedback technique again after the expanding queries were translated (post-translation feedback). No. of terms Relevant docs Avg. Precision (-42.1) (-46.1) (-35.3) (-40.8) Table 10. The average precision of the combined feedback result for English to Indonesian translated queries. The result of the combined method for the English-Indonesian queries is worse that that of the base queries as shown in Table 10. The added terms from the pre-translation feedback 8

10 which are not relevant to the original query got even bad terms after being expand on post-translation feedback. No. of terms Relevant docs Avg. Precision (12.8) (17.0) (47.9) (56.3) (57.3) Table 11. The average precision of the combined feedback result for Indonesian to English translated queries. On the other hand, we got a positive improvement on the Indonesian-English queries. The words from the combined feedback produced more than 50% increased in average precision. 4. Conclusion Our study shows that translating the query using dictionary can perform well in CLIR. The ambiguity on the translated query can be improved by adding words using local feedback query expansion strategy. From the result of the query expansion that is done before the translation, we learn that the characteristic of the training set such as the period of the articles and its domain is important in getting good terms. The query expansion that is done after the translation shows a greater performance from the base queries. The number of relevant files on the English document collections that is bigger than the number of relevant judgement files in Indonesian document collection has increased the retrieval performance. The combined feedback also improves the performance of the translated queries even though as not good as post-translation feedback. Our future research includes a study on the retrieval performance of translating terms in phrases found in the query instead of a term-by-term translation as in the current study. 9

11 Acknowledgements We would like to thank Hendarto, Andrey Andoko, Hardanto from Kompas ( and Saiful Basri from Tempo who provide the Indonesian text collections. We would also like to thank Prof. John U. Wolff who provides the Indonesian-English electronic dictionary on this study. This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC Any opinions, findings and conclusions or recommendations expressed in this material are the author(s) and do not necessarily reflect those of the sponsor. References Attar, R. and A. S. Fraenkel. Local Feedback in Full-Text Retrieval Systems. Journal of the Association for Computing Machinery, 24: , Ballesteros, Lisa and W. Bruce Croft. Dictionary Methods for Cross-Lingual Information Retrieval. Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications, , Callan, J. P., W. Bruce Croft, S. M. Harding. The Inquery Retrieval System. Third International Conference on Database and Expert Systems Applications, Hull, David A. and Gregory Grefenstette. Experiments in Multilingual Information Retrieval. In Proceedings of the 19 th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Davis, Mark W. and William C. Ogden. Implementing Cross-Language Text Retrieval Systems for Large-scale Text Collections and the World Wide Web. AAAI Symposium on Cross-Language Text and Speech Retrieval, Echols, John M. and Hasan Shadily. An Indonesian-English Dictionary. Third ed. Cornell University, Harman, Donna. Overview of the Fourth Text Retrieval Conference (TREC-4). Proceedings of the Fourth Text Retrieval Conference (TREC-4),

12 Salton, Gerard. Automatic Processing of Foreign Language Documents. Journal of the American Society for Information Science, 21 : , Sheridan, P. and J. P. Ballerini. Experiments in Multilingual Information Retrieval using the SPIDER System. Proceedings of the 19 th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland, August

This paper studies methods to enhance cross-language retrieval of domain-specific

This paper studies methods to enhance cross-language retrieval of domain-specific Keith A. Gatlin. Enhancing Cross-Language Retrieval of Comparable Corpora Through Thesaurus-Based Translation and Citation Indexing. A master s paper for the M.S. in I.S. degree. April, 2005. 23 pages.

More information

Prior Art Retrieval Using Various Patent Document Fields Contents

Prior Art Retrieval Using Various Patent Document Fields Contents Prior Art Retrieval Using Various Patent Document Fields Contents Metti Zakaria Wanagiri and Mirna Adriani Fakultas Ilmu Komputer, Universitas Indonesia Depok 16424, Indonesia metti.zakaria@ui.edu, mirna@cs.ui.ac.id

More information

Cross-Language Information Retrieval using Dutch Query Translation

Cross-Language Information Retrieval using Dutch Query Translation Cross-Language Information Retrieval using Dutch Query Translation Anne R. Diekema and Wen-Yuan Hsiao Syracuse University School of Information Studies 4-206 Ctr. for Science and Technology Syracuse, NY

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

Multilingual Web Retrieval: An Experiment in English Chinese Business Intelligence

Multilingual Web Retrieval: An Experiment in English Chinese Business Intelligence Multilingual Web Retrieval: An Experiment in English Chinese Business Intelligence Jialun Qin and Yilu Zhou Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721. E-mail:

More information

Web Query Translation with Representative Synonyms in Cross Language Information Retrieval

Web Query Translation with Representative Synonyms in Cross Language Information Retrieval Web Query Translation with Representative Synonyms in Cross Language Information Retrieval August 25, 2005 Bo-Young Kang, Qing Li, Yun Jin, Sung Hyon Myaeng Information Retrieval and Natural Language Processing

More information

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst

An Evaluation of Information Retrieval Accuracy. with Simulated OCR Output. K. Taghva z, and J. Borsack z. University of Massachusetts, Amherst An Evaluation of Information Retrieval Accuracy with Simulated OCR Output W.B. Croft y, S.M. Harding y, K. Taghva z, and J. Borsack z y Computer Science Department University of Massachusetts, Amherst

More information

ITERATIVE SEARCHING IN AN ONLINE DATABASE. Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ

ITERATIVE SEARCHING IN AN ONLINE DATABASE. Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ - 1 - ITERATIVE SEARCHING IN AN ONLINE DATABASE Susan T. Dumais and Deborah G. Schmitt Cognitive Science Research Group Bellcore Morristown, NJ 07962-1910 ABSTRACT An experiment examined how people use

More information

Automatic Translation in Cross-Lingual Access to Legislative Databases

Automatic Translation in Cross-Lingual Access to Legislative Databases Automatic Translation in Cross-Lingual Access to Legislative Databases Catherine Bounsaythip, Aarno Lehtola, Jarno Tenni VTT Information Technology P. Box 1201, FIN-02044 VTT, Finland Phone: +358 9 456

More information

WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia

WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia Dong Nguyen, Arnold Overwijk, Claudia Hauff, Dolf R.B. Trieschnigg, Djoerd Hiemstra, and Franciska M.G. de

More information

Improving the Effectiveness of Information Retrieval with Local Context Analysis

Improving the Effectiveness of Information Retrieval with Local Context Analysis Improving the Effectiveness of Information Retrieval with Local Context Analysis JINXI XU BBN Technologies and W. BRUCE CROFT University of Massachusetts Amherst Techniques for automatic query expansion

More information

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University

TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University TREC-7 Experiments at the University of Maryland Douglas W. Oard Digital Library Research Group College of Library and Information Services University of Maryland, College Park, MD 20742 oard@glue.umd.edu

More information

A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

More information

Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6

Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Russell Swan James Allan Don Byrd Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts

More information

A Practical Passage-based Approach for Chinese Document Retrieval

A Practical Passage-based Approach for Chinese Document Retrieval A Practical Passage-based Approach for Chinese Document Retrieval Szu-Yuan Chi 1, Chung-Li Hsiao 1, Lee-Feng Chien 1,2 1. Department of Information Management, National Taiwan University 2. Institute of

More information

Dictionary Based Transitive Cross-Language Information Retrieval Using Lexical Triangulation

Dictionary Based Transitive Cross-Language Information Retrieval Using Lexical Triangulation Dictionary Based Transitive Cross-Language Information Retrieval Using Lexical Triangulation A study submitted in partial fulfilment of the requirements for the degree of Master of Science in Information

More information

1.

1. * 390/0/2 : 389/07/20 : 2 25-8223 ( ) 2 25-823 ( ) ISC SCOPUS L ISA http://jist.irandoc.ac.ir 390 22-97 - :. aminnezarat@gmail.com mosavit@pnu.ac.ir : ( ).... 00.. : 390... " ". ( )...2 2. 3. 4 Google..

More information

Multilingual Information Retrieval

Multilingual Information Retrieval Proposal for Tutorial on Multilingual Information Retrieval Proposed by Arjun Atreya V Shehzaad Dhuliawala ShivaKarthik S Swapnil Chaudhari Under the direction of Prof. Pushpak Bhattacharyya Department

More information

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood

TREC-3 Ad Hoc Retrieval and Routing. Experiments using the WIN System. Paul Thompson. Howard Turtle. Bokyung Yang. James Flood TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System Paul Thompson Howard Turtle Bokyung Yang James Flood West Publishing Company Eagan, MN 55123 1 Introduction The WIN retrieval engine

More information

Real-time Query Expansion in Relevance Models

Real-time Query Expansion in Relevance Models Real-time Query Expansion in Relevance Models Victor Lavrenko and James Allan Center for Intellignemt Information Retrieval Department of Computer Science 140 Governor s Drive University of Massachusetts

More information

Abstract. Cross-lingual information retrieval, query translation, word sense disambiguation, Wikipedia, comparable corpus

Abstract. Cross-lingual information retrieval, query translation, word sense disambiguation, Wikipedia, comparable corpus WikiTranslate: Query Translation for Cross-lingual Information Retrieval using only Wikipedia D. Nguyen, A.Overwijk, C.Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong Twente University dong.p.ng@gmail.com,

More information

CLIR Evaluation at TREC

CLIR Evaluation at TREC CLIR Evaluation at TREC Donna Harman National Institute of Standards and Technology Gaithersburg, Maryland http://trec.nist.gov Workshop on Cross-Linguistic Information Retrieval SIGIR 1996 Paper Building

More information

Searching and Organizing Images Across Languages

Searching and Organizing Images Across Languages Searching and Organizing Images Across Languages Paul Clough University of Sheffield Western Bank Sheffield, UK +44 114 222 2664 p.d.clough@sheffield.ac.uk Mark Sanderson University of Sheffield Western

More information

arxiv:cs/ v1 [cs.cl] 2 Nov 2000

arxiv:cs/ v1 [cs.cl] 2 Nov 2000 Applying Machine Translation to Two-Stage Cross-Language Information Retrieval Atsushi Fujii and Tetsuya Ishikawa arxiv:cs/0011003v1 [cs.cl] 2 Nov 2000 University of Library and Information Science 1-2

More information

Noida institute of engineering and technology,greater noida

Noida institute of engineering and technology,greater noida Impact Of Word Sense Ambiguity For English Language In Web IR Prachi Gupta 1, Dr.AnuragAwasthi 2, RiteshRastogi 3 1,2,3 Department of computer Science and engineering, Noida institute of engineering and

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language

that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language Cross-Language Text Retrieval Research in the USA Douglas W. Oard College of Library and Information Services University of Maryland, College Park, MD 20742 oard@glue.umd.edu Abstract The increasing availability

More information

Cross-Lingual Information Access and Its Evaluation

Cross-Lingual Information Access and Its Evaluation Cross-Lingual Information Access and Its Evaluation Noriko Kando Research and Development Department National Center for Science Information Systems (NACSIS), Japan URL: http://www.rd.nacsis.ac.jp/~{ntcadm,kando}/

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

M erg in g C lassifiers for Im p ro v ed In fo rm a tio n R e triev a l

M erg in g C lassifiers for Im p ro v ed In fo rm a tio n R e triev a l M erg in g C lassifiers for Im p ro v ed In fo rm a tio n R e triev a l Anette Hulth, Lars Asker Dept, of Computer and Systems Sciences Stockholm University [hulthi asker]ø dsv.su.s e Jussi Karlgren Swedish

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

Implementation of the common phrase index method on the phrase query for information retrieval

Implementation of the common phrase index method on the phrase query for information retrieval Implementation of the common phrase index method on the phrase query for information retrieval Triyah Fatmawati, Badrus Zaman, and Indah Werdiningsih Citation: AIP Conference Proceedings 1867, 020027 (2017);

More information

Robust Relevance-Based Language Models

Robust Relevance-Based Language Models Robust Relevance-Based Language Models Xiaoyan Li Department of Computer Science, Mount Holyoke College 50 College Street, South Hadley, MA 01075, USA Email: xli@mtholyoke.edu ABSTRACT We propose a new

More information

Research Article. August 2017

Research Article. August 2017 International Journals of Advanced Research in Computer Science and Software Engineering ISSN: 2277-128X (Volume-7, Issue-8) a Research Article August 2017 English-Marathi Cross Language Information Retrieval

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten

More information

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval

CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval DCU @ CLEF-IP 2009: Exploring Standard IR Techniques on Patent Retrieval Walid Magdy, Johannes Leveling, Gareth J.F. Jones Centre for Next Generation Localization School of Computing Dublin City University,

More information

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University

More information

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12 Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency

More information

Influence of Word Normalization on Text Classification

Influence of Word Normalization on Text Classification Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we

More information

Cross-Language Evaluation Forum - CLEF

Cross-Language Evaluation Forum - CLEF Cross-Language Evaluation Forum - CLEF Carol Peters IEI-CNR, Pisa, Italy IST-2000-31002 Kick-off: October 2001 Outline Project Objectives Background CLIR System Evaluation CLEF Infrastructure Results so

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Document Structure Analysis in Associative Patent Retrieval

Document Structure Analysis in Associative Patent Retrieval Document Structure Analysis in Associative Patent Retrieval Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media Studies University of Tsukuba 1-2 Kasuga, Tsukuba, 305-8550,

More information

Estimating Embedding Vectors for Queries

Estimating Embedding Vectors for Queries Estimating Embedding Vectors for Queries Hamed Zamani Center for Intelligent Information Retrieval College of Information and Computer Sciences University of Massachusetts Amherst Amherst, MA 01003 zamani@cs.umass.edu

More information

Machine Translation for Information Access across the Language Barrier: the MuST System

Machine Translation for Information Access across the Language Barrier: the MuST System MT Summit VII Sept. 1999 Machine Translation for Information Access across the Language Barrier: the MuST System Chin-Yew Lin Information Sciences Institute University of Southern California Marina del

More information

CS54701: Information Retrieval

CS54701: Information Retrieval CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

More information

James P. Callan and W. Bruce Croft. seven elds describing aspects of the information need: the information need that is related to, but often distinct

James P. Callan and W. Bruce Croft. seven elds describing aspects of the information need: the information need that is related to, but often distinct An Evaluation of Query Processing Strategies Using the TIPSTER Collection James P. Callan and W. Bruce Croft Computer Science Department University of Massachusetts, Amherst, MA 01003, USA callan@cs.umass.edu,

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

UMass at TREC 2006: Enterprise Track

UMass at TREC 2006: Enterprise Track UMass at TREC 2006: Enterprise Track Desislava Petkova and W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science University of Massachusetts, Amherst, MA 01003 Abstract

More information

This literature review provides an overview of the various topics related to using implicit

This literature review provides an overview of the various topics related to using implicit Vijay Deepak Dollu. Implicit Feedback in Information Retrieval: A Literature Analysis. A Master s Paper for the M.S. in I.S. degree. April 2005. 56 pages. Advisor: Stephanie W. Haas This literature review

More information

Self Introduction. Presentation Outline. College of Information 3/31/2016. Multilingual Information Access to Digital Collections

Self Introduction. Presentation Outline. College of Information 3/31/2016. Multilingual Information Access to Digital Collections College of Information Multilingual Information Access to Digital Collections Jiangping Chen Http://coolt.lis.unt.edu/ Jiangping.chen@unt.edu April 20, 2016 Self Introduction An Associate Professor at

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

Using Statistical Properties of Text to Create. Metadata. Computer Science and Electrical Engineering Department

Using Statistical Properties of Text to Create. Metadata. Computer Science and Electrical Engineering Department Using Statistical Properties of Text to Create Metadata Grace Crowder crowder@cs.umbc.edu Charles Nicholas nicholas@cs.umbc.edu Computer Science and Electrical Engineering Department University of Maryland

More information

Web searching across languages: Preference and behavior of bilingual academic users in Korea

Web searching across languages: Preference and behavior of bilingual academic users in Korea Library & Information Science Research 27 (2005) 249 263 Web searching across languages: Preference and behavior of bilingual academic users in Korea Hae-young Rieh a, T, Soo Young Rieh b a Center for

More information

Getting Information from Documents You Cannot Read: An Interactive Cross-Language Text Retrieval and Summarization System

Getting Information from Documents You Cannot Read: An Interactive Cross-Language Text Retrieval and Summarization System Getting Information from Documents You Cannot Read: An Interactive Cross-Language Text Retrieval and Summarization System William Ogden, James Cowie, Mark Davis, Eugene Ludovik, Hugo Molina-Salgado, and

More information

Optimal Query. Assume that the relevant set of documents C r. 1 N C r d j. d j. Where N is the total number of documents.

Optimal Query. Assume that the relevant set of documents C r. 1 N C r d j. d j. Where N is the total number of documents. Optimal Query Assume that the relevant set of documents C r are known. Then the best query is: q opt 1 C r d j C r d j 1 N C r d j C r d j Where N is the total number of documents. Note that even this

More information

1.0 Abstract. 2.0 TIPSTER and the Computing Research Laboratory. 2.1 OLEADA: Task-Oriented User- Centered Design in Natural Language Processing

1.0 Abstract. 2.0 TIPSTER and the Computing Research Laboratory. 2.1 OLEADA: Task-Oriented User- Centered Design in Natural Language Processing Oleada: User-Centered TIPSTER Technology for Language Instruction 1 William C. Ogden and Philip Bernick The Computing Research Laboratory at New Mexico State University Box 30001, Department 3CRL, Las

More information

The answer (circa 2001)

The answer (circa 2001) Question ing Question Answering Overview and task definition History Open-domain question ing Basic system architecture Predictive indexing methods Pattern-matching methods Advanced techniques? What was

More information

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements

More information

Cross-lingual Information Retrieval

Cross-lingual Information Retrieval Cross-lingual Information Retrieval State-of-the-Art Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah Department of Multimedia Faculty of Computer Science & Information Technology, UPM 43400 UPM Serdang,

More information

Improving Difficult Queries by Leveraging Clusters in Term Graph

Improving Difficult Queries by Leveraging Clusters in Term Graph Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu

More information

Cross Lingual Information Retrieval Using Data Mining Methods

Cross Lingual Information Retrieval Using Data Mining Methods Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2009 Proceedings Americas Conference on Information Systems (AMCIS) 2009 Cross Lingual Information Retrieval Using Data Mining Methods

More information

Automatic Generation of Query Sessions using Text Segmentation

Automatic Generation of Query Sessions using Text Segmentation Automatic Generation of Query Sessions using Text Segmentation Debasis Ganguly, Johannes Leveling, and Gareth J.F. Jones CNGL, School of Computing, Dublin City University, Dublin-9, Ireland {dganguly,

More information

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna

More information

Towards open-domain QA. Question answering. TReC QA framework. TReC QA: evaluation

Towards open-domain QA. Question answering. TReC QA framework. TReC QA: evaluation Question ing Overview and task definition History Open-domain question ing Basic system architecture Watson s architecture Techniques Predictive indexing methods Pattern-matching methods Advanced techniques

More information

Indri at TREC 2005: Terabyte Track (Notebook Version)

Indri at TREC 2005: Terabyte Track (Notebook Version) Indri at TREC 2005: Terabyte Track (Notebook Version) Donald Metzler, Trevor Strohman, Yun Zhou, W. B. Croft Center for Intelligent Information Retrieval University of Massachusetts, Amherst Abstract This

More information

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University CS6200 Information Retrieval Jesse Anderton College of Computer and Information Science Northeastern University Major Contributors Gerard Salton! Vector Space Model Indexing Relevance Feedback SMART Karen

More information

Information Retrieval Research

Information Retrieval Research ELECTRONIC WORKSHOPS IN COMPUTING Series edited by Professor C.J. van Rijsbergen Jonathan Furner, School of Information and Media Studies, and David Harper, School of Computer and Mathematical Studies,

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

An Attempt to Identify Weakest and Strongest Queries

An Attempt to Identify Weakest and Strongest Queries An Attempt to Identify Weakest and Strongest Queries K. L. Kwok Queens College, City University of NY 65-30 Kissena Boulevard Flushing, NY 11367, USA kwok@ir.cs.qc.edu ABSTRACT We explore some term statistics

More information

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms

Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms Yikun Guo, Henk Harkema, Rob Gaizauskas University of Sheffield, UK {guo, harkema, gaizauskas}@dcs.shef.ac.uk

More information

Manning Chapter: Text Retrieval (Selections) Text Retrieval Tasks. Vorhees & Harman (Bulkpack) Evaluation The Vector Space Model Advanced Techniques

Manning Chapter: Text Retrieval (Selections) Text Retrieval Tasks. Vorhees & Harman (Bulkpack) Evaluation The Vector Space Model Advanced Techniques Text Retrieval Readings Introduction Manning Chapter: Text Retrieval (Selections) Text Retrieval Tasks Vorhees & Harman (Bulkpack) Evaluation The Vector Space Model Advanced Techniues 1 2 Text Retrieval:

More information

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation THE KLUWER INTERNATIONAL SERIES ON INFORMATION RETRIEVAL Series Editor W. Bruce Croft University of Massachusetts Amherst, MA 01003 Also in the

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

Inter and Intra-Document Contexts Applied in Polyrepresentation

Inter and Intra-Document Contexts Applied in Polyrepresentation Inter and Intra-Document Contexts Applied in Polyrepresentation Mette Skov, Birger Larsen and Peter Ingwersen Department of Information Studies, Royal School of Library and Information Science Birketinget

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

Context-Based Topic Models for Query Modification

Context-Based Topic Models for Query Modification Context-Based Topic Models for Query Modification W. Bruce Croft and Xing Wei Center for Intelligent Information Retrieval University of Massachusetts Amherst 140 Governors rive Amherst, MA 01002 {croft,xwei}@cs.umass.edu

More information

Cross-lingual thesaurus for multilingual knowledge management

Cross-lingual thesaurus for multilingual knowledge management Available online at www.sciencedirect.com Decision Support Systems 45 (2008) 596 605 www.elsevier.com/locate/dss Cross-lingual thesaurus for multilingual knowledge management Christopher C. Yang a,, Chih-Ping

More information

Recap: lecture 2 CS276A Information Retrieval

Recap: lecture 2 CS276A Information Retrieval Recap: lecture 2 CS276A Information Retrieval Stemming, tokenization etc. Faster postings merges Phrase queries Lecture 3 This lecture Index compression Space estimation Corpus size for estimates Consider

More information

Indexing and Query Processing

Indexing and Query Processing Indexing and Query Processing Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu January 28, 2013 Basic Information Retrieval Process doc doc doc doc doc information need document representation

More information

Building Test Collections. Donna Harman National Institute of Standards and Technology

Building Test Collections. Donna Harman National Institute of Standards and Technology Building Test Collections Donna Harman National Institute of Standards and Technology Cranfield 2 (1962-1966) Goal: learn what makes a good indexing descriptor (4 different types tested at 3 levels of

More information

Information Retrieval: Retrieval Models

Information Retrieval: Retrieval Models CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models

More information

Evaluating a Conceptual Indexing Method by Utilizing WordNet

Evaluating a Conceptual Indexing Method by Utilizing WordNet Evaluating a Conceptual Indexing Method by Utilizing WordNet Mustapha Baziz, Mohand Boughanem, Nathalie Aussenac-Gilles IRIT/SIG Campus Univ. Toulouse III 118 Route de Narbonne F-31062 Toulouse Cedex 4

More information

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Tilburg University Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Publication date: 2006 Link to publication Citation for published

More information

Adaptive Model of Personalized Searches using Query Expansion and Ant Colony Optimization in the Digital Library

Adaptive Model of Personalized Searches using Query Expansion and Ant Colony Optimization in the Digital Library International Conference on Information Systems for Business Competitiveness (ICISBC 2013) 90 Adaptive Model of Personalized Searches using and Ant Colony Optimization in the Digital Library Wahyu Sulistiyo

More information

Query Expansion from Wikipedia and Topic Web Crawler on CLIR

Query Expansion from Wikipedia and Topic Web Crawler on CLIR Query Expansion from Wikipedia and Topic Web Crawler on CLIR Meng-Chun Lin, Ming-Xiang Li, Chih-Chuan Hsu and Shih-Hung Wu* Department of Computer Science and Information Engineering Chaoyang University

More information

Siemens TREC-4 Report: Further Experiments with Database. Merging. Ellen M. Voorhees. Siemens Corporate Research, Inc.

Siemens TREC-4 Report: Further Experiments with Database. Merging. Ellen M. Voorhees. Siemens Corporate Research, Inc. Siemens TREC-4 Report: Further Experiments with Database Merging Ellen M. Voorhees Siemens Corporate Research, Inc. Princeton, NJ ellen@scr.siemens.com Abstract A database merging technique is a strategy

More information

MT in the Online Environment: Challenges and Opportunities

MT in the Online Environment: Challenges and Opportunities Abstract MT in the Online Environment: Challenges and Opportunities Mary Flanagan (mflanagan@csi.compuserve.com) CompuServe Natural Language Technologies Integrating machine translation in online services

More information

Introduction & Administrivia

Introduction & Administrivia Introduction & Administrivia Information Retrieval Evangelos Kanoulas ekanoulas@uva.nl Section 1: Unstructured data Sec. 8.1 2 Big Data Growth of global data volume data everywhere! Web data: observation,

More information

Web Services in Language Technology and Terminology Management

Web Services in Language Technology and Terminology Management Web Services in Language Technology and Terminology Management Uwe Quasthoff, Christian Wolff Leipzig University Computer Science Institute, NLP Dept. Augustusplatz 10/11 04109 Leipzig, Germany {quasthoff,

More information

Web Search Engine Question Answering

Web Search Engine Question Answering Web Search Engine Question Answering Reena Pindoria Supervisor Dr Steve Renals Com3021 07/05/2003 This report is submitted in partial fulfilment of the requirement for the degree of Bachelor of Science

More information

Quoogle: A Query Expander for Google

Quoogle: A Query Expander for Google Quoogle: A Query Expander for Google Michael Smit Faculty of Computer Science Dalhousie University 6050 University Avenue Halifax, NS B3H 1W5 smit@cs.dal.ca ABSTRACT The query is the fundamental way through

More information

3 Publishing Technique

3 Publishing Technique Publishing Tool 32 3 Publishing Technique As discussed in Chapter 2, annotations can be extracted from audio, text, and visual features. The extraction of text features from the audio layer is the approach

More information

The University of Amsterdam at the CLEF 2008 Domain Specific Track

The University of Amsterdam at the CLEF 2008 Domain Specific Track The University of Amsterdam at the CLEF 2008 Domain Specific Track Parsimonious Relevance and Concept Models Edgar Meij emeij@science.uva.nl ISLA, University of Amsterdam Maarten de Rijke mdr@science.uva.nl

More information

IIT at TREC-10. A. Chowdhury AOL Inc. D. Holmes NCR Corporation

IIT at TREC-10. A. Chowdhury AOL Inc. D. Holmes NCR Corporation IIT at TREC-10 M. Aljlayl, S. Beitzel, E. Jensen Information Retrieval Laboratory Department of Computer Science Illinois Institute of Technology Chicago, IL 60616 {aljlayl, beitzel, jensen } @ ir.iit.edu

More information

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es

More information

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD 10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information