From CLIR to CLIE: Some Lessons in NTCIR Evaluation


Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan

ABSTRACT

Cross-language information retrieval (CLIR) facilitates the use of one language to access documents in other languages. Cross-language information extraction (CLIE) extracts relevant information in finer granularity from multilingual documents for specific applications such as summarization, question answering, and opinion extraction. This paper reviews the CLIR, CLQA, and opinion analysis tasks in the NTCIR evaluation. The design methodologies and some key technologies are reported.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Search process.

General Terms
Algorithms, Measurement, Performance.

Keywords
CLIE, CLIR, Evaluation, Opinion Analysis, Question Answering.

1. INTRODUCTION

Cross-language information retrieval (CLIR) facilitates the use of one language to access documents in other languages. Cross-language information extraction (CLIE) extracts relevant information in finer granularity from multilingual documents for specific applications such as summarization, question answering, and opinion extraction. NTCIR started evaluating CLIR tasks on the Chinese, English, Japanese and Korean languages in 2001. In these 5 years (2001-2005), four CLIR test collections, namely the NTCIR-2, NTCIR-3, NTCIR-4 and NTCIR-5 evaluation sets [2][3][4][5], have been developed. In NTCIR-5, we extended the CLIR task to the CLQA (Cross-Lingual Question Answering) task [8], which is an application of CLIE. In NTCIR-6, we further reused the past NTCIR CLIR test collections to build a corpus for opinion analysis [6][7], which is another application of CLIE.

For setting up an evaluation test set for multilingual information access, several issues have to be considered, including data sources, languages, genres, criteria for topic/question creation, relevance granularity, and so on. This paper reviews the CLIR, CLQA and opinion analysis tasks in the NTCIR evaluation, and reports the design methodologies and some key technologies in Sections 2-4, respectively. In each section, the definitions of the subtasks, the collection of the document sets, the formulation of topics, the evaluation metrics, and the technologies explored are discussed.

2. CLIR

2.1 CLIR evaluation

In CLIR, the topics are in source languages and the documents are in target languages, and the target languages are different from the source languages. Compared with TREC and CLEF, which also provide CLIR evaluation, NTCIR focuses on Asian languages and English.

In 2001, Hsin-Hsi Chen and Kuang-hua Chen [2], from the Department of Computer Science and Information Engineering and the Department of Library and Information Science, National Taiwan University (NTU), organized two subtasks in NTCIR-2: Chinese-Chinese IR (CHIR) and English-Chinese IR (ECIR). They collected a Chinese document set, CIRB010, from 5 news agencies in Taiwan. The statistics are shown in Table 1.

Table 1. CIRB010 document set.
  News Agency            #Documents
  China Times            38,...
  Commercial Times       25,...
  China Times Express     5,...
  Central Daily News     27,...
  China Daily News       34,...
  Total                  132,173 (200 MB)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IWRIDL-2006, Kolkata, India. Copyright 2007 ACM $5.00.
The creation of the CIRB010 topic set consists of three stages: collecting information requests through a questionnaire on the web, selecting information requests, and constructing topics. From 405 information requests, researchers filtered out 163 unsuitable requests (leaving 242). Then a full-text retrieval system filtered out 173 more information requests based on the number of relevant documents reported. Finally, researchers selected 50 topics from the remaining 69 information requests.

We adopted the pooling method to collect candidate documents from the participants' submissions. To speed up the evaluation procedure, we designed an evaluation platform, shown in Figure 1, for the assessors. The left upper part shows the name of the assessor assessing the designated document, the topic ID, the pool file, the j-th document in the pool file, and the document number. The degree of relevance, namely highly relevant (score 3), relevant (score 2), partially relevant (score 1), and irrelevant (score 0), is assigned by the assessors. Assessors can consult previous judgments to make their decisions, or correct their judgments. In addition, they can also attach comments to their decisions. The log is kept for further analysis to improve the evaluation procedure. The right upper part and the bottom part list the topic description and the document being judged, respectively.

Figure 1. Evaluation platform.

Each topic is judged by three assessors. In total, 23 assessors spent 799 hours judging the relevance of 44,924 documents. We integrated the 3 scores for each document in the pool file in the following way:

  R = (X_A/3 + X_B/3 + X_C/3) / 3

where R is the integrated score and X_A, X_B, and X_C are the 3 scores assigned by the assessors, each normalized by the maximum score 3 so that R falls between 0 and 1. In the rigid case, a document is considered correct when its R score lies between the (higher) rigid threshold and 1; in the relaxed case, a document is considered correct when its R score lies between the (lower) relaxed threshold and 1. A small illustrative sketch of this scoring appears at the end of this subsection.

In NTCIR-3, CLIR became an international joint effort. Research groups from Japan, Korea and Taiwan were involved in the design. We began to evaluate the CLIR problems on three Asian languages (Chinese (C), Japanese (J), and Korean (K)) and English (E). The following three subtasks were designed [3]:

(1) Single Language IR (SLIR): The language of the search topics is identical to that of the documents.
(2) Bilingual CLIR (BLIR): The document set to be searched is in a single language different from the language of the topic set.
(3) Multilingual CLIR (MLIR): The target collection consists of documents in two or more languages.

In the MLIR evaluation, systems are checked on whether they can retrieve documents in more than one language relevant to the same topic. Therefore, we tried to restrict the document sets used in the evaluation to the same time periods. Unfortunately, not all documents in these four languages were available due to copyright issues. Table 2 summarizes the document sets used in the NTCIR-3 CLIR task. The CJE news articles were published in 1998-1999, whereas the Korean news articles were published in 1994. Thus, we divided the collection into a CJE part (1998-1999) and a Korean part (1994). Different topic sets were created for each part. As before, four granularities of relevance, i.e., highly relevant (S), relevant (A), partially relevant (B) and irrelevant (C), were adopted. Instead of using the above formula, documents judged S or A are regarded as correct in the rigid case; in the relaxed case, documents judged S, A, or B are considered correct.

Table 2. NTCIR-3 CLIR document sets.
  Japan    Mainichi Newspaper (1998-1999): Japanese                              220,078
           Mainichi Daily News (1998-1999): English                               12,723
  Korea    Korea Economic Daily (1994): Korean                                    66,146
  Taiwan   CIRB011 (1998-1999): Chinese                                          132,173
           United Daily News (CIRB020, 1998-1999): Chinese                       249,508
           Taiwan News and Chinatimes English News (EIRB010, 1998-1999): English  10,204

In the NTCIR-3 testing data set, the number of English documents is about 16.6 times and 9.60 times less than that of the Chinese and Japanese documents, respectively. There are 18 topics (36%) without any relevant documents in the English data set, and the number of relevant documents from the English collection is far less than that from the Chinese and Japanese collections.

The document sets come from the same periods in the NTCIR-4 and NTCIR-5 CLIR tasks. In NTCIR-4, 254,438 Korean news articles published within 1998-1999 were added to the collection, and the sizes of the Chinese, Japanese, and English document sets were extended to 381,375, 593,636, and 347,376 documents, respectively [4]. In NTCIR-5, the publication period was changed to 2000-2001, and the numbers of documents in Chinese, Japanese, Korean, and English were further expanded to 901,446, 858,400, 220,374, and 259,050, respectively [5].

The research issues that have been explored are as follows [3][4][5]:

(1) Index methods: indexing of CJK text, the decompounding problem, identification of named entities, dictionaries for indexing, and so on.
(2) Translation: query/document translation, translation methods and sources, term disambiguation, multiword translation, the out-of-vocabulary problem, transliteration methods, conversion of Kanji codes, cognate matching, the pivot language approach, and so on.
(3) Retrieval models: Okapi (BM25 and its variations), the vector model, the logistic regression model, language models, data fusion, and so on.
(4) Query expansion and re-ranking: pseudo relevance feedback, web-based expansion, statistical thesauri, pre-translation expansion, document re-ranking, and so on.
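To make the relevance-integration scheme of Section 2.1 concrete, here is a minimal Python sketch. The function names and the rigid/relaxed threshold values used in the example call are illustrative assumptions of this sketch, not values taken from the paper.

  def integrated_score(scores):
      """Combine three assessor scores (0 = irrelevant ... 3 = highly relevant)
      into the integrated score R in [0, 1], as described above."""
      assert len(scores) == 3 and all(0 <= s <= 3 for s in scores)
      return sum(s / 3.0 for s in scores) / 3.0

  def judge(scores, rigid_threshold, relaxed_threshold):
      """Label a pooled document under the rigid and relaxed criteria.
      The thresholds are parameters here because the exact cut-offs are not
      given above."""
      r = integrated_score(scores)
      return {"R": r, "rigid": r >= rigid_threshold, "relaxed": r >= relaxed_threshold}

  # Example: two assessors say relevant (2), one says highly relevant (3).
  print(judge([2, 2, 3], rigid_threshold=2 / 3, relaxed_threshold=1 / 3))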

2.2 The Web as a Translation Aid

Translation is necessary for CLIR. In addition to bilingual dictionaries and machine translation systems, the web also serves as a translation aid. After bilingual dictionary lookup, the out-of-vocabulary (OOV) query terms are translated by using the web as a multilingual corpus. For example, a named entity is often an important query term, yet absent from the bilingual dictionary. Figure 2 shows a snapshot after a Google search for such a named entity, where snippets are returned in a sorted sequence. Figure 3 shows one of the snippets, in which the corresponding English translation appears. Here, a snippet consists of title, type, body and source fields. The following describes how to extract translation pairs from the snippets.

Figure 2. A snapshot after a Google search.
Figure 3. A snippet containing the translation of the named entity.

The basic algorithm is as follows. The top-k snippets returned by Google are analyzed. For each snippet, we collect the runs of consecutive capitalized words and regard them as candidates. Then we count the total occurrences of each candidate in the k snippets and sort the candidates by their frequencies. The candidates with the largest numbers of occurrences are considered as the translation of the query term.

The basic algorithm does not consider the distance between the query term and the corresponding candidate in a snippet. Intuitively, the larger the distance is, the less plausible a candidate is. We therefore modify the basic algorithm as follows. We drop those candidates whose distances are larger than a predefined threshold. In this way, a snippet may not contribute any candidates, so to collect enough candidates, say cnum, we may have to examine more than k snippets. Because there may not always exist cnum candidates, we stop collecting when a maximum number (max) of snippets has been examined. We prefer candidates with higher co-occurrence with the query term and smaller average distances.
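A minimal Python sketch of this snippet-mining procedure is given below. It assumes the snippets have already been fetched (no search API call is shown), and the function name, the regular expression, and the toy snippets are my own illustrations rather than the exact implementation used at NTCIR.

  import re
  from collections import defaultdict

  def mine_translations(query_term, snippets, dist_threshold=40):
      """Rank runs of capitalized words in snippets as translation candidates
      for an OOV query term (e.g., a named entity), preferring candidates that
      occur often and close to the query term."""
      stats = defaultdict(lambda: [0, 0.0])          # candidate -> [frequency, summed distance]
      for text in snippets:
          pos = text.find(query_term)
          if pos < 0:                                # snippet does not contain the query term
              continue
          for m in re.finditer(r'[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*', text):
              dist = abs(m.start() - pos)
              if dist > dist_threshold:              # too far away: contributes nothing
                  continue
              stats[m.group(0)][0] += 1
              stats[m.group(0)][1] += dist
      # sort by frequency (descending), then by average distance (ascending)
      return sorted(stats, key=lambda c: (-stats[c][0], stats[c][1] / stats[c][0]))

  # Toy example with two fabricated snippets for the Chinese rendering of Keizo Obuchi.
  snips = ["... 小淵惠三 (Keizo Obuchi) was named prime minister ...",
           "Keizo Obuchi, known as 小淵惠三 in Chinese, said ..."]
  print(mine_translations("小淵惠三", snips))        # 'Keizo Obuchi' ranks first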
3. CLQA

3.1 CLQA evaluation

Question answering (QA) has attracted much attention because a huge, heterogeneous data collection is available on the Internet. In NTCIR-5, we initiated CLQA as a pilot task. Five subtasks, namely JE, EJ, CE, CC, and EC, where C, E, and J stand for Chinese, English, and Japanese, respectively, were evaluated. For each subtask XY, a question in source language X is submitted to a QA system, and answers are extracted from documents in target language Y [8].

The document collection consists of materials in 3 languages: a Chinese data set of 901,446 news articles (from UDN.com), a Japanese data set of 658,719 news articles (from the Yomiuri Newspaper), and an English data set of 17,741 news articles (from the Daily Yomiuri). Because the English data set is comparatively small, we had to check whether answers exist in the English corpus when designing questions. For the CE subtask, the Chinese questions were translated from English questions by human translators. For CC and EC, we referred to CLIR topics and to the logs kept for an online Chinese QA system.

In the NTCIR-5 CLQA task, the answer types were restricted to named entities (NEs) such as PERSON, LOCATION, ORGANIZATION, ARTIFACT (product name, book title, law, etc.), DATE, TIME, MONEY, PERCENT, and NUMEX. In total, 200 questions were provided for each subtask in the formal run evaluation. How to find the correct answer is the major concern in this pilot task. The participants were asked to submit the answer in the target language rather than to translate the answers back to the original language. The evaluation criteria for each answer are as follows:

(1) Right: The answer is correct, and the document containing the answer also supports it.
(2) Unsupported: The answer is correct, but the document containing the answer cannot support it.
(3) Wrong: The answer is incorrect.

The answers were evaluated by different metrics: accuracy in the official runs, and MRR and top-5 in the unofficial runs. The challenging issues in the CLQA task are two-fold: machine translation and question answering. We have to translate questions from the source language to the target language and retrieve the relevant documents containing the answers, which is similar to CLIR. The translated questions are also employed to extract the answers, so translation errors may result in poor IR and IE performance. Techniques such as machine-readable dictionaries, on-line machine translation systems, collocation on search results from the web, and so on, have been explored.
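As an illustration of the metrics just mentioned, the following Python sketch computes accuracy, top-5 and MRR from per-question ranked judgments. Treating only Right answers as correct (and ignoring Unsupported ones) is an assumption made for this sketch, and the function name is mine.

  def evaluate_run(judged):
      """judged: one ranked list of judgments per question, each judgment being
      'R' (Right), 'U' (Unsupported) or 'W' (Wrong)."""
      n = len(judged)
      accuracy = sum(1 for ranked in judged if ranked[:1] == ['R']) / n
      top5 = sum(1 for ranked in judged if 'R' in ranked[:5]) / n
      mrr = sum(1.0 / (ranked.index('R') + 1) if 'R' in ranked[:5] else 0.0
                for ranked in judged) / n
      return {"accuracy": accuracy, "top5": top5, "MRR": mrr}

  # Three questions: answered at rank 1, answered at rank 3, and missed within 5 ranks.
  print(evaluate_run([['R', 'W'], ['W', 'W', 'R'], ['W', 'U', 'W']]))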

3.2 Answer translation and fusion

Ultimately, CLQA should deliver answers in the source language; in other words, the extracted answers have to be translated back into the language in which the questions are posed. In this pilot study, we focused on the performance of question translation and answer retrieval, and did not ask the participants to perform answer translation.

For multilingual CLQA, we submit a question to extract plausible answers from a multilingual document collection. The same named entities may be reported in different languages. For example, for the Chinese question asking "Who was the Japanese Prime Minister in 1997?", Table 3 lists the first five answers from the English and Chinese document sets, respectively. Merging answers from multiple sources is thus an additional task in multilingual CLQA, and an extension of the methodology in Section 2.2 may be adopted.

Table 3. Answers in different languages. In this example, three of the Chinese answers denote the same persons as Yoshiro Mori, Keizo Obuchi, and Ryutaro Hashimoto, respectively.

We can merge the two sets of answers in the following way:

(1) Multiply out the English answers E_i (1 <= i <= 5) and the Chinese answers C_j (1 <= j <= 5), generating 25 combinations.
(2) For a combination (E_i, C_j), submit E_i and C_j together to Google, and use a method similar to that of Section 2.2 to verify whether E_i and C_j appear in each other's neighborhood. If the combination shows strong collocation, then delete (E_i, X) (where X is not C_j) and (X, C_j) (where X is not E_i), and try the remaining combinations.

Figure 4 shows an example of submitting the Chinese answer together with "Keizo Obuchi" to Google; the collocations are marked in red.

Figure 4. Collocation example of the Chinese answer and "Keizo Obuchi".
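A small Python sketch of this merging step follows. The collocation_score function is a stand-in for the web-based check of Section 2.2 (for instance, how often the two strings co-occur within a window in top-ranked snippets); its name, the threshold, and the greedy pairing strategy are assumptions of this sketch.

  from itertools import product

  def fuse_answers(en_answers, zh_answers, collocation_score, threshold=0.5):
      """Greedily pair English and Chinese answer candidates whose surface forms
      collocate strongly on the web, trying the strongest combinations first."""
      pairs = sorted(product(en_answers, zh_answers),
                     key=lambda p: collocation_score(*p), reverse=True)
      merged, used_en, used_zh = [], set(), set()
      for e, c in pairs:
          if e in used_en or c in used_zh:
              continue                      # a stronger pairing already claimed e or c
          if collocation_score(e, c) >= threshold:
              merged.append((e, c))         # e and c are taken to name the same entity
              used_en.add(e)
              used_zh.add(c)
      return merged

  # Usage: fuse_answers(top5_en, top5_zh, score_fn) returns matched (English, Chinese) pairs.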
4. OPINION ANALYSIS

4.1 Opinion extraction evaluation

Humans like to express their opinions and are eager to know others' opinions. An opinion is a word string that someone expresses to declare his or her stand toward a specific target. The named entity who expresses an opinion is called the opinion holder; there may not always be an opinion holder in a sentence. A target may be a product, a person, an event, and so on. Automatically mining and organizing opinions from heterogeneous information sources is very useful for individuals, organizations and even governments [7][8].

Opinion extraction, opinion summarization and opinion tracking are three important techniques for understanding opinions. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences and decides their polarity. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the non-supportive evidence. Opinion tracking captures subjective information from various genres and monitors the development of opinions along spatial and temporal dimensions. There are many applications, such as polls on public issues, product review analysis, collecting the opinions of famous people, monitoring changes of public opinion, analyzing opinions toward candidates in an election, summarizing the opinions of different social classes, and so on.

In 2005, the open submission session at NTCIR-5 [1] collected researchers' comments about a new pilot task. This pilot task aims to build an opinion extraction corpus based on the past NTCIR CLIR test collections, and to promote the investigation of opinionated information access. Opinion extraction is one of the kernel technologies related to opinion analysis tasks.

In this pilot task, sentences are the basic units for extraction and relevance judgments, and opinions and the opinion holders are the information we focus on. The test collection consists of topics and the relevant documents. For each topic and the corresponding relevant documents, systems have to report which sentences carry subjective information relevant to the designated topic, their polarity, and the explicit opinion holder.
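To make the expected per-sentence output concrete, here is a small illustrative record type in Python; the field names and value conventions are mine, not the official submission format.

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class SentenceJudgment:
      """What a system reports for one sentence of a relevant document."""
      topic_id: str
      sentence_id: str
      opinionated: bool                # is this an opinion sentence?
      relevant: Optional[bool] = None  # relevant to the topic (optional subtask)
      polarity: Optional[str] = None   # 'positive', 'neutral' or 'negative' (optional subtask)
      holder: Optional[str] = None     # explicit opinion holder, if present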

To evaluate the technologies embedded, we divide opinion analysis into five subtasks. Extracting opinion holders and extracting opinionated sentences are mandatory. Indicating the relevance of opinionated sentences to the given topics and/or determining the polarity of relevant opinionated sentences is optional. In addition, there is an optional application-oriented subtask. The name of an opinion task describes the language, type, and granularity of the task. The format is L-T-G, where L denotes the language of the material, T the type of task, and G the granularity (of the analyzed unit of material). For example, C/J/E-OE-S denotes Chinese/Japanese/English Opinion Extraction at the Sentence level.

We selected 32 opinionated topics from the NTCIR-3, -4, and -5 CLIR tasks, and the relevant documents for these topics were also extracted. There are 872 Chinese documents meeting our requirements. Each Chinese document is tagged by three annotators with the information shown in Table 4.

Table 4. Tags for corpus annotation.
  Tag                              Level        Attribute  Values         Description
  <SEN_OP></SEN_OP>                Sentence     TYPE       YES, NO        Sentence Opinion: defines whether this sentence is an opinion sentence.
  <SEN_ATTITUDE></SEN_ATTITUDE>    Sentence     TYPE       SUP, NSP, NEU  Sentence Attitude: defines the opinion polarity of a sentence.
  <SEN_REL></SEN_REL>              Sentence     TYPE       YES, NO        Sentence Relevance: defines whether this sentence is relevant to the topic.
  <OPINION_SRC></OPINION_SRC>      Subsentence  TYPE       EXP, IMP       Opinion Source: defines the opinion holder of a specific opinion.

To speed up the annotation, an opinion annotation tool, shown in Figure 5, was designed. With this friendly interface, users can click the suitable buttons to annotate different values.

Figure 5. An opinion annotation tool.

The functions of the buttons are as follows.

(1) Opinion subsentence: Three buttons, i.e., Supportive subsentence, Nonsupportive subsentence, and Neutral subsentence, are provided. These buttons annotate the target text with the <SEN_ATTITUDE></SEN_ATTITUDE> tag pair.
(2) Opinion keyword: Four buttons, i.e., Positive keyword, Negative keyword, Neutral keyword, and Opinion operator, are designed. These buttons annotate the target text with the <SENTIMENT_KW></SENTIMENT_KW> tag pair.
(3) Semantics conversion: This consists of three buttons, i.e., Converted to positive, Converted to negative, and Converted to neutral. These buttons annotate the target text with the <CXT_ATTITUDE></CXT_ATTITUDE> tag pair, which is applied when the sentiment polarity of an opinion keyword is converted from one polarity to another because the keyword co-occurs with another word.
(4) Opinion holder: This contains two buttons, namely Explicit holder and Implicit holder. These buttons annotate the target text with the <OPINION_SRC></OPINION_SRC> tag pair.

At present, the tool supports Chinese, English and Japanese; it is easy to extend to other languages by simply including language files.

To evaluate the quality of the human-tagged corpora, the agreement of the annotations has to be analyzed. Inter-annotator agreement is measured at different levels. The following inter-annotator agreement metric is adopted:

  Agreement(A, B) = |A ∩ B| / |samples|

where A ∩ B denotes the set of samples that annotators A and B label identically. Three annotators, denoted A, B and C, examined the samples. Under the lenient metric, neutral and positive are considered to be in the same category, whereas the strict metric treats all three categories (positive, neutral, and negative) as distinct. The annotations are called strongly inconsistent if positive polarity and negative polarity are assigned to the same constituent by different annotators. The kappa value gives a quantitative measure of the magnitude of inter-annotator agreement.
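A minimal Python sketch of this agreement computation (with the lenient collapsing of neutral into positive) is shown below; the label strings and function names are illustrative.

  def agreement(a, b, samples, lenient=False):
      """Agreement(A, B) = |A ∩ B| / |samples| over per-item polarity labels.
      a, b: dicts mapping item id -> 'POS', 'NEU' or 'NEG'.
      Under the lenient metric, 'NEU' is collapsed into 'POS'."""
      def norm(label):
          return 'POS' if lenient and label == 'NEU' else label
      agreed = sum(1 for s in samples if norm(a[s]) == norm(b[s]))
      return agreed / len(samples)

  def strongly_inconsistent(a, b, samples):
      """Items where one annotator says positive and the other says negative."""
      return [s for s in samples if {a[s], b[s]} == {'POS', 'NEG'}]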

4.2 Opinion extraction algorithms

The extraction of an opinion passage and the determination of its tendency are not trivial. We should consider the topic specification, the keywords, and the surrounding words. The topic specification consists of two parts: the focus and the contents of an event. The focus has a strong relationship with the opinion types, and the contents determine whether a document or a passage is related to the topic specification. The opinion extraction algorithm employs the following cues:

  CW: a set of concept words in a topic
  SW: a set of supportive keywords
  NS: a set of not-supportive keywords
  OW: a set of opinion keywords
  NW: a set of neutral keywords
  NG: a set of negation operators
  F:  the focus of a topic

CW and F are topic dependent, while SW, NS, OW, NW and NG are topic independent. The word-based opinion extraction algorithm works as follows, with a simplified sketch given after the listing.

(a) Passage level
(1) Delimit a passage by full stop, question mark and exclamation mark, segment the passage, and perform steps (2)-(8) until all the passages in the document have been read.
(2) If the passage does not contain any keywords, then it is not an opinion; move to step (1) to get the next passage.
(3) If the passage contains a designated number of concept words, then it is related to the topic, so go ahead and determine its type. Otherwise, move to step (1) to get the next passage; that is, although the passage may be an opinion, it is not related to the topic.
(4) If all the keywords in the passage are supportive, then check whether the surrounding words contain any negation operator. If no such operator exists, the passage is a positive opinion; increment the corresponding opinion counter and move to step (1). Otherwise, change the type of the negated supportive keyword into not-supportive and move to step (6).
(5) If all the keywords in the passage are not-supportive, then check whether the surrounding words contain any negation operator. If no such operator exists, the passage is a negative opinion; increment the corresponding opinion counter and move to step (1). Otherwise, change the type of the negated not-supportive keyword into supportive and move to step (6).
(6) If supportive and not-supportive keywords are mixed in the passage, the majority determines the passage type, i.e., supportive or not-supportive; move to step (1).
(7) If the passage contains only neutral keywords, then set the passage as neutral, increment the neutral counter, and move to step (1).
(8) If the passage contains only opinion keywords, then increment the neutral counter and move to step (1).

(b) Document level
(1) If the topic focus is "anti", as in the topic Anti-Meinung Dam Construction, which is related to environmental protection, then reverse the types of the passages and exchange the corresponding counters.
(2) If the number of positive opinions is larger than that of negative opinions, then the document is positive.
(3) If the number of negative opinions is larger than that of positive opinions, then the document is negative.
(4) If the number of neutral opinions is the largest among the passages, or the numbers of positive and negative opinions are the same, then the document is neutral.
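The sketch below is a compressed Python rendering of the passage- and document-level procedure above. It collapses steps (4)-(6) into a single negation-aware majority vote, so it is an approximation of the algorithm rather than a faithful transcription; all identifiers are my own.

  def classify_passage(tokens, CW, SW, NS, NW, OW, NG, min_concepts=1):
      """Return 'positive', 'negative', 'neutral', or None (not a relevant opinion)."""
      if not any(t in SW or t in NS or t in NW or t in OW for t in tokens):
          return None                                 # step (2): no keywords at all
      if sum(t in CW for t in tokens) < min_concepts:
          return None                                 # step (3): off-topic passage
      sup = sum(t in SW for t in tokens)
      nsp = sum(t in NS for t in tokens)
      if any(t in NG for t in tokens):                # steps (4)-(5): negation flips polarity
          sup, nsp = nsp, sup
      if sup > nsp:
          return 'positive'
      if nsp > sup:
          return 'negative'
      return 'neutral'                                # steps (7)-(8) and ties

  def classify_document(passages, cues, anti_focus=False):
      """Aggregate passage labels into a document label (document level (1)-(4))."""
      counts = {'positive': 0, 'negative': 0, 'neutral': 0}
      for tokens in passages:
          label = classify_passage(tokens, **cues)
          if label is not None:
              counts[label] += 1
      if anti_focus:                                  # an "anti-..." topic focus reverses polarity
          counts['positive'], counts['negative'] = counts['negative'], counts['positive']
      if counts['positive'] > counts['negative']:
          return 'positive'
      if counts['negative'] > counts['positive']:
          return 'negative'
      return 'neutral'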
5. CONCLUSION

In the design of CLIR and CLIE evaluation, the cost of preparing the answer sets is always an issue. We try to reuse the test beds set up in the previous NTCIR evaluations. This not only reduces the cost of developing test data, but also makes the evaluation of the individual modules of a complex system feasible. Take an opinion extraction system as an example. When answering the question "Why is Seed in favor of human cloning?", an opinion extraction system has to find the documents relevant to the topic "human cloning" and then report the positive opinions. The performance of the back-end information retrieval system therefore has a great effect on the front-end opinion extraction system. Consistent topics on the same document sets enable researchers to test pipelined modules incrementally.

6. ACKNOWLEDGMENTS

The author is very thankful for the efforts of the co-organizers of the NTCIR CLIR, CLQA and opinion analysis tasks.

7. REFERENCES

[1] Chen, H. H. and Koga, T. Open submission session. In Proceedings of the 5th NTCIR Workshop Meeting on Evaluation of Information Access Technologies (Tokyo, Japan, December 6-9, 2005). National Institute of Informatics, Tokyo, Japan, 2005, ntcir/workshop/onlineproceedings5/index.html.
[2] Chen, K. H. and Chen, H. H. Cross-language Chinese text retrieval in NTCIR workshop: towards cross-language multilingual text retrieval. ACM SIGIR Forum, 35, 2 (Fall 2001).
[3] Kishida, K., Chen, K. H., Lee, S., Chen, H. H., Kando, N., Kuriyama, K., Myaeng, S. H. and Eguchi, K. Cross-lingual information retrieval (CLIR) task at the NTCIR workshop 3. ACM SIGIR Forum, 38, 1 (June 2004).
[4] Kishida, K., Chen, K. H., Lee, S., Kuriyama, K., Kando, N., Chen, H. H., Myaeng, S. H. and Eguchi, K. Overview of CLIR task at the fourth NTCIR workshop. In Proceedings of the 4th NTCIR Workshop Meeting on Evaluation of Information Access Technologies (Tokyo, Japan, June 2-4, 2004). National Institute of Informatics, Tokyo, Japan, 2004.
[5] Kishida, K., Chen, K. H., Lee, S., Kuriyama, K., Kando, N., Chen, H. H. and Myaeng, S. H. Overview of CLIR task at the fifth NTCIR workshop. In Proceedings of the 5th NTCIR Workshop Meeting on Evaluation of Information Access Technologies (Tokyo, Japan, December 6-9, 2005). National Institute of Informatics, Tokyo, Japan, 2005.
[6] Ku, L. W., Ho, H. W. and Chen, H. H. Novel relationship discovery using opinions mined from the web. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06) (Boston, Massachusetts, July 16-20, 2006). AAAI Press, Menlo Park, California, 2006.
[7] Ku, L. W., Liang, Y. T. and Chen, H. H. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of the AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs (Stanford, USA, March 27-29, 2006). AAAI Technical Report SS-06-03, California, USA, 2006.
[8] Sasaki, Y., Chen, H. H., Chen, K. H. and Lin, C. J. Overview of the NTCIR-5 cross-lingual question answering task. In Proceedings of the 5th NTCIR Workshop Meeting on Evaluation of Information Access Technologies (Tokyo, Japan, December 6-9, 2005). National Institute of Informatics, Tokyo, Japan, 2005.
