Evaluation of some Information Retrieval models for Gujarati Ad hoc Monolingual Tasks

JOSHI Hardik, Department of Computer Science, Gujarat University, Ahmedabad, hardikjjoshi@gmail.com
PAREEK Jyoti, Department of Computer Science, Gujarat University, Ahmedabad, drjyotipareek@yahoo.com

Abstract: This paper describes work on the Gujarati ad hoc monolingual retrieval task for widely used Information Retrieval (IR) models. We present an indexing baseline for the Gujarati language, expressed as Mean Average Precision (MAP) values. Our objective is to obtain a relative picture of which IR models suit the Gujarati language better. Results show that a classical IR model such as Term Frequency Inverse Document Frequency (TF_IDF) performs better than several more recent probabilistic IR models. The experiments helped to identify the best-performing IR models for the Gujarati language.

Keywords: Automatic Indexing, Corpus, Gujarati, Information Retrieval, Mean Average Precision, Monolingual.

1 Introduction

Information retrieval (IR) is concerned with representing, searching, and manipulating large collections of electronic text data. IR is the discipline that deals with the retrieval of unstructured or partially structured data, especially textual documents, in response to a query or topic statement, which may itself be unstructured [6]. The typical interaction between a user and an IR system can be modeled as the user submitting a query, to which the system returns a ranked list of relevant documents, with the most relevant at the top of the list. The need for effective methods of automated IR has grown in importance because of the tremendous explosion in the amount of text documents and the growing number of document sources on the Internet. Over the last few years, there has been significant growth in the amount of text documents in Indian languages.
Researchers have been performing IR tasks for English and European languages for many years [2, 15], and efforts are being made to encourage IR tasks for the Indian languages [5, 14]. Most of the IR research community uses resources known as test collections [12]. Since the 1990s, TREC [15] has been conducting evaluation exercises using test collections. The classic components of a test collection are:

- A collection of documents; each document has a unique identifier, docid or docno.
- A set of topics (also referred to as queries); each query is uniquely identified by a qid or num.
- A set of relevance judgements (also referred to as qrels): a list of (qid, docid) pairs detailing the relevance of documents to topics.

In this paper, we describe the ad hoc monolingual IR task performed over a Gujarati-language test collection. In ad hoc querying, the user formulates any number of arbitrary queries but applies them to a fixed collection [7]. We have considered Gujarati because no such tasks have been performed for the Gujarati language, although some work already exists for the Bengali, Hindi and Marathi languages [10, 13]. Apart from this, Gujarati is spoken by nearly 50 million people worldwide and is the official language of the state of Gujarat [9].

The rest of the paper is organized as follows: Section 2 gives an overview of the experimental setup used to perform the ad hoc task. Section 3 presents an overview of the evaluation conducted. Section 4 presents the results obtained during the experiment, and finally Section 5 concludes the experiments.

2. Experimental Setup

2.1 Overview of the Corpus

The test collection used for this experiment is the collection made available during FIRE 2011 [5]. The details of the Gujarati collection are given in Table 1. The collection was created from the archives of the daily newspaper Gujarat Samachar, from 2001 to 2010. Each document represents a news article from Gujarat Samachar. The average number of tokens per document is 445.

Table 1: Statistics of the Gujarati collection

  Size of collection          2.7 GB
  Number of text documents    313,163
  Size of vocabulary          2,092,619
  Number of tokens            139,272,906

The corpus is coded in UTF-8 and each article is marked up using the following tags:

  <DOC>             : start of the document
  <DOCNO> </DOCNO>  : unique identifier of the document
  <TEXT> </TEXT>    : contains the document text
  </DOC>            : end of the document

2.2 Queries

The IR models were tested against 50 different queries in the Gujarati language. Following the TREC model [16], each query is divided into three sections: the title (T), which gives a brief title; the description (D), which gives a one-sentence description; and the narrative part (N), which specifies the relevance assessment criteria. Below is an example of a single query from the collection of 50 queries.
<top>
<num>150</num>
<title>બલ ગ સ ન પર પક ર ય ન.</title>
<desc>બલ ગ સન મ ઇ સ ટ થ ન થઈન દ ન કરવ ન નણય.</desc>
<narr>સબ ધત દ ત વ જ મ મ ઇ સ ટ ન ય બલ ગ સ ત ન પ ટ પરથ ન થઈન દ ન અન સ મ જક ક મ કરવ ન નણય વષ ન મ હત હશ.</narr>
</top>
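The topic markup above, together with the qrels component of a test collection, can be loaded with a few lines of Python. This is a minimal sketch under standard TREC conventions; the function names and the four-column qrels layout (qid, iteration, docid, relevance) are assumptions, not artefacts from the paper:

```python
import re

def parse_topics(text):
    """Split a TREC-style topic file into dicts keyed by num/title/desc/narr."""
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, re.DOTALL):
        topic = {}
        for tag in ("num", "title", "desc", "narr"):
            m = re.search(r"<{0}>(.*?)</{0}>".format(tag), block, re.DOTALL)
            if m:
                topic[tag] = m.group(1).strip()
        topics.append(topic)
    return topics

def parse_qrels(lines):
    """Read 'qid iter docid rel' lines into {qid: {docid: rel}}."""
    qrels = {}
    for line in lines:
        qid, _iteration, docid, rel = line.split()
        qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels
```

Since parse_topics returns one dict per <top> block, the T, TD and TDN query formulations can be built by concatenating the title, desc and narr fields as required.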

2.3 IR Models

In the experiment, we compared various models that are widely used in test-collection evaluation exercises. We considered the classical Term Frequency Inverse Document Frequency (TF_IDF) model; language models such as the Hiemstra language model (Hiemstra_LM) [8]; probabilistic models such as Okapi (BM25); and the Divergence from Randomness (DFR) family of models [1], including: the Bose-Einstein randomness model, which uses the ratio of two Bernoulli processes for the first normalisation and Normalisation 2 for term frequency normalisation (BB2); the hyper-geometric DFR model DLH and its improvement DLH13; the Divergence From Independence model (DFI0); a different hyper-geometric DFR model using Popper's normalisation (DPH), which is parameter-free; a hyper-geometric DFR model that takes the average of two information measures (DFRee); the Inverse Term Frequency model with Bernoulli after-effect (IFB2); the Inverse Expected Document Frequency model with Bernoulli after-effect (In_expC2); the Inverse Document Frequency model with Laplace after-effect (InL2); the Poisson model with Laplace after-effect (PL2); a log-logistic DFR model (LGD) [4]; and an unsupervised DFR model that computes the inner product of Pearson's X^2 (XSqrA_M). In all, seventeen different models were evaluated. A few of the models require parameter values; we used the default values that are generally applied in similar tests.

3 Evaluations

In earlier years, IR systems were evaluated using measures such as Precision, Recall and Fallout [3]: precision measures the fraction of retrieved documents that are relevant, recall measures the fraction of relevant documents that are retrieved, and fallout measures the fraction of non-relevant documents that are retrieved. In recent years, Mean Average Precision (MAP) values are considered to give the best judgment in the presence of multiple queries [11]. In our experiments, the mean average precision (MAP) values were used to evaluate retrieval performance.
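As an illustration of the classical model in this comparison, a bare-bones TF-IDF ranker can be sketched as follows. This is a textbook log-tf / smoothed-idf variant written for clarity; it is not the exact weighting formula used in the experiments, and the tokenised-document layout is an assumption:

```python
import math
from collections import Counter

def tf_idf_rank(query_terms, docs):
    """Rank documents by a simple TF-IDF score.

    docs maps docid -> list of tokens. Uses log-scaled term frequency
    and a smoothed inverse document frequency; a simplification, not
    the exact TF_IDF formula evaluated in the paper.
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for docid, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log((n + 1) / (df[term] + 1))
                score += (1 + math.log(tf[term])) * idf
        scores[docid] = score
    # highest-scoring documents first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```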
This measure is highly recommended by the TREC community and provides a single-figure measure of quality across recall levels. Among evaluation measures, MAP has been shown to have especially good discrimination and stability. Average Precision is the average of the precision values obtained for the set of top k documents after each relevant document is retrieved, and this value is then averaged over the queries to obtain MAP. So, if each query q_j belongs to a set of queries Q, the set of relevant documents for query q_j is {d_1, ..., d_mj}, and R_jk is the set of ranked retrieval results from the top result down to document d_k, then MAP can be calculated using equation 1:

  MAP(Q) = (1/|Q|) * sum_{j=1}^{|Q|} (1/m_j) * sum_{k=1}^{m_j} Precision(R_jk)    (1)

4. Results

The MAP values of the various IR models are given in Table 2; the top three performers, with the highest MAP values, are In_expC2, InL2 and TF_IDF. We evaluated these outperforming IR models with some further evaluation measures (gm_map, Rprec, bpref and reciprocal rank) to compare them in more detail. The results of this further investigation are presented in Table 3.

Table 2: MAP values of various IR models

  IR Model       T        TD       TDN
  BB2            0.2203   0.2371   0.0965
  BM25           0.2181   0.2493   0.1742
  DFI0           0.1940   0.1267   0.0184
  DFR_BM25       0.2192   0.2512   0.1747
  DFRee          0.2149   0.2189   0.1229
  DLH            0.2143   0.2314   0.1566
  DLH13          0.2177   0.2282   0.1449
  DPH            0.2197   0.2380   0.1525
  Hiemstra_LM    0.2035   0.2350   0.1806
  IFB2           0.2156   0.1900   0.0639
  In_expC2       0.2238   0.2619   0.1989
  InL2           0.2240   0.2531   0.1759
  Js_KLs         0.2183   0.2234   0.1287
  LGD            0.2186   0.2197   0.1343
  PL2            0.1991   0.2422   0.1468
  TF_IDF         0.2198   0.2512   0.1698
  XSqrA_M        0.2181   0.2239   0.1309

Table 3: Further evaluation metrics for the top IR models

  Measure        In_expC2   InL2     TF_IDF
  iprec_at_0.0   0.7382     0.7198   0.7446
  Precision@5    0.4480     0.4960   0.5160
  gm_map         0.1894     0.1706   0.1689
  Rprec          0.3014     0.2847   0.2892
  bpref          0.3129     0.3072   0.3080
  recip_rank     0.6654     0.6678   0.6971

The results in Table 3 show that TF_IDF outperforms In_expC2 and InL2 in terms of precision. A better picture of the TF_IDF model can be obtained from Figure 1, which shows the Precision-Recall curve for the TF_IDF model.
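The headline measures can be computed directly from a ranked run and the relevance judgements. The sketch below implements MAP as defined in equation 1 together with reciprocal rank; the function names and the run/qrels data layout are my own, not from the paper:

```python
def average_precision(ranking, relevant):
    """AP: mean of precision@k taken at each rank k where a relevant doc appears."""
    hits, total = 0, 0.0
    for k, docid in enumerate(ranking, start=1):
        if docid in relevant:
            hits += 1
            total += hits / k          # precision at this relevant rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """MAP over queries; runs/qrels map qid -> ranked docids / set of relevant docids."""
    aps = [average_precision(runs[qid], qrels[qid]) for qid in runs]
    return sum(aps) / len(aps)

def reciprocal_rank(ranking, relevant):
    """1/rank of the first relevant document, 0 if none is retrieved."""
    for k, docid in enumerate(ranking, start=1):
        if docid in relevant:
            return 1.0 / k
    return 0.0
```

Note that average_precision divides by the number of relevant documents (m_j in equation 1), so relevant documents never retrieved contribute zero, as in standard trec_eval behaviour.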

[Figure 1: Precision-Recall curve for the TF_IDF model, plotted for the T, TD and TDN query formulations; precision (0 to 0.8) against recall (0.0 to 1.0).]

5. Conclusion

From the results of the experiments carried out to evaluate various IR models, we can conclude that the In_expC2, InL2 and TF_IDF models perform well on the Gujarati newswire corpus. This is surprising, because TF_IDF is a classical model that does not give good results for languages like English; the reason for its performance on Gujarati needs to be investigated. Further investigation shows that the best precision is obtained from the TF_IDF model. A better Precision-Recall curve is obtained when the Title (T) and the short Description (D) are considered together in the queries. Through our experiments, we were able to generate a baseline for each of the IR models for the Gujarati language.

References:

1 Amati, G., van Rijsbergen, C.J.: "Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness", ACM Transactions on Information Systems, 20, 357-389, (2002)
2 CLEF, Conference and Labs of the Evaluation Forum, http://www.clefinitiative.eu/ (accessed 17th Jan, 2012)
3 Cleverdon, C.W., Keen, M.: "Factors Affecting the Performance of Indexing Systems", Vol. 2, ASLIB, UK, pp. 37-59, (1966)
4 Clinchant, S., Gaussier, E.: "Bridging Language Modeling and Divergence From Randomness Approaches: A Log-logistic Model for IR", Proceedings of ICTIR, London, UK, (2009)
5 FIRE, Forum for Information Retrieval Evaluation, http://www.isical.ac.in/~clia/ (accessed 17th Jan, 2012)

6 Greengrass, E.: "Information Retrieval: A Survey", 141-163, November (2002)
7 Harman, D.: "Overview of the Second Text Retrieval Conference (TREC-2)", Information Processing & Management, 31(3), 271-289, May (1995)
8 Hiemstra, D.: "Using Language Models for Information Retrieval", Ph.D. Thesis, (2000)
9 Lewis, M. Paul (ed.): "Ethnologue: Languages of the World", Sixteenth edition, SIL International, Dallas, Texas, (2009)
10 Majumder, P. et al.: "The FIRE 2008 Evaluation Exercise", ACM Transactions on Asian Language Information Processing (TALIP), 9(3), September (2010)
11 Manning, C.D., Raghavan, P., Schütze, H.: "Introduction to Information Retrieval", Cambridge University Press, New York, USA, (2009)
12 Sanderson, M.: "Test Collection Based Evaluation of Information Retrieval Systems", Foundations and Trends in Information Retrieval, 4(4), 247-375, (2010)
13 Savoy, J.: "Comparative Study of Monolingual and Multilingual Search Models for Use with Asian Languages", ACM Transactions on Asian Language Information Processing, 4, 163-189, (2005)
14 Technology Development for Indian Languages, http://tdil.mit.gov.in (accessed 17th Jan, 2012)
15 TREC, Text Retrieval Conference, http://trec.nist.gov/ (accessed 17th Jan, 2012)
16 Voorhees, E.M., Harman, D.K. (eds.): "TREC: Experiment and Evaluation in Information Retrieval", The MIT Press, Cambridge (MA), (2005)