Information Retrieval Term Project : Incremental Indexing Searching Engine
Chi-yau Lin
r @ntu.edu.tw
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan

Abstract

Retrieving information that is useful to the user is the goal of our work. In this report we present our method for building an information retrieval (IR) system with the Lemur Toolkit from CMU [1]. The Lemur Toolkit is designed to facilitate research in language modeling and information retrieval, where IR is broadly interpreted to include such technologies as ad hoc and distributed retrieval, cross-language IR, summarization, filtering, and classification. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or sub-collections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models.

1. Introduction

"Unfortunately the word information can be very misleading. An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request."

In this project, we build an information retrieval system to solve the problem posed on the course web site. Section 2 describes the problem we want to solve. Section 3 presents our approach to that problem. Section 4 reports our experimental results, from which the performance of our IR system can be judged. Section 5, the final section, describes the problems we met and concludes the report.

2. Problem definition

The problem we want to solve is how to manage a large number of documents and build good data structures over the text for an IR model, so that search is fast. With such a model, we have an efficient search engine that finds the pattern we want in a short time.
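A common data structure for this purpose is the inverted index, which maps each term to the documents that contain it. The following is a minimal Python sketch of the idea, including how a second batch of documents can be appended to an existing index. It is an illustration only and not how the Lemur Toolkit is implemented: the class InvertedIndex, its methods, and the sample documents are all hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> {doc_id: term frequency}.

    Hypothetical illustration; not the Lemur Toolkit's data structure.
    """

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_lengths = {}

    def add_documents(self, docs):
        """Index a batch of {doc_id: text}.

        Calling this again later appends a new batch without
        re-reading the documents that were indexed earlier.
        """
        for doc_id, text in docs.items():
            if doc_id in self.doc_lengths:
                raise ValueError(f"document {doc_id!r} is already indexed")
            tokens = text.lower().split()
            self.doc_lengths[doc_id] = len(tokens)
            for token in tokens:
                counts = self.postings[token]
                counts[doc_id] = counts.get(doc_id, 0) + 1

    def lookup(self, term):
        """Return the sorted ids of the documents containing the term."""
        return sorted(self.postings.get(term.lower(), {}))


index = InvertedIndex()
# Full indexing of a first (made-up) batch of documents.
index.add_documents({"d1": "trade talks resume", "d2": "crime report"})
# Incremental indexing: a second batch is added to the same index.
index.add_documents({"d3": "organized crime and trade"})

print(index.lookup("crime"))  # ['d2', 'd3']
print(index.lookup("trade"))  # ['d1', 'd3']
```

A lookup touches only the posting list of the query term, so its cost does not depend on scanning every document, and the second call to add_documents only processes the new batch.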
Five query topics, {301, 302, 304, 306, 307}, are taken from the relevance judgements of the TREC-6 ad hoc track. The documents are divided into two sets: FBIS3 with 236 files and FBIS4 with 256 files. The task is to design an IR model, build an index for FBIS3, and then incrementally index FBIS4 on top of FBIS3. After that, the relevance model is evaluated, and the performance of the designed IR model can be read from the output files. The output files contain Full Index Time, Incremental Time, Search Time, Average Precision, Precision at R(30%) and Precision at 10 docs.

3. Approach

We use the Lemur Toolkit for language modeling and information retrieval to carry out our term project, the Incremental Indexing Search Engine. The toolkit was designed and written at Carnegie Mellon University and the University of Massachusetts. It supports the indexing of large-scale text databases, query formulation, and retrieval from the indexed database. All of the features we need to build our Incremental Indexing Search Engine are provided by this toolkit. There are three major steps, shown in Figure 1.

3.1 Parsing Queries

In the first step, we parse the text queries (topics.txt) into a format we can use. We use one of the toolkit's modules, ParseToFile, for query formulation, and configure it with the parameters below.

outputfile = query
stopwords = stopwords.txt
docformat = web
stemmer = porter

We give the parser a list of stop words and use the Porter stemmer, a well-known stemming algorithm, to improve query formulation. Figure 2 shows the process of parsing the queries.

3.2 Indexing Document Collections

Second, we build our search database by indexing the document collections FBIS3 and FBIS4. The indexing module we use is IncIndexer, which is able to do incremental indexing. We configure it with the parameters below.

index = ./project/myindex396
memory =
stopwords = stopwords.txt
docformat = trec
stemmer = porter
datafiles = FBIS396

We use 512 MB of memory for the push index (InvFPPushIndex) and the same stopwords.txt mentioned above. The documents are in the standard TREC format, and the Porter stemmer is again used for stemming. The last parameter, FBIS396, is a file containing the list of data files to index. We use this module to build three indexes: (1) an index of FBIS3, (2) an index of FBIS3 incrementally extended with FBIS4, and (3) a full index of FBIS3 + FBIS4. Figures 3, 4, and 5 present the indexing process for these three cases.

3.3 Retrieving Documents

The third module we use is RetEval. It provides several retrieval models, and we choose the popular TFIDF model. The module runs retrieval experiments with the parameters below.

retmodel = 0
index = ./project/myindex396:ifp
textquery = query
resultfile = res396.simpletfidf
resultcount = 400
resultformat = 1
doc.tfmethod = 1
query.tfmethod = 1

We use log-TF as the TF weighting method for both document terms and query terms. Figures 6 and 7 show the retrieval runs on FBIS3 and on FBIS3 + FBIS4.

4. Experiment

The results of our experiments follow. The performance is not as good as we expected, and we suspect that we made a mistake in how we used the Lemur tools. The full index time, incremental time and search time are reasonable, and even the precision at 10 docs is acceptable, but the average precision and the precision at R(30%) are quite low. We do not yet know the exact reason and are still trying to find it. Table 1 shows the results without stemming and Table 2 shows the results with stemming. The performance is much better without stemming.

Table 1. Experiment results without stemming

               Full Index    Incremental    Search        Average      Precision    Precision
               Time (sec)    Time           Time (sec)    Precision    at R(30%)    at 10 docs
FBIS3          s                            2.8s
FBIS3+FBIS4    s             s              5.5s
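The effectiveness measures reported in the result tables can be computed from a ranked result list and a set of relevant documents. The following Python sketch shows one way to do so. It is not the evaluation code used in this project: the function names and the toy ranking are hypothetical, and "Precision at R(30%)" is read here as interpolated precision at 30% recall, which is an assumption.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k


def average_precision(ranking, relevant):
    """Sum of the precision values at each rank where a relevant document
    appears, divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def precision_at_recall(ranking, relevant, level):
    """Interpolated precision: the best precision at any rank whose
    recall is at least `level` (0.0 if that recall is never reached)."""
    best, hits = 0.0, 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            if hits / len(relevant) >= level:
                best = max(best, hits / rank)
    return best


# Made-up ranking and relevance judgements for a single query.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

print(precision_at_k(ranking, relevant, 5))               # 0.4
print(round(average_precision(ranking, relevant), 4))     # 0.3333
print(precision_at_recall(ranking, relevant, 0.30))       # 0.5
```

In the toy example the relevant document d5 is never retrieved, which lowers the average precision but does not affect the precision at 30% recall.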
Table 2. Experiment results with stemming

               Full Index    Incremental    Search        Average      Precision    Precision
               Time (sec)    Time           Time (sec)    Precision    at R(30%)    at 10 docs
FBIS3          s                            2s
FBIS3+FBIS4    483s          217.8s         3.4s

There was a problem in our experiment. In Fig. 1, see columns 9 and 18. We use a module named RetEval, which has several parameters that we need to tune. The parameters are shown in Figure 9.

5. Conclusion

This is the first time we have used the Lemur system to build an IR system, and we encountered many problems while using the toolkit. Partway through the project we considered building the system on our own. However, the Lemur toolkit supports the construction of basic text retrieval systems using language modeling methods, as well as traditional methods such as those based on the vector space model and Okapi. As the toolkit evolves, it is expected to support research in a broader range of information technologies such as filtering and even question answering. In short, the toolkit is attractive enough that we decided to keep using it.

Lemur has many applications for indexing and retrieval that are fully functional for many purposes, so we use them almost "out of the box". In addition, since Lemur was written to facilitate research on LM and IR, its design allows us to try out new retrieval methods by subclassing abstract interfaces, or to write new applications based on existing methods. This flexibility was also a big problem for us, because we did not clearly understand the parameters the toolkit defines. We tried many times to tune the parameters in order to get better results, but the results remained unsatisfactory, so we searched for our problem in the toolkit's public forum. This forum is for the users and developers of the Lemur toolkit to discuss the software, share tips on using Lemur, and ask questions. The developers of the toolkit monitor the forum on a regular basis. In the forum, we found reports of many problems in the toolkit. Some of the code in the Lemur toolkit contains errors, which we learned about in this forum. The performance is not as good as expected.

6. Appendix
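As a supplement to the Retrieving Documents step, the idea behind TFIDF ranking with log-TF weighting can be illustrated with a small Python sketch. It is not the RetEval implementation. The formulas used here, 1 + ln(tf) for the term frequency and ln(N / df) for the inverse document frequency, are common textbook variants chosen as assumptions, and the exact formulas in Lemur may differ. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def log_tf(count):
    """Dampened term frequency: 1 + ln(count) for count > 0, else 0."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def rank(query, docs):
    """Rank documents by the dot product of log-TF * IDF weights of the
    query and the document; returns (doc_id, score) pairs, best first.
    Documents with a zero score are left out."""
    n_docs = len(docs)
    doc_tfs = {doc_id: Counter(text.lower().split())
               for doc_id, text in docs.items()}

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tf in doc_tfs.values():
        df.update(tf.keys())

    query_tf = Counter(query.lower().split())
    scores = {}
    for doc_id, tf in doc_tfs.items():
        score = 0.0
        for term, q_count in query_tf.items():
            if term in tf:
                idf = math.log(n_docs / df[term])
                score += (log_tf(q_count) * idf) * (log_tf(tf[term]) * idf)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up collection and query.
docs = {
    "d1": "drug trade drug routes",
    "d2": "trade agreement signed",
    "d3": "weather report",
}
results = rank("drug trade", docs)
print([doc_id for doc_id, _ in results])  # ['d1', 'd2']
```

The logarithm keeps a term that occurs many times in one document from dominating the score, and the IDF factor gives rare terms such as "drug" more weight than terms that occur in most documents.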
Figure 1. Flow Overview
Figure 2. Process of parsing query
Figure 3. The parameter of indexing FBIS3
Figure 4. The parameter of incremental indexing FBIS4
Figure 5. The parameter of indexing FBIS3 + FBIS4
Figure 6. The retrieval of FBIS3
Figure 7. The retrieval of FBIS3 + FBIS4
Figure 8. Selection in Retrieval model

7. Reference

[1] CMU Lemur
[2] Information Retrieval Data Structures & Algorithms
More informationIn this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc
JavaCC: LOOKAHEAD MiniTutorial 1. WHAT IS LOOKAHEAD The job of a parser is to read an input stream and determine whether or not the input stream conforms to the grammar. This determination in its most
More informationQ: Given a set of keywords how can we return relevant documents quickly?
Keyword Search Traditional B+index is good for answering 1-dimensional range or point query Q: What about keyword search? Geo-spatial queries? Q: Documents on Computer Science? Q: Nearby coffee shops?
More informationLatent Semantic Indexing
Latent Semantic Indexing Thanks to Ian Soboroff Information Retrieval 1 Issues: Vector Space Model Assumes terms are independent Some terms are likely to appear together synonyms, related words spelling
More informationAn Improvement of Centroid-Based Classification Algorithm for Text Classification
An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,
More informationAutomatic Term Mismatch Diagnosis for Selective Query Expansion
Automatic Term Mismatch Diagnosis for Selective Query Expansion Le Zhao Language Technologies Institute Carnegie Mellon University Pittsburgh, PA, USA lezhao@cs.cmu.edu Jamie Callan Language Technologies
More informationTRIS Teaching Resource Information Service
TRIS Teaching Resource Information Service Newsletter Issue 4 The TRIS - team at the Faculty of Sciences, University of Kent, Canterbury Funded by Challenge Fund, UELT P DF (portable document format) documents
More information4.Task Abstraction. Prof. Tulasi Prasad Sariki SCSE, VIT, Chennai 1 / 16
4.Task Abstraction Prof. Tulasi Prasad Sariki SCSE, VIT, Chennai www.learnersdesk.weebly.com 1 / 16 Outline Why Analyze Tasks Abstractly? Who: Designer or User Actions Analyze Consume, Produce Search Lookup,
More informationHistorical Clicks for Product Search: GESIS at CLEF LL4IR 2015
Historical Clicks for Product Search: GESIS at CLEF LL4IR 2015 Philipp Schaer 1 and Narges Tavakolpoursaleh 12 1 GESIS Leibniz Institute for the Social Sciences, 50669 Cologne, Germany firstname.lastname@gesis.org
More informationInformation Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes
CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten
More information