Document Clustering for Mediated Information Access The WebCluster Project
|
|
- Alexina Quinn
- 5 years ago
- Views:
Transcription
1 Document Clustering for Mediated Information Access The WebCluster Project School of Communication, Information and Library Sciences Rutgers University The original WebCluster project was conducted at the Robert Gordon University, Aberdeen, UK. It was supervised by Prof. David J. Harper and sponsored by Ubilab, Zurich. Current work is being conducted in collaboration with Ph.D. student Hyuk-Jin Lee and Prof. Nicholas J. Belkin. Exploratory Search Interfaces: Categorization, Clustering and Beyond Workshop at HCIL 2005, University of Maryland, June 2, 2004
2 WebCluster - Motivation Information Need (within some subject domain) WWW_SearchEngine Query Search engine Domain Gulfs information need query structured subject domain unstructured target collection (WWW)
3 Interaction in the library Information need 1. Select library 2. Consult catalog Information Need Formulation 3. Browse shelves 4. Use inter-library scheme
4 Can we simulate the library interaction? Information need Structured source collections 3. Search WWW 1. Select source collection Results Information Need Formulation Results 2. Explore source collection with ClusterBook
5 The mediated access interaction Information need WebCluster Specialised source Query Web search engine Topical documents Target collection (WWW)
6 Interaction model vs. prototype Structuring the source collection w Document clustering w Supervised classification w Manual (intellectual) classification Exploring the structured source collection w Metaphor Library, book, encyclopaedia w Visualization tool Folder metaphor, hyperbolic tree, themescape, cone trees, thematic maps w Search strategies supported Best match or cluster-based searching, browsing
7 Model vs. prototype Interaction model w Explicit (the user marks relevant documents) vs. implicit (cues on relevance are derived based on user behavior/actions) w Transparent (the user is aware) vs. opaque (the user is happy to see effect of magic ) w Automatic vs. manual/intellectual generation of the mediated query Query model w Language models (generative, Kullback-Leibler) w Probabilistic models w Rocchio or other RF-specific formulae
8 ClusterBook - Source collection
9 ClusterBook - Target collection
10 Informal experiments - Objectives - Test the users reaction to the mediated access concept Test the user satisfaction regarding the functionality of the system, and the relevance of the documents retrieved Formative usability testing - some volunteers were not only experienced searchers, but also had experience in evaluating IR systems Comparison of user generated queries vs. system generated queries Note. These experiments were run at different stages of the development
11 Informal experiments - Experimental procedure - Subjects received introduction to the system Task assigned: You are a trainee in a newspaper. You support the journalists by providing information for the topic of their articles. Sample topics: w The history of the Brasilian debt crisis w How are the quotas for growing coffee set and controlled on a world-wide basis? Source collection: a sub-collection of Reuters (newspaper articles) Steps followed by users (explicit scenario): w Formulate a query and record it w Browse source collection, select best cluster, edit query generated by system, submit it to the search engine w Submit to the same search engine the initial, self-generated query w Compare results of the two searches
12 Informal experiments - Results - Users found the mediation useful for unfamiliar topics The system nearly always proposed new, good query terms Users not always good at recognizing good query terms The system proposed bad query terms (not specific to the topic) the opaque scenario not viable unless the query formulation is improved The two-step process was questioned when: w the query formulation was considered easy, for a familiar topic w the documents of the source collection were considered sufficient to cover the information need Complete link, group average OK; single link bad Overall, the system is usable
13 Consequences of informal experiments Formal experiments are needed to verify the main assumptions: w The Cluster Hypothesis holds for a specialized collection w Good clusters can be found with the search strategies provided w Mediated queries can improve retrieval effectiveness The effect on retrieval performance of various parameters should be compared w Weighting schemes w Clustering methods w Search strategies
14 Critical issue: The label generation w Document representatives w searching w Cluster representatives w browsing w searching w mediation w Collection representatives w collection selection Wind Energy Power Generation Propulsion Fixed Plants... Coastal Wind Farms Inland Wind Farms Portable Generators... Pacific Rim Wind Farms Design of Coastal Wind Farms Desert Design of Wind Farms. Wind generators for yachts
15 Mediation experiment - simulations Objectives: w Test the potential of mediation to increase retrieval effectiveness w Test the effect on performance of a variety of parameters Cluster-based mediation (realistic mediation) Search engine Topic-based mediator (upperbound) Search engine Source collection Simple query generator (baseline) Target collection
16 Experimental setup Interactive track of TREC-8 w Offers relevance judgments for complex topics, with a multitude of aspects w Offers the experimental design for the user experiment w Six topics with 12 to 56 aspects each w Target collection: FT , with 210,158 articles w Source collection built based on relevance judgments: half of the relevant documents, their nearest neighbors, plus the documents judged non-relevant
17 Results the cluster hypothesis Aspectual cluster hypothesis confirmed by an extended version of the van Rijsbergen Sparck Jones separation test w Similarity between pairs of docs covering the same aspect is higher than between pairs of docs covering the same topics, which is higher than between pairs of docs in the collection Consequence confirmed: clustering groups documents in pockets of relevance
18 Results retrieval effectiveness Tf-Idf > KL > RelFreq as weighting schemes for document representation Adding disambiguation terms to the query increases recall, but decreases precision Nearest-neighbor mediation ( more like this ) highly significantly improves both recall and precision, even if just one exemplary document is offered for each topic aspect Cosine and Dice performs similarly
19 Mediation results Upperbound experiment (all relevant docs known in source) w Both recall and precision increase with query length w Query term weights strongly affect performance w No evidence that uniformity of term frequency affects performance Clustered source mediation w Best cluster mediation increases P, decreases R w Fuse and search strong increase in R and P w Search and fuse good R, terrible P!
20 Result presentation (within subjects) User experiment effectiveness of mediated information retrieval for Web searches Query formulation (between subjects) Linear (list) Structured (cluster) Unaided Baseline On the fly clustering Mediated Source-based mediation Source & target based mediation
21 User experiment no mediation
22 User experiment mediated access
23 User experiment mediated access
24 Contributions of WebCluster Proposes and explores system-based mediated access to very large heterogeneous document collections Explores the use of clustering for capturing the topical, semantic structure of a problem domain (as represented by a specialized collection) Explores the use of language models for building cluster and document representatives Offers a framework for building structured portals on the WWW Offers a framework for building collaborative environments
25 WebCluster - Other applications CD-ROM based collections w structured access to the collection itself w mediated access to WWW (via CD-ROM) Mediated access (portals) via hierarchically structured information sources Examples are: via large structured report (e.g. government reports), via structured collection of information (e.g. encyclopaedia), via intranet Multimedia information access w cluster multimedia source, e.g. annotated photographs w mediated access to other photographs (not annotated)
26 Other directions for WebCluster Clustered vs. categorized source collection Language model based labels vs. specialized terminology based on a domain ontology / thesaurus Different interaction and visualization metaphors w Spring-embedded algorithms for 2D representation of clusters Various inter-document similarities (faceted?) User profiles / personalization w Change of interest over time
27 Topic representation What (weighted) terms best describe a topic? w Applications: w Clustering generating cluster representatives w Mediation generating mediated queries w Machine generation w Simulation based on test topics and relevance judgments w Use various weighting formulae and cut-off points w Which representations are more effective? What do they have in common / specific? w Human (manual / intellectual generation) w Compare queries generated by searchers in TREC tasks Effectiveness Keyword vs. natural language queries
28 Questions?
29 Query formulation problems Vague information need Vocabulary mismatch Difficulty of query language syntax Lack of context, ambiguity of terms Lack of a search strategy No understanding of the underlying indexing/searching model Note. TREC experiments have shown that the quality of the query has a higher impact on retrieval effectiveness than weighting schemes or search algorithms.
30 Role of structure Reveals the semantic structure of the domain & its concepts Groups (semantically?) similar documents Supports exploration and concept formation Supports term disambiguation (context) (Has potential for efficient retrieval) (Has potential for effective retrieval) Science... Computing, Mathematics Physics Computing Computer Programming language Mathematics... Screen Keyboard Pascal C++ Algebra
31 R i = Browsing label (relative cluster representative) KL ( cluster, parent) = i p i, cluster log p p i, cluster i, parent Wind Energy Power Generation Propulsion... Fixed Plants Coastal Wind Farms Inland Wind Farms Portable Generators... Pacific Rim Wind Farms Design of Coastal Wind Farms Desert Wind Farms Design of. Wind generators for yachts
32 A i = Searching label (absolute cluster representative) KL ( cluster, collection) = i p i, cluster log p p i, cluster i, collection Wind Energy Power Generation Propulsion... Fixed Plants Coastal Wind Farms Inland Wind Farms Portable Generators... Pacific Rim Wind Farms Design of Coastal Wind Farms Desert Wind Farms Design of. Wind generators for yachts
33 E i = (1 ω) + (1 ω) ω Mediation label (Expanded cluster representative) A r 1 i,0 A + (1 ω) ω i, r 1 + r ω A i, r A i,1 2 + (1 ω) ω A i,2 Wind Energy +... Power Generation Propulsion... Fixed Plants Coastal Wind Farms Inland Wind Farms Portable Generators... Pacific Rim Wind Farms Design of Coastal Wind Farms Desert Wind Farms Design of. Wind generators for yachts
34 Topic model representations Exemplary representation Statistical analysis Statistical representation Language model Context analysis Keyword representation Typical terms, weighted Thresholding Mediated query
35 The cluster hypothesis Reminder: the original cluster hypothesis w Closely associated documents tend to be relevant to the same requests (van Rijsbergen) Aspectual cluster hypothesis: Highly similar documents tend to be relevant to the same topic. However, documents relevant to the same topic may be quite dissimilar if they cover distinct aspects of the topic. w Consequence: Clustering algorithms tend to group together documents that cover highly focused topics, or aspects of complex topic. Documents covering distinct aspects of complex topics tend to be spread over the cluster structure.
36 Aspects of relevance in the mediated access process
37 Distribution of relevant documents in clusters Clustering vs. Random 25% 20% 15% Recall 10% 5% 0% Clusters Clustering Random
38 Name w Transparent mediated access Targeted users w Experienced searchers WebCluster scenario#1 Web Search Engine WWW c 0 c 3 c 2 c 5 Specific w The users are aware of the mediation process, of the separation between the source and target collections w The users have the option to edit the query generated (proposed) by the system. They understand the indexing / searching model. W e b C l u s t e r c 0 c 1 c 2 c 3 c 4 c 5 Document from the source collection Document from the target collection (WWW)
39 WebCluster scenario#2 Name W e b C l u s t e r c 0 c 1 c 2 c 3 c 4 c 5 w Opaque mediated access Targeted users w Naive / beginner searchers Web Search Engine c 3 WWW c 0 c 2 c 5 Document from the source collection Document from the target collection (WWW) Specific w The users explore the structure of the domain, which contains sample documents, and have the option of asking for similar documents w The users are unaware of the mediation process - the query generation and target search are not visible
40 Initial user interface (Java AWT)
Information Retrieval
Information Retrieval An Introduction The view of an open-minded computer scientist What is Information Retrieval? The process of actively seeking out information relevant to a topic of interest (van Rijsbergen)
More informationInformation Retrieval (Part 1)
Information Retrieval (Part 1) Fabio Aiolli http://www.math.unipd.it/~aiolli Dipartimento di Matematica Università di Padova Anno Accademico 2008/2009 1 Bibliographic References Copies of slides Selected
More informationA New Measure of the Cluster Hypothesis
A New Measure of the Cluster Hypothesis Mark D. Smucker 1 and James Allan 2 1 Department of Management Sciences University of Waterloo 2 Center for Intelligent Information Retrieval Department of Computer
More informationText Categorization (I)
CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization
More informationReading group on Ontologies and NLP:
Reading group on Ontologies and NLP: Machine Learning27th infebruary Automated 2014 1 / 25 Te Reading group on Ontologies and NLP: Machine Learning in Automated Text Categorization, by Fabrizio Sebastianini.
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationInformation Retrieval. (M&S Ch 15)
Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion
More informationLecture 7: Relevance Feedback and Query Expansion
Lecture 7: Relevance Feedback and Query Expansion Information Retrieval Computer Science Tripos Part II Ronan Cummins Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationInformation Retrieval and Knowledge Organisation
Information Retrieval and Knowledge Organisation Knut Hinkelmann Content Information Retrieval Indexing (string search and computer-linguistic aproach) Classical Information Retrieval: Boolean, vector
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationTEXT CHAPTER 5. W. Bruce Croft BACKGROUND
41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia
More informationMath Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng
Math Information Retrieval: User Requirements and Prototype Implementation Jin Zhao, Min Yen Kan and Yin Leng Theng Why Math Information Retrieval? Examples: Looking for formulas Collect teaching resources
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 16: Flat Clustering Hinrich Schütze Institute for Natural Language Processing, Universität Stuttgart 2009.06.16 1/ 64 Overview
More informationKnowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.
Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European
More informationINFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE
15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find
More informationContent-based Recommender Systems
Recuperação de Informação Doutoramento em Engenharia Informática e Computadores Instituto Superior Técnico Universidade Técnica de Lisboa Bibliography Pasquale Lops, Marco de Gemmis, Giovanni Semeraro:
More informationCSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16)
CSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16) Michael Hahsler Southern Methodist University These slides are largely based on the slides by Hinrich Schütze Institute for
More informationSearch Engines. Information Retrieval in Practice
Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Classification and Clustering Classification and clustering are classical pattern recognition / machine learning problems
More informationThis literature review provides an overview of the various topics related to using implicit
Vijay Deepak Dollu. Implicit Feedback in Information Retrieval: A Literature Analysis. A Master s Paper for the M.S. in I.S. degree. April 2005. 56 pages. Advisor: Stephanie W. Haas This literature review
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationSearching for Expertise
Searching for Expertise Toine Bogers Royal School of Library & Information Science University of Copenhagen IVA/CCC seminar April 24, 2013 Outline Introduction Expertise databases Expertise seeking tasks
More informationLearning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search
1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history
More informationThe IR Black Box. Anomalous State of Knowledge. The Information Retrieval Cycle. Different Types of Interactions. Upcoming Topics.
The IR Black Bo LBSC 796/INFM 718R: Week 8 Relevance Feedback Query Search Ranked List Jimmy Lin College of Information Studies University of Maryland Monday, March 27, 2006 Anomalous State of Knowledge
More informationIn the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,
1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to
More informationWEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS
1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,
More informationOutline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity
Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using
More informationContextual Information Retrieval Using Ontology-Based User Profiles
Contextual Information Retrieval Using Ontology-Based User Profiles Vishnu Kanth Reddy Challam Master s Thesis Defense Date: Jan 22 nd, 2004. Committee Dr. Susan Gauch(Chair) Dr.David Andrews Dr. Jerzy
More informationA World Wide Web-based HCI-library Designed for Interaction Studies
A World Wide Web-based HCI-library Designed for Interaction Studies Ketil Perstrup, Erik Frøkjær, Maria Konstantinovitz, Thorbjørn Konstantinovitz, Flemming S. Sørensen, Jytte Varming Department of Computing,
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised
More informationFlat Clustering. Slides are mostly from Hinrich Schütze. March 27, 2017
Flat Clustering Slides are mostly from Hinrich Schütze March 7, 07 / 79 Overview Recap Clustering: Introduction 3 Clustering in IR 4 K-means 5 Evaluation 6 How many clusters? / 79 Outline Recap Clustering:
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,
More informationIN4325 Query refinement. Claudia Hauff (WIS, TU Delft)
IN4325 Query refinement Claudia Hauff (WIS, TU Delft) The big picture Information need Topic the user wants to know more about The essence of IR Query Translation of need into an input for the search engine
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More information21. Search Models and UIs for IR
21. Search Models and UIs for IR INFO 202-10 November 2008 Bob Glushko Plan for Today's Lecture The "Classical" Model of Search and the "Classical" UI for IR Web-based Search Best practices for UIs in
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of
More informationText Mining. Munawar, PhD. Text Mining - Munawar, PhD
10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationHybrid Approach for Query Expansion using Query Log
Volume 7 No.6, July 214 www.ijais.org Hybrid Approach for Query Expansion using Query Log Lynette Lopes M.E Student, TSEC, Mumbai, India Jayant Gadge Associate Professor, TSEC, Mumbai, India ABSTRACT Web
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion
More informationCSA4020. Multimedia Systems:
CSA4020 Multimedia Systems: Adaptive Hypermedia Systems Lecture 4: Automatic Indexing & Performance Evaluation Multimedia Systems: Adaptive Hypermedia Systems 1 Automatic Indexing Document Retrieval Model
More informationText classification II CE-324: Modern Information Retrieval Sharif University of Technology
Text classification II CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2015 Some slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)
More informationSYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT
SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT Prof. Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION SEARCH AND RETRIEVAL Inf. retrieval 1 PRESENTATION SCHEMA GOALS AND
More informationCS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University
CS6200 Information Retrieval Jesse Anderton College of Computer and Information Science Northeastern University Major Contributors Gerard Salton! Vector Space Model Indexing Relevance Feedback SMART Karen
More informationData Mining of Web Access Logs Using Classification Techniques
Data Mining of Web Logs Using Classification Techniques Md. Azam 1, Asst. Prof. Md. Tabrez Nafis 2 1 M.Tech Scholar, Department of Computer Science & Engineering, Al-Falah School of Engineering & Technology,
More informationCHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES
188 CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 6.1 INTRODUCTION Image representation schemes designed for image retrieval systems are categorized into two
More informationEvaluating a Visual Information Retrieval Interface: AspInquery at TREC-6
Evaluating a Visual Information Retrieval Interface: AspInquery at TREC-6 Russell Swan James Allan Don Byrd Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts
More informationAutomatically Generating Queries for Prior Art Search
Automatically Generating Queries for Prior Art Search Erik Graf, Leif Azzopardi, Keith van Rijsbergen University of Glasgow {graf,leif,keith}@dcs.gla.ac.uk Abstract This report outlines our participation
More informationKeywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.
Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Advanced Preferred
More informationSemantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman
Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information
More informationX. A Relevance Feedback System Based on Document Transformations. S. R. Friedman, J. A. Maceyak, and S. F. Weiss
X-l X. A Relevance Feedback System Based on Document Transformations S. R. Friedman, J. A. Maceyak, and S. F. Weiss Abstract An information retrieval system using relevance feedback to modify the document
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More informationA Comparison of Three Document Clustering Algorithms: TreeCluster, Word Intersection GQF, and Word Intersection Hierarchical Agglomerative Clustering
A Comparison of Three Document Clustering Algorithms:, Word Intersection GQF, and Word Intersection Hierarchical Agglomerative Clustering Abstract Kenrick Mock 9/23/1998 Business Applications Intel Architecture
More informationImproving Difficult Queries by Leveraging Clusters in Term Graph
Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu
More informationChapter 8. Evaluating Search Engine
Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can
More informationInformation Retrieval
Introduction to Information Retrieval SCCS414: Information Storage and Retrieval Christopher Manning and Prabhakar Raghavan Lecture 10: Text Classification; Vector Space Classification (Rocchio) Relevance
More informationINFS 427: AUTOMATED INFORMATION RETRIEVAL (1 st Semester, 2018/2019)
INFS 427: AUTOMATED INFORMATION RETRIEVAL (1 st Semester, 2018/2019) Session 01 Introduction to Information Retrieval Lecturer: Mrs. Florence O. Entsua-Mensah, DIS Contact Information: fentsua-mensah@ug.edu.gh
More informationUsing Clusters on the Vivisimo Web Search Engine
Using Clusters on the Vivisimo Web Search Engine Sherry Koshman and Amanda Spink School of Information Sciences University of Pittsburgh 135 N. Bellefield Ave., Pittsburgh, PA 15237 skoshman@sis.pitt.edu,
More informationPreference Learning in Recommender Systems
Preference Learning in Recommender Systems M. de Gemmis L. Iaquinta P. Lops C. Musto F. Narducci G. Semeraro Department of Computer Science University of Bari, Italy ECML/PKDD workshop on Preference Learning,
More informationVECTOR SPACE CLASSIFICATION
VECTOR SPACE CLASSIFICATION Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. Chapter 14 Wei Wei wwei@idi.ntnu.no Lecture
More informationInformation Retrieval. Techniques for Relevance Feedback
Information Retrieval Techniques for Relevance Feedback Introduction An information need may be epressed using different keywords (synonymy) impact on recall eamples: ship vs boat, aircraft vs airplane
More informationIntroduction to Interactive Systems. Overview. What Is an Interactive System? SMD158 Interactive Systems Spring 2005
INSTITUTIONEN FÖR SYSTEMTEKNIK LULEÅ TEKNISKA UNIVERSITET Introduction to Interactive Systems SMD158 Interactive Systems Spring 2005 Jan-14-05 1997-2005 by David A. Carr 1 L Overview What is an interactive
More informationAdding user context to IR test collections
Adding user context to IR test collections Birger Larsen Information Systems and Interaction Design Royal School of Library and Information Science Copenhagen, Denmark blar @ iva.dk Outline RSLIS and ISID
More informationTilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval
Tilburg University Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval Publication date: 2006 Link to publication Citation for published
More informationOptimal Query. Assume that the relevant set of documents C r. 1 N C r d j. d j. Where N is the total number of documents.
Optimal Query Assume that the relevant set of documents C r are known. Then the best query is: q opt 1 C r d j C r d j 1 N C r d j C r d j Where N is the total number of documents. Note that even this
More informationFederated Search. Jaime Arguello INLS 509: Information Retrieval November 21, Thursday, November 17, 16
Federated Search Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu November 21, 2016 Up to this point... Classic information retrieval search from a single centralized index all ueries
More informationNavigating the User Query Space
Navigating the User Query Space Ronan Cummins 1, Mounia Lalmas 2, Colm O Riordan 3 and Joemon M. Jose 1 1 School of Computing Science, University of Glasgow, UK 2 Yahoo! Research, Barcelona, Spain 3 Dept.
More informationInformation Retrieval
Information Retrieval Test Collections Gintarė Grigonytė gintare@ling.su.se Department of Linguistics and Philology Uppsala University Slides based on previous IR course given by K.F. Heppin 2013-15 and
More informationMovie Recommender System - Hybrid Filtering Approach
Chapter 7 Movie Recommender System - Hybrid Filtering Approach Recommender System can be built using approaches like: (i) Collaborative Filtering (ii) Content Based Filtering and (iii) Hybrid Filtering.
More informationCS54701: Information Retrieval
CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv IIR 6: Flat Clustering Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University, Brno Center
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Hinrich Schütze Center for Information and Language Processing, University of Munich 04-06- /86 Overview Recap
More informationMaking Retrieval Faster Through Document Clustering
R E S E A R C H R E P O R T I D I A P Making Retrieval Faster Through Document Clustering David Grangier 1 Alessandro Vinciarelli 2 IDIAP RR 04-02 January 23, 2004 D a l l e M o l l e I n s t i t u t e
More informationWhat you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web
What you have learned so far Interoperability Introduction to the Semantic Web Tutorial at ISWC 2010 Jérôme Euzenat Data can be expressed in RDF Linked through URIs Modelled with OWL ontologies & Retrieved
More informationQUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL
QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es
More informationOverview of Web Mining Techniques and its Application towards Web
Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous
More informationEmpirical Analysis of Single and Multi Document Summarization using Clustering Algorithms
Engineering, Technology & Applied Science Research Vol. 8, No. 1, 2018, 2562-2567 2562 Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms Mrunal S. Bewoor Department
More informationSocial Search Networks of People and Search Engines. CS6200 Information Retrieval
Social Search Networks of People and Search Engines CS6200 Information Retrieval Social Search Social search Communities of users actively participating in the search process Goes beyond classical search
More informationWEIGHTING QUERY TERMS USING WORDNET ONTOLOGY
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk
More informationThe What, Why, Who and How of Where: Building a Portal for Geospatial Data. Alan Darnell Director, Scholars Portal
The What, Why, Who and How of Where: Building a Portal for Geospatial Data Alan Darnell Director, Scholars Portal What? Scholars GeoPortal Beta release Fall 2011 Production release March 2012 OLITA Award
More informationExtending the Facets concept by applying NLP tools to catalog records of scientific literature
Extending the Facets concept by applying NLP tools to catalog records of scientific literature *E. Picchi, *M. Sassi, **S. Biagioni, **S. Giannini *Institute of Computational Linguistics **Institute of
More informationConceptual Language Models for Context-Aware Text Retrieval
Conceptual Language Models for Context-Aware Text Retrieval Henning Rode Djoerd Hiemstra University of Twente, The Netherlands {h.rode, d.hiemstra}@cs.utwente.nl Abstract While participating in the HARD
More informationUsability Report. Author: Stephen Varnado Version: 1.0 Date: November 24, 2014
Usability Report Author: Stephen Varnado Version: 1.0 Date: November 24, 2014 2 Table of Contents Executive summary... 3 Introduction... 3 Methodology... 3 Usability test results... 4 Effectiveness ratings
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationInformation Retrieval. Lecture 7
Information Retrieval Lecture 7 Recap of the last lecture Vector space scoring Efficiency considerations Nearest neighbors and approximations This lecture Evaluating a search engine Benchmarks Precision
More informationMultimedia Information Systems
Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 6: Text Information Retrieval 1 Digital Video Library Meta-Data Meta-Data Similarity Similarity Search Search Analog Video Archive
More informationImproving the Performance of Search Engine With Respect To Content Mining Kr.Jansi, L.Radha
Improving the Performance of Search Engine With Respect To Content Mining Kr.Jansi, L.Radha 1 Asst. Professor, Srm University, Chennai 2 Mtech, Srm University, Chennai Abstract R- Google is a dedicated
More informationDatabases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016
+ Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html
More informationFondazione Ugo Bordoni at TREC 2004
Fondazione Ugo Bordoni at TREC 2004 Giambattista Amati, Claudio Carpineto, and Giovanni Romano Fondazione Ugo Bordoni Rome Italy Abstract Our participation in TREC 2004 aims to extend and improve the use
More informationInformation Retrieval
Introduction Information Retrieval Information retrieval is a field concerned with the structure, analysis, organization, storage, searching and retrieval of information Gerard Salton, 1968 J. Pei: Information
More informationDesign and Implementation of Search Engine Using Vector Space Model for Personalized Search
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,
More informationijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System
ijade Reporter An Intelligent Multi-agent Based Context Aware Reporting System Eddie C.L. Chan and Raymond S.T. Lee The Department of Computing, The Hong Kong Polytechnic University, Hung Hong, Kowloon,
More informationImproving the Effectiveness of Information Retrieval with Local Context Analysis
Improving the Effectiveness of Information Retrieval with Local Context Analysis JINXI XU BBN Technologies and W. BRUCE CROFT University of Massachusetts Amherst Techniques for automatic query expansion
More informationCS47300: Web Information Search and Management
CS47300: Web Information Search and Management Federated Search Prof. Chris Clifton 13 November 2017 Federated Search Outline Introduction to federated search Main research problems Resource Representation
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More information