Recap: Relevance Feedback. CS276A Text Information Retrieval, Mining, and Exploitation. Pseudo Feedback. Pseudo-Feedback: Performance.


CS276A Text Information Retrieval, Mining, and Exploitation
Lecture 9, 5 Nov 2002

Recap: Relevance Feedback
Rocchio Algorithm
Typical weights: alpha = 8, beta = 64, gamma = 64
Tradeoff alpha vs beta/gamma: If we have a lot of judged documents, we want a higher beta/gamma. But we usually don't.

Pseudo Feedback
Loop: initial query -> retrieve documents -> top k documents -> label top k docs relevant -> apply relevance feedback

Pseudo-Feedback: Performance

Today's topics
The User in Information Access
User Interfaces
Browsing
Visualization

The User in Information Access
Flow diagram: User -> Information need -> Find starting point -> Formulate/Reformulate Query -> Send to system -> Receive results -> Explore results -> Done? (yes: Stop; no: repeat)
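To make the relevance feedback recap concrete, here is a minimal Python sketch. The Rocchio formula itself did not survive on the recap slide, so the code assumes the standard update q_m = alpha * q_0 + beta * mean(relevant docs) - gamma * mean(nonrelevant docs), with queries and documents represented as sparse term -> weight dicts. The function names (`rocchio`, `pseudo_feedback`), the dot-product ranking, and the clipping of negative weights are illustrative assumptions, not taken from the lecture. The default weights are the "typical weights" quoted on the slide.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=8.0, beta=64.0, gamma=64.0):
    """Standard Rocchio update on sparse vectors (dicts term -> weight).

    Defaults are the 'typical weights' quoted on the recap slide.
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    # Add the centroid of the relevant docs, subtract that of the nonrelevant docs.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coeff * w / len(docs)
    # Assumption: terms whose weight is not positive are dropped.
    return {t: w for t, w in new_q.items() if w > 0}


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def pseudo_feedback(query, docs, k, **weights):
    """Pseudo feedback: treat the top k retrieved docs as relevant, then apply Rocchio."""
    ranked = sorted(docs, key=lambda d: dot(query, d), reverse=True)
    return rocchio(query, ranked[:k], [], **weights)
```

For example, `pseudo_feedback({"jaguar": 1.0}, docs, k=10)` would expand the query with terms from the ten best-scoring documents without asking the user for any judgments.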

The User in Information Access
Flow diagram: User -> Information need -> Find starting point -> Formulate/Reformulate Query -> Send to system -> Receive results -> Explore results -> Done? (yes: Stop; no: repeat)
Annotation: Focus of most IR!

Information Access in Context
Diagram: User -> High-Level Goal -> Information Access -> Analyze -> Synthesize -> Done? (yes: Stop; no: repeat)

The User in Information Access
Flow diagram repeated (Find starting point)

Starting points
Source selection
  Highwire Press
  Lexis-Nexis
  Google!
Overviews
  Directories/hierarchies
  Visual maps
  Clustering

Highwire Press
Source Selection

Hierarchical browsing
Level 0, Level 1, Level

Visual Browsing: Themescape

Browsing
Diagram: Starting point -> Answer
Credit: William Arms, Cornell

Scatter/Gather
Scatter/gather allows the user to find a set of documents of interest through browsing.
Take the collection and scatter it into n clusters.
Pick the clusters of interest and merge them.
Iterate.

Scatter/Gather [two figure slides]
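The scatter, pick, merge, iterate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are plain strings, similarity is cosine over raw term counts, and the clustering is a naive k-means-style procedure seeded with the first n documents. It is a stand-in, not the clustering used in the Scatter/Gather paper. The names `scatter` and `gather` are hypothetical, and n is assumed to be at least 1.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def scatter(docs, n, iterations=5):
    """Scatter docs into at most n clusters (naive k-means stand-in, n >= 1)."""
    vectors = [vectorize(d) for d in docs]
    centroids = [Counter(v) for v in vectors[:n]]  # assumption: seed with the first n docs
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        assignment = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the summed counts of its members
        # (cosine ignores scale, so no normalisation is needed).
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = total
    clusters = [[] for _ in centroids]
    for doc, a in zip(docs, assignment):
        clusters[a].append(doc)
    return [c for c in clusters if c]


def gather(clusters, chosen):
    """Merge the clusters the user picked into a new, smaller collection."""
    return [doc for i in chosen for doc in clusters[i]]
```

One browsing step is then `collection = gather(scatter(collection, n), picked)`, repeated until the remaining set is small enough to read directly.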

How to Label Clusters
Show titles of typical documents
  Titles are easy to scan
  Authors create them for quick scanning!
  But you can only show a few titles, which may not fully represent the cluster
Show words/phrases prominent in cluster
  More likely to fully represent cluster
  Use distinguishing words/phrases
  But harder to scan

Visual Browsing: Hyperbolic Tree [two figure slides]

Study of Kohonen Feature Maps
H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)
Comparison: Kohonen Map and Yahoo
Task:
  Window shop for interesting home page
  Repeat with other interface
Results:
  Starting with map could repeat in Yahoo (8/11)
  Starting with Yahoo unable to repeat in map (2/14)
Credit: Marti Hearst

Study (cont.)
Participants liked:
  Correspondence of region size to # documents
  Overview (but also wanted zoom)
  Ease of jumping from one topic to another
  Multiple routes to topics
  Use of category and subcategory labels
Credit: Marti Hearst

Study (cont.)
Participants wanted:
  hierarchical organization
  other ordering of concepts (alphabetical)
  integration of browsing and search
  correspondence of color to meaning
  more meaningful labels
  labels at same level of abstraction
  fit more labels in the given space
  combined keyword and category search
  multiple category assignment (sports+entertain)
Credit: Marti Hearst

Browsing
Effectiveness depends on
  Starting point
  Ease of orientation (are similar docs close etc, intuitive organization)
  How adaptive system is
Compare to physical browsing (library, grocery store)

Searching vs. Browsing
Information need dependent
  Open-ended (find an interesting quote on the virtues of friendship) -> browsing
  Specific (directions to Pacific Bell Park) -> searching
User dependent
  Some users prefer searching, others browsing (confirmed in many studies: some hate to type)
  You don't need to know vocabulary for browsing.
System dependent (some web sites don't support search)
Searching and browsing are often interleaved.

Searchers vs. Browsers
1/3 of users do not search at all
1/3 rarely search (or urls only)
Only 1/3 understand the concept of search
(ISP data from 2000)

Exercise
Observe your own information seeking behavior
  WWW
  University library
  Grocery store
Are you a searcher or a browser?
How do you reformulate your query?
  Read bad hits, then minus terms
  Read good hits, then plus terms
  Try a completely different query

The User in Information Access
Flow diagram repeated (Formulate/Reformulate Query)

Query Specification
Recall:
  Relevance feedback
  Query expansion
  Spelling correction
  Query-log mining based
Interaction styles for query specification
Queries on the Web
Parametric search
Term browsing

Query Specification: Interaction Styles
Shneiderman 97
  Command Language
  Form Fillin
  Menu Selection
  Direct Manipulation
  Natural Language
Example: How does each apply to Boolean queries?
Credit: Marti Hearst

Command-Based Query Specification
Pattern: command attribute value connector
Example: find pa shneiderman and tw user#
What are the attribute names?
What are the command names?
What are allowable values?
Credit: Marti Hearst

Form-Based Query Specification (Altavista) [figure]
Credit: Marti Hearst

Form-Based Query Specification (Melvyl) [figure]
Credit: Marti Hearst

Form-based Query Specification (Infoseek) [figure]
Credit: Marti Hearst

Menu-based Query Specification (Young & Shneiderman 93) [figure]
Credit: Marti Hearst

Query Specification/Reformulation
A good user interface makes it easy for the user to reformulate the query
Challenge: one user interface is not ideal for all types of information needs

Types of Information Needs
Need answer to question (who won the game?)
Re-find a particular document
Find a good recipe for tonight's dinner
Authoritative summary of information (HIV review)
Exploration of new area (browse sites about Baja)

Queries on the Web
Most Frequent on 2002/10/26 [figure]

Queries on the Web (2000) [figure]

Intranet Queries (Aug 2000)
Count  Query
3351   bearfacts
3349   telebears
1909   extension
1874   schedule+of+classes
1780   bearlink
1737   bear+facts
1468   decal
1443   infobears
1227   calendar
989    career+center
974    campus+map
920    academic+calendar
840    map
773    bookstore
741    class+pass
738    housing
721    tele-bears
716    directory
667    schedule
627    recipes
602    transcripts
582    tuition
577    seti
563    registrar
550    info+bears
543    class+schedule
470    financial+aid
Source: Ray Larson

Intranet Queries
Summary of sample data from 3 weeks of UCB queries
  13.2% Telebears/BearFacts/InfoBears/BearLink (12297)
  6.7% Schedule of classes or final exams (6222)
  5.4% Summer Session (5041)
  3.2% Extension (2932)
  3.1% Academic Calendar (2846)
  2.4% Directories (2202)
  1.7% Career Center (1588)
  1.7% Housing (1583)
  1.5% Map (1393)
Average query length over last 4 months: 1.8 words
This suggests what is difficult to find from the home page
Source: Ray Larson

Query Specification: Feast or Famine
[figure labels: Famine, Feast]
Specifying a well targeted query is hard.
Bigger problem for Boolean.

Parametric search
Each document has, in addition to text, some meta-data, e.g.,
  Language = French
  Format = pdf
  Subject = Physics etc.
  Date = Feb 2000
A parametric search interface allows the user to combine a full-text query with selections on these parameters, e.g., language, date range, etc.

Parametric search example [two figure slides]

Interfaces for term browsing [figure]
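The parametric search idea above (a full-text query combined with selections on meta-data) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: documents are plain dicts with a `text` field plus meta-data fields, the text match is a simple conjunctive whole-word match, and the function name `parametric_search` and its parameters are hypothetical.

```python
from datetime import date


def parametric_search(docs, terms, filters=None, date_range=None):
    """Return docs that contain every query term and satisfy the meta-data selections.

    docs:       list of dicts with a "text" field plus meta-data fields
    terms:      full-text query terms (all must occur in the text)
    filters:    exact-match selections, e.g. {"language": "French"}
    date_range: optional (start, end) pair of datetime.date, inclusive
    """
    filters = filters or {}
    hits = []
    for doc in docs:
        words = set(doc["text"].lower().split())
        if not all(t.lower() in words for t in terms):
            continue  # fails the full-text part of the query
        if any(doc.get(field) != value for field, value in filters.items()):
            continue  # fails one of the parameter selections
        if date_range is not None:
            d = doc.get("date")
            if d is None or not (date_range[0] <= d <= date_range[1]):
                continue  # outside the selected date range
        hits.append(doc)
    return hits


docs = [
    {"text": "quantum field theory notes", "language": "French",
     "format": "pdf", "subject": "Physics", "date": date(2000, 2, 1)},
    {"text": "quantum computing survey", "language": "English",
     "format": "html", "subject": "Physics", "date": date(2001, 5, 1)},
]
french_hits = parametric_search(docs, ["quantum"], {"language": "French"})
```

Here `french_hits` contains only the first document: both match the term "quantum", but only one satisfies Language = French.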

The User in Information Access
Flow diagram repeated (Explore results)

Explore Results
Determine: Do these results answer my question?
  Summarization
  More generally: provide context
Hypertext navigation: Can I find the answer by following a link?
Browsing and clustering (again)
  Browse to explore results

Explore Results: Context
We can't present complete documents in the result set: too much information.
Present information about each doc
  Must be concise (so we can show many docs)
  Must be informative
Typical information about each document
  Summary
  Context of query words
  Meta data: date, author, language, file name/url
  Context of document in collection
  Information about structure of document

Context in Collection: Cha-Cha [figure]

Category Labels
Advantages:
  Interpretable
  Capture summary information
  Describe multiple facets of content
  Domain dependent, and so descriptive
Disadvantages:
  Do not scale well (for organizing documents)
  Domain dependent, so costly to acquire
  May mis-match users' interests
Credit: Marti Hearst

Evaluate Results
Context in Hierarchy: Cat-a-Cone [figure]

Explore Results: Summarization
Query-dependent summarization
  KWIC (keyword in context) lines (a la Google)
Query-independent summarization
  Summary written by author (if available)
  Exploit genre (news stories)
  Sentence extraction
  Natural language generation

Evaluate Results
Structure of document: SeeSoft [figure]

Personalization
Diagram labels: User, Query, Outride Personalized Search System, Interests, Demographics, Query Augmentation, Intranet Search, Result Processing, Click Stream, Search History, Result Set, Web Search, Application Usage

How Long to Get an Answer?
Chart values: Outride 38.9, Google 75.4, Yahoo! 81, Excite 83.5, AOL (value not recoverable)
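As an illustration of the query-dependent summarization mentioned above, the following Python sketch produces KWIC (keyword in context) lines: each occurrence of a query term is shown with a few words of surrounding context. The function name `kwic`, the whitespace tokenization, the punctuation stripping, and the `window` parameter are assumptions chosen for brevity, not details from the lecture.

```python
def kwic(text, query_terms, window=3):
    """Return one keyword-in-context line per occurrence of a query term.

    window is the number of tokens kept on each side of the keyword.
    """
    tokens = text.split()
    targets = {t.lower() for t in query_terms}
    lines = []
    for i, tok in enumerate(tokens):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if tok.lower().strip(".,;:!?\"'()") in targets:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [tok] + right))
    return lines


text = ("Scatter/gather allows the user to find a set of documents "
        "of interest through browsing.")
snippets = kwic(text, ["documents"], window=2)
# snippets == ["set of documents of interest"]
```

A result list would show one or two such lines per hit, so the user can judge each document from the context of the query words without opening it.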

Table 1. User actions study results.
Columns: Search Engine, User Actions, Difference (%)
Rows: Outride, Google, Yahoo!, AOL, Excite, Average
Only one value survives: Outride 11.2

Table 2. Overall timing results (in seconds, with placement in parentheses).
Engine    Expert Time (Rank)   Novice Time (Rank)   Average Time (Rank)   % Difference
Outride   32.8 (1)             45.1 (1)             38.9 (1)              0%
AOL       92.3 (5)             87.0 (4)             89.6 (5)              130.2%
Excite    75.7 (3)             91.3 (5)             83.5 (4)              114.5%
Google    72.5 (2)             78.4 (3)             75.4 (2)              93.7%
Yahoo!    85.1 (4)             76.9 (2)             81.0 (3)              107.9%

[chart labels: Novice, Experts; Others, Outride]

Performance of Interactive Retrieval [figure]

Boolean Queries: Interface Issues
Boolean logic is difficult for the average user.
Much research was done on interfaces facilitating the creation of boolean queries by non-experts.
Much of this research was made obsolete by the web.
Current view is that non-expert users are best served with non-boolean or simple +/- boolean (pioneered by Altavista).
But boolean queries are the standard for certain groups of expert users (e.g., lawyers).

User Interfaces: Other Issues
Technical HCI issues
  How to use screen real estate
  One monolithic window or many?
  Undo operator
  Give access to history
  Alternative interfaces for novice/expert users
Disabilities

Take-Away
Don't ignore the user in information retrieval.
Finding matching documents for a query is only part of information access and knowledge work.
In addition to core information retrieval, information access interfaces need to support
  Finding starting points
  Formulation/reformulation of queries
  Exploring/evaluating results

Exercise
Current information retrieval user interfaces are designed for typical computer screens.
How would you design a user interface for a wall-size screen?

Resources
MIR Ch
Donna Harman, Overview of the fourth text retrieval conference (TREC 4), National Institute of Standards and Technology.
Cutting, Karger, Pedersen, Tukey. Scatter/Gather. ACM SIGIR.
Hearst, Cat-a-cone, an interactive interface for specifying searches and viewing retrieval results using a large category hierarchy, ACM SIGIR.



More information

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies CSE 3 Comics Updates Shortcut(s)/Tip(s) of the Day Web Proxy Server PrimoPDF How Computers Work Ch 30 Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology

More information

Navigating Large Hierarchical Space Using Invisible Links

Navigating Large Hierarchical Space Using Invisible Links Navigating Large Hierarchical Space Using Invisible Links Ming C. Hao, Meichun Hsu, Umesh Dayal, Adrian Krug* Software Technology Laboratory HP Laboratories Palo Alto HPL-2000-8 January, 2000 E-mail:(mhao,

More information

Broadening Access to Large Online Databases by Generalizing Query Previews

Broadening Access to Large Online Databases by Generalizing Query Previews Broadening Access to Large Online Databases by Generalizing Query Previews Egemen Tanin egemen@cs.umd.edu Catherine Plaisant plaisant@cs.umd.edu Ben Shneiderman ben@cs.umd.edu Human-Computer Interaction

More information

Hierarchical Document Clustering

Hierarchical Document Clustering Hierarchical Document Clustering Benjamin C. M. Fung, Ke Wang, and Martin Ester, Simon Fraser University, Canada INTRODUCTION Document clustering is an automatic grouping of text documents into clusters

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Jan Pedersen 22 July 2010

Jan Pedersen 22 July 2010 Jan Pedersen 22 July 2010 Outline Problem Statement Best effort retrieval vs automated reformulation Query Evaluation Architecture Query Understanding Models Data Sources Standard IR Assumptions Queries

More information

Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Web Search Basics Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Content adapted from Hinrich Schütze http://www.informationretrieval.org Overview Overview Introduction Classic

More information

Combining Information Retrieval and Relevance Feedback for Concept Location

Combining Information Retrieval and Relevance Feedback for Concept Location Combining Information Retrieval and Relevance Feedback for Concept Location Sonia Haiduc - WSU CS Graduate Seminar - Jan 19, 2010 1 Software changes Software maintenance: 50-90% of the global costs of

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

Interaction Style Categories. COSC 3461 User Interfaces. What is a Command-line Interface? Command-line Interfaces

Interaction Style Categories. COSC 3461 User Interfaces. What is a Command-line Interface? Command-line Interfaces COSC User Interfaces Module 2 Interaction Styles What is a Command-line Interface? An interface where the user types commands in direct response to a prompt Examples Operating systems MS-DOS Unix Applications

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology Third Edition by Lawrence Snyder Searching in All the Right Places The Obvious and Familiar To find tax

More information

Information Behavior in Digital Age (III): Related Research

Information Behavior in Digital Age (III): Related Research Information Behavior in Digital Age (III): Related Research Invited Lectures on Information Behaviors 國立政治大學圖書資訊與檔案學研究所 Peiling Wang, Ph.D. November 28, 2013 Use of Digital Information Resources & Internet

More information

Document Clustering for Mediated Information Access The WebCluster Project

Document Clustering for Mediated Information Access The WebCluster Project Document Clustering for Mediated Information Access The WebCluster Project School of Communication, Information and Library Sciences Rutgers University The original WebCluster project was conducted at

More information

Information Retrieval CSCI

Information Retrieval CSCI Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1

More information

Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results

Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results DFRWS 2007 Department of Information Systems & Technology Management The

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Assignment No. 1. Abdurrahman Yasar. June 10, QUESTION 1

Assignment No. 1. Abdurrahman Yasar. June 10, QUESTION 1 COMPUTER ENGINEERING DEPARTMENT BILKENT UNIVERSITY Assignment No. 1 Abdurrahman Yasar June 10, 2014 1 QUESTION 1 Consider the following search results for two queries Q1 and Q2 (the documents are ranked

More information

The Anatomy of a Large-Scale Hypertextual Web Search Engine

The Anatomy of a Large-Scale Hypertextual Web Search Engine The Anatomy of a Large-Scale Hypertextual Web Search Engine Article by: Larry Page and Sergey Brin Computer Networks 30(1-7):107-117, 1998 1 1. Introduction The authors: Lawrence Page, Sergey Brin started

More information

A Task-Based Evaluation of an Aggregated Search Interface

A Task-Based Evaluation of an Aggregated Search Interface A Task-Based Evaluation of an Aggregated Search Interface No Author Given No Institute Given Abstract. This paper presents a user study that evaluated the effectiveness of an aggregated search interface

More information

EVALUATION OF PROTOTYPES USABILITY TESTING

EVALUATION OF PROTOTYPES USABILITY TESTING EVALUATION OF PROTOTYPES USABILITY TESTING CPSC 544 FUNDAMENTALS IN DESIGNING INTERACTIVE COMPUTATIONAL TECHNOLOGY FOR PEOPLE (HUMAN COMPUTER INTERACTION) WEEK 9 CLASS 17 Joanna McGrenere and Leila Aflatoony

More information

Student Usability Project Recommendations Define Information Architecture for Library Technology

Student Usability Project Recommendations Define Information Architecture for Library Technology Student Usability Project Recommendations Define Information Architecture for Library Technology Erika Rogers, Director, Honors Program, California Polytechnic State University, San Luis Obispo, CA. erogers@calpoly.edu

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

Module 1: Internet Basics for Web Development (II)

Module 1: Internet Basics for Web Development (II) INTERNET & WEB APPLICATION DEVELOPMENT SWE 444 Fall Semester 2008-2009 (081) Module 1: Internet Basics for Web Development (II) Dr. El-Sayed El-Alfy Computer Science Department King Fahd University of

More information

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Web Search Instructor: Walid Magdy 14-Nov-2017 Lecture Objectives Learn about: Working with Massive data Link analysis (PageRank) Anchor text 2 1 The Web Document

More information

CSC369 Lecture 9. Larry Zhang, November 16, 2015

CSC369 Lecture 9. Larry Zhang, November 16, 2015 CSC369 Lecture 9 Larry Zhang, November 16, 2015 1 Announcements A3 out, due ecember 4th Promise: there will be no extension since it is too close to the final exam (ec 7) Be prepared to take the challenge

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural

More information

Supporting Exploratory Search Through User Modeling

Supporting Exploratory Search Through User Modeling Supporting Exploratory Search Through User Modeling Kumaripaba Athukorala, Antti Oulasvirta, Dorota Glowacka, Jilles Vreeken, Giulio Jacucci Helsinki Institute for Information Technology HIIT Department

More information

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)

More information

Information Retrieval

Information Retrieval Information Retrieval Suan Lee - Information Retrieval - 09 Relevance Feedback & Query Epansion 1 Recap of the last lecture Evaluating a search engine Benchmarks Precision and recall Results summaries

More information

Semantic Website Clustering

Semantic Website Clustering Semantic Website Clustering I-Hsuan Yang, Yu-tsun Huang, Yen-Ling Huang 1. Abstract We propose a new approach to cluster the web pages. Utilizing an iterative reinforced algorithm, the model extracts semantic

More information

Planning Your Web Site

Planning Your Web Site Planning Your Web Site All You Want To Know And Have No One To Ask What to do BEFORE DURING and AFTER PLAN,PLAN, PLAN Before You Build Domain Name Hosting Solutions Ecommerce? Databases? What kind of content?

More information

Chapter 6. Queries and Interfaces

Chapter 6. Queries and Interfaces Chapter 6 Queries and Interfaces Keyword Queries Simple, natural language queries were designed to enable everyone to search Current search engines do not perform well (in general) with natural language

More information

The Person in Personal

The Person in Personal WWW Panel: Searching Personal Content The Person in Personal (Supporting the Person in Searching Personal Content) Susan Dumais Microsoft Research http://research.microsoft.com/~sdumais Stuff I ve I Seen

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Almost 80 percent of new site visits begin at search engines. A couple of years back Nielsen published a list of popular search engines.

Almost 80 percent of new site visits begin at search engines. A couple of years back Nielsen published a list of popular search engines. SEO OverView We have a problem, we want people to visit our Web site, that's the purpose after all to bring people to our website and increase traffic inorder to buy soundspirit products and learn more

More information