Search Engine Architecture II

Similar documents
Chapter 2. Architecture of a Search Engine

Information Retrieval

Query Refinement and Search Result Presentation

Information Retrieval

Natural Language Processing

Chapter 6: Information Retrieval and Web Search. An introduction

Ranked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points?

Information Retrieval

THE WEB SEARCH ENGINE

Chapter IR:II. II. Architecture of a Search Engine. Indexing Process Search Process

CSE 5243 INTRO. TO DATA MINING

Information Retrieval

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Search Engine Architecture. Search Engine Architecture

Searching the Web for Information

Search Engines Chapter 2 Architecture Felix Naumann

Chapter 6. Queries and Interfaces

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Search With Better Results

Search Engines Information Retrieval in Practice

TREC-10 Web Track Experiments at MSRA

Information Retrieval. (M&S Ch 15)

Efficient Implementation of Postings Lists

Microsoft FAST Search Server 2010 for SharePoint for Application Developers Course 10806A; 3 Days, Instructor-led

Open Source Search. Andreas Pesenhofer. max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria

SEO and UAEX.EDU GETTING YOUR WEB PAGES FOUND IN GOOGLE

Mining Web Data. Lijun Zhang

Development of Search Engines using Lucene: An Experience

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine

Search Like a Pro. How Search Engines Work. Comparison Search Engine. Comparison Search Engine. How Search Engines Work

Information Retrieval and Web Search

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Instructor: Kathleen Scheaffer Content: Adopted from Gwen Harris

Information Retrieval

Getting Started with Internet Explorer 10

Information Retrieval. Information Retrieval and Web Search

Introduction to Information Retrieval

Full-Text Indexing For Heritrix

Department of Electronic Engineering FINAL YEAR PROJECT REPORT

To search and summarize on Internet with Human Language Technology

Information Retrieval II

In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most

Sec. 8.7 RESULTS PRESENTATION

Search & Google. Melissa Winstanley

Statistical Insight - Help

Mining Web Data. Lijun Zhang

Relevance Feedback and Query Reformulation. Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price. Outline

CS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

An Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia

SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

Information Retrieval and Web Search

Search With Better Results. by Hewie Poplock

The Security Role for Content Analysis

How to Search: EBSCO HOST

21. Search Models and UIs for IR

CS54701: Information Retrieval

CS 6320 Natural Language Processing

deseo: Combating Search-Result Poisoning Yu USF

Models for Document & Query Representation. Ziawasch Abedjan

Chapter 27 Introduction to Information Retrieval and Web Search

Contextual Search using Cognitive Discovery Capabilities

Introduction to Information Retrieval

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

Query Answering Using Inverted Indexes

Umbraco SEO tabs and Search Engines

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island

BA Insight Federator. How the BA Insight Federator Extends SharePoint Search

Finding Information on the Information Highway. How to get around in the Internet

CABI Training Materials. CAB Direct. Simple Searching Global Health KNOWLEDGE FOR LIFE.

Brainspace: Quick Reference

Information Retrieval CS Lecture 13. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval: Retrieval Models

KDD 10 Tutorial: Recommender Problems for Web Applications. Deepak Agarwal and Bee-Chung Chen Yahoo! Research

Context based Re-ranking of Web Documents (CReWD)

INTRODUCTION. Chapter GENERAL

Searching the Internet

Unlock more volume with broad match on Bing Ads

Telkomtelstra Corporate Website Increase a Business Experience through telkomtelstra Website

Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge

Access the Google Analytics Demo Account. b/demoaccount

Supporting Constructivist Learning in a Multimedia Presentation System

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW

Evaluation. David Kauchak cs160 Fall 2009 adapted from:

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

Query Languages. Berlin Chen Reference: 1. Modern Information Retrieval, chapter 4

THE MINION SEARCH ENGINE: INDEXING, SEARCH, TEXT SIMILARITY AND TAG GARDENING

Today we show how a search engine works

UFCW Guide to Contracts on the Web

modern database systems lecture 4 : information retrieval

Cross Lingual Question Answering using CINDI_QA for 2007

Introduction. What do you know about web in general and web-searching in specific?

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Search Engines Exercise 5: Querying. Dustin Lange & Saeedeh Momtazi 9 June 2011

support.ebsco.com Business Searching Interface User Guide 1 support.ebsco.com

Building Search Applications

Transcription:

Search Engine Architecture II

Primary Goals of Search Engines Effectiveness (quality): to retrieve the most relevant set of documents for a query Process text and store text statistics to improve relevance Efficiency (speed): process queries from users as fast as possible Use specialized data structures J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 2

Two Major Functions Search engine components support two major functions The index process: building data structures that enable searching The query process: using those data structures to produce a ranked list of documents for a user s query J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 3

The Indexing Process J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 4

The Query Process J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 5

User Interaction Providing the interface between users and the search engine Tasks Accepting a user query and transforming it into index terms Taking the ranked list of documents from the search engine and organizing it into the results shown to the user Generating snippets to summarize documents Refining a user query to better represent the information need J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 6

Query Input Query input: providing an interface and a parser for a query language Simple languages: some Boolean operations on keywords More complex languages may consider issues like proximity of keywords [Discussion] What is the language used in Google and Bing? Do you think using punctuations in queries a good idea? J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 7

Transformation Query transformation: improving the initial query before and after producing a document ranking Tokenizing, stopping, and stemming queries Spell checking and query suggestion using query logs Query expansion based on analysis of term occurrences in documents Relevance feedback based on term occurrences in documents that are identified as relevant by a user J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 8

Results Output Constructing the display of ranked documents Generating snippets Highlighting important words and passages Clustering documents to identify related groups Adding appropriate ads Translating multiple language documents into the common language J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 9

The Query Process J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 10

Ranking and Evaluation Ranking: taking the transformed query from the user interaction component and generating a ranked list of documents using scores based on a retrieval model The core of a search engine Both effectiveness and efficiency matter Evaluation: measuring and monitoring the effectiveness and efficiency Recording and analyzing user behavior using log data Evaluation results are used to tune and improve the ranking component J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 11

Scoring Assigning a score to a document according to the query and a retrieval model Some models are efficient: Boolean model Some models are costly: language model compute the probability that a document is relevant to the query based on the words in the document J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 12

Performance Optimization Find an efficient way to compute the scores Using indexes term by term, term-at-a-time scoring Accessing all indexes for the query terms simultaneously, document-at-a-time Safe optimizations guarantee that the score calculated will be the same as the scores without optimization Unsafe optimization (approximation) may be faster in the tradeoff of accuracy J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 13

Distributed Processing Queries can be distributed to processors in a network Distributed by a query broker Caching: indexes or even ranked document lists from previous queries are left in local memory Reusing previous query results for popular queries J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 14

Evaluation Tasks Logs: clickthrough data and dwell time (time spent looking at a document) Can be used for spell checking, query suggestions, query caching, mapping ads to users, Ranking analysis: measuring the ranking method and comparing to alternatives Top ranked documents rather than the whole ranked list are more important Performance analysis: monitoring and improving overall system performance Response time, throughput, network usage Simulation methods J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 15

Summary A high-level description of search engine software architecture The indexing process and the query process Building blocks and their functionalities J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 16

Data in Search Engines Documents Queries Logs: history of queries and answers, possibly including user interaction data, such as clickthrough data Search engine task can be tackled using individual types of data and their combinations [Discussion] Identify the types of data used in the components of search engine architecture J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 17

To-Do List Expand the figures of the query process to include the detailed functionalities A more-like-this query occurs when a user clicks on a particular document in the result list and tells the search engine to find documents that are similar to this one. Describe which low-level components are used to answer this type of query and the sequence they are used in Document filtering is an application that stores a large number of queries or user profiles and compares these profiles to every incoming document on a feed. Documents that are sufficiently similar to the profile are forwarded to that person via email or some other mechanism. Describe the architecture of a filtering engine and how it may differ from a search engine Read textbook Chapter 2.3.4-2.4 J. Pei: Information Retrieval and Web Search -- Search Engine Architecture 18