Jan Pedersen 22 July 2010

Similar documents
S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

21. Search Models and UIs for IR

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Query Refinement and Search Result Presentation

Natural Language Processing

for Searching Social Media Posts

TREC OpenSearch Planning Session

Learning Dense Models of Query Similarity from User Click Logs

Entity and Knowledge Base-oriented Information Retrieval

Latent Semantic Indexing

Models for Document & Query Representation. Ziawasch Abedjan

Modern Retrieval Evaluations. Hongning Wang

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

NTT SMT System for IWSLT Katsuhito Sudoh, Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki NTT Communication Science Labs.

A Study of Pattern-based Subtopic Discovery and Integration in the Web Track

Chapter 2. Architecture of a Search Engine

Search Quality. Jan Pedersen 10 September 2007

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Module Contact: Dr Dan Smith, CMP Copyright of the University of East Anglia Version 1

Sponsored Search Advertising. George Trimponias, CSE

CS47300: Web Information Search and Management

Natural Language Processing

From Boolean Towards Semantic Retrieval Models. Speakers : Arpan Gupta, Seinjuti Chatterjee

CS54701: Information Retrieval

Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

ImgSeek: Capturing User s Intent For Internet Image Search

Information Retrieval. Information Retrieval and Web Search

From Neural Re-Ranking to Neural Ranking:

IN4325 Query refinement. Claudia Hauff (WIS, TU Delft)

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach

Chapter 6: Information Retrieval and Web Search. An introduction

Knowledge Management - Overview

Estimating Embedding Vectors for Queries

Automated Fixing of Programs with Contracts

Introduction to Text Mining. Hongning Wang

NUS-I2R: Learning a Combined System for Entity Linking

> Data Mining Overview with Clementine

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A Few Things to Know about Machine Learning for Web Search

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Clickthrough Log Analysis by Collaborative Ranking

60-538: Information Retrieval

PROVIDING PRIVACY AND PERSONALIZATION IN SEARCH

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22

Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency

An adaptable search system for collection of partially structured documents

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

Search Engine Architecture II

Boolean Queries. Keywords combined with Boolean operators:

Information Retrieval and Web Search

ΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου

Hybrid Approach for Query Expansion using Query Log

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Information Retrieval

Categorizing Search Results Using WordNet and Wikipedia

What you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web

Information Retrieval

Addressing the Challenges of Underspecification in Web Search. Michael Welch

The Security Role for Content Analysis

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search

A SYSTEMATIC STUDY OF MULTI-LEVEL QUERY UNDERSTANDING YANEN LI DISSERTATION

NTUBROWS System for NTCIR-7. Information Retrieval for Question Answering

Lessons Learned in Static Analysis Tool Evaluation. Providing World-Class Services for World-Class Competitiveness

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy

5 steps for Improving Intranet Search

University of Delaware at Diversity Task of Web Track 2010

Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies

Automatic Boolean Query Suggestion for Professional Search

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

A Novel Approach for Restructuring Web Search Results by Feedback Sessions Using Fuzzy clustering

Log Linear Model for String Transformation Using Large Data Sets

Query Reformulation for Clinical Decision Support Search

CS47300: Web Information Search and Management

International Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15 ISSN

UMass at TREC 2017 Common Core Track

Diversification of Query Interpretations and Search Results

The Sum of Its Parts: Reducing Sparsity in Click Estimation with Query Segments

Introduction to Information Retrieval (Manning, Raghavan, Schutze)

Structured Learning. Jun Zhu

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Elementary IR: Scalable Boolean Text Search. (Compare with R & G )

Information Retrieval and Web Search

A Survey Paper on String Transformation using Probabilistic Approach

Federated Search. Jaime Arguello INLS 509: Information Retrieval November 21, Thursday, November 17, 16

CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 3)"

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Improving Document Ranking for Long Queries with Nested Query Segmentation

Introduction to Information Retrieval

Relevance Feedback and Query Reformulation. Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price. Outline

Query Evaluation Strategies

Formal Semantics. Prof. Clarkson Fall Today s music: Down to Earth by Peter Gabriel from the WALL-E soundtrack

Part I: Data Mining Foundations

slide courtesy of D. Yarowsky Splitting Words a.k.a. Word Sense Disambiguation Intro to NLP - J. Eisner 1

Introduction to Information Retrieval. Lecture Outline

Information Retrieval. (M&S Ch 15)

Key differentiating technologies for mobile search

Optimal Query. Assume that the relevant set of documents C r. 1 N C r d j. d j. Where N is the total number of documents.

Managing Design Processes

Transcription:

Jan Pedersen 22 July 2010

Outline Problem Statement Best effort retrieval vs automated reformulation Query Evaluation Architecture Query Understanding Models Data Sources

Standard IR Assumptions Queries are well-formed expressions of intent Best effort response to the query as given Users will reformulate If results do not meet information need Actually, queries often contain errors Misspelling Incorrect terms Users can t reformulate if they don t understand what s wrong Miss good content Dead ends

Problem Definitions Best Effort Retrieval Find the most relevant results for the user query Segmentation Stemming/Synonym expansion Term deletion Automated Query Reformulation Modify the user query to produce more relevant results for the inferred intent Spell correction Term deletion

Spelling Correction Recourse Link

Stemming Equivalent Forms Blended Results

Abbreviations Federated Results Term Expansion

Term Relaxation Key term extraction Incorrect intent

Incorrect Terminology Term Relaxation Relaxed Query

Key Concepts Win/Loss ratios Automated reformulations are errorful Wins are queries whose results improve Losses are queries whose results degrade Related to precision But not all valid reformulations change results Pre vspost result analysis Query alternatives generated pre-results Blending decisions are post results

Query Evaluation Query Annotation L4 Federation layer L3 L3 Merge layer Ranked candidates Merge layer L0/L1/L2 L0/L1/L2 L0/L1/L2 L0/L1/L2 L0/L1/L2 L0/L1/L2 Inverted Index Matching and Ranking Inverted Index Matching and Ranking

Matching and Ranking L0/L1 matching/ranking 10 10 Docs L3 reranking 10 3 Docs L2 reranking 10 5 Docs Sequence of select, L4 rank, prune steps 10 1 L0: Boolean logic L1: IR score L2/L3/L4: Machine learned Learning to Rank Using Gradient Descent, Burges et al., ICML 2005

Query Annotation {UN} Unigram Equiv Query {United Nations} Bigram {jobs} Unigram NLP query annotation Offline analysis Ambiguity preserving Multiple interpretations Backend Independent Shared Structure and Attributes Syntax and semantics

Query Planning [{un united nations } jobs] L3-merge(L2-rank([un jobs]), L2-rank([ united nations jobs]) or [{un united nations } jobs] L3-cascade(threshold, L2-rank([un jobs]), L2-rank([ united nations jobs])

Design Considerations Efficiency Query plan can generate multiple backend queries Merged in L3 Some queries are cheaper than others Query reduction can improve performance Relevance L3 merging has maximal information But is costly Multiple query plan strategies Depending on query analysis confidence

Query Analysis Models Noisy Channel Model arg max{ P ( rewrite query)} = arg max{ P( rewrite) P( query Language Model rewrite)} Example: Spelling Language Model: Likelihood of the correction Translation Model: Likelihood of the error occurring Translation Model

Language Models Based on Large-Scale Text Mining Unigrams and N-grams Probability of query term sequence Favor queries seen before Avoid nonsensical combinations 1T n-gram resource Multi-Style Language Model for Web Scale Information Retrieval Wang et al., SIGIR 2010 SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets Chaiken et al., PVLDB 2008

Translation Models Training set of aligned pairs E.g. misspelling/correction; surface/stemmed Query log analysis Session reformulation associated queries Co-click associated queries Manual Annotation Many application-specific models Random Walks on the Click Graph Craswell et al., SIGIR 2007 Context sensitive synonym discovery for web search queries Wei et al., CIKM 2009 Context sensitive stemming for web search Peng et al, SIGIR 2007

Summary Query Understanding is critical to Web Search Affects most queries Can radically improve results Trade-off between relevance and efficiency Rewrites can be costly Win/loss ratio is the key metric Especially important for tail queries No meta-data to guide matching and ranking