Diversification of Query Interpretations and Search Results


Diversification of Query Interpretations and Search Results. Advanced Methods of IR. Elena Demidova. Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In Proc. of ACM SIGIR 2008. E. Demidova, P. Fankhauser, X. Zhou, and W. Nejdl. 2010. DivQ: Diversification for keyword search over structured databases. In Proc. of ACM SIGIR 2010. C.D. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

The probability ranking principle (PRP): If an IR system's response to each query is a ranking of documents in order of decreasing probability of relevance, the overall effectiveness of the system to its users will be maximized [1]. Let's look at some examples and question this principle.

An example: a search engine result for the query UPS. What are the possible information needs? Which results are relevant to these information needs?

An example: a search engine result for the query UPS. Recap: how can we quantify the quality of the search result?

Evaluation measures for ranked retrieval: precision@k. Recap: what is the precision@5 of this result (for the information need(s) you identified)?

Evaluation measures for ranked retrieval: NDCG. NDCG (normalized discounted cumulative gain) is a standard evaluation measure when graded relevance values are available. Intuition: relevant results retrieved earlier should be rewarded. NDCG@k computation:
Step 1: assess R(j, m), the relevance of the result at position m for query j.
Step 2: transform the relevance of the result into a gain: G[m] = 2^{R(j, m)} − 1.
Step 3: discount the gain of the result at position m by log_2(1 + m).
Step 4: accumulate the discounted gain over the top-k results (DCG).
Step 5: normalize the DCG using Z_{kj}, a normalization factor such that the NDCG of the perfect ranking at k is 1.
Step 6: average over a set of queries Q.
How is NDCG different from precision@k?
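A minimal sketch of these steps for a single query j; the graded judgments in the example are hypothetical.

```python
# A sketch of NDCG@k for a single query, following the steps above.
import math

def dcg_at_k(relevance, k):
    """Gain 2^R - 1 at position m, discounted by log2(1 + m), accumulated over the top k."""
    return sum((2 ** r - 1) / math.log2(1 + m)
               for m, r in enumerate(relevance[:k], start=1))

def ndcg_at_k(relevance, k):
    """Normalize by Z_kj, the DCG of the ideal (descending) ordering, so a perfect ranking scores 1."""
    z = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / z if z > 0 else 0.0

# Hypothetical R(j, m) for the five returned results of one query (0 = non-relevant, 2 = highly relevant);
# averaging over a query set Q would be the final step.
print(ndcg_at_k([2, 0, 1, 2, 0], k=5))
```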

Novelty and diversity in IR evaluation. NDCG@k and precision@k assume that the relevance of each result can be judged in isolation, independently of the other results. Ambiguity in queries and redundancy in retrieved documents are therefore poorly reflected by evaluation measures such as precision@k and NDCG. Evaluation should systematically reward novelty and diversity.

Example from [C. Clarke et al., SIGIR 2008]: a question answering example. The questions can be viewed as representatives of examples the user may be seeking.

Example from [C. Clarke et al., SIGIR 2008]: the question answering example (continued).

Information nuggets. An information nugget n represents any binary property of a document, e.g. an answer to a question, topicality, etc. N = {n_1, ..., n_m} is the space of all possible nuggets. The user's information need u is a set of nuggets, a subset of N. The information present in a document is a set of nuggets d, also a subset of N (u ⊆ N, d ⊆ N).

Information nuggets. A document d is relevant to an information need u if it contains at least one relevant information nugget, i.e. d ∩ u ≠ ∅.
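A minimal sketch of the nugget model under these definitions; the nugget identifiers are hypothetical.

```python
# The information need u and each document are sets of nuggets; a document is
# relevant iff it shares at least one nugget with u.
u = {"n1", "n3"}           # hypothetical information need
d1 = {"n1", "n2"}          # contains the relevant nugget n1
d2 = {"n4"}                # contains no relevant nugget

def is_relevant(document_nuggets, need_nuggets):
    return bool(document_nuggets & need_nuggets)

print(is_relevant(d1, u))  # True
print(is_relevant(d2, u))  # False
```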

Diversity-aware evaluation measures: α-NDCG. α-NDCG adjusts the gain computation of NDCG to discourage redundant results. J(d_k, i) indicates that the assessor judged that document d_k contains nugget n_i. α is a factor that balances relevance and novelty. r_{i,k-1} is the number of documents ranked up to position k−1 that have been judged to contain nugget n_i. The gain sums over all m nuggets: G[k] = Σ_{i=1}^{m} J(d_k, i) · (1 − α)^{r_{i,k-1}}. All other steps are equivalent to the NDCG computation.
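A minimal sketch of this gain computation, assuming judgments are given as a mapping from each document to the set of nuggets it was judged to contain; the documents, nuggets, and judgments below are hypothetical.

```python
# Each ranked document's gain sums (1 - alpha)^r over the nuggets it is judged to contain,
# where r counts earlier documents already judged to contain that nugget.
def alpha_ndcg_gains(ranking, judgments, alpha=0.5):
    seen = {}                                   # nugget -> number of earlier documents containing it
    gains = []
    for doc in ranking:
        gain = 0.0
        for nugget in judgments.get(doc, ()):   # nuggets judged present in this document
            gain += (1 - alpha) ** seen.get(nugget, 0)
            seen[nugget] = seen.get(nugget, 0) + 1
        gains.append(gain)
    return gains

# The first document contributes two fresh nuggets (gain 2); the second repeats one (gain 0.5).
judgments = {"a": {"n1", "n2"}, "b": {"n1"}}
print(alpha_ndcg_gains(["a", "b"], judgments))  # [2.0, 0.5]
```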

Example from [C. Clarke et al., SIGIR 2008]: compute the gain vector for α-NDCG. Work in groups: 10 minutes. J(d_k, i): the assessor judged that document d_k contains nugget n_i. α, the factor that balances relevance and novelty, is set to 0.5. r_{i,k-1}: the number of documents ranked up to position k−1 that have been judged to contain nugget n_i. Sum over the nuggets i = 1, ..., m.

Example from [C. Clarke et al., SIGIR 2008]: compute the gain vector for α-NDCG. J(d_k, i): the assessor judged that document d_k contains nugget n_i. α = 0.5. r_{i,k-1}: the number of documents ranked up to position k−1 that have been judged to contain nugget n_i. Sum over the nuggets i = 1, ..., m. Resulting gains: G[a] = 1 + 1 = 2, G[b] = 0.5, G[c] = 0.5^2 = 0.25, G[d] = 0, G[e] = 2, G[f] = 0.5, G[g] = 1, G[h] = 0.5^2 = 0.25, G[i] = 0, G[j] = 0.

Diversification in databases: the goals. Materialization of results is expensive in databases. Goals: provide an overview of the variety of differently structured search results, and develop efficient algorithms and evaluation metrics for search result diversification over structured data. Query interpretations have clear semantics, so they offer quality information for diversification. We diversify query interpretations and avoid materializing and filtering redundant results.

An example query. Keyword query: CONSIDERATION CHRISTOPHER GUEST.
Ranking:
1. A director CHRISTOPHER GUEST of a movie CONSIDERATION
2. A director CHRISTOPHER GUEST
3. An actor CHRISTOPHER GUEST in a movie CONSIDERATION
Diversification:
1. A director CHRISTOPHER GUEST of a movie CONSIDERATION
2. An actor GUEST
3. As character CHRISTOPHER
Query interpretations can be similar and return overlapping results.

Recap: from keywords to structured queries. K = consideration christopher guest. Keyword interpretations: consideration → σ_{consideration ∈ title}(Movie); christopher guest → σ_{christopher guest ∈ name}(Director). Template: T = σ_{? ∈ name}(Director) ⋈ Directs ⋈ σ_{? ∈ title}(Movie). Combining the interpretations with the template yields the structured query Q = σ_{christopher guest ∈ name}(Director) ⋈ Directs ⋈ σ_{consideration ∈ title}(Movie).
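A minimal sketch of instantiating such a template with keyword interpretations. Rendering the relational-algebra query as SQL, and the join and column names of this toy schema, are assumptions made only for illustration.

```python
# Map each keyword to the (table, attribute) it is interpreted against, then fill the
# Director -> Directs -> Movie template with one containment predicate per keyword.
interpretations = {
    "christopher guest": ("Director", "name"),
    "consideration": ("Movie", "title"),
}

def instantiate_template(interpretations):
    predicates = [f"{table}.{attribute} LIKE '%{keyword}%'"
                  for keyword, (table, attribute) in interpretations.items()]
    return ("SELECT * FROM Director "
            "JOIN Directs ON Directs.director_id = Director.id "
            "JOIN Movie ON Movie.id = Directs.movie_id "
            "WHERE " + " AND ".join(predicates))

print(instantiate_template(interpretations))
```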

Recap: probability of a query interpretation. P(Q | K) = P(I, T | K), where I is the set of keyword interpretations in Q (consideration: movie.title, christopher guest: director.name) and T is the template of Q, T = σ_{? ∈ name}(Director) ⋈ Directs ⋈ σ_{? ∈ title}(Movie). We use database and query log statistics to estimate P(Q | K), and we rank interpretations in order of decreasing probability of relevance. Compare: the probability ranking principle (PRP).

Relevance ranking using probability estimates. Keyword query: CONSIDERATION CHRISTOPHER GUEST.
Ranking:
1. A director CHRISTOPHER GUEST of a movie CONSIDERATION
2. A director CHRISTOPHER GUEST
3. An actor CHRISTOPHER GUEST in a movie CONSIDERATION
Although the queries are different, their results highly overlap.

Diversifying query interpretations. Estimating query similarity: Sim(Q_1, Q_2) = |I_1 ∩ I_2| / |I_1 ∪ I_2|, the Jaccard coefficient computed on the sets of keyword interpretations. For example, with I_1 = {consideration: movie.title, christopher guest: director.name} and I_2 = {christopher guest: director.name}, Sim(Q_1, Q_2) = 0.5. Combining relevance and similarity: Score(Q) = λ · P(Q | K) − (1 − λ) · (Σ_{q ∈ QI} Sim(Q, q)) / |QI|, where QI is the set of already selected query interpretations and λ balances relevance and novelty in the diversification procedure.
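A minimal sketch of the similarity measure and the combined score, using the example interpretations from this slide; the P(Q | K) value and the λ setting in the usage lines are hypothetical.

```python
# Keyword interpretations are modelled as sets of "keyword: attribute" strings.
I1 = {"consideration: movie.title", "christopher guest: director.name"}
I2 = {"christopher guest: director.name"}

def sim(a, b):
    """Jaccard coefficient on the sets of keyword interpretations."""
    return len(a & b) / len(a | b)

def score(p_q_given_k, candidate, selected, lam=0.5):
    """Score(Q) = lambda * P(Q|K) - (1 - lambda) * average similarity to already selected interpretations."""
    if not selected:
        return lam * p_q_given_k
    redundancy = sum(sim(candidate, s) for s in selected) / len(selected)
    return lam * p_q_given_k - (1 - lam) * redundancy

print(sim(I1, I2))            # 0.5, as on the slide
print(score(0.4, I1, [I2]))   # relevance traded off against overlap with I2
```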

Diversification algorithm. Input: a list L of the top-k query interpretations ranked by relevance. Output: a list R of r relevant and diverse query interpretations.
Procedure SelectDiverseQueryInterpretations:
  copy the first element of L into R;
  while (fewer than r elements selected) {
    // select the best candidate for the next position in R
    best_score = 0;
    while (more candidates in L) {
      if (the score upper bound cannot beat best_score) break;
      if (score(L[j]) > best_score) update best_score and remember L[j];
    }
    add the candidate from L with best_score to R;
  }
End Procedure.
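A minimal Python sketch of this greedy selection, omitting the score upper-bound pruning step; it reuses a Jaccard-style similarity as on the previous slide, and the candidate interpretations and probabilities in the usage lines are hypothetical.

```python
# `ranked` is a relevance-ordered list of (interpretation, P(Q|K)) pairs.
def select_diverse(ranked, r, sim, lam=0.5):
    selected = [ranked[0]]                  # copy the most relevant interpretation first
    remaining = list(ranked[1:])
    while remaining and len(selected) < r:
        def score(candidate):
            interpretation, p = candidate
            redundancy = sum(sim(interpretation, s) for s, _ in selected) / len(selected)
            return lam * p - (1 - lam) * redundancy
        best = max(remaining, key=score)    # candidate with the best relevance/novelty score
        remaining.remove(best)
        selected.append(best)
    return selected

jaccard = lambda a, b: len(a & b) / len(a | b)
ranked = [(frozenset({"cg: director.name", "c: movie.title"}), 0.5),
          (frozenset({"cg: director.name"}), 0.3),
          (frozenset({"cg: actor.name", "c: movie.title"}), 0.2)]
print(select_diverse(ranked, r=2, sim=jaccard))
```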

Diversification. Keyword query: CONSIDERATION CHRISTOPHER GUEST.
Diversified ranking:
1. A director CHRISTOPHER GUEST of a movie CONSIDERATION
2. An actor GUEST
3. As character CHRISTOPHER
A (better?) overview of the available results. Measures are needed.

Evaluation metrics: α-NDCG and S-Recall. α-NDCG: a tradeoff between relevance and novelty; a document is viewed as a set of information nuggets, and α-NDCG discounts the gain of a document whose nuggets have already been seen. S-Recall: the number of unique subtopics covered by the first k results, divided by the total number of subtopics.
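A minimal sketch of S-Recall@k; the documents, subtopics, and judgments below are hypothetical.

```python
# Fraction of all subtopics covered by the top-k results.
def s_recall_at_k(ranking, subtopics_of, all_subtopics, k):
    covered = set().union(*(subtopics_of[d] for d in ranking[:k]))
    return len(covered & all_subtopics) / len(all_subtopics)

subtopics_of = {"d1": {"s1", "s2"}, "d2": {"s2", "s3"}, "d3": {"s4"}}
print(s_recall_at_k(["d1", "d2", "d3"], subtopics_of, {"s1", "s2", "s3", "s4"}, k=2))  # 0.75
```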

Evaluation metrics: α-NDCG-W and WS-Recall. An information nugget in α-NDCG corresponds to a tuple (primary key) in a database result. In a database, the relevance of primary keys varies a lot, e.g. a tuple matching σ_{christopher guest ∈ name}(Director) vs. one matching σ_{consideration ∈ plottext}(Plot). α-NDCG and S-Recall do not take the graded relevance of information nuggets or subtopics into account.

Adapting the gain for α-NDCG-W. α-NDCG-W takes the graded relevance of subtopics into account: G[k] = relevance(Q_k) · (1 − α)^r, where r expresses the overlap of the results of Q_k with the results of the query interpretations at ranks 1, ..., k−1: r = |{pk_i ∈ Q_k | ∃ j ∈ [1, k−1]: pk_i ∈ Q_j}|.
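A minimal sketch of this adapted gain; the primary keys and relevance grade below are hypothetical.

```python
# The gain of interpretation Q_k is its graded relevance, discounted by (1 - alpha)^r,
# where r counts Q_k's primary keys already returned by higher-ranked interpretations.
def alpha_ndcg_w_gain(result_pks, earlier_pks, relevance_q_k, alpha):
    r = len(result_pks & earlier_pks)          # pks of Q_k already seen at ranks 1..k-1
    return relevance_q_k * (1 - alpha) ** r

# Two of Q_k's three tuples were already returned by higher-ranked interpretations.
print(alpha_ndcg_w_gain({"pk1", "pk2", "pk3"}, {"pk1", "pk2"}, relevance_q_k=1.0, alpha=0.5))  # 0.25
```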

WS-Recall. WS-Recall is the aggregated relevance of the subtopics covered by the top-k results, divided by the maximum possible aggregated relevance when all relevant subtopics are covered: WS-Recall@k = (Σ_{pk ∈ Q_1 ∪ ... ∪ Q_k} relevance(pk)) / (Σ_{pk ∈ U} relevance(pk)), where U is the set of all relevant subtopics.
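A minimal sketch of WS-Recall@k; the primary keys and relevance grades below are hypothetical.

```python
# Relevance mass of the subtopics (pks) covered by the top-k results, divided by the
# total relevance mass of all relevant subtopics in U.
def ws_recall_at_k(covered_pks, relevance_of_pk):
    """relevance_of_pk maps every relevant pk in U to its graded relevance."""
    total = sum(relevance_of_pk.values())
    return sum(relevance_of_pk[pk] for pk in covered_pks if pk in relevance_of_pk) / total

relevance_of_pk = {"pk1": 2.0, "pk2": 1.0, "pk3": 1.0}
print(ws_recall_at_k({"pk1", "pk3"}, relevance_of_pk))  # 0.75
```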

Evaluation. Goal: assess the quality of the disambiguation and diversification scheme through a user study and a set of experiments. Datasets: IMDB (seven tables, 10 million records) and Lyrics (five tables, 400,000 records). Queries: Web search engine queries for the movie and lyrics domains; 50 single-concept (sc) and multi-concept (mc) queries for each dataset.

Assessing the relevance of query interpretations. There are too many interpretations to judge exhaustively. Can a query interpretation reflect the information need represented by the query? Ground truth collection: a user study with 16 participants. Relevance of query interpretations: assessed manually by the study participants. Relevance of primary keys: we transfer the relevance of a query interpretation to the primary keys this interpretation retrieves.

Query disambiguation: α-NDCG-W, α = 0. With α = 0, the novelty of results is completely ignored, and α-NDCG-W corresponds to standard NDCG. The relatively high α-NDCG-W values for α = 0 on both IMDB and Lyrics confirm the quality of the ranking function. (Plots: Y-axis: α-NDCG-W; X-axis: k for the top-k interpretations; Rank: ranking without diversification; Div: ranking with diversification.)

Diversification: α-NDCG-W, α = 0.99. With α = 0.99, novelty becomes crucial, and results without novelty are regarded as completely redundant. On both IMDB and Lyrics, diversification performed on top of the query ranking achieves a significant reduction of result redundancy while preserving retrieval quality in the majority of cases. (Plots: Y-axis: α-NDCG-W; X-axis: k for the top-k interpretations; Rank: ranking without diversification; Div: ranking with diversification.)

λ: balancing relevance and novelty (α-NDCG-W@top-5, α = 0.99, Lyrics). The α-NDCG-W values achieved by diversification decrease with increasing λ, until they meet the α-NDCG-W of the ranking at λ = 1. The smaller the value of λ, the more visible the impact of diversification and the more the α-NDCG-W values of diversification outperform the original ranking. With increasing λ, the relevance of query interpretations dominates over novelty and the amount of re-ranking achieved by diversification becomes smaller. Recall the scoring function: Score(Q) = λ · P(Q | K) − (1 − λ) · (Σ_{q ∈ QI} Sim(Q, q)) / |QI|. (Plot: Y-axis: α-NDCG-W; Rank: ranking without diversification; Div: ranking with diversification.)

Discussion. An approach for search result diversification over structured data: a probabilistic query disambiguation model, a query similarity measure, a greedy algorithm to obtain relevant and diverse interpretations, evaluation metrics to measure diversification quality, and a user study and experiments to evaluate the proposed scheme. The novelty of keyword search results was substantially improved: search results obtained using the proposed diversification scheme better characterize the possible answers available in the database than the results obtained by the initial relevance ranking.

References and further reading.
References:
[1] Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In Proc. of ACM SIGIR 2008. http://doi.acm.org/10.1145/1390334.1390446
[2] E. Demidova, P. Fankhauser, X. Zhou, and W. Nejdl. 2010. DivQ: Diversification for keyword search over structured databases. In Proc. of ACM SIGIR 2010. http://doi.acm.org/10.1145/1835449.1835506
Further reading:
[3] Enrico Minack, Wolf Siberski, and Wolfgang Nejdl. 2011. Incremental diversification for very large sets: a streaming-based approach. In Proc. of ACM SIGIR 2011. http://doi.acm.org/10.1145/2009916.2009996
[4] Jun Wang and Jianhan Zhu. 2009. Portfolio theory of information retrieval. In Proc. of ACM SIGIR 2009. http://doi.acm.org/10.1145/1571941.1571963