A Dynamic Bayesian Network Click Model for Web Search Ranking

A Dynamic Bayesian Network Click Model for Web Search Ranking
Olivier Chapelle and Anne Ya Zhang
April 22, 2009, 18th International World Wide Web Conference

Introduction: Motivation
Clicks provide valuable implicit relevance feedback, and they are more abundant and cheaper than editorial judgments. For learning a web search ranking function, clicks have been used as a target [Joachims 02] or as a feature [Agichtein et al. 06].
Main difficulty: presentation bias. Results at lower positions are less likely to be clicked even if they are relevant.

Problem statement
Unbias the clicks: estimate what the click-through rate (CTR) of a given url would have been had it been shown in first position. The goal is not click modeling per se.
Two main types of click models are used to unbias the click logs:
1. Position-based models
2. The cascade model [Craswell et al. 08]
Position-based models are widely used, but cascade models turn out to be more accurate. We propose an extension of the cascade model and apply it to the problem of learning a ranking function.

Outline
1. Click modeling: position-based models, the cascade model, a Dynamic Bayesian Network
2. Experiments: CTR@1 prediction, predicted relevance as a feature, predicted relevance as a target
3. Discussion: a simplified model, editorial judgments vs clicks

Position-based model
Naive approach: unbias by computing the aggregated CTRs at the various positions.
Examination model: the user clicks if he examines the snippet and finds it attractive, and the probability of examination depends only on the position: $P(C = 1 \mid u, p) = a_u b_p$. Estimation is done through maximum likelihood; the model leverages variations in the search results to isolate the position effect.
Variation: logistic model, $P(C = 1 \mid u, p) = (1 + \exp(-a_u - b_p))^{-1}$. This gives an unconstrained and jointly convex problem: easier optimization.
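For concreteness, here is a minimal Python sketch of the two parameterizations above; the function names are ours, and a_u, b_p are assumed to be already-estimated parameters:

import math

# Examination model: P(C = 1 | u, p) = a_u * b_p, where a_u is the
# attractiveness of url u and b_p the probability that position p is examined.
def click_prob_examination(a_u, b_p):
    return a_u * b_p

# Logistic variant: P(C = 1 | u, p) = 1 / (1 + exp(-a_u - b_p)); here a_u and
# b_p are unconstrained real scores, so maximum-likelihood estimation becomes
# an unconstrained, jointly convex problem.
def click_prob_logistic(a_u, b_p):
    return 1.0 / (1.0 + math.exp(-a_u - b_p))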

Typical results for the position-based model (normalized to 1 at position 1). [Figure: relative position effect on the CTR, naive vs examination models, positions 1 to 10.]
Example: under the examination model, moving a document from position 1 to 3 will reduce its CTR by about 50%.

Problem with position-based models
Example: query = myspace, url = www.myspace.com, market = UK.
Ranking 1: position 1, uk.myspace.com, CTR = 0.97; position 2, www.myspace.com, CTR = 0.11.
Ranking 2: position 1, www.myspace.com, CTR = 0.97.
Position-based models cannot explain such a variation. Underlying problem: the probability of examination depends more on the relevance of the documents above than on the position itself.

Cascade model
Introduced by Craswell et al. (WSDM 2008).
1: i = 1
2: User examines position i and sees url u.
3: if random(0,1) < a_u then
4:     User clicks in position i and stops.
5: else
6:     i := i + 1; go to 2
7: end if
The user does a top-down scan of the documents and stops when he finds a relevant one. Probability of a click in position 3: $(1 - a_{u_1})(1 - a_{u_2})\, a_{u_3}$.
Limitation: exactly one click per session (by definition). But it already outperforms position-based models, and it correctly handles the myspace example.
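A minimal Python sketch of this behavior (names are ours; `attractiveness` is the list of a_u values of the urls in ranked order):

import random

# One simulated session under the cascade model: scan top-down, click with
# probability a_u, stop at the first click. Returns the clicked position
# (1-based); with a finite result list the scan can also end with no click,
# in which case None is returned.
def simulate_cascade(attractiveness):
    for i, a_u in enumerate(attractiveness, start=1):
        if random.random() < a_u:
            return i
    return None

# Closed form: P(click at position p) = (1 - a_u1) ... (1 - a_u_{p-1}) * a_up
def cascade_click_prob(attractiveness, position):
    prob = attractiveness[position - 1]
    for a_u in attractiveness[:position - 1]:
        prob *= 1.0 - a_u
    return prob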

Proposed extension
Two main differences:
1. After clicking, there is some probability that the user is not satisfied and goes back to the search results.
2. The user may abandon his search before finding a relevant result.
This allows for 0 or more than 1 click per session.

Dynamic Bayesian Network for the cascade model
[Figure: DBN graphical model; for each position i, nodes $E_{i-1}, E_i, E_{i+1}, C_i, A_i, S_i$ with parameters $a_u$, $s_u$.]
For each position i there are 4 binary variables $E_i, A_i, S_i, C_i \in \{0, 1\}$: $E_i$, did the user examine the position? $A_i$, was the user attracted? $S_i$, was the user satisfied? $C_i$, did the user click?
The model is specified by:
$E_1 = 1$ (the first position is always examined)
$C_i = 1$ if and only if $A_i = 1$ and $E_i = 1$
$P(A_i = 1) = a_u$
$P(S_i = 1 \mid C_i = 1) = s_u$, and $C_i = 0 \Rightarrow S_i = 0$
$S_i = 1 \Rightarrow E_{i+1} = 0$ (a satisfied user stops)
$P(E_{i+1} = 1 \mid E_i = 1, S_i = 0) = \gamma$, and $E_i = 0 \Rightarrow E_{i+1} = 0$
Here $a_u$ measures the perceived relevance, $s_u$ is the ratio between actual and perceived relevance, and $P(S_i = 1 \mid E_i = 1) = a_u s_u$.
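Given these definitions, the marginal click probabilities follow from a simple forward recursion over the examination probability; a minimal Python sketch of that computation (function and argument names are ours):

# Marginal click probability at each position under the DBN, given the
# perceived relevances a[i], the satisfaction probabilities s[i] and the
# persistence gamma. Uses P(E_1 = 1) = 1, P(C_i = 1) = P(E_i = 1) * a_u and,
# since P(S_i = 1 | E_i = 1) = a_u * s_u,
# P(E_{i+1} = 1) = gamma * (1 - a_u * s_u) * P(E_i = 1).
def dbn_click_probs(a, s, gamma):
    probs = []
    p_exam = 1.0                    # P(E_1 = 1) = 1
    for a_u, s_u in zip(a, s):
        probs.append(p_exam * a_u)  # P(C_i = 1)
        p_exam *= gamma * (1.0 - a_u * s_u)
    return probs

As a sanity check, with γ = 1 and s_u = 1 for every url the recursion reduces exactly to the cascade model above.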

Inference
The model is similar to an HMM (Hidden Markov Model) and can be trained in the same way, using the Expectation-Maximization (EM) algorithm:
E-step: given $a_u$, $s_u$ and $\gamma$, compute the posterior probabilities of $A_i$, $E_i$ and $S_i$. This involves the forward-backward algorithm.
M-step: given the posterior probabilities, update $a_u$, $s_u$ and $\gamma$ (simple counting).
Note that the inference for different queries is decoupled. Throughput: 10k sessions per second; processing several months of data on a Map-Reduce cluster typically takes several hours.

Extension to other clicks
Introduce two virtual nodes to capture clicks that are not on the search results:
Position 0: anything at the top of the search result page (sponsored ads, spelling suggestion)
Position 11: anything at the bottom (next-page button)

Outline
1. Click modeling
2. Experiments: CTR@1 prediction, predicted relevance as a feature, predicted relevance as a target
3. Discussion

Experimental setup
Session definition: unique user and query, ended by 60 minutes of idle time. Only the first page of results is considered. Sessions in which the clicks are not in the same order as the ranking are ignored. Only queries with at least 10 sessions are kept.
One month of UK data: 682k unique queries and 58M sessions.

Experiment (I)
Predict attractiveness, i.e. CTR@1, with a leave-first-position-out protocol:
1. Retrieve all the sessions related to a given query.
2. Consider a url that appeared both in position 1 and in other positions.
3. Hold out as test the sessions in which that url appeared in position 1.
4. Train the model on the remaining sessions and predict $a_u$.
5. Compute the test CTR in position 1 on the held-out sessions.
6. Compute an error between these two quantities.
7. Average the error over all such urls and queries, weighted by the number of test sessions.

MSE as a function of the minimum number of training sessions: left = all queries; right = head queries. [Figure: MSE (0 to 0.18) vs minimum number of sessions (10^0 to 10^4) for the Logistic, Examination, DBN, Cascade and COEC models.]

Influence of γ. [Figure: MSE (0.036 to 0.05) as a function of γ (0.2 to 1), with the minimum reached for γ close to 1.] Users are persistent.

Experiments (II)
Use the predicted relevance $a_u s_u$ for ranking.
1. Simply rank according to the predicted relevance. NDCG comparison, computed for urls with at least 10 sessions and queries with at least 10 urls:
Logistic: 0.705 (-7.8%)
Cascade: 0.730 (-4.6%)
DBN, 10 nodes: 0.748 (-2.2%)
DBN, 12 nodes: 0.765
DBN, 12 nodes ($a_u$ only): 0.756 (-1.2%)
Machine-learned ranking function trained with hundreds of ranking features: 0.795 (+3.9%)
2. Add the predicted relevance as a feature in the machine-learned ranking function: +0.8%. The feature is one of the top 10 most important ones.

Experiments (III)
Use the predicted relevance as a target: useful for markets with few editorial judgments. Technique: boosted decision trees trained on pairwise preferences, with 2 sets of preferences: $P_E$ from editorial judgments and $P_C$ coming from the click modeling. Minimize
$$(1 - \delta) \sum_{(x_i, x_j) \in P_E} \varphi\big(f(x_i) - f(x_j)\big) \;+\; \delta \sum_{(x_i, x_j) \in P_C} \varphi\big(f(x_i) - f(x_j)\big),$$
with $\varphi(t) = \max(0, 1 - t)^2$.
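The objective itself is easy to state in code; here is a minimal Python sketch of the loss being minimized (the data structures are assumptions of ours, and the actual optimization over boosted trees is not shown):

# Squared hinge: phi(t) = max(0, 1 - t)^2
def squared_hinge(t):
    return max(0.0, 1.0 - t) ** 2

# Blend of editorial and click-derived pairwise preferences. `scores` maps a
# document id to its current score f(x); each pair (i, j) in a preference set
# means that document i should be ranked above document j.
def combined_pairwise_loss(scores, editorial_pairs, click_pairs, delta):
    loss_e = sum(squared_hinge(scores[i] - scores[j]) for i, j in editorial_pairs)
    loss_c = sum(squared_hinge(scores[i] - scores[j]) for i, j in click_pairs)
    return (1.0 - delta) * loss_e + delta * loss_c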

DCG relative to δ = 0. [Figure: relative DCG (0.97 to 1.02) as a function of δ (0 to 1); the left end uses only editorial judgments, the right end only clicks.] About a 2% gain by combining clicks and editorial judgments.

Outline
1. Click modeling
2. Experiments
3. Discussion: a simplified model, editorial judgments vs clicks

A simplified model
The optimal γ is 0.9, but γ = 1 is almost as good. In that case inference is much easier, because we know that the urls after the last click have not been examined. Simple counting algorithm for estimating $a_u$ and $s_u$:
$a_u = \#\text{clicks} / \#\text{views}$, where view := clicked, or a click below;
$s_u = \#\text{last clicks} / \#\text{clicks}$.
An extremely simple and efficient way to alleviate the position bias. Never use standard CTRs anymore!
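A minimal Python sketch of this counting estimator (the session representation is an assumption of ours):

from collections import Counter

# Counting estimator for a_u and s_u in the gamma = 1 case. Each session is a
# pair (urls, clicked): the ranked list of urls and the set of 0-based clicked
# positions. A url counts as viewed if it was clicked or some url below it was
# clicked; the last click of a session counts as a satisfied click.
def simple_counts(sessions):
    clicks, views, last_clicks = Counter(), Counter(), Counter()
    for urls, clicked in sessions:
        if not clicked:
            continue                  # no click: no view can be inferred
        last = max(clicked)
        for pos in range(last + 1):   # everything down to the last click was viewed
            views[urls[pos]] += 1
        for pos in clicked:
            clicks[urls[pos]] += 1
        last_clicks[urls[last]] += 1
    a_u = {u: clicks[u] / views[u] for u in views}
    s_u = {u: last_clicks[u] / clicks[u] for u in clicks}
    return a_u, s_u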

Editorial judgments vs clicks
There is about a 20% disagreement rate between the preferences based on clicks and those based on editorial judgments. We asked editors to look at these contradicting pairs and assign one of the following 6 reasons:
1. Errors in editorial judgment: 33.3%
2. Imperfect editorial guidelines: 24.6%
3. Popularity is not necessarily aligned with relevance: 15.2%
4. Clicks measure mostly perceived relevance, while editors judge the relevance of the landing page: 19.4%
5. Click model is not accurate: 4.7%
6. Other: 2.8%

Examples: popularity ≠ relevance
In each pair, the first url is preferred by the clicks, the second by the editors.
Adobe: www.adobe.com/products/acrobat/readstep2.html vs www.adobe.com
Hsbc: www.hsbc.co.uk/1/2/personal/internet-banking vs www.hsbc.co.uk
Paris Hilton: www.maximonline.com/girls_of_maxim/ vs en.wikipedia.org/wiki/paris_hilton
Rush Hour 3: www.imdb.com/title/tt0293564 vs www.rushhourmovie.com

Examples: perceived vs actual relevance
Typical causes: spam pages; quality of the url / abstract (.com preferred to .co.uk); domain trust.
Cricket scores: news.bbc.co.uk/sport2/hi/cricket/default.stm vs www.cricinfo.com
Hotmail: www.hotmail.com vs newhotmail.co.uk
Travel insurance: www.directline.com vs www.travelandinsurance.com

Conclusion
Cascade models handle the position bias problem better than position-based models. We extend the cascade model by allowing the user to 1) give up and 2) continue searching after a click, by introducing the notion of satisfaction. A simplified version of this model turns out to be almost as good and only involves simple counting. The estimated relevance is helpful both as a target and as a feature; but if the test metric is editorial, improvements can be limited due to intrinsic differences between clicks and editorial judgments. Current work: use the click duration to model satisfaction.

Differences with the next talk
The next talk, Click Chain Model in Web Search, is strikingly similar to our work: the authors also construct a graphical model that extends the basic cascade model and allows for any number of clicks. Some key differences:
1. The notion of satisfaction
2. Approximate vs exact inference
3. Evaluation