Learning Socially Optimal Information Systems from Egoistic Users


Learning Socially Optimal Information Systems from Egoistic Users

Karthik Raman (karthik@cs.cornell.edu, www.cs.cornell.edu/~karthik)
Thorsten Joachims (tj@cs.cornell.edu, www.cs.cornell.edu/people/tj)
Department of Computer Science, Cornell University, Ithaca, NY
Sept 23, 2013

Example: (Extrinsic) Diversity in Web Search

What did the user mean?
- Silvercorp Metals: 40%
- Support Vector Machines: 35%
- SVM Gift Cards: 15%
- Society of Vascular Medicine: 10%

SOCIALLY BENEFICIAL: a ranking that hedges across these intents.

Socially Optimal Solutions in Information Systems

Problem: common content for users with different tastes; hedge against uncertainty in users' preferences.
Examples: news sites, product & media recommendation, front pages.

- Diverse user population: N user types, where user type i occurs with probability p_i.
- Personal utility of an object (e.g., a ranking) y for user type i: U_i(x, y).
- Social utility is the expected personal utility over all user types:

  U(x, y) = E_i[U_i(x, y)] = Σ_{i=1}^{N} p_i U_i(x, y)

Goal: find the socially optimal solution y* = argmax_y U(x, y).
Challenge: learn from egoistic, weak, noisy user feedback.
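The expectation above rewards solutions that hedge across user types rather than serving only the largest one. A minimal sketch, with made-up user-type probabilities and utility values (the rankings "a" and "b" and all numbers are hypothetical):

```python
# Social utility U(x, y) = E_i[U_i(x, y)] = sum_i p_i * U_i(x, y).
# The socially optimal solution maximizes this expectation over user types.

def social_utility(p, personal_utils):
    """p[i] = probability of user type i; personal_utils[i] = U_i(x, y)."""
    assert abs(sum(p) - 1.0) < 1e-9, "type probabilities must sum to 1"
    return sum(pi * ui for pi, ui in zip(p, personal_utils))

# Hypothetical example: three user types, two candidate rankings.
p = [0.5, 0.3, 0.2]
utils = {
    "a": [1.0, 0.0, 0.0],   # perfect for type 0, useless for the rest
    "b": [0.6, 0.6, 0.6],   # decent for everyone (a hedged ranking)
}
best = max(utils, key=lambda r: social_utility(p, utils[r]))
print(best)  # "b": 0.6 expected utility beats ranking a's 0.5
```

Even though ranking "a" is optimal for the majority type, the hedged ranking "b" has higher expected utility over the whole population.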

Challenge of Egoistic Feedback

Challenge: learn from egoistic, weak, noisy user feedback.
- User i's feedback reflects behavior driven by their personal utility U_i, not the social utility U (as was assumed in prior work [RSJ12] on the intrinsic diversity problem).
- We need to infer the social utility from such conflicting, individual feedback.

  User       | Interest   | Preferred ranking
  1          | Company    | a_1, a_2, a_3, ...
  2          | ML         | b_1, b_2, b_3, ...
  3          | Gift Cards | c_1, c_2, c_3, ...
  Social Opt |            | a_1, b_1, c_1, ...

Even if the social optimum is presented, users may indicate preferences for other rankings.

Related Work

- [CG98, ZCL03, CK06]: address extrinsic diversity; do not use learning.
- [YJ08, SMO10, RJS11]: use learning for diversity; require relevance labels for all user-document pairs.
- [RKJ08]: online learning with an array of (decoupled) multi-armed bandits; learns very slowly and does not generalize across queries.
- [SRG13]: couples the arms together; does not generalize across queries; hard-coded notion of diversity.
- [YG12]: generalizes across queries; requires cardinal utilities.
- [RSJ12]: learns from user preferences; requires all users to directly optimize the social utility U.

Preferential Feedback

What feedback do we obtain from users?
- Implicit feedback (e.g., clicks) is timely and easily available.
- User feedback does not reflect cardinal utilities, as shown in user studies [JGP+07].
- KEY: treat user feedback as a preference between the presented ranking y and a feedback ranking ȳ.

How do we learn from such preferential feedback?

Learning from Preferences: Coactive Learning [SJ12, RJSS13]

Learning model (e.g., a search engine), repeated forever:
- System receives context x_t (e.g., a user's query).
- System makes prediction y_t (e.g., a ranking).
- Regret accumulates in terms of the social utility: Regret ← Regret + U(x_t, y_t*) − U(x_t, y_t).
- System gets preference feedback from the user: U_i(x_t, ȳ_t) ≻_{α,δ} U_i(x_t, y_t).

Feedback is received in terms of the personal utilities U_i, but regret is measured in terms of the social utility U.
How do we model utilities?

Modeling User Utility: Submodularity

Assume personal utilities are submodular.
- Diminishing returns: the marginal benefit of one more document on ML is smaller if 10 ML documents have already been shown than if only 1 has.
- Computing a ranking then becomes a submodular maximization problem: use the simple, efficient greedy algorithm, which has an approximation guarantee of 1/2 under a partition matroid constraint.

How does this lead to diversity?

Diversity via Submodularity: An Example

Word weights: machine 5, learning 7, metal 4, silver 6.
Document word counts (ma = machine, le = learning, me = metal, si = silver):

  d1: ma:3 le:3      d5: me:3 si:5
  d2: ma:5 le:2      d6: me:6 si:2
  d3: ma:2 le:5      d7: ma:1 me:4 si:2
  d4: ma:2 le:3      d8: ma:1 me:3 si:1

Utility of a set of documents: for each word, take the MAX count over the selected documents, weighted by the word's weight. Greedy selection by marginal benefit:

- Step 1: d1 = 3·5 + 3·7 = 36, d2 = 5·5 + 2·7 = 39, d3 = 2·5 + 5·7 = 45, d4 = 2·5 + 3·7 = 31, d5 = 3·4 + 5·6 = 42, d6 = 6·4 + 2·6 = 36, d7 = 1·5 + 4·4 + 2·6 = 33, d8 = 1·5 + 3·4 + 1·6 = 23. Pick d3 (45); column maxima: ma 2, le 5.
- Step 2: d1 = (3−2)·5 = 5, d2 = (5−2)·5 = 15, d4 = 0, d5 = 42, d6 = 36, d7 = 4·4 + 2·6 = 28, d8 = 3·4 + 1·6 = 18. Pick d5 (42); maxima: ma 2, le 5, me 3, si 5.
- Step 3: d1 = 5, d2 = 15, d4 = 0, d6 = (6−3)·4 = 12, d7 = (4−3)·4 = 4, d8 = 0. Pick d2 (15); maxima: ma 5, le 5, me 3, si 5.
- Step 4: d1 = 0, d4 = 0, d6 = (6−3)·4 = 12, d7 = (4−3)·4 = 4, d8 = 0. Pick d6 (12); final maxima: ma 5, le 5, me 6, si 5.

Final ranking: d3, d5, d2, d6. The MAX aggregation drives the greedy algorithm to alternate between the "machine learning" and "silver metal" intents.
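The greedy trace above can be reproduced in a few lines. This sketch recomputes each step's marginal benefits under the weighted column-MAX utility, using the document counts and word weights from the example itself:

```python
# Word counts per document and word weights, as in the worked example.
docs = {
    "d1": {"machine": 3, "learning": 3},
    "d2": {"machine": 5, "learning": 2},
    "d3": {"machine": 2, "learning": 5},
    "d4": {"machine": 2, "learning": 3},
    "d5": {"metal": 3, "silver": 5},
    "d6": {"metal": 6, "silver": 2},
    "d7": {"machine": 1, "metal": 4, "silver": 2},
    "d8": {"machine": 1, "metal": 3, "silver": 1},
}
weights = {"machine": 5, "learning": 7, "metal": 4, "silver": 6}

def utility(selected):
    """Submodular utility: weighted MAX count per word over the selected docs."""
    return sum(w * max((docs[d].get(word, 0) for d in selected), default=0)
               for word, w in weights.items())

# Greedy: repeatedly add the document with the largest marginal benefit.
ranking = []
for _ in range(4):
    best = max((d for d in docs if d not in ranking),
               key=lambda d: utility(ranking + [d]) - utility(ranking))
    ranking.append(best)

print(ranking)  # ['d3', 'd5', 'd2', 'd6']
```

Because the utility only credits the MAX count per word, a second strong ML document adds little once d3 is selected, which is exactly what pushes d5 (the metals intent) to rank 2.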

More General Submodular Utility

Can we use other submodular functions? Yes. Given a ranking/set y = (d_{i_1}, ..., d_{i_n}), aggregate each feature j as

  φ_F^j(x, y) = F(γ_1 φ^j(x, d_{i_1}), γ_2 φ^j(x, d_{i_2}), ..., γ_n φ^j(x, d_{i_n}))

where φ^j(x, d_i) is the j-th feature of d_i, F is a submodular function (a modeling decision), and γ_1 ≥ γ_2 ≥ ... ≥ γ_n ≥ 0 are position-discount factors.

Utility is modeled as linear in the aggregated features: U(x, y) = w^T φ_F(x, y). Submodular aggregation leads to diversity.
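A small sketch of this aggregation with F = max; the per-document feature matrix and discount factors below are made-up illustrations, not values from the talk:

```python
# phi_F^j(x, y) = F(gamma_1 * phi^j(x, d_1), ..., gamma_n * phi^j(x, d_n)),
# followed by the linear utility U(x, y) = w . phi_F(x, y).

def aggregate_features(doc_features, gammas, F=max):
    """doc_features[k][j] = j-th feature of the doc at rank k (k = 0 is top)."""
    n_feats = len(doc_features[0])
    return [F(g * d[j] for g, d in zip(gammas, doc_features))
            for j in range(n_feats)]

def linear_utility(w, doc_features, gammas):
    phi = aggregate_features(doc_features, gammas)
    return sum(wj * pj for wj, pj in zip(w, phi))

# Hypothetical ranking of three docs with two features each.
feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
gammas = [1.0, 0.5, 0.25]   # non-increasing position discounts
print(aggregate_features(feats, gammas))  # [1.0, 1.0]
```

Note how the strong feature-0 value of the rank-3 document (3.0) is discounted to 0.75 and loses to the top document's 1.0: position discounts make early coverage count more.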

Social Perceptron for Ranking (SoPer-R)

1. Initialize weight vector w_1 ← 0.
2. Given context x_t, compute y_t ← argmax_y w_t · φ(x_t, y), using the greedy algorithm.
3. Observe user clicks D.
4. Construct preference feedback ȳ_t ← PrefFeedback(y_t, D) (pairwise feedback).
5. Perceptron update: w_{t+1} ← w_t + φ(x_t, ȳ_t) − φ(x_t, y_t).
6. Clip to ensure submodularity: w^j_{t+1} ← max(w^j_{t+1}, 0).
7. Repeat from step 2.
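Steps 5-6 (the perceptron update followed by the clip that keeps weights non-negative, and hence the learned utility submodular) can be sketched as follows; the weight and feature vectors are hypothetical values for illustration:

```python
def soper_update(w, phi_feedback, phi_presented):
    """One SoPer-R weight update: perceptron step, then clip at zero."""
    w_new = [wj + fb - pr
             for wj, fb, pr in zip(w, phi_feedback, phi_presented)]
    return [max(wj, 0.0) for wj in w_new]  # clipping preserves submodularity

# Hypothetical aggregated feature vectors phi(x_t, y_bar_t) and phi(x_t, y_t).
w = [0.5, 0.25, 0.0]
phi_bar = [1.0, 0.0, 0.75]   # features of the user-preferred ranking
phi_hat = [0.25, 0.5, 0.25]  # features of the presented ranking
print(soper_update(w, phi_bar, phi_hat))  # [1.25, 0.0, 0.5]
```

The second coordinate would go negative (0.25 + 0.0 − 0.5), so the clip resets it to zero instead of letting the utility lose submodularity.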

Regret Bound

Definition. User feedback is expected (α_i, δ_i)-informative if ξ_t ∈ ℝ is chosen such that

  E_{ȳ_t}[U_i(x_t, ȳ_t)] ≥ (1 + δ_i) U_i(x_t, y_t) + α_i (U_i(x_t, y*_{t,i}) − U_i(x_t, y_t)) − ξ_t.

Theorem. For any w* ∈ ℝ^m and ‖φ(x, y)‖₂ ≤ R, the average regret of the SoPer-R algorithm can be upper bounded as

  REG_T ≤ (1/(ηT)) Σ_{t=0}^{T−1} E_i[p_i ξ_t] + (R‖w*‖²)/(2η) + (15 R‖w*‖)/(η √(2T)),

with δ_i ≤ Γ_F (1 − p_i)/p_i and η = min_i p_i α_i.

Regret Bound: Analysis

- The bound does not depend on the number of feature dimensions, only on the feature-ball radius R.
- It decays gracefully with noisy feedback (through the α_i's and η).
- The algorithm need not converge to the optimum, partly due to the NP-hardness of submodular maximization.

Social Perceptron for Sets

The SoPer-S algorithm predicts diverse sets; see the paper for more details.

Experimental Setup

Used the standard TREC 6-8 Interactive search-diversification dataset: each query has 7-56 user types; setup as in previous work [BJYB11, RJS11].
Simulated user behavior: users scan rankings top to bottom and click on the first document relevant to them (with a small chance of error).
Utility function: normalized DCG-coverage, i.e. F(x_1, ..., x_n) = max_i γ_i x_i, up to rank 5.
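The simulated click behavior described above (scan top to bottom, click the first personally relevant document, with a small error chance) can be sketched as follows; the relevance sets and error rate here are illustrative, not the experiment's actual values:

```python
import random

def simulated_click(ranking, relevant, error_rate=0.1, rng=random):
    """Scan top to bottom; click the first relevant doc, except with a small
    probability the user erroneously skips it and keeps scanning."""
    for doc in ranking:
        if doc in relevant and rng.random() >= error_rate:
            return doc
    return None  # nothing relevant was clicked

# Illustrative user type that only cares about d5 and d6.
rng = random.Random(0)
clicks = [simulated_click(["d3", "d5", "d2", "d6"], {"d5", "d6"}, 0.1, rng)
          for _ in range(1000)]
print(clicks.count("d5") / 1000)  # roughly 0.9: d5 is clicked unless skipped
```

A click on d5 then yields the pairwise preference "d5 over the documents ranked above it", which is exactly the kind of egoistic, weak signal the algorithm learns from.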

Learning to Diversify: Single Query

[Plot: normalized list utility (0.56-0.88) vs. number of iterations per query (0-1000) for Random, Ranked Bandit, and SoPer-R.]

Improved learning for single-query diversification.

Learning to Diversify: Cross-Query

[Plot: normalized list utility (0.52-0.68) vs. number of iterations per query (0-1000) for Random, StructPerc, and SoPer-R.]

StructPerc is a (rough) skyline: it uses the optimum for training. SoPer-R is the first method to learn cross-query diversity from implicit feedback.

Robustness

  User's fn | SoPer-R (MAX) | SoPer-R (SQRT) | Random
  MAX       | .630 ± .007   | .620 ± .006    | .557 ± .006
  SQRT      | .656 ± .007   | .654 ± .007    | .610 ± .007

Robust to a mismatch between the submodular function in the user's utility and the one assumed by the algorithm.

  Random      | SoPer-R, no noise | SoPer-R, noisy feedback
  .557 ± .006 | .630 ± .007       | .631 ± .007

Works even if user feedback is noisy.

Conclusions

- Proposed online-learning algorithms for aggregating the conflicting preferences of a diverse user population, utilizing the coactive learning model.
- Modeled user utilities as submodular.
- Provided regret bounds for the algorithms.
- Works well empirically and is robust.

THANKS! QUESTIONS? Poster #15

References

[BJYB11] C. Brandt, T. Joachims, Y. Yue, and J. Bank. Dynamic ranked retrieval. In WSDM, 2011.
[CG98] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335-336, 1998.
[CK06] H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, pages 429-436, 2006.
[JGP+07] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. TOIS, 25(2), April 2007.
[RJS11] K. Raman, T. Joachims, and P. Shivaswamy. Structured learning of two-level dynamic rankings. In CIKM, pages 291-296, 2011.
[RJSS13] K. Raman, T. Joachims, P. Shivaswamy, and T. Schnabel. Stable coactive learning via perturbation. In ICML, pages 837-845, 2013.
[RKJ08] F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In ICML, pages 784-791, 2008.
[RSJ12] K. Raman, P. Shivaswamy, and T. Joachims. Online learning to diversify from implicit feedback. In KDD, pages 705-713, 2012.
[SJ12] P. Shivaswamy and T. Joachims. Online structured prediction via coactive learning. In ICML, 2012.
[SMO10] R. L. T. Santos, C. Macdonald, and I. Ounis. Selectively diversifying web search results. In CIKM, pages 1179-1188, 2010.
[SRG13] A. Slivkins, F. Radlinski, and S. Gollapudi. Ranked bandits in metric spaces: learning optimally diverse rankings over large document collections. JMLR, 14:399-436, 2013.
[YG12] Y. Yue and C. Guestrin. Linear submodular bandits and their application to diversified retrieval. In NIPS, pages 2483-2491, 2012.
[YJ08] Y. Yue and T. Joachims. Predicting diverse subsets using structural SVMs. In ICML, pages 1224-1231, 2008.
[ZCL03] C. Zhai, W. W. Cohen, and J. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR, pages 10-17, 2003.

Experimental Setup Details

TREC 6-8 Interactive diversification dataset: contains 17 queries; each has 7-56 user types; binary relevance labels. Similar results were observed for the WEB diversification dataset.
Setup details:
- Re-ranking of documents relevant to at least 1 user type.
- Probability of a user type is proportional to the number of documents relevant to that user.
- DCG position discounting: γ_i = 1 / log₂(1 + i).

Regret Bound

Definition. User feedback is expected (α_i, δ_i)-informative for a user with personal utility function U_i if, for some given α_i ∈ [0, 1] and δ_i > 0, ξ_t ∈ ℝ is chosen such that

  E_{ȳ_t}[U_i(x_t, ȳ_t)] ≥ (1 + δ_i) U_i(x_t, y_t) + α_i (U_i(x_t, y*_{t,i}) − U_i(x_t, y_t)) − ξ_t.

Theorem. For any w* ∈ ℝ^m and ‖φ(x, y)‖₂ ≤ R, the average regret of the SoPer-R algorithm can be upper bounded as

  REG_T ≤ (1/(ηT)) Σ_{t=0}^{T−1} E_i[p_i ξ_t] + (βR‖w*‖²)/η + ((4/β + 2) R‖w*‖)/(η √T),

with δ_i ≤ Γ_F (1 − p_i)/p_i, η = min_i p_i α_i, and β = 1 − β_gr = 1/2.