Lecture 3: Improving Ranking with Behavior Data


Modeling User Behavior and Interactions. Lecture 3: Improving Ranking with Behavior Data. Eugene Agichtein, Emory University, RuSSIR 2009 (Petrozavodsk, Russia) 1

Lecture 3 Plan: Automatic relevance labels; Enriching feature space; Dealing with data sparseness; Dealing with scale; Active learning; Ranking for diversity; Fun and games 2

Review: Learning to Rank 3

Ordinal Regression Approaches 4

Learning to Rank Summary 5

Approaches to Use Behavior Data 6

Recap: Available Behavior Data 7

Training Examples from Click Data [ Joachims 2002 ] 8
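
The slide's examples did not survive transcription. As a reference point, here is a minimal sketch of the "clicked > skipped-above" rule used in this line of work to turn clickthrough logs into pairwise training examples; the function name and input format are illustrative assumptions.

```python
def preference_pairs(ranked_docs, clicked):
    """Pairwise training examples from a clickthrough record: a clicked document
    is preferred over every higher-ranked document that was skipped (seen but
    not clicked)."""
    pairs = []
    for i, doc in enumerate(ranked_docs):
        if doc in clicked:
            for skipped in ranked_docs[:i]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))   # doc preferred over skipped
    return pairs

# Example: result 3 was clicked while results 1 and 2 were skipped.
print(preference_pairs(["d1", "d2", "d3", "d4"], {"d3"}))
# [('d3', 'd1'), ('d3', 'd2')]
```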

Loss Function [ Joachims 2002 ] 9
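
The formula on this slide was not captured in the transcript. For reference, the Ranking SVM formulation underlying this work is conventionally written as follows; the notation is assumed rather than copied from the slide.

```latex
\min_{\vec{w},\ \xi_{i,j,k} \ge 0} \quad \frac{1}{2}\,\vec{w}\cdot\vec{w} \;+\; C \sum_{i,j,k} \xi_{i,j,k}
\quad \text{s.t.} \quad
\vec{w}\cdot\Phi(q_k, d_i) \;\ge\; \vec{w}\cdot\Phi(q_k, d_j) + 1 - \xi_{i,j,k}
\qquad \text{for every training preference } d_i \succ_{q_k} d_j
```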

Learned Retrieval Function [ Joachims 2002 ] 10

Features [ Joachims 2002 ] 11

Results [Joachims 2002] Summary: the learned ranking function outperforms all baseline methods in the experiment; learning from clickthrough data is possible; relative preferences are useful training data. 12

Extension: Query Chains [Radlinski & Joachims, KDD 2005] 13

Query Chains (Cont d) [Radlinski & Joachims, KDD 2005] 14

Query Chains (Results) [Radlinski & Joachims, KDD 2005] 15

Lecture 3 Plan 16

Incorporating Behavior for Static Rank [Richardson et al., WWW2006] Static rank informs the pipeline: Web Crawl (which pages to crawl), Build Index (efficient index order), Answer Queries (informs dynamic ranking). 17

fRank: Machine Learning Model [Richardson et al., WWW2006] 18

Features: Summary [Richardson et al., WWW2006] 19

Features: Popularity [Richardson et al., WWW2006]
Function: Example
Exact URL: cnn.com/2005/tech/wikipedia.html?v=mobile
No Params: cnn.com/2005/tech/wikipedia.html
Page: wikipedia.html
URL-1: cnn.com/2005/tech
URL-2: cnn.com/2005
Domain: cnn.com
Domain+1: cnn.com/2005
20
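
A small sketch of how the URL generalization levels in the table above can be produced; the splitting rules and key names are illustrative assumptions, not the paper's code.

```python
from urllib.parse import urlsplit

def popularity_variants(url):
    """URL generalization levels for popularity features, mirroring the table above."""
    parts = urlsplit(url if "//" in url else "//" + url)
    domain = parts.netloc
    segments = [s for s in parts.path.split("/") if s]
    variants = {
        "Exact URL": url,
        "No Params": domain + parts.path,
        "Page": segments[-1] if segments else domain,
        "Domain": domain,
        "Domain+1": (domain + "/" + segments[0]) if segments else domain,
    }
    # URL-1, URL-2, ...: strip one trailing path segment at a time
    for k in range(1, len(segments)):
        variants["URL-%d" % k] = domain + "/" + "/".join(segments[:-k])
    return variants

for name, value in popularity_variants(
        "cnn.com/2005/tech/wikipedia.html?v=mobile").items():
    print(name, "->", value)
```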

Features: Anchor, Page, Domain [Richardson et al., WWW2006] 21

Data [Richardson et al., WWW2006] 22

Becoming Query Independent [Richardson et al., WWW2006] 23

Measure [Richardson et al., WWW2006]: pairwise accuracy = fraction of page pairs ordered by the human judgments (H) on which the static rank score (S) agrees. 24
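
A minimal sketch of computing the pairwise-accuracy measure described above; the input format (dictionaries mapping pages to human judgments and static-rank scores) is an assumption for illustration.

```python
from itertools import combinations

def pairwise_accuracy(human_score, static_score):
    """Fraction of page pairs ordered by human judges (H) on which the
    static-rank score (S) agrees."""
    agree = total = 0
    for a, b in combinations(list(human_score), 2):
        if human_score[a] == human_score[b]:
            continue                      # only pairs with a human preference count
        total += 1
        human_prefers_a = human_score[a] > human_score[b]
        model_prefers_a = static_score[a] > static_score[b]
        agree += int(human_prefers_a == model_prefers_a)
    return agree / total if total else 0.0

print(pairwise_accuracy({"p1": 3, "p2": 1, "p3": 2},
                        {"p1": 0.9, "p2": 0.5, "p3": 0.2}))  # 2 of 3 pairs agree
```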

RankNet, Burges et al., ICML 2005 [Richardson et al., WWW2006] Feature Vector Label NN output Error is function of label and output 25

RankNet [Burges et al. 2005] [Richardson et al., WWW2006] Feature Vector1 Label1 NN output 1 26

RankNet [Burges et al. 2005] [Richardson et al., WWW2006] Feature Vector2 Label2 NN output 1 NN output 2 27

RankNet [Burges et al. 2005] [Richardson et al., WWW2006] NN output 1 NN output 2 Error is function of both outputs (Desire output1 > output2) 28
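
The RankNet slides above describe pairwise training, where the error is a function of two network outputs and should push output1 above output2. A minimal plain-Python sketch of the pairwise cross-entropy loss used in this setup:

```python
import math

def ranknet_pair_loss(output1, output2, p_target=1.0):
    """RankNet-style pairwise loss: cross entropy between the target probability
    that item 1 should rank above item 2 and the modeled probability
    P = sigmoid(output1 - output2). Minimizing it pushes output1 above output2."""
    p_model = 1.0 / (1.0 + math.exp(-(output1 - output2)))
    eps = 1e-12
    return -(p_target * math.log(p_model + eps)
             + (1.0 - p_target) * math.log(1.0 - p_model + eps))

print(ranknet_pair_loss(0.2, 1.5))  # large loss: the pair is ordered the wrong way
print(ranknet_pair_loss(1.5, 0.2))  # small loss: the pair is ordered correctly
```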

RankNet [Burges et al. 2005] [Richardson et al., WWW2006] Feature Vector1 NN output 29

Experimental Methodology [Richardson et al., WWW2006] 30

Accuracy of Each Feature Set [Richardson et al., WWW2006]
Feature Set (number of features): Accuracy (%)
PageRank (1): 56.70
Popularity (14): 60.82
Anchor (5): 59.09
Page (8): 63.93
Domain (7): 59.03
All Features: 67.43
31

Qualitative Evaluation [Richardson et al., WWW2006]: Technology Oriented vs. Consumer Oriented results 32

Behavior for Dynamic Ranking [Agichtein et al., SIGIR2006]
Sample Behavior Features (from Lecture 2):
Presentation: ResultPosition (position of the URL in current ranking); QueryTitleOverlap (fraction of query terms in result title)
Clickthrough: DeliberationTime (seconds between query and first click); ClickFrequency (fraction of all clicks landing on page); ClickDeviation (deviation from expected click frequency)
Browsing: DwellTime (result page dwell time); DwellTimeDeviation (deviation from expected dwell time for query)
33

Feature Merging: Details [Agichtein et al., SIGIR2006]
Query: SIGIR (fake results with fake feature values)
Result URL: BM25, PageRank, Clicks, DwellTime
sigir2007.org: 2.4, 0.5, ?, ?
sigir2006.org: 1.4, 1.1, 150, 145.2
acm.org/sigs/sigir/: 1.2, 2, 60, 23.5
34
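
The "?" entries above are results with no recorded user interactions. One simple way to merge ranking and behavior features in that situation is to impute zeros and add a missingness indicator; this is a hedged sketch of that idea, not necessarily the paper's exact scheme.

```python
def merge_features(ranking_features, behavior_features, behavior_names):
    """Concatenate ranking features with behavior features, adding a
    missing-data indicator for results with no recorded interactions."""
    merged = dict(ranking_features)
    for name in behavior_names:
        value = behavior_features.get(name)
        merged[name] = 0.0 if value is None else value
        merged[name + "_missing"] = 1.0 if value is None else 0.0
    return merged

row = merge_features({"BM25": 2.4, "PageRank": 0.5},
                     {},                                  # no clicks observed yet
                     ["Clicks", "DwellTime"])
print(row)  # {'BM25': 2.4, 'PageRank': 0.5, 'Clicks': 0.0, 'Clicks_missing': 1.0, ...}
```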

Results for Incorporating Behavior into Ranking [Agichtein et al., SIGIR2006]
[Figure: NDCG at K = 1..10 for RN, Rerank-All, and RN+All]
MAP results: RN 0.270; RN+ALL 0.321, gain 0.052 (19.13%); BM25 0.236; BM25+ALL 0.292, gain 0.056 (23.71%)
35

Which Queries Benefit Most [Agichtein et al., SIGIR2006] [Figure: query frequency and average NDCG gain, bucketed by original ranking quality] Most gains are for queries with poor original ranking. 36

Lecture 3 Plan: Automatic relevance labels; Enriching feature space; Dealing with data sparseness; Dealing with scale; Active learning; Ranking for diversity; Fun and games 37

Extension to Unseen Queries/Documents: Search Trails [Bilenko and White, WWW 2008] Trails start with a search engine query and continue until a terminating event: another search; a visit to an unrelated site (social networks, webmail); a timeout, the browser homepage, or the browser closing. 38

Probabilistic Model [Bilenko and White, WWW 2008] IR via language modeling [Zhai and Lafferty; Lavrenko]. The query-term distribution gives more mass to rare terms, combined with term-website weights. 39
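
The slide's estimator formulas were not captured in the transcript. Below is a rough, illustrative sketch of a trail-based scoring model in this general spirit (query-term weights that emphasize rare terms, combined with term-website counts); it is not the paper's exact model, and all names and the weighting scheme are assumptions.

```python
import math
from collections import defaultdict

def train_term_site_weights(trails):
    """Count how often each website is visited on trails whose query contains a
    given term, and weight rare query terms more heavily (IDF-like).
    'trails' is a list of (query, visited_sites) pairs."""
    term_site = defaultdict(lambda: defaultdict(float))
    doc_freq = defaultdict(int)
    for query, sites in trails:
        for t in set(query.lower().split()):
            doc_freq[t] += 1
            for s in sites:
                term_site[t][s] += 1.0
    n = len(trails)
    idf = {t: math.log((n + 1.0) / (df + 1.0)) for t, df in doc_freq.items()}
    return term_site, idf

def trail_score(site, query, term_site, idf):
    """Score a website for a query by summing IDF-weighted co-visit counts."""
    return sum(idf.get(t, 0.0) * term_site.get(t, {}).get(site, 0.0)
               for t in set(query.lower().split()))
```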

Results: Learning to Rank [Bilenko and White, WWW 2008] Add the model's relevance score as a feature to RankNet. [Figure: NDCG@1, NDCG@3, NDCG@10 for Baseline, Baseline+Heuristic, Baseline+Probabilistic, Baseline+Probabilistic+RW] 40

Scalability: (Peta?)bytes of Click Data BBM: Bayesian Browsing Model from Petabyte-scale Data, Liu et al., KDD 2009 [Figure: for a query with results URL 1..4, latent variables S_1..S_4 (relevance), E_1..E_4 (examine snippet), and observed C_1..C_4 (clickthroughs)] 41
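
The graphical model above encodes the examination hypothesis: a click C_i requires the snippet to be examined (E_i) and the result to be judged relevant (S_i). A toy forward simulation of that assumption follows; BBM itself runs in the other direction, inferring relevance posteriors from observed clicks.

```python
import random

def simulate_clicks(relevance, examine_prob):
    """Examination hypothesis: P(C_i = 1) = P(E_i = 1) * r_i, i.e. a result is
    clicked only if its snippet is examined AND it is perceived relevant.
    'relevance' and 'examine_prob' are per-position lists."""
    clicks = []
    for r, e in zip(relevance, examine_prob):
        examined = e > random.random()
        clicked = examined and (r > random.random())
        clicks.append(int(clicked))
    return clicks

print(simulate_clicks([0.9, 0.4, 0.7, 0.1], [1.0, 0.8, 0.5, 0.3]))
```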

Training BBM: One-Pass Counting BBM: Bayesian Browsing Model from Petabyte-scale Data, Liu et al., KDD 2009 Find R_j 42

Training BBM on MapReduce BBM: Bayesian Browsing Model from Petabyte-scale Data, Liu et al, KDD 2009 Map: emit((q,u), idx) Reduce: construct the count vector 43
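
A minimal sketch of the map and reduce steps named on the slide; the record format and the meaning of idx are assumptions for illustration, not taken from the paper.

```python
from collections import Counter, defaultdict

def map_phase(click_log):
    """Map step from the slide: emit ((q, u), idx) for every clicked impression.
    'click_log' yields (query, url, idx) records, where idx encodes the click
    context that BBM counts over."""
    for query, url, idx in click_log:
        yield (query, url), idx

def reduce_phase(mapped_pairs):
    """Reduce step: for each (query, url) key, construct the vector of counts
    over idx values needed for BBM's one-pass posterior computation."""
    counts = defaultdict(Counter)
    for key, idx in mapped_pairs:
        counts[key][idx] += 1
    return counts

log = [("sigir", "acm.org", 0), ("sigir", "acm.org", 0), ("sigir", "acm.org", 2)]
print(reduce_phase(map_phase(log))[("sigir", "acm.org")])  # Counter({0: 2, 2: 1})
```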

Model Comparison on Efficiency BBM: Bayesian Browsing Model from Petabyte-scale Data, Liu et al, KDD 2009 57 times faster 44

Large-Scale Experiment BBM: Bayesian Browsing Model from Petabyte-scale Data, Liu et al., KDD 2009 Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data. Experiment platform: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB 08] 45

Scalability of BBM BBM: Bayesian Browsing Model from Petabyte-scale Data, Liu et al., KDD 2009 Increasing computation load (more queries, more URLs, more impressions) with near-constant elapsed time of about 3 hours; scans 265 terabytes of data; computes full posteriors for 1.15 billion (query, URL) pairs. [Figure: computation load vs. elapsed time on SCOPE] 46

Lecture 3 Plan Automatic relevance labels Enriching feature space Dealing with data sparseness Dealing with Scale Active learning Ranking for diversity 47

New Direction: Active Learning [Radlinski & Joachims, KDD 2007] Goal: Learn the relevances with as little training data as possible. Search involves a three-step process: 1. Given relevance estimates, present a ranking to users. 2. Given a ranking, users provide feedback: user clicks provide pairwise relevance judgments. 3. Given feedback, update the relevance estimates. 48

Overview of Approach [Radlinski & Joachims, KDD 2007] Available information: 1. Have an estimate of the relevance of each result. 2. Can obtain pairwise comparisons of the top few results. 3. Do not have absolute relevance information. Goal: Learn the document relevance quickly. Will address four questions: 1. How to represent knowledge about document relevance. 2. How to maintain this knowledge as we collect data. 3. Given our knowledge, what is the best ranking? 4. What rankings do we show users to get useful data? 49

1: Representing Document Relevance [Radlinski & Joachims, KDD 2007] Given a query q, let M = (µ1, ..., µC) be the true relevance values of the documents. Model knowledge of M with a Bayesian posterior: P(M|D) = P(D|M) P(M) / P(D). Assume P(M|D) is a spherical multivariate normal: P(M|D) = N(ν1, ..., νC; σ1², ..., σC²). 50

1: Representing Document Relevance [Radlinski & Joachims, KDD 2007] Given a fixed query, maintain knowledge about relevance as clicks are observed. This tells us which documents we are sure about, and which ones need more data. 51

2: Maintaining P(M|D) [Radlinski & Joachims, KDD 2007] Model noisy pairwise judgments with a Bradley-Terry model [Bradley & Terry, 1952]. Adding a Gaussian prior, apply an off-the-shelf algorithm, commonly used for chess ratings, to maintain the posterior [Glickman 1999]. 52
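
A simplified illustration of maintaining per-document Gaussian beliefs from pairwise outcomes, in the spirit of the Bradley-Terry-with-Gaussian-prior update described above; this is a stand-in sketch, not the Glicko algorithm itself, and all names are assumptions.

```python
import math

class DocBelief:
    """Gaussian belief over a document's latent relevance (mean, variance)."""
    def __init__(self, mean=0.0, var=1.0):
        self.mean = mean
        self.var = var

def win_prob(a, b, scale=1.0):
    """Bradley-Terry / logistic probability that document a beats document b."""
    return 1.0 / (1.0 + math.exp(-(a.mean - b.mean) / scale))

def update(winner, loser, lr=0.1):
    """Simplified gradient-style update after observing 'winner preferred over loser'."""
    surprise = 1.0 - win_prob(winner, loser)   # large when the outcome was unexpected
    winner.mean += lr * winner.var * surprise
    loser.mean -= lr * loser.var * surprise
    winner.var *= 0.95                         # observing data shrinks uncertainty
    loser.var *= 0.95

a, b = DocBelief(), DocBelief()
update(a, b)                                   # observed: a preferred over b
print(round(a.mean, 3), round(b.mean, 3))      # a drifts up, b drifts down
```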

3: Ranking (Inference) [Radlinski & Joachims, KDD 2007] Want to assign relevances m = (µ1, ..., µC) such that the loss L(m, M) is small, but the true M is unknown. Minimize the expected loss (pairwise). 53

4: Getting Useful Data [Radlinski & Joachims, KDD 2007] One option: present the ranking based on the best estimate of relevance. Then the data we get would always be about the documents already ranked highly. Instead, the ranking shown to users: 1. Pick the top two documents to minimize the expected loss. 2. Append the current best-estimate ranking. 54

4: Exploration Strategies [Radlinski & Joachims, KDD 2007] 55

4: Loss Functions [Radlinski & Joachims, KDD 2007] 56

Results: TREC Data [Radlinski & Joachims, KDD 2007] Optimizing for … is better than optimizing for … 57

Need for Diversity (in IR) [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Ambiguous queries: users with different information needs issue the same textual query ("Jaguar"). Informational (exploratory) queries: the user is interested in a specific detail or in the entire breadth of knowledge available [Swaminathan et al., 2008]; want results with high information diversity. 58

Optimizing for Diversity [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Long-standing interest in the IR community; requires modeling inter-document dependencies, which is impossible given current learning-to-rank methods. Problem: no consensus on how to measure diversity. Formulate as predicting diverse subsets. Experiment: use training data with explicitly labeled subtopics (TREC 6-8 Interactive Track); use a loss function to encode subtopic loss; train using structural SVMs [Tsochantaridis et al., 2005]. 59

Representing Diversity [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Existing datasets with manual subtopic labels. E.g., "Use of robots in the world today": nanorobots, space mission robots, underwater robots. A manual partitioning of the total information regarding a query; relatively reliable. 60

Example [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Choose K documents with maximal information coverage. For K = 3, optimal set is {D1, D2, D10} 61

Maximizing Subtopic Coverage [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Select K documents which collectively cover as many subtopics as possible. Perfect selection takes (n choose K) time; this is a set-cover-style problem. Greedy selection gives a (1-1/e)-approximation bound; special case of Max Coverage (Khuller et al., 1997). 62

Weighted Word Coverage More distinct words = more information Weight word importance Does not depend on human labels select K documents which collectively cover as many distinct (weighted) words as possible [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Greedy selection also yields (1-1/e) bound. Need to find good weighting function (learning problem). 63

Example [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008]
Word benefits: V1 = 1, V2 = 2, V3 = 3, V4 = 4, V5 = 5
Document word coverage: D1 covers {V3, V4, V5}; D2 covers {V2, V4, V5}; D3 covers {V1, V2, V3, V4}
Marginal benefit, iteration 1: D1 = 12, D2 = 11, D3 = 10; best: D1
64

Example (continued) [Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008]
Word benefits: V1 = 1, V2 = 2, V3 = 3, V4 = 4, V5 = 5
Document word coverage: D1 covers {V3, V4, V5}; D2 covers {V2, V4, V5}; D3 covers {V1, V2, V3, V4}
Marginal benefit, iteration 1: D1 = 12, D2 = 11, D3 = 10; best: D1
Marginal benefit, iteration 2: D2 = 2, D3 = 3; best: D3
65
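
The greedy selection traced in the two tables above can be written in a few lines; this sketch reproduces the toy numbers (picking D1 first, then D3). In the actual method the word weights are learned rather than fixed by hand.

```python
# Greedy selection of K documents maximizing weighted word coverage.
word_benefit = {"V1": 1, "V2": 2, "V3": 3, "V4": 4, "V5": 5}
doc_words = {
    "D1": {"V3", "V4", "V5"},
    "D2": {"V2", "V4", "V5"},
    "D3": {"V1", "V2", "V3", "V4"},
}

def greedy_select(doc_words, word_benefit, k):
    selected, covered = [], set()
    for _ in range(k):
        # marginal benefit = total weight of still-uncovered words in the document
        def marginal(doc):
            return sum(word_benefit[w] for w in doc_words[doc] - covered)
        best = max((d for d in doc_words if d not in selected), key=marginal)
        selected.append(best)
        covered |= doc_words[best]
    return selected

print(greedy_select(doc_words, word_benefit, k=2))  # ['D1', 'D3']
```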

[Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Experimental setup: 12/4/1 train/valid/test split; approx. 500 documents in the training set; permuted until all 17 queries were tested once; K = 5 (some queries have very few documents). SVM-div uses term-frequency thresholds to define importance levels; SVM-div2 in addition uses TF-IDF thresholds. 66

[Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008]
Method: Loss
Random: 0.469
Okapi: 0.472
Unweighted Model: 0.471
Essential Pages: 0.434
SVM-div: 0.349
SVM-div2: 0.382
Comparison (W / T / L): SVM-div vs Essential Pages 14 / 0 / 3 **; SVM-div2 vs Essential Pages 13 / 0 / 4; SVM-div vs SVM-div2 9 / 6 / 2
67

[Predicting Diverse Subsets Using Structural SVMs, Y. Yue and Joachims, ICML 2008] Can expect further benefit from having more training data. 68

Predicting Diverse Subsets Using Structural SVMs [Y. Yue and Joachims, ICML 2008]: formulated diversified retrieval as predicting diverse subsets; efficient training and prediction algorithms; used weighted word coverage as a proxy for information coverage; encoded diversity criteria using the loss function (weighted subtopic loss). http://projects.yisongyue.com/svmdiv/ 69

Lecture 3 Summary Automatic relevance labels Enriching feature space Dealing with data sparseness Dealing with Scale Active learning Ranking for diversity 70

Key References and Further Reading
Joachims, T. Optimizing Search Engines using Clickthrough Data. KDD 2002.
Agichtein, E., Brill, E., and Dumais, S. Improving Web Search Ranking by Incorporating User Behavior Information. SIGIR 2006.
Radlinski, F. and Joachims, T. Query Chains: Learning to Rank from Implicit Feedback. KDD 2005.
Radlinski, F. and Joachims, T. Active Exploration for Learning Rankings from Clickthrough Data. KDD 2007.
Bilenko, M. and White, R. Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites from User Activity Data. WWW 2008.
Yue, Y. and Joachims, T. Predicting Diverse Subsets Using Structural SVMs. ICML 2008.
Liu, C., Guo, F., and Faloutsos, C. BBM: Bayesian Browsing Model from Petabyte-scale Data. KDD 2009.
71