Modeling Rich Interactions in Session Search: Georgetown University at TREC 2014 Session Track


Modeling Rich Interactions in Session Search: Georgetown University at TREC 2014 Session Track. Jiyun Luo, Xuchu Dong and Grace Hui Yang, Department of Computer Science, Georgetown University

Introduction
Session search is document retrieval for an entire search session. The TREC Session Track provides log data which records: a sequence of query changes q1, q2, ..., qn-1, qn; the ranked list for each past query; and document click information and dwell time. TREC 2014 Session Track tasks: RL1 uses only the last query of a session; RL2 may use any information in the current session; RL3 may use information from other sessions. We use ClueWeb12 Category A as our corpus.

Outline
Introduction
Methods and Approaches: Ad-hoc Retrieval Model (Ad-hoc), Query Change Retrieval Model (QCM), Weighted QCM, User-Click Model, Clustering, Session Performance Prediction and Replacement
Submissions
Evaluation Results
Conclusion

Ad-hoc Retrieval Model (Ad-hoc)
Multinomial language modeling with Dirichlet smoothing. The term weight P(t|d) is computed as:

P(t|d) = (tf(t,d) + μ P(t|C)) / (|d| + μ)

where tf(t,d) is the frequency of term t in document d, |d| is the document length, P(t|C) is the term's probability in the corpus, and μ is the Dirichlet smoothing parameter, set to 5000.
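A minimal sketch of this scorer, assuming simple in-memory term-count dictionaries (the tokenization and corpus statistics here are illustrative, not the actual ClueWeb12 pipeline):

```python
import math

MU = 5000  # Dirichlet smoothing parameter, as set in the slides

def dirichlet_log_likelihood(query_terms, doc_tf, doc_len, corpus_tf, corpus_len):
    """log P(q|d) under a multinomial language model with Dirichlet smoothing."""
    score = 0.0
    for t in query_terms:
        p_corpus = corpus_tf.get(t, 0) / corpus_len                   # P(t|C)
        p_doc = (doc_tf.get(t, 0) + MU * p_corpus) / (doc_len + MU)   # P(t|d)
        if p_doc > 0:                 # skip terms unseen in the whole corpus
            score += math.log(p_doc)
    return score
```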

Query Change Retrieval Model (QCM)
Idea: query change is an important form of user feedback (Dongyi Guan, Sicong Zhang, and Hui Yang. Utilizing Query Change for Session Search. SIGIR '13).
We define the query change Δq_i as the syntactic editing change between two adjacent queries q_i-1 and q_i: +Δq_i denotes the added terms, -Δq_i denotes the removed terms, and q_theme denotes the theme terms shared across the session's queries.

Table 1: An example of query change (session 52)

Query                         Query change                            q_theme
Q1 = hydropower efficiency                                            hydropower
Q2 = hydropower environment   +Δq2 = environment; -Δq2 = efficiency   hydropower
Q3 = hydropower damage        +Δq3 = damage; -Δq3 = environment       hydropower
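A small sketch of this decomposition, assuming whitespace tokenization (the paper's actual term processing may differ):

```python
def query_change(prev_query, cur_query):
    """Split two adjacent queries into theme, added, and removed terms."""
    prev_terms, cur_terms = set(prev_query.split()), set(cur_query.split())
    theme = prev_terms & cur_terms      # q_theme: terms kept across queries
    added = cur_terms - prev_terms      # +Δq_i
    removed = prev_terms - cur_terms    # -Δq_i
    return theme, added, removed

# Session 52 from Table 1:
print(query_change("hydropower environment", "hydropower damage"))
# ({'hydropower'}, {'damage'}, {'environment'})
```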

Query Change Retrieval Model (QCM)
The relevance score between one query q_i and a document d is calculated by:

Score(q_i, d) = log P(q_i|d) + α w_Theme - β w_Add,In + ε w_Add,Out - δ w_Remove

log P(q_i|d) is the current reward (relevance score). The remaining terms increase weights for theme terms (α w_Theme), decrease weights for old added terms (β w_Add,In), increase weights for novel added terms (ε w_Add,Out), and decrease weights for removed terms (δ w_Remove).

Query Change Retrieval Model (QCM)
The relevance score between one query q_i and a document d is calculated by:

Score(q_i, d) = log P(q_i|d) + α w_Theme - β w_Add,In + ε w_Add,Out - δ w_Remove

The QCM model combines all queries in a session with a discount factor γ:

Score_qcm(q_1..n, d) = Σ_{i=1..n} γ^(n-i) Score(q_i, d)
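A compact sketch of this session-level combination, reusing per-query scores Score(q_i, d); the value of γ below is illustrative, since the tuned value is not stated on this slide:

```python
GAMMA = 0.9  # illustrative discount factor; the actual tuned value is not given here

def qcm_session_score(per_query_scores):
    """Combine Score(q_i, d) for i = 1..n with weight γ^(n-i); later
    queries (closer to the current query) contribute more."""
    n = len(per_query_scores)
    return sum(GAMMA ** (n - i) * s
               for i, s in enumerate(per_query_scores, start=1))
```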

Weighted QCM
Weighted QCM combines queries based on query quality, which is indicated by user clicks. A Strong SAT-Click is a clicked document with dwell time >= 30 seconds; a Weak SAT-Click is a clicked document with dwell time >= 10 seconds and < 30 seconds.

Score_wqcm(q_1..n, d) = Σ_{q_i ∈ Good} Score(q_i, d) + ω Σ_{q_i ∈ Bad} Score(q_i, d)

The good query set: queries bringing at least one SAT-Click, plus the current query. The bad query set: queries bringing no SAT-Click.
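A sketch of the weighted combination, assuming per-query click logs with dwell times in seconds (the data structures are illustrative):

```python
OMEGA = 0.8  # down-weight for the bad query set; the runs used ω = 0.65 or ω = 0.8

def has_sat_click(dwell_times):
    """A query brought a SAT-Click if any click dwelled >= 10 s (weak or strong)."""
    return any(t >= 10 for t in dwell_times)

def wqcm_score(per_query_scores, dwell_times_per_query, current_index):
    """Good queries (at least one SAT-Click, or the current query) count fully;
    bad queries are discounted by ω."""
    total = 0.0
    for i, score in enumerate(per_query_scores):
        good = (i == current_index) or has_sat_click(dwell_times_per_query[i])
        total += score if good else OMEGA * score
    return total
```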

User-Click Model
We boost a document's ranking score if it is SAT-Clicked by users. The session-level User-Click Model for RL2 adds a boost from the session-level User-Click model to the score from the QCM model: a document earns Ψ points for a Strong SAT-Click and θ points for a Weak SAT-Click; the points are summed over the whole session and normalized to (0, 1).
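A minimal sketch of this boost; the slide states only that points are summed and normalized to (0, 1), so the point values Ψ, θ and the max-based normalizer below are assumptions:

```python
PSI, THETA = 2.0, 1.0  # illustrative point values for strong/weak SAT-Clicks

def click_points(dwell_times):
    """Sum SAT-Click points for one document across a whole session."""
    points = 0.0
    for t in dwell_times:
        if t >= 30:
            points += PSI    # Strong SAT-Click
        elif t >= 10:
            points += THETA  # Weak SAT-Click
    return points

def boost_scores(qcm_scores, clicks):
    """qcm_scores and clicks are dicts keyed by document id; clicks maps a
    document to the dwell times of its clicks in the session."""
    raw = {d: click_points(clicks.get(d, [])) for d in qcm_scores}
    top = max(raw.values(), default=0.0)
    return {d: qcm_scores[d] + raw[d] / (top + 1.0)  # squash boost into [0, 1)
            for d in qcm_scores}
```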

User-Click Model
The topic-level User-Click Model for RL3 computes the boost in the same way as the session-level model, except that the calculation is done over the whole session cluster. A session cluster is a set of sessions sharing similar search topics.

Clustering
Topic IDs are not obtainable in real search practice, so we cluster sessions by query similarity:
Ø Convert all queries in one session to a term vector
Ø Assign each dimension its idf value as weight
Ø Cluster sessions based on the Euclidean distance between these vectors
We use the K-means clustering algorithm with K = 60.
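A sketch of this step with scikit-learn; TfidfVectorizer is a stand-in for the weighting described on the slide (it weights by tf·idf rather than idf alone), so treat it as an approximation:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_sessions(session_queries, k=60):
    """session_queries: one list of query strings per session; each session
    becomes a single weighted term vector, then sessions are clustered."""
    docs = [" ".join(queries) for queries in session_queries]
    vectors = TfidfVectorizer().fit_transform(docs).toarray()
    # scikit-learn's K-means uses Euclidean distance, matching the slide.
    return KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
```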

Session Performance Prediction and Replacement
For sessions that share similar search topics, we predict their performance and replace the results of bad sessions with the results of good sessions. To predict session performance, we extract n features from each session (Table 2) and rank sessions by:

Score(s) = 1 - (1/n) · #{ j in 1..n : F_j(s) = TRUE }

Each feature indicates a poorly performing session, so a session with fewer TRUE features receives a higher score.

Session Performance Prediction and Replacement

Table 2: Features extracted for each session s (T(s) denotes the session cluster that includes s)

Feature   Definition
F1        The search intent is comparison.
F2        There is no user click in session s.
F3        The longest click dwell time in session s is <= 5 s.
F4        The number of unique terms in session s is >= 20.
F5        The total click count of session s is less than half of that of T(s).
F6        Session s does not contain the most frequent search term in T(s).
F7        The number of unique terms in session s is <= 6.
F8        The ratio of SAT-Clicks to all clicks in session s is lower than that of T(s).
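A sketch of the prediction-and-replacement step, assuming each feature is a boolean bad-session indicator as in the score formula above; the feature implementations and the replacement threshold are assumptions:

```python
def session_score(features):
    """features: booleans F_1..F_n for one session; every TRUE feature is a
    bad-session indicator, so Score(s) = 1 - (#TRUE)/n."""
    return 1.0 - sum(features) / len(features)

def replace_bad_sessions(cluster, threshold=0.5):
    """cluster: dict session_id -> (features, ranked_results). Sessions scoring
    below the threshold inherit the result list of the best-scoring session."""
    scores = {sid: session_score(f) for sid, (f, _) in cluster.items()}
    best = max(scores, key=scores.get)
    return {sid: cluster[best][1] if scores[sid] < threshold else results
            for sid, (_, results) in cluster.items()}
```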

Outline
Introduction
Methods and Approaches: Ad-hoc Retrieval Model (Ad-hoc), Query Change Retrieval Model (QCM), Weighted QCM, User-Click Model, Clustering, Session Performance Prediction and Replacement
Submissions
Evaluation Results
Conclusion

Our Submissions

GUS14RUN1: RL1 = Ad-hoc Retrieval Model; RL2 = Weighted QCM (ω = 0.65) + Session-Level User-Click Model; RL3 = Weighted QCM (ω = 0.65) + Topic-Level User-Click Model
GUS14RUN2: RL1 = Ad-hoc Retrieval Model; RL2 = Weighted QCM (ω = 0.8) + Session-Level User-Click Model; RL3 = Weighted QCM (ω = 0.8) + Topic-Level User-Click Model
GUS14RUN3: RL1 = Ad-hoc Retrieval Model; RL2 = Weighted QCM (ω = 0.8) + Session-Level User-Click Model; RL3 = Weighted QCM (ω = 0.8) + Topic-Level User-Click Model using topic IDs, plus Session Performance Prediction and Replacement

RL3 in RUN1 and RUN2 uses session clusters based on query similarity; RL3 in RUN3 uses session clusters based on topic ID. Why? Similar queries lead to similar retrieval lists in our system, which is not useful when applying the session replacement strategy.

Evaluation Results

        GUS14RUN1        GUS14RUN2        GUS14RUN3        Max              Med
        nDCG@10  P@10    nDCG@10  P@10    nDCG@10  P@10    nDCG@10  P@10    nDCG@10  P@10
RL1     0.2053   0.378   0.2053   0.378   0.2053   0.378   0.3890   0.629   0.1549   0.348
RL2     0.2458   0.426   0.2482   0.427   0.2482   0.427   0.4865   0.712   0.1626   0.372
RL3     0.2443   0.423   0.2458   0.424   0.2580   0.429   0.5111   0.744   0.1790   0.404

Ø We ranked 2nd in task RL1 and 1st in tasks RL2 and RL3: adjusting term weights based on query change is effective; combining the queries in a session is useful for the Session Track; user clicks are effective for predicting relevance.
Ø A small performance drop from RL2 to RL3 in RUN1 and RUN2: clustering sessions by query similarity may work, but needs more refinement.
Ø A small increase from RL2 to RL3 in RUN3: for sessions sharing the same search topic, replacing the results of poor sessions with those of good sessions is practical.

Conclusion
We achieve a 20.9% increase from RL1 to RL2 by utilizing query change feedback and user click feedback. We achieve a 4% increase from RL2 to RL3 by the topic-level User-Click Model and by session performance prediction and replacement.

Thanks! Jiyun Luo, Xuchu Dong and Grace Hui Yang, Department of Computer Science, Georgetown University