Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Effective Latent Space Graph-based Re-ranking Model with Global Consistency. Feb. 12, 2009

Outline
- Introduction
- Related work
- Methodology
  - Graph-based re-ranking model
  - Learning a latent space graph
  - A case study and the overall algorithm
- Experiments
- Conclusions and future work

Introduction: Problem definition
Given a set of documents D = {d_1, d_2, ...}:
- each document d_i is represented by a term vector x_i;
- relevance scores are computed with a vector space model (VSM) or a language model (LM);
- the documents form a connected graph with explicit links (e.g., hyperlinks) or implicit links (e.g., inferred from the content information), plus many other possible features.
Question: how can we leverage the interconnections between documents/entities to improve the ranking of retrieved results with respect to the query?
(Slide figure: a graph over documents d_1 through d_5 connected to a query q.)

Introduction
- Initial ranking scores capture relevance; the graph structure captures centrality (importance, authority).
- A simple method combines the two parts linearly.
- Limitations: it does not make full use of the information and treats each part individually.
- Our contribution: a joint regularization framework that combines content with link information in a latent space graph.
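The simple linear-combination baseline mentioned above can be sketched as follows. The mixing weight `lam` is hypothetical; the slides do not name or specify it.

```python
def linear_combine(relevance, centrality, lam=0.5):
    """Baseline: linearly mix initial relevance scores with graph centrality.
    lam is a hypothetical mixing weight; the slides do not specify one."""
    return [lam * r + (1 - lam) * c for r, c in zip(relevance, centrality)]
```

This treats the two signals independently, which is exactly the limitation the talk's joint framework is designed to overcome.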

Related work: Structural re-ranking models
- Use variations of PageRank and HITS.
- Centrality within graphs (Kurland and Lee, SIGIR'05 & SIGIR'06).
- Improve Web search results using an affinity graph (Zhang et al., SIGIR'05).
- Improve an initial ranking by a random walk in entity-relation networks (Minkov et al., SIGIR'06).
- Limitation: linear combination; content and link information are treated individually.

Related work: Regularization frameworks
- Graph Laplacians for label propagation with two classes (Zhu et al., ICML'03; Zhou et al., NIPS'03).
- Extend the graph harmonic function to multiple classes (Mei et al., WWW'08).
- Score regularization to adjust ad-hoc retrieval scores (Diaz, CIKM'05).
- Enhance learning to rank with parameterized regularization models (Qin et al., WWW'08).
- Limitations: query-independent settings; these do not consider multiple relationships between objects.

Related work: Learning a latent space
- Latent Semantic Analysis (LSA) (Deerwester et al., JASIS'90).
- Probabilistic LSI (pLSI) (Hofmann, SIGIR'99).
- pLSI + PHITS (Cohn and Hofmann, NIPS'00).
- Combine content and link for classification using matrix factorization (Zhu et al., SIGIR'07).
- These works use joint factorization to learn latent features. Our difference: we leverage the latent features to build a latent space graph.

Methodology
- Graph-based re-ranking model
- Learning a latent space graph
- Case study: expert finding

III. Methodology: Graph-based re-ranking model
Intuition:
- Global consistency: similar documents are likely to have similar ranking scores with respect to a query.
- The initial ranking scores provide invaluable information.
Regularization framework: a parameter balances the global-consistency term against the term that fits the initial scores.

III. Methodology: Graph-based re-ranking model
The regularized objective is an optimization problem with a closed-form solution. Connections with other methods:
- α → 0: returns the initial scores;
- α → 1: a variation of a PageRank-based model;
- α ∈ (0, 1): combines both sources of information simultaneously.
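A minimal sketch of a graph-regularized re-ranker with this behavior, in the spirit of Zhou et al.'s manifold-ranking closed form f = (1-α)(I - αS)⁻¹y; the paper's exact objective may differ in details such as normalization.

```python
import numpy as np

def rerank(W, y, alpha=0.6):
    """Hedged sketch of graph-based re-ranking with global consistency.
    W: symmetric affinity matrix over documents; y: initial relevance scores.
    Minimizes a graph-smoothness term weighted by alpha plus a
    fit-to-initial-scores term weighted by (1 - alpha)."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetrically normalized affinity
    n = W.shape[0]
    # closed-form solution: f = (1 - alpha) * (I - alpha * S)^{-1} y
    return np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * y)
```

With alpha = 0 this returns y unchanged (the baseline), and as alpha grows, scores diffuse over the graph, matching the limiting cases on the slide.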

III. Methodology: Learning a latent space graph
Objective: incorporate content and link information (relational data) simultaneously.
Steps:
- Latent Semantic Analysis
- Joint factorization: combine the content with relational data
- Build the latent space graph: calculate the weight matrix W

III. Methodology - Learning a latent space graph: Latent Semantic Analysis
- Map documents to a vector space of reduced dimensionality.
- SVD is performed on the term-document matrix, keeping the largest k singular values.
- This can be reformulated as an optimization problem.
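The SVD step can be sketched as follows; this is the standard truncated-SVD formulation of LSA, not necessarily the authors' exact parameterization.

```python
import numpy as np

def lsa_embed(C, k):
    """Truncated-SVD LSA sketch: C is an N x M document-term matrix;
    returns N x k latent document vectors (left singular vectors scaled
    by the k largest singular values)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k] * s[:k]
```

Keeping only the top k singular values is equivalent to minimizing the Frobenius reconstruction error over all rank-k approximations, which is the optimization-problem view the slide mentions.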

III. Methodology - Learning a latent space graph: Embedding multiple relational data
Taking papers as an example:
- paper-term matrix C (N x M)
- paper-author matrix A (N x L)
A unified optimization problem factorizes both matrices with a shared latent representation X (N x K) and factor matrices V_C and V_A, solved by conjugate gradient.
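A hedged sketch of the joint factorization idea: find one shared latent matrix X that reconstructs both C and A. The slides solve their objective with conjugate gradient; alternating least squares is used here purely for simplicity, and the ridge term `lam` is an assumption for numerical stability.

```python
import numpy as np

def joint_factorize(C, A, k, iters=50, lam=0.1):
    """Sketch: minimize ||C - X Vc^T||^2 + ||A - X Va^T||^2 (+ small ridge
    penalty) over shared latents X (N x k) and factors Vc (M x k), Va (L x k),
    via alternating least squares as a stand-in for conjugate gradient."""
    n = C.shape[0]
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n, k))
    for _ in range(iters):
        # solve for Vc, Va given X (ridge-regularized least squares)
        G = X.T @ X + lam * np.eye(k)
        Vc = np.linalg.solve(G, X.T @ C).T
        Va = np.linalg.solve(G, X.T @ A).T
        # solve for X given Vc, Va
        H = Vc.T @ Vc + Va.T @ Va + lam * np.eye(k)
        X = np.linalg.solve(H, Vc.T @ C.T + Va.T @ A.T).T
    return X, Vc, Va
```

Because X appears in both reconstruction terms, the learned latent vectors reflect content (terms) and relations (authors) at once, which is the point of the unified objective.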

III. Methodology - Learning a latent space graph: Build the latent space graph
The edge weight w_ij is defined from the latent representations of documents i and j, yielding the weight matrix W. (The defining formula appears on the slide but is not reproduced in this transcript.)
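One plausible construction, consistent with the k_nn and k_d parameters studied in the experiments, is a symmetric k-nearest-neighbor graph over the latent vectors. Cosine similarity is an assumption here; the slide's actual w_ij formula is not shown in the transcript.

```python
import numpy as np

def build_knn_graph(X, k_nn=10):
    """Sketch: symmetric k-NN graph over latent document vectors X,
    with cosine similarity as the (assumed) edge weight."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T                            # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)             # exclude self-loops
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(S[i])[-k_nn:]      # k_nn most similar documents
        W[i, nbrs] = np.maximum(S[i, nbrs], 0.0)
    return np.maximum(W, W.T)                # symmetrize
```

The resulting W is exactly the affinity matrix the graph-based re-ranking model consumes.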

III. Methodology: Case study - application to expert finding
A statistical language model supplies the initial ranking scores:
- infer a document model θ_d for each document;
- score a document by the probability of the query given θ_d;
- this probability is the product of the probabilities of the query terms under θ_d (assuming the terms are independent).
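The query-likelihood scoring step can be sketched as below. The slides do not say which smoothing the authors used; Jelinek-Mercer smoothing with weight `lam` is an assumption, as is working in log space to avoid underflow.

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, coll_tf, coll_len, lam=0.5):
    """Sketch of query-likelihood scoring: log P(q | theta_d) under term
    independence, with (assumed) Jelinek-Mercer smoothing against the
    collection model. coll_tf maps term -> collection frequency."""
    tf = Counter(doc_terms)
    dlen = max(len(doc_terms), 1)
    score = 0.0
    for t in query_terms:
        p_doc = tf[t] / dlen                           # ML estimate in the doc
        p_coll = coll_tf.get(t, 0) / max(coll_len, 1)  # collection background
        score += math.log(lam * p_doc + (1 - lam) * p_coll + 1e-12)
    return score
```

Summing log-probabilities implements the product over independent query terms from the slide.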

III. Methodology: Case study - application to expert finding
Expert finding: identify a list of experts in an academic field for a given query topic (e.g., "data mining" -> Jiawei Han, etc.).
- Publications are representative of an author's expertise; we use the DBLP dataset to obtain them.
- Authors have expertise in the topics of their papers, aggregated over all of their publications.
- Procedure: refine the ranking scores of papers, then aggregate the refined scores to re-rank the experts.
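The aggregation step can be sketched as below. The slides say refined paper scores are aggregated per author but not with which operator; summation is an assumption.

```python
from collections import defaultdict

def rank_experts(paper_scores, paper_authors):
    """Sketch: aggregate refined paper scores per author (by summation,
    an assumed operator) and return authors ranked by aggregate score.
    paper_scores: paper id -> refined score; paper_authors: paper id -> authors."""
    expert = defaultdict(float)
    for pid, score in paper_scores.items():
        for author in paper_authors.get(pid, []):
            expert[author] += score
    return sorted(expert.items(), key=lambda kv: kv[1], reverse=True)
```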

III. Methodology: Case study - application to expert finding
(Slide figure: the overall re-ranking algorithm.)

IV. Experiments: DBLP collection
A subset of the DBLP records (15-CONF). (Slide table: statistics of the 15-CONF collection.)

IV. Experiments: Benchmark dataset
A benchmark dataset with 16 topics and corresponding expert lists.

IV. Experiments: Evaluation metrics
- Precision at rank n (P@n)
- Mean Average Precision (MAP)
- Bpref: scores a ranking by how often non-relevant candidates appear above relevant ones.
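The first two metrics have standard definitions, sketched here (Bpref is omitted since its exact variant is not given on the slide):

```python
def precision_at_n(ranked, relevant, n):
    """P@n: fraction of the top-n ranked items that are relevant."""
    return sum(1 for d in ranked[:n] if d in relevant) / n

def average_precision(ranked, relevant):
    """AP: mean of P@k over the ranks k at which relevant items appear;
    MAP is this averaged over queries."""
    hits, total = 0, 0.0
    for k, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / k
    return total / max(len(relevant), 1)
```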

IV. Experiments: Preliminary experiments
Evaluation results (%): PRRM may not improve performance, while GBRM achieves the best results. (Slide table: evaluation results.)

IV. Experiments: Details of the results (slide table).

IV. Experiments: Effect of parameter α
- α → 0: returns the initial scores (baseline);
- α → 1: discards the initial scores and considers only the global consistency over the graph.

IV. Experiments: Effect of parameter α
The model is robust and achieves its best results when α ∈ (0.5, 0.7).

IV. Experiments: Effect of graph construction
Varying the dimensionality k_d of the latent features used to calculate the weight matrix W:
- results improve for greater k_d, because a higher-dimensional space better captures the similarities;
- k_d = 50 achieves better results than tf.idf.

IV. Experiments: Effect of graph construction
Varying the number of nearest neighbors k_nn:
- performance tends to degrade slightly as k_nn increases; k_nn = 10 achieves the best results;
- average processing time increases linearly with k_nn.

Conclusions and Future Work
Conclusions:
- Leveraged a graph-based model for the query-dependent ranking problem.
- Integrated the latent space with the graph-based re-ranking model.
- Addressed the expert-finding task in the academic field using the proposed method; the improvements are promising.
Future work:
- Extend the framework to consider more features.
- Apply the framework to other applications and to large-scale datasets.

Q&A. Thanks!