From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing

Similar documents

A Deep Relevance Matching Model for Ad-hoc Retrieval

Estimating Embedding Vectors for Queries

Fractional Similarity: Cross-lingual Feature Selection for Search

Learning to Reweight Terms with Distributed Representations

Inverted List Caching for Topical Index Shards

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

A Study of MatchPyramid Models on Ad hoc Retrieval

UMass at TREC 2017 Common Core Track

Entity and Knowledge Base-oriented Information Retrieval

A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH

Window Extraction for Information Retrieval

Predicting Query Performance on the Web

Northeastern University in TREC 2009 Million Query Track

Learning to Rank for Faceted Search: Bridging the gap between theory and practice

Learning Dense Models of Query Similarity from User Click Logs

Indri at TREC 2005: Terabyte Track (Notebook Version)

Supervised Reranking for Web Image Search

for Searching Social Media Posts

James Mayfield, The Johns Hopkins University Applied Physics Laboratory, The Human Language Technology Center of Excellence

Indexing Word-Sequences for Ranked Retrieval

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning

Facial Expression Classification with Random Filters Feature Extraction

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Unsupervised Learning of Spatiotemporally Coherent Metrics

Jan Pedersen 22 July 2010

arxiv: v3 [cs.ir] 13 Jul 2018

An Information Retrieval Approach for Source Code Plagiarism Detection

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

Efficient Execution of Dependency Models

Query Likelihood with Negative Query Generation

A Comparison of Retrieval Models using Term Dependencies

One-Pass Ranking Models for Low-Latency Product Recommendations

Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track

Diversifying Query Suggestions Based on Query Documents

The University of Amsterdam at the CLEF 2008 Domain Specific Track

Deep Learning Applications

An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments

Reducing Click and Skip Errors in Search Result Ranking

A New Measure of the Cluster Hypothesis

Book Recommendation based on Social Information

Experiments with ClueWeb09: Relevance Feedback and Web Tracks

Reducing Redundancy with Anchor Text and Spam Priors

Robust Relevance-Based Language Models

TREC-10 Web Track Experiments at MSRA

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Diversification of Query Interpretations and Search Results

Lecture 7: Relevance Feedback and Query Expansion

Query Phrase Expansion using Wikipedia for Patent Class Search

Exploring Size-Speed Trade-Offs in Static Index Pruning

Search Engines Information Retrieval in Practice

Modern Retrieval Evaluations. Hongning Wang

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH

Deep Learning for Embedded Security Evaluation

PERSONALIZED TAG RECOMMENDATION

Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

Improving Patent Search by Search Result Diversification

BRACE: A Paradigm For the Discretization of Continuously Valued Data

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

Improving Web Page Retrieval using Search Context from Clicked Domain Names. Rongmei Li

CMPSCI 646, Information Retrieval (Fall 2003)

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs

Content-based Dimensionality Reduction for Recommender Systems

Semantic Matching by Non-Linear Word Transportation for Information Retrieval

Multimodal topic model for texts and images utilizing their embeddings

Using Maximum Entropy for Automatic Image Annotation

Rishiraj Saha Roy and Niloy Ganguly IIT Kharagpur India. Monojit Choudhury and Srivatsan Laxman Microsoft Research India India

WebSci and Learning to Rank for IR

Autoencoder. Representation learning (related to dictionary learning) Both the input and the output are x

CMSC 476/676 Information Retrieval Midterm Exam Spring 2014

An Investigation of Basic Retrieval Models for the Dynamic Domain Task

Data-Intensive Distributed Computing

Automatic Boolean Query Suggestion for Professional Search

Effective Structured Query Formulation for Session Search

Face Recognition A Deep Learning Approach

Attentive Neural Architecture for Ad-hoc Structured Document Retrieval

Learning to Rank. Tie-Yan Liu. Microsoft Research Asia CCIR 2011, Jinan,

Experiments of Image Retrieval Using Weak Attributes

Learning Concept Importance Using a Weighted Dependence Model

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

INFORMATION RETRIEVAL SYSTEMS: Theory and Implementation

Single Image Depth Estimation via Deep Learning

Pseudo-Relevance Feedback and Title Re-Ranking for Chinese Information Retrieval

This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring

Autoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN.

Comparative Analysis of Clicks and Judgments for IR Evaluation

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Exploiting Global Impact Ordering for Higher Throughput in Selective Search

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017

Practical Relevance Ranking for 10 Million Books.

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 6: k-nn Cross-validation Regularization

Indri at TREC 2005: Terabyte Track

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

Chapter IR:IV. IV. Indexing. Abstract Model of Ranking Inverted Indexes Compression Auxiliary Structures Index Construction Query Processing

ECG782: Multidimensional Digital Signal Processing

Transcription:

From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing. Hamed Zamani (1), Mostafa Dehghani (2), W. Bruce Croft (1), Erik Learned-Miller (1), and Jaap Kamps (2). (1) University of Massachusetts Amherst, USA. (2) University of Amsterdam, The Netherlands. Slides by Hamed Zamani.

Motivation
Learning to rank is core in modern search engines. LTR models (and neural rankers) are stacked re-rankers: they re-rank a small set of documents retrieved by an early-stage ranker. The performance of LTR and neural models is therefore upper-bounded by the recall of the early-stage rankers. We propose a Standalone Neural Ranking Model that is end-to-end, standalone, efficient, and effective.

Main Idea
Looking inside the models: (effective) models tend to contain many features that aren't characteristic or essential. They aren't harmful, so they aren't selected out or removed. Can we reward models for being parsimonious? Our approach: introduce a sparsity property into the learned query and document representations, build an inverted index for the learned representations, and retrieve documents using the constructed inverted index.

Main Idea

Objectives
Two objectives:
Relevance objective: hinge loss as a pairwise loss function.
Sparsity objective: sparsity ratio(v) = (# of zero elements of v) / |v|. Maximizing the sparsity ratio is equivalent to minimizing the L0 norm: ||v||_0 = Σ_{i=1}^{|v|} |v_i|^0 (defining 0^0 = 0). However, the L0 norm is not differentiable! L1 can be used to approximate L0. Note that we use ReLU (ReLU(x) = max(0, x)) as the activation function in our network, which sets all non-positive values to zero.
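A minimal NumPy sketch of these quantities, shown only to fix notation (the array below is made up):

```python
import numpy as np

v = np.array([0.0, 1.3, 0.0, 0.0, 2.1])  # a learned representation (illustrative)

# Sparsity ratio: fraction of zero elements (to be maximized).
sparsity_ratio = np.mean(v == 0)          # 0.6

# L0 "norm": number of non-zero elements (with 0^0 defined as 0).
# It has zero gradient almost everywhere, so it cannot be optimized directly.
l0 = np.count_nonzero(v)                  # 2

# L1 norm: the differentiable surrogate used as the sparsity objective.
# With ReLU outputs (all entries >= 0), this is just the sum of the entries.
l1 = np.abs(v).sum()                      # 3.4
```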

Loss Function
L(q_i, d_{i1}, d_{i2}, y_i) = L_hinge(q_i, d_{i1}, d_{i2}, y_i) + λ · L1(φ_Q(q_i) ‖ φ_D(d_{i1}) ‖ φ_D(d_{i2}))
The first term is the pairwise ranking loss; the second is taken over the concatenation of the query and document representations. L1 is the sparsity loss! λ controls the sparsity of the learned representations. Note that the representation dimensionality should be high (e.g., up to 20k in our experiments).
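A hedged PyTorch sketch of this loss; the function name and margin value are ours, and the dot product is used as the retrieval score:

```python
import torch
import torch.nn.functional as F

def snrm_loss(phi_q, phi_d1, phi_d2, y, lam=1e-5, margin=1.0):
    """Pairwise hinge loss plus a lambda-weighted L1 sparsity term.

    phi_q, phi_d1, phi_d2: (batch, dim) ReLU (hence non-negative) representations.
    y: +1 if d1 should rank above d2, -1 otherwise.
    """
    s1 = (phi_q * phi_d1).sum(dim=1)   # retrieval score of (q, d1)
    s2 = (phi_q * phi_d2).sum(dim=1)   # retrieval score of (q, d2)
    hinge = F.relu(margin - y * (s1 - s2)).mean()

    # L1 over the concatenation of the three representations;
    # a larger lam yields sparser (but eventually less effective) vectors.
    l1 = torch.cat([phi_q, phi_d1, phi_d2], dim=1).abs().sum(dim=1).mean()
    return hinge + lam * l1
```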

Training in a Nutshell
The relevance objective helps the model distinguish relevant from non-relevant documents. The sparsity objective increases the sparsity ratio of the final representations.

Post-Training in a Nutshell
How do we compute query and document representations? An offline process handles inverted index construction; query processing happens online at retrieval time.
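A toy sketch of both stages in plain Python, assuming the sparse vectors have already been computed (an illustration, not the paper's implementation):

```python
from collections import defaultdict

def build_inverted_index(doc_vectors):
    """Offline: one posting list per latent dimension, listing only the
    documents with a non-zero weight on that dimension."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for dim, w in enumerate(vec):
            if w > 0:
                index[dim].append((doc_id, w))
    return index

def retrieve(query_vec, index):
    """Online: dot-product scoring that only touches the posting lists
    of the query's non-zero dimensions."""
    scores = defaultdict(float)
    for dim, qw in enumerate(query_vec):
        if qw > 0:
            for doc_id, dw in index[dim]:
                scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```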

Query and Document Representations
In designing the representation learning sub-network, we should consider the following. Queries generally contain less information than documents; therefore, query representations are expected to have fewer nonzero elements. GPU memory is limited (e.g., 12GB) while the output dimensionality should be high: the network parameters and the data for each batch should fit into memory. Queries and documents should be in the same semantic space; therefore, parameters should be shared. Based on these points, we decided to learn a representation for each n-gram and aggregate them.

Representation Learning Sub-network
The network starts with an embedding layer. The embeddings are learned in an end-to-end fashion. φ_ngram first learns a low-dimensional manifold of the data (for high abstraction) and then increases the dimensionality.
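One way such an hourglass-shaped sub-network could look in PyTorch; the layer sizes here are invented for illustration and are not the paper's settings:

```python
import torch.nn as nn

n, embedding_dim, output_dim = 5, 300, 10_000   # illustrative sizes only

phi_ngram = nn.Sequential(
    nn.Linear(n * embedding_dim, 100),   # compress: low-dimensional manifold
    nn.ReLU(),
    nn.Linear(100, 1_000),               # then increase the dimensionality...
    nn.ReLU(),
    nn.Linear(1_000, output_dim),        # ...up to the high-dimensional output
    nn.ReLU(),                           # ReLU zeroes non-positive values -> sparsity
)
```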

Query and Document Representations
φ_D(d) = (1 / (|d| − n + 1)) · Σ_{i=1}^{|d|−n+1} φ_ngram(w_i, w_{i+1}, …, w_{i+n−1})
Note that |q| − n + 1 << |d| − n + 1, and φ_ngram is shared. φ_ngram learns a representation for each n-gram. The query representation is computed similarly. Why will this lead to higher sparsity for queries compared to documents?
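In code, the aggregation might look as follows (assuming `phi_ngram` from the sketch above and a `(num_words, embedding_dim)` tensor of word embeddings):

```python
import torch

def represent(word_embs, phi_ngram, n=5):
    """Average phi_ngram over all |text| - n + 1 sliding windows."""
    windows = [word_embs[i:i + n].reshape(-1)     # concatenate n word embeddings
               for i in range(word_embs.size(0) - n + 1)]
    return torch.stack([phi_ngram(w) for w in windows]).mean(dim=0)
```

Each window yields a sparse vector, and averaging unions their non-zero dimensions: a short query has few windows and so stays much sparser than a long document with many windows.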

Overview
This model has millions of parameters. How do we obtain enough data for training? Answer: weak supervision.

Weak Supervision
Weak supervision uses programmatically generated labels for the ranking task. Collect a large number of queries. Using an existing retrieval model (e.g., query likelihood), retrieve documents for each query from a collection C. Assume that the query likelihood score is the true label (!) and train your pairwise model on this data.
M. Dehghani, H. Zamani, A. Severyn, J. Kamps, W. B. Croft. Neural Ranking Models with Weak Supervision. In Proc. of SIGIR '17.
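The data-generation loop might be sketched as below; `query_likelihood_search` is a hypothetical stand-in for whatever existing retrieval model produces the weak labels:

```python
import random

def generate_weak_triples(queries, query_likelihood_search, k=1000):
    """Yield (query, d1, d2, y) triples labeled by query likelihood scores."""
    for q in queries:
        ranked = query_likelihood_search(q, k)      # [(doc_id, ql_score), ...]
        (d1, s1), (d2, s2) = random.sample(ranked, 2)
        y = 1 if s1 > s2 else -1                    # the QL score acts as the label
        yield q, d1, d2, y
```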

Experiments
Two collections: Robust (250 queries) and ClueWeb09-Cat.B (200 queries). Two-fold cross-validation over the queries of each collection for hyper-parameter tuning. Over 6M unique queries extracted from the AOL query logs (excluding navigational queries) for generating weak supervision data using query likelihood. Metrics: MAP@1000, P@20, nDCG@20, and Recall@1000.

Sparsity and Efficiency
Statistics of non-zero dimensions (out of 10,000 dimensions) per document, and average running time per query in milliseconds. Minimizing L1 leads to a higher sparsity ratio. The number of non-zero elements in query representations is much smaller than in document representations. SNRM's retrieval time (computing the query representation + scoring documents using the inverted index) is comparable with term matching models. Therefore, SNRM is efficient.

Effectiveness
FNRM [Dehghani et al., SIGIR '17] and CNRM (FNRM with convolution) re-rank 2,000 documents retrieved by query likelihood. SNRM outperforms FNRM and CNRM in terms of Recall@1000. SNRM with PRF significantly outperforms all the baselines. Note that it is not clear how to incorporate PRF into existing neural models, and simple/intuitive approaches do not work well (because of the dense representations); with sparse vectors, one natural option is sketched below.
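Our reading of how PRF can work on sparse vectors (not necessarily the paper's exact formulation): expand the query vector with the centroid of the top-k retrieved document vectors, then retrieve again. `retrieve` and `doc_vectors` come from the inverted-index sketch above; `alpha` and `k` are illustrative hyper-parameters.

```python
import numpy as np

def prf_expand(query_vec, index, doc_vectors, k=10, alpha=0.5):
    """Add the centroid of the top-k documents' vectors to the query vector."""
    top = [doc_id for doc_id, _ in retrieve(query_vec, index)[:k]]
    centroid = np.mean([doc_vectors[d] for d in top], axis=0)
    return query_vec + alpha * centroid   # re-run retrieve() with this vector
```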

Robustness to Collection Growth
We measure retrieval performance after removing some random documents from the Robust collection at training time. Removing 1% of documents (over 5k documents) from the training set does not significantly affect the performance. Removing 5% of documents (over 26k documents) from the training set significantly drops the performance. In settings where new documents are frequently added to the collection, the model should be re-trained periodically.

Conclusion
We proposed a Standalone Neural Ranking Model (SNRM) that can retrieve documents from a large collection. SNRM does not require a first-stage ranker; it can actually be a first-stage ranker, and it can potentially be trained to maximize recall. SNRM is trained end to end. SNRM can take advantage of pseudo-relevance feedback. SNRM outperforms state-of-the-art IR models.

Thank You! code: https://github.com/hamed-zamani/snrm