A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences 2 Center for Intelligent Information Retrieval, University of Massachusetts Amherst

Outline Motivation Problem Analysis Our Approach Experiments Conclusions

Machine Learning to Rank: machine learning methods have been successfully applied to information retrieval (IR). [Figure: training queries q(1), q(2), ... with their judged documents and relevance labels.] A learning system minimizes a loss over a model f(x, w), where x is the feature representation of a <q, d> pair. Human-defined features are time-consuming, incomplete, and over-specified.

Success of Deep Learning: deep neural networks have led to exciting breakthroughs in a variety of applications (computer vision, speech recognition, machine translation) by discovering hidden structures and features at different levels of abstraction from the training data. It is therefore natural to apply deep learning methods to representation learning in IR.

Ad-hoc Retrieval as a Matching Problem: the core problem in ad-hoc retrieval can be formalized as a text matching problem, match(T1, T2) = F(φ(T1), φ(T2)), where φ maps each text to a representation vector and F is a scoring function based on the interaction between the texts.

Ad-hoc Retrieval as a Matching Problem: match(T1, T2) = F(φ(T1), φ(T2)), where φ maps each text to a representation vector and F is a scoring function based on the interaction between the texts. A general formalization across tasks:

Task                      | T1       | T2
Ad-hoc Retrieval          | Query    | Document
Paraphrase Identification | String A | String B
Question Answering        | Question | Answer
Automatic Conversation    | Dialog   | Response

Existing Deep Matching Models: representation-focused models build a good representation for each text with a deep neural network and then conduct matching between the two compositional, abstract text representations. In match(T1, T2) = F(φ(T1), φ(T2)), φ is a complex representation mapping function (a deep network in a Siamese, symmetric architecture over the text inputs) and F is a simple matching function, e.g. cosine similarity. Representative methods: DSSM [Huang et al. 2013], C-DSSM [Shen et al. 2014], ARC-I [Hu et al. 2014].

Existing Deep Matching Models: interaction-focused models build the local interactions between two texts based on some basic representations and then use deep neural networks to learn hierarchical interaction patterns for matching, i.e. a hierarchical deep architecture over the local interaction matrix. In match(T1, T2) = F(φ(T1), φ(T2)), φ is a simple mapping function (e.g. a sequence of word vectors) and F is a complex deep model (e.g. CNN, DNN) over the local interactions; a sketch of such an interaction matrix is given below. Representative methods: DeepMatch [Lu et al. 2014], ARC-II [Hu et al. 2014], MatchPyramid [Pang et al. 2016].
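For concreteness, here is a minimal sketch of the local interaction matrix that interaction-focused models (and DRMM's input, below) start from; the function names, the toy vocabulary, and the random embeddings are illustrative assumptions, not part of the original work.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two term embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def interaction_matrix(query_terms, doc_terms, emb):
    """|Q| x |D| matrix of term-to-term cosine similarities (the local interactions)."""
    return np.array([[cosine(emb[q], emb[d]) for d in doc_terms] for q in query_terms])

# toy usage with random embeddings (illustrative only)
emb = {w: np.random.randn(300) for w in ["bitcoin", "news", "currency", "price"]}
M = interaction_matrix(["bitcoin", "news"], ["bitcoin", "currency", "price"], emb)
print(M.shape)  # (2, 3)
```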

Observations: however, these models have mostly been demonstrated effective on NLP tasks such as paraphrase identification and QA, or tested in non-typical ad-hoc retrieval settings (e.g., DSSM and C-DSSM were only evaluated on <query, doc title> pairs). When applied directly to benchmark retrieval collections (TREC collections, compared with the language model and BM25), they show relatively poor performance.

Research Questions: 1. Is matching in ad-hoc retrieval really the same as in NLP tasks? 2. Are the existing deep matching models suitable for the ad-hoc retrieval task?

Outline Motivation Problem Analysis Our Approach Experiments Conclusions

Problem Analysis: matching in many NLP tasks (paraphrase identification, question answering, automatic conversation) is semantic matching: identifying the semantic meaning and inferring the semantic relations between two pieces of text. The texts are homogeneous and consist of a few natural language sentences.

Semantic Matching: similarity matching signals. It is important to capture semantic similarity/relatedness. Automatic conversation example: S: Where do you come from? R: I am from Madrid, Spain.

Semantic Matching: similarity matching signals. It is important to capture semantic similarity/relatedness. Compositional meanings: natural language sentences carry compositional meaning based on their grammatical structures (e.g., the dependency parse of "Where do you come from?"). Automatic conversation example: S: Where do you come from? R: I am from Madrid, Spain.

Semantic Matching: similarity matching signals. It is important to capture semantic similarity/relatedness. Compositional meanings: natural language sentences carry compositional meaning based on their grammatical structures. Global matching requirement: the texts have limited length and a concentrated topic scope, so each text is treated as a whole to infer the semantic relations. Automatic conversation example: S: Where do you come from? R: I am from Madrid, Spain. / R: I am a student.

Problem Analysis: matching in ad-hoc retrieval is relevance matching: identifying whether a document is relevant to a given query. The query is typically short and keyword-based; the document can vary considerably in length, from tens of words to thousands or even tens of thousands of words.

Relevance Matching: exact matching signals. The exact matching of terms is still the most important signal in ad-hoc retrieval and underlies the indexing and search paradigm of modern search engines.

Relevance Matching: exact matching signals. The exact matching of terms is still the most important signal in ad-hoc retrieval and underlies the indexing and search paradigm of modern search engines. Query term importance: queries are mainly short and keyword-based without complex grammar, so it is critical to take term importance into account (e.g., in the query "bitcoin news", "bitcoin" is more important than "news").

Relevance Matching: exact matching signals. The exact matching of terms is still the most important signal in ad-hoc retrieval and underlies the indexing and search paradigm of modern search engines. Query term importance: queries are mainly short and keyword-based without complex grammar, so it is critical to take term importance into account (e.g., "bitcoin" over "news" in the query "bitcoin news"). Diverse matching requirement: documents are long, and there are different hypotheses concerning document length in ad-hoc retrieval: 1. the Verbosity Hypothesis (global relevance) and 2. the Scope Hypothesis (relevance matching could happen in any part of a document).

Semantic Matching vs. Relevance Matching: there are large differences between relevance matching in ad-hoc retrieval and semantic matching in NLP tasks, which affect the design of deep model architectures; there is no one-size-fits-all matching model. Semantic matching: similarity matching signals, compositional meanings, global matching requirement. Relevance matching: exact matching signals, query term importance, diverse matching requirement.

Re-Visit Deep Matching Models: most existing models address semantic matching rather than relevance matching. Representation-focused models (DSSM [Huang et al. 2013], C-DSSM [Shen et al. 2014], ARC-I [Hu et al. 2014]) focus on the compositional meaning of the texts and fit the global matching requirement, but the exact matching signals are lost.

Re-Visit Deep Matching Models: most existing models address semantic matching rather than relevance matching. Interaction-focused models (DeepMatch [Lu et al. 2014], ARC-II [Hu et al. 2014], MatchPyramid [Pang et al. 2016]) preserve detailed matching signals, but they do not differentiate exact from similarity matching signals, do not address query term importance, and cannot meet the diverse matching requirement.

Outline Motivation Problem Analysis Our Approach Experiments Conclusions

A Deep Relevance Matching Model: specifically designed for relevance matching in ad-hoc retrieval by explicitly addressing the three factors: exact matching signals, query term importance, and the diverse matching requirement. Our model employs a joint deep architecture at the query term level over the local interactions between query words and document words, and is thus similar to interaction-focused rather than representation-focused models.

Model Architecture [diagram]: the local interactions between q and d are mapped by (1) Matching Histogram Mapping and passed through (2) the Feed Forward Matching Network to produce a score per query term; (3) the Term Gating Network computes gates g1, g2, g3, ... from the query, and Score Aggregation combines the gated per-term scores into the final Matching Score.

1. Matching Histogram Mapping: maps the variable-sized local interactions into a fixed-length representation by grouping them according to strength levels, yielding a position-free but strength-focused representation: z_i^(0) = h(w_i^(q) ⊗ d), i = 1, ..., M, where w_i^(q) ⊗ d denotes the cosine similarities between query term w_i^(q) and each document term. Different mappings h(): count-based histogram (CH, frequency), normalized histogram (NH, relative frequency), LogCount-based histogram (LCH, logarithm of counts). A sketch of the mapping is given below.
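A minimal sketch of the histogram mapping, assuming cosine similarities over word embeddings as the local interactions; the bin layout and the use of log(1 + count) for LCH are implementation assumptions rather than the paper's exact recipe.

```python
import numpy as np

def matching_histogram(q_vec, doc_vecs, n_bins=30, mode="LCH"):
    """Map one query term's local interactions with all document terms into a
    fixed-length, position-free but strength-focused histogram z_i^(0)."""
    # cosine similarity between the query term and every document term
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    exact = np.isclose(sims, 1.0)            # exact matches get their own (last) bin
    edges = np.linspace(-1.0, 1.0, n_bins)   # n_bins - 1 strength intervals over [-1, 1)
    hist = np.zeros(n_bins)
    hist[-1] = exact.sum()
    for i in np.digitize(sims[~exact], edges) - 1:
        hist[max(i, 0)] += 1
    if mode == "CH":                         # count-based histogram: raw frequency
        return hist
    if mode == "NH":                         # normalized histogram: relative frequency
        return hist / max(hist.sum(), 1.0)
    return np.log1p(hist)                    # LCH: logarithm of the counts

# toy usage: one query term against 4 document terms, 300-d random embeddings
rng = np.random.default_rng(0)
z0 = matching_histogram(rng.normal(size=300), rng.normal(size=(4, 300)))
```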

2. Feed Forward Matching Network (FFMN): extracts hierarchical matching patterns from different levels of interaction signals: z_i^(0) = h(w_i^(q) ⊗ d), and z_i^(l) = tanh(W^(l) z_i^(l-1) + b^(l)), for i = 1, ..., M and l = 1, ..., L.
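A minimal sketch of the feed-forward matching network applied to one query term's histogram; the layer sizes are illustrative and not the paper's exact configuration.

```python
import numpy as np

def ffmn_forward(z0, weights, biases):
    """Per query term: pass the histogram z^(0) through L tanh layers,
    z^(l) = tanh(W^(l) z^(l-1) + b^(l)); the last layer yields the score z^(L)."""
    z = np.asarray(z0)
    for W, b in zip(weights, biases):
        z = np.tanh(W @ z + b)
    return z

# toy usage: 30-bin histogram -> 5 hidden units -> 1 output score (shapes are illustrative)
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(5, 30)), rng.normal(size=(1, 5))]
bs = [np.zeros(5), np.zeros(1)]
z_L = ffmn_forward(np.ones(30), Ws, bs)   # z_L[0] is this query term's matching score
```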

3. Term Gating Network: controls how much the relevance score of each query term contributes to the final relevance score: g_i = exp(w_g x_i^(q)) / Σ_{j=1..M} exp(w_g x_j^(q)), and s = Σ_{i=1..M} g_i z_i^(L). The gating input x_i^(q) is either the term vector [Zheng et al. SIGIR 2015] or the term's inverse document frequency.
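A minimal sketch of the term gating network in the IDF-gated setting, where each gating feature x_i^(q) is a scalar IDF and w_g a single weight (fixed here for illustration; in the model it is learned). The term-vector variant would use a weight vector instead.

```python
import numpy as np

def gated_score(per_term_scores, gate_features, w_g=1.0):
    """Softmax term gating: g_i = exp(w_g * x_i) / sum_j exp(w_g * x_j); the final
    relevance score is the gate-weighted sum of the per-term FFMN outputs z_i^(L)."""
    logits = w_g * np.asarray(gate_features, dtype=float)
    g = np.exp(logits - logits.max())        # numerically stabilized softmax
    g /= g.sum()
    return float(np.dot(g, per_term_scores))

# toy usage with IDF as the gating feature (values are illustrative)
print(gated_score(per_term_scores=[0.8, 0.1], gate_features=[5.2, 1.3]))
```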

Model Discussion: different input representations. Existing interaction-focused models use a matching matrix, which is position-preserving, requires zero-padding, and treats all signals equally. Our model uses a matching histogram, which is strength-preserving, needs no padding, and distinguishes exact from similarity matching signals.

Model Discussion: different model architectures. Existing interaction-focused models apply CNNs to the matching matrix and learn positional regularities in matching patterns; this suits image recognition and the global matching requirement (all positions are important), but not the diverse matching requirement (no positional regularity). Our method applies a DNN to the matching histogram: it learns position-free but strength-focused patterns and explicitly models term importance.

Model Training: as a ranking task, we employ a pairwise ranking loss such as the hinge loss: L(q, d+, d-; Θ) = max(0, 1 - s(q, d+) + s(q, d-)). Training uses back-propagation, SGD with AdaGrad, mini-batches of size 20, and early stopping.
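A minimal sketch of the pairwise hinge loss used for training; the gradient computation and the AdaGrad update are omitted.

```python
def pairwise_hinge_loss(s_pos, s_neg, margin=1.0):
    """L(q, d+, d-) = max(0, margin - s(q, d+) + s(q, d-)): zero once the relevant
    document outscores the non-relevant one by at least the margin."""
    return max(0.0, margin - s_pos + s_neg)

print(pairwise_hinge_loss(1.4, 0.1))  # 0.0: relevant doc wins by more than the margin
print(pairwise_hinge_loss(0.3, 0.2))  # ~0.9: ordered correctly but within the margin
```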

Outline Motivation Problem Analysis Our Approach Experiments Conclusions

Experimental Settings. Datasets: Robust04 (news collection) and ClueWeb09-Cat-B (Web collection). Pre-processing: word stemming with the Krovetz stemmer; stop word removal in queries using the INQUERY stop list. Evaluation methodology: 5-fold cross validation, tuned towards MAP, evaluated by MAP, nDCG@20, and P@20.

                   Robust04   ClueWeb09-Cat-B
Vocabulary         0.6M       38M
Document Count     0.5M       34M
Collection Length  252M       26B
Query Count        250        150

The ClueWeb09-Cat-B collection has been filtered to the set of documents in the 60th percentile of spam scores.
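For reference, a minimal sketch of average precision, the per-query quantity behind MAP (nDCG@20 and P@20 follow the standard TREC definitions and are omitted); the function name and the toy ranking are illustrative.

```python
def average_precision(ranked_rels, n_relevant):
    """AP for one query: ranked_rels are binary relevance labels in ranked order,
    n_relevant is the total number of relevant documents for that query.
    MAP is the mean of AP over all queries."""
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            ap += hits / rank
    return ap / max(n_relevant, 1)

# relevant documents retrieved at ranks 1 and 3, out of 2 relevant overall
print(average_precision([1, 0, 1, 0], n_relevant=2))  # (1/1 + 2/3) / 2 ≈ 0.833
```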

Baseline Methods Traditional retrieval models QL: Query likelihood model based on Dirichlet smoothing BM25: The classical probabilistic retrieval model

Baseline Methods. Traditional retrieval models: QL, the query likelihood model based on Dirichlet smoothing; BM25, the classical probabilistic retrieval model. Representation-focused deep matching models: DSSM_T/DSSM_D, a state-of-the-art deep matching model for Web search using a feed-forward neural network (released model trained on large click-through data); C-DSSM_T/C-DSSM_D, a deep matching model similar to DSSM for Web search that replaces the DNN with a CNN (released model trained on large click-through data); ARC-I, a general representation-focused deep matching model based on CNN (implemented according to the original paper).

Baseline Methods. Traditional retrieval models: QL, the query likelihood model based on Dirichlet smoothing; BM25, the classical probabilistic retrieval model. Representation-focused deep matching models: DSSM_T/DSSM_D, a state-of-the-art deep matching model for Web search using a feed-forward neural network (released model trained on large click-through data); C-DSSM_T/C-DSSM_D, a deep matching model similar to DSSM for Web search that replaces the DNN with a CNN (released model trained on large click-through data); ARC-I, a general representation-focused deep matching model based on CNN (implemented according to the original paper). Interaction-focused deep matching models: ARC-II, a hierarchical model over local interactions using CNN (implemented according to the original paper); MP, MatchPyramid, another state-of-the-art interaction-focused deep matching model with three variants based on different interaction operators, denoted MP_IND, MP_COS, and MP_DOT (released code).

Baseline Methods. Traditional retrieval models: QL, the query likelihood model based on Dirichlet smoothing; BM25, the classical probabilistic retrieval model. Representation-focused deep matching models: DSSM_T/DSSM_D, a state-of-the-art deep matching model for Web search using a feed-forward neural network (released model trained on large click-through data); C-DSSM_T/C-DSSM_D, a deep matching model similar to DSSM for Web search that replaces the DNN with a CNN (released model trained on large click-through data); ARC-I, a general representation-focused deep matching model based on CNN (implemented according to the original paper). Interaction-focused deep matching models: ARC-II, a hierarchical model over local interactions using CNN (implemented according to the original paper); MP, MatchPyramid, another state-of-the-art interaction-focused deep matching model with three variants based on different interaction operators, denoted MP_IND, MP_COS, and MP_DOT (released code). Our approach: DRMM (6 variants); e.g., DRMM_CHxIDF refers to DRMM with the count-based histogram and the term gating network using inverse document frequency.

Other Configurations. Term embeddings: for all deep models, we used 300-dimensional term vectors trained with CBOW [T. Mikolov et al., NIPS 2013] on the Robust04 and ClueWeb-09-Cat-B collections, respectively. Context window size: 10; negative samples: 10. Pre-processing: HTML tags removed and stemming applied. Vocabulary size: Robust04 (0.1M), ClueWeb-09-Cat-B (4.1M).

Other Configurations. Term embeddings: for all deep models, we used 300-dimensional term vectors trained with CBOW [T. Mikolov et al., NIPS 2013] on the Robust04 and ClueWeb-09-Cat-B collections, respectively. Context window size: 10; negative samples: 10. Pre-processing: HTML tags removed and stemming applied. Vocabulary size: Robust04 (0.1M), ClueWeb-09-Cat-B (4.1M). Network configurations: for ARC-I, ARC-II, and MatchPyramid we tried both the default configurations and other settings; ARC-I and ARC-II use 3-word windows, 64 feature maps, and 6 layers; MatchPyramid uses a 3x3 kernel size, 8 feature maps, and 4 layers; DRMM uses a 30-bin matching histogram and 4 layers. A sketch of the embedding training setup is given below.
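A minimal sketch of how such CBOW embeddings could be trained with gensim's Word2Vec (gensim >= 4 API); the toy corpus and the min_count value are assumptions for illustration, and the original work does not necessarily use gensim.

```python
from gensim.models import Word2Vec

# tiny illustrative corpus; in practice this would iterate over the stemmed,
# tag-stripped Robust04 or ClueWeb-09-Cat-B documents (one token list per document)
corpus = [
    ["deep", "relevance", "matching", "model", "ad", "hoc", "retrieval"],
    ["query", "term", "importance", "exact", "matching", "signal"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=300,  # 300-dimensional term embeddings
    window=10,        # context window size 10
    negative=10,      # 10 negative samples
    sg=0,             # sg=0 selects CBOW
    min_count=1,      # low cutoff only for this toy corpus (assumption, not from the slide)
)
vec = model.wv["matching"]  # 300-dimensional vector for a term
```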

Retrieval Performance on the Robust04 collection (significant improvement or degradation with respect to QL is indicated by +/-):

Model          | Titles: MAP  nDCG@20  P@20   | Descriptions: MAP  nDCG@20  P@20
Traditional retrieval baselines:
QL             | 0.253  0.415  0.369          | 0.246  0.391  0.334
BM25           | 0.255  0.418  0.370          | 0.241  0.399  0.337
Representation-focused matching baselines:
DSSM_D         | 0.095  0.201  0.171          | 0.078  0.169  0.145
CDSSM_D        | 0.067  0.146  0.125          | 0.050  0.113  0.093
ARC-I          | 0.041  0.066  0.065          | 0.030  0.047  0.045
Interaction-focused matching baselines:
ARC-II         | 0.067  0.147  0.128          | 0.042  0.086  0.074
MP_IND         | 0.169  0.319  0.281          | 0.067  0.142  0.118
MP_COS         | 0.189  0.330  0.290          | 0.094  0.190  0.162
MP_DOT         | 0.083  0.159  0.155          | 0.047  0.104  0.092
Our approach (DRMM):
DRMM_CHxTV     | 0.253  0.407  0.357          | 0.247  0.404  0.341
DRMM_NHxTV     | 0.160  0.293  0.258          | 0.132  0.217  0.186
DRMM_LCHxTV    | 0.268+ 0.423  0.381          | 0.265+ 0.423+ 0.360+
DRMM_CHxIDF    | 0.259  0.412  0.362          | 0.255  0.410+ 0.344
DRMM_NHxIDF    | 0.187  0.312  0.282          | 0.145  0.243  0.199
DRMM_LCHxIDF   | 0.279+ 0.431+ 0.382+         | 0.275+ 0.437+ 0.371+

All the representation-focused models perform significantly worse than the traditional retrieval models.

Retrieval Performance (Robust04, continued; same table as above): 1. The interaction-focused models cannot compete with the traditional retrieval models either. 2. By preserving detailed matching signals, interaction-focused models can work better than representation-focused matching models.

Retrieval Performance (Robust04, continued; same table as above): 1. The best-performing DRMM is significantly better than all the existing baselines. 2. The performance on topic descriptions is comparable to that on topic titles.

Retrieval Performance: comparison of DRMM variants (Robust04; see the DRMM rows of the table above). 1. LCH-based histogram > CH-based histogram > NH-based histogram. CH-based > NH-based: document length information is important in ad-hoc retrieval. LCH-based is best: it produces input signals with a reduced range, and the non-linear transformation is useful for learning multiplicative relationships.

Retrieval Performance: comparison of DRMM variants (continued). 2. IDF-based term gating > term-vector-based gating: term vectors do not contain sufficient information, and the model using term vectors introduces too many parameters to be learned sufficiently.

Retrieval Performance on the ClueWeb-09-Cat-B collection (significant improvement or degradation with respect to QL is indicated by +/-):

Model          | Titles: MAP  nDCG@20  P@20   | Descriptions: MAP  nDCG@20  P@20
Traditional retrieval baselines:
QL             | 0.100  0.224  0.328          | 0.075  0.183  0.234
BM25           | 0.101  0.225  0.326          | 0.080  0.196  0.255+
Representation-focused matching baselines:
DSSM_T         | 0.054  0.132  0.185          | 0.046  0.119  0.143
DSSM_D         | 0.039  0.099  0.131          | 0.034  0.078  0.103
CDSSM_T        | 0.064  0.253  0.214          | 0.055  0.139  0.171
CDSSM_D        | 0.054  0.134  0.177          | 0.049  0.125  0.160
ARC-I          | 0.024  0.073  0.089          | 0.017  0.036  0.051
Interaction-focused matching baselines:
ARC-II         | 0.033  0.087  0.123          | 0.024  0.056  0.075
MP_IND         | 0.056  0.139  0.208          | 0.043  0.118  0.158
MP_COS         | 0.066  0.158  0.222          | 0.057  0.140  0.171
MP_DOT         | 0.044  0.109  0.158          | 0.033  0.073  0.102
Our approach (DRMM):
DRMM_CHxTV     | 0.103  0.245  0.347          | 0.072  0.188  0.253
DRMM_NHxTV     | 0.065  0.151  0.213          | 0.031  0.075  0.100
DRMM_LCHxTV    | 0.111+ 0.250+ 0.361+         | 0.083  0.213  0.275
DRMM_CHxIDF    | 0.104  0.252+ 0.354+         | 0.077  0.204  0.267
DRMM_NHxIDF    | 0.066  0.151  0.216          | 0.038  0.087  0.113
DRMM_LCHxIDF   | 0.113+ 0.258+ 0.365+         | 0.087+ 0.235+ 0.310+

Similar results can be found on the ClueWeb-09-Cat-B collection.

Impact of Different Model Components: comparison of several simpler versions of DRMM over the topic titles of the two test collections in terms of MAP. [Bar chart: MAP of DRMM_LCHxIDF, DRMM_LCHxUNI, DRMM_DYNxIDF, and DRMM_KMAXxIDF on Robust04 and ClueWeb-09-Cat-B.] DRMM_LCHxIDF is the original DRMM; DRMM_LCHxUNI removes the term gating network (no term importance); DRMM_DYNxIDF replaces the matching histogram with a matching sequence using a dynamic pooling strategy (position-related input); DRMM_KMAXxIDF replaces the matching histogram with a matching sequence using a k-max pooling strategy (top-k strength-related input).

Impact of Term Embeddings: performance comparison of DRMM over different dimensionalities of term embeddings trained by CBOW on the Robust04 collection.

Topic         Embedding   MAP    nDCG@20  P@20
Titles        CBOW-50d    0.268  0.421    0.375
              CBOW-100d   0.270  0.427    0.379
              CBOW-300d   0.279  0.431    0.381
              CBOW-500d   0.277  0.430    0.381
Descriptions  CBOW-50d    0.268  0.431    0.365
              CBOW-100d   0.271  0.433    0.367
              CBOW-300d   0.275  0.437    0.371
              CBOW-500d   0.274  0.435    0.370

1. With lower dimensionality, the similarity between term embeddings might be coarse and hurt the relevance matching performance. 2. With larger dimensionality, one may need more data to train reliable term embeddings.

Outline Introduction Problem Analysis Our Approach Experiments Conclusions

Major Contributions: We point out three major differences between semantic matching and relevance matching, which may lead to significantly different architecture designs for deep matching models. We propose a novel deep relevance matching model for ad-hoc retrieval that explicitly addresses the three key factors of relevance matching. We conduct rigorous comparisons with state-of-the-art retrieval models to demonstrate the effectiveness of our model.

Future Work: Leverage larger training data (e.g., click-through logs) to train a deeper DRMM and further explore the potential of the proposed model. Include phrase embeddings so that phrases are treated as wholes rather than as separate terms; local interactions can then better reflect meaning by using the proper semantic units of language.

Thanks! Email: guojiafeng@ict.ac.cn. Data and code can be found at http://www.bigdatalab.ac.cn/benchmark/