A Study of MatchPyramid Models on Ad hoc Retrieval


A Study of MatchPyramid Models on Ad hoc Retrieval
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng
Institute of Computing Technology, Chinese Academy of Sciences

Text Matching
Many text-based applications have been formalized as matching tasks: paraphrase identification, ad hoc retrieval, question answering, machine translation. Deep learning models have been applied to all of these, yet there is a lack of evaluation of deep matching models on standard ad hoc retrieval tasks.

A Representative Model: MatchPyramid
A state-of-the-art text matching model [AAAI 2016], originally evaluated on paraphrase identification and paper citation matching.

A Brief Introduction to MatchPyramid

Text Matching as Image Recognition

Matching Matrix
Captures the local interactions between query and document words, bridging the gap between text matching and image recognition. Entry M_ij measures the similarity between query word w_i and document word v_j:
- Indicator: M_ij = 1 if w_i = v_j, 0 otherwise
- Dot product: M_ij = w_i · v_j
- Cosine: M_ij = (w_i · v_j) / (||w_i|| ||v_j||)
- Gaussian kernel: M_ij = exp(-||w_i - v_j||^2)
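As an illustration, here is a minimal NumPy sketch (not the authors' code; the function name and the `sigma` bandwidth parameter are my own additions) of the four matching-matrix constructions:

```python
import numpy as np

def matching_matrix(q_emb, d_emb, mode="cosine", sigma=1.0):
    """Build an n x m matching matrix M from word embeddings.

    q_emb: (n, k) query word embeddings; d_emb: (m, k) document word embeddings.
    """
    if mode == "indicator":
        # Exact match: 1 iff the two embedding vectors are identical.
        return (q_emb[:, None, :] == d_emb[None, :, :]).all(axis=-1).astype(float)
    if mode == "dot":
        return q_emb @ d_emb.T
    if mode == "cosine":
        qn = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
        dn = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
        return qn @ dn.T
    if mode == "gaussian":
        # exp(-||w_i - v_j||^2 / (2 * sigma^2))
        dist2 = ((q_emb[:, None, :] - d_emb[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-dist2 / (2.0 * sigma ** 2))
    raise ValueError(f"unknown mode: {mode}")
```

The indicator matrix only carries exact-match signals, while the dot-product, cosine, and Gaussian variants also carry graded semantic-similarity signals between non-identical words.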

Hierarchical Convolution
A way to capture rich matching patterns (e.g., n-grams, n-terms).
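The idea can be sketched with a single 2D convolution over a matching matrix (plain NumPy, illustrative only, not the authors' implementation): a diagonal kernel responds most strongly where consecutive query words match consecutive document words, i.e., a matched bigram.

```python
import numpy as np

def conv2d_relu(M, kernel):
    """Valid 2D convolution (cross-correlation) of matching matrix M
    with a single kernel, followed by ReLU activation."""
    kh, kw = kernel.shape
    H, W = M.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (M[i:i + kh, j:j + kw] * kernel).sum()
    return np.maximum(out, 0.0)

# A diagonal 2x2 kernel fires where consecutive query words match
# consecutive document words (a matched bigram).
bigram_kernel = np.eye(2)
```

Stacking several such convolution and pooling layers is what makes the patterns "hierarchical": higher layers compose lower-level matching signals into larger phrase-level patterns.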

Matching Score and Training
Matching score generation: a Multi-Layer Perceptron (MLP) over the pooled feature maps produces the score s(q, d; Θ).
Training: pairwise hinge loss, L(q, d+, d-; Θ) = max(0, 1 - s(q, d+) + s(q, d-)), optimized with Adagrad [Duchi, Hazan, and Singer 2011].
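A minimal sketch of the scoring and training objective, assuming the standard pairwise hinge loss max(0, 1 - s(q, d+) + s(q, d-)); the parameter shapes and function names here are illustrative, not the paper's exact configuration:

```python
import numpy as np

def mlp_score(features, W1, b1, W2, b2):
    """Two-layer MLP mapping flattened pooled features to a scalar score."""
    h = np.maximum(features @ W1 + b1, 0.0)  # ReLU hidden layer
    return float(h @ W2 + b2)

def pairwise_hinge_loss(s_pos, s_neg):
    """L = max(0, 1 - s(q, d+) + s(q, d-)): zero once the relevant
    document outscores the irrelevant one by a margin of 1."""
    return max(0.0, 1.0 - s_pos + s_neg)
```

The loss is zero whenever the relevant document already beats the irrelevant one by the margin, so gradient updates concentrate on the pairs the model still ranks incorrectly.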

Experiments on Ad hoc Retrieval

Experimental Settings
TREC Robust04: the topics are collected from the TREC Robust Track 2004. We use the Galago search engine in this experiment; both queries and documents are whitespace-tokenized, lowercased, and stemmed during indexing and retrieval. Document length is truncated to 500.

Experimental Settings
Model settings: 1 convolutional layer and 1 pooling layer; 128 hidden nodes for the MLP; ReLU as the activation function.
Factors studied: impact of the similarity function, the kernel size, and the dynamic pooling size.

Impact of Similarity Function
Comparison of different similarity functions on paraphrase identification and on Robust04.
Finding: it is important for deep models to differentiate exact matching from semantic matching on the ad hoc retrieval task.

Impact of Kernel Size
- 1 x n kernel: captures richer matching signals at the query word level.
- n x n kernel: captures n-gram matching (word proximity).
Observations:
- Performance is not sensitive to kernel size on the indicator matching matrix, due to its sparsity.
- A proper kernel size on the Gaussian matching matrix can capture richer semantic information.
- Surprisingly, n-gram matching brings no improvement: the dataset is too small to learn the complex proximity patterns.

Impact of Pooling Size
The pooling layer is used to pick out the most important signals.
Observations:
1. Pooling size affects performance significantly.
2. The best dynamic pooling size is about 3 x 10:
- On the query side, we keep information at the query word level (the mean query length is 3).
- On the document side, we aggregate information from every 50 words, close to the average length of a paragraph.
It is better to aggregate signals for the lengthy, noisy documents, but not for the short queries.
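Dynamic pooling can be sketched as partitioning a variable-size matching matrix into a fixed grid and keeping the maximum in each cell (NumPy sketch with an illustrative function name, assuming the matrix is at least as large as the target grid):

```python
import numpy as np

def dynamic_max_pool(M, out_rows=3, out_cols=10):
    """Partition a variable-size matching matrix into an out_rows x out_cols
    grid and keep the strongest signal in each cell, yielding a fixed-size
    feature map regardless of query and document lengths."""
    H, W = M.shape
    row_edges = np.linspace(0, H, out_rows + 1).astype(int)
    col_edges = np.linspace(0, W, out_cols + 1).astype(int)
    out = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            block = M[row_edges[i]:row_edges[i + 1],
                      col_edges[j]:col_edges[j + 1]]
            out[i, j] = block.max()
    return out
```

With a 3 x 10 grid, a 3-word query against a 500-word document gives one row per query word and one column per roughly 50-word document span, matching the observation above.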

Comparison with Existing Models
- MatchPyramid outperforms existing deep matching models.
- Compared with representation-based models, MatchPyramid keeps all the low-level, detailed matching signals.
- Compared with interaction-based models, the Gaussian kernel is much more effective than the 1D convolution in ARC-II.
- However, even the best-performing deep matching models cannot beat the traditional retrieval models on Robust04.

Some Thoughts
Task: considering the success of these deep matching models on NLP tasks (e.g., paraphrase identification), the ad hoc retrieval task seems quite different from those text matching tasks in NLP: semantic matching vs. relevance matching (example: Q: peace process, D:).
- Homogeneous texts (sentences) vs. heterogeneous texts (keywords, documents)
- Comparable lengths vs. significantly different lengths (short queries, very long documents)
- Semantic matching signals vs. exact matching signals
See "A Deep Relevance Matching Model for Ad-hoc Retrieval", to appear in CIKM 2016.

Some Thoughts
Model: considering the success of traditional retrieval models, queries are usually not treated as equivalent to documents in traditional retrieval models: VSM (symmetric) vs. BM25/LM (non-symmetric). Most deep models, by contrast, treat the two input texts in a symmetric way: DSSM/CDSSM/ARC-I, ARC-II/MatchPyramid.

Some Thoughts
Data: considering the success of deep learning models in other research areas, the limited data size (i.e., number of queries) of ad hoc retrieval datasets prevents us from using very deep network architectures. Some larger datasets for comparison:
- MQ2007: about 1,700 labeled queries
- MQ2008: about 800 labeled queries
- Paraphrase identification: 5,800 sentence pairs (MSRP)
- ImageNet: 14,197,122 images, 21,841 synsets

Conclusions
- Tested MatchPyramid models on ad hoc retrieval tasks.
- Studied three factors in MatchPyramid models: similarity function, kernel size, and dynamic pooling size.
- The weak performance on ad hoc retrieval tasks can be attributed to task characteristics, model design, and data size.

Thanks Q&A