Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li


1. Introduction

The main tasks in many applications can be formalized as matching between heterogeneous objects; examples include search, recommendation, question answering, paraphrasing, and image retrieval. For instance, search can be viewed as a problem of matching a query against a document, and image retrieval can be viewed as a problem of matching a text query against an image.

A variety of machine learning techniques have been developed for such matching tasks. We refer to them collectively as "learning to match", which can be formalized as the following learning problem. Suppose that there are two spaces X and Y, with objects x ∈ X and y ∈ Y. There is a set of numbers R, and a number r ∈ R represents a matching degree. A class F of matching functions f(x, y) is defined, where the value of each function is a real number. Training data (x_1, y_1, r_1), (x_2, y_2, r_2), ..., (x_N, y_N, r_N) is given, where each instance consists of objects x_i and y_i as well as their matching degree r_i. The data is assumed to be generated according to the distributions (x, y) ~ P(X, Y) and r ~ P(R | X, Y). The goal of the learning task is to select a matching function f(x, y) from the class F on the basis of the training data. The matching function can then be used for classification, regression, or ranking of two matching objects. The learning task thus becomes the following optimization problem:

    arg min_{f ∈ F} Σ_{i=1}^{N} L(r_i, f(x_i, y_i)) + Ω(f),

where L(·, ·) denotes a loss function and Ω(f) denotes regularization.

Learning to match is unique in that it concerns learning a two-input function f(x, y), in contrast to the one-input functions of conventional classification and regression. There usually exist relations between the two inputs x and y, and we can and should leverage these relations to improve the accuracy of learning.

In fact, the inputs x and y can be instances (IDs), feature vectors, or structured objects, and thus the task can be carried out at the instance level, the feature level, or the structure level, as shown in Figure 1.
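As a concrete (hypothetical) illustration of the optimization problem above, one can instantiate the matching-function class F with bilinear functions f(x, y) = xᵀWy, a squared loss L, and an ℓ2 penalty for Ω. The variable names below are our own, not the paper's:

```python
import numpy as np

def bilinear_match(W, x, y):
    """Two-input matching function f(x, y) = x^T W y."""
    return x @ W @ y

def objective(W, data, lam=0.1):
    """Empirical loss sum_i L(r_i, f(x_i, y_i)) plus regularization Omega(f)."""
    loss = sum((r - bilinear_match(W, x, y)) ** 2 for x, y, r in data)
    return loss + lam * np.sum(W ** 2)  # squared loss + l2 penalty

# Toy training set: (x_i, y_i, r_i) triples with 3-d x and 2-d y.
rng = np.random.default_rng(0)
data = [(rng.normal(size=3), rng.normal(size=2), 1.0) for _ in range(5)]
W = np.zeros((3, 2))
print(objective(W, data))  # with W = 0, the loss is sum of r_i^2 = 5.0
```

Any gradient-based or coordinate-wise optimizer can then minimize this objective over W.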

Figure 1. Matching between heterogeneous objects can be performed at the instance, feature, and structure levels.

We believe it is necessary and important to conduct research on learning to match. By generalizing the matching techniques developed in different applications, we can gain a better understanding of the problems, develop more powerful machine learning methodologies, and apply them across all of the applications. At Noah's Ark Lab, we have invented learning-to-match techniques for document retrieval, recommendation, natural language processing, and image retrieval, as described below. The general view of learning to match has in fact helped us considerably in developing these technologies.
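To make the levels in Figure 1 concrete: at the instance level, x and y are bare IDs, and matching reduces to looking up learned latent vectors, as in matrix factorization; at the feature and structure levels, the vectors are instead computed from richer inputs. A minimal instance-level sketch (dimensions and names are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_queries, n_docs, k = 4, 6, 3

# Instance level: each query ID and each document ID gets its own latent vector.
U = rng.normal(size=(n_queries, k))   # latent vectors for query IDs
V = rng.normal(size=(n_docs, k))      # latent vectors for document IDs

def match_by_id(q_id, d_id):
    """Matching degree of two instances = dot product of their latent vectors."""
    return U[q_id] @ V[d_id]

# Rank all documents for query 0 by matching degree (higher = better match).
ranking = sorted(range(n_docs), key=lambda d: match_by_id(0, d), reverse=True)
print(ranking)
```

At the feature level, U[q_id] and V[d_id] would be replaced by mappings applied to feature vectors, which is exactly the setting of the methods in the following sections.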

2. Learning to Match for Document Retrieval

Document retrieval, or search, is a problem largely based on matching. Current search systems adopt a term-based approach, in which documents are retrieved on the basis of the query terms, and the relevance scores of the documents with respect to the query are calculated mainly from the matching degrees between the query and the documents at the term level. In such a framework, query-document mismatch often arises and is the most critical problem standing in the way of successful search. It occurs when the searcher and the author use different terms to represent the same concept, e.g., "NY" and "New York". The phenomenon is prevalent because of the flexibility and diversity of human languages. A fundamental solution to the problem is to perform "semantic matching", in which a query and a document can match if they are semantically similar. We can leverage learning-to-match techniques to train models that carry out semantic matching [Li & Xu, 2014].

Figure 2. Regularized Mapping to Latent Space

In [Wu et al., 2013a; Wu et al., 2013b], a new method for learning to match called regularized mapping to latent space (RMLS) is proposed. RMLS learns two linear mapping functions that map the two objects (query and document) into a latent space, and it takes the dot product of their images in the latent space as the matching degree between the two objects (cf. Figure 2). RMLS is an extension of Partial Least Squares (PLS). To enhance scalability, RMLS replaces the orthogonality constraint in PLS with ℓ1 and ℓ2 regularization. In this way, learning can be performed in parallel and thus becomes scalable and efficient. RMLS is applied to web search, where a large click-through dataset of associated queries and documents is used as training data. The results show that both PLS and RMLS significantly outperform the baseline methods, while RMLS substantially speeds up the learning process.
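A minimal sketch of the RMLS scoring scheme, under our own notational assumptions (L_x and L_y stand for the two learned linear maps; the regularized learning procedure itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
d_query, d_doc, k = 5, 7, 3  # query/document feature dims, latent dim

# Two linear maps into a shared k-dimensional latent space.
# (In PLS these would be orthogonal; RMLS drops that constraint in favor of
# l1/l2 regularization so that parameter updates can run in parallel.)
L_x = rng.normal(size=(k, d_query))
L_y = rng.normal(size=(k, d_doc))

def rmls_score(x, y):
    """Matching degree = dot product of the two images in latent space."""
    return (L_x @ x) @ (L_y @ y)

x = rng.normal(size=d_query)  # query feature vector (e.g., term weights)
y = rng.normal(size=d_doc)    # document feature vector
print(rmls_score(x, y))
```

Note that the score is bilinear in the two inputs: doubling the query vector doubles the score, which is what makes the per-coordinate, parallelizable updates possible.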
One shortcoming of RMLS is that it is hard to train a model for tail queries and tail documents, for which there is not enough click-through data. In [Wang et al., 2015], a new method is proposed to tackle this challenge by using not only click-through data but also semantic knowledge when learning the RMLS model. The semantic knowledge can consist of categories of queries and documents as well as synonyms of words. The objective function of the method is defined such that the semantic knowledge is incorporated through regularization. Two methods for conducting the optimization, using coordinate descent and gradient descent, are also proposed. Experimental results on two datasets demonstrate that the new model makes effective use of semantic knowledge and significantly improves the accuracy of RMLS, particularly for tail queries.

3. Learning to Match for Recommendation

Recommender systems are another area in which matching plays an important role. In the collaborative filtering setting, the essential problem is to recommend product items to users, which is equivalent to matching product items to users. The problem can be formalized as matrix factorization when only the IDs of users and items are provided. It can be extended to feature-based matrix factorization when features of users and items are also available.

Figure 3. Functional Matrix Factorization

In feature-based matrix factorization, the feature vectors in the feature space are mapped into a latent space. One key question is how to encode useful features to enhance the matching performance. Ideally, one would like to automatically construct good feature functions during learning. In [Chen et al., 2013], we propose formalizing the problem as general functional matrix factorization, whose model includes conventional matrix factorization models as special cases. Specifically, in our method feature functions are created automatically through search in a functional space during learning. We propose a gradient-boosting-based algorithm to efficiently solve the optimization problem, with the general framework outlined in Figure 3. The experimental results demonstrate that the proposed method significantly outperforms the baseline methods in recommendation.

Another important issue is how to scale learning to match up to extremely large recommendation problems. We propose a parallel algorithm to tackle this challenge in feature-based matrix factorization [Shang et al., 2014]. In general, it is very hard to parallelize the learning of feature-based matrix factorization, because it requires simultaneous access to all the features. In our work, we solve the problem by leveraging the properties of learning to match. The algorithm parallelizes and accelerates coordinate descent by (1) iteratively relaxing the objective function to facilitate parallel updates of the parameters, and (2) storing intermediate results to avoid repeated calculations over the features. Experimental results demonstrate that the proposed method is very effective on a wide range of matching problems, with significantly improved efficiency.

4. Learning to Match for Natural Language Processing

Many problems in natural language processing depend on matching two linguistic objects, for example two sentences. In paraphrasing, one needs to decide whether two sentences are synonymous expressions of each other; in question answering, one needs to determine whether one sentence is an answer to the other; in dialogue, one needs to judge whether two sentences form one round of conversation. That is to say, all of these tasks amount to matching two sentences, while the matching relation varies by task. Note that the two sentences have structures, so we need to consider how to match sentences at the structure level. Recently we have developed several models for matching sentences in natural language using deep learning [Lu & Li, 2013; Hu et al., 2014; Wang et al., 2015].
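This sentence-matching setting can be illustrated with a minimal two-tower scorer: each sentence is encoded independently into a fixed-length vector, and a small network scores the pair. The toy encoder below (a single 1-D convolution over word vectors followed by max pooling, with our own variable names) is a deliberate simplification for exposition, not any published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, k, width = 8, 4, 3  # word-vector dim, number of filters, window size

W_conv = rng.normal(size=(k, width * d_word)) * 0.1  # 1-d convolution filters
w_out = rng.normal(size=2 * k) * 0.1                 # final scoring layer

def encode(sentence):
    """1-d convolution over word windows, then max pooling -> fixed vector."""
    n = len(sentence)
    windows = [np.concatenate(sentence[i:i + width]) for i in range(n - width + 1)]
    feats = np.tanh(np.stack(windows) @ W_conv.T)  # shape (n - width + 1, k)
    return feats.max(axis=0)                       # max over window positions

def match_score(s1, s2):
    """Encode each sentence separately, then score the concatenated pair."""
    return float(w_out @ np.concatenate([encode(s1), encode(s2)]))

# Toy sentences: lists of random word vectors (stand-ins for word embeddings).
s1 = [rng.normal(size=d_word) for _ in range(5)]
s2 = [rng.normal(size=d_word) for _ in range(6)]
print(match_score(s1, s2))
```

In a real system, the parameters would be trained on labeled sentence pairs with a loss appropriate to the task (paraphrase, answer selection, or dialogue).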
The deep matching models are employed in paraphrasing, question answering, and single-turn dialogue.

Deep Match CNN [Hu et al., 2014] is a neural network model for matching a pair of sentences in a specific task. There are two architectures: Arc-I and Arc-II. In the former, each of the two sentences is fed into a one-dimensional convolutional neural network, yielding a representation of the sentence; the two representations are then given to a multi-layer network, which generates a matching score for the two sentences (Figure 4). The model is trained with a large number of sentence pairs from the task. In the latter architecture, the two sentences are jointly input into a two-dimensional convolutional neural network, which directly outputs a matching score. This architecture captures both the representations of the two sentences and the interactions between them. The major advantage of Deep Match CNN lies in its flexibility and robustness: given matching data for any task in any language, it can automatically learn a model, without the need to exploit linguistic knowledge.

Figure 4. Deep Match CNN Model (Arc-I)

Another matching model is Deep Match Tree, whose main characteristic is its use of linguistic knowledge (Figure 5). Given two sentences, a dependency parse tree is created for each sentence using a dependency parser. All the subtree pairs between the two parse trees are collected and fed into a multi-layer neural network, which outputs a final matching score. Each neuron of the first layer corresponds to one subtree matching pattern; all the subtree matching patterns are mined from a large corpus in advance. A first-layer neuron takes the value one if a subtree pair agrees with its subtree matching pattern. Intuitively, the more matched patterns there are, the higher the final score will be, i.e., the better the two sentences match.

Figure 5. Deep Match Tree Model

5. Learning to Match for Image Retrieval

Image and text are heterogeneous data. Image retrieval is the following problem: given a query in natural language, retrieve the most relevant images from a collection and return them in a ranked list. The key problem is, again, matching between heterogeneous data, and the challenge is that the two kinds of data are very different in nature. We have developed a model called the multimodal convolutional neural network (m-CNN) for matching image and text. As shown in Figure 6, m-CNN employs convolutional architectures to model image-text matching. More specifically, it exploits one CNN for encoding the image and one CNN for encoding the text, as well as the matching relations between image and text. Experimental results demonstrate that m-CNN can effectively capture the information necessary for image-text matching and significantly outperforms the state-of-the-art methods in bidirectional image and text retrieval.

Figure 6. Multimodal Convolutional Neural Network

6. Open Problems

There are still many open questions in research on learning to match. We list some of them here.

Unified model: Different models have been proposed for different tasks. One may ask whether there is a general model, for example a powerful deep neural network, that can be employed in all matching problems.

Incorporation of knowledge: How to incorporate human knowledge into the matching model in order to improve matching accuracy across tasks is still an open problem. It is not clear how to do so in an effective and theoretically sound way, beyond regularization.

Relations between objects: Sometimes relations between the objects exist in the form of a network. How to make effective use of this information in the matching model remains an important research topic, although some work has been done.

Online learning: In some cases, one may want to train the model in an online fashion, which requires new learning algorithms.

Theory: Existing methods for learning to match still lack theoretical support. For example, it is not clear whether some of the methods are statistically consistent, i.e., whether the learned model converges to the "true" model.

Scalability and efficiency: In practice, some matching problems are extremely large in scale. How to make the learning process more scalable and efficient needs further investigation.

References

Tianqi Chen, Hang Li, Qiang Yang, Yong Yu. General Functional Matrix Factorization Using Gradient Boosting. In Proceedings of the 30th International Conference on Machine Learning (ICML), JMLR W&CP 28(1):436-444, 2013.

Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen. Convolutional Neural Network Architectures for Matching Natural Language Sentences. In Advances in Neural Information Processing Systems 27 (NIPS), 2042-2050, 2014.

Hang Li, Jun Xu. Semantic Matching in Search. Foundations and Trends in Information Retrieval, Now Publishers, 2014.

Zhengdong Lu, Hang Li. A Deep Architecture for Matching Short Texts. In Advances in Neural Information Processing Systems 26 (NIPS), 1367-1375, 2013.

Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li. Multimodal Convolutional Neural Networks for Matching Image and Sentence. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2015, to appear.

Jingbo Shang, Tianqi Chen, Hang Li, Zhengdong Lu, Yong Yu. A Parallel and Efficient Algorithm for Learning to Match. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 971-976, 2014.

Shuxin Wang, Xin Jiang, Jun Xu, Hang Li, Bin Wang. Incorporating Semantic Knowledge into Latent Matching Model in Search. To appear, 2015.

Mingxuan Wang, Zhengdong Lu, Hang Li, Qun Liu. Syntax-based Deep Matching of Short Texts. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.

Wei Wu, Hang Li, Jun Xu. Learning Query and Document Similarities from Click-through Bipartite Graph with Metadata. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (WSDM), 687-696, 2013.

Wei Wu, Zhengdong Lu, Hang Li. Learning Bilinear Model for Matching Queries and Documents. Journal of Machine Learning Research (JMLR), 14:2519-2548, 2013.