Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

1 Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
Xiaodong Liu (1,2), Jianfeng Gao (1), Xiaodong He (1), Li Deng (1), Kevin Duh (2), Ye-Yi Wang (1)
(1) Microsoft Research, USA
(2) Nara Institute of Science and Technology, Japan

2 Learning Vector-Space Representation
- Why
  - Significant accuracy gains in NLP tasks [Collobert+ 11]
  - More compact models that are easier to train and generalize better
- Existing learning methods are not optimal
  - Unsupervised objectives [Mikolov+ 11]: sub-optimal for the tasks of interest
  - Supervised objectives on a single task [Socher+ 13]: constrained by limited amounts of training data
- Our solution is inspired by multi-task learning [Caruana 97]

3 Multi-Task Deep Neural Nets for Representation Learning
- Leverage supervised data from many (related) tasks
  - Reduce overfitting to a specific task
  - Make the learned representations universal across tasks
- Combine tasks as disparate as
  - Semantic query classification, and
  - Semantic web search
- Large-scale experiments
  - Higher accuracies on multiple tasks
  - More compact models
  - Easy to adapt to new tasks/domains

4 The Query Classification Task
- Given a search query Q, e.g., "denver sushi downtown"
- Identify its domain C, e.g., Restaurant, Hotel, Nightlife, Flight
- Thus, a search engine can tailor the interface and results to provide a richer, personalized user experience

5 Problem Formulation
- For each domain C, build a binary classifier
  - Input: represent a query Q as a vector of features $x = [x_1, \dots, x_n]^T$
  - Output: $y = P(1 \mid Q, C)$
  - Q is labeled C if $P(1 \mid Q, C) > 0.5$
- Input feature vector, e.g., a bag-of-words vector
  - Regards words as atomic symbols: denver, sushi, downtown
  - Each word is represented as a one-hot vector: $[0, \dots, 0, 1, 0, \dots, 0]^T$
  - Bag-of-words vector = sum of one-hot vectors
  - Other (better) features: n-grams, phrases, (learned) topics, etc.
- How to construct optimal feature vectors for queries?
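The bag-of-words construction described on this slide can be illustrated with a minimal sketch. The vocabulary `VOCAB` and the helper names `one_hot` and `bag_of_words` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def one_hot(word, vocab):
    """Return a one-hot list for `word` over the ordered vocabulary `vocab`."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def bag_of_words(query, vocab):
    """Sum the one-hot vectors of the in-vocabulary words of a query."""
    bow = [0] * len(vocab)
    for word in query.split():
        if word in vocab:  # out-of-vocabulary words are dropped in this sketch
            for i, v in enumerate(one_hot(word, vocab)):
                bow[i] += v
    return bow


# Toy vocabulary (an assumption); a real system would use a far larger one.
VOCAB = ["denver", "downtown", "hotel", "sushi"]
x = bag_of_words("denver sushi downtown", VOCAB)
```

Because words are treated as atomic symbols, any word missing from the vocabulary contributes nothing to the vector, which is one reason the slide asks for better features.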

6 The Web Search Ranking Task
[Figure: queries (Q) matched against documents (D). Example queries: "cold home remedy", "cold remeedy", "flu treatment", "how to deal with stuffy nose"]

7 Semantic Matching between Q and D
- Fuzzy keyword matching
  - Q: cold home remedy
  - D: best home remedies for cold and flu
- Spelling correction
  - Q: cold remeedies
  - D: best home remedies for cold and flu
- Query alteration/expansion
  - Q: flu treatment
  - D: best home remedies for cold and flu
- Query/document semantic matching
  - Q: how to deal with stuffy nose
  - D: best home remedies for cold and flu
[Arrow label: R&D progress]

8 Problem Formulation
- Given a query Q and a list of candidate docs $D_i$, $i = 1, \dots, N$
- Rank the $D_i$ according to their relevance to Q
  - Represent Q and D as feature vectors, where the features are bag of words, phrases, (learned) topics, etc.
  - Relevance: cosine similarity of the feature vectors of Q and D
- How to construct optimal feature vectors for queries and docs?
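As a rough illustration of ranking by cosine similarity, the sketch below scores candidate documents against a query and sorts them. The function names `cosine` and `rank` and the toy vectors are assumptions made for this example only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return document indices sorted by decreasing similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy feature vectors (hypothetical values).
q = [1.0, 1.0, 0.0]
docs = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
order = rank(q, docs)  # most relevant document index first
```

The ranking is only as good as the feature vectors, which is the question the slide closes on.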

9 A DNN for Classification and DSSM for Ranking
- Classifier/Ranker that uses the hidden features as input
- Feature generation: project raw input features (bag of words) to hidden features (topics)
- Deep Structured Semantic Model (DSSM) [Huang+ 13]

10 The Proposed Multi-Task DNN Model

11 Shared Layers ($l_1$ and $l_2$)
- Word Hash Layer ($l_1$)
  - Controls the dimensionality of the input using letter 3-grams, e.g., cat → #cat# → #-c-a, c-a-t, a-t-#
  - Only ~50K letter trigrams in English; no OOV issue
  - OOV words can be represented by letter 3-grams
  - Spelling variations of the same word have similar representations
- Shared Semantic-Representation Layer ($l_2$)
  - Captures cross-task semantic characteristics for arbitrary text (Q or D)
  - $l_2 = \tanh(W_1 \cdot l_1)$
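The letter-3-gram word hashing described above can be sketched as follows. The helper names `letter_trigrams` and `word_hash` are hypothetical, and the sketch returns sparse counts instead of the fixed-size vector a real word hash layer would produce.

```python
from collections import Counter


def letter_trigrams(word):
    """Split a word into letter 3-grams after adding '#' boundary marks."""
    marked = "#" + word + "#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def word_hash(text):
    """Count the letter 3-grams of every word in a text (sparse l_1 sketch)."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(letter_trigrams(word))
    return counts


# Spelling variants share most of their trigrams, so their hashed
# representations stay close even when one of them is out of vocabulary.
shared = set(letter_trigrams("remedy")) & set(letter_trigrams("remeedy"))
```

Here `letter_trigrams("cat")` yields `#ca`, `cat`, `at#`, matching the slide's example, and the misspelling "remeedy" still shares five of its trigrams with "remedy".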

12 Task-Specific Representation ($l_3$)
- For each task, a nonlinear transformation maps $l_2$ into the task-specific representation via $l_3 = \tanh(W_2^t \cdot l_2)$, where t denotes the task
- Model compactness result
  - Compression from a 500K-dimensional input to the shared 300-dimensional semantic vector $l_2$
  - Multi-task DNN takes < 150KB in memory
  - SVM using word n-grams takes > 200MB
- Easy to add new domains, small memory footprint, fast runtime

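13 Task-Specific Output Layers (P)
- Query classification
  - $Q^{C_1} \equiv l_3 = \tanh(W_2^{t=C_1} \cdot l_2)$
  - $P(C_1 \mid Q) = \mathrm{sigmoid}(W_3^{t=C_1} \cdot Q^{C_1})$
- Web search ranking
  - Q and D are mapped into the task-specific representations $Q^{S_q}$ and $D^{S_d}$
  - The relevance score is computed by cosine similarity as $\cos(Q^{S_q}, D^{S_d}) = \frac{Q^{S_q} \cdot D^{S_d}}{\|Q^{S_q}\| \, \|D^{S_d}\|}$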
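To make the layer stack concrete, here is a minimal forward-pass sketch for the query classification branch: a shared tanh layer, a task-specific tanh layer, and a sigmoid output. The function names, the tiny dimensions, and all weight values are assumptions for illustration; the slides describe a 300-dimensional shared layer and learned weights.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def tanh_layer(W, x):
    """Apply a linear map followed by an element-wise tanh."""
    return [math.tanh(z) for z in matvec(W, x)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(l1, W1, W2_t, w3_t):
    """Return P(C | Q) for one task t, given the word-hash vector l1."""
    l2 = tanh_layer(W1, l1)    # shared semantic representation
    l3 = tanh_layer(W2_t, l2)  # task-specific representation
    return sigmoid(sum(w * h for w, h in zip(w3_t, l3)))


# Toy weights (hypothetical): 3-dim input, 2-dim shared and task layers.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W2_restaurant = [[1.0, -1.0], [0.5, 0.5]]
w3_restaurant = [0.7, -0.3]
p = classify([1.0, 0.0, 2.0], W1, W2_restaurant, w3_restaurant)
```

Only `W2_restaurant` and `w3_restaurant` belong to the task; `W1` would be reused unchanged by every other classification or ranking task, which is what keeps the per-task footprint small.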

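14 The Training Procedure: Mini-Batch SGD
[Figure: training algorithm. Recoverable annotations: the query classification objective is the cross-entropy loss; the web search objective is the pair-wise rank loss]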

15 Pair-Wise Rank Loss for Web Search
- Consider a query Q and two documents $D^+$ and $D^-$
  - Assume $D^+$ is more relevant than $D^-$ to Q
- $\mathrm{sim}_\theta(Q, D)$ is the cosine similarity of Q and D in semantic space, mapped by a neural network parameterized by $\theta$
- $\Delta = \mathrm{sim}_\theta(Q, D^+) - \mathrm{sim}_\theta(Q, D^-)$
  - We want to maximize $\Delta$
- $L(\Delta; \theta) = \log(1 + \exp(-\gamma \Delta))$
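The pair-wise rank loss can be sketched in a few lines, assuming the reading in which the loss decreases as the margin between the relevant and the less relevant document grows. The function name and the default value of `gamma` are assumptions; the slide does not give a value for the scaling factor.

```python
import math


def pairwise_rank_loss(sim_pos, sim_neg, gamma=1.0):
    """log(1 + exp(-gamma * delta)), where delta = sim(Q, D+) - sim(Q, D-)."""
    delta = sim_pos - sim_neg
    return math.log1p(math.exp(-gamma * delta))


# A correctly ordered pair gives a small loss, a mis-ordered pair a large one.
good = pairwise_rank_loss(0.9, 0.1, gamma=10.0)
bad = pairwise_rank_loss(0.1, 0.9, gamma=10.0)
```

Minimizing this loss pushes the similarity of the more relevant document above that of the less relevant one, which is the same as maximizing the margin.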

16 Experimental Evaluation Metrics
- AUC scores for query classification
- NDCG scores for web search ranking
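For readers unfamiliar with the two metrics, the sketch below gives one common formulation of each. The slides do not say which NDCG variant or cutoff was used, so the exponential gain `2**rel - 1` and the log2 discount are assumptions, as are the function names.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dcg(rels, k):
    """Discounted cumulative gain of the top-k relevance grades."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg(rels, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0
```

`auc` takes binary labels with classifier scores, while `ndcg` takes the graded relevance labels of the documents in the order the ranker returned them.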

17 Query Classification AUC Results
- MT-DNN > DNN: usefulness of the multi-task objective over a single-task objective
- DNN/MT-DNN > SVM-Letter with the same input $l_1$: importance of learning a semantic representation $l_2$
- DNN/MT-DNN > SVM-Word: power of deep learning

18 Web Search NDCG Results

19 Domain Adaptation on Query Classification
- To add a new task, how much training data must be labeled?
- Experiment design
  - Select one query classification task t; train MT-DNN on the remaining tasks to obtain a semantic representation ($l_2$)
  - Given a fixed $l_2$, train an SVM on the training data of t, using varying amounts of labels
  - Evaluate the AUC on the test data of t
- Compare 3 SVM classifiers trained using different feature vectors
  - Semantic representation ($l_2$)
  - Word n-grams, n = 1, 2, 3
  - Letter 3-grams

20 Domain Adaptation in Query Classification
- Using $l_2$ features, only small amounts of training labels are needed
- $l_2$ features are universally useful across domains/tasks

21 Conclusion
- Learning semantic representations using a multi-task DNN
  - Combines tasks as disparate as classification and ranking
  - Consistently outperforms strong baselines
  - Leads to a compact model
  - Facilitates domain adaptation using learned representations
- Are the learned representations really semantic?
  - What a DNN learns are hidden features that are useful for a particular task
  - Semantic representations are universal in that they are useful for multiple tasks
  - Multi-task DNN is a way to learn universal, semantic representations

22 Thanks! Q&A


Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University March 4, 2015 Today: Graphical models Bayes Nets: EM Mixture of Gaussian clustering Learning Bayes Net structure

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS Deep Neural Decision Forests Microsoft Research Cambridge UK, ICCV 2015 Decision Forests, Convolutional Networks and the Models in-between

More information

CSC 578 Neural Networks and Deep Learning

CSC 578 Neural Networks and Deep Learning CSC 578 Neural Networks and Deep Learning Fall 2018/19 7. Recurrent Neural Networks (Some figures adapted from NNDL book) 1 Recurrent Neural Networks 1. Recurrent Neural Networks (RNNs) 2. RNN Training

More information

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

More information

Learning to Rank with Attentive Media Attributes

Learning to Rank with Attentive Media Attributes Learning to Rank with Attentive Media Attributes Baldo Faieta Yang (Allie) Yang Adobe Adobe San Francisco, CA 94103 San Francisco, CA. 94103 bfaieta@adobe.com yangyan@adobe.com Abstract In the context

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Probabilistic Siamese Network for Learning Representations. Chen Liu

Probabilistic Siamese Network for Learning Representations. Chen Liu Probabilistic Siamese Network for Learning Representations by Chen Liu A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of Electrical

More information

Variable-Component Deep Neural Network for Robust Speech Recognition

Variable-Component Deep Neural Network for Robust Speech Recognition Variable-Component Deep Neural Network for Robust Speech Recognition Rui Zhao 1, Jinyu Li 2, and Yifan Gong 2 1 Microsoft Search Technology Center Asia, Beijing, China 2 Microsoft Corporation, One Microsoft

More information

Learning Dense Models of Query Similarity from User Click Logs

Learning Dense Models of Query Similarity from User Click Logs Learning Dense Models of Query Similarity from User Click Logs Fabio De Bona, Stefan Riezler*, Keith Hall, Massi Ciaramita, Amac Herdagdelen, Maria Holmqvist Google Research, Zürich *Dept. of Computational

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation

An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007

More information

Autoencoder. Representation learning (related to dictionary learning) Both the input and the output are x

Autoencoder. Representation learning (related to dictionary learning) Both the input and the output are x Deep Learning 4 Autoencoder, Attention (spatial transformer), Multi-modal learning, Neural Turing Machine, Memory Networks, Generative Adversarial Net Jian Li IIIS, Tsinghua Autoencoder Autoencoder Unsupervised

More information

A Neuro Probabilistic Language Model Bengio et. al. 2003

A Neuro Probabilistic Language Model Bengio et. al. 2003 A Neuro Probabilistic Language Model Bengio et. al. 2003 Class Discussion Notes Scribe: Olivia Winn February 1, 2016 Opening thoughts (or why this paper is interesting): Word embeddings currently have

More information

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale Department of Information

More information

What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing

What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing The 39th International Conference on Software Engineering What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing Authors: He Jiang 汇报人 1, Xiaochen Li : 1,

More information

Asynchronous Parallel Stochastic Gradient Descent. A Numeric Core for Scalable Distributed Machine Learning Algorithms

Asynchronous Parallel Stochastic Gradient Descent. A Numeric Core for Scalable Distributed Machine Learning Algorithms Asynchronous Parallel Stochastic Gradient Descent A Numeric Core for Scalable Distributed Machine Learning Algorithms J. Keuper and F.-J. Pfreundt Competence Center High Performance Computing Fraunhofer

More information

Weighted Suffix Tree Document Model for Web Documents Clustering

Weighted Suffix Tree Document Model for Web Documents Clustering ISBN 978-952-5726-09-1 (Print) Proceedings of the Second International Symposium on Networking and Network Security (ISNNS 10) Jinggangshan, P. R. China, 2-4, April. 2010, pp. 165-169 Weighted Suffix Tree

More information

Model compression as constrained optimization, with application to neural nets

Model compression as constrained optimization, with application to neural nets Model compression as constrained optimization, with application to neural nets Miguel Á. Carreira-Perpiñán and Yerlan Idelbayev Electrical Engineering and Computer Science University of California, Merced

More information

Natural Language Processing with Deep Learning CS224N/Ling284

Natural Language Processing with Deep Learning CS224N/Ling284 Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 13: Convolutional Neural Networks (for NLP) Christopher Manning and Richard Socher Overview of today Organization Mini tutorial on

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information