Machine Learning for Natural Language Processing. Alice Oh January 17, 2018

1 Machine Learning for Natural Language Processing Alice Oh January 17, 2018

2 Overview: distributed representation; temporal neural networks (RNN, LSTM, GRU); sequence-to-sequence models (machine translation, response generation, question answering).

3 Vector Representation: Word Embedding (figures by Sungjoon Park). Each word is mapped to a dense, real-valued vector, and these vectors feed into downstream NLP tasks. Examples: word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014).

4 Vector Representation: Word Embedding (figures by Sungjoon Park). CBOW (Mikolov et al., 2013). The figure shows input, projection, and output layers: the context words w(t-2), w(t-1), w(t+1), w(t+2) are summed at the projection layer to predict the center word w(t). Objective: given a sequence of training words $w_1, \dots, w_T$, maximize $\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_t \mid w_{t+j})$, where $p(w_O \mid w_I)$ is a softmax, $p(w_O \mid w_I) = \frac{\exp(u_{w_O}^{\top} v_{w_I})}{\sum_{w=1}^{V} \exp(u_w^{\top} v_{w_I})}$, computed in practice with hierarchical softmax or negative sampling.

5 Vector Representation: Word Embedding (figures by Sungjoon Park). Skip-Gram (Mikolov et al., 2013). The figure shows input, projection, and output layers: the center word w(t) is used to predict each of its context words w(t-2), w(t-1), w(t+1), w(t+2). Objective: given a sequence of training words $w_1, \dots, w_T$, maximize $\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$, with the same softmax form $p(w_O \mid w_I) = \frac{\exp(u_{w_O}^{\top} v_{w_I})}{\sum_{w=1}^{V} \exp(u_w^{\top} v_{w_I})}$, computed by hierarchical softmax or negative sampling.
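A minimal NumPy sketch may help make these quantities concrete: the full-softmax probability $p(w_O \mid w_I)$ and the negative-sampling surrogate mentioned on the slides. All names, sizes, and the random initialization are illustrative assumptions, not the slides' code.

```python
import numpy as np

# Minimal sketch (not the authors' code): skip-gram probability p(w_O | w_I)
# with a full softmax, plus the negative-sampling approximation.
# U holds "output" vectors u_w, W holds "input" vectors v_w; sizes are toy values.

rng = np.random.default_rng(0)
V, d = 10_000, 300
W = rng.normal(scale=0.1, size=(V, d))   # input (center-word) vectors v_w
U = rng.normal(scale=0.1, size=(V, d))   # output (context-word) vectors u_w

def softmax_prob(center, context):
    """p(w_context | w_center) with the full softmax over the vocabulary."""
    scores = U @ W[center]                      # u_w^T v_center for every w
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[context]

def negative_sampling_loss(center, context, k=5):
    """Negative-sampling objective for one (center, context) pair."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    negatives = rng.integers(0, V, size=k)      # in practice: smoothed unigram noise
    pos = np.log(sigmoid(U[context] @ W[center]))
    neg = np.log(sigmoid(-U[negatives] @ W[center])).sum()
    return -(pos + neg)

print(softmax_prob(center=42, context=7), negative_sampling_loss(42, 7))
```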

6 Vector Representation: Word Embedding (figures by Sungjoon Park). GloVe (Pennington et al., 2014). Motivation: use global information (co-occurrence counts over the whole corpus) while learning word vectors. Objective: the dot product between two word vectors should equal the log of the words' probability of co-occurrence with given context words. The model is built from ratios of co-occurrence probabilities, $F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$, and the resulting weighted least-squares objective is $J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$.
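A small sketch of evaluating the GloVe objective $J$ on a toy co-occurrence matrix; the weighting function and all sizes follow the published GloVe formulation and are assumptions, not taken from the slides.

```python
import numpy as np

# Minimal sketch of the GloVe weighted least-squares objective (Pennington et al., 2014).
# X is a toy word-word co-occurrence matrix; W, W_tilde, b, b_tilde are the parameters.

rng = np.random.default_rng(0)
V, d = 50, 10
X = rng.poisson(1.0, size=(V, V)).astype(float)          # toy co-occurrence counts
W, W_tilde = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_tilde = np.zeros(V), np.zeros(V)

def weight(x, x_max=100.0, alpha=0.75):
    """f(x): down-weights rare and caps very frequent co-occurrences."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_tilde, b, b_tilde, X):
    """J = sum_ij f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2 over nonzero X_ij."""
    i, j = np.nonzero(X)
    residual = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j] - np.log(X[i, j])
    return (weight(X[i, j]) * residual ** 2).sum()

print(glove_loss(W, W_tilde, b, b_tilde, X))
```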

7 Vector Representation: Word Embedding (figures by Sungjoon Park). What do the individual dimensions of a word vector mean? For a standard embedding, it is not clear what they mean.

8 Vector Representation: Word Embedding (figures by Sungjoon Park). Interpretable vector representation: dimensions that correspond to concepts such as company, fruits, nouns, sports, and numbers, so that a word like "Apple" loads on the company and fruit dimensions. One way to get there is sparsity (Faruqui et al., 2015). Motivations: understanding the semantic and syntactic compositionality of words, increasing storage efficiency, and reducing the complexity of higher-level models.

9 Vector Representation: Word Embedding (figures by Sungjoon Park). An alternative way to obtain interpretable dimensions (company, fruits, nouns, sports, numbers): rotate the dimensions of an already-trained embedding, as a post-processing method.

10 Vector Representation: Word Embedding (figures by Sungjoon Park). [Scatter plot of word vectors on two dimensions: words such as Superintendent, Chairman, President, Commissioner, Minister and Russia, China, Italy, Germany, France are spread across both axes before rotation.]

11 Vector Representation: Word Embedding (figures by Sungjoon Park). [The same words after rotating the dimensions: the job-title group and the country group each align more closely with a single axis.]

12 Vector Representation: Word Embedding (figures by Sungjoon Park). To compute the rotated representation $\Lambda$: $\Lambda = A T$, where the rotation matrix $T$ satisfies $T^{\top} T = I$ (orthogonal) or $\mathrm{diag}\!\left(T^{-1} (T^{-1})^{\top}\right) = I$ (oblique). Minimize $f(\Lambda) = (1 - \kappa) \sum_{i=1}^{p} \sum_{j=1}^{m} \sum_{l \ne j} \lambda_{ij}^2 \lambda_{il}^2$ (row complexity) $+\ \kappa \sum_{j=1}^{m} \sum_{i=1}^{p} \sum_{l \ne i} \lambda_{ij}^2 \lambda_{lj}^2$ (column complexity), where $\kappa$ is a weighting parameter: Quartimax $\kappa = 0$, Varimax $\kappa = 1/p$, Parsimax $\kappa = (m-1)/(p+m-2)$, Factor Parsimony $\kappa = 1$.
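The criterion above can be written compactly; below is a sketch of evaluating it for the four named settings of $\kappa$. The actual rotation in the slides is found with the Gradient Projection algorithm (Jennrich, 2001), which is not reimplemented here, and the loading matrix below is a random placeholder.

```python
import numpy as np

# Sketch of the rotation criterion on a loading matrix Lambda (p x m):
# f(Lambda) = (1 - kappa) * row complexity + kappa * column complexity.

def rotation_criterion(L, kappa):
    L2 = L ** 2                                                   # lambda_ij^2
    row_complexity = ((L2.sum(axis=1) ** 2) - (L2 ** 2).sum(axis=1)).sum()
    col_complexity = ((L2.sum(axis=0) ** 2) - (L2 ** 2).sum(axis=0)).sum()
    return (1 - kappa) * row_complexity + kappa * col_complexity

p, m = 1000, 300                                   # toy: vocabulary rows x dimensions
L = np.random.default_rng(0).normal(size=(p, m))   # placeholder "word loading" matrix
for name, kappa in [("Quartimax", 0.0), ("Varimax", 1 / p),
                    ("Parsimax", (m - 1) / (p + m - 2)), ("Factor Parsimony", 1.0)]:
    print(name, rotation_criterion(L, kappa))
```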

13 Vector Representation: Word Embedding (figures by Sungjoon Park). Experimental settings. Training corpus: English Wikipedia (5.3M articles, 83M sentences, 1,676M tokens). Base embeddings: 2 vector representations, word2vec and GloVe, each with 300 dimensions. Rotated embeddings: 16 rotated representations, one for each kappa (4) x embedding (2) x constraint (2). Implementation: the Gradient Projection algorithm (Jennrich, 2001), with a TensorFlow implementation released on GitHub.

14 Vector Representation: Word Embedding (figures by Sungjoon Park). Task: word intrusion, to measure the semantic coherence of the words associated with each dimension. Example item: {daughter, wife, sister, mother, son, bigram}, where "bigram" is the intruder. Choosing an intruder for a dimension: take the top k = 5 words of that dimension (here daughter, wife, sister, mother, son) and add a word that ranks below the top half of that dimension but is in the top 10% of some other dimension (here "bigram").
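A rough sketch, following the selection rule stated on the slide, of how one intrusion item could be built for a dimension: the top-k words plus an intruder that sits in the bottom half of that dimension but in the top 10% of another. The embedding matrix and vocabulary are toy placeholders, and the authors' exact sampling procedure may differ.

```python
import numpy as np

def intrusion_item(E, vocab, dim, k=5):
    """Top-k words of `dim` plus an intruder that ranks in the bottom half of
    `dim` but in the top 10% of some other dimension."""
    order = np.argsort(-E[:, dim])                 # word indices, best first
    top_words = [vocab[i] for i in order[:k]]
    bottom_half = set(order[len(vocab) // 2:])
    n_top = max(1, len(vocab) // 10)
    for other in range(E.shape[1]):                # scan the other dimensions
        if other == dim:
            continue
        for cand in np.argsort(-E[:, other])[:n_top]:
            if cand in bottom_half:
                return top_words, vocab[cand]
    return top_words, None                         # no valid intruder found

E = np.random.default_rng(0).normal(size=(200, 10))   # toy embedding: 200 words x 10 dims
vocab = [f"word{i}" for i in range(200)]
print(intrusion_item(E, vocab, dim=0))
```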

15 Vector Representation: Word Embedding (figures by Sungjoon Park). Measure: distance ratio. $\mathrm{DistRatio} = \frac{1}{d} \sum_{p=1}^{d} \frac{\mathrm{InterDist}_p}{\mathrm{IntraDist}_p}$, where for each dimension $p$, $\mathrm{IntraDist}_p = \sum_{w_i \in \mathrm{top}_k(p)} \sum_{w_j \in \mathrm{top}_k(p),\, j \ne i} \frac{\mathrm{dist}(w_i, w_j)}{k(k-1)}$ is the average pairwise distance among the top-k words, and $\mathrm{InterDist}_p = \sum_{w_i \in \mathrm{top}_k(p)} \frac{\mathrm{dist}(w_i, w_{\mathrm{intruder}(p)})}{k}$ is the average distance from the top-k words to the intruder. Example (figure): the five top words cluster together while the intruder lies far away, so a higher distance ratio indicates a more coherent dimension.
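A matching sketch of the distance ratio for a single dimension, using Euclidean distance as an assumed choice of dist; the vectors are toy data.

```python
import numpy as np

# Distance ratio for one dimension: average distance from the top-k words to the
# intruder (inter) divided by the average pairwise distance among the top-k words (intra).

def dist(a, b):
    return np.linalg.norm(a - b)

def distance_ratio_one_dim(top_vecs, intruder_vec):
    k = len(top_vecs)
    intra = sum(dist(top_vecs[i], top_vecs[j])
                for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    inter = sum(dist(v, intruder_vec) for v in top_vecs) / k
    return inter / intra

rng = np.random.default_rng(0)
top = list(rng.normal(size=(5, 300)))        # top-5 words of one dimension (toy)
intruder = rng.normal(loc=3.0, size=300)     # intruder, far from the cluster (toy)
print(distance_ratio_one_dim(top, intruder)) # > 1 suggests a coherent dimension
```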

16 Vector Representation: Word Embedding (figures by Sungjoon Park). Results. [Table: distance ratio for skip-gram (SG) and GloVe vectors, comparing the original embeddings, SOV, SOV (non-negative), and the Quartimax, Varimax, Parsimax, and Factor Parsimony rotations under orthogonal and oblique constraints; numeric values not preserved.] Qualitative examples of the rotated dimensions were also shown.

17 Temporal Models: Recurrent Neural Network. There are many types of data that are sequential: language is a sequence of words, DNA is a sequence of nucleotides, and the stock market is a sequence of gains and losses. A recurrent neural network is a flexible way to represent sequential data of arbitrary length.
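As a concrete reference for the recurrent idea, here is a minimal vanilla (Elman) RNN step in NumPy: the same parameters are applied at every position, which is what lets the model consume sequences of arbitrary length. Sizes and initialization are illustrative.

```python
import numpy as np

# Minimal vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).

d_in, d_h = 300, 128
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):   # a toy sequence of 10 word vectors
    h = rnn_step(h, x_t)                  # the same parameters at every step
print(h.shape)
```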

18-21 Temporal Models: Recurrent Neural Network (figures by Chris Dyer; diagram-only slides)

22-25 Temporal Models: RNN Language Model (figures by Chris Dyer; diagram-only slides)
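The RNN language model slides above are figure-only, so here is a rough sketch of the standard formulation (not necessarily the exact model drawn in the figures): each step feeds the previous word's embedding into the recurrent cell, and a softmax over the vocabulary gives the probability of the next word.

```python
import numpy as np

# Sketch of an RNN language model step and sentence log-likelihood. Toy sizes.

rng = np.random.default_rng(0)
V, d_e, d_h = 10_000, 100, 200
E = rng.normal(scale=0.1, size=(V, d_e))        # word embeddings
W_xh = rng.normal(scale=0.1, size=(d_h, d_e))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
W_hy = rng.normal(scale=0.1, size=(V, d_h))

def lm_step(h_prev, word_id):
    h = np.tanh(W_xh @ E[word_id] + W_hh @ h_prev)
    logits = W_hy @ h
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    return h, probs                              # p(next word | history)

h, log_prob = np.zeros(d_h), 0.0
sentence = [12, 845, 3, 77]                      # toy word ids
for prev, nxt in zip(sentence[:-1], sentence[1:]):
    h, probs = lm_step(h, prev)
    log_prob += np.log(probs[nxt])               # log-likelihood of the sentence
print(log_prob)
```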

26-29 Temporal Models: Long Short-Term Memory (figures by Chris Dyer; diagram-only slides)
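The LSTM slides are likewise figure-only; the following is a hedged sketch of the standard LSTM cell equations, with input, forget, and output gates controlling a separate cell state. Parameter names and sizes are illustrative.

```python
import numpy as np

# Standard LSTM cell step (illustrative sketch, not the slides' figures).

d_in, d_h = 300, 128
rng = np.random.default_rng(0)
def mat(): return rng.normal(scale=0.1, size=(d_h, d_in + d_h))
W_i, W_f, W_o, W_c = mat(), mat(), mat(), mat()
b_i = b_f = b_o = b_c = np.zeros(d_h)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W_i @ z + b_i)            # input gate
    f = sigmoid(W_f @ z + b_f)            # forget gate
    o = sigmoid(W_o @ z + b_o)            # output gate
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state
    c = f * c_prev + i * c_tilde          # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):
    h, c = lstm_step(h, c, x_t)
print(h.shape, c.shape)
```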

30-31 Temporal Models: Gated Recurrent Unit (figures by Chris Dyer; diagram-only slides)
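Similarly, a sketch of the standard GRU cell (Cho et al., 2014): an update gate and a reset gate, with no separate cell state. Names and sizes are again illustrative assumptions.

```python
import numpy as np

# Standard GRU cell step (illustrative sketch).

d_in, d_h = 300, 128
rng = np.random.default_rng(0)
def mat(): return rng.normal(scale=0.1, size=(d_h, d_in + d_h))
W_z, W_r, W_h = mat(), mat(), mat()
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t):
    z = sigmoid(W_z @ np.concatenate([x_t, h_prev]))             # update gate
    r = sigmoid(W_r @ np.concatenate([x_t, h_prev]))             # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate state
    return (1 - z) * h_prev + z * h_tilde                        # interpolate old/new

h = np.zeros(d_h)
for x_t in rng.normal(size=(10, d_in)):
    h = gru_step(h, x_t)
print(h.shape)
```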

32-44 Sequence to Sequence: Machine Translation (figures by Kyunghyun Cho; diagram-only slides)

45 Sequence to Sequence: Response Generation (slide from Bill MacCartney)

46 Sequence to Sequence: Response Generation (figures by Jiwei Li; references: Sutskever et al., 2014; Jean et al., 2014; Luong et al., 2015). Source: the input message; target: the response. The encoder reads the source ("how are you? eos") and the decoder then generates the target one token at a time ("I'm fine. EOS").
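A minimal sketch of the encode/decode loop just described: an encoder RNN compresses the source message into a vector, and a decoder RNN emits the response token by token until EOS. This is an illustrative greedy decoder with untrained toy weights, not the models evaluated on the following slides.

```python
import numpy as np

# Sketch of a seq2seq response generator: encoder RNN -> summary vector -> decoder RNN.

rng = np.random.default_rng(0)
V, d_e, d_h = 5000, 64, 128                            # toy vocabulary / embedding / hidden sizes
E = rng.normal(scale=0.1, size=(V, d_e))               # shared token embeddings
enc_Wx, enc_Wh = rng.normal(scale=0.1, size=(d_h, d_e)), rng.normal(scale=0.1, size=(d_h, d_h))
dec_Wx, dec_Wh = rng.normal(scale=0.1, size=(d_h, d_e)), rng.normal(scale=0.1, size=(d_h, d_h))
W_out = rng.normal(scale=0.1, size=(V, d_h))           # hidden state -> vocabulary logits
EOS = 0

def encode(src_ids):
    h = np.zeros(d_h)
    for t in src_ids:                                  # e.g. ids for "how are you ? eos"
        h = np.tanh(enc_Wx @ E[t] + enc_Wh @ h)
    return h                                           # summary of the source message

def decode(h, max_len=20):
    out, t = [], EOS                                   # start decoding from the EOS/start token
    for _ in range(max_len):
        h = np.tanh(dec_Wx @ E[t] + dec_Wh @ h)
        t = int(np.argmax(W_out @ h))                  # greedy choice of the next token
        if t == EOS:
            break
        out.append(t)
    return out                                         # ids of the generated response

print(decode(encode([17, 42, 99, 3, EOS])))            # untrained weights: arbitrary ids
```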

47 Sequence to Sequence: Response Generation (figures by Jiwei Li). Input: "How old are you?" Response: "I don't know."

48 Sequence to Sequence: Response Generation (figures by Jiwei Li). Input: "How is life?" Response: "I don't know what you are talking about."

49 Sequence to Sequence: Response Generation (figures by Jiwei Li). Input: "Do you love me?" Response: "I don't know what you are talking about." Such generic responses account for about 30% of all generated responses.

50 Sequence to Sequence: Response Generation (figures by Jiwei Li). Whatever one asks, the model tends to answer "I don't know."

51 Sequence to Sequence: Response Generation (figures by Jiwei Li; diagram-only slide)

52 Sequence to Sequence: Response Generation (figures by Jiwei Li). Bayes' rule applied to the standard seq2seq model (equations shown in the figure).

53 Sequence to Sequence: Response Generation (figures by Jiwei Li). Bayes' rule (continued in the figure).

54 Sequence to Sequence: Response Generation (figures by Jiwei Li). BLEU on the Twitter dataset (results figure).

55 Sequence to Sequence: Response Generation (figures by Jiwei Li). Number of distinct tokens in generated targets (divided by the total number of tokens) on the OpenSubtitles dataset: relative improvements of +385% and +122%.

56 Sequence to Sequence: Response Generation (figures by Jiwei Li). Example responses: standard seq2seq p(T | S) vs. mutual information.
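For reference, the contrast drawn on this slide matches the maximum mutual information (MMI) criterion associated with these figures (Li et al.); a hedged summary, where $\lambda$ is a tunable weight not given on the slides:

$$\hat{T}_{\text{seq2seq}} = \arg\max_{T} \log p(T \mid S), \qquad \hat{T}_{\text{MMI}} = \arg\max_{T} \big[ \log p(T \mid S) - \lambda \log p(T) \big].$$

By Bayes' rule, with $\lambda = 1$ the MMI score equals $\log \frac{p(S, T)}{p(S)\,p(T)}$, the pointwise mutual information between source and target; the $-\lambda \log p(T)$ term penalizes generic, high-frequency responses such as "I don't know."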

57 Attention Mechanism: Sentiment Classification

58 Memory Networks: Question Answering. bAbI tasks (Weston et al., ICLR 2016). Example story: "John dropped the milk. John took the milk there. Sandra went to the bathroom. John moved to the hallway. Mary went to the bedroom." Question: "Where is the milk?" Answer: "Hallway." (Task 3: two supporting facts.)
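A minimal single-hop sketch in the spirit of the end-to-end memory network whose figures follow (Sukhbaatar et al.): sentences are embedded as memories, the question as a query, and softmax attention over the memories produces a summary used to predict the answer. All sizes and the bag-of-words embedding are illustrative assumptions.

```python
import numpy as np

# Single-hop memory network sketch: attention over story sentences, then answer prediction.

rng = np.random.default_rng(0)
V, d = 100, 20                           # toy vocabulary and embedding size
A = rng.normal(scale=0.1, size=(V, d))   # memory embedding matrix
C = rng.normal(scale=0.1, size=(V, d))   # output embedding matrix
B = rng.normal(scale=0.1, size=(V, d))   # question embedding matrix
W = rng.normal(scale=0.1, size=(V, d))   # answer prediction matrix

def embed(ids, M):
    return M[ids].sum(axis=0)            # bag-of-words sentence embedding

def memn2n_answer(story, question):
    m = np.stack([embed(s, A) for s in story])   # memory vectors m_i
    c = np.stack([embed(s, C) for s in story])   # output vectors c_i
    u = embed(question, B)                       # query vector
    p = np.exp(m @ u); p /= p.sum()              # attention over memories
    o = p @ c                                    # attended summary
    return int(np.argmax(W @ (o + u)))           # predicted answer word id

story = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # token ids of the story sentences
print(memn2n_answer(story, question=[10, 11]))   # untrained: arbitrary answer
```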

59-63 Memory Networks: Question Answering (figures by Sukhbaatar; diagram-only slides)

64 Summary: distributed representation; temporal neural networks (RNN, LSTM, GRU); sequence-to-sequence models (machine translation, response generation, question answering).
