Advanced RNN (GRU and LSTM) for Machine Translation. Dr. Kira Radinsky, CTO SalesPredict, Visiting Professor/Scientist, Technion


1 Advanced RNN (GRU and LSTM) for Machine Translation. Dr. Kira Radinsky, CTO SalesPredict, Visiting Professor/Scientist, Technion. Slides were adapted from lectures by Richard Socher.

2 Overview Machine translation. RNN models tackling MT: Gated Recurrent Units by Cho et al. (2014), Long Short-Term Memories by Hochreiter and Schmidhuber (1997).

3 Machine Translation Methods are statistical. Use parallel corpora (e.g., European Parliament proceedings). First parallel corpus: the Rosetta Stone. Traditional systems are very complex.

4 Current statistical machine translation systems Source language f, e.g. French. Target language e, e.g. English. Probabilistic formulation (using Bayes' rule): a translation model p(f|e) trained on the parallel corpus, and a language model p(e) trained on an English-only corpus (lots of it, and free!). Pipeline: French → Translation Model p(f|e) → pieces of English → Language Model p(e) → Decoder argmax p(f|e)p(e) → proper English.
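Written out, this is the standard noisy-channel decomposition the slide is pointing at (the argmax notation below is a reconstruction, not copied from the slide):

```latex
\hat{e} = \arg\max_{e} \, p(e \mid f)
        = \arg\max_{e} \, \frac{p(f \mid e)\, p(e)}{p(f)}
        = \arg\max_{e} \, p(f \mid e)\, p(e)
```

The denominator p(f) is dropped because it does not depend on the candidate translation e.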

5 Phrase-based decoder

6 Step 1: Alignment Goal: know which words or phrases in the source language translate to which words or phrases in the target language. Already hard! Example: "Japan shaken by two new quakes" / "Le Japon secoué par deux nouveaux séismes", where one of the French words is spurious (it has no English counterpart). Alignment examples from Chris Manning/CS224n.

7 Step 1: Alignment Zero-fertility words are not translated, and some words need one-to-many alignments. Example: "And the program has been implemented" / "Le programme a été mis en application" ("And" is a zero-fertility word; "implemented" aligns one-to-many to "mis en application").

8 Step 1: Alignment Really hard :/ Some sentences need many-to-one alignments. Example: "The balance was the territory of the aboriginal people" / "Le reste appartenait aux autochtones".

9 Step 1: Alignment Many-to-many (phrase) alignment. Example: "The poor don't have any money" / "Les pauvres sont démunis" ("don't have any money" aligns as a phrase to "sont démunis").

10 Step 1: Alignment We could spend an entire lecture on alignment models. Not only single words but also phrases and syntax can be aligned. Then consider reordering of the translated phrases. Example: "er geht ja nicht nach hause" / "he does not go home". Example from Philipp Koehn.

11 Phrase-Based Statistical MT: The Pharaoh/Moses Model Foreign input is segmented into phrases (a "phrase" is any subsequence of words, not a linguistic phrase). Each phrase is probabilistically translated into English, e.g. P(to the conference | zur Konferenz), P(into the meeting | zur Konferenz). Phrases are probabilistically re-ordered. (See J&M or Lopez 2008 for an intro.) This is still pretty much the state of the art!

12 After many steps Each phrase in the source language has many possible translations, resulting in a large search space. [The slide shows the table of translation options for "er geht ja nicht nach hause": each German word or phrase has many candidate English translations, e.g. he / it / goes / go / is / yes / of course / not / does not / do not / is not / after / to / according to / house / home / chamber / at home.]

13 Decode: Search for best of many hypotheses A hard search problem that also includes the language model. [The slide shows the search graph over partial hypotheses for "er geht ja nicht nach hause", e.g. "he", "does not", "go home", building toward "he does not go home".]

14 Traditional MT Skipped hundreds of important details. A lot of human feature engineering. Very complex systems. Many different, independent machine learning problems.

15 Deep learning to the rescue!? Maybe we could translate directly with an RNN? Encoder-Decoder: the encoder reads the source sentence (inputs x_1, x_2, x_3 = "Echt dicke Kiste" into hidden states h_1, h_2, h_3 with recurrent weights W), and the decoder generates the target words (y_1 = "Awesome", y_2 = "sauce"). The last encoder hidden state needs to capture the entire phrase!

16 MT with RNNs Simplest Model Encoder: the hidden state is updated from the previous hidden state and the current source word. Decoder: the hidden state is updated from the previous hidden state alone, and each target word is predicted from it (equations reconstructed below). Minimize the cross-entropy error for all target words conditioned on the source words. It's not quite that simple ;)
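The equations on this slide are images in the original; the following is a reconstruction of the standard simplest encoder-decoder formulation in the notation of Socher's lecture notes, so treat the exact matrix names as assumed:

```latex
\begin{align*}
\text{Encoder:}\quad & h_t = \phi(h_{t-1}, x_t) = f\!\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right) \\
\text{Decoder:}\quad & h_t = \phi(h_{t-1}) = f\!\left(W^{(hh)} h_{t-1}\right), \qquad
  y_t = \operatorname{softmax}\!\left(W^{(S)} h_t\right) \\
\text{Objective:}\quad & \max_{\theta} \; \frac{1}{N}\sum_{n=1}^{N} \log p_{\theta}\!\left(y^{(n)} \mid x^{(n)}\right)
\end{align*}
```

The decoder is initialized with the last encoder hidden state, which is why that single vector has to capture the entire source phrase.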

17 RNN Translation Model Extensions 1. Train different RNN weights for encoding and decoding (same picture as before: x_1, x_2, x_3 = "Echt dicke Kiste" encoded into h_1, h_2, h_3; decoder emits y_1 = "Awesome", y_2 = "sauce"). This means the φ() functions in the encoder and decoder would have different W^(hh) matrices.

18 RNN Translation Model Extensions Notation: each input of φ has its own linear transformation matrix. 2. Compute every hidden state in the decoder from: the previous hidden state (standard), the last hidden vector of the encoder c = h_T, and the previous predicted output word y_{t-1}. This gives a language model with three inputs to each decoder neuron: (h_{t-1}, c, y_{t-1}). Cho et al. 2014.
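Written out, the decoder update with these three inputs looks roughly as follows (a sketch of the formulation in Cho et al. 2014; the matrix names are assumed notation):

```latex
h_t = \phi\!\left(h_{t-1},\, c,\, y_{t-1}\right)
    = f\!\left(W^{(hh)} h_{t-1} + W^{(hc)} c + W^{(hy)} y_{t-1}\right),
\qquad c = h_T^{\text{enc}}
```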

19 Different picture, same idea Kyunghyun Cho et al. 2014

20 RNN Translation Model Extensions 3. Train stacked/deep RNNs with multiple layers (hidden layers h^(1), h^(2), h^(3) above the input x). 4. Potentially train a bidirectional encoder. 5. Train on the input sequence in reverse order for a simpler optimization problem: instead of A B C → X Y, train with C B A → X Y.

21 6. Main Improvement: Better Units More complex hidden unit computation in the recurrence! Gated Recurrent Units (GRU) introduced by Cho et al. 2014 (see reading list). Main ideas: keep around memories to capture long-distance dependencies, and allow error messages to flow at different strengths depending on the inputs.

22 GRUs A standard RNN computes the hidden layer at the next time step directly from the input and the previous hidden state. A GRU first computes an update gate (another layer) based on the current input word vector and the hidden state, and computes a reset gate similarly but with different weights.

23 GRUs Update gate. Reset gate. New memory content. Final memory at the time step, which combines the current and previous time steps. (The equations are images on the original slide; they are reconstructed below.)
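A reconstruction of the standard GRU equations these labels refer to (Cho et al. 2014), using the convention from slide 26 where z_t close to 1 copies the previous state; the exact matrix names are assumed notation:

```latex
\begin{align*}
z_t &= \sigma\!\left(W^{(z)} x_t + U^{(z)} h_{t-1}\right) && \text{update gate} \\
r_t &= \sigma\!\left(W^{(r)} x_t + U^{(r)} h_{t-1}\right) && \text{reset gate} \\
\tilde{h}_t &= \tanh\!\left(W x_t + r_t \circ U h_{t-1}\right) && \text{new memory content} \\
h_t &= z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t && \text{final memory}
\end{align*}
```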

24 GRUs Intuitively, the update gate defines how much of the previous memory to keep around.

25 GRUs Intuitively, the reset gate determines how to combine the new input with the previous memory. If we set the reset gate to all 1's and the update gate to all 0's, we again arrive at our plain RNN model.
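Plugging r_t = 1 and z_t = 0 into the reconstructed equations above makes this concrete:

```latex
h_t = \tilde{h}_t = \tanh\!\left(W x_t + U h_{t-1}\right)
```

which is exactly the vanilla RNN update.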

26 GRUs Update gate, reset gate, new memory content, final memory (same equations as above). If a reset gate unit is ~0, the new memory content ignores the previous memory and only stores the new word information: if the i-th element of r_t is 0, only the current word is taken into account for that unit. The final memory at each time step combines the current and previous time steps: if the i-th element of z_t is 1, we copy the previous state and ignore the current one (including the current word); otherwise we take the new memory content, which mixes the current word with its connection to previous words as determined by the reset gate.

27 Attempt at a clean illustration [Diagram showing, bottom to top, the quantities at time steps t-1 and t: input x_t, reset gate r_t, update gate z_t (has to be a sigmoid to illustrate the on/off switch better), new memory ~h_t (with reset applied), and final memory h_t.]

28 GRU intuition If the reset gate is close to 0, ignore the previous hidden state → allows the model to drop information that is irrelevant in the future. The update gate z controls how much of the past state should matter now. If z is close to 1, then we can copy information in that unit through many time steps: less vanishing gradient! Units with short-term dependencies often have very active reset gates.

29 GRU intuition Units with long-term dependencies have active update gates z. [Illustration of the GRU unit: x feeds r and ~h, which together with z produce h.] Derivatives? The rest is the same chain rule as before, but implement it with modularization or automatic differentiation (e.g. Theano).

30 GRU Python Implementation A GRU layer is just another way of computing the hidden state, so all we really need to do is change the hidden-state computation in our forward propagation function. In our implementation we also added bias units; it's quite typical that these are not shown in the equations. I also added a word embedding layer E.
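The actual code is not reproduced on the slide; the following is a minimal NumPy sketch of what such a forward step could look like, with the bias units and the embedding layer E the slide mentions (the parameter shapes and names here are my assumptions, not the original implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward_step(x_t, h_prev, E, U, W, b):
    """One GRU step for word index x_t given the previous hidden state h_prev.

    E: word embedding matrix (embed_dim x vocab_size)
    U: input-to-hidden weights, 3 matrices (hidden_dim x embed_dim) for z, r, h~
    W: hidden-to-hidden weights, 3 matrices (hidden_dim x hidden_dim)
    b: 3 bias vectors of length hidden_dim
    """
    x_e = E[:, x_t]                                                   # embedding lookup
    z = sigmoid(U[0].dot(x_e) + W[0].dot(h_prev) + b[0])              # update gate
    r = sigmoid(U[1].dot(x_e) + W[1].dot(h_prev) + b[1])              # reset gate
    h_tilde = np.tanh(U[2].dot(x_e) + W[2].dot(r * h_prev) + b[2])    # new memory content
    return z * h_prev + (1.0 - z) * h_tilde                           # final memory
```

Looping this step over a sentence gives the sequence of hidden states from which the output layer (a softmax over the vocabulary) is computed.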

31 GRU Python Implementation: Gradients We could derive the gradients for E, W, U, b (and the remaining parameters) by hand using the chain rule, just like we did before. But in practice most people use libraries like Theano that support automatic differentiation of expressions.
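As a toy illustration of what Theano's automatic differentiation buys us (a generic example, not the lecture's actual code):

```python
import numpy as np
import theano
import theano.tensor as T

# A GRU-gate-like symbolic expression; Theano builds the graph and derives gradients.
W = theano.shared(np.random.randn(4, 4), name='W')
x = T.dvector('x')
h = T.dvector('h')
z = T.nnet.sigmoid(T.dot(W, x) + h)   # update-gate-style expression
cost = T.sum(z ** 2)                  # any scalar cost
dW = T.grad(cost, W)                  # gradient w.r.t. W, no hand-written chain rule
grad_fn = theano.function([x, h], dW)
print(grad_fn(np.ones(4), np.zeros(4)))
```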

32 Adding a second GRU layer (a sketch of what stacking looks like is shown below).
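The slide's code is an image; as a rough sketch of the idea, reusing the numpy import, sigmoid helper, and parameter shapes from the gru_forward_step sketch above (again assumed, not the original code), the second layer simply treats the first layer's hidden state as its input:

```python
def gru_step(x_vec, h_prev, U, W, b):
    """GRU step on an already-embedded input vector (same gates as gru_forward_step)."""
    z = sigmoid(U[0].dot(x_vec) + W[0].dot(h_prev) + b[0])
    r = sigmoid(U[1].dot(x_vec) + W[1].dot(h_prev) + b[1])
    h_tilde = np.tanh(U[2].dot(x_vec) + W[2].dot(r * h_prev) + b[2])
    return z * h_prev + (1.0 - z) * h_tilde

def two_layer_gru_step(x_t, h1_prev, h2_prev, E, params1, params2):
    h1_t = gru_step(E[:, x_t], h1_prev, *params1)  # layer 1 reads the word embedding
    h2_t = gru_step(h1_t, h2_prev, *params2)       # layer 2 reads layer 1's hidden state
    return h1_t, h2_t
```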

33 Results Here are a few good examples of the network output (capitalization added by me): "I am a bot, and this action was performed automatically." "I enforce myself ridiculously well enough to just youtube." "I've got a good rhythm going!" "There is no problem here, but at least still wave!" "It depends on how plausible my judgement is." "( with the constitution which makes it impossible )" Our network was able to learn semantic dependencies! For example, "bot" and "automatically" are clearly related, as are the opening and closing brackets.

34 Long Short-Term Memories (LSTMs) We can make the units even more complex. Allow each time step to modify: an input gate (high if the current cell matters), a forget gate (if 0, forget the past), an output gate (how much of the cell is exposed), a new memory cell, the final memory cell, and the final hidden state. Many variations exist: see "LSTM: A Search Space Odyssey".
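The slide lists the gates without showing the equations (they are images in the original); the following is the standard LSTM formulation they refer to, again with assumed matrix names and without biases or peephole connections:

```latex
\begin{align*}
i_t &= \sigma\!\left(W^{(i)} x_t + U^{(i)} h_{t-1}\right) && \text{input gate} \\
f_t &= \sigma\!\left(W^{(f)} x_t + U^{(f)} h_{t-1}\right) && \text{forget gate} \\
o_t &= \sigma\!\left(W^{(o)} x_t + U^{(o)} h_{t-1}\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\!\left(W^{(c)} x_t + U^{(c)} h_{t-1}\right) && \text{new memory cell} \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t && \text{final memory cell} \\
h_t &= o_t \circ \tanh(c_t) && \text{final hidden state}
\end{align*}
```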

35 Long Short-Term Memories (LSTMs) The candidate hidden state (new memory cell) is computed from the current input and the previous hidden state. It is exactly the same equation we had in our vanilla RNN! However, instead of taking it directly as the new hidden state as we did in the RNN, we use the input gate from above to pick only some of it.

36 Long Short-Term Memories (LSTMs) The internal memory of the unit combines the previous memory, multiplied by the forget gate, with the newly computed candidate state, multiplied by the input gate. We could choose to ignore the old memory completely (forget gate all 0's) or ignore the newly computed state completely (input gate all 0's), but most likely we want something in between these two extremes.

37 Long Short-Term Memories (LSTMs) Given the memory c_t, we finally compute the output hidden state h_t by multiplying the memory with the output gate. Not all of the internal memory may be relevant to the hidden state used by other units in the network.

38 Illustrations a bit overwhelming ;) [Original LSTM cell diagram from "Long Short-Term Memory" by Hochreiter and Schmidhuber (1997), showing the input, input gate, output gate, and the memory cell. See http://people.idsia.ch/~juergen/lstm/sld017.htm and http://deeplearning.net/tutorial/lstm.html] Intuition: memory cells can keep information intact, unless the inputs make them forget it or overwrite it with new input. The cell can decide to output this information or just store it.

39 LSTMs are currently very hip! En vogue default model for most sequence labeling tasks. Very powerful, especially when stacked and made even deeper (each hidden layer is already computed by a deep internal network). Most useful if you have lots and lots of data.

40 Deep LSTMs don't outperform traditional MT yet
Table 1 (performance of the LSTM on the WMT'14 English-to-French test set, ntst14), methods compared by test BLEU score: Bahdanau et al. [2]; Baseline System [29]; single forward LSTM with beam search; single reversed LSTM with beam search; ensembles of 2 and 5 reversed LSTMs with various beam sizes. Note that an ensemble of 5 LSTMs with a beam of size 2 is cheaper than a single LSTM with a beam of size 12.
Table 2 (rescoring, test BLEU score on ntst14): Baseline System [29]; Cho et al. [5]; best WMT'14 result [9]: 37.0; rescoring the baseline 1000-best list with a single forward LSTM; with a single reversed LSTM; with an ensemble of 5 reversed LSTMs: 36.5; oracle rescoring of the baseline 1000-best lists: 45.
Sequence to Sequence Learning by Sutskever et al. 2014.

41 Deep LSTM for Machine Translation PCA of vectors from the last time step's hidden layer (layer 4). [2-D projection in which sentences with similar meanings appear close together; the plotted sentences include "I was given a card by her in the garden", "In the garden, she gave me a card", "She gave me a card in the garden", "I gave her a card in the garden", "In the garden, I gave her a card", "She was given a card by me in the garden", and "Mary admires John", "Mary is in love with John", "Mary respects John", "John admires Mary", "John is in love with Mary", "John respects Mary".] Sequence to Sequence Learning by Sutskever et al. 2014.

42 Further Improvements: More Gates! Gated Feedback Recurrent Neural Networks, Chung et al. 2015. (a) Conventional stacked RNN (b) Gated Feedback RNN.

43 Summary LSTMs/GRUs were designed to combat vanishing gradients through a gating mechanism. LSTM (1997), GRU (2014). An LSTM/GRU layer is just another way to compute a hidden state that was previously computed directly from the input and the previous hidden state.

44 Summary Recurrent Neural Networks are powerful. A lot of ongoing work right now. Gated Recurrent Units are even better. LSTMs maybe even better (jury's still out). This was an advanced lecture → gain intuition, encourage exploration. Next up: Recursive Neural Networks, simpler and also powerful :)
