Advanced RNN (GRU and LSTM) for Machine Translation. Dr. Kira Radinsky, CTO SalesPredict, Visiting Professor/Scientist, Technion
Transcription
1 Advanced RNN (GRU and LSTM) for Machine Translation. Dr. Kira Radinsky, CTO SalesPredict, Visiting Professor/Scientist, Technion. Slides were adapted from lectures by Richard Socher.
2 Overview: Machine translation. RNN models tackling MT: Gated Recurrent Units by Cho et al. (2014); Long Short-Term Memories by Hochreiter and Schmidhuber (1997).
3 Machine Translation: Methods are statistical. Use parallel corpora (European Parliament). First parallel corpus: Rosetta Stone → Traditional systems are very complex.
4 Current statistical machine translation systems: Source language f, e.g. French. Target language e, e.g. English. Probabilistic formulation (using Bayes' rule). Translation model p(f|e) trained on a parallel corpus. Language model p(e) trained on an English-only corpus (lots, free!). Pipeline: French → Translation Model p(f|e) → Pieces of English → Language Model p(e) → Decoder argmax p(f|e) p(e) → Proper English.
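Written out, the Bayes-rule step behind the decoder's argmax is the standard noisy-channel derivation. The symbol \hat{e} for the chosen translation is introduced here for illustration; p(f) is dropped because it does not depend on e.

```latex
\begin{align*}
\hat{e} &= \arg\max_{e} \; p(e \mid f) \\
        &= \arg\max_{e} \; \frac{p(f \mid e)\, p(e)}{p(f)} \\
        &= \arg\max_{e} \; p(f \mid e)\, p(e)
\end{align*}
```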
5 Phrase-based decoder
6 Step 1: Alignment. Goal: know which words or phrases in the source language translate to which words or phrases in the target language → Hard already! Example with a spurious word: "Japan shaken by two new quakes" / "Le Japon secoué par deux nouveaux séismes". Alignment examples from Chris Manning/CS224n.
7 Step 1: Alignment. Zero-fertility word (not translated) and one-to-many alignment: "And the program has been implemented" / "Le programme a été mis en application".
8 Step 1: Alignment. Really hard :/ Many-to-one alignments: "The balance was the territory of the aboriginal people" / "Le reste appartenait aux autochtones".
9 Step 1: Alignment. Many-to-many alignment (phrase alignment): "The poor don't have any money" / "Les pauvres sont démunis".
10 Step 1: Alignment. We could spend an entire lecture on alignment models. Not only single words, but could use phrases and syntax. Then consider reordering of translated phrases: "er geht ja nicht nach hause" → "he does not go home". Example from Philipp Koehn.
11 Phrase-Based Statistical MT: The Pharaoh/Moses Model. Foreign input is segmented into phrases (a "phrase" is any subsequence of words, not a linguistic phrase). Each phrase is probabilistically translated into English: P(to the conference | zur Konferenz), P(into the meeting | zur Konferenz). Phrases are probabilistically re-ordered. (See J&M or Lopez 2008 for an intro.) This is still pretty much the state of the art!
12 After many steps: Each phrase in the source language has many possible translations, resulting in a large search space. [Figure: translation options for "er geht ja nicht nach hause", e.g. "he", "it", "it goes", "he goes", "is not", "does not", "after", "according to", "home", "house", "return home", "at home".]
13 Decode: Search for the best of many hypotheses. Hard search problem that also includes the language model. [Figure: partial hypotheses for "er geht ja nicht nach hause", e.g. "yes", "he", "goes home", "are", "does not", "go home", "it", "to".]
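The search can be illustrated with a toy beam search. This is a minimal sketch, not the Pharaoh/Moses decoder: it translates left to right without reordering, skips "ja nicht", and every probability in `OPTIONS` and `BIGRAM` is a made-up value for illustration.

```python
import math

# Hypothetical translation options: source word -> [(English piece, p(f|e))].
# An empty piece means the source word is left untranslated.
OPTIONS = {
    "er": [("he", 0.6), ("it", 0.4)],
    "geht": [("goes", 0.7), ("go", 0.3)],
    "nach": [("to", 0.5), ("", 0.5)],
    "hause": [("home", 0.8), ("house", 0.2)],
}

# Hypothetical bigram language model p(word | previous word).
BIGRAM = {
    ("<s>", "he"): 0.5, ("<s>", "it"): 0.3,
    ("he", "goes"): 0.6, ("it", "goes"): 0.4,
    ("goes", "home"): 0.5, ("goes", "to"): 0.2, ("to", "home"): 0.1,
}
FLOOR = 0.01  # probability assigned to unseen bigrams


def lm_logprob(prev, word):
    return math.log(BIGRAM.get((prev, word), FLOOR))


def beam_decode(source, beam_size=3):
    """Keep the beam_size best partial hypotheses after each source word."""
    beam = [(0.0, [])]  # (log score, English words so far)
    for f in source:
        candidates = []
        for score, words in beam:
            for piece, p_tm in OPTIONS[f]:
                new_score = score + math.log(p_tm)  # translation model
                new_words = words
                if piece:
                    prev = words[-1] if words else "<s>"
                    new_score += lm_logprob(prev, piece)  # language model
                    new_words = words + [piece]
                candidates.append((new_score, new_words))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


best_score, best_words = beam_decode(["er", "geht", "nach", "hause"])
print(" ".join(best_words))  # he goes home
```

Each hypothesis is scored by the sum of log p(f|e) and log p(e) terms, which is the log of the product the decoder maximizes. A real decoder additionally scores reorderings and multi-word phrases, which is what makes the search space so large.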
14 Traditional MT: Skipped hundreds of important details. A lot of human feature engineering. Very complex systems. Many different, independent machine learning problems.
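15 Deep learning to the rescue!? Maybe we could translate directly with an RNN? [Figure: an encoder RNN reads x_1, x_2, x_3 = "Echt dicke Kiste" into hidden states h_1, h_2, h_3 with recurrent weights W; a decoder emits y_1, y_2 = "Awesome sauce".] The final encoder state needs to capture the entire phrase!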
16 MT with RNNs: Simplest Model. Encoder: [equation]. Decoder: [equation]. Minimize cross-entropy error for all target words conditioned on source words. It's not quite that simple ;)
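One common way to write this simplest model is sketched below. The weight names W^{(hx)} and W^{(S)}, the nonlinearity f, and the index n over N training pairs are assumptions for illustration, not recovered from the slide. Maximizing the log-likelihood in the last line is equivalent to minimizing the cross-entropy error over target words.

```latex
\begin{align*}
\text{Encoder:}\quad & h_t = \phi(h_{t-1}, x_t) = f\big(W^{(hh)} h_{t-1} + W^{(hx)} x_t\big) \\
\text{Decoder:}\quad & h_t = \phi(h_{t-1}) = f\big(W^{(hh)} h_{t-1}\big), \qquad
                       y_t = \mathrm{softmax}\big(W^{(S)} h_t\big) \\
\text{Objective:}\quad & \max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}\big(y^{(n)} \mid x^{(n)}\big)
\end{align*}
```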
17 RNN Translation Model Extensions: 1. Train different RNN weights for encoding and decoding. [Figure: the encoder-decoder diagram for "Echt dicke Kiste" → "Awesome sauce".] This means the φ() functions of the encoder and the decoder would have different W^(hh) matrices.
18 RNN Translation Model Extensions. Notation: each input of φ has its own linear transformation matrix. Simple: [equation]. 2. Compute every hidden state in the decoder from: the previous hidden state (standard); the last hidden vector of the encoder, c = h_T; the previous predicted output word y_{t−1}. Language model with three inputs to each decoder neuron: (h_{t−1}, c, y_{t−1}). Cho et al. 2014.
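Under the notation rule that each input of φ gets its own linear map, the three-input decoder update could be sketched as follows. The matrix names W^{(hc)} and W^{(hy)} and the plain nonlinearity f are illustrative assumptions; Cho et al. (2014) use a gated unit in place of f.

```latex
\begin{align*}
c &= h_T^{\text{enc}} \\
h_t^{\text{dec}} &= \phi\big(h_{t-1}^{\text{dec}},\, c,\, y_{t-1}\big)
  = f\big(W^{(hh)} h_{t-1}^{\text{dec}} + W^{(hc)} c + W^{(hy)} y_{t-1}\big)
\end{align*}
```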
19 Different picture, same idea Kyunghyun Cho et al. 2014
20 RNN Translation Model Extensions: 3. Train stacked/deep RNNs with multiple layers. 4. Potentially train a bidirectional encoder. [Figure: stacked hidden layers h^(1), h^(2), h^(3) above the input x.] 5. Train the input sequence in reverse order for a simpler optimization problem: instead of A B C → X Y, train with C B A → X Y.
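Extension 5 is a pure preprocessing step. A minimal sketch, assuming training data is held as (source tokens, target tokens) pairs; the helper name `reverse_source` is hypothetical:

```python
def reverse_source(pairs):
    """Reverse only the source side of each (source, target) token pair."""
    return [(list(reversed(src)), list(tgt)) for src, tgt in pairs]


pairs = [(["A", "B", "C"], ["X", "Y"])]
reversed_pairs = reverse_source(pairs)
print(reversed_pairs)  # [(['C', 'B', 'A'], ['X', 'Y'])]
```

After reversal, the first source word sits right next to the first target word, which shortens the distance the gradient has to travel for the early words of the output.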
21 6. Main Improvement: Better Units. More complex hidden unit computation in recurrence! Gated Recurrent Units (GRU) introduced by Cho et al. (see reading list). Main ideas: keep around memories to capture long-distance dependencies; allow error messages to flow at different strengths depending on the inputs.
22 GRUs: A standard RNN computes the hidden layer at the next time step directly: [equation]. A GRU first computes an update gate (another layer) based on the current input word vector and hidden state: [equation]. Compute the reset gate similarly but with different weights: [equation].
23 GRUs: Update gate: [equation]. Reset gate: [equation]. New memory content: [equation]. Final memory at time step t combines current and previous time steps: [equation].
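The four quantities named on this slide are commonly written as below. This is a sketch in the convention that matches the deck's own statements (update gate 1 copies the previous state; reset 1 and update 0 give the plain RNN); the weight names W, U with gate superscripts are assumptions.

```latex
\begin{align*}
z_t &= \sigma\big(W^{(z)} x_t + U^{(z)} h_{t-1}\big) && \text{update gate} \\
r_t &= \sigma\big(W^{(r)} x_t + U^{(r)} h_{t-1}\big) && \text{reset gate} \\
\tilde{h}_t &= \tanh\big(W x_t + r_t \circ U h_{t-1}\big) && \text{new memory content} \\
h_t &= z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t && \text{final memory}
\end{align*}
```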
24 GRUs: Intuitively, the update gate defines how much of the previous memory to keep around.
25 GRUs: Intuitively, the reset gate determines how to combine the new input with the previous memory. If we set the reset gate to all 1's and the update gate to all 0's, we again arrive at our plain RNN model.
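The reduction to a plain RNN can be checked numerically. The sketch below assumes the GRU form h_t = z_t ∘ h_{t−1} + (1 − z_t) ∘ h̃_t with h̃_t = tanh(W x_t + r_t ∘ U h_{t−1}); the parameter values and the `z_fixed` / `r_fixed` arguments that clamp a gate are hypothetical, added only for this demonstration.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gate(W, U, x, h_prev, fixed):
    """Sigmoid gate, or a constant vector when `fixed` is given."""
    if fixed is not None:
        return [fixed] * len(h_prev)
    return [sigmoid(a + b) for a, b in zip(matvec(W, x), matvec(U, h_prev))]


def gru_step(x, h_prev, p, z_fixed=None, r_fixed=None):
    z = gate(p["Wz"], p["Uz"], x, h_prev, z_fixed)
    r = gate(p["Wr"], p["Ur"], x, h_prev, r_fixed)
    Wx, Uh = matvec(p["W"], x), matvec(p["U"], h_prev)
    h_tilde = [math.tanh(wx + ri * uh) for wx, ri, uh in zip(Wx, r, Uh)]
    return [zi * hp + (1.0 - zi) * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]


def rnn_step(x, h_prev, p):
    """Plain RNN update: h_t = tanh(W x_t + U h_{t-1})."""
    return [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], h_prev))]


params = {
    "Wz": [[0.5, -0.3], [0.1, 0.8]], "Uz": [[0.2, 0.4], [-0.6, 0.1]],
    "Wr": [[-0.4, 0.7], [0.3, 0.2]], "Ur": [[0.9, -0.2], [0.5, 0.3]],
    "W": [[0.6, -0.5], [0.2, 0.1]], "U": [[-0.3, 0.8], [0.4, -0.7]],
}
x, h_prev = [1.0, -0.5], [0.2, -0.1]

as_rnn = gru_step(x, h_prev, params, z_fixed=0.0, r_fixed=1.0)
plain = rnn_step(x, h_prev, params)
copied = gru_step(x, h_prev, params, z_fixed=1.0)
print(as_rnn == plain, copied == h_prev)
```

With the update gate clamped to 1 instead, the same function returns `h_prev` unchanged, which is the copy behaviour that lets information survive many time steps.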
26 GRUs: Update gate: [equation]. Reset gate: [equation]. New memory content: [equation]. If a reset gate unit is ~0, then this ignores the previous memory and only stores the new word information: if the i-th element of r_t is 0, we only take the current word into account. Final memory at time step t combines current and previous time steps: [equation]. If the i-th element of z_t is 1, we copy the previous state and ignore the current one (including the current word). Otherwise, we take in the current word, either on its own (when the reset gate is closed) or together with its connection to previous words.
27 Attempt at a clean illustration: [Figure: GRU unrolled over two time steps, bottom to top: input x_{t−1}, x_t; reset gate r_{t−1}, r_t; update gate z_{t−1}, z_t; memory (reset) h̃_{t−1}, h̃_t; final memory h_{t−1}, h_t.] Has to be sigmoid to illustrate the on/off switch better.
28 GRU intuition: If reset is close to 0, ignore the previous hidden state → allows the model to drop information that is irrelevant in the future. Update gate z controls how much of the past state should matter now. If z is close to 1, then we can copy information in that unit through many time steps! Less vanishing gradient! Units with short-term dependencies often have reset gates very active.
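The "less vanishing gradient" claim can be made concrete with a short derivation. It assumes the GRU form h_t = z_t ∘ h_{t−1} + (1 − z_t) ∘ h̃_t with z_t = σ(W^{(z)} x_t + U^{(z)} h_{t−1}); the weight names are assumptions.

```latex
\begin{align*}
\frac{\partial h_t}{\partial h_{t-1}}
  &= \operatorname{diag}(z_t)
   + \operatorname{diag}\big(h_{t-1} - \tilde{h}_t\big)\,\frac{\partial z_t}{\partial h_{t-1}}
   + \operatorname{diag}(1 - z_t)\,\frac{\partial \tilde{h}_t}{\partial h_{t-1}}, \\
\frac{\partial z_t}{\partial h_{t-1}}
  &= \operatorname{diag}\big(z_t \circ (1 - z_t)\big)\, U^{(z)} .
\end{align*}
```

As z_t → 1, both z_t ∘ (1 − z_t) and 1 − z_t go to 0, so the Jacobian approaches the identity and the error signal passes through that step without shrinking.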
29 GRU intuition: Units with long-term dependencies have active update gates z. [Illustration: GRU unit with z, h, r, h̃, x.] Derivative of [expression]? → the rest is the same chain rule, but implement it with modularization or automatic differentiation (e.g. Theano).
30 GRU Python Implementation: A GRU layer is just another way of computing the hidden state, so all we really need to do is change the hidden state computation in our forward propagation function. In our implementation we also added bias units; it's quite typical that these are not shown in the equations. I also added a word embedding layer E.
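The slide's code is not part of the transcript. The following is a minimal stdlib-only sketch of such a forward pass with an embedding lookup E and bias units, not the original implementation. Assumptions: W multiplies the input, U multiplies the previous hidden state, each is a dict keyed by "z" (update), "r" (reset) and "h" (candidate), and all toy values are made up.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gru_forward(word_ids, E, W, U, b):
    """Return the hidden state after each word of the sequence."""
    h = [0.0] * len(b["h"])
    states = []
    for word_id in word_ids:
        x = E[word_id]  # word embedding lookup
        z = [sigmoid(wx + uh + bi) for wx, uh, bi in
             zip(matvec(W["z"], x), matvec(U["z"], h), b["z"])]
        r = [sigmoid(wx + uh + bi) for wx, uh, bi in
             zip(matvec(W["r"], x), matvec(U["r"], h), b["r"])]
        h_tilde = [math.tanh(wx + ri * uh + bi) for wx, ri, uh, bi in
                   zip(matvec(W["h"], x), r, matvec(U["h"], h), b["h"])]
        h = [zi * hp + (1.0 - zi) * ht for zi, hp, ht in zip(z, h, h_tilde)]
        states.append(h)
    return states


# Toy sizes: vocabulary of 3 words, embedding size 2, hidden size 2.
E = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W = {"z": [[0.5, -0.1], [0.2, 0.3]], "r": [[-0.3, 0.8], [0.1, 0.4]],
     "h": [[0.7, -0.6], [0.2, 0.9]]}
U = {"z": [[0.1, 0.2], [-0.4, 0.3]], "r": [[0.6, -0.2], [0.3, 0.1]],
     "h": [[-0.5, 0.4], [0.2, -0.3]]}
b = {"z": [0.0, 0.1], "r": [0.1, 0.0], "h": [0.0, 0.0]}

states = gru_forward([0, 2, 1], E, W, U, b)
print(len(states), len(states[0]))  # 3 2
```

Only the hidden-state computation differs from a plain RNN forward pass; an output layer on top of `states` would be unchanged.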
31 GRU Python Implementation: Gradients. We could derive the gradients for E, W, U and b by hand using the chain rule, just like we did before. But in practice most people use libraries like Theano that support automatic differentiation of expressions.
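A hand-derived gradient can be sanity-checked against finite differences before trusting it. The sketch below does this for a single scalar unit h = tanh(w·x + u·h_prev); it is a stand-in for what an automatic differentiation library automates, not Theano code, and all names and values are illustrative.

```python
import math


def hidden(u, w, x, h_prev):
    """Scalar recurrent unit: h = tanh(w * x + u * h_prev)."""
    return math.tanh(w * x + u * h_prev)


def dhidden_du(u, w, x, h_prev):
    """Chain rule by hand: dh/du = (1 - h^2) * h_prev."""
    h = hidden(u, w, x, h_prev)
    return (1.0 - h * h) * h_prev


def numeric_grad(f, u, eps=1e-6):
    """Central finite difference approximation of df/du."""
    return (f(u + eps) - f(u - eps)) / (2.0 * eps)


u, w, x, h_prev = 0.3, -0.7, 1.5, 0.4
analytic = dhidden_du(u, w, x, h_prev)
numeric = numeric_grad(lambda v: hidden(v, w, x, h_prev), u)
print(abs(analytic - numeric) < 1e-7)  # True
```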
32 Adding a second GRU layer
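Stacking means the second layer reads the first layer's hidden state as its input at every time step. A minimal sketch with hidden size 1 per layer (scalars) to keep it short; the parameter names and values are assumptions for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_cell(x, h_prev, p):
    """Scalar GRU cell: update gate z, reset gate r, candidate h_tilde."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev)
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev)
    h_tilde = math.tanh(p["w"] * x + r * p["u"] * h_prev)
    return z * h_prev + (1.0 - z) * h_tilde


def two_layer_gru(xs, p1, p2):
    """Run two stacked GRU layers over the input sequence xs."""
    h1 = h2 = 0.0
    outputs = []
    for x in xs:
        h1 = gru_cell(x, h1, p1)   # layer 1 reads the input
        h2 = gru_cell(h1, h2, p2)  # layer 2 reads layer 1's hidden state
        outputs.append(h2)
    return outputs


p1 = {"wz": 0.5, "uz": -0.2, "wr": 0.3, "ur": 0.7, "w": 0.9, "u": -0.4}
p2 = {"wz": -0.1, "uz": 0.6, "wr": 0.8, "ur": 0.2, "w": 0.5, "u": 0.3}
outputs = two_layer_gru([1.0, -0.5, 0.25], p1, p2)
print(len(outputs))  # 3
```

Each layer keeps its own parameters and its own hidden state; only the input wiring changes.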
33 Results: Here are a few good examples of the network output (capitalization added by me). "I am a bot, and this action was performed automatically." "I enforce myself ridiculously well enough to just youtube." "I've got a good rhythm going!" "There is no problem here, but at least still wave!" "It depends on how plausible my judgement is." "( with the constitution which makes it impossible )" Our network was able to capture semantic dependencies! For example, "bot" and "automatically" are clearly related, as are the opening and closing brackets.
34 Long Short-Term Memories (LSTMs): We can make the units even more complex. Allow each time step to modify: Input gate (high if the current cell matters): [equation]. Forget gate (if 0, forget the past): [equation]. Output gate (how much of the cell is exposed): [equation]. New memory cell: [equation]. Final memory cell: [equation]. Final hidden state: [equation]. Many variations: see "LSTM: A Search Space Odyssey".
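The six quantities listed on this slide are commonly written as below. This is a sketch of one standard variant; the weight names are assumptions, and some formulations add bias terms or peephole connections.

```latex
\begin{align*}
i_t &= \sigma\big(W^{(i)} x_t + U^{(i)} h_{t-1}\big) && \text{input gate} \\
f_t &= \sigma\big(W^{(f)} x_t + U^{(f)} h_{t-1}\big) && \text{forget gate} \\
o_t &= \sigma\big(W^{(o)} x_t + U^{(o)} h_{t-1}\big) && \text{output gate} \\
\tilde{c}_t &= \tanh\big(W^{(c)} x_t + U^{(c)} h_{t-1}\big) && \text{new memory cell} \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t && \text{final memory cell} \\
h_t &= o_t \circ \tanh(c_t) && \text{final hidden state}
\end{align*}
```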
35 Long Short-Term Memories (LSTMs): The new memory cell is a candidate hidden state that is computed based on the current input and the previous hidden state. It is exactly the same equation we had in our vanilla RNN! However, instead of taking it as the new hidden state as we did in the RNN, we will use the input gate from above to pick some of it.
36 Long Short-Term Memories (LSTMs): The internal memory of the unit is the previous memory multiplied by the forget gate plus the newly computed hidden state multiplied by the input gate. It expresses how we want to combine the previous memory and the new input. We could choose to ignore the old memory completely (forget gate all 0's) or ignore the newly computed state completely (input gate all 0's), but most likely we want something in between these two extremes.
37 Long Short-Term Memories (LSTMs): Given the memory c_t, we finally compute the output hidden state h_t by multiplying the memory with the output gate. Not all of the internal memory may be relevant to the hidden state used by other units in the network.
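The two extremes and the role of the output gate can be tried directly. In this sketch the gate values are passed in as plain vectors instead of being computed from weights, and the memory is squashed with tanh before the output gate is applied (a common variant, assumed here); `lstm_memory` and all values are hypothetical.

```python
import math


def lstm_memory(c_prev, c_tilde, i, f, o):
    """Combine old memory and candidate, then expose part of it as h."""
    c = [fi * cp + ii * ct for fi, cp, ii, ct in zip(f, c_prev, i, c_tilde)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return c, h


c_prev, c_tilde = [0.5, -1.0], [0.9, 0.2]
ones, zeros = [1.0, 1.0], [0.0, 0.0]

# Forget gate all 0's: the old memory is ignored completely.
c_forget, _ = lstm_memory(c_prev, c_tilde, i=ones, f=zeros, o=ones)
# Input gate all 0's: the newly computed state is ignored completely.
c_keep, _ = lstm_memory(c_prev, c_tilde, i=zeros, f=ones, o=ones)
# Output gate all 0's: the memory is stored but not exposed.
c_hidden, h_hidden = lstm_memory(c_prev, c_tilde, i=zeros, f=ones, o=zeros)

print(c_forget == c_tilde, c_keep == c_prev, h_hidden == zeros)
```

In a trained network the gates take values between 0 and 1, so each unit lands somewhere between these extremes.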
38 Illustrations a bit overwhelming ;) [Figure: LSTM memory cell diagram from "Long Short-Term Memory" by Hochreiter and Schmidhuber (1997).] http://people.idsia.ch/~juergen/lstm/sld017.htm http://deeplearning.net/tutorial/lstm.html Intuition: memory cells can keep information intact, unless inputs make them forget it or overwrite it with new input. The cell can decide to output this information or just store it.
39 LSTMs are currently very hip! En vogue default model for most sequence labeling tasks. Very powerful, especially when stacked and made even deeper (each hidden layer is already computed by a deep internal network). Most useful if you have lots and lots of data.
40 Deep LSTMs don't outperform traditional MT yet
Table 1: The performance of the LSTM on WMT'14 English to French test set (ntst14). Note that an ensemble of 5 LSTMs with a beam of size 2 is cheaper than a single LSTM with a beam of size 12. Rows (method, test BLEU score on ntst14; LSTM rows are each reported at a specific beam size): Bahdanau et al. [2]; Baseline System [29]; Single forward LSTM; Single reversed LSTM; Ensemble of 5 reversed LSTMs; Ensemble of 2 reversed LSTMs; Ensemble of 5 reversed LSTMs; Ensemble of 5 reversed LSTMs.
Second table (method, test BLEU score on ntst14): Baseline System [29]; Cho et al. [5]; Best WMT'14 result [9]: 37.0; Rescoring the baseline 1000-best with a single forward LSTM; Rescoring the baseline 1000-best with a single reversed LSTM; Rescoring the baseline 1000-best with an ensemble of 5 reversed LSTMs: 36.5; Oracle rescoring of the baseline 1000-best lists: 45.
Sequence to Sequence Learning by Sutskever et al. 2014
41 Deep LSTM for Machine Translation: PCA of vectors from the last time step hidden layer. [Figure: two 2-D PCA plots of sentence vectors. First plot: "Mary admires John", "Mary is in love with John", "Mary respects John", "John admires Mary", "John is in love with Mary", "John respects Mary". Second plot: "I was given a card by her in the garden", "In the garden, she gave me a card", "She gave me a card in the garden", "In the garden, I gave her a card", "She was given a card by me in the garden", "I gave her a card in the garden".] Sequence to Sequence Learning by Sutskever et al. 2014
42 Further Improvements: More Gates! Gated Feedback Recurrent Neural Networks, Chung et al. [Figure: (a) Conventional stacked RNN; (b) Gated Feedback RNN.]
43 Summary: LSTMs/GRUs were designed to combat vanishing gradients through a gating mechanism. LSTM (1997), GRU (2014). An LSTM/GRU layer is just another way to compute a hidden state that was previously computed as: [equation].
44 Summary: Recurrent Neural Networks are powerful. A lot of ongoing work right now. Gated Recurrent Units even better. LSTMs maybe even better (jury still out). This was an advanced lecture → gain intuition, encourage exploration. Next up: Recursive Neural Networks, simpler and also powerful :)
Recurrent Neural Nets II
Recurrent Neural Nets II Steven Spielberg Pon Kumar, Tingke (Kevin) Shen Machine Learning Reading Group, Fall 2016 9 November, 2016 Outline 1 Introduction 2 Problem Formulations with RNNs 3 LSTM for Optimization
More informationEmpirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling Authors: Junyoung Chung, Caglar Gulcehre, KyungHyun Cho and Yoshua Bengio Presenter: Yu-Wei Lin Background: Recurrent Neural
More informationSequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015
Sequence Modeling: Recurrent and Recursive Nets By Pyry Takala 14 Oct 2015 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, 10.2.1) Properties of RNNs (10.2.2, 8.2.6) Using
More informationConvolutional Sequence to Sequence Learning. Denis Yarats with Jonas Gehring, Michael Auli, David Grangier, Yann Dauphin Facebook AI Research
Convolutional Sequence to Sequence Learning Denis Yarats with Jonas Gehring, Michael Auli, David Grangier, Yann Dauphin Facebook AI Research Sequence generation Need to model a conditional distribution
More informationSentiment Classification of Food Reviews
Sentiment Classification of Food Reviews Hua Feng Department of Electrical Engineering Stanford University Stanford, CA 94305 fengh15@stanford.edu Ruixi Lin Department of Electrical Engineering Stanford
More informationShow, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks
Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 8: Recurrent Neural Networks Christopher Manning and Richard Socher Organization Extra project office hour today after lecture Overview
More informationKyoto-NMT: a Neural Machine Translation implementation in Chainer
Kyoto-NMT: a Neural Machine Translation implementation in Chainer Fabien Cromières Japan Science and Technology Agency Kawaguchi-shi, Saitama 332-0012 fabien@pa.jst.jp Abstract We present Kyoto-NMT, an
More informationNatural Language Processing with Deep Learning. CS224N/Ling284. Lecture 5: Backpropagation Kevin Clark
Natural Language Processing with Deep Learning CS4N/Ling84 Lecture 5: Backpropagation Kevin Clark Announcements Assignment 1 due Thursday, 11:59 You can use up to 3 late days (making it due Sunday at midnight)
More informationMachine Learning 13. week
Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of
More informationAsynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features
Asynchronous Parallel Learning for Neural Networks and Structured Models with Dense Features Xu SUN ( 孙栩 ) Peking University xusun@pku.edu.cn Motivation Neural networks -> Good Performance CNN, RNN, LSTM
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 7. Recurrent Neural Networks (Some figures adapted from NNDL book) 1 Recurrent Neural Networks 1. Recurrent Neural Networks (RNNs) 2. RNN Training
More informationAlgorithms for NLP. Machine Translation. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Machine Translation Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Machine Translation Machine Translation: Examples Levels of Transfer Word-Level MT: Examples la politique
More informationLSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia
1 LSTM for Language Translation and Image Captioning Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 2 Part I LSTM for Language Translation Motivation Background (RNNs, LSTMs) Model
More informationEmpirical Evaluation of RNN Architectures on Sentence Classification Task
Empirical Evaluation of RNN Architectures on Sentence Classification Task Lei Shen, Junlin Zhang Chanjet Information Technology lorashen@126.com, zhangjlh@chanjet.com Abstract. Recurrent Neural Networks
More informationMachine Learning Crash Course: Part I
Machine Learning Crash Course: Part I Ariel Kleiner August 21, 2012 Machine learning exists at the intersec
More informationRelational inductive biases, deep learning, and graph networks
Relational inductive biases, deep learning, and graph networks Peter Battaglia et al. 2018 1 What The authors explore how we can combine relational inductive biases and DL. They introduce graph network
More informationOp#miza#on Problems, John Gu7ag MIT Department of Electrical Engineering and Computer Science LECTURE 2 1
Op#miza#on Problems, John Gu7ag MIT Department of Electrical Engineering and Computer Science 6.0002 LECTURE 2 1 Relevant Reading for Today s Lecture Chapter 13 6.0002 LECTURE 2 2 The Pros and Cons of
More informationModeling Sequences Conditioned on Context with RNNs
Modeling Sequences Conditioned on Context with RNNs Sargur Srihari srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 10. Topics in Sequence
More informationRecurrent Neural Networks. Nand Kishore, Audrey Huang, Rohan Batra
Recurrent Neural Networks Nand Kishore, Audrey Huang, Rohan Batra Roadmap Issues Motivation 1 Application 1: Sequence Level Training 2 Basic Structure 3 4 Variations 5 Application 3: Image Classification
More informationDeep neural networks
Deep neural networks Outline What s new in ANNs in the last 5-10 years? Deeper networks, more data, and faster training Scalability and use of GPUs Symbolic differentiation reverse-mode automatic differentiation
More informationTuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017
Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax
More informationSparse Feature Learning
Sparse Feature Learning Philipp Koehn 1 March 2016 Multiple Component Models 1 Translation Model Language Model Reordering Model Component Weights 2 Language Model.05 Translation Model.26.04.19.1 Reordering
More informationStatistical Machine Translation Part IV Log-Linear Models
Statistical Machine Translation art IV Log-Linear Models Alexander Fraser Institute for Natural Language rocessing University of Stuttgart 2011.11.25 Seminar: Statistical MT Where we have been We have
More informationTransition-Based Dependency Parsing with Stack Long Short-Term Memory
Transition-Based Dependency Parsing with Stack Long Short-Term Memory Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith Association for Computational Linguistics (ACL), 2015 Presented
More informationSta$c Single Assignment (SSA) Form
Sta$c Single Assignment (SSA) Form SSA form Sta$c single assignment form Intermediate representa$on of program in which every use of a variable is reached by exactly one defini$on Most programs do not
More informationOutline GF-RNN ReNet. Outline
Outline Gated Feedback Recurrent Neural Networks. arxiv1502. Introduction: RNN & Gated RNN Gated Feedback Recurrent Neural Networks (GF-RNN) Experiments: Character-level Language Modeling & Python Program
More informationRecurrent Neural Network (RNN) Industrial AI Lab.
Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series
More informationLSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University
LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in
More informationMachine Learning for Natural Language Processing. Alice Oh January 17, 2018
Machine Learning for Natural Language Processing Alice Oh January 17, 2018 Overview Distributed representation Temporal neural networks RNN LSTM GRU Sequence-to-sequence models Machine translation Response
More informationINF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 8, 12 Oct
1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 8, 12 Oct. 2016 jtl@ifi.uio.no Today 2 Preparing bitext Parameter tuning Reranking Some linguistic issues STMT so far 3 We
More informationOPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS
April 4-7, 2016 Silicon Valley OPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS Jeremy Appleyard, 7 April 2016 RECURRENT NEURAL NETWORKS Output is fed into input Perform the same operation repeatedly
More informationConvolu'onal Neural Networks
Convolu'onal Neural Networks Dr. Kira Radinsky CTO SalesPredict Visi8ng Professor/Scien8st Technion Slides were adapted from Fei-Fei Li & Andrej Karpathy & Jus8n Johnson A bit of history: Hubel & Wiesel,
More informationTable of Contents. What Really is a Hidden Unit? Visualizing Feed-Forward NNs. Visualizing Convolutional NNs. Visualizing Recurrent NNs
Table of Contents What Really is a Hidden Unit? Visualizing Feed-Forward NNs Visualizing Convolutional NNs Visualizing Recurrent NNs Visualizing Attention Visualizing High Dimensional Data What do visualizations
More informationPractical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow
Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains
More informationRecurrent Neural Networks
Recurrent Neural Networks 11-785 / Fall 2018 / Recitation 7 Raphaël Olivier Recap : RNNs are magic They have infinite memory They handle all kinds of series They re the basis of recent NLP : Translation,
More informationEncoding RNNs, 48 End of sentence (EOS) token, 207 Exploding gradient, 131 Exponential function, 42 Exponential Linear Unit (ELU), 44
A Activation potential, 40 Annotated corpus add padding, 162 check versions, 158 create checkpoints, 164, 166 create input, 160 create train and validation datasets, 163 dropout, 163 DRUG-AE.rel file,
More informationSEMANTIC COMPUTING. Lecture 9: Deep Learning: Recurrent Neural Networks (RNNs) TU Dresden, 21 December 2018
SEMANTIC COMPUTING Lecture 9: Deep Learning: Recurrent Neural Networks (RNNs) Dagmar Gromann International Center For Computational Logic TU Dresden, 21 December 2018 Overview Handling Overfitting Recurrent
More information27: Hybrid Graphical Models and Neural Networks
10-708: Probabilistic Graphical Models 10-708 Spring 2016 27: Hybrid Graphical Models and Neural Networks Lecturer: Matt Gormley Scribes: Jakob Bauer Otilia Stretcu Rohan Varma 1 Motivation We first look
More informationGate-Variants of Gated Recurrent Unit (GRU) Neural Networks
Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks Rahul Dey and Fathi M. Salem Circuits, Systems, and Neural Networks (CSANN) LAB Department of Electrical and Computer Engineering Michigan State
More informationImage Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationImage Captioning with Object Detection and Localization
Image Captioning with Object Detection and Localization Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
More informationDiscriminative Training for Phrase-Based Machine Translation
Discriminative Training for Phrase-Based Machine Translation Abhishek Arun 19 April 2007 Overview 1 Evolution from generative to discriminative models Discriminative training Model Learning schemes Featured
More informationHouse Price Prediction Using LSTM
House Price Prediction Using LSTM Xiaochen Chen Lai Wei The Hong Kong University of Science and Technology Jiaxin Xu ABSTRACT In this paper, we use the house price data ranging from January 2004 to October
More informationRecurrent Neural Networks
Recurrent Neural Networks Javier Béjar Deep Learning 2018/2019 Fall Master in Artificial Intelligence (FIB-UPC) Introduction Sequential data Many problems are described by sequences Time series Video/audio
More informationCS839: Probabilistic Graphical Models. Lecture 22: The Attention Mechanism. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 22: The Attention Mechanism Theo Rekatsinas 1 Why Attention? Consider machine translation: We need to pay attention to the word we are currently translating.
More informationAlignment and Image Comparison
Alignment and Image Comparison Erik Learned- Miller University of Massachuse>s, Amherst Alignment and Image Comparison Erik Learned- Miller University of Massachuse>s, Amherst Alignment and Image Comparison
More informationShow, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks
Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Boya Peng Department of Computer Science Stanford University boya@stanford.edu Zelun Luo Department of Computer
More informationUse JSL to Scrape Data from the Web and Predict Football Wins! William Baum Graduate Sta/s/cs Student University of New Hampshire
Use JSL to Scrape Data from the Web and Predict Football Wins! William Baum Graduate Sta/s/cs Student University of New Hampshire Just for Fun! I m an avid American football fan Sports sta/s/cs are easily
More informationCS 224n: Assignment #3
CS 224n: Assignment #3 Due date: 2/27 11:59 PM PST (You are allowed to use 3 late days maximum for this assignment) These questions require thought, but do not require long answers. Please be as concise
More informationFastText. Jon Koss, Abhishek Jindal
FastText Jon Koss, Abhishek Jindal FastText FastText is on par with state-of-the-art deep learning classifiers in terms of accuracy But it is way faster: FastText can train on more than one billion words
More informationDeep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies
http://blog.csdn.net/zouxy09/article/details/8775360 Automatic Colorization of Black and White Images Automatically Adding Sounds To Silent Movies Traditionally this was done by hand with human effort
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More informationDomain-Aware Sentiment Classification with GRUs and CNNs
Domain-Aware Sentiment Classification with GRUs and CNNs Guangyuan Piao 1(B) and John G. Breslin 2 1 Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway,
More informationOn the Efficiency of Recurrent Neural Network Optimization Algorithms
On the Efficiency of Recurrent Neural Network Optimization Algorithms Ben Krause, Liang Lu, Iain Murray, Steve Renals University of Edinburgh Department of Informatics s17005@sms.ed.ac.uk, llu@staffmail.ed.ac.uk,
More informationHow to Develop Encoder-Decoder LSTMs
Chapter 9 How to Develop Encoder-Decoder LSTMs 9.0.1 Lesson Goal The goal of this lesson is to learn how to develop encoder-decoder LSTM models. completing this lesson, you will know: After ˆ The Encoder-Decoder
More informationGated Recurrent Models. Stephan Gouws & Richard Klein
Gated Recurrent Models Stephan Gouws & Richard Klein Outline Part 1: Intuition, Inference and Training Building intuitions: From Feedforward to Recurrent Models Inference in RNNs: Fprop Training in RNNs:
More informationShow, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio Presented
More informationDeep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur
Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today
More informationSentiment Classification of Food Reviews
1 2 3 4 5 6 7 8 9 10 11 12 13 14 Sentiment Classification of Food Reviews Hua Feng Ruixi Lin Department of Electrical Engineering Department of Electrical Engineering Stanford University Stanford University
More informationNet.info. A proposal for making network service informa6on easily available. Steven Bauer Slides from 2010 MIT
Net.info A proposal for making network service informa6on easily available Steven Bauer Slides from 2010 MIT Problem No easy way to iden6fy network service informa6on Ini6al mo6va6on is to make very basic
More informationques4ons? Midterm Projects, etc. Path- Based Sta4c Analysis Sta4c analysis we know Example 11/20/12
Midterm Grades and solu4ons are (and have been) on Moodle The midterm was hard[er than I thought] grades will be scaled I gave everyone a 10 bonus point (already included in your total) max: 98 mean: 71
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 3 Parametric Distribu>ons We want model the probability
More informationAdvanced branch predic.on algorithms. Ryan Gabrys Ilya Kolykhmatov
Advanced branch predic.on algorithms Ryan Gabrys Ilya Kolykhmatov Context Branches are frequent: 15-25 % A branch predictor allows the processor to specula.vely fetch and execute instruc.ons down the predicted
More informationEmbracing Diversity: Searching over multiple languages
Embracing Diversity: Searching over multiple languages Tommaso Teofili Suneel Marthi June 12, 2017 Berlin Buzzwords, Berlin, Germany 1 Tommaso Teofili @tteofili $WhoAreWe Software Engineer, Adobe Systems
More informationNeural Networks: Learning. Cost func5on. Machine Learning
Neural Networks: Learning Cost func5on Machine Learning Neural Network (Classifica2on) total no. of layers in network no. of units (not coun5ng bias unit) in layer Layer 1 Layer 2 Layer 3 Layer 4 Binary
More informationTry typing the following in the Python shell and press return after each calculation. Write the answer the program displays next to the sums below.
Name: Date: Instructions: PYTHON - INTRODUCTORY TASKS Open Idle (the program we will be using to write our Python codes). We can use the following code in Python to work out numeracy calculations. Try
Plan, Attend, Generate: Planning for Sequence-to-Sequence Models
Plan, Attend, Generate: Planning for Sequence-to-Sequence Models Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio Presented by Xinyuan Zhang April 26, 2018 Introduction
Web-Scale Multimedia: Optimizing LSH. Malcolm Slaney Yury Lifshits Junfeng He Y! Research
Web-Scale Multimedia: Optimizing LSH Malcolm Slaney Yury Li
An Unsupervised Model for Joint Phrase Alignment and Extraction
An Unsupervised Model for Joint Phrase Alignment and Extraction Graham Neubig 1,2, Taro Watanabe 2, Eiichiro Sumita 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 Graduate School of Informatics, Kyoto University
CS395T Visual Recognition and Search. Gautam S. Muralidhar
CS395T Visual Recognition and Search Gautam S. Muralidhar Today's Theme Unsupervised discovery of images Main motivation behind unsupervised discovery is that supervision is expensive Common tasks include
Alignment and Image Comparison. Erik Learned-Miller University of Massachusetts, Amherst
Alignment and Image Comparison Erik Learned-Miller University of Massachusetts, Amherst
What is Search For? CS 188: Artificial Intelligence. Constraint Satisfaction Problems Sep 14, 2015
CS 188: Artificial Intelligence Constraint Satisfaction Problems Sep 14, 2015 What is Search For? Assumptions about the world: a single agent, deterministic actions, fully observed state, discrete state space
Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin Katti
Cliffhanger: Scaling Performance Cliffs in Memory Caches [NSDI 2016] Cache OS: Data Center Dynamic Cache Management Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin Katti 1 Key-Value Caches are Essential
Programming Environments
Programming Environments There are several ways of creating a computer program Using an Integrated Development Environment (IDE) Using a text editor You should use the method you are most comfortable with.
Neural Networks for Machine Learning. Lecture 5a Why object recognition is difficult. Geoffrey Hinton with Nitish Srivastava Kevin Swersky
Neural Networks for Machine Learning Lecture 5a Why object recognition is difficult Geoffrey Hinton with Nitish Srivastava Kevin Swersky Things that make it hard to recognize objects Segmentation: Real scenes
DCU-UvA Multimodal MT System Report
DCU-UvA Multimodal MT System Report Iacer Calixto ADAPT Centre School of Computing Dublin City University Dublin, Ireland iacer.calixto@adaptcentre.ie Desmond Elliott ILLC University of Amsterdam Science
Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah
Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielsen: neuralnetworksanddeeplearning.com
Algorithms Lecture 11. UC Davis, ECS20, Winter Discrete Mathematics for Computer Science
UC Davis, ECS20, Winter 2017 Discrete Mathematics for Computer Science Prof. Raissa D'Souza (slides adopted from Michael Frank and Haluk Bingöl) Lecture 11 Algorithms 3.1-3.2 Algorithms Member of the House
(Refer Slide Time: 1:43)
(Refer Slide Time: 1:43) Digital Circuits and Systems Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture - 27 Pattern Detector So, we talked about Moore
Structured Prediction Basics
CS11-747 Neural Networks for NLP Structured Prediction Basics Graham Neubig Site https://phontron.com/class/nn4nlp2017/ A Prediction Problem I hate this movie I love this movie very good good neutral bad
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c Computing in the News At a laboratory in São Paulo,
Generalizing Map-Reduce
Generalizing Map-Reduce 1 Example: A Map-Reduce Graph map reduce map... reduce reduce map 2 Map-reduce is not a solution to every problem, not even every problem that profitably can use many compute
Problem Solving through Programming In C Prof. Anupam Basu Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur
Problem Solving through Programming In C Prof. Anupam Basu Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture - 04 Introduction to Programming Language Concepts
(Refer Slide Time: 05:25)
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering IIT Delhi Lecture 30 Applications of DFS in Directed Graphs Today we are going to look at more applications
Gradient Descent. Michail Michailidis & Patrick Maiden
Gradient Descent Michail Michailidis & Patrick Maiden Outline Motivation Gradient Descent Algorithm Issues & Alternatives Stochastic Gradient Descent Parallel Gradient Descent HOGWILD! Motivation It is good
Ways to implement a language
Interpreters Implementing PLs Most of the course is learning fundamental concepts for using PLs Syntax vs. semantics vs. idioms Powerful constructs like closures, first-class objects, iterators (streams),
Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012
Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted
Information Retrieval. Administrative. Statistical MT Overview. Problems for Statistical MT
Administrative Introduction to Information Retrieval CS457 Fall 2011 David Kauchak Projects Status 2 on Friday Paper next Friday work on the paper in parallel if you're not done with experiments by early
A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition
A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition Théodore Bluche, Hermann Ney, Christopher Kermorvant SLSP 14, Grenoble October
The Hitchhiker's Guide to TensorFlow:
The Hitchhiker's Guide to TensorFlow: Beyond Recurrent Neural Networks (sort of) Keith Davis @keithdavisiii iamthevastidledhitchhiker.github.io Topics Kohonen/Self-Organizing Maps LSTMs in TensorFlow GRU
Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © All rights reserved. Draft of September 23, 2018.
Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2018. All rights reserved. Draft of September 23, 2018. CHAPTER 9 Sequence Processing with Recurrent Networks Time will explain.
Novel Image Captioning
10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient II Used Materials Disclaimer: Much of the material and slides for this lecture
CS489/698: Intro to ML
CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun
Read & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition)
Read & Download (PDF Kindle) Data Structures And Other Objects Using Java (4th Edition) Data Structures and Other Objects Using Java is a gradual, "just-in-time" introduction to Data Structures for a CS2
Statistical Machine Translation Lecture 3. Word Alignment Models
Statistical Machine Translation Lecture 3 Word Alignment Models Stephen Clark based on slides by Philipp Koehn Statistical Modeling Mary did not slap the green witch Maria no daba una bofetada
Introduction to Scientific Computing
Introduction to Scientific Computing Dr Hanno Rein Last updated: October 12, 2018 1 Computers A computer is a machine which can perform a set of calculations. The purpose of this course is to give you
LSTM with Working Memory
LSTM with Working Memory Andrew Pulver Department of Computer Science University at Albany Email: apulver@albany.edu Siwei Lyu Department of Computer Science University at Albany Email: slyu@albany.edu