Recurrent Neural Networks

1 Recurrent Neural Networks Javier Béjar Deep Learning 2018/2019 Fall Master in Artificial Intelligence (FIB-UPC)

2 Introduction

3 Sequential data
Many problems are described by sequences: time series, video/audio processing, Natural Language Processing (translation, dialogue), bioinformatics (DNA/protein), and process control. Modeling the problem means extracting the dependencies among the elements of the sequence.
DL 2018/2019 Fall - MAI - FIB 1/49

4 Long time dependencies
Sequences can be modeled using non-sequential ML methods (e.g. sliding windows), but with limitations: all sequences must have the same length, the order of the elements always matters, and we cannot model dependencies longer than the chosen sequence length.
We need methods that explicitly model time dependencies and are capable of processing examples of arbitrary length and providing different mappings (one to many, many to one, many to many).
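As a minimal sketch of the sliding-window approach mentioned above, the following turns one series into fixed-length examples. The function name `sliding_windows` and the parameter `lag` are illustrative choices, not taken from the slides.

```python
def sliding_windows(series, lag):
    """Return (window, next_value) pairs built from a 1-D sequence."""
    pairs = []
    for start in range(len(series) - lag):
        window = series[start:start + lag]   # the `lag` previous values
        target = series[start + lag]         # the value to predict
        pairs.append((window, target))
    return pairs


series = [1, 2, 3, 4, 5, 6]
for window, target in sliding_windows(series, lag=3):
    print(window, "->", target)
```

Every example has exactly `lag` inputs, so any dependency that reaches further back than `lag` steps is invisible to the model, which is the limitation the slide points out.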

5 Input-Output mapping
Figure: the mapping of a usual NN compared with the mappings of a recurrent NN (one to many, many to one, many to many).

6 One to many - Image captioning
Example caption: "Black and white dog jumps over bar", from Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy, Li Fei-Fei.

7 Many to One - Sentiment Analysis
"My flight was just delayed, s**t": Negative
"Never again BA, thanks for the dreadful flight": Negative
"We arrived on time, yeehaaa!": Positive
"Another day, another flight": Neutral
"Efficient, quick, delightful, always with BA": Positive

8 Many to Many - Machine Translation
[How, many, programmers, for, changing, a, lightbulb, ?]
[Wie, viele, Programmierer, zum, Wechseln, einer, Glühbirne, ?]
[Combien, de, programmeurs, pour, changer, une, ampoule, ?]
[¿Cuántos, programadores, para, cambiar, una, bombilla, ?]
[Zenbat, bonbilla, bat, aldatzeko, programatzaileak, ?]

9 Recurrent Networks

10 Recurrent Neural Networks
RNNs are feed-forward NNs with edges that span adjacent time steps (recurrent edges). At each time step, nodes receive input from the current data and from the previous state. As a result, input data from previous time steps can influence the output at the current time step. RNNs are universal function approximators (Turing complete).

11 Recurrent Node

12 Recurrent Neural Networks
The input (x) is a vector of values for time t. The hidden node (h) stores the state. Weights are shared through time. At each step the computation uses the h of the previous step:
$$h^{(t+1)} = f(h^{(t)}, x^{(t+1)}; \theta) = f(f(h^{(t-1)}, x^{(t)}; \theta), x^{(t+1)}; \theta) = \cdots$$
We can think of an RNN as a deep network that stacks layers through time.

13 Training RNN (unfolding)

14 Activation Functions
There are different choices for the activation function that computes the hidden state. The hyperbolic tangent function (tanh) is a popular choice over the usual sigmoid function. Good results are also achieved using the rectified linear function (ReLU) instead.

15 RNN computation
$$a^{(t)} = b + W h^{(t-1)} + U x^{(t)} \quad (1)$$
$$h^{(t)} = \tanh(a^{(t)}) \quad (2)$$
$$y^{(t)} = c + V h^{(t)} \quad (3)$$
b and c are biases. An additional step can be added depending on the task.
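Equations (1) to (3) can be sketched in plain Python as below. This is only an illustration: the helper names (`matvec`, `rnn_step`, `rnn_forward`) and the tiny weight values are assumptions made for the example, and plain lists are used instead of a numerical library to keep the block dependency-free.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(h_prev, x, W, U, V, b, c):
    """One time step: equations (1), (2) and (3)."""
    Wh = matvec(W, h_prev)
    Ux = matvec(U, x)
    a = [b_k + wh_k + ux_k for b_k, wh_k, ux_k in zip(b, Wh, Ux)]   # (1)
    h = [math.tanh(a_k) for a_k in a]                               # (2)
    y = [c_k + vh_k for c_k, vh_k in zip(c, matvec(V, h))]          # (3)
    return h, y


def rnn_forward(xs, h0, W, U, V, b, c):
    """Apply the same weights at every step of the input sequence."""
    h, ys = h0, []
    for x in xs:
        h, y = rnn_step(h, x, W, U, V, b, c)
        ys.append(y)
    return h, ys


# Illustrative sizes: 2 hidden units, 1 input value, 1 output value.
W = [[0.5, 0.0], [0.0, 0.5]]
U = [[1.0], [-1.0]]
V = [[1.0, 1.0]]
b = [0.0, 0.0]
c = [0.0]
h_last, ys = rnn_forward([[1.0], [0.0]], [0.0, 0.0], W, U, V, b, c)
print(h_last, ys)
```

The loop reuses W, U and V at every step, which is the weight sharing through time described earlier.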

16 Training RNN
RNNs are usually trained using backpropagation. The computation is unfolded through the sequence to propagate the activations and to compute the gradient. This is known as Backpropagation Through Time (BPTT). Usually the input is limited in length to reduce the computational cost. This is known as Truncated BPTT, and it assumes that the influence of the past is limited to a time horizon.
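The length limit of Truncated BPTT can be illustrated by cutting a long sequence into chunks of at most a fixed horizon. This is a hedged sketch of the data side only: the function name and the `horizon` parameter are hypothetical, and a real training loop would run backpropagation inside each chunk.

```python
def split_for_truncated_bptt(sequence, horizon):
    """Cut a long sequence into consecutive chunks of at most `horizon` steps."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return [sequence[i:i + horizon] for i in range(0, len(sequence), horizon)]


chunks = split_for_truncated_bptt(list(range(10)), horizon=4)
print(chunks)
```

Gradients never cross a chunk boundary, so dependencies longer than `horizon` cannot be learned from the gradient signal.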

17 Recurrent NN unfolded

18 Recurrent NN (Regression/Classification)
Figure: the unfolded network with a single Loss node.

19 Recurrent NN (Sequence to Sequence)
Figure: the unfolded network with a Loss node at each time step.

20 Gradient problems
There are two main problems during training: the exploding gradient and the vanishing gradient. The problems appear because of the sharing of weights. Recurrent edge weights, in combination with the activation function, magnify (W > 1) or shrink (W < 1) the gradient exponentially with the length of the sequence. Gradient clipping and regularization are the usual solutions to the exploding gradient.
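A small numeric sketch of both effects, assuming a single scalar recurrent weight for simplicity (the slides do not fix a particular setting). It also shows clipping by norm, one common form of gradient clipping. The function names are illustrative.

```python
import math


def repeated_factor(w, steps):
    """Scale of a gradient after being multiplied by the same weight `steps` times."""
    return w ** steps


def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


print(repeated_factor(1.1, 50))   # grows with sequence length (exploding)
print(repeated_factor(0.9, 50))   # shrinks with sequence length (vanishing)
print(clip_by_norm([3.0, 4.0], max_norm=1.0))
```

Clipping bounds the size of an update but keeps its direction. It does nothing for the vanishing case, which is what motivates the gated architectures.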

21 Recurrent NN (Gradient problems)

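22 Recurrent NN (Gradient problems)
When the values are propagated forward and backward, the chain rule implies that the longer the sequence, the smaller the influence of the past:
$$\frac{\partial f_t}{\partial W} = \frac{\partial f_t}{\partial s_t}\frac{\partial s_t}{\partial W} = \frac{\partial f_t}{\partial s_t}\frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial W} = \cdots = \frac{\partial f_t}{\partial s_t}\frac{\partial s_t}{\partial s_{t-1}}\cdots\frac{\partial s_2}{\partial s_1}\frac{\partial s_1}{\partial W}$$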

23 Beyond Vanilla RNN
Learning long time dependencies is difficult for vanilla RNNs. More sophisticated recurrent architectures reduce the gradient problems. Gated RNNs introduce memory and gating mechanisms that decide when to store information in the state and how much the new information changes the state.

24 LSTMs

25 Long Short Term Memory units (LSTMs)
LSTMs specialize in learning long time dependencies. They are composed of a memory cell and control gates. The gates regulate how much the new information changes the state and flows to the next step. The gates are: forget gate, input gate, update gate, and output gate.

26 Long Short Term Memory units (LSTMs)
An LSTM propagates a hidden state ($h_t$) and a cell state ($c_t$). The hidden state acts as a short-term memory (large updates). The cell state acts as a long-term memory (small updates). Gates control updates from layer to layer. Information can flow between short-term and long-term memory.

27 LSTMs

28 Role of activation functions
Gates use the sigmoid and tanh functions as activation functions. Their role is to make fuzzy decisions. tanh squashes values to the range [-1, 1] (subtract, neutral, add). sigmoid squashes values to the range [0, 1] (closed, open).
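A quick check of the two ranges, as a sketch using only the standard library. The `sigmoid` helper is written out here because Python's `math` module does not provide one.

```python
import math


def sigmoid(z):
    """Logistic function: values near 0 mean 'closed', near 1 mean 'open'."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-5.0, 0.0, 5.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  tanh={math.tanh(z):+.4f}")
```

At z = 0 the sigmoid gate is half open (0.5) and tanh is neutral (0), which matches the fuzzy-decision reading above.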

32 LSTMs computations
$$f_t = \sigma(W_f [h_{t-1}, x_t]) \quad \text{(Forget)}$$
$$i_t = \sigma(W_i [h_{t-1}, x_t]) \quad \text{(Input)}$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t])$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad \text{(Update)}$$
$$o_t = \sigma(W_o [h_{t-1}, x_t])$$
$$h_t = o_t \odot \tanh(c_t) \quad \text{(Output)}$$
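The LSTM equations can be sketched as a single step function. This is a minimal illustration under stated assumptions: no bias terms (the slide shows none), plain Python lists, and hypothetical helper names. The all-zero weights are chosen only so the result is easy to verify by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lstm_step(h_prev, c_prev, x, Wf, Wi, Wc, Wo):
    """One LSTM step; each weight matrix acts on the concatenation [h_prev, x]."""
    hx = h_prev + x                                        # [h_{t-1}, x_t]
    f = [sigmoid(z) for z in matvec(Wf, hx)]               # forget gate
    i = [sigmoid(z) for z in matvec(Wi, hx)]               # input gate
    c_tilde = [math.tanh(z) for z in matvec(Wc, hx)]       # candidate cell state
    c = [f_k * cp_k + i_k * ct_k
         for f_k, cp_k, i_k, ct_k in zip(f, c_prev, i, c_tilde)]   # update
    o = [sigmoid(z) for z in matvec(Wo, hx)]               # output gate
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]   # new hidden state
    return h, c


# One hidden unit, one input value, all weights zero: every gate equals 0.5.
zeros = [[0.0, 0.0]]
h, c = lstm_step([0.0], [2.0], [1.0], zeros, zeros, zeros, zeros)
print(h, c)
```

With all gates at 0.5 and a zero candidate, the cell state is simply halved, which makes the role of the forget gate visible in isolation.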

33 GRUs

34 Gated Recurrent Units
GRUs reduce the complexity of LSTMs. They merge the forget and input gates into a single update gate. The update gate computes how the input and the previous state are combined. A reset gate controls the access to the previous state: near one, the previous state has more effect; near zero, the new (updated) state has more effect.

35 GRU
$$z_t = \sigma(W_z [h_{t-1}, x_t]) \quad \text{(update)}$$
$$r_t = \sigma(W_r [h_{t-1}, x_t]) \quad \text{(reset)}$$
$$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t])$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
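The GRU step can be sketched the same way. Again this assumes no bias terms and uses hypothetical helper names; the zero weights are only there to make the result checkable by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gru_step(h_prev, x, Wz, Wr, Wh):
    """One GRU step following the update, reset and candidate equations."""
    hx = h_prev + x                                         # [h_{t-1}, x_t]
    z = [sigmoid(v) for v in matvec(Wz, hx)]                # update gate
    r = [sigmoid(v) for v in matvec(Wr, hx)]                # reset gate
    reset_h = [r_k * h_k for r_k, h_k in zip(r, h_prev)]    # r_t * h_{t-1}
    h_tilde = [math.tanh(v) for v in matvec(Wh, reset_h + x)]
    return [(1.0 - z_k) * h_k + z_k * ht_k
            for z_k, h_k, ht_k in zip(z, h_prev, h_tilde)]


# One hidden unit, one input value, all weights zero: z = 0.5, candidate = 0.
zeros = [[0.0, 0.0]]
h = gru_step([0.8], [1.0], zeros, zeros, zeros)
print(h)
```

Compared with the LSTM there is one state vector instead of two and three weight matrices instead of four, which is the reduction in complexity the slide refers to.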

36 Pros/Cons

37 Vanilla RNN vs LSTMs vs GRUs
Empirically, vanilla RNNs underperform on complex tasks. LSTMs are widely used, whereas GRUs are more recent (2014). There are no theoretical arguments yet that settle the LSTM vs GRU question, and empirical studies do not shed light on it either (see references).

38 Vanilla RNN vs LSTMs vs GRUs
Jozefowicz, R., Zaremba, W., Sutskever, I. (2015). An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp ).
Chung, J., Gulcehre, C., Cho, K., Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:

39 Other RNN Variations

40 Bidirectional RNNs - Back from the future
In some domains it is easier to learn dependencies if information flows in both directions, for instance in domains where decisions depend on the whole sequence: part of speech tagging, machine translation, speech/handwriting recognition. An RNN can be split in two, so that sequences are processed in both directions.

41 Regular RNNs (only forward)

42 Bidirectional RNNs (forward-backward)

43 Bidirectional RNNs - Training
Both directions are independent (no interconnections). Backpropagation needs to follow the graph dependencies, so not all weights can be updated at the same time.
Propagating forward: first compute the forward and backward RNN passes from the inputs, then the outputs.
Propagating backward: first compute the forward and backward RNN passes from the outputs, then the inputs.
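The forward propagation order can be sketched with a toy scalar recurrence: run one RNN left to right, run an independent one right to left, then pair the two states at each time step. The recurrence `h_t = tanh(w * h_{t-1} + x_t)` and the weights `w_fwd` and `w_bwd` are assumptions chosen for brevity, not the lecture's model.

```python
import math


def run_rnn(xs, w, h0=0.0):
    """Toy scalar recurrence h_t = tanh(w * h_{t-1} + x_t); returns all states."""
    states, h = [], h0
    for x in xs:
        h = math.tanh(w * h + x)
        states.append(h)
    return states


def bidirectional_states(xs, w_fwd=0.5, w_bwd=0.3):
    """Pair the forward and backward states at each original time step."""
    forward = run_rnn(xs, w_fwd)
    # Process the reversed sequence, then flip back to the original time order.
    backward = run_rnn(xs[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))


xs = [1.0, 0.0, -1.0]
states = bidirectional_states(xs)
for t, (f, b) in enumerate(states):
    print(t, round(f, 4), round(b, 4))
```

The two passes share no state, so they can be computed in either order. Only the output layer, which would consume each pair, needs both to be finished.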

44 Sequence to sequence
Direct sequence association: the input sequence (N) and the output sequence (N) have the same length. All the per-step outputs of the unfolded RNN are used as the output sequence. Training is straightforward.

45 Sequence to sequence
Encoder-decoder architecture: input and output sequences can have different lengths. The encoder RNN summarizes the input in a coding state, and the decoder RNN generates the output from that state. There are different options for connecting encoder and decoder (direct, peeking, attention) and for training (teacher forcing).
Inference: the sequence is generated element by element, using the output of the previous step or using a beam search.

46 Encoder-Decoder (Plain)
Figure: the encoder reads "Today it is raining" and passes its encoder state to the decoder, which starts from <start> and generates "hoy está lloviendo <eos>".

47 Encoder-Decoder (Plain) - Inference
When generating a sequence of discrete values, the greedy approach may not be the best policy. A limited exploration of the branching alternatives is needed (beam search). The sequence generated is the one with the largest joint probability.
Figure (search tree with probabilities): Hoy (0.6), Ahora (0.3); Esta (0.7), Estaba (0.2); Esta (0.8), Estaba (0.1); Lloviendo (0.9) on each of the four branches.
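A minimal beam search sketch follows. The probabilities are the ones printed on the slide, but their grouping into a tree (Hoy and Ahora as first words, each followed by Esta or Estaba, then Lloviendo) is an assumption about the lost figure. `beam_search`, `next_probs` and `TREE` are illustrative names; in a real decoder the probabilities would come from the decoder RNN.

```python
def beam_search(next_probs, beam_width, max_len):
    """Keep the `beam_width` most probable prefixes at every step.

    `next_probs(prefix)` returns a dict mapping each next token to its probability.
    """
    beams = [((), 1.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, prob in beams:
            options = next_probs(prefix)
            if not options:                       # nothing to expand: keep as is
                candidates.append((prefix, prob))
                continue
            for token, p in options.items():
                candidates.append((prefix + (token,), prob * p))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Assumed tree structure built from the probabilities shown on the slide.
TREE = {
    (): {"Hoy": 0.6, "Ahora": 0.3},
    ("Hoy",): {"Esta": 0.7, "Estaba": 0.2},
    ("Ahora",): {"Esta": 0.8, "Estaba": 0.1},
}


def next_probs(prefix):
    if len(prefix) == 2:
        return {"Lloviendo": 0.9}
    return TREE.get(prefix, {})


beams = beam_search(next_probs, beam_width=2, max_len=3)
best_sequence, best_probability = beams[0]
print(best_sequence, best_probability)
```

With `beam_width=1` this reduces to greedy decoding. Wider beams trade computation for a better chance of finding the sequence with the largest joint probability.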

48 Encoder-Decoder (Peeking)
Figure: the same example ("Today it is raining" translated to "hoy está lloviendo <eos>", starting from <start>), showing the encoder state and the peeking connection between encoder and decoder.

49 Encoder-Decoder (Attention)
Figure: the same example ("Today it is raining" translated to "hoy está lloviendo <eos>", starting from <start>), showing the encoder state and an attention component between encoder and decoder.

50 Encoder-Decoder (Teacher Forcing)
Figure: the encoder reads the input sequence "Today it is raining" and produces the encoder state. The decoder receives "<start> hoy está lloviendo" as input and is trained to produce the output sequence "hoy está lloviendo <eos>".
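The shifted decoder input and output of teacher forcing can be built as below. This is a small sketch: the function name is hypothetical, and the `<start>` and `<eos>` markers are the ones shown on the slide.

```python
def teacher_forcing_pair(target_tokens, start="<start>", end="<eos>"):
    """Build the decoder input and the expected decoder output for one example."""
    decoder_input = [start] + target_tokens    # ground truth, shifted right
    decoder_target = target_tokens + [end]     # what the decoder must predict
    return decoder_input, decoder_target


decoder_input, decoder_target = teacher_forcing_pair(["hoy", "está", "lloviendo"])
for given, expected in zip(decoder_input, decoder_target):
    print(f"{given:>10} -> {expected}")
```

During training the decoder sees the correct previous token at every step rather than its own prediction. At inference time the previous prediction has to be fed back instead.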

51 Augmented neural networks
RNNs are Turing complete, but this is difficult to achieve in practice. New architectures include data structures to store information (read/write operations), RNNs that control the operations, and attention mechanisms.
Graves, Wayne, Danihelka. Neural Turing Machines. arXiv preprint arXiv:
C. Olah, S. Carter. Attention and Augmented Recurrent Neural Networks. Distill, Sept 9, 2016.

52 Applications

53 Large variety of applications
Time series prediction. NLP: POS tagging, machine translation, question answering. Reasoning. Multimodal: caption generation for images/video. Speech recognition/generation. Reinforcement learning.

54 Guided Laboratory

55 Sequence prediction
Sequence to value: predicting the next step of a time series, classification of time series, predicting sentiment from tweets, text generation (predicting characters).
Sequence to sequence: learning to add.

56 Task 1: Air Quality prediction
Time series regression. Dataset: air quality measured every hour. Goal: predict the wind speed for the next hour.
Setup: Train/ 8784 Test/6 lag, 32 neurons, 1 layer, 30 epochs. Training time: 30 sec.
Figure labels: TS Window t, RNN, RNN, Dense, TS t+1.

57 Task 2: Electric Devices
Time series classification. Dataset: daily power consumption of household devices (7 classes, 96 attributes). Goal: predict the household device class.
Setup: 64 neurons, 2 layers, 30 epochs. Training time: 1 min.
Figure labels: Power Consumption, RNN, RNN, Dense, Dev Class.

58 Task 3: Sentiment analysis
Tweets (neg, pos, neutral), treated as sequences of words. Preprocess: generate the vocabulary and recode the sequences. Embed the sequences into a more convenient space.
Setup: 5000 words, 40 dim embedding, 64 neurons, 1 layer, 50 epochs. Training time: 3 min.
Figure labels: Tweet Sequence, Embedding, RNN, RNN, Dense, Sentiment.

59 Task 4: Text Generation
Poetry text. Character prediction from text windows. Preprocess: sequences of characters with one hot encoding. Text is generated by predicting characters iteratively.
Setup: poetry1/50 chars/3 skip, 64 Neu/1 Ly/10 it/10 ep it. Training time: 23 min.
Figure labels: Text Window, One hot enc, RNN, RNN, Dense, Character.
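The preprocessing for this task can be sketched as follows: slide a window over the text with a fixed skip and one hot encode the characters. The function names are illustrative, and the short text and small window are chosen so the output is readable. The slide's setup corresponds to `window=50` and `skip=3`.

```python
def char_windows(text, window, skip):
    """Return (window_text, next_char) pairs, moving `skip` characters each time."""
    samples = []
    for start in range(0, len(text) - window, skip):
        samples.append((text[start:start + window], text[start + window]))
    return samples


def one_hot(char, vocabulary):
    """Encode one character as a 0/1 vector over the vocabulary."""
    return [1 if c == char else 0 for c in vocabulary]


text = "hello world"
vocabulary = sorted(set(text))
samples = char_windows(text, window=4, skip=3)
for window_text, next_char in samples:
    print(repr(window_text), "->", repr(next_char), one_hot(next_char, vocabulary))
```

At generation time the predicted character is appended to the window and the oldest character is dropped, which is the iterative prediction mentioned above.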

60 Task 5: Learning to add
Predicting addition results from text. Input sequence: NUMBER+NUMBER. Output sequence: NUMBER. Preprocess: sequences of characters with one hot encoding.
Setup: ex/3 digits, 128 Neu/1 Ly/50 ep. Training time: 5 min 30 sec.
Architecture: RNN Encode + RNN Decode.
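A sketch of how one training pair for this task could be prepared. Padding both strings with spaces to a fixed length, and the character set `CHARS`, are assumptions made so that every example has the same shape. The slides only state that sequences of characters are one hot encoded.

```python
CHARS = "0123456789+ "   # assumed alphabet: digits, plus sign, padding space


def make_example(a, b, digits=3):
    """Return the question 'a+b' and its answer, padded to fixed lengths."""
    question = f"{a}+{b}".ljust(2 * digits + 1)   # at most 3 + 1 + 3 characters
    answer = str(a + b).ljust(digits + 1)         # a sum of 3-digit numbers has up to 4 digits
    return question, answer


def encode(text):
    """One hot encode each character of `text` over CHARS."""
    return [[1 if c == ch else 0 for c in CHARS] for ch in text]


question, answer = make_example(12, 345)
encoded_question = encode(question)
print(repr(question), repr(answer), len(encoded_question), len(encoded_question[0]))
```

Because the question has 7 positions and the answer has 4, the input and output lengths differ, which is why an encoder plus decoder arrangement is used for this task.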

61 Task 5: Learning to add
Figure labels: One Hot enc (XXX+XXX), RNN (enc), RepeatVector, RNN (dec), TimeDistributed, Dense, XXXX.


On the Efficiency of Recurrent Neural Network Optimization Algorithms On the Efficiency of Recurrent Neural Network Optimization Algorithms Ben Krause, Liang Lu, Iain Murray, Steve Renals University of Edinburgh Department of Informatics,,

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Supervised Learning in Neural Networks (Part 2)

Supervised Learning in Neural Networks (Part 2) Supervised Learning in Neural Networks (Part 2) Multilayer neural networks (back-propagation training algorithm) The input signals are propagated in a forward direction on a layer-bylayer basis. Learning

More information

Title. Author(s)Noguchi, Wataru; Iizuka, Hiroyuki; Yamamoto, Masahit. CitationEAI Endorsed Transactions on Security and Safety, 16

Title. Author(s)Noguchi, Wataru; Iizuka, Hiroyuki; Yamamoto, Masahit. CitationEAI Endorsed Transactions on Security and Safety, 16 Title Proposing Multimodal Integration Model Using LSTM an Author(s)Noguchi, Wataru; Iizuka, Hiroyuki; Yamamoto, Masahit CitationEAI Endorsed Transactions on Security and Safety, 16 Issue Date 216-12-28

More information

Prediction of Pedestrian Trajectories Final Report

Prediction of Pedestrian Trajectories Final Report Prediction of Pedestrian Trajectories Final Report Mingchen Li (limc), Yiyang Li (yiyang7), Gendong Zhang (zgdsh29) December 15, 2017 1 Introduction As the industry of automotive vehicles growing rapidly,

More information
