Recurrent Neural Networks
1 Recurrent Neural Networks
Javier Béjar
Deep Learning 2018/2019 Fall
Master in Artificial Intelligence (FIB-UPC)
2 Introduction
3 Sequential data
Many problems are described by sequences:
- Time series
- Video/audio processing
- Natural Language Processing (translation, dialogue)
- Bioinformatics (DNA/protein)
- Process control
Modeling the problem means extracting the dependencies among the elements of the sequence.
DL 2018/2019 Fall - MAI - FIB
4 Long-term dependencies
Sequences can be modeled using non-sequential ML methods (e.g. sliding windows), but:
- All sequences must have the same length
- The order of the elements always matters
- We cannot model dependencies longer than the chosen window length
We need methods that explicitly model time dependencies, capable of:
- Processing examples of arbitrary length
- Providing different mappings (one to many, many to one, many to many)
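The sliding-window approach mentioned above can be sketched as follows. This is a minimal illustration of its fixed-length limitation; the function and variable names are hypothetical, not from the slides:

```python
import numpy as np

def sliding_windows(series, length):
    # Build fixed-length inputs and next-step targets for a
    # non-sequential model: every example is forced to `length` values
    X = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = np.array(series[length:])
    return X, y

X, y = sliding_windows(list(range(10)), 3)
# X[0] is [0, 1, 2] and its target y[0] is 3; no dependency longer
# than 3 steps back can be represented
```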
5 Input-Output mapping
[Figure: input-output mappings of a usual NN vs a recurrent NN: one to many, many to one, many to many]
6 One to many - Image captioning
Example caption: "Black and white dog jumps over bar"
From "Deep Visual-Semantic Alignments for Generating Image Descriptions", Andrej Karpathy, Li Fei-Fei
7 Many to One - Sentiment Analysis
- "My flight was just delayed, s**t" -> Negative
- "Never again BA, thanks for the dreadful flight" -> Negative
- "We arrived on time, yeehaaa!" -> Positive
- "Another day, another flight" -> Neutral
- "Efficient, quick, delightful, always with BA" -> Positive
8 Many to Many - Machine Translation
- English: [How, many, programmers, for, changing, a, lightbulb, ?]
- German: [Wie, viele, Programmierer, zum, Wechseln, einer, Glühbirne, ?]
- French: [Combien, de, programmeurs, pour, changer, une, ampoule, ?]
- Spanish: [¿Cuántos, programadores, para, cambiar, una, bombilla, ?]
- Basque: [Zenbat, bonbilla, bat, aldatzeko, programatzaileak, ?]
9 Recurrent Networks
10 Recurrent Neural Networks
RNNs are feed-forward NNs with edges that span adjacent time steps (recurrent edges)
At each time step, nodes receive input from the current data and from the previous state
This means that input data from previous time steps can influence the output at the current time step
RNNs are universal function approximators (Turing complete)
11 Recurrent Node
[Figure: a recurrent node]
12 Recurrent Neural Networks
The input (x) is a vector of values for time t
The hidden node (h) stores the state
Weights are shared through time
Each step the computation uses the previous step's state:
h_{t+1} = f(h_t, x_{t+1}; θ) = f(f(h_{t-1}, x_t; θ), x_{t+1}; θ) = ...
We can think of an RNN as a deep network that stacks layers through time
13 Training RNN (unfolding)
[Figure: the recurrent network unfolded through time]
14 Activation Functions
There are different choices for the activation function used to compute the hidden state:
- The hyperbolic tangent (tanh) is a popular choice over the usual sigmoid function
- Good results are also achieved using the rectified linear function (ReLU)
15 RNN computation
a_t = b + W h_{t-1} + U x_t   (1)
h_t = tanh(a_t)               (2)
y_t = c + V h_t               (3)
b and c are biases; an additional step can be added depending on the task
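The three equations above can be sketched as a forward pass over a whole sequence. This is a minimal numpy illustration; the dimensions and the random weight initialization are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
U = rng.normal(size=(n_hid, n_in)) * 0.1   # input-to-hidden weights
W = rng.normal(size=(n_hid, n_hid)) * 0.1  # recurrent (shared through time)
V = rng.normal(size=(n_out, n_hid)) * 0.1  # hidden-to-output weights
b, c = np.zeros(n_hid), np.zeros(n_out)    # biases

def rnn_forward(xs):
    h = np.zeros(n_hid)                    # initial state
    ys = []
    for x in xs:
        a = b + W @ h + U @ x              # a_t = b + W h_{t-1} + U x_t
        h = np.tanh(a)                     # h_t = tanh(a_t)
        ys.append(c + V @ h)               # y_t = c + V h_t
    return np.array(ys), h

ys, h_last = rnn_forward(rng.normal(size=(5, n_in)))
```

Note that the same U, W, V are reused at every step, which is exactly the weight sharing described above.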
16 Training RNN
RNNs are usually trained using backpropagation
The computation is unfolded through the sequence to propagate the activations and compute the gradient
This is known as Backpropagation Through Time (BPTT)
Usually the input is limited in length to reduce computational cost; this is known as Truncated BPTT
It assumes that the influence of the past is limited to a time horizon
17 Recurrent NN unfolded
[Figure: the unfolded computation graph]
18 Recurrent NN (Regression/Classification)
[Figure: unfolded RNN with a single loss at the final step]
19 Recurrent NN (Sequence to Sequence)
[Figure: unfolded RNN with a loss at every step]
20 Gradient problems
Two main problems appear during training:
- Exploding gradient
- Vanishing gradient
The problems appear because of the sharing of weights
The recurrent edge weights, in combination with the activation function, magnify (|W| > 1) or shrink (|W| < 1) the gradient exponentially with the length of the sequence
Clipping gradients and regularization are the usual solutions to exploding gradients
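The gradient clipping mentioned above can be sketched by rescaling all gradients when their global norm is too large. A minimal illustration; the function name and threshold are hypothetical:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Rescale every gradient by the same factor when the global norm
    # (norm of all gradients concatenated) exceeds max_norm
    norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads

grads = [np.array([3.0, 4.0])]            # global norm is 5
clipped = clip_by_global_norm(grads, 1.0)  # rescaled to norm 1
```

Rescaling (rather than clipping each component) preserves the direction of the gradient while bounding the size of the update.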
21 Recurrent NN (Gradient problems)
[Figure: gradient magnitude shrinking or growing along the unfolded network]
22 Recurrent NN (Gradient problems)
Applying the chain rule when propagating the values forward and backward, the longer the sequence, the smaller the influence of the past:
∂f_t/∂W = (∂f_t/∂s_t)(∂s_t/∂W) = (∂f_t/∂s_t)(∂s_t/∂s_{t-1})(∂s_{t-1}/∂W) = ... = (∂f_t/∂s_t)(∂s_t/∂s_{t-1}) ··· (∂s_2/∂s_1)(∂s_1/∂W)
23 Beyond Vanilla RNN
Learning long-term dependencies is difficult for vanilla RNNs
More sophisticated recurrent architectures reduce the gradient problems
Gated RNNs introduce memory and gating mechanisms that decide:
- When to store information in the state
- How much new information changes the state
24 LSTMs
25 Long Short-Term Memory units (LSTMs)
LSTMs specialize in learning long-term dependencies
They are composed of a memory cell and control gates
Gates regulate how much the new information changes the state and flows to the next step:
- Forget gate
- Input gate
- Update gate
- Output gate
26 Long Short-Term Memory units (LSTMs)
An LSTM propagates a hidden state (h_t) and a cell state (c_t)
The hidden state acts as a short-term memory (large updates)
The cell state acts as a long-term memory (small updates)
Gates control the updates from layer to layer
Information can flow between short-term and long-term memory
27 LSTMs
[Figure: LSTM cell diagram]
28 Role of activation functions
Gates use the sigmoid and tanh functions as activations
Their role is to perform fuzzy decisions:
- tanh: squashes values to the range [-1, 1] (subtract, neutral, add)
- sigmoid: squashes values to the range [0, 1] (closed, open)
29-32 LSTMs computations
f_t = σ(W_f [h_{t-1}, x_t])          (Forget)
i_t = σ(W_i [h_{t-1}, x_t])          (Input)
c̃_t = tanh(W_c [h_{t-1}, x_t])      (Candidate)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t     (Update)
o_t = σ(W_o [h_{t-1}, x_t])
h_t = o_t ∘ tanh(c_t)                (Output)
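The LSTM computations above can be sketched as a single step function. A minimal numpy illustration; the dimensions and random weights are hypothetical, and bias terms are omitted as on the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wc, Wo):
    z = np.concatenate([h_prev, x])      # [h_{t-1}, x_t]
    f = sigmoid(Wf @ z)                  # forget gate
    i = sigmoid(Wi @ z)                  # input gate
    c_cand = np.tanh(Wc @ z)             # candidate cell state
    c = f * c_prev + i * c_cand          # update: mix old and new memory
    o = sigmoid(Wo @ z)                  # output gate
    h = o * np.tanh(c)                   # new hidden (short-term) state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wf, Wi, Wc, Wo = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid),
                 Wf, Wi, Wc, Wo)
```

The additive cell-state update (f ∘ c_{t-1} + i ∘ c̃_t) is what lets gradients flow over long spans without being repeatedly squashed.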
33 GRUs
34 Gated Recurrent Units
GRUs reduce the complexity of LSTMs
They unify the forget and input gates into a single update gate
The update gate computes how the input and the previous state are combined
A reset gate controls the access to the previous state:
- Near one: the previous state has more effect
- Near zero: the new (updated) state has more effect
35 GRU
z_t = σ(W_z [h_{t-1}, x_t])            (Update)
r_t = σ(W_r [h_{t-1}, x_t])            (Reset)
h̃_t = tanh(W_h [r_t ∘ h_{t-1}, x_t])
h_t = (1 - z_t) ∘ h_{t-1} + z_t ∘ h̃_t
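The GRU equations above can be sketched the same way as the LSTM step. A minimal numpy illustration; dimensions and random weights are hypothetical, biases omitted as on the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh):
    z = sigmoid(Wz @ np.concatenate([h_prev, x]))           # update gate
    r = sigmoid(Wr @ np.concatenate([h_prev, x]))           # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_cand                    # interpolate old/new

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wz, Wr, Wh = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(3))
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), Wz, Wr, Wh)
```

Compared with the LSTM sketch there is no separate cell state and one fewer gate, which is the complexity reduction the slide refers to.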
36 Pros/Cons
37 Vanilla RNN vs LSTMs vs GRUs
Empirically, vanilla RNNs underperform on complex tasks
LSTMs are widely used, while GRUs are more recent (2014)
There are not yet theoretical arguments that settle the LSTMs vs GRUs question
Empirical studies do not shed light on the question either (see references)
38 Vanilla RNN vs LSTMs vs GRUs
Jozefowicz, R., Zaremba, W., Sutskever, I. (2015). An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15).
Chung, J., Gulcehre, C., Cho, K., Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint.
39 Other RNN Variations
40 Bidirectional RNNs - Back from the future
In some domains it is easier to learn dependencies if information flows in both directions
For instance, domains where decisions depend on the whole sequence:
- Part-of-speech tagging
- Machine translation
- Speech/handwriting recognition
An RNN can be split in two, so sequences are processed in both directions
41 Regular RNNs (only forward)
[Figure: forward-only unfolded RNN]
42 Bidirectional RNNs (forward-backward)
[Figure: unfolded RNN with a forward chain and a backward chain over the same sequence]
43 Bidirectional RNNs - Training
Both directions are independent (no interconnections)
Backpropagation must follow the graph dependencies, so not all weights can be updated at the same time
Forward propagation: first compute the forward and backward RNN passes from the inputs, then the outputs
Backward propagation: first compute the forward and backward RNN passes from the outputs, then the inputs
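The independence of the two directions can be sketched with two plain RNN passes whose states are concatenated per step. A minimal numpy illustration; all names and dimensions are hypothetical:

```python
import numpy as np

def run_forward(xs, W, U):
    # Plain tanh RNN pass over the sequence, returning all hidden states
    h = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W @ h + U @ x)
        hs.append(h)
    return hs

def bidirectional(xs, Wf, Uf, Wb, Ub):
    fwd = run_forward(xs, Wf, Uf)              # left-to-right pass
    bwd = run_forward(xs[::-1], Wb, Ub)[::-1]  # right-to-left, re-aligned
    # The passes share no weights or states; each output position now
    # sees both the past (fwd) and the future (bwd) of the sequence
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
xs = list(rng.normal(size=(5, 3)))
Wf, Wb = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
Uf, Ub = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
states = bidirectional(xs, Wf, Uf, Wb, Ub)
```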
44 Sequence to sequence - Direct sequence association
Input and output sequences have the same length (N)
All the outputs of the BPTT are used for the output
Training is straightforward
[Figure: input sequence of length N mapped element-wise through RNNs to an output sequence of length N]
45 Sequence to sequence - Encoder-decoder architecture
Input and output sequences can have different lengths
An encoder RNN summarizes the input in a coding state
A decoder RNN generates the output from that state
There are different options for connecting encoder and decoder (direct, peeking, attention) and for training (teacher forcing)
Inference: the sequence is generated element by element using the output of the previous step, or using a beam search
46 Encoder-Decoder (Plain)
[Figure: the encoder reads "Today it is raining"; its final state initializes the decoder, which generates "<start> hoy está lloviendo <eos>"]
47 Encoder-Decoder (Plain) - Inference
When generating a sequence of discrete values, the greedy approach may not be the best policy
A limited exploration of the branching alternatives is needed (beam search)
The sequence with the largest joint probability is the one generated
Example beam: Hoy (0.6) -> Esta (0.7) | Estaba (0.2); Ahora (0.3) -> Esta (0.8) | Estaba (0.1); all branches followed by Lloviendo (0.9)
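Beam search over such branching alternatives can be sketched as follows. Simplified for illustration: the per-step probabilities here are independent of the prefix, whereas a real decoder conditions each step on the sequence chosen so far; the example tokens and probabilities are hypothetical:

```python
import math

def beam_search(step_probs, width=2):
    # step_probs: for each step, a dict mapping token -> probability
    beams = [([], 0.0)]                       # (sequence, log joint probability)
    for probs in step_probs:
        candidates = [(seq + [tok], lp + math.log(p))
                      for seq, lp in beams
                      for tok, p in probs.items()]
        # Keep only the `width` most probable partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams[0][0]                        # sequence with largest joint prob

steps = [{"Hoy": 0.6, "Ahora": 0.3},
         {"está": 0.8, "estaba": 0.1},
         {"lloviendo": 0.9}]
best = beam_search(steps, width=2)            # ["Hoy", "está", "lloviendo"]
```

With width=1 this degenerates to the greedy policy; a wider beam trades computation for a better approximation of the most probable sequence.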
48 Encoder-Decoder (Peeking)
[Figure: as in the plain architecture, but the encoder state is also fed to every decoder step]
49 Encoder-Decoder (Attention)
[Figure: the decoder attends to the encoder states through an attention mechanism at each step]
50 Encoder-Decoder (Teacher Forcing)
[Figure: during training the decoder receives the ground-truth previous tokens ("<start> hoy está lloviendo") as inputs instead of its own predictions, and outputs "hoy está lloviendo <eos>"]
51 Augmented neural networks
RNNs are Turing complete, but this is difficult to exploit in practice
New architectures include:
- Data structures to store information (read/write operations)
- RNNs that control the operations
- Attention mechanisms
Graves, Wayne, Danihelka. Neural Turing Machines. arXiv preprint.
C. Olah, S. Carter. Attention and Augmented Recurrent Neural Networks. Distill, Sept 9, 2016.
52 Applications
53 Large variety of applications
- Time series prediction
- NLP: POS tagging, machine translation, question answering
- Reasoning
- Multimodal: caption generation for images/video
- Speech recognition/generation
- Reinforcement learning
54 Guided Laboratory
55 Sequence prediction
Sequence to value:
- Predicting the next step of a time series
- Classification of time series
- Predicting sentiment from tweets
- Text generation (predicting characters)
Sequence to sequence:
- Learning to add
56 Task 1: Air Quality prediction
Time series regression
Dataset: air quality, measured every hour
Goal: predict the wind speed for the next hour
Training time: Train/8784, Test/6 lag, 32 neurons / 1 layer / 30 epochs, 30 sec
Architecture: time series window at t -> RNN -> RNN -> Dense -> value at t+1
57 Task 2: Electric Devices
Time series classification
Dataset: daily power consumption of household devices (7 classes, 96 attributes)
Goal: predict the household device class
Training time: 64 neurons / 2 layers / 30 epochs, 1 min
Architecture: power consumption series -> RNN -> RNN -> Dense -> device class
58 Task 3: Sentiment analysis
Dataset: tweets (negative, positive, neutral), treated as sequences of words
Preprocess: generate a vocabulary, recode the sequences
Embed the sequences into a more convenient space
Training time: 5000 words / 40-dim embedding, 64 neurons / 1 layer / 50 epochs, 3 min
Architecture: tweet sequence -> Embedding -> RNN -> RNN -> Dense -> sentiment
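The vocabulary-and-recoding preprocessing step can be sketched as follows. A minimal illustration; the function names, the reserved index 0 for out-of-vocabulary words, and the example tweets are hypothetical:

```python
from collections import Counter

def build_vocab(texts, max_words):
    # Index the most frequent words; index 0 is reserved for
    # out-of-vocabulary words
    counts = Counter(w for t in texts for w in t.split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_words))}

def encode(text, vocab):
    # Recode a tweet as a sequence of integer word indices
    return [vocab.get(w, 0) for w in text.split()]

tweets = ["great flight thanks", "never again thanks", "great crew"]
vocab = build_vocab(tweets, max_words=5000)
seq = encode("great flight again", vocab)
```

The resulting integer sequences are what an embedding layer maps into the denser space the slide mentions.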
59 Task 4: Text Generation
Dataset: poetry text
Character prediction from text windows
Preprocess: sequences of characters, one-hot encoding
Text is generated by predicting characters iteratively
Training time: poetry1 / 50 chars / 3 skip, 64 neurons / 1 layer / 10 iterations / 10 epochs per iteration, 23 min
Architecture: text window -> one-hot encoding -> RNN -> RNN -> Dense -> character
60 Task 5: Learning to add
Predicting addition results from text
Input sequence: NUMBER+NUMBER; output sequence: NUMBER
Preprocess: sequences of characters, one-hot encoding
Training time: examples of 3 digits, 128 neurons / 1 layer / 50 epochs, 5 min 30 sec
Architecture: RNN encoder + RNN decoder
61 Task 5: Learning to add
Architecture: "XXX+XXX" -> one-hot encoding -> RNN (encoder) -> RepeatVector -> RNN (decoder) -> TimeDistributed Dense -> "XXXX"
Gated Recurrent Models Stephan Gouws & Richard Klein Outline Part 1: Intuition, Inference and Training Building intuitions: From Feedforward to Recurrent Models Inference in RNNs: Fprop Training in RNNs:
More informationIntroducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit
Journal of Machine Learning Research 6 205) 547-55 Submitted 7/3; Published 3/5 Introducing CURRENNT: The Munich Open-Source CUDA RecurREnt Neural Network Toolkit Felix Weninger weninger@tum.de Johannes
More informationNeural Network Neurons
Neural Networks Neural Network Neurons 1 Receives n inputs (plus a bias term) Multiplies each input by its weight Applies activation function to the sum of results Outputs result Activation Functions Given
More informationStructured Attention Networks
Structured Attention Networks Yoon Kim Carl Denton Luong Hoang Alexander M. Rush HarvardNLP 1 Deep Neural Networks for Text Processing and Generation 2 Attention Networks 3 Structured Attention Networks
More informationNeural Networks (pp )
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.
More informationAn Online Sequence-to-Sequence Model Using Partial Conditioning
An Online Sequence-to-Sequence Model Using Partial Conditioning Navdeep Jaitly Google Brain ndjaitly@google.com David Sussillo Google Brain sussillo@google.com Quoc V. Le Google Brain qvl@google.com Oriol
More informationDeep Learning with Tensorflow AlexNet
Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification
More informationTransition-Based Dependency Parsing with Stack Long Short-Term Memory
Transition-Based Dependency Parsing with Stack Long Short-Term Memory Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith Association for Computational Linguistics (ACL), 2015 Presented
More informationNovel Image Captioning
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationDeep Neural Networks Applications in Handwriting Recognition
Deep Neural Networks Applications in Handwriting Recognition 2 Who am I? Théodore Bluche PhD defended at Université Paris-Sud last year Deep Neural Networks for Large Vocabulary Handwritten
More informationAutoencoder. Representation learning (related to dictionary learning) Both the input and the output are x
Deep Learning 4 Autoencoder, Attention (spatial transformer), Multi-modal learning, Neural Turing Machine, Memory Networks, Generative Adversarial Net Jian Li IIIS, Tsinghua Autoencoder Autoencoder Unsupervised
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationDeep Learning based Authorship Identification
Deep Learning based Authorship Identification Chen Qian Tianchang He Rao Zhang Department of Electrical Engineering Stanford University, Stanford, CA 94305 cqian23@stanford.edu th7@stanford.edu zhangrao@stanford.edu
More informationData Mining. Neural Networks
Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most
More informationSemantic image search using queries
Semantic image search using queries Shabaz Basheer Patel, Anand Sampat Department of Electrical Engineering Stanford University CA 94305 shabaz@stanford.edu,asampat@stanford.edu Abstract Previous work,
More informationDeep neural networks II
Deep neural networks II May 31 st, 2018 Yong Jae Lee UC Davis Many slides from Rob Fergus, Svetlana Lazebnik, Jia-Bin Huang, Derek Hoiem, Adriana Kovashka, Why (convolutional) neural networks? State of
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More informationA Deep Learning primer
A Deep Learning primer Riccardo Zanella r.zanella@cineca.it SuperComputing Applications and Innovation Department 1/21 Table of Contents Deep Learning: a review Representation Learning methods DL Applications
More informationGrounded Compositional Semantics for Finding and Describing Images with Sentences
Grounded Compositional Semantics for Finding and Describing Images with Sentences R. Socher, A. Karpathy, V. Le,D. Manning, A Y. Ng - 2013 Ali Gharaee 1 Alireza Keshavarzi 2 1 Department of Computational
More informationStructured Attention Networks
Structured Attention Networks Yoon Kim Carl Denton Luong Hoang Alexander M. Rush HarvardNLP ICLR, 2017 Presenter: Chao Jiang ICLR, 2017 Presenter: Chao Jiang 1 / Outline 1 Deep Neutral Networks for Text
More informationRNNs in TensorFlow. CS 20SI: TensorFlow for Deep Learning Research Lecture 11 2/22/2017
RNNs in TensorFlow CS 20SI: TensorFlow for Deep Learning Research Lecture 11 2/22/2017 1 2 Beat the World s Best at Super Smash Bros Beating the World s Best at Super Smash Bros. Melee with Deep Reinforcement
More informationLecture 7: Neural network acoustic models in speech recognition
CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2017 Lecture 7: Neural network acoustic models in speech recognition Outline Hybrid acoustic modeling overview Basic
More informationSentiment Classification of Food Reviews
1 2 3 4 5 6 7 8 9 10 11 12 13 14 Sentiment Classification of Food Reviews Hua Feng Ruixi Lin Department of Electrical Engineering Department of Electrical Engineering Stanford University Stanford University
More informationDifferentiable Data Structures (and POMDPs)
Differentiable Data Structures (and POMDPs) Yarin Gal & Rowan McAllister February 11, 2016 Many thanks to Edward Grefenstette for graphics material; other sources include Wikimedia licensed under CC BY-SA
More informationNeural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision Anonymized for review Abstract Extending the success of deep neural networks to high level tasks like natural language
More informationMotivation: Shortcomings of Hidden Markov Model. Ko, Youngjoong. Solution: Maximum Entropy Markov Model (MEMM)
Motivation: Shortcomings of Hidden Markov Model Maximum Entropy Markov Models and Conditional Random Fields Ko, Youngjoong Dept. of Computer Engineering, Dong-A University Intelligent System Laboratory,
More informationDeep Generative Models Variational Autoencoders
Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative
More informationCombining Neural Networks and Log-linear Models to Improve Relation Extraction
Combining Neural Networks and Log-linear Models to Improve Relation Extraction Thien Huu Nguyen and Ralph Grishman Computer Science Department, New York University {thien,grishman}@cs.nyu.edu Outline Relation
More informationLECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS
LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More information