Sequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015
1 Sequence Modeling: Recurrent and Recursive Nets By Pyry Takala 14 Oct 2015
2 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 1
3 Quiz 1. Where can you use RNNs? 2. Discuss for 1 minute
4 RNNs model sequential data What are examples of sequential data? 3
5 RNNs model sequential data What are examples of sequential data? Time-series data, e.g. economics Videos Speech Images, as perceived by humans Robot sensors Language 4
6 RNNs model sequential data What are examples of sequential data? Time-series data, e.g. economics Videos Speech Images, as perceived by humans Robot sensors Language Some feed-forward net types can also model sequences (e.g. TDNN), but are not ideal for long sequences (memory, network size etc.) 5
7 Example application: RNNs can generate handwriting The heatmap shows probability densities for predicted pen locations as the word "under" is written Live:
8 Example application: RNNs can caption images and videos Live: 7
9 Example application: RNNs can control robots 8
10 Example application: RNNs can translate text "Mielenkiintoinen luento" → "The interesting lecture"
11 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 10
12 Quiz What algorithm can be used to train RNNs?
13 RNNs store a memory of the hidden state for the next sequence step Legend x = input s = state o = output U, V, W = weight matrices 12
15 RNNs store a memory of the hidden state for the next sequence step Legend x = input s = state o = output U, V, W = weight matrices Shared parameters! 14
16 RNN computation: forward pass Legend: x = input, s = state, o = output, U, V, W = weight matrices, b, c = biases, a_t = value to hidden, p_t = output after softmax. Forward pass: a_t = b + W s_{t-1} + U x_t, s_t = tanh(a_t), o_t = c + V s_t, p_t = softmax(o_t)
17 RNN computation: loss Legend as before, plus y_t = target class. The per-step loss is the negative log-likelihood of the target class, L_t = -log p_t[y_t], and the total loss sums over time steps. (Figure: outputs p_{t-1}, p_t, p_{t+1} over three classes, with target class 3.)
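To make the forward pass and loss concrete, here is a minimal NumPy sketch using the slide's notation (x, s, o, U, V, W, b, c, a_t, p_t, y_t). It is not code from the lecture; the sizes and random data are made up for illustration.

```python
# Minimal sketch of a vanilla RNN forward pass and cross-entropy loss (NumPy).
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())          # subtract max for numerical stability
    return e / e.sum()

def rnn_forward(xs, ys, U, V, W, b, c):
    """xs: list of input vectors, ys: list of target class indices."""
    s_prev = np.zeros(W.shape[0])    # initial hidden state s_0
    states, probs, loss = [], [], 0.0
    for x_t, y_t in zip(xs, ys):
        a_t = b + W @ s_prev + U @ x_t   # value to hidden
        s_t = np.tanh(a_t)               # new hidden state
        o_t = c + V @ s_t                # output (logits)
        p_t = softmax(o_t)               # output after softmax
        loss += -np.log(p_t[y_t])        # negative log-likelihood of target class
        states.append(s_t); probs.append(p_t)
        s_prev = s_t
    return states, probs, loss

# toy usage: 5 time steps, 3 input dims, 4 hidden units, 3 classes
rng = np.random.RandomState(0)
n_in, n_hid, n_out, T = 3, 4, 3, 5
U = rng.randn(n_hid, n_in) * 0.1
W = rng.randn(n_hid, n_hid) * 0.1
V = rng.randn(n_out, n_hid) * 0.1
b, c = np.zeros(n_hid), np.zeros(n_out)
xs = [rng.randn(n_in) for _ in range(T)]
ys = [rng.randint(n_out) for _ in range(T)]
print(rnn_forward(xs, ys, U, V, W, b, c)[2])
```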
18 RNNs can be trained with back-propagation through time (BPTT) BPTT: unfold the network, then backpropagate the loss, calculating first the gradient ∇_a L for each hidden unit a
19 RNNs can be trained with back-propagation through time (BPTT) BPTT: unfold the network, then backpropagate the loss, calculating first ∇_a L for each hidden unit a and then ∇_θ L for each parameter θ. Detailed derivative formulas can be found in the book; Theano calculates these automatically
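A hedged sketch of BPTT for the same vanilla RNN, unfolding the network and pushing gradients backwards through time. In practice Theano (or any autodiff framework) derives these formulas automatically, so this hand-written version is only illustrative.

```python
# Minimal BPTT sketch for the vanilla RNN above (not from the lecture).
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def bptt(xs, ys, U, V, W, b, c):
    T = len(xs)
    # --- forward pass, storing states for the backward pass ---
    s = [np.zeros(W.shape[0])]           # s[0] is the initial state
    p = []
    for t in range(T):
        s.append(np.tanh(b + W @ s[t] + U @ xs[t]))
        p.append(softmax(c + V @ s[t + 1]))
    # --- backward pass: back-propagation through time ---
    dU, dV, dW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    ds_next = np.zeros(W.shape[0])       # gradient flowing in from step t+1
    for t in reversed(range(T)):
        do = p[t].copy(); do[ys[t]] -= 1.0      # dL/do for softmax + NLL
        dV += np.outer(do, s[t + 1]); dc += do
        ds = V.T @ do + ds_next                 # dL/ds_t from output and from the future
        da = (1.0 - s[t + 1] ** 2) * ds         # back through tanh
        db += da
        dU += np.outer(da, xs[t])
        dW += np.outer(da, s[t])                # s[t] is the previous state
        ds_next = W.T @ da                      # pass gradient on to step t-1
    return dU, dV, dW, db, dc
```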
20 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 19
21 Quiz What are limitations of RNNs?
22 RNNs have good generalization capabilities RNN learns which aspects of past sequence to keep and with what precision 21
23 RNNs have good generalization capabilities RNN learns which aspects of past sequence to keep and with what precision RNN can generalize because of shared parameters Generalization to different point in sequence Generalization between sequences of different length Complexity of function does not increase with sequence length 22
24 RNNs have good generalization capabilities RNN learns which aspects of past sequence to keep and with what precision RNN can generalize because of shared parameters Generalization to different point in sequence Generalization between sequences of different length Complexity of function does not increase with sequence length Limitations Hidden state must be large enough to remember all information Assumes stationarity Can be overcome, e.g. feed an additional input describing the position Difficult optimization 23
25 RNN states simplify the graph while still allowing complex dependencies: a graphical model without states (inefficient parametrization) vs. an RNN with states (more efficient parametrization)
26 Gradients of RNNs can be unstable Non-linear recurrence with itself over many time steps → a highly non-linear function. Derivatives tend to vanish or explode as the number of steps between two states increases, because the derivative equals a product of state-transition Jacobian matrices. This can cause, for instance, exploding gradients. For details, see the book chapter
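A small illustration (my own, not from the slides) of why the product of state-transition Jacobians makes gradients vanish or explode: repeatedly multiplying a vector by a matrix whose largest singular value is below or above 1.

```python
# Vanishing vs. exploding: the gradient norm after many steps depends on the
# leading singular value of the (here constant) state-transition Jacobian.
import numpy as np

rng = np.random.RandomState(0)
J = rng.randn(50, 50)
for scale in (0.9, 1.1):                      # leading singular value below / above 1
    W = scale * J / np.linalg.svd(J, compute_uv=False)[0]
    v = np.ones(50)
    norms = []
    for _ in range(100):                      # multiply by the Jacobian 100 times
        v = W @ v
        norms.append(np.linalg.norm(v))
    print(scale, norms[0], norms[-1])         # ~vanishes for 0.9, explodes for 1.1
```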
27 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 26
28 RNNs can generate sequences Generate an output and feed it back as the input at the next time step. Teacher forcing = use the actual sequence as input during training. Strict forcing is often not advisable: at generation time the inputs produced by the net will likely differ from the training inputs. A generative model needs to stop generation at some point; alternatives (sketched below): a) an end-of-sequence symbol, b) a binomial stop/continue output, c) modeling the number of time steps left
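A sketch of generation with an end-of-sequence symbol (alternative a). The weights here are random rather than trained, and the vocabulary is invented for illustration; in practice the parameters would come from training, with or without teacher forcing.

```python
# Generate by sampling an output at each step and feeding it back as the next
# input, stopping when the end-of-sequence (EOS) symbol is sampled.
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

rng = np.random.RandomState(1)
vocab = ['a', 'b', 'c', '<eos>']
n_hid, n_vocab = 8, len(vocab)
U = rng.randn(n_hid, n_vocab) * 0.5
W = rng.randn(n_hid, n_hid) * 0.5
V = rng.randn(n_vocab, n_hid) * 0.5

s = np.zeros(n_hid)
x = np.zeros(n_vocab); x[0] = 1.0          # start from symbol 'a' (one-hot)
generated = []
for _ in range(50):                         # hard cap in case EOS never appears
    s = np.tanh(W @ s + U @ x)
    p = softmax(V @ s)
    idx = rng.choice(n_vocab, p=p)          # sample the next symbol
    if vocab[idx] == '<eos>':
        break                               # alternative (a): end-of-sequence symbol
    generated.append(vocab[idx])
    x = np.zeros(n_vocab); x[idx] = 1.0     # feed the sample back as the next input
print(''.join(generated))
```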
29 Adding extra context x can be done in several ways, e.g. as an extra input at every time step or as the initial hidden state
30 A conditional generative RNN assumes that we also want to use x to predict y
31 Some tricks of the trade can be useful when training RNNs Gradient explosion can be dealt with, e.g., by gradient clipping. The heuristic introduces a bias but works well in practice; even taking a random step helps. (Figure: a wall in the error surface, and the clipped gradient.)
32 Some tricks of the trade can be useful when training RNNs Gradient explosion can be dealt with, e.g., by gradient clipping (see the sketch below). The heuristic introduces a bias but works well in practice; even taking a random step helps. Gradient vanishing can be dealt with using memory units, e.g. LSTMs. Smart initialization of weights and use of a squashing non-linearity (e.g. tanh) can also help
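A sketch of gradient clipping by global norm (one common variant; the slide does not say which flavour was used). The threshold value is an arbitrary choice.

```python
# Rescale the gradients when their joint norm exceeds a threshold, before the
# parameter update.
import numpy as np

def clip_gradients(grads, threshold=5.0):
    norm = np.sqrt(sum((g ** 2).sum() for g in grads))
    if norm > threshold:
        grads = [g * (threshold / norm) for g in grads]
    return grads

# usage: grads = clip_gradients([dU, dV, dW, db, dc]) before the SGD step
```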
33 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 32
34 Quiz How can we capture long-term dependencies with RNNs?
35 RNNs have been extended for different purposes Architectural variants with different expressive power Deep RNNs Bi-Directional RNNs Recursive nets 34
36 RNNs have been extended for different purposes Architectural variants with different expressive power Deep RNNs Bi-Directional RNNs Recursive nets Solutions to dealing with long-term dependencies and memory RNNs with multiple time-scales LSTM memory units Sequence-to-sequence models Attention Memory nets / Neural Turing Machines 35
37 Deep RNNs Multiple RNN-layers Additional MLP-layer Additional MLP-layer and skip connections May also hurt, as the path from an event becomes longer → harder to learn long-term dependencies
38 Bi-directional RNN considers information from two directions We don't always assume a causal left-to-right structure; sometimes the output depends on the whole input. Bi-directional RNNs give more information to your network, but you need to know the future part of the sequence ahead of time. Extends to 2D
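A minimal sketch of the bi-directional idea (not the lecture's code): run one RNN left-to-right and another right-to-left over the same input and concatenate their states, so every output position sees the whole sequence. Sizes and weights are made up.

```python
# Bi-directional RNN: forward and backward passes over the input, concatenated.
import numpy as np

def run_rnn(xs, U, W):
    s = np.zeros(W.shape[0]); states = []
    for x in xs:
        s = np.tanh(W @ s + U @ x)
        states.append(s)
    return states

rng = np.random.RandomState(0)
n_in, n_hid, T = 3, 4, 6
xs = [rng.randn(n_in) for _ in range(T)]
Uf, Wf = rng.randn(n_hid, n_in), rng.randn(n_hid, n_hid) * 0.1
Ub, Wb = rng.randn(n_hid, n_in), rng.randn(n_hid, n_hid) * 0.1

forward  = run_rnn(xs, Uf, Wf)                 # left-to-right states
backward = run_rnn(xs[::-1], Ub, Wb)[::-1]     # right-to-left, re-aligned in time
combined = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
print(combined[0].shape)                        # (2 * n_hid,)
```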
39 Recursive nets More general than an RNN chain, e.g. a tree. Have been used to process data structures as NN inputs, in NLP and in computer vision. For a sequence of length N, depth is reduced from N (for an RNN) to O(log N). How to structure the tree is unclear: balanced binary? An external method (e.g. a parse tree for NLP)?
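An illustrative sketch (not from the slides) of a recursive net over a balanced binary tree: the same composition function, with shared parameters, is applied at every internal node, so the depth grows as O(log N) rather than N.

```python
# Recursive net: combine leaf vectors pairwise, bottom-up, with shared weights.
import numpy as np

rng = np.random.RandomState(0)
d = 4
W_left, W_right = rng.randn(d, d) * 0.3, rng.randn(d, d) * 0.3
b = np.zeros(d)

def compose(left, right):
    # same parameters at every internal node of the tree
    return np.tanh(W_left @ left + W_right @ right + b)

def encode(leaves):
    """Combine leaf vectors pairwise until one root vector remains."""
    nodes = list(leaves)
    while len(nodes) > 1:
        nxt = [compose(nodes[i], nodes[i + 1]) for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2 == 1:          # carry an odd leftover node upward
            nxt.append(nodes[-1])
        nodes = nxt
    return nodes[0]

leaves = [rng.randn(d) for _ in range(8)]
print(encode(leaves))
```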
40 Long-term dependencies are hard to capture Hidden state of RNNs needs to remember a lot This is burdensome especially with long sequences 39
41 Long-term dependencies are hard to capture Hidden state of RNNs needs to remember a lot This is burdensome especially with long sequences Neural units that learn to remember some inputs can alleviate this 40
42 Long-term dependencies are hard to capture Hidden state of RNNs needs to remember a lot This is burdensome especially with long sequences Neural units that learn to remember some inputs can alleviate this Echo-state networks (liquid state machines, reservoir computing) fix all weights but the final layer Weights are set so that the net is at the edge of stability (values around 1 for the leading singular value of J of the state-to-state transition) 41
43 Long-term dependencies are hard to capture Hidden state of RNNs needs to remember a lot This is burdensome especially with long sequences Neural units that learn to remember some inputs can alleviate this Echo-state networks (liquid state machines, reservoir computing) fix all weights but the final layer; weights are set so that the net is at the edge of stability (leading singular value of the Jacobian J of the state-to-state transition around 1). Long short-term memory (LSTM): the first and most commonly used memory unit. Can accumulate information and forget it once it has been used and is no longer needed. Better at long-term dependencies than normal RNNs; can be trained on tasks requiring memory over >200 steps. Very successful for instance at text generation, handwriting recognition and speech recognition. Other memory units exist, e.g. GRU and memory units with multiple layers
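A sketch of an echo-state network on a toy task; the reservoir size, spectral radius and ridge penalty are assumptions chosen for illustration. All recurrent weights are fixed at random and rescaled to the edge of stability; only the linear readout is trained, here by ridge regression.

```python
# Echo-state network: fixed random reservoir, trained linear readout.
import numpy as np

rng = np.random.RandomState(0)
n_in, n_res, T = 1, 100, 500
W_in = rng.randn(n_res, n_in) * 0.5
W = rng.randn(n_res, n_res)
W *= 0.95 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius ~0.95

# toy task: predict the next value of a sine wave
u = np.sin(np.linspace(0, 20 * np.pi, T + 1))
x = np.zeros(n_res); states = []
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t:t + 1])          # reservoir is never trained
    states.append(x)
X, y = np.array(states), u[1:T + 1]
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)  # readout only
print(np.mean((X @ W_out - y) ** 2))                # training error of the readout
```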
44 Multiple time scales could be used 43
45 LSTMs are a common solution (figure: a vanilla RNN cell vs. an LSTM cell)
46 LSTMs are a common solution There is a path from x_{t-1} to h_{t+1} with no non-linearities. All gates are sigmoid units, and the remembered state is passed on. Forget gate (scales the old cell value = reset), input gate (scales the input to the cell = write), output gate (scales the output from the cell = read). The state influences decisions at the next time step
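A minimal LSTM step in NumPy, following the standard gate equations; the slide's exact cell (e.g. whether it uses peephole connections) may differ in minor details.

```python
# One LSTM step: forget/input/output gates plus the linear cell-state path.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    Wf, Wi, Wo, Wg, bf, bi, bo, bg = params
    z = np.concatenate([h_prev, x])          # previous hidden state and current input
    f = sigmoid(Wf @ z + bf)                 # forget gate: scale old cell value (reset)
    i = sigmoid(Wi @ z + bi)                 # input gate: scale input to cell (write)
    o = sigmoid(Wo @ z + bo)                 # output gate: scale output from cell (read)
    g = np.tanh(Wg @ z + bg)                 # candidate cell value
    c = f * c_prev + i * g                   # cell state: the path with no non-linearity
    h = o * np.tanh(c)                       # hidden state passed to the next step
    return h, c

# toy usage
rng = np.random.RandomState(0)
n_in, n_hid = 3, 5
params = [rng.randn(n_hid, n_hid + n_in) * 0.1 for _ in range(4)] + \
         [np.zeros(n_hid) for _ in range(4)]
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.randn(n_in), h, c, params)
print(h)
```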
47 Some LSTM-cells are interpretable 46
48 An encoder-decoder (sequence-to-sequence) model can capture a different kind of sequence relation, where the input and output sequences may have different lengths
49 RNNs can be used with different kinds of sequences Vanilla mode, no RNN. E.g. image classification Sequence output E.g. image captioning Sequence input E.g. sentiment analysis Sequence input and output (encoder-decoder, sequence-to-sequence) E.g. translation, question answering Synced sequence input and output E.g. label each video frame Live:
50 Attention avoids having to memorize everything (1/2) The encoder RNN needs to store a large amount of information in a small state. An attention mechanism creates an attention vector from all inputs. When generating outputs, the mechanism learns to shift its attention at each step to the most relevant part of the input
51 Attention avoids having to memorize everything (2/2) 50
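A sketch of one soft-attention step; dot-product scoring is an assumption here, since the slides do not specify the scoring function. At each decoding step the decoder state is compared with every encoder state, the scores become weights via a softmax, and the weighted sum is the context vector.

```python
# One soft-attention step over a set of encoder states.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.RandomState(0)
T_enc, d = 6, 8
encoder_states = rng.randn(T_enc, d)      # one vector per input position
decoder_state = rng.randn(d)              # current decoder hidden state

scores = encoder_states @ decoder_state   # relevance of each input position
weights = softmax(scores)                 # attention weights, sum to 1
context = weights @ encoder_states        # weighted sum of encoder states
print(weights, context.shape)
```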
52 Memory networks / Neural Turing Machines (NTMs) can shift their attention and write to memory Neural nets are good at storing implicit knowledge, but bad at storing facts. Humans have a working memory system. Memory networks / NTMs have memory cells that can be read from (like in attention) and written to. A cell stores a vector. The cells can be read by location ("access cell 347") and by content ("access the cell that has information about my dad"). Current systems implement soft attention (reading from multiple cells), which is convenient when training based on the gradient; active research currently targets hard attention (reading from a specific cell). Successfully used e.g. to learn to sort values and to perform reasoning from simplified text
53 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 52
54 Quiz How can neural networks learn to execute programs? 53
55 State-of-the-art RNNs can learn to predict how a (simple) program would execute LSTM 2 layers Unrolled for 50 steps 400 units per layer Params initialized uniformly Clipped gradients Own learning rate scheme
56 State-of-the-art RNNs can learn to predict conversation responses Sequence-to-sequence 400-word-long interactions Single-layer LSTM 1024 units Gradient clipping Most common 20K words 30M tokens, 3M in validation Larger recurrent networks trained with GPU machines
57 Code-demo 56
58 Agenda Why Recurrent neural networks? Anatomy and basic training of an RNN (10.2, ) Properties of RNNs (10.2.2, 8.2.6) Using RNNs (10.2.3, ) RNN extensions ( ) Demos Next steps & references 57
59 Quiz 1. Where can you use RNNs? 2. What algorithm can be used to train RNNs? 3. What are limitations of RNNs? 4. How can we capture long-term dependencies with RNNs? 5. How can neural networks learn to execute programs? 58
60 Exercises Read Chapter 10 (Sequence modeling) Read Chapter 15 (Linear Factor Models and Auto-Encoders) Read the Theano tutorial on recurrent neural networks: For practical code examples, other sources may be useful, e.g. Exercise: Read MNIST columnwise, output the class at each step, and plot training performance as a function of columns read (a sketch follows below). No lecture next week
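A hedged sketch of the MNIST exercise in Keras (my own illustration, not provided code): read each image column by column with a SimpleRNN and predict the class at every step; the layer names follow the Keras API and may need adapting to your installed version.

```python
# MNIST read column-by-column, with a class prediction at every time step.
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import SimpleRNN, TimeDistributed, Dense

(x_train, y_train), _ = mnist.load_data()
x = x_train.astype('float32') / 255.0
x = np.transpose(x, (0, 2, 1))                  # time axis = image columns (28 steps of 28 pixels)
y = np.eye(10)[y_train]                         # one-hot labels
y = np.repeat(y[:, None, :], 28, axis=1)        # same target at every time step

model = Sequential()
model.add(SimpleRNN(128, return_sequences=True, input_shape=(28, 28)))
model.add(TimeDistributed(Dense(10, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y, batch_size=128, epochs=2)

# accuracy after each number of columns read (plot this vs. columns read)
preds = model.predict(x[:1000]).argmax(-1)
acc_by_column = (preds == y_train[:1000, None]).mean(axis=0)
print(acc_by_column)
```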
61 References