Gated Recurrent Models. Stephan Gouws & Richard Klein
1 Gated Recurrent Models Stephan Gouws & Richard Klein
2 Outline. Part 1: Intuition, Inference and Training. Building intuitions: from feedforward to recurrent models. Inference in RNNs: fprop. Training in RNNs: backpropagation-through-time (BPTT). SHORT BREAK.
3 Outline. Part 2: Gated Models & Applications. Long Short-Term Memory (LSTMs). Gated Recurrent Units (GRUs). Applications: image captioning; sequence classification (Practical 4: MNIST); language modeling; sequence labeling (lots of NLP tasks, e.g. POS tagging, NER, ...); sequence-to-sequence learning (machine translation, dialogue modeling, ...).
4 Recurrent Models PART 1: Intuition, Inference and Training
5-7 Introduction? (Motivating figures, built up over three slides.)
8 We need to be able to remember information from previous time steps
9 Recurrent neural networks: Intuition
10 Long-term dependencies: Why do they matter? "Michel C. was born in Paris, France. He is married and has three children. He received an M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987, and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He specialized in child and adolescent psychiatry and his first field of research was severe mood disorders in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ?????" With only the short context ("His mother tongue is") many answers are plausible (English, German, Russian, French); the long context ("born in Paris, France") tells us it is French. Problem: the answer depends on information seen many words earlier.
11-16 Types of Sequence Models (diagram built up over several slides): FFNs, image captioning, MNIST predictor, Seq2Seq, sequence labeling (e.g. NER). We'll talk about these a little later, and we'll implement the MNIST predictor in today's practical!
17 FFNs vs RNNs. Classify the following examples (handwritten digit images).
18-20 FFNs vs RNNs. An FFN (inputs X, hidden layer H with tanh, outputs Y with softmax) classifies each digit independently, predicting 1, 9 and 2 for the three examples.
21 FFNs vs RNNs. But what if these were not digits, but longer numbers? Problem: variable-length inputs.
22-24 FFNs vs RNNs. An RNN cell reads one digit Xt per step and maintains an internal state H (memory) that carries information between inputs: the state from step t-1 is fed back in at step t together with the new input, so the model can read 1, then 19, then 192, building up the number over time.
25 The RNN API: recurrent_fn() takes the input x and prev_state, and returns the outputs and next_state.
26 The RNN Computation Graph: at each step, the output yt is computed from the input xt and the state from the previous time-step; the feedback loop is the state / memory.
27-31 Unrolling the RNN Computation Graph: h0 → h1 → h2 → ... → hn, with inputs x1, x2, ..., xn and outputs y1, y2, ..., yn, unrolled over n time-steps. NB: We reuse the same weights at every time-step! We can therefore think of an RNN as a composition of identical feedforward neural networks (with replicated/tied weights), one for each moment or step in time.
32-36 The FFN API (the same code, shown across several slides):
class FeedForwardModel():
    # ...
    def forward(self, x):
        # Compute activations on the hidden layer.
        hidden_layer = self.act_fn(np.dot(self.w_xh, x) + self.b)
        # Compute the (linear) output layer activations.
        y = np.dot(self.w_hy, hidden_layer)
        return y
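To make the snippet concrete, here is a hedged, self-contained version with the weights initialized (the layer sizes, initialization scale and example input are my own assumptions, not the lecture's code):

import numpy as np

class FeedForwardModel:
    def __init__(self, n_in, n_hidden, n_out, act_fn=np.tanh, seed=0):
        rng = np.random.default_rng(seed)
        self.w_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input-to-hidden weights
        self.b = np.zeros(n_hidden)
        self.w_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))  # hidden-to-output weights
        self.act_fn = act_fn

    def forward(self, x):
        # Compute activations on the hidden layer.
        hidden_layer = self.act_fn(np.dot(self.w_xh, x) + self.b)
        # Compute the (linear) output layer activations.
        return np.dot(self.w_hy, hidden_layer)

model = FeedForwardModel(n_in=784, n_hidden=128, n_out=10)
print(model.forward(np.random.rand(784)).shape)  # -> (10,)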
37-38 The RNN API:
class RecurrentModel():
    # ...
    def recurrent_fn(self, x, prev_state):
        # Compute the new state based on the previous state and the current input.
        new_state = self.act_fn(np.dot(self.w_xh, x) + np.dot(self.w_hh, prev_state) + self.b)
        # Compute the output vector.
        y = np.dot(self.w_hy, new_state)
        return new_state, y
39-40 The RNN API (annotated): in new_state = act_fn(Wxh·x + Whh·prev_state), the left-hand side is the new state, act_fn(...) is the recurrent function, x is the input at the current time-step and prev_state is the previous state.
41-43 The RNN API: the full forward pass loops over the sequence, feeding each input together with the previous state through recurrent_fn and accumulating the loss:
def forward(self, data_sequence, initial_state):
    state = initial_state
    loss, cache = 0.0, []
    for x, y in data_sequence:
        new_state, y_pred = self.recurrent_fn(x, state)
        loss += cross_entropy(y_pred, y)
        cache.append((new_state, y_pred))
        state = new_state
    return loss, cache
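Putting the two pieces together, a minimal runnable sketch of the vanilla RNN (weight names follow the slides; the sizes, initialization and toy data are assumptions, and I apply the softmax inside recurrent_fn so that cross_entropy receives probabilities):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y_pred, y_true):
    # y_true is a one-hot vector.
    return -np.log(y_pred[np.argmax(y_true)] + 1e-12)

class RecurrentModel:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.w_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.w_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.b = np.zeros(n_hidden)
        self.act_fn = np.tanh

    def recurrent_fn(self, x, prev_state):
        new_state = self.act_fn(self.w_xh @ x + self.w_hh @ prev_state + self.b)
        y = softmax(self.w_hy @ new_state)
        return new_state, y

    def forward(self, data_sequence, initial_state):
        state, loss, cache = initial_state, 0.0, []
        for x, y in data_sequence:
            state, y_pred = self.recurrent_fn(x, state)
            loss += cross_entropy(y_pred, y)
            cache.append((state, y_pred))
        return loss, cache

model = RecurrentModel(n_in=4, n_hidden=8, n_out=3)
seq = [(np.random.rand(4), np.eye(3)[i % 3]) for i in range(5)]   # toy (x, y) pairs
loss, cache = model.forward(seq, np.zeros(8))
print(loss)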
44-50 Math: FFNs vs RNNs. NOTATION: Wxh is a matrix that maps a vector x into a vector h. FFN: the hidden layer is h = f(Wxh·x + b), where f is the activation function and x is the input; the output is y = Why·h. RNN: the new state is ht = f(Wxh·xt + Whh·ht-1 + b), where xt is the input at the current time-step, Whh are the recurrent weights and ht-1 is the previous state; the output is yt = Why·ht.
51 Inference & Training. How do we make predictions using RNNs? Forward propagation (fprop): essentially a composition of functions, a2 = f2(f1(x)); we unroll the computational graph over time-steps. How do we train RNNs? Backward propagation: backpropagation-through-time. We need to consider predictions over several time-steps: credit assignment over time. We work backwards in time from the last state to the first.
52 Training: Ways to Train RNNs. Echo State Networks: initialize Wxh, Whh, Who carefully, then only train Who! Backpropagation through time (BPTT): propagate errors backwards through the unrolled graph. There are other options.
53 Training: ESNs. Simple solution: don't train the recurrent weights (Whh & Wxh)! Initialization is very important. Super simple. However, with recent improvements in initialization etc., BPTT does better! [Scholarpedia]
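As an illustration of the "only train Who" idea, here is a minimal echo-state-network sketch (my own construction, not the lecture's code; the spectral-radius rescaling and ridge-regression readout are standard ESN ingredients, and all names and sizes are assumptions):

import numpy as np

class EchoStateNetwork:
    def __init__(self, n_in, n_hidden, n_out, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))   # fixed after init
        w_hh = rng.normal(size=(n_hidden, n_hidden))
        # Rescale the recurrent weights so their largest eigenvalue magnitude is < 1,
        # which keeps the state dynamics stable.
        w_hh *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_hh)))
        self.w_hh = w_hh                                           # fixed after init
        self.w_ho = np.zeros((n_out, n_hidden))                    # the only trained weights

    def states(self, xs):
        h, hs = np.zeros(self.w_hh.shape[0]), []
        for x in xs:
            h = np.tanh(self.w_xh @ x + self.w_hh @ h)
            hs.append(h)
        return np.stack(hs)                                        # (T, n_hidden)

    def fit_readout(self, xs, ys, ridge=1e-2):
        H, Y = self.states(xs), np.stack(ys)
        # Ridge regression for the readout: solve (H^T H + ridge*I) W = H^T Y.
        A = H.T @ H + ridge * np.eye(H.shape[1])
        self.w_ho = np.linalg.solve(A, H.T @ Y).T

    def predict(self, xs):
        return self.states(xs) @ self.w_ho.T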
54 Inference & Training (recap). Forward propagation (fprop): a composition of functions, a2 = f2(f1(x)); unroll the computational graph over time-steps. Training: propagate the errors backwards through the unrolled graph, i.e. backpropagation-through-time (BPTT). We need to consider predictions over several time-steps (credit assignment over time), working backwards in time from the last state to the first.
55 Training: BPTT Intuition
56-58 Training: Truncated BPTT (diagrams): rather than backpropagating through the entire sequence, the unrolled graph is processed in shorter chunks and gradients are only propagated back a fixed number of time-steps.
59-60 Unrolling the RNN Computation Graph (recap): h0 → h1 → h2 → ... → ht, with inputs x1, x2, x3, ... and outputs y1, y2, ..., yt, unrolled over n time-steps.
61-63 Unrolling the RNN Computation Graph. Step 1: Compute all errors E1, E2, ..., Et. Step 2: Pass the error back for each time-step, from n back to 1. Step 3: Update the weights.
64-73 Unrolling the RNN Computation Graph (backward pass, diagrams): the error Et at the last step is propagated back through ht, ht-1, ht-2, ..., applying the chain rule at each step; the same is done for Et-1, Et-2, and so on. Total error = E1 + E2 + ... + Et. Total gradient = the sum of all dEt/dθ's.
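To make the chain-rule bookkeeping concrete, the BPTT gradient for the recurrent weights can be written out as follows (a hedged reconstruction consistent with ht = f(Wxh·xt + Whh·ht-1 + b), not copied from the slides):

\frac{\partial E}{\partial W_{hh}}
  = \sum_{t=1}^{T} \frac{\partial E_t}{\partial W_{hh}}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
      \frac{\partial E_t}{\partial y_t}\,
      \frac{\partial y_t}{\partial h_t}
      \left( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \right)
      \frac{\partial h_k}{\partial W_{hh}}

The outer sum over t is the "total gradient = sum of all dEt/dθ's" from the slide; the product of Jacobians ∂hj/∂hj-1 is what later causes gradients to vanish or explode.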
74-75 Training: Truncated BPTT Code:
def bptt(model, x_train, y_train, initial_state):
    # Forward
    loss, caches = forward(x_train, y_train, model, initial_state)
    avg_loss = loss / y_train.shape[0]
    # Backward
    dh_next = np.zeros((1, initial_state.shape[0]))
    grads = {k: np.zeros_like(v) for k, v in model.items()}
    for t in reversed(range(len(x_train))):
        y_pred = caches[t][1]   # (new_state, y_pred) was saved in the forward pass
        grad, dh_next = cell_fn_backward(y_pred, y_train[t], dh_next, caches[t])
        for k in grads.keys():
            grads[k] += grad[k]
    return grads, avg_loss
The per-step backward call cell_fn_backward is the "Lego block": the total gradient is the sum of these Lego-gradients over time!
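For completeness, a hedged sketch of what a per-step backward function like cell_fn_backward could look like for the vanilla tanh cell (the slides do not show this code; I pass the weights in explicitly, treat states as 1-D vectors, and assume the cache stores the input, the previous state and the new state):

import numpy as np

def cell_fn_backward(y_pred, y_true, dh_next, cache, model):
    x, h_prev, h = cache            # what the forward pass saved for this step

    # Softmax + cross-entropy gradient with respect to the logits.
    dy = y_pred - y_true

    grad = {}
    grad['w_hy'] = np.outer(dy, h)

    # Gradient into the hidden state: from this step's output and from the future (dh_next).
    dh = model['w_hy'].T @ dy + dh_next

    # Backprop through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    dz = (1.0 - h * h) * dh

    grad['b'] = dz
    grad['w_xh'] = np.outer(dz, x)
    grad['w_hh'] = np.outer(dz, h_prev)

    # Gradient to pass back to the previous time-step.
    dh_prev = model['w_hh'].T @ dz
    return grad, dh_prev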
76-81 Vanilla RNN Gradient Flow (diagrams): backpropagating through ht = tanh(Wxh·xt + Whh·ht-1 + b) multiplies the gradient by the transposed recurrent weights Whh and the tanh derivative at every time-step, so over long sequences the gradients tend to vanish or explode. How gated architectures fix this: Part 2!
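A tiny numerical illustration of the problem (my own sketch, not from the slides): repeatedly multiplying a gradient by the same recurrent matrix shrinks or blows it up exponentially, depending on the weight scale.

import numpy as np

rng = np.random.default_rng(0)
for scale in (0.5, 1.5):                         # "small" vs "large" recurrent weights
    w_hh = scale * rng.standard_normal((50, 50)) / np.sqrt(50)
    grad = rng.standard_normal(50)
    for _ in range(100):                         # 100 time-steps of backprop
        grad = w_hh.T @ grad                     # ignoring the tanh derivative for simplicity
    print(f"scale={scale}: ||grad|| after 100 steps = {np.linalg.norm(grad):.3e}")
# Typically the 0.5-scaled weights drive the norm towards 0 (vanishing gradients),
# while the 1.5-scaled weights blow it up (exploding gradients).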
82 Gated Recurrent Models PART II: Gated Architectures & Applications
83 RECAP: The RNN API: recurrent_fn() maps the input at the current time-step (x) and the previous state (prev_state) to the outputs and the next_state (the new state).
84 The GATED RNN API: It is the same! Just a different way of computing the outputs: inside recurrent_fn(), a controller uses gates to decide how a memory cell is read and updated, given x and prev_state, producing the outputs and next_state.
85 Implementing a memory cell in a neural network
86 Propagating through a memory cell
87 Backpropagating through a memory cell?
88 LSTM = Long Short-Term Memory. Concatenate: X = [Xt, Ht-1] (vector size p+n). Forget gate: f = σ(X·Wf + bf) (size n). Update gate: u = σ(X·Wu + bu) (n). Result gate: r = σ(X·Wr + br) (n). Candidate input: X' = tanh(X·Wc + bc) (n). New C: Ct = f * Ct-1 + u * X' (n). New H: Ht = r * tanh(Ct) (n). Output: Yt = softmax(Ht·W + b) (size m).
89-97 LSTM (diagram walkthrough; the boxes are neural-net layers, σ and tanh, and the circles are element-wise operations): concatenate Xt and Ht-1; a σ layer decides what to forget, and Ct-1 is actually forgotten (multiplied by f); a σ layer decides what to update and a tanh layer computes the new value, which is actually added to the cell state, now updated to Ct; the result gate then calculates the result Ht = r * tanh(Ct); Ht is remembered for the next time step and Yt is the output.
98-99 LSTM = Long Short-Term Memory (recap of the diagram and equations): X = [Xt, Ht-1] (concatenation); f = σ(X·Wf + bf); u = σ(X·Wu + bu); r = σ(X·Wr + br); X' = tanh(X·Wc + bc); Ct = f * Ct-1 + u * X'; Ht = r * tanh(Ct); Yt = softmax(Ht·W + b).
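A minimal NumPy sketch of one LSTM step following these equations (the parameter names, shapes and the softmax output layer are assumptions for illustration, not the practical's reference code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    X = np.concatenate([x_t, h_prev])                      # size p + n
    f = sigmoid(X @ params['Wf'] + params['bf'])           # forget gate (n)
    u = sigmoid(X @ params['Wu'] + params['bu'])           # update gate (n)
    r = sigmoid(X @ params['Wr'] + params['br'])           # result gate (n)
    x_hat = np.tanh(X @ params['Wc'] + params['bc'])       # candidate input X' (n)

    c_t = f * c_prev + u * x_hat                           # new cell state
    h_t = r * np.tanh(c_t)                                 # new hidden state

    logits = h_t @ params['W'] + params['b']               # output layer (m)
    y_t = np.exp(logits - logits.max())
    return h_t, c_t, y_t / y_t.sum()                       # softmax output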
100 Gru!
101-102 Gated Recurrent Units (GRUs). GRU = Gated Recurrent Unit: 2 gates instead of 3 => cheaper. X = [Xt, Ht-1] (vector size p+n). Update gate: z = σ(X·Wz + bz) (n). Reset gate: r = σ(X·Wr + br) (n). Reset-concatenated input: Xr = [Xt, r * Ht-1] (p+n). Candidate: X' = tanh(Xr·Wc + bc) (n). New state: Ht = (1-z) * Ht-1 + z * X' (n). Output: Yt = softmax(Ht·W + b) (m).
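The corresponding one-step sketch in NumPy (parameter names and shapes are again my assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    X = np.concatenate([x_t, h_prev])                      # size p + n
    z = sigmoid(X @ params['Wz'] + params['bz'])           # update gate (n)
    r = sigmoid(X @ params['Wr'] + params['br'])           # reset gate (n)

    X_reset = np.concatenate([x_t, r * h_prev])            # reset the old state first
    x_hat = np.tanh(X_reset @ params['Wc'] + params['bc']) # candidate state (n)

    h_t = (1.0 - z) * h_prev + z * x_hat                   # interpolate old and new state
    logits = h_t @ params['W'] + params['b']
    y_t = np.exp(logits - logits.max())
    return h_t, y_t / y_t.sum()                            # softmax output (m)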
103-104 Long Short-Term Memory (LSTM): Gradient Flow (diagrams): the additive cell-state update Ct = f * Ct-1 + u * X' gives the gradient an uninterrupted path backwards through time, avoiding the vanishing/exploding behaviour of the vanilla RNN.
105 Applications/Tasks: Image captioning; sequence classification (Practical 4: MNIST); language modeling; sequence labeling (lots of NLP tasks, e.g. POS tagging, NER, ...); sequence-to-sequence learning (machine translation, summarization, ...).
106-107 One-to-Many: Image captioning. GOAL: Given an image, generate a sentence to describe its content.
108 Many-to-one: Sequence Classifier (Prac 4). GOAL: Given a sequence of inputs, predict the label for the whole sequence. Examples: given a sentence, say if it is {negative, neutral, positive}; given the words in an email, predict if it is a spam message; given pieces of an image, predict what number is in the image.
109 Many-to-1: Polarity/Sentiment Classifier. We feed all the words into the model one at a time, and make one prediction at the end: 'Cats', 'are', 'awesome' pass through states h1, h2, h3, and the final state predicts Pos/Neg.
110 Many-to-1: Spam Classifier. We feed all the words into the model one at a time, and make one prediction at the end: 'Viagra', 'for', 'cheap' pass through h1, h2, h3, and the final state predicts Spam/Ham.
111 Many-to-1: Image Classifier (Prac 4). We chop up the image and feed all the pieces through the model, and then make one prediction at the end: the final state predicts the image label y_image.
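A hedged sketch of this many-to-one pattern with the vanilla cell from Part 1 (feeding an MNIST image row by row is an assumption about how the practical slices the image, and the parameter names are mine):

import numpy as np

def sequence_classifier(x_seq, params):
    # Many-to-one: run the RNN over the whole sequence, classify from the last state.
    h = np.zeros(params['w_hh'].shape[0])
    for x_t in x_seq:                              # e.g. the 28 rows of a 28x28 MNIST image
        h = np.tanh(params['w_xh'] @ x_t + params['w_hh'] @ h + params['b'])
    logits = params['w_hy'] @ h                    # one prediction at the end
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()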
112 Next-token Prediction: Language modeling. 1. Compute p(next word | previous words). 2. The sequence probability factorizes as p(the) · p(cat | the) · p(sits | the cat) · ... · p(xm | ...): each state h1, h2, h3, ... predicts the next word given the words seen so far ('the', 'cat', ...).
113 Next-token Prediction: Language modeling (diagram): the RNN reads the words x1, x2, x3, ... one at a time, and at every step outputs a distribution over the next word.
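As a sketch, the training loss of such a language model is the next-token cross-entropy averaged over the sequence (integer word ids, an embedding table and the vanilla cell are my assumptions):

import numpy as np

def lm_loss(word_ids, params):
    # Average -log p(next word | history) of a vanilla-RNN language model.
    h = np.zeros(params['w_hh'].shape[0])
    loss = 0.0
    for cur, nxt in zip(word_ids[:-1], word_ids[1:]):
        x_t = params['embeddings'][cur]                    # look up the current word
        h = np.tanh(params['w_xh'] @ x_t + params['w_hh'] @ h + params['b'])
        logits = params['w_hy'] @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        loss += -np.log(probs[nxt] + 1e-12)
    return loss / (len(word_ids) - 1)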
114 Many-to-many: Sequence labeling. Mapping each input x1, x2, ..., xn to its own label y1, y2, ..., yn. (Notice: same length; each input has an output.) A lot of NLP tasks fall in this category, e.g.: Part-of-speech tagging: map words to their parts of speech (noun, verb, etc.). Named-entity recognition: identify mentions of people, places, etc. in text. Semantic role labeling: find the main actions, and who performs them on whom/what.
115-116 Many-to-many: Sequence labeling: Part-of-speech tagging (example diagrams).
117 Many-to-many: Sequence labeling. 'Heat water in a large vessel.' The words are fed in one at a time ('heat', 'water', 'in', ...), and every state h1, h2, ..., hn emits a tag for its word ('heat' → VERB, 'water' → NOUN, ...).
118 Many-to-many: Sequence-to-Sequence Modeling. Mapping input sequences {x1, x2, ..., xm} to output sequences {y1, y2, ..., yn}. Note: arbitrary m and n. Many applications, most notably machine translation. Also: POS tagging, parsing, summarization, dialogue.
119 Many-to-many: Sequence-to-Sequence Modeling (MACHINE TRANSLATION): an encoder RNN reads the source sentence 'the blue house'; a decoder RNN then starts from <s> and generates the target 'la maison bleu <eos>' one word at a time, feeding each generated word back in as the next input.
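A hedged sketch of the encoder-decoder idea with the vanilla cell (greedy decoding; the embedding tables, the <s>/<eos> ids and all parameter names are assumptions):

import numpy as np

def translate_greedy(src_ids, enc, dec, bos_id, eos_id, max_len=20):
    # Encoder: run the recurrent cell over the source and keep the final state.
    h = np.zeros(enc['w_hh'].shape[0])
    for i in src_ids:
        x_t = enc['embeddings'][i]
        h = np.tanh(enc['w_xh'] @ x_t + enc['w_hh'] @ h + enc['b'])

    # Decoder: start from <s>; feed each predicted word back in as the next input.
    out, prev = [], bos_id
    for _ in range(max_len):
        x_t = dec['embeddings'][prev]
        h = np.tanh(dec['w_xh'] @ x_t + dec['w_hh'] @ h + dec['b'])
        prev = int(np.argmax(dec['w_hy'] @ h))     # most likely next word (greedy)
        if prev == eos_id:
            break
        out.append(prev)
    return out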
120 Many-to-many: Sequence-to-Sequence Modeling. DIALOGUE MODELING. Prompt: You can't eat pasta every day. You can't live only on Italian food. System: I don't eat Italian food, but I do eat a lot of Italian food.
121 Many-to-many: Sequence-to-Sequence Modeling. Prompt: What do you have planned tonight? Baseline: I have no idea what I want to do with my life. I have no idea what I want to (Bad, Mediocre, Bad). Ours: I was planning on doing a few of these, but I'm not sure what to choose. I need to find a place to camp, and I don't have a lot of time to make a plan. (Excellent, Good, Mediocre)
122 Many-to-many: Sequence-to-Sequence Modeling. Prompt: Have you seen Suicide Squad? Baseline: I have not, but I will check it out. (Good, Good, Good) Ours: I saw it in a theater once and it was great. It was very... I don't know, a little dark. I like the psychological effects and the way it portrays the characters in the movie. (Excellent, Excellent, Excellent)
123 Key take-aways. RNNs have memory/state that evolves over time. We unroll the graph over time to do forward propagation. Backprop-through-time (BPTT) performs the chain rule over the unrolled graph efficiently by saving and reusing previous computations; dE/dW is a sum over all time-steps (because of the tied weights); it suffers from vanishing/exploding gradients. Gated architectures: the state is selectively overwritten per time-step, giving uninterrupted gradient flow through time and no vanishing/exploding gradients! Lots of cool applications!
124 Slide Credits Thank-you to the following resources, from which some of these slides were drawn and adapted. Stanford CS231n TensorFlow without a PhD
125 The end.