Gated Recurrent Models. Stephan Gouws & Richard Klein


1 Gated Recurrent Models Stephan Gouws & Richard Klein

2 Outline Part 1: Intuition, Inference and Training Building intuitions: From Feedforward to Recurrent Models Inference in RNNs: Fprop Training in RNNs: Backpropagation-through-time (BPTT) SHORT BREAK

3 Outline Part 2: Gated models & Applications Long Short-Term Memory (LSTMs) Gated Recurrent Units (GRUs) Applications: Image captioning Sequence classification (Practical 4: MNIST) Language modeling Sequence labeling (lots of NLP tasks, e.g. POS tagging, NER, etc.) Sequence-to-sequence learning (Machine translation, Dialogue modeling, etc.)

4 Recurrent Models PART 1: Intuition, Inference and Training

5 Introduction?

6 Introduction? x2

7 Introduction? x3

8 We need to be able to remember information from previous time steps

9 Recurrent neural networks: Intuition

10 Long-term dependencies: Why do they matter? "Michel C. was born in Paris, France. He is married and has three children. He received a M.S. in neurosciences from the University Pierre & Marie Curie and the Ecole Normale Supérieure in 1987, and then spent most of his career in Switzerland, at the Ecole Polytechnique de Lausanne. He specialized in child and adolescent psychiatry and his first field of research was severe mood disorders in adolescents, the topic of his PhD in neurosciences (2002). His mother tongue is ?????" With only the short context ("His mother tongue is") the answer could be English, German, Russian, French, ... With the long context ("born in Paris, France") it is French. Problem: the relevant information lies many time-steps back.

11 Types of Sequence Models FFNs Image Captioning MNIST predictor Seq2Seq Sequence labeling (e.g. NER)

13 ? Types of Sequence Models We'll talk about this a little later. We'll also implement this in today's practical! FFNs Image Captioning MNIST predictor Seq2Seq Sequence labeling (e.g. NER)

17 FFNs vs RNNs Classify the following examples: [images of handwritten digits]

18 FFNs vs RNNs PREDICT: 1 [FFN diagram: X: inputs (an image of a handwritten digit), hidden layer H with tanh, softmax output layer, Y: outputs]

19 FFNs vs RNNs PREDICT: 9 [same FFN, next digit image]

20 FFNs vs RNNs PREDICT: 2 [same FFN, next digit image]

21 FFNs vs RNNs But what if these were not digits, but longer numbers? Problem: variable-length inputs.

22 FFNs vs RNNs [RNN diagram: X: inputs at step t (Xt), RNN cell H with tanh, softmax output Yt; prediction so far: 1?]

23 FFNs vs RNNs [RNN diagram with H: internal state; H at t-1 (carrying 1) is fed back in together with Xt; prediction so far: 19?]

24 FFNs vs RNNs Maintains a state (memory) that carries information between inputs! [H at t-1 (carrying 19) is fed back in together with Xt; prediction so far: 192?]
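To make the intuition on the digit example above concrete, here is a tiny sketch (not from the lecture) of what "carrying a state between inputs" means: the state is the number read so far, and feeding the digits 1, 9, 2 one step at a time yields 192.

def step(prev_state, digit):
    # The state is the number read so far; the input is the current digit.
    return prev_state * 10 + digit

state = 0
for digit in [1, 9, 2]:
    state = step(state, digit)
print(state)  # 192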

25 The RNN API prev_state next_state recurrent_fn() x outputs

26 The RNN Computation Graph yt t-1 xt Feedback loop / state / memory / stack (previous time-step)

27 Unrolling the RNN Computation Graph y1 h0 h1 x1

28 Unrolling the RNN Computation Graph h0 x1 y1 y2 h1 h2 x2

29 Unrolling the RNN Computation Graph h0 x1 y1 y2 h1 h2 x2 Unrolled over n time-steps. yn... x3 hn

30 Unrolling the RNN Computation Graph NB: We reuse the same weights at every time-step! h0 y1 y2 h1 h2 x1 x2 The same Unrolled over n time-steps. yn... x3 hn

31 Unrolling the RNN Computation Graph NB: We reuse the same weights at every time-step! We can therefore think of an RNN as a composition of identical feedforward neural networks (with replicated/tied weights), one for each moment or step in time. [h0 → h1 → h2 → ... → hn, with inputs x1, x2, x3, ... and outputs y1, y2, ..., yn; unrolled over n time-steps; the same weights throughout]
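As a rough illustration of this unrolling (a sketch with assumed toy sizes, not the lecturers' code), the same recurrent function and the same weight matrices are applied once per time-step:

import numpy as np

W_xh = np.random.randn(4, 3)   # assumed toy sizes: state size 4, input size 3
W_hh = np.random.randn(4, 4)

def recurrent_fn(x, h_prev):
    # One "copy" of the feedforward block; the weights are shared across copies.
    return np.tanh(W_xh @ x + W_hh @ h_prev)

h = np.zeros(4)                               # h0
xs = [np.random.randn(3) for _ in range(5)]   # x1 ... x5
states = []
for x in xs:                                  # unrolled over 5 time-steps
    h = recurrent_fn(x, h)                    # the SAME weights at every step
    states.append(h)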

32 The FFN API

class FeedForwardModel():
    # ...
    def forward(self, x):
        # Compute activations on the hidden layer.
        hidden_layer = self.act_fn(np.dot(self.w_xh, x) + self.b)
        # Compute the (linear) output layer activations.
        y = np.dot(self.w_hy, hidden_layer)
        return y


37 The RNN API

class RecurrentModel():
    # ...
    def recurrent_fn(self, x, prev_state):
        # Compute the new state based on the previous state and current input.
        new_state = self.act_fn(np.dot(self.w_xh, x) + np.dot(self.w_hh, prev_state) + self.b)
        # Compute the output vector.
        y = np.dot(self.w_hy, new_state)
        return new_state, y


39 The RNN API New state = Recurrent function(Input at current time-step, Previous state)

class RecurrentModel():
    # ...
    def recurrent_fn(self, x, prev_state):
        # New state from the previous state and the input at the current time-step.
        new_state = self.act_fn(np.dot(self.w_xh, x) + np.dot(self.w_hh, prev_state) + self.b)
        # Compute the output vector.
        y = np.dot(self.w_hy, new_state)
        return new_state, y


41 The RNN API

def forward(self, data_sequence, initial_state):
    state = initial_state
    loss, cache = 0.0, []
    for x, y in data_sequence:
        new_state, y_pred = self.recurrent_fn(x, state)
        # Accumulate the loss over all time-steps.
        loss += cross_entropy(y_pred, y)
        cache.append((new_state, y_pred))
        state = new_state
    return loss, cache


44 Math: FFNs v RNNs NOTATION: Wxh is a matrix that maps a vector x into a vector h. [FFN diagram: input x, activation function, hidden layer, output y]

48 Math: FFNs v RNNs NOTATION: Wxh is a matrix that maps a vector x into a vector h. [RNN diagram: input xt at the current time-step, previous state at t-1, recurrent weights, recurrent function, new state, output yt]
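The equations behind these diagrams did not survive transcription; a standard reconstruction consistent with the code on the earlier slides (biases included) is:

FFN:  h = f(W_{xh}\,x + b_h), \qquad y = W_{hy}\,h + b_y

RNN:  h_t = f(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h), \qquad y_t = W_{hy}\,h_t + b_y

where f is the activation function (e.g. tanh).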

51 Inference & Training How do we make predictions using RNNs? Forward propagation: Fprop Essentially a composition of functions: a2 = f2(f1(x)). We unroll the computational graph over time-steps. How do we train RNNs? Backward propagation: Backprop-through time We need to consider predictions over several time-steps! Credit assignment over time. We work backwards in time from the last state to the first.

52 Training: Ways to Train RNNs Echo State Networks: Initialize Wxh, Whh, Who carefully, then only train Who! Backpropagation through time (BPTT): Propagate errors backwards through the unrolled graph. There are other options.

53 Training: ESNs Simple solution: don't train the recurrent weights (Whh & Wxh)! Initialization is very important. Super simple. However, with recent improvements in initialization etc., BPTT does better! [Scholarpedia]
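A rough sketch of the ESN idea (illustrative sizes and scaling, not from the lecture): keep Wxh and Whh fixed and random, collect the states, and fit only the readout Who, e.g. by least squares.

import numpy as np

n_in, n_state, n_out, T = 3, 100, 2, 500
W_xh = 0.1 * np.random.randn(n_state, n_in)
W_hh = 0.1 * np.random.randn(n_state, n_state)   # never trained

X = np.random.randn(T, n_in)     # toy input sequence
Y = np.random.randn(T, n_out)    # toy targets

h, states = np.zeros(n_state), []
for x in X:
    h = np.tanh(W_xh @ x + W_hh @ h)
    states.append(h)
H = np.stack(states)             # shape (T, n_state)

W_ho, *_ = np.linalg.lstsq(H, Y, rcond=None)   # train ONLY the readout weights
predictions = H @ W_ho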

54 Inference & Training How do we make predictions using RNNs? Forward propagation: Fprop Essentially a composition of functions: a2 = f2(f1(x)). We unroll the computational graph over time-steps. Error How do we train RNNs? Propagate errors backwards through unrolled graph: Backprop-through time (BPTT). We need to consider predictions over several time-steps! Credit assignment over time. We work backwards in time from the last state to the first.

55 Training: BPTT Intuition

56 Training: Truncated BPTT


59 Unrolling the RNN Computation Graph h0 x1 y1 y2 h1 h2 x2 Unrolled over n time-steps. yt... x3 ht


61 Step 1: Compute all errors. Unrolling the RNN Computation Graph y1 h0 y2 E1 h1 x1 yt E2 h2 x2 Unrolled over n time-steps.... x3 ht Et

62 Unrolling the RNN Computation Graph y1 h0 y2 E1 h1 x1 Step 1: Compute all errors. Step 2: Pass error back for each time-step from n back to 1. yt E2 h2 x2 Unrolled over n time-steps.... x3 ht Et

63 Unrolling the RNN Computation Graph y1 h0 y2 E1 h1 x1 Step 1: Compute all errors. Step 2: Pass error back for each time-step from n back to 1. Step 3: Update weights. yt E2 h2 x2 Unrolled over n time-steps.... x3 ht Et

64 Unrolling the RNN Computation Graph yt-2 yt-1 yt ht-2 ht-1 ht xt-1 xt

65 Unrolling the RNN Computation Graph yt-2 yt-1 yt ht-2 ht-1 ht Et

66 Unrolling the RNN Computation Graph yt ht-2 ht-1 ht Et


72 Unrolling the RNN Computation Graph yt Et ht-2 ht-1 ht (the error Et is propagated back through ht, ht-1, ht-2, ... via the chain rule)

73 Unrolling the RNN Computation Graph yt-2 Et-2 yt-1 Et-1 yt Et ht-2 ht-1 ht Total Error = E1 + E2 + ... + Et Total gradient = sum of all dEt/dθ's
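Written out (a standard formulation, reconstructed rather than copied verbatim from the slides), the total error and gradient are

E = \sum_{t=1}^{n} E_t, \qquad \frac{\partial E}{\partial \theta} = \sum_{t=1}^{n} \frac{\partial E_t}{\partial \theta},

and for the recurrent weights each term expands through earlier states by the chain rule:

\frac{\partial E_t}{\partial W_{hh}} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial h_t} \Big( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \Big) \frac{\partial h_k}{\partial W_{hh}}.

The repeated product of \partial h_j / \partial h_{j-1} factors is what makes gradients vanish or explode (see the gradient-flow slides below).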

74 Training: Truncated BPTT Code

def bptt(model, x_train, y_train, initial_state):
    # Forward pass over the (truncated) sequence: accumulate the loss and cache
    # the per-step states and predictions.
    loss, caches = forward(x_train, y_train, model, initial_state)
    avg_loss = loss / y_train.shape[0]

    # Backward pass: from the last time-step back to the first.
    ys = [y_pred for (_, y_pred) in caches]
    last_state = caches[-1][0]
    dh_next = np.zeros((1, last_state.shape[0]))
    grads = {k: np.zeros_like(v) for k, v in model.items()}
    for t in reversed(range(len(x_train))):
        grad, dh_next = cell_fn_backward(ys[t], y_train[t], dh_next, caches[t])
        for k in grads.keys():
            grads[k] += grad[k]
    return grads, avg_loss

75 Training: Truncated BPTT Code The cell_fn_backward call for each time-step is a reusable lego block. Total gradient = Sum of these lego-gradients over time!
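A hypothetical usage sketch (learning_rate, x_chunk, y_chunk and the plain-SGD update are assumptions, not the practical's code): one truncated-BPTT training step is a forward pass, a backward pass, and a parameter update.

grads, avg_loss = bptt(model, x_chunk, y_chunk, initial_state)
for k in model.keys():
    model[k] -= learning_rate * grads[k]   # SGD update for every weight matrix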

76 Vanilla RNN Gradient Flow


81 Vanilla RNN Gradient Flow Part 2!

82 Gated Recurrent Models PART II: Gated Architectures & Applications

83 RECAP: The RNN API [prev_state, x → recurrent_fn() → next_state, outputs] New state = Recurrent function(Input at current time-step, Previous state)

84 The GATED RNN API It is the same! Just a different way of computing the outputs. [prev_state, x → recurrent_fn() → next_state, outputs; inside the cell: a memory cell, gates, and a controller]

85 Implementing a memory cell in a neural network

86 Propagating through a memory cell

87 Backpropagating through a memory cell?

88 LSTM = Long Short Term Memory

concatenation:   X = [Xt, Ht-1]              (size p+n)
forget gate:     f = σ(X.Wf + bf)            (size n)
update gate:     u = σ(X.Wu + bu)            (size n)
result gate:     r = σ(X.Wr + br)            (size n)
candidate input: X' = tanh(X.Wc + bc)        (size n)
new C:           Ct = f * Ct-1 + u * X'      (size n)
new H:           Ht = r * tanh(Ct)           (size n)
output:          Yt = softmax(Ht.W + b)      (size m)
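A minimal NumPy sketch of one LSTM step following the equations above (the weight/bias names and shapes, e.g. Wf of shape (p+n, n), are assumptions for illustration, not the lecturers' code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, bf, Wu, bu, Wr, br, Wc, bc):
    X = np.concatenate([x_t, h_prev])   # X = [Xt, Ht-1], size p+n
    f = sigmoid(X @ Wf + bf)            # forget gate
    u = sigmoid(X @ Wu + bu)            # update gate
    r = sigmoid(X @ Wr + br)            # result gate
    x_cand = np.tanh(X @ Wc + bc)       # candidate input X'
    c_t = f * c_prev + u * x_cand       # new cell state Ct
    h_t = r * np.tanh(c_t)              # new hidden state Ht
    return h_t, c_t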

89 LSTM [cell diagram: inputs Xt and Ht-1, cell state Ct-1 → Ct, outputs Ht and Yt; σ/tanh boxes are neural-net layers, the remaining nodes are element-wise operations]

90 LSTM concatenation (of Xt and Ht-1)

91 LSTM What to forget?

92 LSTM Actually forget

93 LSTM What's the new value? What to update?

94 LSTM Actually update

95 LSTM Updated!

96 LSTM Result Gate! Calculate Result

97 LSTM Remember the result for next time step. Output

98 LSTM = Long Short Term Memory [cell diagram alongside the equations]

concatenation: X = [Xt, Ht-1]
f = σ(X.Wf + bf)
u = σ(X.Wu + bu)
r = σ(X.Wr + br)
X' = tanh(X.Wc + bc)
Ct = f * Ct-1 + u * X'
Ht = r * tanh(Ct)
Yt = softmax(Ht.W + b)


100 Gru!

101 Gated Recurrent Units (GRUs) GRU = Gated Recurrent Unit: 2 gates instead of 3 => cheaper

concatenation: X = [Xt, Ht-1]               (size p+n)
update gate:   z = σ(X.Wz + bz)             (size n)
reset gate:    r = σ(X.Wr + br)             (size n)
reset input:   Xr = [Xt, r * Ht-1]          (size p+n)
candidate:     X' = tanh(Xr.Wc + bc)        (size n)
new H:         Ht = (1-z) * Ht-1 + z * X'   (size n)
output:        Yt = softmax(Ht.W + b)       (size m)
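And the corresponding sketch for one GRU step (again with assumed weight/bias names), showing why it is cheaper: two gates and no separate cell state:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, bz, Wr, br, Wc, bc):
    X = np.concatenate([x_t, h_prev])         # [Xt, Ht-1], size p+n
    z = sigmoid(X @ Wz + bz)                  # update gate
    r = sigmoid(X @ Wr + br)                  # reset gate
    Xr = np.concatenate([x_t, r * h_prev])    # [Xt, r * Ht-1]
    x_cand = np.tanh(Xr @ Wc + bc)            # candidate state X'
    h_t = (1 - z) * h_prev + z * x_cand       # new hidden state Ht
    return h_t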


103 Long Short-term Memory (LSTM): Gradient Flow


105 Applications/Tasks Image captioning Sequence classification (Practical 4: MNIST) Language modeling Sequence labeling (lots of NLP tasks, e.g. POS tagging, NER, etc.) Sequence-to-sequence learning (Machine translation, Summarization, etc.)

106 One-to-Many: Image captioning GOAL: Given image, generate a sentence to describe its content.


108 Many-to-one: Sequence Classifier (Prac 4) GOAL: Given a sequence of inputs, predict the label for the whole sequence. Examples: Given a sentence, say if it is {negative, neutral, positive} Given the words in an email, predict if it is a spam message. Given pieces of an image, predict what number is in the image.

109 Many-to-1: Polarity/Sentiment Classifier We feed all the words into the model one at a time, and make one prediction at the end: [h0 → h1 → h2 → h3; inputs: Cats, are, awesome; output at the end: Pos/Neg?]

110 Many-to-1: Spam Classifier We feed all the words into the model one at a time, and make one prediction at the end: [h0 → h1 → h2 → h3; inputs: Viagra, for, cheap; output at the end: Spam/Ham?]

111 Many-to-1: Image Classifier (Prac 4) We chop up the image and feed all the pieces through the model, and then make one prediction at the end: [h0 → h1 → h2 → h3; inputs: pieces of the image; output at the end: yimage]
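A minimal many-to-one sketch (rnn_step, W_hy and b_y are assumed names, not the practical's code): run the RNN over all the pieces and classify from the final state only.

import numpy as np

def classify_sequence(pieces, initial_state, rnn_step, W_hy, b_y):
    state = initial_state
    for x in pieces:                 # e.g. rows/patches of an MNIST image
        state = rnn_step(x, state)   # only the state is carried forward
    logits = W_hy @ state + b_y      # ONE prediction, at the very end
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()           # softmax over the 10 digit classes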

112 Next-token Prediction: Language modeling 1. Computing p(next word | previous words) 2. p(the), p(cat | the), p(sits | the cat), ..., p(xm | ...) [h0 → h1 → h2 → h3; inputs: the, cat, ...]
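In symbols (a standard factorisation, not copied from the slide), a language model decomposes the probability of a sequence with the chain rule, and the RNN state that has read the previous words provides each factor (with W_{hy}, b_y the output weights as before):

P(x_1, x_2, \dots, x_m) = \prod_{t=1}^{m} P(x_t \mid x_1, \dots, x_{t-1}), \qquad P(x_t \mid x_1, \dots, x_{t-1}) = \mathrm{softmax}(W_{hy}\, h_{t-1} + b_y)_{x_t}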

113 Next-token Prediction: Language modeling x1 x2 x3 h0 h1 h2 x1 x2 xn... x3 hn

114 Many-to-many: Sequence labeling Mapping each input x1, x2, ..., xn to its own label y1, y2, ..., yn (Notice: same length; each input has an output.) A lot of NLP tasks fall in this category, e.g.: Part-of-speech tagging: map words to their parts-of-speech (noun, verb, etc.). Named-entity Recognition: identify mentions of people, places, etc. in text Semantic Role Labeling: find the main actions, and who performs them on whom/what

115 Many-to-many: Sequence labeling Part-of-speech tagging


117 Many-to-many: Sequence labeling Example: "Heat water in a large vessel." [h0 → h1 → h2 → ... → hn; inputs: heat, water, in, ...; outputs: VERB, NOUN, ...]
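A minimal many-to-many sketch (assumed names again): unlike the many-to-one classifier above, we emit one label per time-step.

import numpy as np

def label_sequence(words, initial_state, rnn_step, W_hy, b_y):
    state, labels = initial_state, []
    for x in words:                             # e.g. word embeddings for heat, water, in, ...
        state = rnn_step(x, state)
        logits = W_hy @ state + b_y             # a prediction at EVERY step
        labels.append(int(np.argmax(logits)))   # e.g. the index of VERB, NOUN, ...
    return labels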

118 Many-to-many: Sequence-to-Sequence Modeling Mapping input sequences {x1, x2, ..., xm} to output sequences {y1, y2, ..., yn}. Note: arbitrary m and n. Many applications, most notably Machine Translation. Also: POS tagging, Parsing, Summarization, Dialogue
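A schematic sketch of sequence-to-sequence inference with greedy decoding (encoder_step, decoder_step and the token handling are assumptions, not a particular library's API):

def seq2seq_translate(source_tokens, encoder_step, decoder_step, h0, start, eos, max_len=50):
    # Encoder: compress the source sequence into a final state.
    h = h0
    for x in source_tokens:
        h = encoder_step(x, h)
    # Decoder: generate target tokens one at a time, feeding each back in.
    y, outputs = start, []
    for _ in range(max_len):
        y, h = decoder_step(y, h)   # next token and new decoder state
        if y == eos:
            break
        outputs.append(y)
    return outputs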

119 Many-to-many: Sequence-to-Sequence Modeling MACHINE TRANSLATION [Encoder RNN reads: the blue house; Decoder RNN is fed <s> la maison bleu and predicts la maison bleu <eos>]

120 Many-to-many: Sequence-to-Sequence Modeling DIALOGUE MODELING Prompt: You can't eat pasta every day. You can't live only on Italian food. System: I don't eat Italian food, but I do eat a lot of Italian food.

121 Many-to-many: Sequence-to-Sequence Modeling Prompt: What do you have planned tonight? Baseline: I have no idea what I want to do with my life. I have no idea what I want to (Bad, Mediocre, Bad) Ours: I was planning on doing a few of these, but I'm not sure what to choose. I need to find a place to camp, and I don't have a lot of time to make a plan. (Excellent, Good, Mediocre)

122 Many-to-many: Sequence-to-Sequence Modeling Prompt: Have you seen Suicide Squad? Baseline: I have not, but I will check it out. (Good, Good, Good) Ours: I saw it in a theater once and it was great. It was very... I don't know, a little dark. I like the psychological effects and the way it portrays the characters in the movie. (Excellent, Excellent, Excellent)

123 Key take-aways RNNs have memory/state that evolves over time. We unroll the graph over time to do forward propagation. Backprop-through-time (BPTT): perform the chain rule over the unrolled graph efficiently by saving and reusing previous computations; dE/dW is a sum over all time-steps (because of tied weights); vanilla RNNs suffer from vanishing/exploding gradients. Gated architectures: the state is selectively overwritten per time-step, giving uninterrupted gradient flow through time (no vanishing/exploding gradients!). Lots of cool applications!

124 Slide Credits Thank-you to the following resources, from which some of these slides were drawn and adapted. Stanford CS231n TensorFlow without a PhD

125 The end.
