Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush (HarvardNLP)
ICLR, 2017
Presenter: Chao Jiang
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Pure Encoder-Decoder Network
Encoder-Decoder with Attention
Machine Translation
Question Answering
Natural Language Inference
Algorithm Learning
Parsing
Speech Recognition
Summarization
Caption Generation
and more
Attention Networks
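The simple attention these slides build on can be sketched as a softmax over query-key scores followed by a weighted average of the values. This is a minimal NumPy illustration; dot-product scoring and the vector sizes are assumptions for the sketch, not the deck's exact parameterization:

```python
import numpy as np

def attention(query, keys, values):
    """Simple attention: softmax over query-key scores, then a
    weighted average of the value vectors."""
    scores = keys @ query                        # (T,) similarity scores
    scores -= scores.max()                       # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ values                   # weighted average, shape (d,)
    return weights, context

# Illustrative random query, keys, and values (T=5 positions, d=4 dims).
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
w, c = attention(q, K, V)
```

Because the weights come from an independent softmax, each position is attended to in isolation; structured attention replaces exactly this step.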
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Overview
Key difference: replace simple attention with a distribution over a combinatorial set of structures
Attention distribution represented with a graphical model over multiple latent variables
Compute attention using embedded inference
New model: p(z | x, q; θ), an attention distribution over structures z
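Concretely, the soft selection weights now come from marginals p(z_i | x, q) of a graphical model rather than from an independent softmax, while the context vector is still an expectation over encoder states. A small sketch, where the marginal values are hypothetical stand-ins for the output of inference:

```python
import numpy as np

# Structured attention: weights are marginals p(z_i = 1 | x, q) of a
# graphical model over latent selection variables z_1..z_T, not an
# independent softmax. The context is the expectation under p(z | x, q).
T, d = 5, 3
rng = np.random.default_rng(4)
H = rng.normal(size=(T, d))                      # encoder hidden states
marginals = np.array([0.1, 0.7, 0.9, 0.2, 0.4])  # hypothetical p(z_i = 1 | x, q)
context = marginals @ H                          # expected annotation, shape (d,)
```

Note the marginals need not sum to one across positions: multiple latent variables can be "on" simultaneously, which is what lets structured attention select segments rather than single positions.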
Structured Attention Networks
Motivation: Structured Output Prediction
Modeling the structured output (i.e., a graphical model on top of a neural net) has improved performance
Given a sequence x = x_1, ..., x_T
Factored potentials θ_{i,i+1}(z_i, z_{i+1}; x)

p(z \mid x; \theta) = \mathrm{softmax}\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big) = \frac{1}{Z} \exp\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big)
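The softmax above ranges over all K^T label sequences, which is only tractable because the potentials factor over adjacent pairs. A small sketch comparing brute-force enumeration of the partition function Z with the O(TK^2) forward recursion (the chain length, label count, and random potentials are illustrative):

```python
import itertools
import numpy as np

# Linear-chain model with pairwise log-potentials theta[i][z_i, z_{i+1}]:
# p(z | x) = exp(sum_i theta_i(z_i, z_{i+1})) / Z, the softmax over
# all K^T label sequences from the slide's equation.
K, T = 3, 4
rng = np.random.default_rng(1)
theta = rng.normal(size=(T - 1, K, K))           # factored potentials

# Brute force: enumerate all K^T = 81 sequences.
Z_brute = sum(
    np.exp(sum(theta[i, z[i], z[i + 1]] for i in range(T - 1)))
    for z in itertools.product(range(K), repeat=T)
)

# Forward algorithm: O(T K^2) matrix-vector recursions.
alpha = np.ones(K)
for i in range(T - 1):
    alpha = alpha @ np.exp(theta[i])             # alpha'_j = sum_k alpha_k e^{theta[i,k,j]}
Z_forward = alpha.sum()
```

The two values of Z agree, but the forward recursion scales to realistic sequence lengths where enumeration cannot.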
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Structured Attention Networks: Notation
Challenge: End-to-End Training
Forward-Backward Algorithms
Forward-Backward Algorithms (Log-Space)
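Running forward-backward in log space avoids the numerical underflow that products of exponentiated potentials cause on long sequences. A self-contained sketch that computes log Z and the unary marginals p(z_i = k | x), which are the quantities structured attention uses as weights (the chain size and random potentials are illustrative):

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_backward(theta):
    """Log-space forward-backward for a linear chain with pairwise
    log-potentials theta[i][z_i, z_{i+1}]; returns log Z and the
    unary marginals p(z_i = k | x) of shape (T, K)."""
    Tm1, K, _ = theta.shape
    T = Tm1 + 1
    log_alpha = np.zeros((T, K))
    for i in range(Tm1):
        # alpha[i+1, j] = logsumexp_k( alpha[i, k] + theta[i, k, j] )
        log_alpha[i + 1] = logsumexp(log_alpha[i][:, None] + theta[i], axis=0)
    log_beta = np.zeros((T, K))
    for i in range(Tm1 - 1, -1, -1):
        # beta[i, k] = logsumexp_j( theta[i, k, j] + beta[i+1, j] )
        log_beta[i] = logsumexp(theta[i] + log_beta[i + 1][None, :], axis=1)
    log_Z = logsumexp(log_alpha[-1], axis=0)
    marginals = np.exp(log_alpha + log_beta - log_Z)
    return log_Z, marginals

rng = np.random.default_rng(2)
theta = rng.normal(size=(3, 4, 4))               # T=4 positions, K=4 labels
log_Z, marg = forward_backward(theta)
```

Each row of `marg` sums to one, since summing alpha·beta over labels at any position recovers Z.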
Structured Attention Networks for NMT
Backpropagating through Forward-Backward
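The key fact behind end-to-end training is the exponential-family identity that the gradient of log Z with respect to a potential θ_i(k, j) equals the pairwise marginal p(z_i = k, z_{i+1} = j | x), so differentiating through the forward pass yields exactly the marginals attention needs. A finite-difference sketch of that identity on a tiny chain (K=2 labels, T=3 positions, random potentials; the setup is illustrative):

```python
import itertools
import numpy as np

def log_Z(theta):
    """log partition function of a linear chain via the forward recursion."""
    K = theta.shape[1]
    alpha = np.ones(K)
    for t in range(theta.shape[0]):
        alpha = alpha @ np.exp(theta[t])
    return np.log(alpha.sum())

rng = np.random.default_rng(3)
theta = rng.normal(size=(2, 2, 2))               # T=3 positions, K=2 labels

# Numeric gradient of log Z with respect to theta[0, 0, 1].
eps = 1e-6
d = np.zeros_like(theta)
d[0, 0, 1] = eps
grad_numeric = (log_Z(theta + d) - log_Z(theta - d)) / (2 * eps)

# Pairwise marginal p(z_0 = 0, z_1 = 1) by brute-force enumeration.
num = sum(
    np.exp(sum(theta[t, z[t], z[t + 1]] for t in range(2)))
    for z in itertools.product(range(2), repeat=3)
    if z[0] == 0 and z[1] == 1
)
marg = num / np.exp(log_Z(theta))
```

The numeric gradient and the enumerated marginal coincide, which is why an autodiff framework can backpropagate through forward-backward without any hand-derived gradient code.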
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Structured Attention Networks for NMT
Neural Machine Translation Experiments
Data: from WAT 2015
Japanese characters to English characters
Japanese words to English words
Neural Machine Translation Experiments
Attention Visualization: Ground Truth
Attention Visualization: Simple Attention
Attention Visualization: Structured Attention
Structured Attention Networks for Question Answering
Structured Attention Networks for Natural Language Inference
Conclusion and Future Work
Structured Attention Networks:
  Generalize attention to incorporate latent structure
  Exact inference through dynamic programming
  Training remains end-to-end
Future work:
  Approximate differentiable inference in neural networks
  Incorporate other probabilistic models into deep learning