Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush (HarvardNLP)
ICLR, 2017
Presenter: Chao Jiang
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Pure Encoder-Decoder Network
Encoder-Decoder with Attention
Machine Translation
Question Answering
Natural Language Inference
Algorithm Learning
Parsing
Speech Recognition
Summarization
Caption Generation
and more
Attention Networks
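The simple attention these slides build on can be sketched as a softmax over query-key scores followed by a weighted average of the values. This is a minimal NumPy illustration; dot-product scoring and the vector sizes are assumptions for the sketch, not the deck's exact parameterization:

```python
import numpy as np

def attention(query, keys, values):
    """Simple attention: softmax over query-key scores, then a
    weighted average of the value vectors."""
    scores = keys @ query                        # (T,) similarity scores
    scores -= scores.max()                       # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ values                   # weighted average, shape (d,)
    return weights, context

# Illustrative random query, keys, and values (T=5 positions, d=4 dims).
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
w, c = attention(q, K, V)
```

Because the weights come from an independent softmax, each position is attended to in isolation; structured attention replaces exactly this step.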
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Overview
Key difference: replace simple attention with a distribution over a combinatorial set of structures
Attention distribution represented with a graphical model over multiple latent variables
Compute attention using embedded inference
New model: p(z | x, q; θ), an attention distribution over structures z
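Concretely, the soft selection weights now come from marginals p(z_i | x, q) of a graphical model rather than from an independent softmax, while the context vector is still an expectation over encoder states. A small sketch, where the marginal values are hypothetical stand-ins for the output of inference:

```python
import numpy as np

# Structured attention: weights are marginals p(z_i = 1 | x, q) of a
# graphical model over latent selection variables z_1..z_T, not an
# independent softmax. The context is the expectation under p(z | x, q).
T, d = 5, 3
rng = np.random.default_rng(4)
H = rng.normal(size=(T, d))                      # encoder hidden states
marginals = np.array([0.1, 0.7, 0.9, 0.2, 0.4])  # hypothetical p(z_i = 1 | x, q)
context = marginals @ H                          # expected annotation, shape (d,)
```

Note the marginals need not sum to one across positions: multiple latent variables can be "on" simultaneously, which is what lets structured attention select segments rather than single positions.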
Structured Attention Networks
Motivation: Structured Output Prediction
Modeling the structured output (i.e., a graphical model on top of a neural net) has improved performance
Given a sequence x = x_1, ..., x_T
Factored potentials θ_{i,i+1}(z_i, z_{i+1}; x)

p(z \mid x; \theta) = \mathrm{softmax}\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big) = \frac{1}{Z} \exp\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big)
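The softmax above ranges over all K^T label sequences, which is only tractable because the potentials factor over adjacent pairs. A small sketch comparing brute-force enumeration of the partition function Z with the O(TK^2) forward recursion (the chain length, label count, and random potentials are illustrative):

```python
import itertools
import numpy as np

# Linear-chain model with pairwise log-potentials theta[i][z_i, z_{i+1}]:
# p(z | x) = exp(sum_i theta_i(z_i, z_{i+1})) / Z, the softmax over
# all K^T label sequences from the slide's equation.
K, T = 3, 4
rng = np.random.default_rng(1)
theta = rng.normal(size=(T - 1, K, K))           # factored potentials

# Brute force: enumerate all K^T = 81 sequences.
Z_brute = sum(
    np.exp(sum(theta[i, z[i], z[i + 1]] for i in range(T - 1)))
    for z in itertools.product(range(K), repeat=T)
)

# Forward algorithm: O(T K^2) matrix-vector recursions.
alpha = np.ones(K)
for i in range(T - 1):
    alpha = alpha @ np.exp(theta[i])             # alpha'_j = sum_k alpha_k e^{theta[i,k,j]}
Z_forward = alpha.sum()
```

The two values of Z agree, but the forward recursion scales to realistic sequence lengths where enumeration cannot.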
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Structured Attention Networks: Notation
Challenge: End-to-End Training
Forward-Backward Algorithms
Forward-Backward Algorithms (Log-Space)
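Running forward-backward in log space avoids the numerical underflow that products of exponentiated potentials cause on long sequences. A self-contained sketch that computes log Z and the unary marginals p(z_i = k | x), which are the quantities structured attention uses as weights (the chain size and random potentials are illustrative):

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_backward(theta):
    """Log-space forward-backward for a linear chain with pairwise
    log-potentials theta[i][z_i, z_{i+1}]; returns log Z and the
    unary marginals p(z_i = k | x) of shape (T, K)."""
    Tm1, K, _ = theta.shape
    T = Tm1 + 1
    log_alpha = np.zeros((T, K))
    for i in range(Tm1):
        # alpha[i+1, j] = logsumexp_k( alpha[i, k] + theta[i, k, j] )
        log_alpha[i + 1] = logsumexp(log_alpha[i][:, None] + theta[i], axis=0)
    log_beta = np.zeros((T, K))
    for i in range(Tm1 - 1, -1, -1):
        # beta[i, k] = logsumexp_j( theta[i, k, j] + beta[i+1, j] )
        log_beta[i] = logsumexp(theta[i] + log_beta[i + 1][None, :], axis=1)
    log_Z = logsumexp(log_alpha[-1], axis=0)
    marginals = np.exp(log_alpha + log_beta - log_Z)
    return log_Z, marginals

rng = np.random.default_rng(2)
theta = rng.normal(size=(3, 4, 4))               # T=4 positions, K=4 labels
log_Z, marg = forward_backward(theta)
```

Each row of `marg` sums to one, since summing alpha·beta over labels at any position recovers Z.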
Structured Attention Networks for NMT
Backpropagating through Forward-Backward
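The key fact behind end-to-end training is the exponential-family identity that the gradient of log Z with respect to a potential θ_i(k, j) equals the pairwise marginal p(z_i = k, z_{i+1} = j | x), so differentiating through the forward pass yields exactly the marginals attention needs. A finite-difference sketch of that identity on a tiny chain (K=2 labels, T=3 positions, random potentials; the setup is illustrative):

```python
import itertools
import numpy as np

def log_Z(theta):
    """log partition function of a linear chain via the forward recursion."""
    K = theta.shape[1]
    alpha = np.ones(K)
    for t in range(theta.shape[0]):
        alpha = alpha @ np.exp(theta[t])
    return np.log(alpha.sum())

rng = np.random.default_rng(3)
theta = rng.normal(size=(2, 2, 2))               # T=3 positions, K=2 labels

# Numeric gradient of log Z with respect to theta[0, 0, 1].
eps = 1e-6
d = np.zeros_like(theta)
d[0, 0, 1] = eps
grad_numeric = (log_Z(theta + d) - log_Z(theta - d)) / (2 * eps)

# Pairwise marginal p(z_0 = 0, z_1 = 1) by brute-force enumeration.
num = sum(
    np.exp(sum(theta[t, z[t], z[t + 1]] for t in range(2)))
    for z in itertools.product(range(2), repeat=3)
    if z[0] == 0 and z[1] == 1
)
marg = num / np.exp(log_Z(theta))
```

The numeric gradient and the enumerated marginal coincide, which is why an autodiff framework can backpropagate through forward-backward without any hand-derived gradient code.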
Outline
1 Deep Neural Networks for Text Processing and Generation
2 Attention Networks
3 Structured Attention Networks
  Overview
  Computational Challenges
  Structured Attention in Practice
4 Conclusion and Future Work
Structured Attention Networks for NMT
Neural Machine Translation Experiments
Data: from WAT 2015
Japanese characters to English characters
Japanese words to English words
Neural Machine Translation Experiments
Attention Visualization: Ground Truth
Attention Visualization: Simple Attention
Attention Visualization: Structured Attention
Structured Attention Networks for Question Answering
Structured Attention Networks for Natural Language Inference
Conclusion and Future Work
Structured Attention Networks:
  Generalize attention to incorporate latent structure
  Exact inference through dynamic programming
  Training remains end-to-end
Future work:
  Approximate differentiable inference in neural networks
  Incorporate other probabilistic models into deep learning