Structured Attention Networks

Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush
HarvardNLP
ICLR, 2017
Presenter: Chao Jiang

Outline

1. Deep Neural Networks for Text Processing and Generation
2. Attention Networks
3. Structured Attention Networks
   - Overview
   - Computational Challenges
   - Structured Attention in Practice
4. Conclusion and Future Work

Pure Encoder-Decoder Network

Encoder-Decoder with Attention

- Machine Translation
- Question Answering
- Natural Language Inference
- Algorithm Learning
- Parsing
- Speech Recognition
- Summarization
- Caption Generation
- and more

Attention Networks

Overview

- Key difference: replace simple attention with a distribution over a combinatorial set of structures
- The attention distribution is represented with a graphical model over multiple latent variables
- Attention is computed using embedded inference
- New model: $p(z \mid x, q; \theta)$, an attention distribution over structures $z$
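For contrast with the structured model, here is a minimal sketch of the simple attention being replaced: a single categorical distribution over input positions, used to average the inputs. The dot-product scoring function and the names `simple_attention`, `query`, and `inputs` are assumptions for illustration; the slides do not specify a scoring function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simple_attention(query, inputs):
    """Return (weights, context) for one query over a list of input vectors.

    Assumption: scores are plain dot products between query and inputs.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in inputs]
    weights = softmax(scores)  # p(z = i | x, q) for each position i
    dim = len(inputs[0])
    # Context vector: expectation of the inputs under the attention distribution.
    context = [
        sum(w * vec[d] for w, vec in zip(weights, inputs)) for d in range(dim)
    ]
    return weights, context


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = simple_attention(query, inputs)
```

Each position is scored independently here, so the weights carry no information about how neighboring selections relate to each other. That is the gap a distribution over structures is meant to address.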

Structured Attention Networks

Motivation: Structured Output Prediction

- Modeling the structured output (i.e., a graphical model on top of a neural net) has improved performance
- Given a sequence $x = x_1, \dots, x_T$
- Factored potentials $\theta_{i,i+1}(z_i, z_{i+1}; x)$

$$p(z \mid x; \theta) = \operatorname{softmax}\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big) = \frac{1}{Z} \exp\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big)$$
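To make the linear-chain distribution concrete, the sketch below enumerates every label sequence for a tiny chain and normalizes the exponentiated scores. The potential values in `theta` are made up for illustration, and brute-force enumeration is used only because the example is small; it is exponential in T and is not how inference would be done in practice.

```python
import itertools
import math


def sequence_score(theta, z):
    """Sum of pairwise potentials theta[i][z_i][z_{i+1}] along the chain."""
    return sum(theta[i][z[i]][z[i + 1]] for i in range(len(z) - 1))


def brute_force_distribution(theta, num_states):
    """Return ({z: p(z | x; theta)}, log Z) by enumerating all sequences."""
    T = len(theta) + 1  # T positions means T - 1 pairwise potentials
    seqs = list(itertools.product(range(num_states), repeat=T))
    scores = [sequence_score(theta, z) for z in seqs]
    m = max(scores)
    log_Z = m + math.log(sum(math.exp(s - m) for s in scores))
    probs = {z: math.exp(s - log_Z) for z, s in zip(seqs, scores)}
    return probs, log_Z


# Hypothetical potentials for T = 3 positions and 2 states per position.
theta = [
    [[0.5, -1.0], [0.0, 2.0]],   # theta_{1,2}(z_1, z_2; x)
    [[1.0, 0.0], [-0.5, 0.3]],   # theta_{2,3}(z_2, z_3; x)
]
probs, log_Z = brute_force_distribution(theta, num_states=2)
best = max(probs, key=probs.get)  # highest-probability sequence
```

The dictionary `probs` holds one entry per sequence, which is why dynamic programming is needed once T grows.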

Structured Attention Networks: Notation

Challenge: End-to-End Training

Forward-Backward Algorithms

Forward-Backward Algorithms (Log-Space)
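The slide content for the forward-backward algorithm did not survive extraction. As a rough illustration, here is a sketch of the standard log-space recursion for a chain with pairwise potentials only. The function names, the absence of unary potentials, and the example values are assumptions, not the presenter's formulation.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward(theta, num_states):
    """Return (log_Z, edge_marginals) for a linear chain.

    theta[i][a][b] is the log-potential for z_i = a, z_{i+1} = b.
    edge_marginals[i][a][b] is p(z_i = a, z_{i+1} = b | x).
    """
    n_edges = len(theta)
    states = range(num_states)
    # alpha[i][a]: log-sum of scores of all prefixes that end with z_i = a.
    alpha = [[0.0] * num_states for _ in range(n_edges + 1)]
    for i in range(n_edges):
        for b in states:
            alpha[i + 1][b] = logsumexp(
                [alpha[i][a] + theta[i][a][b] for a in states]
            )
    # beta[i][a]: log-sum of scores of all suffixes that start with z_i = a.
    beta = [[0.0] * num_states for _ in range(n_edges + 1)]
    for i in reversed(range(n_edges)):
        for a in states:
            beta[i][a] = logsumexp(
                [theta[i][a][b] + beta[i + 1][b] for b in states]
            )
    log_Z = logsumexp(alpha[n_edges])
    marginals = [
        [
            [
                math.exp(alpha[i][a] + theta[i][a][b] + beta[i + 1][b] - log_Z)
                for b in states
            ]
            for a in states
        ]
        for i in range(n_edges)
    ]
    return log_Z, marginals


# Hypothetical potentials for T = 3 positions and 2 states per position.
theta = [
    [[0.5, -1.0], [0.0, 2.0]],
    [[1.0, 0.0], [-0.5, 0.3]],
]
log_Z, marginals = forward_backward(theta, num_states=2)
```

Working in log space keeps the recursion stable for long sequences, where products of exponentiated potentials would otherwise overflow or underflow.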

Structured Attention Networks for NMT

Backpropagating through Forward-Backward
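The details of this slide are not recoverable, so the following is only a related sanity check, not the backward pass from the talk. It relies on a standard identity for exponential-family models: the gradient of log Z with respect to a potential equals the marginal probability of the corresponding edge assignment. The sketch verifies this with central finite differences; `log_partition`, `numeric_grad`, and the example potentials are illustrative assumptions.

```python
import copy
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(theta, num_states):
    """log Z of a linear chain via the forward recursion in log space."""
    alpha = [0.0] * num_states
    for edge in theta:
        alpha = [
            logsumexp([alpha[a] + edge[a][b] for a in range(num_states)])
            for b in range(num_states)
        ]
    return logsumexp(alpha)


def numeric_grad(theta, num_states, eps=1e-5):
    """Central-difference estimate of d log Z / d theta[i][a][b]."""
    grad = copy.deepcopy(theta)
    for i in range(len(theta)):
        for a in range(num_states):
            for b in range(num_states):
                plus = copy.deepcopy(theta)
                minus = copy.deepcopy(theta)
                plus[i][a][b] += eps
                minus[i][a][b] -= eps
                grad[i][a][b] = (
                    log_partition(plus, num_states)
                    - log_partition(minus, num_states)
                ) / (2 * eps)
    return grad


# Hypothetical potentials for T = 3 positions and 2 states per position.
theta = [
    [[0.5, -1.0], [0.0, 2.0]],
    [[1.0, 0.0], [-0.5, 0.3]],
]
grad = numeric_grad(theta, num_states=2)
```

Because each `grad[i]` table matches the edge marginals, it sums to one. Differentiating the marginals themselves, which end-to-end training requires, needs a further pass through the recursion and is not shown here.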

Structured Attention Networks for NMT

Neural Machine Translation Experiments

Data
- Dataset is from WAT 2015
- Japanese characters to English characters
- Japanese words to English words

Attention Visualization: Ground Truth

Attention Visualization: Simple Attention

Attention Visualization: Structured Attention

Structured Attention Networks for Question Answering

Structured Attention Networks for Natural Language Inference

Conclusion and Future Work

Structured Attention Networks
- Generalize attention to incorporate latent structure
- Exact inference through dynamic programming
- Training remains end-to-end

Future work
- Approximate differentiable inference in neural networks
- Incorporate other probabilistic models into deep learning
