Structured Attention Networks

Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush (HarvardNLP)
ICLR, 2017. Presenter: Chao Jiang

Outline
1. Deep Neural Networks for Text Processing and Generation
2. Attention Networks
3. Structured Attention Networks
   Overview
   Computational Challenges
   Structured Attention in Practice
4. Conclusion and Future Work

Pure Encoder-Decoder Network

Encoder-Decoder with Attention
Applications: machine translation, question answering, natural language inference, algorithm learning, parsing, speech recognition, summarization, caption generation, and more.

Attention Networks

Overview
Key difference: replace simple attention with a distribution over a combinatorial set of structures.
The attention distribution is represented with a graphical model over multiple latent variables.
Attention is computed using embedded inference.
New model: $p(z \mid x, q; \theta)$, an attention distribution over structures $z$.
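For comparison with the overview above, simple attention can be viewed as a single categorical latent variable over input positions, with the context vector computed as an expectation under that distribution. The following is a minimal sketch under stated assumptions: the dot-product scoring function, the array shapes, and the names `x`, `q`, and `simple_attention` are illustrative and do not come from the slides.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def simple_attention(x, q):
    """Simple attention with one categorical latent variable z over positions.

    x: (T, d) array of input annotations (hypothetical encoder states).
    q: (d,) query vector (hypothetical decoder state).
    Returns p(z = i | x, q) and the context vector.
    """
    scores = x @ q        # dot-product scoring (an assumption of this sketch)
    p = softmax(scores)   # attention distribution over positions, shape (T,)
    context = p @ x       # expectation of x_z under p, shape (d,)
    return p, context

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
q = rng.normal(size=4)
p, context = simple_attention(x, q)
```

Structured attention keeps the same "context as expectation" view but replaces the single categorical distribution `p` with marginals of a graphical model over several latent variables.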

Structured Attention Networks

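Motivation: Structured Output Prediction
Modeling the structured output (i.e., a graphical model on top of a neural net) has improved performance.
Given a sequence $x = x_1, \dots, x_T$ and factored potentials $\theta_{i,i+1}(z_i, z_{i+1}; x)$:
\[
p(z \mid x; \theta) = \operatorname{softmax}\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big) = \frac{1}{Z} \exp\Big(\sum_{i=1}^{T-1} \theta_{i,i+1}(z_i, z_{i+1}; x)\Big)
\]
where $Z$ is the normalizing constant, obtained by summing the exponentiated score over all structures $z$.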
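The linear-chain distribution above can be checked numerically on a tiny example. The sketch below is illustrative only: the potentials are random stand-ins for the outputs of a neural net, and the function names are hypothetical. It computes the normalizer both by enumerating every label sequence and by a forward recursion in log space; the two should agree.

```python
import itertools
import numpy as np

def chain_score(theta, z):
    """Sum of pairwise potentials theta[i, z_i, z_{i+1}] along the chain."""
    return sum(theta[i, z[i], z[i + 1]] for i in range(len(z) - 1))

def log_partition_brute_force(theta):
    """log Z by enumerating all K^T label sequences (tiny inputs only)."""
    T, K = theta.shape[0] + 1, theta.shape[1]
    scores = [chain_score(theta, z)
              for z in itertools.product(range(K), repeat=T)]
    return np.logaddexp.reduce(scores)

def log_partition_forward(theta):
    """log Z by the forward recursion in log space, O(T K^2)."""
    K = theta.shape[1]
    alpha = np.zeros(K)  # log-score of the empty prefix for each first label
    for i in range(theta.shape[0]):
        # alpha_new[b] = logsumexp over a of (alpha[a] + theta[i, a, b])
        alpha = np.logaddexp.reduce(alpha[:, None] + theta[i], axis=0)
    return np.logaddexp.reduce(alpha)

def log_prob(theta, z):
    """log p(z | x; theta) = score(z) - log Z."""
    return chain_score(theta, z) - log_partition_forward(theta)

rng = np.random.default_rng(0)
T, K = 4, 3
theta = rng.normal(size=(T - 1, K, K))  # stand-in for neural-net potentials
```

Enumeration costs O(K^T) and is only feasible for toy sizes, which is why a dynamic program is needed for the normalizer in practice.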

Structured Attention Networks: Notation

Challenge: End-to-End Training

Forward-Backward Algorithms

Forward-Backward Algorithms (Log-Space)

Structured Attention Networks for NMT

Backpropagating through Forward-Backward

Structured Attention Networks for NMT

Neural Machine Translation Experiments
Data: the dataset is from WAT 2015.
Japanese characters to English characters
Japanese words to English words

Attention Visualization: Ground Truth

Attention Visualization: Simple Attention

Attention Visualization: Structured Attention

Structured Attention Networks for Question Answering

Structured Attention Networks for Natural Language Inference

Conclusion and Future Work
Structured Attention Networks:
  Generalize attention to incorporate latent structure
  Exact inference through dynamic programming
  Training remains end-to-end
Future work:
  Approximate differentiable inference in neural networks
  Incorporate other probabilistic models into deep learning
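The "exact inference through dynamic programming" point can be illustrated with a log-space forward-backward pass over a linear chain. This is a minimal sketch, not the authors' implementation. It assumes pairwise log-potentials only, binary latent variables where `z_i = 1` means position `i` is selected, and random stand-in inputs; all names are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_backward(theta):
    """Unary marginals p(z_i = k | x) for a linear chain.

    theta: (T-1, K, K) log-potentials theta[i, a, b] for (z_i = a, z_{i+1} = b).
    Returns a (T, K) array whose rows each sum to 1.
    """
    n_edges, K, _ = theta.shape
    T = n_edges + 1
    alpha = np.zeros((T, K))  # forward log-messages
    beta = np.zeros((T, K))   # backward log-messages
    for i in range(1, T):
        alpha[i] = logsumexp(alpha[i - 1][:, None] + theta[i - 1], axis=0)
    for i in range(T - 2, -1, -1):
        beta[i] = logsumexp(theta[i] + beta[i + 1][None, :], axis=1)
    log_z = logsumexp(alpha[-1], axis=0)
    return np.exp(alpha + beta - log_z)

rng = np.random.default_rng(0)
T, K, d = 5, 2, 4
theta = rng.normal(size=(T - 1, K, K))  # stand-in for neural-net potentials
x = rng.normal(size=(T, d))             # stand-in input annotations

marginals = forward_backward(theta)
# Context vector as an expectation: weight each x_i by p(z_i = 1 | x).
context = marginals[:, 1] @ x
```

Every step here is a sum, a product, or a log-sum-exp, so the same computation written in an automatic-differentiation framework could be trained end-to-end; the NumPy version above computes only the forward values.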