May I Have Your Attention Please? (said one neuron to another)


1 May I Have Your Attention Please? (said one neuron to another). Ani Kembhavi, Allen Institute for Artificial Intelligence

2 The world of visual illustrations: science diagrams, maps, 3D visualizations, infographics, and many more

3 Diagrams afford deep opportunities for reasoning. Which animal does the bobcat eat? What is the effect on the population of bobcats if the population of squirrels decreases?

4 Syntactic Parsing: detect constituents (objects, text, elements); detect relationships (labels, connections). Semantic Interpretation: motion, consumption, phase change.

5 Syntactic Parsing: Deep Sequential Diagram Parser, Structured Set Matching Networks. Semantic Interpretation: Diagram Question Answering, Bidirectional Attention Flow, Textbook Question Answering.

6 The language of diagrams
Prior work in the graphics community to represent visual illustrations: "The language of graphics: A framework for the analysis of syntax and meaning in maps, charts and diagrams", J. von Engelhardt.
We build upon Engelhardt's representation of graphics.
Syntactic decomposition of a diagram (Composite Graphic): Graphic Space, Constituents, Inter-Constituent Relationships, Constituent-Space Relationships.

7 Generating candidates. Constituents: segment proposals, convolutional neural networks. Inter-Constituent Relationships: relationship proposals, random forest classifiers. Kernel density estimates. Constituent-Space Relationships.

8 Deep Sequential Diagram Parser
[Figure: candidate relationships, each described by a relationship feature vector, pass through fully connected layers (FC 1, FC 2), a stacked LSTM network (LSTM 1, LSTM 2) and a fully connected layer (FC 3). At steps c_0, c_1, c_2, ..., c_T the outputs (Add, No change, Add, ..., Final) update the Diagram Parse Graph.]
Relationship feature vector: [xy_cand, score_cand, overlap_cand, score_rel, seen_rel]
LSTMs require a lot of training data! For each training image: sample 100s of relationship sequences; sample without replacement; use the relationship score as the sampling weight.
Test time: relationships are sorted by proposal scores.

9 Parser Results Diagram Parse Graph Blobs Text Arrows Constituents Diagram

10 Understanding diagrams can be partially addressed by matching
Scarce training data motivates a one-shot scenario: must generalize to unseen categories; cannot simply learn a classifier for each part.
Absence of color and texture: local cues are ambiguous.
Pose variations between images: absolute position is ambiguous.
Must enforce a 1:1 matching between parts.

11 Structured Set Matching Network
[Figure labels: Source Image, Target Image; Encoder Network (CNNs); Appearance Matching Network (source and target part appearance vectors, 5 x 5 appearance matching scores); Context Network (LSTMs); Similarity Networks; Factor Graph for the Structured Prediction, with factors for Appearance Similarity, Structural and Appearance Consistency, Global Consistency and the Matching Constraint.]
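To illustrate why a 1:1 matching constraint matters on top of a matrix of appearance matching scores, the sketch below compares a per-part greedy choice with an exhaustive search over one-to-one assignments. It is a deliberately simplified stand-in: the slide describes a factor graph with several consistency factors, which is not reproduced here, and the score values and function names are hypothetical.

```python
from itertools import permutations

def greedy_rowwise(scores):
    """Pick the best target part for each source part independently.
    Nothing prevents two source parts from choosing the same target."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

def best_one_to_one_matching(scores):
    """Exhaustively search all 1:1 assignments and keep the highest total score.
    Feasible for small part sets such as 5 x 5 (120 permutations)."""
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Hypothetical appearance scores: rows are source parts, columns are target parts.
example_scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]
```

With these made-up scores the greedy choice maps two source parts to the same target part, while the constrained search assigns each target exactly once.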

12 Results

13 Semantic Interpretation in the context of question answering

14 Neural Models for Machine Comprehension
[Figure: a Vanilla Architecture, in which Context and Query networks feed an Answer Network, next to an Attention Architecture, in which networks over Context Word 1, Context Word 2, ..., Context Word N and the Query feed an Answer Network.]

15 Attend over Diagram Parse Graph
Each question-answer pair is converted into a statement (Statement 1 to Statement 4), and an LSTM embeds each statement into a d-dim space (statement embeddings s_1, s_2, s_3, s_4).
Each fact from the DPG is embedded into the same space by an LSTM (relation embeddings m_1, ..., m_M).
The attention module learns to attend to the relevant fact, given a question: every fact is compared with every statement through the products m_i^T s_j, each statement keeps its best-matching fact, max_i (m_i^T s_j), and a softmax over the four resulting values scores the answers.
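The scoring rule on this slide, softmax over max_i (m_i^T s_j), can be written out in a few lines. The sketch below assumes the fact and statement embeddings are already available as plain lists of floats; the LSTM encoders that produce them are omitted, and the function names are illustrative rather than taken from DQA-Net.

```python
import math

def dot(a, b):
    """Inner product m^T s of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def answer_distribution(facts, statements):
    """For each statement s_j keep its best-matching fact, max_i (m_i^T s_j),
    then normalise the per-statement scores with a softmax."""
    scores = [max(dot(m, s) for m in facts) for s in statements]
    return softmax(scores)
```

The max over facts is what lets a single relevant fact in the parse graph decide the score of a statement, regardless of how many irrelevant facts the diagram contains.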

16 Results
Method, Train Set, Accuracy:
Q + I (VQA), VQA
Q, AI2D
Q + I (VQA), AI2D, 32.90
Q + OCR, AI2D
Q + I + OCR, AI2D
DQA-Net, AI2D
Example questions with answer scores:
The diagram depicts the life cycle of: a) frog b) bird 0.02 c) insecticide 0.054 d) insect
How many stages of growth does the diagram feature? a) b) c) d)
What comes before Second feed? a) digestion 0.0 b) First feed 0.15 c) indigestion 0.0 d) oviposition 0.85

17 Neural Attention
Some characteristics of past attention models:
Attention weights are used to summarize the modality into a single vector. Ours: attended vectors are allowed to flow through to the modelling layer.
They are often temporally dynamic (attention at t affects attention at t+1). Ours: the attention mechanism is memory-less.
They are usually uni-directional. Ours: bi-directional attention, query-to-context and context-to-query.
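A small sketch may help make "bi-directional" and "memory-less" concrete. The code below computes context-to-query and query-to-context attention from a single similarity matrix, with no state carried between positions. It is a simplification under stated assumptions: a plain dot product stands in for the similarity function (the slides do not specify it), the inputs are plain lists of floats, and the names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def bidirectional_attention(context, query):
    """context: list of T word vectors, query: list of J word vectors."""
    # similarity[t][j] compares context word t with query word j
    # (dot product is an assumption made for this sketch).
    similarity = [[dot(h, u) for u in query] for h in context]
    # Context-to-query: every context word gets its own attended query vector,
    # so nothing is collapsed into a single summary vector.
    c2q = [weighted_sum(softmax(row), query) for row in similarity]
    # Query-to-context: one distribution over context words, based on each
    # word's best match with any query word.
    q2c_weights = softmax([max(row) for row in similarity])
    q2c = weighted_sum(q2c_weights, context)
    return c2q, q2c, q2c_weights
```

Because both directions are computed from the same similarity matrix at every position independently, the attention at one position does not depend on the attention at another.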

18-21 Bidirectional Attention Flow (BiDAF) Model

22 Machine Comprehension Task Over 100,000 question-answer tuples

23 Visualizations: Word vs Phrase Spaces

24 BiDAF Demo

25 Textbook QA Challenge

26 Complex parsing and reasoning

27 Textbook QA Challenge, a part of Workshop on Visual Understanding Across CVPR. Prizes sponsored by AI2.

28 Newtonian Image Understanding Unfolding the dynamics of objects in static images What happens if? Predicting the effect of forces in images

29 Unfolding Object Dynamics Predicting Effects of Forces What happens if I push this cup?

30 Spectrum of approaches Let neural networks figure it out! Estimate friction, mass, etc. Then solve some equations. Predicted trajectory

31 Spectrum of approaches Let neural networks figure it out! Intermediate Representation Game Engine Estimate friction, mass, etc. Then solve some equations.

34 More results

35 XNOR-Net Image Classification using Binary CNNs

36 Convolutional Neural Networks
[Figure: convolution between input I and weight filter W. GPU!]
Network, # operations, Inference (CPU):
AlexNet, 1.5B FLOPs, ~3 fps
VGG, 19.6B FLOPs, ~0.25 fps

37 Once we have obtained the scaling factors for all sub-tensors in I (denoted by K), we can approximate the convolution between input I and weight filter W mainly using binary operations:
I * W ≈ (sign(I) ⊛ sign(W)) ⊙ Kα
Operations, Memory saving, Computation saving:
Standard convolution (+, -, ×): 1x, 1x
Binary weights (+, -): ~32x, ~2x
Binary weights and binary input (XNOR, bit-count): ~32x, ~58x
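The "XNOR, bit-count" row can be illustrated on a single dot product: once both operands are reduced to signs, multiplying and adding become an XNOR followed by a population count. The sketch below shows only that binary core and leaves out the scaling factors; the bit-packing convention (bit set for a non-negative value) and the function names are assumptions of this example.

```python
def pack_signs(values):
    """Pack the signs of a real-valued vector into an integer:
    bit i is 1 when values[i] >= 0 (treated as +1), else 0 (treated as -1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors of +1/-1 entries stored as bits.

    XNOR marks positions where the signs agree; each agreement contributes +1
    and each disagreement -1, so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask
    return 2 * bin(agree).count("1") - n

def sign_dot(a, b):
    """Reference implementation with ordinary multiply and add."""
    sgn = lambda v: 1 if v >= 0 else -1
    return sum(sgn(x) * sgn(y) for x, y in zip(a, b))
```

On real hardware the packed words would be fixed-width machine registers, which is where the memory and computation savings quoted on the slide come from; Python integers are used here only to keep the example short.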

38 XNOR-Net Demo: on the iPhone!

39 Thank you!
Collaborators: Minjoon Seo, Eric Kolve, Mike Salvato, Jonghyun Choi, Jayant Krishnamurthy, Dustin Schwenk, Hannaneh Hajishirzi, Ali Farhadi
Projects by AI2 colleagues: Roozbeh Mottaghi, Mohammad Rastegari, Ali Farhadi
