Pointer Network. Oriol Vinyals. 박천음, Kangwon National University, Intelligent Software Lab.

1 Pointer Network
Oriol Vinyals
박천음, Kangwon National University, Intelligent Software Lab.

2 Pointer Network
Pointer Network 1
Pointer Network 2

3 Sequence-to-Sequence Model
[Figure: the sequence-to-sequence model in the Train (training) and Test settings.]

4 Encoder-Decoder Architecture
Sentence representation: the encoder relies on the RNN's ability to summarize a sequence.
RNN: $h_i = \phi_\theta(h_{i-1}, s_i)$
On the decoder side, the RNN's internal state $z_i$ is based on the summary vector $h_T$ of the source sentence.
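
The recurrence on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slide does not say what $\phi_\theta$ is, so a single tanh layer is assumed here, and the names `rnn_step`, `encode`, `W_h` and `W_s` are hypothetical.

```python
import math
import random


def rnn_step(h_prev, s, W_h, W_s):
    """One step h_i = phi(h_{i-1}, s_i); phi is assumed to be tanh(W_h h + W_s s)."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_h[k], h_prev))
            + sum(w * x for w, x in zip(W_s[k], s))
        )
        for k in range(len(W_h))
    ]


def encode(symbols, W_h, W_s):
    """Run the recurrence over the sentence and return every hidden state."""
    h = [0.0] * len(W_h)  # initial state h_0 (assumed to be zeros)
    states = []
    for s in symbols:
        h = rnn_step(h, s, W_h, W_s)
        states.append(h)
    return states


rng = random.Random(0)
hidden, embed = 4, 3
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
W_s = [[rng.uniform(-0.5, 0.5) for _ in range(embed)] for _ in range(hidden)]
sentence = [[rng.uniform(-1, 1) for _ in range(embed)] for _ in range(5)]

states = encode(sentence, W_h, W_s)
h_T = states[-1]  # summary vector of the whole source sentence
```

The last state `h_T` is the fixed-length summary that a plain encoder-decoder passes to the decoder.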

5 Soft Attention Mechanism
Bi-RNN: with a forward and a backward RNN, the whole input sentence can be summarized at every symbol, giving the states $\overrightarrow{h}_j$ and $\overleftarrow{h}_j$.
Figure 2. Bidirectional recurrent neural networks for encoding a source sentence. (Cho)

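6 Soft Attention Mechanism
However, an RNN tends to be biased toward the most recent symbols, which is undesirable. Therefore $\overrightarrow{h}_j$ and $\overleftarrow{h}_j$ are concatenated into the annotation vector
$h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$ (1)
The annotation vectors can be viewed as a set of context-dependent word representations that stores the input sentence as a variable-length representation (as opposed to a fixed-length, fixed-dimensional summary).
Given the words translated so far $(y_1, y_2, \dots, y_{t-1})$, the current word $x_j$ (or $h_j$) is translated into $y_t$.
Figure 3. Attention mechanism takes into consideration... (Cho)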
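
A minimal sketch of how such annotation vectors could be built, assuming a simple tanh RNN in each direction (the slides do not specify the cell type). The helper names `run_rnn`, `annotations` and `rand_matrix` are hypothetical.

```python
import math
import random


def run_rnn(inputs, W_h, W_x):
    """Simple tanh RNN (an assumption); returns the hidden state after each input."""
    h = [0.0] * len(W_h)
    out = []
    for x in inputs:
        h = [
            math.tanh(
                sum(a * b for a, b in zip(W_h[k], h))
                + sum(a * b for a, b in zip(W_x[k], x))
            )
            for k in range(len(W_h))
        ]
        out.append(h)
    return out


def annotations(inputs, fwd_params, bwd_params):
    """Concatenate forward and backward states position by position."""
    fwd = run_rnn(inputs, *fwd_params)
    # Read the sentence right to left, then flip back so index j lines up.
    bwd = run_rnn(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


rng = random.Random(0)
hidden, embed = 3, 2


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


fwd_params = (rand_matrix(hidden, hidden), rand_matrix(hidden, embed))
bwd_params = (rand_matrix(hidden, hidden), rand_matrix(hidden, embed))
sentence = [[rng.uniform(-1, 1) for _ in range(embed)] for _ in range(4)]

H = annotations(sentence, fwd_params, bwd_params)  # one vector per input word
```

Each entry of `H` has twice the hidden size, and the number of entries grows with the sentence, which is what makes the representation variable-length.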

7 Soft Attention Mechanism
Attention mechanism: a small neural network with a single hidden layer.
Attention score: $e_{ij} \in \mathbb{R}$ (a single scalar output)
$e_{ij} = f(z_{i-1}, h_j)$
$z_{i-1}$: the decoder's previous hidden state
$h_j$: one of the source context-dependent word representations

8 Soft Attention Mechanism
A softmax is applied to the scores of all words.
Attention weight: $a_{ij} = \frac{\exp(e_{ij})}{\sum_{j'} \exp(e_{ij'})}$
Context vector (encoding): $c_i = \sum_{j=1}^{T} a_{ij} h_j = \mathbb{E}_{a_{ij}}[h_j]$
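
The score, weight and context computations can be sketched as follows. The exact form of $f$ is not given on the slides; a one-hidden-layer tanh network is assumed here, and `score`, `softmax`, `attend`, `W_z`, `W_h` and `v` are hypothetical names with made-up values.

```python
import math


def score(z_prev, h_j, W_z, W_h, v):
    """e_ij = f(z_{i-1}, h_j): one tanh hidden layer, scalar output (assumed form)."""
    hidden = [
        math.tanh(
            sum(a * b for a, b in zip(W_z[k], z_prev))
            + sum(a * b for a, b in zip(W_h[k], h_j))
        )
        for k in range(len(v))
    ]
    return sum(vk * hk for vk, hk in zip(v, hidden))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(z_prev, H, W_z, W_h, v):
    """Return attention weights a_ij and the context vector c_i."""
    e = [score(z_prev, h_j, W_z, W_h, v) for h_j in H]
    a = softmax(e)
    c = [sum(a[j] * H[j][d] for j in range(len(H))) for d in range(len(H[0]))]
    return a, c


# Toy values, chosen only for illustration.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # annotation vectors h_1..h_3
z_prev = [0.5, -0.5]                      # previous decoder state z_{i-1}
W_z = [[0.1, 0.2], [0.3, -0.1]]
W_h = [[0.5, -0.5], [0.2, 0.4]]
v = [1.0, -1.0]

a, c = attend(z_prev, H, W_z, W_h, v)
```

Because the weights sum to one, the context vector is a weighted average of the annotation vectors, which is the expectation written on the slide.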

9 Pointer Network
Repurposing the attention mechanism to create pointers to input elements.
Input elements are coordinates such as $(x_1, y_1)$.
PtrNet can handle cases where the input and output lengths differ.
Variable-length dictionaries are represented by using a softmax probability distribution as a pointer.

10 Pointer Network
PtrNet:
$u^i_j = v^T \tanh(W_1 e_j + W_2 d_i), \quad j \in (1, \dots, n)$
$p(C_i \mid C_1, \dots, C_{i-1}, \mathcal{P}) = \mathrm{softmax}(u^i)$ (of length $n$)
Blending attention:
$a^i_j = \mathrm{softmax}(u^i_j)$
$d'_i = \sum_{j=1}^{n} a^i_j e_j$
PtrNet: Data structure
$\mathcal{P}$: point set, $\mathcal{P} = \{P_1, \dots, P_n\}$ (input)
$P_j = (x_j, y_j)$: Cartesian coordinates
$v, W_1, W_2$: learnable parameters
$e_j$, $d_i$: encoder and decoder hidden states
$u^i_j$: used as pointers to the input elements
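
A small sketch of the pointer step, assuming the encoder states and the decoder state are already available as plain lists. All parameter values are made up, and `pointer_distribution` is a hypothetical helper name, not an API from any library.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(E, d_i, W1, W2, v):
    """softmax over u_j = v^T tanh(W1 e_j + W2 d_i), one entry per input element."""
    u = []
    for e_j in E:
        hidden = [
            math.tanh(
                sum(a * b for a, b in zip(W1[k], e_j))
                + sum(a * b for a, b in zip(W2[k], d_i))
            )
            for k in range(len(v))
        ]
        u.append(sum(vk * hk for vk, hk in zip(v, hidden)))
    return softmax(u)


# Toy parameters and states, for illustration only.
W1 = [[0.4, -0.3], [0.1, 0.6]]
W2 = [[0.2, 0.5], [-0.7, 0.3]]
v = [0.9, -0.4]
d_i = [0.3, -0.2]                                      # decoder state at step i
E = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.0, 0.3]]   # encoder states, n = 4

p = pointer_distribution(E, d_i, W1, W2, v)
pointed = max(range(len(p)), key=lambda j: p[j]) + 1   # 1-based input position

# The same parameters work unchanged for a shorter input (n = 2).
p_short = pointer_distribution(E[:2], d_i, W1, W2, v)
```

The output distribution always has exactly as many entries as there are input elements, so the "dictionary" grows and shrinks with the input.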

11 Pointer Network
Pointer Network 1
Pointer Network 2

12 Pointer Network
PtrNet: Architecture and Hyperparameters
No extensive architecture or hyperparameter search was performed, which makes the main message of the paper stronger.
Single-layer LSTM
Hidden units: 256 or 512
Stochastic gradient descent
Learning rate: 1.0
Batch size: 128
Random uniform weight initialization: -0.08 to 0.08
L2 gradient clipping: 2.0
1M training example pairs
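
The settings on this slide can be collected in a small configuration object. The sketch below also shows one common reading of "L2 gradient clipping of 2.0", namely rescaling the gradient when its L2 norm exceeds 2.0; that reading is an assumption. `PtrNetConfig`, `init_uniform`, `clip_by_l2_norm` and `sgd_step` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PtrNetConfig:
    lstm_layers: int = 1
    hidden_units: int = 256       # the slide lists 256 or 512
    learning_rate: float = 1.0
    batch_size: int = 128
    init_scale: float = 0.08      # weights drawn uniformly from [-0.08, 0.08]
    grad_clip_norm: float = 2.0
    train_pairs: int = 1_000_000


def init_uniform(rows, cols, scale, rng):
    """Random uniform weight matrix in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def clip_by_l2_norm(grads, max_norm):
    """Rescale a flat gradient list so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


def sgd_step(params, grads, cfg):
    """Plain SGD update on flat parameter and gradient lists."""
    clipped = clip_by_l2_norm(grads, cfg.grad_clip_norm)
    return [p - cfg.learning_rate * g for p, g in zip(params, clipped)]


cfg = PtrNetConfig()
W = init_uniform(2, 3, cfg.init_scale, random.Random(0))
clipped = clip_by_l2_norm([3.0, 4.0], cfg.grad_clip_norm)  # norm 5.0 is scaled to 2.0
updated = sgd_step([1.0, 1.0], [3.0, 4.0], cfg)
```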

13 Pointer Network
Convex Hull: Data structure
$\mathcal{P}$: point set, $\mathcal{P} = \{P_1, \dots, P_n\}$ (input)
$P_j$: sampled from a uniform distribution in $[0, 1] \times [0, 1]$
$C^{\mathcal{P}} = \{C_1, \dots, C_{m(\mathcal{P})}\}$ (output)
$C_i$: an index between 1 and $n$ corresponding to a position in the sequence $\mathcal{P}$
Special tokens: beginning ($\Rightarrow$) or end ($\Leftarrow$)
Example
Input: $\mathcal{P} = \{P_1, \dots, P_{10}\}$
Output: $C^{\mathcal{P}} = \{\Rightarrow, 2, 4, 3, 5, 6, 7, 2, \Leftarrow\}$
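
One way to produce such input/output pairs is sketched below. It uses Andrew's monotone chain algorithm to find the hull, which is a choice made for this example and is not stated on the slides. It also assumes the output starts at the hull point with the lowest index, runs counter-clockwise and repeats the first index at the end, which matches the shape of the example above. The begin and end tokens are written as the strings `⇒` and `⇐`; all function names are hypothetical.

```python
import random

START, END = "⇒", "⇐"  # assumed spellings of the begin/end tokens


def sample_points(n, rng):
    """n points drawn uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def cross(o, a, b):
    """Positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points):
    """0-based hull indices in counter-clockwise order (monotone chain)."""
    order = sorted(range(len(points)), key=lambda i: points[i])

    def half(seq):
        chain = []
        for i in seq:
            while len(chain) >= 2 and cross(
                points[chain[-2]], points[chain[-1]], points[i]
            ) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half(order)
    upper = half(order[::-1])
    return lower[:-1] + upper[:-1]


def hull_target(points):
    """Output sequence of 1-based indices wrapped in the special tokens."""
    hull = [i + 1 for i in convex_hull_indices(points)]
    k = hull.index(min(hull))          # rotate to start at the lowest index
    hull = hull[k:] + hull[:k]
    return [START] + hull + [hull[0]] + [END]


square = [(0.5, 0.5), (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = hull_target(square)           # P_1 is interior, P_2..P_5 form the hull

random_target = hull_target(sample_points(10, random.Random(0)))
```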

14 Pointer Network
Convex Hull: Empirical Results

15 Pointer Network
Delaunay Triangulation: Data Structure
$\mathcal{P}$: a Delaunay triangulation for a set $\mathcal{P}$ of points in a plane is a triangulation in which the circumcircle of every triangle is empty, that is, there is no point from $\mathcal{P}$ in its interior.
$C^{\mathcal{P}} = \{C_1, \dots, C_{m(\mathcal{P})}\}$: the corresponding sequence representing the triangulation of the point set $\mathcal{P}$
$C_i$: three integers representing one triangle
Lexicographic order: the triangles $C_i$ are ordered by their incenter coordinates (sorting gives better performance).
Choose the increasing triangle representation, for example $C_i = (1, 2, 4)$ rather than $(2, 4, 1)$.
Example
Input: $\mathcal{P} = \{P_1, \dots, P_5\}$
Output: $C^{\mathcal{P}} = \{\Rightarrow, (1, 2, 4), (1, 4, 5), (1, 3, 5), (1, 2, 3), \Leftarrow\}$
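
The ordering rules can be sketched as below. The triangulation itself is taken as given (computing it is outside this example), and "ordered by incenter coordinates" is read as sorting on the (x, y) incenter tuple, which is an assumption. The names `incenter`, `delaunay_target`, `START` and `END` are hypothetical.

```python
import math

START, END = "⇒", "⇐"  # assumed spellings of the begin/end tokens


def incenter(a, b, c):
    """Incenter of triangle abc: vertices weighted by the opposite side lengths."""
    la = math.dist(b, c)  # side opposite a
    lb = math.dist(c, a)  # side opposite b
    lc = math.dist(a, b)  # side opposite c
    total = la + lb + lc
    return (
        (la * a[0] + lb * b[0] + lc * c[0]) / total,
        (la * a[1] + lb * b[1] + lc * c[1]) / total,
    )


def delaunay_target(points, triangles):
    """Build the output sequence from triangles given as 1-based index triples."""
    # Increasing representation: (2, 4, 1) becomes (1, 2, 4).
    normalized = [tuple(sorted(t)) for t in triangles]
    # Order triangles lexicographically by incenter coordinates.
    normalized.sort(key=lambda t: incenter(*(points[i - 1] for i in t)))
    return [START] + normalized + [END]


# Toy example: a unit square split along one diagonal (triangulation given).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
triangles = [(4, 3, 2), (3, 1, 2)]
target = delaunay_target(points, triangles)
```

Fixing both the vertex order inside a triangle and the order of triangles gives each point set a single canonical target sequence.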

16 Pointer Network
Delaunay Triangulation: Empirical Results
The Delaunay triangulation of a given set of points triangulates the convex hull of these points.

17 Pointer Network
Travelling Salesman Problem (TSP): Data structure
Input/output pairs $(\mathcal{P}, C^{\mathcal{P}})$ as in the Convex Hull problem
$\mathcal{P}$: the Cartesian coordinates representing the cities, chosen randomly in the $[0, 1] \times [0, 1]$ square
$C^{\mathcal{P}} = \{C_1, \dots, C_n\}$: a permutation of the integers from 1 to $n$ representing the optimal path
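
A tiny sketch of such a pair for a handful of cities. The slides do not say how the optimal tours were obtained; brute force over all permutations is used here only because it is short, and it is feasible only for very small $n$. Fixing city 1 as the start is also an assumption, and the function names are hypothetical.

```python
import itertools
import math
import random


def sample_cities(n, rng):
    """n cities drawn uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def tour_length(points, tour):
    """Length of the closed tour; `tour` holds 1-based city indices."""
    return sum(
        math.dist(points[tour[k] - 1], points[tour[(k + 1) % len(tour)] - 1])
        for k in range(len(tour))
    )


def optimal_tour(points):
    """Brute-force optimal tour starting at city 1 (small n only)."""
    n = len(points)
    candidates = ((1,) + rest for rest in itertools.permutations(range(2, n + 1)))
    return list(min(candidates, key=lambda t: tour_length(points, t)))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best = optimal_tour(square)
best_length = tour_length(square, best)

cities = sample_cities(6, random.Random(0))
pair = (cities, optimal_tour(cities))  # one (P, C^P) training pair
```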

18 Pointer Network
Travelling Salesman Problem (TSP): Data structure
Beam search is run over valid tours only.
Invalid tours, which repeat the same city or ignore a destination, are excluded.
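
The constraint can be sketched as a beam search that never extends a hypothesis with a city it already contains. The model is replaced by a stand-in function `toy_probs` that returns fixed next-city probabilities; that function, `beam_search_valid_tours` and the beam width are all hypothetical and only illustrate the masking idea.

```python
import math


def beam_search_valid_tours(step_probs, n, beam_width):
    """Beam search over tours of n cities that skips already visited cities.

    step_probs(prefix) returns n probabilities for the next city (0-based).
    """
    beams = [(0.0, [])]  # (log-probability, partial tour)
    for _ in range(n):
        candidates = []
        for logp, prefix in beams:
            probs = step_probs(prefix)
            for city in range(n):
                if city in prefix or probs[city] <= 0.0:
                    continue  # repeated cities would make the tour invalid
                candidates.append((logp + math.log(probs[city]), prefix + [city]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    # Report 1-based city indices, as on the slides.
    return [(logp, [c + 1 for c in tour]) for logp, tour in beams]


def toy_probs(prefix):
    """Stand-in for the model: always prefers city 1, whatever came before."""
    return [0.7, 0.2, 0.1]


results = beam_search_valid_tours(toy_probs, n=3, beam_width=2)
best_logp, best_tour = results[0]
```

Without the mask, this stand-in model would pick city 1 at every step. With it, every returned hypothesis visits each city exactly once.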

19 References
Sequence to Sequence Learning with Neural Networks, Sutskever et al.
n-neural-machine-translation-gpus-part-2/
n-neural-machine-translation-gpus-part-3/
Pointer Networks, Vinyals et al.

20 Q&A
Thank you.
박천음, 이창기, Intelligent Software Lab., Kangwon National University
