Structured Memory for Neural Turing Machines


1 Structured Memory for Neural Turing Machines Wei Zhang, Bowen Zhou, Yang Yu {zhangwei, zhou,

2 Motivation. NTM memory tries to simulate human working memory using linearly structured memory. It has proved successful for learning simple algorithms. Q: Is linear memory adequate for more complex algorithms, such as those for question answering? Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." arXiv preprint, 2014.

3 NTM's neurological intuition. NTM is trained end-to-end: both controller and memory are jointly learned by interacting with the environment. How and where to store the response to the input in memory is learned automatically, instead of being specified explicitly.

4 Empirical observations. Memory plays a key role in NTM. For larger memories, the interaction between parameters and memory contents can get out of control at the early stage of training, even when careful initialization of the memory contents is applied 1. (Diagram: NTM unrolled through time, showing the controller, read/write heads, and memory at steps t-1 and t.) 1. https://github.com/kaishengtai/torch-ntm

5 Empirical observations. Sometimes NTM does not converge, or does not converge as quickly as expected, on copy and recall. NTM struggles to learn tasks that are intuitively learnable, e.g. on the bAbI data 1, compared to the attention mechanism used in the Attentive Reader 2. 1. Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer and Tomas Mikolov. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv preprint, 2015. 2. Hermann, Karl Moritz, et al. "Teaching machines to read and comprehend." NIPS 2015.

6 Are those issues possibly related to the limitation of NTM's linear memory structure? Can we learn from human working memory? http://thebrain.mcgill.ca/flash/d/d_07/d_07_cr/d_07_cr_tra/d_07_cr_tra.html

7 A neuron tree as memory structure. "Imagine a tree. As a tree grows, so do the roots and branches. Given the right conditions (nutrients, water and sunlight), at the end of these branches leaves will grow. Much like a tree, neurons have a complex network of branches, and given the right conditions, new synapses will grow and strengthen, helping in the formation of memories." Dr. Daniel Jones BSc PhD, Director of Research and Development, Revive Active

8 Is it possible to improve NTM by using structured memory that mimics human memory in certain ways? And can we mimic tree-structured human working memory in NTM?

9 Yes and Yes

10 (Diagram: upper-layer memory accumulates responses from lower-layer memory.) Neurons closer to the root send signals to write to the other connected branch neurons. This is more like accumulating responses in memory.

11 Inspired by the tree structure of human memory, we propose Structured Memory for NTM.

12 Related work. David Krueger & Roland Memisevic. Regularizing RNNs by Stabilizing Activations. arXiv preprint, 2016. Meng et al. A Deep Memory-Based Architecture for Sequence-to-Sequence Learning. arXiv preprint.

13 How do we create structured memory in NTM? There are different ways; here we list three.

14 Three proposed models. Memory visibility (w.r.t. write heads): a memory is Controlled if it is modified directly by controller outputs through write heads, or Hidden if not. 1. Hidden-Memory 2. Double-Controlled 3. Tightly-Coupled. (Diagrams: input/output flow for each of the three models.)

15 Vanilla NTM. (Diagram: an LSTM controller c(t-1) → c(t) with addressing and read operations over a single memory M(t-1) → M(t), mapping input to output.)

16 Our proposal #1: Hidden-Memory NTM. (Diagram: as in the vanilla NTM, but the controller reads and writes a controlled memory Mc(t-1) → Mc(t), which accumulates into a hidden memory Mh(t-1) → Mh(t).)

17 Mc(t), the controlled memory, is also the short-term memory that remembers the immediate input from the environment. Mh(t), the hidden memory, works as a longer short-term memory that accumulates the contents of the controlled memory.
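As a rough illustration, a minimal sketch of one step of this two-layer update (all names here are hypothetical, and the accumulation is read as a simple running mix, which is an assumption rather than the paper's exact equation):

    import numpy as np

    def hidden_memory_step(Mc, Mh, w, erase, add, alpha=0.1):
        # Mc: controlled memory (N x W), written directly through a write head.
        # Mh: hidden memory (N x W), never written by the controller; it only
        #     accumulates the controlled memory's contents over time.
        # w: write weights over the N slots; erase, add: W-dim vectors.
        Mc = Mc * (1.0 - np.outer(w, erase)) + np.outer(w, add)  # standard NTM write
        Mh = (1.0 - alpha) * Mh + alpha * Mc                     # slow accumulation (assumed form of "acc")
        return Mc, Mh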

18 Our proposal #2: Double-Controlled NTM. (Diagram: the controller addresses, reads, and writes two memories M1(t-1) → M1(t) and M2(t-1) → M2(t), with accumulation between the two.)

19 (Diagram-only slide.)

20 Our proposal #3: Tightly-Coupled NTM. (Diagram: a multi-layer LSTM controller whose layers L1, L2, L3 each have their own addressing module ADDR1, ADDR2, ADDR3 over their own memory M1, M2, M3.)

21 Experiments

22 Implementation details. Torch. RMSProp. All modules are recurrent. Trained on CPU. Memory size: reserved big enough to hold all inputs.
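For reference, a sketch of the standard RMSProp update named above (the hyperparameter values below are illustrative defaults, not the ones used in these experiments):

    import numpy as np

    def rmsprop_step(param, grad, cache, lr=1e-4, decay=0.95, eps=1e-6):
        # Keep a running average of squared gradients, then scale the step
        # by its root so each parameter gets an adaptive learning rate.
        cache = decay * cache + (1.0 - decay) * grad ** 2
        param = param - lr * grad / (np.sqrt(cache) + eps)
        return param, cache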

23 Experiments: copy and recall. Copy: randomly generate an input vector sequence to copy; the input sequence is copied only once. Recall: randomly generate an input vector sequence, randomly choose one of the generated vectors as the query, and output the immediately following vector. Loss: sigmoid + binary cross-entropy. bAbI data for simulating question answering: word prediction with LogSoftMax + negative log-likelihood loss.
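A concrete sketch of the two synthetic tasks as described (the sequence length and vector width below are arbitrary choices for illustration, not the paper's settings):

    import numpy as np

    def copy_example(seq_len=8, width=6):
        # Target is the input sequence itself, reproduced exactly once.
        seq = (np.random.rand(seq_len, width) > 0.5).astype(np.float32)
        return seq, seq.copy()

    def recall_example(seq_len=8, width=6):
        # Query is a randomly chosen item; target is the item right after it.
        seq = (np.random.rand(seq_len, width) > 0.5).astype(np.float32)
        q = np.random.randint(seq_len - 1)
        return seq, seq[q], seq[q + 1]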

24-29 Comparison on copy task: where did they read from / write to memory? (Figures, sampled at increasing training iterations: write and read weights [position, weights] for the Vanilla NTM vs. the Hidden-Memory NTM.)

30 Copy loss per item (x-axis is # of iterations, y-axis is loss/vector, sampled every 25 iters). (Plots: Vanilla NTM, Hidden-Memory NTM, Double-Controlled NTM, and Tightly-Coupled NTM on the copy task, with 1r1w and 2r2w head configurations.)

31 The copy task is one that can lead to a nicely converged model. The recall task is harder in that it requires a random jump back to a specific location in the input.

32-33 Associative recall (x-axis is # of iterations, y-axis is log of NLL loss/vector, sampled every 25 iters). (Plots: Vanilla NTM, Hidden-Memory, Double-Controlled, Tightly-Coupled.)

34 Question answering simulation: test on bAbI data.

35 Learning example: bAbI task 1 (1K), supporting fact only. F: Mary went to the kitchen. Q: Where is Mary? A: Kitchen.

36 bAbI task 1 (1K), supporting fact only (x: #epochs, y: accuracy). (Plots: NTM-1w1r, NTM-2w2r, Hidden-Memory, Double-Controlled, Tightly-Coupled.)

37 An example: bAbI task 2 (1K), supporting fact only. F: John got the football there. F: John went to the hallway. Q: Where is the football? A: Hallway.

38 bAbI task 2 (1K train/test), supporting fact only (x: #epochs, y: accuracy). (Plots: 1R1W, 2R2W, Hidden-Memory, Double-Controlled, Tightly-Coupled.)

39 An example: bAbI task 3 (1K), supporting fact only. F: Mary journeyed to the office. F: Mary journeyed to the bathroom. F: Mary dropped the football. Q: Where was the football before the bathroom? A: Office.

40 Task 3 (1K) accuracy (x: #epochs, y: accuracy). (Plots: 1r1w, 2w2r, Hidden-Memory, Double-Controlled, Tightly-Coupled.)

41 Task 3 (1K) training loss dynamics (x: one sample per 25 iterations, y: training loss). (Plots: 1w1r, 2w2r, Double-Controlled, Tightly-Coupled, Hidden-Memory.)

42 Discussion

43 Why structured memory works: the learning perspective. Stabilizing memory, an approach proven successful for training recurrent neural networks. David Krueger & Roland Memisevic. Regularizing RNNs by Stabilizing Activations. arXiv preprint, 2016.
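For reference, the norm-stabilizer penalty from that paper discourages the hidden-state norm from changing across time steps; a minimal sketch (beta is their regularization strength, and the value below is only illustrative):

    import numpy as np

    def norm_stabilizer_penalty(hiddens, beta=50.0):
        # hiddens: hidden states h_1 .. h_T, one vector per time step.
        norms = np.array([np.linalg.norm(h) for h in hiddens])
        # Mean squared change in norm between consecutive steps.
        return beta * np.mean((norms[1:] - norms[:-1]) ** 2)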

44 Why structured memory works: the learning perspective. Smoothing: the expectation stabilizes activations (memory), giving a better parameter fit. Items written earlier in the history are forgotten (possibly because of the erase and add operations); accumulation of memory brings the old memory back.

45 Future and ongoing work. Memory with more complicated structure: tensors as weights, for more complex ways of mixing memory contents? Real question answering datasets. Hermann, Karl Moritz, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. "Teaching machines to read and comprehend." NIPS 2015.

46 Thanks for your reasoning, attention and memory during my talk! Q&A
