Dynamic Routing Between Capsules. Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy

1 Dynamic Routing Between Capsules Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy

2 Problems & Results
Object classification in images without losing information about important parts of the picture.
smallNORB: images of 3D objects in 5 classes (50 toys photographed from different angles)
  CapsNet (2017): 2.7% error
  State of the art (2017): 2.56% error
  *CapsNet (2018): 1.4% error
MNIST: handwritten digit classification
  Result: 0.25% error (state of the art)

3 Problem with CNN
How would a ConvNet achieve rotational invariance?

4 Motivation
Traditional ConvNet
  Translational invariance through max pooling
  Susceptible to affine transformations
  Max pooling throws away information
  Human brains don't work like that
Capsule Networks
  Equivariance through routing by agreement
  Equivariance keeps track of where something is in the image
  Robust to affine transformations
  Makes biological sense
  Achieves inverse rendering (with capsules)

5 Rendering vs. Inverse Rendering

7 What is a capsule?
A capsule is a group of neurons that outputs a vector activation
The vector represents features related to the object
A capsule represents the inverse graphics of a patch of the image
Orientation of the vector: represents properties of the entity
Length of the vector: represents the probability that the entity exists
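For the vector length to be read as an existence probability, it has to lie between 0 and 1. The original paper uses a "squash" nonlinearity for this; the slide does not show it, so the following NumPy sketch is an illustration under that assumption. The function name `squash` and the `eps` stabilizer are choices made here, not taken from the slides.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)           # short vectors -> ~0, long vectors -> ~1
    return scale * s / np.sqrt(sq_norm + eps)   # eps avoids division by zero

# Two hypothetical capsule pre-activations: one weak, one strong.
s = np.array([[0.1, 0.0],
              [3.0, 4.0]])
v = squash(s)
lengths = np.linalg.norm(v, axis=-1)
print(lengths)  # the first length is close to 0, the second close to 1
```

The direction of each vector is unchanged, so the encoded properties are preserved while the length behaves like a probability.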

8 A Toy Example

9 A Toy Example Slides heavily inspired by Aurélien Géron [2]

10 Predict Next Layer's Output

12 Predict Next Layer's Output
Strong agreement! The rectangle and triangle capsules should be routed to the boat capsule.

13 Routing Algorithm
Routing coefficient between capsule i and parent capsule j
Predicted next-layer output
Output of capsule j (parent)
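The equations behind these three labels did not survive on this slide. As a reference, the corresponding quantities in the original paper can be written as follows; the symbols ($b_{ij}$ for the routing logits, $s_j$ for the weighted sum) follow the paper's notation and are an assumption about what the slide displayed.

```latex
\begin{align}
\hat{u}_{j|i} &= W_{ij}\, u_i
  && \text{prediction of capsule } i \text{ for parent } j \\
c_{ij} &= \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}
  && \text{routing coefficient (softmax over parents)} \\
s_j &= \sum_i c_{ij}\, \hat{u}_{j|i}
  && \text{weighted sum of predictions} \\
v_j &= \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2}\,
       \frac{s_j}{\lVert s_j \rVert}
  && \text{output of parent capsule } j \\
b_{ij} &\leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j
  && \text{agreement update}
\end{align}
```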

14 Routing Weights

15 Compute Next Layer's Output

17 Update Routing Weights
Agreement! The scalar product between the prediction and the parent's output is large, so the routing weight increases.

18 Update Routing Weights
Disagreement! The scalar product is small, so the routing weight shrinks relative to the others.
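The compute and update steps can be sketched as a short loop. This is a minimal NumPy illustration of routing by agreement, not the presenters' implementation: the function names (`squash`, `softmax`, `route`), the three iterations, and the toy prediction vectors for the rectangle, triangle, house and boat capsules are assumptions made for the example.

```python
import numpy as np

def squash(s, eps=1e-9):
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """u_hat has shape (num_children, num_parents, dim): child i's prediction for parent j."""
    b = np.zeros(u_hat.shape[:2])                    # routing logits start at zero
    for _ in range(num_iters):
        c = softmax(b, axis=1)                       # each child splits its vote across parents
        s = np.sum(c[:, :, None] * u_hat, axis=0)    # weighted sum of predictions per parent
        v = squash(s)                                # parent outputs
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)  # scalar product = agreement
    return v, c

# Children: rectangle (0), triangle (1). Parents: house (0), boat (1).
# The two children disagree about the house pose but agree about the boat pose.
u_hat = np.array([[[ 1.0, 0.0], [0.0, 1.0]],
                  [[-1.0, 0.0], [0.0, 1.0]]])
v, c = route(u_hat)
print(np.linalg.norm(v, axis=-1))  # the boat capsule ends up with the longer output vector
print(c)                           # both children route mostly to the boat
```

Because the house predictions cancel out, their scalar products with the house output stay near zero, while the boat logits grow with every iteration.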

19 Routing Weights

22 The MNIST Dataset
70,000 handwritten digits
28x28 grayscale images
Digit classification (10 classes)

23 Architecture
Image: 28x28
Conv1: 256 kernels, 9x9 -> 256x20x20
Conv (stride 2): 256 kernels, 9x9 -> 256x6x6
Reshape -> Capsules: 32x8x6x6
Wij [8x16]
DigitCaps: 10x16
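The tensor sizes on this slide follow from standard convolution arithmetic. The sketch below checks them, assuming unpadded ("valid") convolutions, which is what makes 28 shrink to 20 and then to 6; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution."""
    return (size - kernel) // stride + 1

conv1_size = conv_out(28, 9)                       # 28x28 image, 9x9 kernel -> 20
primary_size = conv_out(conv1_size, 9, stride=2)   # 9x9 kernel, stride 2 -> 6

# 256 channels are reshaped into 32 capsule types of 8 dimensions each.
caps_types, caps_dim = 32, 8
assert caps_types * caps_dim == 256
num_primary_caps = caps_types * primary_size * primary_size

# Each primary capsule (8-D) is mapped by its own Wij [8x16] to each of the 10 DigitCaps (16-D).
num_weight_matrices = num_primary_caps * 10

print(conv1_size, primary_size, num_primary_caps, num_weight_matrices)
```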

29 Loss Function
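The formula on this slide was not preserved. The original paper trains the DigitCaps lengths with a margin loss, so the sketch below assumes that this is what the slide showed; the constants `m_plus=0.9`, `m_minus=0.1` and `lam=0.5` are the paper's values, and the three-class example is made up for brevity.

```python
import numpy as np

def margin_loss(lengths, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """lengths: capsule output lengths per class; labels: one-hot targets."""
    present = labels * np.maximum(0.0, m_plus - lengths) ** 2                # class present but vector too short
    absent = lam * (1.0 - labels) * np.maximum(0.0, lengths - m_minus) ** 2  # class absent but vector too long
    return np.sum(present + absent, axis=-1)

# Hypothetical example with 3 classes: the true class (index 0) has a long vector,
# but class 2 is also somewhat active and gets penalized.
lengths = np.array([0.95, 0.05, 0.30])
labels = np.array([1.0, 0.0, 0.0])
loss = margin_loss(lengths, labels)
print(loss)  # 0.5 * (0.30 - 0.10)**2, i.e. about 0.02
```

Each class is scored independently, which is what allows more than one digit capsule to be active at once.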

30 Reconstruction
A decoder is used to reconstruct the object from the capsule representation
Reconstruction loss: mean-squared error
Encourages capsules to encode the instantiation parameters of the input digit
[Figure: input digits and their reconstructions]
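A minimal sketch of the two pieces around the decoder: selecting the capsule to reconstruct from, and the mean-squared error. The decoder itself is left out; the function names and the random stand-in arrays are hypothetical and only illustrate the shapes.

```python
import numpy as np

def mask_digit_caps(digit_caps, digit):
    """Zero out every capsule except the chosen digit's, then flatten to the decoder input."""
    mask = np.zeros((digit_caps.shape[0], 1))
    mask[digit] = 1.0
    return (digit_caps * mask).reshape(-1)

def reconstruction_loss(image, reconstruction):
    """Mean-squared error between the input image and the decoder output."""
    return np.mean((image - reconstruction) ** 2)

rng = np.random.default_rng(0)
digit_caps = rng.normal(size=(10, 16))   # stand-in for the DigitCaps output (10x16)
decoder_input = mask_digit_caps(digit_caps, digit=3)

image = rng.random((28, 28))             # stand-in input image
reconstruction = np.zeros((28, 28))      # stand-in for what a decoder would return
loss = reconstruction_loss(image, reconstruction)
print(decoder_input.shape, loss)
```

Because only one capsule's 16 values reach the decoder, that capsule has to carry everything needed to redraw the digit.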

31 MNIST Result
Baseline: 35.4M parameters
CapsNet (with reconstruction): 8.2M parameters
CapsNet (without reconstruction): 6.8M parameters
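The CapsNet parameter counts can be roughly reproduced from the layer sizes on the Architecture slide. The sketch below makes several assumptions that the slides do not state: one bias per convolution channel, no bias in the Wij matrices, and a fully connected decoder with 512, 1024 and 784 units as described in the original paper.

```python
# Encoder: two 9x9 convolutions plus one 8x16 matrix per (primary capsule, digit capsule) pair.
conv1 = 9 * 9 * 1 * 256 + 256            # 1 input channel -> 256 channels
primary_caps = 9 * 9 * 256 * 256 + 256   # 256 -> 256 channels (32 capsule types x 8 dims)
num_primary = 32 * 6 * 6                 # 1152 primary capsules
digit_caps = num_primary * 10 * 8 * 16   # Wij [8x16] for each of the 10 DigitCaps
encoder = conv1 + primary_caps + digit_caps

# Decoder (assumed sizes): 10x16 masked capsule vector -> 512 -> 1024 -> 28x28 pixels.
sizes = [10 * 16, 512, 1024, 28 * 28]
decoder = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

total = encoder + decoder
reduction = 1 - total / 35.4e6           # relative to the 35.4M-parameter baseline

print(f"without reconstruction: {encoder / 1e6:.1f}M")
print(f"with reconstruction:    {total / 1e6:.1f}M")
print(f"fewer parameters than baseline: {reduction:.0%}")
```

Under these assumptions the totals round to 6.8M and 8.2M, matching the slide, and the reduction relative to the baseline comes out at about 77%.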

32 MNIST Results
l = label, p = prediction, r = reconstruction target
Predicted 3, reconstructed from 5
Predicted 3, reconstructed from 3

33 Capsule Interpretation

34 MNIST Results continued
MNIST data set with small random affine transformations.
Training data: expanded and translated MNIST dataset

Test set                 Traditional CNN   CapsuleNet
Expanded & Translated    99.22%            99.23%
Affine Transformation    66%               79%

35 MultiMNIST
Two digits fused together
The digits' bounding boxes overlap by 80% on average
Training size: 60M, testing size: 10M
Examples: 5,0  6,7  4,9

36 MultiMNIST Results
White = input
Red = digit 1 reconstruction
Green = digit 2 reconstruction
L:(l1,l2) = labels for digit 1 and digit 2
R:(r1,r2) = digits used for reconstruction
CNN: 8.5   Caps (1 itr): 7.1   Caps (3 itr): 5.2

38 Other Datasets
CIFAR10: 60000 32x32 colour images in 10 classes (airplane, bird, cat, deer, dog, frog, horse, etc.)
  Result: 10.6% error
  State of the art: ~2.5% error
smallNORB: images of 3D objects in 5 classes (50 toys photographed from different angles)
  Result: 2.7% error
  State of the art: 2.56% error
  CapsNet (2018): 1.4% error
SVHN: Street View House Numbers
  Result: 4.3% error
  State of the art: 1.69% error

39 Discussion
Pros:
  Requires less training data
  Position and pose are preserved (equivariance)
  Robust to affine transformations
  Activation vector is easy to interpret
  Fewer trainable parameters required (77% fewer for MNIST)
  Great for overlapping objects
  Good for dealing with segmentation
Cons:
  Computationally heavy
  CapsNet does not allow two instances of the same class at the same location
  Likes to account for everything in the image
  Requires a lot of further research

40 Summary
Capsule: a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity, such as an object or an object part
The vector parameters could be: rotation, position, size, texture
Dynamic routing sends information to the higher-layer capsules whose outputs agree with the lower layer's predictions
Achieves inverse rendering
Equivariance: keeps track of where the entity is in the image

41 Additional Information on CapsNet
[1] Awesome Capsule Networks.
[2] Capsule Networks (CapsNets) - Tutorial.
[3] Understanding Hinton's Capsule Networks. Part IV: CapsNet Architecture.
[4] Geoffrey Hinton talk "What is wrong with convolutional neural nets?"
