Unsupervised Learning of Spatiotemporally Coherent Metrics

1 Unsupervised Learning of Spatiotemporally Coherent Metrics
Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun (arXiv)
Presented by Jackie Chu

2 Contributions
- Insight into the connection between slow feature learning and metric learning
- Temporal coherence
- An auto-encoder to define a semantically coherent metric

4 Contributions: Food for thought
- Supervised methods - improved accuracy (with labels)
- Weakly-supervised methods - more scalable (with queries)
- Unsupervised methods - no labels = endless possibilities?!
- Exploit video data: 300 hours of video per minute, n frames per video = a lot of data

5 Slowness: What is it?
"For example, when looking at a picture on a computer screen, we see the objects that are present on it and their relative position in the image, rather than the color of the individual pixels... An important idea in the field is that objects in the world have common structure, which results in statistical regularities in the sensory input. Using these regularities as a guide, the brain is able to form a meaningful representation of its environment."
Laurenz Wiskott et al. (2011), Scholarpedia, 6(4):5282

9 Slowness: The gist
Overview
- Slow Feature Analysis (SFA) - an unsupervised learning algorithm
- Takes fine (quickly varying, sensitive) signals and extracts coarser (gradually varying) features
Laurenz Wiskott et al. (2011), Scholarpedia, 6(4):5282
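To make "fine versus gradual" concrete, the sketch below scores how fast a signal changes from one sample to the next. It is not SFA itself, only the kind of quantity a slow-feature method tries to keep small; the function name `slowness` and the toy sine signals are assumptions made for illustration.

```python
import math

def slowness(signal):
    """Mean squared difference between consecutive samples (lower means slower)."""
    diffs = [(b - a) ** 2 for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)

# Hypothetical toy input: one slow sine cycle plus a small, fast oscillation.
steps = 200
slow_part = [math.sin(2 * math.pi * t / steps) for t in range(steps)]
fast_part = [0.3 * math.sin(2 * math.pi * 25 * t / steps) for t in range(steps)]
raw = [s + f for s, f in zip(slow_part, fast_part)]

slow_score = slowness(slow_part)  # score of the gradual component alone
raw_score = slowness(raw)         # score of the fine, quickly varying signal
```

The gradual component scores far lower than the raw signal, which is the property a slow feature is meant to have.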

13 Slowness: The gist
Challenges
- A non-linear optimization problem
- Maximize slowness over time while computing the features instantaneously from the input
- Input can be high-dimensional signals
- Avoid the trivial (constant) solution, which is uninformative

14 Slowness: Previous Work
Pros
- Slow feature learning: perfectly slow features
- Metric learning: dimensionality reduction
Cons
- Avoiding the trivial solution in these methods imposes limitations: the features are not discriminative
- No evaluation of the trade-offs

15 Contributions: Pros
Alleviate constraints by:
- Leveraging CNNs, which are locally translation-invariant, for slow feature learning
- Building nontrivial convolutional dictionaries
- Using a reconstruction criterion to make sparse inferences, then maximizing slowness
- Discriminative power with stability

16 Dictionaries: Discriminative
[Figure, left to right: (1) DrLIM, (2) Sparsity, (3) Group Sparsity, (4) Slowness + Sparsity (their method)]

17 Dictionaries: Discriminative
[Figure: DrLIM vs. paper's method]

18 Slowness: A toy example
- 90 degrees in one-degree increments over two axes (yaw, roll): 8100 samples
- 96x96 images: 9216 dimensions
- Mapping: ℝ^9216 → ℝ^2
- Color code: blue = 0, pink = 90
[Figures: embedding before training and after training; axes: yaw, roll]
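The numbers on this slide can be checked with a few lines of arithmetic. The sketch assumes each axis takes 90 integer angles (0 through 89); the exact angle range is not given on the slide, so `angles_per_axis` and `poses` are illustrative names only.

```python
# Assumed: 90 one-degree steps per axis, two axes (yaw and roll).
angles_per_axis = 90
poses = [(yaw, roll)
         for yaw in range(angles_per_axis)
         for roll in range(angles_per_axis)]
num_samples = len(poses)      # one image per (yaw, roll) pair

# Each 96x96 image flattens to one input vector.
image_side = 96
input_dim = image_side ** 2   # dimension of the input space

# The two-dimensional target space has one coordinate per rotation axis.
feature_dim = 2
```

So the toy problem is a map from a 9216-dimensional pixel space to a 2-dimensional space, learned from 8100 samples.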

20 Slowness: As metric learning
Varying slowly: [equation not recovered; symbols lost in extraction]
- Total number of frames
- Feature vector
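One common way to write a "varying slowly" criterion is as a sum of distances between the feature vectors of consecutive frames. The form below is a sketch under assumptions: the symbols $z_t$ (feature vector of frame $t$) and $T$ (total number of frames) are introduced here to match the two legend entries, and the choice of norm is left open.

```latex
\[
  L_{\text{slow}} \;=\; \sum_{t=2}^{T} \bigl\lVert z_t - z_{t-1} \bigr\rVert
\]
```

Minimizing this alone is satisfied by a constant $z_t$, which is why a second term is needed to rule out the trivial solution.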

21 Slowness: As metric learning
Avoiding trivial solution: [equation not recovered; symbol lost in extraction]
- units in feature space

22 Slowness: As metric learning
Varying slowly + avoiding trivial solution = [equation not recovered; symbols lost in extraction]
- mapping of input to feature space
- trainable coefficients
- temporal samples

23 Slowness: As metric learning
BUT: not discriminative.
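A margin-based contrastive loss in the style of DrLIM is one standard way to combine "vary slowly" with "avoid the trivial solution". The sketch below is an illustration under assumptions, not a transcription of the slide: the names `contrastive_loss` and `margin`, and the rule that frames at most one step apart count as temporal neighbors, are choices made here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(z_a, z_b, t_a, t_b, margin=1.0):
    """Pull temporal neighbors together, push other pairs at least `margin` apart."""
    d = euclidean(z_a, z_b)
    if abs(t_a - t_b) <= 1:
        return d                   # neighbors: penalize any distance
    return max(0.0, margin - d)    # non-neighbors: penalize only if too close

near = contrastive_loss([0.0, 0.0], [0.3, 0.4], t_a=10, t_b=11)
far_but_close = contrastive_loss([0.0, 0.0], [0.3, 0.4], t_a=10, t_b=50)
far_and_apart = contrastive_loss([0.0, 0.0], [3.0, 4.0], t_a=10, t_b=50)
collapsed = contrastive_loss([0.0, 0.0], [0.0, 0.0], t_a=10, t_b=50)
```

A mapping that collapses every frame to the same point pays the full margin on every non-neighbor pair, so the constant solution is no longer optimal.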

24 Slowness: Pooling auto-encoders
Varying slowly + avoiding trivial solution = avoiding degeneracy
- Leverage sparse auto-encoders
- Penalize reconstruction error, which helps preserve the input
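As a rough illustration of a sparse auto-encoder objective, the sketch below encodes an input with a ReLU, decodes it linearly, and adds an L1 penalty on the code to the squared reconstruction error. The weights, the omission of bias terms, and the names `w_enc`, `w_dec` and `alpha` are assumptions for this toy example.

```python
def relu(v):
    """Element-wise rectifier."""
    return [max(0.0, x) for x in v]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sparse_autoencoder_loss(x, w_enc, w_dec, alpha):
    """Squared reconstruction error plus alpha times the L1 norm of the code."""
    h = relu(matvec(w_enc, x))       # sparse code
    x_hat = matvec(w_dec, h)         # reconstruction of the input
    recon = sum((a - b) ** 2 for a, b in zip(x_hat, x))
    sparsity = sum(abs(a) for a in h)
    return recon + alpha * sparsity, h

# Hypothetical 2-D input with a 4-unit code: one unit per signed direction.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
w_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
loss, code = sparse_autoencoder_loss([1.0, -2.0], w_enc, w_dec, alpha=0.1)
```

Here the input is reconstructed exactly, so the whole loss comes from the sparsity term. The reconstruction term is what stops the code from collapsing to zero.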

26 Slowness: Pooling auto-encoders (+ penalization) (+ discriminative)
Varying slowly + avoiding trivial solution = avoiding degeneracy
- Capture local deformations on hidden activations
- Slowness as metric learning

27 Slowness: Pooling auto-encoders (+ penalization) (+ discriminative)
- Group sparsity penalty
- Promote sparse activations for learning sparse inference

28 Slowness: Pooling auto-encoders (+ penalization) (+ discriminative)
Penalty parameters (symbols lost in extraction):
- locality
- slowness
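The three ingredients named on these slides (reconstruction error, a sparsity penalty, and slowness measured on pooled codes) could be combined for a pair of frames $x^{t}$, $x^{t'}$ roughly as follows. This is a sketch under assumptions: the symbols $W_d$ (decoder), $h^{\tau}$ (code of frame $\tau$), $\alpha$, $\beta$, and the $K$ pooling groups $P_i$ are introduced here because the slide's own symbols were lost, and the paper should be consulted for the exact form.

```latex
\[
  L \;=\; \sum_{\tau \in \{t,\,t'\}}
      \Bigl( \bigl\lVert W_d h^{\tau} - x^{\tau} \bigr\rVert_2^{2}
             + \alpha \bigl\lVert h^{\tau} \bigr\rVert_1 \Bigr)
  \;+\; \beta \sum_{i=1}^{K}
      \Bigl\lvert \bigl\lVert h^{t}_{P_i} \bigr\rVert_2
                - \bigl\lVert h^{t'}_{P_i} \bigr\rVert_2 \Bigr\rvert
\]
```

In this form $\alpha$ weights the sparsity term and $\beta$ the slowness term; setting $\beta = 0$ leaves a plain sparse auto-encoder.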

29 Dictionaries: Discriminative
[Figure: sparsity only (slowness weight = 0) vs. paper's method (slowness weight = 2)]

30 Slowness: Siamese Architecture

31 Slowness: Siamese Architecture
Feed a pair of frames to the encoding dictionaries

32 Slowness: Siamese Architecture
ReLU on hidden representations of the analysis dictionary

33 Slowness: Siamese Architecture
Reconstruction with sparsity (ℓ1) penalty

34 FCN
[Figure: panels without the ℓ1 penalty (ℓ1 penalty = 0) and with it (ℓ1 penalty > 0)]

35 Slowness: Siamese Architecture
Pool slowness features from sparse codes
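To illustrate pooling slow features from sparse codes, the sketch below takes the L2 norm over groups of code units and compares the pooled values of two frames. Non-overlapping groups of fixed size and the names `l2_pool` and `slowness_penalty` are assumptions made for this example.

```python
import math

def l2_pool(h, group_size):
    """L2 norm of each consecutive, non-overlapping group of code units."""
    return [math.sqrt(sum(v * v for v in h[i:i + group_size]))
            for i in range(0, len(h), group_size)]

def slowness_penalty(h_t, h_next, group_size):
    """L1 distance between the pooled codes of two frames."""
    p_t = l2_pool(h_t, group_size)
    p_next = l2_pool(h_next, group_size)
    return sum(abs(a - b) for a, b in zip(p_t, p_next))

# Hypothetical codes for two adjacent frames: activity moves within each group.
h_t = [3.0, 0.0, 0.0, 1.0]
h_next = [0.0, 3.0, 1.0, 0.0]

pooled_t = l2_pool(h_t, group_size=2)
penalty = slowness_penalty(h_t, h_next, group_size=2)
raw_change = sum(abs(a - b) for a, b in zip(h_t, h_next))
```

The raw codes differ substantially between the two frames, yet the pooled features are identical. The sparse code can therefore stay discriminative while the pooled output varies slowly.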

36 CNN
- Spatiotemporally coherent features, attributed to pooling layers
- Although more redundant, leads to a simpler sparsity problem solved by stochastic gradient descent

37 Experiments
- Evaluate the relationship between slowness and metric learning
- Given a query frame, find its nearest neighbors and compare them to the ground-truth neighbors
[Figure: query results vs. ground truth. Laurenz Wiskott et al. (2011), Scholarpedia, 6(4):5282]

38 Experiments
- Introduce temporal coherence as a metric to best find temporal neighbors
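Nearest-neighbor retrieval for a query frame can be sketched as a sort by distance in whatever space is being evaluated. The feature vectors below are made-up 2-D points, and `nearest_neighbors` is a hypothetical helper; a real evaluation would use learned features and Euclidean distance is an assumption.

```python
import math

def nearest_neighbors(query, features, k):
    """Indices of the k feature vectors closest to the query (Euclidean distance)."""
    order = sorted(range(len(features)),
                   key=lambda i: math.dist(query, features[i]))
    return order[:k]

# Hypothetical feature vectors for five frames.
features = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.5], [6.0, 5.0]]
neighbors = nearest_neighbors([0.0, 0.1], features, k=3)
```

Comparing the returned indices with the frames that are truly adjacent in time indicates how well distances in that space reflect temporal coherence.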

39 Experiments: Collecting YouTube Data
- Automatic segmentation (segments 2-40 frames long)
[Figure: an example of incorrect segmentation]

40 Experiments: Transfer Learning
- Temporal coherence can be weak if all scenes are different
- Instead, evaluated on the CIFAR-10 dataset for classification tasks

41 Experiments: Evaluations
- Evaluated in feature space
- Trained greedily or jointly
- 5:1 ratio of negative to positive samples in each mini-batch
- Precision - proportion of nearest neighbors with the same label
- Recall - proportion of frames recalled for that query
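The two metrics can be sketched for a single query as follows. The helper `precision_recall` and the index sets are hypothetical; "relevant" stands for whatever the ground truth defines as a correct neighbor (same label, or same temporal segment).

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant items."""
    retrieved = list(retrieved)
    relevant = set(relevant)
    hits = sum(1 for i in retrieved if i in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 neighbors retrieved, 4 frames are truly relevant.
precision, recall = precision_recall(retrieved=[0, 3, 1, 7],
                                     relevant=[0, 1, 2, 5])
```

Two of the four retrieved frames are relevant, and two of the four relevant frames were found. Sweeping the number of retrieved neighbors traces out a precision-recall curve.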

42 Experiments: Evaluations
1. DrLIM
2. Group Sparsity
3. Paper's model

43 Experiments: YouTube
Visualizing nearest-neighbor results in different spaces:
- Pixel-wise (raw, ZCA)
- Feature (methods)

44 Experiments: YouTube
[Figure: nearest neighbors of a query in three spaces: raw input; paper's method, layer 2; DrLIM, layer 2]

45 Experiments: CIFAR-10
Transfer learning with a classification task in different spaces:
- Pixel-wise (raw, ZCA)
- Feature (methods)

46 Experiments: CIFAR-10
[Figure: nearest neighbors of a query in three spaces: raw input; paper's method, layer 2; DrLIM, layer 2]

47 Results: Temporal Precision and Recall
- Winner: DrLIM
- Minimizes error in terms of distance across feature spaces

48 Results: Classification Precision
- Winner: paper's method
- Best semantically, and comparable to a supervised method

49 Conclusions
- Insight into the connection between slow feature learning and metric learning: perfect slowness is not the best
- Temporal coherence: a weakly-supervised metric to fine-tune performance
- An encoder to define a semantically coherent metric: stable and discriminative
