Unsupervised Deep Learning. James Hays slides from Carl Doersch and Richard Zhang

Size: px

Start display at page:

Download "Unsupervised Deep Learning. James Hays slides from Carl Doersch and Richard Zhang"

Coral Austin
6 years ago
Views:

1 Unsupervised Deep Learning James Hays slides from Carl Doersch and Richard Zhang

2 Recap from Previous Lecture We saw two strategies to get structured output while using deep learning With object detection, one strategy is brute force: detect everywhere at once

estimation / keypoint detection, the network produces an

3 Recap from Previous Lecture We saw two strategies to get structured output while using deep learning With pose estimation / keypoint detection, the network produces an image-based intermediate representation Part Detection Part Association

4 Recap from Previous Lecture More generally, it can pay off to get creative. Even if Deep ConvNets aren t a natural fit for an image-related task, they might be able to learn a subtask or create a useful intermediate representation.

5 Today s Lecture Two methods for unsupervised deep learning Context Prediction. Doersch et al. ICCV 2015 Colorful Image Colorization. Zhang et al. ECCV 2016 Big picture: do we need big datasets like ImageNet to make deep learning worthwhile? Can we learn from something else?

6 Unsupervised Visual Representation Learning by Context Prediction Carl Doersch Joint work with Alexei A. Efros & Abhinav Gupta ICCV 2015

7 ImageNet + Deep Learning Beagle - Image Retrieval - Detection (RCNN) - Segmentation (FCN) - Depth Estimation -

8 ImageNet + Deep Learning Beagle Materials? Pose? Do we Do even we need semantic this task? labels? Geometry? Parts? Boundaries?

9 Context as Supervision [Collobert & Weston 2008; Mikolov et al. 2013] Deep Net

10 Context Prediction for Images????? A??? B

11 Semantics from a non-semantic task

12 Relative Position Task 8 possible locations Classifier CNN CNN Randomly Sample Patch Sample Second Patch

13 Classifier Patch Embedding Input Nearest Neighbors CNN CNN Note: connects across instances!

14 Architecture Softmax loss Fully connected Fully connected Fully connected Max Pooling Convolution Convolution Convolution LRN Max Pooling Convolution LRN Max Pooling Convolution Patch 1 Tied Weights Fully connected Max Pooling Convolution Convolution Convolution LRN Max Pooling Convolution LRN Max Pooling Convolution Patch 2

15 Avoiding Trivial Shortcuts Include a gap Jitter the patch locations

16 A Not-So Trivial Shortcut CNN Position in Image

17 Chromatic Aberration

18 Chromatic Aberration CNN

19 What is learned? Input Ours Random Initialization ImageNet AlexNet

20 Still don t capture everything Input Ours Random Initialization ImageNet AlexNet You don t always need to learn! Input Ours Random Initialization ImageNet AlexNet

21 Mined from Pascal VOC2011

22 Pre-Training for R-CNN Pre-train on relative-position task, w/o labels [Girshick et al. 2014]

23 % Average Precision 54.2 VOC 2007 Performance (pretraining for R-CNN) No Rescaling Krähenbühl et al VGG + Krähenbühl et al. [Krähenbühl, Doersch, Donahue & Darrell, Data-dependent Initializations of CNNs, 2015] ImageNet Labels Ours No Pretraining

24 Capturing Geometry?

25 So, do we need semantic labels?

26 Self-Supervision and the Future Ego-Motion Video Similar [Agrawal et al. 2015; Jayaraman et al. 2015] [Wang et al. 2015; Srivastava et al 2015; ] Context CNN [Doersch et al. 2014; Pathak et al. 2015; Isola et al. 2015]

28 Colorful Image Colorization Richard Zhang, Phillip Isola, Alexei (Alyosha) Efros richzhang.github.io/colorization

29 Ansel Adams, Yosemite Valley Bridge

30 Ansel Adams, Yosemite Valley Bridge Our Result

31 Grayscale image: L channel Color information: ab channels L ab

32 Semantics? Higher-level Grayscale image: L channel abstraction? Concatenate (L,ab) L ab Free supervisory signal

33 Inherent Ambiguity Grayscale

34 Inherent Ambiguity Our Output Ground Truth

35 Better Loss Function Regression with L2 loss inadequate Colors in ab space (continuous) Use multinomial classification Class rebalancing to encourage learning of rare colors

36 Better Loss Function Regression with L2 loss inadequate Colors in ab space (discrete) Use multinomial classification Class rebalancing to encourage learning of rare colors

38 Better Loss Function Regression with L2 loss inadequate log 10 probability Histogram over ab space Use multinomial classification Class rebalancing to encourage learning of rare colors

39 Better Loss Function Regression with L2 loss inadequate log 10 probability Histogram over ab space Use multinomial classification Class rebalancing to encourage learning of rare colors

Deshpande et al. Cheng et al. In ICCV 2015.

40 Classification Parametric L2 Regression Nonparametric Hertzmann et al. In SIGGRAPH, Welsh et al. In TOG, Irony et al. In Eurographics, Liu et al. In TOG, Chia et al. In ACM Gupta et al. In ACM, Hand-engineered Features Deep Networks Deshpande et al. Cheng et al. In ICCV Dahl. Jan Iizuka et al. In SIGGRAPH, Charpiat et al. In ECCV Larsson et al. In ECCV [Concurrent]

41 Network Architecture lightness conv1 conv2 conv3 conv4 conv5 64 fc6 fc7 ab color L

42 Network Architecture lightness conv1 conv2 conv3 conv4 64 conv5 à trous [1]/dilated [2] conv6 à trous/dilated conv7 conv8 ab color L [1] Chen et al. In arxiv, [2] Yu and Koltun. In ICLR, 2016

43 Ground Input Truth L2 Regression Class w/ Rebalancing

44 Failure Cases

45 Biases

46 Evaluation Visual Quality Representation Learning Quantitative Qualitative Per-pixel accuracy Perceptual realism Semantic interpretability Low-level stimuli Legacy grayscale photos Task generalization ImageNet classification Task & dataset generalization PASCAL classification, detection, segmentation Hidden unit activations

47 Evaluation Visual Quality Representation Learning Quantitative Qualitative Per-pixel accuracy Perceptual realism Semantic interpretability Low-level stimuli Legacy grayscale photos Task generalization ImageNet classification Task & dataset generalization PASCAL classification, detection, segmentation Hidden unit activations

48 Evaluation Visual Quality Representation Learning Quantitative Qualitative Per-pixel accuracy Perceptual realism Semantic interpretability Low-level stimuli Legacy grayscale photos Task generalization ImageNet classification Task & dataset generalization PASCAL classification, detection, segmentation Hidden unit activations

49 Perceptual Realism / Amazon Mechanical Turk Test

51 clap if fake clap if fake

52 Fake, 0% fooled

54 clap if fake clap if fake

55 Fake, 55% fooled

57 clap if fake clap if fake

58 Fake, 58% fooled

59 from Reddit /u/sherysantucci

60 Recolorized by Reddit ColorizeBot

61 Photo taken by Reddit /u/timteroo, Mural from street artist Eduardo Kobra

62 Recolorized by Reddit ColorizeBot

63 Perceptual Realism Test 50% AMT Labeled Real [%] Ground Truth Random Larsson et al Ours (L2) Ours (class) Ours (full) images tested per algorithm %

64 Input Ground Truth Output

65 Predicting Labels from Data Supervised training Data x ImageNet images Learned feature hierarchy Label y ImageNet labels

66 Predicting Data from Data Supervised training Unsupervised/ Self-supervised training x 0 x 1 ImageNet images x 0 Learned feature hierarchy Learned feature hierarchy Label y ImageNet labels x 1

67 Autoencoders Hinton & Salakhutdinov. Science Co-Occurrence Denoising Autoencoders Vincent et al. ICML Egomotion Audio Oral O-1B-01 Yesterday 2 3 PM Owens et al. CVPR 2016, ECCV 2016 Isola et al. ICLR Workshop Agrawal et al. ICCV 2015 Jayaraman et al. ICCV 2015 Context Video Doersch et al. ICCV 2015 Pathak et al. CVPR 2016 Wang et al. ICCV 2015

68 Cross-Channel Encoder conv1 conv2 conv3 conv4 conv5 conv6 lightness à trous [1]/dilated [2] conv7 conv8 ab color L Hidden Unit Activations Zhou et al. In ICLR, [1] Chen et al. In arxiv, [2] Yu and Koltun. In ICLR, 2016

69 Task Generalization: ILSVRC linear classification lightness conv1 conv2 conv3 conv4 conv Class Supervision Are semantic classes linearly separable in the learned feature space?

70 Task Generalization: ILSVRC linear classification

71 Task Generalization: ILSVRC linear classification

72 Hidden Unit (conv5) Activations sky trees water

73 Hidden Unit (conv5) Activations faces dog faces flowers

74 Dataset & Task Generalization on PASCAL VOC Does the feature representation transfer to other datasets and tasks? Classification Krähenbühl et al. In ICLR, Detection Fast R-CNN. Girshick. In ICCV, Segmentation FCNs. Long et al. In CVPR, 2015.

75 % from Gaussian to ImageNet labels ImageNet Labels Dataset & Task Generalization on PASCAL VOC 100% Autoencoder Krähenbühl et al. Agrawal et al. Pathak et al. Wang & Gupta Doersch et al. Donahue et al. Ours Gaussian Initialization 0% Classification Detection Segmentation

76 Does the method work on legacy black and white photos?

77 Thylacine, Dr. David Fleay, extinct in 1936.

78 Thylacine, Dr. David Fleay, extinct in 1936.

79 Amateur Family Photo, 1956.

80 Amateur Family Photo, 1956.

81 Henri Cartier-Bresson, Sunday on the Banks of the River Seine, 1938.

82 Henri Cartier-Bresson, Sunday on the Banks of the River Seine, 1938.

83 Dorothea Lange, Migrant Mother, 1936.

84 Dorothea Lange, Migrant Mother, 1936.

85 Additional Information Demo Reddit ColorizeBot Type colorizebot under any image post Code Website full paper, user examples, visualizations

86 For the full paper, additional examples and our model: richzhang.github.io/colorization

Generative Networks. James Hays Computer Vision

Generative Networks. James Hays Computer Vision Generative Networks James Hays Computer Vision Interesting Illusion: Ames Window https://www.youtube.com/watch?v=ahjqe8eukhc https://en.wikipedia.org/wiki/ames_trapezoid Recap Unsupervised Learning Style