ECS 289H: Visual Recognition, Fall 2014. Yong Jae Lee, Department of Computer Science

1 ECS 289H: Visual Recognition Fall 2014 Yong Jae Lee Department of Computer Science

2 Plan for today Questions? Research overview

3 Standard supervised visual learning. Figure labels: annotators; category models (building, tree); novel images. Collecting the required number of training images can be costly. Assumes a closed-world setting where all categories are known.

4 Unsupervised visual discovery. Figure labels: visual world, discovered categories.

5 Unsupervised visual discovery. Figure labels: visual world, object segmentations in images and video.

6 Unsupervised visual discovery. Figure labels: visual world, storyboard visual summary (1:00 pm, 2:00 pm, 3:00 pm, 4:00 pm). No human explicitly guides the visual recognition process.

7 Why visual discovery? Exploring new environments.

8 Why visual discovery? Summarization (MSR SenseCam).

9 Why visual discovery? Figures quoted on the slide: 6 billion images; 70 billion images; 1 billion images served daily; 10 billion images; 100 hours uploaded per minute. From [...]: almost 90% of web traffic is visual, and most of it is unlabeled!

10 Inputs today: personal photo albums; movies, news, sports; surveillance and security; medical and scientific images. Understand and organize and index all this data! [Slide credit: Svetlana Lazebnik]

11 Let's first explore what we can do with big data!

12 Everyday use of big data: predictive text

13 Predictive drawing?

14 Video: ShadowDraw

15 Research goal: visual discovery. Figure labels: visual world, discovered categories.

16 Key challenges: simultaneously estimating segmentation and groups; unknown variability in appearance; what is the proper distance metric?

17 How similar are two pictures? CLIME vs. CRIME: Hamming distance of 1 letter. Two points in the (x, y) plane: Euclidean distance of 5 units. Two gray patches: gray-value distance of 50 values. Two pictures: distance = ? [Slide credit: Alyosha Efros]
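The three simple distances on this slide can be written out directly, which makes the contrast with the open question (a distance between two pictures) concrete. The sketch below is illustrative only: the point coordinates (0, 0) and (3, 4) and the gray values 100 and 150 are assumptions chosen to reproduce the slide's "5 units" and "50 values"; the slide does not give them.

```python
import math


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def euclidean_distance(p, q) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def gray_value_distance(g1: int, g2: int) -> int:
    """Absolute difference between two gray levels."""
    return abs(g1 - g2)


print(hamming_distance("CLIME", "CRIME"))      # 1
print(euclidean_distance((0, 0), (3, 4)))      # 5.0 (assumed coordinates)
print(gray_value_distance(100, 150))           # 50 (assumed gray values)
```

None of these carries over directly to whole pictures, which is the point the slide raises.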

18 How similar are two pictures?

19 Problem: clusters formed from full-image matches.

20 Mutual relationship between foreground features and clusters: if we have only foreground features, we can form good clusters. Figure panels: clusters formed from full-image matches; clusters formed from foreground matches.

21 Mutual relationship between foreground features and clusters: if we have good clusters, we can detect the foreground.

22 Our approach: update clusters based on weighted feature matches; refine feature weights given the current clusters. An unsupervised task that iteratively seeks the mutual support between discovered objects and their defining features. Plot axes: feature weights vs. feature index. [Lee & Grauman, Foreground Focus, IJCV 2009]
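The alternation described on this slide can be sketched as a short loop. This is only a control-flow skeleton under stated assumptions: `refine`, `cluster_fn`, `reweight_fn`, and `n_iters` are hypothetical names introduced here, and the actual matching, clustering, and weighting steps of the cited paper are not reproduced.

```python
def refine(images, weights, cluster_fn, reweight_fn, n_iters=3):
    """Alternate clustering and feature re-weighting for a fixed number of rounds.

    cluster_fn(images, weights) -> clusters      (hypothetical callable)
    reweight_fn(images, clusters) -> weights     (hypothetical callable)
    """
    clusters = None
    for _ in range(n_iters):
        # Step 1: update clusters based on weighted feature matches.
        clusters = cluster_fn(images, weights)
        # Step 2: refine feature weights given the current clusters.
        weights = reweight_fn(images, clusters)
    return clusters, weights
```

A caller would supply its own clustering and re-weighting functions; the fixed iteration count is a simplification, and a convergence test could replace it.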

23 Cluster and feature weight refinement, iteration 1: images as local feature sets; pair-wise matching; normalized cuts clustering; initial set of clusters. Plot axes: feature weights vs. feature index.

24 Cluster and feature weight refinement, iteration 1: compute feature weights; new feature weights. Plot axes: feature weights vs. feature index.

25 Cluster and feature weight refinement, iteration 2: new set of clusters; compute feature weights; new feature weights. Plot axes: feature weights vs. feature index.

26 Cluster and feature weight refinement, iteration 3: new feature weights; pair-wise matching + normalized cuts; final set of clusters. Plot axes: feature weights vs. feature index.

27 Quality of clusters formed. Black dotted lines indicate the best possible quality that could be obtained if the ground-truth segmentation were known.

28 Quality of foreground detection: 10-class subset, highly weighted features.

29 Shape: invariant to lighting conditions; relatively stable compared to intra-category appearance (texture, color) variations. Can we discover common object shapes within unlabeled multi-category collections of images?

30 Anchoring edge fragments to local patches. Even with accurate patch matches, there's a limit to how much shape information can be captured. By anchoring edge fragments to patch features, we can produce more reliable matches and describe the object's shape.

31 Foreground shape discovery: prototypical shape. Examples of discovered object contours (our shapes). [Lee & Grauman, Shape Discovery, CVPR 2009]

32 Works well for object-centric images. Complex images with multiple objects remain challenging.

33 Existing approaches: previous work treats unsupervised visual discovery as an appearance-grouping problem.

34 Our idea: how can seeing previously learned objects in novel images help to discover new categories?

35 Our idea: discover visual categories within unlabeled images by modeling interactions between the unfamiliar regions and familiar objects. [Lee & Grauman, Object-graphs, CVPR 2010]

36 Context-aware visual discovery. Figure: example scenes with familiar regions labeled (sky, house, grass, driveway, fence, truck) and unfamiliar regions marked "?". [Lee & Grauman, Object-graphs, CVPR 2010]

37 Steps: Learn Models, Detect Unknowns, Object-level Context, Discovery. Learn known categories (tree, building, sky, road). Offline: train region-based classifiers for N known categories using labeled training data.

38 Steps: Learn Models, Detect Unknowns, Object-level Context, Discovery. Identifying unknown regions. Input: unlabeled pool of novel images. Compute multiple segmentations for each unlabeled image.

39 Steps: Learn Models, Detect Unknowns, Object-level Context, Discovery. Identifying unknown regions. For each segment, compute the posterior P(class | region) over the known classes. Segments with a peaked posterior are predicted known; a segment with a high-entropy posterior is predicted unknown. Deem each segment as known or unknown based on the resulting entropy.
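A minimal sketch of the entropy test on this slide, assuming a fixed threshold. The threshold value (half the maximum entropy for four classes) and the two example posteriors are assumptions made for illustration; the slide does not state how the cut-off is chosen.

```python
import math


def entropy(posterior):
    """Shannon entropy (in nats) of a discrete class posterior."""
    return -sum(p * math.log(p) for p in posterior if p > 0)


def is_unknown(posterior, threshold):
    """Deem a segment unknown when its class posterior has high entropy."""
    return entropy(posterior) > threshold


# Hypothetical cut-off: half of the maximum entropy for 4 known classes.
THRESHOLD = 0.5 * math.log(4)

peaked = [0.94, 0.02, 0.02, 0.02]   # classifier is confident: known
flat = [0.25, 0.25, 0.25, 0.25]     # classifier is unsure: unknown

print(is_unknown(peaked, THRESHOLD))  # False
print(is_unknown(flat, THRESHOLD))    # True
```

A uniform posterior attains the maximum entropy, log N for N classes, so a threshold expressed as a fraction of log N stays comparable across different numbers of known categories.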

40 Steps: Learn Models, Detect Unknowns, Object-level Context, Discovery. Object-graphs: for an unknown region within an image, model the topology of category predictions relative to the unknown (unfamiliar) region.

41 Steps: Learn Models, Detect Unknowns, Object-level Context, Discovery. Object-graphs: for an unknown region S within an image, take the closest nodes in its object-graph (regions 1a, 2a, 3a above and 1b, 2b, 3b below). Consider spatially near regions above and below, and record distributions for each known class (b, t, s, r; building, tree, sky, road in the running example). The descriptor is $g(s) = [H_0(s), H_1(s), \ldots, H_R(s)]$, where $H_0(s)$ holds the class distribution of the region itself (node 0, self) and $H_r(s)$ holds the class distributions above (node ra) and below (node rb), from the 1st nearest region out to the Rth nearest.
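The descriptor on this slide can be sketched as a concatenation of class distributions. Two points here are assumptions rather than statements from the slide: that each H_r averages the posteriors of the r nearest regions on each side, and that a missing side is filled with zeros. The function and parameter names are hypothetical.

```python
def mean_rows(rows, n_classes):
    """Column-wise mean of a list of posteriors; zeros if the list is empty."""
    if not rows:
        return [0.0] * n_classes
    return [sum(col) / len(rows) for col in zip(*rows)]


def object_graph(self_post, above_posts, below_posts, R):
    """Build g(s) = [H_0(s), H_1(s), ..., H_R(s)] as one flat list.

    self_post:   posterior of the unknown region over the N known classes
    above_posts: posteriors of regions above it, nearest first
    below_posts: posteriors of regions below it, nearest first
    """
    n = len(self_post)
    descriptor = list(self_post)                      # H_0(s): self
    for r in range(1, R + 1):
        # H_r(s): (assumed) average over the r nearest regions above, then below
        descriptor += mean_rows(above_posts[:r], n)
        descriptor += mean_rows(below_posts[:r], n)
    return descriptor


# Classes in order (building, tree, sky, road); toy posteriors.
above = [[1, 0, 0, 0], [0, 1, 0, 0]]
below = [[0, 0, 1, 0], [0, 0, 0, 1]]
g = object_graph([0.25, 0.25, 0.25, 0.25], above, below, R=2)
print(len(g))  # 20 = N * (2R + 1) with N = 4, R = 2
```

Two unknown regions can then be compared through their descriptors, so regions that sit in similar surroundings of familiar objects end up close even when their own appearance differs.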

42 Example object-graphs. Legend: unknown, building, sky, road. Colors indicate the predicted known category (max posterior).

43 Steps: Learn Models, Detect Unknowns, Object-level Context, Discovery. Clusters from region-region affinities among unknown regions. Object-level context provides more robust affinities.

44 Results: object discovery accuracy on MSRC-v2, PASCAL 2008, MSRC-v0, and Corel.

45 Example discoveries

46 Context-aware face discovery. Figure: photos in which familiar people (Kate, David) appear alongside an unnamed face ("name?"). The system can suggest novel people to name based on their appearance and co-occurrence with familiar people. [Lee & Grauman, Face discovery, BMVC 2011]

47 Results: context-aware face discovery. Dataset: Gallagher, Friends, Buffy; 12,542 images, 8,452 faces, and 23 unique people. Two splits: 8 unknowns and 15 unknowns. Figure columns: discovered face, co-occurring faces. [Lee & Grauman, Face discovery, BMVC 2011]

48 Self-paced discovery. Previous work treats unsupervised visual discovery as a one-pass batch procedure (traditional batch k-way).

49 Self-paced discovery. Focus on the easier instances first, and gradually discover new models of increasing complexity (single easiest; ours). [Lee & Grauman, Self-paced discovery, CVPR 2011]

50 Steps: Initialize Stuff, Detect Easy Instances, Discover New Category, Expand Knowledge. Identify easy objects: Objectness (Obj) + Context-Awareness (CA) = Easiness (ES). Figure panels also show the Familiarity Map (F). Obj: how well a window contains any generic object. CA: how well surrounding regions resemble familiar categories.
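Read literally, the slide combines the two cues by addition and then favors the highest-scoring windows. The sketch below follows that reading; the unweighted sum, the window tuples, and the scores are assumptions for illustration, and `easiest_windows` is a hypothetical helper.

```python
def easiness(objectness: float, context_awareness: float) -> float:
    """ES = Obj + CA (an unweighted sum is assumed here)."""
    return objectness + context_awareness


def easiest_windows(windows, k):
    """Return ids of the k easiest windows.

    windows: list of (window_id, obj_score, ca_score) tuples.
    """
    ranked = sorted(windows, key=lambda w: easiness(w[1], w[2]), reverse=True)
    return [w[0] for w in ranked[:k]]


# Toy scores: "a" is object-like and in familiar surroundings.
windows = [("a", 0.9, 0.8), ("b", 0.2, 0.9), ("c", 0.7, 0.1)]
print(easiest_windows(windows, 2))  # ['a', 'b']
```

Discovering from only the top-ranked windows first is what makes the procedure self-paced: harder windows are deferred until more categories have become familiar.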

51 Steps: Initialize Stuff, Detect Easy Instances, Discover New Category, Expand Knowledge. Identify easy objects.

52 Object Discovery Accuracy

53 Unsupervised visual discovery. Figure labels: visual world, object segmentations in images and video.

54 Collect-Cut: unsupervised segmentation examples. Figure columns: unlabeled images; discovered ensemble from unlabeled multi-object images; Collect-Cut (ours); best bottom-up (with multi-segs). [Lee & Grauman, Collect-Cut, CVPR 2010]

55 Problem: video object segmentation. How can we segment the foreground objects in video when the background is moving and changing, and the categories of foreground objects are unknown in advance? Input: unannotated video. Desired output: segmentation of high-ranking foreground object. Existing methods group pixels using low-level features, which can result in over-segmentation. [Brendel & Todorovic 2009, Vazquez-Reina et al. 2010, Grundmann et al. 2010, Brox & Malik 2010]

56 Key-segment discovery: discover a set of object-like key-segments for category-independent video object segmentation. Resist over-segmentation by detecting regions with object-like appearance and motion. [Lee, Kim, Grauman, Key-segments, ICCV 2011]

57 Key-segment discovery: 1) Find object-like regions using appearance and motion cues. 2) Group regions across video to discover key-segment hypotheses. 3) Rank hypotheses and build segmentation models for each hypothesis. 4) For a given hypothesis, segment the corresponding foreground object using the models. Figure labels: color model, shape model, output segmentation. [Lee, Kim, Grauman, Key-segments, ICCV 2011]

58 Results: key-segment video segmentation. Detects and segments people and discovered important objects without category-specific models. Succeeds in spite of moving camera, background changes, and low resolution.

59 Results: key-segment video segmentation. Figure: Grundmann et al. vs. ours. Resists over-segmentation by detecting regions with object-like appearance and motion.

60 Results: key-segment video segmentation, segmentation error rate. Background subtraction falls apart. Ours produces state-of-the-art results even when compared to supervised methods. [29]: Tsai et al., BMVC 2010; [7]: Chockalingam et al., ICCV

61 Unsupervised visual discovery. Figure labels: visual world, storyboard visual summary (1:00 pm, 2:00 pm, 3:00 pm, 4:00 pm).

62 Mining first-person camera data. Devices: GoPro, Google Glass, Looxcie, Tobii, SMI, Pivothead.

63 Mining first-person camera data. '90s: Steve Mann, life logger.

64 Problem: summarizing egocentric videos. Input: egocentric video of the camera wearer's day, recorded with a wearable camera (9:00 am, 10:00 am, 11:00 am, 12:00 pm, 1:00 pm, 2:00 pm). Output: storyboard summary of discovered important people and objects. [Lee, Ghosh, Grauman, Egocentric video summarization, CVPR 2012]

65 Important person/object discovery: discover important people and objects for egocentric video summarization. Important: things with which the camera wearer has significant interaction. [Lee, Ghosh, Grauman, Egocentric video summarization, CVPR 2012]

66 Steps: Collect training data, Learn Importance, Segment video into events, Discover important regions, Storyboard summary. Data collection: 15 fps, 320 x 480 resolution; 10 videos, 3-5 hrs in length, total of 37 hrs; four subjects (one undergraduate, two grad students, and one office worker).

67 Steps: Collect training data, Learn Importance, Segment video into events, Discover important regions, Storyboard summary. Learning region importance. Egocentric features: distance to hand, distance to frame center, frequency.

68 Steps: Collect training data, Learn Importance, Segment video into events, Discover important regions, Storyboard summary. Learning region importance. Egocentric features: distance to hand, distance to frame center, frequency. Object features: object-like appearance and motion (candidate region's appearance and motion vs. surrounding area's appearance and motion). Region features: size, width, height, centroid. Overlap with face detection.

69 Steps: Collect training data, Learn Importance, Segment video into events, Discover important regions, Storyboard summary. Learning region importance: a regressor to learn and predict a region's degree of importance. Equation annotations: importance I(r), learned parameters, i-th feature value x_i(r). Expect significant interactions between the features; e.g., a region near the hand is important only if it is object-like in appearance. For training: [...]. For testing: predict I(r) given the x_i(r)'s.
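One common way to let a linear regressor capture feature interactions is to add pairwise product terms, giving a model of the form I(r) = b_0 + sum_i b_i x_i(r) + sum_{i<j} b_ij x_i(r) x_j(r). Whether this matches the equation on the original slide is an assumption; the sketch below only shows the feature expansion and the prediction step, with hypothetical names and hand-set parameters in place of learned ones.

```python
from itertools import combinations


def expand(x):
    """Map [x_1..x_n] to [1, x_1..x_n, x_i * x_j for all i < j]."""
    return [1.0] + list(x) + [a * b for a, b in combinations(x, 2)]


def predict_importance(x, beta):
    """Predicted importance: dot product of parameters and expanded features."""
    feats = expand(x)
    if len(feats) != len(beta):
        raise ValueError("beta must have one entry per expanded feature")
    return sum(b * f for b, f in zip(beta, feats))


# Features: (near_hand, object_like). Hand-set parameters (not learned) that
# put all the weight on the interaction term.
beta = [0.0, 0.0, 0.0, 1.0]
print(predict_importance([1.0, 1.0], beta))  # 1.0: near the hand and object-like
print(predict_importance([1.0, 0.0], beta))  # 0.0: near the hand but not object-like
```

With the weight on the product term, being near the hand contributes nothing unless the region is also object-like, which is the interaction the slide describes. In practice the parameters would be fit by least squares on the expanded features.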

70 Results: important region prediction, good predictions. Methods compared: ours; object-like [Carreira, 2010]; object-like [Endres, 2010]; saliency [Walther, 2006].

71 Results: important region prediction, failure cases. Methods compared: ours; object-like [Carreira, 2010]; object-like [Endres, 2010]; saliency [Walther, 2006].

72 Steps: Collect training data, Learn Importance, Segment video into events, Discover important regions, Storyboard summary. Generating a storyboard summary: display event boundaries and frames of the selected important people and objects. Figure labels: Event 1, Event 2, Event 3, Event 3, Event 4.

73 Results: egocentric video summarization. Original video (3 hours) vs. our summary (12 frames).

74 Results: egocentric video summarization

75 Fine-grained recognition

76 Video: AverageExplorer

78 Coming up. Sign up for papers. Next class: "Object Recognition from Local Scale-Invariant Features," D. Lowe, ICCV; "Video Google: A Text Retrieval Approach to Object Matching in Videos," J. Sivic and A. Zisserman, ICCV. Read both papers. Write a review for one of them.
