ECS 289H: Visual Recognition Fall 2014 Yong Jae Lee Department of Computer Science

Plan for today: Questions? Research overview

Standard supervised visual learning. [Figure: annotators; example labels (building, tree); category models; novel images.] Collecting the required number of training images can be costly. Assumes a closed-world setting where all categories are known.

Unsupervised visual discovery. [Figure: visual world; discovered categories.]

Unsupervised visual discovery. [Figure: visual world; object segmentations in images and video.]

Unsupervised visual discovery. [Figure: visual world; storyboard visual summary at 1:00 pm, 2:00 pm, 3:00 pm, 4:00 pm.] No human to explicitly guide the visual recognition process.

Why visual discovery? Exploring new environments.

Why visual discovery? Summarization (MSR SenseCam).

Why visual discovery? [Figure: 6 billion images; 70 billion images; 1 billion images served daily; 10 billion images; 100 hours uploaded per minute.] From : Almost 90% of web traffic is visual! Most of it is unlabeled!

Inputs today: personal photo albums; movies, news, sports; surveillance and security; medical and scientific images. Understand and organize and index all this data! (Svetlana Lazebnik)

Let's first explore what we can do with big data!

Everyday use of big data: Predictive text

Predictive drawing?

[Video: ShadowDraw]

Research goal: Visual discovery. [Figure: visual world; discovered categories.]

Key challenges: Simultaneously estimate segmentation and groups. Unknown variability in appearance. What is the proper distance metric?

How similar are two pictures? CLIME - CRIME = Hamming distance of 1 letter. Two points in the (x, y) plane: Euclidean distance of 5 units. Two gray values: gray-value distance of 50 values. Two pictures: ? (Alyosha Efros)
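
The first three distances on this slide are easy to compute; the open question is the last one. A minimal Python sketch follows. The point coordinates and gray values are made-up numbers, chosen only so that the results match the slide's "5 units" and "50 values".

```python
import math


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def euclidean(p, q) -> float:
    """Straight-line distance between two points of the same dimension."""
    return math.dist(p, q)


def gray_distance(g1: int, g2: int) -> int:
    """Absolute difference between two gray values."""
    return abs(g1 - g2)


word_dist = hamming("CLIME", "CRIME")             # one differing letter
point_dist = euclidean((0.0, 0.0), (3.0, 4.0))    # assumed example points
gray_dist = gray_distance(100, 150)               # assumed example gray values
print(word_dist, point_dist, gray_dist)
```

None of these carries over directly to whole pictures, which is why the choice of distance metric is listed among the key challenges.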

How similar are two pictures? [Figure: two images compared.]

Problem: Clusters formed from full image matches.

Mutual Relationship between Foreground Features and Clusters: If we have only foreground features, we can form good clusters. [Figure: clusters formed from full image matches; clusters formed from foreground matches.]

Mutual Relationship between Foreground Features and Clusters: If we have good clusters, we can detect the foreground.

Our Approach: Update clusters based on weighted feature matches. Refine feature weights given current clusters. An unsupervised task that iteratively seeks the mutual support between discovered objects and their defining features. [Plot: feature weights vs. feature index.] [Lee & Grauman, Foreground Focus, IJCV 2009]

Cluster and Feature Weight Refinement: Iteration 1. Images as Local Feature Sets → Pair-wise Matching → Normalized Cuts Clustering → Initial Set of Clusters. [Plot: feature weights vs. feature index.]

Cluster and Feature Weight Refinement: Iteration 1. Compute Feature Weights → New Feature Weights. [Plot: feature weights vs. feature index.]

Cluster and Feature Weight Refinement: Iteration 2. New Set of Clusters → Compute Feature Weights → New Feature Weights. [Plot: feature weights vs. feature index.]

Cluster and Feature Weight Refinement: Iteration 3. New Feature Weights → Pair-wise Matching + Normalized Cuts → Final Set of Clusters. [Plot: feature weights vs. feature index.]
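
The alternation between clustering and feature reweighting can be illustrated on toy data. The sketch below is not the Foreground Focus method: it swaps in a small weighted k-means for pair-wise set matching plus normalized cuts, and it uses a simple "variance explained by the clusters" score as the weight update. All function names and the synthetic data (two informative "foreground" dimensions, eight noisy "background" dimensions) are assumptions made for illustration.

```python
import numpy as np


def weighted_kmeans(X, w, k, n_iter=20):
    """Tiny k-means under a feature-weighted squared Euclidean distance."""
    # Deterministic farthest-point initialisation.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2 * w).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def refine_weights(X, labels, eps=1e-9):
    """Weight each feature by the share of its variance explained by the clusters."""
    total_var = X.var(axis=0) + eps
    within_var = np.zeros(X.shape[1])
    for j in np.unique(labels):
        members = X[labels == j]
        within_var += members.var(axis=0) * len(members) / len(X)
    w = np.clip(1.0 - within_var / total_var, 0.0, None) + eps
    return w / w.sum()


def discover(X, k, n_rounds=3):
    """Alternate clustering and reweighting, starting from uniform weights."""
    w = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_rounds):
        labels = weighted_kmeans(X, w, k)   # update clusters given weights
        w = refine_weights(X, labels)       # refine weights given clusters
    return labels, w


# Synthetic data: 2 "foreground" dimensions separate the groups, 8 are noise.
rng = np.random.default_rng(0)
n = 20
fg = np.vstack([rng.normal(0.0, 0.3, (n, 2)), rng.normal(4.0, 0.3, (n, 2))])
bg = rng.normal(0.0, 0.5, (2 * n, 8))
X = np.hstack([fg, bg])
labels, weights = discover(X, k=2)
print(np.round(weights, 3))
```

In this toy setting the weights concentrate on the two informative dimensions after a few rounds, which mirrors the intended mutual support: better clusters give better feature weights, and better weights give better clusters.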

Quality of Clusters Formed. [Plot.] Black dotted lines indicate the best possible quality that could be obtained if the ground-truth segmentation were known.

Quality of Foreground Detection (10-class subset): highly weighted features.

Shape: Invariant to lighting conditions. Relatively stable compared to intra-category appearance (texture, color) variations. Can we discover common object shapes within unlabeled multi-category collections of images?

Anchoring Edge Fragments to Local Patches: Even with accurate patch matches, there's a limit to how much shape information can be captured. By anchoring edge fragments to patch features, we can produce more reliable matches and describe the object's shape.

Foreground Shape Discovery: Prototypical Shape. Examples of discovered object contours. [Figure: our shapes.] [Lee & Grauman, Shape Discovery, CVPR 2009]

Works well for object-centric images. Complex images with multiple objects remain challenging.

Existing approaches: Previous work treats unsupervised visual discovery as an appearance-grouping problem.

Our idea: How can seeing previously learned objects in novel images help to discover new categories?

Our idea: Discover visual categories within unlabeled images by modeling interactions between the unfamiliar regions and familiar objects. [Lee & Grauman, Object-graphs, CVPR 2010]

Context-aware visual discovery. [Figure: example images whose regions are labeled with known categories (sky, driveway, house, grass, truck, fence) or marked "?".] [Lee & Grauman, Object-graphs, CVPR 2010]

Learn Models → Detect Unknowns → Object-level Context → Discovery. Learn known categories (tree, building, sky, road). Offline: Train region-based classifiers for N known categories using labeled training data.

Identifying unknown regions. Input: unlabeled pool of novel images. Compute multiple segmentations for each unlabeled image.

Identifying unknown regions. [Figure: P(class | region) for four example regions; the high-entropy region is predicted unknown, the other three are predicted known.] Deem each segment as known or unknown based on the resulting entropy.
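
The known/unknown decision can be sketched as an entropy test on each region's class posterior. This is a minimal sketch: the threshold (half of the maximum entropy for N classes) and the example posteriors are assumptions, not values from the slides.

```python
import numpy as np


def posterior_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a class posterior P(class | region)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                      # make sure it sums to one
    return float(-(p * np.log(p + eps)).sum())


def is_unknown(p, threshold):
    """A region is deemed unknown when its posterior is too uncertain."""
    return bool(posterior_entropy(p) > threshold)


N = 4                                    # number of known categories
threshold = 0.5 * np.log(N)              # hypothetical cut-off
confident = [0.94, 0.02, 0.02, 0.02]     # peaked posterior: looks familiar
ambiguous = [0.25, 0.25, 0.25, 0.25]     # flat posterior: no class explains it
print(is_unknown(confident, threshold), is_unknown(ambiguous, threshold))
```

A peaked posterior has low entropy and is treated as a known category; a flat posterior reaches the maximum entropy log(N) and is flagged as unknown.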

Object-graphs. [Figure: an unknown region within an image.] Model the topology of category predictions relative to the unknown (unfamiliar) region.

Object-graphs. [Figure: an unknown region s within an image and the closest nodes in its object-graph: 0 (self), 1a, 2a, 3a above and 1b, 2b, 3b below.] Consider spatially near regions above and below, and record distributions for each known class (b, t, s, r). g(s) = [H_0(s), H_1(s), ..., H_R(s)], where H_0(s) covers the region itself (0, self), H_1(s) covers the 1st nearest regions (1a above, 1b below), and so on out to the R-th nearest (Ra above, Rb below) in H_R(s).
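
One way to assemble such a descriptor is sketched below. The slide does not spell out how each H_r(s) is computed, so this sketch makes an assumption: H_0 is the region's own class posterior, and each H_r (r >= 1) stores the average posterior over the r nearest regions above followed by the average over the r nearest regions below. The function name and the example posteriors are hypothetical.

```python
import numpy as np


def object_graph_descriptor(self_post, above_posts, below_posts, R):
    """Concatenate [H_0, H_1, ..., H_R] for one unknown region.

    self_post:   length-N posterior of the region itself.
    above_posts: posteriors of regions above, ordered nearest first (>= 1 row).
    below_posts: posteriors of regions below, ordered nearest first (>= 1 row).
    """
    above = np.asarray(above_posts, dtype=float)
    below = np.asarray(below_posts, dtype=float)
    parts = [np.asarray(self_post, dtype=float)]      # H_0: self
    for r in range(1, R + 1):
        parts.append(above[:r].mean(axis=0))          # H_r, above part
        parts.append(below[:r].mean(axis=0))          # H_r, below part
    return np.concatenate(parts)


# Class order (b, t, s, r) = (building, tree, sky, road); values are made up.
self_post = [0.25, 0.25, 0.25, 0.25]
above = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
below = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
g = object_graph_descriptor(self_post, above, below, R=2)
print(g.shape, g)
```

Under this assumption the descriptor has N * (1 + 2R) entries, and two unknown regions can be compared by comparing their descriptors, so regions with similar surroundings end up close together.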

Example object-graphs. [Legend: unknown, building, sky, road.] Colors indicate the predicted known category (max posterior).

Clusters from region-region affinities. [Figure: unknown regions.] Object-level context provides more robust affinities.

Results: object discovery accuracy on MSRC-v2, PASCAL 2008, MSRC-v0, and Corel.

Example discoveries.

Context-aware face discovery. [Figure: photos with familiar faces labeled Kate and David and an unfamiliar face marked "name?".] System can suggest novel people to name based on their appearance and co-occurrence with familiar people. [Lee & Grauman, Face discovery, BMVC 2011]

Results: Context-aware face discovery. Datasets: Gallagher, Friends, Buffy. 12,542 images, 8,452 faces, and 23 unique people. Two splits: 8 unknowns and 15 unknowns. [Figure: discovered faces (labels 2, 12) and co-occurring faces (labels 3, 7).] [Lee & Grauman, Face discovery, BMVC 2011]

Self-paced discovery: Previous work treats unsupervised visual discovery as a one-pass batch procedure. [Figure: traditional batch k-way.]

Self-paced discovery: Focus on the easier instances first, and gradually discover new models of increasing complexity. [Figure: single easiest (ours).] [Lee & Grauman, Self-paced discovery, CVPR 2011]

Initialize Stuff → Detect Easy Instances → Discover New Category → Expand Knowledge. Identify easy objects: Objectness (Obj) + Context-Awareness (CA) = Easiness (ES). [Figure: familiarity map (F).] Obj: how well a window contains any generic object. CA: how well surrounding regions resemble familiar categories.
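
A rough sketch of ranking candidate windows by easiness follows. It assumes, as the "+" on the slide suggests, that easiness is the plain sum of the objectness and context-awareness scores, and that both scores are already scaled to [0, 1]. The function name and the score values are made up.

```python
import numpy as np


def easiest_windows(objectness, context_awareness, top_k=1):
    """Rank windows by ES = Obj + CA and return the top_k indices and all scores."""
    obj = np.asarray(objectness, dtype=float)
    ca = np.asarray(context_awareness, dtype=float)
    easiness = obj + ca
    order = np.argsort(-easiness)[:top_k]   # highest easiness first
    return order, easiness


obj = [0.9, 0.4, 0.7]   # hypothetical objectness per window
ca = [0.2, 0.3, 0.8]    # hypothetical context-awareness per window
order, easiness = easiest_windows(obj, ca, top_k=2)
print(order, easiness)
```

Here the third window wins: it is both object-like and surrounded by familiar regions, whereas the first window is object-like but sits in an unfamiliar context.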

Identify Easy Objects. [Figure.]

Object Discovery Accuracy. [Plot.]

Unsupervised visual discovery. [Figure: visual world; object segmentations in images and video.]

Collect-Cut: Unsupervised Segmentation Examples. [Figure panels: discovered ensemble from unlabeled multi-object images; unlabeled images; Collect-Cut (ours); best bottom-up (with multi-segs).] [Lee & Grauman, Collect-Cut, CVPR 2010]

Problem: Video object segmentation. How can we segment the foreground objects in video when the background is moving and changing, and the categories of foreground objects are unknown in advance? Input: unannotated video. Desired output: segmentation of the high-ranking foreground object. Existing methods group pixels using low-level features, which can result in over-segmentation. [Brendel & Todorovic 2009, Vazquez-Reina et al. 2010, Grundmann et al. 2010, Brox & Malik 2010]

Key-segment discovery: Discover a set of object-like key-segments for category-independent video object segmentation. Resist over-segmentation by detecting regions with object-like appearance and motion. [Lee, Kim, Grauman, Key-segments, ICCV 2011]

Key-segment discovery: 1) Find object-like regions using appearance and motion cues. 2) Group regions across video to discover key-segment hypotheses. 3) Rank hypotheses and build segmentation models for each hypothesis. 4) For a given hypothesis, segment the corresponding foreground object using the models. [Figure: color model; shape model; output segmentation.] [Lee, Kim, Grauman, Key-segments, ICCV 2011]

Results: Key-segment video segmentation. Detect and segment people and discovered important objects without category-specific models. Success in spite of moving camera, background changes, and low resolution.

Results: Key-segment video segmentation. [Figure: Grundmann et al. 2010 vs. ours.] Resists over-segmentation by detecting regions with object-like appearance and motion.

Results: Key-segment video segmentation. [Table: segmentation error rate.] Background subtraction falls apart. Ours produces state-of-the-art results even when compared to supervised methods. [29]: Tsai et al. BMVC 2010, [7]: Chockalingam et al. ICCV 2009

Unsupervised visual discovery. [Figure: visual world; storyboard visual summary at 1:00 pm, 2:00 pm, 3:00 pm, 4:00 pm.]

Mining first-person camera data: GoPro, Google Glass, Looxcie, Tobii, SMI, Pivothead.

Mining first-person camera data: '90s, Steve Mann, life logger.

Problem: Summarizing egocentric videos. Input: egocentric video of the camera wearer's day from a wearable camera (9:00 am, 10:00 am, 11:00 am, 12:00 pm, 1:00 pm, 2:00 pm). Output: storyboard summary of discovered important people and objects. [Lee, Ghosh, Grauman, Egocentric video summarization, CVPR 2012]

Important person/object discovery: Discover important people and objects for egocentric video summarization. Important: things with which the camera wearer has significant interaction. [Lee, Ghosh, Grauman, Egocentric video summarization, CVPR 2012]

Collect training data → Learn Importance → Segment video into events → Discover important regions → Storyboard summary. Data collection: 15 fps, 320 x 480 resolution. 10 videos, 3-5 hrs in length; total of 37 hrs. Four subjects: one undergraduate, two grad students, and one office worker.

Learning region importance. Egocentric features: distance to hand, distance to frame center, frequency.

Learning region importance. Egocentric features: distance to hand, distance to frame center, frequency. Object features: object-like appearance, motion ([figure] candidate region's appearance, motion; [figure] surrounding area's appearance, motion). Region features: size, width, height, centroid; overlap w/ face detection.

Learning region importance. [Equation annotations: importance; learned parameters; i-th feature value.] Regressor to learn and predict a region's degree of importance. Expect significant interactions between the features; e.g., a region near the hand is important only if it is object-like in appearance. For training: [not recovered]. For testing: predict I(r) given the x_i(r)'s.
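
The exact form of the regressor is not preserved in this transcript. The sketch below assumes one plausible form, a linear model with pairwise feature-interaction terms fit by ordinary least squares, to show why interactions matter. The two features ("near hand", "object-like"), the synthetic targets, and all function names are hypothetical.

```python
import itertools

import numpy as np


def design_matrix(X):
    """Columns: bias, each feature x_i, and each pairwise product x_i * x_j (i < j)."""
    X = np.asarray(X, dtype=float)
    pairs = itertools.combinations(range(X.shape[1]), 2)
    cross = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([np.ones(len(X)), X] + cross)


def fit_importance(X, y):
    """Least-squares fit of the assumed importance model."""
    beta, *_ = np.linalg.lstsq(design_matrix(X), np.asarray(y, dtype=float), rcond=None)
    return beta


def predict_importance(X, beta):
    """Predict I(r) for each row of feature values x_i(r)."""
    return design_matrix(X) @ beta


# Synthetic training data: a region matters only if it is BOTH near the hand
# and object-like, so the target is the product of the two features.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(50, 2))   # columns: near_hand, object_like
y_train = X_train[:, 0] * X_train[:, 1]
beta = fit_importance(X_train, y_train)

X_test = np.array([[1.0, 1.0],    # near hand and object-like
                   [1.0, 0.0]])   # near hand but not object-like
pred = predict_importance(X_test, beta)
print(np.round(pred, 3))
```

A model without the cross term could not represent "important only if both hold"; with it, the second test region scores near zero even though it is near the hand.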

Results: Important region prediction (good predictions). Methods compared: ours; object-like [Carreira, 2010]; object-like [Endres, 2010]; saliency [Walther, 2006].

Results: Important region prediction (failure cases). Methods compared: ours; object-like [Carreira, 2010]; object-like [Endres, 2010]; saliency [Walther, 2006].

Generating a storyboard summary. [Figure: Event 1, Event 2, Event 3, Event 3, Event 4.] Display event boundaries and frames of the selected important people and objects.

Results: Egocentric video summarization. Original video (3 hours) → our summary (12 frames).

Results: Egocentric video summarization.

Fine-grained recognition.

[Video: AverageExplorer]

Coming up: Sign up for papers. Next class: "Object Recognition from Local Scale-Invariant Features," D. Lowe, ICCV 1999; "Video Google: A Text Retrieval Approach to Object Matching in Videos," J. Sivic and A. Zisserman, ICCV 2003. Read both papers. Write a review for one of them.