Context. CS 554 Computer Vision Pinar Duygulu Bilkent University. (Source:Antonio Torralba, James Hays)

Similar documents
Contexts and 3D Scenes

Contexts and 3D Scenes

CS 558: Computer Vision 13 th Set of Notes

Closing the Loop in Scene Interpretation

Using the Forest to See the Trees: Context-based Object Recognition

Can Similar Scenes help Surface Layout Estimation?

Window based detectors

Automatic Photo Popup

Local cues and global constraints in image understanding

Data-driven Depth Inference from a Single Still Image

IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues

ELL 788 Computational Perception & Cognition July November 2015

Single-view 3D Reconstruction

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Course Administration

1 Learning Spatial Context: 2 Using Stuff to Find Things

3D Spatial Layout Propagation in a Video Sequence

Discriminative classifiers for image recognition

CS294 43: Recognition in Context. Spring April il14 th, 2009

Overview of section. Lecture 10 Discriminative models. Discriminative methods. Discriminative vs. generative. Discriminative methods.

Texton Clustering for Local Classification using Scene-Context Scale

Visual localization using global visual features and vanishing points

Correcting User Guided Image Segmentation

Attributes and More Crowdsourcing

Recovering the Spatial Layout of Cluttered Rooms

Supervised learning. y = f(x) function

Object recognition (part 2)

Context by Region Ancestry

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering

Decomposing a Scene into Geometric and Semantically Consistent Regions

Attributes. Computer Vision. James Hays. Many slides from Derek Hoiem

Support surfaces prediction for indoor scene understanding

Using the forest to see the trees: exploiting context for visual object detection and localization

CS4495/6495 Introduction to Computer Vision. 8C-L1 Classification: Discriminative models

Supervised learning. y = f(x) function

A Hierarchical Conditional Random Field Model for Labeling and Segmenting Images of Street Scenes

24 hours of Photo Sharing. installation by Erik Kessels

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Single-view 3D reasoning

Segmentation. Bottom up Segmentation Semantic Segmentation

What are we trying to achieve? Why are we doing this? What do we learn from past history? What will we talk about today?

Single Image Super-resolution. Slides from Libin Geoffrey Sun and James Hays

Object Detection by 3D Aspectlets and Occlusion Reasoning

Does everyone have an override code?

Region-based Segmentation and Object Detection

Viewpoint Invariant Features from Single Images Using 3D Geometry

Object Category Detection: Sliding Windows

Semantic Segmentation of Street-Side Images

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space.

Previously. Window-based models for generic object detection 4/11/2011

Deformable Part Models

Detection III: Analyzing and Debugging Detection Methods

3D layout propagation to improve object recognition in egocentric videos

DEPT: Depth Estimation by Parameter Transfer for Single Still Images

Generic Object-Face detection

Object Recognition and Segmentation in Indoor Scenes from RGB-D Images

Context Based Object Categorization: A Critical Survey

Shadows in the graphics pipeline

Object detection as supervised classification

Image Analysis. Window-based face detection: The Viola-Jones algorithm. iphoto decides that this is a face. It can be trained to recognize pets!

Probabilistic Graphical Models Part III: Example Applications

CS 1674: Intro to Computer Vision. Attributes. Prof. Adriana Kovashka University of Pittsburgh November 2, 2016

CS5670: Intro to Computer Vision

3dRR: Representation and Recognition

Bayes Risk. Classifiers for Recognition Reading: Chapter 22 (skip 22.3) Discriminative vs Generative Models. Loss functions in classifiers

Local Features and Bag of Words Models

Modeling Image Context using Object Centered Grid

Object Detection System

Understanding Faces. Detection, Recognition, and. Transformation of Faces 12/5/17

Object Detection Using Cascaded Classification Models

Classifiers for Recognition Reading: Chapter 22 (skip 22.3)

Single-view metrology

Unsupervised discovery of category and object models. The task

Context Based Object Categorization: A Critical Survey

Today. Introduction to recognition Alignment based approaches 11/4/2008

Interpreting 3D Scenes Using Object-level Constraints

SuperParsing: Scalable Nonparametric Image Parsing with Superpixels

Combining Semantic Scene Priors and Haze Removal for Single Image Depth Estimation

Contextual Classification with Functional Max-Margin Markov Networks

DEPT: Depth Estimation by Parameter Transfer for Single Still Images

Recognizing and Learning Object Categories. Li Fei-Fei, Stanford Rob Fergus, NYU Antonio Torralba, MIT

From 3D Scene Geometry to Human Workspace

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

CEng Computational Vision

Computer Vision: Summary and Discussion

Face Detection. Raghuraman Gopalan AT&T Labs-Research, Middletown NJ USA

Region-based Segmentation and Object Detection

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

CV as making bank. Intel buys Mobileye! $15 billion. Mobileye:

Scenes vs. Objects: a Comparative Study of Two Approaches to Context Based Recognition

A Framework for Encoding Object-level Image Priors

Scene Modeling for a Single View

Filters (cont.) CS 554 Computer Vision Pinar Duygulu Bilkent University

Categorization by Learning and Combining Object Parts

Local Image Features

Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

Unsupervised Patch-based Context from Millions of Images

Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics

Opportunities of Scale

Transcription:

Context CS 554 Computer Vision Pinar Duygulu Bilkent University (Source:Antonio Torralba, James Hays)

A computer vision goal Recognize many different objects under many viewing conditions in unconstrained settings. OFFICE Bookshelf Screen Desk 2

Why is this hard? y x 10,000 patches/object/image Plus, we want to do this for ~ 1000 objects 1,000,000 images/day time 3

The face detection age The representation and matching of pictorial structures Fischler, Elschlager (1973). Face recognition using eigenfaces M. Turk and A. Pentland (1991). Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995) Graded Learning for Object Detection - Fleuret, Geman (1999) Robust Real-time Object Detection - Viola, Jones (2001) Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001). 4

Head in the coffee beans problem Can you find the head in this image? 5

Head in the coffee beans problem Can you find the head in this image? 6

Context in Recognition Objects usually are surrounded by a scene that can provide context in the form of nearby objects, surfaces, scene category, geometry, etc.

Context provides clues for function What is this? Now can you tell?

Sometimes context is the major component of recognition What is this?

Sometimes context is the major component of recognition What is this? Now can you tell?

More Low-Res What are these blobs?

More Low-Res The same pixels! (a car)

Some symptoms of standard approaches 13

Just objects is not enough The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object is it trying to detect? 14

What object is detector trying to detect? The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object is it trying to detect? 15

1. chair, 2. table, 3. road, 4. road, 5. table, 6. car, 7. keyboard. 16

What are the hidden objects? 1 2 17

What are the hidden objects? Chance ~ 1/30000 18

Biederman 1982 Pictures shown for 150 ms. Objects in appropriate context were detected more accurately than objects in an inappropriate context. Scene consistency affects object detection. 19

Look-Alikes by Joan Steiner Even in high resolution, we can not shut down contextual processing and it is hard to recognize the true identities of the elements that compose this scene. 20

Look-Alikes by Joan Steiner 21

The multiple personalities of a blob 22

The multiple personalities of a blob 23

Recognition with low resolution none not sure bedroom bedroom 12x12 16x16 24x24 32x32 bookcase closet window bed skyscraper poster window poster skyscraper bed man bed leg shoes step ladder Humans can perceive the Gist of the scene (scene category + 4/5 objects) with a resolution of 32x32 pixels. 24 Torralba. Visual Neuroscience. 2009

Disambiguation 25

Disambiguation 26

Disambiguation 27

Why is context important? Changes the interpretation of an object (or its function) Context defines what an unexpected event is 28

Global precedence Forest Before Trees: The Precedence of Global Features in Visual Perception Navon (1977) 29

p(o I) ap(i O) p(o) Object model Context model office Full joint Scene model p(o) = S Pp(Oi S=s) p(s=s) s i street Approx. joint 30

Context models object1 object2 object3 Independent model scene object2 object1 object2 object3 object1 object3 Objects are correlated via the scene Dependencies among objects 31

Context models object1 object2 object3 Independent model scene object2 object1 object2 object3 object1 object3 Objects are correlated via the scene Dependencies among objects 32

Scene recognition without object recognition Scene S g Scene features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010. 33

Application of object detection for image retrieval Results using the keyboard detector alone

The system does not care about the scene, but we do We know there is a keyboard present in this scene even if we cannot see it clearly. We know there is no keyboard present in this scene even if there is one indeed.

An integrated model of Scenes, Objects, and Parts Scene S 0.2 0.15 0.1 0.05 0.8 0.6 0.4 0.2 01 0 1 N car P(N car S = street) 5 0 0 5 10 15 P(N car S = park) 5 N N 0 0 5 10 15 g Scene gist features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010. 36

Object retrieval: scene features vs. detector Results using the keyboard detector alone Results using both the detector and the global scene features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010. 37

Context driven object detection Scene S 0.2 0.15 0.1 0.05 N car P(N car S = street) 0 1 5 Z car N 0 0 5 10 15 g Scene gist features 38

The layered structure of scenes Assuming a human observer standing on the ground p(x) p(x 2 x 1 ) In a display with multiple targets present, the location of one target constraints the y coordinate of the remaining targets, but not the x coordinate. Torralba, Oliva, Castelhano, Henderson. 2006

Car detection without a car detector 40

screens keyboard car pedestrian 41

Detecting faces without a face detector Torralba & Sinha, 01; Torralba, 0342

Context driven object detection Scene S 0.2 0.15 0.1 0.05 N car P(N car S = street) 0 1 5 Z car N 0 0 5 10 15 g Scene gist features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010. 43

An integrated model of Scenes, Objects, and Parts Scene S N car Z car car F i g d car i x car i M=4 Scene gist features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010. 44

Failures If the detector fails context can not help 46

Failures If the detector fails context can not help If the detector produces a contextually coherent false alarm, context will increase the error. 47

A car out of context 48

Who needs context anyway? We can recognize objects even out of context Banksy 49

Context models object1 object2 object3 Independent model scene object2 object1 object2 object3 object1 object3 Objects are correlated via the scene Dependencies among objects

Building, cat car, table cow, building road, river 1) Generate candidate objects (run a detector, or segmentation) M possible object labels N regions Label: c k = [1 M] with k = [1 N] Scores: s k = vector length M 2) For each candidate, get a list of possible interpretations with their probabilities p(c k = m s k ) 3) Goal: to assign labels c k to each candidate so that they are in contextual agreement. We want to optimize the joint probability of all the labels: p(c 1 = m 1,, c N = m N s 1,, s N ) 51 A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007

F(c i =m i, c j =m j ) = co-ocurrence matrix on training set (count how many times two objects appear together). A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007 52

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007 53

Spatial layout is especially important 1. Context for recognition

Spatial layout is especially important 1. Context for recognition Slide: Derek Hoiem

Spatial layout is especially important 1. Context for recognition 2. Scene understanding Slide: Derek Hoiem

Spatial layout is especially important 1. Context for recognition 2. Scene understanding 3. Many direct applications a) Assisted driving b) Robot navigation/interaction c) 2D to 3D conversion for 3D TV d) Object insertion Slide: Derek Hoiem

Spatial Layout: 2D vs. 3D Slide: Derek Hoiem

Context in Image Space [Torralba Murphy Freeman 2004] 59 [Kumar Hebert 2005] [He Zemel Cerreira-Perpiñán 2004] Slide: Derek Hoiem

But object relations are in 3D Close Not Close Slide: Derek Hoiem

How to represent scene space? Slide: Derek Hoiem

Wide variety of possible representations Figs from Hoiem/Savarese Draft

Figs from Hoiem/Savarese Draft

Figs from Hoiem/Savarese Draft

Low detail, Low abstraction Holistic Scene Space: Gist Torralba & Oliva 2002 Oliva & Torralba 2001 Slide: Derek Hoiem

High detail, Low abstraction Depth Map Saxena, Chung & Ng 2005, 2007 Slide: Derek Hoiem

Medium detail, High abstraction Room as a Box Hedau Hoiem Forsyth 2009 Slide: Derek Hoiem

Surface Layout: describe 3D surfaces with geometric classes Sky Non-Planar Porous Vertical Non-Planar Solid Support Planar (Left/Center/Right) Slide: Derek Hoiem

Geometry estimation as recognition Region Features Color Texture Perspective Position Surface Geometry Classifier Vertical, Planar Training Data Slide: Derek Hoiem

Use a variety of image cues Color, texture, image location Vanishing points, lines Texture gradient Slide: Derek Hoiem

Surface Layout Algorithm Input Image Segmentation Features Perspective Color Texture Position Surface Labels Trained Region Classifier Training Data Hoiem Efros Hebert (2007)

Input Image Surface Layout Algorithm Multiple Segmentations Features Perspective Color Texture Position Confidence-Weighted Predictions Final Surface Labels Trained Region Classifier Training Data Hoiem Efros Hebert (2007)

Surface Description Result Slide: Derek Hoiem

Automatic Photo Popup Labeled Image Fit Ground-Vertical Boundary with Line Segments Form Segments into Polylines Cut and Fold Final Pop-up Model [Hoiem Efros Hebert 2005]

Automatic Photo Pop-up Slide: Derek Hoiem

What about more organized but complex spaces? Other excellent works include: Saxena Sun Ng (2009) Lee Kanade Hebert (2009) Gupta Efros Hebert (2010) Slide: Derek Hoiem

The room as a box Hedau Hoiem Forsyth (2009)

Recovering the box layout Surface Label Confidences Vertical VP Left Wall Right Wall Floor Objects Joint Inference Detected Edges + Vanishing Points Most Likely Geometry Hypothesized Boxes

Estimate room s physical space from one image Estimated Box Geometry + Object Pixels 3D Reconstruction + Estimated Occupied Volume

Detecting 3D bed positions in an image 2D Bed Detection 3D Bed Detection with Scene Geometry Hedau Hoiem Forsyth (2010)

Searching for beds in room coordinates Recover Room Coordinates Rectify Features to Room Coordinates Rectified Sliding Windows

3D bed detection from an image

Reason about 3D room and bed space Joint Inference with Priors Beds close to walls Beds within room Consistent bed/wall size Two objects cannot occupy the same space Hedau Hoiem Forsyth (2010)

Depth Estimates from an Image Saxena et al. 2005, 2008

Depth from Image: approach 1. Divide image into superpixels 2. Compute features for each superpixel Position, color, texture, shape 3. Predict 3D plane parameters for each superpixel using features 4. Estimate confidence in prediction using features 5. Global inference, incorporating constraints of connected structure, co-planarity, co-linearity Saxena et al. 2008

Depth Estimates from an Image Saxena et al. 2005, 2008

Depth Estimates from an Image Saxena et al. 2008

Depth from Image: Reconstructions Input Novel View Saxena et al. 2008

Things to remember Objects should be interpreted in the context of the surrounding scene Many types of context to consider Spatial layout is an important part of scene interpretation, but many open problems How to represent space? How to learn and infer spatial models? Consider trade-offs of detail vs. accuracy and abstraction vs. quantification