Local cues and global constraints in image understanding

Local cues and global constraints in image understanding. Olga Barinova, Lomonosov Moscow State University. *Many slides adapted from the courses of Anton Konushin

Image understanding. «To see means to know what is where by looking» (David Marr, Vision, 1982). We see a scene; the computer sees an array of pixel values. Slide credit: S. Narasimhan

Image understanding. 6th Microsoft PhD Summer School, Cambridge, UK, 27 June – 1 July 2011. Slide credit: Fei-Fei, Fergus & Torralba

Image understanding. Scene interpretation: outdoors, urban street, Beijing, China. Slide credit: Fei-Fei, Fergus & Torralba

Image understanding. Object detection: building, flag, text, face, bus. Slide credit: Fei-Fei, Fergus & Torralba

Image understanding. Extracting details about the objects: blue, inclined, moving, Mao. Slide credit: Fei-Fei, Fergus & Torralba

Why is image understanding difficult? Dimensionality reduction: the 3D world is projected onto a 2D image from a single point of observation. What do we lose in perspective projection? Angles, distances and lengths. Figures: Stephen E. Palmer, 2002
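A minimal numpy sketch (not from the slides) illustrating the loss: under pinhole projection, two 3D segments of equal length map to image segments whose lengths depend on depth, so distances and lengths are not preserved. The focal length and coordinates are arbitrary illustrative values.

```python
import numpy as np

def project(points_3d, f=1.0):
    """Pinhole projection of 3D points (camera at the origin, looking along +Z)."""
    points_3d = np.asarray(points_3d, dtype=float)
    return f * points_3d[:, :2] / points_3d[:, 2:3]

# Two segments of identical 3D length (1 unit), at depths 2 and 10.
near = project([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
far = project([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0]])

print(np.linalg.norm(near[1] - near[0]))  # 0.5: image length depends on depth
print(np.linalg.norm(far[1] - far[0]))    # 0.1: equal 3D lengths project unequally
```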

Why is image understanding difficult? Variability: interclass variability of appearance

Why is image understanding difficult? Variability: different viewpoints Michelangelo 1475-1564 slide credit: Fei-Fei, Fergus & Torralba

Why is image understanding difficult? Variability: different lighting image credit: J. Koenderink

Why is image understanding difficult? Variability: deformations and occlusions Xu, Beihong 1943 Slide credit: Fei-Fei, Fergus & Torralba

Reducing variability by using local cues Motivation: stitching panoramas Find distinctive points Find an alignment that matches these points
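A hedged OpenCV sketch of these two steps, assuming two overlapping views stored in the placeholder files left.jpg and right.jpg; ORB keypoints and a RANSAC homography are used as one common choice of "distinctive points" and "alignment", not necessarily the exact method behind the slide.

```python
import cv2
import numpy as np

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Find distinctive points and their descriptors.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 2. Match descriptors and keep the strongest matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

# 3. Estimate an alignment (homography) consistent with the matched points.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
```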

Reducing variability by using local cues Motivation: stereo matching

Reducing variability by using local cues. Motivation: image retrieval, object detection. Slide credit: S. Lazebnik

Learning local models from local cues. Local features and descriptors. Feature detectors: Harris-Laplace, LoG, DoG, dense sampling. Descriptors: SIFT, shape context, HOG, pixel comparison.
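A small Python sketch (image.jpg is a placeholder path) computing two of the listed descriptors: sparse SIFT keypoints and descriptors via OpenCV, and a dense HOG descriptor via scikit-image. The parameter values are common defaults, not prescriptions.

```python
import cv2
from skimage.feature import hog

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file name

# Sparse local features: DoG keypoint detector + SIFT descriptor (128-D each).
sift = cv2.SIFT_create()
keypoints, sift_descriptors = sift.detectAndCompute(img, None)

# Dense description: HOG over the whole image (Dalal-Triggs-style cells and blocks).
hog_descriptor = hog(img, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), feature_vector=True)
```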

Learning local models from local cues Combining different descriptors

Learning local models from local cues. Learning models from the data: data → learned model → output of the model on an unseen data instance. Learning methods: SVM, boosting, random forests.

Learning local models from local cues. Example: object detection using a sliding window. 'Local' has been the dominant paradigm in computer vision until the 2000s. It works remarkably well for detection of rigid objects, e.g. faces [Viola & Jones, 2001] and pedestrians [Dalal & Triggs, 2005].
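A minimal sliding-window sketch in the spirit of this paradigm, assuming a hypothetical pre-trained linear classifier clf (e.g. an sklearn LinearSVC) scoring HOG features of a fixed-size window; the image pyramid and non-maximum suppression are omitted for brevity.

```python
import numpy as np
from skimage.feature import hog

def sliding_window_detect(image, clf, window=(128, 64), step=8, threshold=0.0):
    """Score every window with a classifier over HOG features (sketch only)."""
    win_h, win_w = window
    detections = []
    for y in range(0, image.shape[0] - win_h + 1, step):
        for x in range(0, image.shape[1] - win_w + 1, step):
            patch = image[y:y + win_h, x:x + win_w]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))
            score = clf.decision_function(feat.reshape(1, -1))[0]
            if score > threshold:
                detections.append((x, y, win_w, win_h, float(score)))
    return detections  # image pyramid and non-maximum suppression omitted
```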

Learning local models from local cues. Let's have a closer look at the results: true detections, false detections, missed objects. Slide credit: Alyosha Efros. Local detector: [Dalal & Triggs, 2005]

Learning local models from local cues What the detector sees Slide credit: Alyosha Efros

Learning local models from local cues Local ambiguity Slide credit: Fei-Fei, Fergus & Torralba

Learning local models from local cues. The role of context. Slide credit: Fei-Fei, Fergus & Torralba

Learning local models from local cues The role of context Slide credit: Fei-Fei, Fergus & Torralba

Constraints of the world. The world is structured, not everything is possible. Local cues: similar appearance of similar objects. Global constraints: a limited number of allowed deformations of objects in 3D, depth ordering and occlusions, the rules of perspective projection. Chaotic world vs. structured world.

Constraints of the world Limited set of allowed deformations for the objects Dali, 1931

Occlusions Constraints of the world Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba

Depth ordering Constraints of the world Slide credit: J. Koenderink

Constraints of the world Rules of perspective geometry

Expressing constraints with graphical models Outline of the talk The idea of graphical models Examples: Limiting the set of allowed deformations Occlusion constraint Depth ordering constraint Modeling the rules of perspective geometry

Expressing constraints with graphical models. Graphical models: a graphical representation of probability distributions; graph-based algorithms for inference and computation; they capture both local cues and global constraints by modeling dependencies between random variables. Picture credit: C. Bishop

Graphical models. Graph representation. Each node corresponds to a random variable; dependent variables are connected with edges. Clique: a fully connected set of nodes in the graph. Maximal clique: a clique that is not a subset of any other clique. The graph encodes conditional independencies, e.g. $p(x_1, x_4 \mid x_2, x_3) = p(x_1 \mid x_2, x_3)\, p(x_4 \mid x_2, x_3)$. Picture credit: C. Bishop

Graphical models. Joint distribution and potentials. The joint distribution of all random variables can be written as a product of nonnegative potentials defined on the maximal cliques:
$p(X) = \frac{1}{Z} \prod_C \psi_C(X_C), \qquad \psi_C(X_C) \ge 0, \qquad Z = \sum_X \prod_C \psi_C(X_C)$
For the example graph: $p(X) = \frac{1}{Z}\, \psi_1(x_1, x_2, x_3)\, \psi_2(x_2, x_3, x_4)$

Graphical models. MAP-inference and energy function. Maximum a-posteriori (MAP) inference: find the values of all variables in the graphical model that maximize the joint probability:
$X^* = \arg\max_X p(X) = \arg\max_X \frac{1}{Z} \prod_C \psi_C(X_C) = \arg\max_X \exp\Big(-\sum_C E_C(X_C)\Big) = \arg\min_X \sum_C E_C(X_C)$
Energy function: $E(X) = -\log p(X) + \mathrm{const}$. MAP-inference = energy minimization.
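A tiny brute-force check of this equivalence on the two-clique example above, with randomly generated positive potentials (purely illustrative): maximizing the product of the potentials and minimizing the energy $E(X) = -\sum_C \log \psi_C(X_C)$ select the same assignment, since Z does not depend on X.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Positive potentials on the two maximal cliques: psi1(x1,x2,x3), psi2(x2,x3,x4).
psi1 = rng.uniform(0.1, 1.0, size=(2, 2, 2))
psi2 = rng.uniform(0.1, 1.0, size=(2, 2, 2))

def prob_unnormalized(x):          # product of clique potentials (Z is irrelevant for MAP)
    x1, x2, x3, x4 = x
    return psi1[x1, x2, x3] * psi2[x2, x3, x4]

def energy(x):                     # E(X) = -sum_C log psi_C(X_C)
    x1, x2, x3, x4 = x
    return -np.log(psi1[x1, x2, x3]) - np.log(psi2[x2, x3, x4])

states = list(itertools.product([0, 1], repeat=4))
assert max(states, key=prob_unnormalized) == min(states, key=energy)
# MAP inference and energy minimization agree.
```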

Graphical models. Methods for MAP-inference. Many computationally efficient methods for inference in graphical models have been developed: graph cuts, TRW, belief propagation, expectation propagation, MCMC. All of these methods have limitations and can only minimize energy functions of specific forms; the art is to find a trade-off between flexibility and tractability.

Expressing constraints with graphical models Outline of the talk The idea of graphical models Examples: Limiting the set of allowed deformations Occlusion constraint Depth ordering constraint Modeling the rules of perspective geometry

Expressing constraints with graphical models. Limiting the set of allowed deformations. The model should be flexible enough, yet constrain the allowed deformations of an object.

Limiting the set of allowed deformations. Pictorial structures model. Pictorial structures strike a good balance between flexibility and tractability [Fischler & Elschlager 73], [Felzenszwalb & Huttenlocher 00]

Limiting the set of allowed deformations. Pictorial structures model. Each vertex corresponds to a part of a person: head, torso, legs, arms. Edges form a tree. Person detection: for each vertex, find a corresponding position from the set of valid positions.
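A sketch of the inference idea behind pictorial structures, simplified to a chain of parts instead of the full tree: given unary appearance costs and pairwise deformation costs (assumed to come from part detectors and a learned deformation model), the minimum-cost placement of all parts is found exactly by dynamic programming.

```python
import numpy as np

def chain_pictorial_structures(unary, pairwise):
    """Minimum-cost placement of parts along a chain (a simple special case of
    tree-structured pictorial structures, solved exactly by dynamic programming).

    unary[i][p]       -- appearance cost of putting part i at candidate position p
    pairwise[i][p, q] -- deformation cost between part i at p and part i+1 at q
    """
    n_parts = len(unary)
    cost = np.asarray(unary[0], dtype=float)   # best cost of a prefix ending at each position
    backptr = []
    for i in range(1, n_parts):
        total = cost[:, None] + np.asarray(pairwise[i - 1]) + np.asarray(unary[i])[None, :]
        backptr.append(total.argmin(axis=0))
        cost = total.min(axis=0)
    positions = [int(cost.argmin())]           # backtrack the optimal configuration
    for bp in reversed(backptr):
        positions.append(int(bp[positions[-1]]))
    return list(reversed(positions)), float(cost.min())
```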

Limiting the set of allowed deformations Pose estimation Pedestrian detection

Expressing constraints with graphical models Outline of the talk The idea of graphical models Examples: Limiting the set of allowed deformations Occlusion constraint Depth ordering constraint Modeling the rules of perspective geometry

Expressing constraints with graphical models. Occlusions. Occlusions and self-occlusions make the task of object detection even harder. Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10

Local model Occlusion constraint Model from [Gall & Lempitsky, CVPR09] Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10

Occlusion constraint. Each image pixel belongs to at most one object. Hypotheses for the position of the center of an object; image patches. Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10

Occlusion constraint. Modeling the occlusion constraint. x – labelling of image patches: x_i = the index of the hypothesis the patch votes for, or x_i = 0 if the patch votes for the background. y – labelling of hypotheses, binary variables: y_j = 1 if the hypothesis corresponds to a present object, y_j = 0 otherwise. Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10

Occlusion constraint. Modeling the occlusion constraint. (Figure: an example labelling in which patches vote for hypotheses 1 and 2, while hypothesis 3 is switched off.) Key idea: joint MAP-inference in x and y. Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10

Occlusion constraint. Graphical model. If the labelling y is fixed, the values of x_i are independent, so we can maximize x out first and then perform inference over y. (Figure: a bipartite graphical model connecting image patches x_1, ..., x_5 to hypotheses y_1, ..., y_3 for the position of the object center.) Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10
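A hedged illustration of this observation: for a handful of hypotheses one can simply enumerate y, and for each fixed y let every patch independently choose its best vote (background or one of the active hypotheses). The log-scores, the prior on switching a hypothesis on, and the brute-force enumeration are illustrative only; the CVPR10 paper uses a greedy algorithm instead of enumeration.

```python
import itertools
import numpy as np

def infer_objects(vote_scores, background_score=0.0, hypothesis_prior=-1.0):
    """Brute-force joint MAP over (x, y) for a small number of hypotheses.

    vote_scores[i, j] -- log-score of patch i voting for hypothesis j (from a local model)
    background_score  -- log-score of a patch voting for the background
    hypothesis_prior  -- log-prior cost paid for switching a hypothesis on
    """
    n_patches, n_hyp = vote_scores.shape
    best_y, best_val = None, -np.inf
    for y in itertools.product([0, 1], repeat=n_hyp):
        active = [j for j in range(n_hyp) if y[j]]
        total = hypothesis_prior * len(active)
        for i in range(n_patches):
            # With y fixed, each x_i is independent: pick the best allowed vote.
            options = [background_score] + [vote_scores[i, j] for j in active]
            total += max(options)
        if total > best_val:
            best_y, best_val = y, total
    return best_y, best_val
```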

Occlusion constraint Comparison Without occlusion constraint White = correct detection Green = missing object Red = false positive Using occlusion constraint Code available online!!! Joint work with Victor Lempitsky and Pushmeet Kohli, CVPR10

Expressing constraints with graphical models Outline of the talk The idea of graphical models Examples: Limiting the set of allowed deformations Occlusion constraint Depth ordering constraint Modeling the rules of perspective geometry

Expressing constraints with graphical models. Depth ordering. The apparent size of an object depends on its distance from the camera. The viewpoint is set by the position of the horizon and the ground plane in the image.
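A small sketch of the underlying single-view relation, assuming an upright object standing on the ground plane and an image y-axis that grows downward: the expected image height is H_obj · (y_base − y_horizon) / H_cam. The example numbers (pedestrian height, camera height, pixel rows) are made up for illustration.

```python
def expected_image_height(y_base, y_horizon, object_height_m, camera_height_m):
    """Expected image height (in pixels) of an object standing on the ground plane.

    Valid for an upright object whose base lies below the horizon, with the
    image y-coordinate increasing downward.
    """
    return object_height_m * (y_base - y_horizon) / camera_height_m

# Horizon at row 200, object base at row 400, a 1.7 m pedestrian,
# camera mounted 1.5 m above the ground -> roughly 227 px tall in the image.
print(expected_image_height(400, 200, 1.7, 1.5))
```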

Depth ordering constraint Viewpoint prior on the size of the objects Object Position/Sizes Viewpoint Hoiem et al., CVPR2006

Depth ordering constraint Detected objects viewpoint Object Position/Sizes Viewpoint Hoiem et al., CVPR2006

Depth ordering constraint. Graphical model: the viewpoint θ is connected to objects o_1, ..., o_n (each with local object evidence) and to local surfaces s_1, ..., s_n (each with local surface evidence). Hoiem et al., CVPR2006

Depth ordering constraint. Prior on the size and position of the objects: image → P(surfaces), P(viewpoint), P(object), P(object | surfaces, viewpoint). Hoiem et al., CVPR2006

Comparison Depth ordering constraint Car: TP / FP Ped: TP / FP Initial: 2 TP / 3 FP Final: 7 TP / 4 FP Hoiem et al., CVPR2006 Local Detector from [Murphy-Torralba-Freeman 2003]

Comparison Depth ordering constraint Car: TP / FP Ped: TP / FP Initial: 1 TP / 14 FP Final: 3 TP / 5 FP Hoiem et al., CVPR2006 Local Detector from [Murphy-Torralba-Freeman 2003]

Expressing constraints with graphical models Outline of the talk The idea of graphical models Examples: Limiting the set of allowed deformations Occlusion constraint Depth ordering constraint Modeling the rules of perspective geometry

Expressing constraints with graphical models. Rules of perspective geometry. Images of parallel straight lines in 3D intersect in the image plane at a vanishing point. The vanishing point corresponding to vertical lines is called the zenith. All vanishing points corresponding to horizontal lines lie on the same straight line, called the horizon.
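A minimal numpy sketch of the vanishing-point construction in homogeneous coordinates: the line through two image points is their cross product, and the intersection of two such lines (a vanishing point when the lines are images of parallel 3D edges) is again a cross product. The point coordinates below are made up.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection of two homogeneous lines as an inhomogeneous point (x, y)."""
    pt = np.cross(l1, l2)
    return pt[:2] / pt[2]          # assumes the intersection is not at infinity

# Two image lines that come from parallel 3D edges (coordinates are illustrative):
l1 = line_through((100, 400), (300, 300))
l2 = line_through((100, 500), (350, 330))
print(intersection(l1, l2))        # their common vanishing point
```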

Modeling the rules of perspective geometry Geometric primitives and geometric parsing Image of man-made environment Edge pixels Line segments Lines Vanishing points Zenith and horizon Joint work with Elena Tretyak, Victor Lempitsky and Pushmeet Kohli, ECCV10

Modeling the rules of perspective geometry Traditional approach: bottom-up pipeline Input image Edge map 1. Grouping edge pixels into line segments 2.Grouping line segments and VPs estimation 3. Horizon and zenith estimation *Tuytelaars 1998, Antone 2000, Almansa 2003, Aguilera 2005, Boulanger 2006, Tardif 2009

Modeling the rules of perspective geometry. Energy function. All geometric primitives are detected simultaneously by minimizing a single energy over: p – edge pixels, s – line segments, l – lines, z – the zenith, h – horizontal vanishing points. Joint work with Elena Tretyak, Victor Lempitsky and Pushmeet Kohli, ECCV10

Modeling the rules of perspective geometry Discretization of the model Candidate vanishing points Candidate lines Candidate line segments Candidate edge pixels

Modeling the rules of perspective geometry. Graphical model: factors correspond to the potentials of the energy function; nodes correspond to geometric primitives. Joint work with Elena Tretyak, Victor Lempitsky and Pushmeet Kohli, ECCV10

Modeling the rules of perspective geometry Results of geometric parsing Energy minimization Input image Detected geometric primitives Joint work with Elena Tretyak, Victor Lempitsky and Pushmeet Kohli, ECCV10

Modeling the rules of perspective geometry Comparison Omitting horizon constraint Result of geometric parsing with full energy Code available online!!! Bottom-up pipeline

Modeling the rules of perspective geometry. Application: single-view geometry. 3D reconstruction = recovering the 3D structure of a scene from its projection(s). Single-view 3D reconstruction is an ill-posed problem with an infinite number of possible solutions. Joint work with Vadim Konushin, Anton Yakubenko, Hwasup Lim, and Anton Konushin, ECCV08

Application: single-view geometry. The structure of the 3D model. The 3D model is composed of a ground plane and a number of vertical walls. Joint work with Vadim Konushin, Anton Yakubenko, Hwasup Lim, and Anton Konushin, ECCV08

Application: single-view geometry. The structure of the 3D model. Parameterization: the (n−1) X-coordinates of the polyline fracture points $x^p_i\ (i = 2, \ldots, n)$, the X-coordinates of the vanishing points $v^i_x\ (i = 1, \ldots, n)$, the horizon level $h$, and the Y-coordinate of the left end of the polyline $y^p_1$.

Demo Application: single-view geometry single image 3d model Joint work with Vadim Konushin, Anton Yakubenko, Hwasup Lim, and Anton Konushin, ECCV08

Thank you for your attention! Thanks to Microsoft Research Connections for supporting the creation of computer vision courses at MSU.