EECS 442 Computer vision. Object Recognition

EECS 442 Computer vision. Object Recognition: Intro; Recognition of 3D objects; Recognition of object categories: Bag of Words models, Part-based models; 3D object categorization

Challenges: intraclass variation

Usual challenges: variability due to viewpoint, illumination, occlusions

Representation: basic properties; 2D Bag of Words (BoW) models; 2D Star/ISM models; multiview models. Learning: generative & discriminative BoW models; probabilistic Hough voting; generative multiview models. Recognition: classification with BoW; detection via Hough voting

Definition of BoW: independent features; histogram representation over a codewords dictionary

Representation, learning, and recognition pipeline: feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers → category decision

Learning and Recognition. 1. Discriminative methods: NN, SVM. 2. Generative methods: graphical models. Output: category models (and/or) classifiers

Discriminative classifiers [figure: category models in the model space, separating Class 1 through Class N]

Discriminative models: nearest neighbor (10^6 examples; Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...), neural networks (LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998), support vector machines / latent SVM / structural SVM (Guyon, Vapnik; Heisele, Serre, Poggio; Felzenszwalb 00; Ramanan 03), and boosting (Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006). Courtesy of Vittorio Ferrari. Slide credit: Kristen Grauman; slide adapted from Antonio Torralba

Support vector machines: find the hyperplane that maximizes the margin between the positive and negative examples. Support vectors satisfy |x_i · w + b| = 1. Distance between a point x_i and the hyperplane: |x_i · w + b| / ||w||. Margin = 2 / ||w||. Solution: w = Σ_i α_i y_i x_i (a weighted sum of the support vectors). Classification function (decision boundary): w · x + b = Σ_i α_i y_i x_i · x + b. Credit slide: S. Lazebnik

Support vector machines: classification. f(x) = w · x + b = Σ_i α_i y_i x_i · x + b. For a test point x: if w · x + b > 0, assign class 1; if w · x + b < 0, assign class 2. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
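To make the classification function concrete, here is a minimal sketch (not from the lecture) of training a linear SVM on bag-of-words histograms with scikit-learn; the random toy histograms and the names used here (hists, labels, test) are placeholders for the output of the BoW pipeline above.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Toy stand-in for BoW histograms over a 50-word codebook: class 1 images use
# the first 10 codewords a bit more often than class 0 images do.
labels = rng.integers(0, 2, size=100)
hists = rng.random((100, 50))
hists[:, :10] += 0.5 * labels[:, None]

clf = LinearSVC(C=1.0)              # C trades margin width against training error
clf.fit(hists, labels)              # learns w and b of the hyperplane w.x + b = 0
test = rng.random((5, 50))
print(clf.decision_function(test))  # signed score w.x + b per test histogram
print(clf.predict(test))            # class = side of the hyperplane (sign of score)
```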

Nonlinear SVMs. Datasets that are linearly separable work out great, but what if the dataset is just too hard? We can map it to a higher-dimensional space, e.g. x → x². Slide credit: Andrew Moore

Nonlinear SVMs. General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x), the "lifting" transformation. Slide credit: Andrew Moore
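A small sketch of the lifting idea on made-up 1D data: class 0 lies between the two clusters of class 1, so no threshold on x separates them, but the explicit map φ(x) = (x, x²), or equivalently a kernel SVM, makes the classes linearly separable.

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]).reshape(-1, 1)
y = np.array([1, 1, 0, 0, 0, 1, 1])       # no threshold on x separates the classes

phi = np.hstack([x, x ** 2])              # explicit lifting phi(x) = (x, x^2)
lin = SVC(kernel="linear").fit(phi, y)    # linear SVM in the lifted 2D space
print(lin.score(phi, y))                  # expect 1.0: separable after lifting

rbf = SVC(kernel="rbf").fit(x, y)         # kernel trick: the lifting stays implicit
print(rbf.score(x, y))
```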

Overfitting: a simple dataset and two models, linear and nonlinear [figure: the two decision boundaries on the same points]

Overfitting: let's get more data; the simple model has better generalization [figure: the same two models on the larger dataset]

Overfitting: as complexity increases, the model overfits the data; training loss decreases while real loss increases. We need to penalize model complexity, i.e., to regularize. [Plot: loss vs. model complexity, comparing real loss and training loss.]
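A toy numeric illustration of this trade-off, using polynomial fitting as a stand-in model family: the polynomial degree plays the role of model complexity, and the held-out error plays the role of the real loss. All data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
def f(x): return np.sin(x)

x_train = rng.uniform(0, 6, 15)
y_train = f(x_train) + 0.2 * rng.standard_normal(15)
x_test = rng.uniform(0, 6, 200)
y_test = f(x_test) + 0.2 * rng.standard_normal(200)

for degree in (1, 3, 9, 12):
    coeffs = np.polyfit(x_train, y_train, degree)   # "model complexity" = degree
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Training error keeps shrinking with degree; held-out ("real") error eventually rises.
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```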

Learning and Recognition. 1. Discriminative methods: NN, SVM. 2. Generative methods: graphical models. Output: category models (and/or) classifiers

Generative models. 1. Naïve Bayes classifier (Csurka, Bray, Dance & Fan, 2004). 2. Hierarchical Bayesian text models (pLSA and LDA). Background: Hofmann 2001; Blei, Ng & Jordan, 2004. Object categorization: Sivic et al. 2005, Sudderth et al. 2005. Natural scene categorization: Fei-Fei et al. 2005

Object categorization: the statistical viewpoint. Discriminative methods model the posterior; generative methods model the likelihood and the prior. Bayes rule: p(zebra | image) / p(no zebra | image) = [p(image | zebra) / p(image | no zebra)] · [p(zebra) / p(no zebra)], i.e., posterior ratio = likelihood ratio × prior ratio.
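A tiny numeric illustration of the decomposition, with made-up numbers: even when the likelihood ratio favors zebra, a small prior ratio can pull the posterior ratio below 1.

```python
likelihood_ratio = 5.0   # p(image | zebra) / p(image | no zebra)
prior_ratio = 0.1        # p(zebra) / p(no zebra): zebras are rare a priori
posterior_ratio = likelihood_ratio * prior_ratio
print(posterior_ratio)   # 0.5 < 1, so decide "no zebra"
```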

Some notation. w: a collection of all N codewords in the image, w = [w_1, w_2, ..., w_N]. c: category of the image

The Naïve Bayes model: p(c | w) ∝ p(c) p(w | c) = p(c) p(w_1, ..., w_N | c), where p(c) is the prior probability of the object classes and p(w_1, ..., w_N | c) is the image likelihood given the class. [Graphical model: c → w, plate over the N codewords.]

The Naïve Bayes model: assume that each feature (codeword) is conditionally independent given the class, p(w_1, ..., w_N | c) = Π_{n=1..N} p(w_n | c). Then p(c | w) ∝ p(c) p(w_1, ..., w_N | c) = p(c) Π_{n=1..N} p(w_n | c), where p(c) is the prior probability of the object classes and p(w_n | c) is the likelihood of the n-th visual word given the class.

Classification/Recognition: the object class decision is c* = argmax_c p(c | w) = argmax_c p(c) Π_{n=1..N} p(w_n | c). How do we learn p(w_i | c_j)? From the empirical frequencies of codewords in images from the given class. [Figure: example histograms p(w_i | c_1) and p(w_i | c_2).]
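A minimal sketch of this Naïve Bayes classifier on toy codeword histograms: p(w | c) is estimated by frequency counting, with add-one (Laplace) smoothing added here as an extra assumption for numerical safety, and the class decision is the argmax of log p(c) + Σ_n count(w_n) · log p(w_n | c).

```python
import numpy as np

num_classes, vocab = 2, 6
# Toy training set: each row is a codeword-count histogram with its class label.
train_hists = np.array([[5, 3, 0, 1, 0, 0],
                        [4, 4, 1, 0, 0, 0],
                        [0, 0, 1, 4, 5, 2],
                        [0, 1, 0, 5, 4, 3]])
train_labels = np.array([0, 0, 1, 1])

log_prior = np.log(np.bincount(train_labels, minlength=num_classes) / len(train_labels))
log_pwc = np.zeros((num_classes, vocab))
for c in range(num_classes):
    counts = train_hists[train_labels == c].sum(axis=0) + 1.0   # Laplace smoothing
    log_pwc[c] = np.log(counts / counts.sum())                  # empirical p(w|c)

def classify(hist):
    # log p(c|w) up to a constant: log p(c) + sum over words of count * log p(w|c)
    return int(np.argmax(log_prior + hist @ log_pwc.T))

print(classify(np.array([3, 2, 0, 0, 1, 0])))   # expected: class 0
print(classify(np.array([0, 0, 1, 2, 3, 1])))   # expected: class 1
```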

Csurka et al. 2004 [result figures: error rates E = 28% and E = 15%]

Summary: Generative models Naïve Bayes Unigram models in document analysis Assumes conditional independence of words given class Parameter estimation: frequency counting

Other generative BoW models: hierarchical Bayesian topic models (e.g. pLSA and LDA). Object categorization: Sivic et al. 2005, Sudderth et al. 2005. Natural scene categorization: Fei-Fei et al. 2005

Generative vs. discriminative. Discriminative methods: computationally efficient & fast. Generative models: convenient for weakly supervised or unsupervised and incremental training; can incorporate prior information; flexibility in modeling parameters

Weaknesses of the BoW models: no rigorous geometric information about the object components; it's intuitive to most of us that objects are made of parts, yet BoW carries no such information. Not extensively tested yet for viewpoint invariance or scale invariance. Segmentation and localization are unclear

EECS 442 Computer vision. Object Recognition: Intro; Recognition of 3D objects; Recognition of object categories: Bag of Words models, Part-based models; 3D object categorization

Representation: basic properties; 2D Bag of Words (BoW) models; 2D Star/ISM models; multiview models. Learning: generative & discriminative BoW models; probabilistic Hough voting; generative multiview models. Recognition: classification with BoW; detection via Hough voting

Problem with bag-of-words: all spatial arrangements of the same features have equal probability under bag-of-words methods, yet location information is important

Part-based representation: object as a set of parts. Model: relative locations between parts; appearance of each part. Figure from [Fischler & Elschlager 73]

History of Parts and Structure approaches Fischler & Elschlager 1973 Yuille 91 Brunelli & Poggio 93 Lades, v.d. Malsburg et al. 93 Cootes, Lanitis, Taylor et al. 95 Amit & Geman 95, 99 Perona et al. 95, 96, 98, 00, 03, 04, 05 Ullman et al. 02 Felzenszwalb & Huttenlocher 00, 04 Crandall & Huttenlocher 05, 06 Leibe & Schiele 03, 04 Many papers since 2000

Deformations [figure: deformed part configurations A, B, C, D]

Presence / Absence of Features occlusion

Background clutter

Sparse representation: computationally tractable (10^5 pixels → 10^1-10^2 parts), but throws away potentially useful image information

Discriminative Parts need to be distinctive to separate from other classes

Hierarchical representations: pixels → pixel groupings → parts → object. Images from [Amit98, Bouchard05]

Different connectivity structures, with matching complexity ranging from O(N^6) for fully connected constellation models down to O(N^2)-O(N^3) for star- and tree-structured models: Fergus et al. 03, Fei-Fei et al. 03 (constellation); Crandall et al. 05, Leibe 05, Felzenszwalb 09 (star); Crandall et al. 05, Felzenszwalb & Huttenlocher 00 (tree); Csurka 04, Vasconcelos 00 (bag of features); Bouchard & Triggs 05, Carneiro & Lowe 06 (hierarchies / sparse flexible models). Figure from "Sparse Flexible Models of Local Features", Gustavo Carneiro and David Lowe, ECCV 2006

Stochastic Grammar of Images S.C. Zhu et al. and D. Mumford

Context and Hierarchy in a Probabilistic Image Model, Jin & Geman (2006). Hierarchy levels, from high to low: e.g. animals, trees, rocks; e.g. contours, intermediate objects; e.g. linelets, curvelets, T-junctions; e.g. discontinuities, gradient. Example: the animal head part instantiated by a tiger head or by a bear head

Star models by Latent SVM Felzenszwalb, McAllester, Ramanan, 08 Source code:

Star models by Latent SVM. Search strategy: sliding windows. Simple, but with large computational complexity (x, y, S, ..., number of classes)
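A sketch of the sliding-window search the slide refers to, to make the cost explicit: the loops already run over every position (x, y) and scale S for a single class, with `score_window` standing in for an assumed window classifier (e.g. a learned model's scoring function).

```python
import numpy as np

def sliding_window_detect(image, score_window, base_win=64, stride=8,
                          scales=(1.0, 1.5, 2.0), threshold=0.0):
    detections = []
    H, W = image.shape[:2]
    for s in scales:                                   # search over scale S
        win = int(base_win * s)
        for y in range(0, H - win + 1, stride):        # search over y
            for x in range(0, W - win + 1, stride):    # search over x
                score = score_window(image[y:y + win, x:x + win])
                if score > threshold:
                    detections.append((x, y, win, score))
    return detections

# Toy usage: a random "image" and a dummy scorer that likes bright windows.
img = np.random.default_rng(0).random((200, 300))
dets = sliding_window_detect(img, lambda patch: patch.mean() - 0.5)
print(len(dets), "windows above threshold")
```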

Implicit shape models by generalized Hough voting B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

Basic properties Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data

Object representation: Constellation of parts w.r.t object centroid B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

Object representation: How to capture constellation of parts? Using Hough Voting B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

Basic properties Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data

Hough Transform Formulation Parts in query image vote for a learnt model Significant aggregations of votes correspond to models Complexity : # parts * # votes Significantly lower than brute force search (e.g., sliding window detectors) Popular for detecting parameterized shapes Hough 59, Duda&Hart 72, Ballard 81, Slide Modified from S. Maji

Hough transform. P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959. Given a set of points, find the curve or line that explains the data points best.

Hough transform. P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959. A line y = m x + n in image space maps each point (x_1, y_1) to the line y_1 = m x_1 + n in the Hough parameter space (m, n).

Hough transform. P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959. Issue: the parameter space [m, n] is unbounded. Use a polar representation for the parameter space: x cos θ + y sin θ = ρ, so the Hough space becomes (θ, ρ).
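A minimal sketch of the polar line Hough transform just described: each point votes for all (θ, ρ) bins consistent with x cos θ + y sin θ = ρ, and the accumulator peak recovers the dominant line. The grid resolutions here are arbitrary choices.

```python
import numpy as np

def hough_lines(points, rho_res=1.0, n_theta=180):
    thetas = np.deg2rad(np.arange(n_theta))            # theta in [0, pi)
    max_rho = np.max(np.abs(points)) * np.sqrt(2) + 1
    rho_bins = np.arange(-max_rho, max_rho, rho_res)
    acc = np.zeros((len(rho_bins), n_theta), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # one vote per theta bin
        idx = np.digitize(rho, rho_bins) - 1
        acc[idx, np.arange(n_theta)] += 1
    return acc, rho_bins, thetas

# Points on the line y = 2x + 1, plus two outliers.
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -5)]
acc, rho_bins, thetas = hough_lines(np.array(pts, dtype=float))
r, t = np.unravel_index(np.argmax(acc), acc.shape)
print(acc.max(), "votes at rho =", rho_bins[r], "theta =", np.rad2deg(thetas[t]))
```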

Hough transform experiments [figure: features and their votes in Hough space]

Hough transform experiments: noisy data [figure: features and votes]. IDEA: introduce a grid and count intersection points in each cell. Issue: the grid size needs to be adjusted

Hough transform experiments [figure: features and votes]. Issue: spurious peaks due to uniform noise

Hough transform conclusions. Good: all points are processed independently, so it can cope with occlusion/outliers; some robustness to noise, since noise points are unlikely to contribute consistently to any single bin. Bad: spurious peaks due to uniform noise; trade-off between noise and grid size (hard to find the sweet spot)

Hough transform experiments Courtesy of TKK Automation Technology Laboratory

Generalized Hough transform. What if we want to detect arbitrary shapes defined by boundary points and a reference point? [Dana H. Ballard, Generalizing the Hough Transform to Detect Arbitrary Shapes, 1980] Credit slide: C. Grauman

Generalized Hough transform. To detect the model shape in a new image: for each edge point, index into the table with its gradient orientation θ and use the retrieved r vectors to vote for the position of the reference point; a peak in this Hough space is the reference point with the most supporting edges. Assuming translation is the only transformation here, i.e., orientation and scale are fixed. Credit slide: C. Grauman

Example: query with a circle model. R-table indexed by gradient orientation θ: θ = 0 → (rx, ry) = (1, 0); θ = 45 → (0.7, 0.7); θ = 90 → (0, 1); θ = 135 → (0.7, 0.7); ...; θ = 270 → (0.7, 0.7). Voting: P1 with θ = 0 retrieves R = [rx, ry] = [1, 0] and votes for C1 = P1 + R; P2 with θ = 45 retrieves R = [0.7, 0.7] and votes for C2 = P2 + R; ...; Pk with θ = 180 retrieves R = [1, 0] and votes for Ck = Pk + R.
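A sketch of R-table construction and voting for the translation-only case described above, on a synthetic circle; the orientation values are taken as given rather than computed from image gradients, an assumption made to keep the example self-contained.

```python
import numpy as np
from collections import defaultdict

def build_r_table(boundary_pts, orientations, reference):
    # Store, per quantized orientation, the displacement r from the boundary
    # point to the reference point.
    table = defaultdict(list)
    for p, theta in zip(boundary_pts, orientations):
        table[int(theta) % 360].append(np.asarray(reference) - np.asarray(p))
    return table

def vote(edge_pts, edge_orients, table, shape):
    acc = np.zeros(shape, dtype=int)
    for p, theta in zip(edge_pts, edge_orients):
        for r in table.get(int(theta) % 360, []):
            cx, cy = (np.asarray(p) + r).astype(int)
            if 0 <= cx < shape[0] and 0 <= cy < shape[1]:
                acc[cx, cy] += 1                       # vote for the reference point
    return acc

# Toy circle model of radius 5 around reference (0, 0), sampled every 45 degrees.
angles = np.arange(0, 360, 45)
model_pts = [(5 * np.cos(np.deg2rad(a)), 5 * np.sin(np.deg2rad(a))) for a in angles]
table = build_r_table(model_pts, angles, reference=(0, 0))

# The same circle translated to (20, 30): the votes should peak there.
query_pts = [(20 + 5 * np.cos(np.deg2rad(a)), 30 + 5 * np.sin(np.deg2rad(a))) for a in angles]
acc = vote(query_pts, angles, table, shape=(64, 64))
print(np.unravel_index(np.argmax(acc), acc.shape))     # -> (20, 30)
```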

Implicit shape models: instead of indexing displacements by gradient orientation, index by visual codeword. The visual codebook is used to index votes for object position [center] and scale. [Table: each codeword CW stores its displacement vectors (rx, ry), e.g. CW 1: (0.9, 0.1), ..., CW N: (0.7, 0.7).] B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

Implicit shape models: training. 1. Build a codebook of patches around extracted interest points using clustering. 2. Map the patch around each interest point to the closest codebook entry. 3. For each codebook entry, store all positions relative to the object center [center is given] and scale [bounding box is given]. Credit slide: S. Lazebnik
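A compact sketch of these training steps using k-means for the clustering; the inputs `patches`, `locations`, and `object_centers` are assumed to come from an interest-point detector and the annotated training images, and the scale component is omitted for brevity.

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def train_ism(patches, locations, object_centers, num_codewords=100):
    # 1. Build the visual codebook by clustering patch descriptors.
    codebook = KMeans(n_clusters=num_codewords, n_init=10).fit(patches)
    # 2./3. Assign each patch to its closest codeword and store the displacement
    #       from the patch location to the (given) object center.
    assignments = codebook.predict(patches)
    displacements = defaultdict(list)
    for cw, loc, center in zip(assignments, locations, object_centers):
        displacements[int(cw)].append(np.asarray(center) - np.asarray(loc))
    return codebook, displacements

# Toy usage: 200 random 32-D "patches" from images of a 64 x 48 object.
rng = np.random.default_rng(0)
patches = rng.random((200, 32))
locations = rng.integers(0, 64, size=(200, 2))
object_centers = np.tile([32, 24], (200, 1))
codebook, displacements = train_ism(patches, locations, object_centers, num_codewords=20)
print(len(displacements), "codewords with stored displacements")
```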

Implicit Shape Model recognition [Leibe, Leonardis, Schiele, SLCV 04; IJCV 08]: interest points → matched codebook entries → probabilistic voting in a continuous 3D voting space (x, y, s).

Implicit Shape Model recognition [Leibe, Leonardis, Schiele, SLCV 04; IJCV 08]: interest points → matched codebook entries → probabilistic voting in the 3D voting space (x, y, s) → backprojection of maxima → backprojected hypotheses → segmentation.

Probabilistic Hough Transform: image feature → codebook match → object position. Notation: f = features, l = locations of the features, C = codebook entry, O = object class, x = object center. Detection score: p(O, x | f_j, l_j) = Σ_i p(O, x | C_i, l_j) · p(C_i | f_j), where p(O, x | C_i, l_j) is the posterior distribution of the centroid given the codeword C_i observed at location l_j, and p(C_i | f_j) is the match confidence (or weight) of the codeword C_i. The codeword weights can be learnt using a max-margin formulation (Maji et al., CVPR 2009).
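A sketch of the corresponding voting step, continuing the training sketch above: each feature is matched to a codeword (a hard match standing in for p(C_i | f_j)) and its stored displacements cast votes for the object center; the per-codeword weights are uniform here, whereas Maji et al. (CVPR 2009) learn them with the max-margin formulation.

```python
import numpy as np

def cast_votes(features, locations, codebook, displacements, image_shape):
    acc = np.zeros(image_shape)                       # 2D voting space (x, y); scale omitted
    assignments = codebook.predict(features)          # hard codeword match for each feature
    for cw, loc in zip(assignments, locations):
        vecs = displacements.get(int(cw), [])
        for r in vecs:
            x, y = (np.asarray(loc) + r).astype(int)
            if 0 <= x < image_shape[0] and 0 <= y < image_shape[1]:
                acc[x, y] += 1.0 / max(len(vecs), 1)  # each match distributes unit weight
    return acc                                        # peaks = object-center hypotheses

# Toy usage, reusing `codebook` and `displacements` from the training sketch:
# acc = cast_votes(patches, locations, codebook, displacements, image_shape=(64, 48))
# peak = np.unravel_index(np.argmax(acc), acc.shape)
```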

Example: results on cows. Figure sequence: original image → interest points → matched patches → probabilistic votes → 1st hypothesis → 2nd hypothesis → 3rd hypothesis.

Example results: chairs. Dining room chairs; office chairs.

You can try it at home: Linux binaries available, including datasets & several pretrained detectors: http://www.vision.ee.ethz.ch/bleibe/code (Source: Bastian Leibe)

Extension: learning feature weights. Subhransu Maji and Jitendra Malik, Object Detection Using a Max-Margin Hough Transform, CVPR 2009. Weights can be learned optimally using a max-margin framework. [Figure: important parts, colored from blue (low) to dark red (high).]

Learned weights (ETHZ shape): Naïve Bayes vs. max-margin; influenced by clutter (rare structures). [Figure: important parts, blue (low) to dark red (high).]

Learned weights (UIUC cars): Naïve Bayes vs. max-margin. [Figure: important parts, blue (low) to dark red (high).]

Detection results (ETHZ dataset): recall at 1.0 false positives per window.

Conclusions. Pros: works well for many different object categories, both rigid and articulated; flexible geometric model; can recombine parts seen on different training examples; learns from relatively few (50-100) training examples; optimized for detection, with good localization properties. Cons: needs supervised training data (object bounding boxes for detection, segmentations for top-down segmentation); no discriminative learning.

Object detection and 3D shape recovery from a single image. M. Sun, B. Xu, G. Bradski, S. Savarese, ECCV 2010; M. Sun, S. Kumar, G. Bradski, S. Savarese, 3DIMPVT 2011. Object categories with arbitrary texture and topology; uncontrolled environment; uncalibrated camera.

Depth-Encoded Hough Voting. Notation: x = object location; each part has center location l and scale s; d = depth.

Depth-Encoded Hough Voting: detection score. Notation: C = codebook, f = part appearance (feature), l = 2D location, s = scale of the part, d = depth information. The detection score accumulates, over features j and codewords i, the position posterior times the codeword likelihood times a term relating scale to depth, integrated over depth: p(O, x) = Σ_j Σ_i ∫ p(O, x | C_i, l_j, s_j, d_j) · p(C_i | f_j) · p(s_j, l_j | d_j) dd_j.

Object detection and 3d shape recovery from a single image 3d reconstruction

Object detection and 3d shape recovery from a single image 3d pose estimation

Object detection and 3d shape recovery from a single image 3d modeling

Object detection and 3d shape recovery from a single image: CAD alignment, texture completion

Summary: part-based models enable a more descriptive characterization of objects (semantically useful regions, depth, occlusions). ISM models: generative; enable segmentation; require large supervision.