Beyond Bags of Features: Spatial Information & Shape Models


Beyond Bags of Features: Spatial Information & Shape Models. Jana Kosecka. Many slides adapted from S. Lazebnik, Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Detection, recognition (so far):
- Bag-of-features models: codebooks built from appearance alone; no spatial relationships between local features.
Incorporating spatial information:
- Edge-based representations: distance transform, Chamfer matching, generalized Hough transform
- Combinations of edge-based and patch-based representations
- Color-based representations
- Holistic, gist-based representations

Adding spatial information:
- Forming vocabularies from pairs of nearby features ("doublets" or "bigrams")
- Computing bags of features on sub-windows of the whole image
- Using codebooks to vote for object position
- Generative part-based models

Spatial pyramid representation: an extension of a bag of features, giving a locally orderless representation at several levels of resolution (level 0, level 1, level 2). Lazebnik, Schmid & Ponce (CVPR 2006).
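
To make the construction concrete, here is a minimal sketch of assembling a spatial pyramid descriptor from quantized local features. It assumes feature positions already normalized to [0, 1) and precomputed visual-word indices; the function and parameter names are illustrative, not taken from the Lazebnik et al. implementation.

```python
import numpy as np

def spatial_pyramid_histogram(points, words, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms over a spatial pyramid.

    points: (N, 2) feature (x, y) positions, normalized to [0, 1).
    words:  (N,) visual-word indices in [0, vocab_size).
    """
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level                                  # cells per side at this level
        # pyramid-match weights: level 0 -> 1/2^L, level l >= 1 -> 1/2^(L-l+1)
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        cell_xy = np.floor(points * cells).astype(int)      # per-feature cell coordinates
        cell_id = cell_xy[:, 1] * cells + cell_xy[:, 0]
        for c in range(cells * cells):
            hist = np.bincount(words[cell_id == c], minlength=vocab_size)
            descriptor.append(weight * hist)
    return np.concatenate(descriptor)
```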

Scene category dataset: multi-class classification results (100 training images per class).

Caltech-101 dataset (http://www.vision.caltech.edu/image_datasets/caltech101/caltech101.html): multi-class classification results (30 training images per class).

Distance transform. Given:
- a binary image, B, of edge and local feature locations
- a binary edge template, T, of the shape we want to match
Let D be an array in registration with B such that D(i,j) is the distance to the nearest 1 in B; this array is called the distance transform of B.

Distance transform for template matching. Goal: find the placement of template T in D that minimizes the sum, M, of the distance-transform values multiplied by the pixel values in T.
- If T is an exact match to B at location (i,j), then M(i,j) = 0, i.e. all nonzero pixels of T lie at distance 0.
- If the edges in B are slightly displaced from their ideal locations in T, we still get a good match using the distance-transform technique.

Computing the distance transform. The brute-force, exact algorithm scans B and finds, for each 0, its closest 1 using the Euclidean distance; this is expensive in time and difficult to implement. (Example distance-transform grid shown on the slide.)

Computing the distance transform: two-pass sequential algorithm.
Initialize: set D(i,j) = 0 where B(i,j) = 1, else set D(i,j) = ∞.
Forward pass: D(i,j) = min[ D(i-1,j-1)+1, D(i-1,j)+1, D(i-1,j+1)+1, D(i,j-1)+1, D(i,j) ]
Backward pass: D(i,j) = min[ D(i,j+1)+1, D(i+1,j-1)+1, D(i+1,j)+1, D(i+1,j+1)+1, D(i,j) ]
(The slide shows the forward-pass (f) and backward-pass (b) neighbor masks and an example grid.)

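A sketch of the two-pass algorithm above in Python/NumPy, assuming B is a binary array with 1s at edge locations; the +1 increments follow the masks on the slide (a chamfer approximation rather than an exact Euclidean transform).

```python
import numpy as np

def distance_transform(B):
    """Two-pass chamfer distance transform of a binary image B (1 = edge).

    Returns D with D[i, j] = approximate distance to the nearest 1 in B.
    """
    rows, cols = B.shape
    INF = rows + cols                      # larger than any possible distance
    D = np.where(B == 1, 0, INF).astype(float)

    # Forward pass: top-left to bottom-right, look at neighbors above and to the left
    for i in range(rows):
        for j in range(cols):
            for di, dj in [(-1, -1), (-1, 0), (-1, 1), (0, -1)]:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    D[i, j] = min(D[i, j], D[ni, nj] + 1)

    # Backward pass: bottom-right to top-left, look at neighbors below and to the right
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            for di, dj in [(1, 1), (1, 0), (1, -1), (0, 1)]:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    D[i, j] = min(D[i, j], D[ni, nj] + 1)
    return D
```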

Distance transform example. (The slide shows a binary image and its distance transform computed with the recursion D(i,j) = min[ D(i,j), D(i,j-1)+1, D(i-1,j-1)+1, D(i-1,j)+1, D(i-1,j+1)+1 ].) Extensions to non-binary images (functions): P. Felzenszwalb and D. Huttenlocher, "Distance Transforms of Sampled Functions".

Chamfer matching. Chamfer matching is the convolution of a binary edge template with the distance transform:
- for any placement of the template over the image, it sums the distance-transform values at all pixels that are 1s (edges) in the template;
- if, at some position in the image, all of the edges in the template coincide with edges in the image (the points at which the distance transform is zero), then we have a perfect match with a match score of 0.

Example. The match score at offset (k,l) is M(k,l) = sum_{i=1..n} sum_{j=1..n} T(i,j) * D(i+k, j+l). (The slide shows a template and two example distance-transform grids.)
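
A sketch of how the match score above can be evaluated for every template placement, assuming D is the distance transform of the image edge map and T is a binary edge template (names are illustrative).

```python
import numpy as np

def chamfer_match(D, T):
    """Chamfer matching score map.

    D: distance transform of the image's edge map.
    T: binary edge template (1 = template edge pixel).
    Returns M with M[k, l] = sum_{i,j} T[i, j] * D[k + i, l + j];
    lower is better, and 0 means every template edge lies on an image edge.
    """
    H, W = D.shape
    h, w = T.shape
    M = np.zeros((H - h + 1, W - w + 1))
    for k in range(H - h + 1):
        for l in range(W - w + 1):
            M[k, l] = np.sum(T * D[k:k + h, l:l + w])
    return M

# Best placement of the template:
# k, l = np.unravel_index(np.argmin(M), M.shape)
```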

Implicit shape models. Combining edge-based, Hough-transform-style voting with appearance codebooks: a visual codebook is used to index votes for object position. (Figure: training image annotated with object localization info; visual codeword with displacement vectors.) B. Leibe, A. Leonardis, and B. Schiele, "Combined Object Categorization and Segmentation with an Implicit Shape Model," ECCV Workshop on Statistical Learning in Computer Vision, 2004.

Implicit shape models. The visual codebook is used to index votes for object position (test image).

Idea: Implicit Shape Model.
- Faces: rectangular templates, detection windows. This does not generalize to more complex objects with different shapes.
- How can patch-based, appearance-based representations be combined to incorporate a notion of shape?
Combined Object Categorization and Segmentation with an Implicit Shape Model. Bastian Leibe, Ales Leonardis, and Bernt Schiele. In ECCV'04.

Initial recognition approach. First step: generate hypotheses from local features. Training: agglomerative clustering of patches; clusters are merged based on the average NCC (normalized cross-correlation) of their patches, where NCC measures the similarity between two patches.

Initial recognition approach.
- With codebook words alone, spatial information is lost.
- For each codebook entry, store all positions at which it was activated, relative to the object center (positions parametrized by r and theta).
- Parts then vote for the object center.
(Figure: Lowe's DoG detector; 3σ x 3σ patches resized to 25 x 25; learn spatial distribution; find codebook patches.)
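
A rough sketch of the ISM voting step just described: each test patch activates similar codebook entries, and every stored displacement casts a vote for the object center; peaks in the accumulator are detection hypotheses. The codebook layout and the cosine-similarity test below are simplifying assumptions, not the exact matching used by Leibe et al.

```python
import numpy as np

def ism_vote(features, positions, codebook, image_shape):
    """Hough-style voting for the object center (simplified ISM sketch).

    features:  (N, d) patch descriptors from the test image.
    positions: (N, 2) (x, y) locations of those patches.
    codebook:  list of (prototype_descriptor, displacements), where each
               displacement is the offset from a training patch to the
               annotated object center.
    """
    votes = np.zeros(image_shape)                       # accumulator over center positions
    for f, (x, y) in zip(features, positions):
        for prototype, displacements in codebook:
            # activate a codeword if the patch is similar enough
            # (cosine similarity used here as a stand-in for the NCC in the paper)
            sim = np.dot(f, prototype) / (np.linalg.norm(f) * np.linalg.norm(prototype) + 1e-8)
            if sim > 0.8:
                for dx, dy in displacements:
                    cx, cy = int(round(x + dx)), int(round(y + dy))
                    if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
                        votes[cy, cx] += 1.0 / max(len(displacements), 1)
    return votes   # local maxima of `votes` are object-center hypotheses
```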

Pedestrian Detection in Crowded Scenes.
1. Interleaved Object Categorization and Segmentation, BMVC'03.
2. Combined Object Categorization and Segmentation with an Implicit Shape Model. Bastian Leibe, Ales Leonardis, and Bernt Schiele. In ECCV'04 Workshop on Statistical Learning in Computer Vision, Prague, May 2004.

Pedestrian detection.
- Many applications.
- Large variation in shape and appearance; need to combine different representations.
- Basic premise: such a problem is too difficult for any single type of feature or model alone.
- Probabilistic bottom-up / top-down segmentation.

Open question: how would you do pedestrian detection/segmentation? Solution: integrate as many cues as possible from many sources. (Figure: original image; segmentation from local features; support of segmentation from local features; support of segmentation from global features, i.e. Chamfer matching.)

Theme of the paper. Goal: localize AND count pedestrians in a given image. Datasets: a training set of 35 people walking parallel to the image plane, and a much harder testing set of 209 images with 595 annotated pedestrians.

Theme of the Paper

Initial recognition approach. First step: generate hypotheses from local features (Implicit Shape Model). Testing: initial hypotheses (overall).


Initial recognition approach. Second step: segmentation-based verification (Minimum Description Length). Caveat: this leads to another set of problems, e.g. hypotheses with four legs and three arms; the ISM doesn't know that a person doesn't have three legs. Global cues are needed.

Assimilation of global cues: distance transform and Chamfer matching. Get the feature image with an edge detector; get the DT image by computing the distance to the nearest feature point; the Chamfer distance is then computed between the template and the DT image.

Assimilation of global cues (attempt 1): distance transform and Chamfer matching. Start from the initial hypothesis generated by local features, use the scale estimate to cut out the surrounding region, apply a Canny detector and compute the DT, then perform Chamfer-distance-based matching (yellow marks the highest Chamfer score).

Results

Generative Part-Based Models. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Generative part-based models. R. Fergus, P. Perona and A. Zisserman, "Object Class Recognition by Unsupervised Scale-Invariant Learning," CVPR 2003.

Probabilistic model:
P(image | object) = P(appearance, shape | object) = max_h P(appearance | h, object) p(shape | h, object) p(h | object)
where h is the assignment of features (candidate parts) to parts.
- P(appearance | h, object): distribution over patch descriptors (high-dimensional appearance space).
- p(shape | h, object): distribution over joint part positions (2D image space).
(Figures: part descriptors, part locations, candidate parts; Part 1, Part 2, Part 3.)
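
A toy sketch of scoring one hypothesis h under the model above, assuming Gaussian appearance densities per part and a joint Gaussian over part positions (SciPy is assumed). The real Fergus et al. model also handles occlusion, relative scale, and a background clutter term, all omitted here; the model dictionary layout is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from itertools import permutations

def hypothesis_log_likelihood(h, descriptors, locations, model):
    """log P(appearance | h) + log p(shape | h) for one assignment h.

    h:           tuple of feature indices, one per part.
    descriptors: (N, d) appearance descriptors of candidate features.
    locations:   (N, 2) feature positions.
    model:       dict with per-part Gaussians 'app_mu', 'app_cov' and a joint
                 Gaussian 'shape_mu', 'shape_cov' over stacked part positions.
    """
    log_p = 0.0
    for part, feat in enumerate(h):                      # appearance term, per part
        log_p += multivariate_normal.logpdf(
            descriptors[feat], model['app_mu'][part], model['app_cov'][part])
    x = locations[list(h)].ravel()                       # joint shape term
    log_p += multivariate_normal.logpdf(x, model['shape_mu'], model['shape_cov'])
    return log_p

def best_hypothesis(descriptors, locations, model, n_parts):
    """max over h: brute-force search over assignments (exponential, as on the slide)."""
    cands = range(len(descriptors))
    return max(permutations(cands, n_parts),
               key=lambda h: hypothesis_log_likelihood(h, descriptors, locations, model))
```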

How to model location?
- Explicit: probability density functions.
- Implicit: voting scheme.
- Invariance: translation, scaling, similarity/affine, viewpoint.
(Figure: translation; translation and scaling; similarity transformation; affine transformation.)

Summary: adding spatial information.
- Doublet vocabularies. Pro: takes co-occurrences into account; some geometric invariance is preserved. Cons: too many doublet probabilities to estimate.
- Spatial pyramids. Pro: simple extension of a bag of features, works very well. Cons: no geometric invariance, no object localization.
- Implicit shape models. Pro: can localize objects, maintain translation and possibly scale invariance. Cons: need supervised training data (known object positions and possibly segmentation masks).
- Generative part-based models. Pro: very nice conceptually, can be learned from unsegmented images. Cons: combinatorial hypothesis search problem.

Slides from Sminchisescu

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection," CVPR 2005.

Tested with: RGB, LAB, grayscale. Gamma normalization and compression: square root, log.

Gradient masks tested: centered, diagonal, uncentered, cubic-corrected, Sobel.

Histogram of gradient orientations (binned by orientation).

8 orientations × 15x7 cells (the assembled HOG descriptor, visualized on the slide).
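
A minimal sketch of the cell-histogram stage of HOG: gradient orientations are quantized into 8 unsigned bins and accumulated over a grid of cells, weighted by gradient magnitude. Block normalization and the sliding-window SVM are omitted; the cell size and bin count are illustrative parameters, not the exact Dalal-Triggs configuration.

```python
import numpy as np

def hog_cells(gray, cell_size=8, n_bins=8):
    """Per-cell histograms of oriented gradients (no block normalization).

    gray: 2-D float image. Returns an array of shape (rows, cols, n_bins).
    """
    gy, gx = np.gradient(gray)                         # central-difference gradients
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                   # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)

    rows, cols = gray.shape[0] // cell_size, gray.shape[1] // cell_size
    hist = np.zeros((rows, cols, n_bins))
    for i in range(rows * cell_size):
        for j in range(cols * cell_size):
            # accumulate magnitude into the orientation bin of this pixel's cell
            hist[i // cell_size, j // cell_size, bins[i, j]] += mag[i, j]
    return hist
```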

(Figure: pedestrian.)

Overfitting. A simple dataset; two models: linear and nonlinear.

Overfitting. Let's get more data: the simple model has better generalization.

Overfitting. As model complexity increases, the model overfits the data: training loss decreases while the real loss increases. We need to penalize model complexity, i.e. to regularize. (Figure: loss vs. model complexity, with curves for training loss and real loss.)
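
The regularization idea in code form: minimize training loss plus a complexity penalty rather than training loss alone. A hedged sketch with a squared loss and an L2 penalty (the specific penalty and the weight `lam` are illustrative choices, not prescribed by the slides).

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    """Squared training loss plus an L2 complexity penalty (ridge-style).

    Larger `lam` penalizes complex (large-weight) models more, trading a bit
    of training loss for better generalization.
    """
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.dot(w, w)
```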

Classification methods: K nearest neighbors, decision trees, linear SVMs, kernel SVMs, boosted classifiers.

K nearest neighbors. Memorize all training data; find the K closest points to the query; the neighbors vote for the label (in the figure, one class receives 2 votes and the other 1).
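
A minimal k-nearest-neighbor classifier matching the description above: memorize the training set, find the K closest points, and let them vote (function and variable names are illustrative).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)     # Euclidean distances
    nearest = np.argsort(dists)[:k]                       # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)          # e.g. Vote(class A)=2, Vote(class B)=1
    return votes.most_common(1)[0][0]
```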

K-nearest neighbors: nearest neighbors among silhouettes. Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, "Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes".

K-nearest neighbors: silhouettes from other views and the 3D visual hull. Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, "Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes".

Decision tree. (Figure: an example tree that first tests X1 > 2 and then X2 > 1, with class vote counts V at each node and leaf.)

Decision tree training.
- Partition the data into pure chunks: find a good rule, split the training data, build the left tree, build the right tree.
- Count the examples in the leaves to get the votes V for each class.
- Stop when purity is high, the data size is small, or a fixed depth is reached.
(Figure: node purities of 57%, 64%, 80%, 80%, 100%.)
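
A sketch of the greedy training loop just described: find a good axis-aligned rule, split the data, build the left and right subtrees, and stop when a node is pure, small, or at a fixed depth. The purity score and thresholds below are illustrative choices, not the exact criterion from the slides.

```python
import numpy as np

def build_tree(X, y, min_size=5, max_depth=8, depth=0):
    """Greedy decision-tree training; leaves store class vote counts.

    X: (N, d) features, y: (N,) integer class labels (NumPy arrays).
    """
    counts = np.bincount(y, minlength=2)
    purity = counts.max() / max(counts.sum(), 1)
    if purity > 0.95 or len(y) < min_size or depth == max_depth:
        return {'leaf': True, 'votes': counts}          # stop: pure, small, or deep enough

    best = None
    for feat in range(X.shape[1]):                      # try axis-aligned rules X[feat] > t
        for t in np.unique(X[:, feat]):
            left, right = y[X[:, feat] <= t], y[X[:, feat] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # score a split by how many examples the majority class explains in each chunk
            score = sum(np.bincount(s, minlength=2).max() for s in (left, right))
            if best is None or score > best[0]:
                best = (score, feat, t)

    if best is None:                                     # no useful split found
        return {'leaf': True, 'votes': counts}
    _, feat, t = best
    mask = X[:, feat] <= t
    return {'leaf': False, 'feature': feat, 'threshold': t,
            'left':  build_tree(X[mask], y[mask], min_size, max_depth, depth + 1),
            'right': build_tree(X[~mask], y[~mask], min_size, max_depth, depth + 1)}
```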

Histogram intersection. (Figure: assign each feature to a texture cluster and count per-bin occurrences.) S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories."
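
Histogram intersection itself is a one-liner: sum the element-wise minimum of the two histograms. A minimal sketch:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """I(h1, h2) = sum over bins of min(h1_i, h2_i); larger means more similar."""
    return np.sum(np.minimum(h1, h2))
```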

(Spatial) Pyramid Match. S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories."