Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Similar documents
Part-based and local feature models for generic object recognition

Part based models for recognition. Kristen Grauman

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Deformable Part Models

Beyond Bags of features Spatial information & Shape models

Visual Object Recognition

Object Category Detection. Slides mostly from Derek Hoiem

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Bag of Words Models. CS4670 / 5670: Computer Vision Noah Snavely. Bag-of-words models 11/26/2013

By Suren Manvelyan,

Fitting: The Hough transform

CS6670: Computer Vision

Object Detection. Sanja Fidler CSC420: Intro to Image Understanding 1/ 1

Fitting: The Hough transform

Patch-based Object Recognition. Basic Idea

Patch Descriptors. CSE 455 Linda Shapiro

Fitting: The Hough transform

Supervised learning. y = f(x) function

Beyond Bags of Features

Detection III: Analyzing and Debugging Detection Methods

Local Features and Bag of Words Models

Part-based models. Lecture 10

Model Fitting: The Hough transform II

TA Section: Problem Set 4

Efficient Kernels for Identifying Unbounded-Order Spatial Features

Computer Vision. Exercise Session 10 Image Categorization

Bag-of-features. Cordelia Schmid

Patch Descriptors. EE/CSE 576 Linda Shapiro

Instance-level recognition II.

Object detection. Announcements. Last time: Mid-level cues 2/23/2016. Wed Feb 24 Kristen Grauman UT Austin

Image classification Computer Vision Spring 2018, Lecture 18

Object Classification Problem

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

String distance for automatic image classification

Instance-level recognition part 2

Loose Shape Model for Discriminative Learning of Object Categories

Descriptors for CV. Introduc)on:

CLASSIFICATION Experiments

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2

Distances and Kernels. Motivation

OBJECT CATEGORIZATION

Artistic ideation based on computer vision methods

Fitting: Voting and the Hough Transform April 23 rd, Yong Jae Lee UC Davis

Local features, distances and kernels. Plan for today

CS 4495 Computer Vision Classification 3: Bag of Words. Aaron Bobick School of Interactive Computing

Backprojection Revisited: Scalable Multi-view Object Detection and Similarity Metrics for Detections

Aggregating Descriptors with Local Gaussian Metrics

Local Features based Object Categories and Object Instances Recognition

Today. Main questions 10/30/2008. Bag of words models. Last time: Local invariant features. Harris corner detector: rotation invariant detection

Improved Spatial Pyramid Matching for Image Classification

Finding 2D Shapes and the Hough Transform

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space.

Modern Object Detection. Most slides from Ali Farhadi

Local features: detection and description. Local invariant features

High Level Computer Vision

Mining Discriminative Adjectives and Prepositions for Natural Scene Recognition

Indexing local features and instance recognition May 16 th, 2017

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

10/03/11. Model Fitting. Computer Vision CS 143, Brown. James Hays. Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem

Object recognition. Methods for classification and image representation

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

EECS 442 Computer vision. Object Recognition

Lecture 8: Fitting. Tuesday, Sept 25

Category-level Localization

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Lecture 16: Object recognition: Part-based generative models

Methods for Representing and Recognizing 3D objects

Recognition with Bag-ofWords. (Borrowing heavily from Tutorial Slides by Li Fei-fei)

Discriminative classifiers for image recognition

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Improving Recognition through Object Sub-categorization

Selection of Scale-Invariant Parts for Object Class Recognition

Prof. Kristen Grauman

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Fuzzy based Multiple Dictionary Bag of Words for Image Classification

Fusing shape and appearance information for object category detection

Category vs. instance recognition

Course Administration

Indexing local features and instance recognition May 14 th, 2015

Object detection using non-redundant local Binary Patterns

Learning Realistic Human Actions from Movies

Recognizing object instances

Feature Matching + Indexing and Retrieval

Outline. Segmentation & Grouping. Examples of grouping in vision. Grouping in vision. Grouping in vision 2/9/2011. CS 376 Lecture 7 Segmentation 1

ObjectDetectionusingaMax-MarginHoughTransform

arxiv: v3 [cs.cv] 3 Oct 2012

Integrated Feature Selection and Higher-order Spatial Feature Extraction for Object Categorization

Recap Image Classification with Bags of Local Features

Bias-Variance Trade-off + Other Models and Problems

Beyond bags of Features

Keypoint-based Recognition and Object Search

Linear combinations of simple classifiers for the PASCAL challenge

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Object Recognition and Detection

Local features and image matching. Prof. Xin Yang HUST

The SIFT (Scale Invariant Feature

Determining Patch Saliency Using Low-Level Context

Transcription:

Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition when combined with window-based or holistic appearance descriptors Global window-based appearance representations Part-based and local feature models for recognition Main idea: Gist Map of local orientations Histograms of oriented gradients These examples are truly global; each pixel in the window contributes to the representation. Classifier can account for relative relevance When might this not be ideal? Rather than a representation based on holistic appearance, decompose the image into: local parts or patches, and their relative spatial relationships Part-based and local feature models for recognition We ll look at three forms: 1. Bag of words (no geometry) 2. Implicit shape model (star graph for spatial model) 3. Constellation model (fully connected graph for spatial model) Bag-of-words model Summarize entire image based on its distribution (histogram) of word occurrences. Total freedom on spatial positions, relative geometry. Vector representation easily usable by most classifiers. models 1

Bag-of-words model Words as parts Csurka et al. Visual Categorization with Bags of Keypoints, 2004 All local features Csurka et al. 2004 Local features from two selected clusters occurring in this image Naïve Bayes model for classification Confusion matrix c arg max c c w) c) w c) c) N n 1 w n c) Object class decision Prior prob. of the object classes Image likelihood given the class patches What assumptions does the model make, and what are their significance? Example bag of words + Naïve Bayes classification results for generic categorization of objects Csurka et al. 2004 Clutter or context? Sampling strategies Specific object Category models 2

Sampling strategies Local feature correspondence for generic object categories Sparse, at interest points Dense, uniformly Randomly To find specific, textured objects, sparse sampling from interest points more reliable. Multiple complementary interest operators offer more image coverage. Multiple interest operators For object categorization, dense sampling offers better coverage. Image credits: F-F. Li, E. Nowak, J. Sivic [See Nowak, Jurie & Triggs, ECCV 2006] Local feature correspondence for generic object categories Comparing bags of words histograms coarsely reflects agreement between local parts (patches, words). But choice of quantization directly determines what we consider to be similar Partially matching sets of features Optimal match: O(m 3 ) Greedy match: O(m 2 log m) Pyramid match: O(m) (m=num pts) We introduce an approximate matching kernel that makes it practical to compare large sets of features based on their partial correspondences. [Previous work: Indyk & Thaper, Bartal, Charikar, Agarwal & Varadarajan, ] Pyramid match: main idea Pyramid match: main idea Feature space partitions serve to match the local descriptors within successively wider regions. descriptor space [Grauman & Darrell, ICCV 2005] Histogram intersection counts number of possible matches at a given partitioning. [Grauman & Darrell, ICCV 2005] models 3

Pyramid match kernel Pyramid match kernel Optimal match: O(m 3 ) Pyramid match: O(mL) measures difficulty of a match at level number of newly matched pairs at level For similarity, weights inversely proportional to bin size (or may be learned) Normalize these kernel values to avoid favoring large sets optimal partial matching [Grauman & Darrell, ICCV 2005] [Grauman & Darrell, ICCV 2005] Highlights of the pyramid match Linear time complexity Formal bounds on expected error Mercer kernel Data-driven partitions allow accurate matches even in high-dim. feature spaces Strong performance on benchmark object recognition datasets Example recognition results: Caltech-101 dataset 101 categories 40-800 images per class Data provided by Fei-Fei, Fergus, and Perona Recognition results: Caltech-101 dataset Example recognition results: Caltech-101 dataset Combination of pyramid match and correspondence kernels Jain, Huynh, & Grauman (2007) Accuracy [Kapoor et al. IJCV 2009] Number of training examples models 4

Unordered sets of local features: No spatial layout preserved! Spatial pyramid match Make a pyramid of bag-of-words histograms. Provides some loose (global) spatial layout information Too much? Too little? Sum over PMKs computed in image coordinate space, one per word. [Lazebnik, Schmid & Ponce, CVPR 2006] Spatial pyramid match Captures scene categories well---texture-like patterns but with some variability in the positions of all the local pieces. Spatial pyramid match Captures scene categories well---texture-like patterns but with some variability in the positions of all the local pieces. Confusion matrix Part-based and local feature models for recognition We ll look at three forms: 1. Bag of words (no geometry) 2. Implicit shape model (star graph for spatial model) 3. Constellation model (fully connected graph for spatial model) Shape representation in part-based models Star shape model x 6 x 5 e.g. implicit shape model Parts mutually independent x 1 x 4 x 2 x 3 N image features, P parts in the model models 5

Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = part ] Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = part ] training image annotated with object localization info visual codeword with displacement vectors test image B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004 B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering 2. Map the patch around each interest point to closest word Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering 2. Map the patch around each interest point to closest word 3. For each word, store all positions it was found, relative to object center Implicit shape models: Testing 1. Given new test image, extract patches, match to vocabulary words 2. Cast votes for possible positions of object center 3. Search for maxima in voting space 4. (Extract weighted segmentation mask based on stored masks for the codebook occurrences) What is the dimension of the Hough space? models 6

Implicit shape models: Testing Original image Interest Original points image Matched Interest Original points patches image Matched Interest Original Votes patches points image 41 1 st hypothesis 42 models 7

2 nd hypothesis 43 3 rd hypothesis 44 Detection Results Qualitative Performance Recognizes different kinds of objects Robust to clutter, occlusion, noise, low contrast 45 x 6 x 5 Shape representation in part-based models Star shape model e.g. implicit shape model Parts mutually independent x 1 x 4 x 2 x 3 N image features, P parts in the model Fully connected constellation model x 6 x 5 x 1 x 4 x 2 x 3 e.g. Constellation Model Parts fully connected Slide credit: Rob Fergus Probabilistic constellation model Probabilistic constellation model P( image P( appearance, shape max P( appearance h, shape h, h h h: assignment Part of features to Part parts descriptors locations P( image P( appearance, shape max P( appearance h, shape h, h h h: assignment of features to parts Part 1 Part 3 Candidate parts Part 2 Source: Lana Lazebnik Source: Lana Lazebnik models 8

Probabilistic constellation model P( image P( appearance, shape max P( appearance h, shape h, h h Example results from constellation model: data from four categories h: assignment of features to parts Part 1 Part 3 Part 2 Source: Lana Lazebnik Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/resume.htm Face model Appearance: 10 patches closest to mean for each part Face model Appearance: 10 patches closest to mean for each part Recognition results Recognition results Test images: size of circles indicates score of hypothesis Fergus et al. CVPR 2003 Fergus et al. CVPR 2003 Motorbike model Appearance: 10 patches closest to mean for each part Spotted cat model Appearance: 10 patches closest to mean for each part Recognition results Recognition results Fergus et al. CVPR 2003 Fergus et al. CVPR 2003 models 9

Comparison Shape representation in part-based models bag of features bag of features Part-based model Star shape model Fully connected constellation model x 1 x 1 x 6 x 2 x 6 x 2 x 5 x 3 x 5 x 3 x 4 x 4 e.g. implicit shape model Parts mutually independent Recognition complexity: O(NP) Method: Gen. Hough Transform e.g. Constellation Model Parts fully connected Recognition complexity: O(N P ) Method: Exhaustive search Source: Lana Lazebnik N image features, P parts in the model Slide credit: Rob Fergus Summary: part-based and local feature models for generic object recognition Histograms of visual words to capture global or local layout in the bag-of-words framework Pyramid match kernels Powerful in practice for image recognition Recognition models Part-based models encode category s part appearance together with 2d layout and allow detection within cluttered image implicit shape model : shape based on layout of all parts relative to a reference part; Generalized Hough for detection constellation model : explicitly model mutual spatial layout between all pairs of parts; exhaustive search for best fit of features to parts Instances: recognition by alignment Categories: Holistic appearance models (and sliding window detection) Categories: Local feature and part based models Coming up Video processing: motion, tracking, activity models 10