Category vs. instance recognition

Similar documents
Object Category Detection: Sliding Windows

Object Detection Design challenges

Object Category Detection. Slides mostly from Derek Hoiem


Recap Image Classification with Bags of Local Features

Modern Object Detection. Most slides from Ali Farhadi

Object Detection. Computer Vision Yuliang Zou, Virginia Tech. Many slides from D. Hoiem, J. Hays, J. Johnson, R. Girshick

Deformable Part Models

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

2D Image Processing Feature Descriptors

Object Recognition II

Beyond Bags of features Spatial information & Shape models

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image

Detection III: Analyzing and Debugging Detection Methods

Category-level localization

Histograms of Oriented Gradients for Human Detection p. 1/1

Histogram of Oriented Gradients for Human Detection

Object Category Detection: Sliding Windows

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Find that! Visual Object Detection Primer

Window based detectors

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

An Implementation on Histogram of Oriented Gradients for Human Detection

HISTOGRAMS OF ORIENTATIO N GRADIENTS

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Development in Object Detection. Junyuan Lin May 4th

Templates and Background Subtraction. Prof. D. Stricker Doz. G. Bleser

Local Feature Detectors

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab

Local Image Features

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Object Detection. Sanja Fidler CSC420: Intro to Image Understanding 1/ 1

Local Features: Detection, Description & Matching

Local Patch Descriptors

Part based models for recognition. Kristen Grauman

Feature Descriptors. CS 510 Lecture #21 April 29 th, 2013

Pedestrian Detection and Tracking in Images and Videos

Object recognition. Methods for classification and image representation

Classification and Detection in Images. D.A. Forsyth

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Local Image Features

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Object Detection with Discriminatively Trained Part Based Models

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

Local Image Features

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Multiple-Person Tracking by Detection

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Histogram of Oriented Gradients (HOG) for Object Detection

Edge and corner detection

AK Computer Vision Feature Point Detectors and Descriptors

Local features: detection and description. Local invariant features

Computer Science Faculty, Bandar Lampung University, Bandar Lampung, Indonesia

High Level Computer Vision. Sliding Window Detection: Viola-Jones-Detector & Histogram of Oriented Gradients (HOG)

Feature Matching and Robust Fitting

CS5670: Computer Vision

Patch Descriptors. CSE 455 Linda Shapiro

CS4670: Computer Vision

Face Detection and Alignment. Prof. Xin Yang HUST

Discriminative classifiers for image recognition

Local features and image matching. Prof. Xin Yang HUST

Bias-Variance Trade-off (cont d) + Image Representations

Local Features and Bag of Words Models

Image Features. Work on project 1. All is Vanity, by C. Allan Gilbert,

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Local features: detection and description May 12 th, 2015

Human detection based on Sliding Window Approach

Human detections using Beagle board-xm

Object recognition (part 1)

SCALE INVARIANT FEATURE TRANSFORM (SIFT)

Detecting Object Instances Without Discriminative Features

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection

Part-based and local feature models for generic object recognition

International Journal Of Global Innovations -Vol.4, Issue.I Paper Id: SP-V4-I1-P17 ISSN Online:

Object Recognition with Invariant Features

Fitting: The Hough transform

HOG-based Pedestriant Detector Training

Car Detecting Method using high Resolution images

Feature descriptors and matching

Computer Vision for HCI. Topics of This Lecture

Templates, Image Pyramids, and Filter Banks

A Discriminatively Trained, Multiscale, Deformable Part Model

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

PEOPLE IN SEATS COUNTING VIA SEAT DETECTION FOR MEETING SURVEILLANCE

Human Motion Detection and Tracking for Video Surveillance

Histograms of Oriented Gradients

10/03/11. Model Fitting. Computer Vision CS 143, Brown. James Hays. Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem

Motion illusion, rotating snakes

Patch Descriptors. EE/CSE 576 Linda Shapiro

Skin and Face Detection

Selective Search for Object Recognition

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Deformable Part Models

Classification of objects from Video Data (Group 30)

The SIFT (Scale Invariant Feature

The bits the whirl-wind left out..

Finding 2D Shapes and the Hough Transform

TA Section 7 Problem Set 3. SIFT (Lowe 2004) Shape Context (Belongie et al. 2002) Voxel Coloring (Seitz and Dyer 1999)

Transcription:

Category vs. instance recognition Category: Find all the people Find all the buildings Often within a single image Often sliding window Instance: Is this face James? Find this specific famous building Often within a database of images

Object detection vs. Scene Recognition Scenes can be defined by distribution of stuff materials and surfaces with arbitrary shape. Objects are things that own their boundaries Bag of words models are less popular for object detection because they throw away shape info. James Hays

Object Category Detection Focus on object search: Where is it? Build templates that quickly differentiate object patch from background patch Object or Non-Object? James Hays

Challenges in modeling the object class Illumination Object pose Clutter Occlusions Intra-class appearance Viewpoint Slide from K. Grauman, B. Leibe

Challenges in modeling the non-object class True Detections Bad Localization Confused with Similar Object Misc. Background Confused with Dissimilar Objects James Hays

Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales. Feature design and scoring How should appearance be modeled? What features correspond to the object? How to deal with different viewpoints? Often train different models for a few different viewpoints

General Process of Object Recognition Specify Object Model What are the object parameters? Generate Hypotheses Score Hypotheses Resolve Detections James Hays

Specifying an object model 1. Statistical Template in Bounding Box Object is some (x,y,w,h) in image Features defined wrt bounding box coordinates Image Template Visualization Images from Felzenszwalb

Specifying an object model 2. Articulated parts model Object is configuration of parts Each part is detectable Images from Felzenszwalb

Specifying an object model 3. Hybrid template/parts model Detections Template Visualization Felzenszwalb et al. 2008

Specifying an object model 4. 3D-ish model Object is collection of 3D planar patches under affine transformation

Specifying an object model Loper et al. 2015 5. Deformable 3D model Object is a parameterized space of shape/pose/deformation of class of 3D object

Why not just pick the most complex model? Inference is harder More parameters Harder to fit (infer / optimize fit) Longer computation

General Process of Object Recognition Specify Object Model Generate Hypotheses Propose an alignment of the model to the image Score Hypotheses Resolve Detections James Hays

Generating hypotheses 1. Sliding window Test patch at each location and scale James Hays

Generating hypotheses 1. Sliding window Test patch at each location and scale Note Template did not change size

Each window is separately classified

Generating hypotheses 2. Voting from patches/keypoints Interest Points Matched Codebook Entries Probabilistic Voting 3D Voting Space (continuous) Implicit Shape Model by Leibe et al.

Generating hypotheses 3. Region-based proposal Endres Hoiem 2010

General Process of Object Recognition Specify Object Model Generate Hypotheses Score Hypotheses Mainly-gradient based features, usually based on summary representation, many classifiers Resolve Detections

General Process of Object Recognition Specify Object Model Generate Hypotheses Score Hypotheses Resolve Detections Rescore each proposed object based on whole set

Resolving detection scores 1. Non-max suppression Score = 0.8 Score = 0.8 Score = 0.1 James Hays

Resolving detection scores 1. Non-max suppression Score = 0.8 Score = 0.8 Score = 0.1 Score = 0.1 Overlap score is below some threshold

meters Resolving detection scores 2. Context/reasoning Hoiem et al. 2006 meters

Dalal Triggs: Person detection with HOG & linear SVM Histograms of Oriented Gradients for Human Detection, Navneet Dalal, Bill Triggs, International Conference on Computer Vision & Pattern Recognition - June 2005 http://lear.inrialpes.fr/pubs/2005/dt05/

Statistical Template Object model = sum of scores of features at fixed positions +3 +2-2 -1-2.5 = -0.5? > 7.5 Non-object +4 +1 +0.5 +3 +0.5 = 10.5? > 7.5 Object

Example: Dalal-Triggs pedestrian detector 1. Extract fixed-sized (64x128 pixel) window at each position and scale 2. Compute HOG (histogram of gradient) features within each window 3. Score the window with a linear SVM classifier 4. Perform non-maxima suppression to remove overlapping detections with lower scores Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Tested with RGB LAB Grayscale Gamma Normalization and Compression Square root Log Slightly better performance vs. grayscale Very slightly better performance vs. no adjustment

Outperforms centered diagonal uncentered cubic-corrected Sobel Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Histogram of Oriented Gradients Orientation: 9 bins (for unsigned angles 0-180) Histograms in k x k pixel cells Votes weighted by magnitude Bilinear interpolation between cells Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Normalize with respect to surrounding cells Rectangular HOG (R-HOG) How to normalize? - Concatenate all cell responses from block into vector. - Normalize vector. - Extract responses from cell of interest. Slides by Pete Barnum e is a small constant (for empty bins) Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Normalize with respect to surrounding cells Rectangular HOG (R-HOG) Circular HOG also exist, but trickier implementation Slides by Pete Barnum e is a small constant (for empty bins) Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

# orientations X= # features = 15 x 7 x 9 x 4 = 3780 # cells # normalizations by neighboring cells Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

pos w neg w Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

pedestrian Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Pedestrian detection with HOG Learn a pedestrian template using a support vector machine At test time, compare feature map with template over sliding windows. Find local maxima of response Multi-scale: repeat over multiple levels of a HOG pyramid HOG feature map Template Detector response map Can be continuous for more sophisticated maxima finding N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

INRIA pedestrian database

INRIA pedestrian database issues

How good is HOG at person detection? Miss rate = 1 - recall

Something to think about Sliding window detectors work very well for faces fairly well for cars and pedestrians badly for cats and dogs Why are some classes easier than others?

Strengths/Weaknesses of Statistical Template Approach Strengths Works very well for non-deformable objects with canonical orientations: faces, cars, pedestrians Fast detection Weaknesses Not so well for highly deformable objects or stuff Not robust to occlusion Requires lots of training data

Tricks of the trade Details in feature computation really matter E.g., normalization in Dalal-Triggs improves detection rate by 27% at fixed false positive rate Template size Typical choice is size of smallest expected detectable object Jittering or augmenting to create synthetic positive examples Create slightly rotated, translated, scaled, mirrored versions as extra positive examples. Bootstrapping to get hard negative examples 1. Randomly sample negative examples 2. Train detector 3. Sample negative examples that score > -1 4. Repeat until all high-scoring negative examples fit in memory