Modern Object Detection Most slides from Ali Farhadi
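Comparison of Classifiers (assuming x in {0, 1})

Naïve Bayes
  Learning objective: maximize $\sum_i \left[ \sum_j \log P(x_{ij} \mid y_i; \theta_j) + \log P(y_i; \theta_0) \right]$
  Training: $\theta_{kj} = \dfrac{\sum_i \delta(x_{ij} = 1 \wedge y_i = k) + r}{\sum_i \delta(y_i = k) + Kr}$
  Inference: $\theta_1^T x + \theta_0^T (1 - x) > 0$, where $\theta_{1j} = \log \dfrac{P(x_j = 1 \mid y = 1)}{P(x_j = 1 \mid y = 0)}$ and $\theta_{0j} = \log \dfrac{P(x_j = 0 \mid y = 1)}{P(x_j = 0 \mid y = 0)}$

Logistic Regression
  Learning objective: maximize $\sum_i \log P(y_i \mid x_i, \theta) - \lambda \lVert \theta \rVert$, where $P(y_i \mid x_i, \theta) = 1 / \left(1 + \exp(-y_i \theta^T x_i)\right)$
  Training: gradient ascent
  Inference: $\theta^T x > t$

Linear SVM
  Learning objective: minimize $\lambda \sum_i \xi_i + \frac{1}{2} \lVert \theta \rVert$ such that $y_i \theta^T x_i \ge 1 - \xi_i \;\; \forall i$, $\xi_i \ge 0$
  Training: quadratic programming or subgradient optimization
  Inference: $\theta^T x > t$

Kernelized SVM
  Learning objective: complicated to write
  Training: quadratic programming
  Inference: $\sum_i y_i \alpha_i K(\hat{x}_i, x) > 0$

Nearest Neighbor
  Learning objective: most similar features, same label
  Training: record data
  Inference: $y_i$ where $i = \arg\min_i K(\hat{x}_i, x)$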
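To make the Naïve Bayes row concrete, here is a minimal sketch of the smoothed count estimate and the linear decision rule for binary features. The function names, the default smoothing value r = 1, and K = 2 (two possible feature values) are illustrative assumptions. Like the inference rule on the slide, the score has no class-prior term, so equal class priors are assumed.

```python
import math

def train_naive_bayes(X, y, r=1.0, K=2):
    """Estimate theta[k][j] = P(x_j = 1 | y = k) from binary data.

    Uses the smoothed count ratio (count + r) / (class_count + K * r).
    r and K are assumed values, not taken from the slides.
    """
    n_features = len(X[0])
    theta = {}
    for k in (0, 1):
        rows = [x for x, label in zip(X, y) if label == k]
        theta[k] = [
            (sum(x[j] for x in rows) + r) / (len(rows) + K * r)
            for j in range(n_features)
        ]
    return theta

def naive_bayes_score(theta, x):
    """Linear score theta_1^T x + theta_0^T (1 - x); positive means class 1."""
    score = 0.0
    for j, xj in enumerate(x):
        w1 = math.log(theta[1][j] / theta[0][j])              # weight used when x_j = 1
        w0 = math.log((1 - theta[1][j]) / (1 - theta[0][j]))  # weight used when x_j = 0
        score += w1 * xj + w0 * (1 - xj)
    return score
```

Once the log-ratios are computed, inference is a linear function of x, which is why Naïve Bayes sits in the same table as logistic regression and the linear SVM.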
Image Categorization
Training: Training Images -> Image Features (+ Training Labels) -> Classifier Training -> Trained Classifier
Testing: Test Image -> Image Features -> Trained Classifier -> Prediction (e.g., "Outdoor")
Example: Dalal-Triggs pedestrian detector
1. Extract a fixed-size (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of oriented gradients) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
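Step 4 can be sketched as greedy non-maxima suppression. The slide does not specify the overlap criterion, so this example assumes intersection-over-union (IoU) with a threshold of 0.5, a common choice that is not necessarily the one used in the paper. Boxes are (x1, y1, x2, y2) tuples and all names are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one.

    detections: list of (box, score) pairs.
    iou_threshold: assumed value; tune per application.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

For example, two heavily overlapping 64x128 windows with scores 0.9 and 0.8 collapse to the 0.9 detection, while a distant window is kept regardless of its score.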
Slides by Pete Barnum
Color spaces tested: RGB, LAB, and grayscale. RGB and LAB give slightly better performance than grayscale.
Gradient computation: the simple centered mask [-1, 0, 1] outperforms the uncentered, cubic-corrected, diagonal, and Sobel masks.
Histogram of gradient orientations
- Orientation: 9 bins (for unsigned angles)
- Histograms in 8x8 pixel cells
- Votes weighted by gradient magnitude
- Bilinear interpolation between cells
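A minimal sketch of the histogram for a single cell, assuming a grayscale cell given as a 2D list and centered differences clamped at the cell border. For brevity it uses hard bin assignment and leaves out the bilinear interpolation between cells mentioned above, so it simplifies the descriptor and is not the paper's exact procedure.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(h):
        for c in range(w):
            # Centered [-1, 0, 1] differences; border pixels reuse the edge value.
            gx = cell[r][min(c + 1, w - 1)] - cell[r][max(c - 1, 0)]
            gy = cell[min(r + 1, h - 1)][c] - cell[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: angles that differ by 180 degrees share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle // bin_width), n_bins - 1)
            hist[b] += magnitude  # vote weighted by gradient magnitude
    return hist
```

A cell containing a single vertical edge puts all of its gradient energy into the first bin, since the gradient points horizontally (0 degrees).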
Normalize with respect to surrounding cells
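The slide does not say which norm is used. As a rough sketch, assume each cell histogram is concatenated with those of its neighbors into one block vector, which is then L2-normalized with a small epsilon to avoid division by zero. The function name and epsilon value are illustrative.

```python
import math

def l2_normalize(block, eps=1e-6):
    """Scale a block vector (concatenated cell histograms) to roughly unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]
```

Normalizing over a local neighborhood makes the descriptor less sensitive to local changes in illumination and contrast.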
# features = 15 x 7 (# cells) x 9 (# orientations) x 4 (# normalizations by neighboring cells) = 3780
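The arithmetic can be checked with a short sketch. One reading that reproduces the 15 x 7 factor, assumed here, is that 2x2-cell blocks slide by one cell over the 8 x 16 grid of 8x8-pixel cells in a 64x128 window, giving 7 x 15 block positions. The parameter names are illustrative.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8, block_cells=2, n_bins=9):
    """Descriptor length for overlapping blocks that slide by one cell (assumed layout)."""
    cells_x, cells_y = win_w // cell, win_h // cell   # 8 x 16 cells
    blocks_x = cells_x - block_cells + 1              # 7 block positions across
    blocks_y = cells_y - block_cells + 1              # 15 block positions down
    cells_per_block = block_cells * block_cells       # 4 normalizations per cell
    return blocks_x * blocks_y * cells_per_block * n_bins
```

With the default arguments this evaluates to 7 * 15 * 4 * 9 = 3780, matching the slide.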
Training set
SVM: learned weights w, shown separately as positive weights (pos w) and negative weights (neg w)
pedestrian
Detection examples
Each window is separately classified
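A sketch of how the candidate windows might be enumerated before each one is scored. The stride of 8 pixels and the scale step of 1.2 are assumptions, not values from the slides. For simplicity the window is enlarged in the original image coordinates rather than building an image pyramid.

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Enumerate (x, y, w, h) windows over all positions and scales that fit the image."""
    windows = []
    scale = 1.0
    while win_w * scale <= img_w and win_h * scale <= img_h:
        w, h = int(round(win_w * scale)), int(round(win_h * scale))
        step = max(1, int(round(stride * scale)))  # stride grows with the window
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                windows.append((x, y, w, h))
        scale *= scale_step
    return windows
```

Each returned window would be resized to 64x128, converted to a HOG descriptor, and scored independently by the linear SVM.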
What about this one? Can the model we trained for pedestrians detect the person in this image?
Specifying an object model
Statistical Template in Bounding Box
- Object is some (x, y, w, h) in image
- Features defined with respect to bounding box coordinates
[Figure: Image, Template Visualization] Images from Felzenszwalb
When do statistical templates make sense? Caltech 101 Average Object Images
Deformable objects Images from Caltech-256 Slide Credit: Duan Tran
Deformable objects Images from D. Ramanan's dataset Slide Credit: Duan Tran
Parts-based Models
Define objects by a collection of parts, modeled by
1. Appearance
2. Spatial configuration
Slide credit: Rob Fergus
Explicit Models Hybrid template/parts model Detections Template Visualization Felzenszwalb et al. 2008
How to model spatial relations?
- O(N^6): Fergus et al. 03; Fei-Fei et al. 03
- O(N^2): Leibe et al. 04, 08; Crandall et al. 05; Fergus et al. 05
- O(N^3): Crandall et al. 05
- O(N^2): Felzenszwalb & Huttenlocher 05
- Csurka 04; Vasconcelos 00
- Bouchard & Triggs 05
- Carneiro & Lowe 06
- Many others...
Figure from [Carneiro & Lowe, ECCV 06]
Tree-shaped model
Pictorial Structures Model
- Part = oriented rectangle
- Spatial model = relative size/orientation
Felzenszwalb and Huttenlocher 2005
Pictorial Structures Model Appearance likelihood Geometry likelihood
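With a tree-shaped model, the best configuration under appearance plus geometry scores can be found by dynamic programming from the leaves to the root. The sketch below is the plain version that costs O(N^2) per edge for N candidate locations per part. The data layout (a `children` dict, a `unary` table of log-appearance scores, and a `pairwise` function for log-geometry scores) is an assumption made for illustration.

```python
def best_tree_score(children, unary, pairwise, root=0):
    """Maximum total score over all part placements in a tree-structured model.

    children: dict mapping a part to the list of its child parts.
    unary[p][l]: appearance score of part p at candidate location l.
    pairwise(p, c, lp, lc): geometry score of child c at lc given parent p at lp.
    """
    def subtree_scores(p):
        # totals[lp] = best score of the subtree rooted at p with p placed at lp
        totals = list(unary[p])
        for c in children.get(p, []):
            child = subtree_scores(c)
            for lp in range(len(totals)):
                totals[lp] += max(
                    child[lc] + pairwise(p, c, lp, lc) for lc in range(len(child))
                )
        return totals

    return max(subtree_scores(root))
```

Because each child is maximized out independently given its parent's location, the cost grows linearly with the number of parts rather than exponentially, which is the practical appeal of tree-shaped spatial models.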
Part representation Background subtraction
Pictorial Structures
Results for person matching
Results for person matching
Enhanced pictorial structures BMVC 2009
Deformable Latent Parts Model Useful parts discovered during training Detections Template Visualization Felzenszwalb et al. 2008
$\text{Score} = F_0 \cdot \phi(p_0, H) + \sum_i F_i \cdot \phi(p_i, H) - \sum_i d_i \cdot \phi_d(x, y)$
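A sketch of evaluating this score for one hypothesized placement, assuming the filter responses F . phi have already been computed and that the deformation features are (dx, dy, dx^2, dy^2) for each part's displacement from its anchor position. The function and argument names are illustrative.

```python
def dpm_score(root_response, part_responses, deformation_weights, displacements):
    """Root filter response plus part responses minus deformation costs.

    root_response: F_0 . phi(p_0, H), precomputed.
    part_responses: list of F_i . phi(p_i, H), precomputed.
    deformation_weights: list of 4-tuples d_i.
    displacements: list of (dx, dy) offsets of each part from its anchor (assumed).
    """
    score = root_response
    for response, d, (dx, dy) in zip(part_responses, deformation_weights, displacements):
        phi_d = (dx, dy, dx * dx, dy * dy)  # assumed deformation features
        deformation_cost = sum(di * fi for di, fi in zip(d, phi_d))
        score += response - deformation_cost
    return score
```

A part sitting exactly at its anchor contributes its full filter response, while a displaced part is penalized in proportion to its deformation weights.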
State-of-the-art Detector: Deformable Parts Model (DPM)
Lifetime Achievement
1. Strong low-level features based on HOG
2. Efficient matching algorithms for deformable part-based models (pictorial structures)
3. Discriminative learning with latent variables (latent SVM)
Felzenszwalb et al., 2008, 2010, 2011, 2012
Car
Cat
Person riding horse
Person riding bicycle
Structure Recognition using Visual Phrases, CVPR 2011