Modern Object Detection. Most slides from Ali Farhadi

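Comparison of Classifiers (assuming $x \in \{0, 1\}$)
Naïve Bayes
  Learning objective: maximize $\sum_i \left[ \sum_j \log P(x_{ij} \mid y_i; \theta_j) + \log P(y_i; \theta_0) \right]$
  Training: $\theta_{kj} = \dfrac{\sum_i \delta(x_{ij} = 1 \wedge y_i = k) + r}{\sum_i \delta(y_i = k) + Kr}$
  Inference: $\theta_1^T x + \theta_0^T (1 - x) > 0$, where $\theta_{1j} = \log \dfrac{P(x_j = 1 \mid y = 1)}{P(x_j = 1 \mid y = 0)}$ and $\theta_{0j} = \log \dfrac{P(x_j = 0 \mid y = 1)}{P(x_j = 0 \mid y = 0)}$
Logistic Regression
  Learning objective: maximize $\sum_i \log P(y_i \mid x_i, \theta) - \lambda \|\theta\|$, where $P(y_i \mid x_i, \theta) = 1 / (1 + \exp(-y_i \theta^T x_i))$
  Training: gradient ascent
  Inference: $\theta^T x > t$
Linear SVM
  Learning objective: minimize $\lambda \sum_i \xi_i + \frac{1}{2} \|\theta\|^2$ such that $y_i \theta^T x_i \geq 1 - \xi_i$ for all $i$, $\xi_i \geq 0$
  Training: quadratic programming or subgradient optimization
  Inference: $\theta^T x > t$
Kernelized SVM
  Learning objective: complicated to write
  Training: quadratic programming
  Inference: $\sum_i y_i \alpha_i K(\hat{x}_i, x) > 0$
Nearest Neighbor
  Learning objective: most similar features -> same label
  Training: record data
  Inference: $y_i$ where $i = \arg\min_i K(\hat{x}_i, x)$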
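To make the Naïve Bayes row concrete, here is a minimal Python sketch of the smoothed count estimate and the log-ratio inference rule for binary features. It assumes two classes labeled 0 and 1, equal class priors (the inference rule above has no prior term), and K = 2. The function names `train_naive_bayes` and `nb_score` are illustrative and do not come from the slides.

```python
import math


def train_naive_bayes(X, y, K=2, r=1.0):
    """Estimate theta[k][j] ~ P(x_j = 1 | y = k) from binary data.

    Uses the smoothed count formula (count + r) / (class_count + K * r).
    K = 2 and r = 1.0 are assumed defaults, not values from the slides.
    """
    n_features = len(X[0])
    theta = []
    for k in (0, 1):
        rows = [x for x, label in zip(X, y) if label == k]
        theta.append([
            (sum(row[j] for row in rows) + r) / (len(rows) + K * r)
            for j in range(n_features)
        ])
    return theta


def nb_score(theta, x):
    """Return theta_1^T x + theta_0^T (1 - x); positive means class 1."""
    score = 0.0
    for j, xj in enumerate(x):
        # Log-ratio weight used when feature j is on.
        w_on = math.log(theta[1][j] / theta[0][j])
        # Log-ratio weight used when feature j is off.
        w_off = math.log((1 - theta[1][j]) / (1 - theta[0][j]))
        score += w_on * xj + w_off * (1 - xj)
    return score
```

A sample is assigned to class 1 when `nb_score(theta, x) > 0`. The smoothing term keeps every estimate strictly between 0 and 1, so the logarithms are always defined.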

Image Categorization
Training: Training Images -> Image Features (with Training Labels) -> Classifier Training -> Trained Classifier
Testing: Test Image -> Image Features -> Trained Classifier -> Prediction (e.g., Outdoor)

Example: Dalal-Triggs pedestrian detector
1. Extract a fixed-size (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of oriented gradients) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
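Step 4 can be sketched as follows. This is the greedy, overlap-based form of non-maxima suppression common in later detectors. The slides do not say which variant was used, so the intersection-over-union criterion, the 0.5 threshold, and the `(x1, y1, x2, y2)` box format are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap an already kept,
    higher-scoring box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping 64x128 windows around the same person collapse to the higher-scoring one, while a distant window survives.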

Slides by Pete Barnum

Color spaces tested: RGB, LAB, and grayscale. RGB and LAB give slightly better performance than grayscale.

Gradient computation: the centered mask outperforms the uncentered, cubic-corrected, diagonal, and Sobel masks.

Histogram of gradient orientations
Orientation: 9 bins (for unsigned angles)
Histograms in 8x8 pixel cells
Votes weighted by gradient magnitude
Bilinear interpolation between cells
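A minimal sketch of the per-cell histogram, assuming a grayscale patch stored as a list of rows with a one-pixel border so that centered differences are defined for every pixel of the cell. For brevity it uses hard binning and leaves out the bilinear interpolation between cells. The function name `cell_histogram` is illustrative.

```python
import math


def cell_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    patch: 2D list of intensities, one pixel larger than the cell on
    every side (e.g. 10x10 for an 8x8 cell).
    """
    height, width = len(patch), len(patch[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Centered [-1, 0, 1] differences in x and y.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned angle in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against rounding up to 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist
```

With the default 9 bins each bin spans 20 degrees, so a purely horizontal intensity ramp votes into bin 0 and a purely vertical one into bin 4.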

Normalize with respect to surrounding cells
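One way to sketch this normalization: concatenate the histograms of a block of neighboring cells and divide by the block's L2 norm. Plain L2 normalization and the `eps` value are assumptions here, since several normalization schemes are possible and the slide does not name one.

```python
import math


def normalize_block(cell_hists, eps=1e-6):
    """L2-normalize the concatenated histograms of one block of cells.

    cell_hists: list of per-cell histograms (e.g. 4 histograms of 9 bins
    for a 2x2 block). eps avoids division by zero in flat regions.
    """
    vector = [value for hist in cell_hists for value in hist]
    norm = math.sqrt(sum(value * value for value in vector) + eps * eps)
    return [value / norm for value in vector]
```

Because blocks overlap, each cell is normalized several times against different neighborhoods, and every normalized copy goes into the final descriptor.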

# features = (# cells) x (# orientations) x (# normalizations by neighboring cells) = (15 x 7) x 9 x 4 = 3780
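The count above can be checked with a few lines of arithmetic. The sketch assumes the usual layout for a 64x128 window: 8x8 pixel cells, 2x2-cell blocks, and a block stride of one cell, which yields the 7 x 15 block positions behind the "15 x 7" factor. The parameter names are illustrative.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8, block=2,
                          stride_cells=1, n_bins=9):
    """Length of the HOG vector for one detection window."""
    cells_x, cells_y = win_w // cell, win_h // cell      # 8 x 16 cells
    blocks_x = (cells_x - block) // stride_cells + 1     # 7 positions
    blocks_y = (cells_y - block) // stride_cells + 1     # 15 positions
    # Each block holds block * block cell histograms of n_bins values.
    return blocks_x * blocks_y * block * block * n_bins


print(hog_descriptor_length())  # 3780
```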

Training set

SVM: learned weights w, shown as positive weights (pos w) and negative weights (neg w)
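For reference, a linear SVM can be trained by subgradient descent on the regularized hinge loss. The sketch below is a toy batch version in plain Python. The slides do not state which solver was used, so the objective scaling, learning rate, epoch count, and the name `train_linear_svm` are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(hinge loss) by batch subgradient descent.

    X: list of feature vectors, y: labels in {-1, +1}.
    Returns the weight vector w and bias b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

After training, positive entries of `w` mark features that vote for the positive class and negative entries mark features that vote against it, which is what the pos w and neg w visualizations display.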

pedestrian

Detection examples

Each window is separately classified
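Classifying each window separately amounts to a loop over window positions with a linear score per window. The sketch below covers a single scale only. The stride of 8 pixels, the zero threshold, and the `featurize` callback (standing in for HOG extraction) are assumptions for illustration.

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8):
    """Yield (x1, y1, x2, y2) for every window that fits in the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def detect(img_w, img_h, featurize, w, b, threshold=0.0):
    """Score each window independently with a linear classifier.

    featurize: hypothetical callback mapping a box to a feature vector.
    Returns a list of (box, score) pairs whose score exceeds threshold.
    """
    detections = []
    for box in sliding_windows(img_w, img_h):
        features = featurize(box)
        score = sum(wj * fj for wj, fj in zip(w, features)) + b
        if score > threshold:
            detections.append((box, score))
    return detections
```

Handling multiple scales would repeat the same loop on resized copies of the image, and the resulting detections would then go through non-maxima suppression.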

What about this one? Can the model we trained for pedestrians detect the person in this image?

Specifying an object model: Statistical Template in Bounding Box
Object is some (x, y, w, h) in the image
Features are defined with respect to bounding box coordinates
Figure: image and template visualization. Images from Felzenszwalb

When do statistical templates make sense? Caltech 101 Average Object Images

Deformable objects. Images from Caltech-256. Slide credit: Duan Tran

Deformable objects. Images from D. Ramanan's dataset. Slide credit: Duan Tran

Parts-based Models
Define objects by a collection of parts modeled by
1. Appearance
2. Spatial configuration
Slide credit: Rob Fergus

Explicit Models: hybrid template/parts model
Figure: detections and template visualization. Felzenszwalb et al. 2008

How to model spatial relations?
Matching complexities shown: $O(N^6)$, $O(N^2)$, $O(N^3)$, $O(N^2)$
Models cited: Fergus et al. 03; Fei-Fei et al. 03; Leibe et al. 04, 08; Crandall et al. 05; Fergus et al. 05; Crandall et al. 05; Felzenszwalb & Huttenlocher 05; Csurka 04; Vasconcelos 00; Bouchard & Triggs 05; Carneiro & Lowe 06; many others...
Figure from [Carneiro & Lowe, ECCV 06]

Tree-shaped model
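With a tree-shaped model, the best joint placement of all parts can be found exactly by dynamic programming from the leaves to the root. The sketch below does the pairwise minimization by brute force, which costs O(N^2) per edge for N candidate locations. It assumes locations are plain integer indices, costs are to be minimized, and the names `match_tree`, `unary`, and `pair_cost` are illustrative.

```python
def match_tree(children, unary, pair_cost, root=0):
    """Minimum-cost placement of parts arranged in a tree.

    children: dict mapping a part to the list of its child parts.
    unary[p][l]: appearance cost of placing part p at location l.
    pair_cost(p, c, lp, lc): deformation cost between parent p at lp
        and child c at lc.
    Returns (best_cost, {part: location}).
    """
    n_loc = len(unary[root])
    best_child_loc = {}

    def upward(p):
        # Cost of the subtree rooted at p, for every location of p.
        cost = list(unary[p])
        for c in children.get(p, []):
            child_cost = upward(c)
            argmins = []
            for lp in range(n_loc):
                options = [child_cost[lc] + pair_cost(p, c, lp, lc)
                           for lc in range(n_loc)]
                lc_best = min(range(n_loc), key=options.__getitem__)
                cost[lp] += options[lc_best]
                argmins.append(lc_best)
            best_child_loc[c] = argmins
        return cost

    root_cost = upward(root)
    l_root = min(range(n_loc), key=root_cost.__getitem__)
    placement = {root: l_root}

    def downward(p):
        # Read back each child's best location given its parent's.
        for c in children.get(p, []):
            placement[c] = best_child_loc[c][placement[p]]
            downward(c)

    downward(root)
    return root_cost[l_root], placement
```

The same recursion does not apply directly to models with loops, such as a fully connected constellation, which is one reason their matching cost is so much higher.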

Pictorial Structures Model
Part = oriented rectangle
Spatial model = relative size/orientation
Felzenszwalb and Huttenlocher 2005

Pictorial Structures Model: appearance likelihood and geometry likelihood
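For orientation, the two likelihood terms usually appear as the two products in the pictorial structures posterior. The notation below follows Felzenszwalb and Huttenlocher (2005) and is an assumption about what the slide displayed: $l_i$ is the location of part $v_i$, $u_i$ its appearance parameters, $E$ the set of connected part pairs, and $c_{ij}$ the parameters of the connection between parts $v_i$ and $v_j$.

```latex
P(L \mid I, \theta) \;\propto\;
\underbrace{\prod_{i=1}^{n} p(I \mid l_i, u_i)}_{\text{appearance likelihood}}
\;\cdot\;
\underbrace{\prod_{(v_i, v_j) \in E} p(l_i, l_j \mid c_{ij})}_{\text{geometry likelihood}}
```

Taking the negative logarithm turns the products into a sum of per-part and per-edge costs, which is the form that dynamic programming on a tree minimizes.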

Part representation Background subtraction

Pictorial Structures

Results for person matching

Results for person matching

Enhanced pictorial structures BMVC 2009

Deformable Latent Parts Model
Useful parts discovered during training
Figure: detections and template visualization. Felzenszwalb et al. 2008

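$\text{Score} = F_0 \cdot \phi(p_0, H) + \sum_i F_i \cdot \phi(p_i, H) - \sum_i d_i \cdot \phi_d(x_i, y_i)$
Here $F_0$ is the root filter, $F_i$ are the part filters, $\phi(p, H)$ is the feature vector at placement $p$ in the feature pyramid $H$, and $d_i \cdot \phi_d(x_i, y_i)$ is the deformation cost of part $i$ displaced by $(x_i, y_i)$ from its anchor.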
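The score of one hypothesis can be sketched as follows, assuming the filter responses have already been computed and that the deformation features are the quadratic form (dx, dy, dx^2, dy^2) used by Felzenszwalb et al. The function and argument names are illustrative.

```python
def deformation_features(dx, dy):
    """Quadratic deformation features for a part displaced by (dx, dy)."""
    return (dx, dy, dx * dx, dy * dy)


def dpm_score(root_response, part_responses, displacements,
              deformation_weights):
    """Root response plus part responses minus deformation costs.

    root_response: F_0 . phi(p_0, H), a precomputed number.
    part_responses: list of F_i . phi(p_i, H), one per part.
    displacements: list of (dx, dy) offsets from each part's anchor.
    deformation_weights: list of 4-tuples d_i, one per part.
    """
    score = root_response
    for response, (dx, dy), d in zip(part_responses, displacements,
                                     deformation_weights):
        penalty = sum(di * fi
                      for di, fi in zip(d, deformation_features(dx, dy)))
        score += response - penalty
    return score
```

A part sitting exactly at its anchor pays no penalty. Moving it away trades a possibly higher filter response against a growing deformation cost.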

State-of-the-art Detector: Deformable Parts Model (DPM)
Lifetime Achievement
1. Strong low-level features based on HOG
2. Efficient matching algorithms for deformable part-based models (pictorial structures)
3. Discriminative learning with latent variables (latent SVM)
Felzenszwalb et al., 2008, 2010, 2011, 2012

Car

Cat

Person riding horse

Person riding bicycle

Structure Recognition using Visual Phrases, CVPR 2011