Histograms of Oriented Gradients for Human Detection p. 1/1 Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs INRIA Rhône-Alpes Grenoble, France Funding: acemedia, LAVA, Pascal Network
Histograms of Oriented Gradients for Human Detection p. 2/1 Introduction Detect & localize upright people in static images Challenges Wide variety of articulated poses Variable appearance/clothing Complex backgrounds Unconstrained illumination Occlusions, different scales Applications Pedestrian detection for smart cars Film & media analysis Visual surveillance
Histograms of Oriented Gradients for Human Detection p. 3/1 Approach & Data Set We focus on building robust feature sets Classifier is just linear SVM on normalized image windows, is reliable & fast Moving window based detector with non-maximum suppression over scale-space Data set available http://pascal.inrialpes.fr/data/human/ Data Set Train Test 614 positive images 288 positive images 1218 negative images 453 negative images 1208 positive windows 566 positive windows Overall 1774 human annotations + reflections
Histograms of Oriented Gradients for Human Detection p. 4/1 Feature Sets Haar Wavelets + SVM: Papageorgiou & Poggio (2000), Mohan et al (2001), DePoortere et al (2002) Rectangular differential features + adaboost: Viola & Jones (2001) Parts based binary orientation position histograms + adaboost: Mikolajczyk et al (2004) Edge templates + nearest neighbor: Gavrila & Philomen (1999) Dynamic programming: Felzenszwalb & Huttenlocher (2000), Ioffe & Forsyth (1999) Orientation histograms: c.f. Freeman et al (1996), Lowe (1999) Other descriptors: - Shape contexts: Belongie et al (2002) - PCA-SIFT: Ke and Sukthankar (2004)
Histograms of Oriented Gradients for Human Detection p. 5/1 Processing Chain Orientation Voting Input Image Gradient Image Overlapping Blocks Local Normalization Input image Normalize gamma & colour Compute gradients Weighted vote into spatial & orientation cells Contrast normalize over overlapping spatial blocks Collect HOG s over detection window Linear SVM Person / non person classification
Histograms of Oriented Gradients for Human Detection p. 6/1 HOG Descriptors Parameters Gradient scale Orientation bins Percentage of block overlap R-HOG/SIFT Cell Schemes RGB or Lab, color/gray-space Block normalization, L2-norm, v v/ v 2 2 + ɛ2 or L1-norm, v v/( v 1 + ɛ) C-HOG Center Bin Block Block Radial Bins, Angular Bins
Histograms of Oriented Gradients for Human Detection p. 7/1 Performance miss rate 0.2 0.1 0.05 0.02 0.01 MIT pedestrian database DET different descriptors on MIT database Lin. R HOG Lin. C HOG Lin. EC HOG Wavelet PCA SIFT Lin. G ShaceC Lin. E ShaceC MIT best (part) MIT baseline 10 6 10 5 10 4 10 3 10 2 10 1 false positives per window (FPPW) miss rate 0.5 0.2 0.1 INRIA person database DET different descriptors on INRIA database Ker. R HOG 0.05 Lin. R2 HOG Lin. R HOG Lin. C HOG Lin. EC HOG 0.02 Wavelet PCA SIFT Lin. G ShapeC Lin. E ShapeC 0.01 10 6 10 5 10 4 10 3 10 2 10 1 false positives per window (FPPW) - R/C-HOG give near perfect seperation on MIT database - Have 1-2 orders of magnitude lower false positives than other descriptors
Histograms of Oriented Gradients for Human Detection p. 8/1 Gradient Smoothening & Orientation Bins 0.5 Gradient scale, σ DET effect of gradient scale σ 0.5 Orientation bins, β DET effect of number of orientation bins β 0.2 0.2 miss rate 0.1 0.05 σ=0 σ=0.5 σ=1 0.02 σ=2 σ=3 σ=0, c cor 0.01 10 6 10 5 10 4 10 3 10 2 10 1 false positives per window (FPPW) miss rate 0.1 bin= 9 (0 180) bin= 6 (0 180) 0.05 bin= 4 (0 180) bin= 3 (0 180) bin=18 (0 360) 0.02 bin=12 (0 360) bin= 8 (0 360) bin= 6 (0 360) 0.01 10 6 10 5 10 4 10 3 10 2 10 1 false positives per window (FPPW) Using simple smoothed gradients & many orientations helps! Reducing gradient scale from 3 to 0 decreases false positives by 10 times Increasing orientation bins from 4 to 9 decreases false positives by 10 times
Histograms of Oriented Gradients for Human Detection p. 9/1 Normalization Method & Block Overlap 0.2 Normalization method DET effect of normalization methods Block overlap DET effect of overlap (cell size=8, num cell = 2x2, wt=0) 0.5 miss rate 0.1 0.05 L2 Hys L2 norm L1 Sqrt L1 norm No norm Window norm 0.02 10 5 10 4 10 3 10 2 false positives per window (FPPW) Strong local normalization is essential miss rate 0.2 0.1 0.05 0.02 overlap = 3/4, stride = 4 overlap = 1/2, stride = 8 overlap = 0, stride =16 0.01 10 6 10 5 10 4 10 3 10 2 10 1 false positives per window (FPPW) Overlapping blocks improves performance, but descriptor size increases
Histograms of Oriented Gradients for Human Detection p. 10/1 Effect of Block & Cell Size 20 Miss Rate (%) 15 10 5 0 12x12 10x10 8x8 6x6 Cell size (pixels) 4x4 3x3 1x1 2x2 4x4 Block size (Cells) Trade off between need for local spatial invariance and need for finer spatial resolution
Histograms of Oriented Gradients for Human Detection p. 11/1 Descriptor Cues input image weighted pos wts weighted neg wts The most important cues are head, shoulder, leg silhouettes Vertical gradients inside the person count as negative Overlapping blocks those just outside the contour are the most important avg. grad outside in block
Histograms of Oriented Gradients for Human Detection p. 12/1 Conclusions Fine grained features improve performance No gradient smoothning, [ 1, 0, 1] derivative mask Use gradient magnitude (no thresholding) Orientation voting into fine bins Spatial voting into coarser bins Strong local normalization Overlapping normalization blocks - A general object classifier - Also works well for other classes - Linear SVM is reliable & fast, but not optimal - Human detection: 90% at 10 4 false positives per window
Histograms of Oriented Gradients for Human Detection p. 13/1 Demo No temporal smoothning