Histogram of Oriented Gradients (HOG) for Object Detection

Navneet Dalal. Joint work with Bill Triggs and Cordelia Schmid.

Goal & Challenges

Goal: detect and localise people in images and videos.
- Wide variety of articulated poses
- Variable appearance and clothing
- Complex backgrounds
- Unconstrained illumination
- Occlusions, different scales
- Video sequences involve motion of the subject, the camera and the objects in the background

Main assumption: upright, fully visible people.

Chronology
- Haar wavelets as features + AdaBoost for learning
  - Viola & Jones, ICCV 2001
  - De facto standard for detecting faces in images
- Another approach: Haar wavelets + SVM
  - Papageorgiou & Poggio, 2000; Mohan et al., 2000
[Figure: Haar wavelet masks]

Chronology
- Edge templates from Gavrila et al.
- Based on the information bottleneck principle of Tishby et al.
- Maximize mutual information (MI) between edge fragments and the detection task
Pros:
- Supports irregular shapes and partial occlusions
- Window-free framework
Cons:
- Sensitive to edge detection and edge threshold
- Not resistant to local illumination changes
- Needs segmented positive images
On par with the then state of the art.

Chronology
- Key point detectors repeat on backgrounds
- Key point detectors do not repeat on people, even when looking at two consecutive frames of a video
- Leibe et al., 2005; Mikolajczyk et al., 2004
Needed a different approach.

Overview of Methodology

Detection phase:
1. Scan image(s) at all scales and locations (scale-space pyramid)
2. Extract features over windows
3. Run linear SVM classifier on all locations
4. Fuse multiple detections in 3-D position & scale space
5. Object detections with bounding boxes

Focus on building robust feature sets (static & motion).
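The scanning step of the detection phase can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 128x64 window, stride of 8 pixels, scale ratio of 1.2, the nearest-neighbour `downscale` helper and the `score_fn` callback (standing in for feature extraction plus the linear SVM) are all assumptions made for the example.

```python
import numpy as np

def downscale(image, factor):
    """Nearest-neighbour resampling by `factor` > 1 (stand-in for smoothing + resampling)."""
    h, w = image.shape
    new_h, new_w = int(h / factor), int(w / factor)
    rows = np.minimum((np.arange(new_h) * factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * factor).astype(int), w - 1)
    return image[np.ix_(rows, cols)]

def scan(image, score_fn, win=(128, 64), stride=8, scale_ratio=1.2, threshold=0.0):
    """Return (x, y, scale, score) for every window scoring above `threshold`.

    x and y are reported in the coordinates of the original image.
    """
    win_h, win_w = win
    detections = []
    scale = 1.0
    while True:
        level = image if scale == 1.0 else downscale(image, scale)
        if level.shape[0] < win_h or level.shape[1] < win_w:
            break  # the window no longer fits at this pyramid level
        for y in range(0, level.shape[0] - win_h + 1, stride):
            for x in range(0, level.shape[1] - win_w + 1, stride):
                score = score_fn(level[y:y + win_h, x:x + win_w])
                if score > threshold:
                    detections.append((x * scale, y * scale, scale, score))
        scale *= scale_ratio
    return detections
```

In a full detector, `score_fn` would compute the window's feature vector and return the linear SVM response; the resulting detections are then fused across position and scale.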

HOG for Finding People in Images
N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005

Static Feature Extraction

Processing chain:
1. Input image, detection window
2. Normalise gamma
3. Compute gradients
4. Weighted vote in spatial & orientation cells
5. Contrast normalise over overlapping spatial cells (blocks)
6. Collect HOGs over detection window into feature vector f = [..., ..., ...]
7. Linear SVM

[Figure labels: cell, block, overlap of blocks]
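The processing chain above can be sketched in NumPy as below. This is a simplified illustration rather than the authors' code: the cell size of 8 pixels, 2x2-cell blocks, 9 unsigned orientation bins, square-root gamma compression and hard bin assignment (no interpolation between neighbouring bins or cells) are assumptions chosen to keep the example short.

```python
import numpy as np

def hog_descriptor(window, cell=8, block=2, bins=9, eps=1e-3):
    """Rectangular HOG sketch for a non-negative grayscale window.

    Height and width are assumed to be multiples of `cell`.
    """
    img = np.sqrt(window.astype(float))            # gamma compression (assumed square root)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]         # centred derivative mask, no smoothing
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in [0, 180)

    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            # each pixel votes for its orientation bin, weighted by gradient magnitude
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(), minlength=bins)

    blocks = []
    for by in range(n_cy - block + 1):             # overlapping blocks, stride of one cell
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps))   # L2 block normalisation
    return np.concatenate(blocks)
```

With these settings a 128x64 window has 16x8 cells and 15x7 overlapping blocks of 36 values each, so the feature vector has 3780 entries.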

Overview of Learning Phase

Learning phase:
1. Input: annotations on training images
2. Create fixed-resolution normalised training image data set
3. Encode images into feature spaces
4. Learn binary classifier
5. Resample negative training images to create hard examples
6. Encode images into feature spaces
7. Learn binary classifier
8. Object/non-object decision

Retraining reduces false positives by an order of magnitude!
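The retraining loop can be illustrated with a small sketch. To stay self-contained it swaps the linear SVM for a regularised least-squares linear classifier, and it works on precomputed feature vectors; the function names, the `ridge` value and the zero decision threshold are assumptions for the example only.

```python
import numpy as np

def train_linear(X, y, ridge=1e-2):
    """Regularised least-squares linear classifier (a stand-in for the linear SVM)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])            # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)                   # weights, with the bias last

def scores(w, X):
    """Linear decision values for the rows of X."""
    return X @ w[:-1] + w[-1]

def fit(pos, neg):
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    return train_linear(X, y)

def retrain_with_hard_negatives(pos, neg_init, neg_pool, rounds=1):
    """Train, collect false positives from `neg_pool`, then retrain with them added."""
    neg = neg_init
    w = fit(pos, neg)
    for _ in range(rounds):
        hard = neg_pool[scores(w, neg_pool) > 0]          # negatives classified as object
        if len(hard) == 0:
            break
        neg = np.vstack([neg, hard])
        w = fit(pos, neg)
    return w, neg
```

Here `neg_pool` plays the role of features from windows sampled exhaustively from the person-free training images; the hard examples are the ones the first classifier gets wrong.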

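HOG Descriptors

Parameters:
- Gradient scale
- Orientation bins
- Percentage of block overlap
- RGB or Lab, colour/gray-space
- Block normalisation: L2-norm or L1-norm

  L2-norm: $v \to v / \sqrt{\lVert v \rVert_2^2 + \epsilon}$
  L1-norm: $v \to v / (\lVert v \rVert_1 + \epsilon)$

Block schemes: R-HOG/SIFT (rectangular grid of cells) and C-HOG (circular layout with a centre bin).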
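The block normalisation schemes are easy to compare in code. The sketch below implements the L2-norm and L1-norm formulas from the slide; the L2-Hys and L1-sqrt variants are taken from the CVPR 2005 paper rather than from this slide, and the default `eps` is an arbitrary small constant chosen for the example.

```python
import numpy as np

def l2_norm(v, eps=1e-3):
    """v / sqrt(||v||_2^2 + eps)."""
    return v / np.sqrt(np.sum(v ** 2) + eps)

def l1_norm(v, eps=1e-3):
    """v / (||v||_1 + eps)."""
    return v / (np.sum(np.abs(v)) + eps)

def l1_sqrt(v, eps=1e-3):
    """Square root of the L1-normalised vector (v must be non-negative)."""
    return np.sqrt(l1_norm(v, eps))

def l2_hys(v, eps=1e-3, clip=0.2):
    """L2-norm, clip large values at `clip`, then renormalise."""
    return l2_norm(np.minimum(l2_norm(v, eps), clip), eps)
```

Each function takes the concatenated cell histograms of one block and returns the normalised block vector.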

Evaluation Data Sets

          MIT pedestrian database           INRIA person database
Train     507 positive windows;             1208 positive windows;
          negative data unavailable         1218 negative images
Test      200 positive windows;             566 positive windows;
          negative data unavailable         453 negative images
Overall   709 annotations + reflections     1774 annotations + reflections

Overall Performance
[Figure panels: MIT pedestrian database; INRIA person database]
- R/C-HOG give near-perfect separation on the MIT database
- Have 1-2 orders of magnitude fewer false positives than other descriptors

Performance on INRIA Database

Effect of Parameters
[Figure panels: gradient smoothing, σ; orientation bins, β]
- Reducing gradient scale from 3 to 0 decreases false positives by 10 times
- Increasing orientation bins from 4 to 9 decreases false positives by 10 times

Normalisation Method & Block Overlap
[Figure panels: normalisation method; block overlap]
- Strong local normalisation is essential
- Overlapping blocks improve performance, but descriptor size increases

Effect of Block and Cell Size
[Figure: 64 x 128 detection window]
- Trade-off between need for local spatial invariance and need for finer spatial resolution

Descriptor Cues
[Figure panels: input example, average gradients, weighted pos wts, weighted neg wts, outside-in weights]
- Most important cues are head, shoulder, leg silhouettes
- Vertical gradients inside a person are counted as negative
- Overlapping blocks just outside the contour are most important

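Multi-Scale Object Localisation

1. Multi-scale dense scan of detection window
2. Clip detection score (bias, threshold)
3. Map each detection to a point (x, y, s) in position and scale space, with s in log
4. Apply robust mode detection, like mean shift
5. Final detections

Weighted kernel density estimate over the n detections x_i with clipped scores w_i (division by H_i is element-wise):

$$f(\mathbf{x}) = \sum_{i=1}^{n} w_i \exp\left(-\lVert (\mathbf{x} - \mathbf{x}_i) / \mathbf{H}_i \rVert^2 / 2\right)$$

$$\mathbf{H}_i = [\exp(s_i)\,\sigma_x,\ \exp(s_i)\,\sigma_y,\ \sigma_s]$$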
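Mode detection over the weighted detections can be sketched with a variable-bandwidth mean shift. The code below is an illustration under stated assumptions, not the authors' implementation: each detection gets a diagonal bandwidth that grows with its scale, the `sigma` defaults are arbitrary, weights are assumed strictly positive (scores already clipped and thresholded), and converged points are merged with a simple one-sigma test.

```python
import numpy as np

def fuse_detections(points, weights, sigma=(8.0, 16.0, 0.5), iters=50, tol=1e-3):
    """Return the modes of the weighted density over rows [x, y, s], s = log(scale)."""
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    sx, sy, ss = sigma
    scale = np.exp(pts[:, 2])
    # per-detection diagonal covariance: spatial spread grows with detection scale
    var = np.stack([(scale * sx) ** 2, (scale * sy) ** 2,
                    np.full(len(pts), ss ** 2)], axis=1)
    norm = w / np.sqrt(var.prod(axis=1))
    merge = np.array([sx, sy, ss])

    modes = []
    for start in pts:
        y = start.copy()
        for _ in range(iters):
            d2 = ((y - pts) ** 2 / var).sum(axis=1)       # squared Mahalanobis distances
            k = norm * np.exp(-0.5 * d2)                   # kernel weight of each detection
            y_new = (k[:, None] * pts / var).sum(axis=0) / (k[:, None] / var).sum(axis=0)
            converged = np.linalg.norm(y_new - y) < tol
            y = y_new
            if converged:
                break
        # keep the converged point only if it is not a mode found already
        if not any(np.all(np.abs(y - m) < merge) for m in modes):
            modes.append(y)
    return np.array(modes)
```

Each returned row is one fused detection; exponentiating its third component gives the detection scale.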

Effect of Spatial Smoothing
- Spatial smoothing: aspect ratio as per window shape; smallest sigma approximately equal to stride/cell size
- Relatively independent of scale smoothing; sigma equal to 0.4 to 0.7 octaves gives good results

Effect of Other Parameters
[Figure panels: different mappings; effect of scale-ratio]
- Hard clipping of SVM scores gives better results than a simple probabilistic mapping of these scores
- Fine scale sampling helps improve recall

HOGs vs approaches till date
HOG is still among the best detectors in terms of false positives per image (FPPI).
- See Dollár et al., CVPR 2009, Pedestrian Detection: A Benchmark
[Figure: miss rate vs. false positives per image, panel (b) typical aspect ratios. Legend: VJ (0.85), HOG (0.44), FtrMine (0.55), Shapelet (0.80), MultiFtr (0.54), LatSvm (0.63), HikSvm (0.77)]

Results Using Static HOG
No temporal smoothing of detections

Conclusions for Static Case
- Fine-grained features improve performance
  - Rectify fine gradients, then pool spatially
    - No gradient smoothing, [1 0 -1] derivative mask
    - Orientation voting into fine bins
    - Spatial voting into coarser bins
  - Use gradient magnitude (no thresholding)
  - Strong local normalization
  - Use overlapping blocks
  - Robust non-maximum suppression
    - Fine scale sampling, hard clipping & anisotropic kernel
Pro: human detection rate of 90% at 10^-4 false positives per window
Con: slower than integral images of Viola & Jones, 2001

Applications to Other Classes
M. Everingham et al. The 2005 PASCAL Visual Object Classes Challenge. Proceedings of the PASCAL Challenge Workshop, 2006.

Motion HOG for Finding People in Videos
N. Dalal, B. Triggs and C. Schmid. Human Detection Using Oriented Histograms of Flow and Appearance. ECCV, 2006.

Finding People in Videos
- Motivation
  - Human motion is very characteristic
- Requirements
  - Must work for moving camera and background
  - Robust coding of relative motion of human parts
- Previous works
  - Viola et al., 2003
  - Gavrila et al., 2004
  - Efros et al., 2003
[Figure courtesy: R. Blake, Vanderbilt Univ]

Handling Camera Motion
- Camera motion characterisation
  - Pan and tilt are locally translational
  - Rest is depth-induced motion parallax
- Use local differential of flow
  - Cancels out effects of camera rotation
  - Highlights 3D depth boundaries
  - Highlights motion boundaries
- Robust encoding into oriented histograms
  - Some focus on capturing motion boundaries
  - Others focus on capturing internal motion or relative dynamics of different limbs

Motion HOG Processing Chain
1. Input image and consecutive image, detection windows
2. Normalise gamma & colour
3. Compute optical flow (flow field, magnitude of flow)
4. Compute differential flow (differential flow X, differential flow Y)
5. Accumulate votes for differential flow orientation over spatial cells
6. Normalise contrast within overlapping blocks of cells
7. Collect HOGs for all blocks over detection window
[Figure labels: cell, block, overlap of blocks]

Overview of Feature Extraction
- Appearance channel: input image -> static HOG encoding
- Motion channel: consecutive image(s) -> motion HOG encoding
- Collect HOGs over detection window -> linear SVM -> object/non-object decision

Data set:
          Source                   Positive windows
Train     5 DVDs, 182 shots        5562
Test 1    Same 5 DVDs, 50 shots    1704
Test 2    6 new DVDs, 128 shots    2700

Coding Motion Boundaries
- Treat x- and y-flow components as independent images
- Take their local gradients separately, and compute HOGs as in static images
Motion Boundary Histograms (MBH) encode depth and motion boundaries.
[Figure panels: first frame, second frame, estimated flow, flow magnitude, x-flow diff, y-flow diff, avg. x-flow diff, avg. y-flow diff]
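The MBH idea for a single cell can be sketched as follows. This is a hedged illustration: the use of `np.gradient` for the local differential, 9 unsigned bins and hard bin assignment are assumptions, and block normalisation is left out.

```python
import numpy as np

def orientation_histogram(gx, gy, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    return np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)

def mbh_cell(flow_x, flow_y, bins=9):
    """Motion boundary histograms for one cell: one histogram per flow component."""
    hists = []
    for component in (flow_x, flow_y):
        # np.gradient returns the derivative along rows first, then along columns
        d_row, d_col = np.gradient(component.astype(float))
        hists.append(orientation_histogram(d_col, d_row, bins))
    return np.concatenate(hists)
```

A uniform translation of the whole cell produces an all-zero descriptor, because only spatial changes in the flow are counted; that is the property used to cancel locally translational camera motion.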

Coding Internal Dynamics
- Ideally compute relative displacements of different limbs
  - Requires reliable part detectors
- Parts are relatively localised in our detection windows
- Allows different coding schemes based on fixed spatial differences
Internal Motion Histograms (IMH) encode relative dynamics of different regions.

IMH Continued
- Simple difference
  - Take x, y differentials of flow vector images [I_x, I_y]
  - Variants may use larger spatial displacements while differencing, e.g. [1 0 0 0 -1]
- Center cell difference
- Wavelet-style cell differences
[Figure: cell-difference masks with +1, -1 and -2 weights]

Flow Methods
- Proesmans' flow [Proesmans et al., ECCV 1994]
  - 15 seconds per frame
- Our flow method
  - Multi-scale pyramid-based method, no regularization
  - Brightness-constancy-based damped least squares solution on a 5x5 window:
    $[x, y]^{T} = (A^{T} A + \beta I)^{-1} A^{T} b$
  - 1 second per frame
- MPEG-4 based block matching
  - Runs in real time
[Figure panels: input image, Proesmans' flow, our multi-scale flow]
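The damped least squares solve can be sketched for a single pixel at a single pyramid level. In this illustration, which is an assumption about the setup rather than the authors' code, the rows of A hold the spatial image gradients inside the window, b holds the negated temporal differences, and `beta` is an arbitrary small damping value.

```python
import numpy as np

def local_flow(frame0, frame1, cy, cx, half=2, beta=1e-2):
    """Flow [x, y] at pixel (cy, cx) from a (2*half + 1)^2 window, one pyramid level."""
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    d_row, d_col = np.gradient(f0)          # spatial derivatives along y and x
    d_t = f1 - f0                           # temporal derivative
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    A = np.stack([d_col[win].ravel(), d_row[win].ravel()], axis=1)   # 25 x 2 for half=2
    b = -d_t[win].ravel()
    # damped normal equations: (A^T A + beta I) [x, y]^T = A^T b
    return np.linalg.solve(A.T @ A + beta * np.eye(2), A.T @ b)
```

The damping term keeps the 2x2 system invertible in textureless windows, at the cost of a small bias of the estimate towards zero.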

Performance Comparison
[Figure panels: only motion information; appearance + motion]
- With motion only, the MBH scheme on Proesmans' flow works best
- Combined with appearance, centre difference IMH performs best

Trained on Static & Flow
[Figure panels: tested on flow only; tested on appearance + flow]
- Adding static images during test reduces performance margin
- No deterioration in performance on static images

Motion HOG Video
No temporal smoothing, each pair of frames treated independently

Recall-Precision for Motion HOG
[Figure: recall vs. 1-precision and recall vs. false positives per image on ETH-01, ETH-02 and ETH-03, comparing HOG, IMHwd and Haar features with SVM, HIKSVM, AdaBoost and MPLBoost classifiers against the full system of Ess et al. (ICCV 07)]
- Wojek et al., CVPR 09
- Robust regularized flow + max in non-max suppression

Conclusions for Motion HOG
- Summary
  - When combined with appearance, IMH outperforms MBH
  - Regularization in flow estimates reduces performance
  - MPEG-4 block matching looks good, but its motion estimates are not good for detection
  - Larger spatial difference masks help
  - Strong local normalization is very important
  - Relatively insensitive to number of orientation bins
Pro: window classifier reduces false positives by 10 times
Con: slow compared to static HOG (probably not any more: FlowLib from GPU4Vision)

Summary
- Bottom-up approach to object detection
- Robust feature encoding for person detection
- Gives state-of-the-art results for person detection
- Also works well for other object classes
- Proposed differential motion feature vectors for feature extraction from videos

Extensions
- Real-time feature computation (Wojek et al., DAGM 08; Wang et al., ICCV 09)
- AdaBoost rejection cascade algorithms (Zhu et al., CVPR 06; Laptev, BMVC 06)
- Part-based detector for partial occlusions (Felzenszwalb et al., PAMI 09; Wang et al., ICCV 09)
- Motion HOG extended (Wojek et al., CVPR 09; Laptev et al., CVPR 08)
- Histogram intersection kernel (Maji et al., CVPR 2008, CVPR 2009, ICCV 2009)
- Higher-level image analysis (Hoiem, IJCV 08)

Features for Object Detection
- Local Binary Pattern
  - Wang et al., ICCV 2009
- Co-occurrence Matrices + HOG + PLS
  - Schwartz et al., ICCV 2009
- Color HOG (discriminative segmentation of fg/bg regions)
  - Ott & Everingham, ICCV 2009

Founder & CEO
Gesture Detection Using Webcams

Complete Lean Back Experience

Beta Launch in July 2011
- State-of-the-art work in research & engineering
- Candidates for usability studies
- Summer internships
Contact: dalal@botsquare.com, http://botsquare.com

Thank You
Contact: dalal@botsquare.com