Visuelle Perzeption für Mensch- Maschine Schnittstellen

Similar documents
Visuelle Perzeption für Mensch- Maschine Schnittstellen

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Object Category Detection. Slides mostly from Derek Hoiem

Part based models for recognition. Kristen Grauman

Histograms of Oriented Gradients for Human Detection p. 1/1

Object Category Detection: Sliding Windows

Deformable Part Models

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Modern Object Detection. Most slides from Ali Farhadi

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2

Category vs. instance recognition

Histogram of Oriented Gradients (HOG) for Object Detection

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg

Part-based and local feature models for generic object recognition

A novel template matching method for human detection

Recap Image Classification with Bags of Local Features

Object Detection Design challenges

Human detection using local shape and nonredundant

Histogram of Oriented Gradients for Human Detection

Object Category Detection: Sliding Windows

Window based detectors

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Object detection using non-redundant local Binary Patterns

Detecting and Segmenting Humans in Crowded Scenes

Beyond Bags of features Spatial information & Shape models


Detection III: Analyzing and Debugging Detection Methods

Object recognition (part 1)

Detecting Pedestrians by Learning Shapelet Features

Object detection. Announcements. Last time: Mid-level cues 2/23/2016. Wed Feb 24 Kristen Grauman UT Austin

Face Detection and Alignment. Prof. Xin Yang HUST

Templates and Background Subtraction. Prof. D. Stricker Doz. G. Bleser

MediaTek Video Face Beautify

Fast Human Detection Using a Cascade of Histograms of Oriented Gradients

Beyond Bags of Features

High Level Computer Vision

Find that! Visual Object Detection Primer

Previously. Window-based models for generic object detection 4/11/2011

Local Features based Object Categories and Object Instances Recognition

Category-level localization

Bus Detection and recognition for visually impaired people

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Announcements. Recognition I. Gradient Space (p,q) What is the reflectance map?

Object Detection with Discriminatively Trained Part Based Models

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Action recognition in videos

High-Level Fusion of Depth and Intensity for Pedestrian Classification

CAP 6412 Advanced Computer Vision

Structured Models in. Dan Huttenlocher. June 2010

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors

Part-based models. Lecture 10

Designing Applications that See Lecture 7: Object Recognition

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab

Detecting Object Instances Without Discriminative Features

Final Project Face Detection and Recognition

Computer Vision with MATLAB MATLAB Expo 2012 Steve Kuznicki

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

3D Face and Hand Tracking for American Sign Language Recognition

Human detections using Beagle board-xm

A New Strategy of Pedestrian Detection Based on Pseudo- Wavelet Transform and SVM

Image Processing. Image Features

A Cascade of Feed-Forward Classifiers for Fast Pedestrian Detection

Object and Class Recognition I:

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Articulated Pose Estimation with Flexible Mixtures-of-Parts

People detection in complex scene using a cascade of Boosted classifiers based on Haar-like-features

Detection of a Single Hand Shape in the Foreground of Still Images

Real Time Stereo Vision Based Pedestrian Detection Using Full Body Contours

Non-rigid body Object Tracking using Fuzzy Neural System based on Multiple ROIs and Adaptive Motion Frame Method

Automatic Initialization of the TLD Object Tracker: Milestone Update

FAST HUMAN DETECTION USING TEMPLATE MATCHING FOR GRADIENT IMAGES AND ASC DESCRIPTORS BASED ON SUBTRACTION STEREO

Real-Time Human Detection using Relational Depth Similarity Features

Course Administration

Template Matching Rigid Motion

Feature Descriptors. CS 510 Lecture #21 April 29 th, 2013

Class-Specific Weighted Dominant Orientation Templates for Object Detection

International Journal Of Global Innovations -Vol.4, Issue.I Paper Id: SP-V4-I1-P17 ISSN Online:

Bayes Risk. Classifiers for Recognition Reading: Chapter 22 (skip 22.3) Discriminative vs Generative Models. Loss functions in classifiers

Generic Object-Face detection

Pedestrian detection from traffic scenes based on probabilistic models of the contour fragments

Chapter 9 Object Tracking an Overview

Monocular Pedestrian Detection: Survey and Experiments

Categorization by Learning and Combining Object Parts

Research on Robust Local Feature Extraction Method for Human Detection

Local cues and global constraints in image understanding

Classifiers for Recognition Reading: Chapter 22 (skip 22.3)

Object Recognition II

Discriminative classifiers for image recognition

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

Traffic Sign Localization and Classification Methods: An Overview

Object Classification for Video Surveillance

DPM Score Regressor for Detecting Occluded Humans from Depth Images

Accurate 3D Face and Body Modeling from a Single Fixed Kinect

Model-based Visual Tracking:

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah

Patch Descriptors. CSE 455 Linda Shapiro

Transcription:

Visuelle Perzeption für Mensch- Maschine Schnittstellen Vorlesung, WS 2009 Prof. Dr. Rainer Stiefelhagen Dr. Edgar Seemann Institut für Anthropomatik Universität Karlsruhe (TH) http://cvhci.ira.uka.de rainer.stiefelhagen@kit.edu seemann@pedestrian-detection.com Edgar Seemann, 04.12.09 1 Interactive Systems Laboratories, Universität Karlsruhe (TH)

Computer Vision: People Detection II WS 2009/10 Dr. Edgar Seemann seemann@pedestrian-detection.com Edgar Seemann, 04.12.09 2 Interactive Systems Laboratories, Universität Karlsruhe (TH)

Termine 18.12.2009 21.12.2009 Thema 19.10.2009 Introduction, Applications Stereo and Optical Flow TBA Termine (1) 23.10.2009 Basics: Cameras, Transformations, Color 26.10.2009 Basics: Image Processing 30.10.2009 Basics: Pattern recognition 02.11.2009 Computer Vision: Tasks, Challenges, Learning, Performance measures 06.11.2009 Face Detection I: Color, Edges (Birchfield) 09.11.2009 Project 1: Intro + Programming tips 13.11.2009 Face Detection II: ANNs, SVM, Viola & Jones 16.11.2009 Project 1: Questions 20.11.2009 Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM 23.12.2009 Face Recognition II 27.11.2009 Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention 30.11.2009 People Detection I 04.12.2009 People Detection II 07.12.2009 Project 1: Student Presentations, Project 2: Intro 11.12.2009 People Detection III (Part-Based Models) 14.12.2009 Scene Context and Geometry Edgar Seemann, 04.12.09 3 Interactive Systems Laboratories, Universität Karlsruhe (TH)

Today Last time: Global approaches Compute a singe feature vector / representation for the complete body Today: Silhouette matching (global approach) Part-based approaches Edgar Seemann, 04.12.09 4

Silhouette Matching Edgar Seemann, 04.12.09 5

Chamfer Matching [Gavrila & Philomin ICCV 99] Goal Align known object shapes with image Object shapes Requirements for an alignment algorithm High detection rate Few false positives Robustness Computationally inexpensive Real-world image of object Edgar Seemann, 04.12.09 6

Distance Transform Used to compare/align two (typically binary) shapes 1. Compute for each pixel the distance to the next edge pixel Shape 1 Shape 2 Here the eculidean distances are approximated by the 2-3 distance Distance =? Distance transform 8 6 5 4 4 5 6 8 6 5 3 2 2 3 5 6 5 3 2 0 0 2 3 5 3 2 0 2 2 0 2 3 2 0 2 2 2 2 0 2 2 0 0 0 0 0 0 2 3 2 2 2 2 2 3 5 2 4 4 4 4 4 4 5 Edgar Seemann, 04.12.09 7

Distance Transform 2. Overlay second shape over distance transform Distance transform 8 6 5 4 4 5 6 8 6 5 3 2 2 3 5 6 5 3 2 0 0 2 3 5 3 2 0 2 2 0 2 3 2 0 2 2 2 2 0 2 2 0 0 0 0 0 0 2 3 2 2 2 2 2 3 5 2 4 4 4 4 4 4 5 Distance = 14 3. Accumulate distances along shape 2 4. Find best matching position by an exhaustive search Distance is not symmetric Distance has to be normalized w.r.t. the length of the shapes Edgar Seemann, 04.12.09 8

Chamfer Matching Binary image Distance transform 8 6 5 4 4 5 6 8 6 5 3 2 2 3 5 6 5 3 2 0 0 2 3 5 3 2 0 2 2 0 2 3 2 0 2 2 2 2 0 2 2 0 0 0 0 0 0 2 3 2 2 2 2 2 3 5 2 4 4 4 4 4 4 5 Distance transform of a real-world image Chamfer Matching Compute distance transform (DT) For each possible object location Position known object shape over DT Accumulate distances along the contour Distance measure Edgar Seemann, 04.12.09 9

Efficient implementation The distance transform can be efficiently computed by two scans over the complete image Forward-Scan Starts in the upper-left corner and moves from left to right, top to bottom Uses the following mask Backward-Scan Starts in the lower-right corner and moves from right to left, bottom to top Uses the following mask 3 2 3 2 0 0 2 3 2 3 Edgar Seemann, 04.12.09 10

Forward scan We can choose different values for the filter mask The local distances, d, s and c, in the mask are added to the pixel values of the distance map and the new value of the zero pixel is the minimum of the five sums Example: 3 2 3 5 3 2+2 2+3 3+5 d s d s c 2 0?? 2 2+0 2????????? Edgar Seemann, 04.12.09 11

Advantages and Disadvantages Fast Distance transform has to be computed only once Comparison for each shape location is cheap Good performance on uncluttered images (with few background structures) Bad performance for cluttered images Needs a huge number of people silhouettes But computation effort increases with the number of silhouettes Edgar Seemann, 04.12.09 12

Template Hierachy To reduce the number of silhouettes to consider, silhouettes can be organized in a template hierarchy For this, the shapes are clustered by similarity Edgar Seemann, 04.12.09 13

Search in the hierarchy Matching the shapes, then corresponds to a traversal of the template hierarchy How can we prune search branches to speed up matching? Thresholds depend on: Edge detector (likelihood of gaps) Silhouette sizes Hierarchy level Allowed shape variation Thresholds are set statistically during training Edgar Seemann, 04.12.09 14

Example Detections Edgar Seemann, 04.12.09 15

Video Edgar Seemann, 04.12.09 16

Coarse-To-Fine Search Goal: Reduce search effort by discarding unlikely regions with minimal computation Idea: Subsample image and search first at a coarse scale Only consider regions with a low distance when searching for a match on finer scales Again, we have to find reasonable thresholds Level 1 Level 2 Level 3 Edgar Seemann, 04.12.09 17

Protector System (Daimler) Edgar Seemann, 04.12.09 18

Adding edge orientation So far edge orientation has been completely ignored Distance = small Idea: Consider edge orientation for each pixel Edgar Seemann, 04.12.09 19

Edge orientation - The math Given two shapes S, C, we can express the chamfer distance in the following manner The orientation correspondence between two points is then measured by The combined distance measure: Edgar Seemann, 04.12.09 20

Statistical Relevance Adding statistical relevance of silhouette regions further improves the results [Dimitrijevic06] Edgar Seemann, 04.12.09 21

Spatio-Temporal templates Use multiple successive frames to build a spatiotemporal template (T={T 1,,T N }) Allow spatial variations of dx, dy (due to motion or camera movements) Edgar Seemann, 04.12.09 22

Example: single-frame vs. 3 frames Edgar Seemann, 04.12.09 23

Quantitative Results Red: spatio-temporal templates + statistical relevance Edgar Seemann, 04.12.09 24

Video Restrict detection to a single articulation (when legs are in a v-shaped position) Spatio-Temporal templates: Allows more reliable detection of motion direction Avoids confusions and some false positive detections Edgar Seemann, 04.12.09 25

Alternatives: Earth Mover s Distance Originally developed to compare histograms Idea: Find the minimal flow to transform one histogram to another Example: Edgar Seemann, 04.12.09 26

Earth mover s distance (EMD) Chamfer: EMD: Example detection EMD Matching Detect edges in image For each possible object location Optimize correspondences between known shape and edge image Distance measure basis Edgar Seemann, 04.12.09 27

EMD The math Variant of the transportation problem (possible solutions: Stepping Stone Algorithm, Transportation-simplex method) Constraints EMD-Distance Edgar Seemann, 04.12.09 28

Advantages and Disadvantages Optimizes matching between silhouette and edge structure in image Enforces one-to-one matchings (unlike chamfer) Allows partial matches Can deal with arbitrary features High computational complexity Approximation is possible [Graumann, Darrel CVPR 94] Edgar Seemann, 04.12.09 29

Part-Based Models Edgar Seemann, 04.12.09 30

Part-Based Models Fischler & Elschlager 1973 Model has two components parts (2D image fragments) structure (configuration of parts) Edgar Seemann, 04.12.09 31

Configuration of parts Fixed Spatial Layout Local parts have a mostly fixed position and orientation with respect to the center object center or detection window Flexible Spatial Layout Local parts are allowed to shift in location and scale Able to compensate for deformations or articulation changes Well suited for non-rigid objects Spatial relations often modeled probabilistically Edgar Seemann, 04.12.09 32

Different Connectivity Structures O(N 6 ) O(N 2 ) O(N 3 ) O(N 2 ) Fergus et al. 03 Fei-Fei et al. 03 Leibe et al. 04, 08 Crandall et al. 05 Fergus et al. 05 Crandall et al. 05 Felzenszwalb & Huttenlocher 05 Csurka 04 Vasconcelos 00 Bouchard & Triggs 05 Carneiro & Lowe 06 Edgar Seemann, 04.12.09 K. Grauman, B. Leibe 33 from [Carneiro & Lowe, ECCV 06]

Fixed Spatial Layout Edgar Seemann, 04.12.09 34

Motivation Breakdown overall variability into more manageable pieces Pieces can be classified by a less complex classifier Apply prior information by (manually) grouping the global feature vector into meaningful parts Edgar Seemann, 04.12.09 35

Example: Sashua et al. IVS 04 Divide person into 9 overlapping parts 4 pair combinations Edgar Seemann, 04.12.09 36

Feature Representation Edge orientation histograms (similar to HOG) Each part is divided into 2x2 regions i.e. 4 histograms per part 8 gradient orientations Features are insensitive to small local shifts Edgar Seemann, 04.12.09 37

Part Classifier and Combination Parts are classified by a linear regression i.e. distance to a separating hyperplane One score with respect to each of the 9 parts Separate head vs. leg feature, head vs. upper body etc. The values of the individual part detectors are combined by AdaBoost Edgar Seemann, 04.12.09 38

Attention mechanism Attention mechanism Considers scene geometry, camera perspective Filters out regions with lack of texture properties Only 75 image windows are considered for pedestrian classification per frame Edgar Seemann, 04.12.09 39

Results Lower curve: Global SVM Middle curve: Mohan et al. Upper curve: Sashua et al. Edgar Seemann, 04.12.09 40

Results Edgar Seemann, 04.12.09 41

Example: Mohan et al. Body divided into 4 parts Face/Shoulder Legs Right/Left arm Detection Sliding window 64x128 pixels Edgar Seemann, 04.12.09 42

Geometric constraints Body parts are not always at the exact same position Allow local shifts Position Scale Best location has to be found for each detection window Edgar Seemann, 04.12.09 43

Training Data MIT pedestrian database Edgar Seemann, 04.12.09 44

Features Based on wavelets Multi-Resolution representation Haar wavelets Quadruple density Edgar Seemann, 04.12.09 45

Discrete Wavelet Transform (DWT) Motivation Representation with basis functions Given: Signal [9 7 3 5] Disadvantages: Coefficients provide only local information No information about global signal change http://nt.eit.uni-kl.de/wavelet/transformation.html Edgar Seemann, 04.12.09 46

Haar Wavelet Basis Scaling function provides information about the mean value or signal level Representation in wavelet basis: [12 4 sqrt(2) -sqrt(2)] Edgar Seemann, 04.12.09 47

1D Wavelet Transformation Step wise transformation with orthogonal spaces Basis function for V j Scaling function Basis function for W j Wavelet function Edgar Seemann, 04.12.09 48

Our Example Project signal onto subspaces V 1 W 1 V 2 W 2 Edgar Seemann, 04.12.09 49

2D Haar-Wavelets Same princple as in 1D Quadruple density i.e. overcomplete representation Edgar Seemann, 04.12.09 50

Feature Discussion Haar wavelets represent pixel/region differences Strong responses of wavelet coefficients on image boundaries, i.e. coefficients encode shape Similar to gradients at different resolutions Overcomplete basis allows good spatial resolution Fine scales are discarded as they represent noise Coarse scales are discarded as they represent global intensity levels and not the shape Edgar Seemann, 04.12.09 51

Performance: Part Detectors Edgar Seemann, 04.12.09 52

Classifier Combination Voting E.g. majority of part detectors, classify detection window as person SVM combination Feed SVM scores of part detectors in a second stage SVM Referred to as Adaptive Classifier Combination Edgar Seemann, 04.12.09 53

Combination Performance Edgar Seemann, 04.12.09 54

Results Edgar Seemann, 04.12.09 55

Occlusion Edgar Seemann, 04.12.09 56