Visual Perception for Human-Machine Interfaces (Visuelle Perzeption für Mensch-Maschine-Schnittstellen)

1 Visual Perception for Human-Machine Interfaces
Lecture, winter semester (WS) 2009
Prof. Dr. Rainer Stiefelhagen, Dr. Edgar Seemann
Institut für Anthropomatik, Universität Karlsruhe (TH)
Edgar Seemann

2 Computer Vision: People Detection IV
WS 2009/10, Dr. Edgar Seemann

3 Schedule (1)
Topics:
- Introduction, Applications
- Basics: Cameras, Transformations, Color
- Basics: Image Processing
- Basics: Pattern recognition
- Computer Vision: Tasks, Challenges, Learning, Performance measures
- Face Detection I: Color, Edges (Birchfield)
- Project 1: Intro + Programming tips
- Face Detection II: ANNs, SVM, Viola & Jones
- Project 1: Questions
- Face Recognition I: Traditional Approaches, Eigenfaces, Fisherfaces, EBGM
- Face Recognition II
- Head Pose Estimation: Model-based, NN, Texture Mapping, Focus of Attention
- People Detection I
- People Detection II
- Project 1: Student Presentations, Project 2: Intro
- People Detection III (Part-Based Models)
- People Detection IV
- Scene Context, Geometry, Stereo and Optical Flow
- TBA

4 Local Features

5 So far
- Parts were defined manually
- Parts represented the semantic structure, i.e. face, leg, etc.
Questions:
- Do these parts decompose the variability in an optimal way?
- Must the parts have a semantic meaning?
- Should we use smaller/larger parts?
- Can we find parts automatically?

6 Requirements for part decomposition
- Repeatable: we should be able to find the part despite articulation or image transformations (e.g. rotation, perspective, lighting)
- Distinctive: a part should not be confused with other parts; the regions should contain an interesting structure
- Compact: typically no lengthy or strangely shaped parts
- Efficient: it should be computationally inexpensive to detect or represent a part
- Cover: the parts need to sufficiently cover the object

7 Approach
1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors
[Figure: regions A_1, A_2, A_3 and B_1, B_2, B_3 are normalized to N x N pixels and described by f_A and f_B (e.g. color); similarity measure d(f_A, f_B) < T]
Edgar Seemann, 14.12.09
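The matching step (step 5) can be sketched as follows. This is a minimal illustration, assuming a Euclidean distance for d and toy mean-color descriptors; the function name `match_descriptors`, the data and the threshold value are hypothetical and not taken from the lecture.

```python
import math

def match_descriptors(desc_a, desc_b, threshold):
    """Return index pairs (i, j) with d(f_A, f_B) < T (Euclidean distance assumed)."""
    matches = []
    for i, f_a in enumerate(desc_a):
        for j, f_b in enumerate(desc_b):
            if math.dist(f_a, f_b) < threshold:
                matches.append((i, j))
    return matches

# Toy descriptors: mean RGB color of each normalized region (made-up values).
regions_a = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.2, 0.2, 0.9)]
regions_b = [(0.1, 0.9, 0.2), (0.85, 0.15, 0.1), (0.5, 0.5, 0.5)]
matches = match_descriptors(regions_a, regions_b, threshold=0.2)
print(matches)
```

With these values, region A_1 pairs with B_2 and A_2 with B_1, while A_3 stays unmatched because no descriptor in B lies within the threshold.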

8 Hessian Detector [Beaudet78]
Hessian determinant:
$\mathrm{Hessian}(I) = \begin{pmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{pmatrix}$
$\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$
In Matlab: Ixx.*Iyy - (Ixy).^2
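The same determinant can be sketched in Python with NumPy. This is an illustrative version, assuming plain finite differences via `np.gradient`; a practical detector would typically use Gaussian derivatives at a chosen scale. The function name `hessian_determinant` and the test image are assumptions.

```python
import numpy as np

def hessian_determinant(image):
    """Per-pixel det(Hessian(I)) = Ixx * Iyy - Ixy^2 using finite differences."""
    image = np.asarray(image, dtype=float)
    d_y, d_x = np.gradient(image)      # first derivatives (axis 0 = y, axis 1 = x)
    d_xy, d_xx = np.gradient(d_x)      # derivatives of Ix along y and x
    d_yy, _ = np.gradient(d_y)         # derivative of Iy along y
    return d_xx * d_yy - d_xy ** 2

# Paraboloid I = x^2 + y^2: Ixx = Iyy = 2 and Ixy = 0, so the determinant is 4.
ys, xs = np.mgrid[0:9, 0:9].astype(float)
response = hessian_determinant(xs ** 2 + ys ** 2)
print(response[4, 4])
```

Values near the image border are less reliable because `np.gradient` falls back to one-sided differences there.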

9 Automatic Scale Selection
Function responses for increasing scale (scale signature)
[Figure: scale signature plots; axis label $f(I_{i_1 \ldots i_m}(x, \sigma))$]
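A scale signature can be sketched by evaluating a scale-normalized Laplacian-of-Gaussian at one location for several values of sigma and keeping the sigma with the strongest response. The choice of the LoG as the function f, the helper names and the synthetic blob are assumptions made for illustration.

```python
import numpy as np

def normalized_log_response(image, row, col, sigma):
    """Scale-normalized LoG response (sigma^2 * LoG) at a single pixel."""
    rows, cols = np.indices(image.shape)
    r2 = (rows - row) ** 2 + (cols - col) ** 2
    log_kernel = (r2 - 2 * sigma ** 2) / (2 * np.pi * sigma ** 6) * np.exp(-r2 / (2 * sigma ** 2))
    return sigma ** 2 * np.sum(log_kernel * image)

def select_scale(image, row, col, sigmas):
    """Return the sigma maximizing the absolute response, plus the whole signature."""
    signature = [abs(normalized_log_response(image, row, col, s)) for s in sigmas]
    return sigmas[int(np.argmax(signature))], signature

# Synthetic Gaussian blob with standard deviation 4 centered at (32, 32).
rows, cols = np.indices((65, 65))
blob = np.exp(-((rows - 32) ** 2 + (cols - 32) ** 2) / (2 * 4.0 ** 2))
best_sigma, signature = select_scale(blob, 32, 32, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])
print(best_sigma)
```

For a Gaussian blob the normalized response peaks when sigma equals the blob's own standard deviation, which is why the selected scale follows the structure size.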

10 Results: Laplacian-of-Gaussian

11 Implicit Shape Model

12 Implicit Shape Model (ISM)
Basic ideas
1. Automatically learn a large number of local parts that occur on the object
   - Also referred to as visual vocabulary or appearance codebook
2. Learn a star-topology structural model
   - Features are considered independent given the object center
[Figure: star model with features x_1 ... x_6 around the object center]
Slide credit: K. Grauman, B. Leibe

13 Visual Vocabulary / Appearance Codebook

14 Visual Vocabulary
- Detect keypoints on all training examples
- Extract feature descriptions around keypoints
- Result: a large set of local image descriptors occurring on people

15 Visual Vocabulary
- Group visually similar local descriptors, i.e. parts that are reoccurring
- Parts that occur only once are discarded (they could result from noise or unusual structures)

16 Side Note: Grouping Algorithms
- Partitional clustering
  - K-Means
  - Gaussian mixture clustering (EM)
- Hierarchical or agglomerative clustering
  - Single-link
  - Group average
  - Ward's method (minimum variance)

17 Complexity
Standard approach:
- Compute distance matrix
- Consecutively merge the two most similar clusters
- Time complexity: $O(n^2 \log n)$
- Space complexity: $O(n^2)$

18 Reciprocal Nearest Neighbor (RNN)
RNN algorithm [de Rham 80, Benzecri 82]
- Time complexity: $O(n^2)$
- Space complexity: $O(n)$
- Requirement: reducibility property [Bruynooghe 77]
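A nearest-neighbor-chain version of RNN clustering can be sketched as below. It is an illustration under stated assumptions, not the lecture's implementation: the group-average criterion is computed on squared Euclidean distances from cluster means and variances (which satisfies reducibility), and merging stops at a hypothetical threshold `max_distance`. No distance matrix is stored, so memory stays linear in the number of points.

```python
import math
from dataclasses import dataclass

@dataclass
class Cluster:
    size: int
    mean: list
    variance: float   # mean squared distance of the members to the mean
    members: list     # indices of the original points

def average_distance(a, b):
    """Group average: mean squared distance over all cross-cluster pairs."""
    gap = sum((x - y) ** 2 for x, y in zip(a.mean, b.mean))
    return a.variance + b.variance + gap

def merge(a, b):
    n, m = a.size, b.size
    gap = sum((x - y) ** 2 for x, y in zip(a.mean, b.mean))
    mean = [(n * x + m * y) / (n + m) for x, y in zip(a.mean, b.mean)]
    variance = (n * a.variance + m * b.variance) / (n + m) + n * m * gap / (n + m) ** 2
    return Cluster(n + m, mean, variance, a.members + b.members)

def rnn_clustering(points, max_distance):
    remaining = [Cluster(1, list(p), 0.0, [i]) for i, p in enumerate(points)]
    chain, links = [], []   # links[k] = distance between chain[k - 1] and chain[k]
    finished = []
    while remaining or chain:
        if not chain:                       # start a new chain
            chain.append(remaining.pop())
            links.append(math.inf)
        best, best_dist = None, math.inf    # nearest neighbor of the chain end
        for k, candidate in enumerate(remaining):
            d = average_distance(chain[-1], candidate)
            if d < best_dist:
                best, best_dist = k, d
        if best is not None and best_dist < links[-1]:
            chain.append(remaining.pop(best))   # no RNN pair yet: extend chain
            links.append(best_dist)
        elif len(chain) >= 2 and links[-1] <= max_distance:
            b, a = chain.pop(), chain.pop()     # reciprocal nearest neighbors
            del links[-2:]
            remaining.append(merge(a, b))
        else:
            finished.extend(chain)              # chain links all exceed threshold
            chain.clear()
            links.clear()
    return finished

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = rnn_clustering(points, max_distance=9.0)
print(sorted(sorted(c.members) for c in clusters))
```

Because link distances along a chain strictly decrease, a chain whose last link exceeds the threshold can be emitted as final clusters without further checks.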

19 Space Complexity
Note that space complexity is quite important for clustering large data sets.
Example: 10^5 data points
- Standard distance matrix contains 10^5 * 10^5 = 10^10 entries
- -> ~40 GB if one entry has 32 bits
- -> Does your PC have enough RAM?
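The arithmetic behind the 40 GB figure can be checked in a few lines. The helper name `distance_matrix_gigabytes` is hypothetical, and decimal gigabytes (10^9 bytes) are assumed.

```python
def distance_matrix_gigabytes(n_points, bytes_per_entry=4):
    """Memory for a full n x n distance matrix, in decimal gigabytes."""
    return n_points ** 2 * bytes_per_entry / 10 ** 9

full_matrix_gb = distance_matrix_gigabytes(10 ** 5)   # 32-bit entries
print(full_matrix_gb)
```

Storing only the upper triangle would halve this, which still leaves roughly 20 GB for 10^5 points.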

20 Clustering Hierarchy
- Agglomerative clustering produces a hierarchy
- Difficult question: where to stop?
- But ideally, clusters should be visually compact
- Distance value depends on feature dimensionality
- Appropriate ratio #features/#clusters depends on data set and interest point detector
- Needs to be selected for each detector/descriptor combination!

21 Visual Vocabulary
- Vocabulary size: ~10,000 clusters
- Probabilistic votes decide whether a part is important or not

22 Learning Spatial Structure: Star-Model

23 Implicit Shape Model - Representation
1. Learn appearance codebook
   - Extract local features at interest points
   - Agglomerative clustering -> codebook
2. Learn spatial distributions
   - Match codebook to training images
   - Record matching positions on object
Sparse representation of the object appearance

24 Training: Spatial Occurrence (Star-Model)
1. Record spatial occurrence
   - Match codebook to training images
   - Record occurrence distributions with respect to object center
   - Location (x, y) and scale
[Figure: star model and spatial occurrence distributions over x, y and scale s]

25 Occurrence Distribution
For each codebook entry, we obtain a nonparametric probability distribution of its position relative to the object center, with
- $c_i$ a codebook entry
- $\lambda = (\lambda_x, \lambda_y, \lambda_s)$ the relative position and scale
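Recording such occurrences during training can be sketched as follows. The sketch assumes Euclidean descriptor matching with a fixed threshold, offsets normalized by the feature scale, and a training object of scale 1; the function name `record_occurrences` and all values are hypothetical.

```python
import math
from collections import defaultdict

def record_occurrences(codebook, training_features, object_center, match_threshold):
    """Store lambda = (lambda_x, lambda_y, lambda_s) for every matching codebook entry."""
    center_x, center_y = object_center
    occurrences = defaultdict(list)
    for descriptor, (x, y, scale) in training_features:
        for index, entry in enumerate(codebook):
            if math.dist(descriptor, entry) < match_threshold:
                # Offset to the object center, expressed in units of the feature scale.
                occurrences[index].append(((center_x - x) / scale,
                                           (center_y - y) / scale,
                                           1.0 / scale))
    return occurrences

codebook = [(0.0, 0.0), (1.0, 1.0)]
features = [((0.1, 0.0), (10, 20, 2.0))]   # (descriptor, (x, y, scale))
occurrences = record_occurrences(codebook, features, object_center=(30, 40), match_threshold=0.5)
print(dict(occurrences))
```

The list stored per codebook entry is the nonparametric distribution: each recorded sample later casts one vote.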

26 Remember: Generalized Hough Transform [Ballard81]
- Choose a reference point for the contour (e.g. center)
- For each point on the contour, remember where it is located w.r.t. the reference point
- Remember radius r and angle φ relative to the contour tangent
- Recognition: whenever you find a contour point, calculate the tangent angle and vote for all possible reference points
- Instead of the reference point, one can also vote for a transformation
The same idea can be used with local features!
Slide credit: Bernt Schiele

27 Generalized Hough Transform
- For every feature, store possible occurrences
  - Object identity
  - Pose
  - Relative position
- For a new image, let the matched features vote for possible object positions

28 Probabilistic Gen. Hough Transform
- Exact correspondences -> Prob. match to object part
- NN matching -> Soft matching
- Feature location on obj. -> Part location distribution
- Uniform votes -> Probabilistic vote weighting
- Quantized Hough array -> Continuous Hough space

29 Detection Procedure

30 Recognition: ISM Detection Procedure
[Figure: panels Image, Probabilistic Voting, 3D Voting Space (axes x, y, s), Detection Confidences, Segmentation, Back-Projection]

31 Probabilistic Formulation
Descriptor contribution:
With
- e an extracted image descriptor
- l the position of the descriptor in the image
Marginalization over all found descriptors:
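One way to picture the voting is the sketch below. It assumes uniform match probabilities over all codebook entries within a distance threshold and uniform weights over each entry's stored occurrences; these weightings, the function name `cast_votes` and the toy data are illustrative assumptions, since the slide's formulas are not part of this transcript.

```python
import math

def cast_votes(features, codebook, occurrences, match_threshold):
    """Return weighted votes (x, y, scale, weight) for the object center."""
    votes = []
    for descriptor, (x, y, scale) in features:
        matches = [i for i, entry in enumerate(codebook)
                   if math.dist(descriptor, entry) < match_threshold and occurrences.get(i)]
        for i in matches:
            p_match = 1.0 / len(matches)                  # soft matching (assumed uniform)
            for lam_x, lam_y, lam_s in occurrences[i]:
                p_occurrence = 1.0 / len(occurrences[i])  # assumed uniform over samples
                votes.append((x + lam_x * scale, y + lam_y * scale,
                              lam_s * scale, p_match * p_occurrence))
    return votes

codebook = [(0.0, 0.0), (5.0, 5.0)]
occurrences = {0: [(10, 10, 0.5), (-10, 10, 0.5)], 1: [(0, 0, 1.0)]}
features = [((0.1, 0.0), (100, 100, 2.0))]   # (descriptor e, position and scale l)
votes = cast_votes(features, codebook, occurrences, match_threshold=0.5)
print(votes)
```

Summing the vote weights that fall near a point of the (x, y, scale) space gives the score of the corresponding object hypothesis; each descriptor contributes a total weight of at most 1.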

32 Scale Voting: Efficient Computation
- Mean-Shift formulation for refinement
- Scale-adaptive balloon density estimator
[Figure: four panels over (x, y, s): Scale votes, Binned accum. array, Candidate maxima, Refinement (MSME)]
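The refinement step can be sketched with a plain mean-shift iteration that starts from a candidate maximum. For brevity this sketch uses a fixed-bandwidth Gaussian kernel over (x, y, s) rather than the scale-adaptive balloon estimator named on the slide; the function name, bandwidth and votes are assumptions.

```python
import math

def mean_shift_mode(votes, start, bandwidth, max_iterations=50, tolerance=1e-6):
    """Move a candidate maximum to the nearby mode of the weighted votes."""
    position = tuple(start)
    for _ in range(max_iterations):
        total = 0.0
        accumulator = [0.0, 0.0, 0.0]
        for x, y, s, weight in votes:
            dist2 = sum((a - b) ** 2 for a, b in zip((x, y, s), position))
            kernel = weight * math.exp(-dist2 / (2 * bandwidth ** 2))
            total += kernel
            for axis, value in enumerate((x, y, s)):
                accumulator[axis] += kernel * value
        if total == 0.0:
            break                                   # no vote within reach
        shifted = tuple(value / total for value in accumulator)
        moved = math.dist(shifted, position)
        position = shifted
        if moved < tolerance:
            break
    return position

votes = [(99, 100, 1.0, 1.0), (101, 100, 1.0, 1.0), (100, 99, 1.0, 1.0),
         (100, 101, 1.0, 1.0), (300, 300, 2.0, 1.0)]
mode = mean_shift_mode(votes, start=(102, 102, 1.0), bandwidth=5.0)
print(mode)
```

The distant vote at (300, 300) has practically no influence, so the search settles on the center of the four clustered votes.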

33 Figure-Ground Segmentation

34 Occurrence distributions
Adding local segmentation masks
[Figure: spatial occurrence distributions over x, y and scale s, plus local figure-ground labels]

35 Figure-Ground Segmentation
- Influence of a descriptor on an object hypothesis:
- Figure probability for a hypothesis:
[Equation annotations: Segmentation information, Influence on object hypothesis]

36 Figure-Ground Segmentation
Final segmentation value:

37 Overlapping hypotheses

38 Minimum Description Length (MDL) Reasoning
Savings term:
- $S_{area}$: #pixels N in segmentation
- $S_{model}$: model cost, assumed constant
- $S_{error}$: estimate of error
Error term:
Overlapping hypotheses:

39 MDL based Verification
- Secondary hypotheses
  - Desired property of algorithm! -> robustness to occlusion
- Standard solution: reject based on bounding box
  - Problematic: may lead to missing detections!
- Use segmentations to resolve ambiguities instead
[Leibe, Leonardis, Schiele, 04]

40 Extensions and Evaluation

41 Outline
1. Image Descriptors and Interest Points
2. Body Articulations
3. Cross-Articulation Learning
4. Discriminative Hypothesis Verification
5. Instance-Specific Models

42 Star-Model and Body Articulations
Flexibility of the star-model has several drawbacks:
- Hypotheses may contain superfluous body parts
- Score of neighboring hypotheses may be reduced
- False alarms may occur on background clutter
Idea: use global constraints to remove inconsistent features
1. Silhouette verification
2. 4D-ISM

43 1. Chamfer Verification
- Candidate silhouettes
- Match to image
  1. Rescale image region
  2. Distance transform
  3. Chamfer matching
- Problem: Chamfer matching is not robust enough!
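Steps 2 and 3 can be sketched with NumPy as below. The brute-force distance transform is only meant for small illustrative images (a library routine would normally be used), and the function names, the rectangular "silhouette" and the offsets are assumptions.

```python
import numpy as np

def distance_transform(edge_map):
    """Distance of every pixel to the nearest edge pixel (brute force)."""
    edge_points = np.argwhere(edge_map)                         # (E, 2) row/col pairs
    grid = np.indices(edge_map.shape).reshape(2, -1).T          # (H*W, 2) row/col pairs
    diff = grid[:, None, :] - edge_points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1).reshape(edge_map.shape)

def chamfer_score(dist_map, template_points, offset):
    """Mean distance-transform value under the shifted template (lower is better)."""
    points = np.asarray(template_points) + np.asarray(offset)
    return dist_map[points[:, 0], points[:, 1]].mean()

# Toy edge image containing a rectangular outline.
edge_map = np.zeros((20, 20), dtype=bool)
edge_map[5, 5:13] = edge_map[10, 5:13] = True
edge_map[5:11, 5] = edge_map[5:11, 12] = True

template = np.argwhere(edge_map) - np.array([5, 5])   # silhouette contour points
dist_map = distance_transform(edge_map)
aligned = chamfer_score(dist_map, template, offset=(5, 5))
shifted = chamfer_score(dist_map, template, offset=(7, 6))
print(aligned, shifted)
```

The score is zero when the template lies exactly on the image edges and grows as it is displaced. Background clutter also produces edges and therefore low distances, which is one reason the measure alone is not robust.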

44 Chamfer Verification (2)
- Candidate silhouettes
- Match to segmentation
  1. Rescale segmentation
  2. Bhattacharyya coefficient
- Combined match: weighted sum
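The Bhattacharyya coefficient between a candidate silhouette and a segmentation can be sketched as follows, assuming both are flattened to non-negative per-pixel values and normalized to sum to one. The function name and the pixel values are made up for illustration.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i) after normalizing both inputs to sum to 1."""
    total_p, total_q = sum(p), sum(q)
    return sum(math.sqrt((a / total_p) * (b / total_q)) for a, b in zip(p, q))

silhouette = [0, 1, 1, 0, 1, 1]                  # binary candidate silhouette
segmentation = [0.1, 0.9, 0.8, 0.0, 0.7, 1.0]    # per-pixel figure probabilities
overlap = bhattacharyya_coefficient(silhouette, segmentation)
print(overlap)
```

The coefficient is 1 for identical normalized distributions and 0 when the two have no common support.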

45 Effect of the Verification Stage

46 Demo

47 2. 4D-ISM - Learning Occurrences
1. Identify typical articulations by silhouette clustering
2. Add silhouette information to occurrences
- 3D occurrence distributions: vote v = (x, y, scale)
- 4D occurrence distributions: vote v = (x, y, scale, articulation)
[Figure: pedestrian silhouettes and resulting articulation clusters]

48 Recognition: 4D-ISM Voting Procedure
Resulting hypotheses are consistent w.r.t. body articulations
[Figure: silhouette clustering and 4D voting space, with one (x, y, s) voting space per articulation 1 ... N]

49 Results 4D-ISM
[Figure: result curves for Silhouette verification and 4D-ISM]
- 6% improvement
- 20% better precision at 80% recall
- Purely global constraints are often too strict

50 Advantages and Disadvantages 4D-ISM
- Successfully removes false alarms
- Estimates body articulation
- 4D-ISM can handle partial occlusions better (compared to silhouette verification)
- Each training image contributes only to one articulation
- Number of required training images increases linearly with the number of articulations

51 Outline
1. Image Descriptors and Interest Points
2. Body Articulations
3. Cross-Articulation Learning
   - Transfer knowledge between articulations
4. Discriminative Hypothesis Verification
5. Instance-Specific Models

52 Cross-Articulation Learning
- Share descriptor occurrences across articulations
  - Less training data needed
  - Better generalization
- Need to learn, for each observed feature, on which articulations it could occur

53 Training: Sharing Features
Idea: look at a larger local context around features
- Retrieve silhouettes which are locally similar
- Share corresponding features between silhouettes
- For context radius r = object size we obtain 4D-ISM as a special case

54 Model Comparison
[Figure: comparison of Standard ISM, 4D-ISM and Cross-Articulation; labels: Codebook, Occurrence distribution (x, y, scale), Shape, Occurrences, Shape distribution]

55 Results Cross-Articulation
- Sharing features can improve performance by up to 10%
- Tested different context descriptors and sizes

56 Advantages Cross-Articulation
- Available training data is used more effectively
- Local context size and representation are independent of the feature descriptor

57 Outline
1. Image Descriptors and Interest Points
2. Body Articulations
3. Cross-Articulation Learning
4. Discriminative Hypothesis Verification
   - Verify pedestrian hypothesis with a discriminative classifier
5. Instance-Specific Models

58 Support Vector Machine (SVM) Verification
- Extract correct and false detections of the ISM
- Discriminate between correct and false examples
- Use an SVM with a Local Kernel [Wallraven et al. 03, Fritz et al. 05]
[Figure: ISM false alarms vs. correct pedestrians]

59 Results SVM Verification
- Precision of 98.6% at 80% recall
- At 80% recall: 1 false alarm in 90 images
- Compared to 528 false alarms for the standard approach

60 Advantages SVM Verification
- Increased detection precision
- SVM has to solve an easier problem than actual detection
- Common representation for generative and discriminative model
- Detection works best for relatively high-resolution pedestrians

61 Multi-Viewpoint Detections

62 Example Detections and Articulations
[Figure: detections and corresponding articulations]

63 Crowded Scene Movie
- Single frame detection!
- No temporal information used!

64 Outline
1. Image Descriptors and Interest Points
2. Body Articulations
3. Cross-Articulation Learning
4. Discriminative Hypothesis Verification
5. Instance-Specific Models
   - Facilitate detection in video sequences
   - Adapt general model to a specific pedestrian appearance
[Figure: general model vs. instance model]

65 Instance-Specific Models
- Learn the specific appearance of a person in the first frames of a sequence
- The model is built from the results of the general person detector
- Use a specialized model in subsequent frames
[Figure: general model vs. instance model]

66 Instance-Specific Models and Articulation

67 Evaluation Results
Requires appropriate normalization of the specific models, i.e. ensure comparability of models

68 Instance-Specific Models
- Learn the appearances of pedestrians in a video sequence from detections of the general model
- Follow pedestrians even through longer periods of occlusion
- No background subtraction used!

69 Summary
Many factors influence the performance of a detection algorithm:
- Interest point detector
- Feature descriptor
- Verification stages
- Non-maximum suppression
Using a flexible and powerful object model enables many improvements which could not be applied as easily to standard discriminative methods.

70 End of Lecture

Visual Perception for Human-Machine Interfaces, lecture, WS 2009, Prof. Dr. Rainer Stiefelhagen, Dr. Edgar Seemann, Institut für Anthropomatik, Universität Karlsruhe (TH), http://cvhci.ira.uka.de