Part-Based Models for Object Class Recognition Part 2

High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv

Class of Object Models: Part-Based Models / Pictorial Structures
Pictorial Structures [Fischler & Elschlager 1973]. The model has two components:
- parts (2D image fragments)
- structure (configuration of parts)

State-of-the-Art in Object Class Representations
- Bag of Words Models (BoW): object model = histogram of local features, e.g. local features around interest points. BoW: no spatial relationships.
- Global Object Models: object model = global object feature, e.g. HOG (Histogram of Oriented Gradients). HOG: fixed spatial relationships.
- Part-Based Object Models: object model = models of parts & spatial topology model, e.g. constellation model or ISM (Implicit Shape Model). ISM: flexible spatial relationships.
But: What is the Ideal Notion of Parts here? And: Should those Parts be Semantic?

Constellation of Parts [Weber, Welling, Perona, 00; Fergus, Zisserman, Perona, 03]
Fully connected shape model over the parts $x_1, \dots, x_6$.

Constellation Model [Weber, Welling, Perona, 00; Fergus, Zisserman, Perona, 03]
Joint model for appearance and structure (= shape):
- X: positions, A: part appearance, S: scale
- h: hypothesis = assignment of features (in the image) to parts (of the model)
- Components: Gaussian part appearance pdf, Gaussian shape pdf, Gaussian relative scale pdf (over log(scale)), probability of detection

Object Detection: ISM (Implicit Shape Model)
Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]

Spatial Models for Categorization
- Fully connected shape model, e.g. Constellation Model: parts fully connected; recognition complexity $O(N^P)$; method: exhaustive search.
- Star shape model, e.g. ISM (Implicit Shape Model): parts mutually independent; recognition complexity $O(NP)$; method: Generalized Hough Transform.
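To make the gap between the two complexities concrete, here is a minimal sketch that counts the work for each model. It assumes N is the number of image features and P the number of model parts (the slide does not define the symbols), and the function names are hypothetical.

```python
def exhaustive_hypotheses(num_features: int, num_parts: int) -> int:
    """Fully connected model: every joint assignment of features to parts, N^P."""
    return num_features ** num_parts


def star_model_votes(num_features: int, num_parts: int) -> int:
    """Star model: each (feature, part) pair is handled independently, N*P."""
    return num_features * num_parts


if __name__ == "__main__":
    for n, p in [(10, 3), (100, 6)]:
        print(f"N={n}, P={p}: exhaustive={exhaustive_hypotheses(n, p)}, "
              f"star={star_model_votes(n, p)}")
```

With 100 features and 6 parts the exhaustive count is already 10^12, while the star model needs 600 evaluations.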

Implicit Shape Model: What are Good Parts? [Leibe,Schiele@bmvc03] [Leibe,Leonardis,Schiele@ijcv08]
Parts of the Implicit Shape Model:
- parts = feature clusters
- lots of parts (on the order of 1,000-10,000 codebook entries): the more the better!
- parts are mostly non-semantic
Parts = (mostly non-semantic) feature clusters is also true for:
- bag of words models
- the constellation model (much fewer parts, but still feature clusters)
- ...

Part-Based Models - Overview
Last Week:
- Part-Based Models based on Manual Labeling of Parts: Detection by Components, Multi-Scale Parts
- The Constellation Model: automatic discovery of parts and part-structure
- The Implicit Shape Model (ISM): star-model of part configurations, parts obtained by clustering interest-points
Today:
- Pictorial Structures Model
- Learning Object Model from CAD Data
- Deformable Parts Model (DPM)
- Discussion: Semantic Parts vs. Discriminative Parts

People Detection: partISM
Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]

People Detection: partISM
Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]
Part decomposition and inference: Pictorial structures model [Felzenszwalb & Huttenlocher, IJCV 2005]
$$p(L \mid D) \propto p(D \mid L)\, p(L)$$
with L: body-part positions, D: image evidence.
Figure: nodes $x_1, \dots, x_8$ and $x_o$.

Pictorial Structures Model
$$p(L \mid D) = \frac{p(D \mid L)\, p(L)}{p(D)} \propto p(D \mid L)\, p(L)$$
with L: body-part positions, D: image evidence.
Two Components:
- Prior (capturing possible part configurations): $p(L)$
- Likelihood of Parts (capturing part appearance): $p(D \mid L)$

Pictorial Structures: Model Components [Andriluka,Roth,Schiele@cvpr09]
Body is represented as flexible configuration of body parts:
posterior over body poses $p(L \mid D)$ $\propto$ likelihood of observations $p(D \mid L)$ $\times$ prior on body poses $p(L)$
Figure: pipeline from local features and AdaBoost to part likelihoods (parts up to N, orientations up to K), part posteriors, and the estimated pose.

Kinematic Tree Prior (modeling the structure)
Represent pairwise part relations [Felzenszwalb & Huttenlocher, IJCV 05]:
$$p(L) = p(l_0) \prod_{(i,j) \in E} p(l_i \mid l_j)$$
$$p(l_2 \mid l_1) = \mathcal{N}\big(T_{12}(l_2) \mid T_{21}(l_1),\ \Sigma^{12}\big)$$
Figure: kinematic tree over the parts $l_1, \dots, l_{10}$; part locations relative to the joint and transformed part locations for $l_1$ and $l_2$.
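The factorised prior can be evaluated edge by edge in log space. The following is a minimal sketch under simplifying assumptions that are not in the slides: part locations are 2D points only (no orientation or scale), each transformation is a plain translation of the part centre to the shared joint, and the covariance is diagonal. All function and parameter names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a 2D Gaussian with diagonal covariance."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (a - m) ** 2 / v
        for a, m, v in zip(x, mean, var)
    )


def log_tree_prior(locations, edges, log_root_prior=0.0):
    """log p(L) = log p(l_0) + sum over edges (i, j) of log p(l_i | l_j).

    locations: dict mapping part name -> (x, y)
    edges: list of (child, parent, child_offset, parent_offset, var), where the
           offsets move each part centre to the joint (a translation-only
           stand-in for the transformations T) and var is the diagonal of Sigma.
    """
    total = log_root_prior
    for child, parent, off_c, off_p, var in edges:
        joint_from_child = tuple(c + o for c, o in zip(locations[child], off_c))
        joint_from_parent = tuple(p + o for p, o in zip(locations[parent], off_p))
        total += log_gauss_diag(joint_from_child, joint_from_parent, var)
    return total


if __name__ == "__main__":
    edges = [("head", "torso", (0.0, 15.0), (0.0, -15.0), (4.0, 4.0))]
    upright = {"torso": (0.0, 0.0), "head": (0.0, -30.0)}
    displaced = {"torso": (0.0, 0.0), "head": (20.0, -30.0)}
    print(log_tree_prior(upright, edges), log_tree_prior(displaced, edges))
```

A pose in which both parts agree on the joint position gets the highest prior; moving the head sideways lowers it quadratically.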

Kinematic Tree Prior
Prior parameters: $\{T_{ij}, \Sigma^{ij}\}$
Parameters of the prior are estimated with maximum likelihood.
Figure: mean pose and several independent samples from the kinematic prior learned on the multi-view ...
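For a Gaussian, the maximum-likelihood estimate is the sample mean and the sample covariance normalised by n. A minimal sketch, assuming the training data for one edge has already been reduced to 2D relative joint offsets (a simplification of the slide's parameters; the function name is hypothetical):

```python
def fit_gaussian_ml(samples):
    """Maximum-likelihood mean and 2x2 covariance of 2D samples.

    The covariance divides by n (the ML estimate), not by n - 1.
    """
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / n
    syy = sum((s[1] - my) ** 2 for s in samples) / n
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))


if __name__ == "__main__":
    offsets = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    print(fit_gaussian_ml(offsets))
```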

Pictorial Structures: Model Components [Andriluka,Roth,Schiele@cvpr09]
Body is represented as flexible configuration of body parts:
posterior over body poses $p(L \mid D)$ $\propto$ likelihood of observations $p(D \mid L)$ $\times$ prior on body poses $p(L)$
Figure: pipeline from local features and AdaBoost to part likelihoods (parts up to N, orientations up to K), part posteriors, and the estimated pose.

Likelihood Model
Assumption: evidence (image features) for each part is independent of all other parts:
$$p(D \mid L) = \prod_{i=0}^{N} p(d_i \mid l_i)$$
The assumption is clearly not correct, but it
- allows efficient computation
- works rather well in practice
Training data for different body parts should cover all appearances.

Likelihood Model
Build on recent advances in object detection:
- state-of-the-art image descriptor: Shape Context [Belongie et al., PAMI 02; Mikolajczyk & Schmid, PAMI 05]
- dense representation
- discriminative model: AdaBoost classifier for each body part
  - Shape Context: 96 dimensions (4 angular, 3 radial, 8 gradient orientations)
  - Feature Vector: concatenate the descriptors inside part bounding box
    - head: 4032 dimensions
    - torso: 8448 dimensions
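The dimensions quoted above can be checked with a few lines of arithmetic. The number of descriptors per part box is not stated on the slide; the sketch below only derives what the quoted totals imply if every descriptor has 96 dimensions. The names are hypothetical.

```python
ANGULAR_BINS, RADIAL_BINS, ORIENTATION_BINS = 4, 3, 8
DESCRIPTOR_DIM = ANGULAR_BINS * RADIAL_BINS * ORIENTATION_BINS


def descriptors_in_box(feature_dim: int) -> int:
    """Number of concatenated descriptors implied by a feature vector length."""
    if feature_dim % DESCRIPTOR_DIM != 0:
        raise ValueError("feature length is not a multiple of the descriptor size")
    return feature_dim // DESCRIPTOR_DIM


if __name__ == "__main__":
    print(DESCRIPTOR_DIM)            # 96
    print(descriptors_in_box(4032))  # 42 (head)
    print(descriptors_in_box(8448))  # 88 (torso)
```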

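Likelihood Model
Part likelihood derived from the boosting score:
$$p(d_i \mid l_i) = \max\left( \frac{\sum_t \alpha_{i,t}\, h_t(x(l_i))}{\sum_t \alpha_{i,t}},\ \varepsilon_0 \right)$$
- $\alpha_{i,t}$: decision stump weight
- $h_t$: decision stump output
- $l_i$: part location
- $\varepsilon_0$: small constant to deal with part occlusions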
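As a rough illustration of this formula, the sketch below normalises a weighted sum of stump outputs and clips it from below. It assumes stump outputs in {-1, +1} and an arbitrary value for the small constant; neither is given on the slide, and the function name is hypothetical.

```python
def part_likelihood(stump_outputs, stump_weights, eps0=1e-4):
    """Normalised boosting score, floored at eps0 to tolerate occluded parts.

    stump_outputs: decision stump outputs h_t(x(l_i)), assumed to be -1 or +1
    stump_weights: positive AdaBoost weights alpha_{i,t}
    eps0: assumed value for the small constant
    """
    weighted = sum(a * h for a, h in zip(stump_weights, stump_outputs))
    return max(weighted / sum(stump_weights), eps0)


if __name__ == "__main__":
    print(part_likelihood([+1, +1, -1], [0.5, 0.3, 0.2]))
    print(part_likelihood([-1, -1, -1], [0.5, 0.3, 0.2]))
```

The floor keeps a single badly scoring (for example occluded) part from driving the product of part likelihoods to zero or below.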

Likelihood Model
Figure: input image and part likelihood maps for head, torso and upper leg; our part likelihoods compared with [Ramanan, NIPS 06].

Part-Based Model: 2D Human Pose Estimation [Andriluka,Roth,Schiele@cvpr09]

Pictorial Structures for Human Pose Estimation: What are Good Parts?
Parts of the Pictorial Structures Model:
- parts = semantic body parts
- pose estimation = estimation of body part configuration
- semantic body parts make it possible to use motion capture data, etc. to improve the kinematic tree prior
- non-semantic parts (e.g. in the ISM model) are more difficult to generalize across human body poses

Part-Based Models - Overview
Last Week:
- Part-Based Models based on Manual Labeling of Parts: Detection by Components, Multi-Scale Parts
- The Constellation Model: automatic discovery of parts and part-structure
- The Implicit Shape Model (ISM): star-model of part configurations, parts obtained by clustering interest-points
Today:
- Pictorial Structures Model
- Learning Object Model from CAD Data
- Deformable Parts Model (DPM)
- Discussion: Semantic Parts vs. Discriminative Parts

Back to the Future: Learning Shape Models from 3D CAD Data [Stark,Goesele,Schiele@bmvc10]
3D Computer Aided Design (CAD) Models:
- Computer graphics, game design
- Polygonal meshes + texture descriptions
- semantic part annotations (may) exist
Can we learn Object Class Models directly from 3D CAD data?
Issue: Transition between 3D CAD models and 2D real-world images

Constellation Model [Weber, Welling, Perona, 00; Fergus, Zisserman, Perona, 03]
Joint model for appearance and structure (= shape):
- X: positions, A: part appearance, S: scale
- h: hypothesis = assignment of features (in the image) to parts (of the model)
- Components: Gaussian part appearance pdf, Gaussian shape pdf, Gaussian relative scale pdf (over log(scale)), probability of detection

Three Tools to Meet the Challenge [Stark,Goesele,Schiele@bmvc10]
1. Shape-based appearance abstraction
   - Non-photorealistic rendering
   - Local shape + global geometry
2. Discriminative part detectors
   - Robust local shape features
   - AdaBoost classifiers
3. Powerful spatial model
   - Full covariance
   - Efficient DDMCMC inference

Shape-based Appearance Abstraction [Stark,Goesele,Schiele@bmvc10]
Non-Photorealistic Rendering:
- Learn shape models from rendered images
- We do NOT render photo-realistically / with texture
- But focus on 3D CAD model edges (mimic real-world image edges)
- Part boundaries + mesh creases + silhouette = final edges (hidden edges removed)

Shape: Local Shape + Global Geometry [Stark,Goesele,Schiele@bmvc10]
Part-based object class representation:
- Semantic parts from 3D CAD models: left front wheel, left front door, etc.
- Viewpoints: left, front-left, front, ...
- Parts: local shape + global geometry

Qualitative Results [Stark,Goesele,Schiele@bmvc10]
Three strongest true positive detections per viewpoint model (shown: right, back-right).
Observations:
- Accurate part localization
- Predicted viewpoints do match

Shape Model learned from 3D CAD Data: What are Good Parts?
Parts of the Shape Model:
- parts = just a means to enable correspondence across 3D models
- semantics of parts:
  - in our case: yes, because of the employed 3D models
  - but: semantics neither necessary nor important
Figure: viewpoints left, front-left, front, ...

Part-Based Models - Overview
Last Week:
- Part-Based Models based on Manual Labeling of Parts: Detection by Components, Multi-Scale Parts
- The Constellation Model: automatic discovery of parts and part-structure
- The Implicit Shape Model (ISM): star-model of part configurations, parts obtained by clustering interest-points
Today:
- Pictorial Structures Model
- Learning Object Model from CAD Data
- Deformable Parts Model (DPM)
- Discussion: Semantic Parts vs. Discriminative Parts

Slide courtesy: Pedro Felzenszwalb

Starting Point: Sliding Window Method
Sliding Window Based People Detection:
Scan Image -> Extract Feature Vector -> Classify Feature Vector -> Non-Maxima Suppression
For example: slide detection window over all positions & scales
Two Important Questions:
1) which feature vector
2) which classifier
HOG Pedestrian Detector:
- HOG descriptor
- linear SVM
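The four stages of the pipeline can be sketched in a few functions. This is a toy, single-scale version: `score_fn` is a hypothetical stand-in for "extract feature vector + classify" (for example a HOG descriptor scored by a linear SVM), the image is a nested list, and the greedy overlap-based suppression is one common choice rather than the specific method used in the lecture. A full detector would repeat the scan over an image pyramid to cover all scales.

```python
def sliding_windows(width, height, win_w, win_h, stride):
    """Yield boxes (x0, y0, x1, y1) covering all positions at one scale."""
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def iou(a, b):
    """Intersection over union of two boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap a kept one."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


def detect(image, win_w, win_h, stride, score_fn, threshold=0.0, iou_threshold=0.5):
    """Scan, score each window with score_fn, threshold, then suppress."""
    height, width = len(image), len(image[0])
    detections = []
    for box in sliding_windows(width, height, win_w, win_h, stride):
        score = score_fn(image, box)
        if score > threshold:
            detections.append((score, box))
    return non_max_suppression(detections, iou_threshold)
```

The two questions on the slide map directly onto `score_fn`: which feature vector it extracts from the window and which classifier turns that vector into a score.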

Histogram of Oriented Gradients (HOG): Static Feature Extraction
Figure: input image and detection window.

Figure labels: $F(p, H) < 0$ and $F(p, H) > 0$

Efficient Computation
Overall score:
$$\mathrm{score}(p_0, \dots, p_n) = \sum_{i=0}^{n} F_i(H, p_i) - \sum_{i=1}^{n} d_i \cdot (dx_i^2, dy_i^2)$$
Maximization can be done separately:
$$\mathrm{score}(p_0) = \max_{p_1, \dots, p_n} \mathrm{score}(p_0, \dots, p_n) = F_0(H, p_0) + \max_{p_1} \big[ F_1(H, p_1) - d_1 \cdot (dx_1^2, dy_1^2) \big] + \dots + \max_{p_n} \big[ F_n(H, p_n) - d_n \cdot (dx_n^2, dy_n^2) \big]$$
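Because each part term depends only on the root position and that part's own placement, the maximisation splits into one independent search per part. The sketch below does that search by brute force over precomputed filter response maps. It assumes that (dx, dy) is the displacement of a part from an anchor position relative to the root, and that root and parts share one resolution; the slide does not spell out either point, and all names are hypothetical.

```python
def root_score(root_resp, part_resps, anchors, deform_weights, p0):
    """score(p0) = F_0(H, p0) + sum_i max_{p_i} [F_i(H, p_i) - d_i . (dx_i^2, dy_i^2)].

    root_resp: 2D list, response of the root filter F_0 at every position
    part_resps: list of 2D lists, one response map F_i per part
    anchors: list of (ax, ay), assumed ideal part offset from the root position
    deform_weights: list of (wx, wy), the deformation cost d_i
    p0: (x0, y0), root position to score
    """
    x0, y0 = p0
    total = root_resp[y0][x0]
    for resp, (ax, ay), (wx, wy) in zip(part_resps, anchors, deform_weights):
        best = float("-inf")
        for y, row in enumerate(resp):
            for x, response in enumerate(row):
                dx = x - (x0 + ax)
                dy = y - (y0 + ay)
                best = max(best, response - (wx * dx * dx + wy * dy * dy))
        total += best
    return total
```

This brute-force search costs one pass over every part map per root position. Practical implementations avoid that by computing the inner maxima for all root positions at once with a generalized distance transform.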

SVM training
Classifier scores an example $x$ using $f(x) = \beta \cdot \Phi(x)$
- model parameter: $\beta$
- feature vector: $\Phi(x)$
Linear SVM objective: maximize margin (for best generalization)

SVM training
Training data: $D = (\langle x_1, y_1 \rangle, \dots, \langle x_n, y_n \rangle)$ with $y_i \in \{-1, 1\}$
Constraints:
- $f(x_i) \geq +1$ for $y_i = +1$
- $f(x_i) \leq -1$ for $y_i = -1$
$\Rightarrow\ y_i f(x_i) \geq +1$ (margin)
Training error: $0 \geq 1 - y_i f(x_i)\ \Rightarrow\ \sum_{i=1}^{n} \max(0, 1 - y_i f(x_i))$

SVM training
Two objectives:
- maximize margin: $\min_\beta \frac{1}{2} \|\beta\|^2$
- minimize training error: $\min_\beta \sum_{i=1}^{n} \max(0, 1 - y_i f(x_i))$
Therefore minimize (primal formulation):
$$\min_\beta L(\beta), \qquad L(\beta) = \frac{1}{2} \|\beta\|^2 + \sum_{i=1}^{n} \max(0, 1 - y_i f(x_i))$$
Hinge loss: $H(z) = \max(0, 1 - z)$
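The primal objective can be minimised directly with subgradient descent. The following is a minimal sketch, not the training procedure used in the lecture: it assumes the feature vector is the raw input itself, uses a fixed step size, and has no bias term or regularisation constant. All names are hypothetical.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def hinge(z):
    """Hinge loss H(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def objective(beta, data):
    """L(beta) = 0.5 * ||beta||^2 + sum_i H(y_i * f(x_i)) with f(x) = beta . x."""
    return 0.5 * dot(beta, beta) + sum(hinge(y * dot(beta, x)) for x, y in data)


def subgradient_step(beta, data, lr):
    """One descent step; only examples with margin below 1 contribute to the loss term."""
    grad = list(beta)  # gradient of the regulariser 0.5 * ||beta||^2
    for x, y in data:
        if y * dot(beta, x) < 1.0:
            for k in range(len(beta)):
                grad[k] -= y * x[k]
    return [b - lr * g for b, g in zip(beta, grad)]


def train(data, dim, lr=0.05, steps=200):
    beta = [0.0] * dim
    for _ in range(steps):
        beta = subgradient_step(beta, data, lr)
    return beta


if __name__ == "__main__":
    data = [((2.0, 0.0), +1), ((-2.0, 0.0), -1)]
    beta = train(data, dim=2)
    print(beta, objective(beta, data))
```

Examples that already satisfy the margin constraint contribute nothing to the gradient, which is why only a subset of the training data ends up shaping the solution.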

Slides from Dan Huttenlocher

Part-Based Models - Overview
Last Week:
- Part-Based Models based on Manual Labeling of Parts: Detection by Components, Multi-Scale Parts
- The Constellation Model: automatic discovery of parts and part-structure
- The Implicit Shape Model (ISM): star-model of part configurations, parts obtained by clustering interest-points
Today:
- Pictorial Structures Model
- Learning Object Model from CAD Data
- Deformable Parts Model (DPM)
- Discussion: Semantic Parts vs. Discriminative Parts

What are Ideal Parts for Part-Based Object Models?
Parts can/may
- be semantic body parts, e.g. for articulated human body pose estimation
- be feature clusters (typically many clusters), e.g. ISM, constellation model, BoW
- support learnability of discriminant appearance, e.g. DPM model
- enable correspondence across 3D models
- ...
In all those cases: the most important property is that parts facilitate correspondence across object instances.

What are Ideal Parts for Part-Based Object Models?
Multiple motivations for part-based models exist:
- intuitiveness: semantic meaning of parts/attributes is attractive (e.g. enables use of language sources)
- learnability: sharing of parts/attributes across instances/classes
- scalability: transferability of parts/attributes across classes
In general, parts support learnability and scalability when they facilitate correspondence
- across object instances
- across object classes
- across modalities (e.g. from language to visual appearance)
and semantics is only a secondary concern (for intuitiveness).

What are Ideal Parts for Part-Based Object Models? The Good, the Bad, and the Ugly
Thanks to: Micha Andriluka, Bastian Leibe, Sandra Ebert, Mario Fritz, Diane Larlus, Marcus Rohrbach, Paul Schnitzspan, Stefan Roth, Michael Stark, Michael Goesele