High Level Computer Vision

1 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de

2 Please Note
- No lecture next week (June 25)
- Next lecture: July 2nd

3 Class of Object Models: Part-Based Models / Pictorial Structures
Pictorial Structures [Fischler & Elschlager 1973]
Model has two components:
- parts (2D image fragments)
- structure (configuration of parts)

4 State-of-the-Art in Object Class Representations
- Bag of Words Models (BoW): object model = histogram of local features, e.g. local features around interest points. No spatial relationships.
- Global Object Models: object model = global object feature, e.g. HOG (Histogram of Oriented Gradients). Fixed spatial relationships.
- Part-Based Object Models: object model = models of parts & spatial topology model, e.g. constellation model or ISM (Implicit Shape Model). Flexible spatial relationships.
But: What is the ideal notion of parts here? And: should those parts be semantic?
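
The bag-of-words idea above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the names `codebook`, `descriptors`, `nearest_codeword`, and `bow_histogram` are hypothetical, and the codebook is assumed to be given (in practice it would come from clustering local features).

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook):
    """Count how many local descriptors fall on each codeword; positions are ignored."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    return hist


# Toy data (assumed): two codewords, three local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)]
hist = bow_histogram(descriptors, codebook)
```

Because only counts are kept, reordering the descriptors (or moving the features around in the image) leaves the histogram unchanged, which is what "no spatial relationships" means here.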

5 Constellation of Parts [Weber, Welling, Perona, 00; Fergus, Zisserman, Perona, 03]
Fully connected shape model over the parts $x_1, \dots, x_6$

6 Constellation Model [Weber, Welling, Perona, 00; Fergus, Zisserman, Perona, 03]
Joint model for appearance and structure (= shape)
- X: positions, A: part appearance, S: scale
- h: hypothesis = assignment of features (in the image) to parts (of the model)
Model terms:
- Gaussian part appearance pdf
- Gaussian shape pdf
- Gaussian relative scale pdf (over log(scale))
- probability of detection
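
As a hedged sketch, the four terms listed on this slide are commonly combined as a sum over all hypotheses h. The parameter symbol $\theta$ and the hypothesis set $H$ are assumptions that do not appear on the slide:

```latex
p(X, S, A \mid \theta)
  = \sum_{h \in H}
    \underbrace{p(A \mid X, S, h, \theta)}_{\text{part appearance}}\,
    \underbrace{p(X \mid S, h, \theta)}_{\text{shape}}\,
    \underbrace{p(S \mid h, \theta)}_{\text{relative scale}}\,
    \underbrace{p(h \mid \theta)}_{\text{detection}}
```

The sum over h is what makes recognition expensive: every assignment of image features to model parts has to be considered.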

7 Object Detection: ISM (Implicit Shape Model)
Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]

8 Spatial Models for Categorization
Fully connected shape model (e.g. Constellation Model):
- parts fully connected
- recognition complexity: $O(N^P)$
- method: exhaustive search
Star shape model (e.g. ISM, Implicit Shape Model):
- parts mutually independent
- recognition complexity: $O(NP)$
- method: Generalized Hough Transform
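
To make the two complexities concrete, here is a small sketch. It assumes N is the number of image features and P the number of model parts (the slide does not define them), and it uses a strongly simplified Hough voting step in which every feature is already matched to a part and votes for one object center. All function names are hypothetical.

```python
from collections import Counter


def exhaustive_hypotheses(num_features, num_parts):
    """Number of feature-to-part assignments a fully connected model must score: N^P."""
    return num_features ** num_parts


def star_model_comparisons(num_features, num_parts):
    """Feature/part comparisons when parts are independent given the center: N * P."""
    return num_features * num_parts


def hough_center(matched_features, offsets):
    """Each matched feature votes for an object center; return the best (center, votes).

    matched_features: list of (part_id, x, y)
    offsets: part_id -> (dx, dy), displacement from the part to the object center
    """
    votes = Counter()
    for part_id, x, y in matched_features:
        dx, dy = offsets[part_id]
        votes[(x + dx, y + dy)] += 1
    return votes.most_common(1)[0]


# Toy example (assumed values): two features agree on the center (12, 12).
offsets = {0: (2, 2), 1: (-2, 2)}
features = [(0, 10, 10), (1, 14, 10), (0, 3, 3)]
best_center, best_votes = hough_center(features, offsets)
```

With 100 features and 6 parts, the exhaustive search faces 10^12 hypotheses while the star model needs 600 comparisons, which is why the voting scheme scales to large codebooks.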

9 Implicit Shape Model: What are Good Parts?
Parts of the Implicit Shape Model:
- parts = (mostly non-semantic) feature clusters
- lots of parts (on the order of the number of codebook entries): the more the better
This is also true for:
- bag of words models
- the constellation model (much fewer parts, but still feature clusters)
- ...

10 Part-Based Models - Overview Today (more next week)
Last week:
- Part-based models based on manual labeling of parts: Detection by Components, Multi-Scale Parts
- The Constellation Model: automatic discovery of parts and part-structure
- The Implicit Shape Model (ISM): star-model of part configurations, parts obtained by clustering interest points
Today:
- Pictorial Structures Model
- Hierarchical Graphical Model
- Learning Object Model from CAD Data
- Discussion: Semantic Parts vs. Discriminative Parts

12 People Detection: partISM
Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]

13 People Detection: partISM
- Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]
- Part decomposition and inference: pictorial structures model [Felzenszwalb & Huttenlocher, IJCV 2005]
$$p(L \mid D) \propto p(D \mid L)\, p(L)$$
where L denotes the body-part positions and D the image evidence.

14 Can we make the pictorial structures model effective for these tasks? [Fischler & Elschlager, 1973]

15 Can we make the pictorial structures model effective for these tasks? Yes... if the model components are chosen right.

16 Can we make the pictorial structures model effective for these tasks? So... what are the right components?

17 Pictorial Structures Model
Posterior over the body-part positions L given the image evidence D:
$$p(L \mid D) = \frac{p(D \mid L)\, p(L)}{p(D)}$$
Two components:
- Prior (capturing possible part configurations): $p(L)$
- Likelihood of parts (capturing part appearance): $p(D \mid L)$

18 Pictorial Structures: Model Components [Andriluka,Roth,Schiele@cvpr09]
Body is represented as a flexible configuration of body parts:
$$p(L \mid D) \propto p(D \mid L)\, p(L)$$
- $p(L \mid D)$: posterior over body poses
- $p(D \mid L)$: likelihood of observations
- $p(L)$: prior on body poses
[Figure: local features and AdaBoost give the likelihood of each part 1..N at each orientation 1..K; these yield the part posteriors and the estimated pose]

19 Kinematic Tree Prior (modeling the structure)
Represent pairwise part relations [Felzenszwalb & Huttenlocher, IJCV 05]:
$$p(L) = p(l_0) \prod_{(i,j) \in E} p(l_i \mid l_j)$$
$$p(l_2 \mid l_1) = \mathcal{N}\big(T_{12}(l_2) \mid T_{21}(l_1),\ \Sigma_{12}\big)$$
The transformations $T_{12}$ and $T_{21}$ express the part locations relative to the joint between the two parts (transformed part locations).
[Figure: kinematic tree over the parts $l_1, \dots, l_{10}$]
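
The tree-structured prior can be evaluated edge by edge. The sketch below makes several simplifying assumptions that are not on the slide: part locations are 2D points, each transformation T is a pure translation to the joint position, the covariance is isotropic, and the root prior $p(l_0)$ is uniform (its constant is dropped). The function and variable names are hypothetical.

```python
import math


def log_gauss_2d(x, mean, sigma):
    """Log density of an isotropic 2D Gaussian N(x | mean, sigma^2 I)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return -(dx * dx + dy * dy) / (2.0 * sigma ** 2) - math.log(2.0 * math.pi * sigma ** 2)


def log_tree_prior(locations, edges):
    """Sum of log p(l_i | l_j) over the tree edges (uniform root prior assumed).

    locations: part name -> (x, y)
    edges: list of (child, parent, offset_child, offset_parent, sigma), where the
           offsets play the role of T: they map each part location to the joint.
    """
    total = 0.0
    for child, parent, off_child, off_parent, sigma in edges:
        lc, lp = locations[child], locations[parent]
        joint_from_child = (lc[0] + off_child[0], lc[1] + off_child[1])
        joint_from_parent = (lp[0] + off_parent[0], lp[1] + off_parent[1])
        total += log_gauss_2d(joint_from_child, joint_from_parent, sigma)
    return total


# Toy model (assumed): a head attached to a torso through a neck joint.
edges = [("head", "torso", (0.0, 2.0), (0.0, -3.0), 1.0)]
plausible = {"torso": (0.0, 0.0), "head": (0.0, -5.0)}   # both parts agree on the joint
implausible = {"torso": (0.0, 0.0), "head": (4.0, -5.0)}  # head shifted sideways
```

A configuration in which both parts predict the same joint position gets the highest prior; the further the two predictions disagree, the lower the score.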

20 Kinematic Tree Prior
- Prior parameters: $\{T_{ij}, \Sigma_{ij}\}$
- Parameters of the prior are estimated with maximum likelihood
[Figure: mean pose and several independent samples; kinematic prior learned on the multi-view ...]

22 Likelihood Model
Assumption: the evidence (image features) for each part is independent of all other parts:
$$p(D \mid L) = \prod_{i=0}^{N} p(d_i \mid l_i)$$
- the assumption is clearly not correct, but allows efficient computation
- works rather well in practice
- training data for different body parts should cover all appearances

23 Likelihood Model
Build on recent advances in object detection:
- state-of-the-art image descriptor: Shape Context [Belongie et al., PAMI 02; Mikolajczyk&Schmid, PAMI 05]
- dense representation
- discriminative model: AdaBoost classifier for each body part
  - Shape Context: 96 dimensions (4 angular, 3 radial, 8 gradient orientations)
  - Feature vector: concatenate the descriptors inside the part bounding box
  - head: 4032 dimensions
  - torso: 8448 dimensions
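
The dimensions quoted above can be checked with a few lines of arithmetic. The descriptor counts per part (42 and 88) are not stated on the slide; they are simply what the quoted dimensions imply if every descriptor has 96 entries. The helper name is hypothetical.

```python
ANGULAR_BINS = 4
RADIAL_BINS = 3
ORIENTATION_BINS = 8

# One Shape Context descriptor: 4 * 3 * 8 = 96 dimensions.
DESCRIPTOR_DIM = ANGULAR_BINS * RADIAL_BINS * ORIENTATION_BINS


def descriptors_per_part(feature_dim):
    """Number of concatenated descriptors implied by a part's feature vector length."""
    count, remainder = divmod(feature_dim, DESCRIPTOR_DIM)
    if remainder != 0:
        raise ValueError("feature dimension is not a multiple of the descriptor size")
    return count


head_descriptors = descriptors_per_part(4032)
torso_descriptors = descriptors_per_part(8448)
```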

24 AdaBoost (Freund & Schapire 95)
Final classifier is a weighted combination of weak classifiers:
$$f(x) = \sum_t \alpha_t h_t(x)$$
- Initial uniform weight on training examples
- Weight of weak classifier t:
$$\alpha_t = 0.5 \log\left(\frac{1 - \mathrm{error}_t}{\mathrm{error}_t}\right)$$
- Incorrect classifications are re-weighted more heavily:
$$w_t^i = \frac{w_{t-1}^i\, e^{-y_i \alpha_t h_t(x_i)}}{\sum_i w_{t-1}^i\, e^{-y_i \alpha_t h_t(x_i)}}$$
[Figure: weak classifiers 1, 2, and 3 fitted to successively re-weighted training examples]
Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001
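
The update rules above can be sketched in a few lines. This is a minimal illustration with 1D decision stumps on toy data, not the part detector used in the lecture; the function names, the stump parameterization (threshold and polarity), and the clamping of the error away from 0 and 1 are assumptions.

```python
import math


def stump(x, threshold, polarity):
    """Decision stump: returns polarity if x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def train_adaboost(xs, ys, rounds):
    """Train AdaBoost with decision stumps on 1D inputs xs and labels ys in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n            # initial uniform weight on training examples
    ensemble = []                      # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for thr in sorted(set(xs)):    # pick the stump with the lowest weighted error
            for pol in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights) if stump(x, thr, pol) != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified examples (y * h = -1) are scaled by e^{+alpha}, others by e^{-alpha}.
        weights = [w * math.exp(-y * alpha * stump(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted combination f(x) = sum_t alpha_t h_t(x)."""
    score = sum(alpha * stump(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def training_errors(ensemble, xs, ys):
    return sum(1 for x, y in zip(xs, ys) if predict(ensemble, x) != y)


# Toy data (assumed): no single stump separates these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = train_adaboost(xs, ys, rounds=1)
three_rounds = train_adaboost(xs, ys, rounds=3)
```

On this toy set a single stump misclassifies three points, while the weighted combination of three stumps classifies all ten training points correctly.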

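25 Likelihood Model
Part likelihood derived from the boosting score:
$$p(d_i \mid l_i) = \max\left(\frac{\sum_t \alpha_{i,t}\, h_t(x(l_i))}{\sum_t \alpha_{i,t}},\ \varepsilon_0\right)$$
- $\alpha_{i,t}$: decision stump weight
- $h_t(x(l_i))$: decision stump output
- $l_i$: part location
- $\varepsilon_0$: small constant to deal with part occlusions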
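
A minimal sketch of this part likelihood, combined with the independence assumption from the earlier likelihood slide. The stump outputs are assumed to be in {-1, +1}, and the default value of `eps0` is an arbitrary placeholder, not a value from the lecture; both function names are hypothetical.

```python
import math


def part_likelihood(alphas, stump_outputs, eps0=1e-4):
    """Normalized boosting score, clipped from below by a small constant eps0."""
    score = sum(a * h for a, h in zip(alphas, stump_outputs)) / sum(alphas)
    return max(score, eps0)


def log_likelihood(per_part_likelihoods):
    """log p(D | L) under the assumption that parts are independent: a sum of logs."""
    return sum(math.log(p) for p in per_part_likelihoods)


confident = part_likelihood([0.5, 0.5], [1, 1])              # all stumps fire
mixed = part_likelihood([0.3, 0.1], [1, -1])                 # stumps disagree
occluded = part_likelihood([1.0, 1.0], [1, -1], eps0=0.01)   # score 0 is clipped to eps0
```

The clipping keeps the likelihood strictly positive, so a single occluded part with a poor detector response cannot drive the product over all parts to zero.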

26 Likelihood Model
[Figure: input image with part likelihood maps for head, torso, and upper leg; our part likelihoods shown alongside those of [Ramanan, NIPS 06]]

29 Evaluation Scenarios
1. Human Pose Estimation: People dataset [Ramanan, NIPS 06]
2. Upper-body Pose Estimation: Buffy dataset [Ferrari et al., CVPR 08]
3. Pedestrian Detection: TUD Pedestrians dataset [Andriluka et al., CVPR 08]

31 Part-Based Model: 2D Human Pose Estimation

32 Scenario 1: Quantitative Results
Columns: Method, Torso, Upper legs, Lower legs, Upper arm, Forearm, Head, Total
Methods compared:
- [Ramanan, NIPS 06] 2nd parse
- Our inference, edge features from [Ramanan, NIPS 06]
- Our part detectors (SC)
- Our prior, our part detectors (SC)
- Our prior, our part detectors (SIFT)

34 Estimated upper-body poses

35 Quantitative Results
Method                       Torso  Upper arm  Lower arm  Head  Total
[Ferrari et al. CVPR 08]     -      -          -          -     57.9
detectors only               18.9   6.8        3.1        47.2  14.3
full model                   90.7   79.3       41.2       95.9  71.
generic model: prior and appearance learned on the People dataset

36 Quantitative Results
Same table with an additional row for [Ferrari et al. CVPR 09]
generic model: prior and appearance learned on the People dataset

37 Quantitative Results
Columns: Method, Torso, Upper arm, Lower arm, Head, Total
Rows:
- [Ferrari et al. CVPR 08]
- detectors only
- full model
- [Ferrari et al. CVPR 09]
- full model, Buffy pose prior
specialized upper body prior: appearance learned on the People dataset

38 Typical Failure Cases
- foreshortening
- part occlusion
- detections on other body parts

39 Pictorial Structures for Human Pose Estimation: What are Good Parts?
Parts of the Pictorial Structures Model:
- parts = semantic body parts
- pose estimation = estimation of body part configuration
- semantic body parts allow the use of motion capture data, etc. to improve the kinematic tree prior
- non-semantic parts (e.g. in the ISM model) are more difficult to generalize across human body poses

41 Standard Models seen as Graphical Models
- Monolithic (global): HOG [Dalal & Triggs, 2005]
- Part-based (local): Implicit Shape Model [Leibe&Schiele, 2003]
- also: Spatial Pyramid Kernel [Lazebnik et al., 2006]
- Pictorial Structures [Felzenszwalb et al., 2005]

42 Hierarchical Graphical Model
Generalization of standard models:
- new hierarchical structure
- short and longer range dependencies
- joint training of local and global representations
  - cyclic graph structure
  - bottom-up / top-down propagation

43 Hierarchical CRF Latent Model
Goal: discover meaningful parts
- labels of nodes not known
- now: part labels
- allow for ambiguities of part assignments
- combine with structure learning
- keep hierarchical representation

44 Experiments on PASCAL VOC 2007: Hierarchical CRF Latent Model [Schnitzspan,Roth,Schiele@cvpr10]
- consistently outperforms DPM [Felzenszwalb et al., 2008]
- MKL [Vedaldi et al., 2009] performs better due to more features

45 Hierarchical Latent CRF-Model: What are Good Parts?
Parts of the Latent CRF Model [Schnitzspan,Roth,Schiele@cvpr10]:
- parts have three main properties:
  - parts = hierarchical appearance abstractions
  - parts = local appearance shared across objects
  - parts = discriminant local appearance of the objects
- parts enable effective learning by finding correspondences between discriminative local object appearance
- semantics of parts is a secondary effect and not necessary
Similar observations hold for DPM [Felzenszwalb et al@cvpr08]

47 Back to the Future: Learning Shape Models from 3D CAD Data [Stark,Goesele,Schiele@bmvc10]
3D Computer Aided Design (CAD) models:
- computer graphics, game design
- polygonal meshes + texture descriptions
- semantic part annotations (may) exist
Can we learn object class models directly from 3D CAD data?
Issue: transition between 3D CAD models and 2D real-world images

48 Shape-based Appearance Abstraction [Stark,Goesele,Schiele@bmvc10]
Non-photorealistic rendering:
- learn shape models from rendered images
- we do NOT render photo-realistically or with texture
- but focus on 3D CAD model edges (mimic real-world image edges)
Final edges (hidden edges removed) consist of:
- part boundaries
- mesh creases
- silhouette

49 Shape - Local Shape + Global Geometry [Stark,Goesele,Schiele@bmvc10]
Part-based object class representation:
- semantic parts from 3D CAD models: left front wheel, left front door, etc.
[Figure: view-points (left, front-left, front) with parts, local shape, and global geometry]

50 Constellation Model [Weber, Welling, Perona, 00; Fergus, Zisserman, Perona, 03]
Joint model for appearance and structure (= shape)
- X: positions, A: part appearance, S: scale
- h: hypothesis = assignment of features (in the image) to parts (of the model)
Model terms:
- Gaussian part appearance pdf
- Gaussian shape pdf
- Gaussian relative scale pdf (over log(scale))
- probability of detection

51 Experimental Evaluation - Test Data Set [Stark,Goesele,Schiele@bmvc10]
3D Object Classes, Cars [Savarese and Fei-Fei ICCV 07]:
- 8 azimuth angles (..., back, back-left, left, front-left)
- 2 elevation angles (low, high)
- 3 distances (near, medium, far)
- 5 cars
- 240 images (8 x 2 x 3 x 5)

52 Qualitative Results
Three strongest true positive detections per viewpoint model (shown: right, back-right)
Observations:
- accurate part localization
- predicted viewpoints do match

53 Quantitative Results - Multi-View Detection [Stark,Goesele,Schiele@bmvc10]
Evaluation protocol:
- precision / recall
- PASCAL overlap criterion
Observations:
- 4.3% AP improvement (81.0 vs. 76.7) over [Liebelt et al. CVPR 10]
- we can further improve performance by 8.9% AP (89.9 vs. 81.0)
Precision / recall curves:
- bank of 8: AP = 81.0
- 10 degree intervals: AP = 89.9
- Su et al. ICCV09: AP = 55.3
- Gill et al. DAGM09: AP = 72.6
- Liebelt et al. CVPR10: AP = 76.7
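
The PASCAL overlap criterion counts a detection as correct when the intersection over union (IoU) of the detected and the ground-truth bounding box exceeds 0.5. A minimal sketch follows; it assumes boxes given as (x1, y1, x2, y2) in continuous coordinates, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(detection, ground_truth, threshold=0.5):
    """PASCAL-style check: the overlap must exceed the threshold (0.5 by convention)."""
    return iou(detection, ground_truth) > threshold


ground_truth = (0, 0, 10, 10)
close_detection = (1, 0, 11, 10)    # IoU = 90 / 110
shifted_detection = (5, 0, 15, 10)  # IoU = 50 / 150
```

Precision and recall are then computed from the detections that pass this check, and AP summarizes the resulting precision / recall curve.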

54 Shape Model learned from 3D CAD-Data: What are Good Parts?
Parts of the Shape Model:
- parts = just a means to enable correspondence across 3D models
- semantics of parts:
  - in our case: yes, because of the employed 3D models
  - but: semantics neither necessary nor important

56 What are Ideal Parts for Part-Based Object Models?
Parts can/may:
- be semantic body parts, e.g. for articulated human body pose estimation
- be feature clusters (typically many clusters), e.g. ISM, constellation model, BoW
- support learnability of discriminant appearance
- enable correspondence across 3D models
- ...
In all those cases, the most important property is that parts facilitate correspondence across object instances.

57 What are Ideal Parts for Part-Based Object Models?
Multiple motivations for part-based models exist:
- intuitiveness: semantic meaning of parts/attributes is attractive (e.g. enables use of language sources)
- learnability: sharing of parts/attributes across instances/classes
- scalability: transferability of parts/attributes across classes
- ...
In general, parts support learnability and scalability when they facilitate correspondence
- across object instances
- across object classes
- across modalities (e.g. from language to visual appearance)
and semantics is only a secondary concern (for intuitiveness).

58 What are Ideal Parts for Part-Based Object Models? The Good, the Bad, and the Ugly
Thanks to: Micha Andriluka, Bastian Leibe, Sandra Ebert, Mario Fritz, Diane Larlus, Marcus Rohrbach, Paul Schnitzspan, Stefan Roth, Michael Stark, Michael Goesele

High Level Computer Vision - Bernt Schiele, Mario Fritz - https://www.mpi-inf.mpg.de/hlcv