Revisiting 3D Geometric Models for Accurate Object Shape and Pose

M. (1), Michael Stark (2,3), Bernt Schiele (3), Konrad Schindler (1)
(1) Photogrammetry and Remote Sensing Laboratory, Swiss Federal Institute of Technology (ETH), Zurich
(2) Artificial Intelligence Lab, Stanford University, USA
(3) Max-Planck-Institute for Informatics, Saarbrücken, Germany

Current object models: coarse grained estimates

Our goal: finer-grained models to aid scene-level reasoning

Revival of 3D geometric representations
- 1970s and 1980s: [Marr, Nishihara 78] [Brooks 81] [Pentland 86] [Lowe 87]
- 1990s: [Koller, Daniilidis, Nagel 93] [Sullivan, Worrall, Ferryman 95] [Haag, Nagel 99]
- 2000s and 2010: [Hoiem, Efros, Hebert 08] [Ess, Leibe, Schindler, Van Gool 09] [Wang, Gould, Koller 10] [Hedau, Hoiem, Forsyth 10] [Barinova, Lempitsky, Tretyak, Kohli 10] [Gupta, Efros, Hebert 10] [Wojek, Roth, Schindler, Schiele 10]

Related work in viewpoint invariant detection
1) Multiple, viewpoint dependent representations (connected in different ways): [Thomas et al., 06] [Yan, Khan, Shah 07] [Ozuysal, Lepetit, Fua 09] [Nachimson, Basri 09] [Su, Sun, Fei-Fei, Savarese 09] [Gu, Ren 10] [Stark, Goesele, Schiele 10]
2) Explicit 3D geometry representation: [Liebelt, Schmid 10] [Sun, Xu, Bradski, Savarese 10] [Gupta, Efros, Hebert 10] [Chen, Kim, Cipolla 10] [Gupta, Satkin, Efros, Hebert 11]

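Overview
- Geometry: 3D CAD Models -> Simplify -> PCA -> 3D Active Shape Model
- Appearance: 3D CAD Models -> Render -> Positive examples (per part); together with Negative examples (background) -> AdaBoost
- Test image -> Detection maps
- 3D Active Shape Model and Detection maps -> Inference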

Representation: 3D geometry
Simplified 3D wireframes with a fixed number of vertices

Learning: 3D geometry
- Principal Components Analysis (PCA)
- Eigen-Cars
- Tightly constrained global geometry
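A minimal sketch of how such a PCA shape model could be learned, assuming NumPy and wireframes stored as an array of shape (models, vertices, 3). The function names `fit_shape_pca` and `synthesize` and the random stand-in data are hypothetical and not taken from the slides; only the counts (38 models, 36 vertices) come from the training slide.

```python
import numpy as np

def fit_shape_pca(wireframes, n_components):
    """Fit a PCA shape model to wireframes of shape (n_models, n_vertices, 3)."""
    n_models = wireframes.shape[0]
    X = wireframes.reshape(n_models, -1)      # one flattened wireframe per row
    mean = X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal directions.
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    stddevs = singular_values[:n_components] / np.sqrt(n_models - 1)
    return mean, components, stddevs

def synthesize(mean, components, stddevs, weights):
    """Build a wireframe from weights given in units of standard deviation."""
    flat = mean + (np.asarray(weights) * stddevs) @ components
    return flat.reshape(-1, 3)

# Hypothetical stand-in data: 38 models with 36 vertices each.
rng = np.random.default_rng(0)
toy_wireframes = rng.normal(size=(38, 36, 3))
mean, components, stddevs = fit_shape_pca(toy_wireframes, n_components=5)
mean_shape = synthesize(mean, components, stddevs, np.zeros(5))
```

Setting all weights to zero returns the mean wireframe; varying one weight at a time moves the shape along a single principal direction.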

Representation: Local appearance
- Accurate foreground shape
- Very cheap training data, dense sampling of viewpoints

Learning: Local appearance
- Dense Shape Context features [Belongie, Malik 00]
- AdaBoost classifiers (per part-viewpoint), trained on positive (+) and negative (-) examples
- Annotated vertices are our parts
- Related work: [Andriluka, Roth, Schiele 09]
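To illustrate the classifier side, here is a sketch of discrete AdaBoost with threshold stumps in plain Python. It is an assumption about the general technique, not the authors' implementation: the feature vectors are hypothetical stand-ins for shape context descriptors, and the names `train_adaboost` and `predict` are made up for this example.

```python
import math

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with threshold stumps; labels in y are +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively search feature, threshold and polarity for the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    preds = [sign if x[f] >= thr else -sign for x in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, preds)
        err, f, thr, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, sign))
        # Increase the weight of misclassified examples, then renormalise.
        weights = [w * math.exp(-alpha * t * p) for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * (s if x[f] >= thr else -s) for a, f, thr, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical one-dimensional features: part examples (+1) vs. background (-1).
toy_X = [[0.1], [0.2], [0.8], [0.9]]
toy_y = [-1, -1, 1, 1]
toy_ensemble = train_adaboost(toy_X, toy_y, rounds=3)
```

In the setting of the slide, one such classifier would be trained per part and viewpoint, with rendered part patches as positives and background patches as negatives.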

Inference
- Test image
- Detection maps
- Sample 3D wireframes, project, compute image likelihood

Inference
Detection maps; sample 3D cars, project, compute image likelihood. Quantities in the likelihood:
- image evidence
- shape of wireframe
- camera focal length
- recognition hypothesis
- viewpoint parameters, azimuth and elevation
- image space translation and scaling
- projection matrix
- local part scale
- part likelihood
- self-occlusion indicator
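The scoring of one sampled hypothesis can be sketched as follows. This is an illustration under stated assumptions, not the likelihood from the slides: the rotation convention, the fixed camera `distance`, the likelihood floor of 1e-6, and the averaging of log likelihoods are all choices made for this example, and the local part scale is left out. The function names are hypothetical.

```python
import math

def project_vertices(vertices, azimuth_deg, elevation_deg, focal, scale, tx, ty,
                     distance=10.0):
    """Project 3D vertices (x, y, z) to 2D image points (u, v)."""
    a, e = math.radians(azimuth_deg), math.radians(elevation_deg)
    points = []
    for x, y, z in vertices:
        # Rotate about the vertical (y) axis by the azimuth.
        xr = math.cos(a) * x + math.sin(a) * z
        zr = -math.sin(a) * x + math.cos(a) * z
        # Tilt about the horizontal (x) axis by the elevation, then push away from the camera.
        yr = math.cos(e) * y - math.sin(e) * zr
        zc = math.sin(e) * y + math.cos(e) * zr + distance
        # Perspective division, then image-space scaling and translation.
        points.append((scale * focal * xr / zc + tx, scale * focal * yr / zc + ty))
    return points

def hypothesis_score(points, visible, detection_maps):
    """Mean log part likelihood over the parts flagged as visible."""
    total, count = 0.0, 0
    for part, ((u, v), is_visible) in enumerate(zip(points, visible)):
        if not is_visible:
            continue  # self-occluded parts contribute nothing
        row, col = int(round(v)), int(round(u))
        part_map = detection_maps[part]
        if 0 <= row < len(part_map) and 0 <= col < len(part_map[0]):
            likelihood = part_map[row][col]
        else:
            likelihood = 1e-6  # projected outside the image
        total += math.log(max(likelihood, 1e-6))
        count += 1
    return total / count if count else float("-inf")
```

Inference then amounts to sampling many hypotheses (shape weights, focal length, viewpoint, translation and scale), scoring each one this way, and keeping the best.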

Experimental evaluation - Test dataset
- Evaluation on the 3D Object Classes dataset [Savarese et al., 2007]
- Car class: 8 azimuth angles, 2 elevation angles, 3 distances, varying backgrounds
- 240 images, 5 cars

Experimental evaluation - Training
- 38 3D CAD models
- 36 vertices as model points, 20 annotations per model (due to symmetry)
- Separate local part shape detectors trained from:
  - 72 different azimuth angles
  - 2 different elevation angles (7.5° and 15° from the ground plane)
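The rendering viewpoints implied by these numbers can be enumerated as below. Evenly spaced azimuths (5 degree steps) are an assumption; the slide only gives the counts. The function name is hypothetical.

```python
def rendering_viewpoints(n_azimuths=72, elevations_deg=(7.5, 15.0)):
    """Enumerate (azimuth, elevation) pairs in degrees, assuming even azimuth spacing."""
    step = 360.0 / n_azimuths
    return [(i * step, elev) for elev in elevations_deg for i in range(n_azimuths)]

viewpoints = rendering_viewpoints()
print(len(viewpoints))  # 144 viewpoints per part
```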

Experimental evaluation - Initialization
Two initializations:
- Stark et al., 2010 (full system)
- True initial value (tight bounding box, rough azimuth)

Experimental evaluation - Inference

Example wireframe fits
Parts correctly localized:
- Full system: 74.2%
- True initial value: 83.4%

Fine-grained 3D geometry estimation
Accurate estimation of closest 3D CAD model, camera parameters, and ground plane

Ultra-wide baseline matching
- UW-baseline matching using only model fits (corresponding part locations)
- Impossible using interest point matching
- Related work: [Bao, Savarese 11]

Ultra-wide baseline matching

Azimuth difference (degrees)   No. of image pairs   True initial value   Full system   SIFT   Part detections only
45                             53                   91%                  55%           2%     27%
90                             35                   91%                  60%           0%     27%
135                            29                   69%                  52%           0%     10%
180                            17                   59%                  41%           0%     24%

Correct fit = Sampson error < E_max on ground truth correspondences.
The 3D geometric model improves significantly over part detections only.
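The correctness criterion can be sketched with the standard Sampson (first-order geometric) error of a fundamental matrix. Whether the threshold applies to the mean or the maximum over correspondences is not stated on the slide; the mean is an assumption here, and the names `sampson_error`, `is_correct_fit` and `e_max` are hypothetical.

```python
def sampson_error(F, p1, p2):
    """Sampson error of the correspondence p1 <-> p2 under fundamental matrix F."""
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]    # F x1
    Ftx2 = [sum(F[j][i] * x2[j] for j in range(3)) for i in range(3)]   # F^T x2
    residual = sum(x2[i] * Fx1[i] for i in range(3))                    # x2^T F x1
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / denom

def is_correct_fit(F, correspondences, e_max):
    """Accept F if the mean Sampson error over ground truth pairs is below e_max."""
    errors = [sampson_error(F, p1, p2) for p1, p2 in correspondences]
    return sum(errors) / len(errors) < e_max

# Fundamental matrix of a pure horizontal camera translation: matching points share a row.
F_translation = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
```

Here F would be estimated from the corresponding part locations of the two model fits and then checked against manually labeled ground truth correspondences.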

Multiview recognition
- Rescored hypotheses
- Good 2D localization

Continuous viewpoint estimation

Method                 Total images   True positives   % correct azimuth   Avg. error azimuth (degrees)   Avg. error elevation (degrees)
Stark et al., 2010     48             46               67.4%               4.2                            4.0
Full system            48             45               73.3%               3.8                            3.6
True initial value*    48             48               89.6%               4.2                            3.6

Comparison against manually labeled ground truth pose.
The full system improves the share of correct azimuth estimates by about 6 percentage points over Stark et al., 2010 (67.4% to 73.3%).
* Approximate pose initialization quantized to 45° steps
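Azimuth errors need to respect the wrap-around at 360 degrees. A small sketch of how the average azimuth error could be computed; the function names are hypothetical and the slides do not specify the exact evaluation code.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_azimuth_error(predicted_deg, ground_truth_deg):
    """Average wrap-around-aware azimuth error over all true positives."""
    errors = [angular_difference(p, g) for p, g in zip(predicted_deg, ground_truth_deg)]
    return sum(errors) / len(errors)
```

For example, a prediction of 350 degrees against a ground truth of 10 degrees counts as an error of 20 degrees, not 340.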

Conclusion
3D deformable object class models have potential for accurate geometric reasoning on scene level:
- accurate object localization
- geometric parts in 2D
- 3D pose estimation
Novel application examples:
- fine-grained object categorization
- ultra-wide baseline matching
Future extensions:
- efficient multi-class methods for part likelihoods
- analyze importance of geometric model vs. local appearance
- occlusion invariance

OLD SLIDES

Learning: 3D Geometry
Eigen-Cars. Symbols of the shape model:
- any wireframe
- mean wireframe
- weight of k-th principal component
- standard deviation of j-th principal component
- direction of j-th principal component
- residual (if r < m)
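A linear PCA shape model consistent with this legend can be written as below. The symbol names are assumptions chosen for illustration, and a single index k is used for weight, standard deviation and direction:

```latex
X \;=\; \mu \;+\; \sum_{k=1}^{r} s_k \, \sigma_k \, p_k \;+\; \epsilon
```

Here X is any wireframe, \mu the mean wireframe, s_k the weight of the k-th principal component, \sigma_k its standard deviation, p_k its direction, and \epsilon the residual that remains when only r < m components are kept.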

Part localization
Correct localization: part localized within 4% of the car length from ground truth
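A sketch of this criterion, assuming part positions and car length are measured in image pixels; the function names and the `tolerance` parameter are hypothetical.

```python
import math

def part_correct(predicted, ground_truth, car_length, tolerance=0.04):
    """True if the predicted part lies within tolerance * car_length of ground truth."""
    distance = math.hypot(predicted[0] - ground_truth[0], predicted[1] - ground_truth[1])
    return distance <= tolerance * car_length

def localization_rate(predictions, ground_truths, car_length):
    """Fraction of parts that are correctly localized."""
    hits = sum(part_correct(p, g, car_length) for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)
```

With a car length of 200 pixels the tolerance is 8 pixels, so a part that is 5 pixels off counts as correct and one that is 10 pixels off does not.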
