Causal and compositional generative models in online perception

Size: px

Start display at page:

Download "Causal and compositional generative models in online perception"

Norma Coleen Baker
6 years ago
Views:

1 Causal and compositional generative models in online perception Michael Janner with Ilker Yildirim, Mario Belledonne, Chris8an Wallraven, Winrich Freiwald, Joshua Tenenbaum MIT July 28, 2017 London, UK

3 ResNet-18 predic/ons zebra dalma8on park bench... ResNet: He et al. CVPR %.2%.1%

4 Outline 1. Occluded face perception with causal and compositional models

5 Outline 1. Occluded face perception with causal and compositional models 2. Automatic training-free vision-to-touch crossmodal transfer

6 Outline 1. Occluded face perception with causal and compositional models 2. Automatic training-free vision-to-touch crossmodal transfer 3. Learning visual causal models for generic object categories Composite Texture Shape Lighting

8 The generative model Intrinsic Shape Texture Shape S; Texture T Face Morphable Model: Blanz, Vetter. SIGGRAPH 1999.

9 The generative model Intrinsic Extrinsic Shape Texture Lights Pose Shape S; Texture T Lights L; Pose P Face Morphable Model: Blanz, Vetter. SIGGRAPH 1999.

10 The generative model Intrinsic Extrinsic Shape Texture Lights Pose Shape S; Texture T Lights L; Pose P Rendering Engine Face Face Face Morphable Model: Blanz, Vetter. SIGGRAPH 1999.

11 The generative model Intrinsic Extrinsic Shape Texture Lights Pose Shape S; Texture T Lights L; Pose P Rendering Engine Face Occluder Face Occluder ShapeNet: Angel X. Chang, et al. arxiv:

12 The generative model Intrinsic Extrinsic Shape Texture Lights Pose Shape S; Texture T Lights L; Pose P Rendering Engine Face Occluder Face Occluder Image Image

13 Samples from the generative model Intrinsic Shape S; Texture T Extrinsic Lights L; Pose P Rendering Engine Face Occluder Image

14 Test occluders Bars Fence Curtain Door Blinds Low Medium High - Occlude face in stripes, mesh patterns, and large patches - Three levels of occlusion, ranging from 15-55% of face covered

15 Occluded face perception task

16 Occluded face perception task

17 Occluded face perception task Occluded à Unoccluded Same

18 Occluded face perception task

19 Occluded face perception task

20 Occluded face perception task Unoccluded à Occluded Different

21 Behavioral results Accuracy Occlusion Level Low Medium High Occluded à Unoccluded Unoccluded à Occluded Chance:.5

22 Naïve model: Learn to ignore the occluder CNN Shape S; Texture T Predict intrinsic and extrinsic face parameters directly. Can model become invariant to all types of occluders?

23 Naïve model: Learn to ignore the occluder Observation Reconstruction Observation Reconstruction Predict intrinsic and extrinsic face parameters directly. Can model become invariant to all types of occluders? No. Test occluders heavily influence face predictions.

24 Inverse graphics: Inverting the generative model Shape S; Texture T Lights L; Pose P Shape S; Texture T; Pose P Rendering Engine Efficient Inverse Graphics Network (EIG) Face Occluder Layer Reconstruction Network (LR) Image [Also see: Egger, et al. Occlusion-aware 3D Morphable Face Models. BMVC 2016.]

25 Inverse graphics: Inverting the generative model Shape S; Texture T Lights L; Pose P Shape S; Texture T; Pose P Rendering Engine Efficient Inverse Graphics Network (EIG) Face Occluder Layer Reconstruction Network (LR) Image [Also see: Egger, et al. Occlusion-aware 3D Morphable Face Models. BMVC 2016.]

26 Observed Layer Reconstructions Rendered EIG Face Occluder Ground truth Modeling causes allows for: (1) face predictions that are invariant to occluders (2) better generalization to unseen occluders

27 Prediction human judgments Occluded à Unoccluded Low Medium High Unoccluded à Occluded Low Medium High Spearman rank correlation Image-space comparison Latent-space baseline VGG

28 1. Occluded face perception with causal and compositional models 2. Automatic training-free vision-to-touch crossmodal transfer 3. Learning visual causal models for generic object categories Composite Texture Shape Lighting

29 Vision-to-touch transfer Study Test Shape is explicitly represented Immediate transfer to haptic domain Can make judgments based on grasping joint angles

30 Shape, S; Texture, T Vision-to-touch transfer Approximate human performance [3] Grasping Engine Joint Angles [3] Dopjans et al, IEEE Transac*ons on Hap*cs, 2000

31 1. Occluded face perception with causal and compositional models 2. Automatic training-free vision-to-touch crossmodal transfer 3. Learning visual causal models for generic object categories Composite Texture Shape Lighting

32 Causal models of generic objects Causal intermediate stage visual representations of reflectance, shape, and lighting

33 c Inputs Shader Outputs Decomposing generic objects Causal intermediate stage visual representations of reflectance, shape, and lighting Learn both the recognition model and the rendering function

Decomposing generic objects Train shapes

34 Decomposing generic objects Train shapes Test shapes Causal intermediate stage visual representations of reflectance, shape, and lighting Learn both the recognition model and the rendering function Representation learning driven by reconstruction error

Decomposing generic objects Observation Initial

recognition model and the rendering function

35 Decomposing generic objects Observation Initial prediction Unsupervised improvement Causal intermediate stage visual representations of reflectance, shape, and lighting Learn both the recognition model and the rendering function Representation learning driven by reconstruction error

36 Summary - Causal and compositional models better predict human judgments - Modeling causes allows for training-free crossmodal transfer - Causal models can improve internal representations without supervision

37 Thank you Ilker Yildirim Mario Belledonne Christian Wallraven Winrich Freiwald Joshua Tenenbaum

P Latents Render study face with pose of test face Occlude rendering and test

38 Comparison Pipeline Predict latents of both faces Latents Shape, S; Texture, T Pose; P Latent-space comparison Approximate Renderer Shape, S; Texture, T Pose; P Latents Render study face with pose of test face Occlude rendering and test image with mask Compare in image (or latent) space Study Image-space comparison Test

Causal and compositional generative models in online perception

Causal and compositional generative models in online perception Ilker Yildirim*1 (ilkery@mit.edu), Michael Janner*1 (janner@mit.edu), Mario Belledonne1 (belledon@mit.edu), Christian Wallraven2 (christian.wallraven@gmail.com),