Human Pose Estimation with Deep Learning. Wei Yang


1 Human Pose Estimation with Deep Learning Wei Yang

2 Applications: Understand Activities; Family Robots. Example: American Heist (2014) - The Bank Robbery Scene

3 What do we need to know to recognize a crime scene?

4 Cues. Scene: bank. Abnormal poses: lay down, hands up (others stand). Activity: robbery.

5 Why is human pose estimation challenging?

6 #1. Articulation; #2. Occlusion; #3. Scale variation


9 Applications: Understand Activities; Family Robots

10 3D Human Poses. J. Koenemann, F. Burget, and M. Bennewitz. Real-Time Imitation of Human Whole-Body Motions by Humanoids. ICRA.

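11 Deep Learning Based Methods. Fully Convolutional Network. Regression with Euclidean loss over $P$ heatmaps $H_p$:
$L = \frac{1}{2} \sum_{p=1}^{P} \| \hat{H}_p - H_p \|_2^2$, where $H_p \sim \mathcal{N}(l_p, \Sigma)$, s.t. $p = 1, \ldots, P$
Here $\hat{H}_p$ is the predicted heatmap and $H_p$ is the target heatmap, a Gaussian centered at the joint location $l_p$.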
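The heatmap regression on this slide can be illustrated with a short NumPy sketch. It is an illustration under stated assumptions, not the speaker's implementation: the covariance $\Sigma$ is taken to be isotropic ($\sigma^2 I$), the Gaussian is left unnormalized so its peak is 1, and the names `gaussian_heatmap` and `heatmap_loss` are hypothetical.

```python
import numpy as np

def gaussian_heatmap(height, width, center_xy, sigma=1.0):
    """Target heatmap H_p: an unnormalized Gaussian centered at joint l_p.

    Assumes an isotropic covariance (sigma^2 * I); the slide only writes Sigma.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def heatmap_loss(pred, target):
    """Euclidean loss: 0.5 * sum over joints of ||pred_p - target_p||_2^2."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return 0.5 * float(np.sum((pred - target) ** 2))

# Two joints (x, y) on a 64x64 grid; values are made up for illustration.
joints = [(10, 20), (30, 40)]
target = np.stack([gaussian_heatmap(64, 64, j, sigma=2.0) for j in joints])
perfect = heatmap_loss(target, target)            # identical maps give zero loss
blank = heatmap_loss(np.zeros_like(target), target)  # an all-zero prediction is penalized
```

At test time the joint location is usually read off as the argmax of each predicted heatmap.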

12 Outline. Scale: Feature pyramid learning (ICCV 2017). 3D Pose: In-the-wild 3D pose estimation (CVPR).


14 Why Does Scale Matter? Yipin Yang, Yao Yu, Yu Zhou, Sidan Du, James Davis, Ruigang Yang. Semantic Parametric Reshaping of Human Body Models. In 3DV Workshop on Dynamic Shape Measurement and Analysis.

15 Why Does Scale Matter? Learning Feature Pyramids for Human Pose Estimation. Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV.

16 Previous work
Multi-scale testing: the model itself is not scale invariant. Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." TPAMI.
Multi-branch network: needs much more memory and computation. Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR.
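Multi-scale testing, as named on this slide, can be sketched as follows: run the same model on several rescaled copies of the image, map the heatmaps back to the original resolution, and average them. This is a minimal sketch under assumptions: `model` is a hypothetical callable that returns a heatmap with the same spatial size as its input, the scale set is arbitrary, and nearest-neighbor resizing stands in for whatever interpolation a real pipeline would use.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2D array (a stand-in for real interpolation)."""
    rows = (np.arange(out_h) * img.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * img.shape[1] / out_w).astype(int)
    return img[rows][:, cols]

def multi_scale_predict(model, image, scales=(0.5, 1.0, 2.0)):
    """Average heatmaps predicted on rescaled copies of `image`.

    `model` is hypothetical: any callable mapping an (H, W) array to an (H, W) heatmap.
    """
    h, w = image.shape[:2]
    maps = []
    for s in scales:
        scaled = resize_nearest(image, int(round(h * s)), int(round(w * s)))
        heatmap = model(scaled)                      # one forward pass per scale
        maps.append(resize_nearest(heatmap, h, w))   # back to the original size
    return np.mean(maps, axis=0)

# Toy check with an identity "model": the fused peak stays at the original location.
image = np.zeros((16, 16))
image[4, 6] = 1.0
fused = multi_scale_predict(lambda x: x, image)
peak = tuple(int(v) for v in np.unravel_index(fused.argmax(), fused.shape))
```

Each extra scale costs one more forward pass, and the network itself is unchanged, which is the limitation the slide points out.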

17 Hourglass. Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. European Conference on Computer Vision (ECCV). Springer, Cham, 2016.

18 Pyramid Residual Modules (PRMs). [Figure: (a) overall network: a 256x256 input passes through Conv, PRM + Pool, and PRM layers (128x128 and 64x64 resolutions), then through hourglass stacks 1 to n, which output score maps; (b) detailed hourglass structure. Legend: convolution, pyramid residual module, score maps, addition. Inside a PRM, the input x^(l) feeds an identity mapping plus branches f_0 and f_1 ... f_C (ratio 1 to ratio n), which are combined by g and added to give x^(l+1).] Newell et al. Stacked Hourglass Networks for Human Pose Estimation. ECCV.

19 Initialization of Multi-Branch Networks. Single-branch networks: VGG. Multi-branch networks: Inception. Traditional weight initialization methods, e.g., Gaussian, Xavier, MSRA (Kaiming), do not apply to multi-branch networks. Xavier Glorot, Yoshua Bengio. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9.

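20 Initialization of Multi-Branch Networks. [Figure: a Conv / FC layer with $C_i^{(l)}$ input branches $x_1^{(l)}, x_2^{(l)}, \ldots$ and $C_o^{(l)}$ output branches $y_1^{(l)}, y_2^{(l)}, \ldots$]
Forward: $y^{(l)} = W^{(l)} \sum_{c=1}^{C_i^{(l)}} x_c^{(l)} + b^{(l)}$, $x^{(l+1)} = f(y^{(l)})$
Backward: $\Delta x^{(l)} = \sum_{c=1}^{C_o^{(l)}} W^{(l)T} \Delta y_c^{(l)}$, $\Delta y^{(l)} = f'(y^{(l)}) \Delta x^{(l+1)}$
Forward condition: $\alpha C_i^{(l)} n_i^{(l)} \mathrm{Var}[\omega^{(l)}] = 1$
Backward condition: $\alpha C_o^{(l)} n_o^{(l)} \mathrm{Var}[\omega^{(l)}] = 1$
* $\alpha = 0.5$ for ReLU and 1 for Tanh and Sigmoid.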
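The forward condition on this slide can be checked numerically. The sketch below is illustrative only and makes simplifying assumptions that are not in the slides: the branch inputs are independent, zero-mean, and unit-variance, and the activation is treated as linear ($\alpha = 1$). The function names are hypothetical.

```python
import numpy as np

def multibranch_weight_std(n_in, c_in, alpha=0.5):
    """Weight std satisfying alpha * c_in * n_in * Var[w] = 1 (forward condition)."""
    return float(np.sqrt(1.0 / (alpha * c_in * n_in)))

def output_variance(n_in, n_out, c_in, std, n_samples=2000, seed=0):
    """Empirical variance of y = W * sum_c x_c for random weights with the given std.

    Assumes independent, zero-mean, unit-variance branch inputs (a simplification).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, std, size=(n_out, n_in))
    x = rng.normal(0.0, 1.0, size=(c_in, n_samples, n_in)).sum(axis=0)  # summed branches
    return float((x @ w.T).var())

n_in, n_out, c_in = 256, 256, 4
# Ignoring the branch count (c_in = 1) inflates the output variance by about c_in.
var_single = output_variance(n_in, n_out, c_in, multibranch_weight_std(n_in, 1, alpha=1.0))
# Accounting for the branch count keeps the output variance near 1.
var_multi = output_variance(n_in, n_out, c_in, multibranch_weight_std(n_in, c_in, alpha=1.0))
```

With four input branches, the single-branch rule yields an output variance of roughly 4, while the branch-aware rule keeps it close to 1, which is the motivation for including $C_i^{(l)}$ in the condition.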

21 Initialization of Multi-Branch Networks. [Plot: output standard deviation versus layer index, comparing MSR init with our init.] He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." ICCV

22 Qualitative Results: MPII dataset, LSP dataset

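23 Evaluation Metric. PCK: Percentage of Correct Keypoints. A predicted keypoint counts as correct when its distance to the ground truth is within $\alpha \cdot \max(h, w)$, where $h$ and $w$ are the height and width of the person bounding box.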
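A minimal sketch of the PCK computation follows, assuming keypoints are given as (x, y) pixel coordinates and each person has a bounding-box height and width. The function name, the array layout, and the default $\alpha = 0.2$ are illustrative choices, not taken from the slides.

```python
import numpy as np

def pck(pred, gt, box_hw, alpha=0.2):
    """Percentage of Correct Keypoints.

    pred, gt: arrays of shape (N, K, 2) with (x, y) keypoints for N people.
    box_hw:   array of shape (N, 2) with bounding-box (height, width) per person.
    A keypoint is correct if its error is at most alpha * max(h, w).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    box_hw = np.asarray(box_hw, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)           # (N, K) pixel errors
    thresh = alpha * box_hw.max(axis=-1, keepdims=True)  # (N, 1) per-person threshold
    return float(np.mean(dist <= thresh))

# One person, four keypoints, box 100x50, so the threshold is 0.2 * 100 = 20 px.
gt = np.zeros((1, 4, 2))
pred = np.array([[[0.0, 0.0], [10.0, 0.0], [0.0, 15.0], [30.0, 0.0]]])
score = pck(pred, gt, box_hw=[[100.0, 50.0]], alpha=0.2)
```

Three of the four toy keypoints fall within the 20-pixel threshold, so `score` is 0.75.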

24 Results on MPII Human Pose: state-of-the-art performance

25 Image Classification: Top-1 Test Error on CIFAR-10

26 Semantic Segmentation: PASCAL VOC 2012 dataset. [Figure panels, three examples: (a) Image, (b) DeepLab, (c) DeepLab+PRM]

27 Section Summary
Feature pyramid module: generalizable to various networks and tasks
Weight initialization for multi-branch networks
Learning Feature Pyramids for Human Pose Estimation. Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang. ICCV.

28 Outline. Scale: Feature pyramid learning (ICCV 2017). 3D Pose: In-the-wild 3D pose estimation (CVPR).

29 Challenges: No Annotation. Constrained scenes versus in-the-wild scenes: domain discrepancy and no annotation.

30 Which one is more plausible? Discriminator

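31 Weakly Supervised Adversarial Learning. [Figure: images from a 3D dataset and images w/o GT are fed to G, the 3D human pose estimator, which outputs predictions. D, the multi-source discriminator, labels ground-truth poses as real and predictions as fake.]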

32 Adversarial Learning. The generator tries to fool the discriminator; the discriminator tries to tell real from fake. Loss G: Euclidean loss. Loss D: classification loss.
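The two losses named on this slide can be sketched in NumPy. The slide only names a Euclidean loss for G and a classification loss for D; the sketch assumes the standard GAN binary cross-entropy form for the classification loss and adds an adversarial "fool" term to the generator. The mask `has_gt`, the weight `adv_weight`, and all function names are hypothetical.

```python
import numpy as np

def binary_cross_entropy(prob, label, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    label = np.asarray(label, dtype=float)
    return float(-np.mean(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))

def discriminator_loss(d_on_real, d_on_fake):
    """Classification loss: ground-truth poses are 'real' (1), predictions 'fake' (0)."""
    d_on_real = np.asarray(d_on_real, dtype=float)
    d_on_fake = np.asarray(d_on_fake, dtype=float)
    return (binary_cross_entropy(d_on_real, np.ones_like(d_on_real))
            + binary_cross_entropy(d_on_fake, np.zeros_like(d_on_fake)))

def generator_loss(pred, gt, has_gt, d_on_fake, adv_weight=0.01):
    """Euclidean loss on annotated samples plus an assumed adversarial term.

    pred, gt: (N, P, 3) poses; has_gt: (N,) boolean mask of annotated samples.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    has_gt = np.asarray(has_gt, dtype=bool)
    sq_err = ((pred - gt) ** 2).sum(axis=(1, 2))
    euclid = 0.5 * float(sq_err[has_gt].mean()) if has_gt.any() else 0.0
    d_on_fake = np.asarray(d_on_fake, dtype=float)
    fool = binary_cross_entropy(d_on_fake, np.ones_like(d_on_fake))  # G wants D to say 'real'
    return euclid + adv_weight * fool

# Toy batch: sample 0 is annotated, sample 1 is an in-the-wild image without GT.
gt = np.zeros((2, 3, 3))
pred = gt + 0.1
loss_fooled = generator_loss(pred, gt, [True, False], d_on_fake=[0.9, 0.9])
loss_caught = generator_loss(pred, gt, [True, False], d_on_fake=[0.1, 0.1])
```

Because the adversarial term needs no annotation, images without ground truth still contribute a training signal to the generator.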

33 Generator. [Figure: a 2D module (Conv and Residual layers followed by hourglass stacks 1 to n) produces 2D score maps; a depth module predicts depth; together they yield 3D poses.]

34 Discriminator

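35 Multi-Source Discriminator. [Figure: real or fake samples enter through three sources, each processed by a CNN: the image I; a geometric descriptor built from joint offsets $[\Delta x, \Delta y, \Delta z]$ and squared offsets $[\Delta x^2, \Delta y^2, \Delta z^2]$ over P joints; and raw poses given as 2D heatmaps and depth maps. The features are concatenated and passed to fully connected layers that output real or fake.]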
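The geometric descriptor on this slide can be sketched as follows, assuming the offsets are taken between every pair of the P joints so that the descriptor has shape (6, P, P). The pairwise layout and the function name are assumptions; the slide only lists the six channels.

```python
import numpy as np

def geometric_descriptor(pose):
    """Build a (6, P, P) descriptor from a (P, 3) pose.

    Channels 0-2 hold pairwise offsets (dx, dy, dz) between joints i and j;
    channels 3-5 hold their squares (dx^2, dy^2, dz^2).
    """
    pose = np.asarray(pose, dtype=float)              # (P, 3)
    delta = pose[:, None, :] - pose[None, :, :]       # (P, P, 3): joint i minus joint j
    desc = np.concatenate([delta, delta ** 2], axis=-1)  # (P, P, 6)
    return np.transpose(desc, (2, 0, 1))              # (6, P, P)

# Toy pose with two joints.
desc = geometric_descriptor([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
```

The offset channels are antisymmetric in the joint pair and the squared channels are symmetric, so the descriptor encodes relative joint positions independently of where the person is located.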

36 Effectiveness of Adversarial Learning

37 Ablation Study on H36M Dataset. [Bar chart: MPJPE (error in mm) on H36M. Ours: Full (Image+Pose+Geo), Geo (Image+Geo), Pose (Image+Pose), Baseline (jointly learn 2D + depth), Baseline (fix 2D, finetune depth). State-of-art*: Zhou et al. ICCV 17. The chart is annotated with a "% less error" comparison.]
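MPJPE (mean per-joint position error), the metric in this ablation, is the average Euclidean distance between predicted and ground-truth 3D joints. A minimal sketch, assuming poses are arrays of shape (N, P, 3) in millimetres and skipping any alignment step an evaluation protocol might apply first:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over all joints.

    pred, gt: arrays of shape (N, P, 3), in the same units (e.g. mm).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# One pose with two joints: errors of 5 mm and 0 mm (made-up values).
gt = np.zeros((1, 2, 3))
pred = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]])
error_mm = mpjpe(pred, gt)
```

For the toy pose the per-joint errors are 5 and 0, so `error_mm` is 2.5.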

38 Results on Images in the Wild: ours versus baseline

39 Multi-view Results

40 Section Summary
Weakly supervised adversarial learning for 3D pose estimation in the wild
Multi-source discriminator
3D Human Pose Estimation in the Wild by Adversarial Learning. Wei Yang, Wanli Ouyang, Xiaolong Wang, Hongsheng Li, Xiaogang Wang. CVPR.

41 Code Open-source PyTorch code ICCV

42 Thanks!
