Real-Time Human Pose Recognition in Parts from Single Depth Images

1 Real-Time Human Pose Recognition in Parts from Single Depth Images
Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake
CVPR 2011
PRESENTER: AHSAN ABDULLAH

2 PROBLEM

3 APPROACH
Partitioning the body into parts helps localize the joints
[Figure labels: right hand, neck, left shoulder, right elbow]

4 PIPELINE
Design goals: efficiency, robustness
1. capture depth image & remove bg
2. infer body parts per pixel
3. cluster pixels to hypothesize body joint positions
4. fit model & track skeleton

5 BODY PART CLASSIFICATION
Compute $P(c_i \mid w_i)$
  pixels $i = (x, y)$
  body part $c_i$
  image window $w_i$
Discriminative approach
  image windows move with classifier
  learn classifier $P(c_i \mid w_i)$ from training data

6 LEARNING DATA
synthetic (train & test)
real (test)

7 LEARNING DATA SYNTHESIS
Record MoCap: 500k frames distilled to 100k poses
Retarget to several models
Render (depth, body parts) pairs

8 FEATURE SET
Depth comparisons - very fast to compute
Feature response: $f(I, x) = d_I(x) - d_I(x + \Delta)$
  $I$: image, $x$: image coordinate, $d_I$: depth, $\Delta$: offset
Offset $\Delta = v / d_I(x)$ scales inversely with depth
Background pixels: $d$ = large constant
[Figure: input depth image with example offsets $\Delta$ drawn from several pixels $x$]
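The depth-comparison feature, $f(I, x) = d_I(x) - d_I(x + v / d_I(x))$, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the name `depth_feature`, the (row, col) pixel convention, nearest-pixel rounding of the probe position, the value of `BACKGROUND_DEPTH`, and the choice to treat probes that fall outside the image as background are all assumptions made for the example.

```python
import numpy as np

# Assumed value: the slide only says background depth is "a large constant".
BACKGROUND_DEPTH = 1e6


def depth_feature(depth, x, v, background=BACKGROUND_DEPTH):
    """Return d_I(x) - d_I(x + v / d_I(x)) for one pixel.

    depth: 2D array of depth values (background already set to a large constant)
    x:     (row, col) pixel position (assumed convention)
    v:     (d_row, d_col) offset before depth normalisation
    """
    height, width = depth.shape
    row, col = x
    d = depth[row, col]

    # Dividing the offset by the depth at x makes the probe distance
    # shrink for far-away bodies and grow for nearby ones.
    probe_row = int(round(row + v[0] / d))
    probe_col = int(round(col + v[1] / d))

    # Assumption: probes outside the image are treated like background.
    if 0 <= probe_row < height and 0 <= probe_col < width:
        probe_depth = depth[probe_row, probe_col]
    else:
        probe_depth = background

    return float(d - probe_depth)
```

Because the offset is divided by the depth at `x`, the same `v` lands on roughly the same point of the body whether the person stands near the camera or far from it, which is what "scales inversely with depth" buys.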

9 DECISION FORESTS
Aggregation of decision trees

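10 TRAINING DECISION TREES
[Breiman et al. 84]
Node n holds training pixels $Q_n = \{(I, x)\}$ with body part distribution $P_n(c)$
Split test for all pixels: $f(I, x; \Delta_n) > \theta_n$
  no: child l with distribution $P_l(c)$
  yes: child r with distribution $P_r(c)$
Goal: reduce entropy
Take $(\Delta, \theta)$ that maximises information gain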
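Choosing the split that maximises information gain can be sketched as follows. This is an illustrative sketch under stated assumptions, not the training code used for the paper: the helper names (`entropy`, `information_gain`, `best_split`) are hypothetical, feature responses are assumed to be precomputed into an array with one row per candidate offset, and the candidate thresholds are passed in explicitly.

```python
import numpy as np


def entropy(labels, n_classes):
    """Shannon entropy (in bits) of a non-empty array of integer labels."""
    counts = np.bincount(labels, minlength=n_classes)
    p = counts[counts > 0] / labels.size
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, go_right, n_classes):
    """Entropy reduction obtained by splitting labels with a boolean mask."""
    left, right = labels[~go_right], labels[go_right]
    if left.size == 0 or right.size == 0:
        return 0.0  # a split that sends everything one way gains nothing
    n = labels.size
    return (entropy(labels, n_classes)
            - (left.size / n) * entropy(left, n_classes)
            - (right.size / n) * entropy(right, n_classes))


def best_split(responses, labels, thresholds, n_classes):
    """Search candidate (offset index, threshold) pairs.

    responses: array of shape (n_candidates, n_pixels); row k holds the
               feature response of every training pixel for candidate offset k
    Returns (gain, offset_index, threshold) for the best candidate found.
    """
    best = (-1.0, None, None)
    for k in range(responses.shape[0]):
        for theta in thresholds:
            gain = information_gain(labels, responses[k] > theta, n_classes)
            if gain > best[0]:
                best = (gain, k, theta)
    return best
```

For example, with labels `[0, 0, 1, 1]`, a candidate whose responses separate the two classes at the threshold yields a gain of one bit, while a candidate that mixes them equally on both sides yields zero.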

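11 DECISION TREE CLASSIFICATION
Toy example: distinguish left (L) and right (R) sides of the body
Image window centred at x
First test: $f(I, x; \Delta_1) > \theta_1$ (no / yes)
Further test on one branch: $f(I, x; \Delta_2) > \theta_2$ (no / yes)
Each of the three leaves stores a distribution $P(c)$ over L and R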

12 DECISION FOREST CLASSIFIER
[Amit & Geman 97] [Breiman 01] [Geurts et al. 06]
Trees 1 to T each take (I, x) and output posteriors $P_1(c), \dots, P_T(c)$
Each tree is trained on a different random subset of images
  bagging helps avoid over-fitting
Average tree posteriors: $P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$
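Evaluating a forest amounts to walking each tree from root to leaf and averaging the leaf posteriors, $P(c \mid I, x) = \frac{1}{T} \sum_t P_t(c \mid I, x)$. The sketch below shows one way to express that. The `Leaf` and `Split` classes, the function names, and the convention of passing the feature as a callable `feature(image, x, offset)` are assumptions made for illustration, not the structure of the actual system.

```python
from dataclasses import dataclass
from typing import Union

import numpy as np


@dataclass
class Leaf:
    posterior: np.ndarray  # P(c) stored at this leaf


@dataclass
class Split:
    offset: tuple          # candidate offset used by the feature at this node
    theta: float           # threshold
    no: "Node"             # child taken when feature <= theta
    yes: "Node"            # child taken when feature > theta


Node = Union[Leaf, Split]


def tree_posterior(node, feature, image, x):
    """Descend one tree for pixel x and return the leaf's P(c)."""
    while isinstance(node, Split):
        if feature(image, x, node.offset) > node.theta:
            node = node.yes
        else:
            node = node.no
    return node.posterior


def forest_posterior(trees, feature, image, x):
    """Average the per-tree posteriors for pixel x."""
    return np.mean([tree_posterior(t, feature, image, x) for t in trees], axis=0)
```

Each pixel is classified independently of every other pixel, and each tree independently of every other tree, which is what makes the classifier easy to run in parallel.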

13 NUMBER OF TREES
[Plot: average per-class accuracy vs. number of trees]
[Figure: ground truth body parts compared with inferred body parts (most likely) for 1 tree, 3 trees and 6 trees]

14 TREE DEPTH
[Plots: average per-class accuracy vs. depth of trees, on synthetic test data and on real test data]

15 Body parts to joint hypotheses
Define 3D world space density
  [Equation annotations: 3D coord, pixel weight, 3D coord of i-th pixel, pixel index i, bandwidth, inferred probability, depth at i-th pixel]
Mean shift for mode detection
3. hypothesize body joints
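The density equation itself did not survive in this transcript. Its annotations match the form given in the CVPR 2011 paper: a per-part density $f_c(\hat{x}) \propto \sum_i w_{ic} \exp\left(-\lVert (\hat{x} - \hat{x}_i) / b_c \rVert^2\right)$ with pixel weight $w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$, where $\hat{x}_i$ is the 3D coordinate of pixel $i$ and $b_c$ is the bandwidth. The Python sketch below finds a mode of such a density with weighted mean shift. The function names, the fixed iteration cap, and the convergence tolerance are assumptions made for the example, and starting points are assumed to lie close enough to the data that the kernel weights do not all underflow to zero.

```python
import numpy as np


def pixel_weights(prob_c, depth):
    """w_ic = P(c | I, x_i) * d_I(x_i)^2 for every pixel i."""
    return prob_c * depth ** 2


def mean_shift_mode(points, weights, start, bandwidth, n_iter=50, tol=1e-6):
    """Climb to a mode of the weighted Gaussian density.

    points:    (n, 3) array of 3D pixel coordinates
    weights:   (n,) array of pixel weights for one body part
    start:     (3,) initial 3D position
    bandwidth: scalar kernel bandwidth b_c
    """
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        # Gaussian kernel value of every point relative to the current estimate.
        sq_dist = np.sum(((points - x) / bandwidth) ** 2, axis=1)
        k = weights * np.exp(-sq_dist)
        # Mean-shift update: kernel-weighted mean of the points.
        new_x = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Weighting by squared depth compensates for the fact that a distant surface patch covers fewer pixels than the same patch seen up close, so each pixel's vote is roughly proportional to the world surface area it represents.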

16 [Figure: input depth; inferred body parts; inferred joint positions in front view, side view and top view]
No tracking or smoothing

17 [Figure: input depth; inferred body parts; inferred joint positions in front view, side view and top view]
No tracking or smoothing

18 JOINT PREDICTION ACCURACY
[Bar chart: average precision per joint]
Joints: Center Head, Center Neck, Left Shoulder, Right Shoulder, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot; plus Mean AP

19 JOINT PREDICTION ACCURACY
[Bar chart: average precision per joint, two series]
Joints: Center Head, Center Neck, Left Shoulder, Right Shoulder, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot; plus Mean AP
Series: joint prediction from ground truth body parts; joint prediction from inferred body parts

20 ANALYSIS
No temporal information
  - frame-by-frame
Very fast
  - simple depth image feature
  - parallel decision forest classifier

21 KINECT SYSTEM
Uses: 3D joint hypotheses, kinematic constraints, temporal coherence
to give: full skeleton, higher accuracy, invisible joints, multi-player
4. track skeleton

22 SUMMARY
Frame-by-frame gives robustness
Body parts representation for efficiency
Fast, simple machine learning
Significant engineering to scale to a massive, varied training data set

23 QUESTIONS
