Robust Articulated ICP for Real-Time Hand Tracking

Robust Articulated-ICP for Real-Time Hand Tracking Andrea Tagliasacchi* Sofien Bouaziz Matthias Schröder* Mario Botsch Anastasia Tkach Mark Pauly * equal contribution 1/36

Real-Time Tracking Setup Data from (single) RGBD Sensors PrimeSense (Carmine) SoftKinetic Intel RealSense low SNR along in depth (z-axis) completely discards small portions of geometry 2

Motivation? Augmented Reality Intel Perceptual SDK Oculus Research (VR) MagicLeap (AR) Microsoft HoloLens - PR Video (hololens.com) HoloLens (AR) 3

Previous Work Appearance-based (guess solely based on current frame) Model-based (registers model of previous frame) 4

Previous Work Appearance vision Model geometry [Keskin ICCV 12] [Qian CVPR 14] [Oikono. CVPR 14] [Tompson SIG 14] [Melax I3D 13] [Tang CVPR 14] [Sridhar 3DV 14] [Schroder ICRA 14] 5

Contributions combined 2D/3D registration (within ICP) occlusion-aware correspondences (ICP) regularization with statistical pose-space prior extensible and unified real-time solver (>60fps) revamping ICP for articulated tracking 6

ICP: Iterative Closest Point Step 1: optimizing correspondences Step 2: optimizing transformations update correspondences update transformation update correspondences update transformation for more details please refer to Sparse-ICP [Bouaziz, Tagliasacchi, Pauly SGP 13] 7

Robust Articulated ICP [Wei et al. SIGA 12] our tracking process successfully tracks the entire motion sequence while ICP fails to track most of frames. This is because ICP is often sensitive to initial poses and prone to local minimum, particularly involving tracking high-dimensional human body poses from noisy depth data. [Zhang et al. SIGA 14] The accompanying video clearly shows that our tracking process is much more robust than the ICP algorithm [ ] our tracking process successfully tracks the entire motion sequence while ICP fails to track most of frames. [Qian et al. CVPR 14] It uses alternate and gradient based optimization, converges fast, and is suitable for realtime applications. However, it can be easily trapped in poor local optima and cannot handle non-rigid objects well. Yet, it is still insufficient for high-dimensional articulated hands, especially under free viewpoints. 8

System Overview color image silhouette posed model lost tracking? no yes converged? no yes reinitialize 2D correspondences linear 3D correspondences data fitting solve temporal depth image distance trans. point cloud bounds pose space collision min E points + E silh. +E wrist {z } Fitting terms + E pose + E kin. + E temporal {z } Prior terms 9

Preprocessing S s -sensorsilhouette color to identify the Region-of-Interest demo: assumption on picking +y for PCA but all this could be learned!! [Tompson et al. TOG 14] 10

Data Fitting Energies color image silhouette posed model lost tracking? no yes converged? no yes reinitialize 2D correspondences linear 3D correspondences data fitting solve temporal depth image distance trans. point cloud bounds pose space collision min E points + E silh. +E wrist {z } Fitting terms + E pose + E kin. + E temporal {z } Prior terms 11

3D Registration (w/ occlusions) X s -sensorpointcloud E points =! 1 X x2x s kx M (x, )k 1 2 x M ICP (case #1) ICP (case #2) our method (case #2) M -cylinderhandmodel 3D Registration 12

Lesson #1: insufficient to render X s -sensorpointcloud Correspondences of [Wei et al. SIGA 12] (renders the hand model into a point cloud) c 1 c 2 c 1 c 2 Correspondences of [Our Method] (computes correspondences in close form) Ground Truth Motion (finger 2 comes out of occlusion) c 1 c 2 M -cylinderhandmodel 3D Registration 13

2D/3D Registration X s -sensorpointcloud M -cylinderhandmodel 3D Registration 14

2D/3D Registration X s -sensorpointcloud S s -sensorsilhouette M -cylinderhandmodel 3D Registration S r -renderedsilhouette 2D Registration 15

2D Registration S s -sensorsilhouette E silhouette =! 2 X p2s r kp Ss (p, )k 2 2 x Ss S r -renderedsilhouette 2D Registration 16

Model Prior Energies color image silhouette posed model lost tracking? no yes converged? no yes reinitialize 2D correspondences linear 3D correspondences data fitting solve temporal depth image distance trans. point cloud bounds pose space collision min E points + E silh. +E wrist {z } Fitting terms + E pose + E kin. + E temporal {z } Prior terms 17

Joint Bounds Energy 18

Collision Energy 19

Temporal Coherence Energy 20

Statistical Pose Prior encodes correlation across joints 21

Statistical Pose Prior Recorded by VICON tracking system [Schroder ICRA 14] (they are accurate for the person that have been recorded for) 22

Statistical Pose Prior (Soft) E pose =! 4 k (µ + P )k 2 2 +! 5 k k 2 2 we optimize the current pose So that it is similar to a reconstructed pose from the low dimensional subspace but when DOF are unconstrained we would like to restore the neutral (i.e. mean) pose. µ 23

Statistical Pose Prior (Hard) E pose =! 4 k (µ + P )k 2 2 +! 5 k k 2 2! 4 = 1 only optimize in the subspace (i.e. variable replacement) even with a high number of bases the animation remains stiff!!! 24

CPU/GPU Optimization min X i Ē silh. =! 2 X kf i + J i k 2 2 = X i J T i J i 1 X i J T i f i p2s r (n T (J persp (x)j skel (x) +(p Ss (p, ))) 2 S r 20k!!!! J silh 20k 26 J T silhj silh = 26 26 25

Results and Limitations 26

Motion Transfer to Rig 27

Tracking with Fast Motion 28

Tracking of Interacting Hands 29

Limitation: Calibration 30

Limitations: Fist Rotation 31

State of the Art - Evaluations Dexter-1 Dataset (MPI) mm [Shroder ICRA 14] Subspace ICP [Tang et al. 2014] [Sridhar et al. 2013] [Sridhar et al. 2014] [ours] [ours] + re-init. [Sridhar ICCV 13] EM Tracker [Tang CVPR 14] Forest Classifiers 20 10 adbadd flexex1 pinch count tigergrasp wave random [Melax 14] Intel Perceptual SDK (re-initialization helps because Dexter-1 is a low frame-rate dataset only 30Hz) [Tompson TOG 14] ConvNets [Qian CVPR 14] ICP/PSO Hybrid Qualitative Quantitative 32

Qualitative Comparisons Convex Dynamics (Intel SDK) Subspace ICP 33

Conclusions combined 2D/3D registration (within ICP) occlusion-aware correspondences (ICP) regularization with statistical pose-space prior extensible and unified real-time solver (>60fps) fully open source! https://github.com/opengp/htrack 34

Shameless Advertisement Who? Prof. Andrea Tagliasacchi and Brian Wyvill What? MSc (PhD) Where? Who pays? Language? English https://www.csc.uvic.ca/program_information/graduate_studies/msc_program.htm 35

Thank you!!! Live demo at SGP 15!!! Don t be shy!! Sofien Matthias Mark Anastasia https://github.com/opengp/htrack 36

Extra Slides 37

Statistical Prior: Side Effects E pose =! 4 k (µ + P )k 2 2 +! 5 k k 2 2 0 3 6 10 0 1 2 3 as this prior correlates joint angles the convergence speed increases by about 3x 38

Reinitialization from Failure Determine tracking failure [Melax 13] Detection by ~[Qian et al. CVPR 14] 1 = X x2x s kx M (x, )k 2, 2 = S r \ S s S r [ S s xy-fingers Logistic Regressor (optimize alpha s.t.): 2 1+e T [ 1, 2 ]! tracking ok! z-fingers Exploration of corresp. energy 39

Particle Swarm 40