Robust Articulated ICP for Real-Time Hand Tracking


Robust Articulated-ICP for Real-Time Hand Tracking Andrea Tagliasacchi* Sofien Bouaziz Matthias Schröder* Mario Botsch Anastasia Tkach Mark Pauly * equal contribution 1/36

Real-Time Tracking Setup. Data from (single) RGBD sensors: PrimeSense (Carmine), SoftKinetic, Intel RealSense. Low SNR along depth (z-axis); the sensor completely discards small portions of geometry. 2

Motivation? Augmented Reality Intel Perceptual SDK Oculus Research (VR) MagicLeap (AR) Microsoft HoloLens - PR Video (hololens.com) HoloLens (AR) 3

Previous Work Appearance-based (guess solely based on current frame) Model-based (registers model of previous frame) 4

Previous Work Appearance vision Model geometry [Keskin ICCV 12] [Qian CVPR 14] [Oikono. CVPR 14] [Tompson SIG 14] [Melax I3D 13] [Tang CVPR 14] [Sridhar 3DV 14] [Schroder ICRA 14] 5

Contributions combined 2D/3D registration (within ICP) occlusion-aware correspondences (ICP) regularization with statistical pose-space prior extensible and unified real-time solver (>60fps) revamping ICP for articulated tracking 6

ICP: Iterative Closest Point. Step 1: optimizing correspondences. Step 2: optimizing transformations. The two steps alternate (update correspondences, update transformation, update correspondences, ...). For more details please refer to Sparse-ICP [Bouaziz, Tagliasacchi, Pauly SGP 13] 7
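The alternation between the two steps can be illustrated with a minimal rigid, point-to-point ICP in NumPy. This is only a sketch under simplifying assumptions: a single rigid transform instead of an articulated hand, brute-force nearest neighbours, and an SVD-based (Kabsch) alignment. The function names are hypothetical and are not taken from the htrack code.

```python
import numpy as np

def closest_points(src, dst):
    """Step 1: for each source point, pick its nearest destination point."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    return dst[d2.argmin(axis=1)]

def best_rigid_transform(src, tgt):
    """Step 2: least-squares R, t such that R @ src_i + t is close to tgt_i."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - mu_s).T @ (tgt - mu_t)          # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0] * (src.shape[1] - 1) + [sign])
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_s
    return R, t

def icp(src, dst, iterations=20):
    """Alternate correspondence and transformation updates."""
    cur = np.array(src, dtype=float)
    for _ in range(iterations):
        tgt = closest_points(cur, dst)           # update correspondences
        R, t = best_rigid_transform(cur, tgt)    # update transformation
        cur = cur @ R.T + t
    return cur
```

Calling `icp(src, dst)` returns the source points after alignment. As the quotes on the next slide point out, this kind of alternation only finds a local optimum, so the starting pose matters.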

Robust Articulated ICP [Wei et al. SIGA 12] our tracking process successfully tracks the entire motion sequence while ICP fails to track most of frames. This is because ICP is often sensitive to initial poses and prone to local minimum, particularly involving tracking high-dimensional human body poses from noisy depth data. [Zhang et al. SIGA 14] The accompanying video clearly shows that our tracking process is much more robust than the ICP algorithm [ ] our tracking process successfully tracks the entire motion sequence while ICP fails to track most of frames. [Qian et al. CVPR 14] It uses alternate and gradient based optimization, converges fast, and is suitable for realtime applications. However, it can be easily trapped in poor local optima and cannot handle non-rigid objects well. Yet, it is still insufficient for high-dimensional articulated hands, especially under free viewpoints. 8

System Overview. [Flowchart boxes: color image, depth image, silhouette, distance trans., point cloud, 2D correspondences, 3D correspondences, data fitting, temporal, bounds, pose space, collision, linear solve, converged? (yes/no), lost tracking? (yes/no), reinitialize, posed model.] $\min \underbrace{E_{points} + E_{silh.} + E_{wrist}}_{\text{Fitting terms}} + \underbrace{E_{pose} + E_{kin.} + E_{temporal}}_{\text{Prior terms}}$ 9

Preprocessing. $S_s$: sensor silhouette. Color is used to identify the Region-of-Interest. Demo: assumption on picking +y for PCA, but all this could be learned!! [Tompson et al. TOG 14] 10

Data Fitting Energies (system overview flowchart repeated). Fitting terms: $E_{points} + E_{silh.} + E_{wrist}$ 11

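3D Registration (w/ occlusions). $X_s$: sensor point cloud. $\mathcal{M}$: cylinder hand model. $E_{points} = \omega_1 \sum_{x \in X_s} \| x - \Pi_{\mathcal{M}}(x, \theta) \|_2^1$, where $\Pi_{\mathcal{M}}(x, \theta)$ is the point corresponding to $x$ on the model in pose $\theta$. Figure panels: ICP (case #1), ICP (case #2), our method (case #2). 12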

Lesson #1: insufficient to render. $X_s$: sensor point cloud. $\mathcal{M}$: cylinder hand model. Correspondences of [Wei et al. SIGA 12] (renders the hand model into a point cloud) versus correspondences of [Our Method] (computes correspondences in closed form). Ground truth motion: finger 2 comes out of occlusion. Figure labels: $c_1$, $c_2$. 13
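To make "closed form" concrete, here is a small sketch of the closest surface point on a single capsule (a line segment with a radius), used as a stand-in for one cylinder of the hand model. The capsule simplification, the function name, and its parameters are assumptions made for illustration. The occlusion handling discussed on the slide is not modelled here.

```python
import numpy as np

def closest_point_on_capsule(x, a, b, radius):
    """Closest surface point of the capsule with axis a-b to the query x."""
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    axis = b - a
    # Parameter of the closest axis point, clamped to the segment [a, b].
    s = np.clip(np.dot(x - a, axis) / np.dot(axis, axis), 0.0, 1.0)
    foot = a + s * axis
    offset = x - foot
    dist = np.linalg.norm(offset)
    if dist == 0.0:
        # x lies on the axis: every radial direction is equally close,
        # so pick an arbitrary one perpendicular to the axis.
        helper = np.eye(3)[np.argmin(np.abs(axis))]
        offset = np.cross(axis, helper)
        dist = np.linalg.norm(offset)
    return foot + radius * offset / dist
```

No rendering or sampling of the model is involved: each sensor point gets its correspondence from a few dot products, so the result does not depend on how densely the model would have been rasterized.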

2D/3D Registration. $X_s$: sensor point cloud. $\mathcal{M}$: cylinder hand model. 3D Registration. 14

2D/3D Registration. $X_s$: sensor point cloud. $S_s$: sensor silhouette. $\mathcal{M}$: cylinder hand model. $S_r$: rendered silhouette. 3D Registration and 2D Registration. 15

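2D Registration. $S_s$: sensor silhouette. $S_r$: rendered silhouette. $E_{silhouette} = \omega_2 \sum_{p \in S_r} \| p - \Pi_{S_s}(p, \theta) \|_2^2$, where $\Pi_{S_s}(p, \theta)$ is the point corresponding to $p$ on the sensor silhouette. 16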
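The silhouette term can be evaluated on binary masks as sketched below. Two assumptions are made: the correspondence of a rendered pixel is taken to be its nearest sensor-silhouette pixel, and the search is brute force. The "distance trans." box in the system overview suggests that a precomputed distance transform plays this role in practice. The function name and the `weight` parameter (standing in for the energy weight) are hypothetical.

```python
import numpy as np

def silhouette_energy(rendered, sensor, weight=1.0):
    """Sum of squared pixel distances from the rendered silhouette S_r
    to the nearest pixel of the sensor silhouette S_s."""
    p = np.argwhere(rendered)   # pixels of S_r, shape (n, 2)
    q = np.argwhere(sensor)     # pixels of S_s, shape (m, 2)
    if len(p) == 0 or len(q) == 0:
        return 0.0
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=2)
    # Rendered pixels already inside S_s contribute zero.
    return weight * float(d2.min(axis=1).sum())
```

Only rendered pixels that fall outside the sensor silhouette are penalized, so the term pushes the model back inside the observed silhouette.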

Model Prior Energies (system overview flowchart repeated). Prior terms: $E_{pose} + E_{kin.} + E_{temporal}$ 17

Joint Bounds Energy 18

Collision Energy 19

Temporal Coherence Energy 20

Statistical Pose Prior encodes correlation across joints 21

Statistical Pose Prior. Recorded by a VICON tracking system [Schroder ICRA 14] (the recordings are accurate only for the person they were recorded for) 22

Statistical Pose Prior (Soft). $E_{pose} = \omega_4 \| \theta - (\mu + P\tilde{\theta}) \|_2^2 + \omega_5 \| \tilde{\theta} \|_2^2$. We optimize the current pose $\theta$ so that it is similar to a pose reconstructed from the low-dimensional subspace, but when DOFs are unconstrained we would like to restore the neutral (i.e. mean) pose $\mu$. 23
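A minimal sketch of the soft prior follows. It assumes the pose is a vector of joint angles, `mu` is the mean pose, and the columns of `P` are an orthonormal PCA basis. The function names, argument names, and default weights are hypothetical.

```python
import numpy as np

def pose_prior_energy(theta, theta_sub, mu, P, w4=1.0, w5=1.0):
    """Soft pose prior: stay close to the subspace reconstruction
    mu + P @ theta_sub, and keep the subspace coordinates small."""
    residual = theta - (mu + P @ theta_sub)
    return w4 * float(residual @ residual) + w5 * float(theta_sub @ theta_sub)

def project_to_subspace(theta, mu, P):
    """Subspace coordinates of theta, assuming P has orthonormal columns."""
    return P.T @ (theta - mu)
```

The first term penalizes the part of the pose that the subspace cannot reproduce. The second term pulls the subspace coordinates toward zero, which corresponds to the mean pose.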

Statistical Pose Prior (Hard). $E_{pose} = \omega_4 \| \theta - (\mu + P\tilde{\theta}) \|_2^2 + \omega_5 \| \tilde{\theta} \|_2^2$ with $\omega_4 = \infty$: only optimize in the subspace (i.e. variable replacement). Even with a high number of bases the animation remains stiff!!! 24

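CPU/GPU Optimization. $\min_{\delta\theta} \sum_i \| f_i + J_i \delta\theta \|_2^2 \;\Rightarrow\; \delta\theta = -\left( \sum_i J_i^T J_i \right)^{-1} \sum_i J_i^T f_i$. Linearized silhouette term: $\bar{E}_{silh.} = \omega_2 \sum_{p \in S_r} \left( n^T \left( J_{persp}(x) J_{skel}(x) \delta\theta + (p - \Pi_{S_s}(p, \theta)) \right) \right)^2$. $S_r$ has about 20k pixels (!), so $J_{silh}$ is $20k \times 26$, while $J_{silh}^T J_{silh}$ is only $26 \times 26$. 25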
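The size argument on this slide (a tall Jacobian, but a small 26 x 26 system) can be sketched as follows. The per-term blocks are accumulated into the normal equations, and only the small system is solved. The function name and the list-of-blocks interface are assumptions made for illustration. Where each part runs (CPU or GPU) is not modelled.

```python
import numpy as np

def gauss_newton_step(jacobians, residuals):
    """Solve min over d of sum_i ||f_i + J_i d||^2 via the normal equations.

    jacobians: list of (m_i, n) arrays, one per energy term.
    residuals: list of (m_i,) arrays, the matching residual vectors f_i.
    """
    n = jacobians[0].shape[1]
    JtJ = np.zeros((n, n))
    Jtf = np.zeros(n)
    for J, f in zip(jacobians, residuals):
        JtJ += J.T @ J    # small (n, n) block, however many rows J has
        Jtf += J.T @ f
    return -np.linalg.solve(JtJ, Jtf)
```

Because each term contributes only an n x n matrix and an n-vector, adding another energy amounts to appending one more (Jacobian, residual) pair. This is one way to read the "extensible and unified solver" claim.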

Results and Limitations 26

Motion Transfer to Rig 27

Tracking with Fast Motion 28

Tracking of Interacting Hands 29

Limitation: Calibration 30

Limitations: Fist Rotation 31

State of the Art - Evaluations (qualitative and quantitative). Quantitative plot: Dexter-1 Dataset (MPI), axis in mm, sequences adbadd, flexex1, pinch, count, tigergrasp, wave, random; plotted methods: [Tang et al. 2014], [Sridhar et al. 2013], [Sridhar et al. 2014], [ours], [ours] + re-init. (re-initialization helps because Dexter-1 is a low frame-rate dataset, only 30Hz). Methods compared: [Schroder ICRA 14] Subspace ICP, [Sridhar ICCV 13] EM Tracker, [Tang CVPR 14] Forest Classifiers, [Melax 14] Intel Perceptual SDK, [Tompson TOG 14] ConvNets, [Qian CVPR 14] ICP/PSO Hybrid. 32

Qualitative Comparisons Convex Dynamics (Intel SDK) Subspace ICP 33

Conclusions combined 2D/3D registration (within ICP) occlusion-aware correspondences (ICP) regularization with statistical pose-space prior extensible and unified real-time solver (>60fps) fully open source! https://github.com/opengp/htrack 34

Shameless Advertisement Who? Prof. Andrea Tagliasacchi and Brian Wyvill What? MSc (PhD) Where? Who pays? Language? English https://www.csc.uvic.ca/program_information/graduate_studies/msc_program.htm 35

Thank you!!! Live demo at SGP 15!!! Don't be shy!! Sofien Matthias Mark Anastasia https://github.com/opengp/htrack 36

Extra Slides 37

Statistical Prior: Side Effects. $E_{pose} = \omega_4 \| \theta - (\mu + P\tilde{\theta}) \|_2^2 + \omega_5 \| \tilde{\theta} \|_2^2$. As this prior correlates joint angles, the convergence speed increases by about 3x. 38

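Reinitialization from Failure. Determine tracking failure [Melax 13]. Detection by ~[Qian et al. CVPR 14]. Two features: (1) the 3D fitting residual $\sum_{x \in X_s} \| x - \Pi_{\mathcal{M}}(x, \theta) \|_2$ and (2) the silhouette overlap $\frac{S_r \cap S_s}{S_r \cup S_s}$. A logistic regressor on these two features (with optimized weights alpha) decides whether tracking is ok. Figure panels: xy-fingers, z-fingers. Exploration of corresp. energy. 39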
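A failure detector along these lines could look like the sketch below. It is an assumption-heavy illustration: the logistic function is the standard one, and the `bias` term, the 0.5 threshold, and the example weights are hypothetical rather than values from the talk. The weights `alpha` would have to be fitted on labelled tracking results.

```python
import numpy as np

def silhouette_overlap(rendered, sensor):
    """Intersection over union of the rendered and sensor silhouettes."""
    inter = np.logical_and(rendered, sensor).sum()
    union = np.logical_or(rendered, sensor).sum()
    return inter / union if union > 0 else 0.0

def tracking_ok(fit_error, overlap, alpha, bias=0.0, threshold=0.5):
    """Logistic regressor on the two features; True means tracking is ok."""
    z = bias + alpha[0] * fit_error + alpha[1] * overlap
    return 1.0 / (1.0 + np.exp(-z)) > threshold

# Example with made-up weights: a large fitting error lowers the score,
# a large silhouette overlap raises it.
example_alpha = (-1.0, 4.0)
```

When `tracking_ok` returns `False`, the system overview indicates that the tracker reinitializes instead of continuing from the previous pose.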

Particle Swarm 40