3D Hand and Fingers Reconstruction from Monocular View

1. Research Team
Project Leader: Prof. Isaac Cohen, Computer Science
Graduate Students: Sung Uk Lee

2. Statement of Project Goals
The need for accurate hand motion tracking has been increasing in HCI and in applications requiring an understanding of user motions. Hand gestures can be used as a convenient user interface for various computer applications in which the state and the actions of the users are automatically inferred from a set of video cameras. The objective is to extend the current mouse/keyboard interaction techniques in order to allow the user to behave naturally in the immersed environment, as the system perceives and responds appropriately to user actions. HCI-related applications remain the most frequent; however, we have focused our effort on the accurate detection and tracking of un-instrumented hands for assessing user performance in accomplishing a specific task.

3. Project Role in Support of IMSC Strategic Plan
Multimodal communications rely on speech, haptics, and visual sensing of the user in order to characterize the user's emotional and/or physical state. The proposed research focuses on augmenting the interaction capabilities of a multimodal system by providing automatic tracking of hand motion and recognition of the corresponding hand gestures. The objective here is to augment or replace the mouse-and-keyboard paradigm with functionalities relying on natural hand motion for driving a specific application.

4. Discussion of Methodology Used
Inferring hand motion from a monocular view is challenging, as the detected 2D silhouette is not sufficient to depict a complicated hand pose. Therefore, several studies introduce a hand model and a model-based approach. Although these methods can track articulated hand motion from a monocular view, many of them are computationally too expensive and have difficulties in addressing the hand's self-occlusions. We revised the articulated hand model proposed in the previous year to deal with natural hand motion constraints; it allows tracking the detected hand efficiently in the presence of self-occlusions and has a low computational cost. After the segmentation of the hand silhouette from the video stream, an articulated hand model is fitted onto the 2D mask of the hand. For initialization, the 3D hand model configuration, such as joint offsets and sizes, is refined using a favorable stretched-out pose. Section 4.2 describes the initialization process in detail. The first joints connecting the palm and each finger form a rigid shape. Once the 3D model is refined, the global hand pose is computed using this rigid-form constraint, and then the motion of each finger is estimated. To reconstruct an accurate pose, the 3D model is updated by re-projecting it onto the detected 2D silhouette during the tracking process. An overview of the proposed approach is shown in Figure 1.

Figure 1. Overview of the proposed approach.
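As an illustration of the silhouette segmentation step (the specific segmentation method is not detailed above), a common skin-color-based approach can be sketched as follows. This is a minimal sketch in Python/OpenCV, not the implementation used in this project; the HSV thresholds are illustrative assumptions and would need tuning for real footage.

    import cv2
    import numpy as np

    def segment_hand_silhouette(frame_bgr):
        """Return a binary hand silhouette mask from a BGR video frame.
        Illustrative skin-color thresholding; the HSV bounds below are
        assumptions, not values from the project."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        lower, upper = np.array([0, 40, 60]), np.array([25, 180, 255])
        mask = cv2.inRange(hsv, lower, upper)
        # Clean up the mask and keep only the largest connected blob (the hand).
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        silhouette = np.zeros_like(mask)
        if contours:
            hand = max(contours, key=cv2.contourArea)
            cv2.drawContours(silhouette, [hand], -1, 255, thickness=cv2.FILLED)
        return silhouette

The resulting binary mask is the 2D silhouette onto which the articulated model is fitted in the steps described next.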

4.1 Modeling Hand Constraints
Without any constraints, analyzing human hand motion is an exhaustive task. However, the joints in a hand are interrelated and their motions are related by physical constraints. Huang et al. [4] showed in detail that human hand motion is highly constrained; a learning approach was proposed to model the hand configuration space sensed directly from a CyberGlove. In this project, three types of constraints are defined for natural hand motion. Figure 2 shows the articulated hand model proposed to model the hand motion constraints.

Figure 2. The articulated hand model.

Each joint j_fi is a revolute joint, where f is the index of the fingers, enumerated from thumb to little finger (f = 0 ~ 4). All the first joints (where i = 0) connecting the palm and each finger have 2 degrees of freedom, and the other joints have a single rotational axis. Therefore, the suggested model has 20 DOF.

Intra-Finger Constraints. The main constraint considered in this category concerns the finger joints during bending. The number of rotational axes in a joint is constrained. By using the planar 3-joint-arm finger model, the DOF of a finger is reduced to 4. The first joint of a finger, j_f0, has 2 rotation axes (one for the bending motion and another for the abduction/adduction motion), but the other joints have a single rotation axis. Table 1 shows the range of each joint angle for the bending motion.

Table 1. Joint angle constraints for the bending motion
  j_fi        i = 0          i = 1         i = 2
  f = 0       -10° ~ +45°    0° ~ +90°     -5° ~ +90°
  f = 1~4     -15° ~ +90°    -5° ~ +90°    -5° ~ +90°

Each pair of contiguous joints satisfies the following constraint:

  θ_fi = a_i · θ_f(i-1)    (1)

where the finger index f = 1 to 4 and the joint index i = 1, 2, and a_i is defined as (limit angle of j_fi) / (limit angle of j_f(i-1)). This type of constraint is not strictly enforced, but it is an efficient guide for joint offset estimation.
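To make the joint-limit and coupling constraints concrete, the following Python sketch clamps the bending angles of one finger to the ranges of Table 1 and applies the soft coupling of Eq. (1). It is an illustration, not the project's implementation; "limit angle" is read here as the upper bound of the bending range.

    # Joint angle limits (degrees) for the bending motion, mirroring Table 1.
    # f = 0 is the thumb; f = 1..4 are the other fingers; i = 0, 1, 2 index the joints.
    THUMB_LIMITS = [(-10, 45), (0, 90), (-5, 90)]
    FINGER_LIMITS = [(-15, 90), (-5, 90), (-5, 90)]

    def bending_limits(f):
        return THUMB_LIMITS if f == 0 else FINGER_LIMITS

    def clamp_bending_angles(f, angles_deg):
        """Clamp the three bending angles of finger f to the ranges of Table 1."""
        return [min(max(a, lo), hi)
                for a, (lo, hi) in zip(angles_deg, bending_limits(f))]

    def couple_contiguous_joints(f, angles_deg):
        """Soft intra-finger constraint of Eq. (1): theta_fi = a_i * theta_f(i-1),
        with a_i = (limit angle of j_fi) / (limit angle of j_f(i-1)).
        Used as a guide, not enforced strictly."""
        limits = bending_limits(f)
        coupled = list(angles_deg)
        for i in (1, 2):
            a_i = limits[i][1] / limits[i - 1][1]
            coupled[i] = a_i * coupled[i - 1]
        return coupled

For instance, when only the base bending angle of a finger is reliably observed, couple_contiguous_joints propagates it to the two distal joints before the estimate is refined against the silhouette.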

Inter-Finger Constraints. The proposed model has fixed geometric proportions among the first joints j_f0 (the offsets of all the first joints of the fingers). No joint can overlap another in 3D coordinates. Interrelated motion constraints for the abduction/adduction motion between adjacent fingers are also defined in this category: an adjacent finger tends to follow the dominant articulated motion.

Global Motion Constraints. This category includes restrictions on the target motion. The tracking objective is natural hand motion, beginning with a favorable pose for initialization. No external torque on a joint is expected, and the motion should be smooth. With these constraints, we address the problem of tracking articulated hand motion with self-occlusion, rotation, and translation, but the proposed approach does not address scaling and depth estimation from a monocular view. Some recent studies introduce constraints similar to the ones above; however, they are rarely modeled as an explicit form of equations.

Figure 3. Model initialization. (a) Distance map. (b), (c) Approaching the median axis (minimum distance) along the gradient of the distance map. (d) Joint position refinement. (e) Coarse 2D model with ellipsoid fitting. (f) An example after refinement.

Figure 4. Kinematic finger model.

4.2 Hand Model Fitting with Hand Constraints
A sequence of hand motion begins with a favorable initial pose, as shown in Figure 3. By using an ellipsoid fitting [12] on top of the segmented hand silhouette, a coarse scale and orientation of the hand are computed. Borgefors' two-pass algorithm [12] is used to compute the distance map d(x,y) of the hand image for the model refinement:

  D(x, y) = -d(x, y)   if (x, y) ∈ Silhouette
  D(x, y) =  d(x, y)   otherwise                    (2)

By minimizing the distance D(x,y) along the gradient ∇D(x,y), each joint offset can be refined. Each joint width (thickness) can also be estimated from D(x,y). Figure 3 shows an example of model initialization in 2D. Based on this information, the initial configuration of the 3D hand model is refined.
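The signed distance map of Eq. (2) and the gradient-based joint refinement can be sketched as follows (Python/OpenCV). The sketch uses OpenCV's distance transform as a stand-in for Borgefors' two-pass algorithm and is illustrative, not the code used in this project.

    import cv2
    import numpy as np

    def signed_distance_map(silhouette_mask):
        """Signed distance D(x, y) of Eq. (2): negative inside the silhouette,
        positive outside. `silhouette_mask` is a binary uint8 image (255 = hand).
        cv2.distanceTransform is used here in place of Borgefors' two-pass
        chamfer algorithm."""
        inside = cv2.distanceTransform(silhouette_mask, cv2.DIST_L2, 5)
        outside = cv2.distanceTransform(cv2.bitwise_not(silhouette_mask), cv2.DIST_L2, 5)
        return outside - inside  # negative inside the hand, positive outside

    def refine_joint_offset(D, x, y, steps=20, step_size=1.0):
        """Slide a 2D joint position down the gradient of D toward the medial
        axis (the locally most negative values), as in Figure 3 (b)-(d)."""
        gy, gx = np.gradient(D)
        for _ in range(steps):
            xi = int(np.clip(round(x), 0, D.shape[1] - 1))
            yi = int(np.clip(round(y), 0, D.shape[0] - 1))
            x -= step_size * gx[yi, xi]
            y -= step_size * gy[yi, xi]
        return x, y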

Using the inter-finger constraints and the global motion constraints, a hand motion can be decomposed into a single global pose and individual finger motions. At time t, the first joints of three fingers, including the thumb (j_00) and the little finger (j_40), form a palm-triangle PT(t) whose three sides are w_t^1, w_t^2, and w_t^3. Over two consecutive palm-triangles PT(t) and PT(t+1), the proportions of the corresponding sides provide the global pose. By the global motion constraint, j_00 is the origin of the hand configuration; thus, the translation vector T_g can be estimated from two consecutive 2D positions of j_00 detected in the hand silhouette. By the rigid motion constraint, the ratio of corresponding sides, w_t^p / w_{t+1}^p, gives the angle of the rotation vector R_g. In Figure 5, the palm-triangle drawn in bright violet shows the estimated global pose.

Figure 5. 3D hand reconstruction 1: the 1st row shows the result of the global pose estimation with the palm-triangle and the re-projected hand model; the 2nd row shows the reconstructed 3D hand model.

Figure 6. 3D hand reconstruction 2: the 1st row shows the original hand pose; the 2nd row represents the reconstructed pose; the 3rd row is a different angle of view.

After estimating the global rotation and translation [R_g, T_g], each joint angle of a finger can be estimated from the equations in Figure 4. The length of each link in a finger was fixed during the initialization; therefore, the joint angle θ_i for the bending motion can be computed explicitly. The proposed tracking technique is based on tracking by synthesis: to maximize the likelihood, possible sets of configurations are tested. The adjustment of the hand model is guided by the motion constraints and by the configuration of the previous frame. The disparity vector of the closest boundary between a model segment and the detected silhouette is computed to estimate the adjustment of each joint angle. The adjusted 3D hand configuration is projected back onto the 2D image and updated from the 2D image measurements back and forth. Figure 5 shows some results of the motion tracking: the reconstructed 3D hand model and its re-projected 2D model are correctly matched with the detected silhouette. Various hand poses, including self-occlusion, and the corresponding results are shown in Figure 6.
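The global pose estimation from the palm-triangle can be sketched as follows (Python). The translation T_g follows directly from the displacement of the thumb base; the rotation-angle recovery below assumes a rigid palm and near-orthographic projection and uses the arccosine of the side-length foreshortening ratio, which is one possible reading of the description above rather than the exact formula used in the project.

    import numpy as np

    def global_pose_from_palm_triangles(pt_t, pt_t1):
        """Estimate a coarse global translation and rotation angle from two
        consecutive palm-triangles PT(t) and PT(t+1).

        pt_t, pt_t1: (3, 2) arrays with the 2D image positions of the three
        first joints, thumb base (j_00) first. Assumptions: rigid palm and
        near-orthographic projection; illustrative only."""
        pt_t = np.asarray(pt_t, dtype=float)
        pt_t1 = np.asarray(pt_t1, dtype=float)

        # Translation T_g from two consecutive positions of the thumb base j_00.
        T_g = pt_t1[0] - pt_t[0]

        # Side lengths of each triangle (w^1, w^2, w^3), in a consistent order.
        def side_lengths(tri):
            return np.array([np.linalg.norm(tri[(p + 1) % 3] - tri[p]) for p in range(3)])

        w_t, w_t1 = side_lengths(pt_t), side_lengths(pt_t1)
        ratios = np.clip(w_t1 / w_t, 0.0, 1.0)     # foreshortening of corresponding sides
        theta = float(np.mean(np.arccos(ratios)))  # coarse out-of-plane rotation angle
        return T_g, theta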

5. Short Description of Achievements in Previous Years
The 3D shape of the hands was reconstructed from 2D hand silhouettes extracted from two calibrated cameras using a GC-based approach. Hand pose information can be computed by matching these synthesized shapes to a pre-defined hand model or by learning hand postures using a 3D shape description. The framework we previously developed for 3D body reconstruction [14] has been modified for 3D hand reconstruction. The detected silhouettes are segmented into separate fingers: we compute edge information in the segmented hand region (inside the silhouettes) to reconstruct each finger as a cylindrical form. This local edge analysis allows us to reconstruct the 3D hand shape accurately whether or not the fingers are splayed.

5a. Detail of Accomplishments During the Past Year
The goal of this project is to understand and reconstruct human hand motion with natural articulation and occlusion in real time. The objective of this research effort is to augment or replace the mouse-and-keyboard paradigm with functionalities relying on natural hand motion for driving a specific application. During this year we focused on inferring accurate hand motion from a monocular view. An articulated model is fitted accurately by the proposed model refinement technique, and the global pose of the hand is computed by characterizing the orientation of the palm. Individual finger motion is modeled and tracked as a planar arm. We have mainly addressed the problem of tracking simultaneous rotation, translation, and self-occlusion in a monocular view. The proposed approach is easy to extend to a real-time, multi-view framework, although the target motion is somewhat restricted.

6. Other Relevant Work Being Conducted and How this Project is Different
A large number of studies have been proposed for capturing human hand motion. Some of them focus on a very specific domain and therefore rely on a large number of restrictive assumptions. In [9,10,11], trajectories of hand motion are considered for classifying human gestures and are sufficient for the studied domain. Among the frameworks that have been proposed, we can distinguish two types: appearance-based and model-based methods. In [7], an appearance-based method is proposed for characterizing the various hand configurations and finger poses. It relies on a learning-based approach and requires training the algorithm on a large data set that depicts all possible configurations. Similarly, in [8], the articulated pose of a hand is formulated as a database indexing problem: from a large set of synthetic hand images, the closest matches to an input hand image are retrieved as the estimated pose. Model-based approaches rely on the physical properties of hand and finger motion [1,2,3] in order to infer it from a single view or from multiple views. Huang et al. [5] proposed a model-based approach that integrates constraints of natural hand motion. They also proposed a learning approach to model the kinematic constraints directly from a real input device such as the CyberGlove [4]. They focused on the analysis of local finger motion and on constraints derived from observed data. For articulated motion tracking, a Sequential Monte Carlo algorithm based on importance sampling is employed; with a fixed viewpoint this produces good results, but global hand position and motion are not dealt with. Recently, studies integrating various cues such as motion, shading, and edges have been proposed. Metaxas et al. [6] compute the model-driving forces using a gradient-based optical flow constraint and edges. A forward recursive dynamic model is introduced to track the motion in response to 3D data-derived forces applied to the model. They assume that the global hand motion is rigid and that no simultaneous articulated motion occurs; when the fingers undergo significant relative motion with occlusions, these forces can hardly drive the hand model.

7. Plan for the Next Year
We plan to extend the capabilities of the current system to a multi-view framework. We will also focus on the development of an efficient joint angle tracking technique for both real-time finger motion tracking and hand localization in 3D.

8. Expected Milestones and Deliverables
For the next five years, we will focus on expanding the current method toward the recognition of hand gestures, as well as tracking the hand motion of a person manipulating objects.

2004-2005: Articulated model fitting and tracking; efficient tracking technique for real-time performance.
2005-2006: Gesture recognition from the inferred articulated model; computer interaction using basic hand gestures.
2006-2008: Multimodal interaction using hand motion.

9. Member Company Benefits
We are investigating two potential applications: the manipulation of 3D objects displayed on a 3D LCD, and the evaluation of handwriting deficiencies. We initiated a collaboration with occupational therapy specialists who identified this technology as very important for characterizing handwriting deficiencies among persons afflicted with ADD syndrome. Another potential use of the technology is the monitoring of operators performing maintenance tasks on expensive machinery.

10. References
[1] J. M. Rehg and T. Kanade, "Model-Based Tracking of Self-Occluding Articulated Objects", ICCV, 1995.
[2] Q. Delamarre and O. Faugeras, "Finding Pose of Hand in Video Images: A Stereo-Based Approach", IEEE International Conference on Automatic Face and Gesture Recognition, Japan, 1998.
[3] B. Stenger, P. R. S. Mendonça, and R. Cipolla, "Model-Based 3D Tracking of an Articulated Hand", CVPR, Vol. II, pp. 310-315, Kauai, USA, December 2001.
[4] J. Lin, Y. Wu, and T. S. Huang, "Modeling Human Hand Constraints", Workshop on Human Motion, Austin, December 2000.
[5] Y. Wu, J. Lin, and T. S. Huang, "Capturing Natural Hand Articulation", ICCV 2001, pp. 426-432.
[6] S. Lu, G. Huang, D. Samaras, and D. Metaxas, "Model-Based Integration of Visual Cues for Hand Tracking", Proc. WMVC 2002.
[7] R. Rosales, V. Athitsos, L. Sigal, and S. Sclaroff, "3D Hand Pose Reconstruction Using Specialized Mappings", IEEE ICCV, Canada, July 2001.
[8] V. Athitsos and S. Sclaroff, "Estimating 3D Hand Pose from a Cluttered Image", CVPR 2003.
[9] R. Cutler and M. Turk, "View-Based Interpretation of Real-Time Optical Flow for Gesture Recognition", Proc. Face and Gesture Recognition, pp. 416-421, 1998.

[10] A. Nishikawa, A. Ohnishi, and F. Miyazaki, "Description and Recognition of Human Gestures Based on the Transition of Curvature from Motion Images", Proc. Face and Gesture Recognition, pp. 552-557, 1998.
[11] Y. Xiong and F. Quek, "Gestural Hand Motion Oscillation and Symmetries for Multimodal Discourse: Detection and Analysis", Proc. Workshop on Computer Vision and Pattern Recognition for HCI, June 2003.
[12] Intel Corp., Open Source Computer Vision Library Reference Manual, 1999.
[13] L. Sciavicco and B. Siciliano, Modeling and Control of Robot Manipulators, McGraw-Hill, NY, 1996.
[14] I. Cohen and M. W. Lee, "3D Body Reconstruction for Immersive Interaction", Second International Workshop on Articulated Motion and Deformable Objects, Palma de Mallorca, Spain, 21-23 November 2002.
