Dynamic Human Shape Description and Characterization


Z. Cheng*, S. Mosher, Jeanne Smith, H. Cheng, and K. Robinette
Infoscitex Corporation, Dayton, Ohio, USA
711th Human Performance Wing, Air Force Research Laboratory, Dayton, Ohio, USA

Abstract

Dynamic human shape description and characterization were investigated in this paper. The dynamic shapes of a subject in four activities (jogging, limping, shooting, and walking) were generated via 3-D motion replication. The Paquet Shape Descriptor (PSD) was used to describe the shape of the subject in each frame. The unique features of dynamic human shapes were revealed from observations of the 3-D plots of the PSDs. Principal component analysis was performed on the calculated PSDs, and the principal components (PCs) were used to characterize them. The PSD was then reasonably approximated by its first few projections in the eigenspace formed by the PCs and represented by the corresponding projection coefficients. As such, the dynamic human shapes for each activity were described by these projection coefficients. Based on the projection coefficients, data mining technology was employed for activity classification. Case studies were performed to validate the methodology developed.

Keywords: Human Modeling, Dynamic Shape, Shape Descriptor, Principal Component Analysis, Activity Recognition

1. Introduction

While a human is moving or performing an action, his body shape changes dynamically. In other words, shape change and motion are tied together during a human action (activity). However, human shape and motion are often treated separately in activity recognition. Shape dynamics describe the spatial-temporal shape deformation of an object during its movement and thus provide important information about the identity of a subject and the motions performed by the subject (Jin and Mokhtarian, 2006). A few researchers have utilized shape dynamics for human activity recognition.
In (Kilner et al., 2009), the authors addressed the problem of human action matching in outdoor sports broadcast environments by analyzing 3-D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. In (Niebles and Li, 2007), a video sequence was represented as a collection of spatial and spatial-temporal features obtained by extracting static and dynamic interest points; a hierarchical model was then proposed that can be characterized as a constellation of bags-of-features, both spatial and temporal. In (Jin and Mokhtarian, 2006), a system was proposed for recognizing object motions based on their shape dynamics; the spatial-temporal shape deformation in motions was captured by hidden Markov models. In (Blank et al., 2005), human action in video sequences was seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion, and human actions were regarded as three-dimensional shapes induced by the silhouettes in the space-time volume.

Dynamic human shapes can be described by a dynamic 3-D human shape model which, in turn, can be extracted from 2-D video imagery or 3-D sensor data, or created by 3-D replication/animation. A dynamic 3-D shape model usually contains tens of thousands of graphic elements (vertices or polygons). In order to use the information coded in dynamic shapes for human identification and activity recognition, it is necessary to find an effective method for dynamic shape description and characterization.

2. Dynamic Shape Creation

Since the technologies capable of capturing the 3-D dynamic shapes of a subject during motion are still very limited in maturity and availability, very little data on dynamic human shapes are available at this time. However, as a motion capture system can be used to capture human motion and a laser scanner can be used to

*Corresponding author. Email: Zhiqing.cheng@wpafb.af.mil

capture human body shape, various techniques have been developed to replicate/animate human motion in 3-D space, thus generating the dynamic shapes of a subject in an action. In this paper, Blender (http://www.blender.org), an open-source software tool, was used to animate the motion of a human subject in 3-D space during four different activities: walking, jogging, limping, and shooting. The data used as the basis for the animation were acquired in the Human Signatures Laboratory of the US Air Force and included scan data and motion capture (MoCap) data. The human subject, with markers attached, was scanned using the Cyberware whole-body scanner. Motion capture data were acquired for the same subject with the same markers attached. The markers allowed the joint centers to be determined for both the scan and the MoCap data. The scan was imported into Blender, and the joint centers were used to define the skeleton in the BVH (Biovision Hierarchy) file format. Euler angles for the different body segments were computed from the joint centers and other markers and were used to set up the BVH files for the four activities. The BVH files were imported into Blender and used to animate the whole-body scan of the subject. Figure 1 shows images captured from the animation within Blender for the four activities. From the Blender animation, a 3-D mesh can be output at each frame of motion, as shown in Fig. 2 for limping, representing the 3-D dynamic body shape of the subject at that instant of the motion. Thus, the 3-D mesh output in each frame can be used as simulation data of dynamic human shapes for training the algorithms developed for activity recognition.

3. Dynamic Shape Description and Characterization

The dynamic shapes shown in Figs. 1 and 2 are represented by 3-D meshes. Each mesh may contain as many as tens of thousands of graphical elements (vertices or polygons).
It is not feasible to use the vertices or polygons directly for the analysis of human shape dynamics. One way to describe dynamic shapes effectively and to enable further analysis is to use a shape descriptor (Cohen and Li, 2003; Chu and Cohen, 2005). In this paper, the Paquet Shape Descriptor (PSD) (Paquet et al, 2000; Robinette, 2003), with certain modifications, is used to describe dynamic shapes and to analyze shape dynamics. As illustrated in Fig. 3, the PSD uses 120 bins (discrete parameters) to characterize shape variation. Among these 120 bins, 40 are related to the radius r, 40 to the first angle (cos(θ)), and 40 to the second angle (cos(δ)). The details of the PSD calculation are omitted here. The 3-D plots of the time histories of the PSD for the four activities are illustrated in Fig. 4, where the first 40 bins, corresponding to the radius, are shown at the top, the next 40 bins of cos(θ) in the middle, and the last 40 bins of cos(δ) at the bottom.

Figure 1. Replication of a subject in four activities: limping, jogging, shooting, and walking.

Figure 2. Dynamic shapes of a subject during limping.

Figure 3. Paquet shape descriptor and its coordinate system.

By visually inspecting these plots, one can find the following.

- The variation of each bin over time differs: some bins vary markedly and significantly over time, while others do not.
- Periodic features are exhibited by the plots for the activities of jogging, limping, and walking.
- The 3-D plot for each activity is unique; there are visible and significant differences among the plots for the four activities.

These observations from the PSD reveal some unique features of shape dynamics. However, directly using the PSD to analyze shape dynamics is still not feasible, since it has 120 bins (variables), which form a 120-dimensional space. Further treatment is necessary to characterize the shape descriptor and to reduce the dimension of the problem space.
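Although the details of the PSD calculation are omitted above, its general form can be sketched. The code below is a plausible reading, not the authors' implementation: it assumes the 120 bins are per-frame histograms of the radius r and two direction cosines of each mesh vertex about the body centroid; the function name and the choice of axes for cos(θ) and cos(δ) are hypothetical.

```python
import numpy as np

def psd_like_descriptor(vertices, n_bins=40):
    """Sketch of a PSD-style descriptor: histograms of the radius r and two
    direction cosines of each vertex about the centroid, concatenated into
    a 3 * n_bins vector (120 bins for n_bins = 40, as in the paper)."""
    v = np.asarray(vertices, dtype=float)
    v = v - v.mean(axis=0)                  # center on the body centroid
    r = np.linalg.norm(v, axis=1)
    r_safe = np.where(r == 0, 1.0, r)
    cos_theta = v[:, 2] / r_safe            # angle to the vertical axis
    cos_delta = v[:, 0] / r_safe            # angle to a lateral axis (assumed)
    h_r, _ = np.histogram(r / r.max(), bins=n_bins, range=(0.0, 1.0))
    h_t, _ = np.histogram(cos_theta, bins=n_bins, range=(-1.0, 1.0))
    h_d, _ = np.histogram(cos_delta, bins=n_bins, range=(-1.0, 1.0))
    d = np.concatenate([h_r, h_t, h_d]).astype(float)
    return d / len(v)                        # normalize by vertex count

# Random point cloud as a stand-in for a body mesh's vertices.
rng = np.random.default_rng(0)
desc = psd_like_descriptor(rng.normal(size=(1000, 3)))
print(desc.shape)  # (120,)
```

Evaluating such a descriptor on the mesh output at every frame yields the per-frame PSD vectors whose time histories are plotted in Fig. 4.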

Figure 4. The time histories of the 120 bins of the PSD for the four activities.
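The periodicity visible in the jogging, limping, and walking plots of Fig. 4 can also be checked numerically. A minimal sketch: estimate the dominant period of a single bin's time history from the first peak of its autocorrelation. The 50-frame synthetic gait cycle below is illustrative only, not data from the paper.

```python
import numpy as np

def dominant_period(series):
    """Estimate the dominant period (in frames) of a bin's time history
    from the first local maximum of its normalized autocorrelation."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    ac = ac / ac[0]
    for lag in range(1, len(ac) - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] > ac[lag + 1] and ac[lag] > 0.5:
            return lag
    return None  # no clear periodicity (e.g., the shooting activity)

# Synthetic periodic bin history: a 50-frame cycle over 300 frames.
t = np.arange(300)
series = np.sin(2 * np.pi * t / 50)
print(dominant_period(series))  # 50
```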

Therefore, principal component analysis (PCA) is used to characterize the high-dimensional space defined by the PSD. Denote

p_ijk = {p_1 p_2 ... p_120}^T, (1)

the PSD for the i-th subject in the j-th activity at the k-th frame. For the data collected, denote

P = {p_ijk}, i = 1, ..., I; j = 1, ..., J; k = 1, ..., K, (2)

where I is the number of subjects, J is the number of activities, and K is the number of frames for each activity. Note, however, that the number of activities each subject performs can differ, as can the number of frames for each activity. By performing PCA on P, one can find the principal components that characterize the space defined by the shape descriptor.

In this paper, dynamic shapes were created for the four activities at a frame interval of 0.02 s, with 85 frames for jogging, 352 frames for limping, 554 frames for shooting, and 227 frames for walking. The percentage of variance of each principal component (PC) is shown in Fig. 5, and the first four PCs are shown in Fig. 6. The original PSD vector can be projected onto the space (eigenspace) formed by the PCs; that is, it can be expanded in terms of the PCs. As shown in Fig. 5, among all 120 PCs, only the first 10~20 are significant. This means that the original PSD can be reasonably approximated by its first few projections in the eigenspace and represented by the projection coefficients corresponding to these significant PCs. Figure 7 illustrates the time histories of the first and second projection coefficients for the four activities.

Figure 5. Percentage of variance of each principal component.

Figure 6. First four principal components of the PSD.
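The PCA of Eqs. (1) and (2) and the per-component variance percentages of Fig. 5 can be reproduced with a standard singular value decomposition. The sketch below uses a synthetic stand-in for the descriptor matrix P (the study's data are not available here), with N = 1218 frames of a 120-bin descriptor as in the paper.

```python
import numpy as np

# Synthetic stand-in for P: N = 1218 frames of a 120-bin descriptor,
# driven by 3 underlying factors plus a small amount of noise.
rng = np.random.default_rng(1)
M, N = 120, 1218
latent = rng.normal(size=(N, 3)) @ rng.normal(size=(3, M))
P = latent + 0.01 * rng.normal(size=(N, M))   # rows = frames (records)

X = P - P.mean(axis=0)                  # center each bin
U, S, Vt = np.linalg.svd(X, full_matrices=False)
var_pct = 100 * S**2 / np.sum(S**2)     # Fig. 5: % of variance per PC
W = Vt.T                                # columns v_m are the PCs, Eq. (3)
Y = X @ W                               # projection coefficients, Eq. (4)

print(Y.shape)                          # one coefficient row per frame
print(var_pct[:3].sum())                # first few PCs dominate
```

Because only the leading components carry significant variance, truncating W to its first L columns (Eq. (5)) reduces each 120-bin frame to L coefficients with little loss.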

Figure 7. Time histories of the first and second projection coefficients for the four activities.

Denote

W_M = {v_1 v_2 ... v_M}, (3)

where v_m is the m-th principal component (eigenvector) of P. The original observations (data) can be projected onto the space defined by W_M; that is,

Y_M^T = P^T W_M, (4)

where Y_M = Y[M, N] is the matrix of projection coefficients, each column of which corresponds to one original record, M is the dimension of the shape descriptor (M = 120 for the PSD), and N is the total number of shapes observed (N = 1218 for the case in this paper). From Fig. 5 we can see that, of the 120 principal components in total, fewer than 20 are significant. This means that, instead of using the full space of dimension M, one can construct a new space with only the significant principal components, that is,

W_L = {v_1 v_2 ... v_L}, L < M, (5)

which substantially reduces the dimension of the space. For the case investigated in this paper, L = 20, which is much smaller than M = 120. The projection onto this reduced space is then given by

Y_L^T = P^T W_L, (6)

where Y_L = Y[L, N]. Each original record can be either fully reconstructed by Eq. (4) or partially reconstructed (approximated) by Eq. (6). Usually an original record can be well approximated by its partial reconstruction with the significant principal components. This means that the original data of dimension M can be represented by projection coefficients of dimension L (L << M). In this space of reduced dimension the problem becomes tractable, as the number of variables is much smaller. In fact, for the case in this paper, the two projection coefficients corresponding to the first two most significant principal components are sufficient to represent the shape dynamics for action recognition. The sequence of a projection coefficient over the frames of a particular subject in a particular action constitutes a time series, as shown in Fig. 7. The time histories of the first and second coefficients are unique to each action, and so can be used as discriminators for activity recognition.

4. Activity Recognition Based on Shape Dynamics

The shape dynamics of a subject during motion, as described in Section 3, can be used for activity recognition. In this paper, a data mining tool was employed to classify the four activities (jog, limp, shoot, walk) based on 85 frames from each activity. Note that in the classification each frame was treated independently rather than placed in sequence as a time series. Five attributes were used in the classification: (a) Pelvis_Velocity, the resultant velocity at the mid-pelvis location; (b) PC1, the first projection coefficient; (c) PC2, the second projection coefficient; (d) PC1_Velocity, the derivative of PC1; and (e) PC2_Velocity, the derivative of PC2. The significance of each attribute can be assessed in terms of gain ratio, as given in Table 1.
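Gain ratio, the ranking criterion behind Table 1, is information gain normalized by split information. A minimal sketch, assuming equal-width discretization of each numeric attribute (the exact discretization used by the authors' tool may differ); the attribute values and labels below are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr_values, labels, n_bins=4):
    """Gain ratio of a numeric attribute after equal-width discretization:
    (class entropy - conditional entropy) / split information."""
    lo, hi = min(attr_values), max(attr_values)
    width = (hi - lo) / n_bins or 1.0
    bins = [min(int((v - lo) / width), n_bins - 1) for v in attr_values]
    n = len(labels)
    cond, split = 0.0, 0.0
    for b in set(bins):
        idx = [i for i, bb in enumerate(bins) if bb == b]
        w = len(idx) / n
        cond += w * entropy([labels[i] for i in idx])
        split -= w * math.log2(w)
    gain = entropy(labels) - cond
    return gain / split if split else 0.0

# Hypothetical attribute that separates two activities perfectly.
vals = [0.1, 0.2, 0.15, 0.9, 0.8, 0.95]
labs = ["walk", "walk", "walk", "jog", "jog", "jog"]
print(gain_ratio(vals, labs))  # 1.0
```

An attribute with no discriminating power scores near 0; a perfectly discriminating one, as above, scores 1.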
While Pelvis_Velocity is the most significant, all five attributes were selected for classification. Various classification methods are available, such as those provided by Weka (http://www.cs.waikato.ac.nz/ml/weka/). Among them, the five conventional methods listed in Table 2 were chosen for the case study. All of them achieved a classification accuracy greater than 95%, as shown in Table 2.

Table 1. Attribute ranking results

Table 2. Classification accuracy

5. Conclusion

Based on the study in this paper, the following conclusions can be drawn.

- Shape dynamics contain information about both body motion and shape change and have great potential for human identification and activity recognition.
- Shape dynamics can be well captured by a shape descriptor and further characterized by principal components.
- Human motion/action in 3-D space can be replicated or animated with high biofidelity,

which can be used to generate data for training a model or for evaluating the performance of a tool.
- Using a dynamic 3-D human shape model for human activity recognition is plausible. This approach is unique in that it differs from conventional techniques based on 2-D imagery or models, and it is effective in that it can overcome the shortcomings inherent in 2-D methods.
- As a shape descriptor, the PSD is not reversible. This means that while it can be used for analysis, as in this paper, it cannot be used for shape reconstruction. Also, spatial information may not be uniquely represented in the original definition of the PSD, which can be remedied by certain treatments or modifications.

It should be pointed out that the dynamic shape models used in this study were created from 3-D surface scan data and motion capture data using OpenSim and Blender. While these models provide a highly biofidelic description of body shape during motion, the body surface deformation may not be fully or accurately represented by them. However, since the body shape variation induced by articulated motion is much larger than the surface deformation, most observations and results of this paper can reasonably be postulated to hold even when the surface deformation is represented more precisely. More investigation is needed to validate this assumption.

Acknowledgement

This study was carried out under the support of SBIR Phase I funding (FA865-1-M-692) provided by the US Air Force.

References

Blank M, Gorelick L, Shechtman E, Irani M, and Basri R, 2005. Actions as Space-Time Shapes. In: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05).

Chu C-W and Cohen I, 2005. Posture and Gesture Recognition using 3-D Body Shapes Decomposition. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05).

Cohen I and Li H, 2003. Inference of Human Posture by Classification of 3-D Human Body Shape. In: IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV 2003.

Jin N and Mokhtarian F, 2006. A Non-Parametric HMM Learning Method for Shape Dynamics with Application to Human Motion Recognition. In: The 18th International Conference on Pattern Recognition (ICPR '06).

Kilner J, Guillemaut J-Y, and Hilton A, 2009. 3-D Action Matching with Key-Pose Detection. In: 2009 IEEE 12th International Conference on Computer Vision Workshops.

Niebles J-C and Li F-F, 2007. A Hierarchical Model of Shape and Appearance for Human Action Classification. In: IEEE Computer Vision and Pattern Recognition (CVPR 2007).

Paquet E, Rioux M, Murching A, Naveen T, and Tabatabai A, 2000. Description of shape information for 2-D and 3-D objects. Signal Processing: Image Communication 16(2), pp 103-122.

Robinette K, 2003. An Investigation of 3-D Anthropometric Shape Descriptors for Database Mining. Ph.D. Thesis, University of Cincinnati.