Thiruvarangan Ramaraj, CS525 Graphics & Scientific Visualization, Spring 2007, Presentation I, February 28th 2007, 14:10-15:00
Topic (Research Paper): Jinxian Chai and Jessica K. Hodgins, Performance Animation from Low-dimensional Control Signals. In ACM Transactions on Graphics (Proceedings of SIGGRAPH 2005).
Table of contents:
0. Introduction
1. System Overview
2. Motion Performance
   2.1 Skeleton Calibration
   2.2 Marker Calibration
3. Online Local Modeling
   3.1 Nearest Neighbor Construction
4. Online Motion Synthesis
5. Results
6. Conclusion
0. Introduction: This paper studies real-time animation and control of three-dimensional human motions using low-cost and non-intrusive devices. It introduces an approach to performance animation that employs video cameras and a small set of retro-reflective markers, so that a user can create a simple, low-cost, and easy-to-use system at home. The low-dimensional control signals from the user's performance are supplemented by a database of pre-recorded human motion. At run time, the system learns a series of local models from a set of motion capture examples that closely match the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a full-body animation. The authors demonstrate the power and flexibility of this approach by having users control six behaviors in real time without significant latency: walking, running, hopping, jumping, boxing, and Kendo (Japanese sword art). The reconstructed motion is based on a single large human motion database, and the resulting animation captures the individual style of the user's motion through spatio-temporal interpolation of the data. Finally, they assess the quality of the reconstructed motion by comparing it against ground-truth data simultaneously captured with a full marker set in a commercial motion capture system.

Figure 1: Users wearing a few retro-reflective markers control the full-body motion of avatars by acting out the motion in front of two synchronized cameras. Examples, from left to right: walking, boxing, and Kendo (Japanese sword art).
1. System Overview: First, a series of off-line captures is performed to create a large and heterogeneous human motion database, using an optical motion capture system with twelve 120 Hz MX-40 cameras. This database contains ten full-body behaviors: boxing (71597 frames), walking (105963 frames), running (18523 frames), jumping (40303 frames), hopping (18952 frames), locomotion transitions (36251 frames), dancing (18002 frames), basketball (12484 frames), climbing on playground equipment (51947 frames), and Kendo (59600 frames). They used 41 markers, an adaptation of the Helen Hayes marker set; four extra markers were placed on the bamboo sword for Kendo.

Figure (2): System Overview

Each motion in the database has a skeleton, and each motion sequence contains trajectories for the absolute position and orientation of the root node (pelvis) as well as the relative joint angles of 18 joints. The set of motion capture data in the database is denoted {q_n | n = 1, ..., N}, where q_n is the joint-angle representation of a specific pose in the database. The control signals are denoted by c_t, and z_t denotes the position and orientation of the user. The online motion control problem is to synthesize the current human body pose q_t based on the current low-dimensional control signals c_t, the motion capture data in the database {q_1, ..., q_N}, and the poses synthesized in the previous frames, [q_1, ..., q_(t-1)]. There are three major components in this system:
1. Motion performance (real-time control signal capture)
2. Online local modeling (real-time local pose modeling)
3. Online motion synthesis
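The per-frame structure of the online motion control problem can be sketched as a loop: each synthesized pose q_t depends on the current control signal c_t, the database, and the poses already produced. The sketch below is a hypothetical simplification in Python (numpy), with nearest_pose standing in for the paper's local-model synthesis:

```python
import numpy as np

def nearest_pose(c_t, database, previous):
    """Toy stand-in for the paper's local-model synthesis: pick the
    database pose whose associated control-marker vector is closest
    to the current control signal c_t (a hypothetical simplification;
    the actual system fits a local linear model per frame)."""
    markers, poses = database  # markers[i] is the control signal for poses[i]
    i = int(np.argmin(np.linalg.norm(markers - c_t, axis=1)))
    return poses[i]

def synthesize_online(control_signals, database, synthesize_pose=nearest_pose):
    """Per-frame loop: q_t is synthesized from c_t, the database, and
    the previously synthesized poses [q_1, ..., q_(t-1)]."""
    synthesized = []
    for c_t in control_signals:
        synthesized.append(synthesize_pose(c_t, database, synthesized))
    return synthesized
```

The point of the loop is only to show the data dependencies; the real synthesis step is the optimization described in section 4.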
2. Motion Performance: Motion performance covers real-time control signal capture; an algorithm extracts the control marker locations from the video cameras. The input devices are Pulnix video cameras with 640x480 image resolution and a frame rate of 60 fps. The users wear a small set of retro-reflective markers and perform in front of the video cameras; a photography light placed near each camera illuminates the markers.

Figure (3): Control marker locations and their correspondence

Figure 3(a) shows the image from the left camera and figure 3(b) the image from the right camera. Figure 3(c) shows the detected marker positions in the left image, and figure 3(d) shows the detected marker locations in the right image along with the epipolar lines of the markers detected in the left image. For every marker location in the left image, the matching marker location in the right image lies on its corresponding epipolar line. Correspondence between the marker locations is therefore established using epipolar geometry and color similarity constraints. Epipolar geometry describes the geometric relationship between two views of the same scene.
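The epipolar matching step can be illustrated with a short sketch. Assuming a known 3x3 fundamental matrix F relating the two cameras, the epipolar line in the right image induced by a left-image marker x is F x, and candidate markers are scored by their distance to that line. The tolerance and the greedy matching below are illustrative assumptions, not the paper's exact procedure (which also uses color similarity to disambiguate):

```python
import numpy as np

def epipolar_distance(F, x_left, x_right):
    """Distance (in pixels) from a candidate right-image point to the
    epipolar line induced by a left-image point. F is the 3x3
    fundamental matrix; points are (u, v) pixel coordinates."""
    xl = np.array([x_left[0], x_left[1], 1.0])
    xr = np.array([x_right[0], x_right[1], 1.0])
    line = F @ xl  # epipolar line a*u + b*v + c = 0 in the right image
    return abs(line @ xr) / np.hypot(line[0], line[1])

def match_markers(F, left_pts, right_pts, tol=2.0):
    """Greedy correspondence sketch: each left marker is matched to the
    right marker nearest its epipolar line, if within tol pixels."""
    matches = []
    for i, xl in enumerate(left_pts):
        d = [epipolar_distance(F, xl, xr) for xr in right_pts]
        j = int(np.argmin(d))
        if d[j] < tol:
            matches.append((i, j))
    return matches
```

For a rectified stereo pair, F reduces to a form whose epipolar lines are horizontal, so matches must share (nearly) the same image row.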
Subject calibration makes the system interface robust to users of different sizes. It involves two steps: skeleton calibration and marker calibration.

2.1 Skeleton Calibration: The skeleton calibration step estimates the user's skeleton model from the 3D locations of a few markers. Markers are placed on the left hand, left elbow, left foot, left knee, and each shoulder, and two markers are placed on the front of the waist. The user assumes a T-pose and the system captures the 3D locations of the markers. These locations are not sufficient to compute a full, detailed skeleton model; therefore, the measured 3D marker locations are used to interpolate within a database of detailed skeleton models captured from a variety of subjects. The markers are then placed on the right limbs, and the right side of the skeleton model is estimated in the same fashion. This step is performed exactly once per user.

2.2 Marker Calibration: The second step, marker calibration, determines the location of the control markers used in the interface relative to their inboard joints. First, the locations of the markers in the world coordinate frame are calculated; with this information, the user's skeleton model, and forward kinematics, the 3D positions of the inboard joints relative to the world coordinate frame are computed. Forward kinematics is a standard method for animating articulated models in computer graphics. The 3D location of the control marker c_n corresponding to a pose in the motion capture database is given by

    c_n = f(q_n, z_0; s, v_l)

where f is the forward kinematics function, q_n is the current pose in the database, s is the user's skeleton model, v_l is the location of the control marker relative to its inboard joint, and z_0 is the default root position.
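A minimal sketch of the forward kinematics function f may make the marker formula concrete. As a simplifying assumption, the limb is modeled as a planar two-link chain: the skeleton s is reduced to link lengths, and the marker offset v_l is expressed in the frame of its inboard joint:

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def marker_world_position(joint_angles, link_lengths, v_l, z0):
    """Planar forward-kinematics sketch of c = f(q, z0; s, v_l):
    accumulate the joint rotations down a two-link limb (skeleton s
    given by link_lengths), starting from root position z0, then add
    the marker offset v_l expressed in the inboard joint's frame."""
    p = np.asarray(z0, dtype=float)  # position of the current joint
    R = np.eye(2)                    # accumulated rotation so far
    for theta, length in zip(joint_angles, link_lengths):
        R = R @ rot2d(theta)                  # rotate by this joint
        p = p + R @ np.array([length, 0.0])   # advance along the link
    return p + R @ np.asarray(v_l, dtype=float)
```

With all joint angles zero, the chain lies along the x-axis, so a marker offset of (0.5, 0) past two unit links lands at x = 2.5.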
3. Online Local Modeling: To synthesize the current pose q_t, the system searches the motion capture database for examples that are close to the current control signals c_t and the recently synthesized poses in the previous frames, [q_1, ..., q_(t-1)]. These examples are used as training data to learn a simple linear model via Principal Component Analysis (PCA). PCA is a technique for simplifying a data set by reducing multidimensional data to lower dimensions for analysis. To find the closest examples of the current pose, the system uses the current control signal from the interface and the synthesized poses from the previous two frames. Since the runtime computational cost depends on the efficiency of the nearest neighbor search, a data structure called a neighbor graph is implemented, together with an algorithm that accelerates the search by utilizing the marker control signals.

3.1 Nearest Neighbor Construction: In the neighbor graph, each pose of the human body is a node and pose similarity defines the edges; the nodes come from the human motion database {q_n}. The i-th node and the j-th node are connected by an edge only if

    ||q_i - q_j||_1 < 2*epsilon + d * f_m / f_c

where d is the largest L1 distance between two consecutive poses in the database, f_m and f_c are the camera frame rates used for motion capture and the control interface respectively, and epsilon is a specified search radius for nearest neighbors. In the experiments, d is set to 1.75 degrees per joint angle and epsilon to 3 degrees per joint angle.
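The edge condition can be turned into a small construction sketch. This assumes poses stored as rows of joint angles in degrees and uses the average per-joint L1 distance, matching the "degrees per joint angle" units quoted for d and epsilon; the brute-force double loop is for illustration only (a real implementation would precompute this off-line):

```python
import numpy as np

def build_neighbor_graph(poses, d, eps, fm=120.0, fc=60.0):
    """Neighbor-graph construction sketch: connect poses i and j when
    their average per-joint L1 distance is below 2*eps + d*fm/fc.
    poses: (n, dof) array of joint angles in degrees.
    d, eps: degrees per joint angle; fm, fc: camera frame rates."""
    n, dof = poses.shape
    threshold = 2.0 * eps + d * fm / fc
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if np.abs(poses[i] - poses[j]).sum() / dof < threshold:
                edges[i].append(j)
                edges[j].append(i)
    return edges
```

With d = 1.75, eps = 3, fm = 120, and fc = 60, the threshold is 2*3 + 1.75*2 = 9.5 degrees per joint angle.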
Figure (4): 2D example of the fast nearest neighbor search using two dimensions of the neighbor graph for the boxing database. Figure 4(a) shows the data points in the database. In figure 4(b), the circle represents the previous pose, q_(t-1), and the square represents the current pose, q_t. At run time, the algorithm takes the neighbors of the previous frame (blue points), {q_(t-1)^k | k = 1, ..., K}, and uses the precomputed neighbor graph to find the neighbors of those points in the graph (red points). It then searches only the red and blue points to find the nearest neighbors of the current query point. Figure 4(c) shows, in green, the nearest neighbors computed by this algorithm. The cost of this nearest neighbor search is independent of the size of the human motion capture database.

4. Online Motion Synthesis: This section focuses on reconstructing the joint angle values [q_1, ..., q_t] from the low-dimensional control signals [c_1, ..., c_t] obtained from the vision-based interface, using the local linear models. At run time the control signals are transformed into full-body human motion frame by frame. Online motion synthesis minimizes an objective function that reflects an a priori likelihood, the control signals, and the smoothness of the motion. Three energy terms are introduced for this purpose. A human pose prior, E_prior, measures the likelihood of the current pose using the motion capture database; this term ensures that the synthesized motion satisfies the probabilistic distribution of human motions in the database. The control term, E_control, measures the deviation of the marker locations in the reconstructed motion from the control inputs obtained from the vision-based interface. The smoothness term, E_smoothness, minimizes velocity changes in the synthesized motion.
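The three energy terms can be illustrated with a toy quadratic version of the optimization. This sketch assumes a local linear marker model c = A q + b (in the spirit of the PCA-based local model), so the combined objective reduces to a single linear least-squares solve; the weights alpha and beta, and the specific quadratic forms of each term, are hypothetical:

```python
import numpy as np

def synthesize_pose(A, b, c_t, q_prior, q_prev1, q_prev2, alpha=1.0, beta=0.5):
    """Toy per-frame optimization: minimize
      E_prior + alpha * E_control + beta * E_smoothness
    where E_prior = ||q - q_prior||^2 keeps q near a database-derived
    prior pose, E_control = ||A q + b - c_t||^2 matches the observed
    markers under a local linear model, and E_smoothness penalizes
    acceleration, ||q - 2*q_prev1 + q_prev2||^2. Stacking the terms
    gives one least-squares system ||M q - r||^2."""
    dof = q_prior.shape[0]
    I = np.eye(dof)
    M = np.vstack([I,                    # E_prior
                   np.sqrt(alpha) * A,   # E_control
                   np.sqrt(beta) * I])   # E_smoothness
    r = np.concatenate([q_prior,
                        np.sqrt(alpha) * (c_t - b),
                        np.sqrt(beta) * (2 * q_prev1 - q_prev2)])
    q_t, *_ = np.linalg.lstsq(M, r, rcond=None)
    return q_t
```

When all three terms agree (the prior, the markers, and the constant-velocity prediction all point to the same pose), the solver simply returns that pose; the interesting behavior is the trade-off when they conflict.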
5. Results: The effectiveness of the algorithm is tested on different behaviors and different users, using a large and heterogeneous human motion database, and the quality of the synthesized motions is evaluated by comparing them with motion capture data recorded with a full marker set. The system's performance scales well with the size of the database. The online motion synthesis using local models differs from other lazy learning approaches in that the motion is synthesized from low-dimensional continuous control signals. The method is compared against two other popular learning methods, nearest neighbor synthesis and locally weighted regression. The results are shown in video form at the following link: http://graphics.cs.cmu.edu/projects/performanceanimation/jchai_final_siggraph05_mpeg.mov

6. Conclusions: This paper presents an approach to performance animation that uses a series of local models, created from a large and heterogeneous human motion database, to reconstruct full-body human motion from low-dimensional control signals. It demonstrates the power and flexibility of this approach with different users wearing a small set of markers and controlling a variety of behaviors in real time by performing in front of one or two video cameras. The results obtained are comparable in quality to those from a commercial motion capture system, while the performance animation system is far less expensive.

References:
Jinxian Chai and Jessica K. Hodgins. Performance Animation from Low-dimensional Control Signals. In ACM Transactions on Graphics (Proceedings of SIGGRAPH 2005). http://www.cs.cmu.edu/~jchai/papers/jchai_pa.pdf
Yamane, K., Kuffner, J. J., and Hodgins, J. K. 2004. Synthesizing Animations of Human Manipulation Tasks. In ACM Transactions on Graphics 23(3):532-539.
http://portal.acm.org/citation.cfm?coll=guide&dl=guide&id=1015756
http://en.wikipedia.org/wiki/Forward_kinematic_animation
http://en.wikipedia.org/wiki/Computer_animation