Low Cost Motion Capture

R. Budiman, M. Bennamoun, D.Q. Huynh
School of Computer Science and Software Engineering
The University of Western Australia
Crawley WA 6009 AUSTRALIA
Email: budimr01@tartarus.uwa.edu.au, {bennamou,du}@csse.uwa.edu.au

Abstract

Traditionally, computer animation techniques were used to create the movements of an object. Unfortunately, these techniques require much human intervention to work out the different joint angles for each movement. Not only is the task very time-consuming, but the movements created are often unrealistic. Modern motion capture techniques overcome these problems by capturing the actual movements of a performer (e.g. a human being) from the detected positions or angles of sensors or optical markers placed on the subject. Despite its advantages, motion capture has always been considered an expensive technology. In this paper, we describe a low cost motion capture system that uses two low cost webcams. We also present experimental results of the 3D reconstruction of the lower body of a human subject.

Keywords: Motion capture, Mean-shift algorithm, Camera calibration, 3D reconstruction

1 Introduction

Motion capture, or mocap, is a technique for digitally recording the movements of real beings, usually humans or animals. Traditionally, computer animation techniques are used to create the movements of a being. However, these techniques have proven to be time consuming and difficult. Motion capture is considered a better technique for accurately generating movements for computer animation.

There are three types of motion capture techniques [1]. The first is optical motion capture, in which photogrammetry is used to establish the position of an object in 3D space from its observed locations in the 2D fields of view of a number of cameras. The second is magnetic motion capture, where the position and orientation of magnetic sensors are calculated with respect to a transmitter. The last is electro-mechanical motion capture, which involves modelling movements using a body suit with sensors attached.

The need for optical motion capture can be justified by the fact that this technique can cover a large active area and, owing to the light weight of the markers, gives the subject more freedom of movement. Despite these advantages, optical motion capture technologies have been known to be expensive. The high cost is mainly attributable to the hardware components (i.e. high speed cameras).

In this paper, we describe the design and implementation of a low cost optical motion capture system that requires only two low cost calibrated webcams. The system falls under the optical motion capture category, as computer vision techniques are employed to establish the joint positions of a subject. As all motion capture systems involve a tracking phase, we adopt the mean-shift algorithm as the basis of object tracking. While our current system is constrained by several limitations, such as the inability to handle occlusion, it still demonstrates the fundamental idea of motion capture and provides input to animation applications such as Poser [2].

The outline of this paper is as follows. Section 2 gives a brief overview of the mean-shift algorithm. Section 3 describes the hardware components and the setup of our system. Experiments and results are reported in Section 4. Finally, conclusions and future work are given in Section 5.
2 Mean-shift: An overview

The mean-shift algorithm [3, 4, 5] is one of the tracking techniques commonly used in computer vision research when the motion of the object to be tracked cannot be described by a motion model, such as the one required by the Kalman filter [6]. The colour and texture information that characterizes the object can be grouped together to form the feature vector used in the tracking process. The algorithm requires only a small number of iterations to converge and can easily adapt to changes in the scale of the tracked object.

The key notion in the mean-shift algorithm is the definition of a multivariate density function with a kernel function K(x) over a region in the image:

f(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right),

where \{x_i, i = 1, \ldots, n\} is the set of points falling inside a window of radius h centred at x. There are a number of kernel functions that one can choose from; the commonly used ones are the Normal, Uniform, and Epanechnikov kernels. At each iteration, the algorithm produces the mean-shift vector that describes the movement of the region enclosing the tracked target. As the mean-shift vector is defined in terms of the negative gradient of the kernel profile, a kernel function with the simplest profile gradient is preferred. Amongst the commonly used kernel functions above, the Epanechnikov kernel, whose profile has a constant gradient, is preferable to the other two.

Comaniciu et al. [3, 4] formulate the target estimation problem as the derivation of the estimate that maximizes the Bayes error associated with the target model and target candidate distributions. This approach builds on the observation that the larger the probability of error, the more similar the two distributions are. Based on this observation, the Bhattacharyya coefficient [7] is used as the similarity measure between the two distributions.
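To make the update rule concrete, the following is a minimal sketch of one mean-shift iteration on a greyscale frame, with an Epanechnikov-weighted intensity histogram as the density, in the spirit of [3, 4]. It is an illustration under our own assumptions (uint8 images, a square window that stays inside the frame, and function names of our choosing), not the system's actual code.

    import numpy as np

    def epanechnikov_weights(radius):
        # Epanechnikov kernel on a square patch: 1 - r^2 inside the
        # normalized radius, zero outside.
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        r2 = (xs ** 2 + ys ** 2) / float(radius ** 2)
        return np.where(r2 < 1.0, 1.0 - r2, 0.0)

    def weighted_histogram(patch, kernel, m=128):
        # Kernel-weighted intensity histogram: the density estimate
        # restricted to m intensity bins inside the window.
        bins = (patch.astype(np.float64) * m / 256.0).astype(int)
        hist = np.bincount(bins.ravel(), weights=kernel.ravel(), minlength=m)
        return hist / hist.sum()

    def mean_shift_step(frame, center, target_hist, radius=6, m=128):
        # One mean-shift update: move the window centre by the weighted
        # mean of pixel offsets. With the Epanechnikov profile the profile
        # gradient is constant, so the update reduces to a plain weighted
        # average (Comaniciu et al. [3, 4]).
        cy, cx = center
        patch = frame[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
        kernel = epanechnikov_weights(radius)
        cand_hist = weighted_histogram(patch, kernel, m)
        bins = (patch.astype(np.float64) * m / 256.0).astype(int)
        # Bhattacharyya-derived per-pixel weights sqrt(q_u / p_u).
        w = np.sqrt(target_hist[bins] / np.maximum(cand_hist[bins], 1e-12))
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        dy = (w * ys).sum() / w.sum()
        dx = (w * xs).sum() / w.sum()
        return (int(round(cy + dy)), int(round(cx + dx)))

The target histogram target_hist would be built once from the marker's window in the first frame, using the same weighted_histogram routine.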
3 System description

The setup of our motion capture system is intended to be low cost. The necessary pieces of equipment are two low-cost webcams, two tripods, and a calibration frame. The block diagram in Fig. 1 shows all the components of the system; each is described in detail below.

Figure 1: System block diagram.

The system uses two low cost webcams for motion capture. Each webcam, mounted on a tripod, must be calibrated prior to any experiments. The current version of our system focuses on capturing movements of the lower part of the body only. This requires a total of 9 white circular markers to be placed on the following joints (see Fig. 2): the hip (1), the two upper legs (2), the knees (2), the ankles (2), and the feet (2). To simplify the tracking process, we darken the background with a black curtain and instruct the subject to wear a dark, non-glossy, tight suit so that the white circular markers can be easily detected. This requirement is not considered a limitation of the system, as most movie editing systems require the background to be of a certain colour (often blue) for easy segmentation.

Figure 2: (a) Setup of the system. (b) A subject with white circular markers on the lower part of his body.

The two webcams are directly connected to a PC via two USB ports. This allows video images captured by the webcams to be immediately transferred to the PC for processing. We currently use functions from the Matlab Image Acquisition Toolbox for image acquisition; however, equivalent functions from other application software could be used as well.

3.1 Camera calibration

Camera calibration is the step that determines the 3 × 4 matrix mapping coordinates in the 3D world onto the 2D image. The matrix can be recovered linearly via a method commonly referred to as the DLT (Direct Linear Transform) [8, 9], using at least 6 known non-coplanar reference scene points and their corresponding image points. In our system, we use a calibration target with two orthogonal faces, each of which carries 6 reference points. The calibration target also implicitly defines a global coordinate system in the scene that can be referenced by other applications, such as Poser [2], for graphics rendering. Without moving the calibration target, each webcam was calibrated in turn. The calibration process produces two 3 × 4 matrices, one for each webcam.
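As a sketch of the linear recovery step, the DLT stacks two equations per reference point and solves the homogeneous system by SVD. This is a textbook formulation [8, 9] under our own naming, not necessarily the exact routine used in the system.

    import numpy as np

    def dlt_calibrate(world_pts, image_pts):
        # Estimate the 3x4 projection matrix P from at least 6 known
        # non-coplanar world points (X, Y, Z) and their image points (u, v).
        rows = []
        for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
            # Each correspondence gives two linear equations in the 12
            # entries of P, from u = (p1 . X)/(p3 . X) and v = (p2 . X)/(p3 . X).
            rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
            rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
        A = np.asarray(rows, dtype=np.float64)
        # The right singular vector of the smallest singular value is the
        # least-squares solution, up to scale.
        _, _, Vt = np.linalg.svd(A)
        return Vt[-1].reshape(3, 4)

Calibrating each webcam against the same, unmoved target yields the two matrices and fixes the common global coordinate system.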
3.2 Detection of markers

The nine markers are detected via a thresholding process, which involves choosing a threshold value t from the pixel intensity range 0 to 255. Given a threshold t, the system converts all intensity values within a grey scale image that are greater than t into 1, and all intensity values less than t into 0, producing a binary image. Since the scene has been much simplified for marker detection, we can inspect the intensity histogram to compute the threshold value automatically. As expected, the intensity histogram is bi-modal. Consecutive frequency values in the histogram are examined to determine the flat region that separates the two modes, and the threshold value is then estimated from this flat region.
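The paper does not spell out the rule used to locate the flat region, so the sketch below takes a common stand-in: smooth the histogram and pick its lowest point between the two modes. Splitting the intensity range at 128 to find the dark and bright peaks is our assumption for this simplified scene.

    import numpy as np

    def bimodal_threshold(gray):
        # Histogram of a uint8 greyscale frame.
        hist, _ = np.histogram(gray, bins=256, range=(0, 256))
        # Light smoothing so single-bin dips are not mistaken for the valley.
        smooth = np.convolve(hist.astype(np.float64), np.ones(9) / 9.0, mode="same")
        dark_peak = int(np.argmax(smooth[:128]))           # background mode (assumed below 128)
        bright_peak = 128 + int(np.argmax(smooth[128:]))   # marker mode (assumed above 128)
        # Threshold: deepest point of the flat region between the two modes.
        return dark_peak + int(np.argmin(smooth[dark_peak:bright_peak + 1]))

    # t = bimodal_threshold(gray)
    # binary = (gray > t).astype(np.uint8)   # markers map to 1, background to 0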
3.3 Automatic labelling of markers

The 9 markers are automatically labelled using a heuristic method. At the start of each experiment, the subject must adopt the standing pose shown in Fig. 3. After all nine markers have been detected, the system labels the top middle marker as marker #1. In the initial standing pose there are four markers on each leg. Hence, the four markers positioned to the left of marker #1 are labelled markers #2 to #5, and the four markers to the right of marker #1 are labelled markers #6 to #9. The assignment of marker numbers depends on the y component of the marker coordinates: of the four markers on the left side, marker #2 is the one whose y value is smallest among the four. The same labelling rule is applied to the four markers on the right side of marker #1.

3.4 Mean-shift tracking and 3D reconstruction of markers

The mean-shift algorithm is employed to track the nine white markers independently. The system setup described above allows the tracking to be done on grey level images rather than colour images. The feature we use for tracking is therefore simply the pixel intensity values, and the density function is the intensity histogram inside the kernel window. Note that, as the Epanechnikov kernel function assigns higher weights to points near the centre of the kernel, the intensity histogram is computed with these weighting factors incorporated.

There are 3 free parameters that can be set to fine-tune the performance of the mean-shift algorithm:

1. The radius, h, of the kernel window.
2. The threshold value, ε, used for terminating the tracking iteration between consecutive images.
3. The number of histogram bins, 1 < m < 255, for storing the frequencies of pixel intensity values inside the kernel window.

We describe in the following section the values these parameters were set to in our experiments.

For the computation of the 3D coordinates of each marker, the two 3 × 4 matrices obtained above are combined to give 4 linear equations in the detected image coordinates of the marker in the two images. The 3D coordinates of each marker, relative to the implicit global coordinate system defined by the calibration frame, can then be estimated using least-squares.
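One standard way to set up this least-squares step is linear triangulation: each view contributes two of the four equations, and the SVD yields the homogeneous solution. The sketch below is our illustration of that setup, not necessarily the authors' exact formulation.

    import numpy as np

    def triangulate(P1, P2, uv1, uv2):
        # Recover the 3D marker position from its projections (u, v) in the
        # two views, given the two 3x4 calibration matrices P1 and P2.
        def two_rows(P, uv):
            u, v = uv
            # u * (row 3 of P) - (row 1 of P) = 0, and similarly for v.
            return [u * P[2] - P[0], v * P[2] - P[1]]
        A = np.asarray(two_rows(P1, uv1) + two_rows(P2, uv2))
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                  # homogeneous solution, up to scale
        return X[:3] / X[3]         # dehomogenize to (X, Y, Z)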
4 Results

Many experiments have been conducted to test the tracking algorithm and the 3D reconstruction of the markers. We also evaluated the performance of the mean-shift algorithm using different values of the free parameters discussed in Section 3.4 above. In most of our experiments, we found that h = 6 ± 1 pixels, ε = 10⁻⁴, and m = 128 gave the best performance. The result of tracking and 3D reconstruction using these parameters in one of our experiments is presented in Fig. 3.

In every experiment, we tested our system by tracking the movement of the markers over 200 frames. Each webcam took an image in sequence, and mean-shift tracking of the markers was performed on it. It is not possible to synchronize the two webcams on-line in software: we use a single-processor computer, so the execution of instructions has to be interleaved, and the instructions for acquiring images from the two webcams cannot be executed simultaneously. We found that there is a 0.016 second delay between the acquisition of an image by the first webcam and by the second. The human subject can perform small movements from the initial standing position while the system tracks the markers' movements.
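To show how the three parameters interact, a hypothetical driver around the mean_shift_step sketch given after Section 2 could look as follows. Since the window centre there is kept at integer pixel positions, an ε below one pixel simply means "stop when the window no longer moves".

    import numpy as np

    def track_marker(frame, center, target_hist, h=6, eps=1e-4, max_iter=20):
        # Iterate mean-shift updates until the displacement falls below eps.
        for _ in range(max_iter):
            new_center = mean_shift_step(frame, center, target_hist, radius=h)
            shift = np.hypot(new_center[0] - center[0], new_center[1] - center[1])
            center = new_center
            if shift < eps:
                break
        return center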

Figure 3: (a) and (b): tracking results of the mean-shift algorithm on the 9 white markers using an h value of 6 pixels (sequences 1 and 2); (c) and (d): the result of 3D reconstruction (front and side views).

From our experiments, we found that the radius of the kernel window is a crucial parameter for the performance of the mean-shift algorithm. Indeed, it has also been reported in [5] that a window size that is too large can cause the tracker to be more easily distracted by background clutter, while a window size that is too small can cause the kernel to roam around on a likelihood plateau around the mode, leading to poor object localization. In Fig. 4, we show the result of tracking using an h value of 10 pixels in another experiment. We found that this h value is too large to be used as the radius, as it can sometimes encapsulate two markers within one kernel window. As shown in Fig. 4(a) and 4(b), a kernel window that is too large allows a white marker to drift slightly away from the centre of, yet still be enclosed within, the window.

Figure 4: (a) and (b): tracking results of the mean-shift algorithm on the 9 white markers using an h value of 10 pixels (sequences 1 and 2); (c) and (d): the result of 3D reconstruction (front and side views).

5 Conclusion and Future Work

We have presented a low cost motion capture system using two webcams. While the current version of our system only captures the movement of the lower part of the subject's body, it can be extended to include the upper part, allowing full body movements to be animated. The current labelling algorithm can also be modified to cater for initial poses other than the standing one. Furthermore, since the h value is an important parameter of the mean-shift algorithm and affects the overall performance of our motion capture system, instead of relying on human intervention to provide an initial h value, the system could be improved to determine this value automatically and to adapt to scale changes during tracking. The notion of low cost motion capture is important for demonstrating the fundamental idea of motion capture and for providing input to various advanced animation applications.

References

[1] Meta Motion, Motion Capture - What is it?, http://www.metamotion.com/motion-capture/motion-capture.htm, 2004.

[2] e-frontier America, Inc., Poser 6, http://www.efrontier.com/go/poser hpl, 2005.

[3] D. Comaniciu, V. Ramesh, and P. Meer, Real-Time Tracking of Non-Rigid Objects using Mean Shift, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 142–149, 2000.

[4] D. Comaniciu and P. Meer, Mean Shift Analysis and Applications, in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1197–1203, 1999.

[5] R. T. Collins, Mean-Shift Blob Tracking through Scale Space, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 18–20, 2003.

[6] G. Welch and G. Bishop, An Introduction to the Kalman Filter, Tech. Rep. 95-041, Department of Computer Science, University of North Carolina, Chapel Hill, 1995.

[7] T. Kailath, The Divergence and Bhattacharyya Distance Measures in Signal Selection, IEEE Trans. on Communications, vol. 15, pp. 52–60, Feb 1967.

[8] Y. I. Abdel-Aziz and H. M. Karara, Direct Linear Transformation from Comparator to Object Space Coordinates in Close-Range Photogrammetry, in ASP Symposium on Close-Range Photogrammetry (H. Karara, ed.), Urbana, Illinois, pp. 1–18, 1971.

[9] C. C. Slama, C. Theurer, and S. W. Henriksen, eds., Manual of Photogrammetry. Falls Church, Virginia, USA: American Society of Photogrammetry and Remote Sensing, 1980.