Uncalibrated Motion Capture Exploiting Articulated Structure Constraints


International Journal of Computer Vision 51(3), 2003. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Uncalibrated Motion Capture Exploiting Articulated Structure Constraints

DAVID LIEBOWITZ AND STEFAN CARLSSON
Computational Vision and Active Perception Laboratory, Royal Institute of Technology, Sweden

Received March 14, 2001; Revised December 19, 2001; Accepted July 30, 2002

Abstract. We present an algorithm for 3D reconstruction of dynamic articulated structures, such as humans, from uncalibrated multiple views. The reconstruction exploits constraints associated with a dynamic articulated structure, specifically the conservation over time of length between rotational joints. These constraints admit reconstruction of metric structure from at least two different images in each of two uncalibrated parallel projection cameras. As a by-product, the calibration of the cameras can also be computed. The algorithm is based on a stratified approach, starting with affine reconstruction from factorization, followed by rectification to metric structure using the articulated structure constraints. The exploitation of these specific constraints admits reconstruction and self-calibration with fewer feature points and views compared to standard self-calibration. The method is extended to pairs of cameras that are zooming, where calibration of the cameras allows compensation for the changing scale factor in a scaled orthographic camera. Results are presented in the form of stick figures and animated 3D reconstructions using pairs of sequences from broadcast television. The technique shows promise as a means of creating 3D animations of dynamic activities such as sports events.

Keywords: motion capture, calibration, multiple views, 3D reconstruction

1. Introduction

Systems for motion capture, i.e.
dynamic 3D reconstruction of articulated structures such as humans in motion, typically employ a large set of cameras in a laboratory environment in order to maximise the stability of the reconstruction and handle the problem of self-occlusion. These systems are pre-calibrated, so the issue of motion capture is essentially data collection and reconstruction by triangulation. In real world situations such as sports events, on the other hand, the number of cameras viewing a specific situation is often limited to around two or three. The necessity of obtaining both an overview of the situation and close up details further limits the number of cameras useful for motion capture. These cameras are essentially never pre-calibrated, which in itself would present problems since they are often hand-held and moving around. In general these cameras are placed at very different positions in order to get complete coverage. This is beneficial for the problem of multi-view reconstruction since it implies large baselines between cameras. On the other hand it aggravates the problem of occlusion since cameras are essentially looking at different parts of the human body in motion. Considering the fact that events such as team sports naturally imply occlusions due to multiple actors, the limited number of visible features in multiple views can be a serious problem. A system for motion capture from real world sports events therefore has to face the problems of self-calibration and reconstruction from a limited number of views together with limitations on the number of simultaneously visible feature points in multiple views. Camera self-calibration from multiple views of general unconstrained structures requires at least 3 views for perspective cameras (Maybank and Faugeras, 1992). Sporting events are most often monitored by zooming cameras from relatively long distances, which implies that the camera geometry is very close to parallel projection.
This further aggravates the problem of calibration, increasing the minimum number of cameras from 3 to 4 (Quan, 1996). It is therefore critical

to exploit all available scene constraints for calibration and reconstruction.

Figure 1. One view of a subset of selected points from a walking sequence in 3D. The lengths of body segments remain constant over time. This constraint is used for rectification of affine reconstructions from two views.

Human bodies have traditionally been modelled as rigid link articulated structures. Knees, shoulders and elbows are rotational joints connected with (generally) rigid links. The lengths of body segments between rotational joints therefore remain constant over time (Fig. 1). We call this the fixed length or rigid link constraint, and it is the purpose of this paper to demonstrate that this constraint can be exploited in an efficient way in order to obtain metric reconstruction and camera calibration from two affine views of human motion. The method we propose uses a stratified approach for deriving metric from affine structure, computed from two uncalibrated views (Fig. 2), and proceeds as follows:

- Body locations at specific rotational joints are selected interactively from recorded image sequences (see Fig. 3).
- The 3D positions of the dynamic point sets over time are treated as a single static 3D structure. The point coordinates project to each camera as one view, as in Fig. 1. Two cameras will be considered for reconstruction in all cases.
- Affine 3D sequence structure is derived from two uncalibrated cameras.
- The rigid link or fixed body segment length constraint is used to upgrade the affine structure to metric.

Figure 2. Stratified approach to sequence reconstruction and calibration.

At each step, linear methods are used. This greatly simplifies the computations and effectively avoids the pitfalls of non-linear equation solving such as multiple solutions and iterative search. The specific constraints associated with articulated structures and motion capture have received some attention in earlier work.
Much of the research in motion capture has focused on monocular sequences. Leventon and Freeman (1998), for example, take a statistical approach by using a motion capture database to build a probability model for motion sequences. Pose of the body in a novel sequence is then inferred by finding the most probable 3D configuration given the imaged co-ordinates. Taylor (2000) assumes known size body segments imaged by a scaled orthographic camera, and computes 3D orientation of the segments from foreshortening. Webb and Aggarwal (1983) consider the rotation axis of relative joints to be fixed for short times and are able to recover 3D structure from monocular sequences of articulated motion. In Bregler and Malik (1998) the constraint of having a kinematic chain is exploited in a single and multiple view system for automatic motion capture. This uses general minimisation procedures for finding structure and motion. The specific problems of having uncalibrated cameras are not discussed. Due to the general ambiguity in self-calibration from a limited number of views (Quan, 1996), discussed in the next section, we therefore expect this approach to have problems of convergence unless extra constraints are introduced.

Figure 3. Body segments, such as the upper arm between the shoulder and elbow, and the upper leg, between the hip and knee, have a constant length when a person moves.

The paper begins with a brief look at affine cameras and reconstruction in Section 2. Section 3 describes the general affine rectification transformation that defines a metric reconstruction of a scene. This is followed by the calibration constraints that are available from knowledge of internal camera structure in Section 4. Section 5 introduces the relative length constraint, including degenerate configurations, and describes how it is used in motion capture. Finally, Section 6 presents the rigid link constraint applied to motion capture from broadcast video for both fixed and zooming cameras.

2. Affine Cameras and Reconstruction

General affine cameras are characterised by an optic centre at infinity (Mundy and Zisserman, 1992). Using the affine camera model, world points are imaged by parallel projection. The rays through the camera centre, the world points and the image points form a parallel pencil. The imaged co-ordinates of a world point are thus independent of its depth in the scene. The affine camera projects a 3D point X to an image point x via

x = K_a \begin{pmatrix} r_1^T \\ r_2^T \end{pmatrix} X + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = GX + t   (1)

where the camera pose is described by the 3D translation vector t = (t_1, t_2, t_3)^T and rotation matrix R with rows r_1^T, r_2^T and r_3^T. The matrix K_a is associated with the internal parameters of the camera, and has the general form

K_a = \begin{pmatrix} s & k \\ 0 & rs \end{pmatrix}

The parameter s is an isotropic scaling of image co-ordinates. The isotropic scaling can account for the changes in the imaged size of an object as a physical camera zooms or translates. Skew k and aspect ratio r model the skew and stretch of the imaging optic array. Cameras often have square pixels, with zero skew (k = 0) and unit aspect ratio (r = 1).
K_a can then be appropriately simplified and the camera is termed a scaled orthographic camera. This property will be used in Section 6.3. Given two affine views of a scene, world geometry can be reconstructed up to an affine transformation. This result, originating with Koenderink and van Doorn (1991), requires four points matched across views to define an affine basis for reconstruction. Further points may then be assigned 3D co-ordinates with respect to these basis vectors. A minimum of four points is also required to compute the affine fundamental matrix (Hartley and Zisserman, 2000) and thus affine epipolar geometry. The most widely used affine reconstruction method is the factorization algorithm of Tomasi and Kanade (1992). Factorization provides a robust and computationally efficient method of computing affine structure from point correspondences using singular value decomposition. It has also been shown to give a maximum likelihood reconstruction under Gaussian noise assumptions for image points (Reid and Murray, 1996). Using factorization, affine 3D structure is computed by the SVD of a measurement matrix W which, for two views, has the form

W = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ x'_1 & x'_2 & \cdots & x'_n \end{pmatrix}

for n inhomogeneous matched points x_i in the first view and x'_i in the second. The singular value decomposition

of W returns three matrices from which a pair of cameras G and G' as well as 3D points A_i can be extracted. Note that image co-ordinates are normalised in W to place the origin at the centroid of the points in each image. Since in parallel projection the 3D centroid of the point cloud is imaged at the 2D centroid, this centres the reconstruction co-ordinate system at the 3D centroid. In consequence, the translation parameters of the cameras are eliminated in (1), and G and G' are 2 × 3 inhomogeneous projection matrices. Attention in this paper is focused on affine cameras because they are useful to the motion capture application. However, an affine reconstruction can also be obtained from perspective cameras. A scene viewed by two or more uncalibrated perspective cameras can be reconstructed up to an arbitrary projective transformation (Faugeras, 1992). Numerous methods of upgrading the projective reconstruction to affine exist (Pollefeys et al., 1996; Hartley, 1992; Liebowitz and Zisserman, 1999). The relative length constraint described in Section 5 applies to any affine reconstruction.

3. From Affine Reconstruction to Metric Using Scene Constraints

The general stratified approach to scene reconstruction from multiple perspective cameras distinguishes projective, affine and metric reconstructions (Faugeras, 1995). It is assumed here that an affine reconstruction is known, and must be rectified to metric. This section describes the decomposition of the affine transformation into similarity and pure affine components, and the simplification that results. Formally, a world point W and point A in an affine reconstruction are related by an affine transformation of 3D:

W = HA + t   (2)

where H is a full rank 3 × 3 matrix and t is a translation 3-vector. In reconstruction for graphics purposes, similarity transformations may often be neglected. The overall scale, rotation and translation between the world and reconstruction can be chosen arbitrarily.
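The factorization step of Section 2 can be sketched with numpy's SVD. This is a minimal illustration under the rank-3 affine model, not the authors' implementation; the function name is ours.

```python
import numpy as np

def affine_factorize(x1, x2):
    """Affine structure from two views via Tomasi-Kanade factorization.

    x1, x2: (2, n) arrays of matched inhomogeneous image points.
    Returns cameras G (2x3), Gp (2x3) and affine points A (3, n),
    defined up to an unknown 3D affine transformation.
    """
    # Centre each view at its centroid; under parallel projection this
    # places the reconstruction origin at the 3D centroid and removes
    # the translation parameters of the cameras.
    W = np.vstack([x1 - x1.mean(axis=1, keepdims=True),
                   x2 - x2.mean(axis=1, keepdims=True)])   # 4 x n
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Rank-3 truncation: motion (stacked cameras) and shape (points).
    M = U[:, :3] * s[:3]           # 4 x 3
    A = Vt[:3, :]                  # 3 x n affine structure
    return M[:2], M[2:4], A
```

With noisy measurements W is only approximately rank 3, and the truncation gives the maximum likelihood estimate noted above.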
The significant transformation is that which rectifies the reconstruction to within a similarity transformation of the world, a metric reconstruction. To this end, the affine rectifying transformation, which has 12 degrees of freedom, is decomposed into two transformations. First, we neglect the translation vector, and then apply RQ decomposition (Hartley and Zisserman, 2000) to H. H may thus be written as a product of an orthonormal (rotation) matrix S and an upper triangular matrix U:

H = SU   (3)

In addition, U may be regarded as a homogeneous matrix in the sense that its overall scale factor may be ignored. This scale factor is the isotropic scaling of the similarity transformation. U is thus a five degree of freedom purely affine transformation. Determining U and applying it to the affine structure results in metric structure. Any metric constraints on the world co-ordinates will therefore automatically induce constraints on the rectification U. To see this, consider the rectified vector:

W_i = SUA_i + t   (4)

Metric properties of a set of points can always be expressed in terms of inner products:

(W_i - W_j)^T (W_k - W_l) = (A_i - A_j)^T U^T U (A_k - A_l)   (5)

A similarity structure invariant can likewise always be expressed as a ratio of inner products. Any such ratio implies a linear constraint on the matrix U^T U:

(W_i - W_j)^T (W_k - W_l) (A_i' - A_j')^T U^T U (A_k' - A_l') - (W_i' - W_j')^T (W_k' - W_l') (A_i - A_j)^T U^T U (A_k - A_l) = 0   (6)

We will write

Ω = U^T U   (7)

for this matrix. This is closely related to the absolute conic of projective geometry, which has the property that it is invariant to 3D similarity transformations. It thus encodes only the affine distortion of the reconstruction, not the similarity component (Faugeras and Maybank, 1990). From Ω the rectification U can easily be computed by Cholesky decomposition (Kreyszig, 1993). Typical metric constraints involve lengths and angles for point configurations.
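The recovery of U from an estimate of Ω by Cholesky decomposition can be sketched as follows. This is a minimal sketch assuming the estimate is positive definite; numpy's `cholesky` returns a lower-triangular factor, so the upper-triangular U is its transpose.

```python
import numpy as np

def rectification_from_omega(Omega):
    """Upper-triangular rectification U with U^T U proportional to Omega.

    Omega is homogeneous (eq. 7), so its free overall scale is fixed
    here by normalising the (3,3) entry before factorizing.
    """
    Om = Omega / Omega[2, 2]          # fix the free homogeneous scale
    L = np.linalg.cholesky(Om)        # L lower triangular, L L^T = Om
    return L.T                        # upper triangular U
```

Applying the returned U to the affine points then yields metric structure up to an overall similarity, as described above.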
In the following sections we will exploit constraints such as conservation of

relative length over time that are valid for dynamic articulated structures. Internal parameters and auto-calibration constraints may also be included in the computation of the rectification. This is addressed next.

4. Affine Camera Auto-calibration and Known Parameters

Camera matrices in the affine reconstruction are upgraded to metric by U as well, with a metric camera given by

G_M = GU^{-1}

Metric constraints on affine cameras therefore induce linear constraints on the inverse rectification Ω^{-1} in a very similar way as in the previous section. In the original work on factorization (Tomasi and Kanade, 1992), rectification of affine structure relies on the orthographic projection camera model. For orthographic projection, the rows of the metric rectified cameras are orthonormal. This constraint applies equally to scaled orthography: the rows of each rectified camera are orthogonal and of equal length. Writing this for the camera G, a rectified camera is given by

GU^{-1} = \begin{pmatrix} g_1^T \\ g_2^T \end{pmatrix} U^{-1} = \begin{pmatrix} s r_1^T \\ s r_2^T \end{pmatrix}

Hence

g_1^T U^{-1} (g_2^T U^{-1})^T = g_1^T U^{-1} U^{-T} g_2 = g_1^T Ω^{-1} g_2 = 0   (8)

and

g_1^T U^{-1} (g_1^T U^{-1})^T - g_2^T U^{-1} (g_2^T U^{-1})^T = g_1^T Ω^{-1} g_1 - g_2^T Ω^{-1} g_2 = 0   (9)

Both (8) and (9) are linear in the elements of Ω^{-1}. Given three or more cameras, Ω^{-1} can be identified and metric structure obtained. Extensions of this approach to other camera models may be found (Poelman and Kanade, 1994; Shapiro et al., 1994; Weinshall and Tomasi, 1993), but scaled orthography is of most interest here since this model can be used to describe a camera with square pixels at some distance from the object of interest. A method for auto-calibration of a number of affine cameras with unknown but fixed internal parameters is described by Quan (1996).
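Constraints (8) and (9) are linear in the six independent elements of the symmetric matrix Ω^{-1}, and each scaled orthographic camera contributes one row for each. A minimal sketch, with an element ordering chosen here for illustration:

```python
import numpy as np

def sym_coeffs(a, b):
    """Coefficients of a^T M b in the six independent elements
    (m11, m12, m22, m13, m23, m33) of a symmetric 3x3 matrix M."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[1] * b[1],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def scaled_ortho_rows(G):
    """Two linear constraint rows on Omega^{-1} from one scaled
    orthographic camera G (2x3): eq. (8) g1^T Oi g2 = 0 and
    eq. (9) g1^T Oi g1 - g2^T Oi g2 = 0."""
    g1, g2 = G
    return np.vstack([sym_coeffs(g1, g2),
                      sym_coeffs(g1, g1) - sym_coeffs(g2, g2)])
```

Stacking these rows for three or more cameras and extracting the null vector identifies Ω^{-1}, as stated above.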
The auto-calibration constraint assumes unchanging camera parameters, and can be expressed as

G_i Ω^{-1} G_i^T = G_j Ω^{-1} G_j^T   (10)

for all camera pairs i, j. Quadratic constraints on the elements of Ω^{-1} follow from dehomogenising (10) by dividing each side through by one of the entries of the matrix. Each pair of cameras provides two constraints, so a minimum of four cameras is required to solve for Ω^{-1}. With knowledge of some internal parameters or more views, the constraint can be linearised. The fixed camera auto-calibration constraint is not generally applicable to motion capture, since the two cameras viewing the scene are different. The scaled orthographic camera model and constraints (8) and (9), however, will be used.

5. Rectification Constraints from Relative Length

This section describes a constraint on the metric rectification of an affine reconstruction derived from a known ratio of world 3D lengths. It will be shown that the metric length of a line segment in an affine reconstruction can be expressed in terms of Ω, and that this leads to a linear constraint on the elements of Ω from two segments. Degenerate conditions are described and the application of the constraint to motion capture is introduced.

5.1. The Relative Length Constraint

Consider two 3D line segments in an affine reconstruction, segment L_A with endpoints A_1 and A_2, and segment L_B with endpoints B_1 and B_2. The metric lengths of L_A and L_B, l_A and l_B, may be expressed by transforming the endpoints with U and taking inner products:

l_A^2 = (UX_A)^T (UX_A) = X_A^T Ω X_A  and  l_B^2 = (UX_B)^T (UX_B) = X_B^T Ω X_B   (11)

where X_A = A_1 - A_2 and X_B = B_1 - B_2. If the metric length ratio of L_A and L_B is λ, from (11):

l_A^2 = λ^2 l_B^2  ⇒  X_A^T Ω X_A = λ^2 X_B^T Ω X_B   (12)

where Ω is the symmetric 3 × 3 matrix with elements

Ω = \begin{pmatrix} Ω_1 & Ω_2 & Ω_4 \\ Ω_2 & Ω_3 & Ω_5 \\ Ω_4 & Ω_5 & Ω_6 \end{pmatrix}   (13)

Expanding (12) in terms of the elements of Ω, with X_A = (X_{A1}, X_{A2}, X_{A3})^T and X_B = (X_{B1}, X_{B2}, X_{B3})^T:

X_A^T Ω X_A - λ^2 X_B^T Ω X_B = 0

(X_{A1}^2 - λ^2 X_{B1}^2) Ω_1 + 2(X_{A1} X_{A2} - λ^2 X_{B1} X_{B2}) Ω_2 + (X_{A2}^2 - λ^2 X_{B2}^2) Ω_3 + 2(X_{A1} X_{A3} - λ^2 X_{B1} X_{B3}) Ω_4 + 2(X_{A2} X_{A3} - λ^2 X_{B2} X_{B3}) Ω_5 + (X_{A3}^2 - λ^2 X_{B3}^2) Ω_6 = 0

This is a linear equation in the elements of Ω, which we may write in vector form as

c^T Ω_v = 0   (14)

where c = (c_1, ..., c_6)^T is the vector of coefficients and Ω_v = (Ω_1, ..., Ω_6)^T is the vector of elements of Ω. As before, Ω is a homogeneous matrix with five degrees of freedom. Five independent constraints are thus required to solve for Ω. Given five or more independent constraints, the coefficient vectors c_1 to c_n can be combined in a constraint matrix. It follows from (14) that

\begin{pmatrix} c_1^T \\ \vdots \\ c_n^T \end{pmatrix} Ω_v = CΩ_v = 0   (15)

and Ω_v is the null vector of C, which is of rank five. In the presence of noise and with more than five constraints, C will generally be of rank six. In this case, singular value decomposition (SVD) provides an estimate of the null vector. The singular vector associated with the smallest singular value is the estimate of the null vector of C that minimises ||CΩ_v||. There are also degenerate configurations of line segments which do not yield independent constraints. These are examined in the following section.

5.2. Degeneracies

Constraint degeneracy occurs when a particular constraint fails to provide information about Ω, or when a set of constraints are linearly dependent in (14). Three such conditions are enumerated below.

5.2.1. Parallel Line Segments. Length ratio in parallel directions is an affine invariant, so parallel line segments in an affine reconstruction preserve their metric length ratio. It is therefore unsurprising that no constraint on Ω is obtained from the relative length of parallel segments.

Figure 4. Parallel line segments provide no constraint on Ω.
Explicitly, if L_A and L_B are parallel with length ratio λ, X_A = λX_B (see Fig. 4) and (12) becomes

X_A^T Ω X_A - λ^2 X_B^T Ω X_B = λ^2 X_B^T Ω X_B - λ^2 X_B^T Ω X_B = 0

and is satisfied for all Ω. The elements of the coefficient vector c will all be zero.

5.2.2. Pairs of Parallel Line Segments. Two constraints obtained from the relative lengths of segment pairs that are parallel are not independent.

Figure 5. Parallel pairs of line segments provide one constraint on Ω.

Suppose we have L_A and L_B with metric length ratio λ, and L_C and L_D with ratio κ, shown in Fig. 5. We can write two

linear constraints on Ω:

X_A^T Ω X_A - λ^2 X_B^T Ω X_B = 0
X_C^T Ω X_C - κ^2 X_D^T Ω X_D = 0

However, if L_A is parallel to L_C and L_B is parallel to L_D, X_C = αX_A and X_D = βX_B. Additionally, for parallel directions length ratios are preserved, so α/β = κ/λ. The pair of constraints becomes

X_A^T Ω X_A - λ^2 X_B^T Ω X_B = 0
α^2 X_A^T Ω X_A - α^2 λ^2 X_B^T Ω X_B = 0

These two equations are linearly dependent and only one constraint on Ω is obtained.

5.2.3. Co-Planar or Parallel Plane Line Segments. Any number of constraints originating with coplanar line segments, or line segments in parallel planes, as in Fig. 6, provide only two independent constraints on Ω. Geometrically, the constraints fully define metric structure on a plane (Liebowitz and Zisserman, 1998), and indeed on the pencil of parallel planes, but provide no information about other directions.

Figure 6. Coplanar or parallel plane line segments provide at most two constraints on Ω.

As always with degenerate conditions, care must be taken in implementing the constraints. Theoretically degenerate constraints may appear to provide valid constraints due to measurement noise, and near degenerate conditions provide unstable results.

5.3. The Rigid Link Constraint

The relative length constraint can be applied in practice without quantitative knowledge of length ratios. This is the case when the qualitative observation can be made that objects of equal length are present in a scene, determining a unit length ratio constraint. Accordingly, human motion is treated below as the movement of an articulated, rigid link structure. Two cases will be considered (Fig. 7):

1. two fixed general affine cameras with different parameters observing an entire motion sequence, and
2. n pairs of scaled orthographic cameras with changing parameters, each imaging one frame of the motion.
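The assembly of the linear system (14)-(15) from segment pairs with known (or unit) length ratios, and the SVD estimate of its null vector, can be sketched as follows. A minimal numpy sketch; function names are ours, and degenerate configurations as in Section 5.2 must be avoided when choosing the input segments.

```python
import numpy as np

def length_ratio_row(XA, XB, lam=1.0):
    """Coefficient vector c of the linear constraint (14) on
    Omega_v = (O1,...,O6) from segment direction vectors XA, XB with
    metric length ratio lam: XA^T O XA - lam^2 XB^T O XB = 0."""
    def quad(X):
        return np.array([X[0]**2, 2*X[0]*X[1], X[1]**2,
                         2*X[0]*X[2], 2*X[1]*X[2], X[2]**2])
    return quad(XA) - lam**2 * quad(XB)

def solve_omega(C):
    """Least-squares null vector of the stacked constraint matrix C,
    reshaped into the symmetric matrix Omega of eq. (13)."""
    _, _, Vt = np.linalg.svd(C)
    o = Vt[-1]                       # singular vector, smallest value
    return np.array([[o[0], o[1], o[3]],
                     [o[1], o[2], o[4]],
                     [o[3], o[4], o[5]]])
```

The estimated Ω is homogeneous, so it is determined only up to scale (and possibly sign); normalising by one of its entries fixes both before Cholesky decomposition.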
In the first case, sequences captured by a pair of unchanging cameras are equivalent to two images of the motion. Each view is regarded as having viewed the entire set of sampled postures from the motion in one image. The postures are treated as a static scene, requiring a single reconstruction and rectification transformation. Considered as a static 3D object, the set of postures then contains a number of fixed length objects: each body segment, such as the upper arm, remains a constant length through the set. This is the rigid link constraint. When the cameras are changing, however, the postures cannot be regarded as a static set in a single pair of images. Each frame must be reconstructed and rectified separately. In this case, rigid links are used for each reconstruction as well as between reconstructions to constrain all the rectifications.

6. Motion Capture

The application of the rigid link constraint to motion capture is shown here in the context of broadcast footage of sports. In sports broadcasts, interesting events are typically shown from a second camera in the replay, providing two views of the action. The cameras at sports events are also often placed far from the participants, at the edge of the field or mounted on the stands, and use long focal lengths. The affine camera model is thus applicable, and an affine reconstruction can be created directly from matched points. The following sections describe the manual matching process, reconstruction and calibration steps used to rectify this reconstruction using rigid links. The initial method described assumes a pair of fixed cameras. This is a somewhat naive assumption for sports broadcasts, since cameras often pan and zoom while following a player. Despite this, however, a fixed camera assumption is valid in some situations, and can

also result in realistic animation even when violated. This is followed by a method of jointly calibrating a number of pairs of cameras using rigid links. It is shown that, using the scaled orthography camera model, the change in scale factor resulting from a zooming camera can be eliminated, although rotation between cameras remains unknown.

Figure 7. Factorization and rectification for (1) fixed cameras and (2) changing cameras.

6.1. Affine Reconstruction

The first step in affine reconstruction from two views is to select the corresponding image points defining the body parts of interest. A 15 point model of the human skeleton is used, specifying the dominant joints: shoulders, elbows, hips, knees and ankles, as well as the tips of the feet and the head or neck. An example, two views of a triplejump event, appears in Fig. 8, with the skeleton points shown in two images. This footage is captured by a pair of cameras that are moving with the athlete (probably on tracks). The net effect is that the athlete remains in approximately the same position in the images throughout the sequence. The cameras are thus treated as a pair of fixed affine cameras viewing a motion as if on a treadmill.

Figure 8. The triplejump. Frames 13, 26, 30 and 51 of a 93 frame sequence, shown in side view above and front view below. Manually selected joints are shown in the first frame of each set.

Translation of the athlete is lost, but the relative movement of the body parts can be reconstructed. The skeleton points are selected by hand in the images, frame by frame. Two difficulties typically occur. The first is occlusion, either self-occlusion of joints as a body moves, or occlusion from other people or objects in the scene. Self-occlusion is relatively benign, since the position of the occluded joint can often be inferred (by a human operator) from posture and the other view. Occlusion by other objects can be a severe problem, depending on the extent. A number of football players clustered in a goal area, for example, can obscure a significant part of the player of interest. The second difficulty is image dropout. Sequences are digitised from interlaced broadcast video, and are of poor quality. Rapidly moving features, like the hands, often smear in motion blur, making precise joint location difficult. All these features are present in the triplejump sequence. The athlete's hands move very fast and blur considerably in the side view, and the left shoulder, arm and hip are self-occluded in much of that sequence. Humans are, however, good at inferring the position of self-occluded joints, so reasonable guesses can be made about the occluded co-ordinates. Motion blur remains a source of inaccuracy of the order of a few pixels, but this is unavoidable given the nature of the data. Uncertain points in either view are flagged in the clicking process. Having selected the corresponding point pairs, an initial affine reconstruction is computed using the factorization algorithm. Points flagged as uncertain in the selection process can be neglected from this reconstruction.
The affine fundamental matrix can then be computed from the cameras and used to recompute the uncertain points by projecting them to the epipolar line defined by the fundamental matrix and the corresponding point in the other view (Shapiro et al., 1995). The system thus tolerates points uncertain in one view, but clearly visible in the other. An affine reconstruction of the triplejump sequence is depicted in Fig. 9.

Figure 9. Four postures from the affine reconstruction of the triplejump sequence. The same four postures are shown from three viewpoints. Compare to the corresponding frames in Fig. 8.

Wireframe models for four frames corresponding to Fig. 8 are shown, with horizontal

translation introduced. Note the effect of affine distortion, particularly on the aspect ratio of the body. The affine reconstruction can now be metric rectified using the rigid link constraint.

6.2. Metric Rectification

The rigid link constraint is present in two forms in human motion. The first is the constant length of body segments (the upper and lower arms and legs, and the distance between the hip joints) through the motion. There are nine constraints on Ω from each frame pair for these particular segments. The second source of constraints is symmetry, the equal length of left and right arm and leg segments, providing four constraints per frame. Constraints are obtained from both these sources using (12) with a length ratio of 1, resulting in a constraint matrix of the form in (15). In principle, two frames are sufficient to over-constrain Ω, but for interesting motions there are typically tens or hundreds of frames, so Ω is significantly over-constrained. Note also that, in applying (12), the rigid link constraints between frames can be written for the reconstruction of a single body segment between any two postures. There are, for a single segment appearing in n frames, n(n-1)/2 constraints, of which n - 1 are independent. The approach taken here is to write the n - 1 constraints for each segment in successive pairs of frames in time. SVD of the constraint matrix and Cholesky decomposition of the resulting estimate of Ω yield a rectification matrix U, and metric 3D structure follows. Metric rectified structure for the triplejump example appears in Fig. 10. The final step in reconstructing the motion is regularization of segment sizes. Since there is noise in the system, the metric humanoid figure does not have exactly fixed length body segments throughout the reconstructed sequence. The model for each frame of the sequence is thus replaced by a fixed size humanoid with median segment lengths while preserving the angles of body segments.
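The bookkeeping for these two constraint sources, fixed segment length over successive frames and left/right symmetry within each frame, can be sketched as follows. The segment names, joint indices and the reduced segment set are illustrative, not the paper's actual 15 point model.

```python
# Hypothetical subset of a skeleton model: segment name -> joint index pair.
SEGMENTS = {"upper_arm_l": (1, 3), "upper_arm_r": (2, 4),
            "lower_arm_l": (3, 5), "lower_arm_r": (4, 6),
            "hip_width":   (7, 8)}

# Left/right segment pairs assumed to have equal metric length.
SYMMETRY = [("upper_arm_l", "upper_arm_r"), ("lower_arm_l", "lower_arm_r")]

def rigid_link_pairs(n_frames):
    """Enumerate unit-length-ratio constraint pairs ((segment, frame),
    (segment, frame)): each segment keeps its length in successive
    frames, and symmetric segments match within each frame."""
    pairs = []
    for name in SEGMENTS:            # fixed length over time (n-1 each)
        pairs += [((name, t), (name, t + 1)) for t in range(n_frames - 1)]
    for left, right in SYMMETRY:     # symmetry, one per frame
        pairs += [((left, t), (right, t)) for t in range(n_frames)]
    return pairs
```

Each enumerated pair would then contribute one row of the form (14) with λ = 1, built from the segments' affine endpoint differences.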
The results appear in Fig. 11. Introducing the hard constraint of same length in this ad hoc way gives a reasonable appearance of the 3D motion. This should eventually be replaced by a bundle adjustment process involving a minimisation of reprojection errors imposing the fixed length constraints.

Figure 10. Four postures from the metric reconstruction of the triplejump sequence. The same four postures are shown from three viewpoints. Compare to the corresponding frames in Fig. 9.

Figure 11. The reconstruction of the triplejump sequence, with fixed length segments fitted to body parts.
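The median-length regularization described above might be sketched as follows, under the simplifying assumption that the segments form a kinematic chain processed root-first, so that rescaling a parent segment repositions its children consistently. This is our sketch of the idea, not the authors' code.

```python
import numpy as np

def regularize_lengths(joints, segments):
    """Fix each segment to its median length while preserving direction.

    joints: (n_frames, n_joints, 3) reconstructed joint positions.
    segments: list of (parent, child) joint index pairs, ordered so
    every child's parent has already been repositioned (root first).
    """
    out = joints.copy()
    for (p, c) in segments:
        vecs = joints[:, c] - joints[:, p]          # per-frame segment
        lens = np.linalg.norm(vecs, axis=1)
        med = np.median(lens)                       # target length
        # Rescale each frame's segment to the median length, keeping
        # its 3D direction (and hence the joint angles) unchanged.
        out[:, c] = out[:, p] + vecs * (med / lens)[:, None]
    return out
```

As the surrounding text notes, this hard projection is ad hoc; a bundle adjustment minimising reprojection error under the fixed length constraints would be the principled alternative.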

Figure 12. Five frames from the triplejump, equally spaced in time. In each row, frames from the two input image sequences appear alongside snapshots of an animated generic biped from various viewpoints.

Stick figure representations of motion have limited usefulness on their own. A solid skeletal model provides a far more informative representation of the motion. Ultimately, of course, the movement can be mapped to a realistic, textured humanoid model. Accordingly, the motion capture data is exported to a format understood by a commercial animation package, the Character Studio plugin to 3D Studio Max.

Figure 13. Three images from the animated VRML model.

Character Studio is able to take 3D positional motion capture data and convert it to a joint rotation hierarchy using an inverse kinematic system. The motion is mapped onto a generic biped figure, allowing further processing including manual editing, frame decimation and animation of a more complex mesh figure. The generic biped triple jumper appears in Fig. 12. The motion data has also been applied to a generic VRML mesh based humanoid, shown in Fig. 13. The humanoid was created using Blaxxun Avatar Studio, a product allowing limited customization of generic humanoid meshes for use in online chat spaces, with a simple set of animated gestures. The result is a low polygon mesh suitable for online use, although with limited mobility.

The quality of the reconstruction of motion is difficult to evaluate. Clearly, the accuracy of calibrated, marker based, multi-camera systems cannot be replicated. However, when viewed as an animation, the motion appears to be a faithful reproduction of the action. The animation is smooth and duplicates not only the coarse motion of the arms and legs, but also captures some of the subtle elements of the movement, such as the asymmetry in the hip and shoulder positions through the jump.

Note that degeneracy plays an important role in human motion capture. A pure translation of limbs, for example, leaves them parallel, leading to degenerate constraints. This might occur for a person walking with the arms held in a fixed position, holding something. Furthermore, when walking or running in a straight line, human arms and legs tend to move roughly in a pair of parallel vertical planes. The calibration constraints from rigid links in the arms and legs provide little information outside of these planes.
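These degeneracies can be checked numerically. The sketch below uses a linear parametrisation of d^T Ω d in the six elements of Ω (the ordering is an assumption for illustration): a purely translated segment contributes a zero constraint row, and segments confined to one plane leave Ω under-determined even up to scale:

```python
import numpy as np

def quad_coeffs(d):
    """Coefficients of d^T Omega d in the six parameters
    (O11, O12, O13, O22, O23, O33) of the symmetric matrix Omega."""
    x, y, z = d
    return np.array([x * x, 2 * x * y, 2 * x * z, y * y, 2 * y * z, z * z])

# A purely translating limb keeps its direction vector, so the
# equal-length constraint row it contributes vanishes identically.
d = np.array([1.0, 0.2, 0.4])
assert np.allclose(quad_coeffs(d) - quad_coeffs(d), 0.0)

# Limbs swinging in a single vertical plane (z = 0) generate a constraint
# matrix of rank < 5, so the six-parameter Omega is not determined
# even up to an overall scale.
angles = (0.1, 0.7, 1.3, 2.1)
dirs = [np.array([np.cos(a), np.sin(a), 0.0]) for a in angles]
rows = np.array([quad_coeffs(p) - quad_coeffs(q) for p in dirs for q in dirs])
assert np.linalg.matrix_rank(rows) < 5
```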
The use of the constant distance between the hip joints is thus important (the shoulders tend to be too independently mobile to provide a similar constraint).

Changing Cameras

Given two sequences acquired with changing cameras, the matched points in the pair of images at each time instant should, strictly speaking, be reconstructed separately. The affine reconstructions obtained from each frame pair are related to the world by a general affine transformation, including a similarity component. There is thus an arbitrary rotation, translation and scaling, in addition to the affine distortion encoded by the rectification, relating the metric rectifications of each frame. It is, however, still possible to write rigid link constraints on the rectification of each reconstruction. These are constraints on the rectifications for each reconstruction, denoted Ω^(j), j = 1,...,m, for a pair of sequences of length m.

Before going on to describe the constraints, it is worthwhile considering how the fixed camera assumption is violated. The cameras filming sports events like football commonly zoom and pan (rotate) to follow a player or the ball. In many situations, only a limited rotation is needed to keep the player in the field of view, since the camera is some distance away. There is, however, a significant translation in the image co-ordinates of body points. The effect of zooming is to increase the imaged size of the player. Consider the example of Fig. 14, which shows four frames from a sequence of a football goal. The affine, metric and final reconstructions assuming fixed cameras appear in Figs. 15, 16 and 17. Observe that in the initial affine reconstruction there is, in addition to the affine distortion, an overall increase in body size. This is due to the increase in zoom in the two sequences over time. Also, since there is some rotation of the cameras, a rotation of the body from posture to posture is introduced.
It is interesting to note that, despite this, the final motion reconstruction appears to be quite accurate. This is because, while there is considerable zoom in the sequence, the rotation is small. Zoom introduces errors

Figure 14. A football goal. Frames 1, 20, 30 and 40 of a 42 frame sequence, shown in two views.

Figure 15. Four postures from the affine reconstruction of the sequences in Fig. 14, assuming fixed cameras.

Figure 16. Metric reconstruction of the football sequence assuming fixed cameras.

Figure 17. Fixed length body segment reconstruction of the football sequence.

Figure 18. Scale factors of the two (scaled orthographic) cameras filming the football sequence. The scales are normalised to the first frame of camera 1.

in scale, which are ameliorated both by the application of the rigid link constraint to successive frames rather than frames widely spaced in time, and by the regularization of structure to fixed length segments. Nevertheless, it will now be shown how the effect of zooming can be eliminated from the images. The approach taken is to jointly calibrate all the pairs of different cameras up to a common global scale factor. Regarding the cameras as scaled orthographic, the camera scale factors are then known, and the image points can be scaled accordingly, compensating for the changing zoom. The effects of panning cannot be dealt with in this way, since the orientation between different pairs of cameras is not recovered; this would require some static scene object to be visible.

In the changing camera case, the rigid link constraint can be applied in two contexts. Firstly, the symmetry constraints, that the right and left arm and leg segments are the same length, are valid within each reconstruction. There are thus at most four constraints on each Ω^(j) of the form (14). Secondly, the rigid link constraint may be applied across reconstructions:

(X^j)^\top \Omega^{(j)} X^j - (X^k)^\top \Omega^{(k)} X^k = 0

where X^j and X^k represent the same rigid body segment in reconstructions j and k.
This leads to a linear constraint on the elements of both Ω^(j) and Ω^(k):

(X_1^j)^2 \Omega_1^{(j)} + 2 X_1^j X_2^j \Omega_2^{(j)} + 2 X_1^j X_3^j \Omega_3^{(j)} + (X_2^j)^2 \Omega_4^{(j)} + 2 X_2^j X_3^j \Omega_5^{(j)} + (X_3^j)^2 \Omega_6^{(j)}
  - (X_1^k)^2 \Omega_1^{(k)} - 2 X_1^k X_2^k \Omega_2^{(k)} - 2 X_1^k X_3^k \Omega_3^{(k)} - (X_2^k)^2 \Omega_4^{(k)} - 2 X_2^k X_3^k \Omega_5^{(k)} - (X_3^k)^2 \Omega_6^{(k)} = 0   (16)

The constraint can be written in vector form by constructing a vector of the elements of all the Ω^(j)s,

\hat{\Omega}_v = (\Omega_1^{(1)}, \Omega_2^{(1)}, \Omega_3^{(1)}, \Omega_4^{(1)}, \ldots, \Omega_4^{(m)}, \Omega_5^{(m)}, \Omega_6^{(m)})^\top

and writing the coefficients of these elements in (16) as

c^{(jk)} = (0, \ldots, 0, c_1^{(j)}, c_2^{(j)}, \ldots, c_6^{(j)}, 0, \ldots, 0, -c_1^{(k)}, -c_2^{(k)}, \ldots, -c_6^{(k)}, 0, \ldots)^\top

Figure 19. Zoom elimination in images from frames 1, 30 and 40 from the second camera in the football example. The selected image points for frame 1 appear on the left. In the middle and on the right (for frames 30 and 40) are large figures which are plots of the selected image points in those frames, and smaller figures obtained by scaling according to the recovered camera parameters.

The linear constraint in vector notation is then

(c^{(jk)})^\top \hat{\Omega}_v = 0

Up to nine constraints of this form can be written for each pair of reconstructions, from the arms, legs and hips. Each Ω^(j) is over-constrained, and the rectification of each reconstruction can be computed from the SVD of a constraint matrix C whose rows are the c^{(jk)} vectors:

C \hat{\Omega}_v = 0

Note that the Ω^(j)s computed are not homogeneous matrices in this context. The set of rectifications computed is such that the relative lengths of segments in the metric reconstructions are the same, and thus includes a scale factor for each reconstruction. In fact it is the vector of elements of all the rectifications in the set, Ω̂_v, that is homogeneous. The common scale of this vector is a global scale factor applicable to all the reconstructions.

In practice, since there are far fewer constraints on each rectification than is the case for the single rectification with fixed cameras, it is useful to introduce camera parameters. Assuming that the cameras may be modelled as scaled orthographic, four equations on the dual of each rectification, (Ω^(j))^{-1}, can be obtained from (8) and (9) for each camera pair. These constraints are linear, but since they are in the inverse rectification, they cannot simply be combined with the rigid link constraints. A solution is to use an iterative method to minimize a cost function over all the rectification affinities U^(j).
The cost function has two parts:

C = C_{struc} + C_{cam}

The term C_cam expresses the orthogonality and equal scale of the rows of the two cameras in each rectified frame pair,

G_M^{(j)} = (g_{M1}^j, g_{M2}^j)^\top = G^{(j)} (U^{(j)})^{-1} \quad and \quad \bar{G}_M^{(j)} = (\bar{g}_{M1}^j, \bar{g}_{M2}^j)^\top = \bar{G}^{(j)} (U^{(j)})^{-1}

and is thus

C_{cam} = \sum_{j=1}^{m} ( \|g_{M1}^j\| - \|g_{M2}^j\| )^2 + ( \|\bar{g}_{M1}^j\| - \|\bar{g}_{M2}^j\| )^2 + ( (g_{M1}^j)^\top g_{M2}^j )^2 + ( (\bar{g}_{M1}^j)^\top \bar{g}_{M2}^j )^2

The term C_struc is composed of terms of the form

( \|U^{(j)} X^j\| - \|U^{(k)} X^k\| )^2

for all applicable rigid links in the structure. The minimization can be initialized by the linear solution for Ω̂_v, or from the fixed camera method.

The set of resulting U^(j)s returns m metric reconstructions with a common scale factor. However, since each initial affine reconstruction is a general affine transformation away from the world structure, each has its own unknown Euclidean rectification component. That is, there remains a set of unknown rotations and translations between the metric reconstructions of individual frame pairs. But, since the scaling is global to all the reconstructions, the relative scaling between pairs of cameras is known. The effect of a zooming camera can thus be eliminated by scaling the image points in each sequence relative to the first camera. Assuming negligible rotation of the cameras through the sequence, it is possible to return to the fixed camera method having removed the effect of zooming, and thus compute a metric reconstruction of the entire motion.

Returning to the football example, consider Fig. 18, a plot of the scale factors of the two cameras calibrated as above. The cameras appear to be zooming in unison, maintaining a fixed ratio between their scale factors. (This suggests that there is global control of the zoom on the cameras filming the match.) The selected image points can now be scaled proportionately to negate the effect of the changing scale factor, as in Fig. 19. Applying the fixed camera algorithm to the scaled image data yields the results in Figs. 20-22.

Figure 20. Affine reconstruction of the football sequence after zoom elimination. Note that the large scale increase seen in Fig. 15 is gone.
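The zoom compensation described above can be sketched as follows, assuming the rectifications U^(j) and the 2x3 affine cameras G^(j) have been recovered; `camera_scale` and `eliminate_zoom` are illustrative names, and the centroid subtraction in the factorization is taken to absorb the scaled translations:

```python
import numpy as np

def camera_scale(G, U):
    """Scale factor of a scaled-orthographic camera after rectification.
    G: 2x3 affine camera; U: 3x3 rectifying transform of its reconstruction.
    After rectification the two camera rows have equal norm, which is the
    scale factor; the mean guards against residual noise."""
    GM = G @ np.linalg.inv(U)
    return 0.5 * (np.linalg.norm(GM[0]) + np.linalg.norm(GM[1]))

def eliminate_zoom(points, scales):
    """points: list of 2xN image point arrays, one per frame;
    scales: recovered camera scale factor per frame.
    Rescales every frame's points to the zoom of the first frame."""
    s0 = scales[0]
    return [(s0 / s) * p for p, s in zip(points, scales)]
```

Once the per-frame scales are removed, the image sequences can be fed back into the fixed-camera pipeline as the text describes.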

Figure 21. Final reconstruction of the football sequence.

Figure 22. Six frames from an animation of the biped representation of the football example.

7. Conclusion

This paper has introduced a new constraint on the rectification of affine reconstructions from a known metric ratio of lengths of a pair of line segments. The constraint has been successfully applied to motion capture by modelling a moving human as an articulated rigid link structure. Examples of reconstructed motion from uncalibrated broadcast footage of sports events have been presented, with both fixed and changing cameras.

A number of extensions are possible. As a system, the motion capture approach would benefit greatly from automated tracking to replace, or complement, the manual selection of joints. Fully automated tracking is unlikely to be completely successful given the quality of some digitised broadcast video. However, even partial tracking in conjunction with operator interaction will significantly speed up the process. The rigid link constraint could also be applied to marker data, where reflective markers are attached to the body to facilitate tracking.

A partial solution to the occlusion problem can be found when a section of semi-occluded frames can be omitted from a sequence, while the body can still be reconstructed from the remaining frames. An occluded point, being a known distance from an unoccluded 3D point, must lie on a sphere centred on the unoccluded point. If the occluded point is visible in one view, the back-projected ray defined by the image point intersects the sphere in two points, one of which is the occluded 3D point. Future work will incorporate this feature in the algorithm.
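The occlusion geometry in the last paragraph reduces to intersecting a back-projected ray with a sphere. A minimal sketch follows; the ray origin and direction, and the choice between the two returned candidates (e.g. by temporal continuity), are left to the surrounding system:

```python
import numpy as np

def ray_sphere(o, d, c, r):
    """Intersect the ray o + t*d (t real) with the sphere |X - c| = r.
    Returns the list of intersection points (0, 1 or 2 of them)."""
    d = d / np.linalg.norm(d)
    oc = o - c
    # Substituting o + t*d into |X - c|^2 = r^2 gives the quadratic
    # t^2 + 2*b*t + (|oc|^2 - r^2) = 0 with b = oc . d.
    b = oc @ d
    disc = b * b - (oc @ oc - r * r)
    if disc < 0:
        return []                      # ray misses the sphere
    s = np.sqrt(disc)
    return [o + t * d for t in (-b - s, -b + s)]
```

For an occluded joint, c is the neighbouring unoccluded 3D joint, r the known segment length, and the ray is back-projected from the single view in which the joint is visible.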

Finally, bundle adjustment is desirable to fit a truly rigid linked model to the image data. This is a complex non-linear optimisation problem, also currently under investigation.


More information

3D reconstruction class 11

3D reconstruction class 11 3D reconstruction class 11 Multiple View Geometry Comp 290-089 Marc Pollefeys Multiple View Geometry course schedule (subject to change) Jan. 7, 9 Intro & motivation Projective 2D Geometry Jan. 14, 16

More information

3D Modeling for Capturing Human Motion from Monocular Video

3D Modeling for Capturing Human Motion from Monocular Video 3D Modeling for Capturing Human Motion from Monocular Video 1 Weilun Lao, 1 Jungong Han 1,2 Peter H.N. de With 1 Eindhoven University of Technology 2 LogicaCMG Netherlands P.O. Box 513 P.O. Box 7089 5600MB

More information

Week 2: Two-View Geometry. Padua Summer 08 Frank Dellaert

Week 2: Two-View Geometry. Padua Summer 08 Frank Dellaert Week 2: Two-View Geometry Padua Summer 08 Frank Dellaert Mosaicking Outline 2D Transformation Hierarchy RANSAC Triangulation of 3D Points Cameras Triangulation via SVD Automatic Correspondence Essential

More information

Projective Rectification from the Fundamental Matrix

Projective Rectification from the Fundamental Matrix Projective Rectification from the Fundamental Matrix John Mallon Paul F. Whelan Vision Systems Group, Dublin City University, Dublin 9, Ireland Abstract This paper describes a direct, self-contained method

More information

Flexible Calibration of a Portable Structured Light System through Surface Plane

Flexible Calibration of a Portable Structured Light System through Surface Plane Vol. 34, No. 11 ACTA AUTOMATICA SINICA November, 2008 Flexible Calibration of a Portable Structured Light System through Surface Plane GAO Wei 1 WANG Liang 1 HU Zhan-Yi 1 Abstract For a portable structured

More information

Euclidean Reconstruction from Constant Intrinsic Parameters

Euclidean Reconstruction from Constant Intrinsic Parameters uclidean Reconstruction from Constant ntrinsic Parameters nders Heyden, Kalle Åström Dept of Mathematics, Lund University Box 118, S-221 00 Lund, Sweden email: heyden@maths.lth.se, kalle@maths.lth.se bstract

More information

Passive 3D Photography

Passive 3D Photography SIGGRAPH 2000 Course on 3D Photography Passive 3D Photography Steve Seitz Carnegie Mellon University University of Washington http://www.cs cs.cmu.edu/~ /~seitz Visual Cues Shading Merle Norman Cosmetics,

More information

3D Model Acquisition by Tracking 2D Wireframes

3D Model Acquisition by Tracking 2D Wireframes 3D Model Acquisition by Tracking 2D Wireframes M. Brown, T. Drummond and R. Cipolla {96mab twd20 cipolla}@eng.cam.ac.uk Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK Abstract

More information

A Case Against Kruppa s Equations for Camera Self-Calibration

A Case Against Kruppa s Equations for Camera Self-Calibration EXTENDED VERSION OF: ICIP - IEEE INTERNATIONAL CONFERENCE ON IMAGE PRO- CESSING, CHICAGO, ILLINOIS, PP. 172-175, OCTOBER 1998. A Case Against Kruppa s Equations for Camera Self-Calibration Peter Sturm

More information

Hand-Eye Calibration from Image Derivatives

Hand-Eye Calibration from Image Derivatives Hand-Eye Calibration from Image Derivatives Abstract In this paper it is shown how to perform hand-eye calibration using only the normal flow field and knowledge about the motion of the hand. The proposed

More information

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision What Happened Last Time? Human 3D perception (3D cinema) Computational stereo Intuitive explanation of what is meant by disparity Stereo matching

More information

Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds

Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds Marta Wilczkowiak Edmond Boyer Peter Sturm Movi Gravir Inria Rhône-Alpes, 655 Avenue de l Europe, 3833 Montbonnot, France

More information

Camera calibration. Robotic vision. Ville Kyrki

Camera calibration. Robotic vision. Ville Kyrki Camera calibration Robotic vision 19.1.2017 Where are we? Images, imaging Image enhancement Feature extraction and matching Image-based tracking Camera models and calibration Pose estimation Motion analysis

More information

Camera Calibration from the Quasi-affine Invariance of Two Parallel Circles

Camera Calibration from the Quasi-affine Invariance of Two Parallel Circles Camera Calibration from the Quasi-affine Invariance of Two Parallel Circles Yihong Wu, Haijiang Zhu, Zhanyi Hu, and Fuchao Wu National Laboratory of Pattern Recognition, Institute of Automation, Chinese

More information

Reminder: Lecture 20: The Eight-Point Algorithm. Essential/Fundamental Matrix. E/F Matrix Summary. Computing F. Computing F from Point Matches

Reminder: Lecture 20: The Eight-Point Algorithm. Essential/Fundamental Matrix. E/F Matrix Summary. Computing F. Computing F from Point Matches Reminder: Lecture 20: The Eight-Point Algorithm F = -0.00310695-0.0025646 2.96584-0.028094-0.00771621 56.3813 13.1905-29.2007-9999.79 Readings T&V 7.3 and 7.4 Essential/Fundamental Matrix E/F Matrix Summary

More information

Structure from Motion

Structure from Motion Structure from Motion Lecture-13 Moving Light Display 1 Shape from Motion Problem Given optical flow or point correspondences, compute 3-D motion (translation and rotation) and shape (depth). 2 S. Ullman

More information

Affine Surface Reconstruction By Purposive Viewpoint Control

Affine Surface Reconstruction By Purposive Viewpoint Control Affine Surface Reconstruction By Purposive Viewpoint Control Kiriakos N. Kutulakos kyros@cs.rochester.edu Department of Computer Sciences University of Rochester Rochester, NY 14627-0226 USA Abstract We

More information

CSE 252B: Computer Vision II

CSE 252B: Computer Vision II CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribe: Sameer Agarwal LECTURE 1 Image Formation 1.1. The geometry of image formation We begin by considering the process of image formation when a

More information

Multiple View Geometry

Multiple View Geometry Multiple View Geometry CS 6320, Spring 2013 Guest Lecture Marcel Prastawa adapted from Pollefeys, Shah, and Zisserman Single view computer vision Projective actions of cameras Camera callibration Photometric

More information

Model-Based Human Motion Capture from Monocular Video Sequences

Model-Based Human Motion Capture from Monocular Video Sequences Model-Based Human Motion Capture from Monocular Video Sequences Jihun Park 1, Sangho Park 2, and J.K. Aggarwal 2 1 Department of Computer Engineering Hongik University Seoul, Korea jhpark@hongik.ac.kr

More information

CSE 252B: Computer Vision II

CSE 252B: Computer Vision II CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribe: Haowei Liu LECTURE 16 Structure from Motion from Tracked Points 16.1. Introduction In the last lecture we learned how to track point features

More information

Combining Two-view Constraints For Motion Estimation

Combining Two-view Constraints For Motion Estimation ombining Two-view onstraints For Motion Estimation Venu Madhav Govindu Somewhere in India venu@narmada.org Abstract In this paper we describe two methods for estimating the motion parameters of an image

More information

C / 35. C18 Computer Vision. David Murray. dwm/courses/4cv.

C / 35. C18 Computer Vision. David Murray.   dwm/courses/4cv. C18 2015 1 / 35 C18 Computer Vision David Murray david.murray@eng.ox.ac.uk www.robots.ox.ac.uk/ dwm/courses/4cv Michaelmas 2015 C18 2015 2 / 35 Computer Vision: This time... 1. Introduction; imaging geometry;

More information

Multiple View Geometry in Computer Vision

Multiple View Geometry in Computer Vision Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 Structure Computation Lecture 18 March 22, 2005 2 3D Reconstruction The goal of 3D reconstruction

More information

A Stratified Approach for Camera Calibration Using Spheres

A Stratified Approach for Camera Calibration Using Spheres IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. Y, MONTH YEAR 1 A Stratified Approach for Camera Calibration Using Spheres Kwan-Yee K. Wong, Member, IEEE, Guoqiang Zhang, Student-Member, IEEE and Zhihu

More information

Zoom control to compensate camera translation within a robot egomotion estimation approach

Zoom control to compensate camera translation within a robot egomotion estimation approach Zoom control to compensate camera translation within a robot egomotion estimation approach Guillem Alenyà and Carme Torras Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens i Artigas 4-6,

More information

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies M. Lourakis, S. Tzurbakis, A. Argyros, S. Orphanoudakis Computer Vision and Robotics Lab (CVRL) Institute of

More information

Combining Off- and On-line Calibration of a Digital Camera

Combining Off- and On-line Calibration of a Digital Camera Combining ff- and n-line Calibration of a Digital Camera Magdalena Urbanek, Radu Horaud and Peter Sturm INRIA Rhône-Alpes 655, avenue de l Europe, 38334 Saint Ismier Cedex, France {MagdalenaUrbanek, RaduHoraud,

More information

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet:

Local qualitative shape from stereo. without detailed correspondence. Extended Abstract. Shimon Edelman. Internet: Local qualitative shape from stereo without detailed correspondence Extended Abstract Shimon Edelman Center for Biological Information Processing MIT E25-201, Cambridge MA 02139 Internet: edelman@ai.mit.edu

More information

CS 664 Structure and Motion. Daniel Huttenlocher

CS 664 Structure and Motion. Daniel Huttenlocher CS 664 Structure and Motion Daniel Huttenlocher Determining 3D Structure Consider set of 3D points X j seen by set of cameras with projection matrices P i Given only image coordinates x ij of each point

More information

Lecture 14: Basic Multi-View Geometry

Lecture 14: Basic Multi-View Geometry Lecture 14: Basic Multi-View Geometry Stereo If I needed to find out how far point is away from me, I could use triangulation and two views scene point image plane optical center (Graphic from Khurram

More information

55:148 Digital Image Processing Chapter 11 3D Vision, Geometry

55:148 Digital Image Processing Chapter 11 3D Vision, Geometry 55:148 Digital Image Processing Chapter 11 3D Vision, Geometry Topics: Basics of projective geometry Points and hyperplanes in projective space Homography Estimating homography from point correspondence

More information

Recovering structure from a single view Pinhole perspective projection

Recovering structure from a single view Pinhole perspective projection EPIPOLAR GEOMETRY The slides are from several sources through James Hays (Brown); Silvio Savarese (U. of Michigan); Svetlana Lazebnik (U. Illinois); Bill Freeman and Antonio Torralba (MIT), including their

More information

A Method for Interactive 3D Reconstruction of Piecewise Planar Objects from Single Images

A Method for Interactive 3D Reconstruction of Piecewise Planar Objects from Single Images A Method for Interactive 3D Reconstruction of Piecewise Planar Objects from Single Images Peter F Sturm and Stephen J Maybank Computational Vision Group, Department of Computer Science The University of

More information

Rectification for Any Epipolar Geometry

Rectification for Any Epipolar Geometry Rectification for Any Epipolar Geometry Daniel Oram Advanced Interfaces Group Department of Computer Science University of Manchester Mancester, M13, UK oramd@cs.man.ac.uk Abstract This paper proposes

More information

Two-View Geometry (Course 23, Lecture D)

Two-View Geometry (Course 23, Lecture D) Two-View Geometry (Course 23, Lecture D) Jana Kosecka Department of Computer Science George Mason University http://www.cs.gmu.edu/~kosecka General Formulation Given two views of the scene recover the

More information

Lecture 9: Epipolar Geometry

Lecture 9: Epipolar Geometry Lecture 9: Epipolar Geometry Professor Fei Fei Li Stanford Vision Lab 1 What we will learn today? Why is stereo useful? Epipolar constraints Essential and fundamental matrix Estimating F (Problem Set 2

More information

Invariance of l and the Conic Dual to Circular Points C

Invariance of l and the Conic Dual to Circular Points C Invariance of l and the Conic Dual to Circular Points C [ ] A t l = (0, 0, 1) is preserved under H = v iff H is an affinity: w [ ] l H l H A l l v 0 [ t 0 v! = = w w] 0 0 v = 0 1 1 C = diag(1, 1, 0) is

More information

Mei Han Takeo Kanade. January Carnegie Mellon University. Pittsburgh, PA Abstract

Mei Han Takeo Kanade. January Carnegie Mellon University. Pittsburgh, PA Abstract Scene Reconstruction from Multiple Uncalibrated Views Mei Han Takeo Kanade January 000 CMU-RI-TR-00-09 The Robotics Institute Carnegie Mellon University Pittsburgh, PA 1513 Abstract We describe a factorization-based

More information

3D Reconstruction with two Calibrated Cameras

3D Reconstruction with two Calibrated Cameras 3D Reconstruction with two Calibrated Cameras Carlo Tomasi The standard reference frame for a camera C is a right-handed Cartesian frame with its origin at the center of projection of C, its positive Z

More information

Stereo CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz

Stereo CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz Stereo CSE 576 Ali Farhadi Several slides from Larry Zitnick and Steve Seitz Why do we perceive depth? What do humans use as depth cues? Motion Convergence When watching an object close to us, our eyes

More information

CHAPTER 3. Single-view Geometry. 1. Consequences of Projection

CHAPTER 3. Single-view Geometry. 1. Consequences of Projection CHAPTER 3 Single-view Geometry When we open an eye or take a photograph, we see only a flattened, two-dimensional projection of the physical underlying scene. The consequences are numerous and startling.

More information

Module 4F12: Computer Vision and Robotics Solutions to Examples Paper 2

Module 4F12: Computer Vision and Robotics Solutions to Examples Paper 2 Engineering Tripos Part IIB FOURTH YEAR Module 4F2: Computer Vision and Robotics Solutions to Examples Paper 2. Perspective projection and vanishing points (a) Consider a line in 3D space, defined in camera-centered

More information

Critical Motion Sequences for the Self-Calibration of Cameras and Stereo Systems with Variable Focal Length

Critical Motion Sequences for the Self-Calibration of Cameras and Stereo Systems with Variable Focal Length Critical Motion Sequences for the Self-Calibration of Cameras and Stereo Systems with Variable Focal Length Peter F Sturm Computational Vision Group, Department of Computer Science The University of Reading,

More information

Exponential Maps for Computer Vision

Exponential Maps for Computer Vision Exponential Maps for Computer Vision Nick Birnie School of Informatics University of Edinburgh 1 Introduction In computer vision, the exponential map is the natural generalisation of the ordinary exponential

More information

Articulated Structure from Motion through Ellipsoid Fitting

Articulated Structure from Motion through Ellipsoid Fitting Int'l Conf. IP, Comp. Vision, and Pattern Recognition IPCV'15 179 Articulated Structure from Motion through Ellipsoid Fitting Peter Boyi Zhang, and Yeung Sam Hung Department of Electrical and Electronic

More information

NON-RIGID 3D SHAPE RECOVERY USING STEREO FACTORIZATION

NON-RIGID 3D SHAPE RECOVERY USING STEREO FACTORIZATION NON-IGID D SHAPE ECOVEY USING STEEO FACTOIZATION Alessio Del Bue ourdes Agapito Department of Computer Science Queen Mary, University of ondon ondon, E1 NS, UK falessio,lourdesg@dcsqmulacuk ABSTACT In

More information