Line Based Estimation of Object Space Geometry and Camera Motion


Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Panu Srestasathiern, B.E., M.S.

Graduate Program in Geodetic Science
The Ohio State University
2012

Dissertation Committee:
Alper Yilmaz, Advisor
Alan Saalfeld
Ralph von Frese

Copyright © by Panu Srestasathiern 2012

Abstract

In this dissertation, two problems of 3D structure and camera motion recovery are addressed. The first is the 3D reconstruction problem using multiple images; in particular, the estimation of 3D lines from multiple views is researched. The second is line-based bundle adjustment, for which a novel cost function is proposed. For line-based 3D structure and camera motion recovery, the first problem is the 3D line estimation, which provides an initial solution for the bundle adjustment process. To facilitate this, I represent the 3D line by its Plücker coordinates. A typical requirement of this representation is the use of the Plücker constraint. I advance the state of the art by waiving the Plücker constraint and propose two streamlined solutions to the 3D line estimation problem. The first proposed 3D line estimation model is based on the preservation of coincidence in the dual projective space. The second method is based on averaging a set of 3D lines generated by intersecting the back-projection planes from multiple images viewing the estimated 3D line. The second component of my proposal is a new bundle adjustment model. More precisely, a new line-based cost function that defines a geometric error in the object space is proposed. The proposed cost function is derived by using the equivalence between the image plane and the unit Gaussian sphere centered at the optical center of the camera. In particular, the geometric error is defined as the integrated squared distance between the projection plane of a 3D line estimate and the points on the perimeter of the circular sector equivalent to the image of the 3D line estimate.

This is dedicated to my family.

Acknowledgments

First of all, I would like to give my sincere appreciation to my advisor, Dr. Alper Yilmaz, for his encouragement throughout this research and all of the time and energy he spent in meeting with me and discussing my research. Also, I would like to thank Dr. Alan Saalfeld and Dr. Ralph von Frese for serving on my dissertation defense committee. I would like to thank the faculty in the Department of Civil and Environmental Engineering and Geodetic Science and the Department of Earth Science for making The Ohio State University one of the best places to study geodetic science. It is a great honor to join the geodetic science program at The Ohio State University. I would like to thank the members of the Photogrammetric Computer Vision Laboratory and all of my friends at The Ohio State University for making the time I have studied here one of the best times in my life. My appreciation also goes to the Thai Government, whose support gave me the most valuable chance to pursue the Ph.D. degree at The Ohio State University. To my parents: your encouragement, love and support have guided me through this process. To my sister: thank you for having good conversations with me every weekend.

Vita

February: Born in Bangkok, Thailand
May: B.Eng. Electrical Engineering, Chulalongkorn University, Bangkok, Thailand
December: M.S. Geodetic Sciences, The Ohio State University, Columbus, OH, USA

Publications

Research Publications

P. Srestasathiern and A. Yilmaz. View invariant object recognition. In ICPR, pages 1-4.
P. Srestasathiern and A. Yilmaz. Planar shape representation and matching under projective transformation. Comput. Vis. Image Underst., 115(11), November.
G. Barsai, A. Yilmaz, S. Nagarajan, and P. Srestasathiern. Registration of Images to LIDAR and GIS Data without Establishing Explicit Correspondences. Photogrammetric Engineering & Remote Sensing, submitted (2nd round of revision).
P. Srestasathiern and A. Yilmaz. A Line Based Approach to Recovering the 3D Scene Geometry and Camera Motion. IEEE Trans. Image Process., submitted.

Fields of Study

Major Field: Geodetic Science and Surveying

Studies in:
Digital photogrammetry
Multiple view geometry
Mathematical image analysis

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   Scope of dissertation
   Organization of the dissertation
2. Background
   Projective geometry
      Representation of geometric entities
      Pinhole camera model
      The equivalence between the image plane and the Gaussian sphere
      Duality principle
   Riemannian manifold
   Non-linear mean shift on Riemannian manifold
      Ordinary mean shift
      Non-linear mean shift
3. 3D line estimation
   Literature review
   Overview of the proposed methods
   3D line triangulation by incidence property in dual space
   3D line triangulation by averaging approach
      Effect of scaling on 3D line representation
      Generating 3D line samples
      3D line averaging by averaging in special orthogonal groups
4. Line based Bundle Adjustment
   Literature review
   Overview of this chapter
   Geometric error in object space
   Parameterization of camera motion and 3D line
      Parameterization of camera motion
      Parameterization of 3D line
   Numerical optimization
5. Experimental results
   Performance evaluation
   Experiments on synthetic data
   Experiment on real data
      The model house sequence
      The book sequence
      Fish eye camera
6. Conclusion
Bibliography

List of Tables

2.1 The comparison between operators in vector spaces and manifolds
5.1 Comparative results from the experiment with the model house sequence
5.2 Comparative results from the experiment with the book sequence

List of Figures

2.1 A 3D line L can be represented by its Plücker coordinates L = [L_h^T L_o^T]^T, which is a 6-vector. The first 3-vector, L_h, is the direction of the line and the second is the moment of the line, which is the normal vector of the plane containing the origin and the line.
2.2 Back-projection planes from a stereo pair intersect at a line in space.
2.3 A scene point X is projected to an image point x.
2.4 The back-projection plane from the image line l. It is also known as the interpretation plane.
2.5 The mapping between the Gaussian sphere S^2 and the 2D projective space P^2. A line l on P^2 is equivalent to a great circle with normal vector N on the Gaussian sphere. If a point x on the image line l is equivalent to a point V on the Gaussian sphere, the point V must be on the great circle equivalent to the image line l.
2.6 A line segment on the image plane is equivalent to a circular arc, which is a segment of the circumference of the great circle equivalent to the line passing through x_1 and x_2.
2.7 The dual entity of a point x in 2D projective space is a line x* in the dual projective space, and vice versa. A proposition also has a dual. This figure demonstrates the dual of collinearity, which is intersection in the dual projective space. Three collinear points on the line l become three concurrent lines in the dual projective space, where the intersection point is the dual of the line l.
2.8 Mapping between manifold and tangent space. The logarithm operator log_x maps a point y on the manifold to a point on the tangent space T_x, while the exponential operator exp_x maps the point back onto the manifold. Note that these two operators depend on the point x on the manifold.
2.9 The non-linear mean shift on a manifold. To compute the mean shift vector, all data points are projected to the tangent space of the current mode estimate. On the tangent space, the mean shift vector is computed. The updated mode estimate is obtained by projecting the mean shift vector back to the manifold.
3.1 3D line estimation using two views. The location of the estimated line is not well constrained because the back-projection planes from two views always intersect in space.
3.2 Estimated 3D lines on the Klein quadric denoting the proposed final error. Red denotes the estimated 3D line, and other colors denote pairwise intersections of back-projected lines. (a) Initial solution, (b) final solution.
3.3 The dual projective space. Given two planes intersecting at a line L, the dual elements of these two planes are incident with the dual line L*.
3.4 The plane containing the origin of the object space coordinate system and the 3D line L intersects the back-projection plane at the 3D line. As a result, the cross product between the moment L_o and the normal of the back-projection plane is parallel to the line direction L_h.
3.5 Given points in R^2 that all lie on a 1-D manifold, i.e., a circular arc, the average of these points computed by the barycentric mean does not lie on the circular arc.
4.1 An image line feature can be represented by a raw edge, which is a sequence of points. A cost function for bundle adjustment is then defined as a geometric relation between an image point and its corresponding 3D line, e.g., the collinearity equation.
4.2 The proposed object space error is defined as the integrated squared distance between the projection plane and points on the boundary of the sector formed by O, V_1 and V_2. Points V_1 and V_2 are obtained by mapping the endpoints x_1 and x_2 of an image line segment onto the Gaussian sphere's surface, where the line is the image of the 3D line L. The projection plane of the 3D line L has normal vector M in the camera coordinate system.
4.3 A 3D line under a change of coordinate system from object space to the camera coordinate system. The moment of the 3D line in the camera coordinate system, L_o, becomes the normal of the projection plane.
4.4 Local parameterization of the parameter manifold. Each point on the manifold has a Euclidean structure. The current estimate is updated locally such that the solution is still on the manifold and minimizes the cost function g.
5.1 The similarity measure between the re-projection of the estimated line and its corresponding line segment is the orthogonal distance from the line segment endpoints to the re-projected line.
5.2 Camera setup for the wide baseline case.
5.3 Experiment result on the wide baseline case.
5.4 Camera setup for the short baseline case.
5.5 Experiment result on the short baseline case.
5.6 Two images from the model house sequence overlaid with image lines (yellow dashed lines) and re-projected line estimates (black solid lines).
5.7 3D line reconstruction and estimated camera motions from the proposed method after bundle adjustment. 5.7a shows the 3D structure and camera poses. 5.7b shows the side view of the model house.
5.8 Reconstructed 3D scene viewed from two different viewpoints.
5.9 Top view of the reconstructed model house sequence from different methods.
5.10 Reprojection of estimated 3D lines onto two sample images of the book sequence. The manually detected image lines are plotted as yellow dashed lines and re-projected line estimates as black solid lines.
5.11 Zoom on an area of an image from the book sequence overlaid with image lines (yellow dashed lines) and re-projected line estimates (black solid lines). The result from the initial 3D line triangulation is improved by the bundle adjustment process.
5.12 The reconstructed 3D scene and camera poses.
5.13 Top and front views of the reconstructed scene.
5.14 Reconstructed 3D scene viewed from different angles.
5.15 Re-projection of the estimated lines to the image. The re-projected lines are shown as yellow solid lines.
5.16 Reconstructed 3D scene and camera poses. Figure 5.16b shows the side view of the scene.
5.17 Top view of the reconstructed calibration pattern.
5.18 The re-projection of initial and adjusted results. The re-projected lines are shown as yellow solid lines.

Chapter 1: Introduction

A main objective in the photogrammetry and computer vision fields is to recover the scene geometry and the camera motions. Broadly speaking, the goal is to infer 3D information from 2D data, i.e., a set of images of a static or moving scene. The determination of camera motions and 3D scene reconstruction can also be applied to other applications. For example, in video augmentation, the known camera motion and 3D scene provide the ability to place an artificial object in the reference frame. Another application is orthophoto generation, for which the camera motion recovery is crucial. In robotics, the recovery of camera motions and 3D scene geometry is used to simultaneously build a map of the environment and localize the robot within that environment. The 3D scene recovery deals with the estimation of unknown geometric entities, such as points, while the recovery of camera motion deals with the estimation of the optical centers and orientations of cameras in an object coordinate system. Broadly speaking, 3D structure and camera motion recovery can be solved by the two following approaches:

- solving the camera motions first and then estimating the 3D structure, or
- solving the 3D structure and the camera motions simultaneously.

The cameras used for the 3D information recovery can be either non-calibrated or calibrated. When a camera is calibrated, its interior orientation is known. The interior orientation of a camera refers to the geometric transformation between the 3D image coordinate system and the camera's perspective center. In particular, the interior orientation of a digital perspective camera consists of the position of the image center, the scaling factors for row and column pixels, and the skew factor. These parameters, which are referred to as the calibration parameters, are compactly embedded in a calibration matrix. Lens distortions such as the radial and tangential distortion can be taken into account in order to obtain a more accurate 3D information recovery. In order to obtain 3D metric information on both the recovered 3D structure and the camera motion, cameras are usually assumed to be calibrated. In other words, non-calibrated cameras only give a solution projectively equivalent to the metric one. However, this does not rule out obtaining a metric recovery from non-calibrated cameras. A technique called auto-calibration can be performed to upgrade the solution from projective to metric [1, page 272]. Auto-calibration is the process of directly estimating the calibration parameters from images of un-calibrated cameras with no or few assumptions on the scene structure. Moreover, while traditional camera calibration methods require a special calibration pattern or object, auto-calibration methods do not. After the calibration matrix is obtained, both the camera motion and the scene geometry can be upgraded to metric space. In terms of the input data, traditionally, most 3D scene and camera motion recovery methods use point features [2, 3, 4]. For example, in photogrammetric activities, the collinearity equation is predominantly used. The model is formulated based on the collinearity of rays emanating from the optical center of the camera and passing through an

image point and its corresponding 3D point. In computer vision, the linear camera model is originally formulated for point projection. The point based approach in both fields is very popular due to the fact that the geometric notion of a point is well defined. However, the disadvantage of the point based approach is that point features can be easily occluded. This problem can be addressed by using higher order features such as lines and conic sections. In a typical 3D recovery framework, the camera motions should first be estimated. The estimation of the camera motions is known as the pose estimation problem. The motion can be defined either as the displacement of cameras with respect to a global coordinate system (absolute orientation), or as the relative motion between the cameras (relative orientation). A simple setup for the relative motion recovery is to fix a local coordinate system to the coordinate system of a selected camera. For example, let us consider an image sequence. The origin of the local coordinate system can be set to coincide with the projection center of the first image. The orientation of the local coordinate system's axes is then the same as that of the first image's coordinate system. In some applications, the reference to the global coordinate system is not necessary, so that only the relative orientation is required. Moreover, in some scenarios, only the relative orientation can be recovered because no or insufficient control information is available. With the emergence of the Global Positioning System (GPS) and Inertial Navigation System (INS), the absolute orientation of the camera viewpoints with respect to the world coordinate system can be obtained directly. This approach is called direct orientation or direct georeferencing. The direct orientation approach has been popular, especially in photogrammetry. The principle of this approach is to attach a navigation unit to the imaging system. The GPS is used to determine the optical center of the camera at each

exposure by establishing pseudoranges from at least four satellites. The position, orientation and velocity are obtained from the INS. The INS motion information can also facilitate the interpolation of position information because the positioning information from GPS is discrete. The main limitation of this approach is its susceptibility to jammed GPS signals and GPS-denied environments. An alternative approach, which is called indirect orientation in photogrammetry, can be used to recover the absolute orientation with respect to a specific local or global coordinate system. This approach requires the availability of control information or features in that coordinate system. For example, ground control points from a field survey can be used to compute camera poses. In an urban scene, most architectural buildings have 3 orthogonal line pairs. Their images are used to compute vanishing points in 3 directions, which are subsequently used to estimate the displacement of a camera's viewpoint [5]. Alternatively, the absolute orientation of camera viewpoints can be obtained by first determining the relative orientation and then transforming the reference frame defined for the relative orientation to the object coordinate system. Moreover, without control information in 3D object space, the relative orientation is used instead. The most fundamental problem in computing the relative motion is stereo vision [6]. Since it is a versatile method for the computation of scene geometry and most similar to the biological visual system, it has been one of the most studied and ongoing research topics in computer vision, photogrammetry, computer graphics and robotics. Moreover, it can be used as a fundamental building block for the computation of multiple camera motions. A well-known geometric description of the stereo view is the epipolar geometry. The expression of the epipolar geometry contains the translation and rotation between the two camera coordinate systems. In computer vision, the algebraic expression describing the epipolar

geometry is called the fundamental matrix, which is also referred to as the essential matrix when the cameras are calibrated [7]. Precisely, the fundamental matrix expresses the relation between a point on one image and its conjugate point on the second view. One of the main disadvantages of stereo based vision is that some scene features can be occluded and there may be some degenerate or critical configurations. In other words, estimating the camera motions using more views reduces the chance of encountering a critical configuration. An important critical condition, called a degenerate configuration, must be avoided in the estimation of camera motion because the motion cannot be recovered uniquely. For example, let us consider the trifocal tensor or trilinear constraint [8], which is a projective geometric relationship describing the relative motion between three viewpoints. Namely, the relative motions between 3 cameras are compactly embedded in a tensor, which is a cube of numbers. While the degenerate configuration for 3 views is just the case where all image features used to estimate the camera motion arise from a common plane in 3D, some degenerate configurations for two views are the cases where the 3D points corresponding to the image points lie on a quadric surface passing through the two optical centers, or the image points lie on a common plane or a line. Another advantage of using more than 2 views is the ability to transfer geometric entities from one view to another. After the motions of the cameras are determined, the next step is to recover the 3D structure. This process is also known as triangulation or structure from motion [9, 10]. Given the images of geometric entities in space taken with cameras of known pose, the goal is to find the unknown geometric entities, e.g., points, lines, planes or curves in space. For instance, point reconstruction is the simplest case of the reconstruction problem. The basic principle of point triangulation is to find the intersection of converging lines in space. Such lines are obtained by back-projecting the matched points on the images to object space.
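To make the triangulation principle concrete, the following is a minimal two-view linear (DLT-style) triangulation sketch; the function name and the way the two rows per view are formed are illustrative assumptions, not code from this dissertation.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linearly triangulate a 3D point from two views.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : matching image points as (u, v) pairs.
    Each view contributes two rows of the homogeneous system A X = 0; the
    least-squares solution is the right singular vector of A associated with
    the smallest singular value, i.e., the classic DLT triangulation.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

With noise-free data the four rows are consistent and the back-projected rays intersect exactly; with noisy data the same solution gives an algebraic least-squares compromise, which is the situation discussed next.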

In the noiseless case, finding the intersection of converging lines in space is trivial, with no significant difficulties. In contrast, in the presence of noise, the back-projection lines generally do not intersect in space. Under an assumed noise model, the triangulation problem then becomes finding the point that most likely represents the point of intersection. To choose the most likely point of intersection [9], using calibrated cameras is less difficult than using uncalibrated ones. Precisely, with a calibrated camera the reference frame is already Euclidean, and metric information such as distance or angle is meaningful and can be used to define the triangulation model. In contrast, when cameras are not calibrated, the camera motions are in a projective reference frame; the notions of distance and angle have no meaning. Therefore, most reconstruction methods using uncalibrated cameras use the concept of re-projection error, which is invariant to projective transformations of the reference frame. The basic camera setup for 3D structure reconstruction is the stereo view, where two viewpoints see the same scene or have an overlapping area in the image. However, using only two views is limited to the reconstruction of points in space. By using more views, higher order geometric entities such as lines, planes, conic sections or algebraic curves can be reconstructed. In other words, higher order entities have more degrees of freedom, which requires more redundancy to estimate; using more views increases the redundancy in the computation. Moreover, using more views can lead to greater accuracy in the reconstruction because of the added stability from multiple views and the stricter constraints on the position of the reconstructed entities in space. Additionally, redundancy contributes to the elimination of error in the reconstruction. Another approach to recovering the camera motions and 3D scene geometry is to estimate them simultaneously. This approach itself can be classified into linear and

non-linear methods. Both of these methods locally minimize a cost function. A well studied linear method for recovering camera motion and 3D scene geometry is factorization [2, 11], which is a popular approach in computer vision and robotics. The principle of the factorization method is to factorize the observation matrix, whose elements are matching image points, into the matrix of camera motions, which are embedded in the camera projection matrices, and the matrix of the 3D scene geometry, such as the 3D points. The advantage of this approach compared to the non-linear approach is that it waives the requirement of an initial solution. In spite of this advantage, the factorization method is limited to the case when the reconstructed points are visible in all images. Moreover, the minimized cost function is not geometrically meaningful and is usually non-linear. The most common non-linear method for recovering both 3D structure and camera motion is bundle adjustment [12, 4, 3]. It was first developed in photogrammetry and has been widely used elsewhere, e.g., in computer vision and robotics. The concept of this method is to adjust the bundle of rays between each 3D point and the optical centers of the cameras viewing that point. In contrast to the linear approach, it does not require any 3D point to be visible in all images. The process minimizes a non-linear cost function by iteratively solving its linearized version. Since the iterative computation requires a starting estimate, this initial solution can, for instance, be obtained from the factorization approach. Since its objective is to determine 3D geometric features, camera poses and sometimes camera calibrations, the bundle adjustment becomes a large sparse optimization problem. The complexity of the bundle adjustment depends on the parameterization of the 3D geometric entities and camera poses, and also on the constraints that must be taken into account in order to get a valid estimate. For example, the orthogonality constraint needs to be imposed in order to get a valid rotation matrix, or some additional constraints may need to be taken into

account when higher order geometric entities are estimated. This is also an advantage of the bundle adjustment because it allows a wide range of parameterizations. Moreover, the choices of the camera's projection model and of the cost function are not limited. Although, in terms of the computational model, the bundle adjustment problem is classically formulated as a non-linear least squares model assuming a quadratic cost function, it has been adapted to a variety of models, e.g., non-quadratic or robust cost functions. Thus, bundle adjustment has been shown to be very versatile and applicable in a wide range of applications.

1.1 Scope of dissertation

In traditional photogrammetry and computer vision activities, most algorithms for 3D scene and camera motion recovery are developed based on point features. Precisely, point matches between images are primarily used as input data. In some applications, using higher order features is more suitable than using points. For example, in urban scenes, most meaningful features are higher order features, e.g., straight lines or conics. Using such higher order features can alleviate the occlusion problem that is very critical in point based techniques. Many researchers have investigated the feasibility of using lines in computer vision and photogrammetry. In this dissertation, two problems in structure and camera motion recovery using line features are addressed; in particular, the 3D line estimation and the line-based cost function for bundle adjustment are studied. The aim of this research is to develop new methods for 3D line estimation and a novel cost function for bundle adjustment. For the first problem, either relative or absolute camera motions are assumed to be known ahead of time. The initial camera motion can be obtained from existing linear algorithms or, alternatively, from an INS solution. Two possible methods for estimating a

3D line, which are not limited to either calibrated or un-calibrated cameras, are introduced. The proposed methods estimate a 3D line by using the back-projection planes of image lines. The first method estimates a 3D line by minimizing a cost function based on the duality property of the projective space. Specifically, in the dual projective space, the duals of back-projection planes are points, while the duals of estimated lines are lines. Since the coincidence property is preserved, it is hypothesized that the dual of the optimal line estimate is coincident with the duals of the back-projection planes. Namely, the dual of the optimal line estimate minimizes the orthogonal distances to the duals of the back-projection planes. Since the estimated line is represented by Plücker coordinates, the Plücker constraint traditionally needs to be imposed in the estimation model. However, by formulating the cost function in the dual projective space, the requirement of the Plücker constraint can be waived. The second line estimation method is based on a non-parametric statistical method, namely the mean shift method. It is a non-parametric method for finding the mode of sampled data. In terms of geometry, it seeks the mode of a set of discrete points in space. In the proposed line estimation method, the sampled set of lines in space is generated by the intersection of back-projection planes from multiple images viewing the estimated line. By representing a line using rotation matrices in 2D and 3D, a mean shift clustering method for 3D line estimation can be developed, because the distance metric in the space of rotation matrices is well defined. In order to develop the line-based cost function, a line segment, which is represented by its two end-points, is used instead of an infinite analytical line. The use of endpoints provides the location of the estimated 3D line. An important fact used to derive the cost function is the equivalence between two models of the 2D projective space. The first model is the image plane, which is commonly used in many applications; the second is the Gaussian sphere centered at the optical center

of the camera. Under this equivalence property, a line segment is equivalent to a circular segment on the plane containing the optical center and the line on the image plane. That is, a line segment is represented by its equivalent circular segment instead. It is then hypothesized that the optimal camera motions and 3D scene geometry are those that minimize the integral of the squared distance between all points on the boundary of the circular segment and the plane containing the optical center and the estimated line.

1.2 Organization of the dissertation

The rest of this dissertation is organized as follows. The background on projective geometry, camera models and mean shift clustering is reviewed in Chapter 2. The proposed 3D line estimation methods are presented in Chapter 3. Chapter 4 is dedicated to the proposed line based bundle adjustment model. The experimental results are reported in Chapter 5 and the conclusion is given in Chapter 6.

Chapter 2: Background

In order to facilitate the discussion on the development of the proposed methods, in this chapter I review the necessary background information. This chapter starts with the introduction of related concepts in projective geometry, including the representations of geometric entities in projective space and the camera model, which is followed by the projection and back-projection using camera matrices, and the duality principle in projective space. More details on projective geometry as used in computer vision can be found in [1, 13, 14].

2.1 Projective geometry

Projective geometry is the branch of geometry studying the transformations of geometric entities, especially the projective transformation. An example of a projective transformation is the imaging of a 3D point, which is projected onto an image plane. From a global viewpoint, the Euclidean transformation is a special case of the projective transformation. There are also two geometric transformations between them, the similarity and affine transformations, such that Euclidean ⊂ similarity ⊂ affine ⊂ projective transformations. One of the drawbacks of using projective geometry is that many geometric quantities are not invariant under the transformation. For instance, orthogonality of lines and parallelism of lines are not preserved under a projective transformation. Despite this drawback, an interesting property of projective geometry is the duality principle. For example, in

2D projective space, this principle makes points and lines equivalent. Moreover, with the existence of homogeneous coordinates, the algebra of projective transformations is linear and has simpler analytical forms.

2.1.1 Representation of geometric entities

I first introduce the notation used in this review of projective geometry. A vector is written in boldface, i.e., x or l, while a matrix is in typewriter font, e.g., R. The notation [a]_× represents the skew-symmetric matrix of the cross product between two 3-vectors, i.e., [a]_× b = a × b. Geometric primitives are represented in the projective space by their homogeneous coordinates. The homogeneous coordinates of a point in n-dimensional Euclidean space R^n form an (n+1)-vector. The coordinates of a point in n-dimensional Euclidean space can be converted to homogeneous coordinates by concatenating the coordinates with 1. Two homogeneous coordinate vectors are equivalent up to scale, i.e., x_1 ~ x_2 if and only if x_1 = λ x_2 for some non-zero λ ∈ R. For example, the homogeneous coordinates of a 3D point [x y z]^T are [x y z 1]^T, which is equivalent to [λx λy λz λ]^T. Broadly speaking, a point in n-dimensional Euclidean space can be represented as a point in the n-dimensional projective space, P^n, by an (n+1)-vector.

The most significant projective entity used in this dissertation is the straight line. In the 2D projective space, e.g., the image plane, a line is represented by a 3-vector, i.e., l = [a b c]^T, where ax + by + c = 0 is the line equation. The line l passing through two points x_1 and x_2 can be computed as l ~ [x_1]_× x_2.

Let a 3D line be defined by two points X_1 and X_2. Among several other choices, this line can be represented by its Plücker coordinates, which form a point in P^5:

$$\mathbf{L} \sim \begin{bmatrix} \mathbf{X}_2 - \mathbf{X}_1 \\ [\mathbf{X}_1]_\times \mathbf{X}_2 \end{bmatrix} = \begin{bmatrix} l_1 & l_2 & l_3 & l_4 & l_5 & l_6 \end{bmatrix}^\top = \begin{bmatrix} \mathbf{L}_h \\ \mathbf{L}_o \end{bmatrix}, \qquad (2.1)$$

where L_h and L_o are, respectively, the direction and the moment of the line; the moment is the normal vector of the plane containing the origin and the 3D line, see Figure 2.1.
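As a small, hedged illustration of these definitions (not code from the dissertation), the sketch below builds the homogeneous line through two image points and the Plücker coordinates of equation (2.1); the function names are hypothetical.

```python
import numpy as np

def line_through_points_2d(x1, x2):
    """Homogeneous 2D line l ~ x1 x x2 through two homogeneous image points."""
    return np.cross(x1, x2)

def plucker_from_points(X1, X2):
    """Plücker coordinates L = (L_h, L_o) of the 3D line through the
    inhomogeneous points X1 and X2, following equation (2.1):
    direction L_h = X2 - X1 and moment L_o = X1 x X2."""
    return np.concatenate([X2 - X1, np.cross(X1, X2)])

# Because the moment is orthogonal to the direction, the Plücker constraint
# L_h . L_o = 0, introduced below as equation (2.3), holds by construction:
L = plucker_from_points(np.array([0., 0., 1.]), np.array([1., 2., 3.]))
assert abs(L[:3] @ L[3:]) < 1e-12
```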

The Plücker coordinates can also be written in the form of a Plücker matrix:

$$\mathtt{M} = \begin{bmatrix} 0 & l_6 & -l_5 & -l_1 \\ -l_6 & 0 & l_4 & -l_2 \\ l_5 & -l_4 & 0 & -l_3 \\ l_1 & l_2 & l_3 & 0 \end{bmatrix}. \qquad (2.2)$$

Here, M contains the column vectors M = [L_x L_y L_z L_∞], each of which is the point of intersection between the 3D line and the x, y, z planes and the plane at infinity, respectively.

Figure 2.1: A 3D line L can be represented by its Plücker coordinates L = [L_h^T L_o^T]^T, which is a 6-vector. The first 3-vector, L_h, is the direction of the line and the second is the moment of the line, which is the normal vector of the plane containing the origin and the line.

Due to the unknown scale, a line L in P^3 becomes a point in P^5. In other words, points in P^5 are used to represent lines in P^3. One should bear in mind that not all points in P^5 are equivalent to a line in P^3; L in (2.1) represents a 3D line if and only if a bi-linear

constraint, referred to as the Plücker constraint, is satisfied:

$$0 = \zeta(\mathbf{L}) = \mathbf{L}_h^\top \mathbf{L}_o = l_1 l_4 + l_2 l_5 + l_3 l_6. \qquad (2.3)$$

Points in P^5 satisfying the Plücker constraint lie on a special surface referred to as the Klein quadric. Given a point X = [X̃^T 1]^T in P^3, where X̃ is the inhomogeneous 3-vector, the (Euclidean) perpendicular distance between the point and a line L is [13, page 78]:

$$d(\mathbf{X}, \mathbf{L}) = \frac{\left\| \begin{bmatrix} [\tilde{\mathbf{X}}]_\times & -\mathtt{I}_3 \end{bmatrix} \mathbf{L} \right\|}{\|\mathbf{L}_h\|}. \qquad (2.4)$$

As an alternative to using L_h and L_o as a 6-vector, their directions and magnitudes can be represented by the special orthogonal groups SO(3) and SO(2), where SO(n) is the group of rotation matrices in R^n. This 3D line representation is also called the orthonormal representation [15]. That is, the direction and moment of the line are embedded in rotation matrices. Therefore, any 3D line can also be represented by:

$$\mathbf{L} \sim (\mathtt{S}, \mathtt{T}) \in SO(3) \times SO(2), \qquad (2.5)$$

where

$$\mathtt{S} = \begin{bmatrix} \mathbf{s}_1 & \mathbf{s}_2 & \mathbf{s}_3 \end{bmatrix} \qquad (2.6)$$

$$= \begin{bmatrix} \dfrac{\mathbf{L}_h}{\|\mathbf{L}_h\|} & \dfrac{\mathbf{L}_o}{\|\mathbf{L}_o\|} & \dfrac{[\mathbf{L}_h]_\times \mathbf{L}_o}{\|[\mathbf{L}_h]_\times \mathbf{L}_o\|} \end{bmatrix}, \qquad (2.7)$$

$$\mathtt{T} = \frac{1}{\|\mathbf{L}\|} \begin{bmatrix} \|\mathbf{L}_h\| & -\|\mathbf{L}_o\| \\ \|\mathbf{L}_o\| & \|\mathbf{L}_h\| \end{bmatrix} \qquad (2.8)$$

$$= \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}. \qquad (2.9)$$

Given the orthonormal representation of a 3D line, we can easily convert it back to Plücker coordinates by:

$$\mathbf{L} \sim \begin{bmatrix} t_{11} \mathbf{s}_1 \\ t_{21} \mathbf{s}_2 \end{bmatrix}. \qquad (2.10)$$
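The sketch below, assuming the conventions of equations (2.5) through (2.10) and using hypothetical function names, converts Plücker coordinates to the orthonormal representation (S, T) and back.

```python
import numpy as np

def plucker_to_orthonormal(L):
    """Map Plücker coordinates L = (L_h, L_o) to (S, T) in SO(3) x SO(2),
    following equations (2.6)-(2.9). Assumes the line does not pass through
    the origin, so that the moment L_o is non-zero."""
    L_h, L_o = L[:3], L[3:]
    s1 = L_h / np.linalg.norm(L_h)
    s2 = L_o / np.linalg.norm(L_o)
    s3 = np.cross(L_h, L_o)
    s3 /= np.linalg.norm(s3)
    S = np.column_stack([s1, s2, s3])
    nh, no = np.linalg.norm(L_h), np.linalg.norm(L_o)
    T = np.array([[nh, -no], [no, nh]]) / np.linalg.norm(L)
    return S, T

def orthonormal_to_plucker(S, T):
    """Inverse map of equation (2.10): L ~ (t11 * s1, t21 * s2)."""
    return np.concatenate([T[0, 0] * S[:, 0], T[1, 0] * S[:, 1]])
```

The round trip recovers the original coordinates up to the homogeneous scale, which is exactly why the Plücker constraint never has to be imposed explicitly in this parameterization.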

In this orthonormal representation, the Plücker constraint is implicitly embedded in S and T due to the orthogonality property of rotation matrices, which can easily be verified by taking the dot product between the first and the second columns of the matrix S:

$$0 = \frac{\mathbf{L}_h^\top \mathbf{L}_o}{\|\mathbf{L}_h\|\,\|\mathbf{L}_o\|} \qquad (2.11)$$

$$= \mathbf{L}_h^\top \mathbf{L}_o, \qquad (2.12)$$

and those of the matrix T:

$$0 = \frac{\mathbf{L}_h^\top \mathbf{L}_o + \mathbf{L}_o^\top \mathbf{L}_h}{\|\mathbf{L}\|^2} \qquad (2.13)$$

$$= \mathbf{L}_h^\top \mathbf{L}_o. \qquad (2.14)$$

In our approach, we will make explicit use of this representation in the bundle adjustment in order to waive the imposition of the Plücker constraint. A review of various 3D line representations can be found in [15, 16].

A plane in 3-space is represented by a homogeneous 4-vector whose elements are the plane parameters. Precisely, a plane AX + BY + CZ + D = 0 is represented by the vector [A B C D]^T. Given two 3D planes Π and Π′, such as the back-projection planes of image lines from a stereo pair, they always intersect at a 3D line (see Figure 2.2 for an illustration), which can be computed by:

$$\mathbf{L} \sim \boldsymbol{\Pi} \wedge \boldsymbol{\Pi}' = \Xi(\boldsymbol{\Pi})\, \boldsymbol{\Pi}', \qquad (2.15)$$

where Ξ is a construction matrix [17]:

$$\Xi(\boldsymbol{\Pi}) = \begin{bmatrix} [\tilde{\boldsymbol{\Pi}}]_\times & \mathbf{0}_{3\times 1} \\ S\,\mathtt{I}_3 & -\tilde{\boldsymbol{\Pi}} \end{bmatrix}, \qquad (2.16)$$

and Π = [Π̃^T S]^T, with Π̃ the normal part and S the scalar part of the plane.
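As a hedged sketch of the construction in equations (2.15) and (2.16), the code below intersects two planes given as homogeneous 4-vectors; the helper names are illustrative.

```python
import numpy as np

def skew(a):
    """3x3 matrix [a]_x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def construction_matrix(Pi):
    """6x4 construction matrix Xi(Pi) of equation (2.16) for a plane
    Pi = (n, d), with normal part n and scalar part d."""
    n, d = Pi[:3], Pi[3]
    return np.vstack([np.hstack([skew(n), np.zeros((3, 1))]),
                      np.hstack([d * np.eye(3), -n.reshape(3, 1)])])

def line_from_planes(Pi1, Pi2):
    """Plücker coordinates of the intersection line of two planes, eq. (2.15)."""
    return construction_matrix(Pi1) @ Pi2

# The planes x = 0 and y = 0 meet in the z-axis: direction (0, 0, 1), zero moment.
L = line_from_planes(np.array([1., 0., 0., 0.]), np.array([0., 1., 0., 0.]))
```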

Figure 2.2: Back-projection planes from a stereo pair intersect at a line in space.

The coordinates of geometric entities change under a change of coordinate system. Let the rotation matrix and the translation vector between the original coordinate system and the transformed one be denoted by R and T. A point X in the original coordinate system is then transformed by:

$$\mathbf{X}' = \mathtt{R}(\mathbf{X} - \mathbf{T}). \qquad (2.17)$$

Let a line L in the original coordinate system be defined by two points X_1 and X_2. The Plücker coordinates of the transformed line L′ are then:

$$\mathbf{L}' \sim \begin{bmatrix} \mathbf{X}'_2 - \mathbf{X}'_1 \\ [\mathbf{X}'_1]_\times \mathbf{X}'_2 \end{bmatrix} = \begin{bmatrix} \mathtt{R}(\mathbf{X}_2 - \mathbf{X}_1) \\ [\mathtt{R}(\mathbf{X}_1 - \mathbf{T})]_\times\, \mathtt{R}(\mathbf{X}_2 - \mathbf{T}) \end{bmatrix}. \qquad (2.18)$$

By using the identity [Ba]_× B = adj(B)^T [a]_×, the above equation becomes:

$$\mathbf{L}' \sim \begin{bmatrix} \mathtt{R}(\mathbf{X}_2 - \mathbf{X}_1) \\ \mathtt{R}[\mathbf{X}_1 - \mathbf{T}]_\times (\mathbf{X}_2 - \mathbf{T}) \end{bmatrix} = \begin{bmatrix} \mathtt{R}(\mathbf{X}_2 - \mathbf{X}_1) \\ -\mathtt{R}[\mathbf{T}]_\times(\mathbf{X}_2 - \mathbf{X}_1) + \mathtt{R}[\mathbf{X}_1]_\times \mathbf{X}_2 \end{bmatrix} \qquad (2.19)$$

$$\sim \begin{bmatrix} \mathtt{R} & \mathtt{0}_{3\times3} \\ -\mathtt{R}[\mathbf{T}]_\times & \mathtt{R} \end{bmatrix} \begin{bmatrix} \mathbf{X}_2 - \mathbf{X}_1 \\ [\mathbf{X}_1]_\times \mathbf{X}_2 \end{bmatrix} = \begin{bmatrix} \mathtt{R} & \mathtt{0}_{3\times3} \\ -\mathtt{R}[\mathbf{T}]_\times & \mathtt{R} \end{bmatrix} \mathbf{L}. \qquad (2.20)$$
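A minimal sketch of the 6x6 line motion matrix of equation (2.20), assuming the convention X' = R(X - T) used above; the function names are illustrative.

```python
import numpy as np

def skew(a):
    """3x3 matrix [a]_x with skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def line_motion_matrix(R, T):
    """6x6 matrix of equation (2.20) mapping Plücker coordinates from the
    original frame to the frame rotated by R and centered at T."""
    Z = np.zeros((3, 3))
    return np.block([[R, Z],
                     [-R @ skew(T), R]])

def transform_line(L, R, T):
    """Transform Plücker coordinates L = (L_h, L_o) into the new frame."""
    return line_motion_matrix(R, T) @ L
```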

2.1.2 Pinhole camera model

As is traditionally assumed, I use the pinhole camera model, which is composed of exterior and interior orientation parameters. The exterior orientation parameters consist of the rotation matrix R (3×3) and the optical center T (3×1), while the interior orientation parameters are the coefficients of the calibration matrix K (3×3):

$$\mathtt{K} = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (2.21)$$

where α and β are the scales in the x and y directions, (u_0, v_0) are the coordinates of the principal point, and γ is the skew parameter. Precisely, the interior orientation parameters describe the geometry between the sensor and image planes. Combining these parameters into a single projection matrix P results in a model for projecting a 3D point X to its image x, see Figure 2.3, where both are in homogeneous coordinates:

$$\mathbf{x} \sim \mathtt{P}\mathbf{X} \quad \text{with} \quad \mathtt{P} \sim \mathtt{K}\mathtt{R}\,[\mathtt{I}_3\ \ {-\mathbf{T}}], \qquad (2.22)$$

where I_3 is the 3×3 identity matrix. The camera projection matrix P back-projects an image line l to a 3D plane Π called the back-projection plane, which is also known as the interpretation plane. It contains the optical center and the image line, see Figure 2.4, and can be computed by:

$$\boldsymbol{\Pi} \sim \mathtt{P}^\top \mathbf{l}. \qquad (2.23)$$

Note that Π is the plane in the object space coordinate system, not in the camera coordinate system.
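The following hedged sketch assembles the projection matrix of equation (2.22), projects a point, and back-projects an image line to its interpretation plane as in equation (2.23); the names are illustrative, not code from the dissertation.

```python
import numpy as np

def projection_matrix(K, R, T):
    """P = K R [I | -T] of equation (2.22); T is the optical center."""
    return K @ R @ np.hstack([np.eye(3), -T.reshape(3, 1)])

def project_point(P, X):
    """Project an inhomogeneous 3D point X to inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def back_projection_plane(P, l):
    """Back-projection (interpretation) plane of the image line l, eq. (2.23)."""
    return P.T @ l
```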

2.1.3 The equivalence between the image plane and the Gaussian sphere

Under the equivalence relation in projective geometry, the 2-dimensional projective plane P^2 can be modeled as the unit Gaussian sphere S^2 in the 3-dimensional vector space R^3.

Figure 2.3: A scene point X is projected to an image point x.

This equivalence is based on the fact that two points are equivalent if and only if they are on the same projection ray. A point can be mapped back and forth between S^2 and P^2 via the gnomonic projection. Therefore, a point x ∈ P^2 can be mapped to a point V on S^2 by

$$\mathbf{V} = \frac{\mathbf{x}}{\|\mathbf{x}\|}. \qquad (2.24)$$

Equation (2.24) can be interpreted as the normalization of the point x to a unit vector. Therefore, V is a point on the unit sphere in R^3, and both V and x are on the same projection ray but on different surfaces. An infinite line l on the 2D projective plane is equivalent to a great circle on S^2, which is the intersection between the sphere S^2 and the interpretation plane of the 2D line l. Hence, the great circle can be represented by the normal vector N of the interpretation plane. A point on the line l is mapped to a point on the great circle equivalent to the line, see Figure 2.5. Consequently, a line segment is equivalent to a circular arc on the interpretation plane.

Figure 2.4: The back-projection plane from the image line l. It is also known as the interpretation plane.

Figure 2.5: The mapping between the Gaussian sphere S^2 and the 2D projective space P^2. A line l on P^2 is equivalent to a great circle with normal vector N on the Gaussian sphere. If a point x on the image line l is equivalent to a point V on the Gaussian sphere, the point V must be on the great circle equivalent to the image line l.

In order to explain the concept, let the segment of the line l be defined by two endpoints x_1 and x_2, which are equivalent to V_1 and V_2 on S^2, respectively; see Figure 2.6 for an illustration. The circular arc connecting V_1 and V_2 is a segment of the circumference of the great circle equivalent to the line l. For an alternative explanation, let us consider the tracing of the projection ray. The projection ray traced from the point x_1 to the point x_2 along the line l intersects the Gaussian sphere, and the locus of the intersection points is the great circle with normal vector N passing through the points V_1 and V_2, because the tracing path is on the interpretation plane. Note that the line l is equivalent to the great circle with normal vector N.

Figure 2.6: A line segment on the image plane is equivalent to a circular arc, which is a segment of the circumference of the great circle equivalent to the line passing through x_1 and x_2.

Given the calibration matrix K of the camera, an image point x can be mapped to the unit Gaussian sphere in the object space with its center positioned at the optical center by:

$$\mathbf{V} = \frac{\mathtt{K}^{-1}\mathbf{x}}{\|\mathtt{K}^{-1}\mathbf{x}\|}, \qquad (2.25)$$

where K^{-1} x is the homogeneous coordinate of the point x̄ = [x̃^T 1]^T on the normalized image plane, i.e., the image plane at unit focal length. In other words, a point on the image plane can be mapped onto the unit Gaussian sphere by first mapping it to the normalized image plane and then normalizing it to unit norm. To compute the normal vector of the great circle equivalent to a line l, let the line l be defined by two points x_1 and x_2. The normal vector of the great circle can be computed as the cross product between their equivalent points on the Gaussian sphere:

$$\mathbf{N} = \frac{[\mathtt{K}^{-1}\mathbf{x}_1]_\times\, \mathtt{K}^{-1}\mathbf{x}_2}{\|\mathtt{K}^{-1}\mathbf{x}_1\|\, \|\mathtt{K}^{-1}\mathbf{x}_2\|}. \qquad (2.26)$$

By using the fact that [Ba]_× B = det(B) B^{-T} [a]_× (det(B) ≠ 0) and l ~ [x_1]_× x_2, one obtains:

$$\mathbf{N} = \frac{\mathtt{K}^{\top}[\mathbf{x}_1]_\times \mathbf{x}_2}{\det(\mathtt{K})\, \|\mathtt{K}^{-1}\mathbf{x}_1\|\, \|\mathtt{K}^{-1}\mathbf{x}_2\|} \qquad (2.27)$$

$$\sim \mathtt{K}^{\top}\mathbf{l}. \qquad (2.28)$$

Note that the normal vector N is in the camera coordinate system.
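A small sketch of equations (2.25) and (2.28): mapping an image point onto the Gaussian sphere and computing the unit great-circle normal of an image line in camera coordinates. It assumes a known calibration matrix K; the names are illustrative.

```python
import numpy as np

def to_gaussian_sphere(K, x):
    """Map a homogeneous image point x onto the unit Gaussian sphere, eq. (2.25)."""
    v = np.linalg.solve(K, x)   # K^{-1} x, the point on the normalized image plane
    return v / np.linalg.norm(v)

def great_circle_normal(K, l):
    """Unit normal of the great circle equivalent to the image line l,
    i.e., the direction K^T l of equation (2.28), in camera coordinates."""
    n = K.T @ l
    return n / np.linalg.norm(n)
```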

2.1.4 Duality principle

One of the important concepts in projective geometry is the duality principle. It is the formulation used to explain the symmetry in the roles played by geometric entities in projective space. Moreover, geometric relations in the projective space also have their duals. For instance, the dual of collinearity is intersection, and vice versa. Therefore, given a proposition in projective space, the dual of the proposition can be formed by replacing the geometric entities and geometric relations by their duals. For example, let us consider the 2-dimensional projective space. The dual entity of a point is a line, and vice versa. Let x be a point in 2D projective space; its dual in the dual projective space is x*. Both of them have the same coordinates, but x is interpreted as a point in the projective space while x* is interpreted as a line in the dual projective space. Similarly, the dual of a line l is a point l* in the dual projective space. A proposition also has a dual. Let x_1, x_2 and x_3 be points in 2D projective space that are collinear on the line l. In the dual projective space, they are interpreted as concurrent lines whose point of intersection is the dual of the line l, see Figure 2.7.

Figure 2.7: The dual entity of a point x in 2D projective space is a line x* in the dual projective space, and vice versa. A proposition also has a dual. This figure demonstrates the dual of collinearity, which is intersection in the dual projective space. Three collinear points on the line l become three concurrent lines in the dual projective space, where the intersection point is the dual of the line l.
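A tiny numerical illustration of this duality, with made-up coordinates: three collinear points, read as lines in the dual plane, are concurrent in the dual of their common line.

```python
import numpy as np

# Three collinear points on the image line l: x + y - 2 = 0.
l = np.array([1.0, 1.0, -2.0])
p1, p2, p3 = np.array([2.0, 0.0, 1.0]), np.array([0.0, 2.0, 1.0]), np.array([1.0, 1.0, 1.0])

# Read in the dual plane, the same coordinate triples are lines; any two of
# them intersect in a point that is the dual of l (equal to l up to scale).
dual_of_l = np.cross(p1, p2)
assert np.allclose(np.cross(dual_of_l, l), 0)   # dual_of_l ~ l
assert abs(dual_of_l @ p3) < 1e-12              # the third dual line passes through it
```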

In 3-dimensional projective space, the dual entity of a point is a plane. Let X be a point in the projective space; its dual in the dual projective space is X*. Similarly, the dual of a plane Π is a point Π*. For a 3D line L, its dual L* is still a line in the dual projective space, but their coordinates differ by:

$$\mathbf{L}^* = \begin{bmatrix} \mathbf{L}_h^* \\ \mathbf{L}_o^* \end{bmatrix} = \begin{bmatrix} \mathtt{0} & \mathtt{I}_3 \\ \mathtt{I}_3 & \mathtt{0} \end{bmatrix} \mathbf{L} = \begin{bmatrix} \mathbf{L}_o \\ \mathbf{L}_h \end{bmatrix}. \qquad (2.29)$$

2.2 Riemannian manifold

For better understanding and for the sake of completeness, this section is dedicated to the Riemannian manifold, which will be used for explaining the non-linear mean shift algorithm and the computation in the proposed 3D line estimation method. A manifold is a topological space in which the neighborhood region of each point has a Euclidean-like structure. A smooth manifold can be thought of as a continuous surface lying in Euclidean space. A manifold is called Riemannian if and only if it is smooth (differentiable) and equipped with an inner product. Hence, this allows one to define a metric on the manifold. Unless stated otherwise, the term manifold refers to a Riemannian manifold. An important notion of the manifold is the tangent space. Let us consider the infinitesimally small neighborhood region of a point x on the manifold M, i.e., x ∈ M. Such an infinitesimally small neighborhood region can be regarded as a flat space, which is a linear approximation of the manifold M around the point x [18]. The flat space at the point x is called the tangent space at x and is denoted by T_x. In other words, the tangent space T_x is a plane tangent to the manifold M at the point x. An example of a tangent space is shown in Figure 2.8, and a vector on T_x is called a tangent vector. Since the tangent space T_x is a vector space, the inner product of tangent vectors can then be defined.

Figure 2.8: Mapping between manifold and tangent space. The logarithm operator log_x maps a point y on the manifold to a point on the tangent space T_x, while the exponential operator exp_x maps the point back onto the manifold. Note that these two operators depend on the point x on the manifold.

A point on the manifold can be mapped to a tangent space and vice versa. The mapping of a point Δ on the tangent space T_x to the manifold M is defined by the exponential map, exp_x. Inversely, the logarithm map log_x = exp_x^{-1} maps a point y on the manifold to the tangent space T_x. Note that these operators depend on the point x, because different points on the manifold have different tangent spaces. The comparison between the addition and subtraction operators in a vector space and the exponential and logarithm operators on a manifold is given in Table 2.1.

Table 2.1: The comparison between operators in vector spaces and manifolds

              Vector spaces     Manifold
Addition      y = x + Δ         y = exp_x(Δ)
Subtraction   Δ = y - x         Δ = log_x(y)

As mentioned earlier, the Riemannian manifold

is equipped with the notion of a metric, which is given in terms of the length of the shortest curve between two points. The curve connecting two points with the shortest distance is called the geodesic, and the Riemannian distance (metric) between two points on the manifold is the length of the geodesic. In Figure 2.8, the dashed line is the geodesic between the points x and y, and the initial velocity of the geodesic is Δ [19]. The relations between y and Δ are expressed as follows:

$$\exp_x(\Delta) = y, \qquad (2.30)$$

$$\log_x(y) = \Delta. \qquad (2.31)$$

Note that the specific forms of the exponential and logarithm operators depend on the manifold. Let d(x, y) be the Riemannian distance between points x and y on the manifold M. The gradient of the squared Riemannian distance is given by [20]:

$$\nabla_x\, d^2(x, y) = -2 \log_x(y). \qquad (2.32)$$

As mentioned earlier, the expressions for the exponential and logarithm operators depend on the manifold. For example, the logarithm and exponential operators of a matrix manifold are different from those of the Grassmann manifold. Since the matrix manifold frequently occurs as a parameter space in photogrammetry and computer vision and is used in this work, only the case of the matrix manifold is discussed. Let Δ and Y be square matrices. The matrix exponential and logarithm operators defined about the identity matrix are expressed as follows:

$$\exp(\Delta) = \sum_{i=0}^{\infty} \frac{1}{i!} \Delta^i, \qquad (2.33)$$

$$\log(\mathtt{Y}) = \sum_{i=1}^{\infty} \frac{(-1)^{i-1}}{i} (\mathtt{Y} - \mathtt{I})^i. \qquad (2.34)$$

Namely, the above expressions can be computed accurately when their arguments are close to the identity matrix. The manifold operators at a point X, i.e., exp_X and log_X, are then defined as:

$$\exp_{\mathtt{X}}(\Delta) = \mathtt{X}\exp(\mathtt{X}^{-1}\Delta), \qquad (2.35)$$

$$\log_{\mathtt{X}}(\mathtt{Y}) = \mathtt{X}\log(\mathtt{X}^{-1}\mathtt{Y}), \qquad (2.36)$$

where X and Y are points on the matrix manifold and Δ is on T_X. Thus, the distance between two points on the matrix manifold is expressed as follows:

$$d(\mathtt{X}, \mathtt{Y}) = \left\| \log(\mathtt{X}^{-1}\mathtt{Y}) \right\|_F, \qquad (2.37)$$

where ‖·‖_F is the Frobenius norm of a matrix. A first order approximation of the gradient of the squared distance d^2(X, Y) is obtained as [21]:

$$\nabla d^2(\mathtt{X}, \mathtt{Y}) \approx -2\log_{\mathtt{X}}(\mathtt{Y}). \qquad (2.38)$$

2.3 Non-linear mean shift on Riemannian manifold

In order to introduce the non-linear mean shift on a Riemannian manifold, the ordinary mean shift is first discussed, followed by the non-linear mean shift.

2.3.1 Ordinary mean shift

Mean shift is a non-parametric statistical analysis technique. Strictly speaking, it is an iterative procedure based on the weighted average in a local region for mode detection/clustering of given discrete data {x_i}. Originally, its purpose is to locate the peaks of a density function given n data points in a vector space, where the multivariate kernel density estimate at a point x can be computed as follows:

$$\hat{f}_k(\mathbf{x}) = \frac{c_{k,h}}{n} \sum_{i=1}^{n} k\!\left( \left\| \frac{\mathbf{x} - \mathbf{x}_i}{h} \right\|^2 \right), \qquad (2.39)$$

with the kernel function k satisfying:

$$k(z) > 0 \quad \forall z \geq 0. \qquad (2.40)$$

Here, c_{k,h} is the normalization term such that f̂ integrates to 1. The bandwidth h is introduced as the scaling of the distance function, i.e., the Euclidean distance in the vector space. By computing the gradient of the kernel density estimate (2.39), the mean shift vector at a point x can be calculated as follows:

$$\mathbf{m}_h(\mathbf{x}) = C\, \frac{\nabla \hat{f}_k}{\hat{f}_k} \qquad (2.41)$$

$$= \frac{\displaystyle\sum_{i=1}^{N} \mathbf{x}_i\, g\!\left( \left\| \frac{\mathbf{x} - \mathbf{x}_i}{h} \right\|^2 \right)}{\displaystyle\sum_{i=1}^{N} g\!\left( \left\| \frac{\mathbf{x} - \mathbf{x}_i}{h} \right\|^2 \right)} - \mathbf{x}, \qquad (2.42)$$

where

$$g(x) = -k'(x). \qquad (2.43)$$

The expression for the mean shift vector is actually proportional to the normalized density gradient estimate. The first term on the right hand side of (2.42) is the weighted average of the points in the local region around the point x. Starting at a point x_j, the procedure iteratively converges to a stationary point by a gradient ascent technique [22]:

$$\mathbf{x}_{j+1} = \mathbf{m}_h(\mathbf{x}_j) + \mathbf{x}_j. \qquad (2.44)$$

That is, the current mode estimate is shifted to the weighted average of the data points in its neighborhood.
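A compact, hedged sketch of the iteration of equations (2.42) and (2.44), using a Gaussian kernel profile so that g is also an exponential; the bandwidth and stopping threshold are illustrative.

```python
import numpy as np

def mean_shift_mode(points, x0, h=1.0, eps=1e-6, max_iter=500):
    """Seek a mode of the kernel density estimate by iterating x <- x + m_h(x),
    where m_h(x) is the weighted average of the data minus x (eqs. 2.42 and 2.44).

    points : (n, d) array of samples, x0 : (d,) starting point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / h ** 2)  # g(||(x - x_i)/h||^2)
        shift = w @ points / w.sum() - x                          # mean shift vector m_h(x)
        x = x + shift
        if np.linalg.norm(shift) < eps:                           # converged to a stationary point
            break
    return x
```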

2.3.2 Non-linear mean shift

In [21] and [19], the mean shift algorithm was generalized to the case where the set of discrete points is restricted to lie on a manifold, e.g., the manifold of n-dimensional rotation matrices SO(n), also known as the special orthogonal group, or the manifold of Euclidean transformations in 3-space SE(3), also known as the special Euclidean group. That is, the set of points is in a non-linear space, and the mean shift vector (2.42) can shift the current mode estimate off the manifold. For instance, let us consider the manifold of 3D rotation matrices. The iterative update (2.44) cannot be used to update the current mode estimate because the sum of two rotation matrices is not a rotation matrix. The useful characteristic of a manifold for deriving the non-linear mean shift is that the tangent space at each point on the manifold is well defined. Furthermore, such a tangent space is a Euclidean space. It is hence possible to modify the original mean shift algorithm to work on a non-linear manifold, because the tangent space at the current mode estimate x_j is a vector space. Particularly, the mean shift vector is computed on the tangent space at the current mode estimate. The mode estimate is then updated by mapping the mean shift vector from the tangent space back to the manifold. Another important characteristic of the Riemannian manifold is the notion of a metric, given by the geodesic distance between two points on the manifold. With the existence of the metric, the calculation of the direction and size of the mean shift vector is thus possible. To modify the linear mean shift algorithm, let us consider a manifold M equipped with a distance metric d. The notion of Euclidean distance in (2.39) is then replaced by the distance metric d, and the term c_{k,h} is dropped because the position of the mode is not affected by the global scaling [19].

Figure 2.9: The non-linear mean shift on a manifold. To compute the mean shift vector, all data points are projected to the tangent space of the current mode estimate. On the tangent space, the mean shift vector is computed. The updated mode estimate is obtained by projecting the mean shift vector back to the manifold.

The gradient of f̂_k at x is then:

$$\nabla \hat{f}_k = \frac{1}{n} \sum_{i=1}^{n} \nabla k\!\left( \frac{d^2(\mathbf{x}, \mathbf{x}_i)}{h^2} \right) \qquad (2.45)$$

$$= -\frac{1}{n} \sum_{i=1}^{n} \frac{\nabla d^2(\mathbf{x}, \mathbf{x}_i)}{h^2}\, g\!\left( \frac{d^2(\mathbf{x}, \mathbf{x}_i)}{h^2} \right). \qquad (2.46)$$

Note that the gradient of d is taken with respect to x. The expression for the non-linear mean shift vector for discrete points on a non-linear manifold can be obtained analogously to the linear case in (2.42) [19, 20]:

$$\mathbf{m}_h(\mathbf{x}) = -\frac{\displaystyle\sum_{i=1}^{N} \frac{\nabla d^2(\mathbf{x}, \mathbf{x}_i)}{2}\, g\!\left( \frac{d^2(\mathbf{x}, \mathbf{x}_i)}{h^2} \right)}{\displaystyle\sum_{i=1}^{N} g\!\left( \frac{d^2(\mathbf{x}, \mathbf{x}_i)}{h^2} \right)}. \qquad (2.47)$$

The above mean shift vector is the weighted average of the gradient terms ∇d^2(x, x_i) on the tangent space at the point x. The mode estimate is then updated by projecting the mean shift vector back to the manifold:

$$\mathbf{x}_{j+1} = \exp_{\mathbf{x}_j}\!\big( \mathbf{m}_h(\mathbf{x}_j) \big). \qquad (2.48)$$

Namely, the mode estimate moves along a geodesic on the manifold [19]. The complete algorithm is presented in Algorithm 1.

Algorithm 1: Non-linear mean shift algorithm on a manifold
Data: A set of discrete points on the manifold {x_i}, i = 1, ..., n
Result: The mode estimates of the given set of points
for iter = 1 to w do
    x ← x_iter
    repeat
        m_h(x) ← −[ Σ_{i=1}^{N} (∇d^2(x, x_i)/2) g(d^2(x, x_i)/h^2) ] / [ Σ_{i=1}^{N} g(d^2(x, x_i)/h^2) ]
        x ← exp_x(m_h(x))
    until ‖m_h(x)‖ < ε
end
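To make Algorithm 1 concrete, here is a hedged sketch of one mode-seeking run on the rotation manifold SO(3), implementing the operators of equations (2.35)-(2.37) with scipy's matrix exponential and logarithm; the bandwidth, tolerance and function names are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm, logm

def log_X(X, Y):
    """Tangent vector at X pointing toward Y: log_X(Y) = X log(X^{-1} Y), eq. (2.36)."""
    return X @ np.real(logm(X.T @ Y))          # X^{-1} = X^T on SO(3)

def exp_X(X, D):
    """Map a tangent vector D at X back onto the manifold, eq. (2.35)."""
    return X @ expm(X.T @ D)

def dist(X, Y):
    """Geodesic distance d(X, Y) = ||log(X^{-1} Y)||_F, eq. (2.37)."""
    return np.linalg.norm(np.real(logm(X.T @ Y)), 'fro')

def mean_shift_so3(samples, X0, h=0.5, eps=1e-6, max_iter=100):
    """One mode-seeking run of the non-linear mean shift (Algorithm 1) started at X0.

    samples : list of 3x3 rotation matrices, X0 : 3x3 rotation matrix."""
    X = X0.copy()
    for _ in range(max_iter):
        tangents = np.array([log_X(X, Xi) for Xi in samples])
        w = np.array([np.exp(-dist(X, Xi) ** 2 / h ** 2) for Xi in samples])
        m = (w[:, None, None] * tangents).sum(axis=0) / w.sum()   # mean shift on T_X
        X = exp_X(X, m)                                           # back onto the manifold
        if np.linalg.norm(m, 'fro') < eps:
            break
    return X
```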

Chapter 3: 3D line estimation

3.1 Literature review

3D scene geometry reconstruction is an ongoing research topic in many fields, e.g., photogrammetry, robotics and computer vision. It is used in many applications, including the digital archiving of cultural heritage [23], 3D urban scene modeling [24], face recognition [25] and medical diagnosis [26]. The aim of 3D reconstruction is to infer the most likely locations of geometric entities in space. The most commonly used feature in the reconstruction of scene geometry is the point feature. Moreover, it is also the feature used in the early work on scene geometry recovery [27]. Since then, most of the existing 3D scene geometry recovery methods have been developed for 3D point estimation [9, 28, 29, 30, 31]. Although research on 3D scene point recovery is the mainstream, recovering 3D scene points may not be suitable in applications that require the reconstruction of higher order features such as lines, planes or curves. For instance, plane estimation is required for reconstructing building roofs [32, 33]. Polyhedral objects can be reconstructed by 3D line estimation [34]. In particular, to reconstruct high-rise or man-made buildings in an urban environment, 3D line reconstruction is better suited than point reconstruction [35, 36], whereas estimating space curves is better suited for reconstructing arbitrary objects [37, 38, 39].

To estimate a 3D line, at least three images are required, because two images do not provide adequate constraints for locating the 3D line in space [40]. To illustrate, let there be only two images, and let the intersection of the back-projection planes of the matched image lines give the estimated 3D line. Without loss of generality, suppose that the first camera is fixed while the second camera moves (see Figure 3.1); the back-projection planes always intersect in space. Therefore, the position of the 3D line is not unique. To constrain the position of the 3D line, a third view is needed, because three planes in space do not necessarily intersect at a line.

Figure 3.1: 3D line estimation using two views. The location of the estimated line is not well constrained because the back-projection planes from two views always intersect in space.

In the literature, there are several 3D line estimation methods that use the minimal configuration of 3 views. Hartley [8] and Weng et al. [40] used three views to define linear algorithms for the 3D line estimation problem. In [40], only a set of image lines was used to derive a closed form solution for estimating a 3D line from three monocular perspective views. These approaches begin with estimating the relative camera motion and then estimate the 3D line. While this method assumes calibrated cameras, Hartley [8] used un-calibrated cameras and estimated the relative camera motions by the trifocal tensor. The 3D line is then reconstructed by the intersection of back-projection planes. Since un-calibrated cameras are used, only a projective equivalent of the 3D line is obtained. Similarly, the 3D line estimation method proposed in [41] concurrently uses three back-projection planes to estimate the relative camera motion. Once the relative camera motions are computed, the 3D line is estimated from a pair of points on the intersection of the three planes. While the previous methods use only image line features, Oskarsson et al. [42] use a combination of points and lines. They proposed two methods, for reconstructing four points and three lines in three views, and two points and six lines in three views. In contrast, some methods are generalized to the N-view case. An advantage of using many views in the 3D line estimation is the increased redundancy. Namely, the estimation model becomes overdetermined. Moreover, the location of the estimated 3D line can be well constrained. Hartley and Zisserman [1, page 323] formulated a 3D line estimation model by using the fact that all of the back-projection planes from matched lines intersect at a single line in space. Therefore, the back-projection planes form a pencil of planes whose axis is the intersection line. Since a 3D line in space can be represented by two planes that intersect at the line, they suggested that those two planes can be obtained from two basis vectors spanning the back-projection planes. The two basis vectors can be computed by

using Singular Value Decomposition (SVD). The concept of using back-projection planes is also used by Taillandier and Deriche [43]. They formulated a cost function for 3D line estimation by using the fact that points on the estimated line must be coincident with the back-projection planes. Petsa and Karras [44] proposed a method for 3D line estimation from a stereo pair. Since using only two views cannot provide enough constraints on the location of the estimated line in space, this method constrains the estimated line to lie on a model plane. In other words, this method hypothesizes that the back-projection planes from a stereo pair and the model plane intersect at the estimated line. This may be impractical in situations where the equation of the model plane is not known beforehand. Heuel [10] and Heuel and Förstner [45] proposed a probabilistic framework for reconstructing a 3D line by direct reconstruction. Strictly speaking, a statistical model for 3D line reconstruction by the intersection of back-projection planes was proposed. Another approach to 3D line estimation is the minimization of the re-projection error. The concept of this approach is to re-project the estimated 3D line back to the image plane. The optimal 3D line estimate is the one that minimizes a cost function defined in the image space. Bartoli and Sturm [15] presented a quasi-linear method for line estimation by representing a 3D line by its Plücker coordinates with a cost function defined in the image space. Their cost function is composed of the orthogonal distances from the image line end-points to the re-projection of the estimated line. This 3D line estimation is quasi-linear because the non-linear Plücker constraint is imposed in the estimation model. Instead of using the orthogonal distance from the image line end-points to the re-projection of the estimated line, Schindler et al. [35] estimate a 3D line by minimizing the total squared distance between the image line segments and the re-projections of the estimated line. The disadvantage of

this method is that the 3D line representation is limited to vertical lines and horizontal lines with fixed directions. The aforementioned methods use the re-projection of an infinite straight line to define the cost function. An alternative way of defining the re-projection error is to use the re-projection of points defining the estimated line. Park [46] proposed an image based rendering method which generates 3D polygons. The basic concept is to estimate the 3D straight edges of a polygon from multiple images. Both image line features and estimated 3D lines are represented by their endpoints. That is, the endpoints of the 3D straight edges are the polygon's vertices. The cost function is then defined by the area of a quadrilateral whose vertices are the four endpoints of the two line segments, i.e., the image line segment and the projected polygon edge. The re-projection of 3D line segments is also used to define the probability distribution of the location of the estimated 3D line in [47]. The distribution function is generated by sweeping the 3D line end-points along the back-projection rays of the image line endpoints. The probability of the 3D line at each sweep is computed from the total re-projection error of the 3D line end-points over all images, where the re-projection error is formulated using the image gradient value. In contrast to most 3D line estimation methods, this method does not need an explicit 2D line match. However, computing the probability of the 3D line location in space is more expensive than 2D line matching. Josephson and Kahl [30] proposed a unified framework for estimating points, lines and planes. They estimate the 3D line by finding bounds on the coordinates of the two points defining the estimated line. They choose the coordinate system such that the two points defining the estimated line are located on the planes z = 0 and z = 1. The problem of 3D line estimation then becomes finding the x and y coordinates of those two points by minimizing their re-projection

errors. The optimal x and y coordinates are solved for by using branch and bound optimization [48]. Instead of using all of the image data at once, a 3D line can be reconstructed sequentially by using sequential adjustment, e.g., Kalman filtering [49, 50, 51, 52]. That is, the estimated line is updated when a new frame is added. Seo and Hong [49] proposed a sequential line reconstruction method based on the Kalman filtering technique. The reconstruction starts with computing the camera projection matrices of the first three frames, which are then used to estimate 3D lines. To update the estimated 3D line, new frames are added one by one. After a new camera matrix is computed, the 3D line is updated by using an Iterative Extended Kalman Filter (IEKF). A drawback of this method is that it is biased toward the first three frames used to initialize the camera projection matrices. Similarly, Gee and Mayol-Cuevas [50] proposed a Simultaneous Localization and Mapping (SLAM) system using unscented Kalman filters. The system can generate 3D line segments and estimate the camera location in real-time. The rest of this chapter is organized as follows. The overview of the two proposed 3D line estimation methods is presented in Section 3.2. The 3D line estimation method based on the incidence property in the dual projective space is then presented in Section 3.3. The averaging approach for 3D line estimation is presented in Section 3.4.

3.2 Overview of the proposed methods

In the proposed methods, the initial camera motions are assumed to be known. They can be initialized by some linear methods or given by a navigation device such as an inertial navigation system. Among the many choices of 3D line representation, the Plücker coordinates are adopted to represent a 3D line because of their homogeneity and ease of

manipulation in the projective space. As discussed in Chapter 2, the Plücker coordinates form a 6-vector in 5-dimensional projective space satisfying the Plücker constraint. This creates a difficulty in 3D line estimation because an estimation model needs to take the non-linear constraint into account in order to obtain valid Plücker coordinates. The first proposed method is based on the incidence property of the dual line and dual plane in the dual projective space. Specifically, the cost function used in this line triangulation method is the orthogonal distance between the dual line and the duals of the back-projection planes. That is, all of the image lines corresponding to the estimated 3D line are back-projected to the object space. The 3D line can then be estimated by fitting a line to the set of duals of those back-projection planes. The Plücker constraint is naturally embedded in the solution of the cost function. A 3D line can then be estimated by solving the fitting model, which is a set of linear equations, without explicitly taking the Plücker constraint into account. The second 3D line estimation method is based on the averaging of a set of lines. For example, the intersection of the back-projection planes from a line match across two images is a 3D line. By considering all stereo pairs viewing the estimated line, a set of lines can be generated for use in the 3D line estimation. For the illustrated 3-image case in Figure 3.2, the plane intersections generate $C^3_2 = 3$ lines, shown as dashed lines in Figure 3.2a. We use these geometrically incorrect line estimates to compute the likelihood of the true 3D line, shown as a red circle lying on the Klein quadric in Figure 3.2b. This can be done by averaging the set of 3D lines. The difficulty of averaging the set of 3D lines is that the naive barycentric mean is not guaranteed to give valid Plücker coordinates. The Plücker coordinates are therefore converted to the orthonormal representation, which is a pair of 3D and 2D rotation matrices, as discussed in Chapter 2. The 3D line averaging can then be computed by averaging in the space of 3D and 2D rotation matrices.

Figure 3.2: Estimated 3D lines on the Klein quadric denoting the proposed final error. Red denotes the estimated 3D line, and other colors denote pairwise intersections of back-projected lines. (a) Initial solution, (b) final solution.

The result of the averaging is guaranteed to satisfy the Plücker constraint. In contrast to image space based 3D line estimation methods that estimate the 3D line by minimizing a geometric or algebraic error on the image plane, the proposed line estimation methods are more generic since they are independent of the camera type. Namely, the estimation model does not need to be redefined for different camera types, e.g., conventional camera, fisheye lens or catadioptric system, because the proposed 3D line estimation methods are defined based on error minimization in the object space. By back-projecting line features to the object space, the estimation methods become independent of the camera models.

3.3 3D line triangulation by incidence property in dual space

Given the initial estimates of K, R and C of conventional pinhole cameras, or the inverse projection functions of omnidirectional cameras, 3D line estimation deals with back-projecting

image lines using these estimates to locate the 3D lines in the object space. This process, however, does not guarantee projectively correct line estimates. As discussed in Chapter 2, the estimated line should lie on the Klein quadric, which suggests imposing the Plücker constraint on the estimation model $AL = 0$. Using this constraint, the 3D line estimation model can be formulated as a least squares problem subject to a non-linear constraint as follows:

$$\hat{L} = \operatorname*{argmin}_{L} \|AL\|_2^2 \qquad (3.1)$$

subject to

$$\zeta(L) = L^\top D L = 0 \quad \text{and} \quad \|L\| = 1. \qquad (3.2)$$

Imposing the Plücker constraint and the additional unit norm constraint in the above estimation model makes the estimation process difficult. Only the linear least squares problem (3.1) together with the unit norm constraint can be easily solved by the SVD technique, because the unit norm constraint is implicitly enforced. With the Plücker constraint, which is non-linear, the 3D line estimation problem needs to be solved iteratively using a linearized estimation model. Recently, Barreto et al. [53] presented a linear 3D line estimation for a medical endoscope without imposing the Plücker constraint. A 3D line is triangulated by fitting the line to a set of 3D points. Although the Plücker constraint is not required, this method is not practical for a traditional camera because it requires points in 3-space, which are not available if they are not measured prior to estimation. Hartley and Zisserman [1, page 323], on the other hand, use a set of back-projection planes and estimate the 3D line from the subspace spanned by the two eigenvectors with the largest eigenvalues. This work is motivated by both [53] and [1, page 323] in that a 3D line is estimated from the duals of the back-projection planes, which are points in the dual space. That is, the dual line is fitted to

the duals of the back-projection planes in the dual projective space. The analysis of the estimation model shows that the Plücker constraint is not required during the estimation. An advantage of our 3D line estimation over [53] is that the proposed method is more practical for central and non-central projection cameras, because no 3D points are required and the cost function is defined in object space. The useful property of the dual projective space for deriving the proposed 3D line estimation model is the preservation of the incidence between geometric entities. By the duality principle, the dual elements of the plane $\Pi$ and the line $L$ are the point $\Pi^*$ and the line $L^*$ such that $L^* = [L_o^\top\ L_h^\top]^\top$. Let there be two planes in 3D projective space, i.e., $\Pi_a$ and $\Pi_b$, intersecting at a line $L$, see Figure 3.3. Then, in the dual projective space, the dual line $L^*$ is incident with the duals of the planes $\Pi_a$ and $\Pi_b$, see Figure 3.3. With this observation, a 3D line $L$ can be estimated by finding the dual $L^*$ minimizing the distance from the duals of the planes, which are points in the dual space.

Figure 3.3: The dual projective space. Given two planes intersecting at a line $L$, the dual elements of these two planes are incident with the dual line $L^*$.

The first step in the estimation process is to back-project the line matches to object space. For the pinhole camera, let the projection matrix of camera i be $P_i$ and the image of line $L$ in view i be $l_i$, with $i = 1, \ldots, N$. The back-projection plane containing the projection center and the image of line $L$ in view i is computed by:

$$\Pi_i = P_i^\top l_i = [\,\bar{\Pi}_i^\top \ \ S_i\,]^\top. \qquad (3.3)$$

Geometrically, the back-projection planes must intersect at a single line in object space. However, in practice, those back-projection planes cannot intersect at a single line in space due to errors in the camera motions and image lines. In terms of the dual projective space, the duals of the back-projection planes are not collinear. To estimate a 3D line from the back-projection planes, the dual line minimizing the orthogonal distance from the duals of the back-projection planes is suggested. By way of explanation, the sum of the squared orthogonal distances from the dual $L^* = [L_o^\top\ L_h^\top]^\top$ to all duals $\Pi_i^*$ is used to form the cost function for estimation, see (2.4):

$$\sum_{i=1}^{N} d^2(L^*, \Pi_i^*) = \sum_{i=1}^{N} \left\| \frac{\left[\,[\bar{\Pi}_i]_\times \ \ S_i I_3\,\right] L^*}{S_i \|L_h\|} \right\|^2, \qquad (3.4)$$

or more compactly,

$$E(L^*, \{\Pi_i\}) = \left\| A_{3N\times 6}\, L^* \right\|_2^2, \qquad (3.5)$$

where

$$A = \begin{bmatrix} \vdots \\ \frac{1}{w_i}\left[\,[\bar{\Pi}_i]_\times \ \ S_i I_3\,\right] \\ \vdots \end{bmatrix}, \qquad w_i = S_i \|L_h\|. \qquad (3.6)$$

This is a non-linear cost function because of the weighting term $w_i$. By setting the weight to a constant, the dual $L^*$ satisfying the Plücker constraint can be estimated from model (3.6), without imposing the constraint, by using the SVD method.

The estimation model in (3.5) is a homogeneous system. Its non-trivial solution can be obtained by using the SVD technique. Consider the SVD of the design matrix A:

$$A = USV^\top, \qquad (3.7)$$

where V contains the eigenvectors of $A^\top A$. The non-trivial solution of the estimation model in (3.5) is the column of V corresponding to the smallest eigenvalue of $A^\top A$. That this SVD solution satisfies the Plücker constraint can easily be shown by using the fact that the matrix U is orthonormal and V contains the eigenvectors of $A^\top A$; we have:

$$A^\top A V = \left(\sum_i \frac{1}{w_i^2}\begin{bmatrix} -[\bar{\Pi}_i]_\times^2 & -S_i[\bar{\Pi}_i]_\times \\ S_i[\bar{\Pi}_i]_\times & S_i^2 I_3 \end{bmatrix}\right) V = V S^2. \qquad (3.8)$$

Following the property of the vector cross product that the cross product of two vectors produces a vector orthogonal to both of them, the upper and lower 3-vectors of each column of V are therefore orthogonal:

$$v_{u,i}^\top v_{l,i} = 0, \qquad i = 1, \ldots, 6, \qquad (3.9)$$

where $v_{u,i}$ and $v_{l,i}$ are the upper and lower 3-vectors of column i of the matrix V. This implies that the solution of model (3.5) satisfies the Plücker constraint. The dual $L^* = [L_o^\top\ L_h^\top]^\top$ can be iteratively estimated by re-weighting the model in each iteration. Note that the estimated 3D line is in the dual space; we need to convert it back to the projective space, i.e., $L = [L_h^\top\ L_o^\top]^\top$. The complete algorithm for estimating the dual line $L^*$ is shown in Algorithm 2. After the 3D lines are triangulated, they provide the initial solution for the bundle adjustment that refines the estimation. Be aware that the cost function (3.6) has a singularity at $S_i = 0$. This can happen when a back-projection plane contains the origin of the object space coordinate system. This situation can be avoided by selecting the object space coordinate system to be different from any camera coordinate system.
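To make this estimation step concrete before the pseudocode in Algorithm 2 below, a compact numerical sketch is given here (a sketch only, assuming numpy and pinhole projection matrices; the function and variable names are mine, not part of the text). It stacks the blocks of (3.6) into the design matrix, takes the right singular vector of the smallest singular value, and then iterates the re-weighting $w_i = S_i\|L_h\|$:

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate_line_dual(P_list, l_list, iters=5):
    """Fit the dual line L* = [L_o; L_h] to the duals of the back-projection planes.

    P_list: 3x4 projection matrices, l_list: matched image lines as 3-vectors.
    Returns the Plucker coordinates L = [L_h; L_o] (up to scale).
    """
    planes = [P.T @ l for P, l in zip(P_list, l_list)]    # Pi_i = P_i^T l_i, eq. (3.3)
    w = np.ones(len(planes))                              # unit weights for the first pass
    for _ in range(iters):
        rows = [np.hstack([skew(Pi[:3]), Pi[3] * np.eye(3)]) / wi
                for Pi, wi in zip(planes, w)]             # blocks of eq. (3.6)
        A = np.vstack(rows)                               # 3N x 6 design matrix
        L_star = np.linalg.svd(A)[2][-1]                  # smallest right singular vector
        L_o, L_h = L_star[:3], L_star[3:]
        # w_i = S_i ||L_h||; the absolute value keeps the weight positive.
        w = np.array([abs(Pi[3]) * np.linalg.norm(L_h) for Pi in planes])
    return np.hstack([L_h, L_o])                          # convert the dual back to L
```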

Data: A set of image lines $\{l_i\}$ and camera projection matrices $\{P_i\}$.
Result: Triangulated line $L$.
Generate the set of back-projection planes $\{P_i^\top l_i\} = \{[\,\bar{\Pi}_i^\top\ S_i\,]^\top\}$;
Set unit weights $w_i = 1$;
Form the design matrix $A$ in (3.6);
$L^* \leftarrow V(:,6)$ where $A = USV^\top$;
Choose the number of iterations MaxIter;
for iter $\leftarrow$ 1 to MaxIter do
  Compute the weights $w_i = S_i\|L_h\|$;
  Form the design matrix $A$ in (3.6);
  $L^* \leftarrow V(:,6)$ where $A = USV^\top$;
end
$L \leftarrow [L_h^\top\ L_o^\top]^\top$
Algorithm 2: 3D line triangulation by incidence in the dual space

In the projective space, the cost function (3.6) can be interpreted as the incidence between a plane and a line. Let a line $L$ be incident with a back-projection plane $P^\top l = \Pi = [\,\bar{\Pi}^\top\ S\,]^\top$, i.e., $L \subset \Pi$. The points $\mathbf{L}_x$, $\mathbf{L}_y$, $\mathbf{L}_z$ and $\mathbf{L}_w$ on the line $L$, see (2.2), must be incident with the plane $\Pi$:

$$l^\top P\,\mathbf{L}_x = l^\top P\,\mathbf{L}_y = l^\top P\,\mathbf{L}_z = l^\top P\,\mathbf{L}_w = 0. \qquad (3.10)$$

Alternatively, the incidence relations of these four points can be written by using the Plücker matrix given in (2.2) as:

$$\Pi^\top M = [\,0\ \ 0\ \ 0\ \ 0\,], \qquad (3.11)$$

where M is the Plücker matrix given in (2.2). After some algebraic manipulation, the above relation can be rewritten as:

$$\underbrace{\begin{bmatrix} S I_3 & [\bar{\Pi}]_\times \\ \bar{\Pi}^\top & 0_{1\times 3} \end{bmatrix}}_{G} \begin{bmatrix} L_h \\ L_o \end{bmatrix} = \mathbf{0}. \qquad (3.12)$$

It can be observed that the first three rows of the matrix G are the cost function (3.6) without the weighting term. A line $L^*$ minimizing the cost function (3.6) in the dual projective space is therefore an element of the null space of the first three rows of the matrix G. In general, it is sufficient to use only two of the four points above to check the incidence between the plane $\Pi$ and the line $L$. In other words, the rank of the matrix G is 2. Since $L^*$ lies in the null space of the first three rows of G, and G has only 2 degrees of freedom (rank 2), the line hence satisfies the relation (3.12).

Figure 3.4: The plane containing the origin of the object space coordinate system and the 3D line L intersects the back-projection plane at the 3D line. As a result, the cross product between the moment $L_o$ and the normal of the back-projection plane is parallel to the line direction $L_h$.

In terms of geometry, the direction of the line formed by the intersection of two planes is the cross product of the planes' normals. As discussed in Chapter 2, the moment $L_o$ of the 3D line L is the normal of the plane containing the origin of the object space coordinate

system and the 3D line; hence this plane intersects each back-projection plane at the line L, see Figure 3.4. As a result, the cross product between $L_o$ and the normal of each back-projection plane must be parallel to the direction vector $L_h$:

$$L_h \parallel [\bar{\Pi}_i]_\times L_o, \qquad i = 1, \ldots, N, \qquad (3.13)$$

or

$$\lambda L_h = [\bar{\Pi}_i]_\times L_o \qquad (\lambda \in \mathbb{R}), \qquad (3.14)$$

which is equivalent to the proposed cost function (3.6).

3.4 3D line triangulation by averaging approach

Let there be a 3D line L imaged by N cameras. The corresponding back-projection planes intersect at a 3D line. The intersection lines from different combinations of the back-projection planes therefore form a set of non-intersecting candidate 3D lines because of errors in both the image lines and the camera matrices. For instance, the intersection lines of back-projection planes from all stereo pairs are not unique, see Figure 3.2a. We conjecture that the 3D line can be estimated by averaging the candidate 3D lines. Intuitively, the mean of samples $\{L_i\}$, where $i = 1, \ldots, M$, can be estimated by the barycentric mean:

$$\bar{L} = \operatorname*{argmin}_{L} \sum_{i=1}^{M} \|L_i - L\|^2 \qquad (3.15)$$

$$\bar{L} = \frac{1}{M}\sum_{i=1}^{M} L_i. \qquad (3.16)$$

The barycentric mean is the best linear unbiased estimator; it gives a point in a vector space minimizing the sum of Euclidean distances to the sampled points. The barycentric mean cannot provide a valid mean for a set of discrete points that does not lie on a vector space but on a non-linear manifold.

Figure 3.5: Given a set of points in $\mathbb{R}^2$ that all lie on a 1-D manifold, i.e., a circular arc, the average of these points computed by the barycentric mean does not lie on the circular arc.

Namely, the mean may not be on the manifold of the data. For example, suppose a set of data points in $\mathbb{R}^2$ is given and these points all live on a non-linear 1-D manifold, e.g., a circular arc, see Figure 3.5. By using the barycentric mean to estimate the mean of these points, their mean may not lie on the manifold. By representing lines by their Plücker coordinates, using the barycentric mean to find the average of a set of 3D lines is not guaranteed to give a valid line, due to the fact that the space of 3D lines is not a vector space but a non-linear manifold. The space of 3D lines is a non-linear manifold because of the Plücker constraint, as discussed in Chapter 2. This limitation can be circumvented by enforcing the Plücker constraint on the barycentric mean based estimation:

$$\bar{L} = \operatorname*{argmin}_{L} \sum_{i=1}^{M} \|L_i - L\|^2 \quad \text{subject to } \zeta(L) = L^\top D L = 0 \text{ and } \|L\| = 1. \qquad (3.17)$$

The model (3.17) with this constraint is a linear least squares model with non-linear constraints. The solution of this model is a point on the Klein quadric with minimal total Euclidean distance to the sample points on the manifold. An alternative choice for computing the 3D line from candidates is to estimate the mean on the manifold using the Karcher mean [54, 55]. The Karcher mean is the generalized average of a set of discrete points in a space other than the vector space $\mathbb{R}^n$ and is defined as a local minimizer of the cost function:

$$\frac{1}{M}\sum_{i=1}^{M} d^2_{\mathrm{geodesic}}(L_i, L), \qquad (3.18)$$

which is a least squares problem with a Riemannian metric. Compared to the barycentric mean with the quadric constraint, the Karcher mean works directly on the manifold and does not require any additional constraints. Due to this property, we use the Karcher mean to compute the 3D line as the average of a set of candidate 3D lines. The metric between two 3D lines on the line manifold, however, is not well-defined. In order to compute the Karcher mean of a set of 3D lines, we resort to the aforementioned orthonormal 3D line representation $SO(3)\times SO(2)$, which lets us compute the geodesic on a Riemannian manifold. Hence, we proceed by computing the Karcher mean as averaging in the groups SO(3) and SO(2).

3.4.1 Effect of scaling on 3D line representation

Let a line L be represented in Plücker coordinates as in (2.1). To compute the average, we have to convert the Plücker coordinates to the orthonormal representation. Let us consider the sign scaling of the line L, i.e., $L \rightarrow -L$. Although both $L$ and $-L$ represent the same 3D line, their orthonormal representations are not the same, which makes the metric between the two orthonormal representations non-zero. For example, the metric between $L$

and $-L$ has to be zero. However, the metric between their orthonormal representations does not vanish. Therefore, the sign scaling has to be eliminated prior to the mean computation. To analyze the effect of scaling, let L be scaled by a positive value, i.e., $aL$ where $a > 0$. The orthonormal representation of $aL$ given in (2.5) is identical to that of L. The invariance to positive scaling can easily be demonstrated by first considering the orthonormal representation of the line $L = [\,L_h^\top\ L_o^\top\,]^\top \leftrightarrow (S, T)$, see (2.5), scaled by a positive real constant a, i.e., $aL \leftrightarrow (S_+, T_+)$. The orthonormal representation of $aL$ is then:

$$S_+ = \begin{bmatrix} \dfrac{aL_h}{\|aL_h\|} & \dfrac{aL_o}{\|aL_o\|} & \dfrac{a^2[L_h]_\times L_o}{\|a^2[L_h]_\times L_o\|} \end{bmatrix} \qquad (3.19)$$

$$\phantom{S_+} = \begin{bmatrix} \dfrac{L_h}{\|L_h\|} & \dfrac{L_o}{\|L_o\|} & \dfrac{[L_h]_\times L_o}{\|[L_h]_\times L_o\|} \end{bmatrix} \qquad (3.20)$$

$$\phantom{S_+} = S, \qquad (3.21)$$

$$T_+ = \frac{1}{\|aL\|}\begin{bmatrix} \|aL_h\| & -\|aL_o\| \\ \|aL_o\| & \|aL_h\| \end{bmatrix} \qquad (3.22)$$

$$\phantom{T_+} = \frac{1}{\|L\|}\begin{bmatrix} \|L_h\| & -\|L_o\| \\ \|L_o\| & \|L_h\| \end{bmatrix} \qquad (3.23)$$

$$\phantom{T_+} = T. \qquad (3.24)$$

The orthonormal representation is hence proved to be invariant to positive scaling of the Plücker coordinates. In contrast, negative scaling does change the orthonormal representation of the line L. This observation can be demonstrated by considering the negative scaling of the line L, i.e., $bL \leftrightarrow (S_-, T_-)$ where $b < 0$. The

orthonormal representation of $bL$ is then:

$$S_- = \begin{bmatrix} \dfrac{bL_h}{\|bL_h\|} & \dfrac{bL_o}{\|bL_o\|} & \dfrac{b^2[L_h]_\times L_o}{\|b^2[L_h]_\times L_o\|} \end{bmatrix} \qquad (3.25)$$

$$\phantom{S_-} = \begin{bmatrix} -\dfrac{L_h}{\|L_h\|} & -\dfrac{L_o}{\|L_o\|} & \dfrac{[L_h]_\times L_o}{\|[L_h]_\times L_o\|} \end{bmatrix} \qquad (3.26)$$

$$\phantom{S_-} \ne S, \qquad (3.27)$$

$$T_- = \frac{1}{\|bL\|}\begin{bmatrix} \|bL_h\| & -\|bL_o\| \\ \|bL_o\| & \|bL_h\| \end{bmatrix} \qquad (3.28)$$

$$\phantom{T_-} = \frac{1}{\|L\|}\begin{bmatrix} \|L_h\| & -\|L_o\| \\ \|L_o\| & \|L_h\| \end{bmatrix} \qquad (3.29)$$

$$\phantom{T_-} = T. \qquad (3.30)$$

Consequently, negative scaling of a Plücker vector makes the line averaging unstable, unless a suitable pre-processing is applied. In order to eliminate this problem, we employ the fact that a negative scaling of the Plücker coordinates flips the direction-vector $L_h$ in (2.1) to the opposite direction. A simple method to fix this problem is to scale the direction-vectors of all 3D lines into the same predefined half-space of $\mathbb{R}^3$. In order to facilitate this, let n be the normal vector of a plane containing the origin and separating $\mathbb{R}^3$ into two sub-spaces. The direction-vectors of the lines in the set $\{L_i\}$ can be sign-scaled to the same half-space defined by n. This is achieved by scaling $L_i$ with $\phi(n^\top L_h^i)$:

$$L_i \leftarrow \phi\!\left(n^\top L_h^i\right) L_i, \qquad (3.31)$$

where $L_h^i$ is the direction-vector of the line $L_i$ and $\phi$ is the function:

$$\phi(x) = \begin{cases} 1 & \text{when } x \ge 0 \\ -1 & \text{otherwise.} \end{cases} \qquad (3.32)$$

To find the normal vector n, let us define a set of line direction vectors:

$$\mathcal{L} = \{L_h^i, -L_h^i\}_{i=1,\ldots,M}. \qquad (3.33)$$

Thus, an intuitive choice for the normal vector n is the eigenvector corresponding to the largest eigenvalue of the scatter matrix of $\mathcal{L}$. Geometrically, this eigenvector is the normal vector of the plane passing through the origin and maximizing the sum of distances from the plane to the set of line directions $\mathcal{L}$.

3.4.2 Generating 3D line samples

As a reminder, the basic concept of the averaging based 3D line estimation is to find the mean of the 3D line samples. Each sample indicates a possible location of the correctly estimated line in space. Let there be N images of the estimated line. In the proposed method, a line in the sample set is constructed by the intersection of k back-projection planes from a subset of the N images ($k \le N$). To compute the intersection of the back-projection planes, Hartley and Zisserman's line estimation method [1, page 323] is adopted due to its simplicity. This method hypothesizes that the back-projections of a line match intersect at the 3D line estimate. Since a 3D line can be represented by two planes that intersect at the line, the goal is to determine those two planes from the set of back-projection planes:

$$A = [\,\Pi_1, \ldots, \Pi_i, \ldots, \Pi_k\,], \qquad (3.34)$$

where $\Pi_i$ is a 4-vector representing a back-projection plane. Those two planes can be estimated by using two basis vectors spanning the best rank-2 approximation of A, which can be computed by the SVD technique. Let the SVD of A be $A = UDV^\top$. Those two planes are then the columns of U corresponding to the two largest singular values. Let those two planes be denoted by $\Pi'$ and $\Pi''$. The estimated line can be computed by the intersection of those two planes using (2.15).
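A sketch of this sampling step is given below, assuming numpy; the plane convention used is $[\,n^\top\ d\,]^\top$ with $n^\top X + d = 0$, and the returned Plücker vector uses the moment $L_o = X \times L_h$, so its overall sign may differ from the convention of (2.15), which the sign normalization of Section 3.4.1 absorbs:

```python
import numpy as np
from itertools import combinations

def line_sample_from_planes(planes):
    """One 3D line sample from k >= 2 back-projection planes (4-vectors), via the
    rank-2 approximation of the stacked planes [1, page 323]."""
    A = np.column_stack(planes)              # 4 x k, planes as columns, eq. (3.34)
    U = np.linalg.svd(A)[0]
    P1, P2 = U[:, 0], U[:, 1]                # the two planes spanning the rank-2 fit
    n1, d1 = P1[:3], P1[3]
    n2, d2 = P2[:3], P2[3]
    L_h = np.cross(n1, n2)                   # direction of the plane intersection
    # A point X on the line solves n1.X = -d1 and n2.X = -d2 (min-norm solution).
    X = np.linalg.lstsq(np.vstack([n1, n2]), np.array([-d1, -d2]), rcond=None)[0]
    L_o = np.cross(X, L_h)                   # moment of the line
    return np.hstack([L_h, L_o])

def generate_line_samples(planes, k):
    """All C(N, k) intersection-line samples from subsets of k back-projection planes."""
    return [line_sample_from_planes([planes[i] for i in idx])
            for idx in combinations(range(len(planes)), k)]
```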

3D line samples are then generated by estimating a line using k images, where $k \le N$. That is, all combinations of k images are used to generate line samples. Therefore, the set of 3D line samples is:

$$\Gamma = \bigcup_{k \le N} \Gamma_k, \qquad (3.35)$$

where $\Gamma_k$ is the set of lines estimated from all k-combinations of the N views. Therefore, the cardinality of $\Gamma_k$ is $C^N_k$. Since, in practice, the number of images N can be large, generating line samples starting from a small number of views (small k) can lead to huge memory requirements and slow computation; that is, the cardinality of $\Gamma$, $n(\Gamma)$, becomes large. Moreover, using a small number of views does not provide enough constraints on the location of the 3D lines. For example, suppose we start to generate line samples from 2 views. The cardinality of $\Gamma$ can then be calculated by:

$$n(\Gamma) = \sum_{k=2}^{N} n(\Gamma_k) \qquad (3.36)$$

$$\phantom{n(\Gamma)} = \sum_{k=2}^{N} C^N_k \qquad (3.37)$$

$$\phantom{n(\Gamma)} = 2^N - N - 1. \qquad (3.38)$$

This suggests that the line samples should be generated starting from a sufficiently large k ($k \le N$).

3.4.3 3D line averaging by averaging in special orthogonal groups

Given a set of 3D lines $L_i \leftrightarrow (S_i, T_i)$, where $i = 1, 2, \ldots, M$, the average of the set of 3D lines $\hat{L} \leftrightarrow (\hat{S}, \hat{T})$ can be computed by estimating the mean $\hat{S}$ of $\{S_i\}$ on SO(3) and $\hat{T}$ of $\{T_i\}$ on SO(2) using numerous methods [56, 57, 58]. In this work, we adopt the non-linear mean shift over analytical manifolds proposed by Subbarao and Meer [21], because the mean shift computation is based on a weighted average in a local area (kernel

profile). Moreover, the mean shift algorithm is easy to compute and robust. Following the model presented in [21], I will briefly introduce the non-linear mean shift over SO(n). The non-linear mean shift was proposed for computing the mean of a set of points on an analytical manifold on which a metric is defined. The weighted sum of points on the manifold is not well defined because the weighted sum may not be a point on the manifold; thus, the mean shift vector may not be valid. The basic concept of the non-linear mean shift is to map points on the manifold to the tangent space, which is locally well defined at a point on the manifold and has a Euclidean structure. The mapped points on the tangent space are used to compute the mean shift vector, which is then mapped back to the manifold. For the manifold of a rotation group, the mapping from the manifold to its associated tangent space so(n) is the logarithm operator. In particular, the tangent space so(3) of the 3D rotation group SO(3) is the set of $3\times3$ skew-symmetric matrices:

$$\log(R) = \Omega = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}, \qquad (3.39)$$

where $R \in SO(3)$. The logarithm mapping of R can be computed by:

$$\log(R) = \frac{\theta}{2\sin(\theta)}\left(R - R^\top\right), \qquad (3.40)$$

where $\operatorname{trace}(R) = 1 + 2\cos(\theta)$. Note that this method fails if $\theta = \pi$. The inverse of the logarithm operator, which maps a point from so(3) back to SO(3), is the exponential mapping:

$$\exp(\Omega) = I_3 + \frac{\sin\|\omega\|}{\|\omega\|}\,\Omega + \frac{1 - \cos\|\omega\|}{\|\omega\|^2}\,\Omega^2, \qquad (3.41)$$

which is also known as Rodrigues' rotation formula, where $\omega = [\omega_x\ \omega_y\ \omega_z]^\top$ is the vector in so(3) associated with $\Omega$. For the 2D rotation group SO(2), the computation of the logarithm and exponential mappings is straightforward. Let R be a 2D rotation matrix with rotation angle

α. The logarithm and exponential mappings for the 2D rotation group SO(2) are then:

$$\log\!\left(\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}\right) = \begin{bmatrix} 0 & -\alpha \\ \alpha & 0 \end{bmatrix}, \qquad (3.42)$$

$$\exp\!\left(\begin{bmatrix} 0 & -\alpha \\ \alpha & 0 \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}. \qquad (3.43)$$

The logarithm and exponential mappings for the rotation groups SO(3) and SO(2) presented above are used in the non-linear mean shift computation. To estimate the starting point for the non-linear mean shift averaging, the linear rotation averaging scheme proposed by Gramkow [57] is adopted. Let $S_1, S_2 \in SO(3)$. By the closure axiom, $S_1^\top S_2$ is also an element of SO(3), with eigenvalues $(1, e^{i\theta}, e^{-i\theta})$, where the angle θ is referred to as the angular distance between $S_1$ and $S_2$ and is the geodesic distance between two elements of SO(3):

$$d_{\mathrm{geodesic}}(S_1, S_2) = \theta(S_1^\top S_2). \qquad (3.44)$$

Since the eigenvalues of $S_1^\top S_2$ are 1, $e^{i\theta}$ and $e^{-i\theta}$, the trace of $S_1^\top S_2$ is:

$$\operatorname{trace}(S_1^\top S_2) = 1 + e^{i\theta} + e^{-i\theta} \qquad (3.45)$$

$$\phantom{\operatorname{trace}(S_1^\top S_2)} = 1 + 2\cos(\theta). \qquad (3.46)$$

By using the second order Taylor expansion of the cosine without the remainder term, one has

$$\cos(\theta) \approx 1 - \frac{\theta^2}{2}, \qquad (3.47)$$

and after some algebraic manipulation, we have:

$$\theta^2(S_1^\top S_2) \approx 3 - \operatorname{trace}(S_1^\top S_2). \qquad (3.48)$$

Using (3.48) and extending this to all $S_i$, the Karcher mean $\hat{S}$ can be computed by:

$$\hat{S} = \operatorname*{argmin}_{S\in SO(3)} \sum_{i=1}^{M} \theta^2(S^\top S_i) \qquad (3.49)$$

$$\phantom{\hat{S}} = \operatorname*{argmax}_{S\in SO(3)} \operatorname{trace}\!\left(S^\top \sum_{i=1}^{M} S_i\right). \qquad (3.50)$$

A compact solution for the mean is given using the singular value decomposition of $\sum_{i=1}^{M} S_i$:

$$\sum_{i=1}^{M} S_i = U_S D_S V_S^\top; \qquad (3.51)$$

the mean of $\{S_i\}$ is then:

$$\hat{S} = U_S\, \operatorname{diag}\!\left(1, 1, \det(U_S)\det(V_S)\right) V_S^\top. \qquad (3.52)$$

Following the same steps, the Karcher mean of $\{T_i\}$ on SO(2) can be estimated by:

$$\hat{T} = U_T\, \operatorname{diag}\!\left(1, \det(U_T)\det(V_T)\right) V_T^\top, \qquad (3.53)$$

where $U_T D_T V_T^\top$ is the singular value decomposition of $\sum_{i=1}^{M} T_i$. Using both (3.52) and (3.53), the initial solution for the line averaging is then given by:

$$\hat{L} \leftrightarrow (\hat{S}, \hat{T}) \in SO(3)\times SO(2). \qquad (3.54)$$

The important formulas and the estimation of the initial solution for the non-linear mean shift for 3D line averaging have now been discussed. Following the non-linear mean shift over analytical manifolds presented in [21], as discussed in Chapter 2, $\{S_i\} \subset SO(3)$ and $\{T_i\} \subset SO(2)$ are mapped to the tangent spaces so(3) and so(2), respectively, by the logarithm mapping. The mean shift vector is computed on the tangent space using the normal kernel profile $k_N(s) = e^{-\frac{1}{2}s}$ and then mapped back to the manifolds SO(3) and SO(2) by the exponential mapping. Algorithm 3 summarizes the steps for estimating $\hat{L}$.

Data: A set of lines $\{L_i\}$ where $i = 1, \ldots, M$.
Result: The mean $\hat{L} \leftrightarrow (\hat{S}, \hat{T})$ of the lines $\{L_i\}$.
$S \leftarrow 0_{3\times3}$, $T \leftarrow 0_{2\times2}$;
Define the normal vector n of a plane, e.g., the direction vector of a line in $\{L_i\}$;
for $i \leftarrow 1$ to M do
  $L_h^i \leftarrow$ direction vector of line $L_i$;
  $L_i \leftarrow \phi(n^\top L_h^i)\, L_i$;
  Represent line $L_i$ by $(S_i, T_i)$, see (2.7) and (2.9);
  $S \leftarrow S + S_i$, $T \leftarrow T + T_i$;
end
Compute the SVDs $S = UDV^\top$ and $T = U'D'V'^\top$;
$\hat{S} \leftarrow U\,\operatorname{diag}(1, 1, \det(U)\det(V))\,V^\top$;
$\hat{T} \leftarrow U'\,\operatorname{diag}(1, \det(U')\det(V'))\,V'^\top$;
Choose the number of iterations MaxIter;
Choose the kernel bandwidth h;
for iter $\leftarrow 1$ to MaxIter do
  $m_h(S) \leftarrow \dfrac{\sum_{i=1}^{M} \log(\hat{S}^\top S_i)\, g_N\!\left(\|\log(\hat{S}^\top S_i)\|^2 / h^2\right)}{\sum_{i=1}^{M} g_N\!\left(\|\log(\hat{S}^\top S_i)\|^2 / h^2\right)}$;
  $m_h(T) \leftarrow \dfrac{\sum_{i=1}^{M} \log(\hat{T}^\top T_i)\, g_N\!\left(\|\log(\hat{T}^\top T_i)\|^2 / h^2\right)}{\sum_{i=1}^{M} g_N\!\left(\|\log(\hat{T}^\top T_i)\|^2 / h^2\right)}$;
  $\hat{S} \leftarrow \hat{S}\exp(m_h(S))$;
  $\hat{T} \leftarrow \hat{T}\exp(m_h(T))$;
end
$\hat{L} \leftrightarrow (\hat{S}, \hat{T})$;
Algorithm 3: Algorithm for 3D line averaging. The function $g_N$ is defined as $g_N(s) = -k_N'(s)$, where $k_N$ is the normal kernel profile.
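The group-specific primitives used in Algorithm 3 — the logarithm and exponential maps of (3.39)–(3.43) and the SVD-based initial means of (3.52)–(3.53) — can be sketched as follows (a sketch only, assuming numpy; the mean-shift loop itself would add the kernel weighting of Algorithm 3 on top of these functions):

```python
import numpy as np

def log_so3(R):
    """Logarithm map SO(3) -> so(3), eq. (3.40); assumes the rotation angle is not pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

def exp_so3(Omega):
    """Exponential map so(3) -> SO(3) (Rodrigues' formula), eq. (3.41)."""
    w = np.array([Omega[2, 1], Omega[0, 2], Omega[1, 0]])
    t = np.linalg.norm(w)
    if np.isclose(t, 0.0):
        return np.eye(3)
    return np.eye(3) + np.sin(t) / t * Omega + (1.0 - np.cos(t)) / t**2 * Omega @ Omega

def initial_mean_so3(S_list):
    """Chordal initialization of the Karcher mean on SO(3), eqs. (3.51)-(3.52)."""
    U, _, Vt = np.linalg.svd(sum(S_list))
    return U @ np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(Vt)]) @ Vt

def initial_mean_so2(T_list):
    """The analogous projection onto SO(2), eq. (3.53)."""
    U, _, Vt = np.linalg.svd(sum(T_list))
    return U @ np.diag([1.0, np.linalg.det(U) * np.linalg.det(Vt)]) @ Vt
```

With these in place, one mean-shift iteration maps each $S_i$ into the tangent space at the current estimate via log_so3, forms the kernel-weighted average there, and maps the result back with exp_so3, exactly as in the loop of Algorithm 3.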

Chapter 4: Line based Bundle Adjustment

4.1 Literature review

Bundle adjustment is a powerful technique for recovering 3D scene geometry and camera motions given images of the scene from different viewpoints. It was first developed in the photogrammetry community during the 1950s and became a standard technique in other fields, including computer vision and robotics. Considering the bundle of all rays between each 3D point and the optical centers viewing that point, the 3D points and the camera motions are simultaneously adjusted by minimizing a cost function. Since the bundle of all rays is adjusted, the procedure is referred to as bundle adjustment. The cost function used in bundle adjustment is usually non-linear. Hence, an optimal solution is solved for iteratively by numerical computation techniques, e.g., the Levenberg-Marquardt [59] or Dog Leg algorithm [60]. Since the iterative computation requires a starting estimate, this initial solution can be obtained from linear approaches, or by first solving for the motion linearly and then estimating the geometric entities, which are then used as the initial solution. Therefore, bundle adjustment is usually used as the last step of 3D scene geometry and camera motion recovery, refining an initial solution. The optimal solution of the bundle adjustment process is a local minimizer of the cost function; hence, the final solution depends on the initial solution.

Because of its versatility, a wide range of data can be used in bundle adjustment, such as points, lines, planes and space curves. The most basic and most commonly used features in bundle adjustment are point features [3, 12, 61, 4, 62, 63, 64, 65]. In photogrammetry, the collinearity equation for point features is usually exploited and has been extensively used by many researchers for more than two decades. Assuming a central projection camera, the collinearity equation describes the geometric relationship between the coordinates of a point in the object space and the coordinates of its projection on the image plane. That is, a 3D point and its image lie on a straight line, called the projection ray, passing through the camera's optical center. In computer vision, this concept is reformulated in the language of projective geometry. As a result, the projection of a 3D point to the image plane becomes a linear function in homogeneous coordinates. In order to estimate camera motion and scene geometry using point features, a set of point matches is required [66, 67, 11]. In terms of the cost function, researchers collectively adopt cost functions that minimize the symmetric transfer error [68], the re-projection error [66] and the Sampson error [69], which are computed in the 2D image space. However, cost functions can also be defined in object space [70]. In this dissertation, in contrast to points, I use the alternative but less exploited line features. An advantage of using line features over points is that line features are less prone to occlusion. In addition, since line features show discontinuities of the grey value function in one direction while point features do not, line features are easier to extract with sub-pixel accuracy [71]. Moreover, line matching is conjectured to be easier than point matching in many cases [72]. An image line can be represented by either a straight edge or a raw edge. As a straight edge, an image line can be represented by two points on the line, or by a point and the line direction.

Alternatively, it can also be represented by a line equation, e.g., the Hessian form, which can be obtained by fitting a line to the raw edge or by using the Hough transform [73]. For the raw edge representation, an image line feature is represented as a sequence of points tracing the edge. An advantage of the straight edge representation over the raw edge is the memory usage, because the raw edge representation has to store all of the points on the edge. However, representing an image feature by a straight edge may not be applicable in some cases. For example, an image obtained from a linear array scanner consists of strip images in digital form. Representing an image line feature as a straight edge is then not suitable, because points on the image line lie on different strip images having different viewpoints. To the best of my knowledge, a 3D line is always represented as an infinite straight line or by two points on the line. By representing an image line by a raw edge, a cost function for bundle adjustment is defined as a geometric relation between a point on the raw edge and the 3D line, see Figure 4.1. One approach using raw edges for line based bundle adjustment is the collinearity approach. Schenk [74] extended the collinearity equation to raw edges. A 3D line is represented by minimal parameters, i.e., 4 independent parameters, and any point on the 3D line can be parameterized by one additional parameter. That is, any point on a 3D line can be represented by 5 parameters. The collinearity equation between a point on the raw edge and an unknown point on the 3D line can then be formed. A similar method for uncalibrated cameras was proposed by Bartoli [75], where 3D lines are represented by two points. The obtained solution is then the projective equivalent one. A drawback of the collinearity approach is the increased number of parameters, because not only the camera motion and 3D line parameters but also the parameters used to specify each point on the 3D lines need to be solved for.

Figure 4.1: An image line feature can be represented by a raw edge, which is a sequence of points. A cost function for bundle adjustment is then defined as a geometric relation between an image point and its corresponding 3D line, e.g., the collinearity equation.

In contrast to the collinearity approach, the co-planarity approach uses the coplanarity between points on the raw edge and object space features. Mulawa and Mikhail [76], Karjalainen et al. [77] and Zielinski [78] represent a 3D line by its direction and a point on the line. A co-planarity constraint based cost function is then defined by using the fact that a point on the projection of the object line is co-planar with the plane defined by the optical center, the 3D line direction and a point on the 3D line. Habib et al. [79] represent a 3D line by two points. The mathematical model for estimation is formulated based on the co-planarity between an intermediate point along the image line (raw edge) and the plane formed by the perspective center and the end-points of the corresponding 3D line. Since a 3D line is represented by its two end-points, the estimated 3D line cannot be unique. This ambiguity can be fixed by enforcing the projection of the line end-points to be close to

the end-points of its corresponding image line, e.g., the longest image line segment. A similar idea was applied to the determination of camera motion and 3D scene geometry from airborne hyperspectral imagery in [80], where the ambiguity in the determination of the 3D line endpoints is eliminated by fixing one of the coordinates (x or y) to its initial value. In order to reduce the amount of memory used for storing image line features, an image line can be analytically represented by a line equation or by two points on the image line. Habib et al. [81] represent an image line by its line equation; hence the normal vector of the back-projection plane can be obtained directly from the line equation. The cost function is formulated based on the coplanarity between a point on the 3D line and the back-projection plane. Bartoli and Sturm [15] represent an image line segment by its two endpoints and a 3D line by its Plücker coordinates. With the Plücker coordinate representation, a 3D line can be directly projected to the image plane by the line projection matrix. The cost function is defined as the orthogonal distances between the image line endpoints and the projected lines. Similarly, Asai et al. [82] used the orthogonal distance between image line endpoints and the re-projected 3D line estimate to formulate a bundle adjustment model for road reconstruction. Alternatively, the integrated squared distance between the re-projected line and its corresponding image line segment is used as the cost function for bundle adjustment in [83]. In contrast to the aforementioned methods, which simultaneously optimize the motion and structure parameters, Tang et al. [84] use the orthogonal distances computed from the projections of points on a 3D line and the analytical image lines as the cost function, and split the optimization process into distinct steps. Strictly speaking, the camera motion and 3D structure parameters are optimized independently using a non-linear optimizer such as the Levenberg-Marquardt algorithm.

4.2 Overview of this chapter

In this chapter, I introduce a new geometric error in object space for bundle adjustment based on line features, along with the derivation of the cost function. The proposed object space cost function was inspired by the work of Taylor and Kriegman [83], in which the geometric error is defined by the integrated squared distance between the image line segment and the re-projected line. The proposed geometric cost function is derived based on the equivalence between the 2D projective space $P^2$, i.e., the image plane, and the Gaussian sphere $S^2$ centered at the optical center of the camera. To perform bundle adjustment, the camera matrix and the 3D line need to be parameterized. In classical optimization techniques, the parameter space is modeled as a Euclidean space [85, 86]. In contrast to the classical optimization techniques, a modern approach does not define the parameter space as a Euclidean space or vector space but as a non-linear manifold, as frequently occurs in constrained problems [87, 88, 89]. By using this concept, the geometric structure of the parameter space can be exploited and a numerical optimization technique can provide a solution satisfying the constraints at every iteration. Strictly speaking, constrained problems can be treated as unconstrained problems. Because of this favorable property, the manifold approach is adopted. In this chapter, the parameterization of the camera motion and the 3D line structure by non-linear manifolds will be discussed. The rest of this chapter is organized as follows. The derivation of the geometric error for bundle adjustment is first presented. The parameterization of the camera motion and 3D line structure for bundle adjustment by using non-linear manifolds is then discussed.

4.3 Geometric error in object space

The recovery of the 3D scene and camera pose is an inverse problem dealing with converting observed measurements into system parameters. The basic approach to solving the inverse problem is to find the model parameters minimizing a cost function. The cost function is an operator that either explicitly or implicitly describes the relationship between the observed data and the model parameters. In this work, the observed data are the image line segments and the model parameters are the camera motions and 3D lines. The proposed cost function is formulated as a geometric error in object space, which is derived based on the equivalence between the image plane and the Gaussian sphere centered at the optical center of the camera. With this equivalence, a line segment on the image plane is equivalent to a circular arc on the Gaussian sphere (see Chapter 2). Geometrically, if the camera motion and 3D line structures are correct, the circular arc must be coplanar with the projection plane of the 3D line corresponding to the line segment. For example, let the image of a 3D line L be the image line segment with the two endpoints $x_1$ and $x_2$. These two endpoints are represented by their homogeneous coordinates and are equivalent to the points $V_1$ and $V_2$, respectively, see Figure 4.2. With correct camera motions and estimated 3D lines, the sector $OV_1V_2$ should be coplanar with the projection plane of the line L. It is hence hypothesized that the correct camera motion and 3D line estimates minimize the integrated squared distance from the points on the boundary of the sectors to the projection planes. To derive the proposed geometric error, let us consider an image line segment defined by the two endpoints $x_1$ and $x_2$. Given calibrated cameras, the derivation of the object space cost function starts from mapping these endpoints $x_1$ and $x_2$ to the unit Gaussian sphere in object space using (2.25).

Figure 4.2: The proposed object space error is defined as the integrated squared distance between the projection plane and the points on the boundary of the sector formed by O, $V_1$ and $V_2$. Points $V_1$ and $V_2$ are obtained by mapping the endpoints $x_1$ and $x_2$ of an image line segment onto the Gaussian sphere's surface, where the line segment is the image of the 3D line L. The projection plane of the 3D line L has normal vector M in the camera coordinate system.

On the sphere, the endpoints become

$$V_1 = \frac{K^{-1}x_1}{\|K^{-1}x_1\|}, \qquad (4.1)$$

$$V_2 = \frac{K^{-1}x_2}{\|K^{-1}x_2\|}. \qquad (4.2)$$

These two points form a circular sector $OV_1V_2$, see Figure 4.2. In other words, the two endpoints of a line segment define a sector. With accurate camera and 3D line parameter estimates, it is hypothesized that the integrated squared (Euclidean) distance between the projection planes of the 3D line estimates and the points on the boundary of the sectors defined by the two endpoints of their corresponding image line segments is minimized. The boundary C of the sector $OV_1V_2$ is a piecewise smooth curve consisting of three curves: $C_1$, the straight line between the optical center and $V_1$; $C_2$, the circular arc between $V_1$ and $V_2$; and $C_3$, the straight line between the optical center and $V_2$. That is, $C = C_1 \cup C_2 \cup C_3$. A point V on the straight lines $C_1$ or $C_3$ can easily be parameterized by the distance r from the optical center O:

$$C_1 := \{V \mid V = rV_1,\ 0 \le r \le 1\}, \qquad (4.3)$$

$$C_3 := \{V \mid V = rV_2,\ 0 \le r \le 1\}. \qquad (4.4)$$

Points V on the circular arc between $V_1$ and $V_2$ can be generated by revolving the point $V_1$ about the normal vector N of the great circle, which is defined by:

$$N = \frac{[V_1]_\times V_2}{\|[V_1]_\times V_2\|}. \qquad (4.5)$$

By using Rodrigues' formula, we can then generate every point on the great circle by:

$$V(\theta) = V_1\cos(\theta) + ([N]_\times V_1)\sin(\theta) + N(N^\top V_1)(1 - \cos(\theta)), \qquad (4.6)$$

where θ is the rotation angle with $\theta \in [0, \cos^{-1}(V_1^\top V_2)]$, because we want to parameterize a point between $V_1$ and $V_2$ on the great circle. That is, a point on the circular arc

between $V_1$ and $V_2$ is parameterized by the rotation angle about the normal vector N in the right hand convention. Since the point $V_1$ is orthogonal to the plane normal vector N, the right-most term on the right side of (4.6) vanishes:

$$V(\theta) = V_1\cos(\theta) + ([N]_\times V_1)\sin(\theta) \in C_2. \qquad (4.7)$$

Given a plane $\Pi = [\,M^\top\ 0\,]^\top$ in the camera coordinate system, which is the projection plane containing the optical center and the 3D line corresponding to the line segment with endpoints $x_1$ and $x_2$, the shortest distance from a point V on the curve C to the plane $\Pi$ is simply:

$$d(\Pi, V) = \frac{V^\top M}{\|M\|}, \qquad d^2(\Pi, V) = \frac{M^\top V V^\top M}{\|M\|^2}. \qquad (4.8)$$

Hence the object space error is formulated as the integral of the squared distance of the points on the curve C to the projection plane with normal vector M:

$$\text{object space error} = g(M) = \frac{1}{\|M\|^2}\int_C M^\top V V^\top M\, ds \qquad (4.9)$$

$$\phantom{\text{object space error}} = \frac{1}{\|M\|^2}\, M^\top\!\left(\int_C V V^\top ds\right)\! M, \qquad (4.10)$$

where ds is an elementary arc length. Since the curve C is piecewise smooth, the integral (4.10) can be broken into the sum of three line integrals:

$$\int_C V V^\top ds = \int_{C_1} V V^\top dr + \int_{C_2} V V^\top d\theta + \int_{C_3} V V^\top dr. \qquad (4.11)$$

From (4.3) and (4.4), the line integrals on the curves $C_1$ and $C_3$ are simply:

$$\int_{C_1} V V^\top dr = \int_0^1 r^2\, V_1 V_1^\top dr = \frac{V_1 V_1^\top}{3}, \qquad (4.12)$$

$$\int_{C_3} V V^\top dr = \int_0^1 r^2\, V_2 V_2^\top dr = \frac{V_2 V_2^\top}{3}. \qquad (4.13)$$

With the point parameterization of the circular arc $C_2$ in (4.7), the line integral on $C_2$ can be computed by:

$$\begin{aligned}
\int_{C_2} V V^\top d\theta &= \int_0^{\bar\theta} \left(V_1\cos(\theta) + [N]_\times V_1\sin(\theta)\right)\left(V_1^\top\cos(\theta) - V_1^\top[N]_\times\sin(\theta)\right) d\theta \\
&= V_1 V_1^\top \int_0^{\bar\theta}\cos^2(\theta)\,d\theta - [N]_\times V_1 V_1^\top [N]_\times \int_0^{\bar\theta}\sin^2(\theta)\,d\theta \\
&\quad + \left([N]_\times V_1 V_1^\top - V_1 V_1^\top [N]_\times\right)\int_0^{\bar\theta}\cos(\theta)\sin(\theta)\,d\theta \\
&= V_1 V_1^\top\left(\frac{\bar\theta}{2} + \frac{\sin(2\bar\theta)}{4}\right) - [N]_\times V_1 V_1^\top [N]_\times\left(\frac{\bar\theta}{2} - \frac{\sin(2\bar\theta)}{4}\right) \\
&\quad + \left([N]_\times V_1 V_1^\top - V_1 V_1^\top [N]_\times\right)\frac{\sin^2(\bar\theta)}{2},
\end{aligned} \qquad (4.14)$$

where $\bar\theta = \cos^{-1}(V_1^\top V_2)$ is the opening angle of the arc. By substituting the line integrals (4.12)–(4.14) into (4.10), we obtain the closed form of the object space error of an image line. The normal vector M of the projection plane is obviously a function of the optical center and the 3D line parameters. By transforming the object space coordinate system to a camera coordinate system, the moment vector of the 3D line, see Figure 2.1, expressed in the camera coordinate system is the normal vector M of the projection plane $\Pi$. Let L be the Plücker coordinates of a 3D line in the object space coordinate system, and R and T the rotation matrix and translation vector of the camera. The Plücker coordinates of the line in the camera coordinate system are then (2.20):

$$L' = \begin{bmatrix} L'_h \\ L'_o \end{bmatrix} = \begin{bmatrix} R & 0 \\ -R[T]_\times & R \end{bmatrix} L. \qquad (4.15)$$

The normal vector M of the projection plane is then the moment $L'_o$ of the 3D line $L'$:

$$M = R\left[\, -[T]_\times \ \ I_3 \,\right] L, \qquad (4.16)$$

or alternatively, in terms of the orthonormal representation:

$$M = R\left[\, -[T]_\times \ \ I_3 \,\right]\begin{bmatrix} t_{11}\, s_1 \\ t_{21}\, s_2 \end{bmatrix}, \qquad (4.17)$$

where $L \leftrightarrow (S, T)$, see (2.10).

Figure 4.3: A 3D line under a change of coordinate system from object space to the camera coordinate system. The moment of the 3D line in the camera coordinate system, $L'_o$, becomes the normal of the projection plane.

4.4 Parameterization of camera motion and 3D line

In classical optimization techniques, the parameter space is usually a vector space. Namely, the Euclidean space $\mathbb{R}^n$ is used to model the space of parameters. Classical optimization techniques have long been developed based on the use of Euclidean space. The cost function, or objective function, is defined as a mapping from the Euclidean space to the real line $\mathbb{R}$. To solve an optimization problem using the vector space approach, the first step is to parameterize the estimated entities by a vector in the Euclidean space $\mathbb{R}^n$. If necessary, either linear or non-linear constraints are imposed to guarantee valid estimated entities. For example, a rotation matrix can be parameterized by a quaternion. The unit norm constraint then needs to be taken into the estimation model in order to obtain a quaternion that represents a valid rotation matrix.

A modern approach to solving an optimization problem is to model the parameter space as a non-linear manifold $\mathcal{M}$, especially a Riemannian manifold. The main characteristic of the Riemannian manifold is that the tangent space at each point on the manifold has a Euclidean structure. The concept of this approach is not to use parametric expressions of the estimated entities but expressions for adjusting the estimated entities. In other words, the elements of the parameter manifold are the estimated entities, not the parameters of parameterized entities. For example, a point on the parameter manifold of the rotation matrix estimation problem is a matrix, not the Euler angle or quaternion parameterization of the rotation matrix. Therefore, the solution from an optimization technique is used to directly update the matrix, not the parameters of a parameterized rotation matrix.

Figure 4.4: Local parameterization of the parameter manifold. Each point on the manifold has a Euclidean structure. The current estimate is updated locally such that the solution is still on the manifold and minimizes the cost function g.

The objective function is defined as a smooth real function $g: \mathcal{M} \to \mathbb{R}$. The goal of the optimization is to find an element ξ of the parameter manifold $\mathcal{M}$ minimizing g. Given a

starting point $\xi_0 \in \mathcal{M}$, the (local) minimizer of g can be found by iterative algorithms on the manifold $\mathcal{M}$, which can be visualized as iteratively walking on the manifold, see Figure 4.4. Namely, the sub-optimal solution is iteratively corrected to a locally optimal one. The step size and direction for the iterative computation are computed in the tangent space, which is a Euclidean space, at the current solution on the manifold $\mathcal{M}$, and then projected back to the manifold to adjust the estimated entities. In the rest of this section, parameterizations of the camera motion and 3D line in terms of non-linear manifolds are discussed.

4.4.1 Parameterization of camera motion

The camera motion consists of two entities, namely the translation vector (the position of the optical center) and the rotation matrix. The expression for updating the translation vector is discussed first. Let the translation vector from the last iteration be denoted by $T_0$ and $[T_x\ T_y\ T_z]^\top$ the updating term for the translation vector. The new translation vector T is then:

$$T = T_0 + \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}, \qquad (4.18)$$

because the manifold of the translation vector is linear. Taking the derivative of the translation vector with respect to its updating term, evaluated at the point $[T_x\ T_y\ T_z]^\top = [0\ 0\ 0]^\top$, leads to:

$$\frac{\partial T}{\partial T_x} = \begin{bmatrix}1\\0\\0\end{bmatrix}, \qquad \frac{\partial T}{\partial T_y} = \begin{bmatrix}0\\1\\0\end{bmatrix}, \qquad \frac{\partial T}{\partial T_z} = \begin{bmatrix}0\\0\\1\end{bmatrix}. \qquad (4.19)$$

For the camera's rotation matrix, there are various parameterizations, e.g., the unit quaternion, the SO(3) manifold, Euler angles, the Cayley-Klein parameterization or spinors. A rotation

matrix $R = [r_1\ r_2\ r_3]$ has 9 elements with 6 constraints:

$$\|r_i\| = 1 \ \ \text{where}\ i = 1, 2, 3, \qquad r_i^\top r_j = 0 \ \ \text{where}\ i \ne j. \qquad (4.20)$$

Thus, its minimal representation has 3 degrees of freedom. With the above constraints, the manifold of the rotation matrix is non-linear, and a point on the manifold is a matrix satisfying the above constraints. According to the concept of optimization on a manifold, the parametric expression of a rotation matrix does not need to be known, only the expression for updating the rotation matrix. Let the rotation matrix from the last iteration be denoted by $R_0$. Following the technique presented in [90], the rotation matrix R in the local region of $R_0$ is defined by:

$$R(\omega) = R_0\exp([\omega]_\times), \qquad (4.21)$$

where $\omega = [\omega_x\ \omega_y\ \omega_z]^\top$. Note that the vector ω is used for updating $R_0$, not as the parameters of a parameterized rotation matrix. It can be observed that the above equation is the first order change of the rotation matrix $R_0$. The term $\exp([\omega]_\times)$ is actually a rotation matrix, where $\omega/\|\omega\|$ and $\|\omega\|$ are the axis and angle of rotation of the axis-angle representation of a 3D rotation. The above equation can be interpreted as the update of $R_0$ by $\exp([\omega]_\times)$. In terms of the Riemannian manifold, the vector ω lies in the tangent space at $R_0$, and $\exp([\omega]_\times)$ is the projection from the tangent space back to the manifold SO(3). In other words, the rotation matrix is updated to a distinct point on the manifold, in a different way from parameterizations in a vector space, e.g., the Euler angle or quaternion parameterizations.

The derivative of the rotation matrix can be evaluated at $R_0$ with $\omega = 0$ as follows:

$$\frac{\partial R}{\partial \omega_x}\bigg|_{\omega=0} = R_0\begin{bmatrix} 0&0&0\\ 0&0&-1\\ 0&1&0\end{bmatrix} = \left[\, 0_{3\times1}\ \ \ r_{0,3}\ \ \ -r_{0,2}\,\right], \qquad (4.22\text{--}4.23)$$

$$\frac{\partial R}{\partial \omega_y}\bigg|_{\omega=0} = R_0\begin{bmatrix} 0&0&1\\ 0&0&0\\ -1&0&0\end{bmatrix} = \left[\, -r_{0,3}\ \ \ 0_{3\times1}\ \ \ r_{0,1}\,\right], \qquad (4.24\text{--}4.25)$$

$$\frac{\partial R}{\partial \omega_z}\bigg|_{\omega=0} = R_0\begin{bmatrix} 0&-1&0\\ 1&0&0\\ 0&0&0\end{bmatrix} = \left[\, r_{0,2}\ \ \ -r_{0,1}\ \ \ 0_{3\times1}\,\right], \qquad (4.26\text{--}4.27)$$

where $r_{0,i}$, $i = 1, 2, 3$, is column i of the rotation matrix $R_0$. Although this approach is not well known in computer vision and photogrammetry, where Euler angle and quaternion representations are popular, it is a standard approach in physics and robotics for optimization involving rotation. It can be observed that the updated rotation matrix always satisfies the six constraints of a rotation matrix. Moreover, it is simple and straightforward because the expression of the rotation matrix need not be known.

4.4.2 Parameterization of 3D line

There are various parameterizations for a 3D line in space. The line parameterization chosen for bundle adjustment should not make the line over-parameterized or have a

singularity. A simple representation is two points on the line. However, it makes solving for a solution difficult because the solution is not unique. Similarly, representing a line by a point on the line and the line direction does not have a unique solution. Although this obstruction can be fixed by using the point on the line closest to the origin, an orthogonality constraint then needs to be imposed in the estimation model and the complexity of the problem is increased. Although the Plücker coordinates are a good candidate, because uniqueness of the solution can be obtained through the Plücker and unit norm constraints, their manifold is not Riemannian. In this work, the orthonormal representation proposed in [15] is adopted because the Plücker and unit norm constraints are naturally satisfied. As discussed in Chapter 2, the concept of the orthonormal representation is to embed the Plücker coordinates into a 3D and a 2D rotation matrix. As a result, the Plücker coordinates represented by the orthonormal representation are parameterized by a (minimal) 4 parameters, which are the rotation angles of the 3D and 2D rotations. Moreover, the space of 3D lines can then be modeled by the manifolds of rotation matrices. By representing the line by 3D and 2D rotation matrices, 3D lines can be locally updated in the same way as rotation matrices, which preserves the orthogonality and unit norm constraints according to the properties of a rotation matrix. In order to demonstrate this process, let a line $L_0$ from the last iteration be represented by $(S_0, T_0) \in SO(3)\times SO(2)$. The new line $L \leftrightarrow (S, T)$ is then

S = S_0 exp([v]_×),   where v = [v_x v_y v_z]^T,   (4.29)

T = T_0 exp([0, -θ; θ, 0]) = T_0 [cos θ, -sin θ; sin θ, cos θ].   (4.30)

That is, S and T are locally defined in the neighborhoods of S_0 and T_0. At each iteration of the estimation process, 3D lines can be locally updated by the parameter vector [v_x v_y v_z θ]^T through the above equations. Since the orthonormal representation is based on 3D and 2D rotation matrices, the Jacobian matrix of the Plücker coordinates L = (t_11 s_1^T, t_21 s_2^T)^T can be evaluated with respect to these four parameters by using the derivatives of the 3D and 2D rotation matrices.
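The update (4.29)-(4.30) can be sketched in a few lines. The code below is illustrative Python/NumPy (the function names are introduced here), following the encoding L = (t_11 s_1, t_21 s_2) stated above; the key point is that the updated pair stays in SO(3) × SO(2), so the Plücker and unit-norm constraints hold by construction.

```python
import numpy as np
from scipy.linalg import expm

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def pluecker_from_orthonormal(S, T):
    """Plücker 6-vector (up to scale) encoded by S in SO(3) and T in SO(2),
       following L = (t11*s1, t21*s2) as stated above."""
    return np.hstack([T[0, 0] * S[:, 0], T[1, 0] * S[:, 1]])

def update_line(S0, T0, v, theta):
    """Local update of the orthonormal representation by (v, theta); the
       updated pair remains in SO(3) x SO(2), so the Plücker and unit-norm
       constraints remain satisfied by construction."""
    S = S0 @ expm(skew(v))
    c, s = np.cos(theta), np.sin(theta)
    T = T0 @ np.array([[c, -s], [s, c]])
    return S, T
```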

The Jacobian matrix of S with respect to v, evaluated at v = 0, is

∇_v S = ( ∂S/∂v_x |_{v=0}   ∂S/∂v_y |_{v=0}   ∂S/∂v_z |_{v=0} ),   (4.31)

where

∂S/∂v_x = S_0 [e_1]_× = [0_{3×1}  s_{0,3}  -s_{0,2}],   (4.32)-(4.33)
∂S/∂v_y = S_0 [e_2]_× = [-s_{0,3}  0_{3×1}  s_{0,1}],   (4.34)-(4.35)
∂S/∂v_z = S_0 [e_3]_× = [s_{0,2}  -s_{0,1}  0_{3×1}],   (4.36)-(4.38)

and s_{0,i}, i = 1, 2, 3, is column i of the rotation matrix S_0. The derivative of T with respect to θ at θ = 0 is simply:

dT/dθ = T_0 [0, -1; 1, 0] = [t_{0,12}, -t_{0,11}; t_{0,22}, -t_{0,21}],   (4.39)

where t_{0,ij}, i, j = 1, 2, is the element at row i and column j of the rotation matrix T_0. As a result, the Jacobian matrix of the line L with respect to [v_x v_y v_z θ]^T, evaluated at 0, is:

( ∂L/∂v_x  ∂L/∂v_y  ∂L/∂v_z  ∂L/∂θ ) = [0_{3×1}, -t_{0,11} s_{0,3}, t_{0,11} s_{0,2}, -t_{0,21} s_{0,1}; t_{0,21} s_{0,3}, 0_{3×1}, -t_{0,21} s_{0,1}, t_{0,11} s_{0,2}].   (4.40)
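A compact way to evaluate (4.40) at a current estimate (S_0, T_0) is shown below. This is an illustrative NumPy sketch whose sign pattern follows the reconstructed form of (4.40) above; the function name line_jacobian is introduced here.

```python
import numpy as np

def line_jacobian(S0, T0):
    """6x4 Jacobian of L = (t11*s1, t21*s2) with respect to (vx, vy, vz, theta),
       evaluated at the current estimate (S0, T0), following (4.40)."""
    s1, s2, s3 = S0[:, 0], S0[:, 1], S0[:, 2]
    t11, t21 = T0[0, 0], T0[1, 0]
    zero = np.zeros(3)
    top = np.column_stack([zero, -t11 * s3, t11 * s2, -t21 * s1])     # d(t11*s1)
    bottom = np.column_stack([t21 * s3, zero, -t21 * s1, t11 * s2])   # d(t21*s2)
    return np.vstack([top, bottom])
```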

4.5 Numerical optimization

Given initial estimates of the camera motion, i.e. the translation vector and rotation matrix, and of the structure, i.e. the 3D lines, bundle adjustment is used to refine these entities and obtain optimal ones. Since the proposed geometric error presented in Section 4.3 is a non-linear function of the camera motion and the 3D line, the bundle adjustment can be performed by solving a non-linear least squares problem. Hence, the structure and motion parameters can be iteratively adjusted by numerical optimization techniques. Given N cameras viewing M 3D lines, the bundle adjustment model can be formulated in the least squares sense as:

F = Σ_{i=1}^{N} Σ_{j=1}^{M} w_{i,j} g(l_{i,j}, T_i, R_i, L_j) = r^T r,   (4.41)

where w_{i,j} is one if line j is visible in view i, and r is the residual vector of dimension (r × 1). The optimal 3D lines and camera motions are the minimizers of the function F. The optimal solution is obtained by adjusting the given sub-optimal solution. To iteratively adjust the estimated entities, in particular the camera motions and the 3D lines, we seek the correction parameters for:

optical center: [T_x T_y T_z], see (4.18);
camera rotation: [ω_x ω_y ω_z], see (4.21);
3D line: [v_x v_y v_z θ], see (4.30).

To obtain these correction parameters, let δ be the vector of correction parameters and ξ be a point in the parameter space in the neighborhood of the sub-optimal solution ξ_0, see Figure 4.4. Recall that ξ_0 contains the sub-optimal translations, rotation matrices and 3D lines.
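Before linearizing, the residual vector r of (4.41) has to be assembled from the visible observations only. The sketch below is illustrative (the argument names and the placeholder g are assumptions, not the dissertation's implementation); it simply skips observations with w_ij = 0 and stacks the remaining error terms so that r·r equals F.

```python
import numpy as np

def stack_residuals(g, image_lines, cameras, lines_3d, visibility):
    """Assemble a residual vector r such that r.dot(r) equals F in (4.41).
       'g' is a placeholder for the geometric error of Section 4.3; w_ij is
       one when line j is visible in view i, so invisible observations are
       simply skipped."""
    r = []
    for i, (T_i, R_i) in enumerate(cameras):
        for j, L_j in enumerate(lines_3d):
            if visibility[i][j]:
                r.append(np.sqrt(g(image_lines[i][j], T_i, R_i, L_j)))
    return np.asarray(r)
```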

For small correction parameters δ, the Taylor series expansion of the residual is:

r(ξ) ≈ r(ξ_0) + Jδ,   (4.42)

where J is the Jacobian matrix of the residual vector r with respect to the correction parameters δ:

J_{r×p} = ∂r/∂δ.   (4.43)

The above Taylor series expansion is the basis for solving the non-linear optimization problem, in that the minimization problem (4.41) can be solved through a sequence of approximations of the original problem [62]. Namely, we seek the solution of the following linear least squares problem:

min_δ ||r(ξ_0) + Jδ||_2^2.   (4.44)

ξ_0 is corrected by the solution δ of the above linear least squares problem, which can be obtained by solving the following normal equation:

Nδ = -J^T r(ξ_0),   (4.45)

where N is the Gauss-Newton approximation of the Hessian matrix:

N_{p×p} = J^T J.   (4.46)

The normal matrix N in (4.45) can be rank deficient. One approach for solving rank deficient systems is the Levenberg-Marquardt algorithm. Its basis is to solve the augmented normal equation [91, 62]:

(N + H(λ))δ = -J^T r(ξ_0),   (4.47)

where λ > 0 is called the damping parameter. That is, the normal matrix N is regularized by a symmetric positive definite matrix H(λ), the so-called damping matrix [62], which is often chosen as

H(λ) = λ I_{p×p}   or   H(λ) = λ diag(N).
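One iteration of (4.47) is a single regularized linear solve. The sketch below is an illustrative NumPy implementation of that step (the function name lm_step is introduced here); it supports both damping-matrix choices mentioned above.

```python
import numpy as np

def lm_step(J, r, lam, use_diag=True):
    """Solve the augmented normal equation (4.47):
       (N + H(lambda)) delta = -J^T r, with N = J^T J and
       H(lambda) = lambda*diag(N) (or lambda*I when use_diag is False)."""
    N = J.T @ J
    H = lam * (np.diag(np.diag(N)) if use_diag else np.eye(N.shape[0]))
    return np.linalg.solve(N + H, -J.T @ r)
```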

The damping parameter λ needs to be tuned at each iteration. A basic strategy for tuning λ is to divide it by a constant, often 10, when the error decreases after the correction of ξ_0, i.e. ||r(ξ)|| < ||r(ξ_0)||, in which case the step is accepted. Otherwise, the damping parameter is multiplied by the constant and the step is rejected. A summary of the Levenberg-Marquardt algorithm is presented in Algorithm 4.

Data: Initial estimates of the camera translations T_0, rotations R_0 and 3D lines L_0.
Result: Adjusted camera translations T, rotations R and 3D lines L.
Step 1: Compute the correction parameters from (N + H(λ))δ = -J^T r(ξ_0).
Step 2: Adjust the camera translations, rotations and 3D lines using the obtained correction parameters, see (4.18), (4.21), (4.30).
Step 3: If the new estimation error is greater than the initial estimation error, the adjusted camera translations, rotations and 3D lines are declined and the damping parameter is multiplied by a constant. If the new estimation error is less than the initial estimation error, the adjusted camera translations, rotations and 3D lines are accepted, the damping parameter is divided by the constant, and the initial estimation error is reset to this new estimation error.
Step 4: Repeat until convergence.
Algorithm 4: Levenberg-Marquardt algorithm
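A generic accept/reject loop mirroring Algorithm 4 is sketched below. It is illustrative Python/NumPy; for brevity the state is corrected additively (x + δ), whereas the actual model applies the manifold updates (4.18), (4.21) and (4.30), and the callback names are assumptions.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-3, factor=10.0,
                        max_iter=100, tol=1e-12):
    """Damping loop of Algorithm 4: accept the step and shrink lambda when the
       error decreases, otherwise reject the step and grow lambda."""
    x = np.asarray(x0, dtype=float)
    err = np.sum(residual(x) ** 2)
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        N = J.T @ J
        delta = np.linalg.solve(N + lam * np.diag(np.diag(N)), -J.T @ r)
        x_new = x + delta                 # stands in for the manifold updates
        err_new = np.sum(residual(x_new) ** 2)
        if err_new < err:                 # step accepted: shrink the damping
            x, err, lam = x_new, err_new, lam / factor
        else:                             # step rejected: grow the damping
            lam *= factor
        if np.linalg.norm(delta) < tol:
            break
    return x
```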

In order to solve for the camera motions and 3D lines by bundle adjustment, and in particular to compute the Jacobian matrix, the proposed cost function needs to be differentiated with respect to the correction parameters. Let δ = [T_x T_y T_z ω_x ω_y ω_z v_x v_y v_z θ]^T be the vector of updating parameters. We therefore seek the Jacobian matrix J of the cost function (4.10) with respect to δ at the given camera translation vector T_0, rotation matrix R_0 and 3D line L_0 ↔ (S_0, T_0):

J = dg/dξ = (dg/dM)(dM/dξ).   (4.48)

Starting by differentiating the cost function (4.10) with respect to the normal vector M, we obtain:

dg/dM = (1/||M||^2) d(M^T D M)/dM + (M^T D M) d(1/||M||^2)/dM   (4.49)
      = (D + D^T) M / ||M||^2 - 2 (M^T D M) M / ||M||^4,   (4.50)

where D is the integral in (4.11) and M_0 = R_0 [ [T_0]_× I_3 ] L_0. The derivative of M (4.17) with respect to the updating parameters of the optical center [T_x T_y T_z] is:

∂M/∂T_x = R_0 [ [[1, 0, 0]^T]_×  0_{3×3} ] L_0,   (4.51)
∂M/∂T_y = R_0 [ [[0, 1, 0]^T]_×  0_{3×3} ] L_0,   (4.52)
∂M/∂T_z = R_0 [ [[0, 0, 1]^T]_×  0_{3×3} ] L_0.   (4.53)

Using the formulae in (4.22)-(4.28), the derivative of the normal vector M with respect to the updating terms ω_x, ω_y and ω_z is simply:

∂M/∂ω_x = R_0 [[1, 0, 0]^T]_× [ [T_0]_× I_3 ] L_0,   (4.54)
∂M/∂ω_y = R_0 [[0, 1, 0]^T]_× [ [T_0]_× I_3 ] L_0,   (4.55)
∂M/∂ω_z = R_0 [[0, 0, 1]^T]_× [ [T_0]_× I_3 ] L_0.   (4.56)

The derivative of the normal vector M with respect to the updating terms v_x, v_y, v_z and θ can be computed by using (4.40):

( ∂M/∂v_x  ∂M/∂v_y  ∂M/∂v_z  ∂M/∂θ ) = R_0 [ [T_0]_× I_3 ] [0_{3×1}, -t_{0,11} s_{0,3}, t_{0,11} s_{0,2}, -t_{0,21} s_{0,1}; t_{0,21} s_{0,3}, 0_{3×1}, -t_{0,21} s_{0,1}, t_{0,11} s_{0,2}],   (4.57)

where s_{0,i}, i = 1, 2, 3, is column i of the rotation matrix S_0 and t_{0,ij}, i, j = 1, 2, is the element at row i and column j of the rotation matrix T_0.
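Assembling the ten columns (4.51)-(4.57) into dM/dδ can be done as sketched below. This is an illustrative NumPy sketch written against the reconstructed forms above (in particular the block [ [T_0]_× I_3 ] and the 6x4 line Jacobian of (4.40), which is passed in as JL); the function names are assumptions.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def dM_ddelta(R0, T0, L0, JL):
    """3x10 Jacobian of the projection-plane normal M = R [[T]_x I3] L with
       respect to delta = (Tx, Ty, Tz, wx, wy, wz, vx, vy, vz, theta),
       assembled from (4.51)-(4.57). 'JL' is the 6x4 line Jacobian of (4.40)
       (e.g. from the line_jacobian sketch given earlier)."""
    E = np.eye(3)
    P0 = np.hstack([skew(T0), np.eye(3)])                       # [[T0]_x  I3]
    cols = [R0 @ np.hstack([skew(E[:, k]), np.zeros((3, 3))]) @ L0
            for k in range(3)]                                   # (4.51)-(4.53)
    cols += [R0 @ skew(E[:, k]) @ P0 @ L0 for k in range(3)]     # (4.54)-(4.56)
    dM_line = R0 @ P0 @ JL                                       # (4.57), 3x4
    return np.column_stack(cols + [dM_line[:, k] for k in range(4)])
```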

As a result, we obtain the derivative of the normal vector M with respect to all the updating parameters:

∇_δ M = ( ∂M/∂T_x  ∂M/∂T_y  ∂M/∂T_z  ∂M/∂ω_x  ∂M/∂ω_y  ∂M/∂ω_z  ∂M/∂v_x  ∂M/∂v_y  ∂M/∂v_z  ∂M/∂θ ).   (4.58)
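The remaining factor of the chain rule (4.48), dg/dM, can be checked numerically. The sketch below assumes (from the reconstructed (4.49)-(4.50)) that the cost has the normalized quadratic form g(M) = M^T D M / ||M||^2 and uses an arbitrary symmetric D standing in for the integral of (4.11); it is an illustrative test, not the dissertation's code.

```python
import numpy as np

def cost_g(M, D):
    """Normalized quadratic cost g(M) = M^T D M / ||M||^2, the form
       differentiated in (4.49)-(4.50)."""
    return (M @ D @ M) / (M @ M)

def grad_g(M, D):
    """Analytic gradient (4.50): (D + D^T)M/||M||^2 - 2(M^T D M) M/||M||^4."""
    n2 = M @ M
    return (D + D.T) @ M / n2 - 2.0 * (M @ D @ M) * M / n2 ** 2

# Finite-difference check with an arbitrary symmetric D
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
D, M = A + A.T, rng.standard_normal(3)
eps = 1e-6
numeric = np.array([(cost_g(M + eps * e, D) - cost_g(M - eps * e, D)) / (2 * eps)
                    for e in np.eye(3)])
print(np.max(np.abs(numeric - grad_g(M, D))))   # agreement to ~1e-9
```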

Chapter 5: Experimental results

This chapter is dedicated to demonstrating the performance of the proposed line based 3D scene and camera motion recovery methods, including 3D line estimation and bundle adjustment. Experiments were conducted using both synthetic and real data. The synthetic case consists of a set of forty known 3D lines bounded by endpoints and camera projection matrices designed to simulate a camera motion. The real data is composed of the model house sequence from the Oxford multi-view dataset (vgg/data/data-mview.html) and homemade datasets including images from pinhole and fish eye cameras. The performance of the proposed methods was compared with Hartley and Zisserman's line triangulation method [1, page 323] and the line-based bundle adjustment models of Taylor and Kriegman [83] and Bartoli and Sturm [15].

5.1 Performance evaluation

An approach for evaluating the performance of 3D scene and camera motion recovery methods is to measure the similarity between the image lines and the re-projections of the estimated 3D lines. Since image line features are represented as line segments and estimated lines as infinite lines, a suitable similarity metric between them is the orthogonal distance between the image line endpoints and the re-projections of the estimated lines, see Figure 5.1. Let R and T be the estimated rotation matrix and translation vector of a camera.

The estimated line L can be projected to the image plane by first transforming the line to the camera's coordinate system using (2.20):

L̄ = [L̄_h; L̄_o] = [R, 0_{3×3}; R[T]_×, R] L.   (5.1)

The projection plane of the estimated line then has normal vector L̄_o in the camera coordinate system, see Figure 4.3. By the relation between the image line and the normal of the projection plane (2.28), the image of the estimated line can be computed by:

l ≃ K^{-T} [R[T]_×, R] L = Q L.   (5.2)

The matrix Q is called the line projection matrix. The similarity measure is then the orthogonal distance from the image line segments to the re-projections of the estimated 3D lines, i.e. Q L.

Unlike the real image datasets, the synthetic data are generated from a controlled 3D structure. It is hence possible to evaluate and compare the performance of both the proposed methods and the reference ones by using errors in 3D. Since, in the estimation, the object space coordinate system is attached to the first camera's coordinate system, the estimated 3D structure needs to be transformed to the actual object space coordinate system before computing the 3D error. The appropriate transformation is a Euclidean one because the cameras used in the estimation are calibrated. Such a transformation can be determined by estimating the rotation and translation that minimize the orthogonal distances from the 3D line segment endpoints to the estimated 3D lines. Therefore, the 3D reconstruction error, which is the orthogonal distance from the estimated lines to the endpoints of the 3D line segments, is computed after least squares Euclidean alignment with the ground truth.
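The 2D evaluation metric described above (re-project each estimated line with Q and measure the point-to-line distance at the segment endpoints) can be sketched as follows. This is illustrative NumPy code; the assembly of Q follows the reading of (5.2) given above, i.e. Q = K^{-T}[R[T]_×, R], which is an assumption consistent with the normal M = R[[T]_× I_3]L of Chapter 4, and the function names are introduced here.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def line_projection_matrix(K, R, T):
    """Assumed line projection matrix Q with l ~ Q L for a Plücker 6-vector L:
       Q = K^(-T) R [ [T]_x  I3 ] = K^(-T) [ R[T]_x  R ]."""
    return np.linalg.inv(K).T @ R @ np.hstack([skew(T), np.eye(3)])

def point_to_line_distance(x, l):
    """Orthogonal distance from an image point x = (u, v) to a 2D line l."""
    return abs(l[0] * x[0] + l[1] * x[1] + l[2]) / np.hypot(l[0], l[1])

def reprojection_rms(segments, lines_3d, K, R, T):
    """RMS of the orthogonal distances from line-segment endpoints to the
       re-projections of the estimated 3D lines (the metric of Figure 5.1)."""
    Q = line_projection_matrix(K, R, T)
    d = []
    for (x1, x2), L in zip(segments, lines_3d):
        l = Q @ L
        d += [point_to_line_distance(x1, l), point_to_line_distance(x2, l)]
    return np.sqrt(np.mean(np.square(d)))
```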

Figure 5.1: The similarity measure between the re-projection of the estimated line and its corresponding line segment is the orthogonal distance from the line segment endpoints to the re-projected line.

5.2 Experiments on synthetic data

To test the performance of the proposed methods on a controlled setup, experiments on synthetic data were conducted. Each 3D line was generated by 2 random points in the unit cube [-1, 1] × [-1, 1] × [-1, 1], and each camera had an identical calibration matrix (the calibration matrix of the camera in the Middlebury multi-view stereo dataset, middlebury.edu/mview/data/, was used). Each camera pointed to the center of the scene, which is the origin of the object space. The optical centers of the cameras lie on a circle of radius 4 meters on the plane Z = 3; that is, the camera optical centers were revolved about the Z axis. The 3D line segments were projected to the image planes by projecting their endpoints.
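A setup of this kind can be generated as sketched below. The code is an illustrative NumPy sketch under stated assumptions (a standard look-at construction is used for the camera orientation, which the dissertation does not spell out, and the function names are introduced here).

```python
import numpy as np

def look_at_rotation(center, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Rotation whose rows are the camera axes, with the third row (viewing
       direction) pointing from the optical center toward the target; a common
       look-at construction assumed here."""
    z = target - center
    z = z / np.linalg.norm(z)
    x = np.cross(z, up)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    return np.vstack([x, y, z])

def synthetic_setup(n_lines=40, n_cams=8, radius=4.0, height=3.0,
                    step_deg=45.0, seed=0):
    """Random 3D line segments in the unit cube and cameras revolved about the
       Z axis looking at the origin; step_deg=45 matches the wide-baseline
       layout of Figure 5.2, step_deg=9 the short-baseline one of Figure 5.4."""
    rng = np.random.default_rng(seed)
    segments = rng.uniform(-1.0, 1.0, size=(n_lines, 2, 3))   # two endpoints each
    cameras = []
    for i in range(n_cams):
        a = np.deg2rad(i * step_deg)
        center = np.array([radius * np.cos(a), radius * np.sin(a), height])
        cameras.append((look_at_rotation(center), center))
    return segments, cameras
```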

To analyze the effect of observation noise in the image space on the estimation result, Gaussian random noise with standard deviation σ was added to the endpoints of the line segments on the images, and the true camera matrices were perturbed and used in the experiments. The performance was measured by the Root Mean Squared (RMS) 2D re-projection and 3D reconstruction errors.

The first experiment on synthetic data was the wide baseline case, using 40 lines and 8 cameras where the distance between the optical centers of two consecutive cameras is constant, see Figure 5.2. For the averaging approach to 3D line estimation, the 3D line sample generation starts from using 5 images. The initial 3D line estimates from the dual projective space approach were used as the initial solution for all bundle adjustment models, including the proposed and the reference ones. The estimation process for all methods was repeated 50 times with different random noise, and the averaged RMS of the 2D and 3D errors are reported in Figure 5.3. The plot shows the performance of both the proposed line estimation methods and the bundle adjustment, along with the alternative ones, as functions of the standard deviation of the noise. It can be observed that the errors in line estimation and bundle adjustment increase linearly with the standard deviation of the observation noise. The estimation errors were reduced after bundle adjustment.

The second experiment on synthetic data was the short baseline case. 40 lines and 8 cameras were used. Camera poses were generated by rotating the current camera by 9 degrees about the Z axis, see Figure 5.4, and the estimation process was again repeated over random-noise trials. The RMS errors are reported in Figure 5.5. It can be observed that the reconstruction errors in line estimation and bundle adjustment for the short baseline case are higher than those of the wide baseline case. The increase in reconstruction error is natural when the baseline between cameras is reduced because of insufficient parallax between viewpoints.

Figure 5.2: Camera setup for the wide baseline case. (a) Camera setup parameters (45 degrees between consecutive cameras, 4 meter circle radius, 3 meter height). (b) Cameras in 3D.

Figure 5.3: Experiment results on the wide baseline case. (a) Re-projection error [pixel] versus noise standard deviation [pixel]. (b) Reconstruction error [meter] versus noise standard deviation [pixel]. The compared methods are the dual projective space approach, the averaging approach, Hartley and Zisserman, the proposed bundle adjustment model, Taylor and Kriegman, and Bartoli and Sturm.

Comparing the 3D line estimation methods, it can be observed that the dual projective space based 3D line estimation gives the lowest 2D re-projection error among all 3D line estimation methods in both the wide and short baseline cases.

In contrast, the averaging approach for 3D line estimation gives the best result in terms of 3D reconstruction error.

5.3 Experiments on real data

The proposed methods were also tested on two real datasets, i.e. the model house and book sequences. The performance of the proposed methods is compared with the alternative ones using only the re-projection error because of the lack of scene ground truth.

5.3.1 The model house sequence

In this experiment, a sequence called the model house sequence from Oxford's visual geometry group (vgg/data/data-mview.html) was used. This dataset was generated by taking images of a model house placed on a revolvable table. Equivalently, by considering the model house as stationary, the camera was moved around the model house in order to obtain views of its different sides. With the provided camera matrices, the cameras were assumed to be calibrated; that is, only the calibration matrices were used as initial information. The camera motions were then initialized by the factorization method [2] using the provided point matches. The calibration matrices were then used to stratify the camera motion to Euclidean space. The provided line matches and the initial camera matrix estimates were used to estimate lines. Note that most lines were not visible in all views. At this step, the dual projective space approach to 3D line estimation reported an RMS orthogonal distance between the re-projected line estimates and their corresponding line segments of 0.85 pixel, while the averaging approach reported 0.63 pixel and Hartley and Zisserman's approach [1, page 323] 0.62 pixel.

The estimated lines from the dual projective space approach and the initial camera matrices were then subjected to the bundle adjustment process proposed in this dissertation and to the two alternative models proposed in [83] and [15]. For quantitative justification, the performance of the proposed method was measured using the orthogonal distance between line segment endpoints and projected line estimates, as discussed in Section 5.1. The performance of all the methods is tabulated in Table 5.1. In this table, the RMS, minimum, maximum and standard deviation of the orthogonal distance are reported. The RMS of the orthogonal distance from the proposed method is closest to that of [15], which directly minimizes the orthogonal distance between line segment endpoints and re-projected line estimates. However, our proposed method shows the smallest standard deviation.

For visual justification of the proposed bundle adjustment model, the estimated 3D lines are re-projected onto the image planes using (5.2). Figure 5.6 shows the re-projections of the 3D line estimates on two images of the model house sequence, where the endpoints of each line segment were orthogonally projected to the re-projection of their corresponding 3D line estimate. In Figure 5.7, the reconstructed 3D structure of the model house seen from different viewpoints is demonstrated. In Figure 5.9, the top view of the model house from the 3D data provided by the dataset is visualized along with the bundle adjustment results from [83, 15] and the proposed method. Although the 3D line data coming with the dataset does not look reasonable, those 3D lines project to their correct positions on the image plane using the provided camera matrices. Similarly, the reference [15] gives the best re-projection error, but its reconstructed scene does not look reasonable. In contrast, the proposed method has the visually best reconstructed scene and an acceptable re-projection error compared to the best re-projection error. Figures 5.7b and 5.9d show the side and top views of the reconstructed model house from the proposed bundle adjustment model.

The co-planar features are nearly coplanar in the reconstruction, e.g. the windows and doors on the wall. Notice that the proposed bundle adjustment model could not provide accurate estimates for the positions of the line segments on the roof, since the cameras were revolved around the model house; hence there is not enough information from the top of the model house.

Table 5.1: Comparative results from the experiment with the model house sequence.
  Proposed approach   Reference [83]   Reference [15]
RMS of orthogonal distance (pixel)
Minimum orthogonal distance (pixel)
Maximum orthogonal distance (pixel)
Standard deviation of orthogonal distance (pixel)

5.3.2 The book sequence

The book sequence consists of 9 images taken by a calibrated camera. There are 25 manually detected lines, which are visible in all images. The camera motions were initialized by the homography decomposition method [92]. The homographies between the ground plane and the image planes were computed using just the four corners of a standard letter size paper laid flat on the ground plane. The initial camera matrix estimates were used to estimate lines using the proposed methods and Hartley and Zisserman's method.

At this step, the dual projective space approach to 3D line estimation reported an RMS orthogonal distance between the re-projected line estimates and their corresponding line segments of 3.21 pixels, while the averaging approach reported 4.05 pixels and Hartley and Zisserman's approach [1, page 323] 3.46 pixels. The 2D re-projection errors for this sequence are larger than those of the model house sequence because less information was used to initialize the camera matrices; that is, only four points were used to estimate the homography between the image planes and the reference plane in the scene. The obtained calibration matrix was used in the bundle adjustment model without any further refinement.

Table 5.2: Comparative results from the experiment with the book sequence.
  Proposed approach   Reference [83]   Reference [15]
RMS of orthogonal distance (pixel)
Minimum orthogonal distance (pixel)
Maximum orthogonal distance (pixel)
Standard deviation of orthogonal distance (pixel)

The re-projection errors after bundle adjustment from the proposed method and the references [83, 15] are reported in Table 5.2. The RMS of the orthogonal distance from the proposed method is closest to that of [15]. Although the cost function of [15] is directly the orthogonal distance between the endpoints of the line segments and the projected line estimates, our proposed method shows the smallest standard deviation.

For visual justification of the bundle adjustment result, the estimated 3D lines from the proposed model are re-projected onto two sample images of the book sequence in Figure 5.10. The zoom on an area of an image is illustrated in Figure 5.11. Note the significant improvement of the bundle adjustment over the line estimation method. In Figure 5.14, the reconstructed 3D scene of the book sequence seen from different views is demonstrated. Note the orthogonality, verticality and parallelism of the line features from the book, bookend and ruler in the top view of the reconstructed 3D scene in Figure 5.13a. Figure 5.13b illustrates the front view of the reconstructed scene. Coplanar lines on the actual object appear in the same configuration in the reconstruction. In contrast to the model house data, the book sequence gives a better reconstruction because of the variation in viewpoints and because the estimated lines are visible in all views.

5.3.3 Fish eye camera

In order to demonstrate the versatility of the proposed methods, an experiment using a fish eye camera was conducted. The Omnidirectional Camera Calibration Toolbox for Matlab (OCamCalib) developed by Scaramuzza was used to calibrate the camera, and checkerboards were used as the calibration pattern. The calibration toolbox estimated the polynomial function and affine transformation of the camera, as defined by Micusik and Pajdla [93], and also the camera motions. Corners of the squares were used as line segment endpoints. Instead of using the back-projection of points on the image plane to object space by (2.25), which is defined for the pinhole camera, the points were mapped onto the Gaussian sphere by first projecting them to the object space using the inverse projection function of the camera model and then normalizing them to unit vectors.
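The mapping just described amounts to a lift-and-normalize step. The sketch below is illustrative Python; 'cam2world' is a placeholder for the calibrated inverse projection function (e.g. the one estimated by an OCamCalib-style calibration) and is not an actual API name.

```python
import numpy as np

def to_gaussian_sphere(pixels, cam2world):
    """Map fisheye image points onto the unit (Gaussian) sphere: lift each
       pixel to a 3D ray with the calibrated inverse projection function and
       normalize. 'cam2world' is a placeholder for that inverse model."""
    rays = np.asarray([cam2world(p) for p in pixels], dtype=float)
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)
```

Once the observations live on the unit sphere, the same projection-plane normals used for the pinhole case can be formed, so the line estimation and bundle adjustment machinery of Chapters 3 and 4 applies unchanged.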

To show the recovery performance, the camera motions from the calibration were randomly perturbed. Given the perturbed camera motions, the 3D line estimation and bundle adjustment can then be performed following the methods proposed in Chapters 3 and 4, respectively. In this experiment, the line-based bundle adjustment models from Taylor and Kriegman [83] and Bartoli and Sturm [15] are not used for comparison because their models were proposed for conventional cameras. To visualize the re-projection of the estimated 3D lines, the line segment endpoints were projected back onto the estimated lines. At this step, endpoints of all estimated 3D lines were obtained. Points between the 3D line endpoints were then re-projected to the images. Figure 5.15 illustrates the re-projections of the 3D line segments. The reconstructed calibration pattern and the camera motions are illustrated in Figure 5.16. Notice the co-planarity of the reconstructed 3D lines in Figure 5.16b and the orthogonality and parallelism between lines in Figure 5.17a. The comparison between the initial and adjusted results is shown in Figure 5.18; the solution was improved after bundle adjustment.

Figure 5.4: Camera setup for the short baseline case. (a) Camera setup parameters (9 degrees between consecutive cameras, 4 meter circle radius, 3 meter height). (b) Cameras in 3D.

Figure 5.5: Experiment results on the short baseline case. (a) Re-projection error [pixel] versus noise standard deviation [pixel]. (b) Reconstruction error [meter] versus noise standard deviation [pixel]. The compared methods are the same as in Figure 5.3.

Figure 5.6: Two images from the model house sequence overlaid with image lines (yellow dashed lines) and re-projected line estimates (black solid lines).

Figure 5.7: 3D line reconstruction and estimated camera motions from the proposed method after bundle adjustment. (a) The 3D structure and camera poses. (b) The side view of the model house.

Figure 5.8: Reconstructed 3D scene viewed from two different viewpoints.

Figure 5.9: Top view of the reconstructed model house sequence from different methods. (a) Triangulated 3D lines provided with the dataset. (b) Reference [83]. (c) Reference [15]. (d) Proposed method.

Figure 5.10: Re-projection of estimated 3D lines onto two sample images of the book sequence. The manually detected image lines are plotted as yellow dashed lines and re-projected line estimates as black solid lines.

Figure 5.11: Zoom on an area of an image from the book sequence overlaid with image lines (yellow dashed lines) and re-projected line estimates (black solid lines). (a) Initial solution. (b) After bundle adjustment. The result from the initial 3D line triangulation is improved by the bundle adjustment process.

Figure 5.12: The reconstructed 3D scene and camera poses.

Figure 5.13: Top and front views of the reconstructed scene.

Figure 5.14: Reconstructed 3D scene viewed from different angles.

Figure 5.15: Re-projection of the estimated lines onto the images. The re-projected lines are shown as yellow solid lines.
