Pose Estimation from Circle or Parallel Lines in a Single Image

Guanghui Wang 1,2, Q.M. Jonathan Wu 1, and Zhengqiao Ji 1

1 Department of Electrical and Computer Engineering, The University of Windsor, 401 Sunset, Windsor, Ontario, Canada N9B 3P4
2 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, P.R. China
ghwangca@gmail.com, jwu@uwindsor.ca

Abstract. This paper focuses on the problem of pose estimation from a single view under minimal conditions that can be obtained from images. Under the assumption of known intrinsic parameters, we propose and prove that the pose of the camera can be recovered uniquely in three situations: (a) the image of one circle with a discriminable center; (b) the image of one circle with a preassigned world frame; (c) the image of any two pairs of parallel lines. Compared with previous techniques, the proposed method does not need any 3D measurement of the circle or lines, so the required conditions are easily satisfied in many scenarios. Extensive experiments are carried out to validate the proposed method.

1 Introduction

Determining the position and orientation of a camera from a single image with respect to a reference frame is a basic and important problem in the robot vision field. There are many potential applications, such as visual navigation, robot localization, object recognition, photogrammetry, and visual surveillance. During the past two decades, the problem has been widely studied and many approaches have been proposed. One well-known pose estimation problem is the perspective-n-point (PnP) problem, which was first proposed by Fischler and Bolles [5]. The problem is to find the pose of an object from the image of n points at known locations on it. Following this idea, the problem was further studied by many researchers [6,8,9,15,14]. One of the major concerns of the PnP problem is the multi-solution phenomenon: all PnP problems for n ≤ 5 have multiple solutions.
Thus we need further information to determine the correct solution [6]. Another kind of localization algorithm is based on line correspondences. Dhome et al. [4] proposed to compute the attitude of an object from three line correspondences. Liu et al. [12] discussed methods to recover the camera pose linearly or nonlinearly by using different combinations of line and point features. Ansar and Daniilidis [1] presented a general framework that allows for a novel set of linear solutions to the pose estimation problem for both n points and n lines. Chen [2] proposed a polynomial approach to find closed-form solutions for
Y. Yagi et al. (Eds.): ACCV 2007, Part II, LNCS 4844, pp. 363-372, 2007. © Springer-Verlag Berlin Heidelberg 2007
pose determination from line-to-plane correspondences. The line-based methods also suffer from the problem of multiple solutions. The above methods assume that the camera is calibrated and that the positions of the points and lines are known. In practice, it may be hard to obtain accurate measurements of these features in space. However, geometrical constraints, such as coplanarity, parallelism, and orthogonality, are abundant in many indoor and outdoor structured scenarios. Some researchers proposed to recover the camera pose from the image of a rectangle, two orthogonal pairs of parallel lines, and some other scene constraints [7,18]. The circle is another very common pattern in man-made objects and scenes, and many studies on camera calibration are based on the images of circles [10,11,13]. In this paper, we try to compute the camera's pose from a single image based on geometrical configurations in the scene. Differently from previous methods, we propose to use the image of only one circle, or the image of any two pairs of parallel lines that need not be coplanar or orthogonal. The proposed method is widely applicable, since these conditions are easily satisfied in many scenarios.

2 Perspective Geometry and Pose Estimation

2.1 Camera Projection and Pose Estimation

Under perspective projection, a 3D point x ∈ R^3 in space is projected to an image point m ∈ R^2 via a rank-3 projection matrix P ∈ R^{3×4} as

s m̃ = P x̃ = K[R, t] x̃ = K[r_1, r_2, r_3, t] x̃   (1)

where x̃ = [x^T, w]^T and m̃ = [m^T, w]^T are the homogeneous forms of the points x and m respectively, R and t are the rotation matrix and translation vector from the world system to the camera system, s is a non-zero scalar, and K is the camera calibration matrix. In this paper, we assume the camera is calibrated, so we may set K = I_3 = diag(1, 1, 1), which is equivalent to normalizing the image coordinates by applying the transformation K^{-1}.
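The normalization by K^{-1} described above can be sketched in a few lines (a minimal numpy sketch; the calibration values below are illustrative, not from the paper):

```python
import numpy as np

# Illustrative calibration matrix (zero skew, principal point at image centre).
K = np.array([[1800.0, 0.0, 512.0],
              [0.0, 1800.0, 384.0],
              [0.0, 0.0, 1.0]])

def normalize_points(points, K):
    """Map pixel coordinates to normalized coordinates by applying K^{-1}.

    points: (N, 2) array of pixel positions.
    After this mapping the camera behaves as if K = I.
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])   # homogeneous form
    normalized = (np.linalg.inv(K) @ pts_h.T).T
    return normalized[:, :2] / normalized[:, 2:3]

m = np.array([[512.0, 384.0]])    # a point at the principal point
print(normalize_points(m, K))     # -> [[0. 0.]]
```

After this step the formulas of the following sections apply directly to the normalized coordinates.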
In this case, the projection matrix is simplified to P = [R, t] = [r_1, r_2, r_3, t]. When all space points are coplanar, the mapping between the space points and their images can be modeled by a plane homography H, which is a nonsingular 3×3 homogeneous matrix. Without loss of generality, we may take the coordinates of the space plane to be [0, 0, 1, 0]^T for a specified world frame; then we have H = [r_1, r_2, t]. Obviously, the rotation matrix R and translation vector t can be factorized directly from the homography.

Proposition 1. When the camera is calibrated, the pose of the camera can be recovered from two orthogonal vanishing points in a single view.

Proof. Without loss of generality, let us set the X and Y axes of the world system in line with the two orthogonal directions. In the normalized world coordinate system, the directions of the X and Y axes are x̃_w = [1, 0, 0, 0]^T and ỹ_w = [0, 1, 0, 0]^T
respectively, and the homogeneous vector of the world origin is õ_w = [0, 0, 0, 1]^T. Under perspective projection, we have:

s_x ṽ_x = P x̃_w = [r_1, r_2, r_3, t][1, 0, 0, 0]^T = r_1   (2)
s_y ṽ_y = P ỹ_w = [r_1, r_2, r_3, t][0, 1, 0, 0]^T = r_2   (3)
s_o ṽ_o = P õ_w = [r_1, r_2, r_3, t][0, 0, 0, 1]^T = t   (4)

Thus the rotation matrix can be computed from

r_1 = ± ṽ_x / ||ṽ_x||,  r_2 = ± ṽ_y / ||ṽ_y||,  r_3 = r_1 × r_2   (5)

where the rotation matrix R = [r_1, r_2, r_3] may have four solutions if a right-handed coordinate system is adopted. Only two of them, however, ensure that the reconstructed objects lie in front of the camera and can thus be seen by it. In practice, if the world coordinate frame is preassigned, the rotation matrix can be uniquely determined [19]. Since we have no metric information of the given scene, the translation vector can only be defined up to scale as t ~ ṽ_o; that is, we can only recover the direction of the translation vector.

In practice, the orthonormal constraint should be enforced during the computation, since r_1 and r_2 in (5) may not be orthogonal due to image noise. Suppose the SVD decomposition of R_12 = [r_1, r_2] is UΣV^T, where Σ is a 3×2 matrix made of the two singular values of R_12. Since a rotation matrix should have unit singular values, the best approximation to R_12 in the least-squares sense is R̂_12 = U Ĩ V^T, where Ĩ = [1, 0; 0, 1; 0, 0].

2.2 The Circular Points and Pose Estimation

The absolute conic (AC) is a conic on the ideal plane π_∞, which can be expressed in matrix form as Ω_∞ = diag(1, 1, 1). Obviously, Ω_∞ is composed of purely imaginary points on the infinite plane. Under perspective projection, we obtain the image of the absolute conic (IAC) as ω_a = (K K^T)^{-1}, which depends only on the camera calibration matrix K. The IAC is an invisible imaginary point conic in an image.
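The recovery of R in Proposition 1, including the SVD-based enforcement of the orthonormal constraint, can be sketched as follows (a minimal numpy sketch; the synthetic rotation and the scales of the vanishing points are illustrative, and the sign ambiguity of (5) is ignored):

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix about a unit axis by angle theta (Rodrigues formula)."""
    a = np.asarray(axis, float) / np.linalg.norm(axis)
    A = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * A + (1 - np.cos(theta)) * (A @ A)

def rotation_from_vanishing_points(v_x, v_y):
    """R from the vanishing points of the world X and Y axes (normalized
    image coordinates, K = I), with orthonormality enforced by SVD."""
    r1 = v_x / np.linalg.norm(v_x)
    r2 = v_y / np.linalg.norm(v_y)
    # Project [r1 r2] onto the nearest matrix with orthonormal columns:
    # replace the singular values by ones, as described in the text.
    U, _, Vt = np.linalg.svd(np.column_stack([r1, r2]), full_matrices=False)
    r1, r2 = (U @ Vt).T
    return np.column_stack([r1, r2, np.cross(r1, r2)])

R_true = rodrigues([0.7, 0.3, 0.6], 0.8)
# By (2) and (3), the vanishing points equal r1 and r2 up to scale.
R_est = rotation_from_vanishing_points(2.0 * R_true[:, 0], 3.0 * R_true[:, 1])
print(np.allclose(R_est, R_true))   # -> True
```

With noisy vanishing points the SVD step still returns an exactly orthonormal pair, which is the point of the enforcement.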
It is easy to verify that the absolute conic intersects the ideal line of a plane at two complex conjugate ideal points, which are called the circular points. The circular points can be expressed in canonical form as I = [1, i, 0, 0]^T, J = [1, -i, 0, 0]^T. Under perspective projection, their images can be expressed as:

s_i m̃_i = P I = [r_1, r_2, r_3, t][1, i, 0, 0]^T = r_1 + i r_2   (6)
s_j m̃_j = P J = [r_1, r_2, r_3, t][1, -i, 0, 0]^T = r_1 - i r_2   (7)

Thus the imaged circular points (ICPs) are a pair of complex conjugate points whose real and imaginary parts are defined by the first two columns of the rotation matrix. However, the rotation matrix cannot be determined uniquely from the ICPs, since (6) and (7) are defined only up to scale.
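That the ICPs lie on the IAC can be checked numerically: with K = I we have ω_a = I, and (r_1 + i r_2)^T (r_1 + i r_2) = r_1^T r_1 - r_2^T r_2 + 2 i r_1^T r_2 = 0 for any rotation. A small numpy sketch (the rotation is an arbitrary illustrative one):

```python
import numpy as np

def rot_x(t):
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t), np.cos(t), 0],
                     [0, 0, 1]])

R = rot_z(0.7) @ rot_x(0.5)        # an arbitrary illustrative rotation
m_i = R[:, 0] + 1j * R[:, 1]       # imaged circular point, eq. (6), K = I
omega_a = np.eye(3)                # IAC in normalized coordinates
# Note: the quadratic form uses a plain transpose, not a conjugate one.
print(np.isclose(m_i @ omega_a @ m_i, 0))   # -> True
```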
Proposition 2. Suppose m̃_i and m̃_j are the ICPs of a space plane, and the world system is set on the plane. Then the pose of the camera can be uniquely determined from m̃_i and m̃_j if one direction of the world frame is preassigned.

Proof. It is easy to verify that the line passing through the two imaged circular points is real; it is the vanishing line of the plane and can be computed from l_∞ = m̃_i × m̃_j. Suppose ox is the image of one axis of the preassigned world frame; its vanishing point v_x can be computed from the intersection of the line ox with l_∞. If the vanishing point v_y of the Y direction is recovered, the camera pose can be determined accordingly from Proposition 1. Since the vanishing points of two orthogonal directions are conjugate with respect to the IAC, v_y can be easily computed from { v_x^T ω_a v_y = 0, l_∞^T v_y = 0 }. On the other hand, since two orthogonal vanishing points are harmonic with respect to the ICPs, their cross ratio satisfies Cross(v_x, v_y; m̃_i, m̃_j) = -1. Thus v_y can also be computed from the cross ratio.

3 Methods for Pose Estimation

3.1 Pose Estimation from the Image of a Circle

Lemma 1. Any circle Ω_c in a space plane π intersects the absolute conic Ω_∞ at exactly two points, which are the circular points of the plane.

Without loss of generality, let us set the XOY world frame on the supporting plane. Then any circle on the plane can be modeled in homogeneous form as (x - w x_0)^2 + (y - w y_0)^2 - w^2 r^2 = 0, where (x_0, y_0) is the circle center and r its radius. The plane π intersects the ideal plane π_∞ at the vanishing line L_∞. In the extended plane of the complex domain, L_∞ has at most two intersections with Ω_c. It is easy to verify that the circular points are these intersections.

Lemma 2. The image of the circle Ω_c intersects the IAC at four complex points, which can be divided into two pairs of complex conjugate points.

Under perspective projection, any circle Ω_c on a space plane is imaged as a conic ω_c = H^{-T} Ω_c H^{-1}, which is an ellipse in the nondegenerate case.
The absolute conic is projected to the IAC. Both the IAC and ω_c are conics of second order that can be written in homogeneous form as x̃^T ω x̃ = 0. According to Bézout's theorem, the two conics have four imaginary intersection points, since the absolute conic and the circle have no real intersections in space. Suppose the complex point a + b i is one intersection; it is easy to verify that the conjugate point a - b i is also a solution. Thus the four intersections can be divided into two complex conjugate pairs. It is obvious that one pair of them is the ICPs, but the ambiguity cannot be resolved from the image of a single circle alone. If there are two or more circles on the same or parallel space planes, the ICPs can be uniquely determined, since the imaged circular points are the common intersections of each imaged circle with the IAC. However, we may have only one circle in many situations; how can the ICPs be determined in this case?
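The four intersections of Lemma 2 can be checked symbolically. The sketch below uses an illustrative homography H = [r_1, r_2, t] of a plane tilted about the X axis and the world circle (x - 2)^2 + y^2 = 1; for this configuration the imaged circular point r_1 + i r_2 dehomogenizes to (-5i/3, 4/3), which must be among the intersections (sympy is assumed available):

```python
import sympy as sp

u, v = sp.symbols('u v')
c, s = sp.Rational(4, 5), sp.Rational(3, 5)
# Illustrative homography H = [r1, r2, t] of a plane tilted about the X axis.
H = sp.Matrix([[1, 0, 0], [0, c, 0], [0, s, 5]])
# World circle (x - 2)^2 + y^2 = 1 written as a 3x3 conic matrix.
C = sp.Matrix([[1, 0, -2], [0, 1, 0], [-2, 0, 3]])
Hi = H.inv()
omega_c = Hi.T * C * Hi                      # image of the circle
x = sp.Matrix([u, v, 1])
iac = u**2 + v**2 + 1                        # x^T omega_a x with omega_a = I
circle_img = sp.expand((x.T * omega_c * x)[0])
sols = sp.solve([iac, circle_img], [u, v])
print(len(sols))                             # 4 points, in two conjugate pairs
# One pair is the ICPs r1 +/- i*r2; check that (-5i/3, 4/3) satisfies both conics.
icp = (-sp.I * sp.Rational(5, 3), sp.Rational(4, 3))
print(all(sp.simplify(e.subs({u: icp[0], v: icp[1]})) == 0
          for e in (iac, circle_img)))       # -> True
```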
Proposition 3. The imaged circular points can be uniquely determined from the image of one circle if the center of the circle can be detected in the image.

Proof. As shown in Fig.1, the image of the circle ω_c intersects the IAC at two pairs of complex conjugate points m̃_i, m̃_j and m̃'_i, m̃'_j. Let us define two lines as

l = m̃_i × m̃_j,  l' = m̃'_i × m̃'_j   (8)

Then one of the lines must be the vanishing line, and its two supporting points must be the ICPs. Suppose õ_c is the image of the circle center and l is the vanishing line; then there is a pole-polar relationship between the imaged center õ_c and the vanishing line with respect to the conic:

λ l = ω_c õ_c   (9)

where λ is a non-zero scalar. Thus the true vanishing line and the imaged circular points can be determined from (9).

Under perspective projection, a circle is transformed into a conic. However, the center of the circle in space usually does not project to the center of the corresponding conic in the image, since the perspective projection (1) is not a linear mapping from the space to the image. Thus the imaged center of the circle cannot be determined from the contour of the imaged conic alone. There are several possible ways to recover the projected center of the circle by virtue of more geometrical information, such as two or more lines passing through the center [13] or two concentric circles [10,11].

Fig. 1. Determining the ICPs from the image of one circle. (a) a circle and preassigned world frame in space; (b) the imaged conic of the circle.

Proposition 4. The imaged circular points can be recovered from the image of one circle with a preassigned world coordinate system.

Proof. As shown in Fig.1, suppose the lines x and y are the images of the two axes of the preassigned world frame; the two lines intersect l and l' at four points.
Since the two ICPs and the two orthogonal vanishing points form a harmonic relation, the true ICPs can be determined by verifying the cross ratio of the
two pairs of quadruple collinear points {m̃_i, m̃_j, v_x, v_y} and {m̃'_i, m̃'_j, v'_x, v'_y}. Then the camera pose can be computed according to Proposition 2.

3.2 Pose Estimation from Two Pairs of Parallel Lines

Proposition 5. The pose of the camera can be recovered from the image of any two general pairs of parallel lines in the space.

Proof. As shown in Fig.2, suppose L_11, L_12 and L_21, L_22 are two pairs of parallel lines in the space; they need not be coplanar or orthogonal. Their images l_11, l_12 and l_21, l_22 intersect at v_1 and v_2 respectively; then v_1 and v_2 must be the vanishing points of the two directions, and the line connecting the two points must be the vanishing line l_∞. Thus m̃_i and m̃_j can be computed from the intersections of l_∞ with the IAC. Suppose v_1 corresponds to one direction of the world frame and o_2 is the image of the world origin. Then the vanishing point v_1⊥ of the direction that is orthogonal to v_1 can be easily computed from { v_1^T ω_a v_1⊥ = 0, l_∞^T v_1⊥ = 0 }, or from Cross(v_1, v_1⊥; m̃_i, m̃_j) = -1, and the pose of the camera can be recovered from Proposition 1. Specifically, the angle α between the two pairs of parallel lines in the space can be recovered from

cos α = (v_1^T ω_a v_2) / (sqrt(v_1^T ω_a v_1) sqrt(v_2^T ω_a v_2))

If the two pairs of lines are orthogonal to each other, then v_1⊥ = v_2.

Fig. 2. Pose estimation from two pairs of parallel lines. Left: two pairs of parallel lines in the space; Right: the image of the parallel lines.

3.3 Projection Matrix and 3D Reconstruction

After retrieving the pose of the camera, the projection matrix with respect to the world frame can be computed from (1). With the projection matrix, any geometric primitive in the image can be back-projected into the space. For example, a point in the image is back-projected to a line, a line is back-projected to a plane, and a conic is back-projected to a cone.
Based on the scene constraints, many geometrical entities, such as length ratios, angles, and the 3D information of some planar surfaces, can be recovered via the technique of single view metrology [3,17,18]. Therefore the 3D structure of some simple objects and scenarios can be reconstructed from only a single image.
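The back-projection of an image point to a space line can be sketched as follows (a minimal numpy sketch; the pose P = [R, t] is a toy example in normalized coordinates, not a value from the paper):

```python
import numpy as np

def back_project(P, m):
    """Back-project an image point m (normalized coordinates) through
    P = [R, t]: returns the camera centre C and the ray direction d,
    so that every X = C + lam * d projects to m."""
    R, t = P[:, :3], P[:, 3]
    C = -R.T @ t                    # camera centre in world coordinates
    d = R.T @ np.append(m, 1.0)     # viewing-ray direction in the world frame
    return C, d

P = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])   # toy pose
C, d = back_project(P, np.array([0.2, 0.1]))
X = C + 3.0 * d                     # any point on the ray
x = P @ np.append(X, 1.0)
print(x[:2] / x[2])                 # -> [0.2 0.1]
```

Intersecting such rays with planes recovered from the scene constraints is what makes the single-view reconstruction of Section 5 possible.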
4 Experiments with Simulated Data

During the simulations, we generated a circle and two orthogonal pairs of parallel lines in the space, whose sizes and positions in the world system are shown in Fig.3. Each line is composed of 5 evenly distributed points, and the circle is composed of 100 evenly distributed points. The camera parameters were set as follows: focal length f_u = f_v = 1800, skew s = 0, principal point u_0 = v_0 = 0, rotation axis r = [0.717, 0.359, 0.598], rotation angle α = 0.84, translation vector t = [20, 20, 100]. The image resolution was set to 600 × 600, and Gaussian image noise was added to each imaged point. The generated image with 1-pixel Gaussian noise is shown in Fig.3.

Fig. 3. The synthetic scenario and image for simulation

In the experiments, the image lines and the imaged conic were fitted via least squares. We set L_11 and L_21 as the X and Y axes of the world frame, and recovered the ICPs and camera pose according to the proposed methods. Here we only give the result for the recovered rotation matrix. For the convenience of comparison, we decomposed the rotation matrix into the rotation axis and rotation angle; we define the error of the axis as the angle between the recovered axis and the ground truth, and the error of the rotation angle as the absolute difference between the recovered and the ground-truth angles. We varied the noise level from 0 to 3 pixels with a step of 0.5 during the test, and took 200 independent tests at each noise level so as to obtain more statistically meaningful results. The mean and
Fig. 4.
The mean and standard deviation of the errors of the rotation axis and rotation angle with respect to the noise levels
standard deviations of the two methods are shown in Fig.4. It is clear that the accuracy of the two methods is comparable at small noise levels (< 1.5 pixels), while the vanishing-point-based method (Alg.2) is superior to the circle-based one (Alg.1) at large noise levels.

5 Tests with Real Images

All images in the tests were captured by a Canon PowerShot G3 with a resolution of 1024 × 768. The camera was pre-calibrated via Zhang's method [20].

Test on the tea box image: For this test, the selected world frame, the two pairs of parallel lines, and the two conics detected by the Hough transform are shown in Fig.5. The line segments were detected and fitted via the orthogonal regression algorithm [16]. We recovered the rotation axis, rotation angle (unit: rad), and translation vector by the two methods, as shown in Table 1, where the translation vector is normalized so that ||t|| = 1. The results are consistent with the imaging conditions, though we do not have the ground truth.

Fig. 5. Test results of the tea box image. Upper: the image with the detected conics, parallel lines, and world frame for pose estimation; Lower: the reconstructed tea box model at different viewpoints with texture mapping.

In order to further evaluate the recovered parameters, we reconstructed the 3D structure of the scene from the recovered projection matrix via the method in [17]. The result is shown from different viewpoints in Fig.5. We manually took the measurements of the tea box and the grid in the background and registered the reconstruction to the ground truth. Then we computed the relative error E1 of the side length of the grid, and the relative errors E2 and E3 of the diameter and height of the circle. As listed in Table 1, the reconstruction errors are very small, which in return verifies the accuracy of the recovered parameters.
Test on the book image: The image with the detected conic, the preassigned world frame, and the two pairs of parallel lines is shown in Fig.6. We recovered the
Table 1. Test results and performance evaluations for real images

Images    Box                         Box                         Book                        Book
Method    Alg.1                       Alg.2                       Alg.1                       Alg.2
Raxis     [-0.9746, 0.1867, -0.1238]  [-0.9748, 0.1864, -0.1228]  [-0.9173, 0.3452, -0.1984]  [-0.9188, 0.346, -0.1899]
Rangle    2.4385                      2.4354                      2.2811                      2.3163
t         [-0.08, 0.13, 0.98]         [-0.08, 0.13, 0.98]         [-0.02, 0.09, 0.99]         [-0.02, 0.09, 0.99]
E1 (%)    0.219                       0.284                       0.372                       0.36
E2 (%)    0.327                       0.315                       0.365                       0.449
E3 (%)    0.286                       0.329                       0.633                       0.547

pose of the camera by the proposed methods, and then computed the relative errors E1, E2, and E3 of the three side lengths of the book with respect to the ground truth taken manually. The results are shown in Table 1. The reconstructed 3D structure of the book is shown in Fig.6. The results are realistic with good accuracy.

Fig. 6. Pose estimation and 3D reconstruction of the book image

6 Conclusion

In this paper, we proposed and proved that the pose of the camera can be recovered from a single image of one circle or of two general pairs of parallel lines. Compared with previous techniques, fewer conditions are required by the proposed method, so the results in the paper may find wide applications. Since the method utilizes minimal information in the computation, it is important to adopt robust techniques to fit the conics and lines.

Acknowledgment

The work is supported in part by the Canada Research Chair program and the National Natural Science Foundation of China under grant no. 657515.

References

1. Ansar, A., Daniilidis, K.: Linear pose estimation from points or lines. IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 578-589 (2003)
2. Chen, H.H.: Pose determination from line-to-plane correspondences: Existence condition and closed-form solutions. IEEE Trans. Pattern Anal. Mach. Intell. 13(6), 530-541 (1991)
3. Criminisi, A., Reid, I., Zisserman, A.: Single view metrology. International Journal of Computer Vision 40(2), 123-148 (2000)
4. Dhome, M., Richetin, M., Lapreste, J.T.: Determination of the attitude of 3D objects from a single perspective view. IEEE Trans. Pattern Anal. Mach. Intell. 11(12), 1265-1278 (1989)
5. Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6), 381-395 (1981)
6. Gao, X.S., Tang, J.: On the probability of the number of solutions for the P4P problem. J. Math. Imaging Vis. 25(1), 79-86 (2006)
7. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)
8. Horaud, R., Conio, B., Leboulleux, O., Lacolle, B.: An analytic solution for the perspective 4-point problem. CVGIP 47(1), 33-44 (1989)
9. Hu, Z.Y., Wu, F.C.: A note on the number of solutions of the noncoplanar P4P problem. IEEE Trans. Pattern Anal. Mach. Intell. 24(4), 550-555 (2002)
10. Jiang, G., Quan, L.: Detection of concentric circles for camera calibration. In: Proc. of ICCV, pp. 333-340 (2005)
11. Kim, J.S., Gurdjos, P., Kweon, I.S.: Geometric and algebraic constraints of projected concentric circles and their applications to camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 637-642 (2005)
12. Liu, Y., Huang, T.S., Faugeras, O.D.: Determination of camera location from 2-D to 3-D line and point correspondences. IEEE Trans. Pattern Anal. Mach. Intell. 12(1), 28-37 (1990)
13. Meng, X., Li, H., Hu, Z.: A new easy camera calibration technique based on circular points. In: Proc. of BMVC (2000)
14. Nistér, D., Stewénius, H.: A minimal solution to the generalised 3-point pose problem. J. Math. Imaging Vis. 27(1), 67-79 (2007)
15. Quan, L., Lan, Z.: Linear n-point camera pose determination. IEEE Trans. Pattern Anal. Mach. Intell.
21(8), 774-780 (1999)
16. Schmid, C., Zisserman, A.: Automatic line matching across views. In: Proc. of CVPR, pp. 666-671 (1997)
17. Wang, G.H., Hu, Z.Y., Wu, F.C., Tsui, H.T.: Single view metrology from scene constraints. Image Vision Comput. 23(9), 831-840 (2005)
18. Wang, G.H., Tsui, H.T., Hu, Z.Y., Wu, F.C.: Camera calibration and 3D reconstruction from a single view based on scene constraints. Image Vision Comput. 23(3), 311-323 (2005)
19. Wang, G.H., Wang, S., Gao, X., Li, Y.: Three dimensional reconstruction of structured scenes based on vanishing points. In: Proc. of PCM, pp. 935-942 (2006)
20. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330-1334 (2000)