Machine vision. Summary #11: Stereo vision and epipolar geometry


STEREO VISION

The goal of stereo vision is to use two cameras to capture 3D scenes. There are two important problems in stereo vision:

Correspondence problem: finding matching pairs (conjugate pairs) of points in the two images that represent the same point in the 3D scene.

Reconstruction problem: obtaining the 3D structure from the images.

For a single pinhole camera we wrote:

u = f x / z   (1)
v = f y / z   (2)

A simple camera geometry for stereo vision is shown in figure 1, from which we have:

u_l = f x / z   (3)
v_l = f y / z   (4)
u_r = f (x - b) / z   (5)
v_r = f y / z   (6)

where f is the focal length (the distance from the image plane to the center of projection) and b is the baseline, the distance between the centers of the two cameras. We assume that the optical axes are aligned. By subtraction, we get

u_l - u_r = f b / z   (7)

and therefore

z = f b / (u_l - u_r)   (8)

It is common to attach the origin to the left camera, as shown in figure 1. We assume that both cameras are calibrated, that they are identical, and that their relative orientation is the same. It is also possible to attach the origin to the midpoint between the two camera reference frames; the equations are then slightly different.

Equation (8) gives the distance to the 3D point from the camera. Note that:

The difference u_l - u_r is called the horizontal disparity, retinal disparity, or binocular disparity. To get a feel for the disparity, put one finger in front of you, close one eye, then open it and close the other eye.

Distance is inversely proportional to disparity.

Disparity is proportional to the baseline.

Accuracy of depth determination increases with increasing baseline, but the images become less similar as the baseline increases.

For a given baseline, the accuracy is better for closer objects than for farther ones.

Using equation (8), we can determine the x and y coordinates of point P as follows:

x = b u_l / (u_l - u_r)   (9)

y = b v_r / (u_l - u_r)   (10)
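A minimal Python sketch of equations (8)-(10) for the aligned geometry; the focal length, baseline, and 3D point below are made-up illustration values, not from the notes:

```python
def reconstruct(ul, vr, ur, f, b):
    """Equations (8)-(10): recover (x, y, z) from the disparity u_l - u_r."""
    d = ul - ur              # horizontal (binocular) disparity
    z = f * b / d            # (8): depth is inversely proportional to disparity
    x = b * ul / d           # (9)
    y = b * vr / d           # (10)
    return x, y, z

# made-up aligned cameras: f = 500 (pixel units), baseline b = 0.3 m
f, b = 500.0, 0.3
X, Y, Z = 0.2, 0.1, 2.0                 # ground-truth 3D point
ul, vl = f * X / Z, f * Y / Z           # left projection, equations (3)-(4)
ur, vr = f * (X - b) / Z, f * Y / Z     # right projection, equations (5)-(6)
print(reconstruct(ul, vr, ur, f, b))    # recovers approximately (0.2, 0.1, 2.0)
```

Note that for a fixed depth, doubling b doubles the disparity, which is why a longer baseline improves depth accuracy.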

Fig. 1. Stereo vision geometry, C_l is the reference point.

Example. Consider images 2 and 3, obtained from a stereo vision system (in this problem we use subscripts 1 and 2 for the right and left images, respectively). The image size is 3456 by 4608 pixels. The pixel coordinates of the dot are r_1 = 749, c_1 = 420 in the right image and r_2 = 4271, c_2 = 420 in the left image. The origin of the pixel coordinate system is the bottom left of the image; (u_0, v_0) is located in the middle of the image.

1) Deduce a formula to find the distance z to point P.
2) Calculate z when the intrinsic parameters are

alpha_u = 3700   (11)
alpha_v = 3450   (12)
u_0 = 2304   (13)
v_0 = 1728   (14)

and the distance between the two cameras is 30 cm.
3) Find the (x, y) coordinates of point P.

The solution is shown as MATLAB code below.

Point P is marked in figure 2 (left image) and figure 3 (right image).

% Camera parameters
u0 = 2304;
v0 = 1728;
alphav = 3450;
alphau = 3700;
% Right
r1 = 749;
c1 = 420;
% Left
r2 = 4271;
c2 = 420;
b = 300;
zpoint = (alphau*b)/(r2 - r1)
ypoint = zpoint*(c2 - v0)/alphav;
ypoint = -ypoint        % transforming the origin
xpoint = zpoint*(r2 - u0)/alphau

The results are

z = 315.16 mm   (15)
x = 167.54 mm   (16)
y = 119.48 mm   (17)
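For comparison, here is the same computation as the MATLAB script above, written in Python; only the values already given in the example are used:

```python
# Intrinsic parameters, equations (11)-(14)
u0, v0 = 2304, 1728
alphau, alphav = 3700, 3450
r1, c1 = 749, 420        # right image pixel coordinates
r2, c2 = 4271, 420       # left image pixel coordinates
b = 300                  # baseline in mm

zpoint = alphau * b / (r2 - r1)          # disparity r2 - r1 = 3522 pixels
ypoint = -zpoint * (c2 - v0) / alphav    # minus sign: transforming the origin
xpoint = zpoint * (r2 - u0) / alphau

# approximately 315.16, 167.55, 119.49 mm, matching (15)-(17) up to rounding
print(round(zpoint, 2), round(xpoint, 2), round(ypoint, 2))
```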

Fig. 4. Relative geometry between two cameras.

RELATIVE GEOMETRY BETWEEN TWO CAMERAS

The assumption of perfectly aligned cameras is violated in practice, and two identical cameras do not exist. In general, the first step in stereo vision is to determine the relationship between the two cameras. By relationship we mean the relative orientation and position (the cameras are no longer aligned). Consider the geometric representation of figure 4. Let S_l = (x_l, y_l, z_l)^T be the position of point P in the left camera coordinate system and S_r = (x_r, y_r, z_r)^T its position in the right camera coordinate system. The coordinates are related by

S_r = R S_l + T   (18)

where R is a rotation matrix; it satisfies R^T R = I. System (18) can be written as

r_11 x_l + r_12 y_l + r_13 z_l + T_x = x_r
r_21 x_l + r_22 y_l + r_23 z_l + T_y = y_r   (19)
r_31 x_l + r_32 y_l + r_33 z_l + T_z = z_r

We do not know R or T, but we know the left and right image projections u_l, v_l, u_r, v_r. Knowing the focal length f, it is possible to write

u_l = f x_l / z_l   (20)
v_l = f y_l / z_l   (21)
u_r = f x_r / z_r   (22)
v_r = f y_r / z_r   (23)

Now z_l and z_r are regarded as additional unknowns. After substituting x_l, y_l, x_r, y_r by their formulae in terms of the focal length and the depths (x_l = u_l z_l / f, and so on), we get

r_11 u_l z_l / f + r_12 v_l z_l / f + r_13 z_l + T_x = u_r z_r / f   (24)
r_21 u_l z_l / f + r_22 v_l z_l / f + r_23 z_l + T_y = v_r z_r / f   (25)
r_31 u_l z_l / f + r_32 v_l z_l / f + r_33 z_l + T_z = z_r   (26)

and, multiplying through by f,

r_11 u_l z_l + r_12 v_l z_l + r_13 f z_l + f T_x = u_r z_r   (27)
r_21 u_l z_l + r_22 v_l z_l + r_23 f z_l + f T_y = v_r z_r   (28)
r_31 u_l z_l + r_32 v_l z_l + r_33 f z_l + f T_z = f z_r   (29)

There are three equations and fourteen unknowns (the nine r_ij, T_x, T_y, T_z, z_l, z_r). Each additional point provides three more equations, but at the same time introduces two new unknowns: its depths z_l and z_r. For one point, the count is

3 equations x 1 point; 12 unknowns + 2 unknowns x 1 point   (30)

For N points, we obtain

3 equations x N points; 12 unknowns + 2 unknowns x N points   (31)

Solvability requires 3N >= 12 + 2N; therefore, we need at least 12 points to solve.

COMPUTING THE DEPTH

If we know the translation and the rotation matrix as well as the image coordinates u_l, v_l, u_r, v_r, we can calculate the depths z_l and z_r:

r_11 u_l z_l + r_12 v_l z_l + r_13 f z_l + f T_x = u_r z_r   (32)
r_21 u_l z_l + r_22 v_l z_l + r_23 f z_l + f T_y = v_r z_r   (33)
r_31 u_l z_l + r_32 v_l z_l + r_33 f z_l + f T_z = f z_r   (34)

Since we have two unknowns and three equations, we can use any two of the equations to solve for z_l and z_r. In the particular case when the cameras have the same orientation (R = I), the system reduces to:

u_l z_l / f + T_x = u_r z_r / f   (35)
v_l z_l / f + T_y = v_r z_r / f   (36)
z_l + T_z = z_r   (37)

EPIPOLAR GEOMETRY AND FUNDAMENTAL MATRIX

Consider the stereo vision geometry of figures 5 and 6. We want to solve the correspondence problem. Point P in 3D space is imaged at q_l in the left camera and at q_r in the right camera. Rays C_l q_l and C_r q_r intersect at point P, and both lie in the same plane. As a result, the image points q_l and q_r, the space point P, and the camera centers are coplanar, i.e., they belong to the same plane. The plane defined by the three points (C_l, C_r, P) is called the epipolar plane and is denoted by Π. The correspondence problem can be formulated as follows: knowing q_l, what are the coordinates of, and the constraints on the location of, q_r? Point q_r lies in the right image plane and, at the same time, in the plane Π. The intersection of the epipolar plane Π with the image plane forms a line l_r, so the search for q_r is reduced to the line l_r.
Line l_r is called the epipolar line. Points e_l and e_r are called the epipoles. Figure 6 shows the stereo vision geometry: points P_1, P_2, P_3 have the same projection in the left image plane but different projections in the right image. The epipolar constraint thus reduces the correspondence problem to a 1D search.

Example. The images in figures 7 and 8 are taken using a stereo vision system with b = 300 mm. The coordinates of the point of interest in the pixel coordinate system are (916, 686) in the right image and (97, 701) in the left image. The blue line is the right epipolar line and the red line is the left epipolar line.

THE ESSENTIAL MATRIX

Assume we have canonical cameras:

A_l = A_r = I

Machine vision, spring 2017, Summary 11

Fig. 5. Stereo vision geometry. Fig. 6. Stereo vision geometry.

Fig. 7. Examples of the epipolar lines (marked point at X: 916, Y: 685.6).

Fig. 8. Examples of the epipolar lines (marked point at X: 97, Y: 701.4).

where A_l and A_r are the intrinsic matrices of the left and right cameras, respectively. We define the projection matrices as

M_l = [I | 0]   (38)
M_r = [R | T]   (39)

The essential matrix is defined as

E = [T]_x R   (40)

where [T]_x is the translation vector represented in matrix form:

[T]_x = [  0   -T_z   T_y
          T_z    0   -T_x
         -T_y   T_x    0  ]

THE FUNDAMENTAL MATRIX

The fundamental matrix is an algebraic representation of the epipolar geometry. It represents a mapping between the right and left images. In general A_l != I and A_r != I. The fundamental matrix is given by

F = A_r^(-T) [T]_x R A_l^(-1)   (41)

The most important property of the fundamental matrix is summarized in the following theorem.

Theorem: The fundamental matrix satisfies the following condition: for any pair of corresponding image points,

q_r^T F q_l = 0   (42)

Moreover, q_r lies on the epipolar line

l_r = F q_l   (43)

and q_l lies on the epipolar line

l_l = F^T q_r   (44)

Equations (43) and (44) show that the fundamental matrix represents a mapping between a point and a line. The correspondence problem is formulated in terms of the matrix F: solving the correspondence problem means solving for F, which is a unique 3 x 3 matrix of rank 2.
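A small NumPy check of equations (40) and (42). With canonical cameras (A_l = A_r = I), equation (41) reduces to F = E, so the epipolar constraint can be verified directly; the rotation, translation, and point below are made up for illustration:

```python
import numpy as np

def skew(t):
    """[T]_x as in equation (40): skew(t) @ a equals the cross product t x a."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# made-up relative pose S_r = R S_l + T (equation (18))
th = np.deg2rad(3.0)
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([-0.3, 0.01, 0.0])
E = skew(T) @ R                      # essential matrix, equation (40)

# any 3D point satisfies the epipolar constraint q_r^T E q_l = 0
P = np.array([0.5, 0.2, 3.0])        # point in the left camera frame
Sr = R @ P + T                       # the same point in the right camera frame
ql, qr = P / P[2], Sr / Sr[2]        # canonical image points (u, v, 1)
print(abs(qr @ E @ ql) < 1e-12)      # True
```

The check works because S_r = R S_l + T lies in the plane spanned by T and R S_l, so it is orthogonal to their cross product T x (R S_l) = [T]_x R S_l.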

COMPUTING THE FUNDAMENTAL MATRIX: THE EIGHT-POINT ALGORITHM

Equation (41) gives the fundamental matrix in terms of the intrinsic and extrinsic parameters. As mentioned previously, each pair of corresponding points gives one scalar constraint:

q_ri^T F q_li = 0   (45)

The eight-point algorithm uses at least eight points to calculate the matrix F. Equation (42) can be written as

[u_r v_r 1] [f_11 f_12 f_13; f_21 f_22 f_23; f_31 f_32 f_33] [u_l; v_l; 1] = 0   (46)

This is a scalar equation that reduces to

[u_r u_l  u_r v_l  u_r  v_r u_l  v_r v_l  v_r  u_l  v_l  1] (f_11, f_12, f_13, f_21, f_22, f_23, f_31, f_32, f_33)^T = 0   (47)

At least eight points are needed to solve. If we take N points, we obtain N constraints that can be put in matrix form as

W f = 0   (48)

where row i of W is

[u_ri u_li  u_ri v_li  u_ri  v_ri u_li  v_ri v_li  v_ri  u_li  v_li  1]

and f = (f_11, f_12, f_13, f_21, f_22, f_23, f_31, f_32, f_33)^T.

One possible way to solve is by using the singular value decomposition (SVD). The solution consists of two steps:

Step 1: Linear solution. Use the SVD to obtain a first estimate of F by solving W f = 0; the solution is the right singular vector associated with the smallest singular value. This estimate may not satisfy the rank requirement for the fundamental matrix. The following MATLAB commands can be used:

[U, S, V] = svd(W);
f = V(:, end);
F = reshape(f, 3, 3)';   % transpose: reshape fills F column by column, f is stacked row by row

Step 2: Constraint enforcement. Find the closest approximation to F that has rank 2. Again the SVD is used:

[U, S, V] = svd(F);
S(3, 3) = 0;
F = U*S*V';
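The two steps above can be sketched in NumPy as follows. This is an illustration, not the notes' code: the synthetic pose and points are made up, and no coordinate normalization is applied, which a practical implementation would add for noisy pixel data:

```python
import numpy as np

def eight_point(ql, qr):
    """Eight-point algorithm: estimate F from N >= 8 correspondences.
    ql, qr are (N, 2) arrays of left/right image coordinates."""
    ul, vl = ql[:, 0], ql[:, 1]
    ur, vr = qr[:, 0], qr[:, 1]
    # one row of W per point, matching equation (47)
    W = np.column_stack([ur*ul, ur*vl, ur, vr*ul, vr*vl, vr, ul, vl, np.ones(len(ul))])
    # Step 1: linear solution -- right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(W)
    F = Vt[-1].reshape(3, 3)                 # f was stacked row by row
    # Step 2: enforce rank 2 by zeroing the smallest singular value of F
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

# synthetic check with canonical cameras and a made-up pose
rng = np.random.default_rng(1)
th = np.deg2rad(4.0)
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
T = np.array([-0.3, 0.05, 0.01])
P = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(10, 3))  # left-frame points
Q = P @ R.T + T                                                    # right-frame points
ql, qr = P[:, :2] / P[:, 2:], Q[:, :2] / Q[:, 2:]
F = eight_point(ql, qr)
res = np.abs([np.r_[qr[i], 1] @ F @ np.r_[ql[i], 1] for i in range(len(P))])
print(res.max() < 1e-9, np.linalg.matrix_rank(F))   # True 2
```

With noiseless data W has rank 8, so the null vector is recovered essentially exactly; the rank-2 projection in step 2 is what makes all epipolar lines pass through a common epipole.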

Example. We want to find the epipolar lines for the pair of images given in figures 7 and 8. The corresponding points are

p_1 = (916, 686)   (49)
p_2 = (97, 701)   (50)

For 8 point correspondences, the matrix W is

W = [  86165   635807   907   64410   475278   678    95   701  1
      135184   833000   952  122262   753375   861   142   875  1
      153576   951588   972  153102   948651   969   158   979  1
       85975   941200   905   97850  1071200  1030    95  1040  1
      288706  1076086  1193  216590   807290   895   242   902  1
      589050  1198890  1386  363800   740440   856   425   865  1
      872448  1307136  1536  478824   717393   843   568   851  1
      262080  1228500  1170  233632  1095150  1043   224  1050  1 ]   (51)

The resulting fundamental matrix is

F = [ 0.0000  0.0000  0.0018
      0.0000  0.0000  0.0036
      0.0009  0.0022  1.0000 ]   (52)

and the epipolar lines are

l_r = (0.0005, 0.0026, 1.7367)^T   (53)

l_l = (0.0003, 0.0023, 1.3524)^T   (54)
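As a closing sanity check of the mapping l_r = F q_l (equation (43)): for canonical cameras with a pure horizontal translation, i.e. the aligned geometry of figure 1, the epipolar lines come out horizontal. The translation and test point below are made up:

```python
import numpy as np

def skew(t):
    # [T]_x as in equation (40)
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# canonical cameras, pure translation along x (R = I), so F = E = [T]_x
F = skew(np.array([-0.3, 0.0, 0.0]))
ql = np.array([0.4, 0.7, 1.0])   # a left image point (u_l, v_l, 1)
lr = F @ ql                      # right epipolar line (a, b, c): a u + b v + c = 0
print(lr)                        # approximately (0, 0.3, -0.21): the line v = 0.7
```

The line (0, 0.3, -0.21) is v = 0.7, a horizontal line at the same height as q_l, which matches v_l = v_r in equations (4) and (6) for aligned cameras.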