Week 2: Two-View Geometry Padua Summer 08 Frank Dellaert
Mosaicking Outline 2D Transformation Hierarchy RANSAC Triangulation of 3D Points Cameras Triangulation via SVD Automatic Correspondence Essential and Fundamental Matrix A recipe for Correspondence
Mosaicking www.cs.cmu.edu/~dellaert/mosaicking
Hierarchy of 2D Transforms Subgroup Structure: Translation (2DOF) Rigid 2D (3DOF) Affine (6DOF) Projective (8DOF)
Rigid 2D Transform Take Notes
Motivation Estimating motion models Typically: points in two images Candidates: Translation Rotation 2D Rigid transform Homography
Simpler Example Fitting a straight line
Discard Outliers No point with d>t RANSAC: RANdom SAmple Consensus Fischler & Bolles 1981 Copes with a large proportion of outliers
Main Idea Select 2 points at random Fit a line Support = number of inliers Line with most inliers wins
Why will this work?
Best Line has most support More support -> better fit
In General Fit a more general model Sample = minimal subset Translation:? Homography? Fundamental Matrix?
RANSAC Objective: Robust fit of a model to a data set S. Algorithm: (1) Randomly select s points. (2) Instantiate a model. (3) Get the consensus set S_i. (4) If |S_i| > T, terminate and return the model. (5) Repeat for N trials, return the model with max |S_i|.
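The steps above can be sketched for the straight-line example. This is a minimal illustration, not the lecture's implementation: the function names `fit_line` and `ransac_line`, the toy data, and the parameter values are all assumptions made for the example.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, t, T, N, seed=0):
    """Return (line, inliers); t = distance threshold, T = early-exit size, N = trials."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(N):
        p, q = rng.sample(points, 2)          # minimal subset: s = 2
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # Consensus set: points whose distance to the line is below t.
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) < t]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
        if len(best_inliers) > T:             # enough support: stop early
            break
    return best_line, best_inliers

# Toy data (assumed): 10 points on y = 2x + 1 plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(0.0, 9.0), (2.0, -4.0), (5.0, 3.0), (7.0, 30.0), (9.0, 2.0)]
line, inliers = ransac_line(points, t=0.1, T=9, N=50)
```

In practice the winning model is usually re-estimated from all of its inliers; that refinement is left out here.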
Distance Threshold: Requires the noise distribution. Gaussian noise with standard deviation σ: d^2 follows a chi-squared distribution with m DOF. 95% cumulative: Line, F: m=1, $t^2 = 3.84\,\sigma^2$. Translation, homography: m=2, $t^2 = 5.99\,\sigma^2$. I.e., an inlier satisfies d < t with 95% probability.
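The two constants can be checked with the standard library alone, because the chi-squared CDF has a closed form for m = 1 and m = 2. The helper names below are assumptions made for this sketch.

```python
import math

def chi2_cdf_1dof(k):
    """P(d^2 / sigma^2 < k) when d / sigma is standard normal (m = 1)."""
    return math.erf(math.sqrt(k / 2.0))

def chi2_cdf_2dof(k):
    """P(d^2 / sigma^2 < k) for the sum of two squared standard normals (m = 2)."""
    return 1.0 - math.exp(-k / 2.0)

coverage_line = chi2_cdf_1dof(3.84)        # line, F
coverage_homography = chi2_cdf_2dof(5.99)  # translation, homography
```

Both values come out at about 0.95, matching the 95% cumulative threshold.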
How many samples? We want: at least one sample with all inliers. Can't guarantee it, so ask for it with probability p, e.g. p = 0.99
Calculate N: If w = proportion of inliers = 1 - etha: P(sample with all inliers) = $w^s$. P(sample with an outlier) = $1 - w^s$. P(N samples, each with an outlier) = $(1 - w^s)^N$. We want P(N samples, each with an outlier) < 1 - p, i.e. $(1 - w^s)^N < 1 - p$, so $N > \log(1-p) / \log(1 - w^s)$
Example: p = 0.99. s=2, etha=5% => N=2; s=2, etha=50% => N=17; s=4, etha=5% => N=3; s=4, etha=50% => N=72; s=8, etha=5% => N=5; s=8, etha=50% => N=1177
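The table can be reproduced directly from the bound $N > \log(1-p) / \log(1 - w^s)$. The function name `ransac_trials` is an assumption made for this sketch.

```python
import math

def ransac_trials(p, s, etha):
    """Smallest integer N with (1 - w^s)^N <= 1 - p, where w = 1 - etha."""
    w = 1.0 - etha
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** s))

# Rows of the example: (s, etha) -> N for p = 0.99
table = {(s, etha): ransac_trials(0.99, s, etha)
         for s in (2, 4, 8) for etha in (0.05, 0.50)}
```

Note that the number of data points never enters the computation; only the outlier ratio and the sample size do.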
Remarks N = f(etha), not the number of points N increases steeply with s
Threshold T: Remember: terminate if |S_i| > T. Rule of thumb: T ≈ #inliers. So, T = (1 - etha) n
Adaptive N: When etha is unknown? Start with etha = 50%, N = inf. Repeat: sample s points, fit model -> update etha as #outliers / n -> set N = f(etha, s, p). Terminate when N samples seen
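A minimal sketch of the adaptive loop follows. It makes two assumptions beyond the slide: etha is updated from the best consensus set found so far (so N only shrinks), and a `max_trials` cap guards against degenerate data. The names `adaptive_ransac`, `fit` and `error`, and the toy 1D data, are hypothetical.

```python
import math
import random

def adaptive_ransac(data, s, fit, error, t, p=0.99, max_trials=10000, seed=0):
    """Return (model, inliers, trials); N is re-estimated as better models appear."""
    rng = random.Random(seed)
    n = len(data)
    N = math.inf                  # N = inf until a first model gives an etha estimate
    trials = 0
    best_model, best_inliers = None, []
    while trials < N and trials < max_trials:
        sample = rng.sample(data, s)
        trials += 1
        model = fit(sample)
        if model is None:
            continue
        inliers = [d for d in data if error(model, d) < t]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
            etha = 1.0 - len(inliers) / n          # outliers / n
            w_s = (1.0 - etha) ** s
            # N = f(etha, s, p); log1p keeps the denominator accurate for tiny w_s.
            N = 0 if w_s >= 1.0 else math.log(1.0 - p) / math.log1p(-w_s)
    return best_model, best_inliers, trials

# Toy example (assumed): estimate a constant from 1D data, minimal sample s = 1.
data = [5.0, 5.01, 4.99, 5.02, 4.98, 5.0, 9.0, 1.0, 7.5, 5.01]
model, inliers, trials = adaptive_ransac(
    data, s=1, fit=lambda smp: smp[0], error=lambda m, d: abs(m - d), t=0.1)
```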
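Pinhole Camera: a scene point P = (x, y, z) projects to p = (f x/z, f y/z). [Figure: pinhole O with axes i, j, k; image plane at distance f with center C; scene points P, Q and their images p, q]
Perspective Camera Model: Linear transformation of homogeneous (projective) coordinates: $p = [u, v, w]^T = [I \mid 0]\, P$, with $[I \mid 0] = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$ and $P = [X, Y, Z, T]^T$. Recover image (Euclidean) coordinates by normalizing: $\hat{u} = u/w = X/Z$, $\hat{v} = v/w = Y/Z$
Normalized Image Coordinates: image plane at unit distance from O: u = x/z = dimensionless!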
Pixel Units: Pixels are on a grid of a certain dimension: u = k f X/Z = in pixels! [f] = m (in meters), [k] = pixels/m
Pixel Coordinates: We put the pixel coordinate origin at the top left: $u = u_0 + k f X/Z$
Pixel Coordinates in 2D: $(u_0 + k f X/Z,\; v_0 + l f Y/Z)$. [Figure: 640 x 480 image with axes i, j and principal point $(u_0, v_0)$; corners at (0.5, 0.5) and (640.5, 480.5)]
Important: MATLAB Convention (1,1)! Just as good as any other convention!
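Summary: Intrinsic Calibration: 3x3 calibration matrix K: $p = [u, v, w]^T = K [I \mid 0] P = \begin{bmatrix} \alpha & s & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} [X, Y, Z, T]^T$. Recover image (Euclidean) coordinates by normalizing: $\hat{u} = u/w = (\alpha X + s Y)/Z + u_0$, $\hat{v} = v/w = \beta Y/Z + v_0$. s = skew. 5 Degrees of Freedom!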
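As a worked example of the calibration summary, the sketch below projects one point with $K [I \mid 0]$ and normalizes by w. The numeric values of alpha, beta, s, u0, v0 and the test point are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed intrinsics (illustrative values, not from the lecture).
alpha, beta, s, u0, v0 = 800.0, 800.0, 0.0, 320.0, 240.0
K = np.array([[alpha, s, u0],
              [0.0, beta, v0],
              [0.0, 0.0, 1.0]])
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # [I | 0]

P = np.array([0.5, -0.25, 2.0, 1.0])            # homogeneous point [X, Y, Z, T]
p = K @ I0 @ P                                  # homogeneous image point [u, v, w]
u_hat, v_hat = p[0] / p[2], p[1] / p[2]         # normalize by w

# Same result from the closed-form expressions on the slide.
X, Y, Z = P[:3]
u_check = (alpha * X + s * Y) / Z + u0
v_check = beta * Y / Z + v0
```

With these numbers the point lands at pixel (520, 140).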
Camera Pose: In order to apply the camera model, objects in the scene must be expressed in camera coordinates. The calibration target looks tilted from the camera viewpoint. This can be explained as a difference in coordinate systems. [Figure: world coordinate frame and camera coordinate frame]
Hierarchy of 3D Transforms Subgroup Structure: Translation (3DOF) Rigid 3D (6DOF) Affine (12DOF) Projective (15DOF)
Rigid Body Transformations: Need a way to specify the six degrees of freedom of a rigid body. Why are there 6 DOF? A rigid body is a collection of points whose positions relative to each other can't change. Fix one point: three DOF. Fix a second point: two more DOF (must maintain the distance constraint). A third point adds one more DOF, for rotation around the line through the first two
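Notations: Superscript references the coordinate frame: ${}^A P$ is the coordinates of P in frame A; ${}^B P$ is the coordinates of P in frame B. Example: ${}^A P = [{}^A x, {}^A y, {}^A z]^T$, $\overrightarrow{OP} = {}^A x\, i_A + {}^A y\, j_A + {}^A z\, k_A$. [Figure: frame A with origin $O_A$, axes $i_A, j_A, k_A$, and point P]
Translation: ${}^B P = {}^A P + {}^B O_A$. [Figure: frames A and B with parallel axes, origins $O_A$ and $O_B$, and point P]
Translation: Using homogeneous coordinates, translation can be expressed as a matrix multiplication: ${}^B P = {}^A P + {}^B O_A$, i.e. $\begin{bmatrix} {}^B P \\ 1 \end{bmatrix} = \begin{bmatrix} I & {}^B O_A \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} {}^A P \\ 1 \end{bmatrix}$. Note: composing two translations is commutative
Rotation: $\overrightarrow{OP} = (i_A\; j_A\; k_A) [{}^A x, {}^A y, {}^A z]^T = (i_B\; j_B\; k_B) [{}^B x, {}^B y, {}^B z]^T$, so ${}^B P = {}^B_A R\, {}^A P$. ${}^B_A R$ means describing frame A in the coordinate system of frame B
Rotation: ${}^B_A R = \begin{bmatrix} i_A \cdot i_B & j_A \cdot i_B & k_A \cdot i_B \\ i_A \cdot j_B & j_A \cdot j_B & k_A \cdot j_B \\ i_A \cdot k_B & j_A \cdot k_B & k_A \cdot k_B \end{bmatrix} = \begin{bmatrix} {}^B i_A & {}^B j_A & {}^B k_A \end{bmatrix} = \begin{bmatrix} {}^A i_B^T \\ {}^A j_B^T \\ {}^A k_B^T \end{bmatrix}$. Orthogonal matrix!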
Example: Rotation about z axis What is the rotation matrix?
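One way to answer the question is to build the matrix entry by entry from dot products of the basis vectors. The sketch assumes frame B is frame A rotated counterclockwise by theta about the shared z axis; the function name `rotation_B_from_A` is hypothetical.

```python
import numpy as np

def rotation_B_from_A(theta):
    """Matrix taking coordinates in A to coordinates in B, built from dot products."""
    iA, jA, kA = np.eye(3)                      # basis of A, written in frame A
    c, s = np.cos(theta), np.sin(theta)
    iB = np.array([c, s, 0.0])                  # basis of B, written in frame A
    jB = np.array([-s, c, 0.0])
    kB = kA
    # Row r, column c of R is (c-th axis of A) . (r-th axis of B).
    return np.array([[a @ b for a in (iA, jA, kA)] for b in (iB, jB, kB)])

theta = np.pi / 2
R = rotation_B_from_A(theta)
P_A = np.array([1.0, 0.0, 0.0])                 # the point at the tip of i_A
P_B = R @ P_A                                   # its coordinates in frame B
```

Under this assumption the result is [[cos, sin, 0], [-sin, cos, 0], [0, 0, 1]]. For theta = 90 degrees the point at the tip of i_A has coordinates (0, -1, 0) in B, since i_A = -j_B.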
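Rotation in homogeneous coordinates: Using homogeneous coordinates, rotation can be expressed as a matrix multiplication: ${}^B P = {}^B_A R\, {}^A P$, i.e. $\begin{bmatrix} {}^B P \\ 1 \end{bmatrix} = \begin{bmatrix} {}^B_A R & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} {}^A P \\ 1 \end{bmatrix}$. Note: composing two rotations is not commutative
Rigid transformations: ${}^B P = {}^B_A R\, {}^A P + {}^B O_A$
Rigid transformations (cont'd): Unified treatment using homogeneous coordinates: $\begin{bmatrix} {}^B P \\ 1 \end{bmatrix} = \begin{bmatrix} I & {}^B O_A \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} {}^B_A R & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} {}^A P \\ 1 \end{bmatrix} = \begin{bmatrix} {}^B_A R & {}^B O_A \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} {}^A P \\ 1 \end{bmatrix} = {}^B_A T \begin{bmatrix} {}^A P \\ 1 \end{bmatrix}$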
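The homogeneous factorization can be checked numerically. In this sketch the rotation angle, the offset `O_A_in_B` (standing for ${}^B O_A$) and the test point are arbitrary assumed values.

```python
import numpy as np

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])   # rotation part
O_A_in_B = np.array([1.0, 2.0, 3.0])                          # origin of A, in frame B

Trans = np.eye(4)
Trans[:3, 3] = O_A_in_B            # [I  O; 0 1]
Rot = np.eye(4)
Rot[:3, :3] = R                    # [R  0; 0 1]
T = Trans @ Rot                    # [R  O; 0 1]

P_A = np.array([0.5, -1.0, 2.0, 1.0])
P_B = T @ P_A

# The two factors do not commute: rotating after translating gives a different map.
other_order = Rot @ Trans
```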
3D-2D Projective mapping Projection Matrix (3x4)
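Projective Camera Matrix: Camera = Calibration x Projection x Extrinsics: $p = [u, v, w]^T = K [I \mid 0]\, T\, P = \begin{bmatrix} \alpha & s & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} [X, Y, Z, T]^T = K [R \;\; t]\, P = M P$. 5+6 DOF = 11!
Projective Camera Matrix: $p = [u, v, w]^T = K [R \;\; t]\, P = M P = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} [X, Y, Z, T]^T$. 5+6 DOF = 11!
Columns & Rows of M: $M = [m_1 \;\; m_2 \;\; m_3 \;\; m_4]$ (columns) $= \begin{bmatrix} m^1 \\ m^2 \\ m^3 \end{bmatrix}$ (rows). $u_i = \dfrac{m^1 P_i}{m^3 P_i}$, $v_i = \dfrac{m^2 P_i}{m^3 P_i}$. Each row defines a plane through the camera center O, e.g. $m^2 P = 0$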
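A short numerical sketch of the 3x4 camera matrix and its rows. The intrinsics, pose and test point are assumed values; only the structure M = K [R t] comes from the slides.

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])            # assumed calibration
R = np.eye(3)                              # assumed rotation
t = np.array([0.1, -0.2, 0.5])             # assumed translation

M = K @ np.hstack([R, t.reshape(3, 1)])    # M = K [R t], 3x4

# Same matrix via Calibration x Projection x Extrinsics.
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])
T = np.eye(4)
T[:3, :3], T[:3, 3] = R, t
M_factored = K @ I0 @ T

P = np.array([1.0, 2.0, 10.0, 1.0])        # homogeneous scene point
m1, m2, m3 = M                             # the three rows of M
u = (m1 @ P) / (m3 @ P)
v = (m2 @ P) / (m3 @ P)
```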
Why Consider Multiple Views? Answer: To extract 3D structure via triangulation. [Figure: a 3D point X seen by cameras P and P' at image points x and x']
Stereo Rig Top View Matches on Scanlines Convenient when searching for correspondences.
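Arbitrary 2-View Triangulation: Given p and p', find P. [Figure: rays from camera centers C and C' through p and p' meet at the unknown point P]
Linear Triangulation Method: Take the projection equations $p \propto M P$, $p' \propto M' P$. Apply the cross-product trick: $p \times M P = 0$, $p' \times M' P = 0$, i.e. $[p]_\times M P = 0$, $[p']_\times M' P = 0$. Stack: $\begin{bmatrix} [p]_\times M \\ [p']_\times M' \end{bmatrix} P = 0$. Solve by SVD: smallest eigenvector (the right singular vector with the smallest singular value)! Generalizes to N points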
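The stacked system can be solved in a few lines. This is a sketch on noise-free synthetic data; the helper names `skew` and `triangulate` and the two cameras are assumptions for the example.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate(p, M, p2, M2):
    """Homogeneous P minimizing |A P| with A = [[p]x M; [p']x M'], scaled so P[3] = 1."""
    A = np.vstack([skew(p) @ M, skew(p2) @ M2])
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]                      # right singular vector of the smallest singular value
    return P / P[3]                 # assumes the point is not at infinity

# Assumed setup: M = [I 0], second camera shifted by t = (1, 0, 0) with R = I.
M = np.hstack([np.eye(3), np.zeros((3, 1))])
t = np.array([1.0, 0.0, 0.0])
M2 = np.hstack([np.eye(3), -t.reshape(3, 1)])   # [R^T  -R^T t] with R = I

P_true = np.array([0.2, -0.1, 4.0, 1.0])
p, p2 = M @ P_true, M2 @ P_true
P_est = triangulate(p, M, p2, M2)
```

With noisy measurements the same code returns the algebraic least-squares solution rather than an exact intersection.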
Resectioning = Finding a Camera Given Known points SVD: 6-point algorithm Apply cross-product trick Take Notes
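The same cross-product trick gives a linear system in the 12 entries of M. The sketch below is one possible formulation, not necessarily the one derived in the lecture notes: each correspondence contributes $[p_i]_\times (I_3 \otimes P_i^T)\, m = 0$, where m stacks the rows of M. The function names, the camera and the seven test points are assumptions.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def resection(points_3d, points_2d):
    """Estimate the 3x4 camera M (up to scale) from n >= 6 point correspondences."""
    rows = []
    for P, p in zip(points_3d, points_2d):
        # M @ P == kron(I3, P^T) @ m, with m the row-major flattening of M.
        rows.append(skew(p) @ np.kron(np.eye(3), P.reshape(1, 4)))
    A = np.vstack(rows)                       # (3n) x 12, two independent rows per point
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

# Assumed ground-truth camera and known, non-coplanar scene points.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
M_true = K @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [0.5]])])
pts = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 6.0], [0.0, 1.0, 7.0], [1.0, 1.0, 5.0],
                [-1.0, 0.5, 8.0], [0.5, -1.0, 6.5], [2.0, 1.5, 9.0]])
points_3d = np.hstack([pts, np.ones((len(pts), 1))])
points_2d = points_3d @ M_true.T             # homogeneous image points

M_est = resection(points_3d, points_2d)
# Remove the arbitrary scale and sign before comparing.
M_est = M_est * np.sign(np.sum(M_est * M_true)) / np.linalg.norm(M_est)
M_ref = M_true / np.linalg.norm(M_true)
```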
Feature Matching!
Real World Challenges: Bad news: good correspondences are hard to find. Good news: geometry constrains possible correspondences. 4 DOF between x and x', but only 3 DOF in X. The constraint is manifest in the fundamental matrix F, which can be calculated either from camera matrices or from a set of good correspondences.
Geometry of 2 views? What if we do not know R, t? Caveat: my exposition follows the book's conventions, which are more intuitive (IMHO) but different from Hartley & Zisserman! F&P use $[R^T \;\; -R^T t]$ camera matrices; H&Z use $[R \;\; t]$
Epipolar Geometry: Where can p appear? $M' = [R^T \;\; -R^T t]$, $M = [I \;\; 0]$. [Figure: scene point P, camera centers C and C' separated by t, known image point p' and unknown image point p]
Image of Camera Center: the epipole. $M' = [R^T \;\; -R^T t]$, $M = [I \;\; 0]$
Example: Cameras Point at Each Other. Top View. Epipolar Lines
Epipoles: Camera center C' in first view: $e = [I \;\; 0] \begin{bmatrix} t \\ 1 \end{bmatrix} = t$. Origin C in second view: $e' = [R^T \;\; -R^T t] \begin{bmatrix} 0 \\ 1 \end{bmatrix} = -R^T t$
Image of Camera Ray? It passes through the epipole. $M' = [R^T \;\; -R^T t]$, $M = [I \;\; 0]$
Point at infinity: Given p', what is the corresponding point at infinity $[x^T \; 0]^T$? Answer for any camera $M' = [A \;\; a]$: $p' = [A \;\; a] \begin{bmatrix} x \\ 0 \end{bmatrix} = A x$, so $x = A^{-1} p'$. $A^{-1}$ = infinite homography. In our case $M' = [R^T \;\; -R^T t]$: $x = R p'$
Sidebar: Infinite Homographies: Homography between the image plane and the plane at infinity. Navigation by the stars: image of stars = function of rotation R only! Traveling on a sphere rotates the viewer
Essential Matrix
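Epipolar Line Calculation: 1) Point 1 = epipole $e = t$. 2) Point 2 = image of the point at infinity: $p_\infty = [I \;\; 0] \begin{bmatrix} R p' \\ 0 \end{bmatrix} = R p'$. 3) Epipolar line = join of points 1 and 2: $l = t \times R p'$
Epipolar Lines: $e = t$, $p_\infty = R p'$, $l = t \times R p'$. $M' = [R^T \;\; -R^T t]$, $M = [I \;\; 0]$. [Figure: scene point P, camera centers C and C', image points p and p', epipole e]
Epipolar Lines: $e = t$, $p_\infty = R p'$, $l = t \times R p'$
Epipolar Plane: the plane through P, C and C'; it contains the image points p and p', the epipoles $e = t$ and $e'$, and the epipolar lines l and l'. $M' = [R^T \;\; -R^T t]$, $M = [I \;\; 0]$
Essential Matrix: mapping from p' to l: $l = t \times R p' = [t]_\times R\, p' = E p'$. E = 3x3 matrix. Because p is on l, we have $p^T E p' = 0$
E's Degrees of Freedom: R, t = 6 DOF. However, scale ambiguity! => 5 DOF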
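The constraint can be verified numerically under the convention $M = [I \;\; 0]$, $M' = [R^T \;\; -R^T t]$. The rotation, baseline and scene point below are assumed values for the sketch.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

theta = 0.3                                   # assumed rotation about the y axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, -0.1])                # assumed baseline

E = skew(t) @ R                               # E = [t]x R

X = np.array([0.4, -0.3, 5.0])                # scene point in the first camera frame
p = X                                         # M = [I 0]
p2 = R.T @ (X - t)                            # M' = [R^T  -R^T t]

l = E @ p2                                    # epipolar line of p' in the first image
residual = p @ l                              # p^T E p', should vanish
e, e2 = t, -R.T @ t                           # the two epipoles
```

Both epipoles lie in null spaces of E: $e^T E = 0$ and $E e' = 0$.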
Fundamental Matrix
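Uncalibrated Case: $M = K [I \;\; 0]$, $M' = K' [R^T \;\; -R^T t]$; more generally $M = [I \;\; 0]$, $M' = [A \;\; a]$. Epipoles: $e' = a$, $e = -A^{-1} a$. Point at infinity: $p_\infty = A^{-1} p'$. Epipolar line: $l = [e]_\times A^{-1} p'$. [Figure: scene point P, camera centers C and C', image points p and p']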
Uncalibrated Case, Forsyth & Ponce Version Fundamental Matrix (Faugeras and Luong, 1992)
Fundamental Matrix: mapping from p' to l: $l = e \times A^{-1} p' = [e]_\times A^{-1} p' = F p'$. F = 3x3 matrix. Because p is on l, we have $p^T F p' = 0$
Properties of the Fundamental Matrix: $F p'$ is the epipolar line associated with p'. $F^T p$ is the epipolar line associated with p. $F^T e = 0$ and $F e' = 0$. F is singular.
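These properties can be checked on a small example with $M = [I \;\; 0]$ and $M' = [A \;\; a]$. The particular A, a and scene point are arbitrary assumed values; any invertible A would do.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

A = np.array([[1.0, 0.1, 0.2],
              [0.0, 0.9, -0.1],
              [0.05, 0.0, 1.1]])              # assumed invertible 3x3 block of M'
a = np.array([0.3, -0.2, 0.1])                # assumed last column of M'
A_inv = np.linalg.inv(A)

e = -A_inv @ a                                # epipole in the first image
e2 = a                                        # epipole in the second image
F = skew(e) @ A_inv                           # F = [e]x A^-1

X = np.array([0.4, -0.3, 5.0])                # scene point, P = [X; 1]
p = X                                         # M P
p2 = A @ X + a                                # M' P

residual = p @ F @ p2                         # p^T F p'
singular_values = np.linalg.svd(F, compute_uv=False)
```

The smallest singular value is zero up to rounding, which is the numerical face of "F is singular".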
The Eight-Point Algorithm (Longuet-Higgins, 1981): Minimize $\sum_{i=1}^{n} (p_i^T F p_i')^2$ under the constraint $|F|^2 = 1$.
Non-Linear Least-Squares Approach (Luong et al., 1993): Minimize $\sum_{i=1}^{n} \left[ d^2(p_i, F p_i') + d^2(p_i', F^T p_i) \right]$ with respect to the coefficients of F, using an appropriate rank-2 parameterization.
The Normalized Eight-Point Algorithm (Hartley, 1995): Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels: $q_i = T p_i$, $q_i' = T' p_i'$. Use the eight-point algorithm to compute F from the points $q_i$ and $q_i'$. Enforce the rank-2 constraint. Output $T^T F T'$.
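The recipe can be sketched end to end on synthetic, noise-free correspondences. The function names, the random scene and both cameras are assumptions for the example; the convention is $p^T F p' = 0$, so de-normalization uses $T^T F T'$.

```python
import numpy as np

def normalizing_transform(pts):
    """3x3 T moving the centroid to the origin with mean squared distance 2."""
    c = pts.mean(axis=0)
    msd = np.mean(np.sum((pts - c) ** 2, axis=1))
    s = np.sqrt(2.0 / msd)
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])

def eight_point(q, q2):
    """F minimizing sum (q_i^T F q2_i)^2 with |F| = 1, then forced to rank 2."""
    A = np.array([np.kron(a, b) for a, b in zip(q, q2)])   # one row per match
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                                             # rank-2 constraint
    return U @ np.diag(S) @ Vt

def normalized_eight_point(p, p2):
    """p, p2: (n, 2) pixel coordinates in the first and second image, n >= 8."""
    T, T2 = normalizing_transform(p), normalizing_transform(p2)
    ph = np.column_stack([p, np.ones(len(p))])
    p2h = np.column_stack([p2, np.ones(len(p2))])
    Fq = eight_point(ph @ T.T, p2h @ T2.T)                 # q = T p, q' = T' p'
    F = T.T @ Fq @ T2                                      # undo the normalization
    return F / np.linalg.norm(F)

# Synthetic scene (assumed): 12 points seen by M = K[I 0] and M' = K[R^T -R^T t].
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 12), rng.uniform(-1, 1, 12),
                     rng.uniform(4, 8, 12)])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = 0.1
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([1.0, 0.1, 0.2])
x1 = X @ K.T
x2 = (X - t) @ R @ K.T                                     # rows are K R^T (X - t)
p = x1[:, :2] / x1[:, 2:]
p2 = x2[:, :2] / x2[:, 2:]

F = normalized_eight_point(p, p2)
ph = np.column_stack([p, np.ones(len(p))])
p2h = np.column_stack([p2, np.ones(len(p2))])
residuals = np.abs(np.einsum('ij,jk,ik->i', ph, F, p2h))   # |p_i^T F p_i'|
```

With real, noisy matches the residuals will not vanish, and this estimator would typically be wrapped in RANSAC with s = 8.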