CS231M Mobile Computer Vision Structure from motion - Cameras - Epipolar geometry - Structure from motion
Pinhole camera Pinhole perspective projection f o f = focal length o = center of the camera
z y f y' z f ' y P z y P Pinhole camera Derived using similar triangles ) z y,f z (f z) y,, (
From retina plane to images Piels, bottom-left coordinate systems
From retina plane to images y c c
Converting to piels y y c 1. Off set C=[c, c y ] c y (, y, z) (f c, f c y ) z z
Converting to piels y y c c 1. Off set 2. From metric to piels y (, y, z) (f k c, f l c y ) z z C=[c, c y ] Units: k,l : piel/m f : m Non-square piels, : piel
Camera Matri y y c y (, y, z) ( c, c y z z ) c Matri form? C=[c, c y ]
Homogeneous coordinates For details see lecture on transformations in CS131A homogeneous image coordinates homogeneous scene coordinates Converting from homogeneous coordinates
Camera Matri ) c z y, c z ( z) y,, ( y 1 0 1 0 0 0 0 0 0 ' z y c c z z c y z c P y y y c y c C=[c, c y ] Camera matri K
Camera Skew 1 0 1 0 0 0 sin 0 0 cot ' z y c c P y y c y c C = [c, c y ] P P M ' P I K 0 K has 5 degrees of freedom!
World reference system R,T j w k w O w i w The mapping so far is defined within the camera reference system What if an object is represented in the world reference system
World reference system R,T j w P k w O w P i w In 4D homogeneous coordinates: P R T 1 0 4 4 P w I P P K 0 Internal parameters R 0 T 1 ' K I 0 Pw 44 K Eternal parameters R M T P w
Properties of Projection Points project to points Lines project to lines Distant objects look smaller
Properties of Projection Angles are not preserved Parallel lines meet! Vanishing point
Camera Calibration Calibration rig j C P' KI 0 P K R T P w P 1 P n with known positions in [O w,i w,j w,k w ] p 1, p n known positions in the image Goal: compute intrinsic and etrinsic parameters
Camera Calibration http://docs.opencv.org/_downloads/camera_calibration.cpp Calibration rig image j C P' KI 0 P K R T P w P 1 P n with known positions in [O w,i w,j w,k w ] p 1, p n known positions in the image Goal: compute intrinsic and etrinsic parameters
CS231M Mobile Computer Vision Structure from motion - Cameras - Epipolar geometry - Structure from motion
Can we recover the structure from a single view? Pinhole perspective projection P p O w Calibration rig Scene C Camera K Why is it so difficult? Intrinsic ambiguity of the mapping from 3D to image (2D)
Can we recover the structure from a single view? Intrinsic ambiguity of the mapping from 3D to image (2D) Courtesy slide S. Lazebnik
Two eyes help! X? K =known 2 1 R, T O 2 O 1 This is called triangulation K =known
Structure from motion problem X j 1j M m M 1 mj 2j M 2 Given m images of n fied 3D points ij = M i X j, i = 1,, m, j = 1,, n
Structure from motion problem X j 1j M m M 1 mj 2j M 2 From the mn correspondences ij, can we estimate: m projection matrices M i n 3D points X j motion structure
Study relationship between X, 1 and 2 X K =known 2 1 R, T O 2 O 1 Epipolar geometry! K =known
Epipolar geometry X 1 2 e 1 e 2 Epipolar Plane Epipoles e 1, e 2 Baseline Epipolar Lines O 1 O 2 = intersections of baseline with image planes = projections of the other camera center
Epipolar geometry X For details see CS131A Lecture 9 1 2 e 1 e 2 Epipolar Plane Epipoles e 1, e 2 Baseline Epipolar Lines O 1 O 2 = intersections of baseline with image planes = projections of the other camera center
Eample: Converging image planes e e
Eample: Parallel image planes X e 1 1 2 e 2 O 1 O 2 Baseline intersects the image plane at infinity Epipoles are at infinity Epipolar lines are parallel to ais
Eample: Parallel Image Planes e at infinity e at infinity
Why are epipolar constraints useful? - Two views of the same object - Suppose I know the camera positions and camera matrices - Given a point on left image, how can I find the corresponding point on right image?
Why are epipolar constraints useful? X O 1 O 2
Why are epipolar constraints useful? - Two views of the same object - Suppose I know the camera positions and camera matrices - Given a point on left image, how can I find the corresponding point on right image?
Essential matri P p p O R, T O Assume camera matrices are known p T E E p T R 0 E = essential matri (Longuet-Higgins, 1981)
b a b a ] [ 0 0 0 z y y z y z b b b a a a a a a Cross product as matri multiplication
Epipolar Constraint P p p O R, T O p T F K F p 0 T 1 T R K F = Fundamental Matri (Faugeras and Luong, 1992)
Epipolar Constraint p T F p 1 2 0 P p 1 p 2 e 1 e 2 O 1 O 2 F p 2 is the epipolar line associated with p 2 (l 1 = F p 2 ) F T p 1 is the epipolar line associated with 1 (l 2 = F T p 1 ) F e 2 = 0 and F T e 1 = 0 F is 33 matri; 7 DOF F is singular (rank two)
Why F is useful? l = F T - Suppose F is known - No additional information about the scene and camera is given - Given a point on left image, how can I find the corresponding point on right image?
Why F is useful? F captures information about the epipolar geometry of 2 views + camera parameters MORE IMPORTANTLY: F gives constraints on how the scene changes under view point transformation (without reconstructing the scene!) Powerful tool in: 3D reconstruction Multi-view object/scene matching
O O p p P Estimating F The Eight-Point Algorithm 1 v u p P 1 v u p P 0 F p p T (Hartley, 1995) (Longuet-Higgins, 1981)
Estimating F OPENCV: findfundamentalmat
CS231M Mobile Computer Vision Structure from motion - Cameras - Epipolar geometry - Structure from motion
Structure from motion problem X j 1j M m M 1 mj 2j M 2 From the mn correspondences ij, can we estimate: m projection matrices M i n 3D points X j motion structure
Similarity Ambiguity The scene is determined by the images only up a similarity transformation (rotation, translation and scaling) This is called metric reconstruction Similarity The ambiguity eists even for (intrinsically) calibrated cameras For calibrated cameras, the similarity ambiguity is the only ambiguity [Longuet-Higgins 81]
Similarity Ambiguity It is impossible based on the images alone to estimate the absolute scale of the scene (i.e. house height) http://www.robots.o.ac.uk/~vgg/projects/singleview/models/hut/hutme.wrl
Structure from Motion Ambiguities Projective In the general case (nothing is known) the ambiguity is epressed by an arbitrary affine or projective transformation M j i X j M K i i R i T i H X j M H 1 j j M i X j -1 M H H X i j
Projective Ambiguity R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd edition, 2003
Metric reconstruction (upgrade) The problem of recovering the metric reconstruction from the perspective one is called self-calibration Stratified reconstruction: from perspective to affine from affine to metric
Mobile SFM Intrinsic camera parameters are known or can be calibrated. For calibrated cameras, the similarity ambiguity is the only ambiguity [Longuet-Higgins 81] No need for stratified solution or auto-calibration Similarity Metric reconstruction can be determined if a calibration pattern is used or the absolute size of an known object is given.
Structure-from-Motion Algorithms Algebraic approach (by fundamental matri) Factorization method (by SVD) Bundle adjustment
Algebraic approach (2-view case) 1j M 1 2j M i j i X j M 2 Apply a projective transformation H such that: M H 1 I 0 1 M H A b 1 Canonical perspective cameras 2
Algebraic approach (2-view case) 1. Compute the fundamental matri F from two views (eg. 8 point algorithm) 2. Compute b and A from F Compute b as least sq. solution of F b = 0, with b =1 using SVD; b is an epipole A= [b ] F 3. Use b and A to estimate projective cameras M 0 M [b ]F b 1 I 2 4. Use these cameras to triangulate and estimate points in 3D For details, see CS231A, lecture 7
Structure-from-Motion Algorithms Algebraic approach (by fundamental matri) Factorization method (by SVD) Bundle adjustment
Affine structure from motion (simpler problem) Image World Image From the mn correspondences ij, estimate: m projection matrices M i (affine cameras) n 3D points X j
Affine cameras X Camera matri M for the affine case u v M X 1 AX b; M A b
Centering the data X j ˆ ij Normalize points w.r.t. centroids of measurements from each image ij AX j b ˆ A ij i X j
mn m m n n D ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ 2 1 2 22 21 1 12 11 cameras (2m ) points (n ) A factorization method - factorization Let s create a 2m n data (measurement) matri:
Let s create a 2m n data (measurement) matri: n m mn m m n n X X X A A A D 2 1 2 1 2 1 2 22 21 1 12 11 ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ cameras (2m 3) points (3 n ) The measurement matri D = M S has rank 3 (it s a product of a 2m3 matri and 3n matri) A factorization method - factorization (2m n) M S
2m Factorizing the Measurement Matri = Measurements Motion Structure 3 n 3 n D = MS
2m Factorizing the Measurement Matri Singular value decomposition of D: = D U W V T n n n n n
Factorizing the Measurement Matri Since rank (D)=3, there are only 3 non-zero singular values
Factorizing the Measurement Matri S = structure M = Motion (cameras)
Factorizing the Measurement Matri What is the issue here? D has rank>3 because of: measurement noise affine approimation T Theorem: When D has a rank greater than p, U pwpvp is the best possible rank- p approimation of A in the sense of the Frobenius norm. T A0 U3 D U3W3V3 P W V T 0 3 3
Affine Ambiguity = D M C C -1 S The decomposition is not unique. We get the same D by using any 3 3 matri C and applying the transformations: M MC S C -1 S M S Additional constraints must be enforced to resolve this ambiguity
Reconstruction results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.
Structure-from-Motion Algorithms Algebraic approach (by fundamental matri) Factorization method (by SVD) Bundle adjustment
Bundle adjustment Non-linear method for refining structure and motion Minimizing re-projection error E(M, X) m i1 n j1 D 2, X ij M i j X j? M 1 X j 1j 3j P 1 M 2 X j 2j M 3 X j P 2 P 3
Bundle adjustment Non-linear method for refining structure and motion Minimizing re-projection error E(M, X) m i1 n j1 D 2, X ij M i Advantages Handle large number of views Handle missing data Can leverage standard optimization packaged such as Levenberg-Marquardt Limitations Large minimization problem (parameters grow with number of views) Requires good initial condition Used as the final step of SFM j
Results and applications Courtesy of Oford Visual Geometry Group Lucas & Kanade, 81 Chen & Medioni, 92 Debevec et al., 96 Levoy & Hanrahan, 96 Fitzgibbon & Zisserman, 98 Triggs et al., 99 Pollefeys et al., 99 Kutulakos & Seitz, 99 Levoy et al., 00 Hartley & Zisserman, 00 Dellaert et al., 00 Rusinkiewic et al., 02 Nistér, 04 Brown & Lowe, 04 Schindler et al, 04 Lourakis & Argyros, 04 Colombo et al. 05 Golparvar-Fard, et al. JAEI 10 Pandey et al. IFAC, 2010 Pandey et al. ICRA 2011 Microsoft s PhotoSynth Snavely et al., 06-08 Schindler et al., 08 Agarwal et al., 09 Frahm et al., 10
CS231M Mobile Computer Vision Net lecture: Eample of a SFM pipeline for mobile devices: the VSLAM pipeline