CS231M Mobile Computer Vision Structure from motion

Similar documents
Structure from motion

Lecture 9: Epipolar Geometry

Lecture 10: Multi-view geometry

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

Structure from motion

Structure from Motion CSC 767

Lecture 6 Stereo Systems Multi- view geometry Professor Silvio Savarese Computational Vision and Geometry Lab Silvio Savarese Lecture 6-24-Jan-15

Structure from Motion

BIL Computer Vision Apr 16, 2014

Lecture 10: Multi view geometry

Lecture 5 Epipolar Geometry

Lecture 6 Stereo Systems Multi-view geometry

Recovering structure from a single view Pinhole perspective projection

Structure from Motion

Structure from Motion

Structure from motion

Epipolar geometry. x x

CS231A Course Notes 4: Stereo Systems and Structure from Motion

Unit 3 Multiple View Geometry

Structure from Motion

C280, Computer Vision

CS 532: 3D Computer Vision 7 th Set of Notes

calibrated coordinates Linear transformation pixel coordinates

Multi-view geometry problems

Structure from Motion and Multi- view Geometry. Last lecture

Computer Vision I - Algorithms and Applications: Multi-View 3D reconstruction

Lecture'9'&'10:'' Stereo'Vision'

The end of affine cameras

EE290T : 3D Reconstruction and Recognition

Structure from motion

Week 2: Two-View Geometry. Padua Summer 08 Frank Dellaert

55:148 Digital Image Processing Chapter 11 3D Vision, Geometry

Two-view geometry Computer Vision Spring 2018, Lecture 10

Stereo and Epipolar geometry

Multi-View Geometry Part II (Ch7 New book. Ch 10/11 old book)

Epipolar Geometry and Stereo Vision

Epipolar Geometry and Stereo Vision

CS231A Midterm Review. Friday 5/6/2016

3D Photography: Epipolar geometry

Structure from Motion

CS 231A Computer Vision (Winter 2015) Problem Set 2

Reminder: Lecture 20: The Eight-Point Algorithm. Essential/Fundamental Matrix. E/F Matrix Summary. Computing F. Computing F from Point Matches

Last lecture. Passive Stereo Spacetime Stereo

Camera Geometry II. COS 429 Princeton University

Structure from Motion. Introduction to Computer Vision CSE 152 Lecture 10

Part I: Single and Two View Geometry Internal camera parameters

Epipolar Geometry Prof. D. Stricker. With slides from A. Zisserman, S. Lazebnik, Seitz

Structure from Motion

Computational Optical Imaging - Optique Numerique. -- Single and Multiple View Geometry, Stereo matching --

A Factorization Method for Structure from Planar Motion

Announcements. Stereo

But First: Multi-View Projective Geometry

3D reconstruction class 11

Multiple View Geometry. Frank Dellaert

Two-View Geometry (Course 23, Lecture D)

3D Reconstruction from Two Views

3D Reconstruction with two Calibrated Cameras

CS 2770: Intro to Computer Vision. Multiple Views. Prof. Adriana Kovashka University of Pittsburgh March 14, 2017

Computer Vision Lecture 20

CS 664 Structure and Motion. Daniel Huttenlocher

Robust Geometry Estimation from two Images

CSE 252B: Computer Vision II

Computer Vision Lecture 17

Transformations Between Two Images. Translation Rotation Rigid Similarity (scaled rotation) Affine Projective Pseudo Perspective Bi linear

Stereo II CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz

Computer Vision Lecture 17

Index. 3D reconstruction, point algorithm, point algorithm, point algorithm, point algorithm, 263

Announcements. Stereo

Fundamental Matrix. Lecture 13

Index. 3D reconstruction, point algorithm, point algorithm, point algorithm, point algorithm, 253

Undergrad HTAs / TAs. Help me make the course better! HTA deadline today (! sorry) TA deadline March 21 st, opens March 15th

Epipolar Geometry class 11

Computer Vision I. Announcement. Stereo Vision Outline. Stereo II. CSE252A Lecture 15

Stereo Vision. MAN-522 Computer Vision

Stereo CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz

Machine vision. Summary # 11: Stereo vision and epipolar geometry. u l = λx. v l = λy

3D Computer Vision. Structure from Motion. Prof. Didier Stricker

arxiv: v1 [cs.cv] 28 Sep 2018

Camera Calibration. Schedule. Jesus J Caban. Note: You have until next Monday to let me know. ! Today:! Camera calibration

1 Projective Geometry

Lecture 8.2 Structure from Motion. Thomas Opsahl

Multiple View Geometry in Computer Vision

Multiple Views Geometry

Epipolar Geometry and Stereo Vision

CS 231A: Computer Vision (Winter 2018) Problem Set 2

Perception and Action using Multilinear Forms

Stereo. 11/02/2012 CS129, Brown James Hays. Slides by Kristen Grauman

CS201 Computer Vision Camera Geometry

Vision Review: Image Formation. Course web page:

Lecture 9 & 10: Stereo Vision

EECS 442: Final Project

Multiple View Geometry

Computational Optical Imaging - Optique Numerique. -- Multiple View Geometry and Stereo --

Structure from Motion. Lecture-15

Epipolar Geometry and the Essential Matrix

Today. Stereo (two view) reconstruction. Multiview geometry. Today. Multiview geometry. Computational Photography

Computing F class 13. Multiple View Geometry. Comp Marc Pollefeys

Projective 2D Geometry

Image correspondences and structure from motion

Computer Vision I - Robust Geometry Estimation from two Cameras

Transcription:

CS231M Mobile Computer Vision Structure from motion - Cameras - Epipolar geometry - Structure from motion

Pinhole camera Pinhole perspective projection f o f = focal length o = center of the camera

z y f y' z f ' y P z y P Pinhole camera Derived using similar triangles ) z y,f z (f z) y,, (

From retina plane to images Piels, bottom-left coordinate systems

From retina plane to images y c c

Converting to piels y y c 1. Off set C=[c, c y ] c y (, y, z) (f c, f c y ) z z

Converting to piels y y c c 1. Off set 2. From metric to piels y (, y, z) (f k c, f l c y ) z z C=[c, c y ] Units: k,l : piel/m f : m Non-square piels, : piel

Camera Matri y y c y (, y, z) ( c, c y z z ) c Matri form? C=[c, c y ]

Homogeneous coordinates For details see lecture on transformations in CS131A homogeneous image coordinates homogeneous scene coordinates Converting from homogeneous coordinates

Camera Matri ) c z y, c z ( z) y,, ( y 1 0 1 0 0 0 0 0 0 ' z y c c z z c y z c P y y y c y c C=[c, c y ] Camera matri K

Camera Skew 1 0 1 0 0 0 sin 0 0 cot ' z y c c P y y c y c C = [c, c y ] P P M ' P I K 0 K has 5 degrees of freedom!

World reference system R,T j w k w O w i w The mapping so far is defined within the camera reference system What if an object is represented in the world reference system

World reference system R,T j w P k w O w P i w In 4D homogeneous coordinates: P R T 1 0 4 4 P w I P P K 0 Internal parameters R 0 T 1 ' K I 0 Pw 44 K Eternal parameters R M T P w

Properties of Projection Points project to points Lines project to lines Distant objects look smaller

Properties of Projection Angles are not preserved Parallel lines meet! Vanishing point

Camera Calibration Calibration rig j C P' KI 0 P K R T P w P 1 P n with known positions in [O w,i w,j w,k w ] p 1, p n known positions in the image Goal: compute intrinsic and etrinsic parameters

Camera Calibration http://docs.opencv.org/_downloads/camera_calibration.cpp Calibration rig image j C P' KI 0 P K R T P w P 1 P n with known positions in [O w,i w,j w,k w ] p 1, p n known positions in the image Goal: compute intrinsic and etrinsic parameters

CS231M Mobile Computer Vision Structure from motion - Cameras - Epipolar geometry - Structure from motion

Can we recover the structure from a single view? Pinhole perspective projection P p O w Calibration rig Scene C Camera K Why is it so difficult? Intrinsic ambiguity of the mapping from 3D to image (2D)

Can we recover the structure from a single view? Intrinsic ambiguity of the mapping from 3D to image (2D) Courtesy slide S. Lazebnik

Two eyes help! X? K =known 2 1 R, T O 2 O 1 This is called triangulation K =known

Structure from motion problem X j 1j M m M 1 mj 2j M 2 Given m images of n fied 3D points ij = M i X j, i = 1,, m, j = 1,, n

Structure from motion problem X j 1j M m M 1 mj 2j M 2 From the mn correspondences ij, can we estimate: m projection matrices M i n 3D points X j motion structure

Study relationship between X, 1 and 2 X K =known 2 1 R, T O 2 O 1 Epipolar geometry! K =known

Epipolar geometry X 1 2 e 1 e 2 Epipolar Plane Epipoles e 1, e 2 Baseline Epipolar Lines O 1 O 2 = intersections of baseline with image planes = projections of the other camera center

Epipolar geometry X For details see CS131A Lecture 9 1 2 e 1 e 2 Epipolar Plane Epipoles e 1, e 2 Baseline Epipolar Lines O 1 O 2 = intersections of baseline with image planes = projections of the other camera center

Eample: Converging image planes e e

Eample: Parallel image planes X e 1 1 2 e 2 O 1 O 2 Baseline intersects the image plane at infinity Epipoles are at infinity Epipolar lines are parallel to ais

Eample: Parallel Image Planes e at infinity e at infinity

Why are epipolar constraints useful? - Two views of the same object - Suppose I know the camera positions and camera matrices - Given a point on left image, how can I find the corresponding point on right image?

Why are epipolar constraints useful? X O 1 O 2

Why are epipolar constraints useful? - Two views of the same object - Suppose I know the camera positions and camera matrices - Given a point on left image, how can I find the corresponding point on right image?

Essential matri P p p O R, T O Assume camera matrices are known p T E E p T R 0 E = essential matri (Longuet-Higgins, 1981)

b a b a ] [ 0 0 0 z y y z y z b b b a a a a a a Cross product as matri multiplication

Epipolar Constraint P p p O R, T O p T F K F p 0 T 1 T R K F = Fundamental Matri (Faugeras and Luong, 1992)

Epipolar Constraint p T F p 1 2 0 P p 1 p 2 e 1 e 2 O 1 O 2 F p 2 is the epipolar line associated with p 2 (l 1 = F p 2 ) F T p 1 is the epipolar line associated with 1 (l 2 = F T p 1 ) F e 2 = 0 and F T e 1 = 0 F is 33 matri; 7 DOF F is singular (rank two)

Why F is useful? l = F T - Suppose F is known - No additional information about the scene and camera is given - Given a point on left image, how can I find the corresponding point on right image?

Why F is useful? F captures information about the epipolar geometry of 2 views + camera parameters MORE IMPORTANTLY: F gives constraints on how the scene changes under view point transformation (without reconstructing the scene!) Powerful tool in: 3D reconstruction Multi-view object/scene matching

O O p p P Estimating F The Eight-Point Algorithm 1 v u p P 1 v u p P 0 F p p T (Hartley, 1995) (Longuet-Higgins, 1981)

Estimating F OPENCV: findfundamentalmat

CS231M Mobile Computer Vision Structure from motion - Cameras - Epipolar geometry - Structure from motion

Structure from motion problem X j 1j M m M 1 mj 2j M 2 From the mn correspondences ij, can we estimate: m projection matrices M i n 3D points X j motion structure

Similarity Ambiguity The scene is determined by the images only up a similarity transformation (rotation, translation and scaling) This is called metric reconstruction Similarity The ambiguity eists even for (intrinsically) calibrated cameras For calibrated cameras, the similarity ambiguity is the only ambiguity [Longuet-Higgins 81]

Similarity Ambiguity It is impossible based on the images alone to estimate the absolute scale of the scene (i.e. house height) http://www.robots.o.ac.uk/~vgg/projects/singleview/models/hut/hutme.wrl

Structure from Motion Ambiguities Projective In the general case (nothing is known) the ambiguity is epressed by an arbitrary affine or projective transformation M j i X j M K i i R i T i H X j M H 1 j j M i X j -1 M H H X i j

Projective Ambiguity R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd edition, 2003

Metric reconstruction (upgrade) The problem of recovering the metric reconstruction from the perspective one is called self-calibration Stratified reconstruction: from perspective to affine from affine to metric

Mobile SFM Intrinsic camera parameters are known or can be calibrated. For calibrated cameras, the similarity ambiguity is the only ambiguity [Longuet-Higgins 81] No need for stratified solution or auto-calibration Similarity Metric reconstruction can be determined if a calibration pattern is used or the absolute size of an known object is given.

Structure-from-Motion Algorithms Algebraic approach (by fundamental matri) Factorization method (by SVD) Bundle adjustment

Algebraic approach (2-view case) 1j M 1 2j M i j i X j M 2 Apply a projective transformation H such that: M H 1 I 0 1 M H A b 1 Canonical perspective cameras 2

Algebraic approach (2-view case) 1. Compute the fundamental matri F from two views (eg. 8 point algorithm) 2. Compute b and A from F Compute b as least sq. solution of F b = 0, with b =1 using SVD; b is an epipole A= [b ] F 3. Use b and A to estimate projective cameras M 0 M [b ]F b 1 I 2 4. Use these cameras to triangulate and estimate points in 3D For details, see CS231A, lecture 7

Structure-from-Motion Algorithms Algebraic approach (by fundamental matri) Factorization method (by SVD) Bundle adjustment

Affine structure from motion (simpler problem) Image World Image From the mn correspondences ij, estimate: m projection matrices M i (affine cameras) n 3D points X j

Affine cameras X Camera matri M for the affine case u v M X 1 AX b; M A b

Centering the data X j ˆ ij Normalize points w.r.t. centroids of measurements from each image ij AX j b ˆ A ij i X j

mn m m n n D ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ 2 1 2 22 21 1 12 11 cameras (2m ) points (n ) A factorization method - factorization Let s create a 2m n data (measurement) matri:

Let s create a 2m n data (measurement) matri: n m mn m m n n X X X A A A D 2 1 2 1 2 1 2 22 21 1 12 11 ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ cameras (2m 3) points (3 n ) The measurement matri D = M S has rank 3 (it s a product of a 2m3 matri and 3n matri) A factorization method - factorization (2m n) M S

2m Factorizing the Measurement Matri = Measurements Motion Structure 3 n 3 n D = MS

2m Factorizing the Measurement Matri Singular value decomposition of D: = D U W V T n n n n n

Factorizing the Measurement Matri Since rank (D)=3, there are only 3 non-zero singular values

Factorizing the Measurement Matri S = structure M = Motion (cameras)

Factorizing the Measurement Matri What is the issue here? D has rank>3 because of: measurement noise affine approimation T Theorem: When D has a rank greater than p, U pwpvp is the best possible rank- p approimation of A in the sense of the Frobenius norm. T A0 U3 D U3W3V3 P W V T 0 3 3

Affine Ambiguity = D M C C -1 S The decomposition is not unique. We get the same D by using any 3 3 matri C and applying the transformations: M MC S C -1 S M S Additional constraints must be enforced to resolve this ambiguity

Reconstruction results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Structure-from-Motion Algorithms Algebraic approach (by fundamental matri) Factorization method (by SVD) Bundle adjustment

Bundle adjustment Non-linear method for refining structure and motion Minimizing re-projection error E(M, X) m i1 n j1 D 2, X ij M i j X j? M 1 X j 1j 3j P 1 M 2 X j 2j M 3 X j P 2 P 3

Bundle adjustment Non-linear method for refining structure and motion Minimizing re-projection error E(M, X) m i1 n j1 D 2, X ij M i Advantages Handle large number of views Handle missing data Can leverage standard optimization packaged such as Levenberg-Marquardt Limitations Large minimization problem (parameters grow with number of views) Requires good initial condition Used as the final step of SFM j

Results and applications Courtesy of Oford Visual Geometry Group Lucas & Kanade, 81 Chen & Medioni, 92 Debevec et al., 96 Levoy & Hanrahan, 96 Fitzgibbon & Zisserman, 98 Triggs et al., 99 Pollefeys et al., 99 Kutulakos & Seitz, 99 Levoy et al., 00 Hartley & Zisserman, 00 Dellaert et al., 00 Rusinkiewic et al., 02 Nistér, 04 Brown & Lowe, 04 Schindler et al, 04 Lourakis & Argyros, 04 Colombo et al. 05 Golparvar-Fard, et al. JAEI 10 Pandey et al. IFAC, 2010 Pandey et al. ICRA 2011 Microsoft s PhotoSynth Snavely et al., 06-08 Schindler et al., 08 Agarwal et al., 09 Frahm et al., 10

CS231M Mobile Computer Vision Net lecture: Eample of a SFM pipeline for mobile devices: the VSLAM pipeline