Robust Geometry Estimation from two Images

Robust Geometry Estimation from two Images Carsten Rother 09/12/2016 Computer Vision I: Image Formation Process

Roadmap for next four lectures Computer Vision I: Image Formation Process 09/12/2016 2 Appearance-based Matching (sec. 4.1) Projective Geometry - Basics (sec. 2.1.1-2.1.4) Geometry of a Single Camera (sec 2.1.5, 2.1.6) Camera versus Human Perception The Pinhole Camera Lens effects Geometry of two Views (sec. 7.2) The Homography (e.g. rotating camera) Camera Calibration (3D to 2D Mapping) The Fundamental and Essential Matrix (two arbitrary images) Robust Geometry estimation for two cameras (sec. 6.1.4) Multi-View 3D reconstruction (sec. 7.3-7.4) General scenario From Projective to Metric Space Special Cases

Topic 3: Fundamental/Essential Matrix F/E Computer Vision I: Two-View Geometry 09/12/2016 3 Derive geometrically F/E Calibration: Take measurements (points) to compute F/E How do we do that with a minimal number of points? How do we do that with many points? Can we derive the intrinsic (K) an extrinsic (R, C) parameters from F/E? What can we do with F/E?

Reminder: Matching two Images Computer Vision I: Image Formation Process 09/12/2016 4 Find interest points Find orientated patches around interest points to capture appearance Encode patch appearance in a descriptor Find matching patches according to appearance (similar descriptors) Verify matching patches according to geometry (later lecture) v We will discover in next slides: Seven 3D points defines how other 3D points match between 2 views!

Epipolar Geometry epipol epipol Epipolar line: Constrains the location where a particular point (here p 1 ) from one view can be found in the other. Epipolar lines intersect at the epipoles Computer Vision I: Two-View Geometry 09/12/2016 5

Epipolar Geometry Computer Vision I: Two-View Geometry 09/12/2016 6 epipol Epipolar lines: Intersect at the epipoles In general not parallel epipol

Question Computer Vision I: Image Formation Process 09/12/2016 7 When is the epipol a point in the view of the camera, i.e. image? Answers: 1) Sidewise moving camera 2) Forward moving camera 3) Rotating camera 4) In case 1) and 2) Camera centre Camera motion

Example: Converging Cameras Computer Vision I: Two-View Geometry 09/12/2016 8

Example: Motion Parallel to Camera Computer Vision I: Two-View Geometry 09/12/2016 9 We will use this idea when it comes to stereo matching

Example: Forward Motion Computer Vision I: Two-View Geometry 09/12/2016 10 Epipoles have same coordinate in both images Points move along lines radiating from epipole focus of expansion

The maths behind it: Fundamental/Essential Matrix Computer Vision I: Robst Two-View Geometry 09/12/2016 11 derivation on black board X Camera 0 Camera 1 (R, T)

Computer Vision I: Two-View Geometry 09/12/2016 12 The maths behind it: Fundamental/Essential Matrix The 3 vectors are in same plane (co-planar): 1) T (= C 1 C 0 ) 2) X C 0 3) X C 1 X Set camera matrix: x 0 = K 0 I 0 X and x 1 = K 1 R 1 I C 1 X Hence, C 0 = 0; K 1 0 x 0 = X; RK 1 1 x 1 + C 1 = X (note X = X, 1 T ) The three vectors can be re-writting using x 0, x 1 : 1) T 2) X C 0 = X = K 1 0 x 0 3) X C 1 = RK 1 1 x 1 + C 1 C 1 = RK 1 1 x 1 We know that: K 1 0 x T 0 T RK 1 1 x 1 = 0 which gives: x T 0 K T 0 T RK 1 1 x 1 = 0

The Maths behind it: Fundamental/Essential Matrix Computer Vision I: Two-View Geometry 09/12/2016 13 In an un-calibrated setting (K s not known): x 0 T K 0 T T RK 1 1 x 1 = 0 In short: x 0 T Fx 1 = 0 where F is called the Fundamental Matrix (discovered by Faugeras and Luong 1992, Hartley 1992) In an calibrated setting (K s are known): we use rays: x i = K 1 i x i then we get: x T 0 T Rx 1 = 0 In short: x T 0 Ex 1 = 0 where E is called the Essential Matrix (discovered by Longuet-Higgins 1981)

Fundamental Matrix: Properties Computer Vision I: Two-View Geometry 09/12/2016 14 We have x 0 T Fx 1 = 0 where F is called the Fundamental Matrix It is det F = 0 and F has Rank 2. Hence F has 7 DoF Proof: F = K T 1 0 T RK 1 F has Rank 2 since T has Rank 2 x = 0 x 3 x 2 x 3 0 x 1 x 2 x 1 0 Check: det( x ) = x 3 x 3 0 x 1 x 2 + x 2 x 1 x 3 + x 2 0 = 0

Fundamental Matrix: Properties Computer Vision I: Two-View Geometry 09/12/2016 15 For any two matching points (i.e. they have the same 3D point) we have: x 0 T Fx 1 = 0 Compute epipolar line in camera 1 of a point x 0 : l 1 T = x 0 T F (since l 1 T x 1 = x 0 T Fx 1 = 0) Compute epipolar line in camera 0 of a point x 1 : l 0 = Fx 1 (since x 0 T l 0 = x 0 T Fx 1 = 0) l 1 x 1 camera 1 Camera 0 Camera 1

Fundamental Matrix: Properties Computer Vision I: Robust Two-View Geometry 09/12/2016 16 For any two matching points (i.e. have the same 3D point) we have: x 0 T Fx 1 = 0 Compute e 0 with e 0 T F = 0 T (i.e. left nullspace of F; can be computed with SVD) This is the Epipole e 0. It is: e 0 T Fx 1 = 0 T x 1 = 0 for all points x 1. Hence all lines l 0 for any x 1 : l 0 = Fx 1 go through e 0 T. e 0 x 1 l 0 l 0 x 1 camera 1 camera 1 X Epipole e 1 is right null space of F (Fe 1 = 0) Camera 0 (R, T) Camera 1

How can we compute F (2-view calibration)? Computer Vision I: Two-View Geometry 09/12/2016 17 Each pair of matching points gives one linear constraint x T Fx = 0 in F. For x, x we get: x 1 x 1 x 1 x 2 x 1 x 3 x 2 x 1 x 2 x 2 x 2 x 3 x 3 x 1 x 3 x 2 x 3 x 3... ( here x = x 1, x 2, x 3 T ) Given m 8 matching points (x, x) we can compute the F in a simple way. f 11 f 12 f 13 f 21 f 22 f 23 f 31 f 32 f 33 = 0... 0

How can we compute F (2-view calibration)? Computer Vision I: Two-View Geometry 09/12/2016 18 Method (normalized 8-point algorithm): 1) Take m 8 points 2) Compute T, and condition points: x = Tx; x = T x 3) Assemble A with Af = 0, here A is of size m 9, and f vectorized F 4) Compute f = argmin f Af 2 subject to f 2 = 1. Use SVD to do this. 5) Get F of unconditioned points: T T FT (note: (Tx) T F T x = 0) 4) Make rank F = 2 2 [See HZ page 282]

How to make F Rank 2? Computer Vision I: Two-View Geometry 09/12/2016 19 (Again) Use SVD: Set last singular value σ p 1 to 0 then A has Rank p 1 and not p (assuming A has originally full Rank) Proof: diagonal matrix has Rank p 1 hence A has Rank p 1

Can we compute F with just 7 points? Method (7-point algorithm): 1) Take m = 7 points 2) Assemble A with Af = 0, here A is of size 7 9, and f vectorized F 3) Compute 2D right null space: F 1 and F 2 from last two rows in V T (use the SVD decomposition: A = UDV T ) 4) Choose: F = αf 1 + 1 α F 2 (see comments next slide) 5) Determine α s (either 1 or 3 solutions for α) by using the constraint: det(αf 1 + 1 α F 2 ) = 0.(see comments next slide) Note an 8 th point would determine which of these 3 solutions is the correct one. We will see later that the 7-point algorithm is the best choice for the robust case. Computer Vision I: Robust Two-View Geometry 09/12/2016 20

Comments to previous slide Computer Vision I: Image Formation Process 09/12/2016 21 Step 4) Choose: F = αf 1 + 1 α F 2 The full null-space is given by: F = αf 1 + βf 2. We can say that some norm of F has a fixed value. We are free to say that we want: αf 1 + βf 2 1 (here F 1, F 2 are in vectorised form. Note that this is the same as having F 1, F 2 in matrix form and using the Frobenius norm for matrices) It is: α F 1 + β F 2 = αf 1 + βf 2 αf 1 + βf 2 (triangulation inequality) Hence we want: α F 1 + β F 2 1 Hence we want: α + β 1 (since F 1, F 2 are rows in V T and have norm 1) Hence we can choose: β = 1 α

Comments to previous slide Computer Vision I: Image Formation Process 09/12/2016 22 Step 5) Compute det(αf 1 + 1 α F 2 ) = 0 (This is a cubic polynomial equation for α which has one or three real-value solutions for α)

Computer Vision I: Robust Two-View Geometry 09/12/2016 23 Can we get K s, R, T from F? Assume we have Can we get out K 1, R, K 0, T? F = x 0 T K 0 T T RK 1 1 F has 7 DoF K 1, R, K 0, T have together 16 DoF Not directly possible. Only with assumptions such as: External constraints Camera does not change over several frames (This is a challenging topic (more than 10 years of research!) called auto-calibration or self-calibration. We look at it in detail in next lecture.)

Coming back to Essential Matrix Computer Vision I: Robust Two-View Geometry 09/12/2016 24 In a calibrated setting (K s are known): we use rays: x i = K 1 i x i then we get: x T 0 T Rx 1 = 0 In short: x T 0 Ex 1 = 0 where E is called the Essential Matrix E has 5 DoF, since T has 3DoF, R 3DoF (note overall scale of T is unknown) X E has also Rank 2 (R, T)

Question Computer Vision I: Image Formation Process 09/12/2016 25 What is the minimal number of matching points to compute E? Answer: 1) 3 2) 4 3) 5 4) 6 5) 7 6) 8

Computer Vision I: Robust Two-View Geometry 09/12/2016 26 How to compute E We have: x 0 T Ex 1 = 0 Given m 8 matching run 8-point algorithm (as for F) Given m = 7 run 7-point algorithm and get 1 or 3 solutions Given m = 5 run 5-point algorithm to get up to 10 solutions. This is the minimal case since E has 5 DoF. 5-point algorithm history: Kruppa, Zur Ermittlung eines Objektes aus zwei Perspektiven mit innere Orientierung, Sitz.-Ber. Akad. Wiss., Wien, Math.-Naturw. Kl., Abt. IIa, (122):1939-1948, 1913. found 11 solutions M. Demazure, Sur deux problemes de reconstruction, Technical Report Rep. 882, INRIA, Les Chesnay, France, 1988 showed that only up to 10 valid solutions exist D. Nister, An Efficient Solution to the Five-Point Relative Pose Problem, IEEE Conference on Computer Vision and Pattern Recognition, Volume 2, pp. 195-202, 2003 fast method which gives up to 10 solutions of a 10 degree polynomial

Computer Vision I: Robust Two-View Geometry 09/12/2016 27 Can we get R, T from E? Assume we have E = T R, can we get out R, T? E has 5 DoF R, T have together 6 DoF Yes: We can get T up to scale, and a unique R X (R, T)

Computer Vision I: Image Formation Process 09/12/2016 28 How to get a unique T, R? 1) Compute T Note: E has rank 2, and T is in the left nullspace of E since This means that an SVD of E must look like: T 1 0 0 v 0 E = UDV T = u 0 u 1 T T 0 1 0 v 1 0 0 0 2) Compute 4 possible solutions for R T t T = (0,0,0) This fixes the norm of T to 1, and correct sign (+/ T) is done in step 3 R 1,2 =+/ UR T 90 o where E = UDV T, R 90 = 3) Derive the unique solution for R and sign for T: 1) det(r) = 1 2) Reconstruct a 3D point and choose the solution where it lies in front of the two cameras. (In robust case: Take solution where most ( 5) points lie in front of the cameras) v 2 T V T T ; R 3,4 =+/ UR 90 ov T (see derivation HZ page 259; Szeliski page 310) 0 1 0 1 0 0 0 0 0, R 90 = 0 1 0 1 0 0 0 0 0

Visualization of the 4 solutions for R, T Computer Vision I: Robust Two-View Geometry 09/12/2016 29 This is the correct solution, since point is in front of cameras! T T T T The property that points must lie in front of the camera is called Chirality (Hartley 1998)

What can we do with F, E? Computer Vision I: Robust Two-View Geometry 09/12/2016 30 F/E encode the geometry of 2 cameras Can be used to find matching points (dense or sparse) between two views F/E encodes the essential information to do 3D reconstruction

Fundamental and Essential Matrix: Summary Computer Vision I: Robust Two-View Geometry 09/12/2016 31 Derive geometrically F, E F for un-calibrated cameras E for calibrated cameras Calibration: Take measurements (points) to compute F, E F minimum of 7 points -> 1 or 3 real solutions. F many points -> least square solution with SVD E minimum of 5 points -> 10 solutions E many points -> least square solution with SVD Can we derive the intrinsic (K) an extrinsic (R, T) parameters from F, E? -> F next lecture -> E yes can be done (translation up to scale) What can we do with F, E? -> essential tool for 3D reconstruction

Roadmap for next four lectures Computer Vision I: Image Formation Process 09/12/2016 32 Appearance-based Matching (sec. 4.1) Projective Geometry - Basics (sec. 2.1.1-2.1.4) Geometry of a Single Camera (sec 2.1.5, 2.1.6) Camera versus Human Perception The Pinhole Camera Lens effects Geometry of two Views (sec. 7.2) The Homography (e.g. rotating camera) Camera Calibration (3D to 2D Mapping) The Fundamental and Essential Matrix (two arbitrary images) Robust Geometry estimation for two cameras (sec. 6.1.4) Multi-View 3D reconstruction (sec. 7.3-7.4) General scenario From Projective to Metric Space Special Cases

In last lecture we asked (for rotating camera) Computer Vision I: Robust Two-View Geometry 09/12/2016 33 Question 1: If a match is completely wrong then argmin h Ah is a bad idea Question 2: If a match is slightly wrong then argmin h Ah might not be perfect. Better might be a geometric error: argmin h Hx x

Robust model fitting Computer Vision I: Robust Two-View Geometry 09/12/2016 34 RANSAC: Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography Martin A. Fischler and Robert C. Bolles (June 1981). [Side credits: Dimitri Schlesinger]

Computer Vision I: Robust Two-View Geometry 09/12/2016 35 Example Tasks Search for a straight line in a clutter of points y x i.e. search for parameters and for the model given a training set

Example Tasks Computer Vision I: Robust Two-View Geometry 09/12/2016 36 Estimate the fundamental matrix i.e. parameters satisfying given a training set of correspondent pairs For Homography of rotating camera we have: x l i H = x r i

Two sources of errors Computer Vision I: Robust Two-View Geometry 09/12/2016 37 1. Noise: the coordinates deviate from the true ones according to some rule (probability) the father away the less confident 2. Outliers: the data have nothing in common with the model to be estimated Ignoring outliers can lead to a wrong estimation. The way out: find outliers explicitly, estimate the model from inliers only

Computer Vision I: Robust Two-View Geometry 09/12/2016 38 Task formulation Let be the input space and be the parameter space. The training data consist of data points Let an evaluation function of a point with a model. Straight line Fundamental matrix be given that checks the consistency (Inlier) f x 1, x 2, a, b = 0 if ax 1 + bx 2 1 t (e. g. 0.1) 1 otherwise (Outlier) f x l, x r, F = 0 if x l t Fx r t (e. g. 0.1) 1 otherwise (Outlier) (Inlier) The task is to find the parameter that is consistent with the majority of the data points: y = argmin y i f(x i, y) x 2 x 1

First Idea: 2D Line estimation Computer Vision I: Robust Two-View Geometry 09/12/2016 39 Question: How to compute: y = argmin y A naïve approach: enumerate all parameter values know as Hough Transform (very time consuming and not possible at all for many free parameters (i.e. high dimensional parameter space) i f(x i, y) Image with points y 100 Θ r x Encode all lines with two parameters (r, Θ) Hough transform Goal: Find point in the figure where most lines meet Image with just 3 points 0 All lines that go through these 3 points

First Idea: 2D Line estimation Computer Vision I: Robust Two-View Geometry 09/12/2016 40 Hough transform r 0 1 0 0 2 1 0 0 2 0 3 0 0 1 0 0 2 3 1 0 2 0 0 1 3 3 3 4 4 1 3 0 3 3 4 3 1 0 1 0 9 10 9 3 1 3 1 0 20 200 25 4 2 3 0 0 50 35 5 10 5 2 4 4 Θ Accumulator space number of inliers (sketched) Observation: The parameter space have very low counts Idea: do not try all values but only some of them. Which ones?

Data-driven Oracle Computer Vision I: Robust Two-View Geometry 09/12/2016 41 An Oracle is a function that predicts a parameter given the minimum amount of data points (d-tuple): Examples: Line can be estimated from d = 2 points Fundamental matrix from d = 7 or 8 points correspondences Homography can be computed from d = 4 points correspondences First Idea: Do not enumerate all parameter values but all d-tuples of data points That is then n d number of tests, e.g. n 2 for lines (with n points) The optimization is performed over a discrete domain. y = argmin y i f(x i, y) Second Idea: Do not try all subsets, but sample them randomly

Computer Vision I: Robust Two-View Geometry 09/12/2016 42 RANSAC [Random Sample Consensus, Fischler and Bolles 1981] Basic RANSAC method: Can be done in parallel! Repeat many times select d-tuple, e.g. (x 1, x 2 ) for lines compute parameter(s) y, e.g. line y = g x 1, x 2 evaluate f y = i f(x i, y) If f y f y set y = y and keep value f y Sometimes we get a discrete set of intermediate solutions y. For example for F-matrix computation from 7 points we have up to 3 solutions. The we simply evaluate f y for all solutions. How many times do you have to sample in order to reliable estimate the true model?

Computer Vision I: Robust Two-View Geometry 09/12/2016 43 Convergence Observation: it is necessary to sample any in order to estimate the model correctly. -tuple of inliers just once 200 800 Let ε be the probability of outliers. The probability to sample d inliers is 1 ε d (here 0.8 2 = 0.64) The probability of a wrong d-tuple is (here 0.36) The probability to sample n times only wrong tuples is (here 0.36 20 = 0.0000000013) 1000 points overall ε 0.2 The probability to sample the right tuple at least once during the process (i.e. to estimate the correct model according to assumptions) (here 99.999999866 %)

Convergence probabity Computer Vision I: Robust Two-View Geometry 09/12/2016 44 ε (outliers) n

Comment Computer Vision I: Image Formation Process 09/12/2016 45 In our derivation for p = we were slightly optimistic since degenerate inliers may give rise to bad lines + + However, these bad lines have little support wrt number of inliers We also define later a refinement procedure which can correct such bad lines

The choice of the oracle is crucial Example the fundamental matrix: a) 8-point algorithm Probability: 70% (n = 300; ε = 0.5; d = 8) b) 7-point algorithm Probability: 90% (n = 300; ε = 0.5; d = 7) Number of trials to get p% accuracy (here 99%) d ε p = 1 1 1 ε d n n = log 1 p log(1 1 ε d ) Computer Vision I: Robust Two-View Geometry 09/12/2016 46

The choice of evaluation function is crucial Computer Vision I: Image Formation Process 09/12/2016 47 Evaluation function: f x 1, x 2, a, b = 1 if ax 1 + bx 2 1 t (e. g. 0.1) 0 otherwise Algebraic error: Is a measure that has no geometric meaning Example: For a line: d x 1, x 2, a, b = ax 1 + bx 2 1 For a homograpy: d x 1, x 2, a, b = Ah (where A is 1 8 matrix derived as above For F-matrix: d x l, x r, F = x l t Fx r error function Geometric error: Is a measure that considers a distance in image plane Example: For a line: d x 1, x 2, a, b = d( x 1, x 2, l a, b ) Line: l a, b (d is Euclidean distance between point to line) d( x 1, x 2, l a, b ) x 1, x 2 Geometric error: for homography and F-matrix to come

The choice of confidence interval is crucial Computer Vision I: Robust Two-View Geometry 09/12/2016 48 Examples: Large confidence, right model, 2 outliers Large confidence, wrong model, 2 outliers Small confidence, Almost all points are outliers (independent of the model)

Question Computer Vision I: Image Formation Process 09/12/2016 49 Can you think of an algorithm which adapts the number of samples n itself? Answers: 1) Yes 2) No

Extension: Adaptive number of samples n Computer Vision I: Robst Two-View Geometry 09/12/2016 50 Choose n in an adaptive way: 1) Fix p = 99.9% (very large value) 2) Set n = and ε = 0.9 (large value for outlier) 3) During RANSAC adapt n, ε : 1) Re-compute ε from current best solution ε = outliers / all points 2) Re-Compute new n: n = log 1 p log(1 1 ε d )

MSAC (M-Estimator SAmple Consensus) Computer Vision I: Robust Two-View Geometry 09/12/2016 51 If a data point is an inlier the penalty is not 0, but it depends on the distance to the model. Example for the fundamental matrix: becomes f x l, x r, F = 0 if x l t Fx r t (e. g. 0.1) 1 otherwise f x l, x r, F = x l t Fx r if x t l Fx r t (e. g. 0.1) t otherwise the task is to find the model with the minimum average penalty robust function t t) [P.H.S. Torr und A. Zisserman 1996]

Randomized RANSAC Computer Vision I: Robust Two-View Geometry 09/12/2016 52 Evaluation of a hypothesis, i.e. is often time consuming Randomized RANSAC: instead of checking all data points 1. Sample m points from 2. If all of them are inliers, check all others as before, i.e. evaluate hypothesis. But, if there is at least one bad point, among m, reject the hypothesis It is possible that good hypotheses are rejected. However it saves time (bad hypotheses are recognized fast) one can sample more often overall often profitable (depends on application).