3D Computer Vision Structure from Motion Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1
Structure and Motion Unknown camera viewpoints Reconstruct Scene geometry Camera motion 2
The story so far stereo reconstruction from 2 views Given cameras Epipolar geometry: compute fundamental matrix Correspondence search: 1D search for corresponding points x! x/ along epipolar line l/ = F x Now, structure and motion 3
Example cameras and points 4
Structure and Motion: Problem statement Given 2 (or more) images of a scene, compute the scene structure and the camera motion Assume internal calibration (K, K/) is known Assume scene is rigid Start with 2 views only NB epipolar geometry is not known 5
Outline Image point motion Computing the fundamental matrix 8-point algorithm automation motion from the fundamental matrix More than two views matching estimation Applications 6
First image 7
Second image 8
Second motion example 9
Why use interest points? Compute points to avoid the aperture problem 2D feature 1D feature (edge) uniform 10
The Aperture Problem From left to right, the line appears to have moved in the direction indicated. But 11
The Aperture Problem We open our eyes a little farther, and find that we were wrong. Only the component of motion perpendicular to the line can be determined from local image measurements 12
Interest points computed for each frame Harris corner detector 13
Interest points image one 14
Interest points image two 15
The geometric motion problem Given image point correspondences, xi n xi/,determine R and t x C x/ C/ Rotate and translate camera until stars of rays intersect 16
Outline of structure and motion computation 1. Compute the fundamental matrix F from point correspondences xi! xi/ 2. Compute the cameras (motion) from the fundamental matrix (recall ). Obtain 3. Compute the 3D structure Xi from the cameras P, P/ and point correspondences xi! xi/ (triangulation) 17
What can be computed from point correspondences? Suppose we have computed homogeneous matrix, so can the motion be computed? F is a i.e. the translation can only be determined up to scale. This is a consequence of the depth / speed ambiguity: only the ratio of t and Z can be computed since if the images are unchanged. a large motion of a distant object, and a small motion of a nearby object are indistinguishable (from point motion alone) Summary: the rotation R (3 dof) can be determined completely, but only the translation direction (2 dof) can be determined, not its magnitude 18
How many point correspondences are required? for n points there are 3n unknowns (the 3D position of each point) for 2 views there are 5 unknowns (that are recoverable) each point correspondence gives 4 measurements for n points expect a solution if 4n (3n + 5), i.e. n 5 we will give solutions for n = 8 correspondences (algorithms also exist for n=7). 19
Computing the fundamental matrix The 8 point algorithm 20
Problem statement Given: n corresponding points compute the fundamental matrix F such that Solution 21
For n points A is an n x 9 measurement matrix, and f is the fundamental matrix written as a 9-vector For 8 points, A is an 8 x 9 matrix and f can be computed as the null-vector of A, i.e. f is determined up to scale Note, this solution (and those following) does not require (K, K/) 22
Example: compute F from 8 point correspondences Images from a parallel camera stereo rig epipolar lines y = y/ 2 2 x1-4 -2 x/1 x2 2 4-4 -2 2 x/2 4 x/3 x3-2 -2 just consider first three points 23
24
The 8-point algorithm Least squares solution 25
Automatic computation of the fundamental matrix 26
27
Step 1: interest points Harris corner detector 100 s of points per image 28
Step 2: match points a: proximity - search within disparity window b: cross-correlate on intensity neighbourhoods 29
Step 2a: match points proximity proximity - search within disparity window 30
Step 2b: match points cross-correlate cross-correlate on intensity neighbourhoods 31
Correlation matching results Many wrong matches (10-50%), but enough to compute F 32
Robust line estimation - RANSAC Fit a line to 2D data containing outliers There are two problems 1. a line fit which minimizes perpendicular distance 2. a classification into inliers (valid points) and outliers Solution: use robust statistical estimation algorithm RANSAC (RANdom Sample Consensus) [Fishler & Bolles, 1981] 33
RANSAC robust line estimation Repeat 1. Select random sample of 2 points 2. Compute the line through these points 3. Measure support (number of points within threshold distance of the line) Choose the line with the largest number of inliers Compute least squares fit of line to inliers (regression) 34
35
36
37
38
39
40
41
42
43
Algorithm summary RANSAC robust F estimation Repeat 1. Select random sample of 8 correspondences 2. Compute F 3. Measure support (number of inliers within threshold distance of epipolar line) Choose the F with the largest number of inliers 44
Correlation matching results Many wrong matches (10-50%), but enough to compute F 45
Correspondences consistent with epipolar geometry 46
Computed epipolar geometry 47
Determining cameras from the fundamental matrix 48
Epipolar constraint: Calibrated case X x x [t ( R xʹ )] = 0 x xt E xʹ = 0 with E = [t ]R Essential Matrix (Longuet-Higgins, 1981) The vectors x, t, and Rx are coplanar 49
Epipolar constraint: Calibrated case X x x [t ( R xʹ )] = 0 x xt E xʹ = 0 with E = [t ]R E x is the epipolar line associated with x (l = E x ) ETx is the epipolar line associated with x (l = ETx) E e = 0 and ETe = 0 E is singular (rank two) E has five degrees of freedom 50
Epipolar constraint: Calibrated case X x x [t ( R xʹ )] = 0 x xt E xʹ = 0 with E = [t ]R The skew-symmetric matrix must have two singular values which are equal and another which is zero. The multiplication of the rotation matrix does not change the singular values which means that also the essential matrix has two singular values which are equal and one which is zero. 51
Decomposing the fundamental matrix Form the Essential matrix 52
The four camera solutions The 3D point is only in front of both cameras in one case 53
SVD decomposition of E E = UΣVT [T ] = UZU T R = UWV T or R = UW T V T # 0 1 0 & # 0 1 0 & % ( % ( with W = % 1 0 0 ( and Z = % -1 0 0 ( % 0 0 1 ( % 0 0 0 ( $ ' $ ' 54
Structure and Motion for more than 2 views 55
What is gained by having more than 2 views? 1. The two view ambiguities do not get worse there is an overall scale ambiguity 2. Matching: matches can be verified 3. Estimation: accuracy increased by using more measurements 56
Notation for three or more views For 3 views the cameras are P, P/ and P//, and a 3D point is imaged as x = PX x/ = P/X x// = P// X X P// P x P/ x // x/ C // C C/ For m views, a point Xj is imaged in the i th view as i xij = P Xj 57
Point correspondence over 3 views Given: the cameras P, P/ and P//, and matching points x and x/ Find: the matching point in the third view? x x/??? Algorithm: compute the 3D point from x and x/ and project it into the third view the matching point coincides with the projected point Point constraint is stronger than epipolar constraint (line) 58
Problem statement: structure and motion Given: n matching image points xij over m views i i Find: the cameras P and the 3D points Xj such that xij = P Xj number of parameters for each camera there are 6 parameters for each 3D point there are 3 parameters a total of 6 m + 3 n parameters must be estimated e.g. 50 frames, 1000 points: 3300 unknowns 59
Algorithm for structure and motion images 1 Algorithm 2 3 4 Compute interest points in each image Compute point correspondences between consecutive image pairs Extend and verify correspondences and cameras over image triplets Extend correspondences and cameras over all images Optimize over 60
Application: Augmented reality original sequence with point tracking 61
Augmentation 62
Thank you! Check examples of Match Movers for special effects on the Internet 63