Multiview Stereo COSC450 Lecture 8
Stereo Vision So Far
- Stereo and epipolar geometry
  - The fundamental matrix captures the geometry (8-point algorithm)
  - The essential matrix, with calibrated cameras (5-point algorithm)
  - Intersect rays to recover 3D structure
- Errors and uncertainty
  - Rays don't intersect: use the point of closest approach
  - Outliers upset estimation: use RANSAC
[Figure: two-view epipolar geometry (world axes X, Y, Z) with the epipolar constraint $\mathbf{x}'^\top F \mathbf{x} = 0$]
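When the two back-projected rays do not meet, one common choice is the midpoint of the shortest segment joining them. This is assumed here, because the lecture only says "closest approach". Below is a minimal NumPy sketch in which each ray is a camera centre plus a direction. The function name `triangulate_midpoint` is hypothetical.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of closest approach between rays c1 + s1*d1 and c2 + s2*d2.

    c1, c2 are camera centres and d1, d2 ray directions (any length).
    Returns the 3D midpoint and the gap between the rays at closest approach.
    """
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    # The segment joining the closest points is perpendicular to both rays,
    # which gives two linear equations in s1 and s2.
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s1, s2 = np.linalg.solve(A, b)   # singular if the rays are parallel
    p1 = c1 + s1 * d1
    p2 = c2 + s2 * d2
    return (p1 + p2) / 2.0, float(np.linalg.norm(p1 - p2))

# Skew rays: the x-axis, and a vertical line through (0, 1, z)
mid, gap = triangulate_midpoint([0, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1])
```

The returned gap is a useful sanity check, because a large gap suggests a bad match. Parallel rays make the 2×2 system singular, so a real implementation would test for that case first.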
Multi-view Stereo
- Can use more than two images
  - Multiple camera rigs
  - A single moving camera
  - Moving multi-camera systems
- Can reconstruct larger areas
- Can resolve more detail
- Can use two-view methods
  - Fundamental matrix between each pair
  - But each pairwise reconstruction has its own arbitrary scale, and these scales are not independent
  - Non-overlapping views are a problem
- Incremental approaches are common
  - Start with two cameras
  - Recover motion and structure
  - Determine a third camera's pose
  - Recover more structure
  - Repeat until done
- Incremental multiview problems
  - Determining the order of reconstruction
  - Recovering pose from 2D-3D matches
  - Stopping errors from accumulating and causing drift
Determining Reconstruction Order
- Which frames should we start with?
  - They should have many matching features
  - They should have good geometry (e.g. a wide enough baseline)
- Choose the pair with the most matches?
  - With $n$ images there are $O(n^2)$ pairs
  - Matching can be expensive
- Can use image search techniques for large $n$
  - Represent images as bags of words
  - Find nearest neighbours for all images in about $O(n \log n)$ once a kd-tree is built
- Once the first pair is done, what next?
  - We have some 3D points
  - We'll need many 2D-3D matches
  - Choose an image with many matches to the first pair
  - Again, by direct matching or with a kd-tree
- This repeats in a cycle
  - Determine the pose of the new image
  - Compute new 3D structure
  - Update existing 3D points
  - Can add multiple images at once
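To make the bag-of-words idea concrete, here is a small sketch that scores every pair of images by the cosine similarity of their visual-word histograms. It assumes each image has already been reduced to a list of integer word IDs (building the vocabulary is not shown). It also compares all pairs by brute force, which is the $O(n^2)$ approach. For large $n$, the kd-tree search on the slide replaces the all-pairs product. The function names are hypothetical.

```python
import numpy as np

def bow_vectors(word_ids_per_image, vocab_size):
    """L2-normalised visual-word histogram for each image (word IDs < vocab_size)."""
    H = np.zeros((len(word_ids_per_image), vocab_size))
    for i, words in enumerate(word_ids_per_image):
        # Count how often each visual word occurs in image i
        H[i] = np.bincount(np.asarray(words, dtype=np.int64), minlength=vocab_size)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return H / np.where(norms > 0, norms, 1.0)

def best_initial_pair(word_ids_per_image, vocab_size):
    """Pair (i, j), i < j, with the highest cosine similarity, and that similarity."""
    H = bow_vectors(word_ids_per_image, vocab_size)
    S = H @ H.T                      # all pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)     # an image must not pair with itself
    i, j = np.unravel_index(np.argmax(S), S.shape)
    return (int(min(i, j)), int(max(i, j))), float(S[i, j])

images = [[0, 0, 1, 2], [3, 4, 5], [0, 1, 2, 2], [5, 5, 4]]
pair, score = best_initial_pair(images, vocab_size=6)
```

Practical systems usually also weight the words (for example with tf-idf) and then verify the top-ranked pairs by actual feature matching. Both steps are omitted here.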
Perspective-n-Point Pose
- Can use 2D-3D matches directly
  - There are 6 unknowns ($R$, $\mathbf{t}$)
  - Each 2D-3D match gives
    $k \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid \mathbf{t}] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$
  - We want to determine $R$ and $\mathbf{t}$
- How many matches do we need?
  - Six unknowns for $R$ and $\mathbf{t}$
  - Each point adds three equations
  - But also one unknown ($k$)
- If we have $n$ matching points
  - We have $6 + n$ unknowns
  - And we get $3n$ equations
  - We need $3n \ge 6 + n$, so at least $n = 3$ matches are needed
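The per-match equation can be checked numerically. This sketch evaluates $K[R \mid \mathbf{t}]$ on a 3D point and shows that the third row fixes the unknown $k$. The function name `project` is hypothetical. Dividing $k$ out leaves two equations per match on the six pose parameters. This is another way to see that $2n \ge 6$, i.e. $n \ge 3$.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D point X with K[R | t]; return pixel (u, v) and the scale k."""
    # k * [u, v, 1]^T = K [R | t] [x, y, z, 1]^T = K (R X + t)
    h = K @ (R @ np.asarray(X, dtype=float) + np.asarray(t, dtype=float))
    k = h[2]                 # third row fixes k (the depth when K's last row is [0, 0, 1])
    return h[:2] / k, k

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
uv, k = project(K, np.eye(3), np.zeros(3), [1.0, 2.0, 4.0])
```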
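Perspective-n-Point Pose
- This is a non-linear problem
  - Homogeneous points (the unknown scale $k$)
  - The rotation matrix (constrained to be orthonormal)
- The geometry is simpler
  - We know the 3D points $A$, $B$, $C$
  - We know their projections $a$, $b$, $c$
  - The camera is at some unknown point $P$
  - This defines a tetrahedron $PABC$. The angles between the rays $Pa$, $Pb$, $Pc$ are known from $K$, and the side lengths $AB$, $BC$, $CA$ are known, which gives us $P$
  - Aligning $PA$ with $Pa$ (and likewise for $B$ and $C$) gives us $R$
- RANSAC can be used for robust estimation
[Figure: the tetrahedron formed by the camera centre P and the 3D points A, B, C]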
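The slide does not say how the alignment step is carried out. One standard option, assumed here, is the SVD-based (Kabsch) solution for the rotation that best maps one set of directions onto another. Once $P$ is known, $R$ should map the world-frame directions $A - P$, $B - P$, $C - P$ onto the camera-frame rays $K^{-1}[u\ v\ 1]^\top$. The function name is hypothetical.

```python
import numpy as np

def rotation_from_directions(d_world, d_cam):
    """Rotation R minimising sum_i |R a_i - b_i|^2 over unit directions (Kabsch/SVD).

    d_world: (n, 3) directions from the camera centre P to A, B, C, ... (world frame).
    d_cam:   (n, 3) ray directions K^-1 [u, v, 1]^T (camera frame).
    Both are normalised, since only the directions matter.
    """
    a = np.asarray(d_world, dtype=float)
    b = np.asarray(d_cam, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    H = a.T @ b                               # 3x3 correlation of the two direction sets
    U, _, Vt = np.linalg.svd(H)
    V = Vt.T
    d = np.sign(np.linalg.det(V @ U.T))       # guard against returning a reflection
    return V @ np.diag([1.0, 1.0, d]) @ U.T
```

Given $R$ and $P$, the translation follows as $\mathbf{t} = -RP$, since the camera centre must map to the origin. Inside RANSAC, this would be run on many random triples of matches, and the pose that agrees with the most matches would be kept.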
Reprojection Error
- Many steps minimise some function
  - $\|A\mathbf{f}\|$ to estimate $F$
  - $\|A\mathbf{x}\|$ for triangulation
  - A PnP model for $n > 3$
  - It's not always clear what these quantities mean
- Taking a step back
  - We measure points in images
  - We have a model
  - We want the model and the measurements to agree
- Our measurements are $\mathbf{u}_{i,j} = (u_{i,j}, v_{i,j})$, the $i$th point in the $j$th image
- Our model consists of
  - The 3D location $\mathbf{x}_i = (x_i, y_i, z_i)$ of the $i$th point
  - The calibration $K_j$ of the $j$th camera
  - The pose $(R_j, \mathbf{t}_j)$ of the $j$th camera
- We can predict measurements from the model:
  $k_{i,j} \begin{bmatrix} \tilde{\mathbf{u}}_{i,j} \\ 1 \end{bmatrix} = K_j\,[R_j \mid \mathbf{t}_j] \begin{bmatrix} \mathbf{x}_i \\ 1 \end{bmatrix}$
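Reprojection Error
- We want to minimise
  $\sum_{i=1}^{M} \sum_{j=1}^{N} \left\| \mathbf{u}_{i,j} - \tilde{\mathbf{u}}_{i,j} \right\|^2$
  - $M$ is the number of 3D points
  - $N$ is the number of images
- This is non-linear
  - Minimising it is not simple
  - But it has a clear meaning: the squared distance, in pixels, between where each point was measured and where the model predicts it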
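The double sum translates directly into code. This sketch assumes that cameras are stored as `(K, R, t)` tuples and that measurements are stored in a dictionary keyed by `(i, j)`. Both layouts are hypothetical. Only observed pairs are summed, since in practice not every point is visible in every image.

```python
import numpy as np

def reprojection_error(points, cameras, observations):
    """Sum of squared reprojection errors |u_ij - u~_ij|^2 over observed (i, j) pairs.

    points:       (M, 3) array-like; row i is the 3D point x_i.
    cameras:      list of (K_j, R_j, t_j) tuples, one per image.
    observations: dict mapping (i, j) to the measured pixel u_ij = (u, v).
    """
    X = np.asarray(points, dtype=float)
    total = 0.0
    for (i, j), u in observations.items():
        K, R, t = cameras[j]
        h = K @ (R @ X[i] + t)        # k * [u~, v~, 1]^T
        u_pred = h[:2] / h[2]         # divide out the scale k
        total += float(np.sum((np.asarray(u, dtype=float) - u_pred) ** 2))
    return total

# Toy example: one camera with K = I at the origin, two points
cameras = [(np.eye(3), np.eye(3), np.zeros(3))]
points = [[0.0, 0.0, 1.0], [1.0, 1.0, 2.0]]
obs = {(0, 0): (0.0, 0.0), (1, 0): (0.5, 0.7)}
err = reprojection_error(points, cameras, obs)   # second point is 0.2 pixels off in v
```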
Non-Linear Least Squares
- Linear least squares is (fairly) easy
  - To estimate some parameters $\mathbf{p}$
  - Form the linear equation $A\mathbf{p} = \mathbf{b}$
  - Solve $A^\top A\mathbf{p} = A^\top\mathbf{b}$
- Non-linear least squares is (much) harder
  - Form an initial guess of $\mathbf{p}$
  - Our model is $f(\mathbf{p}) = \mathbf{b}$
  - Here $f$ is any (differentiable) function, so that it can be linearised
  - Make a linear approximation to $f$
  - Use this to update the estimate of $\mathbf{p}$
- We start with a 1D example
  - We are given some measurements $m(x_i) = y_i$
  - We assume that the measurements come from some function with a parameter $p$ to estimate: $y_i \approx f(x_i, p)$
  - And we have an initial guess, $p_0$
  - We find a series of estimates $p_1, p_2, \ldots$
  - Ideally, each estimate is more accurate than the last
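For the linear case, the normal equations can be solved directly. The sketch below fits a straight line $y = ax + c$, so $\mathbf{p} = [a\ c]^\top$. The line model and data are illustrative assumptions.

```python
import numpy as np

def linear_least_squares(A, b):
    """Solve min_p |A p - b|^2 via the normal equations A^T A p = A^T b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.linalg.solve(A.T @ A, A.T @ b)

# Fit y = a*x + c: each row of A is [x_i, 1], so p = [a, c]
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
A = np.column_stack([x, np.ones_like(x)])
p = linear_least_squares(A, y)
```

Forming $A^\top A$ squares the condition number of the problem. For that reason, library routines such as `np.linalg.lstsq`, which works from an SVD of $A$, are usually preferred in practice. The normal equations are shown here because they match the slide.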
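Non-Linear Least Squares
- We can write this in vector form: $\mathbf{y} \approx f(\mathbf{x}, p)$
- And we minimise the squared error $\epsilon = \left\| \mathbf{y} - f(\mathbf{x}, p) \right\|^2$
- We have an initial error $\epsilon_0 = \left\| \mathbf{y} - f(\mathbf{x}, p_0) \right\|^2$
- We can approximate $f(\mathbf{x}, p)$ by
  $f(\mathbf{x}, p_0 + \delta) \approx f(\mathbf{x}, p_0) + \left.\frac{\partial f}{\partial p}\right|_{p = p_0} \delta$
- The error becomes
  $\epsilon \approx \left\| \mathbf{y} - f(\mathbf{x}, p_0) - \frac{\partial f}{\partial p}\delta \right\|^2$
- We want to update $p$ by $\delta$ to minimise $\epsilon$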
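Here is a quick numerical check of the linear approximation. It uses the illustrative model $f(x, p) = e^{px}$, which is an assumption and not from the lecture; its derivative is $\partial f/\partial p = x e^{px}$. The neglected terms are of order $\delta^2$, so shrinking $\delta$ by a factor of 10 should shrink the approximation error by roughly 100.

```python
import numpy as np

def f(x, p):
    # Illustrative model (assumed): y = exp(p * x)
    return np.exp(p * x)

def df_dp(x, p):
    # Derivative of the model with respect to its parameter p
    return x * np.exp(p * x)

def linearisation_error(x, p0, delta):
    """Largest gap between f(x, p0 + delta) and its first-order approximation."""
    exact = f(x, p0 + delta)
    linear = f(x, p0) + df_dp(x, p0) * delta
    return float(np.max(np.abs(exact - linear)))

x = np.linspace(0.0, 1.0, 5)
for delta in (0.1, 0.01, 0.001):
    print(delta, linearisation_error(x, 0.5, delta))
```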
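Updating the Parameters
- A simple method is to step along the gradient
  - Step along the negative gradient
  - A small enough step always helps (unless the gradient is already zero)
  - Stepping too far can be a problem
  - Can search for a good step size
- However, this can be slow to converge
  - Slow when the gradient is small
  - Slow along narrow valleys in multiple dimensions
- Alternatively, at the minimum error
  $0 = \frac{\partial \epsilon}{\partial \delta} = -2\left(\frac{\partial f}{\partial p}\right)^{\!\top}\left(\mathbf{y} - f(\mathbf{x}, p_0) - \frac{\partial f}{\partial p}\delta\right)$
  so
  $\left(\frac{\partial f}{\partial p}\right)^{\!\top}\frac{\partial f}{\partial p}\,\delta = \left(\frac{\partial f}{\partial p}\right)^{\!\top}\big(\mathbf{y} - f(\mathbf{x}, p_0)\big)$
- This is the Gauss-Newton algorithm
  - Faster to converge in most cases
  - But not guaranteed to converge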
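The Gauss-Newton update above can be sketched for a single parameter. The model $f(x, p) = e^{px}$ and the synthetic, noise-free data are assumptions for illustration. Because $p$ is a scalar, both sides of the update equation are dot products, and $\delta$ is just their ratio.

```python
import numpy as np

def gauss_newton_1d(x, y, p0, f, df_dp, iters=20):
    """Gauss-Newton for one parameter: (f_p^T f_p) delta = f_p^T (y - f(x, p))."""
    p = float(p0)
    for _ in range(iters):
        r = y - f(x, p)              # residual at the current estimate
        J = df_dp(x, p)              # df/dp at each x_i
        delta = (J @ r) / (J @ J)    # solve the 1D normal equation
        p += delta
        if abs(delta) < 1e-12:
            break
    return p

x = np.linspace(0.0, 1.0, 10)
y = np.exp(0.7 * x)                  # synthetic data generated with p = 0.7
p_hat = gauss_newton_1d(x, y, 0.0,
                        lambda x, p: np.exp(p * x),
                        lambda x, p: x * np.exp(p * x))
```

There is no step-size control here, which is exactly why Gauss-Newton can fail to converge from a poor starting guess.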
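Levenberg-Marquardt Algorithm
- We've considered a single parameter, $p$
  - Generally this is a vector, $\mathbf{p} = [p_1\ p_2\ \ldots\ p_n]^\top$
  - The function is also vector-valued, $f(\mathbf{x}, \mathbf{p}) = [f_1(\mathbf{x}, \mathbf{p})\ \ldots\ f_m(\mathbf{x}, \mathbf{p})]^\top$
  - The derivative becomes a matrix
    $J = \begin{bmatrix} \frac{\partial f_1}{\partial p_1} & \cdots & \frac{\partial f_1}{\partial p_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial p_1} & \cdots & \frac{\partial f_m}{\partial p_n} \end{bmatrix}$
  - This is called the Jacobian
- We now solve $J^\top J\boldsymbol{\delta} = J^\top\boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon} = \mathbf{y} - f(\mathbf{x}, \mathbf{p}_0)$ here denotes the vector of residuals
- Levenberg suggested solving $(J^\top J + \lambda I)\boldsymbol{\delta} = J^\top\boldsymbol{\epsilon}$
  - If $\lambda$ is small, this is Gauss-Newton
  - If $\lambda$ is large, this is gradient descent with a small step, $\boldsymbol{\delta} \approx J^\top\boldsymbol{\epsilon}/\lambda$
- Marquardt noted that it is more stable to use
  $\left(J^\top J + \lambda\,\mathrm{diag}(J^\top J)\right)\boldsymbol{\delta} = J^\top\boldsymbol{\epsilon}$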
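Below is a compact sketch of Marquardt's variant, fitting the illustrative two-parameter model $y = a e^{bx}$. The model is an assumption; any model with a Jacobian would do. The sketch divides $\lambda$ by 10 after a successful step and multiplies it by 10 after a failed one. This is a common heuristic for adjusting $\lambda$, not something the lecture specifies.

```python
import numpy as np

def levenberg_marquardt(f, jac, p0, y, lam=1e-3, iters=100, tol=1e-12):
    """Marquardt's variant: solve (J^T J + lam * diag(J^T J)) delta = J^T eps.

    f(p) returns the model predictions, jac(p) the m x n Jacobian, and
    eps = y - f(p) is the residual vector.
    """
    p = np.asarray(p0, dtype=float)
    eps = y - f(p)
    cost = eps @ eps
    for _ in range(iters):
        J = jac(p)
        JTJ = J.T @ J
        delta = np.linalg.solve(JTJ + lam * np.diag(np.diag(JTJ)), J.T @ eps)
        p_new = p + delta
        eps_new = y - f(p_new)
        cost_new = eps_new @ eps_new
        if cost_new < cost:
            # Error went down: accept the step and move towards Gauss-Newton
            p, eps, cost = p_new, eps_new, cost_new
            lam /= 10.0
            if np.linalg.norm(delta) < tol:
                break
        else:
            # Error went up: reject the step and move towards gradient descent
            lam *= 10.0
    return p

# Illustrative model (assumed): y = a * exp(b * x), with p = [a, b]
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)
model = lambda p: p[0] * np.exp(p[1] * x)
jacobian = lambda p: np.column_stack([np.exp(p[1] * x), p[0] * x * np.exp(p[1] * x)])
p_hat = levenberg_marquardt(model, jacobian, [1.0, 0.0], y)
```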
Bundle Adjustment
- For $N$ images and $F$ features, $\mathbf{y} = f(\mathbf{p})$
- $\mathbf{y}$ are our measurements
  - The 2D locations of image features
  - There are $2NF$ measurements (if every feature is seen in every image)
- $\mathbf{p}$ are our parameters
  - $R$, $\mathbf{t}$ and maybe $K$ for each image
  - The 3D location of each feature
  - There are at least $6N + 3F$ parameters
- The Jacobian is at least $2NF \times (6N + 3F)$
  - If we take 100 images (easy to do)
  - And each has 1,000 features (not many)
  - $J$ is about $200{,}000 \times 3{,}600$
  - Just storing $J$ as 32-bit floats ($7.2 \times 10^8$ entries at 4 bytes each) needs nearly 3 GB of RAM, let alone doing the maths
- Fortunately $J$ is sparse
  - Each 2D measurement depends on just one camera and one 3D point
  - This means each row has 9 non-zeros (6 for the camera pose and 3 for the point, when $K$ is fixed)
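The size arithmetic can be reproduced in a few lines under the slide's assumptions: every feature is seen in every image, each camera has 6 pose parameters, $K$ is fixed, and entries are 4-byte floats. The function name is hypothetical.

```python
def bundle_adjustment_sizes(n_images, n_features, cam_params=6, point_params=3,
                            bytes_per_entry=4):
    """Jacobian size for bundle adjustment, assuming every feature is seen in every image."""
    rows = 2 * n_images * n_features                   # (x, y) per image feature
    cols = cam_params * n_images + point_params * n_features
    dense_bytes = rows * cols * bytes_per_entry        # cost of storing every entry
    nonzeros = rows * (cam_params + point_params)      # one camera and one point per row
    return rows, cols, dense_bytes, nonzeros

rows, cols, dense_bytes, nnz = bundle_adjustment_sizes(100, 1000)
print(rows, cols, dense_bytes, nnz)   # 200000 3600 2880000000 1800000
```

The 1.8 million non-zeros are only 0.25% of the 720 million entries. This is why sparse storage and sparse solvers make bundle adjustment feasible.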
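Sparse Structure
[Figure: sparsity pattern of the Jacobian $J$. Rows are the measurements $x_{1,1}, y_{1,1}, x_{1,2}, y_{1,2}, \ldots, x_{1,F}, y_{1,F}, x_{2,1}, y_{2,1}, \ldots, x_{N,F}, y_{N,F}$. Columns are the camera parameters $R_1, \mathbf{t}_1, R_2, \mathbf{t}_2, \ldots, R_N, \mathbf{t}_N$, followed by the point coordinates $X_1, Y_1, Z_1, X_2, Y_2, Z_2, \ldots, X_F, Y_F, Z_F$.]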
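The pattern in the figure can be generated for a toy problem. This sketch builds the boolean non-zero mask for $N = 3$ images and $F = 4$ features. As in the figure, rows are ordered image by image, and columns list the cameras first, then the points. The function name is hypothetical.

```python
import numpy as np

def jacobian_sparsity(n_images, n_features, cam_params=6, point_params=3):
    """Boolean mask of the non-zero entries of the bundle adjustment Jacobian."""
    rows = 2 * n_images * n_features
    cols = cam_params * n_images + point_params * n_features
    mask = np.zeros((rows, cols), dtype=bool)
    for j in range(n_images):
        for f in range(n_features):
            r = 2 * (j * n_features + f)                       # rows of x_{j,f}, y_{j,f}
            c_cam = cam_params * j                             # camera j's block
            c_pt = cam_params * n_images + point_params * f    # point f's block
            mask[r:r + 2, c_cam:c_cam + cam_params] = True
            mask[r:r + 2, c_pt:c_pt + point_params] = True
    return mask

mask = jacobian_sparsity(3, 4)
for row in mask.astype(int):
    print("".join(".#"[v] for v in row))
```

Each printed row has exactly 9 `#` marks: a block of 6 in its camera's columns and a block of 3 in its point's columns.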
Multi-View Stereo Recap
1. Pick an initial pair of images (many features in common)
2. Determine their relative pose (8- or 5-point algorithm)
3. Determine initial 3D structure (triangulation)
4. Refine the initial estimate (bundle adjustment)
5. Pick the next image(s) to be added (many 2D-3D matches)
6. Estimate their pose and additional 3D structure
7. Refine the estimate (bundle adjustment)
8. If there are more images, go to 5
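Step 5 can be sketched as a simple count. The sketch assumes a hypothetical bookkeeping scheme. Each feature matched across images forms a track with an integer ID, and each candidate image records which tracks it observes. A track that already has a 3D point gives one 2D-3D match.

```python
def pick_next_image(candidates, reconstructed_tracks):
    """Pick the unregistered image with the most 2D-3D matches.

    candidates: dict mapping image ID -> set of track IDs seen in that image.
    reconstructed_tracks: set of track IDs that already have a 3D point.
    """
    counts = {img: len(tracks & reconstructed_tracks)
              for img, tracks in candidates.items()}
    best = max(counts, key=counts.get)
    return best, counts[best]

candidates = {"c": {1, 2, 3, 9}, "d": {1, 2, 3, 4, 5}, "e": {7, 8}}
print(pick_next_image(candidates, {1, 2, 3, 4}))   # ('d', 4)
```

Returning the top few images instead of only one would allow several images to be added at once.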
Dense Stereo Estimation
- The structure tends to be sparse
  - It is made from feature correspondences
  - We reject many matches to find good ones
- Once the camera poses are estimated
  - We know the epipolar geometry
  - We can recover more reliable matches by searching along epipolar lines
  - We can expand these to form patches
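Known epipolar geometry turns the search for a match into a search near a line. This sketch measures how far each candidate lies from the epipolar line $\mathbf{l}' = F\mathbf{x}$, using the convention $\mathbf{x}'^\top F \mathbf{x} = 0$. The 1-pixel threshold and the function names are assumptions. The example $F$ corresponds to a rectified pair (identity calibration, pure sideways translation), for which epipolar lines are horizontal.

```python
import numpy as np

def epipolar_distance(F, x, x_prime):
    """Pixel distance from x' (image 2) to the epipolar line l' = F x of x (image 1)."""
    xh = np.array([x[0], x[1], 1.0])
    xph = np.array([x_prime[0], x_prime[1], 1.0])
    l = F @ xh                                  # line a*u' + b*v' + c = 0
    return abs(xph @ l) / np.hypot(l[0], l[1])

def filter_candidates(F, x, candidates, max_dist=1.0):
    """Keep only candidate matches within max_dist pixels of the epipolar line."""
    return [c for c in candidates if epipolar_distance(F, x, c) <= max_dist]

# Rectified pair (assumed): K = I, R = I, translation t = (1, 0, 0), so F = [t]_x
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
print(filter_candidates(F, (5, 3), [(2, 3.5), (10, 7), (8, 3)]))   # [(2, 3.5), (8, 3)]
```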
Surface Estimation
- Point clouds are limited models
  - We want to fit surfaces to points
  - This is an ill-posed problem: many different surfaces fit the same points
  - Interpolating surfaces (through every point) vs approximating surfaces (near the points)
- Once we have a surface
  - We can reproject the images onto it
  - This gives fine texture detail
  - We need to merge the images (mosaicing)
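The simplest example of an approximating surface is a single plane, fitted to the points by least squares via the SVD. This sketch does that. A plane is an illustrative choice, far simpler than the surfaces a real reconstruction needs, and the function names are hypothetical. With noisy points, the fitted plane passes near the points rather than through them.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points; returns (centroid, unit normal).

    The normal is the right singular vector of the centred points with the
    smallest singular value: the direction in which the points vary least.
    """
    P = np.asarray(points, dtype=float)
    centroid = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - centroid)
    return centroid, Vt[-1]

def plane_residuals(points, centroid, normal):
    """Signed distance of each point from the fitted plane."""
    return (np.asarray(points, dtype=float) - centroid) @ normal

# Points on the plane z = 2 + 0.1x - 0.1y
pts = [[0, 0, 2.0], [1, 0, 2.1], [0, 1, 1.9], [1, 1, 2.0], [0.5, 0.5, 2.0]]
c, n = fit_plane(pts)
```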