Dense 3D Reconstruction. Christiano Gava

Dense 3D Reconstruction Christiano Gava christiano.gava@dfki.de

Outline Previous lecture: structure and motion II Structure and motion loop Triangulation Today: dense 3D reconstruction The matching problem 2-view reconstruction Multi-view reconstruction Next lecture: structured light 3/17/2014 Lecture 3D Computer Vision 2

Why dense reconstruction? Accurate 3D models cultural heritage! Reconstruction of houses, buildings, famous touristic sites Piazza San Marco Venice 13.703 images 27.707.825 points Interesting! But how can we go from pictures to 3D models? 3/17/2014 Lecture 3D Computer Vision 3

Overview I 3/17/2014 Lecture 3D Computer Vision 4

Overview II 3/17/2014 Lecture 3D Computer Vision 5

Dense 3D reconstruction When we take a picture, a 3D scene is projected onto a 2D image loss of depth information 3D reconstruction is the inverse process: build the 3D scene from a set of 2D images recover depth information How can we do that? MATCHING (difficult) + TRIANGULATION (previous lecture) 3/17/2014 Lecture 3D Computer Vision 6

Two Subproblems In general, 3D reconstruction can be divided into two major problems: Matching finding corresponding elements among the images Triangulation establishing 3D coordinates based on 2D image correspondences Which image entities should we match? 3/17/2014 Lecture 3D Computer Vision 7

Outline Previous lecture: structure and motion II Structure and motion loop Triangulation Wide baseline matching (SIFT) Today: dense 3D reconstruction The matching problem 2-view reconstruction Multi-view reconstruction Next lecture: structured light 3/17/2014 Lecture 3D Computer Vision 8

Matching Harris corners Lines/edges Feature-based matching SIFT keypoints, etc. Suitable for Structure and Motion But we are interested in dense reconstruction we need dense matching! match as many pixels as possible 3/17/2014 Lecture 3D Computer Vision 9

Dense matching Choice of camera setup Small baseline X wide baseline Matching is relatively easy Higher depth uncertainty Matching is hard Lower depth uncertainty 3/17/2014 Lecture 3D Computer Vision 10

Dense matching For every point on an image, there are many possible matches on the other image(s) Many points look similar high ambiguity What can we do? 3/17/2014 Lecture 3D Computer Vision 11

Dense matching We can use camera geometry finding the correspondence becomes a 1D search problem thanks to the epipolar geometry 3/17/2014 Lecture 3D Computer Vision 12

Dense matching Even using the epipolar constraint, there are many possible matches? Use of pixel neighborhood information correlation-based approaches (also called window-based approaches) 3/17/2014 Lecture 3D Computer Vision 13

Common window-based approaches Normalized Cross-Correlation (NCC) Sum of Squared Differences (SSD) Sum of Absolute Differences (SAD) 3/17/2014 Lecture 3D Computer Vision 14

Recall NCC write regions as vectors region A region B a b vector a vector b 3/17/2014 Lecture 3D Computer Vision 15

Window-based matching Search along the epipolar line for the best match Best match: - Maximum NCC - Minimum SSD or SAD 3/17/2014 Lecture 3D Computer Vision 16

Dense matching challenges Scene elements do not always look the same in the images Camera-related problems Image noise Lens distortion Color/chromatic aberration Viewpoint-related problems Perspective distortions Occlusions Specular reflections Scene-related problems Illumination changes Moving objects 3/17/2014 Lecture 3D Computer Vision 17

Dense matching challenges Matching ambiguity Low textured regions Repetitive texture pattern Some assumptions are made 3/17/2014 Lecture 3D Computer Vision 18

Assumptions Appearance The projections of a scene (3D) patch should have similar appearances in all images Uniqueness A point in an image will have only one match in another image Ordering Order of appearance (e.g. left to right) is not altered by camera displacement Smoothness Depth varies smoothly (objects have smooth surfaces) 3/17/2014 Lecture 3D Computer Vision 19

2-view reconstruction Input: a pair of images (usually left and right) Output: a disparity or depth map left image right image depth map disparity 1 depth 3/17/2014 Lecture 3D Computer Vision 21

2-view reconstruction General case: converging cameras Need to use epipolar geometry This geometry can be simplified by rectification One of the images Both images 3/17/2014 Lecture 3D Computer Vision 22

Image rectification Rectifying one image Image mapping is a 2D homography (projective transformation) Rectifying both images Project images onto plane parallel to baseline epipolar plane 3/17/2014 Lecture 3D Computer Vision 23

Example of rectifying both images 3/17/2014 Lecture 3D Computer Vision 24

Parallel cameras Triangulation reduces to a simple geometric problem x Z x d 3/17/2014 Lecture 3D Computer Vision 25

General dense correspondence algorithm For each pixel in the left image compute the neighbourhood cross correlation along the corresponding epipolar line in the right image the corresponding pixel is the one with the highest cross correlation Parameters size (scale) of neighbourhood search disparity Other constraints uniqueness ordering smoothness of depth field Applicability textured scene, largely fronto-parallel 3/17/2014 Lecture 3D Computer Vision 26

Example with NCC epipolar line 3/17/2014 Lecture 3D Computer Vision 27

Example with NCC left image band right image band 1 0.5 cross correlation 0 3/17/2014 Lecture 3D Computer Vision 28 x

Example with NCC target region left image band right image band 1 0.5 cross correlation 0 Why is it not as good as before? 3/17/2014 Lecture 3D Computer Vision 29 x

2-view reconstruction 1.The neighbourhood region does not have a distinctive spatial intensity distribution 2.Foreshortening effects fronto-parallel surface imaged length the same slanting surface imaged lengths differ

Occlusion surface slice 1 2 3 4,5 6 1 2,3 4 5 6 3/17/2014 Lecture 3D Computer Vision 31

Multi-view reconstruction Many views of the same scene More images More constraints! x 1 x 2 x l 3 31 l 32 l 31 = F T 13 x 1 l 32 = F T 23 x 2 3/17/2014 Lecture 3D Computer Vision 33

Multi-view reconstruction More constraints Less occlusions More robust Appearance constraint can be relaxed However More complex Strong perspective distortions Propagation of calibration uncertainties Dense matching along many images is still hard Precise calibration is essential! 3/17/2014 Lecture 3D Computer Vision 34

Multi-view triangulation Assuming we have found correct matches along n images, how do we triangulate them? X j Exactly as we did before! Remember the algebraic solution x 1j Stack equations for all camera views: x 3j C 1 x 2j C 2 C 3 3/17/2014 Lecture 3D Computer Vision 35

Multi-view reconstruction MATCHING SCENE REPRESENTATION TRIANGULATION 3/17/2014 Lecture 3D Computer Vision 36

Scene Representation Voxel grid Depth map Mesh Point cloud 3/17/2014 Lecture 3D Computer Vision 37

Divide the scene into cubes Voxel grid Needs to sample the 3D space Voxels are marked as occupied/empty according to a photometric consistency measure (NCC, SSD ) Accuracy x size of voxels No assumption about the scene Memory allocation issues 3/17/2014 Lecture 3D Computer Vision 38

Usually one depth map per view (by matching and triangulating features in small sets of images) Depth map No sampling of 3D space Simple Ideal for multi-core Depth maps have to be merged 3/17/2014 Lecture 3D Computer Vision 39

Point cloud Flexible Points are independent Global or local Allows divide and conquer Sparse or dense May be noisy 3/17/2014 Lecture 3D Computer Vision 40

Point cloud 3/17/2014 Lecture 3D Computer Vision 41

Mesh Models the surface as a connected set of planar facets Usually triangular meshes Triangles are local good approximations of the surface Less noisy Good for optimization (mesh refinement) May become difficult to handle for complex surfaces 3/17/2014 Lecture 3D Computer Vision 42

Mesh 3/17/2014 Lecture 3D Computer Vision 43

Mesh 3/17/2014 Lecture 3D Computer Vision 44

Interesting links Bundler (SFM tool) http://www.cs.cornell.edu/~snavely/bundler/ Middleburry stereo data set: http://vision.middlebury.edu/stereo/data/ Middlebury multi-view data set: http://vision.middlebury.edu/mview/ PMVS version 2: http://grail.cs.washington.edu/software/pmvs/ CVLab multi-view data sets: http://cvlab.epfl.ch/data/strechamvs/ http://cvlab.epfl.ch/~strecha/multiview/densemvs.html http://cvlab.epfl.ch/research/surface/emvs/ 3/17/2014 Lecture 3D Computer Vision 56

References Furukawa, Y. & Ponce, J. Accurate, Dense and Robust Multi-View Stereopsis, TPAMI, 2008, 01, 1-14 Furukawa, Y.; Curless, B.; Seitz, S. M. & Szeliski, R. Towards Internet- Scale Multi-View Stereo, CVPR, 2010 Hiep, V. H.; Keriven, R.; Labatut, P. & Pons, J.-P. Towards highresolution large-scale multi-view stereo, CVPR, 2009, 1430-1437 Goesele, M.; Snavely, N.; Curless, B.; Hoppe, H. & Seitz, S. M. Multi- View Stereo for Community Photo Collections, ICCV, 2007, 0, 1-8 Hartley, R. I. & Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, ISBN: 0521540518, 2004 3/17/2014 Lecture 3D Computer Vision 57

Thank you!

JUST A REMINDER 3/17/2014 Lecture 3D Computer Vision 60

Triangulation If images are not rectified X j x 1j x 2j C 1 C 2 3/17/2014 Lecture 3D Computer Vision 61

Triangulation Given: corresponding measured (i.e. noisy) feature points x and x and camera poses, P and P Problem: in the presence of noise, back projected rays do not intersect Rays are skew in space and measured points do not lie on corresponding epipolar lines 3/17/2014 Lecture 3D Computer Vision 62

Triangulation: algebraic solution Equation for one camera view (camera pose P and feature point x) 4 entries, 3 DOF i th row of P (3x4) matrix 3 equations, only 2 linearly independent drop third row 3/17/2014 Lecture 3D Computer Vision 63

Triangulation: algebraic solution Stack equations for all camera views: X has 3 DOF (4 parameters due to homogeneous representation) One camera view gives two linearly independent equations Two camera views are sufficient for a minimal solution Solution for X is eigenvector of A T A corresponding to smallest eigenvalue 64

Triangulation: minimize geometric error Estimate 3D point, which exactly satisfies the supplied camera geometry P and P, so it projects as where and are closest to the actual image measurements Assumes perfect camera poses! 3/17/2014 Lecture 3D Computer Vision 65