Dense 3D Reconstruction. Christiano Gava

Dense 3D Reconstruction Christiano Gava christiano.gava@dfki.de

Outline Previous lecture: structure and motion II Structure and motion loop Triangulation Wide baseline matching (SIFT) Today: dense 3D reconstruction The matching problem 2-view reconstruction Multi-view reconstruction Next lecture: structured light 12/11/2012 Lecture 3D Computer Vision 2

Why dense reconstruction? Accurate 3D models cultural heritage! Reconstruction of houses, buildings, famous touristic sites Piazza San Marco Venice 13.703 images 27.707.825 points Interesting! But how can we go from pictures to 3D models? 12/11/2012 Lecture 3D Computer Vision 3

Overview I 12/11/2012 Lecture 3D Computer Vision 4

Overview II 12/11/2012 Lecture 3D Computer Vision 5

Dense 3D reconstruction When we take a picture, a 3D scene is projected onto a 2D image loss of depth information 3D reconstruction is the inverse process: build the 3D scene from a set of 2D images recover depth information How can we do that? MATCHING (difficult) + TRIANGULATION (previous lecture) 12/11/2012 Lecture 3D Computer Vision 6

Two Subproblems In general, 3D reconstruction can be divided into two major problems: Matching finding corresponding elements among the images Triangulation establishing 3D coordinates based on 2D image correspondences Which image entities should we match? 12/11/2012 Lecture 3D Computer Vision 7

Matching Harris corners Lines/edges Feature-based matching SIFT keypoints, etc. Suitable for Structure and Motion But we are interested in dense reconstruction we need dense matching! match as many pixels as possible 12/11/2012 Lecture 3D Computer Vision 9

Dense matching Choice of camera setup Small baseline X wide baseline Matching is relatively easy Higher depth uncertainty Matching is hard Lower depth uncertainty 12/11/2012 Lecture 3D Computer Vision 10

Dense matching For every point on an image, there are many possible matches on the other image(s) Many points look similar high ambiguity What can we do? 12/11/2012 Lecture 3D Computer Vision 11

Dense matching We can use camera geometry finding the correspondence becomes a 1D search problem thanks to the epipolar geometry 12/11/2012 Lecture 3D Computer Vision 12

Dense matching Even using the epipolar constraint, there are many possible matches? Use of pixel neighborhood information correlation-based approaches (also called window-based approaches) 12/11/2012 Lecture 3D Computer Vision 13

Common window-based approaches Normalized Cross-Correlation (NCC) Sum of Squared Differences (SSD) Sum of Absolute Differences (SAD) 12/11/2012 Lecture 3D Computer Vision 14

Recall NCC write regions as vectors region A region B a b vector a vector b 12/11/2012 Lecture 3D Computer Vision 15

Window-based matching Search along the epipolar line for the best match Best match: - Maximum NCC - Minimum SSD or SAD 12/11/2012 Lecture 3D Computer Vision 16

Dense matching challenges Scene elements do not always look the same in the images Camera-related problems Image noise Lens distortion Color/chromatic aberration Viewpoint-related problems Perspective distortions Occlusions Specular reflections Scene-related problems Illumination changes Moving objects 12/11/2012 Lecture 3D Computer Vision 17

Dense matching challenges Matching ambiguity Low textured regions Repetitive texture pattern Some assumptions are made 12/11/2012 Lecture 3D Computer Vision 25

Assumptions Appearance The projections of a scene (3D) patch should have similar appearances in all images Uniqueness A point in an image will have only one match in another image Ordering Order of appearance (e.g. left to right) is not altered by camera displacement Smoothness Depth varies smoothly (objects have smooth surfaces) 12/11/2012 Lecture 3D Computer Vision 26

2-view reconstruction Input: a pair of images (usually left and right) Output: a disparity or depth map left image right image depth map disparity 1 depth 12/11/2012 Lecture 3D Computer Vision 28

2-view reconstruction General case: converging cameras Need to use epipolar geometry This geometry can be simplified by rectification One of the images Both images 12/11/2012 Lecture 3D Computer Vision 29

Image rectification Rectifying one image Image mapping is a 2D homography (projective transformation) Rectifying both images Project images onto plane parallel to baseline epipolar plane 12/11/2012 Lecture 3D Computer Vision 30

Example of rectifying both images 12/11/2012 Lecture 3D Computer Vision 31

Parallel cameras Triangulation reduces to a simple geometric problem x Z x d 12/11/2012 Lecture 3D Computer Vision 32

General dense correspondence algorithm For each pixel in the left image compute the neighbourhood cross correlation along the corresponding epipolar line in the right image the corresponding pixel is the one with the highest cross correlation Parameters size (scale) of neighbourhood search disparity Other constraints uniqueness ordering smoothness of depth field Applicability textured scene, largely fronto-parallel 12/11/2012 Lecture 3D Computer Vision 33

Example with NCC epipolar line 12/11/2012 Lecture 3D Computer Vision 34

Example with NCC left image band right image band 1 0.5 cross correlation 0 12/11/2012 Lecture 3D Computer Vision 35 x

Example with NCC target region left image band right image band 1 0.5 cross correlation 0 Why is it not as good as before? 12/11/2012 Lecture 3D Computer Vision 36 x

2-view reconstruction 1.The neighbourhood region does not have a distinctive spatial intensity distribution 2.Foreshortening effects fronto-parallel surface imaged length the same slanting surface imaged lengths differ

Occlusion surface slice 1 2 3 4,5 6 1 2,3 4 5 6 12/11/2012 Lecture 3D Computer Vision 38

Multi-view reconstruction Many views of the same scene More images More constraints! x 1 x 2 x l 3 31 l 32 l 31 = F T 13 x 1 l 32 = F T 23 x 2 12/11/2012 Lecture 3D Computer Vision 40

Multi-view reconstruction More constraints Less occlusions More robust Appearance constraint can be relaxed However More complex Strong perspective distortions Propagation of calibration uncertainties Dense matching along many images is still hard Precise calibration is essential! 12/11/2012 Lecture 3D Computer Vision 41

Multi-view triangulation Assuming we have found correct matches along n images, how do we triangulate them? X j Exactly as we did before! Remember the algebraic solution x 1j Stack equations for all camera views: x 3j C 1 x 2j C 2 C 3 12/11/2012 Lecture 3D Computer Vision 42

Multi-view reconstruction MATCHING SCENE REPRESENTATION TRIANGULATION 12/11/2012 Lecture 3D Computer Vision 43

Scene Representation Voxel grid Depth map Mesh Point cloud 12/11/2012 Lecture 3D Computer Vision 44

Divide the scene into cubes Voxel grid Needs to sample the 3D space Voxels are marked as occupied/empty according to a photometric consistency measure (NCC, SSD ) Accuracy x size of voxels No assumption about the scene Memory allocation issues 12/12/2012 Lecture 3D Computer Vision 45

Usually one depth map per view (by matching and triangulating features in small sets of images) Depth map No sampling of 3D space Simple Ideal for multi-core Depth maps have to be merged 12/11/2012 Lecture 3D Computer Vision 49

Point cloud Flexible Points are independent Global or local Allows divide and conquer Sparse or dense May be noisy 12/11/2012 Lecture 3D Computer Vision 51

Point cloud Flexible Points are independent Global or local Allows divide and conquer Sparse or dense May be noisy 12/12/2012 Lecture 3D Computer Vision 52

Point cloud Flexible Points are independent Global or local Allows divide and conquer Sparse or dense May be noisy 12/12/2012 Lecture 3D Computer Vision 53

Point cloud 12/11/2012 Lecture 3D Computer Vision 54

Mesh Models the surface as a connected set of planar facets Usually triangular meshes Triangles are local good approximations of the surface Less noisy Good for optimization (mesh refinement) May become difficult to handle for complex surfaces 12/11/2012 Lecture 3D Computer Vision 55

Mesh 12/11/2012 Lecture 3D Computer Vision 58

Mesh 12/11/2012 Lecture 3D Computer Vision 59

Interesting links Middleburry stereo data set: http://vision.middlebury.edu/stereo/data/ Middlebury multi-view data set: http://vision.middlebury.edu/mview/ PMVS version 2: http://grail.cs.washington.edu/software/pmvs/ CVLab multi-view data sets: http://cvlab.epfl.ch/data/strechamvs/ http://cvlab.epfl.ch/~strecha/multiview/densemvs.html http://cvlab.epfl.ch/research/surface/emvs/ 12/12/2012 Lecture 3D Computer Vision 60

References Furukawa, Y. & Ponce, J. Accurate, Dense and Robust Multi-View Stereopsis, TPAMI, 2008, 01, 1-14 Furukawa, Y.; Curless, B.; Seitz, S. M. & Szeliski, R. Towards Internet- Scale Multi-View Stereo, CVPR, 2010 Hiep, V. H.; Keriven, R.; Labatut, P. & Pons, J.-P. Towards highresolution large-scale multi-view stereo, CVPR, 2009, 1430-1437 Goesele, M.; Snavely, N.; Curless, B.; Hoppe, H. & Seitz, S. M. Multi- View Stereo for Community Photo Collections, ICCV, 2007, 0, 1-8 Hartley, R. I. & Zisserman, A. Multiple View Geometry in Computer Vision, Cambridge University Press, ISBN: 0521540518, 2004 12/12/2012 Lecture 3D Computer Vision 61

Organisational information Oral exams: Qualification: A minimum average score of 60% in the exercises Period: 11.03.2013-15.03.2013 DFKI, room 1.27 Registration: email to Leivy Michelly Kaul (kaul@dfki.uni-kl.de) to get an appointment (mention days where you can t make it). Lectures: 16.01.2013: no lecture; you are expected to work through a scientific paper based on central questions (will be uploaded to the website shortly before). The paper will be the basis for the lecture at 23.01.2013. The last lecture will be a question lecture and can be moved from 30.01.2013 to a date closer to the actual exam period (will be coordinated by Johannes and Tobias in the last exercise session). Lecture 3D Computer Vision 62

Thank you!

JUST A REMINDER 12/11/2012 Lecture 3D Computer Vision 64

Triangulation If images are not rectified X j x 1j x 2j C 1 C 2 12/11/2012 Lecture 3D Computer Vision 65

Triangulation Given: corresponding measured (i.e. noisy) feature points x and x and camera poses, P and P Problem: in the presence of noise, back projected rays do not intersect Rays are skew in space and measured points do not lie on corresponding epipolar lines 12/11/2012 Lecture 3D Computer Vision 66

Triangulation: algebraic solution Equation for one camera view (camera pose P and feature point x) 4 entries, 3 DOF i th row of P (3x4) matrix 3 equations, only 2 linearly independent drop third row 12/11/2012 Lecture 3D Computer Vision 67

Triangulation: algebraic solution Stack equations for all camera views: X has 3 DOF (4 parameters due to homogeneous representation) One camera view gives two linearly independent equations Two camera views are sufficient for a minimal solution Solution for X is eigenvector of A T A corresponding to smallest eigenvalue 68

Triangulation: minimize geometric error Estimate 3D point, which exactly satisfies the supplied camera geometry P and P, so it projects as where and are closest to the actual image measurements Assumes perfect camera poses! 12/11/2012 Lecture 3D Computer Vision 69