Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza
REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza [M. Pizzoli, C. Forster, D. Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14]
3D Reconstruction from Multiple views Assumption Cameras are calibrated both intrinsically K matrix for each camera is known and extrinsically relative positions T and orientations R between cameras are known (for instance, from SFM)
Multi-view stereo Input: calibrated images from several viewpoints Output: 3D object model
Review: The Epipolar Plane The two camera centers and the feature p determine a plane called the epipolar plane, which intersect each camera image plane into an epipolar line. epipolar line p epipolar plane epipolar line C 1 C 2
Review: Epipolar Lines for Correspondence Search Thanks to the epipolar constraint, corresponding points only need to be searched along epipolar lines Left image Right image
Sparse Reconstruction Estimate the structure from a sparse set of features
Dense Reconstruction Estimate the structure from a dense region of pixels
Dense reconstruction workflow Step 1: Local methods Estimate depth for every pixel independently Step 2: Global methods Refine the depth surface as a whole by enforcing smoothness constraint
Photometric error (SSD or SAD) error This error plot is derived for every combination of the reference image and any further image depth IDEA: the optimal depth minimizes the photometric error in all the images as a function of the depth in the first image
Aggregated photometric error Dense reconstruction requires establishing dense correspondences Correspondences are computed based on photometric error: patch-based correlation (SAD, SSD, NCC) Difference among pixel intensity values (patch 1x1 pixels) What are the pros and cons of using small or large patches? Not all the pixels can be matched reliably Viewpoint and illumination changes, occlusions Take advantage of many small baseline views where high quality matching is possible (why?) [Newcombe et al. 2011]
Aggregated photometric error Photometric error for flat regions or edges parallel to the epipolar line show multiple minima (because of noise, lack of textures or repetitive textures) For distinctive pixels (as in b) the aggregated photometric error has typically one clear minimum.
Disparity Space Image (DSI) For discrete depth hypotheses the aggregate photometric error with respect to the reference image can be stored in the Disparity Space Image (DSI) C u, v, d = ρ(i k u, v, d I r (u, v)) k I k u, v, d is the pixel in the k-th image associated with the pixel u, v in the reference image I r and depth hypothesis d ρ( ) is the photometric error (e.g., SSD, SAD) [Szeliski and Golland 1999]
Disparity Space Image (DSI)
Solution The solution to the depth estimation problem is a function d(u, v) in the DSI that presents some optimality properties: Minimum aggregated photometric cost arg min d AND (optionally) best piecewise smoothness (global methods) C
Solution The solution to the depth estimation problem is a function d(u, v) in the DSI that presents some optimality properties: Minimum aggregated photometric cost arg min d AND (optionally) best piecewise smoothness (global methods) C
Solution The solution to the depth estimation problem is a function d(u, v) in the DSI that presents some optimality properties: Minimum aggregated photometric cost arg min d AND (optionally) best piecewise smoothness (global methods) C
Solution Global methods Formulated in terms of energy minimization The objective is to find d(u, v) that minimizes a global energy E d = E d d + λe s (d) Data term Regularization term E d d = C(u, v, d u, v ) (u,v) E s d = ρ d d u, v d u + 1, v (u,v) +ρ d d u, v d u, v + 1 ρ d is a norm (e.g. L 2, L 1 or Huber norm) λ controls the tradeoff data / regularization. What happens for large λ?
Regularized depth maps The regularization term E s (d) Smooths non smooth surfaces (results of noisy measurements) as well as discontinuities Fills the holes Final depth image for different λ [Newcombe et al. 2011]
Regularized depth maps The regularization term E s (d) Smooths non smooth surfaces (results of noisy measurements) as well as discontinuities Fills the holes Final depth image for different λ [Newcombe et al. 2011]
Regularized depth maps The regularization term E s (d) Smooths non smooth surfaces (results of noisy measurements) as well as discontinuities Fills the holes Final depth image for different λ [Newcombe et al. 2011]
Regularized depth maps The regularization term E s (d) Smooths non smooth surfaces (results of noisy measurements) as well as discontinuities Fills the holes Final depth image for different λ [Newcombe et al. 2011]
Regularized depth maps Popular assumption: discontinuities in intensity coincide with discontinuities in depth Control smoothness penalties according to image gradient ρ d d u, v d u + 1, v ρ I I u, v I u + 1, v ρ I is some monotonically decreasing function of intensity differences: lowers smoothness costs at high intensity gradients
Choosing the stereo baseline width of a pixel all of these points project to the same pair of pixels Large Baseline Small Baseline What s the optimal baseline? Solution Too small: large depth error Too large: difficult search problem Obtain depth map from small baselines When baseline becomes to large, create new reference frame (keyframe) and start new depth computation
Fusion of multiple depth maps depth map 1 depth map 2 combination Sensor Sensor
Fusion of multiple depth maps
Depth map fusion input image 16 47 317 images images (ring) (hemisphere) Goesele, Curless, Seitz, 2006 ground truth model Michael Goesele
GPGPU ALU: Arithmetic Logic Unit GPGPU = General Purpose computing on Graphics Processing Unit Perform demanding calculations on the GPU instead of the CPU On the GPU: high processing power in parallel on thousands of cores On a CPU a few cores optimized for sequential serial processing More transistors devoted to data processing More info: http://www.nvidia.com/object/what-isgpu-computing.html#sthash.bw35idmr.dpuf https://www.youtube.com/watch?v=-p28lkwtzri
GPU Capabilities Fast pixel processing Ray tracing, draw textures, shaded triangles faster than CPU Fast matrix / vector operations Transform vertices Programmable Shading, bump mapping Floating-point support Accurate computations Deep Learning Bump mapping Shaded triangles
GPU for 3D DenseReconstruction Image processing Filtering Warping (e.g., epipolar rectification, homography) Feature extraction (i.e., convolutions) Multi-view geometry Search for dense correspondence Pixel-wise operations (correlation) Matrix and vector operations (epipolar geometry) Photometric Cost Aggregation Global optimization Variational methods (i.e., regularization (smoothing)) Parallel, in-place operations for gradient / divergence computation
Why GPU GPUs run thousands of lightweight threads in parallel Typically on consumer hardware: 1024 threads per multiprocessor; 30 multiprocessor => 30k threads. Compared to CPU: 4 quad core support 32 threads (with HyperThreading). Well suited for data-parallelism The same instructions executed on multiple data in parallel High arithmetic intensity: arithmetic operations / memory operations [Source: nvidia]
DTAM: Dense Tracking and Mapping in Real-Time, ICCV 11 by Newcombe, Lovegrove, Davison
REMODE: Regularized Monocular Dense Reconstruction [M. Pizzoli, C. Forster, D. Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, IEEE International Conference on Robotics and Automation 2014] Open source: https://github.com/uzh-rpg/rpg_open_remode
REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza
REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza [M. Pizzoli, C. Forster, D. Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14]
REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza Tracks every pixel (like DTAM) but Probabilistically Runs live on video streamed from MAV (50 Hz on GPU) Copes well with low texture surfaces [Pizzoli, Forster, Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time ICRA 14]
REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza Tracks every pixel (like DTAM) but Probabilistically Runs live on video streamed from MAV (50 Hz on GPU) Copes well with low texture surfaces [Pizzoli, Forster, Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time ICRA 14]
REMODE applied to autonomous flying 3D scanning Tracks every pixel (like DTAM) but Probabilistically Runs live on video streamed from MAV (50 Hz on GPU) Copes well with low texture surfaces Live demonstration at the Firefighter Training Area of Zurich Featured on ARTE Tv channel on November 22 and SRF 10vo10
REMODE applied to autonomous flying 3D scanning 2x Faessler, Fontana, Forster, Mueggler, Pizzoli, Scaramuzza, Autonomous, Vision-based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle, Journal of Field Robotics, 2015.
REMODE applied to autonomous flying 3D scanning Faessler, Fontana, Forster, Mueggler, Pizzoli, Scaramuzza, Autonomous, Vision-based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle, Journal of Field Robotics, 2015.
REMODE applied to autonomous flying 3D scanning Live demonstration at the Firefighter Training Area of Zurich Featured on ARTE Tv channel on November 22 and SRF 10vo10 [Pizzoli, Forster, Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time ICRA 14]
3DAround iphone App Dacuda