Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza

REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time [M. Pizzoli, C. Forster, D. Scaramuzza, ICRA 2014]

3D Reconstruction from Multiple Views. Assumption: the cameras are calibrated, both intrinsically (the K matrix of each camera is known) and extrinsically (the relative positions T and orientations R between the cameras are known, for instance from SFM).

Multi-view stereo. Input: calibrated images from several viewpoints. Output: 3D object model.

Review: The Epipolar Plane. The two camera centers C1 and C2 and the feature p define a plane, called the epipolar plane, which intersects each camera's image plane in an epipolar line.

Review: Epipolar Lines for Correspondence Search. Thanks to the epipolar constraint, corresponding points need to be searched only along epipolar lines.

Sparse Reconstruction Estimate the structure from a sparse set of features

Dense Reconstruction Estimate the structure from a dense region of pixels

Dense reconstruction workflow
Step 1 (local methods): estimate the depth of every pixel independently.
Step 2 (global methods): refine the depth surface as a whole by enforcing smoothness constraints.

Photometric error (SSD or SAD). The photometric error, plotted as a function of depth, is computed between the reference image and every further image. IDEA: the optimal depth minimizes the photometric error in all the images as a function of the depth in the reference image.

Aggregated photometric error
- Dense reconstruction requires establishing dense correspondences.
- Correspondences are computed based on the photometric error: patch-based correlation (SAD, SSD, NCC) or the difference between single pixel intensity values (1x1 patch).
- What are the pros and cons of using small or large patches?
- Not all pixels can be matched reliably (viewpoint and illumination changes, occlusions).
- Take advantage of many small-baseline views, where high-quality matching is possible (why?).
[Newcombe et al. 2011]
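As an illustration (not code from the lecture), the three patch-based scores mentioned above can be sketched in NumPy; `patch_cost` and its `metric` argument are hypothetical names. NCC is returned negated so that, like SSD and SAD, a lower value means a better match:

```python
import numpy as np

def patch_cost(ref_patch, target_patch, metric="ssd"):
    """Photometric error between two equally sized image patches."""
    a = ref_patch.astype(np.float64).ravel()
    b = target_patch.astype(np.float64).ravel()
    if metric == "ssd":          # sum of squared differences
        return np.sum((a - b) ** 2)
    if metric == "sad":          # sum of absolute differences
        return np.sum(np.abs(a - b))
    if metric == "ncc":          # negative normalized cross-correlation
        a = a - a.mean()         # NCC is invariant to additive and
        b = b - b.mean()         # multiplicative intensity changes
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return -np.dot(a, b) / denom if denom > 0 else 0.0
    raise ValueError(metric)
```

A 1x1 patch with SAD reduces to the plain intensity difference of the slide; larger patches are more distinctive but blur depth discontinuities.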

Aggregated photometric error. The photometric error for flat regions, or for edges parallel to the epipolar line, shows multiple minima (because of noise, lack of texture, or repetitive texture). For distinctive pixels (as in b), the aggregated photometric error typically has one clear minimum.

Disparity Space Image (DSI). For discrete depth hypotheses, the aggregated photometric error with respect to the reference image can be stored in the Disparity Space Image (DSI):

C(u, v, d) = Σ_k ρ( I_k(u, v, d) − I_r(u, v) )

where I_k(u, v, d) is the pixel in the k-th image associated with the pixel (u, v) in the reference image I_r under depth hypothesis d, and ρ(·) is the photometric error (e.g., SSD, SAD). [Szeliski and Golland 1999]
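A minimal sketch of building the DSI, assuming the other images have already been warped into the reference view for each depth hypothesis (the warping itself, via the epipolar geometry, is outside this sketch; `build_dsi` is an illustrative name, not lecture code):

```python
import numpy as np

def build_dsi(ref, warped_by_depth, rho=np.abs):
    """Stack the aggregated photometric cost C(u, v, d) into a DSI.

    ref:             reference image I_r, shape (H, W)
    warped_by_depth: list of (d, [views warped into the reference frame
                     under depth hypothesis d]) pairs
    rho:             per-pixel photometric error, here absolute difference
    Returns (cost volume of shape (H, W, D), list of depth hypotheses).
    """
    ref = ref.astype(np.float64)
    hyps = [d for d, _ in warped_by_depth]
    dsi = np.zeros(ref.shape + (len(hyps),))
    for i, (d, views) in enumerate(warped_by_depth):
        for I_k in views:  # sum the error over the k views
            dsi[:, :, i] += rho(I_k.astype(np.float64) - ref)
    return dsi, hyps
```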

Disparity Space Image (DSI)

Solution. The solution to the depth estimation problem is a function d(u, v) in the DSI that satisfies some optimality criteria:
- minimum aggregated photometric cost: arg min_d C
- AND (optionally) best piecewise smoothness (global methods)
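The local solution, per-pixel arg min over the depth axis of the DSI (winner-take-all), can be sketched as follows (illustrative names):

```python
import numpy as np

def wta_depth(dsi, hyps):
    """Winner-take-all: per pixel, the depth hypothesis of minimum
    aggregated cost, i.e. d(u, v) = arg min_d C(u, v, d)."""
    idx = np.argmin(dsi, axis=2)   # (H, W) indices into the hypotheses
    return np.asarray(hyps)[idx]   # (H, W) depth map d(u, v)
```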

Solution: global methods. Formulated in terms of energy minimization: the objective is to find the d(u, v) that minimizes a global energy

E(d) = E_d(d) + λ E_s(d)

with data term E_d and regularization term E_s:

E_d(d) = Σ_{(u,v)} C(u, v, d(u, v))

E_s(d) = Σ_{(u,v)} ρ_d( d(u, v) − d(u + 1, v) ) + ρ_d( d(u, v) − d(u, v + 1) )

Here ρ_d is a norm (e.g., the L2, L1, or Huber norm), and λ controls the trade-off between data and regularization. What happens for large λ?
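As a sketch (illustrative names, not lecture code), the global energy can be evaluated for a candidate solution given the DSI; here ρ_d is the L1 norm. A global method would then search for the index map minimizing this value:

```python
import numpy as np

def global_energy(dsi, hyps, idx, lam=0.1):
    """Evaluate E(d) = E_d(d) + lam * E_s(d) for a candidate solution.

    idx: (H, W) integer map of chosen hypothesis indices, d(u, v) = hyps[idx].
    """
    H, W, _ = dsi.shape
    uu, vv = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    E_data = dsi[uu, vv, idx].sum()                    # sum of C(u, v, d(u, v))
    d = np.asarray(hyps, dtype=np.float64)[idx]
    E_smooth = (np.abs(d[1:, :] - d[:-1, :]).sum()     # rho_d(d(u,v) - d(u+1,v))
                + np.abs(d[:, 1:] - d[:, :-1]).sum())  # rho_d(d(u,v) - d(u,v+1))
    return E_data + lam * E_smooth
```

With lam = 0 the minimizer reduces to the per-pixel winner-take-all solution; large lam favors flat depth maps, which is what the slide's closing question is after.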

Regularized depth maps. The regularization term E_s(d):
- smooths non-smooth surfaces (the result of noisy measurements) as well as discontinuities;
- fills the holes.
Final depth image for different λ. [Newcombe et al. 2011]

Regularized depth maps. Popular assumption: discontinuities in intensity coincide with discontinuities in depth. Idea: control the smoothness penalties according to the image gradient:

ρ_d( d(u, v) − d(u + 1, v) ) · ρ_I( I(u, v) − I(u + 1, v) )

where ρ_I is some monotonically decreasing function of the intensity difference: it lowers the smoothness cost at high intensity gradients.
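One common choice for ρ_I (an assumption here, the lecture does not fix a particular function) is a decaying exponential of the absolute intensity difference:

```python
import numpy as np

def smoothness_weights(ref, alpha=1.0):
    """rho_I = exp(-alpha * |intensity difference|): monotonically
    decreasing, so the smoothness penalty drops where the image gradient
    is strong (likely depth discontinuities)."""
    I = ref.astype(np.float64)
    w_v = np.exp(-alpha * np.abs(I[1:, :] - I[:-1, :]))  # rho_I(I(u,v) - I(u+1,v))
    w_h = np.exp(-alpha * np.abs(I[:, 1:] - I[:, :-1]))  # rho_I(I(u,v) - I(u,v+1))
    return w_v, w_h
```

These weights would multiply the corresponding ρ_d terms of the regularizer, as in the product above.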

Choosing the stereo baseline. [Figure: for a small baseline, all points within the width of a pixel project onto the same pair of pixels; large baseline vs. small baseline.] What is the optimal baseline?
- Too small: large depth error.
- Too large: difficult search problem.
Solution: obtain the depth map from small baselines; when the baseline becomes too large, create a new reference frame (keyframe) and start a new depth computation.
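The keyframe rule above can be sketched as a baseline-to-depth ratio test; the threshold value and the function name are illustrative, not from the lecture:

```python
def needs_new_keyframe(baseline, mean_scene_depth, max_ratio=0.1):
    """Once the baseline grows beyond a fraction of the mean scene depth,
    the correspondence search becomes difficult, so start a new reference
    frame (keyframe) and a fresh depth computation."""
    return baseline / mean_scene_depth > max_ratio
```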

Fusion of multiple depth maps. [Figure: depth map 1 and depth map 2, each from a different sensor pose, are combined.]
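An illustrative sketch of one simple fusion scheme, per-pixel weighted averaging of depth maps already aligned to a common view (an assumption; this is not the fusion method of any particular paper cited here):

```python
import numpy as np

def fuse_depth_maps(depths, weights):
    """Per-pixel weighted average of N aligned depth maps; weights could
    be, e.g., inverse depth variances. Pixels with zero total weight
    (no valid measurement) become NaN."""
    D = np.stack(depths)    # (N, H, W)
    W = np.stack(weights)   # (N, H, W)
    wsum = W.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        fused = (W * D).sum(axis=0) / wsum
    return np.where(wsum > 0, fused, np.nan)
```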

Depth map fusion. [Figure: input image; reconstructions from 16 images, 47 images (ring), and 317 images (hemisphere); ground-truth model. Goesele, Curless, Seitz, 2006. Images: Michael Goesele.]

GPGPU
- GPGPU = General-Purpose computing on Graphics Processing Units.
- Perform demanding calculations on the GPU instead of the CPU.
- On the GPU: high processing power in parallel on thousands of cores; more transistors devoted to data processing (ALU: Arithmetic Logic Unit).
- On a CPU: a few cores optimized for sequential, serial processing.
More info: http://www.nvidia.com/object/what-isgpu-computing.html#sthash.bw35idmr.dpuf and https://www.youtube.com/watch?v=-p28lkwtzri

GPU Capabilities
- Fast pixel processing: ray tracing, texture drawing, shaded triangles, faster than on the CPU.
- Fast matrix/vector operations: transforming vertices.
- Programmable: shading, bump mapping.
- Floating-point support: accurate computations, deep learning.

GPU for 3D Dense Reconstruction
- Image processing: filtering; warping (e.g., epipolar rectification, homography); feature extraction (e.g., convolutions).
- Multi-view geometry: search for dense correspondences; pixel-wise operations (correlation); matrix and vector operations (epipolar geometry); photometric cost aggregation.
- Global optimization: variational methods (e.g., regularization/smoothing); parallel, in-place operations for gradient/divergence computation.

Why GPU?
- GPUs run thousands of lightweight threads in parallel. Typical consumer hardware: 1024 threads per multiprocessor × 30 multiprocessors => 30k threads. Compare a CPU: four quad-core processors support 32 threads (with HyperThreading).
- Well suited for data parallelism: the same instructions are executed on multiple data in parallel.
- High arithmetic intensity: many arithmetic operations per memory operation.
[Source: NVIDIA]

DTAM: Dense Tracking and Mapping in Real-Time, ICCV'11, by Newcombe, Lovegrove, Davison

REMODE: Regularized Monocular Dense Reconstruction [M. Pizzoli, C. Forster, D. Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, IEEE International Conference on Robotics and Automation 2014] Open source: https://github.com/uzh-rpg/rpg_open_remode

REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time
- Tracks every pixel (like DTAM), but probabilistically.
- Runs live on video streamed from an MAV (50 Hz on GPU).
- Copes well with low-texture surfaces.
[Pizzoli, Forster, Scaramuzza, REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA'14]

REMODE applied to autonomous flying 3D scanning
- Tracks every pixel (like DTAM), but probabilistically.
- Runs live on video streamed from an MAV (50 Hz on GPU).
- Copes well with low-texture surfaces.
Live demonstration at the Firefighter Training Area of Zurich; featured on the ARTE TV channel on November 22 and on SRF 10vor10.

REMODE applied to autonomous flying 3D scanning
[Faessler, Fontana, Forster, Mueggler, Pizzoli, Scaramuzza, Autonomous, Vision-based Flight and Live Dense 3D Mapping with a Quadrotor Micro Aerial Vehicle, Journal of Field Robotics, 2015]

3DAround, iPhone app by Dacuda.