Static Scene Reconstruction

Similar documents
Gain Adaptive Real-Time Stereo Streaming

Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza

A Heightmap Model for Efficient 3D Reconstruction from Street-Level Video

Stereo matching. Reading: Chapter 11 In Szeliski s book

Evaluation of Large Scale Scene Reconstruction

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011

Lecture 10 Dense 3D Reconstruction

What have we leaned so far?

Multi-View 3D-Reconstruction

Geometric Reconstruction Dense reconstruction of scene geometry

Dense 3D Reconstruction. Christiano Gava

Mobile Point Fusion. Real-time 3d surface reconstruction out of depth images on a mobile platform

A GPU-based implementation of Motion Detection from a Moving Platform

EE795: Computer Vision and Intelligent Systems

Real-Time Video-Based Rendering from Multiple Cameras

Dense 3D Reconstruction. Christiano Gava

Segmentation Based Stereo. Michael Bleyer LVA Stereo Vision

Efficient Large-Scale Stereo Matching

Multiple View Geometry

Towards Urban 3D Reconstruction From Video

Real-Time Visibility-Based Fusion of Depth Maps

Hierarchical Upsampling for Fast Image-Based Depth Estimation

Parallel algorithms to a parallel hardware: Designing vision algorithms for a GPU

Stereo Vision II: Dense Stereo Matching

Multi-view stereo. Many slides adapted from S. Seitz

Image-based rendering using plane-sweeping modelisation

3D Photography: Stereo Matching

Camera Drones Lecture 3 3D data generation

Computer Vision Lecture 20

Computer Vision Lecture 20

3D Reconstruction Using an n-layer Heightmap

Feature tracking and matching in video using programmable graphics hardware

URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES

A Systems View of Large- Scale 3D Reconstruction

Realtime Affine-photometric KLT Feature Tracker on GPU in CUDA Framework

From Structure-from-Motion Point Clouds to Fast Location Recognition

Project 2 due today Project 3 out today. Readings Szeliski, Chapter 10 (through 10.5)

AUTOMATED 3D RECONSTRUCTION OF URBAN AREAS FROM NETWORKS OF WIDE-BASELINE IMAGE SEQUENCES

3D model search and pose estimation from single images using VIP features

Multi-View Stereo for Static and Dynamic Scenes

A virtual tour of free viewpoint rendering

Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich. LSD-SLAM: Large-Scale Direct Monocular SLAM

Piecewise Planar and Non-Planar Stereo for Urban Scene Reconstruction

Real-Time Scene Reconstruction. Remington Gong Benjamin Harris Iuri Prilepov

Real-Time 2D to 2D+Depth Video Conversion. OzViz David McKinnon QUT

AMS-HMI12: Assisted mobility supported by shared-control and advanced human-machine interfaces

Combining Monocular Geometric Cues with Traditional Stereo Cues for Consumer Camera Stereo

USING THE GPU FOR FAST SYMMETRY-BASED DENSE STEREO MATCHING IN HIGH RESOLUTION IMAGES

EECS 442 Computer vision. Stereo systems. Stereo vision Rectification Correspondence problem Active stereo vision systems

Subpixel accurate refinement of disparity maps using stereo correspondences

Stereo II CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz

PatchMatch Stereo - Stereo Matching with Slanted Support Windows

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

High-Performance Multi-View Reconstruction

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

Real-time Global Stereo Matching Using Hierarchical Belief Propagation

Live Metric 3D Reconstruction on Mobile Phones ICCV 2013

Stereo Matching.

Augmented and Mixed Reality

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

Geometry based Repetition Detection for Urban Scene

Stereo Vision. MAN-522 Computer Vision

Regular Paper Challenges in wide-area structure-from-motion

Monocular Visual Odometry and Dense 3D Reconstruction for On-Road Vehicles

Data Term. Michael Bleyer LVA Stereo Vision

Augmented Reality, Advanced SLAM, Applications

Homographies and RANSAC

EE795: Computer Vision and Intelligent Systems

Ping Tan. Simon Fraser University

Temporally Consistence Depth Estimation from Stereo Video Sequences

3D Reconstruction from Scene Knowledge

DiFi: Distance Fields - Fast Computation Using Graphics Hardware

SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs. Yu Li, Dongbo Min, Michael S. Brown, Minh N. Do, Jiangbo Lu

Combining Monocular Geometric Cues with Traditional Stereo Cues for Consumer Camera Stereo

Final project bits and pieces

Supplementary Material for ECCV 2012 Paper: Extracting 3D Scene-consistent Object Proposals and Depth from Stereo Images

Epipolar Geometry and Stereo Vision

CS4495/6495 Introduction to Computer Vision. 3B-L3 Stereo correspondence

Midterm Examination CS 534: Computational Photography

Lecture 10: Multi view geometry

Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors

CSE 527: Introduction to Computer Vision

TSBK03 Screen-Space Ambient Occlusion

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

A Computer Vision Approach for 3D Reconstruction of Urban Environments

REAL-TIME VIDEO-BASED RECONSTRUCTION OF URBAN ENVIRONMENTS

LS-ACTS 1.0 USER MANUAL

Stereo vision. Many slides adapted from Steve Seitz

OpenCL Implementation Of A Heterogeneous Computing System For Real-time Rendering And Dynamic Updating Of Dense 3-d Volumetric Data

Computer Vision I. Announcements. Random Dot Stereograms. Stereo III. CSE252A Lecture 16

Motion Estimation. There are three main types (or applications) of motion estimation:

Binocular stereo. Given a calibrated binocular stereo pair, fuse it to produce a depth image. Where does the depth information come from?

BUILDING POINT GROUPING USING VIEW-GEOMETRY RELATIONS INTRODUCTION

TEGRA K1 AND THE AUTOMOTIVE INDUSTRY. Gernot Ziegler, Timo Stich

Lecture 14: Computer Vision

Computational Optical Imaging - Optique Numerique. -- Multiple View Geometry and Stereo --

Direct Methods in Visual Odometry

Photo Tourism: Exploring Photo Collections in 3D

Visualization 2D-to-3D Photo Rendering for 3D Displays

Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo

Transcription:

GPU supported Real-Time Scene Reconstruction with a Single Camera Jan-Michael Frahm, 3D Computer Vision group, University of North Carolina at Chapel Hill Static Scene Reconstruction 1

Capture on campus & in downtown Chapel Hill 2x4 cameras, 1024x768@30Hz GPS location & orientation ~$(400 GPUs) Chapel Hill (15-30mph, 30fps) 2.6M frames (8 cams, 3 hours) 2 TB video data Capture on campus & in downtown Chapel Hill 2x4 cameras, 1024x768@30Hz GPS location & orientation ~$(400 GPUs) Chapel Hill (15-30mph, 30fps) 2.6M frames (8 cams, 3 hours) 2 TB video data 2

Capture on campus & in downtown Chapel Hill 2x4 cameras, 1024x768@30Hz GPS location & orientation ~$(400 GPUs) 6 camera-head GPS/Inertia sensors 2.6 million frames ground reconnaissance video frames aerial video 3

1.3 million frames (2 cams per side) 30 Hz reconstruction frame rate Pollefeys et al. IJCV 08 David Kirk: Factor 100 is fundamentally different! Speedup of ~120x Computation time with 3Ghz CPU, Nvidia GeForce GTX 280: 11.5hrs @ 30Hz ~2 weeks @ 1 Hz ~8 weeks @ 0.25 Hz 3D Reconstruction Result 4

3D model from video 3D model from video images, 2D feature correspondences 5

3D model from video images, 2D feature correspondences 3D model from video images, 2D feature correspondences images, camera poses 6

3D model from video images, 2D feature correspondences images, camera poses 3D model from video images, 2D feature correspondences images, camera poses images, camera poses, depth maps The brighter the farther away 7

3D model from video images, 2D feature correspondences images, camera poses images, camera poses, depth maps 3D model from video images, 2D feature correspondences images, camera poses images, camera poses, depth maps 8

3D model from video GPU speedup ~190 images, 2D feature correspondences images, camera poses images, camera poses, depth maps GPU speedup ~140 3D model from video images, 2D feature correspondences images, camera poses images, camera poses, depth maps 9

2D Feature Tracking Good (point) features to track Good features are distinct in both directions homogeneous edge corner 2D Feature Tracking Good (point) features to track Good features are distinct in both directions X X homogeneous edge corner 10

2D Feature Tracking Establish correspondences between identical points in images Assumption: image content varies slowly/smoothly Video sequences 2D Feature Tracking Establish correspondences between identical points in images Assumption: image content varies slowly/smoothly Video sequences 11

2D Feature Tracking Establish correspondences between identical points in images Assumption: image content varies slowly/smoothly Video sequences Gain-Adaptive KLT-Tracking Kim et al. ICCV 2007 Video with fixed gain Video with auto-gain Simultaneous tracking and radiometric calibration - But: not data parallel hard for GPU acceleration Continuous energy optimization [Zach et al. CVGPU 08] + Data parallel, very efficient on GPU 12

Gain Estimation Camera reported (blue) and estimated gains (red) GPU-KLT Sinha, Frahm, Pollefeys and Genc, Machine, Vision and Applications 07 Zach, Gallup, Frahm CVGPU 08 On CPU with 1000 features in 1024x768 video in 630 ms (Daimler only needs ~100ms on their bold machine without gain and no sub-pixel accuracy) [ms] time [ms] # features On GeForce GTX 280 3.3 ms! Speedup of ~190x (still ~ 30 compared to Daimlers CPU) Code available at: www.cs.unc.edu/~cmzach/klt 13

3D model from video images, 2D feature correspondences images, camera poses Research Roundtable: Computer Vision and Image Processing (1pm, SJCC J4) images, camera poses, depth maps Fast GPU-based planesweeping stereo Plane-sweep multi-view depth estimation (Yang & Pollefeys, CVPR 03) bad estimate Confidence: d d 0 -(c(d) - c(d 0 ))) 2 / e 2-1 For each plane similarity SSD = (I 1 -I 2 ) 2 Chose maximum similarity as best match good estimate 14

Confidence Results Captured Image Stereo Depth Map Confidence Map Compute SSAD per plane 4 planes at a time, 1 plane per color channel plane 0 plane 1 plane 2 plane 3 Compute plane homographies Warps matching view into reference view, as if it were on a given plane. Uses projective texture mapping 15

Compute SSAD per plane SAD fragment shader: N is compile-time constant float4 sad = float4(0,0,0,0); for(i=0; i<n; i++) { // N matching views float4 color; color[0] = tex2dproj(tex[i],coord[i][0]); // plane 0 color[1] = tex2dproj(tex[i],coord[i][1]); // plane 1 color[2] = tex2dproj(tex[i],coord[i][2]); // plane 2 color[3] = tex2dproj(tex[i],coord[i][3]); // plane 3 sad += abs(refcolor-color); } Aggregate with box filter 8 x 8 box filter + + + + + + + + iteration 1 + + + + + + + + iteration 2 + + + + + + + + iteration 3 16

Aggregate with box filter + + + + + + + + iteration 1 + + + + + + + + iteration 2 + + + + + + + + iteration 3 Multi-directional planes Gallup, Frahm, Mordohai, Pollefeys CVPR 07 Improved stereo through plane alignment Urban scenes typically have three dominant surface normals Planes can be automatically estimated from planar camera motion assumption and vanishing points 17

Depth Results from Stereo Greedy (best cost) labeling GPU-Based Global Labeling Goal: assign a label to each pixel, such that The overall data (label) cost is small, and Neighboring pixels prefer the same label Spatial regularization or smoothness Graph cut optimization 2s Continuous energy with Potts model 60 ms Greedy (best cost) labeling Speedup ~33 on Nvidia GeForce 8800 GTX 18

Multi-way plane-sweep Stereo Gallup, Frahm, Mordohai, Pollefeys CVPR 07 Color labels correspond to selected sweeping direction After label optimization Stereo results Video side camera Depth side camera 19

3D model from video images, 2D feature correspondences images, camera poses images, camera poses, depth maps Consensus Surface Fusion on GPU Merrel et al. ICCV 07 Images Depth Maps Stereo Stereo Stereo Stereo Stereo + Confidence + Confidence + Confidence + Confidence Fusion Fusion Double Surface Removal 20

Visibility Relationships Render all depth maps into reference view Choose most confident points along rays Add support from nearby points Remove support for occlusions and free-space violations Reject points with too little support Free- Space Violation view n Support Occlusion reference view Render depthmaps to reference 21

Render depthmaps to reference Point rendering Z buffer Choose most Confident For each depthmap: depth = tex2d(tex,coord).x; conf = tex2d(tex,coord).y; bestdepth = tex2d(texbest,coord).x; bestconf = tex2d(texbest,coord).y; if(conf > bestconf) { bestdepth = depth; bestconf = conf; } OUT.color = float2(bestdepth,bestconf); 22

Add support from nearby points For each depthmap: depth = tex2d(tex,coord).x; conf = tex2d(tex,coord).y; bestdepth = tex2d(texbest,coord).x; support = tex2d(texbest,coord).z; if(depth < bestdepth*1.05 && depth > bestdepth*0.95) { support += conf; } Remove support for occlusions occlusions For each depthmap: depth = tex2d(tex,coord).x; conf = tex2d(tex,coord).y; bestdepth = tex2d(texbest,coord).x; support = tex2d(texbest,coord).z; if(depth < bestdepth*0.95) { support -= conf; } 23

Remove support for free-space violations free-space violations For each depthmap: bestdepth = tex2d(texbest,coord).x; support = tex2d(texbest,coord).z; pt = mul(p,float4(coord.x,coord.y bestdepth,1)); depth = tex2dproj(tex,pt).x; conf = tex2dproj(tex,pt).y; if(depth > pt.z*1.05) { support -= conf; } points with too little support are then rejected Middlebury 3D Evaluation Merrell et al., ICCV 07 & VRML 07 Ring datasets: 47 images Results competitive but much, much faster (30 hours 30 seconds) 21 sec 24

Example Curb Reconstruction Capture & 2D processing Courtesy Daimler Example Curb Reconstruction Capture & 2D processing 25

3D model from video images, 2D feature correspondences images, camera poses images, camera poses, depth maps CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory Shared Memory 26

CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory Shared Memory CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory Shared Memory 27

CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory Shared Memory CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory 28

CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory CUDA Stereo Read image blocks from left and right images. For all disparities, compute SAD matching score, recording disparity with minimum score. Compute for left and right disparity maps Global Memory 29

CUDA Stereo Detect occlusions and outliers by left /right depthmap consistency check. If pixel x in left image corresponds to x in right image, then x should correspond to x, or one of its neighbors. 10% speedup compared to OpenGL Two-view Stereo Results 30

Global Stereo Shares some similarity with global labeling More labels (all possible depth values) Embedded in higher dimension Find surface separating the near from the far plane With minimal costs Data cost (NCC), and Smoothness cost Local updates of values according to PDEs Data parallel! GPU speedup 80x but still 4s! Global Stereo results 31

Conclusions GPU based real-time scene reconstruction system 2D tracking on GPU Stereo estimation on GPU Dense map fusion on GPU Algorithmic redesign to better deploy the GPU What is next? Where am I? x 1.3 M image model Database image Database 3D model 32

Static Scene Reconstruction Thank you! 33