CS 4495/7495 Computer Vision Frank Dellaert, Fall 07. Dense Stereo Some Slides by Forsyth & Ponce, Jim Rehg, Sing Bing Kang

Etymology Stereo comes from the Greek word for solid (στερεο), and the term can be applied to any system using more than one channel

Effect of Moving Camera As the camera is shifted (viewpoint changed), 3D points are projected to different 2D locations. The amount of shift in the projected 2D location depends on the point's depth. These 2D shifts are called parallax.

Basic Idea of Stereo Triangulate on two images of the same point to recover depth. This requires feature matching across views and calibrated cameras; in practice, matching is done by comparing correlation windows across scan lines.

Why is Stereo Useful? It is passive and noninvasive. Applications: robot navigation (path planning, obstacle detection), 3D modeling (shape analysis, reverse engineering, visualization), photorealistic rendering.

Outline Pinhole camera model; basic (2-view) stereo algorithm (equations, window-based matching with SSD, dynamic programming); multiple view stereo.

Pinhole Camera Model The image plane lies at focal length f behind the center of projection; a virtual image plane can be placed at focal length f in front of it. In the actual image plane, the scene appears inverted; in the virtual image, it appears right side up. For expediency, we use the virtual image for analysis.

Pinhole Camera Model A 3D scene point P = (X, Y, Z) is projected to a point Q in the virtual image plane. By similar triangles (simple rescaling by f/Z), the 2D coordinates of Q in the virtual image are u = f X / Z, v = f Y / Z. Note: the image center is (0,0).
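As a concrete check of the projection equations, here is a minimal sketch (the point coordinates and the focal-length value are illustrative, not from the slides):

```python
def project(P, f):
    """Pinhole projection onto the virtual image plane:
    u = f*X/Z, v = f*Y/Z, with the image center at (0, 0)."""
    X, Y, Z = P
    return f * X / Z, f * Y / Z

# A point 2 m in front of the camera and 0.5 m to the right, f = 500 pixels:
u, v = project((0.5, 0.0, 2.0), f=500.0)
# u = 125.0, v = 0.0
```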

Basic Stereo Derivations Two cameras with centers O_L and O_R are separated by a baseline B along the x axis. The point P = (X, Y, Z) projects to (u_L, v_L) in the left image and (u_R, v_R) in the right image. Important note: because the camera shifts only along x, v_L = v_R.

Basic Stereo Derivations Disparity: d = u_L − u_R = f B / Z.

Stereo Vision Z(x, y) = f B / d(x, y), where Z(x, y) is the depth at pixel (x, y) and d(x, y) is the disparity. Depth is recovered by matching correlation windows across scan lines.
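The depth equation can be applied per pixel once a disparity map is available; a minimal sketch (the focal-length and baseline values are made up):

```python
import numpy as np

def depth_from_disparity(d, f, B):
    """Z(x, y) = f * B / d(x, y); disparity d in pixels, baseline B in meters."""
    d = np.asarray(d, dtype=float)
    Z = np.full_like(d, np.inf)           # zero disparity -> point at infinity
    np.divide(f * B, d, out=Z, where=d > 0)
    return Z

# f = 500 px, B = 0.1 m: a 10-px disparity corresponds to 5 m depth.
Z = depth_from_disparity([10.0, 50.0, 0.0], f=500.0, B=0.1)
# Z = [5.0, 1.0, inf]
```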

Components of Stereo Matching criterion (error function): quantifies the similarity of pixels; most common is the direct intensity difference. Aggregation method: how the error function is accumulated; options are pixel, edge, window, or segmented regions. Optimization and winner selection: examples are winner-take-all, dynamic programming, graph cuts, belief propagation.

Stereo Correspondence Search over disparity to find correspondences. The range of disparities can be large: distant points show virtually no shift, nearby points a large shift.

Correspondence Using Window-based Correlation For each window on a left scanline, compute the SSD error as a function of disparity along the corresponding right scanline. Matching criterion = sum-of-squared differences; aggregation method = fixed window size; winner selection = winner-take-all.

Sum of Squared (Intensity) Differences w_L and w_R are corresponding (2m+1) × (2m+1) windows of pixels. We define the window function: W_m(x, y) = {(u, v) | x − m ≤ u ≤ x + m, y − m ≤ v ≤ y + m}. The SSD cost measures the intensity difference as a function of disparity d: C_r(x, y, d) = Σ_{(u,v) ∈ W_m(x,y)} [I_L(u, v) − I_R(u − d, v)]²
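A winner-take-all SSD matcher for a single pixel might look like this (a sketch, not the lecture's code; the image sizes, window radius m, and disparity range are arbitrary):

```python
import numpy as np

def ssd_disparity(left, right, y, x, m, max_d):
    """Winner-take-all SSD match for pixel (x, y): compare the (2m+1)x(2m+1)
    left-image window against windows shifted left by d in the right image."""
    wl = left[y - m:y + m + 1, x - m:x + m + 1].astype(float)
    costs = []
    for d in range(max_d + 1):
        wr = right[y - m:y + m + 1, x - d - m:x - d + m + 1].astype(float)
        costs.append(np.sum((wl - wr) ** 2))          # C(x, y, d)
    return int(np.argmin(costs))

# Toy example: the right image is the left image shifted left by 3 px,
# so a point at left-image column x appears at right-image column x - 3.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(9, 40))
right = np.roll(left, -3, axis=1)
print(ssd_disparity(left, right, y=4, x=20, m=2, max_d=8))  # -> 3
```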

Correspondence Using Correlation Left Disparity Map Images courtesy of Point Grey Research

Image Normalization Images may be captured under different exposures (gain and aperture); cameras may have different radiometric characteristics; surfaces may not be Lambertian. Hence, it is reasonable to normalize the pixel intensities in each window (to remove bias and scale). Average pixel: Ī = (1 / |W_m(x, y)|) Σ_{(u,v) ∈ W_m(x,y)} I(u, v). Window magnitude: ‖I‖_{W_m(x,y)} = sqrt( Σ_{(u,v) ∈ W_m(x,y)} [I(u, v)]² ). Normalized pixel: Î(x, y) = (I(x, y) − Ī) / ‖I − Ī‖_{W_m(x,y)}.

Images as Vectors Unwrap each window to form a vector, using raster scan order (row 1, row 2, row 3, ...). Each window is then a vector in an m²-dimensional vector space. Normalization makes the vectors unit length.

Image Metrics Let θ be the angle between the window vectors w_L and w_R(d). (Normalized) sum of squared differences: C_SSD(d) = Σ_{(u,v) ∈ W_m(x,y)} [Î_L(u, v) − Î_R(u − d, v)]² = ‖w_L − w_R(d)‖². Normalized correlation: C_NC(d) = Σ_{(u,v) ∈ W_m(x,y)} Î_L(u, v) Î_R(u − d, v) = w_L · w_R(d) = cos θ. The optimal disparity: d* = argmin_d ‖w_L − w_R(d)‖² = argmax_d w_L · w_R(d).
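To see why normalized correlation is invariant to bias and gain, here is a small sketch of the window normalization and the cosine score (the test window is synthetic):

```python
import numpy as np

def normalize_window(w):
    """Zero-mean, unit-length window vector (the Î of the normalization slide)."""
    v = w.astype(float).ravel()
    v = v - v.mean()
    return v / np.linalg.norm(v)

def ncc(wl, wr):
    """Normalized correlation = cosine of the angle between window vectors."""
    return float(np.dot(normalize_window(wl), normalize_window(wr)))

# A gain/bias change (scale by 2, add 30) leaves the score at a perfect 1:
w = np.arange(25.0).reshape(5, 5)
print(round(ncc(w, 2.0 * w + 30.0), 6))  # -> 1.0
```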

Caveat Image normalization should be used only when deemed necessary: under normalized intensity, the equivalence classes of windows that look similar are substantially larger than under direct intensity, leading to more matching ambiguities.

Alternative: Histogram Warping Compare the intensity histograms of the two images and warp them towards each other (assumes significant visual overlap between the images). Cox, Roy, & Hingorani 95: Dynamic Histogram Warping.

Two major roadblocks Textureless regions create ambiguities; occlusions result in missing data.

Dealing with ambiguities and occlusion Ordering constraint: impose the same matching order along scanlines. Uniqueness constraint: each pixel in one image maps to a unique pixel in the other. Both constraints can be encoded easily in dynamic programming.

Pixel-based Stereo Match individual pixels along corresponding left and right scanlines, viewed from the centers of the left and right cameras. (Note: I'm using the actual, not virtual, image here.)

Stereo Correspondences Here the right image is the reference; the definition of occlusion/disocclusion depends on which image is considered the reference. Moving from left to right along the scanlines: pixels that disappear are occluded; pixels that appear are disoccluded.

Search Over Correspondences Search over matchings between the left and right scanlines, allowing for occluded and disoccluded pixels. Three cases: sequential (cost of match), occluded (cost of no match), disoccluded (cost of no match).

Stereo Matching with Dynamic Programming Lay out the left scanline against the right scanline as a grid, with occluded and disoccluded pixels corresponding to horizontal and vertical moves. Dynamic programming yields the optimal path through the grid from start to end: the best set of matches that satisfies the ordering constraint.
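The grid search can be sketched as a small edit-distance-style dynamic program (the occlusion penalty and the toy scanlines are made up, and real implementations match windows rather than single pixels):

```python
import numpy as np

def dp_scanline(left, right, occ=400.0):
    """Minimum-cost ordered matching of two scanlines.
    Match cost = squared intensity difference; each occluded or
    disoccluded pixel pays a fixed penalty occ."""
    n, m = len(left), len(right)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, :] = occ * np.arange(m + 1)      # leading disocclusions
    C[:, 0] = occ * np.arange(n + 1)      # leading occlusions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = C[i-1, j-1] + (float(left[i-1]) - float(right[j-1])) ** 2
            C[i, j] = min(match,          # sequential: match the pixels
                          C[i-1, j] + occ,   # left pixel occluded
                          C[i, j-1] + occ)   # right pixel disoccluded
    return C[n, m]

# Same bright feature seen 1 px apart: the optimal path skips one pixel on
# each side (two penalties) instead of paying huge intensity differences.
left  = [10, 10, 200, 10, 10]
right = [10, 200, 10, 10, 10]
print(dp_scanline(left, right))  # -> 800.0
```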

Ordering Constraint is not Generally Correct It preserves matching order along scanlines, but cannot handle the double nail illusion.

Uniqueness Constraint is not Generally Correct A slanted plane can require matching between M pixels in one image and N pixels in the other.

Edge-based Stereo Another approach is to match edges rather than windows of pixels, yielding sparse correspondences. Which method is better? Edges tend to fail in dense texture (outdoors); correlation tends to fail in smooth, featureless areas.

Segmentation-based Stereo Hai Tao and Harpreet W. Sawhney

Another Example

From 2 views to >2 views With more views, more pixels vote for the right depth, which is statistically more robust. However, occlusion reasoning becomes more complicated, since we have to account for partial occlusion: which subset of cameras sees the same 3D point?