Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

What Happened Last Time? Human 3D perception (3D cinema) Computational stereo Intuitive explanation of what is meant by disparity Stereo matching problem Various applications of stereo

What Is Going to Happen Today? Stereo from a technical point of view: Stereo pipeline Epipolar geometry Epipolar rectification Depth via triangulation Challenges in stereo matching Commonly used assumptions Middlebury stereo benchmark

Stereo from a Technical Point of View Michael Bleyer LVA Stereo Vision

Stereo Pipeline: Left Image / Right Image → Epipolar Rectification → Rectified Left Image / Rectified Right Image → Stereo Matching → Disparity Map → Depth via Triangulation → 3D Scene Reconstruction

Epipolar rectification and depth via triangulation are discussed in this session; stereo matching is discussed in this and all other sessions. Let us start with epipolar rectification.

Pinhole Camera (focal point, image plane): The simplest model for describing the projection of a 3D scene onto a 2D image. The model is commonly used in computer vision.

Image Formation Process Let us assume we have a pinhole camera. The pinhole camera is characterized by its focal point Cl and its image plane L.

Image Formation Process We also have a second pinhole camera <Cr,R>. We assume that the camera system is fully calibrated, i.e. the 3D positions of <Cl, L> and <Cr,R> are known.

Image Formation Process We have a 3D point P.

Image Formation Process We compute the 2D projection pl of P onto the image plane of the left camera L by intersecting the ray from Cl to P with the plane L. This is what is happening when you take a 2D image of a 3D scene with your camera (image formation process).

Image Formation Process Nice, but actually we want to do exactly the opposite: we have a 2D image and want to make it 3D.

3D Reconstruction Task: We have a 2D point pl and want to compute its 3D position P.

3D Reconstruction P has to lie on the ray from Cl through pl. Problem: It can lie anywhere on this ray.

3D Reconstruction Let us assume we also know the 2D projection pr of P onto the right image plane R.

3D Reconstruction P can now be reconstructed by intersecting the rays Clpl and Crpr.

3D Reconstruction The challenging part is to find the pair of corresponding pixels pl and pr that are projections of the same 3D point P. This is the stereo matching problem.

3D Reconstruction Problem: Given pl, the corresponding pixel pr can lie at any x- and y-coordinate in the right image. Can we make the search easier?

Epipolar Geometry We have stated that P has to lie on the ray Clpl.

Epipolar Geometry If we project each candidate 3D point onto the right image plane, we see that they all lie on a line in R.

Epipolar Geometry This line is called the epipolar line of pl. The epipolar line is the projection of the ray Clpl onto the right image plane R. The pixel pr is forced to lie on pl's epipolar line.

Epipolar Geometry To find the corresponding pixel, we only have to search along the epipolar line of pl (a 1D instead of a 2D search). This search space restriction is known as the epipolar constraint.

Epipolar Rectification A specifically interesting case: the image planes L and R lie in a common plane and the x-axes are parallel to the baseline. The epipolar lines then coincide with horizontal scanlines => corresponding pixels have the same y-coordinate.

Epipolar Rectification To find the corresponding pixel, we only have to search along the horizontal scanline, which is more convenient than tracing arbitrary epipolar lines. The difference in x-coordinates of corresponding pixels is called the disparity.

Epipolar Rectification This special case can be achieved by reprojecting the left and right images onto virtual cameras. This process is known as epipolar rectification. Throughout the rest of the lecture we assume that images have been rectified. Original images: white lines represent epipolar lines. Rectified images: epipolar lines coincide with horizontal scanlines. Images taken from http://profs.sci.univr.it/~fusiello/rectif_cvol/node6.html
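In practice, this reprojection is usually left to a calibration library. Below is a minimal sketch of how rectification could be done with OpenCV; it is not part of the lecture material, and the calibration constants and file names are made-up placeholders (real values would come from a calibration procedure such as cv2.stereoCalibrate).

```python
import cv2
import numpy as np

# Made-up calibration data for illustration only.
w, h = 640, 480
K1 = K2 = np.array([[700.0, 0.0, w / 2],
                    [0.0, 700.0, h / 2],
                    [0.0, 0.0, 1.0]])
d1 = d2 = np.zeros(5)            # assume no lens distortion
R = np.eye(3)                    # relative rotation between the two cameras
T = np.array([-0.1, 0.0, 0.0])   # 10 cm baseline along the x-axis

# Rectifying rotations R1/R2 and projection matrices P1/P2 of the virtual
# cameras whose image planes lie in a common plane.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R, T)

# Per-pixel lookup maps that reproject each image onto its virtual camera.
map_lx, map_ly = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
map_rx, map_ry = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)

img_l = cv2.imread("left.png")   # placeholder file names
img_r = cv2.imread("right.png")
rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
# Corresponding pixels in rect_l / rect_r now share the same y-coordinate.
```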

Epipolar Constraint Concluding Remarks The epipolar constraint should always be used, because: (1) a 1D search is computationally faster than a 2D search; (2) the reduced search range lowers the chance of finding a wrong match (and thereby improves the quality of the depth maps); (3) it is more or less the only constraint that will always be valid in stereo matching (unless there are calibration errors).

Stereo Pipeline: Left Image / Right Image → Epipolar Rectification → Rectified Left Image / Rectified Right Image → Stereo Matching → Disparity Map → Depth via Triangulation → 3D Scene Reconstruction. Let us for now assume that stereo matching has been solved and look at depth via triangulation.

Depth via Triangulation

Depth via Triangulation Similar triangles: $X / Z = x_l / f$

Depth via Triangulation Similar triangles: $(X - B) / Z = x_r / f$

Depth via Triangulation From similar triangles: $X / Z = x_l / f$ and $(X - B) / Z = x_r / f$. Write $X$ in explicit form: $X = Z \cdot x_l / f$ and $X = Z \cdot x_r / f + B$. Combine both equations: $Z \cdot x_l / f = Z \cdot x_r / f + B$, hence $Z \cdot x_l = Z \cdot x_r + B \cdot f$ and $Z \cdot (x_l - x_r) = B \cdot f$. Write $Z$ in explicit form: $Z = B \cdot f / (x_l - x_r) = B \cdot f / d$, where $d = x_l - x_r$ is the disparity.

Depth via Triangulation From $Z = B \cdot f / (x_l - x_r) = B \cdot f / d$ we see: disparity and depth are inversely proportional! Therefore, disparity is commonly used synonymously with depth.
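To make the inverse relation concrete, here is a small sketch (not from the lecture) that converts a disparity map into depth via $Z = B \cdot f / d$; the baseline and focal length are arbitrary illustration values.

```python
import numpy as np

def disparity_to_depth(disparity, baseline, focal_length):
    """Convert disparity to depth via Z = B * f / d. Pixels with a
    non-positive disparity (e.g. matching failures) get infinite depth."""
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = baseline * focal_length / disparity[valid]
    return depth

# Illustrative values: 10 cm baseline, focal length of 700 pixels.
d = np.array([[70.0, 35.0], [14.0, 7.0]])
print(disparity_to_depth(d, baseline=0.1, focal_length=700.0))
# -> [[ 1.  2.]
#     [ 5. 10.]]  (halving the disparity doubles the depth)
```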

Stereo Pipeline: Left Image / Right Image → Epipolar Rectification → Rectified Left Image / Rectified Right Image → Stereo Matching → Disparity Map → Depth via Triangulation → 3D Scene Reconstruction. We will now focus on stereo matching (throughout the rest of the lecture).

Challenges in Stereo Matching Michael Bleyer LVA Stereo Vision

Stereo Matching Left 2D Image Right 2D Image Disparity Map

Why is Stereo Matching Challenging? (1) Color inconsistencies: When solving the stereo matching problem, we typically assume that corresponding pixels have the same intensity/color (= photo consistency assumption). This does not need to hold, due to: image noise; different illumination conditions in the left and right images; different sensor characteristics of the two cameras; specular reflections (mirroring); sampling artefacts; matting artefacts.

Why is Stereo Matching Challenging? (2) Untextured regions (matching ambiguities): There needs to be a certain amount of intensity/color variation (i.e. texture) so that a pixel can be uniquely matched in the other view. Can you (as a human) perceive depth if you are standing in front of a wall that is completely white? Left image (no texture in the background), right image, computed disparity map (errors in the background).

Why is Stereo Matching Challenging? (3) The occlusion problem: There are pixels that are visible in only one of the two views. We call these pixels occluded (or half-occluded). It is difficult to estimate depth for these pixels. The occlusion problem makes stereo more challenging than a lot of other computer vision problems. (Figure: an occluded pixel.)

The Occlusion Problem Background Object / Foreground Object: Let's consider a simple scene composed of a foreground and a background object.

The Occlusion Problem Regular case: The white pixel P1 can be seen by both cameras.

The Occlusion Problem Occlusion in the right camera: The left camera sees the grey pixel P2. The ray from the right camera to P2 hits the white foreground object => P2 cannot be seen by the right camera.

The Occlusion Problem Occlusion in the left camera: The right camera sees the grey pixel P3. The ray from the left camera to P3 hits the white foreground object => P3 cannot be seen by the left camera.

The Occlusion Problem Occlusions occur in the proximity of disparity discontinuities.

The Occlusion Problem Occlusions occur as a consequence of discontinuities in depth. They occur close to object/depth boundaries. They occur in both frames (symmetrically).

The Occlusion Problem Left Image (Occlusions in red color) Right Image (Occlusions in red color) In the left image, occlusions are located to the left of a disparity boundary. In the right image, occlusions are located to the right of a disparity boundary.

The Occlusion Problem Correct Disparity Map (Geometry of Left Image) vs. Computed Disparity Map (Occlusions Ignored): It is difficult to find the disparity if the matching point does not exist. Ignoring the occlusion problem leads to disparity artefacts near disparity borders, here to the left of depth boundaries.

Commonly Used Assumptions Michael Bleyer LVA Stereo Vision

Assumptions Assumptions are needed to solve the stereo matching problem. Stereo methods differ in What assumptions they use How they implement these assumptions We have already learned two assumptions: Which ones?

Photo Consistency and Epipolar Assumptions Photo consistency assumption: Corresponding pixels have the same intensity/color in both images. Epipolar assumption: The matching point of a pixel has to lie on the same horizontal scanline in the other image. We can combine both assumptions to obtain our first stereo algorithm. Algorithm 1: For each pixel p of the left image, search the pixel q in the right image that lies on the same y-coordinate as p (Epipolar assumption) and has the most similar color in comparison to p (Photo Consistency).
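A minimal sketch of Algorithm 1, assuming rectified grayscale images given as NumPy arrays; this naive winner-takes-all matcher is only meant to make the algorithm, and the ambiguity problem it runs into, tangible.

```python
import numpy as np

def algorithm_1(left, right, max_disparity):
    """For each pixel of the left image, pick the pixel on the same
    scanline of the right image (epipolar assumption) with the most
    similar intensity (photo consistency). Winner-takes-all."""
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            best_cost = np.inf
            for d in range(min(max_disparity, x) + 1):
                cost = abs(float(left[y, x]) - float(right[y, x - d]))
                if cost < best_cost:
                    best_cost, disparity[y, x] = cost, d
    return disparity

# Tiny example; on real images, many pixels on a scanline share the same
# color, so the per-pixel minimum is highly ambiguous.
left = np.array([[10, 20, 30, 40]], dtype=np.uint8)
right = np.array([[20, 30, 40, 50]], dtype=np.uint8)
print(algorithm_1(left, right, max_disparity=2))   # -> [[0 1 1 1]]
```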

Results of Algorithm 1 Left Image Computed Disparity Map Quite disappointing, why? We have posed the following task: I have a red pixel. Find me a red pixel in the other image. Problem: There are usually many red pixels in the other image (ambiguity). We need additional assumptions.

Results of Algorithm 1 Correct Disparity Map Computed Disparity Map What is the most obvious difference between the correct and the computed disparity maps?

Smoothness Assumption (1) Observation: A correct disparity map typically consists of regions of constant (or very similar) disparity, for example the lamp, the head, the table. We can give this a priori knowledge to a stereo algorithm in the form of a smoothness assumption. Left Image Correct Disparity Map

Smoothness Assumption (2) Smoothness assumption: Spatially close pixels have the same (or similar) disparity. (By spatially close I mean pixels of similar image coordinates.) The smoothness assumption typically holds true almost everywhere, except at disparity borders. (Figure: regions where the smoothness assumption is valid vs. regions where it is not valid.)

Smoothness Assumption (3) Almost every stereo algorithm uses the smoothness assumption. Stereo algorithms are commonly divided into two categories based on the form in which they apply the smoothness assumption. These categories are: Local methods Global methods

Local Methods Compare color values within search windows to find the point of maximum correspondence. (a) Left image. (b) Right image. Compare small windows in the left and right images. Within the window, pixels are supposed to have the same disparity => implicit smoothness assumption. We will learn a lot about these methods in the next session.
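As an illustration of a local method, here is a sketch of simple block matching with a sum-of-absolute-differences (SAD) cost; the window radius and disparity range are arbitrary choices, and real local methods (e.g. adaptive weights) are considerably more refined.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_block_matching(left, right, max_disparity, radius=3):
    """Local stereo matching: aggregate absolute intensity differences
    over a square window and pick, per pixel, the disparity of minimum
    cost (winner-takes-all)."""
    h, w = left.shape
    cost = np.full((max_disparity + 1, h, w), np.inf, dtype=np.float32)
    for d in range(max_disparity + 1):
        # Per-pixel photo-consistency cost for disparity d ...
        diff = np.abs(left[:, d:].astype(np.float32) -
                      right[:, :w - d].astype(np.float32))
        # ... averaged over the window: all pixels inside the window are
        # implicitly assumed to share disparity d (smoothness assumption).
        cost[d, :, d:] = uniform_filter(diff, size=2 * radius + 1)
    return np.argmin(cost, axis=0)
```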

Global Methods Define a cost function to measure the quality of a disparity map: high costs mean that the disparity map is bad, low costs mean it is good. The cost function is typically of the form $E = E_{data} + E_{smooth}$, where $E_{data}$ measures photo consistency and $E_{smooth}$ measures smoothness. Global methods express the smoothness assumption in an explicit form (as a smoothness term). The challenge is to find a disparity map of minimum cost (sessions 4 and 5).
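As a sketch of what such a cost function could look like, the following evaluates $E = E_{data} + E_{smooth}$ for a candidate disparity map; the absolute-difference data term and the weight lam are illustrative choices, and minimizing this energy (the actual hard part, e.g. via graph cuts or belief propagation) is the topic of sessions 4 and 5.

```python
import numpy as np

def energy(disparity, left, right, lam=10.0):
    """E = E_data + E_smooth for a candidate (integer) disparity map.
    E_data: absolute intensity difference of matched pixels (photo consistency).
    E_smooth: penalty on disparity differences of neighboring pixels."""
    h, w = left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.clip(xs - disparity, 0, w - 1)   # matched column in the right image
    e_data = np.abs(left.astype(np.float32) -
                    right[ys, xr].astype(np.float32)).sum()
    e_smooth = (np.abs(np.diff(disparity, axis=0)).sum() +
                np.abs(np.diff(disparity, axis=1)).sum())
    return e_data + lam * e_smooth
```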

Uniqueness Constraint The uniqueness constraint will help us to handle the occlusion problem. It states: A pixel in one frame has at most a single matching point in the other frame. This is in general valid, but broken for: transparent objects; slanted surfaces.

Uniqueness Constraint Depending on the scene, a left pixel can have 0 matching points (it is occluded by an object), 1 matching point (the regular case), or 2 matching points. If we assume objects to be opaque (non-transparent), the last case cannot occur.
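A common practical use of the uniqueness constraint is a left-right consistency check (not named on the slides, but standard practice): compute one disparity map per view and invalidate pixels whose two estimates disagree, which typically flags occluded pixels. A minimal sketch, assuming both maps have already been computed:

```python
import numpy as np

def left_right_check(disp_left, disp_right, tol=1):
    """Mark pixels that violate the uniqueness constraint. A left pixel p
    matching right pixel q is kept only if q's disparity points back to
    (roughly) p; occluded pixels typically fail the test and get -1."""
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.clip(xs - disp_left.astype(int), 0, w - 1)  # match in right image
    consistent = np.abs(disp_left - disp_right[ys, xr]) <= tol
    return np.where(consistent, disp_left, -1)
```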

Other Assumptions Ordering assumption: The order in which pixels occur is preserved in both images; does not hold for thin foreground objects. Disparity gradient limit: Originates from psychology; it is not clear whether the assumption is valid for arbitrary camera setups. Both assumptions have rarely been used recently => they are slightly obsolete.

Middlebury Stereo Benchmark Michael Bleyer LVA Stereo Vision

Ground Truth Data Left image Right image Ground truth disparities Ground truth = the correct solution to a given problem. The absence of ground truth data has represented a major problem in computer vision: For most computer vision problems, not a single real test image with a ground truth solution has been available. Computer-generated ground truth images oftentimes do not reflect the challenges of real data recorded with a camera. It is difficult to measure the progress in a field if there is no commonly agreed data set with a ground truth solution.

Ground Truth Data Ground Truth data is now available for a wide range of computer vision problems including: Object recognition Alpha matting Optical flow MRF-optimization Multi view reconstruction For stereo, ground truth data is available on the Middlebury Stereo Evaluation website http://vision.middlebury.edu/stereo/ The Middlebury set is widely adopted in the stereo community.

The Middlebury Set

How Can One Generate Ground Truth Disparities? Hand labelling (used for the Tsukuba test set and its disparity map): extremely labor-intensive. Most other Middlebury ground truth disparity maps have been created using a more precise depth computation technique than stereo matching, namely structured light.

Setup Used for Generating the Middlebury Images Different light patterns are projected onto the scene to compute a high-quality depth map (Depth from structured light).

Setup Used for Generating the Middlebury Images We are currently looking for students to set up a similar ground truth system. Tell me if you are interested. Different light patterns are projected onto the scene to compute a high-quality depth map (depth from structured light).

Disparity Map Quality Evaluation in the Middlebury Benchmark Estimation of wrong pixels: Compute the absolute difference between the computed and ground truth disparity maps. If the absolute disparity difference is larger than one pixel, the pixel is counted as an error. (Figures: ground truth disparity map and error map, i.e. pixels having an absolute disparity error > 1 px.) 3 error metrics: percentage of erroneous pixels in (1) unoccluded regions, (2) the whole image and (3) regions close to disparity borders.
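A sketch of how such an error metric could be computed, assuming the computed and ground truth disparity maps are given as arrays and an optional boolean mask selects the evaluation region (e.g. unoccluded pixels):

```python
import numpy as np

def bad_pixel_rate(computed, ground_truth, mask=None, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds the
    threshold (1 px in the Middlebury benchmark). `mask` restricts the
    evaluation, e.g. to unoccluded regions or to disparity borders."""
    error = np.abs(computed.astype(np.float32) -
                   ground_truth.astype(np.float32))
    if mask is None:
        mask = np.ones(error.shape, dtype=bool)
    bad = np.count_nonzero((error > threshold) & mask)
    return 100.0 * bad / np.count_nonzero(mask)
```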

The Middlebury Online Benchmark If you have implemented a stereo algorithm, you can evaluate its performance using the Middlebury benchmark. You have to run it on the benchmark's 4 image pairs. The 3 error metrics are then computed for each image pair. Your algorithm is then ranked according to the computed error values.

The Middlebury Table Currently, more than 70 methods are evaluated (and many more are being added). You should use this table to rank the stereo matching algorithm you develop as your homework.

General Findings in the Middlebury Table Global methods outperform local methods. Local methods: Adaptive weight methods represent the state-of-the-art. Global methods: Methods that apply Belief Propagation or Graph-Cuts in the optimization step outperform dynamic programming methods (if such categorization makes sense) All top-performing methods apply color segmentation.

Summary 3D geometry Challenges Ambiguity Occlusions Assumptions Photo consistency Smoothness assumption Uniqueness assumption Middlebury benchmark