Vision 3D artificielle: Disparity maps, correlation

Vision 3D artificielle: Disparity maps, correlation. Pascal Monasse, monasse@imagine.enpc.fr, IMAGINE, École des Ponts ParisTech, http://imagine.enpc.fr/~monasse/stereo/

Contents: Triangulation, Epipolar rectification, Disparity map


Triangulation
Let us write again the binocular formulae: $\lambda x = K(RX + T)$, $\lambda' x' = K'X$. Writing $Y^T = (X^T\ 1\ \lambda\ \lambda')$, this becomes
$$\begin{pmatrix} KR & KT & -x & 0_3 \\ K' & 0_3 & 0_3 & -x' \end{pmatrix} Y = 0_6$$
(6 equations, 5 unknowns + 1 epipolar constraint). We can then recover $X$.
Special case: $R = \mathrm{Id}$, $T = B e_1$. We get:
$$z\,(x - K K'^{-1} x') = (Bf\ 0\ 0)^T$$
If also $K = K'$, then $z = fB / [(x - x') \cdot e_1] = fB/d$, where $d$ is the disparity.
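As a quick numerical check of the canonical-case formula $z = fB/d$, here is a minimal sketch (the focal length, baseline, and disparity values are illustrative, not from the course):

```cpp
#include <cassert>
#include <cmath>

// Depth from disparity in the rectified, identical-camera case: z = f*B/d.
// f: focal length in pixels, B: baseline in scene units, d: disparity in pixels.
double depthFromDisparity(double f, double B, double d) {
    return f * B / d;
}

// Inverse relation: disparity produced by a point at depth z.
double disparityFromDepth(double f, double B, double z) {
    return f * B / z;
}
```

With f = 1000 px and B = 0.1 m, a disparity of 20 px corresponds to a depth of 5 m; halving the disparity doubles the depth, reflecting the inverse proportionality.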

Triangulation
Fundamental principle of stereo vision: for $\delta z \ll H$,
$$\frac{\delta z}{H} \approx \frac{\delta d}{f\,B/H}, \quad\text{i.e.}\quad \delta d = \frac{fB}{H^2}\,\delta z.$$
$f$: focal length. $H$: distance optical center-ground. $B$: distance between optical centers (baseline).
Goal: given two rectified images, point correspondences and the computation of their apparent shift (disparity) give information about the relative depth of the scene.

Recovery of R and T
Suppose we know $K$, $K'$, and $F$ or $E$. Can we recover $R$ and $T$?
From $E = [T]_\times R$:
$$E^T E = R^T(\|T\|^2 I - TT^T)R = -(R^T T)(R^T T)^T + \|T\|^2 I$$
If $x = R^T T$, then $E^T E\,x = 0$, and if $y \perp x$, $E^T E\,y = \|T\|^2 y$. Therefore the singular values of $E$ satisfy $\sigma_1 = \sigma_2 = \|T\|$ and $\sigma_3 = 0$.
Conversely, from $E = U\,\mathrm{diag}(\sigma, \sigma, 0)\,V^T$, we can write
$$\mathrm{diag}(\sigma, \sigma, 0) = \sigma \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
so that $E = (\sigma U [e_3]_\times^T U^T)(U R_z(\pi/2) V^T) = \sigma [T]_\times R$.
Actually, there are up to 4 solutions:
$$[T]_\times = \pm\sigma U [e_3]_\times U^T, \qquad R = U R_z(\pm\tfrac{\pi}{2}) V^T$$
(Figure: the four possible configurations of cameras A and B.)

What is possible without calibration?
We can recover $F$, but not $E$. Indeed, from $\lambda x = PX$, $\lambda' x' = P'X$ we see that we also have:
$$\lambda x = (PH^{-1})(HX), \qquad \lambda' x' = (P'H^{-1})(HX)$$
Interpretation: applying a space homography $H$ and transforming the projection matrices accordingly (this changes $K$, $K'$, $R$ and $T$), we get exactly the same projections.
Consequence: in the uncalibrated case, reconstruction can only be done modulo a 3D space homography.

Contents: Triangulation, Epipolar rectification, Disparity map

Epipolar rectification
It is convenient to get to a situation where the epipolar lines are parallel and at the same ordinate in both images. As a consequence, the epipoles are at horizontal infinity:
$$e = e' = (1\ 0\ 0)^T$$
It is always possible to reach that situation by a virtual rotation of the cameras (application of a homography). The image planes then coincide and are parallel to the baseline.

Epipolar rectification: Image 1

Epipolar rectification: Image 2

Epipolar rectification: Image 1 / Rectified image 1

Epipolar rectification: Image 2 / Rectified image 2

Epipolar rectification
For a rectified pair, the fundamental matrix can be written:
$$F = [e_1]_\times = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad\text{thus}\quad x'^T F x = 0 \iff y = y'.$$
Writing $P = K(I\ \ 0)$ and $P' = K'(I\ \ -Be_1)$ with
$$K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}, \quad K' = \begin{pmatrix} f'_x & s' & c'_x \\ 0 & f'_y & c'_y \\ 0 & 0 & 1 \end{pmatrix}:$$
$$F = B\,K'^{-T} [e_1]_\times K^{-1} = \frac{B}{f_y f'_y}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -f_y \\ 0 & f'_y & c'_y f_y - c_y f'_y \end{pmatrix}$$
We must have $f_y = f'_y$ and $c_y = c'_y$, that is, identical second rows of $K$ and $K'$.

Epipolar rectification
We are looking for homographies $H$ and $H'$ to apply to the images such that
$$F = H'^T [e_1]_\times H$$
That is 9 equations for 16 variables; 7 degrees of freedom remain: the first rows of $K$ and $K'$, and the rotation angle $\alpha$ around the baseline.
Invariance through rotation around the baseline:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}^T [e_1]_\times \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} = [e_1]_\times$$
Several methods exist; they try to distort the image as little as possible, e.g., the rectification of Gluckman-Nayar (2001).

Epipolar rectification of Fusiello-Irsara (2008)
We look for $H$ and $H'$ as rotations, supposing $K = K'$ known: $H = K_n R K^{-1}$ and $H' = K'_n R' K^{-1}$, with $K_n$ and $K'_n$ of identical second row, $R$ and $R'$ rotation matrices parameterized by Euler angles, and
$$K = \begin{pmatrix} f & 0 & w/2 \\ 0 & f & h/2 \\ 0 & 0 & 1 \end{pmatrix}$$
Writing $R = R_x(\theta_x) R_y(\theta_y) R_z(\theta_z)$, we must have:
$$F = (K_n R K^{-1})^T [e_1]_\times (K'_n R' K^{-1}) = K^{-T} R_z^T R_y^T [e_1]_\times R' K^{-1}$$
We minimize the sum of squared distances of points to their epipolar lines over the 6 parameters $(\theta_y, \theta_z, \theta'_x, \theta'_y, \theta'_z, f)$.

Ruins: E_0 = 3.21 pixels, E_6 = 0.12 pixels.

Cake: E_0 = 17.9 pixels, E_13 = 0.65 pixels.

Cluny: E_0 = 4.87 pixels, E_14 = 0.26 pixels.

Carcassonne: E_0 = 15.6 pixels, E_4 = 0.24 pixels.

Books: E_0 = 3.22 pixels, E_14 = 0.27 pixels.

Contents: Triangulation, Epipolar rectification, Disparity map

Disparity map
$z = fB/d$: depth $z$ is inversely proportional to disparity $d$ (the apparent motion, in pixels).
Disparity map: at each pixel, its apparent motion between the left and right images.
We already know the disparity at feature points; this gives an idea of the min and max motion, which makes the search for matching points less ambiguous and faster.

Stereo Matching
Principle: invariance of something between corresponding pixels of the left and right images $(I_L, I_R)$. Examples: color, x-derivative, census...
Use a distance to capture this invariance, such as $AD(p, q) = \|I_L(p) - I_R(q)\|_1$.
(Figures: left image, ground truth, min-AD result.)
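The absolute-difference cost can be sketched as follows; the `Pixel` RGB type is a hypothetical stand-in, not from the course code:

```cpp
#include <array>
#include <cstdlib>

using Pixel = std::array<int, 3>;  // hypothetical RGB triple

// AD(p,q) = ||I_L(p) - I_R(q)||_1, i.e. the sum of per-channel
// absolute differences between the two pixels.
int absoluteDifference(const Pixel& left, const Pixel& right) {
    int cost = 0;
    for (int c = 0; c < 3; ++c)
        cost += std::abs(left[c] - right[c]);
    return cost;
}
```

A winner-take-all disparity then picks, for each left pixel, the right pixel along the same scanline minimizing this cost.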

Stereo Matching
Post-processing helps a lot! Example: left-right consistency check, followed by simple constant interpolation and a median filter weighted by the bilateral weights of the original image.
(Figures: min CG, left-right test, post-processed.)
Still, single-pixel estimation is not good enough: we need to promote some regularity of the result.
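A minimal 1D sketch of the left-right consistency test mentioned above; the sign convention (right disparity is the negative of the left one) and the 1-pixel tolerance are assumptions for illustration:

```cpp
#include <cstdlib>
#include <vector>

// Left-right consistency on a single scanline: pixel x of the left map
// is kept only if the right map, looked up at x + dL(x), points back to
// x within `tol` pixels (i.e. dR(x + dL(x)) is approximately -dL(x)).
// Rejected pixels get the sentinel value `invalid`.
std::vector<int> leftRightCheck(const std::vector<int>& dispL,
                                const std::vector<int>& dispR,
                                int invalid, int tol = 1) {
    std::vector<int> out(dispL.size(), invalid);
    for (int x = 0; x < static_cast<int>(dispL.size()); ++x) {
        int xr = x + dispL[x];  // matching position in the right image
        if (xr >= 0 && xr < static_cast<int>(dispR.size()) &&
            std::abs(dispL[x] + dispR[xr]) <= tol)
            out[x] = dispL[x];
    }
    return out;
}
```

The rejected (occluded or unreliable) pixels are then the ones filled by interpolation in the post-processing step.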

Global vs. local methods
Global method: explicit smoothness term
$$\arg\min_d \sum_p E_{data}(p, p + d(p); I_L, I_R) + \sum_{p \sim p'} E_{reg}(d(p), d(p'); p, p', I_L, I_R)$$
Examples: $E_{reg} = \|d(p) - d(p')\|^2$ (Horn-Schunck), $E_{reg} = \delta(d(p) \neq d(p'))$ (Potts), $E_{reg} = \exp(-(I_L(p) - I_L(p'))^2/\sigma^2)\,|d(p) - d(p')|$...
Problem: NP-hard for almost all regularity terms except $E_{reg} = \lambda_{pp'}|d(p) - d(p')|$ (Ishikawa 2003).
Alternatives: sub-optimal solutions for submodular regularity (graph cuts: Boykov, Kolmogorov, Zabih), loopy belief propagation (no guarantee at all), semi-global matching (Hirschmüller).

Global vs. local methods
Local method: take a patch $P$ around $p$ and aggregate the costs $E_{data}$ (Lucas-Kanade). No explicit regularity term.
Examples: $SAD(p, q) = \sum_{r \in P} |I_L(p+r) - I_R(q+r)|$, $SSD(p, q) = \sum_{r \in P} (I_L(p+r) - I_R(q+r))^2$, $SCG(p, q) = \sum_{r \in P} CG(p+r, q+r)$...
Can be interpreted as a cost-volume filtering over $[d_{min}, d_{max}]$.
Increasing the patch size $P$ promotes regularity: the proportion of common pixels between the patches of neighboring pixels $p$ and $p'$ is $1 - 1/n$ if $P$ is $n \times n$.
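The local SAD method can be sketched as a 1D winner-take-all search on grayscale scanlines; the disparity range and window radius below are illustrative parameters:

```cpp
#include <cstdlib>
#include <limits>
#include <vector>

// For each pixel x of the left scanline, pick the disparity d in
// [dMin, dMax] minimizing the SAD over a window of radius `rad`.
// Windows falling outside either scanline are skipped; border pixels
// keep disparity 0.
std::vector<int> sadMatch(const std::vector<int>& L,
                          const std::vector<int>& R,
                          int dMin, int dMax, int rad) {
    const int n = static_cast<int>(L.size());
    std::vector<int> disp(n, 0);
    for (int x = rad; x < n - rad; ++x) {
        int best = std::numeric_limits<int>::max();
        for (int d = dMin; d <= dMax; ++d) {
            if (x + d - rad < 0 || x + d + rad >= static_cast<int>(R.size()))
                continue;
            int sad = 0;  // sum of absolute differences over the window
            for (int r = -rad; r <= rad; ++r)
                sad += std::abs(L[x + r] - R[x + d + r]);
            if (sad < best) { best = sad; disp[x] = d; }
        }
    }
    return disp;
}
```

On a scanline that is an exact shifted copy of the other, the method recovers the shift wherever the window fits.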

Local search
At each pixel, we consider a context window $F$ and look for the motion of this window. Distance between windows:
$$d(q) = \arg\min_d \sum_{p \in F} (I(q + p) - I'(q + d\,e_1 + p))^2$$
Variants to be more robust to illumination changes:
1. Translate intensities by the mean over the window: $I(q + p) \leftarrow I(q + p) - \sum_{r \in F} I(q + r)/\#F$
2. Normalize by mean and variance over the window.

Distance between patches
Several distances or similarity measures are popular:
SAD, Sum of Absolute Differences: $d(q) = \arg\min_d \sum_{p \in F} |I(q + p) - I'(q + d\,e_1 + p)|$
SSD, Sum of Squared Differences: $d(q) = \arg\min_d \sum_{p \in F} (I(q + p) - I'(q + d\,e_1 + p))^2$
CSSD, Centered Sum of Squared Differences: $d(q) = \arg\min_d \sum_{p \in F} (I(q + p) - \bar{I}_F - I'(q + d\,e_1 + p) + \bar{I}'_F)^2$
NCC, Normalized Cross-Correlation: $d(q) = \arg\max_d \dfrac{\sum_{p \in F} (I(q + p) - \bar{I}_F)(I'(q + d\,e_1 + p) - \bar{I}'_F)}{\sqrt{\sum_{p \in F} (I(q + p) - \bar{I}_F)^2 \sum_{p \in F} (I'(q + d\,e_1 + p) - \bar{I}'_F)^2}}$
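NCC can be sketched directly from its formula; this version compares two already-extracted windows of equal length (grayscale, double precision):

```cpp
#include <cmath>
#include <vector>

// Normalized cross-correlation of two equal-size windows:
// sum((a - mean(a))*(b - mean(b))) / sqrt(sum((a-mean(a))^2) * sum((b-mean(b))^2)).
// Returns a value in [-1, 1]; 0 by convention if either window is constant.
double ncc(const std::vector<double>& a, const std::vector<double>& b) {
    const size_t n = a.size();
    double ma = 0, mb = 0;
    for (size_t i = 0; i < n; ++i) { ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;
    double num = 0, va = 0, vb = 0;
    for (size_t i = 0; i < n; ++i) {
        num += (a[i] - ma) * (b[i] - mb);
        va  += (a[i] - ma) * (a[i] - ma);
        vb  += (b[i] - mb) * (b[i] - mb);
    }
    if (va == 0 || vb == 0) return 0;  // constant window: correlation undefined
    return num / std::sqrt(va * vb);
}
```

Being invariant to affine intensity changes $I \mapsto aI + b$ (with $a > 0$), NCC is the most robust of the four to illumination differences between the views, which is why the practical session uses it.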

Another distance
The following distance is more and more popular in recent articles:
$$\epsilon(p, q) = (1 - \alpha)\,\min(\|I(p) - I'(q)\|_1, \tau_{col}) + \alpha\,\min\left(\left|\tfrac{\partial I}{\partial x}(p) - \tfrac{\partial I'}{\partial x}(q)\right|, \tau_{grad}\right)$$
with $\|I(p) - I'(q)\|_1 = |I_r(p) - I'_r(q)| + |I_g(p) - I'_g(q)| + |I_b(p) - I'_b(q)|$.
Usual parameters: $\alpha = 0.9$, $\tau_{col} = 30$ (not very sensitive if larger), $\tau_{grad} = 2$ (not very sensitive if larger).
Note that $\alpha = 0$ is similar to SAD.
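This truncated color + gradient cost can be sketched as below; as a simplifying assumption, a single intensity channel replaces the RGB color term, while the parameter defaults come from the slide:

```cpp
#include <algorithm>
#include <cmath>

// epsilon(p,q) = (1-alpha) * min(|I(p)-I'(q)|, tauCol)
//              + alpha     * min(|dI/dx(p)-dI'/dx(q)|, tauGrad)
// Single-channel version; the slide uses the L1 norm over the three
// RGB channels for the color term.
double mixedCost(double colL, double colR, double gradL, double gradR,
                 double alpha = 0.9, double tauCol = 30, double tauGrad = 2) {
    double c = std::min(std::fabs(colL - colR), tauCol);
    double g = std::min(std::fabs(gradL - gradR), tauGrad);
    return (1 - alpha) * c + alpha * g;
}
```

The truncations cap the influence of any single pixel, so occlusions and specularities cannot dominate the aggregated patch cost.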

Varying patch size: P = {(0, 0)}

Varying patch size: P = [-1, 1]²

Varying patch size: P = [-7, 7]²

Varying patch size: P = [-21, 21]²

Varying patch size: P = [-35, 35]²

Problems of local methods
Implicit hypothesis: all points of the window move with the same motion, i.e., they lie in a fronto-parallel plane.
Aperture problem: the context can be too small in certain regions, giving a lack of information.
Adherence problem: intensity discontinuities strongly influence the estimated disparity; when such a discontinuity coincides with a depth discontinuity, we have a tendency to dilate the front object.
(Figure: patch over a rooftop and the ground; O: aperture problem, A: adherence problem.)

Example: seeds expansion
We rely on the best found distances and put them in a priority queue (the seeds).
We pop the best seed G from the queue, compute for its neighbors the best disparity among d(G) - 1, d(G), and d(G) + 1, and push them into the queue.
(Figures: right image, left image, seeds, seeds expansion.)

Adaptive neighborhoods
To reduce adherence (aka the fattening effect), an image patch should lie on a single object, or even better at constant depth.
Heuristic inspired by the bilateral filter [Yoon&Kweon 2006]:
$$\omega_I(p, p') = \exp\left(-\frac{\|p - p'\|^2}{\gamma_{pos}}\right) \exp\left(-\frac{\|I(p) - I(p')\|_1}{\gamma_{col}}\right)$$
Selected disparity: $d(p) = \arg\min_{d = q - p} E(p, q)$ with
$$E(p, q) = \frac{\sum_{r \in F} \omega_{I}(p, p + r)\,\omega_{I'}(q, q + r)\,\epsilon(p + r, q + r)}{\sum_{r \in F} \omega_{I}(p, p + r)\,\omega_{I'}(q, q + r)}$$
We can take a large window $F$ (e.g., $35 \times 35$).
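The bilateral support weight $\omega_I$ can be sketched as follows; single-channel intensity stands in for the RGB L1 norm, and the $\gamma$ values are the free parameters of the heuristic:

```cpp
#include <cmath>

// omega_I(p,p') = exp(-||p-p'||^2 / gammaPos) * exp(-|I(p)-I(p')| / gammaCol)
// Spatial term from the pixel coordinates, range term from the intensity
// difference (the slide uses the L1 norm over RGB; single channel here).
double bilateralWeight(double px, double py, double qx, double qy,
                       double ip, double iq,
                       double gammaPos, double gammaCol) {
    double d2 = (px - qx) * (px - qx) + (py - qy) * (py - qy);
    return std::exp(-d2 / gammaPos) * std::exp(-std::fabs(ip - iq) / gammaCol);
}
```

Pixels that are close and similar in color to the patch center get weight near 1; pixels across an intensity edge get a weight near 0, so they barely contribute to the aggregated cost, which is what limits fattening at depth discontinuities.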

Bilateral weights (figure, panels (a)-(d)).

Results on Tsukuba, Venus, Teddy, and Cones (left image / ground truth / results).

What is the limit of adaptive neighborhoods?
The best patch is $P_p = \{r : d(p + r) = d(p)\}$.
Suppose we have an oracle giving $P_p$: use the ground-truth image to compute it.
Since the ground truth is subpixel, use $P_p(r) = \mathbf{1}(|d(p + r) - d(p)| \leq 1/2)$.

Test with oracle (figures: image, ground truth, oracle patches).

Conclusion
We can get back to the canonical situation by epipolar rectification. Limit: when the epipoles are in the image, standard methods are not adapted.
For disparity map computation, there are many choices:
1. Size and shape of the window?
2. Which distance?
3. Filtering of the disparity map to reject uncertain disparities?
You will see in the next session a global method for disparity computation.
Very active domain of research: >150 methods tested at http://vision.middlebury.edu/stereo/

Practical session: Disparity map computation by propagation of seeds
Objective: compute the disparity map associated to a pair of images. We start from high-confidence points (seeds), then expand by supposing that the disparity map is regular.
Get the initial program from the website. Compute the disparity map from image 1 to image 2 at all points by highest NCC score. Keep only the disparities where the NCC is sufficiently high (0.95) and put them as seeds in a std::priority_queue. While the queue is not empty:
1. Pop P, the top of the queue.
2. For each 4-neighbor Q of P having no valid disparity, set d_Q by highest NCC score among d_P - 1, d_P, and d_P + 1.
3. Push Q into the queue.
Hint: the program may be too slow in Debug mode for the full images. Use cropped images to find your bugs, then build in Release mode for the original images.
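The propagation loop above can be sketched in 1D with std::priority_queue, as in the assignment; to keep the example short and self-contained, the NCC between patches is replaced by a hypothetical caller-supplied score function:

```cpp
#include <queue>
#include <vector>

// A seed: matching score, pixel position, disparity.
struct Seed { double score; int x, d; };
struct ByScore {
    bool operator()(const Seed& a, const Seed& b) const {
        return a.score < b.score;  // max-heap: best score popped first
    }
};

// 1D seed propagation. Neighbors of a popped seed get the best disparity
// among d-1, d, d+1 according to score(x, d) (NCC in the real assignment),
// and are pushed back into the queue.
template <class ScoreFn>
std::vector<int> propagateSeeds(int n, std::vector<Seed> seeds,
                                ScoreFn score, int invalid = -9999) {
    std::vector<int> disp(n, invalid);
    std::priority_queue<Seed, std::vector<Seed>, ByScore> q(
        ByScore(), std::move(seeds));
    while (!q.empty()) {
        Seed s = q.top(); q.pop();
        if (disp[s.x] == invalid) disp[s.x] = s.d;
        for (int nx : {s.x - 1, s.x + 1}) {  // 2 neighbors in 1D (4 in 2D)
            if (nx < 0 || nx >= n || disp[nx] != invalid) continue;
            Seed best{score(nx, disp[s.x]), nx, disp[s.x]};
            for (int d : {disp[s.x] - 1, disp[s.x] + 1}) {
                double sc = score(nx, d);
                if (sc > best.score) { best.score = sc; best.d = d; }
            }
            disp[nx] = best.d;
            q.push(best);
        }
    }
    return disp;
}
```

Note one simplification versus the assignment text: the neighbor's disparity is fixed when it is pushed rather than when it is popped, which avoids re-scoring; either variant propagates outward from the most confident seeds first.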