Stereo CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz


Why do we perceive depth?

What do humans use as depth cues?

Motion:
- Convergence. When watching an object close to us, our eyes point slightly inward. This difference in the direction of the eyes is called convergence. This depth cue is effective only at short distances (less than 10 meters).
- Binocular parallax. Because our eyes see the world from slightly different locations, the images they sense differ slightly. This difference in the sensed images is called binocular parallax. The human visual system is very sensitive to these differences, and binocular parallax is the most important depth cue for medium viewing distances. A sense of depth can be achieved from binocular parallax alone, even if all other depth cues are removed.
- Monocular movement parallax. If we close one eye, we can still perceive depth by moving our head. This works because the human visual system can extract depth information from two similar images sensed one after the other, in the same way it combines the two images from different eyes.

Focus:
- Accommodation. Accommodation is the tension of the muscle that changes the focal length of the lens of the eye, bringing objects at different distances into focus. This depth cue is quite weak; it is effective only at short viewing distances (less than 2 meters) and in combination with other cues.

Marko Teittinen http://www.hitl.washington.edu/scivw/eve/iii.a.1.c.depthcues.html

What do humans use as depth cues?

Image cues:
- Retinal image size. When the real size of an object is known, our brain compares the sensed size of the object to this real size, and thus acquires information about the object's distance.
- Linear perspective. When looking down a straight, level road we see the parallel sides of the road meet at the horizon. This effect, often visible in photos, is an important depth cue called linear perspective.
- Texture gradient. The closer we are to an object, the more detail we can see of its surface texture, so objects with smooth textures are usually interpreted as being farther away. This is especially true when the surface texture spans the whole distance from near to far.
- Overlapping. When objects block each other from our sight, we know that the blocking object is closer to us. The object whose outline pattern looks more continuous is felt to lie closer.
- Aerial haze. Mountains on the horizon always look slightly bluish or hazy. The reason is small water and dust particles in the air between the eye and the mountains; the farther the mountains, the hazier they look.
- Shades and shadows. When we know the location of a light source and see objects casting shadows on other objects, we learn that the object shadowing the other is closer to the light source. Since most illumination comes from above, we tend to resolve ambiguities using this assumption; three-dimensional-looking computer user interfaces are a nice example. Also, bright objects seem closer to the observer than dark ones.

Jonathan Chiu, Marko Teittinen http://www.hitl.washington.edu/scivw/eve/iii.a.1.c.depthcues.html


The amount of horizontal movement is inversely proportional to the distance from the camera.

Cameras. Thin lens equation: $\frac{1}{z_o} + \frac{1}{z_i} = \frac{1}{f}$, where $z_o$ is the object distance, $z_i$ the image distance, and $f$ the focal length. Any object point satisfying this equation is in focus.
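As a quick numeric illustration of the thin lens equation, the sketch below solves for the in-focus image distance; the function name and values are ours, not from the slides.

```python
# Illustrative sketch of the thin lens equation 1/z_o + 1/z_i = 1/f:
# given focal length f and object distance z_o, solve for the image
# distance z_i at which that object is in focus.

def image_distance(f_mm, z_o_mm):
    """Return z_i from 1/z_o + 1/z_i = 1/f (all lengths in mm)."""
    return 1.0 / (1.0 / f_mm - 1.0 / z_o_mm)

print(image_distance(50.0, 2000.0))  # a 50 mm lens focused at 2 m: ~51.3 mm
```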

Depth from Stereo. Goal: recover depth by finding the image coordinate x' that corresponds to x. [Figure: scene point X at depth z; camera centers C and C' separated by baseline B; focal length f; image points x and x'.]

Depth from disparity. [Figure: point X at depth z; camera centers O and O' separated by baseline B; focal length f; image points x and x'.] Similar triangles give

$\text{disparity} = x - x' = \frac{B \cdot f}{z}$

Disparity is inversely proportional to depth.
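A minimal numpy sketch of this relationship, with made-up baseline and focal length values:

```python
import numpy as np

# Sketch of disparity -> depth with made-up rectified-camera values.
B = 0.1                                    # baseline in meters
f = 700.0                                  # focal length in pixels

disparity = np.array([35.0, 70.0, 140.0])  # x - x' in pixels
z = B * f / disparity                      # depth z = B f / disparity
print(z)                                   # [2. 1. 0.5] meters
```

Doubling the disparity halves the recovered depth, which is exactly the inverse proportionality stated above.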

Depth from Stereo. Goal: recover depth by finding the image coordinate x' that corresponds to x. Sub-problems: 1. Calibration: How do we recover the relation of the cameras (if not already known)? 2. Correspondence: How do we search for the matching point x'?

Correspondence Problem. We have two images taken from cameras with different intrinsic and extrinsic parameters. How do we match a point in the first image to a point in the second? How can we constrain our search?

Key idea: Epipolar constraint. Potential matches for x have to lie on the corresponding line l'. Potential matches for x' have to lie on the corresponding line l.

Epipolar geometry: notation
- Baseline: line connecting the two camera centers.
- Epipoles: intersections of the baseline with the image planes; equivalently, projections of the other camera center.
- Epipolar plane: plane containing the baseline (a 1D family).
- Epipolar lines: intersections of an epipolar plane with the image planes (they always come in corresponding pairs).

Example: Converging cameras

Example: Motion parallel to image plane

Example: Forward motion. What would the epipolar lines look like if the camera moves directly forward?


Example: Motion perpendicular to image plane. Points move along lines radiating from the epipole: the focus of expansion. The epipole is the principal point.

Example: Forward motion. The epipole has the same coordinates in both images. Points move along lines radiating from it: the focus of expansion.

Epipolar constraint. If we observe a point x in one image, where can the corresponding point x' be in the other image?

Epipolar constraint. Potential matches for x have to lie on the corresponding epipolar line l'. Potential matches for x' have to lie on the corresponding epipolar line l.

Epipolar constraint example

Camera parameters. How many numbers do we need to describe a camera? We need to describe its pose in the world, and we need to describe its internal parameters.

A Tale of Two Coordinate Systems. Two important coordinate systems: 1. the world coordinate system, 2. the camera coordinate system. [Figure: camera frame (u, v, w) with center of projection (COP); world frame (x, y, z) with origin o.]

Camera parameters. To project a point (x,y,z) in world coordinates into a camera:
- First transform (x,y,z) into camera coordinates. We need to know the camera position (in world coordinates) and the camera orientation (in world coordinates).
- Then project into the image plane. We need to know the camera intrinsics.
These can all be described with matrices.

Camera parameters. A camera is described by several parameters:
- Translation T of the optical center from the origin of world coordinates
- Rotation R of the image plane
- Focal length f, principal point (x_c, y_c), pixel size (s_x, s_y)
T and R are called extrinsics; f, (x_c, y_c), and (s_x, s_y) are intrinsics.

Projection equation: the projection matrix models the cumulative effect of all of these parameters. It is useful to decompose it into a series of operations:

$\Pi = \underbrace{K}_{\text{intrinsics}} \; \underbrace{[\,I \mid 0\,]}_{\text{projection}} \; \underbrace{\begin{bmatrix} R & 0 \\ 0^T & 1 \end{bmatrix}}_{\text{rotation}} \; \underbrace{\begin{bmatrix} I & -c \\ 0^T & 1 \end{bmatrix}}_{\text{translation}}$

The definitions of these parameters are not completely standardized; the intrinsics especially vary from one book to another.

Extrinsics. How do we get the camera to canonical form? (Center of projection at the origin, x-axis points right, y-axis points up, z-axis points backwards.) Step 1: Translate by -c, where c is the camera center. How do we represent translation as a matrix multiplication? Use homogeneous coordinates: $T = \begin{bmatrix} I & -c \\ 0^T & 1 \end{bmatrix}$

Extrinsics, continued. Step 2: Rotate by R, a 3×3 rotation matrix that aligns the world axes with the camera axes. The combined extrinsic transform is $\begin{bmatrix} R & 0 \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} I & -c \\ 0^T & 1 \end{bmatrix}$
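A small numpy sketch of these two steps in homogeneous coordinates; the camera center, rotation, and test point are illustrative values of our choosing:

```python
import numpy as np

# Sketch of the two extrinsic steps in homogeneous coordinates.
c = np.array([1.0, 2.0, 3.0])             # camera center in world coords

T = np.eye(4)                             # Step 1: translate by -c
T[:3, 3] = -c

R = np.eye(4)                             # Step 2: rotate by R
R[:3, :3] = np.eye(3)                     # 3x3 rotation (identity here)

X_world = np.array([1.0, 2.0, 7.0, 1.0])  # homogeneous world point
X_cam = R @ T @ X_world                   # point in camera coordinates
print(X_cam)                              # [0. 0. 4. 1.]
```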

Perspective projection (intrinsics): converts from 3D rays in the camera coordinate system to pixel coordinates. In general, K is an upper triangular matrix:

$K = \begin{bmatrix} f & s & c_x \\ 0 & \alpha f & c_y \\ 0 & 0 & 1 \end{bmatrix}$

where $\alpha$ is the aspect ratio (1 unless pixels are not square), $s$ is the skew (0 unless pixels are shaped like rhombi/parallelograms), and $(c_x, c_y)$ is the principal point ((0,0) unless the optical axis doesn't intersect the projection plane at the origin).
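A minimal sketch of building such a K; the function and parameter names (aspect, skew, cx, cy) are ours, with defaults matching the slide's "unless" cases:

```python
import numpy as np

def intrinsics(f, aspect=1.0, skew=0.0, cx=0.0, cy=0.0):
    """Upper triangular calibration matrix K as described above."""
    return np.array([[f,   skew,       cx],
                     [0.0, aspect * f, cy],
                     [0.0, 0.0,        1.0]])

K = intrinsics(f=700.0, cx=320.0, cy=240.0)  # square pixels, no skew
```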

Projection matrix: $\Pi = \underbrace{K}_{\text{intrinsics}} \; \underbrace{[\,I \mid 0\,]}_{\text{projection}} \; \underbrace{\begin{bmatrix} R & 0 \\ 0^T & 1 \end{bmatrix}}_{\text{rotation}} \; \underbrace{\begin{bmatrix} I & -c \\ 0^T & 1 \end{bmatrix}}_{\text{translation}}$

Projection matrix: $x \cong \Pi X$ (in homogeneous image coordinates).
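Putting the pieces together, a sketch that assembles $\Pi = K [I \mid 0] R T$ and projects a world point to pixels. All values are illustrative, and for simplicity the point is placed at positive depth rather than along the backward-pointing z-axis of the canonical form:

```python
import numpy as np

# Assemble Pi = K [I|0] R T and project a world point to pixels.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])                 # intrinsics

I0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # 3x4 projection [I|0]

T = np.eye(4); T[:3, 3] = [-1.0, -2.0, -3.0]    # translate by -c
R = np.eye(4)                                   # rotation (identity here)

Pi = K @ I0 @ R @ T                             # 3x4 projection matrix

X = np.array([1.0, 2.0, 7.0, 1.0])              # homogeneous world point
x = Pi @ X
x = x / x[2]                                    # divide out the scale
print(x[:2])                                    # [320. 240.]
```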

Epipolar constraint example

Epipolar constraint: Calibrated case.
- Assume that the intrinsic and extrinsic parameters of the cameras are known.
- We can multiply the projection matrix of each camera (and the image points) by the inverse of the calibration matrix to get normalized image coordinates.
- We can also set the global coordinate system to the coordinate system of the first camera. Then the projection matrices of the two cameras can be written as [I | 0] and [R | t].

Epipolar constraint: Calibrated case. With $X = (x, 1)^T$ and $x' = Rx + t$, the vectors $Rx$, $t$, and $x'$ are coplanar.

Epipolar constraint: Calibrated case. Coplanarity of $Rx$, $t$, and $x'$ gives

$x' \cdot [t \times (Rx)] = 0 \;\Longleftrightarrow\; x'^T E x = 0, \text{ with } E = [t_\times] R$

Essential matrix (Longuet-Higgins, 1981).

Epipolar constraint: Calibrated case. Properties of the essential matrix $E$:
- $E x$ is the epipolar line associated with $x$ ($l' = E x$).
- $E^T x'$ is the epipolar line associated with $x'$ ($l = E^T x'$).
- $E e = 0$ and $E^T e' = 0$.
- $E$ is singular (rank two).
- $E$ has five degrees of freedom.
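A small numpy sketch of the calibrated case: build $E = [t_\times] R$ for an illustrative camera pair and verify $x'^T E x = 0$ on a synthetic correspondence (all values below are ours):

```python
import numpy as np

def cross_matrix(t):
    """Skew-symmetric [t]_x, so cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative camera pair: [I 0] and [R t] with no rotation and a
# translation along x (a rectified-style setup).
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
E = cross_matrix(t) @ R                # E = [t]_x R

X = np.array([0.5, 0.2, 4.0])          # a 3D point, X = (x, 1)^T form
x = X / X[2]                           # normalized coords in camera 1
x2 = R @ X + t; x2 = x2 / x2[2]        # normalized coords in camera 2

print(x2 @ E @ x)                      # ~0: x'^T E x = 0
print(np.linalg.matrix_rank(E))        # 2: E is singular (rank two)
```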

Epipolar constraint: Uncalibrated case. The calibration matrices K and K' of the two cameras are unknown. We can write the epipolar constraint in terms of unknown normalized coordinates:

$\hat{x}'^T E \hat{x} = 0, \text{ where } \hat{x} = K^{-1} x, \; \hat{x}' = K'^{-1} x'$

Epipolar constraint: Uncalibrated case.

$\hat{x}'^T E \hat{x} = 0 \;\Longleftrightarrow\; x'^T F x = 0, \text{ with } F = K'^{-T} E K^{-1}$

Fundamental matrix (Faugeras and Luong, 1992).

Epipolar constraint: Uncalibrated case. Properties of the fundamental matrix $F = K'^{-T} E K^{-1}$:
- $F x$ is the epipolar line associated with $x$ ($l' = F x$).
- $F^T x'$ is the epipolar line associated with $x'$ ($l = F^T x'$).
- $F e = 0$ and $F^T e' = 0$.
- $F$ is singular (rank two).
- $F$ has seven degrees of freedom.
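A companion sketch for the uncalibrated form: build $F = K'^{-T} E K^{-1}$, assuming (for this check only) that K and E are known from the calibrated sketch above, and verify the constraint in pixel coordinates:

```python
import numpy as np

# Sketch: F = K'^{-T} E K^{-1}, checked in pixel coordinates. E and
# the correspondence repeat the calibrated sketch; both cameras are
# assumed to share the same (made-up) K.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

t = np.array([1.0, 0.0, 0.0])                  # translation, R = I
E = np.array([[0.0, -t[2], t[1]],
              [t[2], 0.0, -t[0]],
              [-t[1], t[0], 0.0]])             # E = [t]_x R

X = np.array([0.5, 0.2, 4.0])                  # 3D point
x, x2 = X / X[2], (X + t) / (X + t)[2]         # normalized coordinates
u, u2 = K @ x, K @ x2                          # pixel coordinates

F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)  # F = K'^{-T} E K^{-1}
l2 = F @ u                                     # epipolar line l' = F x
print(u2 @ l2)                                 # ~0: u' lies on l'
```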

The eight-point algorithm. With $x = (u, v, 1)^T$ and $x' = (u', v', 1)^T$, the epipolar constraint $x'^T F x = 0$ expands to

$[u' \; v' \; 1] \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0$

which is linear in the entries of F:

$[u'u \;\; u'v \;\; u' \;\; v'u \;\; v'v \;\; v' \;\; u \;\; v \;\; 1] \, f = 0, \quad f = (f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33})^T$

Stacking one such row per correspondence gives a system $A f = 0$. Minimize $\sum_{i=1}^N (x_i'^T F x_i)^2$ under the constraint $\|F\|^2 = 1$: the solution is the eigenvector of $A^T A$ associated with its smallest eigenvalue.
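A minimal numpy implementation of the linear step described above (without the Hartley-style coordinate normalization, which these slides do not cover); the function name is ours, and the final rank-2 enforcement via SVD is standard practice rather than taken verbatim from the slides:

```python
import numpy as np

def eight_point(x1, x2):
    """Linear eight-point algorithm as sketched above.

    x1, x2: (N, 2) arrays of matching (u, v) and (u', v') points,
    N >= 8. Returns a rank-2 fundamental matrix F with x'^T F x ~ 0.
    """
    u, v = x1[:, 0], x1[:, 1]
    up, vp = x2[:, 0], x2[:, 1]
    ones = np.ones(len(x1))
    # One row [u'u, u'v, u', v'u, v'v, v', u, v, 1] per correspondence.
    A = np.column_stack([up * u, up * v, up,
                         vp * u, vp * v, vp,
                         u, v, ones])
    # Minimizing ||A f|| subject to ||f|| = 1: take the eigenvector of
    # A^T A for the smallest eigenvalue, i.e. the last right singular
    # vector of A.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-two constraint: zero the smallest singular
    # value of F and recompose.
    U, s, Vt = np.linalg.svd(F)
    s[-1] = 0.0
    return U @ np.diag(s) @ Vt
```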