CS 1699: Intro to Computer Vision Epipolar Geometry and Stereo Vision Prof. Adriana Kovashka University of Pittsburgh October 8, 2015
Today Review Projective transforms Image stitching (homography) Epipolar geometry Multiple views from different cameras Stereo vision Estimating depth from disparities Exam and homework info
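2D Linear Transformations $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ Only linear 2D transformations can be represented with a 2x2 matrix. Linear transformations are combinations of Scale, Rotation, Shear, and Mirror. Alyosha Efros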
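2D Affine Transformations Affine transformations are combinations of Linear transformations, and Translations. Parallel lines remain parallel. $\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$ Alyosha Efros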
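Projective Transformations Projective transformations: Affine transformations, and Projective warps. Parallel lines do not necessarily remain parallel. $\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$ Kristen Grauman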
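A minimal sketch of the difference between the affine and projective cases, assuming NumPy and two made-up 3x3 matrices (the numbers are illustrative only). It transforms two parallel segments in homogeneous coordinates and checks whether they are still parallel afterwards.

```python
import numpy as np

def transform(H, pts):
    """Apply a 3x3 transform to (n, 2) points via homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]          # divide by w' to get back to 2D

def are_parallel(p0, p1, q0, q1, tol=1e-9):
    """True if segment p0->p1 is parallel to segment q0->q1 (2D cross product ~ 0)."""
    u, v = p1 - p0, q1 - q0
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

# Hypothetical example matrices: same upper rows, different last row.
affine = np.array([[2.0, 0.5, 3.0],
                   [-0.5, 1.5, -1.0],
                   [0.0, 0.0, 1.0]])
projective = np.array([[2.0, 0.5, 3.0],
                       [-0.5, 1.5, -1.0],
                       [0.1, 0.05, 1.0]])

# Two horizontal (hence parallel) segments: (0,0)-(1,0) and (0,1)-(1,1).
segs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

a = transform(affine, segs)
p = transform(projective, segs)
affine_keeps_parallel = are_parallel(a[0], a[1], a[2], a[3])
projective_keeps_parallel = are_parallel(p[0], p[1], p[2], p[3])
```

With these particular matrices the affine map keeps the segments parallel and the projective map does not, because the nonzero g and h entries make the divisor w' vary from point to point.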
Fitting an affine transformation How many matches (correspondence pairs) do we need to solve for the transformation parameters? Once we have solved for the parameters, how do we compute $(x'_{new}, y'_{new})$ given $(x_{new}, y_{new})$? $\begin{bmatrix} x_i & y_i & 0 & 0 & 1 & 0 \\ 0 & 0 & x_i & y_i & 0 & 1 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} x'_i \\ y'_i \end{bmatrix}$ Modified from Kristen Grauman
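One way to sketch the fit, assuming NumPy: each correspondence contributes the two rows shown on the slide, so 6 unknowns need at least 3 non-collinear matches, and extra matches are handled by least squares. The function names `fit_affine` and `apply_affine` and the sample points are made up for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of x' = M x + t from (n, 2) point arrays, n >= 3."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src          # rows for x'_i: [x_i y_i 0 0 1 0]
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src          # rows for y'_i: [0 0 x_i y_i 0 1]
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)         # [x'_1, y'_1, x'_2, y'_2, ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[:4].reshape(2, 2)    # [[m1, m2], [m3, m4]]
    t = params[4:]                  # [t1, t2]
    return M, t

def apply_affine(M, t, pts):
    """Map (n, 2) points through the fitted transformation."""
    return np.asarray(pts, dtype=float) @ M.T + t

# Synthetic check: generate matches from a known transformation, then recover it.
M_true = np.array([[2.0, 0.5], [-0.5, 1.5]])
t_true = np.array([3.0, -1.0])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = apply_affine(M_true, t_true, src)

M_est, t_est = fit_affine(src, dst)
new_pt = apply_affine(M_est, t_est, [[4.0, 5.0]])   # (x'_new, y'_new) for (4, 5)
```

Once M and t are known, a new point is mapped with a single matrix multiply plus the translation, which is what `apply_affine` does.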
Projection matrix and camera parameters Silvio Savarese, Kristen Grauman $x = K [R \;\; t] X$ x: Image Coordinates: (u,v,1) K: Intrinsic Matrix (3x3) R: Rotation (3x3) t: Translation (3x1) X: World Coordinates: (X,Y,Z,1) Extrinsic params R, t Intrinsic params K: focal length, pixel sizes (mm), etc. We'll assume that these parameters are given and fixed.
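Projection matrix: Simplest case $x = K [I \;\; 0] X$, i.e. $w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$ Intrinsic Assumptions: Unit aspect ratio, Optical center at (0,0), No skew. Extrinsic Assumptions: No rotation, Camera at (0,0,0). Silvio Savarese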
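Projection matrix: General case $x = K [R \;\; t] X$, i.e. $w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$, where the 3x3 intrinsic matrix K now includes the focal length f, the skew s, and the optical center $(u_0, v_0)$. Derek Hoiem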
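A small sketch of the projection x = K [R t] X, assuming NumPy. The intrinsics (focal length 500 pixels, optical center (320, 240), zero skew, unit aspect ratio) and the identity pose are made-up values chosen so the result is easy to check by hand.

```python
import numpy as np

def project(K, R, t, X_world):
    """Project (n, 3) world points to (n, 2) pixel coordinates with x = K [R t] X."""
    X = np.asarray(X_world, dtype=float)
    P = K @ np.hstack([R, np.reshape(t, (3, 1))])     # 3x4 projection matrix
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])    # homogeneous (X, Y, Z, 1)
    x_h = X_h @ P.T                                   # rows are (w*u, w*v, w)
    return x_h[:, :2] / x_h[:, 2:3]                   # divide by w

# Hypothetical camera: zero skew, unit aspect ratio.
f, u0, v0 = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, u0],
              [0.0, f, v0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)            # no rotation
t = np.zeros(3)          # camera at the world origin

pts = np.array([[0.0, 0.0, 2.0],     # on the optical axis
                [1.0, -0.5, 4.0]])
uv = project(K, R, t, pts)
```

The point on the optical axis lands on the optical center, and the second point is offset by f*X/Z and f*Y/Z from it.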
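Camera calibration $x = K [R \;\; t] X$: $\begin{bmatrix} wu \\ wv \\ w \end{bmatrix} = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$ Derek Hoiem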
Mosaics... image from S. Seitz Obtain a wider angle view by combining multiple images. Kristen Grauman
Mosaics Two images with rotation/zoom but no translation Derek Hoiem Camera Center
How to stitch together a panorama (a.k.a. mosaic)? Basic Procedure: Take a sequence of images from the same position (rotate the camera about its optical center). Compute the homography (transformation) between second image and first. Transform the second image to overlap with the first. Blend the two together to create a mosaic. (If there are more images, repeat.) Modified from Steve Seitz
Computing the homography Steve Seitz mosaic plane The mosaic has a natural interpretation in 3D The images are reprojected onto a common plane The mosaic is formed on this plane Mosaic is a synthetic wide-angle camera
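Computing the homography A projective transform is a mapping between any two PPs (projection planes) with the same center of projection: a rectangle should map to an arbitrary quadrilateral; parallel lines aren't preserved, but straight lines must be. This mapping is called a Homography: $p' = Hp$, $\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$ [Figure: projection planes PP1 and PP2] Alyosha Efros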
Computing the homography Correspondences: $(x_1, y_1) \leftrightarrow (x'_1, y'_1)$, $(x_2, y_2) \leftrightarrow (x'_2, y'_2)$, ..., $(x_n, y_n) \leftrightarrow (x'_n, y'_n)$ To compute the homography given pairs of corresponding points in the images, we need to set up an equation where the parameters of H are the unknowns. Kristen Grauman
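Computing the homography $p' = Hp$: $\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$ Can set scale factor i=1. So, there are 8 unknowns. Set up a system of linear equations: Ah = b where vector of unknowns h = [a,b,c,d,e,f,g,h]^T Need at least 8 eqs (each correspondence gives 2, so at least 4 correspondences), but the more the better. Solve for h. If overconstrained, solve using least-squares: $\min \|Ah - b\|^2$ (in Matlab: >> help mldivide) Kristen Grauman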
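A minimal least-squares sketch of this setup, assuming NumPy in place of Matlab's backslash. With i = 1, multiplying out x' = (ax + by + c)/(gx + hy + 1) and y' = (dx + ey + f)/(gx + hy + 1) gives two linear equations per correspondence. The names `fit_homography` and `apply_homography` and the test matrix are illustrative, not from the course code.

```python
import numpy as np

def fit_homography(src, dst):
    """Solve Ah = b for h = [a..h] (i fixed to 1) from (n, 2) arrays, n >= 4."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    x, y = src[:, 0], src[:, 1]
    xp, yp = dst[:, 0], dst[:, 1]
    n = len(x)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    # a x + b y + c - g x x' - h y x' = x'
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x, y, 1.0
    A[0::2, 6], A[0::2, 7] = -x * xp, -y * xp
    b[0::2] = xp
    # d x + e y + f - g x y' - h y y' = y'
    A[1::2, 3], A[1::2, 4], A[1::2, 5] = x, y, 1.0
    A[1::2, 6], A[1::2, 7] = -x * yp, -y * yp
    b[1::2] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ah - b||^2
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """p' = Hp, then convert from homogeneous to image coordinates."""
    pts = np.asarray(pts, dtype=float)
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Synthetic check with a made-up homography and five correspondences.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0], [50.0, 30.0]])
dst = apply_homography(H_true, src)
H_est = fit_homography(src, dst)
```

Fixing i = 1 is the slide's choice; it assumes the true i is not zero. With noisy real matches the recovered H will only approximately reproduce the correspondences.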
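Transforming the second image To apply a given homography H: Compute $p' = Hp$ (regular matrix multiply): $\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$. Convert p' from homogeneous to image coordinates: $(x', y') = (wx'/w, \; wy'/w)$ [Figure: Image 1, Image 2, canvas] Modified from Kristen Grauman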
Transforming the second image Forward warping: Send each pixel f(x,y) to its corresponding location (x',y') = H(x,y) in the right image g(x',y') [Figure: Image 1, Image 2, canvas] Modified from Alyosha Efros
Transforming the second image Inverse warping: Get each pixel g(x',y') from its corresponding location (x,y) = H^{-1}(x',y') in the left image f(x,y) Q: what if pixel comes from between two pixels? A: Interpolate color value from neighbors [Figure: Image 1, Image 2, canvas] Modified from Alyosha Efros
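A sketch of inverse warping with bilinear interpolation for a grayscale image, assuming NumPy. The slide only says to interpolate from neighbors; bilinear interpolation is one common choice. The loop is written for clarity rather than speed, and the helper names and the half-pixel shift used in the demo are made up.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y); x indexes columns, y indexes rows."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0                      # fractional offsets in [0, 1)
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

def inverse_warp(src, H, out_shape):
    """g(x', y') = f(H^-1 (x', y')); output pixels that map outside src stay 0."""
    H_inv = np.linalg.inv(H)
    h, w = src.shape
    out = np.zeros(out_shape, dtype=float)
    for yp in range(out_shape[0]):
        for xp in range(out_shape[1]):
            p = H_inv @ np.array([xp, yp, 1.0])
            x, y = p[0] / p[2], p[1] / p[2]      # back to image coordinates
            if 0 <= x <= w - 1 and 0 <= y <= h - 1:
                out[yp, xp] = bilinear_sample(src, x, y)
    return out

# Demo: shift a 4x4 ramp image half a pixel to the right.
src = np.arange(16, dtype=float).reshape(4, 4)
H = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
warped = inverse_warp(src, H, (4, 5))
```

Because every output pixel pulls its value from the source, inverse warping leaves no holes, which is the usual reason to prefer it over forward warping.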
Derek Hoiem RANSAC for Homography
Last class vs this class Last class: same camera center, but camera rotates This class: Camera center is not the same (we have multiple cameras) Epipolar geometry Relates cameras from two positions Stereo depth estimation Recover depth from two images Adapted from Derek Hoiem
Why multiple views? Structure and depth are inherently ambiguous from single views. Multiple views help us to perceive 3d shape and depth. Kristen Grauman, images from Svetlana Lazebnik
Stereo photography and stereo viewers Take two pictures of the same subject from two slightly different viewpoints and display so that each eye sees only one of the images. Invented by Sir Charles Wheatstone, 1838 Image from fisher-price.com Kristen Grauman
Stereo photography and stereo viewers http://www.johnsonshawmuseum.org Kristen Grauman
Stereo vision Two cameras, simultaneous views Single moving camera and static scene Kristen Grauman
Depth from Stereo Goal: recover depth by finding image coordinate x' that corresponds to x [Figure: world point X at depth z projects to x and x' in two cameras with centers C and C', focal length f, and baseline B] Derek Hoiem
Depth from Stereo Goal: recover depth by finding image coordinate x' that corresponds to x Sub-Problems 1. Calibration: How do we recover the relation of the cameras (if not already known)? 2. Correspondence: How do we search for the matching point x'? [Figure: world point X with projections x and x'] Derek Hoiem
Geometry for a simple stereo system Assume parallel optical axes, known camera parameters (i.e., calibrated cameras). What is expression for Z? Similar triangles $(p_l, P, p_r)$ and $(O_l, P, O_r)$: $\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$, so $Z = f \frac{T}{x_r - x_l}$ (depth Z, disparity $x_r - x_l$). Depth is inversely proportional to disparity. Adapted from Kristen Grauman
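A short numeric sketch of Z = f T / disparity, assuming NumPy. The focal length (700 pixels) and baseline (0.1 m) are made-up values, and the disparity is taken as a positive magnitude under the slide's sign convention. Zero disparity is mapped to infinite depth.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length, baseline):
    """Z = f * T / d for each disparity d; d <= 0 is treated as infinitely far."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_length * baseline / d[valid]
    return depth

f = 700.0    # focal length in pixels (hypothetical)
T = 0.1      # baseline in meters (hypothetical)
d = np.array([70.0, 35.0, 7.0, 0.0])   # disparities in pixels
Z = depth_from_disparity(d, f, T)      # depths in meters
```

Halving the disparity doubles the depth, which is the inverse proportionality stated on the slide. It also means that a one-pixel disparity error matters much more for far points than for near ones.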
Depth from disparity We have two images taken from cameras with different intrinsic and extrinsic parameters. How do we match a point in the first image to a point in the second? image I(x,y), disparity map D(x,y), image I'(x',y') So if we could find the corresponding points in two images, we could estimate relative depth. Kristen Grauman
Stereo correspondence constraints Given p in left image, where can corresponding point p' be? Kristen Grauman
Stereo correspondence constraints Kristen Grauman
Epipolar constraint Geometry of two views constrains where the corresponding pixel for some image point in the first view must occur in the second view. It must be on the line carved out by a plane connecting the world point and optical centers. Potential matches for p have to lie on the corresponding line l'. Potential matches for p' have to lie on the corresponding line l. Kristen Grauman, Derek Hoiem
Epipolar geometry: notation Baseline: line connecting the two camera centers. Epipoles = intersections of baseline with image planes = projections of the other camera center. Epipolar Plane: plane containing baseline. Epipolar Lines: intersections of epipolar plane with image planes (always come in corresponding pairs). Note: All epipolar lines intersect at the epipole. [Figure: world point X with projections x and x'] Derek Hoiem
Epipolar constraint This is useful because it reduces the correspondence problem to a 1D search along an epipolar line. Kristen Grauman, image from Andrew Zisserman
Stereo geometry, with calibrated cameras If the stereo rig is calibrated, we know how to rotate and translate camera reference frame 1 to get to camera reference frame 2. Rotation: 3x3 matrix R; translation: 3x1 vector T. Kristen Grauman
Stereo geometry, with calibrated cameras If the stereo rig is calibrated, we know how to rotate and translate camera reference frame 1 to get to camera reference frame 2: $X'_c = R X_c + T$ Kristen Grauman
An aside: cross product Vector cross product takes two vectors and returns a third vector that's perpendicular to both inputs: $a \times b = c$. So here, c is perpendicular to both a and b, which means the dot product is 0: $a \cdot c = 0$, $b \cdot c = 0$. Kristen Grauman
From geometry to algebra $X' = RX + T$ $T \times X' = T \times RX + T \times T = T \times RX$ (normal to the plane) $X' \cdot (T \times X') = X' \cdot (T \times RX) = 0$ Kristen Grauman
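Another aside: Matrix form of cross product Can be expressed as a matrix multiplication: $a \times b = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = c$, with $[a_x] = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}$ Kristen Grauman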
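A quick numerical check of the matrix form, assuming NumPy. The helper name `skew` and the two test vectors are made up; the result is compared against NumPy's own `np.cross`.

```python
import numpy as np

def skew(a):
    """Return the 3x3 matrix [a_x] such that skew(a) @ b equals a x b."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3, a2],
                     [a3, 0.0, -a1],
                     [-a2, a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-4.0, 0.5, 2.0])

c_matrix = skew(a) @ b       # cross product as a matrix multiplication
c_cross = np.cross(a, b)     # reference result

# c is perpendicular to both inputs, so both dot products vanish.
dot_a = a @ c_matrix
dot_b = b @ c_matrix
```

The matrix is skew-symmetric and has rank 2 for any nonzero a, which is why the essential matrix built from it is singular.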
Essential matrix $X' \cdot (T \times RX) = 0$ $X' \cdot ([T_x] R X) = 0$ Let $E = [T_x] R$, then $X'^T E X = 0$. E is called the essential matrix, and it relates corresponding image points between both cameras, given the rotation and translation. If we observe a point in one image, its position in other image is constrained to lie on line defined by above. With this ordering, $E^T x'$ is the epipolar line through x in the first image, corresponding to x' (and $E x$ is the epipolar line through x' in the second image). Note: these points are in camera coordinate systems. Kristen Grauman
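A numerical sanity check of the essential-matrix constraint, assuming NumPy and a made-up rig (a 10 degree rotation about the y axis plus a small translation). It uses the ordering X'^T E X = 0 with E = [T_x] R, where X' = RX + T.

```python
import numpy as np

def skew(t):
    """Matrix [t_x] with skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, T):
    """E = [T_x] R for the convention X' = R X + T."""
    return skew(T) @ R

# Hypothetical calibrated rig.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([-0.5, 0.05, 0.02])
E = essential_matrix(R, T)

X = np.array([0.3, -0.2, 4.0])     # a 3D point in camera-1 coordinates
X_prime = R @ X + T                # the same point in camera-2 coordinates
residual = X_prime @ E @ X         # should be zero up to rounding

# The constraint also holds for the rays scaled to unit depth.
x = X / X[2]
x_prime = X_prime / X_prime[2]
line_in_second = E @ x             # epipolar line on which x' must lie
on_line = x_prime @ line_in_second
```

Because [T_x] has rank 2, E is singular, so its determinant is zero up to rounding.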
Essential matrix example: parallel cameras $R = I$, $T = [-d, 0, 0]^T$, $E = [T_x] R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & d \\ 0 & -d & 0 \end{bmatrix}$, $p = [x, y, f]^T$, $p' = [x', y', f]^T$, $p'^T E p = 0$ For the parallel cameras, image of any point must lie on same horizontal line in each image plane. Kristen Grauman
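The horizontal-line claim can be worked out directly. The derivation assumes the translation is $T = [-d, 0, 0]^T$ with $R = I$, so that $E = [T_x]$; if the opposite sign of d is used, every term flips sign and the conclusion is unchanged.

```latex
\begin{aligned}
p'^{T} E \, p
  &= \begin{bmatrix} x' & y' & f \end{bmatrix}
     \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & d \\ 0 & -d & 0 \end{bmatrix}
     \begin{bmatrix} x \\ y \\ f \end{bmatrix}
   = \begin{bmatrix} x' & y' & f \end{bmatrix}
     \begin{bmatrix} 0 \\ d f \\ -d y \end{bmatrix} \\
  &= d f \, y' - d f \, y = d f \,(y' - y) = 0
  \quad\Longrightarrow\quad y' = y .
\end{aligned}
```

Since x and x' drop out entirely, the constraint says nothing about the horizontal position: the match can be anywhere on the same image row, and where it falls on that row is the disparity.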
image I(x,y), disparity map D(x,y), image I'(x',y'): (x',y') = (x + D(x,y), y) What about when cameras' optical axes are not parallel? Kristen Grauman
Stereo image rectification Reproject image planes onto a common plane parallel to the line between camera centers Pixel motion is horizontal after this transformation Two homographies (3x3 transform), one for each input image reprojection C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999. Derek Hoiem
Mubarak Shah Image Rectification for Stereo
Alyosha Efros Stereo image rectification: example
What if we don't know the camera parameters? Want to estimate world geometry without requiring calibrated cameras: archival videos, photos from multiple unrelated users. Weak calibration: Estimate epipolar geometry from a (redundant) set of point correspondences between two uncalibrated cameras. Kristen Grauman
Computing F from correspondences Each point correspondence generates one constraint on F: $p_{im,right}^T \, F \, p_{im,left} = 0$ Collect n of these constraints. Solve for f, vector of parameters. Kristen Grauman
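A sketch of the linear solve, assuming NumPy and the ordering p_right^T F p_left = 0 from the slide. Each correspondence gives one row of a homogeneous system A f = 0, solved here with the SVD, followed by a rank-2 projection (a step not shown on the slide, added because F must be singular). The synthetic cameras, their intrinsics in normalized units, and the function names are made up.

```python
import numpy as np

def estimate_fundamental(pts_left, pts_right):
    """Linear estimate of F from n >= 8 correspondences given as (n, 2) arrays."""
    pl = np.asarray(pts_left, dtype=float)
    pr = np.asarray(pts_right, dtype=float)
    x, y = pl[:, 0], pl[:, 1]
    xp, yp = pr[:, 0], pr[:, 1]
    # One row per match; f holds the entries of F in row-major order.
    A = np.column_stack([xp * x, xp * y, xp,
                         yp * x, yp * y, yp,
                         x, y, np.ones(len(x))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)            # singular vector of the smallest singular value
    U, S, Vt2 = np.linalg.svd(F)
    S[2] = 0.0                          # enforce rank 2, i.e. det(F) = 0
    F = U @ np.diag(S) @ Vt2
    return F / np.linalg.norm(F)        # F is only defined up to scale

def project(K, X_cam):
    """Pinhole projection of (n, 3) camera-frame points to (n, 2) image points."""
    x = X_cam @ K.T
    return x[:, :2] / x[:, 2:3]

# Synthetic, noise-free scene (all values hypothetical).
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), rng.uniform(4, 8, n)])
angle = np.deg2rad(8.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([-0.5, 0.1, 0.2])
K = np.array([[1.5, 0.0, 0.1], [0.0, 1.5, -0.05], [0.0, 0.0, 1.0]])

pts_left = project(K, X)
pts_right = project(K, X @ R.T + T)
F = estimate_fundamental(pts_left, pts_right)

ones = np.ones((n, 1))
residuals = np.einsum('ij,jk,ik->i',
                      np.hstack([pts_right, ones]), F, np.hstack([pts_left, ones]))
```

With real pixel coordinates this linear system is poorly conditioned, so implementations usually normalize the points first; that step is omitted here to keep the sketch short.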
Fundamental matrix Relates pixel coordinates in the two views More general form than essential matrix: we remove need to know intrinsic parameters Kristen Grauman
Properties of the Fundamental matrix $x^T F x' = 0$ with $F = K^{-T} E K'^{-1}$ (on this slide the unprimed point multiplies F from the left). $F x'$ is the epipolar line associated with x' ($l = F x'$). $F^T x$ is the epipolar line associated with x ($l' = F^T x$). $F e' = 0$ and $F^T e = 0$. F is singular (rank two): det(F)=0. F has seven degrees of freedom: 9 entries but defined up to scale, det(F)=0. [Figure: world point X with projections x and x'] Derek Hoiem
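These properties can be checked numerically. The sketch assumes NumPy, the ordering x^T F x' = 0 used on this slide, the same made-up intrinsics K for both cameras, and a made-up pose with X' = RX + T. Epipoles are found as null vectors of F and F^T via the SVD.

```python
import numpy as np

def skew(t):
    """Matrix [t_x] with skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def null_vector(M):
    """Unit vector v minimizing ||M v|| (last right singular vector)."""
    return np.linalg.svd(M)[2][-1]

# Hypothetical calibrated rig.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(5.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([-0.3, 0.02, 0.05])

E = (skew(T) @ R).T              # transposed so that X^T E X' = 0
K_inv = np.linalg.inv(K)
F = K_inv.T @ E @ K_inv          # F = K^-T E K'^-1 with K' = K

# Project one 3D point into both images (homogeneous pixels, last entry 1).
X = np.array([0.4, -0.1, 3.0])
x = K @ X
x = x / x[2]
X2 = R @ X + T
xp = K @ X2
xp = xp / xp[2]

constraint = x @ F @ xp          # x^T F x'
l = F @ xp                       # epipolar line in image 1; x lies on it
l_prime = F.T @ x                # epipolar line in image 2; x' lies on it
e_prime = null_vector(F)         # F e' = 0
e = null_vector(F.T)             # F^T e = 0

# The epipole e should be the image of the second camera center, -R^T T.
c2 = K @ (-R.T @ T)
c2 = c2 / np.linalg.norm(c2)
```

Note that `e` and `e_prime` are homogeneous and only defined up to scale and sign, so they are compared with a cross product rather than entry by entry.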
Let's recap: Fundamental matrix song Derek Hoiem
Moving on to stereo Fuse a calibrated binocular stereo pair to produce a depth image image 1 image 2 Dense depth map Derek Hoiem
Basic stereo matching algorithm For each pixel x in the first image: Find corresponding epipolar scanline in the right image. (If necessary, rectify the two stereo images to transform epipolar lines into scanlines.) Search along epipolar line and pick the best match x'. Compute disparity x-x' and set depth(x) = f*t/(x-x'), where t is the baseline. Derek Hoiem
Correspondence search Slide a window along the right scanline and compare contents of that window with the reference window in the left image. Matching cost: SSD or normalized correlation. [Figure: left and right images with a scanline; plot of matching cost vs. disparity] Derek Hoiem
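A minimal sketch of the window search for a single pixel of a rectified pair, assuming NumPy and SSD as the matching cost. The synthetic images (random texture, right image shifted by 3 pixels), the window size, the search range, and the camera values f and t are made up for illustration.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized windows."""
    return float(np.sum((a - b) ** 2))

def best_disparity(left, right, row, col, half_win=2, max_disp=8):
    """Slide a window along the same row of `right`; return the best d = x - x'.

    Assumes (row, col) is at least half_win pixels away from the image border.
    """
    ref = left[row - half_win:row + half_win + 1, col - half_win:col + half_win + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d                          # candidate column x' in the right image
        if c - half_win < 0:
            break                            # window would leave the image
        cand = right[row - half_win:row + half_win + 1, c - half_win:c + half_win + 1]
        cost = ssd(ref, cand)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic rectified pair: every left pixel at column x appears at x - 3 on the right.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
true_disp = 3
right = np.roll(left, -true_disp, axis=1)

d = best_disparity(left, right, row=10, col=20)
f, t = 700.0, 0.1            # hypothetical focal length (pixels) and baseline (meters)
depth = f * t / d            # depth(x) = f*t/(x - x')
```

Running this for every pixel gives a dense disparity map. On real images, textureless regions and occlusions make the minimum-cost match unreliable, which is what the constraints on the later slides try to address.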
Geometry for a simple stereo system Assume parallel optical axes, known camera parameters (i.e., calibrated cameras). What is expression for Z? Similar triangles $(p_l, P, p_r)$ and $(O_l, P, O_r)$: $\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$, so $Z = f \frac{T}{x_r - x_l}$ (depth Z, disparity $x_r - x_l$). Kristen Grauman
Results with window search [Figure: data (left image, right image), window-based matching result, ground truth] Derek Hoiem
How can we improve? Uniqueness: For any point in one image, there should be at most one matching point in the other image. Ordering: Corresponding points should be in the same order in both views. Smoothness: We expect disparity values to change slowly (for the most part). Derek Hoiem
Many of these constraints can be encoded in an energy function and solved using graph cuts. [Figure: before, graph cuts, ground truth] Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001 For the latest and greatest: http://vision.middlebury.edu/stereo/ Derek Hoiem
Projective structure from motion Given: m images of n fixed 3D points: $x_{ij} = P_i X_j$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn corresponding 2D points $x_{ij}$ [Figure: point $X_j$ seen as $x_{1j}$, $x_{2j}$, $x_{3j}$ by cameras $P_1$, $P_2$, $P_3$] Svetlana Lazebnik
Photo synth Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," SIGGRAPH 2006 http://photosynth.net/
3D from multiple images Building Rome in a Day: Agarwal et al. 2009
Recap: Epipoles Point x in left image corresponds to epipolar line l' in right image. Epipolar line passes through the epipole (the intersection of the cameras' baseline with the image plane). [Figure: camera centers C and C'] Derek Hoiem
Recap: Fundamental Matrix Fundamental matrix maps from a point in one image to a line in the other. If x and x' correspond to the same 3D point X: $x^T F x' = 0$ Derek Hoiem
Recap: stereo with calibrated cameras Given image pair, R, T Detect some features Compute essential matrix E Match features using the epipolar and other constraints Triangulate for 3d structure and get depth Kristen Grauman
Summary Epipolar geometry Epipoles are intersection of baseline with image planes Matching point in second image is on a line passing through its epipole Epipolar constraint limits where points from one view will be imaged in the other, which makes search for correspondences quicker Fundamental matrix maps from a point in one image to a line (its epipolar line) in the other Can solve for F given corresponding points (e.g., interest points) Stereo depth estimation Find corresponding points along epipolar scanline Estimate disparity (depth is inverse to disparity) Modified from Kristen Grauman and Derek Hoiem
Next Thursday (10/15) Midterm exam in class Review on Tuesday Email me with topics you want me to review or with questions Format Mostly short-answer questions (from easier/shorter to longer/harder) Some exercises to show you can apply some of the clustering and matching algorithms we discussed
Homework 1 Grades
Homework 2 Due tonight (11:59pm) Review late policy: Beyond 3 free late days (3 total for the class), 1 minute late = 1 late day = 25% penalty. Notes on Part III a: The x/y/scores you output should correspond to the final set of keypoints, after non-max suppression. If you're getting a negative mean R, you can ignore the threshold and output the top n keypoints (e.g. top 1%). Matlab tips
Homework 3 Released Due October 29, 11:59pm Part I: Hough transform for circles Part II: Video Google system (data and starter code provided)