Introduction to Computer Vision Week 10, Winter 2010 Instructor: Prof. Ko Nishino
Today How do we recover geometry from two views? Stereo Can we recover geometry from a sequence of images? Structure-from-Motion
Stereo
Recovering 3D from Images How can we automatically compute 3D geometry from images? What cues in the image provide 3D information?
Visual Cues Shading Merle Norman Cosmetics, Los Angeles
Visual Cues Shading Texture The Visual Cliff, by William Vandivert, 1960
Visual Cues Shading Texture Focus From The Art of Photography, Canon
Visual Cues Shading Texture Focus Motion
Visual Cues Shading Texture Focus Motion Others: Highlights Shadows Silhouettes Inter-reflections Symmetry Light Polarization... Shape From X, X = shading, texture, focus, motion, ... In this class we'll focus on the motion cue
Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923
Teesta suspension bridge-darjeeling, India
"Mark Twain at Pool Table", no date, UCR Museum of Photography
Woman getting eye exam during immigration procedure at Ellis Island, c. 1905-1920, UCR Museum of Photography
3-D Images Ltd.
Zuihoin, Kyoto by ArtServe@ANU.EDU.AU
Nike of Samothrace, Louvre by ArtServe@ANU.EDU.AU
By Shree Nayar
Anaglyphs Art and architecture around the world by ArtServe ANU.EDU.AU http://rubens.anu.edu.au/new/stereo.trials/ Pathfinder @ JPL.NASA http://mars.jpl.nasa.gov/mpf/mpf/anaglyph-arc.html Create your own! http://stereo3d.adpeach.com/ http://wxs.ca/3d/howto.html
Disparity and Depth
Scene point P = (X, Y, Z); left image point P_L = (x_L, y_L); right image point P_R = (x_R, y_R); baseline b between the two cameras. Assume that we know P_L corresponds to P_R.
From perspective projection (with the coordinate system centered midway between the two cameras):
x_L = f (X + b/2) / Z
x_R = f (X - b/2) / Z
y_L = y_R = f Y / Z
Disparity and Depth
Solving the projection equations for the scene point:
X = b (x_L + x_R) / (2 (x_L - x_R))
Y = b (y_L + y_R) / (2 (x_L - x_R))
Z = b f / (x_L - x_R)
d = x_L - x_R is the disparity between corresponding left and right image points:
it is inversely proportional to depth Z, and it increases with baseline b
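The depth equation Z = b f / (x_L - x_R) can be written out directly. A minimal sketch (function and variable names are illustrative, not from any library):

```python
def depth_from_disparity(xl, xr, b, f):
    """Z = b*f / (xl - xr); the disparity d = xl - xr is inversely
    proportional to depth and grows with the baseline b."""
    d = xl - xr
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return b * f / d

# A point at Z = 10 with b = 0.5, f = 500 has disparity d = b*f/Z = 25.
```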
Vergence uncertainty of scene point, field of view of stereo, one pixel Optical axes of the two cameras need not be parallel Field of view decreases as baseline and vergence increase (the right image is a bit deceptive) Accuracy increases with baseline and vergence
Stereo
Stereo Basic Principle: Triangulation Gives reconstruction as intersection of two rays Requires calibration and point correspondence
Stereo Correspondence Determine Pixel Correspondence Pairs of points that correspond to same scene point epipolar plane Epipolar Constraint Reduces correspondence problem to 1D search along conjugate epipolar lines Java demo: http://www.ai.sri.com/~luong/research/meta3dviewer/epipolargeo.html
Fundamental Matrix
Let p be a point in the left image, p' in the right image
Epipolar relation: p maps to epipolar line l', p' maps to epipolar line l
Epipolar mapping described by a 3x3 matrix F
It follows that p'^T F p = 0
Fundamental Matrix This matrix F is called the Essential Matrix when the image intrinsic parameters are known the Fundamental Matrix more generally (uncalibrated case) Can solve for F from point correspondences Each (p, p') pair gives one linear equation in the entries of F 8 points are enough to solve for F (8-point algorithm)
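The 8-point algorithm can be sketched as follows: each correspondence gives one row of a linear system in the 9 entries of F, solved by SVD, followed by enforcing the rank-2 constraint. This is an unnormalized sketch; practical implementations should normalize the coordinates first (Hartley's normalized 8-point algorithm).

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate F from >= 8 correspondences using p2^T F p1 = 0."""
    A = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        # one linear equation in the 9 entries of F (row-major)
        A.append([x2*x1, x2*y1, x2, y2*x1, y2*y1, y2, x1, y1, 1.0])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    F = Vt[-1].reshape(3, 3)        # null vector of A, reshaped to 3x3
    U, S, Vt = np.linalg.svd(F)     # enforce the rank-2 constraint
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

With exact correspondences the recovered F (up to scale) satisfies p'^T F p = 0 for every pair.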
So far Compute F For each point Compute its epipolar line using F Search along the epipolar line But slanted epipolar lines are hard to search along!
Stereo Image Rectification
Stereo Image Rectification reproject image planes onto a common plane parallel to the line between optical centers pixel motion is horizontal after this transformation two homographies (3x3 transform), one for each input image reprojection C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.
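The two rectifying homographies themselves come from a method such as Loop & Zhang; once they are given, warping a set of pixel coordinates is just a homogeneous multiply and divide. A minimal sketch (function name illustrative):

```python
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 homography H to an (n x 2) array of 2D points:
    lift to homogeneous coordinates, multiply, divide by w."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])  # n x 3 homogeneous
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]
```

After both images are warped by their homographies, corresponding points share the same row, so the correspondence search becomes horizontal.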
Stereo Matching Algorithms Match Pixels in Conjugate Epipolar Lines Assume brightness constancy This is a tough problem Numerous approaches dynamic programming [Baker 81,Ohta 85] smoothness functionals more images (trinocular, N-ocular) [Okutomi 93] graph cuts [Boykov 00] A good survey and evaluation: http://www.middlebury.edu/stereo/
Basic Stereo Algorithm For each epipolar line For each pixel in the left image compare with every pixel on the same epipolar line in the right image pick the pixel with minimum match cost Improvement: match windows This should look familiar... Correlation, Sum of Squared Differences (SSD), etc.
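The per-pixel loop above, with SSD window matching, can be sketched as follows (assuming rectified grayscale images as NumPy arrays; a brute-force illustration, not an optimized implementation):

```python
import numpy as np

def ssd_disparity(left, right, max_disp, w=2):
    """For each left pixel, scan the same row of the right image and
    pick the disparity minimizing SSD over a (2w+1)x(2w+1) window."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    for y in range(w, H - w):
        for x in range(w, W - w):
            patch = left[y-w:y+w+1, x-w:x+w+1]
            best, best_d = None, 0
            for d in range(min(max_disp, x - w) + 1):
                cand = right[y-w:y+w+1, x-d-w:x-d+w+1]
                cost = np.sum((patch - cand) ** 2)   # SSD match cost
                if best is None or cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the left image is the right image shifted by a constant amount, the interior of the disparity map recovers that shift.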
Window size Effect of window size Smaller window good? bad? Larger window good? bad? W = 3 W = 20 Better results with adaptive window T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment, Proc. International Conference on Robotics and Automation, 1991. D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. International Journal of Computer Vision, 28(2):155-174, July 1998
Stereo Results Data from University of Tsukuba Similar results on other images without ground truth Scene Ground truth
Results with Window Search Window-based matching (best window size) Ground truth
Stereo as Energy Minimization
Matching cost formulated as an energy
data term penalizing bad matches:
D(x, y, d) = |I(x, y) - J(x + d, y)|
neighborhood term encouraging spatial smoothness:
V(d1, d2) = cost of adjacent pixels with labels d1 and d2, e.g. |d1 - d2| (or something similar)
E = sum over (x, y) of D(x, y, d_xy) + sum over neighbors (x1, y1), (x2, y2) of V(d_x1y1, d_x2y2)
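The energy can be evaluated directly for any candidate labeling; minimizing it is what dynamic programming or graph cuts are for. A small sketch with the |I - J| data term and an |d1 - d2| smoothness term over 4-neighbors (function name illustrative):

```python
import numpy as np

def stereo_energy(I, J, disp, lam=1.0):
    """E = sum of data terms |I(x,y) - J(x+d,y)| plus lam*|d1 - d2|
    for each pair of 4-neighbors with labels d1, d2."""
    H, W = I.shape
    E = 0.0
    for y in range(H):
        for x in range(W):
            d = int(disp[y, x])
            if 0 <= x + d < W:                  # data term D(x, y, d)
                E += abs(I[y, x] - J[y, x + d])
            if x + 1 < W:                       # horizontal neighbor pair
                E += lam * abs(d - int(disp[y, x + 1]))
            if y + 1 < H:                       # vertical neighbor pair
                E += lam * abs(d - int(disp[y + 1, x]))
    return E
```

A constant disparity map costs nothing in the smoothness term; every label discontinuity adds lam per neighbor pair, which is exactly the trade-off the minimization balances.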
Stereo as a Graph Problem [Boykov, 1999]
Labels (disparities) d1, d2, d3; pixels
edge weight D(x, y, d3) between a pixel and a label node
edge weight V(d1, d2) between neighboring pixels
Graph Definition d3 d2 d1 Initial state Each pixel connected to its immediate neighbors Each disparity label connected to all of the pixels
Stereo Matching by Graph Cuts d 3 d 2 d 1 Graph Cut Delete enough edges so that each pixel is (transitively) connected to exactly one label node Cost of a cut: sum of deleted edge weights Finding min cost cut equivalent to finding global minimum of the energy function
Computing a Multiway Cut With two labels: classical min-cut problem Solvable by standard network flow algorithms polynomial time in theory, nearly linear in practice More than 2 labels: NP-hard [Dahlhaus et al., STOC 92] But efficient approximation algorithms exist Within a factor of 2 of optimal Computes a local minimum in a strong sense even very large moves will not improve the energy Yuri Boykov, Olga Veksler and Ramin Zabih, Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999. Basic idea: reduce to a series of 2-way-cut sub-problems, using one of: swap move: pixels with label l1 can change to l2, and vice-versa expansion move: any pixel can change its label to l1
Using Graph-Cuts Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999. Ground truth
Stereo Example left image right image depth map H. Tao et al. Global matching criterion and color segmentation based stereo
Stereo Example H. Tao et al. Global matching criterion and color segmentation based stereo
Stereo Example H. Tao et al. Global matching criterion and color segmentation based stereo
Stereo Reconstruction Pipeline Steps Calibrate cameras Rectify images Compute disparity Estimate depth What will cause errors? Camera calibration errors Poor image resolution Occlusions Violations of brightness constancy (specular reflections) Large motions Low-contrast image regions
Active Stereo with Structured Light Li Zhang's one-shot stereo: camera 1, projector, camera 2 Project structured light patterns onto the object simplifies the correspondence problem
Active Stereo with Structured Light
Structured Light Scanning Gray Code By Gabriel Taubin
Laser Scanning Object Laser sheet Direction of travel CCD image plane Laser Cylindrical lens CCD Digital Michelangelo Project http://graphics.stanford.edu/projects/mich/ Optical triangulation Project a single stripe of laser light Scan it across the surface of the object This is a very precise version of structured light scanning
Laser Scanned Models The Digital Michelangelo Project, Levoy et al.
Laser Scanned Models The Digital Michelangelo Project, Levoy et al.
Laser Scanned Models The Digital Michelangelo Project, Levoy et al.
Laser Scanned Models The Digital Michelangelo Project, Levoy et al.
Laser Scanned Models The Digital Michelangelo Project, Levoy et al.
Visual Cues Shading Texture Focus Motion
Structure from Motion Many of the slides courtesy of Prof. O. Camps
Structure from Motion Use small disparities to track features Integrate long sequences over time Find the structure (shape) and motion
SfM and Stereo Stereo: two or more views taken simultaneously SfM: two or more frames taken over time by a moving camera
Assumptions Orthographic projection We will recover structure up to a scale factor n not-all-coplanar points P_1, P_2, ..., P_n have been tracked in F frames, with F >= 3
World to Camera Transform
P^C = R (P^W - C)
In homogeneous coordinates:
[P^C_x; P^C_y; P^C_z; 1] = [r11 r12 r13 0; r21 r22 r23 0; r31 r32 r33 0; 0 0 0 1] [1 0 0 -c_x; 0 1 0 -c_y; 0 0 1 -c_z; 0 0 0 1] [P^W_x; P^W_y; P^W_z; 1]
i.e. P^C = M_ext P^W
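A minimal sketch of this transform for a batch of points (function name illustrative):

```python
import numpy as np

def world_to_camera(R, C, Pw):
    """P^C = R (P^W - C): R is a 3x3 rotation, C the camera center in
    world coordinates, Pw an (n x 3) array of world points."""
    return (Pw - C) @ R.T   # row-vector form of R @ (P - C)
```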
Perspective Projection
Perspective Projection
x = f X / Z
y = f Y / Z
Simplification: Weak Perspective
x = f X / Z_0
y = f Y / Z_0
Simpler: Orthographic Projection
x = X
y = Y
Perspective Matrix Equation (Camera Coordinates)
Weak Perspective Approximation
x = f X / Z_0
y = f Y / Z_0
(each point's individual depth Z is replaced by a common reference depth Z_0)
Orthographic Projection
x = X, y = Y
[x; y] = [1 0 0 0; 0 1 0 0] [X; Y; Z; 1]
Combine with External Params
[x; y; 1] = [1 0 0 0; 0 1 0 0; 0 0 0 1] [r11 r12 r13 0; r21 r22 r23 0; r31 r32 r33 0; 0 0 0 1] [1 0 0 -c_x; 0 1 0 -c_y; 0 0 1 -c_z; 0 0 0 1] [P^W_x; P^W_y; P^W_z; 1]
which collapses to
[x; y] = [r11 r12 r13; r21 r22 r23] [1 0 0 -c_x; 0 1 0 -c_y; 0 0 1 -c_z; 0 0 0 1] [P^W_x; P^W_y; P^W_z; 1]
Combine with External Params
[x; y] = [r11 r12 r13; r21 r22 r23] ([P^W_x; P^W_y; P^W_z] - [c_x; c_y; c_z])
(the third row of R drops out under orthographic projection)
Orthographic: Algebraic Equation
[x; y] = [i^T; j^T] (P - T)
where i^T = (r11, r12, r13) and j^T = (r21, r22, r23) are the first two rows of R, and T is the camera translation
x = i^T (P - T)
y = j^T (P - T)
Multiple Points, Multiple Frames
x = i^T (P - T), y = j^T (P - T)
n points: P_1, P_2, ..., P_i, ..., P_n
F frames: rotation rows i_1, i_2, ..., i_F and j_1, j_2, ..., j_F; translations T_1, T_2, ..., T_F
x_ti = i_t^T (P_i - T_t)
y_ti = j_t^T (P_i - T_t)
Factorization Approach
x_ti = i_t^T (P_i - T_t)
y_ti = j_t^T (P_i - T_t)
n points P_1, P_2, ..., P_i, ..., P_n (we want to recover these)
Note that the absolute position of the set of points cannot be uniquely recovered, so
First Trick: set the origin of the world coordinate system to be the center of mass of the n points:
(1/n) sum_{i=1}^{n} P_i = 0
Tomasi & Kanade Factorization Method World Image
Tomasi & Kanade Factorization Method World Image
Factorization Approach
Second Trick: subtract off the center of mass of the 2D points in each frame (centering)
x_ti = i_t^T (P_i - T_t)
y_ti = j_t^T (P_i - T_t)
Tomasi & Kanade Factorization Method World Image
Factorization Approach
x_ti = i_t^T (P_i - T_t)
y_ti = j_t^T (P_i - T_t)
After centering (and with the world origin at the centroid): x~_ti = i_t^T P_i, y~_ti = j_t^T P_i
What have we accomplished so far?
1) Removed the unknown camera locations from the equations.
2) More importantly, we can now write everything as one big matrix equation!
Factorization Approach
Form a matrix of centered image points (2F x n):
[x~_11 x~_12 x~_13 ... x~_1n]
[ ...                       ]
[x~_F1 x~_F2 x~_F3 ... x~_Fn]
[y~_11 y~_12 y~_13 ... y~_1n]
[ ...                       ]
[y~_F1 y~_F2 y~_F3 ... y~_Fn]
Each row: all n points in one frame
Factorization Approach
Form a matrix of centered image points (2F x n), as on the previous slide.
Each column: one point tracked through all F frames
Factorization Approach
The matrix of centered image points factors as (2F x n) = (2F x 3)(3 x n):
[x~_11 ... x~_1n; ...; x~_F1 ... x~_Fn; y~_11 ... y~_1n; ...; y~_F1 ... y~_Fn] = [i_1^T; ...; i_F^T; j_1^T; ...; j_F^T] [P_1 P_2 ... P_n]
Factorization Approach
(2F x n) = (2F x 3)(3 x n)
W = R S
W: centered measurement matrix
R: motion (camera rotation)
S: structure (3D scene points)
Factorization Approach
W = R S, with W of size 2F x n, R of size 2F x 3, S of size 3 x n
Rank Theorem: the 2F x n centered observation matrix has rank at most 3.
Proof: trivial, using the properties
the rank of an m x n matrix is at most min(m, n)
the rank of A*B is at most min(rank(A), rank(B))
Tomasi & Kanade Factorization Method
Rank of a Matrix What is the rank of a matrix, anyway? The number of columns (rows) that are linearly independent. If an M x N matrix A is treated as a linear map from N-dimensional space to M-dimensional space, it is the intrinsic dimension of the space that is mapped into. (The matrix in the figure would have rank 1.)
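The rank theorem in miniature, with random stand-in matrices (the names R, S, W mirror the slides, not any library):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((10, 3))   # plays the role of the 2F x 3 motion matrix
S = rng.standard_normal((3, 8))    # plays the role of the 3 x n structure matrix
W = R @ S                          # 10 x 8 "measurement" matrix
# rank(R @ S) <= min(rank(R), rank(S)) <= 3, even though W is 10 x 8
```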
Factorization Rank Theorem Importance of rank theorem:! Shows that video data is highly redundant! Precisely quantifies the redundancy! Suggests an algorithm for solving SFM!
Tomasi & Kanade Factorization Method
Factorization Approach
Form the SVD of the measurement matrix W (2F x n):
W = U D V^T, with U (2F x 2F), D (2F x n), V^T (n x n)
D is a diagonal matrix with singular values sorted in decreasing order:
d_11 >= d_22 >= d_33 >= ...
Factorization Approach
Form the SVD of the measurement matrix W: W = U D V^T
Another useful rank property:
The rank of a matrix equals the number of nonzero singular values.
By the rank theorem, d_11, d_22, d_33 are the only nonzero singular values (the rest are 0).
Factorization Approach
W (2F x n) = U (2F x 2F) * D (2F x n) * V^T (n x n)
Singular values in decreasing order
Factorization Approach
The rank theorem says: the first 3 singular values are nonzero; the rest should be zero.
In practice, due to noise, there may be more than 3 nonzero singular values, but the rank theorem tells us to ignore all but the largest three.
Factorization Approach
Keeping only the top three singular values:
W (2F x n) = U' (2F x 3) * D' (3 x 3) * V'^T (3 x n)
Factorization Approach
Observed image points: W
SVD: W = U D V^T
Split the singular values: W = (U D^(1/2)) (D^(1/2) V^T)
W (2F x n) = R (2F x 3) S (3 x n)
R: camera motion, S: scene structure
Tomasi & Kanade Factorization Method Ambiguity
Tomasi & Kanade Factorization Method
Tomasi & Kanade Factorization Method
Solving the Ambiguity
Solution to both problems:
Solve for Q such that the appropriate rows of R Q satisfy
unit vectors: i_t^T i_t = j_t^T j_t = 1
orthogonal: i_t^T j_t = 0
3F equations in 9 unknowns
Note that these are nonlinear equations
(Still the solution is up to an arbitrary rotation; fix it such that the first frame is the identity)
Factorization Summary
Assumptions
- orthographic camera
- n non-coplanar points tracked in F >= 3 frames
Form the centered measurement matrix W = [X~ ; Y~]
- where x~_ti = x_ti - mx_t
- where y~_ti = y_ti - my_t
- mx_t and my_t are the means of the points in frame t
- i ranges over the set of points
Rank theorem: the centered measurement matrix has rank at most 3
Factorization Algorithm
1) Form the centered measurement matrix W from n points tracked over F frames.
2) Compute the SVD of W = U D V^T
- U is 2F x 2F
- D is 2F x n
- V^T is n x n
3) Take the largest 3 singular values, and form
- D' = 3 x 3 diagonal matrix of the largest singular values
- U' = 2F x 3 matrix of the corresponding column vectors of U
- V'^T = 3 x n matrix of the corresponding row vectors of V^T
4) Define R = U' D'^(1/2) and S = D'^(1/2) V'^T
5) Solve for the Q that makes the appropriate rows of R orthonormal
6) Final solution is R* = R Q and S* = Q^(-1) S
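Steps 1-4 of the algorithm can be sketched with NumPy as follows (step 5, the metric-upgrade Q, is omitted; function and variable names are illustrative). The inputs x and y are F x n arrays of tracked image coordinates:

```python
import numpy as np

def tomasi_kanade(x, y):
    """Center per-frame coordinates, stack the 2F x n measurement
    matrix W, and split W = R S via a rank-3 truncated SVD.
    Returns motion R (2F x 3) and structure S (3 x n), each only
    up to the unresolved invertible 3x3 ambiguity Q."""
    xc = x - x.mean(axis=1, keepdims=True)   # subtract per-frame centroid
    yc = y - y.mean(axis=1, keepdims=True)
    W = np.vstack([xc, yc])                  # 2F x n centered matrix
    U, D, Vt = np.linalg.svd(W, full_matrices=False)
    U3, D3, Vt3 = U[:, :3], D[:3], Vt[:3]    # keep the 3 largest singular values
    R = U3 * np.sqrt(D3)                     # R = U' D'^(1/2)
    S = np.sqrt(D3)[:, None] * Vt3           # S = D'^(1/2) V'^T
    return R, S
```

On exact orthographic data the rank-3 product R S reproduces the centered measurement matrix, as the rank theorem promises.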
Four of 150 Input Images
Tracked Corner Features
3-D Reconstruction
Building
Reconstruction Reconstruction after Triangulation and Texture Mapping
Input
Reconstruction
Reconstruction