Introduction to Computer Vision. Week 10, Winter 2010 Instructor: Prof. Ko Nishino


Today: How do we recover geometry from two views? Stereo. Can we recover geometry from a sequence of images? Structure from Motion.

Stereo

Recovering 3D from Images How can we automatically compute 3D geometry from images? What cues in the image provide 3D information?

Visual Cues Shading Merle Norman Cosmetics, Los Angeles

Visual Cues Shading Texture The Visual Cliff, by William Vandivert, 1960

Visual Cues Shading Texture Focus From The Art of Photography, Canon

Visual Cues Shading Texture Focus Motion

Visual Cues Shading Texture Focus Motion Others: Highlights Shadows Silhouettes Inter-reflections Symmetry Light Polarization... Shape from X, X = shading, texture, focus, motion, ... In this class we'll focus on the motion cue

Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923

Teesta suspension bridge, Darjeeling, India

"Mark Twain at Pool Table", no date, UCR Museum of Photography

Woman getting eye exam during immigration procedure at Ellis Island, c. 1905-1920, UCR Museum of Photography

3-D Images Ltd.

Zuihoin, Kyoto by ArtServe@ANU.EDU.AU

Nike of Samothrace, Louvre by ArtServe@ANU.EDU.AU

By Shree Nayar

Anaglyphs Art and architecture around the world by ArtServe ANU.EDU.AU http://rubens.anu.edu.au/new/stereo.trials/ Pathfinder @ JPL.NASA http://mars.jpl.nasa.gov/mpf/mpf/anaglyph-arc.html Create your own! http://stereo3d.adpeach.com/ http://wxs.ca/3d/howto.html

Disparity and Depth Scene point P = (X, Y, Z); left image point P_L = (x_L, y_L), right image point P_R = (x_R, y_R); baseline b. Assume that we know that P_L corresponds to P_R. From perspective projection (with the coordinate system centered between the two cameras): x_L = f (X + b/2) / Z, x_R = f (X - b/2) / Z, y_L = y_R = f Y / Z

Disparity and Depth d = x_L - x_R is the disparity between corresponding left and right image points. Solving the projection equations for the scene point: X = b (x_L + x_R) / (2 (x_L - x_R)), Y = b (y_L + y_R) / (2 (x_L - x_R)), Z = b f / (x_L - x_R). Disparity is inversely proportional to depth, and disparity increases with baseline b.
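These relations are easy to check numerically. A minimal sketch (the helper name `triangulate` is my own, assuming the symmetric two-camera geometry of the slide with focal length f and baseline b):

```python
import numpy as np

def triangulate(xl, yl, xr, yr, f, b):
    """Recover (X, Y, Z) from corresponding left/right image points.

    Implements the formulas above: d = x_L - x_R, Z = b*f/d, etc.
    Image coordinates are in the same metric units as f and b."""
    d = xl - xr                      # disparity
    Z = b * f / d                    # depth, inversely proportional to d
    X = b * (xl + xr) / (2.0 * d)
    Y = b * (yl + yr) / (2.0 * d)
    return X, Y, Z
```

Round-tripping a known point, e.g. (X, Y, Z) = (1, 2, 10) with f = 1 and b = 0.5, projects to x_L = 0.125, x_R = 0.075, y = 0.2 and triangulates back exactly.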

Vergence The optical axes of the two cameras need not be parallel. Field of view decreases with increase in baseline and vergence (the figure, showing the uncertainty of a scene point within the stereo field of view, is a bit deceptive). Accuracy increases with baseline and vergence.

Stereo

Stereo Basic Principle: Triangulation Gives reconstruction as the intersection of two rays. Requires calibration and point correspondence.

Stereo Correspondence Determine Pixel Correspondence Pairs of points that correspond to same scene point epipolar plane Epipolar Constraint Reduces correspondence problem to 1D search along conjugate epipolar lines Java demo: http://www.ai.sri.com/~luong/research/meta3dviewer/epipolargeo.html

Fundamental Matrix Let p be a point in the left image, p' a point in the right image. Epipolar relation: p maps to the epipolar line l', and p' maps to the epipolar line l. The epipolar mapping is described by a 3x3 matrix F. It follows that p'^T F p = 0

Fundamental Matrix This matrix F is called the Essential Matrix when the image intrinsic parameters are known, and the Fundamental Matrix more generally (uncalibrated case). Can solve for F from point correspondences: each (p, p') pair gives one linear equation in the entries of F, and 8 points give enough equations to solve for F (the 8-point algorithm).
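The 8-point algorithm fits in a few lines of numpy. This is the unnormalized textbook version for illustration (in practice one would use Hartley's normalized variant); the function name and interface are my own:

```python
import numpy as np

def eight_point(p, q):
    """Estimate F from >= 8 correspondences p <-> q (each Nx2, pixel coords).

    Each pair gives one linear equation q_i^T F p_i = 0 in the 9 entries
    of F; solve the homogeneous system by SVD, then enforce rank 2."""
    A = np.array([[qx*px, qx*py, qx, qy*px, qy*py, qy, px, py, 1.0]
                  for (px, py), (qx, qy) in zip(p, q)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)           # null vector of A -> F (up to scale)
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                         # enforce the rank-2 constraint
    return U @ np.diag(s) @ Vt
```

With noise-free synthetic correspondences the recovered F satisfies the epipolar constraint q'^T F p = 0 to machine precision.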

So far: Compute F. For each point, compute its epipolar line using F, then search along the epipolar line. But slanted epipolar lines are hard to search along!

Stereo Image Rectification

Stereo Image Rectification Reproject the image planes onto a common plane parallel to the line between the optical centers. Pixel motion is horizontal after this transformation. Two homographies (3x3 transforms), one for each input image reprojection. C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

Stereo Matching Algorithms Match pixels in conjugate epipolar lines, assuming brightness constancy. This is a tough problem. Numerous approaches: dynamic programming [Baker 81, Ohta 85], smoothness functionals, more images (trinocular, N-ocular) [Okutomi 93], graph cuts [Boykov 00]. A good survey and evaluation: http://www.middlebury.edu/stereo/

Basic Stereo Algorithm For each epipolar line, for each pixel in the left image: compare with every pixel on the same epipolar line in the right image, and pick the pixel with minimum match cost. Improvement: match windows. This should look familiar... correlation, Sum of Squared Differences (SSD), etc.
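The loop above can be written as a brute-force didactic sketch, assuming rectified images (epipolar lines are image rows); the window half-size w and search range max_disp are illustrative parameters:

```python
import numpy as np

def disparity_map(left, right, max_disp, w=3):
    """Window matching along rectified epipolar lines (rows).

    For each left-image pixel, compare a (2w+1)x(2w+1) window against
    the window at every candidate disparity in the right image and keep
    the disparity with minimum SSD cost.  O(H * W * max_disp)."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    L = np.pad(left.astype(float), w, mode='edge')
    R = np.pad(right.astype(float), w, mode='edge')
    for y in range(H):
        for x in range(W):
            win = L[y:y + 2*w + 1, x:x + 2*w + 1]
            costs = []
            for d in range(min(max_disp, x) + 1):  # right pixel at x - d
                cand = R[y:y + 2*w + 1, x - d:x - d + 2*w + 1]
                costs.append(np.sum((win - cand) ** 2))  # SSD match cost
            disp[y, x] = int(np.argmin(costs))
    return disp
```

On a synthetic pair where the right image is the left shifted by a constant disparity, the interior of the recovered map equals that disparity.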

Window Size Effect of window size: smaller window, good? bad? Larger window, good? bad? W = 3 vs. W = 20. Better results with an adaptive window: T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment, Proc. International Conference on Robotics and Automation, 1991. D. Scharstein and R. Szeliski, Stereo matching with nonlinear diffusion, International Journal of Computer Vision, 28(2):155-174, July 1998.

Stereo Results Data from the University of Tsukuba (scene and ground truth shown). Similar results on other images without ground truth.

Results with Window Search Window-based matching (best window size) Ground truth

Stereo as Energy Minimization Matching cost formulated as an energy. Data term penalizing bad matches: D(x, y, d) = |I(x, y) - J(x + d, y)|. Neighborhood term encouraging spatial smoothness: V(d_1, d_2) = cost of adjacent pixels with labels d_1 and d_2, e.g. |d_1 - d_2| (or something similar). E = sum over pixels (x, y) of D(x, y, d_{x,y}) + sum over neighbors (x_1, y_1), (x_2, y_2) of V(d_{x1,y1}, d_{x2,y2})
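The energy itself is cheap to evaluate even though minimizing it is hard. A sketch that scores a candidate disparity field under an absolute-difference data term and a linear smoothness term over 4-neighbors (one plausible choice of V, not the only one):

```python
import numpy as np

def stereo_energy(left, right, disp, lam=1.0):
    """Evaluate E = sum_xy D(x, y, d_xy) + lam * sum_neighbors V(d1, d2)
    for a given disparity field, with D the absolute intensity difference
    and V(d1, d2) = |d1 - d2|.  An evaluator only, not a minimizer."""
    H, W = left.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # data term: left pixel (x, y) vs. right pixel (x - d, y)
    data = np.abs(left - right[ys, np.clip(xs - disp, 0, W - 1)]).sum()
    # smoothness term over vertical and horizontal neighbor pairs
    smooth = np.abs(np.diff(disp, axis=0)).sum() + np.abs(np.diff(disp, axis=1)).sum()
    return data + lam * smooth
```

A function like this lets you compare candidate labelings: graph-cut methods search for the labeling that makes this number as small as possible.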

Stereo as a Graph Problem [Boykov, 1999] Nodes: pixels and disparity labels d_1, d_2, d_3. Edge weights: D(x, y, d) between a pixel and a label, V(d_1, d_2) between neighboring pixels.

Graph Definition Initial state: each pixel connected to its immediate neighbors, and each disparity label connected to all of the pixels.

Stereo Matching by Graph Cuts Graph cut: delete enough edges so that each pixel is (transitively) connected to exactly one label node. Cost of a cut: sum of the deleted edge weights. Finding the minimum-cost cut is equivalent to finding the global minimum of the energy function.

Computing a Multiway Cut With two labels: classical min-cut problem, solvable by standard network flow algorithms (polynomial time in theory, nearly linear in practice). More than 2 labels: NP-hard [Dahlhaus et al., STOC 92], but efficient approximation algorithms exist: within a factor of 2 of optimal, and they compute a local minimum in a strong sense (even very large moves will not improve the energy). Yuri Boykov, Olga Veksler and Ramin Zabih, Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999. Basic idea: reduce to a series of 2-way-cut sub-problems, using one of: swap move (pixels with label l1 can change to l2, and vice versa), expansion move (any pixel can change its label to l1).

Using Graph-Cuts Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999. Ground truth

Disparity and Depth (recap) Scene point (X, Y, Z): x_L = f (X + b/2) / Z, x_R = f (X - b/2) / Z, y_L = y_R = f Y / Z. d = x_L - x_R is the disparity between corresponding left and right image points: X = b (x_L + x_R) / (2 (x_L - x_R)), Y = b (y_L + y_R) / (2 (x_L - x_R)), Z = b f / (x_L - x_R). Disparity is inversely proportional to depth and increases with baseline b.

Stereo Example left image right image depth map H. Tao et al. Global matching criterion and color segmentation based stereo

Stereo Example H. Tao et al. Global matching criterion and color segmentation based stereo

Stereo Example H. Tao et al. Global matching criterion and color segmentation based stereo

Stereo Reconstruction Pipeline Steps: calibrate cameras, rectify images, compute disparity, estimate depth. What will cause errors? Camera calibration errors, poor image resolution, occlusions, violations of brightness constancy (specular reflections), large motions, low-contrast image regions.

Active Stereo with Structured Light Li Zhang's one-shot stereo: camera 1, projector, camera 2. Projecting structured light patterns onto the object simplifies the correspondence problem.

Active Stereo with Structured Light

Structured Light Scanning Gray Code By Gabriel Taubin

Laser Scanning Digital Michelangelo Project http://graphics.stanford.edu/projects/mich/ Optical triangulation: project a single stripe of laser light (a laser sheet formed by a cylindrical lens) and scan it across the surface of the object in the direction of travel, imaging the stripe on the CCD image plane. This is a very precise version of structured light scanning.

Laser Scanned Models The Digital Michelangelo Project, Levoy et al.

Laser Scanned Models The Digital Michelangelo Project, Levoy et al.

Laser Scanned Models The Digital Michelangelo Project, Levoy et al.

Laser Scanned Models The Digital Michelangelo Project, Levoy et al.

Laser Scanned Models The Digital Michelangelo Project, Levoy et al.

Visual Cues Shading Texture Focus Motion

Structure from Motion Many of the slides courtesy of Prof. O. Camp

Structure from Motion Use small disparities to track features Integrate long sequences over time Find the structure (shape) and motion

SfM and Stereo Stereo: two or more views from different cameras. SfM: two or more frames over time from a moving camera.

Assumptions Orthographic projection. We will find structure up to a scale factor. n (not all coplanar) points P_1, P_2, ..., P_n have been tracked in F frames, with F >= 3.

World to Camera Transform P^C = R (P^W - C). In homogeneous coordinates: [P^C_x, P^C_y, P^C_z, 1]^T = [[r_11, r_12, r_13, 0], [r_21, r_22, r_23, 0], [r_31, r_32, r_33, 0], [0, 0, 0, 1]] [[1, 0, 0, -c_x], [0, 1, 0, -c_y], [0, 0, 1, -c_z], [0, 0, 0, 1]] [P^W_x, P^W_y, P^W_z, 1]^T, i.e. P^C = M_ext P^W.
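As a sanity check, M_ext can be assembled exactly as the slide factors it, a rotation times a translation in homogeneous coordinates (the helper name is illustrative):

```python
import numpy as np

def extrinsic_matrix(R, c):
    """Build M_ext so that P_cam_homog = M_ext @ P_world_homog,
    implementing P^C = R (P^W - C) as the product of a rotation
    block and a translation block, as on the slide."""
    Rh = np.eye(4)
    Rh[:3, :3] = R                       # rotation block
    Th = np.eye(4)
    Th[:3, 3] = -np.asarray(c, float)    # translate world origin to C
    return Rh @ Th
```

For example, a 90-degree rotation about z with camera center C = (1, 2, 3) maps the world point (2, 2, 3) to (0, 1, 0) in camera coordinates.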

Perspective Projection

Perspective Projection x = f X / Z, y = f Y / Z

Simplification: Weak Perspective x = f X / Z_0, y = f Y / Z_0

Simpler: Orthographic Projection x = X, y = Y

Perspective Matrix Equation (Camera Coordinates)

Weak Perspective Approximation x = (f / Z_0) X, y = (f / Z_0) Y

Orthographic Projection x = X, y = Y: [x, y]^T = [[1, 0, 0, 0], [0, 1, 0, 0]] [X, Y, Z, 1]^T

Combine with External Params [x, y]^T = [[1, 0, 0, 0], [0, 1, 0, 0]] [[r_11, r_12, r_13, 0], [r_21, r_22, r_23, 0], [r_31, r_32, r_33, 0], [0, 0, 0, 1]] [[1, 0, 0, -c_x], [0, 1, 0, -c_y], [0, 0, 1, -c_z], [0, 0, 0, 1]] [P^W_x, P^W_y, P^W_z, 1]^T = [[r_11, r_12, r_13], [r_21, r_22, r_23]] [[1, 0, 0, -c_x], [0, 1, 0, -c_y], [0, 0, 1, -c_z]] [P^W_x, P^W_y, P^W_z, 1]^T

Combine with External Params [x, y]^T = [[r_11, r_12, r_13], [r_21, r_22, r_23]] (P^W - c), where c = (c_x, c_y, c_z)^T

Orthographic: Algebraic Equation [x, y]^T = [i^T; j^T] (P - T), where i^T = (r_11, r_12, r_13) and j^T = (r_21, r_22, r_23) are the first two rows of R, and T = c. That is: x = i^T (P - T), y = j^T (P - T)

Multiple Points, Multiple Frames x = i^T (P - T), y = j^T (P - T). With n points P_1, P_2, ..., P_n and F frames (rotation rows i_1, ..., i_F and j_1, ..., j_F; translations T_1, ..., T_F): x_ti = i_t^T (P_i - T_t), y_ti = j_t^T (P_i - T_t)

Factorization Approach x_ti = i_t^T (P_i - T_t), y_ti = j_t^T (P_i - T_t), for n points P_1, P_2, ..., P_n (we want to recover these). Note that the absolute position of the set of points cannot be uniquely recovered, so: First trick: set the origin of the world coordinate system to be the center of mass of the n points, (1/n) sum_i P_i = 0.

Tomasi & Kanade Factorization Method (figure: world points and their images)

Tomasi & Kanade Factorization Method (figure: world points and their images)

Factorization Approach Second trick: subtract off the center of mass of the 2D points in each frame (centering). x_ti = i_t^T (P_i - T_t), y_ti = j_t^T (P_i - T_t)

Tomasi & Kanade Factorization Method (figure: world points and their images)

Factorization Approach After centering, the translations cancel: x~_ti = i_t^T P_i, y~_ti = j_t^T P_i. What have we accomplished so far? 1) Removed the unknown camera locations from the equations. 2) More importantly, we can now write everything as one big matrix equation!

Factorization Approach Form a 2F x n matrix of centered image points: rows x~_11 x~_12 x~_13 ... x~_1n through x~_F1 x~_F2 x~_F3 ... x~_Fn, followed by rows y~_11 y~_12 y~_13 ... y~_1n through y~_F1 y~_F2 y~_F3 ... y~_Fn. Each row holds all n points in one frame.

Factorization Approach The same 2F x n matrix, read by columns: each column tracks one point through all F frames.

Factorization Approach The matrix of centered image points factors as (2F x n) = (2F x 3)(3 x n): W~ = [i_1^T; ...; i_F^T; j_1^T; ...; j_F^T] [P_1 P_2 ... P_n]

Factorization Approach W = R S: centered measurement matrix (2F x n) = Motion (camera rotation, 2F x 3) x Structure (3D scene points, 3 x n)

Factorization Approach W = R S, with dimensions (2F x n) = (2F x 3)(3 x n). Rank Theorem: the 2F x n centered observation matrix has at most rank 3. Proof: trivial, using the properties that the rank of an m x n matrix is at most min(m, n), and the rank of A*B is at most min(rank(A), rank(B)).
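The rank theorem can be verified numerically: synthesize orthographic projections of random 3D points over several frames and compare ranks before and after centering (a sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, F = 20, 6                       # 20 points tracked over 6 frames

P = rng.normal(size=(3, n))        # random (non-coplanar) 3D points

rows_x, rows_y = [], []
for _ in range(F):
    # random rotation via QR; its first two rows serve as i_t^T and j_t^T
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    T = rng.normal(size=(3, 1))    # unknown camera translation per frame
    rows_x.append(Q[0] @ (P - T))  # x_ti = i_t^T (P_i - T_t)
    rows_y.append(Q[1] @ (P - T))  # y_ti = j_t^T (P_i - T_t)

W = np.vstack(rows_x + rows_y)     # 2F x n measurement matrix
Wc = W - W.mean(axis=1, keepdims=True)   # centering removes the T_t terms

# generically: rank(W) is 4 (structure plus translation), rank(Wc) is 3
print(np.linalg.matrix_rank(W), np.linalg.matrix_rank(Wc))
```

Despite W being 12 x 20 here, centering collapses it to rank 3, exactly as the theorem predicts; this is the redundancy the factorization exploits.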

Tomasi & Kanade Factorization Method

Rank of a Matrix What is the rank of a matrix, anyway? The number of columns (rows) that are linearly independent. If an M x N matrix A is treated as a linear map from N-dimensional space into M-dimensional space, the rank is the intrinsic dimension of the space that is mapped into. (The matrix in the figure would have rank 1.)

Factorization Rank Theorem Importance of rank theorem:! Shows that video data is highly redundant! Precisely quantifies the redundancy! Suggests an algorithm for solving SFM!

Tomasi & Kanade Factorization Method

Factorization Approach Form the SVD of the measurement matrix W: W = U D V^T, with U (2F x 2F), D (2F x n), V^T (n x n). D is a diagonal matrix with the singular values sorted in decreasing order: d_11 >= d_22 >= d_33 >= ...

Factorization Approach W = U D V^T. Another useful rank property: the rank of a matrix is equal to the number of nonzero singular values. By the rank theorem, d_11, d_22, d_33 are the only nonzero singular values (the rest are 0).

Factorization Approach W = U D V^T, with dimensions (2F x n) = (2F x 2F)(2F x n)(n x n) and the singular values in decreasing order along the diagonal of D.

Factorization Approach The rank theorem says the first 3 singular values are nonzero and the rest should be zero. In practice, due to noise, there may be more than 3 nonzero singular values, but the rank theorem tells us to ignore all but the largest three.

Factorization Approach Keeping only the top 3 singular values gives W = U' D' V'^T, with U' (2F x 3), D' (3 x 3), V'^T (3 x n).

Factorization Approach From the observed image points: W = U' D' V'^T (SVD, top three singular values). Splitting D' = D'^(1/2) D'^(1/2): W = (U' D'^(1/2)) (D'^(1/2) V'^T) = R S, with R (2F x 3) the camera motion and S (3 x n) the scene structure.

Tomasi & Kanade Factorization Method Ambiguity

Tomasi & Kanade Factorization Method

Tomasi & Kanade Factorization Method

Solving the Ambiguity Solution to both problems: solve for a Q such that the appropriate rows of R satisfy the metric constraints: i_t^T Q Q^T i_t = 1 and j_t^T Q Q^T j_t = 1 (unit vectors), i_t^T Q Q^T j_t = 0 (orthogonal). This gives 3F equations in 9 unknowns; note that these are nonlinear equations. (Still, the solution is up to an arbitrary rotation; fix it such that the first frame's rotation is the identity.)

Factorization Summary Assumptions: orthographic camera; n non-coplanar points tracked in F >= 3 frames. Form the centered measurement matrix W~ = [X~; Y~], where x~_ti = x_ti - mx_t and y~_ti = y_ti - my_t, mx_t and my_t are the means of the points in frame t, and i ranges over the set of points. Rank theorem: the centered measurement matrix has a rank of at most 3.

Factorization Algorithm 1) Form the centered measurement matrix W from n points tracked over F frames. 2) Compute the SVD W = U D V^T, where U is 2F x 2F, D is 2F x n, and V^T is n x n. 3) Take the largest 3 singular values and form: D' = 3 x 3 diagonal matrix of the largest singular values, U' = 2F x 3 matrix of the corresponding column vectors from U, V'^T = 3 x n matrix of the corresponding row vectors from V^T. 4) Define R = U' D'^(1/2) and S = D'^(1/2) V'^T. 5) Solve for the Q that makes the appropriate rows of R orthonormal. 6) The final solution is R* = R Q and S* = Q^-1 S.
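Steps 1-4 of the algorithm translate almost line-for-line into numpy. This sketch omits the metric upgrade Q of step 5, so the returned factors are only determined up to an invertible 3x3 ambiguity A: (R A)(A^-1 S) reproduces the measurements equally well.

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade factorization sketch (steps 1-4 above): center the
    2F x n measurement matrix, take its rank-3 SVD, and split it into
    motion R (2F x 3) and structure S (3 x n)."""
    Wc = W - W.mean(axis=1, keepdims=True)             # step 1: centering
    U, d, Vt = np.linalg.svd(Wc, full_matrices=False)  # step 2: SVD
    U3, d3, Vt3 = U[:, :3], d[:3], Vt[:3]              # step 3: top 3 SVs
    R = U3 * np.sqrt(d3)                               # step 4: R = U' D'^(1/2)
    S = np.sqrt(d3)[:, None] * Vt3                     #         S = D'^(1/2) V'^T
    return R, S
```

On noise-free orthographic measurements, R @ S reproduces the centered measurement matrix exactly; with noise, it is the best rank-3 approximation.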

Four of 150 Input Images

Tracked Corner Features

3-D Reconstruction

Building

Reconstruction Reconstruction after triangulation and texture mapping

Input

Reconstruction

Reconstruction