Capturing, Modeling, Rendering 3D Structures

Computer Vision Approach Capturing, Modeling, Rendering 3D Structures Calculate pixel correspondences and extract geometry Not robust Difficult to acquire illumination effects, e.g. specular highlights Computer Vision (for 3D Reconstruction) n a Nutshell [Pollefeys99] Computer and Human Vision What is Computer Vision? Human Lens forms image on retina, sensors (rods and cones) respond to light Computer Lens system forms image, sensors (CCD) respond to light nput: images or video Output: description of the 3D world Low-level Mid-level High-level * Some slides courtesy of Szymon Rusinkiewicz, Princeton University; Marc Pollefeys, UNC- Chapel Hill Low-Level or Early Vision Mid-Level Vision Considers local properties of an image Grouping and segmentation There s an edge! There s an object and a background! 1

High-Level Vision Big Question #1: Who Cares? Recognition Applications of computer vision n A: vision serves as the input stage n medicine: understanding human vision n engineering: model extraction t s a chair! Vision and Other Fields Big Question #: Does t Work? Cognitive Psychology Signal Processing Situation much the same as A: Some fundamental algorithms Large collection of hacks / heuristics Artificial ntelligence Computer Vision Computer Graphics Vision is hard! Especially at high level, physiology unknown Requires integrating many different methods Requires abstract reasoning and understanding Pattern Analysis Metrology Computer Vision Computer Vision Early Vision mage Resolution and Filtering Edges, Corners, and Textures Motion Optical Flow Reconstruction Feature Tracking Epipolar Geometry Correspondence Depth Estimation from mages Pairs Early Vision mage Resolution and Filtering Edges, Corners, and Textures Motion Optical Flow Reconstruction Feature Tracking Epipolar Geometry Correspondence Depth Estimation from mages Pairs

mage Resolution and Filtering We are dealing with images (or 3D views) of limited resolution mage Resolution and Filtering We are dealing with images (or 3D views) of limited resolution mage Resolution and Filtering mage Resolution and Filtering Thresholding Make all pixels > 18 equal to white Make all pixels < 18 equal to black Original: Mandrill Smoothed with Gaussian kernel Gaussian Filters Gaussian Filters 3

Gaussian Filters One-dimensional Gaussian Two-dimensional Gaussian G G x 1 σ 1( x) = e σ π x + y 1 σ ( x, y) = e σ π Gaussian Convolution Out ( x, y) = f ( i, j) n( x i, y j) i Convolution f n is n n, f is m m, takes time O(m n ) OK for small filter kernels, bad for large ones Gaussian convolutions are used because: Smooth Decay to zero rapidly Simple analytic formula j Edges, Corners, Textures What are the basic image features? Edges, Corners, Textures What are the basic image features? point ambiguous? Edges, Corners, Textures What are the basic image features? Edges, Corners, Textures What are the basic image features? edge corner 4

Edges, Corners, Textures Example Feature Detection: Edges What are the basic image features? How do we find edges? s it hard? Are edges well-defined? texture What is an Edge? What is an Edge? Edge easy to find Where is edge? Single pixel wide or multiple pixels? What is an Edge? What is an Edge? Noise: have to distinguish noise from actual edge s this one edge or two? 5

What is an Edge? Canny Edge Detector s a texture discontinuity an edge? Original: Lena Edges Computer Vision Motion in Computer Vision Early Vision mage Resolution and Filtering Edges, Corners, and Textures Motion Optical Flow Reconstruction Feature Tracking Epipolar Geometry Correspondence Depth Estimation from mages Pairs Given an initial image, how does the image change as we move the viewpoint? Expansion/contraction Optical Flow of the mage Optical Flow of the mage Can pixels flow in any direction? Optical flow 6

Optical Flow of the mage Computer Vision Can pixels flow at any speed? Early Vision mage Resolution and Filtering Edges, Corners, and Textures Motion Optical Flow Reconstruction Feature Tracking Epipolar Geometry Correspondence Depth Estimation from mages Pairs Feature Tracking Feature Matching = Object Matching Follow feature A from image to image J Requires matching features Feature Matching = Object Matching Matching: What to match? Simplest: SSD with pixel windows 7

Comparing Windows: Comparing Windows:? = f g Most popular W = 3 W = 0 Must choose a good window size Subpixel SSD When motion is a few pixels or less, motion of an integer no. of pixels can be insufficient. Bilinear nterpolation To compare pixels that are not at integer grid points, we resample the image. Assume image is locally bilinear. (x,y) = ax + by + cxy + d = 0. Given the value of the image at four points: (x,y), (x+1,y), (x,y+1), (x+1,y+1) we can solve for a,b,c,d linearly. Then, for any u between x and x+1, for any v between y and y+1, we use this equation to find (u,v). Matching: How to Match Efficiently Baseline approach: try everything. arg min + u,v ( W( x,y ) ( x + u,y v )) Matching: Multiscale search search search Could range over whole image. Or only over a small displacement. search 8

Low resolution The Gaussian Pyramid When motion is small: Optical Flow Small motion: (u and v are less than 1 pixel) Brute force not possible suppose we take the Taylor series expansion of : High resolution Optical flow equation Optical flow equation Combining these two equations Q: how many unknowns and equations per pixel? ntuitively, what does this constraint mean? The component of the flow in the gradient direction is determined The component of the flow parallel to an edge is unknown n the limit as u and v go to zero, this becomes exact First Order Approximation Aperture problem: where are we? When we assume that: We assume an image locally is: 9

Aperture problem: where are we? Aperture problem: where are we? Aperture problem: where are we? Aperture problem: where are we? Solving the aperture problem How to get more equations for a pixel? Basic idea: impose additional constraints most common is to assume that the flow field is smooth locally one method: pretend the pixel s neighbors have the same (u,v) f we use a 5x5 window, that gives us 5 equations per pixel! Lucas-Kanade flow We have more equations than unknowns: solve least squares problem. This is given by: Summations over all pixels in the KxK window 10

Conditions for solvability What does this look like? Formula for Finding Corners Optimal (u, v) satisfies Lucas-Kanade equation When is This Solvable? A T A should be invertible A T A should not be too small due to noise eigenvalues λ 1 and λ of A T A should not be too small A T A should be well-conditioned λ 1 / λ should not be too large (λ 1 = larger eigenvalue) We look at matrix: Sum over a small region, the hypothetical corner C = Matrix is symmetric x x y Gradient with respect to x, times gradient with respect to y x y y WHY THS? First, consider case where: C = x x y x y y λ1 = 0 0 λ This means all gradients in neighborhood are: (k,0) or (0, c) or (0, 0) (or off-diagonals cancel). What is region like if: 1. λ1 = 0?. λ = 0? 3. λ1 = 0 and λ = 0? 4. λ1 > 0 and λ > 0? General Case: From Singular Value Decomposition it follows that since C is symmetric: λ C = R 1 0 1 R 0 λ where R is a rotation matrix. So every case is like one on last slide. So, corners are the things we can track Edge Corners are when λ1, λ are big; this is also when Lucas-Kanade works. Corners are regions with two different directions of gradient (at least). Aperture problem disappears at corners. Not perfect, but works large gradients, all the same large λ 1, small λ (Seitz) 11

Low texture region High textured region gradients have small magnitude smallλ 1, small λ (Seitz) gradients are different, large magnitudes large λ 1, large λ Errors in Lucas-Kanade Computer Vision What are the potential causes of errors in this procedure? Suppose A T A is easily invertible Suppose there is not much noise in the image When our assumptions are violated Brightness constancy is not satisfied The motion is not small A point does not move like its neighbors window size is too large what is the ideal window size? Early Vision mage Resolution and Filtering Edges, Corners, and Textures Motion Optical Flow Reconstruction Feature Tracking Epipolar Geometry Correspondence Depth Estimation from mages Pairs (Seitz) Simple 3D Reconstruction from a Pair of mages f we do not know the corresponding features Epipolar Geometry Epipolar plane Defined by the centers-of-projection C, C and point X 1

Epipolar Geometry Epipolar Geometry Epipolar constraint: even if the distance from C to X is unknown, we know its projection on C must move along the epipolar line This means 1-degree of freedom C C Epipolar Geometry Epipolar Geometry Example epipolar lines: lateral motion Example epipolar lines: converging cameras Epipolar Geometry Correspondence Find a feature in the other image of the pair Epipolar constraint reduces the problem to a 1D search But still a fundamentally difficult problem!!! Example epipolar lines: forward motion 13

Depth Estimation Depth Estimation How far a projected feature moves along the epipolar line is inversely proportional to the depth (distance) to the feature How big is a pixel? What is the precision of reconstruction from two images? mage ntensity = 1 / Depth Perfect example Depth Estimation Computer Vision Approach More realistic examples of what you can expect Calculate pixel correspondences and extract geometry Not robust Difficult to acquire illumination effects, e.g. specular highlights Normalized correlation Simulated annealing [Pollefeys99] 14