C280, Computer Vision

C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Roadmap Previous: Image formation, filtering, local features, (Texture) Tues: Feature-based Alignment Stitching images together Homographies, RANSAC, Warping, Blending Global alignment of planar models Today: Dense Motion Models Local motion / feature displacement Parametric optic flow No classes next week: ICCV conference Oct 6th: Stereo / Multi-view: Estimating depth with known inter-camera pose Oct 8th: Structure-from-motion: Estimation of pose and 3D structure Factorization approaches Global alignment with 3D point models

Last time: Stereo Human stereopsis & stereograms Epipolar geometry and the epipolar constraint Case example with parallel optical axes General case with calibrated cameras Correspondence search The Essential and the Fundamental Matrix Multi-view stereo

Today: SFM SFM problem statement Factorization Projective SFM Bundle Adjustment Photo Tourism Rome in a day:

Lazebnik Structure from motion

Multiple-view geometry questions Scene geometry (structure): Given 2D point matches in two or more images, where are the corresponding points in 3D? Correspondence (stereo matching): Given a point in just one image, how does it constrain the position of the corresponding point in another image? Camera geometry (motion): Given a set of corresponding points in two or more images, what are the camera matrices for these views? Lazebnik

Structure from motion Given: m images of n fixed 3D points, x_ij = P_i X_j, i = 1, …, m, j = 1, …, n. Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn correspondences x_ij. [Figure: a 3D point X_j projects to x_1j, x_2j, x_3j in cameras P_1, P_2, P_3.] Lazebnik

Structure from motion ambiguity If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same: x = PX = (1/k P)(kX). It is impossible to recover the absolute scale of the scene! Lazebnik

Structure from motion ambiguity If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same. More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change: x = PX = (PQ^-1)(QX). Lazebnik
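The ambiguity is easy to check numerically: for any invertible 4x4 transformation Q (here the pure scaling Q = diag(k, k, k, 1)), applying Q to the scene and Q^-1 to the camera leaves every projected pixel unchanged. A minimal sketch in NumPy; the camera and point values are arbitrary, chosen only for illustration:

```python
import numpy as np

np.random.seed(0)
P = np.random.randn(3, 4)               # an arbitrary 3x4 projection matrix
X = np.append(np.random.randn(3), 1.0)  # a homogeneous 3D point [X, Y, Z, 1]

k = 5.0
Q = np.diag([k, k, k, 1.0])             # scale the scene by k

x_orig = P @ X
x_scaled = (P @ np.linalg.inv(Q)) @ (Q @ X)   # camera "shrunk" by 1/k, scene grown by k

# Dehomogenized pixel coordinates are identical: absolute scale cannot be recovered.
print(x_orig[:2] / x_orig[2])
print(x_scaled[:2] / x_scaled[2])
```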

Projective ambiguity x = PX = (P Q_P^-1)(Q_P X) Lazebnik

Lazebnik Projective ambiguity

Affine ambiguity x = PX = (P Q_A^-1)(Q_A X) Lazebnik

Lazebnik Affine ambiguity

Similarity ambiguity x = PX = (P Q_S^-1)(Q_S X) Lazebnik

Lazebnik Similarity ambiguity

Hierarchy of 3D transformations Projective (15 dof): [A t; v^T v], preserves intersection and tangency. Affine (12 dof): [A t; 0^T 1], preserves parallelism, volume ratios. Similarity (7 dof): [s R t; 0^T 1], preserves angles, ratios of length. Euclidean (6 dof): [R t; 0^T 1], preserves angles, lengths. With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. Lazebnik

Structure from motion Let's start with affine cameras (the math is easier): projection center at infinity. Lazebnik

Recall: Orthographic Projection Special case of perspective projection: the distance from the center of projection to the image plane is infinite. Projection matrix: [1 0 0 0; 0 1 0 0; 0 0 0 1] (drop the z coordinate). Lazebnik Slide by Steve Seitz

Affine cameras Orthographic Projection Parallel Projection Lazebnik

Affine cameras A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image: P = [3x3 affine] [1 0 0 0; 0 1 0 0; 0 0 0 1] [4x4 affine] = [a_11 a_12 a_13 b_1; a_21 a_22 a_23 b_2; 0 0 0 1] = [A b; 0 1]. Affine projection is a linear mapping + translation in inhomogeneous coordinates: x = [x; y] = [a_11 a_12 a_13; a_21 a_22 a_23][X; Y; Z] + [b_1; b_2] = AX + b, where b is the projection of the world origin. Lazebnik
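In code an affine camera is just a 2x3 matrix A and a 2-vector b applied to inhomogeneous 3D coordinates. A small sketch; the values of A and b below are hypothetical:

```python
import numpy as np

# Hypothetical affine camera: linear part A (2x3) and translation b (projection of the world origin).
A = np.array([[1.0, 0.0, 0.1],
              [0.0, 1.0, 0.2]])
b = np.array([320.0, 240.0])

def affine_project(X):
    """Project inhomogeneous 3D points (N, 3) to 2D: x = A X + b."""
    return X @ A.T + b

X = np.array([[0.0, 0.0, 0.0],    # the world origin projects exactly to b
              [1.0, 2.0, 3.0]])
print(affine_project(X))
```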

Lazebnik Affine structure from motion Given: m images of n fixed 3D points: x_ij = A_i X_j + b_i, i = 1, …, m, j = 1, …, n. Problem: use the mn correspondences x_ij to estimate the m projection matrices A_i and translation vectors b_i, and the n points X_j. The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom): [A b; 0 1] -> [A b; 0 1] Q, [X; 1] -> Q^-1 [X; 1]. We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity). Thus, we must have 2mn >= 8m + 3n - 12. For two views, we need four point correspondences.

Affine structure from motion Centering: subtract the centroid of the image points: x̂_ij = x_ij - (1/n) Σ_k x_ik = A_i X_j + b_i - (1/n) Σ_k (A_i X_k + b_i) = A_i (X_j - (1/n) Σ_k X_k) = A_i X̂_j. For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points. After centering, each normalized point x̂_ij is related to the 3D point X_j by x̂_ij = A_i X_j. Lazebnik

Affine structure from motion Let's create a 2m x n data (measurement) matrix: D = [x̂_11 x̂_12 … x̂_1n; x̂_21 x̂_22 … x̂_2n; …; x̂_m1 x̂_m2 … x̂_mn] (rows: cameras, 2m; columns: points, n). C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Affine structure from motion Let's create a 2m x n data (measurement) matrix: D = [x̂_11 x̂_12 … x̂_1n; x̂_21 x̂_22 … x̂_2n; …; x̂_m1 x̂_m2 … x̂_mn] = [A_1; A_2; …; A_m] [X_1 X_2 … X_n], i.e. cameras (2m x 3) times points (3 x n). The measurement matrix D = MS must have rank 3! C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.
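One way to see the rank-3 property is to synthesize noise-free affine projections, build the centered 2m x n matrix D, and check its numerical rank. A sketch with invented cameras and points:

```python
import numpy as np

np.random.seed(1)
m, n = 4, 20                                   # cameras, points (illustrative sizes)
X = np.random.randn(n, 3)                      # synthetic 3D points
A = [np.random.randn(2, 3) for _ in range(m)]  # synthetic affine cameras
b = [np.random.randn(2) for _ in range(m)]

# Stack centered image points into the 2m x n measurement matrix D.
D = np.zeros((2 * m, n))
for i in range(m):
    x = X @ A[i].T + b[i]            # n x 2 projections in view i
    x_hat = x - x.mean(axis=0)       # centering removes b[i]
    D[2 * i:2 * i + 2, :] = x_hat.T

print(np.linalg.matrix_rank(D))      # 3, as predicted by D = M S
```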

Factorizing the measurement matrix Lazebnik Source: M. Hebert

Factorizing the measurement matrix Singular value decomposition of D: Lazebnik Source: M. Hebert

Factorizing the measurement matrix Singular value decomposition of D: Lazebnik Source: M. Hebert

Factorizing the measurement matrix Obtaining a factorization from SVD: Lazebnik Source: M. Hebert

Factorizing the measurement matrix Obtaining a factorization from SVD: This decomposition minimizes ||D - MS||^2 Lazebnik Source: M. Hebert
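The factorization itself is just a truncated SVD. A minimal sketch (splitting the singular values evenly between the two factors, one of the two conventions mentioned in the algorithm summary below):

```python
import numpy as np

def factorize_affine(D):
    """Best rank-3 factorization D ~ M S (Tomasi-Kanade style, still up to an affine ambiguity)."""
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    M = U[:, :3] * np.sqrt(w[:3])            # 2m x 3 motion (stacked affine cameras)
    S = np.sqrt(w[:3])[:, None] * Vt[:3, :]  # 3 x n shape (points)
    return M, S

# M, S = factorize_affine(D)
# np.allclose(D, M @ S)   # True for noise-free data; otherwise the best rank-3 fit
```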

Affine ambiguity The decomposition is not unique. We get the same D by using any 3x3 matrix C and applying the transformations M -> MC, S -> C^-1 S. That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example). Lazebnik Source: M. Hebert

Eliminating the affine ambiguity Orthographic: image axes are perpendicular and scale is 1: a_1 · a_2 = 0, |a_1|^2 = |a_2|^2 = 1. This translates into 3m equations in L = CC^T: A_i L A_i^T = Id, i = 1, …, m. Solve for L. Recover C from L by Cholesky decomposition: L = CC^T. Update M and S: M = MC, S = C^-1 S. Lazebnik Source: M. Hebert
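The metric constraints are linear in the six unknowns of the symmetric matrix L = CC^T, so they can be solved by least squares before the Cholesky step. A sketch of that upgrade (function name and structure are my own; it assumes the recovered L is positive definite):

```python
import numpy as np

def metric_upgrade(M, S):
    """Resolve the affine ambiguity for orthographic cameras: find C with A_i L A_i^T = I, L = C C^T."""
    m = M.shape[0] // 2

    def coeffs(a, b):
        # coefficients of (l11, l12, l13, l22, l23, l33) in the bilinear form a^T L b
        return np.array([a[0]*b[0], a[0]*b[1] + a[1]*b[0], a[0]*b[2] + a[2]*b[0],
                         a[1]*b[1], a[1]*b[2] + a[2]*b[1], a[2]*b[2]])

    G, rhs = [], []
    for i in range(m):
        a1, a2 = M[2*i], M[2*i + 1]
        G += [coeffs(a1, a1), coeffs(a2, a2), coeffs(a1, a2)]   # |a1|^2 = 1, |a2|^2 = 1, a1·a2 = 0
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(G), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    C = np.linalg.cholesky(L)                # assumes L is positive definite
    return M @ C, np.linalg.inv(C) @ S       # M <- MC, S <- C^-1 S
```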

Algorithm summary Given: m images and n features x_ij. For each image i, center the feature coordinates. Construct a 2m x n measurement matrix D: column j contains the projection of point j in all views; row i contains one coordinate of the projections of all n points in image i. Factorize D: compute the SVD D = U W V^T; create U_3 by taking the first 3 columns of U; create V_3 by taking the first 3 columns of V; create W_3 by taking the upper left 3x3 block of W. Create the motion and shape matrices: M = U_3 W_3^(1/2) and S = W_3^(1/2) V_3^T (or M = U_3 and S = W_3 V_3^T). Eliminate the affine ambiguity. Lazebnik Source: M. Hebert

Reconstruction results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Dealing with missing data So far, we have assumed that all points are visible in all views. In reality, the measurement matrix typically looks something like this: [Figure: sparse measurement matrix; rows = cameras, columns = points, with many missing entries.] Lazebnik

Dealing with missing data Possible solution: decompose matrix into dense subblocks, factorize each sub-block, and fuse the results Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement (1) Perform factorization on a dense sub-block (2) Solve for a new 3D point visible by at least two known cameras (linear least squares) (3) Solve for a new camera that sees at least three known 3D points (linear least squares) F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.
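Steps (2) and (3) are small linear least-squares problems. For step (2), each known affine camera that sees the new point contributes two linear equations A_i X = x_ij - b_i. A sketch (the function name is mine; step (3), solving for a new camera from known 3D points, is set up analogously with the roles of cameras and points swapped):

```python
import numpy as np

def triangulate_affine(A_list, b_list, obs):
    """Linear least-squares 3D point from >= 2 known affine views.
    A_list, b_list: known 2x3 matrices / 2-vectors; obs: observed 2D points, one per view."""
    rows, rhs = [], []
    for A, b, x in zip(A_list, b_list, obs):
        rows.append(A)                 # two equations per view: A X = x - b
        rhs.append(np.asarray(x) - b)
    X, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return X                           # estimated 3D point
```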

Further factorization work Factorization with uncertainty (Irani & Anandan, IJCV'02). Factorization for indep. moving objects (now) (Costeira and Kanade '94). Factorization for articulated objects (now) (Yan and Pollefeys '05). Factorization for dynamic objects (now) (Bregler et al. 2000, Brand 2001). Perspective factorization (next week) (Sturm & Triggs 1996, …). Factorization with outliers and missing pts. (Jacobs '97 (affine), Martinek & Pajdla '01, Aanaes '02 (perspective)). Pollefeys

Pollefeys Structure from motion of multiple moving objects

Pollefeys Structure from motion of multiple moving objects

Shape interaction matrix The shape interaction matrix for articulated objects loses its block-diagonal structure. Costeira and Kanade's approach is not usable for articulated bodies (it assumes independent motions). Pollefeys

Articulated motion subspaces Motion subspaces for articulated bodies intersect (Yan and Pollefeys, CVPR'05; Tresadern and Reid, CVPR'05): Joint: 1D intersection (rank = 8-1, joint = origin). Hinge: 2D intersection (rank = 8-2, hinge = z-axis). Exploit the rank constraint to obtain a better estimate. Also for non-rigid parts (Yan & Pollefeys '06).

Results [Figures: toy truck and student sequences; segmentation and subspace intersection.] Pollefeys

Articulated shape and motion factorization Automated kinematic chain building for articulated & non-rigid obj. Estimate principal angles between subspaces Compute affinities based on principal angles Compute minimum spanning tree (Yan and Pollefeys, 2006?) Pollefeys

Structure from motion of deforming objects (Bregler et al 00; Brand 01) Extend factorization approaches to deal with dynamic shapes Pollefeys

Representing dynamic shapes S(t) = Σ_k c_k(t) S_k (fig. M. Brand): represent the dynamic shape as a varying linear combination of basis shapes. Pollefeys
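A toy version of this representation, with invented basis shapes and coefficients:

```python
import numpy as np

np.random.seed(2)
K, n = 3, 50                               # number of basis shapes, number of points
basis = np.random.randn(K, 3, n)           # K basis shapes S_k, each 3 x n (illustrative values)

def shape_at(c):
    """Dynamic shape S(t) = sum_k c_k(t) S_k for one time instant with coefficients c."""
    return np.tensordot(c, basis, axes=1)  # (3, n)

S_t = shape_at(np.array([1.0, 0.3, -0.2]))
# After centering, the measurement matrix of such a sequence has rank at most 3K instead of 3,
# which is what the dynamic factorization methods (Bregler et al. 00, Brand 01) exploit.
```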

Results (Bregler et al 00) Pollefeys

Dynamic SfM factorization (Brand 01) [Figure: constraints to be satisfied for M, and constraints used to compute J.] Hard! (Different methods are possible, not so simple and also not optimal.) Pollefeys

Non-rigid 3D subspace flow The same is also possible using optical flow instead of features, and also takes uncertainty into account (Brand 01). Pollefeys

Results (Brand 01) Pollefeys

Results (Bregler et al 01) Pollefeys

Projective structure from motion Given: m images of n fixed 3D points, z_ij x_ij = P_i X_j, i = 1, …, m, j = 1, …, n. Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn correspondences x_ij. [Figure: a 3D point X_j projects to x_1j, x_2j, x_3j in cameras P_1, P_2, P_3.] Lazebnik

Projective structure from motion Given: m images of n fixed 3D points, z_ij x_ij = P_i X_j, i = 1, …, m, j = 1, …, n. Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn correspondences x_ij. With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: X -> QX, P -> PQ^-1. We can solve for structure and motion when 2mn >= 11m + 3n - 15. For two cameras, at least 7 points are needed. Lazebnik

Projective SFM: two-camera case Compute the fundamental matrix F between the two views. First camera matrix: [I | 0]. Second camera matrix: [A | b]. Then z x = [I | 0] X and z' x' = [A | b] X, so z' x' = A [I | 0] X + b = z A x + b. Taking the cross product with b: z' x' × b = z A x × b, and (z' x' × b) · x' = (z A x × b) · x' gives x'^T [b_x] A x = 0, i.e. F = [b_x] A. Here b is the epipole (F^T b = 0) and A = [b_x] F. Lazebnik F&P sec. 13.3.1
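Given an estimated F, this camera pair can be written down directly. A sketch, assuming F has already been computed (e.g. with the eight-point algorithm); the helper names are mine:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def cameras_from_F(F):
    """A canonical projective camera pair consistent with the fundamental matrix F."""
    # epipole b in the second image: F^T b = 0, i.e. the null vector of F^T
    _, _, Vt = np.linalg.svd(F.T)
    b = Vt[-1]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])   # [I | 0]
    P2 = np.hstack([skew(b) @ F, b[:, None]])       # [[b]_x F | b]
    return P1, P2
```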

Projective factorization D = [z_11 x_11 z_12 x_12 … z_1n x_1n; z_21 x_21 z_22 x_22 … z_2n x_2n; …; z_m1 x_m1 z_m2 x_m2 … z_mn x_mn] = [P_1; P_2; …; P_m] [X_1 X_2 … X_n] = MS, with cameras (3m x 4) and points (4 x n). D = MS has rank 4. If we knew the depths z, we could factorize D to estimate M and S. If we knew M and S, we could solve for z. Solution: iterative approach (alternate between the two steps above). Lazebnik
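The alternation can be sketched as follows: with the depths fixed, a rank-4 SVD of the rescaled measurements gives M and S; with M and S fixed, each depth z_ij is re-estimated so that z_ij x_ij best matches P_i X_j. A rough illustration (no row/column balancing, which practical methods such as Sturm & Triggs add):

```python
import numpy as np

def projective_factorization(x, iters=10):
    """x: (m, n, 3) homogeneous image points. Returns cameras M (3m x 4) and points S (4 x n)."""
    m, n, _ = x.shape
    z = np.ones((m, n))                                      # initial projective depths
    for _ in range(iters):
        # Build D from the current depths and take its best rank-4 factorization.
        D = (z[:, :, None] * x).transpose(0, 2, 1).reshape(3 * m, n)
        U, w, Vt = np.linalg.svd(D, full_matrices=False)
        M = U[:, :4] * w[:4]                                 # 3m x 4
        S = Vt[:4, :]                                        # 4 x n
        # Re-estimate each depth so that z_ij * x_ij is closest to P_i X_j.
        proj = (M @ S).reshape(m, 3, n).transpose(0, 2, 1)   # (m, n, 3)
        z = np.sum(proj * x, axis=2) / np.sum(x * x, axis=2)
    return M, S
```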

Sequential structure from motion Initialize motion from two images using the fundamental matrix. Initialize structure. For each additional view: determine the projection matrix of the new camera using all the known 3D points that are visible in its image (calibration). Lazebnik

Sequential structure from motion Initialize motion from two images using the fundamental matrix. Initialize structure. For each additional view: determine the projection matrix of the new camera using all the known 3D points that are visible in its image (calibration); refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera (triangulation). Lazebnik

Sequential structure from motion Initialize motion from two images using the fundamental matrix. Initialize structure. For each additional view: determine the projection matrix of the new camera using all the known 3D points that are visible in its image (calibration); refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera (triangulation). Refine structure and motion: bundle adjustment. Lazebnik
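The "determine the projection matrix of the new camera" step is camera resectioning: every known 3D point visible in the new image gives two linear equations on the 12 entries of P via x × (P X) = 0. A bare-bones DLT sketch (no coordinate normalization or RANSAC, which a real pipeline would add):

```python
import numpy as np

def resection_dlt(X, x):
    """Linear camera resectioning from n >= 6 correspondences.
    X: (n, 4) homogeneous 3D points, x: (n, 3) homogeneous image points."""
    rows = []
    for Xi, xi in zip(X, x):
        u, v, w = xi
        rows.append(np.hstack([np.zeros(4), -w * Xi,  v * Xi]))
        rows.append(np.hstack([ w * Xi, np.zeros(4), -u * Xi]))
    _, _, Vt = np.linalg.svd(np.vstack(rows))
    return Vt[-1].reshape(3, 4)          # projection matrix, defined up to scale
```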

Bundle adjustment Non-linear method for refining structure and motion by minimizing the reprojection error: E(P, X) = Σ_{i=1}^{m} Σ_{j=1}^{n} D(x_ij, P_i X_j)^2. [Figure: point X_j reprojected into cameras P_1, P_2, P_3 and compared to the measurements x_1j, x_2j, x_3j.] Lazebnik
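Bundle adjustment is usually implemented as sparse nonlinear least squares over the stacked reprojection residuals. A dense, minimal sketch with scipy.optimize.least_squares; it parameterizes each camera directly by its 12 matrix entries (ignoring gauge freedom) and is meant only to make the cost function E(P, X) concrete:

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, m, n, observations):
    """observations: list of (i, j, u, v). params = all P_i entries followed by all X_j."""
    P = params[:12 * m].reshape(m, 3, 4)
    X = params[12 * m:].reshape(n, 3)
    res = []
    for i, j, u, v in observations:
        p = P[i] @ np.append(X[j], 1.0)
        res.extend([p[0] / p[2] - u, p[1] / p[2] - v])   # components of D(x_ij, P_i X_j)
    return np.array(res)

def bundle_adjust(P0, X0, observations):
    """Refine initial cameras P0 (m, 3, 4) and points X0 (n, 3) by minimizing reprojection error."""
    m, n = len(P0), len(X0)
    x0 = np.concatenate([np.ravel(P0), np.ravel(X0)])
    sol = least_squares(reprojection_residuals, x0, args=(m, n, observations), method='lm')
    return sol.x[:12 * m].reshape(m, 3, 4), sol.x[12 * m:].reshape(n, 3)
```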

Self-calibration Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images. For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images: compute an initial projective reconstruction and find the 3D projective transformation matrix Q such that all camera matrices are in the form P_i = K [R_i | t_i]. Can use constraints on the form of the calibration matrix: zero skew. Lazebnik

Summary: Structure from motion Ambiguity Affine structure from motion: factorization Dealing with missing data Projective structure from motion: two views Projective structure from motion: iterative factorization Bundle adjustment Self-calibration Lazebnik

(Included presentation available from http://phototour.cs.washington.edu/) Photo Tourism: Exploring Photo Collections in 3D. Noah Snavely, Steven M. Seitz (University of Washington), Richard Szeliski (Microsoft Research), 2006. Noah Snavely

http://grail.cs.washington.edu/rome/

Rome in a day: Coliseum video http://grail.cs.washington.edu/rome/

Rome in a day: Trevi video http://grail.cs.washington.edu/rome/

Rome in a day: St. Peter's video http://grail.cs.washington.edu/rome/

Slide Credits Svetlana Lazebnik, Marc Pollefeys, Noah Snavely & co-authors
