Image correspondences and structure from motion


Image correspondences and structure from motion http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2017, Lecture 20

Course announcements Homework 5 posted. - It is a merged version of the original HW5 and the planned HW6. - You will need cameras for the second part of that one as well, so keep the ones you picked up for HW4 (or use your phone cameras). - Start in the first week or else you won't finish it. Homework 4 has been graded. - Mean: 76.68. - Median: 80. - Tonemapped (i.e., LDR) images should be stored as .png, not .hdr. Next guest lecture on Wednesday: Suren Jayasuriya. - Will talk about computational sensors.

Overview of today's lecture The image correspondence pipeline. Describing interest points. Matching interest points and RANSAC. Structure from motion.

Slide credits Most of these slides were adapted from: Kris Kitani (15-463, Fall 2016). Noah Snavely (Cornell).

The image correspondence pipeline

Create point correspondences Points p1-p4 in the original image are matched to points p1-p4 in the target image. Can we automate this step?

The image correspondence pipeline 1. Feature point detection Detect corners using the Harris corner detector. 2. Feature point description 3. Feature matching and homography estimation

How do we match features robustly?

How do we match features robustly? We need a way to describe regions around each feature. How do you account for changes in viewpoint or scale? Tradeoff between discriminative power and invariance to appearance changes.

Multi-scale Oriented Patches (MOPS) Multi-Image Matching using Multi-Scale Oriented Patches. M. Brown, R. Szeliski and S. Winder. International Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 510-517.

Multi-scale Oriented Patches (MOPS) Multi-Image Matching using Multi-Scale Oriented Patches. M. Brown, R. Szeliski and S. Winder. International Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 510-517. Given a feature: Get a 40 x 40 image patch, subsample every 5th pixel (what's the purpose of this step?) Subtract the mean, divide by the standard deviation (what's the purpose of this step?) Haar wavelet transform (what's the purpose of this step?)

Orientation normalization Use the dominant image gradient direction to normalize the orientation of the patch. This is how you compute the theta (orientation) of the feature.

Multi-scale Oriented Patches (MOPS) Multi-Image Matching using Multi-Scale Oriented Patches. M. Brown, R. Szeliski and S. Winder. International Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 510-517. Given a feature: Get a 40 x 40 image patch, subsample every 5th pixel (low-frequency filtering, absorbs localization errors) Subtract the mean, divide by the standard deviation (removes bias and gain) Haar wavelet transform (low-frequency projection)
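The patch steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes a grayscale image whose 40 x 40 window around the feature lies inside the borders, and it omits the pyramid-level blurring, orientation normalization, and Haar wavelet projection of the real MOPS descriptor; the function name is hypothetical.

```python
import numpy as np

def mops_descriptor(image, x, y):
    """Sketch of the MOPS descriptor steps for a feature at (x, y)."""
    # 1. Take a 40x40 patch centered on the feature
    patch = image[y - 20:y + 20, x - 20:x + 20]
    # 2. Subsample every 5th pixel -> an 8x8 grid
    #    (low-frequency filtering, absorbs localization errors)
    patch = patch[::5, ::5]
    # 3. Subtract the mean, divide by the standard deviation
    #    (removes bias and gain)
    patch = (patch - patch.mean()) / (patch.std() + 1e-8)
    return patch.flatten()
```

Note that subsampling a raw patch like this would alias; the actual descriptor samples from an appropriately blurred pyramid level.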

Haar Wavelets (actually, Haar-like features) Use responses of a bank of filters as a descriptor

Computing Haar wavelet responses 1. Haar wavelet responses can be computed by filtering the image patch with Haar wavelet filters (box filters made of +1 and -1 regions). 2. Haar wavelet responses can be computed efficiently (in constant time per response) with integral images.

Computing Haar wavelet responses Given an image patch, compute the responses of a filter bank (e.g., 20 Haar wavelet filters) to obtain a vector of filter responses. Responses are usually computed at a specified location, e.g., as a face patch descriptor.
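To illustrate the constant-time claim, here is a minimal sketch of an integral image and one two-rectangle Haar-like response built on it (function names are hypothetical): each response costs a handful of table lookups regardless of the filter's size.

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero row/column prepended so that
    # any box sum becomes exactly four lookups.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] in O(1), independent of box size.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_horizontal(ii, r0, c0, h, w):
    # Two-rectangle Haar-like response: right half minus left half
    # (the +1 region minus the -1 region of the box filter).
    left = box_sum(ii, r0, c0, r0 + h, c0 + w // 2)
    right = box_sum(ii, r0, c0 + w // 2, r0 + h, c0 + w)
    return right - left
```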

The image correspondence pipeline 1. Feature point detection Detect corners using the Harris corner detector. 2. Feature point description Describe features using the Multi-scale oriented patch descriptor. 3. Feature matching and homography estimation

Up to now, we've assumed correct correspondences

What if there are mismatches? outlier How would you find just the inliers?

RANSAC: RANdom SAmple Consensus [Fischler & Bolles, 1981]

Fitting lines (with outliers) Algorithm: 1. Randomly sample the number of points required to fit the model. 2. Solve for the model parameters using the samples. 3. Score by the fraction of inliers within a preset threshold of the model. Repeat 1-3 until the best model is found with high confidence.

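The sample-solve-score loop above can be sketched for line fitting as follows. This is a minimal sketch: the function name, threshold, and iteration count are illustrative, and a full implementation would finish by refitting the line to all inliers of the best model.

```python
import numpy as np

def ransac_line(points, n_iters=100, threshold=0.1, rng=None):
    """Minimal RANSAC line fit. points: (N, 2) array.
    Returns ((a, b, c), n_inliers) for the line ax + by + c = 0
    with the most inliers."""
    rng = np.random.default_rng(rng)
    best_line, best_inliers = None, -1
    for _ in range(n_iters):
        # 1. Sample the minimal set: 2 points define a line
        p, q = points[rng.choice(len(points), 2, replace=False)]
        # 2. Solve for model parameters: normal of the line through p, q
        a, b = p[1] - q[1], q[0] - p[0]
        norm = np.hypot(a, b)
        if norm == 0:
            continue  # degenerate sample (coincident points)
        a, b = a / norm, b / norm
        c = -(a * p[0] + b * p[1])
        # 3. Score: count points within the distance threshold
        dist = np.abs(points @ np.array([a, b]) + c)
        n_inliers = int((dist < threshold).sum())
        if n_inliers > best_inliers:
            best_line, best_inliers = (a, b, c), n_inliers
    return best_line, best_inliers
```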

How to choose parameters? Number of samples N: choose N so that, with probability p, at least one random sample is free from outliers (e.g., p = 0.99). With outlier ratio e and sample size s, this gives N = log(1 - p) / log(1 - (1 - e)^s). Number of sampled points s: the minimum number needed to fit the model. Distance threshold δ: choose δ so that a good point with noise is likely (e.g., with probability 0.95) to fall within the threshold. For zero-mean Gaussian noise with standard deviation σ: δ² = 3.84 σ².
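Choosing N as stated amounts to requiring (1 - (1 - e)^s)^N ≤ 1 - p, i.e., the probability that every one of the N samples contains an outlier is at most 1 - p. A direct sketch (the function name is hypothetical):

```python
import math

def ransac_num_samples(p, outlier_ratio, s):
    """Number of RANSAC iterations N such that, with probability p,
    at least one of the N random samples of s points is outlier-free."""
    w = (1.0 - outlier_ratio) ** s      # P(one sample is all inliers)
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w))
```

For example, with p = 0.99 and half the matches being outliers, line fitting (s = 2) needs 17 iterations, while homography estimation (s = 4) needs 72.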

Matched points

Least-squares fit finds the average transform

RANSAC: Use one correspondence, find inliers


Estimating homography using RANSAC RANSAC loop: 1. Get four point correspondences (randomly). 2. Compute homography H (DLT). 3. Count inliers. 4. Keep H if it has the largest number of inliers. Why four point correspondences? (A homography has 8 degrees of freedom and each correspondence gives 2 equations, so 4 correspondences are needed.) Finally, recompute H using all inliers.
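Step 2, the DLT, can be sketched as follows. This is a minimal sketch without the usual coordinate normalization; the function name is hypothetical. Each correspondence contributes two rows to a homogeneous system A h = 0, and h is the singular vector of A with the smallest singular value.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: homography H mapping src -> dst.
    src, dst: (N, 2) arrays with N >= 4 correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Two equations per correspondence:
        # u (h31 x + h32 y + h33) = h11 x + h12 y + h13, and same for v
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Smallest-singular-value right singular vector solves A h = 0
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```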

The image correspondence pipeline 1. Feature point detection Detect corners using the Harris corner detector. 2. Feature point description Describe features using the Multi-scale oriented patch descriptor. 3. Feature matching and homography estimation Do both simultaneously using RANSAC.

Structure from motion

Structure from motion Given many images, how can we a) figure out where they were all taken from? b) build a 3D model of the scene? This is (roughly) the structure from motion problem

Structure from motion Input: images with points in correspondence p_{i,j} = (u_{i,j}, v_{i,j})

Structure from motion Input: images with points in correspondence p_{i,j} = (u_{i,j}, v_{i,j}). Output: structure (a 3D location x_i for each point p_i) and motion (camera parameters R_j, t_j, and possibly K_j). Objective function: minimize the reprojection error. (Reconstruction shown from the side and from the top.)

Camera calibration & triangulation Suppose we know the 3D points and have matches between these points and an image. How can we compute the camera parameters? (Calibration.) Suppose we know the camera parameters of several cameras, each of which observes a point. How can we compute the 3D location of that point? (Triangulation.)
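The second problem, triangulation, has a simple linear solution. A minimal DLT sketch for two views (the function name is hypothetical; P1 and P2 are the 3x4 camera projection matrices): each view contributes two equations of the form x (P3 · X) = P1 · X, and the homogeneous 3D point is the null vector of the stacked system.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two 3x4
    camera matrices P1, P2 and its 2D observations x1, x2."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # two equations from view 1
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],   # two equations from view 2
        x2[1] * P2[2] - P2[1],
    ])
    # Null vector of A = homogeneous 3D point
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize
```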

Structure from motion SfM solves both of these problems at once A kind of chicken-and-egg problem (but solvable)

Standard way to view photos

Photo Tourism

Input: Point correspondences

Feature detection Detect features using SIFT [Lowe, IJCV 2004]

Feature description Describe features using SIFT [Lowe, IJCV 2004]

Feature matching Match features between each pair of images

Feature matching Refine matching using RANSAC to estimate the fundamental matrix between each pair

Correspondence estimation Link up pairwise matches to form connected components of matches across several images Image 1 Image 2 Image 3 Image 4

Image connectivity graph (graph layout produced using the Graphviz toolkit: http://www.graphviz.org/)

Structure from motion Cameras (R_1, t_1), (R_2, t_2), (R_3, t_3) observe 3D points X_1, ..., X_7 at image locations p_{i,j} (e.g., p_{1,1}, p_{1,2}, p_{1,3}). Minimize g(R, T, X) using non-linear least squares.

The pinhole camera (diagram: image plane, camera center, focal length f, real-world object)

The (rearranged) pinhole camera (diagram: image plane at focal length f in front of the camera center, real-world object)

The (rearranged) pinhole camera (diagram: image plane, camera center, principal axis)

Camera projection matrix P = K [R | t], built from the 3x3 intrinsics matrix K, the 3x3 3D rotation R, and the 3x1 3D translation t (equivalently, K [I | 0] times the 4x4 rigid transform, where I is the 3x3 identity). Another way to write the mapping: x ~ K (R X + t).
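The mapping x ~ K (R X + t) is just two matrix multiplies and a perspective divide. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D points X (N, 3) into pixels, given intrinsics K (3x3),
    rotation R (3x3), and translation t (3,): x ~ K (R X + t)."""
    x_cam = X @ R.T + t                   # world -> camera coordinates
    x_img = x_cam @ K.T                   # apply intrinsics
    return x_img[:, :2] / x_img[:, 2:3]   # perspective divide
```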

Structure from motion Minimize the sum of squared reprojection errors: g(R, T, X) = Σ_i Σ_j w_{i,j} || p_{i,j} - P(x_i; R_j, t_j) ||², where P(x_i; R_j, t_j) is the predicted image location of point i in image j, p_{i,j} is the observed image location, and w_{i,j} is an indicator variable: is point i visible in image j? Minimizing this function is called bundle adjustment. It is optimized using non-linear least squares, e.g., Levenberg-Marquardt.
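The bundle-adjustment objective can be evaluated directly. A minimal sketch of the error function only, not the Levenberg-Marquardt optimizer; the function name and the (K, R, t) camera tuple are illustrative, and the visibility indicator w_{i,j} is represented by simply omitting unobserved (i, j) pairs:

```python
import numpy as np

def reprojection_error(points3d, cameras, observations):
    """Sum of squared reprojection errors over all observed (i, j).
    points3d: list of (3,) arrays; cameras: list of (K, R, t) tuples;
    observations: dict mapping (i, j) -> observed 2D location."""
    total = 0.0
    for (i, j), p_obs in observations.items():
        K, R, t = cameras[j]
        x_cam = R @ points3d[i] + t       # world -> camera j
        x_img = K @ x_cam
        p_pred = x_img[:2] / x_img[2]     # predicted image location
        total += np.sum((p_obs - p_pred) ** 2)
    return total
```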

Problem size What are the variables? How many variables per camera? How many variables per point? Trevi Fountain collection: 466 input photos and more than 100,000 3D points, i.e., a very large optimization problem.

Is SfM always uniquely solvable?

Is SfM always uniquely solvable? No. (For example, the overall scale of the scene cannot be recovered, and ambiguities such as Necker reversal can flip a reconstruction.)

Incremental structure from motion

Final reconstruction

More examples

Even larger scale SfM City-scale structure from motion Building Rome in a day http://grail.cs.washington.edu/projects/rome/

SfM failure cases: Necker reversal

Structure from motion failure cases: repetitive structures

SfM applications 3D modeling. Surveying. Robot navigation and map-making. Visual effects ("match moving"). https://www.youtube.com/watch?v=rdywp70p_ky

Applications Photosynth

Applications Hyperlapse https://www.youtube.com/watch?v=sopwhaqnrsy

References Basic reading: - Szeliski textbook, Sections 4, 6.1.4, 7. Additional reading: - Hartley and Zisserman, Multiple View Geometry, Cambridge University Press 2003. As usual when it comes to geometry and vision, this book is the best reference; Sections 10, 11, and 14 in particular discuss everything about structure from motion. - Snavely et al., Photo Tourism: Exploring photo collections in 3D, SIGGRAPH 2006. - Snavely et al., Finding Paths through the World's Photos, SIGGRAPH 2008. - Snavely et al., Modeling the world from Internet photo collections, IJCV 2008. These form the series of papers developing Photo Tourism and large-scale structure from motion. - Agarwal et al., Building Rome in a Day, ICCV 2009. A follow-up on even larger-scale structure from motion (using city-scale photo collections).