Computer Vision CSCI-GA.2272-001 Assignment 1

September 22, 2017

Introduction

This assignment explores various methods for aligning images and extracting features. There are four parts to the assignment:

1. Filtering: simple exercises involving image convolution. [20 points]
2. Image alignment using RANSAC: solve for an affine transformation between a pair of images using the RANSAC fitting algorithm. [30 points]
3. Estimating camera parameters: using a set of 3D world points and their 2D image locations, estimate the projection matrix P of a camera. [35 points]
4. Structure from motion: infer the 3D structure of an object, given a set of images of the object. [35 points]

Please also download the assignment1.zip file from the course webpage, as it contains images and code needed for the assignment.

Requirements

You may complete this assignment in the language of your choice, but Python or Matlab are strongly recommended since they are high-level languages with much of the required functionality built in. In the case of Matlab, some tutorials are posted on the course web page. (Unfortunately, Matlab is not free software, but a relatively inexpensive student version is available online or at the NYU bookstore.)

This assignment is due on Thursday October 12th at 11.59pm. Please note that the late policy is as follows: (a) assignments that are late by less than 24 hrs will suffer a 10% reduction; (b) those between 24 and 72 hrs late will suffer a 25% reduction; and (c) those more than 72 hrs late will suffer a 50% reduction. You are strongly encouraged to start the assignment early, and don't be afraid to ask for help from either the TA or myself.

You are allowed to collaborate with other students in terms of discussing ideas and possible solutions. However, you must code up the solution yourself, i.e. you must write your own code. Copying your friend's code and just changing all the names of the variables is not allowed! You are not allowed to use solutions from similar assignments in courses from other institutions, or those found elsewhere on the web.

Your solutions should be emailed to me (fergus@cs.nyu.edu) and the grader Utku Evci (ue225@nyu.edu) in a single zip file with the filename lastname_firstname_a1.zip. This zip file should contain: (i) a PDF file lastname_firstname_a1.pdf with your report, showing output images for each part of the assignment and explanatory text where appropriate; (ii) the source code used to generate the images (with code comments), along with a master script (named master_a1.m) that runs the code for each part of the assignment in turn.

1 Image Filtering

This section explores 2D image convolution, a fundamental operation in computer vision.

(a) Consider the following image X and filter kernel F:

X = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, \qquad F = \begin{bmatrix} w & x \\ y & z \end{bmatrix}   (1)

1. Write down the output of X convolved with F, using valid boundary conditions.
2. Write down the output of X convolved with F, using same boundary conditions (and zero padding).

(b) Give the output dimensions for a 2D image of size (h, w) convolved with a filter of size (i, j), assuming valid boundary conditions.
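For concreteness, here is a minimal sketch (assuming NumPy and SciPy are available; the numeric values stand in for the symbolic entries above) showing how the two boundary conditions behave:

    import numpy as np
    from scipy.signal import convolve2d

    # Arbitrary example values in place of the symbolic X and F above.
    X = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
    F = np.array([[1., 0.],
                  [0., -1.]])

    # 'valid': only positions where F fits entirely inside X,
    # giving a (3-2+1) x (3-2+1) = 2x2 output.
    print(convolve2d(X, F, mode='valid'))

    # 'same': zero padding so the output matches X's size, 3x3.
    print(convolve2d(X, F, mode='same'))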

(c) Write code to perform approximate 2D Gaussian blurring of an image, using only the filter kernel [1 2 1]/4 (N.B.: you are allowed to transpose the filter). Do not use any functions that precompute a Gaussian kernel. Your code should have two inputs: (i) a 2D grayscale image of arbitrary size; (ii) an integer value specifying the width of the kernel (in pixels), which must be odd (i.e. even widths are not permitted). The output should be a 2D (blurred) image of the correct size, assuming valid boundary conditions. You are allowed to use built-in convolution operations, once you have constructed the approximate Gaussian kernel. Please comment your code to describe how it works.
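One possible shape for such a function, as a sketch only (the function name and the use of SciPy are my choices, not part of the assignment):

    import numpy as np
    from scipy.signal import convolve, convolve2d

    def gaussian_blur(image, width):
        """Approximate Gaussian blur built only from the kernel [1 2 1]/4.

        image: 2D grayscale array; width: odd kernel width in pixels (>= 3).
        Uses valid boundary conditions, so the output is smaller than the input.
        """
        assert width % 2 == 1 and width >= 3, "width must be odd and >= 3"
        base = np.array([1., 2., 1.]) / 4.
        kernel = base
        # Repeated self-convolution widens the kernel by 2 each time and
        # converges towards a true Gaussian (central limit theorem).
        for _ in range((width - 3) // 2):
            kernel = convolve(kernel, base)
        # Separable filtering: blur rows with the 1xW kernel, then columns
        # with its transpose.
        blurred = convolve2d(image, kernel[np.newaxis, :], mode='valid')
        blurred = convolve2d(blurred, kernel[:, np.newaxis], mode='valid')
        return blurred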

2 Image Alignment

In this part of the assignment you will write a function that takes two images as input and computes the affine transformation between them. The overall scheme, as outlined in lectures 5 and 6, is as follows:

- Find local image regions in each image.
- Characterize the local appearance of the regions.
- Get a set of putative matches between region descriptors in each image.
- Perform RANSAC to discover the best transformation between the images.

The first two stages can be performed using David Lowe's SIFT feature detector and descriptor representation. A Matlab implementation of this can be found in the VLFeat package (http://www.vlfeat.org/overview/sift.html). A Python version can be found in the OpenCV-Python environment (http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html).

The two images you should match are contained in the assignment1.zip file: scene.pgm and book.pgm, henceforth called images 1 and 2 respectively. You should first run the SIFT detector over both images to produce a set of regions, each characterized by a 128-d descriptor vector. Display these regions on each picture to ensure that a satisfactory number of them have been extracted. Please include the images in your report.

The next step is to obtain a set of putative matches T. This should be done as follows: for each descriptor in image 1, compute the closest neighbor amongst the descriptors from image 2 using Euclidean distance. Spurious matches can then be removed by computing the ratio of distances between the closest and second-closest neighbor, and rejecting any matches whose ratio is above a certain threshold. To test the functioning of RANSAC we want to have some erroneous matches in our set, so this threshold should be set to a fairly slack value of 0.9. To check that your code is functioning correctly, plot the two images side-by-side with lines showing the potential matches (include this in your report).

The final stage, running RANSAC, should be performed as follows (a sketch of the full matching and fitting pipeline appears after these steps):

- Repeat N times (where N is 100):
  - Pick P matches at random from the total set of matches T. Since we are solving for an affine transformation, which has 6 degrees of freedom, we only need to select P = 3 matches.
  - Construct a matrix A and vector b using the 3 pairs of points, as described in lecture 6. Solve for the unknown transformation parameters q. In Matlab you can use the \ command; in Python you can use numpy.linalg.solve.
  - Using the transformation parameters, transform the locations of all T points in image 1. If the transformation is correct, they should lie close to their pairs in image 2.
  - Count the number of inliers, an inlier being defined as a transformed point from image 1 that lies within a radius of 10 pixels of its pair in image 2.
  - If this count exceeds the best total so far, save the transformation parameters and the set of inliers.
- End repeat.

Perform a final refit using the set of inliers belonging to the best transformation you found. This refit should use all the inliers, not just 3 points chosen at random.
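A compressed sketch of the matching-plus-RANSAC loop, assuming OpenCV-Python. The thresholds (0.9 ratio, 10-pixel radius, N = 100) mirror those given above; the surrounding structure and variable names are illustrative, not prescriptive:

    import numpy as np
    import cv2

    im1 = cv2.imread('scene.pgm', cv2.IMREAD_GRAYSCALE)
    im2 = cv2.imread('book.pgm', cv2.IMREAD_GRAYSCALE)

    # 1. Detect SIFT regions and 128-d descriptors in both images.
    sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on older builds
    kp1, des1 = sift.detectAndCompute(im1, None)
    kp2, des2 = sift.detectAndCompute(im2, None)

    # 2. Putative matches T via the ratio test with the slack 0.9 threshold.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des1, des2, k=2)
    matches = [m for m, n in pairs if m.distance < 0.9 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3. RANSAC: fit an affine transform to 3 random matches, count inliers.
    best_inliers, best_q = None, None
    for _ in range(100):                          # N = 100 iterations
        idx = np.random.choice(len(matches), 3, replace=False)
        A, b = [], []
        for (x, y), (u, v) in zip(pts1[idx], pts2[idx]):
            A.append([x, y, 0, 0, 1, 0]); b.append(u)  # u = q1*x + q2*y + q5
            A.append([0, 0, x, y, 0, 1]); b.append(v)  # v = q3*x + q4*y + q6
        q = np.linalg.solve(np.array(A), np.array(b))
        # Transform every point in image 1 and measure distance to its pair.
        M = np.array([[q[0], q[1]], [q[2], q[3]]])
        t = np.array([q[4], q[5]])
        proj = pts1 @ M.T + t
        inliers = np.linalg.norm(proj - pts2, axis=1) < 10.0
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_q = inliers, q
    # A final least-squares refit on all inliers would follow here.

The parameter ordering q(1)..q(6) is chosen to match the homography matrix H constructed in the next step.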

Finally, transform image 1 using this final set of transformation parameters q. In Matlab this can be done by first forming a homography matrix

H = [ q(1) q(2) q(5) ; q(3) q(4) q(6) ; 0 0 1 ];

and then using the imtransform and maketform functions as follows:

transformed_image = imtransform(im1, maketform('affine', H'));

In Python you can use cv2.warpAffine from the OpenCV-Python environment. If you display this image, you should find that the pose of the book in the scene corresponds to its pose in image 2. Your report should include: (i) the transformed image 1 and (ii) the values in the matrix H.
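The Python equivalent, as a sketch (assuming im1, im2, and the refitted best_q parameter vector from the RANSAC sketch above):

    import numpy as np
    import cv2

    q = best_q  # the six affine parameters from the final refit
    # cv2.warpAffine takes the top two rows of H, i.e. a 2x3 matrix [M | t].
    M = np.float32([[q[0], q[1], q[4]],
                    [q[2], q[3], q[5]]])
    h, w = im2.shape[:2]
    transformed_image = cv2.warpAffine(im1, M, (w, h))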

3 Estimating the Camera Parameters

Here the goal is to compute the 3x4 camera matrix P describing a pinhole camera, given the coordinates of 10 world points and their corresponding image projections, and then to decompose P into the intrinsic and extrinsic parameters. You should write a simple Matlab or Python script that works through the stages below, printing out the important terms.

Download from the course webpage the two ASCII files world.txt and image.txt. The first file contains the (X, Y, Z) values of 10 world points. The second file contains the (x, y) projections of those 10 points.

(a) Find the 3x4 matrix P that projects the world points X to the 10 image points x. This should be done in the following steps:

Since P is a homogeneous matrix, the world and image points (which are 3- and 2-dimensional respectively) need to be converted into homogeneous points by concatenating a 1 to each of them (thus becoming 4- and 3-dimensional respectively). We now note that x \times (PX) = 0, irrespective of the scale ambiguity. This allows us to set up a series of linear equations of the form

\begin{bmatrix} \mathbf{0}^T & -w_i \mathbf{X}_i^T & y_i \mathbf{X}_i^T \\ w_i \mathbf{X}_i^T & \mathbf{0}^T & -x_i \mathbf{X}_i^T \\ -y_i \mathbf{X}_i^T & x_i \mathbf{X}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \mathbf{0}   (2)

for each correspondence x_i \leftrightarrow X_i, where x_i = (x_i, y_i, w_i)^T, w_i being the homogeneous coordinate, and P^j is the j-th row of P. Since the 3rd row of this system is a linear combination of the first two, we need only consider the first two rows for each correspondence i. Thus you should form a 20 by 12 matrix A, each of the 10 correspondences contributing two rows. This yields Ap = 0, p being the vector containing the entries of matrix P.

To solve for p, we need to impose an extra constraint to avoid the trivial solution p = 0. A simple one is \|p\|_2 = 1. This constraint is implicitly imposed when we compute the SVD of A: the value of p that minimizes \|Ap\| subject to \|p\|_2 = 1 is given by the right singular vector corresponding to the smallest singular value of A. To find it, compute the SVD of A, pick this singular vector, and reshape it into a 3 by 4 matrix P. Verify your answer by re-projecting the world points X and checking that they are close to x.

(b) Now that we have P, we can compute the world coordinates of the camera's projection center C. Note that PC = 0, so C lies in the null space of P, which can again be found with an SVD (the Matlab command is svd). Compute the SVD of P and pick the vector corresponding to this null space. Finally, convert it back to inhomogeneous coordinates to yield the (X, Y, Z) coordinates. Your report should contain the matrix P and the value of C.

As an alternative route, we can decompose P into its constituent matrices. Recall from the lectures that P = K[R t], where t = -R\tilde{C}, \tilde{C} being the inhomogeneous form of C. Since K is upper triangular, use an RQ decomposition to factor the left 3x3 submatrix KR into the intrinsic parameter matrix K and a rotation matrix R. Then solve for \tilde{C}. Check that your answer agrees with the solution from the first method.
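A sketch of part (a) and the decomposition in Python (NumPy/SciPy assumed; the whitespace-separated, one-point-per-line file format is an assumption about the ASCII files):

    import numpy as np
    from scipy.linalg import rq

    X = np.loadtxt('world.txt')    # 10 x 3 world points (assumed format)
    x = np.loadtxt('image.txt')    # 10 x 2 image points

    # Homogenize: 10 x 4 and 10 x 3.
    Xh = np.hstack([X, np.ones((10, 1))])
    xh = np.hstack([x, np.ones((10, 1))])

    # Build the 20 x 12 DLT matrix: two rows of equation (2) per point.
    A = []
    for (u, v, w), Xi in zip(xh, Xh):
        A.append(np.hstack([np.zeros(4), -w * Xi,  v * Xi]))
        A.append(np.hstack([ w * Xi, np.zeros(4), -u * Xi]))
    A = np.array(A)

    # p is the right singular vector for the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Camera center: null space of P, then de-homogenize.
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]
    C = C[:3] / C[3]

    # Alternative route: RQ-decompose the left 3x3 block into K (upper
    # triangular) and R (rotation); in practice you may need to flip signs
    # so that K has a positive diagonal. Then C_tilde = -(KR)^-1 * t.
    K, R = rq(P[:, :3])
    C_tilde = -np.linalg.inv(P[:, :3]) @ P[:, 3]
    print(P, C, C_tilde)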

4 Structure from Motion

In this section you will code up an affine structure from motion algorithm, as described in the slides of lecture 6. For more details, you can consult page 437 of the Hartley & Zisserman book.

Load the file sfm_points.mat (included in assignment1.zip). In Python this can be done using scipy (http://docs.scipy.org/doc/scipy/reference/tutorial/io.html). The file contains a 2 by 600 by 10 matrix holding the x, y coordinates of 600 world points projected onto the image plane of the camera in 10 different locations. The points correspond, that is, image_points(:,1,:) is the projection of the same 3D world point in the 10 frames. The points have been drawn randomly to lie on the surface of a transparent 3D cube, which does not move between frames (i.e. the object is static, only the camera moves). Try plotting out several frames and the cube-shaped structure should be apparent (the plot3 command may be useful).

To simplify matters, we will only attempt an affine reconstruction, so the projection matrix of each camera i has the following form:

P_i = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} M_i & t_i \\ \mathbf{0} & 1 \end{pmatrix}   (3)

where M_i is a 2 by 3 matrix and t_i is a 2 by 1 translation vector. So given m = 10 views and n = 600 points with image locations x_j^i, where j = 1, ..., n and i = 1, ..., m, we want to determine the affine camera matrices M_i, t_i and 3D points X_j that minimize the reconstruction error:

\sum_{ij} \| x_j^i - (M_i X_j + t_i) \|^2   (4)

We do this in the following stages (a sketch follows the list):

- Compute the translations t_i directly by computing the centroid of the points in each image i.
- Center the points in each image by subtracting off the centroid, so that the points have zero mean.
- Construct the 2m by n measurement matrix W from the centered data.
- Perform an SVD decomposition of W into UDV^T. The camera matrices M_i can be obtained from the first three columns of U multiplied by D(1:3, 1:3), the first three singular values. The 3D world point locations are the first three columns of V.
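A sketch of these stages in Python (scipy.io.loadmat assumed; the variable name 'image_points' inside the .mat file is a guess based on the text above, so inspect the file to confirm):

    import numpy as np
    from scipy.io import loadmat

    pts = loadmat('sfm_points.mat')['image_points']  # 2 x 600 x 10 (assumed key)
    n_pts, n_views = pts.shape[1], pts.shape[2]      # n = 600 points, m = 10 views

    # Stages 1-2: per-view centroids give t_i; subtract them to center the data.
    t = pts.mean(axis=1)                             # 2 x 10 translations
    centered = pts - t[:, np.newaxis, :]

    # Stage 3: stack the centered views into the 2m x n measurement matrix W.
    W = np.vstack([centered[:, :, i] for i in range(n_views)])

    # Stage 4: rank-3 factorization via the SVD, W ~ (U D) V^T.
    U, D, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * D[:3]       # 2m x 3; rows 2i and 2i+1 form camera matrix M_i
    Xw = Vt[:3].T              # n x 3 recovered world points

    # Report: first camera and first 10 world points.
    print('M_1 =\n', M[:2], '\nt_1 =', t[:, 0])
    print('first 10 world points:\n', Xw[:10])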

You can verify your answer by plotting out the 3D world points using the plot3 command; the rotate3d command will let you rotate the plot. This functionality is replicated in Python within the matplotlib package.

You should write a script to implement the steps above. The script should print out M_i and t_i for the first camera, as well as the 3D coordinates of the first 10 world points. Cut and paste these into your report.
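For the matplotlib route, a minimal plotting snippet (assuming the Xw array from the sketch above; the interactive figure window supports rotation with the mouse):

    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # only needed on older matplotlib

    # 3D scatter of the recovered points; the cube surface should be visible.
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(Xw[:, 0], Xw[:, 1], Xw[:, 2], s=4)
    plt.show()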