3D Reconstruction of a Hopkins Landmark



Ayushi Sinha (461), Hau Sze (461), Diane Duros (361)

Abstract - This paper outlines a method for 3D reconstruction from two images. Our procedure is based on known intrinsic camera parameters, which are used to calculate the extrinsic camera parameters. These camera parameters are then used to triangulate corresponding points in two images of a structure, found using SIFT feature matching, to obtain a three-dimensional point cloud reconstruction of the structure. We use pictures of a Hopkins landmark, Maryland Hall, to demonstrate our technique.

I. INTRODUCTION

This paper outlines the procedure we adopted to perform a 3D reconstruction of a Hopkins landmark, Maryland Hall. Although 3D reconstruction can be performed without camera parameters, such a reconstruction is only determined up to an unknown projective transformation of the scene. Therefore, we calibrated our camera to obtain its intrinsic parameters, and then used them, together with the homography between our two images of Maryland Hall, to compute the extrinsic camera parameters. To recover the 3D geometry of Maryland Hall, we detected features in the two images and matched them to obtain point correspondences. We then triangulated these correspondences using our camera parameters to obtain a point cloud reconstruction of Maryland Hall in 3D space.

II. PROCEDURE

A. Obtain Pictures

We started our project by taking several pictures of Maryland Hall from different angles. We also took pictures of a window, which we used as the grid to calibrate our camera (Figure 1).
We made sure not to change the focus of the camera during this process, so that the intrinsic camera parameters obtained from calibration would remain valid for all our pictures.

Figure 1: Five of the 10 images of the window used as a grid to obtain the intrinsic camera parameters
Figure 2: The two pictures of Maryland Hall used for the 3D reconstruction; the image on the left is I1 and the image on the right is I2

B. Camera Parameters

We obtained our intrinsic parameter matrix using the Camera Calibration Toolbox for MATLAB [1] and the 10 pictures of the window in Figure 1. We then used hardcoded corresponding points in the two images of Maryland Hall (Figure 2) to compute a homography between the two images using the normalized direct linear transformation (DLT). From the intrinsic parameter matrix and the homography, we computed the extrinsic camera parameters, that is, the rotation matrix and the translation vector, using the method described by Zhang [2]. Because a rotation matrix computed this way is in general not exactly orthogonal, we replaced it with the closest orthogonal matrix, following the method described in Appendix C of Zhang's paper [2].

C. SIFT Features

We extracted features from our images using the scale-invariant feature transform (SIFT). We used the SIFT implementation in VLFeat [3], which detected 59,365 features in I1 and 58,816 features in I2. Figure 3 shows 2,500 randomly selected features and their descriptors in each image.

Figure 3: 2,500 randomly selected features and the corresponding descriptors in I1 (left) and I2 (right)

D. Feature Matching

Once we have features in both images, we can match them to find correspondences. Again, we used VLFeat [3] to find our initial matches from I1 to I2 and from I2 to I1. We then enforced symmetry on these two sets of matches, keeping a match only if it appeared in both directions, to eliminate wrong matches. This ensured that each feature in one image was matched to only one feature in the other image. However, it did not eliminate all incorrect matches. Therefore, we used a pose-clustering technique based on the Hough transform to eliminate the remaining incorrect matches.

TABLE II
MATCHES
Step                                      Number of matches
Image 1 to Image 2                        9359
Image 2 to Image 1                        9862
After enforcing symmetry                  7157
Pose clustering using Hough transform     4545

E. Clustering with Hough Transform

We used clustering with the Hough transform, as described by Lowe [4], to eliminate incorrect SIFT feature matches. In this method, each remaining match votes for the object poses that are consistent with it. Consistent matches vote for the same relative scale and orientation of the features, as well as the same location of the feature center in the initial image. Poses that receive a high number of votes are assumed to be correct. We implemented this method by forming an accumulator of coarse bins over the four dimensions of the SIFT features, that is, the x and y coordinates of the center of each feature frame, and its scale and orientation. The bin increments are given in Table I. Each match voted only for the single bin closest to its parameter values. After the votes were accumulated, we kept only the bins with at least three votes and eliminated all matches falling in bins with fewer than three votes. This step eliminated almost all incorrect matches, leaving us with a dense set of correct feature matches.
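The two filtering steps of Sections II-D and II-E can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' MATLAB/VLFeat code: matching uses plain nearest-neighbour search on squared Euclidean distance (VLFeat's `vl_ubcmatch` additionally applies a distance-ratio test), each SIFT frame is assumed to be an (x, y, scale, angle) tuple, and the pose a match votes for is assumed to be (x displacement, y displacement, scale ratio, orientation difference), a simplification of Lowe's similarity-transform vote. The function names and the bin increments in the example are hypothetical.

```python
import math
from collections import defaultdict


def nearest(desc, others):
    """Index of the descriptor in `others` closest to `desc` (squared L2 distance)."""
    return min(range(len(others)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(desc, others[j])))


def symmetric_matches(d1, d2):
    """Keep (i, j) only if j is i's nearest neighbour in I2 and i is j's in I1."""
    forward = {i: nearest(d, d2) for i, d in enumerate(d1)}
    backward = {j: nearest(d, d1) for j, d in enumerate(d2)}
    return [(i, j) for i, j in forward.items() if backward[j] == i]


def match_pose(f1, f2):
    """Pose a match votes for; frames are (x, y, scale, angle) tuples.

    Assumed parameterization: displacement, scale ratio and orientation
    difference wrapped to [-pi, pi).
    """
    x1, y1, s1, a1 = f1
    x2, y2, s2, a2 = f2
    da = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
    return (x2 - x1, y2 - y1, s2 / s1, da)


def hough_filter(matches, frames1, frames2, increments, min_votes=3):
    """Vote each match into its nearest bin of a 4D accumulator and keep
    only the matches whose bin received at least `min_votes` votes."""
    bins = defaultdict(list)
    for i, j in matches:
        pose = match_pose(frames1[i], frames2[j])
        # Nearest bin along each dimension: round(value / increment)
        key = tuple(round(p / inc) for p, inc in zip(pose, increments))
        bins[key].append((i, j))
    return [m for members in bins.values() if len(members) >= min_votes
            for m in members]
```

For example, with hypothetical increments of (20, 20, 0.5, π/4), three matches that all shift by (10, 5) pixels with unchanged scale and orientation fall into one bin and survive, while a single inconsistent match lands in its own bin and is discarded.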
TABLE I
BIN INCREMENTS
                   Start         End             Increment
X                  1             Image width     (illegible)
Y                  1             Image height    (illegible)
Scale              (illegible)   (illegible)     (illegible)
Angle (radians)    -π            π               π/4

Clustering with the Hough transform is a very efficient technique for outlier removal: it eliminated over 2,000 outliers (from 7157 symmetric matches down to 4545) within seconds. Table II lists the number of matches remaining after each step of our outlier removal. In addition, Figures 5 and 6 show the matches after enforcing symmetry and after applying clustering with the Hough transform, respectively. Incorrect matches are visible in Figure 5, whereas Figure 6 contains almost only parallel lines, which represent correct matches.

F. Triangulation

Once we had our point correspondences, we used them, together with the intrinsic and extrinsic camera parameters, to triangulate each correspondence and plot it in 3D space. We first computed the two projection matrices corresponding to the two images of Maryland Hall from our extrinsic parameters. We then converted the point correspondences from 2D pixel coordinates to 3D camera coordinates using the inverse of the intrinsic parameter matrix, and used these camera-coordinate correspondences to perform DLT triangulation [5]. This gave us 3D points in the camera coordinate frame of I1. We plotted each 3D point in color, using the average of the RGB values of its two matched image points (Figure 4).

Figure 4: Back view of our 3D reconstruction
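A minimal Python sketch of the triangulation step follows. It assumes, hypothetically, projection matrices of the form P = [R | t] acting on normalized camera coordinates (pixels mapped through the inverse of an upper-triangular K whose bottom-right entry is 1). It uses the inhomogeneous variant of linear (DLT) triangulation: the homogeneous coordinate is fixed to 1 and the four linear equations are solved by least squares through the normal equations, whereas the textbook DLT takes the null vector of the 4x4 system from an SVD. Function names are illustrative.

```python
def pixel_to_normalized(K, u, v):
    """Apply K^-1 to the pixel (u, v); K = [[fx, s, cx], [0, fy, cy], [0, 0, 1]]."""
    fx, s, cx = K[0]
    fy, cy = K[1][1], K[1][2]
    y = (v - cy) / fy
    x = (u - cx - s * y) / fx
    return (x, y)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one correspondence.

    P1, P2: 3x4 projection matrices (lists of rows) acting on normalized
    camera coordinates; x1, x2: (x, y) normalized image points.
    Returns (X, Y, Z) in the frame in which P1 and P2 are expressed.
    """
    rows = []
    for P, (u, v) in ((P1, x1), (P2, x2)):
        # From x = P X up to scale: u * (p3 . X) - p1 . X = 0, v * (p3 . X) - p2 . X = 0
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    # Inhomogeneous variant: fix the 4th coordinate to 1 and move it to the right-hand side
    A = [r[:3] for r in rows]
    b = [-r[3] for r in rows]
    # Least squares via the 3x3 normal equations (A^T A) X = A^T b, solved by Cramer's rule
    M = [[sum(A[r][i] * A[r][j] for r in range(4)) for j in range(3)] for i in range(3)]
    y = [sum(A[r][i] * b[r] for r in range(4)) for i in range(3)]
    d = det3(M)
    return tuple(
        det3([[y[r] if k == c else M[r][k] for k in range(3)] for r in range(3)]) / d
        for c in range(3)
    )
```

With K = [[800, 0, 320], [0, 800, 240], [0, 0, 1]], P1 = [I | 0], P2 = [I | (-1, 0, 0)] and the pixel pair (420, 280), (220, 280), the sketch recovers the point (0.5, 0.2, 4.0) up to floating-point error.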

III. RESULTS

Our method produced good results: we achieved a dense reconstruction of Maryland Hall. Our 3D point cloud captures the relative scale of the windows and the archway, the depth of the archway in 3D space, and details such as the windows, as can be seen in Figures 4 and 7.

We ran into several difficulties in reaching these final results. First, it was difficult to find a set of pictures that yielded enough matching features for a dense reconstruction. This meant that we had to be close enough to the structure to capture as many details as we could. We also had to get the exposure right: too much or too little exposure made it difficult to detect enough features. Further, the rotation between the two images could be neither too small nor too large. Too little rotation would produce ambiguities in structure and depth, whereas too much rotation would make it difficult to match points. In addition, we had to avoid occlusions in both images so that all the features of the structure were visible and could be detected.

Another difficulty was recovering depth. Subtleties in converting points from 2D image coordinates to 3D camera coordinates and in normalizing points initially gave us a point cloud that exhibited no depth (Figure 8). After analyzing our implementation of DLT triangulation, we found and corrected our errors and obtained a 3D point cloud with depth (Figures 4, 7, 9, and 10).

IV. CONCLUSION AND FUTURE WORK

Our approach produced a convincing dense 3D reconstruction of Maryland Hall. However, it has some limitations. The first limitation is the method we use to find our extrinsic parameters.
We hardcoded corresponding points in the two images of Maryland Hall and used them to compute the homography between the two images. Reconstructing a different structure, or using a different set of images, would therefore require changing these points in our code. We do have code (commented out) that lets the user pick four points in one image and the corresponding four points in the other image, in case a new set of images is used. However, this is also a limitation, because it requires user interaction to compute the homography, and user errors could result in bad reconstructions. The best solution would be to automate the selection of corresponding points in the two images. This can be challenging because the accuracy of the homography depends on the points picked. If three or more of the points fall on or near a straight line, or if the points are too close together, the homography will be inaccurate, leading to errors in the extrinsic parameters and, in turn, to errors in the reconstruction. One way to find the homography between the two images automatically would be to use Random Sample Consensus (RANSAC) to compute the homography that best maps the points in one image to the points in the second image. However, this process is slow, and with as many point correspondences as we obtained, it would not be a very efficient technique. Another way to find the extrinsic parameters is to extract them from the essential matrix, which can be obtained from the fundamental matrix and the intrinsic parameters. However, the results we obtained using this method were not as good as those from our current method. Therefore, an extension of this work would be to find an efficient way to automatically compute a good estimate of the homography between the two images, in order to obtain extrinsic parameters that give results comparable to or better than our current ones.
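The RANSAC alternative mentioned above can be sketched in pure Python as follows. This is an illustrative sketch, not part of the authors' code: each hypothesis is an exact four-point homography obtained by solving the 8x8 linear DLT system with h33 fixed to 1 (unnormalized, unlike the normalized DLT of Section II-B, and assuming the true h33 is nonzero), and a correspondence counts as an inlier when its reprojection error is below a hypothetical threshold of 2 pixels. All names and default parameter values are assumptions.

```python
import math
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting; returns None if (near-)singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def homography_4pt(src, dst):
    """Exact homography from 4 correspondences, with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h7 x + h8 y + 1) = h1 x + h2 y + h3, and likewise for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(A, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, p):
    """Map point p through H; returns None if p maps to infinity."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """Return the 4-point hypothesis with the most inliers and its inlier indices."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 4)
        H = homography_4pt([src[i] for i in idx], [dst[i] for i in idx])
        if H is None:
            continue  # singular system: degenerate sample
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            r = apply_h(H, p)
            if r is not None and math.hypot(r[0] - q[0], r[1] - q[1]) < thresh:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

Each iteration scores every correspondence, so the cost per iteration grows with the number of matches, while the number of iterations needed depends mainly on the fraction of inliers. In practice the best hypothesis would usually be refit on all of its inliers.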
Another limitation is that computing the matches and enforcing symmetry between them is slow. A more efficient way to do this would improve the runtime of our code.

V. CODE INFORMATION

To run our code, extract the folder in the .zip file, add the folder to the path in MATLAB, and run main.m. This runs the code and displays our results at different steps as well as our final result. Because the part of our code that matches SIFT features is slow, we have also included a VisionFinalProject.mat file in our folder, which contains a final version of our project. You can load this file in MATLAB using the load function and execute specific cells if you do not want to execute the entire code.

REFERENCES
[1] Camera Calibration Toolbox for MATLAB. Web. 01 Dec. 2011. http://www.vision.caltech.
[2] Zhengyou Zhang, "A Flexible New Technique for Camera Calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, 2000.
[3] A. Vedaldi and B. Fulkerson, "VLFeat: An Open and Portable Library of Computer Vision Algorithms," VLFeat. Web. 01 Dec.
[4] David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, Vol. 60, No. 2, p. 105.
[5] Nicolas Padoy, "Two View Geometry," p. 25.

Figure 5: Matches after enforcing symmetry
Figure 6: Matches after clustering with the Hough transform

Figure 7: Front view of our 3D reconstruction
Figure 8: Front view of our earlier reconstruction, which did not exhibit depth

Figure 9: Side view of our reconstruction
Figure 10: Top view of our reconstruction

Figure 11: 10 features and the corresponding epipolar lines in I1
Figure 12: 10 features and the corresponding epipolar lines in I2
