A COMPREHENSIVE TOOL FOR RECOVERING 3D MODELS FROM 2D PHOTOS WITH WIDE BASELINES

Yuzhu Lu, Shana Smith
Virtual Reality Applications Center, Human Computer Interaction Program, Iowa State University, Ames, IA, USA
yuzhu@iastate.edu, sssmith@iastate.edu

ABSTRACT

Recovering 3D objects from 2D photos is an important application in the areas of computer vision, computer intelligence, feature recognition, and virtual reality. This paper describes an innovative and systematic method that integrates automatic feature extraction, automatic feature matching, manual revision, feature recovery, and model reconstruction into an effective recovery tool. The method has proven to be a convenient and inexpensive way to recover 3D scenes and models directly from 2D photos. We have developed a new automatic key point selection and hierarchical matching algorithm for matching 2D photos with less similarity. Our method uses a universal camera intrinsic matrix estimation method to omit the camera calibration experiment. We have also developed a new automatic texture-mapping algorithm to find the best textures from the 2D photos. In this paper, we include examples and results to show the capability of the developed tool.

KEY WORDS: 3D recovery, stereo matching, computer modeling, wide baseline.

1. Introduction

With the rapid and wide application of virtual models in many areas, there is a great need for creating 3D models from real scenes. Traditional manual model building is labor-intensive and expensive. Thus, automatically constructing 3D computer models has recently received much attention [1][2].

Much work has been conducted on recovering existing 3D environments. These recovery efforts can be classified into two categories: using scanning devices [3][4], and using cameras [5][6][7][8][9]. Scanning devices can reconstruct objects automatically and precisely, but they are very expensive and inconvenient to carry, especially in an outdoor environment. Here, we chose to focus on 3D model recovery from 2D images taken by cameras. This technology uses two or more photos of the same object to recover the 3D information of the overlapping areas and, subsequently, reconstruct the model. The process includes four steps: key feature selection, feature matching, recovery computation, and model reconstruction.

The features of an image are often expressed as discontinuities in the image signal. In prior research, these discontinuities were extracted as corner points [1][8][9][10][11][12], edges [13][14], or regions [7][15][16] by using the first or second derivative information of the image signal.

Feature matching is both the focus and the bottleneck of recent research on recovering 3D information from 2D images. Matching processes are applied according to the different attributes of the detected features: corner points [1][8][9][11][12][17], line edges [10][18], curved edges [19], and regions [15][17][20]. Point matching methods have been the most widely used in stereo vision research because corners are easy to detect and are more stable and robust when the perspective changes. Almost all of these point-matching algorithms are designed according to image similarity, uniqueness, continuity, and epipolar information [1][2][5][9][11][17][21].

Recovery computation (also called stereo triangulation) is relatively stable and well understood when the camera's parameters are known. However, if these parameters are not given, it is necessary to calibrate the camera [1][2][21], an operation which is inconvenient for many common users. Thus, camera calibration and self-calibration research has become another major research focus. Even though there have been many studies in this area, problems still arise because of the limitations of current methods, including difficulties in completing 3D recovery automatically and in dealing with images having wide baselines.

In response to these problems, we have developed a systematic semi-automated method to recover 3D models directly from 2D photos with less similarity. We have developed an automatic feature information extraction method and a hierarchical matching algorithm for images with less similarity, as well as a tool for users to edit key points, revise possible mismatches, and select triangles to reconstruct a model with surfaces. A universal camera intrinsic matrix estimated from statistical analysis is used to recover 3D information without camera calibration, and a new texture-mapping algorithm automatically selects the better textures from the different photos.

2. Methodology

2.1. Key Points Extraction

Feature points are those holding the main characteristics of a 2D image. In our application, geometric information is the main characteristic to be recovered. Two well-established detectors are the Harris corner detection method [1][9][10] and the Canny edge detection method [8][13]. We chose the Canny method to extract segment information for two reasons: first, edge detection holds more complete geometric information; and second, edge segments can be represented using only two end points, which can easily be edited and revised manually.
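As an illustration of this step, the sketch below (Python with OpenCV; the paper specifies only the Canny detector, so the Hough-based segment linking and all thresholds here are our assumptions) extracts edge segments represented by their two end points:

import cv2
import numpy as np

# Sketch of section 2.1: Canny edges linked into straight segments, each
# stored as its two end points (x1, y1, x2, y2). The Hough-based linking
# and the thresholds are illustrative; the paper prescribes only Canny.
def extract_segments(image_path, low=50, high=150):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, low, high)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=3)
    return [] if lines is None else [tuple(line[0]) for line in lines]

Representing each segment by its two end points is what makes the manual revision step practical: a user can drag either end point to correct a badly extracted segment.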
2.2. Hierarchical Matching Algorithm

Epipolar constraints make a great contribution to stereo matching [1][2][9][10][11]. Unlike other matching criteria, epipolar constraints are more robust, and only eight well-matched points are necessary to compute the epipolar geometry [11]. However, obtaining these eight well-matched points is a challenging problem; it is almost impossible to check all possible combinations of the extracted feature points. Therefore, an initial seed matching is necessary to provide candidate matches for the epipolar geometry computation. The most widely used method for obtaining an initial matching set is the classical cross-correlation method [11]. Although other methods could be used to obtain the initial seed matching, those algorithms usually do not work well when applied to images with less similarity, because high similarity is a fundamental requirement for them.

We present a new hierarchical method to obtain the initial correspondence set, as shown in figure 1. First, segments are matched. Because segments have more attributes than points (such as length, position, direction, and background color information), matching accuracy can be increased, particularly when using large baseline images with less similarity. Second, the two end points of each segment are matched based on the segments matched in the first step. If the segments from the first step are well matched, the accuracy of the second step will be very high.

Figure 1. Hierarchical matching algorithm

Our approach uses four indexes for segment matching. The first index is the relative position of the center point of an extracted segment, represented by the summation of the vectors from the segment's center to all other segment centers. For example, the index vector for p1 is found by adding up all the vectors from p1 to the other points, as shown in figure 2(a), and the index vector for p2 is found by adding up all the vectors from p2 to the other points, as shown in figure 2(b). The second index is the length of a segment, represented by the distance between its two end points. The third index is the background information of a segment, represented by the mean color value of the neighborhood of the segment's center point. The fourth index is the direction of a segment, represented by the angle of the segment vector.

Figure 2. Index one for p1 and p2 - relative position: (a) relative position of p1; (b) relative position of p2

The four indexes for each segment in the two images are compared, and the differences between the index values for each pair of segments are computed and added together, as shown in equation (1):

CP = a(I_{1x} - \alpha I'_{1x}) + b(I_{2x} - \alpha I'_{2x}) + c(I_{3x} - I'_{3x}) + d(I_{4x} - I'_{4x})    (1)

The potential matched segments are the ones having the least difference between index values. In equation (1), I_{1x}, I_{2x}, I_{3x}, I_{4x} and I'_{1x}, I'_{2x}, I'_{3x}, I'_{4x} are the four index values for a pair of segments in the two images, and a, b, c, and d are the weights of the four indexes, determined by their relative importance. Through our study, we have found that the relative position and segment length indexes are more crucial when matching images with wide baselines; therefore, the weights of these two indexes are larger than the others. The scale parameter \alpha of the two photos is estimated as the ratio of the bounding box sizes of the object in the two images; this factor helps solve the scaling problem between the photos.

In the first level of the matching process, we can find potentially matched segments, but we cannot determine their matching directions. Consequently, we use the classical cross-correlation method in level two to match the end points of the matched segments. If the segments are corresponded correctly in level one, level-two matching is easier because there are only a few candidate points.

Finally, the correspondences for the initial set of key feature points are obtained from the proposed hierarchical matching algorithm. A least squares method is used to find the eight best-matched points, which are then used to calculate the fundamental matrix, the algebraic representation of the epipolar geometry. After the fundamental matrix is calculated, it is used to find the inliers among the seed matches found in the level-two matching and to reject the outliers.

Figure 3 compares the final matching results of the proposed hierarchical matching method and the classical cross-correlation method [1][8][9][11][12][17]. The result shows that our algorithm gives more correct corresponding key points (16 matches) than the classical cross-correlation algorithm (9 matches). In figure 3(b), there is an obvious mismatch produced by the classical cross-correlation algorithm.

Figure 3. Matching result comparison: (a) our matching method; (b) classical cross-correlation method
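To make equation (1) concrete, the following sketch computes the four indexes and the cost CP for one candidate segment pair (NumPy assumed). The weights and the neighborhood size are illustrative; the paper states only that the position and length weights should dominate for wide baselines, and we compare the vector-valued position index by its norm:

import numpy as np

def seg_center(seg):
    x1, y1, x2, y2 = seg
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

# The four segment indexes of section 2.2; `image` is a grayscale array.
def indexes(seg, all_segs, image):
    c = seg_center(seg)
    # Index 1: sum of vectors from this center to all other centers.
    i1 = sum((seg_center(s) - c for s in all_segs if s != seg), np.zeros(2))
    # Index 2: segment length (distance between the two end points).
    i2 = np.hypot(seg[2] - seg[0], seg[3] - seg[1])
    # Index 3: mean color in a neighborhood of the center (5 px, illustrative).
    x, y = int(c[0]), int(c[1])
    i3 = image[max(y - 5, 0):y + 6, max(x - 5, 0):x + 6].mean()
    # Index 4: direction, the angle of the segment vector.
    i4 = np.arctan2(seg[3] - seg[1], seg[2] - seg[0])
    return i1, i2, i3, i4

# Equation (1). Position and length (indexes 1 and 2) are scale-corrected
# by alpha; weights a, b, c, d are illustrative, with a and b dominant.
def cp_cost(sa, segs_a, img_a, sb, segs_b, img_b, alpha,
            a=2.0, b=2.0, c=1.0, d=1.0):
    i1, i2, i3, i4 = indexes(sa, segs_a, img_a)
    j1, j2, j3, j4 = indexes(sb, segs_b, img_b)
    return (a * np.linalg.norm(i1 - alpha * j1) + b * abs(i2 - alpha * j2)
            + c * abs(i3 - j3) + d * abs(i4 - j4))

The best candidate for a segment is then the one minimizing cp_cost over all segments in the other image.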

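The epipolar filtering at the end of section 2.2 can be sketched with OpenCV's robust fundamental matrix estimator. Note the substitution: the paper selects the eight best matches by least squares, whereas this sketch uses RANSAC, which likewise returns F together with an inlier/outlier split of the seed matches:

import cv2
import numpy as np

def filter_by_epipolar(pts1, pts2):
    """pts1, pts2: Nx2 float arrays of level-two seed matches."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                     ransacReprojThreshold=3.0)
    inliers = mask.ravel().astype(bool)  # 1 = consistent with F
    return F, pts1[inliers], pts2[inliers]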
To improve the results and accuracy further, various strategies can be applied, such as a relaxation process and searching the point correspondences a second time under the constraint of the epipolar geometry, as suggested by Zhang [11]. However, this costs additional time and resources.

2.3. 3D Recovery

The relationship between a 3D point coordinate and its image plane coordinate through a camera is shown in equations (2) and (3), where s is a scaling factor, and (x, y, z) and (u, v) are a 3D point coordinate and its corresponding camera image coordinate. P is a 3x4 perspective projection matrix, which can be decomposed into the camera intrinsic matrix A and the extrinsic matrix carrying the rotation and translation information (R, T):

s [u, v, 1]^T = P [x, y, z, 1]^T    (2)

P = A [R | T]    (3)

Thus, the relationship between a 3D point and its two image coordinates on the image planes of two cameras can be expressed by equation (4), where A_1 and A_2 are the two cameras' intrinsic matrices. Here, we set the first camera as the origin and the second camera as the transformed one:

s_1 [u_1, v_1, 1]^T = A_1 [I | 0] [x, y, z, 1]^T
s_2 [u_2, v_2, 1]^T = A_2 [R | T] [x, y, z, 1]^T    (4)

These two equations express the principle of stereo triangulation, which recovers 3D coordinates when all parameters are known. After the corresponding key points are matched and the triangular surfaces are constructed, the triangulation method is carried out to recover the 3D information [17][21].

Prior research has shown that camera calibration (or self-calibration) should be carried out to obtain the camera's intrinsic parameter matrix [1][2][6]. The intrinsic matrix A (shown in equation (5)) and the fundamental matrix F obtained in the matching process can then be used to calculate the rotation and translation parameters between the two cameras, with which we can calculate the 3D information of the object. However, camera calibration experiments are very inconvenient and time consuming, and sometimes even impossible, for example when we attempt to recover a historical scene from old photographs.

A = \begin{bmatrix} f k_u & -f k_u \cot\theta & u_0 \\ 0 & f k_v / \sin\theta & v_0 \\ 0 & 0 & 1 \end{bmatrix}    (5)

There are six intrinsic parameters: the focal length f of the camera, the aspect ratios k_u and k_v, the angle \theta between the retinal axes, and the coordinates u_0 and v_0 of the principal point [1][2][21]. Xu, Terai, and Shum [21] suggest that if high precision is not required, we can assume that the angle between the retinal axes is \pi/2 (\theta = \pi/2), that the aspect ratio is 1 (k_u = k_v = 1), and that the principal point is at the image center. Thus, the only unknown parameter left is the focal length f, and the intrinsic matrix can be rewritten as equation (6):

A = \begin{bmatrix} f & 0 & pixel_x / 2 \\ 0 & f & pixel_y / 2 \\ 0 & 0 & 1 \end{bmatrix}    (6)
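A minimal sketch of this recovery step, assuming OpenCV and NumPy (the paper does not publish code, so function names are illustrative): the intrinsic matrix of equation (6) and the fundamental matrix F from section 2.2 yield the relative pose, after which the matched points are triangulated per equation (4).

import cv2
import numpy as np

# Equation (6): theta = pi/2, k_u = k_v = 1, principal point at the center.
def estimate_intrinsics(f, width, height):
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])

def recover_3d(pts1, pts2, F, A):
    """Triangulate matched pixels (Nx2 float arrays) per equation (4)."""
    E = A.T @ F @ A                      # essential matrix from F and A
    _, R, T, _ = cv2.recoverPose(E, pts1, pts2, A)
    P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])  # camera 1 = origin
    P2 = A @ np.hstack([R, T])                         # camera 2 = [R | T]
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    return (X[:3] / X[3]).T                            # Nx3 coordinates

The focal length f passed to estimate_intrinsics is the single parameter the user adjusts, as discussed next.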

After surveying and analyzing the focal lengths used in different research areas, we have found that most focal lengths (80%) vary within a narrow range [700-1300]. Thus, it is possible to estimate a camera's intrinsic matrix when we recover 3D information from photos taken by a normal camera. Our 3D recovery tool allows the user to adjust the focal length to find the best results. Figure 4 shows an example of recovering a 3D scene using a focal length of 1000.

Figure 4. Results by using f = 1000

2.4. Reconstruction by Texture Mapping

After we obtain all the 3D information of the feature points, a new model can be constructed based on the connectivity information given by the triangles created above. Since the reconstructed solid model does not contain surface texture, texture mapping is necessary to make the model more realistic. However, since we have at least two photos, and since each photo is taken from a different angle (and therefore shows certain details in a slightly different way), the question arises: which photo should be used? Normally, a better surface texture comes from the camera that captures a larger area of that surface of the object and therefore contains clearer details. Using this principle, we have designed an algorithm that compares every pair of corresponding triangle textures and selects the texture with the larger area. Figure 5 shows an example of our algorithm.

Figure 5. Reconstruction with texture mapping
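The selection rule of section 2.4 reduces to comparing projected triangle areas. A minimal sketch (NumPy assumed; names are ours, and each argument is the three projected 2D vertices of one model triangle in one photo):

import numpy as np

def triangle_area(p0, p1, p2):
    # Area of a 2D triangle from the cross product of two edge vectors.
    (x1, y1), (x2, y2) = p1 - p0, p2 - p0
    return 0.5 * abs(x1 * y2 - y1 * x2)

# Return the index (0 or 1) of the photo whose projection of the model
# triangle covers the larger image area, i.e. the likely clearer texture.
def pick_texture(tri_in_photo1, tri_in_photo2):
    a1 = triangle_area(*tri_in_photo1)
    a2 = triangle_area(*tri_in_photo2)
    return 0 if a1 >= a2 else 1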

3. Conclusion and Discussion

We have proposed a hierarchical feature-matching algorithm for wide baseline images with less similarity. In addition:

1. We have presented a universal camera intrinsic matrix, with which the camera calibration experiment can be omitted, saving time and resources.

2. We have presented a new texture-mapping algorithm which automatically selects the better and clearer triangle textures to be mapped onto the reconstructed model.

3. We have presented an integrated and comprehensive process for recovering a 3D model from 2D images, whereas almost all previous research focused on only one or two parts of this process.

Since our method recovers a 3D scene from only two 2D photos, the areas unseen in either photo cannot be recovered. Objects with simple geometric shapes (like buildings) are easier to recover than complicated objects like trees and grass. Because the estimated camera intrinsic matrix affects the recovered results, users can adjust the focal length within the suggested range. In the near future, more work will be done to register and fuse model parts retrieved from more photos to recover a complete model. The results will also be output as VRML models to be used in more applications.

References

[1] K. Cornelis, M. Pollefeys, M. Vergauwen, & L. Van Gool, Augmented reality using uncalibrated video sequences, 2nd European Workshop on 3D Structure from Multiple Images of Large-Scale Environments (SMILE 2000), Dublin, Ireland, 2000, 144-160.

[2] M. Pollefeys, Self-calibration and metric 3D reconstruction from uncalibrated image sequences, Ph.D. thesis, Katholieke Universiteit Leuven, Heverlee, Belgium, 1999.

[3] M. Reed, & P. Allen, 3-D modeling from range imagery: an incremental method with a planning component, Image and Vision Computing, 17, 1999, 99-111.

[4] I. Stamos, & P. Allen, 3-D model construction using range and image data, Computer Vision & Pattern Recognition Conf. (CVPR), 2000, 531-536.

[5] H. Shum, & R. Szeliski, Stereo reconstruction from multiperspective panoramas, 7th International Conf. on Computer Vision (ICCV'99), Kerkyra, Greece, 1999, 14-21.

[6] T. Jebara, A. Azarbayejani, & A. Pentland, 3D structure from 2D motion, IEEE Signal Processing Magazine, 16(3), 1999, 66-84.

[7] S. Baker, R. Szeliski, & P. Anandan, A layered approach to stereo reconstruction, IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR'98), 1998, 434-441.

[8] C. Baillard, & A. Zisserman, A plane-sweep strategy for the 3D reconstruction of buildings from multiple images, International Archives of Photogrammetry and Remote Sensing, 32(2), 2000, 56-62.

[9] A. W. Fitzgibbon, G. Cross, & A. Zisserman, Automatic 3D model construction for turn-table sequences, Proc. European Workshop on 3D Structure from Multiple Images of Large-Scale Environments, 1998, 155-170.

[10] C. G. Harris, & M. J. Stephens, A combined corner and edge detector, Proc. 4th Alvey Vision Conf., Manchester, England, 1988, 147-151.

[11] Z. Zhang, R. Deriche, O. Faugeras, & Q. Luong, A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry, Artificial Intelligence, 78, 1995, 87-119.

[12] P. Tissainayagam, & D. Suter, Assessing the performance of corner detectors for point feature tracking applications, Image and Vision Computing, 22(8), 2004, 663-679.

[13] J. F. Canny, Finding edges and lines in images, Master's thesis, MIT AI Lab, 1983.

[14] R. Gonzalez, & R. Woods, Digital Image Processing (2nd edition, Prentice Hall, 2002).

[15] J. Gao, A. Kosaka, & A. Kak, A deformable model for human organ extraction, Proc. IEEE International Conf. on Image Processing, 1998, 3: 323-327.

[16] D. L. Pham, C. Xu, & J. L. Prince, A survey of current methods in medical image segmentation, Annual Review of Biomedical Engineering, 2, 1998, 315-337.

[17] Y. Ma, S. Soatto, J. Kosecka, & S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models (Springer-Verlag, 2003).

[18] H. Loaiza, J. Triboulet, & S. Lelandais, Matching segments in stereoscopic vision, IEEE Instrumentation & Measurement Magazine, 4(1), 2001, 37-42.

[19] R. Deriche, & O. Faugeras, 2-D curve matching using high curvature points: application to stereo vision, Proc. International Conf. on Pattern Recognition, New Jersey, USA, 1990, 240-242.

[20] T. Tuytelaars, M. Vergauwen, M. Pollefeys, & L. Van Gool, Image matching for wide baseline stereo, Proc. International Conf. on Forensic Human Identification, 1999.

[21] G. Xu, J. Terai, & H. Shum, A linear algorithm for camera self-calibration, motion and structure recovery for multi-planar scenes from two perspective images, Computer Vision & Pattern Recognition Conf. (CVPR), 2000, 2: 474-479.