Efficient 3D Reconstruction for Urban Scenes
Weichao Fu, Lin Zhang, Hongyu Li, Xinfeng Zhang, and Di Wu
School of Software Engineering, Tongji University, Shanghai, China

Abstract. Recently, researchers working in the fields of computer graphics and computer vision have shown tremendous interest in reconstructing urban scenes. For this task, the acquisition of 3D point clouds is the first step, for which laser scans are widely utilized. Nevertheless, given the potential drawbacks of laser scans, in this paper we propose a novel urban scene reconstruction system based on Multi-View Stereo (MVS). Given a set of calibrated photographs, we first generate point clouds using an existing MVS algorithm, and then reconstruct the sub-structures that often repeat regularly in urban buildings. Finally, we recover the entire architectural model through an automatic growing algorithm that propagates the sub-structures along dominant directions. Experimental results on regular urban buildings show the practicality and high efficiency of the proposed reconstruction method.

Keywords: Urban Scene, Multi-View Stereo, sub-structures.

1 Introduction

Reconstruction of urban scenes is attracting increasing attention these days, motivated by ambitious applications that aim to build digital copies of real cities (e.g., Microsoft Virtual Earth 3D and Google Earth 3D) [1]. It is a complex problem with a long research history, many open problems, and many possible solutions. Usually, the first step of an urban scene reconstruction system is the acquisition of 3D point clouds, and state-of-the-art methods such as [2] utilize laser scans for this step. However, as an approach to acquiring 3D point clouds of urban scenes, laser scanning has inherent drawbacks with respect to cost and ease of deployment. In addition, the main difficulty of laser scanning of large-scale urban environments is data quality.
Large distances between the scanner and the scanned objects reduce the precision and imply a higher level of noise, and the obtained point clouds usually exhibit significant missing data. As a result, in this paper we propose a new urban scene reconstruction scheme based on another 3D point cloud acquisition approach, multi-view stereo (MVS). Compared with laser scans, MVS has the advantage of low-cost data acquisition (e.g., from Internet photo-sharing sites such as Flickr and Google Images). Moreover, for urban scenes, the quality of the point clouds obtained by MVS is usually satisfactory. According to a quantitative evaluation on the Middlebury benchmark, a patch-based

Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2013, LNCS 7995. Springer-Verlag Berlin Heidelberg 2013
MVS method [3] outperforms all others for most of the datasets. Hence, we make use of this PMVS algorithm to recover point clouds.

Urban landscapes exhibit a high degree of self-similarity and redundancy [1]. This characteristic of urban scenes is not a chance occurrence, but is demonstrated universally across countries and cultures. Such large-scale repetitions arise from aesthetics, buildability, manufacturing ease, etc. Urban buildings are also mostly composed of flat or near-planar faces because of functional requirements and constraints. Consequently, the presence of such characteristics suggests opportunities for simplifying the reconstruction task.

Fig. 1. (a)-(c) Images of urban scenes with texture-poor but highly structured surfaces are increasingly ubiquitous on the Internet.

In this paper, we propose a 3D urban scene reconstruction method based on an exploration of the properties of architectural scenes. Our method is inspired by the recent work of [2], which assembles an architectural model over a 3D point cloud through interactive operations. The key idea is to replace the smoothness prior used in traditional methods with priors that are more appropriate for urban buildings. Our approach operates as follows. Given a set of calibrated photographs, we first generate point clouds using an existing MVS algorithm, and then reconstruct the sub-structures that often repeat regularly in urban buildings. At this step, the regularity and self-symmetry properties of urban buildings [1] are exploited. Finally, we recover the entire architectural model through an automatic growing algorithm that propagates the sub-structures along dominant directions. Our method, which exploits properties of urban scenes, offers the following advantages. 1) It is remarkably robust to lack of texture, and is able to model flat painted walls such as the one shown in Fig. 1(b). 2) It can work with poor-quality point clouds generated from photographs with occlusions, as shown in Fig. 1(c).
3) It produces remarkably clean and simple models as outputs.

This paper is organized as follows. In Section 2 we review the PMVS algorithm and preprocess the point clouds generated by PMVS. In Section 3 we reconstruct the sub-structures that often repeat regularly in urban buildings, and in Section 4 we recover the entire architectural model through an automatic growing algorithm of the sub-structures. Experimental results are presented in Section 5, followed by conclusions in Section 6.
2 MVS Preprocessing

The PMVS algorithm presented in [3] is implemented as a match, expand, and filter procedure: starting from a sparse set of matched keypoints, it repeatedly expands these before using visibility constraints to filter away false matches. Given a set of calibrated photographs, the first step of our method is to use the PMVS algorithm [3] to generate a set of oriented 3D points (positions and normals). We retain only high-confidence points in textured areas. Associated with each point P_i, several attributes can be extracted, including the 3D location, a surface normal N_i, a set of visible images V_i, and a photometric consistency score (normalized cross-correlation) C(P_i).

2.1 Outlier Removal

Although the PMVS algorithm [3] enforces local photometric consistency and global visibility constraints to remove outliers, the point clouds generated by PMVS still contain sparse outliers which corrupt the estimation of local point cloud attributes such as surface normals or curvature changes. We resolve these irregularities by performing a statistical analysis on each point's neighborhood, and trimming those points which do not meet a certain criterion. Our sparse outlier removal scheme is based on the distribution of distances from a point to its neighbors in the PMVS point cloud. For each point, we compute the mean distance from it to all its neighbors. By assuming that the resulting distribution is Gaussian with a mean and a standard deviation, all points whose mean distances fall outside an interval defined by the global mean and standard deviation of these distances can be considered outliers and trimmed from the point cloud. An example of outlier removal is shown in Fig. 2: Fig. 2(a) shows the original point cloud obtained by PMVS, while Fig. 2(b) shows the result after outlier removal. We set the number of nearest neighbors to 50 for mean distance estimation.

Fig. 2. (a) The original point cloud obtained by PMVS.
(b) The result after outlier removal. (c) The filtered point cloud with normals.
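The statistical trimming described above is not tied to a particular library; a minimal NumPy/SciPy sketch (the function name and the `std_ratio` parameter are our own illustrative choices, not from the paper) might look like this:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_statistical_outliers(points, k=50, std_ratio=1.0):
    """Trim points whose mean distance to their k nearest neighbors falls
    outside [mu - std_ratio*sigma, mu + std_ratio*sigma], where mu and sigma
    are the mean and standard deviation over the whole cloud."""
    tree = cKDTree(points)
    # query k+1 neighbors because the nearest neighbor of each point is itself
    dists, _ = tree.query(points, k=k + 1)
    mean_dists = dists[:, 1:].mean(axis=1)
    mu, sigma = mean_dists.mean(), mean_dists.std()
    keep = np.abs(mean_dists - mu) <= std_ratio * sigma
    return points[keep], keep
```

This mirrors the behavior of PCL's `StatisticalOutlierRemoval` filter; with the paper's setting, `k` would be 50.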
2.2 Estimating Normals and Curvature

The PMVS algorithm [3] optimizes the normal of each patch by simply minimizing the photometric discrepancy score, constraining the number of degrees of freedom to two (yaw and pitch). However, the normals optimized by PMVS are not good enough for urban scenes, because of the large distance between the building surfaces and the optical center of the camera, and because the initialization of the PMVS algorithm works poorly for large-scale urban buildings. Though many different normal estimation methods exist, the one we use in this paper is one of the simplest, and is formulated as follows. The problem of determining the normal at a point on the surface is approximated by the problem of estimating the normal of a plane tangent to the surface, which in turn reduces to an analysis of the eigenvectors and eigenvalues of a covariance matrix created from the nearest neighbors of the query point. More specifically, for each point P_i, we assemble the covariance matrix C as follows:

    C = (1/k) Σ_{i=1}^{k} (P_i − P̄)(P_i − P̄)^T,    C · V_j = λ_j V_j,    j ∈ {0, 1, 2}    (1)

where k is the number of point neighbors considered in the neighborhood of P_i, P̄ represents the 3D centroid of the nearest neighbors, λ_j is the j-th eigenvalue of the covariance matrix, and V_j is the j-th eigenvector. The point cloud with normals is shown in Fig. 2(c). The surface curvature change is estimated from the eigenvalues as:

    σ = λ_0 / (λ_0 + λ_1 + λ_2),    where λ_0 < λ_1 < λ_2    (2)

2.3 Extracting Dominant Directions

The automatic growing algorithm of the sub-structures will require the dominant axes of the buildings. Hence, we employ a simple greedy algorithm using the normal estimates N_i recovered above (see [7, 8, 9] for similar ideas). We compute a histogram of normal directions over a unit sphere subdivided into 1000 bins, and set the first dominant axis d_1 to the average of the normals within the largest bin.
Next, we find the largest bin lying 80 to 100 degrees away from d_1 and set the second dominant axis d_2 to the average normal within that bin. Finally, the third dominant axis d_3 is computed in the same way. We allow some deviation from orthogonality to compensate for possible errors in normal estimation, and to handle buildings that are not themselves composed of exactly orthogonal planes.
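The eigen-analysis of Eqs. (1) and (2) translates directly into code. The following NumPy sketch (a straightforward implementation under our own choice of neighborhood size; not the authors' code) estimates per-point normals and curvature changes:

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals_and_curvature(points, k=30):
    """PCA normal estimation: for each point, eigen-decompose the covariance
    of its k nearest neighbors (Eq. 1).  The eigenvector of the smallest
    eigenvalue approximates the surface normal; the curvature change is
    lambda_0 / (lambda_0 + lambda_1 + lambda_2) (Eq. 2)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    curvature = np.empty(len(points))
    for i, nbrs in enumerate(idx):
        nbr_pts = points[nbrs]
        centered = nbr_pts - nbr_pts.mean(axis=0)
        cov = centered.T @ centered / k                 # Eq. (1)
        lam, vec = np.linalg.eigh(cov)                  # eigenvalues ascending
        normals[i] = vec[:, 0]                          # smallest-eigenvalue axis
        curvature[i] = lam[0] / max(lam.sum(), 1e-12)   # Eq. (2)
    return normals, curvature
```

The greedy dominant-direction extraction would then histogram these normals over a binned unit sphere and average the normals of the three strongest, roughly orthogonal bins.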
3 Sub-structure Reconstruction

While in recent years many techniques have been developed to detect repeated parts in models [5, 11, 12, 13] and regularity directly on 3D geometry [6], in this paper we attempt to learn the repetition pattern with high-level user guidance, owing to the occlusions and missing data in the generated point clouds. Given the generally poor quality of the data, we argue that some level of user interaction is required to understand the semantics of the building model. Hence, we first select the points that imply the sub-structures with our interactive tool. Then, a region growing algorithm is utilized to simplify the reconstruction of the sub-structures.

3.1 Region Growing Algorithm

The purpose of the region growing algorithm is to merge the points that are close enough in terms of a smoothness constraint. The output of this algorithm is thus a set of clusters, where each cluster is a set of points that are considered to be part of the same smooth surface. We first sort the points by their curvature values. This is necessary because each region begins its growth from the point with the minimum curvature value, which is located in the flattest area. Then, as long as there are unlabeled points in the cloud, we pick the unlabeled point with the minimum curvature value and start growing a region. First, the picked point is added to a set called seeds. Then, for every seed point we find its neighboring points and test the angle between the normal of the current seed point and each neighbor's normal. If the angle is less than a threshold θ, the neighbor is added to the current region. After that, every neighbor is tested for its curvature value: if the curvature is less than a threshold δ, the point is also added to the seeds.
Finally, the current seed is removed from the seed set; when the seed set becomes empty, the current region has finished growing, and we repeat the process from the beginning. The overall pseudo code for this growing algorithm is given in Table 1.

3.2 Reconstruction

After the region growing algorithm, we employ an efficient RANSAC method for point-cloud shape detection in each region of the sub-structures. The RANSAC algorithm we use is presented in [10]. It decomposes the point cloud into a concise, hybrid structure of inherent shapes and a set of remaining points. The method can detect planes, spheres, cylinders, cones, and tori. However, because of functional requirements and constraints, urban buildings are mostly composed of flat or near-planar faces; consequently, RANSAC with a plane model is usually sufficient. Regions with too few points can be rejected, or high-level user guidance can determine which RANSAC model to adopt. Eventually, the sub-structure is generated by enforcing alignment among the RANSAC models. This step resolves the inconsistencies we wish to repair, e.g., model intersections, small gaps, and other forms of misalignment.
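The region-growing loop of Table 1 can be turned into working code almost line by line. In the following NumPy/SciPy sketch (our own illustration, not the authors' implementation; the k-nearest-neighbor choice for the neighbor function Ω is an assumption), a region starts at the flattest remaining point and absorbs neighbors whose normals agree within the angle threshold:

```python
import numpy as np
from scipy.spatial import cKDTree

def region_grow(points, normals, curvature, k=30,
                angle_thresh=np.deg2rad(3.0), curv_thresh=1.0):
    """Greedy region growing: seed at the point of minimum curvature, add
    neighbors whose normals deviate by less than angle_thresh, and promote
    low-curvature members to new seeds (cf. Table 1)."""
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k)        # neighbor function Omega(.)
    unlabeled = set(range(len(points)))
    regions = []
    while unlabeled:
        # start the next region from the flattest remaining point
        seed0 = min(unlabeled, key=lambda i: curvature[i])
        region, seeds = [seed0], [seed0]
        unlabeled.remove(seed0)
        while seeds:
            s = seeds.pop()
            for j in nbrs[s]:
                if j not in unlabeled:
                    continue
                cos_a = abs(np.dot(normals[s], normals[j]))
                if np.arccos(np.clip(cos_a, -1.0, 1.0)) < angle_thresh:
                    region.append(j)
                    unlabeled.remove(j)
                    if curvature[j] < curv_thresh:   # promote to seed
                        seeds.append(j)
        regions.append(region)
    return regions
```

Two coplanar but spatially separated patches end up in different regions because the k-nearest-neighbor graph never bridges the gap between them.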
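Since plane models dominate in urban buildings, the shape-detection step can be approximated by a plain RANSAC plane fit. The sketch below is a simplified stand-in for the efficient RANSAC of [10], not the authors' implementation:

```python
import numpy as np

def ransac_plane(points, iters=200, dist_thresh=0.05, rng=None):
    """Vanilla RANSAC plane fit: sample 3 points, build the plane through
    them, count inliers within dist_thresh, and keep the best model.
    Returns ((unit normal n, offset d), inlier mask) with n.x + d = 0."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iters):
        i, j, l = rng.choice(len(points), 3, replace=False)
        n = np.cross(points[j] - points[i], points[l] - points[i])
        norm = np.linalg.norm(n)
        if norm < 1e-12:          # degenerate (collinear) sample
            continue
        n = n / norm
        d = -np.dot(n, points[i])
        inliers = np.abs(points @ n + d) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model, best_inliers
```

In a full pipeline this would be run per region, with the surviving inliers removed before detecting the next shape.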
Table 1. The pseudo code for the region growing algorithm

Input:  point cloud {P}, point normals {N}, point curvatures {C},
        neighbor finding function Ω(·), curvature threshold δ, angle threshold θ
Output: region list {R}

{R} ← ∅, available points list {A} ← {P_1, ..., P_n}
while {A} is not empty do
    current region {R_c} ← ∅, current seeds {S_c} ← ∅
    P_min ← point with minimum curvature in {A}
    {S_c} ← {S_c} ∪ {P_min}, {R_c} ← {R_c} ∪ {P_min}, {A} ← {A} \ {P_min}
    for i = 0 to size({S_c}) do
        find the nearest neighbors of the current seed point: {B_c} ← Ω(S_c{i})
        for j = 0 to size({B_c}) do
            current neighbor point P_j ← B_c{j}
            if {A} contains P_j and cos⁻¹(N{S_c{i}} · N{B_c{j}}) < θ then
                {R_c} ← {R_c} ∪ {P_j}, {A} ← {A} \ {P_j}
                if C{P_j} < δ then
                    {S_c} ← {S_c} ∪ {P_j}
                end if
            end if
        end for
    end for
    add the current region to the global region list: {R} ← {R} ∪ {R_c}
end while

4 Automatic Continuation

Our sub-structure automatic continuation algorithm is inspired by the recent work in [2]. However, in [2], essential interactive operations are used to create complex structures through the propagation of a repeated structure. In our case, the automatic continuation of a sub-structure is constrained along the dominant directions computed in Section 2, which minimizes the required user interaction. Also, [2] adopts an optimization which balances data-fitting and contextual forces. We continue this trend and present our data-fitting and contextual terms below. Given a sub-structure S and the point cloud P, the data-fitting term, also known as the fitting error, is computed by measuring the one-sided Euclidean distance from the sub-structure to the points [14]. The contextual sub-structure S′ required to define the contextual force is a previously positioned sub-structure in the reconstruction sequence. The contextual term is defined as the sum of two terms: the interval term and the alignment term.
The interval term I(S, S′) measures how well the interval length between S and S′ agrees with the expected interval length due to regularity constraints. The alignment term A(S, S′) measures how well corresponding edges of S and S′ align. These two terms are normalized independently and weighted equally to form the contextual term. The automatic growing algorithm is achieved by finding an optimal linear transformation T*, consisting of a translation and a scaling of the sub-structure S, that minimizes a weighted sum of the data-fitting and contextual terms:

    T* = arg min_T { ω D(T(S), P) + (1 − ω) C(T(S), S′) }    (3)

where ω is a weight which balances the data-fitting and contextual forces. The choice of ω should correlate with an assessment of the data quality and contextual regularity of the urban building model to be reconstructed. In our method, the default setting of ω is 0.5; nevertheless, the user can adjust it manually according to the perceived data quality and contextual regularity.

5 Experimental Results

The proposed reconstruction method was tested on both a publicly available dataset, Hall [4], and on our own images of Building F at Tongji University, taken by a cell phone camera. The Hall dataset consists of 61 images, which are fed to the PMVS algorithm to generate a point cloud. This results in a total of points; in our case we use only the high-confidence points, thus we have points after the MVS preprocessing step. We set θ = and δ = 1.0 for the thresholds used in the region growing algorithm. The interval term used in automatic continuation is initialized as the length of the sub-structure's projection along the dominant direction.

Fig. 3. Top left: a target image. Top middle: the point cloud of the sub-structure. Top right: sub-structure reconstruction. Middle: sub-structure growing along the dominant direction. Bottom: the reconstructed model with texture mapping.

Fig. 4. Top left: a target image. Top right: the point cloud of the sub-structure. Bottom: the reconstructed model with texture mapping.

Our own
Building F dataset consists of 15 images taken by a cell phone camera, and some of the images contain occlusions, as shown in the top left of Fig. 4. The reconstruction results are shown in Fig. 3 and Fig. 4, which illustrate the convenience and robustness of our proposed method.

6 Conclusions and Future Work

In this paper, we have demonstrated a multi-view stereopsis based 3D urban scene reconstruction method that utilizes the large-scale repetitions and self-similarities of urban buildings. Our method is remarkably robust to buildings lacking texture, and is also able to work with poor-quality point clouds by exploiting properties of urban scenes. In future work, we plan to incorporate any parametric primitive that can be detected with standard RANSAC into our sub-structure reconstruction step. This may lead to better models and exhibit the diversity of modern urban scenes.

Acknowledgement. This work is supported by the Fundamental Research Funds for the Central Universities under grant no. , the Natural Science Foundation of China under grant no. , and the Innovation Program of Shanghai Municipal Education Commission under grant no. 12ZZ029.

References

1. Zheng, Q., Sharf, A., Wan, G., Li, Y., Mitra, N.J., Cohen-Or, D., Chen, B.: Non-local Scan Consolidation for 3D Urban Scenes. ACM Transactions on Graphics 29, 94 (2010)
2. Nan, L., Sharf, A., Zhang, H., Cohen-Or, D., Chen, B.: SmartBoxes for Interactive Urban Reconstruction. ACM Transactions on Graphics 29, 93 (2010)
3. Furukawa, Y., Ponce, J.: Accurate, Dense, and Robust Multi-View Stereopsis. In: CVPR, pp. 1-8 (2007)
4. Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Manhattan-world Stereo. In: CVPR (2009)
5. Liu, J., Musialski, P., Wonka, P., Ye, J.: Tensor Completion for Estimating Missing Values in Visual Data. In: ICCV (2009)
6. Pauly, M., Mitra, N.J., Wallner, J., Pottmann, H., Guibas, L.J.: Discovering Structural Regularity in 3D Geometry.
ACM Transactions on Graphics 27, 43 (2008)
7. Coorg, S., Teller, S.: Extracting Textured Vertical Facades from Controlled Close-Range Imagery. In: CVPR (1999)
8. Pollefeys, M., Nister, D., Frahm, J.M., Akbarzadeh, A., Mordohai, P., Clipp, B., Engels, C., Gallup, D., Kim, S.J., Merrell, P.: Detailed Real-Time Urban 3D Reconstruction from Video. IJCV (2008)
9. Werner, T., Zisserman, A.: New Techniques for Automated Architectural Reconstruction from Photographs. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part II. LNCS, vol. 2351. Springer, Heidelberg (2002)
10. Schnabel, R., Wahl, R., Klein, R.: Efficient RANSAC for Point-Cloud Shape Detection. Computer Graphics Forum 26 (2007)
11. Mitra, N.J., Guibas, L.J., Pauly, M.: Partial and Approximate Symmetry Detection for 3D Geometry. ACM Transactions on Graphics 25 (2006)
12. Korah, T., Rasmussen, C.: Spatiotemporal Inpainting for Recovering Texture Maps of Occluded Building Facades. IEEE Transactions on Image Processing 16 (2007)
13. Pauly, M., Mitra, N.J., Wallner, J., Pottmann, H., Guibas, L.J.: Discovering Structural Regularity in 3D Geometry. ACM Transactions on Graphics 27, 43 (2008)
14. Nan, L., Xie, K., Sharf, A.: A Search-Classify Approach for Cluttered Indoor Scene Understanding. ACM Transactions on Graphics 31, 137 (2012)
More informationGraph Matching Iris Image Blocks with Local Binary Pattern
Graph Matching Iris Image Blocs with Local Binary Pattern Zhenan Sun, Tieniu Tan, and Xianchao Qiu Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of
More informationImproving PMVS Algorithm for 3D Scene Reconstruction from Sparse Stereo Pairs
Improving PMVS Algorithm for 3D Scene Reconstruction from Sparse Stereo Pairs Bo Li 1,Y.V.Venkatesh 2, Ashraf Kassim 3,andYijuanLu 1 1 Department of Computer Science, Texas State University, San Marcos,
More informationSmartBoxes for Interactive Urban Reconstruction
SmartBoxes for Interactive Urban Reconstruction Liangliang Nan 1 Andrei Sharf 1 Hao Zhang 2 Daniel Cohen-Or 3 Baoquan Chen 1 1 Shenzhen Institutes of Advanced Technology (SIAT), China 2 Simon Fraser Univ.
More informationFeature Based Registration - Image Alignment
Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros http://graphics.cs.cmu.edu/courses/15-463/2007_fall/463.html
More informationMerging the Unmatchable: Stitching Visually Disconnected SfM Models
This is a preprint for the paper accepted for publication in ICCV 2015. c 2015 IEEE Merging the Unmatchable: Stitching Visually Disconnected SfM Models Andrea Cohen, Torsten Sattler, Marc Pollefeys Department
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational
More informationViewpoint Invariant Features from Single Images Using 3D Geometry
Viewpoint Invariant Features from Single Images Using 3D Geometry Yanpeng Cao and John McDonald Department of Computer Science National University of Ireland, Maynooth, Ireland {y.cao,johnmcd}@cs.nuim.ie
More informationLecture 10: Multi-view geometry
Lecture 10: Multi-view geometry Professor Stanford Vision Lab 1 What we will learn today? Review for stereo vision Correspondence problem (Problem Set 2 (Q3)) Active stereo vision systems Structure from
More informationSOME stereo image-matching methods require a user-selected
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 3, NO. 2, APRIL 2006 207 Seed Point Selection Method for Triangle Constrained Image Matching Propagation Qing Zhu, Bo Wu, and Zhi-Xiang Xu Abstract In order
More information3D Reconstruction Using an n-layer Heightmap
3D Reconstruction Using an n-layer Heightmap David Gallup 1, Marc Pollefeys 2, and Jan-Michael Frahm 1 1 Department of Computer Science, University of North Carolina {gallup,jmf}@cs.unc.edu 2 Department
More informationLayered Scene Decomposition via the Occlusion-CRF Supplementary material
Layered Scene Decomposition via the Occlusion-CRF Supplementary material Chen Liu 1 Pushmeet Kohli 2 Yasutaka Furukawa 1 1 Washington University in St. Louis 2 Microsoft Research Redmond 1. Additional
More informationImage Parallax based Modeling of Depth-layer Architecture
Image Parallax based Modeling of Depth-layer Architecture Yong Hu, Bei Chu, Yue Qi State Key Laboratory of Virtual Reality Technology and Systems, Beihang University School of New Media Art and Design,
More informationCRF Based Point Cloud Segmentation Jonathan Nation
CRF Based Point Cloud Segmentation Jonathan Nation jsnation@stanford.edu 1. INTRODUCTION The goal of the project is to use the recently proposed fully connected conditional random field (CRF) model to
More informationCamera Registration in a 3D City Model. Min Ding CS294-6 Final Presentation Dec 13, 2006
Camera Registration in a 3D City Model Min Ding CS294-6 Final Presentation Dec 13, 2006 Goal: Reconstruct 3D city model usable for virtual walk- and fly-throughs Virtual reality Urban planning Simulation
More informationMarcel Worring Intelligent Sensory Information Systems
Marcel Worring worring@science.uva.nl Intelligent Sensory Information Systems University of Amsterdam Information and Communication Technology archives of documentaries, film, or training material, video
More informationGeometry based Repetition Detection for Urban Scene
Geometry based Repetition Detection for Urban Scene Changchang Wu University of Washington Jan Michael Frahm UNC Chapel Hill Marc Pollefeys ETH Zürich Related Work Sparse Feature Matching [Loy et al. 06,
More informationFeatures Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)
Features Points Andrea Torsello DAIS Università Ca Foscari via Torino 155, 30172 Mestre (VE) Finding Corners Edge detectors perform poorly at corners. Corners provide repeatable points for matching, so
More informationBSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy
BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving
More informationSpecular 3D Object Tracking by View Generative Learning
Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp
More informationLocal Feature Detectors
Local Feature Detectors Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Slides adapted from Cordelia Schmid and David Lowe, CVPR 2003 Tutorial, Matthew Brown,
More informationMultiview Reconstruction
Multiview Reconstruction Why More Than 2 Views? Baseline Too short low accuracy Too long matching becomes hard Why More Than 2 Views? Ambiguity with 2 views Camera 1 Camera 2 Camera 3 Trinocular Stereo
More informationFast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1
Acta Technica 62 No. 3B/2017, 141 148 c 2017 Institute of Thermomechanics CAS, v.v.i. Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1 Zhang Fan 2, 3, Tan Yuegang
More informationContinuous Multi-View Tracking using Tensor Voting
Continuous Multi-View Tracking using Tensor Voting Jinman Kang, Isaac Cohen and Gerard Medioni Institute for Robotics and Intelligent Systems University of Southern California {jinmanka, icohen, medioni}@iris.usc.edu
More informationStructured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov
Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 12 130228 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Panoramas, Mosaics, Stitching Two View Geometry
More informationSpecular Reflection Separation using Dark Channel Prior
2013 IEEE Conference on Computer Vision and Pattern Recognition Specular Reflection Separation using Dark Channel Prior Hyeongwoo Kim KAIST hyeongwoo.kim@kaist.ac.kr Hailin Jin Adobe Research hljin@adobe.com
More informationRansac Based Out-of-Core Point-Cloud Shape Detection for City-Modeling
Ransac Based Out-of-Core Point-Cloud Shape Detection for City-Modeling Ruwen Schnabel, Roland Wahl, Reinhard Klein Institut für Informatik II, Universität Bonn Römerstr. 117 53117 Bonn {schnabel,wahl,rk}@cs.uni-bonn.de
More informationManhattan-world Urban Reconstruction from Point Clouds
Manhattan-world Urban Reconstruction from Point Clouds Minglei Li 1,2, Peter Wonka 1, and Liangliang Nan 1 1 Visual Computing Center, KAUST, Saudi Arabia {mingleili87, liangliang.nan, pwonka}@gmail.com
More informationLocal features and image matching. Prof. Xin Yang HUST
Local features and image matching Prof. Xin Yang HUST Last time RANSAC for robust geometric transformation estimation Translation, Affine, Homography Image warping Given a 2D transformation T and a source
More informationA Statistical Consistency Check for the Space Carving Algorithm.
A Statistical Consistency Check for the Space Carving Algorithm. A. Broadhurst and R. Cipolla Dept. of Engineering, Univ. of Cambridge, Cambridge, CB2 1PZ aeb29 cipolla @eng.cam.ac.uk Abstract This paper
More information3D Reconstruction of a Hopkins Landmark
3D Reconstruction of a Hopkins Landmark Ayushi Sinha (461), Hau Sze (461), Diane Duros (361) Abstract - This paper outlines a method for 3D reconstruction from two images. Our procedure is based on known
More information3D Perception. CS 4495 Computer Vision K. Hawkins. CS 4495 Computer Vision. 3D Perception. Kelsey Hawkins Robotics
CS 4495 Computer Vision Kelsey Hawkins Robotics Motivation What do animals, people, and robots want to do with vision? Detect and recognize objects/landmarks Find location of objects with respect to themselves
More informationREFINEMENT OF COLORED MOBILE MAPPING DATA USING INTENSITY IMAGES
REFINEMENT OF COLORED MOBILE MAPPING DATA USING INTENSITY IMAGES T. Yamakawa a, K. Fukano a,r. Onodera a, H. Masuda a, * a Dept. of Mechanical Engineering and Intelligent Systems, The University of Electro-Communications,
More informationInstance-level recognition
Instance-level recognition 1) Local invariant features 2) Matching and recognition with local features 3) Efficient visual search 4) Very large scale indexing Matching of descriptors Matching and 3D reconstruction
More informationHow to Compute the Pose of an Object without a Direct View?
How to Compute the Pose of an Object without a Direct View? Peter Sturm and Thomas Bonfort INRIA Rhône-Alpes, 38330 Montbonnot St Martin, France {Peter.Sturm, Thomas.Bonfort}@inrialpes.fr Abstract. We
More informationStereo and Epipolar geometry
Previously Image Primitives (feature points, lines, contours) Today: Stereo and Epipolar geometry How to match primitives between two (multiple) views) Goals: 3D reconstruction, recognition Jana Kosecka
More informationPreviously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011
Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition
More informationRegistration of Dynamic Range Images
Registration of Dynamic Range Images Tan-Chi Ho 1,2 Jung-Hong Chuang 1 Wen-Wei Lin 2 Song-Sun Lin 2 1 Department of Computer Science National Chiao-Tung University 2 Department of Applied Mathematics National
More informationTextured Mesh Surface Reconstruction of Large Buildings with Multi-View Stereo
The Visual Computer manuscript No. (will be inserted by the editor) Textured Mesh Surface Reconstruction of Large Buildings with Multi-View Stereo Chen Zhu Wee Kheng Leow Received: date / Accepted: date
More informationContinuous Multi-Views Tracking using Tensor Voting
Continuous Multi-Views racking using ensor Voting Jinman Kang, Isaac Cohen and Gerard Medioni Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA 90089-073.
More informationHomographies and RANSAC
Homographies and RANSAC Computer vision 6.869 Bill Freeman and Antonio Torralba March 30, 2011 Homographies and RANSAC Homographies RANSAC Building panoramas Phototourism 2 Depth-based ambiguity of position
More informationSimilarity Image Retrieval System Using Hierarchical Classification
Similarity Image Retrieval System Using Hierarchical Classification Experimental System on Mobile Internet with Cellular Phone Masahiro Tada 1, Toshikazu Kato 1, and Isao Shinohara 2 1 Department of Industrial
More informationA Desktop 3D Scanner Exploiting Rotation and Visual Rectification of Laser Profiles
A Desktop 3D Scanner Exploiting Rotation and Visual Rectification of Laser Profiles Carlo Colombo, Dario Comanducci, and Alberto Del Bimbo Dipartimento di Sistemi ed Informatica Via S. Marta 3, I-5139
More informationCS4495/6495 Introduction to Computer Vision
CS4495/6495 Introduction to Computer Vision 9C-L1 3D perception Some slides by Kelsey Hawkins Motivation Why do animals, people & robots need vision? To detect and recognize objects/landmarks Is that a
More informationK-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors
K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering waset.org/publication/0007607
More informationLOAM: LiDAR Odometry and Mapping in Real Time
LOAM: LiDAR Odometry and Mapping in Real Time Aayush Dwivedi (14006), Akshay Sharma (14062), Mandeep Singh (14363) Indian Institute of Technology Kanpur 1 Abstract This project deals with online simultaneous
More informationMultiple View Geometry
Multiple View Geometry Martin Quinn with a lot of slides stolen from Steve Seitz and Jianbo Shi 15-463: Computational Photography Alexei Efros, CMU, Fall 2007 Our Goal The Plenoptic Function P(θ,φ,λ,t,V
More informationInstance-level recognition part 2
Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,
More informationCombining Appearance and Topology for Wide
Combining Appearance and Topology for Wide Baseline Matching Dennis Tell and Stefan Carlsson Presented by: Josh Wills Image Point Correspondences Critical foundation for many vision applications 3-D reconstruction,
More information3D Photography: Stereo
3D Photography: Stereo Marc Pollefeys, Torsten Sattler Spring 2016 http://www.cvg.ethz.ch/teaching/3dvision/ 3D Modeling with Depth Sensors Today s class Obtaining depth maps / range images unstructured
More informationMulti-View Stereo for Static and Dynamic Scenes
Multi-View Stereo for Static and Dynamic Scenes Wolfgang Burgard Jan 6, 2010 Main references Yasutaka Furukawa and Jean Ponce, Accurate, Dense and Robust Multi-View Stereopsis, 2007 C.L. Zitnick, S.B.
More information