Map-Enhanced UAV Image Sequence Registration

Yuping Lin, Qian Yu, Gerard Medioni
Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781
{yupingli, qianyu, medioni}@usc.edu

Abstract

Registering consecutive images from an airborne sensor into a mosaic is an essential tool for image analysts. Strictly local methods tend to accumulate errors, resulting in distortion. We propose here to use a reference image (such as a high-resolution map image) to overcome this limitation. In our approach, we register a frame in an image sequence to the map using both frame-to-frame registration and frame-to-map registration iteratively. In frame-to-frame registration, a frame is registered to its previous frame. With its previous frame already registered to the map in the previous iteration, we can derive an estimated transformation from the frame to the map. In frame-to-map registration, we warp the frame to the map by this transformation to compensate for scale and rotation differences, and then perform area-based matching using mutual information to find correspondences between this warped frame and the map. From these correspondences, we derive a transformation that further registers the warped frame to the map. With this two-step registration, the errors between consecutive frames do not accumulate. We present results on real image sequences from a hot air balloon.

1. Introduction

Geo-registration is a very useful capability, widely applicable to UAVs (Unmanned Aerial Vehicles) for navigation, for geo-locating a target, or even for refining a map. Feature-based registration [1][5] has made good progress in recent years. Building on image registration, mosaicing of image sequences can be done by computing the transformations between consecutive frames. To take the accumulated error into account, bundle adjustment [6] is usually employed as a global error-minimizing approach.
However, for long sequences with thousands of frames, bundle adjustment is not feasible in terms of computation. Moreover, offline bundle adjustment is not appropriate for many tasks. To perform image mosaicing in a progressive manner while still preserving accuracy, we propose to use an associated map image as a global reference. A two-step procedure is applied to register a UAV image sequence to the global map. In the first step, we register consecutive frames by estimating the best homography to align the feature points in each frame. Using the homography obtained from the first step, we roughly align the UAV image with the global map. This first step provides an initialization that essentially compensates for the scale and rotation between the UAV image and the map. In the second step, we register the roughly aligned UAV image to the map. A similar scenario has been presented in [8]. In area-based matching, MSE [12] or normalized correlation [13] is used to determine correspondences between the UAV image and the reference image. However, the UAV images are captured at different times and from different views with respect to the satellite image. The color, illumination, and dynamic content (such as vehicles, trees, and shadows) can be very different, so MSE or normalized correlation is not robust enough in such cases. We propose an approach that applies mutual information [4] to establish correspondences. Mutual information has been successfully applied to establishing correspondences between images of different modalities, especially in medical image processing. Our experiments show that mutual information does provide strong enough correspondences after roughly compensating for scale and rotation. Given the correspondences between the roughly aligned UAV image and the map, we derive a homography that further registers the roughly aligned UAV image to the map.
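The two-step procedure described above can be sketched as a short control loop. This is a minimal illustration, not the authors' implementation; `frame_to_frame` and `refine_to_map` are hypothetical caller-supplied callbacks standing in for the feature-based and mutual-information steps, and the 50-frame refinement interval matches the experiments reported later.

```python
import numpy as np

def register_sequence(frames, H0_map, frame_to_frame, refine_to_map,
                      refine_every=50):
    """Two-step registration sketch.

    frame_to_frame(cur, prev) -> 3x3 homography H_{i,i-1}  (hypothetical callback)
    refine_to_map(cur, H_est) -> 3x3 correction H_eps      (hypothetical callback)
    Returns the list of frame-to-map homographies H_{i,M}.
    """
    H_map = [np.asarray(H0_map, dtype=float)]
    for i in range(1, len(frames)):
        H_ff = frame_to_frame(frames[i], frames[i - 1])   # step 1: frame-to-frame
        H_est = H_map[-1] @ H_ff                          # chain onto the map
        if i % refine_every == 0:                         # step 2: frame-to-map
            H_est = refine_to_map(frames[i], H_est) @ H_est
        H_map.append(H_est)
    return H_map
```

Chaining alone (the first step) multiplies per-frame errors; the periodic map refinement is what keeps the estimate anchored.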
By linking this homography with the initial homography from the first step, we can register the UAV images to the map without accumulating registration errors.

This paper is organized as follows. In Section 2, we formulate our problem and define our notation. In Section 3, we present the two-step procedure for geo-registration. In Section 4, we compare our geo-registration results with and without refinement; experiments show that the refinement procedure significantly reduces the accumulated error. Discussion and future work are presented in Section 5.

2. Problem Formulation and Issues

We start by defining the symbols used in this paper. We are given a sequence of UAV images I_0, I_1, ..., I_n, and a map (usually a satellite image) M. We assume the scene depth is small with respect to the distance from the UAV camera, so the transformation between two UAV images can be represented by a homography. The transformation between a UAV image and the map is also represented as a homography. Let H_{i,j} denote the homography from I_i to I_j, and H_{i,M} the homography from I_i to M; that is, H_{i,j} I_i = I_j and H_{i,M} I_i = M_i, where M_i is the region of M onto which I_i projects. Note that H_{j,i} = H_{i,j}^{-1}. Our goal is to derive accurate estimates of H_{0,M}, ..., H_{n,M} so that I_1, ..., I_n are registered to M and form a mosaic without distortion (Figure 1).

Figure 1. For each I_i, derive H_{i,M} so that all frames register to the map M and form a seamless mosaic.

However, the map and the images are taken at different times, from different sensors, and from different viewpoints, and may contain different dynamic content (such as vehicles or shadows). As a result, it is difficult to simply match each incoming image to the map. Instead, we build a partial local mosaic, then register it to the map in an iterative manner.

3. Approach

Figure 2 illustrates the flow chart of our approach. Each frame I_i in the UAV image sequence is first registered to the previous frame to derive H_{i,i-1}. In the second step, we estimate H_{i,M} as H_{i-1,M} H_{i,i-1}, denoted H̃_{i,M}. This estimated homography warps I_i to a partial local mosaic M̃_i in the map, namely M̃_i = H̃_{i,M} I_i. Then we register M̃_i to the map at M_i and derive H_ε, namely M_i = H_ε M̃_i. Finally, the actual homography H_{i,M} that registers I_i to M_i on the map is derived as H_{i,M} = H_ε H̃_{i,M}.

Figure 2. Flow chart of our approach.

In the following sections, we first describe the method we use to register I_i to the previous image I_{i-1}. We then introduce our method to fine-tune H̃_{i,M} so that I_i is mapped to M more accurately and the registration error does not accumulate along the registration process.

3.1. Registration of Consecutive Images

To compute H_{i,i-1}, we match features and then perform RANSAC [3] outlier filtering. After trying many kinds of features, we selected SIFT (Scale Invariant Feature Transform) [1] features. SIFT features are invariant to image scale and rotation, and provide robust descriptions across changes in 3D viewpoint.
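The estimation step above (fit a homography to matched features, reject outliers with RANSAC at a 1-pixel tolerance) can be sketched without an actual SIFT front end. The DLT fit and the synthetic correspondences below are our own illustration under that assumption, not the paper's code.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography (DLT) from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)          # null-space vector, reshaped to 3x3
    return H / H[2, 2]                # normalize so H[2,2] == 1

def project(H, pts):
    """Apply homography H to an (N, 2) point array."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iter=500, tol=1.0, seed=0):
    """RANSAC: fit on random 4-point samples, keep the model with the most
    inliers (tol = 1 pixel as in the paper), then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        if not np.isfinite(H).all():  # skip degenerate samples
            continue
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```

In practice `src`/`dst` would come from matched SIFT keypoints in I_i and I_{i-1}; the small-window constraint from the next section would simply prune the candidate matches beforehand.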
In the feature matching step, we use nearest-neighbor matching [2]. Since the translation and rotation of the UAV camera between consecutive frames are small, we can assume that matched features lie within a small window, which adds a further constraint on feature matching. At a resolution of 720×480, we typically generate 2000 correspondence pairs. Finally, we use RANSAC to filter outliers among the set of correspondences (we use an inlier tolerance of 1 pixel) and derive H_{i,i-1}.

Figure 3. Initial registration between the UAV images and the map.

Given H_{i,i-1} and H_{0,M}, we can roughly register the UAV image to the map by estimating H_{i,M} as:

    H̃_{i,M} = H_{i-1,M} H_{i,i-1} = H_{0,M} ∏_{k=1}^{i} H_{k,k-1}    (1)

This shows that if there is a subtle transformation error in each H_{k,k-1}, these errors are multiplied together and result in a significant error, so later UAV images could be registered to a very wrong area of the map. As shown in Figure 3, the registration is not perfect. Thus, we need a way to establish correspondences between the UAV image and the map, and to refine the homography using these correspondences.

3.2. UAV-to-Map Registration

Registering an aerial image to a map is a challenging problem [10][11]. Due to significant differences in lighting conditions, resolution, and 3D viewpoint between the UAV image and the map, the same point may yield quite different SIFT descriptors in the two images. Therefore, poor feature matching and poor registration can be expected. Since it is difficult to register a UAV image to the map directly, we make use of H_{i,i-1} derived from UAV-to-UAV registration, estimate H̃_{i,M} = H_{i-1,M} H_{i,i-1}, and then fine-tune it. Let M̃_i denote the warped image of I_i under H̃_{i,M} (Figure 2, Step 2). Our goal is to derive a homography H_ε that registers M̃_i to the map at M_i (Figure 2, Step 3), so that the image is accurately aligned to the map. The advantage of this approach is that, with M̃_i roughly aligned to the map, we can perform a local search for correspondences at the same scale. Therefore, the matching ambiguity and the computation time are far smaller than when registering I_i to the map directly.

3.2.1. Finding Correspondences between the UAV Image and the Map

To derive H_ε, we want to find correspondences between M̃_i and the map area that M̃_i spans. However, M̃_i is usually a smaller region than I_i (the map has lower resolution), which means M̃_i preserves less information than I_i. Hence we proceed in the reverse direction. As shown in Figure 4, let U_i be the map image transformed back from the area that M̃_i spans, using H_{M,i}. Instead of finding correspondences between M̃_i and the map area it spans, we find correspondences between I_i and U_i.

Figure 4. U_i denotes the map image transformed back from the region that M̃_i spans, using H_{M,i}. P and P_U are points located at the same coordinates in I_i and U_i respectively. S_P and S_{P_U} are two image patches of the same size centered at P and P_U respectively, where P′ is the point corresponding to P_U.

Let P and P_U be points located at the same coordinates in I_i and U_i respectively. With a good enough H̃_{i,M}, P_U should have its correspondence P′ in I_i close to P. P′ is determined as the point whose surrounding UAV image patch is most similar to the map image patch centered at P_U. We use mutual information [4] as the similarity measure. The mutual information of two random variables is a quantity that measures the dependence between them. Taking two images of the same size as the random variables, it measures how much information the two images share, that is, how much one image tells us about the other. It is a more meaningful criterion than measures such as cross-correlation or grey-value differences. Let S_{P_i} and S_{P_j} be two image patches of the same size centered at points P_i and P_j respectively.
Let M(S_{P_i}, S_{P_j}) denote the mutual information of S_{P_i} and S_{P_j}. We find P′ by searching for the pixel P_i in P's neighborhood that yields the greatest M(S_{P_U}, S_{P_i}).
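Mutual information between two equal-size patches can be computed from their joint intensity histogram. A minimal sketch follows; the bin count and the 8-bit intensity range are our assumptions, not values given in the paper.

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """MI (in nats) of two equal-size grayscale patches via the joint histogram."""
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()              # joint distribution p(a, b)
    px = pxy.sum(axis=1, keepdims=True)    # marginal of patch_a
    py = pxy.sum(axis=0, keepdims=True)    # marginal of patch_b
    nz = pxy > 0                           # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))
```

A patch compared against itself yields high MI, while an unrelated patch drops toward zero; that contrast between peak and floor is what the acceptance test in Section 3.2.2 exploits.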
Figure 5. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Blue dots and red dots represent good and poor correspondences respectively.

Figure 6. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Green dots and orange dots represent RANSAC inliers and outliers respectively.

3.2.2. Defining Good Correspondences

It may happen that all, or none, of the image patches centered on P's neighborhood pixels are similar to the image patch centered on P_U. In either case the maximum mutual information is meaningless, since the mutual information at other locations may be only slightly smaller. We need to filter out these unreliable correspondences so that the derived homography is accurate. Let P_k be the pixel in P's neighborhood with the smallest mutual information value. We consider P′ a good correspondence when M(S_{P_U}, S_{P′}) is significantly larger than M(S_{P_U}, S_{P_k}) (we require M(S_{P_U}, S_{P′}) > 2 M(S_{P_U}, S_{P_k})). Intuitively, this means that the image patch S_{P′} must be significantly more similar to S_{P_U} than S_{P_k} is. Figure 5 shows the results of extracting good correspondences: blue dots and red dots represent good and poor correspondences respectively. We can generate as many correspondences as we want by performing this operation on feature points in U_i. Here we use the Harris corner detector [5] to extract features instead of SIFT, because our purpose is only to obtain the locations of interest points in U_i; the Harris corner detector satisfies this need and is computationally cheaper than SIFT. Once we have enough correspondences, RANSAC is performed to filter outliers, and H_ε is derived. As shown in Figure 6, the colored dots in 6(b) are feature points extracted in U_i, the colored dots in 6(a) are their correspondences, and the green dots are the RANSAC inliers used to derive H_ε.
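The acceptance test above can be wrapped around any similarity map computed over the search window. A sketch, taking a 2D array of mutual-information scores (one per candidate pixel in the neighborhood) and returning the peak location only when it dominates the weakest response by the paper's factor of 2:

```python
import numpy as np

def good_correspondence(mi_map, ratio=2.0):
    """Return (row, col) of the MI peak in the search window, or None when
    the peak fails the dominance test M_max > ratio * M_min."""
    mi_map = np.asarray(mi_map, dtype=float)
    if mi_map.max() <= ratio * mi_map.min():
        return None                      # ambiguous window: reject this sample
    return np.unravel_index(np.argmax(mi_map), mi_map.shape)
```

A flat window (everything similar, or nothing similar) is rejected; a clearly peaked window yields the offset of P′ relative to the search region.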
Finally, H_{i,M} is derived as H_{i,M} = H_ε H̃_{i,M}, and I_i is registered to the map at M_i (Figure 2, Step 4).

4. Experimental Results

We show results on two data sets. The UAV image sequences are provided with latitude and longitude information. The satellite images are acquired from Google Earth. The size of each UAV image is 720×480. We manually register the first frame of each UAV sequence to its corresponding satellite image, i.e., H_{0,M} is given. In each UAV-to-map registration step, we select 200 Harris corners in the UAV image as samples, requiring the distance between any two features to be at least 10 pixels. For each sample, an image patch of size 100×100 is used to compute the mutual information, and the neighborhood region in which we search for the best match is a window of size 40×40. We found a window size of 100×100 to be appropriate for a discriminative local feature in our UAV image registration. Since the mutual information computation is very costly, we only perform a UAV-to-map registration every 50 frames. The results of case 1 with and without UAV-to-map registration are shown in 7(a) and 7(c) respectively. The results of case 2 with and without UAV-to-map registration are shown
in 7(b) and 7(d) respectively. Table 1 compares registration with and without UAV-to-map registration for the two examples.

                                          Example #1          Example #2
    Number of frames                      1000                900
                                          w. map   w/o map   w. map   w/o map
    Total registration time (minutes)     349      83        322      75
    Avg. error per pixel in the last
    frame vs. ground truth (pixels)       6.24     12.04     3.16     109.98

Table 1. Experimental results of the two examples.

5. Discussion and Future Work

We have proposed a new method to improve the accuracy of mosaicing. An additional map image is provided as a global reference to prevent accumulated error in the mosaic. We use mutual information as a similarity measure between two images to generate correspondences between an image and the map. The main limitation of our approach is the assumption that the scene structure is planar compared with the height of the camera. When the UAV camera is not high enough, the parallax between the UAV image and the map is strong, and the similarity measured by mutual information becomes meaningless. Moreover, even if all correspondences are accurate, they may not lie on the same plane, and a homography then cannot represent the transformation between the UAV image and the map. In our test cases, case 1 has stronger parallax than case 2. As shown in Figure 7, whenever a UAV image is registered to the map, case 1 is more likely to have images registered to a slightly offset location, while case 2 has images registered correctly. Our future work aims at grouping features that lie on the same plane. With correspondences restricted to features on the same plane, our planarity assumption becomes more valid and the UAV-to-map registration should be more accurate. In addition, we are studying faster algorithms to speed up the mutual information computation in the UAV-to-map registration step, so that the overall mosaicing process can be done in reasonable time.

Acknowledgments

This work was supported by grants from Lockheed Martin. We thank Mark Pritt for providing the data.
References

[1] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[2] M. Brown and D. G. Lowe. Recognising panoramas. International Conference on Computer Vision (ICCV 2003), pp. 1218-1225.
[3] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381-395, 1981.
[4] P. A. Viola. Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2):137-154, 1997.
[5] C. Harris and M. J. Stephens. A combined corner and edge detector. Alvey Vision Conference, pp. 147-152, 1988.
[6] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment: a modern synthesis. In Vision Algorithms: Theory and Practice, number 1883 in LNCS, pp. 298-373. Springer-Verlag, Corfu, Greece, September 1999.
[7] H. S. Sawhney and R. Kumar. True multi-image alignment and its application to mosaicing and lens distortion correction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3):235-243, 1999.
[8] L. G. Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325-376, 1992.
[9] R. Wildes, D. Hirvonen, S. Hsu, R. Kumar, W. Lehman, B. Matei, and W. Zhao. Video georegistration: algorithm and quantitative evaluation. Proc. ICCV, pp. 343-350, 2001.
[10] G. Medioni. Matching of a map with an aerial image. Proceedings of the 6th International Conference on Pattern Recognition, pp. 517-519, Munich, Germany, October 1982.
[11] X. Huang, Y. Sun, D. Metaxas, F. Sauer, and C. Xu. Hybrid image registration based on configural matching of scale-invariant salient region features. IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Vol. 11, p. 167, 2004.
[12] S. Hsu. Geocoded terrestrial mosaics using pose sensors and video registration. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.
[13] R. W. Cannata, M. Shah, S. G. Blask, and J. A. Van Workum. Autonomous video registration using sensor model parameter adjustments. Applied Imagery Pattern Recognition Workshop (AIPR 2000), Proceedings, 29th, pp. 215-222.
[14] D. Hirvonen, B. Matei, R. Wildes, and S. Hsu. Video to reference image alignment in the presence of sparse features and appearance change. IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.
Figure 7. (a),(c) Results of case 1 and case 2 with only registration of consecutive UAV images, respectively. (b),(d) Results of case 1 and case 2 with additional UAV-to-map registration every 50 frames, respectively.