Map-Enhanced UAV Image Sequence Registration


Yuping Lin, Qian Yu, Gerard Medioni
Computer Science Department, University of Southern California
Los Angeles, CA 90089-0781
{yupingli, qianyu, medioni}@usc.edu

Abstract

Registering consecutive images from an airborne sensor into a mosaic is an essential tool for image analysts. Strictly local methods tend to accumulate errors, resulting in distortion. We propose here to use a reference image (such as a high-resolution map image) to overcome this limitation. In our approach, we register a frame in an image sequence to the map using both frame-to-frame registration and frame-to-map registration iteratively. In frame-to-frame registration, a frame is registered to its previous frame. With the previous frame having been registered to the map in the previous iteration, we can derive an estimated transformation from the current frame to the map. In frame-to-map registration, we warp the frame to the map by this transformation to compensate for scale and rotation differences, and then perform area-based matching using mutual information to find correspondences between this warped frame and the map. From these correspondences, we derive a transformation that further registers the warped frame to the map. With this two-step registration, the errors between consecutive frames are not accumulated. We present results on real image sequences from a hot air balloon.

1. Introduction

Geo-registration is a very useful capability: it allows a UAV (Unmanned Aerial Vehicle) to navigate, to geo-locate a target, or even to refine a map. Feature-based registration [1][5] has made good progress in recent years. Building on image registration, mosaicing of image sequences can be done by computing the transformations between consecutive frames. To take the accumulated error into account, bundle adjustment [6] is usually employed as a global error-minimization approach.
However, for long sequences with thousands of frames, bundle adjustment is not feasible in terms of computation. Moreover, offline bundle adjustment is not appropriate for many tasks. To perform image mosaicing in a progressive manner while still preserving accuracy, we propose to use an associated map image as a global reference. A two-step procedure is applied to register a UAV image sequence to the global map. In the first step, we register consecutive frames by estimating the best homography to align the feature points in each frame. Using the homography obtained from the first step, we roughly align the UAV image with the global map; this initialization essentially compensates for the scale and rotation between the UAV image and the map. In the second step, we register the roughly aligned UAV image to the map. A similar scenario has been presented in [8]. In area-based matching, MSE [12] or normalized correlation [13] is used to determine correspondences between the UAV image and the reference image. However, the UAV images are captured at different times and from different views with respect to the satellite image; the color, illumination, and dynamic content (such as vehicles, trees, and shadows) can be very different, and MSE or normalized correlation is not robust enough in such cases. We propose an approach that applies mutual information [4] to establish correspondences. Mutual information has been successfully applied to establishing correspondences between images of different modalities, especially in medical image processing. Our experiments show that mutual information does provide strong enough correspondences after roughly compensating for scale and rotation. Given the correspondences between the roughly aligned UAV image and the map, we derive a homography that further registers the roughly aligned UAV image to the map.
By chaining this homography with the initial homography from the first step, we can register the UAV images to the map without incrementally accumulated registration errors.

This paper is organized as follows. In section 2, we formulate our problem and define our symbols. In section 3,

we present the two-step procedure for geo-registration. In section 4, we compare our results with and without refinement in geo-registration; experiments show that the refinement procedure significantly reduces the accumulated error. Discussion and future work are presented in section 5.

2. Problem Formulation and Issues

We start by defining the symbols used in this paper. We are given a sequence of UAV images I_0, I_1, ..., I_n, and a map (usually a satellite image) M. Here, we assume the scene depth is small with respect to the distance from the UAV camera, so the transformation between two UAV images can be represented by a homography. The transformation between a UAV image and the map is also represented by a homography. Let H_{i,j} denote the homography from I_i to I_j, and H_{i,M} the homography from I_i to M; that is, H_{i,j} I_i = I_j and H_{i,M} I_i = M_i, where M_i is the image to which I_i projects in M. Note that H_{j,i} = H_{i,j}^{-1}. Our goal is to derive accurate estimates of H_{0,M}, ..., H_{n,M} so that I_1, ..., I_n are registered to M and form a mosaic without distortion (Figure 1).

Figure 1. For each I_i, derive H_{i,M} so that all frames register to the map M and form a seamless mosaic.

However, the map and the images are taken at different times, from different sensors, from different viewpoints, and may have different dynamic content (such as vehicles or shadows). As a result, it is difficult to simply match each incoming image to the map. Instead, we build a partial local mosaic, then register it to the map in an iterative manner.

3. Approach

Figure 2 illustrates the flow chart of our approach. Each frame I_i in the UAV image sequence is first registered to the previous frame to derive H_{i,i-1}. In the second step, we estimate H_{i,M} as H_{i-1,M} H_{i,i-1}, denoted H̃_{i,M}. This estimated homography warps I_i to a partial local mosaic M̃_i in the map, namely M̃_i = H̃_{i,M} I_i. Then we register M̃_i to the map at M_i, and derive H_ε, namely M_i = H_ε M̃_i. Finally, the actual homography H_{i,M} that registers I_i to M_i on the map is derived as H_{i,M} = H_ε H̃_{i,M}.

Figure 2. Flow chart of our approach.

In the following sections, we first describe the method we use to register I_i to the previous image I_{i-1}. Then we introduce our method to fine-tune H̃_{i,M} so that I_i is mapped to M more accurately and the registration error does not accumulate along the registration process.

3.1. Registration of Consecutive Images

To compute H_{i,i-1}, we match features and then perform RANSAC [3] outlier filtering. After trying many kinds of features, we selected SIFT (Scale-Invariant Feature Transform) [1] features. SIFT features are invariant to image scale and rotation, and provide robust descriptions across changes in 3D viewpoint.

Figure 3. Initial registration between the UAV images and the map.

In the feature matching step, we use nearest-neighbor matching [2]. Since the translation and rotation of the UAV camera between consecutive frames are small, we can assume that matched features lie within a small window, which adds one more constraint to feature matching. Usually, at a resolution of 720×480, we can generate 2000 correspondence pairs. Finally, we use RANSAC to filter outliers among the set of correspondences (with an inlier tolerance of 1 pixel) and derive H_{i,i-1}.

Having H_{i,i-1} and H_{0,M}, we can roughly register the UAV image to the map by estimating H_{i,M} as:

    H̃_{i,M} = H_{i-1,M} H_{i,i-1} = H_{0,M} ∏_{k=1}^{i} H_{k,k-1}    (1)

This shows that if there is a subtle transformation error in each H_{k,k-1}, these errors are multiplied and compound into a significant error, so later UAV images could be registered to a very wrong area of the map. As shown in Figure 3, the registration is not perfect. Thus, we need a way to establish correspondences between the UAV image and the map, and to refine the homography using these correspondences.

3.2. UAV to Map Registration

Registering an aerial image to a map is a challenging problem [10][11]. Due to significant differences in lighting conditions, resolution, and 3D viewpoint between the UAV image and the map, the same point may yield quite different SIFT descriptors in the two images. Therefore, poor feature matching and poor registration can be expected. Since it is difficult to register a UAV image to the map directly, we make use of H_{i,i-1} derived from UAV-to-UAV registration, estimate H_{i,M} as H̃_{i,M} = H_{i-1,M} H_{i,i-1}, and then fine-tune it. Let M̃_i denote the warped image of I_i under H̃_{i,M} (Figure 2, Step 2). Our goal is to derive a homography H_ε that registers M̃_i to the map at M_i (Figure 2, Step 3), so that the image is accurately aligned to the map. The advantage of this approach is that, with M̃_i roughly aligned to the map, we can perform a local search for correspondences at the same scale. Therefore, the ambiguity of matching and the computation time are far lower than when directly registering I_i to the map.

3.2.1. Finding Correspondences between the UAV Image and the Map

To derive H_ε, we try to find correspondences between M̃_i and the map area which M̃_i spans. However, M̃_i is usually a smaller region than I_i (the map has lower resolution), which means M̃_i preserves less information than I_i. Hence we proceed in the reverse direction. As shown in Figure 4, let U_i be the map image transformed back from the area which M̃_i spans, using H_{M,i}. Instead of finding correspondences between M̃_i and the map area it spans, we find correspondences between I_i and U_i.

Figure 4. U_i denotes the map image transformed back from the region which M̃_i spans, using H_{M,i}. P and P_U are points located at the same coordinates in I_i and U_i respectively. S_P and S_{P_U} are two image patches of the same size centered at P and P_U respectively, where P′ is the point corresponding to P_U.

Let P and P_U be points located at the same coordinates in I_i and U_i respectively. With a good enough H̃_{i,M}, the correspondence P′ of P_U in I_i should be close to P. P′ is determined as the point whose surrounding UAV image patch is most similar to the map image patch centered at P_U. We use mutual information [4] as the similarity measure. The mutual information of two random variables is a quantity that measures the dependence between the two variables. Taking two images of the same size as the random variables, it measures how much information the two images share, or how much one image depends on the other. It is a more meaningful criterion than measures such as cross-correlation or grey-value differences. Let S_{P_i} and S_{P_j} be two image patches of the same size centered at points P_i and P_j respectively.
Let MI(S_{P_i}, S_{P_j}) denote the mutual information of S_{P_i} and S_{P_j}. We find P′ by looking for the pixel P_i in P's neighborhood that yields the greatest MI(S_{P_U}, S_{P_i}).
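The mutual-information similarity can be sketched as follows, using a pure-Python joint histogram over quantized grey levels. This is an illustrative sketch only: the function name, bin count, and quantization are our assumptions, and practical implementations typically smooth the histograms.

```python
from collections import Counter
from math import log2

def mutual_information(patch_a, patch_b, bins=8, levels=256):
    """MI(A; B) of two equally sized grey-level patches.

    Each patch is a flat list of intensities in [0, levels). Intensities
    are quantized into `bins` bins, and
    MI = sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) p(b))).
    """
    assert len(patch_a) == len(patch_b)
    q = lambda v: v * bins // levels  # quantize an intensity to a bin index
    n = len(patch_a)
    joint = Counter((q(a), q(b)) for a, b in zip(patch_a, patch_b))
    pa = Counter(q(a) for a in patch_a)
    pb = Counter(q(b) for b in patch_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) with counts: (c/n) / ((pa/n)*(pb/n)) = c*n/(pa*pb)
        mi += p_ab * log2(p_ab * n * n / (pa[a] * pb[b]))
    return mi

# A varied patch shares much information with itself, and none with a
# constant patch.
patch = [(7 * i) % 256 for i in range(100)]
print(mutual_information(patch, patch) > mutual_information(patch, [128] * 100))  # prints True
```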

Figure 5. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Blue dots and red dots represent good and poor correspondences respectively.

Figure 6. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Green dots and orange dots represent RANSAC inliers and outliers respectively.

3.2.2. Defining Good Correspondences

It may happen that all, or none, of the image patches centered on the pixels in P's neighborhood are similar to the image patch centered on P_U. In either case, the maximum mutual information is meaningless, since the mutual information elsewhere could be only slightly smaller. We need to filter out these unreliable correspondences so that the derived homography is accurate. Let P_k be the pixel in P's neighborhood whose patch has the smallest mutual information value. We consider P′ a good correspondence when MI(S_{P_U}, S_{P′}) is significantly larger than MI(S_{P_U}, S_{P_k}); specifically, we require MI(S_{P_U}, S_{P′}) > 2 MI(S_{P_U}, S_{P_k}). Intuitively, this means that the image patch S_{P′} must be significantly more similar to S_{P_U} than S_{P_k} is. Figure 5 shows the results of extracting good correspondences: blue dots and red dots represent good and poor correspondences respectively.

We can generate as many correspondences as we want by performing this operation on feature points in U_i. Here we use the Harris corner detector [5] to extract features instead of SIFT, because our purpose is only to obtain the locations of some interest points in U_i; the Harris corner detector satisfies this need and is computationally cheaper than SIFT. Once we have enough correspondences, RANSAC is performed to filter outliers, and then H_ε is derived. As shown in Figure 6, the colored dots in 6(b) are feature points extracted in the map image, the colored dots in 6(a) are their correspondences, and the green dots are the RANSAC inliers used to derive H_ε.
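The acceptance test above (the best similarity in the search window must exceed twice the worst) can be sketched as follows; the function name and the score representation are our illustrative assumptions.

```python
def best_match(scores, ratio=2.0):
    """Pick the best-scoring candidate pixel, or None if unreliable.

    scores: dict mapping candidate pixel -> mutual information with S_PU.
    Following the paper's rule, the maximum MI must exceed `ratio` times
    the minimum MI over the search neighborhood to count as a good
    correspondence; otherwise the peak is not distinctive enough.
    """
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    if scores[best] > ratio * scores[worst]:
        return best
    return None

# A distinctive peak is accepted; a nearly flat response is rejected.
print(best_match({(0, 0): 0.2, (1, 0): 0.9, (0, 1): 0.25}))   # prints (1, 0)
print(best_match({(0, 0): 0.50, (1, 0): 0.55, (0, 1): 0.52}))  # prints None
```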
Finally, H_{i,M} is derived as H_{i,M} = H_ε H̃_{i,M}, and I_i is registered to the map at M_i (Figure 2, Step 4).

4. Experimental Results

We show results on two data sets. The UAV image sequences are provided with latitude and longitude information. The satellite images are acquired from Google Earth. The size of each UAV image is 720×480. We manually register the first frame of each UAV sequence to its corresponding satellite image; that is, H_{0,M} is given. In each UAV-to-Map registration step, we select 200 Harris corners in the UAV image as samples, requiring the distance between any two features to be at least 10 pixels. For each sample, an image patch of size 100×100 is used to compute the mutual information, and the neighborhood region in which we search for the best match is a window of size 40×40. We found that 100×100 is a proper patch size for a discriminative local feature in our UAV image registration. Since the mutual information computation is very costly, we only perform a UAV-to-Map registration every 50 frames. The results of case 1 with and without UAV-to-Map registration are shown in 7(a) and 7(c) respectively. The results of case 2 with and without UAV-to-Map registration are shown

in 7(b) and 7(d) respectively. Table 1 shows the comparison between registration with and without UAV-to-Map registration in the two examples.

                                             Example #1         Example #2
Number of frames                             1000               900
                                             w. map   w/o map   w. map   w/o map
Total registration time (minutes)            349      83        322      75
Avg. error per pixel in the last frame
compared with ground truth (pixels)          6.24     12.04     3.16     109.98

Table 1. Experimental results of the two examples.

5. Discussion and Future Work

We have proposed a new method to improve the accuracy of mosaicing. An additional map image is provided as a global reference to prevent accumulated error in the mosaic. We use mutual information as a similarity measure between two images to generate correspondences between an image and the map.

The main limitation of our approach is the assumption that the scene structure is planar compared with the height of the camera. When the UAV camera is not high enough, the parallax between the UAV image and the map is strong, and the similarity measured by mutual information becomes meaningless. Moreover, even if all correspondences are accurate, they may not lie on the same plane, and a homography cannot represent the transformation between the UAV image and the map. In our test cases, case 1 has stronger parallax than case 2. As shown in Figure 7, whenever a UAV image is registered to the map, case 1 is more likely to have images registered to a slightly off location, while case 2 has images registered correctly.

Our future work aims at classifying features according to the plane on which they lie. With correspondences restricted to features on the same plane, our assumption is more valid and the UAV-to-Map registration should be more accurate. In addition, we are studying faster algorithms to speed up the mutual information computation in the UAV-to-Map registration step so that the overall mosaicing process can be done in reasonable time.

Acknowledgments

This work was supported by grants from Lockheed Martin. We thank Mark Pritt for providing the data.
References

[1] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110, 2004.
[2] Matthew Brown and David G. Lowe, "Recognising panoramas," International Conference on Computer Vision (ICCV 2003), pp. 1218-1225.
[3] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. of the ACM, 24, pp. 381-395, 1981.
[4] P. A. Viola, "Alignment by Maximization of Mutual Information," International Journal of Computer Vision, 24(2), pp. 137-154, 1997.
[5] C. Harris and M. J. Stephens, "A combined corner and edge detector," Alvey Vision Conference, pp. 147-152, 1988.
[6] W. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, "Bundle Adjustment: A Modern Synthesis," in Vision Algorithms: Theory and Practice, number 1883 in LNCS, pp. 298-373, Springer-Verlag, Corfu, Greece, September 1999.
[7] H. S. Sawhney and R. Kumar, "True multi-image alignment and its application to mosaicing and lens distortion correction," IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3):235-243, 1999.
[8] L. G. Brown, "A survey of image registration techniques," ACM Computing Surveys, 24(4), pp. 325-376, 1992.
[9] R. Wildes, D. Horvonen, S. Hsu, R. Kumar, W. Lehman, B. Matei, and W. Zhao, "Video Georegistration: Algorithm and Quantitative Evaluation," Proc. ICCV, pp. 343-350, 2001.
[10] G. Medioni, "Matching of a Map with an Aerial Image," Proceedings of the 6th International Conference on Pattern Recognition, pp. 517-519, Munich, Germany, October 1982.
[11] Xiaolei Huang, Yiyong Sun, Dimitris Metaxas, Frank Sauer, and Chenyang Xu, "Hybrid Image Registration based on Configural Matching of Scale-Invariant Salient Region Features," Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 11, p. 167, 2004.
[12] S. Hsu, "Geocoded Terrestrial Mosaics Using Pose Sensors and Video Registration," IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.
[13] R. W. Cannata, M. Shah, S. G. Blask, and J. A. Van Workum (Harris Corp., Melbourne, FL), "Autonomous video registration using sensor model parameter adjustments," Applied Imagery Pattern Recognition Workshop 2000, Proceedings, 29th, pp. 215-222.
[14] D. Hirvonen, B. Matei, R. Wildes, and S. Hsu, "Video to Reference Image Alignment in the Presence of Sparse Features and Appearance Change," IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.

Figure 7. (a),(c): Results of case 1 and case 2 respectively with only registration of consecutive UAV images. (b),(d): Results of case 1 and case 2 respectively with an additional UAV-to-Map registration every 50 frames.