LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS


8th International DAAAM Baltic Conference "INDUSTRIAL ENGINEERING", 19-21 April 2012, Tallinn, Estonia

Shvarts, D. & Tamre, M.

Abstract: Simultaneous localization and mapping of the environment is one of the most pressing problems in robotics. Within the existing SLAM algorithms, place recognition is a necessity in several cases. For example, in multi-robot SLAM we have several individual maps created by different robots; in order to combine them into one global map, we have to identify common places before merging them. In this paper, two methods that have been used successfully for recognizing the same scene in different images are compared, and the advantages and limitations of each method with respect to our tasks are considered.

Key words: local descriptors, global descriptors, GIST, SIFT, SURF.

1. INTRODUCTION

SLAM, standing for Simultaneous Localization and Mapping, aims to locate a mobile robot in its environment and to estimate a map of it from sensory information [1]. A wide array of sensors has been used, but nowadays cameras are the preferred ones. At its core, a SLAM algorithm applies sequential estimation techniques that fit a model to noisy data. Within a SLAM framework, the ability to recognize a previously mapped area is useful on several occasions: for correcting the estimation drift when an area is revisited (a problem known as loop closure) [2]; for relocation in an estimated map (the kidnapped robot problem) [3]; or for fusing information between multiple robots that are mapping the same area (multi-robot SLAM) [4].

Place recognition in visual SLAM has usually been addressed by constructing a visual vocabulary of local descriptors [2, 3]. Such a vocabulary can be expensive to build and store if a robot performs an exploratory trajectory and keeps accumulating new images. Global descriptors exist in the computer vision literature, but they have rarely been used in SLAM. This paper compares several local and global descriptors for the purpose of place recognition in robotics, both in terms of performance and computational cost.

2. LOCAL DESCRIPTORS

In computer vision, local interest points have been used to solve many problems such as object recognition, image registration and 3D reconstruction. The usual approach is to select some points in the image and to perform a local analysis on them. For such methods to work successfully, a sufficient number of keypoints has to be detected. In addition, these points should be distinguishable and stable features that can be accurately localized. A lot of research on the behaviour of several types of feature descriptors and detectors has been done. We compared the results of such investigations to select an appropriate feature descriptor and detector for further work. A good feature detector has to meet the following requirements:

- The extracted keypoints have to be rotation and scale invariant.
- They have to be at least partly invariant to luminance changes.
- They have to be invariant to blur and noise.

A comparison of six methods implemented in the OpenCV library is presented in [10], where five quality tests and one performance test were carried out for each descriptor.

Fig. 1. The result of the rotation test for OpenCV's feature detector algorithms.

[Fig. 1] shows that almost all algorithms are partially invariant to rotation except BRIEF, with SIFT presenting the best repeatability; close to SIFT are the ORB and SURF feature descriptors. [Fig. 2] shows the scale-invariance performance of the different algorithms.

Fig. 2. The scale test for OpenCV's feature detector algorithms.

Again, the most stable results were shown by the SURF and SIFT descriptors. Almost all descriptors have a high degree of invariance to brightness change, as shown in [Fig. 3].

Fig. 3. The result of the lighting test for OpenCV's feature detector algorithms.

Based on the material presented in [10], two descriptors showed the most stable results: SIFT and SURF. It should also be noted that these algorithms are the slowest among all those tested [Fig. 4].

Fig. 4. Speed test of the algorithms implemented in OpenCV.

The use of these algorithms in real-time applications can be limited by their computational cost. However, the high quality of the computed keypoints makes them indispensable for solving many problems in computer vision. A detailed description of the algorithms is not given here for the sake of brevity; the reader is referred to the original papers [7] and [13] for a deeper understanding of both. We carried out several additional tests with the SIFT and SURF algorithms to determine which of them offers the most suitable performance for our particular purpose. The results are presented in Fig. 5 and Table 1.
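Such a keypoint-count and timing test can be reproduced, for instance, with the OpenCV Python bindings. The sketch below is an illustrative re-creation under our own assumptions, not the code used for Table 1: the test image path and the SURF Hessian threshold are placeholders, and SURF is only available in OpenCV builds that include the non-free xfeatures2d module.

```python
# Illustrative keypoint-count / timing test in the spirit of Table 1,
# using the OpenCV Python bindings.
import time
import cv2

img = cv2.imread("test_image.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical test image

def run_detector(name, detector, image):
    """Detect keypoints, compute descriptors and report the execution time."""
    start = time.perf_counter()
    keypoints, _descriptors = detector.detectAndCompute(image, None)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"{name}: {len(keypoints)} keypoints in {elapsed_ms:.1f} ms")

run_detector("SIFT", cv2.SIFT_create(), img)

try:
    # SURF requires opencv-contrib compiled with the non-free algorithms.
    run_detector("SURF", cv2.xfeatures2d.SURF_create(hessianThreshold=400), img)
except AttributeError:
    print("SURF is not available in this OpenCV build.")
```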

Fig. 5. a) A test image with the keypoints extracted by the SIFT descriptor; b) the same image with the keypoints extracted by the SURF descriptor.

                      SURF          SIFT
Image size            480x640x3     480x640x3
Extracted keypoints   1126          1511
Execution time        878.86 ms     1245.8 ms

Table 1. Comparison of the two local descriptors.

Both the SIFT and SURF descriptors showed similar results and could be applied to this problem in a SLAM application, albeit with certain limitations. The number of extracted keypoints played a major role in choosing the descriptor for further work.

3. GLOBAL DESCRIPTORS

In the previous section we investigated the properties of different local descriptors; there are several of them and we can choose the best one for a specific task. However, if we use a local descriptor, the representation of the whole image is restricted to the description of the set of points that was successfully extracted from it. In contrast, global descriptors summarize the whole image in a single descriptor, GIST being the most representative one [8]. This chapter highlights the major aspects of global descriptors.

Research on global descriptors builds on the observation that real-world scenes can be recognized by encoding their global configuration, ignoring most of the details and object information [8]. An abstract description of a scene can be obtained from the discrete Fourier transform of the image,

I(f_x, f_y) = \sum_{x,y} i(x, y) \, e^{-j 2\pi (f_x x + f_y y)},    (1)

where i(x, y) is the intensity distribution of the image along the spatial variables (x, y), and (f_x, f_y) are the spatial frequency variables. The complex function I(f_x, f_y) can be decomposed into two terms: the amplitude spectrum of the image, A(f_x, f_y) = |I(f_x, f_y)|, and the phase function of the Fourier transform, \Phi(f_x, f_y). The phase function carries the information related to local properties, while the amplitude spectrum gives unlocalized information about the image structure. The energy spectrum of the Fourier transform, |I(f_x, f_y)|^2, describes the distribution of the signal's energy among the different spatial frequencies. The global description of the scene is encoded in this distribution, but the resulting representation of the image has as many dimensions as there are spatial frequency samples, which makes it impossible to work with directly in practice.

The standard way to reduce the dimensionality of the energy spectrum is principal component analysis (PCA): the matrix is rearranged into a column vector and PCA extracts a subspace spanned by a subset of the Karhunen-Loeve (KL) basis functions. A direct implementation of this method is impractical, because a reliable estimation of the KL basis functions requires a number of image samples larger than the dimensionality of the spectrum, which we usually do not have. Instead, [8] suggests sampling the energy spectrum as

g_k = \sum_{f_x, f_y} |I(f_x, f_y)|^2 \, G_k(f_x, f_y),

where {G_k(f_x, f_y)} is a set of Gaussian functions.
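A minimal sketch of this sampling step in Python/NumPy is given below, assuming a grayscale image supplied as a 2D array. The window layout (four radial frequency bands times four orientations) and the Gaussian widths are illustrative assumptions, not the reference GIST implementation.

```python
# Sketch of sampling the energy spectrum |I(fx, fy)|^2 with Gaussian
# frequency windows G_k, producing a compact global descriptor vector.
import numpy as np

def gaussian_windows(shape, n_bands=4, n_orients=4, sigma_r=0.08, sigma_t=0.3):
    """Build Gaussian windows G_k(fx, fy) on a polar grid of spatial frequencies."""
    h, w = shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    theta = np.arctan2(fy, fx)
    windows = []
    for b in range(n_bands):
        r0 = 0.5 * (b + 1) / (n_bands + 1)            # centre frequency of band b
        for o in range(n_orients):
            t0 = np.pi * o / n_orients                 # centre orientation
            dt = np.angle(np.exp(1j * (theta - t0)))   # wrapped angular distance
            windows.append(np.exp(-((radius - r0) ** 2) / (2 * sigma_r**2)
                                  - (dt ** 2) / (2 * sigma_t**2)))
    return windows

def global_descriptor(image):
    """g_k = sum over (fx, fy) of |I(fx, fy)|^2 * G_k(fx, fy)."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    energy = np.abs(spectrum) ** 2
    return np.array([np.sum(energy * G) for G in gaussian_windows(image.shape)])
```

Matching two images then reduces to comparing such descriptor vectors, for example by Euclidean distance: the smaller the distance, the better the match, which is exactly the criterion used in the GIST experiment of the next section.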

We have tested the author's MATLAB code to examine the properties of the GIST descriptor and the possibility of using it instead of local descriptors for scene matching.

4. EXPERIMENTAL RESULTS

In this section we examine two different descriptors, one local and one global. The aim of the experiment is to verify the ability of each descriptor to match two images of the same scene. The way of solving this problem is well understood: it consists of estimating the homography between pairs of images. First we tested the local descriptor. The algorithm is presented below, followed by a code sketch of the same pipeline.

Algorithm: local descriptor in the problem of matching two images.
Input: two putatively matching images.
1. Feature extraction: extract SIFT features from the first and the second image; we use the SIFT descriptor based on the study in Section 2.
2. Estimation of putative correspondences: find the k nearest neighbours of each feature. From this feature-matching step we identify the images that have a large number of matches between them; we then consider the m images with the largest number of matched points and use RANSAC to select the inliers that contribute to the computation of the homography.
3. Fundamental matrix estimation: find geometrically consistent feature matches by using RANSAC to solve for the fundamental matrix between pairs of images, and select the image with the largest number of inlier points.
Output: two matched images.

A better approach is presented in [12], but in our case we made a simpler experiment.
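A minimal sketch of steps 1-3 for a single image pair, using the OpenCV Python bindings, is shown below. The image file names, the 0.75 ratio-test threshold and the RANSAC parameters are illustrative assumptions, not values from the paper.

```python
# Sketch of the matching pipeline: SIFT features, k-nearest-neighbour matching
# with a ratio test, and RANSAC fundamental-matrix estimation for an image pair.
import cv2
import numpy as np

img1 = cv2.imread("scene_a.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image pair
img2 = cv2.imread("scene_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 2: putative correspondences via k nearest neighbours (k = 2) and a ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Step 3: geometric verification with RANSAC; the inlier count is the match score.
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
F, mask = None, None
if len(good) >= 8:  # the fundamental matrix needs at least 8 correspondences
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
inliers = int(mask.sum()) if mask is not None else 0
print(f"{len(good)} putative matches, {inliers} RANSAC inliers")
```

When matching against a whole image set, the same verification is run for the m best candidates and the image with the largest number of RANSAC inliers is selected, as in step 3 of the algorithm.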

In the next experiment we examined the properties of the GIST descriptor for automatic image matching, using the same set of images as input. During the experiment we computed the GIST descriptor of each image; the best match is the image with the smallest distance between GIST vectors.

The outcome of the two experiments was clear. Both descriptors are invariant to rotation and scaling. As stated earlier, we did not measure the execution time of the methods but concentrated on their properties. Both methods can be successfully applied to the automatic matching of images, although even without additional performance measurements it is obvious that the GIST descriptor is faster.

Fig. 6. Matching with the SIFT descriptor: the number of matched points increases from 44 in the top image to 158 in the bottom image.

The result of the GIST descriptor is shown in Fig. 7.

Fig. 7. a) Input image for the GIST descriptor; b) output of the GIST test algorithm.

Finally, we describe the image set. As input for the algorithms we chose the first image of a set of 586 images. The set is an image sequence recorded by a moving camera, with an image resolution of 568x320 pixels.

5. CONCLUSION

In this paper we have tested different image descriptors, several local ones and a global one, in order to foresee their possible use in a robotic application. Among the local descriptors that have been evaluated we observe a compromise between speed and performance: SIFT and SURF present the highest invariance to different transformations, but are more expensive to compute than the rest of the local descriptors. Regarding the global descriptor GIST, we have observed good performance for scene recognition, higher compactness and potentially lower cost, which indicates a good potential for image matching in robotics. As future work, our aim is to perform a detailed comparison between local and global descriptors regarding performance and cost.

6. REFERENCES

[1] H. Durrant-Whyte and T. Bailey, Simultaneous localization and mapping (SLAM): Part I the essential algorithms, IEEE Robotics and Automation Magazine, vol. 13, no. 2, pp. 99-110, 2006.
[2] D. Galvez-Lopez and J. D. Tardos, Real-time loop detection with bags of binary words, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept. 2011, pp. 51-58.
[3] B. Williams, G. Klein, and I. Reid, Real-time SLAM relocalisation, in IEEE 11th International Conference on Computer Vision, 2007, pp. 1-8.
[4] S. Thrun and Y. Liu, Multi-robot SLAM with sparse extended information filters, in Robotics Research, pp. 254-266, 2005.
[5] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, A comparison of affine region detectors, International Journal of Computer Vision, vol. 65, no. 1, pp. 43-72, 2005.
[6] K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
[7] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[8] A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[9] B. Williams, M. Cummins, J. Neira, P. Newman, I. Reid, and J. Tardos, An image to map loop closing method for monocular SLAM, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008, pp. 2053-2059.

[10] Feature descriptor comparison report, http://computer-vision-talks.com/2011/08/feature-descriptor-comparison-report/
[11] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000, ISBN 0521623049.
[12] M. Brown and D. G. Lowe, Recognising panoramas, in IEEE International Conference on Computer Vision (ICCV), 2003.
[13] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, SURF: Speeded Up Robust Features, Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346-359, 2008.