Life-Long Place Recognition by Shared Representative Appearance Learning


Fei Han, Xue Yang, Yiming Deng, Mark Rentschler, Dejun Yang and Hao Zhang

Division of Computer Science, EECS Department, Colorado School of Mines, Golden, CO 80401
Department of Electrical Engineering, University of Colorado Denver, Denver, CO
Department of Mechanical Engineering, University of Colorado Boulder, Boulder, CO

Abstract: Place recognition plays an important role in Simultaneous Localization and Mapping (SLAM) in many robotics applications. When long-term SLAM is performed in an outdoor environment, additional challenges can be encountered, including strong perceptual aliasing and appearance variations caused by changes of illumination, vegetation, weather, etc. These issues become even more critical in life-long place recognition. We introduce a novel Shared Representative Appearance Learning (SRAL) approach for long-term loop closure detection. Different from previous place recognition methods using a single feature modality or multi-feature concatenation techniques, our SRAL approach can autonomously learn a set of shared features that are representative in all scene scenarios and integrate them together to represent the long-term appearance of scenes observed by a robot during vision-based navigation. By formulating SRAL as an optimization problem regularized by structured sparsity-inducing norms, we are able to model the interdependence of the feature modalities. An optimization algorithm is also developed to efficiently solve the formulated convex optimization problem, with a theoretical guarantee to find the global optimum. Extensive experiments are performed to assess the SRAL method using large-scale benchmark datasets, including the St Lucia, CMU-VL, and Nordland datasets. Experimental results validate that SRAL outperforms existing place recognition methods and obtains superior performance on life-long place recognition.

Fig. 1. Motivating examples of the algorithm based on learning the shared representative appearance for life-long place recognition. The same place can show great appearance variations in different weather, vegetation, and time (Nordland dataset). The proposed SRAL algorithm automatically identifies a set of heterogeneous features (e.g., color, GIST, HOG, and LBP) that are not only representative but also shared by different scenarios, so image matching in these scenarios can be performed more accurately.

I. INTRODUCTION

Visual place recognition over a long time period is one of the core capabilities needed by a modern intelligent robot to perform long-term visual Simultaneous Localization And Mapping (SLAM) [23], and has thus attracted increasing attention in the robotics community [10, 29, 20, 18, 34, 7]. Place recognition aims at identifying the locations previously visited by a mobile robot, which enables the robot to localize itself in an environment during navigation as well as correct incremental pose drifts during SLAM. However, visual place recognition is a challenging problem due to robot movement, occlusion, dynamic objects (e.g., cars and pedestrians), and other vision challenges such as illumination changes. Moreover, multiple locations may have similar appearance, which results in the so-called perceptual aliasing issue. Long-term place recognition also introduces additional difficulties.
For example, the same place during daytime and night, or across seasons, can look quite different, which we name the Long-term Appearance Change (LAC) problem. In the past several years, life-long place recognition has been intensively investigated [23, 29, 35, 2, 28, 24], due to its importance to long-term robot navigation. For example, FAB-MAP [6] is one of the first techniques to address life-long place recognition for loop-closure detection in SLAM; it detects revisited places by matching individual images according to their visual appearance, and was validated over a 1000 km route [7]. More recently, sequence-based image matching methods have demonstrated very promising performance on long-term place recognition, including SeqSLAM [24], ABLE-M [2], and others [29, 2, 5]. Previous sequence-based place recognition methods either utilize a single visual feature to encode place appearance, or combine two visual features simply through concatenation [34, 38, 39]. However, these methods, which rely on purely hand-crafted features, are not able to learn the importance of heterogeneous features and effectively integrate the features together for robust place recognition, especially over long time spans.

To construct a descriptive representation for life-long place recognition, we propose a novel algorithm, referred to as Shared Representative Appearance Learning (SRAL), which automatically learns a set of heterogeneous visual features that are important to all scene scenarios, and fuses them together to represent the long-term appearance of scenes observed by a robot during vision-based navigation. Our SRAL algorithm models the key insight that visual features that are not only discriminative but also shared among all scene scenarios are the best for long-term place recognition. If visual features are only highly discriminative for a specific scenario (e.g., winter), they are less useful for matching images from different scenarios, because they cannot encode the other scenarios. For the first time, our algorithm automatically learns multimodal shared representative features that are discriminative for the long-term visual appearance in all scene scenarios, under strong perceptual aliasing and LAC issues (e.g., during daytime and night, and/or across multiple seasons). We formulate this as a multi-task multimodal feature learning problem and solve it through sparse convex optimization. Then, the fused multimodal, long-term representation is used with SeqSLAM [24], a sequence-based image matching method, to robustly perform life-long place recognition during daytime/night and across seasons.

Our contributions are threefold. First, we introduce a novel formulation of constructing long-term appearance representations as a multitask, multimodal feature learning and fusion problem, with the objective of automatically identifying a set of heterogeneous visual features that are discriminative for all scene scenarios (e.g., daytime vs. night). Second, we propose a novel SRAL algorithm that addresses this problem based on sparse convex optimization regularized by two structured sparsity-inducing norms to analyze the interrelationships of multimodal features and learn shared representative features. Third, a new optimization method is developed to solve the formulated problem, which is theoretically guaranteed to find the optimal solution.

The rest of the paper is organized as follows. We describe related research on long-term place recognition in Section II. Section III introduces the novel SRAL algorithm for long-term representation learning. Experimental results are discussed in Section IV. Finally, we conclude our paper in Section V.

II. RELATED WORK

Existing SLAM techniques can be broadly classified into three categories based on graph optimization, particle filters, and the Extended Kalman Filter [37]. Based on the relationship of visual inputs and mapping outputs, SLAM can be categorized into 2D SLAM (using 2D images to produce 2D maps) [20, 5, 6], monocular SLAM (using 2D image sequences to generate 3D maps) [8, 27], and 3D SLAM (using 3D point clouds or RGB-D images to build 3D maps) [9, 33, 3]. Place recognition is an integral component for loop closing in all visual SLAM methods; it employs visual features to identify locations previously visited by robots. We provide an overview of image matching methods and visual features used in life-long place recognition.

A. Image Matching for Long-Term Place Recognition

Most previous place recognition techniques are based on image-to-image matching and are broadly categorized into two groups: those based on pairwise similarity scoring and those based on nearest neighbor search. Methods based on pairwise similarity scores calculate a similarity score between the current scene image and each template using a certain distance metric, and select the template with the maximum score [5, 11], as illustrated by the sketch below.
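As a concrete illustration of pairwise similarity scoring, the following minimal Python sketch matches a query descriptor against stored templates; the function name, the descriptor shapes, and the cosine metric are illustrative assumptions rather than details of any cited method.

```python
import numpy as np

def match_template(query_desc: np.ndarray, template_descs: np.ndarray):
    """Pairwise similarity scoring: compare the query descriptor against
    every stored template and return the best-scoring template.

    query_desc:     (d,) global descriptor of the current scene image.
    template_descs: (N, d) descriptors of previously visited places.
    Any distance metric could stand in for the cosine similarity here.
    """
    # Cosine similarity between the query and each template.
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    T = template_descs / (np.linalg.norm(template_descs, axis=1, keepdims=True) + 1e-12)
    scores = T @ q                  # (N,) similarity scores
    best = int(np.argmax(scores))   # template with the maximum score
    return best, scores[best]
```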
On the other hand, techniques based on nearest neighbor search typically build a search tree to efficiently identify the most similar template to the current scene. For example, FAB-MAP [7, 6] employs the Chow-Liu tree to locate the most similar template; RTAB-MAP [18, 19] uses the KD-tree to perform fast nearest neighbor search; and other search trees are also used in different place recognition methods for efficient image-to-image matching [8, 2]. However, due to the limited information provided by a single image, previous image-based matching methods usually suffer from the LAC and perceptual aliasing issues, and are not robust enough to perform life-long place recognition.

To integrate more information, several recent methods use a sequence of image frames to decrease the negative effect of LAC and perceptual aliasing and improve the accuracy of life-long place recognition [24, 2, 5, 4, 25]. Most sequence-based place recognition techniques are based on a similarity score computed from a pair of template and query image sequences. For example, SeqSLAM [24], one of the earliest sequence-based life-long place recognition methods, computes a sum of image-based similarity scores to match image sequences. The ABLE-M approach [2] concatenates binary features extracted from each image in a sequence as a representation to match between sequences. Similarity scores are also used by other sequence-based image matching techniques to perform life-long place recognition for robot navigation [25, 4, 7, 5]; these calculate a sequence-based similarity score between the query and all template sequences to construct a similarity matrix, and then choose the template sequence with a statistically high score as a match. To model the temporal relationships of individual frames in a sequence, data association methods based on Hidden Markov Models (HMMs) [12], Conditional Random Fields (CRFs) [4], and network flows [28] were also proposed to align a pair of template and query sequences. However, these previous methods, usually based on a single type of feature, are not capable of effectively fusing the rich information offered by multiple modalities of visual features. We address this problem by introducing a novel data fusion algorithm that can work with SeqSLAM and other sequence-based or image-based matching methods for life-long place recognition.

B. Visual Features for Place Representation

A large number of visual features have been proposed in previous long-term place recognition techniques to encode the scenes observed by robots during navigation; they can be broadly classified into two categories: local and global features.

Local features apply a detector to find points of interest (e.g., corners) in an image and a descriptor to encode the local information around each detected point of interest. Then, the Bag-of-Words (BoW) model is typically employed by place recognition methods to build a feature vector for each image. The Scale-Invariant Feature Transform (SIFT) feature is used along with a BoW model to detect revisited locations from 2D images [1]. FAB-MAP [6, 7] applies Speeded Up Robust Features (SURF) for visual place recognition. Both features are also used by the RTAB-MAP SLAM [18, 19]. The binary BRIEF and FAST features are applied to build a BoW model to perform fast loop closure detection [9]. The ORB feature is also used for place recognition [26]. Local visual features are usually discriminative enough to differentiate each scene scenario (e.g., daytime vs. night, or different seasons). However, they are often not shared by different scenarios and are thus less representative for encoding all scene scenarios for image matching in long-term place recognition.

Global features extract information from the whole image, and a feature vector is often formed based on concatenation or feature statistics (e.g., histograms). These global features can encode raw image pixels, shape signatures, and other information. GIST features [2], constructed from the responses of steerable filters at different orientations and scales, were employed to perform place recognition [34]. Local Difference Binary (LDB) features were applied to represent scenes by directly using intensity and gradient differences of image grid cells to compute a binary string [2]. SeqSLAM [24] used the sum of absolute differences of low-resolution images as global features to conduct sequence-based place recognition. Convolutional neural networks (CNNs) were also utilized to extract deep features to match image sequences [29]. Other global features were also proposed to perform life-long place recognition and loop-closure detection [25, 28, 3]. Global features can encode whole-image information, and no dictionary-based quantization is required. They have also shown promising performance for long-term place recognition [23]. However, the critical problem of identifying more discriminative global features and effectively fusing them together was not well studied in previous methods based on global features; it is addressed in this paper.

III. SHARED REPRESENTATIVE APPEARANCE LEARNING FOR LIFE-LONG PLACE RECOGNITION

We propose to learn and fuse a set of heterogeneous features that are representative and shared by multiple scenarios (e.g., night vs. daytime, or different seasons), which was not well investigated in previous research, to address the problem of life-long place recognition. We introduce the formulation of long-term feature learning and our novel SRAL algorithm from the perspective of sparse convex optimization. A new optimization solver is also implemented to solve the formulated problem, with a theoretical guarantee of convergence to the global optimal solution.

Notation. In this paper, matrices are written as boldface capital letters, and vectors are written as boldface lowercase letters. Given a matrix $\mathbf{M} = \{m_{ij}\} \in \mathbb{R}^{n \times m}$, we refer to its $i$-th row and $j$-th column as $\mathbf{m}^i$ and $\mathbf{m}_j$, respectively. The $\ell_1$-norm of a vector $\mathbf{v} \in \mathbb{R}^n$ is defined as $\|\mathbf{v}\|_1 = \sum_{i=1}^{n} |v_i|$. The $\ell_2$-norm of $\mathbf{v}$ is defined as $\|\mathbf{v}\|_2 = \sqrt{\mathbf{v}^\top \mathbf{v}}$. The $\ell_{2,1}$-norm and the Frobenius norm of the matrix $\mathbf{M}$ are defined as:

$$\|\mathbf{M}\|_{2,1} = \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{m} m_{ij}^2} = \sum_{i=1}^{n} \|\mathbf{m}^i\|_2, \quad (1)$$

$$\|\mathbf{M}\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} m_{ij}^2}. \quad (2)$$
A. Problem Formulation

Given a set of images acquired by a robot during navigation in multiple scenarios (e.g., daytime vs. night, or different seasons), the vectors of heterogeneous features extracted from this image set are denoted as $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$, and their corresponding scene scenarios are represented as $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_n]^\top \in \mathbb{R}^{n \times c}$, where $\mathbf{x}_i = [(\mathbf{x}_i^1)^\top, \dots, (\mathbf{x}_i^m)^\top]^\top \in \mathbb{R}^d$ is a vector of heterogeneous features from the $i$-th image and consists of $m$ modalities such that $d = \sum_{j=1}^{m} d_j$, and $\mathbf{y}_i \in \mathbb{Z}^c$ is the indicator vector of scene scenarios, with $y_{ij}$ indicating whether the $i$-th image belongs to the $j$-th scene scenario. Then, the task of shared representative appearance learning can be considered as multitask feature learning and formulated as the following optimization problem:

$$\min_{\mathbf{W}} \ \|\mathbf{X}^\top \mathbf{W} - \mathbf{Y}\|_F^2 + \lambda \|\mathbf{W}\|_{2,1}, \quad (3)$$

where $\mathbf{W} \in \mathbb{R}^{d \times c}$ is a weight matrix with $w_{ij}$ denoting the importance of the $i$-th visual feature with respect to the $j$-th scene scenario, and $\lambda$ is a trade-off parameter. The first term, encoded by the squared Frobenius norm in Eq. (3), is a loss function that models the squared error of using the weighted features to identify scene scenarios. The second term, based on the $\ell_{2,1}$-norm, is a regularization that enforces sparsity of the rows of $\mathbf{W}$ and a grouping effect of the weights within each row. The sparsity enforced by the regularization allows the method to find discriminative features with larger weights (non-discriminative features receive weights very close to 0); the grouping effect pushes discriminative features toward similar weights across all scene scenarios, since $\|\mathbf{m}^i\|_2$ in the $\ell_{2,1}$-norm of Eq. (1) obtains its minimum value when the weights of a visual feature are equal across all scenarios. Therefore, this regularization term models our insight of finding shared representative features to build a descriptive representation that can encode long-term appearance variations for accurate life-long place recognition.
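To make the notation concrete, here is a minimal numpy sketch that evaluates the norms of Eqs. (1)-(2) and the objective of Eq. (3), assuming features are stacked as described above; the function names are ours.

```python
import numpy as np

def l21_norm(M: np.ndarray) -> float:
    """||M||_{2,1}: sum of the l2-norms of the rows of M (Eq. 1)."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def objective(X: np.ndarray, W: np.ndarray, Y: np.ndarray, lam: float) -> float:
    """Value of the multitask feature-learning objective in Eq. 3:
    ||X^T W - Y||_F^2 + lambda * ||W||_{2,1}.
    X: (d, n) stacked heterogeneous features, W: (d, c) weights,
    Y: (n, c) scene-scenario indicators."""
    loss = np.linalg.norm(X.T @ W - Y, ord="fro") ** 2  # Frobenius-norm loss (Eq. 2)
    return loss + lam * l21_norm(W)
```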

Because different types of visual features typically capture different attributes of the scene (e.g., color, shape, etc.), when multiple modalities of features are used together, it is important to model the underlying structure introduced by the multiple modalities. To realize this capability, we define a new regularization term that enforces a grouping effect on the weights of features belonging to the same modality, as well as sparsity between modalities, to identify shared discriminative modalities:

$$\|\mathbf{W}\|_M = \sum_{i=1}^{m} \sqrt{\sum_{p=1}^{d_i} \sum_{q=1}^{c} (w_{pq}^i)^2} = \sum_{i=1}^{m} \|\mathbf{W}^i\|_F, \quad (4)$$

where $\mathbf{W}^i \in \mathbb{R}^{d_i \times c}$ is the weight matrix for the $i$-th modality. Then, the final problem formulation of our SRAL algorithm becomes:

$$\min_{\mathbf{W}} \ \|\mathbf{X}^\top \mathbf{W} - \mathbf{Y}\|_F^2 + \lambda_1 \|\mathbf{W}\|_{2,1} + \lambda_2 \|\mathbf{W}\|_M, \quad (5)$$

which simultaneously considers the underlying structure of feature modalities and identifies shared representative visual features to encode long-term appearance changes.

Fig. 2. Illustration of the proposed structured sparsity-inducing norms. Given the feature weight matrix $\mathbf{W} = [\mathbf{W}^1; \mathbf{W}^2; \dots; \mathbf{W}^m] \in \mathbb{R}^{d \times c}$, we model the interrelationship of feature modalities using the M-norm regularization term, which allows our SRAL algorithm to identify the modalities shared across different scenarios; we also introduce the $\ell_{2,1}$-norm regularization to identify more representative shared features. This figure is best viewed in color.

Our fusion method can be applied in both single-image-based SLAM and sequence-based SLAM. A significant advantage of our SRAL method is that we are able to learn representative feature modalities shared by different environments under changes of weather and/or illumination conditions. This fusion improves the robustness of feature matching for place recognition, which significantly benefits loop closure detection in life-long SLAM.

B. Place Recognition

The place recognition procedure of our SRAL method is partitioned into two phases: the weight learning phase and the fusion phase. After solving the problem in Eq. (5) during the first phase (the solution is detailed in Section III-C), we obtain the optimal weight matrix $\mathbf{W} = [\mathbf{W}^1; \mathbf{W}^2; \dots; \mathbf{W}^m] \in \mathbb{R}^{d \times c}$. The optimal weight $\omega_i$ ($i = 1, \dots, m$) for each feature modality $i$ is obtained by $\omega_i = \|\mathbf{W}^i\|_F$. Then, in the second (fusion) phase, given a new observation with features $\mathbf{x} = [\mathbf{x}^1; \mathbf{x}^2; \dots; \mathbf{x}^m] \in \mathbb{R}^d$, we first calculate the matching score $s_i$ ($i = 1, \dots, m$) between the query image and an existing image using each feature modality separately. The final score is then obtained by the following fusion:

$$s = \sum_{i=1}^{m} \tilde{\omega}_i s_i, \quad (6)$$

where $\tilde{\omega}_i$ is the normalized weight for modality $i$. Finally, we determine whether two images are matched by comparing the score with a predefined threshold.

C. Optimization Algorithm and Analysis

Because the objective in Eq. (5) comprises two non-smooth regularization terms, the $\ell_{2,1}$-norm and the M-norm, it is difficult to solve in general. To this end, we implement a new iterative algorithm to solve the optimization problem in Eq. (5) with the non-smooth regularization terms. Taking the derivative of the objective with respect to $\mathbf{w}_i$ ($1 \le i \le c$) and setting it to the zero vector, we obtain

$$\mathbf{X}\mathbf{X}^\top \mathbf{w}_i - \mathbf{X}\mathbf{y}_i + \lambda_1 \mathbf{D}\mathbf{w}_i + \lambda_2 \tilde{\mathbf{D}}\mathbf{w}_i = \mathbf{0}, \quad (7)$$

where $\mathbf{D}$ is a diagonal matrix whose $i$-th diagonal element is $\frac{1}{2\|\mathbf{w}^i\|_2}$, and $\tilde{\mathbf{D}}$ is a block diagonal matrix whose $i$-th diagonal block is $\frac{1}{2\|\mathbf{W}^i\|_F}\mathbf{I}^i$, where $\mathbf{I}^i$ is an identity matrix of size $d_i$ and $\mathbf{W}^i$ is the $i$-th row segment of $\mathbf{W}$ containing the weights of the $i$-th feature modality. Thus we have

$$\mathbf{w}_i = (\mathbf{X}\mathbf{X}^\top + \lambda_1 \mathbf{D} + \lambda_2 \tilde{\mathbf{D}})^{-1}\mathbf{X}\mathbf{y}_i. \quad (8)$$

Note that $\mathbf{D}$ and $\tilde{\mathbf{D}}$ depend on $\mathbf{W}$ and are thus also unknown variables. An iterative algorithm is proposed to solve this problem, which is described in Algorithm 1.
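The following compact numpy sketch illustrates this iterative scheme. It assumes the per-modality dimensions $d_1, \dots, d_m$ are given; the small `eps` term and the ridge-stabilized initialization are implementation conveniences we add to avoid division by zero and singular systems, not parts of the formulation above.

```python
import numpy as np

def sral_solve(X, Y, dims, lam1, lam2, iters=50, eps=1e-8):
    """Iteratively reweighted solver sketching Algorithm 1.
    X: (d, n) features, Y: (n, c) scenario indicators,
    dims: per-modality dimensions d_1..d_m with sum(dims) == d.
    Returns W (d, c) and the normalized per-modality weights omega."""
    d = X.shape[0]
    G = X @ X.T                                      # (d, d), fixed across iterations
    W = np.linalg.solve(G + eps * np.eye(d), X @ Y)  # init: min ||X^T W - Y||_F^2 (ridge-stabilized)
    starts = np.cumsum([0] + list(dims))
    for _ in range(iters):
        # D: diagonal, i-th entry 1 / (2 ||w^i||_2)  (row-sparsity reweighting)
        row_norms = np.linalg.norm(W, axis=1) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        # D~: block diagonal, i-th block (1 / (2 ||W^i||_F)) I  (modality reweighting)
        Dt = np.zeros((d, d))
        for i in range(len(dims)):
            s, e = starts[i], starts[i + 1]
            Dt[s:e, s:e] = np.eye(dims[i]) / (2.0 * (np.linalg.norm(W[s:e], "fro") + eps))
        # Closed-form update of all c columns at once (Eq. 8).
        W = np.linalg.solve(G + lam1 * D + lam2 * Dt, X @ Y)
    # Fusion weights: omega_i = ||W^i||_F, normalized (Sec. III-B).
    omega = np.array([np.linalg.norm(W[starts[i]:starts[i + 1]], "fro")
                      for i in range(len(dims))])
    return W, omega / omega.sum()
```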
In the following, we analyze the convergence of the algorithm and prove that Algorithm 1 converges to the global optimum. First, we present a lemma from Nie et al. [30]:

Lemma 1: For any vectors $\tilde{\mathbf{v}}$ and $\mathbf{v}$, the following inequality holds: $\|\tilde{\mathbf{v}}\|_2 - \frac{\|\tilde{\mathbf{v}}\|_2^2}{2\|\mathbf{v}\|_2} \le \|\mathbf{v}\|_2 - \frac{\|\mathbf{v}\|_2^2}{2\|\mathbf{v}\|_2}$.

From Lemma 1, we can easily obtain the following corollary.

Corollary 1: For any matrices $\tilde{\mathbf{M}}$ and $\mathbf{M}$, the following inequality holds: $\|\tilde{\mathbf{M}}\|_F - \frac{\|\tilde{\mathbf{M}}\|_F^2}{2\|\mathbf{M}\|_F} \le \|\mathbf{M}\|_F - \frac{\|\mathbf{M}\|_F^2}{2\|\mathbf{M}\|_F}$.

Theorem 1: Algorithm 1 monotonically decreases the objective value of the problem in Eq. (5) in each iteration.

Proof: According to Steps 3 and 4 of Algorithm 1, we know that

$$\mathbf{W}(t+1) = \arg\min_{\mathbf{W}} \ \|\mathbf{X}^\top \mathbf{W} - \mathbf{Y}\|_F^2 + \lambda_1 \operatorname{Tr}(\mathbf{W}^\top \mathbf{D}(t+1)\mathbf{W}) + \lambda_2 \operatorname{Tr}(\mathbf{W}^\top \tilde{\mathbf{D}}(t+1)\mathbf{W}). \quad (9)$$

Then we can derive that

$$\mathcal{J}(t+1) + \lambda_1 \operatorname{Tr}(\mathbf{W}^\top(t+1)\mathbf{D}(t+1)\mathbf{W}(t+1)) + \lambda_2 \operatorname{Tr}(\mathbf{W}^\top(t+1)\tilde{\mathbf{D}}(t+1)\mathbf{W}(t+1)) \le \mathcal{J}(t) + \lambda_1 \operatorname{Tr}(\mathbf{W}^\top(t)\mathbf{D}(t+1)\mathbf{W}(t)) + \lambda_2 \operatorname{Tr}(\mathbf{W}^\top(t)\tilde{\mathbf{D}}(t+1)\mathbf{W}(t)),$$

where $\mathcal{J}(t) = \|\mathbf{X}^\top \mathbf{W}(t) - \mathbf{Y}\|_F^2$.

After substituting the definitions of $\mathbf{D}$ and $\tilde{\mathbf{D}}$, we obtain

$$\mathcal{J}(t+1) + \lambda_1 \sum_{i=1}^{d} \frac{\|\mathbf{w}^i(t+1)\|_2^2}{2\|\mathbf{w}^i(t)\|_2} + \lambda_2 \sum_{i=1}^{m} \frac{\|\mathbf{W}^i(t+1)\|_F^2}{2\|\mathbf{W}^i(t)\|_F} \le \mathcal{J}(t) + \lambda_1 \sum_{i=1}^{d} \frac{\|\mathbf{w}^i(t)\|_2^2}{2\|\mathbf{w}^i(t)\|_2} + \lambda_2 \sum_{i=1}^{m} \frac{\|\mathbf{W}^i(t)\|_F^2}{2\|\mathbf{W}^i(t)\|_F}. \quad (10)$$

From Lemma 1 and Corollary 1, we can derive

$$\|\mathbf{w}^i(t+1)\|_2 - \frac{\|\mathbf{w}^i(t+1)\|_2^2}{2\|\mathbf{w}^i(t)\|_2} \le \|\mathbf{w}^i(t)\|_2 - \frac{\|\mathbf{w}^i(t)\|_2^2}{2\|\mathbf{w}^i(t)\|_2}, \quad (11)$$

$$\|\mathbf{W}^i(t+1)\|_F - \frac{\|\mathbf{W}^i(t+1)\|_F^2}{2\|\mathbf{W}^i(t)\|_F} \le \|\mathbf{W}^i(t)\|_F - \frac{\|\mathbf{W}^i(t)\|_F^2}{2\|\mathbf{W}^i(t)\|_F}. \quad (12)$$

Adding Eqs. (10)-(12) on both sides, we have

$$\mathcal{J}(t+1) + \lambda_1 \sum_{i=1}^{d} \|\mathbf{w}^i(t+1)\|_2 + \lambda_2 \sum_{i=1}^{m} \|\mathbf{W}^i(t+1)\|_F \le \mathcal{J}(t) + \lambda_1 \sum_{i=1}^{d} \|\mathbf{w}^i(t)\|_2 + \lambda_2 \sum_{i=1}^{m} \|\mathbf{W}^i(t)\|_F. \quad (13)$$

Therefore, Algorithm 1 monotonically decreases the objective value in each iteration. Because the optimization problem in Eq. (5) is convex, the algorithm converges to the global optimal solution. Moreover, since the solution has a closed form in each iteration, Algorithm 1 converges quickly.

Algorithm 1: An efficient algorithm to solve the optimization problem in Eq. (5)
Input: $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ and $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_n]^\top \in \mathbb{R}^{n \times c}$
1: Let $t = 1$. Initialize $\mathbf{W}(t)$ by solving $\min_{\mathbf{W}} \|\mathbf{X}^\top \mathbf{W} - \mathbf{Y}\|_F^2$.
2: while not converged do
3: Calculate the diagonal matrix $\mathbf{D}(t+1)$, where the $i$-th diagonal element is $\frac{1}{2\|\mathbf{w}^i(t)\|_2}$; calculate the block diagonal matrix $\tilde{\mathbf{D}}(t+1)$, where the $i$-th diagonal block is $\frac{1}{2\|\mathbf{W}^i(t)\|_F}\mathbf{I}^i$.
4: For each $\mathbf{w}_i$ ($1 \le i \le c$), $\mathbf{w}_i(t+1) = (\mathbf{X}\mathbf{X}^\top + \lambda_1 \mathbf{D}(t+1) + \lambda_2 \tilde{\mathbf{D}}(t+1))^{-1}\mathbf{X}\mathbf{y}_i$.
5: $t = t + 1$.
6: end while
Output: $\mathbf{W} = \mathbf{W}(t) \in \mathbb{R}^{d \times c}$

IV. EXPERIMENTAL RESULTS

A. Experiment Setup

We use three large-scale benchmark datasets to validate the performance of our SRAL method under different conditions and over various time spans. The dataset statistics are summarized in Table I. Four types of visual features were employed in our experiments for all of these datasets: color features [22], GIST features [2], HOG features [28], and LBP features [32], each applied on downsampled images. Instead of concatenating all these features into a single large vector for the scene representation, we learn the weight of each feature modality in order to select features that are robust to varying environments, in service of the long-term place recognition objective in SLAM loop closure detection.

TABLE I
STATISTICS AND SCENARIOS OF THE PUBLIC BENCHMARK DATASETS USED FOR VALIDATION OF OUR SRAL METHOD

Dataset       | Sequence   | Image Statistics             | Scenario
St Lucia [10] | 10 × 12 km | 10 × 22,000 frames at 15 FPS | Different times of the day
CMU-VL [3]    | 5 × 8 km   | 5 × 13,000 frames at 15 FPS  | Different months
Nordland [35] | 4 × 728 km | 4 × 900,000 frames at 25 FPS | Different seasons

We implemented both the proposed SRAL method and the simple multi-modal feature concatenation baseline for long-term place recognition tasks. Our implementation uses a mixture of unoptimized Matlab and C++ on a Linux laptop with an i7 3.0 GHz CPU, 16GB of memory, and a 2GB GPU. Similar to other state-of-the-art methods [29, 36], the implementation at this stage is not able to perform large-scale long-term loop closure detection in real time. A key limiting factor is that the runtime is proportional to the number of previously visited places. Utilizing memory management techniques [19], combined with an optimized implementation, can potentially overcome this challenge and achieve real-time performance. In these experiments, we qualitatively and quantitatively evaluate our algorithms and compare them with several state-of-the-art methods, including BRIEF-GIST [34] (using a single BRIEF feature), SeqSLAM [24] (using a single grayscale image feature), and others.
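For concreteness, the fusion phase of Section III-B (Eq. 6) used throughout these experiments can be sketched as follows; the weight and score values in the usage example are purely illustrative, and `sral_solve` refers to the sketch given earlier.

```python
import numpy as np

def fused_match_score(per_modality_scores: np.ndarray, omega: np.ndarray) -> float:
    """Fusion phase (Eq. 6): combine matching scores s_1..s_m, computed
    independently with each feature modality, into a single score
    s = sum_i omega_i * s_i using the normalized modality weights
    learned in the weight learning phase (e.g., output of sral_solve)."""
    return float(np.dot(omega, per_modality_scores))

# Hypothetical usage: four modalities (color, GIST, HOG, LBP) and a
# free decision threshold; all numbers are illustrative only.
omega = np.array([0.10, 0.01, 0.88, 0.01])         # learned, normalized weights
scores = np.array([0.42, 0.55, 0.91, 0.30])        # per-modality matching scores
is_match = fused_match_score(scores, omega) > 0.8  # threshold is a tunable parameter
```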
B. Results on the St Lucia Dataset (Various Times of the Day)

The St Lucia dataset [10] was collected by a single camera installed on a car in the suburban area of St Lucia in Australia, at various times over several days during a two-week period. Each data instance is a video of one traversal of the route. GPS data was also recorded, which is used in the experiments as the ground truth for place recognition.

The dataset contains several challenges, including appearance variations due to illumination changes at different times of the day, dynamic objects including pedestrians and vehicles, and viewpoint variations due to slight route deviations. The dataset statistics are shown in Table I.

Fig. 3. Experimental results over the St Lucia dataset. (a) An example showing matched images recorded at 15:45 on 08/18/2009 and 10:00 on 09/10/2009, respectively. (b) Precision-recall curves comparing our SRAL algorithm with BRIEF-GIST, feature concatenation, SeqSLAM, and single color and LBP features. (c) The weights of each feature modality learned automatically by Algorithm 1. The figures are best viewed in color.

Loop closure detection results over the St Lucia dataset are illustrated in Fig. 3. The quantitative performance is evaluated using a standard precision-recall curve, as shown in Fig. 3(b). The high precision and recall values indicate that our SRAL method with the $\ell_{2,1}$-norm and M-norm obtains high performance and matches morning and afternoon video sequences well. We also implemented the feature concatenation method, which simply concatenates all feature modalities into one large vector for the scene representation. Our SRAL method instead learns the weight of each feature modality, which enables a more powerful and robust scene representation. The precision-recall curves of these two methods in Fig. 3(b) show that our SRAL method outperforms the simple feature concatenation scheme, which underscores the importance of representative, shared feature modalities.

To qualitatively evaluate the experimental results, an intuitive example of the sequence-based matching is presented in Fig. 3(a). We show the image (upper row of Fig. 3(a)) that has the maximum matching score for a query image (bottom row of Fig. 3(a)) within the sequence. These qualitative results demonstrate that the proposed SRAL algorithm works well in the presence of dynamic objects and other vision challenges, including camera motions and illumination changes.

Comparisons with some of the main state-of-the-art methods are also graphically presented in Fig. 3(b). It is observed that for long-term loop closure detection, our SRAL algorithm outperforms the methods based on single features, including BRIEF-GIST [34], SeqSLAM [24], color-based SLAM [22], and LBP-based localization [32], due to the significant appearance variations of the same location at different times. It can be observed from the matched images in Fig. 3(a) that scene appearance varies substantially during different times of the day, which makes some features less representative for identifying the same place in different scenarios. After the feature modality learning procedure in our SRAL method, the weights of these less useful features are decreased, resulting in better final place recognition performance.

Fig. 4. Experimental results over the CMU-VL dataset. (a) An example demonstrating the matched images and query images recorded in December and October, respectively. (b) Precision-recall curves comparing our SRAL algorithm with BRIEF-GIST, feature concatenation, SeqSLAM, and single color and LBP features. (c) The weights of each feature modality learned automatically by Algorithm 1 (color 9.79%, LBP 0.98%, GIST 0.98%, HOG 88.25%). The figures are best viewed in color.
C. Results on the CMU-VL Dataset (Different Months)

The CMU Visual Localization (VL) dataset [3] was gathered using two cameras installed on a car that traveled the same route five times in the Pittsburgh area of the USA during different months, in varying climatological, environmental, and weather conditions. GPS information is also available, which is used as the ground truth for algorithm evaluation. This dataset contains seasonal changes caused by vegetation, snow, and illumination variations, as well as urban scene changes due to construction and dynamic objects.

The visual data from the left camera is used in this set of experiments.

The qualitative and quantitative testing results obtained by our SRAL algorithm on the CMU-VL dataset are shown in Fig. 4. The qualitative results in Fig. 4(a) show the images (upper row) with the maximum score for each query image (bottom row) in an observed sequence. It is clearly observed that our SRAL method can match scene images well and recognize the same locations across different months that exhibit significant weather, vegetation, and illumination changes. The quantitative results in Fig. 4(b) indicate that the SRAL method with M-norm and $\ell_{2,1}$-norm regularizations obtains strong performance. To demonstrate the significance of our SRAL method with weighted feature fusion, we also implemented the baseline feature concatenation method and compared its performance. The comparison in Fig. 4(b) indicates that our SRAL method is again better able to address the LAC problem and obtain better scene recognition results, the same phenomenon observed in the experiment on the St Lucia dataset. Fig. 4(b) also compares our SRAL method with several previous loop closure detection approaches, supporting the same conclusion as in the St Lucia experiment: our SRAL approach significantly outperforms methods based on single-feature matching for long-term place recognition.

Fig. 5. Experimental results over the Nordland dataset. (a) An example illustrating the matched images and query images recorded in winter and spring, respectively. (b) Precision-recall curves comparing our SRAL algorithm with BRIEF-GIST, feature concatenation, SeqSLAM, and single color and LBP features. (c) The weights of each feature modality learned automatically by Algorithm 1 (color 2.49%, GIST 0.2%, LBP 4%, HOG 96.68%). The figures are best viewed in color.

D. Results on the Nordland Dataset (Different Seasons)

The Nordland dataset [35] contains visual data from a ten-hour train journey, recorded in each of the four seasons from the viewpoint of the train's front car (around 3000 km of visual data in total). GPS data was also collected, which is employed as the ground truth for algorithm evaluation. Because the dataset was recorded across different seasons, the visual data contains strong scene appearance variations under different climatological, environmental, and illumination conditions. In addition, since multiple places in the wilderness along the trip exhibit similar appearance, this dataset contains strong perceptual aliasing. The visual information is also very limited in several locations such as tunnels. These difficulties make the Nordland dataset one of the most challenging datasets for loop closure detection.

Fig. 5 presents the experimental results obtained by matching the spring data to the winter video in the Nordland dataset. Fig. 5(a) shows several example images from one of the matched sequences (upper row) and the query images (bottom row) within the sequence. Fig. 5(a) validates that our SRAL algorithm can accurately match images that contain dramatic appearance changes across seasons.
Fig. 5(b) illustrates the quantitative results obtained by our SRAL algorithm and the comparison with previous techniques, showing the improvement of our SRAL method over existing methods. It can also be observed from Fig. 5(b) that our SRAL method outperforms the feature concatenation method, showing the significance of feature weight learning.

V. CONCLUSION

In this paper, we propose a novel SRAL algorithm that learns shared representative features and fuses the learned heterogeneous feature modalities together to address the LAC problem and accurately perform life-long place recognition. The shared representability of each feature modality is automatically learned, formulated as an optimization problem regularized by structured sparsity-inducing norms. A new optimization algorithm is also implemented to efficiently solve the formulated problem. To evaluate our SRAL algorithm, extensive experiments were performed using three large-scale benchmark datasets. The qualitative results validate that SRAL robustly performs long-term place recognition under significant scene variations across environments in different days, months, and seasons. In addition, the quantitative evaluation shows that the proposed algorithm outperforms previous techniques and obtains promising life-long place recognition performance.

REFERENCES

[1] Adrien Angeli, David Filliat, Stéphane Doncieux, and Jean-Arcady Meyer. Fast and incremental method for loop-closure detection using bags of visual words. TRO, 24(5), 2008.
[2] R. Arroyo, P. F. Alcantarilla, L. M. Bergasa, and E. Romera. Towards life-long visual localization using an efficient matching of binary sequences from images. In ICRA, 2015.

[3] Hernán Badino, Daniel Huber, and Takeo Kanade. Real-time topometric localization. In ICRA, 2012.
[4] César Cadena, Dorian Gálvez-López, Juan D. Tardós, and José Neira. Robust place recognition with stereo sequences. TRO, 28(4):871-885, 2012.
[5] Cheng Chen and Han Wang. Appearance-based topological Bayesian inference for loop-closing detection in a cross-country environment. IJRR, 25(10), 2006.
[6] Mark Cummins and Paul Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance. IJRR, 27(6), 2008.
[7] Mark Cummins and Paul Newman. Highly scalable appearance-only SLAM: FAB-MAP 2.0. In RSS, 2009.
[8] Jakob Engel, Thomas Schöps, and Daniel Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In ECCV, 2014.
[9] Dorian Gálvez-López and Juan D. Tardós. Bags of binary words for fast place recognition in image sequences. TRO, 28(5):1188-1197, 2012.
[10] Arren J. Glover, William P. Maddern, Michael J. Milford, and Gordon F. Wyeth. FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day. In ICRA, 2010.
[11] Jens-Steffen Gutmann and Kurt Konolige. Incremental mapping of large cyclic environments. In ISCIRA, 1999.
[12] Paul Hansen and Brett Browning. Visual place recognition using HMM sequence matching. In IROS, 2014.
[13] Peter Henry, Michael Krainin, Evan Herbst, Xiaofeng Ren, and Dieter Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. IJRR, 31(5), 2012.
[14] Kin Leong Ho and Paul Newman. Detecting loop closure with scene sequences. IJCV, 74(3):261-286, 2007.
[15] Edward Johns and Guang-Zhong Yang. Feature co-occurrence maps: Appearance-based localisation throughout the day. In ICRA, 2013.
[16] Michael Kaess, Hordur Johannsson, Richard Roberts, Viorela Ila, John J. Leonard, and Frank Dellaert. iSAM2: Incremental smoothing and mapping using the Bayes tree. IJRR, 31(2):216-235, 2012.
[17] Manfred Klopschitz, Christopher Zach, Arnold Irschara, and Dieter Schmalstieg. Generalized detection and merging of loop closures for video sequences. In 3D Data Processing, Visualization, and Transmission, 2008.
[18] Mathieu Labbe and François Michaud. Appearance-based loop closure detection for online large-scale and long-term operation. TRO, 29(3), 2013.
[19] Mathieu Labbe and François Michaud. Online global loop closure detection for large-scale multi-session graph-based SLAM. In IROS, 2014.
[20] Yasir Latif, César Cadena, and José Neira. Robust loop closing over time for pose graph SLAM. IJRR, pages 1611-1626, 2013.
[21] Yasir Latif, Guoquan Huang, John Leonard, and José Neira. An online sparsity-cognizant loop-closure algorithm for visual navigation. In RSS, 2014.
[22] Donghwa Lee, Hyongjin Kim, and Hyun Myung. 2D image feature-based real-time RGB-D 3D SLAM. In RITA.
[23] Stephanie Lowry, Niko Sünderhauf, Paul Newman, John J. Leonard, David Cox, Peter Corke, and Michael J. Milford. Visual place recognition: A survey. TRO, 2015.
[24] Michael J. Milford and Gordon F. Wyeth. SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In ICRA, 2012.
[25] Michael J. Milford, Gordon F. Wyeth, and David Prasser. RatSLAM: A hippocampal model for simultaneous localization and mapping. In ICRA, 2004.
[26] Raúl Mur-Artal and Juan D. Tardós. Fast relocalisation and loop closing in keyframe-based SLAM. In ICRA, 2014.
[27] Raúl Mur-Artal, José María Martínez Montiel, and Juan D. Tardós. ORB-SLAM: A versatile and accurate monocular SLAM system. TRO, 31(5):1147-1163, 2015.
[28] Tayyab Naseer, Luciano Spinello, Wolfram Burgard, and Cyrill Stachniss.
Robust visual robot localization across seasons using network flows. In AAAI, 2014.
[29] Tayyab Naseer, Michael Ruhnke, Cyrill Stachniss, Luciano Spinello, and Wolfram Burgard. Robust visual SLAM across seasons. In IROS, 2015.
[30] Feiping Nie, Heng Huang, Xiao Cai, and Chris H. Ding. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In NIPS, 2010.
[31] Edward Pepperell, Peter Corke, and Michael J. Milford. All-environment visual place recognition with SMART. In ICRA, 2014.
[32] Yongliang Qiao, Cindy Cappelle, and Yassine Ruichek. Place recognition based visual localization using LBP feature and SVM. In Advances in Artificial Intelligence and Its Applications.
[33] Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In IROS, 2012.
[34] Niko Sünderhauf and Peter Protzel. BRIEF-Gist: Closing the loop by simple means. In IROS, 2011.
[35] Niko Sünderhauf, Peer Neubert, and Peter Protzel. Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In ICRA Workshop, 2013.
[36] Niko Sünderhauf, Sareh Shirazi, Adam Jacobson, Feras Dayoub, Edward Pepperell, Ben Upcroft, and Michael Milford. Place recognition with ConvNet landmarks: Viewpoint-robust, condition-robust, training-free. In RSS, 2015.
[37] Sebastian Thrun and John J. Leonard. Simultaneous localization and mapping. In Springer Handbook of Robotics, 2008.
[38] Xue Yang, Fei Han, Hua Wang, and Hao Zhang. Enforcing template representability and temporal consistency for adaptive sparse tracking. In IJCAI, 2016.
[39] Hao Zhang, Fei Han, and Hua Wang. Robust multimodal sequence-based loop closure detection via structured sparsity. In RSS, 2016.


More information

Visual localization using global visual features and vanishing points

Visual localization using global visual features and vanishing points Visual localization using global visual features and vanishing points Olivier Saurer, Friedrich Fraundorfer, and Marc Pollefeys Computer Vision and Geometry Group, ETH Zürich, Switzerland {saurero,fraundorfer,marc.pollefeys}@inf.ethz.ch

More information

Beyond a Shadow of a Doubt: Place Recognition with Colour-Constant Images

Beyond a Shadow of a Doubt: Place Recognition with Colour-Constant Images Beyond a Shadow of a Doubt: Place Recognition with Colour-Constant Images Kirk MacTavish, Michael Paton, and Timothy D. Barfoot Abstract Colour-constant images have been shown to improve visual navigation

More information

Place Recognition using Near and Far Visual Information

Place Recognition using Near and Far Visual Information Place Recognition using Near and Far Visual Information César Cadena John McDonald John J. Leonard José Neira Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza

More information

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Marc Pollefeys Joined work with Nikolay Savinov, Christian Haene, Lubor Ladicky 2 Comparison to Volumetric Fusion Higher-order ray

More information

String distance for automatic image classification

String distance for automatic image classification String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,

More information

Proc. 14th Int. Conf. on Intelligent Autonomous Systems (IAS-14), 2016

Proc. 14th Int. Conf. on Intelligent Autonomous Systems (IAS-14), 2016 Proc. 14th Int. Conf. on Intelligent Autonomous Systems (IAS-14), 2016 Outdoor Robot Navigation Based on View-based Global Localization and Local Navigation Yohei Inoue, Jun Miura, and Shuji Oishi Department

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)

More information

/10/$ IEEE 4048

/10/$ IEEE 4048 21 IEEE International onference on Robotics and Automation Anchorage onvention District May 3-8, 21, Anchorage, Alaska, USA 978-1-4244-54-4/1/$26. 21 IEEE 448 Fig. 2: Example keyframes of the teabox object.

More information

Scan Context: Egocentric Spatial Descriptor for Place Recognition within 3D Point Cloud Map

Scan Context: Egocentric Spatial Descriptor for Place Recognition within 3D Point Cloud Map Scan Context: Egocentric Spatial Descriptor for Place Recognition within 3D Point Cloud Map Giseop Kim and Ayoung Kim In many robotics applications, place recognition is the important problem. For SLAM,

More information

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Combining PGMs and Discriminative Models for Upper Body Pose Detection Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative

More information

Notes 9: Optical Flow

Notes 9: Optical Flow Course 049064: Variational Methods in Image Processing Notes 9: Optical Flow Guy Gilboa 1 Basic Model 1.1 Background Optical flow is a fundamental problem in computer vision. The general goal is to find

More information

Proceedings of Australasian Conference on Robotics and Automation, 7-9 Dec 2011, Monash University, Melbourne Australia.

Proceedings of Australasian Conference on Robotics and Automation, 7-9 Dec 2011, Monash University, Melbourne Australia. Towards Condition-Invariant Sequence-Based Route Recognition Michael Milford School of Engineering Systems Queensland University of Technology michael.milford@qut.edu.au Abstract In this paper we present

More information

3D Environment Reconstruction

3D Environment Reconstruction 3D Environment Reconstruction Using Modified Color ICP Algorithm by Fusion of a Camera and a 3D Laser Range Finder The 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems October 11-15,

More information

Lecture 12 Recognition

Lecture 12 Recognition Institute of Informatics Institute of Neuroinformatics Lecture 12 Recognition Davide Scaramuzza 1 Lab exercise today replaced by Deep Learning Tutorial Room ETH HG E 1.1 from 13:15 to 15:00 Optional lab

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Seminar Heidelberg University

Seminar Heidelberg University Seminar Heidelberg University Mobile Human Detection Systems Pedestrian Detection by Stereo Vision on Mobile Robots Philip Mayer Matrikelnummer: 3300646 Motivation Fig.1: Pedestrians Within Bounding Box

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

CS4670: Computer Vision

CS4670: Computer Vision CS4670: Computer Vision Noah Snavely Lecture 6: Feature matching and alignment Szeliski: Chapter 6.1 Reading Last time: Corners and blobs Scale-space blob detector: Example Feature descriptors We know

More information

Computer Vision. Exercise Session 10 Image Categorization

Computer Vision. Exercise Session 10 Image Categorization Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Dense 3D Reconstruction from Autonomous Quadrocopters

Dense 3D Reconstruction from Autonomous Quadrocopters Dense 3D Reconstruction from Autonomous Quadrocopters Computer Science & Mathematics TU Munich Martin Oswald, Jakob Engel, Christian Kerl, Frank Steinbrücker, Jan Stühmer & Jürgen Sturm Autonomous Quadrocopters

More information

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Hyunghoon Cho and David Wu December 10, 2010 1 Introduction Given its performance in recent years' PASCAL Visual

More information

Local features and image matching. Prof. Xin Yang HUST

Local features and image matching. Prof. Xin Yang HUST Local features and image matching Prof. Xin Yang HUST Last time RANSAC for robust geometric transformation estimation Translation, Affine, Homography Image warping Given a 2D transformation T and a source

More information

A Framework for Multi-Robot Pose Graph SLAM

A Framework for Multi-Robot Pose Graph SLAM A Framework for Multi-Robot Pose Graph SLAM Isaac Deutsch 1, Ming Liu 2 and Roland Siegwart 3 Abstract We introduce a software framework for realtime multi-robot collaborative SLAM. Rather than building

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

Visual Bearing-Only Simultaneous Localization and Mapping with Improved Feature Matching

Visual Bearing-Only Simultaneous Localization and Mapping with Improved Feature Matching Visual Bearing-Only Simultaneous Localization and Mapping with Improved Feature Matching Hauke Strasdat, Cyrill Stachniss, Maren Bennewitz, and Wolfram Burgard Computer Science Institute, University of

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

Efficient Interest Point Detectors & Features

Efficient Interest Point Detectors & Features Efficient Interest Point Detectors & Features Instructor - Simon Lucey 16-423 - Designing Computer Vision Apps Today Review. Efficient Interest Point Detectors. Efficient Descriptors. Review In classical

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

(Deep) Learning for Robot Perception and Navigation. Wolfram Burgard

(Deep) Learning for Robot Perception and Navigation. Wolfram Burgard (Deep) Learning for Robot Perception and Navigation Wolfram Burgard Deep Learning for Robot Perception (and Navigation) Lifeng Bo, Claas Bollen, Thomas Brox, Andreas Eitel, Dieter Fox, Gabriel L. Oliveira,

More information

Introduction to SLAM Part II. Paul Robertson

Introduction to SLAM Part II. Paul Robertson Introduction to SLAM Part II Paul Robertson Localization Review Tracking, Global Localization, Kidnapping Problem. Kalman Filter Quadratic Linear (unless EKF) SLAM Loop closing Scaling: Partition space

More information

Self Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation

Self Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation For Intelligent GPS Navigation and Traffic Interpretation Tianshi Gao Stanford University tianshig@stanford.edu 1. Introduction Imagine that you are driving on the highway at 70 mph and trying to figure

More information

Lecture 12 Recognition. Davide Scaramuzza

Lecture 12 Recognition. Davide Scaramuzza Lecture 12 Recognition Davide Scaramuzza Oral exam dates UZH January 19-20 ETH 30.01 to 9.02 2017 (schedule handled by ETH) Exam location Davide Scaramuzza s office: Andreasstrasse 15, 2.10, 8050 Zurich

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

A High Dynamic Range Vision Approach to Outdoor Localization

A High Dynamic Range Vision Approach to Outdoor Localization A High Dynamic Range Vision Approach to Outdoor Localization Kiyoshi Irie, Tomoaki Yoshida, and Masahiro Tomono Abstract We propose a novel localization method for outdoor mobile robots using High Dynamic

More information

Topological Mapping. Discrete Bayes Filter

Topological Mapping. Discrete Bayes Filter Topological Mapping Discrete Bayes Filter Vision Based Localization Given a image(s) acquired by moving camera determine the robot s location and pose? Towards localization without odometry What can be

More information

Localization and Map Building

Localization and Map Building Localization and Map Building Noise and aliasing; odometric position estimation To localize or not to localize Belief representation Map representation Probabilistic map-based localization Other examples

More information

Robust Visual Localization in Changing Lighting Conditions

Robust Visual Localization in Changing Lighting Conditions 2017 IEEE International Conference on Robotics and Automation (ICRA) Singapore, May 29 - June 3, 2017 Robust Visual Localization in Changing Lighting Conditions Pyojin Kim 1, Brian Coltin 2, Oleg Alexandrov

More information

Efficient Vision Data Processing on a Mobile Device for Indoor Localization

Efficient Vision Data Processing on a Mobile Device for Indoor Localization Efficient Vision Data Processing on a Mobile Device for Indoor Localization Michał Nowicki, Jan Wietrzykowski, Piotr Skrzypczyński, Member, IEEE 1 arxiv:1611.02061v1 [cs.ro] 7 Nov 2016 Abstract The paper

More information

CS 231A Computer Vision (Fall 2011) Problem Set 4

CS 231A Computer Vision (Fall 2011) Problem Set 4 CS 231A Computer Vision (Fall 2011) Problem Set 4 Due: Nov. 30 th, 2011 (9:30am) 1 Part-based models for Object Recognition (50 points) One approach to object recognition is to use a deformable part-based

More information