Advanced Techniques for Mobile Robotics Bag-of-Words Models & Appearance-Based Mapping


Advanced Techniques for Mobile Robotics Bag-of-Words Models & Appearance-Based Mapping Wolfram Burgard, Cyrill Stachniss, Kai Arras, Maren Bennewitz

Motivation: Analogy to Documents

Text example 1 (highlighted words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel): "Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image."

Text example 2 (highlighted words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value): "China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value."

image source: L. Fei-Fei 2

Object Classification / Scene Recognition Analogy to documents: The content can be inferred from the frequency of words image source: L. Fei-Fei object bag of visual words 3

Bag of Visual Words Visual words = independent features face image source: L. Fei-Fei features 4

Bag of Visual Words Visual words = independent features Construct a dictionary of representative words codewords dictionary image source: L. Fei-Fei 5

Bag of Visual Words Visual words = independent features Construct a dictionary of representative words Represent the images based on a histogram of word occurrences (bag) Each detected feature is assigned to the closest entry in the codebook image source: L. Fei-Fei 6

Overview: feature detection & representation → codewords dictionary → image representation. slide adapted from: L. Fei-Fei

Feature Detection and Representation detected features in a set of training images (intensity changes) slide adapted from: L. Fei-Fei example patch 8

Feature Detection and Representation descriptor vectors (e.g., SIFT/SURF, consider local orientations of gradients) detected features in a set of training images (intensity changes) slide adapted from: L. Fei-Fei example patch 9
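To make the descriptor step concrete, here is a minimal sketch using OpenCV's SIFT implementation (assuming an opencv-python build where cv2.SIFT_create is available; the image file name is only a placeholder):

```python
import cv2

# Load one training image in grayscale (placeholder file name).
img = cv2.imread("training_image.png", cv2.IMREAD_GRAYSCALE)
assert img is not None, "replace the placeholder path with a real image"

# Detect keypoints at intensity changes and compute SIFT descriptors that
# summarize the local orientations of gradients around each keypoint.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# descriptors is an (N, 128) float32 array, one row per detected feature.
print(len(keypoints), descriptors.shape)
```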

Learning the Dictionary slide adapted from: L. Fei-Fei

Learning the Dictionary: cluster centers = codewords (clustering, e.g., k-means). slide adapted from: L. Fei-Fei
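A minimal sketch of this clustering step, assuming the descriptors of all training images have been stacked into one array and using scikit-learn's k-means (the vocabulary size k and the random data are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the SIFT/SURF descriptors of all training images,
# stacked row-wise (random data here, only for illustration).
descriptors = np.random.rand(5000, 128).astype(np.float32)

# Cluster the descriptor space; the k cluster centers become the codewords.
k = 200
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_   # shape (k, 128), the visual dictionary
```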

Example Codewords Dictionary Fei-Fei et al. 2005

Example Image Representation: Build the histogram by assigning each detected feature to the closest entry in the codebook. [Histogram: word frequency over the codewords; slide adapted from: L. Fei-Fei]
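A sketch of building such a histogram for one image, assuming codebook holds the cluster centers from the previous sketch (random stand-in data is used for illustration):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and count occurrences."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)                        # nearest codeword per feature
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)              # normalized word frequencies

# Random stand-in data: 350 descriptors of one image, 200 codewords.
codebook = np.random.rand(200, 128)
descriptors = np.random.rand(350, 128)
print(bow_histogram(descriptors, codebook).shape)   # (200,)
```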

Properties of Bag-of-Words
- Compact summary of content
- Flexible to viewpoint, deformations
- Can be used for object / image classification by comparing the histograms (and applying some discriminative method)
- Ignores geometry
- Unclear how to choose the optimal vocabulary: too small, and the words are not representative of all patches; too large, artifacts and over-fitting 15

Appearance-Based Mapping with a Bag-of-Words Approach. Based on: M. Cummins & P. Newman, "FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance", Int. Journal of Robotics Research, 2008; and "Appearance-only SLAM at Large Scale with FAB-MAP 2.0", Int. Journal of Robotics Research, 2010. Slides based on a presentation of Mark Cummins at R:SS 2009. 16

Motivation: Failure of Metric SLAM Appearance information can help to recover the pose estimate where metric approaches may fail 17

Appearance-Based Mapping (1)
- Recognize places based on visual appearance, even under difficult conditions
- Decide whether observations result from places already in the map, or from new, unseen places
- Difficult problem, since different places may have similar visual appearance (and vice versa)
- Apply a bag-of-words approach
- Extension: take into account that certain combinations of words co-occur 18

Appearance-Based Mapping (2)
- Parameterize the world as a set of discrete locations
- Estimate their positions in an appearance space
- Distinctive places can be recognized even after unknown motion (loop-closure) 19

Example [figure: the current observation and the map]

Learning the Visual Vocabulary: feature extraction (SURF)

Clustering in Feature Space

Bag-of-Words Representation: feature detection → compute descriptor vectors → quantize to the nearest codeword (e.g., Word 753)

Inference in FAB-MAP. Current observation: Z_k = [0 1 0 ... 1 1], the binary observation at time k, where v = number of words in the dictionary. [Figure: the map locations plus a potential new place]
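In FAB-MAP the image is thus summarized as a binary vector rather than a frequency histogram: z_q = 1 if word q was detected at least once. A small sketch, with the word indices chosen only to reproduce the example vector above:

```python
import numpy as np

def binary_observation(word_indices, v):
    """Z_k: entry q is 1 if word q was detected in the current image, else 0."""
    z = np.zeros(v, dtype=int)
    z[np.unique(word_indices)] = 1
    return z

# Words 1, 3 and 4 detected in a 5-word vocabulary -> [0 1 0 1 1]
print(binary_observation([1, 3, 4, 3], v=5))
```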

Environment Representation: At time k, the world is modeled as a collection of discrete and disjoint locations. Place appearance model: belief about the existence of scene elements (words) at each location. Detector model: relates feature existence e_q to feature detection (observation) z_q. 26
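The detector model can be summarized by two error rates, a false-positive probability p(z_q = 1 | e_q = 0) and a false-negative probability p(z_q = 0 | e_q = 1). A sketch with arbitrary placeholder values (not the values used in FAB-MAP):

```python
# Detector model: probability of detecting word q in the image given whether
# the underlying scene element exists at the location (placeholder values).
P_FALSE_POS = 0.01   # p(z_q = 1 | e_q = 0)
P_FALSE_NEG = 0.40   # p(z_q = 0 | e_q = 1)

def p_detection_given_existence(z, e):
    """p(z_q | e_q) for binary detection z and binary existence e."""
    if e == 1:
        return 1.0 - P_FALSE_NEG if z == 1 else P_FALSE_NEG
    return P_FALSE_POS if z == 1 else 1.0 - P_FALSE_POS
```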

Graphical Model [figure: word existence e_q linked to word observation z_q by the detector model]

Correlations of Word Occurrence Visual words are not independent, instead they tend to co-occur 28

Capturing Correlations: Learn a tree-structured Bayesian network to capture dependencies between words (Chow Liu algorithm). [Example: co-occurring words 28 and 741] 29

Capturing Correlations: Learn a tree-structured Bayesian network to capture dependencies between words (Chow Liu algorithm). [Figure: tree over word variables, e.g., z_741, z_28, z_119, z_611, z_3118, with a marked root node and parent-of edges]
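One way to sketch the Chow Liu step: estimate pairwise mutual information between binary word occurrences from training data and extract a maximum-weight spanning tree (here via SciPy's minimum spanning tree on negated weights; the training matrix is random stand-in data):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x, y, eps=1e-12):
    """Mutual information between two binary variables, estimated from samples."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b)) + eps
            p_a = np.mean(x == a) + eps
            p_b = np.mean(y == b) + eps
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Stand-in training data: rows = images, columns = binary word occurrences.
rng = np.random.default_rng(0)
Z = (rng.random((500, 6)) < 0.3).astype(int)
v = Z.shape[1]

# Pairwise mutual information (upper triangle is enough for the spanning tree).
W = np.zeros((v, v))
for i in range(v):
    for j in range(i + 1, v):
        W[i, j] = mutual_information(Z[:, i], Z[:, j])

# Maximum-weight spanning tree = minimum spanning tree on negated weights.
tree = minimum_spanning_tree(-W).toarray()
edges = [(i, j) for i in range(v) for j in range(v) if tree[i, j] != 0.0]
print(edges)  # undirected tree edges; root the tree at any node to get parents
```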

Graphical Model [figure: as before, with the word existence variables now connected by the Chow Liu tree; the detector model links word existence to word observation]

Inference in FAB-MAP: p(L_i | Z^k) = p(Z_k | L_i, Z^{k-1}) p(L_i | Z^{k-1}) / p(Z_k | Z^{k-1}), where Z^k denotes all observations up to time k, p(Z_k | L_i, Z^{k-1}) is the observation likelihood, p(L_i | Z^{k-1}) is the prior for location L_i, and p(Z_k | Z^{k-1}) is the normalizing term.

Observation Likelihood: Use the Chow Liu tree for the joint distribution, p(Z_k | L_i) ≈ p(z_r | L_i) ∏_q p(z_q | z_{p_q}, L_i), where z_r is the root word and p_q denotes the parent of word q in the tree. Each factor can be further expanded in terms of the detector model and the place appearance model (which is updated online), with the remaining quantities estimated from training data. 35
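A sketch of evaluating such a tree-factored likelihood, assuming a hypothetical lookup function for the conditional terms of a given place model (the names and toy probabilities are illustrative, not FAB-MAP's actual data structures):

```python
import math

def log_observation_likelihood(z, parents, root, cond_prob):
    """
    log p(Z_k | L_i) under a Chow Liu tree factorization:
        p(Z_k | L_i) ~ p(z_root | L_i) * prod over q of p(z_q | z_parent(q), L_i)

    z         : binary observation vector
    parents   : dict mapping node q -> its parent in the tree (root excluded)
    root      : index of the root node
    cond_prob : callable (q, z_q, z_parent) -> p(z_q | z_parent, L_i);
                for the root node, z_parent is passed as None
    """
    logp = math.log(cond_prob(root, z[root], None))
    for q, parent in parents.items():
        logp += math.log(cond_prob(q, z[q], z[parent]))
    return logp

# Tiny illustrative place model: 3 words, word 0 is the root of the tree.
parents = {1: 0, 2: 0}

def cond_prob(q, zq, zp):
    # Placeholder conditionals: a word is more likely if its parent was seen.
    base = 0.2 if zp in (None, 0) else 0.7
    return base if zq == 1 else 1.0 - base

print(log_observation_likelihood([1, 0, 1], parents, root=0, cond_prob=cond_prob))
```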

Location Prior: Use a simple motion model to compute p(L_i | Z^{k-1}). If the vehicle is at location i at time k-1, it is likely to be at one of the topologically adjacent locations at time k. Part of the probability mass is assigned to a new place node to account for unknown neighbors (no odometry is used). 36
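A sketch of such a topological motion model, assuming an adjacency list over the mapped locations and a fixed share of probability mass reserved for the new-place hypothesis (both the data structure and the value 0.1 are illustrative assumptions):

```python
import numpy as np

def location_prior(prev_belief, adjacency, p_new=0.1):
    """
    Spread each location's previous belief over its topological neighbors
    (including itself) and reserve a share p_new for the new-place node.
    Returns a vector of length n_locations + 1; the last entry is 'new place'.
    """
    n = len(prev_belief)
    prior = np.zeros(n + 1)
    for i, b in enumerate(prev_belief):
        targets = adjacency.get(i, []) + [i]          # stay or move to a neighbor
        for j in targets:
            prior[j] += (1.0 - p_new) * b / len(targets)
        prior[n] += p_new * b                         # mass for an unseen place
    return prior / prior.sum()

# Three mapped places on a chain 0-1-2; currently believed to be at place 1.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
print(location_prior(np.array([0.0, 1.0, 0.0]), adjacency))
```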

Normalization: In p(L_i | Z^k) = p(Z_k | L_i, Z^{k-1}) p(L_i | Z^{k-1}) / p(Z_k | Z^{k-1}), we need to evaluate the normalizing term p(Z_k | Z^{k-1}) explicitly, since the current observation might come from a location not yet contained in the map.

The normalizer splits into mapped and unmapped places: p(Z_k | Z^{k-1}) = Σ_{m ∈ M} p(Z_k | L_m, Z^{k-1}) p(L_m | Z^{k-1}) + Σ_{u ∈ M̄} p(Z_k | L_u, Z^{k-1}) p(L_u | Z^{k-1}), where M is the set of mapped places and M̄ the unmapped places. The second sum cannot be evaluated directly; it is approximated by sampling, distributing the prior probability of being at a new location over place models sampled from the training data.
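A sketch of this approximation, assuming obs_lik(place) returns the observation likelihood p(Z_k | L) for a place model and that a list of place models sampled from training data stands in for the unmapped world (all names and numbers are illustrative):

```python
import numpy as np

def normalizer(obs_lik, mapped_places, prior_mapped, sampled_places, p_new):
    """
    p(Z_k | Z^{k-1}) ~ sum over mapped places of likelihood * prior
                       + p_new * average likelihood over sampled place models.
    prior_mapped[m] stands for p(L_m | Z^{k-1}).
    """
    mapped_term = sum(obs_lik(m) * p for m, p in zip(mapped_places, prior_mapped))
    if sampled_places:
        unmapped_term = p_new * np.mean([obs_lik(s) for s in sampled_places])
    else:
        unmapped_term = 0.0
    return mapped_term + unmapped_term

# Toy example: places are just labels, likelihoods are made-up numbers.
liks = {0: 0.02, 1: 0.30, 2: 0.05, "s0": 0.01, "s1": 0.04}
print(normalizer(lambda m: liks[m], [0, 1, 2], [0.3, 0.4, 0.2],
                 ["s0", "s1"], p_new=0.1))
```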

Updating Place Models: Use maximum likelihood data association after each observation and update the appearance model of the matched place. Each component of the model is updated with Bayes rule from its prior, under two assumptions: observations are independent given the place, and detection errors are independent of the location. 40
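A sketch of updating one component of the matched place's appearance model, i.e., the belief that word q's underlying scene element exists, via Bayes rule with the detector-model error rates from above (the values are placeholders):

```python
def update_existence_belief(p_exists, z, p_fp=0.01, p_fn=0.40):
    """
    Posterior p(e_q = 1 | z_q) from the prior p(e_q = 1) and the detector
    model p(z_q | e_q) with false-positive rate p_fp and false-negative p_fn.
    """
    if z == 1:
        num = (1.0 - p_fn) * p_exists
        den = num + p_fp * (1.0 - p_exists)
    else:
        num = p_fn * p_exists
        den = num + (1.0 - p_fp) * (1.0 - p_exists)
    return num / den

# Observing word q raises the belief that its scene element exists.
print(update_existence_belief(0.3, z=1))   # > 0.3
print(update_existence_belief(0.3, z=0))   # < 0.3
```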

Experimental Results
- Training: 2k images, collected 30 m apart (vocabulary + Chow Liu tree); vocabulary of 100k words
- Test: 1000 km data set with 103k images, ~8 m apart, 50k loop closures, 21 h of driving
- Robust matching even when the place appearance changes
- Correct loop closures under perspective changes, rotation, lighting changes, dynamic objects, ...

Perspective Change

Rotation

Lighting Change

Dynamic Objects

Perceptual Aliasing Correctly Rejected

Highest Confidence False Positives

Loop Closure (70 km Data Set) Trajectory from GPS data

Video 49

Timing Performance. Mean computation times: inference 25 ms for 100k locations; SURF detection + quantization 483 ms.

Visual Odometry 51

Visual Odometry with Loop Closure Constraints 52

Combined Result 53

Multi-Session Mapping 54

Multi-Session Mapping 55

Summary
- Appearance-only navigation
- Bag-of-words approach to recognize places
- Chow Liu tree to capture dependencies between words
- The probabilistic framework can deal with perceptual aliasing and new place detection
- Successfully detects loops in challenging outdoor environments
- Fast enough for online loop closure detection
- Can be used to complement metric SLAM 56

Further Reading
- M. Cummins & P. Newman: FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Int. Journal of Robotics Research, 2008
- M. Cummins & P. Newman: Appearance-only SLAM at Large Scale with FAB-MAP 2.0. Int. Journal of Robotics Research, 2010 57