Perception IV: Place Recognition, Line Extraction

Perception IV: Place Recognition, Line Extraction. Davide Scaramuzza, University of Zurich. With Margarita Chli, Paul Furgale, Marco Hutter, Roland Siegwart.

Outline of Today's lecture: place recognition using a Vocabulary Tree; line extraction from images; line extraction from laser data.

K-means clustering - Review Minimizes the sum of squared Euclidean distances between the points x_i and their nearest cluster centers m_k: D(X, M) = Σ_k Σ_{x_i in cluster k} (x_i − m_k)². Algorithm: randomly initialize the K cluster centers, then iterate until convergence: assign each data point to the nearest center, and recompute each cluster center as the mean of all points assigned to it.
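
A minimal NumPy sketch of the loop just described; the function name, the initialization from randomly chosen data points, and the convergence test are choices made for this example:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means. X: (n, d) data matrix. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the K cluster centers from the data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```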

K-means clustering - Demo Source: http://shabal.in/visuals/kmeans/1.html

Feature-based object recognition - Review Q: Is this Book present in the Scene? Look for corresponding matches. Most of the Book's keypoints are present in the Scene. A: The Book is present in the Scene.

Taking this a step further Find an object in an image? Find an object in multiple images? Find multiple objects in multiple images? As the number of images increases, feature-based object recognition becomes computationally infeasible.

Fast visual search Query in a database of 110 million images in 5.8 seconds. "Video Google", Sivic and Zisserman, ICCV 2003; "Scalable Recognition with a Vocabulary Tree", Nistér and Stewénius, CVPR 2006.

How much is 110 million images?

Bag of Words Extension to scene/place recognition: Is this image in my database? Robot: Have I been to this place before? Use analogies from text retrieval: Visual Words, a Vocabulary of Visual Words, and the Bag of Words (BoW) approach.

Indexing local features With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image? Quantize/cluster the descriptors into 'visual words' and use inverted-file indexing schemes.
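
To make the 'visual words' idea concrete, a small sketch that quantizes an image's descriptors to their nearest word and builds the word histogram; the helper name bag_of_words and the k-d tree used for the nearest-neighbor search are assumptions of this example, not part of the lecture:

```python
import numpy as np
from scipy.spatial import cKDTree

def bag_of_words(descriptors, vocabulary):
    """Map each local descriptor to its nearest visual word and return
    the per-image word histogram (the 'bag of words').

    descriptors: (n, d) array, e.g. SIFT descriptors of one image
    vocabulary:  (V, d) array of cluster centers (the visual words)
    """
    tree = cKDTree(vocabulary)             # fast nearest-neighbor lookup
    _, word_ids = tree.query(descriptors)  # nearest center index per descriptor
    return np.bincount(word_ids, minlength=len(vocabulary))
```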

Indexing local features: inverted file For text documents, an efficient way to find all pages on which a word occurs is to use an index. We want to find all images in which a feature occurs. To use this idea, we'll need to map our features to visual words.

Building the Visual Vocabulary Extract features from an image collection and cluster the descriptors in descriptor space; the resulting cluster centers are the visual words. [Figure: pipeline and examples of visual words.]

Video Google These features map to the same visual word. Video Google [J. Sivic and A. Zisserman, ICCV 2003]. Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/

Video Google System 1. Collect all words within the query region 2. Use the inverted file index to find relevant frames 3. Compare word counts 4. Spatial verification. [Figure: query region and retrieved frames.] Sivic & Zisserman, ICCV 2003. Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle

Efficient Place/Object Recognition We can describe a scene as a collection of words and look up images with a similar collection of words in the database. What if we need to find an object/scene in a database of millions of images? Build a Vocabulary Tree via hierarchical clustering and use the inverted file system for efficient indexing [Nistér and Stewénius, CVPR 2006].

Recognition with K-tree Populate the descriptor space. [Animation over several slides.]
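
A possible sketch of building such a vocabulary tree by hierarchical k-means, in the spirit of Nistér and Stewénius; the class and function names, the branching factor, and the depth are illustrative choices, not the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

class Node:
    """One node of the vocabulary tree; the leaves are the visual words."""
    def __init__(self, centers=None, children=None):
        self.centers = centers    # (branching, d) centers of the children
        self.children = children  # list of Node, or None at a leaf
        self.word_id = None       # set on leaves by assign_word_ids

def build_tree(descriptors, branching=10, depth=3):
    """Hierarchical k-means: cluster, then recurse into each cluster."""
    if depth == 0 or len(descriptors) < branching:
        return Node()             # leaf: one visual word
    km = KMeans(n_clusters=branching, n_init=4).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == j], branching, depth - 1)
                for j in range(branching)]
    return Node(centers=km.cluster_centers_, children=children)

def assign_word_ids(node, next_id=0):
    """Number the leaves in depth-first order."""
    if node.children is None:
        node.word_id = next_id
        return next_id + 1
    for child in node.children:
        next_id = assign_word_ids(child, next_id)
    return next_id

def lookup(node, descriptor):
    """Quantize one descriptor: descend to the nearest child at every level."""
    while node.children is not None:
        j = np.argmin(((node.centers - descriptor) ** 2).sum(axis=1))
        node = node.children[j]
    return node.word_id
```

With this structure, quantizing a descriptor costs only about branching × depth distance computations instead of one comparison per leaf word.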

Building the inverted file index. [Animation over several slides.]

Inverted File index The Inverted File DB lists all possible visual words. Each word points to the list of images in which it occurs. The voting array has as many cells as there are images in the DB; each word in the query image votes for the images in which it appears. [Figure: a query image Q with visual words 101, 102, 105, 105, 180, 180, 180; the Inverted File DB maps each word to its list of images, and each occurrence adds +1 to the matching cells of the voting array for Q.]
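
A toy sketch of the inverted file and voting array described above, reusing the word ids from the figure (101, 102, 105, 180); the class and method names are hypothetical:

```python
from collections import defaultdict
import numpy as np

class InvertedFile:
    """Inverted file: each visual word points to the images it occurs in."""
    def __init__(self, num_images):
        self.index = defaultdict(list)   # word id -> list of image ids
        self.num_images = num_images

    def add_image(self, image_id, word_ids):
        for w in set(word_ids):          # record each word once per image
            self.index[w].append(image_id)

    def query(self, word_ids):
        # Voting array: one cell per database image
        votes = np.zeros(self.num_images, dtype=int)
        for w in word_ids:                    # each query word occurrence
            for image_id in self.index[w]:    # votes for the images it is in
                votes[image_id] += 1
        return votes.argsort()[::-1]          # image ids, best match first

db = InvertedFile(num_images=3)
db.add_image(0, [101, 102, 105])
db.add_image(1, [105, 180])
db.add_image(2, [180, 180, 180])
print(db.query([101, 102, 105, 105, 180, 180, 180]))
# [1 0 2]: image 1 collects 5 votes, image 0 collects 4, image 2 collects 3
```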

FABMAP [Cummins and Newman, IJRR 2011] Place recognition for robot localization. Use training images to build the BoW database and a probabilistic model of the world. At a new frame, compute: P(being at a known place) and P(being at a new place). Captures the dependencies between words to distinguish the most characteristic structure of each scene (using the Chow-Liu tree). Binaries are available online.

FABMAP example robots.ox.ac.uk/~mjc/appearance_based_results.htm (p = probability of the images coming from the same place)

Robust object/scene recognition The Visual Vocabulary holds appearance information but discards the spatial relationships between features. Two images with the same features shuffled around will be a 100% match when using only appearance information. If different arrangements of the same features are expected, one can apply geometric verification: test the k most similar images to the query image for geometric consistency (e.g., using RANSAC). Further reading (beyond the scope of this course): [Cummins and Newman, IJRR 2011]; [Stewénius et al., ECCV 2012].

Outline of Today's lecture: place recognition using a Vocabulary Tree; line extraction from images; line extraction from laser data.

Line extraction Suppose that you have been commissioned to implement lane detection for a car driving-assistance system. How would you proceed?

Line extraction How do we extract lines from edges?

Two popular line extraction algorithms Hough transform (also used to detect circles, ellipses, and other parametric shapes) RANSAC (Random Sample Consensus)

Hough-Transform Finds lines in a binary edge image using a voting procedure. The voting space (or accumulator) is called the Hough space. 1. P. Hough, "Machine Analysis of Bubble Chamber Pictures", Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959. 2. R. O. Duda and P. E. Hart (April 1971), "Use of the Hough Transformation to Detect Lines and Curves in Pictures", Artificial Intelligence Center (SRI International).

Hough-Transform Let (x0, y0) be an image point. We can represent all the lines passing through it by y0 = m x0 + b. The Hough transform works by parameterizing this expression in terms of m and b: b = −x0 m + y0. This is the equation of a line in the Hough space: every image point votes for a line in the Hough space.

Hough-Transform A second image point (x1, y1) votes, in the same way, for the line b = −x1 m + y1 in the Hough space.

Hough-Transform How do we determine the line (b', m') that contains both (x0, y0) and (x1, y1)? It is the intersection of the lines b = −x0 m + y0 and b = −x1 m + y1 in the Hough space.

Hough-Transform Each point in image space votes for line parameters in the Hough parameter space; points lying on the same image line vote for a common intersection point (b', m').

Hough-Transform Problems with the (m, b) space: the parameter domain is unbounded, since m and b can assume any value in (−∞, +∞). Alternative line representation: the polar representation ρ = x cos θ + y sin θ.

Hough-Transform Each point in image space maps to a sinusoid in the parameter space (ρ, θ): ρ = x cos θ + y sin θ. [Figure: sinusoids in the (θ, ρ) plane, with θ ranging from 0° to 180°.]

Hough-Transform
1. Initialize: set all accumulator cells H(θ, ρ) to zero
2. For each edge point (x, y) in the image, for all θ in [0 : step : 180]: compute ρ = x cos θ + y sin θ and increment H(θ, ρ) = H(θ, ρ) + 1
3. Find the values of (θ, ρ) where H(θ, ρ) is a local maximum
4. Each detected line in the image is given by ρ = x cos θ + y sin θ
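
A compact sketch of this accumulator loop; the function name, the 1-degree θ step, and the integer ρ binning are illustrative choices:

```python
import numpy as np

def hough_lines(edge_points, img_shape, theta_step_deg=1.0):
    """Accumulate votes in (theta, rho) space for a set of edge pixels."""
    h, w = img_shape
    rho_max = int(np.ceil(np.hypot(h, w)))          # largest possible |rho|
    thetas = np.deg2rad(np.arange(0.0, 180.0, theta_step_deg))
    # rho can be negative, so offset the accumulator index by rho_max
    acc = np.zeros((len(thetas), 2 * rho_max + 1), dtype=int)
    for x, y in edge_points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        acc[np.arange(len(thetas)), np.round(rhos).astype(int) + rho_max] += 1
    return acc, thetas, rho_max

# Three collinear points on the line y = x (theta = 135 deg, rho = 0):
pts = [(10, 10), (20, 20), (30, 30)]
acc, thetas, rho_max = hough_lines(pts, img_shape=(100, 100))
t, r = np.unravel_index(acc.argmax(), acc.shape)
print(np.rad2deg(thetas[t]), r - rho_max)   # 135.0, 0
```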

Examples [Figure: an input image and its Hough accumulator over (θ, ρ).] Notice, however, that the Hough transform only finds the parameters of the line, not its endpoints. How do you find them?

Problems Effects of noise: peaks get fuzzy and hard to locate. How to overcome this? Increase the bin size (i.e., lower the resolution of the Hough space); however, this reduces the accuracy of the line parameters. Alternatively, convolve the accumulator with a box filter: smoothing pools slightly misaligned votes into their common neighborhood, so fuzzy peaks become single, clearly locatable maxima.
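
A small demonstration of the box-filter remedy on a synthetic accumulator (the noise model and the peak position are made up for this demo; uniform_filter is SciPy's box filter):

```python
import numpy as np
from scipy.ndimage import uniform_filter

# A toy accumulator: Poisson background noise plus one fuzzy peak whose
# votes are smeared over the bins around (theta, rho) = (135, 100)
rng = np.random.default_rng(0)
acc = rng.poisson(1.0, size=(180, 200)).astype(float)
for dt, dr in [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]:
    acc[135 + dt, 100 + dr] += 8.0

# Box filtering pools the smeared votes back into a single clear maximum
smoothed = uniform_filter(acc, size=5)
print(np.unravel_index(smoothed.argmax(), smoothed.shape))  # ~(135, 100)
```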

RANSAC (RAndom SAmple Consensus) It has become the standard method for model fitting in the presence of outliers (very noisy points or wrong data). It can be applied to line fitting, but also to thousands of different problems where the goal is to estimate the parameters of a model from noisy data (e.g., camera calibration, structure from motion, DLT, homography estimation, etc.). Let's now focus on line extraction. M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A paradigm for model fitting with applications to image analysis and automated cartography", Communications of the ACM (Graphics and Image Processing), 24(6):381-395, 1981.

RANSAC, illustrated step by step over several slides: select a sample of 2 points at random; calculate the model parameters that fit the data in the sample; calculate the error function for each data point; select the data that supports the current hypothesis; repeat the sampling.

RANSAC The set with the maximum number of inliers obtained after k iterations is chosen as the solution.

RANSAC How many iterations does RANSAC need? Ideally: check all possible combinations of 2 points in a dataset of N points. The number of all pairwise combinations is N(N−1)/2, which is computationally infeasible if N is too large. Example: 10,000 edge points would require checking 10,000 × 9,999 / 2 ≈ 50 million combinations! Do we really need to check all combinations, or can we stop after some iterations? Checking a subset of combinations is enough if we have a rough estimate of the percentage of inliers in our dataset. This can be done in a probabilistic way.

RANSAC How many iterations does RANSAC need? Let N be the total number of data points and w the fraction of inliers (number of inliers / N), so that w = P(selecting an inlier point from the dataset). Let p := P(selecting a set of points free of outliers). Assumption: the 2 points necessary to estimate a line are selected independently. Then w² = P(both selected points are inliers) and 1 − w² = P(at least one of the two points is an outlier). Let k := the number of RANSAC iterations executed so far; then (1 − w²)^k = P(RANSAC never selects two points that are both inliers). Setting 1 − p = (1 − w²)^k and solving for k: k = log(1 − p) / log(1 − w²).

RANSAC How many iterations does RANSAC need? The number of iterations is k = log(1 − p) / log(1 − w²): knowing the fraction of inliers w, after k RANSAC iterations we will have a probability p of finding a set of points free of outliers. Example: if we want a probability of success p = 99% and we know that w = 50%, then k = 16 iterations; these are dramatically fewer than the number of all possible combinations! Notice that the number of points does not influence the estimated number of iterations; only w does. In practice we need only a rough estimate of w. More advanced variants of RANSAC estimate the fraction of inliers and adaptively update it on every iteration.
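
Checking the example numerically; the logarithm base cancels in the ratio, so natural logarithms are fine:

```python
import math

p, w = 0.99, 0.50   # desired success probability, inlier fraction
k = math.log(1 - p) / math.log(1 - w ** 2)
print(k)            # ~16.0, matching the k = 16 of the example above
```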

RANSAC - Algorithm Let A be a set of N points.
1. repeat
2. Randomly select a sample of 2 points from A
3. Fit a line through the 2 points
4. Compute the distances of all other points to this line
5. Construct the inlier set (i.e., count the number of points whose distance < d)
6. Store these inliers
7. until the maximum number of iterations k is reached
8. The set with the maximum number of inliers is chosen as the solution to the problem
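
A minimal sketch of this loop for 2D line fitting; the function name, the normal-form line representation a·x + b·y + c = 0, and the handling of degenerate samples are choices made for this example, not a reference implementation:

```python
import numpy as np

def ransac_line(points, k=100, d=1.0, seed=0):
    """Fit a line a*x + b*y + c = 0 to 2D points with RANSAC.

    points: (N, 2) array; k: number of iterations; d: inlier distance threshold.
    Returns the best (a, b, c) and a boolean inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_inliers, best_line = None, None
    for _ in range(k):
        # 1. Randomly select a sample of 2 points
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        # 2. Fit a line through them: the unit normal n is perpendicular to p2 - p1
        n = np.array([p1[1] - p2[1], p2[0] - p1[0]])
        norm = np.linalg.norm(n)
        if norm == 0:
            continue                     # degenerate sample: coincident points
        n = n / norm
        c = -n @ p1
        # 3. Compute the distances of all points to this line
        dist = np.abs(points @ n + c)
        # 4. Construct the inlier set
        inliers = dist < d
        # 5. Keep the hypothesis with the most inliers
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_line = inliers, (n[0], n[1], c)
    return best_line, best_inliers
```

In practice one would refit the line to the final inlier set by least squares; the loop above only keeps the best 2-point hypothesis.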

RANSAC - Applications RANSAC = RANdom SAmple Consensus: a generic and robust algorithm for fitting models in the presence of outliers (i.e., points which do not satisfy a model). It can be applied in general to any problem where the goal is to identify the inliers which satisfy a predefined model. Typical applications in robotics are: line extraction from 2D range data, plane extraction from 3D data, feature matching, structure from motion, camera calibration, homography estimation, etc. RANSAC is iterative and non-deterministic: the probability of finding a set free of outliers increases as more iterations are used. Drawback: being non-deterministic, its results differ between runs.

Outline of Today's lecture: place recognition using a Vocabulary Tree; line extraction from images; line extraction from laser data: RANSAC (same as for line detection from images), Hough (same as for line detection from images), Split-and-Merge (laser scans only: uses the sequential order of the scan data), Line-Regression (laser scans only: uses the sequential order of the scan data).

Algorithm 1: Split-and-Merge (standard) A popular algorithm, originating from computer vision: a recursive procedure of fitting and splitting. A slightly different version, called iterative end-point-fit, simply connects the end points for line fitting. Let S be the set of all data points. Split: fit a line to the points in the current set S; find the point most distant from the line; if its distance > threshold, split the set and repeat with the left and right point sets. Merge: if two consecutive segments are collinear enough, obtain the common line and find the most distant point; if its distance <= threshold, merge both segments. (A sketch of the split step follows below.)
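
A sketch of the split step of the iterative end-point-fit variant; the merge step is omitted for brevity, the function names and the right-angle demo scan are illustrative, and distinct segment end points are assumed:

```python
import numpy as np

def point_line_dist(pts, p1, p2):
    """Perpendicular distance of each point to the line through p1 and p2."""
    n = np.array([p1[1] - p2[1], p2[0] - p1[0]], dtype=float)
    n /= np.linalg.norm(n)           # assumes p1 != p2
    return np.abs((pts - p1) @ n)

def split(pts, threshold):
    """Iterative end-point-fit: connect the end points and split at the most
    distant point while that distance exceeds the threshold."""
    if len(pts) <= 2:
        return [pts]
    dists = point_line_dist(pts, pts[0], pts[-1])
    i = dists.argmax()
    if dists[i] <= threshold:
        return [pts]                 # all points close enough: one segment
    # Recurse on the left and right point sets (the split point is shared)
    return split(pts[: i + 1], threshold) + split(pts[i:], threshold)

# Scan points forming a right angle: we expect two segments
scan = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [3, 1], [3, 2], [3, 3]], float)
segments = split(scan, threshold=0.1)
print([(s[0].tolist(), s[-1].tolist()) for s in segments])
# [([0.0, 0.0], [3.0, 0.0]), ([3.0, 0.0], [3.0, 3.0])]
```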

Algorithm 1: Split-and-Merge (iterative end-point-fit) Iterative end-point-fit simply connects the end points for line fitting. [Animation over several slides: the set is split at the most distant point, the splits recurse until no more splits are needed, and collinear neighbors are merged.]

Algorithm 2: Line-Regression Use a sliding window of N_f points: fit a line segment to all points in each window, then merge adjacent segments whose line parameters are close. Line-Regression: initialize the sliding window size N_f; fit a line to every N_f consecutive points (i.e., in each window); merge overlapping line segments and recompute the line parameters for each merged segment. (Example with N_f = 3; a sketch follows below.)
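
Finally, a rough sketch of the line-regression idea; the total-least-squares window fit, the (θ, ρ) closeness test, and the thresholds are illustrative assumptions, and angle wrap-around at ±180° is ignored for brevity:

```python
import numpy as np

def fit_line_polar(pts):
    """Total-least-squares line fit; returns (theta, rho) of the line normal."""
    centroid = pts.mean(axis=0)
    # The eigenvector of the smallest eigenvalue of the covariance matrix
    # is the line normal
    _, vecs = np.linalg.eigh(np.cov((pts - centroid).T))
    n = vecs[:, 0]
    theta = np.arctan2(n[1], n[0])
    rho = centroid @ n
    if rho < 0:                      # keep rho non-negative for comparison
        theta, rho = theta + np.pi, -rho
    return theta, rho

def line_regression(scan, nf=3, dtheta=0.1, drho=0.1):
    """Fit a line to every window of nf consecutive scan points, then merge
    neighboring windows whose (theta, rho) parameters are close."""
    params = [fit_line_polar(scan[i:i + nf]) for i in range(len(scan) - nf + 1)]
    segments, start = [], 0
    for i in range(1, len(params)):
        t0, r0 = params[i - 1]
        t1, r1 = params[i]
        if abs(t1 - t0) > dtheta or abs(r1 - r0) > drho:
            segments.append(scan[start: i - 1 + nf])   # close current segment
            start = i
    segments.append(scan[start:])
    return segments
```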