Perception IV: Place Recognition, Line Extraction

Perception IV: Place Recognition, Line Extraction. Davide Scaramuzza, University of Zurich. With Margarita Chli, Paul Furgale, Marco Hutter, Roland Siegwart.

Outline of Today's lecture: place recognition using a Vocabulary Tree; line extraction from images; line extraction from laser data.

K-means clustering - Review Minimizes the sum of squared Euclidean distances between the points x_i and their nearest cluster centers m_k: D(X, M) = Σ_k Σ_{x_i in cluster k} (x_i − m_k)². Algorithm: randomly initialize the K cluster centers, then iterate until convergence: assign each data point to the nearest center, and recompute each cluster center as the mean of all points assigned to it.
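
A minimal NumPy sketch of the loop just described; the function name, the initialization from randomly chosen data points, and the convergence test are choices made for this example:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means. X: (n, d) data matrix. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the K cluster centers from the data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels
```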

K-means clustering - Demo Source: http://shabal.in/visuals/kmeans/1.html

Feature-based object recognition - Review Q: Is this Book present in the Scene? Look for corresponding matches. Most of the Book's keypoints are present in the Scene. A: The Book is present in the Scene.

Taking this a step further Find an object in an image? Find an object in multiple images? Find multiple objects in multiple images? As the number of images increases, feature-based object recognition becomes computationally infeasible.

Fast visual search Query in a database of 110 million images in 5.8 seconds. "Video Google", Sivic and Zisserman, ICCV 2003; "Scalable Recognition with a Vocabulary Tree", Nistér and Stewénius, CVPR 2006.

How much is 110 million images?

Bag of Words Extension to scene/place recognition: Is this image in my database? Robot: Have I been to this place before? Use analogies from text retrieval: Visual Words, a Vocabulary of Visual Words, and the Bag of Words (BoW) approach.

Indexing local features With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find those that are relevant to a new image? Quantize/cluster the descriptors into 'visual words' and use inverted-file indexing schemes.
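
To make the 'visual words' idea concrete, a small sketch that quantizes an image's descriptors to their nearest word and builds the word histogram; the helper name bag_of_words and the k-d tree used for the nearest-neighbor search are assumptions of this example, not part of the lecture:

```python
import numpy as np
from scipy.spatial import cKDTree

def bag_of_words(descriptors, vocabulary):
    """Map each local descriptor to its nearest visual word and return
    the per-image word histogram (the 'bag of words').

    descriptors: (n, d) array, e.g. SIFT descriptors of one image
    vocabulary:  (V, d) array of cluster centers (the visual words)
    """
    tree = cKDTree(vocabulary)             # fast nearest-neighbor lookup
    _, word_ids = tree.query(descriptors)  # nearest center index per descriptor
    return np.bincount(word_ids, minlength=len(vocabulary))
```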

Indexing local features: inverted file For text documents, an efficient way to find all pages on which a word occurs is to use an index. We want to find all images in which a feature occurs. To use this idea, we'll need to map our features to visual words.

Building the Visual Vocabulary Extract features from an image collection and cluster the descriptors in descriptor space; the resulting cluster centers are the visual words. [Figure: pipeline and examples of visual words.]

Video Google These features map to the same visual word. Video Google [J. Sivic and A. Zisserman, ICCV 2003]. Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/

Video Google System 1. Collect all words within the query region 2. Use the inverted file index to find relevant frames 3. Compare word counts 4. Spatial verification. [Figure: query region and retrieved frames.] Sivic & Zisserman, ICCV 2003. Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle

Efficient Place/Object Recognition We can describe a scene as a collection of words and look up images with a similar collection of words in the database. What if we need to find an object/scene in a database of millions of images? Build a Vocabulary Tree via hierarchical clustering and use the inverted file system for efficient indexing [Nistér and Stewénius, CVPR 2006].

Recognition with K-tree Populate the descriptor space. [Animation over several slides.]
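
A possible sketch of building such a vocabulary tree by hierarchical k-means, in the spirit of Nistér and Stewénius; the class and function names, the branching factor, and the depth are illustrative choices, not the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

class Node:
    """One node of the vocabulary tree; the leaves are the visual words."""
    def __init__(self, centers=None, children=None):
        self.centers = centers    # (branching, d) centers of the children
        self.children = children  # list of Node, or None at a leaf
        self.word_id = None       # set on leaves by assign_word_ids

def build_tree(descriptors, branching=10, depth=3):
    """Hierarchical k-means: cluster, then recurse into each cluster."""
    if depth == 0 or len(descriptors) < branching:
        return Node()             # leaf: one visual word
    km = KMeans(n_clusters=branching, n_init=4).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == j], branching, depth - 1)
                for j in range(branching)]
    return Node(centers=km.cluster_centers_, children=children)

def assign_word_ids(node, next_id=0):
    """Number the leaves in depth-first order."""
    if node.children is None:
        node.word_id = next_id
        return next_id + 1
    for child in node.children:
        next_id = assign_word_ids(child, next_id)
    return next_id

def lookup(node, descriptor):
    """Quantize one descriptor: descend to the nearest child at every level."""
    while node.children is not None:
        j = np.argmin(((node.centers - descriptor) ** 2).sum(axis=1))
        node = node.children[j]
    return node.word_id
```

With this structure, quantizing a descriptor costs only about branching × depth distance computations instead of one comparison per leaf word.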

Building the inverted file index. [Animation over several slides.]

Inverted File index The Inverted File DB lists all possible visual words. Each word points to the list of images in which it occurs. The voting array has as many cells as there are images in the DB; each word in the query image votes for the images in which it appears. [Figure: a query image Q with visual words 101, 102, 105, 105, 180, 180, 180; the Inverted File DB maps each word to its list of images, and each occurrence adds +1 to the matching cells of the voting array for Q.]
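
A toy sketch of the inverted file and voting array described above, reusing the word ids from the figure (101, 102, 105, 180); the class and method names are hypothetical:

```python
from collections import defaultdict
import numpy as np

class InvertedFile:
    """Inverted file: each visual word points to the images it occurs in."""
    def __init__(self, num_images):
        self.index = defaultdict(list)   # word id -> list of image ids
        self.num_images = num_images

    def add_image(self, image_id, word_ids):
        for w in set(word_ids):          # record each word once per image
            self.index[w].append(image_id)

    def query(self, word_ids):
        # Voting array: one cell per database image
        votes = np.zeros(self.num_images, dtype=int)
        for w in word_ids:                    # each query word occurrence
            for image_id in self.index[w]:    # votes for the images it is in
                votes[image_id] += 1
        return votes.argsort()[::-1]          # image ids, best match first

db = InvertedFile(num_images=3)
db.add_image(0, [101, 102, 105])
db.add_image(1, [105, 180])
db.add_image(2, [180, 180, 180])
print(db.query([101, 102, 105, 105, 180, 180, 180]))
# [1 0 2]: image 1 collects 5 votes, image 0 collects 4, image 2 collects 3
```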

FABMAP [Cummins and Newman, IJRR 2011] Place recognition for robot localization. Use training images to build the BoW database and a probabilistic model of the world. At a new frame, compute: P(being at a known place) and P(being at a new place). Captures the dependencies between words to distinguish the most characteristic structure of each scene (using the Chow-Liu tree). Binaries are available online.

FABMAP example robots.ox.ac.uk/~mjc/appearance_based_results.htm (p = probability of the images coming from the same place)

Robust object/scene recognition The Visual Vocabulary holds appearance information but discards the spatial relationships between features. Two images with the same features shuffled around will be a 100% match when using only appearance information. If different arrangements of the same features are expected, one can apply geometric verification: test the k most similar images to the query image for geometric consistency (e.g., using RANSAC). Further reading (beyond the scope of this course): [Cummins and Newman, IJRR 2011]; [Stewénius et al., ECCV 2012].

Outline of Today's lecture: place recognition using a Vocabulary Tree; line extraction from images; line extraction from laser data.

Line extraction Suppose that you have been commissioned to implement lane detection for a car driving-assistance system. How would you proceed?

Line extraction How do we extract lines from edges?

Two popular line extraction algorithms Hough transform (also used to detect circles, ellipses, and other parametric shapes) RANSAC (Random Sample Consensus)

Hough-Transform Finds lines in a binary edge image using a voting procedure. The voting space (or accumulator) is called the Hough space. 1. P. Hough, "Machine Analysis of Bubble Chamber Pictures", Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959. 2. R. O. Duda and P. E. Hart (April 1971), "Use of the Hough Transformation to Detect Lines and Curves in Pictures", Artificial Intelligence Center (SRI International).

Hough-Transform Let (x0, y0) be an image point. We can represent all the lines passing through it by y0 = m x0 + b. The Hough transform works by parameterizing this expression in terms of m and b: b = −x0 m + y0. This is the equation of a line in the Hough space: every image point votes for a line in the Hough space.

Hough-Transform A second image point (x1, y1) votes, in the same way, for the line b = −x1 m + y1 in the Hough space.

Hough-Transform How do we determine the line (b', m') that contains both (x0, y0) and (x1, y1)? It is the intersection of the lines b = −x0 m + y0 and b = −x1 m + y1 in the Hough space.

Hough-Transform Each point in image space votes for line parameters in the Hough parameter space; points lying on the same image line vote for a common intersection point (b', m').

Hough-Transform Problems with the (m, b) space: the parameter domain is unbounded, since m and b can assume any value in (−∞, +∞). Alternative line representation: the polar representation ρ = x cos θ + y sin θ.

Hough-Transform Each point in image space maps to a sinusoid in the parameter space (ρ, θ): ρ = x cos θ + y sin θ. [Figure: sinusoids in the (θ, ρ) plane, with θ ranging from 0° to 180°.]

Hough-Transform
1. Initialize: set all accumulator cells H(θ, ρ) to zero
2. For each edge point (x, y) in the image, for all θ in [0 : step : 180]: compute ρ = x cos θ + y sin θ and increment H(θ, ρ) = H(θ, ρ) + 1
3. Find the values of (θ, ρ) where H(θ, ρ) is a local maximum
4. Each detected line in the image is given by ρ = x cos θ + y sin θ
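
A compact sketch of this accumulator loop; the function name, the 1-degree θ step, and the integer ρ binning are illustrative choices:

```python
import numpy as np

def hough_lines(edge_points, img_shape, theta_step_deg=1.0):
    """Accumulate votes in (theta, rho) space for a set of edge pixels."""
    h, w = img_shape
    rho_max = int(np.ceil(np.hypot(h, w)))          # largest possible |rho|
    thetas = np.deg2rad(np.arange(0.0, 180.0, theta_step_deg))
    # rho can be negative, so offset the accumulator index by rho_max
    acc = np.zeros((len(thetas), 2 * rho_max + 1), dtype=int)
    for x, y in edge_points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        acc[np.arange(len(thetas)), np.round(rhos).astype(int) + rho_max] += 1
    return acc, thetas, rho_max

# Three collinear points on the line y = x (theta = 135 deg, rho = 0):
pts = [(10, 10), (20, 20), (30, 30)]
acc, thetas, rho_max = hough_lines(pts, img_shape=(100, 100))
t, r = np.unravel_index(acc.argmax(), acc.shape)
print(np.rad2deg(thetas[t]), r - rho_max)   # 135.0, 0
```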

Examples [Figure: an input image and its Hough accumulator over (θ, ρ).] Notice, however, that the Hough transform only finds the parameters of the line, not its endpoints. How do you find them?

Problems Effects of noise: peaks get fuzzy and hard to locate. How to overcome this? Increase the bin size (i.e., lower the resolution of the Hough space); however, this reduces the accuracy of the line parameters. Alternatively, convolve the accumulator with a box filter: smoothing pools slightly misaligned votes into their common neighborhood, so fuzzy peaks become single, clearly locatable maxima.
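
A small demonstration of the box-filter remedy on a synthetic accumulator (the noise model and the peak position are made up for this demo; uniform_filter is SciPy's box filter):

```python
import numpy as np
from scipy.ndimage import uniform_filter

# A toy accumulator: Poisson background noise plus one fuzzy peak whose
# votes are smeared over the bins around (theta, rho) = (135, 100)
rng = np.random.default_rng(0)
acc = rng.poisson(1.0, size=(180, 200)).astype(float)
for dt, dr in [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]:
    acc[135 + dt, 100 + dr] += 8.0

# Box filtering pools the smeared votes back into a single clear maximum
smoothed = uniform_filter(acc, size=5)
print(np.unravel_index(smoothed.argmax(), smoothed.shape))  # ~(135, 100)
```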

RANSAC (RAndom SAmple Consensus) It has become the standard method for model fitting in the presence of outliers (very noisy points or wrong data). It can be applied to line fitting, but also to thousands of different problems where the goal is to estimate the parameters of a model from noisy data (e.g., camera calibration, structure from motion, DLT, homography estimation, etc.). Let's now focus on line extraction. M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A paradigm for model fitting with applications to image analysis and automated cartography", Communications of the ACM (Graphics and Image Processing), 24(6):381-395, 1981.

RANSAC, illustrated step by step over several slides: select a sample of 2 points at random; calculate the model parameters that fit the data in the sample; calculate the error function for each data point; select the data that supports the current hypothesis; repeat the sampling.

RANSAC The set with the maximum number of inliers obtained after k iterations is chosen as the solution.

RANSAC How many iterations does RANSAC need? Ideally: check all possible combinations of 2 points in a dataset of N points. The number of all pairwise combinations is N(N−1)/2, which is computationally infeasible if N is too large. Example: 10,000 edge points would require checking 10,000 × 9,999 / 2 ≈ 50 million combinations! Do we really need to check all combinations, or can we stop after some iterations? Checking a subset of combinations is enough if we have a rough estimate of the percentage of inliers in our dataset. This can be done in a probabilistic way.

RANSAC How many iterations does RANSAC need? Let N be the total number of data points and w the fraction of inliers (number of inliers / N), so that w = P(selecting an inlier point from the dataset). Let p := P(selecting a set of points free of outliers). Assumption: the 2 points necessary to estimate a line are selected independently. Then w² = P(both selected points are inliers) and 1 − w² = P(at least one of the two points is an outlier). Let k := the number of RANSAC iterations executed so far; then (1 − w²)^k = P(RANSAC never selects two points that are both inliers). Setting 1 − p = (1 − w²)^k and solving for k: k = log(1 − p) / log(1 − w²).

RANSAC How many iterations does RANSAC need? The number of iterations is k = log(1 − p) / log(1 − w²): knowing the fraction of inliers w, after k RANSAC iterations we will have a probability p of finding a set of points free of outliers. Example: if we want a probability of success p = 99% and we know that w = 50%, then k = 16 iterations; these are dramatically fewer than the number of all possible combinations! Notice that the number of points does not influence the estimated number of iterations; only w does. In practice we need only a rough estimate of w. More advanced variants of RANSAC estimate the fraction of inliers and adaptively update it on every iteration.
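
Checking the example numerically; the logarithm base cancels in the ratio, so natural logarithms are fine:

```python
import math

p, w = 0.99, 0.50   # desired success probability, inlier fraction
k = math.log(1 - p) / math.log(1 - w ** 2)
print(k)            # ~16.0, matching the k = 16 of the example above
```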

RANSAC - Algorithm Let A be a set of N points.
1. repeat
2. Randomly select a sample of 2 points from A
3. Fit a line through the 2 points
4. Compute the distances of all other points to this line
5. Construct the inlier set (i.e., count the number of points whose distance < d)
6. Store these inliers
7. until the maximum number of iterations k is reached
8. The set with the maximum number of inliers is chosen as the solution to the problem
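
A minimal sketch of this loop for 2D line fitting; the function name, the normal-form line representation a·x + b·y + c = 0, and the handling of degenerate samples are choices made for this example, not a reference implementation:

```python
import numpy as np

def ransac_line(points, k=100, d=1.0, seed=0):
    """Fit a line a*x + b*y + c = 0 to 2D points with RANSAC.

    points: (N, 2) array; k: number of iterations; d: inlier distance threshold.
    Returns the best (a, b, c) and a boolean inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_inliers, best_line = None, None
    for _ in range(k):
        # 1. Randomly select a sample of 2 points
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        # 2. Fit a line through them: the unit normal n is perpendicular to p2 - p1
        n = np.array([p1[1] - p2[1], p2[0] - p1[0]])
        norm = np.linalg.norm(n)
        if norm == 0:
            continue                     # degenerate sample: coincident points
        n = n / norm
        c = -n @ p1
        # 3. Compute the distances of all points to this line
        dist = np.abs(points @ n + c)
        # 4. Construct the inlier set
        inliers = dist < d
        # 5. Keep the hypothesis with the most inliers
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_line = inliers, (n[0], n[1], c)
    return best_line, best_inliers
```

In practice one would refit the line to the final inlier set by least squares; the loop above only keeps the best 2-point hypothesis.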

RANSAC - Applications RANSAC = RANdom SAmple Consensus: a generic and robust algorithm for fitting models in the presence of outliers (i.e., points which do not satisfy a model). It can be applied in general to any problem where the goal is to identify the inliers which satisfy a predefined model. Typical applications in robotics are: line extraction from 2D range data, plane extraction from 3D data, feature matching, structure from motion, camera calibration, homography estimation, etc. RANSAC is iterative and non-deterministic: the probability of finding a set free of outliers increases as more iterations are used. Drawback: being non-deterministic, its results differ between runs.

Outline of Today's lecture: place recognition using a Vocabulary Tree; line extraction from images; line extraction from laser data: RANSAC (same as for line detection from images), Hough (same as for line detection from images), Split-and-Merge (laser scans only: uses the sequential order of the scan data), Line-Regression (laser scans only: uses the sequential order of the scan data).

Algorithm 1: Split-and-Merge (standard) A popular algorithm, originating from computer vision: a recursive procedure of fitting and splitting. A slightly different version, called iterative end-point-fit, simply connects the end points for line fitting. Let S be the set of all data points. Split: fit a line to the points in the current set S; find the point most distant from the line; if its distance > threshold, split the set and repeat with the left and right point sets. Merge: if two consecutive segments are collinear enough, obtain the common line and find the most distant point; if its distance <= threshold, merge both segments. (A sketch of the split step follows below.)
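
A sketch of the split step of the iterative end-point-fit variant; the merge step is omitted for brevity, the function names and the right-angle demo scan are illustrative, and distinct segment end points are assumed:

```python
import numpy as np

def point_line_dist(pts, p1, p2):
    """Perpendicular distance of each point to the line through p1 and p2."""
    n = np.array([p1[1] - p2[1], p2[0] - p1[0]], dtype=float)
    n /= np.linalg.norm(n)           # assumes p1 != p2
    return np.abs((pts - p1) @ n)

def split(pts, threshold):
    """Iterative end-point-fit: connect the end points and split at the most
    distant point while that distance exceeds the threshold."""
    if len(pts) <= 2:
        return [pts]
    dists = point_line_dist(pts, pts[0], pts[-1])
    i = dists.argmax()
    if dists[i] <= threshold:
        return [pts]                 # all points close enough: one segment
    # Recurse on the left and right point sets (the split point is shared)
    return split(pts[: i + 1], threshold) + split(pts[i:], threshold)

# Scan points forming a right angle: we expect two segments
scan = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [3, 1], [3, 2], [3, 3]], float)
segments = split(scan, threshold=0.1)
print([(s[0].tolist(), s[-1].tolist()) for s in segments])
# [([0.0, 0.0], [3.0, 0.0]), ([3.0, 0.0], [3.0, 3.0])]
```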

Algorithm 1: Split-and-Merge (iterative end-point-fit) Iterative end-point-fit simply connects the end points for line fitting. [Animation over several slides: the set is split at the most distant point, the splits recurse until no more splits are needed, and collinear neighbors are merged.]

Algorithm 2: Line-Regression Use a sliding window of N_f points: fit a line segment to all points in each window, then merge adjacent segments whose line parameters are close. Line-Regression: initialize the sliding window size N_f; fit a line to every N_f consecutive points (i.e., in each window); merge overlapping line segments and recompute the line parameters for each merged segment. (Example with N_f = 3; a sketch follows below.)
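
Finally, a rough sketch of the line-regression idea; the total-least-squares window fit, the (θ, ρ) closeness test, and the thresholds are illustrative assumptions, and angle wrap-around at ±180° is ignored for brevity:

```python
import numpy as np

def fit_line_polar(pts):
    """Total-least-squares line fit; returns (theta, rho) of the line normal."""
    centroid = pts.mean(axis=0)
    # The eigenvector of the smallest eigenvalue of the covariance matrix
    # is the line normal
    _, vecs = np.linalg.eigh(np.cov((pts - centroid).T))
    n = vecs[:, 0]
    theta = np.arctan2(n[1], n[0])
    rho = centroid @ n
    if rho < 0:                      # keep rho non-negative for comparison
        theta, rho = theta + np.pi, -rho
    return theta, rho

def line_regression(scan, nf=3, dtheta=0.1, drho=0.1):
    """Fit a line to every window of nf consecutive scan points, then merge
    neighboring windows whose (theta, rho) parameters are close."""
    params = [fit_line_polar(scan[i:i + nf]) for i in range(len(scan) - nf + 1)]
    segments, start = [], 0
    for i in range(1, len(params)):
        t0, r0 = params[i - 1]
        t1, r1 = params[i]
        if abs(t1 - t0) > dtheta or abs(r1 - r0) > drho:
            segments.append(scan[start: i - 1 + nf])   # close current segment
            start = i
    segments.append(scan[start:])
    return segments
```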