Autonomous Mobile Robots — Perception: Sensors, Vision, Uncertainties, Line extraction from laser scans (title slide: see-think-act scheme with Perception, Localization, Cognition/Path Planning and Motion Control blocks) Zürich Autonomous Systems Lab
2 Today's lecture: From object recognition to scene/place recognition (Section 4.6); Omnidirectional Cameras (today's exercise, Section 4.2.4); Uncertainties: Representation + Propagation (Section 4.7); Line extraction from laser scans
3 Object Recognition Q: Is this Book present in the Scene? Scene Book
4 Object Recognition Q: Is this Book present in the Scene? Extract keypoints in both images
5 Object Recognition Q: Is this Book present in the Scene? Look for corresponding matches. Most of the Book's keypoints are present in the Scene. A: The Book is present in the Scene.
6 Taking this a step further Find an object in an image? Find an object in multiple images? Find multiple objects in multiple images?
7 Bag of Words Extension to scene/place recognition: Is this image in my database? Robot: Have I been to this place before? loop closure problem, kidnapped robot problem Use analogies from text retrieval: Visual Words Vocabulary of Visual Words Bag of Words (BOW) approach
8 Building the Visual Vocabulary: Image Collection → Extract Features → Cluster Descriptors in descriptor space. Examples of Visual Words. (Images for this slide are courtesy of Mark Cummins.) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
9 Video Google: one image, a thousand words? (Image courtesy of Andrew Zisserman.) These features map to the same visual word. Video Google [J. Sivic and A. Zisserman, ICCV 2003] Demo:
10 Efficient Place/Object Recognition Same principle: we describe a scene as a collection of visual words and look up the database for images with a similar collection of words. What if we need to find an object/scene in a database of millions of images? Build a Vocabulary Tree via hierarchical clustering. Use an Inverted File system: each node in the tree is associated with a list of the images containing an instance of this node. [D. Nister and H. Stewénius, CVPR 2006]
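A minimal Python/NumPy sketch of the bag-of-words lookup described above. It is deliberately simplified: a flat k-means vocabulary with a plain voting score replaces the hierarchical vocabulary tree and TF-IDF weighting of Nister and Stewénius, descriptors are assumed to be precomputed (e.g. SIFT) float arrays, and all function names are illustrative.

```python
import numpy as np

def build_vocabulary(descriptors, k=50, iters=10, seed=0):
    """Cluster feature descriptors into k visual words (plain k-means, Lloyd iterations)."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    words = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest visual word
        d = np.linalg.norm(descriptors[:, None] - words[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                words[j] = members.mean(axis=0)
    return words

def quantize(descriptors, words):
    """Map each descriptor to the index of its nearest visual word."""
    d = np.linalg.norm(np.asarray(descriptors, float)[:, None] - words[None], axis=2)
    return d.argmin(axis=1)

def build_inverted_index(image_descriptors, words):
    """inverted_index[word] -> set of image ids whose descriptors contain that word."""
    inv = {j: set() for j in range(len(words))}
    for img_id, desc in enumerate(image_descriptors):
        for w in quantize(desc, words):
            inv[w].add(img_id)
    return inv

def query(desc, words, inv, n_images):
    """Score database images by how many visual words they share with the query."""
    votes = np.zeros(n_images)
    for w in set(quantize(desc, words)):
        for img_id in inv[w]:
            votes[img_id] += 1
    return votes.argsort()[::-1]   # most similar database images first
```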
11 Extract Features → Descriptor Space (slides based on D. Nister's slides) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
16 Vocabulary Tree: Hierarchical Partitioning (slide based on D. Nister's slides) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
19 Populate the Vocabulary Tree with the model images (slide based on D. Nister's slides) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
29 Look up a test image against the model images (slide based on D. Nister's slides) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
32 Robust object/scene recognition The Visual Vocabulary holds appearance information but discards the spatial relationships between features. Two images containing the same features shuffled around in the image would be a 100% match under this technique. Usually appearance alone is enough; however, if different arrangements of the same features are expected, one can add geometric verification: test the k most similar database images against the query image for geometric consistency (e.g. using RANSAC), as in the sketch below.
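One common way to implement the geometric check, sketched here with OpenCV's RANSAC-based homography estimation. The threshold values, the `min_inliers` cut-off and the helper name are illustrative choices, and the matched keypoint coordinates are assumed to be available from the descriptor-matching step.

```python
import numpy as np
import cv2

def geometric_consistency(query_pts, db_pts, reproj_thresh=3.0, min_inliers=15):
    """query_pts, db_pts: Nx2 arrays of matched keypoint coordinates between the
    query image and one candidate database image."""
    if len(query_pts) < 4:                      # a homography needs at least 4 matches
        return False, 0
    H, mask = cv2.findHomography(np.float32(query_pts), np.float32(db_pts),
                                 cv2.RANSAC, reproj_thresh)
    inliers = int(mask.sum()) if mask is not None else 0
    return inliers >= min_inliers, inliers

# Re-rank the k best bag-of-words candidates by their geometric inlier count:
# candidates = [(img_id, query_pts, db_pts), ...]
# ranked = sorted(candidates, key=lambda c: geometric_consistency(c[1], c[2])[1], reverse=True)
```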
33 Earlier approach to place recognition: Image Histograms Aim: achieve a compact representation of each scene. Smooth the input image (using a Gaussian operator). Image representation = a set of 1D histograms describing the R, G, B components of the current image and its Hue, Saturation, Luminance. Image similarity criterion: divergence between histograms (query image vs. database). Makes use of the entire image (instead of salient keypoints as in BOW approaches) → maximize the camera's field of view → use omni-directional cameras. R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
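A small sketch of the histogram-based similarity idea, hedged as follows: it only uses the R, G, B channels (not the additional Hue/Saturation/Luminance histograms), omits the Gaussian smoothing step, and uses a symmetric Jeffrey-style divergence as one possible choice of histogram divergence; bin counts and function names are illustrative.

```python
import numpy as np

def channel_histograms(image, bins=32):
    """image: HxWx3 uint8 array (R, G, B). Returns one normalized 1D histogram per channel."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    return np.array(hists)

def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetric divergence between two normalized histograms (0 for identical histograms)."""
    m = 0.5 * (p + q)
    return np.sum(p * np.log((p + eps) / (m + eps)) + q * np.log((q + eps) / (m + eps)))

def image_distance(img_a, img_b):
    """Sum the per-channel divergences: small value = similar scenes."""
    return sum(jeffrey_divergence(pa, pb)
               for pa, pb in zip(channel_histograms(img_a), channel_histograms(img_b)))
```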
34 Short intro to Omni-directional Vision
35 Omni-directional Cameras: wide-FOV dioptric camera, e.g. fisheye (~180° FOV); catadioptric camera = camera + shaped mirror (>180° FOV); polydioptric camera = multiple cameras with overlapping FOVs (~360° FOV)
36 Omni-cam: example image
37 Omni-directional Vision: Why? Very popular in the robotics community. Omni-vision offers tracking of landmarks in all directions and for longer periods of time → better accuracy in motion estimation and map building. Google Street View uses omni-directional snapshots to build photorealistic 3D maps.
38 Central Omnidirectional Cameras A vision system is central when all optical rays intersect at a single point in 3D: the projection center (single effective viewpoint). A perspective camera is a central projection system: its single effective viewpoint is the camera's optical center. Catadioptric cameras (camera + mirror + image plane / CCD) can be central or non-central depending on their geometry.
39 Central Omnidirectional Cameras Central catadioptric cameras can be built by an appropriate choice of the mirror shape and of the distance between camera and mirror. Why is a single effective viewpoint so desirable? Every pixel in the captured image measures the irradiance of the light in one particular direction → allows the generation of geometrically correct perspective images.
40 Today s Exercise 3 How to Build a Range Finder using an Omnidirectional Camera Guidance notes on: www.asl.ethz.ch/education/master/mobile_robotics R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
41 Uncertainties: Representation and Propagation & Line Extraction from Range data
42 Uncertainty Representation Section 4.1.3 of the book Sensing in the real world is always uncertain. How can uncertainty be represented or quantified? How does uncertainty propagate, i.e. when fusing uncertain inputs into a system, what is the resulting uncertainty? What is the merit of all this for mobile robotics?
43 Uncertainty Representation (2) Use a Probability Density Function (PDF) f(x) to characterize the statistical properties of a variable X. Expected value of X: E[X] = μ = ∫ x f(x) dx. Variance of X: Var(X) = σ² = ∫ (x − μ)² f(x) dx.
44 Gaussian Distribution Most common PDF for characterizing uncertainties: the Gaussian f(x) = 1/(σ√(2π)) · exp(−(x − μ)² / (2σ²)). 68.26% of the probability mass lies within μ ± σ, 95.44% within μ ± 2σ, 99.72% within μ ± 3σ.
45 The Error Propagation Law Imagine extracting a line based on point measurements with uncertainties. Model parameters in polar coordinates: (r, α) uniquely identifies a line. The question: what is the uncertainty of the extracted line, knowing the uncertainties of the measurement points that contribute to it? R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
46 The Error Propagation Law Error propagation in a multiple-input multiple-output system with n inputs X_1, ..., X_n and m outputs Y_1, ..., Y_m: Y_j = f_j(X_1, ..., X_n). R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
47 The Error Propagation Law 1D case of a nonlinear error propagation problem (see figure). It can be shown that the output covariance matrix C_Y is given by the error propagation law: C_Y = F_X C_X F_Xᵀ, where C_X is the covariance matrix representing the input uncertainties, C_Y is the covariance matrix representing the propagated uncertainties of the outputs, and F_X = ∂f/∂X is the Jacobian matrix, i.e. the transpose of the gradient of f(X). R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
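A small numerical sketch of the first-order error propagation law above. The example function and covariance values are chosen purely for illustration, and the Jacobian is approximated with finite differences; in practice it can of course also be derived analytically.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f: R^n -> R^m evaluated at x."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(f(x))
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(f(x + dx)) - y0) / eps
    return J

def propagate_covariance(f, x, C_x):
    """First-order error propagation: C_Y = F_X C_X F_X^T."""
    F = numerical_jacobian(f, x)
    return F @ C_x @ F.T

# Example: Y = [X1*cos(X2), X1*sin(X2)] with independent input uncertainties
f = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
C_y = propagate_covariance(f, [2.0, 0.5], np.diag([0.01, 0.001]))
```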
48 Example: line extraction from laser scans
49 Line Extraction (1) Point-to-line distance: for a measurement point x_i = (r_i, q_i) and a line (r, α), d_i = r_i cos(q_i − α) − r. If each measurement is equally uncertain, the sum of squared errors is S = Σ_i d_i². Goal: minimize S when selecting (r, α) → solve the system ∂S/∂α = 0, ∂S/∂r = 0 (Unweighted Least Squares). R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
50 Line Extraction (2) Point-to-line distance: each sensor measurement x_i = (r_i, q_i) may have its own, unique uncertainty. Assign each point a weight w_i and minimize the weighted sum of squared errors S = Σ_i w_i d_i² (Weighted Least Squares).
51 Line Extraction (2) Weighted least squares and solving the system gives the line parameters (α, r) in closed form. A common choice: the uncertainty σ_i of each measurement is proportional to the measured distance r_i, and the weight is w_i = 1/σ_i². If r_i ~ N(r̂_i, σ_ri²) and q_i ~ N(q̂_i, σ_qi²), what is the uncertainty of the line (r, α)? R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
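A sketch of the weighted least-squares line fit, assuming range measurements (r_i, theta_i) as in the slides. The closed-form solution is written here in Cartesian moment form, which is algebraically equivalent to the polar-form expressions of the book; the function name, the weight choice and the example values are illustrative.

```python
import numpy as np

def fit_line_polar(r, theta, w=None):
    """Weighted least-squares line fit to range measurements (r_i, theta_i).
    Returns (alpha, rho) of the line x*cos(alpha) + y*sin(alpha) = rho."""
    x, y = r * np.cos(theta), r * np.sin(theta)
    w = np.ones_like(x) if w is None else w
    xm, ym = np.average(x, weights=w), np.average(y, weights=w)
    Sxx = np.sum(w * (x - xm) ** 2)
    Syy = np.sum(w * (y - ym) ** 2)
    Sxy = np.sum(w * (x - xm) * (y - ym))
    alpha = 0.5 * np.arctan2(-2.0 * Sxy, Syy - Sxx)   # minimizes the weighted squared distances
    rho = xm * np.cos(alpha) + ym * np.sin(alpha)
    if rho < 0:                                       # keep rho >= 0 by flipping the line normal
        rho, alpha = -rho, alpha + np.pi
    return alpha, rho

# Example: weights proportional to 1/r_i^2 (more distant readings are less certain)
theta = np.linspace(-0.5, 0.5, 20)
r = 2.0 / np.cos(theta) + 0.01 * np.random.randn(20)   # noisy points near the line x = 2
alpha, rho = fit_line_polar(r, theta, w=1.0 / r**2)
```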
52 Error Propagation: Line extraction The uncertainty of each measurement x_i = (r_i, q_i) is described by the covariance matrix C_xi = diag(σ_ri², σ_qi²), assuming that r_i and q_i are independent. Stacking all n measurements: C_X = diag(σ_r1², ..., σ_rn², σ_q1², ..., σ_qn²). The uncertainty of the line (r, α) is described by the covariance matrix C_αr = [σ_α², σ_αr ; σ_rα, σ_r²]. Define the Jacobian F_rq = [∂α/∂r_1 ... ∂α/∂r_n, ∂α/∂q_1 ... ∂α/∂q_n ; ∂r/∂r_1 ... ∂r/∂r_n, ∂r/∂q_1 ... ∂r/∂q_n]. From the Error Propagation Law: C_αr = F_rq C_X F_rqᵀ.
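As a hedged illustration of the propagation above: a sketch that approximates the Jacobian numerically instead of writing out the partial derivatives analytically. It assumes the `fit_line_polar` function from the previous sketch is available, and that per-measurement standard deviations `sigma_r`, `sigma_theta` are given as arrays.

```python
import numpy as np

def line_covariance(r, theta, sigma_r, sigma_theta, fit=fit_line_polar, eps=1e-6):
    """Propagate the per-measurement variances (sigma_r_i^2, sigma_theta_i^2)
    to the covariance of the fitted line (alpha, rho), using a finite-difference
    Jacobian of the fit with respect to all 2n inputs."""
    n = len(r)
    params0 = np.array(fit(r, theta))
    F = np.zeros((2, 2 * n))
    for i in range(n):
        dr = np.zeros(n); dr[i] = eps
        F[:, i] = (np.array(fit(r + dr, theta)) - params0) / eps        # d(alpha,rho)/dr_i
        dt = np.zeros(n); dt[i] = eps
        F[:, n + i] = (np.array(fit(r, theta + dt)) - params0) / eps    # d(alpha,rho)/dtheta_i
    C_X = np.diag(np.concatenate([sigma_r**2, sigma_theta**2]))
    return F @ C_X @ F.T                                                # C_(alpha,rho) = F C_X F^T
```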
Autonomous Mobile Robots Feature Extraction from Range Data: Line extraction Split and merge Linear regression RANSAC Hough-Transform Zürich Autonomous Systems Lab
54 Extracting Features from Range Data Photograph of a corridor at ASL; raw 3D scan; plane segmentation result; extracted planes for every cube.
55 Extracting Features from Range Data Goal: extract planar features from a dense point cloud. Example: a scene showing part of a corridor of the lab floor (left wall, ceiling, back wall, right wall), and the same scene represented as a dense point cloud generated by a rotating laser scanner.
56 Extracting Features from Range Data Map of the ASL hallway built using line segments
57 Extracting Features from Range Data Map of the ASL hallway built using orthogonal planes constructed from line segments
58 Features from Range Data: Motivation Point Cloud → extract Lines / Planes. Why features? Raw data means a huge amount of data to be stored; compact features require less storage, provide rich and accurate information, and are the basis for higher-level features (e.g. more abstract features, objects). Here we will study line segments: the simplest geometric structure, suitable for most office-like environments. R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
59 Line Extraction: The Problem Extract lines from a Range scan (i.e. a point cloud) Three main problems: How many lines are there? Segmentation: Which points belong to which line? Line Fitting/Extraction: Given points that belong to a line, how to estimate the line parameters? Algorithms we will see: 1. Split and merge 2. Linear regression 3. RANSAC 4. Hough-Transform R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
60 Algorithm 1: Split-and-Merge (standard) The most popular algorithm; it originated in computer vision. A recursive procedure of fitting and splitting. A slightly different version, called Iterative-End-Point-Fit, simply connects the end points for line fitting. Initialise set S to contain all points. Split: fit a line to the points in the current set S; find the point most distant to the line; if its distance > threshold, split and repeat with the left and right point sets. Merge: if two consecutive segments are close/collinear enough, obtain the common line and find the most distant point; if its distance <= threshold, merge both segments. (A code sketch follows after these slides.) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
64 Algorithm 1: Split-and-Merge (Iterative-End-Point-Fit)
65 Algorithm 1: Split-and-Merge
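A sketch of the Iterative-End-Point-Fit variant of Split-and-Merge, under stated simplifications: the line in the split step is the chord through the first and last point of the set (a full implementation would use a least-squares fit), the merge test simply re-checks the point-to-line distances of the joined set, and the threshold and function names are illustrative.

```python
import numpy as np

def fit_endpoint_line(points):
    """Iterative-end-point-fit: line through the first and last point of the ordered set,
    returned as (a, b, c) with a*x + b*y + c = 0 and (a, b) normalized."""
    (x1, y1), (x2, y2) = points[0], points[-1]
    a, b = y2 - y1, x1 - x2
    norm = np.hypot(a, b) + 1e-12
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def split(points, threshold):
    """Recursively split an ordered point set until all points are close to their segment."""
    a, b, c = fit_endpoint_line(points)
    dists = np.abs(a * points[:, 0] + b * points[:, 1] + c)
    idx = int(np.argmax(dists))
    if dists[idx] <= threshold or len(points) <= 2:
        return [points]                                   # segment accepted
    return split(points[:idx + 1], threshold) + split(points[idx:], threshold)

def merge(segments, threshold):
    """Merge consecutive segments if all their points stay close to the common line."""
    merged = [segments[0]]
    for seg in segments[1:]:
        candidate = np.vstack([merged[-1], seg])
        a, b, c = fit_endpoint_line(candidate)
        if np.max(np.abs(a * candidate[:, 0] + b * candidate[:, 1] + c)) <= threshold:
            merged[-1] = candidate
        else:
            merged.append(seg)
    return merged

def split_and_merge(points, threshold=0.05):
    return merge(split(np.asarray(points, float), threshold), threshold)
```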
66 Algorithm 2: Line-Regression Uses a sliding window of size N_f. The points within each sliding window are fitted by a segment; then adjacent segments are merged if their line parameters are close. Line-Regression (N_f = 3): Initialize the sliding window size N_f. Fit a line to every N_f consecutive points (i.e. in each window). Compute a Line Fidelity Array: each element contains the sum of Mahalanobis distances between 3 consecutive windows (the Mahalanobis distance is used as a measure of similarity). Scan the Fidelity Array for consecutive elements < threshold (using a clustering algorithm); for every such element, construct a new line segment. Merge overlapping line segments and recompute the line parameters for each segment. (A simplified sketch follows after these slides.) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
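A simplified sketch of the sliding-window idea above. It is hedged in two ways: it compares consecutive window lines with a plain parameter-distance test instead of the Mahalanobis-based fidelity array, and it grows segments greedily rather than clustering the fidelity array; tolerances and names are illustrative.

```python
import numpy as np

def window_line(points):
    """Total-least-squares fit of one line (alpha, rho) to a small window of points."""
    x, y = points[:, 0], points[:, 1]
    xm, ym = x.mean(), y.mean()
    Sxx, Syy = np.sum((x - xm) ** 2), np.sum((y - ym) ** 2)
    Sxy = np.sum((x - xm) * (y - ym))
    alpha = 0.5 * np.arctan2(-2 * Sxy, Syy - Sxx)
    return alpha, xm * np.cos(alpha) + ym * np.sin(alpha)

def line_regression(points, n_f=3, alpha_tol=0.1, rho_tol=0.05):
    """Fit a line to every window of n_f consecutive points, then grow segments
    as long as consecutive window lines stay similar."""
    points = np.asarray(points, float)
    lines = [window_line(points[i:i + n_f]) for i in range(len(points) - n_f + 1)]
    segments, start = [], 0
    for i in range(1, len(lines)):
        d_alpha = abs(lines[i][0] - lines[i - 1][0])
        d_rho = abs(lines[i][1] - lines[i - 1][1])
        if d_alpha > alpha_tol or d_rho > rho_tol:        # window lines disagree -> cut here
            segments.append(points[start:i + n_f - 1])
            start = i
    segments.append(points[start:])
    return [seg for seg in segments if len(seg) >= n_f]
```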
72 Algorithm 3: RANSAC RANSAC = RANdom SAmple Consensus. A generic and robust algorithm for fitting models in the presence of outliers (i.e. points which do not satisfy the model). Generally applicable to any problem where the goal is to identify the inliers that satisfy a predefined model. Typical applications in robotics: line extraction from 2D range data, plane extraction from 3D range data, feature matching, structure from motion, ... RANSAC is an iterative, non-deterministic method: the probability of finding a set free of outliers increases as more iterations are used. Drawback: being non-deterministic, its results differ between runs.
73 Algorithm 3: RANSAC
74 Algorithm 3: RANSAC Select sample of 2 points at random
75 Algorithm 3: RANSAC Select sample of 2 points at random Calculate model parameters that fit the data in the sample
76 RANSAC Select sample of 2 points at random Calculate model parameters that fit the data in the sample Calculate error function for each data point
77 Algorithm 3: RANSAC Select sample of 2 points at random Calculate model parameters that fit the data in the sample Calculate error function for each data point Select data that support current hypothesis
78 Algorithm 3: RANSAC Select sample of 2 points at random Calculate model parameters that fit the data in the sample Calculate error function for each data point Select data that support current hypothesis Repeat sampling
80 Algorithm 3: RANSAC Set with the maximum number of inliers obtained within k iterations
81 Algorithm 3: RANSAC ( for line extraction from 2D range data )
82 Algorithm 3: RANSAC How many iterations does RANSAC need? We cannot know in advance if the observed set contains the maximum number of inliers. Ideally: check all possible combinations of 2 points in a dataset of N points. Number of all pairwise combinations: N(N-1)/2 → computationally infeasible if N is too large. Example: for a laser scan of 360 points we would need to check all 360*359/2 = 64,620 possibilities! Do we really need to check all possibilities, or can we stop RANSAC after k iterations? Checking a subset of combinations is enough if we have a rough estimate of the percentage of inliers in our dataset. This can be done in a probabilistic way.
83 Algorithm 3: RANSAC How many iterations does RANSAC need? Let w = (number of inliers) / N, where N is the total number of data points; w is the fraction of inliers in the dataset, i.e. the probability of selecting an inlier. Let p be the desired probability of finding a set of points free of outliers. Assumption: the 2 points necessary to estimate a line are selected independently. Then w² = probability that both points are inliers, and 1 − w² = probability that at least one of the two points is an outlier. Let k be the number of RANSAC iterations executed so far; (1 − w²)^k = probability that RANSAC never selects two points that are both inliers. Setting 1 − p = (1 − w²)^k gives k = log(1 − p) / log(1 − w²). R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
84 Algorithm 3: RANSAC How many iterations does RANSAC need? The number of iterations is k = log(1 − p) / log(1 − w²): knowing the fraction of inliers w, after k RANSAC iterations we will have a probability p of having found a set of points free of outliers. Example: for a desired probability of success p = 99% and w = 50%, k = 16 iterations, which is much less than the number of all possible combinations! In practice we only need a rough estimate of w; more advanced implementations of RANSAC estimate the fraction of inliers and adaptively update it on every iteration. (See the sketch below.) R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
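A minimal sketch of RANSAC line extraction using the iteration count derived above. It assumes a rough guess of the inlier fraction `w_guess`; the distance threshold and the adaptive update of k (noted as a comment) are left as illustrative choices.

```python
import numpy as np

def ransac_line(points, inlier_thresh=0.05, p=0.99, w_guess=0.5, seed=0):
    """RANSAC line fit on 2D points; the iteration count follows k = log(1-p)/log(1-w^2)."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, float)
    k = int(np.ceil(np.log(1 - p) / np.log(1 - w_guess ** 2)))
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(k):
        i, j = rng.choice(len(pts), size=2, replace=False)   # sample 2 points at random
        (x1, y1), (x2, y2) = pts[i], pts[j]
        a, b = y2 - y1, x1 - x2                              # line normal
        norm = np.hypot(a, b)
        if norm < 1e-12:
            continue
        c = -(a * x1 + b * y1)
        dists = np.abs(a * pts[:, 0] + b * pts[:, 1] + c) / norm
        inliers = dists < inlier_thresh                      # points supporting the hypothesis
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
            # optional: re-estimate w from the current inlier ratio and shrink k adaptively
    return pts[best_inliers]   # refit a least-squares line on these inliers afterwards
```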
85 Algorithm 4: Hough-Transform Hough Transform uses a voting scheme
86 Algorithm 4: Hough-Transform A line y = m_0 x + b_0 in image space corresponds to a single point (m_0, b_0) in Hough (parameter) space.
87 Algorithm 4: Hough-Transform What does a point (x_0, y_0) in image space map to in Hough space?
88 Algorithm 4: Hough-Transform A point (x_0, y_0) in image space maps to the line b = −m x_0 + y_0 in Hough space: the set of parameters of all lines passing through that point.
89 Algorithm 4: Hough-Transform Where is the line that contains both (x_0, y_0) and (x_1, y_1)? It is at the intersection of the lines b = −m x_0 + y_0 and b = −m x_1 + y_1 in Hough space. R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
90 Algorithm 4: Hough-Transform Each point in image space votes for line parameters in the Hough parameter space.
91 Algorithm 4: Hough-Transform Problems with the (m,b) space: Unbounded parameter domain Vertical lines require infinite m Alternative: polar representation Each point in image space will map to a sinusoid in the (θ,ρ) parameter space R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
92 Algorithm 4: Hough-Transform 1. Initialize accumulator H to all zeros 2. for each edge point (x,y) in the image for all θ in [0,180] Compute ρ = x cos θ + y sin θ H(θ, ρ) = H(θ, ρ) + 1 end end 3. Find the values of (θ, ρ) where H(θ, ρ) is a local maximum 4. The detected line in the image is given by ρ = x cos θ + y sin θ
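A sketch that transcribes the accumulator pseudocode above into Python, applied to 2D range points rather than image edge pixels; the angular and radial resolutions are illustrative, and peak detection is only indicated in the closing comment.

```python
import numpy as np

def hough_lines(points, rho_res=0.02, theta_res=np.deg2rad(1.0)):
    """Accumulator-based Hough transform for a set of 2D points (e.g. a laser scan).
    Returns the accumulator and the (theta, rho) bin centres."""
    pts = np.asarray(points, float)
    thetas = np.arange(0.0, np.pi, theta_res)
    max_rho = np.max(np.hypot(pts[:, 0], pts[:, 1]))
    rhos = np.arange(-max_rho, max_rho + rho_res, rho_res)
    H = np.zeros((len(thetas), len(rhos)), dtype=int)
    for x, y in pts:
        rho = x * np.cos(thetas) + y * np.sin(thetas)        # one sinusoid per point
        bins = np.round((rho + max_rho) / rho_res).astype(int)
        H[np.arange(len(thetas)), np.clip(bins, 0, len(rhos) - 1)] += 1
    return H, thetas, rhos

# Peaks of H (local maxima above a vote threshold) give the detected lines
# rho = x*cos(theta) + y*sin(theta).
```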
93 Algorithm 4: Hough-Transform
96 Algorithm 4: Hough-Transform Effect of Noise
97 Algorithm 4: Hough-Transform Application: Lane detection
98 Example: Door detection using the Hough Transform (range scan and its (q, r) Hough-space accumulator). R. Siegwart, D. Scaramuzza and M. Chli, ETH Zurich - ASL
99 Comparison of Line Extraction Algorithms

Algorithm                 | Complexity                   | Speed (Hz) | False positives | Precision
Split-and-Merge           | N * logN                     | 1500       | 10%             | +++
Incremental               | S * N                        | 600        | 6%              | +++
Line-Regression           | N * N_f                      | 400        | 10%             | +++
RANSAC                    | S * N * k                    | 30         | 30%             | ++++
Hough-Transform           | S * N * N_C + S * N_R * N_C  | 10         | 30%             | ++++
Expectation Maximization  | S * N_1 * N_2 * N            | 1          | 50%             | ++++

Split-and-Merge, Incremental and Line-Regression are the fastest; they are deterministic and make use of the sequential ordering of the raw scan points (points captured according to the rotation direction of the laser beam). If applied to randomly ordered points, only the last three algorithms would segment all lines. RANSAC, Hough-Transform and EM produce greater precision and are more robust to outliers.