
Visual Odometry: Features, Tracking, Essential Matrix, and RANSAC. Stephan Weiss, Computer Vision Group, NASA-JPL / CalTech. Stephan.Weiss@ieee.org (c) 2013. Government sponsorship acknowledged.

Outline: The camera as a sensor. Camera motion estimation: the essential matrix. Dealing with noise: RANSAC. Getting to the point. Keeping the point. (Figure: Opportunity EDL trajectory.)

Camera Motion Estimation. Why use a camera? Vast amount of information; extremely low Size, Weight, and Power (SWaP) footprint; cheap and easy to use; passive sensor; and the required processing power is available today. After all, it's what nature uses, too! (Example hardware: a cellphone-type camera with up to 16 Mpx (480 MB/s at 30 Hz) and a cellphone processor unit, 1.7 GHz quad-core ARM, under 10 g.) Camera motion estimation: understand the camera as a sensor, identify which information in the image is particularly useful, and estimate the camera's 6 (or 5, up to scale) DoF motion from 2 images: Visual Odometry (VO), with stereo or monocular vision.

A Camera is a Bearing Sensor. It is a projective sensor that measures the bearing of a point with respect to the optical axis. Depth can be inferred by re-observing a point from different angles; the movement (i.e., the angle between the observations) is the point's parallax. A point at infinity is a feature which exhibits no parallax during camera motion: the distance of a star cannot be inferred by moving a few kilometers, BUT it is a perfect bearing reference for attitude estimation. NASA's star tracker sensors are better than 1 arc second, or 0.00027 deg. (Figure: star tracker.)

Perspective Camera: projection on the image plane (assume a calibrated camera).
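The projection equation itself did not survive the transcription; as a reminder (standard pinhole model in our own notation, not necessarily the exact formulation on the slide), a calibrated camera maps a 3D point given in the camera frame to normalized image coordinates as:

```latex
% Pinhole projection for a calibrated camera (point X given in the camera frame)
\lambda\,\mathbf{x}
  = \lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
  = \mathbf{X}
  = \begin{pmatrix} X \\ Y \\ Z \end{pmatrix},
\qquad \lambda = Z > 0 .
```

For a second camera displaced by (R, T), the same point projects as λ_2 x_2 = R X_1 + T with X_1 = λ_1 x_1, which is the relation used in the epipolar-constraint derivation below.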

Epipolar Constraint. Suppose a camera undergoes motion: the two camera centres C and C', the image points x and x', and the 3D point X are coplanar.

Epipolar Constraint. Suppose a camera undergoes motion: what if only C, C', and x are known? The corresponding point x' is then constrained to lie on the epipolar line l'.

Epipolar Constraint. Suppose a camera undergoes motion: all points on the plane π project onto the lines l and l'.

Epipolar Constraint. Suppose a camera undergoes motion: the family of planes π gives a family of lines l and l', all intersecting in the epipoles e and e'.

Epipolar Constraint. Formulating the epipolar constraint. 3D point transformation between the two camera frames: X_2 = R X_1 + T. Using the projected points in the image plane (λ_i x_i = X_i): λ_2 x_2 = R λ_1 x_1 + T. Divide by λ_1 and multiply from the left by [T]_x, the skew-symmetric cross-product matrix of T (this eliminates T, since [T]_x T = 0); then multiply from the left by x_2^T (this eliminates the left-hand side, since [T]_x x_2 is perpendicular to x_2). The result is x_2^T [T]_x R x_1 = 0. Essential matrix: E = [T]_x R.
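A quick numerical sanity check of this relation (our own NumPy sketch, not from the slides; the pose and point values are arbitrary):

```python
# Minimal sanity check of E = [T]x R on a synthetic point.
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Arbitrary relative pose: small rotation about z, translation mostly along x.
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
T = np.array([1.0, 0.2, 0.0])
E = skew(T) @ R

# A 3D point seen by both (calibrated) cameras.
X1 = np.array([0.5, -0.3, 4.0])      # point in camera-1 frame
X2 = R @ X1 + T                       # same point in camera-2 frame
x1, x2 = X1 / X1[2], X2 / X2[2]       # normalized image coordinates

print(x2 @ E @ x1)                    # ~0 up to floating-point error
```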

Motion Estimation: Solving the Essential Matrix. Each point correspondence gives one scalar equation x_2^T E x_1 = 0. Writing it out with x_1 = (x_1, y_1, 1)^T and x_2 = (x_2, y_2, 1)^T:
x_2 x_1 e_11 + x_2 y_1 e_12 + x_2 e_13 + y_2 x_1 e_21 + y_2 y_1 e_22 + y_2 e_23 + x_1 e_31 + y_1 e_32 + e_33 = 0.
Separate known from unknown: stack the data row [x_2 x_1, x_2 y_1, x_2, y_2 x_1, y_2 y_1, y_2, x_1, y_1, 1] for each of the n correspondences into a data matrix A and the unknowns into e = (e_11, e_12, ..., e_33)^T. This gives the linear system A e = 0.
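A direct way to set up and solve this linear system is via the SVD of A (our sketch; it assumes calibrated, normalized image coordinates and at least 8 correspondences):

```python
# Sketch: build the data matrix A from n >= 8 normalized correspondences and
# solve A e = 0 in the least-squares sense (||e|| = 1).
import numpy as np

def linear_essential(x1, x2):
    """x1, x2: (n, 2) arrays of corresponding normalized image points.
    Returns a 3x3 E (up to scale, rank-2 constraint not yet enforced)."""
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u1, v1 = x1[i]
        u2, v2 = x2[i]
        # Row corresponding to x2^T E x1 = 0, unknowns ordered e11..e33.
        A[i] = [u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1]
    # Minimizer of ||A e|| with ||e|| = 1: right singular vector of the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```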

Motion Estimation: Solving the Essential Matrix. Properties of E: the epipoles are its null vectors (E e = 0 and e'^T E = 0), det E = 0, and rank E = 2. The linearly computed E is in general of full rank 3. Take its SVD, E = U diag(σ_1, σ_2, σ_3) V^T = U_1 σ_1 V_1^T + U_2 σ_2 V_2^T + U_3 σ_3 V_3^T, and compute the closest rank-2 approximation, i.e., the minimizer of ||E − E'||: E' = U diag(σ_1, σ_2, 0) V^T = U_1 σ_1 V_1^T + U_2 σ_2 V_2^T. Moreover, E is an essential matrix if and only if two of its singular values are equal and the third is zero, so one can enforce E = U diag(1, 1, 0) V^T.
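In code, enforcing these constraints on a linearly estimated E might look like this (our sketch):

```python
# Sketch: project a linearly estimated E onto the set of essential matrices
# by replacing its singular values with diag(1, 1, 0).
import numpy as np

def project_to_essential(E):
    U, s, Vt = np.linalg.svd(E)
    # The closest rank-2 matrix would use diag(s[0], s[1], 0); since an
    # essential matrix is only defined up to scale, we additionally force
    # the two nonzero singular values to be equal.
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```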

Motion Estimation: linear 8-point algorithm. With raw pixel coordinates, the columns of the data matrix A differ by orders of magnitude: product terms such as x_2 x_1 are on the order of 10^4, linear terms such as x_1 are on the order of 10^2, and the last column is 1. With the columns of the data matrix not normalized, least squares yields poor results. Normalized least squares: transform the pixel coordinates (e.g., from an image spanning (0,0) to (700,500)) to a normalized range such as [−1, 1] × [−1, 1], solve there, and undo the normalization afterwards.
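The normalization step could be implemented along these lines (a Hartley-style isotropic normalization; our sketch, whereas the slide only illustrates rescaling to roughly [−1, 1]):

```python
# Sketch of the coordinate normalization used before the linear 8-point solve.
import numpy as np

def normalize_points(pts):
    """pts: (n, 2) pixel coordinates. Returns normalized points and the 3x3
    transform N such that p_norm_h = N @ p_h (homogeneous coordinates)."""
    centroid = pts.mean(axis=0)
    # Scale so the average distance from the centroid is sqrt(2).
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    N = np.array([[scale, 0, -scale * centroid[0]],
                  [0, scale, -scale * centroid[1]],
                  [0, 0, 1]])
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (pts_h @ N.T)[:, :2], N

# After solving with normalized points from both images, the matrix is
# de-normalized as E = N2.T @ E_norm @ N1.
```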

Recovering the Pose. Recover R and T from E = [T]_x R. The decomposition yields four candidate solutions; only one of them places the points in front of both cameras, so apply this motion-consistency check to select the correct pose.
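The standard decomposition of E into its four pose candidates can be sketched as follows (textbook procedure, not code from the slides):

```python
# Sketch: the four (R, T) candidates obtained from an essential matrix.
import numpy as np

def decompose_essential(E):
    U, _, Vt = np.linalg.svd(E)
    # Keep proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0],
                  [1,  0, 0],
                  [0,  0, 1]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]          # translation known only up to scale and sign
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# The correct candidate is the one for which triangulated points have
# positive depth in both cameras (cheirality check).
```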

From 8 points to 5 points. The linear 8-point algorithm solves A e = 0, but the problem is only of dimension 5 (3 for rotation, 2 for translation up to scale). The linear formulation is fast and simple to solve. The non-linear 5-point algorithm (Nistér, PAMI 2004) solves the minimal problem by finding roots of cubic polynomial constraints; it is mathematically hairy, but fast implementations exist.

Motion estimation with fewer than 5 points. The general case is a 5-dimensional problem; constraining the general case reduces the dimensionality. Homography: planar constraint, 4 points. Multi-plane homography VO: Y. Cheng (ICRA 2010).

Motion estimation with fewer than 5 points. The general case is a 5-dimensional problem; constraining the general case reduces the dimensionality. Using an IMU for rotation: a 2-dimensional constraint remains for translation up to scale. Using the robot model and kinematics: 1 point [Scaramuzza et al., IJCV 2011]. Special case with known 3D coordinates of the points: stereo vision [Weiss et al., ICRA 2012].

The RANSAC (RAndom SAmple Consensus) Algorithm for model fitting and outlier rejection. Assume the model parameters can be estimated from N data items (e.g., the essential matrix from 5-8 points) and that there are M data items in total. The algorithm: 1. Select N data items at random. 2. Estimate the parameters (linear or nonlinear least squares, or other). 3. Find how many of the M data items fit the model with this parameter vector within a user-given tolerance T; call this k. If k is the largest (best fit) so far, accept the model. 4. Repeat steps 1-3 S times. Questions: What is the tolerance? How many trials S ensure success?
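A generic version of this loop (our sketch of the procedure described above; `fit_model` and `residual` are placeholders for, e.g., a 5/8-point essential-matrix solver and the epipolar error):

```python
# Minimal generic RANSAC loop.
import numpy as np

def ransac(data, fit_model, residual, n_sample, tol, n_trials, rng=None):
    """data: (M, ...) array of data items. Returns the best model and an
    inlier mask."""
    rng = rng or np.random.default_rng()
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(n_trials):                                   # S trials
        sample = rng.choice(len(data), n_sample, replace=False)  # N items
        model = fit_model(data[sample])                          # estimate parameters
        inliers = residual(model, data) < tol                    # within tolerance T
        if inliers.sum() > best_inliers.sum():                   # best consensus so far
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```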

The RANSAC (RAndom SAmple Consensus) Algorithm for model fitting. To ensure that RANSAC has a high chance of finding the correct inliers, a sufficient number of trials must be executed. Let p be the probability that any given correspondence is an inlier, k the number of points needed to fit the model, and P the desired success probability after S trials. Then (1 − P) = (1 − p^k)^S, where p^k quickly decreases if many points are needed to fit the model, and hence S = log(1 − P) / log(1 − p^k). Model fitting needs to be fast: this is executed at every camera frame! [Scaramuzza et al., IJCV 2011]
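A quick numerical illustration of this formula (our example values, not from the slides): with an inlier ratio p = 0.7 and desired success probability P = 0.99, the 5-point model needs far fewer trials than the 8-point one.

```python
import numpy as np

def ransac_trials(p, k, P=0.99):
    """Number of RANSAC trials S from (1 - P) = (1 - p**k)**S."""
    return int(np.ceil(np.log(1 - P) / np.log(1 - p**k)))

print(ransac_trials(0.7, 5))   # 5-point model: ~26 trials
print(ransac_trials(0.7, 8))   # 8-point model: ~78 trials
```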

Scale propagation. The scale of the translation estimated between image pairs can vary arbitrarily: A e = 0 implies A (λ e) = 0 for any λ, so each pair determines the translation only up to scale. Use common points across consecutive image pairs to unify (propagate) the scale factor. Accumulated errors lead to scale drift.

Getting to the point: image features. We need at least 5 point correspondences between the two images to determine the general transformation. Extract salient points with a feature detector, detect the same salient points independently in both images, and gather sufficient information to recognize each point again in the other image.

Getting to the point: feature detectors. Some examples: FAST, AGAST, SIFT (DoG), SURF (discretized DoG). General idea: extract high-contrast areas in the image. Such areas often lie at object borders, which causes a parallax issue (foreground and background move differently), so avoid edges. Computational complexity: be as fast as possible, since 100s of features are extracted for every image; there is a trade-off between high-quality features (good repeatability) and computational complexity. (Figure: homogeneous region, edge, corner.)

Getting to the point: be FAST. FAST is mostly used in real-time robotics applications. FAST/AGAST on average checks only about 2.5 pixels per feature candidate! Machine learning was applied to generate a decision tree that quickly discards pixels that are not a corner; the decision tree is built by evaluating all 16 circle pixels over a training set, and from the decision tree, C, Python, or Matlab code is generated (~6000 lines of code). Available at: http://mi.eng.cam.ac.uk/~er258/work/fast.html
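For reference, a minimal way to try a FAST detector is via OpenCV (our illustrative snippet, not the generated code referenced on the slide; the file name is a placeholder):

```python
# Quick test of FAST corner detection with OpenCV.
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder input image
fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"{len(keypoints)} FAST corners detected")
out = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("fast_corners.png", out)
```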

Keeping the Points: Tracking. Idea: describe the region around a feature so that it can be found again in the other image. Examples of feature descriptors: image patch, SIFT, SURF, BRISK, DAISY, ... The descriptor should find the same feature again even if the image is rotated, affinely transformed, or scaled. Any descriptor aims at a loss-less compression of the surrounding image patch that includes some invariances. Apply normalization to mitigate illumination changes (e.g., ZM-SAD, ZM-SSD: zero-mean sum of absolute/squared differences). (Figure: BRISK descriptor sampling pattern.) This is still in the kernel of motion estimation: it happens 100s of times per frame.
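The zero-mean normalization mentioned above can be sketched as follows (illustrative only; patch size and matching strategy are up to the implementation):

```python
# Zero-mean sum of squared differences (ZM-SSD) between two image patches:
# subtracting each patch's mean makes the score robust to additive
# illumination changes.
import numpy as np

def zm_ssd(patch_a, patch_b):
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    return np.sum((a - b) ** 2)

# Matching: pick the candidate patch in the search region with the lowest score.
```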

Keeping the point. Issue with *-invariant descriptors: the invariance comes at the price of a reduction in information. Binary descriptors: a further reduction of information, but very fast to compute. Issues with large datasets: the classification space is limited and only abrupt class changes are possible, which makes such descriptors difficult to use for loop closures on large data sets.

Keeping the Points: speeding it up. Feature tracking is usually the bottleneck in VO pipelines, since it needs to be done 100s of times per frame. Constrain the search region: apply a motion model (which may include other sensors: IMU, GPS), use a pyramidal approach (rough matching in the low-resolution image, refinement in the high-resolution image), or a combination of both (Klein & Murray, ISMAR 2007). (Figures: search cone based on the motion model; 80x60 and 640x480 pyramid levels from Klein & Murray.)
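A widely used pyramidal tracker of this kind is the Lucas-Kanade optical flow in OpenCV (our illustrative snippet, not the implementation discussed on the slides; window size, pyramid depth, and file names are placeholders):

```python
# Pyramidal Lucas-Kanade tracking of detected corners between two frames.
import cv2

prev_img = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # placeholder file names
next_img = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=300,
                                   qualityLevel=0.01, minDistance=10)
next_pts, status, err = cv2.calcOpticalFlowPyrLK(
    prev_img, next_img, prev_pts, None,
    winSize=(21, 21), maxLevel=3)        # 3 pyramid levels: coarse to fine

tracked = status.ravel() == 1
print(f"tracked {tracked.sum()} of {len(prev_pts)} points")
```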

Keeping the Points: speeding it up. In VO, the inter-frame translation and rotation are usually small, so sophisticated descriptors might be overkill: plain image patches can be used as descriptors (this can be different for loop closing). A patch retains most of the information, and 3x3 patches (8 surrounding pixels) can be compared efficiently with vector operations; not even patch warping may be needed. The search region must be kept very small! Robust outlier rejection is often preferred over robust and sophisticated feature matching. Example: optical flow computed with image-patch matching and an IMU motion model (Weiss et al., IROS 2013) runs at 50 Hz on one core of a 1.7 GHz quad-core cell-phone processor board.

Putting It All Together: Mars Exploration Rover (2003). Not all robots drive or fly forever: some just need very good VO for a few moments, e.g., velocity estimation to land the Mars Exploration Rover. Efficient implementation: only the template and the search window need to be rectified and flattened, the computation is done on a coarse grid, and a homography (flat-ground) assumption is used. Application region: flow is only computed in the overlap region of the images, and the sun-direction parameter is used to mask out the region around zero phase as well as the parachute shadow.

Putting It All Together: Mars Exploration Rover (2003).

Q&A