Incremental Light Bundle Adjustment for Robotics
Vadim Indelman, Andrew Melim, Frank Dellaert
Robotics and Intelligent Machines (RIM) Center, College of Computing, Georgia Institute of Technology
Introduction

Robot navigation: recover the state of a moving robot over time through fusion of multiple sensors, including a monocular camera, i.e. Simultaneous Localization and Mapping (SLAM).

Left image courtesy of Georgia Tech Research Institute; right image courtesy of Chris Beall.

Indelman et al., Incremental Light Bundle Adjustment for Robotics
Vision-Aided Navigation

Fusion of monocular image measurements and IMU measurements. Full joint pdf:

$$p(X, L, B \mid Z) \propto \prod_{i=1}^{N} \Big( p(x_i \mid x_{i-1}, b_{i-1}, z^{IMU}_{i-1,i}) \prod_{j \in M_i} p(z^{VIS}_{i,j} \mid x_i, l_j) \Big)$$

where:
- $N$: number of robot states
- $M_i$: set of observed 3D points at state $i$
- $l_j$: $j$-th 3D point
- $x_i$: robot state at time $i$ (pose and velocity)
- $z^{VIS}_{i,j}$: image observation of point $j$ at state $i$
- $z^{IMU}_{i-1,i}$: IMU measurement between states $i-1$ and $i$
- $b_i$: IMU bias at time $i$

Image from: http://www.tnt.uni-hannover.de/project/motionestimation
Vision-Aided Navigation (cont.)

Assuming Gaussian distributions, the MAP estimate

$$X^*, L^*, B^* = \arg\max_{X, L, B}\, p(X, L, B \mid Z)$$

is obtained by minimizing

$$J(X, L, B) \doteq \sum_{i=1}^{N} \Big( \big\|x_i - \mathrm{pred}\big(x_{i-1}, b_{i-1}, z^{IMU}_{i-1,i}\big)\big\|^2_{\Sigma_{IMU}} + \sum_{j \in M_i} \big\|z^{VIS}_{i,j} - \pi(x_i, l_j)\big\|^2_{\Sigma_{VIS}} \Big)$$

where $\pi(x_i, l_j) \doteq K_i \, [R_i \;\; t_i] \, l_j$ is the projection of a 3D point into the image plane, and $\|a\|^2_{\Sigma} \doteq a^T \Sigma^{-1} a$ is the squared Mahalanobis distance.
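Both residual types in $J(X,L,B)$ reduce to squared Mahalanobis distances. Below is a minimal numeric sketch of the projection $\pi(x_i, l_j)$ and one visual residual term; the calibration matrix, point, and noise values are made up for illustration and are not the paper's data:

```python
import numpy as np

def mahalanobis_sq(a, Sigma):
    """Squared Mahalanobis distance ||a||^2_Sigma = a^T Sigma^{-1} a."""
    return float(a @ np.linalg.solve(Sigma, a))

def project(K, R, t, l):
    """Pinhole projection pi(x, l) = K [R | t] l of a 3D point l,
    followed by perspective division to pixel coordinates."""
    p = K @ (R @ l + t)
    return p[:2] / p[2]

# Illustrative numbers only: one camera at the origin observing one point
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
l = np.array([0.1, -0.2, 4.0])                     # 3D point l_j
z = project(K, R, t, l) + np.array([0.5, -0.3])    # observation with pixel error
Sigma_vis = 0.25 * np.eye(2)                       # (0.5 px)^2 noise covariance
cost = mahalanobis_sq(z - project(K, R, t, l), Sigma_vis)   # one term of J
```

Each visual factor in the full cost contributes exactly one such term per observed point.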
Factor Graph Representation [Kschischang et al. 2001, IEEE Trans. on Information Theory]

A factor graph is a graphical representation of the joint pdf factorization:

$$p(X \mid Z) \propto \prod_s f_s(X_s)$$

Full SLAM pdf:

$$p(X, L, B \mid Z) \propto \prod_{i=1}^{N} \Big( p(x_i \mid x_{i-1}, b_{i-1}, z^{IMU}_{i-1,i}) \prod_{j \in M_i} p(z^{VIS}_{i,j} \mid x_i, l_j) \Big)$$

with factors

$$f_{proj}(x_i, l_j) \doteq p(z_{i,j} \mid x_i, l_j) \propto \exp\big(-\tfrac{1}{2}\|z_{i,j} - \pi(x_i, l_j)\|^2_{\Sigma}\big)$$

$$f_{IMU}(x_i, x_{i-1}, b_{i-1}) \doteq p(x_i \mid x_{i-1}, b_{i-1}, z^{IMU}_{i-1,i}) \propto \exp\big(-\tfrac{1}{2}\|x_i - \mathrm{pred}(x_{i-1}, b_{i-1}, z^{IMU}_{i-1,i})\|^2_{\Sigma_{IMU}}\big)$$

$$f_{bias}(b_{k+1}, b_k) \doteq \exp\big(-\tfrac{1}{2}\|b_{k+1} - h_b(b_k)\|^2_{\Sigma_b}\big)$$

The naive IMU factor can add a significant number of unnecessary variables!

[Figure: factor graph over states x1-x6 and biases b1-b5, connected by f_proj, f_IMU, and f_bias factors]
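As a toy illustration of the factor-graph view, the MAP cost is the sum of the factors' negative logs (the negative log of the factor product). The sketch below uses a hypothetical 1-D graph with a prior and a relative factor; the variable names and numbers are illustrative only:

```python
import numpy as np

def gaussian_factor(residual_fn, Sigma):
    """Wrap a residual function r(values) into a factor returning its
    negative log-likelihood (up to a constant): 0.5 * ||r||^2_Sigma."""
    Sinv = np.linalg.inv(Sigma)
    def factor(values):
        r = residual_fn(values)
        return 0.5 * float(r @ Sinv @ r)
    return factor

# Toy 1-D graph: a prior on x1 and a relative (odometry-like) factor to x2
values = {"x1": 0.0, "x2": 0.9}
factors = [
    gaussian_factor(lambda v: np.atleast_1d(v["x1"] - 0.0), 0.01 * np.eye(1)),
    gaussian_factor(lambda v: np.atleast_1d(v["x2"] - v["x1"] - 1.0), 0.04 * np.eye(1)),
]
# MAP estimation minimizes the sum of the factors' negative logs
total_cost = sum(f(values) for f in factors)
```

The projection, IMU, and bias factors above all have this same Gaussian form, just with different residual functions and state blocks.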
Incremental Light Bundle Adjustment (iLBA) for Robot Navigation

Problem: 3D structure is expensive to compute (and not necessary for navigation).
- Algebraically eliminate 3D points using multi-view geometry constraints
- Significantly reduces the number of variables in the optimization
- 3D points can always be reconstructed later (if required) from the optimized camera poses

Problem: high-rate sensors introduce a large number of variables.
- Use pre-integration of IMU measurements to reduce the number of variables [Lupton et al., TRO 2012]

Incremental inference requires only partial re-calculation:
- Update the factorization rather than computing it from scratch
Three-View Constraints [Indelman et al., TAES 2012]

Theorem: algebraic elimination of a 3D point observed by three views k, l, and m leads to:

$$g_{2v}(x_k, x_l, z_k, z_l) \doteq q_k \cdot (t_{k \to l} \times q_l) = 0$$
$$g_{2v}(x_l, x_m, z_l, z_m) \doteq q_l \cdot (t_{l \to m} \times q_m) = 0 \quad \text{(epipolar constraints)}$$
$$g_{3v}(x_k, x_l, x_m, z_k, z_l, z_m) \doteq (q_l \times q_k) \cdot (q_m \times t_{l \to m}) - (q_k \times t_{k \to l}) \cdot (q_m \times q_l) = 0 \quad \text{(scale consistency)}$$

with $q_i \doteq R_i^T K_i^{-1} z_i$, where:
- $t_{i \to j}$: translation from view $i$ to view $j$
- $R_i$: rotation from the global frame to view $i$

The third equation relates the magnitudes of $t_{l \to m}$ and $t_{k \to l}$. These are necessary and sufficient conditions.

LBA cost function:

$$J_{LBA}(X) \doteq \sum_{i=1}^{N_h} \|h_i(X, Z)\|^2_{\Sigma_i}, \quad h_i \in \{g_{2v}, g_{3v}\}$$

[Figure: views k, l, m observing a 3D point, with lines of sight q_k, q_l, q_m and translations t_{k->l}, t_{l->m}]

V. Indelman, P. Gurfil, E. Rivlin, H. Rotstein, "Real-Time Vision-Aided Localization and Navigation Based on Three-View Geometry," IEEE Transactions on Aerospace and Electronic Systems, 2012.
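The three constraints can be verified numerically on synthetic data. The sketch below (identity rotations and made-up camera positions, purely for illustration) backprojects pixel observations into line-of-sight vectors $q_i$ and evaluates $g_{2v}$ and $g_{3v}$, which vanish for geometrically consistent views:

```python
import numpy as np

def backproject(K, R, z):
    """Line-of-sight vector q = R^T K^{-1} z~ in the global frame,
    where z~ is the homogeneous pixel (u, v, 1)."""
    return R.T @ np.linalg.inv(K) @ np.array([z[0], z[1], 1.0])

def g2v(q_a, q_b, t_ab):
    """Two-view (epipolar) constraint q_a . (t_ab x q_b)."""
    return float(q_a @ np.cross(t_ab, q_b))

def g3v(q_k, q_l, q_m, t_kl, t_lm):
    """Three-view scale-consistency constraint."""
    return float(np.cross(q_l, q_k) @ np.cross(q_m, t_lm)
                 - np.cross(q_k, t_kl) @ np.cross(q_m, q_l))

# Synthetic geometry (illustrative values): three cameras observing one point
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)                                   # identity rotations for brevity
C = [np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([2.0, 0.5, 0.0])]
p = np.array([0.5, 0.2, 5.0])                   # the 3D point to be eliminated

zs = []
for Ci in C:
    pc = R @ (p - Ci)                           # point in camera frame
    zs.append((K @ pc)[:2] / pc[2])             # projected pixel observation
qs = [backproject(K, R, z) for z in zs]
t_kl, t_lm = C[1] - C[0], C[2] - C[1]

residuals = (g2v(qs[0], qs[1], t_kl),
             g2v(qs[1], qs[2], t_lm),
             g3v(qs[0], qs[1], qs[2], t_kl, t_lm))
```

Note that the 3D point `p` never appears in the constraints themselves, only in generating the observations: this is exactly the algebraic elimination the theorem describes.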
Vision Only: Light Bundle Adjustment (LBA)

LBA cost function:

$$J_{LBA}(X) \doteq \sum_{i=1}^{N_h} \|h_i(X, Z)\|^2_{\Sigma_i}$$

- $h_i$: the $i$-th multi-view constraint, involving several views and the corresponding image observations
- $\Sigma_i \doteq A_i \Sigma A_i^T$: an equivalent covariance, with $A_i$ the Jacobian of $h_i$ with respect to the image observations

Number of optimized variables: $6N + 3M \to 6N$.

Multi-view constraints: different formulations exist in the literature (epipolar geometry, trifocal tensors, quadrifocal tensors, etc.), but independent relations exist only between up to three cameras [Ma et al., 2004]. Here, the three-view constraints formulation is used, originally developed for navigation aiding [Indelman et al., TAES 2012].
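The equivalent covariance $\Sigma_i = A_i \Sigma A_i^T$ is plain first-order noise propagation from pixel space to constraint space. A minimal sketch, assuming isotropic pixel noise; the Jacobian values below are hypothetical, not derived from a real constraint:

```python
import numpy as np

def equivalent_covariance(A_i, sigma_px):
    """First-order propagation Sigma_i = A_i Sigma A_i^T, where A_i is the
    Jacobian of constraint h_i with respect to its stacked image
    observations and Sigma = sigma_px^2 * I is the pixel-noise covariance."""
    Sigma = sigma_px ** 2 * np.eye(A_i.shape[1])
    return A_i @ Sigma @ A_i.T

# Hypothetical 1x4 Jacobian of a two-view constraint w.r.t. (u_k, v_k, u_l, v_l)
A_i = np.array([[0.2, -0.1, 0.05, 0.3]])
Sigma_i = equivalent_covariance(A_i, sigma_px=0.5)   # 1x1 equivalent covariance
```

In practice $A_i$ would be re-evaluated at the current linearization point, so $\Sigma_i$ changes as the optimization proceeds.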
Pre-Integrated IMU Factors

Pre-integrate IMU measurements and insert equivalent factors only when new LBA factors are inserted into the graph:

$$\Delta x_{i \to j} = \big(\Delta p_{i \to j},\, \Delta v_{i \to j},\, R^i_j\big) = \eta\big(Z^{IMU}_{i \to j}, b_i\big)$$

$$f_{Equiv}(x_j, x_i, b_i) \doteq \exp\Big(-\tfrac{1}{2}\big\|x_j - h_{Equiv}(x_i, b_i, \Delta x_{i \to j})\big\|^2_{\Sigma}\Big)$$

The components of $\Delta x_{i \to j}$ are expressed in the body frame, not the navigation frame, which allows relinearization of the factor without repeated integration.

This equivalent (non-linear) IMU factor [Lupton et al., TRO 2012] significantly reduces the graph size, and consequently the time for elimination.

[Figure: the chain of f_IMU factors over x1-x6 with biases b1-b5 and f_bias factors is collapsed into a single f_Equiv factor between x1 and x6 with one bias b]

Todd Lupton and Salah Sukkarieh, "Visual-Inertial-Aided Navigation for High-Dynamic Motion in Built Environments Without Initial Conditions," IEEE Transactions on Robotics, 2012.
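The core of pre-integration is accumulating the raw samples between two keyframes into a single relative motion increment. The following is a minimal sketch of that idea (simple Euler integration, no noise propagation or bias Jacobians, unlike the full method of Lupton and Sukkarieh); the toy data is constant acceleration with zero rotation and zero biases:

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: rotation matrix for a rotation vector w."""
    th = np.linalg.norm(w)
    if th < 1e-10:
        return np.eye(3)
    k = w / th
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * Kx + (1 - np.cos(th)) * (Kx @ Kx)

def preintegrate(gyro, accel, dt, bg, ba):
    """Accumulate Delta-p, Delta-v, Delta-R between two keyframes from raw
    IMU samples, expressed in the body frame at time i (so the equivalent
    factor can be relinearized without re-integrating the measurements)."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        a_corr = a - ba                             # bias-corrected accel
        dp = dp + dv * dt + 0.5 * (dR @ a_corr) * dt ** 2
        dv = dv + (dR @ a_corr) * dt
        dR = dR @ so3_exp((w - bg) * dt)            # bias-corrected gyro
    return dp, dv, dR

# Toy data: 1 s of constant 2 m/s^2 acceleration along x, no rotation
n, dt = 100, 0.01
gyro = [np.zeros(3)] * n
accel = [np.array([2.0, 0.0, 0.0])] * n
dp, dv, dR = preintegrate(gyro, accel, dt, np.zeros(3), np.zeros(3))
```

The 100 raw samples collapse into one `(dp, dv, dR)` triplet, which is what the single equivalent factor consumes in place of 100 naive IMU factors.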
Second Component: Incremental Inference [Kaess et al., IJRR 2012]

When new variables/factors are added, calculations can be reused: the matrix factorization is updated rather than re-calculated from scratch.

Example: linearization and elimination of a graph with two-view and three-view factors, an equivalent IMU factor f_Equiv, and a bias factor f_bias, using the elimination order x1, b1, x6, b2, x2.

[Figure: the factor graph over x1, x6, x2, b1, b2; its linearized Jacobian matrix A; and the factorized upper-triangular matrix R]
Incremental Inference in iLBA (cont.)

Example: a new camera pose and new factors are added. What should be re-calculated? Only the nodes on the paths from the last-eliminated node to the nodes involved in the new factors; these are identified efficiently using the Bayes tree [Kaess et al., 2012].

[Figure: the updated Jacobian A and triangular factor R after adding the new camera and bias variables; only the affected part of R is re-computed]
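The update-rather-than-refactor idea can be sketched with a Givens-rotation row update of an existing triangular factor R (a simplified dense version; the actual system operates on the sparse Bayes tree, and the matrices here are random illustrative data):

```python
import numpy as np

def givens_update(R_old, new_rows):
    """Append new measurement rows to an existing upper-triangular factor R
    and restore triangularity with Givens rotations. This touches only the
    entries the new rows interact with, which is the idea behind updating
    the factorization instead of recomputing it from scratch."""
    A = np.vstack([R_old, new_rows])
    m, n = A.shape
    for j in range(n):                      # columns left to right
        for i in range(m - 1, j, -1):       # zero sub-diagonal entries
            if abs(A[i, j]) > 1e-12:
                a, b = A[i - 1, j], A[i, j]
                r = np.hypot(a, b)
                c, s = a / r, b / r
                G = np.array([[c, s], [-s, c]])
                A[[i - 1, i], :] = G @ A[[i - 1, i], :]
    return A[:n, :]

rng = np.random.default_rng(0)
A0 = rng.normal(size=(6, 4))                # initial Jacobian (illustrative)
R0 = np.linalg.qr(A0)[1]                    # initial triangular factor
new = rng.normal(size=(2, 4))               # rows from newly added factors
R1 = givens_update(R0, new)
full = np.vstack([A0, new])                 # the full stacked system
```

The updated factor carries the same information as refactorizing the full system: $R_1^T R_1 = A_{full}^T A_{full}$, but only the rows reached by the rotations had to be recomputed.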
iLBA for Robotics: Monte-Carlo Study

[Figures: 1-sigma estimation errors and square-root covariances over time for position (north, east, height), Euler angles, and acceleration bias (X, Y, Z axes), comparing LBA against full BA; the LBA and full-BA curves nearly coincide. A 3D plot shows the true and inertial-only trajectories.]
iLBA for Robotics: Simulation Results

Scenario: a single camera and IMU (Google Earth view for illustration).

[Figures: estimated trajectory for LBA + IMU, BA + IMU, IMU only, and ground truth; processing time of LBA + IMU vs. BA + IMU; and position estimation errors over the scenario time.]
Conclusions

- Algebraic elimination of 3D points significantly reduces the size of the optimization problem and speeds up online robot navigation.
- Pre-integration of high-frequency inertial measurements further reduces the problem size.
- Accuracy is similar to full SLAM, with at least a 2-3.5x speed-up in computation time.
- Code and datasets are available from the author's website: http://www.cc.gatech.edu/~vindelma