School of Computing University of Utah
Presentation Outline 1 2 3
Forward Projection (Reminder) u v 1 KR ( I t ) X m Y m Z m 1
Backward Projection (Reminder) Q K 1 q Q K 1 u v 1
What is pose estimation? The problem of determining the position and orientation of the camera relative to the object (or vice-versa). We use the correspondences between 2D image pixels (and thus camera rays) and 3D object points (from the world) to compute the pose.
We consider that the camera is calibrated, i.e. we know its calibration matrix K. We are given three 2D image to 3D object correspondences. Let the 3 2D points be given by: u 1 q 1 = v 1 1 u 2 q 2 = v 2 1 u 3 q 3 = v 3 1. Let the 3 3D points be given by: Q m 1, Qm 2, Qm 3
Input and Unknowns Given q i, Q m i, i = {1, 2, 3}, and K in the following equation: q i KR ( I t ) Q m i, i = {1, 2, 3} Our goal is to compute the rotation matrix R and the translation t.
Reformulation of We can compute Q c i as follows: Q c i K 1 q i Q c i = λ i K 1 q i Here λ i is an unknown scalar that determines the distance of the 3D point Q c i from the optical center along the ray OQ c i.
Reformulation of Q c i = λ i K 1 q i We simplify the notations, let us denote K 1 q i as follows: X i K 1 q i = Y i Z i (1)
Reformulation of Q c i = λ i The pose estimation can be seen as the computation of the unknown λ i parameters. X i Y i Z i
Reformulation of (λ 1 X 1 λ 2 X 2 ) 2 + (λ 1 Y 1 λ 2 Y 2 ) 2 + (λ 1 Z 1 λ 2 Z 2 ) 2 = d 2 12 (λ 2 X 2 λ 3 X 3 ) 2 + (λ 2 Y 3 λ 3 Y 3 ) 2 + (λ 2 Z 2 λ 3 Z 3 ) 2 = d 2 23 (λ 3 X 3 λ 1 X 1 ) 2 + (λ 3 Y 3 λ 1 Y 1 ) 2 + (λ 3 Z 3 λ 1 Z 1 ) 2 = d 2 31 We have 3 quadratic equations and 3 unknowns. We can have a total of 2 3 possible solutions for the three parameters (λ 1, λ 2, λ 3 ). Several numerical methods exist to solve the polynomial system of equations.
Presentation Outline 1 2 3
Sample Problem Compute the solution for pose estimation when λ 1 is given. (λ 1 X 1 λ 2 X 2 ) 2 + (λ 1 Y 1 λ 2 Y 2 ) 2 + (λ 1 Z 1 λ 2 Z 2 ) 2 = d12 2 (λ 2 X 2 λ 3 X 3 ) 2 + (λ 2 Y 3 λ 3 Y 3 ) 2 + (λ 2 Z 2 λ 3 Z 3 ) 2 = d23 2 (λ 3 X 3 λ 1 X 1 ) 2 + (λ 3 Y 3 λ 1 Y 1 ) 2 + (λ 3 Z 3 λ 1 Z 1 ) 2 = d31 2 Compute λ 2 from the first equation. Compute λ 3 from the third equation. Use the second equation to remove incorrect solutions for λ 2 and λ 3.
We consider that the camera is calibrated, i.e. we know its calibration matrix K. K = 200 0 320 0 200 240 0 0 1 K 1 = 1 1 0 320 0 1 240 200 0 0 200 We are given three 2D image to 3D object correspondences. Let the 3 2D points be given by: q 1 = 320 140 q 2 = 320 50 3 290 q 3 = 320 + 50 3 290. 1 1 1 Let the inter-point distances be given by {d 12 = 1000, d 23 = 1000, d 31 = 1000} Is it possible to have λ 1 λ 2?
using n correct correspondences We can compute the pose using 3 correct correspondences. How to compute pose using n correspondences, with outliers. Use RANSAC to identify m inliers where m n. Use least squares to find the best pose using all the inliers - basic idea is to use all the forward projection equations for all the inliers and compute R and t.
General Version - RANSAC (REMINDER) 1 Randomly choose s samples Typically s = minimum sample size that lets you fit a model 2 Fit a model (e.g., line) to those samples 3 Count the number of inliers that approximately fit the model 4 Repeat N times 5 Choose the model that has the largest set of inliers Slide: Noah Snavely
Let us do RANSAC!
Matching Images We match keypoints from left and right images. 2D-to-2D image matching using descriptors such as SIFT.
Kinect Sample Frames Sequences of RGBD frames (I1, D1 ), (I2, D2 ), (I3, D3 ),..., (In, Dn ). How to register Kinect depth data for reconstructing large scenes? We have 2D-3D pose estimators and 2D-2D image matchers.
Kinect Sample Frames
Matching Images We match keypoints from left and right images. One of the matches is incorrect! In a general image matching problem with 1000s of matches, we can have 100 s of incorrect matches.
Presentation Outline 1 2 3
(Two view triangulation) Given: calibration matrices - (K 1, K 2 ). Given: Camera poses - {(R 1, t 1 ), (R 2, t 2 )} are known. Given: 2D point correspondence - (q 1, q 2 ). Our goal is to find the associated 3D point Q m.
(Two view triangulation) Due to noise, the back-projected rays don t intersect. The required point is given by Q m = Qm 1 +Qm 2 2. The 3D point on the first back-projected ray is given by: q 1 K 1 R 1 (I t 1 )Q m 1. The 3D point on the second back-projected ray is given by: q 2 K 2 R 2 (I t 2 )Q m 2.
(Two view triangulation) Let us parametrize the 3D points using λ 1 and λ 2 : Q m 1 = t 1 + λr T 1 K 1 1 q 1 Q m 2 = t 2 + λr T 2 K 1 2 q 2 We rewrite using 3 1 constant vectors a, b, c and d for simplicity: Q m 1 = a + λb Q m 2 = c + λd
(Two view triangulation) We can compute λ 1 and λ 2 as follows: [λ 1, λ 2 ] = arg min λ 1,λ 2 dist(q m 1, Q m 2 )
(Two view triangulation) dist(q m 1, Q m 2 ) = 3 (a i + λ 1 b i c i λ 2 d i ) 2 i=1 [λ 1, λ 2 ] = arg min 3 (a i + λ 1 b i c i λ 2 d i ) 2 λ 1,λ 2 [λ 1, λ 2 ] = arg min λ 1,λ 2 D sqr = i=1 3 (a i + λ 1 b i c i λ 2 d i ) 2 i=1 3 (a i + λ 1 b i c i λ 2 d i ) 2 i=1
(Two view triangulation) At minima: D sqr = D sqr λ 1 = 3 (a i + λ 1 b i c i λ 2 d i ) 2 i=1 3 2(a i + λ 1 b i c i λ 2 d i )b i = 0 i=1 D sqr λ 2 = 3 2(a i + λ 1 b i c i λ 2 d i )d i = 0 i=1 We have two linear equations with two variables λ 1 and λ 2. This can be solved!
Once λ s are computed then we can obtain: Q m 1 = a + λ 1 b Q m 2 = c + λ 2 d We can compute the required intersection point Q m from the mid-point equation: Q m = Qm 1 +Qm 2 2
Sample Calibration matrices: K 1 = K 2 = Rotation matrices:r 1 = R 2 = I. 200 0 320 0 200 240 0 0 1 Translation matrices: t 1 = 0, t 2 = (100, 0, 0) T. Correspondence: q 1 = 520 440 q 2 = 500 440 1 1 Compute the 3D point Q m.
Simple Pipeline 1 Given a sequence of images {I 1, I 2,..., I n } with known calibration, obtain 3D reconstruction. 2 Compute correspondences for the image pair (I 1, I 2 ). 3 Find the motion between I 1 and I 2 using motion estimation algorithm (next class). 4 Compute partial 3D point cloud P 3D using the point correspondences from (I 1, I 2 ). 5 Initialize k = 3. 6 Compute correspondences for the pair (I k 1, I k ) and compute the pose of I k with respect to P 3D. 7 Increment P 3D using 3D reconstruction from (I k 1, I k ). 8 k = k + 1 9 If k < n go to Step 5.
Acknowledgments Some presentation slides are adapted from the following materials: Peter Sturm, Some lecture notes on geometric computer vision (available online).