Srikumar Ramalingam. Review: 3D Reconstruction, Pose Estimation Revisited. School of Computing, University of Utah

Presentation Outline

Forward Projection (Reminder)
(u, v, 1)^T ~ K R (I | -t) (X_m, Y_m, Z_m, 1)^T

Backward Projection (Reminder)
Q ~ K^{-1} q (the direction of the back-projected ray in camera coordinates)
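Both reminders can be checked numerically. A minimal numpy sketch (K, R and t are illustrative values chosen to match the sample problems later in these notes):

```python
import numpy as np

K = np.array([[200, 0, 320],
              [0, 200, 240],
              [0,   0,   1]], dtype=float)
R = np.eye(3)          # identity rotation for simplicity
t = np.zeros(3)        # camera at the world origin

# Forward projection: world point -> pixel coordinates (up to scale).
Qm = np.array([1000.0, 1000.0, 1000.0])
q = K @ R @ (Qm - t)   # with R = I, t = 0 this is just K @ Qm
u, v = q[0] / q[2], q[1] / q[2]
print(u, v)            # -> 520.0 440.0

# Backward projection: pixel -> ray direction in camera coordinates.
ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
# Qm lies on this ray: Qm = t + lam * R.T @ ray for some depth lam.
```

Here the depth along the ray is lam = 1000, recovering Qm = (1000, 1000, 1000)^T.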

Presentation Outline

Sample Problem Compute the solution for pose estimation when λ1 is given.
(λ1 X1 - λ2 X2)^2 + (λ1 Y1 - λ2 Y2)^2 + (λ1 Z1 - λ2 Z2)^2 = d12^2
(λ2 X2 - λ3 X3)^2 + (λ2 Y2 - λ3 Y3)^2 + (λ2 Z2 - λ3 Z3)^2 = d23^2
(λ3 X3 - λ1 X1)^2 + (λ3 Y3 - λ1 Y1)^2 + (λ3 Z3 - λ1 Z1)^2 = d31^2
Compute λ2 from the first equation (a quadratic, giving up to two solutions). Compute λ3 from the third equation. Use the second equation to remove incorrect solutions for λ2 and λ3.

We consider that the camera is calibrated, i.e., we know its calibration matrix K:
K = [200 0 320; 0 200 240; 0 0 1],   K^{-1} = (1/200) [1 0 -320; 0 1 -240; 0 0 200]
We are given three 2D image to 3D object correspondences. Let the three 2D points be given by:
q1 = (320, 140, 1)^T,  q2 = (320 - 50√3, 290, 1)^T,  q3 = (320 + 50√3, 290, 1)^T
Let the inter-point distances be given by {d12 = 1000, d23 = 1000, d31 = 1000}. Is it possible to have λ1 = λ2?
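A quick numerical check of this question (a sketch in numpy; the values are the ones given above):

```python
import numpy as np

# Calibration matrix from the slide.
K = np.array([[200, 0, 320],
              [0, 200, 240],
              [0,   0,   1]], dtype=float)

# The three image points q1, q2, q3 as homogeneous columns.
q = np.array([[320,                 140, 1],
              [320 - 50 * np.sqrt(3), 290, 1],
              [320 + 50 * np.sqrt(3), 290, 1]], dtype=float).T

# Back-projected ray directions v_i = K^{-1} q_i.
v = np.linalg.inv(K) @ q

# If λ1 = λ2 = λ3 = lam, each distance equation reduces to
# lam^2 * |v_i - v_j|^2 = d_ij^2, so one common lam works iff the
# three direction differences have equal length.
n12 = np.linalg.norm(v[:, 0] - v[:, 1])
n23 = np.linalg.norm(v[:, 1] - v[:, 2])
n31 = np.linalg.norm(v[:, 2] - v[:, 0])
# For these particular points all three norms come out equal, so yes:
# λ1 = λ2 (in fact λ1 = λ2 = λ3 = 1000 / n12) is a valid solution.
lam = 1000 / n12
```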

Pose estimation using n correspondences We can compute the pose using 3 correct correspondences. How do we compute the pose from n correspondences that may contain outliers? Use RANSAC to identify m inliers, where m ≤ n. Then use least squares to find the best pose using all the inliers - the basic idea is to stack the forward projection equations for all the inliers and solve for R and t.

General Version - RANSAC (Reminder)
1. Randomly choose s samples. Typically s is the minimum sample size that lets you fit a model.
2. Fit a model (e.g., a line) to those samples.
3. Count the number of inliers that approximately fit the model.
4. Repeat N times.
5. Choose the model that has the largest set of inliers.
Slide: Noah Snavely
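The steps above can be sketched concretely for line fitting (the function name, thresholds and test data below are illustrative, not from the slides):

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_thresh=0.5, rng=None):
    """Fit a line y = m*x + b to 2D points with RANSAC (s = 2)."""
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, np.zeros(len(points), bool)
    for _ in range(n_iters):                       # step 4: repeat N times
        # Step 1: randomly choose s = 2 samples.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if abs(x2 - x1) < 1e-12:
            continue                               # degenerate sample, skip
        # Step 2: fit a line to the two samples.
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Step 3: count inliers within inlier_thresh of the line.
        resid = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = resid < inlier_thresh
        # Step 5: keep the model with the largest inlier set.
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# Usage: 80 points near y = 2x + 1 plus 20 gross outliers.
data_rng = np.random.default_rng(0)
x = data_rng.uniform(0, 10, 80)
good = np.column_stack([x, 2 * x + 1 + data_rng.normal(0, 0.1, 80)])
bad = data_rng.uniform(0, 10, (20, 2)) * [1, 5]
(m, b), inliers = ransac_line(np.vstack([good, bad]), rng=1)
```

Despite 20% outliers, the recovered line stays close to y = 2x + 1 because only models supported by a large inlier set survive.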

Let us do RANSAC!

Matching Images We match keypoints from left and right images. 2D-to-2D image matching using descriptors such as SIFT.

Kinect Sample Frames Sequences of RGBD frames (I1, D1 ), (I2, D2 ), (I3, D3 ),..., (In, Dn ). How to register Kinect depth data for reconstructing large scenes? We have 2D-3D pose estimators and 2D-2D image matchers.

Kinect Sample Frames

Matching Images We match keypoints from left and right images. One of the matches is incorrect! In a general image matching problem with thousands of matches, we can have hundreds of incorrect matches.

Presentation Outline

(Two-view triangulation) Given: calibration matrices (K1, K2). Given: camera poses {(R1, t1), (R2, t2)}. Given: a 2D point correspondence (q1, q2). Our goal is to find the associated 3D point Q^m.

(Two-view triangulation) Due to noise, the back-projected rays don't intersect. The required point is given by the mid-point Q^m = (Q1^m + Q2^m)/2. The 3D point on the first back-projected ray satisfies: q1 ~ K1 R1 (I | -t1) Q1^m. The 3D point on the second back-projected ray satisfies: q2 ~ K2 R2 (I | -t2) Q2^m.

(Two-view triangulation) Let us parametrize the 3D points using λ1 and λ2:
Q1^m = t1 + λ1 R1^T K1^{-1} q1
Q2^m = t2 + λ2 R2^T K2^{-1} q2
We rewrite using 3x1 constant vectors a, b, c and d for simplicity:
Q1^m = a + λ1 b,   Q2^m = c + λ2 d

(Two-view triangulation) We can compute λ1 and λ2 as follows:
[λ1, λ2] = arg min_{λ1, λ2} dist(Q1^m, Q2^m)

(Two-view triangulation)
dist(Q1^m, Q2^m)^2 = Σ_{i=1}^{3} (a_i + λ1 b_i - c_i - λ2 d_i)^2
[λ1, λ2] = arg min_{λ1, λ2} D_sqr,   where D_sqr = Σ_{i=1}^{3} (a_i + λ1 b_i - c_i - λ2 d_i)^2

(Two-view triangulation) At the minimum:
∂D_sqr/∂λ1 = Σ_{i=1}^{3} 2 (a_i + λ1 b_i - c_i - λ2 d_i) b_i = 0
∂D_sqr/∂λ2 = -Σ_{i=1}^{3} 2 (a_i + λ1 b_i - c_i - λ2 d_i) d_i = 0
We have two linear equations in the two variables λ1 and λ2. This can be solved!

Once the λ's are computed, we can obtain:
Q1^m = a + λ1 b,   Q2^m = c + λ2 d
We can then compute the required intersection point Q^m from the mid-point equation: Q^m = (Q1^m + Q2^m)/2

Sample Problem
Calibration matrices: K1 = K2 = [200 0 320; 0 200 240; 0 0 1]
Rotation matrices: R1 = R2 = I.
Translation vectors: t1 = 0, t2 = (100, 0, 0)^T.
Correspondence: q1 = (520, 440, 1)^T,  q2 = (500, 440, 1)^T
Compute the 3D point Q^m.
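Setting the two derivatives of D_sqr to zero gives a 2x2 linear system. A minimal numpy sketch of the mid-point method, applied to this sample problem:

```python
import numpy as np

K = np.array([[200, 0, 320],
              [0, 200, 240],
              [0,   0,   1]], dtype=float)
R1 = R2 = np.eye(3)
t1 = np.zeros(3)
t2 = np.array([100.0, 0.0, 0.0])
q1 = np.array([520.0, 440.0, 1.0])
q2 = np.array([500.0, 440.0, 1.0])

# Parametrize the two back-projected rays: Q1 = a + lam1*b, Q2 = c + lam2*d.
a, b = t1, R1.T @ np.linalg.inv(K) @ q1
c, d = t2, R2.T @ np.linalg.inv(K) @ q2

# Setting the derivatives of D_sqr to zero gives:
#   lam1*(b.b) - lam2*(b.d) = (c - a).b
#   lam1*(b.d) - lam2*(d.d) = (c - a).d
M = np.array([[b @ b, -(b @ d)],
              [b @ d, -(d @ d)]])
rhs = np.array([(c - a) @ b, (c - a) @ d])
lam1, lam2 = np.linalg.solve(M, rhs)

# Mid-point of the two closest points on the rays.
Qm = ((a + lam1 * b) + (c + lam2 * d)) / 2
print(Qm)  # ≈ [1000, 1000, 1000]
```

For these particular values the two rays meet exactly, so Q1^m = Q2^m = Q^m = (1000, 1000, 1000)^T.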

Simple Pipeline
1. Given a sequence of images {I1, I2, ..., In} with known calibration, obtain a 3D reconstruction.
2. Compute correspondences for the image pair (I1, I2).
3. Find the motion between I1 and I2 using a motion estimation algorithm (next class).
4. Compute a partial 3D point cloud P3D using the point correspondences from (I1, I2).
5. Initialize k = 3.
6. Compute correspondences for the pair (I_{k-1}, I_k) and compute the pose of I_k with respect to P3D.
7. Augment P3D with the 3D reconstruction from (I_{k-1}, I_k).
8. k = k + 1.
9. If k ≤ n, go to Step 6.

Three-view triangulation
Q1^m = a + λ1 b,   Q2^m = c + λ2 d,   Q3^m = e + λ3 f
We can compute the required point Q^m from the intersection of the three rays. What is the cost function to minimize?

Problem
Calibration matrices: K1 = K2 = K3 = [200 0 320; 0 200 240; 0 0 1]
Rotation matrices: R1 = R2 = R3 = I.
Translation vectors: t1 = 0, t2 = (100, 0, 0)^T, t3 = (200, 0, 0)^T.
Correspondence: q1 = (520, 440, 1)^T,  q2 = (500, 440, 1)^T,  q3 = (480, 440, 1)^T
Compute the 3D point Q^m.
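One natural cost is the sum of squared pairwise distances between points on the three rays, which is still linear in (λ1, λ2, λ3). A numpy sketch on this problem's values (stacking the pairwise residuals into one least-squares system is one possible formulation, not necessarily the one used in the lecture):

```python
import numpy as np

K = np.array([[200, 0, 320],
              [0, 200, 240],
              [0,   0,   1]], dtype=float)
Kinv = np.linalg.inv(K)
ts = [np.zeros(3), np.array([100.0, 0, 0]), np.array([200.0, 0, 0])]
qs = [np.array([520.0, 440, 1]),
      np.array([500.0, 440, 1]),
      np.array([480.0, 440, 1])]

# Ray i: Q_i = ts[i] + lam_i * dirs[i]   (all R_i = I here).
dirs = [Kinv @ q for q in qs]
b, d, f = dirs

# Minimize |Q1-Q2|^2 + |Q2-Q3|^2 + |Q3-Q1|^2 over (lam1, lam2, lam3).
# Each pairwise residual is linear in the lam's; stack and solve by lstsq.
A = np.vstack([np.column_stack([b, -d, np.zeros(3)]),   # Q1 - Q2 = 0
               np.column_stack([np.zeros(3), d, -f]),   # Q2 - Q3 = 0
               np.column_stack([-b, np.zeros(3), f])])  # Q3 - Q1 = 0
rhs = np.concatenate([ts[1] - ts[0], ts[2] - ts[1], ts[0] - ts[2]])
lams, *_ = np.linalg.lstsq(A, rhs, rcond=None)

# Final point: average of the three per-ray points.
Qm = np.mean([ts[i] + lams[i] * dirs[i] for i in range(3)], axis=0)
```

Here all three rays meet exactly, so the residual is zero and Q^m = (1000, 1000, 1000)^T.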

Acknowledgments Some presentation slides are adapted from the following materials: Peter Sturm, Some lecture notes on geometric computer vision (available online).