Hartley-Zisserman reading club
Part I: Hartley and Zisserman, Appendix 6: Iterative estimation methods.
Part II: Zhengyou Zhang, "A Flexible New Technique for Camera Calibration".
Presented by Daniel Fontijne. p.1/35
HZ Appendix 6: Iterative estimation methods
Topics: basic methods (Newton, Gauss-Newton, gradient descent); Levenberg-Marquardt; sparse Levenberg-Marquardt; applications to homography, fundamental matrix, and bundle adjustment; sparse methods for equation solving; robust cost functions; parameterization.
Lecture notes which I found useful (methods for non-linear least squares problems): http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf
Iterative estimation methods
Problem: how to find the minimum of a non-linear function?
Examples of HZ problems:
- homography estimation.
- fundamental matrix estimation.
- multiple image bundle adjustment.
- camera calibration (Zhang paper).
Examples of my recent problems:
- optimization of skeleton geometry given marker data.
- optimization of skeleton pose given marker data.
Central approach of Appendix 6: Levenberg-Marquardt.
Questions: Pronunciation? Why LM?
Newton iteration
Goal: solve X = f(P) for P. X is the measurement vector, P is the parameter vector, f is some non-linear function.
In other words: minimize ε = X − f(P).
We assume f is locally linear at each P_i, so f(P_i + Δ_i) = f(P_i) + J_i Δ_i, where the matrix J_i is the Jacobian ∂f/∂P at P_i.
So we want to minimize ‖ε_i − J_i Δ_i‖ over the vector Δ_i.
Find Δ_i either using the normal equations J_i^T J_i Δ_i = J_i^T ε_i, or using the pseudo-inverse: Δ_i = J_i^+ ε_i. Iterate until convergence...
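The iteration above can be sketched in a few lines: repeatedly linearize f around the current parameters and solve the normal equations. The model and data below (fitting y = exp(a·x)) are invented purely for illustration.

```python
import numpy as np

def gauss_newton(f, jac, p0, X, iters=30):
    """Minimize ||X - f(p)|| by iterating the normal equations."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        eps = X - f(p)          # residual vector at current parameters
        J = jac(p)              # Jacobian of f at current parameters
        delta = np.linalg.solve(J.T @ J, J.T @ eps)
        p = p + delta
    return p

# Synthetic problem: recover a = 1.5 from noiseless measurements.
x = np.linspace(0.0, 1.0, 20)
X = np.exp(1.5 * x)
f = lambda p: np.exp(p[0] * x)
jac = lambda p: (x * np.exp(p[0] * x)).reshape(-1, 1)

a_est = gauss_newton(f, jac, [1.0], X)
print(a_est)    # converges to ~1.5
```

With noiseless data the iteration converges to the true parameter to machine precision; with noise it converges to the least-squares estimate.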
Gauss-Newton method
Suppose we want to minimize some cost function g(P) = (1/2)‖ε(P)‖² = (1/2) ε(P)^T ε(P).
We may expand in a Taylor series up to second degree: g(P + Δ) = g + g_P Δ + Δ^T g_PP Δ/2, where subscript P denotes differentiation.
Differentiating w.r.t. Δ and setting to zero results in g_PP Δ = −g_P. Using this equation we could compute Δ if we knew g_PP and g_P.
Gradient vector: g_P = ε_P^T ε = −J^T ε (since ε_P = −J). Intuition?
Hessian: g_PP = ε_P^T ε_P + ε_PP^T ε ≈ J^T J. Assume linear again...
Putting it all together we get J^T J Δ = J^T ε. So we arrive at the normal equations again. (So what was the point?)
Gradient descent
Gradient descent (or steepest descent) searches in the direction of most rapid decrease, −g_P = −ε_P^T ε.
So we take steps λΔ = −g_P, where λ controls the step size and is found through line search.
A problem is zig-zagging, which can cause slow convergence.
Levenberg-Marquardt
Levenberg-Marquardt is a blend of Gauss-Newton and gradient descent. Update equation: (J^T J + λI) Δ = J^T ε.
Algorithm: initially set λ = 10^−3. Try the update equation.
If improvement: divide λ by 10, i.e., shift towards Gauss-Newton.
Else: multiply λ by 10, i.e., shift towards gradient descent.
The idea is (?):
- take big gradient-descent steps far away from the minimum.
- take Gauss-Newton steps near the (hopefully quadratic) minimum.
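The λ-adaptation loop above fits in a short function. A minimal sketch, with an invented test problem (a two-parameter exponential fit) standing in for a real vision problem:

```python
import numpy as np

def levenberg_marquardt(f, jac, p0, X, lam=1e-3, iters=50):
    """Minimize ||X - f(p)||^2 with the lambda-adaptation rule above."""
    p = np.asarray(p0, dtype=float)
    err = np.sum((X - f(p)) ** 2)
    for _ in range(iters):
        eps = X - f(p)
        J = jac(p)
        delta = np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), J.T @ eps)
        p_new = p + delta
        err_new = np.sum((X - f(p_new)) ** 2)
        if err_new < err:          # improvement: accept, shift towards Gauss-Newton
            p, err, lam = p_new, err_new, lam / 10.0
        else:                      # no improvement: shift towards gradient descent
            lam *= 10.0
    return p

# Synthetic measurements from p = (2, -1).
x = np.linspace(0.0, 1.0, 20)
X = 2.0 * np.exp(-1.0 * x)
f = lambda p: p[0] * np.exp(p[1] * x)
jac = lambda p: np.column_stack([np.exp(p[1] * x),
                                 p[0] * x * np.exp(p[1] * x)])
p_est = levenberg_marquardt(f, jac, [1.0, 0.0], X)
print(p_est)
```

Production implementations (e.g., MINPACK-style) add convergence tests and scale the damping by the diagonal of J^T J, but the accept/reject structure is the same.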
Sparse Levenberg-Marquardt 1/2
In many estimation problems, the Jacobian is sparse. One should exploit this to lower the time complexity (sometimes even from O(n³) to O(n)).
In the example, the parameters are partitioned into two blocks: P = (a^T, b^T)^T. The Jacobian then has the form J = [A B], with A = [∂X̂/∂a], B = [∂X̂/∂b].
Using A and B, the normal equations (J^T J) Δ = J^T ε take on the form
  [ A^T A   A^T B ] (δ_a)   ( A^T ε )
  [ B^T A   B^T B ] (δ_b) = ( B^T ε ).
Sparse Levenberg-Marquardt 2/2
If the normal equations are written as (what's with the *? In HZ the asterisk marks the λ-augmented blocks of the LM update.)
  [ U    W ] (δ_a)   ( ε_A )
  [ W^T  V ] (δ_b) = ( ε_B ),
we can rewrite this to
  [ U − W V⁻¹ W^T   0 ] (δ_a)   ( ε_A − W V⁻¹ ε_B )
  [ W^T             V ] (δ_b) = ( ε_B )
by multiplying on the left by
  [ I   −W V⁻¹ ]
  [ 0    I    ].
Now first solve the top half for δ_a, then the lower half for δ_b using back-substitution.
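The block elimination above can be checked numerically. A small sketch with a random well-conditioned SPD system standing in for real normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
na, nb = 3, 4
M = rng.standard_normal((na + nb, na + nb))
H = M @ M.T + (na + nb) * np.eye(na + nb)   # SPD, plays the role of J^T J
U, W, V = H[:na, :na], H[:na, na:], H[na:, na:]
eA, eB = rng.standard_normal(na), rng.standard_normal(nb)

Vinv = np.linalg.inv(V)
S = U - W @ Vinv @ W.T                       # Schur complement
da = np.linalg.solve(S, eA - W @ Vinv @ eB)  # top half
db = Vinv @ (eB - W.T @ da)                  # back-substitution for the lower half

# Agrees with solving the full system directly:
full = np.linalg.solve(H, np.concatenate([eA, eB]))
print(np.allclose(np.concatenate([da, db]), full))
```

The payoff in bundle adjustment is that V is block-diagonal (one small block per 3-D point), so V⁻¹ costs O(n) instead of O(n³).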
Robust cost functions 1/5
Robust cost functions 2/5
Squared error is not usable unless outliers are filtered out. Alternatives:
Blake-Zisserman: outliers are given a constant cost. Disadvantages: not a PDF, not convex.
Corrupted Gaussian: blend two Gaussians, one for inliers and one for outliers. Disadvantages: not convex.
Cauchy: (?). Disadvantages: not convex.
L1: absolute error (not squared). Disadvantages: not differentiable at 0; the minimum is not at a single point when summed.
Huber cost function: like L1, but rounded near 0. Disadvantages: derivatives discontinuous from 2nd order up.
Pseudo-Huber: like Huber, but with continuous derivatives of all orders.
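Two of the costs listed above in code, using the common threshold-b form (the formulas are the standard Huber / pseudo-Huber definitions, which match HZ up to constants):

```python
import numpy as np

def huber(delta, b=1.0):
    """Quadratic for |delta| <= b, linear beyond; 1st derivative is continuous."""
    d = np.abs(delta)
    return np.where(d <= b, d ** 2, 2 * b * d - b ** 2)

def pseudo_huber(delta, b=1.0):
    """Smooth approximation of Huber with continuous derivatives of all orders."""
    return 2 * b ** 2 * (np.sqrt(1 + (delta / b) ** 2) - 1)

d = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(huber(d))          # inliers grow quadratically, outliers only linearly
print(pseudo_huber(d))   # close to Huber, but smooth at |delta| = b
```

The linear tails are what limit outlier influence: an outlier's gradient contribution saturates instead of growing with its residual.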
Robust cost functions 3/5: Figure A.6.5.
Robust cost functions 4/5: Figure A.6.6.
Robust cost functions 5/5
Summary: the squared-error cost function is very susceptible to outliers.
The non-convex functions (like Blake-Zisserman and the corrupted Gaussian) may be good, but they have local minima. So do not use them unless already close to the true minimum.
Best: Huber and Pseudo-Huber.
Fooling LM implementations
Most implementations of Levenberg-Marquardt use the squared-error cost function. What if you want a different cost function C instead?
Replace each difference δ_i with a weighted version δ_i′ = w_i δ_i such that ‖δ_i′‖² = w_i² ‖δ_i‖² = C(‖δ_i‖).
Thus w_i = √(C(‖δ_i‖)) / ‖δ_i‖. (Confusion about δ being a vector? Why not scalar?)
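The weighting trick above is easy to verify: after reweighting, the squared norm of each residual equals the robust cost of the original residual. A sketch using the Huber cost as the example C:

```python
import numpy as np

def huber(d, b=1.0):
    d = np.abs(d)
    return np.where(d <= b, d ** 2, 2 * b * d - b ** 2)

def reweight(delta, C=huber):
    """Scale residuals so a squared-error minimizer effectively minimizes C."""
    d = np.abs(delta)
    w = np.where(d > 0, np.sqrt(C(d)) / np.maximum(d, 1e-12), 1.0)
    return w * delta    # feed these weighted residuals to the LM implementation

delta = np.array([0.5, 4.0])      # one inlier, one outlier
wd = reweight(delta)
print(wd ** 2)                    # equals huber(delta)
```

In practice the weights are recomputed at each iteration from the current residuals (iteratively reweighted least squares).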
Parameterization for Levenberg-Marquardt
A good parameterization for use with LM is singularity free (at least in the area visited during optimization). This means: continuous, differentiable, one-to-one.
So latitude-longitude is not suitable to parameterize the sphere.
And Euler angles are not suitable to parameterize rotations.
Gauge freedom? Variance?
Parameterization of 3-D rotations
3-D rotation matrix: 9 elements, only 3 degrees of freedom. Angle-axis (3-vector) representation: 3 elements, 3 d.o.f.
Observations (these are mostly just general observations about log(rotation)):
Identity rotation: t = 0.
Inverse rotation: −t.
Small rotation: the rotation matrix is I + [t]_×.
For small rotations: R(t_1) R(t_2) ≈ R(t_1 + t_2).
All rotations can be represented by t with ‖t‖ ≤ π. When ‖t‖ = 2nπ (n a positive integer) you get the identity rotation again (singularity).
Normalization: stay away from ‖t‖ = 2π.
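The angle-axis vector t converts to a rotation matrix with Rodrigues' formula; for small ‖t‖ it reduces to the I + [t]_× form noted above. A minimal sketch:

```python
import numpy as np

def skew(t):
    """The cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rodrigues(t):
    """Angle-axis 3-vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(t)
    if theta < 1e-12:
        return np.eye(3) + skew(t)      # small-angle approximation
    K = skew(t / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

t = np.array([0.0, 0.0, np.pi / 2])     # 90 degrees about the z-axis
R = rodrigues(t)
print(R @ np.array([1.0, 0.0, 0.0]))    # x-axis maps to ~[0, 1, 0]
print(np.allclose(rodrigues(-t), R.T))  # inverse rotation is -t
```

The identity-at-t=0 and inverse-at-−t observations from the slide fall out directly from the formula.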
Parameterization of homogeneous vectors
Let v be an n-d vector (already stripped of the extra homogeneous coordinate?). Then parameterize it as an (n+1)-vector: v̄ = ((sin(‖v‖/2)/‖v‖) v^T, cos(‖v‖/2))^T, which makes v̄ a unit vector.
Parameterization of the n-sphere
How to parameterize unit vectors x? Compute a Householder matrix (reflection) such that H_v(x) x = (0, ..., 0, 1)^T.
Then parameterize around that point with either
(i) f(y) = ŷ/‖ŷ‖ where ŷ = (y^T, 1)^T, (?)
(ii) f(y) = ((sin(‖y‖/2)/‖y‖) y^T, cos(‖y‖/2))^T (?);
both have Jacobian ∂f/∂y = [I 0]^T at y = 0 (up to scale).
So the constrained Jacobian can be computed as J = ∂C/∂y = (∂C/∂x)(∂x/∂y) = (∂C/∂x) H_v(x) [I 0]^T.
Zhang Paper
Zhengyou Zhang, "A Flexible New Technique for Camera Calibration" (1998).
As implemented for:
Matlab: the Camera Calibration Toolbox for Matlab.
C++: Intel OpenCV.
Internal Camera Calibration 1/4
The primary use of the Zhang algorithm is internal camera calibration. It computes:
principal point c_x and c_y.
focal length f_x and f_y.
skew s (optional).
In short, the camera intrinsic matrix:
  A = [ f_x  s    c_x ]
      [ 0    f_y  c_y ]
      [ 0    0    1   ]
Internal Camera Calibration 2/4
The Zhang algorithm also computes lens distortion parameters [k_1, k_2, k_3, k_4]. The original paper uses the radial model
  x_d = x + x (k_1 (x² + y²) + k_2 (x² + y²)²),
  y_d = y + y (k_1 (x² + y²) + k_2 (x² + y²)²),
where x and y are normalized image coordinates and x_d and y_d are the distorted coordinates.
But the implementations use a more complex model with tangential terms:
  x_d = x + x (k_1 (x² + y²) + k_2 (x² + y²)²) + x_td,
  y_d = y + y (k_1 (x² + y²) + k_2 (x² + y²)²) + y_td,
where
  x_td = 2 k_3 x y + k_4 (3x² + y²),
  y_td = 2 k_4 x y + k_3 (x² + 3y²).
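The implementation model above in code: two radial terms from Zhang's paper plus the tangential (k_3, k_4) terms used by the toolboxes. The coordinate values are invented for illustration.

```python
import numpy as np

def distort(x, y, k1, k2, k3=0.0, k4=0.0):
    """Apply radial (k1, k2) and tangential (k3, k4) distortion to
    normalized image coordinates (x, y)."""
    r2 = x ** 2 + y ** 2
    radial = k1 * r2 + k2 * r2 ** 2
    x_td = 2 * k3 * x * y + k4 * (3 * x ** 2 + y ** 2)
    y_td = 2 * k4 * x * y + k3 * (x ** 2 + 3 * y ** 2)
    return x + x * radial + x_td, y + y * radial + y_td

# With all coefficients zero the model is the identity:
print(distort(0.3, -0.2, 0.0, 0.0))
# Radial-only distortion moves a point along the ray from the center,
# so the ratio x/y is preserved:
xd, yd = distort(0.3, -0.2, -0.2, 0.05)
print(xd / 0.3, yd / -0.2)
```

Note the model maps ideal (undistorted) to distorted coordinates; undistorting an observed point requires inverting it iteratively.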
Internal Camera Calibration 3/4
Example of internal camera calibration parameters.
Camera: PixeLINK A741, 2/3 inch CMOS sensor, 1280x1024. Lens: Cosmicar 8.5mm fixed focal length.
f_x = 1272.872 pixels = 8.528 mm
f_y = 1272.988 pixels = 8.529 mm
c_x = 632.740
c_y = 507.648
k_1 = 0.204
k_2 = 0.171
k_3 = 0.00074896
k_4 = 0.00008878
Internal Camera Calibration 4/4
Show lens distortion in DASiS video viewer...
External Camera Calibration
The Zhang algorithm may also be used for external camera calibration. Camera rotation and translation are computed as a by-product of internal calibration. If two cameras see the same calibration pattern at the same time, their relative position and orientation may be computed.
Overall approach
Measure the projected position of points in a plane (e.g., checkerboard).
Do so for at least two different camera orientations.
Set up equations in order to estimate the camera intrinsics.
Given camera intrinsics, estimate the extrinsics.
Estimate radial distortion.
Use Levenberg-Marquardt to optimize the initial estimates.
Basic Equations
The plane ("checkerboard") is at Z = 0.
Homogeneous 2-D image point: m̃. Homogeneous 3-D world point: M̃ = [X Y 0 1]^T.
Projection:
  s m̃ = A [R t] M̃ = A [r_1 r_2 r_3 t] [X Y 0 1]^T = A [r_1 r_2 t] [X Y 1]^T,
with
  A = [ α  γ  u_0 ]
      [ 0  β  v_0 ]
      [ 0  0  1   ].
Homography, constraints
A homography H can be estimated between the known points on the calibration object and the measured image points:
  H = [h_1 h_2 h_3] = λ A [r_1 r_2 t].
We demand:
C1: r_1^T r_2 = 0 (r_1, r_2 orthogonal),
C2: r_1^T r_1 = r_2^T r_2 (r_1, r_2 have the same length).
We know:
  h_1 = λ A r_1, so r_1 = λ⁻¹ A⁻¹ h_1,
  h_2 = λ A r_2, so r_2 = λ⁻¹ A⁻¹ h_2.
So the constraints become:
C1: h_1^T A⁻ᵀ A⁻¹ h_2 = 0,
C2: h_1^T A⁻ᵀ A⁻¹ h_1 = h_2^T A⁻ᵀ A⁻¹ h_2.
Closed-form solution using constraints 1/4
Using the constraints, we can first find A, followed by R and t. Let B = A⁻ᵀ A⁻¹, a symmetric matrix with entries
  B_11 = 1/α²,
  B_12 = −γ/(α²β),
  B_13 = (v_0 γ − u_0 β)/(α²β),
  B_22 = γ²/(α²β²) + 1/β²,
  B_23 = −γ(v_0 γ − u_0 β)/(α²β²) − v_0/β²,
  B_33 = (v_0 γ − u_0 β)²/(α²β²) + v_0²/β² + 1.
This allows us to solve for α, β, etc.
Closed-form solution using constraints 2/4
If we reshuffle the six unique elements of B into a vector b = [B_11, B_12, B_22, B_13, B_23, B_33]^T, we can rewrite both constraints using h_i^T B h_j = v_ij^T b, where
  v_ij = [h_i1 h_j1, h_i1 h_j2 + h_i2 h_j1, h_i2 h_j2, h_i3 h_j1 + h_i1 h_j3, h_i3 h_j2 + h_i2 h_j3, h_i3 h_j3]^T,
ultimately resulting in
  [ v_12^T          ]
  [ (v_11 − v_22)^T ] b = 0.
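The v_ij construction can be verified on a synthetic camera: build H = A [r_1 r_2 t] from invented intrinsics, rotation, and translation, and check that both constraint rows annihilate b. All numeric values below are made up for the demonstration.

```python
import numpy as np

def v_ij(H, i, j):
    """Row vector such that h_i^T B h_j = v_ij^T b, exploiting B's symmetry."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

A = np.array([[800.0, 0.5, 320.0],
              [0.0, 810.0, 240.0],
              [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)           # rotation about the z-axis
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.2, 2.0])
H = A @ np.column_stack([R[:, 0], R[:, 1], t])

B = np.linalg.inv(A).T @ np.linalg.inv(A)
b = np.array([B[0, 0], B[0, 1], B[1, 1], B[0, 2], B[1, 2], B[2, 2]])

print(v_ij(H, 0, 1) @ b)                       # ~ 0  (constraint C1)
print((v_ij(H, 0, 0) - v_ij(H, 1, 1)) @ b)     # ~ 0  (constraint C2)
```

Because r_1 and r_2 are orthonormal columns of a rotation, both linear constraints hold exactly up to floating-point error.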
Closed-form solution using constraints 3/4
Next, stack all the equations from n measurements (estimated homographies) of the plane ("checkerboard"): V b = 0, where V is a 2n × 6 matrix. Solve as usual using the SVD.
Closed-form solution using constraints 4/4
Once A is known, we can obtain r_1, r_2 and t:
  r_1 = λ⁻¹ A⁻¹ h_1, r_2 = λ⁻¹ A⁻¹ h_2, t = λ⁻¹ A⁻¹ h_3.
Now Zhang says: set r_3 = r_1 × r_2, and use the SVD to make the matrix R orthogonal, i.e., R = U V^T.
I say: make r_1, r_2 orthogonal in the least-squares sense, then compute r_3 = r_1 × r_2. It is simpler and boils down to the same thing.
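The SVD snapping step described above in code: the estimated r_1, r_2 are generally not exactly orthonormal, so build Q = [r_1 r_2 r_1 × r_2] and replace it by the nearest rotation matrix R = U V^T. The perturbed columns below are invented stand-ins for noisy estimates.

```python
import numpy as np

def nearest_rotation(Q):
    """Nearest orthogonal matrix to Q in the Frobenius norm, via the SVD."""
    U, _, Vt = np.linalg.svd(Q)
    return U @ Vt

# Slightly non-orthonormal columns standing in for noisy estimates:
r1 = np.array([1.0, 0.02, -0.01])
r2 = np.array([-0.01, 1.0, 0.03])
Q = np.column_stack([r1, r2, np.cross(r1, r2)])
R = nearest_rotation(Q)

print(np.allclose(R @ R.T, np.eye(3)))   # R is orthogonal
print(np.linalg.det(R))                  # ~ +1, a proper rotation
```

Using r_1 × r_2 for the third column keeps Q right-handed, so the SVD result has determinant +1 rather than −1.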
Radial distortion
Using the camera intrinsics and extrinsics, undistorted coordinates of points (corners on the checkerboard) can be approximated. These are used to solve for k_1, k_2:
  [ (u − u_0)(x² + y²)   (u − u_0)(x² + y²)² ] (k_1)   ( ŭ − u )
  [ (v − v_0)(x² + y²)   (v − v_0)(x² + y²)² ] (k_2) = ( v̆ − v ).
These equations are stacked (D [k_1 k_2]^T = d) and we solve least-squares: [k_1 k_2]^T = (D^T D)⁻¹ D^T d. Then iterate both algorithms (internal+external, radial) until convergence.
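A sketch of the least-squares step above on synthetic data: we distort ideal pixel coordinates with known k_1, k_2, build D and d from the two rows each point contributes, and recover the coefficients. All intrinsic values and coefficients are invented for the demo.

```python
import numpy as np

u0, v0, fx, fy = 320.0, 240.0, 800.0, 810.0
k1_true, k2_true = -0.2, 0.05
rng = np.random.default_rng(1)
x = rng.uniform(-0.3, 0.3, 50)          # normalized image coordinates
y = rng.uniform(-0.3, 0.3, 50)

u, v = u0 + fx * x, v0 + fy * y          # ideal (undistorted) pixel coordinates
r2 = x ** 2 + y ** 2
radial = k1_true * r2 + k2_true * r2 ** 2
ub, vb = u + (u - u0) * radial, v + (v - v0) * radial   # observed (distorted)

# Two rows per point: D [k1, k2]^T = d
D = np.vstack([np.column_stack([(u - u0) * r2, (u - u0) * r2 ** 2]),
               np.column_stack([(v - v0) * r2, (v - v0) * r2 ** 2])])
d = np.concatenate([ub - u, vb - v])
k = np.linalg.solve(D.T @ D, D.T @ d)    # normal-equations least squares
print(k)                                 # recovers ~[-0.2, 0.05]
```

In the real algorithm ŭ, v̆ are the measured corners and u, v come from reprojecting with the current intrinsics/extrinsics, so the two stages are iterated as the slide says.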
Maximum likelihood estimation
Optimize: use Levenberg-Marquardt to find the minimum of
  Σ_{i=1}^{n} Σ_{j=1}^{m} ‖m_ij − m̂(A, k_1, k_2, R_i, t_i, M_j)‖²
(n images, m points per image). All done...
Notable experimental results
Using three different images, the results are pretty good. Results keep getting better with more images.
A 45 degree angle between the image plane and the checkerboard seems to give the best result. Loss of precision in corner detection was not taken into account (simulated data).
Systematic non-planarity of the checkerboard has more effect than random noise (duh).
Cylindrical non-planarity is worse than spherical non-planarity (cylindrical more common in practice?).
Even with systematic non-planarity, the results are still usable.
Error in the computed sensor center seems not to have much effect on 3-D reconstruction.