Using RANSAC for Omnidirectional Camera Model Fitting


CENTER FOR MACHINE PERCEPTION, CZECH TECHNICAL UNIVERSITY

Using RANSAC for Omnidirectional Camera Model Fitting

Branislav Mičušík and Tomáš Pajdla
{micusb,pajdla}@cmp.felk.cvut.cz

REPRINT

Branislav Mičušík and Tomáš Pajdla. Using RANSAC for Omnidirectional Camera Model Fitting. Computer Vision Winter Workshop, Valtice, Czech Republic, February 2003. Available at ftp://cmp.felk.cvut.cz/pub/cmp/articles/micusik/micusik-cvww03.pdf

Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Technická 2, 166 27 Prague 6, Czech Republic, fax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz

Using RANSAC for Omnidirectional Camera Model Fitting

Branislav Mičušík and Tomáš Pajdla
Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, 166 27 Prague 6, Czech Republic
micusb@cmp.felk.cvut.cz, pajdla@cmp.felk.cvut.cz

Abstract

We introduce a robust technique based on RANSAC for the simultaneous estimation of a central omnidirectional camera model (view angle above 180°) and its epipolar geometry. It is shown that points near the center of the view field circle satisfy the camera model for almost any degree of image nonlinearity. Therefore, they are often selected in RANSAC-based estimation as inliers, while the most informative points near the border of the view field circle are rejected and an incorrect camera model is estimated. We show that a remedy to this problem is achieved by not using points close to the center of the view field circle. The camera calibration is done from image correspondences only, without any calibration objects or any assumption about the scene. We demonstrate our method in real experiments with the high-quality, but cheap and widely available, Nikon FC-E8 fish-eye lens. In practical situations, the proposed method allows the camera model to be estimated from 9 correspondences and can thus be used in an efficient RANSAC-based estimation technique.

1 Introduction

Recently, high-quality, but cheap and widely available, lenses, e.g. the Nikon FC-E8 or Sigma 8mm-f4-EX fish-eye converters, and curved mirrors, e.g. [15], providing a view angle above 180° have appeared. Cameras with such a large view angle, called omnidirectional cameras, are especially appropriate in applications (e.g. surveillance, tracking, structure from motion, navigation) where more stable ego-motion estimation is required. Using such cameras in a stereo pair calls for searching for correspondences, camera model calibration, epipolar geometry estimation, and 3D reconstruction, analogously as for standard directional cameras [7].

In this work we concentrate on a robust technique based on RANSAC for the simultaneous estimation of the camera model and the epipolar geometry for omnidirectional cameras preserving central projection. We assume that point correspondences, information about the view field of the lens, and its corresponding view angle are available.

Previous work on the estimation of camera models with lens nonlinearity includes methods that use some knowledge about the observed scene, e.g. calibration patterns [3, 13] and plumb line methods [4, 16, 19], methods based on the fact that a lens nonlinearity introduces specific higher-order correlations in the frequency domain [5], and methods that calibrate cameras from point correspondences only, e.g. [6, 14, 18]. Fitzgibbon [6] deals with the problem of lens nonlinearity estimation in the context of camera self-calibration and structure from motion.

Figure 1: Inlier detection. Images were acquired by the Nikon FC-E8 fish-eye converter; correspondences were obtained by [10]. (a) Wrong model: all points were used in model estimation using RANSAC. The model, however, suits only the points near the center of the view field circle, since the other points are marked as outliers. (b) Correct model: only points near the boundary of the view field circle were used for computing the model. The model suits the points near the center as well as the points near the boundary.
His method, however, cannot be directly used for omnidirectional cameras with a view angle above 180°, because it represents images by points in which the rays of a camera intersect an image plane. We extended [11] the method [6] to omnidirectional cameras, derived an appropriate omnidirectional camera model incorporating the lens nonlinearity, and suggested an algorithm for the estimation of the model from epipolar geometry. In this work we show, see Figure 1, how the points should be sampled in RANSAC to obtain a correct, unbiased estimate of the camera model and the epipolar geometry. Our method is useful for lenses as well as for mirrors [15] providing a view angle above 180° and possessing central projection.

The structure of the paper is the following. The omnidirectional camera model and its simultaneous estimation with the epipolar geometry are reviewed in Section 2. The properties of the camera model and the robust bucketing technique based on RANSAC are introduced in Section 3. An algorithm for the camera model estimation is summarized in Section 4. Experiments and a summary are given in Sections 5 and 6.

Figure 2: The Nikon FC-E8 fish-eye converter. (b) The lens possesses central projection, so all rays emanate from its optical center, shown as a dot. (c) Notice that the image taken by the lens onto the planar sensor π can be represented by intersecting the camera rays with a spherical retina ρ.

Figure 3: The diagram of the construction of the mapping f from the sensor plane π to the spherical retina ρ. The point (u, v, 1) in the image plane π is transformed by f(.) to (u, v, w), then normalized to unit length, and thus projected onto the sphere ρ.

2 Omnidirectional camera model

For cameras with a view angle above 180°, see Figure 2, the images of all scene points X cannot be represented by intersections of camera rays with a single image plane. Every line passing through the camera's optical center intersects the image plane in one point; however, two scene points can lie on one such line and be seen in the image at the same time, see rays p1 and p2 in Figure 2c. For that reason, we represent the rays of the image as a set of unit vectors in R³ such that one vector corresponds to exactly one image of a scene point.

Let us assume that u = (u, v)ᵀ are the coordinates of a point in an image with the origin of the coordinate system in the center of the view field circle (u0, v0). Remember that this is not always the center of the image. Let us further assume that the nonlinear function g, which assigns 3D vectors to 2D image coordinates, can be expressed as

    g(u) = g(u, v) = (u, v, f(u, v))ᵀ,    (1)

where f(u) is a rotationally symmetric function w.r.t. the point (u0, v0). The function f can have various forms determined by the lens or mirror construction [3, 9]. For the Nikon FC-E8 fish-eye lens we use the division model [11]

    θ = a r / (1 + b r²),    r = (a − √(a² − 4bθ²)) / (2bθ),    (2)

where θ is the angle between a ray and the optical axis, r = √(u² + v²) is the radius of a point in the image plane w.r.t. (u0, v0), and a, b are parameters of the model. Using f(u) = r / tan θ, see Figure 3, the 3D vector p with unit length can be expressed up to scale as

    p = (u, v, w)ᵀ,    w = f(u, a, b) = r / tan θ = r / tan(a r / (1 + b r²)).    (3)

Equation (3) captures the relationship between the image point u and the 3D vector p emanating from the optical center towards a scene point.
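For illustration, the mapping (2)-(3) from an image point to a unit ray is only a few lines of code. The following Python sketch is ours, not the authors' implementation; the function name and the centering convention are assumptions:

    import numpy as np

    def point_to_ray(u, v, a, b, u0=0.0, v0=0.0):
        """Map an image point to a unit 3D ray on the spherical retina
        using the division model (2) and equation (3)."""
        du, dv = u - u0, v - v0           # coordinates w.r.t. the view field center (u0, v0)
        r = np.hypot(du, dv)
        if r < 1e-12:                     # the center maps to the optical axis
            return np.array([0.0, 0.0, 1.0])
        theta = a * r / (1.0 + b * r**2)  # angle between the ray and the optical axis
        p = np.array([du, dv, r / np.tan(theta)])  # p = (u, v, w), w = r / tan(theta)
        return p / np.linalg.norm(p)

Because (2) is invertible in closed form, the reverse mapping from a ray back to an image point needs no iteration, which the paper exploits in Section 3.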
Parameters a, b, and matrix F can be thus computed simultaneously. We recover parameters of model (3), and thus angles between rays and the optical axis, which is equivalent to recovering an essential matrix, and therefore calibrated camera. We used angular error, i.e. angle between a ray and an corresponding epipolar plane [], to measure the quality of the estimate of epipolar geometry instead of the distance of a point from its epipolar line [8]. Knowing that the field of view is circular, the view angle equals θ m, the radius of the view field circle equals R, and from (), parameter a can be then expressed as a = (+br )θ m R. Thus (3) can be linearized to a one parametric model, and a 9-points RANSAC as a pre-test to detect most of outliers can be used like in [6]. To obtain better estimate, two parametric model with a priori knowledge a = θm R, b =, can be used in a 5-points RANSAC estimation.

Figure 4: Comparison of various lens models with ground-truth data. The proposed model (black dots), the 2nd order polynomial (red circles), and the 3rd order polynomial (blue crosses) are fitted to data measured in an optical laboratory. (a) The angle θ between the 3D vector and the optical axis as a function of the radius of a point in the image plane. (b) Approximation errors Δθ = θ − θ_gt for all models, where θ_gt is the ground-truth angle.

3 Camera model fitting

In this section we investigate the proposed division model, fit it to ground-truth data, compare it with other commonly used models, and observe the prediction error, i.e. how many points are needed and where they should be located in the image to fit the model from a minimal subset of points with sufficient accuracy.

3.1 Precision of the division model

We compare our division model (2) with the commonly used polynomial models of the 2nd order (θ = a1 r + a2 r²) and of the 3rd order (θ = a1 r + a2 r² + a3 r³). The constants ai represent the parameters of the models, r is the radius of an image point w.r.t. (u0, v0), and θ is the angle between the corresponding 3D vector and the optical axis. As ground truth we used data measured in an optical laboratory; the uncertainty of the ground-truth measurements was ±0.1° in angle and ±0.05 mm in radius. We fit all three models to all ground-truth points, see Figure 4a. The angular error Δθ between the angle computed from the fitted model and the ground-truth angle is shown in Figure 4b. The accuracies of the approximations are: RMS for our model (3) 5.6×10⁻⁴ rad, for the 2nd order polynomial 36×10⁻⁴ rad, and for the 3rd order polynomial 6.5×10⁻⁴ rad. As can be seen in Figure 4, our proposed two-parametric model reaches a much better fit than the two-parametric 2nd order polynomial model, and a comparable (slightly better) fit than the three-parametric 3rd order polynomial model.

We are interested in the prediction error of the models, i.e. the error on the complete ground-truth data for models that were fitted only from a subset of the data. We want to investigate how the selection of points affects the final error of the model estimate. We ordered the ground-truth points into a sequence by their radius computed w.r.t. (u0, v0). First, we computed all three models from the first three ground-truth points in the sequence (a minimal subset for computing the parameters of the models) and then tested the fitted models on all ground-truth points, i.e. computed the RMS error. Then we gradually added points from the sequence into the subset from which the models are estimated and computed the RMS error on all ground-truth points. We repeated adding points until all points in the sequence were used for model fitting.

Figure 5: Prediction error, i.e. the influence of the position and the number of points used on model fitting. Gaussian noise with σ = 1 pixel was added to the ground-truth data and repeated trials were performed. Error bars with the mean, 10th, and 90th percentile values are shown. The x-axis represents the number of ground-truth points used for model fitting. (a) Points are added to the subset from the center (u0, v0) towards the boundary of the view field circle. (b) Points are added from the boundary towards the center. The proposed model (black line, labeled 1), the 2nd order polynomial (red line, labeled 2), and the 3rd order polynomial (blue line, labeled 3) are shown. The graphs for the 2nd and 3rd order polynomials are shifted to the right to show the noise bars.
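Fitting the division model (2) to measured (r, θ) pairs, as done above, is a small nonlinear least-squares problem. A Python sketch under our own naming, with arrays standing in for the laboratory data:

    import numpy as np
    from scipy.optimize import curve_fit

    def division_model(r, a, b):
        return a * r / (1.0 + b * r**2)   # theta = a*r / (1 + b*r^2), equation (2)

    def fit_division_model(r_gt, theta_gt):
        # Least-squares fit of (a, b) to measured (radius, angle) pairs,
        # starting from a rough linear guess a0 = theta/r (r_gt nonzero), b0 = 0.
        a0 = float(np.mean(theta_gt / r_gt))
        (a, b), _ = curve_fit(division_model, r_gt, theta_gt, p0=(a0, 0.0))
        rms = np.sqrt(np.mean((division_model(r_gt, a, b) - theta_gt) ** 2))
        return a, b, rms

The polynomial baselines can be fitted the same way, e.g. with np.polyfit on (r, θ) after removing the constant term.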
Gaussian noise with σ = 1 pixel was added to the ground-truth data and the trials were repeated in order to see the influence of noise on the model fitting. Secondly, we repeated the same procedure, but the ground-truth points were added from the end of the sequence (i.e. from the boundary towards the center of the view field circle). Figure 5 shows both experiments.

As can be seen, noise has a smaller effect on the model fitting when the number of points from which the model is computed increases. It can be seen from Figure 5a that the RMS error is very high for the minimal set of three points and decreases significantly only when points close to the boundary of the view field circle are included. On the other hand, when points are added from the boundary, see Figure 5b, the RMS error of our model already starts at a low value, and adding more points closer to the center does not change the RMS error dramatically. It is clear that points near the boundary of the view field circle are more important than points in the center. Thus, in order to obtain a good lens model, it is important to preferentially use points near the boundary.

Equations (2) show that our model is easily invertible. It allows us to convert image points to their corresponding 3D vectors, and 3D vectors to their corresponding image points, without using any iterative methods.

3.2 Using bucketing in RANSAC

There are outliers and noise in the correspondences. We used RANSAC [7] for robust model estimation and outlier detection, and we propose a strategy for point sampling, similar to bucketing [20], in order to obtain a good estimate in a reasonable time. As described before, the angle between a ray and its corresponding epipolar plane is used as the criterion of the estimation quality; call it the angular error. Ideally it should be zero, but we admit some tolerance in real situations. The tolerance in the angular error propagates into a tolerance in the camera model parameters, see Figure 6.

Figure 6: Model fitting with a tolerance Δθ. (a) The graph θ = f(r) for the ground-truth data (black thick curve) and two models satisfying the tolerance (red and blue curves). The parameters a and b can vary for models satisfying the tolerance. (b) The area between the dashed curves is determined by the error Δθ; all models satisfying the tolerance must lie in this area. (c) The angular error for both models with respect to the ground truth.

Figure 7: Image zones used for correct RANSAC-based model estimation. Points near the center (u0, v0), i.e. points with radius smaller than 0.4 r_max, are discarded. The rest of the image is divided into three zones with equal areas from which the points are randomly sampled by RANSAC.

The region in which the models satisfying a certain tolerance lie narrows with increasing radius of the points in the image, see Figure 6b. Since f(0) = 1/a [11], points near the center (u0, v0) affect only the parameter a. There is a large tolerance in the parameter a, since the tolerance region near the center (u0, v0) is large. Since RANSAC looks for the model fitting the highest number of points within a certain tolerance, it may fit only the points near the center (u0, v0) in order to obtain the highest number of inliers, see Figure 1a. On the other hand, there may exist a model with fewer inliers that suits the points near the center as well as the points near the boundary, see Figure 1b.

As shown before, the points near the center (u0, v0) make no special contribution to the final model fitting, and the most informative points lie near the boundary of the view field circle. Therefore, to obtain the correct model, it is necessary to reject points near the center (u0, v0) a priori. The rest of the image, as Figure 7 shows, is split into three zones with equal areas, from which the same number of points is randomly chosen by RANSAC (a sampling sketch is given at the end of this section). This helps to avoid degenerate configurations and strongly biased estimates, and it decreases the number of RANSAC iterations.

As mentioned before, our model can be reduced to a one-parametric model using a = (1 + bR²) θm / R, where R is the radius corresponding to the maximum view angle θm. R can be obtained by fitting a circle to the view field boundary in the image, and θm from the information provided by the manufacturer.

Figure 8: Model fitting with a maximum defined error Δθ for the one-parametric model. See Figure 6 for the explanation. Notice that the models labeled 1 and 2 end in the same point.

It can be seen from Figure 8 that the a priori known values R and θm fix all models with various b to the point [R, θm]. The resulting model has only one degree of freedom and thus a smaller possibility of fitting outliers. Using the approximate knowledge of a reduces the minimal set to be sampled by RANSAC from 15 to 9 correspondences. It is natural to use a 9-point RANSAC as a pre-test that excludes the most disturbing outliers before the full and more accurate 15-point RANSAC is applied.
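The zone construction and sampling can be sketched as follows (our Python illustration; the helper names are ours). Equal-area zones of an annulus have boundaries uniform in r², which the code exploits:

    import numpy as np

    def zone_index(points, center, R, n_zones=3, r_min_frac=0.4):
        # Equal-area annular zones between 0.4*R and R: boundaries uniform in r^2.
        r2 = np.sum((points - center) ** 2, axis=1)
        frac = (r2 - (r_min_frac * R) ** 2) / (R**2 - (r_min_frac * R) ** 2)
        idx = np.minimum((frac * n_zones).astype(int), n_zones - 1)
        idx[(frac < 0) | (frac > 1)] = -1          # discard points near the center (and outside)
        return idx

    def sample_minimal_set(points, center, R, n_sample=9, rng=np.random):
        # Draw a RANSAC minimal sample with the same number of points per zone.
        idx = zone_index(points, center, R)
        picks = []
        for z in range(3):
            zone = np.flatnonzero(idx == z)        # assumes each zone has enough points
            picks.extend(rng.choice(zone, n_sample // 3, replace=False))
        return np.asarray(picks)

With n_sample = 9 for the one-parametric pre-test (or 15 for the full model), three points are drawn from each zone, so the sample always includes points near the boundary.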
4 Algorithm

The algorithm for computing the 3D rays and an essential matrix is the following:

1. Find the ellipse corresponding to the lens field of view and transform the image so that the ellipse becomes a circle. Find correspondences {u ↔ u′} between the two images. Use only correspondences with √(u² + v²) > 0.4 R, where R is the radius of the view field circle.

2. Scale the image points, u := u/1000, to obtain better numerical stability. Choose a0 = θm / R and b0 = 0.

3. Create the matrices D1, D2, D3 ∈ R^(N×15), where N is the number of correspondences. Solve equation (4) as an inverted QEP, due to the singularity of D3 [11]. In MATLAB: [H, a] = polyeig(D1ᵀD3, D1ᵀD2, D1ᵀD1), where H is a 15×30 matrix with columns h and a is a 30×1 vector with elements 1/a. Six possible solutions for b appear from the last six elements of h.

4. Choose the solutions with a real (the other solutions never seem to be correct); 4 solutions remain. For every a there are 6 solutions for b. Create the 3D rays using a and b and compute F using a standard method [7]. The set of possible solutions {ai, b_i,1..6, F_i,1..6} arises.

5. Compute the angular error for all triples {a, b, F} as the sum of the errors over all correspondences. The triple with the minimal error is the solution for a, b, and the essential matrix F. A sketch of this error computation follows.
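The angular-error ranking of step 5 can be sketched as follows (our Python illustration; the one-sided ray-to-epipolar-plane form is shown):

    import numpy as np

    def angular_error(p1, p2, F):
        # Angle between ray p2 and the epipolar plane of p1 (plane normal F @ p1).
        n = F @ p1
        s = abs(p2 @ n) / (np.linalg.norm(p2) * np.linalg.norm(n))
        return np.arcsin(min(s, 1.0))

    def total_angular_error(P1, P2, F):
        # Sum of angular errors over all correspondences (rows of P1 and P2),
        # used to rank the candidate triples {a, b, F} in step 5.
        return sum(angular_error(p1, p2, F) for p1, p2 in zip(P1, P2))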

Figure 9: The Nikon FC-E8 fish-eye converter, mounted on a PULNiX TM digital camera with resolution 1017×1008 pixels, is rotated along a circle. Correspondences between two consecutive images: circles mark points in the first image, and lines join them to the matches in the next one. The images are superimposed in the red and green channels.

5 Real data

In this section, the method is applied to real data. Correspondences were obtained by the commercial program boujou [1]. The parameters of the camera models and the camera trajectories (up to the magnitudes of the translation vectors) were estimated. The relative camera rotations and the directions of translation used for the trajectory estimation were computed from the essential matrices [7]. To obtain the magnitudes, we would need to reconstruct the observed scene; this was not the task of this paper. Instead, we assumed unit length of the translation vectors.

The first experiment shows a rotating omnidirectional camera, see Figure 9. The camera was mounted on a turntable such that the final trajectory of its optical center was circular. Images were acquired every 10°, 36 images in total. Three approaches to the estimation of the parameters a, b and the essential matrices F were used. The first approach used all correspondences, and the essential matrix F was computed for every pair independently from the a, b estimated for the given pair, see Figure 10a. The second approach estimates one ā and one b̄ as the medians of all the a's and b's computed for every consecutive pair of images in the whole sequence; the matrices F were then computed for each pair using the same ā, b̄, see Figure 10b. The third approach differs from the second one in that a 9-point RANSAC as a pre-test to detect most of the outliers and then a 15-point RANSAC were performed to compute the parameters a, b for every pair, see Figure 10c.

Figure 10: Motion estimation for the circle sequence. Red depicts the starting position. (a) The essential matrix F is computed from the actual estimate of a and b for each pair. (b) F is computed from ā and b̄ determined from the whole sequence. (c) F is computed from ā and b̄ determined from the whole sequence using RANSAC for detecting outliers.

The next experiment calibrates the omnidirectional camera from its translation in the direction perpendicular to its optical axis, see Figure 11. The estimated trajectory is shown in Figure 11a. The angular differences between the estimated and true motion directions for every pair are depicted in Figure 11b. The average angular error is 0.4°.

Figure 11: Side motion. The Nikon FC-E8 fish-eye converter with a COOLPIX digital camera was used. On the left-hand side, a diagram of the camera motion is depicted; on the right-hand side, a picture of the real setup is shown. Below the diagram, the estimated trajectory is shown. (b) The angular error between the direction of motion and the optical axis for each pair, and the 3σ circle.

Figure 12: General motion of the Nikon FC-E8 fish-eye converter with a COOLPIX digital camera. (a) Setup of the experiment. (b) A mobile tripod with the camera. (c) The correctly estimated trajectory.

The next experiment shows the calibration from a general planar motion.
Figure 12 shows a mobile tripod with the omnidirectional camera and the estimated real U-shaped trajectory with right angles.

The last experiment, see Figure 13, applied our model and the model introduced in [6] to an omnidirectional image. It can be seen that the model [6] does not sufficiently capture the lens nonlinearity.

6 Conclusion

The paper presented a robust simultaneous estimation of the omnidirectional camera model and the epipolar geometry. As the main contribution, the paper shows how the points should be sampled in RANSAC to avoid degenerate configurations and biased estimates.

Figure 13: Comparison of two camera models applied to an omnidirectional image acquired by the Nikon FC-E8 fish-eye converter. (a) Part of the omnidirectional image is linearized and projected onto a plane; the input image corresponds to a 183° angle of view, and the red dashed circle represents an image with a 160° angle of view. (b) The camera model from [6] is used. (c) Notice that not all lines are straight and parallel; the model does not sufficiently capture the lens nonlinearity. (d) Our proposed model. (e) Notice that with our model all lines are straight and parallel.

It was shown that the points near the center of the view field circle can be discarded and the final model computed only from points near the boundary of the view field circle. The suggested technique allows an omnidirectional camera model to be incorporated into a 9-point RANSAC followed by a 15-point RANSAC for camera model and essential matrix estimation and outlier detection. Real experiments suggest that our method is useful for structure from motion, with sufficient accuracy as a starting point for bundle adjustment.

Acknowledgement

This research was supported by the following projects: CTU 0209513, GAČR 102/01/0971, MSM 212300013, MŠMT KONTAKT 22-2003-04, and BeNoGo IST-2001-39184.

References

[1] 2d3 Ltd. Boujou. http://www.2d3.com.
[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, editors. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, 2000.
[3] H. Bakstein and T. Pajdla. Panoramic mosaicing with a 180° field of view lens. In Proc. of the IEEE Workshop on Omnidirectional Vision, pages 60-67, 2002.
[4] C. Bräuer-Burchardt and K. Voss. A new algorithm to correct fish-eye- and strong wide-angle-lens-distortion from single images. In Proc. ICIP, pages 225-228, 2001.
[5] H. Farid and A. C. Popescu. Blind removal of image non-linearities. In Proc. ICCV, volume 1, pages 76-81, 2001.
[6] A. Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. In Proc. CVPR, 2001.
[7] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, UK, 2000.
[8] R. I. Hartley and P. Sturm. Triangulation. Computer Vision and Image Understanding, 68(2):146-157, 1997.
[9] J. Kumler and M. Bauer. Fisheye lens designs and their relative performance. http://www.coastalopt.com/fisheyep.pdf.
[10] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. In P. L. Rosin and D. Marshall, editors, Proc. of the British Machine Vision Conference, volume 1, pages 384-393, UK, September 2002. BMVA.
[11] B. Mičušík and T. Pajdla. Estimation of omnidirectional camera model from epipolar geometry. Research Report, Center for Machine Perception, K333 FEE Czech Technical University, Prague, Czech Republic, June 2002.
[12] J. Oliensis. Exact two-image structure from motion. PAMI, 2002.
[13] S. Shah and J. K. Aggarwal. Intrinsic parameter calibration procedure for a (high distortion) fish-eye lens camera with distortion model and accuracy estimation. Pattern Recognition, 29(11):1775-1788, November 1996.
[14] G. P. Stein. Lens distortion calibration using point correspondences. In Proc. CVPR, pages 602-609, 1997.
[15] T. Svoboda and T. Pajdla. Epipolar geometry for central catadioptric cameras. International Journal of Computer Vision, 49(1):23-37, August 2002.
[16] R. Swaminathan and S. K. Nayar. Nonmetric calibration of wide-angle lenses and polycameras. PAMI, 22(10):1172-1178, 2000.
[17] F. Tisseur and K. Meerbergen. The quadratic eigenvalue problem. SIAM Review, 43(2):235-286, 2001.
[18] Y. Xiong and K. Turkowski. Creating image-based VR using a self-calibrating fisheye lens. In Proc. CVPR, pages 237-243, 1997.
[19] Z. Zhang. On the epipolar geometry between two images with lens distortion. In Proc. ICPR, pages 407-411, 1996.
[20] Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78(1-2):87-119, 1995.