CameraTransform: a Scientific Python Package for Perspective Camera Correction

Richard Gerum¹, Sebastian Richter¹, Alexander Winterl¹, Ben Fabry¹, and Daniel Zitterbart¹,²

¹ Department of Physics, University of Erlangen-Nürnberg, Germany
² Applied Ocean Physics and Engineering, Woods Hole Oceanographic Institution, Woods Hole, USA

arXiv:1712.07438v1 [cs.MS] 20 Dec 2017

December 2017

Abstract

Scientific applications often require an exact reconstruction of object positions and distances from digital images. Therefore, the images need to be corrected for perspective distortions. We present CameraTransform, a Python package that performs a perspective image correction whereby the height, tilt/roll angle and heading of the camera can be automatically obtained from the images if additional information such as GPS coordinates or object sizes are provided. We present examples of images of penguin colonies that are recorded with stationary cameras and from a helicopter.

1 Introduction

Optical recordings such as on-demand images from camera traps, continuous time-lapse images, or video recordings are a widely used tool in ecology [2, 7, 1]. While such recordings are useful for counting animals and estimating abundance [5], they inherently contain perspective distortions that make it difficult to measure positions and distances. To correct for such distortions and to map image points to real-world positions, it is paramount to know certain camera parameters. These include the geographic camera position relative to landmarks in the scenery, the camera height, tilt/roll angle and heading. These parameters are often difficult or impossible to evaluate in the field at the time of the recording, but they can be reconstructed afterwards if the real-world coordinates of prominent features in the image are known. The mathematical procedure behind this reconstruction is based on simple linear algebra, but the steps to apply the underlying matrix operations to image data can be somewhat involved.
In this article we present the Python package CameraTransform, which was developed to facilitate post-recording calibration based on single (not stereo) images. CameraTransform provides various tools to estimate the camera parameters from features present in the image, and to transform point coordinates in the image to real-world or geographic coordinates. We explain the mathematical details of the calibration and transformation, present calibration examples and provide an analysis of the uncertainty of the procedure.

2 Camera Matrix

All information about the mapping of real-world points to image points is stored in a camera matrix. The camera matrix is expressed in projective coordinates and can be split into two parts: the intrinsic matrix and the extrinsic matrix [3]. The intrinsic matrix depends on the camera sensor and lens; the extrinsic matrix depends on the camera position and orientation.

2.1 Projective coordinates

Projective coordinates, also known as homogeneous coordinates, are used to represent projective transformations as matrix multiplications [6]. They are a mathematical trick that extends the
vector representation of a point with an additional entry. This entry defaults to 1, and all scalar multiples of a vector are considered equal:

$$\begin{pmatrix} x \\ y \end{pmatrix} \;\hat{=}\; \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \tag{1}$$

For example, the point (5,7) can be represented by the tuple of projective coordinates (5,7,1) or (10,14,2) and so on. The scalar need not be an integer. Projective coordinates allow us to write the camera projection as:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \tag{2}$$

where x specifies the point in the 3D world, which is transformed with the camera matrix C to obtain the point y in the camera image.

2.2 Intrinsic parameters

To compute the intrinsic matrix entries, we need to know the focal length $f$ of the camera in mm, the sensor dimensions ($w_\mathrm{sensor} \times h_\mathrm{sensor}$) in mm, and the image dimensions ($w_\mathrm{image} \times h_\mathrm{image}$) in pixels. The intrinsic matrix entries are then the effective focal length $f_\mathrm{pix}$ and the centre of the image ($w_\mathrm{image}/2$, $h_\mathrm{image}/2$) according to

$$C_\mathrm{intr.} = \begin{pmatrix} f_\mathrm{pix} & 0 & w_\mathrm{image}/2 & 0 \\ 0 & f_\mathrm{pix} & h_\mathrm{image}/2 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{3}$$

$$f_\mathrm{pix} = f / w_\mathrm{sensor} \cdot w_\mathrm{image} \tag{4}$$

Here, the diagonal elements account for the rescaling from pixels in the image to a position in mm on the chip. The off-diagonal elements represent an offset, whereby the origin of the image is at the top left corner, and the origin of the chip coordinates is at the centre of the chip.

2.3 Extrinsic parameters

To compute the extrinsic matrix, we need to know the offset (x, y, z) of the camera relative to an arbitrary fixed real-world reference point (0,0,0) in the three spatial directions. Customarily, the z-coordinate of the reference point is the ground, and z is therefore the height of the camera above ground. Similarly, the x,y plane of our coordinate system is customarily the horizontal plane. We also need to know three angles: the tilt angle $\alpha_\mathrm{tilt}$, which specifies how much the camera is tilted against the horizontal, the heading angle $\alpha_\mathrm{heading}$, which specifies the direction relative to the y-direction in which the camera is heading, and the roll angle $\alpha_\mathrm{roll}$, which specifies how the image is rotated (see Fig. 1).

Figure 1: Extrinsic camera parameters. Side view: the height specifies how high the camera is positioned over the ground, the tilt angle specifies how much the camera is tilted against the horizontal. Top view: the offset (x, y) specifies how much the camera is moved from the origin and the heading angle specifies in which direction it is looking. Image: the roll specifies how much the image is rotated around its centre.

To compute the extrinsic camera matrix, we first need the three rotation matrices and the translation matrix:

$$R_\mathrm{tilt} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha_\mathrm{tilt}) & \sin(\alpha_\mathrm{tilt}) \\ 0 & -\sin(\alpha_\mathrm{tilt}) & \cos(\alpha_\mathrm{tilt}) \end{pmatrix} \tag{5}$$

$$R_\mathrm{roll} = \begin{pmatrix} \cos(\alpha_\mathrm{roll}) & \sin(\alpha_\mathrm{roll}) & 0 \\ -\sin(\alpha_\mathrm{roll}) & \cos(\alpha_\mathrm{roll}) & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{6}$$

$$R_\mathrm{heading} = \begin{pmatrix} \cos(\alpha_\mathrm{heading}) & \sin(\alpha_\mathrm{heading}) & 0 \\ -\sin(\alpha_\mathrm{heading}) & \cos(\alpha_\mathrm{heading}) & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{7}$$

$$t = \begin{pmatrix} x \\ y \\ \mathrm{height} \end{pmatrix} \tag{8}$$

The extrinsic camera matrix then consists of the 3×3 rotation matrix R and the 3×1 translation matrix T side by side, as a 4×4 matrix in projective coordinates.

$$R = R_\mathrm{roll} \cdot R_\mathrm{tilt} \cdot R_\mathrm{heading} \tag{10}$$

$$T = R_\mathrm{tilt} \cdot R_\mathrm{heading} \cdot t \tag{11}$$

$$C_\mathrm{extr.} = \begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix} \tag{12}$$

The final camera matrix C is the product of the intrinsic and the extrinsic camera matrix.

$$C = C_\mathrm{intr.} \cdot C_\mathrm{extr.} \tag{13}$$

2.4 Projecting from the World to the Camera

Based on the camera matrix C, it is straightforward to see how a real-world point corresponds to a pixel of the acquired image. First, the real-world point $p_\mathrm{world} = (x_1, x_2, x_3)$ is written in projective coordinates:

$$\tilde{p}_\mathrm{world} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \tag{14}$$

where $\tilde{p}$ denotes the vector $p$ in projective coordinates. Second, the point $\tilde{p}_\mathrm{world}$ can be projected to the image coordinates:

$$\tilde{p}_\mathrm{im} = C \cdot \tilde{p}_\mathrm{world} \tag{15}$$

Finally, the point $\tilde{p}_\mathrm{im}$ is converted back from projective coordinates (with three entries) to conventional coordinates $p_\mathrm{im}$ (with two entries) by dividing by the additional scaling factor (which is the third entry of $\tilde{p}_\mathrm{im}$):

$$p_\mathrm{im} = \begin{pmatrix} \tilde{p}_{\mathrm{im},1} / \tilde{p}_{\mathrm{im},3} \\ \tilde{p}_{\mathrm{im},2} / \tilde{p}_{\mathrm{im},3} \end{pmatrix} \tag{16}$$

where the subscript denotes the entry of the vector $\tilde{p}_\mathrm{im}$.
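The chain from Eq. (3) to Eq. (16) can be illustrated with a few lines of NumPy. The following is only a minimal sketch and not the CameraTransform implementation: the function names are hypothetical, and the axis and sign conventions (world z pointing up, heading 0 looking along +y, tilt 0° looking straight down and 90° at the horizon, translation computed as T = −R·(x, y, height)) are assumptions of this sketch that may differ from the conventions used in the package.

```python
import numpy as np


def intrinsic_matrix(f_mm, sensor_w_mm, image_w_px, image_h_px):
    """3x4 intrinsic matrix (Eq. 3) with the focal length in pixels (Eq. 4)."""
    f_pix = f_mm / sensor_w_mm * image_w_px
    return np.array([[f_pix, 0.0, image_w_px / 2, 0.0],
                     [0.0, f_pix, image_h_px / 2, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])


def extrinsic_matrix(height, tilt_deg, heading_deg=0.0, roll_deg=0.0, x=0.0, y=0.0):
    """4x4 extrinsic matrix [[R, T], [0, 1]] in projective coordinates.

    Assumed conventions (not taken from the paper): world z points up,
    heading 0 looks along +y, tilt 0 looks straight down and tilt 90 looks
    at the horizon; camera axes are (right, image-down, optical axis).
    """
    t, h, r = np.radians([tilt_deg, heading_deg, roll_deg])
    R_tilt = np.array([[1.0, 0.0, 0.0],
                       [0.0, -np.cos(t), -np.sin(t)],
                       [0.0, np.sin(t), -np.cos(t)]])
    R_heading = np.array([[np.cos(h), -np.sin(h), 0.0],
                          [np.sin(h), np.cos(h), 0.0],
                          [0.0, 0.0, 1.0]])
    R_roll = np.array([[np.cos(r), np.sin(r), 0.0],
                       [-np.sin(r), np.cos(r), 0.0],
                       [0.0, 0.0, 1.0]])
    R = R_roll @ R_tilt @ R_heading
    T = -R @ np.array([x, y, height])  # moves the camera centre to the origin
    C_extr = np.eye(4)
    C_extr[:3, :3] = R
    C_extr[:3, 3] = T
    return C_extr


def project(C, p_world):
    """Project a 3D world point to pixel coordinates (Eqs. 14 to 16)."""
    p_h = np.append(np.asarray(p_world, dtype=float), 1.0)  # projective coordinates
    q = C @ p_h
    return q[:2] / q[2]  # divide by the scaling factor (third entry)


# Example values: 14 mm lens, 17.3 mm wide sensor, 4608 x 2592 px image,
# camera 20 m above ground with a tilt of 80 degrees.
C = intrinsic_matrix(14.0, 17.3, 4608, 2592) @ extrinsic_matrix(height=20.0, tilt_deg=80.0)
foot = project(C, [0.0, 100.0, 0.0])  # base of an object 100 m in front of the camera
head = project(C, [0.0, 100.0, 1.0])  # top of the same 1 m high object
print("foot:", foot, "head:", head)
```

With these conventions, the ground point on the optical axis maps to the image centre, and the head of an upright object appears above its foot (smaller row index, since the image origin is at the top left corner).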
2.5 Projecting from the Camera back to real-world coordinates

While projecting from the 3D real world to the 2D image is a straightforward matrix multiplication, projecting from the image back to the real world is more difficult. As the information of the third dimension is lost during the transformation from the real world to the image, there exists no unique back-transformation. An additional constraint is needed to transform a point back to the 3D world, e.g. one of the 3D coordinates must be fixed. For example, if the real-world point $p_\mathrm{world}$ has a fixed $x_2$ coordinate (for example a mural painting on a vertical wall that is aligned in the y-direction of the coordinate system) and the image coordinates $y_1$ and $y_2$ are given, the back-transformation can be performed as follows:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \tag{17}$$

$$= \begin{pmatrix} c_{11} & c_{12} x_2 & c_{13} & c_{14} \\ c_{21} & c_{22} x_2 & c_{23} & c_{24} \\ c_{31} & c_{32} x_2 & c_{33} & c_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ 1 \\ x_3 \\ 1 \end{pmatrix} \tag{18}$$

$$= \begin{pmatrix} c_{11} & c_{12} x_2 + c_{14} & c_{13} \\ c_{21} & c_{22} x_2 + c_{24} & c_{23} \\ c_{31} & c_{32} x_2 + c_{34} & c_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ 1 \\ x_3 \end{pmatrix} \tag{19}$$

$$= C' \begin{pmatrix} x_1 \\ 1 \\ x_3 \end{pmatrix} \tag{20}$$

$$C'^{-1} \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} x_1 \\ 1 \\ x_3 \end{pmatrix} \tag{21}$$

This means that the information about the fixed 3D coordinate has to be incorporated in the camera matrix. The inverse of the resulting 3×3 matrix $C'$, when multiplied with the image point in projective coordinates, gives the unknown $x_1$ and $x_3$ entries of the real-world 3D point. After rescaling the vector entries (division by the middle entry, which takes the place of the constant 1), the known $x_2$ value is added to the vector to retrieve the real-world coordinates of the 3D point $p_\mathrm{world}$. The same approach can be used with fixed $x_1$ coordinates or, more relevant for many applications, with fixed $x_3$ coordinates (i.e. objects on a levelled surface are imaged) (see appendix A).

3 Fitting Camera Parameters

Often, only the intrinsic camera parameters are known, but not the extrinsic parameters that define the orientation of the camera. The CameraTransform package provides several fitting routines that allow users to infer the extrinsic parameters from characteristic features in the image.
In many cases, the heading and position of the camera can be set to 0, as they are only of interest when the camera image needs to be compared to other camera images or when it needs to be cartographically mapped. This leaves only the parameters height, tilt and roll free, unless the camera was properly horizontally aligned, in which case roll is zero.

3.1 Influence of Camera Parameter Uncertainties

To evaluate the sensitivity of the perspective projection with respect to uncertainties in the camera parameters, we computationally place objects of 1 m height in world coordinates at different distances from the camera (50 to 300 m) and project them to the camera image. The positions in the camera image are then projected back to real-world coordinates using a different parameter set where we vary the camera height and tilt angle. We use a focal length of 14 mm and a sensor size of 17.3 × 9.7 mm with 4608 × 2592 px. The camera is placed at a height of 20 m with a tilt angle of 80°. For the back projection, the height and tilt are varied by ±10% (Fig. 2). For each parameter configuration, the apparent object height is calculated. Since we know the true object height, the
reconstructed object height indicates the error that is introduced by uncertainties in the extrinsic camera parameters. We find that the apparent object height is robust to variations in camera height regardless of the distance between object and camera (Fig. 2A). By contrast, the apparent object height is sensitive to variations in the camera tilt angle, especially for objects at larger distances from the camera (Fig. 2B).

Figure 2: Influence of height and tilt angle variations of ±10%. Objects with a height of 1 m (dashed line) at different distances (50 m to 300 m) are projected to the camera and back to the world with changed camera parameters. A) Variation of the height parameter: 20 m ± 10%. B) Variation of the tilt parameter: 80° ± 10%. [Axes: object height (m) against camera height (m) in A and against tilt angle (deg) in B.]

3.2 Fitting extrinsic parameters from objects of known height

If the true height of objects in the image is known, the camera parameters can be fitted. This works especially well for the tilt angle, as it most sensitively affects the apparent object height (Fig. 2B). The input for the fitting routine is a list of base (foot) and top (head) positions of the objects. For fitting, the algorithm projects the foot positions from the image to world coordinates, moves the base positions in z-direction by the known object height, and projects these points back to the camera image. The difference between the input top positions and the back-projected top positions is then minimized with a least-squares fit routine. Optionally, if a horizon is visible in the image, CameraTransform uses the horizon line as an additional constraint for fitting the camera parameters. The error between the user-selected horizon and the fitted horizon is assigned a weight of 50% of the total error.

To evaluate this method, an artificial image is created using the CameraTransform package.
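The principle of the fit and of such a synthetic test can also be illustrated without the package. The following is a minimal sketch, not the CameraTransform implementation: it fits only height and tilt (heading, roll and position set to 0), omits the optional horizon constraint, generates noise-free foot and head positions instead of clicked ones, and uses `scipy.optimize.least_squares` as the least-squares routine. All function names and the sign conventions of the rotation (z up, tilt 0° looking straight down, 90° at the horizon) are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import least_squares

F_PIX = 14.0 / 17.3 * 4608               # focal length in pixels, as in Eq. (4)
CENTRE = np.array([4608 / 2, 2592 / 2])  # image centre in pixels


def camera_matrix(height, tilt_deg):
    """3x4 camera matrix for heading = roll = 0 and a camera above the origin.
    Assumed convention: z up, tilt 0 looks straight down, tilt 90 at the horizon."""
    t = np.radians(tilt_deg)
    K = np.array([[F_PIX, 0.0, CENTRE[0]], [0.0, F_PIX, CENTRE[1]], [0.0, 0.0, 1.0]])
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, -np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t), -np.cos(t)]])
    T = -R @ np.array([0.0, 0.0, height])
    return K @ np.column_stack([R, T])


def project(C, pts):
    """World points (N, 3) -> image points (N, 2)."""
    q = np.column_stack([pts, np.ones(len(pts))]) @ C.T
    return q[:, :2] / q[:, 2:]


def back_project_ground(C, px):
    """Image points (N, 2) -> world points (N, 3) on the ground plane z = 0."""
    C_red = C[:, [0, 1, 3]]  # the z column drops out because z = 0
    v = np.linalg.solve(C_red, np.column_stack([px, np.ones(len(px))]).T).T
    return np.column_stack([v[:, :2] / v[:, 2:], np.zeros(len(px))])


def residuals(params, feet_px, heads_px, object_height):
    """Difference between re-projected and marked head positions (in pixels)."""
    C = camera_matrix(*params)
    tops = back_project_ground(C, feet_px) + [0.0, 0.0, object_height]
    return (project(C, tops) - heads_px).ravel()


# Synthetic, noise-free data: 15 objects of 1 m height, 50 to 150 m from the camera.
true_params = (20.0, 80.0)  # camera height (m), tilt (deg)
rng = np.random.default_rng(0)
feet = np.column_stack([rng.uniform(-20, 20, 15), rng.uniform(50, 150, 15), np.zeros(15)])
C_true = camera_matrix(*true_params)
feet_px = project(C_true, feet)
heads_px = project(C_true, feet + [0.0, 0.0, 1.0])

# Fit height and tilt, starting from a deliberately wrong initial guess.
fit = least_squares(residuals, x0=(15.0, 75.0), args=(feet_px, heads_px, 1.0))
print("fitted height (m), tilt (deg):", fit.x)
```

With real, manually marked positions the residuals do not vanish, and the spread of the fitted parameters depends on the number of objects used.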
We again use a focal length of 14 mm, a sensor size of 17.3 × 9.7 mm with 4608 × 2592 px, a camera height of 20 m and a tilt angle of 80°. Fifteen rectangles with a width of 30 cm and a height of 1 m are placed at distances ranging from 50 to 150 m. Using the software ClickPoints, we mark the base and top positions of these rectangles and provide them as input for the fitting routine. We then investigate how the fitted height and the fitted tilt angle vary with the number of provided objects. We start with only one object and increase the number of objects to 15. For every iteration, the objects are randomly chosen. The experiment was repeated multiple times with and without a horizon. The results indicate, as expected, that by including a larger number of objects, the uncertainty of the parameter estimates (as indicated by the variability between repeated measurements) decreases (Fig. 3). Both the camera height and the tilt angle can be fitted with considerably less uncertainty if a horizon is provided (Fig. 3D,E), compared to parameter estimates without horizon (Fig. 3A,B). The reconstructed object height (Fig. 3C,F) follows the same pattern and also profits from the horizon information.

To demonstrate the fitting procedure, we analyse an image (Fig. 4A) from a wide-angle camera overseeing an Emperor penguin colony at Pointe Géologie, Antarctica. The camera was positioned
on a nunatak, but no height information was provided. We estimate the extrinsic camera parameters by analysing the feet and head positions of 20 animals, assuming an average penguin height of 1 m. Fig. 4B shows the projected top view after fitting the extrinsic camera parameters. The camera height obtained by the fit (23.7 m) is close to the height value of 25.7 m measured by a differential GPS.

Figure 3: Influence of the number of objects used for fitting. Top row (A-C): without a given horizon; bottom row (D-F): with a given horizon. A+D) The fitted camera height and B+E) the camera tilt angle for different numbers of used objects. For each number of objects, a random selection (without replacement) is taken from the clicked objects and the camera matrix is fitted (parameters: blue dots). From these the mean is calculated (red crosses). C+F) The error on the fitted object height for the different fits (mean ± std, blue error bars). [Axes: fitted height (m), fitted tilt (deg) and reconstructed height of objects (m) against the number of used objects.]

3.3 Fitting by geo-referencing

For large tilt angles, e.g. if images are taken from a helicopter (Fig. 5A), the size of the objects in the image does not vary sufficiently with the y position in the image, so the fitting approach based on the known object size is not viable. In addition, the horizon is unlikely to be visible. For such images, a different method is needed. If the approximate x,y location of the camera is known and an accurate map or a satellite image is available, point correspondences between the image and the map can be used to estimate the camera parameters using a process known as image registration. In the example shown in Fig.
5, where we photograph a King penguin colony at Baie du Marin from a helicopter flying approximately 300 m above ground, we use eight points that are recognizable in the camera image and in a satellite image provided by Google Earth (Fig. 5A,C). The cost function for our image registration is the distance between the projection of the image points to real-world coordinates and the corresponding points in the satellite image. The fit routine then computes the height and tilt of the camera as well as the xy-position and heading angle. The example in Fig. 5 demonstrates that the fit routine matches all points except point #7, which is the branch point of a river that has likely shifted since the time the satellite image was taken (Fig. 5B).

4 Summary

We present a Python package for estimating extrinsic camera parameters based on image features, for image geo-referencing and for correcting perspective image distortions. The package is designed to assist in analysing images for ecological applications. The package is published under the GPLv3
open source licence to allow for continuous use and application in science. The documentation is hosted on http://cameratransform.readthedocs.io with explanations on how to install the package and with examples on how to use it.

Figure 4: Application to real data. A) Image of a penguin colony taken with the MicrObs system. The feet (green) and head (blue) positions of 20 penguins were manually marked. These data were used to fit the camera perspective (fitted heads: red crosses), which allows the image to be projected to a top view (B). [Axes in B: x position in m, y position in m.]

5 Acknowledgements

This work was supported by the Institut Polaire Français Paul-Emile Victor (IPEV, Program no. 137 to CLB and 354 to FB). This study was funded by the Deutsche Forschungsgemeinschaft (DFG) grants FA336/5-1 and ZI1525/3-1 in the framework of the priority program "Antarctic research with comparative investigations in Arctic ice areas".

References

[1] Tricia L. Cutler and Don E. Swann. Using Remote Photography in Wildlife Ecology: A Review. Wildlife Society Bulletin (1973-2006), 27(3):571-581, 1999.
[2] Tremaine Gregory, Farah Carrasco Rueda, Jessica Deichmann, Joseph Kolowski, and Alfonso Alonso. Arboreal camera trapping: taking a proven method to new heights. Methods in Ecology and Evolution, 5(5):443-451, 2014.
[3] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge University Press, Cambridge, 2003.
[4] Céline Le Bohec. Programme 137 of the Institut Polaire Français Paul-Emile Victor (PI: Céline Le Bohec), 2013.
[5] Tim P. Lynch, Rachael Alderman, and Alistair J. Hobday. A high-resolution panorama camera system for monitoring colony-wide seabird nesting behaviour. Methods in Ecology and Evolution, 6(5):491-499, 2015.
[6] August Ferdinand Möbius. Der barycentrische Calcul, ein Hülfsmittel zur analytischen Behandlung der Geometrie. Barth, Leipzig, 1827.
[7] Daniel P. Zitterbart, Barbara Wienecke, James P. Butler, and Ben Fabry.
Coordinated movements prevent jamming in an Emperor penguin huddle. PLoS ONE, 6(6):e20260, 2011.
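Figure 5: Fit of image to map. A) Camera image recorded from a helicopter flight at the Baie du Marin colony on the Crozet Islands [4]. B) Fitted image with the points of the image (blue) and the points of the map (red). C) A satellite image provided by Google Earth. [Points are numbered 1 to 7 in all panels.]

Appendix

A. Back-transform for $x_3 = 0$

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \tag{22}$$

$$= \begin{pmatrix} c_{11} & c_{12} & c_{13} x_3 & c_{14} \\ c_{21} & c_{22} & c_{23} x_3 & c_{24} \\ c_{31} & c_{32} & c_{33} x_3 & c_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 1 \\ 1 \end{pmatrix} \tag{23}$$

$$= \begin{pmatrix} c_{11} & c_{12} & c_{13} x_3 + c_{14} \\ c_{21} & c_{22} & c_{23} x_3 + c_{24} \\ c_{31} & c_{32} & c_{33} x_3 + c_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix} \tag{24}$$

$$= C' \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix} \tag{25}$$

$$C'^{-1} \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ 1 \end{pmatrix} \tag{26}$$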
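The back-transform of Eqs. (22) to (26) can be written in a few lines of NumPy. This is a minimal sketch rather than the package's own routine: the function names are hypothetical, and the example camera matrix (a camera 10 m above the origin looking straight down, with a focal length of 1000 px and an image centre at (500, 400) px) is made up for illustration. The result of the matrix inversion is rescaled so that its last entry is 1 before the fixed $x_3$ value is appended.

```python
import numpy as np


def project(C, p_world):
    """Forward projection of a 3D world point with the 3x4 camera matrix C."""
    q = C @ np.append(np.asarray(p_world, dtype=float), 1.0)
    return q[:2] / q[2]


def back_project_to_plane(C, p_im, x3=0.0):
    """Back-project the pixel p_im to the world plane with a fixed third coordinate x3."""
    C = np.asarray(C, dtype=float)
    # Reduced 3x3 matrix of Eq. (24): the x3 column is folded into the constant column.
    C_red = np.column_stack([C[:, 0], C[:, 1], C[:, 2] * x3 + C[:, 3]])
    y_h = np.array([p_im[0], p_im[1], 1.0])  # image point in projective coordinates
    v = np.linalg.solve(C_red, y_h)          # same as C_red^-1 @ y_h, Eq. (26), up to scale
    v = v / v[2]                             # rescale so that the last entry is 1
    return np.array([v[0], v[1], x3])


# Hypothetical example camera: 10 m above the origin, looking straight down.
C = np.array([[1000.0, 0.0, -500.0, 5000.0],
              [0.0, -1000.0, -400.0, 4000.0],
              [0.0, 0.0, -1.0, 10.0]])

pixel = project(C, [2.0, 3.0, 0.0])
print("pixel:", pixel)
print("recovered world point:", back_project_to_plane(C, pixel, x3=0.0))
```

`np.linalg.solve` is used instead of an explicit matrix inverse, which gives the same result for a well-conditioned matrix and avoids forming the inverse. The same function handles any other levelled plane by passing a different `x3`.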