Computer Vision: Coordinates
Prof. Flávio Cardeal
DECOM / CEFET-MG
cardeal@decom.cefetmg.br
Abstract
This lecture discusses world coordinates and homogeneous coordinates, and provides an overview of camera calibration.
World Coordinates
World coordinates are used as reference coordinates for cameras or objects in the scene. Suppose we have a camera and 3D objects in the scene to be analyzed by computer vision. It may be convenient to assume an X_w Y_w Z_w world coordinate system that is not defined by the particular camera under consideration.
World Coordinates
The camera coordinate system X_s Y_s Z_s then needs to be described with respect to the chosen world coordinates.
[Figure: camera coordinate system (origin O_s, axes X_s, Y_s, Z_s, focal length f, undistorted image point p_u = [x_u y_u]^T, distorted image point p_d = [x_d y_d]^T) and world coordinate system (origin O_w, axes X_w, Y_w, Z_w, point P_w = [X_w Y_w Z_w]^T).]
World Coordinates
The figure below exemplifies a world coordinate system at a particular moment during a camera calibration procedure.
[Figure. Source: R. Klette]
Affine Transform
World and camera coordinates are transformed into each other by an affine transform. An affine transform of the 3D space maps straight lines into straight lines and does not change ratios of distances between points lying on a straight line. However, it does not necessarily preserve angles between lines or distances between points.
Affine Transform
Examples of affine transforms include translation, scaling, and rotation. Here, an affine transform is represented mathematically by a linear transform, defined by a matrix multiplication, followed by a translation. For example, we may apply a translation as defined below:

    t = [ t_1  t_2  t_3 ]^T
Affine Transform
And a rotation as defined here:

        | r11 r12 r13 |
    R = | r21 r22 r23 | = R_1(α) R_2(β) R_3(γ)
        | r31 r32 r33 |

Where:

    R_1(α) (x-axis):          R_2(β) (y-axis):          R_3(γ) (z-axis):
    | 1    0       0    |     |  cos β  0  sin β |      | cos γ  -sin γ  0 |
    | 0  cos α  -sin α  |     |    0    1    0   |      | sin γ   cos γ  0 |
    | 0  sin α   cos α  |     | -sin β  0  cos β |      |   0       0    1 |
Affine Transform
R_1(α), R_2(β), and R_3(γ) are the individual rotations about the three coordinate axes, with Eulerian rotation angles α, β, and γ, one for each axis.
Observation: rotation and translation in the 3D space are uniquely determined by six parameters: α, β, γ, t_1, t_2, and t_3.
2D Rotation Matrix (Review)
[Figure: a vector of length v at angle α from the x-axis, rotated by θ into the point (x̃, ỹ).]
From the figure, we have:

    x = v cos α          x̃ = v cos(α + θ)
    y = v sin α          ỹ = v sin(α + θ)

However, we know that:

    cos(α + θ) = cos α cos θ - sin α sin θ
    sin(α + θ) = sin α cos θ + cos α sin θ
2D Rotation Matrix (Review)
Then, we have:

    x̃ = v cos α cos θ - v sin α sin θ = x cos θ - y sin θ
    ỹ = v sin α cos θ + v cos α sin θ = y cos θ + x sin θ

By using matrix notation:

    | x̃ |   | cos θ  -sin θ | | x |
    | ỹ | = | sin θ   cos θ | | y |

(2D rotation matrix)
3D Rotation Matrix (Review)
In the 3D case, we have, for example:

    | x̃ |   | cos θ  -sin θ  0 | | x |
    | ỹ | = | sin θ   cos θ  0 | | y |
    | z̃ |   |   0       0    1 | | z |

(3D rotation matrix with respect to the z-axis)
3D Rotation Matrix (Textbook Notation)

          | cos γ  -sin γ  0 |          |  cos β  0  sin β |          | 1    0       0    |
    R_z = | sin γ   cos γ  0 |    R_y = |    0    1    0   |    R_x = | 0  cos α  -sin α  |
          |   0       0    1 |          | -sin β  0  cos β |          | 0  sin α   cos α  |
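As a sanity check, the composition R = R_1(α) R_2(β) R_3(γ) can be sketched in Python with NumPy. The angle values below are illustrative, not taken from the lecture:

```python
import numpy as np

def rot_x(a):
    # Rotation about the x-axis by angle a (radians)
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

def rot_y(b):
    # Rotation about the y-axis by angle b
    return np.array([[ np.cos(b), 0, np.sin(b)],
                     [0, 1, 0],
                     [-np.sin(b), 0, np.cos(b)]])

def rot_z(g):
    # Rotation about the z-axis by angle g
    return np.array([[np.cos(g), -np.sin(g), 0],
                     [np.sin(g),  np.cos(g), 0],
                     [0, 0, 1]])

# Full rotation R = R_1(alpha) R_2(beta) R_3(gamma), with illustrative angles
alpha, beta, gamma = 0.1, 0.2, 0.3
R = rot_x(alpha) @ rot_y(beta) @ rot_z(gamma)

# Any proper rotation matrix is orthogonal with determinant +1
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
```

The two assertions at the end verify the defining properties of a rotation matrix, which hold for any choice of the three angles.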
World and Camera Coordinates
As previously mentioned, world and camera coordinates are transformed into each other by an affine transform. Consider the affine transform of a 3D point, given as P_w = [X_w Y_w Z_w]^T in world coordinates, into P_s = [X_s Y_s Z_s]^T in camera coordinates.
World and Camera Coordinates
In this case, we have that:

    P_s = R P_w + t
    [X_s Y_s Z_s]^T = R [X_w Y_w Z_w]^T + t

    | X_s |   | r11 r12 r13 | | X_w |   | t_1 |
    | Y_s | = | r21 r22 r23 | | Y_w | + | t_2 |
    | Z_s |   | r31 r32 r33 | | Z_w |   | t_3 |
World and Camera Coordinates
The rotation matrix R and the translation vector t need to be specified by calibration.
Note that P_w and P_s denote the same point in the 3D Euclidean space, just with respect to different 3D coordinate systems.
World and Camera Coordinates
By performing the multiplication and sum below we obtain:

    X_s = r11 X_w + r12 Y_w + r13 Z_w + t_1
    Y_s = r21 X_w + r22 Y_w + r23 Z_w + t_2
    Z_s = r31 X_w + r32 Y_w + r33 Z_w + t_3
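The transform P_s = R P_w + t can be applied directly with NumPy. The rotation (30 degrees about the z-axis), the translation, and the point below are illustrative values, not part of the lecture:

```python
import numpy as np

# Illustrative numbers: a small rotation about z plus a shift
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([1.0, 2.0, 3.0])

P_w = np.array([4.0, 0.0, 1.0])   # point in world coordinates
P_s = R @ P_w + t                 # same point in camera coordinates

# Component-wise, this matches X_s = r11*X_w + r12*Y_w + r13*Z_w + t_1
X_s = R[0, 0]*P_w[0] + R[0, 1]*P_w[1] + R[0, 2]*P_w[2] + t[0]
assert np.isclose(P_s[0], X_s)
```

The final assertion checks that the matrix form and the expanded component form agree, which holds for any R, t, and P_w.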
World and Image Coordinates
Assume that a point P_w = [X_w Y_w Z_w]^T in the 3D scene is projected into a camera at an image point p = [x y]^T in the xy coordinate system. Consider also the affine transform between world and camera coordinates as previously defined:

    X_s = r11 X_w + r12 Y_w + r13 Z_w + t_1
    Y_s = r21 X_w + r22 Y_w + r23 Z_w + t_2
    Z_s = r31 X_w + r32 Y_w + r33 Z_w + t_3
World and Image Coordinates
So, by using:

    [x y]^T = [x_u + c_x  y_u + c_y]^T = [f X_s / Z_s + c_x  f Y_s / Z_s + c_y]^T

We have:

    | x - c_x |   | x_u |     | X_s / Z_s |     | (r11 X_w + r12 Y_w + r13 Z_w + t_1) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    | y - c_y | = | y_u | = f | Y_s / Z_s | = f | (r21 X_w + r22 Y_w + r23 Z_w + t_2) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    |    f    |   |  f  |     |     1     |     |                                     1                                     |
World and Image Coordinates
Computer vision algorithms for reconstructing the 3D structure of a scene or computing the position of objects in space need equations as below:

    | x - c_x |   | x_u |     | X_s / Z_s |     | (r11 X_w + r12 Y_w + r13 Z_w + t_1) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    | y - c_y | = | y_u | = f | Y_s / Z_s | = f | (r21 X_w + r22 Y_w + r23 Z_w + t_2) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    |    f    |   |  f  |     |     1     |     |                                     1                                     |

Note that this equation links the coordinates of points in 3D space with the coordinates of their corresponding image points.
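The full chain from world coordinates to image coordinates can be sketched as a small Python function. The parameter values (identity extrinsics, f = 500, principal point (320, 240)) are hypothetical, chosen only to make the example concrete:

```python
import numpy as np

def project(P_w, R, t, f, c_x, c_y):
    """Pinhole projection of a world point to image coordinates."""
    X_s, Y_s, Z_s = R @ P_w + t          # world -> camera coordinates
    x = f * X_s / Z_s + c_x              # central projection + principal point
    y = f * Y_s / Z_s + c_y
    return np.array([x, y])

# Hypothetical camera parameters for illustration
R = np.eye(3)
t = np.array([0.0, 0.0, 0.0])
f, c_x, c_y = 500.0, 320.0, 240.0

p = project(np.array([0.2, -0.1, 2.0]), R, t, f, c_x, c_y)
# X_s/Z_s = 0.1 and Y_s/Z_s = -0.05, so p = (500*0.1 + 320, 500*(-0.05) + 240)
assert np.allclose(p, [370.0, 215.0])
```

With identity extrinsics the camera and world coordinates coincide, so the example isolates the central-projection step of the equation above.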
World and Image Coordinates
In those applications, it is often assumed that the coordinates of points in the camera coordinate system can be obtained from pixel coordinates, and also that the camera coordinate system can be located with respect to the world coordinate system. This is equivalent to assuming knowledge of some of the camera's characteristics, known in vision as the camera's extrinsic and intrinsic parameters.
Homogeneous Coordinates
By using homogeneous coordinates, the matrix multiplication and vector addition in the affine transform reduce to a single matrix multiplication. But what are homogeneous coordinates? They are the coordinates used in projective geometry, just as Cartesian coordinates are used in Euclidean geometry.
Homogeneous Coordinates
Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts. So, let's first introduce homogeneous coordinates in the plane before moving on to the 3D space. Basically, the idea is that instead of using only coordinates x and y, we add a third coordinate w.
Homogeneous Coordinates
Assuming that w ≠ 0, [x̃ ỹ w]^T now represents the point [x y]^T = [x̃/w ỹ/w]^T in the usual 2D inhomogeneous coordinates. The scale of the vector is unimportant, and we will call [x̃ ỹ w]^T the homogeneous coordinates for the 2D point [x y]^T = [x̃/w ỹ/w]^T.
Homogeneous Coordinates
Obviously, we can decide to use only w = 1 for representing points in the 2D plane. In this case, we have:

    [x y]^T = [x̃/1 ỹ/1]^T = [x̃ ỹ]^T

Of course, there is also the option to have w = 0. Homogeneous coordinates [x̃ ỹ 1]^T define existing points [x y]^T, while coordinates [x̃ ỹ 0]^T define points at infinity.
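The conversions between Cartesian and homogeneous 2D coordinates can be sketched as two small helper functions (hypothetical names, for illustration only):

```python
import numpy as np

def to_homogeneous(p):
    # Append w = 1 to a 2D Cartesian point
    return np.append(p, 1.0)

def from_homogeneous(ph):
    # Divide by w; only valid for w != 0, since w = 0 encodes a point at infinity
    w = ph[-1]
    assert w != 0, "w = 0 encodes a point at infinity"
    return ph[:-1] / w

p = np.array([3.0, 4.0])
assert np.allclose(from_homogeneous(to_homogeneous(p)), p)

# The scale is unimportant: [6, 8, 2]^T is the same point as [3, 4, 1]^T
assert np.allclose(from_homogeneous(np.array([6.0, 8.0, 2.0])), p)
```

The last assertion demonstrates the scale-invariance mentioned earlier: multiplying all three coordinates by the same nonzero factor leaves the represented point unchanged.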
Homogeneous Coordinates
A point [X Y Z]^T ∈ R^3 is represented by [X̃ Ỹ Z̃ w]^T in homogeneous coordinates, with [X Y Z]^T = [X̃/w Ỹ/w Z̃/w]^T. So, now the affine transform relating the world and camera coordinates can be represented by a single matrix multiplication. Let's see an example.
Example
Consider again the previous affine transform:

    P_s = R P_w + t
    [X_s Y_s Z_s]^T = R [X_w Y_w Z_w]^T + t

    | X_s |   | r11 r12 r13 | | X_w |   | t_1 |
    | Y_s | = | r21 r22 r23 | | Y_w | + | t_2 |
    | Z_s |   | r31 r32 r33 | | Z_w |   | t_3 |
Example
By representing P_w in homogeneous coordinates, that affine transform could be rewritten as follows:

    | r11 r12 r13 t_1 | | X_w |
    | r21 r22 r23 t_2 | | Y_w | = [ R  t ] [X_w Y_w Z_w 1]^T = [X_s Y_s Z_s]^T
    | r31 r32 r33 t_3 | | Z_w |
                        |  1  |

Note that the steps of matrix multiplication and vector addition reduce to one matrix multiplication.
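This equivalence is easy to verify numerically. The rotation, translation, and random point below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# A rotation about z plus an arbitrary translation (illustrative values)
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([1.0, -2.0, 0.5])

P_w = rng.standard_normal(3)

# 3 x 4 matrix [R | t] applied to the homogeneous point [X_w Y_w Z_w 1]^T ...
M = np.hstack([R, t.reshape(3, 1)])
P_s_hom = M @ np.append(P_w, 1.0)

# ... equals the separate multiply-then-add affine form
P_s_affine = R @ P_w + t
assert np.allclose(P_s_hom, P_s_affine)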
Camera Calibration
Camera calibration determines the intrinsic (i.e. camera-specific) and extrinsic parameters of a given one- or multi-camera configuration. The extrinsic parameters identify the transform between the unknown camera coordinate system and a known world coordinate system.
Camera Calibration
That said, the extrinsic parameters are the ones of the rotation matrix R and the translation vector t involved in the previous affine transform:

    P_s = R P_w + t

The rotation R brings the corresponding axes of the camera and world coordinate systems onto each other, while the translation t describes the relative positions of the origins of the two coordinate systems.
World Coordinates
[Figure: camera coordinate system (origin O_s, axes X_s, Y_s, Z_s, focal length f, image point [x_u y_u]^T) related to the world coordinate system (origin O_w, axes X_w, Y_w, Z_w) by the translation t and the rotation angles α, β, γ.]

        | r11 r12 r13 |
    R = | r21 r22 r23 | = R_1(α) R_2(β) R_3(γ)
        | r31 r32 r33 |
Camera Calibration
By representing P_w in homogeneous coordinates, that is, by adding a fourth coordinate 1 to P_w, we may rewrite the previous equation as follows:

    P_s = [ R  t ] | P_w | = M_e | P_w |
                   |  1  |       |  1  |

Where M_e = [ R  t ] is called the matrix of extrinsic parameters.
Camera Calibration
The intrinsic parameters, in turn, are the set of parameters needed to characterize the optical, geometric, and digital features of the camera:
- The focal length f;
- The location of the center of the image plane in pixel coordinates, also called the principal point [c_x c_y]^T;
- The physical size of a sensor cell in the horizontal and vertical directions, [s_x s_y]^T;
- And, if required, the radial distortion parameters.
Camera Calibration
Disregarding radial distortion and the physical size of the sensor cell, we can group all the intrinsic parameters in a single 3 x 3 matrix M_i:

          | f  0  c_x |
    M_i = | 0  f  c_y |
          | 0  0   1  |

This matrix is called the matrix of intrinsic parameters.
Camera Calibration
A camera producer normally specifies some intrinsic parameters (e.g. the physical size of sensor cells). However, the given data are often not accurate enough for computer vision applications. The matrix of intrinsic parameters M_i performs the transformation between the camera coordinate system and the image coordinate system.
Camera Calibration
Therefore, by considering the representation of P_w in homogeneous coordinates, we may derive the following equation for a perspective projection:

    p̃ = M_i P_s = M_i M_e | P_w | = [p̃_1 p̃_2 p̃_3]^T
                           |  1  |
Camera Calibration
What is interesting about the vector p̃ = [p̃_1 p̃_2 p̃_3]^T is that the ratios p̃_1/p̃_3 and p̃_2/p̃_3 are nothing but the pixel coordinates of the image point. Next, we will provide an overview of camera calibration, such that you can use well-known software with sufficient background knowledge. However, we will not detail any particular calibration method, which is outside the scope of this course.
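The decomposition p̃ = M_i M_e [P_w 1]^T and the pixel-coordinate ratios can be sketched numerically. All parameter values below (f = 500, principal point (320, 240), identity extrinsics) are hypothetical:

```python
import numpy as np

f, c_x, c_y = 500.0, 320.0, 240.0
M_i = np.array([[f, 0, c_x],
                [0, f, c_y],
                [0, 0, 1.0]])          # matrix of intrinsic parameters

R = np.eye(3)                          # illustrative extrinsic parameters
t = np.array([0.0, 0.0, 0.0])
M_e = np.hstack([R, t.reshape(3, 1)])  # 3 x 4 matrix of extrinsic parameters

P_w = np.array([0.2, -0.1, 2.0])
p_tilde = M_i @ M_e @ np.append(P_w, 1.0)

# Pixel coordinates are the ratios p1/p3 and p2/p3
x, y = p_tilde[0] / p_tilde[2], p_tilde[1] / p_tilde[2]
assert np.allclose([x, y], [370.0, 215.0])
```

Note that p_tilde itself is a homogeneous 3-vector; the division by its third component is exactly the perspective division of the central projection.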
Camera Calibration
For camera calibration, we use geometric patterns on 2D or 3D surfaces that we are able to measure very accurately. For example, we can use a calibration rig that is either attached to walls or dynamically moving in front of the camera while taking multiple images.
Camera Calibration
A typical calibration rig (a checkerboard pattern).
[Figure. Source: R. Klette]
Camera Calibration
The geometric patterns used are recorded and localized in the resulting images. Next, their appearance in the image grid is compared with the available measurements of their geometry in the real world. Calibration may be done by dealing with only one camera (e.g. of a multi-camera system) at a time.
Camera Calibration
In this case, we may assume that cameras are either static or movable. In the latter case, we only calibrate the internal (intrinsic) parameters. Recording may start after the parameters needed for calibration have been specified and the appropriate calibration rig and software are at hand. Calibration needs to be redone from time to time.
Camera Calibration
When calibrating a multi-camera system, all cameras need to be time-synchronized, especially if the calibration rig moves during the procedure. Each camera has its own camera coordinate system, with the origin at its projection center.
Camera Calibration
The calibration rig is commonly used for defining the world coordinates at the moment an image is taken (see figure below).
[Figure. Source: R. Klette]
Camera Calibration
We consider the following transforms:
1. A transform from world coordinates [X_w Y_w Z_w]^T to camera coordinates [X_s Y_s Z_s]^T;
2. A central projection of [X_s Y_s Z_s]^T into undistorted image coordinates [x_u y_u]^T;
3. The lens distortion involved, mapping [x_u y_u]^T into the valid (i.e. distorted) coordinates [x_d y_d]^T;
Camera Calibration
We consider the following transforms (cont.):
4. A shift of the coordinates x_d and y_d by the principal point [c_x c_y]^T, defining the sensor coordinates [x_s y_s]^T;
5. The mapping of sensor coordinates [x_s y_s]^T into image coordinates [x y]^T (i.e. the pixel's address).
Lens Distortion
The mapping from a 3D scene into 2D image points combines a perspective projection and a deviation from the model of a pinhole camera. This deviation is caused by radial lens distortion. In this case, how can we compute the coordinates of undistorted image points?
Lens Distortion
Given a lens-distorted image point p_d = [x_d y_d]^T, we can obtain the corresponding undistorted image point p_u = [x_u y_u]^T as follows:

    x_u = c_x + (x_d - c_x)(1 + κ_1 r_d^2 + κ_2 r_d^4 + e_x)
    y_u = c_y + (y_d - c_y)(1 + κ_1 r_d^2 + κ_2 r_d^4 + e_y)

For:

    r_d = sqrt( (x_d - c_x)^2 + (y_d - c_y)^2 )
Lens Distortion
The errors e_x and e_y are insignificant and can be assumed to be zero. There is experimental evidence that, with only the two lower-order parameters κ_1 and κ_2, we can correct more than 90% of the radial distortion. After having the lens distortion corrected, the camera may be modeled as a pinhole camera.
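With e_x and e_y set to zero, the undistortion formula becomes a short function. The principal point, distortion coefficient, and test point below are hypothetical values chosen so the arithmetic is easy to follow:

```python
import numpy as np

def undistort(p_d, c_x, c_y, k1, k2):
    """Map a distorted image point to its undistorted position using the
    two lowest-order radial terms (e_x and e_y assumed to be zero)."""
    x_d, y_d = p_d
    r2 = (x_d - c_x)**2 + (y_d - c_y)**2     # r_d squared
    factor = 1 + k1 * r2 + k2 * r2**2
    x_u = c_x + (x_d - c_x) * factor
    y_u = c_y + (y_d - c_y) * factor
    return np.array([x_u, y_u])

# Hypothetical values: a point far from the principal point moves outward
c_x, c_y = 320.0, 240.0
p_u = undistort((420.0, 240.0), c_x, c_y, k1=1e-6, k2=0.0)
# r_d^2 = 100^2 = 10000, so factor = 1.01 and x_u = 320 + 100 * 1.01 = 421
assert np.allclose(p_u, [421.0, 240.0])
```

Note that the formula only rescales the radial distance from the principal point; the principal point itself is a fixed point of the mapping.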
Designing a Calibration Method
First, we need to define the set of parameters to be calibrated and a corresponding camera model. For example, if the radial distortion parameters κ_1 and κ_2 need to be calibrated, then the camera model needs to include the previous equations:

    x_u = c_x + (x_d - c_x)(1 + κ_1 r_d^2 + κ_2 r_d^4 + e_x)
    y_u = c_y + (y_d - c_y)(1 + κ_1 r_d^2 + κ_2 r_d^4 + e_y)
Designing a Calibration Method
If we know the radial distortion parameters and use them for mapping distorted images into undistorted ones, we can use equations such as:

    | x - c_x |   | x_u |     | X_s / Z_s |     | (r11 X_w + r12 Y_w + r13 Z_w + t_1) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    | y - c_y | = | y_u | = f | Y_s / Z_s | = f | (r21 X_w + r22 Y_w + r23 Z_w + t_2) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    |    f    |   |  f  |     |     1     |     |                                     1                                     |
Designing a Calibration Method
A point P_w = [X_w Y_w Z_w]^T on the calibration rig or on a calibration mark is known by its physically measured world coordinates. Such a point P_w could be one of the corners of the squares on the calibration rig, or a special mark in the 3D scene where calibration takes place.
Designing a Calibration Method
For each point P_w, we need to identify the corresponding point p = [x y]^T, which is the projection of P_w in the image plane. Having, for example, 100 different pairs (P_w, p), we would have 100 equations in the form of:

    | x - c_x |   | x_u |     | X_s / Z_s |     | (r11 X_w + r12 Y_w + r13 Z_w + t_1) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    | y - c_y | = | y_u | = f | Y_s / Z_s | = f | (r21 X_w + r22 Y_w + r23 Z_w + t_2) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    |    f    |   |  f  |     |     1     |     |                                     1                                     |
Designing a Calibration Method
For each equation of the form:

    | x - c_x |   | x_u |     | X_s / Z_s |     | (r11 X_w + r12 Y_w + r13 Z_w + t_1) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    | y - c_y | = | y_u | = f | Y_s / Z_s | = f | (r21 X_w + r22 Y_w + r23 Z_w + t_2) / (r31 X_w + r32 Y_w + r33 Z_w + t_3) |
    |    f    |   |  f  |     |     1     |     |                                     1                                     |

We have the following nine unknowns: f, c_x, c_y, t_1, t_2, t_3, α, β, and γ.
Designing a Calibration Method
So, considering the nine unknowns above, at least 5 pairs (P_w, p) should be provided, since each pair defines 2 equations. For the 100 different points mentioned before, we would have an overdetermined system of equations (a system with more equations than unknowns). In this case, we need to apply an optimization procedure to solve it for those few unknowns.
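The full calibration system is nonlinear, but the idea of solving an overdetermined system can be illustrated on a deliberately reduced toy problem: if we fix the extrinsics to the identity (a hypothetical simplification, not a real calibration setup), the unknowns f, c_x, c_y enter the projection equations linearly, and a least-squares solve recovers them from many point correspondences:

```python
import numpy as np

# Hypothetical ground truth; with identity extrinsics only f, c_x, c_y remain,
# and the reduced problem is linear in those three unknowns.
f_true, cx_true, cy_true = 500.0, 320.0, 240.0
rng = np.random.default_rng(1)
P = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(100, 3))  # 100 world points

# Synthetic observed image coordinates (noise-free for simplicity)
x_obs = f_true * P[:, 0] / P[:, 2] + cx_true
y_obs = f_true * P[:, 1] / P[:, 2] + cy_true

# 200 equations, 3 unknowns: x = f*(X/Z) + c_x and y = f*(Y/Z) + c_y
A = np.zeros((200, 3))
A[0::2, 0] = P[:, 0] / P[:, 2]; A[0::2, 1] = 1.0   # x-equations
A[1::2, 0] = P[:, 1] / P[:, 2]; A[1::2, 2] = 1.0   # y-equations
b = np.empty(200)
b[0::2] = x_obs
b[1::2] = y_obs

params, *_ = np.linalg.lstsq(A, b, rcond=None)      # least-squares solution
assert np.allclose(params, [f_true, cx_true, cy_true])
```

Real calibration additionally estimates the rotation angles and distortion parameters, which makes the system nonlinear and requires an iterative optimizer rather than a single linear solve.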
Designing a Calibration Method
It is important to emphasize that we could still refine our camera model. For example, we could include in the matrix of intrinsic parameters the dimensions of a sensor cell in the horizontal and vertical directions, [s_x s_y]^T. Accordingly, the resulting system of equations will become more complex and have more unknowns.
Designing a Calibration Method
Thus, summarizing the general procedure: known points P_w in the world coordinate system are related to their corresponding projections p in the image. The equations defining our camera model contain X_w, Y_w, Z_w, x, and y as known values, and the intrinsic or extrinsic parameters as unknowns. The resulting system is necessarily nonlinear due to the central projection, or even radial distortion.
Designing a Calibration Method
So, it needs to be solved for the specified unknowns, where overdetermined situations provide stability for the numeric solution scheme used. We do not discuss such systems of equations or solution schemes any further in this course.
Calibration Board
A rigid calibration board bearing a black-and-white checkerboard pattern is common. It is recommended that it has at least 7 x 7 squares. The squares need to be large enough such that their minimum size, when recorded on the image plane during calibration, is at least 10 x 10 pixels.
Calibration Board
A rigid and planar board can be achieved by printing the calibration rig onto paper, which is then glued onto a rigid board. This method is relatively cheap and reliable. The grid can be created with any image-creation tool, as long as the squares are all exactly the same size.
Corners in the Checkerboard
For the checkerboard, the calibration marks are the corners of the squares. Those corners can be identified by approximating intersection points of grid lines, thus defining the corners of the squares with subpixel accuracy. For example, assume 10 vertical and 10 horizontal grid lines on a checkerboard.
Corners in the Checkerboard
Then this should result in 10 + 10 peaks in the Hough space for detecting line segments. Each peak defines a detected grid line, and the intersection points of those lines define the corners of the checkerboard in the recorded image. Applying this method requires that lens distortion has been removed from the recorded images prior to applying the Hough-space method.
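Once two grid lines have been detected as Hough peaks, their intersection is a small linear solve. A common parameterization (assumed here, not specified in the lecture) is the normal form x cos θ + y sin θ = ρ, where each peak is a (ρ, θ) pair:

```python
import numpy as np

def intersect(rho1, theta1, rho2, theta2):
    """Intersection of two lines given in Hough normal form:
    x*cos(theta) + y*sin(theta) = rho."""
    A = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    b = np.array([rho1, rho2])
    return np.linalg.solve(A, b)         # corner position, with subpixel accuracy

# A vertical line x = 100 (theta = 0) and a horizontal line y = 50 (theta = pi/2)
corner = intersect(100.0, 0.0, 50.0, np.pi / 2)
assert np.allclose(corner, [100.0, 50.0])
```

For 10 + 10 detected grid lines, applying this to every vertical/horizontal pair yields the 100 checkerboard corners of the example above.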
Next Lecture
Stereo Vision: Epipolar Geometry; Binocular Vision in Canonical Stereo Geometry.
Suggested reading: Section 7.3 of the textbook.