Image Formation. Antonino Furnari. Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania

furnari@dmi.unict.it, 18/03/2014

Outline: Introduction; Geometric Primitives & Basic Transformations; Image Formation: pinhole, thin lenses and perspective camera model; Wider Field Of View projections; From the Real World to the Camera.

Visual Perception. The aim of Computer Vision is to replicate the human ability to see and understand the images coming from the environment. Visual Perception is the human ability to interpret the surrounding environment by processing the information contained in visible light. Visual Perception relies on the Visual System, the part of the central nervous system that gives organisms the ability to see. Given the importance of the image formation process in Visual Perception, we are interested in understanding it.

Visual Perception. David Marr described human vision as proceeding from a two-dimensional visual array (on the retina) to a three-dimensional description of the world as output. His stages of vision include: a primal sketch of the scene, based on the extraction of its fundamental components, including edges and regions; a 2.5D sketch of the scene, where textures are acknowledged and orientations and shading are used to infer some depth information; a 3D model, where the scene is visualized in a continuous, three-dimensional map.

Human Vision. The eyes are the organs of vision: they detect light and convert it into electro-chemical impulses. Two components are important: the crystalline lens, a transparent, biconvex structure which helps refract light so that it is focused on the retina; the retina, a light-sensitive layer of tissue on the inner surface of the eye, where the image is actually formed.

Part I: Geometric Primitives & Basic Transformations

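Geometric Primitives: points. A 2D point is defined as $\mathbf{x} = (x, y) \in \mathbb{R}^2$ and is usually represented as the column vector $\mathbf{x} = [x\ y]^T$; its homogeneous coordinates are $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{w}) \in \mathbb{P}^2$. A 3D point is defined as $\mathbf{x} = (x, y, z) \in \mathbb{R}^3$ and is usually represented as $\mathbf{x} = [x\ y\ z]^T$; its homogeneous coordinates are $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{z}, \tilde{w}) \in \mathbb{P}^3$.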

Homogeneous Coordinates. Homogeneous (or projective) coordinates are a system of coordinates used in projective geometry. Formulas involving homogeneous coordinates are often simpler than their Cartesian counterparts. Homogeneous coordinates have a range of applications, including computer graphics and 3D computer vision, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix. Points are mapped from an n-dimensional space to an (n+1)-dimensional space: $(x_1, \dots, x_n) \mapsto (x_1, \dots, x_n, 1)$. In the 2D case: $(x, y) \mapsto (x, y, 1)$. The conversion from homogeneous coordinates $(\tilde{x}, \tilde{y}, \tilde{w})$ to Cartesian ones is obtained by dividing $\tilde{x}$ and $\tilde{y}$ by $\tilde{w}$: $(x, y) = (\tilde{x}/\tilde{w}, \tilde{y}/\tilde{w})$.

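Homogeneous Coordinates (cont.). A projective point $(\tilde{x}, \tilde{y}, \tilde{w})$ is defined up to a scale factor: all the points $(k\tilde{x}, k\tilde{y}, k\tilde{w})$ with $k \neq 0$ are equivalent and correspond to the same Cartesian point $(\tilde{x}/\tilde{w}, \tilde{y}/\tilde{w})$; if the last coordinate $\tilde{w}$ equals zero, the point is at infinity. E.g., $(1, 2, 1)$, $(2, 4, 2)$ and $(-3, -6, -3)$ all correspond to the Cartesian point $(1, 2)$.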
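A minimal sketch of the two conversions in plain Python; the function names `to_homogeneous` and `to_cartesian` are hypothetical, chosen only for illustration:

```python
def to_homogeneous(point, w=1.0):
    """Map an n-dimensional Cartesian point to n+1 homogeneous coordinates."""
    # Any non-zero w gives an equivalent projective point (w*x, w*y, ..., w).
    return tuple(w * c for c in point) + (w,)


def to_cartesian(hpoint):
    """Map homogeneous coordinates back to Cartesian ones by dividing by w."""
    *coords, w = hpoint
    if w == 0:
        # A zero scale factor denotes a point at infinity: no Cartesian equivalent.
        raise ValueError("point at infinity")
    return tuple(c / w for c in coords)


# (1, 2, 1), (2, 4, 2) and (-3, -6, -3) are equivalent projective points.
print(to_cartesian((2, 4, 2)))  # (1.0, 2.0)
```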

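Geometric Primitives: lines. A line in the plane is represented by its canonical equation $ax + by + c = 0$ in Cartesian coordinates, or equivalently $\tilde{\mathbf{x}} \cdot \tilde{\mathbf{l}} = a\tilde{x} + b\tilde{y} + c\tilde{w} = 0$ with $\tilde{\mathbf{l}} = (a, b, c)$ in homogeneous coordinates. A line can also be represented in normalized form by specifying its unit normal vector $\hat{\mathbf{n}} = (\hat{n}_x, \hat{n}_y)$ and its distance $d$ from the origin: $\mathbf{l} = (\hat{n}_x, \hat{n}_y, d)$, where $\|\hat{\mathbf{n}}\| = 1$.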
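A useful consequence of the homogeneous representation, standard in projective geometry although not stated on the slide, is that both the line through two points and the intersection of two lines are cross products. The sketch below uses plain Python tuples; all function names are illustrative:

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line (a, b, c) through two Cartesian 2D points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def normalize_line(l):
    """Rescale (a, b, c) so that (a, b) is a unit normal; c becomes the distance d."""
    a, b, c = l
    n = math.hypot(a, b)
    return (a / n, b / n, c / n)


def intersect(l1, l2):
    """Intersection point of two lines, in homogeneous coordinates."""
    return cross(l1, l2)


l1 = line_through((0, 1), (1, 1))   # the line y = 1
l2 = line_through((1, 0), (1, 2))   # the line x = 1
x = intersect(l1, l2)
print(x[0] / x[2], x[1] / x[2])     # 1.0 1.0
```

Intersecting the parallel lines $y = 1$ and $y = 2$, i.e. `intersect((0, 1, -1), (0, 1, -2))`, returns `(-1, 0, 0)`: the last coordinate is zero, so the two lines meet at a point at infinity.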

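Geometric Primitives: planes. A plane in 3D space is represented by the equation $ax + by + cz + d = 0$ in Cartesian coordinates, or equivalently $\tilde{\mathbf{x}} \cdot \tilde{\mathbf{m}} = 0$ with $\tilde{\mathbf{m}} = (a, b, c, d)$ in homogeneous coordinates. A normalized representation $\mathbf{m} = (\hat{n}_x, \hat{n}_y, \hat{n}_z, d)$, with $\|\hat{\mathbf{n}}\| = 1$, is also possible.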
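Normalizing a plane makes distances easy to compute: once $(a, b, c)$ is a unit normal, the dot product of the homogeneous point $(x, y, z, 1)$ with the plane vector is the signed distance of the point from the plane. A minimal sketch with hypothetical function names:

```python
import math


def normalize_plane(m):
    """Rescale (a, b, c, d) so that (a, b, c) is a unit normal vector."""
    a, b, c, d = m
    n = math.sqrt(a * a + b * b + c * c)
    return (a / n, b / n, c / n, d / n)


def signed_distance(m, p):
    """Signed distance of the 3D point p from the plane m."""
    nx, ny, nz, d = normalize_plane(m)
    # Dot product of the homogeneous point (x, y, z, 1) with the normalized plane.
    return nx * p[0] + ny * p[1] + nz * p[2] + d


plane = (0, 0, 2, -4)                            # 2z - 4 = 0, i.e. the plane z = 2
print(normalize_plane(plane))                    # (0.0, 0.0, 1.0, -2.0)
print(signed_distance(plane, (5.0, 5.0, 5.0)))   # 3.0
```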

Basic Transformations. Translation: preserves orientation; Euclidean (rigid): preserves lengths; Similarity: preserves angles; Affine: preserves parallelism; Projective: preserves straight lines.

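Basic Transformations. If the points are in homogeneous coordinates, the basic transformations can be expressed as the product of a 3x3 transformation matrix and the coordinate vector: translation $\begin{bmatrix} I & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$, where $I$ is the 2x2 identity matrix and $\mathbf{t}$ the translation vector; Euclidean (rigid) $\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$, where $R$ is a 2x2 rotation matrix; similarity $\begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$, where $s$ is a scale factor; affine $\begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$, where $A$ is an arbitrary 2x2 matrix (affine matrix); projective $\tilde{H}$, an arbitrary 3x3 matrix defined up to scale (projection matrix).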
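A sketch of some of these matrices in plain Python (nested lists, no external libraries), which also checks two of the invariants listed on the previous slide: a rigid transformation keeps a distance of 5 unchanged, while a similarity with $s = 2$ doubles it. The helper names are hypothetical:

```python
import math


def translation(tx, ty):
    """[[I, t], [0^T, 1]]"""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def similarity(theta, s, tx, ty):
    """[[sR, t], [0^T, 1]]; with s = 1 this is a Euclidean (rigid) transformation."""
    c, si = math.cos(theta), math.sin(theta)
    return [[s * c, -s * si, tx], [s * si, s * c, ty], [0.0, 0.0, 1.0]]


def apply(M, p):
    """Multiply M by the homogeneous point (x, y, 1), then divide by the last coordinate."""
    x, y = p
    xh, yh, wh = (M[i][0] * x + M[i][1] * y + M[i][2] for i in range(3))
    return (xh / wh, yh / wh)


a, b = (0.0, 0.0), (3.0, 4.0)                  # two points at distance 5
rigid = similarity(math.pi / 3, 1.0, 2.0, -1.0)
scaled = similarity(math.pi / 3, 2.0, 2.0, -1.0)
print(math.dist(apply(rigid, a), apply(rigid, b)))    # ~5 (length preserved)
print(math.dist(apply(scaled, a), apply(scaled, b)))  # ~10 (length scaled by s)
```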

Basic Transformations examples (figure: translation, rigid and similarity transformations).

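Basic Transformations examples. A transformation is applied as $\tilde{\mathbf{x}}' = M\tilde{\mathbf{x}}$, with $M$ one of the 3x3 matrices above, and the Cartesian coordinates of the result are obtained by dividing by its last homogeneous coordinate (figure: translation and rigid examples).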
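A sketch of the projective case, where the last row of the matrix is not $(0, 0, 1)$ and the division by the last coordinate really matters. The matrix `H` below is an arbitrary, hypothetical example; the checks illustrate that collinear points stay collinear, while parallel lines in general do not stay parallel:

```python
def apply_h(H, p):
    """x' = H x for the homogeneous point (x, y, 1), then divide by the last coordinate."""
    x, y = p
    xh, yh, wh = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return (xh / wh, yh / wh)


def cross2(u, v):
    """z-component of the cross product of two 2D vectors (zero if they are parallel)."""
    return u[0] * v[1] - u[1] * v[0]


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


# Hypothetical projective matrix (its last row is not (0, 0, 1)).
H = [[1.0, 0.2, 0.0],
     [0.1, 1.0, 0.0],
     [0.001, 0.002, 1.0]]

# Three collinear points on y = x stay collinear after the mapping.
p, q, r = (apply_h(H, pt) for pt in [(0, 0), (1, 1), (2, 2)])
print(abs(cross2(sub(q, p), sub(r, p))) < 1e-9)  # True

# Two parallel segments (on y = 0 and y = 1) are no longer parallel.
d1 = sub(apply_h(H, (1, 0)), apply_h(H, (0, 0)))
d2 = sub(apply_h(H, (1, 1)), apply_h(H, (0, 1)))
print(abs(cross2(d1, d2)) > 1e-6)  # True
```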

Part II: Image Formation

Image Formation. An image is formed when light rays hit the objects and are reflected towards a photosensitive surface. Without any control on their directions, the rays reflected by each object point spread over the whole surface, and each point of the surface is hit by rays coming from every point of the scene, so no sharp image is formed.

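Pinhole Model. The pinhole model allows us to control the rays' directions: only the rays passing through a small hole reach the surface, so each image point receives light from a single direction. However, it is an impractical model: no zoom, no focus, and high sensitivity is required (very little light passes through the hole).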

Thin Lenses. A thin lens is a lens whose thickness is negligible compared with the radii of curvature of its surfaces (figure; f: focal length).

Thin Lenses. They can be used to control the rays' directions, since the following rules hold (figure: optical axis): any ray that enters parallel to the axis on one side of the lens proceeds towards the focal point F on the other side; any ray that arrives at the lens after passing through the focal point on the front side comes out parallel to the axis on the other side; any ray that passes through the center of the lens does not change its direction.

Thin Lenses (cont.): applying the three rules to a real object yields its image on the other side of the lens (figure: optical axis, real object, image).

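Thin Lens Equation. The lens equation can be derived directly from the three rules discussed above by considering similar triangles: $\frac{1}{Z} + \frac{1}{z} = \frac{1}{f}$, where $Z$ is the distance of the object from the lens and $z$ the distance of its image from the lens (figure: optical axis).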

Thin Lens Equation: considerations. The image zoom depends on the focal length f; the image focus depends on f and on the distance between the points and the lens; for a given distance z between the lens and the image plane, only points whose distance from the lens is (approximately) $Z = \frac{fz}{z - f}$ will be in focus.
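The thin lens equation solved for either distance, as a minimal sketch (hypothetical function names; all lengths in the same unit, e.g. millimeters):

```python
def image_distance(Z, f):
    """Distance z of the image behind the lens for an object at distance Z (1/Z + 1/z = 1/f)."""
    if Z <= f:
        raise ValueError("an object at or inside the focal length forms no real image")
    return f * Z / (Z - f)


def in_focus_distance(z, f):
    """Object distance Z that is in focus when the image plane is at distance z > f."""
    if z <= f:
        raise ValueError("the image plane must be farther than f from the lens")
    return f * z / (z - f)


f = 50.0                         # hypothetical 50 mm lens
z = image_distance(2000.0, f)    # object 2 m away
print(round(z, 3))               # 51.282: slightly more than f
print(image_distance(1e9, f))    # very distant objects focus (almost) at z = f
print(in_focus_distance(z, f))   # back to ~2000 mm
```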

Perspective Camera. In a pinhole camera, a thin lens camera or a human eye, the image is formed through the projection of a 3D scene onto a 2D image plane π. Hence we assume a simplified projection model (the perspective camera) where: the projection center is the origin of both the camera and the real world coordinate system; the image plane π is perpendicular to the Z axis (parallel to the XY plane); the distance between π and the origin equals the focal length f. We want to model how a 3D point P = (X, Y, Z) is projected onto a 2D point p = (x, y) of the image plane.

Perspective Camera (figure: image plane π, real world point P = (X, Y, Z), image point p = (x, y), projection center, distance f between π and the origin). The fundamental equations of the perspective camera are derived by considering similar triangles:
$x = f\frac{X}{Z}, \quad y = f\frac{Y}{Z} \quad (1)$
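Equations (1) as a minimal Python sketch (hypothetical function name; X, Y, Z and f in the same metric unit). Every point on the same ray through the projection center lands on the same image point, which illustrates why depth cannot be recovered from a single projection:

```python
def perspective_project(P, f):
    """Fundamental equations of the perspective camera: x = f X / Z, y = f Y / Z."""
    X, Y, Z = P
    if Z == 0:
        raise ValueError("Z = 0: the point lies in the plane of the projection center")
    return (f * X / Z, f * Y / Z)


print(perspective_project((2.0, 4.0, 10.0), 5.0))  # (1.0, 2.0)
print(perspective_project((4.0, 8.0, 20.0), 5.0))  # (1.0, 2.0): same ray, same image point
```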

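Weak Perspective Camera. For different 3D points, equations (1) are not linear, since each point is divided by its own Z value. If the 3D points of the scene are far enough from the camera, their Z values are large and the differences among them are negligible compared with their average, so they can be approximated by an average value $\bar{Z}$. The following approximation holds:
$x \approx \frac{f}{\bar{Z}}X, \quad y \approx \frac{f}{\bar{Z}}Y$
where $\bar{Z}$ is the average depth of the scene points.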
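A sketch comparing the exact projection with the weak perspective approximation, for two hypothetical points at depths 995 and 1005 (average 1000) with f = 10; the relative error stays around 0.5%:

```python
def perspective(P, f):
    X, Y, Z = P
    return (f * X / Z, f * Y / Z)


def weak_perspective(points, f):
    """Replace every Z with the average depth: a linear (scaled orthographic) mapping."""
    Z_avg = sum(P[2] for P in points) / len(points)
    return [(f * X / Z_avg, f * Y / Z_avg) for X, Y, _ in points]


points = [(10.0, 0.0, 995.0), (10.0, 0.0, 1005.0)]
for P, approx in zip(points, weak_perspective(points, 10.0)):
    exact = perspective(P, 10.0)
    print(exact[0], approx[0], abs(approx[0] - exact[0]) / exact[0])
```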

Part III: Different Projections

Polar and Spherical Coordinates (figure: a point (r, θ, φ) in spherical coordinates with respect to the x, y, z axes).

Perspective Projection. If we consider 2D points in polar coordinates $p = (\rho, \varphi)$ and 3D points in spherical coordinates $P = (r, \vartheta, \varphi)$, the perspective projection can be characterized by the following formula:
$\rho = f\tan\vartheta$
where ϑ is the angle between the incoming light ray and the lens principal axis, and φ is the same as for the original 3D point.
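A sketch checking that the polar form gives the same image point as equations (1): the 3D point is converted to an incidence angle ϑ and an azimuth φ, and ρ = f tan ϑ is turned back into Cartesian image coordinates. The function name is hypothetical:

```python
import math


def perspective_polar(P, f):
    """Perspective projection in polar form: rho = f tan(theta), phi unchanged."""
    X, Y, Z = P
    theta = math.atan2(math.hypot(X, Y), Z)  # angle between the ray and the optical axis
    phi = math.atan2(Y, X)                   # azimuth of the 3D point
    return f * math.tan(theta), phi


rho, phi = perspective_polar((3.0, 4.0, 10.0), 2.0)
x, y = rho * math.cos(phi), rho * math.sin(phi)
print(x, y)  # approximately 0.6 0.8, i.e. (f X / Z, f Y / Z)
```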

Field of View. The Field of View (FOV) is the solid angle through which a detector is sensitive to electromagnetic radiation. Since a solid angle is a two-dimensional angle, the FOV can be measured horizontally, vertically or diagonally (figure: horizontal, vertical and diagonal FOV).
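For a perspective camera the FOV follows from ρ = f tan ϑ: assuming the principal point is at the sensor center, the sensor edge at distance d/2 is reached by rays at ϑ = atan(d / 2f), so the FOV along that dimension is 2 atan(d / 2f). A sketch with a hypothetical 36 x 24 mm sensor and 50 mm lens:

```python
import math


def fov_degrees(d, f):
    """FOV along a sensor dimension d (same unit as f), principal point at the center."""
    return math.degrees(2.0 * math.atan(d / (2.0 * f)))


w, h, f = 36.0, 24.0, 50.0  # hypothetical sensor size (mm) and focal length (mm)
print(round(fov_degrees(w, f), 1))                 # horizontal
print(round(fov_degrees(h, f), 1))                 # vertical
print(round(fov_degrees(math.hypot(w, h), f), 1))  # diagonal
```

Choosing f = d/2 gives a 90° FOV along that dimension; shorter focal lengths push ϑ towards 90°, where f tan ϑ grows without bound.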


Characteristics of the Perspective Projection. The perspective camera model, despite its simplicity, can be successfully used to model the image formation process of the class of perspective cameras. This class includes most conventional cameras, e.g., consumer digital (and analog) cameras, mobile phone cameras, webcams, etc. However, this projection is suitable to describe only narrow Field Of View (FOV) cameras: theoretically under 90°, and up to a maximum of 140° with distortion. By comparison, the human binocular FOV is approximately 180° horizontally and 120° vertically.

Omnidirectional Cameras. In order to obtain a wider FOV, different projection models are adopted. Wider FOV cameras are usually referred to as omnidirectional cameras. A wider FOV is generally achieved in two ways: using different lenses (e.g., fisheye lenses); using a curved mirror to reflect the light before acquiring it with a perspective camera. The former is the class of dioptric cameras, while the latter is the class of catadioptric cameras.

Omnidirectional Cameras (figure: a catadioptric camera and a dioptric (fisheye) camera).

Some Wider FOV Projections (figure: fisheye projections, some of which are also used in catadioptric cameras).

Some Wider FOV Projections (figure: position of a point according to the perspective projection vs. its distorted position according to a wider FOV projection).
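The original table of projections did not survive extraction. As an assumption, the sketch below lists the perspective model next to four radial models commonly used for fisheye lenses in the literature (equidistant, equisolid-angle, orthographic, stereographic), each giving the image radius ρ as a function of the incidence angle ϑ:

```python
import math

# Image radius rho as a function of the incidence angle theta (radians) and of f.
PROJECTIONS = {
    "perspective": lambda theta, f: f * math.tan(theta),
    "equidistant": lambda theta, f: f * theta,
    "equisolid": lambda theta, f: 2 * f * math.sin(theta / 2),
    "orthographic": lambda theta, f: f * math.sin(theta),
    "stereographic": lambda theta, f: 2 * f * math.tan(theta / 2),
}

for deg in (10, 45, 80, 89):
    theta = math.radians(deg)
    radii = {name: round(proj(theta, 1.0), 3) for name, proj in PROJECTIONS.items()}
    print(deg, radii)
```

For small angles all models give ρ ≈ fϑ; near 90° the perspective radius grows without bound, while the four fisheye models stay finite at 90°, which is what lets them cover a wider FOV.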

Fisheye Distortion. The effect of the different projection is a wider field of view at the expense of a symmetric radial distortion, which increases with the distance from the center of the lens (often close to the center of the image).
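To see the distortion grow with the distance from the center, the sketch below moves ideal (perspective) image points to the position predicted by the equidistant model ρ = fϑ. The choice of the equidistant model and the function name are assumptions made for illustration:

```python
import math


def perspective_to_equidistant(x, y, f, cx=0.0, cy=0.0):
    """Move an ideal (perspective) image point to its equidistant-fisheye position.

    The distortion is radial and symmetric around the center (cx, cy): the
    azimuth is kept, only the distance from the center changes.
    """
    dx, dy = x - cx, y - cy
    rho_p = math.hypot(dx, dy)
    if rho_p == 0:
        return (x, y)
    theta = math.atan(rho_p / f)  # incidence angle, from rho_p = f tan(theta)
    rho_d = f * theta             # equidistant model: rho_d = f theta
    scale = rho_d / rho_p
    return (cx + dx * scale, cy + dy * scale)


for r in (0.1, 0.5, 1.0, 2.0):
    xd, _ = perspective_to_equidistant(r, 0.0, f=1.0)
    print(r, round(xd, 4), round(xd / r, 4))  # the shrink factor decreases with r
```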

Part IV: From the World to the Camera

From the World to the Camera. So far we have assumed that the camera coordinate system corresponds to the real world coordinate system; accordingly, we have modeled the mapping between real world 3D points and the corresponding 2D points in the image plane as a simple projection. In most applications this assumption does not hold, and we need to know how to project real world 3D points (in metric coordinates) to 2D image points (in pixel coordinates). The projection function depends on many parameters which are specific to the camera used and to the scene. The parameters can be found through a process called geometric camera calibration. In the next slides we define the projection function and discuss its parameters.

From the World to the Camera: coordinate systems. For clarity's sake we define three coordinate systems: 1) the 3D real world coordinate system (CS) in metric units (e.g., millimeters); 2) the 2D image plane (ideal camera plane) coordinate system in metric units; 3) the 2D camera plane coordinate system in pixels. Therefore we split the mapping of the 3D points into two main steps: I. projection from the 3D real world coordinate system to the 2D image plane coordinate system (extrinsic parameters); II. projection from the 2D image plane coordinate system to the 2D camera plane coordinate system (intrinsic parameters).

1) The Real World CS (figure: X, Y, Z axes)

The Camera CS (still 3D!)

The Camera View (figure)

The Finite Image (figure)

The Finite Image + 1) Real World CS (figure)

2) The Image (ideal camera) CS (figure: x, y axes)

3) The Camera CS (figure)

From the World to the Camera: extrinsic parameters. In order to align the real world coordinate system with the camera one, we need to perform a translation followed by a rotation. So we need (degrees of freedom): a rotation matrix R (3 angles = 3 DoF); a translation vector T (3 components = 3 DoF). The 3D world points $P_w$ are expressed in the camera coordinate system through the equation:
$P_c = R(P_w - T) \quad (2)$
The 3D points are then projected onto the 2D image plane through the fundamental equations of the perspective camera (1). We have 6 degrees of freedom, which are the camera extrinsic parameters: they allow the mapping to the ideal image plane.
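A sketch of equation (2) in plain Python. The slide only states that R depends on 3 angles; building it as $R_z R_y R_x$ from yaw, pitch and roll is an assumption (one common convention among several), and the function names are hypothetical:

```python
import math


def rotation_zyx(yaw, pitch, roll):
    """3x3 rotation R = Rz(yaw) Ry(pitch) Rx(roll), one common convention among several."""
    cz, sz = math.cos(yaw), math.sin(yaw)
    cy, sy = math.cos(pitch), math.sin(pitch)
    cx, sx = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def world_to_camera(Pw, R, T):
    """Equation (2): Pc = R (Pw - T)."""
    d = [Pw[i] - T[i] for i in range(3)]
    return tuple(sum(R[r][c] * d[c] for c in range(3)) for r in range(3))


# A 90-degree yaw maps the world point (1, 0, 0) to approximately (0, 1, 0).
print(world_to_camera((1.0, 0.0, 0.0), rotation_zyx(math.pi / 2, 0.0, 0.0), (0.0, 0.0, 0.0)))
```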

From the World to the Camera: intrinsic parameters. The intrinsic parameters are used to map the image plane points to those of the camera plane (the sensor array!): the focal length f (related to the zoom factor); the pixel coordinates of the principal point $(o_x, o_y)$; the scale factors $s_x, s_y$ (to express the point coordinates in pixels: the dimensions of a pixel on the sensor array); the radial distortion parameters $k_1, k_2$ (to model the deviation from the ideal model). This distortion model is not suitable for the radial distortion of a wide-angle image!

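From the World to the Camera: intrinsic parameters. Let $(x_{im}, y_{im})$ be the pixel coordinates (camera plane) of the point $(x, y)$ (image plane); the following equations hold:
$x = -(x_{im} - o_x)s_x, \quad y = -(y_{im} - o_y)s_y \quad (3)$
Real cameras are subject to distortion caused by the deviation from the ideal model. The distortion is modeled with the following formulas:
$x = x_d(1 + k_1 r^2 + k_2 r^4), \quad y = y_d(1 + k_1 r^2 + k_2 r^4), \quad r^2 = x_d^2 + y_d^2$
where $(x_d, y_d)$ are the distorted coordinates of the point $(x, y)$.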
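A sketch of equation (3) and of the radial distortion model, with hypothetical parameter values (principal point (320, 240) pixels, 0.01 mm pixels, small $k_1, k_2$). Note that the model maps distorted coordinates $(x_d, y_d)$ directly to undistorted ones, so correcting a measured point needs no iteration:

```python
def pixel_to_image(x_im, y_im, ox, oy, sx, sy):
    """Equation (3): pixel coordinates -> image plane coordinates (millimeters)."""
    return (-(x_im - ox) * sx, -(y_im - oy) * sy)


def image_to_pixel(x, y, ox, oy, sx, sy):
    """Inverse of equation (3): image plane (millimeters) -> pixel coordinates."""
    return (ox - x / sx, oy - y / sy)


def undistort(xd, yd, k1, k2):
    """Radial model: x = xd (1 + k1 r^2 + k2 r^4), same for y, with r^2 = xd^2 + yd^2."""
    r2 = xd * xd + yd * yd
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return (xd * factor, yd * factor)


intr = dict(ox=320.0, oy=240.0, sx=0.01, sy=0.01)  # hypothetical intrinsics
xd, yd = pixel_to_image(420.0, 140.0, **intr)       # measured (distorted) point, in mm
x, y = undistort(xd, yd, k1=0.05, k2=0.001)         # hypothetical k1, k2
print((xd, yd), (x, y), image_to_pixel(x, y, **intr))
```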

From the World to the Camera: all parameters. The parameters are: T: translation vector; R: rotation matrix; f: focal length; $o_x, o_y$: principal point coordinates (in pixels); $s_x, s_y$: millimeter-to-pixel conversion factors (dimensions of a pixel, in millimeters); $k_1, k_2$: distortion parameters.

From the World to the Camera: coordinate mapping. A world point $P_w$ is mapped into a camera point $(x_{im}, y_{im})$ as follows: $P_w$ is first mapped to the camera coordinate system (point $P_c$) using the extrinsic parameters (R, T); $P_c$ is projected to the image plane point $p = (x, y)$ using the perspective camera fundamental equations; $p$ is mapped to the camera plane pixel coordinates $(x_{im}, y_{im})$ using the intrinsic parameters.

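From the World to the Camera: coordinate mapping. Consider equations (1), (2) and (3) first:
$x = f\frac{X}{Z}, \quad y = f\frac{Y}{Z} \quad (1)$
$P_c = R(P_w - T) \quad (2)$
$x = -(x_{im} - o_x)s_x, \quad y = -(y_{im} - o_y)s_y \quad (3)$
In (1) and (3), x and y are the coordinates of the point on the image plane, while X, Y and Z are the coordinates of the world point expressed in the camera coordinate system. Equating (1) and (3) we get:
$-(x_{im} - o_x)s_x = f\frac{X}{Z}, \quad -(y_{im} - o_y)s_y = f\frac{Y}{Z}$
In (2), $P_c = (X, Y, Z)$, $P_w = (X_w, Y_w, Z_w)$, and the rotation matrix R is decomposed into its rows $R_1^T, R_2^T, R_3^T$, so that $X = R_1^T(P_w - T)$, $Y = R_2^T(P_w - T)$ and $Z = R_3^T(P_w - T)$. Substituting we get:
$-(x_{im} - o_x)s_x = f\frac{R_1^T(P_w - T)}{R_3^T(P_w - T)}, \quad -(y_{im} - o_y)s_y = f\frac{R_2^T(P_w - T)}{R_3^T(P_w - T)}$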

From the World to the Camera: coordinate mapping. Once all the parameters are known, we can perform the projection (apart from distortion) using the formulas:
$x_{im} = o_x - \frac{f}{s_x}\frac{R_1^T(P_w - T)}{R_3^T(P_w - T)}, \quad y_{im} = o_y - \frac{f}{s_y}\frac{R_2^T(P_w - T)}{R_3^T(P_w - T)}$
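The formulas above as a self-contained sketch (distortion ignored, as on the slide). The rotation is passed as a 3x3 nested list whose rows are $R_1^T, R_2^T, R_3^T$; the function name and the parameter values in the example are hypothetical:

```python
def project_world_point(Pw, R, T, f, ox, oy, sx, sy):
    """x_im = ox - (f/sx) R1.(Pw - T) / R3.(Pw - T), y_im = oy - (f/sy) R2.(Pw - T) / R3.(Pw - T)."""
    d = [Pw[i] - T[i] for i in range(3)]                                 # Pw - T
    X, Y, Z = (sum(R[r][c] * d[c] for c in range(3)) for r in range(3))  # Ri . (Pw - T)
    if Z <= 0:
        raise ValueError("the point is not in front of the camera")
    return (ox - (f / sx) * X / Z, oy - (f / sy) * Y / Z)


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical camera: f = 5 mm, 0.01 mm pixels, principal point (320, 240).
print(project_world_point((11.0, 22.0, 103.0), I3, (1.0, 2.0, 3.0), 5.0, 320.0, 240.0, 0.01, 0.01))
# approximately (270.0, 140.0)
```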

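From the World to the Camera: matrix form. The whole procedure ($k_1$ and $k_2$ excluded) can be written in matrix form by defining two matrices, the intrinsic parameters matrix
$M_{int} = \begin{bmatrix} -f/s_x & 0 & o_x \\ 0 & -f/s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$
and the extrinsic parameters matrix
$M_{ext} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & -R_1^T T \\ r_{21} & r_{22} & r_{23} & -R_2^T T \\ r_{31} & r_{32} & r_{33} & -R_3^T T \end{bmatrix}$
where $r_{ij}$ are the entries of R, and the coordinate mapping operation
$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = M_{int} M_{ext} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \quad x_{im} = x_1/x_3, \quad y_{im} = x_2/x_3$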
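A sketch of the matrix form with nested lists (no external libraries). A practical advantage of this form is that the 3x4 product $M_{int} M_{ext}$ can be computed once and reused for every point. Function names and parameter values are hypothetical:

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def intrinsic_matrix(f, ox, oy, sx, sy):
    return [[-f / sx, 0.0, ox],
            [0.0, -f / sy, oy],
            [0.0, 0.0, 1.0]]


def extrinsic_matrix(R, T):
    """[R | -R T]: row i is (r_i1, r_i2, r_i3, -R_i . T)."""
    return [list(R[i]) + [-sum(R[i][k] * T[k] for k in range(3))] for i in range(3)]


def project_matrix_form(Pw, R, T, f, ox, oy, sx, sy):
    M = matmul(intrinsic_matrix(f, ox, oy, sx, sy), extrinsic_matrix(R, T))  # 3x4
    Pw_h = [[Pw[0]], [Pw[1]], [Pw[2]], [1.0]]                               # homogeneous column
    x1, x2, x3 = (row[0] for row in matmul(M, Pw_h))
    return (x1 / x3, x2 / x3)                                                # (x_im, y_im)


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project_matrix_form((11.0, 22.0, 103.0), I3, (1.0, 2.0, 3.0), 5.0, 320.0, 240.0, 0.01, 0.01))
# approximately (270.0, 140.0)
```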

Question Time

Contacts. For any doubts please contact me: furnari@dmi.unict.it; Room 30. Slides available on the Studium course page: http://studium.unict.it/dokeos/2014/courses/73072c2/