Pattern Recognition and Machine Vision: Camera Models and Geometry


Dr. Simon J.D. Prince, Prof. Bernard F. Buxton
Computer Science, University College London, Gower Street, London, WC1E 6BT

1 Preliminary Geometry

1.1 Simple Representations of Lines and Points in 2D

Points in two-dimensional space are written in lower case and may be described by the column vector x̂ = (x̂, ŷ)^T. In preliminary maths courses we are presented with a formula for a line which depends on just two parameters:

    ŷ = â x̂ + ĉ    (1)

An alternative way to describe a line in two dimensions is with the three-dimensional row vector l = (a, b, c). A point x̂ is defined to be on the line if it satisfies the relation:

    a x̂ + b ŷ + c = 0    (2)

Notice that multiplying the elements of the line vector by any constant factor s, giving l_2 = (sa, sb, sc), produces exactly the same line. Since only two numbers are really needed to describe a line in two dimensions, this representation is redundant, and the redundancy lies in the scaling factor. We can return to our original description of the line by forcing the coefficient of ŷ to be −1. We do this by dividing all terms by −b, so that:

    â = −a/b    ĉ = −c/b    (3)

1.2 Homogeneous Co-ordinates

Homogeneous co-ordinates extend this notion to points. Instead of representing a 2D point with two parameters, we represent it with three, so that a point in the plane is now represented by the column vector x = (x, y, z)^T. As in the case of the line, there is a redundant scaling factor s, and all points of the form sx = (sx, sy, sz)^T are considered equivalent. To return to the standard, or inhomogeneous, representation of a point, x̂ = (x̂, ŷ)^T, we simply divide through by the last element z, so that:

    x̂ = x/z    (4)
    ŷ = y/z    (5)
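The notes contain no code, but the conversion between homogeneous and inhomogeneous co-ordinates in Equations 4 and 5 is easy to sketch. A minimal Python illustration (the helper names are our own, not from the notes):

```python
def to_homogeneous(p):
    """Lift an inhomogeneous 2D point (x, y) to homogeneous form (x, y, 1)."""
    return (p[0], p[1], 1.0)

def to_inhomogeneous(p):
    """Divide through by the last element (Equations 4 and 5)."""
    x, y, z = p
    return (x / z, y / z)

# Any non-zero scalar multiple of a homogeneous point is the same point:
print(to_inhomogeneous((2.0, 4.0, 2.0)))   # (1.0, 2.0)
print(to_inhomogeneous((6.0, 12.0, 6.0)))  # (1.0, 2.0)
```

Note that the division fails for z = 0, which is exactly the sense in which points at infinity have no inhomogeneous representation.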

What are the advantages of these over-parameterized representations of lines and points? One nice property is that the condition for a point to lie on a line may now be written very neatly:

    l x = 0    (6)

A second advantage is that there are certain lines and points that cannot be described with the standard formulation. For example, the line x̂ = 0 cannot be written in the form ŷ = â x̂ + ĉ; however, it is easily described in the over-parameterized form as l = (1, 0, 0). Similarly, points at infinity cannot be properly described using only two numbers, but we can describe a point at infinity in a given direction by setting the value of z to zero. Hence the point x = (1, 0, 0)^T represents a point infinitely far along the x axis. All points at infinity are referred to as ideal points.

A third advantage of the homogeneous notation is that it permits simple formulae to calculate the line l through two points x_1 and x_2. Equation 6 shows that the line vector must be orthogonal to the point vector if the point lies on the line. Hence, given two points, the line passing through both must be represented by a vector that is orthogonal to both. We can find such a vector using the cross product, so that:

    l = x_1 × x_2    (7)

Similarly, the point x where two lines l_1 and l_2 meet can be calculated using the analogous formula:

    x = l_1 × l_2    (8)

Notice that all of these relations (6, 7 and 8) work even when the points are at infinity; so, for example, two parallel lines intersect at an ideal point.

1.3 Generalization to Three Dimensions

We will represent points in three dimensions using upper-case letters. The standard (inhomogeneous) representation of a point in three dimensions is given by the column vector X̂ = (X̂, Ŷ, Ẑ)^T. The homogeneous representation introduces a redundant fourth parameter, so that X = (X, Y, Z, W)^T. Once more, we retrieve the inhomogeneous co-ordinates by dividing by the last element of the homogeneous co-ordinate:
    X̂ = X/W    Ŷ = Y/W    Ẑ = Z/W    (9)

1.4 Hierarchy of 2D Transformations

We can introduce a hierarchy of geometric transformations in two dimensions. Each level of the hierarchy is a subset of those above it.

Rotation: Rotates points around the origin. Only one degree of freedom, θ, which is the angle through which the points are rotated. Represented in matrix form as:

    T_Rot = [ r_11  r_12  0 ]   [ cos θ  −sin θ  0 ]
            [ r_21  r_22  0 ] = [ sin θ   cos θ  0 ]    (10)
            [   0     0   1 ]   [   0       0    1 ]

Note that this is a two-dimensional rotation written in homogeneous form: it expects to operate on a three-element homogeneous vector.

Euclidean: Rotates points and translates them in the plane. Three parameters: one for the rotation and two for the translation.

    T_Euc = [ r_11  r_12  t_x ]   [ cos θ  −sin θ  t_x ]
            [ r_21  r_22  t_y ] = [ sin θ   cos θ  t_y ]    (11)
            [   0     0    1  ]   [   0       0     1  ]

Similarity: Rotates points and translates them in the plane, and applies a uniform scaling factor s. Four parameters: one for the rotation, two for the translation and one for the scaling.

    T_Sim = [ s r_11  s r_12  t_x ]   [ s cos θ  −s sin θ  t_x ]
            [ s r_21  s r_22  t_y ] = [ s sin θ   s cos θ  t_y ]    (12)
            [    0       0     1  ]   [    0         0      1  ]

Affine: A linear transformation of the x and y co-ordinates plus a translation in the plane. The linear transformation allows shearing effects, but maintains parallelism. Six degrees of freedom: four for the linear transformation and two for the translation.

    T_Aff = [ a_11  a_12  t_x ]
            [ a_21  a_22  t_y ]    (13)
            [   0     0    1  ]

Projective: A linear transformation of all three of the homogeneous co-ordinates.

    T_Proj = [ h_11  h_12  h_13 ]
             [ h_21  h_22  h_23 ]    (14)
             [ h_31  h_32  h_33 ]

Although this appears at first sight to have nine degrees of freedom, there are actually only eight, since the homogeneous notation carries a redundant overall scaling factor. To see why, consider what happens if you multiply a homogeneous vector by this matrix and then convert to inhomogeneous form: any common scale of the matrix entries cancels in the division. The two-dimensional projective transform is variously also referred to as a homography or a collineation (see Section 1.5). An analogous hierarchy can be constructed for three-dimensional transformations, but we shall have more use for the two-dimensional case, as these transformations operate on the two-dimensional image plane.
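The levels of this hierarchy can be explored numerically. The sketch below (Python; the helper names are our own) builds a Euclidean transformation as a 3 × 3 matrix acting on homogeneous points and checks that it preserves distances, one of the invariants discussed later in these notes:

```python
import math

def euclidean(theta, tx, ty):
    """3x3 Euclidean transform: rotation by theta plus translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def mat_vec(M, v):
    """Apply a 3x3 matrix to a homogeneous 2D point."""
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

def dist(p, q):
    """Euclidean distance between two points given in homogeneous form."""
    return math.hypot(p[0]/p[2] - q[0]/q[2], p[1]/p[2] - q[1]/q[2])

T = euclidean(0.7, 3.0, -2.0)               # arbitrary example values
p, q = (1.0, 2.0, 1.0), (4.0, 6.0, 1.0)
print(dist(p, q))                           # 5.0
print(dist(mat_vec(T, p), mat_vec(T, q)))   # ~5.0: distance is preserved
```

Replacing `euclidean()` with a similarity or affine matrix breaks this particular invariant, which is precisely what separates the levels of the hierarchy.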

1.5 The Homography

We now add some detail to our description of the two-dimensional projective transformation, or homography. The homography has the property that it can map any four points in the plane to any other four points; however, given the mapping of these four points, the mapping of a fifth point is constrained. It has eight degrees of freedom. One way to think about this is to consider mapping a unit square: the eight degrees of freedom are the final x and y positions of the four corners of the square after transformation.

In inhomogeneous notation, the mapping from the original point x̂ = (x̂, ŷ)^T to the new point x̂′ = (x̂′, ŷ′)^T is non-linear:

    x̂′ = (h_11 x̂ + h_12 ŷ + h_13) / (h_31 x̂ + h_32 ŷ + h_33)
    ŷ′ = (h_21 x̂ + h_22 ŷ + h_23) / (h_31 x̂ + h_32 ŷ + h_33)    (15)

However, in homogeneous notation this becomes a very simple linear transformation:

    x′ = T_Proj x    (16)

This is another strong advantage of the homogeneous notation: it allows us to apply this quite complicated non-linear transformation using only linear algebra. Moreover, we can calculate the inverse transformation by simply inverting the matrix, so that:

    x = T_Proj^(−1) x′    (17)

Lines transform in a different way. The forward transformation of a line by a homography can be written as:

    l′^T = l^T T_Proj^(−1)    (18)

Technically, points are contravariant tensors and lines are covariant tensors: one says that points transform contravariantly and lines transform covariantly.

2 Pin-Hole Camera Model

The pinhole camera model is a mathematical approximation of the real world that is commonly used in computer vision. Imagine a closed box with a single pinhole made in the centre of one of its sides. Light will pass through the pinhole and form an inverted image inside the box on the opposite side. If the pinhole is infinitely small, this image will be perfectly focused.
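Returning briefly to the homography of Section 1.5: the non-linear inhomogeneous mapping and the linear homogeneous form are two views of the same operation, which is easy to verify numerically. A Python sketch (our own code; the matrix entries are arbitrary illustrative values):

```python
def apply_homography(H, pt):
    """Map an inhomogeneous point via the linear homogeneous route: append 1, multiply, divide out."""
    x, y = pt
    u = [sum(H[i][j] * v for j, v in enumerate((x, y, 1.0))) for i in range(3)]
    return (u[0] / u[2], u[1] / u[2])

def apply_homography_inhom(H, pt):
    """The same mapping written out as the explicit non-linear formula."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

H = [[1.0, 0.2, 3.0],
     [0.1, 0.9, -1.0],
     [0.01, 0.02, 1.0]]   # arbitrary example homography
p = (2.0, 5.0)
print(apply_homography(H, p) == apply_homography_inhom(H, p))  # True
```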
In reality, cameras come equipped with lenses and gather light from a much larger area, so it must be emphasized that this is a very crude approximation to the real world. Although the image in a real camera is inverted, it is common in computer vision to consider a virtual image, associated with an image plane lying in front of the pinhole. The image on this plane has the same orientation as the object in the real world.

Images in the pinhole camera are formed by perspective projection, which is associated with a number of common effects: the apparent size of objects depends on their distance, and parallel lines lying in the same plane all converge to the same vanishing point. These principles were rediscovered in the 14th and 15th centuries by Renaissance artists. Faked photos may often be identified because they violate these principles. Various approximations to the full perspective projection model are sometimes used in computer vision; these are discussed in Section 4.

2.1 Real Cameras

Real cameras do not use pinholes because (i) to get good data, it is sensible to collect more light, and (ii) a very small pinhole may produce diffraction effects. In practice, lenses are used to collect light from a larger area and focus it on the sensor plane. Although computer vision practitioners rarely model this process, it is important to note that it introduces several known aberrations.

Spherical aberration: The lens models which predict a perfectly sharp image are only approximations to the real world; real lenses degrade the image by blurring every point.

Chromatic aberration: Different wavelengths of light are refracted by different amounts, so it is only possible to focus one wavelength on the imaging plane at once. Other wavelengths will be blurred.

Vignetting: Peripheral parts of the scene may be partially occluded internally to the imaging system where there are multiple lenses of finite size. This results in images which are darker around the edges.

Radial distortion: In real camera systems the linear perspective model is not strictly accurate; that is to say, the world point, image point and camera centre are not exactly collinear. The most important deviation is a radial distortion, which is manifested as a barrelling of the image. This is an important factor in geometric algorithms and is usually modelled by computer vision practitioners (see Section 3.4).
3 Perspective Camera Model

In order to interpret the images formed on the sensor, we would like to develop a mathematical model relating how points in the real world, X̂ = (X̂, Ŷ, Ẑ)^T, project to pixels in the image plane, x̂_i = (x̂_i, ŷ_i)^T. We use the subscript i to denote the fact that we are talking about two-dimensional co-ordinates in the final camera image. Another way of thinking about this problem is to start with a pixel in the camera image and ask which ray in the real world projects to this point. We will start with a very simple model of the camera and build up to the full perspective imaging model.

3.1 Perspective Projection Equations

The pinhole camera model consists of two vital components: (i) the optical centre (pinhole) and (ii) the imaging plane (where the images are realized). The optical axis is a line

that passes through the principal point of the image plane and the optical centre. The distance between the imaging plane and the optical centre is known as the focal length of the camera.

In the first instance, we will assume that the optical centre is at the origin of the world co-ordinate system and that the optical axis lies in the positive Z direction. Furthermore, we will assume that the focal length of the camera is one. For convenience we place the imaging plane in front of the pinhole, so that the image will not be inverted. We define a co-ordinate system on the imaging plane with its origin where the plane meets the optical axis. We use a left-handed co-ordinate system, so that in the image plane the positive x-axis points rightwards and the positive y-axis points downwards. This is similar to the pixel indexing in most digitised images. We term this simplified camera a normalized camera. Points on the image plane ( x̂_c, ŷ_c ) in this simplified case are referred to as camera co-ordinates or normalized co-ordinates. By similar triangles, it is trivial to show that:

    x̂_c = X̂/Ẑ    ŷ_c = Ŷ/Ẑ    (19)

These equations can be written much more neatly in homogeneous form, so that:

    [ x_c ]   [ 1  0  0  0 ] [ X ]
    [ y_c ] = [ 0  1  0  0 ] [ Y ]    (20)
    [ z_c ]   [ 0  0  1  0 ] [ Z ]
                             [ 1 ]

To convert the resulting homogeneous co-ordinates back to inhomogeneous values, refer to Equations 4 and 5.

3.2 Intrinsic Parameters

The above camera model is somewhat unrealistic, since in general cameras do not have focal lengths of exactly unity. In order to incorporate an arbitrary focal length, we modify the previous equation thus:

    [ x ]   [ f  0  0  0 ] [ X ]
    [ y ] = [ 0  f  0  0 ] [ Y ]    (21)
    [ z ]   [ 0  0  1  0 ] [ Z ]
                           [ 1 ]

where f is the focal length. In fact it is common to model the focal length in the x and y directions separately, so that:

    [ x ]   [ f_x   0   0  0 ] [ X ]
    [ y ] = [  0   f_y  0  0 ] [ Y ]    (22)
    [ z ]   [  0    0   1  0 ] [ Z ]
                               [ 1 ]

where f_x is the focal length in the x direction and f_y is the focal length in the y direction. An intuitive way to understand these parameters is to consider a fan of rays projecting

outwards from the optical centre through the pixels of the image and into the world. The focal lengths determine how spread out this fan is: if the focal length is long, the fan will be quite concentrated; if it is short, the fan will be much broader. For a fixed pixel array on a CCD camera, the focal lengths therefore affect the field of view in the x and y directions.

This more sophisticated camera model is still quite unrealistic, because the origin of the co-ordinate system on the image plane is in the centre of the image. It is more common in real images to have the origin in the top-left corner. In order to incorporate this into our model, we add the translational offset of the principal point in pixels, (o_x, o_y), to the model:

    [ x_i ]   [ f_x   0   o_x  0 ] [ X ]
    [ y_i ] = [  0   f_y  o_y  0 ] [ Y ]    (23)
    [ z_i ]   [  0    0    1   0 ] [ Z ]
                                   [ 1 ]

It should be noted that for real cameras the principal point is not necessarily exactly in the centre of the image, as one might expect.

One final embellishment to the model is to add a skew term, α, to this matrix. One can think of this as modelling the case where the image plane is not precisely perpendicular to the optical axis:

    [ x_i ]   [ f_x  α   o_x  0 ] [ X ]
    [ y_i ] = [  0  f_y  o_y  0 ] [ Y ]    (24)
    [ z_i ]   [  0   0    1   0 ] [ Z ]
                                  [ 1 ]

This full perspective imaging model can be written in matrix form as:

    x_i = K [I 0] X    (25)

where I is the 3 × 3 identity matrix, 0 is a 3 × 1 vector of zeros, and

    K = [ f_x  α   o_x ]
        [  0  f_y  o_y ]    (26)
        [  0   0    1  ]

The matrix K is called the intrinsic matrix, and the values inside it are known as the intrinsic parameters. One way to think of this matrix is as converting from the camera/normalized co-ordinates to the pixel/image co-ordinates, so that:

    x_i = K x_c    (27)

We can move in the other direction by inverting the matrix; this is a common operation in many three-dimensional computer vision algorithms. Another way to think of this matrix is as converting directions in space (closely linked to camera co-ordinates) into pixels in the final image.
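A numerical sketch of this model, taking the world frame to coincide with the camera frame so that the projection reduces to the intrinsic matrix followed by the homogeneous division. The intrinsic values (f_x = f_y = 500 pixels, principal point (320, 240), zero skew) are hypothetical, chosen only for illustration:

```python
def project(K, X):
    """Project a 3D point X = (X, Y, Z), given in the camera frame, to pixel co-ordinates."""
    x = [K[i][0] * X[0] + K[i][1] * X[1] + K[i][2] * X[2] for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])   # divide out the homogeneous scale

K = [[500.0,   0.0, 320.0],
     [  0.0, 500.0, 240.0],
     [  0.0,   0.0,   1.0]]

# A point on the optical axis lands on the principal point, whatever its depth:
print(project(K, (0.0, 0.0, 4.0)))  # (320.0, 240.0)
# Off-axis points are scaled by 1/Z -- the source of perspective effects:
print(project(K, (1.0, 0.5, 2.0)))  # (570.0, 365.0)
```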

3.3 Extrinsic Parameters

In the previous sections we have made the additional simplification that the origin of the world co-ordinate system is at the optical centre of the camera, with the Z axis aligned with the optical axis. Clearly this is not true in general; for example, if there is more than one camera viewing the scene, they cannot both have this property. In general we must add a three-dimensional Euclidean transformation T_WC, moving us from the three-dimensional world co-ordinate system to a new co-ordinate system which does have these desirable properties. The full projection equations are then:

    [ x_i ]   [ f_x  α   o_x ] [ r_11  r_12  r_13  t_x ] [ X ]
    [ y_i ] = [  0  f_y  o_y ] [ r_21  r_22  r_23  t_y ] [ Y ]    (28)
    [ z_i ]   [  0   0    1  ] [ r_31  r_32  r_33  t_z ] [ Z ]
                                                         [ 1 ]

or

    x_i = K [I 0] T_WC X    (29)

The matrix T_WC is termed the extrinsic matrix, and the parameters it contains are termed the extrinsic parameters. It can be seen that the product K[I 0]T_WC is a 3 × 4 camera matrix, which we will term P. In the computer vision literature the intrinsic and extrinsic parameters are sometimes not considered separately; instead the matrix P is provided as a full description of the camera.

3.4 Radial Distortion Parameters

The final complication is the radial distortion due to the lens. Recall that the aim of this modelling is to map accurately which pixel a given point in the world projects to. For cameras with wide fields of view and inexpensive lenses, radial distortion or barrelling is a serious problem. It can easily be spotted, because straight lines in the real world should project to straight lines in the image; when this is not the case, radial distortion is present and must be compensated for. Radial distortion is commonly modelled in terms of camera co-ordinates.
Hence the full projection model is: projection to camera co-ordinates, which are radially distorted to ( x̂^d_c, ŷ^d_c ); correction for radial distortion to form ( x̂_c, ŷ_c ); followed by conversion to image co-ordinates ( x̂_i, ŷ_i ) using the intrinsic matrix, as described in Equation 27. A simple model of the radial distortion would be:

    x̂_c = x̂^d_c + x̂^d_c (k_1 r^2 + k_2 r^4)
    ŷ_c = ŷ^d_c + ŷ^d_c (k_1 r^2 + k_2 r^4)    (30)

where r is the radial distance from the centre of the image measured in the distorted co-ordinates ( x̂^d_c, ŷ^d_c ). This model assumes that the centre of the radial distortion coincides with the optical centre of the camera. The parameters k_1 and k_2 are termed the radial distortion parameters.
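The radial correction above is a one-liner per co-ordinate. A Python sketch (our own function names; in practice the values of k_1 and k_2 would come from calibration):

```python
def undistort(xd, yd, k1, k2):
    """Correct distorted camera co-ordinates (xd, yd) with the two-parameter radial model."""
    r2 = xd * xd + yd * yd                 # squared radial distance in the distorted image
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return (xd * scale, yd * scale)

# With zero coefficients the point is unchanged; a positive k1 pushes points outwards.
print(undistort(0.1, 0.2, 0.0, 0.0))  # (0.1, 0.2)
print(undistort(1.0, 0.0, 0.1, 0.0))  # (1.1, 0.0)
```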

3.5 Calibration

The process of estimating the intrinsic, extrinsic and radial distortion parameters for a given camera is known as calibration. When these parameters are unknown, the camera is termed uncalibrated. Different algorithms can operate in calibrated or uncalibrated settings, but in general any algorithm that produces measurements of the world in meaningful units (e.g. mm) will require calibration.

The process of calibration is conceptually simple, but the details are rather complicated. Calibration requires a real-world object containing distinctive visual features at known locations. A typical example is a chequerboard of known dimensions. The procedure is as follows:

1. Set up a world co-ordinate system in the centre of the chequerboard, with the Z-axis pointing straight up from the plane. Establish the exact world co-ordinates (X, Y, Z, 1) of each of the corners between the chequers.

2. For each chequer corner, find the point in the image (x_i, y_i, 1) that it projects to. We now have a series of pairs of co-ordinates that are the left- and right-hand vectors in Equation 29. Each pair consists of a point in the 3D world co-ordinate system and the position in the pixel array where it appears. The projection from one to the other depends on the intrinsic and extrinsic parameters; if we have exactly the correct parameters then it will be exact (modulo some noise).

3. Now manipulate the intervening intrinsic and extrinsic parameters until all of the points in the world co-ordinate system project correctly to their known image positions.

The manipulation in the final step is quite complex, since there are non-linear constraints between the rotation entries in the extrinsic matrix. It should also be noted that the chequerboard needs to be seen in multiple views to constrain the solution. An alternative possibility is to use a three-dimensional object where the exact 3D positions of points are known.
However, such an object is harder to manufacture in practice.

4 Approximations to Perspective Projection

In Section 3 the full perspective projection model was presented. However, there are real-world situations where this rather elaborate model is not required, and others where an alternative, simpler model approximates the true viewing conditions well. We assume a simplified intrinsic matrix in this section.

4.1 Orthographic Camera

Orthographic viewing assumes that all the rays from the object arrive parallel to the optical axis, rather than passing through the optical centre. One way to think about this is that it is what happens in the limit if you keep making the object larger but further away, so that it maintains the same viewing angle. The orthographic camera ignores the division by the distance Z, and can be described as:

    [ x_i ]   [ f  0  0  o_x ] [ X ]
    [ y_i ] = [ 0  f  0  o_y ] [ Y ]    (31)
                               [ Z ]
                               [ 1 ]

There are no real-world conditions under which this camera model is exactly correct, but it may be a useful approximation in some circumstances.

4.2 Viewing a Frontoplanar Scene at Known Distance

In this simplified case, all of the Z components of the object are equal to D, the known distance. The imaging equations can be written as:

    [ x_i ]   [ f  0  o_x ] [ r_11  r_12  0  t_x ] [ X ]
    [ y_i ] = [ 0  f  o_y ] [ r_21  r_22  0  t_y ] [ Y ]    (32)
    [ z_i ]   [ 0  0   1  ] [  0     0   1   0  ] [ D ]
                                                  [ 1 ]

It can be seen that there are two parameters that control the scaling of the final image (f and D), two parameters controlling the x-translation (t_x and o_x) and two parameters controlling the y-translation (t_y and o_y). These can be combined into one new scaling parameter, m = f/D, and two new shift parameters, s_x and s_y, so that:

    [ x_i ]   [ m r_11  m r_12  s_x ] [ X ]
    [ y_i ] = [ m r_21  m r_22  s_y ] [ Y ]    (33)
                                      [ 1 ]

Under these viewing conditions there are three varying parameters: one for the rotation and two for the translation. The scaling is constant regardless of the position and orientation of the object, so all images of the object are related by a two-dimensional Euclidean transformation. Under these very restrictive imaging conditions, many properties of the object may be related simply to measurements on the image, including lengths, angles and areas. Since these properties do not change across views, they are termed Euclidean invariants. A wide variety of qualitative geometric phenomena are also preserved, including intersections (concurrency), tangency, inflexions, cusps, collinearity and parallelism. Such images may be used to measure geometric properties of objects.

4.3 Viewing a Frontoplanar Scene at Unknown Distance

If the focal length of the camera is unknown, or the distance to the plane is unknown, then the images are also related to each other by the unknown scale factor m. Images of the same object are now related by a similarity transform, which has four parameters. Nonetheless, some properties of the object stay constant.
For example, ratios of lengths, angles and ratios of areas remain invariant under similarity transforms. In other words, shape is preserved under these viewing conditions.
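The invariance of length ratios under a similarity transform is easy to confirm numerically. A Python sketch (our own helpers; the transform parameters are arbitrary illustrative values):

```python
import math

def similarity(theta, s, tx, ty):
    """3x3 similarity transform: rotation theta, uniform scale s, translation (tx, ty)."""
    c, q = s * math.cos(theta), s * math.sin(theta)
    return [[c, -q, tx], [q, c, ty], [0.0, 0.0, 1.0]]

def apply(M, p):
    """Apply a 3x3 transform to an inhomogeneous point (append 1, divide out)."""
    x = [M[i][0] * p[0] + M[i][1] * p[1] + M[i][2] for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])

def length(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

T = similarity(0.5, 2.0, 1.0, -3.0)
a, b, c = (0.0, 0.0), (3.0, 0.0), (3.0, 4.0)
a2, b2, c2 = (apply(T, p) for p in (a, b, c))
print(length(a, b) / length(b, c))      # 0.75
print(length(a2, b2) / length(b2, c2))  # ~0.75: the ratio survives the transform
```

Individual lengths are scaled by s = 2 here, so lengths themselves are not similarity invariants; only their ratios are.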

4.4 Viewing a Tilted Planar Scene - the Affine Approximation

A planar surface which is not oriented parallel to the image plane and is at an unknown angle is more problematic. In general, shapes are now distorted in the image, since some points are nearer and others further away. When the viewing distance is large compared to the object size, we can approximate the viewing situation using affine projection:

    [ x_i ]   [ a_11  a_12  t_x ] [ X ]
    [ y_i ] = [ a_21  a_22  t_y ] [ Y ]    (34)
                                  [ 1 ]

Under this approximation, all images of a given object are related by affine transformations. Affine invariants include ratios of lengths of collinear or parallel line segments, ratios of areas, and linear combinations of vectors (an affine basis). Parallel lines still map to parallel lines. In general, however, affine transforms do not preserve shape.

4.5 Viewing a Planar Scene - the Projective Case

The true imaging conditions for a plane at an arbitrary tilt and pan are described by a projective transformation or homography. Consider a world co-ordinate system set up on the plane, so that every point on the plane has Z = 0. The imaging equations are hence:

    [ x_i ]   [ f_x  α   o_x ] [ r_11  r_12  r_13  t_x ] [ X ]
    [ y_i ] = [  0  f_y  o_y ] [ r_21  r_22  r_23  t_y ] [ Y ]    (35)
    [ z_i ]   [  0   0    1  ] [ r_31  r_32  r_33  t_z ] [ 0 ]
                                                         [ 1 ]

Examining this equation carefully, it can be seen that many of the terms multiply the zero in the world vector and vanish. We can equivalently write:

    [ x_i ]   [ f_x  α   o_x ] [ r_11  r_12  t_x ] [ X ]   [ h_11  h_12  h_13 ] [ X ]
    [ y_i ] = [  0  f_y  o_y ] [ r_21  r_22  t_y ] [ Y ] = [ h_21  h_22  h_23 ] [ Y ]    (36)
    [ z_i ]   [  0   0    1  ] [ r_31  r_32  t_z ] [ 1 ]   [ h_31  h_32  h_33 ] [ 1 ]

This demonstrates that the relationship between two-dimensional co-ordinates on a real-world plane and the image pixels can be described by a projective transformation or homography. Note that the affine approximation is good when h_33 ≫ h_31 X + h_32 Y. The projective transformation does not preserve parallelism, but it does preserve the cross ratio, which is a ratio of ratios of lengths. It also preserves many qualitative phenomena, such as the collinearity of points.
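The cross ratio just mentioned can be verified numerically: take four collinear points, push them through an arbitrary homography, and recompute. A Python sketch (our own helpers, using one common convention for the cross ratio of inter-point distances):

```python
import math

def hmap(H, p):
    """Map an inhomogeneous 2D point through a 3x3 homography H."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def cross_ratio(p1, p2, p3, p4):
    """Cross ratio of four collinear points, formed from inter-point distances."""
    d = lambda a, b: math.hypot(b[0] - a[0], b[1] - a[1])
    return (d(p1, p3) * d(p2, p4)) / (d(p2, p3) * d(p1, p4))

H = [[2.0, 0.3, 1.0],
     [0.1, 1.5, -2.0],
     [0.05, 0.02, 1.0]]                                   # arbitrary example homography
pts = [(t, 2.0 * t + 1.0) for t in (0.0, 1.0, 2.0, 4.0)]  # four points on one line
mapped = [hmap(H, p) for p in pts]
print(cross_ratio(*pts), cross_ratio(*mapped))  # both ~1.5: the cross ratio survives
```

By contrast, the simple ratio of lengths used as an affine invariant above would not survive this mapping.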
4.6 Scalar Invariants

A scalar invariant I(P) of a geometric structure described by a parameter vector P remains unchanged in value and form under the linear transformation:

    X′ = T X    (37)

The simplest invariants involve points or lines, but invariants can also be defined for algebraic curves (notably conics and cubics), for smooth curves (differential and integral invariants, and canonical frames of reference) and for combinations of these (e.g. points and lines, points and conics). Invariants are never unique: given a scalar invariant, any function of it is also invariant. However, the number of functionally independent invariants is limited and is often given by:

    #invariants = d.o.f. of structure − d.o.f. of transformation    (38)

where d.o.f. stands for degrees of freedom. For example, the Euclidean transformation has three degrees of freedom (two for the translation and one for the rotation angle), and two points in the plane have four degrees of freedom (their co-ordinates). Hence there is 4 − 3 = 1 invariant: the distance between the points.

4.7 Euclidean and Similarity Invariants

Euclidean and similarity invariants are mostly familiar quantitative entities, such as lengths (Euclidean only) and angles (both). The central moments are also invariant under translation:

    u_mn = ∫∫_shape dx dy (x − E(x))^m (y − E(y))^n    (39)

Similarly, normalized central moments are invariant under similarity transformations.

4.8 Affine Invariants

Consider a one-dimensional affine transformation, x′ = a_1 x + a_2. This has two parameters; hence we would expect three points on a line to define one invariant. Let us consider the ratio of two lengths:

    I = (x′_3 − x′_2) / (x′_2 − x′_1) = (x_3 − x_2) / (x_2 − x_1)    (40)

This can trivially be proved by substituting in the expression for the affine transform and showing that the left- and right-hand ratios are equivalent. In two dimensions the affine transform has six degrees of freedom, so we expect four points in the plane to define two invariants. Let m_123 be the matrix of homogeneous point co-ordinates:

    m_123 = [ x_1  x_2  x_3 ]
            [ y_1  y_2  y_3 ]    (41)
            [  1    1    1  ]

Then it can be shown that ratios of determinants are invariants:

    I_1 = |m_123| / |m_134|    I_2 = |m_124| / |m_234|    (42)

These are defined as ratios of determinants. Geometrically, the invariants may be interpreted as ratios of areas, as the co-ordinates of a point in an affine basis, or as the result of constructions based on collinearities (both unaffected by the affine transformation).

4.9 Projective Invariants

We may similarly consider the one-dimensional projective invariant. A one-dimensional projective transformation is represented by a 2 × 2 matrix, which is undetermined with respect to scale:

    [ x′_1 ]   [ t_11  t_12 ] [ x_1 ]
    [ x′_2 ] = [ t_21  t_22 ] [ x_2 ]    (43)

Hence there are three degrees of freedom, and we expect four points to provide one invariant. This invariant is the cross ratio:

    I = ((x_3 − x_1)(x_4 − x_2)) / ((x_3 − x_2)(x_4 − x_1))    (44)

In two dimensions the projective transformation has eight parameters, and five points in a plane define two projective invariants:

    I_1 = (|m_431| |m_521|) / (|m_421| |m_531|)    I_2 = (|m_421| |m_532|) / (|m_432| |m_521|)    (45)

Similar invariants may be constructed for five lines. Planar conic curves (ellipses, parabolae, hyperbolae) have five degrees of freedom, and hence a single conic does not provide any projective invariants. However, two conics define (2 × 5) − 8 = 2 invariants; a conic and two points defines one invariant; a conic and two lines also defines one invariant.

5 Use of Invariants for Object Recognition

Since they remain unchanged under the imaging transformation, the appropriate invariants may be used to recognize instances of objects in the image and subsequently to solve for the pose. Affine and projective invariants are particularly useful for object recognition, as they remain unchanged over a wide variety of weakly constrained or unconstrained viewing conditions. There are several advantages to this approach:

1. Invariants can be obtained from an example image, or from the objects themselves.

2. The invariant values can be stored as models in a library of model objects.

3. Indexing can be implemented via a hash table: recognition complexity need not be proportional to the number of models in the library.

4. Model hypotheses generated by the invariants should be confirmed or rejected by detailed comparison with the image (verification: see later, under model-based vision).

5. Camera calibration is NOT required at any stage, provided that non-linear distortions are small.

The methods just described rely on finding points, lines or simple curves such as conics in images. If the objects of interest are more complicated, for example defined by smooth curves, a different approach is needed. In particular, we try to transform our curves into a canonical frame in which all equivalent curves map to the same curve. We first obtain points on the curve that can be unambiguously located before and after transformation to a canonical frame, such as corners (tangent discontinuities), inflexions (zeros of curvature) or bitangent points. We then use these unambiguous points to transform the curve to the canonical frame, where its properties are measured. The measured properties of the curve are then used to recognize it.

Example: bitangencies. If a curve has a concavity, we can use the bitangent to define a canonical frame of reference that is projectively invariant. This technique can be applied to arbitrary plane figures of sufficient complexity (e.g. jigsaw-puzzle pieces).

We can also use a planar object to define a Euclidean frame of reference. To do this we use the fact that four points are sufficient to determine a projective transformation with 8 d.o.f.: any four points in general position may be transformed to any other four points by a homography. (i) No camera calibration is required. (ii) If desired, the image intensity may be rendered onto the image plane by interpolation, as discussed previously.
5.1 Comparison with Conventional Techniques

We briefly compare geometric invariants with conventional techniques for object recognition and shape description.

Moments: Moments were discussed earlier, when we noted that central moments are invariant to translation. Normalized central moments are invariant to translation and scale. Principal second moments are invariant to rotation (as are the determinant and trace of the second-order moment matrix). It is also possible to find moment descriptors that are invariant under similarity transformations. In fact, the theory of algebraic invariants may be used to find invariant moment descriptors (Jain 1989, page 380; Foundations of the Theory of Algebraic Invariants, 1964).

Signature properties: As we saw earlier, the projections of an object onto the x and y axes, or along an arbitrary direction, are related to the moments. Moments taken from a variety of such projections are often called signature properties of the object. In general they are easy to calculate, either on a conventional computer or on special-purpose machines, e.g. pipeline hardware. See, for example, Haralick and Shapiro.

SRI Shape Descriptors: A number of methods based on bounding boxes were developed at the Stanford Research Institute (SRI) for use in industrial machine vision systems. These included (i) the image-oriented bounding box (height, width, area); (ii) the object-oriented bounding box (height, width, area); (iii) the distance from the centroid to the perimeter (minimum radius, minimum-radius angle, maximum radius, maximum-radius angle); (iv) the convex hull; (v) the best-fitting ellipse (defined from second-order moments). Most of these descriptors are not invariant, but they do provide a useful variety of feature measures for feature-based object recognition.

Fourier descriptors: Fourier analysis is often used to describe the shape of smooth curves, especially if they are closed. There are several ways of doing this. In one of the simplest, we imagine moving along the curve and combining the co-ordinate functions (x(s), y(s)) as a complex variable z(s):

    z(s) = x(s) + i y(s)    (46)

which is then written as a Fourier series:

    z(s) = Σ_k a(k) e^(2πiks/L)    (47)

with coefficients

    a(k) = (1/L) ∫_0^L ds e^(−2πiks/L) z(s)    (48)

If we take a finite number of samples at regular intervals of s, Equation 47 becomes a Discrete Fourier Transform. The first few Fourier coefficients describe the gross shape of the curve, whilst the higher-order coefficients usually represent small details or noise. The Fourier coefficients are not themselves invariant under geometric transformations, but (i) the defining equations (47 and 48) may be used to deduce how they transform under rotations, translations, scaling and shifts of the origin of the distance along the curve (see, for example, Pratt p. 370); (ii) certain combinations of the coefficients are invariant under rotation, translation and scale; (iii) Fourier descriptors are very effective when used to describe a curve in its canonical frame of reference.

References

[1] D. Forsyth and J.
Ponce, Computer Vision: A Modern Approach, Prentice Hall, Chapters 1, 2, 3.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000, Chapters 1, 2, 6.

[3] P. Sturm, Algorithms for Plane-Based Pose Estimation, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 2000), IEEE Computer Soc. Press, Los Alamitos, Calif., 2000.
[4] E. Trucco and A. Verri, Introductory Techniques for 3D Computer Vision, Prentice Hall, 1998, Chapter 2.
[5] Z. Zhang, Flexible Camera Calibration by Viewing a Plane from Unknown Orientations, Proc. Int'l Conf. Computer Vision (ICCV 1999), IEEE Computer Soc. Press, Los Alamitos, Calif., 1999.
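As a concrete illustration of the moment descriptors compared above, the following is a minimal Python/NumPy sketch, not code from these notes: it computes central moments (translation invariant), normalized central moments (translation and scale invariant), and, as one common choice of rotation-invariant combinations, the first two Hu invariants. The function names are illustrative.

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw moment m_pq = sum over pixels of x^p y^q I(x, y)."""
    h, w = img.shape
    y, x = np.mgrid[:h, :w]
    return float((x ** p * y ** q * img).sum())

def central_moment(img, p, q):
    """Central moment mu_pq, taken about the centroid: invariant to translation."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    h, w = img.shape
    y, x = np.mgrid[:h, :w]
    return float(((x - xc) ** p * (y - yc) ** q * img).sum())

def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^(1 + (p+q)/2): invariant to translation and scale."""
    mu00 = central_moment(img, 0, 0)
    return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

def hu_first_two(img):
    """First two Hu invariants, which add invariance to rotation."""
    n20 = normalized_central_moment(img, 2, 0)
    n02 = normalized_central_moment(img, 0, 2)
    n11 = normalized_central_moment(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2
```

Translating a binary shape leaves its central moments unchanged, and rotating it leaves the Hu combinations unchanged, which is exactly the behaviour the comparison above attributes to each level of descriptor.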
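The Fourier-descriptor construction of Equations 46-48 can likewise be sketched in a few lines of Python/NumPy (again an illustrative sketch, not code from the notes): sampling the closed curve at regular intervals of s turns Equation 47 into a Discrete Fourier Transform, and one simple invariant combination of the coefficients, discarding a(0), taking magnitudes, and dividing through by |a(1)|, removes translation, rotation, starting point and scale.

```python
import numpy as np

def fourier_coefficients(x, y):
    """Fourier coefficients a(k) of the closed curve z(s) = x(s) + i y(s).

    With regularly spaced samples along the curve, the series of Equation 47
    becomes a Discrete Fourier Transform; a(0) is the centroid of the curve,
    and the low-order a(k) capture its gross shape.
    """
    z = np.asarray(x) + 1j * np.asarray(y)
    return np.fft.fft(z) / len(z)

def invariant_descriptors(x, y):
    """Combinations of the coefficients invariant to translation, rotation,
    scale, and choice of starting point on the curve.

    Dropping a(0) removes translation, taking magnitudes removes the phase
    factors introduced by rotation and starting-point shift, and dividing
    by |a(1)| removes scale.
    """
    a = fourier_coefficients(x, y)
    mags = np.abs(a[1:])
    return mags / mags[0]
```

Applying a similarity transform to the sampled curve multiplies every a(k), k > 0, by the same complex factor, so the descriptors returned by `invariant_descriptors` are unchanged, which is property (ii) noted above.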


More information

Single View Geometry. Camera model & Orientation + Position estimation. What am I?

Single View Geometry. Camera model & Orientation + Position estimation. What am I? Single View Geometry Camera model & Orientation + Position estimation What am I? Vanishing point Mapping from 3D to 2D Point & Line Goal: Point Homogeneous coordinates represent coordinates in 2 dimensions

More information

Perspective Projection in Homogeneous Coordinates

Perspective Projection in Homogeneous Coordinates Perspective Projection in Homogeneous Coordinates Carlo Tomasi If standard Cartesian coordinates are used, a rigid transformation takes the form X = R(X t) and the equations of perspective projection are

More information

2D and 3D Transformations AUI Course Denbigh Starkey

2D and 3D Transformations AUI Course Denbigh Starkey 2D and 3D Transformations AUI Course Denbigh Starkey. Introduction 2 2. 2D transformations using Cartesian coordinates 3 2. Translation 3 2.2 Rotation 4 2.3 Scaling 6 3. Introduction to homogeneous coordinates

More information

Assignment 2 : Projection and Homography

Assignment 2 : Projection and Homography TECHNISCHE UNIVERSITÄT DRESDEN EINFÜHRUNGSPRAKTIKUM COMPUTER VISION Assignment 2 : Projection and Homography Hassan Abu Alhaija November 7,204 INTRODUCTION In this exercise session we will get a hands-on

More information

Image warping , , Computational Photography Fall 2017, Lecture 10

Image warping , , Computational Photography Fall 2017, Lecture 10 Image warping http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2017, Lecture 10 Course announcements Second make-up lecture on Friday, October 6 th, noon-1:30

More information

Computer Vision Lecture 17

Computer Vision Lecture 17 Announcements Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics Seminar in the summer semester Current Topics in Computer Vision and Machine Learning Block seminar, presentations in 1 st week

More information