Single View Metrology


International Journal of Computer Vision 40(2), 2000. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Single View Metrology

A. CRIMINISI, I. REID AND A. ZISSERMAN
Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
criminis@robots.oxford.ac.uk
ian@robots.oxford.ac.uk
az@robots.oxford.ac.uk

Abstract. We describe how 3D affine measurements may be computed from a single perspective view of a scene given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane, and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image, without knowledge of the camera's internal calibration (e.g. focal length), nor of the explicit relation between camera and world (pose). In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the camera's location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first-order error propagation analysis to be performed, associating an uncertainty with each measurement. We demonstrate the technique for a variety of applications, including height measurements in forensic images and 3D graphical modelling from single images.

Keywords: 3D reconstruction, video metrology, photogrammetry

1. Introduction

In this paper we describe how aspects of the affine 3D geometry of a scene may be measured from a single perspective image. We will concentrate on scenes containing planes and parallel lines, although the methods are not so restricted.
The methods we develop extend and generalize previous results on single view metrology (Reid and Zisserman, 1996; Horry et al., 1997; Kim et al., 1998; Proesmans et al., 1998). It is assumed that images are obtained by perspective projection. In addition, we assume that the vanishing line of a reference plane in the scene may be determined from the image, together with a vanishing point for another reference direction (not parallel to the plane). We are then concerned with three canonical types of measurement: (i) measurements of the distance between any of the planes which are parallel to the reference plane; (ii) measurements on these planes (and comparison of these measurements to those obtained on any parallel plane); and (iii) determining the camera's position in terms of the reference plane and direction. The measurement methods developed here are independent of the camera's internal parameters: focal length, aspect ratio, principal point and skew. The camera is always assumed to be uncalibrated, its internal parameters unknown. We analyse situations where the camera (the projection matrix) can only be partially determined from scene landmarks. This is an intermediate situation between calibrated reconstruction (where metric entities like angles between rays can be computed) and completely uncalibrated cameras (where a reconstruction can be obtained only up to a projective transformation).

The ideas in this paper can be seen as reversing the rules for drawing perspective images given by Alberti (1980) in his treatise on perspective (1435). These are the rules followed by the Italian Renaissance painters of the 15th century, and indeed we demonstrate the correctness of their mastery of perspective by analysing a painting by Piero della Francesca.

This paper extends the work in Criminisi et al. (1999b). Here particular attention is paid to: computing Maximum Likelihood estimates of measurements when more than the minimum number of references are available; transferring measurements from one reference plane to another by making use of planar homologies; analysing in detail the uncertainty of the computed distances; and validating the analytical uncertainty predictions by using statistical tests. A number of worked examples are presented to explain the algorithms step by step and demonstrate their validity.

We begin in Section 2 by giving simple geometric derivations of how, in principle, three-dimensional affine information may be extracted from the image (Fig. 1). In Section 3 we introduce an algebraic representation of the problem and show that this representation unifies the three canonical measurement types, leading to simple formulae in each case. In Section 4 we describe how errors in image measurements propagate to errors in the 3D measurements, and hence we are able to compute confidence intervals on the 3D measurements, i.e. a quantitative assessment of accuracy. The work has a variety of applications, and we demonstrate three important ones: forensic measurement, virtual modelling and furniture measurements in Section 5.

2. Geometry

The camera model employed here is central projection. We assume that the vanishing line of a reference plane in the scene may be computed from image measurements, together with a vanishing point for another direction (not parallel to the plane).
This information is generally easily obtainable from images of structured scenes (Collins and Weiss, 1990; McLean and Kotturi, 1995; Liebowitz and Zisserman, 1998; Shufelt, 1999). Effects such as radial distortion (often arising in the slightly wide-angle lenses typically used in security cameras) which corrupt the central projection model can generally be removed (Devernay and Faugeras, 1995), and are therefore not detrimental to our methods. Implementation details for the computation of vanishing points and lines, and for line detection, are given in Appendix A.

Although the schematic figures show the camera centre at a finite location, the results we derive apply also to the case of a camera centre at infinity, i.e. where the images are obtained by parallel projection.

The basic geometry of the plane's vanishing line and the vanishing point is illustrated in Fig. 2. The vanishing line l of the reference plane is the projection of the line at infinity of the reference plane into the image. The vanishing point v is the image of the point at infinity in the reference direction. Note that the reference direction need not be vertical, although for clarity we will often refer to the vanishing point as the vertical vanishing point. The vanishing point is then the image of the vertical footprint of the camera centre on the reference plane. Likewise, the reference plane will often, but not necessarily, be the ground plane, in which case the vanishing line is more commonly known as the horizon.

It can be seen (for example, by inspection of Fig. 2) that the vanishing line partitions all points in scene space. Any scene point which projects onto the vanishing line is at the same distance from the plane as the camera centre; if it lies above the line it is farther from the plane, and if below the vanishing line, then it is closer to the plane than the camera centre.

2.1. Measurements Between Parallel Planes

Figure 1.
Measuring distances of points from a reference plane (the ground) in a single image: (a) The four pillars have the same height in the world, although their images clearly are not of the same length due to perspective effects. (b) As shown, however, all pillars are correctly measured to have the same height.

We wish to measure the distance (in the reference direction) between two parallel planes, specified by the image points x and x′. Figure 3 shows the geometry, with points x and x′ in correspondence. We use upper-case letters (X) to indicate quantities in space and lower-case letters (x) to indicate image quantities.

Figure 2. Basic geometry: The plane's vanishing line l is the intersection of the image plane with a plane parallel to the reference plane and passing through the camera centre C. The vanishing point v is the intersection of the image plane with a line parallel to the reference direction through the camera centre.

Definition 1. Two points X, X′ on separate planes (parallel to the reference plane) correspond if the line joining them is parallel to the reference direction.

Hence the images of corresponding points and the vanishing point are collinear. For example, if the direction is vertical, then the top of an upright person's head and the sole of his/her foot correspond. If the world distance between the two points is known, we term this a reference distance. We show that:

Theorem 1. Given the vanishing line of a reference plane and the vanishing point for a reference direction, then distances from the reference plane parallel to the reference direction can be computed from their imaged end points up to a common scale factor. The scale factor can be determined from one known reference length.

Figure 3. Distance between two planes relative to the distance of the camera centre from one of the two planes: (a) in the world; (b) in the image. The point x on the plane π corresponds to the point x′ on the plane π′. The four aligned points v, x, x′ and the intersection c of the line joining them with the vanishing line define a cross-ratio. The value of the cross-ratio determines a ratio of distances between planes in the world, see text.

Proof: The four points x, x′, c, v marked on Fig. 3(b) define a cross-ratio (Springer, 1964).
The vanishing point is the image of a point at infinity in the scene, and the point c, since it lies on the vanishing line, is the image of a point at distance Z_c from the plane π, where Z_c is the distance of the camera centre from π. In the world the value of the cross-ratio provides an affine length ratio which determines the distance Z between the planes containing X and X′ (in Fig. 3(a)) relative to the camera's distance Z_c from the plane π (or π′ depending on the ordering of the cross-ratio). Note that the distance Z can alternatively be computed using a line-to-line homography, avoiding the ordering ambiguity of the cross-ratio. For the case in Fig. 3(b) we can write

[d(x, c) d(x′, v)] / [d(x′, c) d(x, v)] = [d(X, C) d(X′, V)] / [d(X′, C) d(X, V)]   (1)

where d(x₁, x₂) is the distance between two generic points x₁ and x₂. Since the back-projection of the point v is a point at infinity, d(X′, V)/d(X, V) = 1, and therefore the right-hand side of (1) reduces to d(X, C)/d(X′, C) = Z_c/(Z_c − Z). Simple algebraic

manipulation on (1) yields

Z/Z_c = 1 − [d(x′, c) d(x, v)] / [d(x, c) d(x′, v)]   (2)

The absolute distance Z can be obtained from this distance ratio once the camera's distance Z_c is specified. However it is usually more practical to determine the distance Z via a second measurement in the image, that of a known reference length. In fact, given a known reference distance Z_r, from (2) we can compute the distance of the camera Z_c and then apply (2) to a new pair of end points and compute the distance Z.

We now generalize Theorem 1 to the following.

Definition 2. A set of parallel planes are linked if it is possible to go from one plane to any other plane in the set through a chain of pairs of corresponding points (see also Definition 1).

For example in Fig. 4(a) the planes π, π′, π_r and π′_r are linked by the chain of correspondences X ↔ X′, S₁ ↔ S₂, R₁ ↔ R₂.

Theorem 2. Given a set of linked parallel planes, the distance between any pair of planes is sufficient to determine the absolute distance between any other pair, the link being provided by a chain of point correspondences between the set of planes.

Proof: Figure 4 shows a diagram where four parallel planes are imaged. Note that they all share the same vanishing line, which is the image of the axis of the pencil. The distance Z_r between two of them can be used as reference to compute the distance Z between the other two as follows: From the cross-ratio defined by the four aligned points v, c_r, r₂, r₁ and the known distance Z_r between the points R₁ and R₂ we can compute the distance of the camera from the plane π_r. That camera distance and the cross-ratio defined by the four aligned points v, c_s, s₂, s₁ determine the distance between the planes π_r and π. The distance Z_c of the camera from the plane π is, therefore, determined too. The distance Z_c can now be used in (2) to compute the distance Z between the two planes π and π′.

Figure 4.
Distance between two planes relative to the distance between two other planes: (a) in the world; (b) in the image. The point x on the plane π corresponds to the point x′ on the plane π′. The point s₁ corresponds to the point s₂. The point r₁ corresponds to the point r₂. The distance Z_r in the world between R₁ and R₂ is known and used as reference to compute the distance Z, see text.

In Section 3.1 we give an algebraic derivation of these results which avoids the need to compute the distance of the camera explicitly and simplifies the measurement procedure.

Example. Figure 5 shows that a person's height may be computed from an image given a vertical reference distance elsewhere in the scene. The ground plane is the reference plane. The height of the frame of the window has been measured on site and used as the reference distance (it corresponds to the distance between R₁ and R₂ in the world in Fig. 4(a)). This situation corresponds to the one in Fig. 4 where the two points S₂ and R₁ (and therefore s₂ and r₁) coincide. The height of the person is computed from the cross-ratio defined by the points x, c, x′ and the vanishing point (c.f. Fig. 4(b)) as described in the proof above. Since the points S₂ and R₁ coincide the derivation is simpler.
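The cross-ratio computation of (2) is straightforward to sketch numerically. The following is a minimal illustration of ours (the function names are not from the paper): `height_ratio` implements (2) for 2-D image points, and one known reference distance then upgrades the affine ratio to a metric height, exactly as in the example above.

```python
import numpy as np

def dist(a, b):
    """Euclidean distance between two 2-D image points."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def height_ratio(x, xp, c, v):
    """Equation (2): the affine ratio Z / Z_c from four collinear image
    points: x (base point, on the reference plane), xp (top point), c (the
    intersection of the line through them with the vanishing line) and v
    (the vanishing point of the reference direction)."""
    return 1.0 - (dist(xp, c) * dist(x, v)) / (dist(x, c) * dist(xp, v))

def metric_height(x, xp, c, v, ref, ref_height):
    """Metric upgrade via one reference: ref = (r, rp) images a segment of
    known world length ref_height erected on the same reference plane."""
    z_c = ref_height / height_ratio(ref[0], ref[1], c, v)  # camera distance Z_c
    return height_ratio(x, xp, c, v) * z_c
```

For brevity `metric_height` assumes the reference segment lies along the same image line, so that it shares the point c; in general c is recomputed for each segment as the intersection of the line through its end points with the vanishing line.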

Figure 5. Measuring the height of a person from a single view: (a) original image; (b) the height of the person is computed from the image; the true height is 180 cm, but note that the person is leaning down a bit on his right foot. The vanishing line is shown in white; the vertical vanishing point is not shown since it lies well below the image. The reference distance is in white (the height of the window frame on the right). Compare the marked points with the ones in Fig. 4(b).

2.2. Measurements on Parallel Planes

If the reference plane π is affine calibrated (we know its vanishing line) then from image measurements we can compute:

1. ratios of lengths of parallel line segments on the plane;
2. ratios of areas on the plane.

Moreover the vanishing line is shared by the pencil of planes parallel to the reference plane, hence affine measurements may be obtained for any other plane in the pencil. However, although affine measurements, such as an area ratio, may be made on a particular plane, the areas of regions lying on two parallel planes cannot be compared directly. If the region is parallel projected in the scene from one plane onto the other, affine measurements can then be made from the image since both regions are now on the same plane, and parallel projection between parallel planes does not alter affine properties.

A map in the world between parallel planes induces a projective map in the image between images of points on the two planes. This image map is a planar homology (Springer, 1964), which is a plane projective transformation with five degrees of freedom, having a line of fixed points called the axis, and a distinct fixed point not on the axis known as the vertex. Planar homologies arise naturally in an image when two planes related by a perspectivity in three-dimensional space are imaged (Van Gool et al., 1998). The geometry is illustrated in Fig. 6. In our case the vanishing line of the plane, and the vertical vanishing point, are, respectively, the axis and vertex of the homology which relates a pair of planes in the pencil. The homology can then be parametrized as (Viéville and Lingrand, 1999)

H = I + µ v lᵀ / (v·l)   (3)

where v is the vanishing point, l is the plane vanishing line and µ is a scale factor. Thus v and l specify four of the five degrees of freedom of the homology. The remaining degree of freedom of the homology, µ, is uniquely determined from any pair of image points which correspond between the planes (points r and r′ in Fig. 6). Once the matrix H is computed, each point on a plane can be transferred into the corresponding point on a parallel plane as x′ = Hx. An example of this homology mapping is shown in Fig. 7. Consequently we can compare measurements made on two separate planes. In particular we may compute:

1. the ratio between two lengths measured along parallel lines, one length on each plane;
2. the ratio between two areas, one area on each plane.

In fact we can simply transfer all points from one plane to the reference plane using the homology and then, since the reference plane's vanishing line is known, we

may make affine measurements in the plane, e.g. ratios of lengths on parallel lines or ratios of areas.

Figure 6. Homology mapping between imaged parallel planes: (a) A point X on plane π is mapped into the point X′ on π′ by a parallel projection. (b) In the image the mapping between the images of the two planes is a homology, where v is the vertex and l the axis. The correspondence r → r′ fixes the remaining degree of freedom of the homology from the cross-ratio of the four points: v, i, r′ and r.

Example. Figure 8 shows an example. The vanishing line of the two front-facing walls and the vanishing point are known, as is the point correspondence r, r′ in the reference direction. The ratio of lengths of parallel line segments is computed by using the formulae given in Section 3.2. Notice that errors in the selection of point positions affect the computations; the veridical values of the ratios in Fig. 8 are exact integers. A proper error analysis is necessary to estimate the uncertainty of these affine measurements.

2.3. Determining the Camera Position

In Section 2.1, we computed distances between planes as a ratio relative to the camera's distance from the reference plane. Conversely, we may compute the camera's distance Z_c from a particular plane knowing a single reference distance Z_r. Furthermore, by considering Fig. 2 it is seen that the location of the camera relative to the reference plane is the back-projection of the vertical vanishing point onto the reference plane. This back-projection is accomplished by a homography which maps the image to the reference plane (and vice-versa). Although the choice of coordinate frame in the world is somewhat arbitrary, fixing this frame immediately defines the homography uniquely and hence the camera position.

3. Algebraic Representation

The measurements described in the previous section are computed in terms of cross-ratios.
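Before moving to the algebraic representation, the homology construction of Section 2.2 can be sketched in code. This is our own illustration, not the paper's implementation: given the vertex v, the axis l and one correspondence r → r′ (all homogeneous 3-vectors), it recovers µ in (3) by linear least squares and returns H.

```python
import numpy as np

def homology_from_correspondence(v, l, r, rp):
    """Planar homology H = I + mu * v l^T / (v . l) of equation (3).
    v: vertex (vanishing point), l: axis (vanishing line), r -> rp: one point
    correspondence between the two planes; all homogeneous 3-vectors."""
    v, l, r, rp = (np.asarray(a, float) for a in (v, l, r, rp))
    # H r ~ rp means rp ~ r + k v with k = mu (l . r) / (v . l); solve
    # s * rp = r + k * v for the unknowns (k, s) by linear least squares.
    k, s = np.linalg.lstsq(np.column_stack([v, -rp]), -r, rcond=None)[0]
    mu = k * (v @ l) / (l @ r)
    return np.eye(3) + mu * np.outer(v, l) / (v @ l)
```

Any further point x on the first plane is then transferred by x′ ≃ Hx; in Fig. 7, for instance, this is the mapping that carries the cabinet-top corners down to the floor.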
In this section we develop a uniform algebraic approach to the problem which has a number of advantages over direct geometric construction: first, it avoids potential problems with ordering for the cross-ratio; second, it enables us to deal with both minimal and over-constrained configurations uniformly; third, we unify the different types of measurement within one representation; and fourth, in Section 4 we use this algebraic representation to develop an uncertainty analysis for measurements.

To begin we define an affine coordinate system XYZ in space (Koenderink and Van Doorn, 1991; Quan and Mohr, 1992). Let the origin of the coordinate frame lie on the reference plane, with the X- and Y-axes spanning the plane. The Z-axis is the reference direction, which is thus any direction not parallel to the plane. The image coordinate system is the usual xy affine image frame, and a point X in space is projected to the image point x via a 3 × 4 projection matrix P as:

x = PX = [p₁ p₂ p₃ p₄] X

where x and X are homogeneous vectors of the form x = (x, y, w)ᵀ, X = (X, Y, Z, W)ᵀ, and "=" means equality up to scale. If we denote the vanishing points for the X, Y and Z directions as (respectively) v_X, v_Y and v, then it is clear by inspection (Faugeras, 1993) that the first three columns of P are the vanishing points: v_X = p₁, v_Y = p₂ and v = p₃, and that the final column of P is the

Figure 7. Homology mapping of points from one plane to a parallel one: (a) original image, the floor and the top of the filing cabinet are parallel planes. (b) Their common vanishing line (axis of the homology, shown in white) has been computed by intersecting two sets of horizontal edges. The vertical vanishing point (vertex of the homology) has been computed by intersecting vertical edges. Two corresponding points r and r′ are selected and the homology computed. Three corners of the top plane of the cabinet have been selected and their corresponding points on the floor computed by the homology. Note that occluded corners have been retrieved too. (c) The wire-frame model shows the structure of the cabinet; occluded sides are dashed.

projection of the origin of the world coordinate system, o = p₄. Since our choice of coordinate frame has the X- and Y-axes in the reference plane, p₁ = v_X and p₂ = v_Y are two distinct points on the vanishing line. Choosing these fixes the X and Y affine coordinate axes. We denote the vanishing line by l, and to emphasize that the vanishing points v_X and v_Y lie on it, we denote them by l₁, l₂, with lᵢ·l = 0.

Columns 1, 2 and 4 of the projection matrix are the three columns of the reference plane to image homography. This homography must have rank three, otherwise the reference plane to image map is degenerate. Consequently, the final column (the origin of the coordinate system) must not lie on the vanishing line, since if it does then all three columns are points on the vanishing line, and thus are not linearly independent. Hence we set it to be p₄ = l/‖l‖ = l̂. Therefore the final parametrization of the projection matrix P is:

P = [l₁ l₂ αv l̂]   (4)

Figure 8.
Measuring the ratio of lengths of parallel line segments lying on two parallel scene planes: The points r and r′ (together with the plane vanishing line and the vanishing point) define the homology between the two planes on the facade of the building.

where α is a scale factor, which has an important rôle to play in the remainder of the paper. Note that the vertical vanishing point v imposes two constraints on the P matrix, the vanishing line l imposes two and the α parameter only one, for a total of five independent constraints (at this stage the first two columns of the P matrix are not completely known; the only constraint is that they are orthogonal to the plane vanishing line l, lᵢ·l = 0). In general, however, the P matrix has eleven d.o.f., which can be regarded as comprising eight for the world-to-image homography induced by the reference plane, two for the vanishing point and one for the affine parameter α. In our case the vanishing line determines two of the eight d.o.f. of the homography.

In the following sections we show how to compute various measurements from this projection matrix. Measurements of distances between planes are independent of the first two (in general under-determined) columns of P. If v and l are specified, the only unknown quantity for these measurements is α. Coordinate

measurements within the planes depend on the first two and the fourth columns of P. These columns define an affine coordinate frame within the plane. Affine measurements (e.g. area ratios), though, are independent of the actual coordinate frame and depend only on the fourth column of P. If any metric information on the plane is known, we may impose constraints on the choice of the frame.

3.1. Measurements Between Parallel Planes

Distance of a Plane from the Reference Plane π. We wish to measure the distance between scene planes specified by a point X and a point X′ in the scene (see Fig. 3(a)). These points may be chosen as respectively X = (X, Y, 0)ᵀ and X′ = (X, Y, Z)ᵀ, and their images are x and x′ (Fig. 9). If P is the projection matrix then the image coordinates are

x = P(X, Y, 0, 1)ᵀ,  x′ = P(X, Y, Z, 1)ᵀ

The equations above can be rewritten as

x = ρ(Xp₁ + Yp₂ + p₄)   (5)
x′ = ρ′(Xp₁ + Yp₂ + Zp₃ + p₄)   (6)

where ρ and ρ′ are unknown scale factors, and pᵢ is the ith column of the P matrix. Since p₁·l̂ = p₂·l̂ = 0 and p₄·l̂ = 1, taking the scalar product of (5) with l̂ yields ρ = l̂·x, and therefore (6) can be rewritten as

x′ = ρ′ (x/ρ + αZv)   (7)

By taking the vector product of both terms of (7) with x′ we obtain

x × x′ = −αZρ(v × x′)   (8)

and, finally, taking the norm of both sides of (8) yields

αZ = −‖x × x′‖ / ((l̂·x) ‖v × x′‖)   (9)

(the norms are signed consistently with (8), the two vector products there being parallel). Since αZ scales linearly with α, affine structure has been obtained. If α is known, then a metric value for Z can be immediately computed as:

Z = −‖x × x′‖ / ((p₄·x) ‖p₃ × x′‖)   (10)

Conversely, if Z is known (i.e. it is a reference distance) then (9) provides a means of computing α, and hence removing the affine ambiguity.

Metric Calibration from Multiple References. If more than one reference distance is known then an estimate of α can be derived from an error minimization algorithm. We here show a special case where all distances are measured from the same reference plane and an algebraic error is minimized.
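In code, a signed variant of (9) (obtained by projecting both sides of (8) onto v × x′, which sidesteps the orientation bookkeeping of the norms) reads as below; the function names and the synthetic setup are ours, not the paper's:

```python
import numpy as np

def alpha_z(x, xp, v, l_hat):
    """Signed form of equation (9): returns alpha * Z from the imaged base
    point x (on the reference plane), top point xp, vanishing point v and
    unit-norm vanishing line l_hat (homogeneous 3-vectors)."""
    x, xp, v, l_hat = (np.asarray(a, float) for a in (x, xp, v, l_hat))
    n = np.cross(v, xp)
    return -(np.cross(x, xp) @ n) / ((l_hat @ x) * (n @ n))

def metric_distance(x, xp, v, l_hat, ref, ref_height):
    """Equation (10): metric distance of xp's plane from the reference
    plane, with alpha fixed by one reference ref = (r, rp) of known world
    length ref_height."""
    alpha = alpha_z(ref[0], ref[1], v, l_hat) / ref_height
    return alpha_z(x, xp, v, l_hat) / alpha
```

Both functions are scale-invariant in their homogeneous arguments, so the image points may be supplied at any scale.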
An optimal minimization algorithm will be described in Section 4. For the ith reference distance Zᵢ with end points rᵢ and r′ᵢ we define: βᵢ = ‖rᵢ × r′ᵢ‖, ρᵢ = l̂·rᵢ, γᵢ = ‖v × r′ᵢ‖. Therefore, from (9) we obtain:

αZᵢρᵢγᵢ = −βᵢ   (11)

Note that all the points rᵢ are images of world points Rᵢ on the reference plane π.

Figure 9. Measuring the distance of a plane π′ from the parallel reference plane π, the geometry.

We now define the n × 2 matrix A (reorganising (11)) as:

        [ Z₁ρ₁γ₁   β₁ ]
        [    ⋮      ⋮  ]
    A = [ Zᵢρᵢγᵢ   βᵢ ]
        [    ⋮      ⋮  ]
        [ Zₙρₙγₙ   βₙ ]

where n is the number of reference distances.
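A compact sketch of this algebraic estimate (ours; it uses the signed magnitude of rᵢ × r′ᵢ, as in the signed variant of (9) above, so that with noise-free data the null vector is exactly proportional to (α, 1)ᵀ):

```python
import numpy as np

def estimate_alpha(refs, v, l_hat):
    """Least-squares alpha from n >= 1 reference distances, as in (11)-(12).
    refs: iterable of (r, rp, Z) with imaged end points r, rp and known world
    height Z. Builds the n x 2 matrix A with rows [Z_i rho_i gamma_i, beta_i]
    and returns alpha = s1 / s2, where s is the eigenvector of A^T A
    belonging to the smallest eigenvalue."""
    v, l_hat = np.asarray(v, float), np.asarray(l_hat, float)
    rows = []
    for r, rp, Z in refs:
        r, rp = np.asarray(r, float), np.asarray(rp, float)
        n = np.cross(v, rp)
        beta = (np.cross(r, rp) @ n) / np.linalg.norm(n)  # signed ||r x rp||
        rows.append([Z * (l_hat @ r) * np.linalg.norm(n), beta])
    A = np.asarray(rows)
    s = np.linalg.eigh(A.T @ A)[1][:, 0]   # eigenvalues sorted ascending
    return s[0] / s[1]
```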

If there is no measurement error, or n = 1, then As = 0, where s = (s₁, s₂)ᵀ is a homogeneous 2-vector and

α = s₁/s₂   (12)

In general n > 1 and uncertainty is present in the reference distances. In this case we find the solution s which minimizes ‖As‖. That is the eigenvector of the 2 × 2 matrix M = AᵀA corresponding to its minimum eigenvalue. The parameter α is finally computed from (12). With more reference distances Zᵢ, α is estimated more accurately (see Section 4), but no more constraints are added on the P matrix.

Worked example. In Fig. 10 the distance of a horizontal line from the ground is measured.

The vertical vanishing point v is computed by intersecting vertical (scene) edges;
all images of lines parallel to the ground plane intersect in points on the horizon, therefore: a point v₁ on the horizon is computed by intersecting the edges of the planks on the right side of the shed; a second point v₂ is computed by intersecting the edges of the planks on the left side of the shed and the parallel edges on the roof; the plane vanishing line l is computed by joining those two points (l = v₁ × v₂);
the distance of the top of the frame of the window on the left from the ground has been measured on site and used as reference to compute α as in (9);
the line l′_x, the image of a horizontal line, is selected in the image by choosing any two points on it; the associated vanishing point v_h is computed as v_h = l′_x × l;
the line l_x, which is the image of a line parallel to l′_x in the scene, is constrained to pass through v_h; therefore l_x is specified by choosing one additional point on it;
a point x′ is selected along the line l′_x and its corresponding point x on the line l_x computed as x = (x′ × v) × l_x;
Equation (10) is now applied to the pair of points x, x′ to compute the distance Z.

Distance Between Any Two Parallel Planes.
The projection matrix P from the world to the image is defined in (4) with respect to a coordinate frame on the reference plane (Fig. 9). In this section we determine the projection matrix P′ referred to the parallel plane π′, and we show how distances from the plane π′ can be computed.

Suppose the world coordinate system is translated by Z_r from the plane π onto the plane π′ along the reference direction (Fig. 11); then we can parametrize the new projection matrix P′ as:

P′ = [p₁ p₂ p₃ (Z_r p₃ + p₄)]   (13)

Note that if Z_r = 0 then P′ = P, as expected. The distance Z of the plane π″ from the plane π′ in space can be computed as (c.f. (10)):

Z = −‖x × x′‖ / (ρ ‖p₃ × x′‖)   (14)

Figure 10. Measuring heights using parallel lines: The vertical vanishing point and the vanishing line for the ground plane have been computed. The distance of the top of the window on the left wall from the ground is known and used as reference. The distance of the top of the window on the right wall from the ground is computed from the distance between the two horizontal lines whose images are l′_x and l_x. The top line l′_x is defined by the top edge of the window, and the line l_x is the corresponding one on the ground plane. The distance between them is then computed.

Figure 11. Measuring the distance between any two planes π′ and π″ parallel to the reference plane π.

where the scale factor in (14) is ρ = (p₄·x)/(1 + Z_r p₃·p₄).

Figure 12. Measuring heights of objects on separate planes: The height of the desk is known and the height of the file on the desk is computed.

Worked example. In Fig. 12 the height of a file on a desk is computed from the height of the desk itself.

The ground is the reference plane π and the top of the desk is the plane denoted as π′ in Fig. 11;
the plane vanishing line and vertical vanishing point are computed as usual by intersecting parallel edges;
the distance Z_r between the points r and r′ is known (the height of the desk has been measured on site) and used to compute the α parameter from (9);
Equation (14) is now applied to the end points of the marked segment to compute the height Z = 32.0 cm.

3.2. Measurements on Parallel Planes

As described in Section 2.2, given the homology between two planes π and π′ in the pencil we can transfer all points from one plane to the other and make affine measurements in either plane. The homology between the planes can be derived directly from the two projection matrices (4) and (13). The plane-to-image homographies are extracted from the projection matrices ignoring the third column, to give:

H̃ = [p₁ p₂ p₄],  H̃′ = [p₁ p₂ (Z_r p₃ + p₄)]

Then H = H̃′H̃⁻¹ maps image points on the plane π onto points on the plane π′ and so defines the homology. By inspection, since p₁·p₄ = 0 and p₂·p₄ = 0, we have (I + Z_r p₃p₄ᵀ)H̃ = H̃′; hence the homology matrix H is:

H = I + Z_r p₃ p₄ᵀ   (15)

Alternatively, from (4), the homology matrix can be written as:

H = I + ψ v l̂ᵀ   (16)
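The identity behind (15) is easy to verify numerically. The sketch below is our own; the numeric values are arbitrary but satisfy the parametrization (4), i.e. l₁, l₂ orthogonal to the unit-norm l̂:

```python
import numpy as np

# Parametrization (4): P = [l1  l2  alpha*v  l_hat], with l1, l2 orthogonal
# to the unit-norm vanishing line l_hat, and v the vertical vanishing point.
l_hat = np.array([0.2, -0.1, 1.0]); l_hat /= np.linalg.norm(l_hat)
l1 = np.cross(l_hat, [1.0, 0.0, 0.0])
l2 = np.cross(l_hat, [0.0, 1.0, 0.0])
v = np.array([0.3, 0.5, 1.0])
alpha, Z_r = 0.7, 2.0
p3, p4 = alpha * v, l_hat

# Plane-to-image homographies of pi and pi' (columns 1, 2 and 4 of P, P').
H_t  = np.column_stack([l1, l2, p4])
H_tp = np.column_stack([l1, l2, Z_r * p3 + p4])

H_from_homographies = H_tp @ np.linalg.inv(H_t)         # H = H~' H~^-1
H_closed_form = np.eye(3) + Z_r * np.outer(p3, p4)      # equation (15)
print(np.allclose(H_from_homographies, H_closed_form))  # True
```

The equality holds because p₄·l₁ = p₄·l₂ = 0 and p₄·p₄ = 1, exactly as argued in the text.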
Otherwise, if only v and l are known and two corresponding points r and r are viewed, then the homology parameter ψ in (16) can be computed from (9) (remember that αz r = ψ) without knowing either the distance Z r between the two planes or the α parameter. Examples of homology transfer and affine measurements are shown in Figs. 8 and 13. Worked example. In Fig. 13 we compute the ratio between the areas of two windows A 1 A 2 in the world. The orthogonal vanishing point v is computed by intersecting the edges of the small windows linking the two front planes; the plane vanishing line l (common to both front planes) is computed by intersecting two sets of parallel edges on the two planes; the only remaining parameter ψ of the homology H in (16) is computed from (9) as ψ = r r ( l r) v r each of the four corners of the window on the left is transferred by the homology H onto the corresponding points on the plane of the other window (Fig. 13(b)); Now we have two quadrilaterals on the same plane the image is affine-warped pulling the plane vanishing line to infinity (Liebowitz and Zisserman, 1998); the ratio between the two areas in the world is computed as the ratio between the areas in the affine-warped image. We obtain A 1 A 2 = Determining Camera Position Suppose the camera centre is C = (X c, Y c, Z c, W c ) (see Fig. 2). Then since PC = 0 we have PC = p 1 X c + p 2 Y c + p 3 Z c + p 4 W c = 0 (17)

The solution to this set of equations is given (using Cramer's rule) by

Xc = det[p2 p3 p4],  Yc = −det[p1 p3 p4],  Zc = det[p1 p2 p4],  Wc = −det[p1 p2 p3]   (18)

and the location of the camera centre is defined. If α is unknown we can write:

Xc = det[p2 v p4],  Yc = −det[p1 v p4],  αZc = det[p1 p2 p4],  Wc = −det[p1 p2 v]   (19)

and we obtain the distance Zc of the camera centre from the plane up to the affine scale factor α. As before, we may upgrade the distance Zc to metric with knowledge of α, or use knowledge of the camera height to compute α and upgrade the affine structure.

Note that affine viewing conditions (where the camera centre is at infinity) present no problem in expressions (18) and (19), since in this case l̂ = (0, 0, 1)ᵀ and v = (v1, v2, 0)ᵀ. Hence Wc = 0, so we obtain a camera centre on the plane at infinity, as expected. This point on π∞ represents the viewing direction for the parallel projection.

If the viewpoint is finite (i.e. not affine viewing conditions) then the formula for αZc may be developed further by taking the scalar product of both sides of (17) with the vanishing line l̂. The result is

αZc = −1/(l̂ · v)   (20)

Figure 13. Measuring ratios of areas on separate planes: (a) original image with the two windows highlighted; (b) the left window is transferred onto the plane identified by r′ by the homology mapping (16). The two areas now lie on the same plane and can therefore be compared. The ratio between the areas of the two windows is then computed as A1/A2 = …

Worked example. In Fig. 14 the position of the camera centre with respect to the chosen Cartesian coordinate system is determined. Note that in this case we have chosen p4 to be the point o in the figure instead of l̂.
The ground plane (the X, Y plane) is the reference; the vertical vanishing point is computed by intersecting vertical edges; the two sides of the rectangular base of the porch have been measured, thus providing the position of four points on the reference plane. The world-to-image homography is computed from those points (Criminisi et al., 1999a); the distance of the top of the window frame on the left from the ground has been measured on site and used as reference to compute α as in (9). The 3D position of the camera centre is then computed simply by applying equations (18). We obtain Xc = … cm, Yc = … cm, Zc = 162.8 cm. In Fig. 22(c), the camera has been superimposed into a virtual view of the reconstructed scene.

4. Uncertainty Analysis

Feature detection and extraction, whether manual or automatic (e.g. using an edge detector), can only be

achieved to a finite accuracy. Any features extracted from an image are therefore subject to measurement errors. In this section we consider how these errors propagate through the measurement formulae in order to quantify the uncertainty of the final measurements (Faugeras, 1993). This is achieved by using a first order error analysis. We first analyse the uncertainty on the projection matrix, and then the uncertainty on distance measurements.

4.1. Uncertainty on the P Matrix

The uncertainty in P depends on the location of the vanishing line, the location of the vanishing point, and on α, the affine scale factor. Since only the final two columns contribute, we model the uncertainty in P as a 6 × 6 homogeneous covariance matrix Λ_P. Since the two columns have only five degrees of freedom (two for v, two for l and one for α), the covariance matrix is singular, with rank five. Assuming statistical independence between the two column vectors p3 and p4, the 6 × 6 rank-five covariance matrix Λ_P can be written as:

Λ_P = [Λ_p3 0; 0 Λ_p4]   (21)

Furthermore, assuming statistical independence between α and v, since p3 = αv we have:

Λ_p3 = α²Λ_v + σ_α² v vᵀ   (22)

with Λ_v the homogeneous 3 × 3 covariance of the vanishing point v, and the variance σ_α² computed as in Appendix D. Since p4 = l̂ = l/‖l‖, its covariance is:

Λ_p4 = (∂p4/∂l) Λ_l (∂p4/∂l)ᵀ   (23)

where the 3 × 3 Jacobian ∂p4/∂l is

∂p4/∂l = ((lᵀl) I − l lᵀ) / (lᵀl)^(3/2)

Figure 14. Computing the location of the camera: equations (18) are used to obtain Xc = … cm, Yc = … cm, Zc = … cm.

4.2. Uncertainty on Measurements Between Planes

When making measurements between planes (10), uncertainty arises from the uncertain image locations of the points x and x′ and from the uncertainty in P.
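The assembly of Λ_P from (21)-(23) can be sketched numerically. All inputs below (α, σ_α, v, l and their covariances) are illustrative values, not the paper's; the point of the sketch is that the assembled matrix has the rank-five structure stated above, because normalizing l removes its scale degree of freedom.

```python
import numpy as np

alpha, sigma_alpha = 0.02, 1e-4
v = np.array([350.0, -900.0, 1.0])       # vertical vanishing point
Lam_v = np.diag([4.0, 4.0, 0.0])         # homogeneous point: 2 dof
l = np.array([0.001, 0.004, -1.0])       # plane vanishing line
Lam_l = np.diag([1e-8, 1e-8, 1e-4])

Lam_p3 = alpha**2 * Lam_v + sigma_alpha**2 * np.outer(v, v)   # eq. (22)

ll = l @ l
J = (ll * np.eye(3) - np.outer(l, l)) / ll**1.5               # eq. (23)
Lam_p4 = J @ Lam_l @ J.T

Lam_P = np.zeros((6, 6))                                      # eq. (21)
Lam_P[:3, :3], Lam_P[3:, 3:] = Lam_p3, Lam_p4

# Rank 5 = 2 (v) + 1 (alpha) + 2 (l up to scale): J annihilates l itself.
print(np.linalg.matrix_rank(Lam_P))
```

Note that J l = 0 by construction, which is what makes Λ_p4 (and hence Λ_P) singular.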
The uncertainty in the end points x, x′ of the length to be measured (resulting largely from the finite accuracy with which these features may be located in the image) is modeled by the covariance matrices Λ_x and Λ_x′.

4.2.1. Maximum Likelihood Estimation of the End Points and Uncertainties. In this section we assume a noise-free P matrix; this assumption will be removed in Section 4.2.2. Since in the error-free case x and x′ must be aligned with the vertical vanishing point, we can determine the maximum likelihood estimates (x̂ and x̂′) of their true locations by minimizing the sum of the Mahalanobis distances between the input points x, x′ and their MLE estimates x̂, x̂′:

min over x̂2, x̂′2 of [(x2 − x̂2)ᵀ Λ_x2⁻¹ (x2 − x̂2) + (x′2 − x̂′2)ᵀ Λ_x′2⁻¹ (x′2 − x̂′2)]   (24)

subject to the alignment constraint

vᵀ(x̂ × x̂′) = 0   (25)

(the subscript 2 indicates inhomogeneous 2-vectors). This is a constrained minimization problem. A closed-form solution can be found (by the Lagrange multiplier method) in the special case that Λ_x′2 = γ²Λ_x2

with γ a scalar; unfortunately, in the general case there is no closed-form solution to the problem. Nevertheless, in the general case an initial solution can be computed by using the approximation given in Appendix B, and then refined by running a numerical algorithm such as Levenberg-Marquardt.

Once the MLE end points have been estimated, we use standard techniques (Faugeras, 1993; Clarke, 1998) to obtain a first order approximation to the 4 × 4, rank-three covariance of the MLE 4-vector ζ̂ = (x̂2ᵀ x̂′2ᵀ)ᵀ. Figure 15 illustrates the idea (see Appendix C for details).

Figure 15. Maximum likelihood estimation of the end points: (a) original image (closeup of Fig. 16(b)). (b) The uncertainty ellipses of the end points, Λ_x and Λ_x′, are shown. These ellipses are defined manually, and indicate a confidence region for localizing the points. (c) The MLE end points x̂ and x̂′ are aligned with the vertical vanishing point (outside the image).

4.2.2. Uncertainty on Distance Measurements. Assuming noise in both end points and in the projection matrix, and statistical independence between ζ̂ and P, we obtain a first order approximation for the variance of the distance Z of a point from a plane:

σ_Z² = ∇Z [Λ_ζ̂ 0; 0 Λ_P] ∇Zᵀ   (26)

where ∇Z is the 1 × 10 Jacobian matrix of the function (10) which maps the projection matrix and the end points x, x′ to their world distance Z. The computation of ∇Z is explained in detail in Appendix C.

4.3. Uncertainty on Camera Position

The distance of the camera centre from the reference plane is computed according to (20), which can be rewritten as:

Zc = −(p4ᵀ p3)⁻¹   (27)

If we assume an exact P matrix then the camera distance is exact too, since it depends only on the elements of P; likewise, the accuracy of Zc depends only on the accuracy of the P matrix. Equation (27) maps R⁶ into R, and the associated 1 × 6 Jacobian matrix ∇Zc is readily derived to be

∇Zc = Zc² (p4ᵀ  p3ᵀ)

and, from a first order analysis, the variance of Zc is

σ_Zc² = ∇Zc Λ_P ∇Zcᵀ   (28)

where Λ_P is computed as in Section 4.1. The variances σ_Xc² and σ_Yc² of the X, Y location of the camera can be computed in a similar way (Criminisi et al., 1999a).

4.4. Example: Uncertainty on Measurements Between Planes

In this section we show the effects of the number of reference distances and of image localization error on the predicted uncertainty in measurements. An image obtained from a security camera with a poor quality lens is shown in Fig. 16(a). It has been corrected for radial distortion using the method described by Devernay and Faugeras (1995), and the floor is taken as the reference plane. The scene is calibrated by identifying two points v1, v2 on the reference plane's vanishing line (shown in white at the top of each image) and the vertical vanishing point v. These points are computed by intersecting sets of parallel lines. The uncertainty on each point is assumed to be Gaussian and isotropic with standard deviation 0.1 pixels. The uncertainty of the vanishing line is derived from a first order propagation through the vector product operation l = v1 × v2. The projection matrix P is therefore uncertain, with its covariance given by (21). In addition, the end points of the height to be measured are assumed to be uncertain, with their covariances estimated as in Section 4. The uncertainties in the height measurements shown are computed as 3-standard-deviation intervals.
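The end-point alignment of Section 4.2.1 can be made concrete for the isotropic case (the case in which Appendix B states a closed form exists). One way to solve it, sketched below under that assumption with made-up coordinates, is to note that the best line through the vanishing point minimizes a weighted sum of squared perpendicular distances, which is a smallest-eigenvector problem; the MLE end points are then the orthogonal projections onto that line.

```python
import numpy as np

v = np.array([320.0, -400.0])    # vertical vanishing point (finite)
x = np.array([310.0, 210.0])     # noisy base point, isotropic std r
xp = np.array([315.0, 60.0])     # noisy top point,  isotropic std r'
r, rp = 2.0, 1.0

def perp(p):
    # the perpendicular distance of p to the line through v with unit
    # direction u is |perp(p) . u|
    d = p - v
    return np.array([-d[1], d[0]])

# Weighted least-squares direction: smallest eigenvector of M.
M = np.outer(perp(x), perp(x)) / r**2 + np.outer(perp(xp), perp(xp)) / rp**2
w, U = np.linalg.eigh(M)
u = U[:, 0]                      # direction of the best-fit line

def project(p):
    # orthogonal projection of p onto the line through v with direction u
    return v + ((p - v) @ u) * u

x_hat, xp_hat = project(x), project(xp)
d1, d2 = x_hat - v, xp_hat - v
print(np.isclose(d1[0] * d2[1] - d1[1] * d2[0], 0.0))  # True: constraint (25)
```

After projection the two estimated points are exactly collinear with v, which is the alignment constraint (25) written inhomogeneously.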

Figure 16. Measuring heights and estimating their uncertainty: (a) original image; (b) image corrected for radial distortion, with measurements superimposed. With only one supplied reference height the man's height has been measured to be Z = … ± 3.94 cm (c.f. ground truth value 190 cm). The uncertainty has been estimated by using (26) (the uncertainty bound is at ±3 std. dev.). (c) With two reference heights, Z = … ± 3.47 cm. (d) With three reference heights, Z = … ± 3.27 cm. Note that in the limit Λ_P = 0 (error-free P matrix) the height uncertainty reduces to 2.16 cm for all of (b, c, d); the residual error, in this case, is due only to the error on the two end points.

In Fig. 16(b) one reference height is used to compute the affine scale factor α from (9) (i.e. the minimum number of references). Uncertainty has been assumed in the reference heights, the vertical vanishing point and the plane vanishing line. Once α is computed, other measurements in the same direction are metric. The height of the man has been computed and is shown in the figure. It differs by 4 mm from the known true value. The uncertainty associated with the height of the man is computed from (26) and displayed in Fig. 16(b). Note that the true height value always falls within the computed 3-standard-deviation range, as expected.

As the number of reference distances is increased (see Figs. 16(c) and (d)), the uncertainty on P (in fact just on α) decreases, resulting in a decrease in the uncertainty of the measured height, as theoretically expected (see Appendix D). Equation (12) has been employed here to metric-calibrate the distance from the floor.

Figure 17 shows images of the same scene with the same people, but acquired from a different point of view. As before, the uncertainty on the measurements decreases as the number of references increases (Figs. 17(b) and (c)). The measurement is the same as in the previous view (Fig. 16), thus demonstrating invariance to camera location.
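The measurement chain used in these examples (calibrate the vertical direction with one reference height, then measure other heights) can be sketched end-to-end on synthetic data. The camera below is made up, and rather than reproducing the exact bookkeeping of eqs. (9)-(10) the sketch uses a ratio form in which the normalisation factors of v, l and the image points all cancel, so the recovered height can be compared with ground truth.

```python
import numpy as np

K = np.array([[1000., 0., 320.], [0., 1000., 240.], [0., 0., 1.]])
th = np.deg2rad(115.0)                 # tilt the camera toward the floor
R = np.array([[1., 0., 0.],
              [0., np.cos(th), -np.sin(th)],
              [0., np.sin(th), np.cos(th)]])
Cw = np.array([0.0, -8.0, 5.0])        # camera centre in world coordinates
P = K @ np.column_stack([R, -R @ Cw])

def img(Xw):
    x = P @ np.append(Xw, 1.0)
    return x / x[2]

v = P[:, 2]                            # vertical vanishing point
l = np.cross(P[:, 0], P[:, 1])         # vanishing line of the plane Z = 0

def Q(b, t):
    # point-independent ratio: Q is proportional to alpha * Z for any
    # base point b on the reference plane and top point t above it
    return np.linalg.norm(np.cross(b, t)) / (
        abs(l @ b) * np.linalg.norm(np.cross(v, t)))

Zr = 1.80                              # one known reference height
br, tr = img([0., 0., 0.]), img([0., 0., Zr])
b, t = img([3., 4., 0.]), img([3., 4., 1.50])   # unknown height, truth 1.5

Z = Zr * Q(b, t) / Q(br, tr)
print(round(Z, 3))  # 1.5
```

The cancellation works because l is orthogonal to the first two columns of P, so the factor (l · b) is the same constant for every base point on the reference plane.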
Figure 18 shows an example where the height of the woman and the related uncertainty are computed for two different orientations of the uncertainty ellipses of the end points. In Fig. 18(b) the two input ellipses of Fig. 18(a) have been rotated by approximately 40°, maintaining the size and the position of the centres. The angle between the direction defined by the major axes (the direction of maximum uncertainty) of each ellipse and the measuring direction is smaller than in Fig. 18(a), and the uncertainty in the measurements is greater, as expected.

4.5. Monte Carlo Test

In this section we validate the first order error analysis described above by computing the uncertainty of the height of the man in Fig. 16(d) using our first order

analytical method and comparing it to the uncertainty derived from Monte Carlo simulations, as described in Table 1. Specifically, we compute the statistical standard deviation of the man's height from a reference plane and compare it with the standard deviation obtained from the first order error analysis.

Figure 17. Measuring heights and estimating their uncertainty, second point of view: (a) original image; (b) the image has been corrected for radial distortion and height measurements computed and superimposed. With one supplied reference height, Z = … ± 5.01 cm (c.f. ground truth value 190 cm). (c) With two reference heights, Z = … ± 3.34 cm. See Fig. 16 for details.

Figure 18. Estimating the uncertainty in height measurements for different orientations of the input 3-standard-deviation uncertainty ellipses: (a) cropped version of image 16(b) with measurements superimposed: Z = … ± 2.5 cm (at 3 standard deviations). The ground truth is Z = 170 cm; it lies within the computed range. (b) The input ellipses have been rotated keeping their size and position fixed: Z = … ± 3.1 cm (at 3 standard deviations). The height measurement is less accurate.

Uncertainty is modeled as Gaussian noise and described by covariance matrices. We assume noise on the end points of the three reference distances. Uncertainty is assumed also on the vertical vanishing point, the plane vanishing line and the end points of the height to be measured. Figure 19 shows the results of the test. The base point is randomly distributed according to a 2D non-isotropic Gaussian about the mean location x (on the feet of the man in Fig. 16) with covariance matrix Λ_x (Fig. 19(a)). Similarly, the top point is randomly distributed according to a 2D non-isotropic Gaussian about the mean location x′ (on the head of the man in Fig. 16), with covariance Λ_x′ (Fig. 19(b)). The two covariance matrices are, respectively, Λ_x = … and Λ_x′ = …

Figure 19. Monte Carlo simulation of the example in Fig. 16(d): (a) distribution of the input base point x and the corresponding 3-standard-deviation ellipse. (b) Distribution of the input top point x′ and the corresponding 3-standard-deviation ellipse. Note that figures (a) and (b) are drawn at the same scale. (c) The analytical and simulated distributions of the computed distance Z. The two curves are almost perfectly overlapping.

Table 1. Monte Carlo simulation.

for j = 1 to S (with S the number of samples):
- For each reference: given the measured reference end points r (on the reference plane) and r′, generate a random base point r_j, a random top point r′_j and a random reference distance Z_rj according to the associated covariances.
- Generate a random vanishing point according to its covariance Λ_v.
- Generate a random plane vanishing line according to its covariance Λ_l.
- Compute the α parameter by applying (12) to the references, and the current P matrix (4).
- Generate a random base point x_j and a random top point x′_j for the distance to be computed, according to their respective covariances Λ_x and Λ_x′.
- Project the points x_j and x′_j onto the best fitting line through the vanishing point (see Section 4.2.1).
- Compute the current distance Z_j by applying (10).

The statistical standard deviation of the population of simulated Z_j values is computed as

σ_Z² = (1/S) Σ from j = 1 to S of (Z_j − Z̄)²

(with Z̄ the sample mean) and compared to the analytical one (26). Suitable values for the covariances of the three references, the vanishing point and the vanishing line have been used. The simulation has been run with S = … samples. The analytical and simulated distributions of Z are plotted in Fig. 19(c); the two curves are almost overlapping. The slight differences are due to the assumptions of statistical independence (21, 22, 26) and the first order truncation introduced by the error analysis.
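The same validation can be reproduced in miniature for the camera-height variance of eq. (28): propagate an assumed covariance of (p3, p4) through the first-order Jacobian, then compare against a seeded Monte Carlo run. All numeric inputs below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
p3 = np.array([0.3, 0.5, 2.0])
p4 = np.array([0.0, 0.6, 0.8])
Lam_P = np.diag([1e-6, 1e-6, 1e-5, 1e-7, 1e-7, 1e-7])  # cov of (p3, p4)

Zc = -1.0 / (p4 @ p3)                          # eq. (27)
J = Zc**2 * np.concatenate([p4, p3])[None, :]  # 1x6 Jacobian of (27)
var_Zc = float(J @ Lam_P @ J.T)                # eq. (28)

# Monte Carlo check with correlated Gaussian samples of (p3, p4).
L = np.linalg.cholesky(Lam_P)
samples = np.concatenate([p3, p4]) + rng.standard_normal((200000, 6)) @ L.T
Zc_mc = -1.0 / np.einsum('ij,ij->i', samples[:, 3:], samples[:, :3])
print(abs(Zc_mc.var() - var_Zc) / var_Zc < 0.05)  # True at this noise level
```

As the text notes, the agreement degrades if the input covariances are scaled up, because the linearisation behind (28) then breaks down.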
A comparison between the statistical and analytical standard deviations is reported in the table below, with the corresponding relative error:

First order σ_Z (cm) | Monte Carlo σ_Z (cm) | relative error
… | … | 0.37%

Note that Z = … cm, and the associated first order uncertainty 3σ_Z = 3.27 cm is shown in Fig. 16(d). In the limit Λ_P = 0 (error-free P matrix) the simulated and analytical results are even closer. This result shows the validity of the first order approximation in this case, and numerous other examples have followed the same pattern. However, some care must be exercised: as the input uncertainty increases, not only does the output uncertainty increase, but the relative error between the statistical and analytical output standard deviations also increases. For large covariances the assumption of linearity, and therefore the first order analysis, no longer holds. This is illustrated in the table below, where the relative error is shown for increasing values of the input uncertainties. The uncertainties of the reference distances and end points are multiplied by an increasing factor γ; for instance, if Λ_x is the covariance of the image point x then Λ_x(γ) = γ²Λ_x.

γ | σ_Z | relative error (%)
… | … | …

Figure 20. The height of a person standing by a phonebox is computed: (a) original image. (b) The ground plane is the reference plane, and its vanishing line is computed from the paving stones on the floor. The vertical vanishing point is computed from the edges of the phonebox, whose height is known and used as reference. The vanishing line and reference height are shown. (c) The computed height of the person and the estimated uncertainty are shown. The veridical height is 187 cm. Note that the person is leaning slightly on his right foot.

In the affine case (when the vertical vanishing point and the plane vanishing line are at infinity) the first order error propagation is exact (no longer just an approximation, as in the general projective case), and the analytical and Monte Carlo simulation results coincide.

5. Applications

5.1. Forensic Science

A common requirement in surveillance images is to obtain measurements from the scene, such as the height of a felon. Although the felon has usually departed the scene, reference lengths can be measured from fixtures such as tables and windows. In Fig. 20 we compute the height of the suspicious person standing next to the phonebox. The ground is the reference plane and the vertical is the reference direction. The edges of the paving stones are used to compute the plane vanishing line, and the edges of the phonebox to compute the vertical vanishing point; the height of the phonebox provides the metric calibration in the vertical direction (Fig. 20(b)). The height of the person is then computed using (10) and shown in Fig. 20(c). The ground truth is 187 cm; note that the person is leaning slightly on his right foot.
The associated uncertainty has also been estimated: two uncertainty ellipses have been defined, one on the head of the person and one on the feet, and then propagated through the chain of computations as described in Section 4 to give the 2.2 cm 3-standard-deviation uncertainty range shown in Fig. 20(c).

5.2. Furniture Measurements

In this section another application is described: measuring the heights of furniture such as shelves, tables or windows in an indoor environment. Figure 21(a) shows a desk in The Queen's College upper library in Oxford. The floor is the reference plane and its vanishing line has been computed by intersecting the edges of the floorboards. The vertical vanishing point has been computed by intersecting the vertical edges of the bookshelf. The vanishing line is shown in Fig. 21(b) together with the reference height used. Only one reference height (a minimal set) has been used in this example. The computed heights and associated uncertainties are shown in Fig. 21(c). The uncertainty bound is ±3 standard deviations. Note that the ground truth always falls within the computed uncertainty range. The height of the camera is computed as 1.71 m from the floor.

5.3. Virtual Modelling

In Fig. 22 we show an example of complete 3D reconstruction of a real scene from a single image. Two sets of horizontal edges are used to compute the vanishing line for the ground plane, and vertical edges are used to compute the vertical vanishing point.

Figure 21. Measuring the height of furniture in The Queen's College Upper Library, Oxford: (a) original image. (b) The plane vanishing line (white horizontal line) and reference height (white vertical line) are superimposed on the original image; the marked shelf is 156 cm high. (c) Computed heights and related uncertainties; the uncertainty bound is at ±3 std. dev. The ground truth is: 115 cm for the right-hand shelf, 97 cm for the chair and 149 cm for the shelf at the left. Note that the ground truth always falls within the computed uncertainty range.

The distance of the top of the window to the ground, and the height of one of the pillars, are used as reference heights. Furthermore, the two sides of the base of the porch have been measured, thus defining the metric calibration of the ground plane. Figure 22(b) shows a view of the reconstructed model. Notice that the person is represented simply as a flat silhouette, since we have made no attempt to recover his volume. The position of the camera centre is also estimated and superimposed on a different view of the 3D model in Fig. 22(c).

5.4. Modelling Paintings

Figure 23 shows a masterpiece of Italian Renaissance painting, La Flagellazione di Cristo by Piero della Francesca (…). The painting faithfully follows the geometric rules of perspective, and therefore the methods developed here can be applied to obtain a 3D reconstruction of the scene. Unlike other techniques (Horry et al., 1997), whose main aim is to create convincing new views of the painting regardless of the correctness of the 3D geometry, here we reconstruct a geometrically correct 3D model of the viewed scene (see Figs. 23(c) and (d)). In the painting analysed here, the ground plane is chosen as reference and its vanishing line is computed from the several parallel lines on it. The vertical vanishing point follows from the vertical lines, and consequently the relative heights of people and columns can be computed.
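The plane rectification used for the floor (and earlier for the area-ratio example of Fig. 13) relies on a standard construction: a homography that sends the plane vanishing line to the line at infinity removes the projective distortion, leaving an affine image of the plane. A minimal sketch, with a made-up vanishing line:

```python
import numpy as np

# One homography that maps the vanishing line l to infinity is
# H = [[1,0,0],[0,1,0],[l1,l2,l3]] (as in the rectification literature
# cited in the text, e.g. Liebowitz and Zisserman).
l = np.array([2e-4, 1.5e-3, -1.0])
l = l / np.linalg.norm(l)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [l[0], l[1], l[2]]])

# Lines transform by the inverse transpose of the point homography.
l_mapped = np.linalg.inv(H).T @ l
print(np.allclose(l_mapped[:2], 0.0))  # True: l is now the line at infinity
```

After this warp, parallel world lines on the plane are parallel in the image, so area ratios can be read off directly; the additional square-floor-pattern constraint mentioned above upgrades the rectification from affine to metric.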
Figure 23(b) shows the painting with height measurements superimposed. Christ's height is taken as reference and the heights of the other people are expressed as relative percentage differences. Note the consistency between the heights of the people in the foreground and the heights of the people in the background. By assuming a square floor pattern, the ground plane has been rectified and the position of each object estimated (Liebowitz et al., 1999; Criminisi et al., 1999a; Sturm and Maybank, 1999). The scale of the floor relative to the heights is set from the ratio between the height and base of the fronto-parallel archway. The measurements, up to an overall scale factor, are used to compute a three-dimensional VRML model of the scene. Figure 23(c) shows a view of the reconstructed model. Note that the people are represented as flat silhouettes and the columns have been approximated with cylinders. The partially seen ceiling has been reconstructed correctly. Figure 23(d) shows a different view of the reconstructed model, where the roof has been removed to show the relative positions of the people in the scene.

6. Summary and Conclusions

We have explored how the affine structure of three-dimensional space may be partially recovered from perspective images, in terms of a set of planes parallel to a reference plane and a reference direction not parallel to the reference plane. Algorithms have been described to obtain different kinds of measurements: measuring the distance between planes parallel to a reference plane; computing area and length ratios on two parallel planes; computing the camera's location. A first order error propagation analysis has been performed to estimate uncertainties on the projection matrix and on measurements of point or camera location

in space. The error analysis has been validated by using Monte Carlo statistical tests. Examples have been provided to show the computed measurements and uncertainties on real images. More generally, affine three-dimensional space may be represented entirely by sets of parallel planes and directions (Berger, 1987). We are currently investigating how this full geometry is best represented and computed from a single perspective image.

Figure 22. Complete 3D reconstruction of a real scene: (a) original image; (b) a view of the reconstructed 3D model; (c) a view of the reconstructed 3D model which shows the position of the camera centre (plane location X, Y and height) with respect to the scene.

6.1. Missing Base Point

A restriction of the measurement method we have presented is the need to identify corresponding points between planes. One case where the method does not apply, therefore, is that of measuring the distance of a general 3D point from a reference plane (the corresponding point on the reference plane is undefined); here the homology is under-determined. One case of interest is when only one view is provided and a light source casts shadows onto the reference plane. The light source provides constraints analogous to a second viewpoint (Robert and Faugeras, 1993; Reid and Zisserman, 1996; Reid and North, 1998; Van Gool et al., 1998), so the projection (in the reference direction) of the 3D point onto the reference plane may be determined by making use of the homology defined by the 3D points and their shadows.

Figure 23. Complete 3D reconstruction of a Renaissance painting: (a) La Flagellazione di Cristo (1460, Urbino, Galleria Nazionale delle Marche). (b) Height measurements superimposed on the original image. Christ's height is taken as reference and the heights of all the other people are expressed as percent differences. The vanishing line is dashed. (c) A view of the reconstructed 3D model. The patterned floor has been reconstructed in the areas where it is occluded, by taking advantage of the symmetry of its pattern. (d) Another view of the model, with the roof removed to show the relative positions of people and architectural elements in the scene. Note the repeated geometric pattern on the floor in the area delimited by the columns (barely visible in the painting). Note that the people are represented simply as flat silhouettes, since it is not possible to recover their volume from one image; they have been cut out manually from the original image. The columns have been approximated with cylinders.

Appendix A: Implementation Details

Edge Detection. Straight line segments are detected by Canny edge detection at subpixel accuracy (Canny, 1986); edge linking; segmentation of the edgel chain at high-curvature points; and finally straight line fitting by orthogonal regression to the resulting chain segments (Fig. 24(b)). Lines which are projections of a physical edge in the world often appear broken in the image because of occlusions. A simple merging algorithm based on orthogonal regression has been implemented to merge manually selected edges together. Merging aligned edges to create longer ones increases the accuracy of their location and orientation. An example is shown in Fig. 24(c).

Scene Calibration. Vanishing lines and vanishing points can be estimated directly from the image; no explicit knowledge of the relative geometry between the camera and the viewed scene is required.
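The two scene-calibration constructions are simple in homogeneous coordinates: the image line through two points is their cross product, two lines meet in their cross product, so a vanishing point is the intersection of the images of two parallel world lines, and the vanishing line joins two such vanishing points. A minimal sketch with made-up image coordinates:

```python
import numpy as np

def line(p, q):
    # homogeneous image line through two inhomogeneous points
    return np.cross([*p, 1.0], [*q, 1.0])

def meet(l1, l2):
    # homogeneous intersection point of two lines
    return np.cross(l1, l2)

# Images of two world-parallel lines (first direction in the plane) ...
v1 = meet(line((10, 300), (200, 260)), line((15, 400), (210, 330)))
# ... and of two more, for a second direction in the reference plane.
v2 = meet(line((600, 310), (420, 270)), line((590, 410), (400, 335)))

l = np.cross(v1, v2)   # plane vanishing line through both vanishing points
print(abs(l @ v1) < 1e-6 * np.linalg.norm(l) * np.linalg.norm(v1))  # v1 on l
```

With more than two lines per direction, a least-squares or maximum likelihood fit (as the text describes) replaces the plain cross products.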

Vanishing lines and vanishing points may lie outside the physical image (see Fig. 5), but this does not affect the computations.

Computing the Vanishing Point. All world lines parallel to the reference direction are imaged as lines which intersect in the same vanishing point (see Fig. 2) (Barnard, 1983; Caprile and Torre, 1990). Therefore two such lines are sufficient to define it. However, if more than two lines are available, a Maximum Likelihood Estimate algorithm (Liebowitz and Zisserman, 1998) is employed to estimate the point.

Computing the Vanishing Line. Images of lines parallel to each other and to a plane intersect in points on the plane vanishing line. Therefore two such sets of lines with different directions are sufficient to define the plane vanishing line (Fig. 25). If more than two orientations are available, then the computation of the vanishing line is performed by employing a Maximum Likelihood algorithm.

Figure 24. Computing and merging straight edges: (a) original image; (b) computed edges: some of the edges detected by the Canny edge detector; straight lines have been fitted to them. (c) Edges after merging: different pieces of broken lines, belonging to the same edge in space, have been merged together.

Appendix B: Maximum Likelihood Estimation of End Points for Isotropic Uncertainties

Given two points x and x′ with distributions Λ_x and Λ_x′ isotropic but not necessarily equal, we estimate the points x̂ and x̂′ such that the cost function (24) is minimized and the alignment constraint (25) is satisfied. It is a constrained minimization problem, and a closed-form solution exists in this case. The 2 × 2 covariance matrices Λ_x and Λ_x′ of the two inhomogeneous end points x and x′ define two circles with radii r = σ_x = σ_y and r′ = σ′_x = σ′_y respectively. The line l through the vanishing point v that best fits the points x and x′ is of the form l = (ξ, −1, v_y − ξ v_x)ᵀ, where the slope ξ is given in closed form as a function of the radii r, r′ and of the 2-vectors d = x − v and d′ = x′ − v. Note that this formulation is valid only if v is finite. The orthogonal projections of the points x and x′ onto the line l are the two estimated homogeneous

points x̂ and x̂′:

x̂ = ( l_y(xᵀF l) − l_x l_w,  −l_x(xᵀF l) − l_y l_w,  l_x² + l_y² )ᵀ
x̂′ = ( l_y(x′ᵀF l) − l_x l_w,  −l_x(x′ᵀF l) − l_y l_w,  l_x² + l_y² )ᵀ   (29)

with F = [[0, 1, 0], [−1, 0, 0], [0, 0, 0]], so that xᵀF l = l_y x_x − l_x x_y.

The points x̂ and x̂′ obtained above are used to provide an initial solution in the general non-isotropic covariance case, for which a closed-form solution does not exist. In the general case the non-isotropic covariance matrices Λ_x and Λ_x′ are approximated by isotropic ones with radii

r = det(Λ_x)^(1/4),  r′ = det(Λ_x′)^(1/4)

then (29) is applied, and the solution end points are refined by using a Levenberg-Marquardt numerical algorithm to minimize (24) while satisfying the alignment constraint (25).

Figure 25. Computing the plane vanishing line: the vanishing line for the reference plane (the ground) is shown in solid black. The planks on both sides of the shed define two sets of lines parallel to the ground (dashed); they intersect in points on the vanishing line.

Appendix C: Variance of the Distance Between Planes

Covariance of the MLE End Points

In Appendix B we have shown how to estimate the MLE points x̂ and x̂′. Here we demonstrate how to compute the 4 × 4 covariance matrix of the MLE 4-vector ζ̂ = (x̂ᵀ x̂′ᵀ)ᵀ from the covariances of the input points x and x′ and the covariance of the projection matrix. In order to simplify the following development we define the points b = x on the plane π, and t = x′ on the plane π′ corresponding to x. It can be shown that the 4 × 4 covariance matrix Λ_ζ̂ of the vector ζ̂ = (b̂_x, b̂_y, t̂_x, t̂_y)ᵀ (the MLE top and base points, see Section 4.2.1) can be computed by using the implicit function theorem (Clarke, 1998; Faugeras, 1993) as:

Λ_ζ̂ = A⁻¹ B Λ_ζ Bᵀ A⁻ᵀ   (30)

where ζ = (b_x, b_y, t_x, t_y, p_13, p_23, p_33)ᵀ and

Λ_ζ = diag(Λ_b, Λ_t, Λ_p3)   (31)

Λ_b and Λ_t are the 2 × 2 covariance matrices of the points b and t respectively, and Λ_p3 is the 3 × 3 covariance matrix of the vector p3 = αv defined in (4). Note that the assumption of statistical independence in (31) is a valid one. The matrix A in (30) is a 4 × 4 matrix of the form A = [A1 A2], whose blocks are functions of the MLE end points, the entries of p3 and the input covariances.
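The projection of eq. (29) can be sanity-checked numerically. The point and line below are arbitrary examples; the checks are that the projected point lies exactly on l and that the displacement is along the line's normal (l_x, l_y).

```python
import numpy as np

l = np.array([0.3, -1.0, 50.0])      # image line (lx, ly, lw)
x = np.array([120.0, 80.0, 1.0])     # homogeneous image point
F = np.array([[0., 1., 0.],
              [-1., 0., 0.],
              [0., 0., 0.]])

s = x @ F @ l                         # = ly*xx - lx*xy
x_hat = np.array([l[1] * s - l[0] * l[2],
                  -l[0] * s - l[1] * l[2],
                  l[0]**2 + l[1]**2])  # eq. (29)

print(np.isclose(l @ x_hat, 0.0))     # True: the projection lies on l
```

The same expression with x′ in place of x gives x̂′, and applying it with the best-fit line of Appendix B yields the initial MLE end points.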


Multiple View Geometry in Computer Vision

Multiple View Geometry in Computer Vision Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 Structure Computation Lecture 18 March 22, 2005 2 3D Reconstruction The goal of 3D reconstruction

More information

Chapter 7: Computation of the Camera Matrix P

Chapter 7: Computation of the Camera Matrix P Chapter 7: Computation of the Camera Matrix P Arco Nederveen Eagle Vision March 18, 2008 Arco Nederveen (Eagle Vision) The Camera Matrix P March 18, 2008 1 / 25 1 Chapter 7: Computation of the camera Matrix

More information

CHAPTER 3. Single-view Geometry. 1. Consequences of Projection

CHAPTER 3. Single-view Geometry. 1. Consequences of Projection CHAPTER 3 Single-view Geometry When we open an eye or take a photograph, we see only a flattened, two-dimensional projection of the physical underlying scene. The consequences are numerous and startling.

More information

Module 4F12: Computer Vision and Robotics Solutions to Examples Paper 2

Module 4F12: Computer Vision and Robotics Solutions to Examples Paper 2 Engineering Tripos Part IIB FOURTH YEAR Module 4F2: Computer Vision and Robotics Solutions to Examples Paper 2. Perspective projection and vanishing points (a) Consider a line in 3D space, defined in camera-centered

More information

Epipolar Geometry Prof. D. Stricker. With slides from A. Zisserman, S. Lazebnik, Seitz

Epipolar Geometry Prof. D. Stricker. With slides from A. Zisserman, S. Lazebnik, Seitz Epipolar Geometry Prof. D. Stricker With slides from A. Zisserman, S. Lazebnik, Seitz 1 Outline 1. Short introduction: points and lines 2. Two views geometry: Epipolar geometry Relation point/line in two

More information

Camera Geometry II. COS 429 Princeton University

Camera Geometry II. COS 429 Princeton University Camera Geometry II COS 429 Princeton University Outline Projective geometry Vanishing points Application: camera calibration Application: single-view metrology Epipolar geometry Application: stereo correspondence

More information

Multiple View Geometry in Computer Vision Second Edition

Multiple View Geometry in Computer Vision Second Edition Multiple View Geometry in Computer Vision Second Edition Richard Hartley Australian National University, Canberra, Australia Andrew Zisserman University of Oxford, UK CAMBRIDGE UNIVERSITY PRESS Contents

More information

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION Mr.V.SRINIVASA RAO 1 Prof.A.SATYA KALYAN 2 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRASAD V POTLURI SIDDHARTHA

More information

Camera model and multiple view geometry

Camera model and multiple view geometry Chapter Camera model and multiple view geometry Before discussing how D information can be obtained from images it is important to know how images are formed First the camera model is introduced and then

More information

1 Projective Geometry

1 Projective Geometry CIS8, Machine Perception Review Problem - SPRING 26 Instructions. All coordinate systems are right handed. Projective Geometry Figure : Facade rectification. I took an image of a rectangular object, and

More information

Metric Rectification for Perspective Images of Planes

Metric Rectification for Perspective Images of Planes 789139-3 University of California Santa Barbara Department of Electrical and Computer Engineering CS290I Multiple View Geometry in Computer Vision and Computer Graphics Spring 2006 Metric Rectification

More information

Multiple Views Geometry

Multiple Views Geometry Multiple Views Geometry Subhashis Banerjee Dept. Computer Science and Engineering IIT Delhi email: suban@cse.iitd.ac.in January 2, 28 Epipolar geometry Fundamental geometric relationship between two perspective

More information

Geometric camera models and calibration

Geometric camera models and calibration Geometric camera models and calibration http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2018, Lecture 13 Course announcements Homework 3 is out. - Due October

More information

Computer Vision I - Appearance-based Matching and Projective Geometry

Computer Vision I - Appearance-based Matching and Projective Geometry Computer Vision I - Appearance-based Matching and Projective Geometry Carsten Rother 05/11/2015 Computer Vision I: Image Formation Process Roadmap for next four lectures Computer Vision I: Image Formation

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Camera Calibration Using Line Correspondences

Camera Calibration Using Line Correspondences Camera Calibration Using Line Correspondences Richard I. Hartley G.E. CRD, Schenectady, NY, 12301. Ph: (518)-387-7333 Fax: (518)-387-6845 Email : hartley@crd.ge.com Abstract In this paper, a method of

More information

Epipolar Geometry and the Essential Matrix

Epipolar Geometry and the Essential Matrix Epipolar Geometry and the Essential Matrix Carlo Tomasi The epipolar geometry of a pair of cameras expresses the fundamental relationship between any two corresponding points in the two image planes, and

More information

Midterm Exam Solutions

Midterm Exam Solutions Midterm Exam Solutions Computer Vision (J. Košecká) October 27, 2009 HONOR SYSTEM: This examination is strictly individual. You are not allowed to talk, discuss, exchange solutions, etc., with other fellow

More information

Visual Recognition: Image Formation

Visual Recognition: Image Formation Visual Recognition: Image Formation Raquel Urtasun TTI Chicago Jan 5, 2012 Raquel Urtasun (TTI-C) Visual Recognition Jan 5, 2012 1 / 61 Today s lecture... Fundamentals of image formation You should know

More information

Computer Vision. Coordinates. Prof. Flávio Cardeal DECOM / CEFET- MG.

Computer Vision. Coordinates. Prof. Flávio Cardeal DECOM / CEFET- MG. Computer Vision Coordinates Prof. Flávio Cardeal DECOM / CEFET- MG cardeal@decom.cefetmg.br Abstract This lecture discusses world coordinates and homogeneous coordinates, as well as provides an overview

More information

Parallel and perspective projections such as used in representing 3d images.

Parallel and perspective projections such as used in representing 3d images. Chapter 5 Rotations and projections In this chapter we discuss Rotations Parallel and perspective projections such as used in representing 3d images. Using coordinates and matrices, parallel projections

More information

Single-view metrology

Single-view metrology Single-view metrology Magritte, Personal Values, 952 Many slides from S. Seitz, D. Hoiem Camera calibration revisited What if world coordinates of reference 3D points are not known? We can use scene features

More information

Multiple View Geometry in Computer Vision

Multiple View Geometry in Computer Vision Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 More on Single View Geometry Lecture 11 2 In Chapter 5 we introduced projection matrix (which

More information

METR Robotics Tutorial 2 Week 2: Homogeneous Coordinates

METR Robotics Tutorial 2 Week 2: Homogeneous Coordinates METR4202 -- Robotics Tutorial 2 Week 2: Homogeneous Coordinates The objective of this tutorial is to explore homogenous transformations. The MATLAB robotics toolbox developed by Peter Corke might be a

More information

Linear Multi View Reconstruction and Camera Recovery Using a Reference Plane

Linear Multi View Reconstruction and Camera Recovery Using a Reference Plane International Journal of Computer Vision 49(2/3), 117 141, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Linear Multi View Reconstruction and Camera Recovery Using a Reference

More information

Computer Vision cmput 428/615

Computer Vision cmput 428/615 Computer Vision cmput 428/615 Basic 2D and 3D geometry and Camera models Martin Jagersand The equation of projection Intuitively: How do we develop a consistent mathematical framework for projection calculations?

More information

Homogeneous Coordinates. Lecture18: Camera Models. Representation of Line and Point in 2D. Cross Product. Overall scaling is NOT important.

Homogeneous Coordinates. Lecture18: Camera Models. Representation of Line and Point in 2D. Cross Product. Overall scaling is NOT important. Homogeneous Coordinates Overall scaling is NOT important. CSED44:Introduction to Computer Vision (207F) Lecture8: Camera Models Bohyung Han CSE, POSTECH bhhan@postech.ac.kr (",, ) ()", ), )) ) 0 It is

More information

PROJECTIVE SPACE AND THE PINHOLE CAMERA

PROJECTIVE SPACE AND THE PINHOLE CAMERA PROJECTIVE SPACE AND THE PINHOLE CAMERA MOUSA REBOUH Abstract. Here we provide (linear algebra) proofs of the ancient theorem of Pappus and the contemporary theorem of Desargues on collineations. We also

More information

3D reconstruction class 11

3D reconstruction class 11 3D reconstruction class 11 Multiple View Geometry Comp 290-089 Marc Pollefeys Multiple View Geometry course schedule (subject to change) Jan. 7, 9 Intro & motivation Projective 2D Geometry Jan. 14, 16

More information

Rectification and Distortion Correction

Rectification and Distortion Correction Rectification and Distortion Correction Hagen Spies March 12, 2003 Computer Vision Laboratory Department of Electrical Engineering Linköping University, Sweden Contents Distortion Correction Rectification

More information

Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds

Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds Camera Calibration and 3D Reconstruction from Single Images Using Parallelepipeds Marta Wilczkowiak Edmond Boyer Peter Sturm Movi Gravir Inria Rhône-Alpes, 655 Avenue de l Europe, 3833 Montbonnot, France

More information

Scene Modeling for a Single View

Scene Modeling for a Single View on to 3D Scene Modeling for a Single View We want real 3D scene walk-throughs: rotation translation Can we do it from a single photograph? Reading: A. Criminisi, I. Reid and A. Zisserman, Single View Metrology

More information

Compositing a bird's eye view mosaic

Compositing a bird's eye view mosaic Compositing a bird's eye view mosaic Robert Laganiere School of Information Technology and Engineering University of Ottawa Ottawa, Ont KN 6N Abstract This paper describes a method that allows the composition

More information

ECE 470: Homework 5. Due Tuesday, October 27 in Seth Hutchinson. Luke A. Wendt

ECE 470: Homework 5. Due Tuesday, October 27 in Seth Hutchinson. Luke A. Wendt ECE 47: Homework 5 Due Tuesday, October 7 in class @:3pm Seth Hutchinson Luke A Wendt ECE 47 : Homework 5 Consider a camera with focal length λ = Suppose the optical axis of the camera is aligned with

More information

Machine vision. Summary # 11: Stereo vision and epipolar geometry. u l = λx. v l = λy

Machine vision. Summary # 11: Stereo vision and epipolar geometry. u l = λx. v l = λy 1 Machine vision Summary # 11: Stereo vision and epipolar geometry STEREO VISION The goal of stereo vision is to use two cameras to capture 3D scenes. There are two important problems in stereo vision:

More information

Camera Calibration. Schedule. Jesus J Caban. Note: You have until next Monday to let me know. ! Today:! Camera calibration

Camera Calibration. Schedule. Jesus J Caban. Note: You have until next Monday to let me know. ! Today:! Camera calibration Camera Calibration Jesus J Caban Schedule! Today:! Camera calibration! Wednesday:! Lecture: Motion & Optical Flow! Monday:! Lecture: Medical Imaging! Final presentations:! Nov 29 th : W. Griffin! Dec 1

More information

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies M. Lourakis, S. Tzurbakis, A. Argyros, S. Orphanoudakis Computer Vision and Robotics Lab (CVRL) Institute of

More information

Computer Vision I - Algorithms and Applications: Multi-View 3D reconstruction

Computer Vision I - Algorithms and Applications: Multi-View 3D reconstruction Computer Vision I - Algorithms and Applications: Multi-View 3D reconstruction Carsten Rother 09/12/2013 Computer Vision I: Multi-View 3D reconstruction Roadmap this lecture Computer Vision I: Multi-View

More information

Model Based Perspective Inversion

Model Based Perspective Inversion Model Based Perspective Inversion A. D. Worrall, K. D. Baker & G. D. Sullivan Intelligent Systems Group, Department of Computer Science, University of Reading, RG6 2AX, UK. Anthony.Worrall@reading.ac.uk

More information

Viewing. Reading: Angel Ch.5

Viewing. Reading: Angel Ch.5 Viewing Reading: Angel Ch.5 What is Viewing? Viewing transform projects the 3D model to a 2D image plane 3D Objects (world frame) Model-view (camera frame) View transform (projection frame) 2D image View

More information

Computer Vision I Name : CSE 252A, Fall 2012 Student ID : David Kriegman Assignment #1. (Due date: 10/23/2012) x P. = z

Computer Vision I Name : CSE 252A, Fall 2012 Student ID : David Kriegman   Assignment #1. (Due date: 10/23/2012) x P. = z Computer Vision I Name : CSE 252A, Fall 202 Student ID : David Kriegman E-Mail : Assignment (Due date: 0/23/202). Perspective Projection [2pts] Consider a perspective projection where a point = z y x P

More information

MAPI Computer Vision. Multiple View Geometry

MAPI Computer Vision. Multiple View Geometry MAPI Computer Vision Multiple View Geometry Geometry o Multiple Views 2- and 3- view geometry p p Kpˆ [ K R t]p Geometry o Multiple Views 2- and 3- view geometry Epipolar Geometry The epipolar geometry

More information

Part 0. The Background: Projective Geometry, Transformations and Estimation

Part 0. The Background: Projective Geometry, Transformations and Estimation Part 0 The Background: Projective Geometry, Transformations and Estimation La reproduction interdite (The Forbidden Reproduction), 1937, René Magritte. Courtesy of Museum Boijmans van Beuningen, Rotterdam.

More information

A simple method for interactive 3D reconstruction and camera calibration from a single view

A simple method for interactive 3D reconstruction and camera calibration from a single view A simple method for interactive 3D reconstruction and camera calibration from a single view Akash M Kushal Vikas Bansal Subhashis Banerjee Department of Computer Science and Engineering Indian Institute

More information

Multiple View Geometry in Computer Vision

Multiple View Geometry in Computer Vision Multiple View Geometry in Computer Vision Prasanna Sahoo Department of Mathematics University of Louisville 1 Projective 3D Geometry (Back to Chapter 2) Lecture 6 2 Singular Value Decomposition Given a

More information

Unit 3 Multiple View Geometry

Unit 3 Multiple View Geometry Unit 3 Multiple View Geometry Relations between images of a scene Recovering the cameras Recovering the scene structure http://www.robots.ox.ac.uk/~vgg/hzbook/hzbook1.html 3D structure from images Recover

More information

CS231A Course Notes 4: Stereo Systems and Structure from Motion

CS231A Course Notes 4: Stereo Systems and Structure from Motion CS231A Course Notes 4: Stereo Systems and Structure from Motion Kenji Hata and Silvio Savarese 1 Introduction In the previous notes, we covered how adding additional viewpoints of a scene can greatly enhance

More information

Index. 3D reconstruction, point algorithm, point algorithm, point algorithm, point algorithm, 263

Index. 3D reconstruction, point algorithm, point algorithm, point algorithm, point algorithm, 263 Index 3D reconstruction, 125 5+1-point algorithm, 284 5-point algorithm, 270 7-point algorithm, 265 8-point algorithm, 263 affine point, 45 affine transformation, 57 affine transformation group, 57 affine

More information

Index. 3D reconstruction, point algorithm, point algorithm, point algorithm, point algorithm, 253

Index. 3D reconstruction, point algorithm, point algorithm, point algorithm, point algorithm, 253 Index 3D reconstruction, 123 5+1-point algorithm, 274 5-point algorithm, 260 7-point algorithm, 255 8-point algorithm, 253 affine point, 43 affine transformation, 55 affine transformation group, 55 affine

More information

Hartley - Zisserman reading club. Part I: Hartley and Zisserman Appendix 6: Part II: Zhengyou Zhang: Presented by Daniel Fontijne

Hartley - Zisserman reading club. Part I: Hartley and Zisserman Appendix 6: Part II: Zhengyou Zhang: Presented by Daniel Fontijne Hartley - Zisserman reading club Part I: Hartley and Zisserman Appendix 6: Iterative estimation methods Part II: Zhengyou Zhang: A Flexible New Technique for Camera Calibration Presented by Daniel Fontijne

More information

A Stratified Approach for Camera Calibration Using Spheres

A Stratified Approach for Camera Calibration Using Spheres IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. Y, MONTH YEAR 1 A Stratified Approach for Camera Calibration Using Spheres Kwan-Yee K. Wong, Member, IEEE, Guoqiang Zhang, Student-Member, IEEE and Zhihu

More information

CS 664 Slides #9 Multi-Camera Geometry. Prof. Dan Huttenlocher Fall 2003

CS 664 Slides #9 Multi-Camera Geometry. Prof. Dan Huttenlocher Fall 2003 CS 664 Slides #9 Multi-Camera Geometry Prof. Dan Huttenlocher Fall 2003 Pinhole Camera Geometric model of camera projection Image plane I, which rays intersect Camera center C, through which all rays pass

More information

A Factorization Method for Structure from Planar Motion

A Factorization Method for Structure from Planar Motion A Factorization Method for Structure from Planar Motion Jian Li and Rama Chellappa Center for Automation Research (CfAR) and Department of Electrical and Computer Engineering University of Maryland, College

More information

C / 35. C18 Computer Vision. David Murray. dwm/courses/4cv.

C / 35. C18 Computer Vision. David Murray.   dwm/courses/4cv. C18 2015 1 / 35 C18 Computer Vision David Murray david.murray@eng.ox.ac.uk www.robots.ox.ac.uk/ dwm/courses/4cv Michaelmas 2015 C18 2015 2 / 35 Computer Vision: This time... 1. Introduction; imaging geometry;

More information

Multiple View Geometry

Multiple View Geometry Multiple View Geometry CS 6320, Spring 2013 Guest Lecture Marcel Prastawa adapted from Pollefeys, Shah, and Zisserman Single view computer vision Projective actions of cameras Camera callibration Photometric

More information

Coplanar circles, quasi-affine invariance and calibration

Coplanar circles, quasi-affine invariance and calibration Image and Vision Computing 24 (2006) 319 326 www.elsevier.com/locate/imavis Coplanar circles, quasi-affine invariance and calibration Yihong Wu *, Xinju Li, Fuchao Wu, Zhanyi Hu National Laboratory of

More information

Camera Calibration from the Quasi-affine Invariance of Two Parallel Circles

Camera Calibration from the Quasi-affine Invariance of Two Parallel Circles Camera Calibration from the Quasi-affine Invariance of Two Parallel Circles Yihong Wu, Haijiang Zhu, Zhanyi Hu, and Fuchao Wu National Laboratory of Pattern Recognition, Institute of Automation, Chinese

More information

Parameterization of triangular meshes

Parameterization of triangular meshes Parameterization of triangular meshes Michael S. Floater November 10, 2009 Triangular meshes are often used to represent surfaces, at least initially, one reason being that meshes are relatively easy to

More information

Joint Feature Distributions for Image Correspondence. Joint Feature Distribution Matching. Motivation

Joint Feature Distributions for Image Correspondence. Joint Feature Distribution Matching. Motivation Joint Feature Distributions for Image Correspondence We need a correspondence model based on probability, not just geometry! Bill Triggs MOVI, CNRS-INRIA, Grenoble, France http://www.inrialpes.fr/movi/people/triggs

More information

More Single View Geometry

More Single View Geometry More Single View Geometry with a lot of slides stolen from Steve Seitz Cyclops Odilon Redon 1904 15-463: Computational Photography Alexei Efros, CMU, Fall 2008 Quiz: which is 1,2,3-point perspective Image

More information

Flexible Calibration of a Portable Structured Light System through Surface Plane

Flexible Calibration of a Portable Structured Light System through Surface Plane Vol. 34, No. 11 ACTA AUTOMATICA SINICA November, 2008 Flexible Calibration of a Portable Structured Light System through Surface Plane GAO Wei 1 WANG Liang 1 HU Zhan-Yi 1 Abstract For a portable structured

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Camera Calibration and Light Source Orientation Estimation from Images with Shadows

Camera Calibration and Light Source Orientation Estimation from Images with Shadows Camera Calibration and Light Source Orientation Estimation from Images with Shadows Xiaochun Cao and Mubarak Shah Computer Vision Lab University of Central Florida Orlando, FL, 32816-3262 Abstract In this

More information

3D Computer Vision II. Reminder Projective Geometry, Transformations

3D Computer Vision II. Reminder Projective Geometry, Transformations 3D Computer Vision II Reminder Projective Geometry, Transformations Nassir Navab" based on a course given at UNC by Marc Pollefeys & the book Multiple View Geometry by Hartley & Zisserman" October 21,

More information

Computer Vision Projective Geometry and Calibration. Pinhole cameras

Computer Vision Projective Geometry and Calibration. Pinhole cameras Computer Vision Projective Geometry and Calibration Professor Hager http://www.cs.jhu.edu/~hager Jason Corso http://www.cs.jhu.edu/~jcorso. Pinhole cameras Abstract camera model - box with a small hole

More information

CSE528 Computer Graphics: Theory, Algorithms, and Applications

CSE528 Computer Graphics: Theory, Algorithms, and Applications CSE528 Computer Graphics: Theory, Algorithms, and Applications Hong Qin Stony Brook University (SUNY at Stony Brook) Stony Brook, New York 11794-2424 Tel: (631)632-845; Fax: (631)632-8334 qin@cs.stonybrook.edu

More information

Projective 2D Geometry

Projective 2D Geometry Projective D Geometry Multi View Geometry (Spring '08) Projective D Geometry Prof. Kyoung Mu Lee SoEECS, Seoul National University Homogeneous representation of lines and points Projective D Geometry Line

More information

THE INVERSE PROBLEM IN PERSPECTIVE CHIA WAN TING. (B.Sc.(Hons.), NUS) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE

THE INVERSE PROBLEM IN PERSPECTIVE CHIA WAN TING. (B.Sc.(Hons.), NUS) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE THE INVERSE PROBLEM IN PERSPECTIVE CHIA WAN TING (B.Sc.(Hons.), NUS) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE DEPARTMENT OF MATHEMATICS NATIONAL UNIVERSITY OF SINGAPORE 2006 Acknowledgements

More information

CSE 252B: Computer Vision II

CSE 252B: Computer Vision II CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribe: Jayson Smith LECTURE 4 Planar Scenes and Homography 4.1. Points on Planes This lecture examines the special case of planar scenes. When talking

More information

Guang-Hui Wang et al.: Single View Based Measurement on Space Planes 375 parallelism of lines and planes. In [1, ], a keypoint-based approach to calcu

Guang-Hui Wang et al.: Single View Based Measurement on Space Planes 375 parallelism of lines and planes. In [1, ], a keypoint-based approach to calcu May 004, Vol.19, No.3, pp.374 38 J. Comput. Sci. & Technol. Single View Based Measurement on Space Planes Guang-Hui Wang, Zhan-Yi Hu, and Fu-Chao Wu National Laboratory of Pattern Recognition, Institute

More information

Partial Calibration and Mirror Shape Recovery for Non-Central Catadioptric Systems

Partial Calibration and Mirror Shape Recovery for Non-Central Catadioptric Systems Partial Calibration and Mirror Shape Recovery for Non-Central Catadioptric Systems Abstract In this paper we present a method for mirror shape recovery and partial calibration for non-central catadioptric

More information

Perspective Projection in Homogeneous Coordinates

Perspective Projection in Homogeneous Coordinates Perspective Projection in Homogeneous Coordinates Carlo Tomasi If standard Cartesian coordinates are used, a rigid transformation takes the form X = R(X t) and the equations of perspective projection are

More information

3D Geometry and Camera Calibration

3D Geometry and Camera Calibration 3D Geometry and Camera Calibration 3D Coordinate Systems Right-handed vs. left-handed x x y z z y 2D Coordinate Systems 3D Geometry Basics y axis up vs. y axis down Origin at center vs. corner Will often

More information

Euclidean Reconstruction Independent on Camera Intrinsic Parameters

Euclidean Reconstruction Independent on Camera Intrinsic Parameters Euclidean Reconstruction Independent on Camera Intrinsic Parameters Ezio MALIS I.N.R.I.A. Sophia-Antipolis, FRANCE Adrien BARTOLI INRIA Rhone-Alpes, FRANCE Abstract bundle adjustment techniques for Euclidean

More information

Chapters 1 7: Overview

Chapters 1 7: Overview Chapters 1 7: Overview Chapter 1: Introduction Chapters 2 4: Data acquisition Chapters 5 7: Data manipulation Chapter 5: Vertical imagery Chapter 6: Image coordinate measurements and refinements Chapter

More information

Camera models and calibration

Camera models and calibration Camera models and calibration Read tutorial chapter 2 and 3. http://www.cs.unc.edu/~marc/tutorial/ Szeliski s book pp.29-73 Schedule (tentative) 2 # date topic Sep.8 Introduction and geometry 2 Sep.25

More information

SINGLE VIEW GEOMETRY AND SOME APPLICATIONS

SINGLE VIEW GEOMETRY AND SOME APPLICATIONS SINGLE VIEW GEOMERY AND SOME APPLICAIONS hank you for the slides. hey come mostly from the following sources. Marc Pollefeys U. on North Carolina Daniel DeMenthon U. of Maryland Alexei Efros CMU Action

More information

Vision par ordinateur

Vision par ordinateur Epipolar geometry π Vision par ordinateur Underlying structure in set of matches for rigid scenes l T 1 l 2 C1 m1 l1 e1 M L2 L1 e2 Géométrie épipolaire Fundamental matrix (x rank 2 matrix) m2 C2 l2 Frédéric

More information

EXPERIMENTAL RESULTS ON THE DETERMINATION OF THE TRIFOCAL TENSOR USING NEARLY COPLANAR POINT CORRESPONDENCES

EXPERIMENTAL RESULTS ON THE DETERMINATION OF THE TRIFOCAL TENSOR USING NEARLY COPLANAR POINT CORRESPONDENCES EXPERIMENTAL RESULTS ON THE DETERMINATION OF THE TRIFOCAL TENSOR USING NEARLY COPLANAR POINT CORRESPONDENCES Camillo RESSL Institute of Photogrammetry and Remote Sensing University of Technology, Vienna,

More information

Agenda. Rotations. Camera models. Camera calibration. Homographies

Agenda. Rotations. Camera models. Camera calibration. Homographies Agenda Rotations Camera models Camera calibration Homographies D Rotations R Y = Z r r r r r r r r r Y Z Think of as change of basis where ri = r(i,:) are orthonormal basis vectors r rotated coordinate

More information

Image Formation. Antonino Furnari. Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania

Image Formation. Antonino Furnari. Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania Image Formation Antonino Furnari Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania furnari@dmi.unict.it 18/03/2014 Outline Introduction; Geometric Primitives

More information

Projective geometry, camera models and calibration

Projective geometry, camera models and calibration Projective geometry, camera models and calibration Subhashis Banerjee Dept. Computer Science and Engineering IIT Delhi email: suban@cse.iitd.ac.in January 6, 2008 The main problems in computer vision Image

More information

Perspective Projection [2 pts]

Perspective Projection [2 pts] Instructions: CSE252a Computer Vision Assignment 1 Instructor: Ben Ochoa Due: Thursday, October 23, 11:59 PM Submit your assignment electronically by email to iskwak+252a@cs.ucsd.edu with the subject line

More information

Two-view geometry Computer Vision Spring 2018, Lecture 10

Two-view geometry Computer Vision Spring 2018, Lecture 10 Two-view geometry http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 10 Course announcements Homework 2 is due on February 23 rd. - Any questions about the homework? - How many of

More information

CSE 252B: Computer Vision II

CSE 252B: Computer Vision II CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribe: Haowei Liu LECTURE 16 Structure from Motion from Tracked Points 16.1. Introduction In the last lecture we learned how to track point features

More information

Camera Calibration using Vanishing Points

Camera Calibration using Vanishing Points Camera Calibration using Vanishing Points Paul Beardsley and David Murray * Department of Engineering Science, University of Oxford, Oxford 0X1 3PJ, UK Abstract This paper describes a methodformeasuringthe

More information

Camera Calibration and Light Source Estimation from Images with Shadows

Camera Calibration and Light Source Estimation from Images with Shadows Camera Calibration and Light Source Estimation from Images with Shadows Xiaochun Cao and Mubarak Shah Computer Vision Lab, University of Central Florida, Orlando, FL, 32816 Abstract In this paper, we describe

More information

Single-view 3D Reconstruction

Single-view 3D Reconstruction Single-view 3D Reconstruction 10/12/17 Computational Photography Derek Hoiem, University of Illinois Some slides from Alyosha Efros, Steve Seitz Notes about Project 4 (Image-based Lighting) You can work

More information

A General Expression of the Fundamental Matrix for Both Perspective and Affine Cameras

A General Expression of the Fundamental Matrix for Both Perspective and Affine Cameras A General Expression of the Fundamental Matrix for Both Perspective and Affine Cameras Zhengyou Zhang* ATR Human Information Processing Res. Lab. 2-2 Hikari-dai, Seika-cho, Soraku-gun Kyoto 619-02 Japan

More information

Detection of Concentric Circles for Camera Calibration

Detection of Concentric Circles for Camera Calibration Detection of Concentric Circles for Camera Calibration Guang JIANG and Long QUAN Department of Computer Science Hong Kong University of Science and Technology Kowloon, Hong Kong {gjiang,quan}@cs.ust.hk

More information

Geometry of Single Axis Motions Using Conic Fitting

Geometry of Single Axis Motions Using Conic Fitting IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO., OCTOBER 23 343 Geometry of Single Axis Motions Using Conic Fitting Guang Jiang, Hung-tat Tsui, Member, IEEE, Long Quan, Senior

More information

Stereo Observation Models

Stereo Observation Models Stereo Observation Models Gabe Sibley June 16, 2003 Abstract This technical report describes general stereo vision triangulation and linearized error modeling. 0.1 Standard Model Equations If the relative

More information