Towards Direct Recovery of Shape and Motion Parameters from Image Sequences


Stephen Benoit and Frank P. Ferrie

McGill University, Center for Intelligent Machines, 3480 University St., Montréal, Québec, CANADA H3A 2A7

Abstract

A novel procedure is presented to construct image-domain filters (receptive fields) that directly recover local motion and shape parameters. These receptive fields are derived by training on the image deformations that best discriminate between different shape and motion parameters. Beginning with the construction of 1-D receptive fields that detect local surface shape and motion parameters within cross sections, we show how the recovered model parameters are sufficient to produce local estimates of optical flow, focus of expansion, and time to collision. The theory is supported by a series of experiments on well-known image sequences for which ground truth is available. Comparisons against published results are quite competitive, which we believe to be significant given the local, feed-forward nature of the resulting algorithms.

Key words: structure from motion, template matching, optical flow, focus of expansion, time to collision

1 Introduction

Recovery of structure from motion has been examined from a variety of approaches, mainly feature point extraction and correspondence [4,8] or computation of dense optical flow [5,9,]. Typically, the Fundamental Matrix framework or a global motion model is used to solve for global motion, after which the relative 3-D positions of points of interest in the scene can be computed [,9].

Email address: {benoits, ferrie}@cim.mcgill.ca (Stephen Benoit and Frank P. Ferrie).

Preprint submitted to Elsevier Science, 7 September 2005

2 Direct recovery of parameters by recognising characteristic motions in a scene has been examined using point correspondences[] and local optical flow field deformations[9,3,4]. However, both methods require the extraction of motion information from the image sequence before parameter recovery. Many of these techniques require local regularization; insufficient information is available in each sample to reconstruct the surface shape[8]. Appearance-based methods have been mostly discarded for structure from motion because much of the shape and motion information are so confounded that they cannot be recovered separately or locally[7]. Soatto proved that perspective is non-linear, therefore no coordinate system will linearize perspective effects[3]. Part of the difficulty involves how regional image changes can be represented in an appearance-based framework. Most approaches involve solving a local flow field as a weighted sum of basis flow fields with some perceptual significance[,] or tracking specific image features[7] with wavelets[], typically training on image sequences of motions of interest[3]. Converting these representations into image domain operations[6,5] could allow direct recovery of significant model parameters without solving a local optimization problem by gradient descent. Two causes prevent the effective direct recovery of local structure from motion with appearance-based methods. First is representation: how to encode image deformation independently of image texture and without the need of feature points. Second is how to map the image deformations into some optimal image operators. This paper addresses both issues by describing a representation and a methodology for designing and using these optimal image domain operators for appearance-based local recovery of some shape and motion parameters. In particular, we show how the resulting feed-forward system leads to an efficient and robust algorithm for estimating dense fields of optical flow and time to collision, as well as accurate localization of the focus of expansion. The optimal operator synthesis will discover what spatial scales most appropriately describe the different deformations for the camera model. Instead of attempting to recover the shape and motion parameters of a scene over the entire parameter space, only those shapes and motions that cause distinctly different image deformations are considered. Even with aliasing of some parameters into similar appearances, we show that enough information can be extracted from the image deformations to recognise specific shapes or motions which are useful for the aforementioned tasks. We begin in Section by considering a simplified structure and motion model that relates cross-sections of a 3-D surface S to its -D image projections. The

reduction in dimensionality makes the analysis tractable, while application of the model in multiple orientations relates to 3-D via the geometry of normal sections [8]. Next, in Section 3, we present a solution to the inverse problem of recovering model parameters from image observations. Correspondence matrix theory is introduced as a general framework for casting model recovery as an appearance matching problem. The observability of shape and motion parameters is discussed next in Section 4. As might be expected, the confounding of shape and motion limits what can be recovered, but considerable information is still available locally. Section 5 looks at the specifics of what can be recovered, and shows how estimates of optical flow, focus of expansion, and time to collision are available directly from these parameters along with confidence measures. The theory is put to the test in Section 6, which evaluates the resulting algorithms in the recovery of the predicted scene features. Interestingly, despite their local feed-forward nature, very competitive results are obtained relative to the published literature. The paper concludes in Section 7 with a discussion of the results and pointers to future research.

2 Image Formation Model for Local Shape and Motion

A convenient representation for a local region S about a surface point P ∈ S is the augmented Darboux frame (Figure 1) [8], consisting of a vector frame comprised of the surface normal N_P at P and two orthogonal vectors in the tangent plane T_P, namely M_P and M̄_P, corresponding to the directions of maximum and minimum curvature respectively [8]. Of particular interest to this discussion is the intersection of the plane π_{N_P}, containing the normal and the point P, with the surface S. The intersection is a curve in 3-D, referred to as a normal section, and the curvature of the section at the intersection point P is called the normal curvature, κ_N. For a smooth region (C^2) about P, κ_N takes on maximum, κ_M, and minimum, κ_M̄, values as π_{N_P} is rotated. The directions associated with these extremal values are the principal directions referred to above. Given this parameterization it is straightforward to define local motion and shape change in terms of a transformation of the frame at P.

With little loss of generality, the complexity of this model can be reduced significantly by considering projections in 1-D as shown in Figure 1. We will use this approach to develop a local shape and motion model for a normal section that will subsequently be used to determine the components of motion and shape change in the corresponding image direction. This formulation requires an image tesselation of the form shown in Figure 2, with the model relating a 1-D patch (x_a, y_a, θ_i, t_1) to its correspondent at (x_a, y_a, θ_i, t_2). An oriented slit in this figure corresponds to the white rectangular region within image I_1 shown in Figure .

Fig. 1. Differential geometry of a local surface patch and the projection of a normal section.

Fig. 2. Oriented slits at image coordinates (x_a, y_a) at multiple orientations θ_i cover the image plane at instants t_1 and t_2.

The image formation process for each 1-D image slit also requires a camera model, and the one chosen for the experiments in this paper, shown in Figure 3, uses perspective projection. As shown, the fixation point O is the point on the surface in the center of the aperture's field of view, i.e. view angle γ = 0. γ ranges in value from −α/2 to +α/2, with positive γ to the right (positive x direction). The image plane is made up of N equally-sized, non-overlapping pixels that fit within the aperture. In a physical camera, the image plane is actually behind the focal point with an image inversion, but placing it in front of the focal point simplifies the mathematics without loss of generality. The actual distance between the focal point and the center of the image plane, h, or, proportionally, the actual dimensions of each pixel, are arbitrary; these absolute scales will be factored out and eliminated. For any view angle γ, formed between the V P O and V P P lines and lying within −α/2 and +α/2, there is a corresponding value for the image plane coordinate x that can be projected into one of the pixel coordinates between 0 and N − 1,

i = \frac{N-1}{2} + \frac{N-1}{2} \, \frac{\tan(\gamma)}{\tan(\alpha/2)}.  (1)
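To make the camera model concrete, here is a minimal Python/NumPy sketch of the pixel/angle mapping of Equation (1) together with its algebraic inverse (introduced next as Equation (2)). The function names and the 0-based pixel index running from 0 to N − 1 are my assumptions, not notation from the paper.

    import numpy as np

    def angle_to_pixel(gamma, N, alpha):
        """Project a view angle gamma (radians) onto a pixel index in [0, N-1].

        Implements i = (N-1)/2 + (N-1)/2 * tan(gamma)/tan(alpha/2) for a 1-D
        perspective camera with aperture angle alpha and N pixels.
        """
        half = (N - 1) / 2.0
        return half + half * np.tan(gamma) / np.tan(alpha / 2.0)

    def pixel_to_angle(i, N, alpha):
        """Map a pixel index i back to the corresponding view angle gamma."""
        half = (N - 1) / 2.0
        return np.arctan((i - half) / half * np.tan(alpha / 2.0))

    if __name__ == "__main__":
        N, alpha = 64, np.deg2rad(9.0)          # example slit length and aperture
        gammas = np.deg2rad(np.array([-4.0, 0.0, 4.0]))
        idx = angle_to_pixel(gammas, N, alpha)
        print(idx)                               # edges map near 0 and N-1, centre to (N-1)/2
        print(np.rad2deg(pixel_to_angle(idx, N, alpha)))  # recovers the original angles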

Fig. 3. The 1-D camera model.

And, conversely, the image coordinate i can be mapped back to the view angle γ,

γ = \tan^{-1}\left[ \frac{2i - N + 1}{N - 1} \, \tan(\alpha/2) \right].  (2)

With an image formation model now in place, we can examine the parametric relationship between two corresponding 1-D slits. Exploiting the geometry of normal sections (Figure 1), a simple yet expressive shape and motion model can be devised using only 5 parameters, m = (Ω, δ, η, β, k). The model is shown graphically in Figure 4.

Referring to the diagram in Figure 4, the surface shape along one direction can be characterized by a curvature K, a normal vector N and a distance from the viewpoint, d. Distance d scales all lengths of the diagram, so it is factored out to a canonical representation with unit distance between the first viewpoint V P_1 and the fixation point O on the surface. The surface normal vector N at O is encoded by the angle η with respect to the first view axis V P_1 O. The curvature of the canonical surface, the reciprocal of the radius of the circular approximation to the surface, becomes k = Kd (all lengths, including the radius of curvature R, are scaled by d). The motion model is chosen to minimize image deformation due to translation, defining the second viewpoint V P_2 at a given distance δ, at an angle β from the first view axis V P_1 O. V P_2 is fixated on point Q on the surface, a view rotation Ω away from the first fixation point.

Collectively, the camera and shape and motion models are sufficient to describe the forward mapping of a 3-D contour, parameterized by orientation, η, and curvature, k, onto a 1-D image slit, I_1, and then onto a corresponding image slit, I_2, via translation, Ω, change in distance to viewer, δ, and change in viewpoint direction, β. The role of each model parameter is summarized below in Table 1. This 1-D cross-section shape and motion model can be aligned with multiple

Fig. 4. What does a 1-D slit see of a 3-D world? The 2-D curve model shows the image correspondence of a surface point P imaged in the first view at angle γ_1 and in the second view at angle γ_2 relative to their respective fixation directions.

Ω (View direction translation angle): −0.4, translate left 3/64 pixels; 0, no change; +0.4, translate right 3/64 pixels.

δ (Relative distance to surface): 0.75, zoom in 25%; 1.0, no change; 1.25, zoom out 25%.

η (Surface normal angle): −45°, normal pointing 45° left of V P_1; 0°, normal pointing toward V P_1; +45°, normal pointing 45° right of V P_1.

β (Viewpoint change angle): negative, V P_2 moves to the left of V P_1; 0, V P_2 stays in line with V P_1; positive, V P_2 moves to the right of V P_1.

k (Relative surface curvature): −4, concave surface; 0, flat surface; +4, convex surface.

Table 1. Parameters of the structure from motion model. N and α are known constants from the camera model.

7 orientations at each image position to reconstruct 3-D surface shape and motion. In theory, this model is sufficiently general to locally describe the shape and motion of a surface. Already, by examining only local shape and motion, the local image deformations (and structure from motion) can be expressed with a model of very few parameters. One concern with the use of -D image slits is the inevitability that some image motions will cause the texture seen by a slit at one instant to move perpendicularly to the slit s orientation, effectively leaving the slit with no meaningful correspondence. Such slits are thus observing stimuli that violate the -D shape and motion model, and recovered parameters will have low likelihoods compared to -D slits aligned with the image motion. More conveniently, with multiple orientations of slits at every location in the image, there will be at least one image slit that is aligned with the dominant image motion and satisfies the -D shape and motion model. An alternative would have been to construct a more complicated model to describe the motion of a surface with two views of -D image data. The guiding principle in this work has been to introduce only as much complexity as necessary to recover structure from motion. In this case, the more complicated 3-D shape and motion has been decoupled into a small set of oriented -D detectors. As will be shown in the experimental results of Section 6, the oriented -D slits produce useful structural data from the scenes. Now that an image formation model has been chosen and parameterized, there remains the issue of recognising the image deformations from moving image samples. To this end, the next section introduces the correspondence matrix representation of image deformation and the recognition theory it provides. 3 Correspondence Matrix Theory Given a pair of images I and I at times t and t respectively, the task is now to recover model parameters m = (Ω, δ, η, β, k). Towards this end we propose an appearance-based strategy. In this section we describe a theory for generating a set of image domain operator pairs, with each pair corresponding to a particular instance of m, and responding maximally when I and I correspond to the transformation associated with m. The inverse problem is formulated as template matching. As we shall demonstrate later in Section 4, the complexity of the associated search problem is made tractable by the properties of the parameter space. Indeed, we show that of the 5 parameters, only Ω and δ can be recovered unambiguously. Fortunately this turns out to be sufficient for recovering useful properties of shape and motion. 7
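Before that machinery is developed, it may help to see the shape of the search problem. The sketch below, a schematic only, enumerates a discretized set of model hypotheses m_i (the ranges echo Table 1; the number of levels, the β range, and the score function are placeholder assumptions) and keeps the hypothesis whose operators respond maximally, which is exactly the template-matching formulation described above.

    import itertools
    import numpy as np

    def hypothesis_grid():
        """Discretize the model space m = (Omega, delta, eta, beta, k)."""
        omegas = np.linspace(-0.4, 0.4, 5)        # view direction translation angle
        deltas = np.linspace(0.75, 1.25, 5)       # relative distance to surface
        etas   = np.linspace(-45.0, 45.0, 5)      # surface normal angle
        betas  = np.linspace(-10.0, 10.0, 5)      # viewpoint change angle (range assumed)
        ks     = np.linspace(-4.0, 4.0, 5)        # relative surface curvature
        return list(itertools.product(omegas, deltas, etas, betas, ks))

    def best_hypothesis(I1, I2, hypotheses, score):
        """Template matching: keep the hypothesis whose operators respond maximally.

        `score(m, I1, I2)` is a stand-in for the likelihood developed in Section 3.
        """
        scores = [score(m, I1, I2) for m in hypotheses]
        return hypotheses[int(np.argmax(scores))], max(scores)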

Putting aside the issue of how we sample the parameter space for the moment, to each set of parameters m_i we associate a transformation H_i that accounts for the corresponding image appearance change. Given that I_1 and I_2 are vectors, this can be conveniently represented in matrix form. As will be seen shortly, H_i can be determined procedurally given I_1 and I_2, which are in turn synthesized using the forward and camera models described in the previous section. Thus, for each instance of m we determine a corresponding image pair (I_1, I_2)_i from which H_i is determined. Correspondence Matrix Theory [5,4] specifies both how to compute H_i as well as how to synthesize a pair of operators, U_i and V_i, that respond maximally when presented with a pair of images undergoing the same transformation.

3.1 Building the Correspondence Matrix

As described in [5,4], a correspondence matrix H encodes the correspondence between cells of two spaces due to a forward mapping M and back through the inverse mapping M^{-1},

M : x → y,  x ∈ X, y ∈ Y,
M^{-1} : y → x,  x ∈ X, y ∈ Y.  (3)

The mapping function M may be continuous or even multi-valued, but when the two spaces X and Y are discretized (say, into M and N cells or pixels, respectively), the correspondence matrix H is of finite size M × N.

Fig. 5. General correspondence between regions from two spaces, x ∈ X and y ∈ Y.

As illustrated by the Venn diagram in Figure 5, the probability of correspondence between element x = [X]_i, i ∈ {1,..., M}, in the first image and element y = [Y]_j, j ∈ {1,..., N}, in the second image is related to how well the forward mapping M and the inverse mapping M^{-1} of the two elements coincide,

H_{ij} \propto P\left( [X]_i \leftrightarrow_M [Y]_j \right)
       = \frac{ A\left( M([X]_i) \cap [Y]_j \right) + A\left( [X]_i \cap M^{-1}([Y]_j) \right) }{ A([X]_i) + A([Y]_j) } \in [0, 1],  (4)

where A(·) denotes a measure of area or volume of a neighborhood, i.e. the size of a cell. The general procedure for converting some mapping M into a correspondence matrix is summarized in Table 2.

(1) Discretize the first image space X into M elements.
(2) Discretize the second image space Y into N elements.
(3) Initialize H_{M×N} = H^M = H^{M^{-1}} = 0.
(4) For each i ∈ {1,..., M}: using a uniform distribution of points in [X]_i, compute the distribution of points [Y]_j at or near M([X]_i), and assign H^M_{ij} = A( M([X]_i) ∩ [Y]_j ).
(5) For each j ∈ {1,..., N}: using a uniform distribution of points in [Y]_j, compute the distribution of points [X]_i at or near M^{-1}([Y]_j), and assign H^{M^{-1}}_{ij} = A( [X]_i ∩ M^{-1}([Y]_j) ).
(6) Assign H_{ij} = \frac{ A([Y]_j) H^M_{ij} + A([X]_i) H^{M^{-1}}_{ij} }{ A([X]_i) + A([Y]_j) }.

Table 2. Procedure for synthesizing a correspondence matrix H for a mapping M.

There is now a clear procedure for building the forward mapping, synthesizing the correspondence matrix H to represent the appearance change between two images described by a model vector m. The parameter recovery to be described next will identify the unknown H given two image vectors, and hence the model parameters m.

3.2 Model Parameter Recovery

The image structure from two images separated in time can be encoded by a correspondence matrix H. This section will show that the Singular Value Decomposition of H leads to a unique set of image domain operators that can code for specific scene events as encoded by H.
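A minimal sketch of the Table 2 procedure for a 1-D mapping, assuming unit-area cells on both sides (so the area-weighted combination in step (6) reduces to a simple average) and estimating the overlap measures A(.) by Monte Carlo point sampling. The demo mapping (a scale change about the slit centre) and the function names are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def correspondence_matrix(fwd, inv, M, N, samples=200, rng=None):
        """Synthesize H (M x N) for a mapping y = fwd(x), x = inv(y) on [0, 1).

        Follows the procedure of Table 2 with unit-area cells, so step (6)
        becomes the average of the forward and inverse overlap estimates.
        """
        rng = np.random.default_rng(rng)
        H_fwd = np.zeros((M, N))
        H_inv = np.zeros((M, N))
        for i in range(M):                       # forward pass: push cell [X]_i through M
            x = (i + rng.random(samples)) / M    # uniform points inside cell i
            j = np.floor(fwd(x) * N).astype(int)
            ok = (j >= 0) & (j < N)              # points mapped outside Y have no correspondent
            np.add.at(H_fwd[i], j[ok], 1.0 / samples)
        for j in range(N):                       # inverse pass: pull cell [Y]_j back through M^-1
            y = (j + rng.random(samples)) / N
            i = np.floor(inv(y) * M).astype(int)
            ok = (i >= 0) & (i < M)
            np.add.at(H_inv[:, j], i[ok], 1.0 / samples)
        return 0.5 * (H_fwd + H_inv)

    if __name__ == "__main__":
        delta = 0.8                                        # e.g. a zoom about the slit centre
        fwd = lambda x: 0.5 + (x - 0.5) / delta
        inv = lambda y: 0.5 + (y - 0.5) * delta
        H = correspondence_matrix(fwd, inv, M=32, N=32)
        print(H.shape, H.max())                            # a sharp band of slope ~1/delta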

The correspondence matrix H relates the elements of a first image I_1 and a second image I_2. The images are first normalized as Ĩ_1, Ĩ_2 to zero mean intensity and unit contrast by finding the images' brightness µ_I and contrast ∆_I:

\mu_I = \frac{1}{2N} \left( \sum_i I_{1,i} + \sum_i I_{2,i} \right), \qquad
\Delta_I = \max_i \left( |I_{1,i} - \mu_I|, \; |I_{2,i} - \mu_I| \right),
\tilde{I}_1 = \frac{I_1 - \mu_I}{\Delta_I}, \qquad
\tilde{I}_2 = \frac{I_2 - \mu_I}{\Delta_I}.  (5)

3.2.1 Image Mapping H

Image correspondence between the normalized images (Ĩ_1, Ĩ_2) is determined by H, but not all of the information contained in Ĩ_1 is necessarily present in Ĩ_2, nor vice versa. This means, for example, that some elements or pixels of Ĩ_2 do not come from Ĩ_1 but come instead from some other source. The image correspondence constraints take the form of a weighted sum of elements in the complementary image:

\left( \sum_j H_{ij} \right) \tilde{I}_{1,i} = \sum_j H_{ij} \tilde{I}_{2,j}, \quad i ∈ {1,..., N},
\left( \sum_i H_{ij} \right) \tilde{I}_{2,j} = \sum_i H_{ij} \tilde{I}_{1,i}, \quad j ∈ {1,..., N}.  (6)

The image correspondence equations can be re-written:

S \tilde{I}_1 = H \tilde{I}_2, \qquad S' \tilde{I}_2 = H^T \tilde{I}_1,  (7)

where

S = \mathrm{diag}\left( \sum_j H_{1j}, \; \sum_j H_{2j}, \; \ldots, \; \sum_j H_{Nj} \right)

11 and S = i H i i H i... i H in. (8) Equation Set 7 deals automatically with non-shared information: note that elements in one space that have no correspondent in the other space will have a zero row sum of H (element of S) or zero column sum of H (element of S ). 3.. Operator Synthesis Given an image pair (Ĩ, Ĩ ), the goal is to find the H that best describes their deformation and identify that H s model parameters. In order to recognise the image events for a given correspondence matrix, there needs to be a transport function to map between the two views. The image pair (Ĩ, Ĩ ) can be represented as two different transforms of a common texture vector w. In order for the elements of texture vector w to be independent and linear, the image synthesis functions must be: Ĩ N A N N w, Ĩ B N N N N w. (9) N where A is orthonormal and B is also orthonormal. A and B are not expected to be exact solutions, because the image formation process does not lend itself to a direct linear mapping between the image pair. Image domain deformation caused by perspective projections of 3-D objects, for example, confounds the surface texture signal and the geometric deformation into one image signal, and the two components are generally not separable. A and B are, however, the best linear approximation to the geometric deformation image correspondence in the sense of minimizing some error. The texture information shared by the two images via H must exist in the space spanned by H, and thus can be expressed compactly in r independent coefficients, where r = rank(h). The first r coefficients of w encode the stationary signal shared between Ĩ and Ĩ that are expressed through H. The remaining N r coefficients encode the non-shared image signals mapped into the null space of H. Because A and B are chosen to be orthonormal and square, they are invertible (by transposition). This means that the texture vector w can be recovered from

12 images (Ĩ, Ĩ ) knowing the correspondence matrix H, A T Aŵ = A T Ĩ ŵ = A T Ĩ, B T Bŵ = B T Ĩ ŵ = B T Ĩ. () 3..3 Solution via Singular Value Decomposition Substituting Equation set 9 into Equation set 7, SA w HB w w, S B w H T A w w. () The optimization problem to solve for A and B is thus: argmin SA w HB w, argmin S B w H T A w. () A,B A,B Because the matrices A and B are expected to work independently of the surface texture encoded in w, the optimization problem can be expressed as argmin SA HB, argmin S B H T A. (3) A,B A,B The minimization terms can be recombined as argmin SAB T H, argmin S BA T H T. (4) A,B A,B This can be visualized more clearly by substituting H with its Singular Value Decomposition, H = U N N Σ N N N N VT N N, (5) where U is the matrix whose columns are the left-hand eigenvectors of H and the columns of V are the right-hand eigenvectors. That is, the columns of U are the eigenvectors of HH T, and the columns of V are the eigenvectors of H T H, both sorted in decreasing order of eigenvalues. The singular values of H are the elements of the diagonal matrix Σ, which are the square root of the eigenvalues from HH T or H T H. Substituting the SVD, one must solve for: argmin SAB T UΣV T, argmin AB T S UΣV T. (6) A,B A,B

13 The remainder of this proof builds A and B one column at a time, using successively better approximations of H. The k th order approximation, H k uses the first k columns of U and V and the first k diagonal elements of Σ. To solve for unknown orthogonal matrices A and B, consider the st order approximation of H, H = σ u v T, SA B T σ u v T, A B T S σ u v T. (7) The only choice for A and B that spans the same space as u and v, satisfying both constraining equations is A = [ u,,..., ], B = [ v,,..., ]. (8) An inductive proof introduces higher k th order approximations H k to solve for the corresponding A k and B k. k SA k B T k σ i u i v i T, i= k SA k B T k σ i u i v i T, i= S(A k B T k A k B T k ) σ k u k v k T. (9) Similarly, (A k B T k A k B T k )S σ k u k v T k. () This implies that A k is A k plus an orthogonal component in the direction of u k and that B k is B k plus an orthogonal component in the direction of v k. Therefore, the k th column of A is the k th column of U and the k th column of B is the k th column of V. Therefore, the least squares error solution for Equation Set, satisfying full rank and orthonormality of A and B is to substitute A = U, B = V. () Informally, the SVD expresses the correspondence matrix H as a linear remapping U T from Ĩ to the same space as a linear remapping VT from Ĩ. The SVD of H optimizes the spectral representation of the transform between a 3

pair of data sets (images), maximizing the independence between channels (elements of w), ranked according to significance in the sense of minimizing the least squares error.

3.2.4 Distance to Data Metric

The maximum likelihood hypothesis H for an image pair (Ĩ_1, Ĩ_2) minimizes the residual error r between the input images and their reconstructions predicted by H. Let the feature vector ŵ represent the image vectors Ĩ_1 and Ĩ_2 as follows:

\hat{w} = \left[ \frac{U_k^T}{\sqrt{2}} \;\; \frac{V_k^T}{\sqrt{2}} \right]
          \left[ \begin{array}{c} \tilde{I}_1 \\ \tilde{I}_2 \end{array} \right],  (22)

where U_k and V_k are the k-th order approximations of U and V respectively, i.e. the first k columns of U and V sorted by singular value. The feature vector ŵ is now the best parameterization of the image pair assuming deformation H. The residual error can be computed by projecting the feature vector back into the image space. If the assumed deformation H is sufficiently close to the scene geometry, then the residual signal error r, the difference between the original image signal and the reconstructed image signal, will be low,

r = \left\| \left[ \begin{array}{c} \tilde{I}_1 \\ \tilde{I}_2 \end{array} \right]
    - \left[ \begin{array}{c} U_k/\sqrt{2} \\ V_k/\sqrt{2} \end{array} \right] \hat{w} \right\|.  (23)

The likelihood of correspondence H given the evidence (Ĩ_1, Ĩ_2) can be expressed as a function L(H | Ĩ_1, Ĩ_2),

L(H \mid \tilde{I}_1, \tilde{I}_2) \propto e^{-r} \in (0, 1].  (24)

The uncertainty of the maximum likelihood choice can be expressed as the entropy h of the likelihoods of all the different hypotheses,

h = -\frac{ \sum_{i=1}^{n} L(H_i \mid \tilde{I}_1, \tilde{I}_2)
            \log\left( L(H_i \mid \tilde{I}_1, \tilde{I}_2) \right) }{ \log(n) } \in (0, 1].  (25)
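The following sketch pulls the operator synthesis and the distance-to-data metric together: for each candidate correspondence matrix H_i it derives U_k and V_k by SVD, projects the normalized image pair onto them, measures the reconstruction residual, and converts the residuals into likelihoods and a normalized entropy. It is a minimal NumPy illustration of Equations (22)-(25) as reconstructed above; the function names, the choice of k, and the normalization of the likelihoods before computing the entropy are assumptions.

    import numpy as np

    def operators(H, k):
        """First k left/right singular vectors of H (the U_k, V_k operators)."""
        U, s, Vt = np.linalg.svd(H)
        return U[:, :k], Vt[:k, :].T

    def residual(H, I1n, I2n, k=8):
        """Reconstruction residual r of the normalized pair under hypothesis H."""
        Uk, Vk = operators(H, k)
        x = np.concatenate([I1n, I2n])                      # stacked image pair
        Q = np.vstack([Uk, Vk]) / np.sqrt(2.0)              # orthonormal columns
        w_hat = Q.T @ x                                     # feature vector (eq. 22)
        return np.linalg.norm(x - Q @ w_hat)                # eq. 23

    def likelihoods_and_entropy(H_bank, I1n, I2n, k=8):
        """Score every hypothesis H_i and report the normalized entropy h."""
        r = np.array([residual(H, I1n, I2n, k) for H in H_bank])
        L = np.exp(-r)                                      # eq. 24
        p = L / L.sum()                                     # normalized for the entropy
        h = -(p * np.log(p + 1e-12)).sum() / np.log(len(H_bank))   # eq. 25
        return L, h

The maximum likelihood hypothesis is then the argmax of L, and (1 − h) serves as its confidence, as used later in Section 5.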

15 To summarize the procedure for parametric recovery, Preparation (off-line) () Discretize parameter space M into n representative parameter vectors m i, i {,..., n}. () Synthesize a correspondence matrix H i to represent the coordinate mapping M( m i ). (3) Synthesize detectors for H i, using SVD. Usage (on-line) () Using the detectors for H i, given the observations Ĩ and Ĩ, compute L ( H i Ĩ, Ĩ ). () Find the maximum likelihood H i Ĩ, Ĩ, and set the maximum likelihood m = m i. (3) Use the distribution of L ( H i Ĩ, Ĩ ) to find the entropy h of the maximum likelihood choice. 4 Observability of Model Parameters Recovery of structure from motion must address the issue of the confounding between surface shape and motion: some parameters describing the scene are not independently observable. For example, different surface orientations and curvatures will look the same if the viewer moves in a certain direction. Also, the interpretation of some motion components, such as movement toward the scene (i.e. along the optical axis) relies on the proper compensation for lateral motion. This situation arises whenever a new model for scene analysis is proposed. The model describes how the scene appearance changes with respect to the various model parameters. To solve the inverse problem, that is, to recover the model parameters after some scene change, one must identify and characterize the relationship between appearance changes and scene model parameters that cause them. In control systems theory, the terms observability and controllability are used to distinguish between the ability to retrieve a model parameter through observations versus the ability to set a model parameter to a desired value or state. For our situation, all parameters of the scene model are controllable during a training process. In contrast, many parameter changes that produce visible appearance changes are not independently observable. Using the correspondence matrix representation of appearance change, one can automatically identify those scene parameters that are directly recoverable, 5

16 those which must be determined jointly, and those that are not recoverable at all from local observations. Szeliski and Kang used first principles to compute what scene observables could be mapped back to scene model parameters[7]. The method used here is much simpler thanks to the correspondence matrix representation of local shape and motion. To begin, we note that photometric changes, such as an overall change in luminance or light source direction, are largely normalized out by virtue of (5). Edges of shadows might not be removed, and may be interpreted as moving objects in their own right. How distinctive are the appearance changes due to each parameter? A correspondence matrix is a fingerprint of appearance change between two views of a scene. One can build a database of correspondence matrices by discretely sampling the parameter space and then measuring the relationship between changes in appearance and changes in parameters. Constructing a correspondence matrix is straightforward. The shape and motion model is controllable for any choice of parameters. Considering one parameter, say Ω, group the correspondence matrices formed by the same value of Ω together, varying all the other parameters. For example, discretizing the 5 parameters Ω, δ, η, k, β to 5 levels each, for a total of 5 5 = 35 correspondence matrices yields 5 classes of 5 4 = 65 correspondence matrices. Other parameters used in the image formation process can be held constant, such as the aperture size N = 3 pixels. The distinctiveness of the appearance changes due to one changing parameter can now be expressed as the distinctiveness of the 5 classes of correspondence matrices. The naive method of measuring the class separability would be to build detectors for each of the 5 classes and actually test many image domain samples and tally the recognition rate achieved. This would only confuse the issue: the method would need to actually build detectors and test them to answer the more fundamental question of whether or not detectors could work. A much simpler and more compact method is to test the class separability of the correspondence matrices by performing Principal Component Analysis (PCA) on the 35 correspondence matrices to reduce the dimensionality of the problem. From here, Most Discriminating Features (MDF)[6], a technique that extends Linear Discriminant Analysis (LDA), will highlight the separability of the 5 classes of a given parameter. Conclusions about the observability of different shape and motion parameters can be drawn from the information imprinted in the correspondence matrices that represent the deformations, rather than indirectly from random testing. The MDF results for parameters Ω and δ are shown in Figure 6 and Figure 7. 6
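The class-separability test just described can be approximated with off-the-shelf tools: vectorize each training correspondence matrix, reduce dimensionality with PCA, and then apply a discriminant analysis with the class labels set to the discrete levels of the parameter under study. The sketch below uses plain LDA from scikit-learn as a stand-in for the MDF technique of [6], so it approximates the analysis rather than reimplementing it; array names and sizes are illustrative.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def separability(H_list, labels, n_pca=20, n_out=3):
        """Project vectorized correspondence matrices into a few discriminating
        dimensions for one parameter (e.g. the discrete levels of Omega).

        H_list : list of (N, N) correspondence matrices from the training sweep.
        labels : the discrete level of the studied parameter for each matrix.
        """
        X = np.array([H.ravel() for H in H_list])           # one row per matrix
        Xp = PCA(n_components=n_pca).fit_transform(X)       # dimensionality reduction
        lda = LinearDiscriminantAnalysis(n_components=n_out)
        return lda.fit_transform(Xp, labels)                # most discriminating axes

Well-separated clusters in the projected coordinates (as in Figures 6 and 7 for Ω and δ) indicate an observable parameter, while overlapping clusters (as in Figure 10 for k, η and β) indicate confounding.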

17 principle components (after MDF on eigenvectors) for o principle components (after MDF on eigenvectors) for o (a) (b) Fig. 6. MDF-enhanced principal components tuned to Ω variations using all 35 training samples and 5 discrete levels of Ω. The first dimensions are shown in (a), the first 3 dimensions in (b)..8.6 principle components (after MDF on eigenvectors) for d principle components (after MDF on eigenvectors) for d (a) (b) Fig. 7. MDF-enhanced principal components tuned to δ variations using all 35 training samples and 5 discrete levels of δ. The first dimensions are shown in (a), the first 3 dimensions in (b). The results obtained show that only 3 dimensions are required to perform a good quality recovery of Ω and δ parameters from within the training set using correspondence maps alone. To test the usefulness of the 3 principal correspondence maps against values outside of the training set, a new set of correspondence maps was generated with denser sampling of Ω. The interpolation between training values is almost linear, as shown by the almost straight black curves in Figure 8. The interpolation between the training values of δ are a bit noisier, as shown in Figure 9. This illustrates that the image deformation appearances of different values of δ are confounded by different shapes (η, k) and viewpoint differences (β). The confounding can be alleviated by recovering the joint distribution of (Ω, δ). That is, the different appearances due to δ can be recovered if Ω is already known. For the remainder of this paper, Ω and δ will be found jointly; detectors will be built to recognise deformations due to combinations of both 7

18 principle components (after MDF on eigenvectors) for o principle components (after MDF on eigenvectors) for o (a) (b) Fig. 8. Interpolation between training values of Ω. In (a) the first dimensions and in (b) the first 3 dimensions, the dark curve traces the intermediate coordinates of the intermediate values of Ω. parameters..5 3 principle components (after MDF on eigenvectors) for d (a) (b) Fig. 9. Interpolation between training values of δ. The first 3 dimensions are shown in (a) where the dark curve traces the intermediate coordinates of the intermediate values of δ. The curve is reproduced in (b) without the training data clusters for clarity. The results on the other model parameters k, η, β were not as promising. Their discrete values did not cluster well, as shown in Figure, and will need to be recovered through inference of neighborhood relations or other information. These observations reveal that the motion parameters Ω and δ are not separable, but can be found jointly; together they produce distinct motion fields. These can be considered first order structure from motion parameters. In contrast, the view angle change β and the shape parameters k and η are observed to be confounded. Even in apertures of N = 3 pixels, there is very little difference in the motion field to distinguish different shapes from a single aperture. These 3 parameters require some form of post-processing, that would combine information from neighborhoods of support, or require a global fit. Rather than pursue such a strategy, we instead consider what can be recovered from 8

19 3 principle components (after MDF on eigenvectors) for k 3 principle components (after MDF on eigenvectors) for e 3 principle components (after MDF on eigenvectors) for b (a) (b) (c) Fig.. MDF-enhanced principal components tuned to (a) k variations, (b) η variations and (c) β variations, using all 35 training samples and 5 discrete levels of each parameter. The first 3 dimensions are shown. Ω and δ. 5 Recoverable Scene Information For image pairs of interest, the views will overlap in the camera aperture of angle α (shown in Figure 3), restricting the useful range of Ω to about ( α/, +α/), and typically, α <. With these angles so small, Ω will be linearly proportional to the translation along the image slit axis. The δ parameter denotes image scale change, defined as the ratio of the distance to the surface seen from V P versus from V P. Given that only of the 5 shape and motion parameters are recoverable locally, the question naturally arises: What, if anything, do these two parameters tell us about the scene? As it turns out, these two parameters provide very useful information that can be extracted as optical flow and estimates of time to collision (TTC). Calculating the optical flow and time to collision requires recovering the δ and Ω parameters. Figure illustrates some of the unique H for specific instances of model parameters (Ω, δ, k, η, β). The mean correspondence matrices H Ω,δ of Figure are synthesized by holding Ω and δ fixed and computing the mean of all the correspondence matrices formed for those specific Ω and δ and all combinations of the other discrete parameters α, k, η, β, {α},{β},{η},{k} H Ω,δ,α,β,η,k H Ω,δ. (6) {α},{β},{η},{k} 9
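A small sketch of how the mean correspondence matrices indexed by (Ω, δ) can be accumulated from the training sweep, assuming the individual matrices are stored in a dictionary keyed by their generating parameters; that data layout is an assumption made for illustration. The reverse-time counterpart of any entry is simply its transpose, as discussed next.

    import numpy as np
    from collections import defaultdict

    def mean_correspondence_matrices(H_by_params):
        """Average H over the nuisance parameters (alpha, beta, eta, k).

        H_by_params maps (omega, delta, alpha, beta, eta, k) -> (N, N) matrix.
        Returns a dictionary mapping (omega, delta) -> mean matrix H_bar.
        """
        groups = defaultdict(list)
        for (omega, delta, *_rest), H in H_by_params.items():
            groups[(omega, delta)].append(H)
        return {key: np.mean(Hs, axis=0) for key, Hs in groups.items()}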

20 Shift left, zoom in Shift left, zoom out No motion Shift right, zoom out Ω =.8667 Ω =.8667 Ω =. Ω =.3333 δ =.9543 δ =.4 δ =. δ =.4 Fig.. Mean correspondence matrices for specific values of angle of translation Ω and scale change δ while varying other parameters of the model (k,η, and β) at N = 64 pixels and α = {4.,..., 5.}. The rows are the index into the first image I, and the columns are the index into the second image I. Ω shifts the white curve horizontally, and δ affects its slope. The concept of a mean correspondence matrix H is a special case of encoding transparent motion. Transparent motion, or layered motion, is the perception of multiple simultaneous motions in one image location due to partial transparency of surfaces undergoing separate motions. Although the -D slit shape and motion model of Section deals only with an opaque surface and is subsequently encoded as a specific H with a sharp, dominant correspondence curve, transparent motion can be encoded as a superposition (or averaging) of multiple correspondence matrices to represent the combination of different shapes and motions. The mean correspondence matrices used here, however, are only concerned with expressing variations of slightly different shapes and motions rather than combinations of distinctly different shapes and motions at any one image location. This does not preclude the possibility that transparent motion can be recovered at the recognition stage by identifying multiple valid candidate shapes and motions at each image location. Another convenient property of the correspondence matrix is the ease with which time can be reversed, i.e. that image samples can be presented in reversechronological order to obtain more information. For a given H representing the forward-time mapping from I to I, the reverse-time mapping from I to I is simply the transpose, H T. As will be elaborated shortly, there is always more information to be gained by combining forward-time and reverse-time measurements, but the correspondence matrix offers one of the rare instances where the computation is straightforward. For the next sections, it is assumed that at every location and orientation θ i, i {,..., n} in the image, a maximum likelihood H Ωi,δ i has been recovered (or equivalently, the values Ω i and δ i ), together with the normalized entropy h i to represent the uncertainty of the maximum likelihood choice. It is further assumed that the analogous reverse-time data has been recovered, specifically

the maximum likelihood H_{Ω′_i, δ′_i} and its entropy h′_i.

5.1 Optical Flow

Optical flow is a measurement of the apparent movement of picture elements in the image plane of an image sequence. An optical flow vector (u, v) can be computed at each image location by combining the estimated displacements ∆_i along each orientation θ_i over i ∈ {1,..., n}. Each orientation has a different confidence (1 − h_i) for the displacement along that direction, so their contributions are weighted accordingly. For the experiments in this paper, n = 6 orientations. The optical flow vector thus has components

u = \frac{ \sum_{i=1}^{n} (1 - h_i) \, \Delta_i \cos(\theta_i) }{ \sum_{i=1}^{n} (1 - h_i) }, \qquad
v = \frac{ \sum_{i=1}^{n} (1 - h_i) \, \Delta_i \sin(\theta_i) }{ \sum_{i=1}^{n} (1 - h_i) }.  (27)

The notion of an entropy h_i for the displacement ∆_i along orientation θ_i maps directly to a probabilistic interpretation of the component of optical flow along that orientation. The maximum likelihood displacement ∆_i is still the one corresponding to the maximum likelihood Ω_i. The specificity of the distribution of L(Ω) over all Ω can be measured by its normalized entropy h_i. The corresponding ∆_i has the same distribution, so the credibility of the maximum likelihood ∆_i is found automatically.

The introduction of the displacement ∆_i formalizes an issue that demands clarification: how does one describe the image plane translation of a patch of an image that is deforming? Should one just use the image plane translation of the point at the center of the receptive field, or express the flow vector as a random variable with a distribution? The experiments in this paper use the former: one maximum likelihood flow vector at the center of the receptive field. At first glance, it would make sense to use the angle Ω to determine the translation, referring to the image formation model presented in Equation (1),

\Delta_i = \frac{N-1}{2} \, \frac{\tan(\Omega_i)}{\tan(\alpha/2)}.  (28)
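A minimal sketch of Equation (27): fusing the per-orientation displacements and their entropies into a single flow vector. How the displacements Δ_i are obtained is deliberately left open; as the text argues next, reading them off the maximum likelihood correspondence matrix is preferable to converting Ω_i alone. Function and variable names are illustrative.

    import numpy as np

    def flow_vector(deltas, entropies, thetas):
        """Combine per-orientation displacements into one optical flow vector.

        deltas    : maximum likelihood displacement along each slit orientation.
        entropies : normalized entropy h_i of each orientation's likelihoods.
        thetas    : slit orientations in radians (e.g. 0, 30, ..., 150 degrees).
        """
        w = 1.0 - np.asarray(entropies)           # confidence weights (1 - h_i)
        d = np.asarray(deltas)
        t = np.asarray(thetas)
        denom = w.sum()
        if denom <= 0.0:                          # no orientation is confident
            return 0.0, 0.0
        u = np.sum(w * d * np.cos(t)) / denom     # eq. (27)
        v = np.sum(w * d * np.sin(t)) / denom
        return u, v

    if __name__ == "__main__":
        thetas = np.deg2rad(np.arange(0, 180, 30))           # the 6 orientations used in the paper
        deltas = np.array([2.0, 1.7, 1.0, 0.0, -1.0, -1.7])  # hypothetical slit displacements
        entropies = np.full(6, 0.3)
        print(flow_vector(deltas, entropies, thetas))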

22 the motion of an entire aperture. The classical optical flow algorithms find the displacement assuming that the surface is planar and in the image plane. Converting Ω itself directly into a translation magnitude would ignore the fact that Ω was determined over the entire aperture of N pixels to represent a range of values for Ω, and does not necessarily reflect the translation occurring at the center of the aperture. More generally, there will be distortions: not all pixels in the aperture are displacing the same amount. Automatically dealing with this issue is one of the principal advantages of using correspondence matrices in the first place. Instead of using the angle Ω to determine translation, it is more informative to use the maximum likelihood correspondence matrix H Ω,δ. This preferred method is to look up the translation of the center pixel (or pixels in a neighborhood N ( N ) near the center) by inspection of the correspondence matrix H Ω,δ, illustrated in Figure, so that (H) r N ( N ) Nc= ch rc r N ( N ) Nc= H rc (N ), (9) i = (H Ωi,δ i ). (3) Already, the optical flow vector is more localized than before. But there is still more information available. i uses estimates from the forward-time H Ωi,δ i. But the reverse-time H Ω i,δ i and its confidence ( h i) can contribute as well. This could be very significant in cases of occlusion and disocclusion, when forward or reverse correspondences individually may have incomplete views. Time reversal is just the transposing of the correspondence matrix. Transposing it back provides a matrix in the same frame as the forward-time version. Therefore, the reverse-time opinion of the forward-time displacement can be expressed as i = (H T Ω i,δ i ). (3) The forward and reverse time information provides a unique opportunity to produce even more robust optical flow vectors by fusing the two together as i i ( h i) + i ( h i) (h, i + h i)

23 Correspondence curve (a) Center I ~ of Linear approximation (using just Ω) Ω (b) H ~ H T (Η) (c) Center (d) I ~ of Fig.. Improved localization of optical flow in one aperture. A correspondence matrix describes the correspondence between each pixel in the first image aperture (vertical axis) with each pixel in the second image aperture (horizontal axis). Examples are shown of 3 3 element correspondence matrices representing -dimensional (a) flat image-plane translation, (b) image-plane translation with scale change, and (c) non-uniform surface velocity. The white squares represent, the dark squares represent. Optical flow is considered to be the displacement of the central pixel in an aperture between two instants. As shown in (d), Typical optical flow assumes uniform displacement in the image plane (thin line) of Ω which would be different from the true central pixel displacement of (H) using a better-fitting non-uniform displacement (thick line). with h i (h i + h i). (3) The improved optical flow vector components (ũ, ṽ) are thus: ni= ( ũ = h i ) i cos(θ i ) ni= ( h, i ) ni= ( ṽ = h i ) i sin(θ i ) ni= ( h. (33) i ) 3

5.2 Time To Collision

Simply put, the time to collision (TTC) is the estimated amount of time before the viewer contacts an obstacle in the scene. This information is especially useful for robot navigation and collision avoidance. An image sequence from a moving camera could be transformed into a map of time to collision for each location in an image. If this information is sufficiently precise and dense, it can be used to reconstruct the 3-D shape of objects in the scene.

Given the local structure from motion model used here, the δ parameter indicates the ratio of the distance from the new viewpoint to the surface over the distance from the old viewpoint to the surface,

\delta = \frac{ \overline{V P_2 \, O_2} }{ \overline{V P_1 \, O_1} }.  (34)

The time between observations, ∆t, is known beforehand. Assuming that the camera's motion relative to the scene continues at constant velocity, one can estimate how much time will elapse before the camera reaches the point on the surface it is looking at and heading toward,

T = \Delta t \, \frac{\delta}{1 - \delta}.  (35)

Recovering Ω and δ locally in forward time can be augmented by recovering Ω′ and δ′ by reversing the sequence of the images. A direct method of computing the time to collision T between two images separated by a delay of ∆t, using both forward and reverse information, is to average the two contributions:

T = \frac{\Delta t}{2} \left( \frac{\delta}{1 - \delta} + \frac{1}{\delta' - 1} \right).  (36)

Although time to collision is customarily only used with positive values, as in the amount of time before contact happens, negative values are still meaningful; they indicate how much time has elapsed since the camera was in contact with the surface (assuming, as above, constant velocity).
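A small sketch of Equations (35) and (36) as reconstructed above, with guards against the degenerate cases δ = 1 and δ′ = 1 (no detectable change in distance); the guards, the default Δt and the function names are my additions.

    import numpy as np

    def ttc_forward(delta, dt=1.0, eps=1e-6):
        """Time to collision from the forward-time scale change delta (eq. 35)."""
        if abs(1.0 - delta) < eps:
            return np.inf                       # no approach detected
        return dt * delta / (1.0 - delta)

    def ttc_fused(delta_fwd, delta_rev, dt=1.0, eps=1e-6):
        """Average the forward- and reverse-time estimates (eq. 36)."""
        if abs(1.0 - delta_fwd) < eps or abs(delta_rev - 1.0) < eps:
            return np.inf
        return 0.5 * dt * (delta_fwd / (1.0 - delta_fwd) + 1.0 / (delta_rev - 1.0))

    if __name__ == "__main__":
        # e.g. the camera covers one tenth of the remaining distance per frame:
        print(ttc_forward(0.9))                 # about 9 frames to contact
        print(ttc_fused(0.9, 1.0 / 0.9))        # ideally the reverse ratio is 1/delta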

5.3 Focus of Expansion

The Focus of Expansion (FOE) is the point in the image plane towards which the camera is moving. The optical flow field radiates away from this point. Using the model of surface shape and motion from Section 2, a patch in the image is a candidate for the Focus of Expansion if and where Ω_θ = 0, ∀θ.

Generally, finding the Focus of Expansion is considered a global problem, requiring information from all over the image to build one globally consistent solution. Some methods [,] use normal optical flow to form concentric rings about the Focus of Expansion; in that approach, candidate points vote for inclusion in a Focus of Expansion cluster. As implemented in this paper, each patch of pixels in the image also votes for itself, but based only on local information, weighted by its confidence that it has Ω = 0.

The Focus of Expansion is computed by a simple weighted voting procedure. The FOE weight for each patch is the sum over orientations θ of the likelihood of having Ω_θ = 0,

w_{ij} = \frac{ \sum_{i=1}^{n} L(\Omega_{\theta_i} = 0)(1 - h_{\theta_i})
              + \sum_{i=1}^{n} L(\Omega'_{\theta_i} = 0)(1 - h'_{\theta_i}) }{ 2n } \in [0, 1].  (37)

The weights w are normalized to produce w̄,

\bar{w}_{ij} = w_{ij} / \max(w) \in [0, 1].  (38)

The weights are then clipped to retain only the highest values,

\hat{w}_{ij} = \begin{cases} \bar{w}_{ij}, & \text{if } \bar{w}_{ij} > 0.75 \\ 0, & \text{otherwise.} \end{cases}  (39)

And finally, the actual coordinate of the Focus of Expansion is a weighted sum of the candidate coordinates,

FOE_x = \frac{ \sum_{ij} x_{ij} \hat{w}_{ij} }{ \sum_{ij} \hat{w}_{ij} }, \qquad
FOE_y = \frac{ \sum_{ij} y_{ij} \hat{w}_{ij} }{ \sum_{ij} \hat{w}_{ij} }.  (40)
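A sketch of the voting procedure of Equations (37)-(40), operating on per-patch, per-orientation likelihood maps of Ω = 0 and their entropies for both time directions. The grid-of-patches data layout and coordinate conventions are assumptions for illustration; the 0.75 clipping threshold is the one quoted in the text.

    import numpy as np

    def focus_of_expansion(L0_fwd, h_fwd, L0_rev, h_rev, xs, ys, thresh=0.75):
        """Weighted vote for the FOE over a grid of patches.

        L0_fwd, L0_rev : arrays (rows, cols, n) of L(Omega_theta = 0) per orientation.
        h_fwd, h_rev   : matching normalized entropies.
        xs, ys         : arrays (rows, cols) of patch centre coordinates in pixels.
        """
        n = L0_fwd.shape[-1]
        w = (np.sum(L0_fwd * (1.0 - h_fwd), axis=-1) +
             np.sum(L0_rev * (1.0 - h_rev), axis=-1)) / (2.0 * n)   # eq. 37
        w_bar = w / w.max()                                         # eq. 38
        w_hat = np.where(w_bar > thresh, w_bar, 0.0)                # eq. 39
        total = w_hat.sum()
        if total == 0.0:
            return None                                             # no confident FOE candidate
        foe_x = np.sum(xs * w_hat) / total                          # eq. 40
        foe_y = np.sum(ys * w_hat) / total
        return foe_x, foe_y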

26 6 Experiments The results of experiments on real and artificial data are now presented in support of the preceding theory. We begin with the synthesis of operators for recovering observable parameters (Ω, δ) and then proceed to investigate the performance of these operators in recovering time to collision and focus of expansion on synthetic data. From here the experiment is repeated on real data acquired with a video camera mounted on a gantry robot system to determine the sensitivity of recovery under realistic imaging conditions. Next we consider the performance of the algorithm on some well-known sequences (Yosemite, Translating Trees) and compare our approach in recovering optical flow against published results. Specifically we show that despite the local nature of the resulting algorithm, competitive flow results are obtained along with reliable estimates of focus of expansion. Finally, we consider a mobile robot scenario and show that fairly accurate time to collision estimates can be obtained using an uncalibrated camera. 6. Synthesis of Operators The model space of (Ω, δ, k, η, β) was discretized into 9 levels for each of Ω and δ, and 5 levels each for k, η, β and α, binning the correspondence matrices into mean correspondence matrices indexed by (Ω, δ) for a total of 36 distinct correspondence matrices. Applying the procedures of Section 3.., 36 U,V pairs are synthesized, the first 4 of which are shown below in Figure 3. Each row i of the upper grayscale images at time t is a deformed sinusoid which corresponds to another deformed sinusoid at time t = t + t, in the corresponding row i in the lower grayscale images. t t + Ω =.8 δ =.9487 Ω =.8 δ =. Ω =. δ =. Ω =. δ =. t Fig. 3. U, V of correspondence matrices. Note that the most significant optimal detectors for this motion model are windowed sinusoids concentrated in the lower spatial frequencies. The SVD 6

27 automatically discovered the appropriate spatial scales for detecting each deformation. The parameters used are: Number of pixels: N 64 View translation : Ω[i].4 ( i 9 9 ), i {,.., 8} Distance to surface: δ[i].3456 ( i 9 9 ), i {,.., 8} Surface normal : η[i] { 45,.5,..., 45 } View change : β[i] {, 5,, 5, } Surface curvature: k[i] { 4,,,, 4} View aperture : α[i] {8, 8.5, 9, 9.5, } These experiments were performed using slits of length N = 64 with widths of 3 pixels. The operators are applied to a 3 3 gridding of the image pair at 6 different orientations θ (, 3, 6, 9,, 5 ). The same procedure is applied by swapping the two images to recover the maximum likelihood for the time reversal, (Ω, δ, θ ). Ω and Ω are useful for optical flow, but the local depth (or time to collision) cues are only available through the δ and δ measures. Ω and δ are found jointly, and the usefulness of δ is sensitive to the correct compensation of Ω. Some neighborhood filtering is required, since δ is noisy. The maximum likelihood values for Ω, δ, Ω, δ in the grid of 3 3 elements are median filtered in a 3 3 neighborhood for Ω, Ω, and a 7 7 neighborhood for δ, δ. 6. Synthetic Images, 4 Boxes To test the Time to Collision detectors, a random dot texture was combined with synthetic range data shown in Figure 4 to generate the synthetic image pair shown in Figure 5. The surface consisted of 4 quadrants of varying distance (45, 5, 55 and 6mm from the camera focal point), and the camera moves by 5mm towards the center of the scene. The ground truth time to collision for the four quadrants are thus 8, 9, and units of time in the future. The camera geometry is modeled as a pinhole perspective camera with 3 4 square pixels where 3 pixels are mapped onto a viewing angle of This means that the image samples of 64 pixels will cover aperture angles α varying from at the center to 8 at the periphery. 7
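For completeness, a sketch of the sampling machinery implied by this setup: extracting a length-N slit from an image at a given centre and orientation by bilinear interpolation, and median-filtering the recovered parameter maps over the grid (3x3 for Ω, 7x7 for δ, as stated above). The single-row slit (ignoring the few-pixel width) and the SciPy interpolation call are simplifying assumptions.

    import numpy as np
    from scipy.ndimage import map_coordinates, median_filter

    def extract_slit(image, cx, cy, theta, N=64):
        """Sample a 1-D slit of N pixels centred at (cx, cy) at orientation theta."""
        s = np.arange(N) - (N - 1) / 2.0
        xs = cx + s * np.cos(theta)
        ys = cy + s * np.sin(theta)
        # map_coordinates expects (row, col) = (y, x); order=1 is bilinear interpolation
        return map_coordinates(image, np.vstack([ys, xs]), order=1, mode="nearest")

    def smooth_parameter_maps(omega_map, delta_map):
        """Median filter the recovered maps: 3x3 for Omega, 7x7 for delta."""
        return median_filter(omega_map, size=3), median_filter(delta_map, size=7)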

28 The recovered Ω, Ω data is almost perfectly planar for each orientation, as shown in Figure 6. This is expected, as the scene motion generates a radial flow field. Ω is proportional to the image plane velocity, with linearly increasing magnitude as it moves further from the focus of expansion. Note that the Ω parameter is signed and directional. Fig. 4. Synthetic texture and range. Black is 45mm from focal point, white is 6 mm. Fig. 5. Synthetic image pair. The camera moves 5mm toward the center of the scene. Ω θ Ω Fig. 6. Recovered Ω and Ω for 6 orientations at 3 3 image locations (black:.4, gray:, white: +.4 ). δ θ δ Fig. 7. Recovered δ and δ for 6 orientations at 3 3 image locations (black:.8, gray:., white:

29 Since Ω, Ω have been cleanly recovered, it is not surprising that the more sensitive δ, δ have formed blobs roughly in the four quadrants of the scene, shown in Figure 7. The local structure from motion problem is at its worst in image sequences with a forward-moving camera. Optical flow methods have no information at or near the focus of expansion, and small errors in the flow field even toward the periphery render depth recovery impractical without fitting a global model to the optical flow field. With this experiment, the time to collision detectors are designed specifically to work best at or near the focus of expansion. Time to collision, 4 boxes from 8 to, Ground Truth Time to collision, 4 boxes from 8 to, Recovered Time to collision, 4 boxes from 8 to, Ground Truth Time to collision, 4 boxes from 8 to, Recovered Fig. 8. Time to Collision, ground truth versus recovered. The surfaces recovered in Figure 8 are noisy. The root-mean-squared error of the recovered time to collision surface is.9337 units of time, and is illustrated in Figure 9. The mean error of TTC is 8.75%. This is an interesting result, indicating that under favorable conditions, two frames separated by one unit of time can compute the time to collision to within one unit of time. This is approximately the limit of granularity of the time to collision calculation. The minimum absolute error is.76 in the centers of the boxes, where neighbors agree on the distance to the surface. The maximum error,.98 units of time, occurred at the focus of expansion at the center of the image, which is where the TTC can be interpreted as any of {8, 9,, } units of time. Depending on which of the 4 possible TTC was chosen at the center, there could have been a difference of up to 4 units of time, and the result would still have been within bounds of legitimate interpretation. Next, the receptive field responses (Ω, δ) θ and (Ω, δ ) θ were combined as above to generate the optical flow field shown in Figure. The focus of expansion was computed to be (F OE x, F OE y ) = (6., 9.5), illustrated in. The time to collision for the FOE was found to be.95 units of time. The ground 9

30 Time to collision, 4 boxes from 8 to, Absolute Error Time to collision, 4 boxes from 8 to, Absolute Error (a) (b) Fig. 9. Time to collision error from synthetic image pair. The absolute time to collision error is rendered in (a) -D and (b) 3-D. The RMS error is.9337 units of time, with a minimum error of.76 near the middle of each of the four boxes, and a maximum error of.98 at the center where multiple interpretations are legitimately possible. truth is units of time. Fig.. Recovered optical flow from synthetic image pair, using the same shape deformation receptive fields and multiple orientations as the previous experiments. The composite flow vectors are the weighted sums of the different orientations maximum likelihood displacements. The results in Figure 8 demonstrate that the recovered time to collision estimates are sufficiently clean to distinguish between regions of the image at 8, 9, and time units in the future. Although the flow field is technically correct, the flow field alone is not sufficient to reconstruct the surface shape or time to collision at specific points. At the center, the flow vectors have zero length, so they give no indication of speed at all. There are no distinct flow field changes to indicate different speeds. And yet the receptive fields were sensitive enough to yield rich 3-D shape and motion information without a surface model regularization stage. This observation further reinforces the fact 3

31 (a) (b) Fig.. Finding the Focus of Expansion for the synthetic image pair from local information only. A normalized likelihood map of FOE, w ij, is shown in (a). The + marker indicates the computed FOE. A contour map of FOE likelihood is superimposed on one of the images in (b). The ground truth FOE is at the image center (6, ), and the computed FOE is at (6., 9.5). that a -D image plane optical flow field is a bottleneck, discarding information that is essential for interesting egomotion situations. 6.3 Natural images, Calibration Grid The experiment was repeated using grayscale images captured by a camera mounted on a gantry robot looking at a flat planar calibration grid. The camera field of view is the same as the earlier synthetic example, but it is not an idealized pinhole camera. The camera lens starts at 5mm from the plane, and moves 5mm closer, for a time to collision of 9 units of time, shown in Figure. The camera optics slightly bend the grid lines; the small squares toward the periphery are a bit smaller than the squares near the center. This spherical aberration makes the grid appear as a textured sphere shown close up, directly in front of the camera. The camera s focal length is 7mm, placing the focal point about 5mm behind the lens from which camera distance was measured. Fig.. Planar calibration grid scene. The detector response in Figure 3 is so sensitive that the recovered time to 3

32 Time to collision, flat grid, Recovered Time to collision, flat grid, Recovered Fig. 3. Time to collision, planar grid. collision captures the spherical aberration effect. The time to collision varies from units of time at the center to 8 units of time at the periphery. The aberration makes the periphery appear further than it actually is, moving faster than the center, hence a lower time to collision. In this case, optical flow in the periphery is uninformative; there appears to be no translation, but there is texture expansion. This experimental result confirms that the local feed-forward operators can extract precise estimates of the real time to collision without a global scene motion model. 6.4 Two Well-Known Image Sequences Next we consider performance of these detectors relative to optical flow recovery via comparison to standard published algorithms [3,5]. Errors are computed using the code provided by Barron et al. in [3]. Optical flow estimates computed using (33) are by themselves not informative without consideration of their corresponding entropy values as related by (5). For the experiments presented in this paper, flows with entropies above an empirically selected threshold are marked as invalid Translating Trees Figure 4 shows two successive frames from the well-known translating tree sequence used to compare optical flow algorithms. The resulting optical flow map is shown in Figure 5a with the corresponding confidence map shown in Figure 5b. As can be seen from the confidence map, the algorithm has considerable difficulty producing reliable estimates in regions of shadow and low texture. This is perhaps not too surprising given the local nature of the algorithm. Measured error, relative to ground truth, is Mean 6., SD 6.3, which is considerably worse than the Mean.33, SD.44 reported in [5]. However, using the associated confidences to reject estimates falling below 3

33 threshold, error is reduced to Mean 8.79, SD which we consider quite reasonable for a local operator with limited support. The thresholded optical flow map is shown in Figure 6. (a) (b) Fig. 4. First two frames of the translating trees sequence. (a) (b) Fig. 5. (a) Computed optical flow, full density (b) Associated confidence map (brighter = higher confidences) 6.4. Yosemite Canyon In contrast to the translating tree sequence, the algorithm fares much better on the Yosemite canyon sequence, another well-known reference for optical flow comparison. Two successive frames are shown in Figure 7 with the re- Fig. 6. Computed optical flow, low confidence flows eliminated 33

Measured error relative to ground truth is Mean .5, SD 9.9, with no confidence thresholding of the data. In comparison to the best method, Liu et al. [5], with Mean 6.84, SD .88, we consider the results quite competitive given the local nature of the computation.

Fig. 7. Frames and 3 of the Yosemite sequence.

To further illustrate the quality of the recovered (Ω, δ) parameters, time to collision is estimated using (36) and shown in 2-D in Figure 9 and as a 3-D range image in Figure 3. This challenging image set yields excellent structural information about the scene, in particular the shape of the mountain range nearest the camera viewpoint. The Focus of Expansion was computed to be (FOE_x, FOE_y) = (43.5, .85), illustrated in Figure 3. The time to collision at the FOE was found to be units of time. Interestingly, the instantaneous FOE is in the farther mountain range a bit to the right, suggesting that the aircraft is not heading straight forward along the camera view axis, but is instead heading a bit to the right of the camera center.

Fig. 8. Computed optical flow for Yosemite sequence, full density.
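Equation (36) is not reproduced in this excerpt, but the conversion it performs can be illustrated with a standard constant-velocity model: if the recovered δ is interpreted as the frame-to-frame depth ratio Z(t+1)/Z(t), then under a constant closing speed the remaining number of frames before contact is 1/(1 - δ). The function below is a sketch under that assumed interpretation of δ, not necessarily the paper's exact formula.

```python
import numpy as np

def time_to_collision(delta, eps=1e-6):
    """Convert a per-pixel depth-change ratio into time to collision (in frames).

    delta : array of frame-to-frame depth ratios, delta = Z(t+1) / Z(t);
            delta < 1 means the surface is approaching, delta > 1 retreating.
    Returns times to collision; retreating or static points are returned as +inf.

    Assumes constant closing speed, so Z(t) = Z(0) - v*t and
    T = Z(t) / v = 1 / (1 - delta).  Illustrative model only.
    """
    delta = np.asarray(delta, dtype=float)
    closing = 1.0 - delta                      # fraction of depth lost per frame
    ttc = np.full_like(delta, np.inf)
    approaching = closing > eps
    ttc[approaching] = 1.0 / closing[approaching]
    return ttc

# A patch with delta = 0.8 loses 20% of its depth per frame: collision in 5 frames.
print(time_to_collision([0.8, 1.0, 1.25]))     # [5.0, inf, inf]
```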

Fig. 9. Time to collision for Yosemite image pair. Frame is shown in (a) for comparison with the time to collision map (b). The units of time are the number of image frames before collision. Areas more than 4 units of time away are clipped for clarity. The outer border is marked as invalid.

Fig. 3. Yosemite surface reconstruction from time to collision. The shape of the nearest mountain is clearly extracted and rendered in the lower left. The range data is simply the time to collision. Areas more than 4 units of time away are clipped for clarity. The outer borders are marked as invalid.

6.5 Mobile Robots

As a final demonstration we consider a video sequence obtained from a mobile robot moving along a known trajectory with known constant velocity. The first and last images of the frame sequence are shown in Figure 3. The robot's position advances cm between each image, hence a velocity of cm per unit of time t. The map of maximum likelihood δ̂ and the map of time to collision T were computed over a grid of slits at 6 orientations and are rendered in Figure 33. Computation time on an Intel Pentium 4 66 MHz workstation was approximately 9 seconds per image frame pair.
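The δ̂ and T maps above are computed from image slits, i.e. 1-D intensity cross-sections sampled on a regular grid at 6 orientations. Purely as an illustration of that sampling pattern, the sketch below extracts bilinearly interpolated slits around each grid point; the slit length, grid spacing, and function names are assumptions, and the receptive-field filtering that the actual system applies to these slits is not shown.

```python
import numpy as np

def sample_slit(img, cx, cy, theta, length=32):
    """Bilinearly sample a 1-D intensity slit of `length` pixels centred on
    (cx, cy) at orientation `theta` (radians)."""
    t = np.linspace(-(length - 1) / 2.0, (length - 1) / 2.0, length)
    xs = np.clip(cx + t * np.cos(theta), 0, img.shape[1] - 1.001)
    ys = np.clip(cy + t * np.sin(theta), 0, img.shape[0] - 1.001)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def slit_grid(img, spacing=16, n_orient=6, length=32):
    """Sample slits at n_orient orientations on a regular grid of centres.
    Returns a dict mapping (cy, cx, k) -> 1-D slit, k being the orientation index."""
    slits = {}
    for cy in range(spacing, img.shape[0] - spacing, spacing):
        for cx in range(spacing, img.shape[1] - spacing, spacing):
            for k in range(n_orient):
                theta = np.pi * k / n_orient      # 6 orientations spanning 180 degrees
                slits[(cy, cx, k)] = sample_slit(img, cx, cy, theta, length)
    return slits
```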

Fig. 3. Finding the Focus of Expansion for Yosemite canyon image pair from local information only. A likelihood map of FOE, w_ij, is shown in (a). The + marker indicates the computed FOE. A contour map of FOE likelihood is superimposed on one of the images in (b).

Fig. 3. Lab sequence, source images: Frame (t ) and Frame (t ).

In a practical implementation, the redundancy in the filters would be reduced using linear combinations of Principal Components, potentially reducing computation time by a factor of . Using a sparser sampling of the image plane would further reduce the computation time, bringing it closer to real time. One significant observation is that although the floor tile pattern expands closer to the camera due to perspective and the texture of the floor is moving toward the robot, the floor is heading underneath the camera and does not appear to be on a collision course with the camera. Because the camera line of sight is parallel to the floor, the algorithm has effectively classified the floor motion as maintaining constant distance from the camera, and thus not an obstacle. The algorithm performs a literal figure-ground separation, and the obstacle blobs are at least qualitatively useful for identifying the location of the nearest obstacles in the image.

Next, the quantitative estimates are examined. Note that in Figure 33 there are holes in the time map between the table and chair legs, where the back wall is beyond the detector's range. The chair near the center of the image started 4cm from the camera and by the eleventh frame is cm from the camera. The chair has a large opening in it, letting a view of the background through. The image slits are based on a model of a continuous surface, so some uncertainty in the maximum likelihood estimates in this situation is unavoidable.
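The proposed speed-up is a standard one: approximate the full bank of receptive-field filters by a small number of principal components, correlate the image with those components once, and reconstruct each filter's response as a linear combination of the component responses. The sketch below is a generic illustration of that idea using NumPy SVD and frequency-domain correlation; the filter shapes, number of components, and all names are assumptions rather than details taken from the paper.

```python
import numpy as np
from numpy.fft import rfft2, irfft2

def compress_filter_bank(filters, n_components):
    """Approximate a stack of filters (n, h, w) by n_components principal
    components plus per-filter mixing coefficients."""
    n, h, w = filters.shape
    flat = filters.reshape(n, h * w)
    mean = flat.mean(axis=0)
    u, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_components].reshape(n_components, h, w)   # principal filters
    coeffs = u[:, :n_components] * s[:n_components]         # mixing weights, shape (n, k)
    return mean.reshape(h, w), basis, coeffs

def apply_compressed_bank(image, mean, basis, coeffs):
    """Filter the image with the compressed bank: one correlation per principal
    component (plus the mean filter) instead of one per original filter."""
    shape = image.shape
    F = rfft2(image)
    responses = [irfft2(F * np.conj(rfft2(b, s=shape)), s=shape) for b in basis]
    mean_resp = irfft2(F * np.conj(rfft2(mean, s=shape)), s=shape)
    stack = np.stack(responses)                              # (k, H, W)
    return mean_resp[None] + np.tensordot(coeffs, stack, axes=([1], [0]))
```

Because correlation is linear in the filter, reconstructing responses from the component responses is exact up to the truncation error of the SVD, so the trade-off between speed and fidelity is controlled entirely by `n_components`.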

Fig. 33. Results from Lab sequence: source images, δ maps, and time maps for successive frames (t through t 5). For the δ map, δ = .8 (rapid approach) is rendered as black, δ = . (no depth change) is middle gray and δ = .5 (rapid retreat) is white. The time map indicates proximity (either about to touch or recently touched) as brightness. Dark patches are more than 5 units of time away, either in the future or the past.

The usefulness of the estimates is shown in Figure 34, which compares the mean estimated time to collision over the region around that chair to ground truth from actual measurements.

Fig. 34. Time to collision estimate for the neighborhood of the central chair in the Lab sequence (time to collision plotted against image frame). Image patches from frames , 3, 6, and 9 are shown above. Below, the mean time to collision with this patch, shown as small circles, is compared with the ground truth (line).
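The comparison in Figure 34 amounts to averaging the recovered time-to-collision map over an image patch and plotting it against the ground-truth count of frames remaining, which is known because the robot advances by a fixed step along a measured trajectory. A minimal sketch of that bookkeeping is given below; the patch coordinates, distances, and step size are placeholders, not the actual lab measurements.

```python
import numpy as np

def ground_truth_ttc(start_distance_cm, step_cm, n_frames):
    """Frames remaining before contact for an object approached at constant speed:
    distance left at frame k divided by the per-frame step."""
    distances = start_distance_cm - step_cm * np.arange(n_frames)
    return distances / step_cm

def mean_patch_ttc(ttc_maps, row_slice, col_slice):
    """Mean of each recovered time-to-collision map over a fixed image patch,
    ignoring invalid (non-finite) entries."""
    means = []
    for ttc in ttc_maps:
        patch = ttc[row_slice, col_slice]
        means.append(np.nanmean(np.where(np.isfinite(patch), patch, np.nan)))
    return np.array(means)

# Placeholder example: an object starting 240 cm away, robot advancing 20 cm per frame.
gt = ground_truth_ttc(start_distance_cm=240.0, step_cm=20.0, n_frames=11)
print(gt)   # 12, 11, 10, ... frames remaining
```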

6.6 Lab Sequence

A second mobile robot travelled a different linear trajectory in the same room, sampling 4 images, each 5cm apart, shown in Figure 35.

Fig. 35. Lab sequence, source images: Frame (t ) and Frame (t ).

As before, the floor tiles are ignored, and the real obstacles near the far wall are highlighted. The textureless patches of the far wall are uninformative and of high uncertainty: they are rendered here as high times to contact. The results of Figure 36 ignore patches that appeared to be moving away from the moving camera. Note the shape of the table, chair and coat rack that stand out against the background.

Fig. 36. Results from Lab sequence: source images, δ maps, and time maps for successive frames (t 6 onward). For the δ map, δ = .8 (rapid approach) is rendered as black, δ = . (no depth change) is middle gray and δ = .5 (rapid retreat) is white. The time map indicates proximity (either about to touch or recently touched) as brightness. Dark patches are more than 5 units of time in the future. Negative times are ignored.
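The rendering rules described in the captions of Figures 33 and 36 reduce to simple masking and grey-level mapping of the δ and time maps: retreating patches (negative times to collision) and points far in the future are suppressed, and the remaining obstacle blobs stand out against the background. The sketch below reproduces that kind of post-processing generically; the grey-level anchors, the 5-unit horizon, the assumption that δ = 1 means no depth change, and all array names are illustrative choices consistent with, but not taken verbatim from, the captions.

```python
import numpy as np

def render_delta_map(delta, lo=0.8, mid=1.0, hi=1.5):
    """Map depth-change ratios to grey levels: rapid approach (delta = lo) -> black,
    no depth change (delta = mid, assumed 1.0) -> middle grey,
    rapid retreat (delta = hi) -> white."""
    return np.clip(np.interp(delta, [lo, mid, hi], [0.0, 0.5, 1.0]), 0.0, 1.0)

def render_time_map(ttc, horizon=5.0):
    """Brightness encodes proximity; patches more than `horizon` units of time away,
    or with negative / invalid times, are rendered dark."""
    ttc = np.asarray(ttc, dtype=float)
    valid = np.isfinite(ttc) & (ttc >= 0.0) & (ttc <= horizon)
    brightness = np.zeros_like(ttc)
    brightness[valid] = 1.0 - ttc[valid] / horizon   # nearer in time -> brighter
    return brightness

def obstacle_mask(delta, ttc, approach_thresh=0.95, horizon=5.0):
    """Crude figure-ground separation: keep pixels that are both approaching
    and within the time horizon."""
    ttc = np.asarray(ttc, dtype=float)
    return (np.asarray(delta) < approach_thresh) & np.isfinite(ttc) & (ttc >= 0) & (ttc <= horizon)
```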
