Towards Direct Recovery of Shape and Motion Parameters from Image Sequences

Stephen Benoit and Frank P. Ferrie
McGill University, Center for Intelligent Machines, 4 University St., Montréal, Québec, CANADA HA A7
Tel.: +1 14-1 FAX: +1 14-74
{benoits, ferrie}@cim.mcgill.ca

Abstract

A novel procedure is presented to construct image-domain filters (receptive fields) that directly recover local motion and shape parameters. These receptive fields are derived from training on image deformations that best discriminate between different shape and motion parameters. Beginning with the construction of 1-D receptive fields that detect local surface shape and motion parameters within cross sections, we show how the recovered shape and motion model parameters are sufficient to produce local estimates of time to collision. In general, filter pairs (receptive fields) can be synthesized to perform or detect specific image deformations. At the heart of the method is the use of a matrix to represent image deformation correspondence between individual pixels of two views of a surface. The image correspondence matrix can be decomposed using Singular Value Decomposition to yield a pair of corresponding receptive fields that detect image changes due to the deformation of interest.

1. Introduction

Recovery of structure from motion has been examined from a variety of approaches, mainly feature point extraction and correspondence[, 1] or computing dense optical flow[14, 1]. Typically, the Fundamental Matrix framework or a global motion model is used to solve for global motion after which the relative 3-D positions of points of interest in the scene can be computed[1, 1]. Direct recovery of parameters by recognising characteristic motions in a scene has been examined using point correspondences[1] and local optical flow field deformations[4,, 1]. However, both methods require the extraction of motion information from the image sequence before parameter recovery. Many of these techniques require local regularization; insufficient information is available in each sample to reconstruct the surface shape[].
Appearance-based methods have been mostly discarded for structure from motion because much of the shape and motion information are so confounded that they cannot be recovered separately or locally[]. Soatto proved that perspective is non-linear, therefore no coordinate system will linearize perspective effects[17]. Part of the difficulty involves how regional image changes can be represented in an appearance-based framework. Most approaches involve solving a local flow field as a weighted sum of basis flow fields with some perceptual significance[, ] or tracking specific image features[1] with wavelets[7], typically training on image sequences of motions of interest[]. Converting these representations into image domain operations[11, 1] could allow direct recovery of significant model parameters without solving a local optimization problem by gradient descent.

Two causes prevent the effective direct recovery of local structure from motion with appearance-based methods. First is representation: how to encode image deformation independently of image texture and without the need of feature points. Second is how to map the image deformations into some optimal image operators. This paper addresses both issues by describing a representation and a methodology for designing and using these optimal image domain operators for appearance-based local recovery of some shape and motion parameters. This feedforward system is used to compute a dense map of time to collision in image sequences. The optimal operator synthesis will discover what spatial scales most appropriately describe the different deformations for the camera model.

Instead of attempting to recover the shape and motion parameters of a scene over all the possible parameter space, only those shapes and motions that cause distinctly different image deformations are considered. Even with aliasing of some parameters into similar appearances, enough information can be extracted from the image deformations to recognise specific shapes or motions which are useful for specific tasks such as computing time to collision.

By understanding the image formation process, the mapping from the shape and motion model parameters to image deformations (image correspondences) can be expressed in a matrix form that precisely encodes the image plane correspondences of a given model instance. In this paper, we will apply the theory to 1-D cases without loss of generality.

A point-correspondence mapping $M$ from a point $x$ in a first space to a second space at point $y$ is expressed as:

$$x \in X \;\xrightarrow{\;M\;}\; y \in Y \tag{1}$$

$$y \in Y \;\xrightarrow{\;M^{-1}\;}\; x \in X \tag{2}$$

The function $M$ may be continuous, discontinuous, piecewise linear or non-linear, multi-valued, and in short, arbitrarily complex. But if the spaces $X$ and $Y$ can be discretized, the function $M$ can be expressed as a correspondence matrix mapping the index of a discrete $[x]$ to the index of a discrete $[y]$ and back.

The correspondence matrix $H$ is a representation of such an image deformation or a coordinate remapping. Element $H_{ij}$ is a non-negative real number indicating the amount (probability) of correspondence between coordinate $i$ in the first space, $[X]_i$, and coordinate $j$ in the second space, $[Y]_j$.

$$H_{ij} \triangleq \left( [X]_i \overset{M}{\longleftrightarrow} [Y]_j \right) \tag{3}$$

$$H_{ij} = \frac{A\left(M([X]_i) \cap [Y]_j\right) + A\left([X]_i \cap M^{-1}([Y]_j)\right)}{A\left([X]_i\right) + A\left([Y]_j\right)} \in [0, 1] \tag{4}$$

where $A(\mathcal{N})$ is a measure of area or volume in the neighborhood $\mathcal{N}$, a measure of the size of the Voronoi cell of $\mathcal{N}$. The correspondence matrix thus described is not to be confused with the fuzzy correspondence matrix of Ben-Ezra et al.[10], which described the correspondences at each point $p$ with a matrix $M(p)$ where each cell $(i, j)$ corresponds to a probability that point $p$ has a displacement $(i, j)$.

Note that $H$ is a discretized representation of a possibly continuous mapping $M$. Both $i$ and $j$ can represent single-axis (i.e. 1-D) positions, but could also represent an arbitrary indexing into a multi-dimensional space.
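To make the definition of the correspondence matrix concrete, the following is a minimal sketch for one simple case. It assumes that both spaces are the unit interval split into equal cells, that the mapping is the invertible affine map y = scale * x + shift, and that the area measure A is interval length. The function names and parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def interval_overlap(a0, a1, b0, b1):
    """Length of the intersection of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def correspondence_matrix(n_x, n_y, scale, shift):
    """Correspondence matrix for the 1-D affine map y = scale * x + shift.

    Assumptions (illustrative only): X and Y are both [0, 1], split into
    n_x and n_y equal cells, scale is non-zero, and the area measure is
    interval length.
    """
    x_edges = np.linspace(0.0, 1.0, n_x + 1)
    y_edges = np.linspace(0.0, 1.0, n_y + 1)
    H = np.zeros((n_x, n_y))
    for i in range(n_x):
        x0, x1 = x_edges[i], x_edges[i + 1]
        # Forward image of cell i under the map.
        fx0, fx1 = sorted((scale * x0 + shift, scale * x1 + shift))
        for j in range(n_y):
            y0, y1 = y_edges[j], y_edges[j + 1]
            # Backward image of cell j under the inverse map.
            by0, by1 = sorted(((y0 - shift) / scale, (y1 - shift) / scale))
            forward = interval_overlap(fx0, fx1, y0, y1)
            backward = interval_overlap(x0, x1, by0, by1)
            # Overlap areas divided by the summed cell areas.
            H[i, j] = (forward + backward) / ((x1 - x0) + (y1 - y0))
    return H
```

Under these assumptions the identity map gives the identity matrix, while a pure scale change spreads each row over several columns, which is the kind of structure the correspondence matrix is meant to encode.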
To construct a correspondence matrix $H$,

1. Discretize the first image space $X$ into $M$ elements.
2. Discretize the second image space $Y$ into $N$ elements.
3. Initialize $H_{M \times N} = H_{ij|M} = H_{ij|M^{-1}} = 0$.
4. For each $i \in \{1, \ldots, M\}$:
   - using a uniform distribution of points in $[X]_i$,
   - compute the distribution of points $[Y]_j$ at or near $M([X]_i)$,
   - assign $H_{ij|M} = A\left(M([X]_i) \cap [Y]_j\right)$.
5. For each $j \in \{1, \ldots, N\}$:
   - using a uniform distribution of points in $[Y]_j$,
   - compute the distribution of points $[X]_i$ at or near $M^{-1}([Y]_j)$,
   - assign $H_{ij|M^{-1}} = A\left([X]_i \cap M^{-1}([Y]_j)\right)$.
6. Assign $H_{ij} = \dfrac{A([Y]_j)\, H_{ij|M} + A([X]_i)\, H_{ij|M^{-1}}}{A([X]_i) + A([Y]_j)}$.

By choosing a model that may alias the appearance changes due to some parameters together but generates distinct appearance changes for the most important parameters, the model parameter space can be discretized over the range of interesting scene events. Only a finite number of these correspondence matrices are required to encode the appearance changes for changes of those most important parameters over all variations for the aliased parameters.

The scenario presented in Section 2 is the recovery of shape and motion using a simple surface model that captures image translation and scale change. By sampling the model space and generating the correspondence matrices for key shapes and motions, the recoverable parameters are enough to deduce the time to collision from an image sequence. The operators are tested with synthetic and real data in Section 4 and demonstrate the successful recovery of time to collision for synthetic and real image sequences. With image formation models in mind, Section 3 will derive the unified solution of converting the image formation models into image domain operators, and how they are used to detect image events of interest.

2. Image Formation Model for Local Shape and Motion

This section details a model that encodes planar image translation, like optical flow, as well as depth (or scale) change. By building detectors for this more evolved model, a local operator can detect both translation and scale change to recover depth cues. Using these detectors in the earliest vision layer should yield superior optical flow. While recovering depth information from an image sequence can only be found up to a scale factor, the absolute time to collision can be recovered from the same cues. To this end, we define the following image formation model.

A single oriented slit aperture for a perspective image formation model can be parametrized with an aperture angle, $\alpha$, and the number of pixels $N$ that are visible on the image plane within that aperture.

Figure 1. 1-D local aperture imaging model.

As shown in Figure 1, the fixation point $O$ is the point on the surface in the center of the aperture's field of view, i.e. view angle $\gamma = 0$. $\gamma$ ranges in value from $-\alpha/2$ to $\alpha/2$, with positive $\gamma$ to the right (positive $x$ direction). For any view angle $\gamma$ formed between the $VO$ and $VP$ lines within $-\alpha/2$ and $\alpha/2$, there is a corresponding value for the image plane coordinate $x$ that can be projected into one of the pixel coordinates $i$ between $0$ and $N - 1$,

$$i = \frac{N - 1}{2} + \frac{N \tan(\gamma)}{2 \tan\left(\frac{\alpha}{2}\right)}. \tag{5}$$

And, conversely, the image coordinate $i$ can be mapped back to the view angle,

$$\gamma = \tan^{-1}\left[ \frac{2i - N + 1}{N} \tan\left(\frac{\alpha}{2}\right) \right]. \tag{6}$$

This camera model will be used to map surface points to image coordinates to build the correspondence matrix $H$ with a scene structure model, described next.

Two views of a 1-D cross section of a surface can be locally modeled using parameters. The surface shape along one direction can be characterized by a curvature $K$, a normal vector $\vec{N}$ and distance from the viewpoint, $d$, shown in Figure 2. Distance $d$ scales all lengths of the diagram, so it is factored out to a canonical representation with unit distance between first viewpoint $V$ and the fixation point on the surface. The surface normal vector $\vec{N}$ is encoded by the angle $\eta$ with respect to the first view axis $VO$. The curvature of the canonical surface becomes $k = Kd$. The motion model is chosen to minimize image deformation due to translation, defining the second viewpoint $V'$ at a given distance $\delta$ at an angle $\beta$ from the first view axis $VO$. $V'$ is fixated on point $Q$ on the surface, a view rotation $\Omega$ away from the first fixation point.

Figure 2. 1-D local aperture section model.

This 1-D cross section shape and motion model can be aligned with multiple orientations at each position in the image plane to reconstruct -D surface shape and motion.
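The mapping between view angle and pixel coordinate can be sketched in a few lines. This sketch assumes the mapping has the form i = (N - 1)/2 + N tan(γ) / (2 tan(α/2)), with γ the view angle and α the aperture angle, both in radians. The function names and the example values (64 pixels, a 10 degree aperture) are hypothetical and chosen only for illustration.

```python
import math

def angle_to_pixel(gamma, alpha, n_pixels):
    """Pixel coordinate for view angle gamma under the assumed slit model.

    Angles are in radians; the result is a real-valued pixel coordinate.
    """
    return (n_pixels - 1) / 2 + (n_pixels / 2) * math.tan(gamma) / math.tan(alpha / 2)

def pixel_to_angle(i, alpha, n_pixels):
    """Inverse mapping: view angle (radians) for pixel coordinate i."""
    return math.atan((2 * i - n_pixels + 1) / n_pixels * math.tan(alpha / 2))

# Illustrative values only: 64 pixels across a 10 degree aperture.
aperture = math.radians(10.0)
center_pixel = angle_to_pixel(0.0, aperture, 64)   # fixation point
```

With these definitions the fixation direction (γ = 0) lands midway between the two central pixels, and applying the two functions in sequence returns the original angle.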
Some experimental observations[], beyond the scope of this paper, revealed that the motion parameters $\Omega$ and $\delta$ are not separable, but can be found jointly; together they produce distinct motion fields. These can be considered first order structure from motion parameters. In contrast, the view angle change $\beta$ and the shape parameters $k$ and $\eta$ were observed to be confounded. Even in apertures of N = pixels, there is very little difference in the motion field to distinguish different shapes from a single aperture. These parameters require either neighborhoods of support of curvature consistency between neighbors or a global solution. This extra work would qualify the parameters $\beta$, $k$, $\eta$ as second order structure from motion parameters, and will not be recovered for the purposes of this paper.

For image pairs of interest, the views will overlap in the camera aperture of angle $\alpha$ (shown in Figure 1), restricting the useful range of $\Omega$ to about $(-\alpha/2, +\alpha/2)$, and typically, $\alpha$ < 1 °. With these angles so small, $\Omega$ will be linearly proportional to the translation along the image slit axis. The parameter $\delta$ is the reciprocal of image scale change,

$$\delta = \frac{\|V'O\|}{\|VO\|}. \tag{7}$$

Recovering $\Omega$ and $\delta$ locally in forward time can be augmented by recovering $\Omega'$ and $\delta'$ by reversing the sequence of the images. A direct method of computing the time to collision between two images separated by a delay of $\Delta t$ is:

$$T = \frac{\Delta t}{2} \left( \frac{\delta}{1 - \delta} + \frac{1}{\delta' - 1} \right). \tag{8}$$

Calculating the time to collision requires recovering the $\delta$ and $\Omega$ parameters.

Figure 3 illustrates some of the unique $H$ for specific instances of model parameters $(\Omega, \delta, k, \eta, \beta)$. The rows are the index into the first image $I$, and the columns are the index into the second image $I'$. $\Omega$ shifts the white curve horizontally, and $\delta$ affects its slope. This paper aims to recover $H$ given image pair $(I, I')$. Section 3 details the general theory for constructing image domain operators to detect their characteristic image deformations.

Figure 3. Correspondence matrices $H_{\Omega,\delta}$.

3. Theory

The image structure from two images separated by time can be encoded by a correspondence matrix $H$. This section will show that the Singular Value Decomposition of $H$ leads to a unique set of image domain operators that can code for specific scene events as encoded by $H$.

The correspondence matrix $H$ relates the elements of a first image $I$ and a second image $I'$. The images are first normalized as $\tilde{I}$, $\tilde{I}'$ for a zero mean intensity and a contrast of 1 by finding the image's brightness $\bar{I}$ and contrast $\Delta I$.

$$\bar{I} \triangleq \frac{\sum I + \sum I'}{2N}, \qquad \Delta I \triangleq \max\left( |I - \bar{I}|,\; |I' - \bar{I}| \right), \qquad \tilde{I} = \frac{I - \bar{I}}{\Delta I}, \qquad \tilde{I}' = \frac{I' - \bar{I}}{\Delta I}. \tag{9}$$

3.1. Image Mapping H

Image correspondence between normalized images $(\tilde{I}, \tilde{I}')$ is determined by $H$, but not necessarily all information contained in $\tilde{I}$ is present in $\tilde{I}'$ nor vice versa. This means, for example, that some elements or pixels of $\tilde{I}'$ do not come from $\tilde{I}$ but come instead from some other source. The image correspondence constraints are in the form of a weighted sum of elements in the complementary image:

$$\left( \sum_j H_{ij} \right) \tilde{I}_i = \sum_j H_{ij}\, \tilde{I}'_j, \quad i \in \{1 \ldots N\}; \qquad \left( \sum_i H_{ij} \right) \tilde{I}'_j = \sum_i H_{ij}\, \tilde{I}_i, \quad j \in \{1 \ldots N\}. \tag{10}$$

The image correspondence equations can be re-written:

$$S \tilde{I} = H \tilde{I}'; \qquad S' \tilde{I}' = H^T \tilde{I}. \tag{11}$$

$$S = \mathrm{diag}\left( \sum_j H_{1j},\; \sum_j H_{2j},\; \ldots,\; \sum_j H_{Nj} \right); \qquad S' = \mathrm{diag}\left( \sum_i H_{i1},\; \sum_i H_{i2},\; \ldots,\; \sum_i H_{iN} \right) \tag{12}$$

Equation Set 11 deals automatically with non-shared information: note that elements in one space that have no correspondent in the other space will have a zero row sum of $H$ (element of $S$) or zero column sum of $H$ (element of $S'$).

3.2. Operator Synthesis

Given an image pair $(\tilde{I}, \tilde{I}')$, the goal is to find the $H$ that best describes their deformation and identify that $H$'s model parameters. In order to recognise the image events for a given correspondence matrix, there needs to be a transport function to map between the two views.
The image pair $(\tilde{I}, \tilde{I}')$ can be represented as two different transforms of a common texture vector $\vec{w}$. In order for the elements of texture vector $\vec{w}$ to be independent and linear, the image synthesis functions must be:

$$\tilde{I}_{N \times 1} \approx A_{N \times N}\, \vec{w}_{N \times 1}; \qquad \tilde{I}'_{N \times 1} \approx B_{N \times N}\, \vec{w}_{N \times 1}. \tag{13}$$

where $A$ is orthonormal and $B$ is also orthonormal. $A$ and $B$ are not expected to be exact solutions, because the image formation process does not lend itself to a direct linear mapping between the image pair. Image domain deformation caused by perspective projections of 3-D objects, for example, confounds the surface texture signal and the geometric deformation into one image signal, and the two components are generally not separable. $A$ and $B$ are, however, the best linear approximation to the geometric deformation image correspondence in the sense of minimizing some error.

The texture information shared by the two images via $H$ must exist in the space spanned by $H$, and thus can be expressed compactly in $r$ independent coefficients, where $r = \mathrm{rank}(H)$. The first $r$ coefficients of $\vec{w}$ encode the stationary signal shared between $\tilde{I}$ and $\tilde{I}'$ that are expressed through $H$. The remaining $N - r$ coefficients encode the non-shared image signals mapped into the null space of $H$.

Because $A$ and $B$ are chosen to be orthonormal and square, they are invertible (by transposition). This means that the texture vector $\vec{w}$ can be recovered from images $(\tilde{I}, \tilde{I}')$ knowing the correspondence matrix $H$.

$$A^T A\, \hat{w} = A^T \tilde{I} \;\Rightarrow\; \hat{w} = A^T \tilde{I} \tag{14}$$

$$B^T B\, \hat{w} = B^T \tilde{I}' \;\Rightarrow\; \hat{w} = B^T \tilde{I}' \tag{15}$$
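As a small numerical illustration of the two ideas above, the sketch below builds the diagonal row-sum and column-sum matrices for a toy correspondence matrix, checks the correspondence constraints S I = H I' and S' I' = Hᵀ I on a hand-made image pair, and shows recovery of a texture vector by transposition of an orthonormal matrix. The toy data (a one-pixel shift without wraparound on five pixels) and all function names are hypothetical.

```python
import numpy as np

def sum_matrices(H):
    """Diagonal matrices of the row sums (S) and column sums (S') of H."""
    S = np.diag(H.sum(axis=1))
    S_prime = np.diag(H.sum(axis=0))
    return S, S_prime

def recover_texture(A, image):
    """Texture estimate w_hat = A^T I, valid when A is orthonormal."""
    return A.T @ image

# Toy example: pixel i of the first image corresponds to pixel i + 1 of the
# second image. The last pixel of the first image has no correspondent, and
# the first pixel of the second image is new content.
N = 5
H = np.zeros((N, N))
for i in range(N - 1):
    H[i, i + 1] = 1.0

first = np.array([0.2, -0.5, 1.0, 0.3, -0.7])
second = np.empty(N)
second[1:] = first[:-1]   # shifted copy of the first image
second[0] = 0.9           # content with no counterpart in the first image

S, S_prime = sum_matrices(H)
forward_ok = np.allclose(S @ first, H @ second)
backward_ok = np.allclose(S_prime @ second, H.T @ first)
```

The zero row sum and zero column sum are what remove the unmatched pixels from the two constraints, which is why both checks hold even though the images differ at their borders.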

3.3. Solution via Singular Value Decomposition

Substituting Equation set 13 into Equation set 11,

$$S A \vec{w} \approx H B \vec{w}, \;\; \forall \vec{w}; \qquad S' B \vec{w} \approx H^T A \vec{w}, \;\; \forall \vec{w}. \tag{16}$$

The optimization problem to solve for $A$ and $B$ is thus:

$$\arg\min \| S A \vec{w} - H B \vec{w} \|; \qquad \arg\min \| S' B \vec{w} - H^T A \vec{w} \|. \tag{17}$$

Because the matrices $A$ and $B$ are expected to work independently of the surface texture encoded in $\vec{w}$, the optimization problem can be expressed as

$$\arg\min \| S A - H B \|; \qquad \arg\min \| S' B - H^T A \|. \tag{18}$$

The minimization terms can be recombined as

$$\arg\min \| S A B^T - H \|; \qquad \arg\min \| S' B A^T - H^T \|. \tag{19}$$

This can be visualized more clearly by substituting $H$ with its Singular Value Decomposition,

$$H_{N \times N} = U_{N \times N}\, \Sigma_{N \times N}\, V^T_{N \times N}. \tag{20}$$

where $U$ is the matrix whose columns are the left-hand eigenvectors of $H$ and the columns of $V$ are the right-hand eigenvectors. That is, the columns of $U$ are the eigenvectors of $H H^T$, and the columns of $V$ are the eigenvectors of $H^T H$, both sorted in decreasing order of eigenvalues. The singular values of $H$ are the elements of the diagonal matrix $\Sigma$, which are the square root of the eigenvalues from $H H^T$ or $H^T H$. Substituting the SVD, one must solve for:

$$\arg\min \| S A B^T - U \Sigma V^T \|; \qquad \arg\min \| A B^T S' - U \Sigma V^T \|. \tag{21}$$

The remainder of this proof builds $A$ and $B$ one column at a time, using successively better approximations of $H$. The $k$th order approximation, $H_k$, uses the first $k$ columns of $U$ and $V$ and the first $k$ diagonal elements of $\Sigma$. To solve for unknown orthogonal matrices $A$ and $B$, consider the 1st order approximation of $H$, $H_1 = \sigma_1 \vec{u}_1 \vec{v}_1^T$.

$$S A_1 B_1^T \approx \sigma_1 \vec{u}_1 \vec{v}_1^T; \qquad A_1 B_1^T S' \approx \sigma_1 \vec{u}_1 \vec{v}_1^T \tag{22}$$

The only choice for $A_1$ and $B_1$ that spans the same space as $\vec{u}_1$ and $\vec{v}_1$, satisfying both constraining equations, is

$$A_1 = \left[ \vec{u}_1, \vec{0}, \ldots, \vec{0} \right]; \qquad B_1 = \left[ \vec{v}_1, \vec{0}, \ldots, \vec{0} \right]. \tag{23}$$

An inductive proof introduces higher $k$th order approximations $H_k$ to solve for the corresponding $A_k$ and $B_k$.
Similarly,

$$S A_k B_k^T \approx \sum_{i=1}^{k} \sigma_i \vec{u}_i \vec{v}_i^T \tag{24}$$

$$S A_{k-1} B_{k-1}^T \approx \sum_{i=1}^{k-1} \sigma_i \vec{u}_i \vec{v}_i^T \tag{25}$$

$$S \left( A_k B_k^T - A_{k-1} B_{k-1}^T \right) \approx \sigma_k \vec{u}_k \vec{v}_k^T. \tag{26}$$

$$\left( A_k B_k^T - A_{k-1} B_{k-1}^T \right) S' \approx \sigma_k \vec{u}_k \vec{v}_k^T. \tag{27}$$

This implies that $A_k$ is $A_{k-1}$ plus an orthogonal component in the direction of $\vec{u}_k$ and that $B_k$ is $B_{k-1}$ plus an orthogonal component in the direction of $\vec{v}_k$. Therefore, the $k$th column of $A$ is the $k$th column of $U$ and the $k$th column of $B$ is the $k$th column of $V$. Therefore, the least squares error solution of the minimization above, satisfying full rank and orthonormality of $A$ and $B$, is to substitute

$$A = U; \qquad B = V. \tag{28}$$

Informally, the SVD expresses the correspondence matrix $H$ as a linear remapping $U^T$ from $\tilde{I}$ to the same space as a linear remapping $V^T$ from $\tilde{I}'$. The SVD of $H$ optimizes the spectral representation of the transform between a pair of data sets (images), maximizing the independence between channels (elements of $\vec{w}$), ranked according to significance in the sense of minimizing least squares error.

3.4. Distance to Data Metric

The maximum likelihood hypothesis $H$ for an image pair $(\tilde{I}, \tilde{I}')$ minimizes the residual error $\vec{r}$ between the input images and their reconstructions predicted by $H$. Determine the feature vector $\hat{w}$ that uses $k$th order approximations $U_k$ and $V_k$ to represent the image vectors $\tilde{I}$ and $\tilde{I}'$.

$$\hat{w} = \left[ \frac{U_k^T}{\sqrt{2}} \;\; \frac{V_k^T}{\sqrt{2}} \right] \begin{bmatrix} \tilde{I} \\ \tilde{I}' \end{bmatrix} \tag{29}$$

The feature vector $\hat{w}$ is now the best parameterization for the image pair assuming deformation $H$. The residual error can be computed by projecting the feature vector back into the image space. If the assumed deformation $H$ is sufficiently close to the scene geometry, then the residual signal error $\vec{r}$, the difference between the original image signal and the reconstructed image signal, will be low.

$$\vec{r} = \left( \begin{bmatrix} \tilde{I} \\ \tilde{I}' \end{bmatrix} - \frac{1}{\sqrt{2}} \begin{bmatrix} U_k \\ V_k \end{bmatrix} \hat{w} \right) \Bigg/ \left\| \begin{bmatrix} \tilde{I} \\ \tilde{I}' \end{bmatrix} \right\| \tag{30}$$
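The detector synthesis and the residual test can be sketched with NumPy's SVD. This is an illustration under stated assumptions rather than the authors' implementation: the stacked basis is scaled by 1/√2 so that its columns are orthonormal, the residual is reported relative to the norm of the stacked image pair, and the example correspondence matrix (a cyclic one-pixel shift on eight pixels) is hypothetical.

```python
import numpy as np

def synthesize_detectors(H, k):
    """Return the first k left and right singular vectors of H as columns."""
    U, _, Vt = np.linalg.svd(H)
    return U[:, :k], Vt[:k, :].T

def relative_residual(U_k, V_k, first, second):
    """Relative reconstruction error of an image pair under hypothesis H.

    Assumption: the stacked basis [U_k; V_k] / sqrt(2) has orthonormal
    columns, so projecting onto it and back gives the reconstruction.
    """
    x = np.concatenate([first, second])
    basis = np.vstack([U_k, V_k]) / np.sqrt(2.0)
    w_hat = basis.T @ x                 # feature vector
    r = x - basis @ w_hat               # part of the pair not explained by H
    return np.linalg.norm(r) / np.linalg.norm(x)

# Hypothetical example: H encodes a cyclic shift by one pixel.
N = 8
H = np.roll(np.eye(N), 1, axis=1)       # H[i, (i + 1) % N] = 1
U_k, V_k = synthesize_detectors(H, N)

first = np.zeros(N)
first[0] = 1.0
matching = np.roll(first, 1)            # consistent with H
mismatched = np.roll(first, 3)          # a different shift

residual_match = relative_residual(U_k, V_k, first, matching)
residual_mismatch = relative_residual(U_k, V_k, first, mismatched)
```

In this toy case the pair that agrees with the hypothesized shift reconstructs almost exactly, while the pair with a different shift leaves a large residual, which is the behaviour the distance-to-data metric relies on.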

The likelihood of correspondence $H$ given evidence $(\tilde{I}, \tilde{I}')$ can be expressed as a function $L(H \mid \tilde{I}, \tilde{I}')$.

$$L(H \mid \tilde{I}, \tilde{I}') \triangleq e^{-\|\vec{r}\|} \in (0, 1] \tag{31}$$

The uncertainty of the maximum likelihood choice can be expressed as the entropy $h$ of the likelihoods for all the different hypotheses.

$$h = \frac{-\sum_{i=1}^{n} L(H_i \mid \tilde{I}, \tilde{I}') \log L(H_i \mid \tilde{I}, \tilde{I}')}{\log(n)} \in (0, 1] \tag{32}$$

To summarize the procedure for parametric recovery,

Preparation (off-line)
1. Discretize parameter space $M$ into $n$ representative parameter vectors $\vec{m}_i$, $i \in \{1, \ldots, n\}$.
2. Synthesize a correspondence matrix $H_i$ to represent the coordinate mapping $M(\vec{m}_i)$.
3. Synthesize detectors for $H_i$, using SVD.

Usage (on-line)
1. Using the detectors for $H_i$, given the observations $\tilde{I}$ and $\tilde{I}'$, compute $L(H_i \mid \tilde{I}, \tilde{I}')$.
2. Find the maximum likelihood $H_i \mid \tilde{I}, \tilde{I}'$, and set the maximum likelihood $\vec{m}^* = \vec{m}_i$.
3. Use the distribution of $L(H_i \mid \tilde{I}, \tilde{I}')$ to find the entropy $h$ of the maximum likelihood choice.

4. Experiments on Time to Collision

4.1. Synthesis of Operators

The model space of $(\Omega, \delta, k, \eta, \beta)$ was discretized into 1 levels for each of $\Omega$ and $\delta$, and levels each for $k$, $\eta$, $\beta$ and $\alpha$, binning the correspondence matrices into mean correspondence matrices indexed by $(\Omega, \delta)$ for a total of 1 distinct correspondence matrices. Applying the procedures of Section 3, 1 new receptive field banks are synthesized by retaining the first 1 filter pairs of each, illustrated in Figure 4. Each row of the upper grayscale images at time $t$ is a deformed sinusoid which corresponds to another deformed sinusoid at time $t_1 = t + \Delta t$, in the corresponding row in the lower grayscale images. The first detectors are shown. Note that the most significant optimal detectors for this motion model are windowed sinusoids concentrated in the lower spatial frequencies. The SVD automatically discovered the appropriate spatial scales for detecting each deformation.

Figure 4. $U$, $V$ of correspondence matrices.

The parameters used are:

Number of pixels: N 4
View translation: Ω[i] :4°, i ∈ { ; …; 1}
Distance to surface: δ[i] 1:4 ( ), i ∈ { ; …; 1}
Surface normal: η[i] ∈ { 4°; :°; …; 4°}
View change: β[i] ∈ { 1°; °; °; °; 1°}
Surface curvature: k[i] ∈ { 4; ; ; ; 4}
View aperture: α[i] ∈ { °; :°; °; :°; 1°}

These experiments were performed using slits of length N = 4 with widths of pixels. The filter banks are applied to a gridding of the image pair at different orientations ( °, °, °, °, 1 °, 1 °). The same procedure is applied by swapping the two images to recover the maximum likelihood for the time reversal, $(\Omega', \delta', )$. $\Omega$ and $\Omega'$ are useful for optical flow, but the local depth (or time to collision) cues are only available through the $\delta$ and $\delta'$ measures. $\Omega$ and $\delta$ are found jointly, and the usefulness of $\delta$ is sensitive to the correct compensation of $\Omega$. Some neighborhood filtering is required, since $\delta$ is noisy. The maximum likelihood values for $\Omega, \delta, \Omega', \delta'$ in the grid of elements are median filtered in a neighborhood for $\Omega, \Omega'$, and a 7 × 7 neighborhood for $\delta, \delta'$.

4.2. Synthetic Images, 4 Boxes

To test the Time to Collision detectors, a random dot texture was combined with synthetic range data shown in Figure 5 to generate the synthetic image pair shown in Figure 6. The surface consisted of 4 quadrants of varying distance (4,, and mm from the camera focal point), and the camera moves by mm towards the center of the scene. The ground truth time to collision for the four quadrants are thus ,, 1 and 11 units of time in the future. The camera geometry is modeled as a pinhole perspective camera with 4 square pixels where pixels are mapped onto a viewing angle of 4:1 °. This means that the image samples of 4 pixels will cover aperture angles $\alpha$ varying from 1 ° at the center to ° at the periphery.

The data recovered for the two Ω parameters are almost perfectly planar for each orientation, as shown in Figure 7. This is expected, as the scene motion generates a radial flow field. Ω is proportional to the image plane velocity, with magnitude increasing linearly with distance from the focus of expansion. Note that the Ω parameter is signed and directional.

The local structure from motion problem is at its worst in image sequences with a forward-moving camera. Optical flow methods have no information at or near the focus of expansion, and small errors in the flow field, even toward the periphery, render depth recovery impractical without fitting a global model to the optical flow field. In this experiment, the time-to-collision detectors are designed specifically to work best at or near the focus of expansion.

Figure 5. Synthetic texture and range (gray level encodes distance from the focal point).

Figure 6. Synthetic image pair. The camera moves toward the center of the scene between the two images.

Figure 7. Recovered Ω parameters across orientations and image locations (gray level encodes the signed value, from black through gray to white).

Figure 8. Recovered α parameters across orientations and image locations (gray level encodes the value, from black through gray to white).

Since the Ω parameters have been cleanly recovered, it is not surprising that the more sensitive α parameters have formed blobs roughly in the four quadrants of the scene, as shown in Figure 8.

Figure 9. Time to collision, ground truth versus recovered (two panels, 4 boxes each: ground truth and recovered).

After observing the two images of Figure 6, which are separated by the camera displacement and one unit of time, the receptive fields are able to detect the four separate, yet tightly clustered, collision events, the latest of which is 11 units of time in the future. With this in mind, the recovered time-to-collision results are reasonably close to the ground truth; the two are shown side by side in Figure 9.

4.[...] Natural images, Calibration Grid

The experiment was repeated using grayscale images captured by a camera mounted on a gantry robot looking at a flat planar calibration grid. The camera field of view is the same as in the earlier synthetic example, but it is not an idealized pinhole camera. The camera lens starts at [...] mm from the plane and moves [...] mm closer, for a time to collision of [...] units of time; the scene is shown in Figure 10. The camera optics slightly bend the grid lines; the small squares toward the periphery are a bit smaller than the squares near the center. This spherical aberration (in effect, barrel distortion) makes the grid appear as a textured sphere shown close up, directly in front of the camera. The camera's focal length is 7mm, placing the focal point about 1mm behind the lens from which camera distance was measured.

The detector response in Figure 11 is so sensitive that the recovered time to collision captures the spherical aberration effect. The time to collision varies from 11 units of time at the center to [...] units of time at the periphery. The aberration
makes the periphery appear farther than it actually is, moving faster than the center, hence a lower time to collision. In this case, optical flow in the periphery is uninformative; there appears to be no translation, but there is texture expansion. This experimental result confirms that the local feed-forward operators can extract precise estimates of the real time to collision without a global scene motion model.

Figure 10. Planar calibration grid scene.

Figure 11. Time to collision, planar grid (recovered, flat grid).

5. Conclusion

The direct recovery of some shape and motion parameters from image sequences is now possible with image-domain appearance, without a need for a global motion solution. By populating a synthetic retina with specialized receptive fields, the vision system can estimate time to collision with the obstacles in the scene. A new framework for operator synthesis was introduced that constructs scene shape and motion detectors from scene motion models in a principled way, automatically discovering appropriate spatial scales. This technique can be applied to many domains, automatically finding optimal image-domain transfer functions between two images or two spaces. Image events characterized by H lead directly to synthetic local feed-forward operators, and naturally produce uncertainty measures or confidence intervals on the recovered model parameters. Preliminary results indicate that these purely local feed-forward operators perform reasonably accurate recovery of scene structure without the need to regularize.
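As a supplementary illustration of the time-to-collision quantity used in the experiments, the following minimal sketch shows the textbook relation for a camera moving straight toward a frontoparallel surface: image points move radially away from the focus of expansion with speed proportional to their distance from it, so the time to collision is that distance divided by the radial speed. This is not the receptive-field operator synthesis described in the paper. The function names, the pixel coordinates, and the numeric values are all hypothetical and chosen only for illustration.

```python
import math


def time_to_collision(distance_mm, step_mm_per_unit_time):
    """Units of time until a camera approaching at constant speed reaches the surface."""
    if step_mm_per_unit_time <= 0:
        raise ValueError("the camera must be approaching the surface")
    return distance_mm / step_mm_per_unit_time


def radial_flow(x, y, foe, ttc):
    """Ideal image velocity at (x, y) for pure forward motion (hypothetical model).

    The flow points away from the focus of expansion `foe`, and its
    magnitude grows linearly with the distance from it.
    """
    return ((x - foe[0]) / ttc, (y - foe[1]) / ttc)


def ttc_from_flow(x, y, u, v, foe):
    """Invert the radial model: ttc = |r|^2 / (r . flow), with r = (x, y) - foe."""
    rx, ry = x - foe[0], y - foe[1]
    radial_speed = rx * u + ry * v
    if radial_speed <= 0:
        # At the focus of expansion r is zero, so the flow carries no
        # information about time to collision at that point.
        return math.inf
    return (rx * rx + ry * ry) / radial_speed


if __name__ == "__main__":
    foe = (0.0, 0.0)                           # assumed focus of expansion
    ttc_true = time_to_collision(100.0, 10.0)  # illustrative values only
    u, v = radial_flow(30.0, 40.0, foe, ttc_true)
    print(ttc_from_flow(30.0, 40.0, u, v, foe))
    print(ttc_from_flow(0.0, 0.0, 0.0, 0.0, foe))
```

The second call returns infinity because the flow vanishes at the focus of expansion, which is the degenerate case that makes flow-based depth recovery difficult for a forward-moving camera.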