MOTION ESTIMATION The slides are drawn from several sources, including James Hays (Brown), Silvio Savarese (U. of Michigan), and Octavia Camps (Northeastern), as well as the authors' own slides.
Comparison between Motion Analysis and Stereo. Stereo: two or more frames separated by a baseline. Motion: N frames separated in time. The baseline is usually larger in stereo than in motion, so motion disparities tend to be smaller. Stereo images are taken at the same time, whereas motion disparities can be due to scene motion. There can also be more than one transformation between frames.
Processing images over time. Image sequence: an image sequence is a series of N images, or frames, acquired at discrete time instants t_k = t_0 + k·Δt, where Δt is a fixed time interval and k = 0, 1, ..., N−1.
Assuming that illumination does not change, image changes are due to the RELATIVE MOTION between the scene and the camera. There are 3 possibilities: camera still, moving scene; moving camera, still scene; moving camera, moving scene.
A simple example of 2D motion estimation: Time to Collision. An object of height L moves toward the camera (focal length f) with constant velocity v. At time t = 0 the object is at distance D(0) = D_0; at time t it is at D(t) = D_0 − vt. It will crash into the camera at the time τ for which D(τ) = D_0 − vτ = 0, i.e., τ = D_0 / v.
Time to Collision. The image of the object has size l(t) = f L / D(t). Taking the derivative with respect to time: l'(t) = −f L D'(t) / D(t)² = f L v / D(t)².
And their ratio is l'(t) / l(t) = v / D(t): the object will crash in τ(t) = D(t) / v seconds from the current instant t.
Time to Collision. The ratio l(t) / l'(t) can be directly measured from the image, and it equals the time to collision: τ(t) = D(t) / v = l(t) / l'(t). It can be found without knowing L, D_0, or v, and without camera calibration!
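The relation τ = l(t) / l'(t) can be checked numerically. Below is a minimal sketch (the function name and all numbers are illustrative, not from the slides): l'(t) is approximated by a finite difference of two consecutive image-size measurements.

```python
def time_to_collision(l_prev, l_curr, dt):
    """Estimate tau = l / (dl/dt) from two image-size measurements
    taken dt seconds apart. Needs no depth, speed, or calibration."""
    dl_dt = (l_curr - l_prev) / dt
    return l_curr / dl_dt

# Simulated setup (made-up units): f = 1, L = 2, D0 = 100, v = 10,
# so the true collision time measured from t = 0 is D0 / v = 10 s.
f, L, D0, v = 1.0, 2.0, 100.0, 10.0
l = lambda t: f * L / (D0 - v * t)      # image size l(t) = f L / D(t)
tau = time_to_collision(l(1.0), l(1.1), 0.1)
```

With these numbers the estimate comes out to about 9 s, the remaining time around t = 1 s, even though the function never sees L, D_0, or v directly.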
Uses of motion: tracking features; segmenting objects based on motion cues; tracking objects; recognizing events and activities; learning dynamical models; improving video quality (motion stabilization, super resolution, etc.).
Computer Vision. Motion is a basic cue. Motion can be the only cue for segmentation. http://www.michaelbach.de/ot/cog_hiddenbird/index.html
Motion is a basic cue. Even impoverished motion data can evoke a strong percept. G. Johansson, "Visual perception of biological motion and a model for its analysis", Perception and Psychophysics 14, 201-211, 1973.
Feature tracking is not simple. Here we will examine only optical flow, but many other approaches exist, from Kalman filtering to mean-shift tracking.
Optical flow. Definition: optical flow is the apparent motion of brightness patterns in the image. Note: apparent motion can be caused by lighting changes without any actual motion! Think of a uniform rotating sphere under fixed lighting (motion but no flow) vs. a stationary sphere under moving illumination (flow but no motion). GOAL: recover image motion at each pixel from optical flow.
Example: Optical flow Vector field function of the spatio-temporal image brightness variations
Estimating optical flow. Given two subsequent frames I(x, y, t) and I(x, y, t+1), estimate the apparent motion field u(x, y), v(x, y) between them. Key assumptions. Brightness constancy: the projection of the same point looks the same in every frame. Small motion: points do not move very far. Spatial coherence: points move like their neighbors.
The brightness constancy constraint. Given frames I(x, y, t) and I(x, y, t+1), brightness constancy states: I(x + u, y + v, t + 1) = I(x, y, t). Linearizing the left side using a first-order Taylor expansion: I(x + u, y + v, t + 1) ≈ I(x, y, t) + I_x u + I_y v + I_t, where I_x and I_y are the image derivatives along x and y, and I_t is the temporal derivative. Hence: I_x u + I_y v + I_t = 0, i.e., ∇I · (u, v)^T + I_t = 0.
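The linearized constraint can be verified numerically on a synthetic image (a sketch; the test pattern and the shift values are made up): for a small known motion (u, v), the residual I_x u + I_y v + I_t should be close to zero everywhere.

```python
import numpy as np

# Smooth synthetic frame and a version shifted by a small known (u, v)
X, Y = np.meshgrid(np.arange(64.0), np.arange(64.0))
pattern = lambda X, Y: np.sin(0.2 * X) * np.cos(0.15 * Y)
u, v = 0.3, 0.2                      # small motion, in pixels/frame
I_t0 = pattern(X, Y)                 # frame at time t
I_t1 = pattern(X - u, Y - v)         # frame at t+1: content moved by (u, v)

Iy, Ix = np.gradient(I_t0)           # spatial derivatives I_y, I_x
It = I_t1 - I_t0                     # temporal derivative I_t
residual = Ix * u + Iy * v + It      # brightness constancy: should be ~0
```

The residual is small only because the motion is small; for large shifts the first-order Taylor expansion breaks down, which is what motivates the iterative and coarse-to-fine schemes later in the slides.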
Can we use this equation to recover image motion (u, v) at each pixel? ∇I · (u, v)^T + I_t = 0. How many equations and unknowns per pixel? One equation (it is scalar), but two unknowns (u, v). More than one pixel in the neighborhood must share the same optical flow for this to work.
Adding constraints. B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674-679, 1981. The u and v here model pure translation only. Spatial coherence constraint: assume the pixel's neighbors have the same (u, v). If we use a 5x5 window, that gives us 25 equations per pixel.
Lucas-Kanade flow. Overconstrained linear system A d = b, where A stacks one row [I_x(p_i), I_y(p_i)] per pixel p_i in the window, d = (u, v)^T, and b stacks −I_t(p_i). Least squares solution for d given by the normal equations: (A^T A) d = A^T b, i.e., d = (A^T A)^{-1} A^T b. The summations are over all pixels in the K x K window (here a 5x5 window).
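The least-squares step can be sketched in a few lines of NumPy (a minimal illustration, not the full algorithm: a single window, no weighting, no iteration; the synthetic pattern and the sub-pixel shift are made up for the check):

```python
import numpy as np

def lucas_kanade_window(I1, I2):
    """Single-window Lucas-Kanade: solve A d = b in the least-squares
    sense, where A stacks [Ix, Iy] per pixel and b = -It."""
    Iy, Ix = np.gradient(I1)                   # spatial derivatives
    It = I2 - I1                               # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    d, *_ = np.linalg.lstsq(A, b, rcond=None)  # solves (A^T A) d = A^T b
    return d                                   # estimated (u, v)

# Textured synthetic window shifted by a small, sub-pixel amount
X, Y = np.meshgrid(np.arange(32.0), np.arange(32.0))
pat = lambda X, Y: np.sin(0.3 * X) + np.cos(0.25 * Y) + 0.3 * np.sin(0.1 * (X + Y))
u_v = lucas_kanade_window(pat(X, Y), pat(X - 0.4, Y - 0.25))
```

The recovered u_v lands close to the true (0.4, 0.25); accuracy is limited by the finite-difference gradients and the first-order Taylor approximation.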
Conditions for solvability. The optimal (u, v) satisfies the Lucas-Kanade equation (A^T A) d = A^T b. When is this solvable? A^T A should be invertible; A^T A should not be too small relative to noise: the eigenvalues λ1 and λ2 of A^T A should not be too small; A^T A should be well-conditioned: λ1 / λ2 should not be too large (λ1 = larger eigenvalue). Does this remind you of anything?
... M = A^T A is the second moment matrix (Harris corner detector!): M = Σ [[I_x², I_x I_y], [I_x I_y, I_y²]]. Eigenvectors and eigenvalues of A^T A relate to edge direction and magnitude: the eigenvector associated with the larger eigenvalue points in the direction of fastest intensity change, and the other eigenvector is orthogonal to it.
Edge example: the gradient is large only across the edge: large λ1, small λ2.
Low-texture region example: gradients are small in every direction: small λ1, small λ2.
High-texture region example: gradients are large in many directions: large λ1, large λ2. These are the best regions to track, although it is harder for all pixels in the window to share the same optical flow.
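The three regimes can be illustrated by computing the eigenvalues of M = A^T A on synthetic patches (a sketch; the patch contents are made up):

```python
import numpy as np

def second_moment_eigs(patch):
    """Eigenvalues (lambda1 >= lambda2) of the second moment matrix
    M = [[sum Ix^2, sum Ix*Iy], [sum Ix*Iy, sum Iy^2]]."""
    Iy, Ix = np.gradient(patch.astype(float))
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.sort(np.linalg.eigvalsh(M))[::-1]

X, Y = np.meshgrid(np.arange(15.0), np.arange(15.0))
flat = np.zeros((15, 15))                        # low texture: both small
edge = (X > 7).astype(float)                     # vertical edge: large l1, small l2
corner = ((X > 7) & (Y > 7)).astype(float)       # corner: both large
```

The flat patch gives λ1 ≈ λ2 ≈ 0, the edge gives one large and one near-zero eigenvalue, and the corner gives two clearly positive eigenvalues, matching the three slide examples.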
The barber pole illusion. It appears as an illusion when optical flow is estimated along lines. The ambiguity always exists: this is the aperture problem. www.sandlotscience.com/ambiguous/barberpole_illusion.htm http://en.wikipedia.org/wiki/barberpole_illusion
The aperture problem Actual motion
The aperture problem Perceived motion
Ambiguities in tracking a point on a line. The component of the flow perpendicular to the gradient (i.e., parallel to the edge) cannot be measured: the constraint is independent of motion along the edge. The vectors (u, v) and (u + u', v + v'), with (u', v') perpendicular to the gradient, give the same response. Ambiguities therefore exist in optical flow detection.
Motion organization: e.g., Gestalt laws can be used for motion grouping...
Grouping cues: Common fate Image credit: Arthus-Bertrand (via F. Durand)
Lucas-Kanade has been generalized up to projective transformations. See S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework. IJCV, v.56, 221-255, 2004. Iterative Refinement: estimate the velocity at each pixel using one iteration of Lucas-Kanade-type estimation; warp one image toward the other using the estimated flow field; refine the estimate by repeating the process until convergence.
Optical Flow: Iterative Estimation. One-dimensional optical flow close to x_0: given the original signal f_1 and the final signal f_2, the initial estimate of the displacement d (used here instead of u) is d_1 = −e / f_1'(x_0), where f_1'(x_0) is the image gradient at x_0 and e = f_2(x_0) − f_1(x_0) is the intensity difference.
Optical Flow: Iterative Estimation. Warp one image toward the other using the current estimate; repeat the procedure.
Optical Flow: Iterative Estimation. At convergence...
Optical Flow: Iterative Estimation. ...the warped f_2 coincides with f_1.
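The 1-D iterative scheme of the last few slides can be sketched as follows (a minimal illustration under the stated assumptions: pure translation and a smooth signal; the Gaussian test signal and shift are made up):

```python
import numpy as np

def iterative_flow_1d(f1, f2, n_iters=20):
    """Recover d in f2(x) = f1(x - d): linearize, solve for an update,
    warp f2 back by the current estimate, and repeat."""
    x = np.arange(len(f1), dtype=float)
    g = np.gradient(f1)                      # spatial derivative of f1
    d = 0.0                                  # initial guess
    for _ in range(n_iters):
        f2w = np.interp(x + d, x, f2)        # warp f2 toward f1 by d
        e = f2w - f1                         # remaining intensity difference
        d += -np.sum(g * e) / np.sum(g * g)  # least-squares update
    return d

x = np.arange(200.0)
f1 = np.exp(-((x - 100.0) / 20.0) ** 2)          # smooth bump
f2 = np.exp(-((x - 3.7 - 100.0) / 20.0) ** 2)    # same bump shifted by 3.7
d_hat = iterative_flow_1d(f1, f2)
```

A single linearized step would undershoot the 3.7-pixel shift; warping and repeating drives the residual difference toward zero, so d_hat converges close to the true displacement.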
Optical Flow: Aliasing. Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity. How do we know which correspondence is correct? Without aliasing, the nearest match is correct (estimated shift = actual shift); with aliasing, the nearest match can be incorrect. To overcome aliasing: coarse-to-fine estimation... which leads to image pyramids.
A more general motion model: p --> M p + d, where M is a 2x2 matrix and d is a 2x1 displacement vector.
In a bounded region, find the optimal d (2 parameters) and M (4 parameters). Warp forward: x' = x + u(x, a), so that I(x, t) = I(x', t-1); ...repeat until convergence.
Revisiting the small motion assumption. Is this motion small enough? Probably not: it's much larger than one pixel between frames.
Reduce the resolution with an image pyramid.
Coarse-to-fine optical flow estimation. With Gaussian pyramids of image 1 (t) and image 2 (t+1), a displacement of u = 10 pixels at full resolution becomes u = 5, 2.5, and 1.25 pixels at successively coarser levels, eliminating most aliasing.
Coarse-to-fine optical flow estimation. Run iterative L-K at the coarsest level of the Gaussian pyramids of image 1 (t) and image 2 (t+1); then, at each finer level, warp & upsample the flow and run iterative L-K again.
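A compact coarse-to-fine sketch in 1-D (illustrative only: global translation, a crude 2:1 box-filter pyramid, and a made-up test signal; real implementations use 2-D Gaussian pyramids and per-pixel flow):

```python
import numpy as np

def lk_step_1d(f1, f2, d):
    """One Lucas-Kanade update for a global 1-D translation d."""
    x = np.arange(len(f1), dtype=float)
    f2w = np.interp(x + d, x, f2)          # warp f2 toward f1 by d
    g = np.gradient(f1)
    return d - np.sum(g * (f2w - f1)) / np.sum(g * g)

def downsample(f):
    return 0.5 * (f[0::2] + f[1::2])       # crude 2:1 pyramid level

def coarse_to_fine_1d(f1, f2, levels=3, iters=8):
    pyramid = [(f1, f2)]
    for _ in range(levels - 1):
        a, b = pyramid[-1]
        pyramid.append((downsample(a), downsample(b)))
    d = 0.0
    for a, b in reversed(pyramid):         # coarsest level first
        d *= 2.0                           # upsample the flow estimate
        for _ in range(iters):
            d = lk_step_1d(a, b, d)
    return d

x = np.arange(256.0)
f1 = np.exp(-((x - 120.0) / 25.0) ** 2)
f2 = np.exp(-((x - 8.4 - 120.0) / 25.0) ** 2)   # large shift: 8.4 pixels
d_hat = coarse_to_fine_1d(f1, f2)
```

At the coarsest level the 8.4-pixel shift shrinks to about 2.1 pixels, small enough for the linearized step; each finer level then only refines an already good estimate.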
Example of optical flow without pyramid
Example of optical flow with pyramid
Motion Field (MF). The MF assigns a velocity vector to each pixel in the image. These velocities are INDUCED by the RELATIVE MOTION between the camera and the 3D scene. The MF can be thought of as the projection of the 3D velocities onto the image plane.
Consider a 3D point P and its image p (focal length f, optical axis z, image plane at z = f). Using the pinhole camera equation: p = f P / Z.
Let things move. The relative velocity of P with respect to the camera: V = −T − ω × P, where T is the translational velocity and ω the angular (rotation) velocity.
3D Relative Velocity. The relative velocity of P with respect to the camera: V = −T − ω × P.
Motion Field: the velocity of p, a point in the image with coordinates (x, y, f). Taking the derivative of p = f P / Z with respect to time: v = f (Z V − V_z P) / Z², where v is the velocity in the image and V and P are the 3-dimensional velocity and position vectors.
The velocity of the point in the image plane, p = f P / Z: v is also three dimensional, but its component v_z is always zero (p stays on the plane z = f).
Motion Field: the velocity of p, written out in components: v_x = (T_z x − T_x f)/Z − ω_y f + ω_z y + ω_x x y / f − ω_y x² / f, and v_y = (T_z y − T_y f)/Z + ω_x f − ω_z x − ω_y x y / f + ω_x y² / f.
The velocity of p: translational component: v_x = (T_z x − T_x f)/Z, v_y = (T_z y − T_y f)/Z.
The velocity of p: rotational component: v_x = −ω_y f + ω_z y + ω_x x y / f − ω_y x² / f, v_y = ω_x f − ω_z x − ω_y x y / f + ω_x y² / f. NOTE: the rotational component is independent of depth Z!
Special Case I: Pure Translation. Assume ω = 0 and T_z ≠ 0. Define x_0 = f T_x / T_z and y_0 = f T_y / T_z. We obtain v_x = (T_z / Z)(x − x_0), v_y = (T_z / Z)(y − y_0).
Pure Translation, T_z > 0 vs. T_z < 0. The motion field in this case is RADIAL: it consists of vectors passing through p_0 = (x_0, y_0). If T_z > 0 (camera moving towards the object), the vectors point away from p_0: p_0 is the POINT OF EXPANSION. If T_z < 0 (camera moving away from the object), the vectors point towards p_0: p_0 is the POINT OF CONTRACTION. See also epipolar geometry: motion perpendicular to the image plane.
Pure Translation: Properties of the MF p o is the vanishing point of the direction of translation. 3D scene 2D images
Pure Translation. What if T_z = 0? Then v_x = −f T_x / Z and v_y = −f T_y / Z: all motion field vectors are parallel to each other and inversely proportional to depth! Example: looking out of the side window of a moving train.
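Both pure-translation cases can be checked numerically (a sketch under the slides' sign conventions; the depth map and velocities are made up): for T_z ≠ 0 the field is radial about p_0 = (f T_x / T_z, f T_y / T_z), and for T_z = 0 all vectors are parallel with magnitude proportional to 1/Z.

```python
import numpy as np

def translational_field(x, y, Z, T, f=1.0):
    """Translational part of the motion field:
    vx = (Tz*x - Tx*f) / Z,  vy = (Tz*y - Ty*f) / Z."""
    Tx, Ty, Tz = T
    return (Tz * x - Tx * f) / Z, (Tz * y - Ty * f) / Z

x, y = np.meshgrid(np.linspace(-1.0, 1.0, 9), np.linspace(-1.0, 1.0, 9))
Z = 3.0 + x + y                          # arbitrary positive depth map

# Tz != 0: radial field about the point of expansion p0
Tx, Ty, Tz = 0.2, -0.1, 0.5
vx, vy = translational_field(x, y, Z, (Tx, Ty, Tz))
x0, y0 = Tx / Tz, Ty / Tz                # p0 for f = 1
radial_cross = (x - x0) * vy - (y - y0) * vx   # zero for a radial field

# Tz = 0: all vectors parallel, magnitude proportional to 1/Z
ux, uy = translational_field(x, y, Z, (Tx, Ty, 0.0))
```

The cross product vanishes in the first case (every vector passes through p_0), and in the second case multiplying the vectors by Z recovers the constant (−f T_x, −f T_y), confirming the parallel, depth-scaled field.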
Special Case II: Moving Plane. Planar surfaces are common in man-made environments. Question: what does the MF of a moving plane look like?
Moving Plane. Points P on the plane must satisfy the equation describing the plane: n^T P = d, where n is the unit vector normal to the plane and d is the distance from the plane to the origin (d = n · (closest point of the plane to the origin)). NOTE: if the plane is moving wrt the camera, n and d are functions of time.
Moving Plane. Let p be the image of P. Using the pinhole projection equation, P = Z p / f. Substituting into the plane equation, n^T (Z p / f) = d, and solving for Z gives Z = f d / (n^T p) = f d / (n_x x + n_y y + n_z f).
Moving Plane. Now consider the MF equations and plug in Z = f d / (n^T p): the result is a quadratic function of the image coordinates.
Moving Plane. The MF equations become quadratic polynomials in (x, y) with coefficients a_1, ..., a_8 determined by the motion and plane parameters. There are 3 (n) + 1 (d) + 3 (T) + 3 (ω) = 10 unknowns but only 8 coefficients, so the same coefficients a_1-a_8 can be obtained with a different plane and a different relative velocity.
How good is optical flow? The AAE (average angular error) race on the Yosemite sequence went on for over 15 years. (# I. Austvoll. Lecture Notes in Computer Science, 2005. * Brox et al. ECCV, 2004: state-of-the-art optical flow at the time.)
Optical flow is far from being solved. On the same frame, different methods give AAE = 8.99, AAE = 5.24, and AAE = 1.94 against the ground-truth motion.
http://vision.middlebury.edu/flow/ There were 91 optical flow results, in gray or color, by July 2013.
Measuring motion for real-life videos is challenging because of occlusion, shadow, reflection, motion blur, sensor noise, and compression artifacts. Accurately measuring motion also has great impact in scientific measurement and graphics applications. Humans are experts in perceiving motion: can we use human expertise to annotate motion? (Antonio Torralba)
Motion combined with other properties appears in many more advanced estimation problems. Uncalibrated images are always known only up to a projective transformation; auto-calibration can recover the affine and then the similarity (Euclidean) structure from a sequence of a few frames, given additional information. Structure from motion (e.g., the factorization algorithm) recovers both the object and its motion from a sequence under an affine (linear) or projective camera.