C18 Computer Vision
Lecture 7, MT 2015
Andrea Vedaldi
For slides and up-to-date information: http://www.robots.ox.ac.uk/~vedaldi/teach.html

This time:
1. Introduction; imaging geometry; camera calibration.
2. Salient feature detection: edges, lines and corners.
3. Recovering 3D from two images I: epipolar geometry.
4. Recovering 3D from two images II: stereo correspondence algorithms; triangulation.
5. Structure from Motion I: computing the fundamental matrix by matching corners; robust matching with RANSAC.
6. Structure from Motion II: SIFT; recovering egomotion.
7. Image motion: optic flow; the aperture problem; visual tracking.
8. Object recognition and detection.

Introduction

So far we have considered estimating structure and motion from a set of n images {I_1, ..., I_n}. The images were not taken in any particular order or time. Here we analyse video data, intended as a sequence of images I(t), t ≥ 0, captured by a moving camera. In particular, the goals of this lecture are to:
1. study the properties of the projected image motion;
2. compute the optical flow as an approximation of the image motion;
3. design trackers to track objects in videos.

Outline
- Optical flow and the aperture problem
- Lucas-Kanade tracking
- Tracking with models: RaPiD
Scene projection

Projection of the scene point X = (X, Y, Z), with the optic centre at the origin, focal length f, image plane at distance f along the optic axis ẑ:

  x = (f/Z) X.   (1)

Differentiate w.r.t. time to get the projected motion:

  ẋ = (f/Z) Ẋ − (f Ż/Z²) X.

But the scene motion consists of a translation V and a rotation Ω relative to the camera. So:

  Ẋ = (Ẋ, Ẏ, Ż) = V + Ω × X.

Hence, writing Ż = Ẋ · ẑ and then using (1):

  ẋ = (f/Z)(V + Ω × X) − (f/Z²) ((V + Ω × X) · ẑ) X
    = (f/Z) V + Ω × x − ((V · ẑ)/Z) x − (((Ω × x) · ẑ)/f) x.

Note that:
- Rotation: the depth Z does not appear with the rotation Ω. The rotational component of motion is independent of the scene depth.
- Translation: the depth Z appears with the translation V as V/Z. There is a depth/speed scaling ambiguity: one cannot tell whether something is large, far away and moving quickly, or small, near-to and moving slowly.

Example: a simple motion alarm

In this example, the camera is rotating around a fixed position, hence Ω ≠ 0 but V = 0. There is also an independently moving object in the scene. [Figures: still from sequence; measured optical flow; flow after subtracting the rotational component.] The odometry from the camera axes provides Ω. Subtracting the motion due to the rotation (independent of the scene depth Z) reveals the moving object:

  ẋ_rotational = Ω × x − (((Ω × x) · ẑ)/f) x.
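To make the derivation concrete, here is a minimal numpy sketch of the projected-motion equation above (the function name and interface are illustrative, not from the lecture):

```python
import numpy as np

def projected_motion(X, V, Omega, f=1.0):
    """Projected image motion of a scene point X under camera-relative
    translation V and rotation Omega.  Implements
    x_dot = (f/Z) X_dot - (Z_dot/Z) x,  with  X_dot = V + Omega x X  and
    x = (f/Z) X, which expands to the formula on the slide."""
    X, V, Omega = (np.asarray(a, float) for a in (X, V, Omega))
    Z = X[2]
    x = f * X / Z                      # homogeneous image point (x, y, f)
    Xdot = V + np.cross(Omega, X)      # scene velocity: translation + rotation
    return (f / Z) * Xdot - (Xdot[2] / Z) * x
```

The two notes above can be checked numerically: the rotational flow is unchanged when the point is moved along its viewing ray (depth independence), while for pure translation along the optic axis the flow vanishes at the image centre.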
Approaching and receding motion

In this example, the camera is approaching or receding but not rotating, hence V_x = V_y = 0, V_z ≠ 0 and Ω = 0. The projected motion is

  ẋ = (f/Z) V − ((V · ẑ)/Z) x,   i.e.   (ẋ, ẏ) = −(V_z/Z) (x, y).

The projected motion is a divergent/convergent vector field with a stationary point in the middle of the image: at x = 0, ẋ = 0. The field diverges when approaching (V_z < 0) and converges when receding (V_z > 0).

Ego-motion and the focus of expansion

Generalising the previous example, if V ≠ 0 but Ω = 0, the motion field diverges/converges from a point, called the focus of expansion. To find the focus of expansion, set ẋ = 0:

  0 = ẋ = (f/Z) V − (V_z/Z) x   ⟹   x = (f/V_z) V.

Note: this equation says that you can treat the translational velocity V as a ray and find where it hits the image plane. This point in the image is called the focus of expansion (c.f. the epipole).

Divergence and time to contact

Suppose again that Ω = 0 and that the scene depth Z varies smoothly as a function of x and y. The projected motion is

  ẋ = (f/Z) V − (V_z/Z) x,   i.e.   (ẋ, ẏ) = (1/Z) (f V_x − x V_z,  f V_y − y V_z).

The divergence of the motion field ẋ(x, y) is

  ∇ · ẋ = ∂ẋ/∂x + ∂ẏ/∂y
        = −(1/Z²)(∂Z/∂x)(f V_x − x V_z) − (1/Z²)(∂Z/∂y)(f V_y − y V_z) − 2 V_z/Z.

If the camera is approaching/receding, V_x = V_y = 0, and at x = y = 0

  ∇ · ẋ = −2 V_z/Z = 2/t_c,

where t_c = −Z/V_z is the time to contact. If the scene is a fronto-parallel plane, then ∂Z/∂x = ∂Z/∂y = 0 anywhere in the image, and so ∇ · ẋ = 2/t_c everywhere.

Example: segmentation and motion alarm

[Figures omitted.]
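A small numpy sketch of the two quantities just derived (function names are illustrative):

```python
import numpy as np

def focus_of_expansion(V, f=1.0):
    """Image point where the translational flow vanishes: x = (f/Vz)(Vx, Vy)."""
    Vx, Vy, Vz = V
    assert Vz != 0, "FOE is at infinity for motion parallel to the image plane"
    return np.array([f * Vx / Vz, f * Vy / Vz])

def time_to_contact(Z, Vz):
    """t_c = -Z / Vz: positive when approaching (Vz < 0)."""
    return -Z / Vz
```

Note that t_c needs no knowledge of the absolute depth or speed separately, only their ratio, which is exactly what the flow divergence provides.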
Computing the visual motion

We have considered some uses of projected motion, but we have not yet discussed how to obtain the motion field from imagery. There are three ways:
1. Token based: compute corners in successive frames and match them using proximity and/or similarity scores.
2. Gradient based: work directly from the brightness values; great if the corner detector or matcher is unreliable.
3. Phase based.

We will look at (2) in some detail. For (3) see the work of (e.g.) Heeger, Fleet and others.

Observable visual motion and projected motion

A word of warning. Compare a rotating snooker ball under a stationary light source with a stationary ball under a moving light source:
- Highlight stationary: the scene moves, but the visual motion is zero.
- Highlight moves: the scene is still, but the highlight moves, so the visual motion is non-zero.

In general, observable motion ≠ projected motion. But often it is the best we can do (and a reasonable approximation...).

Gradient-based visual motion

Regard the image as a sampling of a continuous irradiance function I = I(x, y, t). Follow a particular image patch over time. In this case x = x(t) and y = y(t), and the total derivative dI/dt must exist. Using the chain rule:

  dI/dt = ∂I/∂t + (∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt).

But if I is constant in the image patch, dI/dt = 0, so that

  0 = ∂I/∂t + (∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt).
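The chain-rule identity can be verified numerically on a synthetic translating pattern; the ramp image below is purely illustrative:

```python
import numpy as np

# A brightness pattern translating with flow (u, v):
# I(x, y, t) = a*(x - u*t) + b*(y - v*t).
a, b, u, v = 0.5, -0.3, 2.0, 1.0
I = lambda x, y, t: a * (x - u * t) + b * (y - v * t)

# Central finite differences approximate the partial derivatives.
x0, y0, eps = 3.0, 4.0, 1e-4
Ix = (I(x0 + eps, y0, 0) - I(x0 - eps, y0, 0)) / (2 * eps)
Iy = (I(x0, y0 + eps, 0) - I(x0, y0 - eps, 0)) / (2 * eps)
It = (I(x0, y0, eps) - I(x0, y0, -eps)) / (2 * eps)

# Brightness constancy: It + Ix*dx/dt + Iy*dy/dt should vanish.
constraint = It + Ix * u + Iy * v
```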
Gradient-based visual motion (continued)

Hence, if we write

  ∇I = (∂I/∂x, ∂I/∂y)ᵀ,   μ = (dx/dt, dy/dt)ᵀ,

we arrive at the motion constraint (brightness constraint) equation

  ∇I · μ + ∂I/∂t = 0,

where μ is called the optical flow.

The aperture problem

We have not recovered μ... but only the component of μ along the direction of ∇I. This edge-normal flow is

  μ_⊥ = −(∂I/∂t) ∇I / |∇I|².

This failure to recover both components is called the aperture problem. Taking gradients means the information about the intensity is local, as if seen through an aperture.

Overcoming the aperture problem

Intuition tells us that there should be no aperture problem if there is a brightness gradient in two directions, i.e. at corners. Drawing inspiration from the Plessey corner detector, proceed as follows. Consider tracking a small image patch assumed to have translational image motion. Writing I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t, let

  E(u, v) = Σ_{i,j} [I(x+i+u, y+j+v, t+δt) − I(x+i, y+j, t)]²
          ≈ Σ_{i,j} [I_x(x+i, y+j) u + I_y(x+i, y+j) v + I_t(x+i, y+j) δt]².

Differentiating w.r.t. u and v yields

  ∂E/∂u = 2 Σ_{i,j} I_x [I_x u + I_y v + I_t δt] = 0,
  ∂E/∂v = 2 Σ_{i,j} I_y [I_x u + I_y v + I_t δt] = 0.
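The normal-flow formula can be sketched directly; here the recovered flow at a vertical edge illustrates the aperture problem (a toy setup of my own, not lecture code):

```python
import numpy as np

def normal_flow(Ix, Iy, It):
    """Edge-normal component of the flow from a single gradient measurement:
    mu_perp = -It * grad(I) / |grad(I)|^2."""
    g = np.array([Ix, Iy])
    return -It * g / np.dot(g, g)

# True flow (2, 1) at a vertical edge (gradient along x only).
Ix, Iy = 1.0, 0.0
It = -(Ix * 2.0 + Iy * 1.0)        # from the brightness constraint
mu_perp = normal_flow(Ix, Iy, It)  # recovers (2, 0): motion along the edge is lost
```

Even though the true flow has a vertical component, the single gradient measurement can only pin down the component normal to the edge.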
Computing the optic flow

Rewrite the two equations as

  [Σ I_x²     Σ I_x I_y] [u]  =  −δt [Σ I_x I_t]
  [Σ I_x I_y  Σ I_y²   ] [v]        [Σ I_y I_t]

or compactly H μ = b, with μ = (u, v)ᵀ. H is the matrix describing the local autocorrelation. Thus:
- In a region of uniform brightness, H has rank 0 and we can't solve for the optic flow.
- On an edge, H has rank 1, and we can find only the normal component of the optic flow.
- At a corner, H has full rank (i.e. 2), and we can solve the equations to find both components of the flow.

Notes

The first-order approximation used only holds for very small flows, |μ| < 1 pixel. To recover bigger flows, use the idea of a Gaussian pyramid: successively smooth and subsample the images, estimate the flow at the coarsest level, and refine it at each finer level.

Visual tracking

The aim of visual tracking is to locate the position/pose of a target in an image successively over an extended sequence of images.

Target: requires some appearance model:
- image patch, colour histogram
- deformable contours (i.e. edges)
- 3D CAD model

Position:
- image position
- image position plus deformation parameters
- 3D translation and rotation
- full body pose of an articulated object
- ...

Let's start simply by tracking a planar image patch...
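The rank analysis maps directly onto code: build H and b from patch gradients and solve only when H has full rank. A minimal sketch, taking δt = 1 (the interface is illustrative):

```python
import numpy as np

def lucas_kanade_flow(Ix, Iy, It):
    """Solve H mu = b over a patch, with H = sum grad(I) grad(I)^T and
    b = -sum It grad(I) (delta-t taken as 1).  Returns None when H is
    rank-deficient (uniform region or pure edge: the aperture problem)."""
    Ix, Iy, It = (np.ravel(a) for a in (Ix, Iy, It))
    H = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    if np.linalg.matrix_rank(H) < 2:
        return None
    return np.linalg.solve(H, b)
```

With gradients in two directions the full flow is recovered; with gradients in one direction only, the function declines to solve, which is exactly the edge case of the rank analysis.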
Tracking as optimisation: the case of pure translation

We seek the image position p = (t_x, t_y)ᵀ that maximises the similarity between the template T and the image patch. Formulate the search as an optimisation problem, using the brightness constraint as the objective function. We are optimising a non-convex function, so we need to start near the solution; suppose the starting point is (t_x, t_y). Exhaustive search is possible but undesirable; it is better to do gradient ascent/descent. Minimise

  E(δt_x, δt_y) = Σ_{x,y ∈ W} [I(x + t_x + δt_x, y + t_y + δt_y) − T(x, y)]²

w.r.t. δt_x, δt_y.

Pure translation

Expanding to first order we obtain

  E(δt_x, δt_y) ≈ Σ_{x,y ∈ W} [I(x + t_x, y + t_y) + ∇I · (δt_x, δt_y)ᵀ − T(x, y)]².

Taking partials w.r.t. δt_x, δt_y and setting them to zero yields

  Σ_{x,y ∈ W} ∇I [I(x + t_x, y + t_y) + ∇I · (δt_x, δt_y)ᵀ − T(x, y)] = 0,

hence

  [Σ_{x,y ∈ W} ∇I ∇Iᵀ] (δt_x, δt_y)ᵀ = Σ_{x,y ∈ W} ∇I [T(x, y) − I(x + t_x, y + t_y)].

Note the (unsurprising) similarity to the optic flow solution. Set t_x ← t_x + δt_x, t_y ← t_y + δt_y and iterate until the change is negligible.

Generalisation

Pure translation can be overly restrictive, but the algorithm generalises straightforwardly. Suppose x represents a location in a template. We seek a set of parameters p such that the warp function f_p(x) aligns the template with the image I(t).
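The iteration just described can be sketched as follows; representing the image as a callable and the template as a dictionary of samples are purely illustrative choices:

```python
import numpy as np

def track_translation(I, T, t, n_iter=50):
    """Iterative translation-only tracker: repeatedly solve
    [sum grad(I) grad(I)^T] dt = sum grad(I) [T - I] and update t."""
    t = np.array(t, float)
    eps = 1e-3                                   # finite-difference step
    for _ in range(n_iter):
        H = np.zeros((2, 2)); b = np.zeros(2)
        for (x, y), Txy in T.items():
            xi, yi = x + t[0], y + t[1]
            g = np.array([(I(xi + eps, yi) - I(xi - eps, yi)) / (2 * eps),
                          (I(xi, yi + eps) - I(xi, yi - eps)) / (2 * eps)])
            H += np.outer(g, g)
            b += g * (Txy - I(xi, yi))           # residual drives the update
        dt = np.linalg.solve(H, b)
        t += dt
        if np.linalg.norm(dt) < 1e-8:            # change negligible: stop
            break
    return t
```

Starting near the true translation is essential, as the slide warns: the objective is non-convex, and the linearisation only holds for small updates.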
More general transformations

The function f could be quite general. For example, in homogeneous coordinates:
- translation, p = (t_x, t_y):
    f_p(x) = [1 0 t_x; 0 1 t_y; 0 0 1] (x, y, 1)ᵀ
- translation & scale, p = (s, t_x, t_y):
    f_p(x) = [s 0 t_x; 0 s t_y; 0 0 1] (x, y, 1)ᵀ
- translation & rotation, p = (θ, t_x, t_y):
    f_p(x) = [cos θ  −sin θ  t_x; sin θ  cos θ  t_y; 0 0 1] (x, y, 1)ᵀ
- a general affinity:
    f_p(x) = [p_11 p_12 p_13; p_21 p_22 p_23; 0 0 1] (x, y, 1)ᵀ

Generalisation (continued)

Returning to the brightness constraint again, we need to minimise

  E(Δp) = Σ_{x ∈ W} [I(f_{p+Δp}(x), t) − T(x)]²

w.r.t. Δp. Expanding to first order we obtain

  E(Δp) ≈ Σ_{x ∈ W} [I(f_p(x), t) + ∇I (∂f_p/∂p) Δp − T(x)]².

Taking partials w.r.t. Δp and setting them to zero yields

  Σ_{x ∈ W} [∇I ∂f_p/∂p]ᵀ [I(f_p(x), t) + ∇I (∂f_p/∂p) Δp − T(x)] = 0,

hence

  Σ_{x ∈ W} [∇I ∂f_p/∂p]ᵀ [∇I ∂f_p/∂p] Δp = Σ_{x ∈ W} [∇I ∂f_p/∂p]ᵀ [T(x) − I(f_p(x), t)].

Generalisation: summary

The normal equations above read compactly as

  H Δp = b,   H = Σ_{x ∈ W} [∇I ∂f_p/∂p]ᵀ [∇I ∂f_p/∂p],   b = Σ_{x ∈ W} [∇I ∂f_p/∂p]ᵀ [T(x) − I(f_p(x), t)].

Given the template T, the image I(t) and the current tracker parameters p(t−1):
1. p ← p(t−1).
2. Repeat until Δp ≈ 0: compute H and b, set Δp = H⁻¹ b, and update p ← p + Δp.
3. p(t) ← p.

Note that the gradients ∇I are computed at f_p(x).
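The homogeneous-matrix parameterisations can be written down directly; this sketch covers three of them (function names are illustrative):

```python
import numpy as np

def warp_matrix(kind, p):
    """3x3 homogeneous warp for a few parameterisations."""
    if kind == "translation":            # p = (tx, ty)
        tx, ty = p
        return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], float)
    if kind == "trans+scale":            # p = (s, tx, ty)
        s, tx, ty = p
        return np.array([[s, 0, tx], [0, s, ty], [0, 0, 1]], float)
    if kind == "trans+rot":              # p = (theta, tx, ty)
        th, tx, ty = p
        c, s = np.cos(th), np.sin(th)
        return np.array([[c, -s, tx], [s, c, ty], [0, 0, 1]], float)
    raise ValueError(kind)

def apply_warp(M, x):
    """Apply f_p to an image point x = (x, y)."""
    xh = M @ np.array([x[0], x[1], 1.0])
    return xh[:2] / xh[2]
```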
Remarks

The expressions look nastier than they are. It is easy to go back to the translational case:

  f_p(x) = (x + t_x, y + t_y)ᵀ,   ∂f_p/∂p = [1 0; 0 1],

and composing the warp with its increment is just a matrix product:

  f_{δp}(f_p(x)):  [1 0 δt_x; 0 1 δt_y; 0 0 1] [1 0 t_x; 0 1 t_y; 0 0 1] = [1 0 t_x+δt_x; 0 1 t_y+δt_y; 0 0 1].

We are actually doing a lot more work than we need to, calculating gradients in the image at each iteration. Instead, pre-compute the template gradients, template Hessian, etc., and warp "the other way"; see the tutorial sheet.

Example

[Video stills omitted.]

Tracking with rigid models: the RaPiD tracker (Harris, '90)

Assumes a calibrated camera and a rigid 3D polyhedral model consisting of:
- 3D edges
- control points on the model edges

[Figure: camera centre C with optic axis z, model control point P, model translation T, velocities v and ω.]
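For the translational case, the compositional update shown above is just a product of homogeneous matrices; a tiny sketch (illustrative, and for pure translations the order of the factors does not matter):

```python
import numpy as np

def compose(A, B):
    """Warp composition f_A o f_B as a homogeneous matrix product."""
    return A @ B

M_dt = np.array([[1, 0, 0.5], [0, 1, -1.0], [0, 0, 1]])  # increment delta-t
M_t  = np.array([[1, 0, 2.0], [0, 1, 3.0], [0, 0, 1]])   # current translation
M    = compose(M_dt, M_t)    # combined translation t + delta-t
```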
RaPiD tracker

The coordinates of a control point are given (in the camera reference frame) by

  X = T + P,   Ẋ = V + Ω × P.

Taking f = 1, the projected motion is then

  ẋ = d/dt (X/Z) = (1/Z) Ẋ − (Ż/Z²) X
    = (1/Z) [ V_x + Ω_y P_z − Ω_z P_y − x (V_z + Ω_x P_y − Ω_y P_x) ]
            [ V_y + Ω_z P_x − Ω_x P_z − y (V_z + Ω_x P_y − Ω_y P_x) ].

Hence we can write the projection at time t + δt as

  x(t + δt) = x + ẋ δt = x + G (V, Ω)ᵀ δt,

where, collecting the coefficients of (V_x, V_y, V_z, Ω_x, Ω_y, Ω_z),

  G = (1/Z) [ 1  0  −x   −x P_y        P_z + x P_x   −P_y ]
            [ 0  1  −y   −P_z − y P_y  y P_x          P_x ].

RaPiD tracker (continued)

For each control point, measure its distance to the nearest image edge along the predicted edge normal n̂:

  l = n̂ · (x(t + δt) − x) = n̂ᵀ G (V, Ω)ᵀ δt.

[Figure: control point on the predicted edge, normal n̂, located edge, measured edge.]

With n control points, minimise

  E(V, Ω) = Σ_i [ l_i − n̂_iᵀ G_i (V, Ω)ᵀ δt ]²

w.r.t. V, Ω to obtain the pose update. Wrap this up in a Kalman filter.

RaPiD tracker: summary
- 3D polyhedral model
- 1D search at various points normal to the projected model edges
- Filter the state (6 dof) using a constant-velocity Kalman filter: predict, measure, update.

RaPiD tracker: example

Tracking rigid objects for teleoperations.
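The pose update is a linear least-squares problem in the six unknowns (V, Ω); a minimal numpy sketch with illustrative names:

```python
import numpy as np

def rapid_update(l, n_hat, G, dt=1.0):
    """Least-squares solve of l_i = n_hat_i^T G_i (V, Omega) dt over all
    control points.  l: (n,) normal distances, n_hat: (n, 2) edge normals,
    G: (n, 2, 6) motion Jacobians.  Returns the stacked (V, Omega)."""
    A = np.einsum('ni,nij->nj', n_hat, G) * dt   # row i is n_hat_i^T G_i dt
    sol, *_ = np.linalg.lstsq(A, np.asarray(l, float), rcond=None)
    return sol
```

In the full tracker this update would be fed to the constant-velocity Kalman filter rather than applied directly; at least six well-distributed control points are needed for the 6-dof solution to be determined.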