Dense mage-based Motion Estimation Algorithms & Optical Flow
Video A video is a sequence of frames captured at different times The video data is a function of v time (t) v space (x,y)
ntroduction to motion estimation Given a video sequence of moving objects or camera, what information can we extract? How is the camera moving? How many moving objects exist? What is the direction of each moving object? How fast is object moving? f we could get the answer of these questions we can interpret the scene better.
Applications Background Subtraction v a stationary camera is observing the scene v Goal: Separate the static background from the moving foreground
Applications Motion Segmentation Segment the video to the moving objects with different motions
Other Applications Estimating 3D structure Segmenting objects based on motion cues Recognizing events and activities mproving video quality (motion stabilization)
mage Alignment Alignment between two images or image patches template image image 0( x) discrete pixel locations { x = ( x, y )} i i i ( x) 1 { x i =( x i, y i )}
Motion Estimation To estimate motion between two or more images: Error Metric v Measuring the similarity/dissimilarity between images Search Technique v Full search (Simple but too slow) v Hierarchical coarse-to-fine methods based on image pyramids Optical Flow v Multiple independent motions
Translational Alignment Sum of Squared Differences [ ] 2 2 E ( u) = ( x + u) ( x ) = e SSD 1 i 0 i i i i v u=(u,v) : displacement vector v e i = 1 ( x i +u) 0 ( x i ) :the residual error Displaced Frame Difference ( Video Coding)
Translational Alignment [ ] 2 2 E ( u) = ( x + u) ( x ) = e SSD 1 i 0 i i i i
Robust Error Metrics E ( u) = ρ( ( x + u) ( x )) = ρ( e) SRD 1 i 0 i i i i Grows less quickly than the quadratic penalty associated with least squares E ( u) = ( x + u) ( x ) = e SAD 1 i 0 i i i i
Robust Error Metrics 2 x ρ GM ( x) = 1 + x / a 2 2 a: constant that can be thought of as an outlier threshold
Spatially Varying Weights [ ] 2 E ( u) = ω ( x ) ω ( x + u) ( x + u) ( x ) WSSD 0 i 1 i 1 i 0 i i Weighted (or Windowed) SSD function
Windowed SSD n case of a large range of motion: v The above metric has bias toward smaller overlapping solutions A + = ω0( xi) ω1( xi u) Overlapping Area i v Ton counteract this bias: RMS = E / A : per pixel squared pixel error WSSD
Bias and Gain (Exposure Differences) O#en the two images being aligned were not taken with the same exposure. Simple model of intensity varia<ons: α is the gain β is the bias ( x+u) = (1 + α) ( x) + β 1 0
Bias and Gain Least squares with bias and gain Performing a linear regression Color image [ α β] [ α β ] 2 2 = + = + E ( u) ( x +u) (1 ) ( x ) ( x ) e BG 1 i 0 i 0 i i i i Estimate bias and gain for each color channel Bias and gain compensation is also used in video codecs, known as Weighted Prediction.
Correlation Cross-Correlation Alternative to taking intensity difference Maximize the product of two aligned images E ( u) = ( x ) ( x + u) CC 0 i 1 i i s Bias and Gain modeling unnecessary? Bright patch exists in images
Normalized Cross-Correlation E NCC where ( u) = i i 0 i 0 1 i 1 2 2 0 i 0 1 i 1 i 0 0 N i 1 1 N i ( x ) ( x + u) ( x ) ( x + u) 1 = ( x ) and 1 = ( x + u) i i NCC in [-1,1] Works well when matching images taken with different exposure Degrades for noisy low-contrast regions (Zero variance)
Hierarchical Motion Estimation How can we find its minimum? Full search over some range of shifts Often used for block matching in motion compensated video compression Simple to implement but slow To accelerate the search process Hierarchical motion estimation
Hierarchical Motion Estimation Steps Construct image pyramid Full search over the range At coarser levels, search over a smaller number of discrete pixels The motion estimation from one level is used to initialize a smaller local search at next finer level Not guaranteed to produce the same results as a full search, but works almost as well and much faster u=1.25 pixels u=2.5 pixels u=5 pixels 2 l [ SS, ] 2 u=10 pixels image H Gaussian pyramid of image H image Gaussian pyramid of image
Optical Flow The most general and challenging version of motion estimation Computing an independent estimate of motion at each pixel of the image
Optical Flow Field
Problem Definition : Optical Flow How to estimate pixel motion from image H to image? Solve pixel correspondence problem given a pixel in H, look for nearby pixels of the same color in
Problem Definition Key assumptions v color constancy: a point in H looks the same in For grayscale images, this is brightness constancy v small motion: points do not move very far This is called the optical flow problem
Optical Flow Constraints (gray scale images) Let s look at these constraints more closely brightness constancy: Q: what s the equation? H(x, y) = (x+u, y+v) small motion: (u and v are less than 1 pixel) suppose we take the Taylor series expansion of :
Combining Equations What is t? The time derivative of the image at (x,y) How do we calculate it? The x-component of the gradient vector.
Optical Flow Equation Problem 1: Q: how many unknowns and equations per pixel? 1 equation, but 2 unknowns (u and v)
Problem 2: The Aperture Problem For points on a line of fixed intensity we can only recover the normal flow Time t Time t+dt? Where did the blue point move to? We need additional constraints
Use Local nformation Sometimes enlarging the aperture can help
Local smoothness Lucas Kanade (1984) t y x v u = + [ ] t y x v u = assume locally constant motion v pretend the pixel s neighbors have the same (u,v) ü f we use a 5x5 window, that gives us 25 equations per pixel! =!! 2 1 2 2 1 1 t t y x y x v u A = b u!
Lucas Kanade (1984) Goal: Minimize! Au b 2 Method: Least-Squares A u! = b A T A u! = 2x2 2x1 A T b 2x1! u = ( T ) 1 T A A A b
How does Lucas-Kanade behave? ( ) b A A A T T 1 u =! = 2 2 y y x y x x T A A We want this matrix to be invertible. i.e., no zero eigenvalues
How does Lucas-Kanade behave? Edge è A T A becomes singular ( ) y, x ( ) x, y x 2 x y x y 2 y x y = 0 0 x y is eigenvector with eigenvalue 0
How does Lucas- Kanade behave? Homogeneous è è 0 eigenvalues A T A 0 ( ) 0, x y = 2 2 y y x y x x T A A
How does Lucas-Kanade behave? Textured regions è two high eigenvalues ( ) 0, x y
How does Lucas-Kanade behave? Edge è A T A becomes singular Homogeneous regions è low gradients High texture è A T A 0
When does it break? Homogeneous objects generate zero optical flow. Fixed sphere. Changing light source. Non-rigid texture motion
Other break-downs Brightness constancy is not satisfied Correlation based methods A point does not move like its neighbors what is the ideal window size? Regularization based methods The motion is not small (Taylor expansion doesn t hold) Use multi-scale estimation
Multi-Scale Flow Estimation u=1.25 pixels u=2.5 pixels u=5 pixels image t-1 u=10 pixels image t+1 Gaussian pyramid of image t Gaussian pyramid of image t+1
Multi-Scale Flow Estimation run Lucas-Kanade warp & upsample run Lucas-Kanade... image t-1 image t+1 Gaussian pyramid of image t Gaussian pyramid of image t+1
Examples: Motion Based Segmentation nput Segmentation result
Examples: Motion Based Segmentation nput Segmentation result
Other break- downs Brightness constancy is not satisfied Correlation based methods A point does not move like its neighbors what is the ideal window size? Regularization based methods The motion is not small (Taylor expansion doesn t hold) Use multi-scale estimation
Regularization Horn and Schunk (1981) Add global smoothness term Smoothness error: Error in brightness constancy equation E E ( 2 2) ( 2 2 u + u + v v ) dx dy s = x y x + D ( ) 2 u + v dx dy c = x y + D t y Minimize: E + λ c E s Solve by calculus of variations