Motion Estimation. Yao Wang Tandon School of Engineering, New York University


Outline
- 3-D motion model
- 2-D motion model
- 2-D motion vs. optical flow
- Optical flow equation and ambiguity in motion estimation
- General methodologies in motion estimation
  - Motion representation
  - Motion estimation criterion
  - Optimization methods
  - Gradient descent methods
- Pixel-based motion estimation
- Block-based motion estimation, assuming constant motion in each block
  - EBMA algorithm revisited
  - Half-pel EBMA
  - Hierarchical EBMA (HBMA)
- Deformable block matching (DBMA)
- Mesh-based motion estimation

Pinhole Camera Model
[Figure: a 3-D point is projected through the camera center onto the image plane, forming the 2-D image.]
- The image of an object is reversed from its 3-D position.
- The object appears smaller when it is farther away.

Pinhole Camera Model: Perspective Projection
- All points on the same ray through the camera center have the same image.
- By similar triangles, $x/F = X/Z$ and $y/F = Y/Z$, so $x = F\,X/Z$ and $y = F\,Y/Z$.
- $x$ and $y$ are inversely related to $Z$.
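A minimal numeric sketch of the projection equations. The focal length and the three points are made-up values chosen only for illustration.

```matlab
% Perspective projection x = F*X/Z, y = F*Y/Z for a few 3-D points.
F = 1.0;                      % focal length (assumed value)
P = [1 1 2;                   % each row is one 3-D point [X Y Z]
     1 1 4;                   % same X, Y as the first point, twice as far
     2 2 4];                  % lies on the same ray as the first point
x = F * P(:,1) ./ P(:,3);
y = F * P(:,2) ./ P(:,3);
disp([x y]);
```

The second point has the same X and Y as the first but twice the depth, so its image coordinates are halved. The third point lies on the same ray as the first and therefore projects to the same image position.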

Approximate Model: Orthographic Projection
- When the object is very far ($Z \to \infty$): $x = X$, $y = Y$.
- Can be used as long as the depth variation within the object is small compared to the distance of the object.

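Rigid Object Motion
- Rotation and translation with respect to the object center $\mathbf{C}$:
  $$\mathbf{X}' = [\mathbf{R}](\mathbf{X} - \mathbf{C}) + \mathbf{T} + \mathbf{C}$$
- $[\mathbf{R}]$: rotation matrix, determined by the rotation angles $\theta_x, \theta_y, \theta_z$
- $\mathbf{T}$: translation vector with components $T_x, T_y, T_z$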
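The rigid motion equation can be tried out on a single point. This is a sketch with assumed values: a 90-degree rotation about the Z axis, an arbitrary object center `C`, and an arbitrary translation `T`.

```matlab
% Rigid motion X' = R*(X - C) + T + C for one point (all values assumed).
theta_z = pi/2;                          % rotation about the Z axis
R = [cos(theta_z) -sin(theta_z) 0;
     sin(theta_z)  cos(theta_z) 0;
     0             0            1];
C = [1; 1; 5];                           % object center
T = [0.5; 0; 0];                         % translation
X = [2; 1; 5];                           % a point on the object
Xp = R * (X - C) + T + C;                % position after the motion
d_before = norm(X - C);                  % distance to the center before
d_after  = norm(Xp - (C + T));           % distance to the moved center after
disp(Xp.');
```

The distance from the point to the object center is unchanged by the motion, which is what distinguishes a rigid motion from a general deformation.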

Rotation Matrix
- When all rotation angles are small, the rotation matrix can be approximated by a matrix that is linear in the angles.
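For reference, a first-order sketch of the small-angle form. It assumes right-handed rotations by $\theta_x$, $\theta_y$, $\theta_z$ about the X, Y, and Z axes; to first order the result does not depend on the order in which the three rotations are applied. Using $\cos\theta \approx 1$ and $\sin\theta \approx \theta$ and dropping products of angles:

$$[\mathbf{R}] \approx \begin{bmatrix} 1 & -\theta_z & \theta_y \\ \theta_z & 1 & -\theta_x \\ -\theta_y & \theta_x & 1 \end{bmatrix}$$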

Flexible Object Motion
- Two ways to describe:
  - Decompose into multiple, but connected, rigid sub-objects
  - Global motion plus local motion in sub-objects
- Example: the human body consists of many parts, each undergoing a rigid motion

3-D Motion -> 2-D Motion
[Figure: a 3-D motion vector (MV) and its projection onto the image plane, the 2-D MV.]

Sample 2D Motion Field
At each pixel (or center of a block) of the anchor image (right), the motion vector describes the 2D displacement between this pixel and its corresponding pixel in the other, target image (left).

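Motion Field Definition
- Anchor frame: $\psi_1(\mathbf{x})$
- Target frame: $\psi_2(\mathbf{x})$
- Motion parameters: $\mathbf{a}$
- Motion vector at a pixel in the anchor frame: $\mathbf{d}(\mathbf{x})$
- Motion field: $\mathbf{d}(\mathbf{x}; \mathbf{a}),\ \mathbf{x} \in \Lambda$
- Mapping function: $\mathbf{w}(\mathbf{x}; \mathbf{a}) = \mathbf{x} + \mathbf{d}(\mathbf{x}; \mathbf{a}),\ \mathbf{x} \in \Lambda$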

Occlusion Effect
[Figure: covered and uncovered regions between two frames.]
- Motion is undefined in occluded regions.
- Ideally, a 2D motion field should mark such areas as uncovered (or occluded) instead of giving false MVs.

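2-D Motion Corresponding to Rigid Object Motion
- General case: rigid 3-D motion followed by perspective projection,
  $$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}$$
  $$x' = F\,\frac{(r_1 x + r_2 y + r_3 F)\,Z + T_x F}{(r_7 x + r_8 y + r_9 F)\,Z + T_z F}, \qquad y' = F\,\frac{(r_4 x + r_5 y + r_6 F)\,Z + T_y F}{(r_7 x + r_8 y + r_9 F)\,Z + T_z F}$$
- Projective mapping: when the object surface is planar ($Z = aX + bY + c$),
  $$x' = \frac{a_0 + a_1 x + a_2 y}{1 + c_1 x + c_2 y}, \qquad y' = \frac{b_0 + b_1 x + b_2 y}{1 + c_1 x + c_2 y}$$
- Real object surfaces are not planar! But they can be divided into small patches, each approximated as planar.
- 2D motion can therefore be modeled by piecewise projective mapping (a different projective mapping over each 2D patch).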

Typical Camera Motions
Yao Wang, 2016 EL-GY 6123: Image and Video Processing

2-D Motion Corresponding to Camera Motion
[Figure: motion fields for camera zoom and for camera rotation around the Z-axis (roll).]

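2-D Motion Corresponding to Camera Motion or Rigid Object Motion
- General case: rigid 3-D motion followed by perspective projection,
  $$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}$$
  $$x' = F\,\frac{(r_1 x + r_2 y + r_3 F)\,Z + T_x F}{(r_7 x + r_8 y + r_9 F)\,Z + T_z F}, \qquad y' = F\,\frac{(r_4 x + r_5 y + r_6 F)\,Z + T_y F}{(r_7 x + r_8 y + r_9 F)\,Z + T_z F}$$
- Projective mapping: when all the object points are far from the camera and hence can be considered to lie on the same plane ($Z = c$),
  $$x' = \frac{a_0 + a_1 x + a_2 y}{1 + c_1 x + c_2 y}, \qquad y' = \frac{b_0 + b_1 x + b_2 y}{1 + c_1 x + c_2 y}$$
- The above is also true if the imaged object has a planar surface (i.e., $Z = aX + bY + c$). (HW!)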

Projective Mapping and Its Approximations
- Two features of projective mapping:
  - Chirping: increasing perceived spatial frequency for far-away objects
  - Converging (keystone): parallel lines converge in the distance

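Affine and Bilinear Model
- Affine (6 parameters): good for mapping triangles to triangles
  $$\begin{bmatrix} d_x(x,y) \\ d_y(x,y) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x + a_2 y \\ b_0 + b_1 x + b_2 y \end{bmatrix}$$
- Bilinear (8 parameters): good for mapping blocks to quadrangles
  $$\begin{bmatrix} d_x(x,y) \\ d_y(x,y) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x + a_2 y + a_3 xy \\ b_0 + b_1 x + b_2 y + b_3 xy \end{bmatrix}$$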
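To make the two models concrete, the sketch below evaluates an affine and a bilinear motion field over a small block. The parameter values and the 4x4 block are arbitrary assumptions for illustration.

```matlab
% Evaluate affine and bilinear motion fields on a 4x4 block (values assumed).
[x, y] = meshgrid(0:3, 0:3);            % pixel coordinates within the block
a = [1 0.1 0   0.02];                   % a0, a1, a2, a3
b = [0 0   0.1 0.02];                   % b0, b1, b2, b3
% Affine model: 6 parameters (a0..a2, b0..b2)
dx_aff = a(1) + a(2)*x + a(3)*y;
dy_aff = b(1) + b(2)*x + b(3)*y;
% Bilinear model: 8 parameters, adds the cross term x*y
dx_bil = dx_aff + a(4)*x.*y;
dy_bil = dy_aff + b(4)*x.*y;
disp(dx_bil);
```

The two fields agree wherever `x*y` is zero and differ most at the far corner of the block, where the cross term is largest.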

2-D Motion vs. Optical Flow
- 2-D motion: projection of 3-D motion, depending on the 3D object motion and the projection operator
- Optical flow: perceived 2-D motion based on changes in the image pattern; also depends on illumination and object surface texture
[Figure: On the left, a sphere is rotating under constant ambient illumination, but the observed image does not change. On the right, a point light source is rotating around a stationary sphere, causing the highlight point on the sphere to rotate.]

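Optical Flow Equation
- When the illumination condition is unknown, the best one can do is to estimate optical flow.
- Constant intensity assumption -> optical flow equation
- Under the "constant intensity assumption":
  $$\psi(x + d_x, y + d_y, t + d_t) = \psi(x, y, t)$$
- But, using Taylor's expansion:
  $$\psi(x + d_x, y + d_y, t + d_t) = \psi(x, y, t) + \frac{\partial \psi}{\partial x} d_x + \frac{\partial \psi}{\partial y} d_y + \frac{\partial \psi}{\partial t} d_t$$
- Comparing the above two, we have the optical flow equation:
  $$\frac{\partial \psi}{\partial x} d_x + \frac{\partial \psi}{\partial y} d_y + \frac{\partial \psi}{\partial t} d_t = 0 \quad\text{or}\quad \frac{\partial \psi}{\partial x} v_x + \frac{\partial \psi}{\partial y} v_y + \frac{\partial \psi}{\partial t} = 0 \quad\text{or}\quad \nabla\psi^T \mathbf{v} + \frac{\partial \psi}{\partial t} = 0$$
- In the discrete sample domain (assuming $(x, y)$ in $\psi_1$ is moved to $(x + d_x, y + d_y)$ in $\psi_2$):
  $$\frac{\partial \psi_2}{\partial x} d_x + \frac{\partial \psi_2}{\partial y} d_y + \psi_2(x, y) - \psi_1(x, y) = 0$$
- Note: typo in the textbook, Eq. (6.2.3). The gradient should be w.r.t. $\psi_2$.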

Ambiguities in Motion Estimation
- The optical flow equation only constrains the flow vector in the gradient direction ($v_n$):
  $$\mathbf{v} = v_n \mathbf{e}_n + v_t \mathbf{e}_t, \qquad v_n \|\nabla\psi\| + \frac{\partial \psi}{\partial t} = 0$$
- The flow vector in the tangent direction ($v_t$) is under-determined.
- In regions with constant brightness ($\nabla\psi = 0$), the flow is indeterminate -> motion estimation is unreliable in regions with flat texture, more reliable near edges.
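The ambiguity can be checked numerically. In this sketch the spatial gradient `g` and the temporal derivative `psi_t` are made-up values; the point is that the optical flow equation fixes the normal component and is satisfied for every choice of the tangent component.

```matlab
% Normal flow from the optical flow equation; the tangent component is free.
g     = [3; 4];                  % spatial gradient [dpsi/dx; dpsi/dy] (assumed)
psi_t = -10;                     % temporal derivative dpsi/dt (assumed)
en = g / norm(g);                % unit vector along the gradient direction
et = [-en(2); en(1)];            % unit vector along the tangent direction
vn = -psi_t / norm(g);           % normal flow from vn*||grad psi|| + psi_t = 0
vt_list  = [-1 0 2.5];           % arbitrary tangent components
residual = zeros(size(vt_list));
for n = 1:numel(vt_list)
    v = vn*en + vt_list(n)*et;           % candidate flow vector
    residual(n) = g.'*v + psi_t;         % left side of the optical flow equation
end
disp(residual);                  % all (numerically) zero
```

Because every tested `vt` gives a zero residual, a single pixel's optical flow equation cannot determine the full flow vector; additional constraints (neighboring pixels, smoothness) are needed.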

General Considerations for Motion Estimation
- Two categories of approaches:
  - Feature based: find corresponding features in two different images, then derive the entire motion field from the motion vectors at the corresponding features. More often used in object tracking and 3D reconstruction from 2D.
  - Intensity based: directly find the MV at every pixel or block based on the constant intensity assumption. More often used for motion-compensated prediction and filtering, as required in video coding and frame interpolation -> our focus
- Three important questions:
  - How to represent the motion field?
  - What criteria to use to estimate the motion parameters?
  - How to search for the motion parameters?

Motion Representation
- Global: the entire motion field is represented by a few global parameters.
- Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs.
- Block-based: the entire frame is divided into blocks, and the motion in each block is characterized by a few parameters.
- Region-based: the entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters.
- Other representation: mesh-based (control grid) (to be discussed later)

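Motion Estimation Criterion
- To minimize the displaced frame difference (DFD) (based on the constant intensity assumption):
  $$E_{DFD}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| \psi_2(\mathbf{x} + \mathbf{d}(\mathbf{x}; \mathbf{a})) - \psi_1(\mathbf{x}) \right|^p \to \min$$
  $p = 1$: MAD; $p = 2$: MSE
- To satisfy the optical flow equation:
  $$E_{OF}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| (\nabla\psi_2(\mathbf{x}))^T \mathbf{d}(\mathbf{x}; \mathbf{a}) + \psi_2(\mathbf{x}) - \psi_1(\mathbf{x}) \right|^p \to \min$$
- To impose an additional smoothness constraint using a regularization technique (important in pixel- and block-based representations):
  $$E_s(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \sum_{\mathbf{y} \in N_{\mathbf{x}}} \left\| \mathbf{d}(\mathbf{x}; \mathbf{a}) - \mathbf{d}(\mathbf{y}; \mathbf{a}) \right\|^2, \qquad w_{DFD} E_{DFD}(\mathbf{a}) + w_s E_s(\mathbf{a}) \to \min$$
- Bayesian (MAP) criterion: to maximize the a posteriori probability
  $$P(D = \mathbf{d} \mid \psi_2, \psi_1) \to \max$$
- Note: typo in Eq. (6.2.3)-(6.2.7). Spatial gradients should be w.r.t. $\psi_2$.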
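A small sketch of the DFD criterion for a purely translational motion. The frames are synthetic (an assumed test pattern shifted by a known amount), and the helper name `dfd` is hypothetical.

```matlab
% DFD error for candidate translations on a synthetic frame pair.
[x, y] = meshgrid(1:16, 1:16);
f1 = x.^2 + 3*y;                    % anchor frame psi_1 (synthetic pattern)
f2 = circshift(f1, [1 2]);          % target frame psi_2: content moved down 1, right 2
rows = 4:12; cols = 4:12;           % region Lambda, kept away from the wrap-around border
% E_DFD for displacement (dy, dx) and exponent p
dfd = @(dy, dx, p) sum(sum(abs(f2(rows+dy, cols+dx) - f1(rows, cols)).^p));
E_true_mad = dfd(1, 2, 1);          % true displacement, p = 1
E_zero_mad = dfd(0, 0, 1);          % zero displacement, p = 1
E_zero_mse = dfd(0, 0, 2);          % zero displacement, p = 2
fprintf('true MV: %g, zero MV: %g (p=1), %g (p=2)\n', E_true_mad, E_zero_mad, E_zero_mse);
```

The error vanishes at the true displacement and is positive elsewhere, which is what a search over candidate motion parameters exploits. Note that the sums here are not normalized by the number of pixels.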

Relation Among Different Criteria
- The OF criterion is good only if the motion is small.
- The OF criterion can often yield a closed-form solution, as the objective function is quadratic in the MVs.
- When the motion is not small, one can use a coarse exhaustive search to find a good initial solution, use this solution to deform the target frame, and then apply the OF criterion between the original anchor frame and the deformed target frame.
- The Bayesian criterion can be reduced to the DFD criterion plus a motion smoothness constraint.
- More in the textbook

Optimization Methods
- Exhaustive search
  - Typically used for the DFD criterion with p=1 (MAD)
  - Guarantees reaching the global optimum
  - The computation required may be unacceptable when the number of parameters to search simultaneously is large!
  - Fast search algorithms reach a sub-optimal solution in a shorter time
- Gradient-based search
  - Typically used for the DFD or OF criterion with p=2 (MSE); the gradient can often be calculated analytically
  - When used with the OF criterion, a closed-form solution may be obtained
  - Reaches the local optimum closest to the initial solution
- Multi-resolution search
  - Search from coarse to fine resolution; faster than exhaustive search
  - Avoids being trapped in a local minimum

Gradient Descent Method
- Iteratively update the current estimate in the direction opposite to the gradient direction.
[Figure: effect of the initial solution (not good vs. good) and of the stepsize (too big vs. appropriate).]
- The solution depends on the initial condition: the method reaches the local minimum closest to the initial condition.
- Choice of stepsize:
  - Fixed stepsize: the stepsize must be small to avoid oscillation, which requires many iterations
  - Steepest gradient descent (adjust the stepsize optimally)
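The effect of the stepsize can be illustrated on a toy objective. The quadratic $E(\mathbf{a}) = \tfrac{1}{2}(a_1^2 + 10 a_2^2)$, the starting point, and the two stepsizes below are assumptions chosen for the demonstration, not part of the lecture.

```matlab
% Fixed-stepsize gradient descent on E(a) = 0.5*(a1^2 + 10*a2^2).
gradE  = @(a) [a(1); 10*a(2)];      % gradient of E
a0     = [4; 1];                    % initial estimate (assumed)
alphas = [0.1 0.25];                % a small and a too-large stepsize
final_norm = zeros(size(alphas));
for n = 1:numel(alphas)
    a = a0;
    for it = 1:100
        a = a - alphas(n) * gradE(a);   % step against the gradient
    end
    final_norm(n) = norm(a);
    fprintf('alpha = %.2f: |a| after 100 iterations = %g\n', alphas(n), final_norm(n));
end
```

For this objective the largest curvature is 10, so a fixed stepsize must stay below 2/10 for the iteration to converge. The first stepsize approaches the minimum at the origin, while the second one oscillates with growing amplitude.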

Newton's Method
- Converges faster than the 1st-order method (i.e., requires fewer iterations to reach convergence)
- Requires more calculation in each iteration
- More prone to noise (gradient calculation is subject to noise, more so with 2nd order than with 1st order)
- May not converge if \alpha >= 1. Should choose \alpha appropriately to reach a good compromise between guaranteeing convergence and the convergence rate.
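As a reminder, the two updates are commonly written as follows. The symbols are assumptions chosen to match the slide's $\alpha$: $\mathbf{a}^{(l)}$ is the estimate at iteration $l$, $E$ the objective function, and $[\mathbf{H}]$ its Hessian (matrix of second-order derivatives).

$$\text{1st order:}\quad \mathbf{a}^{(l+1)} = \mathbf{a}^{(l)} - \alpha \left.\frac{\partial E}{\partial \mathbf{a}}\right|_{\mathbf{a}^{(l)}} \qquad\qquad \text{Newton:}\quad \mathbf{a}^{(l+1)} = \mathbf{a}^{(l)} - \alpha \left[\mathbf{H}(\mathbf{a}^{(l)})\right]^{-1} \left.\frac{\partial E}{\partial \mathbf{a}}\right|_{\mathbf{a}^{(l)}}$$

The extra cost per iteration comes from forming and inverting $[\mathbf{H}]$, and the second-order derivatives in $[\mathbf{H}]$ are where the additional noise sensitivity enters.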

Newton-Raphson Method
- Approximates the 2nd-order gradient with a product of 1st-order gradients
- Applicable when the objective function is a sum of squared errors
- Only needs to calculate 1st-order gradients, yet converges at a rate similar to Newton's method
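A short derivation sketch of the approximation, with assumed notation: let the objective be a sum of squared errors, $E(\mathbf{a}) = \tfrac{1}{2}\sum_k e_k^2(\mathbf{a})$. Then

$$\frac{\partial E}{\partial \mathbf{a}} = \sum_k e_k \frac{\partial e_k}{\partial \mathbf{a}}, \qquad [\mathbf{H}] = \sum_k \frac{\partial e_k}{\partial \mathbf{a}} \left(\frac{\partial e_k}{\partial \mathbf{a}}\right)^T + \sum_k e_k \frac{\partial^2 e_k}{\partial \mathbf{a}^2} \approx \sum_k \frac{\partial e_k}{\partial \mathbf{a}} \left(\frac{\partial e_k}{\partial \mathbf{a}}\right)^T$$

The second sum is dropped, which is reasonable when the errors $e_k$ are small near the solution. Elsewhere in the optimization literature this approximation is widely known as the Gauss-Newton method.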

Pixel-Based Motion Estimation
- Horn-Schunck method
  - DFD + motion smoothness criterion
- Multipoint neighborhood method
  - Assumes every pixel in a small block surrounding a pixel has the same MV
- Pel-recursive method
  - The MV for the current pel is updated from those of its previous pels, so that the MV does not need to be coded
  - Developed for early generations of video coders
- Recommended reading for recent advances:
  - Sun, Deqing, Stefan Roth, and Michael J. Black. "Secrets of optical flow estimation and their principles." In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 2432-2439. IEEE, 2010.

Block-Based Motion Estimation
- Assume all pixels in a block undergo a coherent motion, and search for the motion parameters of each block independently
- Block matching algorithm (BMA): assume translational motion, 1 MV per block (2 parameters)
  - Exhaustive BMA (EBMA)
  - Fast algorithms
- Deformable block matching algorithm (DBMA): allows more complex motion (affine, bilinear), to be discussed later

Block Matching Algorithm
- Overview:
  - Assume all pixels in a block undergo a translation, denoted by a single MV
  - Estimate the MV for each block independently, by minimizing the DFD error over this block
- Minimizing function:
  $$E_{DFD}(\mathbf{d}_m) = \sum_{\mathbf{x} \in B_m} \left| \psi_2(\mathbf{x} + \mathbf{d}_m) - \psi_1(\mathbf{x}) \right|^p \to \min$$
- Optimization method:
  - Exhaustive search (feasible, as one only needs to search one MV at a time), using the MAD criterion (p=1)
  - Fast search algorithms
  - Integer vs. fractional pel accuracy search

Exhaustive Block Matching Algorithm (EBMA)

Sample Matlab Script for Integer-pel EBMA

    % f1: anchor frame; f2: target frame (both of class double); fp: predicted image
    % mvx, mvy: store the MV image
    % width x height: image size; N: block size; R: search range
    for i = 1:N:height-N+1
      for j = 1:N:width-N+1                  % for every block in the anchor frame
        MAD_min = 256*N*N; dx = 0; dy = 0;
        for k = -R:1:R
          for l = -R:1:R                     % for every search candidate
            % skip candidates whose block is not entirely inside the image
            if i+k < 1 || i+k+N-1 > height || j+l < 1 || j+l+N-1 > width
              continue;
            end
            % calculate MAD for this candidate
            MAD = sum(sum(abs(f1(i:i+N-1,j:j+N-1) - f2(i+k:i+k+N-1,j+l:j+l+N-1))));
            if MAD < MAD_min
              MAD_min = MAD; dy = k; dx = l;
            end
          end
        end
        % put the best matching block in the predicted image
        fp(i:i+N-1,j:j+N-1) = f2(i+dy:i+dy+N-1,j+dx:j+dx+N-1);
        iblk = floor((i-1)/N) + 1; jblk = floor((j-1)/N) + 1;   % block index
        mvx(iblk,jblk) = dx; mvy(iblk,jblk) = dy;               % record the estimated MV
      end
    end

Note: Here, candidates whose matching block falls partly outside the image boundary are skipped. An alternative is to keep such candidates and exclude the outside pixels from the MAD. The script is meant to illustrate the main operations involved rather than to serve as a finished program.

Complexity of Integer-Pel EBMA
- Assumption
  - Image size: M x M
  - Block size: N x N
  - Search range: (-R, R) in each dimension
  - Search stepsize: 1 pixel (assuming integer MV)
- Operation counts (1 operation = 1 "-", 1 "+", 1 "*"):
  - Each candidate position: N^2
  - Each block, going through all candidates: (2R+1)^2 N^2
  - Entire frame: (M/N)^2 (2R+1)^2 N^2 = M^2 (2R+1)^2
  - Independent of block size!
- Example: M=512, N=16, R=16, 30 fps
  - Total operation count = 2.85 x 10^8/frame = 8.56 x 10^9/second
- Regular structure suitable for VLSI implementation
- Challenging for software-only implementation
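The operation counts in the example can be reproduced with a few lines of arithmetic. The variable names are illustrative only.

```matlab
% Operation count of integer-pel EBMA for the example on this slide.
M = 512; N = 16; R = 16; fps = 30;
ops_candidate = N^2;                        % one candidate position
ops_block     = (2*R + 1)^2 * ops_candidate;% all candidates of one block
ops_frame     = (M/N)^2 * ops_block;        % all blocks of one frame
ops_second    = fps * ops_frame;
fprintf('%.3g operations/frame, %.3g operations/second\n', ops_frame, ops_second);
```

Changing `N` leaves `ops_frame` unchanged, since the number of blocks shrinks by the same factor by which the cost per block grows.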

Fractional Accuracy EBMA
- The real MV may not always be a multiple of pixels. To allow sub-pixel MVs, the search stepsize must be less than 1 pixel.
- Half-pel EBMA: stepsize = 1/2 pixel in both dimensions
- Difficulty:
  - The target frame only has integer pels
- Solution:
  - Interpolate the target frame by a factor of two before searching
  - Bilinear interpolation is typically used
- Complexity:
  - 4 times that of integer-pel, plus additional operations for interpolation
- Fast algorithms:
  - Search in integer precision first, then refine in a small search region with half-pel accuracy

Half-Pel Accuracy EBMA

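Bilinear Interpolation
[Figure: input pixels (x,y), (x+1,y), (x,y+1), (x+1,y+1) and the corresponding output pixels (2x,2y), (2x+1,2y), (2x,2y+1), (2x+1,2y+1).]
- O[2x,2y] = I[x,y]
- O[2x+1,2y] = (I[x,y] + I[x+1,y])/2
- O[2x,2y+1] = (I[x,y] + I[x,y+1])/2
- O[2x+1,2y+1] = (I[x,y] + I[x+1,y] + I[x,y+1] + I[x+1,y+1])/4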
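The four interpolation rules can be written with array slicing. This is a sketch under two assumptions: MATLAB's 1-based indexing is used, so input pixel `I(r,c)` lands at `O(2r-1,2c-1)`, and no samples are generated beyond the last row and column, giving an output of size (2H-1) x (2W-1).

```matlab
% Factor-of-2 bilinear interpolation of a small image (1-based indexing).
I = [10 20;
     30 40];
[H, W] = size(I);
O = zeros(2*H - 1, 2*W - 1);
O(1:2:end, 1:2:end) = I;                                    % integer-pel samples
O(1:2:end, 2:2:end) = (I(:, 1:end-1) + I(:, 2:end)) / 2;    % horizontal half-pel
O(2:2:end, 1:2:end) = (I(1:end-1, :) + I(2:end, :)) / 2;    % vertical half-pel
O(2:2:end, 2:2:end) = (I(1:end-1, 1:end-1) + I(1:end-1, 2:end) ...
                     + I(2:end, 1:end-1) + I(2:end, 2:end)) / 4;  % diagonal half-pel
disp(O);
```

With this layout, a half-pel position `p` in the original image (a multiple of 0.5) corresponds to index `2*p - 1` in the interpolated image.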

Implementation for Half-Pel EBMA

    % f1: anchor frame; f2: target frame (both of class double); fp: predicted image
    % mvx, mvy: store the MV image
    % width x height: image size; N: block size; R: search range
    % first upsample f2 by a factor of 2 in each direction, so that
    % f3(2*y-1,2*x-1) = f2(y,x) (or use your own implementation!);
    % imresize(f2,2,'bilinear') is an alternative, but the index mapping
    % below must then be adapted to its sampling grid
    [X, Y] = meshgrid(1:0.5:width, 1:0.5:height);
    f3 = interp2(f2, X, Y, 'linear');
    for i = 1:N:height-N+1
      for j = 1:N:width-N+1                  % for every block in the anchor frame
        MAD_min = 256*N*N; dx = 0; dy = 0;
        for k = -R:0.5:R
          for l = -R:0.5:R                   % for every search candidate
            % skip candidates whose block is not entirely inside the image
            if i+k < 1 || i+k+N-1 > height || j+l < 1 || j+l+N-1 > width
              continue;
            end
            % calculate MAD for this candidate, using the interpolated frame f3
            MAD = sum(sum(abs(f1(i:i+N-1,j:j+N-1) - ...
                  f3(2*(i+k)-1:2:2*(i+k+N-1)-1, 2*(j+l)-1:2:2*(j+l+N-1)-1))));
            if MAD < MAD_min
              MAD_min = MAD; dy = k; dx = l;
            end
          end
        end
        % put the best matching block in the predicted image
        % (must use the corresponding pixels in f3, not f2)
        fp(i:i+N-1,j:j+N-1) = f3(2*(i+dy)-1:2:2*(i+dy+N-1)-1, 2*(j+dx)-1:2:2*(j+dx+N-1)-1);
        iblk = floor((i-1)/N) + 1; jblk = floor((j-1)/N) + 1;   % block index
        mvx(iblk,jblk) = dx; mvy(iblk,jblk) = dy;               % record the estimated MV
      end
    end

Example: Half-pel EBMA
[Figure panels: anchor frame; target frame; motion field; predicted anchor frame (29.86 dB).]

Pros and Cons with EBMA
- Blocking effect (discontinuity across block boundaries) in the predicted image
  - Because the block-wise translation model is not accurate
  - Fix: deformable BMA (next lecture)
- Motion field somewhat chaotic
  - Because MVs are estimated independently from block to block
  - Fix 1: mesh-based motion estimation (next lecture)
  - Fix 2: imposing a smoothness constraint explicitly
- Wrong MVs in flat regions
  - Because motion is indeterminate when the spatial gradient is near zero
- Nonetheless, widely used for motion-compensated prediction in video coding
  - Because of its simplicity and its optimality in minimizing the prediction error

Fast Algorithms for BMA
- Key ideas to reduce the computation in EBMA:
  - Reduce the number of search candidates:
    - Only search those that are likely to produce small errors
    - Predict possible remaining candidates, based on previous search results
  - Simplify the error measure (DFD) to reduce the computation involved for each candidate
- Classical fast algorithms
  - Three-step
  - 2D-log
  - Conjugate direction
- Many new fast algorithms have been developed since then
  - Some suitable for software implementation, others for VLSI implementation (memory access, etc.)
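As an illustration of reducing the number of candidates, here is a minimal sketch of a three-step search for one block. Assumptions: step sizes 4, 2, 1 (a search range of about +/-7), a smooth synthetic frame pair with a known displacement, and a block placed far enough from the border that no bounds checks are needed. The helper name `mad` is hypothetical.

```matlab
% Three-step search for the MV of a single block (synthetic frames).
[c, r] = meshgrid(1:32, 1:32);               % c: column index, r: row index
dy0 = 3; dx0 = -2;                           % true displacement (assumed)
f1 = (c - 16).^2 + 2*(r - 16).^2;            % anchor frame
f2 = (c - dx0 - 16).^2 + 2*(r - dy0 - 16).^2;% target frame: f1 shifted by (dy0, dx0)
i = 13; j = 13; N = 8;                       % top-left corner and size of the block
blk = f1(i:i+N-1, j:j+N-1);
mad = @(dy, dx) sum(sum(abs(blk - f2(i+dy:i+dy+N-1, j+dx:j+dx+N-1))));
dy = 0; dx = 0; nEval = 0;                   % start from zero motion
for step = [4 2 1]
    best = mad(dy, dx); by = dy; bx = dx;
    for k = -step:step:step                  % 3x3 grid of candidates around
        for l = -step:step:step              % the current best position
            e = mad(dy + k, dx + l); nEval = nEval + 1;
            if e < best, best = e; by = dy + k; bx = dx + l; end
        end
    end
    dy = by; dx = bx;                        % recentre, then halve the step
end
fprintf('estimated MV (dy,dx) = (%d,%d), error %g, %d evaluations\n', dy, dx, best, nEval);
```

The loop evaluates 27 candidates (25 if the centre is not recomputed at each step), compared with (2*7+1)^2 = 225 for an exhaustive search over the same range. On this smooth test pattern the search reaches the true displacement; on real images with several local minima it can settle on a sub-optimal MV.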

VcDemo Example
- VcDemo: Image and Video Compression Learning Tool
- Developed at Delft University of Technology
- http://insy.ewi.tudelft.nl/content/image-and-video-compression-leaning-tool-vcdemo
- Use the ME tool to show the motion estimation results with different parameter choices

Multi-resolution Motion Estimation
- Problems with BMA
  - Unless exhaustive search is used, the solution may not be the global minimum
  - Exhaustive search requires an extremely large amount of computation
  - The block-wise translational motion model is not always appropriate
- Multiresolution approach
  - Aims to solve the first two problems
  - First estimate the motion at a coarse resolution, over a low-pass filtered, down-sampled image pair
    - Can usually lead to a solution close to the true motion field
  - Then modify the initial solution at successively finer resolutions, within a small search range
    - Reduces the computation
  - Can be applied to different motion representations, but we will focus on its application to BMA

Hierarchical Block Matching Algorithm (HBMA)

Example: Three-level HBMA
[Figure: predicted anchor frame (29.32 dB).]


Computation Requirement of HBMA
- Assumption
  - Image size: M x M; block size: N x N at every level; levels: L
  - Search range:
    - 1st level: $R/2^{L-1}$ (equivalent to $R$ in the $L$-th level)
    - Other levels: $R/2^{L-1}$ (can be smaller)
- Operation counts for EBMA (image size M, block size N, search range R)
  - Number of operations: $M^2 (2R+1)^2$
- Operation counts at the $l$-th level (image size: $M/2^{L-l}$)
  $$\left(M/2^{L-l}\right)^2 \left(2R/2^{L-1} + 1\right)^2$$
- Total operation count
  $$\sum_{l=1}^{L} \left(M/2^{L-l}\right)^2 \left(2R/2^{L-1} + 1\right)^2 \approx \frac{1}{3}\, 4^{-(L-2)}\, 4 M^2 R^2$$
- Saving factor: $3 \cdot 4^{(L-2)}$, i.e., 3 for $L = 2$ and 12 for $L = 3$
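The approximation can be compared against the exact sum. The values of `M` and `R` below are assumptions borrowed from the earlier EBMA example.

```matlab
% Exact vs. approximate saving factor of HBMA over EBMA.
M = 512; R = 16;                              % assumed example values
ops_ebma = M^2 * (2*R + 1)^2;
levels = [2 3];
saving = zeros(size(levels));
for n = 1:numel(levels)
    L = levels(n);
    l = 1:L;                                  % level index, 1 = coarsest
    ops_hbma = sum((M ./ 2.^(L - l)).^2) * (2*R/2^(L-1) + 1)^2;
    saving(n) = ops_ebma / ops_hbma;
    fprintf('L = %d: exact saving factor %.2f, approximation %d\n', ...
            L, saving(n), 3*4^(L-2));
end
```

The exact factors come out somewhat below the approximation for larger `L`, because the approximation drops the "+1" in the per-level search window and replaces the finite geometric sum over levels by its limit of 4/3.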

Deformable Block Matching Algorithm

Overview of DBMA
- Partition the anchor frame into regular blocks
- Model the motion in each block by a more complex motion
  - The 2-D motion caused by a flat surface patch undergoing rigid 3-D motion can be approximated well by projective mapping
  - Projective mapping can be approximated by affine mapping and bilinear mapping
  - Various possible mappings can be described by a node-based motion model
- Estimate the motion parameters block by block, independently
  - The discontinuity problem across block boundaries still remains
- Still cannot solve the problem of multiple motions within a block or changes due to illumination effects!

Mesh-based vs. block-based motion estimation
[Figure panels: (a) block-based backward ME; (b) mesh-based backward ME; (c) mesh-based forward ME.]

Summary 1: Motion Models
- 3D motion
  - Rigid vs. non-rigid motion
- Camera model: 3D -> 2D projection
  - Perspective projection vs. orthographic projection
- What causes 2D motion?
  - Object motion projected to 2D
  - Camera motion
  - Optical flow vs. true 2D motion
- Models corresponding to typical camera motion and object motion
  - Rigid 3D motion of a planar surface -> 2D projective mapping
  - The 2D motion of each small patch can be modeled well by projective mapping (piecewise projective mapping)
  - Affine or bilinear functions can be used to approximate the projective mapping, but one should know the caveats
  - Affine functions are often used to characterize global 2D motion due to camera motions
- Constraints for 2D motion
  - Optical flow equation
  - Derived from the constant intensity and small motion assumptions
  - Ambiguity in motion estimation

Summary 2: General Strategy for Motion Estimation
- How to represent motion:
  - Pixel-based, block-based, region-based, global, etc.
- Estimation criterion:
  - DFD (constant intensity)
  - OF (constant intensity + small motion)
  - Bayesian (MAP, DFD + motion smoothness)
- Search method:
  - Exhaustive search, gradient descent, multi-resolution

Summary 3: Motion Estimation Methods
- Pixel-based motion estimation (also known as optical flow estimation)
  - Most accurate representation, but also most costly to estimate
- Block-based motion estimation, assuming each block has a constant motion
  - Good trade-off between accuracy and speed
  - EBMA and its fast but suboptimal variants are widely used in video coding for motion-compensated temporal prediction
  - HBMA can not only reduce computation but also yield physically more correct motion estimates
- Deformable block matching algorithm (DBMA)
  - To allow more complex motion within each block
- Mesh-based motion estimation
  - To enforce continuity of motion across block boundaries
- Global motion estimation (next lecture)
- Region-based motion estimation (next lecture)

Reading Assignments
- Reading assignment (Wang, et al., 2004)
  - Chap. 5: Sec. 5.1, 5.5
  - Chap. 6: Sec. 6.1-6.6, Apx. A, B
- Optional reading:
  - Woods, 2012, Sec. 11.2
  - Sun, Deqing, Stefan Roth, and Michael J. Black. "Secrets of optical flow estimation and their principles." In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 2432-2439. IEEE, 2010.

Written Assignment
1. Show that the projected 2-D motion of a planar patch of a 3-D object undergoing rigid motion can be described by projective mapping.
2. Prob. Consider a triangular patch whose original corner positions are at $\mathbf{x}_k$, k=1,2,3. Suppose each corner is moved by $\mathbf{d}_k$, k=1,2,3. The motion field within the triangular patch can be described by an affine mapping. Express the affine parameters in terms of $\mathbf{d}_k$.
3. Prob. 6.5
4. Prob. 6.8
5. Prob. 6.9
6. (Optional) Go through and verify the gradient descent algorithm presented for estimating the nodal motions in DBMA in Eq. (6.5.2)-(6.5.6).
7. (Optional) For estimating the nodal motions in DBMA, instead of minimizing the DFD error, set up the formulation using the OF criterion (assuming the nodal motions are small), and find the closed-form solution for the nodal motions.

MATLAB Assignment
1. Prob. 6.12 (EBMA with integer accuracy)
2. Prob. 6.13 (EBMA with half-pel accuracy)
3. Prob. 6.15 (HBMA)
Note: you can download sample video frames from the course webpage. When applying your motion estimation algorithm, you should choose two frames that have sufficient motion in between, so that it is easy to observe the effect of motion estimation inaccuracy. If necessary, choose two frames that are several frames apart. For example, foreman: frame 100 and frame 103.