Redundancy and Correlation: Temporal Mother and Daughter CIF 352 x 288 Frame 60 Frame 61 Time Copyright 2007 by Lina J. Karam 1
Motion Estimation and Compensation Video is a sequence of frames (images) A typical scene does not change rapidly from one frame to the next Huge amount of data that need to be transmitted and played-back in real time or fast enough. Copyright 2007 by Lina J. Karam 2
Motion Estimation and Compensation Exploit intraframe (within one image) redundancy and interframe (between frames) redundancy (motion estimation and compensation). Most common method to remove interframe redundancy is motion compensation (MC). Illustration with linear motion, black ball, white background. d d x d y time If one can extract the ball, just transmit how far it moved each frame (send a vector describing the displacement of the object, motion compensation vector). Copyright 2007 by Lina J. Karam 3
Motion Estimation and Compensation Interframe coding requires storage of the frames used in the coding process. If N previous frames used, one needs to store these N previous frames plus complexity increases This limits the number of frames that can be used only 1 or 2 additional frames are usually used. More recently, long-term memory motion estimation and compensation scheme was proposed and is incorporated within the H.264 standard. Criteria of interest: accuracy, speed, robustness with respect to noise. Copyright 2007 by Lina J. Karam 4
Motion Estimation and Compensation Applications Compression Restoration Interpolation and increase in size of image or in frame rate Motion estimation Motion estimation is the process of determining the movement of objects within a sequence of image frames. Motion compensation Motion compensation refers to the generation of the predicted frame using the estimated motion Copyright 2007 by Lina J. Karam 5
Motion Estimation Assumption: Motion approximated by translational motion locally to compute displacement vector (d x,d y ) at each pixel or small subimage: f(x,y,t) = f(x-d x, y-d y, t -1 ); (x, y) R (1) Local region in current frame Horizontal and vertical displacements Corresponding region in previous frame If we assume uniform translational local motion between t -1 and t: f(x,y,t) = f(x-v x (t-t -1 ), y-v y (t-t -1 )); t -1 t t 0 where v x, v y are uniform horizontal and vertical velocities. By taking partial derivatives,, and, we get a differential x y t equation (spatio-temporal constraint equation): f f f Along trajectory value of the pixels stays the same v x f x, y, t x v y f x, y, t y f x, y, t t 0 (2) Copyright 2007 by Lina J. Karam 6
Motion Estimation Note: Uniform translational motion does not allow for object rotation, camera zoom, uncovered regions by moving object or multiple objects moving with different v x and v y Assumption valid only locally at each pixel or at each small subimage level, for background regions not affected by object motion and for regions with objects which do indeed translate locally with uniform velocity. Such regions occupy a significant portion of a typical sequence of image frames. Motion copensation (MC) can be suppressed in regions where motion estimates were identified to be inaccurate. EEE 508 - Lecture 19 7
Motion Estimation Motion estimation methods can be classified broadly into two groups: Region Matching methods based on Equation (1). Spatio-temporal constraint methods based on Equation (2). Criteria of interest: accuracy, speed, robustness with respect to noise. EEE 508 - Lecture 19
EEE 508 - Lecture 19 9 Motion Estimation Spatio-temporal constraint methods: View (2) as a linear equation with two unknowns v x and v y if partial derivatives given or computed. By evaluating,, and at many points (x i, y i, t) for 1 i N at which v x and v y are assumed constant we obtain an overdetermined set of linear equations Velocity estimates can be obtained by minimizing: Quadratic form in terms of the two unknown v x and v y Solution requires solving two linear equations. x f y f t f 2 1,,,,,,,,,,,, N i t y x t y x y t y x x i i i i i i i i i t t y x f y t y x f v x t y x f v Error
Motion Estimation Advantage over Region matching: Computational simplicity and tractability especially if f(x,y,t) can be expressed by means of polynomial interpolation in the 3D local region of interest. Disadvantage over Region matching: it lacks the robustness with respect to noise as compared to block-based techniques. EEE 508 - Lecture 19 10
Motion Estimation Region matching motion estimation techniques can be categorized into three major groups: Pixel-based techniques Pel-recursive algorithms Gradient-descent algorithms Block-based (Block Matching) techniques Exhaustive full/fractional pel search Fast search algorithms Hierarchical block matching algorithms Object-based techniques mesh-based techniques segmentation-based techniques Copyright 2007 by Lina J. Karam 11
Motion Estimation Comparison Pixel-based techniques Computationally expensive High transmission overhead since motion vectors are assigned for each pixel Block-based techniques Computationally less expensive. Lower bit rate for video data representation due to the transmission of one motion vector per block. Suffers from blocking artifacts -- requires post-processing Low accuracy around boundaries of fast moving objects (block motion does not represent the true object motion). Copyright 2007 by Lina J. Karam 12
Motion Estimation Comparison Object-based techniques Accuracy depends on complexity of object model Complexity determined mostly by the segmentation algorithm involved Mostly used for video coding and data retrieval due to very low bit representation. (MPEG-4, MPEG-7) Copyright 2007 by Lina J. Karam 13
Motion Estimation Main procedure for region matching motion estimation: 1. Divide adjacent and current frames into small regions. 2. For a given region in current frame, search for the displacement which produces the best match among possible blocks in an adjacent frame. 3. The displacement vector is estimated by minimizing: x, y where R is the local spatial region used to estimate (d x,d y ), and C[, ] is a metric that indicates the amount of dissimilarity. Most common C[, ] is the absolute difference Error = Sum Absolute Difference (SAD): EEE 508 - Lecture 19 14 R C f x, y, t 0, f x dx, y d y, C f 1, f2 f1 f2 t 1
Motion Estimation Motion estimation based on Block Matching is commonly used and is incorporated within the video coding standards. Basic procedure for block matching 1. Divide adjacent and current frames into small blocks. 2. For a given block in current frame, search for the displacement (translational motion) which produces the best match among possible blocks in an neighboring/adjacent frame. Copyright 2007 by Lina J. Karam 15
Motion Estimation Block matching-based (Block-based) motion compensation Reference frame (t-l or t+l) Current frame t Motion vector shows from where block moved. (x,y) N 1 (x+m x,y+m y ) (x,y) N 1 p N 2 p N 2 search region 1. Divide current and previous frame into non-overlapping blocks with size N 1 N 2. 2. For each block in current frame: 1. Search in reference frame (previously encoded frame) for a block that closely matches the current block: search region in reference frame is centered at current block position and is of size (2p+N 1 ) (2p+N 2 ). 2. Find closest matching block with respect to a given error metric. 3. Send motion vector (m x,m y ). Copyright 2007 by Lina J. Karam 16
Motion Estimation To perform search (Step 2.1), we search in reference frame for matching block by moving window (of same size as block) in pixels at a time (not in blocks at a time). Large blocks result in fewer blocks fewer motion vectors poor accuracy but low bit-rate Small blocks result in more blocks but high bit-rate good accuracy Typically, for motion estimation, blocks (macroblocks) for MPEG standards are chosen to be 16 16, but also can be with sizes smaller than that (8x8, 4x4, variable). Copyright 2007 by Lina J. Karam 17
Copyright 2007 by Lina J. Karam Motion Estimation How best match block is characterized? B t (n 1, n 2 ) : N 1 N 2 image block in current frame B t-l (n 1, n 2 ) : N 1 N 2 image block in reference frame Compute prediction (match) error: e p (n 1, n 2 ) = B t (n 1, n 2 ) B t-l (n 1, n 2 ) Compute error norm => gives one score Mean-Squared Error (MSE) L 2 norm Sum of Absolute Difference (SAD) L 1 norm 18 1 2 1 2 2 2 1 2 1 2 1 2 1 2 2 1 2 2 1 )), ( ), ( ( 1 ), ( 1 ), ( n n l t t n n p p n n B n n B N N n n e N N n n e 1 2 1 2 ), ( ), ( ), ( ), ( 2 1 2 1 2 1 1 2 1 n n l t t n n p p n n B n n B n n e n n e
Motion Estimation Fractional Pixel (Sub-pixel) Motion Estimation Previous block-matching scheme limited to integer (fullpixel) displacement Better accuracy achieved if fractional pixel displacement can be detected Half-pixel (half-pel) displacement accuracy can be achieved by interpolating (enlarging) the reference frames by two in each of the horizontal and vertical directions Quarter pixel (quarter-pel) displacement accuracy can be achieved by interpolating the reference frames by four in each of the horizontal and vertical directions Copyright 2007 by Lina J. Karam 19
Motion Estimation Half-Pixel Block-based Motion Estimation Displacement vector Current block Interpolated pixel Matching block Original pixel Interpolate the reference frame by two in width and height Perform search on the reference frame Better accuracy at the expense of higher computations (up to four times more computations than full-pixel accuracy) Copyright 2007 by Lina J. Karam 20
Motion Estimation Exhaustive search also known as full search Move search window one pixel at a time in reference frame within some reasonable range surrounding block of interest. Gives best results, but extremely computationally complex, hence very slow. Evaluates SAD at (2p+1+N) 2 locations for N N blocks. For typical application in broadcast TV (480 720 pixels, 30 frames per second) and for N = 16 and p = 7, 29.89 GOPS (Giga operations per second; Giga = 1000 millions) are required Half-pixel full search requires four times more computations Quarter-pixel full search require 16 times more computations Faster method needed. Copyright 2007 by Lina J. Karam 21
Motion Estimation Fast search block-matching methods These attempt to find the candidate block in the reference frame that is closest to the current N 1 N 2 block in current frame by evaluating a subset of the candidate blocks. Computational complexity significantly reduced. Not guaranteed to find the optimal solution. Copyright 2007 by Lina J. Karam 22
Motion Estimation Fast search block matching methods Three-Step-Search method (TTS) Efficient- Three-Step-Search method (ETTS) One-at-a-Time-Search method (OTS) Diamond-Search method (DS) Hexagonal-Search method (HS) Cross-Search-Algorithm (CSA) Copyright 2007 by Lina J. Karam 23
Motion Estimation Fast Search Methods Three-Step Search (TSS) 1. Compute a distortion value for the center point and for 8 uniformly spaced points. Step-size is 4 pixels. 2. The position with smallest distortion is chosen as a centerpoint in next search step. 3. Compute the distortion value for 8 uniformly spaced points denoted by 2. Step-size is halved to two pixels 4. The position with smallest distortion is chosen as a centerpoint in next search step. 5. Compute the distortion value for 8 uniformly spaced points denoted by 3. Step-size is halved to one pixel. 6. The position with smallest distortion defines the motion motion vector. (Koga et al. [1981]) (-6,7) (0,7) (7,7) Third Step Second Step First Step K-step search needs (8K+1) search point per block (-6,-7) (7,-7) Search area size is 15 x 15: bottom corner at (-7,-7) and top corner at (7,7) 24
Motion Estimation Fast Search Methods Three-Step Search (TSS) For typical applications of 2D Logarithmic search in broadcast TV (480 720 pixels, 30 fps) for N 1 = N 2 = 16 and p = 7, 1.02 GOPS required. TSS usually underperforms full search by 0.5 to 1 db in PSNR for common test video sequences. PSNR is the Peak Signal-To-Noise-Ratio defined as: PSNR 10log 10 (MSE between original peak valueof original frameio framei and predictedframei Other issues: Equidistant search patterns May skip global minimum in low motion cases o 2 p ) Copyright 2007 by Lina J. Karam 25
Motion Estimation Fast Search Methods Efficient Three-Step Search (ETTS) (Jing & Chau [2004]) More emphasis on center biased motion Add four neighboring candidates in first step If one of inner blocks match Continue searching around center block until the center minimizes the error Else continue as a TSS method Step 3 Step 2 Step 1 26
Motion Estimation Fast Search Methods Cross-Search-Algorithm (CSA) (Ganbari [1990]) 27
Motion Estimation Fast Search Methods Current block location One-at-a-time-search (Srinivasan & Rao [1985]) 2-D search problem is split into 1-D search in horizontal and vertical directions Gets trapped easily in local minima 28
Motion Estimation Fast Search Methods Diamond-Search (DS) (Zhu & Ma [1997]) Computational complexity of DS is less than that of ETSS Case 1: 5 new neighbors searched Case 2: 3 new neighbors searched Maintains performance for image sequences with global motion like camera zooming Does not perform as well as the ETSS for objects with high motion Step1 Last step Step2 case1 29
Motion Estimation Fast Search Methods Diamond-Search (DS) (Zhu & Ma [1997]) Computational complexity of DS is less than that of ETSS Case 1: 5 new neighbors searched Case 2: 3 new neighbors searched Maintains performance for image sequences with global motion like camera zooming Does not perform as well as the ETSS for objects with high motion Last step Step2 case2 Step1 30
Motion Estimation Fast Search Methods Hexagonal-Search method (HS) (Zhu et al. [2002] ) HS performs better than the DS in terms of reducing computational complexity and maintaining similar picture quality Last Step Step3 Step1 Step2 Basics of Image and Video Copyright 2007 by Lina J. Karam 31
Motion Estimation and Compensation CIF sequences, size [352x288] pixels Mother & Daughter (Low motion) Coastguard (Medium motion, background over foreground motion) Foreman (High motion, camera zooming & panning) Copyright 2007 by Lina J. Karam 32
Motion Estimation and Compensation CIF Mother & Daughter Reference Frame (Frame 197) Current Frame (Frame 198) Difference (Residual) Frame = Frame 198 Frame 197 Copyright 2007 by Lina J. Karam 33
Full Search Motion Estimation [8x8] block motion vectors superimposed on Reference Frame (Frame 197) Copyright 2007 by Lina J. Karam 34
Motion Compensation Motion Compensated Reference (Frame 197) PSNR = 40.8 db, MSE = 5.4 Copyright 2007 by Lina J. Karam 35
Motion Compensated Residual Frame 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Copyright 2007 by Lina J. Karam 36
Residual Error ( No Motion Compensation) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 4/1/2012 37
Motion vector map ( 16x16 blocks, Full pixel) 20 MV Frame198 18 16 14 12 10 8 6 4 2 0 0 5 10 15 20 25 Copyright 2007 by Lina J. Karam 38
Motion vector map ( 4x4 blocks, Full pixel) 80 MV Frame198 70 60 50 40 30 20 10 0 0 10 20 30 40 50 60 70 80 90 100 Copyright 2007 by Lina J. Karam 39
Residual Error ( No Motion Compensation) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 4/1/2012 40
Residual Error ( 16x16 blocks, Full pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 39.4 db, MSE = 7.5 Copyright 2007 by Lina J. Karam 41
Residual Error ( 8x8 blocks, Full pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 40.8 db, MSE = 5.4 Copyright 2007 by Lina J. Karam 42
Residual Error ( 4x4 blocks, Full pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 42.7 db, MSE = 3.5 Copyright 2007 by Lina J. Karam 43
Residual Error ( 4x4 blocks, Half pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 44 db, MSE = 2.5 Copyright 2007 by Lina J. Karam 44
Residual Error ( 4x4 blocks, quarter pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 45 db, MSE = 2.1 Copyright 2007 by Lina J. Karam 45
Motion Estimation and Compensation Reference Frame (Frame 13) Current Frame (Frame14) Copyright 2007 by Lina J. Karam 46
Full Search Motion Estimation [8x8] block motion vectors superimposed on Reference F[n-1] (Frame 13) Copyright 2007 by Lina J. Karam 47
Motion Compensation Motion Compensated Reference F[n-1] (Frame 14) PSNR = 34 dbs, MSE = 24.5 Copyright 2007 by Lina J. Karam 48
Motion Compensated Residual Frame 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Copyright 2007 by Lina J. Karam 49
Residual Frame No Motion Compensation 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 4/1/2012 50
Motion vector map ( 16x16 MV Frame14 blocks, Full pixel) 20 18 16 14 12 10 8 6 4 2 0 0 5 10 15 20 25 Copyright 2007 by Lina J. Karam 51
Motion vector map ( 4x4 blocks, MV Frame14 Full pixel) 80 70 60 50 40 30 20 10 0 0 10 20 30 40 50 60 70 80 90 100 Copyright 2007 by Lina J. Karam 52
Residual Frame No Motion Compensation 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 4/1/2012 53
Residual Error ( 16x16 blocks, Full pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 31.6 db, MSE = 45.1 Copyright 2007 by Lina J. Karam 54
Residual Error ( 8x8 blocks, Full pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 34.2 db, MSE = 24.5 Copyright 2007 by Lina J. Karam 55
Residual Error ( 4x4 blocks, Full pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 37.3 db, MSE = 12 Copyright 2007 by Lina J. Karam 56
Residual Error ( 4x4 blocks, Half pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 39.5 db, MSE = 7.5 Copyright 2007 by Lina J. Karam 57
Residual Error ( 4x4 blocks, quarter pixel) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 PSNR = 40.1 db, MSE = 6.3 Copyright 2007 by Lina J. Karam 58
Other Sources of Information Lossless Coding, Lina J. Karam, Chapter 5.1 in Handbook of Image and Video Processing, 2 nd edition (Al Bovik editor), Elsevier Academic Press, 2005. Lossless Compression Handbook, Khalid Sayood editor, Elsevier Academic Press, 2003. H.264 and MPEG-4 Video Compression, Iain E. G. Richardson, John Wiley & Sons, 2003. Digital Video Processing, A. Murat Tekalp, Prentice-Hall, 1995. Copyright 2007 by Lina J. Karam 59
Other Sources of Information Motion Estimation Fast Search Block-based Methods T. Koga et al., Motion-compensated interframe coding for video conferencing, National Telecommunications Conference,, G5.3.1-5, New Orleans, LA, Nov. 1981. R. Srinivasan and K. Rao, Predictive Coding Based on Efficient Motion Estimation, IEEE Transactions on Communications, volume 33, issue 8, pages 888 896, Aug. 1985. M. Ghanbari, The cross-search algorithm for motion estimation and image coding, IEEE Transactions on Communications, volume 38, issue 7, pages 950 953, July 1990. S. Zhu and K. Ma, A new diamond search algorithm for fast block matching motion estimation, International Conference on Information, Communications and Signal Processing, volume 1, pages 292 296, Sept. 1997. C. Zhu, X. Lin, and L.-P. Chau; Hexagon-based search pattern for fast block motion estimation, IEEE Transactions on Circuits and Systems for Video Technology, volume 12, issue 5, pages 349 355, May 2002. X. Jing, L.-P. Chau, An efficient three-step search algorithm for block motion estimation, IEEE Transactions on Multimedia, volume 6, issue 3, pages 435 438, June 2004. Copyright 2007 by Lina J. Karam 60