Bi-directional optical flow for future video codec

2016 Data Compression Conference

Alshin Alexander* and Alshina Elena*
* Digital Media R&D Center, 416, Maetan 3-dong, Yeongtong-Gu, Suwon, 443-742, Korea Rep. of
{alexander_b.alshin, elena_a.alshina}@samsung.com

Abstract: This paper presents a theoretical explanation of the bi-directional optical flow technique in the generic case. Both non-equal distances to the reference frames and two reference frames on the same side of the predicted frame are allowed. A dynamic range analysis of the bi-directional optical flow calculations is provided. Limits for the refinement motion vector are recommended.

1. Introduction

High Efficiency Video Coding (HEVC) [1] is currently the best-performing video compression standard. It targets roughly twice the compression efficiency of its widely used predecessor, H.264/AVC. Considering industry needs for further coding efficiency improvements, ITU-T Study Group 16 and MPEG established the Joint Video Exploration Team (JVET) to study advanced video compression technologies. The Joint Exploration Model (JEM) is the common platform for testing proposed algorithms. This paper describes some theoretical and implementation aspects of the so-called bi-directional optical flow, which is part of the first version of the model, JEM1 [2].

2. Bi-directional optical flow for arbitrary location of reference frames

The technique called bi-directional optical flow (BIO) was initially proposed in [3]. It was then studied in HEVC Core Experiment 1, devoted to decoder-side motion derivation [4]. A detailed analysis of BIO technology was given in [5]. At that time 8-bit video was the main focus, so [3, 5] do not describe the peculiarities of BIO implementation for higher bit-depths. To reduce BIO complexity (a weak point of all decoder-side motion derivation techniques), several modifications were introduced after [3, 5]. In particular, a shorter interpolation filter for gradients was introduced and the equation for motion vector refinement was simplified in [6].
Since interpolation is much more robust than extrapolation, in [3, 5, 6] BIO was applied only when the two predictions come from different time directions. Qualcomm experts proposed removing this restriction [7]. In this paper the BIO equations are given in generic form, allowing both non-equal distances to the reference frames and two predictions coming from the same time direction.

Let us recall the basic concept of BIO. During bi-directional motion compensation two references are available. The goal of BIO is to refine the motion of each sample using only these two references. Suppose block motion is already compensated and both prediction signals and their spatial derivatives are available. Classical bi-prediction averages the two prediction samples that, after motion compensation, have the same coordinates in the reference blocks as the currently predicted sample. BIO uses samples A and B (Fig. 1) instead. The fine displacement vector is symmetrical and proportional to the distance to the reference frame. Fig. 1 shows two references, Ref0 and Ref1, located in opposite time directions relative to the currently predicted B-slice. In this case the distances to both references are positive. If the references are located on the same side of the current frame, all derivations of the BIO equations remain valid, but one of the distances has a negative sign.

1068-0314/16 $31.00 2016 IEEE, DOI 10.1109/DCC.2016.125

Figure 1: Bi-directional optical flow.

Suppose that the optical flow model is valid along the motion trajectory, so that

$$\frac{dI}{dt} = \frac{\partial I}{\partial t} + v_x \frac{\partial I}{\partial x} + v_y \frac{\partial I}{\partial y} = 0, \qquad (1)$$

where $I$ is the sample value and $(v_x, v_y)$ is the motion along the trajectory. Let us model the motion trajectory of the sample using a 3rd-order polynomial

$$P(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 \qquad (2)$$

and derive the coefficients of this polynomial using Hermite interpolation. The Hermite curve is shown in Fig. 2 alongside linear interpolation. The key difference is that Hermite interpolation matches not only the values of the interpolated function but also its derivatives at the ends of the interval. Under the optical flow equation (1), the time derivatives at the ends of the interpolation interval can be expressed through the spatial gradients on the reference frames, which are known. Hermite interpolation then yields the coefficients of (2).
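The Hermite step can be illustrated numerically. The following Python sketch is not taken from the paper: it assumes that the current frame sits at t = 0, Ref0 at t = -tau0 and Ref1 at t = +tau1, and the names `tau0`, `tau1`, `hermite_at_current_frame` and `linear_at_current_frame` are illustrative only. It evaluates the cubic that matches value and time derivative at both references, and compares it with distance-weighted linear interpolation.

```python
from fractions import Fraction


def hermite_at_current_frame(p0, p1, m0, m1, tau0, tau1):
    """Value at t = 0 of the cubic matching value and slope at both ends.

    Assumed layout (not from the paper): Ref0 at t = -tau0, Ref1 at t = +tau1.
    p0, p1 are the end values and m0, m1 the time derivatives at those ends.
    """
    h = Fraction(tau0) + Fraction(tau1)   # length of the interpolation interval
    s = Fraction(tau0) / h                # normalised position of t = 0
    # Standard cubic Hermite basis polynomials evaluated at s.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1


def linear_at_current_frame(p0, p1, tau0, tau1):
    """Distance-weighted linear interpolation, for comparison."""
    h = Fraction(tau0) + Fraction(tau1)
    return (Fraction(tau1) * p0 + Fraction(tau0) * p1) / h


# Equal distances: plain average 110 plus the slope correction (8 - (-4)) / 4.
print(hermite_at_current_frame(100, 120, 8, -4, 1, 1))   # prints 113
print(linear_at_current_frame(100, 120, 1, 1))           # prints 110
```

Under these assumptions, equal distances tau give the average of the two end values plus tau * (m0 - m1) / 4, so the cubic differs from plain averaging only through the end slopes. A negative `tau0` models both references lying on the same side of the current frame; the same formula then extrapolates instead of interpolating.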

Finally, the value of the Hermite interpolation polynomial at the currently predicted frame is given by (3).

Figure 2: Hermite interpolation compared to linear interpolation.

In the most frequently used bi-prediction case the distances to the two reference frames are equal, and (3) reduces to exactly the expression published in [6]. Applying the optical flow equation (1) to (3) gives an expression which, in the simplest case, becomes (4), as published in [6].

3. Dynamic range analysis in bi-directional optical flow calculations

BIO operates as a sample-wise motion refinement on top of block-based motion compensation. Let us compare the dynamic range in BIO with that of the HEVC motion compensation process. For luma motion compensation the interpolation filters listed in Table 1 are used. All filters are normalized by 64. Table 1 also shows SumPos and SumNeg, the sums of the positive and of the negative filter coefficients, respectively. The ½ interpolation filter has the largest difference between SumPos and SumNeg, which is the worst case for dynamic range accumulation.

Table 1. Luma motion compensation interpolation filters in HEVC.

Fractional part of block MV   Filter coefficients                    SumPos   SumNeg
0                             { 0, 0, 0, 64, 0, 0, 0, 0 }            64       0
¼                             { -1, 4, -10, 58, 17, -5, 1, 0 }       80       -16
½                             { -1, 4, -11, 40, 40, -11, 4, -1 }     88       -24
¾                             { 0, 1, -5, 17, 58, -10, 4, -1 }       80       -16

Motion compensation interpolation is a two-stage process: horizontal interpolation followed by vertical interpolation. Each stage is a convolution, (5). The input consists of the sample values in the reference frame, the intermediate result is stored in a temporary buffer, and the output is the high-precision prediction. The filter size of 8 corresponds to luma motion compensation interpolation.

Figure 3. Separable 2-dimensional interpolation.

The series of two convolutions (5) is illustrated in Fig. 3. Reg1 and Reg2 stand for the values accumulated in the register at the first and second stage of luma motion compensation, and Temp1 and Temp2 are the corresponding temporary buffers.

Using SumPos and SumNeg, which are known for a filter with fixed coefficients, one can estimate the dynamic range of a convolution result R from the range of its input, (6): the maximum of R is reached when every positive coefficient meets the largest input value and every negative coefficient meets the smallest one, and the minimum is reached in the opposite situation. If d is the internal bit-depth, the input samples lie between 0 and 2^d - 1. The de-scaling shifts used in HEVC are given by (7).

Using (6) we obtain the range of the register after the first stage (Reg1), which after de-scaling gives the range of the temporary buffer (Temp1). The temporary buffer values are the input to the second interpolation stage. Repeating the estimate for the second convolution gives the range of the register (Reg2), which after de-scaling gives the range of Temp2. Depending on the internal bit-depth d, the bit-width (BW) at each stage of the HEVC motion compensation interpolation process is summarized below:

BW(Reg1) = d + 8;
BW(Temp1) = 16;
BW(Reg2) = 23;
BW(Temp2) = 17 + Max(0; d - 12).

It is easy to observe that the system of shifts (7) for (5) keeps the register within 32 bits and the temporary buffer Temp1 within 16 bits. There is a minor overflow of 16 bits for Temp2 if d ≤ 12, which becomes more significant at higher d.

In addition to the motion-compensated prediction signals in (4), bi-directional optical flow requires their spatial derivatives. The derivative calculation depends on the fractional position of the block motion vector. The procedure for gradient calculation was designed to be conceptually similar to (5) in order to reuse as much of the HEVC implementation as possible. So the gradient calculation is a system of two convolutions, (8)

and (9).

To maintain reasonable complexity, the interpolation filters for gradient calculation in BIO were chosen shorter (6 taps) than those for motion compensation interpolation (8 taps). The input for the gradient calculation is the reference samples, as for motion compensation interpolation. The first step is horizontal and the second is vertical, to be consistent with the HEVC motion compensation design. For the horizontal derivative, the gradient interpolation filter is applied first in the horizontal direction, and the derivative is then interpolated in the vertical direction according to the fractional part of the vertical component of the block motion vector. For the vertical derivative, the signal is first interpolated in the horizontal direction according to the fractional part of the horizontal component of the block motion vector, and the gradient interpolation filter is then applied in the vertical direction. Tables 2 and 3 show the coefficients of the BIO interpolation filters for the signal and for the gradients, respectively. The signal filter was designed to provide the value of a function at a fractional position, and the gradient filter to provide the value of its derivative at that fractional position.

Table 2. BIO interpolation filters for signal.

Fractional part of block MV   Filter coefficients            SumPos   SumNeg
0                             { 0, 0, 64, 0, 0, 0 }          64       0
¼                             { 2, -9, 57, 19, -7, 2 }       80       -16
½                             { 1, -7, 38, 38, -7, 1 }       78       -14
¾                             { 2, -7, 19, 57, -9, 2 }       80       -16

Table 3. BIO interpolation filters for gradients.

Fractional part of block MV   Filter coefficients            SumPos   SumNeg
0                             { 8, -39, -3, 46, -17, 5 }     59       -59
¼                             { 4, -17, -36, 60, -15, 4 }    68       -68
½                             { -1, 4, -57, 57, -4, 1 }      62       -62
¾                             { -4, 15, -60, 36, 17, -4 }    68       -68

The system of shifts in (8)-(9) was designed to keep the register within 32 bits and the temporary buffers within 16 bits; it is given by (10). Selection (10) determines the bit-widths of the gradient calculation.
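As a rough cross-check of the SumPos/SumNeg reasoning in this section, the Python sketch below recomputes coefficient sums from filter taps quoted in Tables 1-3 and propagates a worst-case range through the two luma interpolation stages. It is an illustration rather than the authors' code: the helper names are made up, and the de-scaling shifts `shift1 = min(4, d - 8)` and `shift2 = 6` are an assumption taken from the HEVC specification.

```python
def sum_pos_neg(taps):
    """Return (SumPos, SumNeg) of a filter given as a list of integer taps."""
    return (sum(c for c in taps if c > 0), sum(c for c in taps if c < 0))


def conv_range(lo, hi, taps):
    """Worst-case output range of a convolution whose inputs lie in [lo, hi]."""
    pos, neg = sum_pos_neg(taps)
    return (pos * lo + neg * hi, pos * hi + neg * lo)


def signed_bits(lo, hi):
    """Two's-complement bit-width needed to hold every integer in [lo, hi]."""
    neg_bits = (-lo - 1).bit_length() if lo < 0 else 0
    return max(hi.bit_length(), neg_bits) + 1


HEVC_HALF = [-1, 4, -11, 40, 40, -11, 4, -1]   # Table 1, worst-case filter
BIO_SIGNAL_QUARTER = [2, -9, 57, 19, -7, 2]    # Table 2, 1/4 position
BIO_GRADIENT_HALF = [-1, 4, -57, 57, -4, 1]    # Table 3, 1/2 position


def hevc_luma_bit_widths(d):
    """Bit-widths of (Reg1, Temp1, Reg2, Temp2) for internal bit-depth d."""
    shift1, shift2 = min(4, d - 8), 6           # assumed HEVC de-scaling shifts
    reg1 = conv_range(0, (1 << d) - 1, HEVC_HALF)
    temp1 = (reg1[0] >> shift1, reg1[1] >> shift1)
    reg2 = conv_range(temp1[0], temp1[1], HEVC_HALF)
    temp2 = (reg2[0] >> shift2, reg2[1] >> shift2)
    return tuple(signed_bits(*r) for r in (reg1, temp1, reg2, temp2))


for d in (8, 10, 12):
    print(d, hevc_luma_bit_widths(d))
print(sum_pos_neg(HEVC_HALF),
      sum_pos_neg(BIO_SIGNAL_QUARTER),
      sum_pos_neg(BIO_GRADIENT_HALF))
```

For d = 8, 10 and 12 this sketch gives d + 8, 16, 23 and 17 bits for Reg1, Temp1, Reg2 and Temp2. The sums computed from the quoted taps are (88, -24) for the ½ luma filter, (80, -16) for the ¼ BIO signal filter and (62, -62) for the ½ BIO gradient filter.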

There is no overflow of 16 bits for either temporary buffer (Temp1 and Temp2), nor any 32-bit overflow in the register.

4. Motion refinement vector calculation

BIO is applied as a modification of classical bi-prediction, (4). The motion refinement vector minimizes the difference between the sample and derivative values in a window surrounding points A and B (Fig. 1), as expressed by (11).

As shown above, the bit-depth of the gradients that are input to the BIO process is 16. So in the worst case the calculation of the sums involved requires a register wider than 32 bits. To prevent division by zero in the expression for the refinement vector, a small regularization term is introduced in the denominator. Since the bit-depth of the denominator grows with the internal bit-depth, the regularization term also increases with internal bit-depth. Bi-directional optical flow was constructed as a fine motion refinement under the assumption that the components of the refinement vector are small. Therefore the components are clipped to a limited range. In addition, the denominator is required to be larger than a threshold.

5. Conclusions

The bi-directional optical flow (BIO) model is presented in generic form in this paper. All formulas are derived for an arbitrary location of the reference frames relative to the currently predicted frame. Both non-equal distances to the reference frames and two predictions coming from the same time direction are allowed. A dynamic range analysis of the gradient calculation in BIO is provided, together with recommendations for restricting the range of the refinement vector.
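The regularized, clipped refinement described in Section 4 can be sketched in one dimension. The Python below is a simplified illustration, not the paper's formula: it fits a single displacement `v` by least squares over a window, and the regularization `reg`, the denominator threshold `min_denom` and the clipping limit `limit` are placeholder parameters whose values are not taken from the paper.

```python
def refine_1d(delta, grad, reg=1.0, min_denom=2.0, limit=0.5):
    """Least-squares displacement v minimising sum((delta_i + v * grad_i)**2).

    delta: sample differences between the two predictions inside the window.
    grad:  matching gradient terms.
    reg, min_denom and limit are hypothetical values chosen for illustration.
    """
    s_gg = sum(g * g for g in grad)                  # gradient energy
    s_dg = sum(d * g for d, g in zip(delta, grad))   # difference-gradient correlation
    denom = s_gg + reg                               # regularisation avoids division by zero
    if denom <= min_denom:                           # too little gradient energy: no refinement
        return 0.0
    v = -s_dg / denom
    return max(-limit, min(limit, v))                # keep the refinement small


# Flat window: no usable gradient, so the refinement is switched off.
print(refine_1d([0, 0, 0], [0, 0, 0]))
# Consistent window generated with v = 0.25, solved without regularisation.
print(refine_1d([-1, 0.5, -1.5], [4, -2, 6], reg=0.0, min_denom=0.0))
# A large mismatch is clipped to the limit.
print(refine_1d([-10, 5, -15], [4, -2, 6]))
```

In this sketch the regularization term slightly shrinks the estimate toward zero, the threshold on the denominator disables refinement in flat areas, and the clipping bounds the result, which mirrors the three safeguards described in Section 4.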

References

[1] High Efficiency Video Coding, Rec. ITU-T H.265 and ISO/IEC 23008-2, Jan. 2013.
[2] G. J. Sullivan, J. Boyce, J. Chen, E. Alshina, Future video coding: Joint Exploration Model 1 (JEM1) for future video coding investigation, COM16-TD213, Oct. 2015, Geneva.
[3] A. Alshin, E. Alshina, Bi-directional optical flow, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-C204, Guangzhou, China, 10-15 October, 2010.
[4] Shunichi Sekiguchi, Yusuke Itani (Mitsubishi), Report of CE1: Decoder-Side Motion Vector Derivation, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-D098, Daegu, Jan. 2011.
[5] A. Alshin, E. Alshina, T. Lee, Bi-directional optical flow for improving motion compensation, Picture Coding Symposium (PCS), 2010.
[6] A. Alshin, E. Alshina, M. Budagavi, K. Choi, J. Min, M. Mishourovsky, Y. Piao, A. Saxena, Coding efficiency improvements beyond HEVC with known tools, SPIE Optics + Photonics 2015, San Diego.
[7] X. Li, J. Chen, W.-J. Chein, M. Karczewicz, Harmonization and improvement for BIO, COM16-C1045, Oct. 2015, Geneva.