Localized Multiple Adaptive Interpolation Filters with Single-Pass Encoding

Xun Guo¹, Kai Zhang¹,³, Yu-Wen Huang², Jicheng An¹, Chih-Ming Fu² and Shawmin Lei²
¹ MediaTek Inc., Beijing, China
² MediaTek Inc., Hsinchu, Taiwan
³ Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

ABSTRACT

Adaptive interpolation filtering (AIF) algorithms have recently been proposed to enhance the hybrid video coding scheme. Although these algorithms can improve coding efficiency significantly, their encoders suffer a large increase in complexity, in terms of both latency and memory access, because of their inherent multi-pass encoding procedure. In this paper, we present a novel single-pass solution for these algorithms that allows optimal selection among different interpolation filters. In this solution, time-delayed interpolation filters are used to achieve single-pass encoding, and localized rate-distortion (RD) selection is used to compensate for the possible coding loss caused by the time-delayed filters. Experimental results show that the proposed method is effective for all AIF techniques in the current ITU-T SG16 reference software. With the proposed method, single-pass encoding with multiple AIF filters can be achieved while maintaining coding efficiency similar to that of multi-pass AIF.

Keywords: Adaptive interpolation filter, single-pass encoding, H.264/AVC, motion compensation

1. INTRODUCTION

To further improve the coding performance of H.264/AVC [1], the ITU-T Study Group 16 (SG16) Video Coding Experts Group (VCEG) has been exploring promising new techniques for next-generation video coding (NGVC/H.265) since 2006. Many techniques with good performance have been adopted into the KTA (Key Technical Areas) software, which serves as VCEG's test model. Among them, adaptive interpolation filtering (AIF) is one of the most important because of its outstanding coding efficiency.
Compared with the fixed H.264 interpolation filter, AIF can achieve a bitrate reduction of up to 26%. The basic concept of AIF is to apply a Wiener filter that minimizes the prediction error between the predicted pixels and the original pixels. Several AIF-related algorithms are included in the current KTA software. The 2D non-separable AIF (NAIF) [2] was first proposed by Vatis: for each frame, a different set of adaptive interpolation filter coefficients is derived for each sub-pixel position and transmitted to the decoder. Because of its outstanding coding performance, NAIF was adopted into KTA and attracted considerable research interest. The 2D separable AIF (SAIF) [3] and directional AIF (DAIF) [4] were adopted next; they reduce the complexity of NAIF without much sacrifice of coding gain. Qualcomm then proposed the enhanced AIF (EAIF) [5] and enhanced DAIF (EDAIF), which add integer-pixel filtering and a DC offset at the sub-pixel level to further improve coding performance. To reduce the complexity of AIF significantly, a switched interpolation filter method was proposed in [6]; it switches among several fixed filters and sends a DC offset at the sub-pixel level. However, this method usually has significantly lower coding performance than the aforementioned AIF methods.

All existing AIF methods take the variation of the image signal into account and adaptively estimate optimal filter coefficients for each frame. For this purpose, they need at least two encoding passes. The first pass is a normal encoding pass that uses the fixed H.264 filter for sub-pixel interpolation in motion estimation. Using the motion information from the first pass, a set of linear equations is then constructed according to the Wiener filtering formulation, and solving these equations yields the AIF coefficients.
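Solving these equations is an ordinary linear least-squares problem: setting the partial derivatives of the squared prediction error to zero gives a small system of normal equations per sub-pixel position (Eqs. (3) and (4) in Section 2.2). The Python sketch below is a hedged illustration, not KTA code. It estimates a one-dimensional 6-tap filter plus a DC offset for a single sub-pixel position, assuming motion compensation has already been applied so that each window of integer pixels is aligned with its original sample. The helper names `solve_linear` and `estimate_wiener_filter` are hypothetical. To make the result checkable, the "original" samples are synthesized with the H.264 half-pel filter (1, -5, 20, 20, -5, 1)/32 plus an offset of 2, so the solver should recover exactly those values.

```python
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(coeffs) + [b[k]] for k, coeffs in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_wiener_filter(windows, targets):
    """Return taps h and DC offset c minimising sum_k (S_k - h . P_k - c)^2.

    windows[k] holds the integer pixels P_k in the filter window of sample k
    (assumed already aligned by motion, as in Eq. (4)); targets[k] is S_k.
    """
    n = len(windows[0]) + 1  # filter taps plus one unknown for the DC offset
    ata = [[0.0] * n for _ in range(n)]  # autocorrelation matrix
    atb = [0.0] * n                      # cross-correlation vector
    for window, s in zip(windows, targets):
        design_row = list(window) + [1.0]  # the constant 1 multiplies the offset c
        for k in range(n):
            atb[k] += design_row[k] * s
            for l in range(n):
                ata[k][l] += design_row[k] * design_row[l]
    solution = solve_linear(ata, atb)  # normal equations: zero partial derivatives
    return solution[:-1], solution[-1]


# Synthetic check: "original" half-pel samples are produced by the H.264
# 6-tap filter plus a DC offset of 2, so the estimate should recover both.
H264_HALF_PEL = [1 / 32, -5 / 32, 20 / 32, 20 / 32, -5 / 32, 1 / 32]
rng = random.Random(0)
pixels = [rng.randint(0, 255) for _ in range(200)]
windows = [pixels[x:x + 6] for x in range(len(pixels) - 5)]
targets = [sum(h * p for h, p in zip(H264_HALF_PEL, w)) + 2.0 for w in windows]
taps, dc = estimate_wiener_filter(windows, targets)
print([round(t * 32, 3) for t in taps], round(dc, 3))
```

With real video data the fit is not exact, and an encoder would also exploit the symmetry constraints described in Section 2.2 to reduce the number of unknowns; this sketch solves for every tap independently.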
In the second pass, the encoder re-encodes the current frame using the newly derived AIF coefficients for sub-pixel interpolation. Further passes can be performed if the extra complexity is affordable and better coding efficiency is desired. This multi-pass procedure achieves good performance but suffers from two obvious drawbacks: high encoding latency and memory access overhead. First, the inherently serial nature of the multi-pass procedure prevents the use of parallel
computations in hardware implementations, so the encoding latency of a picture is multiplied roughly by the number of passes. Second, the multi-pass procedure has to access the reference picture buffer two or more times, which at least doubles the amount of off-chip memory access.

Our proposed single-pass AIF solution [7][8] addresses these problems. For each frame to be encoded, we first build a competitive filter set (CFS) that includes the fixed high-precision H.264 filter and Wiener filters trained on several previously coded frames. A rate-distortion optimization (RDO) process then chooses the filter with the minimum cost for each local area, which provides better local adaptation and compensates for the possible inefficiency of using Wiener filters trained on previous frames. The proposed method has two main advantages: low off-chip memory access and parallelizable computation for the different interpolation filters. Furthermore, the proposed single-pass AIF solution incurs little performance loss when applied on top of any of the existing AIF techniques.

The rest of this paper is organized as follows. Section 2 introduces current AIF techniques. Section 3 discusses the proposed single-pass AIF method in detail. Experimental results and the corresponding analysis are presented in Section 4. Finally, Section 5 draws conclusions.

2. ADAPTIVE INTERPOLATION FILTERING TECHNIQUES

2.1 Interpolation process in H.264/AVC

The current H.264/AVC standard is based on quarter-pel motion accuracy. That is, in order to estimate and compensate sub-pixel motion vectors (MV), the reference frames need to be up-sampled by a factor of four in each dimension, so the pixel values at sub-pixel positions must be interpolated. As illustrated in Figure 1, blocks with upper-case letters are integer pixels and blocks with lower-case letters are sub-pixel positions to be interpolated. When the interpolation begins, half-pel positions (i.e.
b, h, j and aa~jj) are calculated using a fixed 6-tap Wiener filter (1, -5, 20, 20, -5, 1)/32. The remaining quarter-pel positions are then obtained by applying a bilinear filter to the interpolated half-pel pixels and the integer pixels.

Figure 1. Integer pixels (blocks with upper-case letters) and sub-pixel positions (blocks with lower-case letters) for quarter-pel interpolation

2.2 Wiener filter based AIF techniques

Because different frames have different characteristics, AIF techniques use the Wiener filtering formulation to calculate an individual set of coefficients for each sub-pel position, frame by frame. Different sub-pixel positions use different filter taps and symmetries. For example, if the sub-pel value to be interpolated is located at position a, b, c, d, h or l, a one-dimensional 6-tap Wiener filter is derived using the integer pixels C1~C6 for a, b and c, and A3~F3 for d, h and l. As with the H.264 filter, the filter coefficients for positions b and h are symmetric (e.g. C1 and C6 share the same filter
coefficient). For the other positions, e, f, g, i, j, k, m, n and o, one-dimensional or two-dimensional Wiener filters are used, depending on the AIF technique. NAIF derives a two-dimensional filter that uses all 36 integer pixels, while DAIF uses one-dimensional filters for the sub-pixel positions on the diagonal lines, e, g, m and o. EAIF changes the traditional usage of integer pixels for these positions, using a smaller number of integer pixels to reduce complexity and coefficient overhead.

In the Wiener filtering formulation, the filter coefficients are obtained by solving a set of linear equations that minimize the prediction error between the original pixels and the Wiener-filtered, motion-predicted pixels. Specifically, let $S_{x,y}$ be an original pixel at coordinate (x, y). Its predicted value $p^{sp}$ at sub-pixel position sp is computed as

$$ p^{sp} = \sum_{i=1}^{N} \sum_{j=1}^{M} P_{i,j}\, h^{sp}_{i-1,j-1} + c, \tag{1} $$

where $\{P_{i,j}\}$ are the integer pixels within the N×M filter window and c is the optional DC offset. The prediction error energy is therefore

$$ (E^{sp})^2 = \sum_{x} \sum_{y} \Big( S_{x,y} - \sum_{i=1}^{N} \sum_{j=1}^{M} P_{i,j}\, h^{sp}_{i-1,j-1} - c \Big)^2. \tag{2} $$

A set of linear equations for each sub-pel position is then built by setting the partial derivatives of $(E^{sp})^2$ to zero:

$$ \frac{\partial (E^{sp})^2}{\partial h^{sp}_{k,l}} = 0, \qquad k, l \in \{0, \dots, 5\}. \tag{3} $$

In practice, the motion between the original pixels and the predicted pixels must be taken into account. Thus, the partial derivative equations can be written as

$$ \frac{\partial}{\partial h^{sp}_{k,l}} \sum_{x} \sum_{y} \Big( S_{x,y} - \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} h^{sp}_{i,j}\, P_{\tilde{x}+i,\, \tilde{y}+j} - c \Big)^2 = 0, \tag{4} $$

where $\tilde{x} = x + mv_x - FO$ and $\tilde{y} = y + mv_y - FO$ ($mv_x$ and $mv_y$ are the motion vector components, and FO is the filter offset). Solving these equations yields the optimal interpolation filters for the sub-pixel positions. In EAIF and EDAIF, an integer-pixel filter and a DC offset for each sub-pixel position are also computed to improve prediction accuracy.

3. PROPOSED SINGLE-PASS AIF

3.1 Basic ideas

Analyzing the derivation of the AIF coefficients shows that the final filter coefficients depend on the image content and on the temporal motion between frames. That is, if consecutive frames in a sequence have similar content and motion, their AIF coefficients will be very similar. We base a novel single-pass AIF algorithm on two observations. The first is that there is a high correlation among the adaptive Wiener filters of several consecutive frames within a video sequence; coefficients derived at one time instant can be reused well by the following frames, provided there is no large change in image content such as a scene change. The second is that the optimal filter may vary across different areas of a picture. Therefore, we propose to use multiple filters in one frame with local adaptation. Compared with the current AIF techniques in the KTA reference software, the proposed single-pass AIF introduces two new ideas:

1. For each frame, only one encoding pass is performed. The Wiener filter coefficients that are optimal for the current frame are computed but not used for the current frame; when encoding of the next frame starts, they are used as one candidate together with several other filters.
2. All current AIF techniques use global frame-level or sub-pixel-level RD selection. The proposed algorithm also considers the local characteristics of an image and uses area-based selection among several filters. The area can be any partition of regular or irregular shape; to remain compatible with the current standard, we use macroblock (MB) level selection in this paper.

3.2 RD selection of multiple filters

At the encoder, when a frame is input, an encoding process similar to normal H.264 encoding is performed based on the competitive filter set (CFS).
The main difference lies in the mode decision process, in which the proposed algorithm checks each candidate filter in the CFS one by one. Assuming filter i is currently being checked, motion estimation and mode
decision are performed with reference frames interpolated using filter i. The best MB mode for filter i, denoted $\mathrm{mode}_i$, is

$$ \mathrm{mode}_i = \arg\min_{m} J_i(m) = \arg\min_{m} \left\{ D_i(m) + \lambda_{\mathrm{mode}} R_i(m) \right\}, \tag{5} $$

where $J_i$ is the RD cost function, $D_i$ and $R_i$ are the distortion and rate functions respectively, and $\lambda_{\mathrm{mode}}$ is a Lagrange multiplier. After all the candidates have been checked, the final filter type f is chosen as

$$ f = \arg\min_{i} J_i(\mathrm{mode}_i). \tag{6} $$

If the number of candidate filters is larger than 1, a filter index must be transmitted for each MB to indicate the selected filter. To reduce complexity, the integer-pixel motion search can be performed for the first candidate filter only. After the whole frame has been encoded, its optimal filter coefficients can be obtained. This group of filter coefficients is then stored in a filter buffer so that subsequent frames can use it.

Figure 2 shows an example for the IPPP structure. For frame $P_t$, a candidate filter set $\mathrm{CFS}_t$ is built. In this example, the CFS consists of the adaptive Wiener filters from the previous two frames, $F_{t-2}$ and $F_{t-1}$, and the fixed high-precision H.264 filter. For each frame, the optimal adaptive filter is written into the slice header of the next frame. At the decoder, the bitstream containing the filter indices and filter coefficients is parsed and used in motion compensation. The filter coefficients transmitted in the slice header are decoded and added to a filter buffer, which contains all the previous adaptive filters that may be used in the subsequent decoding process.

Figure 2. Example of the single-pass encoding process using multiple adaptive interpolation filters for the IPPP structure.

3.3 The competitive filter set (CFS)

The candidate filters included in the CFS can be user-defined. For convenience of description, we define CFS_N as the number of candidate filters in the CFS.
When CFS_N = 1, the CFS contains only the filter coefficients transmitted in the current frame, which are the optimal filter of the previous frame in coding order. When CFS_N increases to 2, the high-precision H.264 filter is added to the CFS. When CFS_N > 2, the CFS also includes filters transmitted in reference frames, which are the optimal filters of earlier frames. Figure 3 shows the percentage of each filter type for three sequences when CFS_N = 4, where F0~F2 denote the optimal filters of the three previous frames, from the nearest to the farthest. The figure shows that most MBs choose filter coefficients from previous frames. However, too many filters may cause too much side-information overhead and thereby reduce the overall coding performance.

Figure 4 shows an example of MB-level filter selection: the resulting MB filter choices for a frame of the CIF mobile sequence. Each block in the figure represents an MB, and the three filters are shown in three colors: blue is the filter from the previous frame, red is the high-precision H.264 filter, and grey is the filter from the frame before the previous frame. The figure shows that different areas of one frame may choose different filters, and that the filter from the previous frame is chosen most often.
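The CFS construction rules above and the per-MB selection of Eqs. (5) and (6) can be illustrated with a minimal Python sketch. It is a hypothetical illustration rather than the KTA implementation: `FilterBuffer`, `build_cfs`, `select_filter` and the `H264_FILTER` label are invented names, filters are represented by string labels, and real motion estimation and mode decision are replaced by caller-supplied (distortion, rate) pairs per mode, whose rate is assumed to already include the filter-index bits. Falling back to the H.264 filter when the buffer is empty is also an assumption; the text does not specify that case.

```python
from collections import deque

# Hypothetical label standing in for the fixed high-precision H.264 filter.
H264_FILTER = "H.264 fixed filter"


class FilterBuffer:
    """Optimal adaptive filters of previously coded frames, most recent first."""

    def __init__(self, capacity):
        self._filters = deque(maxlen=capacity)  # oldest filter is dropped when full

    def push(self, frame_filter):
        # Called once a frame is fully encoded and its optimal filter is known.
        self._filters.appendleft(frame_filter)

    def recent(self, count):
        return list(self._filters)[:count]


def build_cfs(buffer, cfs_n):
    """Assemble the CFS following the CFS_N rules of Section 3.3.

    CFS_N = 1: previous frame's filter only; CFS_N = 2: plus the H.264 filter;
    CFS_N > 2: plus optimal filters of earlier frames. Assumption: with fewer
    stored filters the set is simply smaller, and an empty buffer falls back
    to the H.264 filter.
    """
    previous = buffer.recent(max(cfs_n - 1, 1))
    cfs = previous[:1]
    if cfs_n >= 2 or not cfs:
        cfs.append(H264_FILTER)
    cfs.extend(previous[1:])
    return cfs[:cfs_n]


def select_filter(mode_costs, lam):
    """Eqs. (5)-(6) for one MB: best mode per candidate filter, then best filter.

    mode_costs[i] maps each MB mode m to (D_i(m), R_i(m)) measured with the
    reference frames interpolated by candidate filter i.
    Returns (f, mode_f, J_f(mode_f)).
    """
    best = None
    for i, costs in enumerate(mode_costs):
        # Eq. (5): mode_i = argmin_m D_i(m) + lambda_mode * R_i(m)
        mode_i = min(costs, key=lambda m: costs[m][0] + lam * costs[m][1])
        j_i = costs[mode_i][0] + lam * costs[mode_i][1]
        # Eq. (6): keep the candidate whose best mode has the lowest RD cost
        if best is None or j_i < best[2]:
            best = (i, mode_i, j_i)
    return best


# Usage with toy numbers: three frames already coded, CFS_N = 3.
buffer = FilterBuffer(capacity=3)
for name in ["F(t-3)", "F(t-2)", "F(t-1)"]:  # pushed in coding order
    buffer.push(name)
cfs = build_cfs(buffer, cfs_n=3)
print(cfs)  # ['F(t-1)', 'H.264 fixed filter', 'F(t-2)']

mode_costs = [  # one dict of (distortion, rate) per candidate in cfs
    {"16x16": (1200.0, 40), "8x8": (1000.0, 80)},
    {"16x16": (1300.0, 38), "8x8": (1150.0, 85)},
    {"16x16": (1250.0, 42), "8x8": (950.0, 85)},
]
f, mode, cost = select_filter(mode_costs, lam=4.0)
print(cfs[f], mode, cost)  # F(t-2) 8x8 1290.0
```

For CFS_N = 3 the sketch reproduces the candidate set of the IPPP example in Section 3.2 (the filters of the two previous frames plus the H.264 filter). Keeping the buffer ordered by recency turns the CFS_N rules into a simple slice.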

Figure 3. Distribution percentage of different filter types (F0, F1, F2 and H.264) for the city, night and mobile sequences.

Figure 4. An example of MB-level filter selection for the third frame of the CIF mobile sequence.

3.4 Hardware implementation advantages

The proposed single-pass algorithm is much friendlier to hardware implementation. First, because adaptation is performed at the MB level instead of the slice or picture level, the algorithm can simply process the different filters in parallel on chip. That is, the reference pixels for interpolation with the different filters need to be loaded only once, and motion estimation/compensation can be performed simultaneously for all of these filters, which significantly reduces the coding latency. Second, the proposed algorithm avoids excessive off-chip memory access overhead. In hardware implementations, reference pictures are usually stored in off-chip memory because of the limited on-chip memory size of today's technology. The search-area pixels in the reference pictures for one MB are loaded from off-chip memory into on-chip memory to perform motion estimation/compensation with on-the-fly fractional-pixel interpolation. The multi-pass AIF scheme has to access the reference pictures multiple times, whereas the single-pass AIF loads the on-chip memory once and then accesses it for all filters simultaneously. In general, off-chip access consumes about 10 times the power of on-chip access, and off-chip memory access time is also about 10 times that of on-chip memory.

4. EXPERIMENTAL RESULTS

To verify the coding efficiency of the proposed algorithm, we present experimental results in this section. The proposed algorithm has been implemented in the KTA software and released as KTA2.5 [9]. The experiments are
performed according to the common test conditions in [10]. The results for NAIF, SAIF, DAIF and EAIF with the IPPP high-profile configuration are shown in Tables 1 to 4, respectively. In each table, the column CFS_N=k gives the bitrate change [11] relative to H.264 of the proposed algorithm with k candidate filters, and the columns *_Multi and *_Two give the bitrate change relative to H.264 of the corresponding multi-pass AIF with the parameter RDPictureDecision on and off, respectively. When the parameter is on, the encoder performs multi-pass encoding consisting of two-pass AIF plus two additional normal coding passes with QP±1; when it is off, the encoder performs only two-pass AIF. The CFS_N values correspond to the filter sets described in Section 3.3. With CFS_N=1, only the filter from the previous frame is used, and no MB-level adaptation is necessary. With CFS_N larger than 1, multiple filters can be chosen at the MB level to improve coding performance. The tables show that with CFS_N=3 the proposed algorithm achieves coding performance similar to multi-pass AIF in all cases, which represents a very good trade-off between complexity and coding performance. Relative to the multi-pass AIF anchors, the average bitrate changes are +0.35%, +0.69%, -0.25% and +0.46% for NAIF, SAIF, DAIF and EAIF, respectively (a positive value is a small loss, a negative value a gain), while only a single encoding pass is needed, which greatly reduces complexity.

5. CONCLUSIONS

In this paper, we presented a single-pass encoding algorithm using multiple adaptive interpolation filters. The algorithm allows switching among several filters at the MB level. After all the MBs of a frame are encoded, the optimal filter for that frame is stored and used by subsequent frames. The experimental results show that the proposed algorithm achieves single-pass encoding while maintaining coding efficiency similar to that of multi-pass AIF. The proposed single-pass AIF scheme is also much friendlier to hardware implementation.
ITU-T SG16 VCEG has adopted this algorithm as a key technology for the potential NGVC/H.265 standard.

REFERENCES

1. ITU-T, Recommendation H.264, http://www.itu.int/rec/t-rec-h.264-200503-i/en, Mar. 2005.
2. Y. Vatis and J. Ostermann, "Comparison of complexity between two-dimensional non-separable adaptive interpolation filter and standard Wiener filter," ITU-T SG16/Q.6 Doc. VCEG-AA11, Nice, France, Oct. 2005.
3. S. Wittmann and T. Wedi, "Separable Adaptive Interpolation Filter," ITU-T SG16/Q.6 Doc. T05-SG16-C-0219, Geneva, Switzerland, June 2007.
4. D. Rusanovskyy, K. Ugur and J. Lainema, "Adaptive interpolation with directional filters," ITU-T SG16 Doc. VCEG-AG21, Oct. 2007.
5. Y. Ye and M. Karczewicz, "Enhanced adaptive interpolation filter," ITU-T SG16 Doc. C464, Apr. 2008.
6. M. Karczewicz, Y. Ye and P. Chen, "Switched interpolation filter with offset," ITU-T SG16 Doc. VCEG-AI35, July 2008.
7. K. Zhang, X. Guo, Y.-W. Huang and S. Lei, "Single-pass encoding using multiple adaptive interpolation filters," ITU-T SG16/Q.6 Doc. VCEG-AK26, Yokohama, Japan, Apr. 2009.
8. X. Guo, K. Zhang, Y.-W. Huang, C.-M. Fu and S. Lei, "Single-pass 2D separable AIF and Directional AIF," ITU-T SG16/Q.6 Doc. VCEG-AL26, Geneva, Switzerland, July 2009.
9. VCEG, reference software KTA 2.5, http://iphome.hhi.de/suehring/tml/download/kta/jm11.0kta2.5.zip.
10. T. K. Tan, G. Sullivan and T. Wedi, "Recommended Simulation Common Conditions for Coding Efficiency Experiments Revision 1," ITU-T SG16/Q.6 Doc. VCEG-AJ10, San Diego, USA, 8-10 July 2008.
11. G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU-T SG16 Doc. VCEG-M33, Apr. 2001.

Table 1. Bit-rate change (%) relative to H.264 for 2D non-separable AIF (NAIF); negative values indicate a bit-rate reduction

Class  Sequence        NAIF_Multi   NAIF_Two    CFS_N=1    CFS_N=2    CFS_N=3
CIF    paris                 0.25       0.82       2.99       3.92       2.66
CIF    foreman              -3.37      -3.34       2.10      -0.01      -1.18
CIF    mobile               -3.56      -3.46      -2.89      -4.86      -5.69
CIF    tempete              -1.12      -1.04      -0.03      -1.43      -2.44
WVGA   flower4              -3.80      -3.02      -4.39      -3.53      -4.93
WVGA   keibai3              -2.68      -2.22      -1.25      -2.44      -3.49
WVGA   nuts5                 0.22       3.17       6.94       7.16       5.93
SVGA   janine               -3.13      -3.04      -3.80      -5.79      -6.27
720p   bigships             -4.84      -4.97      -3.49      -5.02      -5.66
720p   city                 -7.35      -7.07      -3.54      -6.41      -7.79
720p   crew                 -3.28      -1.69       2.69       0.97      -0.54
720p   jets                 -3.40      -3.08       0.14      -0.54      -1.47
720p   night                -5.56      -5.48      -6.14      -7.25      -7.77
720p   raven               -15.98     -15.94      -4.80      -9.71      -9.86
1080p  crowdrun             -1.38      -1.47      -1.55      -2.62      -3.23
1080p  parkjoy              -1.69      -1.70      -2.00      -3.27      -3.62
1080p  toys_calendar        -7.69      -7.68      -5.81      -7.58      -8.02
1080p  sunflower           -25.02     -24.95     -20.01     -21.80     -22.14
1080p  traffic              -4.16      -4.27      -4.11      -4.71      -5.27
       AVERAGE              -5.13      -4.76      -2.58      -3.94      -4.78

Table 2. Bit-rate change (%) relative to H.264 for 2D separable AIF (SAIF); negative values indicate a bit-rate reduction

Class  Sequence        SAIF_Multi   SAIF_Two    CFS_N=1    CFS_N=2    CFS_N=3
CIF    paris                 0.11      -2.94       4.27       3.85       2.75
CIF    foreman              -3.02      -5.29       5.31       0.57      -0.32
CIF    mobile               -5.25      -1.24      -4.38      -6.24      -7.04
CIF    tempete              -1.31      -2.81       3.21      -1.57      -2.44
WVGA   flower4              -3.59      -3.64      -2.12      -3.75      -5.11
WVGA   keibai3              -3.90       2.42      -1.77      -3.62      -4.52
WVGA   nuts5                 0.50      -3.11       9.39       7.31       5.89
SVGA   janine               -2.87      -6.41      -3.10      -5.37      -5.74
720p   bigships             -6.39      -7.35      -6.44      -6.80      -7.18
720p   city                 -7.70      -1.97      -3.71      -6.76      -7.33
720p   crew                 -3.23      -3.71       7.31       1.43       0.35
720p   jets                 -4.21      -5.28       0.79      -2.46      -2.93
720p   night                -5.30     -17.97      -4.01      -6.81      -6.97
720p   raven               -18.22      -1.56      -5.97     -10.57     -10.85
1080p  crowdrun             -1.55      -2.26      -1.21      -2.46      -2.92
1080p  parkjoy              -2.26      -8.85      -2.06      -3.57      -3.87
1080p  toys_calendar        -8.95     -26.27      -7.70      -8.77      -9.17
1080p  sunflower           -26.61      -4.40     -20.71     -22.55     -22.45
1080p  traffic              -4.44      -2.94      -4.36      -4.59      -5.16
       AVERAGE              -5.69      -5.37      -1.96      -4.35      -5.00

Table 3. Bit-rate change (%) relative to H.264 for directional AIF (DAIF); negative values indicate a bit-rate reduction

Class  Sequence        DAIF_Multi   DAIF_Two    CFS_N=1    CFS_N=2    CFS_N=3
CIF    paris                 0.14       1.04       2.69       2.12       1.38
CIF    foreman              -2.51      -2.57       0.36      -1.58      -2.64
CIF    mobile               -0.96      -0.52       4.91      -2.73      -2.80
CIF    tempete              -0.20       0.42       4.50      -0.55      -0.45
WVGA   flower4              -2.28      -1.31      -1.94      -2.92      -3.78
WVGA   keibai3              -2.52      -2.54      -2.45      -3.15      -4.41
WVGA   nuts5                -0.27       2.14       3.93       3.38       2.70
SVGA   janine               -2.62      -2.61      -3.76      -5.28      -5.84
720p   bigships             -4.63      -4.67      -3.90      -5.40      -5.93
720p   city                 -6.12      -5.83      -3.72      -6.80      -7.16
720p   crew                 -2.87      -1.37       2.21       0.85      -0.10
720p   jets                 -3.93      -3.16      -2.03      -2.60      -2.82
720p   night                -4.77      -4.61      -5.26      -6.37      -6.82
720p   raven               -15.89     -16.24      -7.06     -10.57     -10.93
1080p  crowdrun             -0.37      -0.36       0.25      -2.05      -2.41
1080p  parkjoy              -0.62      -0.57      -0.21      -2.83      -3.12
1080p  toys_calendar        -8.06      -8.05      -7.64      -8.27      -8.79
1080p  sunflower           -24.73     -24.63     -21.61     -22.19     -22.38
1080p  traffic              -2.60      -2.57      -1.63      -4.14      -4.25
       AVERAGE              -4.52      -4.10      -2.23      -4.27      -4.77

Table 4. Bit-rate change (%) relative to H.264 for enhanced AIF (EAIF); negative values indicate a bit-rate reduction

Class  Sequence        EAIF_Multi   EAIF_Two    CFS_N=1    CFS_N=2    CFS_N=3
CIF    paris                -0.45      -0.34       0.43       0.19      -0.12
CIF    foreman              -4.30      -4.27      -2.41      -3.35      -3.60
CIF    mobile               -3.93      -3.88      -4.74      -5.45      -5.55
CIF    tempete              -1.32      -1.25      -1.35      -1.79      -1.87
WVGA   flower4              -3.73      -3.71      -4.68      -6.05      -6.12
WVGA   keibai3              -3.89      -3.80      -3.53      -4.62      -4.93
WVGA   nuts5                -5.16      -3.83      -1.44      -1.79      -2.02
SVGA   janine               -4.69      -4.54      -5.73      -5.52      -6.61
720p   bigships             -5.98      -6.13      -8.04      -8.32      -8.38
720p   city                -10.10      -9.96      -8.36      -9.06      -9.47
720p   crew                 -6.15      -5.83      -2.10      -3.10      -3.34
720p   jets                 -7.28      -7.04      -5.83      -6.73      -6.27
720p   night                -7.05      -6.86      -6.66      -7.00      -7.22
720p   raven               -19.95     -19.89     -12.57     -12.47     -12.65
1080p  crowdrun             -1.62      -1.66      -2.56      -3.04      -3.27
1080p  parkjoy              -2.42      -2.41      -3.72      -4.12      -4.23
1080p  toys_calendar        -9.74      -9.72      -9.48      -9.52      -9.77
1080p  sunflower           -25.93     -26.21     -21.51     -21.47     -21.46
1080p  traffic              -5.44      -5.43      -3.20      -3.29      -3.48
       AVERAGE              -6.80      -6.67      -5.66      -6.13      -6.34
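The comparison with the multi-pass anchors quoted in Section 4 is the difference between the AVERAGE rows of the tables, taken as CFS_N=3 minus *_Multi. A short Python check, using only values copied from Tables 1 to 4:

```python
# AVERAGE rows copied from Tables 1-4: (*_Multi, CFS_N=3), in % relative to H.264.
averages = {
    "NAIF": (-5.13, -4.78),
    "SAIF": (-5.69, -5.00),
    "DAIF": (-4.52, -4.77),
    "EAIF": (-6.80, -6.34),
}

# Positive delta: single-pass (CFS_N=3) needs slightly more bitrate than multi-pass.
deltas = {name: round(single - multi, 2) for name, (multi, single) in averages.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}%")
```

This prints +0.35%, +0.69%, -0.25% and +0.46% for NAIF, SAIF, DAIF and EAIF; only DAIF shows a net gain over its multi-pass anchor.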