Fast Transcoding From H.264/AVC To High Efficiency Video Coding

2012 IEEE International Conference on Multimedia and Expo

Dong Zhang*1, Bin Li1, Jizheng Xu2, and Houqiang Li1
1 University of Science and Technology of China, Hefei, 230027, P.R. China
2 Microsoft Research Asia, Beijing, 100080, P.R. China
{top2, yhlibin}@mail.ustc.edu.cn, jzxu@microsoft.com, lihq@ustc.edu.cn

Abstract

In this paper, we present several strategies for transcoding from H.264/AVC bitstreams to High Efficiency Video Coding (HEVC) bitstreams. Because HEVC and AVC share a similar coding architecture, we try to exploit the information in AVC bitstreams as much as possible. For inter pictures, we utilize the power spectrum based rate-distortion optimization (PS-RDO) model, together with the input residual, modes and motion vectors, to estimate the best coding unit (CU) split quadtree, the best prediction unit (PU) mode and the best motion vector of each PU partition. For intra pictures, we propose to reduce the CU and PU partition candidates. The proposed strategies can significantly reduce the transcoding complexity in terms of reduced processing for RDO evaluations, motion estimation, motion compensation and fractional pixel interpolation. Experimental results show that the proposed transcoding methods achieve a good tradeoff between coding efficiency and transcoding complexity.

Keywords: H.264/AVC; HEVC; transcoding

I. INTRODUCTION

The growth of networked video applications, e.g. video conferencing, IPTV and HDTV, with resolutions ranging from QVGA to ultra-high definition, has posed new challenges for the design of video representation and transmission systems, especially for applications that serve various devices through heterogeneous wired and wireless networks. Making video suitable for diverse device capabilities and dynamic bandwidths is very challenging.
Transcoding is one of the most promising technologies; it provides video adaptation in terms of bit-rate reduction, resolution reduction and format conversion to meet various requirements. However, ongoing developments in video coding technology make transcoding much more complicated. A newly developed video coding standard always creates a new requirement for transcoding from existing formats to the new format, for the interoperability of video content.

The H.264/AVC (AVC for short) [2] standard, which offers better coding performance than previous coding standards such as MPEG-2/H.262, H.263 and MPEG-4 Visual, has been widely used in IPTV, satellite digital multimedia broadcasting (DMB) and mobile communication applications. The High Efficiency Video Coding (HEVC) [1] standard, which is currently under development by the Joint Collaborative Team on Video Coding (JCT-VC), is reported to provide a bit rate saving for equal PSNR of about 39% for random access applications, 44% for low-delay use, and 25% for all-intra use compared with the AVC High Profile [3]. Thus, it can be expected to be the successor to AVC. The wide use of the AVC standard today and the expected adoption of HEVC raise a new demand for AVC to HEVC transcoding.

In practice, a video transcoder should make a tradeoff between complexity and coding performance while making full use of the input bitstream to generate a new one. According to HEVC Working Draft 4 [1], the current HEVC test model (HM 4) [8] still belongs to the block-based hybrid video coding framework, except that the block size is extended up to 64x64, compared with 16x16 in AVC. Basically, AVC and HEVC share a similar prediction, transform, quantization, and entropy coding architecture. In early work on bit rate reduction in AVC-to-AVC transcoding, researchers mainly focused on fast motion estimation and mode refinement by limiting the possible modes as well as the search points [4] [5].
However, since the rate-distortion cost of multiple modes still needs to be evaluated, a large number of sum of absolute differences/sum of squared differences (SAD/SSD) computations, as well as fractional pixel interpolation, are involved in the motion re-estimation or motion refinement process. Thus, the computational complexity of these techniques cannot be ignored. To further reduce the transcoding complexity, Shen et al. [6] proposed a power spectrum based rate-distortion optimization (PS-RDO) model for transcoding inter pictures, where the cost is directly estimated from the motion vector (MV) variation and the PS of the prediction signal resulting from the input MV. It maintains a good tradeoff between complexity and coding performance.

The main challenge of AVC-HEVC transcoding is caused by the hierarchical quadtree structures of the coding unit (CU) and transform unit (TU) with extended and variable block sizes (the largest CU size can be 64x64 in HM 4) in HEVC. According to [7], utilizing macroblock (MB) structures with sizes larger than 16x16 can significantly improve coding efficiency, especially for high resolution videos. However, such a design in turn brings in numerous RDO evaluations for selecting the best partition mode, quadtree structure and motion vectors. Thus, to maintain a good tradeoff between complexity and coding performance, the transcoder should efficiently merge blocks from the 16x16 MB size up to 64x64, especially for low bit rate transcoding, where the bits needed to represent modes and MVs would be a heavy burden and would hurt coding performance.

* This work was done during Zhang's internship at MSR Asia.
978-0-7695-4711-4/12 $26.00 © 2012 IEEE. DOI 10.1109/ICME.2012.112

In this paper, we propose several transcoding strategies for AVC to HEVC transcoding with bit rate reduction. Considering the similar coding architecture of HEVC and AVC, and motivated by the work in [6], for inter picture

transcoding, we utilize the PS-RDO model to determine the best CU quadtree structure, the best CU partition mode and the best motion vector of each prediction unit (PU), and for intra picture transcoding, we propose to reduce the candidate settings for CU quadtree structures and PU partitions. The proposed transcoding strategies maintain good coding efficiency while avoiding high computational complexity, through reduced RDO evaluations, motion compensation and fractional pixel interpolation. To the best of our knowledge, there is no prior work on AVC to HEVC transcoding. Thus, we believe our investigations can shed some light on the AVC-HEVC transcoding problem and inspire more work on this important application in the near future.

The rest of this paper is organized as follows. HEVC is briefly introduced in Section II. The proposed transcoding strategies are discussed in Section III. Experimental results are shown in Section IV. Section V concludes the paper.

II. OVERVIEW OF HEVC

A. Main Differences Between H.264/AVC and HEVC

Relative to prior video coding methods, HEVC uses large block sizes together with many other new features, such as adaptive loop filter (ALF), sample adaptive offset (SAO), partition merging, etc., to further improve coding efficiency. The main differences between AVC and HEVC due to the extended block size are listed in Table I.

TABLE I. DIFFERENCES BETWEEN H.264/AVC AND HEVC

| | AVC | HEVC |
| MB Size / CU Size | 16x16 | 8x8, 16x16, 32x32, 64x64 |
| MC Block Size | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 | 2Nx2N, Nx2N, 2NxN, nLx2N, nRx2N, 2NxnD, 2NxnU (N=4,8,16,32) and conditionally NxN |
| Intra Prediction (directions per block size) | 4x4 and 8x8: 9; 16x16: 4 | 4x4: 18; 8x8: 35; 16x16: 35; 32x32: 35; 64x64: 4 * |
| Transform Size | 4x4, 8x8 | 4x4, 8x8, 16x16, 32x32, 4x16, 16x4, 8x32, 32x8 |

AVC adopts a block-based coding structure with a fixed MB size (16x16) and flexible prediction block shapes and modes. For intra coding, AVC enables 9 luma prediction directions for a 4x4 or 8x8 block and 4 prediction directions for a 16x16 block. For inter coding, it supports variable block-size motion compensation (MC), from the largest 16x16 down to a minimum luma MC block size of 4x4. Besides the improved prediction, it also enables 4x4 and 8x8 integer DCT transforms, which can represent residuals in a more locally adaptive fashion [2]. However, such small MB, transform block and MC block sizes hinder further improvements in coding efficiency, because they are less flexible in adapting to the changing characteristics of input signals, especially for higher resolution videos.

To improve the flexibility of predictive coding for both low and high resolution videos, HEVC adopts a quadtree structure of CU segmentation (CU sizes from 8x8 up to 64x64, i.e. 2Nx2N with N from 4 to 32) together with a quadtree structure of TU partitions (from a minimum of 4x4 up to 32x32 luma samples). Besides, it supports angular intra-picture prediction with up to 35 prediction directions. For intra coded CUs, the PU is always square. For inter coded CUs, up to 7 kinds of PU sizes can be used in the default HM-4.0 test configurations (three symmetric partitions, i.e. 2Nx2N, Nx2N, and 2NxN, and four additional asymmetric partitions, i.e. nLx2N, nRx2N, 2NxnD, and 2NxnU, which use a 1:3 division along one dimension). Note that an additional inter NxN (N=4) PU for an 8x8 CU can be enabled by a sequence parameter set (SPS) level flag.

B. RDO Framework of HM 4

Compared with AVC, the encoding computational complexity of selecting the best coding parameters in HEVC has increased, because the number of candidates for CU partitions, PU partitions and TU partitions has increased. The main function for CU compressing is illustrated in Fig. 1.(a).

Figure 1. CU Processing in HM 4: (a) CU compressing; (b) Intra Mode Decision

Since the predictor, e.g.
the intra predictor and motion vector predictor, of a CU is generated from neighboring coded CUs, HM 4 performs a preorder recursive traversal of the CU quadtree (Fig. 1.(a)). When encoding a CU at the current depth, the best PU mode is determined by successively evaluating the RD costs of each inter and intra mode. A dotted box in Fig. 1.(a) indicates that the mode is evaluated conditionally, depending either on the CU size or on some fast mode decision algorithm. After the best PU mode is determined, if the current CU is larger than 8x8, it may be further split into four sub-CUs, and the CU compressing function is called recursively to determine the best CU quadtree structure. The decision on the best TU split tree is integrated into the determination of the best PU mode.

* Footnote to Table I: the number of intra prediction directions for a 64x64 block is 35 in HM-5.0.

However, there are some differences between intra and inter modes. For intra modes, e.g. Intra 2Nx2N mode and Intra NxN mode, the best prediction direction and TU partition are determined as shown in Fig. 1.(b). First, the Sum of Absolute Hadamard Transformed Differences (SATD) of the prediction errors (the difference between the original and the predicted block) is calculated for the different prediction directions to generate the candidate SATD mode list, where the SATD costs are sorted in increasing order and the list is truncated according to the PU size to reduce the candidate directions. In order to ensure the

coding performance, the two most probable modes are added to the list if either of them is not already in it. After that, the best prediction direction is selected according to its RD cost by performing transform/quantization (T/Q) and inverse quantization/inverse transform (IQ/IT) on the residuals with only the largest allowed transform size. Finally, in the Residual Quadtree Transform (RQT) module, the best TU partition is determined by a preorder recursive traversal with square splitting on the residual quadtree.

For inter modes, the best motion vector and TU partition are determined as follows. For each mode, the best motion vector is the one with the minimum cost, considering both the SAD and the bits needed to represent the motion information. Then, the full-depth RDO for RQT is performed to select the best transform partition size.

The extended CU block sizes, PU and TU partition modes, and intra prediction directions result in high computational complexity for HEVC. Several fast mode decision methods have been proposed to reduce the complexity, in terms of coding tree pruning [9] and PU mode elimination [10]. In [9], the splitting of a CU terminates if the best mode of the current CU is SKIP mode. In [10], the RDO process for the remaining PU modes terminates if there is no residue in an inter coded CU. However, these methods do not perform as well for small quantization step sizes as for large ones, because the number of zero-residual blocks decreases as the quantization step size decreases. As fast mode decision is in high demand for transcoding, in the next section we introduce our fast mode decision methods for transcoding in both intra coding and inter coding.

III. PROPOSED AVC TO HEVC TRANSCODER

A. Transcoding Architecture

The proposed pixel domain AVC to HEVC transcoding architecture is illustrated in Fig. 2.
The AVC decoder decodes the input bitstream and extracts information such as texture, residual, modes and MVs. Since the largest CU (LCU) in HEVC consists of 16 MBs in AVC, after AVC decoding the information of these MBs is passed to the mode selection module. In this module, for intra coded pictures, the candidate CU split quadtree and the candidate PU modes are reduced; for inter coded pictures, the candidate CU split quadtree, the best PU mode and the best motion vector are determined with the PS-RDO model [6]. Both cases are discussed in detail in the following parts.

Figure 2. Pixel Domain AVC-HEVC Transcoder

B. Transcoding of Intra Coded Picture

The quality of each intra picture has a significant impact on the following inter pictures. Thus, we should keep its quality as high as possible when transcoding. Since the input AVC bitstream already contains useful information about the MB partitions and prediction directions, we propose the following transcoding strategy.

-- The LCU is initially split according to the input MB modes in AVC. For example, if the input mode of an MB is Intra 4x4, the initial size of the CU partition that contains this block is set to 8x8; if the input mode is Intra 8x8, it is set to 8x8; otherwise, it is set to 16x16.
-- The initial CU partitions are further merged into larger sizes according to the prediction directions of the four adjacent sub-CUs. For example, if the prediction directions of four adjacent 8x8 CUs are the same, they are merged into a 16x16 CU. Similar merge operations are also performed on CUs larger than 8x8. The merge process is applied from the smallest 4x4 blocks up to blocks of size 32x32.
-- The best CU split quadtree is determined with CU compressing in HM 4 (Fig. 1.(a)), with the following changes. First, if the current CU partition size is smaller than the modified initial CU partition size, the CU is not split further.
Second, for a CU of size 8x8, the Intra NxN mode is ignored when the input prediction directions of its four sub-partitions are the same.

Although other simplifications may also be applied to intra picture transcoding (for example, in Fig. 1.(b), using the input information to reduce the candidate prediction directions for SATD, or reducing the candidate SATD list), the proposed method already maintains good performance. Further, considering that inter picture transcoding accounts for more of the complexity than intra picture transcoding, we mainly focus on inter picture transcoding.

C. Transcoding of Inter Coded Picture

The major complexity of inter picture coding comes from the motion estimation (ME), MC, T/Q and IQ/IT operations performed when testing every set of possible coding parameters with the possible CU sizes and PU and TU modes. Thus, we propose to reduce these operations with the help of the input AVC information, e.g. residuals, modes and MVs. The key technique of AVC to HEVC inter picture transcoding is to merge smaller blocks into a larger CU, especially for bit rate reduction transcoding. Since a large CU may consist of different 4x4 blocks, and these blocks may have different MVs, merging these blocks becomes a matter of measuring the RD cost when the MV changes.

1) PS-RDO model for Inter transcoding

Motivated by the motion activity [11] of an image block, [6] proposed a PS-RDO model for AVC to AVC bit rate reduction transcoding, which measures the importance of motion vectors. Since the input bitstream is assumed to be of high quality, the RD cost resulting from a new motion vector can be represented as

$$J_p = D_p + \lambda_p R_p \qquad (1)$$

where $R_p$ denotes the number of bits to code the current mode, and $\lambda_p$ denotes the Lagrangian multiplier. The distortion $D_p$ represents the SSD between the prediction signal $p_{in}$ resulting from the input MV and the prediction signal $p_{new}$ resulting from a new MV with the same reference picture:

$$D_p = \sum_{i \in \text{block}} \left( p_{in}(i) - p_{new}(i) \right)^2 \qquad (2)$$

The SSD error (2) can be calculated without performing MC. Let $P_{in}(\omega_u, \omega_v)$ be the Fourier transform of the prediction block $p_{in}$. When the input MV is adjusted to a new one with a small deviation $\Delta mv = (\Delta mv_x, \Delta mv_y)^T$, the new prediction block $p_{new}$ can be expressed in the frequency domain as $P_{new}(\omega_u, \omega_v) = e^{-j\omega\Delta mv} P_{in}(\omega_u, \omega_v)$, with $\omega = (\omega_u, \omega_v)$. Let $S_{in}(\omega_u, \omega_v)$ be the power spectral density (PSD) of $p_{in}$. According to Parseval's theorem, the SSD distortion (2) in the pixel domain equals (3) in the frequency domain:

$$D_p = \left( \frac{1}{2\pi} \right)^2 \iint_{(-\pi, \pi]} S_{in}(\omega_u, \omega_v) \left| 1 - e^{-j\omega\Delta mv} \right|^2 d\omega_u \, d\omega_v \qquad (3)$$

The distortion (3) is further simplified with a Taylor expansion in which the high order terms are ignored [11], which finally yields

$$D_p \approx \varphi_x (\Delta mv_x)^2 + \varphi_y (\Delta mv_y)^2 \qquad (4)$$

$$\varphi_x = \left( \frac{1}{2\pi} \right)^2 \iint_{(-\pi, \pi]} S_{in}(\omega_u, \omega_v) \, \omega_u^2 \, d\omega_u \, d\omega_v \qquad (5)$$

$$\varphi_y = \left( \frac{1}{2\pi} \right)^2 \iint_{(-\pi, \pi]} S_{in}(\omega_u, \omega_v) \, \omega_v^2 \, d\omega_u \, d\omega_v \qquad (6)$$

According to [6], the PSD $S_{in}(\omega_u, \omega_v)$ of each block can be estimated with the integer 4x4 DCT-like transform in AVC instead of the square of the 2-D fast Fourier transform (FFT). Note that a similar approximation also works in HEVC. In practice, equations (5) and (6) are further modified into discrete forms, considering that the coefficients represent the estimated spectrum magnitudes at the discrete frequency points $\pm 2\pi m / 2N$, with m = 0, 1, 2, 3 and N = 4.

2) Proposed Inter picture transcoding strategy

As mentioned before, we should reduce the ME, MC, T/Q and IQ/IT operations during CU processing as well as inter mode decision. Thus, we propose the following transcoding strategy for inter picture transcoding.

-- The LCU is initially split according to the input MB modes to generate the initial CU split quadtree.
-- The minimum candidate CU split quadtree is determined with the proposed CU pre-processing method illustrated in Fig. 3.(a). If the current CU size is larger than the initial CU size, we test the eight inter PU modes with the PS-RDO model. (We test eight inter PU modes instead of the seven in HM-4.0 because the spatial resolution of the input AVC bitstream may be very small, in which case inter 4x4 is also an efficient mode.) The different MVs belonging to each PU partition first compete using (1) to get the best MV for each partition. Then, the candidate PU modes with their selected best MVs also compete using (1) to obtain the best PU mode for the current CU. After that, we perform a preorder recursive traversal of the CU quadtree to generate the minimum candidate CU split quadtree for an LCU.

Figure 3. Proposed Inter Transcoding Strategy: (a) Proposed CU Pre-processing; (b) Proposed CU Processing

-- We further merge the minimum candidate CU split quadtree to get the maximum candidate CU split quadtree according to the input DCT domain residuals. We re-quantize the input DCT domain residuals with the current QP. If the coefficients of a block are all zero after re-quantization, the block is marked as true; otherwise, it is marked as false. Then, if the blocks of four adjacent CUs are all marked as true, the four CUs are merged into a larger one. During the merging process, we apply the constraint that a CU is not merged twice. After all blocks are processed, we get the maximum candidate CU split quadtree.
-- Finally, we apply a preorder recursive traversal of the CU quadtree with the proposed CU processing method (Fig. 3.(b)) to compress the LCU. As illustrated in Fig. 3.(b), the mode decision for the current CU happens only if the current CU size is between the minimum and maximum candidate CU sizes. Furthermore, we apply the PS-RDO based mode decision module to determine the pre-determined PU mode among the four symmetric PU modes and the four asymmetric PU modes. The best MV is also determined similarly.
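The two decisions described in the steps above, the PS-RDO competition among input MVs and the zero-residual merge test, can be sketched as follows. This is a minimal illustration under stated assumptions, not the transcoder's implementation: the `Block4x4` container, the `mv_bits` rate callback and the function names are hypothetical, the weights `phi_x` and `phi_y` are assumed to be precomputed from (5) and (6), and summing the per-block estimates of (4) over a PU is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

MV = Tuple[int, int]  # integer-precision motion vector (x, y)


@dataclass(frozen=True)
class Block4x4:
    """One 4x4 block taken from the decoded AVC picture (hypothetical container)."""
    mv: MV          # input AVC motion vector, rounded to integer precision
    phi_x: float    # horizontal power-spectrum weight, eq. (5), assumed precomputed
    phi_y: float    # vertical power-spectrum weight, eq. (6), assumed precomputed


def ps_distortion(block: Block4x4, new_mv: MV) -> float:
    """Eq. (4): estimated SSD when the block's input MV is replaced by new_mv."""
    dx = new_mv[0] - block.mv[0]
    dy = new_mv[1] - block.mv[1]
    return block.phi_x * dx * dx + block.phi_y * dy * dy


def best_mv_for_pu(blocks: Sequence[Block4x4], lam: float,
                   mv_bits: Callable[[MV], float]) -> Tuple[MV, float]:
    """Return the input MV with the lowest cost J = D + lam * R, eq. (1).

    Assumptions of this sketch: the PU distortion is the sum of the
    per-block estimates, and only MVs already present in the PU compete.
    """
    if not blocks:
        raise ValueError("a PU must contain at least one 4x4 block")
    best_mv, best_cost = blocks[0].mv, float("inf")
    for mv in sorted({b.mv for b in blocks}):
        cost = sum(ps_distortion(b, mv) for b in blocks) + lam * mv_bits(mv)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost


def mergeable_groups(zero_flags: List[List[bool]]) -> List[List[bool]]:
    """One merge pass: a 2x2 group of CUs may merge only if all four are
    marked True (all coefficients zero after re-quantization)."""
    half = len(zero_flags) // 2
    return [[all(zero_flags[2 * r + i][2 * c + j] for i in (0, 1) for j in (0, 1))
             for c in range(half)]
            for r in range(half)]


if __name__ == "__main__":
    pu = [Block4x4((0, 0), 1.0, 1.0),
          Block4x4((0, 0), 1.0, 1.0),
          Block4x4((2, 0), 10.0, 10.0)]
    # With a zero rate term, the MV of the highly textured block wins.
    print(best_mv_for_pu(pu, lam=0.0, mv_bits=lambda mv: 0.0))
```

In this example the block with the large power-spectrum weights dominates the decision: moving its MV would cost far more estimated distortion than moving the MVs of the two smooth blocks, which is the sense in which the model measures the importance of motion vectors. Because the merge test is a single pass, it respects the constraint that a CU is not merged twice.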
The SKIP mode, the pre-determined PU mode and the intra modes then compete, as in the original HEVC RDO process, to determine the best PU mode.

The proposed inter picture transcoding method can significantly reduce the ME/MC, T/Q and IQ/IT operations. Besides, the input MV is quantized to integer precision when calculating the distortion in the PS-RDO model; thus, the interpolation operation is also avoided.

IV. EXPERIMENTAL RESULTS

To give a comprehensive view of the benefits of the proposed methods, we show the results for intra picture transcoding and inter picture transcoding separately. The AVC bitstream is generated with JM 13 [12] with QP 20. The proposed transcoding strategies are implemented in HM 4.0 [8], which was the latest official version at the time of the experiments. For the purpose of bitrate reduction, the QPs of the transcoded HEVC bitstream are set to 24, 27, 31 and 35,

respectively. For each sequence we encode 60 frames, and the transcoding time is averaged over 5 runs.

A. Results for Intra Picture Transcoding

We simulate four methods: the proposed intra transcoding method (Proposed_Best); the proposed intra transcoding method without removing the Intra NxN mode even if the input prediction directions of the four sub-partitions of the current 8x8 CU are the same (Proposed_Initial); the HEVC intra mode decision (HEVC_Anchor); and HEVC intra transcoding in which the CU split tree is determined directly from the AVC input modes (HEVC_AVC). We use the all-intra high efficiency configuration, except that ALF mode 1 (one pass filter design, fast mode) is used. This is because, in transcoding, the original video is unavailable, so there is no need to enable the most complicated ALF mode. The RD curves of these methods are shown in Fig. 4 and the transcoding time curves for different QP settings are shown in Fig. 5. It is obvious that directly reusing the CU partition leads to a significant loss in coding performance. The Proposed_Best method maintains almost the same coding performance, with the transcoding time reduced to about 70% of that of HEVC_Anchor. The results also demonstrate that removing the Intra NxN mode for 8x8 CUs according to the input modes is very effective.

B. Results for Inter Picture Transcoding

We simulate the proposed inter transcoding method where the best CU split quadtree, PU mode and MVs are all determined by PS-RDO (Proposed_Inter_CU_PU_MV) and the proposed inter transcoding method where the best PU mode and MVs are determined with the PS-RDO model (Proposed_Inter_PU_MV). The anchors are the HEVC inter mode decision (HEVC_Anchor), the HEVC inter mode decision with the fast algorithms in [9][10] (HEVC_Anchor_Fast), and HEVC inter transcoding in which the CU split quadtree, PU mode and MVs are determined directly from the AVC bitstream (HEVC_AVC).
The input AVC bitstream uses an IPPP coding structure with only one reference frame, and the intra period is set to 60. The output HEVC bitstream is generated with the same coding structure and the following settings: one reference frame is used, ALF mode is set to 1, internal bit depth is set to 8, CABAC is enabled, and the other parameters are the same as the HEVC default common settings. For a fair comparison, the intra pictures in the above methods are all coded with the original HEVC RDO method.

Figure 4. RD Performance for Intra Transcoding
Figure 5. Transcoding Time for Intra Transcoding
Figure 6. RD Performance for Inter Transcoding

Figure 7. Transcoding Time for Inter Transcoding

The RD curves of the above methods are shown in Fig. 6 and the transcoding time curves for different QP settings are shown in Fig. 7. Although directly reusing the input information (HEVC_AVC) saves a large amount of time (about 70% to 80%) compared with HEVC_Anchor, the performance loss is also significant (about 30%). As expected, the time saving of the existing fast encoding algorithm HEVC_Anchor_Fast is smaller for small QPs than for larger QPs. Finally, our proposed transcoding methods, Proposed_Inter_CU_PU_MV and Proposed_Inter_PU_MV, obtain a good tradeoff between coding performance and computational complexity across all tested QP settings.

V. CONCLUSION

We propose several transcoding strategies for AVC to HEVC transcoding with bitrate reduction. With the input residual, modes and motion vectors of AVC, we utilize the PS-RDO model to determine the best coding unit split quadtree, the best prediction unit and the best motion vector. The number of required RDO evaluations is significantly reduced for both intra and inter picture transcoding. Besides, the motion estimation, motion compensation and fractional pixel interpolation operations are avoided in the proposed inter picture transcoding strategy. The proposed transcoding strategies maintain a good tradeoff between coding efficiency and transcoding complexity. Furthermore, our methods can also be combined with existing fast encoding methods.

REFERENCES

[1] T. Wiegand, W.-J. Han, B. Bross, J.-R. Ohm, and G. J. Sullivan, "WD4: Working Draft 4 of High-Efficiency Video Coding," JCTVC-F803, Torino, IT, July 2011.
[2] ISO/IEC 14496-10 and ITU-T Rec. H.264, "Advanced Video Coding," 2003.
[3] B. Li, G. J. Sullivan, and J. Xu, "Comparison of Compression Performance of HEVC Working Draft 4 with AVC High Profile," JCTVC-G399, Geneva, CH, November 2011.
[4] P. Zhang, Q.-M. Huang, and W. Gao, "Key techniques of bit-rate reduction for H.264 streams," Proc. Pacific-Rim Conf. Multimedia (PCM), 2004, pp. 985-992.
[5] J. Youn, M.-T. Sun, and C.-W. Lin, "Motion vector refinement for high performance transcoding," IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30-40, Mar. 1999.
[6] H. Shen, X. Sun, and F. Wu, "Fast H.264/MPEG-4 AVC Transcoding Using Power-Spectrum Based Rate-Distortion Optimization," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 6, pp. 746-755, June 2008.
[7] S. Ma and C.-C. J. Kuo, "High-definition video coding with super macroblocks," Proc. SPIE, vol. 6508, part 1, p. 650816, Jan. 2007.
[8] HM Reference Software 4.0 [Online]. Available: https://hevc.hhi.fraunhofer.de/svn/svn_hevcsoftware
[9] K. Choi, S.-H. Park, and E. S. Jang, "Coding tree pruning based CU early termination," JCTVC-F092, Torino, IT, July 2011.
[10] R. H. Gweon, Y.-L. Lee, and J. Lim, "Early Termination of CU Encoding to Reduce HEVC Complexity," JCTVC-F045, Torino, IT, July 2011.
[11] A. Secker and D. Taubman, "Highly scalable video compression with scalable motion coding," IEEE Trans. Image Process., vol. 13, no. 8, pp. 1029-1041, Aug. 2004.
[12] JM Reference Software 13 [Online]. Available: http://iphome.hhi.de/suehring/tml/download/