EE5359 PROJECT PROPOSAL Transcoding from H.264/AVC to High Efficiency Video Coding (HEVC) Shantanu Kulkarni UTA ID: 1000789943
Transcoding from H.264/AVC to HEVC

Objective: To discuss and implement a transcoder that converts an H.264/AVC bitstream into an HEVC (H.265) compliant bitstream.

[1] Introduction

This proposal discusses strategies for transcoding an H.264/AVC [2] bitstream to HEVC [12]. HEVC and AVC share a similar coding architecture. For inter pictures, a power-spectrum-based rate-distortion optimization (PS-RDO) model, together with the input residual, modes and motion vectors, is used to estimate the best coding unit (CU) split quadtree, the best prediction unit (PU) mode and the best motion vector for each PU. For intra pictures, the number of CU and PU partition candidates is reduced.

Most video coding standards are designed primarily for high coding efficiency, that is, the ability to encode video at the lowest possible bit rate while maintaining a given level of video quality. HEVC, a recently emerged video coding standard, aims at high coding efficiency while retaining video quality. With its hybrid coding architecture, motion-compensated prediction and transform coding, it can be seen as an improved version of the previous standard, H.264. Transcoding from H.264 to HEVC lowers the bit rate and so yields more efficient compression. The proposed transcoding technique can achieve a good tradeoff between coding efficiency and transcoding complexity.

The growth of networked video applications such as video conferencing, IPTV and HDTV, with resolutions ranging from QVGA to ultra-high definition, poses new challenges for the design of video representation and transmission systems, especially for applications that serve varied devices over heterogeneous wired and wireless networks. Adapting video to differing device capabilities and dynamically varying bandwidth is very challenging.
Transcoding is one of the most promising technologies for this purpose. It provides video adaptation through bit-rate reduction, resolution reduction and format conversion to meet various requirements.

The HEVC standard is based on the well-known block-based hybrid coding architecture, combining motion-compensated prediction and transform coding with high-efficiency entropy coding. However, in contrast to previous video coding standards, it employs a flexible quadtree coding block partitioning structure that enables the efficient use of large and multiple sizes of coding, prediction and transform blocks. It also employs improved intra prediction and coding, adaptive motion parameter prediction and coding, a new loop filter and an
enhanced version of context-adaptive binary arithmetic coding (CABAC) entropy coding. New high-level structures for parallel processing are also employed.

The ITU-T began development of a successor to H.264 in 2004, while ISO/IEC began work in 2007. In January 2010, the two groups collaborated on a joint Call for Proposals, which culminated in a meeting of the MPEG and VCEG Joint Collaborative Team on Video Coding (JCT-VC) in April 2010, when the name High Efficiency Video Coding (HEVC) was adopted for the codec.

[2] Transcoder Structure and Basics

Video transcoding is the operation of converting video from one format to another. A format is defined by characteristics such as bit rate and spatial resolution. One of the earliest applications of transcoding was adapting the bit rate of a compressed stream to the channel bandwidth, for universal multimedia access over all kinds of channels such as wireless networks, the internet and dial-up networks. Changes in the characteristics of an encoded stream, such as bit rate, spatial resolution and quality, can also be achieved by scalable video coding. However, when the available network bandwidth is insufficient or fluctuates with time, it may be difficult to set the base layer bit rate. In addition, scalable video coding adds complexity at both the encoder and the decoder.

Emerging developments in video coding technology make transcoding much more complicated. A newly developed video coding standard always creates a new requirement for transcoding from existing formats to the new format, for interoperability of video content.

Figure 2 shows the most basic transcoding architecture. The motion vectors from the incoming bitstream are extracted and reused. This eliminates the motion estimation block, which accounts for about 60% of the encoder computation.
Hence, even though it is slightly more complex, this architecture is suited for heterogeneous transcoding between different standards, where basic parameters such as mode decisions and motion vectors have to be re-derived.
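As a rough illustration of why reusing decoded motion vectors saves computation, the sketch below compares an exhaustive integer-pel search with a single cost evaluation at a reused vector. The function names (`sad`, `full_search`, `reuse_mv`), the SAD cost and the search radius are assumptions chosen for illustration; they are not taken from the transcoders cited in this proposal.

```python
def sad(cur, ref, bx, by, mvx, mvy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the motion vector (mvx, mvy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Exhaustive integer-pel motion search within +/- radius.
    Returns (best_mv, best_sad, number_of_candidates_evaluated)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, evaluations = None, None, 0
    for mvy in range(-radius, radius + 1):
        for mvx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the reference.
            if by + mvy < 0 or by + mvy + n > height:
                continue
            if bx + mvx < 0 or bx + mvx + n > width:
                continue
            cost = sad(cur, ref, bx, by, mvx, mvy, n)
            evaluations += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost, evaluations


def reuse_mv(cur, ref, bx, by, n, decoded_mv):
    """Transcoder shortcut: evaluate only the vector taken from the input stream."""
    cost = sad(cur, ref, bx, by, decoded_mv[0], decoded_mv[1], n)
    return decoded_mv, cost, 1
```

With a search radius of 2, the exhaustive search evaluates up to 25 candidates per block, while the reuse path evaluates one. Real encoders use faster search patterns and sub-pel refinement, so the actual saving depends on the encoder configuration.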
[3] Overview of HEVC

Fig. 1 HEVC encoder block diagram [12]

Fig. 2 Pixel domain transcoding architecture [4]

The input video is first divided into blocks called coding tree units (CTUs), which play a role broadly analogous to that of macroblocks in previous standards. The coding unit (CU) defines a region sharing the same prediction mode (intra, inter or skip) and is represented by a leaf node of a quadtree structure. The prediction unit (PU) defines a region sharing the same prediction
information. The transform unit (TU), specified by another quadtree, defines a region sharing the same transformation and quantization.

Fig. 3 Recursive block structure for HEVC [13]

Fig. 4 Intra prediction modes, HEVC [12]

The best intra mode among a total of 35 modes (Fig. 4), namely Planar, DC and 33 angular directions, is selected and coded. Mode-dependent context sample smoothing is applied to increase prediction efficiency, and the three most probable modes (MPM) are used to increase symbol coding efficiency.

The best motion parameters are selected and coded by merge mode and adaptive motion vector prediction (AMVP) mode, in which motion predictors are selected and explicitly coded among several candidates. To increase the efficiency of motion-compensated prediction, a non-cascaded interpolation structure with 1D FIR filters is used. An 8-tap or a 7-tap filter is applied directly to generate the half-pel and quarter-pel luma samples, respectively. A 4-tap filter is used for chroma interpolation.
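The interpolation step above can be sketched in one dimension as follows. The tap values are the HEVC luma half-sample and quarter-sample coefficients and the chroma half-sample coefficients, each normalized by 64. The function name `interpolate`, the edge clamping and the single-stage 8-bit rounding are simplifying assumptions for illustration; the standard keeps higher intermediate precision when filtering in two dimensions.

```python
# Filter taps, each set sums to 64.
HALF_PEL_8TAP = (-1, 4, -11, 40, 40, -11, 4, -1)   # luma, half-sample position
QUARTER_PEL_7TAP = (-1, 4, -10, 58, 17, -5, 1)     # luma, quarter-sample position
CHROMA_HALF_4TAP = (-4, 36, 36, -4)                # chroma, half-sample position


def interpolate(samples, pos, taps, first_tap_offset=3):
    """Interpolate one fractional sample to the right of integer position `pos`.

    taps[k] is applied to samples[pos - first_tap_offset + k]. Indices outside
    the row are clamped to the nearest valid sample (an assumed padding rule).
    The result is rounded, divided by 64 and clipped to the 8-bit range.
    """
    last = len(samples) - 1
    acc = 0
    for k, coeff in enumerate(taps):
        idx = min(max(pos - first_tap_offset + k, 0), last)
        acc += coeff * samples[idx]
    return min(max((acc + 32) >> 6, 0), 255)
```

For example, on the ramp `[0, 10, 20, ...]` the half-sample value between the samples 40 and 50 comes out as 45, and a flat row is reproduced unchanged because every tap set sums to 64.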
Residuals generated by subtracting the prediction from the input are spatially transformed and quantized. The transform uses matrices that approximate the DCT, and a partial butterfly structure is implemented to keep the computational cost low. For 4x4 intra-predicted luma residuals, the DST is used instead. The quantization process uses 52 quantization step levels and rate-distortion optimized quantization (RDOQ). Reconstructed samples are created by inverse quantization and inverse transform.

CABAC is the entropy coding scheme in this standard; it is applied to the generated symbols and the quantized transform coefficients. After reconstruction, two in-loop filtering processes are applied to improve coding efficiency and visual quality: deblocking filtering and sample adaptive offset (SAO). Reconstructed CTUs are assembled to construct a picture, which is stored in the decoded picture buffer for use in encoding the next picture of the input video.

[4] Overview of H.264/AVC

H.264 [2] is a standard for video compression, equivalent to MPEG-4 Part 10, or MPEG-4 AVC (Advanced Video Coding). As of 2008, it was the latest block-oriented, motion-compensation-based video standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG), and it was the product of a partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are jointly maintained so that they have identical technical content.

Fig. 5 H.264 Encoder [2]
Fig. 6 H.264 Decoder [2]

Features for enhancement of prediction are as follows.

- Directional spatial prediction for intra coding (9 directional prediction modes)
- Variable block-size motion compensation with small block size
- Quarter-sample-accurate motion compensation
- Motion vectors over picture boundaries
- Multiple reference picture motion compensation
- Decoupling of referencing order from display order
- Decoupling of picture representation methods from picture referencing capability
- Weighted prediction
- Improved skipped and direct motion inference
- In-the-loop deblocking filtering

Features for improved coding efficiency are as follows.

- Small block-size transform
- Exact-match inverse transform
- Short word-length transform
- Hierarchical block transform
- Arithmetic entropy coding
- Context-adaptive entropy coding

Features for robustness to data errors/losses are as follows.

- Parameter set structure
- NAL unit syntax structure
- Flexible slice size
- Flexible macroblock ordering (FMO)
- Arbitrary slice ordering (ASO)
- Redundant slices (RS)
- Data partitioning
- SP/SI synchronization/switching pictures
[5] Comparison with H.264 and previous standards

Owing to the number of diverse applications and fields that HEVC addresses, it may overtake the previous coding standard, H.264/AVC. The main areas in which AVC and HEVC differ are:

- Larger block structure, with a maximum of 64x64 pixels per block
- Up to 35 intra prediction modes (33 angular modes + DC + Planar) in HEVC, while H.264 has 9 directional intra prediction modes (Fig. 7)
- Adaptive motion vector prediction, which allows the codec to find more inter-frame redundancies
- Superior parallelization tools, including wavefront parallel processing, for more efficient coding in a multi-core environment
- Entropy coding using CABAC only; CAVLC is no longer used
- Improvements to the deblocking filter and the addition of a second in-loop filter, sample adaptive offset (SAO), which further reduces coding artifacts
- Bit-rate reduction of approximately 37%

Fig. 7 Intra prediction modes, H.264/AVC [20]
The differences between AVC and HEVC can be summarized through the following table:

Table 1: Difference between H.264/AVC and HEVC [1]

[6] HEVC Transcoding

This section discusses several strategies for AVC to HEVC transcoding with bit-rate reduction. Considering the similar coding architecture of HEVC and AVC, and motivated by the work in [1], for inter picture transcoding we use the PS-RDO model to determine the CU quadtree structure, the best CU partition mode and the best motion vector of each prediction unit (PU). For intra picture transcoding, we propose to reduce the candidate settings for CU quadtree structures and PU partitions. The transcoding schemes discussed here avoid high computational complexity by reducing the number of RDO evaluations and by avoiding motion compensation and fractional pixel interpolation operations. The pixel domain AVC to HEVC architecture is illustrated in Fig. 8.

Fig. 8 Pixel Domain AVC-HEVC Transcoder [1] (block labels: Input AVC Bitstream; AVC Decoder; HEVC Re-encoder; Output HEVC Bitstream; Residual, modes and MVs; Simplified Mode Selection; CU, PU partitions and MVs)
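The data flow of Fig. 8 can be outlined as a short skeleton. This is only a sketch of how the three blocks connect: the containers `AvcSideInfo` and `HevcHints`, the function `transcode` and the three callables passed to it are hypothetical names introduced here, and they do not correspond to the API of any real decoder or encoder.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AvcSideInfo:
    """Information recovered while decoding the AVC stream (hypothetical container)."""
    modes: List[str]
    motion_vectors: List[Tuple[int, int]]
    residual_energy: List[int]


@dataclass
class HevcHints:
    """Output of simplified mode selection (hypothetical container)."""
    cu_sizes: List[int]
    motion_vectors: List[Tuple[int, int]]


def transcode(
    avc_bitstream: bytes,
    avc_decode: Callable[[bytes], Tuple[list, AvcSideInfo]],
    select_modes: Callable[[AvcSideInfo], HevcHints],
    hevc_encode: Callable[[list, HevcHints], bytes],
) -> bytes:
    """Pixel domain transcoding: decode fully, then re-encode with hints."""
    # 1. AVC decoder: reconstruct pixels and keep residual, modes and MVs.
    pixels, side_info = avc_decode(avc_bitstream)
    # 2. Simplified mode selection: map AVC data to CU/PU partitions and MVs.
    hints = select_modes(side_info)
    # 3. HEVC re-encoder: encode the pixels, restricting its search to the hints.
    return hevc_encode(pixels, hints)
```

Passing the three stages in as callables keeps the skeleton independent of any particular codec implementation, so each stage can be replaced or tested on its own.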
Transcoding of Intra Coded Frames

The quality of each intra picture has a significant impact on the inter coded frames, so its quality needs to be kept as intact as possible. Since the input AVC bitstream already contains useful information about the macroblock (MB) partitions and prediction directions, the largest coding unit (LCU) is initially split according to the input AVC macroblocks. The CU partitions then need to be merged into larger sizes according to the prediction directions of the neighboring PUs.

The computational complexity of selecting the best coding parameters in HEVC has increased with the number of candidates for CU, PU and TU partitions. Because the intra predictor and the motion vector predictor of a CU are generated from neighboring coded CUs, HM 4 performs a preorder recursive traversal of the CU quadtree. When encoding a CU at the current depth, the best PU mode is determined by successively evaluating the RD cost of each inter and intra mode. Some modes are evaluated only conditionally, depending either on the CU size or on fast mode decision algorithms. After the best PU mode is determined, if the current CU is larger than 8x8, it may be further split into four sub-CUs, and the CU compressing function is called recursively on each of them to determine the best CU quadtree structure. The decision on the best TU split tree is integrated into the determination of the best PU mode.

Fig. 9 CU Compressing in HEVC Encoder [1]
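The recursive CU decision described above can be sketched as follows. This is a simplified illustration, not the HM implementation: `rd_cost` is a hypothetical callable standing in for the full evaluation of PU modes at one CU, and the cost of signalling split flags is ignored.

```python
def compress_cu(x, y, size, rd_cost, min_size=8):
    """Choose between coding a CU whole or splitting it into four sub-CUs.

    rd_cost(x, y, size) is assumed to return the best RD cost of coding the
    size x size block at (x, y) without splitting. Returns (cost, tree), where
    tree is a dict whose "split" entry is None for a leaf CU or a list of
    four sub-trees.
    """
    # Evaluate the current CU first (preorder), then its children.
    no_split_cost = rd_cost(x, y, size)
    if size <= min_size:
        return no_split_cost, {"x": x, "y": y, "size": size, "split": None}

    half = size // 2
    split_cost = 0
    children = []
    for dy in (0, half):
        for dx in (0, half):
            cost, tree = compress_cu(x + dx, y + dy, half, rd_cost, min_size)
            split_cost += cost
            children.append(tree)

    if split_cost < no_split_cost:
        return split_cost, {"x": x, "y": y, "size": size, "split": children}
    return no_split_cost, {"x": x, "y": y, "size": size, "split": None}


def count_leaves(tree):
    """Number of leaf CUs in the chosen quadtree."""
    if tree["split"] is None:
        return 1
    return sum(count_leaves(child) for child in tree["split"])
```

A transcoder reduces the work of this recursion by pruning it: when the AVC partitions already indicate that a region is coded with large blocks, the deeper levels need not be evaluated, and where AVC used small partitions the larger CU sizes can be skipped.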
Transcoding of Inter Coded Frames

The major complexity of inter picture coding comes from the motion estimation (ME), motion compensation (MC), transform/quantization (T/Q) and inverse quantization/inverse transform (IQ/IT) operations performed when testing every possible set of coding parameters, that is, every combination of CU size, PU mode and TU mode. These operations can be reduced by using information taken directly from the AVC encoded stream: the motion vectors, the residuals and the prediction modes. The key technique in AVC to HEVC inter picture transcoding is to merge smaller blocks into a larger CU, especially for bit-rate reduction transcoding. Since a large CU may consist of several 4x4 blocks that have different MVs, the merging decision becomes a matter of measuring how the RD cost changes when the MV changes.

[7] Conclusion

Transcoding strategies for AVC to HEVC transcoding with bit-rate reduction are discussed in this proposal. With the input residual, modes and motion vectors of AVC, the PS-RDO model is used to determine the best coding unit splitting quadtree, the best prediction unit and the best motion vector. The number of required RDO evaluations is significantly reduced for both intra and inter picture transcoding. In addition, the motion estimation, motion compensation and fractional pixel interpolation operations are avoided in the proposed inter picture transcoding strategy. The proposed transcoding strategies maintain a good tradeoff between coding efficiency and transcoding complexity.
References:

[1] D. Zhang, B. Li, J. Xu and H. Li, "Fast Transcoding from H.264/AVC to High Efficiency Video Coding", IEEE International Conference on Multimedia and Expo, pp. 651-656, July 2012.
[2] T. Wiegand et al, "Overview of the H.264/AVC video coding standard", IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[3] J. Xin, C.W. Lin and M.T. Sun, "Digital video transcoding", Proceedings of the IEEE, Vol. 93, pp. 84-97, Jan. 2005.
[4] A. Vetro, C. Christopoulos and H. Sun, "Video transcoding architectures and techniques: An overview", IEEE Signal Processing Magazine, Vol. 20, pp. 18-29, March 2003.
[5] S. Matsuo, S. Takamura and A. Shimizu, "Modification of Intra Angular Prediction in HEVC", Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-4, Dec. 2012.
[6] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
[7] I. Kim, J. Min, T. Lee et al, "Block Partitioning Structure in the HEVC Standard", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp. 1697-1706, Dec. 2012.
[8] Q. Cai, L. Song, G. Li et al, "Lossy and Lossless Intra Coding Performance Evaluation: HEVC, H.264/AVC, JPEG 2000 and JPEG LS", Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-9, Dec. 2012.
[9] G. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions", SPIE Conference on Applications of Digital Image Processing XXVII, Vol. 5558, pp. 53-74, Aug. 2004.
[10] T. Wiegand et al, "Introduction to the Special Issue on Scalable Video Coding Standardization and Beyond", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, pp. 1099-1102, Sept. 2007.
[11] T. D. Nguyen et al, "Efficient MPEG-4 to H.264/AVC transcoding with spatial downscaling", ETRI Journal, Vol. 29, No. 6, pp. 826-828, Dec. 2007.
[12] G.J. Sullivan, J. Ohm, W. Han et al, "Overview of High Efficiency Video Coding (HEVC) Standard", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 12, Dec. 2012.
[13] H. Zhang and Z. Ma, "Fast intra prediction for high efficiency video coding", Pacific Rim Conf. on Multimedia, PCM2012, Singapore, Dec. 2012.
Reference Books

[14] K. Sayood, Introduction to Data Compression, Third Edition, Morgan Kaufmann Publishers, 2006.
[15] I.E.G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia, Second Edition, Wiley, 2010.

Websites

[16] http://en.wikipedia.org/wiki/ : Wikipedia, online encyclopedia
[17] http://www-ee.uta.edu/dip/courses/ee5359/index.html : Course website
[18] http://ieeexplore.ieee.org/ : Online archive of IEEE papers
[19] http://www.v-net.tv/hevc-is-game-changer-for-multi-screen-and-iptv/ : Impact of the HEVC standard on the digital media market (cell phones, TVs, etc.)
[20] http://www.streamingmedia.com/articles/editorial/what-is-.../what-is-HEVC-(H.265)-87765.aspx : Summary of HEVC, information site
[21] http://mrutyunjayahiremath.blogspot.com/2010/09/h264-videocodec_22.html : Diagram of H.264 prediction direction modes
[22] http://codesequoia.wordpress.com/2012/10/28/hevc-ctu-cu-ctb-cb-pb-andtb/ : Block coding in HEVC; also links to instructions for making an HEVC stream