Simple intra prediction algorithms for heterogeneous MPEG-2/H.264 video transcoders


Multimed Tools Appl (2008) 38:1–25

Gerardo Fernández-Escribano & Pedro Cuenca & Luis Orozco-Barbosa & Antonio Garrido & Hari Kalva

Published online: 6 July 2007
© Springer Science + Business Media, LLC 2007

G. Fernández-Escribano (*), P. Cuenca, L. Orozco-Barbosa, A. Garrido
Instituto de Investigación en Informática, Universidad de Castilla-La Mancha, Avenida de España s/n, Albacete, Spain
G. Fernández-Escribano: gerardo@dsi.uclm.es
P. Cuenca: pcuenca@dsi.uclm.es
L. Orozco-Barbosa: lorozco@dsi.uclm.es
A. Garrido: antonio@dsi.uclm.es

H. Kalva
Department of Computer Science and Engineering, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
hari@cse.fau.edu

Abstract Recent developments have given birth to H.264/AVC, a video coding standard offering better bandwidth to video quality ratios than MPEG-2. It is expected that H.264/AVC will take over the digital video market, replacing MPEG-2 in most digital video applications. The complete migration to the new video coding algorithm will take several years given the wide-scale use of MPEG-2 in the marketplace today. This creates an important need for MPEG-2/H.264 transcoding technologies. However, given the significant differences between the two encoding algorithms, the transcoding process of such systems is much more complex than other heterogeneous video transcoding processes. In this work, we start by analyzing the methods defined in the H.264 video coding standard for intra prediction, a central element of every H.264 encoder. We then introduce and evaluate six fast intra mode decision algorithms which should enable the development of MPEG-2 to H.264 transcoders. Having evaluated all the proposed methods, we arrive at a highly efficient method, namely DC-ABS pixel. Our results show that our algorithm considerably reduces the complexity involved in the intra prediction with respect to the mode decision algorithms used in the H.264 JM reference software, while exhibiting a slight degradation of the RD function. Finally, we present a comparative study with two of the most
prominent fast intra prediction methods presented in the literature. The results show that the proposed DC-ABS pixel method achieves the best results for video transcoding applications.

Keywords Video transcoding · MPEG-2 · H.264 · Intra prediction

1 Introduction

During the past few years, technological developments such as novel video coding algorithms, lower memory costs and faster processors have facilitated the design and development of highly efficient video encoding standards. Among the recent works in this area, the H.264 video encoding standard, also known as MPEG-4 AVC, occupies a central place [14]. The H.264 standard, jointly developed by the ITU-T and the MPEG committees, is highly efficient, offering perceptually equivalent video quality at about 1/3 to 1/2 of the bitrates offered by the MPEG-2 format [13]. However, these gains come with a significant increase in encoding and decoding complexity [12].

While the H.264 video standard is expected to replace MPEG-2 video over the next several years, a significant amount of research needs to be done in developing efficient encoding and transcoding technologies [19, 21, 22, 26]. The transcoding of MPEG-2 video to the H.264 format is particularly interesting given the wide availability and use of MPEG-2 video nowadays. However, given the significant differences between the MPEG-2 and the H.264 coding algorithms, transcoding is a much more complex task than in other heterogeneous video transcoding architectures [2, 5–8, 10, 11, 18, 25, 29].

H.264 employs a hybrid coding approach similar to that of MPEG-2 but differs significantly from MPEG-2 in terms of the actual coding tools used.
The main differences are: (1) use of an integer transform with energy compaction properties; (2) an in-loop deblocking filter to reduce block artifacts; (3) multi-frame references for inter-frame prediction; (4) entropy coding; (5) variable block size for motion estimation and (6) intra prediction. The H.264 standard introduces several other new coding tools aiming to improve the coding efficiency. A complete overview of H.264 can be found in [27].

Due to the differences between the two standards, MPEG-2 and H.264, the development of real-time transcoders is a very challenging task. A heterogeneous transcoder is composed of an MPEG-2 decoder and an H.264 encoder connected in tandem. The idea behind the design of an efficient heterogeneous MPEG-2/H.264 transcoder can be simply stated as follows: the MPEG-2 decoder should provide the H.264 encoder with all the pieces of information that the latter may use to speed up the encoding process. The design of such a transcoder requires an in-depth analysis and evaluation of the various steps involved in the H.264 encoding process. In [17], Kalva states that the main elements of the encoding process that need to be addressed are the motion estimation, the transform coding and the intra prediction. Each of these elements needs to be examined, and various research efforts are underway.

In this paper, we focus our attention on the intra prediction module. We start by studying the two mode decision algorithms used in the H.264 JM reference software to obtain the best prediction mode and directions for a given macroblock. We evaluate and analyze the results obtained when using these two methods in terms of their computational complexity and rate-distortion performance. As part of this analysis, we show that intra prediction plays a major role not only in the encoding process of the intra frames (I-pictures) but also in the encoding process of the inter frames (P-pictures).
From an exhaustive analysis of the results obtained using more than 120,000 samples (macroblocks), we derive various simple

methods suitable for integration into a real-time MPEG-2/H.264 transcoder. The proposed methods are evaluated in terms of their computational complexity and rate-distortion performance. From this analysis, we arrive at a simple intra prediction method exhibiting a good compromise between computational complexity and rate-distortion performance.

The rest of the paper is organized as follows. Section 2 reviews the principles of operation of the intra prediction process used by the H.264 encoding standard. We also analyze the performance of the two mode decision algorithms used in JM 9.3. In Section 3, we review some of the most relevant proposals aiming to speed up the intra prediction process. Section 4 introduces our prediction algorithms specifically designed for MPEG-2 to H.264 transcoders. In Section 5, we carry out a performance evaluation of the proposed algorithms in terms of their computational complexity and rate-distortion results. We compare the performance of our proposals to the RD-cost and SAE-cost methods used in the H.264 JM reference software and to the other methods proposed in the literature. Finally, Section 6 draws our conclusions and outlines our future research plans.

2 Intra prediction in H.264

H.264 incorporates into its coding process an intra prediction (defined within the pixel domain) whose main aim is to improve the compression efficiency of the intra coded blocks. This process is carried out at the macroblock (MB) level. Throughout this section, we illustrate the principle of operation of the intra prediction as applied to the luminance and chrominance blocks.

The prediction of an MB from the previously encoded MBs in the same picture is new in H.264. Unlike the MPEG-2 standard, the H.264 standard makes use of three different block sizes: 4×4, 8×8 and 16×16 pixels. For each block size, there is an associated intra prediction mode.
Each intra prediction mode includes several directional predictions, greatly improving the prediction in the presence of directional structures. Furthermore, for each MB and for each color component (Y, U, V), one prediction mode and one set of prediction directions have to be obtained. With intra prediction, the I-pictures and the intra MBs (belonging to P and B frames) can be encoded more efficiently than in MPEG-2, which does not support intra prediction.

For the luminance component, an MB may make use of 4×4 and 16×16 block prediction modes, referred to as Intra_4×4 and Intra_16×16, respectively. Recently, the Intra_8×8 block prediction mode has been added as part of the fidelity range extensions (FRExt) of the standard. There are nine possible 4×4 and 8×8 block prediction directions and four 16×16 block prediction directions. Figure 1 depicts the nine and four prediction directions for the 4×4, 8×8 and 16×16 prediction modes, respectively.

For the chrominance component, an MB may make use of only the 8×8 block prediction mode, with four 8×8 block prediction directions. The prediction directions for the chrominance 8×8 prediction mode (not shown in the figure) are similar to the ones used for the 16×16 prediction mode in the luminance component.

In the H.264 JM reference software, two mode decision algorithms are used to obtain the best prediction mode and prediction direction for the intra prediction coding, namely the SAE-cost and RD-cost methods.

2.1 The SAE-cost method

The H.264 JM reference software encoder selects the best mode/directions combination by using the sum of absolute errors (SAE). This implies that for each existing direction of each

mode, the predictor within the pixel domain is created from the boundary pixels of the current partition and the SAE costs are evaluated. As already stated, for each MB and for each color component (Y, U, V), one prediction mode and one set of prediction directions have to be obtained. The best mode/directions combination is the one exhibiting the minimum SAE cost.

Fig. 1 Prediction directions for the different prediction modes. a Intra_4×4 prediction mode. b Intra_8×8 prediction mode. c Intra_16×16 prediction mode
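The SAE-based selection described in Section 2.1 can be illustrated with a small sketch. It is not the JM implementation: it handles a single 4×4 luma block, models only the directions P0 (vertical), P1 (horizontal) and P2 (DC), and the function names and data layout are assumptions made for this example.

```python
# Illustrative sketch of SAE-based direction selection for one 4x4 luma block.
# Only P0 (vertical), P1 (horizontal) and P2 (DC) are modelled; names and the
# data layout are assumptions for this example, not taken from the JM software.

def predict_4x4(direction, top, left):
    """Build a 4x4 predictor from the 4 pixels above (top) and the 4 to the left."""
    if direction == 0:      # P0: copy the upper border down each column
        return [list(top) for _ in range(4)]
    if direction == 1:      # P1: copy the left border across each row
        return [[left[r]] * 4 for r in range(4)]
    if direction == 2:      # P2: fill with the rounded mean of both borders
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only P0, P1 and P2 are modelled here")

def sae(block, predictor):
    """Sum of absolute errors between a block and its predictor."""
    return sum(abs(b - p)
               for brow, prow in zip(block, predictor)
               for b, p in zip(brow, prow))

def best_direction_sae(block, top, left):
    """Return (direction, cost) with the minimum SAE among P0, P1 and P2."""
    costs = {d: sae(block, predict_4x4(d, top, left)) for d in (0, 1, 2)}
    best = min(costs, key=costs.get)
    return best, costs[best]

# A block whose columns repeat the upper border is predicted exactly by P0.
block = [[10, 20, 30, 40] for _ in range(4)]
print(best_direction_sae(block, top=[10, 20, 30, 40], left=[12, 12, 12, 12]))
```

A full encoder would repeat this search for every direction of every mode and every block of the macroblock, which is where the cost of the exhaustive SAE method comes from.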

2.2 The RD-cost method

The rate-distortion (RD) optimization method is based on a Lagrange multiplier [23, 28]. The standard makes use of this optimization method to choose the best decision mode for a macroblock. Rather than evaluating the cost of coding a macroblock on a pixel-by-pixel basis (SAE cost), the RD-cost method makes the selection based on a Lagrange function. In this way, the H.264 encoder selects the decision mode exhibiting the minimum Lagrange cost. However, one of the main drawbacks of this method is its excessive computational cost. For many applications, the use of the Lagrange multiplier may be prohibitive. This is the case when developing a transcoding architecture intended to work in real time.

To evaluate the RD-cost, the H.264 JM reference software has to obtain the encoding rate, R, and the distortion, D, of each macroblock. The former is obtained by first computing the difference between the original macroblock and its predictor. Thereafter, a 4×4 Hadamard transform (HT) is applied, followed by a quantization process. The distortion, D, is obtained by performing an inverse quantization followed by the inverse HT and then comparing the original macroblock to the reconstructed one. JM 9.3 then chooses the decision mode having the minimum cost, J. The cost is evaluated using the Lagrange function J = D + λ·R, where λ is the Lagrange multiplier.

2.3 Performance analysis

In this section, we undertake the performance evaluation of the two aforementioned intra prediction methods included in the H.264 JM reference software. We have used version 9.3 [16]. In our first experiments, we have used various video sequences exhibiting different spatial characteristics and different size formats (CCIR: Hook and Flower video sequences, CIF: Tempete and Mobile video sequences and QCIF: Akiyo and Dancer video sequences).
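As a rough illustration of the Lagrangian selection rule of Section 2.2, the sketch below picks the mode with the minimum cost J = D + λ·R. The candidate modes and their (D, R) values are made-up numbers chosen only to show how λ shifts the decision; they are not measurements, and the function names are hypothetical.

```python
# Minimal sketch of Lagrangian mode selection, J = D + lambda * R.
# The (distortion, rate) pairs below are invented for illustration only.

def rd_cost(distortion, rate, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate

def best_mode_rd(candidates, lam):
    """candidates maps a mode name to (D, R); return the mode with minimum J."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))

candidates = {
    "Intra_4x4": (100.0, 60.0),    # low distortion, many bits
    "Intra_16x16": (180.0, 20.0),  # higher distortion, few bits
}
print(best_mode_rd(candidates, lam=1.0))  # J = 160 vs 200
print(best_mode_rd(candidates, lam=4.0))  # J = 340 vs 260
```

The expensive part in practice is not this comparison but obtaining D and R for every candidate, since each one requires a transform, quantization and reconstruction pass.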
The original uncompressed sequences have been encoded using QP factors ranging from 5 up to 45 in steps of 5. This corresponds to the H.264 QP range used in most practical applications. Since we have focused our attention on the intra prediction method, every frame of each sequence has been encoded as an I-frame.

Fig. 2 Normalized computational cost

Figure 2 shows the normalized computational cost (with respect to no intra prediction) for encoding a frame (using a QP factor of 10) for different video sequences and size formats. As seen from the figure, the RD-cost method has the highest computational cost. The SAE-cost method exhibits a normalized computational cost one order of magnitude

lower than the RD-cost method. It is for this reason that we look further at alternative methods to speed up the intra prediction based on the principles of operation of the SAE-cost method. Towards this end, we start by analyzing the statistics of the results obtained when applying the SAE-cost method.

In a second experiment, we have evaluated the number of times each intra prediction direction is selected when deriving the best mode. For this second experiment, we have used a database of more than 120,000 samples (MBs) available at sampl.eng.ohio-state.edu/~sampl/database.htm. We have found that the three prediction directions P0, P1 and P2 are used in more than 50% of the cases (Fig. 3). This feature can be exploited to reduce the complexity of the intra prediction process without heavily penalizing the quality of the video sequence. That is to say, we can propose a method making use of only the prediction directions P0, P1 and P2.

Further to our previous observation, it is also important to analyze the modes being used as a function of the QP value. Table 1 shows the different prediction modes used by the SAE-cost function when applied to the six video sequences used in our experiments. Each sequence has been encoded using nine different QPs. For each sequence encoded using a given QP value, the table gives the utilization percentage of each mode. For instance, in the case of the Hook sequence encoded with QP=5, the 8×8 mode was used 1.81% of the time and the 16×16 mode was never used, the 4×4 mode accounting for the remainder. From the table, it is clear that for low values of QP, the SAE-cost function uses mostly the 4×4 mode. As the QP increases, the other two modes are used more and more, up to the point that either the 8×8 or the 16×16 mode is preferred over the 4×4 mode. The definition of a very simple mechanism can therefore be based on the above findings.
Fig. 3 Histogram of H.264 intra prediction directions

Before deriving a highly efficient method based on the above observation, we start by evaluating and comparing the RD performance of a modified version of the SAE-cost method. In this method, we propose to make use of only the three prediction directions P0, P1 and P2. We compare its performance with the results obtained by the RD-cost method and the SAE-cost method. It is clear that the RD-cost method provides the optimal RD result, i.e., the best mode/directions for a given MB. Furthermore, we also provide the worst rate-distortion results by disabling the intra prediction mechanism. In this case, the encoder fills the predicted blocks with zeros, so that the difference between the original macroblock and the predicted macroblock is the original macroblock itself.

Figure 4 shows the rate-distortion function for the RD-cost, SAE-cost and No Intra Prediction methods for CIF sequences. Similar results have been obtained for CCIR and QCIF encoded sequences (not shown in the figure). The averaged PSNR values of luma (Y)

and chroma (U, V) are used in the rate-distortion graphs. The averaged global PSNR is based on the following equation:

$$\mathrm{PSNR} = \frac{4\,\mathrm{PSNR}_Y + \mathrm{PSNR}_U + \mathrm{PSNR}_V}{6} \qquad (1)$$

Table 1 Prediction modes used with SAE-cost (%) (columns: Sequence, QP, Mode; sequences: Hook, Flower, Tempete, Mobile, Dancer, Akiyo)

As expected, the best results are obtained when the RD-cost method is applied. At the other end, the worst results are obtained when no intra prediction is used. From these preliminary results, we can conclude that there is a tradeoff between the computational cost and the rate-distortion function. As expected, the SAE-cost method obtains good savings in terms of computational cost at the expense of a slightly lower rate-distortion performance with respect to the RD-cost method. This feature makes this method a good candidate for applications where time is a critical parameter.

Figure 4 also shows the results obtained when applying only the three prediction directions (using all prediction modes, 4×4, 8×8 and 16×16) with respect to the results obtained when applying the full estimation methods, i.e., the RD-cost and SAE-cost methods. As seen from the figures, the PSNR obtained when applying only the P0, P1 and P2 prediction directions deviates only slightly from the results obtained when using the considerably more complex full estimation procedure.
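For reference, the weighted average of Eq. 1 used in the rate-distortion graphs amounts to the following one-line computation. The function name and the sample values are illustrative only.

```python
# Sketch of the averaged global PSNR of Eq. 1: luma is weighted four times
# as much as each chroma component. Input values below are arbitrary examples.

def global_psnr(psnr_y, psnr_u, psnr_v):
    """Return (4 * PSNR_Y + PSNR_U + PSNR_V) / 6, all values in dB."""
    return (4.0 * psnr_y + psnr_u + psnr_v) / 6.0

print(global_psnr(36.0, 42.0, 42.0))  # (144 + 42 + 42) / 6 = 38.0
```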

Fig. 4 Rate-distortion results

From our results, we can conclude that the SAE-cost method could be the method of choice when time is a critical parameter. Finally, based on the results reported in this section and depending on the image quality requirements, the use of only the P0, P1 and P2 predictors may prove a viable solution when computational cost is an issue. In Section 3, we review some of the recent proposals to speed up the intra prediction process. In Section 4, we then introduce our proposals.

2.4 Intra prediction in P frames

As already mentioned, intra prediction is not only applicable to the encoding of the I frames of a video sequence. In fact, it can be applied to all three types of frames in a video sequence, i.e., I, P and B frames. In this section, through a set of examples, we evaluate the percentage of macroblocks belonging to P frames that are actually encoded with intra prediction. From our results, we also show that the number of macroblocks encoded as intra depends heavily on the method being used, i.e., the SAE-cost or the RD-cost method. The main aim of this study is to show that intra prediction plays a major role in the encoding process and therefore in the transcoding process, and hence in the development of real-time transcoders.

In our study, we have run a set of experiments using the H.264/AVC reference software, version JM 9.3 [16]. Throughout our experiments, the H.264 video encoder has been fed with a decoded high-quality MPEG-2 video stream (CCIR-601 MPEG-2 encoded sequences at 5 Mbps). Since our main aim has been to show the percentage of macroblocks to be encoded using intra prediction during the transcoding of the P frames of the MPEG-2

stream into H.264 P frames, the original MPEG-2 video streams were encoded without B frames. The size of the GOP has been set to 12 frames, at a frame rate of 25 frames per second, with the first frame of every GOP encoded as an I-frame and the remaining frames of the GOP encoded as P-frames. The rate control process was disabled during our experiments. The ProfileIDC was set to High for all the simulations, and the FRExt options were enabled.

Table 2 shows the average percentage of macroblocks in the P frames encoded as intra by the H.264 video encoder for the Hook, Flower and Martin video sequences over the whole QP range. From the results, it is clear that the H.264 encoder applies the intra prediction process to a large percentage of the P-frame macroblocks. Furthermore, the most computationally intensive method, the RD-cost optimization method, encodes a higher number of P-frame macroblocks as intra as the quality of the encoded stream is increased. From this analysis, it is clear that the intra prediction process plays a major role in the overall transcoding process.

3 Related work

A large number of initiatives are underway to develop fast intra prediction methods for transcoding applications. In [3], Bialkowski et al. have shown that there are basic pattern similarities between the frequency-domain prediction of H.263 and the spatial prediction of H.264. The authors state that one of the main challenges arises from the fact that H.264 performs the intra block prediction by making use of the neighbouring pixel values, while H.263 defines the prediction process within the frequency domain (DCT coefficients). In the transcoding process from H.264 to H.263, the direction has to be remapped to one H.263 direction depending on the mode and direction or directions used in H.264. This remapping can be performed by using a set of pre-established values stored in a predefined table.
For the transcoding from H.263 into H.264, the direction used in the H.263 macroblock is also used to estimate the H.264 direction. The decision between the Intra_4×4 and Intra_16×16 modes is made by evaluating a residual error. If the residual error is lower than a pre-established threshold, the Intra_16×16 mode is used; otherwise the Intra_4×4 mode is used. A small residual error implies a high similarity between the predicted block and its predictor. A high residual error, however, indicates that there may be a better direction, which requires additional processing to determine. The proposed algorithms can be used for speeding up the transcoding between H.263 and H.264. They can be integrated into a QP-based rate control mechanism. Since the intra prediction in H.263 has only one prediction mode with three prediction directions, it turns out to be very simple to approximate its value by reusing the information available from H.264 to select the best direction.

Table 2 Percentage of intra macroblocks in P frames (average)
Sequence   H.264 (RD-cost; %)   H.264 (SAE-cost; %)
Hook
Flower     45                   8
Martin     49                   28

In [4], Chen et al. have considered the problem of converting an MPEG-2 video sequence to the H.264 standard with the image resolution reduced to half of the original size. For the MPEG-2/H.264 transcoding, they proposed performing the intra prediction in the transform

domain. In their proposal, for the prediction directions 0, 1 and 2, the transform-domain prediction directions are easily calculated. However, when all modes are included, the complexity of the process increases sharply. They are currently looking at how to reduce the complexity of the transformation. Moreover, the authors do not provide any experimental results.

Another proposal for intra prediction in the transform domain has been introduced by Xin et al. [30]. The authors propose to carry out the MPEG-2 to H.264 intra prediction by working in the 4×4 transform domain (HT) used by H.264. They use the mechanism proposed in [31] for translating the incoming 8×8 DCT blocks of an MPEG-2 encoded sequence into four 4×4 HT blocks in H.264. The HT of the prediction direction blocks needs to be performed for every prediction mode. To this end, Xin et al. propose using an HT kernel matrix for calculating the intra prediction directions. The distortion is also calculated in the transform domain. However, the inverse HT used in the H.264 specification is not strictly linear, and an error can be introduced when applying the integer shift operation. As with the previous proposal, the authors do not provide experimental results, and the number of required operations is quite high.

On the other hand, fast mode decision algorithms for intra prediction in H.264/AVC video coding have recently been proposed. In the following, we analyze two of the most prominent ones. In [20], a fast mode decision algorithm based on local edge information is presented. Prior to intra prediction, an edge map is created and a local edge direction histogram is then established for each sub-block. Based on the distribution of the edge direction histogram, only a small subset of the intra prediction modes is chosen for the rate-distortion optimization calculation.
Experimental results show that this scheme increases the speed of intra prediction significantly with negligible loss of PSNR.

In [24], the authors propose a mode decision scheme for 4×4 intra prediction in the H.264 encoder to reduce its complexity. The authors use the characteristics of each prediction mode in terms of reducing its power in the DCT domain, since reducing the frequency components with higher power is considered to improve the efficiency of prediction. The proposed method reduces the number of candidate prediction modes by classifying the frequency characteristics of the 4×4 block, which are computed from the low-frequency components of its DCT coefficients. Experimental results show that the proposed scheme reduces the complexity by nearly 60% with negligible loss of rate-distortion performance.

4 Fast intra mode decision algorithms

In this section, we introduce six different intra prediction methods aiming to speed up the intra MB prediction. This goal is achieved by making use of the DC coefficients available from the MPEG-2 decoding process, and by reducing the computational requirements at the expense of a slight degradation of the RD performance. The last two methods to be introduced are further enhanced by dynamically adjusting the parameters being used. In the following, we describe one by one the main steps of our algorithms.

4.1 Computation of the DC coefficients of the decoded MPEG-2 video image blocks

Since H.264 uses three different block sizes, namely 4×4, 8×8 and 16×16, while the MPEG-2 standard uses only 8×8 blocks, the evaluation of the prediction mode involves an intermediate scaling process. In an MPEG-2/H.264 video transcoder, once the MPEG-2 video has been decoded, the DC coefficients of the 8×8 blocks (Y, U, V) are readily available to the H.264 video encoder in addition to the uncompressed video. Since the MPEG-2 makes

use of only 8×8 blocks, we need to devise a mechanism allowing us to properly compute the DC coefficients of the 4×4 and 16×16 blocks. Figure 5a and c depict the procedure for computing the DC coefficients of the four 4×4 blocks (DC_4) and the one associated with the 16×16 block (DC_16). The DC coefficients of the 8×8 blocks (DC_8) are directly obtained by reusing the information coming from the MPEG-2 decoding process (Fig. 5b).

As seen from Fig. 5a, the process to obtain the four DC coefficients of the 4×4 blocks involves first applying the inverse DCT to each 8×8 block of the decoded MPEG-2 picture. This step regenerates the 8×8 block in the spatial domain (pixel-domain values are needed anyway). The DC coefficient of each 4×4 block is then the sum of all the pixels of the block divided by 4. In this case, we do not reuse the DC coefficients of the MPEG-2 blocks, because this solution is faster than other mechanisms, such as the one proposed in [9].

To obtain the DC coefficients of the 8×8 blocks, no additional operations are required. This information is available in the decoded sequence.

The DC coefficient of the 16×16 block can be obtained as follows:

$$DC_{16} = \frac{8\,DC_8^1 + 8\,DC_8^2 + 8\,DC_8^3 + 8\,DC_8^4}{16} = \frac{DC_8^1 + DC_8^2 + DC_8^3 + DC_8^4}{2} \qquad (2)$$

that is to say, by adding the DC coefficients of the four corresponding 8×8 blocks ($DC_8^1$, $DC_8^2$, $DC_8^3$, $DC_8^4$) and then dividing the result by two. Equation 2 is simply derived from the fact that the DC coefficient of an N×N block is nothing else but the sum of all the pixels within the block divided by N. This conversion procedure is depicted in Fig. 5c.

Fig. 5 Resolution conversion method. a 8×8 to 4×4, b 8×8 to 8×8, c 8×8 to 16×16
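The resolution conversion of Section 4.1 can be checked numerically with the short sketch below. It adopts the convention stated in the text (DC of an N×N block = sum of its pixels divided by N). The helper names and the synthetic macroblock are assumptions made for this example; in a real transcoder the 8×8 DC values would come from the MPEG-2 decoder rather than being recomputed from pixels.

```python
# Sketch of the DC conversions of Section 4.1, using the convention
# DC(N x N block) = sum of pixels / N. Helper names are illustrative.

def dc_from_pixels(block):
    """DC coefficient of a square block computed in the pixel domain."""
    n = len(block)
    return sum(sum(row) for row in block) / n

def split(mb, size):
    """Split a square macroblock into size x size sub-blocks, raster order."""
    n = len(mb)
    return [[row[c:c + size] for row in mb[r:r + size]]
            for r in range(0, n, size) for c in range(0, n, size)]

def dc16_from_dc8(dc8_values):
    """Eq. 2: DC of the 16x16 block from the DCs of its four 8x8 blocks."""
    return sum(dc8_values) / 2

# Synthetic 16x16 macroblock with pixel values 0..255.
mb = [[r * 16 + c for c in range(16)] for r in range(16)]
dc8 = [dc_from_pixels(b) for b in split(mb, 8)]   # stands in for decoder output
dc4 = [dc_from_pixels(b) for b in split(mb, 4)]   # sixteen values, from pixels
print(dc16_from_dc8(dc8), dc_from_pixels(mb))     # both routes agree
```

The point of Eq. 2 is visible in the last line: the 16×16 DC is obtained from four already-available numbers instead of a pass over 256 pixels.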

4.2 Computation of the DC coefficients of the H.264 predictors

The computation of the DC coefficient of the intra luma and chroma block prediction directions of the H.264 standard is a straightforward procedure. Let us take the example of computing the vertical predictor (P0) involved in the 4×4 intra luma mode prediction (see Fig. 1a). The predictor is created by copying the values of the upper border pixels into all the entries within the same column (see Fig. 6). According to the DCT, the DC coefficient of the predictor is given by:

$$DC = a + b + c + d \qquad (3)$$

where a, b, c and d are the neighboring pixels in the upper border. Equation 3 is simply derived from the fact that the DC coefficient of an N×N block is nothing else but the sum of all the pixels within the block divided by N. In this simple form, we are able to compute the DC coefficient of the vertical prediction (P0) involved in the Intra_4×4 prediction mode. Similarly, this process can be applied to obtain all the other predictors.

4.3 Selection of the prediction mode and prediction directions

The third step of our proposed algorithm consists in obtaining the prediction mode and predictors (prediction directions) for each macroblock and for each color component. In the following, we present several proposals, named the 4×4, 8×8 and 16×16 methods, the empirical method, and the DC-ABS and DC-ABS pixel methods. As a further feature allowing us to speed up this process, we only consider the use of the prediction directions 0, 1 and 2 (for all luma and chroma prediction modes). We base this choice on our findings reported in Section 2.3.
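The DC computation of Section 4.2 can be sketched as follows for the 4×4 case. Only the vertical case (Eq. 3) is given explicitly in the text; the horizontal and DC cases below are derived by the same argument and should be read as an assumption of this example, as should the function names and the rounding used for the P2 mean.

```python
# Sketch of predictor DC values for the 4x4 mode, with DC = sum of pixels / N.
# Only dc_vertical_4x4 corresponds to Eq. 3; the other two are derived by
# analogy and are assumptions of this example.

def dc_of_block(block):
    """Pixel-domain DC of a square block, used here only as a cross-check."""
    return sum(map(sum, block)) / len(block)

def dc_vertical_4x4(top):
    """P0: each column repeats one top pixel, so DC = 4*(a+b+c+d)/4 = a+b+c+d."""
    return sum(top)

def dc_horizontal_4x4(left):
    """P1: each row repeats one left pixel, so DC = sum of the left border."""
    return sum(left)

def dc_dc_4x4(top, left):
    """P2: all 16 pixels equal the rounded border mean m, so DC = 16*m/4 = 4*m."""
    m = (sum(top) + sum(left) + 4) >> 3
    return 4 * m

top, left = [10, 20, 30, 40], [12, 12, 12, 12]
explicit_p0 = [list(top) for _ in range(4)]        # full predictor, for checking
print(dc_vertical_4x4(top), dc_of_block(explicit_p0))
print(dc_horizontal_4x4(left), dc_dc_4x4(top, left))
```

In each case the DC is obtained from the border pixels alone, without building the 16-pixel predictor, which is what makes the comparison in the DC domain cheap.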
As we will show in the following sections, the proposed methods significantly reduce the number of operations involved in the calculation of the intra predictors when compared to the more classical method of computing the differences on a full estimation, pixel-by-pixel basis (SAE-cost method).

Fig. 6 Example of the generation of the vertical predictor (P0)

4.3.1 4×4, 8×8 and 16×16 methods

As a feature allowing us to speed up this process, this first set of methods only considers the use of a fixed prediction mode: Intra_4×4, Intra_8×8 or Intra_16×16. When the 16×16 prediction mode is selected, the best predictor is simply obtained by taking the one whose DC coefficient (obtained as shown in Section 4.2) exhibits the lowest absolute (ABS) difference with respect to the DC coefficient of the original block (obtained as shown in Section 4.1). As for the 16×16 prediction mode, for the 8×8 prediction mode we determine the predictor whose DC coefficient exhibits the lowest absolute difference with respect to the DC coefficient of the original block for each one of the 8×8

blocks of the macroblock. Similarly, for the 4×4 prediction mode, the best predictor is obtained for each one of the 4×4 blocks of the macroblock.

4.3.2 Empirical method

Based on the previous results, we have also implemented an empirical method to assign the best prediction mode among the Intra_4×4, Intra_8×8 and Intra_16×16 prediction modes. From an analysis of the preliminary results presented in Section 2.3 for the six video sequences under study, we have found that Intra_8×8 may be the best prediction mode for video sequences encoded with a QP within the range [15, 45], while Intra_4×4 may be the best prediction mode for a QP in the range [5, 10]. In the next section, we carry out a more in-depth analysis of this finding. We then propose to use the intra mode providing the best results for a given QP factor.

4.3.3 DC-ABS and DC-ABS pixel methods

Finally, in order to obtain the best prediction mode for a given macroblock, independently of the QP factor and dynamically adapted to video sequences exhibiting different spatial characteristics, we propose the DC-ABS and DC-ABS pixel methods (see Fig. 7). We proceed as follows: in the case of the 16×16 prediction mode, the best predictor is simply obtained by taking the one whose DC coefficient exhibits the lowest absolute (ABS) difference with respect to the DC coefficient of the original block. As for the 16×16 prediction mode, for the 8×8 prediction mode we determine the predictor whose DC coefficient exhibits the lowest absolute difference with respect to the DC coefficient of the original block for each one of the 8×8 blocks of the macroblock. Similarly, for the 4×4 prediction mode, the best predictor is obtained for each one of the 4×4 blocks of the macroblock. This step is depicted in the box at the bottom of Fig. 7. In the figure, PX_j denotes the resulting predictor, with X = 0, 1, 2.
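The per-block predictor selection described above reduces to comparing a handful of DC values. A minimal sketch, assuming the DC of the original block and the DC of each candidate predictor (P0, P1, P2) have already been computed; the function name and the sample numbers are hypothetical:

```python
# Sketch of the per-block predictor choice: pick the direction X in {0, 1, 2}
# whose predictor DC is closest (in absolute value) to the DC of the block.

def best_predictor_by_dc(dc_block, dc_predictors):
    """dc_predictors maps direction X to the DC of predictor PX.

    Returns (X, error), where error = |dc_block - DC(PX)| is the smallest
    absolute DC difference among the candidates.
    """
    errors = {x: abs(dc_block - dc) for x, dc in dc_predictors.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]

# Arbitrary example values: errors are 10 (P0), 4 (P1) and 3 (P2).
print(best_predictor_by_dc(100, {0: 90, 1: 104, 2: 97}))
```

The returned error is the quantity that the mode decision step accumulates over the blocks of the macroblock.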
Thereafter, we propose two different methods for obtaining the best prediction mode (4×4, 8×8 or 16×16): DC-ABS and DC-ABS pixel (see Fig. 7). In the DC-ABS method, the best prediction mode corresponds to the one having the lowest cumulative error, Eqs. (4)-(7):

$$\min\left(\mathrm{Error}_{4\times4},\ \mathrm{Error}_{8\times8},\ \mathrm{Error}_{16\times16}\right) \qquad (4)$$

where the cumulative error for each one of the block sizes is defined by:

$$\mathrm{Error}_{4\times4} = \sum_{j=1}^{16} E_j^{4} \qquad (5)$$

$$\mathrm{Error}_{8\times8} = \sum_{j=1}^{4} E_j^{8} \qquad (6)$$

$$\mathrm{Error}_{16\times16} = E_1^{16} \qquad (7)$$

where $E_j^{n} = \min(e_1, e_2, e_3)$ (see Fig. 7). Similarly to the DC-ABS method, the DC-ABS pixel method determines the prediction mode with the lowest cumulative error (4). However, this cumulative error is now evaluated in the pixel domain, Eqs. (8)-(10):

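Fig. 7 DC-ABS and DC-ABS pixel methods

$$\mathrm{Error}_{4\times4} = \sum_{i=1}^{16}\sum_{j=1}^{16} \left| M_{\mathrm{original}}(i,j) - M_{\mathrm{prediction}\,4\times4}(i,j) \right| \qquad (8)$$

$$\mathrm{Error}_{8\times8} = \sum_{i=1}^{16}\sum_{j=1}^{16} \left| M_{\mathrm{original}}(i,j) - M_{\mathrm{prediction}\,8\times8}(i,j) \right| \qquad (9)$$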

$$\mathrm{Error}_{16\times16} = \sum_{i=1}^{16}\sum_{j=1}^{16} \left| M_{\mathrm{original}}(i,j) - M_{\mathrm{prediction}\,16\times16}(i,j) \right| \qquad (10)$$

where $M_{\mathrm{original}}(i,j)$ is the pixel (i, j) in the original macroblock, and $M_{\mathrm{prediction}}(i,j)$ is the pixel (i, j) in the predicted macroblock. As we will show in the following section, the proposed algorithms significantly reduce the number of operations involved in the calculation of the intra predictors when compared to the full estimation SAE-cost method used in the H.264 JM reference software.

5 Performance evaluation

In order to evaluate the simple intra prediction algorithms for heterogeneous MPEG-2/H.264 video transcoders proposed in this work, we have implemented the proposed approaches on top of the H.264 reference software [16] (JM version 9.3). The metrics of interest are the computational cost and the rate-distortion function. The averaged PSNR values of luma (Y) and chroma (U, V) have been obtained using Equation 1. Throughout our experiments, we have used the six video sequences previously introduced. The H.264 video encoder takes as input the decoded MPEG-2 video (pixel data) and the DC coefficients of the 8×8 blocks. The input to the transcoder is the highest quality MPEG-2 video (see Table 3) in order to obtain a bitstream with the highest spatial detail. Since the proposed transcoder addresses the transcoding of MPEG-2 I frames to H.264 I frames, the MPEG-2 bitstreams were created without P and B frames. Then, every frame of each sequence was encoded as an I-frame with H.264 in order to obtain results for intra frame prediction only. The sequences have been encoded using QP factors ranging from 5 up to 45 in steps of 5. This corresponds to the H.264 QP range used in most practical applications. The ProfileIDC was set to high for all the simulations, with the FRExt options enabled. It should be clear that, under these experimental conditions, the H.264 RD curves are generated using MPEG-2 bitstreams encoded with Q=1 and then transcoded to H.264 while changing the QP factor from QP=5 to QP=45. For this reason, the highest possible video quality in the H.264 RD curves (see Figs. 9 and 11) is the video quality corresponding to the MPEG-2 bitstream (see Table 3).

Table 3 Characteristics of MPEG-2 video bitstreams (columns: Sequence, Q, PSNR (dB), Rate (Kbps); sequences: Hook, Flower, Tempete, Mobile, Dancer, Akiyo)

Fig. 8 Computational cost. Number of operations per MB

Fig. 9 Rate distortion results. a CCIR. b CIF. c QCIF
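Before turning to the results, the two mode-decision rules of Eqs. (4)-(10) can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, macroblocks are assumed to be plain 16×16 lists of luma values, and the per-block errors $E_j^{n}$ and the three predicted macroblocks are assumed to be available already.

```python
def dc_abs_mode(errors_4x4, errors_8x8, error_16x16):
    """DC-ABS rule, Eqs. (4)-(7): choose the mode with the lowest summed DC error.

    errors_4x4  -- the 16 per-block errors E_j^4
    errors_8x8  -- the 4 per-block errors E_j^8
    error_16x16 -- the single error E_1^16
    """
    totals = {
        "4x4": sum(errors_4x4),    # Eq. (5)
        "8x8": sum(errors_8x8),    # Eq. (6)
        "16x16": error_16x16,      # Eq. (7)
    }
    return min(totals, key=totals.get), totals  # Eq. (4)


def pixel_error(original, prediction):
    """Sum of absolute pixel differences over a macroblock, Eqs. (8)-(10)."""
    return sum(
        abs(o - p)
        for row_o, row_p in zip(original, prediction)
        for o, p in zip(row_o, row_p)
    )


def dc_abs_pixel_mode(original, predictions):
    """DC-ABS pixel rule: same minimum as Eq. (4), with pixel-domain errors.

    predictions -- dict mapping a mode name to its predicted 16x16 macroblock
    """
    totals = {mode: pixel_error(original, pred) for mode, pred in predictions.items()}
    return min(totals, key=totals.get), totals


# Hypothetical flat macroblock and three candidate predictions.
original = [[10] * 16 for _ in range(16)]
predictions = {
    "4x4": [[11] * 16 for _ in range(16)],
    "8x8": [[10] * 16 for _ in range(16)],
    "16x16": [[12] * 16 for _ in range(16)],
}
mode, totals = dc_abs_pixel_mode(original, predictions)
print(mode, totals)
```

The two rules share the final minimum and differ only in where the error is measured: DC-ABS adds up at most 16 DC differences per mode, whereas DC-ABS pixel visits all 256 pixels per mode, which explains the difference in operation counts between them.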

5.1 4×4, 8×8 and 16×16 methods

Figure 8 shows the mean number of operations per MB used by the H.264 JM reference software full estimation SAE-cost approach and by the simplified methods using Intra_4×4, Intra_8×8 or Intra_16×16, showing the large reduction in computational complexity achieved by our proposed schemes. Let us recall that, as mentioned in Section 2.3, the RD-cost approach exhibits a normalized computational cost one order of magnitude higher than the SAE-cost approach; it is not shown in Fig. 8. Figure 9 shows the RD results of applying the full estimation algorithms (SAE-cost and RD-cost), no intra prediction, and our methods using Intra_4×4, Intra_8×8 or Intra_16×16 to the six different video sequences. As seen from the figures, the PSNR obtained when applying these simple methods deviates slightly from the results obtained when applying the considerably more complex full estimation procedures. As expected, the difference is less noticeable at lower bit rates: the blocking effect is more noticeable, i.e. the DC coefficient has a heavier weight. From the results presented in Fig. 9, it is clear that the Intra_8×8 prediction mode offers the best results for QP=15 to QP=45, and the Intra_4×4 prediction mode for QP=5 to QP=10. Based on the results depicted in Figs. 8 and 9, and depending on the image quality requirements, the use of these simple methods may prove a viable solution when computational cost is an issue. Our results show that the proposed methods are able to maintain a good picture quality while considerably reducing the number of operations.

5.2 Empirical, DC-ABS and DC-ABS pixel methods

Figure 10 shows the mean number of operations per MB used by the H.264 JM reference software full estimation SAE-cost approach and by the Empirical, DC-ABS and DC-ABS pixel methods, showing the large reduction in computational complexity achieved by our proposed schemes.
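The empirical rule drawn from Fig. 9 amounts to a lookup on the QP factor. A minimal Python sketch follows; the function name `empirical_intra_mode` is hypothetical, and raising an error for QP values outside the two reported ranges is an assumption of this sketch, since the experiments only sample QP in steps of 5 between 5 and 45.

```python
def empirical_intra_mode(qp):
    """Map a QP factor to the intra mode reported as best for that range."""
    if 5 <= qp <= 10:
        return "Intra_4x4"   # best results reported for QP=5 to QP=10
    if 15 <= qp <= 45:
        return "Intra_8x8"   # best results reported for QP=15 to QP=45
    # Assumption: values not covered by the reported ranges are rejected.
    raise ValueError(f"no empirical result reported for QP={qp}")


# The QP values used in the experiments: 5 to 45 in steps of 5.
for qp in range(5, 50, 5):
    print(qp, empirical_intra_mode(qp))
```

Because the mode is fixed by the QP alone, this rule needs no per-macroblock comparison at all; the price is that it cannot adapt to the spatial characteristics of a particular sequence.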
Fig. 10 Computational cost. Number of operations per MB

Figure 11 shows the RD results of applying the full estimation algorithms (SAE-cost and RD-cost), no intra prediction, and the Empirical, DC-ABS and DC-ABS pixel methods to the six different video sequences. As seen from the figures, the PSNR obtained when applying our algorithms deviates slightly from the results obtained when applying the considerably more complex full estimation procedures. Based on the results depicted in Figs. 10 and 11, and depending on the image quality requirement, the use of the DC-ABS scheme may prove a viable solution when computational cost is an issue. It is important to point out that the empirical method exhibits better results for the Flower video sequence encoded at QP=20 than the SAE function of the standard. The best result in terms of PSNR is obviously provided by the RD-cost method. However, this process involves a high computational cost. Our results show that by applying the empirical method, we are able to obtain results very close to the ones obtained by the methods used in the H.264 JM reference software. In fact, there are instances where the difference is practically negligible. However, this method cannot be generalized since it is based on a limited number of sample video streams. This limitation is overcome by the DC-ABS and DC-ABS pixel methods. These two methods can dynamically adapt to the characteristics of the video sequences. Based on the results obtained, we can conclude that the DC-ABS pixel method exhibits the best results. However, the DC-ABS method seems to be a very good candidate, in particular when the required processing time is the major issue to be considered. Further studies should be conducted to gain further insight into the performance of the proposed methods.

Fig. 11 Rate distortion results. a CCIR. b CIF. c QCIF

Figure 12 shows the prediction error results for the SAE method and our proposed DC-ABS pixel method (Fig. 12b and c). The prediction error is quantitatively measured by the well-known mean squared error (MSE) criterion. From these results, it is clear that the DC-ABS pixel method provides the MSE closest to the one obtained by the full estimation method of the H.264 JM reference software. It is also clear that the proposed method offers very good results, taking into account the large reduction in computational complexity.

Fig. 12 Prediction error results. a Flower garden sequence (CIR), frame 0. b Residual, full estimation (SAE), QP=25, MSE=9.84. c Residual, DC-ABS pixel, QP=25, MSE=9.72

5.3 Comparison of different fast intra prediction methods

As mentioned in Section 3, fast mode decision algorithms for intra prediction in H.264/AVC video coding have recently been proposed in the literature. In this final study, we undertake a comparative analysis with two of the most prominent ones, presented in [20] and [24]. In this experiment, the settings and parameters used are the same as those in [20] and [24] in order to undertake a fair comparison.

Table 4 Comparison of different fast intra prediction methods (columns: Sequence, Method, ΔTime (%), ΔPSNR (dB), ΔBitrate (%); sequences: Mobile, Tempete; methods: DC-ABS pixel, Pan et al. [20], Tsukuba et al. [24])
The comparison metrics were produced and tabulated based on the difference in coding time (ΔTime), the PSNR difference (ΔPSNR) and the bit-rate difference (ΔBitrate). PSNR and bit-rate differences are calculated according to the numerical averages between the RD curves derived from the JM encoder and from the fast intra prediction methods under study. The detailed procedure for calculating these differences can be found in the JVT document authored by Bjontegaard [1], which is recommended by the JVT Test Model Ad Hoc Group [15]. Note that PSNR and bit-rate differences should be regarded as equivalent, i.e., there is either a decrease in PSNR or an increase in bit rate, but not both at the same time. As can be seen from Table 4, all three methods present a similar and negligible loss of video quality (<0.3 dB). However, in terms of time saving, which is a critical issue in video transcoding applications, the proposed DC-ABS pixel method achieves the best results, though with a slight increase in bit rate with respect to the other methods.
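To illustrate how such averaged differences between two RD curves can be obtained, the following Python sketch computes an average PSNR difference in the spirit of the Bjontegaard procedure [1]. It is a simplified sketch under stated assumptions, not the reference calculation: each curve is assumed to have exactly four (rate, PSNR) points, a cubic in log10(rate) is passed through them by Lagrange interpolation, and the integral over the common rate interval is taken with Simpson's rule, which is exact for cubics. The function names and the ΔTime formula (relative change with respect to the reference encoder) are assumptions of this sketch.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate at x the polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Single-interval Simpson rule; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR difference (test minus reference) over the common rate range."""
    if not (len(rates_ref) == len(psnr_ref) == len(rates_test) == len(psnr_test) == 4):
        raise ValueError("this sketch expects exactly four points per curve")
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    lo = max(min(log_ref), min(log_test))   # overlapping log-rate interval
    hi = min(max(log_ref), max(log_test))
    area_ref = simpson(lambda x: lagrange_eval(log_ref, psnr_ref, x), lo, hi)
    area_test = simpson(lambda x: lagrange_eval(log_test, psnr_test, x), lo, hi)
    return (area_test - area_ref) / (hi - lo)


def delta_time_percent(time_test, time_ref):
    """Assumed definition: relative coding-time change, in percent."""
    return (time_test - time_ref) / time_ref * 100.0


# Hypothetical RD points: the test curve sits 0.2 dB below the reference.
rates = [100.0, 200.0, 400.0, 800.0]
reference = [30.0, 33.0, 36.0, 39.0]
fast = [p - 0.2 for p in reference]
print(bd_psnr(rates, reference, rates, fast))
print(delta_time_percent(25.0, 100.0))
```

A negative ΔPSNR indicates a quality loss relative to the reference encoder and a negative ΔTime indicates a time saving, which is the sign convention assumed here.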

6 Conclusions

In this paper, we have studied the intra prediction process: a central element within the architecture of MPEG-2/H.264 transcoders. We started by reviewing the two methods defined in the H.264 JM reference software: SAE-cost and RD-cost. From our results, we can conclude that the SAE-cost method could be the method of choice when time is a critical parameter. Furthermore, depending on the image quality requirement, using only the P0, P1 and P2 predictors may prove a viable solution, as a further means of speeding up this process. Also, as part of this analysis, we have shown that intra prediction plays a major role not only in the encoding of the intra frames (I pictures) but also in the encoding of the inter frames (P pictures).

In the second part of the paper, we have presented six methods to carry out the intra prediction. These proposals are based on reusing the side information coming out of the MPEG-2 decoder to speed up the intra prediction of H.264. We have performed an exhaustive study by analyzing an image database of more than 120,000 samples. The performance of our proposals has been evaluated in terms of computational complexity, rate-distortion function and prediction error. Our results show that the proposed algorithms are able to maintain a good picture quality while considerably reducing the number of operations to be carried out. In particular, the results obtained by applying the DC-ABS pixel method come very close to the ones obtained by the full estimation method used in the H.264 JM reference software. The preferred solution must be a tradeoff between computational cost and rate-distortion requirements. Finally, we have carried out a comparative study with two of the most prominent fast intra prediction methods presented in the literature. The results show that the proposed DC-ABS pixel method achieves the best results for video transcoding applications.
Acknowledgements This work was supported by the Ministry of Science and Technology of Spain under CICYT Project TIC C06-02 and the Council of Science and Technology of Castilla-La Mancha under Project PAI

References

1. Bjontegaard G (2001) Calculation of average PSNR differences between RD-curves. Presented at the 13th VCEG-M33 Meeting, Austin, TX, April
2. Bjork N, Christopoulos C (1998) Transcoder architectures for video coding. IEEE Trans Consum Electron 44(1)
3. Bialkowski J, Kaup A, Illgner K (2004) Fast transcoding of intra frames between H.263 and H.264. In Proceedings of IEEE International Conference on Image Processing, Singapore
4. Chen C, Wu P-H, Chen H (2004) MPEG-2 to H.264 transcoding. In Proceedings of Picture Coding Symposium, San Francisco, CA, USA
5. Dogan S, Sadka A (2003) Video transcoding for inter-networks communications. Chapter of Compressed Video Communications. Wiley, March
6. Dogan S, Sadka AH, Kondoz AM (1998) Tandeming/transcoding issues between MPEG-4 and H.263. In Proceedings of the Third European Workshop on Mobile/Personal Satcoms, Venice, Italy
7. Dogan S, Sadka AH, Kondoz AM (1999) Efficient MPEG-4/H.263 video transcoder for interoperability of heterogeneous multimedia networks. IEE Electron Lett 35(11)
8. Feamster N, Wee SJ (1999) An MPEG-2 to H.263 transcoder. In Proceedings of SPIE Voice, Video, and Data Communications Conference, Boston, USA
9. Goto T, Hanamura T, Negami T, Kitamura T (2004) A study on resolution conversion method using DCT coefficients. In Proceedings of International Symposium on Information Theory and its Applications, Parma, Italy
10. Guo Z, Au O, Letaief K (2000) Parameter estimation for image/video transcoding. In Proceedings of IEEE International Symposium on Circuits and Systems, vol 2. Geneva, Switzerland
11. Guo W, Lin L, Zheng W (2001) Mismatch MB retrieval for MPEG-2 to MPEG-4 transcoding. In Proceedings of IEEE Pacific Rim Conference on Multimedia, Beijing, China
12. Implementation Studies Group (2002) Main results of the AVC complexity analysis. MPEG Document N4964, ISO/IEC JTC1/SC29/WG11, July
13. ISO/IEC JTC1/SC29/WG11 (1994) Generic coding of moving pictures and associated audio information: video. ISO/IEC, May
14. ITU-T Recommendation H.264 (2003) Advanced video coding for generic audiovisual services, May
15. JVT Test Model Ad Hoc Group (2003) Evaluation sheet for motion estimation, draft version 4, February
16. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (2004) Reference software to committee draft. JVT-F100 JM
17. Kalva H (2004) Issues in H.264/MPEG-2 video transcoding. In Proceedings of Consumer Communications and Networking Conference, Las Vegas, USA
18. Kalva H, Vetro A, Sun H (2003) Performance optimization of the MPEG-2 to MPEG-4 video transcoder. In Proceedings of SPIE Conference on Microtechnologies for the New Millennium, VLSI Circuits and Systems, San Diego, USA
19. Lin Y-C, Wang C-N, Chiang T, Vetro A, Sun H (2002) Efficient FGS to single layer transcoding. In Proceedings of IEEE International Conference on Consumer Electronics, Los Angeles, USA
20. Pan F, Lin X, Rahardja S, Lim KP, Li ZG, Wu D, Wu S (2005) Fast mode decision algorithm for intraprediction in H.264/AVC video coding. IEEE Trans Circuits Syst Video Technol 15(7)
21. Shanableh T, Ghanbari M (2000) Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats. IEEE Trans Multimedia 2(2)
22. Shanableh T, Ghanbari M (2000b) Transcoding of video into different encoding formats. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Turkey
23. Sullivan G, Wiegand T (1998) Rate-distortion optimization for video compression. IEEE Signal Process Mag 15(6)
24. Tsukuba T, Nagayoshi I, Hanamura T, Tominaga H (2005) H.264 fast intra prediction mode decision based on frequency characteristic. In Proceedings of European Signal Processing Conference (EUSIPCO), Antalya, Turkey
25. Vetro A, Christopoulos C, Sun H (2003) Video transcoding architectures and techniques: an overview. IEEE Signal Process Mag 20(2)
26. Wee SJ, Apostolopoulos JG, Feamster N (1999) Field-to-frame transcoding with temporal and spatial downsampling. In Proceedings of IEEE International Conference on Image Processing, Kobe, Japan
27. Wiegand T, Sullivan G, Bjontegaard G, Luthra A (2003) Overview of the H.264/AVC video coding standard. IEEE Trans Circuits Syst Video Technol 13(7)
28. Wiegand T, et al. (2003) Rate-constrained coder control and comparison of video coding standards. IEEE Trans Circuits Syst Video Technol 13(7)
29. Wu J-L, Huang S-J, Huang Y-M, Hsu C-T, Shiu J (1996) An efficient JPEG to MPEG-1 transcoding algorithm. IEEE Trans Consum Electron 42(3)
30. Xin J, Vetro A, Sun H (2004) Efficient macroblock coding-mode decision for H.264/AVC video coding. Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, TR
31. Xin J, Vetro A, Sun H (2004) Converting DCT coefficients to H.264/AVC transform coefficients. Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, TR

Gerardo Fernández-Escribano received his M.Sc. degree in Computer Science from the University of Castilla-La Mancha, Albacete, Spain in 2003. He is currently a Ph.D. student in the Department of Computer Engineering at the University of Castilla-La Mancha. He has also been a visiting researcher at the Florida Atlantic University, Boca Raton (USA), and at the Friedrich Alexander Universität, Erlangen Nuremberg, Germany. His research interests include video compression and video transcoding architectures.

Pedro Cuenca received his M.Sc. degree in Physics (Electronics and Computer Science, award extraordinary) from the University of Valencia. He got his Ph.D. degree in Computer Engineering in 1999 from the Polytechnic University of Valencia, Spain. In 1995 he joined the Department of Computer Engineering at the University of Castilla-La Mancha. He is currently an Associate Professor of Communications and Computer Networks and Vice-Dean of the Escuela Politécnica Superior de Albacete (School of Computer Engineering). He has also been a visiting researcher at The Nottingham Trent University, Nottingham (England) and at the Multimedia Communications Research Laboratory, University of Ottawa (Canada). His research topics are centered in the area of high-performance networks, wireless LAN, video compression, QoS video transmission and error-resilient protocol architectures. He has published over 70 papers in international journals and conferences on computer networks and performance evaluation. He has served in the organization of international conferences as chair, program co-chair and technical program committee member. He has been a reviewer for several journals and for several international conferences. He is a member of the IFIP 6.8 Working Group and a member of the IEEE.

Luis Orozco-Barbosa received the B.Sc. degree in electrical and computer engineering from Universidad Autonoma Metropolitana, Mexico, in 1979, the Diplôme d'Études Approfondies from École Nationale Supérieure d'Informatique et de Mathématiques Appliquées (ENSIMAG), France, in 1984 and the Doctorat de l'Université from Université Pierre et Marie Curie, France, in 1987, both in computer science. From 1991 to 2002, he was a Faculty member at the School of Information Technology and Engineering (SITE), University of Ottawa, Canada. In 2002, he joined the Department of Computer Engineering at Universidad de Castilla-La Mancha (Spain). He has also been appointed Director of the Albacete Research Institute of Informatics, a National Centre of Excellence. He has conducted numerous research projects with the private sector and served as Technical Advisor for the Canadian International Development Agency (CIDA). He has published over 180 papers in international journals and conferences on computer networks and performance evaluation. His current research interests include Internet protocols, network planning, wireless communications, traffic modeling and performance evaluation. He is a member of the IEEE.

Antonio Garrido received the degree in physics (electronics and computer science) and the Ph.D. degree from the University of Granada, Spain, in 1986 and the University of Valencia, Spain, in 1991, respectively. In 1986, he joined the Department of Computer Engineering at the University of Castilla-La Mancha, where he is currently a full professor of Computer Architecture and Technology and Dean of the Escuela Politecnica Superior de Albacete (School of Computer Engineering). His research interests include high-performance networks, telemedicine, video compression, and video transmission. He has published over 60 papers in international journals and conferences on performance evaluation of parallel computer and communications systems and compression and transmission in high-speed networks. He has led several research projects in telemedicine, computer networks and advanced computer system architectures.

Hari Kalva joined the Department of Computer Science and Engineering at Florida Atlantic University as an assistant professor in August 2003. Prior to that he was a consultant with Mitsubishi Electric Research Labs, Cambridge, MA. He was a co-founder and the vice president of engineering of Flavor Software, a New York company founded in 1999 that developed MPEG-4 based solutions for the media and entertainment industry. Dr. Kalva is an expert on digital audio visual communications systems with over 12 years of experience in multimedia research, development, and standardization. He has made key contributions to the MPEG-4 Systems standard and also contributed to the DAVIC standards development. His research interests include pervasive media delivery, content adaptation, video transcoding, video compression, and communication. He has over 50 published papers and five patents (11 pending) to his credit. He is the author of one book and coauthor of five book chapters. Dr. Kalva received a Ph.D. and an M.Phil. in Electrical Engineering from Columbia University in 2000 and 1999, respectively. He received an M.S. in Computer Engineering from Florida Atlantic University in 1994, and a B.Tech. in Electronics and Communications Engineering from N.B.K.R. Institute of Science and Technology, S.V. University, Tirupati, India in 1991.
