EE 5359 H.264 to VC-1 TRANSCODING

Vidhya Vijayakumar
Student I.D.: 1000-622152
Date: November 3, 2009

H.264 to VC-1 TRANSCODER

Objective
The objective of the thesis is to implement an H.264 bitstream to VC-1 transcoder for progressive compression.

Motivation
High-definition video adoption has grown rapidly over the last five years. The high-definition disc format Blu-ray has mandated MPEG-2 [3], H.264 [2] and VC-1 [1] as video compression formats. The coexistence of these video coding standards creates a need for transcoding. As more and more end products use these standards, the ability to transcode from one format to another adds value to a product's capability. While there has been recent work on MPEG-2 to H.264 transcoding [3] and VC-1 to H.264 transcoding [4], published work on H.264 to VC-1 transcoding is nearly non-existent. This motivates the development of a transcoder that can efficiently convert an H.264 bitstream to a VC-1 bitstream. Fig. 1 gives a typical application scenario.

Fig. 1 An application scenario for transcoding [39]

Details
Video transcoding is the operation of converting video from one format to another [5]. A format is defined by characteristics such as bit rate and spatial resolution, as shown in Fig. 2. One of the earliest applications of transcoding was to adapt the bit rate of a compressed stream to the channel bandwidth, enabling universal multimedia access over channels such as wireless networks, the internet and dial-up networks.

Fig. 2 Transcoding [5]

Changes in the characteristics of an encoded stream, such as bit rate, spatial resolution and quality, can also be achieved by scalable video coding [5]. However, when the available network bandwidth is insufficient or fluctuates with time, it may be difficult to set the base layer bit rate. In addition, scalable video coding adds complexity at both the encoder and the decoder.

The basic architecture for converting an H.264 bitstream into a VC-1 elementary stream is to decode the H.264 stream completely and then re-encode it as a VC-1 stream. However, this involves significant computational complexity [6], so there is also a need to transcode at low complexity. Transcoding can in general be implemented in the spatial domain, in the transform domain, or in a combination of the two. The common transcoding architectures [5] are described below.

Open loop transform domain transcoding
Open loop transcoders (Fig. 3) are computationally efficient and operate in the DCT domain. However, they are subject to drift error, which arises from rounding, quantization loss and clipping.

Fig. 3 Open loop transform domain transcoder architecture [5]

Cascaded Pixel Domain Architecture (CPDT)
This is the most basic transcoding architecture (Fig. 4). The motion vectors from the incoming bitstream are extracted and reused. This eliminates the motion estimation block, which accounts for 60% of the encoder computation. Unlike the open loop architecture, CPDT is drift free. Hence, even though it is slightly more complex, it is suited to heterogeneous transcoding between different standards, where basic parameters such as mode decisions and motion vectors have to be re-derived.

Fig. 4 Cascaded pixel domain transcoder architecture [5]

Simplified DCT Domain transcoders (SDDT)
This transcoder is based on the assumption that the DCT, the IDCT and motion compensation are linear processes (Fig. 5). The architecture requires motion compensation to be performed in the DCT domain, which is a computationally intensive operation [3]. For instance, as shown in Fig. 6, the goal is to compute the DCT coefficients of the target block B from the four overlapping blocks B1, B2, B3 and B4.

Fig. 5 Simplified transform domain transcoder architecture [5]
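The computation of B from B1 to B4 is commonly written in the transcoding literature as a sum of shifted blocks. The following is a sketch of that relation, not a derivation taken from this report: the shift matrices H_{i1} and H_{i2} are notation introduced here for illustration. Each one is a zero matrix with an identity sub-block that selects the overlapping rows or columns of B_i, and the DCT is assumed to be the orthonormal 2-D DCT.

```latex
% Pixel domain: B is assembled from the overlapping parts of B_1..B_4
B = \sum_{i=1}^{4} H_{i1}\, B_i\, H_{i2}

% Transform domain: with an orthonormal DCT, DCT(X) = T X T^{T} and T^{T} T = I,
% so the same sum holds on the coefficients
\mathrm{DCT}(B) = \sum_{i=1}^{4}
    \mathrm{DCT}(H_{i1})\,\mathrm{DCT}(B_i)\,\mathrm{DCT}(H_{i2})
```

Each term needs two matrix products per source block, which gives a sense of why transform domain motion compensation is described as computationally intensive.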

Fig. 6 Transform domain motion compensation illustration [5]

Also, the clipping and rounding operations performed for interpolation in fractional-pixel motion compensation lead to drift in the transcoded video.

Cascaded DCT Domain transcoders (CDDT)
This architecture is used for spatial/temporal resolution downscaling and other coding parameter changes (Fig. 7). Compared with SDDT, greater flexibility is achieved by introducing another transform domain motion compensation block; however, it is far more computationally intensive and requires more memory [3]. It is often applied to downscaling applications, where the encoder-side memory cost is small because of the reduced resolution.

Fig. 7 Cascaded transform domain transcoder architecture [5]

Choice of basic transcoder architecture:
The main drawback of DCT domain transcoders is that motion compensation in the transform domain is very computationally intensive. DCT domain transcoders are also less flexible than pixel domain transcoders. For instance, the SDDT architecture can only be used for bit rate reduction transcoding: it assumes that the spatial and temporal resolutions stay the same and that the output video uses the same frame types, mode decisions and motion vectors as the input video.

H.264 to VC-1 transcoding requires several changes to accommodate the mismatches between the two standards. For instance, for motion estimation and compensation, H.264 supports 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 macroblock partitions (Fig. 8), but VC-1 supports only 16x16 and 8x8 (Fig. 9). The transform sizes and types (8x8 and 4x4 in H.264; 8x8, 4x8, 8x4 and 4x4 in VC-1) also differ, which makes transform domain transcoding prohibitively complex. Hence, DCT domain transcoders are not well suited to this task.

Fig. 8 Segmentations of the macroblock for motion compensation in H.264. Top: segmentation of macroblocks; bottom: segmentation of 8x8 partitions [2]

Fig. 9 Segmentations of the macroblock for motion compensation in VC-1 [2]

From Fig. 10, it can be inferred that the cascaded pixel domain architecture outperforms the DCT domain transcoders. Also, for larger GOP sizes, the drift in DCT domain transcoders becomes more significant.

Fig. 10 PSNR vs. bit rate for the Foreman sequence transcoded with a GOP size of 15, using the different transcoding architectures described above [5]

Hence, heterogeneous transcoding in the pixel domain is preferred for standards transcoding.

Standards transcoding:
When transcoding between two different standards, the main factor is compatibility between the profile and level of the input stream and those of the output stream for the intended purpose. The goal here is to transcode an H.264 baseline profile bitstream to a VC-1 simple profile bitstream. Table 1 compares the characteristics of the two.

                                      H.264 baseline profile                   VC-1 simple profile
Chroma format                         4:2:0                                    4:2:0
Picture coding type                   I, P                                     I, P
Transform size                        4x4                                      8x8, 4x8, 8x4, 4x4
Intra prediction                      Directional predictors                   None
Block sizes for motion compensation   16x16, 16x8, 8x16, 8x8, 4x8, 8x4, 4x4    16x16, 8x8

Table 1 Main characteristics of H.264 baseline profile and VC-1 simple profile

Overview of H.264:
H.264 [2] is a standard for video compression and is equivalent to MPEG-4 Part 10, or MPEG-4 AVC (advanced video coding) (Fig. 11, Fig. 12). As of 2008, it is the latest block-oriented, motion-compensation-based video standard developed by the ITU-T video coding experts group (VCEG) together with the ISO/IEC moving picture experts group (MPEG), and it was the product of a partnership effort known as the joint video team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are jointly maintained so that they have identical technical content.

Fig. 11 H.264 Encoder [33]

Fig. 12 H.264 Decoder [33]

The standardization of the first version of H.264/AVC was completed in May 2003. The JVT then developed extensions to the original standard that are known as the fidelity range extensions (FRExt) [29]. These extensions enable higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including the sampling structures known as YUV 4:2:2 and YUV 4:4:4. Several other features are also included in the fidelity range extensions, such as adaptive switching between 4x4 and 8x8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support for additional color spaces. The design work on the fidelity range extensions was completed in July 2004, and the drafting work was completed in September 2004.

Scalable video coding (SVC) [30], as specified in Annex G of H.264/AVC, allows the construction of bitstreams that contain sub-bitstreams that conform to H.264/AVC. For temporal bitstream scalability, i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the bitstream, complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, high-level syntax and inter prediction reference pictures in the bitstream are constructed accordingly. For spatial and quality bitstream scalability, i.e., the presence of a sub-bitstream with lower spatial resolution or quality than the bitstream, network abstraction layer (NAL) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction, i.e., the prediction of the higher spatial resolution or quality signal from data of the lower spatial resolution or quality signal, is typically used for efficient coding. The scalable video coding extension was completed in November 2007 [30].

Some of the features adopted in H.264 for enhancement of prediction, improved coding efficiency and robustness to data errors/losses are listed as follows.

Features for enhancement of prediction are as follows.
- Directional spatial prediction for intra coding
- Variable block-size motion compensation with small block size (Fig. 13)
- Quarter-sample-accurate motion compensation
- Motion vectors over picture boundaries
- Multiple reference picture motion compensation
- Decoupling of referencing order from display order
- Decoupling of picture representation methods from picture referencing capability
- Weighted prediction
- Improved skipped and direct motion inference
- In-the-loop deblocking filtering

Fig. 13 Various block sizes in H.264 for motion estimation/compensation [2]

Features for improved coding efficiency are as follows.
- Small block-size transform
- Exact-match inverse transform (Fig. 14)

Fig. 14 Forward 4x4 and 8x8 integer transforms [29]

- Short word-length transform
- Hierarchical block transform
- Arithmetic entropy coding
- Context-adaptive entropy coding

Features for robustness to data errors/losses are as follows.
- Parameter set structure
- NAL unit syntax structure
- Flexible slice size
- Flexible macroblock ordering (FMO)
- Arbitrary slice ordering (ASO)
- Redundant slices (RS)
- Data partitioning
- SP/SI synchronization/switching pictures

Profiles in H.264
The H.264 standard defines numerous profiles, as listed below.
- Constrained baseline profile
- Baseline profile
- Main profile
- Extended profile
- High profile
- High 10 profile
- High 4:2:2 profile
- High 4:4:4 predictive profile
- Stereo high profile
- High 10 intra profile
- High 4:2:2 intra profile
- High 4:4:4 intra profile
- CAVLC 4:4:4 intra profile
- Scalable baseline profile
- Scalable high profile
- Scalable high intra profile

Table 2 and Table 3 outline the features of the various profiles in H.264. Fig. 15 gives a graphical comparison of the profiles.

Table 2 Features in baseline, main and extended profile [29]

Table 3 Features in high profile [29]

Fig. 15 Comparison of H.264 baseline, main, extended and high profiles [33]

Overview of VC-1
VC-1 [1] is the informal name of the SMPTE 421M video codec standard, initially developed by Microsoft. It was released on April 3, 2006 by SMPTE. It is now a supported standard for Blu-ray discs and Windows Media Video 9 (WMV9).

VC-1 is an evolution of the conventional DCT-based video codec design also found in H.261 [31], H.263 [27], MPEG-1 [41] and MPEG-2 [3]. It is widely characterized as an alternative to the latest ITU-T and MPEG video codec standard, known as H.264/MPEG-4 AVC. VC-1 contains coding tools for interlaced video sequences as well as for progressive encoding. The main goal of VC-1 development and standardization is to support the compression of interlaced content without first converting it to progressive, making it more attractive to broadcast and video industry professionals.

The VC-1 codec (Fig. 16) is designed to achieve state-of-the-art compressed video quality at bit rates that may range from very low to very high. The codec can easily handle 1920x1080 pixel resolution at 6 to 30 megabits per second (Mbps) for high-definition video. VC-1 is capable of higher resolutions, such as 2048x1536 pixels for digital cinema, and of a maximum bit rate of 135 Mbps. An example of very low bit rate video would be 160x120 pixel resolution at 10 kilobits per second (kbps) for modem applications.

The basic functionality of VC-1 involves a block-based motion compensation and spatial transform scheme similar to that used in other video compression standards such as MPEG-1 and H.261 [31]. However, VC-1 includes a number of innovations and optimizations that distinguish it from the basic compression scheme, resulting in excellent quality and efficiency. The VC-1 advanced profile is also transport independent, which provides even greater flexibility for device manufacturers and content services.

Fig. 16 VC-1 codec [32]

Profiles in VC-1
VC-1 defines three profiles, as listed below.
1. Simple profile
2. Main profile
3. Advanced profile

Table 4 outlines the features of the different profiles in VC-1.

Feature                                      Simple   Main   Advanced
Baseline intra frame compression             Yes      Yes    Yes
Variable-sized transform                     Yes      Yes    Yes
16-bit transform                             Yes      Yes    Yes
Overlapped transform                         Yes      Yes    Yes
4 motion vectors per macroblock              Yes      Yes    Yes
¼ pixel luminance motion compensation        Yes      Yes    Yes
¼ pixel chrominance motion compensation      No       Yes    Yes
Start codes                                  No       Yes    Yes
Extended motion vectors                      No       Yes    Yes
Loop filter                                  No       Yes    Yes
Dynamic resolution change                    No       Yes    Yes
Adaptive macroblock quantization             No       Yes    Yes
B frames                                     No       Yes    Yes
Intensity compensation                       No       Yes    Yes
Range adjustment                             No       Yes    Yes
Field and frame coding modes                 No       No     Yes
GOP layer                                    No       No     Yes
Display metadata                             No       No     Yes

Table 4 Features in VC-1 profiles [48]

Innovations
VC-1 includes a number of innovations that enable it to produce high quality content. This section provides brief descriptions of some of these features.

Adaptive Block Size Transform

Traditionally, 8x8 transforms have been used for image and video coding. However, there is evidence to suggest that 4x4 transforms can reduce ringing artifacts at edges and discontinuities. VC-1 is capable of coding an 8x8 block using either an 8x8 transform, two 8x4 transforms, two 4x8 transforms, or four 4x4 transforms (Fig. 17). This feature enables coding that takes advantage of the different transform sizes as needed for optimal image quality.

Fig. 17 VC-1 transform sizes [4]

16-Bit Transforms
In order to minimize the computational complexity of the decoder, VC-1 uses 16-bit transforms. This also has the advantage of easy implementation on the large amount of digital signal processing (DSP) hardware built with 16-bit processors. Among the constraints put on the transforms specified in VC-1 is the requirement that the 16-bit values used produce results that fit in 16 bits. These constraints ensure that decoding is as efficient as possible on a wide range of devices.

Motion Compensation
Motion compensation is the process of generating a prediction of a video frame by displacing the reference frame. Typically, the prediction is formed for a block (an 8x8 pixel tile) or a macroblock (a 16x16 pixel tile) of data (Fig. 18). The displacement of data due to motion is defined by a motion vector, which captures the shift along both the x- and y-axes.

Fig. 18 VC-1 motion compensation sizes [4]

The efficiency of the codec is affected by the size of the predicted block, the granularity of sub-pixel data that can be captured, and the type of filter used for generating sub-pixel predictors. VC-1 uses 16x16 blocks for prediction, with the ability to generate mixed frames of 16x16 and 8x8 blocks. The finest granularity of sub-pixel information supported by VC-1 is 1/4 pixel (Fig. 19). Two sets of filters are used by VC-1 for motion compensation. The first is an approximate bicubic filter with four taps. The second is a bilinear filter with two taps. The four-tap bicubic filters used in VC-1 for ¼ and ½ pixel shifts are [-4 53 18 -3]/64 and [-1 9 9 -1]/16, respectively.

Fig. 19 Integer, half and quarter pel positions [2] (A-Q: integer; aa-hh: half; a-s: quarter pel positions)

VC-1 combines the motion vector settings defined by the block size, sub-pixel resolution and filter type into modes. The result is four motion compensation modes that suit a range of different situations. This classification of settings into modes also helps compact decoder implementations.

Loop Filtering
VC-1 uses an in-loop deblocking filter (Fig. 20) that attempts to remove block-boundary discontinuities introduced by quantization errors in interpolated frames. These discontinuities can cause visible artifacts in the decompressed video frames and can impact the quality of the frame as a predictor for future interpolated frames.
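The two four-tap kernels quoted in the motion compensation discussion can be applied to a row of four neighbouring samples as in the minimal C sketch below. The function and array names are hypothetical, and the rounding used here (plain round-to-nearest followed by clipping to 8 bits) is an assumption made for illustration; it is not claimed to reproduce the normative rounding behaviour of the standard.

```c
#include <stdint.h>

/* Four-tap kernels quoted in the text (illustrative names). */
const int VC1_QUARTER_TAPS[4] = { -4, 53, 18, -3 };  /* taps sum to 64 */
const int VC1_HALF_TAPS[4]    = { -1,  9,  9, -1 };  /* taps sum to 16 */

/*
 * Interpolate one sample between p[1] and p[2] from four neighbours p[0..3].
 * shift is log2 of the kernel sum: 6 for the quarter-pel kernel,
 * 4 for the half-pel kernel.
 * Assumption: round-to-nearest, then clip to the 8-bit range.
 */
uint8_t interp_4tap(const uint8_t p[4], const int taps[4], int shift)
{
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += taps[i] * (int)p[i];
    }
    acc += 1 << (shift - 1);          /* rounding offset */
    if (acc < 0) {
        return 0;                     /* clip below before shifting */
    }
    acc >>= shift;
    return (uint8_t)(acc > 255 ? 255 : acc);
}
```

Because each kernel sums to its divisor, a flat region is left unchanged: four samples of value 100 interpolate to 100 with either kernel.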

Fig. 20 Loop filtering in VC-1 [4] (only pixels p4 and p5 at the block boundary are filtered)

The loop filter takes into account the adaptive block size transforms. The filter is also optimized to reduce the number of operations required.

Interlaced Coding
Interlaced video content is widely used in television broadcasting. When encoding interlaced content, the VC-1 codec can take advantage of the characteristics of interlaced frames to improve compression. This is achieved by using data from both fields to predict motion compensation in interpolated frames.

Advanced B Frame Coding
A bi-directional or B frame is a frame that is interpolated from data in both previous and subsequent frames. B frames are distinct from I frames (also called key frames), which are encoded without reference to other frames. B frames are also distinct from P frames, which are interpolated from previous frames only. VC-1 includes several optimizations that make B frames more efficient. VC-1 does not have a fixed group of pictures (GOP) structure, and the number of pictures in a GOP can vary.

Fading Compensation
Due to the nature of compression that uses motion compensation, encoding video frames that contain fades to or from black is very inefficient. With a uniform fade, every macroblock needs adjustments to luminance. VC-1 includes fading compensation, which detects fades and uses alternate methods to adjust luminance. This feature improves compression efficiency for sequences with fading and other global illumination changes.

Differential Quantization

Differential quantization, or dquant, is an encoding method in which multiple quantization steps are used within a single frame. Rather than quantizing the entire frame with a single quantization level, macroblocks that might benefit from lower quantization levels and a greater number of preserved AC coefficients are identified within the frame. Such macroblocks are then encoded at lower quantization levels than the one used for the remaining macroblocks in the frame. The simplest and typically most efficient form of differential quantization involves only two quantizer levels (bi-level dquant), but VC-1 also supports multiple levels.

MAPPING DIFFERENCES BETWEEN THE TWO STANDARDS:
The transcoding algorithm considered in this research assumes full H.264 decoding down to the pixel level, followed by reduced-complexity VC-1 encoding. The data gathered during the H.264 decoding stage is used to accelerate the VC-1 encoding stage. It is assumed that the H.264 encoded bitstream is generated with an R-D optimized encoder. The picture coding types used are similar in both standards. The transform size and type are different, which makes transform domain transcoding prohibitively complex. The semantics of intra MBs are similar, except for the intra directional prediction allowed in H.264 and the mixed MBs in VC-1. Inter prediction has significant differences, including the block size for MC, the block size of the transform, and the reference frames used. The similarities between the codecs can be exploited to reduce the transcoding complexity.

Intra MB Mode Mapping:
An intra MB in the incoming H.264 bitstream is coded as a VC-1 intra MB. An H.264 intra MB can be coded as Intra 4x4 (9 different directional modes) or Intra 16x16 (4 different modes), whereas a VC-1 intra MB has four 8x8 blocks and no prediction modes. Since an intra MB in VC-1 uses the 8x8 transform (Fig. 21) irrespective of the block size (16x16 or 4x4) in H.264, the H.264 intra prediction type need not be carried over. Table 5 shows the proposed intra MB mapping.

H.264 Intra MB             VC-1 Intra MB
Intra 16x16 (any mode)     Intra MB 8x8
Intra 4x4 (any mode)       Intra MB 8x8

Table 5 H.264 and VC-1 Intra MB mapping

Fig. 21 Matrix for one-dimensional 8-point inverse transform [32]

Inter MB Mode Mapping:
An inter coded MB in the incoming H.264 bitstream is coded as an inter MB in VC-1. An inter MB in H.264 has 7 different motion compensation sizes: 16x16, 16x8, 8x16, 8x8, 4x8, 8x4 and 4x4. An inter MB in VC-1 has 2 different motion compensation sizes: 16x16 and 8x8. Another significant difference is that H.264 uses the 4x4 transform size (and 8x8 in the fidelity range extensions), whereas VC-1 uses 4 different transform sizes: 8x8, 4x8, 8x4 and 4x4.

The 16x16, 8x16 and 16x8 motion compensation sizes are usually selected in H.264 for areas that are relatively uniform, so these MBs are mapped to an inter 16x16 MB in VC-1. The selected H.264 MC block size serves as a measure of homogeneity within the block and is used to choose the transform size applied in VC-1. The 8x8, 8x4, 4x8 and 4x4 modes are usually selected in H.264 for areas that have non-uniform motion. The 16x16 mode in VC-1 is eliminated for such non-uniform MBs; the MB is mapped to the 8x8 block size in VC-1, with the H.264 block size determining the transform size to be used. Table 6 describes the decisions for mapping the inter MBs and the type of transform to be used in VC-1.

H.264 Inter MB    VC-1 Inter MB    Transform size in VC-1
Inter 16x16       Inter 16x16      8x8
Inter 16x8        Inter 16x16      8x4
Inter 8x16        Inter 16x16      4x8
Inter 8x8         Inter 8x8        8x8
Inter 4x8         Inter 8x8        4x8
Inter 8x4         Inter 8x8        8x4
Inter 4x4         Inter 8x8        4x4

Table 6 H.264 and VC-1 Inter MB mapping and VC-1 transform type
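The decisions in Tables 5 and 6 amount to a fixed lookup from the H.264 macroblock mode to a VC-1 mode and transform size. A minimal C sketch of such a lookup is given below. All type, enumerator and function names are hypothetical and chosen for illustration; they are not taken from the JM or VC-1 reference software.

```c
#include <stddef.h>

/* Hypothetical enumerations, for illustration only. */
typedef enum {
    H264_P16x16, H264_P16x8, H264_P8x16, H264_P8x8,
    H264_P4x8, H264_P8x4, H264_P4x4,
    H264_INTRA4x4, H264_INTRA16x16,
    H264_MODE_COUNT
} h264_mode_t;

typedef enum { VC1_INTER_16x16, VC1_INTER_8x8, VC1_INTRA } vc1_mode_t;
typedef enum { VC1_TX_8x8, VC1_TX_8x4, VC1_TX_4x8, VC1_TX_4x4 } vc1_tx_t;

typedef struct {
    vc1_mode_t mode;  /* VC-1 macroblock mode */
    vc1_tx_t   tx;    /* VC-1 transform size  */
} vc1_decision_t;

/* Rows follow Table 6 (inter) and Table 5 (intra). */
static const vc1_decision_t MODE_MAP[H264_MODE_COUNT] = {
    [H264_P16x16]     = { VC1_INTER_16x16, VC1_TX_8x8 },
    [H264_P16x8]      = { VC1_INTER_16x16, VC1_TX_8x4 },
    [H264_P8x16]      = { VC1_INTER_16x16, VC1_TX_4x8 },
    [H264_P8x8]       = { VC1_INTER_8x8,   VC1_TX_8x8 },
    [H264_P4x8]       = { VC1_INTER_8x8,   VC1_TX_4x8 },
    [H264_P8x4]       = { VC1_INTER_8x8,   VC1_TX_8x4 },
    [H264_P4x4]       = { VC1_INTER_8x8,   VC1_TX_4x4 },
    [H264_INTRA4x4]   = { VC1_INTRA,       VC1_TX_8x8 },
    [H264_INTRA16x16] = { VC1_INTRA,       VC1_TX_8x8 },
};

/* Returns 0 and fills *out on success, -1 on an invalid argument. */
int map_h264_to_vc1(h264_mode_t in, vc1_decision_t *out)
{
    if ((int)in < 0 || (int)in >= (int)H264_MODE_COUNT || out == NULL) {
        return -1;
    }
    *out = MODE_MAP[in];
    return 0;
}
```

A table keeps the mapping in one place, so a later refinement of the transform choice only changes a row rather than the control flow.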

Motion vector mapping:
Re-use of the motion vectors selected in H.264 can significantly reduce the complexity of VC-1 encoding. Table 7 describes the re-use of motion vectors.

H.264 Inter MB    VC-1 Inter MB    Motion vector re-use
Inter 16x16       Inter 16x16      Same motion vectors
Inter 16x8        Inter 16x16      Average of motion vectors
Inter 8x16        Inter 16x16      Average of motion vectors
Inter 8x8         Inter 8x8        Same motion vectors
Inter 4x8         Inter 8x8        Average of motion vectors
Inter 8x4         Inter 8x8        Average of motion vectors
Inter 4x4         Inter 8x8        Average of motion vectors

Table 7 H.264 and VC-1 Inter MB motion vector mapping

Reference Pictures:
The H.264/AVC standard defines the use of up to sixteen reference pictures for motion estimation, while VC-1 uses only one or two, according to the picture type, P or B respectively. Reusing motion vectors implies using the same reference pictures, so that the vectors keep their meaning. The motion vector conversion assumes that the motion vector length is related to the reference image distance [39]. The source motion vectors are scaled according to Fig. 22 in order to use valid VC-1 reference pictures. This conversion assumes constant motion between the H.264/AVC and VC-1 reference pictures: each motion vector is scaled by the ratio of the temporal distances to the two reference pictures. (Fig. 22 shows the reference pictures used in the transcoding, not the motion vector direction.)

Fig. 22 Motion vector scaling [39] (panels: H.264, VC-1)

Skipped Macroblock:
When a skipped macroblock is signaled in the bitstream, no further data is sent for that macroblock. Since the skip macroblock definitions of the two standards are fully compatible, an H.264 skip macroblock can be converted directly to a VC-1 skip macroblock.
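The averaging in Table 7 and the temporal scaling described under Reference Pictures can be sketched in C as follows. This is an illustration under stated assumptions rather than the transcoder's actual code: the mv_t type and the function names are hypothetical, motion vectors are assumed to be stored as integers in a common sub-pel unit, and results are rounded to the nearest integer with halves rounded away from zero.

```c
/* Hypothetical motion vector type: components in a common sub-pel unit. */
typedef struct { int x; int y; } mv_t;

/* Integer division rounded to nearest, halves away from zero; den must be > 0. */
static int div_round(int num, int den)
{
    return (num >= 0) ? (num + den / 2) / den
                      : -((-num + den / 2) / den);
}

/* Average 'count' partition vectors into one vector (Table 7). */
mv_t mv_average(const mv_t *mv, int count)
{
    mv_t out = { 0, 0 };
    int sx = 0, sy = 0;

    if (mv == 0 || count <= 0) {
        return out;               /* nothing to average */
    }
    for (int i = 0; i < count; i++) {
        sx += mv[i].x;
        sy += mv[i].y;
    }
    out.x = div_round(sx, count);
    out.y = div_round(sy, count);
    return out;
}

/*
 * Rescale a vector that points to a reference d_src pictures away so that it
 * points to a reference d_dst pictures away, assuming constant motion.
 * Invalid distances leave the vector unchanged.
 */
mv_t mv_scale(mv_t mv, int d_src, int d_dst)
{
    mv_t out = mv;

    if (d_src <= 0 || d_dst <= 0) {
        return out;
    }
    out.x = div_round(mv.x * d_dst, d_src);
    out.y = div_round(mv.y * d_dst, d_src);
    return out;
}
```

For example, the two vectors of a 16x8 pair would be passed to mv_average with count 2, and a vector that refers to a picture three frames back would be passed to mv_scale with d_src = 3 and d_dst = 1 to retarget it to the immediately preceding picture.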

OPEN LOOP TRANSCODER:
The open loop transcoder is built by cascading an H.264 encoder [49], an H.264 decoder [49], a VC-1 encoder [50] and a VC-1 decoder [50].

YUV -> H.264 Encoder -> H.264 Decoder -> VC-1 Encoder -> VC-1 Decoder -> YUV

Fig. 23 Open loop transcoder

Performance of open loop transcoder
The mean square error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are calculated for the Akiyo QCIF sequence (100 frames) using the open loop transcoder.

Fig. 24 MSE of open loop transcoder, Akiyo sequence

Fig. 25 PSNR of open loop transcoder, Akiyo sequence

Fig. 26 SSIM of open loop transcoder, Akiyo sequence
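The report does not state how the quality metrics were computed. As a point of reference, the per-frame MSE between an original and a transcoded frame is conventionally the mean of the squared sample differences, as in the C sketch below. The function name is hypothetical, and 8-bit samples in a flat buffer (for example one luma plane) are an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mean square error between two frames of n 8-bit samples.
 * Assumption: both buffers hold the same plane (e.g. luma) in the same order.
 * Returns 0.0 for an empty frame.
 */
double frame_mse(const uint8_t *ref, const uint8_t *test, size_t n)
{
    double acc = 0.0;

    if (ref == NULL || test == NULL || n == 0) {
        return 0.0;
    }
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)test[i];
        acc += d * d;             /* accumulate squared error */
    }
    return acc / (double)n;
}
```

For a QCIF luma plane, n is 176 x 144 = 25344 samples. With 8-bit samples, the PSNR in dB then follows from the MSE as 10 log10(255^2 / MSE), which is undefined when the two frames are identical.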

PROGRESS:
The H.264 bitstream carries the macroblock type, sub-macroblock type, reference picture index and motion vectors (where applicable). These details are extracted from the bitstream and written to a data file, and the information is then used during VC-1 encoding. The current stage involves:
- Using the extracted information to reduce the complexity of the encoder
- Trying to encode a bitstream compliant with the decoder

Extracted details
The H.264 bitstream contains information about:
- Macroblock type: P16x16, P16x8, P8x16, P8x8, I4MB, I16MB
- Macroblock sub-block type: SMB8x8, SMB8x4, SMB4x8, SMB4x4
- Reference picture index
- Motion vector x, y

Fig. 27 gives a screen shot of the extracted information.

Fig. 27 Screen shot of extracted information from H.264 bit stream

Simplified VC-1 Encoder
The sample code below shows how the complexity of the VC-1 encoder is reduced by re-using the information extracted from the H.264 bitstream.

    #ifdef H264VC1TRANSCODER
    /* H.264 intra MBs are coded as VC-1 intra MBs (Table 5). */
    if ((mbtype == I4MB) || (mbtype == I16MB))
    {
        pmb->embtype = vc1_mbintra;
        for (blk = 0; blk < VC1_BLOCKS_PER_MB; blk++)
        {
            pmb->sblk[blk].eblktype = vc1_blkintra;
        }
    }
    /* H.264 16x16, 16x8, 8x16 and skipped MBs. */
    else if ((mbtype == P16x16) || (mbtype == P16x8) ||
             (mbtype == P8x16) || (mbtype == PSKIP))
    {
        ....
    }
    #endif

Future work
The next step is to encode the VC-1 bitstream without errors and to analyze the performance of the basic transcoder.

CONCLUSIONS:
As mentioned earlier, it is proposed to transcode an H.264 bitstream to a VC-1 stream in the pixel domain (CPDT) and to compare the results (MSE, PSNR, SSIM, complexity, bit rates) against an open loop transcoder. Since the motion vectors are not re-estimated, the complexity of the encoder is reduced by about 40-50%. The road map ahead is to extract re-usable information from the H.264 bitstream for use in VC-1 encoding.

REFERENCES:
[1] VC-1 Compressed Video Bitstream Format and Decoding Process (SMPTE 421M-2006), SMPTE Standard, 2006.
[2] T. Wiegand et al, "Overview of the H.264/AVC video coding standard," IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
[3] C. Chen, P.-H. Wu and H. Chen, "MPEG-2 to H.264 transcoding," Picture Coding Symposium, pp. 15-17, Dec. 2004.
[4] J.-B. Lee and H. Kalva, "An efficient algorithm for VC-1 to H.264 video transcoding in progressive compression," IEEE International Conference on Multimedia and Expo, pp. 53-56, July 2006.
[5] J. Xin, C.-W. Lin and M.-T. Sun, "Digital video transcoding," Proceedings of the IEEE, vol. 93, pp. 84-97, Jan. 2005.
[6] A. Vetro, C. Christopoulos and H. Sun, "Video transcoding architectures and techniques: An overview," IEEE Signal Processing Magazine, vol. 20, pp. 18-29, March 2003.
[7] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 / ISO/IEC 14496-10, Mar. 2005.
[8] S. Srinivasan and S. L. Regunathan, "An overview of VC-1," Proc. SPIE, vol. 5960, pp. 720-728, 2005.
[9] P. List et al, "Adaptive deblocking filter," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 614-619, Jun. 2003.
[10] T. D. Tran, J. Liang and C. Tu, "Lapped transform via time-domain pre- and post-filtering," IEEE Trans. Signal Proc., vol. 51, pp. 1557-1571, Jun. 2003.

[11] C. C. Cheng, T. S. Chang and K. B. Lee, "An in-place architecture for the deblocking filter in H.264/AVC," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, pp. 530-534, Jul. 2006.
[12] T. C. Chen et al, "Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder," IEEE Trans. Circuits Syst. Video Technol., vol. 16, pp. 673-688, Jun. 2006.
[13] Y.-W. Huang et al, "Architecture design for deblocking filter in H.264/JVT/AVC," in IEEE Proc. Int. Conf. Multimedia and Expo, pp. 693-696, July 2003.
[14] S.-C. Chang et al, "A platform based bus-interleaved architecture for de-blocking filter in H.264/MPEG-4 AVC," IEEE Trans. Consumer Electron., vol. 51, pp. 249-255, Feb. 2005.
[15] M. Sima, Y. Zhou and W. Zhang, "An efficient architecture for adaptive deblocking filter of H.264/AVC video coding," IEEE Trans. Consumer Electronics, vol. 50, pp. 292-296, Feb. 2004.
[16] S.-Y. Shih, C.-R. Chang and Y.-L. Lin, "A near optimal deblocking filter for H.264 advanced video coding," in Proc. Asia and South Pacific Design Automation Conf., pp. 170-175, Jan. 2006.
[17] T.-M. Liu et al, "A memory-efficient deblocking filter for H.264/AVC video coding," in Proc. IEEE Int. Symp. Circuits Syst., pp. 2140-2143, May 2005.
[18] T.-M. Liu et al, "A 125 µW fully scalable MPEG-2 and H.264/AVC video decoder for mobile applications," IEEE J. Solid-State Circuits, vol. 42, pp. 161-169, Jan. 2007.
[19] L. Li, S. Goto and T. Ikenaga, "An efficient deblocking filter architecture with 2-dimensional parallel memory for H.264/AVC," in Proc. Asia and South Pacific Design Automation Conf., pp. 623-626, 2005.
[20] H.-Y. Lin et al, "Efficient deblocking filter architecture for H.264 video coders," in IEEE ISCAS, pp 4, May 2006.
[21] T.-M. Liu, W.-P. Lee and C.-Y. Lee, "An in/post-loop deblocking filter with hybrid filtering schedule," IEEE Trans. Circuits Syst. for Video Technol., vol. 17, pp. 937-943, Jul. 2007.
[22] I. Ahmad et al, "Video transcoding: An overview of various techniques and research issues," IEEE Trans. on Multimedia, vol. 7, pp. 793-8, Oct. 2005.

[23] Y. L. Lee and T. Q. Nguyen, "Analysis and efficient architecture design for VC-1 overlap smoothing and in-loop deblocking filter," IEEE Trans. Circuits and Syst. for Video Technol., vol. 18, pp. 1786-1796, Dec. 2008.
[24] G. Fernandez-Escribano et al, "Speeding-up the macroblock partition mode decision for MPEG-2 to H.264 transcoding," Proceedings of IEEE ICIP 2006, Atlanta, pp. 869-872, Sept. 2006.
[25] Z. Zhou et al, "Motion information and coding mode reuse for MPEG-2 to H.264 transcoding," Proceedings of the IEEE ISCAS 2005, pp. 1230-1233, May 2005.
[26] B. Petljanski and H. Kalva, "DCT domain intra MB mode decision for MPEG-2 to H.264 transcoding," Proceedings of the IEEE ICCE 2006, pp. 419-420, Jan. 2006.
[27] J. Bialkowski, A. Kaup and K. Illgner, "Fast transcoding of intra frames between H.263 and H.264," IEEE ICIP, vol. 4, pp. 2785-2788, Oct. 2004.
[28] Y.-K. Lee, S.-S. Lee and Y.-L. Lee, "MPEG-4 to H.264 transcoding using macroblock statistics," Proceedings of the IEEE ICME 2006, pp. 57-60, Toronto, Canada, July 2006.
[29] G. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions," SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[30] T. Wiegand et al, "Introduction to the Special Issue on Scalable Video Coding: Standardization and Beyond," IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, pp 1034, Sept. 2007.
[31] V. Roden and T. Praktische, "H.261 and MPEG1 - A comparison," Conference Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and Communications, pp. 65-71, Mar. 1996.
[32] S. Srinivasan et al, "Windows Media Video 9: overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[33] S. K. Kwon, A. Tamhankar and K. R. Rao, "An overview of H.264/MPEG-4 Part 10," Special issue of Journal of Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
[34] G. A. Davidson et al, "ATSC video and audio coding," Proc. IEEE, vol. 94, pp. 60-76, Jan. 2006.

[35] J. Bialkowski, M. Barkowsky and A. Kaup, "Overview of low complexity video transcoding from H.263 to H.264," IEEE ICME, pp. 49-52, 2006.
[36] T. D. Nguyen et al, "Efficient MPEG-4 to H.264/AVC transcoding with spatial downscaling," ETRI Journal, vol. 29, no. 6, pp. 826-828, Dec. 2007.
[37] H. Kalva, G.F. Escribano and K. Kunzelmann, "Reduced resolution MPEG-2 to H.264 transcoder," Proc. SPIE, vol. 7257, 72571V, Jan. 2009.
[38] S. Moiron et al, "H.264/AVC to MPEG-2 video transcoding architecture," Proc. Conf. on Telecommunications - ConfTele, Peniche, Portugal, vol. 1, pp. 449-452, May 2007.
[39] S. Moiron et al, "Video transcoding from H.264/AVC to MPEG-2 with reduced computational complexity," Signal Processing: Image Communication, vol. 24, pp. 637-650, Sept. 2009.
[40] Mei-Juan Chen, Ming-Chung Chu and Chih-Wei Pan, "Efficient motion-estimation algorithm for reduced frame-rate video transcoder," IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, pp. 269-275, Apr. 2002.
[41] ISO/IEC 11172-2:1993, Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s -- Part 2: Video.
[42] H. Kalva and J.B. Lee, "The VC-1 Video Coding Standard," IEEE Multimedia, vol. 14, pp. 88-91, Oct.-Dec. 2007.
[43] P. Bordes and A. Orhand, "Improved algorithm for fast transcoding H.264," EUSIPCO 2007.

REFERENCE BOOKS:
[44] K. Sayood, Introduction to Data Compression, 3rd edition, Morgan Kaufmann Publishers, 2006.
[45] I.E.G. Richardson, H.264 and MPEG-4 video compression: video coding for next-generation multimedia, Wiley, 2003.
[46] K. R. Rao and P. C. Yip, The transform and data compression handbook, Boca Raton, FL: CRC Press, 2001.
[47] K.R. Rao and J.J. Hwang, Techniques and standards for image, video, and audio coding, Prentice Hall, 1996.

[48] J.B. Lee and H. Kalva, The VC-1 and H.264 video compression standards for broadband video services, Springer, 2008.

REFERENCE WEBSITES:
[49] JM software: http://iphome.hhi.de/suehring/tml/
[50] VC-1 software: http://www.smpte.org/home
[51] Microsoft website, VC-1 Technical Overview: http://www.microsoft.com/windows/windowsmedia/howto/articles/vc1techoverview.aspx#vc1comparedtoothercodecs
[52] VC-1 Wikipedia site: http://en.wikipedia.org/wiki/vc-1

ACRONYMS:
ASO      Arbitrary Slice Ordering
AVC      Advanced Video Coding
B MB     Bi-predicted MB
CDDT     Cascaded DCT Domain Transcoder
CPDT     Cascaded Pixel Domain Transcoder
DCT      Discrete Cosine Transform
DSP      Digital Signal Processing
DVD      Digital Versatile Disc
FMO      Flexible Macroblock Ordering
FRExt    Fidelity Range Extensions
GOP      Group Of Pictures
I MB     Intra Predicted MB
IEC      International Electrotechnical Commission
ISO      International Organization for Standardization
ITU-T    International Telecommunication Union - Telecommunication Standardization Sector
JVT      Joint Video Team
P MB     Inter Predicted MB
IDCT     Inverse Discrete Cosine Transform
IQ       Inverse Quantizer
MB       Macroblock
MBAFF    Macroblock level Adaptive Frame/Field
PicAFF   Picture level Adaptive Frame/Field
ME       Motion Estimation
MC       Motion Compensation
MV       Motion Vector
MPEG     Moving Picture Experts Group
MSE      Mean Square Error
PSNR     Peak Signal to Noise Ratio
Q        Quantizer
R-D      Rate Distortion
RS       Redundant Slice
SDDT     Simplified DCT Domain Transcoder
SP/SI    Switching P / Switching I
SMPTE    Society of Motion Picture and Television Engineers
SSIM     Structural Similarity Index Measure
SVC      Scalable Video Coding
VCEG     Video Coding Experts Group
VLC      Variable Length Coding
VLD      Variable Length Decoder
YUV      Y - Luminance and UV - Chrominance