Lecture 12 Video Coding Cascade Transforms H264, Wavelets

H.264 features different block sizes, including the so-called macroblock, which can be seen in the following picture (from: Al Bovik, Ed., "The Essential Guide to Video Processing", 2009). Macroblocks have a size of 16x16 samples and can be subdivided, as the picture shows. H.264 also offers different transform block sizes, starting with 8x8 transforms, which can be divided into smaller blocks, down to the 4x4 transforms for which we saw the integer transform last time. Macroblocks are used for motion estimation and for the common coding of their transform coefficients.

For the common coding, assume we have a 16x16 macroblock containing 16 4x4 transform blocks. The 16 DC coefficients of these transforms are collected into a new 4x4 block, which is transformed again, but this time with a WHT (Walsh-Hadamard transform) instead of the integer DCT. The integer DCT was also tried for this second stage, but for the DC coefficients it showed no advantage over the WHT, while the WHT is simpler to implement and leads to smaller subband coefficients, which need fewer bits (see: H. Malvar et al., "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. on Circuits and Systems for Video Technology, July 2003). This structure can be seen in the following picture (from: Richardson, "H.264 / MPEG-4 Part 10 White Paper", www.vcodex.com, 2003).
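As a minimal sketch of this two-stage structure, the Python below applies the 4x4 integer core transform to each of the 16 blocks of a macroblock, collects the 16 DC coefficients into a 4x4 block, and transforms that block with a 4x4 WHT. It assumes the forward core matrix H from the previous lecture and deliberately omits prediction, scaling and quantization, so the numbers are not the values an actual H.264 encoder would transmit. The function names are hypothetical.

```python
# Hierarchical transform of one 16x16 macroblock (sketch: no scaling, no quantization).

# Forward 4x4 integer core transform matrix (assumed: the matrix H from last lecture).
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

# 4x4 Walsh-Hadamard matrix for the second stage on the DC coefficients.
W = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def transform_2d(block, m):
    """Separable 2D transform: columns with m (from the left), rows with m^T."""
    return matmul(matmul(m, block), transpose(m))


def macroblock_transform(mb):
    """Return the 16 transformed 4x4 blocks and the WHT of their DC coefficients."""
    blocks = {}
    dc = [[0] * 4 for _ in range(4)]
    for bi in range(4):
        for bj in range(4):
            block = [row[4 * bj:4 * bj + 4] for row in mb[4 * bi:4 * bi + 4]]
            coeffs = transform_2d(block, H)
            blocks[(bi, bj)] = coeffs
            dc[bi][bj] = coeffs[0][0]  # collect the DC coefficient of each block
    return blocks, transform_2d(dc, W)


# Usage: a flat macroblock puts all its energy into one second-stage coefficient.
flat = [[5] * 16 for _ in range(16)]
blocks, dc_wht = macroblock_transform(flat)
print(dc_wht[0][0])  # 1280 = 16 * 16 * 5
```

For the flat block every 4x4 block has DC value 80 and no AC energy, and the WHT of the constant DC block concentrates everything in its own DC coefficient.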

Dynamic range of values after the transform: Assume we have an input signal with a maximum magnitude of A, for instance an image with brightness levels up to A (for the worst case we take the maximum value). Then the signal vector contains values of +-A (negative values occur, for instance, for the chrominance components), and it is multiplied onto the transform matrix from the right, y = H x. If we take the transform matrix H from last time, and one column of x has, for instance, the values [A, A, -A, -A] (again a worst case), then the multiplication with the second row of H, [2, 1, -1, -2], gives 2A + A + A + 2A = 6A. This is also the maximum possible value for this matrix. If we then also transform the rows of our image, we get a maximum value of 6*6A = 36A. This means that the dynamic range of our subband coefficients increases by log2(36) = 5.17 bits compared to the dynamic range of the original image. This overhead has to be provided in our coding signal processor, and it is also the reason why we wanted the factors in the transform matrix to be as small as possible. For the inverse matrix, in the decoder, the factor is somewhat smaller: here we get a factor of 4, leading to a factor of 16 for rows and columns together, and hence 4 additional bits of dynamic range for the decoded subband samples. Observe that this also means a reduced (maximum) information content in the subband signals, which is the result of the quantization in the encoder. These effects become important if we want to implement our algorithm with integer arithmetic of limited word length. H.264 is designed such that it can be implemented with 16-bit arithmetic, which enables implementation in cheap hardware.

Wavelet Approaches

Back to the cascaded transform. The collection of DC coefficients into a new block, followed by another transform, can also be seen as a tree-structured subband decomposition:

[Figure: tree-structured subband decomposition; the DC coefficients are split off and decomposed again]
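Two statements from the passage above can be checked numerically with a short sketch: the worst-case gain of 6 per dimension (36 in 2D, about 5.17 extra bits) for the forward matrix, and the tree interpretation, in which the DC coefficient of each 4x4 block is simply the output of a 4x4 moving-sum lowpass filter, downsampled by 4 in each direction. It assumes the forward matrix H from last lecture and inputs bounded by |x| <= A; the function names are hypothetical.

```python
import math

# Forward 4x4 integer core transform matrix (assumed: the matrix H from last lecture).
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def worst_case_gain(m):
    """Largest |(m x)_i| for inputs with |x_j| <= 1: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


A = 1
x = [A, A, -A, -A]
print(sum(h * v for h, v in zip(H[1], x)))  # second row of H times x: 6A

gain_1d = worst_case_gain(H)     # one dimension (columns)
gain_2d = gain_1d * gain_1d      # columns and rows
extra_bits = math.log2(gain_2d)  # extra dynamic range in bits
print(gain_1d, gain_2d, round(extra_bits, 2))


def dc_coefficients(img):
    """DC coefficient (H X H^T)[0][0] of every 4x4 block; the first row of H is all ones."""
    n = len(img) // 4
    out = [[0] * n for _ in range(n)]
    for bi in range(n):
        for bj in range(n):
            block = [row[4 * bj:4 * bj + 4] for row in img[4 * bi:4 * bi + 4]]
            out[bi][bj] = sum(H[0][r] * block[r][c] * H[0][c]
                              for r in range(4) for c in range(4))
    return out


def box_filter_downsample(img):
    """Same result as a 4x4 moving-sum (lowpass) filter followed by keeping every
    4th output sample per direction; computed directly on non-overlapping blocks."""
    n = len(img) // 4
    return [[sum(img[4 * bi + r][4 * bj + c] for r in range(4) for c in range(4))
             for bj in range(n)] for bi in range(n)]


img = [[(7 * i + 3 * j) % 11 for j in range(16)] for i in range(16)]
print(dc_coefficients(img) == box_filter_downsample(img))  # True
```

The DC band is therefore a lowpass, downsampled version of the image, and the second-stage transform splits this band again, which is exactly one more level of the tree.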

This is the analysis filter bank structure for the encoder; for the decoder we need the synthesis filter bank, which is the reversed structure with upsamplers instead of downsamplers. This particular structure is used in H.264, but different, though similar, structures can be found in other coders. Such a cascaded, tree-structured subband decomposition is also called a Discrete Wavelet Transform (DWT). Another example is used in JPEG 2000, which is an image coder, but whose algorithm is also used for video in Motion JPEG 2000. The equivalent filters of the DCT and WHT are not particularly good filters, because they are only as long as the number of subbands we use (for instance only 4 taps for a 4-point transform). To solve this problem, longer wavelet filters were designed, most often for the 2-band case, where we have only 2 subbands, which are then cascaded. The Daubechies (9,7) filter pair uses an analysis lowpass filter with an impulse response of length 9:

[Figure: impulse response of the analysis lowpass filter]

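The corresponding frequency response is:

[Figure: frequency response of the analysis lowpass filter]

The analysis highpass filter impulse response has length 7:

[Figure: impulse response of the analysis highpass filter]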

The corresponding frequency response is:

[Figure: frequency response of the analysis highpass filter]

What is interesting here is the very high attenuation of the highpass filter around DC. This is important for images, because most of their energy is concentrated there, and in this way we avoid "crosstalk" of this energy into the higher subband. During filter design this is obtained by placing as many zeros as possible at frequency zero (z = 1). This can also be seen in the following pole-zero plot:

[Figure: pole-zero plot]
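Since the lecture figures are not reproduced here, the sketch below uses the commonly tabulated Cohen-Daubechies-Feauveau 9/7 analysis coefficients (the ones used by the JPEG 2000 irreversible transform) to evaluate both frequency responses and to count the zeros of the highpass at z = 1. This scaling, with lowpass gain 1 at DC and highpass gain 2 at the Nyquist frequency, is an assumption; other texts normalize the same filters differently. A zero of order K at z = 1 is equivalent to the moments sum n^k h[n] vanishing for k = 0 ... K-1.

```python
import cmath
import math

# Commonly tabulated CDF 9/7 analysis filters, taps -4..4 (lowpass) and -3..3 (highpass).
LP = [0.02674875741080976, -0.01686411844287495, -0.07822326652898785,
      0.2668641184428723, 0.6029490182363579, 0.2668641184428723,
      -0.07822326652898785, -0.01686411844287495, 0.02674875741080976]
HP = [0.09127176311424948, -0.05754352622849957, -0.5912717631142470,
      1.115087052456994,
      -0.5912717631142470, -0.05754352622849957, 0.09127176311424948]


def freq_response(h, w):
    """DTFT at angular frequency w, with the centre tap placed at n = 0."""
    c = len(h) // 2
    return sum(h[k] * cmath.exp(-1j * w * (k - c)) for k in range(len(h)))


def moment(h, k):
    """k-th moment sum_n n^k h[n], with n measured from the centre tap."""
    c = len(h) // 2
    return sum((n - c) ** k * h[n] for n in range(len(h)))


for w in (0.0, math.pi / 2, math.pi):
    print(f"w={w:.3f}  |LP|={abs(freq_response(LP, w)):.4f}  "
          f"|HP|={abs(freq_response(HP, w)):.4f}")

# Moments 0..3 of the highpass vanish (4 zeros at DC); moment 4 does not.
print([round(moment(HP, k), 6) for k in range(5)])
```

The lowpass passes DC with gain 1 and blocks the Nyquist frequency, while the highpass has a fourth-order zero at DC, which is what produces the flat, strongly attenuated region around frequency zero.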

For this reason, this type of wavelet filter is also called "maximally flat". Using this 2-band filter bank, we can build a tree structure to obtain a higher frequency resolution at low frequencies, as can be seen in the following picture:

[Figure: analysis filter bank, filtering and downsampling along the rows, then the columns; "TP" in the figure denotes lowpass (German: Tiefpass)]
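The analysis tree can be sketched in a few lines: one 2-band split along the rows, then along the columns, gives four subbands, and only the low-low band is split again at the next level. This is a minimal sketch under simplifying assumptions: it reuses the commonly tabulated CDF 9/7 analysis taps, extends the signal periodically (JPEG 2000 uses symmetric extension instead), keeps the even-indexed samples of both bands, and makes no attempt at perfect reconstruction. The function names are hypothetical.

```python
# Commonly tabulated CDF 9/7 analysis taps (lowpass DC gain 1); both filters are symmetric.
LP = [0.02674875741080976, -0.01686411844287495, -0.07822326652898785,
      0.2668641184428723, 0.6029490182363579, 0.2668641184428723,
      -0.07822326652898785, -0.01686411844287495, 0.02674875741080976]
HP = [0.09127176311424948, -0.05754352622849957, -0.5912717631142470,
      1.115087052456994,
      -0.5912717631142470, -0.05754352622849957, 0.09127176311424948]


def filter_downsample(x, h):
    """Filter with a centred symmetric filter (periodic extension), keep even samples."""
    n, c = len(x), len(h) // 2
    y = [sum(h[k] * x[(i + k - c) % n] for k in range(len(h))) for i in range(n)]
    return y[::2]


def transpose(m):
    return [list(col) for col in zip(*m)]


def split_2d(img):
    """Split along the rows, then the columns; returns the low-low band and
    (row-low/col-high, row-high/col-low, row-high/col-high)."""
    lo = [filter_downsample(row, LP) for row in img]
    hi = [filter_downsample(row, HP) for row in img]

    def cols(m, h):
        return transpose([filter_downsample(col, h) for col in transpose(m)])

    return cols(lo, LP), (cols(lo, HP), cols(hi, LP), cols(hi, HP))


def dwt_tree(img, levels):
    """Octave tree: only the low-low band is split again at each level."""
    details = []
    ll = img
    for _ in range(levels):
        ll, d = split_2d(ll)
        details.append(d)
    return ll, details


# Usage: a constant 16x16 image ends up entirely in the 2x2 low-low band after 3 levels.
img = [[10.0] * 16 for _ in range(16)]
ll, details = dwt_tree(img, 3)
print(len(ll), len(ll[0]))
```

Splitting only the lowpass band is what gives the finer frequency resolution at low frequencies: each level halves the bandwidth of the band that is split next.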

[Figure: synthesis filter bank, along the rows and the columns; the upsamplers insert a zero after each sample]
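The synthesis side can be illustrated in one dimension (it is applied to rows and columns separately). To keep the delays easy to follow, this sketch swaps in the 2-tap Haar filter pair instead of the 9/7 pair: the analysis filters are applied anticausally and downsampled, and the synthesis inserts a zero after each sample before causal filtering, which gives exact reconstruction without an extra delay. The filter names and functions are hypothetical illustrations, not the lecture's filters.

```python
import math

S = 1 / math.sqrt(2)
H0, H1 = [S, S], [S, -S]   # Haar analysis lowpass / highpass
G0, G1 = [S, S], [S, -S]   # Haar synthesis lowpass / highpass (applied causally)


def analysis_1d(x):
    """Anticausal filtering y[n] = sum_k h[k] x[n+k], then keep even samples."""
    def filt(h):
        return [sum(h[k] * x[n + k] for k in range(len(h)) if n + k < len(x))
                for n in range(len(x))]
    return filt(H0)[::2], filt(H1)[::2]


def upsample(v):
    """Insert a zero after each sample."""
    out = []
    for s in v:
        out.extend([s, 0.0])
    return out


def causal_filter(u, g):
    """y[n] = sum_k g[k] u[n-k], output truncated to len(u)."""
    return [sum(g[k] * u[n - k] for k in range(len(g)) if n - k >= 0)
            for n in range(len(u))]


def synthesis_1d(low, high):
    r0 = causal_filter(upsample(low), G0)
    r1 = causal_filter(upsample(high), G1)
    return [a + b for a, b in zip(r0, r1)]


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
low, high = analysis_1d(x)
print(synthesis_1d(low, high))  # equals x up to floating-point rounding
```

At even output positions the zero-inserted lowpass and highpass contributions add to x[2m], and at odd positions they subtract to x[2m+1], so the zeros introduced by upsampling are filled in by the synthesis filters.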