Lecture 13 Video Coding H.264 / MPEG-4 AVC

Last time we saw the macroblock partitioning of H.264, the integer DCT, and the cascade that applies the WHT to the DC coefficients. H.264 has more interesting tools; its general structure can be seen in the following picture.

(From: Al Bovik, Ed.: The Essential Guide to

One specialty of H.264 is that the prediction mode (I, P, B) does not need to be the same for an entire picture but can vary within a picture, using so-called slice groups. Slice groups partition a picture so that different prediction modes can be used within it, one mode for each slice. An example of the division of a picture into slice groups can be seen in the following image.

(From: Al Bovik, Ed.: The Essential Guide to

Slice groups give us the flexibility to decide, within a picture, which prediction mode (I, P, or B) is the most efficient. This results in a multi-reference picture prediction, as can be seen in the following image.
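As a rough illustration of how a picture can be partitioned so that each part gets its own prediction mode, the following toy sketch builds a macroblock map with two groups: a rectangular region and the rest of the picture. The function name `slice_group_map`, the box coordinates, and the assignment of I and P to the two groups are assumptions made for this example only; this is not the map syntax defined in the standard.

```python
def slice_group_map(width_mb, height_mb, box):
    """Assign each macroblock to group 0 (inside box) or group 1 (the rest).

    width_mb, height_mb: picture size in macroblocks (hypothetical values).
    box: (left, top, right, bottom) in macroblock coordinates, inclusive.
    """
    left, top, right, bottom = box
    return [
        [0 if left <= x <= right and top <= y <= bottom else 1
         for x in range(width_mb)]
        for y in range(height_mb)
    ]


# Assumed for illustration: group 0 is intra coded, group 1 is predicted.
slice_type = {0: "I", 1: "P"}

mb_map = slice_group_map(6, 4, box=(2, 1, 3, 2))
for row in mb_map:
    # One letter per macroblock, showing the mode of its group.
    print(" ".join(slice_type[group] for group in row))
```

An encoder would make this choice based on the picture content, for example intra coding a region for which no good temporal reference exists.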
Spatial Prediction

H.264 also offers spatial intra prediction. Here the prediction is not based on temporally neighbouring pictures but on the already transmitted data of the current picture. Since pictures are usually transmitted from top to bottom and from left to right, the prediction draws on already decoded samples above and to the left of the current block. The possible directions for this type of prediction can be seen in the following image.
(From: Al Bovik, Ed.: The Essential Guide to

There are different prediction modes for different block sizes, selectable on a per-macroblock basis, for 4x4, 8x8, and 16x16 blocks. For instance, for horizontal prediction the values just left of the current block are copied into the current block. For diagonal prediction, a weighting of the reconstructed values adjacent to the current block is used.

Deblocking Filter

Some decoders of previous standards used so-called deblocking filters to avoid blocking artifacts. They were applied after the decoding process. Since the decoder knows where the block boundaries are, it can check whether they contain edges and try to decide whether those edges are part of the picture or artifacts. If the algorithm decides that an edge is an artifact, it smooths it away around the block boundary. The problem is that if the decision was wrong, we get a blurred lattice pattern in the image. To avoid this, the deblocking needs to be part of the prediction loop, such that a past wrong deblocking decision can be compensated by the next prediction error image.

Unlike previous standards such as MPEG-2, H.264 specifies a deblocking filter. The specification is important here because the deblocking filter is part of the motion prediction loop: a picture is deblocked before it is used as a reference for motion compensation. The deblocking filter is applied at the boundaries of each block (which are easy to locate, for instance in I frames), and it is basically an adaptive local low-pass filter. It operates on the boundaries of 4x4 blocks. The filter strength is adapted according to the likelihood of a real edge at the block boundary. For instance, if there is a large edge and only small quantization step sizes in the neighbourhood, it is likely to be a real edge, and the smoothing of the filter is reduced. This method was found to improve the subjective and objective quality compared to no deblocking or to separate deblocking applied outside the loop.

Entropy Coding

H.264 specifies two entropy coding modes:
1) Context-Adaptive Variable-Length Coding (CAVLC). This is the simpler mode, available in all profiles.
2) Context-Adaptive Binary Arithmetic Coding (CABAC). This is the more complex mode, used for improved compression.

Profiles

The standard specifies several so-called profiles, which contain subsets of the tools specified in the standard. They enable less complex implementations of encoders and decoders that are still standard compliant. In this way implementations can be tailored to the intended application.

Compression Performance

One of the main properties of H.264 is its improved compression performance compared to previous standards such as MPEG-2. The following image shows the compression performance compared to MPEG-2, measured by the peak SNR (PSNR) of the luminance signal. The PSNR comparison makes sense here because the coders we compare have similar artifacts.
(From: Al Bovik, Ed.: The Essential Guide to

Here we can see that, for the same PSNR, H.264 needs only about half the bit rate of MPEG-2. This is also what is claimed for H.264: about half the bit rate of MPEG-2 for the same image quality.
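For reference, the PSNR used in such comparisons is commonly computed from the mean squared error (MSE) between the original and the decoded luminance samples as 10 log10(peak^2 / MSE), with peak = 255 for 8-bit video. The following is a minimal sketch of this computation; the function name `psnr` and the 2x2 sample values are made up for illustration and are not taken from the lecture or the figure.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    ref_flat = [p for row in reference for p in row]
    dec_flat = [p for row in decoded for p in row]
    if len(ref_flat) != len(dec_flat):
        raise ValueError("images must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(ref_flat, dec_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images, PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


# Toy 2x2 luminance blocks (made-up values).
original = [[100, 120], [140, 160]]
decoded = [[101, 119], [142, 158]]
print(round(psnr(original, decoded), 2), "dB")
```

In a rate-distortion plot like the one discussed above, each point would be obtained by encoding a sequence at one bit rate and averaging such a PSNR value over its frames.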