MPEG-4: Simple Profile (SP)
I-VOP (Intra-coded rectangular VOP, progressive video format)
P-VOP (Inter-coded rectangular VOP, progressive video format)
Short Header mode (compatibility with the H.263 codec)
Compression efficiency tools (four MVs per macroblock, unrestricted MVs, Intra prediction)
Transmission efficiency tools (video packets, data partitioning, reversible VLCs)
The Very Low Bit-Rate Video (VLBV) Core
The VLBV core is essentially the H.263 baseline codec, employing the predictive/transform coding model. The Short Header mode enables direct compatibility (at the frame level) between the MPEG-4 Simple Profile and an H.263 baseline codec.
SP: Basic Coding Tools
I-VOP
An I-VOP is a rectangular video frame encoded in Intra mode.
[Figure: I-VOP encoder and decoder. Encoder: Source frame → DCT → Q → Reorder → RLC → VLC. Decoder: VLD → RLD → Reorder → Q⁻¹ → IDCT → Decoded frame.]
DCT: discrete cosine transform; Q: quantization; RLC: run-length coding; VLC: variable-length coding; IDCT: inverse discrete cosine transform; Q⁻¹: inverse quantization; RLD: run-length decoding; VLD: variable-length decoding.
DCT and IDCT
The 8×8 DCT and IDCT are applied to blocks of luminance (luma) and chrominance (chroma) pixels.
Quantization
Forward quantization is not defined by the standard; the standard specifies only the method of rescaling the quantized transform coefficients during decoding. The DC coefficient of an Intra-coded macroblock is rescaled by:

$$DC = DC_Q \times \mathrm{dc\_scaler}$$

where $DC_Q$ is the quantized DC coefficient, $DC$ is the rescaled coefficient and dc_scaler is a parameter defined in the standard.
The value of dc_scaler depends on the QP range (1 to 31) according to the table below:

Block type     QP 1–4   QP 5–8        QP 9–24       QP 25–31
Luminance      8        2 × QP        QP + 8        2 × QP − 16
Chrominance    8        (QP + 13)/2   (QP + 13)/2   QP − 6

In Short Header mode, dc_scaler is 8, i.e., all Intra DC coefficients are rescaled by a factor of 8.
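The dc_scaler table and the Short Header rule translate directly into a small lookup. The following is a minimal Python sketch; the function names `dc_scaler` and `rescale_intra_dc` are hypothetical, and the chrominance entry assumes that (QP + 13)/2 uses truncating integer division.

```python
def dc_scaler(qp: int, is_luma: bool, short_header: bool = False) -> int:
    """Intra DC scaling factor for a QP in 1..31, following the dc_scaler table."""
    if not 1 <= qp <= 31:
        raise ValueError("QP must be in the range 1..31")
    if short_header or qp <= 4:
        return 8  # Short Header mode always uses 8, as does QP 1..4
    if is_luma:
        if qp <= 8:
            return 2 * qp
        if qp <= 24:
            return qp + 8
        return 2 * qp - 16
    # Chrominance: (QP + 13)/2 for QP 5..24, assumed to truncate
    if qp <= 24:
        return (qp + 13) // 2
    return qp - 6


def rescale_intra_dc(dc_q: int, qp: int, is_luma: bool, short_header: bool = False) -> int:
    """DC = DC_Q * dc_scaler."""
    return dc_q * dc_scaler(qp, is_luma, short_header)


# A luma block with QP = 12 uses dc_scaler = 12 + 8 = 20
print(rescale_intra_dc(16, 12, is_luma=True))  # 320
```

Note that the piecewise definition is continuous at the range boundaries for luma (e.g. QP 8 gives 16 and QP 9 gives 17), which is why the table can switch formulas without jumps in the scaling factor.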
All other transform coefficients (including AC coefficients and Inter DC coefficients) are rescaled as follows:

$$|F| = \begin{cases} QP\,(2|F_Q| + 1) & \text{if } QP \text{ is odd and } F_Q \neq 0 \\ QP\,(2|F_Q| + 1) - 1 & \text{if } QP \text{ is even and } F_Q \neq 0 \\ 0 & \text{if } F_Q = 0 \end{cases}$$

where $F_Q$ is the quantized coefficient and $F$ is the rescaled coefficient, which has the same sign as $F_Q$.
Zig-zag scan
Quantized DCT coefficients are reordered in a zig-zag scan prior to encoding.
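The rescaling rule works on magnitudes and then restores the sign. A minimal Python sketch, with the hypothetical name `rescale_coefficient`, is shown below; it applies only the formula above and omits any clipping of the result.

```python
def rescale_coefficient(f_q: int, qp: int) -> int:
    """Rescale an AC or Inter DC coefficient; the result keeps the sign of f_q."""
    if f_q == 0:
        return 0
    magnitude = qp * (2 * abs(f_q) + 1)
    if qp % 2 == 0:
        magnitude -= 1  # even QP: subtract 1 from the magnitude
    return magnitude if f_q > 0 else -magnitude


print([rescale_coefficient(v, 7) for v in (-2, 0, 1)])  # [-35, 0, 21]
print([rescale_coefficient(v, 8) for v in (-1, 1)])     # [-23, 23]
```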
Run-length coding
The array of reordered coefficients for each block is encoded so as to represent runs of zero coefficients efficiently. Each non-zero coefficient is encoded as a triplet (last, run, level), where last indicates whether this is the final non-zero coefficient in the block, run is the number of zero coefficients preceding it and level gives the coefficient's sign and magnitude.
Entropy coding
Header information and (last, run, level) triplets are represented by variable-length codes (VLCs) defined in the standard and designed from pre-calculated symbol probabilities.
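The zig-zag scan and the (last, run, level) conversion can be sketched together in Python. This is an illustration under simple assumptions: only the classic 8×8 zig-zag order is generated, all 64 positions are treated alike, and the helper names are hypothetical. Trailing zeros after the last non-zero coefficient produce no triplet, which is what the last flag makes possible.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:
            rows.reverse()  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block: list[list[int]]) -> list[int]:
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level_encode(scanned: list[int]) -> list[tuple[int, int, int]]:
    """Convert a scanned coefficient array into (last, run, level) triplets."""
    positions = [i for i, v in enumerate(scanned) if v != 0]
    triplets, run = [], 0
    for i, v in enumerate(scanned):
        if v == 0:
            run += 1
            continue
        last = 1 if i == positions[-1] else 0
        triplets.append((last, run, v))
        run = 0
    return triplets


block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[0][2] = 5, -2, 1
print(run_level_encode(zigzag_scan(block)))  # [(0, 0, 5), (0, 1, -2), (1, 2, 1)]
```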
[Table 4.1: MPEG-4 Visual DCT coefficient VLCs (partial, all codes < 9 bits); s is the sign bit.]
A coded I-VOP consists of a VOP header, optional video packet headers and coded macroblocks. Each macroblock (MB) is coded with a header (defining the macroblock type, identifying which blocks in the MB contain coded coefficients, signalling changes in the quantization parameter, etc.) followed by coded coefficients for each 8×8 block. In the decoder, the sequence of VLCs is decoded to extract the quantized transform coefficients, which are rescaled and inverse transformed to reconstruct the decoded I-VOP.
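The decoder side of the I-VOP path for a single 8×8 block can be sketched as follows: the (last, run, level) triplets that the VLD would produce are expanded, placed back in raster order by the inverse zig-zag scan, inverse transformed and clipped to 8-bit pixels. This is a minimal Python sketch with hypothetical helper names; it uses a direct floating-point IDCT for clarity rather than a fast one, and the rescaling step is deliberately left out.

```python
import math


def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:
            rows.reverse()
        order.extend((r, s - r) for r in rows)
    return order


def run_level_decode(triplets, size: int = 64) -> list[int]:
    """Expand (last, run, level) triplets back into a scanned coefficient array."""
    coeffs = [0] * size
    pos = 0
    for last, run, level in triplets:
        pos += run  # skip the preceding zeros
        coeffs[pos] = level
        pos += 1
        if last:
            break
    return coeffs


def inverse_zigzag(coeffs: list[int], n: int = 8) -> list[list[int]]:
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block


def idct_8x8(F: list[list[float]]) -> list[list[float]]:
    """Direct (non-fast) 2-D inverse DCT of an 8x8 block."""
    def c(k: int) -> float:
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            total = 0.0
            for u in range(8):
                for v in range(8):
                    total += (c(u) * c(v) * F[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = total / 4
    return out


def reconstruct_intra_block(triplets) -> list[list[int]]:
    """Triplets -> coefficients -> inverse scan -> IDCT -> 8-bit pixels."""
    coeffs = inverse_zigzag(run_level_decode(triplets))
    # Rescaling (dc_scaler for the Intra DC, the QP formula for the rest) would go here.
    pixels = idct_8x8(coeffs)
    return [[min(255, max(0, round(p))) for p in row] for row in pixels]


print(reconstruct_intra_block([(1, 0, 1024)])[0][:4])  # [128, 128, 128, 128]
```

A DC-only block reconstructs to a flat block whose value is the DC coefficient divided by 8, which is a quick sanity check for any IDCT implementation.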
P-VOP
A P-VOP is coded with Inter prediction from a previously encoded I- or P-VOP (a reference VOP).
[Figure: P-VOP encoder and decoder. Encoder: Source frame → ME → MCP → DCT → Q → Reorder → RLC → VLC. Decoder: VLD → RLD → Reorder → Q⁻¹ → IDCT → MCR → Decoded frame.]
ME: motion estimation; MCP: motion-compensated prediction; MCR: motion-compensated reconstruction.
Motion Estimation and Compensation
The basic motion compensation (MC) scheme is block-based compensation of 16×16-pixel blocks. The motion vector (MV) may have half-pixel resolution, where the half-pixel positions are calculated by bilinear interpolation between pixels at integer-pixel positions. The motion estimation (ME) method is not defined by the standard. The residual MB is formed by subtracting the motion-compensated MB in the reference frame (the prediction, MCP) from the current MB. The residual MB is transformed with the DCT, quantized, zig-zag scanned, run-level coded and entropy coded.
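Since ME is left to the encoder, the sketch below uses one common choice, an integer-pixel full search that minimizes the sum of absolute differences (SAD), together with half-pixel bilinear interpolation and residual formation. All names are hypothetical. The `rounding_control` argument is an assumption modelling the rounding-control bit that MPEG-4 signals for P-VOPs; with the default of 0 the formulas round half-values up.

```python
Frame = list[list[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int = 16) -> int:
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                search_range: int = 4, size: int = 16) -> tuple[int, int]:
    """Integer-pixel full search; candidate blocks must lie inside the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def half_pel_sample(ref: Frame, x2: int, y2: int, rounding_control: int = 0) -> int:
    """Bilinear sample at (x2/2, y2/2); coordinates are in half-pixel units."""
    x, y = x2 // 2, y2 // 2
    a = ref[y][x]
    if x2 % 2 == 0 and y2 % 2 == 0:
        return a
    if y2 % 2 == 0:  # horizontal half-pixel position
        return (a + ref[y][x + 1] + 1 - rounding_control) // 2
    if x2 % 2 == 0:  # vertical half-pixel position
        return (a + ref[y + 1][x] + 1 - rounding_control) // 2
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2 - rounding_control) // 4


def residual(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int = 16) -> Frame:
    """Current MB minus its motion-compensated prediction."""
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


def f(x: int, y: int) -> int:  # synthetic, non-repeating test texture
    return (3 * x * x + 5 * y * y + x * y) % 251


ref = [[f(x, y) for x in range(40)] for y in range(40)]
cur = [[f(x - 1, y + 2) for x in range(40)] for y in range(40)]  # content displaced by (-1, +2)
print(full_search(cur, ref, 12, 12))  # (-1, 2)
```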
The quantized residual data is rescaled and inverse transformed to reconstruct a locally decoded MB, which becomes part of the reference VOP used to predict subsequent P-VOPs. A coded P-VOP consists of a VOP header, optional video packet headers and coded MBs, each containing a header (this time including differentially encoded MVs) and coded residual coefficients for every 8×8 block. The decoder forms the same MCP from the received MV and its own copy of the reference VOP, and adds the decoded residual data to the prediction to reconstruct a decoded MB (motion-compensated reconstruction, MCR). Macroblocks within a P-VOP may be coded in Inter or Intra mode. Inter mode usually gives the best coding efficiency, but Intra mode may be useful where the MCP is inaccurate, such as in a newly uncovered region.
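Differential MV coding and MCR can be illustrated with a short Python sketch. It assumes the component-wise median predictor of H.263 and MPEG-4 (neighbours to the left, above and above-right) and ignores the special cases for unavailable neighbours, video packet boundaries and four-MV macroblocks; all function names are hypothetical.

```python
MV = tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def predict_mv(left: MV, above: MV, above_right: MV) -> MV:
    """Component-wise median of three neighbouring MVs."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))


def encode_mvd(mv: MV, pred: MV) -> MV:
    return (mv[0] - pred[0], mv[1] - pred[1])


def decode_mv(mvd: MV, pred: MV) -> MV:
    return (mvd[0] + pred[0], mvd[1] + pred[1])


def reconstruct(prediction, residual):
    """MCR: add the decoded residual to the prediction and clip to the 8-bit range."""
    return [[min(255, max(0, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


pred = predict_mv(left=(2, -1), above=(4, 3), above_right=(0, 1))
print(pred)                      # (2, 1)
print(encode_mvd((3, 1), pred))  # (1, 0)
```

Because encoder and decoder compute the same prediction from already decoded MVs, only the small difference needs to be transmitted.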
Short Header
The Short Header tool provides compatibility between MPEG-4 Visual and the ITU-T H.263 video coding standard. An I- or P-VOP encoded in Short Header mode has syntax identical to an I- or P-picture coded in the baseline mode of H.263, which means that an MPEG-4 I-VOP or P-VOP should be decodable by an H.263 decoder and vice versa. In Short Header mode, the MBs within a VOP are organized in Groups of Blocks (GOBs), each consisting of one or more complete rows of MBs. Each GOB may start with a resynchronization marker (a fixed-length code that enables a decoder to resynchronize when an error is encountered).
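The mapping from MBs to GOBs can be sketched in Python. The table of MB rows per GOB is an assumption taken from the five standard H.263 picture formats (one row for sub-QCIF, QCIF and CIF, two for 4CIF, four for 16CIF); the function names are hypothetical.

```python
# Assumed table of the five H.263 source formats: (width, height) -> MB rows per GOB
MB_ROWS_PER_GOB = {
    (128, 96): 1,     # sub-QCIF
    (176, 144): 1,    # QCIF
    (352, 288): 1,    # CIF
    (704, 576): 2,    # 4CIF
    (1408, 1152): 4,  # 16CIF
}


def gob_layout(width: int, height: int) -> tuple[int, int]:
    """Return (number of GOBs, MBs per GOB) for a Short Header picture."""
    rows_per_gob = MB_ROWS_PER_GOB[(width, height)]
    mb_cols, mb_rows = width // 16, height // 16
    return mb_rows // rows_per_gob, rows_per_gob * mb_cols


def gob_of_mb(mb_index: int, width: int, height: int) -> int:
    """GOB number containing the MB at raster-scan position mb_index."""
    mb_row = mb_index // (width // 16)
    return mb_row // MB_ROWS_PER_GOB[(width, height)]


print(gob_layout(176, 144))  # (9, 11)
```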
SP: Coding Efficiency Tools*
Four Motion Vectors per Macroblock
The default block size for ME is 16×16 for luma pixels and 8×8 for chroma pixels. This tool allows the encoder to choose a smaller ME block size of 8×8 for luma pixels, giving four MVs per MB; the chroma blocks are then compensated with a single MV derived from the four luma MVs. The mode can reduce the energy of the MC residual, particularly in areas of complex motion or near the boundaries of moving objects. Sending four MVs increases the overhead, so the encoder may choose to send one or four MVs on an MB-by-MB basis.
* The coding efficiency tools are not applicable in Short Header mode.
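The MB-by-MB choice between one and four MVs can be sketched as a cost comparison: run a search for the whole 16×16 block and for each 8×8 quadrant, then pick four MVs only if their combined SAD is lower. The `bias` parameter is a hypothetical stand-in for the extra cost of sending three more MVs; a real encoder would tune it or use a rate-distortion cost. All names are illustrative.

```python
Frame = list[list[int]]


def block_sad(cur: Frame, ref: Frame, x: int, y: int, dx: int, dy: int, size: int) -> int:
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


def best_mv(cur: Frame, ref: Frame, x: int, y: int, size: int, search_range: int):
    """Integer full search; returns (SAD, (dx, dy)). Candidates stay inside ref."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if 0 <= y + dy and y + dy + size <= h and 0 <= x + dx and x + dx + size <= w:
                cost = block_sad(cur, ref, x, y, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best


def choose_mv_mode(cur: Frame, ref: Frame, bx: int, by: int, search_range: int = 2, bias: int = 0):
    """Pick one 16x16 MV or four 8x8 MVs; bias is a hypothetical penalty for the extra MVs."""
    sad16, mv16 = best_mv(cur, ref, bx, by, 16, search_range)
    quads = [best_mv(cur, ref, bx + ox, by + oy, 8, search_range)
             for oy in (0, 8) for ox in (0, 8)]
    sad8 = sum(s for s, _ in quads)
    if sad8 + bias < sad16:
        return "4MV", [mv for _, mv in quads]
    return "1MV", [mv16]


def f(x: int, y: int) -> int:  # synthetic, non-repeating test texture
    return (3 * x * x + 5 * y * y + x * y) % 251


ref = [[f(x, y) for x in range(32)] for y in range(32)]
cur = [row[:] for row in ref]
print(choose_mv_mode(cur, ref, 8, 8))  # ('1MV', [(0, 0)])
for y in range(8, 16):  # move only the top-left 8x8 quadrant of the MB
    for x in range(8, 16):
        cur[y][x] = ref[y + 1][x + 2]
print(choose_mv_mode(cur, ref, 8, 8))  # ('4MV', [(2, 1), (0, 0), (0, 0), (0, 0)])
```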
Unrestricted Motion Vectors (UMV)
The UMV tool allows MVs to point to a region outside the reference VOP; samples in that region are extrapolated from the nearest edge pixels. UMV improves MC efficiency, especially when objects move into or out of the picture.
[Figure: Reference VOP extrapolated beyond its boundary (VOP edge), and the current VOP. The best match for the current MB lies partly in the extrapolated region outside the reference VOP.]
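Extrapolation from the nearest edge pixels amounts to clamping each reference coordinate to the VOP before reading it. The minimal Python sketch below shows integer-pixel prediction only; the function names are hypothetical.

```python
Frame = list[list[int]]


def ref_pixel_umv(ref: Frame, x: int, y: int) -> int:
    """Reference sample at (x, y); outside the VOP, repeat the nearest edge pixel."""
    h, w = len(ref), len(ref[0])
    return ref[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def predict_block_umv(ref: Frame, bx: int, by: int, dx: int, dy: int, size: int = 16) -> Frame:
    """Integer-pixel MC prediction whose MV may point outside the reference VOP."""
    return [[ref_pixel_umv(ref, bx + dx + i, by + dy + j) for i in range(size)]
            for j in range(size)]


ref = [[1, 2],
       [3, 4]]
print(predict_block_umv(ref, 0, 0, -1, -1, size=3))  # [[1, 1, 2], [1, 1, 2], [3, 3, 4]]
```

Clamping avoids storing an explicitly padded frame; an alternative design is to pad the reference VOP once by a fixed margin and then read it without bounds checks.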
Intra Prediction
Low-frequency transform coefficients of neighboring Intra-coded 8×8 blocks are often correlated. In this mode, the DC coefficient and (optionally) the first row or column of AC coefficients are predicted from neighboring coded blocks, as shown in Figs. 4.6 and 4.7. The direction of prediction for block X is determined by:

if $|DC_A - DC_B| < |DC_B - DC_C|$, predict from block C;
otherwise, predict from block A.
The prediction, $P_{DC}$, is formed by dividing the DC coefficient of the chosen neighboring block by a scaling factor (dc_scaler). $P_{DC}$ is then subtracted from the actual quantized DC coefficient ($QDC_X$), and the residual is coded and transmitted:

$$PQDC_X = QDC_X - P_{DC}$$

If A, B or C lies outside the VOP boundary, or is not Intra-coded, its DC coefficient value is assumed to be 1024.
[Fig. 4.6 Prediction of DC coefficients: block B is above-left of X, block C is above X and block A is to the left of X.]
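The DC prediction steps can be combined in a short Python sketch. It assumes that the neighbours' DC values are their rescaled DC coefficients (with 1024 substituted for unavailable blocks), that the division by dc_scaler rounds to the nearest integer with halves rounded away from zero, and that A, B and C are arranged as in Fig. 4.6. The names `predict_dc` and `round_div` are hypothetical.

```python
from typing import Optional

UNAVAILABLE_DC = 1024  # value used for a neighbour outside the VOP or not Intra-coded


def round_div(a: int, b: int) -> int:
    """Integer division by b > 0, rounded to nearest with halves away from zero (assumed)."""
    q, r = divmod(abs(a), b)
    if 2 * r >= b:
        q += 1
    return q if a >= 0 else -q


def predict_dc(qdc_x: int, dc_a: Optional[int], dc_b: Optional[int], dc_c: Optional[int],
               dc_scaler_x: int) -> tuple[str, int]:
    """Return the prediction direction and PQDC_X for block X.

    dc_a, dc_b, dc_c are the DC values of the left, above-left and above
    neighbours (None if unavailable).
    """
    a, b, c = (UNAVAILABLE_DC if v is None else v for v in (dc_a, dc_b, dc_c))
    direction = "C" if abs(a - b) < abs(b - c) else "A"
    chosen = c if direction == "C" else a
    p_dc = round_div(chosen, dc_scaler_x)  # P_DC
    return direction, qdc_x - p_dc         # PQDC_X = QDC_X - P_DC


print(predict_dc(52, 800, 808, 400, dc_scaler_x=8))     # ('C', 2)
print(predict_dc(20, None, None, None, dc_scaler_x=8))  # ('A', -108)
```

The direction rule compares gradients: a small difference between A and B (along the row above X's left neighbour) suggests horizontal structure is weak, so the block above, C, is the better predictor.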
AC coefficient prediction is carried out in a similar way: the first row or the first column of AC coefficients is predicted in the direction determined for the DC coefficient. The prediction is scaled according to the quantizer step sizes of block X and of block A or C.
[Fig. 4.7 Prediction of AC coefficients: the first row of X is predicted from block C above, or the first column of X from block A to the left.]
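A minimal Python sketch of AC prediction follows, assuming that the neighbour's quantized coefficients are rescaled by the ratio of the two quantizer step sizes with the same rounded division as for DC. The names `predict_ac`, `qp_nb` and `qf_nb` are hypothetical.

```python
Block = list[list[int]]


def round_div(a: int, b: int) -> int:
    """Integer division by b > 0, rounded to nearest with halves away from zero (assumed)."""
    q, r = divmod(abs(a), b)
    if 2 * r >= b:
        q += 1
    return q if a >= 0 else -q


def predict_ac(qf_x: Block, qf_nb: Block, qp_x: int, qp_nb: int, direction: str) -> Block:
    """Return a copy of qf_x with its first row ('C') or first column ('A') of AC
    coefficients replaced by prediction residuals. The neighbour's coefficients are
    scaled by qp_nb / qp_x so that both refer to the same quantizer step size."""
    out = [row[:] for row in qf_x]
    for i in range(1, 8):  # index 0 is the DC coefficient, predicted separately
        if direction == "C":  # block above: predict the first row
            out[0][i] = qf_x[0][i] - round_div(qf_nb[0][i] * qp_nb, qp_x)
        else:                 # block to the left: predict the first column
            out[i][0] = qf_x[i][0] - round_div(qf_nb[i][0] * qp_nb, qp_x)
    return out


x = [[0] * 8 for _ in range(8)]
c = [[0] * 8 for _ in range(8)]
x[0][:3] = [52, 5, 3]
c[0][:3] = [50, 4, 2]
print(predict_ac(x, c, qp_x=8, qp_nb=10, direction="C")[0][:3])  # [52, 0, 0]
```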
SP: Error Resilient Tools
A transmission error such as a bit error or packet loss may cause a video decoder to lose synchronization with the sequence of decoded VLCs. The decoder may then decode some or all of the information after the error incorrectly, causing spatial error propagation within the same VOP and temporal error propagation into subsequent VOPs that are predicted from it. When an error occurs, a decoder can resume correct decoding upon reaching a resynchronization marker (resync marker), which is inserted at the start of each VOP and (optionally) at the start of each GOB (see Fig. 4.8).
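Resynchronization can be sketched as a search for the next marker pattern in the bitstream. In this Python sketch the bitstream is a string of '0'/'1' characters and the marker is modelled as 16 zeros followed by a 1; both are simplifying assumptions, since the real marker length depends on the VOP type and the stream syntax is designed so that the marker cannot be emulated by valid data. The name `find_resync` is hypothetical.

```python
RESYNC_ZEROS = 16  # assumption: marker modelled as 16 zeros followed by a 1


def find_resync(bits: str, start: int = 0, zeros: int = RESYNC_ZEROS) -> int:
    """Return the position just after the next resync marker at or after start, or -1."""
    marker = "0" * zeros + "1"
    pos = bits.find(marker, start)
    return -1 if pos < 0 else pos + len(marker)


stream = "1011" + "0" * 16 + "1" + "110"
# After detecting an error at bit 2, skip to the next marker and resume there.
print(find_resync(stream, 2))  # 21
```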
[Figure: Illustration of error propagation. An erroneous macroblock leads to spatial error propagation within its VOP and, over time, temporal error propagation into subsequent VOPs.]
Video Packet
A transmitted VOP consists of one or more video packets (VPs). A VP consists of a resync marker, a header field and a series of coded MBs in raster scan order. The resync marker is followed by the number of the first MB in the packet, the quantization parameter and the HEC (Header Extension Code) flag. If HEC = 1, it is followed by a duplicate of the current VOP header.
[Fig. 4.8 Video packet structure: Sync | Header | HEC | (Header) | MB data | Sync]
The VP tool assists in error recovery at the decoder in several ways:
1. When an error is detected, the decoder can resynchronize at the start of the next video packet, so the error does not propagate beyond the boundary of the VP.
2. If used, the HEC field enables a decoder to recover a lost VOP header from elsewhere within the VOP.
3. Predictive coding (for example, of MVs and of Intra DC/AC coefficients) does not cross the boundary between VPs, which prevents an error from propagating into another VP.
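One way an encoder might form video packets is to start a new packet (inserting a resync marker) whenever the next MB would push the packet over a target size. The Python sketch below assumes such a bit-budget policy; the target `max_bits` and the per-MB bit counts are hypothetical inputs, and the packet-size policy is an encoder choice rather than something this section specifies.

```python
def packetize(mb_bits: list[int], max_bits: int) -> list[list[int]]:
    """Group MB indices (raster order) into video packets, starting a new packet
    (i.e., inserting a resync marker) when the next MB would exceed max_bits."""
    packets: list[list[int]] = []
    current: list[int] = []
    size = 0
    for mb, bits in enumerate(mb_bits):
        if current and size + bits > max_bits:
            packets.append(current)
            current, size = [], 0
        current.append(mb)
        size += bits
    if current:
        packets.append(current)
    return packets


def first_mb_numbers(packets: list[list[int]]) -> list[int]:
    """MB number carried in each packet header."""
    return [p[0] for p in packets]


packets = packetize([300, 400, 500, 200, 700, 100], max_bits=800)
print(packets)                    # [[0, 1], [2, 3], [4, 5]]
print(first_mb_numbers(packets))  # [0, 2, 4]
# An error in packet 1 only affects MBs 2 and 3; decoding resumes at MB 4.
```

Packing by bits rather than by a fixed number of MBs keeps the packets roughly equal in size, so highly active regions (many bits per MB) get resync markers more often.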
Data Partitioning
The data partitioning (DP) tool enables an encoder to reorganize the coded data within a video packet to reduce the impact of transmission errors. The packet is split into two partitions. The first partition contains the coding mode information for each MB together with the DC coefficients of each block (for Intra MBs) or the MVs (for Inter MBs). The second partition contains the remaining data, i.e., the AC coefficients and the DC coefficients of Inter MBs. The two partitions are separated by a marker (a DC marker in I-VOPs, a motion marker in P-VOPs). The information in the first partition is considered the most important, since it enables the decoder to partially decode the packet even if the second partition is lost due to transmission errors.
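The reorganization can be sketched as splitting each coded MB into a first-partition record (mode plus DC or MVs) and a second-partition record (remaining coefficient data). The `CodedMB` structure and the function names below are hypothetical and only model the grouping, not the bitstream syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodedMB:
    number: int
    intra: bool
    mode: str                   # coding-mode information (placeholder)
    dc: list[int]               # DC coefficients of each block (Intra MBs only)
    mvs: list[tuple[int, int]]  # motion vectors (Inter MBs only)
    texture: list[int]          # remaining coefficient data (AC, and Inter DC)


def partition(mbs: list[CodedMB]):
    """Split a packet's MBs into (first partition, second partition)."""
    first = [(mb.number, mb.mode, tuple(mb.dc) if mb.intra else tuple(mb.mvs)) for mb in mbs]
    second = [(mb.number, tuple(mb.texture)) for mb in mbs]
    return first, second


def partial_decode(first, second: Optional[list]):
    """Recover mode plus DC/MV for every MB even if the second partition was lost."""
    texture = dict(second) if second is not None else {}
    return [(number, mode, side, texture.get(number)) for number, mode, side in first]


mbs = [CodedMB(0, True, "intra", [64, 60, 58, 61, 128, 128], [], [3, -1]),
       CodedMB(1, False, "inter", [], [(1, -2)], [5, 0, 2])]
first, second = partition(mbs)
print(partial_decode(first, None))
# [(0, 'intra', (64, 60, 58, 61, 128, 128), None), (1, 'inter', ((1, -2),), None)]
```

With only the first partition, a decoder can still form a coarse Intra block from its DC values and a motion-compensated prediction for an Inter MB, which is a much better basis for concealment than discarding the whole packet.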
Reversible VLCs
An optional set of reversible VLCs (RVLCs) may be used to encode the DCT coefficient data. These codes can be decoded correctly in both the forward and reverse directions. A decoder first decodes each video packet in the forward direction and, if an error is detected, decodes the packet in the reverse direction starting from the next resync marker. Using this approach, the damage caused by an error may be limited to just one MB, making it easier to conceal the erroneous region.
[Fig. 4.9 Error recovery using RVLCs. Packet layout: Sync | Header | HEC | Header + MV | Texture | Sync. When an error occurs in the texture data, the packet is decoded in the forward direction up to the error and in the reverse direction back from the next Sync.]
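The forward-then-reverse strategy can be demonstrated with a toy reversible code. The code table below is hypothetical and is not the MPEG-4 RVLC table: every codeword is a palindrome and no codeword is a prefix of another, so the code is also suffix-free and can be decoded from either end. The sketch assumes the error is detected exactly where decoding fails; real decoders need extra checks because some errors are only detected later.

```python
# Hypothetical toy code table (not the MPEG-4 RVLC table).
CODE = {"a": "00", "b": "11", "c": "010", "d": "101", "e": "0110", "f": "1001"}
DECODE = {v: k for k, v in CODE.items()}
MAX_LEN = max(len(w) for w in CODE.values())


def encode(symbols: str) -> str:
    return "".join(CODE[s] for s in symbols)


def decode_forward(bits: str) -> tuple[list[str], int]:
    """Decode left to right until no codeword matches; return (symbols, bits consumed)."""
    symbols, pos = [], 0
    while pos < len(bits):
        for length in range(1, MAX_LEN + 1):
            word = bits[pos:pos + length]
            if len(word) == length and word in DECODE:
                symbols.append(DECODE[word])
                pos += length
                break
        else:
            break  # no codeword matches: error detected at pos
    return symbols, pos


def decode_reverse(bits: str) -> tuple[list[str], int]:
    """Decode right to left. Codewords are palindromes, so reversing the bit string
    yields the same codewords in reverse order."""
    symbols, consumed = decode_forward(bits[::-1])
    return symbols[::-1], consumed


def recover(bits: str):
    """Forward decode, then (on error) reverse decode from the end of the packet.
    Returns (forward symbols, reverse symbols, (start, end) of the discarded bits)."""
    fwd, f_pos = decode_forward(bits)
    if f_pos == len(bits):
        return fwd, [], None
    rev, r_len = decode_reverse(bits)
    return fwd, rev, (f_pos, len(bits) - r_len)


clean = encode("abcd")
print(decode_forward(clean) == decode_reverse(clean))  # True
damaged = encode("ab") + "01110" + encode("cd")  # "01110" stands in for corrupted bits
print(recover(damaged))  # (['a', 'b'], ['c', 'd'], (4, 9))
```

Only the bits between the forward error position and the point reached by the reverse pass are discarded; symbols on both sides of the damaged region are kept.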