
MPEG-2 standard and beyond
O. Le Meur, olemeur@irisa.fr
Univ. of Rennes 1
http://www.irisa.fr/temics/staff/lemeur/
November 18, 2009

Table of Content
1 A brief history of video compression standards

1 A brief history of video compression standards
Standard
A common framework
H.261
H.263
MPEG-1
Performances

Standard
What is the goal of a compression standard?
Definition (Standard): a format that has been approved by a recognized standards organization or is accepted as a de facto standard by the industry.
For compression standards, there are two organizations: ITU-T VCEG and ISO MPEG.
A video compression standard only specifies the bitstream syntax and the decoding process.
The goal is to create the best video compression standards for targeted applications.
[Figure: standardization process: CfE, CfP, first solution, assessment of proposals, then iterations of core experiments on the Verification Model (VM) and VM evolution.]
CfE: Call for Evidence; CfP: Call for Proposal.

A common framework
Video compression standard: a common framework for the different video standards (H.261, MPEG-1, MPEG-2, H.263, MPEG-4, H.264/AVC).
Note: for this part, most of the figures have been extracted from B. Girod's course (EE398 Image and Video Compression).

H.261
ITU-T Rec. H.261
International standard for ISDN picture phones and for video conferencing systems (1990);
Image format: CIF (352 × 288 Y samples) or QCIF (176 × 144 Y samples), frame rate from 7.5 to 30 fps;
Bit rate: multiple of 64 kbps, typically 128 kbps including audio;
Picture quality: acceptable at 128 kbps when motion in the scene is limited;
Stand-alone videoconferencing system or desktop videoconferencing system integrated with a PC.

H.261: main features
Macroblock (MB) of 16 × 16 pixels;
Sampling format 4:2:0: an MB is composed of 4 luminance blocks and 2 chrominance blocks.
Motion-compensated prediction:
  integer-pel accuracy;
  one displacement vector per MB;
  maximum displacement vector range ±15, horizontally and vertically;
  differential encoding of motion vectors.

H.261: residual coding
8 × 8 DCT;
Quantization:
  uniform quantizer (step size Δ = 8) for intra-mode DC coefficients;
  uniform threshold quantizer (step size Δ = 2, 4, ..., 62) for AC coefficients in intra-mode and all coefficients in inter-mode;
Zig-zag scan;
Run-level coding for entropy coding: (zero-run, value) symbols;
  zero-run: the number of coefficients quantized to zero since the last non-zero coefficient;
  value: the amplitude of the current non-zero coefficient.
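The (zero-run, value) symbols described above can be illustrated with a minimal sketch. The function names and the sample coefficient list are hypothetical, and the variable-length coding of the symbols (as well as the end-of-block code a real codec uses for trailing zeros) is left out.

```python
def run_level_encode(scanned):
    """Turn a zig-zag-scanned list of quantized coefficients into (zero-run, value) pairs."""
    symbols = []
    run = 0  # zeros seen since the last non-zero coefficient
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            symbols.append((run, coeff))
            run = 0
    # Trailing zeros produce no pair; a real codec signals them with an end-of-block code.
    return symbols


def run_level_decode(symbols, length=64):
    """Rebuild the scanned coefficient list, padding the tail with zeros."""
    out = []
    for run, value in symbols:
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * (length - len(out)))
    return out


# Hypothetical 8x8 block after quantization and zig-zag scan (64 values).
scanned = [12, 0, 0, -3, 1, 0, 0, 0, 2] + [0] * 55
symbols = run_level_encode(scanned)
print(symbols)
```

Most of the 64 coefficients are zero after quantization, so a handful of pairs is enough to describe the whole block.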

H.263
ITU-T Rec. H.263
International standard for picture phones over analog subscriber lines (1995);
Image format usually CIF, QCIF or Sub-QCIF, with a frame rate usually below 10 fps;
Bit rate is arbitrary (typically 20 kbps...);
Picture quality: with the new options, as good as H.261 at half the bit rate;
Software-only PC video phone or TV set-top box;
Widely used as compression engine for Internet video streaming;
H.263 is also the compression core of the MPEG-4 standard.
Four optional coding modes (the terms H.263+ and H.263++ are used to describe codecs supporting some or all of the optional coding modes).

H.263 vs H.261
Improved motion compensation:
  H.261 (1990): integer-pel accuracy, 1 motion vector per MB;
  H.263 (1995): half-pel accuracy, 1 motion vector per MB;
Improved 3-D VLC for DCT coefficients (last, run, level);
Reduced overhead;
Support for more picture formats.

MPEG-1: main features
This standard was developed for the specific application of video storage and playback on compact discs;
Block-based motion compensation, hybrid DPCM;
Optimized for a bit rate around 1.2 Mbit/s.

MPEG-1

Performances

Performances

MPEG-2

Brief history
MPEG = Moving Picture Experts Group, part of the International Organization for Standardization (ISO).
Targeted applications: MPEG-2 is dedicated to digital storage media and broadcast. The targeted bit rate is in the range 1 to 20 Mbps:
1 to 6 Mbps: digital television broadcasting (SD);
5 to 8 Mbps: DVD video;
10 to 20 Mbps: digital television broadcasting (HD).
The work started in November 1991 and the standard (MPEG-2, ISO/IEC 13818) was published in November 1995.

Brief history
MPEG-2 (ISO/IEC 13818):
13818-1: Systems;
13818-2: Video;
13818-3: Audio;
13818-4: Conformance;
13818-5: Software;
13818-6: Digital Storage Media;
13818-7: Non-Backward Compatible Audio;
...
MPEG-2 = MPEG-1 (ISO/IEC 11172) + Interlace Tools (field picture, DCT, prediction...) + Profiles & Levels

Typical MPEG encoder structure
[Figure: block diagram of a typical MPEG encoder. Blocks: input, prediction error, DCT, quantization (Q), rate regulation, RLE-Huffman coding, inverse quantization (Q^-1), inverse DCT (DCT^-1), reconstructed image, frame memories 1 and 2, motion estimation, MC prediction, prediction choice (null or MC prediction). Outputs: motion vectors and binary stream.]
The encoder is open to invention and proprietary techniques.

Hierarchical syntax
MPEG structure: video sequence, group of pictures, picture, slice, macroblock (16x16), block (8x8).

Slices
Definition (Slice): a slice is composed of an arbitrary number of consecutive macroblocks.
The first and last macroblocks of a slice shall not be skipped macroblocks;
Every slice shall contain at least one macroblock;
Slices shall not overlap;
The position of slices may change from picture to picture;
The first and last macroblocks of a slice shall be in the same horizontal row of macroblocks.

Slices
Two slice structures:
General slice structure: the slices need not cover the entire picture;
Restricted slice structure: every macroblock shall be enclosed in a slice.
[Figure: (a) general slice structure, (b) restricted slice structure]

Macroblock
Three macroblock structures: (a) 4:2:0, (b) 4:2:2, (c) 4:4:4.

Types of pictures
I, P, and B pictures:
Intra picture (I): intra-frame spatial DCT;
Predicted picture (P): DCT with forward prediction (residual coding);
Bi-directional picture (B): DCT with bi-directional prediction (residual coding).
[Figure: sequence I B B P B B P B B with forward-prediction and bi-directional-prediction arrows]
These three types of pictures are used to form the GOP (Group of Pictures).

Group of Pictures
GOP N-M: N is the I picture interval and M is the anchor picture interval (M-1 B pictures between anchor pictures).
A GOP must contain an I picture;
B pictures must be located between anchor pictures (I or P);
A GOP must start with an I picture in coding order;
A GOP must start with an I or B picture and must end with an I or P picture in display order.
Example (GOP in coding order):
GOP 1-1: I I I I I I I I I
GOP 6-2: I B P B P B I B P
GOP 12-3: I B B P B B P B B P B B
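As a small illustration of the N-M notation, the sketch below generates the picture types of one GOP from N and M. The function name is hypothetical; it simply places an I picture first, an anchor (P) every M pictures, and B pictures elsewhere, which reproduces the patterns listed in the example.

```python
def gop_pattern(n, m):
    """Picture types of one GOP: N pictures, one anchor every M pictures."""
    types = []
    for k in range(n):
        if k == 0:
            types.append("I")      # the GOP opens with its I picture
        elif k % m == 0:
            types.append("P")      # anchor picture every M positions
        else:
            types.append("B")      # M-1 B pictures between anchors
    return " ".join(types)


for n, m in [(1, 1), (6, 2), (12, 3)]:
    print(f"GOP {n}-{m}: {gop_pattern(n, m)}")
```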

Group of Pictures
Display and coding order
Display order: input of the encoder and output of the decoder;
Coding order: output of the encoder and input of the decoder.
Display order (input encoder):  B1 B2 I1 B3 B4 P1 B5 B6 P2 B7 B8 P3 B9
Coding order (output encoder):  I1 B1 B2 P1 B3 B4 P2 B5 B6 P3 B7 B8
Display order (output decoder): B1 B2 I1 B3 B4 P1 B5 B6 P2 B7 B8
(time runs from left to right)
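The reordering between display order and coding order can be sketched as follows: B pictures are held back until the anchor picture that follows them in display order has been emitted. The function name and the string labels are assumptions made for this illustration.

```python
def display_to_coding_order(display):
    """Reorder pictures so that each anchor (I or P) precedes the B pictures that depend on it."""
    coding = []
    pending_b = []  # B pictures waiting for their following anchor
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)
        else:
            coding.append(pic)          # emit the anchor first
            coding.extend(pending_b)    # then the B pictures displayed before it
            pending_b = []
    # B pictures left in pending_b still wait for a future anchor.
    return coding, pending_b


display = "B1 B2 I1 B3 B4 P1 B5 B6 P2 B7 B8 P3 B9".split()
coding, waiting = display_to_coding_order(display)
print(" ".join(coding))   # I1 B1 B2 P1 B3 B4 P2 B5 B6 P3 B7 B8
print(waiting)            # ['B9']
```

This is why the coding order in the example stops at B8: B9 cannot be coded before the next anchor picture is available.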

Input Format
YUV 4:2:0
Color space YUV 4:4:4:
$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.1687 & -0.3313 & 0.5 \\ 0.5 & -0.4187 & -0.0813 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}$$
[Figure: (a) source, (b) Y, (c) U (blue), (d) V (red)]
Format 4:2:0 (human eyes are less sensitive to chrominance than to luminance):
[Figure: (a) source, (b) Y, (c) U, (d) V]
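A minimal sketch of the two steps on this slide: the RGB to YUV conversion and the 4:2:0 chroma subsampling. The signs of the U and V coefficients are the usual ones for this matrix, and averaging each 2x2 neighbourhood is only one possible subsampling filter, chosen here as an assumption; function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Apply the 3x3 conversion matrix to one RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.1687 * r - 0.3313 * g + 0.5 * b
    v = 0.5 * r - 0.4187 * g - 0.0813 * b
    return y, u, v


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 neighbourhoods.

    `plane` is a list of rows with even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


# A white pixel carries luminance only: U and V are (numerically) zero.
print(rgb_to_yuv(255, 255, 255))
# A 4x4 chroma plane becomes 2x2, so 4:2:0 keeps one U and one V sample per 4 pixels.
small = subsample_420([[1, 3, 0, 0], [5, 7, 0, 0], [2, 2, 8, 8], [2, 2, 8, 8]])
print(small)
```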

Interlaced format
Progressive vs Interlaced
Progressive: one time instant is required to acquire the picture;
Interlaced: two time instants are required to acquire the picture (two fields, field period = frame period / 2).
[Figure: a progressive frame acquired at time t; an interlaced frame made of field 1 at time t and field 2 at time t + δ]

Interlaced format
Progressive vs Interlaced
Interlaced scanning is a way to save bit rate (it is almost invisible to us because of the persistence of vision and our critical flicker frequency).
Interlaced scanning provides a high vertical resolution for still scenes (similar to the progressive format);
Artifacts appear in moving areas.

DCT
Orthogonal transform of an 8x8 pixel block into an 8x8 frequency coefficient matrix.
The DCT of a block I of size NxN is defined by:
$$\mathrm{DCT}(I)(n,m) = \frac{2}{N}\,\lambda(n)\,\lambda(m) \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \cos\frac{\pi(2i+1)n}{2N}\,\cos\frac{\pi(2j+1)m}{2N}\; I(i,j)$$
The inverse DCT is defined by:
$$I(x,y) = \frac{2}{N} \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} \lambda(n)\,\lambda(m)\,\cos\frac{\pi(2x+1)n}{2N}\,\cos\frac{\pi(2y+1)m}{2N}\; \mathrm{DCT}(I)(n,m)$$
with
$$\lambda(t) = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } t = 0 \\ 1 & \text{otherwise} \end{cases}$$
[Figure: two-dimensional DCT basis.] The source data (8x8) is transformed into a linear combination of these 64 frequency squares.
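The two definitions can be checked numerically with a direct, unoptimized implementation. This is only a sketch that follows the formulas term by term (with λ(0) = 1/√2); practical encoders use fast separable algorithms instead, and the function names are hypothetical.

```python
import math


def lam(t):
    """Normalization factor: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if t == 0 else 1.0


def dct2(block):
    """Forward 2-D DCT of an NxN block, straight from the definition."""
    size = len(block)
    coeffs = [[0.0] * size for _ in range(size)]
    for n in range(size):
        for m in range(size):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    acc += (math.cos(math.pi * (2 * i + 1) * n / (2 * size))
                            * math.cos(math.pi * (2 * j + 1) * m / (2 * size))
                            * block[i][j])
            coeffs[n][m] = 2.0 / size * lam(n) * lam(m) * acc
    return coeffs


def idct2(coeffs):
    """Inverse 2-D DCT of an NxN coefficient matrix."""
    size = len(coeffs)
    block = [[0.0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            acc = 0.0
            for n in range(size):
                for m in range(size):
                    acc += (lam(n) * lam(m)
                            * math.cos(math.pi * (2 * x + 1) * n / (2 * size))
                            * math.cos(math.pi * (2 * y + 1) * m / (2 * size))
                            * coeffs[n][m])
            block[x][y] = 2.0 / size * acc
    return block


# A flat 8x8 block puts all its energy in the DC coefficient (N times the pixel value).
flat = [[10.0] * 8 for _ in range(8)]
flat_coeffs = dct2(flat)
print(round(flat_coeffs[0][0], 6))

# The inverse transform recovers an arbitrary block up to rounding errors.
ramp = [[float(8 * i + j) for j in range(8)] for i in range(8)]
rebuilt = idct2(dct2(ramp))
```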

DCT example
[Figure: original image and its DCT (axes f_u, f_v); normalized histograms showing the distribution of the DC and AC coefficients.]

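Field/frame DCT
Frame or field DCT?
Frame DCT is the transformation mode used in MPEG-1: the 8x8 blocks are taken from the lines of the 16x16 macroblock as they appear in the frame.
Field DCT is applied on the fields: each 8x8 block is built from lines of a single field.
[Figure: a 16x16 macroblock split into 8x8 blocks for frame DCT and for field DCT]
You have the choice between these two different DCTs. What is the general idea to be efficient?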
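One possible answer to the question above is to pick the organisation whose lines are the most alike, since correlated lines give a more compact DCT. The sketch below is a common heuristic written for illustration, not a rule taken from the standard or from these slides; the function names and the absolute-difference measure are assumptions.

```python
def line_difference(rows):
    """Sum of absolute differences between vertically adjacent rows."""
    return sum(
        abs(a - b)
        for upper, lower in zip(rows, rows[1:])
        for a, b in zip(upper, lower)
    )


def choose_dct_mode(macroblock):
    """Return 'frame' or 'field' for a 16x16 luminance macroblock (list of 16 rows)."""
    frame_cost = line_difference(macroblock)
    # Even rows form the top field, odd rows the bottom field.
    field_cost = line_difference(macroblock[0::2]) + line_difference(macroblock[1::2])
    return "field" if field_cost < frame_cost else "frame"


# Static vertical gradient: adjacent frame lines are closest, frame DCT wins.
static_mb = [[y] * 16 for y in range(16)]
# Strong motion between the two fields: lines are alike only within a field.
moving_mb = [[0] * 16 if y % 2 == 0 else [100] * 16 for y in range(16)]
print(choose_dct_mode(static_mb), choose_dct_mode(moving_mb))
```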

Quantization
Processing chain: input, DCT, quantization matrix, scalar quantization, coefficients.
Definition (Scalar quantization):
$$Q : X \rightarrow C = \{y_i,\ i = 1, 2, \ldots, N\}, \qquad x \mapsto Q(x) = y_i$$
N is the number of quantization levels;
X is discrete;
C is always discrete (codebook, dictionary);
card(X) > card(C);
As x ≠ Q(x) in general, we will lose some information (lossy compression).

Quantization
Two different quantizer scales:
Linear quantizer scale (q_scale_type = 0);
Non-linear quantizer scale (q_scale_type = 1).

Quantization
Quantization matrix
A quantization matrix can be used to weight the quantization of the coefficients. This matrix is based on the Contrast Sensitivity Function (coarser quantization of high spatial frequencies without visual annoyance).
Default matrices are specified by the standard, so it is not necessary to send them. This is not the case for a proprietary matrix, which has to be transmitted.
Example (illustration of our visual sensitivity, CSF): psychophysical experiments seek to determine whether the subject can detect a stimulus.

Quantization
Default quantization matrix
Default matrix for intra coding:
$$Q_I = \begin{pmatrix}
8 & 16 & 19 & 22 & 26 & 27 & 29 & 34 \\
16 & 16 & 22 & 24 & 27 & 29 & 34 & 37 \\
19 & 22 & 26 & 27 & 29 & 34 & 34 & 38 \\
22 & 22 & 26 & 27 & 29 & 34 & 37 & 40 \\
22 & 26 & 27 & 29 & 32 & 35 & 40 & 48 \\
26 & 27 & 29 & 32 & 35 & 40 & 48 & 58 \\
26 & 27 & 29 & 34 & 38 & 46 & 56 & 69 \\
27 & 29 & 35 & 38 & 46 & 56 & 69 & 83
\end{pmatrix}$$
Non-intra matrix: all 64 entries are equal to 16,
$$Q_{NI}(i,j) = 16, \quad 0 \le i, j \le 7$$
Non-intra quantization matrix (MPEG-2 Test Model 5):
$$Q_{NI} = \begin{pmatrix}
16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 \\
17 & 18 & 19 & 20 & 21 & 22 & 23 & 24 \\
18 & 19 & 20 & 21 & 22 & 23 & 24 & 25 \\
19 & 20 & 21 & 22 & 23 & 24 & 25 & 27 \\
20 & 21 & 22 & 23 & 25 & 26 & 27 & 28 \\
21 & 22 & 23 & 24 & 26 & 27 & 28 & 30 \\
22 & 23 & 24 & 26 & 27 & 28 & 30 & 31 \\
23 & 24 & 25 & 27 & 28 & 30 & 31 & 33
\end{pmatrix}$$
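To show how such a matrix weights the quantization, here is a deliberately simplified sketch. It assumes a quantization step of `matrix[i][j] * qscale / 16` for every coefficient and plain rounding; the normative MPEG-2 process differs in details (for instance, the intra DC coefficient is handled separately), so the function names and the formula are illustrative assumptions only.

```python
INTRA_MATRIX = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def quantize(coeffs, matrix, qscale):
    """Simplified weighted quantization: step = matrix[i][j] * qscale / 16."""
    return [
        [int(round(16.0 * coeffs[i][j] / (matrix[i][j] * qscale))) for j in range(8)]
        for i in range(8)
    ]


def dequantize(levels, matrix, qscale):
    """Inverse of the simplified quantizer above."""
    return [
        [levels[i][j] * matrix[i][j] * qscale / 16.0 for j in range(8)]
        for i in range(8)
    ]


# Same amplitude at every frequency: high frequencies end up with far fewer levels.
coeffs = [[100.0] * 8 for _ in range(8)]
levels = quantize(coeffs, INTRA_MATRIX, qscale=8)
print(levels[0][0], levels[7][7])
```

With identical input amplitudes, the low-frequency corner keeps a fine representation while the high-frequency corner is quantized much more coarsely, which is the intended effect of the CSF-based weighting.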

Zig-zag scan
Two scan patterns are used in MPEG-2:
Normal zig-zag scan, as defined in MPEG-1;
Alternate zig-zag scan. This new scan pattern can be used when the frame DCT is applied on an interlaced video, because there are then more significant DCT coefficients at the highest vertical frequencies.
[Figure: normal zig-zag scan (left) and alternate scan (right)]
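The normal zig-zag order can be generated by walking the anti-diagonals of the block and alternating the direction on each one. This sketch covers the normal scan only; the alternate scan is defined by a table in the standard and is not reproduced here. Function names are hypothetical.

```python
def zigzag_order(size=8):
    """(row, column) positions of a size x size block in normal zig-zag order."""
    positions = [(i, j) for i in range(size) for j in range(size)]
    # Positions are grouped by anti-diagonal (i + j); odd diagonals are walked
    # with increasing row, even diagonals with decreasing row.
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


order = zigzag_order()
print(order[:6])
```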

Macroblock coding
I pictures:
  all macroblocks are encoded in INTRA.
P pictures:
  macroblocks can be encoded in INTRA;
  macroblocks can be encoded in INTER (with a forward prediction).
B pictures:
  macroblocks can be encoded in INTRA (fallback mode);
  macroblocks can be encoded in INTER: forward prediction; backward prediction; bidirectional prediction.

MC Prediction
Motion estimation and motion compensation (MC). An image of the sequence is defined by I(x, y, t): (x, y) represents the spatial coordinates whereas t represents the time.
Fundamental assumption: the image intensity is conserved along trajectories,
$$I(x, y, t) = I(x + \delta_x,\ y + \delta_y,\ t + \delta_t)$$
Classification:
Feature / region matching: the motion field is estimated by correlating features (edge, intensity...) from one frame to another (block matching, phase correlation...);
Gradient-based methods: the motion field is estimated by using spatio-temporal gradients of the image intensity distribution (pel-recursive method, the Horn-Schunck algorithm...).
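Block matching, the region-matching method used by most encoders, can be sketched as an exhaustive search. The sum of absolute differences (SAD) criterion, the search range, the integer-pel accuracy and all names below are assumptions chosen for the illustration; the standard does not impose any estimation method.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def block_match(cur, ref, bx, by, size=16, search=7):
    """Full search: return the displacement (dx, dy) minimizing the SAD, and that SAD."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference picture.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + size > height or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


def texture(x, y):
    """Hypothetical non-repeating texture used to build the test pictures."""
    return 3 * x * x + 5 * y * y + x * y


ref = [[texture(x, y) for x in range(16)] for y in range(16)]
# The current picture shows the same content displaced by (1, 2) pixels.
cur = [[texture(x + 1, y + 2) for x in range(16)] for y in range(16)]
vector, cost = block_match(cur, ref, bx=6, by=6, size=4, search=3)
print(vector, cost)
```

The vector returned here points from the current block to its best match in the reference picture, which is the displacement (δx, δy) of the conservation equation.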

MC Prediction
Motion models
The motion model used in MPEG-2 is the 2D translation (2 parameters). This translation-only model is used in video coding because it works quite well: the motion between consecutive frames is rather small.
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} d_x \\ d_y \end{pmatrix}$$
Rotation, scaling and deformation are not taken into account to evaluate the displacement.

MC Prediction
Motion estimation/compensation
Performed on the luminance macroblock (16 × 16);
Supports half-pixel motion compensation;
Chrominance motion vectors are half of the luminance MB vector;
Motion vector range: -2048 to +2047.5 pixels with half-pixel accuracy.

MC Prediction

MC Prediction
Forward prediction
[Figure: sequence I B B P B B P B with forward-prediction arrows]
A forward-predicted macroblock depends on decoded pixels from the immediately preceding anchor picture;
This mode can be used to encode macroblocks in P and B pictures.

MC Prediction
Backward prediction
[Figure: sequence P B B P B B I B with backward-prediction arrows]
A backward-predicted macroblock depends on decoded pixels from the immediately following anchor picture;
This mode can only be used to encode macroblocks in B pictures.

MC Prediction
Bi-directional prediction
[Figure: sequence I B B P B B P B with bi-directional-prediction arrows]
A bi-directionally-predicted macroblock depends on decoded pixels from the anchor pictures immediately following and immediately preceding;
This mode can only be used to encode macroblocks in B pictures.

MC prediction
The prediction depends on the type of picture (frame or field). Two predictions are possible: frame prediction or field prediction.
Prediction mode for frame pictures: frame prediction
The goal is to get one 16x16 motion vector for the current frame picture.
[Figure: frame prediction for a frame picture, one 16x16 vector (16x16 block, top and bottom fields)]
Works well for videos with slow and moderate displacements.

MC prediction
Prediction mode for frame pictures: field prediction
The goal is to get two 16x8 vectors, one for the current top field and one for the current bottom field. Two candidates are tested for each case.
[Figure: field prediction for a frame picture, two 16x8 vectors (top and bottom fields)]

MC prediction
For field pictures, the prediction also depends on the current field: field prediction in the first field and field prediction in the second field.
Prediction mode for field pictures: field prediction in the first field
[Figure: field prediction for a field picture, one 16x16 vector (top and bottom fields)]

MC prediction
Prediction mode for field pictures: field prediction in the second field
The goal is to get one 16x16 motion vector. This case is only possible for P prediction.
[Figure: field prediction for a field picture, one 16x16 vector (top and bottom fields)]

MC prediction
Remarks: two other predictions are not detailed here.
Dual Prime: only valid for P pictures; the idea is to propose a predictor that is the average of two candidates from the top and bottom fields;
16x8 Motion Compensation (16x8 MC): particular prediction for field pictures and field prediction. An MB belonging to the current field is broken into two blocks of size 16x8. A search is performed for each part on the top and bottom reference fields.

MC prediction
Example (motion mode decision for P pictures):
Compute the MSE between the block and the zero-motion prediction;
Compute the MSE between the block and its MC frame prediction block;
Compute the MSE between the block and its MC field prediction block;
Compute the MSE between the block and its MC dual-prime prediction block;
Choose the prediction mode with the least MSE.
In practice, a regularization term is used to decide which vectors will be chosen. The regularization term takes into account the estimated coding cost of the chosen motion vectors.
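The decision rule with its regularization term can be written as the minimization of a cost of the form MSE + λ × (estimated motion vector bits). This exact cost form, the weight λ and the numbers below are hypothetical and only serve to illustrate how the regularization can change the decision.

```python
def choose_prediction_mode(candidates, lagrangian_weight):
    """Pick the mode minimizing MSE + weight * estimated motion-vector bits.

    `candidates` maps a mode name to a (mse, mv_bits) pair.
    """
    def cost(mode):
        mse, mv_bits = candidates[mode]
        return mse + lagrangian_weight * mv_bits

    return min(candidates, key=cost)


# Hypothetical measurements for one macroblock of a P picture.
candidates = {
    "zero_motion": (52.0, 0),   # no vector to transmit
    "frame": (40.0, 10),        # one 16x16 vector
    "field": (36.0, 22),        # two 16x8 vectors
    "dual_prime": (38.0, 14),
}

print(choose_prediction_mode(candidates, 0.0))  # pure MSE criterion
print(choose_prediction_mode(candidates, 1.0))  # vector cost taken into account
```

With a zero weight the field prediction wins on MSE alone; once the cost of its two vectors is counted, the frame prediction becomes the better compromise.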

MB modes
I picture
Intra: frame DCT or field DCT.

MB modes
P picture
Forward frame prediction: frame DCT or field DCT;
Forward field prediction: frame DCT or field DCT;
Intra: frame DCT or field DCT;
No MC;
Skip MB: if and only if the quantized coefficients are all zero and the motion vector is null.

MB modes
B picture
Forward frame prediction: frame DCT or field DCT;
Forward field prediction: frame DCT or field DCT;
Backward frame prediction: frame DCT or field DCT;
Backward field prediction: frame DCT or field DCT;
Bidirectional frame prediction: frame DCT or field DCT;
Bidirectional field prediction: frame DCT or field DCT;
Intra: frame DCT or field DCT;
Skip MB: the quantized coefficients are all zero; the motion vector is null.

Macroblock structure

Scalability
Definition (Scalability): scalability allows videos with different resolutions or qualities to be decoded from the same bit stream.
Four modes of scalability are available:
Spatial scalability;
Temporal scalability;
SNR scalability;
Data partitioning.

Spatial scalability
A spatial scalability scheme encodes a video in a way that allows it to be decoded at multiple spatial resolutions:
two coder loops operate at different picture resolutions to produce the base and enhancement layers;
decoded pictures from a lower layer are used as a prediction in a higher layer;
this prediction comes in addition to the prediction from the upper layer's motion-compensated predictor.
The adaptive weighting function, W, selects between the predictions from the upper and lower layers.

Temporal scalability
In a temporal scalability scheme, intermediate video frames are encoded in a way that allows them to be dropped.
[Figures: option 1 and option 2]

SNR scalability
SNR scalability, also called fidelity or quality scalability, is used to encode a video at a single spatial resolution but in a way that allows it to be decoded at different quality levels. It relies on:
the addition of an extra quantisation stage;
the coder quantises the DCT coefficients to a given accuracy, variable-length codes them and transmits them as the lower-level or base-layer bitstream;
the quantisation error introduced by the first quantiser is itself quantised, variable-length coded and transmitted as the upper-level or enhancement-layer bitstream;
the enhancement layer is composed of coded refinement DCT coefficients with a small overhead.

SNR scalability
DCT coefficients in the base layer are added to DCT coefficients in the enhancement layer;
The combined layer decoding process is identical to the decoding of a non-scalable bit stream;
Different rate controls are used for the 2 layers.
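The two quantisation stages of SNR scalability can be sketched on a few DCT coefficients. The uniform quantisers, the step sizes and the coefficient values are hypothetical; entropy coding and rate control are left out.

```python
def quantise(value, step):
    """Uniform quantiser returning an integer level."""
    return int(round(value / step))


def snr_scalable_encode(coeffs, base_step, enh_step):
    """Base layer: coarse quantisation. Enhancement layer: quantised base-layer error."""
    base = [quantise(c, base_step) for c in coeffs]
    errors = [c - level * base_step for c, level in zip(coeffs, base)]
    enhancement = [quantise(e, enh_step) for e in errors]
    return base, enhancement


def snr_scalable_decode(base, enhancement, base_step, enh_step, use_enhancement=True):
    """Add the refinement to the base-layer coefficients when the upper layer is available."""
    decoded = []
    for b, e in zip(base, enhancement):
        value = b * base_step
        if use_enhancement:
            value += e * enh_step
        decoded.append(value)
    return decoded


coeffs = [100.0, -37.0, 12.0, 3.0]
base, enhancement = snr_scalable_encode(coeffs, base_step=16, enh_step=4)
low_quality = snr_scalable_decode(base, enhancement, 16, 4, use_enhancement=False)
high_quality = snr_scalable_decode(base, enhancement, 16, 4)
print(low_quality)   # [96, -32, 16, 0]
print(high_quality)  # [100, -36, 12, 4]
```

A decoder that only receives the base layer gets the coarse reconstruction, while adding the refinement coefficients brings the error down to what the finer quantiser allows.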

Data partitioning
The base layer contains the most critical components, such as header information, motion vectors and (optionally) low-frequency DCT coefficients. The enhancement layer contains all remaining coded data.
The bit stream is split into 2 layers (partition 0 and partition 1);
The priority breakpoint (in sequence header) indicates which syntax elements are placed in partition 0, which is the base or high-priority partition.

Data partitioning
[Figures: bit stream without data partitioning and with data partitioning]
Priority breakpoints:
after the slice header;
after the MB address increment (MB header);
before the coded block pattern;
after any number of DCT coefficients (excluding one).

Profiles and levels: what is the goal?
The idea is to define subsets of the standard in order to provide low-cost decoders and to adapt the decoder to a particular target. There are 6 profiles (Simple, Main, SNR, Spatial, High, 4:2:2) and 4 levels (Low, Main, High-1440, High).

The most used is the Main Profile at Main Level (MP@ML):
Main Profile:
  B frames supported;
  4:2:2 and 4:4:4 not supported;
  scalable modes not supported.
Main Level:
  max. picture size: 720x576, 30 frames/sec;
  max. bit rate: 15 Mbps;
  max. buffer size: 1.835008 Mbits.

Artifacts
Blocky artifacts:
  the blocky grid remains fixed while the objects move under it;
  due to a poor motion estimation;
  lack of bits to encode the AC coefficients.
Mosquito noise:
  may be seen at edges;
  the discontinuity causes high frequencies, and with a quite strong quantization we cannot retrieve the original signal (the energy is spread spatially).
Dirty window: noise appears to remain stationary while objects move beneath it.

Artifacts

Motion estimation is based on a linear translational model: every pixel of a block undergoes the same displacement (translation). This model does not handle:
Zooms;
Rotations;
Transparent moving objects;
Dissolves containing moving objects.

3 Added values of an encoder
Introduction
Bit Rate allocation
Preprocessing
Efficient coding mode decision
Adaptive quantization
Motion estimator
Spatio-temporal events detection
Adaptive GOP
Statistical Multiplexing

Introduction
Normative and non-normative algorithms
The standard gives a general framework to encode a video sequence. More accurately, the MPEG-2 standard defines how to generate MPEG-2 bit streams.
To design an optimized MPEG-2 encoder system, several areas of research have to be considered (bit rate allocation, adaptive quantization, coding mode decisions...).
PRODUCT DIFFERENTIATION
Actors: Grass Valley (Thomson), Envivio, Harmonic, Tandberg, Scientific Atlanta...

Bit Rate allocation
VBR and CBR
Constant Bit Rate (CBR) coding provides a constant bit rate with a non-uniform picture quality;
Variable Bit Rate (VBR) coding provides a constant picture quality with a variable coding bit rate.
Goal and a priori rules (for VBR)
The goal of the rate control is to achieve the highest quality given a target bit rate. To reach this goal, the rate control is commonly based on the following propositions:
As I frames play an important role in the prediction, their target bit budget is expected to be higher than that of the other two picture types (quality propagation throughout the GOP);
In order to have a constant bit rate and the best quality, the bit budget of complex frames should be higher than the bit budget of simple frames;
In order to have a constant quality within the frame, some properties of the human visual system have to be taken into account (adaptive quantization driven by a visual masking model).

Bit Rate allocation
General framework
1 GOP bit budget: $B_{GOP} = C \cdot N / F$ bits per GOP, where $C$ is the bit rate, $N$ the number of pictures in the GOP and $F$ the frame rate;
2 Picture bit budget:
$$B_{GOP} = B_I + n_P B_P + n_B B_B$$
$$B_{GOP} = B_I + n_P \frac{B_I}{k_{I,P}} + n_B \frac{B_I}{k_{I,B}}$$
$$B_{GOP} = B_I \left(1 + \frac{n_P}{k_{I,P}} + \frac{n_B}{k_{I,B}}\right)$$
with $n_j$ the number of pictures of type $j$ in the GOP ($j \in \{P, B\}$), and $k_{I,P}$ and $k_{I,B}$ the ratios between the bit budget of an intra picture and those of P and B pictures, respectively.
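The general framework can be turned into a short calculation: the last equation is solved for the I picture budget, and the P and B budgets follow from the ratios. The bit rate, GOP structure and ratio values used below are hypothetical.

```python
def gop_bit_budget(bit_rate, n_pictures, frame_rate):
    """B_GOP = C * N / F, in bits."""
    return bit_rate * n_pictures / frame_rate


def picture_bit_budgets(b_gop, n_p, n_b, k_ip, k_ib):
    """Solve B_GOP = B_I * (1 + n_P / k_IP + n_B / k_IB) for B_I, then derive B_P and B_B."""
    b_i = b_gop / (1.0 + n_p / k_ip + n_b / k_ib)
    return b_i, b_i / k_ip, b_i / k_ib


# Hypothetical setting: 4.5 Mbps, GOP 12-3 (1 I, 3 P, 8 B pictures), 25 frames/s,
# an I picture costing twice a P picture and four times a B picture.
b_gop = gop_bit_budget(4_500_000, 12, 25)
b_i, b_p, b_b = picture_bit_budgets(b_gop, n_p=3, n_b=8, k_ip=2.0, k_ib=4.0)
print(b_gop, b_i, b_p, b_b)
```

The three budgets add up to the GOP budget again, which is a quick way to check the ratios.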



Bit Rate allocation
Test Model 5 (TM5)
TM5 is a test model to verify MPEG-2 algorithms and tools. It consists of three steps:
1 Target picture bit allocation: estimation of the number of bits needed to code a given picture (estimation mainly based on the picture type and its spatial complexity).
2 MB quantization parameter assignment: within a picture, the bits required to encode each MB are determined. A quantizer step is derived from the number of bits (check the total number of bits and adapt the target if necessary). Strong assumption: the distortion D increases linearly with the quantization step Q, and $R \propto 1/D$.
3 Activity modulation of the MB quantization parameter: the quantization step of each MB is adjusted depending on its spatial complexity (visual masking).
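Step 3 can be sketched with the activity normalization usually attributed to TM5, which maps the spatial activity of a macroblock to a factor between 0.5 and 2. The formula is not given on the slide and is quoted here from the TM5 description as an assumption; the function names and the clamping to the range 1 to 31 are illustrative choices.

```python
def normalized_activity(act, avg_act):
    """Activity factor in [0.5, 2]: below 1 for flat macroblocks, above 1 for busy ones."""
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def modulated_quantizer(q_ref, act, avg_act):
    """Scale the reference quantizer by the activity factor and clamp it to 1..31."""
    mquant = int(round(q_ref * normalized_activity(act, avg_act)))
    return max(1, min(31, mquant))


# avg_act is the average activity of the previous picture (hypothetical value here).
print(modulated_quantizer(10, act=0.0, avg_act=100.0))    # flat area: finer quantization
print(modulated_quantizer(10, act=100.0, avg_act=100.0))  # average area: unchanged
print(modulated_quantizer(10, act=1e9, avg_act=100.0))    # very busy area: coarser quantization
```

Flat areas, where errors are easily seen, get a smaller step; textured areas, which mask the errors, get a larger one.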

Bit Rate allocation
ρ-domain [He et al.,02, He et al.,01]
The number of quantized transform coefficients having a zero value increases monotonically with the quantization parameter. There is a one-to-one mapping between these two values.
Let ρ be the percentage of zeros among the quantized transform coefficients and R be the coding bit rate. The relationship between R and ρ is given by:
$$R(\rho) = \theta\,(1 - \rho)$$
R hits 0 at ρ = 1;
θ is a constant for each frame: θ, the slope, is adaptively adjusted for each frame;
The distortion is estimated based on the transformed coefficients.

Bit Rate allocation
ρ-domain
1 Bit budget definition per picture;
2 Determination of ρ to reach the bit budget, using $R(\rho) = \theta(1 - \rho)$;
[Figure: R as a function of ρ, a straight line reaching 0 at ρ = 1; the bit budget $B_0$ gives the target $\rho_0$.]
Look-Up-Table:
ρ:  $\rho_0$, $\rho_1$, ..., $\rho_{N-2}$, $\rho_{N-1}$
QP: 0, 1, ..., N-2, N-1
3 From a Look-Up-Table (LUT), the quantization parameter is determined (the LUT is deduced from the coding of the previous picture or from a preanalysis pass).
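The three steps can be sketched as follows: the linear model is inverted to get the target ρ from the bit budget, then the look-up table gives the smallest quantization parameter that produces at least that percentage of zeros. The value of θ and the table contents are hypothetical, and the selection rule is one reasonable reading of the slide rather than the authors' exact procedure.

```python
def target_rho(bit_budget, theta):
    """Invert R(rho) = theta * (1 - rho) and keep the result inside [0, 1]."""
    rho = 1.0 - bit_budget / theta
    return min(1.0, max(0.0, rho))


def quantization_parameter(rho_lut, rho_target):
    """Smallest QP whose fraction of zeros reaches the target.

    `rho_lut[qp]` is the fraction of zero coefficients observed at that QP;
    it is assumed to be non-decreasing with QP.
    """
    for qp, rho in enumerate(rho_lut):
        if rho >= rho_target:
            return qp
    return len(rho_lut) - 1  # budget too small: fall back to the coarsest QP


# Hypothetical model slope and table (from the previous picture or a preanalysis pass).
theta = 400_000.0
rho_lut = [0.10, 0.35, 0.55, 0.70, 0.80, 0.90]

rho_0 = target_rho(bit_budget=100_000.0, theta=theta)
qp = quantization_parameter(rho_lut, rho_0)
print(rho_0, qp)
```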

Preprocessing
Noise filtering
Remove the noise from the video sequence in order to increase the coding performance and to improve the visual quality.
Noise can be due to:
  imperfections of the scanning;
  transmission...
Different solutions to reduce the noise:
1 Spatial filtering: each frame is filtered independently;
2 Motion-adaptive filtering: only static areas are filtered (motion detection);
3 Motion-compensated spatio-temporal filtering: a motion compensation is applied before filtering the current picture with the previous ones.

Efficient coding mode decision

Adaptive quantization

Motion estimator

Spatio-temporal events detection
To be efficient, we must predict spatio-temporal events in order to adapt the coding strategy:
Scene cuts;
Fade-in, fade-out, cross-fade;
Uncovered regions;
Noise.

Adaptive GOP

Statistical Multiplexing
Statistical multiplexing for multiple program encoding
When there is a set of video sequences to encode, we can use the fact that their complexities at any given time are usually quite different.
Example of a pool of 4 programs:
Sport channel (difficult motion to handle, high spatio-temporal activity);
News channel (quite easy to encode);
Cartoon channel (quite easy to encode);
Movies channel (depends on the movie).
Statistical multiplexing uses variable bit rate encoding to give more bits to the more difficult scenes. The total bit rate is constant.
Statistical multiplexing makes it possible to increase the number of coded programs in a fixed bandwidth, without a loss of quality...
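A minimal sketch of the idea, assuming that each program receives a guaranteed minimum and that the remaining bandwidth is shared in proportion to a complexity measure. The proportional rule, the complexity figures and the function name are hypothetical; real multiplexers use more elaborate criteria.

```python
def statmux_allocate(total_rate, complexities, min_rate=0.0):
    """Share a constant total bit rate between programs according to their complexity.

    `complexities` maps a program name to its current complexity estimate.
    """
    spare = total_rate - min_rate * len(complexities)
    total_complexity = sum(complexities.values())
    return {
        name: min_rate + spare * complexity / total_complexity
        for name, complexity in complexities.items()
    }


# Hypothetical instantaneous complexities for the pool of 4 programs.
complexities = {"sport": 4.0, "news": 1.0, "cartoon": 1.0, "movies": 2.0}
rates = statmux_allocate(16_000_000.0, complexities)
for name, rate in rates.items():
    print(f"{name}: {rate / 1e6:.1f} Mbps")
```

The allocation is recomputed as the complexities change over time, while the sum of the rates always stays equal to the channel bit rate.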


It's not dead yet!
White paper of Tandberg/Ericsson, April 2009: MPEG-2 video coding efficiency improvements.
http://www.tandbergtv.com/

Increasing the coding efficiency by more than 15%
Three areas are investigated:
Look-ahead encoding;
Two-stage motion estimation;
Pre-processing.

Increasing the coding efficiency by more than 15%
Look-ahead encoding
Look-ahead encoding is the process of using a pre-encoder to analyse the incoming video in order to capture encoding metrics. These metrics are then used by the final-stage encoder to optimize the coding strategy.
Multiple look-ahead encoders enable better predictions, which results in better rate control.
Better prediction means better stability of the rate control, and therefore better picture quality...

Increasing the coding efficiency by more than 15%
Two-stage motion estimation
The MPEG-2 standard does not define how to perform motion estimation. Most of the time, video encoders perform a block-matching motion estimation.
By using information stemming from the look-ahead encoders, it is possible to improve the motion estimation. A second-stage motion estimation is performed during the final encode stage.
[Figure: two-stage motion estimation.]

Increasing the coding efficiency by more than 15%
Pre-processor
The goal is to perform a more comprehensive analysis of the source video in advance of the compression stage.
Field-frame decision: measure field dominance as well as field and frame picture activity to select between field and frame picture coding modes;
Scene-cut detection: detect single-picture scene changes to prepare the buffer allocation for the large amount of picture data required at the start of the next scene;
Fade detection: detect fades to and from black to use coding parameters more appropriate for this special effect;
Flash detection: detect rapid chrominance changes and luminance saturation to use coding parameters more appropriate for this special effect;
Adaptive GOP structures and GOP length: vary the combination of P-pictures and B-pictures within a GOP to better match the picture type to the content.

Increasing the coding efficiency by more than 15%
Coding mode selection
The MPEG-2 video encoder produces the best visual quality by picking the best result among the set of possible encoding results that have been produced in parallel.
[Figure: rate-distortion optimization example.]

Increasing the coding efficiency by more than 15%
Result
[Figure: PSNR as a function of the bit rate, PSNR = f(bitRate).]

Suggestion for further reading...
[He et al.,02] Z. He and S. K. Mitra. A linear source model and a unified rate control algorithm for DCT video coding. IEEE Trans. Circuits Syst. Video Techn., Vol. 12, No. 11, 2002.
[He et al.,01] Z. He and S. K. Mitra. ρ-domain bit allocation and rate control for real time video coding. ICIP, pp. 546-549, 2001.