Introduction to MPEG: Video Compression, MPEG-2, MPEG-4. Prof. Pratikgiri Goswami


Introduction to MPEG. Prof. Pratikgiri Goswami, Electronics & Communication Department, Shree Swami Atmanand Saraswati Institute of Technology, Surat.

Outline of Topics: 1 Video Compression 2 MPEG-2 Coding 3 MPEG-4 Video Object Representation

To Reduce Bit-Rate: compression is quantified by the Compression Ratio (CR). The higher the CR, the lower the required bandwidth. The price to pay is increasing degradation of the image: artifacts. There are two basic compression standards: JPEG and MPEG. JPEG is associated with still digital images, while MPEG is dedicated to digital video. The most popular MPEG standards are MPEG-2 and MPEG-4, with the former associated with standard definition (SD) and the latter with high definition (HD) television.
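The arithmetic behind the compression-ratio claim can be sketched as follows. The raw SD frame size and the 4 Mbit/s coded rate are assumed, illustrative figures, not normative values:

```python
# Rough arithmetic behind the Compression Ratio (CR) discussion above.

def compression_ratio(raw_bps: float, coded_bps: float) -> float:
    """CR = raw bit rate / coded bit rate; a higher CR needs less bandwidth."""
    return raw_bps / coded_bps

# Uncompressed SD video: 720 x 576 pixels, 25 frames/s, 4:2:0 sampling
# -> 1.5 bytes per pixel on average.
raw = 720 * 576 * 25 * 1.5 * 8     # bits per second
coded = 4_000_000                  # an assumed typical MPEG-2 broadcast rate

cr = compression_ratio(raw, coded)
print(f"raw = {raw / 1e6:.1f} Mbit/s, CR = {cr:.0f}:1")  # raw = 124.4 Mbit/s, CR = 31:1
```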

MPEG-2 Coding

Two distinguishing features of a video clip are utilized by MPEG in its data compression technique. 1 The first is that a video piece is a sequence of still images, and as such each frame can be compressed using the same technique as JPEG. This is known as spatial intra-frame compression. 2 The second feature is that, in general, successive images of a video piece differ very little, making it possible to dispense with the unchanging (redundant) part and send only the difference. This type of time-related compression is known as temporal, inter-frame compression; the DCT (discrete cosine transform) is the name of the mathematical process used in the spatial stage.

MPEG-2 coding consists of three major parts: data preparation, compression (temporal and spatial) and quantization.

Video preparation: samples are regrouped into 8 x 8 blocks to be used in spatial redundancy removal. These blocks are then rearranged into 16 x 16 macroblocks to be used in temporal redundancy removal. The macroblocks are then grouped into slices, which are the basic units for data compression. The make-up of a macroblock is determined by the chosen profile: using 4:2:0 sampling, a macroblock consists of four blocks of luminance and one block of each of the chrominance components CR and CB.
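The data-preparation step above can be sketched numerically. The 720 x 576 SD frame size is an assumed example:

```python
# How many 8x8 blocks and 16x16 macroblocks a 4:2:0 SD frame yields.

W, H = 720, 576                      # assumed SD frame dimensions
mb_cols, mb_rows = W // 16, H // 16
macroblocks = mb_cols * mb_rows

# With 4:2:0 sampling each macroblock carries four luminance (Y) blocks
# plus one CR and one CB block -> 6 blocks of 8x8 samples.
blocks_per_mb = 4 + 1 + 1
blocks = macroblocks * blocks_per_mb

print(macroblocks, blocks)   # 1620 macroblocks, 9720 blocks
```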


Inter-frame compression is carried out on successive frames. It exploits the fact that the difference between two successive frames is very slight. Thus it is not necessary to transmit the full contents of every picture frame, since most of it is merely a repetition of the previous frame; only the difference needs to be sent. Two components are used to describe the difference between one frame and the preceding frame: the motion vector and the difference frame.

The repeated elements are known as redundant because they add nothing new to the original composition of the frame. To avoid redundancy, only the changes in the contents of the picture are described instead. In the illustrated example, these changes are defined by two aspects: the movement of the tiger from cell A1 to cell B2 and the introduction of a plane in cell A1.

Movement: motion vector. Newly introduced content: difference frame, derived by a slightly more complex method. First, the motion vector is added to the first frame to produce a predicted frame. The predicted frame is then subtracted from the second frame to produce the difference frame.
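The two-step procedure above can be sketched with NumPy. A purely translational, whole-frame motion model is an assumption to keep the example short; real MPEG motion compensation works per macroblock:

```python
# Step 1: apply the motion vector to form the predicted frame.
# Step 2: subtract the predicted frame from the current frame.
import numpy as np

def predict(prev: np.ndarray, mv: tuple) -> np.ndarray:
    """Apply a (dy, dx) motion vector to the previous frame."""
    return np.roll(prev, shift=mv, axis=(0, 1))

prev_frame = np.zeros((8, 8), dtype=np.int16)
prev_frame[2, 2] = 200                    # a single bright "object"

curr_frame = np.zeros((8, 8), dtype=np.int16)
curr_frame[3, 4] = 200                    # the object has moved by (1, 2)

predicted = predict(prev_frame, (1, 2))   # step 1: predicted frame
difference = curr_frame - predicted       # step 2: difference frame

print(int(np.abs(difference).sum()))      # 0: the prediction was perfect
```

When the motion vector captures the movement exactly, the difference frame is all zeros, which is why it compresses so well.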

Group of Pictures (GoP): The motion vector and frame difference are combined to form what is referred to as a P-frame (P for predicted). Temporal compression is carried out on a group of pictures (GOP), normally composed of 12 non-interlaced frames. The first frame of the group acts as the anchor or reference frame, known as the I-frame (I for intra). This is followed by a P-frame obtained by comparing the second frame with the I-frame. The third frame is then compared with the previous P-frame to produce a second P-frame, and so on until the end of the group of 12 frames, when a new reference I-frame is inserted for the next group of 12 frames. This type of prediction is known as forward prediction.
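The forward-prediction GOP structure above can be sketched as a simple frame-type generator (B-frames, introduced later, are deliberately left out here):

```python
# An I-frame anchors each group of 12 frames; every following frame in
# the group is a P-frame predicted from its predecessor.

def gop_types(num_frames: int, gop_size: int = 12):
    """Frame type ('I' or 'P') for each frame, forward prediction only."""
    return ["I" if i % gop_size == 0 else "P" for i in range(num_frames)]

print("".join(gop_types(24)))   # IPPPPPPPPPPPIPPPPPPPPPPP
```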

Block Matching: The motion vector is obtained from the luminance component only, by a process known as block matching. Block matching involves dividing the Y component of the reference frame into 16 x 16 pixel macroblocks, taking each macroblock in turn, moving it within a specified search area in the next frame, and searching for matching block pixel values using correlation techniques. When a match is found, the displacement is used to obtain a motion compensation vector that describes the movement of the macroblock in terms of speed and direction. Only a relatively small amount of data is necessary to describe a motion compensation vector; the actual pixel values of the macroblock themselves do not have to be retransmitted. Once the motion compensation vector has been worked out, it is also used for the other two components, CR and CB. Further reductions in bit count are achieved by differentially encoding each motion compensation vector with reference to the previous vector.
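A toy full-search block matcher illustrates the procedure above: one 16 x 16 luminance macroblock is slid over a search window in the next frame, and the position with the smallest sum of absolute differences (SAD) gives the motion vector. The +/-4 pixel window and SAD criterion are assumed simplifications of the correlation techniques mentioned:

```python
import numpy as np

def best_motion_vector(ref_mb, next_frame, top, left, search=4):
    """Full-search SAD block matching over a +/-search pixel window."""
    best, best_mv = None, (0, 0)
    h, w = next_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + 16 > h or x + 16 > w:
                continue                      # candidate falls outside frame
            cand = next_frame[y:y + 16, x:x + 16]
            sad = int(np.abs(ref_mb.astype(int) - cand.astype(int)).sum())
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv, best

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
mb = frame[16:32, 16:32].copy()                    # reference macroblock
moved = np.roll(frame, shift=(2, 3), axis=(0, 1))  # whole scene shifts by (2, 3)

mv, sad = best_motion_vector(mb, moved, top=16, left=16)
print(mv, sad)   # (2, 3) 0
```

Real encoders use faster hierarchical or logarithmic searches, but the exhaustive version shows the principle most clearly.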

Temporal Prediction: The encoder predicts what a frame, known as the P-frame, would look like if it were reconstructed using only the motion compensation vector, and then compares this with the actual frame. The difference between the two contains the necessary additional information which, together with the motion compensation vector, fully defines the contents of the picture frame. The P-frame is constructed by adding the motion vector to the same frame that was used to obtain that motion vector. The P-frame is then subtracted from the current frame to generate a difference frame, also known as the residual or prediction error. The difference frame now consists of a series of pixel values, a format suitable for subsequent spatial data compression.

Bi-directional Prediction: The bit rate of the output data stream is highly dependent on the accuracy of the motion vector. A P-frame predicted from a highly accurate motion vector will be so similar to the actual frame that the residual error will be very small, resulting in fewer data bits and therefore a low bit rate. By contrast, a highly speculative motion vector will produce a highly inaccurate prediction frame, hence a large residual error and a high bit rate. Bidirectional prediction improves the accuracy of the motion vector: it relies on the future position of a moving matching block as well as its previous position, and employs two motion estimators to measure the forward and backward motion vectors, using a past frame and a future frame. The current frame is simultaneously fed into the two motion vector estimators. To produce a forward motion vector, the forward motion estimator takes the current frame and compares it macroblock by macroblock with the past frame saved in the past-frame memory store. To produce a backward motion vector, the backward motion estimator takes the current frame and compares it macroblock by macroblock with a future frame saved in the future-frame memory store. A third motion vector, an interpolated motion vector (also known as a bidirectional motion vector), may be obtained using the average of the forward and backward motion vectors. Each vector is used to produce one of three possible predicted frames: the P-frame, the B-frame and the average or bidirectional frame (Bi-frame). These three predicted frames are compared with the current frame to produce three residual errors, and the one with the smallest error, i.e. the lowest bit rate, is used.
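The selection rule above can be sketched directly: form the three candidate predictions and keep whichever leaves the smallest residual. One-dimensional toy "frames" are an assumption to keep the example short:

```python
import numpy as np

past    = np.array([10, 10, 50, 10], dtype=float)   # past reference frame
future  = np.array([10, 10, 10, 50], dtype=float)   # future reference frame
current = np.array([10, 10, 30, 30], dtype=float)   # frame being coded

candidates = {
    "forward":      past,                  # prediction from the past frame
    "backward":     future,                # prediction from the future frame
    "interpolated": (past + future) / 2,   # average of the two predictions
}

# Residual error (sum of absolute differences) for each candidate.
residuals = {name: float(np.abs(current - p).sum())
             for name, p in candidates.items()}
best = min(residuals, key=residuals.get)
print(best, residuals[best])   # interpolated 0.0
```

Here the current frame is halfway between the past and future frames, so the interpolated prediction wins with zero residual, exactly the situation bidirectional prediction is designed to exploit.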

The heart of spatial redundancy removal is the DCT processor, which receives video slices in the form of a stream of 8 x 8 blocks. The blocks may be part of a luminance frame (Y) or a chrominance frame (CR or CB). The sample values representing the pixels of each block are fed into the DCT processor, which translates them into an 8 x 8 matrix of DCT coefficients representing the spatial frequency content of the block. The coefficients are then scanned and quantized before transmission.

Spatial Frequency (figure)

DCT: Normal pictures are two-dimensional, and following transformation they contain diagonal as well as horizontal and vertical spatial frequencies. MPEG-2 specifies the DCT as the method of transforming spatial picture information into spatial frequency components. Each spatial frequency is given a value, known as the DCT coefficient. For an 8 x 8 block of pixel samples, an 8 x 8 block of DCT coefficients is produced.

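The 8 x 8 DCT described above can be written out directly from its definition, so no external codec library is needed (`scipy.fftpack.dct` would compute the same transform):

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix C, so that coeffs = C @ block @ C.T."""
    C = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            C[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
        C[k] *= np.sqrt((1 if k == 0 else 2) / n)
    return C

C = dct_matrix()
block = np.full((8, 8), 128.0)     # a flat, featureless 8x8 block

coeffs = C @ block @ C.T
# A flat block has only a DC term: coeffs[0, 0] = 8 * 128 = 1024,
# and every AC coefficient is (numerically) zero.
print(round(float(coeffs[0, 0])))  # 1024
```

This is why the DCT compresses well: smooth image blocks concentrate their energy in a few low-frequency coefficients, leaving the rest near zero.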

Quantization: each DCT coefficient is rounded up or down to the nearest quantization level.

Scanning (figure)

Coding & Buffering: The coding of the quantized DCT coefficients employs two compression techniques: run-length coding (RLC) and variable-length coding (VLC). Quantization, RLC and VLC produce a bit rate that depends upon the complexity of the picture content as well as the amount and type of movement involved. A variable bit rate would occupy a varying amount of bandwidth and might exceed the total available bandwidth, with a detrimental effect on picture quality. To avoid this, a constant bit rate is necessary. It is obtained by dynamically changing the quantization of the DCT matrix block.
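The run-length coding step above can be sketched as follows: the quantized 8 x 8 block is read in zig-zag order and each nonzero value is emitted as a (run-of-zeros, value) pair. The symbol format is an assumed simplification of the real MPEG-2 VLC tables:

```python
import numpy as np

def zigzag_order(n: int = 8):
    """Indices of an n x n block in zig-zag scan order."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_code(block: np.ndarray):
    """Emit (zero-run, value) pairs in zig-zag order, then an EOB marker."""
    symbols, run = [], 0
    for i, j in zigzag_order(block.shape[0]):
        v = int(block[i, j])
        if v == 0:
            run += 1
        else:
            symbols.append((run, v))
            run = 0
    symbols.append("EOB")          # end-of-block marker
    return symbols

q = np.zeros((8, 8), dtype=int)
q[0, 0], q[0, 1], q[2, 0] = 64, -3, 5   # a typical sparse quantized block
print(run_length_code(q))               # [(0, 64), (0, -3), (1, 5), 'EOB']
```

Because quantization drives most high-frequency coefficients to zero, the zig-zag scan groups those zeros into long runs, which is exactly what RLC exploits.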

Complete DCT Coder (figure)

Forward Prediction Codec (figure)

MPEG-4 Video Object Representation

MPEG-4 coding did not remain confined to the domain of rectangular pictures but adopted an object-based coding concept, in which arbitrarily shaped and dynamically changing individual audio-visual objects in a video sequence can be individually encoded, manipulated and transmitted as independent bit-streams. It was standardized to address a wide range of bit-rates, from very low bit-rate coding (5-64 kbit/s) to 2 Mbit/s for TV/film applications. In recent times, MPEG-4 has found widespread application in internet streaming, wireless video and digital video cameras, as well as in mobile phones and mobile palm computers.

MPEG-4 was conceptualized with the objective of standardizing algorithms for audio-visual coding in multimedia applications, with flexibility for interaction, universal accessibility and high compression. Content-Based Interactivity: transformation of existing objects (re-positioning, scaling and rotation), addition of new objects, removal of existing objects, etc. are all within the scope of manipulation. These object manipulations are possible through simple operations performed on the bit stream.


To achieve content-based interactivity, MPEG-4 has standardized the video object representation. A sequence is composed of one or more audio-visual objects (AVOs). An AVO can be either an audio object, resulting from speech, music, sound effects, etc., or a video object (VO) representing specific content, such as a talking head-and-shoulders sequence of a person, a moving object, or a static/moving background. A video object may be present over a large collection of frames. A snapshot of a video object in one frame is defined as the video object plane (VOP) and is the most elementary form of content representation.

For content representation using VOPs, an input video sequence is segmented into a number of arbitrarily shaped regions (VOPs). Each region may cover a particular image or video content of interest, and the shape and location of a region can vary from frame to frame. The shape, motion and texture information of the VOPs belonging to the same VO is encoded and transmitted in a Video Object Layer (VOL). Since there are typically several video objects, the bit stream also includes information on how to combine the different VOLs to reconstruct the video.

Example: a snapshot (frame) of a video sequence is segmented into an arbitrarily shaped foreground VOP1 and a background VOP2. The binary alpha-plane for the same frame is a binary segmentation mask specifying the location of the foreground content (VOP1).
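The binary alpha-plane idea can be sketched as follows: a 0/1 mask marks where the foreground VOP is, and the decoder composites the foreground over the background using it. The shapes and pixel values are assumed toy data:

```python
import numpy as np

background = np.full((4, 4), 10, dtype=np.uint8)    # background VOP2
foreground = np.full((4, 4), 200, dtype=np.uint8)   # foreground VOP1

alpha = np.zeros((4, 4), dtype=np.uint8)   # binary segmentation mask
alpha[1:3, 1:3] = 1                        # foreground occupies the centre

# Composite: take the foreground pixel wherever the mask is 1.
composited = np.where(alpha == 1, foreground, background)
print(composited.tolist())
```

Because the mask is carried separately from the texture, the foreground object can be re-positioned, scaled or replaced simply by editing the mask and its associated bit stream.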

Each VOL encoding has three components: shape (contour) coding, motion estimation and compensation, and texture coding. We may note that the frame-based functionalities of MPEG-1 and MPEG-2 form a subset of the content-based functionalities supported in MPEG-4: while MPEG-4 supports multiple VOPs, the former two standards support only one VOP containing the entire picture of fixed rectangular size.

Video Object Representation Thank You! www.pratikgoswami.weebly.com E-mail : pratikzg@gmail.com +91 9033144767