VIDEO AND IMAGE PROCESSING USING DSP AND FPGA. Chapter 3: Video Processing

Vietnam National University, Ho Chi Minh City
Ho Chi Minh City University of Technology
Faculty of Electrical and Electronics Engineering
Department of Electronics Engineering

3.1 Video Formats
3.2 Video Compression Standards
3.3 Video Processing Algorithms

Reference
Iain E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.

3.1 Video Formats

Natural video scenes
- Spatial characteristics: texture, shape of objects, colors
- Temporal characteristics: object motion, changes in illumination, movement of the camera
[Figure: Spatial and temporal sampling of a video sequence]

Spatial and Temporal Sampling
- Spatial sampling: the pixels of an image are sampled from a CCD array; this determines the resolution of the image.
- Temporal sampling: images of a moving scene are captured at periodic time intervals; this determines the frame rate of the video (and the appearance of motion).

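Frames and Fields
- Interlaced video is a sequence of interlaced fields.
- A field consists of either the odd-numbered or the even-numbered lines of a frame.
- Because each field carries only half the lines of a frame, twice as many fields per second can be sent as frames of a progressive sequence at the same data rate. Equivalently, the bandwidth is halved for a given picture refresh rate.

Frames and Fields
- Progressive video is a sequence of complete frames.
- Deinterlacing converts interlaced video to progressive video.
[Figure: Top field and bottom field]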
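The relationship between frames, fields and deinterlacing can be sketched in a few lines of Python. This is only an illustrative sketch: a frame is assumed to be a list of rows, the top field is assumed to hold rows 0, 2, 4, ... (the first, third, fifth lines), and the helper names `split_fields`, `weave` and `line_double` are hypothetical. `weave` merges two fields back into one frame, while `line_double` builds a full-height frame from a single field by repeating each line.

```python
def split_fields(frame):
    """Split a frame (list of rows) into its top and bottom fields."""
    top = [list(row) for row in frame[0::2]]      # rows 0, 2, 4, ...
    bottom = [list(row) for row in frame[1::2]]   # rows 1, 3, 5, ...
    return top, bottom


def weave(top, bottom):
    """Deinterlace by merging: interleave the lines of the two fields."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(list(top_row))
        frame.append(list(bottom_row))
    return frame


def line_double(field):
    """Deinterlace one field by repeating every line twice."""
    frame = []
    for row in field:
        frame.append(list(row))
        frame.append(list(row))
    return frame


frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
top, bottom = split_fields(frame)
print(weave(top, bottom))
print(line_double(top))
```

Merging keeps full vertical resolution but can show comb artefacts when the two fields were captured at different instants; line doubling avoids that at the cost of half the vertical detail.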

YCbCr Sampling Formats
- YCbCr 4:4:4: the three components Y, Cb and Cr have the same resolution.
- YCbCr 4:2:2: the chrominance components have the same vertical resolution as the luma but half the horizontal resolution; used for high-quality color reproduction.
- YCbCr 4:2:0: Cb and Cr each have half the horizontal and half the vertical resolution of Y; widely used for consumer applications such as video conferencing, digital television and DVD.
[Figure: 4:2:0, 4:2:2 and 4:4:4 sampling patterns: 4:4:4 sampling, 4:2:0 sampling, 4:2:2 sampling]
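The resolution relationships above can be illustrated by subsampling a full-resolution chroma plane. The sketch below assumes a plane stored as a list of rows with even width and height, and it assumes simple averaging of neighbouring samples as the downsampling filter; the slides do not prescribe a filter, and the function names are hypothetical.

```python
def subsample_422(plane):
    """Halve the horizontal chroma resolution (4:4:4 to 4:2:2)."""
    return [[(row[x] + row[x + 1] + 1) // 2 for x in range(0, len(row), 2)]
            for row in plane]


def subsample_420(plane):
    """Halve horizontal and vertical chroma resolution (4:4:4 to 4:2:0)."""
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[y]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)   # rounded mean of the 2x2 block
        out.append(row)
    return out


cb = [[10, 20, 30, 40],
      [10, 20, 30, 40]]
print(subsample_422(cb))   # same height, half the width
print(subsample_420(cb))   # half the height, half the width
```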

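4:2:0 Interlaced Video
- 4:2:0 sampling is sometimes described as "12 bits per pixel": every group of four pixels needs four Y samples plus one Cb and one Cr sample, i.e. 6 samples × 8 bits = 48 bits, or 48 / 4 = 12 bits per pixel on average.
[Figure: Allocation of 4:2:0 samples to top and bottom fields]

Video Formats
- CIF: Common Intermediate Format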
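The "12 bits per pixel" figure can be checked with a short calculation. The sketch below assumes 8 bits per sample; the function names and the 720 × 576 example frame size are illustrative choices, not values taken from the slides.

```python
def bits_per_pixel(h_div, v_div, bits_per_sample=8):
    """Average bits per pixel when chroma is divided by h_div and v_div.

    Each pixel carries one luma sample plus two chroma samples that are
    shared between h_div * v_div pixels.
    """
    samples_per_pixel = 1 + 2 / (h_div * v_div)
    return samples_per_pixel * bits_per_sample


def frame_bits(width, height, h_div, v_div, bits_per_sample=8):
    """Uncompressed size of one frame in bits."""
    return width * height * bits_per_pixel(h_div, v_div, bits_per_sample)


print(bits_per_pixel(1, 1))   # 4:4:4
print(bits_per_pixel(2, 1))   # 4:2:2
print(bits_per_pixel(2, 2))   # 4:2:0
print(frame_bits(720, 576, 2, 2))
```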

Video Formats: ITU-R Recommendation BT.601-5
- Y is sampled at 13.5 MHz; Cb and Cr are each sampled at 6.75 MHz.
- Frame rate: 30 Hz for NTSC, 25 Hz for PAL/SECAM.
- Each sample has a possible range of 0 to 255 (8 bits).
[Figure: Video frame formats]
[Table: ITU-R BT.601-5 parameters]
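The sampling rates above imply a raw bit rate that is easy to work out. The following sketch assumes 8 bits per sample (consistent with the 0 to 255 range) and ignores blanking intervals; the function name is hypothetical.

```python
def bt601_bit_rate(luma_hz=13.5e6, chroma_hz=6.75e6, bits_per_sample=8):
    """Total bit rate in bits per second for one luma and two chroma channels."""
    samples_per_second = luma_hz + 2 * chroma_hz
    return samples_per_second * bits_per_sample


rate = bt601_bit_rate()
print(rate / 1e6, "Mbit/s")
```

Since each chroma component is sampled at half the luma rate, this corresponds to 4:2:2 sampling, and the total of 27 million samples per second at 8 bits each gives 216 Mbit/s.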

Quality Measurement
- Peak Signal to Noise Ratio (PSNR) is measured on a logarithmic scale.
- It depends on the mean squared error (MSE) between an original and an impaired image, relative to $(2^n - 1)^2$, the square of the highest possible signal value, where $n$ is the number of bits per image sample:

$$\mathrm{PSNR}_{dB} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}}$$

[Figure: PSNR examples: (a) original; (b) 30.6 dB; (c) 28.3 dB]

Assignments
1. Implement a MATLAB function to convert interlaced video to progressive video using the double-line method.
2. Implement a MATLAB function to convert interlaced video to progressive video using the merge-field method.
3. Compute the resolution of a YCbCr 4:2:2 video with a frame size of 1024 × 768.
4. Write a MATLAB program to convert YCbCr 4:2:2 video to YCbCr 4:4:4 video.
5. Implement a MATLAB function to calculate PSNR.
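As a reference point for the PSNR definition on the Quality Measurement slide, here is a minimal Python sketch. It assumes two images of equal size stored as lists of rows of integer samples, and it returns infinity when the images are identical (MSE of zero); the function names are hypothetical.

```python
import math


def mse(original, impaired):
    """Mean squared error between two equally sized images."""
    total = 0
    count = 0
    for row_a, row_b in zip(original, impaired):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, impaired, bits=8):
    """PSNR in dB: 10 * log10((2^n - 1)^2 / MSE)."""
    error = mse(original, impaired)
    if error == 0:
        return math.inf          # identical images
    peak = (1 << bits) - 1       # 255 for 8-bit samples
    return 10 * math.log10(peak * peak / error)


a = [[100, 110], [120, 130]]
b = [[101, 111], [121, 131]]
print(round(psnr(a, b), 2))
```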

3.2 Video Compression Standards

Video coding
- Video coding methods exploit both temporal and spatial redundancy to achieve compression.
- In the temporal domain, there is usually a high correlation (similarity) between frames that were captured at around the same time. Temporally adjacent frames are often highly correlated, especially if the temporal sampling rate (frame rate) is high.
- In the spatial domain, there is usually a high correlation between pixels (samples) that are close to each other, i.e. the values of neighboring samples are often very similar.

Moving Picture Experts Group (MPEG)
- A working group that develops standards for the International Organization for Standardization (ISO).

Video Coding Standards
- MPEG-1 and MPEG-2: standards for coding video and audio
- H.264/MPEG-4

MPEG-4 and H.264 development history
1993  MPEG-4 project launched. Early results of the H.263 project produced.
1995  MPEG-4 call for proposals, including efficient video coding and content-based functionalities. H.263 chosen as the core video coding tool.
1998  Call for proposals for H.26L.
1999  MPEG-4 Visual standard published. Initial Test Model (TM1) of H.26L defined.
2000  MPEG call for proposals for advanced video coding tools.
2001  Edition 2 of the MPEG-4 Visual standard published. H.26L adopted as the basis for the proposed MPEG-4 Part 10. JVT formed.
2002  Amendments 1 and 2 (Studio and Streaming Video profiles) to MPEG-4 Visual Edition 2 published. H.264 technical content frozen.
2003  H.264/MPEG-4 Part 10 (Advanced Video Coding) published.

MPEG-1 Video Structure
- In MPEG, each video sequence is divided into one or more groups of pictures (GOPs).
- Four types of pictures are defined in MPEG-1: I, P, B and D pictures.

MPEG-1 Video Structure
- I pictures (intra-coded pictures) are coded independently, with no reference to other pictures. I pictures provide random access points in the compressed video data.
- P pictures (predictive-coded pictures) are coded using forward motion-compensated prediction, similar to that in H.261, from the preceding I or P picture. P pictures provide more compression than I pictures by virtue of motion-compensated prediction.
- B pictures (bidirectionally coded pictures) allow macroblocks to be coded using bidirectional motion-compensated prediction from both past and future reference I or P pictures. In a B picture, each bidirectionally motion-compensated macroblock can have two motion vectors: a forward motion vector and a backward motion vector.

MPEG-1 Video Structure
- D pictures (DC pictures) are low-resolution pictures obtained by decoding only the DC coefficient of the discrete cosine transform coefficients of each macroblock.
[Figure: Bidirectional motion estimation]

MPEG-4
- MPEG-4 improves on the popular MPEG-2 standard in terms of both:
  - compression efficiency (better compression for the same visual quality)
  - flexibility (enabling a much wider range of applications)
- MPEG-4 Visual consists of a core video encoder/decoder model. The core model is based on the well-known hybrid DPCM/DCT coding model.
- The basic function of the core is extended by tools supporting enhanced compression efficiency, reliable transmission, coding of separate shapes or objects in a visual scene, mesh-based compression and animation of face or body models.

MPEG-4
- One of the key contributions of MPEG-4 Visual is a move away from the traditional view of a video sequence as merely a collection of rectangular frames of video.
- Instead, MPEG-4 Visual treats a video sequence as a collection of one or more video objects (VOs).
[Figure: VOPs and VO (rectangular)]
[Figure: VOPs and VO (arbitrary shape)]

MPEG-4
[Figure: Video scene consisting of three VOs]
[Table: Summary of differences between MPEG-4 Visual and H.264]

[Table: Levels for Simple-based profiles]

3.3 Video Processing Algorithms

Segmentation
- Manual segmentation: a human operator manually identifies the borders of each object in each source video frame. This approach may be appropriate for segmenting an important visual object that may be viewed by many users.
- Semi-automatic segmentation: a human operator identifies objects, and perhaps object boundaries, in one frame; a segmentation algorithm refines the object boundaries (if necessary) and tracks the video objects through successive frames of the sequence.
- Fully automatic segmentation: an algorithm carries out a complete segmentation of a visual scene without any user input, based on spatial characteristics such as edges and temporal characteristics such as object motion between frames.
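One very simple way to use a temporal characteristic for automatic segmentation is to mark the pixels that change between two frames. The sketch below is only an illustration of that idea under strong assumptions (static camera, a single moving object, grayscale frames as lists of rows); it is not the method used by any particular standard, and the function names and the threshold of 20 are hypothetical.

```python
def change_mask(prev, curr, threshold=20):
    """Mark pixels whose absolute inter-frame difference exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None."""
    points = [(y, x)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return min(ys), min(xs), max(ys), max(xs)


prev = [[10] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[1][2] = 200          # a bright object appears in two pixels
curr[2][2] = 200
print(bounding_box(change_mask(prev, curr)))
```

A practical segmenter would combine such a mask with spatial cues such as edges, as the slide notes.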

3.3 Video Processing Algorithms

Motion Estimation
- Motion estimation is the process of selecting an offset to a suitable reference area in a previously coded frame.
- Motion estimation is carried out in the video encoder.
- The motion vector is the offset between the current region or block and the reference area.
[Figure: Current block (white border)]

Motion Estimation
- The goal of the temporal model is to reduce redundancy between transmitted frames by forming a predicted frame and subtracting it from the current frame.
- The output of this process is a residual (difference) frame; the more accurate the prediction process, the less energy is contained in the residual frame.
[Figure: Frame 1, Frame 2, Difference]
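The residual frame and its energy can be sketched as follows. Energy is taken here as the sum of squared residual samples, which is one common measure and an assumption of this example; frames are lists of rows, and the function names are hypothetical.

```python
def residual(current, predicted):
    """Residual frame: current frame minus predicted frame."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]


def energy(frame):
    """Sum of squared sample values."""
    return sum(v * v for row in frame for v in row)


current = [[10, 12], [14, 16]]
poor_prediction = [[0, 0], [0, 0]]
good_prediction = [[10, 10], [15, 16]]

print(energy(residual(current, poor_prediction)))
print(energy(residual(current, good_prediction)))
```

The better prediction leaves a residual with far less energy, which is what makes it cheaper to code.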

TEMPORAL MODEL
- Motion vector: the trajectory of a pixel between successive video frames.
- Optical flow: the field of pixel trajectories over the whole frame.
[Figure: Optical flow]

MOTION ESTIMATION
- Block-based motion estimation: search an area in the reference frame to find a matching M × N-sample region.
- Compare the M × N block in the current frame with some or all of the possible M × N regions in the search area and find the region that gives the best match.
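A full-search version of this block-matching procedure can be sketched as below. The matching criterion is assumed to be the sum of absolute differences (SAD), which the slides do not specify; M is taken as the block height and N as the block width; candidate regions that fall outside the reference frame are skipped. All names are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, m, n):
    """Sum of absolute differences between an m x n block in cur at
    (cy, cx) and an m x n region in ref at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(m) for j in range(n))


def full_search(cur, ref, cy, cx, m, n, search_range):
    """Return (dy, dx, cost) of the best match within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + m > height or rx + n > width:
                continue                     # region outside the frame
            cost = sad(cur, ref, cy, cx, ry, rx, m, n)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


# Reference frame with distinct values; the current frame is the same
# content moved down by 1 row and right by 2 columns.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, 3, 3, 2, 2, 2))
```

The returned offset (dy, dx) plays the role of the motion vector for the block at (cy, cx).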

MOTION ESTIMATION
- The macroblock, corresponding to a 16 × 16-pixel region of a frame, is the basic unit for motion-compensated prediction in a number of important visual coding standards, including MPEG-1, MPEG-2, MPEG-4 Visual, H.261, H.263 and H.264.
[Figure: Macroblock (4:2:0)]

Block-based motion estimation
- Motion compensation aims to minimize the energy of the residual transform coefficients after quantization.
- The energy in a transformed block depends on the energy in the residual block.
- Motion estimation therefore aims to find a match to the current block or region that minimizes the energy in the motion-compensated residual.

Block-based motion estimation
[Figure: Full search (raster scan)]

Block-based motion estimation
[Figure: Full search (spiral scan)]
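The two scan orders differ only in the sequence in which candidate offsets are visited. The sketch below generates both orders for a square search window of ±r samples; the exact spiral path (starting at the centre, first moving right, then turning) is an assumption of this example, since the original figures are not available, and the function names are hypothetical.

```python
def raster_order(r):
    """Offsets (dy, dx) visited row by row from the top-left corner."""
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def spiral_order(r):
    """Offsets (dy, dx) visited from the centre outward in a square spiral."""
    order = [(0, 0)]
    y = x = 0
    dy, dx = 0, 1                      # start by moving right
    step = 1
    while len(order) < (2 * r + 1) ** 2:
        for _ in range(2):             # two legs for each step length
            for _ in range(step):
                y += dy
                x += dx
                if abs(y) <= r and abs(x) <= r:
                    order.append((y, x))
            dy, dx = dx, -dy           # turn by 90 degrees
        step += 1
    return order


print(raster_order(1))
print(spiral_order(1))
```

Both orders cover the same candidates. A spiral scan tests offsets near (0, 0) first, which can help if the search is stopped early once a good enough match is found.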

Block-based motion estimation
[Figure: Three Step Search]

DCT/IDCT
- The Discrete Cosine Transform (DCT) is used to decorrelate image or residual data prior to quantization and compression.
- The forward DCT (FDCT) of an N × N sample block X is given by:

$$Y = A X A^{T}$$

- The inverse DCT (IDCT) is given by:

$$X = A^{T} Y A$$

Here X is the block of samples, Y is the block of coefficients and A is the N × N transform matrix.
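The two matrix products above translate directly into code. The sketch below assumes the standard orthonormal DCT-II definition of the transform matrix, $A_{ij} = C_i \cos\frac{(2j+1)i\pi}{2N}$ with $C_0 = \sqrt{1/N}$ and $C_i = \sqrt{2/N}$ for $i > 0$, and uses plain Python lists; the helper names are hypothetical.

```python
import math


def dct_matrix(n):
    """N x N orthonormal DCT-II transform matrix A."""
    a = []
    for i in range(n):
        c = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        a.append([c * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                  for j in range(n)])
    return a


def matmul(p, q):
    """Matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def fdct(x):
    """Forward DCT: Y = A X A^T."""
    a = dct_matrix(len(x))
    return matmul(matmul(a, x), transpose(a))


def idct(y):
    """Inverse DCT: X = A^T Y A."""
    a = dct_matrix(len(y))
    return matmul(matmul(transpose(a), y), a)


x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
y = fdct(x)
print([[round(v, 3) for v in row] for row in y])
print([[round(v) for v in row] for row in idct(y)])
```

Applying `idct` to the output of `fdct` recovers the original block up to floating-point rounding, because A is orthonormal.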

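DCT Example: N = 4
- The transform matrix A for a 4 × 4 DCT is:

$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}, \qquad a = \tfrac{1}{2},\quad b = \sqrt{\tfrac{1}{2}}\cos\tfrac{\pi}{8},\quad c = \sqrt{\tfrac{1}{2}}\cos\tfrac{3\pi}{8}$$

DCT patterns
[Figure: 4x4 DCT basis patterns]
[Figure: 8x8 DCT basis patterns]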

Wavelet Transform
- The popular wavelet transform is based on sets of filters whose coefficients are equivalent to discrete wavelet functions.
- A pair of filters is applied to the signal to decompose it into a low-frequency band (L) and a high-frequency band (H).

Wavelet Transform
[Figure: Image after one level of decomposition]
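The low/high band split can be illustrated with the Haar wavelet, the simplest filter pair. This sketch assumes an averaging normalisation (low = mean of a sample pair, high = half their difference) rather than the orthonormal scaling, and even-length inputs; the function names are hypothetical. For an image, the 1-D step is applied to every row and then to every column, giving one level of decomposition.

```python
def haar_1d(signal):
    """One Haar analysis step: return (low band, high band)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_2d(image):
    """One decomposition level: filter the rows, then the columns.

    The low-low band ends up in the top-left quadrant of the result.
    """
    rows = []
    for row in image:
        low, high = haar_1d(row)
        rows.append(low + high)
    cols = []
    for col in zip(*rows):
        low, high = haar_1d(list(col))
        cols.append(low + high)
    return [list(row) for row in zip(*cols)]


print(haar_1d([4, 2, 6, 6]))
print(haar_2d([[8, 8, 8, 8]] * 4))
```

For a flat image all the energy lands in the low-low quadrant and the three detail quadrants are zero.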

Post-processing
[Figure: Post-filter implementation]
[Figure: Loop filter implementation]

Assignments
1. Write a MATLAB program to convert a video from YUV format to MPEG-4 format.
2. Write a MATLAB program to perform the DCT of an image.
3. Write a MATLAB program to perform the Haar wavelet transform of an image.
4. Write a MATLAB program to perform motion estimation for two successive frames.
5. Write a MATLAB program to perform image quantisation.