Introduction of Video Codec


Introduction of Video Codec Min-Chun Hu anita_hu@mail.ncku.edu.tw MISLab, R65601, CSIE New Building 3D Augmented Reality and Interactive Sensor Technology, 2015 Fall

The Need for Video Compression High-Definition Television (HDTV): 1920x1080, 30 frames per second (full motion), 8 bits for each of the three primary colors. Total: ~1.5 Gb/sec! Each cable channel is 6 MHz, with a max data rate of 19.2 Mb/sec, reduced to ~18 Mb/sec with audio + control. The compression ratio must therefore be about 83:1!
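The 83:1 figure follows directly from the numbers above; a quick back-of-the-envelope check in Python (values taken from the slide):

width, height, fps, bits_per_pixel = 1920, 1080, 30, 3 * 8      # HDTV, 8 bits per primary
raw_rate = width * height * fps * bits_per_pixel                # raw bits per second
channel_rate = 18e6                                             # ~18 Mb/sec left after audio + control
print(f"raw rate: {raw_rate / 1e9:.2f} Gb/sec")                 # ~1.49 Gb/sec
print(f"required compression: {raw_rate / channel_rate:.0f}:1") # ~83:1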

Requirements Random access, reverse, fast forward, and search at any point in the stream (quality can be reduced somewhat during such tasks, if needed). Audio/video synchronization, even when the two run under different clocks. Robustness to errors: losing bits should not be catastrophic. Coding/decoding delay under 150 ms for interactive applications. Editability: modify/replace frames.

MPEG and VCEG MPEG: Moving Picture Experts Group, established in 1988; develops standards for audio and video compression and transmission. VCEG: Video Coding Experts Group, a group of the ITU (International Telecommunication Union). MPEG and VCEG jointly developed H.264 and HEVC (H.265).

Overview of Video Coding Standards ITU-T Rec. H.261, ITU-T Rec. H.263, ITU-T H.264, ITU-T H.265 (HEVC); ISO/IEC MPEG-1, ISO/IEC MPEG-2, ISO/IEC MPEG-4; JVT H.264/MPEG-4 AVC.

Applications of Video Compression
Application | Bit Rate | Standard
Digital television broadcasting | 2~6 Mbps (10~20 Mbps for HD) | MPEG-2
DVD video | 6~8 Mbps | MPEG-2
Internet video streaming | 20~200 kbps | Proprietary, similar to H.263, MPEG-4, or H.26L/JVT
Videoconferencing, videotelephony | 20~320 kbps | H.261, H.263
Video over 3G wireless | 20~100 kbps | H.263, MPEG-4

Core Concept of Video Compression Compressed Video (Residual) = Original Data - Already Known = Original Video - Prediction

Flow of Video Compression Simplified illustration of video compression: Original Video - Prediction (spatial, temporal, inter-view, depth) -> Residual -> Transform (DCT, DWT) -> Quantization -> Entropy Coding (Huffman coding, arithmetic coding, CAVLC, CABAC) -> Compressed Video.

Flow of Video Decoding Simplified illustration of video decoding: Compressed Video -> Entropy Decoding -> Inverse Quantization -> Inverse Transform -> Residual; Residual + Prediction -> Video.
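To make the encode/decode symmetry above concrete, here is a minimal toy sketch in Python of the residual flow (prediction, quantization of the residual, and reconstruction). The flat uniform quantizer and the "previous data as prediction" choice are illustrative assumptions only, and the transform and entropy-coding stages are omitted; real codecs use DCT/DWT transforms and Huffman/arithmetic/CAVLC/CABAC coding as listed above.

import numpy as np

def encode(frame, prediction, qstep=8):
    residual = frame.astype(int) - prediction.astype(int)
    return np.round(residual / qstep).astype(int)         # quantized residual (would be entropy coded)

def decode(q_residual, prediction, qstep=8):
    residual = q_residual * qstep                          # inverse quantization
    return np.clip(prediction.astype(int) + residual, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (8, 8), dtype=np.uint8)        # the "already known" data
curr = np.clip(prev.astype(int) + rng.integers(-5, 6, (8, 8)), 0, 255).astype(np.uint8)

q = encode(curr, prev)                                     # encoder side
rec = decode(q, prev)                                      # decoder side
print("max reconstruction error:", int(np.abs(rec.astype(int) - curr.astype(int)).max()))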

MPEG

MPEG Today MPEG video compression is widely used in digital television set-top boxes, HDTV decoders, DVD players, video conferencing, Internet video, ...

MPEG Today MPEG-2: a super-set of MPEG-1; rates up to 10 Mbps (720x486); can handle HDTV (there is no MPEG-3). MPEG-4: built around objects rather than frames; lower bandwidth. MPEG-7: not (yet) a standard; allows content description (ease of searching). MP3: for audio (MPEG Layer-3).

MPEG Compression Compression through spatial and temporal redundancy reduction.

Spatial Redundancy Take advantage of the similarity among neighboring pixels.

Spatial Redundancy Reduction RGB to YUV: less information required, visually the same. Macroblocks: take groups of pixels. DCT: represent the pixels in each block with fewer significant numbers. Quantization: reduce the data required for the coefficients. Entropy coding: compress the result.

Spatial Redundancy Reduction Intra-frame encoding: quantization gives the major reduction and controls quality, followed by zig-zag scan and run-length coding.
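A compact sketch of this intra pipeline in Python: an 8x8 DCT, uniform quantization, zig-zag scan, and (zero-run, value) run-length pairs. The flat quantization step and the toy run-length format are illustrative assumptions; MPEG uses perceptually weighted quantization matrices and variable-length codes on top of this.

import numpy as np

N = 8
D = np.array([[np.sqrt((2.0 - (k == 0)) / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N)] for k in range(N)])      # orthonormal DCT-II basis matrix

def zigzag_indices(n=N):
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

def encode_block(block, qstep=16):
    coeffs = D @ (block - 128.0) @ D.T                     # 2-D DCT of the level-shifted block
    q = np.round(coeffs / qstep).astype(int)               # quantization: the main lossy step
    scan = [q[i, j] for i, j in zigzag_indices()]          # zig-zag scan: low frequencies first
    pairs, run = [], 0
    for v in scan:                                         # run-length coding of the scanned values
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs                                           # trailing zeros are implicit (end of block)

block = np.tile(np.arange(0, 256, 32, dtype=float), (8, 1))  # a smooth 8x8 ramp
print(encode_block(block))                                 # only a few nonzero coefficients survive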

Loss of Resolution Original (63 KB), low (7 KB), very low (4 KB).

Temporal Redundancy Take advantage of the similarity between successive frames (e.g., frames 950, 951, 952).

Temporal Redundancy Reduction


Temporal Redundancy Reduction I-frames are independently encoded. P-frames are predicted from previous I or P frames. B-frames are predicted from both previous and following I and P frames.

Group of Pictures (GOP) Starts with an I-frame and ends with the frame right before the next I-frame (an open GOP ends in B-frames, a closed GOP in a P-frame). The GOP structure is an MPEG encoding parameter; typical patterns are I B B P B B P B B I and I B B P B B P B B P B B I. Typical frame sizes and compression ratios (a quick check of the average follows below):
Type | Size | Compression
I | 18 KB | 7:1
P | 6 KB | 20:1
B | 2.5 KB | 50:1
Avg | 4.8 KB | 27:1
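As a quick check of those figures (a sketch in Python; the slide's 4.8 KB average is approximate, and the 12-frame pattern used here is the one quoted above):

sizes_kb = {"I": 18.0, "P": 6.0, "B": 2.5}                 # per-frame sizes from the table above
gop = "IBBPBBPBBPBB"                                       # typical 12-frame GOP
avg = sum(sizes_kb[t] for t in gop) / len(gop)
raw_kb = sizes_kb["I"] * 7                                 # 7:1 on an 18 KB I-frame => ~126 KB raw
print(f"average frame size ~ {avg:.1f} KB, overall ratio ~ {raw_kb / avg:.0f}:1")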

MPEG Layers

MPEG-1 Encoding Flowchart (block diagram; labels include a backward frame buffer).

H.264

Profiles of H.264 A profile is a set of capabilities; there are 21 profiles in total. The High profile is used for Blu-ray. Extensions: Scalable Video Coding (SVC) and Multi-view Video Coding (MVC).

Selected new features in H.264 Advanced intra-coding; enhanced motion estimation with variable block sizes; multiple reference pictures; integer block transform; improved deblocking filter; entropy coding; error-resilient coding.

H.264 Encoder (block diagram: prediction, transform, quantization, entropy coding).

H.264 advanced intra-coding Efficient video encoders mainly use inter-frame prediction, but intra-frame coding of parts of the picture is necessary to prevent error propagation. Intra-frame coding, however, generates a large bit rate, so for H.264 to be efficient special attention is paid to intra-frame coding. H.264 takes advantage of correlations between neighboring blocks to achieve better compression in intra-coding.

H.264 advanced intra-coding (Cont.) Every intra 16x16 pixel MB in a picture is first predicted in an appropriate mode from the already coded and reconstructed samples of the same picture.

H.264 advanced intra-coding (Cont.) There are nine advanced intra-prediction modes for the samples when the MB is partitioned into 4x4 blocks.
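As an illustration, here is a minimal Python sketch of three of the nine 4x4 intra modes (vertical, horizontal, and DC), predicting a block from the already-reconstructed samples above and to the left. Mode numbering, the remaining directional modes, and boundary availability rules are omitted.

import numpy as np

def intra_4x4(top, left, mode):
    """top: the 4 samples above the block; left: the 4 samples to its left."""
    if mode == "vertical":                    # copy the row above downwards
        return np.tile(top, (4, 1))
    if mode == "horizontal":                  # copy the left column rightwards
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == "dc":                          # flat block at the mean of the neighbours
        return np.full((4, 4), int(round((top.sum() + left.sum()) / 8.0)))
    raise ValueError("only three of the nine modes are sketched here")

top = np.array([100, 102, 104, 106])
left = np.array([100, 98, 96, 94])
for m in ("vertical", "horizontal", "dc"):
    print(m)
    print(intra_4x4(top, left, m))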

H.264 advanced intra-coding (Cont.) Intra prediction modes (16x16 block)

Advanced inter-coding Inter-frame predictive coding is where H.264 makes most of its gain in compression efficiency. Motion compensation (MC) on each 16x16 MB can be performed with various block sizes and shapes.

Advanced inter-coding

Inter Prediction Data that needs to be compressed: motion vector, partition mode, residual. Further improvements: multiple reference frames, sub-sample prediction, motion vector prediction.

Multiple Reference Frames The H.264 standard also offers the option of using several previous pictures for prediction. Every MB partition can have a different reference picture that is more appropriate for that particular block. This increases the coding efficiency, produces better subjective quality, and improves the robustness of the bitstream to channel errors.

Frame types of H.264 I-frame (Baseline Profile), P-frame (Baseline Profile), B-frame (Main Profile), SI-frame and SP-frame (Extended Profile). (Figure: I-frame F0, P-frame F1, P-frame F2.)

Sub-sample Prediction Integer and sub-sample prediction

Interpolation of Half Pixels A Finite Impulse Response (FIR) filter applied to the original integer pixels is utilized.

Interpolation of Quarter Pixels
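A 1-D sketch of the sub-pel interpolation described on these slides: half-pel samples from H.264's 6-tap FIR filter (1, -5, 20, 20, -5, 1) with rounding and a divide by 32, and quarter-pel samples as the rounded average of the two nearest integer/half-pel neighbours. Border handling and the separable 2-D case are left out, and the function names are illustrative.

import numpy as np

def half_pel(row):
    """Half-pel sample between row[i+2] and row[i+3] for each valid 6-sample window."""
    taps = np.array([1, -5, 20, 20, -5, 1])
    out = []
    for i in range(len(row) - 5):
        v = int(np.dot(taps, row[i:i + 6]))
        out.append(int(np.clip((v + 16) >> 5, 0, 255)))    # round, divide by 32, clip to 8 bits
    return out

def quarter_pel(a, b):
    return (a + b + 1) >> 1                                # rounded average of two neighbours

row = np.array([10, 20, 40, 80, 120, 160, 200, 220], dtype=int)
h = half_pel(row)
print("half-pel samples:", h)
print("a quarter-pel sample:", quarter_pel(int(row[2]), h[0]))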

Motion Vector Prediction Except for the 16x8 and 8x16 partitions, the predicted MV (MVp) is median(A, B, C). For 8x16: left partition MVp = A, right partition MVp = C. For 16x8: upper partition MVp = B, lower partition MVp = A.
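A small sketch of the predictor just described: the component-wise median of the neighbouring motion vectors A (left), B (above), and C (above-right), with the 16x8 / 8x16 special cases simply taking one neighbour directly.

def median_mv(a, b, c):
    return tuple(sorted(v)[1] for v in zip(a, b, c))       # component-wise median

A, B, C = (4, -2), (6, 0), (3, -1)                         # example neighbouring MVs
print("MVp, general case:", median_mv(A, B, C))            # -> (4, -1)
print("MVp, 8x16 left partition:", A)                      # special case: take A directly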

Transformation H.264 employs a 4x4 integer transform. The transform is an approximation of the DCT and has a similar coding gain. Since the integer transform has an exact inverse operation, there is no mismatch between the encoder and the decoder, which was a problem in previous DCT-based codecs.

Transform Three transforms: a 4x4 and a 2x2 Hadamard transform, and a 4x4 (approximated) DCT. The 4x4 DCT in H.264 is an integer transform: it can be implemented using only additions and shifts, the scaling multiplication is integrated into the quantization, and zero mismatch between encoder and decoder is possible.

Transform Luma block size: 16x16; Cb, Cr block size: 8x8. 2x2 Hadamard and 4x4 Hadamard transforms.

Transform Block 0~15, 18~25: 4x4 DCT transform

Transform Y = A X A^T
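Written out, using the 4x4 matrix commonly quoted for H.264's core forward transform (the exact matrix appears only as a figure on these slides, so treat the values below as an assumption; the post-scaling is folded into quantization as noted above, which is why only additions, subtractions, and shifts are needed):

import numpy as np

A = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

X = np.arange(16).reshape(4, 4)        # any 4x4 residual block
Y = A @ X @ A.T                        # Y = A X A^T, the unscaled forward integer transform
print(Y)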

Transform

Transform Factorization of Y, with d = c/b ≈ 0.414; the scalar multiplication part is folded into the quantization.

Transform Comparison between the approximated and the exact DCT

Quantization

Quantization QP vs. Qstep in H.264
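The relation behind that figure can be sketched as follows: Qstep doubles for every increase of 6 in QP. The base values for QP 0..5 below are the commonly tabulated ones, an assumption here since the slide shows them only graphically.

BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]    # Qstep for QP = 0..5

def qstep(qp):
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))           # doubles every 6 QP steps

for qp in (0, 6, 12, 24, 51):
    print(f"QP = {qp:2d}  Qstep = {qstep(qp):g}")          # 0.625, 1.25, 2.5, 10, 224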

Entropy Coding H.264 supports two different methods of entropy coding: context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC). CAVLC: the number of nonzero quantized coefficients (N) and the actual size and position of the coefficients are coded separately. CABAC: the efficiency of entropy coding can be improved further; it allows the assignment of a non-integer number of bits to each symbol of an alphabet.

Deblocking Filter In the H.264 codec, every reconstructed picture is filtered by default using an adaptive deblocking filter. The filter removes visible block structures at the edges of the 4x4 blocks caused by block-based transform coding and motion estimation.

H.264 Error Resilient Coding Spatial error propagation: the use of entropy coding means that every coded bit within a slice requires the previous bits for its decoding; hence, a single bit error in the transmitted stream may destroy all of the remaining coded bits of a slice. Temporal error propagation: because of the dependencies between consecutive pictures introduced by inter-coding, damage in one frame may propagate into many future frames, even if their information is received without error.

H.264 Error Resilient Coding (Cont.) First, the coded video data are grouped into network abstraction layer (NAL) units. Each NAL unit can be considered a packet that contains an integer number of bytes, including a header and a payload; the header specifies the NAL unit type and the payload contains the related data. Data partitioning.

H.264 Error Resilient Coding (Cont.) In a video sequence, each frame can be divided into several slices, each containing a flexible number of MBs. In each slice the arithmetic coder is aligned and the spatial predictions are reset. Every slice in the frame is therefore independently decodable and can be considered a re-synchronization point that prevents spatial propagation of a probable error to the next slice.

Extensions of AVC Scalable Video Coding (SVC) Simulcast Scalability: Spatial, Temporal, Quality Multi-view Video Coding (MVC)

Extensions of AVC

Extensions of AVC Simulcast

Extensions of AVC Spatial Scalable

Extensions of AVC Coding flow

Extensions of AVC Multi-view Video Coding (MVC)

HEVC vs. AVC (HM3 encoder block diagram annotated with the main differences from H.264):
Coding unit: the input video signal is split into largest coding units of 64x64 pixels.
Intra-frame prediction: H.264 4x4 to 16x16, HM3 4x4 to 64x64; H.264 up to 9 prediction directions, HM3 up to 34.
Transform: H.264 primarily 4x4 to 8x8 integer transforms; HM3 4x4 to 32x32 integer transforms; scaling and inverse transform in HM3 similar to H.264.
Entropy coding: HM3 similar to H.264 (CAVLC & CABAC).
Internal precision: H.264 8-bit; HM3 up to 10-bit.
Motion compensation: 1/4-pel in both; H.264 uses a 6-tap interpolation filter, HM3 an 8-tap filter.
In-loop filtering: deblocking filter plus, in HM3, an adaptive loop filter & offset.

Performance Evaluation Measured by PSNR, HEVC (H.265) can achieve roughly 40% bit-rate savings compared with H.264.
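PSNR, the objective metric used for that comparison, is straightforward to compute for 8-bit video; a minimal sketch (per-frame PSNR, usually averaged over a sequence):

import numpy as np

def psnr(reference, reconstructed, max_value=255.0):
    mse = np.mean((reference.astype(float) - reconstructed.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse)

ref = np.random.default_rng(1).integers(0, 256, (64, 64))
rec = np.clip(ref + np.random.default_rng(2).integers(-3, 4, (64, 64)), 0, 255)
print(f"PSNR = {psnr(ref, rec):.2f} dB")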

Multi-view Video Coding An extension of H.264 that adopts inter-view prediction only; no new coding tools are adopted (for backward compatibility).

Related research topics of 3D-TV Profiles of MVC Adopted in Blu-ray 3D (Stereo High)

Related research topics of 3D-TV

Related research topics of 3D-TV 3D AHG (Ad Hoc Group) in MPEG

Related research topics of 3D-TV Novel coding tools for 3D data. Yannick Morvan, Peter H. N. de With, and Dirk Farin, "Platelet-based coding of depth maps for the transmission of multi-view images," Proceedings of SPIE, Stereoscopic Displays and Applications (SD&A), 2006.

Related research topics of 3D-TV

New 3D Compression Framework

New 3D Compression Framework Depth Estimation Reference Software (DERS), View Synthesis Reference Software (VSRS).

Coding Tools in HEVC 3D Compression Wedgelet partition: segmentation with a straight line. Contour partition: segmentation with an arbitrary shape.

Coding Tools in HEVC 3D Compression Inter-component prediction: predict the partition (Wedgelet or Contour) from a texture reference block.

View Synthesis Optimization in HEVC 3D Compression Depth data is not directly visible to users, so the optimization targets the quality of the synthesized views.

MOTION ESTIMATION

Introduction of Motion Estimation The goal of video compression is to reduce the total transmission bit rate for reconstructing images at the receiver. (Figure: scaled frame difference for a stationary camera vs. a non-stationary camera.)

Introduction of Motion Estimation Encode the motion information so that it occupies only a small amount of the transmission bandwidth in addition to the picture content information. Motion estimation techniques are also used in many other applications: computer vision, target tracking, industrial monitoring. Motion estimation techniques can be roughly classified into three groups: block matching methods, differential (gradient) methods, and Fourier methods.

Introduction of Motion Estimation Motion-compensated estimation is an effective means of reducing the inter-frame correlation for image sequence coding. It is the operation of predicting an image (or a portion thereof) based on the displacement of a previously transmitted frame in an image sequence. By motion estimation we mean the estimation of the displacement (or velocity) of image structures from one frame to another in a time sequence of 2-D images.

Introduction of Motion Estimation To reduce computation and storage complexity, the motion parameters of objects in a picture are estimated based on two or three nearby frames. General assumptions: (i) objects are rigid bodies, so object deformation can be neglected for at least a few nearby frames; (ii) objects move only in translation for at least a few frames; (iii) illumination is unchanged under movement; (iv) occlusion of one object by another and uncovered background are neglected.

Definition of Moving Object Moving object: a group of contiguous pels that share the same set of motion parameters. This does not necessarily match the ordinary meaning of "object"; for example, a still background can be considered a single object. The effect of object size: Small objects (or evaluation windows) suffer from the ambiguity problem (similar objects or image patterns may appear at multiple locations inside a picture and may lead to incorrect displacement vectors) and the noise sensitivity problem (statistically, estimates based on a small set of data are more vulnerable to random noise than those based on a large set of data). Large objects (or evaluation windows) suffer from the accuracy problem (pels inside an object or evaluation window do not all share the same motion parameters, and therefore the estimated motion parameters are not accurate for some or all pels in it).

Definition of Moving Object Practical solution for determining object size: partition the images into regular, non-overlapping blocks, assuming that moving objects can be approximated reasonably well by regularly shaped blocks. A single displacement vector is then estimated for the entire image block under the assumption that all the pels in the block share the same displacement vector. This assumption may not always be true, because an image block may contain more than one moving object; in image sequence coding, however, the prediction errors due to imperfect motion compensation are coded and transmitted. The block-based motion estimation approach is adopted by the video coding standards at least partly for its robustness compared with the pel-based approach.

Motion-compensated Coding Structure Motion compensation: (1) access the previous-frame image data according to the estimated displacement vector; (2) construct the predicted pels by passing the previous-frame image through the prediction filter.

Motion-compensated Coding Structure H.261/263: I, P frames; MPEG-1/2: I, P, B frames (note the encoding/transmission order). A picture frame in MPEG-2 may contain two interleaved fields: frame-based motion compensation or field-based motion compensation.

Motion-compensated Coding Structure As long as the motion parameters we obtain can efficiently reduce the total bit rate, these parameters need not be the true motion parameters. If the reconstructed images are used for estimating further motion information, a rather strong noise component cannot be neglected (the drift problem). Notice that the current video standards specify only the decoder; i.e., the encoder, which performs the motion estimation operation, is not explicitly specified in the standards, so a great amount of flexibility exists in choosing and designing a motion estimation scheme for a standard coder.

Block Matching Method Block matching is a correlation technique that searches for the best match between the current image block and candidates in a confined area of the previous frame. Goals: reduce the computational load of calculating the motion vector and increase the motion vector accuracy. Jain & Jain, "Displacement measurement and its application in interframe coding," IEEE Trans. Communications, vol. COM-29, pp. 1799-1809, Dec. 1981.

Block Matching Method

Block Matching Method The size of the block affects the performance of motion estimation. Small block sizes afford a good approximation to natural object boundaries and to real motion, but they produce a large amount of raw motion information, which increases the number of transmission bits or the data compression complexity needed to condense this information; they also suffer from the object (block) ambiguity problem and the random noise problem. Large block sizes produce less accurate motion vectors, since a large block may contain pels moving at different speeds and directions. Typical sizes are 8x8 and 16x16 in H.261, MPEG-1, and MPEG-2.

Block Matching Method The basic operation of block matching is picking a candidate and calculating the matching function between the candidate and the current block, usually a nonnegative function of the intensity difference. This basic operation is repeated until all the candidates have been examined, and then the best-matched candidate is identified. The location of the best-matched candidate gives the estimated displacement vector. Important parameters: the number of candidate blocks (search points), the matching function, and the search order of the candidates.

Block Matching Method

Matching Function The selection of the matching function has a direct impact on computational complexity and displacement vector accuracy. Let (d1, d2) represent a motion vector candidate inside the search region and f(n1, n2, t) be the digitized image intensity at the integer-valued 2-D image coordinate (n1, n2) in the t-th frame. With the sums taken over the N pels of the current block, the common choices are:
Normalized cross-correlation function (NCF): NCF(d1, d2) = sum f(n1, n2, t) f(n1 - d1, n2 - d2, t - 1) / sqrt( sum f(n1, n2, t)^2 * sum f(n1 - d1, n2 - d2, t - 1)^2 )
Mean square error (MSE): MSE(d1, d2) = (1/N) sum [ f(n1, n2, t) - f(n1 - d1, n2 - d2, t - 1) ]^2
Mean absolute difference (MAD): MAD(d1, d2) = (1/N) sum | f(n1, n2, t) - f(n1 - d1, n2 - d2, t - 1) |
Number of threshold differences (NTD): the number of pels in the block whose absolute difference | f(n1, n2, t) - f(n1 - d1, n2 - d2, t - 1) | exceeds a preset threshold.

Matching Function Remarks: To estimate the motion vector, we normally maximize the value of NCF or minimize the values of the other three functions. In detection theory, if the total noise (a combination of coding error and the other factors violating our motion assumptions) can be modeled as white Gaussian, then the NCF is the optimal matching criterion. The white Gaussian assumption, however, is not completely valid for real images, and the computation requirement of NCF is enormous. The other matching functions are regarded as more practical, and they perform almost equally well for real images. The threshold in NTD can be adjusted to match the subjective thresholding characteristics of the human visual system. MAD is the most popular choice in designing practical image coding systems because of its good performance and relatively simple hardware structure.
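A direct sketch of exhaustive block matching with the MAD criterion defined above: for a block at (r, c) in the current frame, every displacement (d1, d2) within a +/- search range is tested against the previous frame and the best one is kept. The function name and parameters are illustrative; practical implementations use the sum of absolute differences (SAD) to avoid the division.

import numpy as np

def full_search(cur, prev, r, c, block=16, search=7):
    best_cost, best_mv = np.inf, (0, 0)
    target = cur[r:r + block, c:c + block].astype(float)
    for d1 in range(-search, search + 1):
        for d2 in range(-search, search + 1):
            rr, cc = r + d1, c + d2
            if rr < 0 or cc < 0 or rr + block > prev.shape[0] or cc + block > prev.shape[1]:
                continue                                   # candidate falls outside the frame
            cand = prev[rr:rr + block, cc:cc + block].astype(float)
            mad = np.mean(np.abs(target - cand))           # the MAD matching function
            if mad < best_cost:
                best_cost, best_mv = mad, (d1, d2)
    return best_mv, best_cost

rng = np.random.default_rng(3)
prev = rng.integers(0, 256, (64, 64)).astype(np.uint8)
cur = np.roll(prev, shift=(2, -3), axis=(0, 1))            # content shifted between the two frames
print(full_search(cur, prev, 16, 16))                      # finds the displacement: ((-2, 3), MAD = 0)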

Fast Search Algorithms Basic principle: break the search process into a few sequential steps and choose the next-step search direction based on the current step's result. At each step, only a small number of search points are calculated; therefore, the total number of search points is significantly reduced. Since the steps are performed in sequential order, an incorrect initial search direction may lead to a less favorable result. Also, the sequential search order poses a constraint on the available parallel processing structure.

Fast Search Algorithms Normally, a fast search algorithm starts with a rough search, computing a set of scattered search points; the distance between two nearby search points is called the search step size. After the current step is completed, it moves to the most promising search point and performs another search, usually with a smaller step size. (Related general search techniques: simulated annealing, genetic algorithms, neural networks, support vector machines.)

Fast Search Algorithms If the matching function is monotonic along any direction away from the optimal point, a well-designed fast algorithm can be guaranteed to converge to the globally optimal point. In reality, however, the image signal is not a simple Markov process and it contains coding and measurement noise; therefore, the monotonic matching function assumption is often not valid, and consequently fast search algorithms are often suboptimal. (Figure: possible next steps of a search.)

2-D-log Search Procedure A diamond-shaped search area is used (at most 5 points per step). The step size is reduced to half of its current value when the best match is located at the center or on the border of the maximum search region; finally, the 9 search points in the 3x3 area surrounding the last best matching point are compared.

3-step Search Procedure The search starts with a step size equal to or slightly larger than half of the maximum search range. In each step, 9 points are compared. The step size is halved after each step, and the search ends with a step size of 1 pel.
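A sketch of that procedure in Python, reusing MAD as the matching function (the names and the smooth test content are illustrative): evaluate the centre and its 8 neighbours at the current step size, re-centre on the best point, halve the step, and stop when the step size reaches 1 pel.

import numpy as np

def mad(cur, prev, r, c, d1, d2, block=16):
    rr, cc = r + d1, c + d2
    if rr < 0 or cc < 0 or rr + block > prev.shape[0] or cc + block > prev.shape[1]:
        return np.inf                                      # candidate falls outside the frame
    a = cur[r:r + block, c:c + block].astype(float)
    b = prev[rr:rr + block, cc:cc + block].astype(float)
    return float(np.mean(np.abs(a - b)))

def three_step_search(cur, prev, r, c, block=16, first_step=4):
    mv, step = (0, 0), first_step                          # step ~ half of the +/-7 search range
    while step >= 1:
        candidates = [(mv[0] + i * step, mv[1] + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        mv = min(candidates, key=lambda d: mad(cur, prev, r, c, d[0], d[1], block))
        step //= 2                                         # halve the step after each stage
    return mv

i, j = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
prev = 128 + 100 * np.sin(i / 5.0) * np.cos(j / 7.0)       # smooth test content
cur = np.roll(prev, shift=(3, 1), axis=(0, 1))             # the same content, shifted
print(three_step_search(cur, prev, 16, 16))                # should land at or near (-3, -1)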

Fast Search Algorithms Remarks: A threshold function can be used to terminate the search process before reaching the final step; as long as the matching error is less than a small threshold, the resulting motion vector is acceptable. One-at-a-time search: separate the 2-D search problem into two 1-D problems, i.e., look for the best matching point in one direction first, and then look in the other direction.

Fast Search Algorithms Remarks: In hardware systems, the exhaustive search and the 3-step search are often favored for their good PSNR performance, their fixed and small number of search steps, and their identical operation in every step.

Variants of Block Matching Algorithms The computational load can also be reduced by calculating fewer blocks of an image. To increase search efficiency, we can place the initial search point at a location predicted from the motion vectors of spatially or temporally adjacent blocks; the best match can then often be obtained by searching a smaller region surrounding this initial point. We can first separate the moving image blocks from the stationary ones and conduct block matching only on the moving blocks, because a moving/change detector can be implemented with far fewer calculations than a motion estimator. We can use only a portion of the pels inside an image block (subsampled blocks) to calculate the matching function, although the accuracy of the motion vectors might be reduced. Finally, motion estimation can be performed only on alternate blocks of an image, with the motion vectors of the skipped blocks interpolated from the calculated ones.

Variants of Block Matching Algorithms Basic principle: a large block size is chosen at the beginning to obtain a rough estimate of the motion vector. Because a large-size image pattern is used in matching, the ambiguity problem (blocks of similar content) can often be eliminated; however, motion vectors estimated from large blocks are not accurate. We then refine the estimated motion vector by decreasing the block size and the search region: a new search with a smaller block size starts from an initial motion vector that is the best-matched motion vector of the previous stage. Because pels in a small block are more likely to share the same motion vector, the reduction of block size typically increases motion vector accuracy.

Variants of Block Matching Algorithms Variable-block-size motion estimation: image frames are partitioned into non-overlapping large image blocks. If the motion-compensated estimation error of a large block is higher than a threshold, the block is not well compensated and is therefore further partitioned into, say, four smaller blocks (a sketch follows below).
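A sketch of that splitting rule, reusing the full_search function from the exhaustive block matching example above (the threshold value and the single 16x16 -> 8x8 split level are illustrative assumptions):

def motion_estimate_vbs(cur, prev, r, c, threshold=8.0):
    mv, cost = full_search(cur, prev, r, c, block=16)      # try the large block first
    if cost <= threshold:
        return [((r, c, 16), mv)]                          # the 16x16 block is well compensated
    results = []                                           # otherwise split into four 8x8 blocks
    for dr in (0, 8):
        for dc in (0, 8):
            sub_mv, _ = full_search(cur, prev, r + dr, c + dc, block=8)
            results.append(((r + dr, c + dc, 8), sub_mv))
    return results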

(Figure: results after the 1st, 2nd, 3rd, 4th, 16th, 112th, and 256th search points, alongside the original image.)