In the name of Allah the compassionate, the merciful
Digital Video Systems S. Kasaei Room: CE 315 Department of Computer Engineering Sharif University of Technology E-Mail: skasaei@sharif.edu Webpage: http://sharif.edu/~skasaei Lab. Website: http://mehr.sharif.edu/~ipl
Acknowledgment Most of the slides used in this course have been provided by: Prof. Yao Wang (Polytechnic University, Brooklyn) based on the book: Video Processing & Communications written by: Yao Wang, Jörn Ostermann, & Ya-Qin Zhang Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105.2.W36 2001].
Chapters 9 & 11 Video Coding using Motion Compensation
Outline Block-Based Hybrid Video Coding Overview of Block-Based Hybrid Video Coding Overlapped Block Motion Compensation Coding mode selection & rate control Loop filtering Scalable Video Coding Motivation for scalable coding Basic modes of scalability Scalability in MPEG-2 Fine granularity scalability in MPEG-4 Kasaei 6
Characteristics of Typical Videos [Figure: Frame t-1 & Frame t.] Adjacent frames are similar; changes are due to object or camera motion.
Key Ideas in Video Compression Predict a new frame from a previous frame & code only the prediction error. The prediction error can be coded using the DCT method. Prediction errors have smaller energy than the original pixel values & can be coded with fewer bits. Regions that cannot be predicted well are coded directly using DCT. Work on each macroblock (MB) (16x16 pixels) independently to reduce complexity. Motion compensation is done at the MB level. DCT coding of the error is done at the block level (8x8 pixels).
Different Coding Modes
Temporal Prediction
No Motion Compensation: Works well in stationary regions.
$\hat{f}(t, m, n) = f(t-1, m, n)$
Uni-directional Motion Compensation: Does not work well for regions uncovered by object motion.
$\hat{f}(t, m, n) = f(t-1, m - d_x, n - d_y)$
Bi-directional Motion Compensation: Can better handle covered/uncovered regions.
$\hat{f}(t, m, n) = w_b f(t-1, m - d_{b,x}, n - d_{b,y}) + w_f f(t+1, m - d_{f,x}, n - d_{f,y})$
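The three prediction modes on this slide can be sketched in a few lines. This is a minimal illustration, not a codec component: the function name `predict_pixel`, the mode strings, and the default weights of 0.5 are assumptions made here, and frames are plain nested lists indexed as `frame[m][n]`.

```python
def predict_pixel(prev, nxt, m, n, mode, d_b=(0, 0), d_f=(0, 0), w_b=0.5, w_f=0.5):
    """Predict pixel (m, n) of frame t from frame t-1 (prev) and/or frame t+1 (nxt).

    d_b and d_f are the displacements applied in the previous and following
    frame. No bounds checking is done; callers must keep indices in range.
    """
    if mode == "none":   # no motion compensation: co-located pixel of frame t-1
        return prev[m][n]
    if mode == "uni":    # uni-directional: displaced pixel of frame t-1
        return prev[m - d_b[0]][n - d_b[1]]
    if mode == "bi":     # bi-directional: weighted sum of past & future pixels
        return (w_b * prev[m - d_b[0]][n - d_b[1]]
                + w_f * nxt[m - d_f[0]][n - d_f[1]])
    raise ValueError("unknown mode: " + mode)


# Toy example (assumed data): a bright pixel moves one column to the right per frame.
prev = [[0] * 5 for _ in range(3)]
nxt = [[0] * 5 for _ in range(3)]
prev[1][1] = 100   # position in frame t-1
nxt[1][3] = 100    # position in frame t+1; in frame t it sits at (1, 2)

no_mc = predict_pixel(prev, nxt, 1, 2, "none")
uni = predict_pixel(prev, nxt, 1, 2, "uni", d_b=(0, 1))
bi = predict_pixel(prev, nxt, 1, 2, "bi", d_b=(0, 1), d_f=(0, -1))
```

Without motion compensation the moving pixel is missed, while both compensated modes recover it in this toy case.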
Temporal Prediction Although bidirectional prediction can improve prediction accuracy & consequently coding efficiency, it incurs encoding delay & is typically not used in real-time applications. H.261 uses only uni-directional prediction; H.263 adds a restricted bidirectional prediction (PB-mode). The MPEG standards employ both uni- & bidirectional prediction.
Encoder Block Diagram of a Block-Based Hybrid Video Coder
Decoder Block Diagram
Block-Based Hybrid Video Coding The encoder must emulate the decoder operation to deduce the same reconstructed frame as the decoder. Frame types: When a frame is coded entirely in the intra-mode → I-frame. When a previous frame is used for prediction in the inter-mode → P-frame. When a weighted sum of a previous & a following frame is used for prediction in the inter-mode → B-frame. The mode information, MVs, & other side information (picture format, block location, ...) are coded using VLC. Picture → GOB (slice) → MB → block.
MB Structure in 4:2:0 Color Format Four 8x8 Y blocks. One 8x8 Cb block. One 8x8 Cr block.
Block Matching Algorithm for Motion Estimation [Figure: the MV & the search region, shown between Frame t-1 (reference frame) & Frame t (predicted frame).]
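A minimal sketch of exhaustive-search block matching follows, assuming the mean absolute difference (MAD) as the error measure. The names `mad` and `block_match`, the 4x4 block size, and the ±2 search range are illustrative choices made here, as is the synthetic test frame. The returned offset locates the best match in the reference frame relative to the current block.

```python
def mad(cur, ref, r0, c0, dr, dc, size):
    """Mean absolute difference between the current block at (r0, c0)
    and the reference block displaced by (dr, dc)."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[r0 + i][c0 + j] - ref[r0 + dr + i][c0 + dc + j])
    return total / (size * size)


def block_match(cur, ref, r0, c0, size=4, search=2):
    """Try every integer offset in [-search, search]^2 and keep the smallest MAD."""
    rows, cols = len(ref), len(ref[0])
    best_err, best_off = float("inf"), (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0 or r + size > rows or c + size > cols:
                continue  # candidate block falls outside the reference frame
            err = mad(cur, ref, r0, c0, dr, dc, size)
            if err < best_err:
                best_err, best_off = err, (dr, dc)
    return best_off, best_err


# Synthetic data (assumed): a textured reference frame, and a current frame
# whose content is the reference shifted down by 1 row and right by 2 columns.
ref = [[((i * 8 + j) ** 2) % 251 for j in range(8)] for i in range(8)]
cur = [[ref[i - 1][j - 2] if i >= 1 and j >= 2 else 0 for j in range(8)]
       for i in range(8)]

offset, err = block_match(cur, ref, r0=2, c0=3)
```

Exhaustive search costs (2·search + 1)² MAD evaluations per block, which is why practical encoders often use faster search patterns.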
Macroblock Coding in I-Mode Apply the DCT to each 8x8 block. Quantize the DCT coefficients (with properly chosen quantization matrices). Quantized DCT coefficients are zig-zag ordered & run-length coded.
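The chain described above (DCT, quantization, zig-zag ordering, run-length coding) can be sketched as follows. This is a simplified illustration under stated assumptions: a single uniform step size replaces the quantization matrix, the run-length output is a list of (zero-run, level) pairs terminated by an `"EOB"` marker, and no entropy coding is applied. All function names are chosen here for illustration.

```python
import math

N = 8  # block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an NxN block, computed directly from the definition."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coefs, step):
    """Uniform quantizer with one step size (assumption; real coders use matrices)."""
    return [[int(round(c / step)) for c in row] for row in coefs]


def zigzag(block):
    """Read the block along anti-diagonals, alternating direction on each one."""
    order = sorted(((r, c) for r in range(N) for c in range(N)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in order]


def run_length(seq):
    """Emit (number of preceding zeros, nonzero level) pairs, then an end-of-block mark."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block mark
    return pairs


flat_block = [[100] * N for _ in range(N)]  # a constant block has only a DC term
symbols = run_length(zigzag(quantize(dct2(flat_block), step=16)))
scan_demo = zigzag([[r * N + c for c in range(N)] for r in range(N)])[:6]
```

A smooth block collapses to very few symbols, which is the reason the zig-zag scan is paired with run-length coding.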
Macroblock Coding in P-Mode Estimate one MV for each macroblock (16x16). Depending on the motion compensation error, determine the coding mode (intra, inter-with-no-mc, inter-with-mc, etc.). The original values (for intra-mode) or motion compensation errors (for inter-mode) in each of the DCT blocks (8x8) are DCT transformed, quantized, zig-zag/alternate scanned, & run-length coded.
Macroblock Coding in B-Mode Same as for the P-mode, except that a macroblock can be predicted from a previous frame, a following one, or both. [Figure: MVs v_f & v_b.]
Overlapped Block Motion Compensation (OBMC) Conventional block motion compensation: One best matching block is found from a reference frame. The current block is replaced by the best matching block. OBMC: Each pixel in the current block is predicted by a weighted average of several corresponding pixels in the reference frame. The corresponding pixels are determined by the MVs of the current as well as adjacent MBs. The weight for each corresponding pixel depends on the expected accuracy of the associated MV.
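To make the weighted-average idea concrete, here is a one-dimensional sketch. It is not the H.263 scheme: the linear weights, the use of only the nearest neighboring block, and the border clamping are all assumptions chosen for brevity. Each pixel blends the prediction from its own block's MV with the prediction from the nearer neighbor's MV, and the neighbor's weight grows toward the block edge.

```python
def obmc_predict_row(ref, mvs, B):
    """1-D OBMC sketch: ref is a row of the reference frame, mvs holds one
    integer displacement per block of B pixels (len(mvs) == len(ref) // B)."""
    n = len(ref)
    center = (B - 1) / 2

    def fetch(x, d):
        # Displaced reference pixel, clamped at the row borders (assumption).
        return ref[min(max(x - d, 0), n - 1)]

    pred = []
    for x in range(n):
        k = x // B                        # block containing pixel x
        p = x - k * B                     # position inside that block
        nb = k - 1 if p < center else k + 1
        nb = min(max(nb, 0), len(mvs) - 1)   # no neighbor at the border: reuse own block
        w_nb = abs(p - center) / B        # hypothetical weight, larger near block edges
        pred.append((1 - w_nb) * fetch(x, mvs[k]) + w_nb * fetch(x, mvs[nb]))
    return pred


ramp = list(range(0, 160, 10))            # assumed reference row: 0, 10, ..., 150
blended = obmc_predict_row(ramp, mvs=[0, 2], B=8)
uniform = obmc_predict_row(list(range(16)), mvs=[1, 1], B=8)
```

When all MVs agree, the result equals ordinary block motion compensation; when they differ, the prediction changes gradually across the block boundary instead of jumping.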
OBMC using 4-Neighboring MBs The weight applied to each MV's prediction should be inversely proportional to the distance between x & the center of the MB that the MV belongs to.
Optimal Weighting Design Convert to an optimization problem: Minimize: Subject to: Optimal weighting functions:
How to Determine MVs with OBMC Option 1: Using conventional BMA, minimize the prediction error (MAD) within each MB independently. Option 2: Minimize the prediction error assuming OBMC: Solve for the MV of the current MB while keeping fixed the MVs of the neighboring MBs found in the previous iterations.
How to Determine MVs with OBMC Option 3: Using a weighted error criterion over a larger block:
Weighting Coefficients used in H.263
Window Function Corresponding to H.263 Weights for OBMC
Coding Mode Selection Coding modes: Intra vs. inter, QP to use for each MB, each leading to different rate. Rate-distortion optimized selection, given target rate: Minimize the distortion, subject to the target rate constraint: Simplified version: The optimal mode is such that each MB works at the same R-D slope:
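One common way to realize rate-distortion optimized selection is a Lagrangian cost: each MB independently picks the option minimizing D + λR, and a single shared λ plays the role of the common R-D slope. The sketch below assumes that formulation. The candidate tuples `(mode, rate, distortion)`, their values, and the bisection bounds are made-up illustrations, not measured data.

```python
def choose_modes(candidates, lam):
    """For each MB, pick the (mode, rate, distortion) option minimizing D + lam * R."""
    return [min(opts, key=lambda o: o[2] + lam * o[1]) for opts in candidates]


def total_rate(choices):
    return sum(rate for _, rate, _ in choices)


def fit_lambda(candidates, target_rate, lo=0.0, hi=100.0, iters=30):
    """Bisection on lam: a larger lam penalizes rate more, so the total rate drops.
    Assumes the rate at the initial hi already meets the target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_rate(choose_modes(candidates, mid)) > target_rate:
            lo = mid    # still above the budget: penalize rate more
        else:
            hi = mid    # within budget: try a smaller penalty
    return hi


# Hypothetical per-MB options: (mode, rate in bits, distortion).
cands = [
    [("intra", 120, 4.0), ("inter", 40, 10.0), ("skip", 2, 60.0)],
    [("intra", 100, 5.0), ("inter", 30, 6.0), ("skip", 2, 8.0)],
]

lam = fit_lambda(cands, target_rate=50)
picked = choose_modes(cands, lam)
```

With these numbers the budget of 50 bits is met by coding the first MB in inter mode and skipping the second, since skipping the second MB costs little extra distortion.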
Rate Control Rate control: How to code a video so that the resulting bit stream satisfies a target bit rate? For pleasant visual perception, video should have a constant quality. But the coding method necessarily yields a variable bit rate. So, at least the bit rate should be constant when averaged over a short period. Rate control is also necessary when the video is to be sent over a constant bit rate (CBR) channel. The fluctuation within the period can be smoothed by a buffer at the encoder output.
Rate Control Rate control is accomplished in three steps: Step 1) Determine the target average bit rate at the frame, GOB, & MB level, based on the current buffer fullness. Step 2) Satisfy the frame-level target rate by varying the frame rate (skip frames when necessary). Step 3) Satisfy the GOB/MB-level target rate by varying the coding mode & QP at each MB. (= Rate-distortion optimized mode selection.)
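The steps above can be illustrated with a toy buffer-feedback loop. Everything numeric here is an assumption for illustration: the rate model `bits = complexity / QP`, the fullness thresholds of 0.4, 0.6, and 0.9, the QP step of 2, and the QP range 1 to 31. Real rate-control algorithms use more refined models.

```python
def update_qp(qp, buffer_bits, buffer_size, qp_min=1, qp_max=31):
    """Raise QP when the output buffer fills up, lower it when the buffer drains."""
    fullness = buffer_bits / buffer_size
    if fullness > 0.6:
        qp += 2
    elif fullness < 0.4:
        qp -= 2
    return min(max(qp, qp_min), qp_max)


def simulate(frame_complexity, channel_bits_per_frame, buffer_size, qp=10):
    """Run the feedback loop over a list of per-frame complexities.
    Returns one (qp after update, bits spent, buffer level) tuple per frame."""
    buffer_bits = buffer_size / 2
    log = []
    for cplx in frame_complexity:
        if buffer_bits > 0.9 * buffer_size:
            bits = 0                 # frame skipped to protect the buffer
        else:
            bits = cplx / qp         # hypothetical rate model R = X / QP
        # The CBR channel drains a fixed number of bits per frame interval.
        buffer_bits = max(buffer_bits + bits - channel_bits_per_frame, 0)
        qp = update_qp(qp, buffer_bits, buffer_size)
        log.append((qp, bits, buffer_bits))
    return log


steady = simulate([1000] * 5, channel_bits_per_frame=100, buffer_size=1000)
overload = simulate([3000] * 4, channel_bits_per_frame=100, buffer_size=1000)
```

In the overload run the buffer keeps filling despite the rising QP, so the fourth frame is skipped, which mirrors Step 2.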
Loop Filtering With motion-compensated temporal prediction, errors in previously reconstructed frames accumulate over time, leading to: Reduction of prediction accuracy. Increase of the bit rate needed for coding new frames.
Loop Filtering Loop filtering: Filters the reference frame before using it for prediction. Can be embedded in the motion compensation loop, e.g., through: Half-pel motion compensation. OBMC. Loop filtering can significantly improve coding efficiency. For the theoretically optimal design of loop filters, see the text.
Scalable Coding Scalability refers to the capability of recovering physically meaningful image (or video) information by decoding only part of the compressed bit stream. It is used when users access the same video through different communication links (bandwidth scalability). A scalable stream can also offer adaptivity to varying channel error characteristics & computing power at the receiving terminal. Types include: quality, spatial, temporal, & frequency scalability.
Scalable Coding Motivation: Real networks are heterogeneous in rate, e.g., streaming video from home using a modem (56 kbps) vs. over a corporate LAN (10-100 Mbps). Scalable video coding: Ideal goal (embedded stream): Create a bit stream that can be accessed at any rate. Practical video coder: layered coder: the base layer provides basic quality & successive layers refine the quality incrementally. Coarse granularity (typically known as layered coder). Fine granularity (FGS).
Combined Spatial/Quality Scalability
Illustration of Scalable Coding [Figure: decoded images at 6.5 kbps, 133.9 kbps, 21.6 kbps, & 436.3 kbps, arranged along a spatial scalability axis & a quality (SNR) scalability axis.]
Quality Scalability by Multistage Quantization
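Multistage quantization can be sketched for a single coefficient: the base layer quantizes coarsely, and each enhancement layer quantizes the residual left by the layers before it with a finer step. The function names and the step sizes 16, 4, and 1 are assumptions for illustration.

```python
def encode_layers(coef, steps):
    """Quantize coef in successive stages; stage i codes what stages 0..i-1 missed."""
    indices, recon = [], 0
    for step in steps:
        q = round((coef - recon) / step)   # quantization index for this layer
        indices.append(q)
        recon += q * step                  # reconstruction available to the next stage
    return indices


def decode_layers(indices, steps):
    """Reconstruct from however many layers were received (zip stops at the shortest)."""
    return sum(q * step for q, step in zip(indices, steps))


steps = [16, 4, 1]                         # base layer first, then two refinements
layers = encode_layers(37, steps)
base_only = decode_layers(layers[:1], steps)
two_layers = decode_layers(layers[:2], steps)
all_layers = decode_layers(layers, steps)
```

Decoding more layers moves the reconstruction from 32 to 36 to 37, so quality improves with each layer received.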
Spatial/Temporal Scalability through Down/Up-Sampling
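A one-dimensional sketch of spatial scalability by down/up-sampling follows. The base layer is a half-resolution signal and the enhancement layer is the residual between the original and the up-sampled base. Averaging pairs for down-sampling, pixel repetition for up-sampling, and the omission of quantization are simplifying assumptions; the input length is assumed to be even.

```python
def downsample(x):
    """Halve the resolution by averaging each pair of samples."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the resolution by repeating each sample."""
    return [v for v in x for _ in range(2)]


def encode(x):
    """Return (base layer, enhancement layer) for an even-length signal."""
    base = downsample(x)
    enh = [a - b for a, b in zip(x, upsample(base))]
    return base, enh


def decode(base, enh=None):
    """Base layer alone gives a coarse signal; adding enh restores the original."""
    up = upsample(base)
    if enh is None:
        return up
    return [u + e for u, e in zip(up, enh)]


signal = [10, 20, 30, 50]
base, enh = encode(signal)
coarse = decode(base)
full = decode(base, enh)
```

Temporal scalability follows the same pattern, with frames dropped and interpolated in time instead of pixels in space.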
Scalability in MPEG-2 MPEG-2 is the earliest standard that offers scalability tools. Four types of scalability: Data partition (frequency scalability). SNR scalability (quality scalability). Temporal scalability (frame-rate scalability). Spatial scalability (resolution scalability).
Fine Granularity Scalability in MPEG-4 MPEG-4 achieves fine granularity quality scalability through bit-plane coding. The DCT coefficients are represented losslessly in binary bits. The bit planes are coded successively, from the most significant bit to the least. The bit plane within each block is coded using run-length coding. Temporal scalability is accomplished by combining I, B, & P-frames. Spatial scalability is achieved by spatial down/up-sampling.
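Bit-plane coding can be sketched on a short list of integer coefficients. The example splits magnitudes into planes from the most significant bit down and shows that truncating the list of planes still yields a usable, coarser reconstruction. Sending signs separately and skipping the run-length stage are simplifications assumed here; the function names are illustrative.

```python
def bit_planes(coefs, n_bits):
    """Split coefficient magnitudes into bit planes, most significant plane first."""
    mags = [abs(c) for c in coefs]
    return [[(m >> b) & 1 for m in mags] for b in range(n_bits - 1, -1, -1)]


def reconstruct(planes, n_bits, signs):
    """Rebuild coefficients from the planes received so far (possibly truncated)."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        b = n_bits - 1 - i                 # bit position carried by this plane
        for j, bit in enumerate(plane):
            mags[j] |= bit << b
    return [s * m for s, m in zip(signs, mags)]


coefs = [5, -3, 0, 6]
signs = [1, -1, 1, 1]                      # signs sent separately (assumption)
planes = bit_planes(coefs, n_bits=3)

one_plane = reconstruct(planes[:1], 3, signs)
two_planes = reconstruct(planes[:2], 3, signs)
all_planes = reconstruct(planes, 3, signs)
```

Because the stream can be cut after any plane, the reconstruction quality degrades gradually with the number of bits received.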
Scalable Video Coding using Wavelet Transform Wavelet-based image coding: Full frame image transform (as opposed to block-based transform). Bit plane coding of the transform coefficients can lead to embedded bit streams. EZW -> SPIHT.
Scalable Video Coding Using Wavelet Transforms Wavelet-based video coding: Temporal filtering with & without motion compensation. Can achieve temporal, spatial, & quality scalability simultaneously. Still an active research area!
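As a minimal sketch of temporal filtering without motion compensation, the example below applies a Haar filter to consecutive frame pairs. The choice of the Haar filter, the unnormalized average/difference form, and frames stored as flat pixel lists are assumptions for illustration. The low band alone is a half-frame-rate video, which is where temporal scalability comes from.

```python
def haar_analysis(frames):
    """Filter consecutive frame pairs into a temporal low band (average)
    and a temporal high band (half difference). Expects an even frame count."""
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(p + q) / 2 for p, q in zip(a, b)])
        high.append([(p - q) / 2 for p, q in zip(a, b)])
    return low, high


def haar_synthesis(low, high):
    """Invert haar_analysis: each (low, high) pair yields two frames."""
    frames = []
    for l, h in zip(low, high):
        frames.append([x + y for x, y in zip(l, h)])
        frames.append([x - y for x, y in zip(l, h)])
    return frames


video = [[10, 20], [14, 20]]               # two tiny frames of two pixels each
low, high = haar_analysis(video)
restored = haar_synthesis(low, high)
```

Static pixels produce zeros in the high band, so most of the signal energy concentrates in the low band when motion is small.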
Homework Reading assignment: Sec. 9.3, 11.1. Written assignment: Prob. 9.13, 11.3, 11.4. Computer assignment: Prob. 9.11, 9.12. Optional: 9.15.
The End