INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO


ISO/IEC JTC1/SC29/WG11 MPEG2002/M8672
Klagenfurt, July 2002

Title: Spatio-Temporal Scalability in DCT-based Hybrid Video Coders
Source: Lukasz Blaszak, Marek Domanski, Adam Luczak, Slawomir Mackowiak, Poznan University of Technology, Poznan, Poland
Contact: M. Domanski (domanski@et.put.poznan.pl)
Group: MPEG-4
Subgroup: Video
Purpose: Proposal

1. Introduction

MPEG-4 provides Fine Granularity Scalability (FGS) for precise matching of the bitstream to the channel capacity. The lack of motion-compensated temporal prediction in the enhancement layer ensures that no accumulating error occurs when only a portion of the enhancement bitstream is received by the decoder. Moreover, the bitrate of the received part of the bitstream can be controlled seamlessly in a relatively easy-to-implement system. Unfortunately, the MPEG-4 FGS coders exhibit a significant scalability overhead as compared to the respective single-layer (non-scalable) coders.

Current research activities in scalability fall into two major groups of approaches [1]: wavelet-based techniques and improvements of hybrid transform coders. The difficulty with the first approach lies in the necessity of embedding motion-compensated prediction into wavelet-based systems; nevertheless, the 3-D wavelet approach seems very promising, as it provides inherently scalable systems. The second approach [2-8] mostly exploits multi-loop systems. Here, we follow the latter approach; similar approaches are described in [3,8].

Prospective applications of scalable video coding include, e.g., wireless transmission. A practical requirement is that the base-layer bitrate be lower than the enhancement-layer bitrate. In order to meet this requirement in two-layer systems, it has been proposed to combine spatial scalability with other types of scalability, thus reducing the base-layer bitrate. Exemplary solutions are: the combination of spatial and SNR scalability [e.g. 8], and the combination of spatial and temporal scalability, called spatio-temporal scalability [6,9], in which the base layer represents a video sequence with both temporal and spatial resolution reduced. This contribution exploits the latter approach. The features of this proposal are:
- independent motion estimation and compensation in both loops,
- mixed spatio-temporal scalability,
- modified coding of B-frames.

In this contribution, a scalable video coder is proposed that consists of standard MPEG-4 building blocks. Moreover, the bitstream syntax is fully standard in the base layer and needs only one modified macroblock-related field in the enhancement layers.

2. Coder structure

The proposed scalable coder consists of two (or more) motion-compensated coders that encode a video sequence and produce two bitstreams corresponding to two different levels of both spatial and temporal resolution (Fig. 1). Each of the coders has its own prediction loop with its own motion estimation. Experiments have shown that the optimum motion vectors are different at different spatio-temporal resolutions (Fig. 2). The experimental data suggest that the bitrate needed for the additional motion vectors is often well compensated by the decrease in the number of bits spent on the transform coefficients of the prediction error [5].

Fig. 1. General structure: the high-resolution and low-resolution encoders (each with DCT, quantizer Q, dequantizer, inverse DCT, frame memory FS, motion-compensated predictor MC, motion estimator ME, and variable-length encoder VLC) produce the high- and low-resolution bitstreams together with their motion vectors mv_h and mv_l; the input of the low-resolution encoder comes from a spatial decimator, and a spatial interpolator provides the upsampled base-layer data to the high-resolution encoder.

Fig. 2. Typical histograms of rounded differences between the base- and enhancement-layer motion vectors; the two plots correspond to the two vector components (MVx range -10 to 10).

Fine granularity may be obtained by data partitioning in the enhancement layer. Various schemes may be applied: successive transmission of bitplanes from the individual blocks, or successive transmission of the consecutive nonzero transform coefficients, i.e., the (run, level) pairs. In that way, the bitstream fed into a channel can be well matched to the available throughput. It also means that the decoding process may exploit only a part of one bitstream and thus suffers from drift. Only one of the bitstreams is ever split, usually the medium- or high-resolution one; therefore only one of the received bitstreams is affected by drift.
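To make the data-partitioning option above concrete, the following Python sketch truncates each enhancement-layer block, represented as a zig-zag ordered list of (run, level) pairs, to its first N nonzero coefficients. The block representation and the function names are illustrative assumptions of this note, not part of the proposed syntax.

    # Minimal sketch of enhancement-layer data partitioning: keep only the first
    # n_coeff nonzero transform coefficients, i.e. (run, level) pairs, per block.
    # Transmitting such a prefix matches the bitstream to the available channel
    # throughput at the cost of drift (cf. Fig. 12).
    from typing import List, Tuple

    RunLevel = Tuple[int, int]  # (zero run, quantized level) in zig-zag order

    def partition_block(pairs: List[RunLevel], n_coeff: int) -> List[RunLevel]:
        """Keep only the first n_coeff nonzero coefficients of one block."""
        return pairs[:n_coeff]

    def partition_frame(blocks: List[List[RunLevel]], n_coeff: int) -> List[List[RunLevel]]:
        """Apply the same truncation to every block of an enhancement-layer frame."""
        return [partition_block(block, n_coeff) for block in blocks]

    # Example: transmit at most 3 nonzero coefficients per block.
    enh_blocks = [[(0, 12), (1, -3), (0, 2), (4, 1)], [(2, 5), (0, -1)]]
    print(partition_frame(enh_blocks, 3))
    # [[(0, 12), (1, -3), (0, 2)], [(2, 5), (0, -1)]]

The decoder then reconstructs the affected layer from whatever prefix it receives, which is the source of the drift discussed above and quantified later in Fig. 12.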

3. Video sequence structure

The phenomenon of drift is related to reconstruction errors that accumulate while consecutive frames are decoded. Insertion of intra-coded frames therefore bounds the propagation of drift errors to groups of pictures (GOPs) (Fig. 3). In the enhancement layer, the additional I-frames have the bitstream syntax of P-frames with no motion vectors: all macroblocks in such a frame can be predicted from the interpolated low-resolution frames of the base layer.

Fig. 3. Exemplary structure of a video sequence: there exist GOP structures in the enhancement layer where the I-frames are encoded with respect to the interpolated I- or P-frames from the base layer.

Moreover, a higher percentage of B-frames also causes drift to accumulate more slowly. For sequences with B-frames, quite efficient temporal decimation can be used, i.e., skipping some or all B-frames. For higher percentages of B-frames in a sequence, two types of B-frames exist: BE-frames, processed by the high-resolution coder only, and BR-frames, processed by both coders like I- and P-frames. An exemplary but typical enhancement-layer GOP structure is as follows: I BE BR BE P BE BR BE P BE BR BE P BE BR BE. The full-resolution BR-frames may be used as reference frames for temporal prediction in the enhancement layer. Such a video sequence structure corresponds to a relatively low propagation of drift errors.

Fig. 4. Exemplary structure of a video sequence: the number of B-frames is 75% of the total number of frames. The BR-frames are used as reference frames for the BE-frames in the enhancement layer.
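The following Python sketch illustrates how such a frame pattern is divided between the two coders; the list literal mirrors the exemplary GOP above, and the helper names are ours, not part of the proposal.

    # Illustrative split of a display-order frame-type pattern between the two
    # coders: BE-frames are processed by the high-resolution coder only, while
    # I-, P- and BR-frames are processed by both coders, so skipping the
    # BE-frames performs the temporal decimation of the base layer (cf. Fig. 4).
    ENHANCEMENT_GOP = ["I", "BE", "BR", "BE", "P", "BE", "BR", "BE",
                       "P", "BE", "BR", "BE", "P", "BE", "BR", "BE"]

    def base_layer_frames(pattern):
        """Frame types passed to the low-resolution coder (BE-frames skipped)."""
        return [t for t in pattern if t != "BE"]

    print(base_layer_frames(ENHANCEMENT_GOP))
    # ['I', 'BR', 'P', 'BR', 'P', 'BR', 'P', 'BR']  -> temporal decimation by 2
    print(sum(t.startswith("B") for t in ENHANCEMENT_GOP) / len(ENHANCEMENT_GOP))
    # 0.75 -> 75% B-frames, as stated in Fig. 4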

4. B-frame encoding

Improved prediction is used for the BR-frames, i.e., the B-frames represented in both layers. Each macroblock in a full-resolution BR-frame can be predicted from the following reference frames (Fig. 5):
- the previous reference frame (I- or P-frame),
- the next reference frame (I- or P-frame),
- the current reference frame (the co-located BR-frame from the base layer).

Fig. 5. Improved prediction of B-frames: BE- and BR-frames in the enhancement layer, with the current BR-frame of the base layer interpolated to full resolution.

The data from the previous and next reference frames are motion-compensated, and the data from the current reference frame are upsampled in the two-dimensional spatial domain. The best-suited reference frame, or an average of two or three reference frames, is chosen according to the criterion of the smallest prediction error. The improvement lies in the decision strategy: the best prediction/interpolation is chosen from all three possible reference frames, i.e., previous, future, and interpolated (Fig. 5). Experimental results show that the current reference frame, i.e., the low-resolution BR-frame, is used for a significant portion of macroblocks, sometimes even for more than 50% of all macroblocks in BR-frames (Fig. 6). This prediction scheme yields up to 20% lower bitrates for the enhancement-layer B-frames as compared to a scheme in which the choice is made only between spatial interpolation and the best temporal prediction.

Fig. 6. Exemplary but typical percentages of macroblocks predicted using the individual reference frames or their averages.

5. Bitstream structure

The base-layer bitstream is a standard single-layer MPEG bitstream. In this contribution, we propose to encode the macroblock prediction type in the enhancement (high-resolution) layer with variable-length codes. These codes could replace the COD flag in the macroblock headers. For P-frames, the estimated codes would be as in Table 1. The codes have been obtained from statistics measured for three test CIF sequences at two bitrates.

Table 1. Exemplary codes for the macroblock prediction type in P-macroblock headers.

Code   | Reference    | Data encoding (DCT coefficients and/or MVs)
1      | temporal     | yes
00     | average      | yes
010    | temporal     | no
0110   | average      | no
01110  | interpolated | yes
01111  | interpolated | no

This field is the only modification of the bitstream syntax that is needed.
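As an illustration of how the prediction decision of Section 4 and the signalling of Table 1 fit together, the Python sketch below picks the reference with the smallest prediction error and looks up the corresponding code word. The SAD cost values and all function names are assumptions of this example, not normative.

    # Sketch of the enhancement-layer macroblock-type decision and its
    # signalling with the exemplary variable-length codes of Table 1.
    MB_TYPE_VLC = {                      # (reference, data coded) -> code word
        ("temporal",     True):  "1",
        ("average",      True):  "00",
        ("temporal",     False): "010",
        ("average",      False): "0110",
        ("interpolated", True):  "01110",
        ("interpolated", False): "01111",
    }

    def choose_reference(sad_temporal, sad_interpolated, sad_average):
        """Pick the prediction with the smallest error (criterion of Section 4)."""
        costs = {"temporal": sad_temporal,
                 "interpolated": sad_interpolated,
                 "average": sad_average}
        return min(costs, key=costs.get)

    def mb_type_code(reference, residual_coded):
        """Code word that would replace the COD flag in the MB header."""
        return MB_TYPE_VLC[(reference, residual_coded)]

    # Example: temporal prediction wins and leaves a nonzero residual.
    ref = choose_reference(sad_temporal=420, sad_interpolated=610, sad_average=455)
    print(ref, mb_type_code(ref, residual_coded=True))   # temporal 1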

6. Experimental results

The structure has been tested with several reference platforms over a wide range of bitrates. The results obtained in the range of a few megabits per second are quite promising, as the bitrate overhead due to scalability is mostly below 10% with an MPEG-2 reference encoder. In these experiments, we used sequences with the structure shown in Fig. 4.

Fig. 7. Coding efficiency (luminance SNR versus bitrate) for the two-loop structures (temporal subsampling factor = 2, image format 4CIF, 50 fps): scalable coders compared to the respective MPEG-2 single-layer coders, together with the individual low- and high-resolution layer results. Test sequences: BUS, CHEER, FLOWER GARDEN.

For the H.263 reference codec, the bitrate overhead due to scalability has been measured relative to the non-scalable bitstream. This relative overhead depends strongly on the coding options switched on, as well as on the quality of motion estimation. For example, the results for full-pel motion estimation sometimes exhibit even negative overhead in constant-bitrate mode with both layers controlled independently (Fig. 8). Similar phenomena are also described in reference [10]. Improving the temporal prediction results in less intensive use of the reference frames interpolated from the base layer (Fig. 9); the bitrate overhead then increases, as shown in Table 2.

Fig. 8. Experimental results for the H.263 reference coder (full-pel resolution, motion vector search range ±16, independent bitrate control for both layers, 4 MVs per macroblock enabled). No B-frames used. Test sequences: FOOTBALL, BASKET, CHEER (single-layer and scalable results, together with the individual low- and high-resolution layer results).
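The quality measure reported throughout this section is the average luminance SNR in dB. The text does not spell out its exact definition; the sketch below assumes the usual peak-signal-based formula computed on the luminance component, which is only our reading, and the array names are illustrative.

    # Assumed luminance SNR (PSNR-style) computation for 8-bit video frames.
    import numpy as np

    def luminance_snr_db(orig_y: np.ndarray, recon_y: np.ndarray) -> float:
        """10*log10(255^2 / MSE) between original and reconstructed luminance."""
        mse = np.mean((orig_y.astype(np.float64) - recon_y.astype(np.float64)) ** 2)
        return float(10.0 * np.log10(255.0 ** 2 / mse)) if mse > 0 else float("inf")

    # Example with synthetic 4CIF-sized luminance frames (704x576 samples).
    rng = np.random.default_rng(0)
    orig = rng.integers(0, 256, size=(576, 704), dtype=np.uint8)
    noisy = np.clip(orig.astype(np.int16) + rng.integers(-2, 3, size=orig.shape), 0, 255).astype(np.uint8)
    print(round(luminance_snr_db(orig, noisy), 2))

A per-sequence figure such as those in Table 2 would then be the average of this value over the decoded frames.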

Fig. 9. Frequency of the individual prediction modes (Skip, Skip_interpolated, Skip_average, Interpolation_no_MV, Interpolation_MV, No_interpolation) for P-macroblocks, measured for one-pel and half-pel motion estimation and compensation (test sequence Funfair, about 400 kbps).

Table 2. Experimental results for half-pel motion estimation and compensation. No B-frames used. H.263-based coder, CIF (352 x 288) sequences.

Coder                | Measure                                               | Football | Basket
Single-layer (H.263) | Bitstream [kbps]                                      | 402.10   | 0.40
                     | Average luminance SNR [dB]                            | 31.61    | 25.63
Proposed scalable    | Low-resolution layer bitstream [kbps]                 | 109.91   | 100.31
                     | Low-resolution layer average luminance SNR [dB]       | .10      | 24.71
                     | High-resolution layer bitstream [kbps]                | 356.98   | 9.
                     | Average luminance SNR [dB] recovered from both layers | 31.67    | 25.63
                     | Bitstream overhead [%]                                | 16.11    | 26.21
Single-layer (H.263) | Bitstream [kbps]                                      | 651.74   | 588.66
                     | Average luminance SNR [dB]                            | .76      | .13
Proposed scalable    | Low-resolution layer bitstream [kbps]                 | 186.89   | 194.52
                     | Low-resolution layer average luminance SNR [dB]       | .78      | 27.
                     | High-resolution layer bitstream [kbps]                | 566.51   | 584.50
                     | Average luminance SNR [dB] recovered from both layers | .69      | .19
                     | Bitstream overhead [%]                                | 15.60    | .33
Single-layer (H.263) | Bitstream [kbps]                                      | 925.46   | 913.13
                     | Average luminance SNR [dB]                            | 36.70    | .15
Proposed scalable    | Low-resolution layer bitstream [kbps]                 | 262.95   | 9.56
                     | Low-resolution layer average luminance SNR [dB]       | .61      | 29.23
                     | High-resolution layer bitstream [kbps]                | 815.45   | 878.41
                     | Average luminance SNR [dB] recovered from both layers | 36.74    | .12
                     | Bitstream overhead [%]                                | 16.53    | 27.91
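For reference, the bitstream overhead reported in Table 2 is consistent with computing the extra bitrate of the two layers relative to the single-layer coder. The short Python sketch below (the function name is ours) reproduces the 16.11% Football figure from the first group of rows.

    # Relative scalability overhead as used in Table 2: extra bitrate of the
    # two-layer coder (base + enhancement) with respect to the single-layer coder.
    def scalability_overhead(base_kbps, enh_kbps, single_kbps):
        """Bitstream overhead in percent."""
        return 100.0 * (base_kbps + enh_kbps - single_kbps) / single_kbps

    # Football, first rate point of Table 2: 109.91 + 356.98 kbps vs. 402.10 kbps.
    print(round(scalability_overhead(109.91, 356.98, 402.10), 2))   # 16.11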

The TML-9.4 (H.26L) coder uses more efficient temporal prediction than H.263. This results in a higher scalability overhead, mostly not much lower than that of simulcast (Fig. 10). The experiments were made with the default options of TML-9.4 for the CIF test sequences.

Fig. 10. A comparison between non-scalable and scalable TML-9.4-based coders: total bitrate and scalability overhead for the test sequences bus, fun, cheer, football, and basket. Results obtained for progressive CIF sequences encoded with no B-frames.

For all the cases considered above, the base layer took up to about 40% of the total bitrate, with all three decimation factors (temporal, horizontal, and vertical) set to 2. Exemplary results for fine-granularity scalability are shown in Figs. 11 and 12.

Fig. 11. Bitrate control using the proposed FGS with a GOP length of 12.

Fig. 12. Decrease of the signal-to-noise ratio due to drift for various numbers of DCT coefficients per block transmitted to the decoder in the enhancement layer (from all coefficients down to a single coefficient), plotted over the 12 frames of a GOP. The two-loop coder and the sequence structure from Fig. 4. Test sequence Funfair with an average bitrate of 5 Mbps.

7. Conclusions

Described is a generic multi-loop coder structure for motion-compensated fine-granularity scalability. The basic features of the coder are:
- mixed spatio-temporal scalability,
- independent motion estimation for each motion-compensation loop, i.e., for each spatio-temporal resolution layer,
- the BR/BE-frame structure,
- improved prediction of BR-frames.

In most experiments, the loss of quality due to fine granularity is not dramatic. In many applications the reduced bitrate corresponds to some perturbation, and therefore a certain loss of coding performance can be acceptable as long as the whole coder performs well. Drift can be reduced by dividing a particular layer bitstream into GOPs. Unfortunately, for TML-9.4 the coding performance is only slightly better than that of simulcast; on the other hand, the encoding time is not much higher than for simulcast. Some experiments show that coding efficiency can be improved by smoothing the motion field and by joint encoding of the motion vectors from both resolution levels (layers).

References

[1] J.-R. Ohm, M. Beermann, Status of scalable technology in video coding, Doc. ISO/IEC JTC1/SC29/WG11 MPEG01/M7483, Sydney, July 2001.
[2] J.-R. Ohm, M. van der Schaar, Scalable Video Coding, tutorial material, Int. Conf. Image Processing ICIP 2001, IEEE, Thessaloniki, October 2001.
[3] Y. He, R. Yan, F. Wu, S. Li, H.26L-based fine granularity scalable video coding, Doc. ISO/IEC JTC1/SC29/WG11 MPEG01/M7788, December 2001.
[4] K. Rose, S. Regunathan, Toward optimality in scalable predictive coding, IEEE Trans. Image Processing, vol. 10, pp. 965-976, July 2001.
[5] M. Domański, S. Maćkowiak, Modified MPEG-2 video coders with efficient multi-layer scalability, Int. Conf. Image Processing ICIP 2001, IEEE, pp. 1033-1036, Thessaloniki, October 2001.
[6] M. Domański, A. Łuczak, S. Maćkowiak, Spatio-temporal scalability for MPEG video coding, IEEE Trans. Circuits and Systems for Video Technology, vol. 10, pp. 1088-1093, October 2000.
[7] A. Reibman, L. Bottou, A. Basso, DCT-based scalable video coding with drift, Int. Conf. Image Processing ICIP 2001, IEEE, pp. 989-992, Thessaloniki, October 2001.
[8] U. Benzler, Spatial scalable video coding using a combined subband-DCT approach, IEEE Trans. Circuits and Systems for Video Technology, vol. 10, pp. 1080-1087, October 2000.
[9] M. Domański, A. Łuczak, S. Maćkowiak, R. Świerczyński, Hybrid coding of video with spatio-temporal scalability using subband decomposition, in Signal Processing IX: Theories and Applications, Proc. EUSIPCO-98, pp. 53-56, Rhodes, September 1998.
[10] G. Cote, B. Erol, M. Gallant, F. Kossentini, H.263+: video coding at low bit rates, IEEE Trans. Circuits and Systems for Video Technology, vol. 8, pp. 849-865, November 1998.