AVC VIDEO CODERS WITH SPATIAL AND TEMPORAL SCALABILITY
Marek Domański, Łukasz Błaszak, Sławomir Maćkowiak

Poznań University of Technology, Institute of Electronics and Telecommunication, Poznań, Poland
E-mail: {domanski, lblaszak, smack}@et.put.poznan.pl

ABSTRACT

The paper describes a scalable extension of the AVC coder. The aim is to modify the bitstream semantics and syntax as little as possible and to avoid, as far as possible, technologies that are not present in the existing structure of the AVC codec. The coder combines spatial with temporal scalability. It consists of two motion-compensated sub-coders that encode a video sequence and produce two bitstreams corresponding to two different levels of spatial and temporal resolution. Each of the sub-coders has its own prediction loop with independent motion estimation. The system employs adaptive interpolation. The interpolation-dependent modes are carefully embedded into the mode hierarchy of the AVC coder, so that their codes correspond to the mode probabilities.

1. INTRODUCTION

Recently, the JVT Committee has prepared Version 1 of the new video coding standard called AVC [1]. This standard, also called H.264, defines improved hybrid video coding tools. The main features that improve coding efficiency are the following: flexible size of rectangular blocks for motion-compensated prediction, advanced intraframe prediction, flexible choice of prediction modes, and a multiframe memory in the motion-compensated predictor. The AVC codec exhibits a substantial efficiency improvement over the H.263+ and MPEG-4 natural video codecs.

The AVC Version 1 video codec does not support scalability, which is currently considered an important functionality for many applications, e.g., wireless systems with bandwidth variations and fading, video broadcasting in heterogeneous communication networks, unequal error protection, etc. [3,4]

The goal of the paper is to describe a scalable extension of the AVC coder. The aim is to modify the bitstream semantics and syntax as little as possible and to avoid, as far as possible, technologies that are not present in the existing structure of the AVC codec. Such an approach limits the implementation costs of scalability.

Currently, two major candidates are considered for future standard scalable video coders [5]:
- a modified hybrid video coder with motion-compensated prediction and block-based transforms,
- a wavelet-based video coder, with special emphasis on 3-D wavelet video coders with motion-compensated filter banks.
Various combinations of the two approaches are also considered. Here, the first approach is adopted because it requires neither deep modifications of the AVC bitstream syntax nor a major change of the AVC codec structure. In the context of H.26L (the earlier version of the AVC coder), a similar approach was already exploited, as described in [2]. Nevertheless, the approach from [2] employed a different coder structure with common motion estimation, which resulted in worse motion compensation.

2. CODER STRUCTURE

This paper deals with coders that combine spatial and temporal scalability. In such a case, the base layer represents a video sequence with both temporal and spatial resolution reduced. Such a combination is very practical as it allows a low-bitrate base layer with reasonable spatial resolution, e.g. CIF/SIF for standard television input. Our scalable coder adopts the structure already proposed for MPEG-2 and H.263 codecs [6-8]. It consists of two motion-compensated sub-coders (Fig. 1) that produce two bitstreams corresponding to two different levels of spatial and temporal resolution. Each of the sub-coders has its own prediction loop with independent motion estimation.
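
The spatio-temporal subsampling that feeds the low-resolution sub-coder can be illustrated with a minimal sketch. It is not the authors' implementation: the [1, 2, 1]/4 lowpass kernel, the plain every-second-frame skipping and the function names `lowpass_1d`, `decimate_2to1` and `base_layer_input` are all assumptions chosen for illustration, and frames are modelled as lists of rows of luminance samples.

```python
def lowpass_1d(samples):
    """Filter a 1-D list with an assumed [1, 2, 1]/4 kernel, repeating edge samples."""
    n = len(samples)
    out = []
    for i in range(n):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        out.append((left + 2 * samples[i] + right) / 4.0)
    return out


def decimate_2to1(frame):
    """Lowpass-filter a frame (list of rows) and keep every second sample in both directions."""
    rows = [lowpass_1d(row) for row in frame]              # horizontal pass
    cols = [lowpass_1d(list(col)) for col in zip(*rows)]   # vertical pass, column by column
    filtered = [list(row) for row in zip(*cols)]           # back to row-major order
    return [row[::2] for row in filtered[::2]]


def base_layer_input(frames, temporal_factor=2):
    """Spatio-temporal subsampling: skip frames, then decimate the frames that remain."""
    return [decimate_2to1(frame) for frame in frames[::temporal_factor]]


if __name__ == "__main__":
    # Four tiny 6x8 test frames; frame t is filled with the constant value t.
    frames = [[[float(t)] * 8 for _ in range(6)] for t in range(4)]
    base = base_layer_input(frames)
    print(len(base), "base-layer frames of size", len(base[0]), "x", len(base[0][0]))
```

The full-resolution sequence goes unchanged to the high-resolution sub-coder, while the output of `base_layer_input` would be the input of the low-resolution sub-coder.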

[Figure: block diagram. Blocks: input; spatial and temporal subsampling; high-resolution coder with motion estimation and motion compensation; spatial interpolation; low-resolution coder with motion estimation and motion compensation. Outputs: transform coefficients and mv_h in the high-resolution bitstream, transform coefficients and mv_l in the low-resolution bitstream.]
Fig. 1. General structure of the scalable coder considered. mv_l and mv_h denote motion vectors from the low-resolution and the high-resolution layer, respectively.

The low-resolution sub-coder is implemented as a standard motion-compensated hybrid AVC coder that produces a bitstream with fully standard AVC syntax. The high-resolution sub-coder is a modified AVC coder that is able to exploit the interpolated macroblocks from the decoded base-layer bitstream. These interpolated macroblocks are used as reference macroblocks for prediction whenever they provide lower cost. Additional reference macroblocks are created by averaging the temporal-prediction reference and the interpolated macroblock.

3. SPATIO-TEMPORAL DECOMPOSITION

Good performance of spatio-temporal down- and upsampling is critical for good performance of the whole codec. Spatial decimation includes spatial lowpass filtering that prevents spatial aliasing in the base-layer low-resolution sequence. The choice of the filter is a trade-off between high aliasing attenuation and short temporal response. Experimental comparisons confirm the importance of a careful choice of the decimation-interpolation scheme.

The system considered employs edge-adaptive bicubic interpolation as described in [9]. The technique is applicable to both luminance and chrominance. It extends the standard non-adaptive separable bicubic interpolation, which can be described as follows. The two-dimensional interpolation is performed in two steps: horizontal and vertical.
Let $f(x)$ be the value to be interpolated, with the nearest available values located at coordinates $x_k$ (left) and $x_{k+1}$ (right). Let $s = x - x_k$ and $1 - s = x_{k+1} - x$, where $0 \le s \le 1$. Then

$$f(x) = f(x_{k-1})\,\frac{-s^3 + 2s^2 - s}{2} + f(x_k)\,\frac{3s^3 - 5s^2 + 2}{2} + f(x_{k+1})\,\frac{-3s^3 + 4s^2 + s}{2} + f(x_{k+2})\,\frac{s^3 - s^2}{2},$$

where $x_{k-1}$, $x_k$, $x_{k+1}$ and $x_{k+2}$ are the positions of four neighboring known pixels.

In the edge-adaptive scheme, a modified value $s'$ is used instead of $s$:

$$s' = s - kAs(s - 1),$$

where $k$ is a positive parameter that controls the intensity of warping and $A$ is a function of asymmetry of the data in the neighborhood of $x$:

$$A = \frac{|f(x_{k+1}) - f(x_{k-1})| - |f(x_{k+2}) - f(x_k)|}{L - 1},$$

where $L = 256$ for 8-bit sample representation. In the experiments, $k = 3.05$ was used.

In the simplest case, temporal downsampling is performed via frame skipping. In particular, B-frame skipping constitutes a very efficient and robust downsampling scheme (Fig. 2). Therefore, two types of B-frames may exist:
- BE-frames that exist in the enhancement layer only,
- BR-frames that exist both in the base layer and in the enhancement layer.

[Figure: three example structures, a) to c), each showing the high-resolution and low-resolution sequences with BE- and BR-frames marked.]
Fig. 2. Exemplary structures of low-resolution and high-resolution video sequences with temporal subsampling by factor 2.
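
The interpolation formulas of this section can be evaluated for one dimension with a short sketch. It follows the cubic weights, the asymmetry measure A and the warped distance as written above, with k = 3.05 and L = 256 as defaults. The function names are hypothetical, and the sketch makes no attempt to reproduce boundary handling or the fixed-point arithmetic of a real codec.

```python
def cubic_weights(s):
    """Weights applied to f(x_{k-1}), f(x_k), f(x_{k+1}), f(x_{k+2}) for a distance s."""
    return (
        (-s**3 + 2 * s**2 - s) / 2,
        (3 * s**3 - 5 * s**2 + 2) / 2,
        (-3 * s**3 + 4 * s**2 + s) / 2,
        (s**3 - s**2) / 2,
    )


def asymmetry(f_km1, f_k, f_kp1, f_kp2, levels=256):
    """Asymmetry A of the data around the interpolated position (L = levels)."""
    return (abs(f_kp1 - f_km1) - abs(f_kp2 - f_k)) / (levels - 1)


def warped_distance(s, a, k=3.05):
    """Modified distance s' = s - k*A*s*(s - 1) used by the edge-adaptive scheme."""
    return s - k * a * s * (s - 1)


def interpolate(f_km1, f_k, f_kp1, f_kp2, s, adaptive=True, k=3.05):
    """Interpolate between f_k and f_kp1 at distance s from x_k."""
    if adaptive:
        s = warped_distance(s, asymmetry(f_km1, f_k, f_kp1, f_kp2), k)
    w = cubic_weights(s)
    return w[0] * f_km1 + w[1] * f_k + w[2] * f_kp1 + w[3] * f_kp2


if __name__ == "__main__":
    # On a linear ramp A is zero, so adaptive and non-adaptive results coincide.
    print(interpolate(10, 20, 30, 40, 0.5))
    # Near an edge the distance is warped and the two results differ.
    print(interpolate(0, 10, 100, 100, 0.5, adaptive=False))
    print(interpolate(0, 10, 100, 100, 0.5, adaptive=True))
```

For a two-dimensional frame the same routine would be applied in two passes, first along rows and then along columns, as the text describes.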

4. REFERENCE FRAMES

In the enhancement layer, the coding scheme takes advantage of two additional reference frames:
- the frame interpolated from the decoded current base-layer low-resolution frame,
- an average of the latter and the last temporal reference frame.
Even more reference frames could be obtained as combinations of the interpolated frame and various temporal references. Nevertheless, those possibilities have not been exploited in the experiments reported in Section 6. Moreover, for the averaged reference frame, independent motion estimation can be performed in order to find the motion vectors that yield the minimum prediction error for the reference being an average of spatial and temporal references. This option was used in the experiments reported in Section 6. The additional reference frames require no bitstream syntax modifications, only minor modifications of the semantics of the reference frame variables.

5. PREDICTION MODE SELECTION

Sophisticated intra- and interframe predictions account for major performance improvements in the AVC coders. The enhancement-layer sub-coder employs additional prediction modes that exploit the current interpolated base-layer frame as the reference. Other modes exploit averages of temporal prediction and spatial interpolation as references. These modes are carefully embedded into the mode hierarchy of the AVC coder, so that their binary codes correspond to the mode probabilities. The respective mode hierarchy is shown in Table 1.

The choice of the lowest-cost prediction mode plays the key role. The encoding scheme would reduce to simulcast if no interpolated reference macroblocks were used in the enhancement layer. At the other extreme, no temporal prediction is used in the enhancement layer, and only interpolated base-layer frames are used for prediction of the enhancement-layer macroblocks (as in MPEG-4 FGS).
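
The choice among the temporal reference, the interpolated base-layer reference and their average can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the cost measure, so the sum of absolute differences (SAD) stands in for it here, motion search is left out, and the names `sad`, `average_blocks` and `choose_reference` are hypothetical. Blocks are lists of rows of sample values.

```python
def sad(block, reference):
    """Sum of absolute differences between a block and a candidate reference block."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block, reference)
        for a, b in zip(row_a, row_b)
    )


def average_blocks(p, q):
    """Sample-wise average of two equally sized blocks."""
    return [[(a + b) / 2 for a, b in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]


def choose_reference(current, temporal_pred, interpolated_base):
    """Return the name of the lowest-cost reference and the cost of every candidate.

    SAD is an assumed stand-in for the encoder's actual cost measure.
    """
    candidates = {
        "temporal": temporal_pred,
        "interpolated": interpolated_base,
        "average": average_blocks(temporal_pred, interpolated_base),
    }
    costs = {name: sad(current, ref) for name, ref in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    current = [[10, 12], [14, 16]]
    temporal = [[10, 12], [14, 20]]      # too bright in one sample
    interpolated = [[10, 12], [14, 12]]  # too dark in the same sample
    print(choose_reference(current, temporal, interpolated))
```

In this toy case the errors of the two references cancel, so the averaged reference has the lowest cost. If only the "temporal" candidate were offered, the scheme would degenerate to simulcast, which is the first extreme described above.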
Exclusive use of interpolated base-layer references is very unlikely because of the high efficiency of the AVC temporal prediction. Nevertheless, both extreme situations lead to unsatisfactory coding performance, and the spatial interpolation must be very efficient in order to avoid them. Good fidelity of the decimation-interpolation scheme gives a reasonable probability that the reference sample block interpolated from the base layer leads to a smaller prediction error than the temporal prediction within the enhancement layer.

Table 1. Prediction mode hierarchy

Frame type   Prediction modes
Intra (I)    1. Spatial interpolation from base layer (16×16 block size).
             2. All standard intra prediction modes.
Inter (P)    1. Prediction (forward) from the nearest reference frame.
             2. Spatial interpolation from base layer (16×16 to 4×4 block size).
             3. Average of the two above (1, 2).
             4. Temporal prediction modes from other reference frames in the order defined in the AVC specification.
             5. All standard intra modes.
Inter (B)    1. Prediction (forward, backward and bidirectional) from the nearest reference frame.
             2. Spatial interpolation from base layer (16×16 to 4×4 block size).
             3. Average of the two above (1, 2).
             4. Temporal prediction modes from other reference frames in the order defined in the AVC specification.
             5. All standard intra modes.

6. EXPERIMENTAL RESULTS

The scalable test model has been implemented on top of the standard JVT software version 2.1. Both coder and decoder have been implemented. In order to test the coding performance of the scalable AVC codec, a series of experiments has been performed with (352×288)-pixel sequences. In the experiments, the following options were switched on:
- CABAC coder,
- ¼-pel motion estimation in both layers,
- all prediction modes.
The experiments have been performed for three sets of quantization parameter values. These values were defined independently for I-frames (Q_I), P-frames (Q_P) and B-frames (Q_B). In the tests, equal values of Q_I, Q_P and Q_B were applied in the base layer and the enhancement layer, respectively.

Table 2. Coding efficiency comparison for scalable, nonscalable and simulcast coding. Each entry gives luminance PSNR [dB] / bitrate [kbps].

Test sequence               Bus              Cheer            Football         Fun              Basket
Original frame rate [fps]   30               30               30               25               25

Q_I = 10, Q_P = 11, Q_B = 12
Base layer                  37.17 / 502.11   38.60 / 279.35   37.72 / 751.80   39.76 / 354.37   37.17 / 502.11
Enhancement layer           38.06 / 2215.86  39.06 / 1533.10  38.56 / 3030.11  40.75 / 1388.49  38.06 / 2215.86
Whole scalable codec        38.06 / 2717.97  39.06 / 1812.45  38.56 / 3781.91  40.75 / 1742.86  38.06 / 2717.97
Simulcast                   38.02 / 3039.57  39.05 / 1939.98  38.54 / 4384.43  40.75 / 2081.16  38.02 / 3039.57
Nonscalable codec           38.02 / 2537.46  39.05 / 1660.63  38.54 / 3632.63  40.75 / 1726.79  38.02 / 2537.46

Q_I = 15, Q_P = 16, Q_B = 17
Base layer                  32.96 / 286.13   34.65 / 151.94   33.51 / 457.17   36.20 / 200.74   32.96 / 286.13
Enhancement layer           34.08 / 1189.34  35.14 / 816.94   34.51 / 1702.18  37.31 / 759.71   34.08 / 1189.34
Whole scalable codec        34.08 / 1475.47  35.14 / 968.88   34.51 / 2159.35  37.31 / 960.45   34.08 / 1475.47
Simulcast                   34.08 / 1668.77  35.16 / 1036.59  34.55 / 2559.36  37.45 / 1165.98  34.08 / 1668.77
Nonscalable codec           34.08 / 1382.64  35.16 / 884.65   34.55 / 2102.19  37.45 / 965.24   34.08 / 1382.64

Q_I = 20, Q_P = 21, Q_B = 22
Base layer                  29.06 / 156.33   31.03 / 77.85    29.65 / 261.20   33.12 / 109.05   29.06 / 156.33
Enhancement layer           30.28 / 645.36   31.52 / 434.19   30.76 / 955.72   34.17 / 420.95   30.28 / 645.36
Whole scalable codec        30.28 / 801.69   31.52 / 512.04   30.76 / 1216.92  34.17 / 530.00   30.28 / 801.69
Simulcast                   30.33 / 913.55   31.63 / 544.96   30.89 / 1466.69  34.39 / 651.76   30.33 / 913.55
Nonscalable codec           30.33 / 757.22   31.63 / 467.11   30.89 / 1205.49  34.39 / 542.71   30.33 / 757.22
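
The overhead and base-layer share figures discussed in the text can be recomputed from Table 2 with a small sketch. The bitrates below are copied from the first quantization set (Q_I = 10, Q_P = 11, Q_B = 12) for four of the sequences. The dictionary layout and the function names are hypothetical and serve only this illustration.

```python
# Bitrates from Table 2, first quantization set (Q_I = 10, Q_P = 11, Q_B = 12).
BITRATES = {
    "Bus":      {"base": 502.11, "scalable": 2717.97, "simulcast": 3039.57, "nonscalable": 2537.46},
    "Cheer":    {"base": 279.35, "scalable": 1812.45, "simulcast": 1939.98, "nonscalable": 1660.63},
    "Football": {"base": 751.80, "scalable": 3781.91, "simulcast": 4384.43, "nonscalable": 3632.63},
    "Fun":      {"base": 354.37, "scalable": 1742.86, "simulcast": 2081.16, "nonscalable": 1726.79},
}


def overhead_percent(row):
    """Bitrate overhead of the whole scalable codec relative to the nonscalable codec."""
    return 100.0 * (row["scalable"] / row["nonscalable"] - 1.0)


def base_share_percent(row):
    """Share of the base-layer bitrate in the total bitrate of the scalable codec."""
    return 100.0 * row["base"] / row["scalable"]


def saving_vs_simulcast_percent(row):
    """Bitrate saved by scalable coding relative to simulcast."""
    return 100.0 * (1.0 - row["scalable"] / row["simulcast"])


if __name__ == "__main__":
    for name, row in BITRATES.items():
        print(
            f"{name:9s} overhead {overhead_percent(row):5.1f}%  "
            f"base share {base_share_percent(row):5.1f}%  "
            f"saving vs simulcast {saving_vs_simulcast_percent(row):5.1f}%"
        )
```

For Bus, for example, the overhead comes out at roughly 7% and the base layer takes about 18% of the total bitrate. The simulcast bitrate in Table 2 equals the nonscalable bitrate plus the base-layer bitrate, so the saving relative to simulcast shows how much the enhancement layer gains from base-layer prediction.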

[Figure: three panels, one per quantization set (Q_I = 10, Q_P = 11, Q_B = 12; Q_I = 15, Q_P = 16, Q_B = 17; Q_I = 20, Q_P = 21, Q_B = 22). Each panel shows the bitrate of scalable, simulcast and nonscalable coding relative to nonscalable coding for the test sequences coastguard, dancer, hall_monitor, silent, singer, table, basket, bus, cheer, football, fun and mother_daughter.]
Fig. 3. Approximate bitrate comparison for scalable, nonscalable (single-layer) and simulcast coding.

In order to compare the scalable codec with the nonscalable reference AVC codec as well as with the simulcast pair of nonscalable AVC codecs, the experiments have been performed with constant values of Q_I, Q_P and Q_B, which imply almost constant quality measured in terms of PSNR for the luminance component in a given sequence. Of course, the quality measured for different sequences is different, but for a given video sequence and a given set of Q_I, Q_P and Q_B, the results for scalable, nonscalable and simulcast coding mostly differ by less than 0.3 dB and often by less than 0.1 dB. For such conditions, bitrates have been estimated for the scalable coder (whole scalable coder), the nonscalable coder and simulcast coding (Table 2 and Fig. 3). Under these test conditions, the approximate bitrate overhead due to scalability was between -1% and 30% of the bitrate of the nonscalable (single-layer) codec (Fig. 3). In almost all cases, the scalable coder performed better than simulcast coding, usually substantially better. Within the scalable coder, the base-layer bitrate was about 15% to 22% of the total bitrate produced by the scalable coder for both layers.

7. CONCLUSIONS

For the two-layer system with spatio-temporal scalability, the bitrate overhead due to scalability varies between -1% and 30% depending on sequence content and bitrate allocation (Fig. 3). In almost all cases, the scalable coder performed better than simulcast coding, usually substantially better. Within the scalable coder, the base-layer bitrate was about 15% to 22% of the total bitrate produced by the scalable coder for both layers.

The paper describes an extension of the AVC coder structure. The major features of the presented solution are:
- mixed spatio-temporal scalability,
- independent motion estimation for each motion-compensation loop, i.e. for each spatio-temporal resolution,
- adaptive decimation and interpolation.
These features are also the reasons for the good performance of the whole coder.

ACKNOWLEDGEMENT

The work has been partially supported by the IST Programme of the EU under contract number IST-2000-26467 (MASCOT).

REFERENCES

[1] ISO/IEC/SC29/WG11/MPEG02/N4920, ISO/IEC 14496-10 AVC | ITU-T Rec. H.264, Text of Final Committee Draft of Joint Video Specification, Klagenfurt, July 2002.
[2] Y. He, R. Yan, F. Wu, S. Li, "H.26L-based fine granularity scalable video coding," ISO/IEC JTC1/SC29/WG11 MPEG02/M7788, Dec. 2001.
[3] D. Wu, Y. Hou, Y. Zhang, "Scalable video coding and transport over broad-band wireless networks," Proc. of the IEEE, vol. 89, pp. 6-20, January 2001.
[4] M. van der Schaar, C.J. Tsai, T. Ebrahimi, "Report of ad hoc group on scalable video coding," ISO/IEC JTC1/SC29/WG11 MPEG02/M9076, Dec. 2002.
[5] J.-R. Ohm, M. van der Schaar, "Scalable Video Coding," Tutorial material, Int. Conf. Image Processing ICIP 2001, 2001.
[6] M. Domański, S. Maćkowiak, "On improving MPEG spatial scalability," in Proc. Int. Conf. Image Proc., vol. 2, pp. 848-851, 2000.
[7] Ł. Błaszak, M. Domański, A. Łuczak, S. Maćkowiak, "Spatio-temporal scalability in DCT-based hybrid video coders," ISO/IEC JTC1/SC29/WG11 MPEG02/M8672, July 2002.
[8] S. Maćkowiak, "Scalable Coding of Digital Video," Doctoral dissertation, Poznań University of Technology, Poznań 2002.
[9] G. Ramponi, "Warped distance for space-variant linear image interpolation," IEEE Transactions on Image Processing, vol. 8, pp. 629-639, May 1999.