Scalable Extension of HEVC
Jong-Ki Han

Contents
0. Overview for Scalable Extension of HEVC
1. Requirements and Test Points
2. Coding Gain/Efficiency
3. Complexity
4. System Level Considerations
5. Related Contributions
6. Future Work

0. Overview for Scalable Extension of HEVC

Video Contents Adaptation
[Figure: heterogeneous system vs. homogeneous system]
Need scalability!

Use Cases for Scalable HEVC

Topics related to Scalable Extension
1. Spatial scalability
2. Temporal scalability
3. SNR scalability
4. Aspect ratio
5. Error resilience

Example for Scalable HEVC Encoder

1. Requirements and Test Points

1.1 Spatial Scalability
Recommended Requirements
- The spatial scalability extension shall support 2 or more layers, each with a different spatial resolution.
Test point
- 1.5x and 2.0x for progressively scanned content will be test points.
- Ratios other than 1.5x and 2.0x will not be tested.
Not yet determined / Open Issue
- Support of interlaced scanned video.
- Switching points (number of spatial layers decoded at intermediate pictures) should be supported.
- Sync up with JCT to see whether this requirement conflicts with the work going on with the base layer.
- We support different color spaces across the layers in Phase 1.

Interlaced Scanned Video
- To be determined.
- Interlaced content still exists, and new content is still being captured in interlaced scanning format.
- Support a base layer that is progressive (e.g. HVGAp30 or 720p30) and an enhancement layer that is interlaced (e.g. 480i30 or 1080i30).

1.2 SNR Scalability
Recommended Requirements
- The scalability extension shall support layers with the same spatial and temporal resolutions but different bit rates.
Test point
- Coding efficiency with two scalable layers and a 25% bit rate reduction is to be used for testing.
- Bit rate variation can be achieved by adjusting the quantization parameter, the quantization matrix, and/or by truncating frequency coefficients.
Open Issue
- Fine Grain Scalability (FGS): no requirement identified so far, at least not for Phase 1.

1.3 Temporal Scalability
Recommended Requirements
- The standard shall support layers with different frame rates, increasing or decreasing by factors that are multiples of 2.
Test point
- The key operating point is a factor of 2 between the frame rates of the two layers.

1.4 Aspect Ratio
Recommended Requirements
- The standard should be able to support different picture aspect ratios (PARs) across the layers.
Test point
- Harmonize with the Spatial Scalability requirement.
Not yet determined
- The standard should be able to support different picture sample aspect ratios (PSARs) across the layers.
- The standard should be able to support spatially varying PSAR.
- The standard should be able to support pan scan, such that the lower layer can be placed at different locations relative to the enhancement layers.

PAR vs. PSAR
PAR (Picture Aspect Ratio)
- The ratio width:height of the captured picture, where width and height are measured in the same (spatial) length units.
- [Figure: 4:3 and 16:9 examples]
PSAR (Picture Sample Aspect Ratio)
- The ratio between the horizontal distance between the columns and the vertical distance between the rows of the luma sample array in a picture.
- [Figure: 4:3 and 16:9 examples]
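The two definitions are linked through the dimensions of the luma sample array: PAR equals PSAR multiplied by (width in samples / height in samples). The sketch below only illustrates that arithmetic. The function name and the 720x480 example values are assumptions chosen for illustration; they do not come from the requirements document.

```python
from fractions import Fraction

def picture_aspect_ratio(width_samples, height_samples, psar):
    """Return PAR = PSAR * (luma width / luma height) as an exact fraction."""
    return Fraction(width_samples, height_samples) * psar

# Hypothetical example: the same 720x480 luma array with two different PSARs.
par_wide = picture_aspect_ratio(720, 480, Fraction(32, 27))
par_narrow = picture_aspect_ratio(720, 480, Fraction(8, 9))
print(par_wide, par_narrow)
```

Two layers can therefore share a sample grid and still differ in PAR, which is why PAR and PSAR appear as separate requirements.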

Pan scan

1.5 Error Resilience
Recommended Requirements
- The scalability extension should enhance transmission over error-prone or congested channels compared to the transmission of single layer HEVC.
Not yet determined
- Focus on packet losses with varying statistics across various networks.
- Include mechanisms that can take advantage of advanced system layer features.
Options
- Unequal protection of layers
- Unequal protection of especially important NAL units in a layer (parameter sets, intra slices, ...)
- Graceful degradation
- Sending an additional information layer to help in concealing losses
- Low delay source coding based error resiliency, using scalable layers
- Low delay error resiliency
- Resiliency against burst losses
- Allow the option of two layers to be independently decodable

1.7 Miscellaneous
Recommended Requirements (End to End Delay)
- The scalability extension shall enable at least the same end-to-end delay and latency as single layer HEVC.
Recommended Requirements (Random access)
- The scalability extension shall allow for at least the same random access properties as single layer HEVC.
Not yet determined (TBD)
- The maximum number of layers allowed.

2. Coding Gain/Efficiency

Coding Gain/Efficiency
Recommended Requirements
- Shall have significantly less bitrate than simulcast at the same perceived quality for the same resolutions.
- Shall not have significantly higher bitrate than a single layer stream of the highest resolution.
Open Issue
- Simulcast: base layer = R (rate), enhancement layer = α*R, total bit rate = (1 + α)*R
- Scalable system: base layer = β*R, enhancement layer = γ*R, total bit rate = (β + γ)*R
- Bit reduction: BR = (1 + α - β - γ)*R
- Base layer cost: BC = (β - 1)*100%. Should be within 10%.
- Case 1: the base layers for the simulcast and the scalable systems can be the same (β = 1). CG = 1/(1 - BR/(α*R)) = α/γ. Should be more than 2.
- Case 2: the base layers for the simulcast and the scalable systems may not be the same. CG = 1/(1 - BR/((1 + α)*R)) = (1 + α)/(β + γ). Should be more than 1.35.
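The bit reduction, base layer cost, and coding gain definitions above can be checked numerically. The following is a minimal sketch under the slide's notation (α, β, γ, R); the function name, the dictionary keys, and the example values α = 2.0, β = 1.0, γ = 0.8 are illustrative assumptions, not measured results.

```python
def scalable_coding_metrics(alpha, beta, gamma, base_rate=1.0):
    """Evaluate BR, BC and both coding-gain variants for given rate factors."""
    simulcast_total = (1 + alpha) * base_rate
    scalable_total = (beta + gamma) * base_rate
    bit_reduction = simulcast_total - scalable_total       # BR
    base_cost_pct = (beta - 1) * 100                       # BC in percent
    # Case 1: gain relative to the simulcast enhancement layer only.
    # It reduces to alpha / gamma when beta == 1 (same base layer).
    cg_case1 = 1 / (1 - bit_reduction / (alpha * base_rate))
    # Case 2: gain relative to the total simulcast rate.
    cg_case2 = 1 / (1 - bit_reduction / simulcast_total)
    return {"BR": bit_reduction, "BC": base_cost_pct,
            "CG1": cg_case1, "CG2": cg_case2}

# Hypothetical rate factors, chosen only to exercise the formulas.
metrics = scalable_coding_metrics(alpha=2.0, beta=1.0, gamma=0.8)
print(metrics)
```

With these assumed factors both targets are met: CG for Case 1 is about 2.5 (target: more than 2) and CG for Case 2 is about 1.67 (target: more than 1.35).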

Transcoding for Comparison
- One single higher quality / resolution layer is sent and the content is transcoded to obtain the base layer.
- As the scale factor increases, the efficiency of the transcoding method increases.
- Transcoding tends to be significantly more compute intensive than simulcast or scalable decoding.
- As transcoding based systems use only one layer, the coding efficiency of such a system is expected to be better than that of scalable systems.

Single Layer with Interpolation for Comparison
- One single lower resolution layer is sent and the content is upconverted to the higher resolution.
- We are not considering comparison of scalability systems to these approaches.

3. Complexity

Complexity
Recommended Requirements
- Scalable coding shall enable low complexity implementations for encoding as well as decoding.
Test point / Open Issue
- Both software (CPU utilization) and hardware implementation complexity (in particular, DRAM bandwidth requirement) should be measured and taken into account.
- In comparison to single layer higher resolution decoding, for spatial scalability with a 2x scale factor, should the total complexity be required to stay below around 125%?
- In comparison to transcoding based systems, higher complexity can also be acceptable in some systems if it provides better coding gain.

4. System Level Considerations

System Level Considerations
Recommended Requirements
- Scalable coding technology shall specify efficient and easy ways to carry the bitstreams over widely used protocols and networks.
Comments
- The compressed video will be transported using MPEG-2 Systems, MPEG DASH and various Internet protocols.
- Efficiency and ease of carriage over various networks should be a part of the design.
- The compressed video will also be transported using IP networks.
- The impact of network switching elements should also be considered.

5. Related Contributions

Related Contributions
- JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension
- JCTVC F290, Scalability Support in HEVC
- JCTVC F292, Recommendations for evaluation of scalable coding
- JCTVC F462, On reference picture marking
- JCTVC F546, Sliding Window Improvement for Temporal Scalability

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University) Proposed scalable structure

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Simulcast approach with two different loop designs
- Single loop
- Multi loop
Two new inter-layer predictions
- Inter-layer texture prediction (ILTP)
- Generalized inter-layer reference frame (GILR)

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Inter-layer texture prediction (ILTP): single loop
For intra slices in the enhancement layer
1. Reconstruct the corresponding block of the reference layer
2. Perform up-sampling: DCT-IF 8-tap for luma, DCT-IF 4-tap for chroma
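The up-sampling step can be sketched in one dimension. The coefficients below are the half-sample DCT-IF filters of HEVC motion compensation (8-tap luma, 4-tap chroma); that the contribution uses exactly these taps and phases for 2x up-sampling is an assumption, as are the function name and the edge handling by sample repetition.

```python
# HEVC half-sample interpolation taps; each set sums to 64.
LUMA_HALF_PEL = [-1, 4, -11, 40, 40, -11, 4, -1]
CHROMA_HALF_PEL = [-4, 36, 36, -4]

def upsample_2x_1d(samples, taps=LUMA_HALF_PEL, bit_depth=8):
    """2x up-sample a row: copy integer positions, interpolate half positions."""
    n = len(samples)
    half = len(taps) // 2
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n):
        out.append(samples[i])                      # integer-position sample
        acc = 0
        for k, coeff in enumerate(taps):
            j = min(max(i - half + 1 + k, 0), n - 1)  # repeat edge samples
            acc += coeff * samples[j]
        value = (acc + 32) >> 6                     # normalise by 64, rounded
        out.append(min(max(value, 0), max_val))     # clip to the sample range
    return out

ramp = list(range(0, 32, 2))
print(upsample_2x_1d(ramp))
```

A 2-D block would be up-sampled by applying the same routine along rows and then along columns.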

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Inter-layer texture prediction (ILTP): single loop
For inter slices in the enhancement layer, intra block in the reference layer
1. Parse the intra mode of the corresponding block in the reference layer
2. Perform an intra prediction with the decoded intra mode and the neighboring decoded pixels in the enhancement layer
3. Add an up-sampled residual signal of the reference layer to the prediction block generated by Step 2
[Figure: N x N corresponding block in the base layer (intra mode) and 2N x 2N block in the enhancement layer with its reference pixels]

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Inter-layer texture prediction (ILTP): single loop
For inter slices in the enhancement layer, inter block in the reference layer
1. Decode the MV and reference index of the corresponding block in the reference layer
2. Perform motion compensation at the enhancement layer
3. Add an up-sampled residual signal of the reference layer to the MC block generated by Step 2
[Figure: N x N corresponding block in the base layer (MV, reference index) and 2N x 2N block in the enhancement layer]
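Step 3 is the same in both single-loop cases: an enhancement-layer prediction block (intra or motion compensated) is combined with the up-sampled base-layer residual. A minimal sketch follows. The function names are hypothetical, and sample repetition stands in for the DCT-IF up-sampling purely to keep the example short.

```python
def upsample_residual_2x(residual):
    """2x up-sample an N x N residual by sample repetition (stand-in filter)."""
    out = []
    for row in residual:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def reconstruct_block(prediction, base_residual, bit_depth=8):
    """Add the up-sampled base residual to a 2N x 2N prediction and clip."""
    upsampled = upsample_residual_2x(base_residual)
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, upsampled)]

prediction = [[100] * 4 for _ in range(4)]   # hypothetical 4x4 prediction
base_residual = [[1, -2], [3, 250]]          # hypothetical 2x2 residual
block = reconstruct_block(prediction, base_residual)
print(block)
```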

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Inter-layer texture prediction (ILTP): multiple loop
- The reference layer is fully reconstructed.
- ILTP works in the same way as in the single-loop design for intra slices.

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Generalized inter-layer reference frame (GILR)
- The GILR generated from the reference layer is a reference frame to be inserted into the DPB of the enhancement layer.
- Intra slices of the enhancement layer are coded by inter prediction, referring to the GILR as a reference frame.
- The intra slice type is changed into the inter slice type. Such slices are called G-I slices.
[Figure: reconstructed base layer, DCT-IF up-sampling to GILR, inter prediction in the enhancement layer; intra slice changed to inter slice (G-I slice)]

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
- For inter slices of the enhancement layer, the GILR is added into DPB list 0 or list 1, or the GILR can be substituted for one of the reference frames.
- Each block of the GILR is generated in the same way as in ILTP.
[Figure: reconstructed base layer, DCT-IF and ILTP up-sampling to GILR, insertion into DPB list 0 or list 1]
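The two options for inter slices, adding the GILR to a reference list or substituting it for an existing entry, can be sketched as a list operation. The function name is hypothetical, and appending the GILR at the end of the list is an assumption; the slide does not state the insertion position.

```python
def insert_gilr(ref_list, gilr, replace_index=None):
    """Return a new reference list that contains the GILR.

    replace_index=None appends the GILR (assumed position);
    otherwise the entry at replace_index is substituted by the GILR.
    """
    new_list = list(ref_list)
    if replace_index is None:
        new_list.append(gilr)
    else:
        new_list[replace_index] = gilr
    return new_list

list0 = ["ref_poc_8", "ref_poc_4"]            # hypothetical list 0 entries
added = insert_gilr(list0, "GILR")
substituted = insert_gilr(list0, "GILR", replace_index=1)
print(added, substituted)
```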

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Simulation conditions
- Spatial scalability
- Resolution of enhancement layer: Class B, Class C, Class D
- Anchor: HM2.0 simulcast system
- Proposed: HM2.0 + scalable system
Measurement
- Bit rate reduction: BR = (R_Simu^B + R_Simu^E) - (R_Sca^B + R_Sca^E), with base layer rates R_Simu^B and R_Sca^B
- PSNR: PSNR = PSNR(Enhancement layer), with base layer qualities PSNR_Simu^B and PSNR_Sca^B

JCTVC F096, Scalable structures and inter layer predictions for HEVC scalable extension (Kwangwoon University)
Coding efficiency in High Efficiency condition (HE)
- Single loop with ILTP: 14.04% (AI), 3.70% (LD), 9.82% (RA)
- Single loop with GILR: 20.86% (AI), 2.48% (LD), 7.73% (RA)
- Multi loop with ILTP: 14.04% (AI), 5.11% (LD), 12.13% (RA)
- Multi loop with GILR: 20.86% (AI), 4.50% (LD), 10.37% (RA)
Scalable structures and inter-layer predictions were proposed
- Scalable structures with single and multi loop design
- Inter-layer predictions (ILTP/GILR)
GILR is considered as a consolidated solution for multi-view scalability

JCTVC F290, Scalability Support in HEVC (Vidyo)
- A scalable coding architecture is proposed
- Allows spatial scalability at any resolution ratio (or CGS with no resolution change)
- Experimental results for intra and inter, 2 spatial layers, 1:2 resolution ratio per dimension
- Design goal: minimize changes to the HEVC codec
- The architecture could be applied to an H.264/AVC backwards-compatible base layer

JCTVC F290, Scalability Support in HEVC (Vidyo)
Difference coding mode
- Base layer coded pixels are up-sampled and subtracted from the high resolution input, forming difference values
- Difference values are coded using normal HEVC
Pixel coding mode
- High resolution input pixels are coded in a non-scalable manner
Signaling at slice level or CU level, using diff_coding_flag
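The two modes differ only in which signal is handed to the regular HEVC coding tools. A minimal sketch under that reading: the helper names are hypothetical, the actual coding of the signal (prediction, transform, quantization) is omitted, and only diff_coding_flag is taken from the slide.

```python
def select_coding_signal(high_res, upsampled_base, diff_coding_flag):
    """Return the signal passed on to the regular coding tools."""
    if not diff_coding_flag:
        return [row[:] for row in high_res]          # pixel coding mode
    return [[h - b for h, b in zip(h_row, b_row)]    # difference coding mode
            for h_row, b_row in zip(high_res, upsampled_base)]

def reconstruct(signal, upsampled_base, diff_coding_flag, bit_depth=8):
    """Invert select_coding_signal, clipping to the valid sample range."""
    if not diff_coding_flag:
        return [row[:] for row in signal]
    max_val = (1 << bit_depth) - 1
    return [[min(max(d + b, 0), max_val) for d, b in zip(d_row, b_row)]
            for d_row, b_row in zip(signal, upsampled_base)]

high_res = [[120, 130], [140, 150]]          # hypothetical input block
upsampled_base = [[118, 133], [140, 149]]    # hypothetical up-sampled base
diff = select_coding_signal(high_res, upsampled_base, diff_coding_flag=True)
print(diff)
```

Without quantization the round trip is lossless, which is what the assertions on this sketch rely on.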

JCTVC F290, Scalability Support in HEVC (Vidyo)
Motion estimation/compensation
- When difference coding mode is used, difference values (high res input picture minus up-sampled coded reference layer picture) are predicted from difference values of the previously coded picture
- Reference block(s) in the reference picture may have been coded in either pixel coding mode or difference coding mode
[Figure: ME between a 2N x 2N current block and a 2N x 2N reference block in the reference picture, each holding pixel values or difference values]

JCTVC F290, Scalability Support in HEVC (Vidyo)
Intra prediction
- When difference coding mode is used, difference values are spatially predicted from neighboring difference blocks
- Spatial neighboring blocks may have been coded in either pixel coding mode or difference coding mode
[Figure: 2N x 2N current block (pixel values or difference values) with neighboring reference pixels or differences]

JCTVC F290, Scalability Support in HEVC (Vidyo)
Inter layer motion prediction
- The base layer motion vector is added to the motion prediction list
- The center of the current block is used to find the co-located reference layer MV
- The MV is scaled by the resolution scaling factor
- No new syntax in the enhancement layer
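The candidate derivation can be sketched as follows: map the center of the enhancement-layer block to base-layer coordinates, look up the MV stored there, and scale it by the resolution ratio. The function name, the dictionary-based MV field, the 8x8 storage granularity, and the rounding are all assumptions made for illustration.

```python
from fractions import Fraction

def inter_layer_mv_candidate(x, y, width, height, base_mv_field, scale,
                             base_block=8):
    """Derive a scaled base-layer MV for an enhancement-layer block.

    base_mv_field maps (block_col, block_row) on a base_block grid to (mvx, mvy).
    """
    center_x = x + width // 2
    center_y = y + height // 2
    base_x = int(center_x / scale)        # co-located base-layer position
    base_y = int(center_y / scale)
    mv = base_mv_field.get((base_x // base_block, base_y // base_block))
    if mv is None:                        # e.g. intra-coded base block
        return None
    # Python's round() is used here; a codec would define its own rounding.
    return (round(mv[0] * scale), round(mv[1] * scale))

mv_field = {(2, 1): (3, -2), (2, 2): (4, 2)}   # hypothetical base-layer MVs
cand_2x = inter_layer_mv_candidate(32, 16, 16, 16, mv_field, Fraction(2))
cand_1_5x = inter_layer_mv_candidate(24, 24, 16, 16, mv_field, Fraction(3, 2))
print(cand_2x, cand_1_5x)
```

Because the candidate is derived entirely from base-layer data, the sketch needs no extra signalled value, in line with the "no new syntax" point above.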

JCTVC F290, Scalability Support in HEVC (Vidyo)
Reference layer sample point
- Choose whether up-sampling of the reference layer is done before or after the in-loop filtering modules
- Analogous to disable_inter_layer_deblocking_filter_idc in SVC

JCTVC F290, Scalability Support in HEVC (Vidyo)
H.264/AVC base layer mixed codec option
- Coding of the enhancement layer is less tightly coupled to base layer coding than in SVC
- An H.264/AVC base layer may be supported using the same architecture, with only high level syntax changes to the enhancement layer
Option 1
- Use different HEVC NAL unit types than H.264/AVC
- Add a codec type field to the Dependency Parameter Set
- Only the base layer can be of a different codec type, and only H.264/AVC is supported
- The base layer may be SVC (Annex G) with multiple dependency or quality layers of its own
Option 2
- An encapsulation NAL unit type is defined
- Encapsulation NALs would include sequences compliant with a non-HEVC standard
- Allows for any coding standard
- Allows for multiple scalable or simulcast layers

JCTVC F290, Scalability Support in HEVC (Vidyo) Scalable HEVC experiment results (Anchor : HM 3.0) Single layer anchor : JM 3.0

JCTVC F290, Scalability Support in HEVC (Vidyo) JSVM result using HEVC test conditions(anchor : HM3.0) Single layer anchor : JM 3.0

JCTVC F290, Scalability Support in HEVC (Vidyo)
- Scalable coding extension to HEVC proposed
- Similar BD-bitrate savings vs. simulcast for RA and LD test cases as JSVM
- Flexible design
  - Design goal of few changes vs. the single layer HEVC design
  - Any scaling factor can be supported
  - Can be used with an H.264/AVC base layer

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
Method 1
- Only comparison between the scalable enhancement layer and the simulcast high resolution layer
- Same QP used to encode the simulcast high resolution layer and the scalable enhancement layer
[Figure: simulcast system (high resolution layer, base layer) and scalable system (enhancement layer, base layer); comparison of the simulcast high resolution bitstream (PSNR Sim H) vs. the enhancement layer bitstream (PSNR Sca E)]

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
Method 1
- Neglects the impact on the base layer of the difference in quality between the simulcast high resolution layer and the scalable enhancement layer
- Tends to magnify the coding gains/losses, because a smaller denominator is used
- Changes in bitrate do correspond more directly to the changes in PSNR of the higher resolution layer
- It cannot be used to compare with single layer coding
- A part of the enhancement layer bitstream cannot be decoded
- This method is not recommended.

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
Method 2
- Total bitrates of both layers are considered
- Same base layer for both simulcast and scalable coding
- As in method 1:
  - Same QP to encode the simulcast high resolution layer and the scalable enhancement layer
  - Neglects the impact on the base layer of the difference in quality between the simulcast high resolution layer and the scalable enhancement layer
- The method tends to disadvantage scalable coding
  - Uses the same base layer for both simulcast and scalable, even though the high resolution qualities differ
  - One would expect the simulcast low resolution layer quality/bitrate also to differ, corresponding to the quality differences of the simulcast high resolution layer

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
Method 2
[Figure: simulcast system (high resolution layer, base layer) and scalable system (enhancement layer, base layer); comparison of the simulcast high resolution bitstream (PSNR Sim H) vs. the enhancement layer bitstream (PSNR Sca E)]
- Sometimes, U and V BD-rate values have strange results.

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
Method 3
- Adjust the simulcast high res bitrate/PSNR to match the scalable enhancement layer PSNR
- As in method 2:
  - Same base layer for both simulcast and scalable coding
  - Same QP to encode the simulcast high res layer and the scalable enhancement layer
  - Total bitrates of both layers considered
- Use piecewise linear estimation of log(bitrate) vs. PSNR to adjust the simulcast bitrate/PSNR values used in BD-rate/BD-PSNR calculations
- Adjust the simulcast point, not the scalable point, since the simulcast high res layer is independently decodable, while the scalable enhancement layer depends on a particular base layer
- The relationship between log(bitrate) and Y PSNR is approximately linear
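The adjustment step of Method 3 can be sketched as a piecewise linear interpolation of log(bitrate) over PSNR: given the measured simulcast rate/PSNR points, estimate the bitrate at which the simulcast layer would reach the PSNR of the scalable enhancement layer. The function name and the example rate points are assumptions; the contribution's own implementation is not shown on the slides.

```python
import math

def bitrate_at_psnr(rd_points, target_psnr):
    """Estimate the bitrate at target_psnr from (bitrate, psnr) points.

    Interpolates log(bitrate) linearly between neighbouring PSNR values.
    Assumes all PSNR values are distinct and the target is inside their range.
    """
    points = sorted(rd_points, key=lambda p: p[1])
    for (rate0, psnr0), (rate1, psnr1) in zip(points, points[1:]):
        if psnr0 <= target_psnr <= psnr1:
            t = (target_psnr - psnr0) / (psnr1 - psnr0)
            log_rate = math.log(rate0) + t * (math.log(rate1) - math.log(rate0))
            return math.exp(log_rate)
    raise ValueError("target PSNR outside the measured range")

# Hypothetical simulcast high res points: (bitrate in kbps, Y PSNR in dB).
simulcast_points = [(500.0, 34.0), (1000.0, 36.0), (4000.0, 40.0)]
adjusted_rate = bitrate_at_psnr(simulcast_points, target_psnr=38.0)
print(adjusted_rate)
```

The adjusted (bitrate, PSNR) pairs would then replace the measured simulcast points in the BD-rate calculation.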

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
Method 3
- Method 2 is used to count bit rate and PSNR
- BD-rate is calculated based on piecewise cubic interpolation
[Figure: PSNR (dB) vs. bitrate (kbps), once with a linear bitrate axis and once with a log(bit rate) axis; curves: scalable enhancement layer, simulcast high res, adjusted simulcast high res]
- Adjust the PSNR of the simulcast high resolution layer to that of the scalable enhancement layer
- This method is recommended!

JCTVC F292, Recommendations for evaluation of scalable coding (Vidyo)
- Recommend use of method 3 as the assessment method
- Scalable coding shows slightly higher bitrate savings with Method 3 than with Method 2, although significantly less than with Method 1
- Recommend use of fixed QP offsets for layers, rather than specifying target bitrates

JCTVC F462, On reference picture marking (Huawei & Qualcomm); JCTVC F546 (Qualcomm)
Proposes some changes to the reference picture marking process
Encoder
- Reference pictures with an identical or greater value of temporal_id may be marked as unused for reference
- [Figure: GOP with temporal_id = 0 3 2 3 1 3 2 3 0]
Decoder
- Discarded pictures are treated as non-existing pictures, which have the greatest possible value of temporal_id, 7
- [Figure: GOP with non-existing pictures, temporal_id = 0 7 2 7 1 7 2 7 0]
Advantage
- Clear parsing of syntax at the decoder
- Reference picture management
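The two temporal_id patterns in the figures can be reproduced with a small sketch: a dyadic hierarchy for a GOP of 8 pictures, followed by marking every picture above a kept temporal layer as non-existing with temporal_id 7. The function names are hypothetical, and the dyadic assignment rule is inferred from the pattern 0 3 2 3 1 3 2 3 0 rather than quoted from the contributions.

```python
def temporal_ids(gop_size, num_pictures):
    """Assign temporal_id in a dyadic hierarchy (gop_size is a power of 2)."""
    levels = gop_size.bit_length() - 1            # log2(gop_size)
    ids = []
    for poc in range(num_pictures):
        pos = poc % gop_size
        if pos == 0:
            ids.append(0)                         # key picture
        else:
            trailing_zeros = (pos & -pos).bit_length() - 1
            ids.append(levels - trailing_zeros)
    return ids

def mark_discarded(ids, max_kept_tid, non_existing_tid=7):
    """Replace the temporal_id of discarded pictures by the greatest value."""
    return [t if t <= max_kept_tid else non_existing_tid for t in ids]

encoder_ids = temporal_ids(gop_size=8, num_pictures=9)
decoder_ids = mark_discarded(encoder_ids, max_kept_tid=2)
print(encoder_ids)
print(decoder_ids)
```

Dropping the highest temporal layer in this example halves the frame rate, which matches the factor-of-2 operating point of the temporal scalability requirement.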

6. Future Work

Future Work
- The group decided to follow the CfP path to develop standards associated with the scalability extensions of HEVC, and to issue the first formal CfP (Phase 1) sometime next year (2012).
- First focus on the scalability associated with 2D 4:2:0 video in Phase 1 of the Scalability Extension of HEVC.
- "Under consideration / Future Extensions Beyond Phase 1" (Annex D of N12214):
  - Bit depth
  - Chroma
  - View scalability for 3D video

Q and A
Prof. Jong-Ki Han, Sejong University
hjk@sejong.ac.kr
http://msp.sejong.ac.kr