Further Reduced Resolution Depth Coding for Stereoscopic 3D Video

N. S. Mohamad Anil Shah, H. Abdul Karim, and M. F. Ahmad Fauzi
Multimedia University, 63100 Cyberjaya, Selangor, Malaysia

Abstract
In this paper, the Further Reduced Resolution Depth Coding (FRRDC) method is proposed as an improvement on the Reduced Resolution Depth Coding (RRDC) method previously used with Scalable Video Coding (SVC). Like RRDC, FRRDC applies Down-Sampling and Up-Sampling (DSUS) to the depth data of the stereoscopic 3D video: the depth data is down-sampled before SVC encoding and up-sampled after SVC decoding. The proposed FRRDC method aims to improve on the coding efficiency of RRDC, measured in terms of the objective and subjective quality of the stereoscopic 3D video. FRRDC differs from RRDC in that it encodes the depth video at an even lower resolution. This paper gives an overview of the FRRDC method and presents simulation results that evaluate its performance.

Index Terms: 3D stereoscopic video, scalable video coding, down-sampling and up-sampling.

I. INTRODUCTION
The Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has standardized a scalable video coding (SVC) extension to the H.264/AVC codec to support video systems with diverse capabilities. SVC uses a single bit stream to provide multiple spatial, temporal and quality (SNR) resolutions, giving various options in terms of picture quality and spatio-temporal resolution [1]. In this paper, SVC is used to encode stereoscopic 3D video. Stereoscopic 3D video is the simplest type of 3D video and can be used in many multimedia applications that are compatible with existing video technologies.
Stereoscopic video renders a different view for each eye to create the illusion of depth in the 3D scene. A 2D plus depth format for stereoscopic video is described in [2], in which the 2D colour video is captured by an ordinary 2D video camera and the depth video by a laser range or depth camera. Using the spatial scalability of SVC, the 2D colour video and the depth video can be coded in the base and enhancement layers, respectively, by the SVC encoder, as shown in Fig. 1. After SVC decoding, the depth video is combined with the colour video using the Depth Image Based Rendering (DIBR) technique [2] to produce the left and right views of the stereoscopic video. Compared with other stereoscopic coding configurations, such as left-and-right coding and interlaced coding of the colour and depth data, the 2D plus depth configuration offers better backward compatibility with existing video encoders [3]. The 2D plus depth SVC configuration in Fig. 1 has also been shown to perform better than coding the colour and depth data with MPEG-4 Multiple Auxiliary Components (MPEG-4 MAC) [3].

Fig. 1: SVC configuration for 2D plus depth format (colour video and depth video, SVC encoder, SVC decoder, DIBR, 3D stereoscopic video).

This paper investigates the application of DSUS with further reduced resolution depth coding to SVC for the 2D plus depth stereoscopic video format. The objective is to achieve further compression using DSUS without impairing the perceived quality of the stereoscopic 3D video. Such a compression method is desirable for consumer devices such as 3DTV [4], mobile 3DTV [5] and 3D mobile phones [6], as it allows 3D video to be stored and transmitted with a much lower bandwidth requirement. In the DSUS method, the depth video frames are spatially down-sampled before SVC encoding and up-sampled after SVC decoding.

Section II gives a brief literature review of FRRDC, which builds on the RRDC DSUS method. Section III describes the application of the FRRDC DSUS method to SVC. Section IV evaluates the performance of the proposed solution, objectively and subjectively, in error-free and error-prone environments. Section V gives concluding remarks.

II. OVERVIEW OF FURTHER REDUCED RESOLUTION DEPTH CODING METHOD
The use of reduced resolution, or down-sampling, in image and video compression has been investigated in [7] and [14] with the aim of improving objective and subjective performance at low bit rates. In [7], it is shown that down-sampling the image resolution results in optimum compression with improved PSNR performance. In [14], it is shown that down-sampling the depth video resolution results in better PSNR performance as well as better perceived quality of the stereoscopic 3D video. The effect of down-sampling on multi-view video coding efficiency is also investigated in [8] and [9]. In [8], arbitrary views inside a multi-view sequence are down-sampled before encoding and up-sampled to their original resolutions after decoding; the results showed reductions in bit rate and computational complexity. In [9], the method of [8] is applied to multi-view sequences with depth information, taking into consideration the quantisation distortion and the down-sampling distortion, which vary with the operating bit rate. In [14], the coding efficiency of the stereoscopic 3D video configuration of [3] is enhanced in the low bit rate range by applying the RRDC method. This paper investigates the effect of down-sampling the depth video resolution further than the RRDC method does.
To this end, the method of [14] is modified to further reduce, or further down-sample, the depth video resolution used in coding the stereoscopic 3D videos.

III. FRRDC FOR SVC
Like RRDC, FRRDC uses simple sub-sampling to reduce the spatial resolution of an image, but by a factor of four in each dimension instead of a factor of two. If f(i,j) is the pixel value of an image at location (i,j), the down-sampled image is

$$f_d\left(\tfrac{i}{4}, \tfrac{j}{4}\right) = f(i,j), \qquad i = 0, 4, 8, \ldots, \mathrm{XSIZE}, \quad j = 0, 4, 8, \ldots, \mathrm{YSIZE} \tag{1}$$

where XSIZE is the vertical size and YSIZE is the horizontal size of the image to be down-sampled. The three pixels neighbouring f(i,j), namely f(i,j+1), f(i+1,j) and f(i+1,j+1), can also be used. In this paper, f(i,j) and these three neighbouring pixel values are averaged, as given below, to produce an image down-sampled by a factor of four:

$$f_d\left(\tfrac{i}{4}, \tfrac{j}{4}\right) = \frac{f(i,j) + f(i,j+1) + f(i+1,j) + f(i+1,j+1)}{4}, \qquad i = 0, 4, 8, \ldots, \mathrm{XSIZE}-1, \quad j = 0, 4, 8, \ldots, \mathrm{YSIZE}-1 \tag{2}$$

For example, applying Equation (2) to every frame of a 720x576 depth image sequence produces a 180x144 depth image sequence, which must then be cropped to 176x144 (QCIF resolution) to suit the SVC encoder. When up-sampling from 176x144 back to 720x576, the cropped columns are restored by copying edge pixels; this slightly degrades the depth video and reduces the Peak Signal to Noise Ratio (PSNR) of the up-sampled depth information. However, the overall bit rate is further reduced because the depth information is down-sampled further.

The FRRDC algorithm can be applied, using the DSUS method, to the SVC enhancement layer that codes the depth information. Fig. 2 shows the block diagram of the proposed method at the encoder. At the decoder, the coded depth information from the enhancement layer is up-sampled back to its original resolution.
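The down-sampling and up-sampling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a frame is a list of rows of 8-bit depth values, the Eq. (2) average is rounded down to an integer, the four cropped columns are taken from the right-hand edge, and up-sampling restores the cropped columns by copying the last column and then replicates each pixel into a 4x4 block. The paper does not specify the rounding, the crop position or the up-sampling filter, and the helper names (`downsample_eq1`, `downsample_eq2`, `crop_to_qcif`, `upsample`, `psnr`) are hypothetical.

```python
import math

# A depth frame is a list of rows (height x width) of 8-bit values.
# Assumptions not fixed by the paper: floor averaging in Eq. (2),
# cropping the right-most columns, and pixel replication for up-sampling.


def downsample_eq1(frame):
    """Eq. (1): keep f(i, j) for every 4th row i and every 4th column j."""
    return [row[::4] for row in frame[::4]]


def downsample_eq2(frame):
    """Eq. (2): average f(i,j), f(i,j+1), f(i+1,j), f(i+1,j+1) at every 4th i and j."""
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[i][j] + frame[i][j + 1] + frame[i + 1][j] + frame[i + 1][j + 1]) // 4
            for j in range(0, width - 1, 4)
        ]
        for i in range(0, height - 1, 4)
    ]


def crop_to_qcif(frame, width=176):
    """Crop 180-column frames to 176 columns (QCIF) by dropping the right edge."""
    return [row[:width] for row in frame]


def upsample(frame, out_width=720, factor=4):
    """Restore cropped columns by copying the edge pixel, then replicate
    each pixel into a factor x factor block."""
    small_width = out_width // factor
    padded = [row + [row[-1]] * (small_width - len(row)) for row in frame]
    return [
        [value for value in row for _ in range(factor)]
        for row in padded
        for _ in range(factor)
    ]


def psnr(reference, test, peak=255):
    """PSNR in dB between two frames of equal size."""
    squared_error, count = 0, 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Synthetic 720x576 horizontal depth ramp, for illustration only.
    depth = [[x * 255 // 719 for x in range(720)] for _ in range(576)]
    coded = crop_to_qcif(downsample_eq2(depth))
    restored = upsample(coded)
    print(len(coded[0]), len(coded))        # 176 144
    print(len(restored[0]), len(restored))  # 720 576
    print(f"{psnr(depth, restored):.2f} dB")
```

Here `psnr` measures only the distortion introduced by down-sampling, cropping and up-sampling; in the paper the SVC coding distortion of the 176x144 depth layer is added on top of this.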
Fig. 2: Block diagram of H.264/SVC encoder with FRRDC (720x576 inputs, down-sampling to 176x144, base layer, enhancement layer, SVC encoder).

IV. SIMULATION RESULTS AND DISCUSSION
The Orbi, Interview and Cg test sequences, with a spatial resolution of 720x576, are used in the simulation. The test sequences include both colour and depth information. The depth information is down-sampled from the original 720x576 spatial resolution to QCIF (176x144). The JSVM software [10] is used to simulate SVC. A three-layer configuration is used, as shown in Table I, because SVC spatial scalability does not allow an enhancement layer to carry video at a lower spatial resolution than the base layer, as mentioned in [14]. As for RRDC, both SVC without DSUS (SVC-ORG) and SVC with DSUS (SVC-DSUS) configurations are simulated. The base layer (layer 0) carries the colour information at QCIF (176x144) resolution. In the SVC-DSUS configuration, the first enhancement layer (layer 1) carries the depth at reduced resolution. The third layer, enhancement layer 2, carries the colour information at the original resolution in both the SVC-ORG and SVC-DSUS configurations. The two configurations are used to compare and evaluate the effects of the further reduced resolution coding method against original resolution coding.

TABLE I. CONFIGURATION USED FOR SVC-ORG AND SVC-DSUS

Encoder    Layer   Spatial Resolution   Type
SVC-ORG    0 (b)   176x144              Colour
           1 (e)   720x576              Depth
           2 (e)   720x576              Colour
SVC-DSUS   0 (b)   176x144              Colour
           1 (e)   176x144              Depth
           2 (e)   720x576              Colour

b: base layer, e: enhancement layer

Fig. 3: Rate distortion for the depth information of Orbi sequence.

A. Error Free Objective Quality Evaluation
The rate-distortion curves obtained for the depth information of the Orbi, Interview and Cg sequences are shown in Fig. 3, Fig. 4 and Fig. 5, respectively. The bit rate is varied using the quantization parameter (QP); QP values of 10, 30 and 50 are used in this simulation. The same QP is used for colour and depth, and no rate allocation is performed between the colour and depth information. In Fig. 3 to Fig. 5, the horizontal axis shows the depth bit rate and the vertical axis shows the average PSNR of the depth.

Fig. 3, Fig. 4 and Fig. 5 show that SVC-DSUS (FRRDC) performs much better than SVC-ORG at the same bit rate in the low bit rate range (below 300 kbit/s for Orbi, below 500 kbit/s for Interview and below 100 kbit/s for Cg). For example, Fig. 3 shows that at about 300 kbit/s the average depth PSNR for Orbi is about 36 dB with SVC-DSUS and about 28 dB with SVC-ORG. However, the DSUS method degrades depth performance at bit rates above 1000 kbit/s because of up-sampling distortion, which becomes more visible at high bit rates.

Fig. 4: Rate distortion for the depth information of Interview sequence.

B. Error Free Subjective Quality Evaluation
A subjective quality evaluation of the stereoscopic 3D videos is performed to confirm the perceived quality of the synthesized stereoscopic 3D video. This is necessary because PSNR has been reported to correlate poorly with actual perceptual quality [11], [12]. The Orbi, Interview and Cg sequences are used. The QP of the colour video is fixed at 5, and the QP of the depth video is set to 10, 30 and 50 to evaluate videos with different compression characteristics, i.e. at different bit rates.

Fig. 5: Rate distortion for the depth information of Cg sequence.

The Double Stimulus Continuous Quality Scale (DSCQS) method (Variant I) is used in this evaluation. The assessors are presented with both the unimpaired reference video, which is the synthesized stereoscopic 3D video from the SVC-ORG (FRRDC) configuration, and the impaired (down-sampled and up-sampled) synthesized stereoscopic 3D video from the SVC-DSUS (FRRDC) configuration, for each of QP 10, 30 and 50. The assessors are asked to rate the quality of both videos on the scale given in the survey form. For each QP, the SVC-ORG and SVC-DSUS videos are presented in random order. A total of 15 assessors (8 male and 7 female), aged 20 to 50, participated in the evaluation; all were non-experts in assessing 3D stereoscopic video quality. The videos were viewed on a 14-inch Sony VAIO LCD monitor at its full resolution of 1366x768. Each assessor was given a pair of red and blue 3D glasses to view and assess the 3D impairment of the videos, and a survey form to evaluate the quality of the three video sequences. In each test, the videos are assessed using the grading scale shown in Fig. 6. Assessors were asked to assess the overall picture quality of each video by marking a continuous vertical scale on the survey form. The scale was divided into five equal lengths corresponding to the ITU-R five-point quality scale, and the terms labelling the levels were the same as those used in Recommendation ITU-R BT.500-11 [13]. This method was used for the subjective quality evaluation of RRDC in [14] and is used again here to evaluate FRRDC.

Fig. 6: Grading scale of the subjective quality evaluation.

The results of the subjective quality evaluation are compiled and analysed in the same way: the mean score, standard deviation and confidence interval are calculated and tabulated for all test sequences.
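The statistics reported in Tables II to V can be computed from raw ratings with a short Python sketch. It assumes, as is usual for DSCQS, that each assessor's score is the rating given to the reference (SVC-ORG) video minus the rating given to the test (SVC-DSUS) video on a 0-100 scale, that the standard deviation is the sample standard deviation, and that the 95% confidence interval is the mean ± 1.96 * sd / sqrt(N). The paper does not state these details, and the function names are hypothetical.

```python
import math
import statistics


def dscqs_difference_scores(reference_ratings, test_ratings):
    """Per-assessor difference score: reference rating minus test rating (0-100 scale)."""
    return [ref - test for ref, test in zip(reference_ratings, test_ratings)]


def ci_half_width(sd, n, z=1.96):
    """Half-width of the 95% confidence interval, z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


def summarise(scores):
    """Mean score, sample standard deviation and 95% confidence half-width."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return mean, sd, ci_half_width(sd, len(scores))


if __name__ == "__main__":
    # Consistency check against the tables: sd = 2.13 with N = 15 assessors.
    print(f"{ci_half_width(2.13, 15):.2f}")  # 1.08
    # Similarity reading used in the text: 100 minus the mean score.
    print(f"{100 - 1.60:.1f}%")  # 98.4%
```

With a standard deviation of 2.13 and 15 assessors, the half-width comes out at 1.08, which matches the 1.60 ± 1.08 intervals in the tables and suggests that a 1.96 * sd / sqrt(N) interval is the one being reported.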
The quality of the video is considered high if the mean score is close to 0 and low if the mean score is close to 100. A mean score of 0 indicates that the quality of the SVC-DSUS video is judged identical to that of the SVC-ORG video. The standard deviation and the confidence interval indicate the level of agreement between assessors. A standard deviation close to 0 indicates that all assessors perceived the video quality in nearly the same way, i.e. there is little variation in opinion among them; these values should therefore be small for the mean score to be meaningful.

Fig. 7, Fig. 8 and Fig. 9 show the pairs of test sequences used for the subjective quality evaluation of the original (SVC-ORG) and the down-and-up-sampled (SVC-DSUS) FRRDC stereoscopic 3D videos for the Orbi, Interview and Cg sequences, respectively, with the colour QP fixed at 5 and the depth QP set to 10, 30 and 50. In each figure, the first column shows frames from the SVC-ORG configuration and the second column shows frames from the SVC-DSUS configuration of the FRRDC method; the first, second and third rows correspond to depth QP 10, 30 and 50, respectively. The figures show that the depth video can be further compressed without noticeably impairing the perceived stereoscopic 3D video quality. Even in the last row (depth QP 50), the stereoscopic 3D video quality of SVC-DSUS (second column) is almost the same as that of SVC-ORG (first column).
This shows that the compression distortion together with the DSUS distortion of the depth data does not noticeably affect the overall quality of the stereoscopic 3D video, which is further confirmed by the subjective evaluation results in Table II to Table IV. The overall bit rates, the bit rate difference between SVC-ORG and SVC-DSUS, the mean score, the standard deviation and the confidence interval for the Orbi, Interview and Cg sequences are given in Table II, Table III and Table IV, respectively. These results show that SVC-DSUS achieves almost the same stereoscopic 3D video quality as SVC-ORG at lower bit rates. For example, Table II shows that for QP 10, SVC-ORG encodes the Orbi sequence at 77793.47 kbit/s and SVC-DSUS at 62933.95 kbit/s, i.e. 14859.52 kbit/s lower. This is a bit rate reduction of about 19.1%, which is higher than the 15.1% reduction achieved by the RRDC method for the Orbi sequence in [14]. Table II, Table III and Table IV also show that the Orbi, Interview and Cg videos are of high quality for QP 10, 30 and 50, each with a mean score of 1.60, which indicates 98.4% similarity between the SVC-ORG and SVC-DSUS videos. The standard deviations of all videos are also small (less than 3.00) for all QP values, which indicates that most assessors perceived the video quality similarly. The data from Table II, Table III and Table IV are then combined to obtain the overall subjective quality: the overall mean score, standard deviation and confidence interval are calculated across all QP values tested. Table V summarises the overall subjective quality of the three videos for the FRRDC method.
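The bit rate savings in Tables II to IV can be checked with a short Python sketch. The rates below are copied from those tables; expressing the relative saving as a percentage of the SVC-ORG rate is an assumption about how the percentage is defined, and the names `RATES` and `reduction` are hypothetical.

```python
# Overall bit rates (kbit/s) from Tables II-IV: (SVC-ORG, SVC-DSUS) per depth QP.
RATES = {
    "Orbi": {10: (77793.47, 62933.95), 30: (63135.95, 62194.47), 50: (62494.98, 62163.42)},
    "Interview": {10: (67419.91, 60574.79), 30: (60912.32, 60173.56), 50: (60468.99, 60148.89)},
    "Cg": {10: (21747.48, 20949.74), 30: (21357.92, 20843.66), 50: (21167.38, 20833.55)},
}


def reduction(org, dsus):
    """Absolute (kbit/s) and relative (% of SVC-ORG) bit rate saving of SVC-DSUS."""
    diff = org - dsus
    return diff, 100.0 * diff / org


if __name__ == "__main__":
    for seq, per_qp in RATES.items():
        for qp, (org, dsus) in per_qp.items():
            diff, pct = reduction(org, dsus)
            print(f"{seq:9s} QP{qp}: {diff:9.2f} kbit/s ({pct:.1f}%)")
```

For Orbi at QP 10 this gives 14859.52 kbit/s, i.e. 14859.52 / 77793.47, or about 19.1% of the SVC-ORG rate. The absolute savings shrink as the depth QP increases, because the depth layer then occupies a smaller share of the total bit rate.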

Fig. 7: Orbi sequence with original depth videos (left) and down-and-up-sampled depth videos (right); 1st row QP depth 10, 2nd row QP depth 30, 3rd row QP depth 50.

TABLE II. SURVEY RESULTS OF THE SUBJECTIVE QUALITY EVALUATION FOR ORBI VIDEO

Orbi                                QP10          QP30          QP50
Overall bit rate (kbit/s)
  SVC-ORG                       77793.47      63135.95      62494.98
  SVC-DSUS                      62933.95      62194.47      62163.42
Bit rate difference (kbit/s)    14859.52        941.48        331.56
Mean score                          1.60          1.60          1.60
Standard deviation                  2.13          2.13          2.13
Confidence interval          1.60 ± 1.08   1.60 ± 1.08   1.60 ± 1.08

Fig. 8: Interview sequence with original depth videos (left) and down-and-up-sampled depth videos (right); 1st row QP depth 10, 2nd row QP depth 30, 3rd row QP depth 50.

TABLE III. SURVEY RESULTS OF THE SUBJECTIVE QUALITY EVALUATION FOR INTERVIEW VIDEO

Interview                           QP10          QP30          QP50
Overall bit rate (kbit/s)
  SVC-ORG                       67419.91      60912.32      60468.99
  SVC-DSUS                      60574.79      60173.56      60148.89
Bit rate difference (kbit/s)     6845.12        738.76        320.10
Mean score                          1.60          1.60          1.60
Standard deviation                  2.13          2.13          2.13
Confidence interval          1.60 ± 1.08   1.60 ± 1.08   1.60 ± 1.08

Fig. 9: Cg sequence with original depth videos (left) and down-and-up-sampled depth videos (right); 1st row QP depth 10, 2nd row QP depth 30, 3rd row QP depth 50.

TABLE IV. SURVEY RESULTS OF THE SUBJECTIVE QUALITY EVALUATION FOR CG VIDEO

Cg                                  QP10          QP30          QP50
Overall bit rate (kbit/s)
  SVC-ORG                       21747.48      21357.92      21167.38
  SVC-DSUS                      20949.74      20843.66      20833.55
Bit rate difference (kbit/s)      797.74        514.26        333.83
Mean score                          1.60          1.60          1.60
Standard deviation                  2.13          2.13          2.13
Confidence interval          1.60 ± 1.08   1.60 ± 1.08   1.60 ± 1.08

Table V shows that the overall SVC-DSUS quality for the Orbi, Interview and Cg sequences has a mean score of 1.60 for each sequence, indicating that the videos are about 98.4% similar to the corresponding SVC-ORG videos, with a small standard deviation of 2.13 each (confidence interval 1.60 ± 1.08). The overall subjective quality evaluation shows that the assessors found it hard to differentiate the perceived quality of the stereoscopic 3D videos. This further shows that the further down-sampling and up-sampling of depth information in the SVC-DSUS configuration using FRRDC can yield high quality stereoscopic 3D video, comparable to that obtained with the original depth information in the SVC-ORG configuration, at much lower storage and bandwidth requirements than the RRDC DSUS method.

TABLE V. OVERALL SUBJECTIVE QUALITY EVALUATION RESULTS FOR ALL TEST SEQUENCES (FRRDC)

Test Sequence   Mean Score   Deviation   Confidence
Orbi            1.60         2.13        1.60 ± 1.08
Cg              1.60         2.13        1.60 ± 1.08
Interview       1.60         2.13        1.60 ± 1.08

V. CONCLUSION
This paper presents the application of DSUS with FRRDC to SVC for the compression of 3D stereoscopic video in the 2D plus depth format. The simulation results show that FRRDC compresses 3D stereoscopic video further than RRDC with no impairment of the perceived video quality. FRRDC improves the rate-distortion performance of SVC, particularly in the low bit rate range, owing to the further reduction in depth bit rate. As with RRDC, there is a risk of up-sampling distortion, but the simulation results show that it reduces coding efficiency only in the high bit rate range. The subjective quality evaluation of the stereoscopic 3D video further confirmed that the perceived quality of the FRRDC SVC stereoscopic 3D video is better than that of the RRDC SVC stereoscopic 3D video for both configurations, SVC-ORG and SVC-DSUS, as it yields videos with 98.4% similarity to the original videos at even lower bit rates. This shows that the FRRDC DSUS depth coding method can be used to encode stereoscopic 3D video, giving viewers stereoscopic 3D video of quality comparable to that obtained with the original depth information, at a much lower bit rate than the RRDC DSUS depth coding method.
ACKNOWLEDGMENT
This project is sponsored under the MMU-GRA scheme with project ID IP20110105032.

REFERENCES
[1] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the scalable video coding extension of the H.264/AVC standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103-1120, September 2007.
[2] C. Fehn, "A 3D-TV approach using depth-image-based rendering (DIBR)," in Proc. of VIIP 2003, Spain, September 2003.
[3] C. T. E. R. Hewage, H. A. Karim, S. Worrall, S. Dogan, and A. M. Kondoz, "Comparison of stereo video coding support in MPEG-4 MAC, H.264/AVC and H.264/SVC," in Proc. of VIE 2007, London, July 2007.
[4] H. M. Ozaktas and L. Onural, Three-Dimensional Television: Capture, Transmission, Display, Springer, 2008, pp. 1-10.
[5] A. Gotchev, A. Tikanmaki, A. Boev, K. Egiazarian, I. Pushkarov, and N. Daskalov, "Mobile 3DTV technology demonstrator based on OMAP3430," in Proc. of 16th International Conference on Digital Signal Processing (DSP 2009), Santorini, Greece, July 2009.
[6] C. Bal, "Three-dimensional video coding on mobile platforms," Master's thesis, Bilkent University, September 2009.
[7] A. M. Bruckstein, M. Elad, and R. Kimmel, "Downscaling for better transform compression," IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1132-1144, September 2003.
[8] E. Ekmekcioglu, S. T. Worrall, and A. M. Kondoz, "Utilisation of downsampling for arbitrary views in multi-view video coding," Electronics Letters, vol. 44, no. 5, pp. 339-340, February 2008.
[9] E. Ekmekcioglu, S. Worrall, and A. M. Kondoz, "Bit-rate adaptive downsampling for the coding of multi-view video with depth information," in Proc. of 3DTV-Conference, Istanbul, Turkey, May 2008.
[10] JSVM 9.19.11 reference software, CVS server, garcon.ient.rwth-aachen.de/cvs/jvt.
[11] P. W. Gorley and N. S. Holliman, "Stereoscopic image quality metrics and compression," in Proc. of SPIE-IS&T Electronic Imaging, SPIE, vol. 6803, pp. 1-12, January 2008.
[12] Y. Zhong, I. Richardson, A. Sahraie, and P. McGeorge, "Qualitative and quantitative assessment in video compression," in Proc. of 12th European Conference on Eye Movements, Dundee, Scotland, August 2003.
[13] Recommendation ITU-R BT.500-11, "Methodology for the subjective assessment of the quality of television pictures," ITU-R, pp. 1-48, 1974-2002.
[14] H. A. Karim, N. S. Mohamad Anil Shah, N. M. Arif, A. Sali, and S. Worrall, "Reduced resolution depth coding for stereoscopic 3D video," IEEE Transactions on Consumer Electronics, 2010.