LBP-GUIDED DEPTH IMAGE FILTER. Rui Zhong, Ruimin Hu


National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, Wuhan, China

ABSTRACT

The multi-view video plus depth (MVD) format has been put forward in the call for proposals on free view video (FVV) and 3DTV. Because depth maps represent the 3D scene geometry, they are used to synthesize virtual views. However, compression artifacts in the depth images lead to geometry distortions in the synthesized views. By exploiting the LBP features of the corresponding color samples, we propose a novel local binary pattern (LBP) guided depth filter that restricts the filtering input to the local neighborhood samples belonging to the same object as the current pixel. Because the LBP operator describes object edges well, it is used to calculate the weights of the local depth pixels in the depth-map filter. Furthermore, the filter is incorporated into the H.264/MVC framework as an in-loop filter. The experimental results demonstrate that the proposed approach offers average PSNR gains of 0.45 dB in video rendering quality and 0.66 dB in depth coding efficiency, as well as a significant subjective improvement in the rendered views.

1. INTRODUCTION

3D TV and movies, which provide viewers with an immersive visual experience, accelerate the development of the corresponding technologies. Although multi-view videos of a scene are captured at the same time to support stereo perception, it is still difficult to generate 3D perception at a free viewpoint during user interaction. Therefore, rendering free views from the available MVD data has become an active research topic. P. Merkle et al. showed that depth images have special statistical characteristics: they consist of large homogeneous regions partitioned by sharp object edges [1].
Therefore, the common quantization and block-based coding units of hybrid coding frameworks such as H.264/AVC lead to serious artifacts around the object edges in depth images. Since the depth map provides the geometry information of a 3D scene, depth errors further result in geometry distortions in the synthesized views. Compared with depth errors in homogeneous areas, pixel displacements around sharp edges degrade the quality of the synthesized views more seriously [2]. It is therefore essential to maintain the sharpness of object edges when coding depth images.

To keep the depth codec backward compatible, K. Oh et al. [3] incorporated an in-loop filter into the H.264 framework to denoise depth images. The in-loop filter recovers noisy pixels from neighboring pixels subject to geometric closeness, photometric similarity and occurrence frequency. This method significantly improves the efficiency of depth coding and generates better synthesized views. However, corrupted depth pixels may reduce the accuracy of the reconstructed pixels, and the lower the bitrate, the worse the filtering performance. Based on the idea of the bilateral filter, Liu et al. [4] presented a trilateral filter in which the structural similarity between color images and the corresponding depth maps also contributes to the weights. This filter removes artifacts in decoded depth maps and improves the synthesized video quality at the same bitrate, but its performance is not stable if there are few similar pixels around the current pixel. To achieve a stable filtering output, the filter needs to eliminate the effect of neighboring pixels with significantly different values.

In this paper we propose a local binary pattern (LBP) guided depth map filter based on the bilateral filter. LBP was first presented by Timo Ojala et al. [5] as a robust micro-texture descriptor which describes the uniform distribution of the circularly symmetric neighboring pixels.
LBP accurately measures structural image features such as edges, corners, spots and planes. Since the depth map filter selects input pixels on the basis of the similarity of neighboring depth pixels within an object, the LBP of the color image can be used to determine whether nearby pixels belong to the same object as the current pixel. In addition, LBP is robust to monotonic gray-scale transformation and quantization [6], so slight quantization errors in depth maps do not decrease its description accuracy. Hence, using the LBP of color images to limit the input of the filter is more robust than calculating the weights from the luminance component of the corresponding color images.

The remainder of the paper is organized as follows. The encoding framework is presented in Section 2. Section 3 describes the LBP-guided filter. Section 4 analyzes the experimental results. The paper is concluded in Section 5.
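
The robustness claim made in the introduction can be illustrated with a small sketch. It assumes the sign-based labeling used by LBP (label 1 when a neighbor is at least as bright as the center) and a hypothetical strictly increasing gray-scale map `brighten`; neither the sample values nor the map come from the paper. Because a strictly increasing map preserves the order of gray values, the labels should not change.

```python
def lbp_signs(center, neighbors):
    """Label each neighbor 1 if it is at least as bright as the center, else 0."""
    return [1 if g >= center else 0 for g in neighbors]


def brighten(gray):
    """Hypothetical strictly increasing gray-scale transformation."""
    return 0.5 * gray + 40


# Illustrative color samples (not taken from the paper).
center = 90
neighbors = [88, 95, 200, 180, 90, 30, 91, 89]

signs_before = lbp_signs(center, neighbors)
signs_after = lbp_signs(brighten(center), [brighten(g) for g in neighbors])

# The ordering of gray values is preserved, so the binary pattern is unchanged.
print(signs_before == signs_after)
```

The check only covers monotonic transformations; it says nothing about noise that changes the ordering of a neighbor relative to the center.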

2. THE ENCODING FRAMEWORK

The depth video can be coded by H.264/MVC as the luminance component of a color video. Because it represents the geometry information of the 3D scene, it is mainly used to synthesize the virtual view video rather than to be displayed directly. Therefore, block-based encoding and quantization of the depth map lead to serious artifacts in the rendered image, such as blocking, ringing or blurring [7]. To eliminate the artifacts in the reconstructed depth image, this paper designs a depth filter guided by the LBP operator, which describes the color image structure accurately. The filter is added to the H.264/MVC codec as an in-loop filter. The framework is shown in Fig. 1.

Fig. 1. Framework of H.264/MVC codec for a view

3. THE LBP GUIDED DEPTH FILTER

The bilateral filter is a non-linear filter that generates its output as a weighted average of the neighboring input pixels. The weights depend on the geometric closeness and photometric similarity between the input pixels and the current filtered pixel [8]. The trilateral filter exploits not only the photometric similarity in the depth image but also the photometric similarity in the corresponding color image, which has a structure similar to that of the depth image. In this way, smaller weights are assigned to nearby outlier pixels, producing a smooth depth image while preserving the edges [4]. However, a large number of dissimilar pixels around the current pixel results in an unstable filtering output.

The pixels within an object have similar intensity values in the depth image. To eliminate the unstable influence of dissimilar input pixels, the proposed depth-image filter selects as filtering input only the local pixels within the object that contains the current filtered pixel. Owing to the structural similarity between depth and color images, the LBP feature of the color image is used to identify the pixels that are in the same object as the current pixel. Those similar depth pixels are then taken as filtering input to ensure a stable filtering output.

3.1. LBP operator

For each pixel $g_c$ of the color image, the local binary pattern (LBP) operator is derived from a local circularly symmetric neighbor set of $P$ members on a circle of radius $R$, denoted $LBP_{P,R}$. The parameter $P$ controls the quantization of the angular space, and $R$ is related to the spatial resolution of the operator [5]. Fig. 2 shows the locations of the neighboring pixel set, where $R$ equals 1 and $P$ is assigned as [...].

Fig. 2. The locations of the current pixel and the neighboring pixel set

In the derivation of the texture operator $T$, the joint distribution of the local neighboring pixels is expressed as

$$T = T(g_c, g_0, \ldots, g_{P-1}) \tag{1}$$

where $g_c$ is the color sample value at the location of the current filtered depth pixel, and $g_p$ ($p = 0, \ldots, P-1$) denotes the color sample values of its local circularly symmetric neighboring pixels. Assuming the differences $(g_p - g_c)$ are independent of $g_c$, the joint distribution of gray values is transformed into a joint difference distribution [9]:

$$T \approx T(g_0 - g_c, \ldots, g_{P-1} - g_c) \tag{2}$$

Since the signs of the differences $s(g_p - g_c)$ do not vary with the average luminance, we let the binary signs $s(g_p - g_c)$ replace the gray shift, where

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{3}$$

3.2. LBP guided filtering

Besides geometric closeness and photometric similarity in the depth image, the LBP-guided filter also depends on the LBP operator of the corresponding color image at the same location. In the local window $w$ labeled in Fig. 2, the depth sample values $d_p$ ($p = 0, \ldots, P-1$) of the neighboring pixels are weighted to obtain the filtered output $G_c$ for the current filtered depth pixel $d_c$. The LBP-guided filter is given by equation (4):

$$G_c = \frac{1}{F_c} \sum_{p=0}^{P-1} G_S(d_p - d_c)\, G_Q(p - c)\, LBP^{uni}_{P,R}(g_p - g_c)\, d_p \tag{4}$$

where $LBP^{uni}_{P,R}$ is the uniform local binary pattern of the nearby color pixels in window $w$. Expression (5) gives the summed weight $F_c$, the range filter kernel $G_S(d_p - d_c)$ with photometric similarity $(d_p - d_c)$, and the spatial kernel $G_Q(p - c)$ with geometric closeness $(p - c)$:

$$F_c = \sum_{p=0}^{P-1} G_S(d_p - d_c)\, G_Q(p - c)\, LBP^{uni}_{P,R}(g_p - g_c)$$

$$G_S(d_p - d_c) = \exp\left(-\frac{|d_p - d_c|^2}{\sigma_S^2}\right)$$

$$G_Q(p - c) = \exp\left(-\frac{|x_p - x_c|^2 + |y_p - y_c|^2}{\sigma_Q^2}\right) \tag{5}$$

The neighboring pixels in a local window around a depth object edge belong to different objects. In our filter, only the pixels in the same object as the current filtered pixel are taken as filtering input. As shown in Fig. 3, the LBP operator of the color image guides the classification of local pixels through its ability to describe edges. To describe the object edge direction accurately, the proposed filter omits the rotation of the LBP operator that is used in [5] to derive the minimum value. Therefore, the operator $LBP_{P,R}$ equals the binary joint distribution of $T$:

$$LBP_{P,R} = T(s(g_0 - g_c), \ldots, s(g_{P-1} - g_c))$$

$$LBP_{P,R}(g_p - g_c) = s(g_p - g_c), \quad p \in [0, P-1] \tag{6}$$

Fig. 3. Edge description of LBP operator

In Fig. 3, the LBP operator classifies the local pixels in window $w$ into two sets by binary labels; one set contains the pixels similar to the current filtered pixel, and the other consists of dissimilar pixels. However, the LBP operator cannot determine which set the current pixel $d_c$ belongs to. In equation (7), we compare the distances from the pixel sets divided by the LBP operator to the current filtered pixel, and the LBP values of the pixel set with the closer distance are assigned 1.

$$LBP^{uni}_{P,R}(g_p - g_c) = \begin{cases} LBP_{P,R}(g_p - g_c), & \text{if } sum_{zero} \ge sum_{one} \\ 1 - LBP_{P,R}(g_p - g_c), & \text{if } sum_{zero} < sum_{one} \end{cases} \tag{7}$$

where $p \in [0, P-1]$, and the variables $sum_{zero}$ and $sum_{one}$ represent the distances from the pixel sets partitioned by the LBP operator to the current pixel, as shown in equation (8):

$$sum_{zero} = \sum_{p=0}^{P-1} LBP_{P,R}(g_p - g_c)\, |d_p - d_c|$$

$$sum_{one} = \sum_{p=0}^{P-1} \left(1 - LBP_{P,R}(g_p - g_c)\right) |d_p - d_c| \tag{8}$$

where $d_p$ ($p \in [0, P-1]$) are the nearby pixels around the current filtered pixel in window $w$, and $d_c$ denotes the current filtered pixel in the depth image.

4. THE EXPERIMENT RESULTS

The critical contribution of depth video is to assist the synthesis of the virtual view rather than to be displayed directly. Therefore we evaluate the effectiveness of the depth-image filter not only by the coding efficiency of the depth image but also by the quality of the virtual view. The trilateral filter proposed in [4] is labeled method 1 and serves as the reference method.

Table 1. Depth coding result for ballet sequence

4.1. The encoding efficiency of the depth image

The coding efficiency of the depth video is evaluated in terms of PSNR and bitrate. The proposed in-loop filter is implemented in the reference software JMVC 6.0 [10]. The encoding configuration follows the standard given by JVT [11]. We set four basic QPs (22, 27, 32 and 37) for the depth video and one basic QP (22) for the color video. 100 frames are coded with 2 temporal reference pictures and 2 reference views, and the search range is ±96.
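
The LBP-guided filter of Section 3.2 (equations (4) to (8)) can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes P = 8 neighbors on the square ring around the pixel instead of interpolated circular samples, hypothetical kernel widths `sigma_s` and `sigma_q`, and toy sample values. Because several symbols of equations (7) and (8) were lost in this copy, the set selection follows the prose rule (the label set whose depths are closer to the current pixel receives weight 1). The guard for an empty label set and the fallback to the unfiltered value are additions of this sketch.

```python
import math

# Assumption: P = 8 neighbors at radius 1 on the pixel grid, as (dx, dy) offsets.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def lbp_labels(g_c, g_neighbors):
    """Binary labels s(g_p - g_c) as in equation (3): 1 if g_p >= g_c, else 0."""
    return [1 if g_p - g_c >= 0 else 0 for g_p in g_neighbors]


def same_object_mask(labels, d_c, d_neighbors):
    """Give weight 1 to the label set whose depths lie closer to d_c (prose rule)."""
    ones = sum(labels)
    if ones == len(labels):  # every neighbor has label 1: keep them all
        return list(labels)
    if ones == 0:            # every neighbor has label 0: keep them all
        return [1 - b for b in labels]
    dist_ones = sum(b * abs(d_p - d_c) for b, d_p in zip(labels, d_neighbors))
    dist_zeros = sum((1 - b) * abs(d_p - d_c) for b, d_p in zip(labels, d_neighbors))
    if dist_ones <= dist_zeros:
        return list(labels)
    return [1 - b for b in labels]


def lbp_guided_filter_pixel(d_c, d_neighbors, g_c, g_neighbors,
                            sigma_s=10.0, sigma_q=2.0):
    """Weighted average of the selected depth neighbors, following equation (4)."""
    mask = same_object_mask(lbp_labels(g_c, g_neighbors), d_c, d_neighbors)
    numerator = 0.0
    f_c = 0.0  # summed weight F_c of equation (5)
    for m, d_p, (dx, dy) in zip(mask, d_neighbors, OFFSETS):
        g_s = math.exp(-((d_p - d_c) ** 2) / sigma_s ** 2)    # range kernel G_S
        g_q = math.exp(-(dx * dx + dy * dy) / sigma_q ** 2)   # spatial kernel G_Q
        weight = g_s * g_q * m
        numerator += weight * d_p
        f_c += weight
    # Fallback (assumption of this sketch): keep d_c if no weight survives.
    return numerator / f_c if f_c > 0 else float(d_c)


# Toy edge: neighbors with dx = 1 belong to a darker, more distant object.
color_center = 60
color_ring = [60, 61, 20, 22, 19, 62, 60, 63]
depth_center = 100
depth_ring = [101, 99, 30, 31, 29, 100, 102, 100]

filtered = lbp_guided_filter_pixel(depth_center, depth_ring, color_center, color_ring)
print(filtered)
```

In the toy example the three neighbors across the edge are excluded, so the output is an average of depths from the current object only. Comparing summed distances favors small sets; comparing mean distances would be an alternative design, which the paper does not discuss.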

This paper uses the sequences Ballet and Breakdancer, with a resolution of 1024×768, provided by Microsoft. In the two MVD sequences, view 0 and view 2 are selected as the reference views, and view 1 is the virtual view. Table 1 and Table 2 give the depth coding results of view 2 for the two sequences. The rate-distortion (RD) curves in terms of bitrate and depth-image objective quality are shown in Fig. 4(a) and Fig. 4(b).

Table 2. Depth coding result for breakdancer sequence

Fig. 4. (a): Rate-distortion curves in terms of depth bitrate and depth quality (ballet). (b): Rate-distortion curves in terms of depth bitrate and depth quality (breakdancer).

4.2. The quality of the rendered video

The software VSRS 2.3 [12] is used to render the virtual view. Again, view 0 and view 2 are selected as the reference views, and view 1 is the virtual view. The PSNR value is calculated from the difference between the virtual view image and the original image: the virtual view image is rendered from the filtered depth image, whereas the original image is rendered from the original depth image. Table 3 and Table 4 give the rendering results for Ballet and Breakdancer. The RD curves in terms of depth rate and rendering objective quality are shown in Fig. 5. The subjective results are shown in Fig. 6.

Table 3. Rendering result for ballet sequence

Table 4. Rendering result for breakdancer sequence

4.3. Analysis of the results

Based on the above results, we calculate the bitrate saving with the Bjontegaard metric [13], which computes the average PSNR difference between RD curves. Fig. 4(a) and Fig. 4(b) show that the proposed depth filter reaches better depth coding efficiency than the state-of-the-art approach [4]. The PSNR gains of the depth image reach up to 0.64 dB and 0.69 dB, which translate to bitrate savings of 6.55% and 10.45% for Ballet and Breakdancer respectively. The average PSNR gain is therefore 0.66 dB for depth coding. Fig. 5(a) and Fig. 5(b) demonstrate that the rendering quality achieves PSNR gains of 0.26 dB and 0.63 dB, which correspond to bitrate savings of 8.19% and 23.58% for Ballet and Breakdancer respectively. Thus the average PSNR gain is 0.45 dB for the rendered video. However, the proposed filter incurs a slightly higher bitrate cost for depth video coding at low bitrates. From the subjective results shown in Fig. 6, we conclude that the rendered image quality generated by our LBP-guided filter is better than that of method 1. In particular, holes and blur are reduced by our method.
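
The quality measure and the averages quoted in the analysis can be sketched as follows. The sketch assumes 8-bit samples (peak value 255) and a plain arithmetic mean over the two sequences; the paper states neither explicitly. It does not implement the Bjontegaard metric [13], and the tiny 2×2 images are illustrative values only.

```python
import math


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized, non-empty images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_gain(gains_db):
    """Arithmetic mean of per-sequence PSNR gains (assumed averaging rule)."""
    return sum(gains_db) / len(gains_db)


# Per-sequence gains quoted in the text for Ballet and Breakdancer.
depth_gain = mean_gain([0.64, 0.69])      # about 0.665 dB
rendering_gain = mean_gain([0.26, 0.63])  # about 0.445 dB

# Illustrative 2x2 images: two samples differ by one gray level.
example_psnr = psnr([[100, 100], [100, 100]], [[101, 99], [100, 100]])
print(depth_gain, rendering_gain, example_psnr)
```

Under these assumptions the means come out at about 0.665 dB and 0.445 dB, which the paper reports as 0.66 dB and 0.45 dB.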

Fig. 5. (a): Rate-distortion curves in terms of depth bitrate and rendering quality (ballet). (b): Rate-distortion curves in terms of depth bitrate and rendering quality (breakdancer).

5. CONCLUSION

To solve the problem that neighboring depth pixels with significantly different values make the filtering output unstable, we made a first attempt to let the LBP operator guide depth image filtering. Since LBP describes the structural features of the color image accurately, it is used to select the similar nearby pixels that are in the same object as the current filtered pixel. The experimental results demonstrate that the proposed depth filter achieves better depth-image coding efficiency, rendering quality and subjective quality than the trilateral depth filter [4], the state-of-the-art reference method. However, the bitrate cost of the depth video increases slightly at low bitrates. This problem will be investigated in our future work.

6. REFERENCES

[1] P. Merkle, J.B. Singla, K. Muller, and T. Wiegand, Correlation histogram analysis of depth-enhanced 3d video coding, in Image Processing (ICIP), IEEE International Conference on, Sept. 2010.

[2] Y. Morvan, P. Merkle, and A. Smolic, The effects of multiview depth video compression on multiview rendering, Signal Processing: Image Communication, vol. 24, no. 1-2, pp. 73-88, January.

[3] Kwan-Jung Oh, A. Vetro, and Yo-Sung Ho, Depth coding using a boundary reconstruction filter for 3-d video systems, Circuits and Systems for Video Technology, IEEE Transactions on, vol. 21, no. 3, March.

[4] Shujie Liu, PoLin Lai, Dong Tian, and Chang Wen Chen, New depth coding techniques with utilization of corresponding video, Broadcasting, IEEE Transactions on, vol. 57, no. 2, June.

[5] T. Ojala, M. Pietikainen, and T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 7, July.

[6] B. Widrow, I. Kollar, and Ming-Chang Liu, Statistical theory of quantization, Instrumentation and Measurement, IEEE Transactions on, vol. 45, no. 2, April.

[7] C. Dorea, P. Yin, P. Lai, A. Ortega, and C. Gomila, Statistical theory of quantization, in Proc. of Visual Communic. and Image Proc., VCIP 09, San Jose, CA, USA, January.

[8] C. Tomasi and R. Manduchi, Bilateral filtering for gray and color images, in Computer Vision, Sixth International Conference on, Jan. 1998.

[9] Timo Ojala, Kimmo Valkealahti, Erkki Oja, and Matti Pietikäinen, Texture discrimination with multidimensional distributions of signed gray-level differences, Pattern Recognition, vol. 34, no. 3.

[10] ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, Draft reference software for mvc, in JVT-AE207, London, June.

[11] ISO/IEC JTC1/SC29/WG11, Common test conditions for multi-view video coding, in JVT-U211, Hangzhou, China, October.

[12] Kazuyoshi Suzuki, Reference software for view synthesis version 2.3, in m16090, June.

[13] G. Bjontegaard, Calculation of average PSNR differences between RD-curves, in VCEG Contribution VCEG-M33, 13th VCEG Meeting, Austin, Texas, USA, April 2001.


Fig. 6. Subjective rendering image results for sequences Ballet (a) and Breakdancer (b). (a1) and (b1): results by method 1. (a2) and (b2): results by our method. (a3) to (a6): magnified results of Ballet; (a3) and (a5), method 1; (a4) and (a6), our method. (b3) to (b6): magnified results of Breakdancer; (b3) and (b5), method 1; (b4) and (b6), our method.
