Distributed Grayscale Stereo Image Coding with Improved Disparity and Noise Estimation

David Chen
Dept. of Electrical Engineering, Stanford University

Abstract

The problem of distributed coding of correlated grayscale stereo images is effectively addressed by an existing codec that uses block-based disparity compensation at the decoder. Based on the Slepian-Wolf theorem, one image can be transmitted at a rate approaching the conditional entropy if the other image is available as side information at the decoder. This paper extends the methods of the previous codec by refining disparity estimates to pixel precision, modeling the reconstruction error as a nonstationary random field, and generating more accurate initial disparity estimates. Relative to the previous codec, the new codec achieves 5-8 percent bit savings for lossless coding and up to 4 dB PSNR improvement for lossy coding. The new codec also converges faster to the correct reconstruction and removes visually unpleasant blocking artifacts.

I. INTRODUCTION

A pair of stereo images taken from slightly different positions but focused on the same scene share many common details. Conventionally, joint source coding would be performed to achieve better compression than separate coding of the two images. The assumption, however, is that the joint distribution of the two images is available at the encoder. In a distributed coding scenario, this assumption is no longer valid because the two images are encoded separately. The Slepian-Wolf theorem [1] proves that lossless distributed coding can theoretically achieve the same compression ratio as lossless joint coding, provided that the joint distribution is available at the decoder.

A novel and effective system for distributed coding of two correlated grayscale stereo images is presented in [2], an extension of earlier work for correlated binary sources [3]. The basic operation of the system is depicted in Fig. 1.
One image Y is transmitted using a conventional entropy encoder-decoder combination. For the distributed coding problem, it is assumed that Y is available error-free at the decoder. The other image X is coded using a low-density parity-check (LDPC) code, which has been shown to achieve good compression results for correlated binary sources [4]. At the LDPC encoder, pixel values of X are converted into a sequence of bits and transformed into a sequence of LDPC parity bits. Portions of the parity sequence are transmitted incrementally. The LDPC decoder attempts to reconstruct the original bit sequence from the received parity bits using a belief propagation algorithm similar to that in [4]. If decoding fails, the decoder requests that the encoder send more parity bits. The fewer parity bits needed for lossless reconstruction, the better the compression gain.

Fig. 1. Existing distributed grayscale stereo image coder from [2].

As expected on the basis of the Slepian-Wolf theorem, if X and Y are approximately related by a horizontal shift, then a correctly shifted version of Y can help lower the transmission rate for X. The system in Fig. 1 includes a disparity estimator to calculate the disparity D(x, y), or more accurately its distribution, between pixels in X and pixels in Y. The assumption is that

$$X(x, y) \approx Y(x - D(x, y),\, y). \tag{1}$$

The disparity estimator supplies the LDPC decoder with disparity-compensated side information. If this side information correctly reflects the true state of X, it helps the belief propagation algorithm converge towards the correct solution. The importance of disparity compensation for coding similar images is akin to the importance of motion compensation for coding consecutive video frames. The results in [2] show that using disparity-compensated side information can yield significant bit savings over the simplistic case in which D(x, y) = 0 is assumed.
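The shift model of Eq. (1) can be illustrated with a short sketch. This is not the codec's estimator, which works with a distribution over D rather than a single value per pixel. It only shows how side information would be formed from Y once an integer disparity field is given. The function name `disparity_compensate` and the clamping of out-of-range columns at the image border are assumptions made for illustration.

```python
import numpy as np

def disparity_compensate(Y, D):
    """Return S with S[y, x] = Y[y, x - D[y, x]] (horizontal shift only).

    Y : 2-D array, the side-information image.
    D : 2-D integer array of the same shape, the horizontal disparity field.
    Columns that would fall outside the image are clamped to the border
    (an assumption; the paper does not specify border handling).
    """
    Y = np.asarray(Y)
    D = np.asarray(D, dtype=int)
    h, w = Y.shape
    cols = np.arange(w)[None, :] - D      # source column for every pixel
    cols = np.clip(cols, 0, w - 1)        # clamp at the left/right borders
    rows = np.arange(h)[:, None]          # row index is unchanged
    return Y[rows, cols]

# Usage: a uniform disparity of one pixel shifts every row to the right.
Y = np.arange(12).reshape(2, 6)
S = disparity_compensate(Y, np.ones((2, 6), dtype=int))
```

With D identically zero the function returns Y unchanged, which corresponds to the case without disparity compensation.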
The previous work in [2] also correctly recognizes that, even with good disparity compensation, differences remain between X and the shifted version of Y. Errors remaining after disparity compensation are modeled as additive noise. Thus, the overall relationship between X and Y is

$$X(x, y) = Y(x - D(x, y),\, y) + N(x, y), \tag{2}$$

where N is modeled as Laplacian-distributed noise independent of X, Y, and D.

This paper focuses on improving the accuracy of the statistical models in [2] to achieve better compression of X. Specifically, we improve two aspects of the earlier models. Previously, estimates of the disparity D were calculated on a block-by-block basis and the noise N was assumed to be stationary. In the new work, the models are refined to estimate D at pixel-by-pixel resolution and to allow the noise N to be nonstationary across the image.

In Section 2, the methods for estimating disparity and additive noise in the existing decoder are discussed and reasons for refining these methods are addressed. In Section 3, the new decoder is presented in three parts. First, a disparity field with pixel-by-pixel resolution is created by interpolating a disparity field with only block-by-block resolution. Second, an edge-adaptive noise estimator is used to accurately predict the noise power as a function of spatial position. Third, a shape-adaptive disparity estimator is shown to be beneficial in the early iterations of the decoding process in guiding the convergence of the disparity fields towards an optimal solution. Section 4 presents experimental results. Using the same test images as in [2], the new system achieves bit savings of 5-8 percent for lossless coding and PSNR improvements of up to 4 dB for lossy coding compared to the previous system. Also, the new system converges using fewer iterations of the LDPC belief propagation algorithm. Finally, in Section 5, the work is summarized and related future research is proposed.

II. EXISTING ESTIMATION METHODS

A. Disparity Estimation

The system in [2] simplifies disparity estimation by calculating the distribution P(D) only on a block-by-block basis. If the image is subdivided into square blocks of size k-by-k, then all the pixels in a given block share the same P(D). Varying the block size k trades off disparity resolution against noise rejection. A small value of k permits finer resolution in correlating blocks in X with blocks in Y, but this is only useful if the information about X at the decoder is mostly correct.
Instead, when only a fraction of the LDPC parity bits have been received, the information about X is noisy and some false beliefs are propagated in the LDPC decoder. In the presence of noisy information, a larger value of k can help to limit the effect of isolated errors. The test images used in [2] and reused in the current paper are shown in Fig. 2, along with the ideal block-wise mean disparity values for k = 8.

Using a single distribution P(D) for all the pixels in a k-by-k block is equivalent to nearest neighbor interpolation of a small disparity field onto a larger disparity field. Specifically, if D_block(u, v) denotes the block-by-block disparity field and D(x, y) denotes the pixel-by-pixel disparity field, then

$$P(D(x, y)) = P\big(D_{\text{block}}(\lfloor x/k \rfloor,\, \lfloor y/k \rfloor)\big), \tag{3}$$

where the interpolation is performed for each value D(x, y) can take. Currently, it is assumed that D(x, y) ∈ [-5, 5]. The disparity field can then be visualized as eleven different probability planes, one for each value in [-5, 5], so the problem becomes equivalent to interpolation of eleven separate images.

Fig. 2. Grayscale stereo image X (8-bit, 72-by-88 pixels), horizontal disparity fields for correlating 8-by-8 blocks of X with 8-by-8 blocks of Y_1 through Y_4, and correlated stereo images Y_1 through Y_4, which are available at the decoder as side information.

As can be observed from Fig. 2, nearest neighbor interpolation has the disadvantage of causing step-like transitions at block boundaries in the interpolated disparity field. The actual pixel-wise disparity field has smooth transitions across space. Thus, in Section 3.1, nearest neighbor interpolation is replaced with bilinear interpolation, which increases the resolution of the disparity field and achieves smoother disparity transitions.

One other issue with the block-based approach is the appearance of false initial disparity estimates.
At relatively high bit rates, these false estimates can eventually be corrected as the LDPC belief propagation algorithm progresses. At lower bit rates, however, the starting disparity values can permanently bias the state of the system towards an incorrect disparity distribution. Fig. 3 illustrates the initial and final disparity fields used for matching test images X and Y from Fig. 2 at two different bit rates. The two cases have similar false, noisy initial disparity estimates, but only the high-rate case converges to the correct solution. In Section 3.3, a shape-based disparity estimator is introduced to provide better initial disparity estimates than those generated by the previous system at low bit rates.

Fig. 3. Initial and final disparity estimates at two different bit rates (panels: rate 4.48 bpp and rate 4.4 bpp). The previous block-based disparity estimator is used. For the higher rate, the final estimate is close to the ideal field in Fig. 2. At the lower rate, the final estimate is still very noisy.

B. Noise Estimation

An assumption in [2] is that the additive noise N in Eq. (2), which is equivalently the difference between the original image X and the reconstructed image X̂, has stationary statistics and is zero-mean Laplacian distributed; that is,

$$P(N) = \frac{\lambda}{2}\, e^{-\lambda |N|}, \tag{4}$$

where, by the stationarity assumption, λ is a space-invariant constant. The assumption of a Laplacian distribution for the reconstruction error is common and fairly accurate in image coding. Using the test image Y from Fig. 2, by reconstructing X from the disparity-compensated version of Y, the reconstruction error and its statistical distribution are shown in Fig. 4. As expected, P(N) is well modeled by a Laplacian, especially for small values of N. What Fig. 4 also reveals is that the energy of N is not equally distributed across the image. Instead, the energy is concentrated near edges of objects, where disparity compensation on a block-by-block basis performs worst. Thus, the assumption of a stationary N is not accurate. In Section 3.2, a nonstationary model for N is employed to provide space-variant noise estimates for the LDPC decoder.

Fig. 4. Reconstruction error N = X - X̂ after disparity compensation between X and Y using the previous system after iterations of LDPC decoding, and distributions P(N) of the error over the whole image, in edge regions, and in flat regions. Rate = 4.4 bpp.

III. IMPROVED ESTIMATION METHODS

An overview of the new codec is presented in Fig. 5, which can be compared to Fig. 1. Three new blocks are introduced in the decoder. First, the output of the existing block-by-block disparity estimator is filtered by a bilinear interpolator prior to use by the LDPC decoder. Second, a noise estimator uses edge information from the reference image Y to provide the LDPC decoder with more accurate, spatially dependent noise statistics. Third, a separate disparity estimator uses shape information from Y to provide an initial estimate of the disparity field, which is found to be useful for low-rate applications. In the following subsections, the details and advantages of each of the three new blocks are discussed.

Fig. 5. Newly proposed distributed grayscale stereo image coder.

A. Disparity Field Interpolation

Nearest neighbor interpolation of a block-by-block disparity field onto a pixel-by-pixel disparity field can yield unnatural step-like transitions, as explained in Section 2.1. A better method of increasing disparity resolution is bilinear interpolation, which mitigates the boundary effects and provides smooth transitions between blocks. The idea is similar to overlapped block motion compensation for video coding [5]. Again, if D_block(u, v) denotes the block-by-block disparity field and D(x, y) denotes the pixel-by-pixel disparity field, then bilinear interpolation sets

$$\begin{aligned}
P(D(x, y)) ={}& a_1\, P\big(D_{\text{block}}(\lfloor x/k \rfloor,\, \lfloor y/k \rfloor)\big) + a_2\, P\big(D_{\text{block}}(\lfloor x/k \rfloor + 1,\, \lfloor y/k \rfloor)\big) \\
&+ a_3\, P\big(D_{\text{block}}(\lfloor x/k \rfloor,\, \lfloor y/k \rfloor + 1)\big) + a_4\, P\big(D_{\text{block}}(\lfloor x/k \rfloor + 1,\, \lfloor y/k \rfloor + 1)\big).
\end{aligned} \tag{5}$$
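The bilinear interpolation of Eq. (5) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the codec's implementation: the weights are taken to be the usual bilinear area weights normalized by k^2, the function name `bilinear_upsample_planes` and the array layout (disparity value, block row, block column) are hypothetical, and block indices beyond the last block are clamped to the border, which the paper does not specify.

```python
import numpy as np

def bilinear_upsample_planes(P_block, k):
    """Interpolate block-wise probability planes to pixel resolution.

    P_block : array of shape (V, Bh, Bw), one plane per disparity value,
              with P_block[:, v, u] summing to 1 for every block (u, v).
    k       : block size in pixels.
    Returns an array of shape (V, Bh * k, Bw * k).
    """
    P_block = np.asarray(P_block, dtype=float)
    V, Bh, Bw = P_block.shape
    y = np.arange(Bh * k)
    x = np.arange(Bw * k)
    v0, u0 = y // k, x // k                  # floor(y/k), floor(x/k)
    v1 = np.minimum(v0 + 1, Bh - 1)          # neighbor block, clamped (assumption)
    u1 = np.minimum(u0 + 1, Bw - 1)
    fy = (y - k * v0)[:, None]               # vertical offset inside the block
    fx = (x - k * u0)[None, :]               # horizontal offset inside the block
    a1 = (k - fx) * (k - fy) / k**2          # weight of block (u0, v0)
    a2 = fx * (k - fy) / k**2                # weight of block (u0 + 1, v0)
    a3 = (k - fx) * fy / k**2                # weight of block (u0, v0 + 1)
    a4 = fx * fy / k**2                      # weight of block (u0 + 1, v0 + 1)
    return (a1 * P_block[:, v0][:, :, u0] + a2 * P_block[:, v0][:, :, u1]
            + a3 * P_block[:, v1][:, :, u0] + a4 * P_block[:, v1][:, :, u1])

# Usage: eleven planes (disparities -5..5) on a 3-by-4 block grid, k = 8.
rng = np.random.default_rng(0)
P = rng.random((11, 3, 4))
P /= P.sum(axis=0, keepdims=True)            # make each block a distribution
Q = bilinear_upsample_planes(P, 8)           # shape (11, 24, 32)
```

Because the four weights are nonnegative and sum to one at every pixel, each pixel's eleven interpolated values again form a probability distribution.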

The interpolation coefficients in Eq. (5) are

$$\begin{aligned}
a_1 &= \frac{\big(k(\lfloor x/k \rfloor + 1) - x\big)\big(k(\lfloor y/k \rfloor + 1) - y\big)}{k^2}, &
a_2 &= \frac{\big(x - k\lfloor x/k \rfloor\big)\big(k(\lfloor y/k \rfloor + 1) - y\big)}{k^2}, \\
a_3 &= \frac{\big(k(\lfloor x/k \rfloor + 1) - x\big)\big(y - k\lfloor y/k \rfloor\big)}{k^2}, &
a_4 &= \frac{\big(x - k\lfloor x/k \rfloor\big)\big(y - k\lfloor y/k \rfloor\big)}{k^2}.
\end{aligned}$$

The coefficients a_1 through a_4 are chosen so that spatially closer blocks contribute more to the weighted sum. Also, a_1 through a_4 are normalized by the area of the k-by-k block so that bilinear interpolation creates a convex combination of the disparity probabilities.

Bilinear interpolation of the block-by-block disparity field has two advantages. First, the method generates smooth transitions between blocks, which mimic the natural low-pass patterns of the real disparity field. Second, since the disparity field consists of probabilities on the interval [0, 1], the convex combination of these probabilities formed during bilinear interpolation remains on the interval [0, 1]. Thus, probabilities map to probabilities. Other interpolation methods that can map probabilities to values outside [0, 1] would require normalization afterwards.

Fig. 6 shows the pixel-wise disparity fields produced by the new system using bilinear interpolation during the decoding process, using Y_1 through Y_4 from Fig. 2 as side information. Each disparity field is recorded after the iteration in which the LDPC decoder converged to an errorless reconstruction. As desired, there are no longer any sharp boundary transitions but rather smooth transitions between the original disparity values at the block centers. In Section 4, it will be shown that this method gives bit savings for lossless coding over nearest neighbor interpolation and provides faster convergence in LDPC decoding.

Fig. 6. Bilinear interpolations of the block-wise disparity fields D_1 through D_4 after a sufficient number of iterations for LDPC convergence. Rate = 4.4 bpp.

B. Edge-Adaptive Noise Estimation

The nonstationary nature of the noise N in Eq. (2) was discussed in Section 2.2.
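As a reference point for the generalization that follows, the stationary Laplacian model can be written out numerically. The sketch below assumes the standard zero-mean Laplacian density with rate λ, whose variance is 2/λ², so that a noise standard deviation σ corresponds to λ = √2/σ. The function names are illustrative and do not come from the paper.

```python
import numpy as np

def laplacian_rate(sigma):
    """Rate parameter lambda of a zero-mean Laplacian with standard deviation sigma."""
    return np.sqrt(2.0) / sigma

def laplacian_pdf(n, lam):
    """Density (lam / 2) * exp(-lam * |n|) evaluated at n (scalar or array)."""
    return 0.5 * lam * np.exp(-lam * np.abs(n))

# Usage: a larger sigma gives a smaller lambda and a flatter, wider density.
lam_flat = laplacian_rate(5.0)    # low-variance region
lam_edge = laplacian_rate(20.0)   # high-variance region
```

A single λ for the whole image corresponds to the stationary assumption. The nonstationary model lets λ depend on the pixel position.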
It is observed from Fig. 4 that the energy of the noise is heavily concentrated around edges of objects and close to zero in low-frequency regions. Therefore, the generalization of the statistical model of N presented here, from stationary to nonstationary, relies on edge information. The new noise estimator extracts a binary edge image Y_edge(x, y) of Y using a Canny edge detector [6], where Y_edge(x, y) = 1 indicates that an edge crosses the point (x, y). Compared to simpler edge detectors, the Canny edge detector is known to perform better at identifying actual object edges and rejecting spurious noise-induced edges. From the observation that errors after disparity compensation are higher along edges, the noise estimator assigns the noise variance σ², or equivalently the Laplacian parameter λ = √2/σ, by setting

$$\lambda(x, y) = \begin{cases} \lambda_{\text{edge}}, & \text{for } Y_{\text{edge}}(x, y) = 1 \\ \lambda_{\text{flat}}, & \text{for } Y_{\text{edge}}(x, y) = 0 \end{cases} \tag{6}$$

where λ_edge < λ_flat, reflecting greater noise variance along edges. This classification has the benefit of being a simple criterion to apply to any image while still yielding a fairly accurate prediction of spatial variations in noise energy. Since edges in X and Y do not align perfectly, it may also be helpful to broaden the edges beyond the pixel-wide edges generated by the Canny edge detector. For this purpose, dilation or another morphological operator can be used. During the decoding process, the variances can be slightly modified to achieve LDPC convergence at lower bit rates or in fewer iterations. Experimental results in Section 4 demonstrate that use of this new noise model yields bit savings for lossless transmission of X and contributes to speeding up the convergence of LDPC decoding.

C. Shape-Adaptive Disparity Estimation

Section 2.1 commented on how, as the bit rate decreases, the previous block-based disparity estimator can produce false initial disparity estimates which the decoder cannot correct over time.
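The edge-adaptive rule of Eq. (6) in the preceding subsection can be sketched as below. Two simplifications are made and should be read as assumptions: a plain gradient-magnitude threshold stands in for the Canny detector used in the paper, and the dilation is a square-window binary dilation written directly in NumPy. The function names, the threshold, and the two σ values are illustrative only.

```python
import numpy as np

def edge_mask(Y, threshold):
    """Binary edge map from gradient magnitude (a simple stand-in for Canny)."""
    gy, gx = np.gradient(np.asarray(Y, dtype=float))
    return np.hypot(gx, gy) > threshold

def dilate(mask, radius=1):
    """Binary dilation with a (2r+1)-by-(2r+1) square window."""
    r = radius
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]   # OR over all window shifts
    return out

def lambda_map(Y, sigma_edge, sigma_flat, threshold=20.0, radius=1):
    """Per-pixel Laplacian parameter: small lambda on edges, large elsewhere."""
    edges = dilate(edge_mask(Y, threshold), radius)
    lam_edge = np.sqrt(2.0) / sigma_edge          # lambda = sqrt(2) / sigma
    lam_flat = np.sqrt(2.0) / sigma_flat
    return np.where(edges, lam_edge, lam_flat), edges

# Usage: a vertical step edge in an otherwise flat 8-by-8 image.
Y = np.zeros((8, 8))
Y[:, 4:] = 100.0
lam, edges = lambda_map(Y, sigma_edge=20.0, sigma_flat=5.0)
```

The dilation radius controls how far the high-variance band extends around each detected edge, which is one way to allow for edges in X and Y that do not align exactly.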
For these low-rate situations, a new shape-based disparity estimator is developed. This estimator extracts dominant shapes from the image and assigns the same disparity distribution to all pixels within a single shape. The underlying assumption is that shape boundaries approximately separate regions of different disparity. In Fig. 7, the disparity field for test image Y from Fig. 2 is plotted alongside a segmented version of Y. In the segmented image, each shape has been labeled with the mean value of the corresponding disparity region that is most similar structurally. The shape-based disparity field is then a better initial estimate than a noisy starting distribution like that in Fig. 3.

Fig. 7. Disparity field D and segmentation of Y using the graph partitioning algorithm of [9]. The segmented image has each of its regions labeled with the disparity value of the closest structurally matching disparity region in D.

Image segmentation must first be performed on the reference image Y. Several popular algorithms have been considered: K-means clustering [7], region growing [8], and graph partitioning [9]. For the test images used, the method in [9] outperformed the other two methods in terms of identifying dominant shapes. The algorithm works by creating an undirected graph using the pixels of the image as nodes, where pixels of the same region are connected by edges. Initially, every pixel can be connected to every other pixel, which is the trivial case of a single region as large as the image. Then, progressively, weights are assigned along the edges based on pixel similarities. High edge weights represent dissimilarity. Thus, boundaries should form by cutting across the edges with the highest weights. The algorithm is iterative and stops when no more changes above a noise threshold take place between iterations.

The shape-adaptive disparity estimator is only meant to be used in the early iterations of LDPC decoding. In the later iterations, the block-based estimator, now supplied with a good rather than a poor starting disparity field, followed by the bilinear interpolator should be used to calculate the final disparity distributions.

An optimal method for choosing disparity values inside each segmented region has not yet been found. Instead, the heuristic currently used is to try different combinations of disparity values until one combination causes LDPC convergence or all combinations have been exhausted. The search should begin by assigning disparity values near zero first. This heuristic is only practical if the number of segmented regions is small, as is the case for the test images used. For n different regions and disparity taking values in the range [-5, 5], there are in the worst case 11^n distinct combinations that must be tried.

In Section 4, the simulation results show that using shape-adaptive disparity initialization provides some bit savings for lossless coding but, more importantly, provides significant PSNR improvements for lossy coding.

IV. EXPERIMENTAL RESULTS

Coding simulations are performed with the test images in Fig. 2. Each image is 72-by-88 pixels and has 8-bit depth. The LDPC code used to incrementally transmit X has a full length of 50688 bits [10]. A simple rate control scheme is employed. If after a maximum number of iterations the LDPC decoder still cannot reconstruct X without error, it requests additional parity bits from the LDPC encoder. The starting distribution P(D(x, y)) set before the first iteration of LDPC decoding is

$$P(D(x, y)) = \begin{cases} 0.75, & \text{for } D(x, y) = 0 \\ 0.025, & \text{for } D(x, y) \neq 0 \end{cases} \tag{7}$$

where D ∈ [-5, 5]. If shape-based disparity initialization is activated, then the output of the block-based disparity estimator, after bilinear interpolation, is replaced with the output of the shape-based disparity estimator for the first iterations. The primary focus is on improving lossless coding, although, not surprisingly, advancements made there also help improve lossy coding.

A. Bit Savings for Lossless Coding

In Table I, a comparison is made between the rates needed for lossless transmission of X using the different methods. The previous system of Fig. 1 provides a significant improvement, on average 3 percent lower bit rate, over decoding without disparity compensation. The newly proposed methods further reduce the bit rate. First, using only bilinear disparity interpolation in the system of Fig. 5, the bit rate is lowered by 3 percent from the previous system for images Y_1 through Y_3 and by 5 percent for image Y_4. Second, using both bilinear disparity interpolation and edge-adaptive noise estimation, the bit rate is lowered by 5 percent for images Y_1 through Y_3 and by 8 percent for image Y_4. Third, when shape-adaptive disparity initialization is also used, the bit rate for image Y can be reduced by 8 percent from the previous system. As the bit rate approaches the Slepian-Wolf lower bound, additional coding gains become increasingly difficult to achieve.

B. Faster Convergence for Lossless Coding

The proposed system reduces the number of LDPC decoding iterations needed for convergence to a correct reconstruction. Shape-adaptive disparity initialization is deactivated for this simulation, because the computation time spent searching for optimal starting disparity values in the segmented regions, as discussed in Section 3.3, would create an unfair comparison. Fig. 8 plots the number of iterations needed for convergence as a function of the bit rate. For each of the four test images, the proposed decoder converges faster than the previous decoder, and the improvements become more noticeable as the rate decreases.

C. PSNR Gains for Lossy Coding

The new methods also improve performance for lossy coding, if the decoder is used in that mode. After iterations of LDPC decoding, if convergence is not achieved, then whatever reconstructed image X̂ is decoded is designated as a lossy version of X. Reconstruction errors are examined for a range of rates lower than those used for lossless coding. The rate-distortion behavior for the four test images is shown in Fig. 9. All three proposed features are activated in the new system. Increases in PSNR of as much as 4 dB are obtained over the previous system. Although not shown, when shape-adaptive disparity initialization is deactivated, the PSNR gains are much smaller. Thus, the effect of having a good starting disparity estimate is even more pronounced for lossy coding than for lossless coding.
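For reference, the PSNR figures quoted for lossy coding can be computed as in the sketch below. The paper does not spell out its PSNR formula, so the usual definition for 8-bit images (peak value 255, mean squared error over all pixels) is assumed here, and the function name `psnr` is illustrative.

```python
import numpy as np

def psnr(X, X_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB between an image and its reconstruction."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    mse = np.mean((X - X_hat) ** 2)       # mean squared reconstruction error
    if mse == 0:
        return float("inf")               # lossless reconstruction
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage: every pixel off by one gray level gives an MSE of 1.
X = np.zeros((4, 4))
value = psnr(X, X + 1.0)                  # 10 * log10(255**2)
```

Under this definition a lossless reconstruction has infinite PSNR, so finite values arise only when the decoder stops before convergence.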

TABLE I
COMPARISON OF LOSSLESS BIT RATES FOR TRANSMITTING X USING DIFFERENT METHODS.
Columns (side information image): Y_1, Y_2, Y_3, Y_4
Rows:
Pixel-Wise Conditional Entropy for X Given Y (bpp)
No Disparity Compensation, Rate (bpp)
Previous System in Fig. 1, Rate (bpp)
Proposed System in Fig. 5 (Bilinear Only), Rate (bpp)
Proposed System (Bilinear + Noise Only), Rate (bpp)
Proposed System (Bilinear + Noise + Shape), Rate (bpp)

Fig. 8. Number of LDPC decoding iterations until convergence for previous codec (- +) and newly proposed codec (- o), evaluated for four side information sources Y_1 through Y_4.

D. Removal of Blocking Artifacts

Another consideration in lossy coding is the removal of blocking artifacts. If the block-based disparity estimator chooses the wrong disparity value, the reconstructed image X̂ usually contains a blocking artifact, in which an 8-by-8 block from Y is incorrectly copied into an 8-by-8 region in X̂. The new system helps to mitigate this problem in two ways. First, the bilinear interpolator smooths the disparity field and increases disparity resolution, so that errors in X̂ manifest as isolated errors rather than block-wise errors. Second, a correctly chosen shape-based disparity initialization lowers the probability of correlating the wrong blocks between X and Y. Fig. 10 shows lossy decodings of X using Y as side information, for the previous and proposed systems, at a rate of 3.87 bpp. The previous decoder generates blocking artifacts on the bear's left arm, chest, and face. In contrast, the new decoder produces isolated salt-and-pepper noise, which can be reduced by a postprocessing median filter, and avoids creating blocking artifacts.

V. CONCLUSION

This paper has developed a new codec for distributed grayscale stereo image coding, based on a previously successful codec that used block-wise disparity compensation.
The new codec refines the disparity field to pixel precision using bilinear interpolation, estimates the nonstationary variance of the noise using edge information, and initializes the disparity field using shape information. Compared to the previous codec, these three improvements result in 5-8 percent bit savings for lossless coding and as much as 4 dB gain in PSNR for lossy coding. Also, the number of iterations until convergence is reduced and visually disturbing blocking artifacts are removed.

Fig. 9. Rate-distortion curves (PSNR in dB) for previous codec (- +) and newly proposed codec (- o), evaluated for four side information sources Y_1 through Y_4.

Fig. 10. Reconstructed images showing blocking artifacts using the previous system (previous decoder) and removal of blocking artifacts using the proposed system (proposed decoder). Rate = 3.87 bpp.

The author proposes to revise the system in two ways in future research. First, it is worthwhile to explore efficient disparity initialization methods, as the quality of the initial disparity estimate can have a significant effect on the final outcome. A fast strategy for approximately estimating the disparity values of large shapes in the image is desirable. Second, the LDPC encoding currently used gives equal protection to all pixels in the image, when in fact not all pixels are equally important in affecting reconstruction quality. As observed during the experiments, edge pixels typically have higher noise variance in the reconstructed image because the accuracy of disparity compensation decreases around edges. Therefore, assigning more parity bits to edge regions at the LDPC encoder may result in better decoding.

ACKNOWLEDGMENT

The author would like to thank Prof. Bernd Girod, Dr. Markus Flierl, and David Varodayan for teaching an enjoyable class on image and video coding this quarter. Special thanks are given to David Varodayan for his helpful advice throughout this project.

REFERENCES

[1] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471-480, July 1973.
[2] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod, "Distributed grayscale stereo image coding with unsupervised learning of disparity," in Proceedings of IEEE Data Compression Conference, pp. 143-152, March 2007.
[3] D. Varodayan, A. Mavlankar, M. Flierl, and B. Girod, "Distributed coding of random dot stereograms with unsupervised learning of disparity," in Proceedings of IEEE International Workshop on Multimedia Signal Processing, pp. 5-8, Victoria, BC, Canada, Oct. 2006.
[4] A. Liveris, Z. Xiong, and C. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Communications Letters, vol. 6, no. 10, pp. 440-442, Oct. 2002.
[5] M. T. Orchard and G. J. Sullivan, "Overlapped block motion compensation: an estimation-theoretic approach," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 693-699, Sept. 1994.
[6] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, Nov. 1986.
[7] Y. L. Chang and X. Li, "Adaptive image region-growing," IEEE Transactions on Image Processing, vol. 3, no. 6, pp. 868-872, Nov. 1994.
[8] R. Nock and F. Nielsen, "Statistical region merging," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1452-1458, Nov. 2004.
[9] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," International Journal of Computer Vision, vol. 59, no. 2, Sept. 2004.
[10] D. Varodayan, A. Aaron, and B. Girod, "Rate-adaptive distributed source coding using low-density parity-check codes," in Proceedings of Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, 2005.
[11] A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
