Network Image Coding for Multicast

David Varodayan, David Chen and Bernd Girod
Information Systems Laboratory, Stanford University
Stanford, California, USA
{varodayan, dmchen, bgirod}@stanford.edu

Abstract

We consider a new problem in network image coding for multicast. In a multihop mesh network, structured as a directed graph, all nodes decode and display reconstructions of the image (at possibly different qualities). Each node may also perform transcoding before transmitting data downstream in the network. The problem is the design of the coding and transcoding schemes to deliver the best image quality over the network. For a network with diamond topology, we show that multiple description coding combined with Wyner-Ziv transcoding is often superior to other methods. We argue further that the benefits are magnified for larger networks containing one or more diamond subnets. Our image coding experiments demonstrate that multiple description coding with Wyner-Ziv transcoding outperforms single description coding or multiple description coding with conventional transcoding, for both a diamond network and a two-hop mesh network with four branches.

Fig. 1. Two-hop mesh network with source node $S$ and viewers at intermediate nodes $N_i$ (for $1 \le i \le k$) and terminal node $T$. Link capacities are labeled in bits per pixel (bpp) of the original image.

I. INTRODUCTION

Consider a multihop mesh network as a directed graph, where each edge is labeled with its capacity (the maximum permitted bitrate). We are interested in multicast distribution of image content: all nodes decode and display the image, but at possibly different qualities. Each node is also permitted to perform transcoding before sending data to another node. This simple network flow graph applies equally to live streaming and file sharing, since it does not consider intermittent effects such as congestion, packet losses or departure of peers.
Given such a network flow graph, we are interested in the fundamental question: how can one achieve the best image quality across the network? The general answer is far from clear, even for very small networks. Information theoretic bounds are available for a few special cases that transform into other problems, such as successive refinement [1] and multiple description coding [2], [3].

We focus on practical image multicast in the two-hop mesh network shown in Fig. 1, for which tight theoretical bounds have yet to be derived. In this network, a source node $S$ communicates image content to viewers at intermediate nodes $N_i$ (for $1 \le i \le k$) and a terminal node $T$. The first-hop links $SN_i$ all support the same rate of $R$ bits per pixel (bpp) of the original image. The second-hop links $N_iT$ support different rates of $R_i$ bpp of the original image, where $R_i \le R$.

For the case $R_i = R$, Sarshar and Wu showed that multiple description coding produces a better quality reconstruction at node $T$ than single description coding, with little degradation of quality at the intermediate nodes $N_i$ [4]. Each intermediate node receives a different description of the original image and relays it to the terminal node, which combines them into a higher quality reconstruction. However, they did not consider transcoding at the intermediate nodes when the second-hop links support lower bitrates than the first-hop links. In this paper, we demonstrate that Wyner-Ziv transcoding [5] of the multiple descriptions exploits the redundancy between descriptions, improving network image coding performance beyond conventional transcoding.

In Section II, we consider the special case of $k = 2$, for which the network has diamond topology, and argue that any network which contains a diamond subnet with certain link capacities will benefit from Wyner-Ziv transcoding. Section III describes the implementation of the single description, multiple description and Wyner-Ziv image codecs.
Their performance in image multicast, for $k = 2$ and $k = 4$, is reported in Section IV.

II. DIAMOND NETWORK

The network with $k = 2$ is the simplest nontrivial form of the two-hop mesh network in Fig. 1. For simplicity, we fix the rate $R_1 = R$, but allow $R_2$ to vary from 0 to $R$.

Case 1: $R_2 = 0$. The network reduces to a tree, and each node has total downlink rate $R$. The source node $S$ should encode the image to a single description at rate $R$ and distribute this encoding along links $SN_1$, $SN_2$ and $N_1T$.

Case 2: $R_2 = R$. In this symmetric case, the terminal node $T$ receives total network flow of $2R$. If, as in Case 1, nodes $N_1$ and $N_2$ receive the same description encoded at rate $R$, then $T$ is limited to that quality. Sarshar and Wu [4] showed that it is better to send different rate-$R$ descriptions to $N_1$ and $N_2$, so that $T$ can combine the multiple descriptions into a higher quality reconstruction than in Case 1.

Case 3: $0 < R_2 < R$. When link $N_2T$ supports a lower rate than the other links, node $T$ receives network flow of $R + R_2$. The multiple description approach of Case 2 can be modified as follows. The first description (of rate $R$) is relayed to node $T$ via node $N_1$. The second description is transcoded at node $N_2$ from rate $R$ to $R_2$. Since the first description is correlated with the second and is already available at node $T$, the transcoding can exploit the redundancy between the descriptions. For this reason, Wyner-Ziv transcoding of the second description at node $N_2$ offers a higher quality second description than conventional transcoding. This in turn improves the quality of the overall reconstruction at node $T$.

In larger mesh networks containing one or more diamond subnets with link capacities as in Case 3, the improvement is greater because more links can benefit from Wyner-Ziv transcoding and the better reconstruction quality may propagate to downstream viewers. In particular, the two-hop mesh network in Fig. 1 consists of $k - 1$ nested diamond subnets. At the top level, one can group the links into two bundles passing through complementary subsets of the intermediate nodes.

III. NETWORK IMAGE CODECS

A. Single Description Coding

We use the JPEG standard to generate a single description of the image. To avoid blocking artifacts at low rate $R$, the image is subsampled before applying JPEG. We minimize aliasing in resampling using a Lanczos-3 interpolation kernel [6].

B. Multiple Description Coding

We create spatially-interleaved multiple descriptions of an image via polyphase subsampling [7]. As shown in Fig. 2, the pixels are divided into four phases, each of which is encoded with JPEG into one of four descriptions. To preserve the original resolution, we do not prefilter the image before polyphase subsampling. Instead, we apply different smoothing kernels at reconstruction depending on which descriptions are available at each node.

Fig. 2. Polyphase subsampling of image pixels to create four descriptions.

Each intermediate node receives one of the multiple descriptions at a quarter of the original resolution. We postfilter the JPEG reconstruction to reduce aliasing artifacts. The terminal node in the $k = 2$ network receives the first and second descriptions only, possibly transcoded. These two descriptions are smoothed, and the missing pixels from the third and fourth phases are interpolated bilinearly. In the $k = 4$ network, the terminal node receives all four descriptions, possibly transcoded, to which we apply a smoothing postfilter.

C. Wyner-Ziv Transcoding of Multiple Descriptions

In the $k = 2$ diamond network described in Section II, the Wyner-Ziv transcoder employs an encoder at node $N_2$ and a decoder at node $T$. In line with the principles of source coding with decoder side information, the encoder transcodes the second description in the absence of the first description, and the decoder reconstructs the second description with reference to the first. The decoder side information is the bilinear interpolation of the first description to the pixel positions of the second phase.

Fig. 3. Wyner-Ziv codec: encoder at node $N_i$ and a decoder at node $T$. [Block diagram. Encoder at node $N_i$: $i$th description (rate $R$), 8x8 DCT, quantizer, Gray mapper, LDPC encoder, syndrome (rate $R_i$). Decoder at node $T$: LDPC/Gray decoder, inverse quantizer, inverse 8x8 DCT, reconstructed $i$th description, with side information from the 8x8 DCT of the interpolated 1st to $(i-1)$th descriptions.]

In the network with $k = 4$, the second description is transcoded as above. The third and fourth descriptions are also Wyner-Ziv transcoded for transmission along links $N_3T$ and $N_4T$.
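The polyphase split of Section III-B and the side information used for the second description can be illustrated with a minimal sketch. It is not the authors' implementation. The phase layout (even rows alternating phases 1 and 3, odd rows alternating phases 4 and 2) is an assumption inferred from the zero positions of the $k = 2$ smoothing kernel in Section IV-C, and the names `PHASE_OFFSETS`, `polyphase_split` and `side_info_for_phase2` are hypothetical. JPEG coding of each phase is omitted.

```python
# Assumed phase layout (inferred, not stated explicitly in the text):
#   even rows: 1 3 1 3 ...
#   odd rows:  4 2 4 2 ...
PHASE_OFFSETS = {1: (0, 0), 2: (1, 1), 3: (0, 1), 4: (1, 0)}


def polyphase_split(image):
    """Split an image (list of rows, even dimensions) into four quarter-size phases."""
    return {
        phase: [row[dc::2] for row in image[dr::2]]
        for phase, (dr, dc) in PHASE_OFFSETS.items()
    }


def side_info_for_phase2(desc1):
    """Bilinearly interpolate description 1 to the phase-2 pixel positions.

    Under the assumed layout a phase-2 pixel lies at the centre of four
    phase-1 pixels, so bilinear interpolation reduces to their average.
    Indices are clamped at the bottom and right borders.
    """
    h, w = len(desc1), len(desc1[0])
    out = []
    for i in range(h):
        i2 = min(i + 1, h - 1)
        row = []
        for j in range(w):
            j2 = min(j + 1, w - 1)
            row.append((desc1[i][j] + desc1[i][j2]
                        + desc1[i2][j] + desc1[i2][j2]) / 4.0)
        out.append(row)
    return out


# Toy 4x4 ramp image: pixel value = 10 * row + column.
image = [[10 * r + c for c in range(4)] for r in range(4)]
phases = polyphase_split(image)
side_info = side_info_for_phase2(phases[1])
```

On this smooth ramp, the interior side-information pixel `side_info[0][0]` coincides with the true phase-2 pixel `phases[2][0][0]`, which illustrates the correlation between descriptions that Wyner-Ziv transcoding exploits. On real images the match is only approximate.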
The decoder side information is the bilinear interpolation of the first two descriptions to the positions of the third and fourth phases, respectively.

Fig. 3 shows the block diagram of the Wyner-Ziv codec for transcoding the $i$th description at intermediate node $N_i$, and reconstructing it in the presence of the first to $(i-1)$th descriptions at terminal node $T$. At the Wyner-Ziv encoder, the $i$th description is transformed blockwise using an $8 \times 8$ DCT in order to concentrate the signal energy into a few transform coefficients. The coefficients are then quantized with a quantization matrix specified by a single quality factor. The quantization indices are binarized using a Gray mapping that maximizes bitwise correlation with the binarized version of the side information. Finally, the Gray bits are encoded into the syndrome of a low-density parity-check (LDPC) code, as in [8], at a rate of $R_i$ bpp of the original image.

The choice of quantization matrix quality factor is critical to whether the transmitted LDPC syndrome is decodable at terminal node $T$, because the bitstream from a coarse quantizer can be decoded (with reference to the side information) at lower rates than the bitstream from a finer quantizer. The decision is made in advance at the source node $S$, since it has access to all descriptions, and the selected quality factor is communicated to the Wyner-Ziv encoder at node $N_i$ at negligible additional rate.

The Wyner-Ziv decoder at node $T$ operates as follows. It receives the LDPC syndrome at an LDPC/Gray decoder, where all Gray bitplanes of the transcoded quantization indices of the $i$th description are recovered jointly, using the DCT coefficients of the side information. Following inverse quantization and inverse $8 \times 8$ DCT, the $i$th description is reconstructed.

Fig. 4. Example LDPC/Gray decoder factor graph. [Graph with three layers of nodes: syndrome nodes, Gray bit nodes and coefficient nodes.]

The LDPC/Gray decoder has a factor graph like the one in Fig. 4 [9]. It is an LDPC factor graph augmented with nodes representing each coefficient to be decoded. The DCT coefficients of the side information seed the coefficient nodes with probability distributions. These beliefs are propagated through the graph, reconciled with the received values in the syndrome nodes, and propagated back to the coefficient nodes, by means of the sum-product algorithm [9]. As mentioned above, the appropriate selection of quantization matrix quality factor guarantees that the coefficient distributions will converge after a fixed number of sum-product iterations. Further explanation of the LDPC/Gray decoder is available in [10] and [11].

IV. SIMULATION RESULTS

In the following simulations, we multicast the luminance channel of the Lena image of original $512 \times 512$ resolution over the $k = 2$ and $k = 4$ networks, using single description coding and multiple description coding with and without Wyner-Ziv transcoding.

The coding and transcoding steps employ commonly-used quantization matrices, specified by the JPEG quality factor ranging from 0 to 100. A higher value represents higher quality and the value 50 gives the matrix in Annex K of the JPEG standard [12]. With respect to an original quality factor of 50, we find that transcoding (whether conventional or Wyner-Ziv) at quality factors 6, 8, 12, 23 or 50 is rate-distortion efficient.

The Wyner-Ziv transcoder treats a description as 16 separate tiles of resolution $64 \times 64$. It represents quantization indices to 8 Gray bits. We implement the LDPC code as a regular degree 3 code of length 32768 bits and flexible rate [13].

The reconstructions from (possibly transcoded) multiple descriptions are postfiltered with $3 \times 3$ kernels that mitigate visual artifacts while blurring the image as little as possible.

A. Single Description

For both the $k = 2$ and $k = 4$ networks, we set $R = 0.23$ bpp of the original image and $R_1 = R$. At this setting, a single description can be coded directly at quality factor 10, but it is subjectively better to subsample the image to quarter resolution with a Lanczos-3 kernel and code it at quality factor 54. Under single description coding, the same reconstruction at quarter resolution, shown in Fig. 5, is viewed at all nodes $N_i$ and $T$.

B. Multiple Descriptions at Intermediate Nodes

When $R = 0.23$ bpp of the original image for $k = 2$ or $k = 4$, the multiple descriptions obtained from polyphase subsampling can be coded at quality factor 50. Since the multiple descriptions are not antialias prefiltered, each description is postfiltered for viewing at its respective intermediate node, using the filter kernel

$$\begin{bmatrix} 0.05 & 0.1 & 0.05 \\ 0.1 & 0.4 & 0.1 \\ 0.05 & 0.1 & 0.05 \end{bmatrix}.$$

This kernel produces the reconstruction of the first description shown in Fig. 6, which is representative of the reconstructions at other intermediate nodes. This reconstruction is slightly blurrier than the single description reconstruction, but superior full resolution reconstructions at the terminal node now become possible.

C. Multiple Descriptions at Terminal Node for k = 2

In the $k = 2$ network, the terminal node $T$ receives the first and second descriptions of the image.
Since $R_1 = R$, node $N_1$ simply relays the unfiltered first description to $T$ without transcoding. Fig. 7 plots the rate $R_2$ (normalized by $R$) required for conventional and Wyner-Ziv transcoding of the unfiltered second description at node $N_2$ at different quality factors. The rate required increases with greater quality for both transcoding methods, but Wyner-Ziv transcoding requires significantly less rate than conventional transcoding at equal quality, because it exploits the redundancy between the descriptions. Observe that around a link capacity of $R_2 = 0.32R$, Wyner-Ziv transcoding operates at quality factor 23, but conventional transcoding operates at only 8.

The terminal node $T$ combines the first and transcoded second descriptions in two postfiltering steps. First, the pixels of the two received descriptions are smoothed using the filter kernel

$$\begin{bmatrix} 0.15 & 0 & 0.15 \\ 0 & 0.4 & 0 \\ 0.15 & 0 & 0.15 \end{bmatrix}.$$

Note that the zeros lie on the unoccupied pixel positions of the third and fourth phases of the original image. Next, these third and fourth phase positions are bilinearly interpolated.
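The two postfiltering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: phase 1 is assumed to occupy the (even, even) positions and phase 2 the (odd, odd) positions, borders are handled by mirror reflection, and the bilinear interpolation of a missing pixel is taken to be the average of its four horizontally and vertically adjacent smoothed pixels. The names `KERNEL_K2`, `reflect` and `reconstruct_k2` are hypothetical.

```python
# Smoothing kernel quoted in the text; its zeros fall on phase-3/4 positions.
KERNEL_K2 = [[0.15, 0.0, 0.15],
             [0.0, 0.40, 0.0],
             [0.15, 0.0, 0.15]]


def reflect(i, n):
    """Mirror an out-of-range index back into [0, n - 1].

    For even n the mirrored index keeps its parity, so a received pixel
    is always mirrored onto another received pixel.
    """
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def reconstruct_k2(desc1, desc2):
    """Combine two quarter-resolution descriptions into a full-resolution image."""
    h, w = 2 * len(desc1), 2 * len(desc1[0])
    grid = [[0.0] * w for _ in range(h)]
    # Assumed layout: phase 1 at (even, even), phase 2 at (odd, odd).
    for i in range(h // 2):
        for j in range(w // 2):
            grid[2 * i][2 * j] = float(desc1[i][j])
            grid[2 * i + 1][2 * j + 1] = float(desc2[i][j])

    out = [[0.0] * w for _ in range(h)]
    # Step 1: smooth the received pixels with the 3x3 kernel.
    for r in range(h):
        for c in range(w):
            if (r + c) % 2 == 0:
                out[r][c] = sum(
                    KERNEL_K2[dr + 1][dc + 1]
                    * grid[reflect(r + dr, h)][reflect(c + dc, w)]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    # Step 2: fill the phase-3/4 positions from their four smoothed neighbours.
    for r in range(h):
        for c in range(w):
            if (r + c) % 2 == 1:
                out[r][c] = (out[reflect(r - 1, h)][c]
                             + out[reflect(r + 1, h)][c]
                             + out[r][reflect(c - 1, w)]
                             + out[r][reflect(c + 1, w)]) / 4.0
    return out


# A flat image must pass through unchanged because the kernel sums to one.
flat = [[100, 100], [100, 100]]
image = reconstruct_k2(flat, flat)
```

Because the kernel weights sum to one, a constant image is reproduced (up to floating-point rounding), which is a quick sanity check on the weights.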

Fig. 5. Single description reconstruction (quarter resolution) at all nodes $N_i$ and $T$, for $R = 0.23$ bpp of original image and $R_1 = R$.

Fig. 6. Multiple description reconstruction (quarter resolution) at node $N_1$ (representative of other nodes $N_i$), for $R = 0.23$ bpp of original image.

Fig. 7. Required rate $R_2$ for conventional and Wyner-Ziv transcoding at node $N_2$ using different quality factors. [Plot: $R_2/R$ (0 to 1) versus transcoding quality factor at $N_2$ (6, 8, 12, 23, 50), with one curve each for conventional and Wyner-Ziv transcoding.]

With $R_2 = 0.32R$, Figs. 8(a) and (b) show the reconstructions at terminal node $T$ using conventional transcoding at quality factor 8 and Wyner-Ziv transcoding at quality factor 23, respectively. Both of these reconstructions are at full resolution, higher than the single description reconstruction in Fig. 5, but only the reconstruction resulting from Wyner-Ziv transcoding is visually superior. The reconstruction resulting from conventional transcoding suffers from blocking artifacts that cannot be eliminated by filtering with a $3 \times 3$ kernel and bilinear interpolation.

D. Multiple Descriptions at Terminal Node for k = 4

The $k = 4$ network contains the $k = 2$ network as a subnet. In this section, we fix the following link rates: $R_1 = R$ and $R_2 = 0.32R$. Thus, the terminal node $T$ obtains the first and second descriptions just as in the $k = 2$ network described in the previous section. Specifically, the first description is received coded at quality factor 50 and the second description at quality factor 8 if conventionally transcoded, or at quality factor 23 if Wyner-Ziv transcoded.

Node $T$ additionally receives the third and fourth descriptions via nodes $N_3$ and $N_4$. Figs. 9(a) and (b) plot the rates $R_3$ and $R_4$ required to transcode the unfiltered third and fourth descriptions, respectively, at different quality factors.
As with the second description, Wyner-Ziv transcoding requires much less rate than conventional transcoding at the same quality. Notice, in particular, that around link capacities of $R_3 = R_4 = 0.28R$, Wyner-Ziv transcoding operates at quality factor 23, but conventional transcoding operates at only 6, for both the third and fourth descriptions.

At reconstruction, the pixels of all four received descriptions are postfiltered using the filter kernel

$$\begin{bmatrix} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{bmatrix}.$$

All 8 neighbors of a pixel are weighted equally in this filter because quantization mismatch between descriptions (rather than aliasing) creates the most visible artifacts.

For the setting $R_3 = R_4 = 0.28R$, Figs. 10(a) and (b) show the reconstructions at terminal node $T$ with the third and fourth descriptions conventionally transcoded at quality factor 6 and Wyner-Ziv transcoded at quality factor 23, respectively. As in the $k = 2$ network, the reconstruction resulting from Wyner-Ziv transcoding is visually superior to the one resulting from conventional transcoding, which suffers from severe blocking artifacts. Moreover, the $k = 4$ reconstruction in Fig. 10(b) is slightly sharper than the $k = 2$ reconstruction in Fig. 8(b), especially around horizontal and vertical edges like those found in Lena's hair.
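For orientation, the link rates used in these experiments can be converted to bits per pixel and summed into the total flow arriving at node $T$. The following worked arithmetic uses only the values stated in the text ($R = 0.23$ bpp, $R_1 = R$, $R_2 = 0.32R$, $R_3 = R_4 = 0.28R$); the phrase "inflow at $T$" is shorthand introduced here for the sum of the second-hop link rates.

```latex
\begin{align*}
R_2 &= 0.32R = 0.32 \times 0.23 = 0.0736 \text{ bpp}, \\
R_3 = R_4 &= 0.28R = 0.28 \times 0.23 = 0.0644 \text{ bpp}, \\
\text{inflow at } T \; (k = 2) &= R_1 + R_2 = 1.32R = 0.3036 \text{ bpp}, \\
\text{inflow at } T \; (k = 4) &= R_1 + R_2 + R_3 + R_4 = 1.88R = 0.4324 \text{ bpp}.
\end{align*}
```

In the $k = 4$ setting, node $T$ therefore receives less than half of the $4R = 0.92$ bpp that relaying four untranscoded descriptions would require.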

Fig. 8. Multiple description reconstructions (full resolution) in $k = 2$ network at terminal node $T$, for $R = 0.23$ bpp of original image, $R_1 = R$ and $R_2 = 0.32R$, using (a) conventional transcoding and (b) Wyner-Ziv transcoding.

V. CONCLUSIONS

We have investigated the problem of network image coding for multicast. All nodes in the multicast network decode and display reconstructions of the image (at possibly different qualities), and may also transcode their reconstructions for transmission to nodes downstream in the network. We showed that multiple description coding combined with Wyner-Ziv transcoding offers better reconstruction quality than multiple description coding combined with conventional transcoding, for the diamond network over a wide range of link rate settings. Furthermore, we argued that this property extends to more complicated networks that contain one or more diamond subnets. Larger networks benefit from more links that can use Wyner-Ziv transcoding and better reconstruction quality cascading to a larger number of viewers downstream.

We performed experiments with four spatially-interleaved multiple descriptions, using two in the diamond network and all four in a two-hop mesh network with four branches. Our results showed that Wyner-Ziv transcoding can produce image reconstructions of superior visual quality compared to single description coding and conventional transcoding of multiple descriptions.

In future work, we will apply these techniques to network video coding using temporally-interleaved multiple descriptions. Ultimately, we are interested in the theory and practice of network video coding for multicast over general network topologies.

ACKNOWLEDGMENT

This work has been supported, in part, by a gift from HP Labs.

REFERENCES

[1] W. Equitz and T. Cover, "Successive refinement of information," IEEE Trans. Inform. Theory, vol. 37, no. 2, pp. 269-275, Mar. 1991.
[2] L. Ozarow, "On a source-coding problem with two channels and three receivers," Bell Sys. Tech. J., vol. 59, no. 10, pp. 1909-1921, Dec. 1980.
[3] A. A. El Gamal and T. Cover, "Achievable rates for multiple descriptions," IEEE Trans. Inform. Theory, vol. 28, no. 6, pp. 851-857, Nov. 1982.
[4] N. Sarshar and X. Wu, "Rate-distortion optimized multimedia communication in networks," in Proc. Visual Commun. and Image Processing, San Jose, CA, 2008.
[5] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
[6] C. E. Duchon, "Lanczos filtering in one and two dimensions," J. Appl. Meteor., vol. 18, pp. 1016-1022, 1979.
[7] P. Subrahmanya and T. Berger, "Multiple descriptions encoding of images," in Proc. IEEE Data Compression Conf., Snowbird, UT, 1997.
[8] A. Liveris, Z. Xiong, and C. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440-442, Oct. 2002.
[9] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
[10] D. Varodayan, D. Chen, M. Flierl, and B. Girod, "Wyner-Ziv coding of video with unsupervised motion vector learning," EURASIP Signal Processing: Image Commun. J., vol. 23, no. 5, pp. 369-378, 2008.
[11] D. Chen, D. Varodayan, M. Flierl, and B. Girod, "Wyner-Ziv coding of multiview images with unsupervised learning of disparity and Gray code," in Proc. IEEE Internat. Conf. Image Processing, San Diego, CA, 2008.
[12] ITU-T and ISO/IEC JTC 1, "Digital compression and coding of continuous-tone still images," ISO/IEC 10918-1, ITU-T Recommendation T.81 (JPEG), Sept. 1992.
[13] D. Varodayan, A. Aaron, and B. Girod, "Rate-adaptive codes for distributed source coding," EURASIP Signal Processing J., vol. 86, no. 11, pp. 3123-3130, Nov. 2006.

Fig. 9. Required rates for conventional and Wyner-Ziv transcoding, (a) $R_3$ at node $N_3$ and (b) $R_4$ at node $N_4$, using different quality factors. [Plots: (a) $R_3/R$ and (b) $R_4/R$ (0 to 1) versus transcoding quality factor at $N_3$ and $N_4$, respectively (6, 8, 12, 23, 50), with one curve each for conventional and Wyner-Ziv transcoding.]

Fig. 10. Multiple description reconstructions (full resolution) in $k = 4$ network at terminal node $T$, for $R = 0.23$ bpp of original image, $R_1 = R$, $R_2 = 0.32R$ and $R_3 = R_4 = 0.28R$, using (a) conventional transcoding and (b) Wyner-Ziv transcoding.