
J. Vis. Commun. Image R. xxx (2005) xxx–xxx
www.elsevier.com/locate/jvci

Morphological wavelet-based stereo image coders

J.N. Ellinas *, M.S. Sangriotis

Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Panepistimiopolis, Ilissia, 157 84 Athens, Greece

Received 27 January 2004; accepted 1 October 2005

* Corresponding author. Fax: +30 210 5450962. E-mail address: iellinas@di.uoa.gr (J.N. Ellinas).
1047-3203/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jvcir.2005.10.005

Abstract

In this paper, we propose a family of novel stereoscopic image coders based on morphological coding and a block-based disparity compensation algorithm. The proposed schemes employ discrete wavelet transform decomposition and a morphological coder that lowers total entropy by exploiting the intra-band and inter-band statistical properties of the wavelet coefficients. This ensures high coding efficiency, embedded bit streams, fast execution, and simple implementation. The disparity compensation procedure is implemented on blocks of fixed or variable size employing the block-matching algorithm. The blocks of variable size are formed by a quad-tree decomposition of the Right image with a simplified rate-distortion criterion. This technique adapts block size to regions of almost constant binocular disparity, in contrast to disparity estimation with fixed block size, which divides these regions into smaller blocks and thus requires more disparity vectors. The Left and the resulting predictive error images are subsequently transformed, quantized, and coded. The wavelet nature of the algorithm and the proposed disparity compensation provide reconstructed images without blocking artifacts and with fewer annoying ringing effects. The extensive experimental evaluation shows that the proposed coders demonstrate very good performance as far as PSNR measures and visual quality are concerned, as well as low complexity.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Stereo image compression; Wavelet transform; Morphology; Disparity

1. Introduction

The perception of a scene with 3-D realism may be accomplished by a stereo image pair, consisting of two images of the same scene recorded from two slightly different perspectives. The two images, which are distinguished as Left and Right, present binocular redundancy and thus may be encoded more efficiently as a pair rather than independently [1]. Therefore, stereo image pairs provide a two-dimensional means to represent 3-D scenes. Stereoscopic vision has a very wide field of applications in robot vision, virtual machines, medical surgery, etc. Typically, the transmission or the storage of a stereo image requires twice the bandwidth or the capacity of a single image. The objective on a bandwidth-limited transmission system is to develop an efficient coding scheme that exploits the redundancies between the two images, that is, the intra-image and cross-image correlation or similarities. Transform coding removes intra-spatial redundancy from an image, and disparity compensation minimizes cross-image redundant information. A typical compression scenario is to encode one image, which is called the reference, and the disparity compensated prediction of the other, namely, the target. The disparity compensation procedure estimates the best prediction of the target image from the reference and results in an error image, which is called the residual, together with a disparity vector field. Therefore, the effectiveness of the encoding algorithm, the energy of the residual image, and the smoothness of the disparity field all affect the overall performance of a stereo image compression scheme. In this work, the Left and Right images are considered as reference and target, respectively.

In a recently proposed coder, a mixed coding scheme is used, employing the DCT for the best matching blocks and Haar filtering for the occluded ones [2]. Another DCT-based coder selects the quantization parameters for each block in the reference and residual images so as to minimize an averaged distortion measure and maintain a total bit budget [3]. A more advanced disparity compensation procedure proposes an overlapped block-matching scheme employing adaptive windows to improve the performance of the simple block-based schemes [4]. Another family of stereo image coders employs Shapiro's zero-tree monocular still image compression algorithm and the classical block-matching for disparity compensation with pixel or half-pixel accuracy [5]. One further method uses disparity compensation in the wavelet domain and a morphological coder as a single framework [6]. The novelty of the proposed work is the combination of a robust still image coder with a disparity compensation procedure in the spatial domain based on variable size block-matching.
The proposed coders are based on the discrete wavelet transform (DWT) decomposition in combination with the morphological representation of wavelet data (MRWD) encoding unit, which is a robust algorithm yielding very good lossy compression [15]. This algorithm exploits the statistical properties of the wavelet coefficients to form clusters and uses a morphological operator to efficiently capture and partition the significant and insignificant ones. The proposed disparity compensation is based on the segmentation of one view given the other and achieves a coding representation that is commensurate with the local disparity detail. Typical stereoscopic images contain areas of almost constant disparity, such as background or objects of large size. Disparity estimation schemes based on blocks of fixed size divide these areas into small blocks, creating more disparity vectors than are actually needed. To overcome this drawback, a disparity estimation based on a quad-tree segmentation of the Right image is proposed.

Stereo_coder A encodes the Left and residual images after an open-loop disparity compensated prediction with blocks of fixed size. Stereo_coder B is similar to stereo_coder A, except that it employs a closed-loop disparity compensated prediction with blocks of fixed size that reduces distortion at the decoder's side. In stereo_coder C, the Right image is split into blocks of variable size after quad-tree decomposition with a rate-distortion splitting criterion. Thus, the disparity compensation procedure becomes more effective, since it creates areas of near-constant disparity and devotes fewer bits to encoding them. The outstanding features of the proposed stereoscopic coders are the inherent advantages of the wavelet transform, the efficiency of the employed morphological compression algorithm, and the effectiveness of disparity compensation.
The main assets of the wavelet transform are the creation of almost decorrelated coefficients, energy compaction, and variable resolution. The morphological coder creates partitions of significant and insignificant coefficients in the wavelet transform domain, reducing total entropy. It offers excellent compression efficiency, low complexity, and fast execution, and may produce embedded bitstreams. The proposed disparity compensation is based either on the classical and simple fixed size block-matching algorithm (BMA) or on the variable size BMA, which is more efficient but more complex.

This paper is organized as follows. Section 2 describes the disparity compensation procedure employed by the proposed stereoscopic coders. Section 3 briefly explains the morphological coder, and experimental results are presented in Section 4. Finally, conclusions are summarized in Section 5.

2. The proposed disparity compensation procedure

The images of a stereoscopic pair represent the views of the same scene from two slightly different perspectives. Because of this, the points of the scene are recorded at different positions in the two images. This position difference is called disparity and may have a one- or two-dimensional representation. If the recording cameras are placed in parallel, the disparity represents only horizontal displacements. In this work, we consider the general case where disparity is a two-dimensional vector representing displacements in both horizontal and vertical directions.

The disparity estimation process tries to determine the correspondence for each pixel or block of pixels between the Left and Right images. Given the disparity estimate, the prediction of one image is formed, and the residual image results from subtracting the prediction from the original. After disparity compensated prediction, the Left and residual Right images are decomposed by the DWT and encoded by the MRWD algorithm. Fig. 1 shows the basic structure of stereo_coder A, which divides the Right image into 8 × 8 blocks and estimates the best matching blocks in the Left image. The disparity compensated difference (DCD) for a corresponding block $b_{ij}$ is defined as:

$$\mathrm{DCD}(b_{ij}) = \sum_{(x,y)\in b_{ij}} \left[ b^{R}_{ij}(x,y) - b^{L}_{ij}(x+dv_x,\; y+dv_y) \right], \tag{1}$$

where $b^{R}_{ij}$ and $b^{L}_{ij}$ denote the Right and Left image intensities, respectively, and $dv_x$ and $dv_y$ are the disparity vector components. The disparity vector for a corresponding block is estimated by:

$$\mathrm{DV}(b_{ij}) = [dv_x,\, dv_y]^{T} = \arg\min_{(dv_x,\,dv_y)\in S} \left| \mathrm{DCD}(b_{ij}) \right|, \tag{2}$$

where $S$ is the search window and the matching criterion is the minimum mean absolute difference (MAD).

Fig. 2 shows stereo_coder B, which performs the aforementioned disparity compensation procedure with the reconstructed Left image. This closed-loop disparity compensation is similar to that used for motion compensation in an MPEG coder. The open-loop disparity compensation used in stereo_coder A is less effective, although it is simpler, since there is no need for inverse quantization and an inverse wavelet transform at the encoder's side.
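The fixed-size block matching described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names `mad` and `best_disparity`, the exhaustive (full) search, the border handling, and the default search range of 6 pixels in each direction are assumptions, and images are taken to be plain lists of rows of grey levels. The cost is the MAD named in the text as the matching criterion.

```python
def mad(right, left, x0, y0, dx, dy, size):
    """Mean absolute difference between a Right block and a displaced Left block."""
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            total += abs(right[y][x] - left[y + dy][x + dx])
    return total / (size * size)


def best_disparity(right, left, x0, y0, size=8, search=6):
    """Full-search block matching (assumed strategy).

    Returns ((dv_x, dv_y), cost) for the Right block whose top-left corner
    is (x0, y0), minimising the MAD over the search window.
    """
    height, width = len(left), len(left[0])
    best_dv, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the Left image.
            if not (0 <= x0 + dx and x0 + dx + size <= width):
                continue
            if not (0 <= y0 + dy and y0 + dy + size <= height):
                continue
            cost = mad(right, left, x0, y0, dx, dy, size)
            if cost < best_cost:
                best_dv, best_cost = (dx, dy), cost
    return best_dv, best_cost
```

For open-loop compensation `left` would be the original Left image; for closed-loop compensation it would be the locally reconstructed Left image, with the search itself unchanged.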
The lower effectiveness of open-loop compensation is to be expected, because the decoder reconstructs the Right image with the aid of the reconstructed Left image rather than the original. In closed-loop compensation, the encoder must know the decoding rate of the Left image in order to produce the reconstructed version that will be used for disparity compensation.

Fig. 1. Block diagram of stereo_coder A. Disparity compensation is performed with the aid of the original Left image. [Diagram blocks: Left, Right, Disparity Compensation, Residual, DWT & Quantizer, Disparity vectors, MRWD, Entropy Coder.]

Fig. 2. Block diagram of stereo_coder B. Closed-loop disparity compensation is performed with the aid of the locally reconstructed Left image. [Diagram blocks: Left, Right, Disparity Compensation, Residual, DWT & Quantizer, Quantizer⁻¹ & Inverse DWT, MRWD, MRWD⁻¹, Disparity vectors, Entropy Coder.]

In the case of closed-loop compensation, Eq. (1) is modified as:

$$\mathrm{DCD}(b_{ij}) = \sum_{(x,y)\in b_{ij}} \left[ b^{R}_{ij}(x,y) - \hat{b}^{L}_{ij}(x+dv_x,\; y+dv_y) \right], \tag{3}$$

where $\hat{b}^{L}_{ij}$ is the corresponding block of the reconstructed Left image. Besides the perceptive reason for using such a disparity compensation technique, a mathematical proof is provided in [5]. The experimental evaluation of both coders in Section 4 confirms the superiority of this approach.

In stereo_coder C, the Right image is segmented into blocks of variable size according to a quad-tree splitting procedure [7–10]. The Right image is initially segmented into blocks of homogeneous intensity using quad-tree decomposition with an intensity difference threshold. These blocks probably belong to the same object or to the background and may present homogeneous disparity characteristics. Then, a second quad-tree decomposition with a simplified rate-distortion criterion follows, permitting an existing block to split into four children blocks only if there is a rate-distortion benefit from the split. Sethuraman et al. [8] employ a rectangular tiling segmentation scheme in a multi-resolution stereo pyramid structure. This segmentation is optimized by employing a similar rate-distortion criterion. Woo et al. [10] use a similar RD-based block segmentation for occluded blocks in an MRF framework. Our proposed segmentation scheme is based on a two-phase splitting procedure performed in the spatial domain.

A brief description of the method follows. The Right image is initially quad-tree decomposed using a suitable intensity difference splitting criterion. According to this criterion, a block splits into four children blocks if the maximum value minus the minimum value of the block elements is greater than a threshold. The threshold is defined as a value between 0 and 1, multiplied by 255 for grayscale images. The lowest permissible block size is set to 8 × 8 pixels.
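The first splitting phase can be illustrated with a short recursive sketch. It is a sketch under stated assumptions, not the authors' code: the name `intensity_quadtree` is hypothetical, the image is assumed to be a square list of rows whose side is a power of two, and leaves are returned as `(x, y, size)` tuples.

```python
def intensity_quadtree(image, x0, y0, size, threshold=0.5, min_size=8):
    """Split a square block while (max - min) exceeds threshold * 255.

    Returns a list of (x, y, size) leaf blocks. Blocks never become
    smaller than min_size (8 x 8 in the text).
    """
    values = [image[y][x]
              for y in range(y0, y0 + size)
              for x in range(x0, x0 + size)]
    spread = max(values) - min(values)
    if size <= min_size or spread <= threshold * 255:
        return [(x0, y0, size)]
    half = size // 2
    leaves = []
    for oy in (0, half):
        for ox in (0, half):
            leaves += intensity_quadtree(image, x0 + ox, y0 + oy,
                                         half, threshold, min_size)
    return leaves
```

Calling `intensity_quadtree(image, 0, 0, len(image))` on a uniform image returns a single leaf covering the whole image, while blocks that straddle a strong edge are subdivided down to the minimum size.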
The threshold is set to a value (e.g., 0.5) such that intensity-homogeneous regions are roughly formed. The resulting 8 × 8 blocks are usually located at the boundaries of image objects, where intensity gradients are larger. Because of the intensity discontinuity, some of them may be occluded blocks, which require more bits to encode as they have larger residual energy. All the resulting 8 × 8 blocks are processed according to the closed-loop compensation described above. This initial splitting phase saves processing time, because the subsequent quad-tree decomposition, which is time consuming, is not performed in areas that possess intensity discontinuities and already have the lowest permissible size.

Quad-tree analysis is continued for blocks of larger size, but with a different splitting criterion [7,9]. The splitting criterion for a block is the cost of its residual, which is defined by the following relations:

$$J_p = D_p + \lambda R_p, \tag{4}$$

$$J_c = \sum_{k=1}^{4} \left\{ D_c(k) + \lambda R_c(k) \right\}, \tag{5}$$

where $J_p$ and $J_c$ are the costs of the parent and children blocks, respectively. The Lagrange multiplier $\lambda$ defines the relation between distortion and bit rate. Its value affects the segmentation depth of the processed image. Distortion $D$ is the MSE for the specific block. Rate $R$ is defined as:

$$R = r_{dv} + r_{res}, \tag{6}$$

where $r_{dv}$ and $r_{res}$ are the bit-rates of the disparity vectors and the residual, respectively. Therefore, a parent block splits into four children blocks if and only if the cost of the parent is greater than the cost of the children. After the split, $r_{dv}$ increases, whereas $r_{res}$ and $D$ decrease monotonically. The splitting criterion can be formed as:

$$D_p + \lambda R_p > \sum_{k=1}^{4} D_c(k) + \lambda \sum_{k=1}^{4} R_c(k), \tag{7}$$

$$D_p - \sum_{k=1}^{4} D_c(k) > \lambda \left\{ \sum_{k=1}^{4} \left[ r^{c}_{dv}(k) + r^{c}_{res}(k) \right] - \left[ r^{p}_{dv} + r^{p}_{res} \right] \right\}, \tag{8}$$

$$\Delta D > \lambda \left\{ \sum_{k=1}^{4} r^{c}_{dv}(k) - r^{p}_{dv} \right\} - \lambda \left\{ r^{p}_{res} - \sum_{k=1}^{4} r^{c}_{res}(k) \right\}. \tag{9}$$

Writing $\Delta D = D_p - \sum_{k} D_c(k)$, $\Delta r_{dv} = \sum_{k} r^{c}_{dv}(k) - r^{p}_{dv}$ and $\Delta r_{res} = r^{p}_{res} - \sum_{k} r^{c}_{res}(k)$, Eq. (9) is finally reduced to the following form:

$$\Delta D + \lambda\, \Delta r_{res} > \lambda\, \Delta r_{dv}, \tag{10}$$

which is satisfied if the following relation is valid:

$$\Delta D > \lambda\, \Delta r_{dv}, \tag{11}$$

as $\Delta r_{res}$ is always positive. This suggests that a parent block splits into four children if the reduction in distortion is greater than the $\lambda$-weighted increase in the bit-rate of the disparity vectors.

After completion of the quad-tree analysis, the Right image consists of variable size blocks that present optimized disparity characteristics. This phase also handles occluded blocks located at the edges of the stereo pair or across overlapping objects of similar intensity, providing a less distorted residual image. This segmentation scheme achieves a smoother and more accurate disparity field while substantially reducing the total number of disparity vectors transmitted.

Fig. 3 shows the original Room stereo image pair. Fig. 4A and B illustrate the disparity compensated prediction error of stereo_coder A and stereo_coder B, respectively. Keeping the reconstruction quality of the Left image constant (at a bit-rate of about 1 bpp), the disparity compensated prediction error is transmitted at a fixed bit-rate of about 0.75 bpp for both coders. The mean square error of the reconstructed Right image for stereo_coder B is about 15% less than that of stereo_coder A, which supports the superiority of closed-loop disparity compensation. Fig. 5 shows the segmentation of the Right image according to the aforementioned quad-tree decomposition and the resulting residual image of stereo_coder C. The blocks with white lines result from the first decomposition, whereas the blocks with grey lines are formed by the rate-distortion quad-tree decomposition.
For the same bit-rate settings, the mean square error of the reconstructed Right image for stereo_coder C is about 1.5% less than that of stereo_coder B. This is achieved by keeping the initial intensity difference splitting criterion equal to 0.5 and the Lagrange multiplier equal to 5. In some coders, the two images are transmitted and reconstructed at different qualities. The human visual system may perceive sufficient depth information if the reference image is coded at a high quality and the target image at a lower quality.

Fig. 3. Room stereo image pair.
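The rate-distortion splitting decision of Section 2 can be written out as a small sketch. The function names are hypothetical, the distortions and rates are assumed to be supplied by the caller in consistent units, and the default multiplier of 5 simply mirrors the value quoted in the text. The first function compares the full parent and children costs; the second applies the simplified sufficient test, in which the residual-rate term is dropped.

```python
def should_split(d_parent, r_parent, children, lam=5.0):
    """Full cost comparison: split if J_p > J_c.

    children is a list of four (distortion, rate) pairs, one per child block,
    where each rate is r_dv + r_res for that block.
    """
    j_parent = d_parent + lam * r_parent
    j_children = sum(d + lam * r for d, r in children)
    return j_parent > j_children


def should_split_simplified(d_parent, d_children, rdv_parent, rdv_children,
                            lam=5.0):
    """Simplified test: split if delta_D > lam * delta_r_dv.

    delta_D    = D_p - sum of children distortions
    delta_r_dv = sum of children vector rates - parent vector rate
    """
    delta_d = d_parent - sum(d_children)
    delta_r_dv = sum(rdv_children) - rdv_parent
    return delta_d > lam * delta_r_dv
```

Because the simplified test ignores the saving in residual bit-rate, it can reject a split that the full comparison would accept, but it avoids encoding the children residuals just to measure their rate.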

Fig. 4. Disparity compensated prediction error at a constant bit-rate of 0.75 bpp, with Left image coded at 1 bpp. (A) stereo_coder A; (B) stereo_coder B.

Fig. 5. Variable block size disparity compensation for stereo_coder C. (A) Quad-tree segmentation of Right image with intensity difference splitting criterion equal to 0.5 and λ = 5. (B) Disparity compensated prediction error at a constant bit-rate of 0.75 bpp, with Left image coded at 1 bpp.

3. The morphological coder

Conventional wavelet image coders decompose a still image into multi-resolution bands [11], providing better compression quality than the earlier DCT-based coders. An alternative adaptive wavelet packet scheme enhances the benefits of this transform [12]. These types of wavelet decomposition suffer from the fact that they include in the transmitted sequence all the coefficients spread within the subbands. This is true even for those that are zero or nearly zero, whose absence would have little effect on the reconstructed image quality. The statistical properties of the wavelet coefficients led to the development of some very efficient algorithms, such as the embedded zero-tree wavelet coder (EZW) [13], the coder based on set partitioning in hierarchical trees (SPIHT) [14], the coder based on the morphological representation of wavelet data (MRWD) [15], and embedded block coding with optimized truncation of the embedded bit streams (EBCOT) [16].

The MRWD algorithm, which is used in the present work, exploits the intra-band clustering and inter-band directional spatial dependency of the wavelet coefficients (Fig. 6). Their prediction proceeds hierarchically, starting from the coarsest scale and using a 3 × 3 structuring element for the morphological dilation operation. A dead-zone uniform step size quantizer quantizes all the subbands. The coarsest detail subbands constitute binary images that contain two partitions of coefficients, the significant and the insignificant. The coefficients that are greater than a predefined threshold are called significant. The intra-band dependency of wavelet coefficients, that is, their tendency to form clusters, suggests that the application of a morphological dilation operator may capture the significant neighbors.
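The idea of capturing significant neighbors by dilation can be illustrated with a minimal sketch. This is not the MRWD algorithm itself, only the basic operations it builds on: thresholding a subband into a binary significance map, dilating that map with a 3 × 3 structuring element, and mapping a parent map onto a child subband of twice the size. All function names are hypothetical, significance is assumed to be decided on coefficient magnitude, and the 2 × 2 replication used for the parent-to-child mapping is an assumption about how the enlarged neighborhood is formed.

```python
def significance_map(coeffs, threshold):
    """Binary map: True where the coefficient magnitude exceeds the threshold."""
    return [[abs(c) > threshold for c in row] for row in coeffs]


def dilate3x3(mask):
    """Binary dilation with a 3 x 3 structuring element (clipped at borders)."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        out[ny][nx] = True
    return out


def upsample2(mask):
    """Replicate each parent position onto its 2 x 2 block of children."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out
```

A predicted significance map for a child subband could then be formed as `dilate3x3(upsample2(parent_mask))`, and comparing it with the true map of the child subband separates correctly predicted coefficients from mispredicted ones.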
The finer scale significant coefficients (in the children subbands) may be predicted from the significant ones of the coarser scale (parent subbands) by applying the same morphological operator to an enlarged neighborhood, because the children subbands are twice the size of their parents. Each of these two partitions may be further partitioned into two groups. The significant partition is divided into two groups that contain truly significant coefficients and insignificant coefficients that were predicted to be significant. Conversely, the insignificant partition is divided into two groups that contain truly insignificant coefficients and significant coefficients that were predicted to be insignificant. The description of a subband in terms of its partitions requires the use of a special symbol for each partition, which is the side information. This partitioning reduces the overall entropy, and therefore the bit-rate becomes smaller than in transmission without partitioning. Another similar algorithm, SLCCA [17], improves MRWD by employing significance links among clusters.

Fig. 6. Inter-band spatial dependency among wavelet coefficients in a discrete wavelet decomposition. [Subbands shown: LL3, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, HH1.]

4. Experimental results

The experimental evaluation of the proposed coders was performed on a set of five stereo image pairs, namely, the most commonly used synthetic image Room (256 × 256) [18] and the camera-acquired images Fruit512 (512 × 512) [19], Pentagon (512 × 512) [19], Fruit256 (256 × 256) [20], and Aqua (360 × 288) [20]. The proposed stereoscopic coders employ a four-level wavelet decomposition with symmetric extension, based on the 9/7 biorthogonal Daubechies filters [21], for both the Left and residual Right images of the stereo pair. The disparity compensation process is implemented on blocks of 8 × 8 pixels or on variable size blocks using the classical block-matching algorithm. The matching criteria usually exploited are the normalized cross correlation (NCC) and the mean absolute difference (MAD). MAD can correctly record matching points in both low and high textured areas, in contrast to NCC, which is unable to operate correctly in low textured areas. In this paper, MAD is used to measure the similarity between corresponding blocks in a window search area of 6 pixels around the blocks. The objective quality measure of the reproduced images is estimated by PSNR, which is defined as:

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^{2}}{(\mathrm{MSE}_l + \mathrm{MSE}_r)/2}, \tag{12}$$

where $\mathrm{MSE}_l$ and $\mathrm{MSE}_r$ are the mean square errors of the Left and Right images, respectively.
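The stereo PSNR defined above translates directly into code. The following is a small sketch with hypothetical helper names (`mse`, `stereo_psnr`); images are assumed to be equally sized lists of rows of 8-bit grey levels, and returning infinity for a perfect reconstruction is a convention chosen here, not something the text specifies.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized grey-level images."""
    total, count = 0.0, 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count


def stereo_psnr(mse_left, mse_right, peak=255):
    """PSNR of a stereo pair from the average of the Left and Right MSEs."""
    mean_mse = (mse_left + mse_right) / 2
    if mean_mse == 0:
        return float("inf")  # assumed convention for identical images
    return 10 * math.log10(peak ** 2 / mean_mse)
```

Note that averaging the two MSEs before taking the logarithm is not the same as averaging the two per-image PSNR values; the former is the quantity reported in this paper.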
The DWT subband coefficients, after their morphological representation and partitioning by the morphological coder, are encoded using arithmetic coding, which practically achieves the theoretical entropy bound [22]. The disparity vectors are losslessly transmitted using DPCM and arithmetic encoding.

Stereo_coder A employs open-loop disparity compensation, where the residual Right image results from the disparity compensated prediction from the Left image. Stereo_coder B employs closed-loop compensation, where the disparity compensated difference image is computed using the reconstructed image instead of the original Left image. Stereo matching in both coders relies on the classical BMA with blocks of fixed size. The binocular perception of an asymmetrically coded stereo pair depends on the quality of the two images [23]. In the event that the two inputs are coded at different bit-rates, the human visual system perceives the average quality. Therefore, one image may be coded at a high quality and the other image at a low quality. An asymmetrical quantization scheme is illustrated in Fig. 7A, where the Left image is coded at 1 bpp with a PSNR of 43.5 dB and the Right image is coded at lower bit-rates. Evidently, at low bit-rates closed-loop compensation gives results equivalent to those of open-loop compensation, because the reconstruction error of the Right image increases more strongly there. Fig. 7B shows the performance of the two coders when both the Left image and the compensated prediction error are subjected to the same uniform quantization scheme. In this scheme, which is used in the present paper, the reconstructed Left image used at the encoder's side is of the same quality as that at the decoder's side. This ensures a quality advantage of coder B over coder A across the whole range of the examined bit-rates.

Stereo_coder C employs closed-loop disparity compensation, and the block-matching algorithm is implemented on blocks of variable size.
The variable size blocks are obtained after quad-tree decomposition of the Right image according to a rate-distortion splitting criterion. The lowest permissible block size is 8 × 8, and some of these blocks may be occluded. The occluded blocks may be further segmented into smaller blocks employing the same rate-distortion criterion. In this study, where the primary aim is to build a simple and efficient stereoscopic coder, this splitting of the occluded blocks is not considered, as it contributes only marginally to the total quality. Nevertheless, the proposed quad-tree decomposition reduces the number of occluded blocks, because an occluded block that arises from a stereo mismatch may be resolved into matched children blocks with better rate-distortion characteristics.
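The lossless DPCM transmission of the disparity vectors mentioned earlier in this section can be sketched as follows. The paper does not state which predictor is used, so this sketch assumes the simplest one, namely the previous vector in scan order, with an initial prediction of (0, 0). The function names are hypothetical, and the arithmetic coding stage that would follow is omitted.

```python
def dpcm_encode(vectors):
    """Replace each (dv_x, dv_y) by its difference from the previous vector."""
    prev = (0, 0)  # assumed initial prediction
    residuals = []
    for dx, dy in vectors:
        residuals.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the differences."""
    prev = (0, 0)
    vectors = []
    for rx, ry in residuals:
        prev = (prev[0] + rx, prev[1] + ry)
        vectors.append(prev)
    return vectors
```

In regions of nearly constant disparity most residuals are zero, which is what makes the subsequent entropy coding of the vector field cheap.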

Fig. 7. (A) Asymmetrical coding of Room stereo image pair. Left image is coded at a high quality (43.5 dB at 1 bpp) and Right image at lower bit-rates. (B) Symmetrical coding of Room. [Plots against bit-rate (bpp); curves: Left image, Right image (coder A), Right image (coder B).]

Tables 1–3 show the experimental results for the five tested images. The stereo image pair Fruit512 has the same theme as Fruit256 but a different histogram. The estimated PSNR values express the performance of the stereo image pair for distinct total bit rates. As can be seen, stereo_coder B outperforms stereo_coder A due to the closed-loop disparity compensation, which reduces the distortion of the reconstructed Right image. Also, stereo_coder C presents slightly better performance than the other two because of the variable size block-matching algorithm.

Table 1
Total performance (PSNR, dB) of the tested stereo images using the proposed stereo_coder A

Image pair   0.25 bpp   0.5 bpp   0.75 bpp   1 bpp
Room         29.6       36        40.7       44
Fruit512     36.4       39.2      41         42.8
Pentagon     29.7       32.7      34.7       36.3
Fruit256     -          35.3      37.3       39
Aqua         24.8       28.1      30.5       32.3

Table 2
Total performance (PSNR, dB) of the tested stereo images using the proposed stereo_coder B

Image pair   0.25 bpp   0.5 bpp   0.75 bpp   1 bpp
Room         30.2       36.8      41.6       45.5
Fruit512     37.5       40.6      42.6       44.4
Pentagon     30.7       34        36         37.9
Fruit256     -          36.2      38.5       40.3
Aqua         26         29.2      31.7       33.6

Table 3
Total performance (PSNR, dB) of the tested stereo images using the proposed stereo_coder C

Image pair   0.25 bpp   0.5 bpp   0.75 bpp   1 bpp
Room         30.5       37        41.8       45.7
Fruit512     38.2       40.9      42.9       44.7
Pentagon     31         34.2      36.3       38
Fruit256     -          36.5      38.6       40.3
Aqua         26.4       29.5      31.8       33.7

The first comparative performance evaluation was carried out using the synthetic stereo image pair Room. Fig. 8 illustrates the performance of the proposed family with respect to the disparity compensated JPEG2000 [24], the Optimal Blockwise Dependent Quantization [3], the Boulgouris et al. stereo coders [5], and the Frajka et al. stereo coder [2]. Woo et al. employ a JPEG-like coder for both the reference and residual images, whereas Boulgouris et al. use the DWT followed by arithmetic encoding. Frajka et al. employ JPEG for the reference image and a mixed transform coder, followed by arithmetic encoding, for the residual image. Stereo_coder A presents performance comparable to that of the Frajka et al. coder, whereas stereo_coder B and stereo_coder C outperform it by about 0.5 dB on average over the whole examined range.
The proposed coders present considerable margins over the rest of the state-of-the-art coders.

The second comparative performance evaluation was carried out using the real stereo image pairs Fruit256 and Aqua. Fig. 9 shows the performance of the proposed coders in comparison to the independent and disparity compensated JPEG, EZW, and JPEG2000 coding schemes for the Fruit256 stereo image pair. As illustrated, stereo_coder B and stereo_coder C outperform these schemes by considerable margins at both medium and high bit-rates, whereas stereo_coder A shows a degraded performance. As illustrated in Fig. 10, the objective quality of stereo coders B and C is considerably better than that of the other coders for the Aqua stereo image pair. At medium bit-rates of about 0.5 bpp, improvements of up to 2.8 dB are indicated with respect to disparity compensated JPEG, 3 dB with respect to the Optimal Blockwise Dependent Quantization coder, 1.5 dB with respect to the Frajka et al. mixed coder, and 1 dB with respect to disparity compensated JPEG2000.

Fig. 8. Experimental evaluation of the proposed family of stereo coders in comparison to other stereoscopic coders of the Room stereo image pair. [Plot against bit-rate (bpp); curves: Opt. Blockwise Dependent Quantization, Disparity Compensated JPEG2000, Boulgouris et al. (coders A, B, and C), Frajka et al., proposed stereo coders A, B, and C.]

Fig. 9. Experimental evaluation of the proposed family of stereo coders in comparison to other stereoscopic coders of the Fruit256 stereo image pair. [Plot against bit-rate (bpp); curves: Independent JPEG, Disparity Compensated JPEG, Independent EZW, Disparity Compensated EZW, Independent JPEG2000, Disparity Compensated JPEG2000, proposed stereo coders A, B, and C.]

[Fig. 10 plot: horizontal axis bit-rate (bpp). Curves: Indep. JPEG; Disp. Compens. JPEG; Optimal Blockwise Dependent Quant.; Indep. JPEG2K; Disp. Compens. JPEG2K; Frajka et al.; proposed stereo coder A; proposed stereo coder B; proposed stereo coder C.]

Fig. 10. Experimental evaluation of the proposed family of stereo coders in comparison to other stereoscopic coders of the Aqua stereo image pair.

The efficiency of the proposed method is due mainly to the wavelet-based morphological coder, which is more efficient than EZW and DCT-based coders. For still images, the proposed morphological coder performs about 1 dB better than the popular EZW [15], and it also outperforms DCT-based coding because of its wavelet nature. As Table 3 demonstrates, the rate-distortion algorithm employed in stereo_coder C contributes about 0.1–0.3 dB to the final quality of the reproduced image pair. Figs. 11–13 show the reconstructed Right images of the Room stereo image pair at a bit-rate of 0.43 bpp for the proposed family of stereo coders. Stereo_coders B and C present practically no difference in the visual quality of the reproduced images, but both provide better visual quality than stereo_coder A.

Fig. 11. Reconstructed Right image at 0.43 bpp by stereo_coder A.

Fig. 12. Reconstructed Right image at 0.43 bpp by stereo_coder B.

Fig. 13. Reconstructed Right image at 0.43 bpp by stereo_coder C.

5. Conclusions

In this paper, a family of three stereoscopic image coders is presented. The first coder employs open-loop disparity compensation, using the Left image to predict the Right image. The second coder
employs closed-loop disparity compensation, using the reconstructed Left image to predict the Right image. Both coders apply stereo matching with the classical BMA on blocks of fixed size. The third coder segments the Right image by quad-tree decomposition with a rate-distortion splitting criterion and produces the disparity-compensated residual image with closed-loop compensation; in this coder, therefore, stereo matching uses BMA on blocks of variable size. The quad-tree segmentation splits a block into four equal-size blocks if a simplified rate-distortion criterion is fulfilled: the criterion compares distortion and bit-rate between the parent and children blocks and splits only if there is a benefit. Finally, all the proposed coders employ a morphological coding unit, which divides the wavelet transform coefficients into partitions of significance and insignificance. This coder is a robust still image coder that inherits all the advantages of a wavelet transform and lowers the entropy of the transmitted sequence. The experimental evaluation showed that the closed-loop disparity compensation scheme produces better results than the open-loop scheme. In addition, the rate-distortion segmentation contributes up to 0.3 dB to the performance of a closed-loop system. In the experiments, the combination of a robust morphological encoding unit with the proposed disparity compensation schemes compares favorably with other state-of-the-art coders.

Acknowledgments

This work was supported in part by the Research Committee of the National and Kapodistrian University of Athens under the project Kapodistrias, and by the EU and the Greek Ministry of Education under the project Archimedes-II. The authors thank Dr. E.A. Edirisinghe and Dr. T. Frajka for providing stereo image pairs for the performance evaluation.

References

[1] M.G. Perkins, Data compression of stereopairs, IEEE Trans. Commun. 40 (1992) 684–696.
[2] T. Frajka, K. Zeger, Residual image for stereo image compression, Opt. Eng. 42 (1) (2003) 182–189.
[3] W. Woo, A. Ortega, Optimal blockwise dependent quantization for stereo image coding, IEEE Trans. CSVT 9 (6) (1999) 861–867.
[4] W. Woo, A. Ortega, Overlapped block disparity compensation with adaptive windows for stereo image coding, IEEE Trans. CSVT 10 (2) (2000) 194–200.
[5] N.V. Boulgouris, M.G. Strintzis, A family of wavelet-based stereo image coders, IEEE Trans. CSVT 12 (10) (2002) 898–903.
[6] J.N. Ellinas, M.S. Sangriotis, Stereo image compression using wavelet coefficients morphology, Image Vision Comput. 22 (4) (2004) 13–22.
[7] S. Sethuraman, Stereoscopic image sequence compression using multi-resolution and quad tree decomposition based disparity and motion adaptive segmentation, Ph.D. Thesis, Carnegie Mellon University (1996).
[8] S. Sethuraman, M.W. Siegel, A.G. Jordan, A multi-resolutional region based segmentation scheme for stereoscopic image compression, Proc. IS&T/SPIE (1995).
[9] W. Woo, Rate-Distortion based dependent coding for stereo images and video: disparity estimation and dependent bit allocation, Ph.D. Thesis, University of Southern California (1998).
[10] W. Woo, A. Ortega, Y. Iwadate, Stereo image coding using hierarchical MRF model and selective overlapped block disparity compensation, Proc. ICIP 99 (2) (1999) 467–471.
[11] M. Antonini, M. Barlaud, P. Mathieu, I. Daubechies, Image coding using wavelet transform, IEEE Trans. Image Process. 1 (2) (1992) 205–220.
[12] K. Ramchandran, M. Vetterli, Best wavelet packet bases in a rate-distortion sense, IEEE Trans. Image Process. 2 (2) (1993) 160–175.
[13] J.M. Shapiro, Embedded image coding using zero trees of wavelet coefficients, IEEE Trans. Signal Process. 41 (12) (1993) 3445–3462.
[14] A. Said, W.A. Pearlman, A new, fast and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. CSVT 6 (3) (1996) 243–250.
[15] S.D. Servetto, K. Ramchandran, M.T. Orchard, Image coding based on a morphological representation of wavelet data, IEEE Trans. Image Process. 8 (9) (1999) 1161–1174.
[16] D. Taubman, High performance scalable image compression with EBCOT, IEEE Trans. Image Process. 9 (7) (2000) 1158–1170.
[17] B. Chai, J. Vass, X. Zhuang, Significance-linked connected component analysis for wavelet image coding, IEEE Trans. Image Process. 8 (6) (1999) 774–784.
[18] <http://www-dbv.cs.uni-bonn.de/>.
[19] <http://vasc.ri.cmu.edu/idb/html/stereo/index.html/>.
[20] <http://www.code.ucsd.edu/~frajka/images/stereo/stereo_images.html/>.
[21] B.E. Usevitch, A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000, IEEE Signal Process. Mag. (2001) 22–35.
[22] I.H. Witten, R.M. Neal, J.G. Cleary, Arithmetic coding for data compression, Commun. ACM 30 (1987) 520–540.

[23] L.B. Stelmach, W.J. Tam, D.V. Meegan, A. Vincent, P. Corriveau, Human perception of mismatched stereoscopic 3D inputs, Proc. ICIP 2000 (1) (2000) 5–8.
[24] M.D. Adams, H. Man, F. Kossentini, T. Ebrahimi, JPEG 2000: the next generation still image compression standard, ISO/IEC JTC 1/SC 29/WG 1 N 1734 (2000).

M.S. Sangriotis received the B.Sc. and Ph.D. degrees from Athens University in Greece. In 1981, he was with the Department of Physics in Athens University. Since 1990, he has been with the Department of Informatics and Telecommunications in Athens, Greece, where he is currently an Associate Professor. His research interests include image analysis and image coding.

J.N. Ellinas received the B.Sc. degree in Electrical and Electronic Engineering from the University of Sheffield, England, in 1977, and the M.Sc. degree in Telecommunications from the Universities of Sheffield and Leeds, in 1978. Since 1983, he has been with the Technological Educational Institute of Piraeus, Department of Computer Engineering, Greece, where he is currently an Assistant Professor. He is currently pursuing the Ph.D. degree in the Department of Informatics and Telecommunications, Section of Telecommunications and Signal Processing, at the University of Athens. His research interests are in image processing, image compression and wavelets.