EFFICIENT METHODS FOR ENCODING REGIONS OF INTEREST IN THE UPCOMING JPEG2000 STILL IMAGE CODING STANDARD

Charilaos Christopoulos, Joel Askelöf and Mathias Larsson
Ericsson Research, Corporate Unit
Ericsson Radio Systems AB, S-164 80 Stockholm, Sweden
Email: {Charilaos.Christopoulos, Joel.Askelof, Mathias.Larsson}@era.ericsson.se

ABSTRACT

The general method for generating the ROI mask needed for encoding Regions of Interest in the upcoming JPEG2000 still image coding standard is presented. A simple method for generating the ROI mask for rectangular ROIs is then proposed. Finally, to simplify the decoder when dealing with arbitrarily shaped Regions of Interest, a method for ROI coding that does not require any shape information to be transmitted to the decoder is described. This method carries a small coding penalty, but it makes arbitrarily shaped ROIs possible without shape information and without ROI mask generation at the decoder. The proposed methods have been included in the Final Committee Draft of JPEG2000 Part I.

EDICS: SPL.IP.1.1

Corresponding author:
Charilaos Christopoulos
MediaLab, Ericsson Research, Corporate Unit
Ericsson Radio Systems AB
164 80 Stockholm, Sweden
Tel: +46-(0) 70 4042685
Fax: +46-(0) 8 7573100
E-mail: charilaos.christopoulos@era.ericsson.se
1 INTRODUCTION

JPEG2000 uses a wavelet-based coding system, based mainly on Embedded Block-based Coding with Optimized Truncation [1][2]. One of the requirements in JPEG2000 is the support of Region of Interest (ROI) coding, where the ROI of the image can be coded with better quality than the background (BG). The ROI coding mode supported in JPEG2000 is based on scaling the wavelet coefficients, and the general ideas are presented in [3][4].

The principle of a scaling-based method is to scale up (shift up) coefficients so that the bits associated with the ROI are placed in higher bit-planes. Then, during the embedded coding process, those bits are placed in the bit-stream before the non-ROI parts of the image (depending on the scaling value, some bits of ROI coefficients might be encoded together with non-ROI coefficients). Thus, the ROI will be decoded, or refined, before the rest of the image. Regardless of the scaling, a full decoding of the bit-stream results in a reconstruction of the whole image with the highest fidelity available. If the bit-stream is truncated, or the encoding process is terminated before the whole image is fully encoded, the ROI will have a higher fidelity than the rest of the image.

In the JPEG2000 standard the scaling-based method is implemented as follows:

1. Calculate the wavelet transform.
2. If a ROI is chosen, derive a mask (ROI mask) indicating the set of coefficients that are required for up to lossless ROI reconstruction (see section 2).
3. Quantize the wavelet coefficients.
4. Downscale the coefficients outside the ROI mask by a specific scaling value.
5. Entropy encode the resulting coefficients progressively, with the most significant bit-planes first.

The scaling value assigned to the ROI and the coordinates of the ROI are added to the bit-stream. If the scaling value were chosen arbitrarily, the shape of the ROI would also be needed in the bit-stream.
The decoder would also perform the ROI mask generation and scale up the downscaled coefficients.

This letter presents methods for ROI mask generation and methods for supporting arbitrarily shaped ROIs without the need for shape information at the decoder. The methods proposed in this letter have been contributed by the authors to the JPEG2000 standardization committee and have been included in the Final Committee Draft of JPEG2000 Part I.

2 CALCULATION OF ROI MASK

The ROI mask is a bit plane indicating which wavelet coefficients have to be transmitted exactly in order for the receiver to reconstruct the desired region perfectly. In the following, 1D two-channel filter banks are considered; the approach extends easily to 2D multi-resolution decompositions formed by iterative application of the 1D decomposition stages.

In each level of the wavelet decomposition, the inverse wavelet transform is followed to see how the ROI expands [4]. If, in one decomposition level, the original samples are denoted X(2n) and X(2n+1) and the samples belonging to the low and high frequency subbands are denoted L(n) and H(n) respectively, then the ROI mask can be found by checking which L(n) and H(n) are required for the computation of X(2n) and X(2n+1).

As an example, the ROI mask for the integer (5,3) filter (which has 3 low-pass and 5 high-pass synthesis taps) is derived by looking at the inverse transform and checking which coefficients need to be in the mask. The synthesis is represented by the lifting steps detailed in the equations below:

$$X(2n) = L(n) - \frac{H(n-1) + H(n)}{4}$$

$$X(2n+1) = \frac{L(n) + L(n+1)}{2} + \frac{-H(n-1) + 6H(n) - H(n+1)}{8}$$

The coefficients needed to reconstruct X(2n) and X(2n+1) losslessly can immediately be seen to be L(n), L(n+1), H(n-1), H(n) and H(n+1), and therefore they should be in the ROI mask. The ROI mask is generated in the same manner for other filters.
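The dependency check described in section 2 can be illustrated for one 1D decomposition level of the (5,3) filter. The following is a minimal sketch, not the procedure of the standard: the function name `roi_mask_53_1d` is hypothetical, the signal length is assumed to be even, and indices that fall outside the subband are simply clipped to the border, which is a simplification of the boundary extension a real codec would use.

```python
def roi_mask_53_1d(sample_mask):
    """Map a 1D ROI mask on samples X to masks on the subbands L and H.

    Dependencies are read off the (5,3) synthesis equations:
      X(2n)   needs L(n), H(n-1), H(n)
      X(2n+1) needs L(n), L(n+1), H(n-1), H(n), H(n+1)
    Assumption: even length; out-of-range indices are clipped.
    """
    if len(sample_mask) % 2 != 0:
        raise ValueError("this sketch assumes an even number of samples")
    n_pairs = len(sample_mask) // 2
    low = [False] * n_pairs
    high = [False] * n_pairs

    def mark(band, idx):
        # Clip to the valid index range (simplified border handling).
        band[min(max(idx, 0), n_pairs - 1)] = True

    for n in range(n_pairs):
        if sample_mask[2 * n]:          # even sample X(2n) is in the ROI
            mark(low, n)
            mark(high, n - 1)
            mark(high, n)
        if sample_mask[2 * n + 1]:      # odd sample X(2n+1) is in the ROI
            mark(low, n)
            mark(low, n + 1)
            mark(high, n - 1)
            mark(high, n)
            mark(high, n + 1)
    return low, high


# ROI covering samples 6 and 7 of a 16-sample signal (n = 3).
low_mask, high_mask = roi_mask_53_1d([i in (6, 7) for i in range(16)])
print([i for i, v in enumerate(low_mask) if v])
print([i for i, v in enumerate(high_mask) if v])
```

For a multi-level decomposition the same function would be applied again to the low-band mask, and for 2D it would be applied along rows and then along columns.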
3 FAST CALCULATION OF THE ROI MASK FOR RECTANGULAR ROIS

The complexity of the ROI mask generation can be reduced when the ROI shape is rectangular. In this case, instead of tracing how each coefficient and pixel value is reconstructed in the inverse transform, only two positions need to be studied, namely the upper left and the lower right corners of the mask. All samples between the two corners are considered to belong to the mask.

In each level of decomposition, the inverse wavelet transform is followed to see how the mask expands. To calculate the mask in one subband, the mask in the subband that was decomposed into the particular subband is used. If the ROI mask in the parent subband covers the coefficients between $(x_0, y_0)$ and $(x_1, y_1)$, then the mask in the new subband will cover the coefficients between $(x_0', y_0')$ and $(x_1', y_1')$, where

$$x_0' = x_0/2 - \mathrm{neg\_support}, \qquad y_0' = y_0/2 - \mathrm{neg\_support},$$

$$x_1' = x_1/2 + \mathrm{pos\_support}, \qquad y_1' = y_1/2 + \mathrm{pos\_support}.$$

The values of neg_support and pos_support depend on whether the new corners are in the low or the high subband of the decomposition, that is, on whether the low or the high frequency wavelet filter was applied to the original $x_0$, $y_0$, $x_1$, $y_1$. The values of the supports are calculated from the inverse wavelet transform. If L(n) and H(n) denote the samples belonging to the low and high subbands respectively, and X(2n) and X(2n+1) denote the original coefficients, then to reconstruct X(2n) and X(2n+1) a number of samples from {L(n)} and {H(n)} are needed. By tracing the inverse transform, the highest and lowest index n needed from L and from H for the reconstruction of X(2n) or X(2n+1) can be found.

For example, for the (5,3) integer wavelet transform, reconstruction of X(2n) and X(2n+1) from the low subband requires pos_support=1 and neg_support=0. Reconstruction of X(2n) and X(2n+1) from the high subband requires pos_support=1 and neg_support=1.
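The corner update of section 3 can be sketched in a few lines using the (5,3) support values quoted above. This is only an illustration under stated assumptions: the names `SUPPORT_53`, `map_interval` and `child_rect` are hypothetical, the division by two is taken to be integer (floor) division, and the result is clipped to the subband size, neither of which is spelled out in the text.

```python
# (neg_support, pos_support) for the (5,3) integer transform, as given above.
SUPPORT_53 = {"low": (0, 1), "high": (1, 1)}


def map_interval(a0, a1, band, child_size):
    """Map a parent interval [a0, a1] to the child subband along one axis."""
    neg_support, pos_support = SUPPORT_53[band]
    b0 = max(a0 // 2 - neg_support, 0)               # assumed floor division
    b1 = min(a1 // 2 + pos_support, child_size - 1)  # assumed border clipping
    return b0, b1


def child_rect(rect, x_band, y_band, parent_width, parent_height):
    """Return the mask rectangle (x0, y0, x1, y1) in one child subband.

    x_band / y_band say whether the low or high filter was applied
    horizontally / vertically to obtain that subband.
    """
    x0, y0, x1, y1 = rect
    nx0, nx1 = map_interval(x0, x1, x_band, parent_width // 2)
    ny0, ny1 = map_interval(y0, y1, y_band, parent_height // 2)
    return nx0, ny0, nx1, ny1


# Rectangle in a 32x32 parent band, mapped to the subband that is
# low-pass horizontally and high-pass vertically.
print(child_rect((6, 10, 7, 21), "low", "high", 32, 32))
```

Only two corner coordinates are updated per subband and level, regardless of the ROI area, which is the source of the complexity reduction.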
4 MAXSHIFT METHOD FOR ROI CODING IN JPEG 2000

For arbitrarily shaped ROIs coded with a general scaling-based method with arbitrary scaling values, a shape encoder and a shape decoder are also required to encode and decode the shape information. This makes both the encoder and the decoder more complex and increases the bitrate. In addition, the decoder has to generate the ROI mask, which adds to its computational complexity and memory requirements.

The MAXSHIFT method is a way to resolve these problems. Instead of specifying the desired scaling value as in the previous method, the encoder scans the quantized coefficients and chooses a scaling value such that the minimum coefficient magnitude belonging to the ROI is larger than the maximum coefficient magnitude in the background (non-ROI area). That is, the scaling value is chosen to be the smallest integer such that $s = 2^{\text{scaling value}}$ is greater than the magnitude of any coefficient in the background.

The decoder gets the bit-stream and starts decoding. Every coefficient whose magnitude is greater than or equal to s must belong to the ROI, and thus all coefficients smaller than s must belong to the background and should therefore be scaled up. The only thing that the decoder needs to do is the up-scaling of the received background coefficients. Figure 1 illustrates the MAXSHIFT method. Advantages of the method compared to the general scaling-based method are shown in section 5.

5 RESULTS AND COMPARISONS

When the ROI mask is generated for an arbitrarily shaped ROI, each sample in the parent subband is checked to see if it belongs to the ROI. For a rectangular ROI, only the two corner points are checked, since all samples between them are known to belong to the ROI. This significantly reduces the computational complexity of the ROI mask generation.

Handling of arbitrarily shaped ROIs is also simpler with the MAXSHIFT method than with the general scaling-based method. No shape encoding is required, and the decoder is simpler since no shape decoding and no ROI mask generation are needed. Experiments also show that for lossless coding of images with ROI, the MAXSHIFT method increases the bitrate by approximately 1-8% compared to lossless coding of an image without ROI (this is for images of size 512x512 up to 2048x2560 and a ROI that is circular or rectangular and covers approximately 25% of the image).
Compared to lossless coding of images with ROI coded with a general scaling-based method, the bitrate increase is smaller (0.5-4%, depending on the scaling value used). This is small, given that the general scaling-based method for arbitrarily shaped ROIs would require shape information to be transmitted to the decoder (as well as ROI mask generation), adding complexity to the decoder. Notice that even in the simple case of a circular ROI, the ROI mask generation method has to be used with a general scaling-based method. This is avoided with the MAXSHIFT method.

Notice, however, that if there are multiple ROIs in the image and we want to give them different degrees of interest, the MAXSHIFT method would need a dynamic range of (number of ROIs) * (the original bit-depth). Multiple ROIs can be handled more easily with the general scaling method, since the dynamic range does not have to be increased significantly. However, further investigation is needed to explore the complexity added by the need for shape information when using the general scaling method, and its advantages compared to the MAXSHIFT method.

It might seem that the MAXSHIFT method would result in a decoder receiving the entire ROI part of the image before receiving the BG. This need not be the case, for the following reasons: (a) In progressive-by-resolution mode, both ROI and BG are encoded at each level. Therefore, at each reconstructed resolution and at each level of quality, the whole image will be received (with the ROI first). (b) Due to the expansion of the ROI mask at each level of decomposition, the ROI mask covers most (sometimes all, depending on the ROI shape and size) of the lower subbands. As a result, the receiver gets both ROI and BG at the early stages of the transmission. (c) If the ROI is to be encoded losslessly, the optimal set of wavelet coefficients giving a lossless result for the ROI is described by the mask generated. However, the MAXSHIFT method supports the use of any mask, since the decoder does not need to generate the mask.
Thus, it is possible for the encoder to include an entire subband, e.g. the low-low subband, in the ROI mask and thus send a low-resolution version of the background at an early stage of the progressive transmission. This is done by scaling all the quantized transform coefficients of the entire subband. In other words, the user can decide in which subband the ROI starts, and thus it is not necessary to wait for the whole ROI before receiving any information about the background.
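The MAXSHIFT procedure of section 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the normative procedure: the function names are hypothetical, coefficients are taken to be signed integers after quantization, and the ROI magnitudes are shifted up rather than the background being shifted down. The two views differ only by a common factor and give the same ordering of bit-planes, but shifting up keeps the example exact in integer arithmetic.

```python
def choose_shift(coeffs, roi_mask):
    """Smallest integer shift with 2**shift > every background magnitude."""
    bg_max = max(
        (abs(c) for c, in_roi in zip(coeffs, roi_mask) if not in_roi),
        default=0,
    )
    return bg_max.bit_length()


def maxshift_encode(coeffs, roi_mask):
    """Place ROI magnitudes above all background bit-planes."""
    shift = choose_shift(coeffs, roi_mask)
    coded = []
    for c, in_roi in zip(coeffs, roi_mask):
        mag = abs(c) << shift if in_roi else abs(c)
        coded.append(-mag if c < 0 else mag)
    return coded, shift


def maxshift_decode(coded, shift):
    """Separate ROI from background using only the threshold s = 2**shift."""
    threshold = 1 << shift
    decoded = []
    for c in coded:
        mag = abs(c)
        if mag >= threshold:   # must be a ROI coefficient: undo the shift
            mag >>= shift
        decoded.append(-mag if c < 0 else mag)
    return decoded


coeffs = [3, -7, 12, -1, 0, 5]
mask = [False, False, True, True, True, False]   # any mask shape is allowed
coded, shift = maxshift_encode(coeffs, mask)
assert maxshift_decode(coded, shift) == coeffs    # no mask needed to decode
```

Because the decoder never sees `mask`, the encoder is free to mark an entire subband as ROI, as discussed above, without any change on the decoding side.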
Notice that the procedure of scaling coefficients might in some cases cause overflow problems, due to limited implementation precision. However, since the background coefficients are scaled down rather than the ROI coefficients being scaled up, the only effect is that in certain implementations the least significant bit-planes of the background may be lost. The advantage is that the ROI, which is considered to be the most important part of the image, is still optimally treated, while the background, which is considered to be less important, is allowed to degrade in quality.

6 CONCLUSIONS

The ROI mask generation method for the upcoming JPEG2000 standard has been described, together with a simple algorithm for ROI mask generation for rectangular ROIs. In addition, a revised version of the general scaling-based method, the MAXSHIFT method, has been described in this letter; it does not require any shape information to be transmitted to the decoder. This makes arbitrarily shaped region of interest coding possible without shape information and ROI mask generation at the decoder, enabling JPEG2000 decoders to support ROI functionality with minimum complexity.
7 REFERENCES

[1] D. Taubman and A. Zakhor, "Multirate 3-D subband coding of video", IEEE Trans. Image Processing, Vol. 3, pp. 572-578, Sept. 1994.

[2] D. Taubman, "High performance scalable image compression with EBCOT", Proc. IEEE International Conference on Image Processing (ICIP), 24-28 October 1999, Kobe, Japan.

[3] E. Atsumi and N. Farvardin, "Lossy/lossless region-of-interest image coding based on set partitioning in hierarchical trees", Proc. IEEE International Conference on Image Processing (ICIP-98), pp. 87-91, 4-7 October 1998, Chicago, Illinois, USA.

[4] D. Nister and C. Christopoulos, "Lossless Region of Interest with a naturally progressive still image coding algorithm", Proc. IEEE International Conference on Image Processing (ICIP-98), pp. 856-860, 4-7 October 1998, Chicago, Illinois, USA.
FIGURE CAPTIONS

Figure 1: Illustration of the general scaling method and the MAXSHIFT method.

[Figure 1 placeholder. Panels: no scaling; scaling-based method; MAXSHIFT method. Each panel shows the bit-planes occupied by the ROI and by the background.]