Block-Matching based image compression

IEEE Ninth International Conference on Computer and Information Technology

Yun-Xia Liu, Yang Yang
School of Information Science and Engineering, Shandong University, Jinan, Shandong, China, 250100
E-mail: yangyang@mail.sdu.edu.cn

Xin-Yan Xu
Shandong College of Electronic Technology, Jinan, Shandong, China, 250013

Abstract

In this paper, the motion estimation technique used in video compression is extended to still image compression; we call the result Block-Matching based image compression. Using a small amount of data sampled in the spatial domain, an approximation of the original image is reconstructed. Each block of the original image is then predicted by searching the approximate image, and the best match is the one with the minimal prediction error. This yields the motion vector and the residual coefficients. Finally, the sampled data, residual coefficients and block positions are coded as the output bit-stream. Experiments comparing the method with JPEG, JPEG2000 and H.264 show the validity of the proposed algorithm.

Keywords: block; search; compensation; compression; DCT

I. INTRODUCTION

Researchers have long used block-based motion compensation for video coding [1, 2, 3]. Block-based motion estimation is usually not used in intraframe processing, such as JPEG or I-frame coding in H.26x. The most similar method is the intra-prediction mechanism adopted in I-frame coding of H.264, where only one row of the block above and one column of the block to the left are used to predict the current block. In the literature there is little discussion of block-based motion compensation for I-frame coding. There are two reasons: first, the spatial correlation in a still image is not as strong as the temporal correlation in a video sequence; second, there is no reference image as there is for a P-frame, and extra memory may be needed to reconstruct an approximate image, which leads to low efficiency.

However, spatial correlation does exist in a still image. If we can find an effective way to approximate the image and an effective search method, the total number of bits for the image will be reduced and the image can be compressed effectively. It is interesting to note that this approach is also available for P and B frames and is applicable to both lossless and lossy image compression.

The next problem is how to obtain a suitable reference image. Adjacent frames in a video sequence are highly similar, so motion estimation and motion compensation work well. In still image coding, however, there is no reference image, so we have to construct one. Several methods are available, including the image pyramid [4], DCT, DWT and direct spatial sampling [5]. Our basic idea is to first take a small quantity of sampled data and then apply the inverse transform or interpolation [6] to obtain an approximation of the source image. These methods are all applicable to our proposed method.

II. STATISTICAL PROPERTY OF THE BLOCK-MATCHING METHOD TO STILL IMAGE

Spatial correlation is the basis of image compression. Traditional DPCM and wavelet methods are both based on this idea. However, these methods only consider the correlation of neighboring pixels, and there is little discussion of the correlation between adjacent blocks. A block matches exactly with itself, and matches less well with any block of the reference frame with which it is less correlated. However, we cannot use the original image as the reference image; we must reconstruct the reference image from much less data. One possible solution is to downsample the source image and then interpolate to obtain the reference image. As an illustration, Fig.1 (a) has a sharp diagonal edge.
If we downsample it by 2*2, we get Fig.1 (b). We then apply different interpolation methods to (b) to obtain results with the same size as (a). No matter which interpolation method we adopt, the sawtooth along the diagonal is inevitable. This means the current block may not match the reference block at exactly the same position, so the matching block is displaced from the original position. Most displacements are concentrated at 1, -1 and 0.

Figure 1. Illustration of the displacement of down-sampled image: (a) source image; (b) source image downsampled by 4; (c) nearest; (d) spline; (e) cubic; (f) linear

978-0-7695-3836-5/09 $26.00 2009 IEEE, DOI 10.1109/CIT.2009.46

Fig.2 shows the distributions of the displacements of the matching block with reference to the original image (Lena, 512*512). The horizontal axis shows the position of the matching block, while the vertical axis shows the number of matched blocks at that position. Results are shown for down-sampling by 2*2, 4*4, 8*8 and 16*16. The displacement is center biased with very short

distance for small down-sampling rates. As the down-sampling rate increases, the displacement spreads, which implies that the search window should be larger.

Figure 2. Displacement of match block of Image Barbara: (a) sampling rate 1:(2*2); (b) sampling rate 1:(4*4); (c) sampling rate 1:(8*8); (d) sampling rate 1:(16*16)

The biggest difference between the block-matching method for image compression described above and block-based motion estimation in video coding is that our reference image has to be interpolated from a small amount of data, so its correlation with the original image is lower than the correlation between adjacent frames in a video sequence. Hence, the sub-pixel interpolation methods used in motion estimation do not give any improvement in this approach. These results make it difficult to apply motion compensation to I-frame coding. The success of the approach relies on whether we can find a tradeoff that optimizes the cost and achieves the best compression result. There are two key issues: one is to find a way of obtaining the approximate image, and the other is to code the residual coefficients. For the approximate image, we can sample in the spatial domain or keep a reduced set of significant coefficients in the transform domain of the block; for the residual coefficients, we can use mature methods such as the DCT or the discrete wavelet transform (DWT).

III. BLOCK-MATCHING ALGORITHM BASED IMAGE COMPRESSION

Based on the discussion above, we propose a novel block-matching algorithm. Fig.3 shows the encoding part of our scheme; the decoding part is its inverse. There are three main stages. The first stage generates the reference image: we compare three methods, decimation in space, DCT and LWT (Lifting Wavelet Transform), for generating a small set of data, and then use these data to build the reference image. The second stage produces the residual image: each block of the original image is searched for in the approximate image to find its best match, and the differences between the two form the residual coefficients. The last stage encodes the residual coefficients; we tried both DCT and LWT. Each stage is described in detail below.

Figure 3. The scheme of encoding part

A. To generate the reference image

The purpose of this stage is to generate the reference image using a small set of sampled data. We will give

further discussion on the spatial sampling based, DCT based and LWT based approaches.

Spatial sampling is the easiest and most widely used method of down-sampling. In the decimation part, we can use a down-sampling window of 8*8, 16*16 or 32*32. After sampling, only 1/64, 1/256 or 1/1024 of the original data are needed. We then interpolate to get the reference image. Any interpolation method can be used, such as cubic, bilinear, nearest neighbor or spline.

DCT is widely used in the MPEG and H.26x standards. We saved the N DCT coefficients with the largest amplitude along the zig-zag order, where N is a user-defined number. LWT is adopted in JPEG2000. We applied an N-level LWT to the original image and saved only the coarse image. We then applied the IDCT or ILWT to get the reference image.

We compared the three ways of obtaining the reference image: spatial sampling, DCT and LWT (db 9-7). We used full search and DCT to encode the residual image, with the quantization tables of the JPEG standard. For spatial sampling, we chose sampling window sizes of 128, 64, 32 and 16; a larger value leads to a smaller set of sampled data. For the DCT method, we varied the DCT block size and the number of DCT coefficients kept. For the LWT method, we varied the block size of each LWT and the number of LWT levels per block. We do not give the detailed results here. However, the results show that the spatial sampling method has obvious advantages. At 0.125 bpp, the spatial sampling method is around 0.6 dB better than DCT and 0.7 dB better than LWT. At the high bitrate of 1 bpp, it is around 4.3 dB better than both DCT and LWT. We therefore mainly consider the spatial sampling method in the rest of the paper.

B. Block search and Block match

We searched for each block (of size M*M) of the original image, in raster-scan order, over the reference image. The criterion is to minimize the sum of squared differences (SSD) between the current block and the reference block. From the earlier discussion, we know that the matched blocks concentrate at displacement 0, and that a lower down-sampling ratio gives a higher concentration. We set the search area equal to the block size M. Full search or any other motion search method from hybrid video coding can be used. In this stage, the displacement of each block and the residual image are calculated, and the residual image is passed to the next stage.

C. Residual image encoding

Any image coding method can be used for the residual image. We use DCT and LWT to code the residual image, and we use different quantization tables to control the bitrate according to QP. To compare different residual image encoding methods, we carried out the following experiments. We chose 16 for Coarse-MBsize, the median value of each block as the sampling value, 'cubic' as the interpolation method, a search area of 8 (displacements from (-8,-8) to (8,8)), and a fixed Residual-MBsize of 16*16. We used full search for motion estimation and either a 3-level 9-7 lifting wavelet transform or the DCT to generate the residual coefficients. We used arithmetic coding and different quantization levels to achieve different compression rates.
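The first two stages (reference image generation by spatial sampling, then full-search block matching under the SSD criterion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `make_reference` and `full_search` are hypothetical, the image dimensions are assumed to be multiples of the block sizes, and nearest-neighbor upsampling is swapped in for the 'cubic' interpolation used in the experiments so that the sketch needs only NumPy. Ties in SSD are resolved in favor of zero displacement, which is also an assumption.

```python
import numpy as np

def make_reference(img, coarse=16):
    """Keep the median of each coarse x coarse block, then upsample back.

    Assumption: nearest-neighbor upsampling replaces the paper's 'cubic'
    interpolation to keep this sketch NumPy-only.
    """
    h, w = img.shape
    assert h % coarse == 0 and w % coarse == 0
    # blocks[i, a, j, b] == img[i*coarse + a, j*coarse + b]
    blocks = img.reshape(h // coarse, coarse, w // coarse, coarse)
    samples = np.median(blocks, axis=(1, 3))
    ref = np.repeat(np.repeat(samples, coarse, axis=0), coarse, axis=1)
    return samples, ref

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return float(np.sum((a - b) ** 2))

def full_search(img, ref, m=16, r=8):
    """For each m x m block (raster order), find the displacement in
    [-r, r] x [-r, r] that minimizes the SSD against the reference image.

    Returns the displacement vectors (dy, dx) and the residual image.
    """
    h, w = img.shape
    assert h % m == 0 and w % m == 0
    img = img.astype(np.float64)
    ref = ref.astype(np.float64)
    vectors = np.zeros((h // m, w // m, 2), dtype=int)
    residual = np.zeros_like(img)
    for by in range(0, h, m):
        for bx in range(0, w, m):
            cur = img[by:by + m, bx:bx + m]
            # Start from zero displacement; only strictly better matches win.
            best = (ssd(cur, ref[by:by + m, bx:bx + m]), 0, 0)
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + m > h or x + m > w:
                        continue  # candidate lies outside the reference
                    cost = ssd(cur, ref[y:y + m, x:x + m])
                    if cost < best[0]:
                        best = (cost, dy, dx)
            _, dy, dx = best
            vectors[by // m, bx // m] = (dy, dx)
            match = ref[by + dy:by + dy + m, bx + dx:bx + dx + m]
            residual[by:by + m, bx:bx + m] = cur - match
    return vectors, residual
```

With the settings listed above, a call would look like `samples, ref = make_reference(img, coarse=16)` followed by `vectors, residual = full_search(img, ref, m=16, r=8)`; the decoder can rebuild each block by adding the decoded residual to the displaced reference block.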
Tab.1 gives the PSNR between the reconstructed and original images. The abbreviations used in the table are defined as follows.

DCT: DCT for residual coding, without block-searching
DCT-S: DCT for residual coding, with block-searching
LWT: LWT for residual coding, without block-searching
LWT-S: LWT for residual coding, with block-searching

TABLE I. RESULTS WITH FIXED SPATIAL DOMAIN SAMPLING DENSITY (PSNR IN dB)

bpp      DCT      DCT-S    LWT      LWT-S
0.125    25.324   28.689   7.909    27.928
0.25     30.964   32.238   27.735   31.436
0.5      35.480   36.123   34.266   35.445
1        39.679   40.110   39.366   39.792

From Tab.1, we can see that DCT for residual coding with block-searching (DCT-S) has the highest PSNR at all bitrates.

Tab.2 gives the percentages of the bit-stream for different values of Coarse-MBsize. When Coarse-MBsize is small, the percentage of data for the reference image is large, while the percentages for the residual image and the block displacements are small. For example, when Coarse-MBsize is 4 and bpp is 0.5, the reference image takes 72.66%, and the residual image and displacement take 23.8059% and 3.534% respectively. When Coarse-MBsize is large, the percentage for the reference image is greatly reduced: it falls to 0.61% when Coarse-MBsize is 32 and bpp is 1. For a fixed Coarse-MBsize, the percentages for the reference image and the displacement become smaller and the percentage for the residual image becomes larger as bpp increases. When Coarse-MBsize is 16, under the four bitrates 0.125, 0.25, 0.5 and 1 bpp, the PSNR is better than for other combinations. Hence in our further experimental work, we

chose Coarse-MBsize equal to 16, the median value of each block as the sampling value, 'cubic' as the interpolation method, a search area of 8 and a fixed Residual-MBsize of 16*16.

TABLE II. RATIO OF DIFFERENT PART OF THE BITSTREAM

Coarse MBsize   bpp     Coarse bit %   Residual bit %   Displacement cost %   PSNR
4               0.5     72.66          23.80            3.53                  33.79
4               1       32.12          66.31            1.56                  40.19
8               0.125   63.67          13.68            22.64                 28.25
8               0.25    41.01          44.38            14.60                 31.32
8               0.5     21.29          71.13            7.57                  35.45
8               1       9.70           86.84            3.45                  40.12
16              0.125   21.03          43.56            35.40                 28.30
16              0.25    9.92           73.3769          16.703                32.343
16              0.5     5.82           84.3747          9.805                 35.386
16              1       2.56           93.1329          4.307                 40.12
32              0.125   4.99           59.4372          35.572                28.326
32              0.25    2.36           80.7660          16.874                32.263
32              0.5     1.38           88.7750          9.845                 35.344
32              1       0.61           95.0383          4.351                 40.111

IV. EXPERIMENTAL RESULTS

We carried out experiments on a large number of images; the results are presented in the following two parts.

A. Comparison with JPEG

Further experiments compared our approach with JPEG. We only give the results for the image Lena, but similar results are obtained for other images. The results are shown in Tab.3. The proposed block search algorithm is better than JPEG.

TABLE III. PERFORMANCE OF PROPOSED METHOD AND JPEG

        Proposed   JPEG     Proposed   JPEG
bpp     0.12589    0.1152   0.24751    0.2502
PSNR    29.211     27.307   32.929     32.343
bpp     0.48921    0.4853   1.0057     0.9914
PSNR    35.984     35.74    40.166     39.179

Figure 4. Performance of proposed, JPEG and JPEG2000 at 0.25 bpp and 0.125 bpp: (a) Proposed; (b) JPEG; (c) JPEG2000

B. Comparison with H.264 and JPEG2000

Furthermore, we compare the block search method with JPEG2000 using JasPer [7] and with intra-frame coding of H.264 [8] using JM12.4 [9]. We chose the image Lena and used intra-only coding with CABAC as the entropy coding method. We mainly consider the encoding and decoding time, and the PSNR at a fixed bitrate. The results are shown in Fig.5-Fig.7.
From Fig.5, it is clear that although the block-searching method is better than JPEG, H.264 and JPEG2000 perform comparably to each other and better than the block search method, especially at low bitrates. From Fig.6, we can conclude that H.264 requires the longest encoding time, the block search method requires only around 1/2 of the encoding time of the JM, and JPEG2000 has the shortest encoding time. From Fig.7, we can see that H.264 has the shortest decoding time and that the block search method lies between H.264 and JPEG2000. The main reason for the lower PSNR of the block search method is the cost of encoding the coarse image. Fig.4 shows the processing results for the image Lena. When the bitrate is 0.125 bpp, the block-searching method shows much less blocking effect than JPEG.

Figure 5. Comparison of bitrate vs PSNR
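The PSNR and bpp values quoted in this section follow the usual definitions, which can be sketched as below. This is an illustrative sketch only: it assumes 8-bit grayscale images (peak value 255), which the paper does not state explicitly, and the function names `psnr` and `bits_per_pixel` are hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size.

    Assumption: 8-bit samples, so the peak value defaults to 255.
    """
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def bits_per_pixel(num_bytes, height, width):
    """Bitrate in bits per pixel for a compressed stream of num_bytes bytes."""
    return 8.0 * num_bytes / (height * width)
```

For a 512*512 image such as Lena, a compressed stream of 32768 bytes corresponds to `bits_per_pixel(32768, 512, 512)`, that is, 1 bpp.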

V. CONCLUSION AND FUTURE WORKS

We have proposed a Block-matching based image compression algorithm. Following the motion estimation method used in video compression, and based on experimental observation and analysis, we applied spatial sampling and interpolation to generate the reference image. Each block of the original image is then matched by a full search on the reference image to find the block with the minimal prediction error. The DCT is used to code the residual coefficients. Our experimental results show that the method is better than JPEG. The comparison with JPEG2000 and with I-frame coding of H.264 also shows its validity. As most video compression standards are block based, this method can also be generalized to I-frame coding in video compression.

Figure 6. Comparison of encoding time

Figure 7. Comparison of Decoding time

VI. ACKNOWLEDGEMENT

This work was supported by NSFC (30870666).

REFERENCES

[1] Bovik, Al (ed.), Handbook of Image and Video Processing, San Diego: Academic Press, 2000.
[2] Wang, Yao, Jörn Ostermann, and Ya-Qin Zhang, Video Processing and Communications, Signal Processing Series, Upper Saddle River, N.J.: Prentice Hall, 2002.
[3] Rafael C. Gonzalez, Digital Image Processing (3rd ed.), Pearson Prentice Hall, 2008.
[4] Burt, Peter and Adelson, Ted, The Laplacian Pyramid as a Compact Image Code, IEEE Transactions on Communications, vol. COM-31, pp. 532-540, 1983.
[5] Segall, A., Katsaggelos, A., Resampling for Spatial Scalability, 2006 IEEE International Conference on Image Processing, Oct. 8-11, 2006.
[6] de Boor, C., A Practical Guide to Splines, Springer-Verlag, 1978.
[7] JasPer-1.900.1, http://www.ece.uvic.ca/~mdadams/jasper/
[8] Heiko Schwarz, Detlev Marpe and Thomas Wiegand, Overview of the Scalable Video Coding Extension of the H.264/AVC Standard, IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, Sep. 2007.
[9] IP Homepage - H.264/AVC JM Reference Software -- JM12.4