Prediction of signs of DCT coefficients in block-based lossy image compression

Nikolay N. Ponomarenko a, Andriy V. Bazhyna b*, Karen O. Egiazarian b

a Department 504, National Aerospace University, Chkalova St. 17, 61070, Kharkov, UKRAINE;
b Institute of Signal Processing, Tampere University of Technology, P.O.BOX-553, FIN-33101, Tampere, FINLAND

ABSTRACT

It is generally accepted that the signs of DCT coefficients are practically impossible to predict. Therefore each coded sign of a DCT coefficient usually occupies 1 bit in the compressed data. At the same time, the coded signs of all DCT coefficients together occupy about 20-25% of a compressed image. In this work we propose an effective approach to predicting the signs of DCT coefficients in block-based image compression. For that, the pixel values of already coded/decoded neighbor blocks of the image are used. The approach consists of two stages. First, the pixel values of the row and the column of the current block that are nearest to the already coded neighbor blocks are predicted by a context-based adaptive predictor. At the second stage, this row and column are used to predict the signs of the DCT coefficients. Depending on the complexity of an image, the proposed method compresses the signs of DCT coefficients to 60-85% of their original size. This corresponds to an increase in the compression ratio of the entire image by 3-9% (or up to 0.5 dB improvement in PSNR).

Keywords: DCT, lossy image compression, prediction

1. INTRODUCTION

Lossy image compression based on the discrete cosine transform (DCT) [1] has been an area of intensive investigation during the last several decades. Widely known image and video standards such as JPEG [2] and MPEG [3] are both based on the DCT. DCT-based image compression methods are used in digital photography and digital TV, for remote sensing data compression [4], for compression of medical images [5], etc. In recent decades intense investigations have also been carried out in the area of wavelet image compression.
This research culminated in 2002 in the introduction of the new wavelet-based JPEG2000 standard [6]. However, JPEG2000 has not been able to supplant JPEG, which remains the main compression standard for digital cameras. Moreover, in recent years new effective DCT-based methods [7-12] have been developed that significantly outperform JPEG2000.

Usually the DCT is carried out on separate blocks of an image. The obtained DCT coefficients are then quantized and compressed by a method that reduces the statistical redundancy of the data (e.g. Huffman or arithmetic coding). It is generally accepted that the signs of DCT coefficients are practically impossible to predict. Therefore each coded sign of a DCT coefficient usually occupies 1 bit in the compressed data. At the same time, the coded signs of all DCT coefficients together occupy up to 20-25% of a compressed image (see Table 3 in Section 6).

Some attempts to predict the signs of wavelet coefficients in wavelet-based image compression have been made recently [13-16], with an average improvement of around 0.3 dB. However, to the best of the authors' knowledge, no studies on the prediction of signs of DCT coefficients have been published so far. On the other hand, finding an effective way to predict the signs of DCT coefficients may significantly increase the performance of all image and video compression methods similar to JPEG and MPEG.

In this work we propose an effective method for the prediction and compression of signs of DCT coefficients. Section 2 describes the idea of the proposed approach to sign prediction. Different sign search strategies are considered in Section 3. Section 4 is devoted to the prediction of the pixel values of the first row and first column of an image block from the already coded part of the image. In Section 5 we discuss context modeling for arithmetic coding of prediction errors. An analysis of experimental results is presented in Section 6.

* andriy.bazhyna@tut.fi; phone +358 3 3115 4963
Image Processing: Algorithms and Systems V, edited by Jaakko T. Astola, Karen O. Egiazarian, Edward R. Dougherty, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 6497, 64970L, 2007 SPIE-IS&T, 0277-786X/07/$18

2. PROPOSED METHOD

In this work we propose an effective method to predict the signs of DCT coefficients. The primary assumption of our method is that the values of pixels located at the borders of neighboring blocks are highly correlated (Fig. 1b-d). That is, the pixels of the first row and first column of the current block can be predicted well from the pixel values of already coded/decoded neighbor blocks of the image.

[Figure 1: image panels (a)-(j) and coefficient matrices not reproduced]
Fig. 1. Image blocks from the Barbara test image: DCT coefficients of the current block after transform and uniform quantization by factor 30 (a); block located to the north of the current block (b); block located to the west of the current block (c); current block, the 11th block in the 3rd row of blocks (d). Blocks generated from the same DCT coefficients, with absolute values and positions unchanged and the sign of each coefficient selected randomly (e-j), shown as pairs of a block and its DCT coefficients: (e-f), (g-h), (i-j).

However, the above holds only if the right combination of signs of the DCT coefficients of the current block is selected (Fig. 1a). For other combinations of signs the border pixels of the blocks will most probably be dissimilar. This is illustrated in Fig. 1e-j, where several examples of blocks generated from the same DCT coefficients are presented. The absolute values of the DCT coefficients and their positions remain the same. The signs, however, were randomly selected for each coefficient.
As can be seen, none of the blocks with randomly selected signs is similar to the true current block. The dissimilarity can easily be measured numerically as the difference between the first row of the current block and the last row of the block to the north, plus the difference between the first column of the current block and the last column of the block to the west. We use the mean square error (MSE) as the measure of dissimilarity. For the right combination of signs in the example of Fig. 1a-d the MSE is 1344. For the examples shown in Fig. 1e-f, g-h and i-j the MSE is 5556, 77359 and 12592, respectively. Even for the variant that differs from the original only in the signs of two small-valued coefficients (Fig. 1e-f), the MSE increases more than four times.

Based on this idea, we can summarize the main steps of the proposed method for predicting the signs of DCT coefficients. Let us call the pixels of the first row and first column of the current block the test rows. The method consists of two stages. First, the values of the test-row pixels are predicted from neighboring, already coded blocks. We call this prediction the estimate of the test rows. At the second stage, the estimate of the test rows is used as follows. For all possible combinations of signs of the DCT coefficients, the test-row pixels are calculated via the inverse DCT. The generated test rows are compared with the previously found estimate of the test rows, using the MSE as the quality criterion. Finally, the sign combination that provides the smallest MSE is selected as the prediction.

The smallest MSE does not always correspond to the right combination of signs. Therefore we cannot rely on it fully and exclude the sign information from the coded bitstream altogether. In our method, one bit per sign (the "guess or not" value) indicates whether the sign was predicted correctly. Since the number of correctly predicted signs is greater than the number of incorrectly predicted ones, the entropy of this binary sequence is smaller than 1 bit per symbol. As a result, it can be efficiently compressed by binary arithmetic coding.

It should be noted that a full inverse DCT of the block is not required for every possible combination of signs. The number of pixels in the test rows is far smaller than the total number of pixels in the block. Thus it is reasonable to store, for a unit DCT coefficient at every position within the block, the resulting values at the test-row positions. The test-row values for a particular combination of DCT coefficients are then found using the linearity of the DCT (scaling and additivity). For an 8x8 block the number of pixels in the test rows is 15, while the total number of pixels in the block is 64. The memory requirement is then 15x63 = 945 numbers (the DC coefficient is not counted).

3. SEARCH METHOD FOR SIGNS OF DCT COEFFICIENTS

The sign of each DCT coefficient can be either positive or negative. The total number of possible sign combinations is 2^N, where N denotes the number of nonzero DCT coefficients in the block. For the 8x8 block size the maximum is N = 63, since the sign of the DC coefficient is known. For complex blocks and small quantization steps, N can easily reach several tens.
As a result, an exhaustive search over all possible combinations is not feasible in real time.

Let us consider as the objective function the MSE between the estimate of the test rows on the one hand and the test rows obtained for a particular combination of signs on the other. This objective function was found to have a large number of local minima. Therefore, the order in which the signs of the DCT coefficients within a single block are predicted is important for good prediction performance. Otherwise, the search may get stuck in one of the local minima. The search method should, on the one hand, be fast enough to be performed in real time. On the other hand, it should converge well to the global minimum.

The signs of the coefficients can be checked individually, in pairs, in triples and so on. For reasons of simplicity and real-time constraints we consider here only checking the signs of the coefficients one by one. This approach is applicable because of the linearity of the DCT. For independent coefficient checking there are a number of possible search orders. The simplest way is to check the signs of the coefficients in row-wise or column-wise order. This is, however, not the most efficient way. Another possibility is to order the DCT coefficients by absolute value, so that coefficients with higher absolute values are checked first. The reason for this is that the probability of a false sign prediction is lower for a coefficient with a higher absolute value, and thus there is a smaller chance of falling into a local minimum. The third possibility is to order the DCT coefficients by their distance from the beginning of the block (from the DC coefficient), so that coefficients located closer to the beginning of the block are checked first. The reason for this is that the probability of a prediction error is usually lower for low-frequency coefficients.

In our experiments we start from a block that contains only the DC coefficient.
After that, we add the AC coefficients one by one in one of the orders described above. Both the positive and the negative value of each added coefficient are checked. For both variants we calculate the pixel values of the test rows. The generated values are compared with those obtained by prediction from the neighbor blocks. As the prediction of the sign we select the variant that provides the smaller MSE. This procedure is continued for all AC coefficients of the block.

The percentage of errors for the different scanning methods is given in Table 1. As can be seen, the percentages of errors for all three methods are rather close. The absolute value based search performs slightly better than the others for practically all test images and quantization values. Additionally, with this search method the "guess or not" sequence is also ordered starting from the coefficients with the largest absolute values. As a result, the probability of a false prediction is smaller at the beginning of the "guess or not" sequence of every block. This feature is used in Section 5 to compress the sign information more efficiently with a binary arithmetic coder.
Table 1. Percentage of sign prediction errors for different search methods and quantization values: 1 - row-wise search, 2 - absolute value based search, 3 - position based search. The block size is 8x8 with uniform quantization of all DCT coefficients.

Quantization          10                  20                  40
Image/Method     1     2     3       1     2     3       1     2     3
Lenna          30.0  29.3  29.9    23.0  22.1  23.0    19.5  18.8  19.5
Barbara        34.9  34.9  34.9    34.0  34.0  33.9    33.9  33.9  33.7
Baboon         39.1  38.8  39.2    37.0  37.6  37.1    34.2  33.9  34.3
Peppers        37.8  37.7  37.8    24.6  24.5  24.7    15.6  15.5  15.6
Goldhill       32.2  31.8  32.3    27.5  27.2  27.5    23.1  22.9  23.0

Based on the above, as a compromise between prediction time and prediction quality, we select the search method that predicts the signs of the DCT coefficients independently, starting with the coefficients with the largest absolute values.

4. PREDICTION OF BLOCK

Until now we have used as the estimate of the test rows the closest row and column of the neighboring blocks located to the north and to the west of the currently processed block. However, the quality of sign prediction by the proposed method depends on the accuracy with which the test rows are estimated from the neighbor blocks of the image. This can be illustrated with a simple example. Suppose that we know the test rows exactly and use them instead of the estimate of the test rows with search method 2 from Table 1. The percentage of falsely predicted signs for the Lenna test image and quantization values 10, 20 and 40 is then 4.7, 3.6 and 2.6, respectively. This is 6-7 times smaller than with the simple test-row estimate discussed earlier (see Table 1). Even for the hardest case, the Baboon test image with quantization value 10, the share of false predictions drops to 7.7%, which is more than 5 times smaller. Therefore, it is important to improve the estimate of the test rows in order to improve the prediction of the signs.

Both the first row and the first column of each block of an image should be predicted.
The simplest example of such a prediction is to use the last row and last column of the already coded blocks of the image. In this paper a more elaborate prediction method is proposed.

Fig. 2. A fragment of an image around the pixel X to be predicted
The context of a coded pixel X is presented in Fig. 2. Depending on the place of the pixel in the block (in the predicted row or the predicted column of the block), all of the values of pixels A-N may be known, or only a part of them. The value of the pixel X is predicted from this known part of the values of pixels A-N. The prediction of the value of X is carried out by a method similar to the one described in [17]. The prediction relies on the hypothesis that the value of X is similar to the value of the pixel A to the same degree as the context (neighborhood) of the pixel X (the pixels A, B, C, D) is similar to the context of the pixel A (the pixels F, G, H, B). Let us calculate the differences between the contexts of the pixels A, B, C, D, E and the context of the coded pixel X:

L_A = ((F-A)^2 + (G-B)^2 + (H-C)^2 + (B-D)^2) / 4,
L_B = ((G-A)^2 + (H-B)^2 + (I-C)^2 + (C-D)^2 + (D-E)^2) / 5,
L_C = ((H-A)^2 + (I-B)^2 + (J-C)^2 + (K-D)^2 + (L-E)^2) / 5,
L_D = ((B-A)^2 + (C-B)^2 + (K-C)^2 + (L-D)^2 + (M-E)^2) / 5,
L_E = ((D-B)^2 + (L-C)^2 + (M-D)^2 + (N-E)^2) / 4.

If any of the values L_A, L_B, L_C, L_D or L_E is equal to zero, it is replaced by unity. The prediction P_X is a weighted sum of the nearest pixels A, B, C, D, E:

P_X = (A/L_A + B/L_B + C/L_C + D/L_D + E/L_E) / (1/L_A + 1/L_B + 1/L_C + 1/L_D + 1/L_E).

If the values of some of the pixels A, B, C, D, E are unknown, they are not used in the prediction. If the values of some pixels among F-N are unknown (e.g. at the image edges), they are substituted by the value of the nearest known (already coded) pixel.

The prediction method used does not require any prior knowledge of the image and is able to predict pixel values effectively in edge and texture areas. The accuracy of this prediction, and consequently the quality of sign prediction, is worse for more complicated (noise-type) images. Table 2 contains the root mean square error (RMSE) of pixel value prediction for the proposed prediction method (Prediction) and for the method of simply copying the neighbor row and column of already coded blocks (Copy).
The RMSE values are given for different quantization steps (QS) of the image block DCT coefficients.

Table 2. RMSE for two methods of prediction of the first row and first column of an image block

                QS=5               QS=10              QS=20              QS=40
Image       Copy  Prediction   Copy  Prediction   Copy  Prediction   Copy  Prediction
Lena        8.48     6.73      8.72     7.01      9.50     7.78     11.16     9.90
Barbara    13.97    11.18     14.36    11.57     15.47    12.77     17.78    15.15
Baboon     28.96    26.13     29.09    26.26     29.68    26.81     30.98    28.12
Goldhill   10.78    10.31     11.17    10.61     12.04    11.38     13.44    12.97
Peppers    10.52     8.85     10.75     9.21     10.64     9.40     11.66    10.70

The proposed method provides a smaller prediction error than the Copy method for every image and every QS; the RMSE reduction ranges from about 0.5 (Goldhill) to about 2.9 (Baboon). The second conclusion is that the prediction error depends on the image characteristics to a larger degree than on the given QS.
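The weighted prediction of Section 4 can be sketched directly from the formulas for L_A to L_E and P_X. This is a minimal sketch that assumes all pixels A-N are already known (or have already been substituted by the caller, as described in the text). The function name `predict_pixel` and the dictionary interface are hypothetical, and the spatial layout of Fig. 2 is not needed because the formulas refer to the pixels by letter.

```python
def predict_pixel(p):
    """Predict X from a dict mapping the letters 'A'..'N' to pixel values.

    Assumption: every pixel A..N is known; the handling of unknown
    pixels described in the text is left to the caller.
    """
    def sq(u, v):
        return (p[u] - p[v]) ** 2

    # Context distances, term by term as in the formulas for L_A..L_E.
    dist = {
        'A': (sq('F', 'A') + sq('G', 'B') + sq('H', 'C') + sq('B', 'D')) / 4,
        'B': (sq('G', 'A') + sq('H', 'B') + sq('I', 'C') + sq('C', 'D')
              + sq('D', 'E')) / 5,
        'C': (sq('H', 'A') + sq('I', 'B') + sq('J', 'C') + sq('K', 'D')
              + sq('L', 'E')) / 5,
        'D': (sq('B', 'A') + sq('C', 'B') + sq('K', 'C') + sq('L', 'D')
              + sq('M', 'E')) / 5,
        'E': (sq('D', 'B') + sq('L', 'C') + sq('M', 'D') + sq('N', 'E')) / 4,
    }

    num = 0.0
    den = 0.0
    for name, d in dist.items():
        weight = 1.0 / (d if d != 0 else 1.0)  # zero distance -> unity
        num += weight * p[name]
        den += weight
    return num / den


# Usage: a perfectly flat neighborhood predicts the same flat value.
flat = {k: 100 for k in 'ABCDEFGHIJKLMN'}
flat_prediction = predict_pixel(flat)

# Usage: arbitrary values; the prediction is a weighted average of A..E.
ramp = {k: 10 * i for i, k in enumerate('ABCDEFGHIJKLMN')}
ramp_prediction = predict_pixel(ramp)
```

Because the weights are positive and normalized, the prediction always lies between the smallest and the largest of the values A-E; candidates whose context resembles the context of X receive the larger weights.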
5. CODING OF THE SIGNS

The "guess or not" value is indicated by a single bit. In general, the number of correctly predicted signs is significantly greater than the number of incorrectly predicted ones. Due to this, the entropy of this sequence is smaller than 1 bit per symbol. Thus it can be efficiently compressed by binary arithmetic coding [18].

The coding performance can be improved further by taking a number of factors into account. The probability of a false sign prediction is usually smaller for low-frequency DCT coefficients (condition C1). The probability of a prediction error is also smaller for DCT coefficients with larger absolute values (condition C2). Usually, if any signs have already been falsely predicted in the current block, the probability of a prediction error for the rest of the signs is larger (condition C3). Using these and some other conditions, our method employs context probability models for coding (see Fig. 3). This additionally increases the compression ratio (CR) of the signs of DCT coefficients.

C1      C2      C3      Probability model
true    true    true    PM 1
true    true    false   PM 2
true    false   true    PM 3
true    false   false   PM 4
false   true    true    PM 5
false   true    false   PM 6
false   false   true    PM 7
false   false   false   PM 8

Fig. 3. Scheme for selecting the number of the probability model for arithmetic coding of prediction errors

In the scheme in Fig. 3, PM 1 denotes probability model 1 and so on.

6. EXPERIMENTAL RESULTS

Table 3 illustrates that the memory needed to store the signs of DCT coefficients is considerable in comparison with the size of the entire compressed image. The percentage of the bitstream occupied by signs is presented for three compression methods: JPEG, AGU [7] with block size 8x8 and AGU with block size 32x32. This percentage is nearly the same for all test images (for both JPEG and AGU) and, for AGU 32x32, also nearly independent of the compression ratio. The smaller percentage of space occupied by signs for JPEG is due to its less efficient coding of other information, such as coefficient positions and absolute values.
The experiments were carried out on the grayscale test images Lenna, Barbara, Baboon, Peppers and Goldhill. The dimensions of the test images were 512x512 pixels. Downscaled versions of the test images are shown in Fig. 4.

Fig. 4. Thumbnails of test images: Lenna (a), Barbara (b), Baboon (c), Peppers (d), Goldhill (e).
Table 3. Percentage of the bitstream occupied by signs for three compression methods: JPEG, AGU 8x8 and AGU 32x32

                          Compression ratio
Method      Image        4     8     16    32    64
JPEG        Lenna      16.0  15.5  14.7  12.8  10.0
            Barbara    15.7  16.2  15.2  12.8  10.1
            Baboon     17.5  18.1  17.6  15.2  11.2
            Peppers    15.8  15.6  14.4  12.8  10.3
            Goldhill   16.5  17.0  16.3  14.3  11.2
AGU 8x8     Lenna      22.1  19.9  17.5  14.9  11.1
            Barbara    20.3  19.9  18.9  16.1  12.6
            Baboon     23.2  22.1  19.5  16.0  11.9
            Peppers    22.8  18.8  16.9  15.0  12.3
            Goldhill   22.6  20.6  18.1  14.9  11.5
AGU 32x32   Lenna      23.7  22.1  22.2  22.1  21.4
            Barbara    22.4  23.0  23.1  22.0  20.7
            Baboon     24.5  24.2  23.2  21.5  19.8
            Peppers    24.5  22.0  22.0  22.1  21.6
            Goldhill   24.3  23.2  22.5  21.6  20.6

Table 4 contains the results of applying the proposed method to sign compression, in bits per sign (bpp = 1/CR), for different QS with the AGU coder (image block size 8x8).

Table 4. Results of compression of signs by the proposed method (bpp)

Image       QS=5    QS=10   QS=20   QS=40
Lena        0.905   0.824   0.694   0.628
Barbara     0.919   0.879   0.849   0.832
Baboon      0.961   0.953   0.939   0.915
Goldhill    0.926   0.887   0.824   0.748
Peppers     0.927   0.882   0.731   0.541

As can be seen, the proposed method provides a CR of the DCT coefficient signs in the range from 1.04 up to 1.85 (bpp from 0.961 down to 0.541 in Table 4). The maximal CRs are obtained for simple images like Peppers (see Fig. 4d), whilst the minimal CRs are obtained for complex images like Baboon (see Fig. 4c). Higher CRs are achieved for higher QS. If the signs of DCT coefficients take up 20% of the size of the compressed image, then an increase of the CR of the signs by a factor of 1.85 is equivalent to an increase of the CR of all image data by about 9%. In turn, this is equivalent to an increase in PSNR (for fixed CR) of approximately 0.5 dB.
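The arithmetic behind the overall gain can be checked with a short calculation. This is a sketch under the assumption that only the sign portion of the bitstream shrinks and everything else is unchanged; the function name `overall_size_ratio` is hypothetical. The inputs are the 20% sign share and the best sign CR, 1/0.541 (Peppers, QS=40 in Table 4).

```python
def overall_size_ratio(sign_fraction, sign_cr):
    """New bitstream size relative to the old one.

    sign_fraction: share of the bitstream occupied by signs (0..1).
    sign_cr:       compression ratio achieved on the signs alone.
    Assumption: the non-sign part of the bitstream is unchanged.
    """
    return (1.0 - sign_fraction) + sign_fraction / sign_cr


best_sign_cr = 1.0 / 0.541            # about 1.85
ratio = overall_size_ratio(0.20, best_sign_cr)
saving_percent = 100.0 * (1.0 - ratio)          # reduction of bitstream size
cr_gain_percent = 100.0 * (1.0 / ratio - 1.0)   # increase of overall CR
```

Under these assumptions the bitstream shrinks by roughly 9.2%, in line with the figure of about 9% quoted in the text; expressed as the ratio of old size to new size, the overall CR grows by roughly 10%.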
7. CONCLUSIONS

In this work an effective method for predicting the signs of DCT coefficients is proposed. Depending on the complexity of an image, the proposed method compresses the signs of DCT coefficients to 60-85% of their original size. This corresponds to an increase in the compression ratio of the entire image by 3-9% (or up to 0.5 dB improvement in PSNR). These values describe a lower bound of the improvement, since there are several ways to advance the sign coding efficiency. One can use a more elaborate method for predicting the first column and row of a block. It is also possible to apply more complex context modeling for coding the sign prediction errors. More complex sign search methods can be used as well.

The proposed method for compressing the signs of DCT coefficients is independent of any other part of the coder. Moreover, it is suitable for use in scalable and progressive coding. Thus, it may be used to improve practically all existing image and video compression methods. In particular, the proposed method may be used for further compression of already compressed JPEG images without any additional losses. Another application is to recover the signs of DCT coefficients when they are lost during transmission over lossy channels.

REFERENCES

1. Rao, K., Yip, P., Discrete Cosine Transform, Algorithms, Advantages, Applications, Academic Press, 1990.
2. Pennebaker, W.B., Mitchell, J.L., JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
3. ISO/IEC 13818-3 "Generic Coding of Moving Pictures and Associated Audio (Part 3: MPEG/Audio)", 2nd Edition, Feb. 1997.
4. Lukin, V., Ponomarenko, N., Zriakhov, M., Zelensky, A., Egiazarian, K., Astola, J., Quasi-optimal compression of noisy optical and radar images, Proc. of Image and Signal Processing for Remote Sensing XII, Stockholm, Sweden, Vol. 6365, 12 p., 2006.
5. Chong, M.N., Ang, E.L., Tan, C.S., Loo, C.Z., Compression of medical images through adaptive block-size DCT coding, Proceedings of SPIE Medical Imaging 1996: Image Display, vol. 2707, pp. 252-260, 1996.
6. Taubman, D., Marcellin, M., JPEG 2000: Image Compression Fundamentals, Standards and Practice, Boston: Kluwer, 2002.
7. Ponomarenko, N., Lukin, V., Egiazarian, K., Astola, J., DCT Based High Quality Image Compression, in Proc. Scandinavian Conf. on Image Analysis, Springer Series: Lecture Notes in Computer Science, vol. 3540, pp. 1177-1185, 2005.
8. Ponomarenko, N., Egiazarian, K., Lukin, V., Astola, J., High-Quality DCT-Based Image Compression Using Partition Schemes, to appear in IEEE Signal Processing Letters, Vol. 13, No. 12, December 2006, 4 p.
9. Tran, T.D., Nguyen, T.Q., A lapped transform progressive image coder, in IEEE Proceedings of the International Symposium on Circuits and Systems ISCAS '98, Vol. 4, pp. 1-4, 1998.
10. Huang, Y., Pollak, I., MLC: A Novel Image Coder Based on Multi-tree Local Cosine Dictionaries, IEEE Signal Processing Letters, Vol. 12, Issue 12, pp. 843-846, 2005.
11. Hou, X.S., Liu, G.Z., Zou, Y.Y., Embedded Quadtree-Based Image Compression in DCT Domain, in Proc. of ICASSP, Vol. III, pp. III-277 - III-280, 2003.
12. Dai, W., Liu, L., Tran, T.D., Adaptive block-based image coding with pre-/post-filtering, in Proceedings of Data Compression Conference DCC 2005, pp. 73-82, 2005.
13. Deever, A., Hemami, S.S., What's Your Sign?: Efficient Sign Coding for Embedded Wavelet Image Coding, Proceedings of Data Compression Conference 2000, Snowbird, Utah, March 2000.
14. Bilgin, A., Sementilli, P., Marcellin, M., Progressive image coding using trellis coded quantization, IEEE Transactions on Image Processing, 8(11), pp. 1638-1643, November 1999.
15. Taubman, D., High performance scalable image compression with EBCOT, Proceedings of International Conference on Image Processing, pp. 344-348, 1999.
16. Wu, X., High-order context modeling and embedded conditional entropy coding of wavelet coefficients for image compression, Proc. of 31st Asilomar Conf. on Signals, Systems, and Computers, pp. 1378-1382, 1997.
17. Golchin, F., Paliwal, K.K., A Context-based Adaptive Predictor For Use In Lossless Coding, in Proc. of IEEE Region 10 Ann. Conf. Speech and Image Techn. for Computing and Telecommunications, vol. 2, pp. 711-714, 1997.
18. Langdon, G.G., Rissanen, J.J., A simple general binary source code, IEEE Trans. Inf. Theory, pp. 800-803, 1982.