An Adaptive and Deterministic Method for Initializing the Lloyd-Max Algorithm

Jared Vicory and M. Emre Celebi
Department of Computer Science, Louisiana State University, Shreveport, LA, USA

ABSTRACT

Gray-level quantization (reduction) is an important operation in image processing and analysis. The Lloyd-Max algorithm (LMA) is a classic scalar quantization algorithm that can be used for gray-level reduction with minimal mean squared distortion. However, the algorithm is known to be very sensitive to the choice of initial centers. In this paper, we introduce an adaptive and deterministic algorithm to initialize the LMA for gray-level quantization. Experiments on a diverse set of publicly available test images demonstrate that the presented method outperforms the commonly used uniform initialization method.

Keywords: Gray-level reduction, scalar quantization, Lloyd-Max algorithm, center initialization

1. INTRODUCTION

Gray-level quantization (reduction) is an important operation in image processing and analysis. The objective is to reduce the number of unique gray levels in an image from an initial value of $L$ to a desired number $K$ ($K < L$) with minimal distortion. In most applications, 8-bit pixels in an image can be reduced to 4 bits or fewer without introducing perceivable distortion. Immediate applications of gray-level quantization include: (i) image compression, (ii) image segmentation, (iii) image analysis, and (iv) content-based image retrieval.

The Lloyd-Max algorithm (LMA) [1,2] is a well-known scalar quantization algorithm that can be used for gray-level reduction. This algorithm is known to be optimal when the distortion measure is taken as the mean squared error (MSE). While originally developed for the discretization of analog signals so that they can be easily stored and manipulated, the LMA can also be used to quantize discrete data, such as an image histogram.
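As a rough illustration of applying the LMA to an image histogram, the following Python sketch alternates two steps until the MSE stops decreasing: partition boundaries are placed midway between adjacent centers, and each center is moved to the centroid of its partition. The function names (`quantization_mse`, `lloyd_max`), the tolerance, and the iteration cap are hypothetical choices made for this example and are not taken from the paper.

```python
import bisect

def quantization_mse(hist, centers):
    """MSE when every gray level g is mapped to its nearest center."""
    return sum(h * min((g - c) ** 2 for c in centers) for g, h in enumerate(hist))

def lloyd_max(hist, centers, tol=1e-9, max_iter=100):
    """Refine initial centers on a normalized histogram (illustrative sketch)."""
    centers = sorted(float(c) for c in centers)
    prev_mse = float("inf")
    mse = quantization_mse(hist, centers)
    for _ in range(max_iter):
        # Boundaries are the midpoints between adjacent centers.
        bounds = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        weighted = [0.0] * len(centers)
        mass = [0.0] * len(centers)
        for g, h in enumerate(hist):
            k = bisect.bisect_left(bounds, g)  # partition that contains level g
            weighted[k] += g * h
            mass[k] += h
        # Each center moves to the centroid of its partition;
        # an empty partition keeps its previous center.
        centers = [w / m if m > 0 else c
                   for w, m, c in zip(weighted, mass, centers)]
        mse = quantization_mse(hist, centers)
        if prev_mse - mse <= tol:  # change in MSE is negligible
            break
        prev_mse = mse
    return centers, mse

# Toy histogram with two well-separated groups of gray levels.
hist = [0.0] * 256
for g in (10, 20, 200, 210):
    hist[g] = 0.25
centers, mse = lloyd_max(hist, [64, 192])
print(centers, mse)  # [15.0, 205.0] 25.0
```

On this toy histogram the two centers settle on the centroids of the two groups after a single update, and the second pass only confirms that the MSE no longer changes.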
The first step in the LMA is to select an initial set of $K$ points, or centers, from the data set to be quantized. The data can then be partitioned around these initial centers in such a way that each partition consists of a set of points that are best represented by their respective center in the MSE sense. It can be shown that for two adjacent centers $c_i$ and $c_{i+1}$, the optimal partition boundary lying between them is given by their average, i.e. $(c_i + c_{i+1})/2$. In addition, for a given partition, the optimum center is given by the centroid of that partition, which is calculated as the weighted average of the bins that lie between the partition's boundaries. Once a new set of centers is obtained by calculating the centroid of each partition, new partition boundaries can be obtained by averaging the new centers. Each iteration of this procedure yields a new set of centers whose MSE is less than or equal to that of the previous iteration. Therefore, repeating this procedure until the change in MSE becomes negligible yields the final set of centers. In the case of gray-level quantization, each pixel in the original image is mapped to the nearest center. This results in a quantized image, which is a locally optimal representation of the original image for the given initial set of centers.

The performance of the LMA depends heavily on the choice of the initial centers. While the algorithm is guaranteed to obtain a local minimum in MSE, this minimum may be far from the global minimum (the ideal quantization) if the initial centers are chosen poorly. Therefore, an intelligent initialization method will not only enhance the quality of the quantized images, but also accelerate the convergence of the algorithm.

The simplest way to determine the initial centers is to select them randomly from among the points in the data set. This method has two serious drawbacks. First, while it is possible to obtain the global minimum MSE for the input image, it is equally likely to get the worst possible result.
Therefore, over time, the performance of a random initialization scheme will be only mediocre. Second, this scheme is nondeterministic, allowing the same image to be quantized in multiple ways, some of which may be very different. This can be undesirable for applications such as content-based image retrieval, where it is important to quantize an image the same way every time so that it can be accurately compared with previously quantized images.

Scheunders [3] proposed an initialization method based on genetic algorithms. This method gives better results than random initialization; however, it is still nondeterministic and the genetic optimization procedure is computationally demanding. An alternative initialization method involves selecting the centers so that they are uniformly distributed throughout the data set. This commonly used method has the advantage of being deterministic, but it disregards the distribution of the data, which is likely to yield suboptimal results in many cases.

In this paper, we present an adaptive and deterministic initialization method for the LMA. Our method is loosely based on the farthest-first heuristic [4,5], which is commonly used to initialize the multidimensional variant of the LMA, i.e. the Linde-Buzo-Gray (LBG) algorithm [6]. The rest of the paper is organized as follows. Section 2 describes the proposed initialization method. Section 3 presents the comparison of the proposed method with the commonly used uniform initialization method. Finally, Section 4 gives the conclusions.

Corresponding author information: E-mail: ecelebi@lsus.edu, Telephone: 1-318-795-4281.

2. PROPOSED INITIALIZATION METHOD

In order to initialize the LMA adaptively, we modify the farthest-first heuristic [4,5]. According to this method, the first center $c_1$ is calculated as the mean of the data vectors and the $i$-th center $c_i$ is chosen to be the point that has the largest minimum distance to the previously selected centers, i.e. $c_1, c_2, \ldots, c_{i-1}$. In the $n$-dimensional case, the distance function is often chosen as the squared Euclidean ($L_2$) distance.
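The farthest-first heuristic described above can be sketched in Python as follows. This is a minimal illustration for small point sets, assuming points are given as equal-length tuples; the names `sq_dist` and `farthest_first` are hypothetical, and ties are broken in favor of the earliest point, which is a choice made for this example only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Mean first, then repeatedly the point farthest from all chosen centers."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [mean]
    # min_dist[i] is the distance from points[i] to its nearest chosen center.
    min_dist = [sq_dist(p, mean) for p in points]
    while len(centers) < k:
        # The point with the largest minimum distance becomes the next center.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        min_dist = [min(d, sq_dist(p, points[idx]))
                    for d, p in zip(min_dist, points)]
    return centers

points = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(farthest_first(points, 3))  # [(5.5, 0.0), (0, 0), (11, 0)]
```

Keeping a running minimum distance per point avoids recomputing distances to every earlier center each time a new center is added.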
When dealing with scalar data, a different distance measure is required. In this study, the distance of center $c$ to histogram bin $b$ with height $h$ is calculated using the following Gaussian weighted function:

$$e^{((c-b)/255)^2} \, e^{h^{3/2}}.$$

The advantage of this function is that it weights bins not only by their distance from each center (difference in gray levels), but also by their height (the percentage of pixels that fall into the bin). There are several ways to choose the first center, including the bin with the largest height, the median of the histogram, and the center bin. In this study, the centroid of the histogram, which represents the mean gray level in the input image, is chosen as the first center. The pseudocode of the overall initialization method is given in Algorithm 1. It is important to note that this algorithm does not produce the centers in sorted order. Since $K$ is often a small number, the centers can be sorted easily by a simple algorithm such as insertion sort [7].

3. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed initialization method was compared to the uniform initialization method on 16 images taken from the USC-SIPI Database [8]. Table 1 compares the methods for 4, 8, 12, and 16 gray levels on six representative images. The columns from left to right represent the number of quantization levels ($K$), the MSE obtained by the uniform initialization method, the MSE obtained by the proposed method, and the percent improvement (negative values indicate degradation) achieved by the proposed method, respectively.

Performance statistics over the entire set of 16 images are as follows. The proposed method outperformed uniform initialization in 73% of the cases. In cases where the proposed method performed better, the average MSE improvement was 10.8%, whereas in the remaining cases the average MSE degradation was 5.34%. Overall, the proposed method obtained an average of 6.49% improvement in MSE.
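To make the two compared initializers concrete, the sketch below implements Algorithm 1 next to a uniform initializer. Several points are assumptions of this example rather than statements from the paper: both exponents of the distance function are taken as positive, so that the measure grows with gray-level separation and with bin height; the uniform initializer places centers at the midpoints of $K$ equal-width intervals; and the names `uniform_init` and `proposed_init` are hypothetical.

```python
import math

def uniform_init(k, levels=256):
    """Assumed uniform scheme: midpoints of k equal-width gray-level intervals."""
    return [(i + 0.5) * levels / k for i in range(k)]

def proposed_init(hist, k):
    """Sketch of Algorithm 1 on a normalized histogram; returns sorted centers."""
    scale = len(hist) - 1  # 255 for an 8-bit image
    # The first center is the histogram centroid (mean gray level).
    centers = [sum(g * h for g, h in enumerate(hist))]
    for _ in range(1, k):
        best_bin, best_score = 0, -math.inf
        for j, h in enumerate(hist):
            height_term = math.exp(h ** 1.5)
            # Distance from bin j to its nearest previously selected center
            # (positive exponents are an assumption of this sketch).
            nearest = min(math.exp(((c - j) / scale) ** 2) * height_term
                          for c in centers)
            if nearest > best_score:  # strict test keeps the first maximum
                best_bin, best_score = j, nearest
        centers.append(best_bin)
    return sorted(centers)

flat = [1 / 256] * 256
print(uniform_init(4))         # [32.0, 96.0, 160.0, 224.0]
print(proposed_init(flat, 3))  # [0, 127.5, 255]
```

On a flat histogram the height term is constant, so the procedure reduces to plain farthest-first selection: the mean gray level first, then the two extremes. On a real image histogram the height term shifts the selection toward well-populated bins.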
In summary, the proposed method generally outperforms uniform initialization with respect to distortion minimization. Furthermore, in cases where the former gives inferior results, the discrepancy between the two methods is small (5.34% in MSE on average). This was expected since the latter method allocates the quantization levels without regard to the gray-level distribution of the image, whereas the former performs the allocation adaptively.

Figures 1 and 2 show sample quantization results and the corresponding error images for the Truck and Tiffany images, respectively. The error image for a particular initialization method was obtained by taking the pixelwise absolute difference between the original and quantized images. In order to obtain a better visualization, pixel values of the error images were multiplied by 8 and then negated. It can be seen that the proposed method produces visually pleasing results with less prominent contouring (see, for example, the top road in the Truck image and the neck area in the Tiffany image) and distortion.

Algorithm 1: Proposed Initialization Algorithm

    input : h[0...255] (normalized histogram of the input image)
    output: C = {c_1, c_2, ..., c_K} (K cluster centers)

    // The first center is given by the histogram centroid
    c_1 = 0;
    for g = 0; g <= 255; g = g + 1 do
        c_1 = c_1 + g * h[g];

    // Iterate over the required number of centers
    for i = 2; i <= K; i = i + 1 do
        // Iterate over the histogram bins
        max_dist = -infinity;
        max_index = 0;
        for j = 0; j <= 255; j = j + 1 do
            // Iterate over the previously selected centers
            min_dist = +infinity;
            for k = 1; k < i; k = k + 1 do
                dist = exp(((c_k - j) / 255)^2) * exp(h[j]^(3/2));
                if dist < min_dist then
                    min_dist = dist;
            if max_dist < min_dist then
                max_dist = min_dist;
                max_index = j;
        c_i = max_index;

4. CONCLUSIONS

In this paper, we introduced an effective Lloyd-Max initialization algorithm for gray-level quantization. In contrast to other popular initialization schemes, this algorithm is adaptive, deterministic, and computationally efficient. Experiments on a set of 16 publicly available test images demonstrated that the presented method generally outperforms the commonly used uniform initialization method with respect to distortion minimization.

ACKNOWLEDGMENTS

This publication was made possible by grants from the Louisiana Board of Regents (LEQSF2008-11-RD-A-12) and the US National Science Foundation (0959583, 1117457).

REFERENCES

[1] Max, J., "Quantizing for Minimum Distortion," IRE Transactions on Information Theory 6(1), 7-12 (1960).
[2] Lloyd, S., "Least Squares Quantization in PCM," IEEE Transactions on Information Theory 28(2), 129-136 (1982).

Table 1. MSE comparison of the LMA initialization methods

     K    Uniform   Proposed   Delta (%)

    Airplane (216 gray levels)
     4    131.39    129.64        1.33
     8     35.80     33.88        5.34
    12     18.60     19.87       -6.84
    16     10.59      9.04       14.69

    House (227 gray levels)
     4    171.35    171.52       -0.10
     8     48.53     47.04        3.07
    12     22.71     21.86        3.72
    16     14.40     13.74        4.59

    Lenna (216 gray levels)
     4    162.67    162.53        0.09
     8     43.99     41.92        4.69
    12     20.83     20.83        0.00
    16     14.91     12.92       13.31

    Splash (234 gray levels)
     4    236.52    236.52        0.00
     8     54.00     49.01        9.24
    12     23.03     22.90        0.53
    16     13.78     13.38        2.93

    Tiffany (179 gray levels)
     4    162.40    162.40        0.00
     8     32.32     32.96       -1.99
    12     22.43     20.66        7.91
    16     19.25      9.88       48.66

    Truck (144 gray levels)
     4     71.95     72.41       -0.65
     8     43.50     29.21       32.84
    12     11.95     10.33       13.54
    16      6.50      6.45        0.77
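The Delta (%) column appears consistent with the relative MSE reduction of the proposed method with respect to uniform initialization. The short Python check below recomputes it for three rows of Table 1; the formula and the function name `percent_improvement` are assumptions inferred from the tabulated values, and small differences in the second decimal are to be expected because the listed MSE values are rounded.

```python
def percent_improvement(mse_uniform, mse_proposed):
    """Relative MSE reduction in percent; negative values mean degradation."""
    return 100.0 * (mse_uniform - mse_proposed) / mse_uniform

# (image, K, uniform MSE, proposed MSE, tabulated Delta) taken from Table 1
rows = [
    ("Airplane", 4, 131.39, 129.64, 1.33),
    ("Airplane", 12, 18.60, 19.87, -6.84),
    ("Truck", 8, 43.50, 29.21, 32.84),
]
for image, k, uniform, proposed, tabulated in rows:
    delta = percent_improvement(uniform, proposed)
    print(f"{image:8s} K={k:2d} computed={delta:6.2f} tabulated={tabulated:6.2f}")
```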

Figure 1. Quantization results for the Truck image (K = 8): (a) original image, (b) uniform initialization, (c) error image for uniform initialization, (d) proposed initialization, (e) error image for proposed initialization.

Figure 2. Quantization results for the Tiffany image (K = 16): (a) original image, (b) uniform initialization, (c) error image for uniform initialization, (d) proposed initialization, (e) error image for proposed initialization.

[3] Scheunders, P., "A Genetic Lloyd-Max Image Quantization Algorithm," Pattern Recognition Letters 17(5), 547-556 (1996).
[4] Gonzalez, T. F., "Clustering to Minimize the Maximum Intercluster Distance," Theoretical Computer Science 38, 293-306 (1985).
[5] Hochbaum, D. and Shmoys, D., "A Best Possible Heuristic for the k-center Problem," Mathematics of Operations Research 10(2), 180-184 (1985).
[6] Linde, Y., Buzo, A., and Gray, R., "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications 28(1), 84-95 (1980).
[7] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C., [Introduction to Algorithms], The MIT Press, third ed. (2009).
[8] Weber, A., "The USC-SIPI Image Database," http://sipi.usc.edu/database/ (last accessed: November 20, 2011).